391 Rules: What It Takes to Build a Real Bar Prep Taxonomy From Scratch

The single most tedious thing we have built is our MBE rule taxonomy. Five months of work. Three domain-expert reviewers. One abandoned first attempt. 391 rules at the end.

Nobody talks about this work because it is not glamorous. But it is the substrate for everything else in the product. Without it, per-rule mastery tracking is impossible. Without per-rule mastery tracking, targeted practice is impossible. Without targeted practice, the software is just a fancier question dispenser.

Here is what building it actually took.

What a "rule" is, for MBE purposes

The first hard problem is defining what counts as a rule. The answer is not obvious.

Is "hearsay" one rule, or is each hearsay exception a rule, or is each sub-clause of each exception its own rule?

Too coarse and the taxonomy becomes useless — "Evidence" is one rule, essentially. Too fine and it becomes ungrounded — 4,000 sub-clauses that nobody could distinguish reliably.

We landed on a middle level: each hearsay exception is one rule (excited utterance, present sense impression, statement against interest, etc.). The definitional prerequisites of hearsay itself are also one rule ("hearsay definition"). Where a rule has meaningful branches that produce different question types, we split further. Where the sub-clauses are always tested together, we kept them combined.

The final count landed at 391 rules across the seven MBE subjects. The approximate split by subject (these rounded figures sum to 390):

Constitutional Law: 55 rules
Criminal Law and Procedure: 50 rules
Civil Procedure: 60 rules
Contracts (including UCC): 55 rules
Evidence: 65 rules
Real Property: 55 rules
Torts: 50 rules

This is not the only defensible taxonomy. Different domain experts would build slightly different splits. What matters is internal consistency: every question in the bank is tagged to exactly one primary rule, and every rule tests something that produces recognizable, differentiable questions.

The first attempt failed

Our first taxonomy came from generalizing existing outline structures — MBE prep books, law school outlines, and the NCBE subject specifications. This produced roughly 250 rules.

When we started tagging real released MBE questions against those rules, we discovered a problem: about a third of released questions did not fit cleanly into any single rule. They were testing intersections — hearsay AND relevance, negligence AND causation, warranty AND UCC gap-fillers. The taxonomy was too clean.

We could not just add "intersection rules" as new entries because the combinatorial explosion would produce thousands of nodes. We had to rebuild.
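To put a rough number on that explosion, here is a quick upper-bound check in Python. It assumes, purely for illustration, that any two (or three) of the roughly 250 first-attempt rules could form an intersection node; the real set of tested intersections would be a subset of this.

```python
from math import comb

FIRST_ATTEMPT_RULES = 250  # approximate size of the first taxonomy

# Upper bound: every unordered pair of rules treated as a possible intersection node.
pair_nodes = comb(FIRST_ATTEMPT_RULES, 2)

# Three-way intersections grow much faster still.
triple_nodes = comb(FIRST_ATTEMPT_RULES, 3)

print(pair_nodes)    # 31125
print(triple_nodes)  # 2573000
```

Even if only a small fraction of those pairs ever shows up in real questions, the node count would still swamp the base taxonomy.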

The second attempt

The revised taxonomy took a different approach. Every question is tagged to a primary rule — the doctrinal rule the question is most heavily testing — plus zero to two secondary rules — related rules the question also touches.
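A minimal sketch of what such a tag record could look like. The class name, field names, and rule identifiers are hypothetical and chosen for illustration; only the constraints (one primary rule, at most two secondary rules) come from the description above.

```python
from dataclasses import dataclass
from typing import Tuple

MAX_SECONDARY_RULES = 2  # "zero to two secondary rules"


@dataclass(frozen=True)
class QuestionTag:
    """One question's tags: exactly one primary rule, up to two secondary rules."""

    question_id: str
    primary_rule: str
    secondary_rules: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        if len(self.secondary_rules) > MAX_SECONDARY_RULES:
            raise ValueError("a question may have at most two secondary rules")
        if len(set(self.secondary_rules)) != len(self.secondary_rules):
            raise ValueError("secondary rules must be distinct")
        if self.primary_rule in self.secondary_rules:
            raise ValueError("the primary rule cannot also be a secondary rule")


# A question that mainly tests excited utterance but also touches the hearsay definition.
tag = QuestionTag("q-0001", "excited-utterance", ("hearsay-definition",))
```

Keeping intersections as a property of the question rather than as nodes in the taxonomy is what lets the rule count stay in the hundreds.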

For the candidate, only the primary rule matters for the mastery dashboard. Secondary rules feed into the recommendation engine that queues related practice: "you have missed three questions that primarily tested Hearsay Definition; you have also missed two questions that primarily tested Excited Utterance but touched Hearsay Definition as a secondary rule. Practice Hearsay Definition."
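One way the primary and secondary misses could be combined into a recommendation is sketched below. The scoring formula and the `secondary_weight` value are assumptions made for illustration, not a description of the production engine; the function names are hypothetical.

```python
from collections import Counter


def missed_rule_counts(missed_tags):
    """Count misses per rule.

    missed_tags: iterable of (primary_rule, secondary_rules) pairs,
    one pair per question the candidate answered incorrectly.
    """
    primary = Counter()
    secondary = Counter()
    for primary_rule, secondary_rules in missed_tags:
        primary[primary_rule] += 1
        secondary.update(secondary_rules)
    return primary, secondary


def recommend(missed_tags, secondary_weight=0.5, top_n=1):
    """Rank rules by primary misses plus down-weighted secondary misses."""
    primary, secondary = missed_rule_counts(missed_tags)
    rules = set(primary) | set(secondary)
    scores = {r: primary[r] + secondary_weight * secondary[r] for r in rules}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda r: (-scores[r], r))[:top_n]


# The scenario from the text: three primary misses on the hearsay definition,
# plus two excited-utterance misses that also touched it.
missed = [("hearsay-definition", ())] * 3 + [
    ("excited-utterance", ("hearsay-definition",))
] * 2
print(recommend(missed))  # ['hearsay-definition']
```

Here the hearsay definition scores 3 + 0.5 × 2 = 4 against 2 for excited utterance, so it is queued first.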

This is more work at the tagging step but produces a much better product.

The tagging work

Each released MBE question we ingested was tagged by a domain expert against the taxonomy. Where two experts disagreed on the primary rule, a third resolved the tie.

The interrater agreement rate on our first pass was 71 percent. That is below what we consider publishable. We iterated on the rule definitions until interrater agreement reached 89 percent, at which point we froze the taxonomy and re-tagged the full question bank.

Roughly 3,400 questions were tagged in the final pass. With two reviewers per question, that is about 6,800 expert judgments, plus a third judgment on the roughly 11 percent of questions where the first two disagreed (about 370 more).
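The arithmetic behind those numbers can be sketched as follows: two judgments per question, plus one tie-break for each disputed question. The figures above do not say which agreement statistic was used; this sketch assumes raw percent agreement on the primary rule (a chance-corrected measure such as Cohen's kappa would be a stricter alternative). Function names are hypothetical.

```python
def percent_agreement(first_reviewer, second_reviewer):
    """Share of questions on which two reviewers chose the same primary rule."""
    if len(first_reviewer) != len(second_reviewer):
        raise ValueError("both reviewers must tag the same questions")
    if not first_reviewer:
        raise ValueError("no questions to compare")
    matches = sum(a == b for a, b in zip(first_reviewer, second_reviewer))
    return matches / len(first_reviewer)


def total_judgments(n_questions, agreement_rate, reviewers_per_question=2):
    """Base judgments plus one tie-break for each disputed question."""
    disputed = round(n_questions * (1 - agreement_rate))
    return n_questions * reviewers_per_question + disputed


# Toy example: the reviewers agree on three of four questions.
a = ["hearsay-definition", "excited-utterance", "battery", "consideration"]
b = ["hearsay-definition", "excited-utterance", "battery", "parol-evidence"]
print(percent_agreement(a, b))      # 0.75

# Final pass: about 3,400 questions at 89 percent agreement.
print(total_judgments(3400, 0.89))  # 7174
```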

Cost: about $28,000 in expert time, plus five months of internal engineering to build the tagging tool, the review-and-resolve workflow, and the database schema.

Why nobody else has done this recently

Two reasons.

Incumbents have legacy question libraries with old tags. BarBri, Kaplan, and Themis have subject-level tags on tens of thousands of questions from decades of accumulated content. Re-tagging them at the rule level would be a multi-year, seven-figure project against no obvious revenue impact — nobody was demanding it.

New entrants have not had the domain expertise. Building a good MBE taxonomy requires people who know MBE questions cold. That is a narrow talent pool. Most software startups that want to enter this market do not have that talent on the founding team.

We did. We spent it here.

What this enables

Rule-level tagging is the load-bearing element for three product features:

Per-rule mastery tracking. You see accuracy on 391 rules individually, not seven subject-level averages.

Targeted question queuing. The system pulls questions weighted toward your weakest rules, not just your weakest subjects.

Diagnostic depth. After a 30-question diagnostic, we can tell you the five specific rules you are weakest on — not "you should study more torts."

None of those features are possible with subject-level tagging alone. All three are what candidates using the product say made the biggest difference.
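A minimal sketch of how the three features can hang off the same per-rule data, assuming each response is recorded against the question's primary rule. The function names, the weighting formula (one minus accuracy), and the `floor` parameter are illustrative assumptions, not the product's actual logic.

```python
import random
from collections import defaultdict


def rule_accuracy(responses):
    """responses: iterable of (primary_rule, was_correct). Returns {rule: accuracy}."""
    seen = defaultdict(int)
    correct = defaultdict(int)
    for rule, was_correct in responses:
        seen[rule] += 1
        correct[rule] += int(was_correct)
    return {rule: correct[rule] / seen[rule] for rule in seen}


def weakest_rules(accuracy, n=5):
    """The n rules with the lowest accuracy; ties broken alphabetically."""
    return sorted(accuracy, key=lambda r: (accuracy[r], r))[:n]


def queue_questions(bank, accuracy, k, rng=None, floor=0.05):
    """Sample k questions, weighted toward low-accuracy rules.

    bank: list of (question_id, primary_rule).
    Rules the candidate has not seen yet are treated as accuracy 0.0.
    `floor` keeps mastered rules from disappearing entirely.
    Sampling is with replacement, so a question may repeat.
    """
    rng = rng or random.Random()
    weights = [max(1.0 - accuracy.get(rule, 0.0), floor) for _, rule in bank]
    return rng.choices(bank, weights=weights, k=k)


responses = [
    ("hearsay-definition", False), ("hearsay-definition", False),
    ("hearsay-definition", True), ("battery", True), ("battery", True),
    ("consideration", False),
]
accuracy = rule_accuracy(responses)
print(weakest_rules(accuracy, n=2))  # ['consideration', 'hearsay-definition']

bank = [("q1", "hearsay-definition"), ("q2", "battery"), ("q3", "consideration")]
queue = queue_questions(bank, accuracy, k=10, rng=random.Random(0))
```

With subject-level tags only, the same functions would have seven keys to work with instead of 391, which is the difference the three features depend on.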

The lesson

Some infrastructure is not glamorous. It is tedious, expensive, and hard to demo. But it is the substrate that makes everything downstream possible. If you are building a serious product in a domain, do the tedious substrate work. It is the moat that lasts.