status: proposed
date: 2026-03-16
ADR-0043: Player Count Sentiment Model Improvements
Context and Problem Statement
The current PlayerCountPoll model (ADR-0010) uses BGG’s three-tier voting system: for each player count, voters choose one of Best, Recommended, or Not Recommended. This system was adopted for BGG migration compatibility, but it has fundamental statistical flaws that the commons group should address as the specification matures.
Known Flaws in the Three-Tier Model
1. Overlapping categories. “Best” is conceptually a subset of “Recommended” – if a player count is the best, it is by definition recommended. But the poll treats them as mutually exclusive choices. A voter who considers 3-player the best experience cannot simultaneously mark it as recommended. The categories answer different questions (“Is this the ideal count?” vs “Is this count good?”) but force a single answer.
2. Missing middle ground. There is no option between “Recommended” and “Not Recommended.” A player count that is playable but mediocre – works fine, wouldn’t seek it out – has no natural home. Voters are forced to round up to “Recommended” or round down to “Not Recommended,” inflating or deflating the signal.
3. Anchoring bias. The boundary between “Best” and “Recommended” is entirely subjective and varies per voter. One voter’s “Best” is another’s “Recommended.” There is no calibration mechanism – unlike the weight scale (which has anchor games), the poll categories have no reference points.
4. Forced ranking. A voter who thinks 3-player and 4-player are equally excellent must choose “Best” for one and “Recommended” for the other. The model cannot express ties at the top. This creates artificial differentiation where none exists in the voter’s actual opinion.
5. Non-independence across player counts. A voter’s responses at different player counts are not independent decisions. Voters mentally rank all player counts, then map that ranking onto three buckets. The three-tier model treats each player count as an independent poll, but the data-generating process is inherently comparative.
6. Aggregation artifacts. A game where half of voters say “Best at 3” and the other half say “Best at 4” appears to have two equally “best” counts. But no individual voter may actually consider both counts equally best – the aggregate masks disagreement. Without per-voter data, the source of the pattern is unrecoverable.
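To make the aggregation artifact concrete, here is a minimal Python sketch using hypothetical ballots and a hypothetical `tally` helper (neither is part of the specification). Two three-voter electorates publish identical per-count tallies. In the first, voters disagree about which count is best. In the second, two voters like both counts and the third dislikes the game at either count. Each voter marks at most one count as Best, consistent with the forced-ranking behavior described above.

```python
from collections import Counter

# Hypothetical ballots: each voter maps player count -> three-tier vote.
# Electorate 1: voters disagree about WHICH count is best.
count_split = [
    {3: "best", 4: "not_recommended"},
    {3: "not_recommended", 4: "best"},
    {3: "recommended", 4: "recommended"},
]
# Electorate 2: two voters like both counts; one dislikes the game at either count.
game_split = [
    {3: "best", 4: "recommended"},
    {3: "recommended", 4: "best"},
    {3: "not_recommended", 4: "not_recommended"},
]


def tally(ballots):
    """Collapse per-voter ballots into per-count vote counts, as a published poll does."""
    totals = {}
    for ballot in ballots:
        for count, vote in ballot.items():
            totals.setdefault(count, Counter())[vote] += 1
    return totals


# Different opinions, identical aggregates.
print(tally(count_split) == tally(game_split))  # prints True
```

Only per-voter data distinguishes the two electorates; the published tallies cannot.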
Decision Drivers
- BGG migration requires preserving three-tier data for the foreseeable future – any improvement must be backward-compatible
- Statistical soundness: the replacement model should produce data amenable to standard statistical analysis (means, medians, distributions)
- UI simplicity: the voting interface must be intuitive for casual users, not just statisticians
- Community adoption: the model must be easy to contribute to – a complex system that nobody uses is worse than a flawed one with millions of votes
- The 33+ files across the specification that reference the current model represent significant refactoring cost for any structural change
Considered Options
Option A: Numeric Rating Per Player Count (1-5 Scale)
Each voter independently rates each supported player count on a 1-5 scale:
| Player Count | Your Rating |
|---|---|
| 1 | 2 / 5 |
| 2 | 4 / 5 |
| 3 | 5 / 5 |
| 4 | 5 / 5 |
| 5 | 3 / 5 |
Strengths:
- Produces real numeric distributions (mean, median, std dev, percentiles) per player count
- A voter CAN rate 3p and 4p both 5/5 – no forced ranking
- Aligns with how BGG already handles overall game ratings (the 1-10 scale)
- Independent per player count – no cross-count comparison forced
- Standard statistical tools apply directly
Weaknesses:
- Requires a new UI paradigm (5-point scale per count vs single radio button)
- Not backward-compatible with existing BGG three-tier data
- Scale calibration: what does “3 out of 5” mean? Needs anchor definitions.
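Because Option A produces plain numbers, per-count summaries come straight out of standard tooling. A minimal sketch using Python's `statistics` module, with hypothetical ratings for a single game (the `ratings` data and the `summarize` helper are illustrative, not part of the specification):

```python
import statistics

# Hypothetical 1-5 ratings for one game, keyed by player count.
ratings = {
    2: [4, 4, 3, 5, 4],
    3: [5, 5, 4, 5, 5],
    4: [5, 4, 5, 5, 3],
}


def summarize(scores):
    """Descriptive statistics for one player count's ratings (assumes at least one rating)."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        # Sample standard deviation needs at least two ratings;
        # 0.0 for a single rating is a convention chosen here.
        "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
    }


for count, scores in sorted(ratings.items()):
    s = summarize(scores)
    print(f"{count}p: n={s['n']} mean={s['mean']:.2f} "
          f"median={s['median']} stdev={s['stdev']:.2f}")
```

Each count is summarized independently, so nothing prevents the same voter from rating both 3p and 4p as 5/5.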
Option B: Pairwise Preference / Ranked Choice
Voters rank all supported player counts from best to worst. Aggregation uses a Condorcet method, Borda count, or similar social choice function.
Strengths:
- Most statistically rigorous – captures full preference ordering
- No category overlap or forced bucketing
- Well-studied aggregation methods from voting theory
Weaknesses:
- Complex to aggregate and explain to users
- Difficult UI for games with wide player ranges (ordering all eight counts of a 1-8 player game is tedious)
- Unfamiliar paradigm – most users have never seen ranked-choice voting for board game data
- No established board game community uses this approach
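For comparison, here is a Python sketch of two aggregation functions Option B names: the Borda count and a Condorcet-winner check. It assumes every ballot is a complete ranking of the same set of player counts; partial rankings, which wide player ranges would likely require, are not handled. The ballots are hypothetical.

```python
from itertools import combinations

# Hypothetical ballots: each ranks the same player counts, most preferred first.
ballots = [
    [3, 4, 2, 5],
    [4, 3, 5, 2],
    [3, 2, 4, 5],
]


def borda(ballots):
    """Borda count: on a ballot ranking n options, position i earns n - 1 - i points."""
    scores = {}
    for ballot in ballots:
        n = len(ballot)
        for position, option in enumerate(ballot):
            scores[option] = scores.get(option, 0) + (n - 1 - position)
    return scores


def condorcet_winner(ballots):
    """Return the option that beats every other option head-to-head, or None if none does."""
    if not ballots:
        return None
    options = set(ballots[0])
    wins = {option: 0 for option in options}
    for a, b in combinations(options, 2):
        # Count ballots that rank a above b; the rest rank b above a.
        a_over_b = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
        b_over_a = len(ballots) - a_over_b
        if a_over_b > b_over_a:
            wins[a] += 1
        elif b_over_a > a_over_b:
            wins[b] += 1
    for option, count in wins.items():
        if count == len(options) - 1:
            return option
    return None


print(borda(ballots))             # {3: 8, 4: 6, 2: 3, 5: 1}
print(condorcet_winner(ballots))  # 3
```

Even this simplified version needs pairwise comparisons and a case for preference cycles with no Condorcet winner, which illustrates how hard the results are to explain to casual voters.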
Option C: Binary Per Count (Would Play / Would Not Play)
For each player count, a single yes/no question: “Would you play this game at this player count?”
Strengths:
- Simplest possible signal – no ambiguity, no overlap
- Eliminates the Best/Recommended boundary problem entirely
- Easy to aggregate: percentage of “yes” votes per count
Weaknesses:
- Loses all granularity between “great” and “fine”
- Cannot distinguish “best at 3” from “acceptable at 3”
- The filtering use case (“best at exactly 3”) becomes impossible
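Option C's aggregation reduces to a single ratio per count. A minimal sketch with hypothetical responses and a hypothetical `yes_share` helper:

```python
# Hypothetical would-play responses: player count -> list of yes/no answers.
responses = {
    2: [True, False, True, True],
    3: [True, True, True, True],
    4: [True, True, False, True],
}


def yes_share(answers):
    """Fraction of voters who would play at this count (0.0 when nobody answered)."""
    return sum(answers) / len(answers) if answers else 0.0


for count, answers in sorted(responses.items()):
    print(f"{count}p: {yes_share(answers):.0%} would play")
```

Here 2p and 4p both come out at 75%, and nothing in the data says whether either count is a favorite or merely tolerable.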
Option D: Dual-Layer Model
Maintain two parallel data layers:
- Layer 1 (BGG compatibility): The existing three-tier votes (best/recommended/not_recommended). Populated during BGG migration and by voters who prefer the familiar interface.
- Layer 2 (native): A numeric 1-5 rating per player count. The statistically preferred data source for new contributions.
Filtering uses Layer 2 when sufficient data exists, falling back to Layer 1. Over time, as native contributions accumulate, Layer 2 becomes the authoritative source.
Strengths:
- Full backward compatibility – no existing data is lost or invalidated
- Gradual migration path – both layers coexist indefinitely
- Layer 2 produces proper statistical distributions while Layer 1 serves migration needs
Weaknesses:
- Two parallel systems increase complexity for implementations and API consumers
- Unclear when to declare Layer 2 “sufficient” and deprioritize Layer 1
- Voters may be confused by two different rating interfaces
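The Layer 2 / Layer 1 fallback can be sketched as a per-count layer selector. This is a minimal Python sketch under stated assumptions: `CountData`, `select_layer`, and the `MIN_NATIVE_VOTES` threshold of 30 are hypothetical, since the ADR leaves “sufficient” undefined. Passing the threshold as a parameter keeps that open question explicit rather than baking in an answer.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: the ADR does not say how much native data is "sufficient".
MIN_NATIVE_VOTES = 30


@dataclass
class CountData:
    """Both data layers for one player count (hypothetical shape)."""
    native_ratings: list[int] = field(default_factory=list)     # Layer 2: 1-5 ratings
    legacy_votes: dict[str, int] = field(default_factory=dict)  # Layer 1: tier -> vote count


def select_layer(data: CountData, min_native_votes: int = MIN_NATIVE_VOTES) -> str:
    """Pick the layer that filtering should read for this player count."""
    # Prefer native ratings once there are enough of them.
    if len(data.native_ratings) >= min_native_votes:
        return "native"
    # Otherwise fall back to legacy three-tier data if any votes exist.
    if sum(data.legacy_votes.values()) > 0:
        return "legacy"
    # With no legacy data, sparse native ratings are still better than nothing.
    return "native" if data.native_ratings else "none"


print(select_layer(CountData(native_ratings=[4] * 12, legacy_votes={"best": 40})))
```

The last branch is one design choice among several; an implementation could instead report “no data” until the threshold is met.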
Decision Outcome
Chosen option: “Dual-layer model” (Option D), adopting numeric per-count ratings as the native model with BGG three-tier data preserved as a legacy migration layer.
The specification defines PlayerCountRating as the primary schema: each voter independently rates each supported player count on a 1-5 scale, producing standard statistical distributions (mean, std dev) per count. The BGG three-tier data (Best/Recommended/Not Recommended) is preserved as PlayerCountPollLegacy for migration compatibility (ADR-0032). Filtering and derived fields use the numeric model when available, falling back to converted legacy data.
Numeric per-count ratings were chosen over ranked choice (Option B, too complex for voters) and binary would-play (Option C, loses granularity). The pure numeric approach (Option A) is effectively what the dual-layer model implements as its native layer – Option D simply adds the legacy compatibility that migration requires.
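A minimal sketch of the two schemas as Python dataclasses. Field names, and the assumption that the legacy record stores per-count tallies rather than individual ballots, are illustrative only; the specification's actual PlayerCountRating and PlayerCountPollLegacy definitions may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlayerCountRating:
    """Native layer: one voter's 1-5 rating for one player count (hypothetical fields)."""
    game_id: str
    voter_id: str
    player_count: int
    rating: int

    def __post_init__(self):
        if self.player_count < 1:
            raise ValueError("player_count must be at least 1")
        if not isinstance(self.rating, int) or not 1 <= self.rating <= 5:
            raise ValueError("rating must be an integer from 1 to 5")


@dataclass(frozen=True)
class PlayerCountPollLegacy:
    """Legacy layer: migrated BGG three-tier tallies for one player count (hypothetical fields)."""
    game_id: str
    player_count: int
    best: int
    recommended: int
    not_recommended: int

    def __post_init__(self):
        if min(self.best, self.recommended, self.not_recommended) < 0:
            raise ValueError("vote tallies cannot be negative")

    @property
    def total_votes(self) -> int:
        return self.best + self.recommended + self.not_recommended
```

Keeping the records separate means legacy tallies are never rewritten into the native schema, so the original BGG data survives migration unchanged.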
Consequences
- Good, because the native model uses standard numeric data amenable to means, medians, percentiles, and confidence intervals
- Good, because voters can rate multiple player counts equally – no forced ranking, no overlapping categories
- Good, because BGG migration data is preserved without loss – the legacy schema stores the original three-tier votes
- Good, because the specification transparently acknowledges the limitations of the inherited model
- Good, because implementations can convert legacy three-tier data to approximate numeric values for unified querying
- Bad, because two parallel schemas (native + legacy) increase complexity for implementations
- Bad, because the numeric scale lacks anchor definitions (unlike the weight scale) – a future RFC should define what 1-5 means for player count quality
- Bad, because applications built against the old three-tier PlayerCountPoll schema will need to update to PlayerCountRating
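One way to convert legacy tallies into an approximate numeric value for unified querying is a vote-weighted mean over a fixed tier-to-score mapping. The mapping below (Best = 5, Recommended = 4, Not Recommended = 1) and the `approximate_mean` helper are hypothetical placeholders; the ADR does not define a conversion, and any mapping inherits the missing anchor definitions for the 1-5 scale.

```python
from typing import Dict, Optional

# Hypothetical tier-to-score mapping; the ADR does not define one.
LEGACY_TO_NUMERIC = {"best": 5.0, "recommended": 4.0, "not_recommended": 1.0}


def approximate_mean(legacy_tally: Dict[str, int]) -> Optional[float]:
    """Vote-weighted mean of mapped tier scores, or None when there are no votes."""
    total = sum(legacy_tally.values())
    if total == 0:
        return None
    # Unknown tier names raise KeyError rather than being silently ignored.
    weighted = sum(LEGACY_TO_NUMERIC[tier] * votes for tier, votes in legacy_tally.items())
    return weighted / total


print(approximate_mean({"best": 6, "recommended": 3, "not_recommended": 1}))  # 4.3
```

Because every legacy vote collapses to one of three values, the result is coarser than a native mean and is best treated as an approximation when combined with native data.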