Why We're Sharing This
Most betting analytics platforms treat their models as black boxes. They show you projections but never explain how those numbers were generated.
We think that's backwards. If you're trusting our projections to inform your betting decisions, you deserve to know:
- What data we use
- How we process it
- Why certain players get certain projections
- Where our models perform well (and where they struggle)
This isn't marketing speak. This is the technical reality of how THE LINEUP works.
The Core Problem We're Solving
Player performance prediction is hard because:
- High variance: Even elite players have bad games
- Context matters: Matchups, pace, home/away, rest days all affect output
- Line movements: Props move based on information we may not have
- Injuries: Missing teammates change usage patterns
Books set lines based on these factors. To find positive expected value (+EV), we need to model them better than the books do.
Our Approach: Stat-Specific Tiered Models
The key insight: A player's tier for points is different from their tier for rebounds.
LeBron is an elite scorer AND an elite rebounder. But a defensive center averaging 8 rebounds might only score 6 points. Using minutes played to group players fails because it treats all stats the same.
Our solution: Separate tier classifications for each stat.
How Tiers Are Assigned
| Stat | Elite Threshold | Solid Threshold | What Elite Means |
|------|-----------------|-----------------|------------------|
| Points | 22+ PPG | 12+ PPG | Primary scorers |
| Rebounds | 8+ RPG | 5+ RPG | Dominant rebounders |
| Assists | 6+ APG | 3+ APG | Primary playmakers |
| Steals | 1.3+ SPG | 0.8+ SPG | Perimeter disruptors |
| Blocks | 1.2+ BPG | 0.6+ BPG | Rim protectors |
| 3PM | 2.5+ per game | 1.2+ per game | Volume shooters |
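In code, the table above boils down to a per-stat lookup. This is a minimal sketch, not our production code; the function and constant names are illustrative.

```python
# Thresholds from the table above: (elite_cutoff, solid_cutoff) per stat.
# These are the published cutoffs; everything else here is illustrative.
THRESHOLDS = {
    "points":   (22.0, 12.0),
    "rebounds": (8.0, 5.0),
    "assists":  (6.0, 3.0),
    "steals":   (1.3, 0.8),
    "blocks":   (1.2, 0.6),
    "threes":   (2.5, 1.2),
}

def classify_tier(stat: str, season_avg: float) -> str:
    """Return 'elite', 'solid', or 'base' for one stat, independently
    of how the player tiers out on any other stat."""
    elite, solid = THRESHOLDS[stat]
    if season_avg >= elite:
        return "elite"
    if season_avg >= solid:
        return "solid"
    return "base"
```

The point of stat-specific tiering is visible immediately: a 26 PPG / 4 RPG guard is `elite` for points but `base` for rebounds, so each stat routes to a different model.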
Why This Matters
The old approach under-predicted elite scorers by 25-30% because they got grouped with 15 PPG players in the same "stars" tier. Now, Shai Gilgeous-Alexander gets predicted using a model trained only on 22+ PPG scorers.
The 104 Features We Use
Every prediction uses 104 carefully selected features:
Recent Performance (Rolling Windows)
- Last 3 games average (L3)
- Last 5 games average (L5)
- Last 10 games average (L10)
- Season average
- Variance and consistency metrics
- Hot/cold form indicators
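A hedged sketch of how rolling-window features like these can be computed from a game log. The function name, the window choices shown, and the hot/cold definition (recent form minus season baseline) are illustrative assumptions, not our exact production features.

```python
from statistics import mean, pstdev

def rolling_features(game_log: list[float]) -> dict[str, float]:
    """Rolling-window features from a most-recent-first game log.
    Illustrative only; the real feature set is much larger."""
    feats = {
        "l3": mean(game_log[:3]),       # last 3 games
        "l5": mean(game_log[:5]),       # last 5 games
        "l10": mean(game_log[:10]),     # last 10 games
        "season": mean(game_log),
        "stdev": pstdev(game_log),      # consistency metric
    }
    # One possible hot/cold indicator: recent form vs. season baseline
    feats["hot"] = feats["l5"] - feats["season"]
    return feats

log = [31, 28, 25, 27, 29, 22, 26, 24, 28, 25, 23, 27]
f = rolling_features(log)
```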
Contextual Factors
- Opponent defensive rating vs. position
- Pace of play (possessions per game)
- Home/away splits
- Rest days since last game
- Back-to-back indicator
- Minutes analysis
Injury-Aware Features (v3.8)
- Teammate injury impact on usage
- Recovery trajectory tracking
- Minutes restriction indicators
- Lineup changes due to injuries
Line Movement Features (v3.8)
- Opening vs. current line difference
- Movement direction and magnitude
- Sharp money indicators
- Consensus vs. outlier lines
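Three of these line-movement features are simple arithmetic on the lines themselves. The sketch below shows one plausible way to compute them; the function name and feature names are assumptions for illustration (sharp-money detection is omitted because it depends on data we haven't described here).

```python
def line_movement_features(opening: float, current: float,
                           book_lines: list[float]) -> dict[str, float]:
    """Illustrative line-movement features for one prop."""
    move = current - opening
    consensus = sum(book_lines) / len(book_lines)
    return {
        "line_move": move,                    # signed magnitude of movement
        "move_dir": (move > 0) - (move < 0),  # -1 down, 0 flat, +1 up
        "vs_consensus": current - consensus,  # outlier vs. market average
    }

feats = line_movement_features(opening=25.5, current=26.5,
                               book_lines=[26.5, 26.0, 25.5])
```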
How a Projection Gets Made
Here's the actual flow for generating a projection:
Step 1: Feature Assembly
Player: Jayson Tatum
Game: vs. Lakers (away)
Features: L3=28.2, L5=27.8, L10=26.4, season=26.1
Context: LAL allows 24.3 PPG to SFs, pace=100.2, rest=1 day
Step 2: Tier Classification
PTS tier: Elite (26.1 > 22.0 threshold)
REB tier: Elite (8.5 > 8.0 threshold)
AST tier: Solid (4.8 > 3.0 but < 6.0)
Step 3: Model Selection
Points: Load pts_elite_v3.8_production.pkl
Rebounds: Load reb_elite_v3.8_production.pkl
Assists: Load ast_solid_v3.8_production.pkl
Step 4: Prediction Generation
Points projection: 25.8
Rebounds projection: 8.2
Assists projection: 4.6
Step 5: Confidence Interval
PTS range: 19.2 - 32.4 (model uncertainty + variance)
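One common way to build an interval like the Step 5 range is to combine model uncertainty with the player's own game-to-game variance in quadrature. This is a sketch of that idea, not our exact method; the two sigma values and the z multiplier below are made-up numbers chosen for illustration.

```python
from math import sqrt

def projection_interval(point_est: float, model_sigma: float,
                        player_sigma: float,
                        z: float = 1.28) -> tuple[float, float]:
    """Combine two independent uncertainty sources (added in
    quadrature) into a single range around the point estimate.
    z=1.28 (~80% normal coverage) is an assumption."""
    sigma = sqrt(model_sigma**2 + player_sigma**2)
    return (round(point_est - z * sigma, 1),
            round(point_est + z * sigma, 1))

lo, hi = projection_interval(25.8, model_sigma=2.0, player_sigma=4.75)
```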
The Algorithm: LightGBM
We use LightGBM (Light Gradient Boosting Machine), not because it's trendy, but because:
- Handles mixed feature types well - Numeric stats + categorical context
- Fast training and inference - We retrain daily, need speed
- Robust to overfitting - Regularization built-in
- Interpretable - We can see which features matter most
We evaluated random forests, XGBoost, and neural networks. LightGBM consistently performed best on our validation sets while being 3-5x faster to train.
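For readers unfamiliar with LightGBM, a regression setup of this kind typically looks like the parameter dictionary below. These are standard LightGBM parameter names, but the values are illustrative; our actual production hyperparameters are not published here.

```python
# Illustrative LightGBM configuration for a player-stat regression.
# Values are examples, not our tuned production settings.
params = {
    "objective": "regression",
    "metric": "mae",              # optimize for MAE, our headline metric
    "learning_rate": 0.05,
    "num_leaves": 31,
    "feature_fraction": 0.8,      # per-tree feature subsampling
    "lambda_l1": 0.1,             # built-in L1 regularization
    "lambda_l2": 0.1,             # built-in L2 regularization
}
# model = lightgbm.train(params, train_set, valid_sets=[val_set])
```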
Where We Struggle (Honest Assessment)
No model is perfect. Here's where ours has known weaknesses:
1. Blowouts
- If a game becomes lopsided, starters sit in the 4th quarter
- We don't predict game flow (yet), so our minutes assumptions can be wrong
2. Breaking News
- Trade deadline, unexpected injuries, load management
- Our features lag by at least one game
3. Low-Volume Stats
- Steals and blocks are highly variable game-to-game
- Even 60-70% hit rates mean 30-40% misses
4. First Games Back
- Players returning from injury have unpredictable minutes
- We weight recent form heavily, and pre-injury form may be stale
How We Measure Accuracy
We track our accuracy publicly on a live dashboard. Key metrics:
- MAE (Mean Absolute Error): Average prediction miss in raw stat units
- Hit Rate: Percentage of over/under calls that were correct
- Calibration: How well our confidence ranges match reality
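The first two metrics are simple enough to define precisely in a few lines. This sketch uses hypothetical function names; the grading logic for a hit is "our projection and the actual result landed on the same side of the line."

```python
def mae(preds: list[float], actuals: list[float]) -> float:
    """Mean Absolute Error: average miss in raw stat units."""
    return sum(abs(p - a) for p, a in zip(preds, actuals)) / len(preds)

def hit_rate(preds: list[float], lines: list[float],
             actuals: list[float]) -> float:
    """Fraction of over/under calls that were correct: the projection
    and the actual result fell on the same side of the prop line."""
    hits = sum((p > l) == (a > l)
               for p, l, a in zip(preds, lines, actuals))
    return hits / len(preds)

preds   = [25.8, 8.2, 4.6]
lines   = [24.5, 8.5, 4.5]
actuals = [28.0, 9.0, 3.0]
```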
We don't cherry-pick results. Every projection we make gets tracked and graded automatically when games complete.
The Daily Pipeline
Every day, our system:
- 6:00 AM - Sync overnight game results
- 6:15 AM - Update player rolling averages
- 6:30 AM - Recalculate tier assignments
- 7:00 AM - Generate projections for upcoming games
- 7:30 AM - Compare projections to current prop lines
- 8:00 AM - Identify +EV opportunities
- 11:00 PM - Auto-settle yesterday's picks, calculate CLV (closing line value)
This runs automatically. No human intervention unless something breaks.
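Conceptually, the schedule above is just a time-ordered task table handed to a scheduler. This sketch is an assumption about structure, not our actual orchestration code (which would run under something like cron or a workflow engine); all names are illustrative.

```python
# Hypothetical representation of the daily pipeline as (time, task) pairs.
PIPELINE = [
    ("06:00", "sync_results"),
    ("06:15", "update_rolling_averages"),
    ("06:30", "recalculate_tiers"),
    ("07:00", "generate_projections"),
    ("07:30", "compare_to_prop_lines"),
    ("08:00", "flag_ev_opportunities"),
    ("23:00", "settle_picks_and_clv"),
]

def tasks_due(now_hhmm: str) -> list[str]:
    """Names of tasks scheduled at or before the given HH:MM time.
    Zero-padded 24-hour strings compare correctly lexicographically."""
    return [task for t, task in PIPELINE if t <= now_hhmm]
```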
What's Next: Continuous Improvement
We're continuously improving. Current experiments:
- Lineup-aware projections - Adjust for specific 5-man rotations
- Real-time game state - Factor in current score and time
- Expanded sports - NFL and NHL using similar architecture
- Opponent player props - How do defenders affect specific stats?
We validate everything before production. Most experiments fail. The ones that work become features.
Why Transparency Matters
Our Philosophy
If we can't explain why a projection is what it is, we shouldn't be confident in it. Transparency isn't just good ethics - it's good modeling practice. By showing our work, we invite scrutiny that makes us better.
Other platforms might have good models. But if they won't tell you how they work, how can you trust them when they're wrong?
We believe in showing our hit rates, explaining our methodology, and being honest about our limitations. That's what makes THE LINEUP different.
See our accuracy in action: View the public accuracy dashboard - updated daily with real results.
Ready to try it?: Get today's projections and see our ML models in action.