How Ubunifu Madness Works
A transparent look at the data, models, and methods behind every prediction.
Elo Ratings
Every team starts at 1500. After each game, the winner gains points and the loser drops by the same amount. How many points depends on three things: how likely the win was (upsets move the needle more), the margin of victory, and whether the game was at home or away. We apply a 101.9-point home court advantage and use a K-factor of 21.8 — tuned via Optuna optimization to balance responsiveness with stability.
Between seasons, every team's rating regresses 11% toward 1500. This prevents ratings from inflating over time and accounts for roster turnover. Ratings update daily from ESPN game results using the exact same formula used to process 40 years of historical games.
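The update and regression rules above can be sketched in code. This is an illustrative sketch, not the app's actual implementation: the constants (101.9, 21.8, 11%) come from the text, but the margin-of-victory multiplier shown here is one common choice and the production formula may differ.

```python
# Illustrative Elo sketch — constants from the text; function names and the
# margin-of-victory multiplier are assumptions, not the production code.
HOME_ADVANTAGE = 101.9
K_FACTOR = 21.8
SEASON_REGRESSION = 0.11
BASE_RATING = 1500.0

def expected_win_prob(rating_a: float, rating_b: float, a_is_home: bool) -> float:
    """Standard logistic Elo expectation, with home court credited to the home team."""
    diff = rating_a - rating_b + (HOME_ADVANTAGE if a_is_home else -HOME_ADVANTAGE)
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))

def update_elo(winner: float, loser: float, winner_home: bool, margin: int) -> tuple[float, float]:
    """Winner gains exactly what the loser drops; upsets and big margins move more."""
    exp = expected_win_prob(winner, loser, winner_home)
    # Margin-of-victory multiplier (one common form; the app's exact form may differ).
    mov_mult = ((margin + 3) ** 0.8) / (7.5 + 0.006 * (winner - loser))
    delta = K_FACTOR * mov_mult * (1.0 - exp)
    return winner + delta, loser - delta

def regress_to_mean(rating: float) -> float:
    """Between seasons, pull every rating 11% back toward 1500."""
    return rating + SEASON_REGRESSION * (BASE_RATING - rating)
```

Note the zero-sum property: because both teams move by the same `delta`, the total rating in the pool is conserved by game updates, and only the off-season regression nudges the distribution.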
An average D1 team sits around 1500. Top 25 teams are typically 1800+. The #1 team is usually around 2100. The system has been validated against 4,302 tournament games from 1985 to present.
Strength of Schedule
Strength of schedule (SOS) is the average Elo rating of all opponents a team has faced during the season. A team with a high SOS has been tested against tough competition, while a low SOS suggests a softer schedule. This matters because a 25-5 record against a weak schedule is very different from 25-5 against elite opponents.
SOS is available on the Compare page and through the Madness Agent. It helps contextualize win-loss records and Elo ratings — a team with a high Elo and a high SOS has proven itself against quality opponents.
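As defined above, SOS is just an average, which a one-liner makes concrete (names here are illustrative; opponent ratings are assumed to be looked up already):

```python
# Minimal SOS sketch — the function name and inputs are illustrative.
def strength_of_schedule(opponent_elos: list[float]) -> float:
    """SOS is the mean Elo rating of every opponent faced this season."""
    return sum(opponent_elos) / len(opponent_elos)
```

A team that has faced opponents rated 1820, 1750, and 1540 has an SOS of about 1703 — well above the 1500 D1 average, indicating a tough schedule so far.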
Conference Strength
Conference rankings use four metrics. Average Elo is the mean rating across all teams in the conference — it tells you overall depth. Non-Conference Win Rate counts how a conference performs against outside opponents in regular season games, removing the noise of intra-conference cannibalization. Top 5 Elo measures elite talent at the top. Parity is the inverse of Elo standard deviation — higher parity means teams are more evenly matched with no weak links.
These metrics refresh automatically as new game results come in. Non-conference win rate is the most telling — it directly measures how a conference performs against the rest of D1.
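The four metrics can be sketched with the standard library (field names are illustrative, not the production schema):

```python
# Sketch of the four conference metrics described above.
from statistics import mean, stdev

def conference_metrics(team_elos: list[float],
                       non_conf_wins: int, non_conf_games: int) -> dict:
    """Compute the four conference-strength metrics from team Elo ratings
    and the conference's non-conference record."""
    top5 = sorted(team_elos, reverse=True)[:5]
    return {
        "avg_elo": mean(team_elos),                   # overall depth
        "non_conf_win_rate": non_conf_wins / non_conf_games,
        "top5_elo": mean(top5),                       # elite strength at the top
        "parity": 1.0 / stdev(team_elos),             # inverse std dev: higher = more even
    }
```

Note that parity is undefined for a conference where every team has an identical rating (standard deviation of zero); a real implementation would need to guard that edge case.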
Blended 6-Signal Prediction System
Live predictions combine six independent signals into a single win probability. This blended approach outperforms any single signal alone, because each captures a different dimension of team quality:
- Static Model (30%): LR + LightGBM ensemble trained on 4,302 tournament games (1985–2025) with 31 features. Brier score 0.1413.
- Elo Ratings (30%): Real-time ratings updated daily from ESPN results. Captures current team strength.
- Momentum (15%): Last 10 games win percentage and margin of victory. Catches hot/cold streaks.
- Conference Strength (10%): 70% conference avg Elo + 30% non-conference win rate. Accounts for quality of competition.
- SOS-Adjusted Record (10%): Win percentage adjusted for strength of schedule — a 25-5 record against tough opponents is more impressive than 25-5 against weak ones.
- Efficiency (5%): Offensive vs defensive points per 100 possessions. Measures scoring quality independent of pace.
When the static model isn't available for a team (e.g., mid-majors with limited data), the remaining five live signals are re-weighted automatically. This ensures every D1 team gets a data-driven prediction, not just teams with full Kaggle coverage.
Tossup Handling
When the blended model's confidence is below 52% — meaning neither team is favored above 52% — the game is labeled a TOSSUP. This is the model being honest: a 51% prediction is barely better than a coin flip, and pretending to have a strong pick would be misleading.
Tossup games appear in yellow on the Scores page instead of showing a directional pick. They're excluded from accuracy metrics on the Performance page, so the model's reported accuracy reflects only games where it had genuine conviction. The Performance page shows how many games were tossups separately.
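The rule is simple enough to state in a few lines (the labels and function name are illustrative; the 52% threshold is from the text):

```python
# The tossup rule in code — threshold from the text, labels illustrative.
TOSSUP_THRESHOLD = 0.52

def classify_prediction(home_win_prob: float) -> str:
    """Label a game TOSSUP when neither side clears the 52% threshold."""
    if max(home_win_prob, 1.0 - home_win_prob) < TOSSUP_THRESHOLD:
        return "TOSSUP"
    return "HOME" if home_win_prob >= 0.5 else "AWAY"
```

So a 51%/49% game is a TOSSUP, while 63%/37% yields a directional pick for the favorite.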
Static Model Details
The static model (one of six blended signals) is an ensemble of Logistic Regression and LightGBM. Each is trained on 4,302 men's and women's NCAA tournament games from 1985–2025, using 31 features organized into eight categories:
- Elo: Current rating, rating difference, expected win probability
- Conference: Average Elo, non-conference win rate, tournament historical win rate
- Four Factors: eFG%, turnover rate, offensive rebound rate, free throw rate (and opponent versions)
- Efficiency: Offensive and defensive points per 100 possessions, tempo
- Schedule: Strength of schedule (average opponent Elo)
- Massey Ordinals: Composite ranking from 15 independent ranking systems
- Momentum: Last 10 games win percentage and margin of victory
- Experience: Coach tenure, seed (when available)
The ensemble weights are 76% LR + 24% LGB, with isotonic calibration applied. LR provides stable, well-calibrated probabilities while LGB captures non-linear interactions. Together they achieve a Brier score of 0.1413, which is 43.5% better than always picking the higher seed.
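The inference step of that ensemble can be sketched as follows. This is a sketch of the prediction path only, not the training pipeline; it assumes the two models expose `predict_proba`-style probabilities and that the isotonic calibrator was fit on held-out games. It requires scikit-learn and NumPy.

```python
# Sketch of the ensemble inference step — weights from the text; the
# calibrator-fitting step shown at the bottom is an assumption about workflow.
import numpy as np
from sklearn.isotonic import IsotonicRegression

LR_WEIGHT, LGB_WEIGHT = 0.76, 0.24

def ensemble_probs(lr_probs: np.ndarray, lgb_probs: np.ndarray,
                   calibrator: IsotonicRegression) -> np.ndarray:
    """Blend the two models' win probabilities 76/24, then map the blend
    through a fitted isotonic calibration curve."""
    blended = LR_WEIGHT * lr_probs + LGB_WEIGHT * lgb_probs
    return calibrator.predict(blended)

# Fitting the calibrator on held-out predictions and outcomes, e.g.:
# calibrator = IsotonicRegression(out_of_bounds="clip").fit(blended_val, y_val)
```

Isotonic regression learns a monotone mapping from raw blended probabilities to empirically calibrated ones, so a reported "70%" really wins about 70% of the time without reordering which team is favored.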
Live Data Pipeline
Historical data comes from Kaggle's March Machine Learning Mania dataset, covering every D1 game from 1985 to present. This seeds the database with initial Elo ratings, conference strength, and team stats.
Once the database is populated, the app runs independently of the CSV files. A daily cron job fetches completed game scores from ESPN, computes Elo updates using our own formula, updates team records, refreshes conference strength metrics, recomputes player stats, and recalculates strength of schedule for every team. ESPN provides the raw scores — every analytical metric is computed by us.
Live scores on the Scores page come directly from ESPN's scoreboard API, enriched with our Elo ratings and model win probabilities. Crucially, every prediction is locked before tipoff — once a game starts, the pre-game prediction is frozen and never updated retroactively. This ensures honest performance tracking. After games finish, the Scores page shows whether the locked prediction was correct, along with a daily accuracy summary.
The Performance page aggregates all locked predictions into cumulative accuracy charts, daily breakdowns, calibration curves, and a game-by-game log. This gives you full transparency into how well the model actually performs over time — no cherry-picking, no retroactive changes.
Madness Agent
The chat agent uses Claude (Anthropic's AI) with tool access to query our entire database in real time. It doesn't just get a static text dump — it actively looks up data to answer your questions. The agent has six tools:
- Team Lookup: Search any D1 team by name — get Elo, record, conference, stats, momentum, coach info, and strength of schedule
- Matchup Prediction: Get head-to-head win probabilities with stat comparisons for any two teams
- Conference Analysis: Conference strength metrics, top teams in each conference
- Rankings: Top teams by Elo, filterable by conference
- Live Scores: Today's games and results from ESPN
- Upset Finder: Identify potential upsets where underdogs have meaningful win probability
When you ask "Who should I pick in a Duke vs. UNC matchup?", the agent calls the matchup prediction tool, pulls real win probabilities and stats, then explains its reasoning with specific numbers. It doesn't guess — every claim is grounded in data.
Men's and Women's Coverage
Every feature works for both men's and women's basketball. Rankings, conference strength, predictions, live scores, and the bracket builder all support a gender toggle. The static model is trained on both the men's and women's tournaments. Elo ratings are computed independently per gender using the same methodology.
A note on Elo scales: Men's and women's Elo ratings operate as independent pools. You may notice that top women's teams have higher raw Elo numbers than top men's teams. This reflects the different competitive dynamics in women's basketball (historically more dominant top programs like UConn, South Carolina, Stanford). The raw numbers should only be compared within the same gender — a women's Elo of 2300 and a men's Elo of 2100 both indicate elite teams at the top of their respective pools. Prediction quality is calibrated independently within each pool.
Known Limitations
- No player-level data — our model operates at the team level
- No injury adjustments — a key player being out is not reflected in predictions
- Four Factors stats are computed from Kaggle box score data (updated seasonally, not after each game)
- Women's Massey ordinals are not available from Kaggle, reducing feature coverage for women's predictions
- Early season ratings carry more uncertainty — Elo stabilizes after ~15 games
- Men's and women's Elo scales differ in magnitude — compare within gender only
Built by Richard Pallangyo for the Kaggle March ML Mania 2026 competition. Questions about methodology? The Madness Agent can explain. See also: Terms & Disclaimers.