How Elo Works
What is Elo?
The Elo rating system, originally developed for chess by Arpad Elo, is a method for calculating the relative skill of players. Every driver starts at a rating of 2000. After each race, ratings are adjusted based on finishing position — beating a higher-rated driver gains you more points than beating a lower-rated one.
How It Works in F1
Each race is treated as a series of head-to-head matchups between every pair of drivers. For each pairing:
- The driver who finishes ahead "wins" the matchup
- An expected score is calculated based on the rating difference:
  E = 1 / (1 + 10^((opponent_rating - player_rating) / 400))
- The rating adjustment is:
  K × (actual_score - expected_score)
- Base K-factor is 48, scaled by season length (12.0 reference races) and normalized by √(grid size − 1)
- At each season's end, ratings regress 3% toward the starting Elo of 2000
A driver's total rating change for a race is the sum of adjustments across all opponents. With 20 drivers on the grid, that's 19 head-to-head matchups per driver, or 190 pairings per race.
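The pairwise procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the site's actual implementation: the function names `expected_score` and `race_deltas` are hypothetical, and `k_pair` is taken as an already-scaled K-factor per matchup.

```python
from itertools import combinations


def expected_score(player_rating: float, opponent_rating: float) -> float:
    """Probability that the player finishes ahead of the opponent."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - player_rating) / 400.0))


def race_deltas(finishing_order: list[str],
                ratings: dict[str, float],
                k_pair: float) -> dict[str, float]:
    """Sum the pairwise adjustments for one race.

    finishing_order lists drivers from first to last; ratings holds
    each driver's pre-race Elo. Both names are illustrative.
    """
    deltas = {driver: 0.0 for driver in finishing_order}
    # combinations() preserves list order, so `ahead` always finished
    # before `behind` in every pair.
    for ahead, behind in combinations(finishing_order, 2):
        # Winner scores 1, so the gain is K * (1 - expected).
        gain = k_pair * (1.0 - expected_score(ratings[ahead], ratings[behind]))
        deltas[ahead] += gain
        deltas[behind] -= gain  # loser's K * (0 - expected) is the mirror image
    return deltas
```

Because every pairing adds to one driver exactly what it takes from the other, the deltas for a race always sum to zero.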
Why Elo for F1?
Unlike championship points, Elo ratings account for the strength of the field. Winning against a grid of highly-rated drivers is worth more than winning against weaker competition. This makes it possible to compare drivers across different eras — a 2600-rated driver in 1955 demonstrated the same level of dominance as a 2600-rated driver in 2024.
Across 1149 races and 864 drivers in F1 history, the Elo system produces a natural distribution of skill tiers. From the highest tier to the lowest:
- The all-time greats: drivers who achieved sustained dominance against the strongest competition. Only 6 drivers (0.7%) have reached this level.
- Championship contenders and race winners who consistently performed at the highest level. 36 drivers (4.2%).
- Solid midfield performers and occasional podium finishers. Competitive drivers who proved themselves against strong fields. 84 drivers (9.7%).
- Reliable competitors who carved out respectable F1 careers without reaching the top step. 247 drivers (28.6%).
- The largest group: short careers, backmarkers, or drivers who never had the machinery to climb higher. Many only raced a handful of grands prix. 491 drivers (56.8%).
Let's walk through a simplified 4-driver race to see exactly how the math works.
Step 1: Set up the race
Step 2: Calculate K per pair
The K-factor for each head-to-head matchup is adjusted for season length and grid size:
Fewer season races → higher K (each race matters more). Larger grids → lower K per pair (more matchups to sum over).
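The scaling can be written out as a small function. The exact formula is not reproduced in the text, so the form below is an assumption pieced together from the stated ingredients (base K of 48, 12 reference races, division by √(grid size − 1)); the name `k_per_pair` is hypothetical. With a 24-race season and the 4-driver grid of this example it gives about 13.86, which agrees with the ratio of Winner Δ to Surprise in the matchup tables.

```python
import math

BASE_K = 48.0           # base K-factor stated in the text
REFERENCE_RACES = 12.0  # reference season length stated in the text


def k_per_pair(season_races: int, grid_size: int) -> float:
    """K-factor applied to a single head-to-head matchup (assumed form)."""
    season_scale = REFERENCE_RACES / season_races  # fewer races -> larger K
    grid_scale = 1.0 / math.sqrt(grid_size - 1)    # larger grid -> smaller K
    return BASE_K * season_scale * grid_scale


# Assumed inputs: a 24-race season with the 4-driver example grid,
# and the same season with a full 20-car grid.
print(round(k_per_pair(24, 4), 2))
print(round(k_per_pair(24, 20), 2))
```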
Step 3: Head-to-head matchups
Every pair of drivers is compared. The higher finisher "wins" the matchup. Here are all 6 pairings:
| Matchup | Winner | Expected | Surprise | Winner Δ | Loser Δ |
|---|---|---|---|---|---|
| Verstappen vs Hamilton | Verstappen | 67.9% | 32.1% | +4.45 | -4.45 |
| Verstappen vs Norris | Verstappen | 72.7% | 27.3% | +3.79 | -3.79 |
| Verstappen vs Alonso | Verstappen | 93.0% | 7.0% | +0.97 | -0.97 |
| Hamilton vs Norris | Hamilton | 55.7% | 44.3% | +6.13 | -6.13 |
| Hamilton vs Alonso | Hamilton | 86.3% | 13.7% | +1.90 | -1.90 |
| Norris vs Alonso | Norris | 83.4% | 16.6% | +2.30 | -2.30 |
Expected = the pre-race probability that the matchup winner finishes ahead, based on the rating gap. Surprise = 1 − Expected, i.e. how unexpected the result was. Winner Δ = K per pair × Surprise, so beating a much stronger opponent means high surprise and a big gain.
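The table can be reproduced with a short script. The starting ratings are not given in the text, so the values below are hypothetical: they were chosen only so that the gaps between drivers (130, 170 and 450 points below Verstappen) yield the percentages shown. The per-pair K of roughly 13.86 is likewise an assumption, inferred from the ratio of Winner Δ to Surprise.

```python
from itertools import combinations

# Hypothetical pre-race ratings; only the gaps between them matter here.
ratings = {"Verstappen": 2450, "Hamilton": 2320, "Norris": 2280, "Alonso": 2000}
finishing_order = ["Verstappen", "Hamilton", "Norris", "Alonso"]
K_PAIR = 48 * (12 / 24) / (4 - 1) ** 0.5  # assumed: 24-race season, 4-car grid


def expected_score(player: float, opponent: float) -> float:
    """Probability that `player` finishes ahead of `opponent`."""
    return 1 / (1 + 10 ** ((opponent - player) / 400))


rows = []
for winner, loser in combinations(finishing_order, 2):
    expected = expected_score(ratings[winner], ratings[loser])
    surprise = 1 - expected
    rows.append((winner, loser, expected, surprise, K_PAIR * surprise))

for winner, loser, expected, surprise, delta in rows:
    print(f"{winner} vs {loser}: expected {expected:.1%}, "
          f"surprise {surprise:.1%}, winner {delta:+.2f}, loser {-delta:+.2f}")
```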
Step 4: Sum all adjustments
Each driver's total change is the sum of their gains/losses across all matchups:
Step 5: Season-end regression
At the end of each season, every driver's rating regresses 3% toward the starting Elo of 2000. This prevents ratings from inflating forever and gives returning drivers a slight pull toward the mean:
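As a minimal sketch, the regression step moves each rating 3% of the way back toward 2000. The function name `regress_to_start` is hypothetical, and the two sample ratings are illustrative only.

```python
START_ELO = 2000.0  # starting rating stated in the text
REGRESSION = 0.03   # 3% pull toward the start, as stated in the text


def regress_to_start(rating: float) -> float:
    """Move a rating 3% of the way back toward the starting Elo."""
    return rating + REGRESSION * (START_ELO - rating)


# Illustrative ratings: one above and one below the starting value.
for rating in (2500.0, 1900.0):
    print(f"{rating:.0f} -> {regress_to_start(rating):.1f}")
```

A driver 500 points above the start gives back 15 points, while a driver 100 points below it recovers 3, so the pull is proportional to the distance from 2000.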
Key takeaways
- Zero-sum per matchup — the winner's gain exactly equals the loser's loss
- Upsets pay more — a low-rated driver beating a high-rated one gains big; the expected winner gains little
- Every position matters — even P10 vs P11 creates a rating adjustment
- Scale with the grid — with 20 drivers, each race has 190 matchups, producing meaningful Elo swings
Now imagine the same four drivers, but this time Alonso — rated 450 points below Verstappen — wins the race. Verstappen, the highest-rated driver, finishes last. This is where Elo gets interesting.
Alonso's expected win probability against Verstappen is just 7.0%, so winning that matchup carries an enormous surprise factor, and the Elo reward reflects it. Meanwhile Verstappen, expected to beat everyone comfortably, loses all three matchups and pays a steep price.
| Matchup | Winner | Expected | Surprise | Winner Δ | Loser Δ |
|---|---|---|---|---|---|
| Alonso vs Norris | Alonso | 16.6% | 83.4% | +11.55 | -11.55 |
| Alonso vs Hamilton | Alonso | 13.7% | 86.3% | +11.96 | -11.96 |
| Alonso vs Verstappen | Alonso | 7.0% | 93.0% | +12.89 | -12.89 |
| Norris vs Hamilton | Norris | 44.3% | 55.7% | +7.72 | -7.72 |
| Norris vs Verstappen | Norris | 27.3% | 72.7% | +10.07 | -10.07 |
| Hamilton vs Verstappen | Hamilton | 32.1% | 67.9% | +9.41 | -9.41 |
Compare the outcomes
Notice how the same drivers with the same starting Elos produce dramatically different results depending on who finishes where. Alonso's upset win earns him slightly more than Verstappen loses (+36.40 against -32.37, summing the matchups above): the system heavily rewards beating opponents you're not expected to beat.
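The two scenarios can be compared side by side with a short script. As before, this is a sketch under assumptions: the ratings are hypothetical values chosen to match the rating gaps implied by the tables, and the per-pair K assumes a 24-race season on a 4-car grid.

```python
from itertools import combinations

# Hypothetical pre-race ratings (only the gaps matter); not taken from the text.
RATINGS = {"Verstappen": 2450, "Hamilton": 2320, "Norris": 2280, "Alonso": 2000}
K_PAIR = 48 * (12 / 24) / (4 - 1) ** 0.5  # assumed: 24-race season, 4-car grid


def expected_score(player: float, opponent: float) -> float:
    """Probability that `player` finishes ahead of `opponent`."""
    return 1 / (1 + 10 ** ((opponent - player) / 400))


def race_totals(finishing_order: list[str]) -> dict[str, float]:
    """Total Elo change per driver for one finishing order (first to last)."""
    totals = {driver: 0.0 for driver in finishing_order}
    for ahead, behind in combinations(finishing_order, 2):
        gain = K_PAIR * (1 - expected_score(RATINGS[ahead], RATINGS[behind]))
        totals[ahead] += gain
        totals[behind] -= gain
    return totals


in_order = race_totals(["Verstappen", "Hamilton", "Norris", "Alonso"])
upset = race_totals(["Alonso", "Norris", "Hamilton", "Verstappen"])

print(f"{'Driver':<11}{'in order':>9}{'upset':>9}{'vs expected finish':>20}")
for driver in RATINGS:
    swing = upset[driver] - in_order[driver]
    print(f"{driver:<11}{in_order[driver]:>+9.2f}{upset[driver]:>+9.2f}{swing:>+20.2f}")
```

Under these assumptions, Alonso's swing and Verstappen's swing are equal and opposite, each worth three per-pair K-factors, because every one of their three matchups flips from the expected result to its reverse.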
"vs expected finish" shows the difference compared to the first example, where everyone finished in Elo order. Alonso swings by 41.6 Elo compared to finishing last, while Verstappen's P4 costs him 41.6 more than his expected P1 win.
Circuit Park Zandvoort — 2025 (Round 15)
Average field Elo: 2154 · Grid size: 20
I.Hadjar gained +52.7 by finishing P3 against a field averaging 2154 Elo.
L.Norris dropped -81.0 after finishing P18 — underperforming relative to their 2544 Elo.
| # | Driver | Elo Before | Elo After | Change |
|---|---|---|---|---|
| 1 | O.Piastri | 2513 | 2527 | +14.7 |
| 2 | M.Verstappen | 2404 | 2422 | +17.8 |
| 3 | I.Hadjar | 2052 | 2104 | +52.7 |
| 4 | G.Russell | 2393 | 2401 | +7.7 |
| 5 | A.Albon | 2128 | 2160 | +32.0 |
| 6 | O.Bearman | 1993 | 2037 | +43.5 |
| 7 | L.Stroll | 2046 | 2077 | +31.5 |
| 8 | F.Alonso | 2091 | 2111 | +20.2 |
| 9 | Y.Tsunoda | 2041 | 2062 | +21.0 |
| 10 | E.Ocon | 1992 | 2014 | +21.6 |
| 11 | F.Colapinto | 1914 | 1939 | +24.9 |
| 12 | L.Lawson | 1998 | 2008 | +9.8 |
| 13 | C.Sainz | 2090 | 2083 | -7.2 |
| 14 | N.Hülkenberg | 2103 | 2089 | -14.4 |
| 15 | G.Bortoleto | 1963 | 1961 | -2.6 |
| 16 | A.Antonelli | 2045 | 2027 | -18.0 |
| 17 | P.Gasly | 2096 | 2066 | -30.0 |
| 18 | L.Norris | 2544 | 2463 | -81.0 |
| 19 | C.Leclerc | 2374 | 2301 | -73.1 |
| 20 | L.Hamilton | 2297 | 2226 | -70.8 |
Elo is a powerful tool for measuring relative performance, but it has inherent blind spots — especially when applied to a sport as complex as Formula 1. Keep these caveats in mind when interpreting ratings.
Car performance is invisible
Elo treats every result as a reflection of driver skill. In reality, machinery plays a massive role. A dominant car inflates its driver's rating; a poor car suppresses it. A driver switching from a backmarker to a frontrunner will see a sharp Elo rise that's partly car, not skill. Head-to-head teammate comparisons (which Elo captures well) help, but inter-team matchups always blend driver ability with engineering.
DNFs and mechanical failures
Elo takes every classified position at face value, including a retirement on lap 1. A driver who crashes out and one whose engine fails are treated identically: both lose Elo from head-to-head matchups against every driver who finished ahead. This means reliability-plagued seasons can drag a rating down through no fault of the driver.
Team orders and strategic results
When a driver yields a position on team orders, Elo records the result at face value. A number-two driver told to hold station will accumulate a lower rating than their raw pace might deserve. Elo has no way to distinguish a genuine on-track battle from a choreographed swap.
Grid size and era context
A 20-car grid generates 190 head-to-head matchups per race, against 325 for a 26-car grid. Our K-factor scales with grid size to compensate, but smaller fields still mean fewer data points per event. Additionally, the talent depth of the grid varies across decades: dominating a shallow field and dominating a deep one may look the same in raw Elo terms.
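To make the grid-size effect concrete, the sketch below counts pairings for two grid sizes and shows the 1/√(grid size − 1) factor applied to K. The helper name `grid_stats` is hypothetical, and the factor is the normalization as described earlier in the text rather than a confirmed implementation detail.

```python
import math


def grid_stats(grid_size: int) -> tuple[int, int, float]:
    """Return (total pairings, matchups per driver, per-pair K scale)."""
    total_pairings = grid_size * (grid_size - 1) // 2
    per_driver = grid_size - 1
    k_scale = 1 / math.sqrt(grid_size - 1)  # normalization described in the text
    return total_pairings, per_driver, k_scale


for size in (20, 26):
    pairings, per_driver, k_scale = grid_stats(size)
    print(f"{size} cars: {pairings} pairings, {per_driver} per driver, "
          f"K scale {k_scale:.3f}")
```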
Cross-era comparison nuance
The system is designed so that a given Elo value represents the same level of dominance relative to contemporaries regardless of era. However, it cannot tell you whether a 2600-rated driver from the 1950s would beat a 2600-rated driver from the 2020s in the same car — it only says both dominated their respective fields to a similar degree.
What Elo doesn't capture
Qualifying pace, race craft in wheel-to-wheel battles, tyre management, wet-weather ability, and development feedback are all crucial driver skills that Elo compresses into a single number. Sprint races and qualifying results are also not factored into ratings — only the main Grand Prix finishing order matters.