Checking if Elo system is oppressive [With proofs]

basketstorm

UPD to whoever is reading this: start reading from the last page and go back, as I present proofs there that are more substantial and easier to understand. The simulation below was just a topic-starter; you'll find more revelations on the later pages.

Using ChatGPT powers I simulated 1,000,000 chess games in a pool of 1,000 players. Pairing was rating-based, with a small diffusion to emulate the online-presence factor. The win/loss updates followed the Elo formula. All players had a hidden strength expressed in Elo: 90% of players from 1000 to 1400, 10% of players from 1400 to 2800. The initial rating was 200, the rating floor 100.
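The setup above can be sketched in a few dozen lines. Note this is a reconstruction, not my exact script: the K-factor (20), the Gaussian pairing jitter (σ = 50), the absence of draws, the random seed, and the scaled-down game count are all assumptions for illustration.

```python
import random

# Sketch of the simulation described above. Assumed parameters: K-factor,
# pairing jitter width, no draws, and a reduced game count for speed.
N_PLAYERS = 1000
N_ROUNDS = 1000       # 500 games per round -> 500,000 games (post used 1,000,000)
K = 20                # assumed K-factor
INITIAL, FLOOR = 200, 100

def expected(r_a, r_b):
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

random.seed(1)
# Hidden strengths: 90% uniform in 1000-1400, 10% uniform in 1400-2800
strengths = [random.uniform(1000, 1400) if random.random() < 0.9
             else random.uniform(1400, 2800) for _ in range(N_PLAYERS)]
ratings = [float(INITIAL)] * N_PLAYERS

for _ in range(N_ROUNDS):
    # Rating-based pairing with small diffusion: jitter the sort key,
    # then pair neighbours in the jittered order
    order = sorted(range(N_PLAYERS),
                   key=lambda i: ratings[i] + random.gauss(0, 50))
    for a, b in zip(order[::2], order[1::2]):
        e_a = expected(ratings[a], ratings[b])
        # The game result depends on hidden strength, not on current rating
        s_a = 1.0 if random.random() < expected(strengths[a], strengths[b]) else 0.0
        ratings[a] = max(FLOOR, ratings[a] + K * (s_a - e_a))
        ratings[b] = max(FLOOR, ratings[b] + K * ((1.0 - s_a) - (1.0 - e_a)))

best = max(range(N_PLAYERS), key=strengths.__getitem__)
print(min(ratings), ratings[best])
```

Even a rough version like this should show the same qualitative picture: the strongest players climb far above the field while the rating floor props up a large pile of players at the bottom.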

Graphs:

Blue: initial strength distribution.

Green: ratings after the simulation. They show that the largest group is the minimal-Elo players, and the mid-Elo group received an artificial bump, despite the fact that each player's strength was constant throughout the simulation!

Full table with data for each player (the names are all fake, based on the names of real great players; names repeat, but that doesn't matter because each player has a unique id):
https://pastebin.com/raw/JqGKun3K

Conclusion:

The best of the best climbed to the top easily.
Low-Elo players unfairly end up scattered across various rating ranges, apparently because of luck, not lack of skill. And here you can't blame the virtual players for lack of skill, because every game result was dictated by their actual hidden strength.

So in the end we have cases like:

 id player_name hidden_strength_Elo final_rating_Elo
176 Magnus Portisch 1097 509
468 Vladimir Svidler 1263 497
571 Sergey Short 1239 1042

That means actual strength could be 1200, but rating could be 500 OR 1000.

Or look at this oppressed guy:

 id player_name hidden_strength_Elo final_rating_Elo
467 Boris Nepomniachtchi 1203 355
With strength 1203, his rating is 355.
Each player here played 1000 games!
Some more oppression:
 id player_name hidden_strength_Elo final_rating_Elo
266 Hikaru Gajdosko 1112 100
322 Magnus Capablanca 1003 132

Magnus is weaker than Gajdosko but Gajdosko is stuck at 100. Is this fair?

This all aligns with my observations and experience here on chess.com, and it explains why so many people are astonished by the randomness in the apparent strength of opponents who have the same rating.

Thoughts?

xtreme2020
There’s no way chatGPT can simulate something like this, it can’t even answer the simplest problems I’ve ever seen
Fiochtree

incredible

basketstorm
xtreme2020 wrote:
There’s no way chatGPT can simulate something like this, it can’t even answer the simplest problems I’ve ever seen

ChatGPT is evolving. The free version can't do much, sure, but the most advanced one actually runs analysis: you wait for a while and it comes back with graphs and tables.

MasterJyanM

Interesting, quite interesting.

basketstorm

For comparison, current global rapid leaderboard on chess.com:

The peak is at 400. But I guess that's because most people don't play much Rapid, so their initial rating (400) stays around 400. Those who do play mostly sit in the 100-300 category. And after the peak we see the same decreasing slope I had with my green graph. Occasionally those 400s step into a fight, and their actual strength is very random. But the low-Elo player gets punished as if that opponent's 400 rating were real!

chesssblackbelt

who cares about ratings at that level? it's just about having fun

xtreme2020
#4 this simulation would realistically need a really powerful computer and a custom written program, nothing like chatGPT can do this. ChatGPT still gives you a random number when you ask it the last digits of pi, and contradicts itself all the time. It’s evolving yes, but it still isn’t very good, and not nearly this good.
basketstorm
xtreme2020 wrote:
#4 this simulation would realistically need a really powerful computer and a custom written program, nothing like chatGPT can do this. ChatGPT still gives you a random number when you ask it the last digits of pi, and contradicts itself all the time. It’s evolving yes, but it still isn’t very good, and not nearly this good.

The data is not coming from the language model itself; it's the result of actual program execution. So it is indeed a "custom written program" in this case.

And I think you overestimate the complexity of this task.

xtreme2020
By the very definition of elo, if someone is actually 1200 strength and playing 500s they will win every single game no matter how many games they played, never mind losing enough to stay consistently at 500. What you don’t understand is that elo isn’t just some random rigged number, it’s the definition of skill. There is no “hidden elo strength” because the elo you are at is always the elo you play at.
chesssblackbelt

thats not true. 500s will beat 1200s sometimes

xtreme2020
#9 well it did something wrong, because it’s mathematically impossible for a 1263 to ever lose to a 497, never mind losing enough to consistently stay at that level.
chesssblackbelt

with elo they give like +0.1 for beating someone 700 elo lower or smth

i've lost loads of games to 1400s and i'm 2300. its not even that rare
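That "+0.1" ballpark can be checked against the plain Elo update rule, gain = K × (1 − expected score). K = 10 is an assumed K-factor here (the gain scales linearly with K), and note that chess.com reportedly uses a Glicko-style system rather than plain Elo:

```python
# Rating gain for beating an opponent rated 700 points below you,
# under the plain Elo update: gain = K * (1 - expected_score).
K = 10  # assumed K-factor; the gain scales linearly with K

expected = 1.0 / (1.0 + 10 ** (-700 / 400))  # expected score of the higher-rated player
gain = K * (1.0 - expected)
print(round(gain, 2))  # ≈ 0.17
```

So with a small K-factor, "+0.1 or smth" is the right order of magnitude.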

xtreme2020
#11 maybe .1% of the time, I don’t know the exact number, but 50% of the time?
basketstorm
xtreme2020 wrote:
By the very definition of elo, if someone is actually 1200 strength and playing 500s they will win every single game no matter how many games they played, never mind losing enough to stay consistently at 500. What you don’t understand is that elo isn’t just some random rigged number, it’s the definition of skill. There is no “hidden elo strength” because the elo you are at is always the elo you play at.

500 vs 1200: a 1.75% win chance for the 500.

I understand what Elo is perfectly.
Hidden Elo strength in this case means a comparison against an engine's (e.g. Stockfish's) Elo: if a player wins 50% of games against Stockfish set to 1400 Elo, then 1400 is what I call that player's hidden Elo strength.
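The 1.75% comes straight from the Elo expected-score formula:

```python
# Elo expected score for a 500-rated player against a 1200-rated player
e = 1.0 / (1.0 + 10 ** ((1200 - 500) / 400))
print(f"{e:.4f}")  # 0.0175, i.e. about 1.75%
```

(Strictly speaking this is an expected score, with draws counting as half a point, not a pure win probability, but for this argument the distinction doesn't matter.)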

basketstorm
xtreme2020 wrote:
#9 well it did something wrong, because it’s mathematically impossible for a 1263 to ever lose to a 497, never mind losing enough to consistently stay at that level.

1263 and 497? You misunderstood my table. 1263 in that table is the hidden Elo strength; 497 is the resulting Elo after the simulation. Meanwhile, a different player in the simulation has a hidden strength of 1239 (close to the previous player's) but a final Elo of 1042. That's the problem: players of nearly identical strength end up with wildly different ratings.

xtreme2020
#13 not if you’re also 700. And, you losing to 1400s is probably the result of you being distracted, or having a bad day. This program assumes neither. But even if you lose a decent amount of games to 1400s, would you say any 2300 could ever lose 50% of their games over a long time span to a 1400? No, the score would probably be around 100/1 on a bad day.
chesssblackbelt
xtreme2020 wrote:
#11 maybe .1% of the time, I don’t know the exact number, but 50% of the time?

definitely not 50% lol i'm just saying theres a chance

xtreme2020
#15 well, the stockfish elo setting is inaccurate. There is your problem.
basketstorm
xtreme2020 wrote:
#13 not if you’re also 700. And, you losing to 1400s is probably the result of you being distracted, or having a bad day. This program assumes neither. But even if you lose a decent amount of games to 1400s, would you say any 2300 could ever lose 50% of their games over a long time span to a 1400? No, the score would probably be around 100/1 on a bad day.

No one said anything about losing 50% of games at a large Elo difference.