this post was submitted on 09 Jun 2025

820 points (91.8% liked)

ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic (www.tomshardware.com)

submitted 1 week ago by Lifecoach5000@lemmy.world to c/technology@lemmy.world

[–] stevedice@sh.itjust.works 8 points 6 days ago

2025 Mazda MX-5 Miata 'got absolutely wrecked' by Inflatable Boat in beginner's boat racing match — Mazda's newest model bamboozled by 1930s technology.

[–] FourWaveforms@lemm.ee 5 points 6 days ago

If you don't play chess, the Atari is probably going to beat you as well.

LLMs are only good at things to the extent that they have been well-trained in the relevant areas. Not just learning to predict text string sequences, but reinforcement learning after that, where a human or some other agent says "this answer is better than that one" enough times in enough of the right contexts. It mimics the way humans learn, which is through repeated and diverse exposure.

If they set up a system to train it against some chess program, or (much simpler) simply gave it a tool call, it would do much better. Tool calling already exists and would be by far the easiest way.

It could also be instructed to write a chess solver program and then run it, at which point it would be on par with the Atari, but it wouldn't compete well with a serious chess solver.
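To make the tool-call idea concrete, here is a minimal sketch of the host side, assuming the model emits its request as a small JSON object. The tool name `engine_best_move`, the JSON shape, and the "engine" itself are all hypothetical stand-ins; a real setup would hand the position to an actual chess engine instead of picking from a list.

```python
import json


def engine_best_move(legal_moves):
    """Hypothetical stand-in for a real chess engine.

    It just returns the alphabetically first legal move so the
    example stays self-contained and deterministic.
    """
    return sorted(legal_moves)[0]


# Registry of tools the host is willing to run on the model's behalf.
TOOLS = {"engine_best_move": engine_best_move}


def handle_model_output(raw):
    """Run a tool if the model asked for one, otherwise pass the text through."""
    try:
        request = json.loads(raw)
    except json.JSONDecodeError:
        return raw  # ordinary text answer, no tool requested
    if not isinstance(request, dict) or request.get("tool") not in TOOLS:
        return raw  # valid JSON, but not a request for a known tool
    return TOOLS[request["tool"]](**request.get("arguments", {}))


# Pretend the model produced this instead of guessing a move itself.
model_output = json.dumps(
    {
        "tool": "engine_best_move",
        "arguments": {"legal_moves": ["e2e4", "d2d4", "g1f3"]},
    }
)
move = handle_model_output(model_output)
print(move)
```

The point of the design is that the model only has to decide when to ask; the move itself comes from code that actually knows the rules.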

[–] untakenusername@sh.itjust.works 2 points 5 days ago

this is because an LLM is not made for playing chess

[+] FMT99@lemmy.world 289 points 1 week ago* (last edited 4 days ago) (23 children)

[deleted]

[–] spankmonkey@lemmy.world 229 points 1 week ago (15 children)

AI including ChatGPT is being marketed as super awesome at everything, which is why that and similar AI is being forced into absolutely everything and being sold as a replacement for people.

Something marketed as AGI should be treated as AGI when proving it isn't AGI.

[–] pelespirit@sh.itjust.works 14 points 1 week ago (12 children)

Not to help the AI companies, but why don't they program them to look up math programs and outsource chess to other programs when they're asked for that stuff? It's obvious they're shit at it, why do they answer anyway? It's because they're programmed by know-it-all programmers, isn't it.

[–] rebelsimile@sh.itjust.works 29 points 1 week ago (1 children)

Because they’re fucking terrible at designing tools to solve problems, they are obviously less and less good at pretending this is an omnitool that can do everything with perfect coherency (and if it isn’t working right it’s because you’re not believing or paying hard enough)

[–] ImplyingImplications@lemmy.ca 26 points 1 week ago

why don't they program them

AI models aren't programmed traditionally. They're generated by machine learning. Essentially the model is given test prompts and then given a rating on its answer. The model's calculations will be adjusted so that its answer to the test prompt will be closer to the expected answer. You repeat this a few billion times with a few billion prompts and you will have generated a model that scores very high on all test prompts.

Then someone asks it how many R's are in strawberry and it gets the wrong answer. The only way to fix this is to add that as a test prompt and redo the machine learning process which takes an enormous amount of time and computational power each time it's done, only for people to once again quickly find some kind of prompt it doesn't answer well.

There are already AI models that play chess incredibly well. Using machine learning to solve a complex problem isn't the issue. It's trying to get one model to be good at absolutely everything.
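The rate-and-adjust loop described above can be sketched with a toy one-parameter model. This is only an illustration under heavy assumptions: the "model" is a single weight `w`, the "test prompts" are made-up number pairs, and real training adjusts billions of parameters rather than one.

```python
def train(examples, steps=200, learning_rate=0.01):
    """Fit answer = w * x by nudging w toward the expected answers."""
    w = 0.0  # the model's single adjustable parameter
    for _ in range(steps):
        for x, expected in examples:
            answer = w * x
            error = answer - expected  # the "rating": how far off was it?
            # Gradient of 0.5 * error**2 with respect to w is error * x,
            # so step w a little in the direction that shrinks the error.
            w -= learning_rate * error * x
    return w


# Made-up "test prompts" whose expected answer is always 2 * x.
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(examples)
print(round(w, 6))
```

The model ends up scoring well on exactly the kind of input it was graded on, which is the limitation the comment is pointing at.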

[–] suburban_hillbilly@lemmy.ml 30 points 1 week ago (2 children)

Most people do. It's just called AI in the media everywhere and marketing works. I think online folks forget that something as simple as getting a Lemmy account by yourself puts you into the top quintile of tech literacy.

[–] malwieder@feddit.org 27 points 1 week ago (5 children)

Google Maps doesn't pretend to be good at chess. ChatGPT does.

[–] iAvicenna@lemmy.world 16 points 1 week ago (1 children)

well so much hype has been generated around chatgpt being close to AGI that now it makes sense to ask questions like "can chatgpt prove the Riemann hypothesis"

[–] Objection@lemmy.ml 83 points 1 week ago (5 children)

Tbf, the article should probably mention the fact that machine learning programs designed to play chess blow everything else out of the water.

[–] bier 29 points 1 week ago (1 children)

Yeah, it's like judging how great a fish is at climbing a tree. But it does show that it's not real intelligence or reasoning.

[–] 13igTyme@lemmy.world 13 points 1 week ago (1 children)

Don't call my fish stupid.

[–] Zenith@lemm.ee 15 points 1 week ago

I forgot which airline it is but one of the onboard games in the back of a headrest TV was a game called “Beginners Chess” which was notoriously difficult to beat so it was tested against other chess engines and it ranked in like the top five most powerful chess engines ever

[–] NeilBru@lemmy.world 76 points 1 week ago* (last edited 1 week ago) (9 children)

An LLM is a poor computational/predictive paradigm for playing chess.

[–] surph_ninja@lemmy.world 30 points 1 week ago (1 children)

This just in: a hammer makes a poor screwdriver.

[–] AlecSadler@sh.itjust.works 60 points 1 week ago (14 children)

ChatGPT has been, hands down, the worst AI coding assistant I've ever used.

It regularly suggests code that doesn't compile or isn't even for the language.

It generally suggests autocompletions that are just a copy of the lines I just wrote.

Sometimes it likes to suggest setting the same property like 5 times.

It is absolute garbage and I do not recommend it to anyone.

[–] j4yt33@feddit.org 17 points 1 week ago (5 children)

I find it really hit and miss. Easy, standard operations are fine but if you have an issue with code you wrote and ask it to fix it, you can forget it

[–] nednobbins@lemm.ee 50 points 1 week ago (5 children)

Sometimes it seems like most of these AI articles are written by AIs with bad prompts.

Human journalists would hopefully do a little research. A quick search would reveal that researchers have been publishing about this for over a year, so there's no need to sensationalize it. Perhaps the human journalist could have spent a little time talking about why LLMs are bad at chess and how researchers are approaching the problem.

LLMs on the other hand, are very good at producing clickbait articles with low information content.

[–] nova_ad_vitum@lemmy.ca 24 points 1 week ago (7 children)

GothamChess has a video of ChatGPT playing chess against Stockfish. Spoiler: ChatGPT does not do well. It plays okay for a few moves, but the moment it gets in trouble it straight up cheats. Telling it to follow the rules of chess doesn't help.

This sort of gets to the heart of LLM-based "AI". That one example to me really shows that there's no actual reasoning happening inside. It's producing answers that statistically look like answers that might be given based on that input.

For some things it even works. But calling this intelligence is dubious at best.

[–] floofloof@lemmy.ca 45 points 1 week ago* (last edited 1 week ago) (5 children)

I suppose it's an interesting experiment, but it's not that surprising that a word prediction machine can't play chess.

[–] Halosheep@lemm.ee 43 points 1 week ago (3 children)

I swear every single article critical of current LLMs is like, "The square got BLASTED by the triangle shape when it completely FAILED to go through the triangle shaped hole."

[–] drspod@lemmy.ml 42 points 1 week ago (4 children)

It's newsworthy when the sellers of squares are saying that nobody will ever need a triangle again, and the shape-sector of the stock market is hysterically pumping money into companies that make or use squares.

[–] inconel@lemmy.ca 19 points 1 week ago (1 children)

It's also from a company claiming they're getting closer to creating a morphing shape that can match any hole.

[–] MonkderVierte@lemmy.zip 41 points 1 week ago (1 children)

LLMs are not built for logic.

[–] PushButton@lemmy.world 18 points 1 week ago (6 children)

And yet everybody is selling them to write code.

The last time I checked, coding required logic.

[–] anubis119@lemmy.world 36 points 1 week ago (6 children)

A strange game. How about a nice game of Global Thermonuclear War?

[–] ada@piefed.blahaj.zone 18 points 1 week ago

No thank you. The only winning move is not to play

[–] Furbag@lemmy.world 29 points 1 week ago (6 children)

Can ChatGPT actually play chess now? Last I checked, it couldn't remember more than 5 moves of history, so it wouldn't be able to see the true board state and would make illegal moves, take its own pieces, materialize pieces out of thin air, etc.
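For what it's worth, those two failure modes are cheap to catch if the board state lives outside the model. A rough sketch, assuming moves arrive as coordinate strings like `e2e4` and pieces are stored as two-character codes like `wP`. The function name and board encoding are made up for illustration, and this deliberately checks only "is there a piece there" and "is the target your own piece", not the full rules of chess.

```python
def apply_move(board, move, side):
    """Return a new board after `move`, or raise ValueError if it is bogus.

    board: dict mapping squares ("e2") to pieces ("wP", "bK", ...)
    move:  coordinate string such as "e2e4"
    side:  "w" or "b", the colour whose turn it is
    """
    src, dst = move[:2], move[2:4]
    piece = board.get(src)
    if piece is None:
        # The model "materialized" a piece that is not on the board.
        raise ValueError(f"no piece on {src}")
    if piece[0] != side:
        raise ValueError(f"piece on {src} does not belong to {side}")
    target = board.get(dst)
    if target is not None and target[0] == side:
        # The model tried to take one of its own pieces.
        raise ValueError(f"cannot capture own piece on {dst}")
    new_board = dict(board)  # leave the caller's board untouched
    del new_board[src]
    new_board[dst] = piece
    return new_board


board = {"e1": "wK", "e2": "wP", "e7": "bP"}
after = apply_move(board, "e2e4", "w")
print(after)
```

A host that keeps the board like this can reject the bad move and ask the model to try again, instead of trusting it to remember the position.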

[–] cley_faye@lemmy.world 25 points 1 week ago

Ah, you used logic. That's the issue. They don't do that.

[–] arc99@lemmy.world 20 points 1 week ago (1 children)

Hardly surprising. LLMs aren't -thinking-; they're just shitting out the next token for any given input of tokens.

[–] stevedice@sh.itjust.works 1 points 6 days ago (1 children)

That's exactly what thinking is, though.

[–] arc99@lemmy.world 1 points 4 days ago* (last edited 4 days ago) (1 children)

An LLM is an ordered series of parameterized / weighted nodes which are fed a bunch of tokens and, millions of calculations later, generate the next token, which is appended before the process repeats. It's like turning a handle on some complex Babbage-esque machine. LLMs use a tiny bit of randomness ("temperature") when choosing the next token so the responses are not identical each time.

But it is not thinking. Not even remotely so. It's a simulacrum. If you want to see this, run ollama with the temperature set to 0 e.g.

ollama run gemma3:4b
>>> /set parameter temperature 0
>>> what is a leaf

You will get the same answer every single time.
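Here is a small sketch of what the temperature knob does when picking the next token. The token scores are invented, and treating temperature 0 as "always take the highest score" is a common convention assumed here rather than something taken from ollama's internals.

```python
import math
import random


def sample_next_token(logits, temperature, rng):
    """Pick one token from a dict of token -> raw score."""
    if temperature == 0:
        # Greedy decoding: no randomness, always the top-scoring token.
        return max(logits, key=logits.get)
    # Dividing by the temperature flattens (high T) or sharpens (low T)
    # the distribution before the softmax.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


logits = {"leaf": 2.0, "tree": 1.5, "plant": 0.5}  # made-up scores
rng = random.Random(0)
greedy = [sample_next_token(logits, 0, rng) for _ in range(5)]
sampled = [sample_next_token(logits, 1.0, rng) for _ in range(5)]
print(greedy)
print(sampled)
```

With temperature 0 the `greedy` list is the same token every time; with temperature 1.0 the picks can vary from draw to draw.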

[–] stevedice@sh.itjust.works 1 points 19 hours ago* (last edited 19 hours ago)

I know what an LLM is doing. You don't know what your brain is doing.

[–] Lembot_0003@lemmy.zip 15 points 1 week ago (4 children)

The Atari chess program can play chess better than the Boeing 747 too. And better than the North Pole. Amazing!

[–] finitebanjo@lemmy.world 15 points 1 week ago

All these comments asking "why don't they just have chatgpt go and look up the correct answer".

That's not how it works, you buffoons, it trains off of datasets long before it releases. It doesn't think. It doesn't learn after release, it won't remember things you try to teach it.

Really lowering my faith in humanity when even the AI skeptics don't understand that it generates statistical representations of an answer based on answers given in the past.

[–] jsomae@lemmy.ml 13 points 1 week ago (1 children)

Using an LLM as a chess engine is like using a power tool as a table leg. Pretty funny honestly, but it's obviously not going to be good at it, at least not without scaffolding.

[–] kent_eh@lemmy.ca 3 points 6 days ago (1 children)

is like using a power tool as a table leg.

Then again, our corporate lords and masters are trying to replace all manner of skilled workers with those same LLM "AI" tools.

And clearly that will backfire on them and they'll eventually scramble to find people with the needed skills, but in the meantime tons of people will have lost their source of income.

[–] jsomae@lemmy.ml 1 points 6 days ago* (last edited 6 days ago) (1 children)

If you believe LLMs are not good at anything then there should be relatively little to worry about in the long-term, but I am more concerned.

It's not obvious to me that it will backfire for them, because I believe LLMs are good at some things (that is, when they are used correctly, for the correct tasks). Currently they're being applied to far more use cases than they are likely to be good at -- either because they're overhyped or our corporate lords and masters are just experimenting to find out what they're good at and what not. Some of these cases will be like chess, but others will be like code*.

(* not saying LLMs are good at code in general, but for some coding applications I believe they are vastly more efficient than humans, even if a human expert can currently write higher-quality less-buggy code.)

[–] kent_eh@lemmy.ca 1 points 5 days ago (1 children)

I believe LLMs are good at some things

The problem is that they're being used for all the things, including a large number of tasks that they are not well suited to.

[–] jsomae@lemmy.ml 2 points 5 days ago

yeah, we agree on this point. In the short term it's a disaster. In the long-term, assuming AI's capabilities don't continue to improve at the rate they have been, our corporate overlords will only replace people for whom it's actually worth it to them to replace with AI.
