this post was submitted on 10 Aug 2025
98 points (99.0% liked)

AI - Artificial intelligence

AI related news and articles.

top 27 comments
[–] Auth@lemmy.world 2 points 18 hours ago

I hate these examples. I feel they do such a bad job of showing that LLMs are lying machines, and they just make the journalists look like pedantic idiots. There's a perfect example of LLMs being lying machines, and it's from the GPT-5 presentation. GPT-5 tries to explain how lift works like a PhD holder and (according to what I've read from people who actually have a PhD) it gets it completely wrong. Its explanation is incorrect, and the wing shape it uses as an example is incorrect. It just all around fails at providing a correct explanation.

I asked it the same question and it answered correctly, then I tried five or six other random word examples and every one was correct.

[–] thespcicifcocean@lemmy.world 31 points 1 day ago (1 children)

Not only is it wrong, it's adamant that it's right. It's finally reached human-level intelligence.

[–] jj4211@lemmy.world 5 points 1 day ago (1 children)

I was thinking middle manager.

[–] RaivoKulli@sopuli.xyz 16 points 1 day ago (1 children)

AI has learned to argue like people online do

[–] 13igTyme@lemmy.world 3 points 1 day ago

Well, what do you think the models were trained on?

[–] Blackmist@feddit.uk 12 points 1 day ago (1 children)

AI bros: nooo, it's not meant for counting letters!

[–] Perspectivist@feddit.uk 12 points 1 day ago (1 children)

It literally isn’t. It’s a Large Language Model - an AI system designed to generate natural-sounding language.

The fact that it gets any answers right at all isn’t because it knows things - it’s because it’s been trained on data that contains a lot of correct information. Its answers are based on statistical probabilities. It’s physically incapable of looking at a word and counting the letters in it.
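Roughly speaking, the model never sees letters at all - it sees integer IDs for subword chunks. A quick way to see this for yourself (a minimal sketch using OpenAI's open-source tiktoken tokenizer; the exact split depends on the encoding):

```python
# The model is fed token IDs, not characters - counting b's would require
# reasoning about letters it was never actually shown.
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several OpenAI chat models

word = "blueberry"
token_ids = enc.encode(word)
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]

print(token_ids)  # a short list of integers - this is all the model "sees"
print(pieces)     # the subword chunks those IDs stand for
```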

[–] jj4211@lemmy.world 5 points 1 day ago

Also, it fires off the prompt, or various processed versions of it, to various backends and then "stuffs" the prompt with the results.

So you ask "What team won a sports game last night?" and the LLM ultimately gets a prompt like:

"Summarize the following sports results to indicate a winning team and statistics: : ESPN: The Lakers held off the Bulls to finish with a 119 to 117 victory FoxSports: The Bulls lose to the Lakers by two points ...."

So you end up asking a question, your question may get sent to a traditional search engine, and the LLM summarizes the traditional results - usually doing an OK job of it, except that it can mess up by mixing the results together in nonsensical ways. Like when one source cites Canada as having the most lakes by a certain criterion, with around 300,000 lakes, and another result talks about Alaska having 3 million lakes, and it ends up telling the user "Canada has significantly more lakes than the United States, with 300,000, while the USA has 200,000, with Alaska contributing the most to the US with 3 million lakes".
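In rough pseudocode, the flow looks something like this (search_web and call_llm are hypothetical stand-ins, not any real API):

```python
# A minimal sketch of the "prompt stuffing" flow described above.
def search_web(query: str) -> list[str]:
    """Stand-in search backend: returns raw snippets from different sources."""
    return [
        "ESPN: The Lakers held off the Bulls to finish with a 119 to 117 victory",
        "FoxSports: The Bulls lose to the Lakers by two points",
    ]

def call_llm(prompt: str) -> str:
    """Stand-in LLM call; in practice this would hit a model API."""
    return "The Lakers beat the Bulls 119-117."

def answer(user_question: str) -> str:
    # 1. The user's question becomes one or more search queries.
    snippets = search_web(user_question)
    # 2. The results are stuffed into a new prompt for the LLM to summarize.
    stuffed_prompt = (
        "Summarize the following sports results to indicate a winning team "
        "and statistics:\n" + "\n".join(snippets)
    )
    # 3. The LLM only ever summarizes the stuffed text - if the snippets
    #    disagree or cover different things, it can blend them nonsensically.
    return call_llm(stuffed_prompt)

print(answer("What team won a sports game last night?"))
```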

[–] WolfLink@sh.itjust.works 4 points 1 day ago

Bluebberry!

[–] Benign@fedia.io 15 points 1 day ago

This is the kind of shit you need to show the kids/teens to make them understand how badly LLMs can hallucinate. Enough of these examples and the message might stick.

[–] prole@lemmy.blahaj.zone 5 points 1 day ago (1 children)
[–] toynbee@lemmy.world 5 points 1 day ago

For extremely high values of two?

[–] brucethemoose@lemmy.world 3 points 1 day ago* (last edited 1 day ago)

I feel like diffusion LLMs would handle this better.

After “position 5,” an autoregressive LLM has one chance, one pass, to get the next token right instead of another bullet point. And if it randomly picks another bullet point because the temperature is at 1 or whatever, the whole answer is hosed.
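Roughly what that single sampling step looks like - a toy sketch with made-up numbers, not any real model's code:

```python
# Toy next-token sampling: draw from a temperature-scaled softmax, so a
# lower-probability "wrong" continuation still gets picked some of the time.
import math
import random

def sample(logits: dict[str, float], temperature: float = 1.0) -> str:
    scaled = {tok: l / temperature for tok, l in logits.items()}
    m = max(scaled.values())
    weights = {tok: math.exp(l - m) for tok, l in scaled.items()}
    total = sum(weights.values())
    r = random.uniform(0, total)
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # fallback for floating-point edge cases

# Hypothetical logits at the point where the model should end the list:
logits = {"(end of list)": 2.0, "- position 6:": 0.5}
picks = [sample(logits, temperature=1.0) for _ in range(1000)]
print(picks.count("- position 6:") / 1000)  # a noticeable fraction of runs add a bogus bullet
```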

Not that OpenAI would ever do that. They just want to deep fry autoregressive transformers more and more instead of, you know, trying something actually interesting.

[–] Bonesince1997@lemmy.world 24 points 1 day ago (2 children)

Bluebberry. It's easy to miss the double b in the middle, ya dummie. Can't you wait for my next version! I am the smarttest.

[–] FoxyFerengi@startrek.website 10 points 1 day ago

This makes me feel extra bbouncy

[–] ThePyroPython@lemmy.world 2 points 1 day ago

The fact that people will soon exist in separate realities thanks to sovereign internet policies, interaction bubbles, and AI serving up tailor-made slop makes me want to KMS.

[–] Perspectivist@feddit.uk 4 points 1 day ago (1 children)
[–] jj4211@lemmy.world 5 points 1 day ago

Of course, the downside of using any of these memes as a demonstration is that it's hard to know whether they're still real, because once something becomes a meme it tends to somehow or another get fixed. Like a commonly cited question from Humanity's Last Exam that LLMs couldn't answer: suddenly they all answered it with pretty much the verbatim same answer, because enough people had posted it along with replies that it would show up in prompt stuffing, giving the LLM the answer crowd-sourced.

[–] Gobbel2000@programming.dev 3 points 1 day ago

Guess we've been wrong all along—time to update the dicttionary.

[–] otacon239@lemmy.world 1 points 1 day ago (2 children)

Why won’t my hammer put this screw in?

[–] pulsewidth@lemmy.world 8 points 1 day ago (1 children)

And there it is, the "he's using ChatGPT wrong" defence, as discussed in the post.

[–] otacon239@lemmy.world -1 points 1 day ago (2 children)

While I do admit OpenAI (and all the others) are overselling this like crazy, being strictly anti-AI just isn’t going to do me any favors.

When I need a language model, I use a language model. I’m not going to ask my English teacher a math question. There are times when it’s useful and times when it’s not, like any other tool.

Every technology that's come before it has had this double wave of haters and lovers pulling toward the extremes of embracing or despising it. I'm just taking the middle road and ignoring the hype either way.

I’m just grabbing the hammer when I need it.

[–] tyler@programming.dev 6 points 1 day ago (1 children)

But the problem is that you understand that, and the majority of people do not. They're not told it's an LLM, and even if they are, they don't understand what that means. They call it AI. They think it's smart. So the only way to get them to stop using it for every goddamn thing is by showing how problematic it is.

[–] otacon239@lemmy.world 3 points 1 day ago* (last edited 1 day ago)

I tell almost everyone I meet how dangerous it is. It’s one of my biggest talking points with friends. I am very well aware of how much it is a problem. But as someone who understands it, I personally feel comfortable using it.

Notice my choice of language throughout. I’m not applying this logic to the general public.

[–] Perspectivist@feddit.uk 5 points 1 day ago

You’re right - this is a mix of big-tech hatred and false expectations about what “AI” means and what an LLM should be able to do.

I think one big “issue” with current LLMs is that they’re too good. The first time I saw text from one, it was so bad it was hilarious - perfectly polished sentences that were total nonsense. It was obvious this thing was just talking, and all of it was gibberish. Now the answers make sense, and they’re often factually correct as well, so I can’t really blame people for forgetting that under the hood it’s still the same gibberish machine. They start applying standards to it as if it were an early AGI, and when it fails a basic task like this, it feels like they’ve been lied to (which, to be fair, they have) - when in reality, they’re just trying to drive screws with a hammer.

[–] tigeruppercut@lemmy.zip 2 points 1 day ago

Hit it harder