this post was submitted on 04 Apr 2025

356 points (88.2% liked)

Anthropic has developed an AI 'brain scanner' to understand how LLMs work and it turns out the reason why chatbots are terrible at simple math and hallucinate is weirder than you thought (www.pcgamer.com)

submitted 2 days ago by cm0002@lemmy.world to c/technology@lemmy.world

[–] ICastFist@programming.dev 6 points 14 hours ago

Anthropic made lots of intriguing discoveries using this approach, not least of which is why LLMs are so terrible at basic mathematics. "Ask Claude to add 36 and 59 and the model will go through a series of odd steps, including first adding a selection of approximate values (add 40ish and 60ish, add 57ish and 36ish). Towards the end of its process, it comes up with the value 92ish. Meanwhile, another sequence of steps focuses on the last digits, 6 and 9, and determines that the answer must end in a 5. Putting that together with 92ish gives the correct answer of 95," the MIT article explains.

But here's the really funky bit. If you ask Claude how it got the correct answer of 95, it will apparently tell you, "I added the ones (6+9=15), carried the 1, then added the 10s (3+5+1=9), resulting in 95." But that actually only reflects common answers in its training data as to how the sum might be completed, as opposed to what it actually did.

Another very surprising outcome of the research is the discovery that these LLMs do not, as is widely assumed, operate by merely predicting the next word. By tracing how Claude generated rhyming couplets, Anthropic found that it chose the rhyming word at the end of verses first, then filled in the rest of the line.

[–] Technoworcester@lemm.ee 144 points 1 day ago (2 children)

'is weirder than you thought '

I am as likely to click a link with that line as I would be if it had

'this one weird trick' or 'side hustle'.

I would really like it if headlines treated us like adults and got rid of click baity lines.

[–] BackgrndNoize@lemmy.world 39 points 1 day ago (1 children)

But then you wouldn't need to click on their ad-infested shite website, where 1-2 paragraphs' worth of actual information is stretched into a giant essay so that they can show you more ads the longer you scroll.

[–] Technoworcester@lemm.ee 23 points 1 day ago (4 children)

I will never understand how ppl survive without ad blockers. Tried it once recently and it was a horrific experience.

[–] BackgrndNoize@lemmy.world 6 points 1 day ago

I'm thankful for such people's sacrifice. If it weren't for them, there would be even more anti-ad-block measures in place.

[–] BeardedGingerWonder@feddit.uk 17 points 1 day ago (5 children)

They do it because it works on the whole. If straight titles were as effective they'd be used instead.

[–] cholesterol@lemmy.world 38 points 1 day ago (1 children)

you can't trust its explanations as to what it has just done.

I might have had a lucky guess, but this was basically my assumption. You can't ask LLMs how they work and get an answer coming from an internal understanding of themselves, because they have no 'internal' experience.

Unless you make a scanner like the one in the study, non-verbal processing is as much of a black box to their 'output voice' as it is to us.

[–] cley_faye@lemmy.world 4 points 1 day ago

Anyone who has used them for even a limited amount of time will tell you that the thing can give you a correct, detailed explanation of how to do something and then provide a broken result. And vice versa. Looking into it by asking more questions has zero chance of being useful.

[–] dkc@lemmy.world 51 points 1 day ago (1 children)

The research paper looks well written, but I couldn't find any information on whether it is going to be published in a reputable journal and peer reviewed. I have little faith in private businesses that profit from AI providing an unbiased view of how AI works. I think the first question I'd like answered is: did Anthropic's marketing department review the paper, and did they offer any corrections or feedback? We've all heard the stories about the tobacco industry paying for papers to be written about the benefits of smoking and refuting health concerns.

[–] StructuredPair@lemmy.world 15 points 1 day ago

A lot of AI research isn't published in journals but is either posted to a corporate website or put up on arXiv. There are some AI journals, but the AI community doesn't particularly value them (and threw a bit of a fit when they came out). In my opinion, this article is mostly marketing and doesn't show anything that should surprise anyone familiar with how neural networks generally work.

[–] SplashJackson@lemmy.ca 5 points 1 day ago (1 children)

The AIs have shrinks now?

[–] nilclass@discuss.tchncs.de 4 points 1 day ago

You can become one too! Get your certification here https://mt.cert.ccc.de/

[–] harryprayiv@infosec.pub 181 points 2 days ago (50 children)

To understand what's actually happening, Anthropic's researchers developed a new technique, called circuit tracing, to track the decision-making processes inside a large language model step-by-step. They then applied it to their own Claude 3.5 Haiku LLM.

Anthropic says its approach was inspired by the brain scanning techniques used in neuroscience and can identify components of the model that are active at different times. In other words, it's a little like a brain scanner spotting which parts of the brain are firing during a cognitive process.

This is why LLMs are so patchy at math. (Image credit: Anthropic)

Anthropic made lots of intriguing discoveries using this approach, not least of which is why LLMs are so terrible at basic mathematics. "Ask Claude to add 36 and 59 and the model will go through a series of odd steps, including first adding a selection of approximate values (add 40ish and 60ish, add 57ish and 36ish). Towards the end of its process, it comes up with the value 92ish. Meanwhile, another sequence of steps focuses on the last digits, 6 and 9, and determines that the answer must end in a 5. Putting that together with 92ish gives the correct answer of 95," the MIT article explains.

But here's the really funky bit. If you ask Claude how it got the correct answer of 95, it will apparently tell you, "I added the ones (6+9=15), carried the 1, then added the 10s (3+5+1=9), resulting in 95." But that actually only reflects common answers in its training data as to how the sum might be completed, as opposed to what it actually did.

In other words, not only does the model use a very, very odd method to do the maths, you can't trust its explanations as to what it has just done. That's significant and shows that model outputs cannot be relied upon when designing guardrails for AI. Their internal workings need to be understood, too.
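
As a rough illustration of the contrast described above, here is a minimal Python sketch. It is a toy, not Anthropic's actual circuit: the function names (`rough_sum`, `ones_digit`, `combine`, `add_parallel_paths`, `schoolbook_add`) and the choice to round one operand to the nearest multiple of 5 are assumptions made purely for illustration. One path produces a ballpark total, a second path works out only the last digit, and the two are reconciled at the end. `schoolbook_add` is the carry method the model says it used.

```python
def rough_sum(a: int, b: int) -> int:
    """Ballpark path (assumed): round b to the nearest multiple of 5, then add."""
    rounded_b = 5 * ((b + 2) // 5)  # off by at most 2 for non-negative b
    return a + rounded_b


def ones_digit(a: int, b: int) -> int:
    """Last-digit path: only the final digits of the operands matter."""
    return (a % 10 + b % 10) % 10


def combine(estimate: int, last_digit: int) -> int:
    """Pick the integer ending in last_digit that lies closest to the estimate."""
    lower = estimate - ((estimate - last_digit) % 10)  # largest such value <= estimate
    upper = lower + 10
    return lower if estimate - lower <= upper - estimate else upper


def add_parallel_paths(a: int, b: int) -> int:
    """Toy version of the 'approximate value plus last digit' description."""
    return combine(rough_sum(a, b), ones_digit(a, b))


def schoolbook_add(a: int, b: int) -> int:
    """The carry algorithm the model claims to use: ones, carry, then tens."""
    result, carry, place = 0, 0, 1
    while a or b or carry:
        digit_sum = a % 10 + b % 10 + carry
        result += (digit_sum % 10) * place
        carry = digit_sum // 10
        a, b, place = a // 10, b // 10, place * 10
    return result


print(add_parallel_paths(36, 59), schoolbook_add(36, 59))
```

With the article's own numbers, `combine(92, 5)` picks 95, the only value ending in 5 that is close to 92. Both routes reach the same answer, which is why the output alone can't tell you which one was taken.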

Another very surprising outcome of the research is the discovery that these LLMs do not, as is widely assumed, operate by merely predicting the next word. By tracing how Claude generated rhyming couplets, Anthropic found that it chose the rhyming word at the end of verses first, then filled in the rest of the line.

"The planning thing in poems blew me away," says Batson. "Instead of at the very last minute trying to make the rhyme make sense, it knows where it’s going."

Anthropic discovered that their Claude LLM didn't just predict the next word. (Image credit: Anthropic)

Anthropic also found, among other things, that Claude "sometimes thinks in a conceptual space that is shared between languages, suggesting it has a kind of universal 'language of thought'."

Anywho, there's apparently a long way to go with this research. According to Anthropic, "it currently takes a few hours of human effort to understand the circuits we see, even on prompts with only tens of words." And the research doesn't explain how the structures inside LLMs are formed in the first place.

But it has shone a light on at least some parts of how these oddly mysterious AI beings—which we have created but don't understand—actually work. And that has to be a good thing.

[–] FundMECFSResearch@lemmy.blahaj.zone 16 points 1 day ago

Thanks for copy-pasting. It should be criminal to share a clickbait, non-descriptive headline without at least copying a couple of paragraphs for context.

[–] Goretantath@lemm.ee 4 points 1 day ago

So it does the math in its head and gives the correct answer, then copies the answer sheet from the teacher's book into the "show your work" section. Pretty much what I would have done as a kid if I could have; instead I had to fight them and take a hit to my score for not showing my work.

[–] shaggyb@lemmy.world 8 points 1 day ago

Don't tell me that my thoughts aren't weird enough.

[–] Imgonnatrythis@sh.itjust.works 78 points 2 days ago (5 children)

"Ask Claude to add 36 and 59 and the model will go through a series of odd steps, including first adding a selection of approximate values (add 40ish and 60ish, add 57ish and 36ish). Towards the end of its process, it comes up with the value 92ish. Meanwhile, another sequence of steps focuses on the last digits, 6 and 9, and determines that the answer must end in a 5. Putting that together with 92ish gives the correct answer of 95," the MIT article explains.

That is precisely how I do math. Feel a little targeted that they called this odd.

[–] JayGray91@lemmy.zip 29 points 1 day ago (1 children)

I think it's odd in the sense that it's supposed to be software, so it should already know what 36 plus 59 is in a picosecond, instead of doing mental arithmetic like we do.

At least that's my takeaway

[–] shawn1122@lemm.ee 17 points 1 day ago* (last edited 1 day ago) (1 children)

This is what the ARC-AGI test by Chollet has also revealed of current AI / LLMs. They have a tendency to approach problems with this trial and error method and can be extremely inefficient (in their current form) with anything involving abstract / deductive reasoning.

Most LLMs do terribly at the test with the most recent breakthrough being with reasoning models. But even the reasoning models struggle.

ARC-AGI is simple, but it demands a keen sense of perception and, in some sense, judgment. It consists of a series of incomplete grids that the test-taker must color in based on the rules they deduce from a few examples; one might, for instance, see a sequence of images and observe that a blue tile is always surrounded by orange tiles, then complete the next picture accordingly. It’s not so different from paint by numbers.

The test has long seemed intractable to major AI companies. GPT-4, which OpenAI boasted in 2023 had “advanced reasoning capabilities,” didn’t do much better than the zero percent earned by its predecessor. A year later, GPT-4o, which the start-up marketed as displaying “text, reasoning, and coding intelligence,” achieved only 5 percent. Gemini 1.5 and Claude 3.7, flagship models from Google and Anthropic, achieved 5 and 14 percent, respectively.

https://archive.is/7PL2a
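
For a sense of what the rule in the quoted example looks like once it has been worked out, here is a minimal Python sketch. The grid encoding (`"B"`, `"O"`, `"."`) and the function name `surround_blue_with_orange` are made up for illustration and are not the ARC-AGI data format. Note that the sketch hard-codes the rule; the actual test is about deducing the rule from a few examples, which is the part the quoted scores refer to.

```python
BLUE, ORANGE, EMPTY = "B", "O", "."


def surround_blue_with_orange(grid):
    """Return a copy of grid where every neighbour of a blue tile is orange."""
    height, width = len(grid), len(grid[0])
    completed = [row[:] for row in grid]  # leave the input grid untouched
    for r in range(height):
        for c in range(width):
            if grid[r][c] != BLUE:
                continue
            # Visit the up-to-eight cells around the blue tile.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    is_neighbour = (dr, dc) != (0, 0)
                    in_bounds = 0 <= nr < height and 0 <= nc < width
                    if is_neighbour and in_bounds and grid[nr][nc] != BLUE:
                        completed[nr][nc] = ORANGE
    return completed


puzzle = [
    [EMPTY, EMPTY, EMPTY],
    [EMPTY, BLUE, EMPTY],
    [EMPTY, EMPTY, EMPTY],
]
for row in surround_blue_with_orange(puzzle):
    print("".join(row))
```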

[–] Goretantath@lemm.ee 3 points 1 day ago

It's funny because I approach life with a trial-and-error method too; not efficient, but I get the job done in the end. I always see others who don't and give up, like all the people who are bad at computers and ask tech support at the company to fix the problem instead of thinking about it for two secs, and I wonder where life went wrong.

[–] Kolanaki@pawb.social 38 points 1 day ago (14 children)

I use a calculator. An AI should be one too, and shouldn't need to do weird shit to do math.

[–] sapetoku@sh.itjust.works 8 points 1 day ago

A regular AI should use a calculator subroutine, not try to discover basic math every time it's asked something.
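
A minimal Python sketch of that idea, under loud assumptions: `calculate`, `answer`, and `fallback_model` are hypothetical names, and this is not how any particular chatbot implements tool use. Prompts that consist only of arithmetic are handed to a small, exact evaluator built on the standard `ast` module; everything else goes to a stand-in for the model.

```python
import ast
import operator
import re

# Only these operators are allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_ARITHMETIC_ONLY = re.compile(r"[\d\s+\-*/().]+")


def calculate(expression):
    """Evaluate plain arithmetic exactly, without calling eval()."""
    def evaluate(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -evaluate(node.operand)
        raise ValueError("unsupported expression")

    return evaluate(ast.parse(expression, mode="eval").body)


def answer(prompt, fallback_model=lambda text: "(model-generated text)"):
    """Route arithmetic to the calculator; send everything else to the model stub."""
    if _ARITHMETIC_ONLY.fullmatch(prompt):
        try:
            return str(calculate(prompt))
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # not something the calculator can handle, so fall through
    return fallback_model(prompt)


print(answer("36 + 59"))
print(answer("write me a rhyming couplet"))
```

The catch in practice is the routing step: something still has to decide that a question is arithmetic, and here that job is done by a crude regular expression.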

[–] perestroika@lemm.ee 10 points 1 day ago* (last edited 1 day ago) (1 children)

Wow, interesting. :)

Not unexpectedly, the LLM failed to explain its own thought process correctly.

[–] shneancy@lemmy.world 4 points 18 hours ago (1 children)

tbf, how do you know what to say and when? or what 2+2 is?

you learnt it? well so did AI

i'm not an AI nut or anything, but we can barely comprehend our own internal processes, it'd be concerning if a thing humanity created was better at it than us lol

[–] elbarto777@lemmy.world 1 points 8 hours ago (1 children)

You're comparing two different things.

Of course I can reflect on how I came up with a math result.

"Wait, how did you come up with 4 when I asked you 2+2?"

You can confidently say: "well, my teacher said it once and I'm just parroting it." Or "I pictured two fingers in my mind, then pictured two more fingers and then I counted them." Or "I actually thought that I'd say some random number, came up with 4 because it's my favorite digit, said it and it was pure coincidence that it was correct!"

Whereas it doesn't seem like Claude can do this.

Of course, you could ask me "what's the physical/chemical process your neurons follow for you to form those four fingers you picture in your mind?" And I would tell you I don't know. But again, that's a different thing.

[–] shneancy@lemmy.world 1 points 6 hours ago

yeah i was referring more to the chemical reactions. the 2+2 example is not the best one, but language itself is a great case study. once you get fluent enough at any language everything just flows: you have a thought and then you compose words to describe it, and the reverse is true, you hear something and your brain just understands. How do we do any of that? no idea
