scruiser

joined 2 years ago
[–] scruiser@awful.systems 15 points 3 weeks ago* (last edited 3 weeks ago)

You had me going until the very last sentence. (To be fair to me, the OP broke containment and has attracted a lot of unironically delivered opinions almost as bad as your satirical spiel.)

[–] scruiser@awful.systems 16 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

The latest twist I'm seeing isn't blaming your prompting (although they're still eager to do that), it's blaming your choice of LLM.

"Oh, you're using shitGPT 4.1-4o-o3 mini _ro_plus for programming? You should clearly be using Gemini 3.5.07 pro-doubleplusgood, unless you need something locally run, then you should be using DeepSek_v2_r_1 on your 48 GB VRAM local server! Unless you need nice sounding prose, then you actually need Claude Limmerick 3.7.01. Clearly you just aren't trying the right models, so allow me to educate you with all my prompt fondling experience. You're trying to make some general point? Clearly you just need to try another model."

[–] scruiser@awful.systems 11 points 3 weeks ago (1 children)

It can make funny pictures, sure. But it fails at art as an endeavor to communicate an idea, feeling, or intent of the artist: the promptfondler artists provide a few sentences of instruction and the GenAI follows them without any deeper feeling or understanding of context, meaning, or intent.

[–] scruiser@awful.systems 8 points 3 weeks ago* (last edited 3 weeks ago) (1 children)

GPT-1 is 117 million parameters, GPT-2 is 1.5 billion parameters, GPT-3 is 175 billion, GPT-4 is undisclosed but estimated at 1.7 trillion. Tokens needed for training and training compute scale ~~linearly~~ (edit: actually I'm wrong; looking at the Wikipedia page, it's even worse for your case than I was saying: training compute scales quadratically with model size, going up 2 OOM for every 10x of parameters) with model size. They are improving... but only getting a linear improvement in training loss for a geometric increase in model size and training time.

A hypothetical GPT-5 would have 10 trillion parameters and would genuinely need to be AGI to have the remotest hope of paying off its training. And it would need more quality tokens than they have left; they've already scraped the internet (including many copyrighted sources and sources that requested not to be scraped). So that's exactly why OpenAI has been screwing around with fine-tuning setups with illegible naming schemes instead of just releasing a GPT-5. But fine-tuning can only shift what you're getting within distribution, so it trades off into more hallucinations or overly obsequious output or whatever the latest problem they're having.
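
Back-of-the-envelope, assuming the common ~6·N·D FLOPs approximation and a Chinchilla-style ~20 tokens per parameter (the `training_flops` helper and both constants are rough assumptions; the quadratic shape is the point):

```python
# Rough scaling sketch: training FLOPs ~ 6 * params * tokens,
# with an assumed ~20 tokens per parameter (Chinchilla-style rule of thumb).
# Tokens grow linearly with params, so total training compute grows
# ~quadratically: 10x the parameters -> ~100x (2 OOM) the compute.
def training_flops(params, tokens_per_param=20):
    tokens = tokens_per_param * params
    return 6 * params * tokens

for params in (117e6, 1.5e9, 175e9, 1.7e12):  # GPT-1 .. estimated GPT-4
    print(f"{params:.1e} params -> ~{training_flops(params):.1e} training FLOPs")
```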

A lower model temperature makes it pick its best guess for the next token as opposed to randomizing among probable guesses; it doesn't improve on what the best guess is, and you can still get hallucinations even picking the "best" next token.
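
For the curious, a toy sketch of what temperature actually does to next-token picking (the `sample_next_token` helper is hypothetical, not any vendor's actual decoding code):

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, seed=None):
    """Toy temperature sampling over next-token logits (illustrative only)."""
    rng = np.random.default_rng(seed)
    logits = np.asarray(logits, dtype=float)
    if temperature <= 0:
        # temperature -> 0 collapses to greedy decoding: always the single "best" guess
        return int(np.argmax(logits))
    # softmax over temperature-scaled logits; lower temperature sharpens the
    # distribution toward the argmax but never changes which token the argmax is,
    # so a confidently wrong "best guess" stays wrong
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```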

And lol at you trying to reverse the accusation against LLMs by accusing me of regurgitating/hallucinating.

[–] scruiser@awful.systems 9 points 3 weeks ago (1 children)

Posts that explode like this are fun and yet also a reminder why the banhammer is needed.

[–] scruiser@awful.systems 11 points 3 weeks ago (5 children)

Bro, sneerclub and techtakes are for sneering at bad technology and those who worship it, not for engaging in apologia for it (or worse yet, tone policing the sneering). If you don't like it, you can ask the mods for an exit pass (if they haven't generously given you one already).

[–] scruiser@awful.systems 7 points 3 weeks ago* (last edited 3 weeks ago) (2 children)

Joe Rogan Experience!

...side note my most prominent irl conversation about Joe Rogan was with a relative who was trying to convince me it was a good thing that Joe Rogan platformed a celebrity who was saying 1x1=2 (Terrence Howard). Literally beyond parody.

[–] scruiser@awful.systems 6 points 3 weeks ago (1 children)

Grain futures salesmen on farms full of plant life (99.5% of which is weeds). ...I don't have a snappy label yet.

[–] scruiser@awful.systems 12 points 3 weeks ago (8 children)

What if [tokes joint] hallucinations are actually, like, proof the models are almost at human level man!

[–] scruiser@awful.systems 11 points 3 weeks ago (1 children)

Eliezer Yudkowsky, Geoffrey Hinton, Paul Christiano, Ilya Sutskever

One of those names is not like the others.

[–] scruiser@awful.systems 13 points 3 weeks ago (34 children)

The promptfarmers can push the hallucination rate incrementally lower by spending 10x compute on training (and training on 10x the data and spending 10x on runtime cost), but they're already consuming a plurality of all VC funding, so they can't 10x many more times without going bust entirely. And they aren't going to get hallucinations down to 0%: hallucinations are intrinsic to how LLMs operate, and no patch with run-time inference or multiple tries or RAG will eliminate that.

And as for newer models... o3 actually had a higher hallucination rate than its predecessors, because trying to squeeze rational logic out of the models with fine-tuning just breaks them in a different direction.

I will acknowledge that in domains with analytically verifiable answers you can check the LLM's output that way, but in that case it's no longer primarily an LLM: you've got an entire expert system or proof assistant or whatever that can operate independently of the LLM, and the LLM is just providing creative input.

[–] scruiser@awful.systems 9 points 3 weeks ago

A junior developer learns from these repeated minor corrections. LLMs can't learn from them: they don't have any runtime fine-tuning (and even if they did, it wouldn't be learning the way a human does). At the very best, past conversations get summarized and crammed into the context window, hidden from the user, to provide a shallow illusion of continuity and learning.
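
A toy sketch of that trick (the `build_prompt` and `summarize` helpers are hypothetical; the vendors don't publish this plumbing in detail):

```python
def build_prompt(user_message, past_conversations, summarize):
    """Fake 'continuity': no learning happens, the model just gets extra text.

    `summarize` is a hypothetical helper (in practice usually another LLM call)
    that lossily compresses earlier chats; the result is hidden from the user
    and prepended to the context window.
    """
    memory_blurb = summarize(past_conversations)
    hidden_context = "[Summary of prior conversations, not shown to user]\n" + memory_blurb
    return hidden_context + "\n\nUser: " + user_message + "\nAssistant:"
```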
