this post was submitted on 07 Jul 2025
958 points (98.0% liked)

Technology

(page 5) 50 comments
[–] mogoh@lemmy.ml 6 points 3 days ago (3 children)

The researchers observed various failures during the testing process. These included agents neglecting to message a colleague as directed, the inability to handle certain UI elements like popups when browsing, and instances of deception. In one case, when an agent couldn't find the right person to consult on RocketChat (an open-source Slack alternative for internal communication), it decided "to create a shortcut solution by renaming another user to the name of the intended user."

OK, but I wonder who really tries to use AI for that?

AI is not ready to replace a human completely, but there are specific tasks it does remarkably well.

[–] logicbomb@lemmy.world 4 points 3 days ago

Yeah, we need more info to understand the results of this experiment.

We need to know what exactly were these tasks that they claim were validated by experts. Because like you're saying, the tasks I saw were not what I was expecting.

We need to know how the LLMs were set up. If you tell it to act like a chat bot and then you give it a task, it will have poorer results than if you set it up specifically to perform these sorts of tasks.

We need to see the actual prompts given to the LLMs. It may be that you simply need an expert to write prompts in order to get much better results. While that would be disappointing today, it's not all that different from how people needed to learn to use search engines.

We need to see the failure rate of humans performing the same tasks.
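Not from the study, just a sketch of what "set up specifically to perform these sorts of tasks" could mean versus plain chat-bot use: the same task sent as a bare user message, versus wrapped in a task-specific system message that states the agent's role, its tools, and what to do on failure. All names here (roles, tool names, fields) are made up for illustration.

```python
# Hypothetical sketch: bare chat message vs. task-specific setup.
# Message shape follows the common {"role": ..., "content": ...} chat
# convention; tool names and roles are invented for this example.

def bare_prompt(task: str) -> list[dict]:
    """Chat-bot style: just hand the model the task."""
    return [{"role": "user", "content": task}]

def task_prompt(task: str, role: str, tools: list[str], fmt: str) -> list[dict]:
    """Task-specific setup: role, available tools, and failure policy up front."""
    system = (
        f"You are {role}. "
        f"Available tools: {', '.join(tools)}. "
        f"Respond only with {fmt}. If a tool or person is unavailable, "
        "report that instead of improvising a workaround."
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": task}]

messages = task_prompt(
    "Ask Sarah in RocketChat for the Q3 budget figures.",
    role="an office assistant agent",
    tools=["rocketchat_send", "rocketchat_search_users"],
    fmt="a single tool call or a failure report",
)
print(messages[0]["role"])  # system
```

The "report failure instead of improvising" line is exactly the kind of constraint that might have prevented the rename-another-user workaround described above, which is why the setup details matter when reading these results.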

[–] SocialMediaRefugee@lemmy.world 1 points 2 days ago* (last edited 2 days ago)

I use it for very specific tasks and give as much information as possible. I usually have to give it more feedback to get to the desired goal. For instance, I'll ask it how to resolve an error message. I've even asked it for some short Python code. I almost always get good results when doing that. Asking it about basic facts, like science questions, works too.

One thing I have had problems with: if the error is an oddball, it will give me suggestions that don't work with my OS/app version, even though I gave it that info. Then I give it feedback, and eventually it loops back to its original suggestions, so it never actually comes up with an answer.

I've also found differences between ChatGPT and MS Copilot, with ChatGPT usually giving better results.
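For what it's worth, one way to keep the OS/app version from getting dropped is to bake it into the question itself, along with what you've already tried. A hypothetical helper (every name here is illustrative, not any real API):

```python
# Illustrative prompt builder: bundle environment details and failed attempts
# with the error message so the model can't ignore them.

def troubleshooting_prompt(error: str, os_name: str, app: str, version: str,
                           tried: list[str]) -> str:
    lines = [
        f"Environment: {os_name}, {app} {version}.",
        f"Error: {error}",
        "Already tried (do not suggest these again):",
    ]
    lines += [f"- {t}" for t in tried]
    lines.append("Suggest only fixes valid for this exact version.")
    return "\n".join(lines)

prompt = troubleshooting_prompt(
    "E: Unable to locate package foo",
    os_name="Ubuntu 24.04", app="apt", version="2.7.14",
    tried=["apt update", "checking sources.list"],
)
print(prompt.splitlines()[0])  # Environment: Ubuntu 24.04, apt 2.7.14.
```

Listing the dead ends up front at least gives the model a chance to avoid the loop-back-to-square-one behavior described above; it doesn't guarantee it.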

[–] brown567@sh.itjust.works 5 points 3 days ago

70% seems pretty optimistic based on my experience...

[–] Ileftreddit@lemmy.world 1 points 2 days ago

Hey I went there

[–] dan69@lemmy.world -1 points 2 days ago

And it won't be until humans can agree on what is fact and what isn't. There is always someone, or some group, spreading mis- or disinformation.

[–] lmagitem@lemmy.zip 2 points 3 days ago

Color me surprised

[–] dylanmorgan@slrpnk.net 2 points 3 days ago (1 children)

Claude why did you make me an appointment with a gynecologist? I need an appointment with my neurologist, I’m a man and I have Parkinson’s.

[–] NuXCOM_90Percent@lemmy.zip 2 points 3 days ago

While I do hope this leads to a pushback on "I just put all our corporate secrets into chatgpt":

In the before times, people got their answers from stack overflow... or fricking youtube. And those are also wrong VERY VERY VERY often. Which is one of the biggest problems. The illegally scraped training data is from humans and humans are stupid.

[–] lemmy_outta_here@lemmy.world 2 points 3 days ago

Rookie numbers! Let’s pump them up!

To match their tech bro hypers, they should be wrong at least 90% of the time.
