I generally distrust AI for finding information, but this, in contrast, is a good use. After a human's audit, the AI analyzes the code for additional coverage, and the developer can then verify what it finds. There's no blind trust in the AI's output, and the path of the assessment itself isn't dictated by the AI, which is where audits run into pitfalls.
I don't think you understood the article. The AI did not find the vulnerability most of the time, even after being directly prompted to look for it.
But the human had already found a vulnerability and fixed it. Also, as seen in numerous GitHub and HackerOne AI fails, most of the time the AI will make up CVEs or fixes for them, essentially being more of a roadblock than a helping hand in verifying what the human wrote.
As seen in the article, even if you pointed it directly at the problem, the AI would still miss it or make up problems 92% of the time.
Don't forget the 28/100 false positive rate.
So only an 8% success rate, and a 28% false positive rate (even worse than just failing to find the issue), under ideal conditions.
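To put those two rates side by side: of the runs that report anything at all, only a minority are real. A quick back-of-the-envelope check (the 8 and 28 are the figures quoted above; treating each run as producing at most one report is my simplification):

```python
# Back-of-the-envelope math from the rates quoted above:
# 8 true detections and 28 false positives per 100 runs.
RUNS = 100
TRUE_POSITIVES = 8    # runs that found the known vulnerability
FALSE_POSITIVES = 28  # runs that reported a problem that wasn't there

flagged = TRUE_POSITIVES + FALSE_POSITIVES
precision = TRUE_POSITIVES / flagged

print(f"reports to triage per {RUNS} runs: {flagged}")    # 36
print(f"chance a given report is real: {precision:.0%}")  # 22%
```

So even in the ideal setup, roughly four out of five reports you'd sit down to triage would be wrong.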
What I understood from the article was that the developer was testing it on a vulnerability they had already found, and the AI detected it only occasionally. It also flagged other problems, which, yes, are often false positives, but I gathered from the article that one of those was a previously undiscovered vulnerability. That's where the developer verifies instead of taking ChatGPT at its word.
Of course I still don't trust it to code the fix, but as a way of flagging problem areas in code it has a place. While its effectiveness in practice is marginal, as an application of AI in general it can search large areas and come up with a few candidates for a human to check, and that I think is a legit use case. Something like the sketch below.
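The workflow being described is roughly "scan broadly, collect candidates, let a human triage". A minimal sketch of that shape, with `ask_model` and `collect_candidates` as hypothetical names I made up; the article doesn't prescribe any particular API:

```python
import pathlib

def ask_model(prompt: str) -> str:
    """Hypothetical placeholder for an LLM call; swap in a real API client."""
    raise NotImplementedError

def collect_candidates(repo: pathlib.Path, pattern: str = "*.c") -> list[tuple[pathlib.Path, str]]:
    """Ask the model about each source file and keep every answer
    as an UNVERIFIED candidate for a human to triage afterwards."""
    candidates = []
    for path in sorted(repo.rglob(pattern)):
        answer = ask_model(
            "Point out any memory-safety issues in this code, "
            "or reply exactly 'none found':\n\n" + path.read_text()
        )
        if "none found" not in answer.lower():
            candidates.append((path, answer))
    # Nothing here is trusted blindly: a human reviews every candidate,
    # expecting most of them to be false positives.
    return candidates
```

The point is only the shape: the model proposes, the human disposes.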
I mean, if you run it 100 times and spend like two months chasing down the false positives, maybe.