"For a culture and a nation to cut themselves off from the great ethical and religious forces of their own history is for them to commit suicide. To cultivate the essential moral judgments, to preserve and to protect them as a common good without imposing them coercively seems to me to be a condition for the continuance of freedom as opposed to all sorts of nihilism and their totalitarian consequences."
—Joseph Ratzinger (Pope Benedict XVI)
WalnutLum
in reply to Karna • • •
The blog post from the researcher is a more interesting read.
Important points here about benchmarking:
I'm not sure if a signal-to-noise ratio of 1:100 is, uh... great...
How I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation
Sean Heelan's Blog
utopiah
in reply to Karna • • •
Looks like another of those "Asked AI to find X. AI does find X as requested. Claims that the AI autonomously found X."
I mean... the program literally does what has been asked. Its dataset includes examples related to the request.
Shocked Pikachu face? Really?
Revan343
in reply to utopiah • • •
utopiah
in reply to Revan343 • • •
Maybe I misunderstood, but the vulnerability was unknown to them, while the class of vulnerability, let's say "bugs like that", is well known and published by the security community, isn't it?
My point being that if it's previously unknown and reproducible (not just "luck"), that's major; if it's well known in other projects, even though unknown to this specific user, then it's unsurprising.
Edit: I'm not a security researcher, but I believe there are already a lot of tools doing static and dynamic analysis. IMHO it'd be helpful to know how those already perform versus the LLMs used here, namely across which dimensions (reliability, speed, coverage, e.g. exotic programming languages, accuracy of reporting, e.g. hallucinations, etc.) each solution is better or worse than the other. I'm always wary of "ex nihilo" demonstrations. Apologies if there is a benchmark against existing tools and I missed it.