This amazes me:
"Anthropic found that even when models received hints—such as metadata suggesting the right answer or code with built-in shortcuts—their CoT outputs often excluded mention of those hints, instead generating detailed but inaccurate rationales. This means the CoT did not reflect all the factors that actually influenced the model’s output."
No. It means its 'explanation' is driven by LLM token probability; it understands nothing, just as with any other result it produces.
arstechnica.com/ai/2025/04/res…
Researchers concerned to find AI models misrepresenting their “reasoning” processes
New Anthropic research shows AI models often fail to disclose reasoning shortcuts. Benj Edwards (Ars Technica)