The illusion of reasoning: step-level evaluation reveals decorative chain-of-thought in frontier language models
arXiv:2603.22816v2 Announce Type: replace-cross
Abstract: Language models increasingly “show their work” by writing step-by-step reasoning before answering. But are these reasoning steps genuinely used, or decorative narratives generated after the model has already decided? We introduce step-level faithfulness evaluation –…
