A new benchmark from Artificial Analysis reveals alarming weaknesses in the factual reliability of large language models. Out of 40 models tested, only four achieved a positive score, with Google's Gemini 3 Pro clearly in the lead.
The article Gemini 3 Pro tops new AI reliability benchmark, but hallucination rates remain high appeared first on THE DECODER.
