Gemini 3 Pro tops new AI reliability benchmark, but hallucination rates remain high

November 19, 2025

2025-11-19 06:57 GMT · aimagpro.com

A new benchmark from Artificial Analysis reveals alarming weaknesses in the factual reliability of large language models. Out of 40 models tested, only four achieved a positive score – with Google’s Gemini 3 Pro clearly in the lead.
The article "Gemini 3 Pro tops new AI reliability benchmark, but hallucination rates remain high" appeared first on THE DECODER.