Gemini 3 Pro tops new AI reliability benchmark, but hallucination rates remain high
A new benchmark from Artificial Analysis reveals alarming weaknesses in the factual reliability of large language models. Of the 40 models tested, only four achieved a positive score, with Google's Gemini 3 Pro clearly in the lead. The article…
