Stop benchmarking in the lab: Inclusion Arena shows how LLMs perform in production

Researchers from Inclusion AI and Ant Group proposed a new LLM leaderboard that takes its data from real, in-production apps.

Researchers from Inclusion AI and Ant Group proposed a new LLM leaderboard that takes its data from real, in-production apps.

Original: https://venturebeat.com/ai/stop-benchmarking-in-the-lab-inclusion-arena-shows-how-llms-perform-in-production/