The LLM Bottleneck: Why Open-Source Vision LLMs Struggle with Hierarchical Visual Recognition
arXiv:2505.24840v2 Announce Type: replace-cross Abstract: This paper reveals that many open-source large language models (LLMs) lack hierarchical knowledge about our visual world, unaware of even well-established biology taxonomies. This shortcoming makes LLMs a bottleneck for vision LLMs’ hierarchical visual recognition…
