Presentation: Deploy MultiModal RAG Systems with vLLM

October 10, 2025

Stephen Batifol discusses building and optimizing self-hosted, multimodal RAG systems. He breaks down vector search, nearest neighbor indexes (FLAT, IVF, HNSW), and the critical role of choosing the right embedding model. He then explains vLLM inference optimization (paged attention, quantization) and uses Mistral’s Pixtral to detail multimodal large language model architecture.

By Stephen Batifol
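To make the difference between two of the index types named above concrete, here is a minimal pure-Python sketch. It is not taken from the talk: the function names, the toy 2-D vectors, and the hand-picked centroids are all assumptions for illustration. A FLAT index compares the query against every stored vector (exact, but linear in collection size), while an IVF index first groups vectors into inverted lists around centroids and then scans only the lists closest to the query (approximate, but cheaper).

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def flat_search(vectors, query, k):
    """FLAT: exact search, comparing the query against every stored vector."""
    ranked = sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))
    return ranked[:k]

def build_ivf(vectors, centroids):
    """IVF build step: put each vector id in the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(centroids[c], v))
        lists[nearest].append(i)
    return lists

def ivf_search(vectors, centroids, lists, query, k, nprobe=1):
    """IVF query step: scan only the nprobe inverted lists closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(centroids[c], query))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: l2(vectors[i], query))
    return candidates[:k]

# Toy data (hypothetical). Real systems use high-dimensional embeddings.
vectors = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (5.0, 5.0), (5.2, 4.9), (9.0, 9.0)]
# Hand-picked centroids; an actual IVF index would typically learn them with k-means.
centroids = [(0.0, 0.0), (5.0, 5.0), (9.0, 9.0)]

lists = build_ivf(vectors, centroids)
query = (4.8, 5.1)
flat_result = flat_search(vectors, query, k=2)
ivf_result = ivf_search(vectors, centroids, lists, query, k=2, nprobe=1)
print(flat_result, ivf_result)
```

In this toy case both searches agree, because the query's true neighbors share one inverted list. When they do not, IVF can miss a neighbor unless `nprobe` is raised, which is the usual recall versus latency trade-off. HNSW, the third index named above, uses a layered proximity graph instead and is not sketched here.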