RAG audit
A RAG (Retrieval-Augmented Generation) audit is a systematic review of a production RAG system covering retrieval quality, generation quality, cost, and security. The deliverable is a prioritized fix list that spans chunking strategy, embedding model selection, the evaluation framework, and LLM call cost optimization.
What a RAG audit covers
- Chunking strategy: how large the document chunks passed to the LLM are, whether they preserve context, and how they are tagged with metadata.
- Embedding model: whether the chosen model understands the domain (finance, law, medicine) and whether it is overpriced for what it delivers.
- Vector DB: choice and configuration (ChromaDB, Pinecone, Vertex AI Vector Search, pgvector), indexing, and filtering.
- Retrieval evaluation: recall@k, MRR (Mean Reciprocal Rank), and coverage of typical queries.
- Generation prompt: whether the model receives readable context, whether the instructions are clear, and whether the prompt guards against hallucinations.
- End-to-end metrics: faithfulness, answer relevance, hallucination rate, latency, and cost per query.
- Security: whether embeddings contain personal data (GDPR) and whether prompt injection is blocked.
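The retrieval metrics named above can be computed with a few lines of code once a small labeled query set exists. The following is a minimal sketch, assuming each query has a ranked list of retrieved chunk IDs and a hand-labeled set of relevant chunk IDs; the function names and the sample data are hypothetical and for illustration only.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant chunk, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int = 5) -> Dict[str, float]:
    """Average recall@k and MRR over every query that has gold labels."""
    queries = [q for q in runs if q in gold]
    if not queries:
        return {"recall_at_k": 0.0, "mrr": 0.0}
    n = len(queries)
    return {
        "recall_at_k": sum(recall_at_k(runs[q], gold[q], k) for q in queries) / n,
        "mrr": sum(reciprocal_rank(runs[q], gold[q]) for q in queries) / n,
    }


# Hypothetical sample: two queries with ranked retrieval output and gold labels.
sample_runs = {"q1": ["c3", "c1", "c9"], "q2": ["c7", "c2", "c4"]}
sample_gold = {"q1": {"c1", "c5"}, "q2": {"c7"}}
print(evaluate(sample_runs, sample_gold, k=2))
```

Even a few dozen labeled queries scored this way give a baseline that later changes to chunking or the embedding model can be compared against.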
Common pitfalls
- No evaluation at all: the system “works” only because the product manager tried 5 queries.
- Generic embedding model for niche domain: all-MiniLM-L6-v2 for Polish tax law is a dead end.
- One-shot retrieval: advanced RAG uses multi-hop, hybrid search, rerankers.
- Vector DB as the only index: for typed domains, BM25 plus filtering often beats dense vectors alone.
- No quality monitoring: the system degrades as new documents arrive, and no one notices.
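One common way to implement the hybrid search mentioned above is to merge a BM25 ranking and a dense-vector ranking with reciprocal rank fusion (RRF). The text does not say which fusion method to use, so RRF is an assumption here, as are the function name, the document IDs, and the conventional constant k = 60. This is a sketch, not a definitive implementation.

```python
from collections import defaultdict
from typing import Dict, List


def rrf_fuse(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in;
    a larger k flattens the gap between top and lower ranks.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a keyword (BM25) index and a dense vector index.
bm25_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]
print(rrf_fuse([bm25_ranking, dense_ranking]))
```

RRF uses only rank positions, so BM25 scores and vector similarities do not need to be normalized onto a common scale before merging. A reranker can then be applied to the fused list.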
How fewtokensai helps
I run RAG audits in 1–2 weeks and deliver a concrete, ROI-prioritized report. My production RAG experience includes an internal knowledge base chatbot at IG Group with $100k+ in annual savings. Schedule an audit or read about the Enterprise RAG service.