AI glossary

RAG audit

RAG audit: A RAG (Retrieval-Augmented Generation) audit is a systematic review of a production RAG system covering retrieval quality, generation quality, cost, and security. The deliverable is a prioritized fix list that runs from chunking strategy and embedding model selection, through the evaluation framework, to LLM call cost optimization.

What a RAG audit covers

Chunking strategy: how large the document chunks injected into the LLM prompt are, whether they preserve context, and how they are tagged with metadata.
Embedding model: whether the chosen model understands the domain (finance, law, medicine) and whether it is overpriced.
Vector DB: choice and configuration (ChromaDB, Pinecone, Vertex AI Vector Search, pgvector), indexing, and filtering.
Retrieval evaluation: recall@k, MRR (Mean Reciprocal Rank), and coverage of typical queries.
Generation prompt: whether the model receives readable context, whether the instructions are clear, and how the prompt guards against hallucinations.
End-to-end metrics: faithfulness, answer relevance, hallucination rate, latency, and cost per query.
Security: whether embeddings contain personal data (GDPR) and whether prompt injection is blocked.
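
As an illustration of the retrieval evaluation item, here is a minimal sketch of how recall@k and MRR could be computed from a small labelled query set. The query IDs, chunk IDs, and the function names (recall_at_k, reciprocal_rank, evaluate) are hypothetical and chosen for the example; a real audit would plug in the system's own retriever output and a much larger gold set.

```python
from typing import Dict, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant chunk IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, chunk_id in enumerate(retrieved, start=1):
        if chunk_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(
    runs: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int = 5
) -> Dict[str, float]:
    """Average recall@k and MRR over every query in the gold set."""
    if not gold:
        raise ValueError("gold set is empty")
    n = len(gold)
    # Queries missing from `runs` count as having retrieved nothing.
    recall = sum(recall_at_k(runs.get(q, []), rel, k) for q, rel in gold.items()) / n
    mrr = sum(reciprocal_rank(runs.get(q, []), rel) for q, rel in gold.items()) / n
    return {f"recall@{k}": recall, "mrr": mrr}


# Hypothetical labelled queries: query ID -> IDs of chunks that answer it.
gold = {
    "q1": {"c1", "c4"},
    "q2": {"c7"},
}
# Hypothetical ranked retriever output for the same queries (best match first).
runs = {
    "q1": ["c4", "c9", "c1"],
    "q2": ["c2", "c3", "c7"],
}

metrics = evaluate(runs, gold, k=2)
print(metrics)
```

With this toy data, q2's only relevant chunk sits at rank 3, so it lowers MRR and contributes nothing to recall@2. That is the kind of gap a five-query manual test tends to miss.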

Common pitfalls

No evaluation at all: the system “works” because the product manager tested 5 queries.
Generic embedding model for a niche domain: all-MiniLM-L6-v2 for Polish tax law is a dead end.
One-shot retrieval: a single retrieval pass is often not enough; advanced RAG uses multi-hop retrieval, hybrid search, and rerankers.
Vector DB as the only index: for typed domains, BM25 plus filtering can beat dense vectors.
No quality monitoring: the system degrades on new documents and no one notices.
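
To make the hybrid search point concrete, here is a minimal sketch of one common way to merge a keyword (BM25) ranking with a dense-vector ranking: reciprocal rank fusion (RRF). The text above does not say which fusion method is used, so RRF is an assumption made for illustration. The document IDs and the two input rankings are hypothetical, and k = 60 is the conventional default constant, not a tuned value.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one query from two independent retrievers.
bm25_ranking = ["d3", "d1", "d2"]   # keyword match order
dense_ranking = ["d1", "d4", "d3"]  # embedding similarity order

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

RRF only needs ranks, not raw scores, so BM25 scores and cosine similarities never have to be put on a common scale. Documents that both retrievers rank highly rise to the top, and a reranker can then be applied to the fused list.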

How fewtokensai helps

I run RAG audits in 1–2 weeks and deliver a concrete, ROI-prioritized report. I bring production RAG experience (IG Group: internal knowledge base chatbot with $100k+ annual savings). Schedule an audit or read about the Enterprise RAG service.
