Enterprise RAG
I build and audit production Retrieval-Augmented Generation systems: vector DB selection, chunking, embeddings, and an evaluation framework that shows whether the system actually works.
When a plain LLM isn’t enough
Most companies discover this after the first ChatGPT pilot: the model “hallucinates” about products, internal policy, customers. Retrieval-Augmented Generation (RAG) fixes this by injecting verified documents into the LLM’s context. Sounds simple. In production it’s full of traps.
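To make the "injecting verified documents into the LLM's context" step concrete, here is a minimal sketch. It assumes a toy word-overlap retriever in place of a real embedding index, and the `Doc` type, document ids, and prompt wording are hypothetical illustrations rather than a description of any particular production system.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    text: str


def retrieve(query: str, docs: list[Doc], k: int = 2) -> list[Doc]:
    """Toy retriever: rank documents by word overlap with the query.

    A production system would use embeddings and a vector index instead.
    """
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(d.text.lower().split())), d) for d in docs]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep only documents that share at least one term with the query.
    return [d for score, d in ranked[:k] if score > 0]


def build_prompt(query: str, context: list[Doc]) -> str:
    """Inject retrieved passages into the prompt and ask for cited answers."""
    passages = "\n".join(f"[{d.doc_id}] {d.text}" for d in context)
    return (
        "Answer using only the passages below and cite their ids. "
        "If the answer is not in the passages, say you do not know.\n\n"
        f"Passages:\n{passages}\n\nQuestion: {query}"
    )


# Hypothetical internal documents, invented for illustration.
docs = [
    Doc("policy-1", "Refunds are issued within 14 days of purchase."),
    Doc("policy-2", "Support is available on weekdays from 9 to 17."),
    Doc("faq-7", "Shipping to EU countries takes 3 to 5 days."),
]
question = "How many days do refunds take?"
prompt = build_prompt(question, retrieve(question, docs, k=1))
print(prompt)
```

The resulting string is what would be sent to the model. Most of the production traps sit in `retrieve`: which passages are selected, how they were chunked, and whether anyone measures the selection.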
At IG Group I led a generative AI adoption program where we built a RAG-powered internal knowledge base chatbot. Annual savings: $100k+, hundreds of manual hours eliminated each month.
What I deliver
- RAG audit (1–2 weeks) — review of existing stack: chunking strategy, embedding model, vector DB choice, retrieval evaluation, generation prompt. Report with a concrete, ROI-prioritized fix list.
- Greenfield implementation — vector DB selection (ChromaDB, Pinecone, Vertex AI Vector Search, pgvector, Weaviate), document ETL pipeline, evaluation framework (recall@k, MRR, hallucination rate, faithfulness).
- Multi-modal RAG — when sources are not just text (PDFs with tables, images, audio).
- Cost optimization — RAG can burn through your OpenAI/Anthropic budget in weeks. I tune caching, embedding tier, chunking so cost is linear with value.
- Evaluation as a continuous process — without measurement, RAG is wishful thinking. I build pipelines that score quality daily on real queries.
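As a rough illustration of the retrieval metrics named above, the sketch below computes recall@k and MRR over a small labeled query set. The document ids and relevance labels are hypothetical. Hallucination rate and faithfulness usually need an LLM-based or human judge, so they are left out here.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int = 3) -> dict[str, float]:
    """Average recall@k and MRR over (retrieved ids, relevant ids) pairs."""
    n = len(runs)
    return {
        f"recall@{k}": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }


# Hypothetical labeled queries: ranked retriever output and the gold set.
runs = [
    (["d1", "d4", "d2"], {"d1", "d2"}),  # both relevant docs in the top 3
    (["d9", "d3", "d7"], {"d3", "d5"}),  # one of two found, first hit at rank 2
    (["d8", "d6", "d0"], {"d2"}),        # complete miss
]
metrics = evaluate(runs, k=3)
print(metrics)
```

Run daily against a sample of real queries, a drop in either number flags a retrieval regression before users report it.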
Common pitfalls
- Bad chunking — 100-token paragraphs lose context, 2000-token paragraphs flood the generator with noise.
- Generic embedding model — all-MiniLM-L6-v2 may not understand your domain (accounting, law, medicine).
- No evaluation — the system “works” because the first 5 queries from the product manager looked good.
- One-shot retrieval — advanced RAG uses multi-hop, hybrid search, rerankers.
- Vector DB as the only index — for hard-typed domains, BM25 + filtering can beat dense vectors.
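One common way to combine BM25 and dense results into a hybrid ranking is reciprocal rank fusion (RRF). The sketch below assumes the two ranked lists already exist. The document ids are hypothetical, and `k=60` is the conventional default for RRF rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked id lists; each list adds 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, highest first; break ties by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of two retrievers for the same query.
bm25_ranking = ["sku-42", "doc-a", "doc-b"]   # exact keyword match ranks first
dense_ranking = ["doc-a", "doc-c", "sku-42"]  # semantic neighbors rank first
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while an exact-match hit that only BM25 ranks first still stays near the top of the fused list. A reranker can then be applied to the fused candidates.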
Stack I use
Python · LangChain · LlamaIndex · ChromaDB · Pinecone · Vertex AI Vector Search · pgvector · OpenAI · Anthropic · Cohere Rerank · Azure OpenAI · Ragas evaluation · DeepEval