Service

Enterprise RAG

I build and audit production Retrieval-Augmented Generation systems: vector DB selection, chunking, embeddings, and an evaluation framework that actually proves whether the system works.

When a plain LLM isn’t enough

Most companies discover this after the first ChatGPT pilot: the model “hallucinates” about products, internal policy, customers. Retrieval-Augmented Generation (RAG) fixes this by injecting verified documents into the LLM’s context. Sounds simple. In production it’s full of traps.
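To make "injecting verified documents into the LLM's context" concrete, here is a minimal sketch of the prompt-assembly step. The function name build_prompt and the instruction wording are illustrative assumptions, not a fixed recipe; retrieval and the actual model call are left out.

```python
def build_prompt(question: str, documents: list[str]) -> str:
    """Assemble a grounded prompt: numbered sources first, then the question.

    `documents` is assumed to be the already-retrieved, verified text passages.
    """
    # Number each source so the model (and a reviewer) can cite it as [1], [2], ...
    sources = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(documents, start=1)
    )
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(
        build_prompt(
            "What is the refund window?",
            ["Refunds are accepted within 30 days.", "Shipping takes 5 days."],
        )
    )
```

Most of the production traps sit around this step: which passages get retrieved, how many fit, and whether the answer actually stays within them.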

At IG Group I led a generative AI adoption program where we built a RAG-powered internal knowledge base chatbot. Annual savings: $100k+, hundreds of manual hours eliminated each month.

What I deliver

RAG audit (1–2 weeks) — review of existing stack: chunking strategy, embedding model, vector DB choice, retrieval evaluation, generation prompt. Report with a concrete, ROI-prioritized fix list.
Greenfield implementation — vector DB selection (ChromaDB, Pinecone, Vertex AI Vector Search, pgvector, Weaviate), document ETL pipeline, evaluation framework (recall@k, MRR, hallucination rate, faithfulness).
Multi-modal RAG — when sources are not just text (PDFs with tables, images, audio).
Cost optimization — RAG can burn through your OpenAI/Anthropic budget in weeks. I tune caching, embedding tier, and chunking so that cost scales with the value delivered.
Evaluation as a continuous process — without measurement, RAG is wishful thinking. I build pipelines that score quality daily on real queries.
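Two of the retrieval metrics named above, recall@k and MRR, are simple enough to sketch in plain Python. This is an illustrative sketch under assumptions: the function names, the document IDs, and the shape of the labeled data (a ranked list of retrieved IDs paired with a set of relevant IDs per query) are hypothetical. Hallucination rate and faithfulness need a judge model or a library and are not shown.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int) -> dict[str, float]:
    """Average both metrics over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        raise ValueError("need at least one labeled query")
    n = len(runs)
    return {
        "recall@k": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }


if __name__ == "__main__":
    # Hypothetical labeled queries: ranked retrieval output and ground-truth IDs.
    runs = [
        (["d3", "d1", "d7"], {"d1", "d2"}),
        (["d5", "d4", "d9"], {"d5"}),
    ]
    print(evaluate(runs, k=2))  # {'recall@k': 0.75, 'mrr': 0.75}
```

Run daily against a sample of real queries, numbers like these show whether a chunking or embedding change helped or hurt.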

Common pitfalls

Bad chunking — 100-token chunks lose context; 2000-token chunks flood the generator with noise.
Generic embedding model — all-MiniLM-L6-v2 may not understand your domain (accounting, law, medicine).
No evaluation — the system “works” because the first 5 queries from the product manager looked good.
One-shot retrieval — a single vector lookup is only the baseline; advanced RAG uses multi-hop retrieval, hybrid search, and rerankers.
Vector DB as the only index — for hard-typed domains, BM25 + filtering can beat dense vectors.
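One common way to combine a BM25 ranking with a dense-vector ranking is reciprocal rank fusion (RRF). The text above does not name a fusion method, so treat this as an assumption made for illustration. The sketch takes the two ranked ID lists as given (hypothetical inputs; the BM25 and vector searches themselves are not shown) and uses the conventional constant k = 60.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_ranking = ["a", "b", "c"]   # hypothetical keyword-search result
    dense_ranking = ["b", "d", "a"]  # hypothetical vector-search result
    print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
    # ['b', 'a', 'd', 'c']
```

Because RRF uses only ranks, it avoids normalizing BM25 scores against cosine similarities; a reranker can then reorder the fused top results.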

Stack I use

Python · LangChain · LlamaIndex · ChromaDB · Pinecone · Vertex AI Vector Search · pgvector · OpenAI · Anthropic · Cohere Rerank · Azure OpenAI · Ragas · DeepEval

Let's talk.