fewtokensai
Service

Enterprise RAG

I build and audit production Retrieval-Augmented Generation systems: vector DB selection, chunking, embeddings, and an evaluation framework that shows whether the system actually works.

When a plain LLM isn’t enough

Most companies discover this after the first ChatGPT pilot: the model “hallucinates” about products, internal policy, customers. Retrieval-Augmented Generation (RAG) fixes this by injecting verified documents into the LLM’s context. Sounds simple. In production it’s full of traps.
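To make "injecting verified documents into the LLM's context" concrete, here is a minimal Python sketch of the retrieve-then-prompt step. It is illustrative only: the word-overlap scoring is a stand-in for a real embedding model and vector DB, and the names `retrieve`, `build_prompt`, and `DOCS` are hypothetical, not part of any library.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased word set; a toy stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query and return the top k."""
    query_terms = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_terms & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Put the retrieved passages ahead of the question so the model answers from them."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below. "
        "If the answer is not there, say you do not know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical knowledge-base snippets, for illustration only.
DOCS = [
    "Refunds are processed within 14 days of a return request.",
    "The office is closed on public holidays.",
    "Premium accounts include priority support.",
]
```

The string returned by `build_prompt` is what would be sent to the LLM. Most of the production traps sit in the `retrieve` step, which this sketch deliberately oversimplifies.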

At IG Group I led a generative AI adoption program where we built a RAG-powered internal knowledge base chatbot. Annual savings: $100k+, hundreds of manual hours eliminated each month.

What I deliver

  • RAG audit (1–2 weeks): a review of your existing stack, covering chunking strategy, embedding model, vector DB choice, retrieval evaluation, and the generation prompt. You get a report with a concrete fix list prioritized by ROI.
  • Greenfield implementation: vector DB selection (ChromaDB, Pinecone, Vertex AI Vector Search, pgvector, Weaviate), a document ETL pipeline, and an evaluation framework (recall@k, MRR, hallucination rate, faithfulness).
  • Multi-modal RAG: for sources that are not just text (PDFs with tables, images, audio).
  • Cost optimization: RAG can burn through your OpenAI/Anthropic budget in weeks. I tune caching, embedding tier, and chunking so that spend scales with the value delivered.
  • Evaluation as a continuous process: without measurement, RAG is wishful thinking. I build pipelines that score quality daily on real queries.
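As a rough illustration of two of the retrieval metrics named above, recall@k and MRR can be computed from a labelled query set in a few lines of Python. This is a sketch under stated assumptions: each query comes with a ranked list of retrieved document IDs and a set of IDs judged relevant, and the function names and the `RUNS` data are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int) -> dict[str, float]:
    """Average recall@k and MRR over (retrieved, relevant) pairs, one pair per query."""
    if not runs:
        raise ValueError("need at least one evaluated query")
    n = len(runs)
    return {
        "recall@k": sum(recall_at_k(ret, rel, k) for ret, rel in runs) / n,
        "mrr": sum(reciprocal_rank(ret, rel) for ret, rel in runs) / n,
    }


# Hypothetical results for three queries: (ranked retrieved IDs, relevant IDs).
RUNS = [
    (["d3", "d1", "d7"], {"d1"}),        # relevant doc found at rank 2
    (["d2", "d9", "d4"], {"d4", "d5"}),  # one of two relevant docs, at rank 3
    (["d8", "d6", "d0"], {"d1"}),        # relevant doc missed entirely
]
```

On this toy data, `evaluate(RUNS, k=3)` gives a recall@3 of 0.5 and an MRR of about 0.28. Generation-side metrics such as hallucination rate and faithfulness need an LLM or human judge and are not covered by this sketch.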

Common pitfalls

  1. Bad chunking: 100-token chunks lose context, while 2000-token chunks flood the generator with noise.
  2. Generic embedding model: all-MiniLM-L6-v2 may not understand your domain (accounting, law, medicine).
  3. No evaluation: the system “works” because the first 5 queries from the product manager looked good.
  4. One-shot retrieval: a single vector lookup is rarely enough. Advanced RAG uses multi-hop retrieval, hybrid search, and rerankers.
  5. Vector DB as the only index: for domains with rigid terminology, BM25 + filtering can beat dense vectors.
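One common way to combine a keyword index with dense vectors, as pitfalls 4 and 5 suggest, is reciprocal rank fusion (RRF). The Python sketch below assumes you already have two ranked lists of document IDs, one from BM25 and one from a vector search. The function name, the example IDs, and the default constant of 60 are illustrative choices, not a prescription for any particular stack.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) in every list it appears in;
    scores are summed, so documents ranked well by several retrievers rise.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings for one query.
bm25_ranking = ["a", "b", "c"]   # keyword (BM25) results
dense_ranking = ["c", "a", "d"]  # vector search results

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Here `fused` is `["a", "c", "b", "d"]`: the two documents that both retrievers found move ahead of the ones only a single retriever returned. A reranker can then be applied to the top of the fused list.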

Stack I use

Python · LangChain · LlamaIndex · ChromaDB · Pinecone · Vertex AI Vector Search · pgvector · OpenAI · Anthropic · Cohere Rerank · Azure OpenAI · Ragas · DeepEval

Let's talk about your AI

30 minutes, no obligation. Tell me where your AI initiative is stuck or what you're planning — you'll leave with concrete next steps.