Week 4: Advanced RAG Techniques - 1. The Limits of Naive RAG

Advanced retrieval combines dense search, keyword search, merged candidates, and reranking.

Before You Read

Diagnose the common failure modes of basic top-k vector retrieval.
Explain why query wording, chunk boundaries, and ranking all affect answer quality.
Choose advanced retrieval techniques based on observed errors.

Working Model

Naive RAG assumes the user query already looks like the best document chunk. Real users ask vague, multi-part, conversational, or domain-specific questions, so production systems often need query rewriting, hybrid search, reranking, and evaluation.

In Week 3, we built a "Naive RAG" pipeline: take the user's query, embed it, find the top-$K$ most similar chunks using cosine similarity, and stuff them into the LLM prompt.
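As a reminder of what that pipeline looks like, here is a minimal sketch in pure Python. The `embed` function is a toy stand-in (a bag-of-words count vector) used only so the example runs without an embedding model; the function names, sample chunks, and prompt wording are all illustrative assumptions, not the Week 3 code.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank every chunk by similarity to the query and keep the best k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Stuff the retrieved chunks into a single prompt string."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"


# Hypothetical sample chunks for illustration only.
chunks = [
    "Employees accrue 15 vacation days per year.",
    "The office is closed on public holidays.",
    "Expense reports are due by the fifth of each month.",
]
question = "How many vacation days do employees get?"
top = retrieve_top_k(question, chunks, k=2)
prompt = build_prompt(question, top)
```

Everything downstream depends on the single similarity ranking in `retrieve_top_k`, which is why each failure mode below traces back to that step.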

While this works for simple questions, it quickly breaks down in production. To understand why we need Advanced RAG techniques, we must first understand the failure modes of Naive RAG.

Failure Mode 1: The Vocabulary Gap

Users rarely ask questions using the exact terminology found in your documents.

Document: "The enterprise telecommuting policy mandates a minimum of two on-site days per fiscal quarter."
User Query: "Can I work from home?"

Because the query and the document share almost no vocabulary, their embeddings might not be close enough in vector space, and the relevant chunk is never retrieved.
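The gap in this example can be made concrete by counting shared words, and a simple query expansion shows one way to narrow it. This is a sketch under stated assumptions: the `SYNONYMS` table and the helper names are hypothetical, and a real system would more likely use an LLM rewrite or a curated glossary. Real dense embeddings do capture some synonymy, so word overlap is only a rough proxy for the problem.

```python
import re

# Hypothetical synonym table; a production system would more likely ask an LLM
# to rewrite the query or use a curated domain glossary.
SYNONYMS = {
    "work from home": ["telecommuting", "remote work"],
}


def content_tokens(text: str) -> set[str]:
    """Lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def expand_query(query: str, synonyms: dict[str, list[str]]) -> str:
    """Append known alternative phrasings for any phrase found in the query."""
    extras = []
    lowered = query.lower()
    for phrase, alternatives in synonyms.items():
        if phrase in lowered:
            extras.extend(alternatives)
    return " ".join([query, *extras])


document = (
    "The enterprise telecommuting policy mandates a minimum of "
    "two on-site days per fiscal quarter."
)
query = "Can I work from home?"

shared_before = content_tokens(document) & content_tokens(query)
expanded = expand_query(query, SYNONYMS)
shared_after = content_tokens(document) & content_tokens(expanded)
```

Before expansion the query and document share no tokens at all; after expansion they share "telecommuting", which gives a retriever something to match on.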

Failure Mode 2: Complex Queries

Users often ask multi-part questions or questions that require aggregation.

User Query: "What are the differences between the 2023 and 2024 healthcare plans?"

A naive semantic search simply looks for chunks that are similar to that entire sentence. It might find a chunk about the 2023 plan but miss the 2024 plan, or vice versa, because it has no way to break the question into parts.
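One way to handle such a question is to split it into sub-queries and retrieve for each one separately. The sketch below uses a hand-written regular expression purely as a hypothetical stand-in; in practice the decomposition step is usually delegated to an LLM. The pattern, the function names, and the `retrieve` callable are assumptions made for illustration.

```python
import re
from typing import Callable

# Hypothetical pattern for "differences between the X and Y <topic>" questions.
COMPARISON = re.compile(
    r"differences? between (?:the )?(?P<first>.+?) and (?:the )?"
    r"(?P<second>\S+) (?P<topic>.+?)\??$",
    re.IGNORECASE,
)


def decompose(query: str) -> list[str]:
    """Split a comparison question into one sub-query per compared item."""
    match = COMPARISON.search(query.strip())
    if match is None:
        return [query]  # Not a comparison we recognize: leave it unchanged.
    topic = match["topic"].strip()
    return [f"{match['first']} {topic}", f"{match['second']} {topic}"]


def retrieve_for_each(
    sub_queries: list[str], retrieve: Callable[[str], list[str]]
) -> list[str]:
    """Run retrieval per sub-query and merge results, dropping duplicates."""
    seen: set[str] = set()
    merged: list[str] = []
    for sub_query in sub_queries:
        for chunk in retrieve(sub_query):
            if chunk not in seen:
                seen.add(chunk)
                merged.append(chunk)
    return merged


sub_queries = decompose(
    "What are the differences between the 2023 and 2024 healthcare plans?"
)
```

Each sub-query now targets a single plan year, so a chunk about the 2024 plan no longer has to compete with 2023 chunks for the same top-$K$ slots.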

Failure Mode 3: The "Lost in the Middle" Phenomenon

If you retrieve too many chunks (e.g., $K=20$) to ensure you don't miss anything, you run into a new problem. Liu et al. (2023) found that LLMs use information most reliably when it appears at the beginning or end of their context window, and often overlook information buried in the middle.
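One mitigation sometimes used alongside retrieving fewer chunks is to reorder the retrieved chunks so that the highest-ranked ones sit at the edges of the prompt and the weakest ones land in the middle. The function below is a minimal sketch of that idea; its name is hypothetical, and whether the reordering helps depends on the model, so it should be validated with evaluation rather than assumed.

```python
def reorder_for_long_context(ranked_chunks: list[str]) -> list[str]:
    """Place the best-ranked chunks at the start and end of the context.

    `ranked_chunks` is assumed to be sorted best-first. Chunks at even
    positions fill the front in order; chunks at odd positions fill the
    back in reverse, so the weakest chunks end up in the middle.
    """
    front: list[str] = []
    back: list[str] = []
    for position, chunk in enumerate(ranked_chunks):
        if position % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    return front + back[::-1]


# "c1" is the most relevant chunk, "c5" the least.
reordered = reorder_for_long_context(["c1", "c2", "c3", "c4", "c5"])
```

Here `reordered` is `["c1", "c3", "c5", "c4", "c2"]`: the top two chunks occupy the first and last positions, and the least relevant chunk sits in the middle.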

Failure Mode 4: Exact Keyword Matches

Dense embeddings capture semantic meaning well, but they are unreliable for exact matches. If a user searches for a specific error code like ERR-90210, a dense vector search might return chunks about ERR-90211 because the two are "semantically similar" (both are error codes, and they differ by a single character).
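A keyword-side check can catch what the dense search blurs. The sketch below pulls identifier-like tokens out of the query and keeps only chunks that contain them verbatim. The identifier pattern, the function name, and the chunk texts (including the error descriptions) are hypothetical and exist only to make the example runnable.

```python
import re

# Assumed identifier shape: two or more capitals, a hyphen, then digits.
ID_PATTERN = re.compile(r"\b[A-Z]{2,}-\d+\b")


def exact_id_filter(query: str, chunks: list[str]) -> list[str]:
    """Keep only chunks containing an identifier from the query verbatim.

    If the query contains no identifier, the chunks pass through unchanged.
    """
    ids = ID_PATTERN.findall(query)
    if not ids:
        return chunks
    return [
        chunk
        for chunk in chunks
        if any(re.search(rf"\b{re.escape(i)}\b", chunk) for i in ids)
    ]


# Hypothetical chunks; the descriptions are invented for illustration.
chunks = [
    "ERR-90210: the request was rejected by the gateway.",
    "ERR-90211: the request timed out before reaching the gateway.",
]
matches = exact_id_filter("How do I fix ERR-90210?", chunks)
```

Only the ERR-90210 chunk survives the filter, even though the two chunks are nearly identical in wording.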

To solve these problems, we must move beyond Naive RAG and implement Advanced RAG techniques: Query Translation, Hybrid Search, and Reranking.
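As a preview of how hybrid search merges candidates from dense and keyword retrieval, here is a minimal sketch of Reciprocal Rank Fusion (RRF), one common merging method. The source does not say which fusion method the course uses, so treat RRF as an assumed example; the document IDs are hypothetical, and $k = 60$ is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several best-first rankings into one.

    Each document scores 1 / (k + rank) for every list it appears in,
    with rank starting at 1; documents are returned by descending score.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from the two retrievers, best match first.
dense_results = ["doc_b", "doc_a", "doc_c"]
keyword_results = ["doc_a", "doc_d", "doc_b"]

fused = reciprocal_rank_fusion([dense_results, keyword_results])
```

Here `fused` is `["doc_a", "doc_b", "doc_d", "doc_c"]`: documents found by both retrievers rise to the top, and the merged list can then be passed to a reranker.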