Retrieval Mechanics

Retrieval systems find relevant information from potentially massive document collections. Their efficiency determines system latency and cost, while their effectiveness determines the quality of the responses built on top of them.

Finding exact nearest neighbors in high-dimensional spaces becomes computationally prohibitive at scale. Approximate nearest neighbor (ANN) algorithms make search practical by trading a small amount of accuracy for large gains in speed:

Hierarchical Navigable Small World (HNSW):

  • Creates a multi-layer graph connecting vectors by similarity
  • Enables approximately logarithmic-time search by navigating from coarse upper layers toward close neighbors in the lower layers
  • Balances speed and accuracy through adjustable parameters (e.g., M, efConstruction, efSearch)
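The navigation idea can be sketched as a greedy walk over a single proximity graph. This is a toy with a hand-built graph and squared-Euclidean distance; a real HNSW index builds several such layers, chooses entry points from the top layer, and tracks a candidate list rather than a single current node:

```python
def greedy_search(graph, vectors, query, entry):
    """Walk the graph greedily: move to whichever neighbor of the current
    node is closest to the query; stop when no neighbor improves."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    current = entry
    while True:
        best = min(graph[current], key=lambda n: dist(vectors[n], query))
        if dist(vectors[best], query) >= dist(vectors[current], query):
            return current
        current = best

# Four points on a line, each linked to its immediate neighbors.
vectors = {0: [0.0], 1: [1.0], 2: [2.0], 3: [3.0]}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
found = greedy_search(graph, vectors, [2.9], entry=0)  # walks 0 -> 1 -> 2 -> 3
```

Starting far from the target, the walk converges on the nearest stored vector without ever computing distances to the whole collection.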

Inverted File Index with Product Quantization (IVF-PQ):

  • Partitions vector space into clusters for coarse filtering
  • Compresses vectors through product quantization to reduce memory requirements
  • Enables billion-scale vector search on standard hardware
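Both halves of IVF-PQ can be illustrated in miniature. In this sketch the coarse centroids and the per-subspace codebooks are hard-coded toy values standing in for structures a real index would train with k-means:

```python
def squared_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, candidates):
    """Index of the closest candidate vector to `point`."""
    return min(range(len(candidates)),
               key=lambda i: squared_dist(point, candidates[i]))

# --- IVF: assign each vector to its nearest coarse centroid ---
centroids = [[0.0, 0.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]]  # assumed pre-trained
vectors = [[0.1, 0.2, 0.0, 0.1], [4.9, 5.1, 5.0, 4.8], [0.3, 0.1, 0.2, 0.0]]
inverted_lists = {0: [], 1: []}
for vec_id, v in enumerate(vectors):
    inverted_lists[nearest(v, centroids)].append(vec_id)

# --- PQ: split each vector into 2 subspaces, store one codebook id per half ---
codebooks = [  # one tiny codebook per subspace (assumed pre-trained)
    [[0.0, 0.0], [5.0, 5.0]],
    [[0.0, 0.0], [5.0, 5.0]],
]
def pq_encode(v):
    halves = [v[:2], v[2:]]
    return tuple(nearest(h, codebooks[s]) for s, h in enumerate(halves))

# A query probes only the nearest cluster's list, then scores compact codes.
query = [0.2, 0.1, 0.1, 0.1]
probe = nearest(query, centroids)   # coarse filtering step
candidates = inverted_lists[probe]  # a fraction of the collection
codes = {i: pq_encode(vectors[i]) for i in candidates}
```

The memory saving comes from the codes: each vector is reduced to a few small integers, which is what makes billion-scale collections fit on commodity machines.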

Other Common Approaches:

  • Locality-Sensitive Hashing (LSH): Hash-based techniques for approximate matching
  • Random Projection Trees: Tree structures for partitioning vector space
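A minimal sketch of the LSH idea, using random hyperplanes for cosine similarity (the dimensions and bit count here are arbitrary toy choices). Vectors with a small angle between them land on the same side of most hyperplanes, so their bit signatures mostly agree and hash buckets act as a cheap candidate filter:

```python
import random

def make_hash(dim, n_bits, seed=0):
    """Random-hyperplane LSH: one signature bit per hyperplane, recording
    which side of the plane the vector falls on."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
    def signature(vec):
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0
                     for plane in planes)
    return signature

sig = make_hash(dim=4, n_bits=8)
a = [1.0, 0.9, 0.0, 0.1]
# The signature depends only on direction, not magnitude, so any positive
# rescaling of a vector hashes to exactly the same bucket.
```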

ANN algorithms involve trade-offs between search speed, memory usage, and result accuracy that must be tuned based on application requirements.

Combining multiple retrieval approaches can overcome the limitations of individual methods:

Sparse-Dense Hybrid Retrieval:

  • Dense retrieval: Vector similarity captures semantic relationships
  • Sparse retrieval: Keyword matching (BM25) captures exact terminology
  • Hybrid approaches combine both signals for improved relevance
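One common way to combine the two signals is a weighted sum over normalised scores. This is a sketch with made-up toy scores; real systems tune the weight (here `alpha`) and often prefer rank-based fusion when score scales are incomparable:

```python
def hybrid_scores(dense, sparse, alpha=0.5):
    """Convex combination of min-max normalised dense and sparse scores.
    `alpha` weights the dense signal; both dicts map doc_id -> raw score."""
    def normalise(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {d: (s - lo) / span for d, s in scores.items()}
    dense_n, sparse_n = normalise(dense), normalise(sparse)
    docs = set(dense) | set(sparse)
    return {d: alpha * dense_n.get(d, 0.0) + (1 - alpha) * sparse_n.get(d, 0.0)
            for d in docs}

dense = {"d1": 0.92, "d2": 0.85, "d3": 0.40}  # cosine similarities (toy values)
sparse = {"d2": 11.2, "d4": 9.5, "d1": 2.0}   # BM25 scores (toy values)
combined = hybrid_scores(dense, sparse)
```

Normalisation matters here: cosine similarities and BM25 scores live on different scales, so combining them raw would let one signal dominate.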

Reciprocal Rank Fusion (RRF):

  • Merges results from multiple retrieval methods
  • Scores each item by the reciprocal of its rank in each result list
  • Provides robust performance across diverse query types
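RRF fits in a few lines: each document scores the sum of 1/(k + rank) over the lists it appears in, where k = 60 is the conventional smoothing constant. A minimal sketch with toy result lists:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of doc ids; higher fused score = better."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d2"]   # toy dense-retrieval ranking
sparse = ["d1", "d4", "d3"]  # toy keyword ranking
fused = reciprocal_rank_fusion([dense, sparse])
```

Because only ranks are used, RRF needs no score normalisation, which is what makes it robust when the underlying retrievers produce incomparable scores.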

ColBERT and Late-Interaction Models:

  • Represent texts as sets of contextualized token embeddings
  • Perform fine-grained matching between query and document terms
  • Balance computational efficiency with matching precision
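The late-interaction scoring rule (often called MaxSim) is: for each query token embedding, take its best match against any document token, then sum. A sketch with hand-made 2-d "token embeddings" standing in for real contextualized vectors:

```python
def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style MaxSim: sum over query tokens of the best
    dot product against any document token."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy embeddings (illustrative values only).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8]]  # has a good match for both query tokens
doc_b = [[0.5, 0.5]]              # one generic token
score_a = maxsim_score(query, doc_a)  # 0.9 + 0.8
score_b = maxsim_score(query, doc_b)  # 0.5 + 0.5
```

Document token embeddings can be computed and indexed offline, so at query time only the cheap dot products above remain, which is the efficiency/precision balance the bullets describe.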

Fusion approaches provide robustness against the weaknesses of individual retrieval methods, handling both semantic concepts and specific terminology effectively.

User queries often don't match document phrasing, creating retrieval challenges that can be addressed through query transformation:

Pseudo-Relevance Feedback (PRF):

  • Performs initial retrieval to find potentially relevant documents
  • Extracts key terms from these documents to expand the original query
  • Creates a more comprehensive query that matches relevant document terminology
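A minimal PRF sketch: count terms in the top-ranked documents and append the most frequent ones not already in the query (real systems weight terms, e.g. with RM3 or Rocchio, rather than counting raw frequency):

```python
from collections import Counter

def expand_query(query_terms, top_docs, n_expand=3):
    """Append the most frequent non-query terms from top-ranked documents."""
    counts = Counter(t for doc in top_docs for t in doc.split()
                     if t not in query_terms)
    return list(query_terms) + [t for t, _ in counts.most_common(n_expand)]

query = ["vector", "search"]
top_docs = [
    "approximate nearest neighbor search over vector embeddings",
    "nearest neighbor indexing for dense embeddings",
]
expanded = expand_query(query, top_docs)
```

The expanded query now also matches documents that talk about "nearest neighbor" retrieval without using the word "search".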

Neural Query Expansion:

  • Uses LLMs to generate alternative phrasings of the original query
  • Creates multiple search queries from a single user question
  • Improves recall by covering different ways information might be expressed
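The pattern is simple to sketch. Here `paraphrase` is a hypothetical stand-in for an LLM call (e.g. a chat-completion request prompted to rephrase the question), with canned outputs purely for illustration:

```python
def paraphrase(query):
    # Stand-in for an LLM call; these outputs are illustrative only.
    return [
        "how do I reset my account password",
        "steps to recover a forgotten password",
        "password reset procedure",
    ]

def expand_with_llm(query):
    """Search with the original query plus LLM-generated rephrasings,
    deduplicated while preserving order."""
    variants = [query] + paraphrase(query)
    seen, unique = set(), []
    for q in variants:
        if q not in seen:
            seen.add(q)
            unique.append(q)
    return unique

queries = expand_with_llm("how to reset password")
```

Each variant is then run as a separate retrieval, and the result lists are merged (for instance with the RRF fusion described earlier).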

Hypothetical Document Embeddings (HyDE):

  • Uses an LLM to generate an ideal answer document
  • Retrieves real documents similar to this hypothetical document
  • Bridges the query-document vocabulary gap effectively
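A sketch of the HyDE flow, where `generate_answer` is a hypothetical stand-in for an LLM call and `embed` is a toy bag-of-words embedding standing in for a real embedding model; the point is only that the *generated answer*, not the query, is what gets embedded and matched:

```python
def generate_answer(query):
    # Stand-in for an LLM call; this output is illustrative only.
    return "resetting a password requires visiting the account settings page"

def embed(text):
    """Toy bag-of-words 'embedding': word -> count."""
    words = text.lower().split()
    return {w: words.count(w) for w in set(words)}

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = lambda v: sum(x * x for x in v.values()) ** 0.5
    return dot / (norm(a) * norm(b) or 1.0)

def hyde_retrieve(query, corpus, k=1):
    """Embed a hypothetical answer instead of the query itself."""
    hypothetical = embed(generate_answer(query))
    ranked = sorted(corpus, key=lambda doc: cosine(hypothetical, embed(doc)),
                    reverse=True)
    return ranked[:k]

corpus = [
    "visit the account settings page to reset a forgotten password",
    "our pricing plans start at ten dollars per month",
]
best = hyde_retrieve("how do I reset my password?", corpus)
```

The short question shares few words with the relevant document, but the hypothetical answer shares many, which is how HyDE bridges the vocabulary gap.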

These techniques transform user questions into more effective retrieval queries, significantly improving the ability to find relevant information even when the question and the documents phrase the same idea differently.