Lexical-Semantic Fusion

Combining multiple retrieval approaches can overcome the limitations of individual methods:

Sparse-Dense Hybrid Retrieval:

  • Dense retrieval: Vector similarity captures semantic relationships
  • Sparse retrieval: Keyword matching (BM25) captures exact terminology
  • Hybrid approaches combine both signals, typically by blending normalized scores or merging the two ranked lists
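
One common way to combine the two signals is a weighted sum of normalized scores. The sketch below is a minimal illustration, not a reference implementation: the function names, the min-max normalization, and the `alpha` weight are assumptions chosen for clarity, and the score dictionaries stand in for whatever a BM25 index and a vector index would return.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the [0, 1] range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(sparse, dense, alpha=0.5):
    """Blend sparse and dense scores; alpha is the weight on the dense signal.

    Both inputs are hypothetical {doc_id: raw_score} mappings. A document
    missing from one result set contributes 0 for that signal.
    """
    sparse_n = min_max_normalize(sparse)
    dense_n = min_max_normalize(dense)
    combined = {
        doc: alpha * dense_n.get(doc, 0.0) + (1 - alpha) * sparse_n.get(doc, 0.0)
        for doc in set(sparse_n) | set(dense_n)
    }
    # Highest combined score first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    bm25 = {"a": 10.0, "b": 5.0, "c": 0.0}       # illustrative BM25 scores
    cosine = {"b": 0.9, "c": 0.8, "d": 0.1}      # illustrative cosine similarities
    for doc, score in hybrid_scores(bm25, cosine):
        print(doc, round(score, 4))
```

Normalization matters here because BM25 scores are unbounded while cosine similarities are not, so the raw values are not directly comparable. The value of `alpha` would normally be tuned on held-out queries.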

Reciprocal Rank Fusion (RRF):

  • Merges results from multiple retrieval methods
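  • Scores each item by summing 1 / (k + rank) over the result sets in which it appears, so only rank positions are used and raw scores never need to be calibrated against each other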
  • Provides robust performance across diverse query types
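
RRF is short enough to sketch directly. In the version below, each input is a list of document IDs ordered from best to worst, and a document's fused score is the sum of 1 / (k + rank) over the lists that contain it. The function name and the example lists are illustrative assumptions; k = 60 is a commonly used default rather than a required value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    rankings: iterable of lists, each ordered from best to worst.
    k: smoothing constant that dampens the influence of top ranks.
    Returns (doc_id, fused_score) pairs, highest score first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    bm25_ranking = ["a", "b", "c"]    # hypothetical sparse results
    dense_ranking = ["b", "c", "d"]   # hypothetical dense results
    for doc, score in reciprocal_rank_fusion([bm25_ranking, dense_ranking]):
        print(doc, round(score, 5))
```

Documents that appear in both lists accumulate two terms, which is why items ranked moderately well by both retrievers can outrank an item ranked first by only one.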

ColBERT and Late-Interaction Models:

  • Represent texts as sets of contextualized token embeddings
  • Perform fine-grained matching between query and document terms
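  • Balance computational efficiency with matching precision: document token embeddings can be computed offline, leaving only a lightweight token-level comparison at query time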
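
The late-interaction scoring step can be sketched as a "MaxSim" operation: for each query token embedding, take its highest cosine similarity to any document token embedding, then sum those maxima. The code below is a toy illustration under stated assumptions. The two-dimensional vectors are hand-written placeholders, whereas a real model would produce the token embeddings with a trained encoder, and the helper names are hypothetical.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best cosine match among document tokens."""
    if not doc_vecs:
        return 0.0
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(max(dot(qv, dv) for dv in d) for qv in q)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]   # two placeholder query token embeddings
    doc_a = [[1.0, 0.0], [0.0, 2.0]]   # matches both query tokens
    doc_b = [[1.0, 0.0], [1.0, 0.0]]   # matches only the first query token
    print(maxsim_score(query, doc_a), maxsim_score(query, doc_b))
```

Because the document vectors do not depend on the query, they can be encoded and stored ahead of time, and only the comparison loop runs per query.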

Because sparse and dense methods tend to fail on different queries, fusing them guards against the weaknesses of each: the combined system handles both broad semantic concepts and exact terminology.