Augmented Generation

Augmented generation is the process in which an AI combines retrieved information with its own knowledge to create accurate, helpful responses. It's like giving the AI research materials before asking it to write an essay.

This critical phase takes the relevant information found during retrieval and carefully incorporates it into the AI's response. Through special prompting techniques and control mechanisms, the AI can produce answers that are both factually accurate and naturally written.

Good prompting helps the AI use the information it retrieves when generating answers:

In-Context Learning (ICL):

  • Puts retrieved information directly in the prompt as examples
  • Tells the AI to use these examples when answering
  • Formats the retrieved context so it is clearly distinct from the instructions
  • Example: 'Here are some passages about diabetes. Use this information to answer the question: What are the common symptoms of Type 2 diabetes?'
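
To make this concrete, here is a minimal sketch of how an ICL prompt might be assembled. Everything here is illustrative: the function name, the instruction wording, and the sample passage are assumptions for the example, not a fixed API.

```python
def build_icl_prompt(passages: list[str], question: str) -> str:
    """Assemble an in-context-learning prompt: retrieved passages first,
    clearly separated from the instruction and the user's question."""
    context = "\n\n".join(
        f"Passage {i + 1}:\n{p}" for i, p in enumerate(passages)
    )
    return (
        "Here are some passages. Use this information to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_icl_prompt(
    ["Common symptoms of Type 2 diabetes include increased thirst, "
     "frequent urination, and fatigue."],
    "What are the common symptoms of Type 2 diabetes?",
)
print(prompt)
```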

Chain-of-Thought (CoT) Prompting:

  • Asks the AI to think step-by-step through its reasoning
  • Directs the AI to examine the retrieved facts before drawing conclusions
  • Makes answers more accurate on hard questions that require multiple reasoning steps
  • Example: 'First, review the information about climate change in these documents. Then, identify the key factors mentioned. Finally, explain how these factors contribute to rising sea levels.'
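
One way to encode this pattern is a reusable template that spells out the reasoning steps before the model answers. This is only a sketch; the step wording and the closing "Let's think step by step" cue are common conventions, not requirements.

```python
COT_TEMPLATE = """First, review the information in the documents below.
Then, identify the key factors mentioned.
Finally, explain how these factors answer the question.

Documents:
{documents}

Question: {question}

Let's think step by step:"""

def build_cot_prompt(documents: list[str], question: str) -> str:
    # Join retrieved documents with a visible separator so the model
    # can tell where one document ends and the next begins.
    return COT_TEMPLATE.format(
        documents="\n---\n".join(documents),
        question=question,
    )
```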

Retrieval-Augmented Prompting Patterns:

  • Context segregation: Clearly separating found information from instructions (Example: 'CONTEXT: [retrieved documents] QUESTION: [user query]')
  • Source attribution: Keeping track of where information came from for citations (Example: 'According to document #2 from the company handbook...')
  • Relevance assessment: Asking the AI to first check if the information is helpful before using it (Example: 'Review these passages and determine which are relevant to the question before answering')
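
These patterns compose naturally. The sketch below combines context segregation, source attribution, and a relevance check in a single prompt; the {"source": ..., "text": ...} document shape is an assumption made for the example.

```python
def build_rag_prompt(docs: list[dict], query: str) -> str:
    """Context-segregation prompt with numbered sources for citation.
    Each doc is assumed to look like {"source": str, "text": str}."""
    context = "\n".join(
        f"[{i + 1}] ({d['source']}) {d['text']}" for i, d in enumerate(docs)
    )
    return (
        "CONTEXT:\n"
        f"{context}\n\n"
        "INSTRUCTIONS: Review the passages above and use only those that are "
        "relevant. Cite sources as [n]. If none are relevant, say so.\n\n"
        f"QUESTION: {query}"
    )
```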

How you structure your prompt greatly affects how well the AI uses the retrieved information in its final answer.

Techniques to ensure generated content remains faithful to retrieved information:

Constrained Decoding:

  • Entity Grounding: Ensuring that entities mentioned in the answer appear in the retrieved context
  • Citation Alignment: Generating inline citations linked to specific sources
  • Factual Anchoring: Requiring statements to be traceable to retrieved content
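
True constrained decoding intervenes during token generation, but a lightweight post-hoc version of entity grounding can be sketched in a few lines. The capitalized-phrase regex below is a crude stand-in for a real named-entity recognizer.

```python
import re

def ungrounded_entities(answer: str, context: str) -> set[str]:
    """Flag capitalized phrases in the answer that never occur in the
    retrieved context. A production system would use an NER model
    instead of this regex heuristic."""
    candidates = set(
        re.findall(r"\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b", answer)
    )
    return {e for e in candidates if e.lower() not in context.lower()}

# A non-empty result is a cue to regenerate or ask the model to revise.
print(ungrounded_entities(
    "Paris is the capital of France.",
    "France's capital city is Paris.",
))  # set() -- both entities appear in the context
```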

Factual Consistency Checks:

  • NLI-Based Verification: Using natural language inference to verify claims
  • Self-Consistency: Generating multiple responses and identifying consensus
  • Uncertainty Expression: Encouraging models to express uncertainty when information is ambiguous
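
As one example of NLI-based verification, the sketch below assumes the Hugging Face transformers library and an off-the-shelf NLI cross-encoder; the checkpoint name and its label strings are examples and vary by model.

```python
from transformers import pipeline  # assumes transformers is installed

# Example checkpoint; any NLI cross-encoder works. This one predicts
# "entailment", "neutral", or "contradiction".
nli = pipeline("text-classification", model="cross-encoder/nli-deberta-v3-base")

def claim_is_supported(claim: str, evidence: str, threshold: float = 0.8) -> bool:
    """Treat retrieved evidence as the premise and the generated claim as
    the hypothesis; keep the claim only if entailment is confidently predicted."""
    top = nli({"text": evidence, "text_pair": claim})[0]
    return top["label"] == "entailment" and top["score"] >= threshold
```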

Two-Stage Generation:

  • First extracting relevant facts from retrieved documents
  • Then synthesizing these facts into coherent responses
  • Separating information extraction from text generation
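
A minimal sketch of the two-stage pattern, assuming a generic llm(prompt) -> str callable as a placeholder for whatever model API is in use:

```python
def two_stage_answer(llm, documents: list[str], question: str) -> str:
    """Stage 1 extracts facts; stage 2 writes prose constrained to them.
    `llm` is a placeholder: any callable mapping a prompt string to text."""
    joined = "\n---\n".join(documents)

    # Stage 1: pure information extraction -- no free-form writing yet.
    facts = llm(
        "List, as bullet points, only the facts from these documents that "
        f"are relevant to the question.\n\nDocuments:\n{joined}\n\n"
        f"Question: {question}"
    )

    # Stage 2: synthesis constrained to the extracted facts.
    return llm(
        "Write a concise, well-organized answer to the question using ONLY "
        f"the facts below.\n\nFacts:\n{facts}\n\nQuestion: {question}"
    )
```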

These controls balance the creative capabilities of LLMs with the factual constraints of retrieved information, reducing hallucination while maintaining fluent, natural responses.