Augmented Generation

Augmented generation is the process in which an AI combines retrieved information with its own knowledge to create accurate, helpful responses. It's like giving the AI research materials before asking it to write an essay.

This critical phase takes the relevant information found during retrieval and carefully incorporates it into the AI's response. Through special prompting techniques and control mechanisms, the AI can produce answers that are both factually accurate and naturally written.

Good prompting helps the AI use the information it retrieves when generating answers:

In-Context Learning (ICL):

  • Puts retrieved information directly in the prompt as examples
  • Tells the AI to use these examples when answering
  • Formats the retrieved context so it is clearly distinct from the instructions
  • Example: 'Here are some passages about diabetes. Use this information to answer the question: What are the common symptoms of Type 2 diabetes?'
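
To make this concrete, here is a minimal sketch of how an ICL prompt might be assembled. Everything here is illustrative: the function name, the instruction wording, and the sample passage are assumptions for the example, not a fixed API.

```python
def build_icl_prompt(passages: list[str], question: str) -> str:
    """Assemble an in-context-learning prompt: retrieved passages first,
    clearly separated from the instruction and the user's question."""
    context = "\n\n".join(
        f"Passage {i + 1}:\n{p}" for i, p in enumerate(passages)
    )
    return (
        "Here are some passages. Use this information to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_icl_prompt(
    ["Common symptoms of Type 2 diabetes include increased thirst, "
     "frequent urination, and fatigue."],
    "What are the common symptoms of Type 2 diabetes?",
)
print(prompt)
```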

Chain-of-Thought (CoT) Prompting:

  • Asks the AI to think step-by-step through its reasoning
  • Directs the AI to examine the retrieved facts before drawing conclusions
  • Makes answers more accurate on hard questions that require multiple reasoning steps
  • Example: 'First, review the information about climate change in these documents. Then, identify the key factors mentioned. Finally, explain how these factors contribute to rising sea levels.'
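
One way to encode this pattern is a reusable template that spells out the reasoning steps before the model answers. This is only a sketch; the step wording and the closing "Let's think step by step" cue are common conventions, not requirements.

```python
COT_TEMPLATE = """First, review the information in the documents below.
Then, identify the key factors mentioned.
Finally, explain how these factors answer the question.

Documents:
{documents}

Question: {question}

Let's think step by step:"""

def build_cot_prompt(documents: list[str], question: str) -> str:
    # Join retrieved documents with a visible separator so the model
    # can tell where one document ends and the next begins.
    return COT_TEMPLATE.format(
        documents="\n---\n".join(documents),
        question=question,
    )
```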

Retrieval-Augmented Prompting Patterns:

  • Context segregation: Clearly separating found information from instructions (Example: 'CONTEXT: [retrieved documents] QUESTION: [user query]')
  • Source attribution: Keeping track of where information came from for citations (Example: 'According to document #2 from the company handbook...')
  • Relevance assessment: Asking the AI to first check if the information is helpful before using it (Example: 'Review these passages and determine which are relevant to the question before answering')
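
These patterns compose naturally. The sketch below combines context segregation, source attribution, and a relevance check in a single prompt; the {"source": ..., "text": ...} document shape is an assumption made for the example.

```python
def build_rag_prompt(docs: list[dict], query: str) -> str:
    """Context-segregation prompt with numbered sources for citation.
    Each doc is assumed to look like {"source": str, "text": str}."""
    context = "\n".join(
        f"[{i + 1}] ({d['source']}) {d['text']}" for i, d in enumerate(docs)
    )
    return (
        "CONTEXT:\n"
        f"{context}\n\n"
        "INSTRUCTIONS: Review the passages above and use only those that are "
        "relevant. Cite sources as [n]. If none are relevant, say so.\n\n"
        f"QUESTION: {query}"
    )
```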

How you structure your prompt greatly affects how well the AI uses the retrieved information in its final answer.

Techniques to ensure generated content remains faithful to retrieved information:

Constrained Decoding:

  • Entity Grounding: Ensuring that entities mentioned in the answer appear in the retrieved context
  • Citation Alignment: Generating inline citations linked to specific sources
  • Factual Anchoring: Requiring statements to be traceable to retrieved content
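
True constrained decoding intervenes during token generation, but a lightweight post-hoc version of entity grounding can be sketched in a few lines. The capitalized-phrase regex below is a crude stand-in for a real named-entity recognizer.

```python
import re

def ungrounded_entities(answer: str, context: str) -> set[str]:
    """Flag capitalized phrases in the answer that never occur in the
    retrieved context. A production system would use an NER model
    instead of this regex heuristic."""
    candidates = set(
        re.findall(r"\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b", answer)
    )
    return {e for e in candidates if e.lower() not in context.lower()}

# A non-empty result is a cue to regenerate or ask the model to revise.
print(ungrounded_entities(
    "Paris is the capital of France.",
    "France's capital city is Paris.",
))  # set() -- both entities appear in the context
```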

Factual Consistency Checks:

  • NLI-Based Verification: Using natural language inference to verify claims
  • Self-Consistency: Generating multiple responses and identifying consensus
  • Uncertainty Expression: Encouraging models to express uncertainty when information is ambiguous
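
As one example of NLI-based verification, the sketch below assumes the Hugging Face transformers library and an off-the-shelf NLI cross-encoder; the checkpoint name and its label strings are examples and vary by model.

```python
from transformers import pipeline  # assumes transformers is installed

# Example checkpoint; any NLI cross-encoder works. This one predicts
# "entailment", "neutral", or "contradiction".
nli = pipeline("text-classification", model="cross-encoder/nli-deberta-v3-base")

def claim_is_supported(claim: str, evidence: str, threshold: float = 0.8) -> bool:
    """Treat retrieved evidence as the premise and the generated claim as
    the hypothesis; keep the claim only if entailment is confidently predicted."""
    top = nli({"text": evidence, "text_pair": claim})[0]
    return top["label"] == "entailment" and top["score"] >= threshold
```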

Two-Stage Generation:

  • First extracting relevant facts from retrieved documents
  • Then synthesizing these facts into coherent responses
  • Separating information extraction from text generation
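
A minimal sketch of the two-stage pattern, assuming a generic llm(prompt) -> str callable as a placeholder for whatever model API is in use:

```python
def two_stage_answer(llm, documents: list[str], question: str) -> str:
    """Stage 1 extracts facts; stage 2 writes prose constrained to them.
    `llm` is a placeholder: any callable mapping a prompt string to text."""
    joined = "\n---\n".join(documents)

    # Stage 1: pure information extraction -- no free-form writing yet.
    facts = llm(
        "List, as bullet points, only the facts from these documents that "
        f"are relevant to the question.\n\nDocuments:\n{joined}\n\n"
        f"Question: {question}"
    )

    # Stage 2: synthesis constrained to the extracted facts.
    return llm(
        "Write a concise, well-organized answer to the question using ONLY "
        f"the facts below.\n\nFacts:\n{facts}\n\nQuestion: {question}"
    )
```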

These controls balance the creative capabilities of LLMs with the factual constraints of retrieved information, reducing hallucination while maintaining fluent, natural responses.