rag-implementation

Installs: 76
Rank: #12496

Install

```bash
npx skills add https://github.com/wshobson/agents --skill rag-implementation
```
# RAG Implementation

Master Retrieval-Augmented Generation (RAG) to build LLM applications that provide accurate, grounded responses using external knowledge sources.

## When to Use This Skill

- Building Q&A systems over proprietary documents
- Creating chatbots with current, factual information
- Implementing semantic search with natural language queries
- Reducing hallucinations with grounded responses
- Enabling LLMs to access domain-specific knowledge
- Building documentation assistants
- Creating research tools with source citation
## Core Components

### 1. Vector Databases

**Purpose:** Store and retrieve document embeddings efficiently.

**Options:**

- **Pinecone**: Managed, scalable, serverless
- **Weaviate**: Open-source, hybrid search, GraphQL
- **Milvus**: High performance, on-premise
- **Chroma**: Lightweight, easy to use, local development
- **Qdrant**: Fast, filtered search, Rust-based
- **pgvector**: PostgreSQL extension, SQL integration
### 2. Embeddings

**Purpose:** Convert text to numerical vectors for similarity search.

**Models (2026):**

| Model | Dimensions | Best For |
| --- | --- | --- |
| voyage-3-large | 1024 | Claude apps (Anthropic recommended) |
| voyage-code-3 | 1024 | Code search |
| text-embedding-3-large | 3072 | OpenAI apps, high accuracy |
| text-embedding-3-small | 1536 | OpenAI apps, cost-effective |
| bge-large-en-v1.5 | 1024 | Open source, local deployment |
| multilingual-e5-large | 1024 | Multi-language support |
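
To make the "numerical vectors for similarity search" idea concrete, here is a minimal sketch that embeds a query and a few documents with `VoyageAIEmbeddings` (already used later in this skill) and ranks the documents by cosine similarity. The sample texts are placeholders, and it assumes a Voyage AI API key is configured in the environment; any model from the table works the same way.

```python
# Minimal sketch: rank documents by cosine similarity to a query embedding.
import math

from langchain_voyageai import VoyageAIEmbeddings

embeddings = VoyageAIEmbeddings(model="voyage-3-large")

documents = [
    "RAG grounds LLM answers in retrieved documents.",
    "Vector databases store embeddings for similarity search.",
    "BM25 is a sparse, keyword-based retrieval method.",
]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query_vec = embeddings.embed_query("How does RAG reduce hallucinations?")
doc_vecs = embeddings.embed_documents(documents)

ranked = sorted(zip(documents, doc_vecs), key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
for text, vec in ranked:
    print(f"{cosine(query_vec, vec):.3f}  {text}")
```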
### 3. Retrieval Strategies

**Approaches:**

- **Dense Retrieval**: Semantic similarity via embeddings
- **Sparse Retrieval**: Keyword matching (BM25, TF-IDF)
- **Hybrid Search**: Combine dense and sparse results with weighted fusion (see the RRF sketch after this list)
- **Multi-Query**: Generate multiple query variations
- **HyDE**: Generate hypothetical documents for better retrieval
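
Hybrid search needs a way to merge the two ranked lists. Reciprocal Rank Fusion (RRF) is the common choice: each document scores `1 / (k + rank)` in every list that returns it, and the scores are summed. A minimal, library-free sketch; the `k = 60` constant and the sample rankings are assumptions for illustration.

```python
# Minimal Reciprocal Rank Fusion: merge ranked result lists by summed 1/(k + rank).
from collections import defaultdict


def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Example: fuse a keyword (BM25) ranking with a dense (embedding) ranking.
bm25_ranked = ["doc3", "doc1", "doc7"]
dense_ranked = ["doc1", "doc4", "doc3"]
print(reciprocal_rank_fusion([bm25_ranked, dense_ranked]))  # doc1 and doc3 rise to the top
```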
### 4. Reranking

**Purpose:** Improve retrieval quality by reordering results.

**Methods:**

- **Cross-Encoders**: BERT-based reranking (ms-marco-MiniLM)
- **Cohere Rerank**: API-based reranking
- **Maximal Marginal Relevance (MMR)**: Diversity plus relevance
- **LLM-based**: Use an LLM to score relevance

## Quick Start with LangGraph

```python
from typing import TypedDict

from langchain_anthropic import ChatAnthropic
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_pinecone import PineconeVectorStore
from langchain_voyageai import VoyageAIEmbeddings
from langgraph.graph import END, START, StateGraph


class RAGState(TypedDict):
    question: str
    context: list[Document]
    answer: str


# Initialize components
llm = ChatAnthropic(model="claude-sonnet-4-6")
embeddings = VoyageAIEmbeddings(model="voyage-3-large")
vectorstore = PineconeVectorStore(index_name="docs", embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

# RAG prompt
rag_prompt = ChatPromptTemplate.from_template(
    """Answer based on the context below. If you cannot answer, say so.

Context: {context}

Question: {question}

Answer:"""
)


async def retrieve(state: RAGState) -> RAGState:
    """Retrieve relevant documents."""
    docs = await retriever.ainvoke(state["question"])
    return {"context": docs}


async def generate(state: RAGState) -> RAGState:
    """Generate answer from context."""
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    messages = rag_prompt.format_messages(context=context_text, question=state["question"])
    response = await llm.ainvoke(messages)
    return {"answer": response.content}


# Build RAG graph
builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("generate", generate)
builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)
rag_chain = builder.compile()

# Use
result = await rag_chain.ainvoke({"question": "What are the main features?"})
print(result["answer"])
```
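
The quick start assumes the `docs` Pinecone index is already populated. A minimal ingestion sketch under that assumption, reusing the `vectorstore` from above; the sample document text and metadata are placeholders.

```python
# Minimal ingestion sketch: split raw documents and add them to the vector store above.
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter

raw_docs = [
    Document(
        page_content="Acme Handbook\n\nSupport hours are 9am-5pm. Refunds take 5 business days.",
        metadata={"source": "handbook"},
    ),
]

splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(raw_docs)

# Async add; IDs are generated automatically when not supplied.
await vectorstore.aadd_documents(chunks)
```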

## Advanced RAG Patterns

### Pattern 1: Hybrid Search with RRF

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Sparse retriever (BM25 for keyword matching)
bm25_retriever = BM25Retriever.from_documents(documents)
bm25_retriever.k = 10

# Dense retriever (embeddings for semantic search)
dense_retriever = vectorstore.as_retriever(search_kwargs={"k": 10})

# Combine with Reciprocal Rank Fusion weights: 30% keyword, 70% semantic
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, dense_retriever],
    weights=[0.3, 0.7],
)
```

### Pattern 2: Multi-Query Retrieval

```python
from langchain.retrievers.multi_query import MultiQueryRetriever

# Generate multiple query perspectives for better recall
multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(search_kwargs={"k": 5}),
    llm=llm,
)

# Single query → multiple variations → combined results
results = await multi_query_retriever.ainvoke("What is the main topic?")
```

### Pattern 3: Contextual Compression

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# Compressor extracts only relevant portions
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 10}),
)

# Returns only relevant parts of documents
compressed_docs = await compression_retriever.ainvoke("specific query")
```

### Pattern 4: Parent Document Retriever

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Small chunks for precise retrieval, large chunks for context
child_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)

# Store for parent documents
docstore = InMemoryStore()
parent_retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=docstore,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)

# Add documents (splits children, stores parents)
await parent_retriever.aadd_documents(documents)

# Retrieval returns parent documents with full context
results = await parent_retriever.ainvoke("query")
```

### Pattern 5: HyDE (Hypothetical Document Embeddings)

```python
from langchain_core.prompts import ChatPromptTemplate


class HyDEState(TypedDict):
    question: str
    hypothetical_doc: str
    context: list[Document]
    answer: str


hyde_prompt = ChatPromptTemplate.from_template(
    """Write a detailed passage that would answer this question:

Question: {question}

Passage:"""
)


async def generate_hypothetical(state: HyDEState) -> HyDEState:
    """Generate hypothetical document for better retrieval."""
    messages = hyde_prompt.format_messages(question=state["question"])
    response = await llm.ainvoke(messages)
    return {"hypothetical_doc": response.content}


async def retrieve_with_hyde(state: HyDEState) -> HyDEState:
    """Retrieve using hypothetical document."""
    # Use hypothetical doc for retrieval instead of original query
    docs = await retriever.ainvoke(state["hypothetical_doc"])
    return {"context": docs}


# Build HyDE RAG graph
builder = StateGraph(HyDEState)
builder.add_node("hypothetical", generate_hypothetical)
builder.add_node("retrieve", retrieve_with_hyde)
builder.add_node("generate", generate)
builder.add_edge(START, "hypothetical")
builder.add_edge("hypothetical", "retrieve")
builder.add_edge("retrieve", "generate")
builder.add_edge("generate", END)
hyde_rag = builder.compile()
```

## Document Chunking Strategies

### Recursive Character Text Splitter

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
    separators=["\n\n", "\n", ". ", " ", ""],  # Tried in order
)
chunks = splitter.split_documents(documents)
```

### Token-Based Splitting

```python
from langchain_text_splitters import TokenTextSplitter

splitter = TokenTextSplitter(
    chunk_size=512,
    chunk_overlap=50,
    encoding_name="cl100k_base",  # OpenAI tiktoken encoding
)
```

### Semantic Chunking

```python
from langchain_experimental.text_splitter import SemanticChunker

splitter = SemanticChunker(
    embeddings=embeddings,
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=95,
)
```

### Markdown Header Splitter

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]
splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=headers_to_split_on,
    strip_headers=False,
)
```
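
These splitters compose: a common pipeline first splits on Markdown headers so section titles survive as metadata, then applies a size-bounded recursive split to any oversized sections. A hedged sketch under that assumption; the sample Markdown string and chunk sizes are placeholders.

```python
# Hypothetical two-stage chunking: header-aware split first, then size-bounded splits.
markdown_text = "# Guide\n\nIntro text.\n\n## Setup\n\nLong setup instructions..."  # placeholder

header_chunks = splitter.split_text(markdown_text)  # MarkdownHeaderTextSplitter from above
size_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = size_splitter.split_documents(header_chunks)

for chunk in chunks[:3]:
    print(chunk.metadata, len(chunk.page_content))
```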

## Vector Store Configurations

### Pinecone (Serverless)

```python
import os

from pinecone import Pinecone, ServerlessSpec
from langchain_pinecone import PineconeVectorStore

# Initialize Pinecone client
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create index if needed
if "my-index" not in pc.list_indexes().names():
    pc.create_index(
        name="my-index",
        dimension=1024,  # voyage-3-large dimensions
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

# Create vector store
index = pc.Index("my-index")
vectorstore = PineconeVectorStore(index=index, embedding=embeddings)
```

### Weaviate

```python
import weaviate
from langchain_weaviate import WeaviateVectorStore

client = weaviate.connect_to_local()  # or connect_to_weaviate_cloud()

vectorstore = WeaviateVectorStore(
    client=client,
    index_name="Documents",
    text_key="content",
    embedding=embeddings,
)
```

### Chroma (Local Development)

```python
from langchain_chroma import Chroma

vectorstore = Chroma(
    collection_name="my_collection",
    embedding_function=embeddings,
    persist_directory="./chroma_db",
)
```

### pgvector (PostgreSQL)

```python
from langchain_postgres.vectorstores import PGVector

connection_string = "postgresql+psycopg://user:pass@localhost:5432/vectordb"
vectorstore = PGVector(
    embeddings=embeddings,
    collection_name="documents",
    connection=connection_string,
)
```

## Retrieval Optimization

### 1. Metadata Filtering

```python
from datetime import datetime

from langchain_core.documents import Document

# Add metadata during indexing
docs_with_metadata = []
for doc in documents:
    doc.metadata.update({
        "source": doc.metadata.get("source", "unknown"),
        "category": determine_category(doc.page_content),  # user-defined classifier
        "date": datetime.now().isoformat(),
    })
    docs_with_metadata.append(doc)

# Filter during retrieval
results = await vectorstore.asimilarity_search(
    "query",
    filter={"category": "technical"},
    k=5,
)
```

### 2. Maximal Marginal Relevance (MMR)

```python
# Balance relevance with diversity
results = await vectorstore.amax_marginal_relevance_search(
    "query",
    k=5,
    fetch_k=20,       # Fetch 20 candidates, return the top 5 diverse results
    lambda_mult=0.5,  # 0 = max diversity, 1 = max relevance
)
```

### 3. Reranking with Cross-Encoder

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")


async def retrieve_and_rerank(query: str, k: int = 5) -> list[Document]:
    # Get initial results
    candidates = await vectorstore.asimilarity_search(query, k=20)

    # Rerank
    pairs = [[query, doc.page_content] for doc in candidates]
    scores = reranker.predict(pairs)

    # Sort by score and take top k
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, score in ranked[:k]]
```

### 4. Cohere Rerank

```python
from langchain_cohere import CohereRerank

reranker = CohereRerank(model="rerank-english-v3.0", top_n=5)

# Wrap retriever with reranking
reranked_retriever = ContextualCompressionRetriever(
    base_compressor=reranker,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 20}),
)
```

## Prompt Engineering for RAG

### Contextual Prompt with Citations

```python
rag_prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the context below. Include citations using [1], [2], etc.
If you cannot answer based on the context, say "I don't have enough information."

Context: {context}

Question: {question}

Instructions:
1. Use only information from the context
2. Cite sources with [1], [2] format
3. If uncertain, express uncertainty

Answer (with citations):"""
)
```

### Structured Output for RAG

```python
from pydantic import BaseModel, Field


class RAGResponse(BaseModel):
    answer: str = Field(description="The answer based on context")
    confidence: float = Field(description="Confidence score 0-1")
    sources: list[str] = Field(description="Source document IDs used")
    reasoning: str = Field(description="Brief reasoning for the answer")


# Use with structured output
structured_llm = llm.with_structured_output(RAGResponse)
```
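
To plug structured output into the pipeline, the generation node can call `structured_llm` instead of the raw model. A hedged sketch, reusing `RAGState` and `rag_prompt` from earlier; the node name and the choice to keep only `answer` in state are assumptions.

```python
# Hypothetical generate node that returns a structured RAGResponse instead of raw text.
async def generate_structured(state: RAGState) -> RAGState:
    context_text = "\n\n".join(doc.page_content for doc in state["context"])
    messages = rag_prompt.format_messages(context=context_text, question=state["question"])
    response: RAGResponse = await structured_llm.ainvoke(messages)
    # Confidence, sources, and reasoning remain available on the response object.
    return {"answer": response.answer}
```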

## Evaluation Metrics

```python
from typing import TypedDict


class RAGEvalMetrics(TypedDict):
    retrieval_precision: float  # Relevant retrieved docs / all retrieved docs
    retrieval_recall: float     # Relevant retrieved docs / all relevant docs
    answer_relevance: float     # Answer addresses the question
    faithfulness: float         # Answer grounded in the context
    context_relevance: float    # Context relevant to the question


async def evaluate_rag_system(rag_chain, test_cases: list[dict]) -> RAGEvalMetrics:
    """Evaluate RAG system on test cases."""
    metrics = {k: [] for k in RAGEvalMetrics.__annotations__}

    for test in test_cases:
        result = await rag_chain.ainvoke({"question": test["question"]})

        # Retrieval metrics
        retrieved_ids = {doc.metadata["id"] for doc in result["context"]}
        relevant_ids = set(test["relevant_doc_ids"])
        precision = len(retrieved_ids & relevant_ids) / len(retrieved_ids)
        recall = len(retrieved_ids & relevant_ids) / len(relevant_ids)
        metrics["retrieval_precision"].append(precision)
        metrics["retrieval_recall"].append(recall)

        # Use LLM-as-judge for quality metrics
        quality = await evaluate_answer_quality(
            question=test["question"],
            answer=result["answer"],
            context=result["context"],
            expected=test.get("expected_answer"),
        )
        metrics["answer_relevance"].append(quality["relevance"])
        metrics["faithfulness"].append(quality["faithfulness"])
        metrics["context_relevance"].append(quality["context_relevance"])

    return {k: sum(v) / len(v) for k, v in metrics.items()}
```
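
`evaluate_answer_quality` is referenced above but not defined in this skill. A hedged LLM-as-judge sketch that reuses the structured-output pattern from earlier; the judge prompt wording and the 0-1 score fields are assumptions, not part of the original skill.

```python
# Hypothetical LLM-as-judge helper producing the quality scores used above.
from pydantic import BaseModel, Field


class AnswerQuality(BaseModel):
    relevance: float = Field(description="Does the answer address the question? 0-1")
    faithfulness: float = Field(description="Is the answer grounded in the context? 0-1")
    context_relevance: float = Field(description="Is the retrieved context relevant? 0-1")


judge_prompt = ChatPromptTemplate.from_template(
    """Rate the answer for relevance, faithfulness, and context relevance (each 0-1).

Question: {question}
Context: {context}
Answer: {answer}
Expected answer (may be empty): {expected}"""
)

judge = llm.with_structured_output(AnswerQuality)


async def evaluate_answer_quality(question, answer, context, expected=None) -> dict:
    context_text = "\n\n".join(doc.page_content for doc in context)
    messages = judge_prompt.format_messages(
        question=question, context=context_text, answer=answer, expected=expected or ""
    )
    scores = await judge.ainvoke(messages)
    return scores.model_dump()
```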
