Example: A Vector-Store-Based Question-Answering System

This example shows how to use LangChain 1.2 to build a question-answering system backed by a vector store, capable of answering user questions from the content of loaded documents.

Features

  • Document loading and processing
  • Vector storage and retrieval
  • Document-grounded question answering
  • An extensible architecture
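The retrieval idea behind these features can be sketched in plain Python, independent of LangChain: document chunks are embedded as vectors, and a query is answered from the chunks whose vectors are most similar to the query's. The `embed` function below is a toy bag-of-words counter standing in for a real embedding model; it is illustrative only.

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': bag-of-words term counts (stand-in for a real model)."""
    return Counter(t.strip(".,?!") for t in text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = [
    "LangChain is a framework for building LLM applications.",
    "FAISS is a library for efficient similarity search.",
]
print(retrieve("what is LangChain?", chunks))
```

A real system replaces `embed` with a learned embedding model and the linear scan in `retrieve` with an index such as FAISS, but the ranking principle is the same.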

Implementation

1. Install dependencies

bash
pip install langchain langchain-openai python-dotenv faiss-cpu langchain-community

2. Create the configuration file

text
# .env
OPENAI_API_KEY=your-api-key

3. Implement the QA system

python
# qa_system.py
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain.chains import RetrievalQA
from dotenv import load_dotenv
import os

# Load environment variables from .env
load_dotenv()

class QASystem:
    def __init__(self, model_name="gpt-3.5-turbo", temperature=0.7):
        # Initialize the embedding model
        self.embeddings = OpenAIEmbeddings(
            api_key=os.environ.get("OPENAI_API_KEY")
        )
        
        # Initialize the chat model
        self.llm = ChatOpenAI(
            api_key=os.environ.get("OPENAI_API_KEY"),
            model_name=model_name,
            temperature=temperature
        )
        
        # Vector store and QA chain are created when a document is loaded
        self.vectorstore = None
        self.qa_chain = None
    
    def load_document(self, file_path):
        """Load a document and build the vector store."""
        # Load the document
        loader = TextLoader(file_path)
        documents = loader.load()
        
        # Split the document into chunks
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
        texts = text_splitter.split_documents(documents)
        
        # Build the vector store
        self.vectorstore = FAISS.from_documents(texts, self.embeddings)
        
        # Build the retrieval QA chain
        self.qa_chain = RetrievalQA.from_chain_type(
            llm=self.llm,
            chain_type="stuff",
            retriever=self.vectorstore.as_retriever()
        )
        
        print(f"Document loaded successfully. Split into {len(texts)} chunks.")
    
    def query(self, question):
        """Answer a question based on the loaded document."""
        if not self.vectorstore:
            return "Please load a document first."
        
        response = self.qa_chain.invoke({"query": question})
        return response["result"]

if __name__ == "__main__":
    # Initialize the QA system
    qa_system = QASystem()
    
    # Load the document
    document_path = "path/to/document.txt"
    qa_system.load_document(document_path)
    
    print("Q&A System: Hello! Ask me anything about the document.")
    
    # Interactive loop
    while True:
        user_input = input("You: ")
        if user_input.lower() in ["exit", "quit", "bye"]:
            print("Q&A System: Goodbye!")
            break
        
        response = qa_system.query(user_input)
        print(f"Q&A System: {response}")
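What `chunk_size` and `chunk_overlap` control in `CharacterTextSplitter` can be approximated with a fixed-width sketch in plain Python. This is illustrative only: the real splitter is separator-aware and splits on text boundaries rather than at exact character offsets.

```python
def split_text(text, chunk_size=1000, chunk_overlap=0):
    """Fixed-width chunking: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so consecutive chunks share
    chunk_overlap characters of context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "abcdefghij" * 10  # 100 characters
chunks = split_text(doc, chunk_size=40, chunk_overlap=10)
print(len(chunks), [len(c) for c in chunks])
```

A non-zero overlap keeps a sentence that straddles a chunk boundary retrievable from both neighboring chunks, at the cost of some duplicated storage.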

4. Prepare a document

Create a document.txt file containing some information about LangChain:

LangChain is a framework for building applications powered by language models. It provides a set of tools, components, and interfaces that simplify the process of creating LLM-powered applications.

The core components of LangChain include:

1. Models: Interfaces to various language models (LLMs, chat models, embedding models)
2. Prompts: Templates for formatting input to language models
3. Chains: Combinations of components to create workflows using LCEL
4. Agents: Entities that can make decisions and use tools
5. Tools: External services that agents can use
6. Memory: Components for storing and managing conversation history
7. Vector Stores: Databases for storing and retrieving embeddings
8. Retrievers: Components for retrieving relevant information

LangChain supports various model providers, including OpenAI, Anthropic, Hugging Face, and more. It also supports local models through libraries like ctransformers and Ollama.

To get started with LangChain 1.2, you can install it using pip:

pip install langchain

For specific integrations, you may need to install additional packages, such as:

pip install langchain-openai  # For OpenAI integration
pip install langchain-anthropic  # For Anthropic integration
pip install langchain-huggingface  # For Hugging Face integration

5. Run the QA system

bash
python qa_system.py

Example output

Document loaded successfully. Split into 3 chunks.
Q&A System: Hello! Ask me anything about the document.
You: What is LangChain?
Q&A System: LangChain is a framework for building applications powered by language models. It provides a set of tools, components, and interfaces that simplify the process of creating LLM-powered applications.

You: What are the core components of LangChain?
Q&A System: The core components of LangChain include:

1. Models: Interfaces to various language models (LLMs, chat models, embedding models)
2. Prompts: Templates for formatting input to language models
3. Chains: Combinations of components to create workflows using LCEL
4. Agents: Entities that can make decisions and use tools
5. Tools: External services that agents can use
6. Memory: Components for storing and managing conversation history
7. Vector Stores: Databases for storing and retrieving embeddings
8. Retrievers: Components for retrieving relevant information

You: How to install LangChain?
Q&A System: To get started with LangChain 1.2, you can install it using pip:

pip install langchain

For specific integrations, you may need to install additional packages, such as:

pip install langchain-openai  # For OpenAI integration
pip install langchain-anthropic  # For Anthropic integration
pip install langchain-huggingface  # For Hugging Face integration

You: exit
Q&A System: Goodbye!

Extensions

1. Support multiple documents

python
def load_documents(self, file_paths):
    """Load multiple documents and build the vector store."""
    documents = []
    for file_path in file_paths:
        loader = TextLoader(file_path)
        documents.extend(loader.load())
    
    # Split the documents into chunks
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.split_documents(documents)
    
    # Build the vector store
    self.vectorstore = FAISS.from_documents(texts, self.embeddings)
    
    # Build the retrieval QA chain
    self.qa_chain = RetrievalQA.from_chain_type(
        llm=self.llm,
        chain_type="stuff",
        retriever=self.vectorstore.as_retriever()
    )
    
    print(f"Loaded {len(file_paths)} documents. Split into {len(texts)} chunks.")
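The list of paths passed to `load_documents` can be collected with the standard library. The snippet below builds a temporary demo directory so it is self-contained; in practice you would glob your own documents directory.

```python
import tempfile
from pathlib import Path

# Collect every .txt file under a directory (here, a temporary demo directory)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt"):
        (Path(tmp) / name).write_text("sample content")
    file_paths = sorted(str(p) for p in Path(tmp).glob("*.txt"))
    print(file_paths)
```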

2. Use different chain types

python
# Build the retrieval chain with the map_reduce chain type
self.qa_chain = RetrievalQA.from_chain_type(
    llm=self.llm,
    chain_type="map_reduce",
    retriever=self.vectorstore.as_retriever()
)

# Or use the refine chain type
self.qa_chain = RetrievalQA.from_chain_type(
    llm=self.llm,
    chain_type="refine",
    retriever=self.vectorstore.as_retriever()
)
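The difference between `stuff` and `map_reduce` can be illustrated with a toy: `stuff` packs all retrieved chunks into a single prompt and makes one model call, while `map_reduce` queries each chunk separately and then combines the partial answers in a final call. The `fake_llm` function below is a hypothetical stand-in, not a real model call.

```python
def fake_llm(prompt):
    """Hypothetical stand-in for an LLM call: echoes the prompt length."""
    return f"answer({len(prompt)} chars)"

def stuff_chain(question, chunks):
    # "stuff": concatenate every chunk into one prompt -> a single LLM call
    context = "\n".join(chunks)
    return fake_llm(f"{context}\n\nQ: {question}")

def map_reduce_chain(question, chunks):
    # "map": one LLM call per chunk produces a partial answer
    partials = [fake_llm(f"{c}\n\nQ: {question}") for c in chunks]
    # "reduce": a final call combines the partial answers
    return fake_llm("\n".join(partials) + f"\n\nQ: {question}")

docs = ["chunk one", "chunk two", "chunk three"]
print(stuff_chain("what?", docs))       # 1 LLM call total
print(map_reduce_chain("what?", docs))  # 4 LLM calls total (3 map + 1 reduce)
```

`stuff` is cheapest but is limited by the model's context window; `map_reduce` scales to more chunks at the cost of extra calls, and `refine` iteratively improves one answer chunk by chunk.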

3. Add memory

python
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory

# In-memory store for per-session message histories
store = {}

def get_session_history(session_id: str):
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

# Build a retrieval chain with memory.
# Note: a custom prompt template is needed to include the chat history.
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the following context to answer the question."),
    MessagesPlaceholder(variable_name="history"),
    ("human", "Context: {context}\n\nQuestion: {question}")
])

# Base chain: prompt piped into the model
base_chain = prompt | self.llm

# Wrap the chain with message history
chain_with_history = RunnableWithMessageHistory(
    base_chain,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history"
)
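The `get_session_history` pattern above keeps one history object per session id, so different users' conversations stay isolated. Its mechanics reduce to a dictionary of message lists, as this LangChain-free sketch shows (`SimpleHistory` is an illustrative stand-in for `ChatMessageHistory`).

```python
class SimpleHistory:
    """Minimal stand-in for ChatMessageHistory: an append-only message list."""
    def __init__(self):
        self.messages = []

    def add_message(self, role, content):
        self.messages.append((role, content))

store = {}

def get_session_history(session_id):
    # Create the history lazily on first use, then reuse it for the session
    if session_id not in store:
        store[session_id] = SimpleHistory()
    return store[session_id]

get_session_history("alice").add_message("human", "Hi")
get_session_history("alice").add_message("ai", "Hello!")
get_session_history("bob").add_message("human", "Hey")
print(len(get_session_history("alice").messages))  # sessions are isolated
```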

4. Use a different vector store

python
# Use the Chroma vector store
from langchain_community.vectorstores import Chroma

# Build the vector store
self.vectorstore = Chroma.from_documents(texts, self.embeddings)

# Or use the Pinecone vector store
from langchain_community.vectorstores import Pinecone
import pinecone

# Initialize Pinecone (this uses the legacy pinecone-client v2 API)
pinecone.init(
    api_key="your-pinecone-api-key",
    environment="your-pinecone-environment"
)

# Create the index if it does not already exist
index_name = "langchain-demo"
if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        name=index_name,
        dimension=1536,  # dimensionality of the OpenAI embedding model
        metric="cosine"
    )

# Build the vector store
self.vectorstore = Pinecone.from_documents(
    texts, 
    self.embeddings, 
    index_name=index_name
)
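Backends are swappable because vector stores share a common shape: a `from_documents(...)` constructor plus a similarity-search method. The toy in-memory store below mimics that interface; its shared-word "similarity" is a deliberate simplification of real embedding-based ranking.

```python
class InMemoryVectorStore:
    """Toy vector store mimicking the from_documents(...) / search interface.
    Real stores (FAISS, Chroma, Pinecone) rank by embedding similarity;
    this sketch ranks by shared-word count for illustration."""
    def __init__(self, texts):
        self.texts = texts

    @classmethod
    def from_documents(cls, texts, embeddings=None):
        # embeddings is accepted for interface parity but unused in this toy
        return cls(texts)

    def similarity_search(self, query, k=1):
        q = set(query.lower().split())
        ranked = sorted(self.texts,
                        key=lambda t: len(q & set(t.lower().split())),
                        reverse=True)
        return ranked[:k]

vs = InMemoryVectorStore.from_documents(
    ["FAISS does similarity search", "Chroma is a vector database"])
print(vs.similarity_search("what is a vector database", k=1))
```

Because the QA chain only depends on this interface (via `as_retriever()` in the real library), switching from FAISS to Chroma or Pinecone changes only the construction code, not the rest of the system.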

Summary

This example showed how to build a vector-store-based question-answering system with LangChain 1.2. Through it, you have seen:

  1. How to load and process documents
  2. How to create and use a vector store
  3. How to build and run a retrieval chain
  4. How to extend the QA system's functionality

You can further extend and customize this QA system to suit your needs and build more complex LLM applications.