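System Deployment
Deploying a RAG system to production is a key step in making sure it runs reliably. This chapter covers deployment strategies, methods, and best practices for RAG systems.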
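1. Deployment Architecture
Deployment options
- Local deployment: runs on your own servers
- Cloud deployment: runs on a cloud provider's platform
- Containerized deployment: packaged and run as Docker containers
- Serverless deployment: runs on a serverless platform
Selection factors
- Performance: required response time and throughput
- Scalability: how easily capacity can grow with load
- Reliability: stability and availability requirements
- Cost: deployment and ongoing maintenance costs
- Security: data security and access control requirements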
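2. Local Deployment
Prerequisites
- Operating system: Linux, Windows, or macOS
- Python: 3.8 or later
- Dependencies: installed according to the project's requirements
- Hardware: sufficient CPU, memory, and storage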
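The prerequisites above can be verified before anything is installed. The following is a minimal sketch using only the standard library; the function name `check_environment` and the 5 GiB free-disk threshold are illustrative assumptions, not values from this guide.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)        # from the prerequisites list
MIN_FREE_DISK_GB = 5.0     # hypothetical threshold; size it to your index and models


def check_environment(path=".", min_python=MIN_PYTHON, min_free_gb=MIN_FREE_DISK_GB):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Compare only (major, minor) against the required version
    if sys.version_info[:2] < min_python:
        problems.append(
            "Python %d.%d+ required, found %d.%d"
            % (min_python + sys.version_info[:2])
        )

    # Free space on the filesystem that contains `path`
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    if free_gb < min_free_gb:
        problems.append(
            "Only %.1f GiB free under %r, want at least %.1f"
            % (free_gb, path, min_free_gb)
        )
    return problems
```

One way to use it is to call `check_environment()` from a start-up script and abort when the returned list is not empty. The messages use `%` formatting rather than f-strings so the file still parses on interpreters older than 3.6, which is the situation the version check is meant to report.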
Deployment steps
Install dependencies:
bash
pip install -r requirements.txt
Configure environment variables:
bash
# Create a .env file
echo "OPENAI_API_KEY=your_api_key" > .env
Start the service:
bash
python app.py
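Writing `.env` only creates a file. `python app.py` will not see `OPENAI_API_KEY` unless the application or the shell loads that file into the process environment (the `python-dotenv` package is a common choice). As a dependency-free illustration, the sketch below parses simple `KEY=VALUE` lines; `load_env_file` is a hypothetical helper, and it handles no quoting or escaping rules beyond stripping surrounding quotes.

```python
import os


def load_env_file(path=".env", override=False):
    """Load KEY=VALUE lines from `path` into os.environ and return the parsed pairs."""
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            # Split on the first "=" only, so values may contain "="
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("'\"")
            loaded[key] = value
            # Variables already set in the environment win unless override=True
            if override or key not in os.environ:
                os.environ[key] = value
    return loaded
```

Calling `load_env_file()` at the top of `app.py`, before `RAGSystem()` is constructed, would make the key available to the service.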
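Example configuration
python
# app.py
from flask import Flask, request, jsonify
from rag_system import RAGSystem

app = Flask(__name__)
rag = RAGSystem()


@app.route('/api/query', methods=['POST'])
def query():
    # Tolerate a missing or non-JSON body instead of raising
    data = request.get_json(silent=True) or {}
    question = data.get('question')
    if not question:
        return jsonify({'error': 'question is required'}), 400
    result = rag.query(question)
    return jsonify(result)


if __name__ == '__main__':
    # Flask's built-in server is intended for development; run behind a WSGI server in production
    app.run(host='0.0.0.0', port=5000)
3. Docker Deployment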
Dockerfile
dockerfile
# Dockerfile
FROM python:3.9-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    && rm -rf /var/lib/apt/lists/*
# Copy the dependency list
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application code
COPY . .
# Expose the service port
EXPOSE 5000
# Start command
CMD ["python", "app.py"]
Docker Compose
yaml
# docker-compose.yml
version: '3.8'
services:
  rag-app:
    build: .
    ports:
      - "5000:5000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - REDIS_URL=redis://redis:6379
    volumes:
      - ./data:/app/data
    depends_on:
      - redis
      - vector-db
  redis:
    image: redis:alpine
    ports:
      - "6379:6379"
  vector-db:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - vector-db-data:/data
volumes:
  vector-db-data:
Deployment commands
bash
# Build the images
docker-compose build
# Start the services in the background
docker-compose up -d
# Follow the logs
docker-compose logs -f
# Stop the services
docker-compose down
4. Cloud Deployment
AWS Deployment
ECS deployment
json
// task-definition.json
// Secrets Manager values are injected through "secrets" (not "environment");
// the task execution role needs permission to read them
{
  "family": "rag-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "rag-app",
      "image": "your-repo/rag-app:latest",
      "portMappings": [
        {
          "containerPort": 5000,
          "protocol": "tcp"
        }
      ],
      "secrets": [
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:..."
        }
      ]
    }
  ]
}
Azure Deployment
Container Instances
bash
# Create a resource group
az group create --name rag-rg --location eastus
# Create the container; secure environment variables are not shown in the container's properties
az container create \
  --resource-group rag-rg \
  --name rag-app \
  --image your-repo/rag-app:latest \
  --ports 5000 \
  --secure-environment-variables OPENAI_API_KEY="$OPENAI_API_KEY"
GCP Deployment
Cloud Run
yaml
# cloudbuild.yaml
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/rag-app', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/rag-app']
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'rag-app'
      - '--image'
      - 'gcr.io/$PROJECT_ID/rag-app'
      - '--platform'
      - 'managed'
      - '--region'
      - 'us-central1'
      # Cloud Run sends requests to port 8080 by default; app.py listens on 5000
      - '--port'
      - '5000'
5. Kubernetes Deployment
Deployment configuration
yaml
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-app
  template:
    metadata:
      labels:
        app: rag-app
    spec:
      containers:
        - name: rag-app
          image: your-repo/rag-app:latest
          ports:
            - containerPort: 5000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: rag-secrets
                  key: openai-api-key
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 5000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 5000
            initialDelaySeconds: 5
            periodSeconds: 5
Service configuration
yaml
# k8s-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: rag-service
spec:
  selector:
    app: rag-app
  ports:
    - port: 80
      targetPort: 5000
  type: LoadBalancer
HPA configuration
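For each metric, the Horizontal Pod Autoscaler proposes ceil(currentReplicas * currentUtilization / targetUtilization) replicas, leaves the count unchanged when that ratio is within a tolerance of 1.0 (0.1 by default), takes the largest proposal across metrics, and clamps it to the minReplicas..maxReplicas range. Utilization is measured relative to the containers' resource requests. The sketch below reproduces that arithmetic for the targets used in this subsection's manifest (CPU 70%, memory 80%, 3 to 10 replicas). It is a simplified model that ignores stabilization windows and pod readiness, and the function name is made up for illustration.

```python
import math


def desired_replicas(current_replicas, metrics, min_replicas, max_replicas, tolerance=0.1):
    """Estimate the HPA replica count.

    metrics: iterable of (current_utilization, target_utilization) pairs, in percent.
    """
    proposals = []
    for current, target in metrics:
        ratio = current / target
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to the target: this metric asks for no change
            proposals.append(current_replicas)
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    # The largest proposal wins, then the result is clamped to the configured bounds
    return max(min_replicas, min(max_replicas, max(proposals)))


# 3 replicas, CPU at 90% of requests (target 70), memory at 60% (target 80)
print(desired_replicas(3, [(90, 70), (60, 80)], min_replicas=3, max_replicas=10))  # prints 4
```

In the example, CPU alone asks for ceil(3 * 90 / 70) = 4 replicas while memory asks for 3, so the autoscaler would move to 4.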
yaml
# k8s-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rag-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rag-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
6. Environment Configuration Management
Configuration file
python
# config.py
import os


class Config:
    # Plain class used as a namespace of settings; values are read from the environment at import time
    # API keys
    OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
    PINECONE_API_KEY = os.getenv('PINECONE_API_KEY')
    # Database settings
    VECTOR_DB_TYPE = os.getenv('VECTOR_DB_TYPE', 'chroma')
    VECTOR_DB_PATH = os.getenv('VECTOR_DB_PATH', './vector_db')
    # Redis settings
    REDIS_URL = os.getenv('REDIS_URL', 'redis://localhost:6379')
    # Application settings
    DEBUG = os.getenv('DEBUG', 'False').lower() == 'true'
    PORT = int(os.getenv('PORT', '5000'))
    HOST = os.getenv('HOST', '0.0.0.0')
    # Performance settings
    MAX_WORKERS = int(os.getenv('MAX_WORKERS', '4'))
    CACHE_TTL = int(os.getenv('CACHE_TTL', '3600'))


class ProductionConfig(Config):
    DEBUG = False


class DevelopmentConfig(Config):
    DEBUG = True


config = {
    'development': DevelopmentConfig,
    'production': ProductionConfig,
    'default': DevelopmentConfig
}
Secret management
python
# secrets_manager.py
# Each cloud SDK is imported inside its function, so only the provider in use has to be installed.


def get_secret_aws(secret_name):
    """Fetch a secret from AWS Secrets Manager."""
    import boto3
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_name)
    return response['SecretString']


def get_secret_azure(vault_url, secret_name):
    """Fetch a secret from Azure Key Vault."""
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient
    credential = DefaultAzureCredential()
    client = SecretClient(vault_url=vault_url, credential=credential)
    secret = client.get_secret(secret_name)
    return secret.value


def get_secret_gcp(project_id, secret_id):
    """Fetch a secret from GCP Secret Manager."""
    from google.cloud import secretmanager
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/latest"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")
7. CI/CD Pipeline
GitHub Actions
yaml
# .github/workflows/deploy.yml
name: Deploy RAG Service
on:
  push:
    branches: [ main ]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest
      - name: Run tests
        run: pytest
      - name: Build Docker image
        run: docker build -t rag-app:${{ github.sha }} .
      - name: Push to registry
        run: |
          echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
          # Push the commit-specific tag that the deploy step references, plus latest
          docker tag rag-app:${{ github.sha }} your-repo/rag-app:${{ github.sha }}
          docker tag rag-app:${{ github.sha }} your-repo/rag-app:latest
          docker push your-repo/rag-app:${{ github.sha }}
          docker push your-repo/rag-app:latest
      - name: Deploy to Kubernetes
        # Assumes kubectl on the runner is already configured with cluster credentials
        run: |
          kubectl set image deployment/rag-app rag-app=your-repo/rag-app:${{ github.sha }}
          kubectl rollout status deployment/rag-app
8. Health Checks
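The Deployment manifest in section 5 points its liveness and readiness probes at `/health` and `/ready` on port 5000. Seen from the caller's side, an HTTP probe is a GET with a short timeout in which any status from 200 up to, but not including, 400 counts as success. A minimal sketch of that check follows, for example for a post-deploy smoke test; `http_probe` is a hypothetical helper and not part of Kubernetes or Flask.

```python
import urllib.error
import urllib.request

# Connect directly, ignoring any proxy settings in the environment
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_probe(url, timeout=1.0):
    """Return True if a GET to `url` completes with a status in the range [200, 400)."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # 4xx/5xx responses raise HTTPError (a URLError subclass);
        # refused connections and timeouts surface as URLError or OSError
        return False
```

A readiness endpoint that answers 503 while a dependency is down therefore makes this function return False, which is the signal Kubernetes uses to stop routing traffic to the pod.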
python
# health_check.py
import time

import psutil
from flask import Flask, jsonify

app = Flask(__name__)
start_time = time.time()


@app.route('/health')
def health():
    """Health check endpoint."""
    return jsonify({
        'status': 'healthy',
        'uptime': time.time() - start_time
    })


@app.route('/ready')
def ready():
    """Readiness check endpoint."""
    # Check whether the dependent services are available
    checks = {
        'vector_db': check_vector_db(),
        'llm_api': check_llm_api()
    }
    if all(checks.values()):
        return jsonify({'status': 'ready', 'checks': checks})
    else:
        return jsonify({'status': 'not ready', 'checks': checks}), 503


@app.route('/metrics')
def metrics():
    """Metrics endpoint (JSON; not the Prometheus text exposition format)."""
    return jsonify({
        'cpu_percent': psutil.cpu_percent(),
        'memory_percent': psutil.virtual_memory().percent,
        'disk_usage': psutil.disk_usage('/').percent
    })


def check_vector_db():
    """Check the vector database connection."""
    try:
        # Run a simple test query here
        return True
    except Exception:
        return False


def check_llm_api():
    """Check that the LLM API is available."""
    try:
        # Make a simple test API call here
        return True
    except Exception:
        return False
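`check_vector_db` and `check_llm_api` above are placeholders that always return True, so `/ready` can never report a failure. A minimal sketch of a real check follows, assuming the vector database is reachable over TCP. The default host `vector-db` and port 8000 are taken from the Compose file in section 3 and are assumptions for any other environment; `tcp_check` is a hypothetical helper.

```python
import socket


def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


def check_vector_db(host="vector-db", port=8000):
    """Readiness check for the vector database (host and port are assumptions)."""
    return tcp_check(host, port)
```

A successful connect only shows that the port is open, not that queries succeed; issuing a cheap request through the database client would be a stricter check. Keeping the timeout short matters because the readiness probe is polled every few seconds.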