
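System Deployment

Deploying a RAG system to production is a key step toward keeping it running reliably. This section covers deployment architectures, platform-specific procedures (local, Docker, cloud, Kubernetes), configuration and secret management, CI/CD, and health checks.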

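1. Deployment Architecture

Deployment options

  • Local deployment: run on your own (on-premises) servers
  • Cloud deployment: run on a cloud provider's platform
  • Containerized deployment: package and run the system as Docker containers
  • Serverless deployment: run on a serverless platform that manages the servers for you

Factors in choosing an architecture

  • Performance: required response time (latency) and throughput
  • Scalability: how easily the system can scale out as load grows
  • Reliability: stability and availability of the system
  • Cost: deployment and ongoing operating costs
  • Security: data protection and access-control requirements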

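2. Local Deployment

Environment setup

  • Operating system: Linux, Windows, or macOS
  • Python: 3.8 or later
  • Dependencies: install the packages the project requires
  • Hardware: enough CPU, memory, and disk space for your models and data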
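
A short preflight script can confirm these requirements on the target machine before anything is installed. The sketch below checks the Python version, free disk space, and CPU count using only the standard library. The function name preflight and the thresholds (10 GB free, 2 CPUs) are hypothetical placeholders, not requirements of the project.

```python
import os
import shutil
import sys

# Hypothetical thresholds: tune them to your own models and data volume.
MIN_PYTHON = (3, 8)
MIN_FREE_DISK_GB = 10
MIN_CPUS = 2


def preflight(min_python=MIN_PYTHON, min_free_disk_gb=MIN_FREE_DISK_GB,
              min_cpus=MIN_CPUS, path="."):
    """Check basic host requirements and return a list of problems (empty = OK)."""
    problems = []

    # Compare (major, minor) tuples, e.g. (3, 11) >= (3, 8)
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )

    # shutil.disk_usage reports bytes for the filesystem that contains `path`
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    if free_gb < min_free_disk_gb:
        problems.append(f"only {free_gb:.1f} GB free on {path}")

    # os.cpu_count() can return None when the count cannot be determined
    cpus = os.cpu_count() or 0
    if cpus < min_cpus:
        problems.append(f"{cpus} CPU(s) available, {min_cpus} recommended")

    return problems
```

Call preflight() from a provisioning script or at startup, and stop if the returned list is not empty. Memory is not checked because the standard library has no portable way to read it. If psutil is installed, psutil.virtual_memory() covers that.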

Deployment steps

  1. Install dependencies

    bash
    pip install -r requirements.txt
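  2. Configure environment variables

    bash
    # Create a .env file (the application must load it at startup, e.g. with python-dotenv)
    echo "OPENAI_API_KEY=your_api_key" > .env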
  3. Start the service

    bash
    python app.py
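
The .env file only takes effect if something loads it into the process environment before the key is read. python-dotenv's load_dotenv() is the usual choice. The standard-library sketch below shows what such a loader does. It assumes simple KEY=VALUE lines with no multi-line values or variable expansion, and the function name load_env_file is hypothetical.

```python
import os


def load_env_file(path=".env", override=False):
    """Minimal .env loader (a sketch; python-dotenv handles more edge cases).

    Supports KEY=VALUE lines, blank lines, '#' comments, an optional
    'export ' prefix and values wrapped in matching single or double quotes.
    Returns a dict of everything parsed from the file.
    """
    parsed = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blanks, comments and lines without an assignment
            if not line or line.startswith("#") or "=" not in line:
                continue
            if line.startswith("export "):
                line = line[len("export "):]
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip()
            # Strip one pair of matching surrounding quotes
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
                value = value[1:-1]
            parsed[key] = value
            # Variables already set in the environment win unless override=True
            if override or key not in os.environ:
                os.environ[key] = value
    return parsed
```

Call load_env_file() at the very top of app.py, before RAGSystem() is created, so the key is visible when the system initializes. Variables already present in the environment take precedence. That lets Docker or Kubernetes settings override the file.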

Example application

python
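# app.py
from flask import Flask, request, jsonify
from rag_system import RAGSystem  # the project's own RAG pipeline class

app = Flask(__name__)
rag = RAGSystem()


@app.route('/api/query', methods=['POST'])
def query():
    # silent=True returns None instead of raising when the body is not valid JSON
    data = request.get_json(silent=True)
    question = data.get('question') if isinstance(data, dict) else None
    if not question:
        return jsonify({'error': "request body must contain a 'question' field"}), 400
    result = rag.query(question)
    return jsonify(result)


if __name__ == '__main__':
    # Flask's built-in development server; use a WSGI server in production
    app.run(host='0.0.0.0', port=5000)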
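
python app.py runs Flask's built-in development server. The Flask documentation says that server is intended only for local development. Gunicorn is one common production alternative. The configuration file below is a sketch that assumes gunicorn has been added to requirements.txt. The port mirrors app.py. The 120-second timeout is an illustrative value, chosen because a RAG request waits on retrieval plus an LLM call.

```python
# gunicorn.conf.py: hypothetical settings for serving app.py with Gunicorn.
# Start with: gunicorn -c gunicorn.conf.py app:app
import multiprocessing
import os

# Listen on the same address and port the Flask dev server used
bind = "0.0.0.0:5000"

# Gunicorn's docs suggest (2 x CPU cores) + 1 sync workers as a starting point;
# WEB_CONCURRENCY, if set, overrides it
workers = int(os.getenv("WEB_CONCURRENCY", multiprocessing.cpu_count() * 2 + 1))

# Allow slow retrieval + LLM calls more than the 30-second default
# before a silent worker is killed and restarted
timeout = 120

# "-" sends access and error logs to stdout/stderr, where Docker and Kubernetes collect them
accesslog = "-"
errorlog = "-"
```

Each worker imports app.py and builds its own RAGSystem, so memory use grows with the worker count. Lower the worker count (or set WEB_CONCURRENCY) if the index is held in memory.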

3. Docker Deployment

Dockerfile

dockerfile
# Dockerfile
FROM python:3.9-slim

WORKDIR /app

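# Install system dependencies (gcc may be needed to build Python packages without prebuilt wheels)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Copy the dependency list first so this layer stays cached until it changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code (add a .dockerignore so .env and local data stay out of the image)
COPY . .

# Document the port the application listens on
EXPOSE 5000

# Start command (Flask's development server; consider a WSGI server such as Gunicorn in production)
CMD ["python", "app.py"]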

Docker Compose

yaml
# docker-compose.yml

services:
  rag-app:
    build: .
    ports:
      - "5000:5000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - REDIS_URL=redis://redis:6379
    volumes:
      - ./data:/app/data
    depends_on:
      - redis
      - vector-db

  redis:
    image: redis:alpine
    ports:
      - "6379:6379"

  vector-db:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - vector-db-data:/data

volumes:
  vector-db-data:
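
In this file, depends_on only controls start order. Compose starts redis and vector-db first, but it does not wait until they accept connections. Compose healthchecks (depends_on with condition: service_healthy) are one fix. The other is for the application to wait for its dependencies itself. Below is a minimal standard-library sketch of a TCP wait loop. wait_for_port is a hypothetical helper, and an open port only shows that the process is listening, not that it has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Retry a TCP connection to host:port until it succeeds or `timeout` expires.

    Returns True as soon as a connection is accepted, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. connection refused) if nothing listens yet
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

At startup inside rag-app, call wait_for_port("redis", 6379) and wait_for_port("vector-db", 8000), using the Compose service names as hostnames. Exit with an error if either call returns False.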

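Deployment commands

bash
# Build the images (Compose V2 syntax; older setups use the standalone docker-compose command)
docker compose build

# Start the services in the background
docker compose up -d

# Follow the logs
docker compose logs -f

# Stop and remove the containers
docker compose down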

4. Cloud Deployment

AWS deployment

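ECS deployment (task-definition.json)

Values stored in Secrets Manager are injected through the container's "secrets" field. A plain "environment" entry only accepts literal name/value pairs. The task also needs an execution role that is allowed to read the secret.

json
{
  "family": "rag-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "rag-app",
      "image": "your-repo/rag-app:latest",
      "portMappings": [
        {
          "containerPort": 5000,
          "protocol": "tcp"
        }
      ],
      "secrets": [
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:..."
        }
      ]
    }
  ]
}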

Azure deployment

Container Instances

bash
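# Create a resource group
az group create --name rag-rg --location eastus

# Create the container with a public IP; secure environment variables are not shown in the container's properties
az container create \
  --resource-group rag-rg \
  --name rag-app \
  --image your-repo/rag-app:latest \
  --ports 5000 \
  --ip-address Public \
  --secure-environment-variables OPENAI_API_KEY=$OPENAI_API_KEY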

GCP deployment

Cloud Run

yaml
# cloudbuild.yaml
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/rag-app', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/rag-app']
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'rag-app'
      - '--image'
      - 'gcr.io/$PROJECT_ID/rag-app'
      - '--platform'
      - 'managed'
      - '--region'
      - 'us-central1'
      # Cloud Run sends traffic to port 8080 by default; the app listens on 5000
      - '--port'
      - '5000'
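
Cloud Run tells the container which port to listen on through the PORT environment variable, which defaults to 8080. Instead of (or as well as) pinning the port at deploy time, the app can read that variable. The helper below is a hypothetical sketch. It falls back to 5000 when PORT is unset, so the same image keeps working under Docker Compose and Kubernetes.

```python
import os


def resolve_port(default=5000):
    """Return the port to listen on: $PORT if set (as on Cloud Run), else `default`."""
    value = os.environ.get("PORT")
    if value is None or not value.strip():
        return default
    port = int(value)
    # Reject values outside the valid TCP port range
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    return port
```

In app.py this would become app.run(host='0.0.0.0', port=resolve_port()).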

5. Kubernetes Deployment

Deployment manifest

yaml
# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-app
  template:
    metadata:
      labels:
        app: rag-app
    spec:
      containers:
      - name: rag-app
        image: your-repo/rag-app:latest
        ports:
        - containerPort: 5000
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: rag-secrets
              key: openai-api-key
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 5000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 5000
          initialDelaySeconds: 5
          periodSeconds: 5

Service manifest

yaml
# k8s-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: rag-service
spec:
  selector:
    app: rag-app
  ports:
  - port: 80
    targetPort: 5000
  type: LoadBalancer

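HPA manifest (requires the metrics-server add-on; utilization is measured against the resource requests set in the Deployment)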

yaml
# k8s-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rag-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rag-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
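
The HPA compares the observed average utilization with the target. The Kubernetes documentation gives the formula desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The HPA skips the change when the ratio is within a tolerance of 0.1, and when several metrics are configured it takes the largest proposal. Utilization is relative to the resource requests, so with the Deployment above a 70% CPU target means about 350m per pod. The function below is a simplified model of that calculation for sanity-checking the min/max bounds. It ignores stabilization windows, unready pods, and missing metrics, and the name desired_replicas is hypothetical.

```python
import math


def desired_replicas(current_replicas, metrics, min_replicas=3, max_replicas=10,
                     tolerance=0.1):
    """Simplified HPA replica calculation.

    `metrics` is a list of (current_utilization, target_utilization) pairs,
    both in percent of the pod's resource requests.
    """
    proposals = []
    for current, target in metrics:
        ratio = current / target
        if abs(ratio - 1.0) <= tolerance:
            # Close enough to the target: this metric asks for no change
            proposals.append(current_replicas)
        else:
            proposals.append(math.ceil(current_replicas * ratio))
    # With several metrics, the largest proposal wins
    desired = max(proposals) if proposals else current_replicas
    # Clamp to the minReplicas/maxReplicas bounds
    return max(min_replicas, min(max_replicas, desired))
```

Take 3 replicas at 90% CPU and 60% memory. The CPU metric proposes ceil(3 × 90 / 70) = 4 and the memory metric proposes ceil(3 × 60 / 80) = 3, so the HPA scales to 4.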

6. Configuration Management

Configuration file

python
# config.py
import os


class Config:
    """Base settings, read from environment variables when this module is imported."""

    # API keys
    OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
    PINECONE_API_KEY = os.getenv('PINECONE_API_KEY')

    # Vector database
    VECTOR_DB_TYPE = os.getenv('VECTOR_DB_TYPE', 'chroma')
    VECTOR_DB_PATH = os.getenv('VECTOR_DB_PATH', './vector_db')

    # Redis
    REDIS_URL = os.getenv('REDIS_URL', 'redis://localhost:6379')

    # Application
    DEBUG = os.getenv('DEBUG', 'False').lower() == 'true'
    PORT = int(os.getenv('PORT', 5000))
    HOST = os.getenv('HOST', '0.0.0.0')

    # Performance
    MAX_WORKERS = int(os.getenv('MAX_WORKERS', 4))
    CACHE_TTL = int(os.getenv('CACHE_TTL', 3600))  # seconds


class ProductionConfig(Config):
    DEBUG = False


class DevelopmentConfig(Config):
    DEBUG = True


# Select by name, e.g. app.config.from_object(config['production'])
config = {
    'development': DevelopmentConfig,
    'production': ProductionConfig,
    'default': DevelopmentConfig
}
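
The Config classes read os.getenv when config.py is imported, so a missing OPENAI_API_KEY only shows up later, as None. If you want settings that are typed, immutable, and validated at startup, a frozen dataclass is one option. Note that a dataclass only treats annotated attributes as fields. The Settings class below is a hypothetical alternative that mirrors a subset of the Config attributes.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Settings:
    """Typed, immutable application settings (hypothetical alternative to Config)."""

    # repr=False keeps the key out of logs that print the settings object
    openai_api_key: str = field(repr=False)
    vector_db_type: str = "chroma"
    redis_url: str = "redis://localhost:6379"
    debug: bool = False
    port: int = 5000
    cache_ttl: int = 3600  # seconds

    @classmethod
    def from_env(cls, env=None):
        """Build Settings from a mapping (default: os.environ), failing fast on a missing key."""
        env = os.environ if env is None else env
        api_key = env.get("OPENAI_API_KEY")
        if not api_key:
            raise RuntimeError("OPENAI_API_KEY is not set")
        return cls(
            openai_api_key=api_key,
            vector_db_type=env.get("VECTOR_DB_TYPE", "chroma"),
            redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
            debug=env.get("DEBUG", "false").lower() == "true",
            port=int(env.get("PORT", "5000")),
            cache_ttl=int(env.get("CACHE_TTL", "3600")),
        )
```

Create it once, for example settings = Settings.from_env(), and pass it to the components that need it. Passing a plain dict instead of os.environ makes the class easy to test.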

Secret management

python
# secrets_manager.py
# Each function imports its SDK lazily, so only the provider you actually use must be installed.


def get_secret_aws(secret_name, region_name=None):
    """Fetch a text secret from AWS Secrets Manager."""
    import boto3
    client = boto3.client('secretsmanager', region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    return response['SecretString']  # binary secrets are returned as 'SecretBinary' instead


def get_secret_azure(vault_url, secret_name):
    """Fetch a secret from Azure Key Vault."""
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient
    credential = DefaultAzureCredential()
    client = SecretClient(vault_url=vault_url, credential=credential)
    return client.get_secret(secret_name).value


def get_secret_gcp(project_id, secret_id, version='latest'):
    """Fetch a secret version from GCP Secret Manager."""
    from google.cloud import secretmanager
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/{version}"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")
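
Calling a secret manager on every request adds latency and per-call cost. A common pattern has two steps. First, prefer a value the platform already injected as an environment variable, as the Kubernetes secretKeyRef in this section does. Otherwise, fetch the secret once and cache it. The SecretResolver class below is a hypothetical sketch of that pattern. fetch can be any function that takes a secret name and returns its value, such as a small wrapper around get_secret_aws.

```python
import os
import threading


class SecretResolver:
    """Resolve secrets from the environment first, then from `fetch`, caching fetched values."""

    def __init__(self, fetch):
        self._fetch = fetch          # callable: name -> secret value
        self._cache = {}
        self._lock = threading.Lock()

    def get(self, name):
        # A value injected by the platform as an env var takes precedence
        value = os.environ.get(name)
        if value:
            return value
        # The lock makes concurrent first lookups share a single fetch
        with self._lock:
            if name not in self._cache:
                self._cache[name] = self._fetch(name)
            return self._cache[name]

    def clear(self):
        """Drop cached values, e.g. after rotating a secret."""
        with self._lock:
            self._cache.clear()
```

For example, resolver = SecretResolver(get_secret_aws) followed by resolver.get("OPENAI_API_KEY"). Holding the lock during a fetch serializes first lookups. That is acceptable for a handful of secrets read at startup.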

7. CI/CD Pipeline

GitHub Actions

yaml
# .github/workflows/deploy.yml
name: Deploy RAG Service

on:
  push:
    branches: [ main ]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.9'
    
    - name: Install dependencies
      run: |
        pip install -r requirements.txt
        pip install pytest
    
    - name: Run tests
      run: pytest
    
    - name: Build Docker image
      run: docker build -t your-repo/rag-app:${{ github.sha }} .
    
    - name: Push to registry
      env:
        DOCKER_USERNAME: ${{ secrets.DOCKER_USERNAME }}
        DOCKER_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
      run: |
        echo "$DOCKER_PASSWORD" | docker login -u "$DOCKER_USERNAME" --password-stdin
        # Push the commit-specific tag that the deploy step references, plus latest
        docker tag your-repo/rag-app:${{ github.sha }} your-repo/rag-app:latest
        docker push your-repo/rag-app:${{ github.sha }}
        docker push your-repo/rag-app:latest
    
    # Assumes kubectl on the runner is already configured with cluster credentials
    - name: Deploy to Kubernetes
      run: |
        kubectl set image deployment/rag-app rag-app=your-repo/rag-app:${{ github.sha }}
        kubectl rollout status deployment/rag-app
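
kubectl rollout status only confirms that the new pods became ready. A short smoke test against the public endpoint afterwards also catches problems such as a misconfigured Service or load balancer. The standard-library sketch below polls the /health endpoint from the health-check section until it reports healthy. The function name, the retry counts, and the base URL you pass in are assumptions.

```python
import json
import time
import urllib.request


def smoke_test(base_url, attempts=5, delay=2.0, timeout=5.0):
    """Poll GET {base_url}/health until it returns 200 with status 'healthy'.

    Returns True on success, False after `attempts` failed tries.
    """
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                body = json.loads(resp.read().decode("utf-8"))
                if resp.status == 200 and isinstance(body, dict) and body.get("status") == "healthy":
                    return True
        except (OSError, ValueError):
            # OSError covers URLError/HTTPError and timeouts; ValueError covers invalid JSON
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Call it from a final workflow step after the rollout, and exit with a non-zero status when it returns False so the job fails.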

8. Health Checks

python
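# health_check.py
# A Blueprint, so the probes are served by the same Flask app as /api/query.
# Register it in app.py with: app.register_blueprint(health_bp)
import time

import psutil
from flask import Blueprint, jsonify

health_bp = Blueprint('health', __name__)
start_time = time.time()


@health_bp.route('/health')
def health():
    """Liveness check: the process is up and able to respond."""
    return jsonify({
        'status': 'healthy',
        'uptime': time.time() - start_time
    })


@health_bp.route('/ready')
def ready():
    """Readiness check: dependencies are reachable, so the pod may receive traffic."""
    checks = {
        'vector_db': check_vector_db(),
        'llm_api': check_llm_api()
    }

    if all(checks.values()):
        return jsonify({'status': 'ready', 'checks': checks})
    return jsonify({'status': 'not ready', 'checks': checks}), 503


@health_bp.route('/metrics')
def metrics():
    """Resource usage as JSON (not the Prometheus text format, so Prometheus cannot scrape it as-is)."""
    return jsonify({
        'cpu_percent': psutil.cpu_percent(),
        'memory_percent': psutil.virtual_memory().percent,
        'disk_usage': psutil.disk_usage('/').percent
    })


def check_vector_db():
    """Check the vector database connection."""
    try:
        # Placeholder: run a cheap test query against your vector store here
        return True
    except Exception:
        return False


def check_llm_api():
    """Check that the LLM API is reachable."""
    try:
        # Placeholder: make a lightweight test call to the LLM API here
        return True
    except Exception:
        return False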
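
Prometheus scrapes metrics in its text exposition format, not JSON: a "# TYPE name gauge" line followed by a "name value" sample line. The official prometheus_client library is the usual way to produce that format. The dependency-free sketch below only shows what the output looks like for a few gauges. render_prometheus and the rag_ prefix are hypothetical, and HELP lines and labels are omitted.

```python
import re

# Valid Prometheus metric names: [a-zA-Z_:][a-zA-Z0-9_:]*
_METRIC_NAME = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def render_prometheus(gauges, prefix="rag_"):
    """Render a {name: number} dict as gauges in the Prometheus text format.

    Each metric becomes a '# TYPE <name> gauge' line plus a '<name> <value>'
    sample line; metrics are sorted by name and the output ends with a newline.
    """
    lines = []
    for name, value in sorted(gauges.items()):
        full_name = prefix + name
        if not _METRIC_NAME.fullmatch(full_name):
            raise ValueError(f"invalid Prometheus metric name: {full_name!r}")
        lines.append(f"# TYPE {full_name} gauge")
        lines.append(f"{full_name} {float(value)}")
    return "\n".join(lines) + "\n"
```

In Flask, the string could be returned as Response(render_prometheus({...}), content_type='text/plain; version=0.0.4; charset=utf-8').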