System Deployment

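Deploying a RAG system to production is a key step toward keeping it running reliably. This chapter covers deployment strategies, methods, and best practices for RAG systems.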

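1. Deployment Architecture

Deployment Options

Local deployment: runs on your own on-premises servers
Cloud deployment: runs on a cloud provider's platform
Containerized deployment: packaged and run as Docker containers
Serverless deployment: runs on a serverless platform

Selection Factors

Performance: response-time and throughput requirements
Scalability: how well the system can grow with load
Reliability: stability and availability of the system
Cost: deployment and maintenance costs
Security: data security and access-control requirements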

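2. Local Deployment

Environment Preparation

Operating system: Linux, Windows, or macOS
Python: Python 3.8+
Dependencies: install whatever the project requires
Hardware: enough CPU, memory, and storage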
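
The requirements above can be checked before installing anything with a small preflight script. The following is a minimal sketch using only the standard library. The function name `check_environment` and the 1 GiB free-disk threshold are illustrative assumptions, not project requirements; only the Python 3.8 minimum comes from the list above.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)        # minimum version from the requirement list
MIN_FREE_DISK_GIB = 1.0    # assumed threshold; size it to your vector index


def check_environment(version_info=None, free_bytes=None, path="."):
    """Return a list of human-readable problems; an empty list means OK."""
    if version_info is None:
        version_info = sys.version_info
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free

    problems = []
    found = tuple(version_info[:2])
    if found < MIN_PYTHON:
        problems.append(
            "Python %d.%d+ is required, found %d.%d" % (MIN_PYTHON + found)
        )
    free_gib = free_bytes / 2**30
    if free_gib < MIN_FREE_DISK_GIB:
        problems.append("only %.2f GiB of free disk space" % free_gib)
    return problems
```

Calling `check_environment()` with no arguments inspects the running interpreter and the current directory's disk; the optional parameters exist so the checks can be exercised with fixed values.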

Deployment Steps

Install dependencies:
bash
```
pip install -r requirements.txt
```

Configure environment variables:

bash

# Create the .env file
echo "OPENAI_API_KEY=your_api_key" > .env
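
Python does not read a `.env` file on its own, so the application has to load it into the process environment at startup. Projects commonly use the third-party `python-dotenv` package for this. The sketch below is a standard-library stand-in that shows the idea; `parse_env_line` and `load_env_file` are hypothetical helper names, and the parser handles only plain `KEY=VALUE` lines, comments, and one pair of surrounding quotes.

```python
import os


def parse_env_line(line):
    """Parse one KEY=VALUE line; return (key, value) or None to skip it."""
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        return None
    key, value = line.split("=", 1)
    key, value = key.strip(), value.strip()
    if not key:
        return None
    # Drop one pair of matching surrounding quotes, if present.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    return key, value


def load_env_file(path=".env", environ=None):
    """Copy variables from the file into environ without overriding existing ones."""
    if environ is None:
        environ = os.environ
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            parsed = parse_env_line(raw)
            if parsed is None:
                continue
            key, value = parsed
            environ.setdefault(key, value)  # real environment variables win
            loaded[key] = value
    return loaded
```

Letting already-set variables take precedence means the same code works in containers and CI, where the key is injected by the platform and no `.env` file exists.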

Start the service:
bash
```
python app.py
```

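Example Configuration

python

# app.py
from flask import Flask, request, jsonify
from rag_system import RAGSystem  # project-specific module

app = Flask(__name__)
rag = RAGSystem()

@app.route('/api/query', methods=['POST'])
def query():
    # silent=True yields None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True) or {}
    question = data.get('question')
    if not question:
        return jsonify({'error': 'question is required'}), 400
    result = rag.query(question)
    return jsonify(result)

if __name__ == '__main__':
    # Flask's built-in server is intended for development use
    app.run(host='0.0.0.0', port=5000)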
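
The endpoint above accepts a POST request whose JSON body carries a `question` field. A client for it might look like the following sketch, which uses only the standard library. The URL assumes the service is running locally on port 5000 as configured in `app.py`; `build_query_request` and `ask` are hypothetical helper names, and the shape of the response depends on what `RAGSystem.query` returns, so it is handed back as parsed JSON without interpretation.

```python
import json
import urllib.request

# Assumes the service from app.py is reachable locally on port 5000.
DEFAULT_URL = "http://localhost:5000/api/query"


def build_query_request(question, url=DEFAULT_URL):
    """Build a POST request carrying {"question": ...} as JSON."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, url=DEFAULT_URL, timeout=30):
    """Send the question and return the decoded JSON response."""
    request = build_query_request(question, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `ask("What is RAG?")` sends one query and returns the parsed response.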

3. Docker Deployment

Dockerfile

dockerfile

# Dockerfile
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Copy the dependency manifest
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Expose the port
EXPOSE 5000

# Startup command
CMD ["python", "app.py"]

Docker Compose

yaml

# docker-compose.yml
version: '3.8'

services:
  rag-app:
    build: .
    ports:
      - "5000:5000"
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - REDIS_URL=redis://redis:6379
    volumes:
      - ./data:/app/data
    depends_on:
      - redis
      - vector-db

  redis:
    image: redis:alpine
    ports:
      - "6379:6379"

  vector-db:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - vector-db-data:/data

volumes:
  vector-db-data:
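
In its short list form, `depends_on` only controls the order in which containers are started; it does not wait until Redis or the vector database is actually accepting connections. One way to cover that gap is to have the application wait for its dependencies at startup. The sketch below is a minimal illustration using only the standard library; `wait_for_port` and `redis_endpoint` are hypothetical helper names, and the default URL mirrors the `REDIS_URL` value from the Compose file.

```python
import os
import socket
import time
from urllib.parse import urlparse


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def redis_endpoint(url=None):
    """Extract (host, port) from a Redis URL, defaulting to the Compose value."""
    parsed = urlparse(url or os.getenv("REDIS_URL", "redis://redis:6379"))
    return parsed.hostname, parsed.port or 6379
```

Inside the `rag-app` container, `wait_for_port(*redis_endpoint())` and `wait_for_port("vector-db", 8000)` could run before the Flask app is created. A successful TCP connection only shows that the port is open, not that the service is fully initialized.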

Deployment Commands

bash

# Build the images
docker-compose build

# Start the services
docker-compose up -d

# Follow the logs
docker-compose logs -f

# Stop the services
docker-compose down

4. Cloud Deployment

AWS Deployment

ECS Deployment

json

// task-definition.json
{
  "family": "rag-service",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "rag-app",
      "image": "your-repo/rag-app:latest",
      "portMappings": [
        {
          "containerPort": 5000,
          "protocol": "tcp"
        }
      ],
      "secrets": [
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:secretsmanager:..."
        }
      ]
    }
  ]
}
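
On Fargate, `cpu` and `memory` cannot be chosen independently: each CPU size permits only a fixed set of memory sizes. The sketch below checks a pair against the commonly documented combinations for the smaller task sizes. The table is reproduced from memory as an assumption and omits the larger sizes, so verify it against the current AWS documentation; `is_valid_fargate_size` is a hypothetical helper name.

```python
# CPU units -> allowed memory values in MiB (assumed table; larger sizes omitted).
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu, memory):
    """cpu and memory are the strings used in the task definition."""
    return int(memory) in FARGATE_SIZES.get(int(cpu), [])
```

The task definition above uses `"cpu": "1024"` with `"memory": "2048"`, which is the smallest memory setting this table allows for 1 vCPU.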

Azure Deployment

Container Instances

bash

# Create the resource group
az group create --name rag-rg --location eastus

# Create the container; the API key is passed as a secure environment variable
az container create \
  --resource-group rag-rg \
  --name rag-app \
  --image your-repo/rag-app:latest \
  --ports 5000 \
  --secure-environment-variables OPENAI_API_KEY="$OPENAI_API_KEY"

GCP Deployment

Cloud Run

yaml

# cloudbuild.yaml
steps:
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$PROJECT_ID/rag-app', '.']
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$PROJECT_ID/rag-app']
  - name: 'gcr.io/cloud-builders/gcloud'
    args:
      - 'run'
      - 'deploy'
      - 'rag-app'
      - '--image'
      - 'gcr.io/$PROJECT_ID/rag-app'
      - '--platform'
      - 'managed'
      - '--region'
      - 'us-central1'

5. Kubernetes Deployment

Deployment Configuration

yaml

# k8s-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rag-app
  template:
    metadata:
      labels:
        app: rag-app
    spec:
      containers:
      - name: rag-app
        image: your-repo/rag-app:latest
        ports:
        - containerPort: 5000
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: rag-secrets
              key: openai-api-key
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 5000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 5000
          initialDelaySeconds: 5
          periodSeconds: 5
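
The probe settings above determine how quickly Kubernetes reacts to an unhealthy pod. Since `failureThreshold` is not set, the Kubernetes default of 3 consecutive failures applies. The arithmetic below is a rough sketch that ignores probe timeouts and scheduling jitter; the function names are hypothetical.

```python
DEFAULT_FAILURE_THRESHOLD = 3  # Kubernetes default when failureThreshold is omitted


def detection_window_seconds(period_seconds,
                             failure_threshold=DEFAULT_FAILURE_THRESHOLD):
    """Approximate upper bound between a failure starting and the threshold being hit."""
    return period_seconds * failure_threshold


def earliest_action_seconds(initial_delay_seconds, period_seconds,
                            failure_threshold=DEFAULT_FAILURE_THRESHOLD):
    """Approximate time after container start at which failures reach the threshold."""
    return initial_delay_seconds + (failure_threshold - 1) * period_seconds


# Values taken from the Deployment manifest above.
liveness = {"initial_delay_seconds": 30, "period_seconds": 10}
readiness = {"initial_delay_seconds": 5, "period_seconds": 5}

# A container that never becomes healthy is restarted after roughly 50 seconds.
liveness_restart_after = earliest_action_seconds(**liveness)

# A pod that stops being ready is removed from Service endpoints within about 15 seconds.
readiness_unready_within = detection_window_seconds(readiness["period_seconds"])
```

If the RAG system needs longer than about 50 seconds to load models or indexes, the liveness probe would restart it before it finishes starting, so `initialDelaySeconds` should be sized to the real startup time.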

Service Configuration

yaml

# k8s-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: rag-service
spec:
  selector:
    app: rag-app
  ports:
  - port: 80
    targetPort: 5000
  type: LoadBalancer

HPA Configuration

yaml

# k8s-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: rag-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: rag-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
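
The autoscaler computes a replica count per metric as `ceil(currentReplicas * currentValue / targetValue)`, takes the largest result when several metrics are configured, and clamps it to `minReplicas` and `maxReplicas`. Utilization is measured relative to the container's resource requests, so the 70% CPU target corresponds to an average of 350m per pod given the 500m request in the Deployment. The sketch below reproduces that calculation with the values from the manifest; the function names are hypothetical, and the 0.1 tolerance is the Kubernetes default, which cluster administrators can change.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     tolerance=0.1):
    """Replica count proposed by a single metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: no change
    return math.ceil(current_replicas * ratio)


def hpa_decision(current_replicas, metrics, min_replicas=3, max_replicas=10):
    """metrics is an iterable of (current_utilization, target_utilization) pairs."""
    proposals = [
        desired_replicas(current_replicas, current, target)
        for current, target in metrics
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))


# 3 replicas at 90% CPU (target 70) and 60% memory (target 80).
example = hpa_decision(3, [(90, 70), (60, 80)])
```

In the example, CPU proposes `ceil(3 * 90 / 70) = 4` replicas and memory proposes 3, so the autoscaler would scale to 4.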

6. Environment Configuration Management

Configuration File

python

# config.py
import os

class Config:
    # API keys
    OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
    PINECONE_API_KEY = os.getenv('PINECONE_API_KEY')

    # Vector database settings
    VECTOR_DB_TYPE = os.getenv('VECTOR_DB_TYPE', 'chroma')
    VECTOR_DB_PATH = os.getenv('VECTOR_DB_PATH', './vector_db')

    # Redis settings
    REDIS_URL = os.getenv('REDIS_URL', 'redis://localhost:6379')

    # Application settings
    DEBUG = os.getenv('DEBUG', 'False').lower() == 'true'
    PORT = int(os.getenv('PORT', 5000))
    HOST = os.getenv('HOST', '0.0.0.0')

    # Performance settings
    MAX_WORKERS = int(os.getenv('MAX_WORKERS', 4))
    CACHE_TTL = int(os.getenv('CACHE_TTL', 3600))

class ProductionConfig(Config):
    DEBUG = False

class DevelopmentConfig(Config):
    DEBUG = True

# Lookup table from environment name to configuration class
config = {
    'development': DevelopmentConfig,
    'production': ProductionConfig,
    'default': DevelopmentConfig
}
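
The `config` dictionary is meant to be indexed by an environment name. A minimal sketch of that lookup follows. The classes are trimmed stand-ins so the example is self-contained, and `APP_ENV` and `get_config` are hypothetical names that the original project does not define.

```python
import os


class Config:
    DEBUG = False


class DevelopmentConfig(Config):
    DEBUG = True


class ProductionConfig(Config):
    DEBUG = False


# Same registry shape as in config.py, using the stand-in classes above.
config = {
    "development": DevelopmentConfig,
    "production": ProductionConfig,
    "default": DevelopmentConfig,
}


def get_config(env_name=None):
    """Pick a config class by name, falling back to the 'default' entry."""
    # APP_ENV is an assumed variable name used only for this illustration.
    name = env_name or os.getenv("APP_ENV", "default")
    return config.get(name, config["default"])
```

Because the class attributes call `os.getenv` when `config.py` is imported, environment variables have to be set before that import happens; changing them afterwards has no effect on the loaded values.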

Secret Management

python

# secrets_manager.py
# Each cloud SDK is imported inside its own function, so only the SDK
# for the provider actually in use has to be installed.

def get_secret_aws(secret_name):
    """Fetch a secret from AWS Secrets Manager."""
    import boto3
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_name)
    return response['SecretString']

def get_secret_azure(vault_url, secret_name):
    """Fetch a secret from Azure Key Vault."""
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient
    credential = DefaultAzureCredential()
    client = SecretClient(vault_url=vault_url, credential=credential)
    secret = client.get_secret(secret_name)
    return secret.value

def get_secret_gcp(project_id, secret_id):
    """Fetch the latest version of a secret from GCP Secret Manager."""
    from google.cloud import secretmanager
    client = secretmanager.SecretManagerServiceClient()
    name = f"projects/{project_id}/secrets/{secret_id}/versions/latest"
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("UTF-8")
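
Each of the functions above makes a network call every time it is invoked, which adds latency and can run into provider rate limits if secrets are read per request. A small in-process cache with a time-to-live is one way to reduce that. The sketch below uses only the standard library; the name `cached` and the 300-second default are illustrative assumptions.

```python
import time


def cached(fetch, ttl_seconds=300.0, clock=time.monotonic):
    """Wrap fetch(name) so each name is fetched at most once per ttl_seconds."""
    store = {}  # name -> (expires_at, value)

    def get(name):
        now = clock()
        entry = store.get(name)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the network call
        value = fetch(name)
        store[name] = (now + ttl_seconds, value)
        return value

    return get
```

For example, `get_secret = cached(get_secret_aws)` would return a function that serves repeated lookups from memory. A shorter TTL picks up rotated secrets sooner at the cost of more calls.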

7. CI/CD Pipeline

GitHub Actions

yaml

# .github/workflows/deploy.yml
name: Deploy RAG Service

on:
  push:
    branches: [ main ]

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.9'
    
    - name: Install dependencies
      run: |
        pip install -r requirements.txt
        pip install pytest
    
    - name: Run tests
      run: pytest
    
    - name: Build Docker image
      run: docker build -t rag-app:${{ github.sha }} .
    
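    - name: Push to registry
      run: |
        echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
        # Push the commit-SHA tag as well; the Kubernetes deploy step references it
        docker tag rag-app:${{ github.sha }} your-repo/rag-app:${{ github.sha }}
        docker tag rag-app:${{ github.sha }} your-repo/rag-app:latest
        docker push your-repo/rag-app:${{ github.sha }}
        docker push your-repo/rag-app:latest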
    
    - name: Deploy to Kubernetes
      run: |
        kubectl set image deployment/rag-app rag-app=your-repo/rag-app:${{ github.sha }}
        kubectl rollout status deployment/rag-app

8. Health Checks

python

# health_check.py
from flask import Flask, jsonify
import psutil
import time

app = Flask(__name__)
start_time = time.time()

@app.route('/health')
def health():
    """Liveness endpoint."""
    return jsonify({
        'status': 'healthy',
        'uptime': time.time() - start_time
    })

@app.route('/ready')
def ready():
    """Readiness endpoint."""
    # Check that the dependent services are available
    checks = {
        'vector_db': check_vector_db(),
        'llm_api': check_llm_api()
    }

    if all(checks.values()):
        return jsonify({'status': 'ready', 'checks': checks})
    else:
        return jsonify({'status': 'not ready', 'checks': checks}), 503

@app.route('/metrics')
def metrics():
    """System metrics endpoint (returns JSON, not the Prometheus text format)."""
    return jsonify({
        'cpu_percent': psutil.cpu_percent(),
        'memory_percent': psutil.virtual_memory().percent,
        'disk_usage': psutil.disk_usage('/').percent
    })

def check_vector_db():
    """Check the vector database connection."""
    try:
        # Placeholder: run a simple test query here
        return True
    except Exception:
        return False

def check_llm_api():
    """Check that the LLM API is available."""
    try:
        # Placeholder: make a simple test API call here
        return True
    except Exception:
        return False
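
The `/metrics` endpoint above returns JSON, which a Prometheus server cannot scrape directly; Prometheus expects a line-oriented text format. In practice the `prometheus_client` package is the usual way to produce it. The sketch below only illustrates what the format looks like for simple gauges. The function name `render_prometheus` and the `rag_` prefix are assumptions, and the help text is not escaped, so it must not contain backslashes or newlines.

```python
def render_prometheus(metrics, prefix="rag_"):
    """Render {name: (help_text, value)} as Prometheus text exposition lines."""
    lines = []
    for name, (help_text, value) in sorted(metrics.items()):
        full_name = prefix + name
        lines.append("# HELP %s %s" % (full_name, help_text))
        lines.append("# TYPE %s gauge" % full_name)
        lines.append("%s %s" % (full_name, float(value)))
    return "\n".join(lines) + "\n"
```

The resulting string would be returned from the route as plain text rather than through `jsonify`, with the values coming from the same `psutil` calls used above.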
