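Health checks

Health checks monitor the state of backend servers and automatically take failed servers out of rotation, which improves system availability.

Passive health checks

Basic configuration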

nginx

upstream backend {
    server 192.168.1.10:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.12:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    server_name lb.example.com;

    location / {
        proxy_pass http://backend;
    }
}

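Parameters

max_fails

Maximum number of failed attempts
Default: 1
When the number of failed attempts within fail_timeout reaches this value, the server is marked unavailable; a value of 0 disables the accounting

fail_timeout

Time window in which max_fails failed attempts must occur, and also how long the server then stays marked unavailable
Default: 10s
Once this period has passed, nginx tries the server again

How it works

1. A request is sent to server A
2. Server A fails max_fails times within fail_timeout
3. Server A is marked unavailable
4. Requests are sent to the other servers
5. After fail_timeout, nginx tries server A again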
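
The steps above can be sketched as a small state model. This is a simplified illustration under stated assumptions, not nginx's actual implementation: the class name `PassivePeer` and its methods are hypothetical, time is passed in explicitly as seconds, and the failure window is approximated by the time elapsed since the most recent failure.

```python
from dataclasses import dataclass


@dataclass
class PassivePeer:
    """Simplified model of one upstream server under passive checking."""

    max_fails: int = 3
    fail_timeout: float = 30.0
    fails: int = 0          # failed attempts counted so far
    last_fail: float = 0.0  # time of the most recent failed attempt

    def available(self, now: float) -> bool:
        """Return True if a request may be sent to this peer at `now`."""
        if self.fails < self.max_fails:
            return True
        # Marked unavailable: retry only once fail_timeout has elapsed.
        return now - self.last_fail >= self.fail_timeout

    def report(self, ok: bool, now: float) -> None:
        """Record the outcome of one proxied request at time `now`."""
        if ok:
            self.fails = 0
            return
        # Old failures stop counting once fail_timeout has passed
        # (approximation: measured from the most recent failure).
        if self.fails < self.max_fails and now - self.last_fail > self.fail_timeout:
            self.fails = 0
        self.fails += 1
        self.last_fail = now
```

In this model, three failures at t=0, 1 and 2 seconds make the peer unavailable until t=32, and a successful request after that clears the failure count. Failures spaced further apart than fail_timeout never accumulate.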

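Active health checks

Using a third-party module

Open-source nginx has no built-in active health checks, so this approach requires the nginx_upstream_check_module module, which has to be compiled into nginx.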

nginx

upstream backend {
    server 192.168.1.10:8080;
    server 192.168.1.11:8080;
    server 192.168.1.12:8080;

    check interval=3000 rise=2 fall=3 timeout=1000 type=http;
    check_http_send "HEAD /health HTTP/1.0\r\n\r\n";
    check_http_expect_alive http_2xx http_3xx;
}

server {
    listen 80;
    server_name lb.example.com;

    location / {
        proxy_pass http://backend;
    }

    location /status {
        check_status;
        access_log off;
    }
}

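Parameters

interval

Interval between checks
Unit: milliseconds

rise

Success count
The server is marked available after this many consecutive successful checks

fall

Failure count
The server is marked unavailable after this many consecutive failed checks

timeout

Timeout for a single check
Unit: milliseconds

type

Check type
http, tcp, ssl_hello, mysql, ajp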
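
The way rise and fall interact can be sketched as a pair of consecutive-result counters. This is an illustration of the idea only, not the module's code: the class `ActiveCheckState`, its `record` method, and the choice of initial state are assumptions made for the example.

```python
class ActiveCheckState:
    """Tracks one server's status from a stream of probe results."""

    def __init__(self, rise: int = 2, fall: int = 3, up: bool = True):
        self.rise = rise
        self.fall = fall
        self.up = up          # initial state is an assumption of this sketch
        self.successes = 0    # consecutive successful probes
        self.failures = 0     # consecutive failed probes

    def record(self, ok: bool) -> bool:
        """Feed one probe result and return the resulting up/down status."""
        if ok:
            self.successes += 1
            self.failures = 0
            if not self.up and self.successes >= self.rise:
                self.up = True
        else:
            self.failures += 1
            self.successes = 0
            if self.up and self.failures >= self.fall:
                self.up = False
        return self.up
```

Because both counters reset on the opposite result, a single successful probe in the middle of a run of failures restarts the fall count, which is what keeps one-off errors from taking a server down.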

Complete configuration

Passive health checks

nginx

upstream backend {
    server 192.168.1.10:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.12:8080 max_fails=3 fail_timeout=30s;

    keepalive 32;
}

server {
    listen 80;
    server_name lb.example.com;

    access_log /var/log/nginx/lb.access.log;
    error_log /var/log/nginx/lb.error.log;

    location / {
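        proxy_pass http://backend;

        # The upstream keepalive pool is only used with HTTP/1.1 and a cleared Connection header.
        proxy_http_version 1.1;
        proxy_set_header Connection "";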

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        proxy_connect_timeout 5s;
        proxy_send_timeout 5s;
        proxy_read_timeout 5s;

        proxy_next_upstream error timeout http_502 http_503 http_504;
        proxy_next_upstream_tries 2;
    }
}

Active health checks

nginx

upstream backend {
    server 192.168.1.10:8080;
    server 192.168.1.11:8080;
    server 192.168.1.12:8080;

    check interval=5000 rise=2 fall=3 timeout=2000 type=http;
    check_http_send "GET /health HTTP/1.0\r\nHost: backend\r\n\r\n";
    check_http_expect_alive http_2xx http_3xx;

    keepalive 32;
}

server {
    listen 80;
    server_name lb.example.com;

    location / {
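        proxy_pass http://backend;

        # The upstream keepalive pool is only used with HTTP/1.1 and a cleared Connection header.
        proxy_http_version 1.1;
        proxy_set_header Connection "";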

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }

    location /status {
        check_status;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}

Health check endpoints

Backend health check

nginx

server {
    listen 8080;
    server_name backend.example.com;

    location /health {
        access_log off;
        # Set the type via default_type; add_header would emit a second Content-Type header.
        default_type text/plain;
        return 200 "OK\n";
    }
}

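Database check

The load balancer only forwards /health to a backend; checking a dependency such as the database is the job of the backend's own /health handler.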

nginx

upstream backend {
    server 192.168.1.10:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    server_name lb.example.com;

    location /health {
        proxy_pass http://backend/health;
        proxy_next_upstream error timeout http_502 http_503 http_504;
    }
}
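
A backend handler that actually verifies the database might look like the sketch below. It is a minimal example under assumptions: the function name `health_status` is hypothetical, and SQLite stands in for whatever database the backend really uses. Returning 503 on failure matters here because `http_503` is listed in `proxy_next_upstream`, so nginx treats that response as a failed attempt.

```python
import sqlite3
from typing import Tuple


def health_status(conn: sqlite3.Connection) -> Tuple[int, str]:
    """Return (HTTP status, body) for a /health response."""
    try:
        # A trivial query proves the connection is usable.
        conn.execute("SELECT 1").fetchone()
    except sqlite3.Error:
        return 503, "database unavailable\n"
    return 200, "OK\n"
```

The web framework's /health route would call this function and send the returned status code and body.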

Monitoring and logging

Logging health checks

nginx

# log_format is only valid in the http context.
log_format health '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$upstream_addr" '
                  '"$upstream_status"';

access_log /var/log/nginx/health.log health;
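
Lines written in this format can be parsed to see which upstream servers handled or failed a request. The following is a sketch that assumes the exact `health` format defined above; the function name `parse_health_line` is hypothetical. When a request is retried, nginx records one address and one status per attempt, separated by commas, so both fields are split into lists.

```python
import re
from typing import Optional

HEALTH_LINE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<upstream_addr>[^"]*)" "(?P<upstream_status>[^"]*)"'
)


def parse_health_line(line: str) -> Optional[dict]:
    """Parse one log line; return None if it does not match the format."""
    match = HEALTH_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # One entry per upstream attempt, in the order the attempts were made.
    fields["upstream_addr"] = fields["upstream_addr"].split(", ")
    fields["upstream_status"] = fields["upstream_status"].split(", ")
    return fields
```

A line whose `upstream_status` list has more than one entry indicates that the first server failed and the request was passed to another one.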

Status monitoring

nginx

server {
    listen 80;
    server_name status.example.com;

    location /nginx_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}
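
The plain-text page served by `stub_status` can be turned into numbers for a monitoring script. The sketch below assumes the module's usual four-line output; the function name `parse_stub_status` and the dictionary keys are choices made for this example.

```python
import re


def parse_stub_status(text: str) -> dict:
    """Extract the counters from a stub_status response body."""
    active = re.search(r"Active connections:\s+(\d+)", text)
    # The totals line holds three integers: accepts, handled, requests.
    totals = re.search(
        r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE
    )
    states = re.search(
        r"Reading:\s+(\d+)\s+Writing:\s+(\d+)\s+Waiting:\s+(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("not a stub_status response")
    return {
        "active": int(active.group(1)),
        "accepts": int(totals.group(1)),
        "handled": int(totals.group(2)),
        "requests": int(totals.group(3)),
        "reading": int(states.group(1)),
        "writing": int(states.group(2)),
        "waiting": int(states.group(3)),
    }
```

A gap between `accepts` and `handled` means connections were dropped, which is worth alerting on alongside the upstream health data.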

Failover

Automatic failover

nginx

upstream backend {
    server 192.168.1.10:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:8080 max_fails=3 fail_timeout=30s;
    server 192.168.1.12:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 80;
    server_name lb.example.com;

    location / {
        proxy_pass http://backend;

        proxy_next_upstream error timeout http_502 http_503 http_504;
        proxy_next_upstream_tries 2;
    }
}
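
The retry behaviour configured above can be pictured with a short sketch. It is an approximation of the idea, not nginx's logic: `proxy_with_failover` and the `send` callable are hypothetical, connection errors and timeouts are modelled as `OSError`, and `tries` is taken as the total number of attempts including the first.

```python
from typing import Callable, Iterable, Tuple

# Statuses listed in proxy_next_upstream in the configuration above.
RETRY_STATUSES = {502, 503, 504}


def proxy_with_failover(
    peers: Iterable[str],
    send: Callable[[str], int],
    tries: int = 2,
) -> Tuple[str, int]:
    """Try peers in order until one returns a non-retryable status.

    `send` performs the request against one peer and returns the HTTP
    status, raising OSError on a connection error or timeout.
    """
    last = ("", 502)
    for attempt, peer in enumerate(peers, start=1):
        if attempt > tries:
            break
        try:
            status = send(peer)
        except OSError:
            # Treated as a failed attempt; move on to the next peer.
            last = (peer, 502)
            continue
        last = (peer, status)
        if status not in RETRY_STATUSES:
            return last
    # Every allowed attempt failed: report the last failure.
    return last
```

With `tries=2`, a request that fails on the first server is passed to one more server and no further, which bounds the extra latency a failing request can incur.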

Backup server

nginx

upstream backend {
    server 192.168.1.10:8080;
    server 192.168.1.11:8080;
    server 192.168.1.12:8080 backup;
}

server {
    listen 80;
    server_name lb.example.com;

    location / {
        proxy_pass http://backend;
    }
}
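
The effect of the `backup` flag can be sketched as a selection rule: backup servers receive traffic only when no primary server is available. The function `candidate_servers` and the dictionary layout are hypothetical, chosen only to illustrate the rule.

```python
from typing import Dict, List


def candidate_servers(servers: List[Dict]) -> List[str]:
    """Return the addresses currently eligible for traffic.

    Each entry looks like {"addr": str, "backup": bool, "up": bool}.
    """
    primaries = [s["addr"] for s in servers if s["up"] and not s["backup"]]
    if primaries:
        return primaries
    # No primary is available: fall back to the backup servers.
    return [s["addr"] for s in servers if s["up"] and s["backup"]]
```

With the configuration above, 192.168.1.12 would stay idle as long as either 192.168.1.10 or 192.168.1.11 is considered available.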

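Common problems

Servers switch state too often

Cause: max_fails and fail_timeout are set inappropriately, so a few transient errors are enough to take a server out of rotation

Fix: adjust the parameters, for example by raising both values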

nginx

upstream backend {
    server 192.168.1.10:8080 max_fails=5 fail_timeout=60s;
    server 192.168.1.11:8080 max_fails=5 fail_timeout=60s;
}

Health checks fail

Cause: the health check endpoint is misconfigured

Fix: verify that the backend serves the health check endpoint

nginx

server {
    listen 8080;

    location /health {
        access_log off;
        return 200 "OK\n";
    }
}

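Summary

Key points of health checking:

Passive checks: max_fails and fail_timeout
Active checks: provided by a third-party module
Health endpoint: expose a /health interface
Failover: switch automatically to an available server
Monitoring and logging: record health check status

Configuring health checks appropriately improves system availability and stability.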
