# Monitoring and Alerting: Quick Configuration Examples

## 🎯 Quick Start in Three Steps

### Step 1: Run the environment check

```bash
chmod +x scripts/check-monitoring-env.sh
./scripts/check-monitoring-env.sh
```
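
### Step 2: Run the quick-start script

```bash
chmod +x scripts/start-monitoring.sh
./scripts/start-monitoring.sh
```

The script automatically:
- Checks the Docker environment
- Checks for port conflicts
- Creates the required directories
- Prompts for email settings (optional)
- Starts all monitoring services
- Waits for the services to become ready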
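
The last item in the list, waiting for the services to become ready, can also be done by hand when a service was started outside the script. A minimal sketch, assuming `curl` is installed; `wait_until` is a hypothetical helper name and is not taken from `scripts/start-monitoring.sh`:

```bash
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping DELAY_SECONDS after each failed attempt.
# Returns 0 on the first success and 1 if every attempt fails.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  while [ "$attempts" -gt 0 ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_until 30 2 curl -fsS http://localhost:9090/-/ready` polls the Prometheus readiness endpoint for up to about a minute. Alertmanager exposes the same `/-/ready` path on port 9093, and Grafana reports its health at `/api/health`.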

### Step 3: Open the monitoring interfaces

- **Prometheus**: http://localhost:9090
- **Grafana**: http://localhost:3001 (default login: admin/admin)
- **Alertmanager**: http://localhost:9093

## 📧 Email Configuration Examples

### Using the Resend email service

```yaml
# monitoring/alertmanager.yml
receivers:
  - name: 'critical-alerts'
    email_configs:
      - to: 'admin@novalon.cn,ops@novalon.cn'
        from: 'alertmanager@novalon.cn'
        smarthost: 'smtp.resend.com:587'
        auth_username: 'resend'
        # Placeholder: replace with your own Resend API key
        auth_password: 're_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
        require_tls: true
```

### Getting a Resend API key

1. Go to https://resend.com/
2. Create an account
3. Open the API Keys page
4. Create a new API key
5. Copy the API key (it starts with `re_`)

### Using other email services

#### Gmail
```yaml
email_configs:
  - to: 'admin@novalon.cn'
    from: 'alertmanager@novalon.cn'
    smarthost: 'smtp.gmail.com:587'
    auth_username: 'your-email@gmail.com'
    # A Gmail app password, not the account password
    auth_password: 'your-app-password'
    require_tls: true
```

#### QQ Mail
```yaml
email_configs:
  - to: 'admin@novalon.cn'
    from: 'alertmanager@novalon.cn'
    smarthost: 'smtp.qq.com:587'
    auth_username: 'your-email@qq.com'
    # The authorization code issued by QQ Mail, not the login password
    auth_password: 'your-authorization-code'
    require_tls: true
```

## 🔔 Alert Rule Examples

### Basic alert rules

```yaml
# monitoring/alerts.yml
groups:
  - name: novalon-website
    rules:
      # Service unavailable
      - alert: ServiceDown
        expr: up{job="novalon-website"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service unavailable"
          description: "The Novalon website service has stopped responding"

      # High error rate: 5xx requests as a fraction of all requests
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate"
          description: "The 5xx error rate is above 5%"

      # High response time
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time"
          description: "P95 response time is above 1 second"
```
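
The `HighErrorRate` description speaks of a 5xx error rate above 5%, which is a ratio: 5xx requests divided by all requests over the same window. The arithmetic can be sketched in shell; the helper names and the sample counts are made up for illustration and are not part of the monitoring stack:

```bash
# error_ratio ERRORS TOTAL
# Prints ERRORS / TOTAL with four decimal places (0.0000 when TOTAL is 0).
error_ratio() {
  awk -v e="$1" -v t="$2" \
    'BEGIN { if (t == 0) { print "0.0000" } else { printf "%.4f\n", e / t } }'
}

# exceeds_threshold RATIO THRESHOLD
# Succeeds (exit status 0) when RATIO is strictly greater than THRESHOLD.
exceeds_threshold() {
  awk -v r="$1" -v th="$2" 'BEGIN { exit !(r > th) }'
}
```

With 12 failed requests out of 200, `error_ratio 12 200` prints `0.0600`, and `exceeds_threshold 0.0600 0.05` succeeds, so the alert condition would hold. With 2 failures out of 200 the ratio is `0.0100` and the condition does not hold.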

## 📊 Grafana Data Source Configuration

### Adding the Prometheus data source

1. Go to http://localhost:3001
2. Log in (admin/admin)
3. Go to Configuration → Data Sources
4. Click "Add data source"
5. Select "Prometheus"
6. Configure:
   - Name: Prometheus
   - URL: http://prometheus:9090
   - Access: Server (default)
7. Click "Save & Test"

### Importing the dashboard

1. Go to Dashboards → Import
2. Upload `monitoring/grafana-dashboard.json`
3. Select the Prometheus data source
4. Click "Import"

## 🧪 Testing Alerts

### Sending a test alert

```bash
# Alertmanager 0.27 and later serve only the v2 API;
# older releases also accept /api/v1/alerts.
curl -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[
    {
      "labels": {
        "alertname": "TestAlert",
        "severity": "warning"
      },
      "annotations": {
        "description": "This is a test alert"
      }
    }
  ]'
```
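
To send alerts with different names or severities, the JSON body can be generated instead of typed by hand. A minimal sketch; `build_test_alert` is a hypothetical helper, and it does no JSON escaping, so it only suits simple values without quotes or backslashes:

```bash
# build_test_alert [NAME] [SEVERITY] [DESCRIPTION]
# Prints a one-element JSON array in the shape Alertmanager expects.
# Defaults mirror the test alert used in this guide.
build_test_alert() {
  printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"description":"%s"}}]\n' \
    "${1:-TestAlert}" "${2:-warning}" "${3:-This is a test alert}"
}
```

Pipe the output into the `curl` command above, replacing the inline body with `-d @-` so that `curl` reads the payload from standard input.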

### Checking alert status

```bash
# List alerts in Alertmanager
curl http://localhost:9093/api/v2/alerts

# List alerts in Prometheus
curl http://localhost:9090/api/v1/alerts
```

## 🔧 Common Commands

### Checking service status
```bash
docker-compose -f docker-compose.monitoring.yml ps
```

### Viewing service logs
```bash
# Prometheus logs
docker-compose -f docker-compose.monitoring.yml logs prometheus

# Grafana logs
docker-compose -f docker-compose.monitoring.yml logs grafana

# Alertmanager logs
docker-compose -f docker-compose.monitoring.yml logs alertmanager
```

### Restarting services
```bash
# Restart all services
docker-compose -f docker-compose.monitoring.yml restart

# Restart a single service
docker-compose -f docker-compose.monitoring.yml restart prometheus
```

### Stopping services
```bash
docker-compose -f docker-compose.monitoring.yml down
```
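
Every command in this section repeats the same `-f docker-compose.monitoring.yml` argument, so a small wrapper can cut down on typing. A sketch only: `mon`, `MONITORING_COMPOSE_FILE`, and `COMPOSE_CMD` are hypothetical names introduced here, not part of the project's scripts:

```bash
# Compose file used by the monitoring stack in this guide.
MONITORING_COMPOSE_FILE="docker-compose.monitoring.yml"

# mon SUBCOMMAND [ARGS...]
# Forwards its arguments to Compose with the monitoring file preselected.
# COMPOSE_CMD is left unquoted on purpose so that a value such as
# "docker compose" splits into the command and its subcommand.
mon() {
  ${COMPOSE_CMD:-docker-compose} -f "$MONITORING_COMPOSE_FILE" "$@"
}
```

After sourcing the snippet, `mon ps`, `mon logs prometheus`, and `mon down` correspond to the commands above. On systems that ship Compose as a Docker CLI plugin, setting `COMPOSE_CMD="docker compose"` switches the wrapper to that form.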

## 📚 Further Documentation

- Detailed configuration guide: `docs/MONITORING_SETUP.md`
- Production deployment guide: `docs/PRODUCTION_DEPLOYMENT.md`
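
## 🆘 Running Into Problems?

1. Check that Docker is running
2. Check the service logs for errors
3. Make sure the required ports are not already in use
4. Verify that the configuration files are syntactically valid
5. See the detailed documentation for further help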

## 📞 Contact Support

- Operations team: ops@novalon.cn
- Business inquiries: contact@novalon.cn