# Monitoring and Alerting Quick-Start Configuration Examples
## 🎯 Quick Start in Three Steps
### Step 1: Run the environment check
```bash
chmod +x scripts/check-monitoring-env.sh
./scripts/check-monitoring-env.sh
```
### Step 2: Run the quick-start script
```bash
chmod +x scripts/start-monitoring.sh
./scripts/start-monitoring.sh
```
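The script automatically:
- Checks the Docker environment
- Checks for port conflicts
- Creates the required directories
- Prompts for email settings (optional)
- Starts all monitoring services
- Waits for the services to become ready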
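
The port check in the list above can also be done by hand. The following is a minimal sketch, not the contents of `scripts/start-monitoring.sh`: it assumes Bash (for its `/dev/tcp` redirection) and the three ports used by the URLs in this guide (9090, 3001, 9093). The names `MONITORING_PORTS`, `port_in_use`, and `report_ports` are hypothetical.

```bash
#!/usr/bin/env bash
# Ports used by Prometheus, Grafana, and Alertmanager in this guide.
MONITORING_PORTS="9090 3001 9093"

# Succeeds (exit 0) if something accepts TCP connections on 127.0.0.1:<port>.
# Relies on Bash's /dev/tcp redirection; the subshell closes the socket on exit.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Prints one line per port: "<port> in use" or "<port> free".
report_ports() {
  local port
  for port in $MONITORING_PORTS; do
    if port_in_use "$port"; then
      echo "$port in use"
    else
      echo "$port free"
    fi
  done
}

report_ports
```

Before the first start, all three ports would be expected to report `free`; once the stack is up, they should report `in use`.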
### Step 3: Open the monitoring interfaces
- **Prometheus**: http://localhost:9090
- **Grafana**: http://localhost:3001 (default login: admin/admin)
- **Alertmanager**: http://localhost:9093
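
To confirm from the command line that the three interfaces are reachable, a small probe along these lines may help. It is a sketch that assumes the default ports above and `curl` on the PATH; `health_url` and `check_service` are hypothetical helper names. The endpoints used are the documented health paths: `/-/healthy` for Prometheus and Alertmanager, and `/api/health` for Grafana.

```bash
# Maps a service name to its health endpoint on the ports used in this guide.
health_url() {
  case "$1" in
    prometheus)   echo "http://localhost:9090/-/healthy" ;;
    grafana)      echo "http://localhost:3001/api/health" ;;
    alertmanager) echo "http://localhost:9093/-/healthy" ;;
    *)            echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Prints "<service>: up" or "<service>: not responding"; returns non-zero when down.
check_service() {
  local url
  url=$(health_url "$1") || return 1
  if curl --silent --fail --max-time 3 --output /dev/null "$url"; then
    echo "$1: up"
  else
    echo "$1: not responding"
    return 1
  fi
}

# Usage:
#   for s in prometheus grafana alertmanager; do check_service "$s"; done
```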
## 📧 Email Configuration Examples
### Using the Resend email service
```yaml
# monitoring/alertmanager.yml
receivers:
  - name: 'critical-alerts'
    email_configs:
      - to: 'admin@novalon.cn,ops@novalon.cn'
        from: 'alertmanager@novalon.cn'
        smarthost: 'smtp.resend.com:587'
        auth_username: 'resend'
        auth_password: 're_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
        require_tls: true
```
### Getting a Resend API key
1. Go to https://resend.com/
2. Sign up for an account
3. Open the API Keys page
4. Create a new API key
5. Copy the API key (it starts with `re_`)
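
Before starting Alertmanager, it can be worth checking that the example placeholder was actually replaced. The sketch below assumes only what this guide states, namely that keys start with `re_`; it does not validate the key against Resend. `looks_like_resend_key` and `has_placeholder_key` are hypothetical helper names.

```bash
# Succeeds if the argument has the `re_` prefix followed by at least one character.
looks_like_resend_key() {
  case "$1" in
    re_?*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Succeeds if the file still contains an all-x placeholder key such as re_xxxx...
has_placeholder_key() {
  grep -Eq "re_x{20,}" "$1"
}

# Usage:
#   has_placeholder_key monitoring/alertmanager.yml && echo "replace the placeholder key first"
```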
### Using other email services
#### Gmail
```yaml
email_configs:
  - to: 'admin@novalon.cn'
    from: 'alertmanager@novalon.cn'
    smarthost: 'smtp.gmail.com:587'
    auth_username: 'your-email@gmail.com'
    auth_password: 'your-app-password'
    require_tls: true
```
#### QQ Mail
```yaml
email_configs:
  - to: 'admin@novalon.cn'
    from: 'alertmanager@novalon.cn'
    smarthost: 'smtp.qq.com:587'
    auth_username: 'your-email@qq.com'
    auth_password: 'your-authorization-code'
    require_tls: true
```
## 🔔 Alert Rule Examples
### Basic alert rules
```yaml
# monitoring/alerts.yml
groups:
  - name: novalon-website
    rules:
      # Service unavailable
      - alert: ServiceDown
        expr: up{job="novalon-website"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service unavailable"
          description: "The Novalon website service has stopped responding"
      # High error rate: 5xx responses as a fraction of all requests
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate"
          description: "The 5xx error rate exceeds 5%"
      # High response time
      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time"
          description: "P95 response time exceeds 1 second"
```
## 📊 Grafana Data Source Configuration
### Adding the Prometheus data source
1. Go to http://localhost:3001
2. Log in (admin/admin)
3. Go to Configuration → Data Sources
4. Click "Add data source"
5. Select "Prometheus"
6. Configure:
   - Name: Prometheus
   - URL: http://prometheus:9090
   - Access: Server (default)
7. Click "Save & Test"
### Importing the dashboard
1. Go to Dashboards → Import
2. Upload `monitoring/grafana-dashboard.json`
3. Select the Prometheus data source
4. Click "Import"
## 🧪 Testing Alerts
### Sending a test alert
```bash
# Uses the v2 API; Alertmanager 0.27 removed the v1 endpoints.
curl -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[
    {
      "labels": {
        "alertname": "TestAlert",
        "severity": "warning"
      },
      "annotations": {
        "description": "This is a test alert"
      }
    }
  ]'
```
### Checking alert status
```bash
# List alerts in Alertmanager (v2 API; v1 was removed in Alertmanager 0.27)
curl http://localhost:9093/api/v2/alerts
# List alerts in Prometheus
curl http://localhost:9090/api/v1/alerts
```
## 🔧 Common Commands
### Checking service status
```bash
docker-compose -f docker-compose.monitoring.yml ps
```
### Viewing service logs
```bash
# Prometheus logs
docker-compose -f docker-compose.monitoring.yml logs prometheus
# Grafana logs
docker-compose -f docker-compose.monitoring.yml logs grafana
# Alertmanager logs
docker-compose -f docker-compose.monitoring.yml logs alertmanager
```
### Restarting services
```bash
# Restart all services
docker-compose -f docker-compose.monitoring.yml restart
# Restart a single service
docker-compose -f docker-compose.monitoring.yml restart prometheus
```
### Stopping services
```bash
docker-compose -f docker-compose.monitoring.yml down
```
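
Because every command above repeats `-f docker-compose.monitoring.yml`, a small wrapper can shorten them. This is an optional sketch for Bash: `mon` and the `COMPOSE_BIN` variable are hypothetical names, and `COMPOSE_BIN` exists only so the same function can drive the newer `docker compose` plugin instead of the standalone `docker-compose` binary.

```bash
# Runs a Compose subcommand against the monitoring stack.
# COMPOSE_BIN is a hypothetical override; set it to "docker compose" for the v2 plugin.
mon() {
  # Left unquoted on purpose so that "docker compose" splits into two words.
  ${COMPOSE_BIN:-docker-compose} -f docker-compose.monitoring.yml "$@"
}

# Usage:
#   mon ps
#   mon logs prometheus
#   mon restart grafana
#   mon down
```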
## 📚 More Documentation
- Detailed configuration guide: [docs/MONITORING_SETUP.md](MONITORING_SETUP.md)
- Production deployment guide: [docs/PRODUCTION_DEPLOYMENT.md](PRODUCTION_DEPLOYMENT.md)
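## 🆘 Running Into Problems?
1. Check that Docker is running properly
2. Check the service logs for errors
3. Confirm that the ports are not already in use
4. Verify that the configuration files are syntactically valid
5. See the detailed documentation for more help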
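
For the configuration syntax check, Prometheus and Alertmanager each ship a validator: `promtool check rules` and `amtool check-config`. The sketch below runs them from the official images so nothing has to be installed locally. The image names `prom/prometheus` and `prom/alertmanager` are assumptions (your compose file may pin different images or tags), as are the helper names `run`, `check_rules`, and `check_alertmanager_config`. The commands are meant to be run from the repository root, where the `monitoring/` directory lives.

```bash
# Prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Validates monitoring/alerts.yml with promtool (image name is an assumption).
check_rules() {
  run docker run --rm -v "$PWD/monitoring:/cfg:ro" \
    --entrypoint promtool prom/prometheus check rules /cfg/alerts.yml
}

# Validates monitoring/alertmanager.yml with amtool (image name is an assumption).
check_alertmanager_config() {
  run docker run --rm -v "$PWD/monitoring:/cfg:ro" \
    --entrypoint amtool prom/alertmanager check-config /cfg/alertmanager.yml
}

# Usage:
#   check_rules && check_alertmanager_config
#   DRY_RUN=1 check_rules   # only print the docker command
```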
## 📞 Contact Support
- Operations team: ops@novalon.cn
- Business inquiries: contact@novalon.cn