智能知识系统 - 部署指南
本文档提供智能知识系统的完整部署指南,支持 Docker Compose 和 Kubernetes 两种部署方式。
目录
系统要求
最低配置
| 组件 | 最低要求 | 推荐配置 |
|---|---|---|
| CPU | 2 核 | 4 核+ |
| 内存 | 4 GB | 8 GB+ |
| 磁盘 | 20 GB | 50 GB+ SSD |
| 操作系统 | Linux/macOS | Ubuntu 22.04 LTS |
| Docker | 20.10+ | 24.0+ |
| Docker Compose | 2.0+ | 2.20+ |
软件依赖
# Docker
docker --version # >= 20.10
# Docker Compose
docker-compose --version # >= 2.0
# 或
docker compose version
# Kubernetes (可选)
kubectl version --client # >= 1.25
快速开始
1. 克隆项目
2. 配置环境变量
3. 启动服务
4. 验证部署
# 健康检查
curl http://localhost:8001/health
# 访问 Web 界面
open http://localhost:8008
# 访问 API 文档
open http://localhost:8001/docs
Docker Compose 部署
服务架构
┌─────────────────────────────────────────────────────────┐
│ Nginx │
│ :8008 │
└─────────────────────┬───────────────────────────────────┘
│
┌─────────────────────┴───────────────────────────────────┐
│ API Gateway │
│ :8001 │
└─────┬─────────┬─────────┬─────────┬─────────┬──────────┘
│ │ │ │ │
┌─────▼───┐ ┌──▼────┐ ┌──▼────┐ ┌──▼─────┐ ┌─▼─────┐
│Postgres │ │ Redis │ │Prometheus│ │Grafana│ │Exporters│
│ :5436 │ │ :6381 │ │ :9090 │ │ :3000 │ │ │
└─────────┘ └───────┘ └──────────┘ └───────┘ └─────────┘
完整部署步骤
1. 准备配置文件
创建 .env 文件:
# 数据库配置
POSTGRES_PASSWORD=your_secure_password_here
POSTGRES_USER=zhineng
POSTGRES_DB=zhineng_kb
# Redis 配置
REDIS_PASSWORD=your_redis_password_here
# API 配置
API_PORT=8000
LOG_LEVEL=INFO
# AI API 配置
DEEPSEEK_API_KEY=your_deepseek_api_key
DEEPSEEK_API_URL=https://api.deepseek.com/v1
# 监控配置
GRAFANA_ADMIN_PASSWORD=your_grafana_password
2. 启动核心服务
# 启动数据库和缓存
docker-compose up -d postgres redis
# 等待数据库初始化
sleep 10
# 启动 API 服务
docker-compose up -d api
# 启动前端服务
docker-compose up -d nginx
3. 启动监控服务
# 启动 Prometheus
docker-compose up -d prometheus
# 启动 Grafana
docker-compose up -d grafana
# 启动 Exporters
docker-compose up -d redis-exporter postgres-exporter
4. 验证服务状态
使用部署脚本
# 环境检查
./scripts/deploy.sh check
# 构建镜像
./scripts/deploy.sh build
# 启动服务
./scripts/deploy.sh start
# 停止服务
./scripts/deploy.sh stop
# 重启服务
./scripts/deploy.sh restart
# 查看状态
./scripts/deploy.sh status
# 查看日志
./scripts/deploy.sh logs [service]
# 健康检查
./scripts/deploy.sh health
数据持久化
Docker Compose 默认使用命名卷进行数据持久化:
volumes:
postgres_data: # PostgreSQL 数据
redis_data: # Redis 数据
prometheus_data: # Prometheus 指标
grafana_data: # Grafana 配置和仪表板
备份和恢复:
Kubernetes 部署
准备 K8s 清单文件
1. 命名空间
2. ConfigMap
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: zhineng-config
namespace: zhineng-kb
data:
LOG_LEVEL: "INFO"
API_PORT: "8000"
POSTGRES_HOST: "postgres-service"
POSTGRES_PORT: "5432"
REDIS_HOST: "redis-service"
REDIS_PORT: "6379"
3. Secret
# k8s/secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: zhineng-secret
namespace: zhineng-kb
type: Opaque
data:
POSTGRES_PASSWORD: <base64-encoded-password>
REDIS_PASSWORD: <base64-encoded-password>
DEEPSEEK_API_KEY: <base64-encoded-api-key>
生成 base64 编码:
4. PostgreSQL 部署
# k8s/postgres.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
namespace: zhineng-kb
spec:
serviceName: postgres-service
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: pgvector/pgvector:pg16
ports:
- containerPort: 5432
env:
- name: POSTGRES_USER
value: "zhineng"
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: zhineng-secret
key: POSTGRES_PASSWORD
- name: POSTGRES_DB
value: "zhineng_kb"
volumeMounts:
- name: postgres-storage
mountPath: /var/lib/postgresql/data
volumeMounts:
- name: init-sql
mountPath: /docker-entrypoint-initdb.d
volumes:
- name: postgres-storage
persistentVolumeClaim:
claimName: postgres-pvc
- name: init-sql
configMap:
name: init-sql-config
---
apiVersion: v1
kind: Service
metadata:
name: postgres-service
namespace: zhineng-kb
spec:
selector:
app: postgres
ports:
- port: 5432
targetPort: 5432
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-pvc
namespace: zhineng-kb
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
5. Redis 部署
# k8s/redis.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: redis
namespace: zhineng-kb
spec:
replicas: 1
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
spec:
containers:
- name: redis
image: redis:7-alpine
ports:
- containerPort: 6379
args:
- redis-server
- --requirepass
- $(REDIS_PASSWORD)
env:
- name: REDIS_PASSWORD
valueFrom:
secretKeyRef:
name: zhineng-secret
key: REDIS_PASSWORD
volumeMounts:
- name: redis-storage
mountPath: /data
volumes:
- name: redis-storage
persistentVolumeClaim:
claimName: redis-pvc
---
apiVersion: v1
kind: Service
metadata:
name: redis-service
namespace: zhineng-kb
spec:
selector:
app: redis
ports:
- port: 6379
targetPort: 6379
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: redis-pvc
namespace: zhineng-kb
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi
6. API 服务部署
# k8s/api.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: api
namespace: zhineng-kb
spec:
replicas: 3
selector:
matchLabels:
app: api
template:
metadata:
labels:
app: api
spec:
containers:
- name: api
image: zhineng-api:latest
ports:
- containerPort: 8000
env:
- name: DATABASE_URL
value: "postgresql://zhineng:$(POSTGRES_PASSWORD)@postgres-service:5432/zhineng_kb"
- name: REDIS_URL
value: "redis://:$(REDIS_PASSWORD)@redis-service:6379/0"
- name: LOG_LEVEL
valueFrom:
configMapKeyRef:
name: zhineng-config
key: LOG_LEVEL
envFrom:
- secretRef:
name: zhineng-secret
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 10
periodSeconds: 5
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "1Gi"
cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
name: api-service
namespace: zhineng-kb
spec:
selector:
app: api
ports:
- port: 8000
targetPort: 8000
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: api-ingress
namespace: zhineng-kb
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
tls:
- hosts:
- api.yourdomain.com
secretName: api-tls
rules:
- host: api.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: api-service
port:
number: 8000
7. HPA (Horizontal Pod Autoscaler)
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-hpa
namespace: zhineng-kb
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
K8s 部署命令
# 创建命名空间
kubectl apply -f k8s/namespace.yaml
# 创建配置和密钥
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secret.yaml
# 部署数据库
kubectl apply -f k8s/postgres.yaml
# 部署缓存
kubectl apply -f k8s/redis.yaml
# 等待就绪
kubectl wait --for=condition=ready pod -l app=postgres -n zhineng-kb --timeout=60s
kubectl wait --for=condition=ready pod -l app=redis -n zhineng-kb --timeout=60s
# 部署 API 服务
kubectl apply -f k8s/api.yaml
# 配置自动扩缩容
kubectl apply -f k8s/hpa.yaml
# 查看状态
kubectl get all -n zhineng-kb
# 查看日志
kubectl logs -f deployment/api -n zhineng-kb
使用 Helm 部署
创建 Helm Chart:
# 创建 Chart 结构
helm create zhineng-kb
# 编辑 values.yaml
cat > zhineng-kb/values.yaml << EOF
replicaCount: 3
image:
repository: zhineng-api
tag: latest
pullPolicy: IfNotPresent
service:
type: ClusterIP
port: 8000
resources:
limits:
cpu: 500m
memory: 1Gi
requests:
cpu: 250m
memory: 512Mi
autoscaling:
enabled: true
minReplicas: 2
maxReplicas: 10
targetCPUUtilizationPercentage: 70
targetMemoryUtilizationPercentage: 80
postgres:
enabled: true
persistence:
size: 10Gi
redis:
enabled: true
persistence:
size: 2Gi
EOF
# 部署
helm install zhineng-kb ./zhineng-kb -n zhineng-kb --create-namespace
# 升级
helm upgrade zhineng-kb ./zhineng-kb -n zhineng-kb
# 回滚
helm rollback zhineng-kb -n zhineng-kb
环境配置
核心环境变量
| 变量名 | 说明 | 默认值 | 必填 |
|---|---|---|---|
| DATABASE_URL | PostgreSQL 连接字符串 | - | 是 |
| REDIS_URL | Redis 连接字符串 | - | 是 |
| API_HOST | API 监听地址 | 0.0.0.0 | 否 |
| API_PORT | API 监听端口 | 8000 | 否 |
| LOG_LEVEL | 日志级别 | INFO | 否 |
| DEEPSEEK_API_KEY | DeepSeek API 密钥 | - | 是* |
| DEEPSEEK_API_URL | DeepSeek API 地址 | - | 否 |
- 仅推理服务需要
数据库配置
# PostgreSQL 连接字符串格式
postgresql://[user]:[password]@[host]:[port]/[database]
# 示例
DATABASE_URL=postgresql://zhineng:zhineng123@postgres:5432/zhineng_kb
Redis 配置
# Redis 连接字符串格式
redis://:[password]@[host]:[port]/[db]
# 示例
REDIS_URL=redis://:redis123@redis:6379/0
监控配置
Prometheus 配置
Prometheus 配置文件位于 monitoring/prometheus/prometheus.yml:
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'zhineng-api'
static_configs:
- targets: ['api:8000']
metrics_path: '/api/v1/metrics/prometheus'
- job_name: 'postgres'
static_configs:
- targets: ['postgres-exporter:9187']
- job_name: 'redis'
static_configs:
- targets: ['redis-exporter:9121']
Grafana 仪表板
访问 Grafana: http://localhost:3000
默认凭据:
- 用户名: admin
- 密码: admin123
导入仪表板:
1. 登录 Grafana
2. 导航到 Dashboards -> Import
3. 上传 monitoring/grafana/dashboards/overview.json
告警规则 (可选)
创建告警规则文件 monitoring/prometheus/alerts.yml:
groups:
- name: zhineng_alerts
interval: 30s
rules:
- alert: HighErrorRate
expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.05
for: 5m
labels:
severity: warning
annotations:
summary: "High error rate detected"
description: "Error rate is {{ $value }} errors/sec"
- alert: HighResponseTime
expr: histogram_quantile(0.95, http_request_duration_seconds_bucket) > 1
for: 5m
labels:
severity: warning
annotations:
summary: "High response time detected"
description: "P95 response time is {{ $value }} seconds"
生产环境建议
安全配置
- 更改默认密码
# 修改 .env 文件中的密码
POSTGRES_PASSWORD=your_strong_password
REDIS_PASSWORD=your_strong_password
GRAFANA_ADMIN_PASSWORD=your_strong_password
- 启用 HTTPS
# nginx/nginx.conf
server {
listen 443 ssl http2;
server_name yourdomain.com;
ssl_certificate /etc/nginx/ssl/cert.pem;
ssl_certificate_key /etc/nginx/ssl/key.pem;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
# ... 其他配置
}
- 配置防火墙
- 启用认证 (未实现)
# 在 main.py 中添加 JWT 认证
from fastapi.security import HTTPBearer
security = HTTPBearer()
@app.get("/api/v1/protected")
async def protected_route(credentials: HTTPAuthorizationCredentials = Depends(security)):
# 验证 token
pass
性能优化
- 数据库连接池
# config.py
DATABASE_POOL_SIZE = int(os.getenv("DATABASE_POOL_SIZE", "20"))
DATABASE_MAX_OVERFLOW = int(os.getenv("DATABASE_MAX_OVERFLOW", "10"))
- Redis 缓存策略
# 配置缓存过期时间
CACHE_TTL = {
"document": 3600, # 文档缓存 1 小时
"search": 300, # 搜索结果缓存 5 分钟
"embedding": 86400, # 嵌入向量缓存 24 小时
}
- API 限流
# gateway/rate_limiter.py
RATE_LIMITS = {
"default": 100, # 每分钟 100 次请求
"search": 50, # 搜索 API 每分钟 50 次
"reason": 20, # 推理 API 每分钟 20 次
}
备份策略
- 自动备份 (使用 Cron)
- 远程备份
- 备份验证
故障排查
常见问题
1. 容器启动失败
# 查看日志
docker-compose logs api
# 检查配置
docker-compose config
# 重新构建
docker-compose build --no-cache api
2. 数据库连接失败
# 检查 PostgreSQL 状态
docker-compose ps postgres
# 进入容器检查
docker-compose exec postgres psql -U zhineng -d zhineng_kb
# 检查网络
docker network ls
docker network inspect zhineng-network
3. API 响应慢
# 检查资源使用
docker stats
# 查看慢查询日志
docker-compose exec postgres cat /var/log/postgresql/postgresql-slow.log
# 检查缓存状态
docker-compose exec redis redis-cli -a redis123 INFO stats
4. 内存不足
日志查看
# 查看所有日志
docker-compose logs
# 跟踪日志
docker-compose logs -f
# 查看特定服务日志
docker-compose logs -f api
# 查看最近 100 行
docker-compose logs --tail=100 api
健康检查
# 使用部署脚本
./scripts/deploy.sh health
# 手动检查
curl http://localhost:8001/health
curl http://localhost:8001/api/v1/domains
curl http://localhost:8001/api/v1/stats