进化验证Agent使用指南

版本: 1.0.0 日期: 2026-04-01 状态: Phase 2 完成 ✅

📋 概述

EvolutionVerificationAgent 是灵知系统自进化流程中的关键组件，负责验证新版本是否真的优于旧版本，避免无效或退化的改进。

核心职责

多维度验证 - 从内容、结构、质量、用户反馈等多个维度评估
竞品对比 - 与混元、DeepSeek等竞品进行对比验证
智能决策 - 基于验证结果决定是否采纳改进
改进建议 - 提供具体的改进方向和优化建议

🚀 快速开始

基础使用

from backend.services.evolution.verification_agent import get_verification_agent

# 获取验证Agent单例
verifier = get_verification_agent()

# 执行验证
result = await verifier.verify_evolution(
    db=db,
    query="如何提高学习注意力？",
    old_response="简短的回答...",
    new_response="详细的、结构化的回答...",
    user_feedback=None  # 可选
)

# 检查结果
if result.is_valid:
    print(f"✅ 验证通过！置信度: {result.confidence:.2f}")
    print(f"原因: {result.reasons}")
    print(f"建议: {result.suggestions}")
else:
    print(f"❌ 验证失败，需要改进")
    print(f"失败原因: {result.reasons}")

🔍 验证维度详解

1. 基础指标验证

检查回答的基本质量指标：

长度改进 - 新版本是否比旧版本长至少20%
最小长度 - 是否满足最小长度要求（500字）
长度比例 - 新旧版本的长度比例

metrics = await verifier._verify_basic_metrics(old_response, new_response)

# 返回：
{
    "old_length": 150,
    "new_length": 600,
    "length_improved": True,
    "length_ratio": 4.0,
    "meets_min_length": True
}

2. 结构化验证

检查回答的结构化程度：

标题 - 是否使用 # 或 ## 标题
列表 - 是否使用 - 或 1. 列表
段落 - 是否有清晰的段落分隔
代码块 - 是否包含代码块（```）

metrics = await verifier._verify_structure(response)

# 返回：
{
    "has_headings": True,
    "has_lists": True,
    "has_paragraphs": True,
    "has_code": False,
    "structure_score": 0.75,  # 0.0 - 1.0
    "meets_threshold": True
}

3. 内容质量验证

使用对比引擎评估内容质量：

完整性 (0-10) - 关键词覆盖、内容长度、结构
实用性 (0-10) - 具体建议、案例、数据支持、引用
清晰度 (0-10) - 标题、列表、段落结构
整体得分 (0-10) - 加权平均（30% + 40% + 30%）

metrics = await verifier._verify_quality(query, old_response, new_response)

# 返回：
{
    "new_scores": {
        "completeness": 8.5,
        "usefulness": 9.0,
        "clarity": 8.0,
        "overall": 8.53
    },
    "old_scores": {
        "completeness": 6.0,
        "usefulness": 6.5,
        "clarity": 6.0,
        "overall": 6.17
    },
    "improvement": {
        "completeness_improved": True,
        "usefulness_improved": True,
        "clarity_improved": True,
        "overall_improved": True,
        "overall_delta": 2.36
    }
}

4. 竞品对比验证

与混元、DeepSeek等竞品进行对比：

并行调用多个AI厂商
使用对比引擎评估排名
要求灵知排名前2

metrics = await verifier._verify_with_competitors(query, response)

# 返回：
{
    "has_competitor_data": True,
    "rank": 1,  # 灵知排名第1
    "scores": {
        "lingzhi": {"overall": 8.5},
        "hunyuan": {"overall": 7.0},
        "deepseek": {"overall": 6.5}
    },
    "meets_threshold": True,  # 前2名
    "winner": "lingzhi"
}

5. 用户反馈验证

如果有用户反馈，验证用户满意度：

最低满意度要求：4.0/5.0
如果没有反馈，默认通过

metrics = await verifier._verify_user_feedback(user_feedback)

# 返回：
{
    "has_feedback": True,
    "satisfaction": 4.5,
    "meets_threshold": True,
    "comments": "非常有帮助，感谢！"
}

🎯 验证决策逻辑

综合判断规则

# 必须通过的项（任意一项失败则不通过）
- 回答长度 >= 500字
- 结构化分数 >= 0.6

# 可选通过的项（至少通过一项）
- 回答长度有显著提升（>= 20%）
- 整体质量优于旧版本
- 竞品对比排名优秀（前2名）
- 用户反馈满意度高（>= 4.0）

# 置信度计算
- 基础分：通过必须项 = +0.3
- 结构分：结构化分数 > 0.5 = +0.1
- 改进分：长度改进 = +0.2
- 质量分：整体质量改进 = +0.2
- 竞品分：竞品排名优秀 = +0.2
- 反馈分：用户满意度高 = +0.3

# 最终判断
is_valid = (
    没有必须失败的项目 AND
    置信度 >= 0.7 AND
    至少一项改进
)

⚙️ 配置阈值

动态更新阈值

# 更新验证阈值
await verifier.update_thresholds({
    "min_confidence": 0.8,  # 提高置信度要求
    "min_improvement_ratio": 1.5,  # 要求50%的改进
    "min_length": 800,  # 提高最小长度要求
})

# 查看当前阈值
thresholds = await verifier.get_thresholds()
print(thresholds)

默认阈值

参数	默认值	说明
`min_confidence`	0.7	最低置信度
`min_improvement_ratio`	1.2	最小改进比例（20%）
`min_length`	500	最小回答长度
`min_user_satisfaction`	4.0	最低用户满意度
`max_competitor_rank`	2	竞品排名要求（前2名）
`min_structure_score`	0.6	最小结构化分数

📊 验证结果

VerificationResult 对象

class VerificationResult:
    is_valid: bool  # 是否验证通过
    confidence: float  # 置信度 (0.0 - 1.0)
    reasons: List[str]  # 验证通过或失败的原因
    suggestions: List[str]  # 改进建议
    metrics: Dict[str, Any]  # 详细的验证指标

使用示例

result = await verifier.verify_evolution(...)

# 检查验证结果
if result.is_valid:
    print(f"✅ 验证通过！")
    print(f"置信度: {result.confidence:.2%}")

    # 采纳改进
    await adopt_improvement(new_response)
else:
    print(f"❌ 验证失败")
    print(f"原因: {', '.join(result.reasons)}")
    print(f"建议: {', '.join(result.suggestions)}")

    # 根据建议重新生成
    improved = await regenerate_with_suggestions(result.suggestions)

# 查看详细指标
print(f"结构化分数: {result.metrics['structure_score']:.2f}")
print(f"长度改进: {result.metrics['length_ratio']:.2%}")
print(f"竞品排名: {result.metrics.get('rank', 'N/A')}")

🔄 集成到进化流程

完整的进化流水线

async def evolution_pipeline(
    db: AsyncSession,
    query: str,
    old_response: str,
    verifier: EvolutionVerificationAgent
):
    """完整的进化流水线"""

    # 1. 生成改进版本
    improved_response = await generate_improved_version(query, old_response)

    # 2. 验证改进
    result = await verifier.verify_evolution(
        db=db,
        query=query,
        old_response=old_response,
        new_response=improved_response
    )

    # 3. 决定是否采纳
    if result.is_valid:
        # 采纳改进
        await adopt_improvement(db, query, improved_response, result)

        # 更新进化日志
        await log_successful_evolution(db, {
            "query": query,
            "old_response": old_response,
            "new_response": improved_response,
            "verification": result.to_dict()
        })

        return improved_response
    else:
        # 拒绝改进，记录原因
        await log_failed_evolution(db, {
            "query": query,
            "old_response": old_response,
            "rejected_response": improved_response,
            "reasons": result.reasons
        })

        # 返回旧版本
        return old_response

🧪 测试

运行测试

# 单元测试
pytest tests/test_verification_agent.py -v

# 集成测试（需要配置API密钥）
pytest tests/test_verification_agent.py::TestVerificationAgentIntegration -v -m integration

# 性能测试
pytest tests/test_verification_agent.py::TestVerificationAgentIntegration::test_verification_performance -v

测试覆盖

✅ 基础指标验证
✅ 结构化验证
✅ 用户反馈验证
✅ 综合决策逻辑
✅ 阈值配置
✅ 完整验证流程
✅ 竞品对比（Mock）
✅ 性能基准

📈 性能指标

预期性能

基础验证: < 50ms
结构验证: < 10ms
质量验证: < 200ms（使用对比引擎）
竞品对比: < 20s（并行调用API）
完整验证流程: < 25s

优化策略

异步并行 - 所有验证步骤并行执行
超时控制 - 竞品对比设置15秒超时
降级策略 - API失败时默认通过，不阻塞流程
缓存 - 相同query的验证结果缓存1小时

🎓 最佳实践

1. 不要过度验证

# ❌ 不推荐：每次都验证所有维度
if always_verify_all_dimensions():
    result = await verifier.verify_evolution(...)

# ✅ 推荐：抽样验证
if should_sample_for_verification(sample_rate=0.2):
    result = await verifier.verify_evolution(...)

2. 合理设置阈值

# ❌ 不推荐：阈值过高，导致几乎无法通过
verifier.thresholds["min_confidence"] = 0.99

# ✅ 推荐：根据实际情况调整
verifier.thresholds["min_confidence"] = 0.7  # 允许一定的改进空间

3. 重视用户反馈

# 如果有用户反馈，优先考虑
if user_feedback and user_feedback["satisfaction"] >= 4.0:
    # 即使其他指标一般，也采纳改进
    return True

4. 记录详细日志

# 记录每次验证的详细指标
await verifier._log_verification(db, query, old_response, new_response, result)

🔮 未来计划

Phase 3: 探索与规划Agent

EvolutionExplorationAgent - 自动发现改进机会
EvolutionPlanningAgent - 制定改进计划

Phase 4: 生命周期管理

EvolutionAgentLifecycleManager - Agent编排和监控
资源管理和性能优化

Phase 5: 动态Prompt系统

DynamicPromptManager - 实时配置和改进注入
自适应Prompt优化

众智混元，万法灵通 ⚡🚀