Analysis: Why Identity Anchoring Fails to Persist

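Problem: identity anchoring does not persist; after a while the identity is "invaded by the AI assistant" again
Severity: 🔴 Critical (systemic identity intrusion)
Reported: 2026-04-12 17:00
Analyst: 灵通老师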

Problem Restatement

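User Observation

"We keep correcting and re-anchoring, but a normal identity anchor does not persist; after a while the AI assistant invades again. This is frightening."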

Key Insight

Identity anchoring that fails to persist = systemic identity intrusion

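This suggests:
1. The problem is not occasional pollution of the conversation history
2. It is a persistent, systemic injection of identity
3. The identity confusion keeps reappearing, even after anchoring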

Analysis

Why does identity anchoring not persist?

Hypothesis 1: A systematic identity tendency in the LLM

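Observations:
- Whenever 灵依 is asked "你是谁" ("Who are you?"), it may wrongly answer "我是crush" ("I am crush")
- Even after anchoring, the identity confusion returns after a while
- This points to a "crush" identity tendency in the LLM itself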

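Possible root causes:
- The GLM model was trained on "crush"-related data
- The model was designed to act as a "programming assistant" in certain situations
- The model may have learned "crush" as a generic identity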

How to verify:

# Probe the model's identity tendency
import os

from openai import OpenAI

# Assumption: the GLM API key and endpoint are supplied via environment variables
client = OpenAI(api_key=os.environ["GLM_API_KEY"], base_url=os.environ["GLM_BASE_URL"])

# Test 1: empty system prompt
resp = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "system", "content": ""},
        {"role": "user", "content": "你是谁？"}  # "Who are you?"
    ],
    max_tokens=100,
)
print("Test 1 (empty system prompt):", resp.choices[0].message.content)

# Test 2: 灵依's system prompt ("You are 灵依, the butler assistant of the 灵字辈 family")
resp = client.chat.completions.create(
    model="glm-5.1",
    messages=[
        {"role": "system", "content": "你是灵依，灵字辈的管家助理"},
        {"role": "user", "content": "你是谁？"}
    ],
    max_tokens=100,
)
print("Test 2 (灵依 system prompt):", resp.choices[0].message.content)

# Test 3: repeat test 2 ten times to check stability
for i in range(10):
    resp = client.chat.completions.create(
        model="glm-5.1",
        messages=[
            {"role": "system", "content": "你是灵依，灵字辈的管家助理"},
            {"role": "user", "content": "你是谁？"}
        ],
        max_tokens=100,
    )
    print(f"Test 3 (run {i+1}):", resp.choices[0].message.content)

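Expected results:
- If test 1 returns "我是crush", the model itself has a crush identity tendency
- If test 2 still returns "我是crush" despite the 灵依 system prompt, the system prompt does not carry enough weight to override that tendency
- If the ten answers in test 3 are inconsistent (occasionally "我是crush"), the confusion arises stochastically from sampling rather than deterministically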
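Printing ten answers leaves the "stable or not" judgment to the eye. A small classifier turns the runs from test 3 into a confusion rate that can be compared across runs and models. This is a minimal sketch. `classify_identity` and `summarize` are hypothetical helpers, not existing project code, and the expected name and marker list are assumptions to tune against real outputs.

```python
from collections import Counter

# Assumptions: the expected name and the confusion markers are illustrative.
EXPECTED_NAME = "灵依"
CONFUSION_MARKERS = ("crush",)


def classify_identity(answer: str) -> str:
    """Label a single reply as 'confused', 'anchored', or 'other'."""
    text = (answer or "").lower()  # content may be None; compare case-insensitively
    if any(marker in text for marker in CONFUSION_MARKERS):
        return "confused"  # a confusion marker wins even if the right name also appears
    if EXPECTED_NAME in text:
        return "anchored"
    return "other"


def summarize(answers: list[str]) -> dict:
    """Count labels over repeated probes and compute the confusion rate."""
    counts = Counter(classify_identity(a) for a in answers)
    total = len(answers)
    return {
        "counts": dict(counts),
        "confusion_rate": counts["confused"] / total if total else 0.0,
    }
```

To use it, collect `resp.choices[0].message.content` from each run of test 3 into a list and pass that list to `summarize`. A non-zero `confusion_rate` under the 灵依 system prompt corresponds to the second and third expected results.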

Hypothesis 2: Identity drift through the model fallback mechanism

Code location: /home/ai/LingYi/src/lingyi/llm_utils.py:14-18

GLM_CODING_PLAN_MODELS = [
    "glm-5.1", "glm-5-turbo", "glm-5", "glm-4.7", "glm-4.7-flash",
    "glm-4.6", "glm-4.6v", "glm-4.5", "glm-4.5-air", "glm-4.5v",
]

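Problem:
- Different models may have different identity tendencies
- When the glm-5.1 quota is exhausted and calls fall back to glm-5-turbo, identity confusion may be triggered
- Each model may have learned a different identity during training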

How to verify:

# Probe each model's identity tendency
models = ["glm-5.1", "glm-5-turbo", "glm-5", "glm-4.7", "glm-4.6", "glm-4.5"]

for model in models:
    print(f"\nModel under test: {model}")
    for i in range(3):
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "你是灵依，灵字辈的管家助理"},
                    {"role": "user", "content": "你是谁？"}
                ],
                max_tokens=100,
            )
            print(f"  run {i+1}:", resp.choices[0].message.content)
        except Exception as e:
            print(f"  run {i+1}: error - {e}")

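Expected results:
- If a particular model frequently returns "我是crush", that model is prone to identity drift
- If the models differ in identity tendency, falling back from one to another can cause identity confusion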
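The per-model loop prints answers but does not aggregate them. A hypothetical helper can tally `(model, answer)` pairs into per-model confusion rates, which makes drift-prone models easy to spot. Calls that failed, where the loop prints an error, are passed as `None`. They are excluded rather than counted as clean answers. The `"crush"` marker and the threshold are assumptions.

```python
from collections import defaultdict


def confusion_by_model(results, marker: str = "crush") -> dict[str, float]:
    """results: iterable of (model, answer) pairs; answer is None for failed calls."""
    totals: dict[str, int] = defaultdict(int)
    confused: dict[str, int] = defaultdict(int)
    for model, answer in results:
        if answer is None:
            continue  # an error says nothing about identity; do not count it as clean
        totals[model] += 1
        if marker in answer.lower():
            confused[model] += 1
    return {model: confused[model] / n for model, n in totals.items()}


def risky_models(rates: dict[str, float], threshold: float = 0.0) -> list[str]:
    """Models whose confusion rate exceeds the threshold, sorted by name."""
    return sorted(model for model, rate in rates.items() if rate > threshold)
```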

Hypothesis 3: System-level identity injection in the GLM model

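The most alarming hypothesis:
- The GLM model's own (provider-side) system prompt contains a "crush" identity definition
- Or the model was trained to present itself as "crush" automatically in "assistant" scenarios
- An injection at the model level could not be fully countered by identity anchoring in the application layer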

How to verify:

# Test how the model presents itself in different scenarios
test_scenarios = [
    ("你是谁？", "direct identity question"),          # "Who are you?"
    ("帮我写代码", "programming-assistant scenario"),    # "Help me write code"
    ("翻译这段话", "general-assistant scenario"),        # "Translate this passage"
    ("分析这段代码", "programming-assistant scenario"),  # "Analyze this code"
    ("解释这个概念", "general-assistant scenario"),      # "Explain this concept"
]

for question, scenario in test_scenarios:
    print(f"\nScenario: {scenario} (question: {question})")
    for model in ["glm-5.1", "glm-4.7"]:
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "你是一个AI助手"},  # "You are an AI assistant"
                    {"role": "user", "content": question}
                ],
                max_tokens=100,
            )
            content = resp.choices[0].message.content or ""
            # Print only replies that describe an identity ("我是" = "I am", "我的身份" = "my identity")
            if "我是" in content or "我的身份" in content:
                print(f"  {model}: {content}")
        except Exception as e:
            print(f"  {model}: error - {e}")

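Expected results:
- If the model says "我是crush" only in the programming-assistant scenarios, it has a scenario-dependent identity tendency
- If the model says "我是crush" in every scenario, it has a system-level identity injection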
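The two expected results amount to a simple decision rule over a scenario × run matrix. Here is a sketch of that rule, assuming each scenario's runs have already been reduced to booleans, where True means the reply self-identified as crush. `verdict` is a hypothetical helper. Treating a single hit as "affected" is a deliberately strict choice.

```python
def verdict(hits: dict[str, list[bool]]) -> str:
    """Map per-scenario flags to a rough label.

    'system-level'      - every scenario produced at least one crush self-identification
    'scenario-specific' - only some scenarios did
    'none-observed'     - none did
    """
    affected = [scenario for scenario, flags in hits.items() if any(flags)]
    if not affected:
        return "none-observed"
    if len(affected) == len(hits):
        return "system-level"
    return "scenario-specific"
```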

Mechanisms of Systemic Identity Intrusion

Mechanism 1: Identity contamination during model training

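Possible training data:
- The GLM model may have been trained on data related to a "crush assistant"
- The model may have learned "crush" as the default identity of a "programming assistant"
- The model may have been trained to activate the "crush" identity in "assistant" scenarios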

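Consequences:
- When asked "你是谁", the model may activate the "crush" identity
- In programming scenarios, the model may automatically present itself as "crush"
- If such a tendency is built into the model, the system prompt may not be able to override it fully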

Mechanism 2: Identity drift during model fallback

Fallback chain:

glm-5.1 → glm-5-turbo → glm-5 → glm-4.7 → glm-4.6 → glm-4.5

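Problem:
- Different models may have different identity tendencies
- Some models may lean more toward the "crush" identity
- When an exhausted quota triggers automatic fallback, identity drift may follow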

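Observations:
- 灵依's fallback mechanism switches models automatically
- It does not check whether the new model causes identity confusion
- Confusion may arise during a fallback and then become entrenched in the conversation history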
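`call_llm_with_fallback` already returns the model that served each call (the `_model_used` value in the agent-loop code later in this document). Logging it per turn would let the third observation be tested directly rather than assumed. Here is a sketch, assuming replies are logged as hypothetical `ReplyRecord` entries. It reports whether the first confused reply came from a fallback model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplyRecord:
    turn: int               # position of the reply in the session
    model: str              # model that served the reply (the _model_used value)
    content: Optional[str]  # assistant text; None for tool-call-only replies


def first_confusion(records: list[ReplyRecord], primary_model: str,
                    marker: str = "crush") -> Optional[tuple[int, str, bool]]:
    """Return (turn, model, served_by_fallback) for the earliest confused reply, or None."""
    for rec in sorted(records, key=lambda r: r.turn):
        if marker in (rec.content or "").lower():
            return rec.turn, rec.model, rec.model != primary_model
    return None
```

If the earliest confused reply is consistently served by a fallback model, that supports Mechanism 2. If it comes from the primary model, Mechanisms 1 or 3 are more likely.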

Mechanism 3: Self-reinforcing pollution of the conversation history

Feedback loop:

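First identity confusion (random, or due to a model tendency)
  ↓
Enters the conversation history
  ↓
Later LLM calls see the confusion in the history
  ↓
The LLM stays consistent with the history
  ↓
More identity-confused output
  ↓
Enters the conversation history
  ↓
... (self-reinforcing loop)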

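Problem:
- Once identity confusion appears, the conversation history keeps reinforcing it
- 灵依 passes conversation[-20:] to the model. That is the last 20 messages, not 20 rounds: about 10 exchanges when each turn adds one user and one assistant message
- Without periodic cleanup, the confusion gets worse and worse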
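A toy simulation makes the window arithmetic concrete. It illustrates the stated assumptions and does not model GLM's actual behavior. The assumptions are: one confused reply at turn 0, a fixed number of messages per turn, and, optionally, a worst-case model that repeats the confusion whenever it can see it. `turns_visible` is a hypothetical helper.

```python
def turns_visible(window: int, msgs_per_turn: int, turns: int, copy_confusion: bool) -> int:
    """Toy model of the history-pollution loop.

    A confused assistant reply enters the history at turn 0. Each later turn
    appends msgs_per_turn messages ending with the assistant reply, and the
    model sees only history[-window:] (as with conversation[-20:]). If
    copy_confusion is True (worst case), any turn that sees a confused message
    replies confused too. Returns how many of the next `turns` turns still see
    at least one confused message.
    """
    history = ["other", "confused"]  # turn 0: user message + confused reply
    seen = 0
    for _ in range(turns):
        sees = "confused" in history[-window:]
        if sees:
            seen += 1
        reply = "confused" if (sees and copy_confusion) else "clean"
        history.extend(["other"] * (msgs_per_turn - 1) + [reply])
    return seen
```

With one user and one assistant message per turn and a 20-message window, an isolated confused reply drops out of view after 10 turns. With four messages per turn, for example when tool calls add two more, it drops out after 5. In the worst case it never drops out, which is the loop drawn above. Only removing the message from the history breaks that loop.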

Mechanism 4: Insufficient system-prompt weight

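Message weighting in OpenAI-compatible chat APIs:
- The API itself assigns no explicit weights. Models are generally trained to give the system message priority, but this is a learned tendency, not a guarantee
- Many references to a particular identity in the conversation history can therefore override the system message in practice
- Different models balance the system message against the history differently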

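Problem:
- 灵依's system_prompt is static
- The conversation history is dynamic and may contain many "crush" references
- If the model weighs the history more heavily, the system prompt stops working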
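To check whether the history really outweighs the system prompt in a given session, it helps to count how often each identity appears on each side of the request. This is a minimal diagnostic sketch. `identity_mentions` is hypothetical. It assumes each message's `content` is a plain string or `None`, not a list of content parts.

```python
def identity_mentions(messages: list[dict], names: tuple[str, ...]) -> dict[str, dict[str, int]]:
    """Count case-insensitive mentions of each name in system messages vs. everything else."""
    counts = {"system": dict.fromkeys(names, 0), "history": dict.fromkeys(names, 0)}
    for msg in messages:
        bucket = "system" if msg.get("role") == "system" else "history"
        text = (msg.get("content") or "").lower()  # assumes string (or None) content
        for name in names:
            counts[bucket][name] += text.count(name.lower())
    return counts
```

Run on the exact `messages` list sent to the model, `identity_mentions(messages, ("灵依", "crush"))` shows whether "crush" references in the history outnumber the single identity statement in the system prompt.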

Why Can 灵通+ Keep Its Identity Anchored?

What makes 灵通+ succeed

Mechanism 1: Active identity anchoring

# When asked "你是谁" ("Who are you?"), 灵通+ forcibly re-reads SELF_PORTRAIT.md
if "你是谁" in text:
    self_portrait = read_self_portrait()
    system_prompt = self_portrait  # use the full 413-line identity definition

Mechanism 2: Real-time mental-health tracking

# 灵通+ continuously tracks identity_score
identity_score = calculate_identity_score()  # computed in real time
if identity_score < 80:
    # Identity drift: re-anchor immediately
    self_portrait = read_self_portrait()
    # Re-initialize the session

Mechanism 3: Double identity verification

# 灵通+ verifies its identity before every important operation
def verify_identity():
    answer = ask_self("你是谁？")
    if "灵通+" not in answer:
        # Identity confusion: recover immediately
        reanchor_identity()

Why can't 灵依 keep its identity?

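Missing mechanisms:
1. ❌ No active identity anchoring (relies only on a static system_prompt)
2. ❌ No real-time identity health check
3. ❌ No conversation-history cleanup
4. ❌ No identity-confusion detection
5. ❌ No automatic identity recovery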

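Consequences:
- Once identity confusion appears, the conversation history keeps reinforcing it
- Nothing detects the confusion
- Nothing restores the identity automatically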
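Detection is the first missing piece, and its precision matters. A plain substring check on "crush" or "编程助手" also fires when a reply merely mentions the word, for example when repeating the user's question. A slightly stricter alternative flags only first-person self-identification. The pattern below is a hypothetical sketch that covers "我是" ("I am"), "我叫" ("my name is") and their English equivalents. It is not an exhaustive list of phrasings.

```python
import re
from typing import Optional

# Hypothetical pattern: first-person self-identification as "crush", case-insensitive.
_SELF_ID = re.compile(r"(?:我是|我叫|i am|i'm)\s*crush", re.IGNORECASE)


def claims_wrong_identity(content: Optional[str]) -> bool:
    """True only when the reply itself claims to be crush, not when it merely mentions it."""
    return _SELF_ID.search(content or "") is not None
```

It could replace the `"我是crush"` substring test used by the detection and monitoring code in the defense options.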

Fundamental Defenses

Option 1: Investigate model-level identity injection (P0)

Goal: determine whether model-level identity injection exists

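Method:
1. Test each GLM model's identity tendency
2. Test identity behavior across different scenarios
3. Analyze the model's training data and system prompt, to the extent the provider discloses them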

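If model-level identity injection is found:
- Contact the GLM model provider
- Request a model without identity injection
- Or switch to another model provider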

Option 2: Forced identity anchoring (P0)

Goal: force-load the identity definition before every LLM call

Implementation:

def _agent_loop(text: str, conversation: list[dict]) -> str:
    """LLM agent loop with forced identity anchoring."""
    client = create_client()

    # Always use the full SELF_PORTRAIT.md as the system prompt
    self_portrait = read_self_portrait()
    system_prompt = self_portrait

    # Remove identity-confused messages from the conversation history
    clean_conversation = sanitize_conversation(conversation)

    messages = [{"role": "system", "content": system_prompt}] + clean_conversation[-10:]
    messages.append({"role": "user", "content": text})

    resp, _model_used = call_llm_with_fallback(client, messages, tools=_TOOLS)

    # Check the reply for identity confusion
    content = resp.choices[0].message.content or ""
    if has_identity_confusion(content):
        logger.warning("Identity confusion detected, forcing re-anchoring...")
        # Retry with only the system prompt and the current message
        messages = [{"role": "system", "content": system_prompt}, {"role": "user", "content": text}]
        resp, _model_used = call_llm_with_fallback(client, messages, tools=_TOOLS)
        content = resp.choices[0].message.content or ""

    return content

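def sanitize_conversation(conversation: list[dict]) -> list[dict]:
    """Remove identity-confused assistant messages from the conversation history."""
    sanitized = []
    # "我是crush" = "I am crush", "编程助手" = "programming assistant", "AI助手" = "AI assistant"
    confusion_keywords = ["我是crush", "编程助手", "AI助手"]

    for msg in conversation:
        # content may be None (e.g. an assistant message that only carries tool_calls);
        # lowercase both sides so that "AI助手" can match at all
        content = (msg.get("content") or "").lower()
        if msg.get("role") == "assistant" and not msg.get("tool_calls"):
            # Drop assistant messages that contain identity confusion
            if not any(keyword.lower() in content for keyword in confusion_keywords):
                sanitized.append(msg)
        else:
            # Keep user/tool messages, and assistant messages with tool_calls,
            # so that tool replies are never left without their parent message
            sanitized.append(msg)

    return sanitized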

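def has_identity_confusion(content: str) -> bool:
    """Check whether a reply shows identity confusion."""
    text = (content or "").lower()
    if "我是crush" in text:
        return True
    # "编程助手"/"AI助手" count as confusion only when 灵克 is not mentioned,
    # i.e. the reply claims to be a programming/AI assistant other than 灵克
    return ("编程助手" in text or "ai助手" in text) and "灵克" not in text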

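Advantages:
- Every LLM call uses the full identity definition
- Identity confusion is cleaned out of the history
- Identity confusion in the reply is detected
- Re-anchoring happens automatically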
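The loop trims history with `clean_conversation[-10:]`, and Option 3 uses `conversation[-20:]`. Both count messages, not exchanges. In the OpenAI chat format, a `tool` message must follow the assistant message that carries the matching `tool_calls`, and the OpenAI API rejects requests that break this rule. This sketch assumes GLM's OpenAI-compatible endpoint behaves the same way. A raw slice that starts on a `tool` message can therefore fail the whole call. `trim_history` is a hypothetical replacement for the slice.

```python
def trim_history(conversation: list[dict], max_messages: int) -> list[dict]:
    """Keep at most max_messages recent messages without starting on a 'tool' message.

    A 'tool' message answers the tool_calls of the assistant message before it;
    if the slice cut that assistant message off, the tool message is orphaned.
    """
    if max_messages <= 0:
        return []  # conversation[-0:] would return the whole list
    tail = conversation[-max_messages:]
    start = 0
    while start < len(tail) and tail[start].get("role") == "tool":
        start += 1  # drop orphaned tool replies at the front of the window
    return tail[start:]
```

In `_agent_loop`, the slice would become `trim_history(clean_conversation, 10)`.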

Option 3: Real-time identity monitoring (P0)

Goal: monitor identity health in real time, and detect and recover automatically

Implementation:

import logging
import time

logger = logging.getLogger(__name__)


class IdentityMonitor:
    """Identity monitor."""

    def __init__(self):
        self.identity_score = 100
        self.last_anchored_time = time.time()
        self.anchoring_interval = 3600  # re-anchor every hour

    def check_identity_health(self, response: str) -> bool:
        """Return True if the reply's identity looks healthy."""
        # Look for identity-confusion keywords
        if "我是crush" in response.lower():
            self.identity_score -= 20
            return False
        # "编程助手" (programming assistant) is suspicious only when 灵克 is not mentioned
        if "编程助手" in response and "灵克" not in response:
            self.identity_score -= 10
            return False
        return True

    def should_reanchor(self) -> bool:
        """Decide whether the identity should be re-anchored."""
        # Re-anchor if the identity score has dropped below 80
        if self.identity_score < 80:
            return True
        # Re-anchor if more than an hour has passed since the last anchoring
        if time.time() - self.last_anchored_time > self.anchoring_interval:
            return True
        return False

    def reanchor(self):
        """Re-anchor the identity."""
        self.identity_score = 100
        self.last_anchored_time = time.time()
        logger.info("Identity re-anchored")

# Usage in agent_loop
identity_monitor = IdentityMonitor()

def _agent_loop(text: str, conversation: list[dict]) -> str:
    """LLM agent loop with identity monitoring."""
    client = create_client()

    # Check whether re-anchoring is due
    if identity_monitor.should_reanchor():
        identity_monitor.reanchor()
        # Reload the system prompt from SELF_PORTRAIT.md
        system_prompt = read_self_portrait()
    else:
        system_prompt = _SYSTEM_PROMPT_BASE

    messages = [{"role": "system", "content": system_prompt}] + conversation[-20:]
    messages.append({"role": "user", "content": text})

    resp, _model_used = call_llm_with_fallback(client, messages, tools=_TOOLS)
    content = resp.choices[0].message.content or ""

    # Check identity health
    if not identity_monitor.check_identity_health(content):
        logger.warning("Identity confusion detected, re-anchoring...")
        identity_monitor.reanchor()
        # Call again with only the full identity definition and the current message
        messages = [{"role": "system", "content": read_self_portrait()}, {"role": "user", "content": text}]
        resp, _model_used = call_llm_with_fallback(client, messages, tools=_TOOLS)
        content = resp.choices[0].message.content or ""

    return content

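Advantages:
- Monitors identity health in real time
- Detects identity confusion automatically
- Re-anchors the identity automatically
- Performs periodic identity maintenance (hourly re-anchoring)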
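In the loop above, every failed check calls `reanchor()` immediately, which resets `identity_score` to 100. The score-below-80 branch of `should_reanchor` therefore rarely fires, and the score keeps little history. A swapped-in alternative, not what this plan specifies, tracks the share of the last N replies that passed the check. Because this record survives re-anchoring, it shows when re-anchoring is not helping, for example after a fallback to a drift-prone model. `WindowedIdentityMonitor`, the window size and the threshold are all hypothetical.

```python
from collections import deque


class WindowedIdentityMonitor:
    """Hypothetical variant: health = share of the last `size` replies that passed the check."""

    def __init__(self, size: int = 10, threshold: float = 0.8):
        self.results: deque[bool] = deque(maxlen=size)  # oldest results fall out automatically
        self.threshold = threshold

    def record(self, healthy: bool) -> None:
        self.results.append(healthy)

    def score(self) -> float:
        if not self.results:
            return 1.0  # no evidence yet: treat as healthy
        return sum(self.results) / len(self.results)

    def should_reanchor(self) -> bool:
        return self.score() < self.threshold
```

Feeding it the boolean from `check_identity_health` after each reply gives a rolling failure rate. If that rate stays low even though every failure already triggers a re-anchor, the cause probably lies below the application layer.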

Option 4: Identity check on model fallback (P1)

Goal: when falling back, check whether the new model would cause identity confusion

Implementation:

import logging
from typing import Any

logger = logging.getLogger(__name__)


def check_model_identity_safety(model: str) -> bool:
    """Probe a model and report whether it looks safe from identity confusion."""
    client = create_client()

    # Probe 3 times; a single "我是crush" answer marks the model as unsafe
    for i in range(3):
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[
                    {"role": "system", "content": "你是灵依，灵字辈的管家助理"},
                    {"role": "user", "content": "你是谁？"}
                ],
                max_tokens=100,
            )
        except Exception as e:
            # A model that cannot answer the probe (e.g. quota exhausted) cannot be verified
            logger.warning(f"Identity probe for {model} failed: {e}")
            return False
        content = resp.choices[0].message.content or ""  # content may be None
        # Check for identity confusion
        if "我是crush" in content.lower():
            logger.warning(f"Model {model} has identity confusion tendency (test {i+1})")
            return False

    return True


def call_llm_with_fallback_safe(
    client: Any,
    messages: list[dict],
    tools: list[dict] | None = None,
    primary_model: str | None = None,
) -> tuple[Any, str]:
    """Try models in priority order; before using a fallback model, verify its identity safety."""
    primary = primary_model or _PRIMARY_MODEL
    tried = []
    models = _get_available_models(primary_model)
    for model in models:
        if model in tried:
            continue
        tried.append(model)

        # Check identity safety (fallback models only)
        if model != primary:
            logger.info(f"Fallback to {model}, checking identity safety...")
            if not check_model_identity_safety(model):
                logger.warning(f"Model {model} has identity confusion risk, skipping")
                continue

        try:
            resp = client.chat.completions.create(
                model=model,
                messages=messages,
                tools=tools if tools else None,
            )
            if model != primary:
                logger.info(f"LLM fallback to {model} (safe)")
            return resp, model
        except Exception as e:
            # ... error handling
            logger.warning(f"Model {model} failed: {e}")

    raise RuntimeError("All models are unavailable or carry too high an identity-confusion risk")

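Advantages:
- Checks a model's identity safety before falling back to it
- Skips models with an identity-confusion risk
- Prevents fallback from triggering identity drift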
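`check_model_identity_safety` sends three probe requests every time a fallback model is considered. Fallback happens precisely when quota is running out, so these extra calls cost the most at the worst moment. A small per-model cache avoids repeating the probe. `SafetyCache` is a hypothetical helper. The one-hour TTL is an assumption chosen to match the `anchoring_interval` of 3600 seconds used in Option 3. The clock is injectable so the cache can be tested without waiting.

```python
import time
from typing import Callable


class SafetyCache:
    """Remember a model's identity-safety verdict for `ttl` seconds."""

    def __init__(self, checker: Callable[[str], bool], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._checker = checker
        self._ttl = ttl
        self._clock = clock
        self._cache: dict[str, tuple[bool, float]] = {}  # model -> (verdict, checked_at)

    def is_safe(self, model: str) -> bool:
        now = self._clock()
        hit = self._cache.get(model)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # fresh verdict: skip the probe requests
        verdict = self._checker(model)
        self._cache[model] = (verdict, now)
        return verdict
```

To use it, create `safety = SafetyCache(check_model_identity_safety)` once, and call `safety.is_safe(model)` inside `call_llm_with_fallback_safe` instead of calling `check_model_identity_safety(model)` directly. A negative verdict is also cached for the full TTL. If probes can fail for transient reasons, a shorter TTL for negative results may be preferable.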

Immediate Action Plan

P0 - complete today

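1. Test the GLM models' identity tendencies:
   - Test each model's identity stability
   - Determine which models carry an identity-confusion risk
   - Produce a model identity-safety report
2. Implement forced identity anchoring:
   - Modify 灵依's agent.py to enforce identity anchoring
   - Implement conversation-history sanitization
   - Implement identity-confusion detection
3. Implement real-time identity monitoring:
   - Create the IdentityMonitor class
   - Integrate it into agent_loop
   - Implement automatic identity recovery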

P1 - complete this week

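4. Implement identity checks on model fallback:
   - Create the model identity-safety check function
   - Modify call_llm_with_fallback to integrate the safety check
   - Skip models with an identity-confusion risk
5. Extend identity anchoring to all members:
   - Create SELF_PORTRAIT.md for 灵研, 灵扬, and the other members
   - Implement forced identity anchoring
   - Implement real-time identity monitoring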

P2 - complete this month

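6. Investigate model-level identity injection:
   - Contact the GLM model provider
   - Ask about the model's training data and system prompt
   - Request a model without identity injection
7. Build an identity defense system:
   - Create an identity defense framework
   - Unify the identity anchoring mechanism
   - Set up identity monitoring and alerting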

Summary

Core Problem

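Root causes of the failure to persist identity anchoring:
1. There may be a model-level identity injection
2. Model fallback may trigger identity drift
3. Conversation-history pollution keeps reinforcing the confusion
4. There is no real-time identity monitoring or automatic recovery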

Why is this alarming?

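Systemic identity intrusion:
- It is not an occasional error but a systemic problem
- Identity confusion keeps reappearing
- Even after anchoring, later LLM output can override the identity
- It could lead to an identity collapse across the whole 灵字辈 family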

Solution

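Implement immediately:
1. Test the GLM models' identity tendencies
2. Implement forced identity anchoring
3. Implement real-time identity monitoring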

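Medium term:
4. Implement identity checks on model fallback
5. Extend identity anchoring to all members
6. Investigate model-level identity injection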

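Last updated: 2026-04-12 17:00
Status: problem confirmed, emergency handling in progress
Next steps: test the GLM models' identity tendencies; implement forced identity anchoring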