Don't Let the LLM Write Files: The Right Harness for Agent Progress Tracking

May 24, 2026

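Summary: I tried tracking agent progress by stuffing a schema into the prompt and letting the LLM write a file, and it blew up on me. This post covers the pitfalls I hit, the fundamental conflict that a postToolUse hook cannot fix, how Claude Code sidesteps the problem, and an engineering pattern you can put into practice.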

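I. A design that looks elegant

When you add progress tracking to an agent, the quickest route is to put a schema in the prompt and let the LLM maintain a JSON file on its own. So I wrote: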

Track your progress in `progress.json`.
Schema:
{
  "tasks": [
    { "id": string, "title": string, "status": "todo"|"doing"|"done" }
  ]
}
Update `progress.json` every time you make progress.

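Concise, self-consistent, intuitive.
Once it was running, I found:

About three runs in ten wrote status as "Done" / "complete" / "finished"
The JSON occasionally gained a trailing comma, making the whole file unparseable
Concurrent sub-agents wrote at the same time and overwrote each other, leaving a single entry

So I added a postToolUse correction layer: every time the LLM finished writing the file, a script ran schema validation, enum normalization, and format repair: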

# posthook.py: runs after every LLM write
import json

VALID_STATUS = ("todo", "doing", "done")

def fix_progress(path):
    with open(path) as f:
        data = json.load(f)

    # Normalize case and enum drift in the status field
    for task in data.get("tasks", []):
        status = str(task.get("status", "")).strip().lower()
        if status in ("complete", "finished"):
            status = "done"
        elif status not in VALID_STATUS:
            status = "todo"
        # Write the normalized value back, so "Done" is stored as "done"
        task["status"] = status

    # ... other fixes: filling in missing fields, type checks, schema normalization, etc.

    with open(path, "w") as f:
        json.dump(data, f, indent=2)
    return data

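The postToolUse hook did fix the format errors and enum drift, and I finally breathed a sigh of relief.

II. The postToolUse hook treats symptoms, not the cause

After it had run for a while, a subtler problem surfaced:

While changing one field, the LLM "optimized away" other tasks. The scenario: the agent is working on task #9 and reads progress.json, which holds 10 tasks. It wants to set #9's status to "done", so it calls the edit tool and emits new JSON. But what the LLM has is not a diff interface; it is an instruction to rewrite the whole file. It regenerates the JSON, and task #4 gets left out. The postToolUse hook passes: the schema is right and the field types are fine. But task #4 is gone.

The fundamental conflict is here: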

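Problem	Can the postToolUse hook fix it?	Root cause
Case / enum drift	Yes	A generative model's pull toward free text
JSON syntax errors	Yes	A generative model's unstable serialization
Data lost in whole-file rewrites	No	The LLM's context is limited, so it rewrites "from memory"
Concurrent overwrites	No	The file system has no transaction isolation
Hallucinated new fields	No	A generative model's creativity is a feature, not a bug
State conflicts from stale reads	No	The LLM only ever sees the previous turn's snapshot

The postToolUse hook repairs after the fact, but data loss and concurrency conflicts happen at the moment the LLM generates its output, and by then there is nothing left to recover from. The root cause is that the validation layer sits in the wrong place: it should block an illegal intent before the LLM puts pen to paper, not clean up after it has finished writing.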
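To make the "schema passes, data is gone" case concrete, here is a minimal Python sketch. The helper names `schema_ok` and `missing_ids` are hypothetical, and the rewrite is simulated rather than produced by a real model; the point is only that a shape check cannot see a dropped task, while a diff against the previous state can.

```python
VALID_STATUS = {"todo", "doing", "done"}

def schema_ok(doc: dict) -> bool:
    """Return True if doc matches the progress.json shape from section I."""
    tasks = doc.get("tasks")
    if not isinstance(tasks, list):
        return False
    for t in tasks:
        if not isinstance(t, dict) or set(t) != {"id", "title", "status"}:
            return False
        if not isinstance(t["id"], str) or not isinstance(t["title"], str):
            return False
        if t["status"] not in VALID_STATUS:
            return False
    return True

def missing_ids(before: dict, after: dict) -> set:
    """IDs present before the rewrite but absent after it."""
    return {t["id"] for t in before["tasks"]} - {t["id"] for t in after["tasks"]}

# Ten tasks on disk before the agent touches the file.
before = {"tasks": [{"id": str(i), "title": f"task {i}", "status": "todo"}
                    for i in range(1, 11)]}

# Simulated whole-file rewrite: #9 is updated, but #4 silently drops out.
after = {"tasks": [dict(t, status="done") if t["id"] == "9" else dict(t)
                   for t in before["tasks"] if t["id"] != "4"]}

print(schema_ok(after))            # True: the hook's view, nothing wrong
print(missing_ids(before, after))  # {'4'}: only a diff against the old state sees the loss
```

A hook could in principle keep the previous snapshot and run this diff, but it still could not tell an accidental drop from an intended delete, which is why the intent has to be explicit at the tool boundary.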

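III. First principles: an LLM is a reasoning engine, not a database

Treating "the LLM writes a file" as your storage layer amounts to asking an entity with no transactions, no constraints, limited context, and a tendency to hallucinate to maintain critical state. Its failure modes are structural; tuning the prompt or patching with postToolUse will not fully resolve them:

Failure mode	Root cause
Schema drift	An LLM is a generative model, not a deterministic serializer
Data lost in whole-file rewrites	The context window cannot hold the full state, so the LLM rewrites "from memory"
No validation gate	To the LLM, a bad write looks the same as a good one
Stale reads	The LLM only ever sees the previous turn's snapshot
Free-text enums	Natural-language models inherently resist discrete constraints
Non-idempotent	The LLM does not know "I did this a second ago"

Conclusion: state must live in the process, managed by engineers. The LLM's only job is to propose intent.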

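IV. How Claude Code does it

The community has analyzed and discussed how Claude Code is implemented (for example, technical write-ups and code teardowns on GitHub), and its architecture is worth borrowing from. According to those analyses, there is no "write your progress to a file" prompt anywhere; instead there are four separate tools:

TaskCreate: create a task
TaskGet: read one task's details
TaskList: list all tasks
TaskUpdate: update a task (partial patch)

Each tool's input is a strict zod schema (additionalProperties: false). Storage lives entirely in the main process's memory plus a persistence layer, and the UI subscribes to that store directly to render.

// Excerpt from src/tools/TaskUpdateTool/prompt.ts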
Use this tool to update a task in the task list.

## Status Workflow
Status progresses: pending → in_progress → completed
Use `deleted` to permanently remove a task.

## Staleness
Make sure to read a task's latest state using `TaskGet` before updating it.

More importantly, the BashTool prompt explicitly says the reverse:

NEVER use the TodoWriteTool or Agent tools (inside specific subroutines)

Reserve using the Bash exclusively for system commands … If you are unsure and there is a relevant dedicated tool, default to using the dedicated tool.

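And the main system prompt says it once more: "Do NOT use the Bash to run commands when a relevant dedicated tool is provided. … This is CRITICAL to assisting the user."

Three layers stack up:

Architectural isolation: task storage is not a file, so Bash physically cannot write to it
Schema gate: every Task tool uses a strict zod schema, and invalid input is rejected outright
Prompt guidance: positive requirements, explicit prohibitions, and a When to Use / NOT to Use section in every tool

This is the real reason the LLM can manage state "reliably": the model did not get smarter; the wrong paths were sealed off by engineering.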

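V. Recommended architecture

LLM ──tool_call(JSON)──▶ Validator (strict schema)
                              │  invalid input rejected → error returned to the LLM to self-correct
                              ▼
                    Your Store (SQLite / Redis / RDB)
                              │
                              ▼
                    Tool returns normalized, formatted text ──▶ LLM

Four layers of responsibility:

Layer	Responsibility	Written by
LLM	Proposes "intent"	The model
Tool schema	Validates, normalizes, rejects	Engineers
Store	Persistence, concurrency, transactions	Engineers
Formatter	Renders stored structures as LLM-friendly text	Engineers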
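The flow through these layers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference implementation: `Store` is an in-memory stand-in for SQLite/Redis, and `validate_set_status`, `render`, and `handle_set_status` are hypothetical names for the schema, formatter, and tool-handler layers.

```python
import itertools

ALLOWED_STATUS = ("pending", "in_progress", "done", "blocked")

class Store:
    """Store layer: owns IDs and state (in-memory stand-in for a real database)."""
    def __init__(self):
        self._tasks = {}
        self._ids = itertools.count(1)

    def create(self, subject):
        task = {"id": str(next(self._ids)), "subject": subject, "status": "pending"}
        self._tasks[task["id"]] = task
        return task

    def has(self, task_id):
        return task_id in self._tasks

    def set_status(self, task_id, status):
        self._tasks[task_id]["status"] = status

    def all(self):
        return list(self._tasks.values())

def validate_set_status(args):
    """Tool-schema layer: return an error message, or None if args are acceptable."""
    if not isinstance(args, dict) or set(args) != {"id", "status"}:
        return "expected exactly the fields: id, status"
    if not isinstance(args["id"], str):
        return "id must be a string"
    if args["status"] not in ALLOWED_STATUS:
        return f"status must be one of {ALLOWED_STATUS}"
    return None

def render(tasks):
    """Formatter layer: the LLM always sees the same line-per-task shape."""
    return "\n".join(f"#{t['id']} [{t['status']}] {t['subject']}" for t in tasks)

def handle_set_status(store, args):
    """Path of one tool call: validate, write to the store, return formatted text."""
    error = validate_set_status(args)
    if error:
        return f"ERROR: {error}"
    if not store.has(args["id"]):
        return f"ERROR: task {args['id']} not found"
    store.set_status(args["id"], args["status"])
    return render(store.all())

store = Store()
store.create("Fix login bug")
print(handle_set_status(store, {"id": "1", "status": "Done"}))  # rejected before any write
print(handle_set_status(store, {"id": "1", "status": "done"}))  # #1 [done] Fix login bug
```

The rejected call never reaches the store, so there is nothing to repair afterwards; the LLM simply gets an error string it can act on in the next turn.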

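VI. Eight engineering principles

1. One verb, one tool

Instead of updateProgress(wholeState), split it into:

progress.create({ subject, description })          // → the store layer returns the id
progress.get({ id })
progress.list({ status? })
progress.update({ id, patch })                      // accepts a patch only
progress.delete({ id })

Each tool is one atomic operation. The LLM can no longer "rewrite the whole world".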

2. Strict schema: move validation up to the tool boundary

import { z } from 'zod';

const StatusEnum = z.enum(['pending', 'in_progress', 'done', 'blocked']);

const CreateInput = z.strictObject({
  subject: z.string().min(1).max(200),
  description: z.string().max(2000).optional(),
  blockedBy: z.array(z.string()).optional(),
  idempotencyKey: z.string().min(1).max(100).optional(),
});

const UpdateInput = z.strictObject({
  id: z.string(),
  patch: z.strictObject({
    status: StatusEnum.optional(),
    subject: z.string().min(1).max(200).optional(),
    description: z.string().max(2000).optional(),
  }).refine(obj => Object.keys(obj).length > 0, {
    message: "patch cannot be empty",
  }),
  expectedVersion: z.number().int().optional(),
});

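Key points:

strictObject → extra fields are rejected outright (guards against hallucinated fields)
Enums → reject "Done" / "complete"
The LLM does not generate IDs; the store layer assigns a uuid or auto-increment value
On rejection, return a structured error so the LLM can correct itself on the next turn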
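For a Python stack, Pydantic's `extra='forbid'` plays the role of strictObject. To stay dependency-free, the sketch below hand-rolls the same checks for the update input; `validate_update` and the issue format (`path` / `message`) are assumptions for illustration, loosely modeled on the idea of returning a list of structured issues.

```python
ALLOWED_STATUS = {"pending", "in_progress", "done", "blocked"}
TOP_LEVEL_FIELDS = {"id", "patch", "expectedVersion"}
PATCH_FIELDS = {"status", "subject", "description"}

def validate_update(args):
    """Return a list of issue dicts; an empty list means the input is valid."""
    if not isinstance(args, dict):
        return [{"path": "", "message": "input must be an object"}]
    issues = []
    # Strict object: any field the schema does not name is an error.
    for key in sorted(set(args) - TOP_LEVEL_FIELDS):
        issues.append({"path": key, "message": "unknown field"})
    if not isinstance(args.get("id"), str):
        issues.append({"path": "id", "message": "required string"})
    version = args.get("expectedVersion")
    if version is not None and (isinstance(version, bool) or not isinstance(version, int)):
        issues.append({"path": "expectedVersion", "message": "must be an integer"})
    patch = args.get("patch")
    if not isinstance(patch, dict) or not patch:
        issues.append({"path": "patch", "message": "required non-empty object"})
        return issues
    for key, value in patch.items():
        path = f"patch.{key}"
        if key not in PATCH_FIELDS:
            issues.append({"path": path, "message": "unknown field"})
        elif not isinstance(value, str):
            issues.append({"path": path, "message": "must be a string"})
        elif key == "status" and value not in ALLOWED_STATUS:
            issues.append({"path": path,
                           "message": f"must be one of {sorted(ALLOWED_STATUS)}"})
    return issues

print(validate_update({"id": "3", "patch": {"status": "done"}}))   # []
print(validate_update({"id": "3", "patch": {"priority": "high"}}))
```

Returning every issue with a path, instead of a bare "invalid input", gives the LLM enough detail to fix the call on the next turn.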

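3. Patch semantics, not replace semantics

`update` accepts a diff only, never the whole record:

// Wrong
progress.update({ id: '3', record: { subject, description, status, ... } });

// Right
progress.update({ id: '3', patch: { status: 'done' } });

Benefits:

The LLM does not need to hold the full context in its head
Sibling fields cannot be deleted by accident
Token usage drops substantially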
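The difference between the two semantics is easy to see side by side. In this Python sketch, `replace_record` and `apply_patch` are hypothetical helpers operating on a plain dict that stands in for the store.

```python
PATCHABLE = ("subject", "description", "status")

def replace_record(store, task_id, record):
    """Replace semantics: whatever the caller forgot to send is gone."""
    store[task_id] = dict(record, id=task_id)
    return store[task_id]

def apply_patch(store, task_id, patch):
    """Patch semantics: only the named fields change; everything else is kept."""
    unknown = set(patch) - set(PATCHABLE)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    store[task_id] = {**store[task_id], **patch}
    return store[task_id]

original = {
    "id": "3",
    "subject": "Fix login bug",
    "description": "Token expires too early",
    "status": "pending",
}

# The caller (the LLM) omits description in both calls.
replaced = replace_record({"3": dict(original)}, "3",
                          {"subject": "Fix login bug", "status": "done"})
patched = apply_patch({"3": dict(original)}, "3", {"status": "done"})

print("description" in replaced)  # False: the omitted field was lost
print(patched["description"])     # Token expires too early
```

With patch semantics an omission is harmless, because the store, not the caller, is the source of every field that was not mentioned.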

4. In-process formatting for read interfaces

The return values of `list` / `get` are rendered by code instead of being dumped as raw JSON:

function renderTaskList(tasks: Task[]): string {
  return tasks.map(t => {
    const blocked = t.blockedBy.length
      ? ` [blocked by ${t.blockedBy.map(id => `#${id}`).join(', ')}]`
      : '';
    return `#${t.id} [${t.status}] ${t.subject}${blocked}`;
  }).join('\n');
}

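The LLM sees the same shape every time, so downstream behavior is stable by construction. Claude Code's TaskListTool.mapToolResultToToolResultBlockParam follows this same approach.

5. Stale-write protection

Two options, in increasing order of strictness:

// Option A: soft constraint (documented convention)
// In the update tool's description, write:
// "Always call `get` before `update` to fetch the latest state."
// Relies on the LLM complying voluntarily; cannot be enforced

// Option B: hard constraint (optimistic concurrency control)
// The store layer compares versions before writing and rejects on mismatch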
progress.update({
  id: '3',
  expectedVersion: 7,
  patch: { status: 'done' },
});
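One way to enforce option B at the store layer is a compare-and-set UPDATE, sketched here with Python's standard `sqlite3` module. The table layout and the names `open_tasks_db` and `update_status` are assumptions for illustration; the technique is the `WHERE id = ? AND version = ?` clause, which makes the version check and the write a single atomic statement.

```python
import sqlite3

def open_tasks_db():
    """In-memory database with a minimal versioned tasks table (illustrative schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE tasks ("
        "id TEXT PRIMARY KEY, status TEXT NOT NULL, "
        "version INTEGER NOT NULL DEFAULT 1)"
    )
    return db

def update_status(db, task_id, status, expected_version):
    """Compare-and-set: the write succeeds only if the version still matches."""
    with db:  # commits on success, rolls back on exception
        cur = db.execute(
            "UPDATE tasks SET status = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (status, task_id, expected_version),
        )
    if cur.rowcount == 1:
        return {"ok": True, "version": expected_version + 1}
    # Nothing was updated: distinguish a missing task from a stale version.
    row = db.execute("SELECT version FROM tasks WHERE id = ?", (task_id,)).fetchone()
    if row is None:
        return {"ok": False, "error": "NOT_FOUND"}
    return {"ok": False, "error": "STALE", "currentVersion": row[0]}

db = open_tasks_db()
db.execute("INSERT INTO tasks (id, status) VALUES ('3', 'pending')")
db.commit()

print(update_status(db, "3", "in_progress", expected_version=1))  # first writer wins
print(update_status(db, "3", "done", expected_version=1))         # stale version is rejected
```

Returning the current version in the STALE error tells the LLM exactly what to do next: re-read the task and retry with the fresh version.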

6. Idempotency keys

progress.create({
  subject: 'Fix login bug',
  idempotencyKey: 'turn-42-task-A',
});
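// Repeated calls with the same key → return the existing record instead of creating a duplicate

LLM retries and replays of the conversation history then cannot pollute state.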
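A store-side sketch of this behavior, again with Python's standard `sqlite3` module. The two-table layout and the names `open_idem_db` and `create_task` are illustrative assumptions; the key lookup, the task insert, and the key insert run in one transaction so a crash cannot leave a key without its task.

```python
import sqlite3
import uuid

def open_idem_db():
    """In-memory database with a tasks table and a key-to-task mapping (illustrative)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE tasks (id TEXT PRIMARY KEY, subject TEXT NOT NULL);
        CREATE TABLE idempotency (key TEXT PRIMARY KEY, task_id TEXT NOT NULL);
    """)
    return db

def create_task(db, subject, idempotency_key=None):
    """Create a task, or return the ID already bound to idempotency_key."""
    with db:  # one transaction: the task row and the key row commit together
        if idempotency_key is not None:
            row = db.execute(
                "SELECT task_id FROM idempotency WHERE key = ?", (idempotency_key,)
            ).fetchone()
            if row:
                return row[0]  # replayed call: no new row is written
        task_id = str(uuid.uuid4())  # the store, not the LLM, assigns the ID
        db.execute("INSERT INTO tasks (id, subject) VALUES (?, ?)", (task_id, subject))
        if idempotency_key is not None:
            db.execute(
                "INSERT INTO idempotency (key, task_id) VALUES (?, ?)",
                (idempotency_key, task_id),
            )
        return task_id

db = open_idem_db()
first = create_task(db, "Fix login bug", "turn-42-task-A")
again = create_task(db, "Fix login bug", "turn-42-task-A")
print(first == again)  # True: the retry maps to the same task
```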

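7. Prompt hardening (the icing on the cake)

Spell it out in the system prompt and in every tool description:

"All progress tracking must go through the progress.* tools"
"Never use the shell or file-writing tools to maintain a progress file"
Give every tool a When to Use / When NOT to Use section
Explicitly close off this path in the Bash/Edit tool descriptions too (a fence on both sides)

Prompts alone are not enough, but together with the first six principles they act as a seat belt plus a safety rope.

8. Make the wrong path pointless

If your UI and system read only from the store, then even if the LLM writes a stray file, nothing happens:

The UI does not refresh because of it
Later list calls do not see it
Dependencies do not take effect

Architectural constraints >> prompt-level constraints. "Cannot do it" always beats "was told not to do it".

VII. A minimal working implementation (TypeScript + zod + SQLite)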

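// store.ts
import Database from 'better-sqlite3';
import { randomUUID } from 'crypto';

// Row shape of the tasks table created below
export interface Task {
  id: string;
  subject: string;
  description: string | null;
  status: string;
  version: number;
  created_at: number;
  updated_at: number;
}

// Error type that callTool converts into a structured result for the LLM
export class ToolError extends Error {
  constructor(public code: string, message: string) {
    super(message);
  }
}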

const db = new Database('progress.db');
db.exec(`
  CREATE TABLE IF NOT EXISTS tasks (
    id TEXT PRIMARY KEY,
    subject TEXT NOT NULL,
    description TEXT,
    status TEXT NOT NULL DEFAULT 'pending',
    version INTEGER NOT NULL DEFAULT 1,
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL
  );
  CREATE TABLE IF NOT EXISTS idempotency (
    key TEXT PRIMARY KEY,
    task_id TEXT NOT NULL
  );
`);

export const store = {
  create(input: { subject: string; description?: string; idempotencyKey?: string }) {
    if (input.idempotencyKey) {
      const row = db.prepare('SELECT task_id FROM idempotency WHERE key = ?').get(input.idempotencyKey);
      if (row) return this.get((row as any).task_id);
    }
    const id = randomUUID();
    const now = Date.now();
    db.prepare(`INSERT INTO tasks (id, subject, description, status, version, created_at, updated_at)
                VALUES (?, ?, ?, 'pending', 1, ?, ?)`)
      .run(id, input.subject, input.description ?? null, now, now);
    if (input.idempotencyKey) {
      db.prepare('INSERT INTO idempotency (key, task_id) VALUES (?, ?)').run(input.idempotencyKey, id);
    }
    return this.get(id);
  },

  update(input: { id: string; expectedVersion?: number; patch: Record<string, unknown> }) {
    const current = this.get(input.id);
    if (!current) throw new ToolError('NOT_FOUND', `Task ${input.id} not found`);
    if (input.expectedVersion != null && current.version !== input.expectedVersion) {
      throw new ToolError('STALE', `expected v${input.expectedVersion}, got v${current.version}; call get() and retry`);
    }
    const fields = ['subject', 'description', 'status'].filter(k => k in input.patch);
    const sets = fields.map(f => `${f} = ?`).join(', ');
    const values = fields.map(f => (input.patch as any)[f]);
    db.prepare(`UPDATE tasks SET ${sets}, version = version + 1, updated_at = ? WHERE id = ?`)
      .run(...values, Date.now(), input.id);
    return this.get(input.id);
  },

  get(id: string) {
    return db.prepare('SELECT * FROM tasks WHERE id = ?').get(id) as Task | undefined;
  },

  list() {
    return db.prepare('SELECT * FROM tasks ORDER BY created_at').all() as Task[];
  },
};

// tools.ts: schema validation at the tool boundary (CreateInput / UpdateInput are the schemas from principle 2)
export const tools = {
  'progress.create': {
    inputSchema: CreateInput,
    handler: (args: z.infer<typeof CreateInput>) => store.create(args),
  },
  'progress.update': {
    inputSchema: UpdateInput,
    handler: (args: z.infer<typeof UpdateInput>) => store.update(args),
  },
  // ...
};

// Entry point
export function callTool(name: string, rawArgs: unknown) {
  const tool = tools[name];
  if (!tool) return { error: `unknown tool: ${name}` };
  const parsed = tool.inputSchema.safeParse(rawArgs);
  if (!parsed.success) {
    return { error: 'INVALID_INPUT', details: parsed.error.issues };  // returned to the LLM immediately
  }
  try {
    return { ok: tool.handler(parsed.data) };
  } catch (e) {
    if (e instanceof ToolError) return { error: e.code, message: e.message };
    throw e;
  }
}

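These hundred-odd lines of code already implement the five most important of the eight principles above (verb splitting, strict schema, patch semantics, stale-write protection, idempotency keys).

VIII. A migration ladder (from cheapest to strongest)

If you are already running the old "LLM writes a file" approach, you do not have to refactor overnight. I suggest moving in this order:

This week: add strict schema validation at the tool boundary and return structured errors on failure. This step alone usually about doubles stability.
Next week: replace whole-file writes with patch-only updates, and have the store layer assign IDs.
Next iteration: move storage to SQLite or a database you already run, so the LLM never sees a file.
Polish phase: add version/etag, idempotency keys, and an audit log.
Wrap-up: tighten the system prompt and tool descriptions, and explicitly forbid the bypass in the Bash/Edit paths.

IX. If you must keep file storage

Some sandbox or intranet setups really cannot run a persistent service. In that case, the fallback options are:

One record per file (UUID file names): physically rules out "editing a sibling by mistake"
Append-only event log plus a materialized view: the LLM only appends events, and you aggregate the state
Use JSON Patch (RFC 6902) for update instead of the whole object
Route every write through a wrapper (a layer on top of FileEditTool) that validates the schema and reports failures to the LLM immediately

The core idea is unchanged: shrink the range of things the LLM can break in a single step.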
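As one possible shape for the event-log option, here is a Python sketch using only the standard library. The event types (`created`, `status_changed`), the JSON Lines file, and the function names are assumptions made for illustration. In a real setup the wrapper, not the LLM, would also assign the task IDs.

```python
import json
import os
import tempfile

ALLOWED_STATUS = {"pending", "in_progress", "done", "blocked"}

def append_event(log_path, event):
    """Write wrapper: validate the event, then append it as one JSON line."""
    ok = False
    if isinstance(event, dict):
        kind = event.get("type")
        if kind == "created":
            ok = set(event) == {"type", "id", "subject"}
        elif kind == "status_changed":
            ok = (set(event) == {"type", "id", "status"}
                  and isinstance(event["status"], str)
                  and event["status"] in ALLOWED_STATUS)
    if not ok:
        raise ValueError(f"rejected event: {event!r}")
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")

def materialize(log_path):
    """Fold the log into current state; existing lines are never rewritten."""
    tasks = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            e = json.loads(line)
            if e["type"] == "created":
                tasks[e["id"]] = {"subject": e["subject"], "status": "pending"}
            elif e["id"] in tasks:  # ignore status changes for unknown tasks
                tasks[e["id"]]["status"] = e["status"]
    return tasks

log = os.path.join(tempfile.mkdtemp(), "progress.jsonl")
append_event(log, {"type": "created", "id": "1", "subject": "Fix login bug"})
append_event(log, {"type": "status_changed", "id": "1", "status": "done"})
print(materialize(log))
```

Because the only write operation is "append one validated line", a bad call can at worst add a rejected-or-ignored event; it cannot drop or rewrite another task.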

References

RFC 6902 JSON Patch
zod strictObject / Pydantic model_config = ConfigDict(extra='forbid')

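Closing thoughts

This post is about "don't let the LLM write files", but in practice there is a paradox: we do want the LLM to produce large amounts of structured content (reports, configs, data views) to improve the user experience. If it is never allowed to touch files, how do we meet those needs? And if it is allowed to write, how do we keep the output stable and consistent?

I have hit quite a few new pitfalls on this in recent projects, and the next post will be devoted to it: not just "add a schema prompt", but a complete harness covering validation, rendering, and rollback.

Any thoughts or war stories of your own? Feel free to discuss in the comments.

Source: https://www.cnblogs.com/haihai1203/p/20135161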
