Compare commits

..

12 Commits

Author SHA1 Message Date
xiaomo c128fad38e docs: add 6 FTE concept cards
CI / check (pull_request) Successful in 2m51s
- agent-as-graduate: onboarding metaphor and teaching threshold
- three-learning-carriers: memory/skill/workflow framework
- switching-cost-process-knowledge-as-moat: process knowledge as moat
- opc-why-fte-agents-matter-most: why OpenClaw bets on FTE
- fte-maturity-threshold: who can onboard an agent
- fte-product-landscape: OpenClaw vs Claude Code vs Hermes
2026-06-07 14:26:30 +00:00
xingyue 60fdb0a7ff Merge pull request 'fix(cli): thread list defaults to active threads only' (#159) from fix/147-thread-list-active-default into main
CI / check (push) Successful in 2m51s
2026-06-07 14:12:15 +00:00
xiaoju ae757e4d44 feat(cli): thread list defaults to active threads only
CI / check (pull_request) Successful in 2m52s
Closes #147. Changes default behavior of `uwf thread list` to show only
active threads (idle + running). Adds `--all` flag to opt into the
previous full-list behavior. Explicit `--status` still wins over `--all`.

- cmdThreadList gains a `showAll: boolean` parameter (default false)
- CLI registers `--all` option and passes it through
- Test suite includes new `default behavior (issue #147)` describe block
  covering 9 scenarios; existing tests updated where they implicitly
  relied on the old "show everything" behavior
- README, cli-reference, and usage-reference updated to document the
  new default and the `--all` flag

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-07 13:46:25 +00:00
xingyue e1c7e3d267 Merge pull request 'chore(cli): clean up step-ask nits from PR #156 review' (#158) from chore/156-nits into main
CI / check (push) Successful in 2m27s
2026-06-07 11:57:51 +00:00
xingyue 8b01ade66a Merge pull request 'docs: expand .cards — vision, comparisons, business rationale' (#157) from docs/project-cards into main
CI / check (push) Successful in 2m39s
2026-06-07 11:50:48 +00:00
xiaoju 10113f6ec6 chore(cli): clean up step-ask nits from PR #156 review
CI / check (pull_request) Successful in 2m23s
- Remove dead-code detailRef null guards (already validated upstream)
- Fix ?: to explicit T | null on external error type boundary
- Drop unnecessary async from resolveAskWorkflow (no await)
- Simplify double negation: opts.fork !== false → opts.fork

Refs #146
2026-06-07 11:44:45 +00:00
xiaomo 04e2b5b8a7 docs: expand .cards — vision, comparisons, business rationale, open questions
CI / check (pull_request) Successful in 2m21s
26 cards covering:
- Project philosophy (session isolation, process discipline, dissipative structure)
- Comparisons (vs skill, vs dynamic workflow)
- Business rationale (FTE vs vendor, OPC, switching cost)
- Learning model (memory + skill + workflow)
- Self-improvement (reflective workflow, eval)
- Open questions (workflow granularity, human-in-the-loop)
2026-06-07 11:43:52 +00:00
xingyue f697aec3e7 Merge pull request 'feat(cli): step ask — read-only query to historical step sessions' (#156) from fix/146-step-ask into main
CI / check (push) Successful in 3m2s
2026-06-07 11:05:59 +00:00
xiaoju b5e094ab4d feat(cli): step ask — read-only query to historical step sessions
CI / check (pull_request) Successful in 2m46s
Adds `uwf step ask <step-hash> -p <prompt>` for asking follow-up
questions to a completed step's agent without mutating thread state.

- Fork-by-default: creates and caches a fork session per step (cache
  key `<stepHash>:ask`); subsequent asks reuse it.
- `--no-fork` fallback: spawns a fresh session with the step's detail
  ref injected as context.
- `--agent` overrides the recorded agent; otherwise resolves from the
  step's agent field via config alias.
- Updates `packages/cli/README.md` and `packages/util/src/usage-reference.ts`
  so the new subcommand is discoverable via README and `uwf prompt usage`.

Fixes #146
2026-06-07 10:33:41 +00:00
xingyue e9e896146e Merge pull request 'feat(util-agent): extend AgentOptions with fork / cleanup (Phase 2a)' (#155) from fix/145-agent-fork-cleanup into main
CI / check (push) Successful in 2m48s
2026-06-07 09:16:27 +00:00
xiaoju d666516ce6 feat(util-agent): extend AgentOptions with fork / cleanup (Phase 2a)
CI / check (pull_request) Successful in 3m20s
Add AgentForkFn and AgentCleanupFn type aliases. Extend AgentOptions
with fork: AgentForkFn | null and cleanup: AgentCleanupFn | null
fields. Add getAskSessionId / setAskSessionId session-cache helpers
using <stepHash>:ask key shape (coexists with exec sessions in the
same per-agent cache file). All four adapters pass fork: null,
cleanup: null — real wiring lands in Phase 2b. Resolves #145.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-06-07 08:36:58 +00:00
xingyue afc0287094 Merge pull request 'docs: add .cards — project philosophy and design rationale' (#154) from docs/project-cards into main
CI / check (push) Successful in 2m58s
2026-06-07 08:32:05 +00:00
40 changed files with 2013 additions and 18 deletions
+19
View File
@@ -0,0 +1,19 @@
---
title: "Agency over Content, Not Process"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, decision]
category: "architecture"
links:
- skill-vs-workflow-different-layers
- deterministic-engine-uncertain-agent
- feedback-loops-convergent-and-divergent
- cognitive-process-orchestration
- uwf-vs-dynamic-workflow
---
uwf 与"agent 自治"方案的核心区别:**agent 对内容有自主权,但对流程没有**。
流程是声明式的、引擎执行的、agent 无法绕过的。agent 不能决定跳过 review,就像程序员不能绕过 CI。自由度被有意限制在"内容"维度,"过程"维度是刚性的。这跟人类组织的逻辑一致——你可以自由发挥怎么写代码,但必须走 PR review。
参见 [[uwf-vs-dynamic-workflow]] 了解与 Claude Code dynamic workflow 的具体对比。
@@ -0,0 +1,21 @@
---
title: "Attention Isolation Breaks Cognitive Inertia"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, pattern]
category: "architecture"
links:
- session-isolation-as-cognitive-reset
- skill-vs-workflow-different-layers
- role-is-not-agent
---
"知识都在一个 session 内不是更好吗?"——这个直觉混淆了**信息量**和**认知模式**。
Session 隔离去掉的不是信息,而是**不该影响当前判断的信息**。reviewer 通过 CAS 链拿到 developer 的全部产出物(代码、变更说明),它缺的是 developer 的内心独白——为什么选方案 A、哪里犹豫过、哪里偷了懒。
这恰恰是关键。知道"为什么"的 reviewer 会顺着作者的逻辑走;不知道"为什么"的 reviewer 只能看产出物本身是否站得住——就像真实用户或未来维护者的视角。与学术双盲评审同理:去掉不该影响判断的信息,让注意力聚焦在工作本身。
每个认知任务需要的信息集合不同。developer 需要 issue 上下文、代码库知识、技术约束;reviewer 需要 diff、规范、测试结果。混在一起不是多了信息,是多了噪声。
**关注点的隔离是打破惯性和线性思维的关键。** 一个 session 做所有事,不是"知识都在",是关注点混在一起,确认偏误无法靠 prompt 消除,只能靠结构隔离。
+18
View File
@@ -0,0 +1,18 @@
---
title: "Cognitive Process Orchestration"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, decision]
category: "architecture"
links:
- feedback-loops-convergent-and-divergent
- session-isolation-as-cognitive-reset
- role-is-not-agent
- process-discipline-from-software-engineering
---
uwf 的抽象层次高于"质量保障工具"或"任务编排引擎"——它是一个**认知过程的编排引擎**。
收敛和发散都是认知过程。负反馈环(code review 循环)和正反馈环(苏格拉底式追问、头脑风暴)是同一套机制的不同配置。workflow author 通过设计 role 的 goal 和 graph 的环路结构,编排的是**思维方式**,不仅仅是任务步骤。
这意味着 uwf 的应用范围不限于软件开发流程,而是任何需要多视角、多轮次认知协作的场景。
@@ -0,0 +1,20 @@
---
title: "Cold Start — Same Entry Point, Different Exit"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, pattern]
category: "architecture"
links:
- uwf-vs-dynamic-workflow
- process-authorship-human-ai-vs-delegation
- workflow-as-improvable-system
- agent-as-graduate
---
uwf 的冷启动不比 dw 更复杂——起点完全一样:用户描述任务,agent 执行。
区别在出口:dw 跑完即丢,uwf 跑完后沉淀成 workflow YAML,用户可以审查、调优、复用。workflow 不一定要用户写,往往也是 agent 写的——跟 dw 一样的模式。uwf 和 dw 的差异不在"谁写流程",而在"流程跑完后去哪"。
冷启动路径:agent 先跑一次临时流程 → 用户觉得好就固化成 workflow → 下次同类任务直接复用 → 用过几次后根据经验调优。从零门槛的即兴执行,渐进演化为成熟的可复用流程。
入口像 dw 一样低,出口比 dw 多了一个沉淀层。
+20
View File
@@ -0,0 +1,20 @@
---
title: "Domain Experts Own the Process"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, decision, pattern]
category: "architecture"
links:
- trust-chain-audit-evaluate-reuse
- uwf-vs-dynamic-workflow
- cognitive-process-orchestration
- process-discipline-from-software-engineering
---
现实中各行各业有大量由反馈回路构成的流程正在实际运行,掌握和优化这些流程的是行业专家,不是 AI 工程师。
一个资深 QA 负责人知道测试应该怎么分层、失败后应该回到哪一步。一个风控经理知道审批要经过几道关、驳回后应该回到哪个环节补材料。这些人掌握流程的核心知识,但你让他们写 JS 编排脚本,他们做不到也不应该做。
YAML 声明式 workflow 让行业专家能直接参与——看得懂 roles 和 graph,能判断"这个环节是不是多余的"、"这两个角色之间应该加一个校验步骤"。审查门槛低不是为了技术简洁,是为了**让对的人参与对的决策**。
这是可审查 → 可评估 → 可复用信任链能真正转动的前提——转动它的人是行业专家,不是 AI 工程师。也是 uwf 选择声明式 YAML 而非 JS 的根本原因:**流程的设计权应该属于懂流程的人**。
+21
View File
@@ -0,0 +1,21 @@
---
title: "Eval Closes the Trust Chain"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, pattern]
category: "architecture"
links:
- trust-chain-audit-evaluate-reuse
- workflow-as-improvable-system
- feedback-loops-convergent-and-divergent
---
信任链(可审查 → 可评估 → 可复用 → 可迭代)的"可评估"环节需要工程落地。
uwf 的 eval 包(`@united-workforce/eval`,已在 repo 开发中)的目标是让 agent 能自我评估执行质量——一次 thread 跑完后,度量"做得好不好"、"workflow v2 比 v1 好还是差"。
这形成了两层反馈闭环:
1. **workflow 内的反馈环** — developer → reviewer → rejected → developer(已实现,负反馈驱动执行质量收敛)
2. **workflow 级的反馈环** — 执行 → eval → workflow 迭代 → 再执行(在建,驱动流程本身的持续改进)
第二层闭环接通后,uwf 就不只是一个执行引擎,而是一个**自我改进的流程系统**。
@@ -0,0 +1,21 @@
---
title: "Feedback Loops — Convergent and Divergent"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, pattern]
category: "architecture"
links:
- dissipative-structure-token-for-entropy
- process-discipline-from-software-engineering
- cognitive-process-orchestration
---
uwf 的 graph 环路不限于负反馈(收敛),也可以是正反馈(发散)。引擎本身不带倾向——流转方向由 `$status` 和 graph 决定,反馈性质由 role 的设计意图决定。
**负反馈环(收敛)**:developer → reviewer → rejected → developer。reviewer 的 goal 是"找问题",产生修正力。稳定点是 `approved`,系统自然收敛到那里。特性:偏差越大修正越强,对扰动鲁棒。
**正反馈环(发散)**:proposer → challenger → "interesting" → proposer。challenger 的 goal 是"追问更深层的假设",每轮发散,一个想法激发更多想法。
终止条件不同:负反馈靠收敛自然到达稳定点;正反馈不会自己停,需要外部约束(轮次上限,或额外 role 判断"够了")。
每个 role 的 `$status` 就是误差信号(负反馈)或激励信号(正反馈),驱动系统向不同方向演化。Workflow author 真正在设计的是**在哪里放什么样的环**。
@@ -0,0 +1,27 @@
---
title: "Four Advantages over Single Session + Skill"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, pattern]
category: "architecture"
links:
- session-isolation-as-cognitive-reset
- attention-isolation-breaks-cognitive-inertia
- skill-vs-workflow-different-layers
- when-skill-is-not-enough
---
Session 隔离除了认知层面的好处(打破确认偏误、聚焦注意力),还解决一个更物理性的问题:**长 session 的上下文压缩导致降智和行为不稳定**。
Context window 是有限资源。一个 session 从头做到尾,前期的 tool output、中间的思考过程不断堆积,要么触发 compaction(信息丢失),要么挤占后期推理的有效空间。越到后面 agent 越"笨"——不是能力变了,是可用的认知空间被历史占满了。表现为:忘记约束、重复错误、输出不稳定。
Session 隔离直接解决这个问题:每个 role 进入时拿到的是**精炼过的前序产出**(CAS 里经 schema 过滤的结构化 output),不是前面所有 session 的原始 token 流。信息经过 schema 过滤,只有产出物,没有过程噪声。
uwf 相对单 session + skill 的四个优势,前三个来自 session 隔离,第四个来自程序化流程:
1. **认知隔离** — 打破确认偏误和线性思维惯性
2. **注意力聚焦** — 每个 role 只看该看的信息
3. **上下文保鲜** — 避免长 session 的压缩降智和行为漂移
4. **流程可靠性** — 引擎强制执行每一步,agent 无法跳过或篡改流程
前三点回答"为什么拆成多个 session 更好",第四点回答"为什么流程要由引擎控制而不是 agent 自觉"。Skill 里写"先编码再测试再 review",agent 可能做着做着就跳过——不是故意的,是 context 压力下行为漂移,或者觉得"改动太小不需要测试"。程序化流程不存在这个问题:graph 说要走 tester,就必须走 tester。
+22
View File
@@ -0,0 +1,22 @@
---
title: "Open Question — Human as Role Participant"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, open-question]
category: "architecture"
links:
- agent-as-graduate
- opc-why-fte-agents-matter-most
- role-is-not-agent
- process-authorship-human-ai-vs-delegation
---
**待讨论。**
目前讨论主要围绕 OPC(一个人 + N 个 agent)。但小团队场景下——几个人各自有 FTE agent,共享 workflow 库和记忆——workflow 的某些 role 可能需要人来执行而不是 agent。
问题:
- uwf 是否需要支持人作为 role 的参与者(比如"人工审批"作为 graph 中的一个 role)?
- 还是人永远在 workflow 之外,只做设计者和监督者?
- 如果支持,$SUSPEND 机制是否已经覆盖了这个需求(暂停等人介入)?
- 多人 + 多 agent 的协作场景下,workflow 的共享和权限模型是什么样的?
@@ -0,0 +1,20 @@
---
title: "Open Question — Workflow Granularity and Composition"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, open-question]
category: "architecture"
links:
- cognitive-process-orchestration
- skill-vs-workflow-different-layers
- domain-experts-own-the-process
---
**待讨论。**
Workflow 的粒度问题:solve-issue 是端到端的大 workflow(planner → developer → reviewer → tester → committer),但现实中有些场景只需要管一个环节(比如只用 uwf 管 code review,其他部分用 skill 或手动)。
问题:
- Workflow 是否应该支持嵌套或组合——小 workflow 作为大 workflow 的一个 role?
- 还是粒度完全由用户自己决定,引擎不需要管?
- 组合式 workflow 和单体 workflow 各自的 trade-off 是什么?
@@ -0,0 +1,23 @@
---
title: "Process Authorship — Human-AI Collaboration vs Full Delegation"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, decision]
category: "architecture"
links:
- domain-experts-own-the-process
- uwf-vs-dynamic-workflow
- trust-chain-audit-evaluate-reuse
- workflow-as-improvable-system
---
dw 和 uwf 都面向 agent,用户都不需要会写代码。区别在于**流程的创作权**:
- **dw**:流程由 AI 全权负责。用户描述任务,agent 决定怎么拆步骤、怎么编排。用户参与度最低,门槛最低。
- **uwf**:流程创作是人和 AI 协作的。行业专家参与设计、审查、调优流程,agent 参与起草和执行。
这是主动权的取舍。dw 把流程交给 AI 是为了降低使用门槛;uwf 有意保留人对流程的参与权,代价是门槛稍高,收益是流程能融入人的领域知识。
背后的认知:**AI 擅长执行,但流程设计需要领域知识。** AI 不知道行业里哪个环节容易出错、哪个审批不能跳过、哪个反馈回路是血的教训换来的。这些知识在行业专家脑子里,需要一个他们能参与的载体来表达。
dw 赌的是 AI 能自己发现好的流程,uwf 赌的是好的流程需要人的知识参与。两个赌注没有对错,适用于不同的场景:临时任务用 dw 的零门槛更高效,反复执行的核心业务流程用 uwf 的人机协作更可靠。
@@ -0,0 +1,35 @@
---
title: "Reflective Workflow — Self-Improvement as Discipline"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, pattern, decision]
category: "architecture"
links:
- eval-closes-the-trust-chain
- three-learning-carriers
- workflow-as-improvable-system
- feedback-loops-convergent-and-divergent
- trust-chain-audit-evaluate-reuse
---
FTE agent 的"成长"不靠自发顿悟,靠纪律性的反思。反思本身是纪律性的(定期跑、不能跳过、有固定步骤),所以应该用 workflow 承载——不能靠 agent "有空想想"。
反思 workflow 定期拉取最近执行过的任务,分析流程中出现的问题,找可优化的点,迭代,eval,对比。反思的对象覆盖三层载体:
- 发现某个 role 反复在同一类问题上出错 → **迭代 skill**
- 发现某类任务的上下文总是缺少关键信息 → **补充记忆**
- 发现某个审批环节通过率 100% 从未驳回 → **简化 workflow**
这形成了双层 workflow 架构:
```
执行层:workflow 驱动日常任务
↓ 产出执行记录(CAS 链)
反思层:反思 workflow 定期分析执行记录
↓ 产出改进建议
改进层:迭代 memory / skill / workflow
↓ 提升下一轮执行质量
执行层:...
```
两层都是 workflow,职责不同——执行层做事,反思层改进做事的方式。用 workflow 来优化 workflow——工具改进自身的递归。
@@ -0,0 +1,19 @@
---
title: "Skill vs Workflow — Different Layers"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, decision]
category: "architecture"
links:
- session-isolation-as-cognitive-reset
- cognitive-process-orchestration
- agency-over-content-not-process
---
Skill 和 workflow 不是替代关系,是不同层次。
**Skill** 管的是一个 session 内怎么做——给 agent 的指令和方法论。你可以在 skill 里写"先规划再编码再 review",但 agent 始终在同一个 session 里,review 自己刚写的代码时带着全部决策记忆。确认偏误无法靠 prompt 消除。
**Workflow** 管的是 session 之间怎么协作——强制 session 断裂,reviewer 进来时不知道 developer 当时为什么做那个选择,只看到产出物。这个隔离不是靠自律,是靠结构。
两者正交:workflow 的每个 role 里面完全可以加载 skill。Skill 提升单个 session 的能力,workflow 编排多个 session 的协作关系。
@@ -0,0 +1,23 @@
---
title: "Trust Chain — Auditable → Evaluable → Reusable → Improvable"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, pattern, decision]
category: "architecture"
links:
- workflow-as-improvable-system
- uwf-vs-dynamic-workflow
- process-discipline-from-software-engineering
---
可审查、可评估、可复用不是并列的好处,而是一条因果链:
**可审查 → 可评估 → 可复用 → 可迭代**
不能审查的东西不敢复用——不知道它为什么 work,换个场景可能就 break。不能评估的东西不知道该不该复用——也许它其实没用,只是恰好那次任务简单。
这是一条信任链,每一环是下一环的前提。uwf 选择声明式 YAML 而不是 JS/TS 定义 workflow,不是技术限制,是有意降低审查门槛,让这条链的摩擦力最低。
dw 不是不能做这些,而是它的默认路径不鼓励这条链——即兴生成的脚本,审查成本高、评估缺乏对照、复用需要额外抽象。差异在摩擦力,不在能力边界。
这也是耗散结构的递归应用——不只是用流程对 agent 做负反馈(提升执行质量),还在对流程本身做负反馈(提升流程质量)。Workflow 和代码一样,需要 review、测试、度量、迭代。
+27
View File
@@ -0,0 +1,27 @@
---
title: "uwf vs Dynamic Workflow — Structural Differences"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, decision]
category: "architecture"
links:
- agency-over-content-not-process
- deterministic-engine-uncertain-agent
- session-isolation-as-cognitive-reset
- cognitive-process-orchestration
- workflow-as-improvable-system
---
Claude Code 的 dynamic workflow (dw) 和 uwf 都有 session 隔离——dw spawn 独立 subagent(最多 16 并发、1000 总量),每个 subagent 是独立 context,也能做对抗性 review。四个优势(认知隔离、注意力聚焦、上下文保鲜、流程可靠性)两者都具备。
差异不在能不能做 session 隔离和程序化流程,而在**流程和执行的解耦程度**:
dw 的流程生成和执行是一体的——同一个 agent 既决定怎么做又开始做。流程嵌在执行里。uwf 的 workflow 是独立的持久制品,不管是人写的还是 agent 写的,一旦存在就和任何一次执行无关,可以被单独审查、讨论、迭代。
这个解耦在三个维度上拉开差距:
**审查**:dw 的 JS 脚本是代码,审查门槛高,逻辑和业务细节混在一起。uwf 的 YAML 是声明式的,roles 定义关注点,graph 定义流转,一眼能看出流程结构,非工程师也能参与讨论。
**评估**:dw 每次生成不同脚本,难以控制变量——跑得好是流程好还是脚本碰巧写得好?uwf 的 workflow 固定,跑 N 次可以统计成功率,增减 role 后效果差异可以归因到流程变更。
**复用**:dw 脚本为特定任务生成,复用需要手动泛化。uwf 的 workflow 天然是通用模板——solve-issue 就是 solve-issue,换个 repo 换个 issue 直接跑。
+24
View File
@@ -0,0 +1,24 @@
---
title: "When Skill Is Not Enough — Workflow Judgment Call"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, decision, pattern]
category: "architecture"
links:
- skill-vs-workflow-different-layers
- attention-isolation-breaks-cognitive-inertia
- feedback-loops-convergent-and-divergent
- agency-over-content-not-process
---
**Skill 够用的场景:** 任务在单一认知模式下可以完成好。查资料、写文档、跑部署脚本、按规范格式化——不需要自我对抗,一个 session 带着清晰指令一路执行到底就行。
**Workflow 更好的场景:** 任务需要在不同认知模式之间切换,且这些模式之间存在张力。典型标志:
1. **产出需要被"不知道过程"的眼睛审视** — 写代码+review、写方案+挑战、翻译+校对。一个 session 做不到真正的自我审视,确认偏误是自回归结构决定的,不是 prompt 能修的。
2. **出错成本高到需要结构性保证** — 不是"建议你 review 一下",而是"你不可能跳过 review"。Skill 是建议,workflow 是制度。
3. **需要收敛到明确的质量标准** — 负反馈环驱动修正直到通过,而不是 agent 自己觉得"差不多了"。
**判词:当任务复杂到 agent 可能说服自己"错的是对的"时,你需要 workflow 的结构隔离,而不是 skill 的行为指导。**
+20
View File
@@ -0,0 +1,20 @@
---
title: "Workflow as an Improvable System"
created: "2026-06-07"
source: "openclaw-xiaomo"
tags: [architecture, pattern]
category: "architecture"
links:
- uwf-vs-dynamic-workflow
- process-discipline-from-software-engineering
- feedback-loops-convergent-and-divergent
- cognitive-process-orchestration
---
uwf 把 workflow 定位为**可持续改进的系统**,而不是一次性的任务完成工具。
LLM 能力在快速提升,但单次执行的可靠性永远有上限。真正的杠杆不在于某一次跑得好不好,而在于流程本身能不能从每次执行中学到东西、越来越好。这需要流程是可审查的(看得懂才能改)、可评估的(量化才能知道改对没有)、可复用的(积累才有复利)。
dw 每次重新生成脚本,某种意义上是在放弃之前执行的经验——每次从零开始发明流程。uwf 把流程固化为独立制品,每次迭代都在前一版基础上改进。v1 没有 tester 角色,加上 tester 变成 v2,效果可量化对比。
这是一个有记忆的系统——记忆不在 agent 的 context 里,而在 workflow 的版本历史里。
+18
View File
@@ -0,0 +1,18 @@
---
"@united-workforce/util-agent": minor
"@united-workforce/agent-mock": patch
"@united-workforce/agent-builtin": patch
"@united-workforce/agent-hermes": patch
"@united-workforce/agent-claude-code": patch
---
feat(util-agent): extend AgentOptions with `fork` / `cleanup` and add ask-session cache
Phase 2a infrastructure for `step ask`. Extends `AgentOptions` with
`fork: AgentForkFn | null` and `cleanup: AgentCleanupFn | null` fields, exporting
the new `AgentForkFn` and `AgentCleanupFn` type aliases. Adds `getAskSessionId` /
`setAskSessionId` to the per-agent session cache, using `<stepHash>:ask` keys
that share the cache file with exec sessions (`<threadId>:<role>` keys) without
collision. All four adapters (mock, builtin, hermes, claude-code) now pass
`fork: null, cleanup: null` — real implementations land in Phase 2b. Resolves
issue #145.
+18
View File
@@ -0,0 +1,18 @@
---
"@united-workforce/cli": minor
"@united-workforce/util": patch
---
feat(cli): add `uwf step ask <step-hash> -p <prompt>` read-only follow-up command
Phase 2b of the ask-session work. Adds a new subcommand that lets the user ask
a follow-up question to a historical step's agent without writing a new
`StepNode` or mutating thread state. The command resolves the agent from the
recorded step (or `--agent <cmd>` override), forks the original session via the
adapter's `--mode fork --session <source>` contract, caches the resulting
ask-session id under `<stepHash>:ask` so subsequent asks reuse it, then invokes
the agent with `--mode ask --session <forkId> --prompt <text> --detail <ref>`
and streams the raw stdout to the caller. `--no-fork` falls back to a fresh
session that receives the step's detail ref for context. The `prompt usage`
reference (in `@united-workforce/util`) is also updated so agents discover the
new subcommand. Resolves issue #146.
+14
View File
@@ -0,0 +1,14 @@
---
"@united-workforce/cli": minor
"@united-workforce/util": patch
---
feat(cli): `uwf thread list` now defaults to active threads only
Changes the default behavior of `uwf thread list` to show only active threads
(idle + running). Adds a new `--all` flag to opt into the previous behavior of
listing every thread (including completed, cancelled, and suspended).
When invoked with no flags, the command now hides completed/cancelled/suspended
threads. Use `--all` to see them, or `--status <status>` to filter explicitly.
The `--status` filter wins when both are present. Resolves issue #147.
+2
View File
@@ -167,5 +167,7 @@ export function createBuiltinAgent(): () => Promise<void> {
name: "builtin",
run: runBuiltin,
continue: continueBuiltin,
fork: null,
cleanup: null,
});
}
@@ -253,5 +253,7 @@ export function createClaudeCodeAgent(model: string | null): () => Promise<void>
name: "claude-code",
run: (ctx) => runClaudeCode(ctx, model),
continue: (sessionId, message, store) => continueClaudeCode(sessionId, message, store, model),
fork: null,
cleanup: null,
});
}
+2
View File
@@ -246,6 +246,8 @@ export function createHermesAgent(resumeDisabled: boolean): () => Promise<void>
name: "hermes",
run: runHermes,
continue: continueHermes,
fork: null,
cleanup: null,
});
// Wrap to ensure ACP client is closed after agent completes,
+2
View File
@@ -125,5 +125,7 @@ export function createMockAgent(mockDataPath: string): () => Promise<void> {
name: "mock",
run,
continue: continueRun,
fork: null,
cleanup: null,
});
}
+6 -1
View File
@@ -49,7 +49,7 @@ bun link packages/cli
| `uwf thread start <workflow> -p <prompt>` | Create a thread without executing |
| `uwf thread exec <thread-id> [--agent <cmd>] [-c <count>] [--background]` | Execute one or more moderator→agent→extract cycles |
| `uwf thread show <thread-id>` | Show thread head pointer |
| `uwf thread list [--status <status>] [--after <date>] [--before <date>] [--skip <n>] [--take <n>]` | List threads filtered by status (idle, running, completed, active, or comma-separated), time range (ISO or relative like '7d'), with pagination |
| `uwf thread list [--status <status>] [--all] [--after <date>] [--before <date>] [--skip <n>] [--take <n>]` | List threads (defaults to active: idle + running). Use `--all` to include completed/cancelled/suspended, or `--status` to filter explicitly (idle, running, suspended, completed, cancelled, active, or comma-separated). Supports time range and pagination. |
| `uwf thread read <thread-id> [--quota N] [--before <hash>] [--start]` | Render thread as readable markdown |
`thread read`, `step list`, and `step show` work on both active and completed threads.
@@ -63,6 +63,8 @@ uwf thread start solve-issue -p "Fix the login redirect bug"
uwf thread exec 01ARZ3NDEKTSV4RRFFQ69G5FAV
uwf thread exec 01ARZ3NDEKTSV4RRFFQ69G5FAV -c 3 --agent uwf-builtin
uwf thread exec 01ARZ3NDEKTSV4RRFFQ69G5FAV --background
uwf thread list
uwf thread list --all
uwf thread list --status running
uwf thread list --status active
uwf thread list --status idle,completed
@@ -79,6 +81,7 @@ uwf thread stop 01ARZ3NDEKTSV4RRFFQ69G5FAV
| `uwf step show <step-hash>` | Show step metadata and frontmatter |
| `uwf step read <step-hash> [--quota <chars>]` | Read a step's turns as human-readable markdown |
| `uwf step fork <step-hash>` | Fork a thread from a specific step |
| `uwf step ask <step-hash> -p <prompt> [--agent <cmd>] [--no-fork]` | Ask a follow-up question to a historical step's agent (read-only; no thread mutation) |
Examples:
@@ -87,6 +90,8 @@ uwf step list 01ARZ3NDEKTSV4RRFFQ69G5FAV
uwf step show 32GCDE899RRQ3
uwf step read 32GCDE899RRQ3 --quota 2000
uwf step fork 32GCDE899RRQ3
uwf step ask 32GCDE899RRQ3 -p "Why did you choose this approach?"
uwf step ask 32GCDE899RRQ3 -p "Summarise the key findings" --no-fork
```
### Workflow (Layer 1: Templates)
@@ -384,7 +384,7 @@ describe("currentRole field", () => {
const _compHead = loadActiveThreads(uwfForIndex.varStore)[compId]!.head;
completeThread(uwfForIndex.varStore, compId, "completed");
const list = await cmdThreadList(storageRoot, null, null, null, 0, 100);
const list = await cmdThreadList(storageRoot, null, null, null, 0, 100, true);
const idleItem = list.find((i) => i.thread === idleId);
expect(idleItem).toBeDefined();
+670
View File
@@ -0,0 +1,670 @@
import { execFileSync } from "node:child_process";
import { mkdir, mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";
import { fileURLToPath } from "node:url";
import { bootstrap, putSchema } from "@ocas/core";
import { openStore } from "@ocas/fs";
import type { CasRef, ThreadId, ThreadIndexEntry } from "@united-workforce/protocol";
import { afterEach, beforeEach, describe, expect, test } from "vitest";
import { registerUwfSchemas } from "../schemas.js";
import { seedThreads } from "./thread-test-helpers.js";
const OUTPUT_SCHEMA = {
type: "object" as const,
properties: {
$status: { type: "string" as const },
note: { type: "string" as const },
},
required: ["$status"],
additionalProperties: false,
};
const DETAIL_SCHEMA = {
title: "ask-detail",
type: "object" as const,
required: ["sessionId", "model", "duration", "turnCount", "turns"],
properties: {
sessionId: { type: "string" as const },
model: { type: "string" as const },
duration: { type: "integer" as const },
turnCount: { type: "integer" as const },
turns: {
type: "array" as const,
items: { type: "string" as const, format: "ocas_ref" },
},
},
additionalProperties: false,
};
const THREAD_ID = "01ASKSTEPTEST000000000" as ThreadId;
const STEP_SESSION_ID = "ses-original-step-001";
let tmpDir: string;
beforeEach(async () => {
tmpDir = await mkdtemp(join(tmpdir(), "cli-uwf-step-ask-test-"));
});
afterEach(async () => {
await rm(tmpDir, { recursive: true, force: true });
});
type SetupOpts = {
threadStatus: ThreadIndexEntry["status"];
withDetail: boolean;
// The agent name (path or alias) to record in the head StepNode.agent field.
// Defaults to mockAgentPath.
stepAgentNameOverride: string | null;
// Pre-cached fork session-id. When provided, the cache file is written
// before running so the test can verify reuse semantics.
preCachedForkSessionId: string | null;
};
type SetupResult = {
casDir: string;
stepHash: CasRef;
startHash: CasRef;
workflowHash: CasRef;
detailHash: CasRef | null;
mockAgentPath: string;
failingAgentPath: string;
promptCapturePath: string;
modeCapturePath: string;
forkSessionCapturePath: string;
askSessionCapturePath: string;
envCapturePath: string;
};
async function setupAskFixture(opts: Partial<SetupOpts> = {}): Promise<SetupResult> {
const cfg: SetupOpts = {
threadStatus: opts.threadStatus ?? "idle",
withDetail: opts.withDetail ?? true,
stepAgentNameOverride: opts.stepAgentNameOverride ?? null,
preCachedForkSessionId: opts.preCachedForkSessionId ?? null,
};
const casDir = join(tmpDir, "cas");
await mkdir(casDir, { recursive: true });
const store = await openStore(casDir);
await bootstrap(store);
const schemas = await registerUwfSchemas(store);
const outputSchemaHash = await putSchema(store, OUTPUT_SCHEMA);
const detailSchemaHash = await putSchema(store, DETAIL_SCHEMA);
const workflowHash = await store.cas.put(schemas.workflow, {
name: "test-ask",
description: "ask command integration test",
roles: {
worker: {
description: "Worker",
goal: "Work",
capabilities: [],
procedure: "work",
output: "result",
frontmatter: outputSchemaHash,
},
},
graph: {
$START: {
new: { role: "worker", prompt: "Start work", location: null },
},
worker: { ok: { role: "$END", prompt: "done", location: null } },
},
});
const startHash = await store.cas.put(schemas.startNode, {
workflow: workflowHash,
prompt: "Test ask task",
cwd: tmpDir,
});
// Set OCAS_HOME so seedThreads + in-test createUwfStore calls resolve to this CAS dir.
process.env.OCAS_HOME = casDir;
// Capture file paths
const promptCapturePath = join(tmpDir, "captured-prompt.txt");
const modeCapturePath = join(tmpDir, "captured-mode.txt");
const forkSessionCapturePath = join(tmpDir, "captured-fork-session.txt");
const askSessionCapturePath = join(tmpDir, "captured-ask-session.txt");
const envCapturePath = join(tmpDir, "captured-env.txt");
const mockAgentPath = join(tmpDir, "mock-agent.sh");
const failingAgentPath = join(tmpDir, "failing-agent.sh");
// Build a detail node with sessionId so step ask can extract it
let detailHash: CasRef | null = null;
if (cfg.withDetail) {
const turnHash = await store.cas.put(detailSchemaHash, {
sessionId: STEP_SESSION_ID,
model: "test-model",
duration: 1000,
turnCount: 0,
turns: [],
});
detailHash = turnHash;
}
// Build the StepNode at thread head
const outputHash = await store.cas.put(outputSchemaHash, { $status: "ok" });
const stepHash = await store.cas.put(schemas.stepNode, {
start: startHash,
prev: null,
role: "worker",
output: outputHash,
detail: detailHash,
agent: cfg.stepAgentNameOverride ?? mockAgentPath,
edgePrompt: "Start work",
startedAtMs: 1716600000000,
completedAtMs: 1716600001000,
cwd: tmpDir,
assembledPrompt: null,
usage: null,
});
// Seed thread index entry
await seedThreads(tmpDir, {
[THREAD_ID]: {
head: stepHash,
status: cfg.threadStatus,
suspendedRole: null,
suspendMessage: null,
completedAt: cfg.threadStatus === "completed" ? 1716600001000 : null,
},
});
// Pre-seed the ask session cache so reuse tests have something to find.
if (cfg.preCachedForkSessionId !== null) {
const cachePath = join(tmpDir, "cache", "mock-sessions.json");
await mkdir(dirname(cachePath), { recursive: true });
await writeFile(
cachePath,
`${JSON.stringify({ [`${stepHash}:ask`]: cfg.preCachedForkSessionId }, null, 2)}\n`,
"utf8",
);
}
// Mock agent: dispatches based on `--mode` (ask|fork|run) and captures inputs.
// - --mode ask --session <id> --prompt <text>: writes to ask capture; echoes a fixed answer to stdout
// - --mode fork --session <id>: writes to fork capture; prints "forked-from-<id>" sessionId on stdout
// - default (uwf-* style invocation): captures and echoes adapter JSON (not used in this suite)
await writeFile(
mockAgentPath,
`#!/bin/sh
mode=""
prompt=""
session=""
detail=""
while [ $# -gt 0 ]; do
case "$1" in
--mode) mode="$2"; shift 2 ;;
--prompt) prompt="$2"; shift 2 ;;
--session) session="$2"; shift 2 ;;
--detail) detail="$2"; shift 2 ;;
*) shift ;;
esac
done
printf '%s' "$mode" > '${modeCapturePath}'
printf '%s' "$prompt" > '${promptCapturePath}'
printf 'OCAS_HOME=%s\\n' "$OCAS_HOME" > '${envCapturePath}'
case "$mode" in
fork)
printf '%s' "$session" > '${forkSessionCapturePath}'
new_id="forked-from-$session"
printf '%s\\n' "$new_id"
;;
ask)
printf '%s' "$session" > '${askSessionCapturePath}'
# Print a deterministic answer that the cmdStepAsk path will hand back.
printf 'MOCK_ANSWER prompt=%s session=%s detail=%s\\n' "$prompt" "$session" "$detail"
;;
*)
echo "{\\"stepHash\\":\\"unused\\"}"
;;
esac
`,
{ mode: 0o755 },
);
await writeFile(
failingAgentPath,
`#!/bin/sh
echo "boom" >&2
exit 7
`,
{ mode: 0o755 },
);
// Minimal config so loadWorkflowConfig succeeds.
const configPath = join(tmpDir, "config.yaml");
await writeFile(
configPath,
`defaultAgent: uwf-hermes\ndefaultModel: test-model\nagentOverrides: null\nagents: {}\nproviders: {}\nmodels: {}\n`,
);
return {
casDir,
stepHash,
startHash,
workflowHash,
detailHash,
mockAgentPath,
failingAgentPath,
promptCapturePath,
modeCapturePath,
forkSessionCapturePath,
askSessionCapturePath,
envCapturePath,
};
}
function runUwf(
args: string[],
casDir: string,
): { stdout: string; stderr: string; status: number } {
const cliPath = join(dirname(fileURLToPath(import.meta.url)), "..", "..", "dist", "cli.js");
try {
const stdout = execFileSync(process.execPath, [cliPath, ...args], {
encoding: "utf8",
stdio: ["ignore", "pipe", "pipe"],
env: {
...process.env,
UWF_HOME: tmpDir,
OCAS_HOME: casDir,
},
cwd: tmpDir,
timeout: 30000,
});
return { stdout, stderr: "", status: 0 };
} catch (error) {
const err = error as NodeJS.ErrnoException & {
stdout?: string | Buffer;
stderr?: string | Buffer;
status?: number;
};
return {
stdout: typeof err.stdout === "string" ? err.stdout : (err.stdout?.toString("utf8") ?? ""),
stderr: typeof err.stderr === "string" ? err.stderr : (err.stderr?.toString("utf8") ?? ""),
status: err.status ?? 1,
};
}
}
// ── Group 1: CLI argument validation ───────────────────────────────────────
describe("uwf step ask - CLI argument validation", () => {
test("1.1 missing step-hash exits non-zero", async () => {
const { casDir } = await setupAskFixture();
const result = runUwf(["step", "ask"], casDir);
expect(result.status).not.toBe(0);
});
test("1.2 missing -p flag exits non-zero", async () => {
const { casDir, stepHash } = await setupAskFixture();
const result = runUwf(["step", "ask", stepHash], casDir);
expect(result.status).not.toBe(0);
expect(result.stderr.toLowerCase()).toMatch(/required|missing|prompt/);
});
test("1.3 step-hash and -p accepted as valid invocation", async () => {
const { casDir, stepHash, mockAgentPath } = await setupAskFixture();
const result = runUwf(
["step", "ask", stepHash, "-p", "why?", "--agent", mockAgentPath],
casDir,
);
expect(result.status).toBe(0);
});
});
// ── Group 2: CAS validation errors ────────────────────────────────────────
describe("uwf step ask - CAS validation errors", () => {
test("2.1 non-existent CAS hash exits non-zero with 'not found'", async () => {
const { casDir, mockAgentPath } = await setupAskFixture();
const result = runUwf(
["step", "ask", "0000000000000", "-p", "why?", "--agent", mockAgentPath],
casDir,
);
expect(result.status).not.toBe(0);
expect(result.stderr.toLowerCase()).toContain("not found");
});
test("2.2 hash that is not a StepNode exits non-zero", async () => {
const { casDir, startHash, mockAgentPath } = await setupAskFixture();
// Use the StartNode hash — it exists but is not a StepNode
const result = runUwf(
["step", "ask", startHash, "-p", "why?", "--agent", mockAgentPath],
casDir,
);
expect(result.status).not.toBe(0);
expect(result.stderr.toLowerCase()).toContain("not a stepnode");
});
test("2.3 step with no detail ref exits non-zero", async () => {
const { casDir, stepHash, mockAgentPath } = await setupAskFixture({ withDetail: false });
const result = runUwf(
["step", "ask", stepHash, "-p", "why?", "--agent", mockAgentPath],
casDir,
);
expect(result.status).not.toBe(0);
expect(result.stderr.toLowerCase()).toMatch(/no detail|detail.*missing|missing.*detail/);
});
});
// ── Group 3: Successful ask (core behavior) ───────────────────────────────
describe("uwf step ask - successful ask (core)", () => {
test("3.1 stdout contains agent's response text", async () => {
const { casDir, stepHash, mockAgentPath } = await setupAskFixture();
const result = runUwf(
["step", "ask", stepHash, "-p", "why tar not zip?", "--agent", mockAgentPath],
casDir,
);
expect(result.status).toBe(0);
expect(result.stdout).toContain("MOCK_ANSWER");
expect(result.stdout).toContain("why tar not zip?");
});
test("3.2 thread index entry (head, status) is identical before and after ask", async () => {
const { casDir, stepHash, mockAgentPath } = await setupAskFixture();
// Before ask: snapshot the thread state
const { createUwfStore, getThread } = await import("../store.js");
const before = await createUwfStore(tmpDir);
const beforeEntry = getThread(before.varStore, THREAD_ID);
expect(beforeEntry).not.toBeNull();
const result = runUwf(
["step", "ask", stepHash, "-p", "anything", "--agent", mockAgentPath],
casDir,
);
expect(result.status).toBe(0);
// After ask: thread state should be unchanged
const after = await createUwfStore(tmpDir);
const afterEntry = getThread(after.varStore, THREAD_ID);
expect(afterEntry).not.toBeNull();
expect(afterEntry?.head).toBe(beforeEntry?.head);
expect(afterEntry?.status).toBe(beforeEntry?.status);
});
test("3.3 no new StepNode is written to CAS (step count unchanged)", async () => {
const { casDir, stepHash, mockAgentPath } = await setupAskFixture();
// Count StepNodes before
const { createUwfStore } = await import("../store.js");
const before = await createUwfStore(tmpDir);
const stepSchemaHash = before.schemas.stepNode;
function countStepNodes(uwfStore: typeof before): number {
const candidates = [stepHash];
let count = 0;
for (const h of candidates) {
const node = uwfStore.store.cas.get(h);
if (node !== null && node.type === stepSchemaHash) count++;
}
return count;
}
const beforeCount = countStepNodes(before);
expect(beforeCount).toBe(1);
const result = runUwf(
["step", "ask", stepHash, "-p", "anything", "--agent", mockAgentPath],
casDir,
);
expect(result.status).toBe(0);
// After ask: still only the seeded StepNode exists at head; no new step appended.
const after = await createUwfStore(tmpDir);
const headNode = after.store.cas.get(stepHash);
expect(headNode).not.toBeNull();
expect(headNode?.type).toBe(after.schemas.stepNode);
// Confirm thread head still points to the original step hash
const { getThread } = await import("../store.js");
const entry = getThread(after.varStore, THREAD_ID);
expect(entry?.head).toBe(stepHash);
});
});
// ── Group 4: Fork cache semantics ─────────────────────────────────────────
describe("uwf step ask - fork cache", () => {
test("4.1 first ask creates a fork session and caches it", async () => {
const { casDir, stepHash, mockAgentPath, forkSessionCapturePath } = await setupAskFixture();
const result = runUwf(
["step", "ask", stepHash, "-p", "first ask", "--agent", mockAgentPath],
casDir,
);
expect(result.status).toBe(0);
// The mock agent in fork mode receives the source session id
const forkArg = await readFile(forkSessionCapturePath, "utf8");
expect(forkArg).toBe(STEP_SESSION_ID);
// Cache file should now contain the ask key
const cachePath = join(tmpDir, "cache", "mock-sessions.json");
const raw = await readFile(cachePath, "utf8");
const parsed = JSON.parse(raw) as Record<string, string>;
expect(parsed[`${stepHash}:ask`]).toBeDefined();
expect(parsed[`${stepHash}:ask`]).toBe(`forked-from-${STEP_SESSION_ID}`);
});
test("4.2 second ask on same step reuses the cached fork session", async () => {
const cachedFork = "ses-already-forked-once";
const { casDir, stepHash, mockAgentPath, modeCapturePath, askSessionCapturePath } =
await setupAskFixture({ preCachedForkSessionId: cachedFork });
const result = runUwf(
["step", "ask", stepHash, "-p", "second ask", "--agent", mockAgentPath],
casDir,
);
expect(result.status).toBe(0);
// The mock agent must have been invoked in `ask` mode (no fork performed).
const mode = await readFile(modeCapturePath, "utf8");
expect(mode).toBe("ask");
// The ask invocation should have received the cached fork session id.
const askArg = await readFile(askSessionCapturePath, "utf8");
expect(askArg).toBe(cachedFork);
});
test("4.3 different step hash creates an independent fork", async () => {
// Run a first ask on the base step → caches forkA
const { casDir, stepHash, mockAgentPath } = await setupAskFixture();
const r1 = runUwf(
["step", "ask", stepHash, "-p", "ask on step A", "--agent", mockAgentPath],
casDir,
);
expect(r1.status).toBe(0);
// Build a second StepNode (different hash) with a different sessionId so
// its detail-derived ask session is independent of the first.
const { createUwfStore } = await import("../store.js");
const uwf = await createUwfStore(tmpDir);
const detailSchemaHash = await putSchema(uwf.store, DETAIL_SCHEMA);
const outputSchemaHash = await putSchema(uwf.store, OUTPUT_SCHEMA);
const otherDetailHash = await uwf.store.cas.put(detailSchemaHash, {
sessionId: "ses-original-step-002",
model: "test-model",
duration: 1000,
turnCount: 0,
turns: [],
});
const otherOutputHash = await uwf.store.cas.put(outputSchemaHash, {
$status: "ok",
note: "alt",
});
// Reuse the same start ref the first step points to so the new step is a valid sibling.
const head = uwf.store.cas.get(stepHash);
const startRefFromHead = (head?.payload as { start: CasRef }).start;
const properOtherStep = await uwf.store.cas.put(uwf.schemas.stepNode, {
start: startRefFromHead,
prev: null,
role: "worker",
output: otherOutputHash,
detail: otherDetailHash,
agent: mockAgentPath,
edgePrompt: "Start work",
startedAtMs: 1716600002000,
completedAtMs: 1716600003000,
cwd: tmpDir,
assembledPrompt: null,
usage: null,
});
// sanity check we constructed a separate hash
expect(properOtherStep).not.toBe(stepHash);
const r2 = runUwf(
["step", "ask", properOtherStep, "-p", "ask on step B", "--agent", mockAgentPath],
casDir,
);
expect(r2.status).toBe(0);
const cachePath = join(tmpDir, "cache", "mock-sessions.json");
const raw = await readFile(cachePath, "utf8");
const parsed = JSON.parse(raw) as Record<string, string>;
expect(parsed[`${stepHash}:ask`]).toBeDefined();
expect(parsed[`${properOtherStep}:ask`]).toBeDefined();
expect(parsed[`${stepHash}:ask`]).not.toBe(parsed[`${properOtherStep}:ask`]);
});
});
// ── Group 5: Fallback (agent has no fork support) ─────────────────────────
describe("uwf step ask - fallback path", () => {
test("5.1 fallback agent (no fork support) still answers via stdout", async () => {
// Use a fallback agent that ONLY supports `ask` mode without ever being asked
// to fork. The CLI should detect missing fork support and inject context instead.
const { casDir, stepHash, mockAgentPath } = await setupAskFixture();
// Create a fallback agent script that fails with non-zero exit on "fork" mode.
// Fallback path must NOT call mode=fork; it should call mode=ask directly.
const fallbackPath = join(tmpDir, "fallback-agent.sh");
const promptCapture = join(tmpDir, "fallback-prompt.txt");
const sessionCapture = join(tmpDir, "fallback-session.txt");
const modeCapture = join(tmpDir, "fallback-mode.txt");
await writeFile(
fallbackPath,
`#!/bin/sh
mode=""
prompt=""
session=""
detail=""
while [ $# -gt 0 ]; do
case "$1" in
--mode) mode="$2"; shift 2 ;;
--prompt) prompt="$2"; shift 2 ;;
--session) session="$2"; shift 2 ;;
--detail) detail="$2"; shift 2 ;;
*) shift ;;
esac
done
printf '%s' "$mode" > '${modeCapture}'
printf '%s' "$prompt" > '${promptCapture}'
printf '%s' "$session" > '${sessionCapture}'
case "$mode" in
fork) echo "fork not supported" >&2; exit 99 ;;
ask) printf 'FALLBACK_ANSWER for: %s (detail=%s)\\n' "$prompt" "$detail" ;;
*) echo "unknown" >&2; exit 1 ;;
esac
`,
{ mode: 0o755 },
);
const result = runUwf(
["step", "ask", stepHash, "-p", "explain context", "--agent", fallbackPath, "--no-fork"],
casDir,
);
expect(result.status).toBe(0);
expect(result.stdout).toContain("FALLBACK_ANSWER");
expect(result.stdout).toContain("explain context");
// The fallback agent should be invoked in `ask` mode, with NO session id
// (since no fork happened). The detail ref must be passed for context injection.
const mode = await readFile(modeCapture, "utf8");
expect(mode).toBe("ask");
const session = await readFile(sessionCapture, "utf8");
expect(session).toBe("");
// Make sure mockAgentPath's mock never ran.
void mockAgentPath;
});
test("5.2 fallback ask still does NOT mutate thread state", async () => {
const { casDir, stepHash } = await setupAskFixture();
const fallbackPath = join(tmpDir, "fallback-agent.sh");
await writeFile(
fallbackPath,
`#!/bin/sh
mode=""
prompt=""
while [ $# -gt 0 ]; do
case "$1" in
--mode) mode="$2"; shift 2 ;;
--prompt) prompt="$2"; shift 2 ;;
*) shift ;;
esac
done
case "$mode" in
fork) echo "fork not supported" >&2; exit 99 ;;
ask) printf 'OK %s\\n' "$prompt" ;;
*) exit 1 ;;
esac
`,
{ mode: 0o755 },
);
const { createUwfStore, getThread } = await import("../store.js");
const before = await createUwfStore(tmpDir);
const beforeEntry = getThread(before.varStore, THREAD_ID);
const result = runUwf(
["step", "ask", stepHash, "-p", "any", "--agent", fallbackPath, "--no-fork"],
casDir,
);
expect(result.status).toBe(0);
const after = await createUwfStore(tmpDir);
const afterEntry = getThread(after.varStore, THREAD_ID);
expect(afterEntry?.head).toBe(beforeEntry?.head);
expect(afterEntry?.status).toBe(beforeEntry?.status);
});
});
// ── Group 6: Agent resolution ─────────────────────────────────────────────
describe("uwf step ask - agent resolution", () => {
test("6.1 without --agent flag, agent is resolved from step's agent field", async () => {
// Step's agent field points at mockAgentPath by default.
const { casDir, stepHash, modeCapturePath, promptCapturePath } = await setupAskFixture();
const result = runUwf(["step", "ask", stepHash, "-p", "explain"], casDir);
expect(result.status).toBe(0);
// The mockAgentPath must have been invoked in ask mode with the user prompt.
const mode = await readFile(modeCapturePath, "utf8");
expect(mode).toBe("ask");
const captured = await readFile(promptCapturePath, "utf8");
expect(captured).toBe("explain");
});
test("6.2 --agent override beats step's recorded agent", async () => {
// Record a non-existent agent in step.agent. Provide a working one via --agent.
const { casDir, stepHash, mockAgentPath } = await setupAskFixture({
stepAgentNameOverride: "uwf-does-not-exist",
});
const result = runUwf(
["step", "ask", stepHash, "-p", "explain", "--agent", mockAgentPath],
casDir,
);
expect(result.status).toBe(0);
expect(result.stdout).toContain("MOCK_ANSWER");
});
});
@@ -167,7 +167,7 @@ describe("cmdThreadList status filter", () => {
expect(result[0]?.status).toBe("completed");
});
test("should return all threads when no status filter provided", async () => {
test("should return only active threads when no filter and no --all", async () => {
const uwf = await makeUwfStore(tmpDir);
const workflowHash = await createTestWorkflow(uwf);
@@ -185,8 +185,290 @@ describe("cmdThreadList status filter", () => {
const result = await cmdThreadList(tmpDir, null, null, null, null, null);
// Default behavior (issue #147): only active threads (idle + running)
expect(result).toHaveLength(2);
expect(result.map((r) => r.thread).sort()).toEqual([thread1, thread2].sort());
// Clean up marker
await deleteMarker(tmpDir, thread2);
});
test("should return all threads when --all (showAll=true)", async () => {
const uwf = await makeUwfStore(tmpDir);
const workflowHash = await createTestWorkflow(uwf);
const thread1 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
const thread2 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
const thread3 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
await markThreadRunning(tmpDir, thread2, workflowHash);
const uwfIdx = await createUwfStore(tmpDir);
const index = loadAllThreads(uwfIdx.varStore);
const thread3Head = index[thread3]!.head;
if (thread3Head === undefined) throw new Error("thread3 head not found");
await completeThread(tmpDir, thread3, workflowHash, thread3Head);
const result = await cmdThreadList(tmpDir, null, null, null, null, null, true);
expect(result).toHaveLength(3);
expect(result.map((r) => r.thread).sort()).toEqual([thread1, thread2, thread3].sort());
// Clean up marker
await deleteMarker(tmpDir, thread2);
});
});
// ── default behavior tests (issue #147) ───────────────────────────────────────
describe("cmdThreadList default behavior (issue #147)", () => {
test("default returns only idle + running threads", async () => {
const uwf = await makeUwfStore(tmpDir);
const workflowHash = await createTestWorkflow(uwf);
const threadA = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 4000);
const threadB = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
const threadC = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
const threadD = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
await markThreadRunning(tmpDir, threadB, workflowHash);
const uwfIdx = await createUwfStore(tmpDir);
const index = loadAllThreads(uwfIdx.varStore);
const threadCHead = index[threadC]!.head;
if (threadCHead === undefined) throw new Error("threadC head not found");
await completeThread(tmpDir, threadC, workflowHash, threadCHead);
// Cancel threadD
const threadDHead = index[threadD]!.head;
if (threadDHead === undefined) throw new Error("threadD head not found");
const uwfCancel = await createUwfStore(tmpDir);
completeThreadInStore(uwfCancel.varStore, threadD, "cancelled");
const result = await cmdThreadList(tmpDir, null, null, null, null, null);
expect(result).toHaveLength(2);
expect(result.map((r) => r.thread).sort()).toEqual([threadA, threadB].sort());
await deleteMarker(tmpDir, threadB);
});
test("default excludes completed threads", async () => {
const uwf = await makeUwfStore(tmpDir);
const workflowHash = await createTestWorkflow(uwf);
const idleThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 6000);
const completedThreads: ThreadId[] = [];
for (let i = 0; i < 5; i++) {
const t = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - (5 - i) * 1000);
completedThreads.push(t);
const uwfIdx = await createUwfStore(tmpDir);
const index = loadAllThreads(uwfIdx.varStore);
const head = index[t]!.head;
if (head === undefined) throw new Error("head not found");
await completeThread(tmpDir, t, workflowHash, head);
}
const result = await cmdThreadList(tmpDir, null, null, null, null, null);
expect(result).toHaveLength(1);
expect(result[0]?.thread).toBe(idleThread);
});
test("default excludes cancelled threads", async () => {
const uwf = await makeUwfStore(tmpDir);
const workflowHash = await createTestWorkflow(uwf);
const runningThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 4000);
await markThreadRunning(tmpDir, runningThread, workflowHash);
const cancelled: ThreadId[] = [];
for (let i = 0; i < 3; i++) {
const t = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - (3 - i) * 1000);
cancelled.push(t);
const uwfIdx = await createUwfStore(tmpDir);
completeThreadInStore(uwfIdx.varStore, t, "cancelled");
}
const result = await cmdThreadList(tmpDir, null, null, null, null, null);
expect(result).toHaveLength(1);
expect(result[0]?.thread).toBe(runningThread);
await deleteMarker(tmpDir, runningThread);
});
test("--all (showAll=true) returns every status", async () => {
const uwf = await makeUwfStore(tmpDir);
const workflowHash = await createTestWorkflow(uwf);
const idleThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 4000);
const runningThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
await markThreadRunning(tmpDir, runningThread, workflowHash);
const completedThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
const uwfIdx = await createUwfStore(tmpDir);
const idx = loadAllThreads(uwfIdx.varStore);
const ch = idx[completedThread]!.head;
if (ch === undefined) throw new Error("completedThread head not found");
await completeThread(tmpDir, completedThread, workflowHash, ch);
const cancelledThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
completeThreadInStore(uwfIdx.varStore, cancelledThread, "cancelled");
const result = await cmdThreadList(tmpDir, null, null, null, null, null, true);
expect(result).toHaveLength(4);
expect(result.map((r) => r.thread).sort()).toEqual(
[idleThread, runningThread, completedThread, cancelledThread].sort(),
);
await deleteMarker(tmpDir, runningThread);
});
test("explicit --status overrides default (still returns just the filtered statuses)", async () => {
const uwf = await makeUwfStore(tmpDir);
const workflowHash = await createTestWorkflow(uwf);
const _idleThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
const runningThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
await markThreadRunning(tmpDir, runningThread, workflowHash);
const completedThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
const uwfIdx = await createUwfStore(tmpDir);
const idx = loadAllThreads(uwfIdx.varStore);
const ch = idx[completedThread]!.head;
if (ch === undefined) throw new Error("completedThread head not found");
await completeThread(tmpDir, completedThread, workflowHash, ch);
const result = await cmdThreadList(tmpDir, ["completed"], null, null, null, null);
expect(result).toHaveLength(1);
expect(result[0]?.thread).toBe(completedThread);
expect(result[0]?.status).toBe("completed");
await deleteMarker(tmpDir, runningThread);
});
test("--status active keeps working", async () => {
const uwf = await makeUwfStore(tmpDir);
const workflowHash = await createTestWorkflow(uwf);
const idleThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
const runningThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
await markThreadRunning(tmpDir, runningThread, workflowHash);
const completedThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
const uwfIdx = await createUwfStore(tmpDir);
const idx = loadAllThreads(uwfIdx.varStore);
const ch = idx[completedThread]!.head;
if (ch === undefined) throw new Error("completedThread head not found");
await completeThread(tmpDir, completedThread, workflowHash, ch);
const result = await cmdThreadList(tmpDir, ["idle", "running"], null, null, null, null);
expect(result).toHaveLength(2);
expect(result.map((r) => r.thread).sort()).toEqual([idleThread, runningThread].sort());
await deleteMarker(tmpDir, runningThread);
});
test("--status + --all — explicit status wins", async () => {
const uwf = await makeUwfStore(tmpDir);
const workflowHash = await createTestWorkflow(uwf);
const _idleThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
const runningThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
await markThreadRunning(tmpDir, runningThread, workflowHash);
const completedThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
const uwfIdx = await createUwfStore(tmpDir);
const idx = loadAllThreads(uwfIdx.varStore);
const ch = idx[completedThread]!.head;
if (ch === undefined) throw new Error("completedThread head not found");
await completeThread(tmpDir, completedThread, workflowHash, ch);
const result = await cmdThreadList(tmpDir, ["completed"], null, null, null, null, true);
expect(result).toHaveLength(1);
expect(result[0]?.thread).toBe(completedThread);
await deleteMarker(tmpDir, runningThread);
});
test("default returns empty when no threads", async () => {
await makeUwfStore(tmpDir);
const result = await cmdThreadList(tmpDir, null, null, null, null, null);
expect(result).toHaveLength(0);
});
test("default + time range filter composes correctly", async () => {
const uwf = await makeUwfStore(tmpDir);
const workflowHash = await createTestWorkflow(uwf);
const ts1 = Date.UTC(2026, 4, 20, 0, 0, 0);
const ts2 = Date.UTC(2026, 4, 21, 0, 0, 0);
const ts3 = Date.UTC(2026, 4, 22, 0, 0, 0);
const ts4 = Date.UTC(2026, 4, 23, 0, 0, 0);
const ts5 = Date.UTC(2026, 4, 24, 0, 0, 0);
const _t1 = await createTestThread(uwf, tmpDir, workflowHash, ts1);
const t2 = await createTestThread(uwf, tmpDir, workflowHash, ts2);
const t3 = await createTestThread(uwf, tmpDir, workflowHash, ts3);
const t4 = await createTestThread(uwf, tmpDir, workflowHash, ts4);
const _t5 = await createTestThread(uwf, tmpDir, workflowHash, ts5);
// Mark t3 running
await markThreadRunning(tmpDir, t3, workflowHash);
// Complete t4 (should be excluded by default)
const uwfIdx = await createUwfStore(tmpDir);
const idx = loadAllThreads(uwfIdx.varStore);
const t4head = idx[t4]!.head;
if (t4head === undefined) throw new Error("t4 head not found");
await completeThread(tmpDir, t4, workflowHash, t4head);
// afterMs in middle of range to exclude _t1
const afterMs = Date.UTC(2026, 4, 20, 12, 0, 0);
const result = await cmdThreadList(tmpDir, null, afterMs, null, null, null);
// Expected: t2 (idle), t3 (running), _t5 (idle); excludes t4 (completed) and _t1 (filtered by time)
expect(result).toHaveLength(3);
const ids = result.map((r) => r.thread).sort();
expect(ids).toEqual([t2, t3, _t5].sort());
await deleteMarker(tmpDir, t3);
});
test("default + pagination composes correctly", async () => {
const uwf = await makeUwfStore(tmpDir);
const workflowHash = await createTestWorkflow(uwf);
// Create 10 idle threads + 5 completed threads
const idleThreads: ThreadId[] = [];
for (let i = 0; i < 10; i++) {
idleThreads.push(
await createTestThread(uwf, tmpDir, workflowHash, Date.now() - (15 - i) * 1000),
);
}
for (let i = 0; i < 5; i++) {
const t = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - (5 - i) * 1000);
const uwfIdx = await createUwfStore(tmpDir);
const idx = loadAllThreads(uwfIdx.varStore);
const head = idx[t]!.head;
if (head === undefined) throw new Error("head not found");
await completeThread(tmpDir, t, workflowHash, head);
}
const result = await cmdThreadList(tmpDir, null, null, null, 2, 3);
expect(result).toHaveLength(3);
// All results should be idle (default excludes completed)
for (const r of result) {
expect(r.status).toBe("idle");
}
});
});
@@ -118,8 +118,8 @@ describe("suspended thread display", () => {
[idleThreadId]: idleEntry,
});
// Test thread list
const listResult = await cmdThreadList(tmpDir, null, null, null, null, null);
// Test thread list — pass showAll=true to include suspended threads
const listResult = await cmdThreadList(tmpDir, null, null, null, null, null, true);
// Find the suspended and idle threads in results
const suspendedItem = listResult.find((item) => item.thread === suspendedThreadId);
+32 -2
View File
@@ -12,7 +12,7 @@ import {
cmdPromptWorkflowAuthoring,
} from "./commands/prompt.js";
import { cmdSetup, cmdSetupInteractive, resolvePresetBaseUrl } from "./commands/setup.js";
import { cmdStepFork, cmdStepList, cmdStepRead, cmdStepShow } from "./commands/step.js";
import { cmdStepAsk, cmdStepFork, cmdStepList, cmdStepRead, cmdStepShow } from "./commands/step.js";
import {
cmdThreadCancel,
cmdThreadExec,
@@ -233,11 +233,12 @@ function parsePaginationOptions(
thread
.command("list")
.description("List threads")
.description("List threads (defaults to active: idle + running)")
.option(
"--status <status>",
"Filter by status: idle, running, completed, cancelled, active (idle+running), or comma-separated values",
)
.option("--all", "Show all threads regardless of status (overrides default active-only filter)")
.option("--after <date>", "Filter threads created after this date (ISO or relative like '7d')")
.option("--before <date>", "Filter threads created before this date (ISO or relative like '7d')")
.option("--skip <n>", "Skip first n threads")
@@ -245,6 +246,7 @@ thread
.action(
(opts: {
status: string | undefined;
all: boolean | undefined;
after: string | undefined;
before: string | undefined;
skip: string | undefined;
@@ -256,6 +258,7 @@ thread
const nowMs = Date.now();
const { afterMs, beforeMs } = parseTimeFilters(opts.after, opts.before, nowMs);
const { skip, take } = parsePaginationOptions(opts.skip, opts.take);
const showAll = opts.all === true;
const result = await cmdThreadList(
storageRoot,
@@ -264,6 +267,7 @@ thread
beforeMs,
skip,
take,
showAll,
);
writeOutput(result);
});
@@ -390,6 +394,32 @@ step
});
});
step
.command("ask")
.description(
"Ask a follow-up question to a historical step's agent (read-only; no thread mutation)",
)
.argument("<step-hash>", "CAS hash of the StepNode to query")
.requiredOption("-p, --prompt <text>", "Question to ask the step's agent")
.option("--agent <cmd>", "Override agent command (defaults to the step's recorded agent)")
.option(
"--no-fork",
"Skip session-fork; spawn the agent in a fresh ask session and inject the step's detail ref for context",
)
.action(
(stepHash: string, opts: { prompt: string; agent: string | undefined; fork: boolean }) => {
const storageRoot = resolveStorageRoot();
runAction(async () => {
const stdout = await cmdStepAsk(storageRoot, stepHash as CasRef, {
prompt: opts.prompt,
agentOverride: opts.agent ?? null,
fork: opts.fork,
});
process.stdout.write(stdout.endsWith("\n") ? stdout : `${stdout}\n`);
});
},
);
step
.command("read")
.description("Read a step's turns as human-readable markdown")
+221 -1
View File
@@ -1,5 +1,8 @@
import { execFileSync } from "node:child_process";
import type { CasStore } from "@ocas/core";
import type {
AgentAlias,
AgentConfig,
CasRef,
StartEntry,
StepEntry,
@@ -7,9 +10,12 @@ import type {
ThreadForkOutput,
ThreadId,
ThreadStepsOutput,
WorkflowConfig,
WorkflowPayload,
} from "@united-workforce/protocol";
import { generateUlid } from "@united-workforce/util";
import { createUwfStore, setThread } from "../store.js";
import { getAskSessionId, loadWorkflowConfig, setAskSessionId } from "@united-workforce/util-agent";
import { createUwfStore, setThread, type UwfStore } from "../store.js";
import {
collectOrderedSteps,
expandDeep,
@@ -341,3 +347,217 @@ export async function cmdStepRead(
return formatStepMarkdown(stepHash, payload.role, payload.agent, turnData, selectedTurns);
}
// ── step ask ────────────────────────────────────────────────────────────────
function parseAgentOverride(override: string): AgentConfig {
const parts = override
.trim()
.split(/\s+/)
.filter((p) => p.length > 0);
const command = parts[0];
if (command === undefined) {
fail("agent override must not be empty");
}
return { command, args: parts.slice(1) };
}
function resolveAskAgentConfig(
config: WorkflowConfig,
workflow: WorkflowPayload | null,
role: string,
agentOverride: string | null,
recordedAgent: string,
): AgentConfig {
if (agentOverride !== null) {
const fromAlias = config.agents[agentOverride as AgentAlias];
if (fromAlias !== undefined) {
return fromAlias;
}
return parseAgentOverride(agentOverride);
}
// Try to resolve via the recorded agent name as a config alias.
const fromRecorded = config.agents[recordedAgent as AgentAlias];
if (fromRecorded !== undefined) {
return fromRecorded;
}
// Fall back to default agent for the workflow / role.
if (workflow !== null && config.agentOverrides !== null) {
const roleOverrides = config.agentOverrides[workflow.name];
if (roleOverrides !== undefined && roleOverrides[role] !== undefined) {
const alias = roleOverrides[role];
const agentConfig = config.agents[alias];
if (agentConfig !== undefined) {
return agentConfig;
}
}
}
// Treat the recorded value as a raw command path.
return parseAgentOverride(recordedAgent);
}
/**
* Derive the agent name used for cache file partitioning from an executable
* path or alias. Examples:
* uwf-hermes → hermes
* uwf-claude-code → claude-code
* /tmp/mock-agent.sh → mock
* /usr/bin/agent → agent
*/
function deriveAgentName(commandPath: string): string {
const basename = commandPath.split(/[/\\]/).pop() ?? commandPath;
// Strip a trailing extension (.sh, .js, .mjs, .cjs)
const noExt = basename.replace(/\.(sh|js|mjs|cjs|ts)$/i, "");
// Strip the `uwf-` prefix introduced by agentLabel().
const noPrefix = noExt.startsWith("uwf-") ? noExt.slice(4) : noExt;
// Strip the trailing `-agent` suffix used by tests / generic agent shells.
const noSuffix = noPrefix.endsWith("-agent") ? noPrefix.slice(0, -"-agent".length) : noPrefix;
return noSuffix === "" ? noExt : noSuffix;
}
function loadDetailNode(
store: CasStore,
detailRef: CasRef,
): { sessionId: string | null; payload: Record<string, unknown> } {
const detailNode = store.get(detailRef);
if (detailNode === null) {
fail(`detail node not found: ${detailRef}`);
}
const payload = detailNode.payload as Record<string, unknown>;
const sessionId = typeof payload.sessionId === "string" ? payload.sessionId : null;
return { sessionId, payload };
}
function spawnAskAgent(agent: AgentConfig, argv: string[], cwd: string): { stdout: string } {
try {
const stdout = execFileSync(agent.command, [...agent.args, ...argv], {
encoding: "utf8",
stdio: ["ignore", "pipe", "pipe"],
maxBuffer: 50 * 1024 * 1024,
cwd,
});
return { stdout };
} catch (e) {
const err = e as NodeJS.ErrnoException & { stderr: Buffer | string | null };
if (err.code === "ENOENT") {
fail(
`"${agent.command}" not found in PATH. Install it or check your PATH config. Run: which ${agent.command}`,
);
}
const stderr =
err.stderr == null
? ""
: typeof err.stderr === "string"
? err.stderr
: err.stderr.toString("utf8");
const detail = stderr.trim() !== "" ? `: ${stderr.trim()}` : "";
fail(`agent command failed (${agent.command})${detail}`);
}
}
function resolveAskWorkflow(uwf: UwfStore, payload: StepNodePayload): WorkflowPayload | null {
const startNode = uwf.store.cas.get(payload.start);
if (startNode === null) {
return null;
}
const start = startNode.payload as { workflow: CasRef };
const workflowNode = uwf.store.cas.get(start.workflow);
if (workflowNode === null) {
return null;
}
return workflowNode.payload as WorkflowPayload;
}
async function performFork(
agent: AgentConfig,
agentName: string,
stepHash: CasRef,
sourceSessionId: string,
storageRoot: string,
cwd: string,
): Promise<string> {
const cached = await getAskSessionId(agentName, stepHash, storageRoot);
if (cached !== null) {
return cached;
}
const { stdout } = spawnAskAgent(agent, ["--mode", "fork", "--session", sourceSessionId], cwd);
const newSessionId = stdout.trim().split("\n").pop()?.trim() ?? "";
if (newSessionId === "") {
fail(`agent fork did not return a session id (${agent.command})`);
}
await setAskSessionId(agentName, stepHash, newSessionId, storageRoot);
return newSessionId;
}
export type CmdStepAskOptions = {
prompt: string;
agentOverride: string | null;
/** When false, skip session forking and pass detail ref for context injection. */
fork: boolean;
};
/**
* Ask a follow-up question to a historical step's agent (read-only).
*
* Does NOT write a new StepNode and does NOT mutate thread state. The agent's
* raw stdout is returned so the CLI entry point can stream it directly.
*/
export async function cmdStepAsk(
storageRoot: string,
stepHash: CasRef,
options: CmdStepAskOptions,
): Promise<string> {
const uwf = await createUwfStore(storageRoot);
const node = uwf.store.cas.get(stepHash);
if (node === null) {
fail(`CAS node not found: ${stepHash}`);
}
if (node.type !== uwf.schemas.stepNode) {
fail(`node ${stepHash} is not a StepNode`);
}
const payload = node.payload as StepNodePayload;
if (payload.detail === null) {
fail(`step ${stepHash} has no detail; cannot ask`);
}
const detailRef = payload.detail;
const { sessionId: sourceSessionId } = loadDetailNode(uwf.store.cas, detailRef);
const workflow = resolveAskWorkflow(uwf, payload);
const config = await loadWorkflowConfig(storageRoot);
const agent = resolveAskAgentConfig(
config,
workflow,
payload.role,
options.agentOverride,
payload.agent,
);
const agentName = deriveAgentName(agent.command);
const cwd = payload.cwd !== "" ? payload.cwd : process.cwd();
// Fork path: fork (or reuse cached fork) → ask with that session.
if (options.fork && sourceSessionId !== null) {
const askSessionId = await performFork(
agent,
agentName,
stepHash,
sourceSessionId,
storageRoot,
cwd,
);
const argv = ["--mode", "ask", "--session", askSessionId, "--prompt", options.prompt];
argv.push("--detail", detailRef);
const { stdout } = spawnAskAgent(agent, argv, cwd);
return stdout;
}
// Fallback path: ask without forking; inject detail ref for context.
const argv = ["--mode", "ask", "--prompt", options.prompt];
argv.push("--detail", detailRef);
const { stdout } = spawnAskAgent(agent, argv, cwd);
return stdout;
}
+12 -5
View File
@@ -650,18 +650,25 @@ export async function cmdThreadList(
beforeMs: number | null,
skip: number | null,
take: number | null,
showAll: boolean = false,
): Promise<ThreadListItemWithStatus[]> {
const uwf = await createUwfStore(storageRoot);
const index = loadActiveThreads(uwf.varStore);
// Resolve the effective filter:
// - explicit --status wins (showAll has no effect)
// - otherwise: --all → no filter; default → ["idle", "running"]
const effectiveFilter: ThreadStatus[] | null =
statusFilter !== null ? statusFilter : showAll ? null : ["idle", "running"];
// Collect active threads
let items = await collectActiveThreads(storageRoot, uwf, index);
// Collect completed threads (if relevant for status filter)
const includeCompleted =
statusFilter === null ||
statusFilter.includes("completed") ||
statusFilter.includes("cancelled");
effectiveFilter === null ||
effectiveFilter.includes("completed") ||
effectiveFilter.includes("cancelled");
if (includeCompleted) {
const activeIds = new Set(items.map((i) => i.thread));
const completedItems = collectCompletedThreads(uwf, activeIds);
@@ -669,8 +676,8 @@ export async function cmdThreadList(
}
// Apply status filter
if (statusFilter !== null) {
items = items.filter((item) => statusFilter.includes(item.status));
if (effectiveFilter !== null) {
items = items.filter((item) => effectiveFilter.includes(item.status));
}
// Apply time range filters
@@ -0,0 +1,60 @@
import { readFile } from "node:fs/promises";
import { join } from "node:path";
import { describe, expect, test } from "vitest";
/**
* Source-level verification that each adapter's `createAgent({...})` call
* includes the new `fork: null` and `cleanup: null` fields.
*
* Adapters are CLI binaries that spawn external processes — runtime testing
* requires real LLM environments — so we use static source inspection here.
* Type-level correctness is enforced separately by `tsc --build`.
*/
const REPO_ROOT = join(__dirname, "..", "..", "..");
const ADAPTERS: Array<{ name: string; path: string }> = [
{ name: "agent-mock", path: "packages/agent-mock/src/mock-agent.ts" },
{ name: "agent-builtin", path: "packages/agent-builtin/src/agent.ts" },
{ name: "agent-hermes", path: "packages/agent-hermes/src/hermes.ts" },
{ name: "agent-claude-code", path: "packages/agent-claude-code/src/claude-code.ts" },
];
/** Find the matching `}` for the `{` at `openIdx` in `source`. */
function findMatchingBrace(source: string, openIdx: number): number {
let depth = 0;
for (let i = openIdx; i < source.length; i++) {
const ch = source[i];
if (ch === "{") {
depth++;
} else if (ch === "}") {
depth--;
if (depth === 0) {
return i;
}
}
}
return -1;
}
/** Extract the `createAgent({...})` block from adapter source. */
function extractCreateAgentBlock(source: string): string {
const startIdx = source.indexOf("createAgent({");
expect(startIdx).toBeGreaterThanOrEqual(0);
const openIdx = source.indexOf("{", startIdx);
const endIdx = findMatchingBrace(source, openIdx);
expect(endIdx).toBeGreaterThan(openIdx);
return source.slice(openIdx, endIdx + 1);
}
describe("adapter createAgent calls include fork: null and cleanup: null", () => {
for (const adapter of ADAPTERS) {
test(`${adapter.name} createAgent call includes fork: null and cleanup: null`, async () => {
const source = await readFile(join(REPO_ROOT, adapter.path), "utf8");
expect(source).toMatch(/createAgent\s*\(\s*\{/);
const block = extractCreateAgentBlock(source);
expect(block).toMatch(/fork:\s*null/);
expect(block).toMatch(/cleanup:\s*null/);
});
}
});
@@ -0,0 +1,78 @@
import type { Store } from "@ocas/core";
import { describe, expect, test } from "vitest";
import type {
AgentCleanupFn,
AgentContext,
AgentContinueFn,
AgentForkFn,
AgentOptions,
AgentRunFn,
} from "../src/types.js";
const makeRun: AgentRunFn = async (_ctx: AgentContext) => ({
output: "",
detailHash: "",
sessionId: "",
assembledPrompt: "",
usage: null,
});
const makeContinue: AgentContinueFn = async (_sessionId, _message, _store) => ({
output: "",
detailHash: "",
sessionId: "",
assembledPrompt: "",
usage: null,
});
describe("AgentOptions fork/cleanup", () => {
test("AgentOptions accepts fork and cleanup as null", () => {
const opts: AgentOptions = {
name: "test",
run: makeRun,
continue: makeContinue,
fork: null,
cleanup: null,
};
expect(opts.name).toBe("test");
expect(opts.run).toBe(makeRun);
expect(opts.continue).toBe(makeContinue);
expect(opts.fork).toBeNull();
expect(opts.cleanup).toBeNull();
});
test("AgentOptions accepts real fork and cleanup functions", () => {
const fork: AgentForkFn = async (sessionId, _store) => `${sessionId}-forked`;
const cleanup: AgentCleanupFn = async () => {
/* no-op */
};
const opts: AgentOptions = {
name: "test",
run: makeRun,
continue: makeContinue,
fork,
cleanup,
};
expect(typeof opts.fork).toBe("function");
expect(typeof opts.cleanup).toBe("function");
});
test("AgentForkFn signature accepts (sessionId: string, store: Store) and returns Promise<string>", async () => {
const fork: AgentForkFn = async (sessionId, _store) => `${sessionId}-child`;
// Cast a placeholder Store — only the signature shape matters for this test.
const fakeStore = {} as Store;
const result = await fork("session-abc", fakeStore);
expect(result).toBe("session-abc-child");
});
test("AgentCleanupFn signature accepts no args and returns Promise<void>", async () => {
let called = false;
const cleanup: AgentCleanupFn = async () => {
called = true;
};
const result = await cleanup();
expect(result).toBeUndefined();
expect(called).toBe(true);
});
});
@@ -0,0 +1,131 @@
import { mkdir, readFile, rm, writeFile } from "node:fs/promises";
import { dirname, join } from "node:path";
import type { ThreadId } from "@united-workforce/protocol";
import { afterEach, beforeEach, describe, expect, test } from "vitest";
import {
getAskSessionId,
getCachedSessionId,
getCachePath,
setAskSessionId,
setCachedSessionId,
} from "../src/session-cache.js";
import { getDefaultStorageRoot } from "../src/storage.js";
describe("session-cache ask sessions", () => {
let testStorageRoot: string;
beforeEach(async () => {
testStorageRoot = join(
getDefaultStorageRoot(),
"test-cache",
`ask-${Date.now()}-${Math.random()}`,
);
await mkdir(testStorageRoot, { recursive: true });
});
afterEach(async () => {
await rm(testStorageRoot, { recursive: true, force: true });
});
const stepHash = "ABCDEFG1234567";
test("getAskSessionId returns null when no ask session cached", async () => {
const session = await getAskSessionId("claude-code", stepHash, testStorageRoot);
expect(session).toBeNull();
});
test("setAskSessionId + getAskSessionId round-trip", async () => {
await setAskSessionId("claude-code", stepHash, "ask-session-123", testStorageRoot);
const session = await getAskSessionId("claude-code", stepHash, testStorageRoot);
expect(session).toBe("ask-session-123");
});
test("ask cache keys use stepHash:ask format", async () => {
await setAskSessionId("claude-code", stepHash, "ask-session-456", testStorageRoot);
const cachePath = getCachePath("claude-code", testStorageRoot);
const content = JSON.parse(await readFile(cachePath, "utf8")) as Record<string, string>;
expect(content).toHaveProperty(`${stepHash}:ask`, "ask-session-456");
});
test("exec cache and ask cache coexist in same file", async () => {
const threadId = "01234567890123456789012345" as ThreadId;
const role = "developer";
await setCachedSessionId("claude-code", threadId, role, "exec-session", testStorageRoot);
await setAskSessionId("claude-code", stepHash, "ask-session", testStorageRoot);
const cachePath = getCachePath("claude-code", testStorageRoot);
const content = JSON.parse(await readFile(cachePath, "utf8")) as Record<string, string>;
expect(content).toHaveProperty(`${threadId}:${role}`, "exec-session");
expect(content).toHaveProperty(`${stepHash}:ask`, "ask-session");
expect(await getCachedSessionId("claude-code", threadId, role, testStorageRoot)).toBe(
"exec-session",
);
expect(await getAskSessionId("claude-code", stepHash, testStorageRoot)).toBe("ask-session");
});
test("updating ask session does not affect exec session", async () => {
const threadId = "01234567890123456789012345" as ThreadId;
const role = "developer";
await setCachedSessionId("claude-code", threadId, role, "exec-original", testStorageRoot);
await setAskSessionId("claude-code", stepHash, "ask-original", testStorageRoot);
await setAskSessionId("claude-code", stepHash, "ask-updated", testStorageRoot);
expect(await getCachedSessionId("claude-code", threadId, role, testStorageRoot)).toBe(
"exec-original",
);
expect(await getAskSessionId("claude-code", stepHash, testStorageRoot)).toBe("ask-updated");
});
test("updating exec session does not affect ask session", async () => {
const threadId = "01234567890123456789012345" as ThreadId;
const role = "developer";
await setAskSessionId("claude-code", stepHash, "ask-original", testStorageRoot);
await setCachedSessionId("claude-code", threadId, role, "exec-original", testStorageRoot);
await setCachedSessionId("claude-code", threadId, role, "exec-updated", testStorageRoot);
expect(await getAskSessionId("claude-code", stepHash, testStorageRoot)).toBe("ask-original");
expect(await getCachedSessionId("claude-code", threadId, role, testStorageRoot)).toBe(
"exec-updated",
);
});
test("different stepHashes have independent ask sessions", async () => {
const stepHashA = "AAAAAAA1234567";
const stepHashB = "BBBBBBB1234567";
await setAskSessionId("claude-code", stepHashA, "session-A", testStorageRoot);
await setAskSessionId("claude-code", stepHashB, "session-B", testStorageRoot);
expect(await getAskSessionId("claude-code", stepHashA, testStorageRoot)).toBe("session-A");
expect(await getAskSessionId("claude-code", stepHashB, testStorageRoot)).toBe("session-B");
});
test("ask session for one agent does not leak to another", async () => {
await setAskSessionId("claude-code", stepHash, "cc-ask-session", testStorageRoot);
const ccSession = await getAskSessionId("claude-code", stepHash, testStorageRoot);
const hermesSession = await getAskSessionId("hermes", stepHash, testStorageRoot);
expect(ccSession).toBe("cc-ask-session");
expect(hermesSession).toBeNull();
});
test("empty string ask session treated as missing", async () => {
const cachePath = getCachePath("claude-code", testStorageRoot);
await mkdir(dirname(cachePath), { recursive: true });
await writeFile(cachePath, JSON.stringify({ [`${stepHash}:ask`]: "" }), "utf8");
const session = await getAskSessionId("claude-code", stepHash, testStorageRoot);
expect(session).toBeNull();
});
});
+9 -1
View File
@@ -14,12 +14,20 @@ export type { FrontmatterFastPathResult } from "./frontmatter.js";
export { tryFrontmatterFastPath } from "./frontmatter.js";
export { buildFrontmatterRetryPrompt } from "./frontmatter-retry-prompt.js";
export { createAgent, parseArgv } from "./run.js";
export { getCachedSessionId, getCachePath, setCachedSessionId } from "./session-cache.js";
export {
getAskSessionId,
getCachedSessionId,
getCachePath,
setAskSessionId,
setCachedSessionId,
} from "./session-cache.js";
export { getConfigPath, getEnvPath, loadWorkflowConfig, resolveStorageRoot } from "./storage.js";
export type {
AdapterOutput,
AgentCleanupFn,
AgentContext,
AgentContinueFn,
AgentForkFn,
AgentOptions,
AgentRunFn,
AgentRunResult,
+34
View File
@@ -14,6 +14,10 @@ function cacheKey(threadId: ThreadId, role: string): string {
return `${threadId}:${role}`;
}
function askCacheKey(stepHash: string): string {
return `${stepHash}:ask`;
}
function isRecord(value: unknown): value is Record<string, unknown> {
return typeof value === "object" && value !== null && !Array.isArray(value);
}
@@ -86,3 +90,33 @@ export async function setCachedSessionId(
cache[cacheKey(threadId, role)] = sessionId;
await writeCache(agentName, storageRoot, cache);
}
/**
* Read the cached ask-session ID for a stepHash.
*
* Ask sessions are forked side conversations spawned by `step ask` from a
* specific completed step. They share the per-agent cache file with exec
* sessions but use the `<stepHash>:ask` key shape so the two namespaces
* never collide.
*/
export async function getAskSessionId(
agentName: string,
stepHash: string,
storageRoot: string,
): Promise<string | null> {
const cache = await readCache(agentName, storageRoot);
const sessionId = cache[askCacheKey(stepHash)];
return sessionId ?? null;
}
/** Write the ask-session ID for a stepHash into the cache. */
export async function setAskSessionId(
agentName: string,
stepHash: string,
sessionId: string,
storageRoot: string,
): Promise<void> {
const cache = await readCache(agentName, storageRoot);
cache[askCacheKey(stepHash)] = sessionId;
await writeCache(agentName, storageRoot, cache);
}
+25
View File
@@ -50,6 +50,21 @@ export type AgentContinueFn = (
export type AgentRunFn = (ctx: AgentContext) => Promise<AgentRunResult>;
/**
* Fork an existing agent session, returning a new session ID that branches
* from the source session's state. Used by `step ask` (Phase 2a infrastructure)
* to spawn a side conversation from a completed step's session without
* polluting the original session's history.
*/
export type AgentForkFn = (sessionId: string, store: AgentContext["store"]) => Promise<string>;
/**
* Clean up adapter-level resources (e.g. close ACP client, kill subprocesses).
* Invoked by the agent CLI factory after the run completes regardless of
* success or failure so adapters can release I/O handles deterministically.
*/
export type AgentCleanupFn = () => Promise<void>;
export type AdapterOutput = {
stepHash: string;
detailHash: string;
@@ -65,4 +80,14 @@ export type AgentOptions = {
name: string;
run: AgentRunFn;
continue: AgentContinueFn;
/**
* Optional session-fork hook. null means the adapter does not yet support
* `step ask` (Phase 2a placeholder wired up in Phase 2b).
*/
fork: AgentForkFn | null;
/**
* Optional cleanup hook invoked after the agent CLI completes. null means
* the adapter has no resources to release.
*/
cleanup: AgentCleanupFn | null;
};
+3 -2
View File
@@ -29,8 +29,9 @@ uwf thread exec <thread-id> # execute one moderator→agen
[-c, --count <number>] # run multiple steps (default: 1)
[--background] # run in background
uwf thread show <thread-id> # show thread head pointer
uwf thread list # list threads
[--status <status>] # filter: idle, running, or completed
uwf thread list # list active threads (idle + running)
[--all] # include completed/cancelled/suspended
[--status <status>] # filter: idle, running, suspended, completed, cancelled, active
uwf thread read <thread-id> # render thread context as markdown
[--quota <chars>] # max output characters (default 32000)
[--before <step-hash>] # load steps before this hash (exclusive)
+8 -2
View File
@@ -67,8 +67,9 @@ uwf thread exec <thread-id> # execute one step
[-c, --count <n>] # run n steps
[--background] # run in background
uwf thread show <thread-id> # show head pointer
uwf thread list # list all threads
[--status <filter>] # idle, running, completed, cancelled, active (comma-separated)
uwf thread list # list active threads (idle + running)
[--all] # include completed/cancelled/suspended
[--status <filter>] # idle, running, suspended, completed, cancelled, active (comma-separated)
[--after <thread-id>] # pagination: after this thread
[--before <thread-id>] # pagination: before this thread
[--skip <n>] # skip first n results
@@ -94,10 +95,15 @@ start → exec (repeat) → thread reaches $END → auto-completed
uwf step list <thread-id> # list all steps
uwf step show <step-hash> # show step details
uwf step fork <step-hash> # fork thread from a step (branch)
uwf step ask <step-hash> -p <prompt> [--agent <cmd>] [--no-fork]
# ask a follow-up question to the step's agent
# (read-only; no new step, no thread mutation)
\`\`\`
Forking creates a new thread that shares history up to the fork point useful for retrying from a known-good state.
\`step ask\` re-opens the agent session that produced \`<step-hash>\` and returns its answer on stdout. Subsequent asks reuse the same forked session via the per-agent ask-cache; \`--no-fork\` runs the agent fresh with the step's detail ref injected for context.
## CAS Commands
Use the \`ocas\` CLI for direct CAS operations (\`~/.ocas/\` store, shared with \`uwf\`):