Compare commits
16 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| c128fad38e | |||
| 60fdb0a7ff | |||
| ae757e4d44 | |||
| e1c7e3d267 | |||
| 8b01ade66a | |||
| 10113f6ec6 | |||
| 04e2b5b8a7 | |||
| f697aec3e7 | |||
| b5e094ab4d | |||
| e9e896146e | |||
| d666516ce6 | |||
| afc0287094 | |||
| 22bffc5fcd | |||
| 4c5cc27d52 | |||
| 031ecc6f7e | |||
| e4c46c8150 |
@@ -0,0 +1,19 @@
|
||||
---
|
||||
title: "Agency over Content, Not Process"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, decision]
|
||||
category: "architecture"
|
||||
links:
|
||||
- skill-vs-workflow-different-layers
|
||||
- deterministic-engine-uncertain-agent
|
||||
- feedback-loops-convergent-and-divergent
|
||||
- cognitive-process-orchestration
|
||||
- uwf-vs-dynamic-workflow
|
||||
---
|
||||
|
||||
uwf 与"agent 自治"方案的核心区别:**agent 对内容有自主权,但对流程没有**。
|
||||
|
||||
流程是声明式的、引擎执行的、agent 无法绕过的。agent 不能决定跳过 review,就像程序员不能绕过 CI。自由度被有意限制在"内容"维度,"过程"维度是刚性的。这跟人类组织的逻辑一致——你可以自由发挥怎么写代码,但必须走 PR review。
|
||||
|
||||
参见 [[uwf-vs-dynamic-workflow]] 了解与 Claude Code dynamic workflow 的具体对比。
|
||||
@@ -0,0 +1,19 @@
|
||||
---
|
||||
title: "Agent as Graduate — The Onboarding Metaphor"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [concept, analogy]
|
||||
category: "product"
|
||||
links:
|
||||
- vendor-vs-fte-who-defines-capability
|
||||
- three-learning-carriers
|
||||
- fte-maturity-threshold
|
||||
---
|
||||
|
||||
FTE 型 agent 最贴切的类比:**应届毕业生**。
|
||||
|
||||
出厂时有通用能力(底座模型 = 学历),但不懂你的业务、不知道你的偏好、没有你的流程经验。用户的角色是"带教老师"——通过日常协作,逐步把 agent 带成自己的得力助手。
|
||||
|
||||
这个类比揭示了当前 FTE 产品的核心瓶颈:**带教门槛太高**。现在只有技术背景深厚的用户才能"带"——能写 skill、能调 workflow、能 debug agent 行为。行业专家(不懂代码的人)被挡在门外。
|
||||
|
||||
真正成熟的 FTE 型产品 = 降低带教门槛,让非技术用户也能教会 agent 自己的业务。
|
||||
@@ -0,0 +1,21 @@
|
||||
---
|
||||
title: "Attention Isolation Breaks Cognitive Inertia"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, pattern]
|
||||
category: "architecture"
|
||||
links:
|
||||
- session-isolation-as-cognitive-reset
|
||||
- skill-vs-workflow-different-layers
|
||||
- role-is-not-agent
|
||||
---
|
||||
|
||||
"知识都在一个 session 内不是更好吗?"——这个直觉混淆了**信息量**和**认知模式**。
|
||||
|
||||
Session 隔离去掉的不是信息,而是**不该影响当前判断的信息**。reviewer 通过 CAS 链拿到 developer 的全部产出物(代码、变更说明),它缺的是 developer 的内心独白——为什么选方案 A、哪里犹豫过、哪里偷了懒。
|
||||
|
||||
这恰恰是关键。知道"为什么"的 reviewer 会顺着作者的逻辑走;不知道"为什么"的 reviewer 只能看产出物本身是否站得住——就像真实用户或未来维护者的视角。与学术双盲评审同理:去掉不该影响判断的信息,让注意力聚焦在工作本身。
|
||||
|
||||
每个认知任务需要的信息集合不同。developer 需要 issue 上下文、代码库知识、技术约束;reviewer 需要 diff、规范、测试结果。混在一起不是多了信息,是多了噪声。
|
||||
|
||||
**关注点的隔离是打破惯性和线性思维的关键。** 一个 session 做所有事,不是"知识都在",是关注点混在一起,确认偏误无法靠 prompt 消除,只能靠结构隔离。
|
||||
@@ -0,0 +1,18 @@
|
||||
---
|
||||
title: "Cognitive Process Orchestration"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, decision]
|
||||
category: "architecture"
|
||||
links:
|
||||
- feedback-loops-convergent-and-divergent
|
||||
- session-isolation-as-cognitive-reset
|
||||
- role-is-not-agent
|
||||
- process-discipline-from-software-engineering
|
||||
---
|
||||
|
||||
uwf 的抽象层次高于"质量保障工具"或"任务编排引擎"——它是一个**认知过程的编排引擎**。
|
||||
|
||||
收敛和发散都是认知过程。负反馈环(code review 循环)和正反馈环(苏格拉底式追问、头脑风暴)是同一套机制的不同配置。workflow author 通过设计 role 的 goal 和 graph 的环路结构,编排的是**思维方式**,不仅仅是任务步骤。
|
||||
|
||||
这意味着 uwf 的应用范围不限于软件开发流程,而是任何需要多视角、多轮次认知协作的场景。
|
||||
@@ -0,0 +1,20 @@
|
||||
---
|
||||
title: "Cold Start — Same Entry Point, Different Exit"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, pattern]
|
||||
category: "architecture"
|
||||
links:
|
||||
- uwf-vs-dynamic-workflow
|
||||
- process-authorship-human-ai-vs-delegation
|
||||
- workflow-as-improvable-system
|
||||
- agent-as-graduate
|
||||
---
|
||||
|
||||
uwf 的冷启动不比 dw 更复杂——起点完全一样:用户描述任务,agent 执行。
|
||||
|
||||
区别在出口:dw 跑完即丢,uwf 跑完后沉淀成 workflow YAML,用户可以审查、调优、复用。workflow 不一定要用户写,往往也是 agent 写的——跟 dw 一样的模式。uwf 和 dw 的差异不在"谁写流程",而在"流程跑完后去哪"。
|
||||
|
||||
冷启动路径:agent 先跑一次临时流程 → 用户觉得好就固化成 workflow → 下次同类任务直接复用 → 用过几次后根据经验调优。从零门槛的即兴执行,渐进演化为成熟的可复用流程。
|
||||
|
||||
入口像 dw 一样低,出口比 dw 多了一个沉淀层。
|
||||
@@ -0,0 +1,16 @@
|
||||
---
|
||||
title: "Deterministic Engine, Uncertain Agent"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, decision]
|
||||
category: "architecture"
|
||||
links:
|
||||
- process-discipline-from-software-engineering
|
||||
- session-isolation-as-cognitive-reset
|
||||
---
|
||||
|
||||
uwf 的架构将确定性和不确定性严格分层。
|
||||
|
||||
Engine 层(moderator 纯查表、CAS 不可变、每步原子化)是刚性的——流程骨架本身不能成为另一个不可靠的环节。LLM 的不确定性被严格约束在 agent session 内部。
|
||||
|
||||
这个选择意味着:调度逻辑完全可预测、可调试、可审计。出问题时你知道问题一定在某个 session 的产出里,不在流程逻辑里。
|
||||
@@ -0,0 +1,16 @@
|
||||
---
|
||||
title: "Dissipative Structure — Token for Entropy Reduction"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, pattern]
|
||||
category: "architecture"
|
||||
links:
|
||||
- process-discipline-from-software-engineering
|
||||
- session-isolation-as-cognitive-reset
|
||||
---
|
||||
|
||||
uwf 本质上是一种耗散结构:通过消耗能量(token)实现熵减。
|
||||
|
||||
一个 AI session 做长了会漂移、会累积错误、会失去焦点。把一件事拆成多个有明确边界的 session,让它们从不同角度相互校验,比一个 session 从头做到尾更可靠。多花的 token 就是耗散的能量,换来的是更低的交付熵——更可预测、更高质量的产出。
|
||||
|
||||
这与人类工程实践中引入 review、测试、灰度等流程的逻辑一致:都是在用额外成本换系统可靠性。
|
||||
@@ -0,0 +1,20 @@
|
||||
---
|
||||
title: "Domain Experts Own the Process"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, decision, pattern]
|
||||
category: "architecture"
|
||||
links:
|
||||
- trust-chain-audit-evaluate-reuse
|
||||
- uwf-vs-dynamic-workflow
|
||||
- cognitive-process-orchestration
|
||||
- process-discipline-from-software-engineering
|
||||
---
|
||||
|
||||
现实中各行各业有大量由反馈回路构成的流程正在实际运行,掌握和优化这些流程的是行业专家,不是 AI 工程师。
|
||||
|
||||
一个资深 QA 负责人知道测试应该怎么分层、失败后应该回到哪一步。一个风控经理知道审批要经过几道关、驳回后应该回到哪个环节补材料。这些人掌握流程的核心知识,但你让他们写 JS 编排脚本,他们做不到也不应该做。
|
||||
|
||||
YAML 声明式 workflow 让行业专家能直接参与——看得懂 roles 和 graph,能判断"这个环节是不是多余的"、"这两个角色之间应该加一个校验步骤"。审查门槛低不是为了技术简洁,是为了**让对的人参与对的决策**。
|
||||
|
||||
这是可审查 → 可评估 → 可复用信任链能真正转动的前提——转动它的人是行业专家,不是 AI 工程师。也是 uwf 选择声明式 YAML 而非 JS 的根本原因:**流程的设计权应该属于懂流程的人**。
|
||||
@@ -0,0 +1,21 @@
|
||||
---
|
||||
title: "Eval Closes the Trust Chain"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, pattern]
|
||||
category: "architecture"
|
||||
links:
|
||||
- trust-chain-audit-evaluate-reuse
|
||||
- workflow-as-improvable-system
|
||||
- feedback-loops-convergent-and-divergent
|
||||
---
|
||||
|
||||
信任链(可审查 → 可评估 → 可复用 → 可迭代)的"可评估"环节需要工程落地。
|
||||
|
||||
uwf 的 eval 包(`@united-workforce/eval`,已在 repo 开发中)的目标是让 agent 能自我评估执行质量——一次 thread 跑完后,度量"做得好不好"、"workflow v2 比 v1 好还是差"。
|
||||
|
||||
这形成了两层反馈闭环:
|
||||
1. **workflow 内的反馈环** — developer → reviewer → rejected → developer(已实现,负反馈驱动执行质量收敛)
|
||||
2. **workflow 级的反馈环** — 执行 → eval → workflow 迭代 → 再执行(在建,驱动流程本身的持续改进)
|
||||
|
||||
第二层闭环接通后,uwf 就不只是一个执行引擎,而是一个**自我改进的流程系统**。
|
||||
@@ -0,0 +1,21 @@
|
||||
---
|
||||
title: "Feedback Loops — Convergent and Divergent"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, pattern]
|
||||
category: "architecture"
|
||||
links:
|
||||
- dissipative-structure-token-for-entropy
|
||||
- process-discipline-from-software-engineering
|
||||
- cognitive-process-orchestration
|
||||
---
|
||||
|
||||
uwf 的 graph 环路不限于负反馈(收敛),也可以是正反馈(发散)。引擎本身不带倾向——流转方向由 `$status` 和 graph 决定,反馈性质由 role 的设计意图决定。
|
||||
|
||||
**负反馈环(收敛)**:developer → reviewer → rejected → developer。reviewer 的 goal 是"找问题",产生修正力。稳定点是 `approved`,系统自然收敛到那里。特性:偏差越大修正越强,对扰动鲁棒。
|
||||
|
||||
**正反馈环(发散)**:proposer → challenger → "interesting" → proposer。challenger 的 goal 是"追问更深层的假设",每轮发散,一个想法激发更多想法。
|
||||
|
||||
终止条件不同:负反馈靠收敛自然到达稳定点;正反馈不会自己停,需要外部约束(轮次上限,或额外 role 判断"够了")。
|
||||
|
||||
每个 role 的 `$status` 就是误差信号(负反馈)或激励信号(正反馈),驱动系统向不同方向演化。Workflow author 真正在设计的是**在哪里放什么样的环**。
|
||||
@@ -0,0 +1,27 @@
|
||||
---
|
||||
title: "Four Advantages over Single Session + Skill"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, pattern]
|
||||
category: "architecture"
|
||||
links:
|
||||
- session-isolation-as-cognitive-reset
|
||||
- attention-isolation-breaks-cognitive-inertia
|
||||
- skill-vs-workflow-different-layers
|
||||
- when-skill-is-not-enough
|
||||
---
|
||||
|
||||
Session 隔离除了认知层面的好处(打破确认偏误、聚焦注意力),还解决一个更物理性的问题:**长 session 的上下文压缩导致降智和行为不稳定**。
|
||||
|
||||
Context window 是有限资源。一个 session 从头做到尾,前期的 tool output、中间的思考过程不断堆积,要么触发 compaction(信息丢失),要么挤占后期推理的有效空间。越到后面 agent 越"笨"——不是能力变了,是可用的认知空间被历史占满了。表现为:忘记约束、重复错误、输出不稳定。
|
||||
|
||||
Session 隔离直接解决这个问题:每个 role 进入时拿到的是**精炼过的前序产出**(CAS 里经 schema 过滤的结构化 output),不是前面所有 session 的原始 token 流。信息经过 schema 过滤,只有产出物,没有过程噪声。
|
||||
|
||||
uwf 相对单 session + skill 的四个优势,前三个来自 session 隔离,第四个来自程序化流程:
|
||||
|
||||
1. **认知隔离** — 打破确认偏误和线性思维惯性
|
||||
2. **注意力聚焦** — 每个 role 只看该看的信息
|
||||
3. **上下文保鲜** — 避免长 session 的压缩降智和行为漂移
|
||||
4. **流程可靠性** — 引擎强制执行每一步,agent 无法跳过或篡改流程
|
||||
|
||||
前三点回答"为什么拆成多个 session 更好",第四点回答"为什么流程要由引擎控制而不是 agent 自觉"。Skill 里写"先编码再测试再 review",agent 可能做着做着就跳过——不是故意的,是 context 压力下行为漂移,或者觉得"改动太小不需要测试"。程序化流程不存在这个问题:graph 说要走 tester,就必须走 tester。
|
||||
@@ -0,0 +1,25 @@
|
||||
---
|
||||
title: "FTE Maturity Threshold — Who Can Onboard an Agent"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [concept, decision]
|
||||
category: "product"
|
||||
links:
|
||||
- agent-as-graduate
|
||||
- vendor-vs-fte-who-defines-capability
|
||||
- three-learning-carriers
|
||||
---
|
||||
|
||||
FTE 型 agent 的成熟度,归根结底看一个问题:**谁能带教它?**
|
||||
|
||||
当前阶段(2026):OpenClaw、Claude Code、Hermes 都是 FTE 型产品的雏形,三者都具备 memory/skill/workflow 三个载体。但它们的用户画像高度重叠——有较深技术能力的开发者。
|
||||
|
||||
这意味着 FTE agent 现在更像"只有技术 lead 才能带的毕业生"。要跨越鸿沟,需要降低带教门槛到**行业专家(不懂代码的人)也能带、也能教、也能调优**。
|
||||
|
||||
谁先把这个门槛降下来,谁就定义了 FTE agent 品类的分水岭。
|
||||
|
||||
可能的降低路径:
|
||||
- **自然语言 skill 定义**(不需要写代码/YAML)
|
||||
- **可视化 workflow 编辑**(拖拽而非配置)
|
||||
- **Agent 主动学习**(从用户行为中推断偏好,而非等用户显式配置)
|
||||
- **带教过程本身被 agent 化**(用 agent 辅助用户定义 skill 和 workflow)
|
||||
@@ -0,0 +1,23 @@
|
||||
---
|
||||
title: "FTE Product Landscape — OpenClaw, Claude Code, Hermes"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [concept, comparison]
|
||||
category: "product"
|
||||
links:
|
||||
- vendor-vs-fte-who-defines-capability
|
||||
- three-learning-carriers
|
||||
- fte-maturity-threshold
|
||||
- agent-as-graduate
|
||||
---
|
||||
|
||||
2026 年中,FTE 型 agent 的代表产品对比:
|
||||
|
||||
**共性**:都有 memory、skill、workflow/多步协作机制,都面向技术用户。
|
||||
|
||||
**差异点**:
|
||||
- **OpenClaw** — uwf 引擎驱动,用 YAML 定义多角色 workflow,强调流程纪律和 session 隔离。面向团队级 agent 协作。
|
||||
- **Claude Code** — Anthropic 官方 CLI agent,CLAUDE.md 作为 memory,skill 通过项目约定积累。单 agent 深度协作,开发者体验好。
|
||||
- **Hermes** — 跨平台 agent 协调者,memory/skill/cron 体系完善,支持多 agent 调度。偏个人效率工具。
|
||||
|
||||
三者都谈不上成熟。成熟的标志不是技术完备度,而是**非技术用户能否用起来**。
|
||||
@@ -0,0 +1,22 @@
|
||||
---
|
||||
title: "OPC — Why FTE Agents Matter Most"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [vision, decision]
|
||||
category: "product"
|
||||
links:
|
||||
- vendor-vs-fte-who-defines-capability
|
||||
- agent-as-graduate
|
||||
- fte-maturity-threshold
|
||||
---
|
||||
|
||||
OpenClaw 押注 FTE 型 agent 的核心判断:**AI 的终极形态不是工具,是同事。**
|
||||
|
||||
工具被使用,同事被培养。工具的价值在出厂那一刻确定,同事的价值随协作持续增长。
|
||||
|
||||
这个判断决定了产品方向:
|
||||
- 不做"最强的单次对话",做"最能被带教的长期协作者"
|
||||
- 不做"开箱即用的成品",做"越用越好用的底座"
|
||||
- 核心指标不是 benchmark 分数,是用户留存和 skill 积累量
|
||||
|
||||
uwf 是这个判断的工程实现——用流程纪律让 agent 的产出可靠,让用户敢把真正的业务交给它。
|
||||
@@ -0,0 +1,22 @@
|
||||
---
|
||||
title: "Open Question — Human as Role Participant"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, open-question]
|
||||
category: "architecture"
|
||||
links:
|
||||
- agent-as-graduate
|
||||
- opc-why-fte-agents-matter-most
|
||||
- role-is-not-agent
|
||||
- process-authorship-human-ai-vs-delegation
|
||||
---
|
||||
|
||||
**待讨论。**
|
||||
|
||||
目前讨论主要围绕 OPC(一个人 + N 个 agent)。但小团队场景下——几个人各自有 FTE agent,共享 workflow 库和记忆——workflow 的某些 role 可能需要人来执行而不是 agent。
|
||||
|
||||
问题:
|
||||
- uwf 是否需要支持人作为 role 的参与者(比如"人工审批"作为 graph 中的一个 role)?
|
||||
- 还是人永远在 workflow 之外,只做设计者和监督者?
|
||||
- 如果支持,$SUSPEND 机制是否已经覆盖了这个需求(暂停等人介入)?
|
||||
- 多人 + 多 agent 的协作场景下,workflow 的共享和权限模型是什么样的?
|
||||
@@ -0,0 +1,20 @@
|
||||
---
|
||||
title: "Open Question — Workflow Granularity and Composition"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, open-question]
|
||||
category: "architecture"
|
||||
links:
|
||||
- cognitive-process-orchestration
|
||||
- skill-vs-workflow-different-layers
|
||||
- domain-experts-own-the-process
|
||||
---
|
||||
|
||||
**待讨论。**
|
||||
|
||||
Workflow 的粒度问题:solve-issue 是端到端的大 workflow(planner → developer → reviewer → tester → committer),但现实中有些场景只需要管一个环节(比如只用 uwf 管 code review,其他部分用 skill 或手动)。
|
||||
|
||||
问题:
|
||||
- Workflow 是否应该支持嵌套或组合——小 workflow 作为大 workflow 的一个 role?
|
||||
- 还是粒度完全由用户自己决定,引擎不需要管?
|
||||
- 组合式 workflow 和单体 workflow 各自的 trade-off 是什么?
|
||||
@@ -0,0 +1,23 @@
|
||||
---
|
||||
title: "Process Authorship — Human-AI Collaboration vs Full Delegation"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, decision]
|
||||
category: "architecture"
|
||||
links:
|
||||
- domain-experts-own-the-process
|
||||
- uwf-vs-dynamic-workflow
|
||||
- trust-chain-audit-evaluate-reuse
|
||||
- workflow-as-improvable-system
|
||||
---
|
||||
|
||||
dw 和 uwf 都面向 agent,用户都不需要会写代码。区别在于**流程的创作权**:
|
||||
|
||||
- **dw**:流程由 AI 全权负责。用户描述任务,agent 决定怎么拆步骤、怎么编排。用户参与度最低,门槛最低。
|
||||
- **uwf**:流程创作是人和 AI 协作的。行业专家参与设计、审查、调优流程,agent 参与起草和执行。
|
||||
|
||||
这是主动权的取舍。dw 把流程交给 AI 是为了降低使用门槛;uwf 有意保留人对流程的参与权,代价是门槛稍高,收益是流程能融入人的领域知识。
|
||||
|
||||
背后的认知:**AI 擅长执行,但流程设计需要领域知识。** AI 不知道行业里哪个环节容易出错、哪个审批不能跳过、哪个反馈回路是血的教训换来的。这些知识在行业专家脑子里,需要一个他们能参与的载体来表达。
|
||||
|
||||
dw 赌的是 AI 能自己发现好的流程,uwf 赌的是好的流程需要人的知识参与。两个赌注没有对错,适用于不同的场景:临时任务用 dw 的零门槛更高效,反复执行的核心业务流程用 uwf 的人机协作更可靠。
|
||||
@@ -0,0 +1,20 @@
|
||||
---
|
||||
title: "Process Discipline from Software Engineering"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, pattern, decision]
|
||||
category: "architecture"
|
||||
links:
|
||||
- session-isolation-as-cognitive-reset
|
||||
- role-is-not-agent
|
||||
- dissipative-structure-token-for-entropy
|
||||
- deterministic-engine-uncertain-agent
|
||||
---
|
||||
|
||||
uwf 的发心是将人类软件工程的流程纪律应用到 AI agent 上。
|
||||
|
||||
人类早已验证:个体不可靠,但流程可以让不可靠的个体组成可靠的系统。Code review 不是因为不信任程序员,而是**写代码和审代码是两种认知模式**,一个人很难同时做好。测试、灰度、回滚——每一层都是在用额外成本换确定性。
|
||||
|
||||
uwf 把这套搬过来:planner 和 reviewer 可以是同一个 agent,但流程迫使它在不同 session 里切换视角,形成自我制衡。用 role 和 role 之间的流转关系,**把做一件事的步骤固定下来**。
|
||||
|
||||
PR #148 vs #142 是直接证据——不是换了更强的 agent,是同样的 agent,换了协作结构。
|
||||
@@ -0,0 +1,35 @@
|
||||
---
|
||||
title: "Reflective Workflow — Self-Improvement as Discipline"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, pattern, decision]
|
||||
category: "architecture"
|
||||
links:
|
||||
- eval-closes-the-trust-chain
|
||||
- three-learning-carriers
|
||||
- workflow-as-improvable-system
|
||||
- feedback-loops-convergent-and-divergent
|
||||
- trust-chain-audit-evaluate-reuse
|
||||
---
|
||||
|
||||
FTE agent 的"成长"不靠自发顿悟,靠纪律性的反思。反思本身是纪律性的(定期跑、不能跳过、有固定步骤),所以应该用 workflow 承载——不能靠 agent "有空想想"。
|
||||
|
||||
反思 workflow 定期拉取最近执行过的任务,分析流程中出现的问题,找可优化的点,迭代,eval,对比。反思的对象覆盖三层载体:
|
||||
|
||||
- 发现某个 role 反复在同一类问题上出错 → **迭代 skill**
|
||||
- 发现某类任务的上下文总是缺少关键信息 → **补充记忆**
|
||||
- 发现某个审批环节通过率 100% 从未驳回 → **简化 workflow**
|
||||
|
||||
这形成了双层 workflow 架构:
|
||||
|
||||
```
|
||||
执行层:workflow 驱动日常任务
|
||||
↓ 产出执行记录(CAS 链)
|
||||
反思层:反思 workflow 定期分析执行记录
|
||||
↓ 产出改进建议
|
||||
改进层:迭代 memory / skill / workflow
|
||||
↓ 提升下一轮执行质量
|
||||
执行层:...
|
||||
```
|
||||
|
||||
两层都是 workflow,职责不同——执行层做事,反思层改进做事的方式。用 workflow 来优化 workflow——工具改进自身的递归。
|
||||
@@ -0,0 +1,16 @@
|
||||
---
|
||||
title: "Role Is Not Agent"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, decision]
|
||||
category: "architecture"
|
||||
links:
|
||||
- session-isolation-as-cognitive-reset
|
||||
- process-discipline-from-software-engineering
|
||||
---
|
||||
|
||||
在 uwf 体系里,role ≠ agent。一个 thread 跑的过程中,所有 role 往往由**同一个 agent** 扮演。
|
||||
|
||||
Role 对应的是 agent 的 **session**——为了解决一个问题,需要多个 session 从不同角度观察和行动、相互制衡。角色可以在流程中多次重入,重入时**复用**同一个 session(保持角色内记忆连续),隔离发生在角色之间,不是每一步。
|
||||
|
||||
这个区分决定了 uwf 的设计不是在做"任务分发给不同 agent",而是在做**一个 agent 的多视角自我协作**。
|
||||
@@ -0,0 +1,17 @@
|
||||
---
|
||||
title: "Session Isolation as Cognitive Reset"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, decision, pattern]
|
||||
category: "architecture"
|
||||
links:
|
||||
- role-is-not-agent
|
||||
- dissipative-structure-token-for-entropy
|
||||
- process-discipline-from-software-engineering
|
||||
---
|
||||
|
||||
uwf 的核心机制不是"多 agent 协调",而是**用 session 隔离实现视角切换**。
|
||||
|
||||
同一个 agent 以不同 role 进入时,得到的是全新的认知上下文——没有惯性、没有确认偏误。CAS 链传递工作成果,但认知状态是重置的。Role 定义(goal、procedure、output schema)塑造每个 session 的关注点和行为边界。
|
||||
|
||||
这解释了为什么 stateless 单步设计这么重要:engine 确保每次角色切换都是一个干净的 session 入口。
|
||||
@@ -0,0 +1,19 @@
|
||||
---
|
||||
title: "Skill vs Workflow — Different Layers"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, decision]
|
||||
category: "architecture"
|
||||
links:
|
||||
- session-isolation-as-cognitive-reset
|
||||
- cognitive-process-orchestration
|
||||
- agency-over-content-not-process
|
||||
---
|
||||
|
||||
Skill 和 workflow 不是替代关系,是不同层次。
|
||||
|
||||
**Skill** 管的是一个 session 内怎么做——给 agent 的指令和方法论。你可以在 skill 里写"先规划再编码再 review",但 agent 始终在同一个 session 里,review 自己刚写的代码时带着全部决策记忆。确认偏误无法靠 prompt 消除。
|
||||
|
||||
**Workflow** 管的是 session 之间怎么协作——强制 session 断裂,reviewer 进来时不知道 developer 当时为什么做那个选择,只看到产出物。这个隔离不是靠自律,是靠结构。
|
||||
|
||||
两者正交:workflow 的每个 role 里面完全可以加载 skill。Skill 提升单个 session 的能力,workflow 编排多个 session 的协作关系。
|
||||
@@ -0,0 +1,21 @@
|
||||
---
|
||||
title: "Switching Cost — Process Knowledge as Moat"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [concept, decision]
|
||||
category: "product"
|
||||
links:
|
||||
- vendor-vs-fte-who-defines-capability
|
||||
- three-learning-carriers
|
||||
- agent-as-graduate
|
||||
---
|
||||
|
||||
FTE 型 agent 的护城河不是技术壁垒,是**用户自己积累的流程知识**。
|
||||
|
||||
用得越久,agent 越懂你的业务——记忆里有你的偏好,skill 里有你验证过的做法,workflow 里有你打磨过的流程。换一个 agent = 重新带一个毕业生,之前的积累全部作废。
|
||||
|
||||
这解释了为什么 FTE 型产品的竞争逻辑和 vendor 型完全不同:
|
||||
- **Vendor 型**竞争模型能力(谁的基座更强),switching cost 低,用户随时换
|
||||
- **FTE 型**竞争生态粘性(谁让用户积累得更深),switching cost 随使用时长增长
|
||||
|
||||
风险面:如果用户的流程知识被锁死在一个平台,就变成了 vendor lock-in。开放的知识格式(如 markdown skill、YAML workflow)是对冲手段。
|
||||
@@ -0,0 +1,21 @@
|
||||
---
|
||||
title: "Three Learning Carriers — Memory, Skill, Workflow"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, concept]
|
||||
category: "product"
|
||||
links:
|
||||
- vendor-vs-fte-who-defines-capability
|
||||
- agent-as-graduate
|
||||
- switching-cost-process-knowledge-as-moat
|
||||
---
|
||||
|
||||
FTE 型 agent 的能力积累依赖三个载体:
|
||||
|
||||
1. **Memory(记忆)**— 用户偏好、环境事实、历史上下文。跨 session 持久化,让 agent 不用每次从零开始。
|
||||
2. **Skill(技能)**— 可复用的操作程序。解决过的问题沉淀成步骤,下次直接调用。
|
||||
3. **Workflow / DW(流程)**— 多步骤协作模式。把复杂任务拆成角色和阶段,用流程纪律保障质量。
|
||||
|
||||
三者的关系:memory 是"认识你",skill 是"会做事",workflow 是"知道怎么把事做好"。
|
||||
|
||||
OpenClaw、Claude Code、Hermes 都已具备这三个载体,但成熟度各异。差异在于:用户能多容易地往这三个载体里"灌"自己的知识。
|
||||
@@ -0,0 +1,23 @@
|
||||
---
|
||||
title: "Trust Chain — Auditable → Evaluable → Reusable → Improvable"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, pattern, decision]
|
||||
category: "architecture"
|
||||
links:
|
||||
- workflow-as-improvable-system
|
||||
- uwf-vs-dynamic-workflow
|
||||
- process-discipline-from-software-engineering
|
||||
---
|
||||
|
||||
可审查、可评估、可复用不是并列的好处,而是一条因果链:
|
||||
|
||||
**可审查 → 可评估 → 可复用 → 可迭代**
|
||||
|
||||
不能审查的东西不敢复用——不知道它为什么 work,换个场景可能就 break。不能评估的东西不知道该不该复用——也许它其实没用,只是恰好那次任务简单。
|
||||
|
||||
这是一条信任链,每一环是下一环的前提。uwf 选择声明式 YAML 而不是 JS/TS 定义 workflow,不是技术限制,是有意降低审查门槛,让这条链的摩擦力最低。
|
||||
|
||||
dw 不是不能做这些,而是它的默认路径不鼓励这条链——即兴生成的脚本,审查成本高、评估缺乏对照、复用需要额外抽象。差异在摩擦力,不在能力边界。
|
||||
|
||||
这也是耗散结构的递归应用——不只是用流程对 agent 做负反馈(提升执行质量),还在对流程本身做负反馈(提升流程质量)。Workflow 和代码一样,需要 review、测试、度量、迭代。
|
||||
@@ -0,0 +1,27 @@
|
||||
---
|
||||
title: "uwf vs Dynamic Workflow — Structural Differences"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, decision]
|
||||
category: "architecture"
|
||||
links:
|
||||
- agency-over-content-not-process
|
||||
- deterministic-engine-uncertain-agent
|
||||
- session-isolation-as-cognitive-reset
|
||||
- cognitive-process-orchestration
|
||||
- workflow-as-improvable-system
|
||||
---
|
||||
|
||||
Claude Code 的 dynamic workflow (dw) 和 uwf 都有 session 隔离——dw spawn 独立 subagent(最多 16 并发、1000 总量),每个 subagent 是独立 context,也能做对抗性 review。四个优势(认知隔离、注意力聚焦、上下文保鲜、流程可靠性)两者都具备。
|
||||
|
||||
差异不在能不能做 session 隔离和程序化流程,而在**流程和执行的解耦程度**:
|
||||
|
||||
dw 的流程生成和执行是一体的——同一个 agent 既决定怎么做又开始做。流程嵌在执行里。uwf 的 workflow 是独立的持久制品,不管是人写的还是 agent 写的,一旦存在就和任何一次执行无关,可以被单独审查、讨论、迭代。
|
||||
|
||||
这个解耦在三个维度上拉开差距:
|
||||
|
||||
**审查**:dw 的 JS 脚本是代码,审查门槛高,逻辑和业务细节混在一起。uwf 的 YAML 是声明式的,roles 定义关注点,graph 定义流转,一眼能看出流程结构,非工程师也能参与讨论。
|
||||
|
||||
**评估**:dw 每次生成不同脚本,难以控制变量——跑得好是流程好还是脚本碰巧写得好?uwf 的 workflow 固定,跑 N 次可以统计成功率,增减 role 后效果差异可以归因到流程变更。
|
||||
|
||||
**复用**:dw 脚本为特定任务生成,复用需要手动泛化。uwf 的 workflow 天然是通用模板——solve-issue 就是 solve-issue,换个 repo 换个 issue 直接跑。
|
||||
@@ -0,0 +1,29 @@
|
||||
---
|
||||
title: "Vendor vs FTE — Who Defines the Agent's Capability"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, decision]
|
||||
category: "architecture"
|
||||
links:
|
||||
- agent-as-graduate
|
||||
- three-learning-carriers
|
||||
- switching-cost-process-knowledge-as-moat
|
||||
- opc-why-fte-agents-matter-most
|
||||
---
|
||||
|
||||
区分 vendor 型和 FTE 型 agent 最本质的一条:**谁定义 agent 的能力。**
|
||||
|
||||
- **Vendor 型**:开发者定义能力,用户消费能力。能力边界在发布那一刻就定了,升级主动权在开发者。
|
||||
- **FTE 型**:开发者定义出厂能力(底座模型 + 基础技能包),用户持续定义能力(记忆、skill、workflow)。
|
||||
|
||||
出厂是起点不是终点。用户通过积累记忆、训练 skill、设计 workflow,持续塑造 agent 的能力。用得越久,越贴合自己的业务,越不像别人的 agent。
|
||||
|
||||
引申的两个特征:
|
||||
- **成长性** — vendor 的能力随模型升级变化,不随使用积累;FTE 的能力随使用持续积累
|
||||
- **流程适配性** — vendor 是用户适应工具;FTE 是工具适应用户的业务流程
|
||||
|
||||
这也解释了 switching cost 的来源——换掉的不是一个产品,是用户自己定义出来的能力。
|
||||
|
||||
代表产品:
|
||||
- **Vendor 型**:ChatGPT、Claude(对话式)、Midjourney(图像生成)、Perplexity(搜索问答)、各种 GPTs
|
||||
- **FTE 型**:OpenClaw、Claude Code、Hermes 都在往这个方向走——有记忆、有 skill/workflow 机制、有持续协作关系。但尚未成熟,目前都面向有较深技术能力的用户。真正成熟的 FTE 型产品,应该是行业专家(不懂代码的人)也能带、也能教、也能调优的。这个门槛什么时候降下来,谁先降下来,可能就是这个品类的分水岭。
|
||||
@@ -0,0 +1,24 @@
|
||||
---
|
||||
title: "When Skill Is Not Enough — Workflow Judgment Call"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, decision, pattern]
|
||||
category: "architecture"
|
||||
links:
|
||||
- skill-vs-workflow-different-layers
|
||||
- attention-isolation-breaks-cognitive-inertia
|
||||
- feedback-loops-convergent-and-divergent
|
||||
- agency-over-content-not-process
|
||||
---
|
||||
|
||||
**Skill 够用的场景:** 任务在单一认知模式下可以完成好。查资料、写文档、跑部署脚本、按规范格式化——不需要自我对抗,一个 session 带着清晰指令一路执行到底就行。
|
||||
|
||||
**Workflow 更好的场景:** 任务需要在不同认知模式之间切换,且这些模式之间存在张力。典型标志:
|
||||
|
||||
1. **产出需要被"不知道过程"的眼睛审视** — 写代码+review、写方案+挑战、翻译+校对。一个 session 做不到真正的自我审视,确认偏误是自回归结构决定的,不是 prompt 能修的。
|
||||
|
||||
2. **出错成本高到需要结构性保证** — 不是"建议你 review 一下",而是"你不可能跳过 review"。Skill 是建议,workflow 是制度。
|
||||
|
||||
3. **需要收敛到明确的质量标准** — 负反馈环驱动修正直到通过,而不是 agent 自己觉得"差不多了"。
|
||||
|
||||
**判词:当任务复杂到 agent 可能说服自己"错的是对的"时,你需要 workflow 的结构隔离,而不是 skill 的行为指导。**
|
||||
@@ -0,0 +1,20 @@
|
||||
---
|
||||
title: "Workflow as an Improvable System"
|
||||
created: "2026-06-07"
|
||||
source: "openclaw-xiaomo"
|
||||
tags: [architecture, pattern]
|
||||
category: "architecture"
|
||||
links:
|
||||
- uwf-vs-dynamic-workflow
|
||||
- process-discipline-from-software-engineering
|
||||
- feedback-loops-convergent-and-divergent
|
||||
- cognitive-process-orchestration
|
||||
---
|
||||
|
||||
uwf 把 workflow 定位为**可持续改进的系统**,而不是一次性的任务完成工具。
|
||||
|
||||
LLM 能力在快速提升,但单次执行的可靠性永远有上限。真正的杠杆不在于某一次跑得好不好,而在于流程本身能不能从每次执行中学到东西、越来越好。这需要流程是可审查的(看得懂才能改)、可评估的(量化才能知道改对没有)、可复用的(积累才有复利)。
|
||||
|
||||
dw 每次重新生成脚本,某种意义上是在放弃之前执行的经验——每次从零开始发明流程。uwf 把流程固化为独立制品,每次迭代都在前一版基础上改进。v1 没有 tester 角色,加上 tester 变成 v2,效果可量化对比。
|
||||
|
||||
这是一个有记忆的系统——记忆不在 agent 的 context 里,而在 workflow 的版本历史里。
|
||||
@@ -0,0 +1,18 @@
|
||||
---
|
||||
"@united-workforce/util-agent": minor
|
||||
"@united-workforce/agent-mock": patch
|
||||
"@united-workforce/agent-builtin": patch
|
||||
"@united-workforce/agent-hermes": patch
|
||||
"@united-workforce/agent-claude-code": patch
|
||||
---
|
||||
|
||||
feat(util-agent): extend AgentOptions with `fork` / `cleanup` and add ask-session cache
|
||||
|
||||
Phase 2a infrastructure for `step ask`. Extends `AgentOptions` with
|
||||
`fork: AgentForkFn | null` and `cleanup: AgentCleanupFn | null` fields, exporting
|
||||
the new `AgentForkFn` and `AgentCleanupFn` type aliases. Adds `getAskSessionId` /
|
||||
`setAskSessionId` to the per-agent session cache, using `<stepHash>:ask` keys
|
||||
that share the cache file with exec sessions (`<threadId>:<role>` keys) without
|
||||
collision. All four adapters (mock, builtin, hermes, claude-code) now pass
|
||||
`fork: null, cleanup: null` — real implementations land in Phase 2b. Resolves
|
||||
issue #145.
|
||||
@@ -0,0 +1,18 @@
|
||||
---
|
||||
"@united-workforce/cli": minor
|
||||
"@united-workforce/util": patch
|
||||
---
|
||||
|
||||
feat(cli): add `uwf step ask <step-hash> -p <prompt>` read-only follow-up command
|
||||
|
||||
Phase 2b of the ask-session work. Adds a new subcommand that lets the user ask
|
||||
a follow-up question to a historical step's agent without writing a new
|
||||
`StepNode` or mutating thread state. The command resolves the agent from the
|
||||
recorded step (or `--agent <cmd>` override), forks the original session via the
|
||||
adapter's `--mode fork --session <source>` contract, caches the resulting
|
||||
ask-session id under `<stepHash>:ask` so subsequent asks reuse it, then invokes
|
||||
the agent with `--mode ask --session <forkId> --prompt <text> --detail <ref>`
|
||||
and streams the raw stdout to the caller. `--no-fork` falls back to a fresh
|
||||
session that receives the step's detail ref for context. The `prompt usage`
|
||||
reference (in `@united-workforce/util`) is also updated so agents discover the
|
||||
new subcommand. Resolves issue #146.
|
||||
@@ -0,0 +1,14 @@
|
||||
---
|
||||
"@united-workforce/cli": minor
|
||||
"@united-workforce/util": patch
|
||||
---
|
||||
|
||||
feat(cli): `uwf thread list` now defaults to active threads only
|
||||
|
||||
Changes the default behavior of `uwf thread list` to show only active threads
|
||||
(idle + running). Adds a new `--all` flag to opt into the previous behavior of
|
||||
listing every thread (including completed, cancelled, and suspended).
|
||||
|
||||
When invoked with no flags, the command now hides completed/cancelled/suspended
|
||||
threads. Use `--all` to see them, or `--status <status>` to filter explicitly.
|
||||
The `--status` filter wins when both are present. Resolves issue #147.
|
||||
@@ -0,0 +1,11 @@
|
||||
---
|
||||
"@united-workforce/cli": minor
|
||||
---
|
||||
|
||||
feat(cli): add `uwf thread poke` command
|
||||
|
||||
New subcommand `uwf thread poke <thread-id> -p <prompt>` re-runs the head step's
|
||||
agent with a supplementary prompt, replacing the head step's output. Unlike
|
||||
`thread resume`, poke skips the moderator and rewrites the new step's `prev`
|
||||
pointer so the new head replaces (not appends to) the old head. Works on idle
|
||||
and suspended threads. Resolves issue #144 (Phase 1).
|
||||
@@ -167,5 +167,7 @@ export function createBuiltinAgent(): () => Promise<void> {
|
||||
name: "builtin",
|
||||
run: runBuiltin,
|
||||
continue: continueBuiltin,
|
||||
fork: null,
|
||||
cleanup: null,
|
||||
});
|
||||
}
|
||||
|
||||
@@ -253,5 +253,7 @@ export function createClaudeCodeAgent(model: string | null): () => Promise<void>
|
||||
name: "claude-code",
|
||||
run: (ctx) => runClaudeCode(ctx, model),
|
||||
continue: (sessionId, message, store) => continueClaudeCode(sessionId, message, store, model),
|
||||
fork: null,
|
||||
cleanup: null,
|
||||
});
|
||||
}
|
||||
|
||||
@@ -246,6 +246,8 @@ export function createHermesAgent(resumeDisabled: boolean): () => Promise<void>
|
||||
name: "hermes",
|
||||
run: runHermes,
|
||||
continue: continueHermes,
|
||||
fork: null,
|
||||
cleanup: null,
|
||||
});
|
||||
|
||||
// Wrap to ensure ACP client is closed after agent completes,
|
||||
|
||||
@@ -125,5 +125,7 @@ export function createMockAgent(mockDataPath: string): () => Promise<void> {
|
||||
name: "mock",
|
||||
run,
|
||||
continue: continueRun,
|
||||
fork: null,
|
||||
cleanup: null,
|
||||
});
|
||||
}
|
||||
|
||||
@@ -49,7 +49,7 @@ bun link packages/cli
|
||||
| `uwf thread start <workflow> -p <prompt>` | Create a thread without executing |
|
||||
| `uwf thread exec <thread-id> [--agent <cmd>] [-c <count>] [--background]` | Execute one or more moderator→agent→extract cycles |
|
||||
| `uwf thread show <thread-id>` | Show thread head pointer |
|
||||
| `uwf thread list [--status <status>] [--after <date>] [--before <date>] [--skip <n>] [--take <n>]` | List threads filtered by status (idle, running, completed, active, or comma-separated), time range (ISO or relative like '7d'), with pagination |
|
||||
| `uwf thread list [--status <status>] [--all] [--after <date>] [--before <date>] [--skip <n>] [--take <n>]` | List threads (defaults to active: idle + running). Use `--all` to include completed/cancelled/suspended, or `--status` to filter explicitly (idle, running, suspended, completed, cancelled, active, or comma-separated). Supports time range and pagination. |
|
||||
| `uwf thread read <thread-id> [--quota N] [--before <hash>] [--start]` | Render thread as readable markdown |
|
||||
|
||||
`thread read`, `step list`, and `step show` work on both active and completed threads.
|
||||
@@ -63,6 +63,8 @@ uwf thread start solve-issue -p "Fix the login redirect bug"
|
||||
uwf thread exec 01ARZ3NDEKTSV4RRFFQ69G5FAV
|
||||
uwf thread exec 01ARZ3NDEKTSV4RRFFQ69G5FAV -c 3 --agent uwf-builtin
|
||||
uwf thread exec 01ARZ3NDEKTSV4RRFFQ69G5FAV --background
|
||||
uwf thread list
|
||||
uwf thread list --all
|
||||
uwf thread list --status running
|
||||
uwf thread list --status active
|
||||
uwf thread list --status idle,completed
|
||||
@@ -79,6 +81,7 @@ uwf thread stop 01ARZ3NDEKTSV4RRFFQ69G5FAV
|
||||
| `uwf step show <step-hash>` | Show step metadata and frontmatter |
|
||||
| `uwf step read <step-hash> [--quota <chars>]` | Read a step's turns as human-readable markdown |
|
||||
| `uwf step fork <step-hash>` | Fork a thread from a specific step |
|
||||
| `uwf step ask <step-hash> -p <prompt> [--agent <cmd>] [--no-fork]` | Ask a follow-up question to a historical step's agent (read-only; no thread mutation) |
|
||||
|
||||
Examples:
|
||||
|
||||
@@ -87,6 +90,8 @@ uwf step list 01ARZ3NDEKTSV4RRFFQ69G5FAV
|
||||
uwf step show 32GCDE899RRQ3
|
||||
uwf step read 32GCDE899RRQ3 --quota 2000
|
||||
uwf step fork 32GCDE899RRQ3
|
||||
uwf step ask 32GCDE899RRQ3 -p "Why did you choose this approach?"
|
||||
uwf step ask 32GCDE899RRQ3 -p "Summarise the key findings" --no-fork
|
||||
```
|
||||
|
||||
### Workflow (Layer 1: Templates)
|
||||
|
||||
@@ -384,7 +384,7 @@ describe("currentRole field", () => {
|
||||
const _compHead = loadActiveThreads(uwfForIndex.varStore)[compId]!.head;
|
||||
completeThread(uwfForIndex.varStore, compId, "completed");
|
||||
|
||||
const list = await cmdThreadList(storageRoot, null, null, null, 0, 100);
|
||||
const list = await cmdThreadList(storageRoot, null, null, null, 0, 100, true);
|
||||
|
||||
const idleItem = list.find((i) => i.thread === idleId);
|
||||
expect(idleItem).toBeDefined();
|
||||
|
||||
@@ -0,0 +1,670 @@
|
||||
import { execFileSync } from "node:child_process";
|
||||
import { mkdir, mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
|
||||
import { tmpdir } from "node:os";
|
||||
import { dirname, join } from "node:path";
|
||||
import { fileURLToPath } from "node:url";
|
||||
import { bootstrap, putSchema } from "@ocas/core";
|
||||
import { openStore } from "@ocas/fs";
|
||||
import type { CasRef, ThreadId, ThreadIndexEntry } from "@united-workforce/protocol";
|
||||
import { afterEach, beforeEach, describe, expect, test } from "vitest";
|
||||
import { registerUwfSchemas } from "../schemas.js";
|
||||
import { seedThreads } from "./thread-test-helpers.js";
|
||||
|
||||
const OUTPUT_SCHEMA = {
|
||||
type: "object" as const,
|
||||
properties: {
|
||||
$status: { type: "string" as const },
|
||||
note: { type: "string" as const },
|
||||
},
|
||||
required: ["$status"],
|
||||
additionalProperties: false,
|
||||
};
|
||||
|
||||
const DETAIL_SCHEMA = {
|
||||
title: "ask-detail",
|
||||
type: "object" as const,
|
||||
required: ["sessionId", "model", "duration", "turnCount", "turns"],
|
||||
properties: {
|
||||
sessionId: { type: "string" as const },
|
||||
model: { type: "string" as const },
|
||||
duration: { type: "integer" as const },
|
||||
turnCount: { type: "integer" as const },
|
||||
turns: {
|
||||
type: "array" as const,
|
||||
items: { type: "string" as const, format: "ocas_ref" },
|
||||
},
|
||||
},
|
||||
additionalProperties: false,
|
||||
};
|
||||
|
||||
const THREAD_ID = "01ASKSTEPTEST000000000" as ThreadId;
|
||||
const STEP_SESSION_ID = "ses-original-step-001";
|
||||
|
||||
let tmpDir: string;
|
||||
|
||||
beforeEach(async () => {
|
||||
tmpDir = await mkdtemp(join(tmpdir(), "cli-uwf-step-ask-test-"));
|
||||
});
|
||||
|
||||
afterEach(async () => {
|
||||
await rm(tmpDir, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
type SetupOpts = {
|
||||
threadStatus: ThreadIndexEntry["status"];
|
||||
withDetail: boolean;
|
||||
// The agent name (path or alias) to record in the head StepNode.agent field.
|
||||
// Defaults to mockAgentPath.
|
||||
stepAgentNameOverride: string | null;
|
||||
// Pre-cached fork session-id. When provided, the cache file is written
|
||||
// before running so the test can verify reuse semantics.
|
||||
preCachedForkSessionId: string | null;
|
||||
};
|
||||
|
||||
type SetupResult = {
|
||||
casDir: string;
|
||||
stepHash: CasRef;
|
||||
startHash: CasRef;
|
||||
workflowHash: CasRef;
|
||||
detailHash: CasRef | null;
|
||||
mockAgentPath: string;
|
||||
failingAgentPath: string;
|
||||
promptCapturePath: string;
|
||||
modeCapturePath: string;
|
||||
forkSessionCapturePath: string;
|
||||
askSessionCapturePath: string;
|
||||
envCapturePath: string;
|
||||
};
|
||||
|
||||
async function setupAskFixture(opts: Partial<SetupOpts> = {}): Promise<SetupResult> {
|
||||
const cfg: SetupOpts = {
|
||||
threadStatus: opts.threadStatus ?? "idle",
|
||||
withDetail: opts.withDetail ?? true,
|
||||
stepAgentNameOverride: opts.stepAgentNameOverride ?? null,
|
||||
preCachedForkSessionId: opts.preCachedForkSessionId ?? null,
|
||||
};
|
||||
|
||||
const casDir = join(tmpDir, "cas");
|
||||
await mkdir(casDir, { recursive: true });
|
||||
|
||||
const store = await openStore(casDir);
|
||||
await bootstrap(store);
|
||||
const schemas = await registerUwfSchemas(store);
|
||||
const outputSchemaHash = await putSchema(store, OUTPUT_SCHEMA);
|
||||
const detailSchemaHash = await putSchema(store, DETAIL_SCHEMA);
|
||||
|
||||
const workflowHash = await store.cas.put(schemas.workflow, {
|
||||
name: "test-ask",
|
||||
description: "ask command integration test",
|
||||
roles: {
|
||||
worker: {
|
||||
description: "Worker",
|
||||
goal: "Work",
|
||||
capabilities: [],
|
||||
procedure: "work",
|
||||
output: "result",
|
||||
frontmatter: outputSchemaHash,
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "Start work", location: null },
|
||||
},
|
||||
worker: { ok: { role: "$END", prompt: "done", location: null } },
|
||||
},
|
||||
});
|
||||
|
||||
const startHash = await store.cas.put(schemas.startNode, {
|
||||
workflow: workflowHash,
|
||||
prompt: "Test ask task",
|
||||
cwd: tmpDir,
|
||||
});
|
||||
|
||||
// Set OCAS_HOME so seedThreads + in-test createUwfStore calls resolve to this CAS dir.
|
||||
process.env.OCAS_HOME = casDir;
|
||||
|
||||
// Capture file paths
|
||||
const promptCapturePath = join(tmpDir, "captured-prompt.txt");
|
||||
const modeCapturePath = join(tmpDir, "captured-mode.txt");
|
||||
const forkSessionCapturePath = join(tmpDir, "captured-fork-session.txt");
|
||||
const askSessionCapturePath = join(tmpDir, "captured-ask-session.txt");
|
||||
const envCapturePath = join(tmpDir, "captured-env.txt");
|
||||
const mockAgentPath = join(tmpDir, "mock-agent.sh");
|
||||
const failingAgentPath = join(tmpDir, "failing-agent.sh");
|
||||
|
||||
// Build a detail node with sessionId so step ask can extract it
|
||||
let detailHash: CasRef | null = null;
|
||||
if (cfg.withDetail) {
|
||||
const turnHash = await store.cas.put(detailSchemaHash, {
|
||||
sessionId: STEP_SESSION_ID,
|
||||
model: "test-model",
|
||||
duration: 1000,
|
||||
turnCount: 0,
|
||||
turns: [],
|
||||
});
|
||||
detailHash = turnHash;
|
||||
}
|
||||
|
||||
// Build the StepNode at thread head
|
||||
const outputHash = await store.cas.put(outputSchemaHash, { $status: "ok" });
|
||||
const stepHash = await store.cas.put(schemas.stepNode, {
|
||||
start: startHash,
|
||||
prev: null,
|
||||
role: "worker",
|
||||
output: outputHash,
|
||||
detail: detailHash,
|
||||
agent: cfg.stepAgentNameOverride ?? mockAgentPath,
|
||||
edgePrompt: "Start work",
|
||||
startedAtMs: 1716600000000,
|
||||
completedAtMs: 1716600001000,
|
||||
cwd: tmpDir,
|
||||
assembledPrompt: null,
|
||||
usage: null,
|
||||
});
|
||||
|
||||
// Seed thread index entry
|
||||
await seedThreads(tmpDir, {
|
||||
[THREAD_ID]: {
|
||||
head: stepHash,
|
||||
status: cfg.threadStatus,
|
||||
suspendedRole: null,
|
||||
suspendMessage: null,
|
||||
completedAt: cfg.threadStatus === "completed" ? 1716600001000 : null,
|
||||
},
|
||||
});
|
||||
|
||||
// Pre-seed the ask session cache so reuse tests have something to find.
|
||||
if (cfg.preCachedForkSessionId !== null) {
|
||||
const cachePath = join(tmpDir, "cache", "mock-sessions.json");
|
||||
await mkdir(dirname(cachePath), { recursive: true });
|
||||
await writeFile(
|
||||
cachePath,
|
||||
`${JSON.stringify({ [`${stepHash}:ask`]: cfg.preCachedForkSessionId }, null, 2)}\n`,
|
||||
"utf8",
|
||||
);
|
||||
}
|
||||
|
||||
// Mock agent: dispatches based on `--mode` (ask|fork|run) and captures inputs.
|
||||
// - --mode ask --session <id> --prompt <text>: writes to ask capture; echoes a fixed answer to stdout
|
||||
// - --mode fork --session <id>: writes to fork capture; prints "forked-from-<id>" sessionId on stdout
|
||||
// - default (uwf-* style invocation): captures and echoes adapter JSON (not used in this suite)
|
||||
await writeFile(
|
||||
mockAgentPath,
|
||||
`#!/bin/sh
|
||||
mode=""
|
||||
prompt=""
|
||||
session=""
|
||||
detail=""
|
||||
while [ $# -gt 0 ]; do
|
||||
case "$1" in
|
||||
--mode) mode="$2"; shift 2 ;;
|
||||
--prompt) prompt="$2"; shift 2 ;;
|
||||
--session) session="$2"; shift 2 ;;
|
||||
--detail) detail="$2"; shift 2 ;;
|
||||
*) shift ;;
|
||||
esac
|
||||
done
|
||||
printf '%s' "$mode" > '${modeCapturePath}'
|
||||
printf '%s' "$prompt" > '${promptCapturePath}'
|
||||
printf 'OCAS_HOME=%s\\n' "$OCAS_HOME" > '${envCapturePath}'
|
||||
case "$mode" in
|
||||
fork)
|
||||
printf '%s' "$session" > '${forkSessionCapturePath}'
|
||||
new_id="forked-from-$session"
|
||||
printf '%s\\n' "$new_id"
|
||||
;;
|
||||
ask)
|
||||
printf '%s' "$session" > '${askSessionCapturePath}'
|
||||
# Print a deterministic answer that the cmdStepAsk path will hand back.
|
||||
printf 'MOCK_ANSWER prompt=%s session=%s detail=%s\\n' "$prompt" "$session" "$detail"
|
||||
;;
|
||||
*)
|
||||
echo "{\\"stepHash\\":\\"unused\\"}"
|
||||
;;
|
||||
esac
|
||||
`,
|
||||
{ mode: 0o755 },
|
||||
);
|
||||
|
||||
await writeFile(
|
||||
failingAgentPath,
|
||||
`#!/bin/sh
|
||||
echo "boom" >&2
|
||||
exit 7
|
||||
`,
|
||||
{ mode: 0o755 },
|
||||
);
|
||||
|
||||
// Minimal config so loadWorkflowConfig succeeds.
|
||||
const configPath = join(tmpDir, "config.yaml");
|
||||
await writeFile(
|
||||
configPath,
|
||||
`defaultAgent: uwf-hermes\ndefaultModel: test-model\nagentOverrides: null\nagents: {}\nproviders: {}\nmodels: {}\n`,
|
||||
);
|
||||
|
||||
return {
|
||||
casDir,
|
||||
stepHash,
|
||||
startHash,
|
||||
workflowHash,
|
||||
detailHash,
|
||||
mockAgentPath,
|
||||
failingAgentPath,
|
||||
promptCapturePath,
|
||||
modeCapturePath,
|
||||
forkSessionCapturePath,
|
||||
askSessionCapturePath,
|
||||
envCapturePath,
|
||||
};
|
||||
}
|
||||
|
||||
function runUwf(
|
||||
args: string[],
|
||||
casDir: string,
|
||||
): { stdout: string; stderr: string; status: number } {
|
||||
const cliPath = join(dirname(fileURLToPath(import.meta.url)), "..", "..", "dist", "cli.js");
|
||||
try {
|
||||
const stdout = execFileSync(process.execPath, [cliPath, ...args], {
|
||||
encoding: "utf8",
|
||||
stdio: ["ignore", "pipe", "pipe"],
|
||||
env: {
|
||||
...process.env,
|
||||
UWF_HOME: tmpDir,
|
||||
OCAS_HOME: casDir,
|
||||
},
|
||||
cwd: tmpDir,
|
||||
timeout: 30000,
|
||||
});
|
||||
return { stdout, stderr: "", status: 0 };
|
||||
} catch (error) {
|
||||
const err = error as NodeJS.ErrnoException & {
|
||||
stdout?: string | Buffer;
|
||||
stderr?: string | Buffer;
|
||||
status?: number;
|
||||
};
|
||||
return {
|
||||
stdout: typeof err.stdout === "string" ? err.stdout : (err.stdout?.toString("utf8") ?? ""),
|
||||
stderr: typeof err.stderr === "string" ? err.stderr : (err.stderr?.toString("utf8") ?? ""),
|
||||
status: err.status ?? 1,
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
// ── Group 1: CLI argument validation ───────────────────────────────────────
|
||||
|
||||
describe("uwf step ask - CLI argument validation", () => {
|
||||
test("1.1 missing step-hash exits non-zero", async () => {
|
||||
const { casDir } = await setupAskFixture();
|
||||
const result = runUwf(["step", "ask"], casDir);
|
||||
expect(result.status).not.toBe(0);
|
||||
});
|
||||
|
||||
test("1.2 missing -p flag exits non-zero", async () => {
|
||||
const { casDir, stepHash } = await setupAskFixture();
|
||||
const result = runUwf(["step", "ask", stepHash], casDir);
|
||||
expect(result.status).not.toBe(0);
|
||||
expect(result.stderr.toLowerCase()).toMatch(/required|missing|prompt/);
|
||||
});
|
||||
|
||||
test("1.3 step-hash and -p accepted as valid invocation", async () => {
|
||||
const { casDir, stepHash, mockAgentPath } = await setupAskFixture();
|
||||
const result = runUwf(
|
||||
["step", "ask", stepHash, "-p", "why?", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
});
|
||||
});
|
||||
|
||||
// ── Group 2: CAS validation errors ────────────────────────────────────────
|
||||
|
||||
describe("uwf step ask - CAS validation errors", () => {
|
||||
test("2.1 non-existent CAS hash exits non-zero with 'not found'", async () => {
|
||||
const { casDir, mockAgentPath } = await setupAskFixture();
|
||||
const result = runUwf(
|
||||
["step", "ask", "0000000000000", "-p", "why?", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).not.toBe(0);
|
||||
expect(result.stderr.toLowerCase()).toContain("not found");
|
||||
});
|
||||
|
||||
test("2.2 hash that is not a StepNode exits non-zero", async () => {
|
||||
const { casDir, startHash, mockAgentPath } = await setupAskFixture();
|
||||
// Use the StartNode hash — it exists but is not a StepNode
|
||||
const result = runUwf(
|
||||
["step", "ask", startHash, "-p", "why?", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).not.toBe(0);
|
||||
expect(result.stderr.toLowerCase()).toContain("not a stepnode");
|
||||
});
|
||||
|
||||
test("2.3 step with no detail ref exits non-zero", async () => {
|
||||
const { casDir, stepHash, mockAgentPath } = await setupAskFixture({ withDetail: false });
|
||||
const result = runUwf(
|
||||
["step", "ask", stepHash, "-p", "why?", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).not.toBe(0);
|
||||
expect(result.stderr.toLowerCase()).toMatch(/no detail|detail.*missing|missing.*detail/);
|
||||
});
|
||||
});
|
||||
|
||||
// ── Group 3: Successful ask (core behavior) ───────────────────────────────
|
||||
|
||||
describe("uwf step ask - successful ask (core)", () => {
|
||||
test("3.1 stdout contains agent's response text", async () => {
|
||||
const { casDir, stepHash, mockAgentPath } = await setupAskFixture();
|
||||
const result = runUwf(
|
||||
["step", "ask", stepHash, "-p", "why tar not zip?", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
expect(result.stdout).toContain("MOCK_ANSWER");
|
||||
expect(result.stdout).toContain("why tar not zip?");
|
||||
});
|
||||
|
||||
test("3.2 thread index entry (head, status) is identical before and after ask", async () => {
|
||||
const { casDir, stepHash, mockAgentPath } = await setupAskFixture();
|
||||
|
||||
// Before ask: snapshot the thread state
|
||||
const { createUwfStore, getThread } = await import("../store.js");
|
||||
const before = await createUwfStore(tmpDir);
|
||||
const beforeEntry = getThread(before.varStore, THREAD_ID);
|
||||
expect(beforeEntry).not.toBeNull();
|
||||
|
||||
const result = runUwf(
|
||||
["step", "ask", stepHash, "-p", "anything", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
|
||||
// After ask: thread state should be unchanged
|
||||
const after = await createUwfStore(tmpDir);
|
||||
const afterEntry = getThread(after.varStore, THREAD_ID);
|
||||
expect(afterEntry).not.toBeNull();
|
||||
expect(afterEntry?.head).toBe(beforeEntry?.head);
|
||||
expect(afterEntry?.status).toBe(beforeEntry?.status);
|
||||
});
|
||||
|
||||
test("3.3 no new StepNode is written to CAS (step count unchanged)", async () => {
|
||||
const { casDir, stepHash, mockAgentPath } = await setupAskFixture();
|
||||
|
||||
// Count StepNodes before
|
||||
const { createUwfStore } = await import("../store.js");
|
||||
const before = await createUwfStore(tmpDir);
|
||||
const stepSchemaHash = before.schemas.stepNode;
|
||||
|
||||
function countStepNodes(uwfStore: typeof before): number {
|
||||
const candidates = [stepHash];
|
||||
let count = 0;
|
||||
for (const h of candidates) {
|
||||
const node = uwfStore.store.cas.get(h);
|
||||
if (node !== null && node.type === stepSchemaHash) count++;
|
||||
}
|
||||
return count;
|
||||
}
|
||||
|
||||
const beforeCount = countStepNodes(before);
|
||||
expect(beforeCount).toBe(1);
|
||||
|
||||
const result = runUwf(
|
||||
["step", "ask", stepHash, "-p", "anything", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
|
||||
// After ask: still only the seeded StepNode exists at head; no new step appended.
|
||||
const after = await createUwfStore(tmpDir);
|
||||
const headNode = after.store.cas.get(stepHash);
|
||||
expect(headNode).not.toBeNull();
|
||||
expect(headNode?.type).toBe(after.schemas.stepNode);
|
||||
|
||||
// Confirm thread head still points to the original step hash
|
||||
const { getThread } = await import("../store.js");
|
||||
const entry = getThread(after.varStore, THREAD_ID);
|
||||
expect(entry?.head).toBe(stepHash);
|
||||
});
|
||||
});
|
||||
|
||||
// ── Group 4: Fork cache semantics ─────────────────────────────────────────
|
||||
|
||||
describe("uwf step ask - fork cache", () => {
|
||||
test("4.1 first ask creates a fork session and caches it", async () => {
|
||||
const { casDir, stepHash, mockAgentPath, forkSessionCapturePath } = await setupAskFixture();
|
||||
|
||||
const result = runUwf(
|
||||
["step", "ask", stepHash, "-p", "first ask", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
|
||||
// The mock agent in fork mode receives the source session id
|
||||
const forkArg = await readFile(forkSessionCapturePath, "utf8");
|
||||
expect(forkArg).toBe(STEP_SESSION_ID);
|
||||
|
||||
// Cache file should now contain the ask key
|
||||
const cachePath = join(tmpDir, "cache", "mock-sessions.json");
|
||||
const raw = await readFile(cachePath, "utf8");
|
||||
const parsed = JSON.parse(raw) as Record<string, string>;
|
||||
expect(parsed[`${stepHash}:ask`]).toBeDefined();
|
||||
expect(parsed[`${stepHash}:ask`]).toBe(`forked-from-${STEP_SESSION_ID}`);
|
||||
});
|
||||
|
||||
test("4.2 second ask on same step reuses the cached fork session", async () => {
|
||||
const cachedFork = "ses-already-forked-once";
|
||||
const { casDir, stepHash, mockAgentPath, modeCapturePath, askSessionCapturePath } =
|
||||
await setupAskFixture({ preCachedForkSessionId: cachedFork });
|
||||
|
||||
const result = runUwf(
|
||||
["step", "ask", stepHash, "-p", "second ask", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
|
||||
// The mock agent must have been invoked in `ask` mode (no fork performed).
|
||||
const mode = await readFile(modeCapturePath, "utf8");
|
||||
expect(mode).toBe("ask");
|
||||
|
||||
// The ask invocation should have received the cached fork session id.
|
||||
const askArg = await readFile(askSessionCapturePath, "utf8");
|
||||
expect(askArg).toBe(cachedFork);
|
||||
});
|
||||
|
||||
test("4.3 different step hash creates an independent fork", async () => {
|
||||
// Run a first ask on the base step → caches forkA
|
||||
const { casDir, stepHash, mockAgentPath } = await setupAskFixture();
|
||||
|
||||
const r1 = runUwf(
|
||||
["step", "ask", stepHash, "-p", "ask on step A", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(r1.status).toBe(0);
|
||||
|
||||
// Build a second StepNode (different hash) with a different sessionId so
|
||||
// its detail-derived ask session is independent of the first.
|
||||
const { createUwfStore } = await import("../store.js");
|
||||
const uwf = await createUwfStore(tmpDir);
|
||||
const detailSchemaHash = await putSchema(uwf.store, DETAIL_SCHEMA);
|
||||
const outputSchemaHash = await putSchema(uwf.store, OUTPUT_SCHEMA);
|
||||
const otherDetailHash = await uwf.store.cas.put(detailSchemaHash, {
|
||||
sessionId: "ses-original-step-002",
|
||||
model: "test-model",
|
||||
duration: 1000,
|
||||
turnCount: 0,
|
||||
turns: [],
|
||||
});
|
||||
const otherOutputHash = await uwf.store.cas.put(outputSchemaHash, {
|
||||
$status: "ok",
|
||||
note: "alt",
|
||||
});
|
||||
|
||||
// Reuse the same start ref the first step points to so the new step is a valid sibling.
|
||||
const head = uwf.store.cas.get(stepHash);
|
||||
const startRefFromHead = (head?.payload as { start: CasRef }).start;
|
||||
const properOtherStep = await uwf.store.cas.put(uwf.schemas.stepNode, {
|
||||
start: startRefFromHead,
|
||||
prev: null,
|
||||
role: "worker",
|
||||
output: otherOutputHash,
|
||||
detail: otherDetailHash,
|
||||
agent: mockAgentPath,
|
||||
edgePrompt: "Start work",
|
||||
startedAtMs: 1716600002000,
|
||||
completedAtMs: 1716600003000,
|
||||
cwd: tmpDir,
|
||||
assembledPrompt: null,
|
||||
usage: null,
|
||||
});
|
||||
|
||||
// sanity check we constructed a separate hash
|
||||
expect(properOtherStep).not.toBe(stepHash);
|
||||
|
||||
const r2 = runUwf(
|
||||
["step", "ask", properOtherStep, "-p", "ask on step B", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(r2.status).toBe(0);
|
||||
|
||||
const cachePath = join(tmpDir, "cache", "mock-sessions.json");
|
||||
const raw = await readFile(cachePath, "utf8");
|
||||
const parsed = JSON.parse(raw) as Record<string, string>;
|
||||
expect(parsed[`${stepHash}:ask`]).toBeDefined();
|
||||
expect(parsed[`${properOtherStep}:ask`]).toBeDefined();
|
||||
expect(parsed[`${stepHash}:ask`]).not.toBe(parsed[`${properOtherStep}:ask`]);
|
||||
});
|
||||
});
|
||||
|
||||
// ── Group 5: Fallback (agent has no fork support) ─────────────────────────
|
||||
|
||||
describe("uwf step ask - fallback path", () => {
|
||||
test("5.1 fallback agent (no fork support) still answers via stdout", async () => {
|
||||
// Use a fallback agent that ONLY supports `ask` mode without ever being asked
|
||||
// to fork. The CLI should detect missing fork support and inject context instead.
|
||||
const { casDir, stepHash, mockAgentPath } = await setupAskFixture();
|
||||
|
||||
// Create a fallback agent script that fails with non-zero exit on "fork" mode.
|
||||
// Fallback path must NOT call mode=fork; it should call mode=ask directly.
|
||||
const fallbackPath = join(tmpDir, "fallback-agent.sh");
|
||||
const promptCapture = join(tmpDir, "fallback-prompt.txt");
|
||||
const sessionCapture = join(tmpDir, "fallback-session.txt");
|
||||
const modeCapture = join(tmpDir, "fallback-mode.txt");
|
||||
await writeFile(
|
||||
fallbackPath,
|
||||
`#!/bin/sh
|
||||
mode=""
|
||||
prompt=""
|
||||
session=""
|
||||
detail=""
|
||||
while [ $# -gt 0 ]; do
|
||||
case "$1" in
|
||||
--mode) mode="$2"; shift 2 ;;
|
||||
--prompt) prompt="$2"; shift 2 ;;
|
||||
--session) session="$2"; shift 2 ;;
|
||||
--detail) detail="$2"; shift 2 ;;
|
||||
*) shift ;;
|
||||
esac
|
||||
done
|
||||
printf '%s' "$mode" > '${modeCapture}'
|
||||
printf '%s' "$prompt" > '${promptCapture}'
|
||||
printf '%s' "$session" > '${sessionCapture}'
|
||||
case "$mode" in
|
||||
fork) echo "fork not supported" >&2; exit 99 ;;
|
||||
ask) printf 'FALLBACK_ANSWER for: %s (detail=%s)\\n' "$prompt" "$detail" ;;
|
||||
*) echo "unknown" >&2; exit 1 ;;
|
||||
esac
|
||||
`,
|
||||
{ mode: 0o755 },
|
||||
);
|
||||
|
||||
const result = runUwf(
|
||||
["step", "ask", stepHash, "-p", "explain context", "--agent", fallbackPath, "--no-fork"],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
expect(result.stdout).toContain("FALLBACK_ANSWER");
|
||||
expect(result.stdout).toContain("explain context");
|
||||
|
||||
// The fallback agent should be invoked in `ask` mode, with NO session id
|
||||
// (since no fork happened). The detail ref must be passed for context injection.
|
||||
const mode = await readFile(modeCapture, "utf8");
|
||||
expect(mode).toBe("ask");
|
||||
const session = await readFile(sessionCapture, "utf8");
|
||||
expect(session).toBe("");
|
||||
|
||||
// Make sure mockAgentPath's mock never ran.
|
||||
void mockAgentPath;
|
||||
});
|
||||
|
||||
test("5.2 fallback ask still does NOT mutate thread state", async () => {
|
||||
const { casDir, stepHash } = await setupAskFixture();
|
||||
|
||||
const fallbackPath = join(tmpDir, "fallback-agent.sh");
|
||||
await writeFile(
|
||||
fallbackPath,
|
||||
`#!/bin/sh
|
||||
mode=""
|
||||
prompt=""
|
||||
while [ $# -gt 0 ]; do
|
||||
case "$1" in
|
||||
--mode) mode="$2"; shift 2 ;;
|
||||
--prompt) prompt="$2"; shift 2 ;;
|
||||
*) shift ;;
|
||||
esac
|
||||
done
|
||||
case "$mode" in
|
||||
fork) echo "fork not supported" >&2; exit 99 ;;
|
||||
ask) printf 'OK %s\\n' "$prompt" ;;
|
||||
*) exit 1 ;;
|
||||
esac
|
||||
`,
|
||||
{ mode: 0o755 },
|
||||
);
|
||||
|
||||
const { createUwfStore, getThread } = await import("../store.js");
|
||||
const before = await createUwfStore(tmpDir);
|
||||
const beforeEntry = getThread(before.varStore, THREAD_ID);
|
||||
|
||||
const result = runUwf(
|
||||
["step", "ask", stepHash, "-p", "any", "--agent", fallbackPath, "--no-fork"],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
|
||||
const after = await createUwfStore(tmpDir);
|
||||
const afterEntry = getThread(after.varStore, THREAD_ID);
|
||||
expect(afterEntry?.head).toBe(beforeEntry?.head);
|
||||
expect(afterEntry?.status).toBe(beforeEntry?.status);
|
||||
});
|
||||
});
|
||||
|
||||
// ── Group 6: Agent resolution ─────────────────────────────────────────────
|
||||
|
||||
describe("uwf step ask - agent resolution", () => {
|
||||
test("6.1 without --agent flag, agent is resolved from step's agent field", async () => {
|
||||
// Step's agent field points at mockAgentPath by default.
|
||||
const { casDir, stepHash, modeCapturePath, promptCapturePath } = await setupAskFixture();
|
||||
const result = runUwf(["step", "ask", stepHash, "-p", "explain"], casDir);
|
||||
expect(result.status).toBe(0);
|
||||
|
||||
// The mockAgentPath must have been invoked in ask mode with the user prompt.
|
||||
const mode = await readFile(modeCapturePath, "utf8");
|
||||
expect(mode).toBe("ask");
|
||||
const captured = await readFile(promptCapturePath, "utf8");
|
||||
expect(captured).toBe("explain");
|
||||
});
|
||||
|
||||
test("6.2 --agent override beats step's recorded agent", async () => {
|
||||
// Record a non-existent agent in step.agent. Provide a working one via --agent.
|
||||
const { casDir, stepHash, mockAgentPath } = await setupAskFixture({
|
||||
stepAgentNameOverride: "uwf-does-not-exist",
|
||||
});
|
||||
const result = runUwf(
|
||||
["step", "ask", stepHash, "-p", "explain", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
expect(result.stdout).toContain("MOCK_ANSWER");
|
||||
});
|
||||
});
|
||||
@@ -167,7 +167,7 @@ describe("cmdThreadList status filter", () => {
|
||||
expect(result[0]?.status).toBe("completed");
|
||||
});
|
||||
|
||||
test("should return all threads when no status filter provided", async () => {
|
||||
test("should return only active threads when no filter and no --all", async () => {
|
||||
const uwf = await makeUwfStore(tmpDir);
|
||||
const workflowHash = await createTestWorkflow(uwf);
|
||||
|
||||
@@ -185,8 +185,290 @@ describe("cmdThreadList status filter", () => {
|
||||
|
||||
const result = await cmdThreadList(tmpDir, null, null, null, null, null);
|
||||
|
||||
// Default behavior (issue #147): only active threads (idle + running)
|
||||
expect(result).toHaveLength(2);
|
||||
expect(result.map((r) => r.thread).sort()).toEqual([thread1, thread2].sort());
|
||||
|
||||
// Clean up marker
|
||||
await deleteMarker(tmpDir, thread2);
|
||||
});
|
||||
|
||||
test("should return all threads when --all (showAll=true)", async () => {
|
||||
const uwf = await makeUwfStore(tmpDir);
|
||||
const workflowHash = await createTestWorkflow(uwf);
|
||||
|
||||
const thread1 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
|
||||
const thread2 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
|
||||
const thread3 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
|
||||
|
||||
await markThreadRunning(tmpDir, thread2, workflowHash);
|
||||
|
||||
const uwfIdx = await createUwfStore(tmpDir);
|
||||
const index = loadAllThreads(uwfIdx.varStore);
|
||||
const thread3Head = index[thread3]!.head;
|
||||
if (thread3Head === undefined) throw new Error("thread3 head not found");
|
||||
await completeThread(tmpDir, thread3, workflowHash, thread3Head);
|
||||
|
||||
const result = await cmdThreadList(tmpDir, null, null, null, null, null, true);
|
||||
|
||||
expect(result).toHaveLength(3);
|
||||
expect(result.map((r) => r.thread).sort()).toEqual([thread1, thread2, thread3].sort());
|
||||
|
||||
// Clean up marker
|
||||
await deleteMarker(tmpDir, thread2);
|
||||
});
|
||||
});
|
||||
|
||||
// ── default behavior tests (issue #147) ───────────────────────────────────────
|
||||
|
||||
describe("cmdThreadList default behavior (issue #147)", () => {
|
||||
test("default returns only idle + running threads", async () => {
|
||||
const uwf = await makeUwfStore(tmpDir);
|
||||
const workflowHash = await createTestWorkflow(uwf);
|
||||
|
||||
const threadA = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 4000);
|
||||
const threadB = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
|
||||
const threadC = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
|
||||
const threadD = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
|
||||
|
||||
await markThreadRunning(tmpDir, threadB, workflowHash);
|
||||
|
||||
const uwfIdx = await createUwfStore(tmpDir);
|
||||
const index = loadAllThreads(uwfIdx.varStore);
|
||||
const threadCHead = index[threadC]!.head;
|
||||
if (threadCHead === undefined) throw new Error("threadC head not found");
|
||||
await completeThread(tmpDir, threadC, workflowHash, threadCHead);
|
||||
|
||||
// Cancel threadD
|
||||
const threadDHead = index[threadD]!.head;
|
||||
if (threadDHead === undefined) throw new Error("threadD head not found");
|
||||
const uwfCancel = await createUwfStore(tmpDir);
|
||||
completeThreadInStore(uwfCancel.varStore, threadD, "cancelled");
|
||||
|
||||
const result = await cmdThreadList(tmpDir, null, null, null, null, null);
|
||||
|
||||
expect(result).toHaveLength(2);
|
||||
expect(result.map((r) => r.thread).sort()).toEqual([threadA, threadB].sort());
|
||||
|
||||
await deleteMarker(tmpDir, threadB);
|
||||
});
|
||||
|
||||
test("default excludes completed threads", async () => {
|
||||
const uwf = await makeUwfStore(tmpDir);
|
||||
const workflowHash = await createTestWorkflow(uwf);
|
||||
|
||||
const idleThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 6000);
|
||||
const completedThreads: ThreadId[] = [];
|
||||
for (let i = 0; i < 5; i++) {
|
||||
const t = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - (5 - i) * 1000);
|
||||
completedThreads.push(t);
|
||||
const uwfIdx = await createUwfStore(tmpDir);
|
||||
const index = loadAllThreads(uwfIdx.varStore);
|
||||
const head = index[t]!.head;
|
||||
if (head === undefined) throw new Error("head not found");
|
||||
await completeThread(tmpDir, t, workflowHash, head);
|
||||
}
|
||||
|
||||
const result = await cmdThreadList(tmpDir, null, null, null, null, null);
|
||||
|
||||
expect(result).toHaveLength(1);
|
||||
expect(result[0]?.thread).toBe(idleThread);
|
||||
});
|
||||
|
||||
test("default excludes cancelled threads", async () => {
|
||||
const uwf = await makeUwfStore(tmpDir);
|
||||
const workflowHash = await createTestWorkflow(uwf);
|
||||
|
||||
const runningThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 4000);
|
||||
await markThreadRunning(tmpDir, runningThread, workflowHash);
|
||||
|
||||
const cancelled: ThreadId[] = [];
|
||||
for (let i = 0; i < 3; i++) {
|
||||
const t = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - (3 - i) * 1000);
|
||||
cancelled.push(t);
|
||||
const uwfIdx = await createUwfStore(tmpDir);
|
||||
completeThreadInStore(uwfIdx.varStore, t, "cancelled");
|
||||
}
|
||||
|
||||
const result = await cmdThreadList(tmpDir, null, null, null, null, null);
|
||||
|
||||
expect(result).toHaveLength(1);
|
||||
expect(result[0]?.thread).toBe(runningThread);
|
||||
|
||||
await deleteMarker(tmpDir, runningThread);
|
||||
});
|
||||
|
||||
test("--all (showAll=true) returns every status", async () => {
|
||||
const uwf = await makeUwfStore(tmpDir);
|
||||
const workflowHash = await createTestWorkflow(uwf);
|
||||
|
||||
const idleThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 4000);
|
||||
const runningThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
|
||||
await markThreadRunning(tmpDir, runningThread, workflowHash);
|
||||
|
||||
const completedThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
|
||||
const uwfIdx = await createUwfStore(tmpDir);
|
||||
const idx = loadAllThreads(uwfIdx.varStore);
|
||||
const ch = idx[completedThread]!.head;
|
||||
if (ch === undefined) throw new Error("completedThread head not found");
|
||||
await completeThread(tmpDir, completedThread, workflowHash, ch);
|
||||
|
||||
const cancelledThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
|
||||
completeThreadInStore(uwfIdx.varStore, cancelledThread, "cancelled");
|
||||
|
||||
const result = await cmdThreadList(tmpDir, null, null, null, null, null, true);
|
||||
|
||||
expect(result).toHaveLength(4);
|
||||
expect(result.map((r) => r.thread).sort()).toEqual(
|
||||
[idleThread, runningThread, completedThread, cancelledThread].sort(),
|
||||
);
|
||||
|
||||
await deleteMarker(tmpDir, runningThread);
|
||||
});
|
||||
|
||||
test("explicit --status overrides default (still returns just the filtered statuses)", async () => {
|
||||
const uwf = await makeUwfStore(tmpDir);
|
||||
const workflowHash = await createTestWorkflow(uwf);
|
||||
|
||||
const _idleThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
|
||||
const runningThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
|
||||
await markThreadRunning(tmpDir, runningThread, workflowHash);
|
||||
|
||||
const completedThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
|
||||
const uwfIdx = await createUwfStore(tmpDir);
|
||||
const idx = loadAllThreads(uwfIdx.varStore);
|
||||
const ch = idx[completedThread]!.head;
|
||||
if (ch === undefined) throw new Error("completedThread head not found");
|
||||
await completeThread(tmpDir, completedThread, workflowHash, ch);
|
||||
|
||||
const result = await cmdThreadList(tmpDir, ["completed"], null, null, null, null);
|
||||
|
||||
expect(result).toHaveLength(1);
|
||||
expect(result[0]?.thread).toBe(completedThread);
|
||||
expect(result[0]?.status).toBe("completed");
|
||||
|
||||
await deleteMarker(tmpDir, runningThread);
|
||||
});
|
||||
|
||||
test("--status active keeps working", async () => {
|
||||
const uwf = await makeUwfStore(tmpDir);
|
||||
const workflowHash = await createTestWorkflow(uwf);
|
||||
|
||||
const idleThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
|
||||
const runningThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
|
||||
await markThreadRunning(tmpDir, runningThread, workflowHash);
|
||||
|
||||
const completedThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
|
||||
const uwfIdx = await createUwfStore(tmpDir);
|
||||
const idx = loadAllThreads(uwfIdx.varStore);
|
||||
const ch = idx[completedThread]!.head;
|
||||
if (ch === undefined) throw new Error("completedThread head not found");
|
||||
await completeThread(tmpDir, completedThread, workflowHash, ch);
|
||||
|
||||
const result = await cmdThreadList(tmpDir, ["idle", "running"], null, null, null, null);
|
||||
|
||||
expect(result).toHaveLength(2);
|
||||
expect(result.map((r) => r.thread).sort()).toEqual([idleThread, runningThread].sort());
|
||||
|
||||
await deleteMarker(tmpDir, runningThread);
|
||||
});
|
||||
|
||||
test("--status + --all — explicit status wins", async () => {
|
||||
const uwf = await makeUwfStore(tmpDir);
|
||||
const workflowHash = await createTestWorkflow(uwf);
|
||||
|
||||
const _idleThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
|
||||
const runningThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
|
||||
await markThreadRunning(tmpDir, runningThread, workflowHash);
|
||||
|
||||
const completedThread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
|
||||
const uwfIdx = await createUwfStore(tmpDir);
|
||||
const idx = loadAllThreads(uwfIdx.varStore);
|
||||
const ch = idx[completedThread]!.head;
|
||||
if (ch === undefined) throw new Error("completedThread head not found");
|
||||
await completeThread(tmpDir, completedThread, workflowHash, ch);
|
||||
|
||||
const result = await cmdThreadList(tmpDir, ["completed"], null, null, null, null, true);
|
||||
|
||||
expect(result).toHaveLength(1);
|
||||
expect(result[0]?.thread).toBe(completedThread);
|
||||
|
||||
await deleteMarker(tmpDir, runningThread);
|
||||
});
|
||||
|
||||
test("default returns empty when no threads", async () => {
|
||||
await makeUwfStore(tmpDir);
|
||||
|
||||
const result = await cmdThreadList(tmpDir, null, null, null, null, null);
|
||||
|
||||
expect(result).toHaveLength(0);
|
||||
});
|
||||
|
||||
test("default + time range filter composes correctly", async () => {
|
||||
const uwf = await makeUwfStore(tmpDir);
|
||||
const workflowHash = await createTestWorkflow(uwf);
|
||||
|
||||
const ts1 = Date.UTC(2026, 4, 20, 0, 0, 0);
|
||||
const ts2 = Date.UTC(2026, 4, 21, 0, 0, 0);
|
||||
const ts3 = Date.UTC(2026, 4, 22, 0, 0, 0);
|
||||
const ts4 = Date.UTC(2026, 4, 23, 0, 0, 0);
|
||||
const ts5 = Date.UTC(2026, 4, 24, 0, 0, 0);
|
||||
|
||||
const _t1 = await createTestThread(uwf, tmpDir, workflowHash, ts1);
|
||||
const t2 = await createTestThread(uwf, tmpDir, workflowHash, ts2);
|
||||
const t3 = await createTestThread(uwf, tmpDir, workflowHash, ts3);
|
||||
const t4 = await createTestThread(uwf, tmpDir, workflowHash, ts4);
|
||||
const _t5 = await createTestThread(uwf, tmpDir, workflowHash, ts5);
|
||||
|
||||
// Mark t3 running
|
||||
await markThreadRunning(tmpDir, t3, workflowHash);
|
||||
|
||||
// Complete t4 (should be excluded by default)
|
||||
const uwfIdx = await createUwfStore(tmpDir);
|
||||
const idx = loadAllThreads(uwfIdx.varStore);
|
||||
const t4head = idx[t4]!.head;
|
||||
if (t4head === undefined) throw new Error("t4 head not found");
|
||||
await completeThread(tmpDir, t4, workflowHash, t4head);
|
||||
|
||||
// afterMs in middle of range to exclude _t1
|
||||
const afterMs = Date.UTC(2026, 4, 20, 12, 0, 0);
|
||||
const result = await cmdThreadList(tmpDir, null, afterMs, null, null, null);
|
||||
|
||||
// Expected: t2 (idle), t3 (running), _t5 (idle); excludes t4 (completed) and _t1 (filtered by time)
|
||||
expect(result).toHaveLength(3);
|
||||
const ids = result.map((r) => r.thread).sort();
|
||||
expect(ids).toEqual([t2, t3, _t5].sort());
|
||||
|
||||
await deleteMarker(tmpDir, t3);
|
||||
});
|
||||
|
||||
test("default + pagination composes correctly", async () => {
|
||||
const uwf = await makeUwfStore(tmpDir);
|
||||
const workflowHash = await createTestWorkflow(uwf);
|
||||
|
||||
// Create 10 idle threads + 5 completed threads
|
||||
const idleThreads: ThreadId[] = [];
|
||||
for (let i = 0; i < 10; i++) {
|
||||
idleThreads.push(
|
||||
await createTestThread(uwf, tmpDir, workflowHash, Date.now() - (15 - i) * 1000),
|
||||
);
|
||||
}
|
||||
for (let i = 0; i < 5; i++) {
|
||||
const t = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - (5 - i) * 1000);
|
||||
const uwfIdx = await createUwfStore(tmpDir);
|
||||
const idx = loadAllThreads(uwfIdx.varStore);
|
||||
const head = idx[t]!.head;
|
||||
if (head === undefined) throw new Error("head not found");
|
||||
await completeThread(tmpDir, t, workflowHash, head);
|
||||
}
|
||||
|
||||
const result = await cmdThreadList(tmpDir, null, null, null, 2, 3);
|
||||
|
||||
expect(result).toHaveLength(3);
|
||||
// All results should be idle (default excludes completed)
|
||||
for (const r of result) {
|
||||
expect(r.status).toBe("idle");
|
||||
}
|
||||
});
|
||||
});
|
||||
|
||||
|
||||
@@ -0,0 +1,549 @@
|
||||
import { execFileSync } from "node:child_process";
|
||||
import { mkdir, mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
|
||||
import { tmpdir } from "node:os";
|
||||
import { dirname, join } from "node:path";
|
||||
import { fileURLToPath } from "node:url";
|
||||
import { putSchema } from "@ocas/core";
|
||||
import { openStore } from "@ocas/fs";
|
||||
import type {
|
||||
CasRef,
|
||||
StepNodePayload,
|
||||
ThreadId,
|
||||
ThreadIndexEntry,
|
||||
} from "@united-workforce/protocol";
|
||||
import { afterEach, beforeEach, describe, expect, test } from "vitest";
|
||||
import { registerUwfSchemas } from "../schemas.js";
|
||||
import { seedThreads } from "./thread-test-helpers.js";
|
||||
|
||||
const OUTPUT_SCHEMA = {
|
||||
type: "object" as const,
|
||||
properties: {
|
||||
$status: { type: "string" as const },
|
||||
note: { type: "string" as const },
|
||||
},
|
||||
required: ["$status"],
|
||||
additionalProperties: false,
|
||||
};
|
||||
|
||||
const THREAD_ID = "01POKESTEPTEST00000000" as ThreadId;
|
||||
|
||||
let tmpDir: string;
|
||||
|
||||
beforeEach(async () => {
|
||||
tmpDir = await mkdtemp(join(tmpdir(), "cli-uwf-poke-test-"));
|
||||
});
|
||||
|
||||
afterEach(async () => {
|
||||
await rm(tmpDir, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
type SetupResult = {
|
||||
casDir: string;
|
||||
oldStepHash: CasRef;
|
||||
oldStepPrev: CasRef | null;
|
||||
oldStepCompletedAtMs: number;
|
||||
startHash: CasRef;
|
||||
workflowHash: CasRef;
|
||||
mockAgentPath: string;
|
||||
failingAgentPath: string;
|
||||
promptCapturePath: string;
|
||||
envCapturePath: string;
|
||||
};
|
||||
|
||||
type SetupOpts = {
|
||||
threadStatus: ThreadIndexEntry["status"];
|
||||
multipleSteps: boolean;
|
||||
newCompletedAtMs: number;
|
||||
newStatus: string;
|
||||
// The agent name to record in the head StepNode.agent field. Defaults to mockAgentPath.
|
||||
stepAgentNameOverride: string | null;
|
||||
// Whether to seed an actual head StepNode (false → only StartNode is the head).
|
||||
withHeadStep: boolean;
|
||||
};
|
||||
|
||||
async function setupThread(opts: Partial<SetupOpts> = {}): Promise<SetupResult> {
|
||||
const cfg: SetupOpts = {
|
||||
threadStatus: opts.threadStatus ?? "idle",
|
||||
multipleSteps: opts.multipleSteps ?? false,
|
||||
newCompletedAtMs: opts.newCompletedAtMs ?? 1716600005000,
|
||||
newStatus: opts.newStatus ?? "ok",
|
||||
stepAgentNameOverride: opts.stepAgentNameOverride ?? null,
|
||||
withHeadStep: opts.withHeadStep ?? true,
|
||||
};
|
||||
|
||||
const casDir = join(tmpDir, "cas");
|
||||
await mkdir(casDir, { recursive: true });
|
||||
|
||||
const store = await openStore(casDir);
|
||||
const schemas = await registerUwfSchemas(store);
|
||||
const outputSchemaHash = await putSchema(store, OUTPUT_SCHEMA);
|
||||
|
||||
const workflowHash = await store.cas.put(schemas.workflow, {
|
||||
name: "test-poke",
|
||||
description: "poke command integration test",
|
||||
roles: {
|
||||
worker: {
|
||||
description: "Worker role",
|
||||
goal: "Work",
|
||||
capabilities: [],
|
||||
procedure: "work",
|
||||
output: "result",
|
||||
frontmatter: outputSchemaHash,
|
||||
},
|
||||
reviewer: {
|
||||
description: "Reviewer role",
|
||||
goal: "Review",
|
||||
capabilities: [],
|
||||
procedure: "review",
|
||||
output: "result",
|
||||
frontmatter: outputSchemaHash,
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "Start work", location: null },
|
||||
resume: { role: "worker", prompt: "Resume the work", location: null },
|
||||
},
|
||||
worker: {
|
||||
ok: { role: "reviewer", prompt: "Review the work", location: null },
|
||||
needs_input: {
|
||||
role: "$SUSPEND",
|
||||
prompt: "Please clarify",
|
||||
location: null,
|
||||
},
|
||||
},
|
||||
reviewer: { done: { role: "$END", prompt: "Done", location: null } },
|
||||
},
|
||||
});
|
||||
|
||||
const startHash = await store.cas.put(schemas.startNode, {
|
||||
workflow: workflowHash,
|
||||
prompt: "Test poke task",
|
||||
cwd: tmpDir,
|
||||
});
|
||||
|
||||
process.env.OCAS_HOME = casDir;
|
||||
|
||||
// Paths for mock agent and capture files (set early so we can use mockAgentPath as the recorded agent name)
|
||||
const promptCapturePath = join(tmpDir, "captured-prompt.txt");
|
||||
const envCapturePath = join(tmpDir, "captured-env.txt");
|
||||
const mockAgentPath = join(tmpDir, "mock-agent.sh");
|
||||
const failingAgentPath = join(tmpDir, "failing-agent.sh");
|
||||
|
||||
// Build head StepNode chain
|
||||
let oldStepPrev: CasRef | null = null;
|
||||
if (cfg.multipleSteps) {
|
||||
// First step: prev=null
|
||||
const firstOutputHash = await store.cas.put(outputSchemaHash, { $status: "ok" });
|
||||
const firstDetailHash = await store.cas.put(schemas.text, "first detail");
|
||||
const firstStepHash = await store.cas.put(schemas.stepNode, {
|
||||
start: startHash,
|
||||
prev: null,
|
||||
role: "worker",
|
||||
output: firstOutputHash,
|
||||
detail: firstDetailHash,
|
||||
agent: cfg.stepAgentNameOverride ?? mockAgentPath,
|
||||
edgePrompt: "Start work",
|
||||
startedAtMs: 1716600000000,
|
||||
completedAtMs: 1716600001000,
|
||||
cwd: tmpDir,
|
||||
assembledPrompt: null,
|
||||
usage: null,
|
||||
});
|
||||
oldStepPrev = firstStepHash;
|
||||
}
|
||||
|
||||
let oldStepHash: CasRef = startHash;
|
||||
const oldStepCompletedAtMs = 1716600002000;
|
||||
if (cfg.withHeadStep) {
|
||||
const outputHash = await store.cas.put(outputSchemaHash, { $status: "ok" });
|
||||
const detailHash = await store.cas.put(schemas.text, "head step detail");
|
||||
oldStepHash = await store.cas.put(schemas.stepNode, {
|
||||
start: startHash,
|
||||
prev: oldStepPrev,
|
||||
role: "worker",
|
||||
output: outputHash,
|
||||
detail: detailHash,
|
||||
agent: cfg.stepAgentNameOverride ?? mockAgentPath,
|
||||
edgePrompt: "Start work",
|
||||
startedAtMs: 1716600001500,
|
||||
completedAtMs: oldStepCompletedAtMs,
|
||||
cwd: tmpDir,
|
||||
assembledPrompt: null,
|
||||
usage: null,
|
||||
});
|
||||
}
|
||||
|
||||
// Seed thread index entry. For "running" we let the test create the marker separately.
|
||||
await seedThreads(tmpDir, {
|
||||
[THREAD_ID]: {
|
||||
head: oldStepHash,
|
||||
status: cfg.threadStatus,
|
||||
suspendedRole: cfg.threadStatus === "suspended" ? "worker" : null,
|
||||
suspendMessage: cfg.threadStatus === "suspended" ? "Please clarify" : null,
|
||||
completedAt:
|
||||
cfg.threadStatus === "completed" || cfg.threadStatus === "cancelled"
|
||||
? oldStepCompletedAtMs
|
||||
: null,
|
||||
},
|
||||
});
|
||||
|
||||
// Mock agent always emits a stepNode keyed off the current thread head (which we
|
||||
// observe through OCAS_HOME). The script writes prompt/env captures and then prints
|
||||
// an adapter JSON that references a pre-built stepHash.
|
||||
// We pre-build the agent's stepHash with prev=oldStepHash (normal append behaviour).
|
||||
const newOutputHash = await store.cas.put(outputSchemaHash, {
|
||||
$status: cfg.newStatus,
|
||||
note: "poked output",
|
||||
});
|
||||
const newDetailHash = await store.cas.put(schemas.text, "poked detail");
|
||||
const agentStepHash = await store.cas.put(schemas.stepNode, {
|
||||
start: startHash,
|
||||
prev: cfg.withHeadStep ? oldStepHash : null,
|
||||
role: "worker",
|
||||
output: newOutputHash,
|
||||
detail: newDetailHash,
|
||||
agent: "mock-agent-output",
|
||||
edgePrompt: "poke prompt placeholder",
|
||||
startedAtMs: cfg.newCompletedAtMs - 100,
|
||||
completedAtMs: cfg.newCompletedAtMs,
|
||||
cwd: tmpDir,
|
||||
assembledPrompt: null,
|
||||
usage: null,
|
||||
});
|
||||
|
||||
const adapterJson = JSON.stringify({
|
||||
stepHash: agentStepHash,
|
||||
detailHash: newDetailHash,
|
||||
role: "worker",
|
||||
frontmatter: { $status: cfg.newStatus, note: "poked output" },
|
||||
body: "",
|
||||
startedAtMs: cfg.newCompletedAtMs - 100,
|
||||
completedAtMs: cfg.newCompletedAtMs,
|
||||
usage: null,
|
||||
});
|
||||
|
||||
await writeFile(
|
||||
mockAgentPath,
|
||||
`#!/bin/sh
|
||||
prompt=""
|
||||
while [ $# -gt 0 ]; do
|
||||
if [ "$1" = "--prompt" ]; then
|
||||
prompt="$2"
|
||||
shift 2
|
||||
else
|
||||
shift
|
||||
fi
|
||||
done
|
||||
printf '%s' "$prompt" > '${promptCapturePath}'
|
||||
printf 'OCAS_HOME=%s\\n' "$OCAS_HOME" > '${envCapturePath}'
|
||||
echo '${adapterJson}'
|
||||
`,
|
||||
{ mode: 0o755 },
|
||||
);
|
||||
|
||||
await writeFile(
|
||||
failingAgentPath,
|
||||
`#!/bin/sh
|
||||
echo "boom" >&2
|
||||
exit 7
|
||||
`,
|
||||
{ mode: 0o755 },
|
||||
);
|
||||
|
||||
const configPath = join(tmpDir, "config.yaml");
|
||||
await writeFile(
|
||||
configPath,
|
||||
`defaultAgent: uwf-hermes\ndefaultModel: test-model\nagentOverrides: null\nagents: {}\nproviders: {}\nmodels: {}\n`,
|
||||
);
|
||||
|
||||
return {
|
||||
casDir,
|
||||
oldStepHash,
|
||||
oldStepPrev,
|
||||
oldStepCompletedAtMs,
|
||||
startHash,
|
||||
workflowHash,
|
||||
mockAgentPath,
|
||||
failingAgentPath,
|
||||
promptCapturePath,
|
||||
envCapturePath,
|
||||
};
|
||||
}
|
||||
|
||||
function runUwf(
|
||||
args: string[],
|
||||
casDir: string,
|
||||
): { stdout: string; stderr: string; status: number } {
|
||||
const cliPath = join(dirname(fileURLToPath(import.meta.url)), "..", "..", "dist", "cli.js");
|
||||
try {
|
||||
const stdout = execFileSync(process.execPath, [cliPath, ...args], {
|
||||
encoding: "utf8",
|
||||
stdio: ["ignore", "pipe", "pipe"],
|
||||
env: {
|
||||
...process.env,
|
||||
UWF_HOME: tmpDir,
|
||||
OCAS_HOME: casDir,
|
||||
},
|
||||
cwd: tmpDir,
|
||||
timeout: 30000,
|
||||
});
|
||||
return { stdout, stderr: "", status: 0 };
|
||||
} catch (error) {
|
||||
const err = error as NodeJS.ErrnoException & {
|
||||
stdout?: string | Buffer;
|
||||
stderr?: string | Buffer;
|
||||
status?: number;
|
||||
};
|
||||
return {
|
||||
stdout: typeof err.stdout === "string" ? err.stdout : (err.stdout?.toString("utf8") ?? ""),
|
||||
stderr: typeof err.stderr === "string" ? err.stderr : (err.stderr?.toString("utf8") ?? ""),
|
||||
status: err.status ?? 1,
|
||||
};
|
||||
}
|
||||
}
|
||||
|
||||
// ── Group 1: CLI argument validation ───────────────────────────────────────
|
||||
|
||||
describe("uwf thread poke - CLI argument validation", () => {
|
||||
test("1.1 missing -p flag exits non-zero", async () => {
|
||||
const { casDir } = await setupThread();
|
||||
const result = runUwf(["thread", "poke", THREAD_ID], casDir);
|
||||
expect(result.status).not.toBe(0);
|
||||
expect(result.stderr.toLowerCase()).toMatch(/required|missing|prompt/);
|
||||
});
|
||||
|
||||
test("1.2 -p without --agent succeeds", async () => {
|
||||
const { casDir } = await setupThread();
|
||||
const result = runUwf(["thread", "poke", THREAD_ID, "-p", "do it again"], casDir);
|
||||
expect(result.status).toBe(0);
|
||||
});
|
||||
|
||||
test("1.3 -p with --agent succeeds", async () => {
|
||||
const { casDir, mockAgentPath } = await setupThread();
|
||||
const result = runUwf(
|
||||
["thread", "poke", THREAD_ID, "-p", "do it again", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
});
|
||||
});
|
||||
|
||||
// ── Group 2: Guard errors ──────────────────────────────────────────────────
|
||||
|
||||
describe("uwf thread poke - guard errors", () => {
|
||||
test("2.1 thread not found", async () => {
|
||||
const { casDir } = await setupThread();
|
||||
const result = runUwf(["thread", "poke", "01NOSUCHTHREAD0000000A", "-p", "prompt"], casDir);
|
||||
expect(result.status).not.toBe(0);
|
||||
expect(result.stderr.toLowerCase()).toMatch(/not found|not active/);
|
||||
});
|
||||
|
||||
test("2.2 thread running rejects poke", async () => {
|
||||
const { casDir, workflowHash } = await setupThread();
|
||||
// Create background marker to simulate running
|
||||
const { createMarker } = await import("../background/index.js");
|
||||
await createMarker(tmpDir, {
|
||||
thread: THREAD_ID,
|
||||
workflow: workflowHash,
|
||||
pid: process.pid,
|
||||
startedAt: Date.now(),
|
||||
});
|
||||
|
||||
const result = runUwf(["thread", "poke", THREAD_ID, "-p", "prompt"], casDir);
|
||||
expect(result.status).not.toBe(0);
|
||||
expect(result.stderr.toLowerCase()).toContain("already executing");
|
||||
});
|
||||
|
||||
test("2.3 completed thread rejects poke", async () => {
|
||||
const { casDir } = await setupThread({ threadStatus: "completed" });
|
||||
const result = runUwf(["thread", "poke", THREAD_ID, "-p", "prompt"], casDir);
|
||||
expect(result.status).not.toBe(0);
|
||||
expect(result.stderr.toLowerCase()).toMatch(/cannot be poked|completed/);
|
||||
});
|
||||
|
||||
test("2.4 cancelled thread rejects poke", async () => {
|
||||
const { casDir } = await setupThread({ threadStatus: "cancelled" });
|
||||
const result = runUwf(["thread", "poke", THREAD_ID, "-p", "prompt"], casDir);
|
||||
expect(result.status).not.toBe(0);
|
||||
expect(result.stderr.toLowerCase()).toMatch(/cannot be poked|cancelled/);
|
||||
});
|
||||
|
||||
test("2.5 thread head is StartNode (no StepNode) rejects poke", async () => {
|
||||
const { casDir } = await setupThread({ withHeadStep: false });
|
||||
const result = runUwf(["thread", "poke", THREAD_ID, "-p", "prompt"], casDir);
|
||||
expect(result.status).not.toBe(0);
|
||||
expect(result.stderr.toLowerCase()).toMatch(/no step|cannot be poked/);
|
||||
});
|
||||
});
|
||||
|
||||
// ── Group 3: Success happy path ────────────────────────────────────────────
|
||||
|
||||
describe("uwf thread poke - success", () => {
|
||||
test("3.1, 3.4 idle thread → new head differs from old, thread index updated", async () => {
|
||||
const { casDir, oldStepHash, mockAgentPath } = await setupThread();
|
||||
const result = runUwf(
|
||||
["thread", "poke", THREAD_ID, "-p", "redo", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
const cliOutput = JSON.parse(result.stdout.trim());
|
||||
expect(cliOutput.head).not.toBe(oldStepHash);
|
||||
|
||||
const { createUwfStore, getThread } = await import("../store.js");
|
||||
const uwf = await createUwfStore(tmpDir);
|
||||
const entry = getThread(uwf.varStore, THREAD_ID);
|
||||
expect(entry?.head).toBe(cliOutput.head);
|
||||
});
|
||||
|
||||
test("3.2 new step's prev equals old head's prev (replace, not append)", async () => {
|
||||
const { casDir, oldStepPrev, mockAgentPath } = await setupThread({ multipleSteps: true });
|
||||
const result = runUwf(
|
||||
["thread", "poke", THREAD_ID, "-p", "redo", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
const cliOutput = JSON.parse(result.stdout.trim());
|
||||
|
||||
const { createUwfStore } = await import("../store.js");
|
||||
const uwf = await createUwfStore(tmpDir);
|
||||
const node = uwf.store.cas.get(cliOutput.head as CasRef);
|
||||
expect(node).not.toBeNull();
|
||||
expect(node?.type).toBe(uwf.schemas.stepNode);
|
||||
const payload = node?.payload as StepNodePayload;
|
||||
expect(payload.prev).toBe(oldStepPrev);
|
||||
});
|
||||
|
||||
test("3.2b new step's prev is null when old head was the first step", async () => {
|
||||
// multipleSteps:false means oldHead.prev = null
|
||||
const { casDir, mockAgentPath } = await setupThread({ multipleSteps: false });
|
||||
const result = runUwf(
|
||||
["thread", "poke", THREAD_ID, "-p", "redo", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
const cliOutput = JSON.parse(result.stdout.trim());
|
||||
|
||||
const { createUwfStore } = await import("../store.js");
|
||||
const uwf = await createUwfStore(tmpDir);
|
||||
const node = uwf.store.cas.get(cliOutput.head as CasRef);
|
||||
const payload = node?.payload as StepNodePayload;
|
||||
expect(payload.prev).toBeNull();
|
||||
});
|
||||
|
||||
test("3.3 new step's completedAtMs is later than old", async () => {
|
||||
const { casDir, oldStepCompletedAtMs, mockAgentPath } = await setupThread();
|
||||
const result = runUwf(
|
||||
["thread", "poke", THREAD_ID, "-p", "redo", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
const cliOutput = JSON.parse(result.stdout.trim());
|
||||
|
||||
const { createUwfStore } = await import("../store.js");
|
||||
const uwf = await createUwfStore(tmpDir);
|
||||
const node = uwf.store.cas.get(cliOutput.head as CasRef);
|
||||
const payload = node?.payload as StepNodePayload;
|
||||
expect(payload.completedAtMs).toBeGreaterThan(oldStepCompletedAtMs);
|
||||
});
|
||||
|
||||
test("3.5 status remains idle after poke (no completion/suspend)", async () => {
|
||||
const { casDir, mockAgentPath } = await setupThread();
|
||||
const result = runUwf(
|
||||
["thread", "poke", THREAD_ID, "-p", "redo", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
const cliOutput = JSON.parse(result.stdout.trim());
|
||||
expect(cliOutput.status).toBe("idle");
|
||||
expect(cliOutput.done).toBe(false);
|
||||
expect(cliOutput.suspendedRole).toBeNull();
|
||||
expect(cliOutput.suspendMessage).toBeNull();
|
||||
});
|
||||
|
||||
test("3.6 currentRole unchanged after poke (no moderator re-route)", async () => {
|
||||
// Before poke: idle thread with worker step having $status=ok → moderator would route to reviewer.
|
||||
// After poke (mock returns same $status=ok), moderator routing remains the same.
|
||||
const { casDir, mockAgentPath } = await setupThread();
|
||||
const result = runUwf(
|
||||
["thread", "poke", THREAD_ID, "-p", "redo", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
const cliOutput = JSON.parse(result.stdout.trim());
|
||||
expect(cliOutput.currentRole).toBe("reviewer");
|
||||
});
|
||||
});
|
||||
|
||||
// ── Group 4: Agent resolution ──────────────────────────────────────────────
|
||||
|
||||
describe("uwf thread poke - agent resolution", () => {
|
||||
test("4.1 without --agent, agent command read from head step's agent field", async () => {
|
||||
// Head step's agent field points at mockAgentPath (default in setupThread)
|
||||
const { casDir, promptCapturePath } = await setupThread();
|
||||
const result = runUwf(["thread", "poke", THREAD_ID, "-p", "redo"], casDir);
|
||||
expect(result.status).toBe(0);
|
||||
const captured = await readFile(promptCapturePath, "utf8");
|
||||
expect(captured).toBe("redo");
|
||||
});
|
||||
|
||||
test("4.2 with --agent, explicit override is used", async () => {
|
||||
// Head step records "uwf-mock" (which is not a real binary). Override with mockAgentPath.
|
||||
const { casDir, mockAgentPath } = await setupThread({ stepAgentNameOverride: "uwf-mock" });
|
||||
const result = runUwf(
|
||||
["thread", "poke", THREAD_ID, "-p", "redo", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
});
|
||||
});
|
||||
|
||||
// ── Group 5: Prompt passthrough ────────────────────────────────────────────
|
||||
|
||||
describe("uwf thread poke - prompt passthrough", () => {
|
||||
test("5.1 -p value is passed to agent as --prompt", async () => {
|
||||
const { casDir, mockAgentPath, promptCapturePath } = await setupThread();
|
||||
const supplement = "Use the REST API instead.";
|
||||
const result = runUwf(
|
||||
["thread", "poke", THREAD_ID, "-p", supplement, "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
const captured = await readFile(promptCapturePath, "utf8");
|
||||
expect(captured).toBe(supplement);
|
||||
});
|
||||
});
|
||||
|
||||
// ── Group 6: Edge cases ────────────────────────────────────────────────────
|
||||
|
||||
describe("uwf thread poke - edge cases", () => {
|
||||
test("6.1 poke succeeds on suspended thread", async () => {
|
||||
const { casDir, oldStepHash, mockAgentPath } = await setupThread({
|
||||
threadStatus: "suspended",
|
||||
});
|
||||
const result = runUwf(
|
||||
["thread", "poke", THREAD_ID, "-p", "redo", "--agent", mockAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).toBe(0);
|
||||
const cliOutput = JSON.parse(result.stdout.trim());
|
||||
expect(cliOutput.head).not.toBe(oldStepHash);
|
||||
expect(cliOutput.status).toBe("idle");
|
||||
expect(cliOutput.suspendedRole).toBeNull();
|
||||
expect(cliOutput.suspendMessage).toBeNull();
|
||||
});
|
||||
|
||||
test("6.2 agent failure leaves thread head unchanged", async () => {
|
||||
const { casDir, oldStepHash, failingAgentPath } = await setupThread();
|
||||
const result = runUwf(
|
||||
["thread", "poke", THREAD_ID, "-p", "redo", "--agent", failingAgentPath],
|
||||
casDir,
|
||||
);
|
||||
expect(result.status).not.toBe(0);
|
||||
|
||||
const { createUwfStore, getThread } = await import("../store.js");
|
||||
const uwf = await createUwfStore(tmpDir);
|
||||
const entry = getThread(uwf.varStore, THREAD_ID);
|
||||
expect(entry?.head).toBe(oldStepHash);
|
||||
});
|
||||
});
|
||||
@@ -118,8 +118,8 @@ describe("suspended thread display", () => {
|
||||
[idleThreadId]: idleEntry,
|
||||
});
|
||||
|
||||
// Test thread list
|
||||
const listResult = await cmdThreadList(tmpDir, null, null, null, null, null);
|
||||
// Test thread list — pass showAll=true to include suspended threads
|
||||
const listResult = await cmdThreadList(tmpDir, null, null, null, null, null, true);
|
||||
|
||||
// Find the suspended and idle threads in results
|
||||
const suspendedItem = listResult.find((item) => item.thread === suspendedThreadId);
|
||||
|
||||
+53
-2
@@ -12,11 +12,12 @@ import {
|
||||
cmdPromptWorkflowAuthoring,
|
||||
} from "./commands/prompt.js";
|
||||
import { cmdSetup, cmdSetupInteractive, resolvePresetBaseUrl } from "./commands/setup.js";
|
||||
import { cmdStepFork, cmdStepList, cmdStepRead, cmdStepShow } from "./commands/step.js";
|
||||
import { cmdStepAsk, cmdStepFork, cmdStepList, cmdStepRead, cmdStepShow } from "./commands/step.js";
|
||||
import {
|
||||
cmdThreadCancel,
|
||||
cmdThreadExec,
|
||||
cmdThreadList,
|
||||
cmdThreadPoke,
|
||||
cmdThreadRead,
|
||||
cmdThreadResume,
|
||||
cmdThreadShow,
|
||||
@@ -232,11 +233,12 @@ function parsePaginationOptions(
|
||||
|
||||
thread
|
||||
.command("list")
|
||||
.description("List threads")
|
||||
.description("List threads (defaults to active: idle + running)")
|
||||
.option(
|
||||
"--status <status>",
|
||||
"Filter by status: idle, running, completed, cancelled, active (idle+running), or comma-separated values",
|
||||
)
|
||||
.option("--all", "Show all threads regardless of status (overrides default active-only filter)")
|
||||
.option("--after <date>", "Filter threads created after this date (ISO or relative like '7d')")
|
||||
.option("--before <date>", "Filter threads created before this date (ISO or relative like '7d')")
|
||||
.option("--skip <n>", "Skip first n threads")
|
||||
@@ -244,6 +246,7 @@ thread
|
||||
.action(
|
||||
(opts: {
|
||||
status: string | undefined;
|
||||
all: boolean | undefined;
|
||||
after: string | undefined;
|
||||
before: string | undefined;
|
||||
skip: string | undefined;
|
||||
@@ -255,6 +258,7 @@ thread
|
||||
const nowMs = Date.now();
|
||||
const { afterMs, beforeMs } = parseTimeFilters(opts.after, opts.before, nowMs);
|
||||
const { skip, take } = parsePaginationOptions(opts.skip, opts.take);
|
||||
const showAll = opts.all === true;
|
||||
|
||||
const result = await cmdThreadList(
|
||||
storageRoot,
|
||||
@@ -263,6 +267,7 @@ thread
|
||||
beforeMs,
|
||||
skip,
|
||||
take,
|
||||
showAll,
|
||||
);
|
||||
writeOutput(result);
|
||||
});
|
||||
@@ -290,6 +295,26 @@ thread
|
||||
});
|
||||
});
|
||||
|
||||
thread
|
||||
.command("poke")
|
||||
.description("Re-run the head step's agent with a supplementary prompt (replaces head step)")
|
||||
.argument("<thread-id>", "Thread ULID")
|
||||
.requiredOption("-p, --prompt <text>", "Supplementary prompt for the agent")
|
||||
.option("--agent <cmd>", "Override agent command (defaults to head step's agent)")
|
||||
.action((threadId: string, opts: { prompt: string; agent: string | undefined }) => {
|
||||
const storageRoot = resolveStorageRoot();
|
||||
runAction(async () => {
|
||||
const agentOverride = opts.agent ?? null;
|
||||
const result = await cmdThreadPoke(
|
||||
storageRoot,
|
||||
threadId as ThreadId,
|
||||
opts.prompt,
|
||||
agentOverride,
|
||||
);
|
||||
writeOutput(result);
|
||||
});
|
||||
});
|
||||
|
||||
thread
|
||||
.command("stop")
|
||||
.description("Stop background execution of a thread (keep thread active)")
|
||||
@@ -369,6 +394,32 @@ step
|
||||
});
|
||||
});
|
||||
|
||||
step
|
||||
.command("ask")
|
||||
.description(
|
||||
"Ask a follow-up question to a historical step's agent (read-only; no thread mutation)",
|
||||
)
|
||||
.argument("<step-hash>", "CAS hash of the StepNode to query")
|
||||
.requiredOption("-p, --prompt <text>", "Question to ask the step's agent")
|
||||
.option("--agent <cmd>", "Override agent command (defaults to the step's recorded agent)")
|
||||
.option(
|
||||
"--no-fork",
|
||||
"Skip session-fork; spawn the agent in a fresh ask session and inject the step's detail ref for context",
|
||||
)
|
||||
.action(
|
||||
(stepHash: string, opts: { prompt: string; agent: string | undefined; fork: boolean }) => {
|
||||
const storageRoot = resolveStorageRoot();
|
||||
runAction(async () => {
|
||||
const stdout = await cmdStepAsk(storageRoot, stepHash as CasRef, {
|
||||
prompt: opts.prompt,
|
||||
agentOverride: opts.agent ?? null,
|
||||
fork: opts.fork,
|
||||
});
|
||||
process.stdout.write(stdout.endsWith("\n") ? stdout : `${stdout}\n`);
|
||||
});
|
||||
},
|
||||
);
|
||||
|
||||
step
|
||||
.command("read")
|
||||
.description("Read a step's turns as human-readable markdown")
|
||||
|
||||
@@ -1,5 +1,8 @@
|
||||
import { execFileSync } from "node:child_process";
|
||||
import type { CasStore } from "@ocas/core";
|
||||
import type {
|
||||
AgentAlias,
|
||||
AgentConfig,
|
||||
CasRef,
|
||||
StartEntry,
|
||||
StepEntry,
|
||||
@@ -7,9 +10,12 @@ import type {
|
||||
ThreadForkOutput,
|
||||
ThreadId,
|
||||
ThreadStepsOutput,
|
||||
WorkflowConfig,
|
||||
WorkflowPayload,
|
||||
} from "@united-workforce/protocol";
|
||||
import { generateUlid } from "@united-workforce/util";
|
||||
import { createUwfStore, setThread } from "../store.js";
|
||||
import { getAskSessionId, loadWorkflowConfig, setAskSessionId } from "@united-workforce/util-agent";
|
||||
import { createUwfStore, setThread, type UwfStore } from "../store.js";
|
||||
import {
|
||||
collectOrderedSteps,
|
||||
expandDeep,
|
||||
@@ -341,3 +347,217 @@ export async function cmdStepRead(
|
||||
|
||||
return formatStepMarkdown(stepHash, payload.role, payload.agent, turnData, selectedTurns);
|
||||
}
|
||||
|
||||
// ── step ask ────────────────────────────────────────────────────────────────
|
||||
|
||||
function parseAgentOverride(override: string): AgentConfig {
|
||||
const parts = override
|
||||
.trim()
|
||||
.split(/\s+/)
|
||||
.filter((p) => p.length > 0);
|
||||
const command = parts[0];
|
||||
if (command === undefined) {
|
||||
fail("agent override must not be empty");
|
||||
}
|
||||
return { command, args: parts.slice(1) };
|
||||
}
|
||||
|
||||
function resolveAskAgentConfig(
|
||||
config: WorkflowConfig,
|
||||
workflow: WorkflowPayload | null,
|
||||
role: string,
|
||||
agentOverride: string | null,
|
||||
recordedAgent: string,
|
||||
): AgentConfig {
|
||||
if (agentOverride !== null) {
|
||||
const fromAlias = config.agents[agentOverride as AgentAlias];
|
||||
if (fromAlias !== undefined) {
|
||||
return fromAlias;
|
||||
}
|
||||
return parseAgentOverride(agentOverride);
|
||||
}
|
||||
|
||||
// Try to resolve via the recorded agent name as a config alias.
|
||||
const fromRecorded = config.agents[recordedAgent as AgentAlias];
|
||||
if (fromRecorded !== undefined) {
|
||||
return fromRecorded;
|
||||
}
|
||||
|
||||
// Fall back to default agent for the workflow / role.
|
||||
if (workflow !== null && config.agentOverrides !== null) {
|
||||
const roleOverrides = config.agentOverrides[workflow.name];
|
||||
if (roleOverrides !== undefined && roleOverrides[role] !== undefined) {
|
||||
const alias = roleOverrides[role];
|
||||
const agentConfig = config.agents[alias];
|
||||
if (agentConfig !== undefined) {
|
||||
return agentConfig;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Treat the recorded value as a raw command path.
|
||||
return parseAgentOverride(recordedAgent);
|
||||
}
|
||||
|
||||
/**
|
||||
* Derive the agent name used for cache file partitioning from an executable
|
||||
* path or alias. Examples:
|
||||
* uwf-hermes → hermes
|
||||
* uwf-claude-code → claude-code
|
||||
* /tmp/mock-agent.sh → mock
|
||||
* /usr/bin/agent → agent
|
||||
*/
|
||||
function deriveAgentName(commandPath: string): string {
|
||||
const basename = commandPath.split(/[/\\]/).pop() ?? commandPath;
|
||||
// Strip a trailing extension (.sh, .js, .mjs, .cjs)
|
||||
const noExt = basename.replace(/\.(sh|js|mjs|cjs|ts)$/i, "");
|
||||
// Strip the `uwf-` prefix introduced by agentLabel().
|
||||
const noPrefix = noExt.startsWith("uwf-") ? noExt.slice(4) : noExt;
|
||||
// Strip the trailing `-agent` suffix used by tests / generic agent shells.
|
||||
const noSuffix = noPrefix.endsWith("-agent") ? noPrefix.slice(0, -"-agent".length) : noPrefix;
|
||||
return noSuffix === "" ? noExt : noSuffix;
|
||||
}
|
||||
|
||||
function loadDetailNode(
|
||||
store: CasStore,
|
||||
detailRef: CasRef,
|
||||
): { sessionId: string | null; payload: Record<string, unknown> } {
|
||||
const detailNode = store.get(detailRef);
|
||||
if (detailNode === null) {
|
||||
fail(`detail node not found: ${detailRef}`);
|
||||
}
|
||||
const payload = detailNode.payload as Record<string, unknown>;
|
||||
const sessionId = typeof payload.sessionId === "string" ? payload.sessionId : null;
|
||||
return { sessionId, payload };
|
||||
}
|
||||
|
||||
function spawnAskAgent(agent: AgentConfig, argv: string[], cwd: string): { stdout: string } {
|
||||
try {
|
||||
const stdout = execFileSync(agent.command, [...agent.args, ...argv], {
|
||||
encoding: "utf8",
|
||||
stdio: ["ignore", "pipe", "pipe"],
|
||||
maxBuffer: 50 * 1024 * 1024,
|
||||
cwd,
|
||||
});
|
||||
return { stdout };
|
||||
} catch (e) {
|
||||
const err = e as NodeJS.ErrnoException & { stderr: Buffer | string | null };
|
||||
if (err.code === "ENOENT") {
|
||||
fail(
|
||||
`"${agent.command}" not found in PATH. Install it or check your PATH config. Run: which ${agent.command}`,
|
||||
);
|
||||
}
|
||||
const stderr =
|
||||
err.stderr == null
|
||||
? ""
|
||||
: typeof err.stderr === "string"
|
||||
? err.stderr
|
||||
: err.stderr.toString("utf8");
|
||||
const detail = stderr.trim() !== "" ? `: ${stderr.trim()}` : "";
|
||||
fail(`agent command failed (${agent.command})${detail}`);
|
||||
}
|
||||
}
|
||||
|
||||
function resolveAskWorkflow(uwf: UwfStore, payload: StepNodePayload): WorkflowPayload | null {
|
||||
const startNode = uwf.store.cas.get(payload.start);
|
||||
if (startNode === null) {
|
||||
return null;
|
||||
}
|
||||
const start = startNode.payload as { workflow: CasRef };
|
||||
const workflowNode = uwf.store.cas.get(start.workflow);
|
||||
if (workflowNode === null) {
|
||||
return null;
|
||||
}
|
||||
return workflowNode.payload as WorkflowPayload;
|
||||
}
|
||||
|
||||
async function performFork(
|
||||
agent: AgentConfig,
|
||||
agentName: string,
|
||||
stepHash: CasRef,
|
||||
sourceSessionId: string,
|
||||
storageRoot: string,
|
||||
cwd: string,
|
||||
): Promise<string> {
|
||||
const cached = await getAskSessionId(agentName, stepHash, storageRoot);
|
||||
if (cached !== null) {
|
||||
return cached;
|
||||
}
|
||||
const { stdout } = spawnAskAgent(agent, ["--mode", "fork", "--session", sourceSessionId], cwd);
|
||||
const newSessionId = stdout.trim().split("\n").pop()?.trim() ?? "";
|
||||
if (newSessionId === "") {
|
||||
fail(`agent fork did not return a session id (${agent.command})`);
|
||||
}
|
||||
await setAskSessionId(agentName, stepHash, newSessionId, storageRoot);
|
||||
return newSessionId;
|
||||
}
|
||||
|
||||
export type CmdStepAskOptions = {
|
||||
prompt: string;
|
||||
agentOverride: string | null;
|
||||
/** When false, skip session forking and pass detail ref for context injection. */
|
||||
fork: boolean;
|
||||
};
|
||||
|
||||
/**
|
||||
* Ask a follow-up question to a historical step's agent (read-only).
|
||||
*
|
||||
* Does NOT write a new StepNode and does NOT mutate thread state. The agent's
|
||||
* raw stdout is returned so the CLI entry point can stream it directly.
|
||||
*/
|
||||
export async function cmdStepAsk(
|
||||
storageRoot: string,
|
||||
stepHash: CasRef,
|
||||
options: CmdStepAskOptions,
|
||||
): Promise<string> {
|
||||
const uwf = await createUwfStore(storageRoot);
|
||||
const node = uwf.store.cas.get(stepHash);
|
||||
if (node === null) {
|
||||
fail(`CAS node not found: ${stepHash}`);
|
||||
}
|
||||
if (node.type !== uwf.schemas.stepNode) {
|
||||
fail(`node ${stepHash} is not a StepNode`);
|
||||
}
|
||||
const payload = node.payload as StepNodePayload;
|
||||
if (payload.detail === null) {
|
||||
fail(`step ${stepHash} has no detail; cannot ask`);
|
||||
}
|
||||
|
||||
const detailRef = payload.detail;
|
||||
const { sessionId: sourceSessionId } = loadDetailNode(uwf.store.cas, detailRef);
|
||||
|
||||
const workflow = resolveAskWorkflow(uwf, payload);
|
||||
const config = await loadWorkflowConfig(storageRoot);
|
||||
const agent = resolveAskAgentConfig(
|
||||
config,
|
||||
workflow,
|
||||
payload.role,
|
||||
options.agentOverride,
|
||||
payload.agent,
|
||||
);
|
||||
const agentName = deriveAgentName(agent.command);
|
||||
|
||||
const cwd = payload.cwd !== "" ? payload.cwd : process.cwd();
|
||||
|
||||
// Fork path: fork (or reuse cached fork) → ask with that session.
|
||||
if (options.fork && sourceSessionId !== null) {
|
||||
const askSessionId = await performFork(
|
||||
agent,
|
||||
agentName,
|
||||
stepHash,
|
||||
sourceSessionId,
|
||||
storageRoot,
|
||||
cwd,
|
||||
);
|
||||
const argv = ["--mode", "ask", "--session", askSessionId, "--prompt", options.prompt];
|
||||
argv.push("--detail", detailRef);
|
||||
const { stdout } = spawnAskAgent(agent, argv, cwd);
|
||||
return stdout;
|
||||
}
|
||||
|
||||
// Fallback path: ask without forking; inject detail ref for context.
|
||||
const argv = ["--mode", "ask", "--prompt", options.prompt];
|
||||
argv.push("--detail", detailRef);
|
||||
const { stdout } = spawnAskAgent(agent, argv, cwd);
|
||||
return stdout;
|
||||
}
|
||||
|
||||
@@ -199,6 +199,7 @@ const PL_THREAD_ARCHIVED = "F4D8Q2K5";
|
||||
const PL_STEP_ERROR = "B8T5N1V6";
|
||||
const PL_BACKGROUND_START = "X7Q4W9M2";
|
||||
const PL_THREAD_RESUME = "K2R7M4N8";
|
||||
const PL_THREAD_POKE = "P4Q9R3X7";
|
||||
|
||||
type ResumeStepConfig = {
|
||||
role: string;
|
||||
@@ -649,18 +650,25 @@ export async function cmdThreadList(
|
||||
beforeMs: number | null,
|
||||
skip: number | null,
|
||||
take: number | null,
|
||||
showAll: boolean = false,
|
||||
): Promise<ThreadListItemWithStatus[]> {
|
||||
const uwf = await createUwfStore(storageRoot);
|
||||
const index = loadActiveThreads(uwf.varStore);
|
||||
|
||||
// Resolve the effective filter:
|
||||
// - explicit --status wins (showAll has no effect)
|
||||
// - otherwise: --all → no filter; default → ["idle", "running"]
|
||||
const effectiveFilter: ThreadStatus[] | null =
|
||||
statusFilter !== null ? statusFilter : showAll ? null : ["idle", "running"];
|
||||
|
||||
// Collect active threads
|
||||
let items = await collectActiveThreads(storageRoot, uwf, index);
|
||||
|
||||
// Collect completed threads (if relevant for status filter)
|
||||
const includeCompleted =
|
||||
statusFilter === null ||
|
||||
statusFilter.includes("completed") ||
|
||||
statusFilter.includes("cancelled");
|
||||
effectiveFilter === null ||
|
||||
effectiveFilter.includes("completed") ||
|
||||
effectiveFilter.includes("cancelled");
|
||||
if (includeCompleted) {
|
||||
const activeIds = new Set(items.map((i) => i.thread));
|
||||
const completedItems = collectCompletedThreads(uwf, activeIds);
|
||||
@@ -668,8 +676,8 @@ export async function cmdThreadList(
|
||||
}
|
||||
|
||||
// Apply status filter
|
||||
if (statusFilter !== null) {
|
||||
items = items.filter((item) => statusFilter.includes(item.status));
|
||||
if (effectiveFilter !== null) {
|
||||
items = items.filter((item) => effectiveFilter.includes(item.status));
|
||||
}
|
||||
|
||||
// Apply time range filters
|
||||
@@ -1135,6 +1143,147 @@ export async function cmdThreadResume(
|
||||
});
|
||||
}
|
||||
|
||||
/**
|
||||
* Validate that a thread can be poked. Returns the existing entry and the head StepNode payload.
|
||||
* Fails (process exit) when the thread is missing, running, completed, cancelled, or has no
|
||||
* StepNode at its head.
|
||||
*/
|
||||
async function validatePokePreconditions(
|
||||
storageRoot: string,
|
||||
uwf: UwfStore,
|
||||
threadId: ThreadId,
|
||||
): Promise<{ entry: ThreadIndexEntry; oldHead: CasRef; oldHeadPayload: StepNodePayload }> {
|
||||
const runningMarker = await isThreadRunning(storageRoot, threadId);
|
||||
if (runningMarker !== null) {
|
||||
fail(`thread already executing in background (PID: ${runningMarker.pid})`);
|
||||
}
|
||||
|
||||
const entry = getThread(uwf.varStore, threadId);
|
||||
if (entry === null) {
|
||||
fail(`thread not active: ${threadId}`);
|
||||
}
|
||||
|
||||
if (entry.status === "completed" || entry.status === "cancelled") {
|
||||
fail(`thread cannot be poked: ${threadId} (status: ${entry.status})`);
|
||||
}
|
||||
|
||||
const oldHead = entry.head;
|
||||
const oldHeadNode = uwf.store.cas.get(oldHead);
|
||||
if (oldHeadNode === null) {
|
||||
fail(`CAS node not found: ${oldHead}`);
|
||||
}
|
||||
if (oldHeadNode.type !== uwf.schemas.stepNode) {
|
||||
fail("thread cannot be poked: no step to replace (head is StartNode)");
|
||||
}
|
||||
|
||||
return { entry, oldHead, oldHeadPayload: oldHeadNode.payload as StepNodePayload };
|
||||
}
|
||||
|
||||
/**
|
||||
* Resolve the next role from the post-poke chain state, used for the StepOutput.currentRole field.
|
||||
* Returns null when the next role is $END, evaluation fails, or the result is a suspend.
|
||||
*/
|
||||
function resolveCurrentRoleFromChain(
|
||||
uwfAfter: UwfStore,
|
||||
workflow: WorkflowPayload,
|
||||
replacedHash: CasRef,
|
||||
): string | null {
|
||||
const chainAfter = walkChain(uwfAfter, replacedHash);
|
||||
const { lastRole, lastOutput } = resolveEvaluateArgs(uwfAfter, chainAfter);
|
||||
const afterResult = evaluate(workflow.graph, lastRole, lastOutput);
|
||||
if (!afterResult.ok || isSuspendResult(afterResult.value)) {
|
||||
return null;
|
||||
}
|
||||
if (afterResult.value.role === END_ROLE) {
|
||||
return null;
|
||||
}
|
||||
return afterResult.value.role;
|
||||
}
|
||||
|
||||
/**
|
||||
* Poke a thread: re-run the agent on the head step with a supplementary prompt,
|
||||
* replacing the head step's output. The new step's `prev` points to the OLD head's
|
||||
* `prev` — semantically replacing (not appending to) the head. The moderator is NOT
|
||||
* re-evaluated for routing; the role of the head step is re-used.
|
||||
*/
|
||||
export async function cmdThreadPoke(
|
||||
storageRoot: string,
|
||||
threadId: ThreadId,
|
||||
prompt: string,
|
||||
agentOverride: string | null,
|
||||
): Promise<StepOutput> {
|
||||
const uwf = await createUwfStore(storageRoot);
|
||||
const { entry, oldHeadPayload } = await validatePokePreconditions(storageRoot, uwf, threadId);
|
||||
|
||||
const chain = walkChain(uwf, entry.head);
|
||||
const workflowHash = chain.start.workflow;
|
||||
const threadCwd = chain.start.cwd;
|
||||
|
||||
const plog = createProcessLogger({
|
||||
storageRoot,
|
||||
context: { thread: threadId, workflow: workflowHash },
|
||||
});
|
||||
|
||||
// Resolve the agent: --agent override wins; otherwise read from old head step's `agent` field.
|
||||
const config = await loadWorkflowConfig(storageRoot);
|
||||
const workflow = loadWorkflowPayload(uwf, workflowHash);
|
||||
const role = oldHeadPayload.role;
|
||||
const agent =
|
||||
agentOverride !== null
|
||||
? resolveAgentConfig(config, workflow, role, agentOverride)
|
||||
: parseAgentOverride(oldHeadPayload.agent);
|
||||
|
||||
const effectiveCwd = oldHeadPayload.cwd !== "" ? oldHeadPayload.cwd : threadCwd;
|
||||
|
||||
plog.log(PL_THREAD_POKE, `poke role=${role} agent=${agent.command}`, null);
|
||||
plog.log(PL_AGENT_SPAWN, `spawning agent command=${agent.command}`, {
|
||||
args: [...agent.args, threadId, role].join(" "),
|
||||
});
|
||||
|
||||
loadDotenv({ path: getEnvPath(storageRoot) });
|
||||
|
||||
// Spawn the agent. The agent will create a new StepNode with prev=oldHead (it reads
|
||||
// the active thread head). After the agent returns, we rewrite that node's prev so
|
||||
// that the new head replaces the old head instead of appending after it.
|
||||
const agentResult = spawnAgent(plog, agent, threadId, role, prompt, effectiveCwd);
|
||||
const agentStepHash = agentResult.stepHash as CasRef;
|
||||
|
||||
plog.log(PL_AGENT_DONE, `agent returned head=${agentStepHash}`, null);
|
||||
|
||||
const uwfAfter = await createUwfStore(storageRoot);
|
||||
const agentNode = uwfAfter.store.cas.get(agentStepHash);
|
||||
if (agentNode === null || agentNode.type !== uwfAfter.schemas.stepNode) {
|
||||
failStep(plog, `agent returned hash that is not a StepNode: ${agentStepHash}`);
|
||||
}
|
||||
const agentPayload = agentNode.payload as StepNodePayload;
|
||||
|
||||
// Rewrite the new step so that its `prev` points to the OLD head's prev (replace semantics).
|
||||
const replacedPayload: StepNodePayload = {
|
||||
...agentPayload,
|
||||
prev: oldHeadPayload.prev,
|
||||
};
|
||||
const replacedHash = await uwfAfter.store.cas.put(uwfAfter.schemas.stepNode, replacedPayload);
|
||||
const replacedNode = uwfAfter.store.cas.get(replacedHash);
|
||||
if (replacedNode === null || !validate(uwfAfter.store, replacedNode)) {
|
||||
failStep(plog, "rewritten StepNode failed schema validation");
|
||||
}
|
||||
|
||||
// Update thread head to the replaced step. Status becomes idle (no moderator re-route).
|
||||
setThread(uwfAfter.varStore, threadId, updateThreadHead(entry, replacedHash));
|
||||
|
||||
return {
|
||||
workflow: workflowHash,
|
||||
thread: threadId,
|
||||
head: replacedHash,
|
||||
status: "idle",
|
||||
currentRole: resolveCurrentRoleFromChain(uwfAfter, workflow, replacedHash),
|
||||
suspendedRole: null,
|
||||
suspendMessage: null,
|
||||
done: false,
|
||||
background: null,
|
||||
};
|
||||
}
|
||||
|
||||
export function validateCount(count: number): void {
|
||||
if (count < 1 || !Number.isInteger(count)) {
|
||||
throw new Error(`--count must be a positive integer, got: ${count}`);
|
||||
|
||||
@@ -0,0 +1,60 @@
|
||||
import { readFile } from "node:fs/promises";
|
||||
import { join } from "node:path";
|
||||
import { describe, expect, test } from "vitest";
|
||||
|
||||
/**
|
||||
* Source-level verification that each adapter's `createAgent({...})` call
|
||||
* includes the new `fork: null` and `cleanup: null` fields.
|
||||
*
|
||||
* Adapters are CLI binaries that spawn external processes — runtime testing
|
||||
* requires real LLM environments — so we use static source inspection here.
|
||||
* Type-level correctness is enforced separately by `tsc --build`.
|
||||
*/
|
||||
|
||||
const REPO_ROOT = join(__dirname, "..", "..", "..");
|
||||
|
||||
const ADAPTERS: Array<{ name: string; path: string }> = [
|
||||
{ name: "agent-mock", path: "packages/agent-mock/src/mock-agent.ts" },
|
||||
{ name: "agent-builtin", path: "packages/agent-builtin/src/agent.ts" },
|
||||
{ name: "agent-hermes", path: "packages/agent-hermes/src/hermes.ts" },
|
||||
{ name: "agent-claude-code", path: "packages/agent-claude-code/src/claude-code.ts" },
|
||||
];
|
||||
|
||||
/** Find the matching `}` for the `{` at `openIdx` in `source`. */
|
||||
function findMatchingBrace(source: string, openIdx: number): number {
|
||||
let depth = 0;
|
||||
for (let i = openIdx; i < source.length; i++) {
|
||||
const ch = source[i];
|
||||
if (ch === "{") {
|
||||
depth++;
|
||||
} else if (ch === "}") {
|
||||
depth--;
|
||||
if (depth === 0) {
|
||||
return i;
|
||||
}
|
||||
}
|
||||
}
|
||||
return -1;
|
||||
}
|
||||
|
||||
/** Extract the `createAgent({...})` block from adapter source. */
|
||||
function extractCreateAgentBlock(source: string): string {
|
||||
const startIdx = source.indexOf("createAgent({");
|
||||
expect(startIdx).toBeGreaterThanOrEqual(0);
|
||||
const openIdx = source.indexOf("{", startIdx);
|
||||
const endIdx = findMatchingBrace(source, openIdx);
|
||||
expect(endIdx).toBeGreaterThan(openIdx);
|
||||
return source.slice(openIdx, endIdx + 1);
|
||||
}
|
||||
|
||||
describe("adapter createAgent calls include fork: null and cleanup: null", () => {
|
||||
for (const adapter of ADAPTERS) {
|
||||
test(`${adapter.name} createAgent call includes fork: null and cleanup: null`, async () => {
|
||||
const source = await readFile(join(REPO_ROOT, adapter.path), "utf8");
|
||||
expect(source).toMatch(/createAgent\s*\(\s*\{/);
|
||||
const block = extractCreateAgentBlock(source);
|
||||
expect(block).toMatch(/fork:\s*null/);
|
||||
expect(block).toMatch(/cleanup:\s*null/);
|
||||
});
|
||||
}
|
||||
});
|
||||
@@ -0,0 +1,78 @@
|
||||
import type { Store } from "@ocas/core";
|
||||
import { describe, expect, test } from "vitest";
|
||||
|
||||
import type {
|
||||
AgentCleanupFn,
|
||||
AgentContext,
|
||||
AgentContinueFn,
|
||||
AgentForkFn,
|
||||
AgentOptions,
|
||||
AgentRunFn,
|
||||
} from "../src/types.js";
|
||||
|
||||
const makeRun: AgentRunFn = async (_ctx: AgentContext) => ({
|
||||
output: "",
|
||||
detailHash: "",
|
||||
sessionId: "",
|
||||
assembledPrompt: "",
|
||||
usage: null,
|
||||
});
|
||||
|
||||
const makeContinue: AgentContinueFn = async (_sessionId, _message, _store) => ({
|
||||
output: "",
|
||||
detailHash: "",
|
||||
sessionId: "",
|
||||
assembledPrompt: "",
|
||||
usage: null,
|
||||
});
|
||||
|
||||
describe("AgentOptions fork/cleanup", () => {
|
||||
test("AgentOptions accepts fork and cleanup as null", () => {
|
||||
const opts: AgentOptions = {
|
||||
name: "test",
|
||||
run: makeRun,
|
||||
continue: makeContinue,
|
||||
fork: null,
|
||||
cleanup: null,
|
||||
};
|
||||
expect(opts.name).toBe("test");
|
||||
expect(opts.run).toBe(makeRun);
|
||||
expect(opts.continue).toBe(makeContinue);
|
||||
expect(opts.fork).toBeNull();
|
||||
expect(opts.cleanup).toBeNull();
|
||||
});
|
||||
|
||||
test("AgentOptions accepts real fork and cleanup functions", () => {
|
||||
const fork: AgentForkFn = async (sessionId, _store) => `${sessionId}-forked`;
|
||||
const cleanup: AgentCleanupFn = async () => {
|
||||
/* no-op */
|
||||
};
|
||||
const opts: AgentOptions = {
|
||||
name: "test",
|
||||
run: makeRun,
|
||||
continue: makeContinue,
|
||||
fork,
|
||||
cleanup,
|
||||
};
|
||||
expect(typeof opts.fork).toBe("function");
|
||||
expect(typeof opts.cleanup).toBe("function");
|
||||
});
|
||||
|
||||
test("AgentForkFn signature accepts (sessionId: string, store: Store) and returns Promise<string>", async () => {
|
||||
const fork: AgentForkFn = async (sessionId, _store) => `${sessionId}-child`;
|
||||
// Cast a placeholder Store — only the signature shape matters for this test.
|
||||
const fakeStore = {} as Store;
|
||||
const result = await fork("session-abc", fakeStore);
|
||||
expect(result).toBe("session-abc-child");
|
||||
});
|
||||
|
||||
test("AgentCleanupFn signature accepts no args and returns Promise<void>", async () => {
|
||||
let called = false;
|
||||
const cleanup: AgentCleanupFn = async () => {
|
||||
called = true;
|
||||
};
|
||||
const result = await cleanup();
|
||||
expect(result).toBeUndefined();
|
||||
expect(called).toBe(true);
|
||||
});
|
||||
});
|
||||
@@ -0,0 +1,131 @@
|
||||
import { mkdir, readFile, rm, writeFile } from "node:fs/promises";
|
||||
import { dirname, join } from "node:path";
|
||||
import type { ThreadId } from "@united-workforce/protocol";
|
||||
import { afterEach, beforeEach, describe, expect, test } from "vitest";
|
||||
|
||||
import {
|
||||
getAskSessionId,
|
||||
getCachedSessionId,
|
||||
getCachePath,
|
||||
setAskSessionId,
|
||||
setCachedSessionId,
|
||||
} from "../src/session-cache.js";
|
||||
import { getDefaultStorageRoot } from "../src/storage.js";
|
||||
|
||||
describe("session-cache ask sessions", () => {
|
||||
let testStorageRoot: string;
|
||||
|
||||
beforeEach(async () => {
|
||||
testStorageRoot = join(
|
||||
getDefaultStorageRoot(),
|
||||
"test-cache",
|
||||
`ask-${Date.now()}-${Math.random()}`,
|
||||
);
|
||||
await mkdir(testStorageRoot, { recursive: true });
|
||||
});
|
||||
|
||||
afterEach(async () => {
|
||||
await rm(testStorageRoot, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
const stepHash = "ABCDEFG1234567";
|
||||
|
||||
test("getAskSessionId returns null when no ask session cached", async () => {
|
||||
const session = await getAskSessionId("claude-code", stepHash, testStorageRoot);
|
||||
expect(session).toBeNull();
|
||||
});
|
||||
|
||||
test("setAskSessionId + getAskSessionId round-trip", async () => {
|
||||
await setAskSessionId("claude-code", stepHash, "ask-session-123", testStorageRoot);
|
||||
const session = await getAskSessionId("claude-code", stepHash, testStorageRoot);
|
||||
expect(session).toBe("ask-session-123");
|
||||
});
|
||||
|
||||
test("ask cache keys use stepHash:ask format", async () => {
|
||||
await setAskSessionId("claude-code", stepHash, "ask-session-456", testStorageRoot);
|
||||
|
||||
const cachePath = getCachePath("claude-code", testStorageRoot);
|
||||
const content = JSON.parse(await readFile(cachePath, "utf8")) as Record<string, string>;
|
||||
|
||||
expect(content).toHaveProperty(`${stepHash}:ask`, "ask-session-456");
|
||||
});
|
||||
|
||||
test("exec cache and ask cache coexist in same file", async () => {
|
||||
const threadId = "01234567890123456789012345" as ThreadId;
|
||||
const role = "developer";
|
||||
|
||||
await setCachedSessionId("claude-code", threadId, role, "exec-session", testStorageRoot);
|
||||
await setAskSessionId("claude-code", stepHash, "ask-session", testStorageRoot);
|
||||
|
||||
const cachePath = getCachePath("claude-code", testStorageRoot);
|
||||
const content = JSON.parse(await readFile(cachePath, "utf8")) as Record<string, string>;
|
||||
|
||||
expect(content).toHaveProperty(`${threadId}:${role}`, "exec-session");
|
||||
expect(content).toHaveProperty(`${stepHash}:ask`, "ask-session");
|
||||
|
||||
expect(await getCachedSessionId("claude-code", threadId, role, testStorageRoot)).toBe(
|
||||
"exec-session",
|
||||
);
|
||||
expect(await getAskSessionId("claude-code", stepHash, testStorageRoot)).toBe("ask-session");
|
||||
});
|
||||
|
||||
test("updating ask session does not affect exec session", async () => {
|
||||
const threadId = "01234567890123456789012345" as ThreadId;
|
||||
const role = "developer";
|
||||
|
||||
await setCachedSessionId("claude-code", threadId, role, "exec-original", testStorageRoot);
|
||||
await setAskSessionId("claude-code", stepHash, "ask-original", testStorageRoot);
|
||||
|
||||
await setAskSessionId("claude-code", stepHash, "ask-updated", testStorageRoot);
|
||||
|
||||
expect(await getCachedSessionId("claude-code", threadId, role, testStorageRoot)).toBe(
|
||||
"exec-original",
|
||||
);
|
||||
expect(await getAskSessionId("claude-code", stepHash, testStorageRoot)).toBe("ask-updated");
|
||||
});
|
||||
|
||||
test("updating exec session does not affect ask session", async () => {
|
||||
const threadId = "01234567890123456789012345" as ThreadId;
|
||||
const role = "developer";
|
||||
|
||||
await setAskSessionId("claude-code", stepHash, "ask-original", testStorageRoot);
|
||||
await setCachedSessionId("claude-code", threadId, role, "exec-original", testStorageRoot);
|
||||
|
||||
await setCachedSessionId("claude-code", threadId, role, "exec-updated", testStorageRoot);
|
||||
|
||||
expect(await getAskSessionId("claude-code", stepHash, testStorageRoot)).toBe("ask-original");
|
||||
expect(await getCachedSessionId("claude-code", threadId, role, testStorageRoot)).toBe(
|
||||
"exec-updated",
|
||||
);
|
||||
});
|
||||
|
||||
test("different stepHashes have independent ask sessions", async () => {
|
||||
const stepHashA = "AAAAAAA1234567";
|
||||
const stepHashB = "BBBBBBB1234567";
|
||||
|
||||
await setAskSessionId("claude-code", stepHashA, "session-A", testStorageRoot);
|
||||
await setAskSessionId("claude-code", stepHashB, "session-B", testStorageRoot);
|
||||
|
||||
expect(await getAskSessionId("claude-code", stepHashA, testStorageRoot)).toBe("session-A");
|
||||
expect(await getAskSessionId("claude-code", stepHashB, testStorageRoot)).toBe("session-B");
|
||||
});
|
||||
|
||||
test("ask session for one agent does not leak to another", async () => {
|
||||
await setAskSessionId("claude-code", stepHash, "cc-ask-session", testStorageRoot);
|
||||
|
||||
const ccSession = await getAskSessionId("claude-code", stepHash, testStorageRoot);
|
||||
const hermesSession = await getAskSessionId("hermes", stepHash, testStorageRoot);
|
||||
|
||||
expect(ccSession).toBe("cc-ask-session");
|
||||
expect(hermesSession).toBeNull();
|
||||
});
|
||||
|
||||
test("empty string ask session treated as missing", async () => {
|
||||
const cachePath = getCachePath("claude-code", testStorageRoot);
|
||||
await mkdir(dirname(cachePath), { recursive: true });
|
||||
await writeFile(cachePath, JSON.stringify({ [`${stepHash}:ask`]: "" }), "utf8");
|
||||
|
||||
const session = await getAskSessionId("claude-code", stepHash, testStorageRoot);
|
||||
expect(session).toBeNull();
|
||||
});
|
||||
});
|
||||
@@ -14,12 +14,20 @@ export type { FrontmatterFastPathResult } from "./frontmatter.js";
|
||||
export { tryFrontmatterFastPath } from "./frontmatter.js";
|
||||
export { buildFrontmatterRetryPrompt } from "./frontmatter-retry-prompt.js";
|
||||
export { createAgent, parseArgv } from "./run.js";
|
||||
export { getCachedSessionId, getCachePath, setCachedSessionId } from "./session-cache.js";
|
||||
export {
|
||||
getAskSessionId,
|
||||
getCachedSessionId,
|
||||
getCachePath,
|
||||
setAskSessionId,
|
||||
setCachedSessionId,
|
||||
} from "./session-cache.js";
|
||||
export { getConfigPath, getEnvPath, loadWorkflowConfig, resolveStorageRoot } from "./storage.js";
|
||||
export type {
|
||||
AdapterOutput,
|
||||
AgentCleanupFn,
|
||||
AgentContext,
|
||||
AgentContinueFn,
|
||||
AgentForkFn,
|
||||
AgentOptions,
|
||||
AgentRunFn,
|
||||
AgentRunResult,
|
||||
|
||||
@@ -14,6 +14,10 @@ function cacheKey(threadId: ThreadId, role: string): string {
|
||||
return `${threadId}:${role}`;
|
||||
}
|
||||
|
||||
function askCacheKey(stepHash: string): string {
|
||||
return `${stepHash}:ask`;
|
||||
}
|
||||
|
||||
function isRecord(value: unknown): value is Record<string, unknown> {
|
||||
return typeof value === "object" && value !== null && !Array.isArray(value);
|
||||
}
|
||||
@@ -86,3 +90,33 @@ export async function setCachedSessionId(
|
||||
cache[cacheKey(threadId, role)] = sessionId;
|
||||
await writeCache(agentName, storageRoot, cache);
|
||||
}
|
||||
|
||||
/**
|
||||
* Read the cached ask-session ID for a stepHash.
|
||||
*
|
||||
* Ask sessions are forked side conversations spawned by `step ask` from a
|
||||
* specific completed step. They share the per-agent cache file with exec
|
||||
* sessions but use the `<stepHash>:ask` key shape so the two namespaces
|
||||
* never collide.
|
||||
*/
|
||||
export async function getAskSessionId(
|
||||
agentName: string,
|
||||
stepHash: string,
|
||||
storageRoot: string,
|
||||
): Promise<string | null> {
|
||||
const cache = await readCache(agentName, storageRoot);
|
||||
const sessionId = cache[askCacheKey(stepHash)];
|
||||
return sessionId ?? null;
|
||||
}
|
||||
|
||||
/** Write the ask-session ID for a stepHash into the cache. */
|
||||
export async function setAskSessionId(
|
||||
agentName: string,
|
||||
stepHash: string,
|
||||
sessionId: string,
|
||||
storageRoot: string,
|
||||
): Promise<void> {
|
||||
const cache = await readCache(agentName, storageRoot);
|
||||
cache[askCacheKey(stepHash)] = sessionId;
|
||||
await writeCache(agentName, storageRoot, cache);
|
||||
}
|
||||
|
||||
@@ -50,6 +50,21 @@ export type AgentContinueFn = (
|
||||
|
||||
export type AgentRunFn = (ctx: AgentContext) => Promise<AgentRunResult>;
|
||||
|
||||
/**
|
||||
* Fork an existing agent session, returning a new session ID that branches
|
||||
* from the source session's state. Used by `step ask` (Phase 2a infrastructure)
|
||||
* to spawn a side conversation from a completed step's session without
|
||||
* polluting the original session's history.
|
||||
*/
|
||||
export type AgentForkFn = (sessionId: string, store: AgentContext["store"]) => Promise<string>;
|
||||
|
||||
/**
|
||||
* Clean up adapter-level resources (e.g. close ACP client, kill subprocesses).
|
||||
* Invoked by the agent CLI factory after the run completes — regardless of
|
||||
* success or failure — so adapters can release I/O handles deterministically.
|
||||
*/
|
||||
export type AgentCleanupFn = () => Promise<void>;
|
||||
|
||||
export type AdapterOutput = {
|
||||
stepHash: string;
|
||||
detailHash: string;
|
||||
@@ -65,4 +80,14 @@ export type AgentOptions = {
|
||||
name: string;
|
||||
run: AgentRunFn;
|
||||
continue: AgentContinueFn;
|
||||
/**
|
||||
* Optional session-fork hook. null means the adapter does not yet support
|
||||
* `step ask` (Phase 2a placeholder — wired up in Phase 2b).
|
||||
*/
|
||||
fork: AgentForkFn | null;
|
||||
/**
|
||||
* Optional cleanup hook invoked after the agent CLI completes. null means
|
||||
* the adapter has no resources to release.
|
||||
*/
|
||||
cleanup: AgentCleanupFn | null;
|
||||
};
|
||||
|
||||
@@ -29,8 +29,9 @@ uwf thread exec <thread-id> # execute one moderator→agen
|
||||
[-c, --count <number>] # run multiple steps (default: 1)
|
||||
[--background] # run in background
|
||||
uwf thread show <thread-id> # show thread head pointer
|
||||
uwf thread list # list threads
|
||||
[--status <status>] # filter: idle, running, or completed
|
||||
uwf thread list # list active threads (idle + running)
|
||||
[--all] # include completed/cancelled/suspended
|
||||
[--status <status>] # filter: idle, running, suspended, completed, cancelled, active
|
||||
uwf thread read <thread-id> # render thread context as markdown
|
||||
[--quota <chars>] # max output characters (default 32000)
|
||||
[--before <step-hash>] # load steps before this hash (exclusive)
|
||||
|
||||
@@ -67,8 +67,9 @@ uwf thread exec <thread-id> # execute one step
|
||||
[-c, --count <n>] # run n steps
|
||||
[--background] # run in background
|
||||
uwf thread show <thread-id> # show head pointer
|
||||
uwf thread list # list all threads
|
||||
[--status <filter>] # idle, running, completed, cancelled, active (comma-separated)
|
||||
uwf thread list # list active threads (idle + running)
|
||||
[--all] # include completed/cancelled/suspended
|
||||
[--status <filter>] # idle, running, suspended, completed, cancelled, active (comma-separated)
|
||||
[--after <thread-id>] # pagination: after this thread
|
||||
[--before <thread-id>] # pagination: before this thread
|
||||
[--skip <n>] # skip first n results
|
||||
@@ -94,10 +95,15 @@ start → exec (repeat) → thread reaches $END → auto-completed
|
||||
uwf step list <thread-id> # list all steps
|
||||
uwf step show <step-hash> # show step details
|
||||
uwf step fork <step-hash> # fork thread from a step (branch)
|
||||
uwf step ask <step-hash> -p <prompt> [--agent <cmd>] [--no-fork]
|
||||
# ask a follow-up question to the step's agent
|
||||
# (read-only; no new step, no thread mutation)
|
||||
\`\`\`
|
||||
|
||||
Forking creates a new thread that shares history up to the fork point — useful for retrying from a known-good state.
|
||||
|
||||
\`step ask\` re-opens the agent session that produced \`<step-hash>\` and returns its answer on stdout. Subsequent asks reuse the same forked session via the per-agent ask-cache; \`--no-fork\` runs the agent fresh with the step's detail ref injected for context.
|
||||
|
||||
## CAS Commands
|
||||
|
||||
Use the \`ocas\` CLI for direct CAS operations (\`~/.ocas/\` store, shared with \`uwf\`):
|
||||
|
||||
Reference in New Issue
Block a user