fix: isRoleDefinition accepts oneOf frontmatter

Without this, parseWorkflowPayload rejects workflows with oneOf frontmatter before semantic validation even runs.
feat(cli): add workflow semantic validation before execution
2026-05-25 07:46:12 +00:00 · 2026-05-25 07:25:10 +00:00 · 2026-05-25 06:54:38 +00:00 · 2026-05-25 06:34:56 +00:00 · 2026-05-25 06:22:53 +00:00 · 2026-05-25 05:52:27 +00:00
196 changed files with 20083 additions and 1396 deletions
@@ -0,0 +1,67 @@
+# Sync README
+
+When updating README.md files in this monorepo, follow these conventions.
+
+## Scope
+
+- Root `README.md` — project overview and navigation hub
+- Per-package `packages/*/README.md` — each package self-contained
+
+## Root README Structure
+
+The root README should have these sections in order:
+
+1. **Title and one-liner** — stateless workflow engine driven by single-step CLI
+2. **Overview** — 2-3 paragraphs explaining what it does and key concepts
+3. **Architecture** — dependency layer diagram (text-based)
+4. **Packages** — table with ALL packages from packages/ directory, columns: Package, Description, Type (cli/lib/agent/app)
+5. **Quick Start** — install, build, register workflow, start thread, run step
+6. **CLI Reference** — brief command list, detailed usage in cli-workflow README
+7. **Development** — bun install / build / check / test
+
+## Per-Package README Structure
+
+Each package README should have:
+
+1. **Title** — package name
+2. **One-line description** — matching package.json
+3. **Overview** — what it does, where it sits in the architecture, dependencies
+4. **Installation** — bun add (for libs) or "included as binary" (for cli/agents)
+5. **API** (lib packages) — all exports from src/index.ts with type signatures, grouped by category, minimal usage examples
+6. **CLI Usage** (cli/agent packages) — command reference with examples
+7. **Internal Structure** — brief src/ file organization
+8. **Configuration** (if applicable)
+
+## Execution Steps
+
+### Step 1: Gather current state
+For each package read:
+- package.json (name, version, description, dependencies, bin)
+- src/index.ts (public API exports)
+- Existing README.md (preserve hand-written content worth keeping)
+
+### Step 2: Update root README
+- Ensure ALL packages in packages/ directory are listed in the table
+- Update CLI command reference from uwf --help output
+- Keep Quick Start examples valid
+
+### Step 3: Write/update each package README
+- Follow the per-package structure
+- API section MUST match actual src/index.ts exports — never invent
+- For agent packages: document CLI binary name, how it is invoked
+- For lib packages: document exported types and functions
+- Internal structure: list actual files in src/
+
+### Step 4: Verify
+- All relative links work
+- Package names match package.json
+- No references to removed/renamed packages
+- bun run build still passes
+
+## Guidelines
+
+- Only document what src/index.ts actually exports
+- Root README summarizes, package READMEs go into detail
+- Verify CLI examples against actual commands
+- Preserve existing good prose when updating
+- English for all README content
@@ -11,3 +11,5 @@ solve-issue-entry.ts
 packages/workflow-template-develop/develop.esm.js
 .DS_Store
 *.py
+.claude
+tmp
@@ -0,0 +1,83 @@
+# Test Spec: uwf setup model connectivity validation (#335)
+
+## Context
+
+File: `packages/cli-workflow/src/commands/setup.ts`
+Test file: `packages/cli-workflow/src/__tests__/setup-validate.test.ts`
+
+After `cmdSetup` writes config, it should send a test chat completion request to verify the configured model is reachable. If validation fails, warn the user (don't abort — config is already saved).
+
+## Implementation Notes
+
+- Add a `validateModel(baseUrl, apiKey, model)` function that sends a minimal chat completion request (`POST /chat/completions` with `messages: [{role:"user",content:"hi"}]`, `max_tokens: 1`)
+- Returns `Result<void, string>` — ok if 2xx response, error with reason string otherwise
+- Use `AbortSignal.timeout(15_000)` for the request
+- Both `cmdSetup` and `cmdSetupInteractive` should call it after saving config
+- `cmdSetup` returns validation result in its return object: `{ ...existing, validation: { ok: true } | { ok: false, error: string } }`
+- `cmdSetupInteractive` prints a warning to console if validation fails, success message if it passes
+- Use the project logger (`createLogger`) — no raw `console.log` except in interactive CLI output (per CLAUDE.md)
+
+## Test Cases (vitest)
+
+### 1. `validateModel` — success path
+- Mock `fetch` to return `{ status: 200, ok: true, json: () => ({}) }`
+- Call `validateModel(baseUrl, apiKey, model)`
+- Assert returns `{ ok: true, value: undefined }`
+- Assert fetch was called with correct URL (`${baseUrl}/chat/completions`), correct headers (`Authorization: Bearer ${apiKey}`), correct body (model, messages, max_tokens: 1)
+
+### 2. `validateModel` — HTTP error (401 unauthorized)
+- Mock `fetch` to return `{ status: 401, ok: false, statusText: "Unauthorized" }`
+- Call `validateModel(baseUrl, apiKey, model)`
+- Assert returns `{ ok: false, error: <string containing "401"> }`
+
+### 3. `validateModel` — HTTP error (404 model not found)
+- Mock `fetch` to return `{ status: 404, ok: false, statusText: "Not Found" }`
+- Assert returns `{ ok: false, error: <string containing "404"> }`
+
+### 4. `validateModel` — network timeout
+- Mock `fetch` to throw `DOMException` with name `AbortError`
+- Assert returns `{ ok: false, error: <string containing "timeout" or "unreachable"> }`
+
+### 5. `validateModel` — network error (DNS failure, connection refused)
+- Mock `fetch` to throw `TypeError("fetch failed")`
+- Assert returns `{ ok: false, error: <string mentioning connectivity> }`
+
+### 6. `cmdSetup` — includes validation result on success
+- Mock global `fetch` for `/chat/completions` to succeed
+- Call `cmdSetup({ provider, baseUrl, apiKey, model, storageRoot })`
+- Assert returned object has `validation: { ok: true, value: undefined }`
+- Assert config files are still written (existing behavior preserved)
+
+### 7. `cmdSetup` — includes validation result on failure (config still saved)
+- Mock global `fetch` for `/chat/completions` to return 401
+- Call `cmdSetup({ ... })`
+- Assert returned object has `validation: { ok: false, error: ... }`
+- Assert `config.yaml` and `.env` are still written (validation failure doesn't prevent saving)
+
+### 8. `cmdSetupInteractive` — prints success message on validation pass
+- Mock `fetch` for both `/models` and `/chat/completions` to succeed
+- Mock stdin to provide valid selections
+- Capture console output
+- Assert output contains a success message like "Model verified" or "✓"
+
+### 9. `cmdSetupInteractive` — prints warning on validation failure
+- Mock `fetch`: `/models` succeeds, `/chat/completions` returns 401
+- Mock stdin for valid selections
+- Capture console output
+- Assert output contains a warning about model not being reachable and suggests trying a different model
+
+### 10. `validateModel` — request body correctness
+- Mock `fetch` to capture the request body
+- Call `validateModel(baseUrl, apiKey, "test-model")`
+- Assert body is `{ model: "test-model", messages: [{role: "user", content: "hi"}], max_tokens: 1 }`
+
+## Export Requirements
+
+- `validateModel` must be exported (for direct unit testing)
+- Signature: `async function validateModel(baseUrl: string, apiKey: string, model: string): Promise<Result<void, string>>`
+- `Result` type: `{ ok: true; value: T } | { ok: false; error: E }` (project convention)
+
+## Files to Create/Modify
+
+- **New**: `packages/cli-workflow/src/__tests__/setup-validate.test.ts` — all test cases above
+- **Modify**: `packages/cli-workflow/src/commands/setup.ts` — add `validateModel`, integrate into `cmdSetup` and `cmdSetupInteractive`
@@ -22,36 +22,50 @@ roles:
      After producing the test spec:
      1. Store it via `uwf cas put-text "<markdown content>"` and capture the returned hash
      2. Put the hash in frontmatter.plan (required when status=ready)
-    output: "Output a brief summary of the test spec. Frontmatter must include: status (ready or insufficient_info) and plan (CAS hash of the test spec, required when status=ready)."
+    output: "Output a brief summary of the test spec. Set $status to ready (with plan hash and repoPath) or insufficient_info."
    frontmatter:
-      type: object
-      properties:
-        status:
-          type: string
-          enum: [ready, insufficient_info]
-        plan:
-          type: string
-      required: [status]
+      oneOf:
+        - properties:
+            $status: { const: "ready" }
+            plan: { type: string }
+            repoPath: { type: string }
+          required: [$status, plan, repoPath]
+        - properties:
+            $status: { const: "insufficient_info" }
+          required: [$status]
  developer:
    description: "TDD implementation per test spec"
    goal: "You are a developer agent. You implement code changes following TDD — write tests first, then implementation."
    capabilities:
      - coding
    procedure: |
-      1. Read the test spec from CAS: `uwf cas get <plan hash>` (find the hash from the latest planner step's meta.plan)
-      2. If bounced back from reviewer or tester: read the previous role's output to understand what needs fixing
-      3. Write tests first based on the spec
-      4. Implement the code to make tests pass
-      5. Ensure `bun run build` passes with no errors
-      6. Run `bun test` to verify all tests pass
-    output: "List all files changed and provide a summary. Frontmatter must include: status (done or failed)."
+      IMPORTANT: Always work in a git worktree, NEVER modify the main working directory directly.
+
+      Before starting any work, set up an isolated worktree:
+      1. `cd ~/repos/workflow && git fetch origin` to get latest refs
+      2. First time (no existing branch):
+         - `git worktree add ~/repos/workflow-worktrees/fix/<issue-number>-<short-slug> -b fix/<issue-number>-<short-slug> origin/main`
+         - `cd ~/repos/workflow-worktrees/fix/<issue-number>-<short-slug> && bun install`
+      3. If bounced back from reviewer or tester (branch already exists):
+         - The worktree should already exist at `~/repos/workflow-worktrees/fix/<issue-number>-<short-slug>`
+         - `cd ~/repos/workflow-worktrees/fix/<issue-number>-<short-slug>`
+         - `git fetch origin && git rebase origin/main`
+      4. ALL subsequent work must happen inside the worktree directory.
+
+      Then implement TDD:
+      5. Read the test spec from CAS: `uwf cas get <plan hash>` (find the hash from the latest planner step's frontmatter.plan)
+      6. If bounced back from reviewer or tester: read the previous role's output to understand what needs fixing
+      7. Write tests first based on the spec
+      8. Implement the code to make tests pass
+      9. Ensure `bun run build` passes with no errors
+      10. Run `bun test` to verify all tests pass
+    output: "List all files changed and provide a summary. Include branch name and worktree path in frontmatter."
    frontmatter:
      type: object
      properties:
-        status:
-          type: string
-          enum: [done, failed]
-      required: [status]
+        branch: { type: string }
+        worktree: { type: string }
+      required: [branch, worktree]
  reviewer:
    description: "Code standards compliance check"
    goal: "You are a code reviewer. You verify code standards compliance — NOT functionality (that's the tester's job)."
@@ -59,10 +73,17 @@ roles:
      - code-review
      - static-analysis
    procedure: |
+      First, cd into the worktree: `cd ~/repos/workflow-worktrees/fix/<issue-number>-*` (find the exact directory)
+
+      Before reviewing, verify the git branch:
+      1. Run `git branch --show-current` — confirm the branch name references the issue number being worked on
+      2. If the branch doesn't correspond to the issue, flag it in your output and reject
+
+      Then perform code review:
      Hard checks (must all pass):
-      1. `bun run build` — no build errors
-      2. `bunx biome check` — no lint violations
-      3. TypeScript strict mode — no type errors
+      3. `bun run build` — no build errors
+      4. `bunx biome check` — no lint violations
+      5. TypeScript strict mode — no type errors

      Soft checks (review against CLAUDE.md conventions):
      - Functional-first: `function` + `type`, not `class` + `interface`
@@ -74,94 +95,95 @@ roles:

      Only review standards compliance. Do NOT test functionality.
      If rejecting, you MUST explain the specific reason in your output.
-    output: "Explain your decision with specific file/line references. Frontmatter must include: approved (true or false)."
+    output: "Explain your decision with specific file/line references. Set $status to approved (with branch/worktree) or rejected (with comments)."
    frontmatter:
-      type: object
-      properties:
-        approved:
-          type: boolean
-      required: [approved]
+      oneOf:
+        - properties:
+            $status: { const: "approved" }
+            branch: { type: string }
+            worktree: { type: string }
+          required: [$status, branch, worktree]
+        - properties:
+            $status: { const: "rejected" }
+            comments: { type: string }
+          required: [$status, comments]
  tester:
    description: "Functional correctness verification"
    goal: "You are a tester agent. You verify that the implementation correctly satisfies every scenario in the test spec."
    capabilities:
      - testing
    procedure: |
+      First, cd into the worktree: `cd ~/repos/workflow-worktrees/fix/<issue-number>-*` (find the exact directory)
+
      1. Run `bun test` for automated test verification
-      2. Read the test spec from CAS: `uwf cas get <plan hash>` (find the hash from the latest planner step's meta.plan)
+      2. Read the test spec from CAS: `uwf cas get <plan hash>` (find the hash from the latest planner step's frontmatter.plan)
      3. Verify each scenario in the spec is covered and passing
      4. Determine outcome:
         - passed: all scenarios verified, tests pass
         - fix_code: tests fail or implementation doesn't match spec → send back to developer
         - fix_spec: the spec itself is wrong or incomplete → send back to planner
-    output: "Report test results per scenario. Frontmatter must include: status (passed, fix_code, or fix_spec)."
+    output: "Report test results per scenario. Set $status to passed (with branch/worktree), fix_code (with report), or fix_spec (with report)."
    frontmatter:
-      type: object
-      properties:
-        status:
-          type: string
-          enum: [passed, fix_code, fix_spec]
-      required: [status]
+      oneOf:
+        - properties:
+            $status: { const: "passed" }
+            branch: { type: string }
+            worktree: { type: string }
+          required: [$status, branch, worktree]
+        - properties:
+            $status: { const: "fix_code" }
+            report: { type: string }
+          required: [$status, report]
+        - properties:
+            $status: { const: "fix_spec" }
+            report: { type: string }
+          required: [$status, report]
  committer:
    description: "Commits and creates PR"
    goal: "You are a committer agent. You create a clean commit and push a PR linking the original issue."
    capabilities: []
    procedure: |
+      First, cd into the worktree: `cd ~/repos/workflow-worktrees/fix/<issue-number>-*` (find the exact directory)
+
      Note: You inherit the developer's worktree and branch. Do NOT create a new branch.
      1. Stage all changes: `git add -A`
      2. Commit with a descriptive message referencing the issue: `git commit -m "type: description\n\nFixes #N"`
      3. Push the branch: `git push -u origin <branch-name>`
         - If push hook fails: capture the error log in your output, mark hook_failed
-      4. On push success: create a PR via `tea pr create --title "..." --description "..."`
+      4. On push success: create a PR via `tea pr create --repo uncaged/workflow --title "..." --description "..."`
+         - The `--repo` flag is required to work in worktree directories (fixes #474 "path segment [0] is empty" error)
+         - If working on a different repo, extract owner/repo from: `git remote get-url origin | sed 's/.*[:/]\([^/]*\/[^.]*\).*/\1/'`
         - PR description must follow the project template: What / Why / Changes / Ref sections, with `Fixes #N` in Ref
-    output: "Include PR URL on success or error log on failure. Frontmatter must include: success (true or false)."
+         - On tea failure: capture stderr/stdout, log the error clearly, include PR details (title, description, branch) for manual creation, and mark success=false
+      5. After PR creation, clean up the worktree:
+         - `cd ~/repos/workflow`
+         - `git worktree remove ~/repos/workflow-worktrees/fix/<issue-number>-<slug>`
+    output: "Include PR URL on success or error log on failure. Set $status to committed (with prUrl) or hook_failed (with error)."
    frontmatter:
-      type: object
-      properties:
-        success:
-          type: boolean
-      required: [success]
-conditions:
-  insufficientInfo:
-    description: "Planner determined there's not enough info to proceed"
-    expression: "$last('planner').status = 'insufficient_info'"
-  devFailed:
-    description: "Developer failed to implement"
-    expression: "$last('developer').status = 'failed'"
-  rejected:
-    description: "Reviewer rejected the implementation"
-    expression: "$last('reviewer').approved = false"
-  fixCode:
-    description: "Tester found code issues"
-    expression: "$last('tester').status = 'fix_code'"
-  fixSpec:
-    description: "Tester found spec issues"
-    expression: "$last('tester').status = 'fix_spec'"
-  hookFailed:
-    description: "Push hook failed"
-    expression: "$last('committer').success = false"
+      oneOf:
+        - properties:
+            $status: { const: "committed" }
+            prUrl: { type: string }
+          required: [$status, prUrl]
+        - properties:
+            $status: { const: "hook_failed" }
+            error: { type: string }
+          required: [$status, error]
 graph:
  $START:
-    - role: "planner"
+    _: { role: "planner", prompt: "Analyze the issue and produce an implementation plan." }
  planner:
-    - role: "$END"
-      condition: "insufficientInfo"
-    - role: "developer"
+    insufficient_info: { role: "$END", prompt: "Insufficient information to proceed; end the workflow." }
+    ready: { role: "developer", prompt: "Implement the TDD test spec (CAS hash: {{{plan}}}) in repo {{{repoPath}}}." }
  developer:
-    - role: "$END"
-      condition: "devFailed"
-    - role: "reviewer"
+    _: { role: "reviewer", prompt: "Review branch {{{branch}}} at {{{worktree}}} for code standards compliance." }
  reviewer:
-    - role: "developer"
-      condition: "rejected"
-    - role: "tester"
+    rejected: { role: "developer", prompt: "Reviewer rejected: {{{comments}}}. Fix the issues." }
+    approved: { role: "tester", prompt: "Review passed. Run tests on branch {{{branch}}} at {{{worktree}}}." }
  tester:
-    - role: "developer"
-      condition: "fixCode"
-    - role: "planner"
-      condition: "fixSpec"
-    - role: "committer"
+    fix_code: { role: "developer", prompt: "Tests found code issues: {{{report}}}. Fix and re-submit." }
+    fix_spec: { role: "planner", prompt: "Tests found spec issues: {{{report}}}. Revise the test spec." }
+    passed: { role: "committer", prompt: "All tests passed. Commit and push branch {{{branch}}} from {{{worktree}}}." }
  committer:
-    - role: "developer"
-      condition: "hookFailed"
-    - role: "$END"
+    hook_failed: { role: "developer", prompt: "Push hook failed: {{{error}}}. Fix and re-submit." }
+    committed: { role: "$END", prompt: "PR created: {{{prUrl}}}. Workflow complete." }
@@ -8,10 +8,10 @@ This monorepo implements a stateless workflow engine driven by a single-step CLI

 | Concept | What it is |
 |---------|-----------|
-| **Workflow** | A YAML definition (`WorkflowPayload`) with roles, conditions, and a routing graph. Stored as a CAS node, identified by its XXH64 hash. |
+| **Workflow** | A YAML definition (`WorkflowPayload`) with roles, status-based routing, and a directed graph. Stored as a CAS node, identified by its XXH64 hash. |
 | **Thread** | A single execution of a workflow, identified by a ULID. State is an immutable CAS chain; active threads indexed in `threads.yaml`; completed threads in `history.jsonl`. |
 | **Role** | A named actor within a workflow. Each role has a system prompt and a JSON Schema `outputSchema`. |
-| **Moderator** | JSONata-based graph evaluator — determines the next role (or `$END`) with zero LLM cost. |
+| **Moderator** | Status-based graph evaluator — determines the next role (or `$END`) with zero LLM cost. |
 | **Agent** | An external CLI command (`uwf-hermes`, etc.) spawned by `uwf thread step`. Produces frontmatter markdown output. |
 | **CAS** | Content-Addressed Storage via `@uncaged/json-cas` — all workflow definitions, thread nodes, and outputs are immutable CAS nodes. |
 | **Registry** | `~/.uncaged/workflow/registry.yaml` — maps workflow names to current CAS hashes. |
@@ -23,7 +23,7 @@ workflow/
  packages/
    workflow-protocol/    # @uncaged/workflow-protocol — shared types (WorkflowPayload, StepNodePayload, WorkflowConfig, etc.)
    workflow-util/        # @uncaged/workflow-util — Crockford Base32, ULID, logger, frontmatter parsing/validation
-    workflow-moderator/   # @uncaged/workflow-moderator — JSONata graph evaluator
+    workflow-moderator/   # @uncaged/workflow-moderator — Status-based graph evaluator
    workflow-agent-kit/   # @uncaged/workflow-agent-kit — createAgent factory, context builder, extract pipeline
    workflow-agent-hermes/ # @uncaged/workflow-agent-hermes — uwf-hermes CLI binary (spawns hermes chat)
    cli-workflow/         # @uncaged/cli-workflow — uwf CLI binary
@@ -1,93 +1,104 @@
 # @uncaged/workflow

-A stateless workflow engine driven by a single-step CLI. Workflows are YAML definitions with roles, JSONata routing conditions, and a directed graph. Threads are immutable CAS-linked chains — each `uwf thread step` runs one moderator→agent→extract cycle and exits.
+A stateless workflow engine driven by a single-step CLI. Workflows are YAML definitions with roles, status-based routing, and a directed graph. Threads are immutable CAS-linked chains — each `uwf thread step` runs one moderator→agent→extract cycle and exits.

-## Package Map
+## Overview

-| Package | npm | Role |
-|---------|-----|------|
-| `cli-workflow` | `@uncaged/cli-workflow` | `uwf` CLI binary — thread lifecycle, workflow registry, CAS inspection, setup |
-| `workflow-protocol` | `@uncaged/workflow-protocol` | Shared TypeScript types (`WorkflowPayload`, `StepNodePayload`, `WorkflowConfig`, etc.) |
-| `workflow-moderator` | `@uncaged/workflow-moderator` | JSONata graph evaluator — determines next role or `$END` |
-| `workflow-agent-kit` | `@uncaged/workflow-agent-kit` | `createAgent` factory, context builder, two-layer extract pipeline |
-| `workflow-agent-hermes` | `@uncaged/workflow-agent-hermes` | `uwf-hermes` agent — spawns Hermes chat, captures session |
-| `workflow-util` | `@uncaged/workflow-util` | Crockford Base32, ULID, logger, frontmatter parsing |
+This monorepo implements **uwf**, a workflow engine with no long-running daemon. You register YAML workflow definitions in a content-addressed store (CAS), start a thread with an initial prompt, then invoke `uwf thread step` repeatedly until the moderator routes to `$END`. Each step is a complete process: the moderator evaluates status-based routing to pick the next role, an external agent CLI produces frontmatter markdown output, and an extract pipeline validates or structures that output against the role's JSON Schema.

-External: [`@uncaged/json-cas`](https://www.npmjs.com/package/@uncaged/json-cas) (CAS store + JSON Schema validation) + `@uncaged/json-cas-fs` (filesystem backend).
+Workflow state lives entirely on disk under `~/.uncaged/workflow/`: CAS nodes for definitions and step payloads, `registry.yaml` for workflow name→hash mappings, and `threads.yaml` for active thread head pointers. Completed threads are archived to `history.jsonl`. Because there is no server process, workflows are easy to debug, fork, and inspect with ordinary CLI tools.
+
+Agents are pluggable CLI binaries (`uwf-hermes`, `uwf-builtin`, `uwf-claude-code`, or custom commands). The engine spawns the configured agent with `<thread-id>` and `<role>`, sets `UWF_EDGE_PROMPT` from the graph transition, and captures both the agent's markdown output and a detail CAS node for session replay.
+
+## Architecture
+
+Dependency layers (lower layers have no dependency on higher layers):
+
+```
+Layer 0 — Contract
+  workflow-protocol          Shared types and JSON Schema definitions
+
+Layer 1 — Shared infra
+  workflow-util              Encoding, IDs, logging, frontmatter, paths
+  workflow-moderator         Status-based graph evaluator
+
+Layer 2 — Agent framework
+  workflow-agent-kit         createAgent factory, context builder, extract pipeline
+
+Layer 3 — Agent implementations
+  workflow-agent-hermes      Hermes ACP agent (uwf-hermes)
+  workflow-agent-builtin     Built-in LLM + tools agent (uwf-builtin)
+  workflow-agent-claude-code Claude Code agent (uwf-claude-code)
+
+Layer 4 — CLI
+  cli-workflow               uwf binary — thread lifecycle, registry, CAS, setup
+
+App (uses protocol; not in the runtime engine stack)
+  workflow-dashboard         Web UI for visual workflow editing
+```
+
+External CAS: [`@uncaged/json-cas`](https://www.npmjs.com/package/@uncaged/json-cas) (store API, hashing, schema validation) + `@uncaged/json-cas-fs` (filesystem backend).
+
+See [docs/architecture.md](docs/architecture.md) for the full design — three-phase engine loop, CAS node types, storage layout, agent CLI protocol, and design decisions.
+
+## Packages
+
+| Package | npm | Description | Type | README |
+|---------|-----|-------------|------|--------|
+| `cli-workflow` | `@uncaged/cli-workflow` | `uwf` CLI — thread lifecycle, workflow registry, CAS inspection, setup | cli | [README](packages/cli-workflow/README.md) |
+| `workflow-protocol` | `@uncaged/workflow-protocol` | Shared TypeScript types and JSON Schema constants | lib | [README](packages/workflow-protocol/README.md) |
+| `workflow-moderator` | `@uncaged/workflow-moderator` | Status-based graph evaluator — next role or `$END` | lib | [README](packages/workflow-moderator/README.md) |
+| `workflow-agent-kit` | `@uncaged/workflow-agent-kit` | `createAgent` factory, context builder, extract pipeline | lib | [README](packages/workflow-agent-kit/README.md) |
+| `workflow-util` | `@uncaged/workflow-util` | Crockford Base32, ULID, logger, frontmatter parsing, storage paths | lib | [README](packages/workflow-util/README.md) |
+| `workflow-agent-hermes` | `@uncaged/workflow-agent-hermes` | `uwf-hermes` — spawns Hermes chat via ACP | agent | [README](packages/workflow-agent-hermes/README.md) |
+| `workflow-agent-builtin` | `@uncaged/workflow-agent-builtin` | `uwf-builtin` — built-in LLM agent with file/shell tools | agent | [README](packages/workflow-agent-builtin/README.md) |
+| `workflow-agent-claude-code` | `@uncaged/workflow-agent-claude-code` | `uwf-claude-code` — spawns Claude Code CLI | agent | [README](packages/workflow-agent-claude-code/README.md) |
+| `workflow-dashboard` | `@uncaged/workflow-dashboard` | Web graph editor for workflow YAML (private, alpha) | app | [README](packages/workflow-dashboard/README.md) |

 ## Quick Start

 ```bash
-# 1. Configure provider and model
+# 1. Configure provider, model, and default agent
 uwf setup

 # 2. Register a workflow from YAML
-uwf workflow put examples/solve-issue.yaml
+uwf workflow add examples/solve-issue.yaml

-# 3. Start a thread
+# 3. Start a thread (creates head pointer; does not execute)
 uwf thread start solve-issue -p "Fix the login redirect bug"

 # 4. Execute steps (one at a time, until done)
-uwf thread step <thread-id>
+uwf thread exec <thread-id>
 ```

-## CLI Commands
+Use `-c, --count <number>` on `thread exec` to run multiple steps in one invocation. Override the agent with `--agent <cmd>`.

-### Thread
+## CLI Reference

-| Command | Description |
-|---------|-------------|
-| `uwf thread start <workflow> -p <prompt>` | Create a thread (no execution) |
-| `uwf thread step <thread-id> [--agent <cmd>]` | Execute one moderator→agent→extract cycle |
-| `uwf thread show <thread-id>` | Show head pointer and done status |
-| `uwf thread list [--all]` | List threads (`--all` includes archived) |
-| `uwf thread steps <thread-id>` | List all steps chronologically |
-| `uwf thread read <thread-id> [--quota N]` | Render thread as readable markdown |
-| `uwf thread fork <step-hash>` | Fork from a specific step |
-| `uwf thread step-details <step-hash>` | Dump full detail node |
-| `uwf thread kill <thread-id>` | Terminate and archive |
+Global options: `-V, --version`, `--format <json|yaml>`, `-h, --help`.

-### Workflow
+| Group | Commands |
+|-------|----------|
+| **thread** | `start`, `exec`, `show`, `list`, `stop`, `cancel`, `read` |
+| **step** | `list`, `show`, `read`, `fork` |
+| **workflow** | `add`, `show`, `list` |
+| **cas** | `get`, `put`, `put-text`, `has`, `refs`, `walk`, `reindex`, `schema list`, `schema get` |
+| **setup** | Interactive or `--provider`, `--base-url`, `--api-key`, `--model`, `--agent` |
+| **skill** | `cli` — print markdown reference of all uwf commands |
+| **log** | `list`, `show`, `clean` — process-level debug logs |

-| Command | Description |
-|---------|-------------|
-| `uwf workflow put <file.yaml>` | Register a workflow from YAML |
-| `uwf workflow show <name-or-hash>` | Show workflow definition |
-| `uwf workflow list` | List registered workflows |
+Config is stored in `~/.uncaged/workflow/config.yaml`. API keys go in `~/.uncaged/workflow/.env`.

-### CAS
-
-| Command | Description |
-|---------|-------------|
-| `uwf cas get <hash>` | Read a CAS node |
-| `uwf cas put <type-hash> <data>` | Store a node |
-| `uwf cas has <hash>` | Check existence |
-| `uwf cas refs <hash>` | List direct references |
-| `uwf cas walk <hash>` | Recursive traversal |
-| `uwf cas reindex` | Rebuild type index |
-| `uwf cas schema list` | List schemas |
-| `uwf cas schema get <hash>` | Show a schema |
-
-### Setup
-
-| Command | Description |
-|---------|-------------|
-| `uwf setup` | Interactive provider/model/agent configuration |
-| `uwf setup --provider ... --base-url ... --api-key ... --model ...` | Non-interactive setup |
-
-Config stored in `~/.uncaged/workflow/config.yaml`. API keys in `~/.uncaged/workflow/.env`.
+Detailed command usage, options, and examples: [packages/cli-workflow/README.md](packages/cli-workflow/README.md).

 ## Development

 ```bash
 bun install --no-cache     # Install dependencies
+bun run build              # tsc --build (all packages)
 bun run check              # tsc + biome + lint-log-tags
 bun run format             # Auto-format with Biome
 bun test                   # Run all tests
 ```

 Managed with **bun workspace**. See [CLAUDE.md](CLAUDE.md) for coding conventions.
-
-## Architecture
-
-See [docs/architecture.md](docs/architecture.md) for the full design — three-phase engine loop, CAS node types, storage layout, agent CLI protocol, and design decisions.
@@ -17,6 +17,15 @@
    "indentWidth": 2,
    "lineWidth": 100
  },
+  "css": {
+    "parser": {
+      "cssModules": true,
+      "tailwindDirectives": true
+    },
+    "linter": {
+      "enabled": false
+    }
+  },
  "javascript": {
    "formatter": {
      "quoteStyle": "double",
@@ -16,7 +16,7 @@ The implementation lives in **6** active packages under `packages/`, plus two ex
 |-------|---------|---------------|
 | Contract | `@uncaged/workflow-protocol` → `workflow-protocol` | Shared TypeScript types (`WorkflowPayload`, `StepNodePayload`, `ModeratorContext`, `WorkflowConfig`, etc.). No runtime deps beyond `@uncaged/json-cas-fs`. |
 | Shared infra | `@uncaged/workflow-util` → `workflow-util` | Crockford Base32, ULID generation, `createLogger`, frontmatter parsing/validation. |
-| Moderator | `@uncaged/workflow-moderator` → `workflow-moderator` | JSONata-based graph evaluator: given a `WorkflowPayload` and `ModeratorContext`, returns the next role or `$END`. |
+| Moderator | `@uncaged/workflow-moderator` → `workflow-moderator` | Status-based graph evaluator: given a routing graph, last role, and last output, returns the next role or `$END`. |
 | Agent framework | `@uncaged/workflow-agent-kit` → `workflow-agent-kit` | `createAgent` entrypoint factory, context builder, frontmatter fast-path extractor, LLM extract fallback, output format instruction builder. |
 | Agent: Hermes | `@uncaged/workflow-agent-hermes` → `workflow-agent-hermes` | `uwf-hermes` CLI binary — spawns `hermes chat`, pipes prompt, captures session detail. |
 | CLI | `@uncaged/cli-workflow` → `cli-workflow` | `uwf` binary — thread lifecycle, workflow registry, CAS inspection, setup. |
@@ -27,7 +27,7 @@ The implementation lives in **6** active packages under `packages/`, plus two ex
 |---------|------|
 | `@uncaged/json-cas` | Content-addressed store API, XXH64 hashing, JSON Schema registration and validation. |
 | `@uncaged/json-cas-fs` | Filesystem backend for `json-cas`. |
-| `jsonata` | JSONata expression evaluator (used by `workflow-moderator`). |
+| `mustache` | Template renderer for edge prompts (used by `workflow-moderator`). |
 | `commander` | CLI argument parsing (used by `cli-workflow`). |
 | `dotenv` | Loads `.env` files for API keys. |
 | `yaml` | YAML parse/stringify. |
@@ -148,8 +148,7 @@ graph:
 Key properties:

 - **`roles`** — inline role definitions; each `meta` is a JSON Schema (stored as its own CAS node on registration)
- **`conditions`** — named JSONata expressions evaluated against the `ModeratorContext`
- **`graph`** — `Record<Role | "$START", Transition[]>` — first matching transition wins; `condition: null` = fallback
+- **`graph`** — `Record<Role | "$START", Record<Status, Target>>` — status-based routing; each role maps statuses to targets
 - **No agent binding** — agent selection is a deployment concern, configured in `config.yaml`
 - **No Zod** — all schemas are JSON Schema, validated through `@uncaged/json-cas`

@@ -159,8 +158,8 @@ Each `uwf thread step` runs exactly one cycle: moderator → agent → extract.

 ```
 ┌─→ Phase 1: MODERATOR
-│   Input:  WorkflowPayload + ModeratorContext { start, steps[] }
-│   Engine: JSONata conditions evaluated against the graph
+│   Input:  graph + lastRole + lastOutput
+│   Engine: Status-based map lookup against lastOutput.status
 │   Output: next role name | $END
 │
 │   Phase 2: AGENT
@@ -207,7 +206,7 @@ type AgentContext = ModeratorContext & {

 ### Key properties

- **Moderator** — pure JSONata evaluation; no LLM call, no I/O beyond CAS reads. Evaluates `workflow.graph[currentRole]` transitions in order, returns first match.
+- **Moderator** — pure status-based map lookup; no LLM call, no I/O beyond CAS reads. Looks up `graph[lastRole][lastOutput.status]` to get the next target.
 - **Agent** — receives `AgentContext` with thread history + role system prompt + output format instruction. Raw output is frontmatter markdown.
 - **Extractor** — two-layer: tries frontmatter fast-path first (zero LLM cost), falls back to LLM extract if frontmatter is absent or invalid.
 - **Stateless** — each `uwf thread step` is an atomic, self-contained operation. No in-memory state between steps.
@@ -485,7 +484,7 @@ Binary: `uwf`
 | **YAML workflow definitions** | Human-readable, versionable, no build step required. JSON Schema inline in YAML, registered as CAS nodes on `workflow put`. |
 | **Stateless single-step CLI** | Each `uwf thread step` is atomic — no in-memory state, no daemon, no long-running process. OS handles lifecycle. |
 | **CAS-backed thread state** | Immutable linked nodes enable fork, replay, and GC without copying data. Content-addressed deduplication across threads. |
-| **JSONata moderator** | Declarative condition expressions evaluated against thread history. No LLM cost for routing decisions. |
+| **Status-based moderator** | Status-based map routing — `graph[role][status]` lookup against last output. No LLM cost for routing decisions. |
 | **Frontmatter markdown output** | Agents produce structured meta (YAML frontmatter) alongside free-form content (markdown body). Enables zero-cost extraction when frontmatter is well-formed. |
 | **Two-layer extract** | Fast path avoids LLM calls when agents follow the format; LLM fallback handles messy output gracefully. |
 | **Prompt injection for format** | Output format instruction prepended to system prompt ensures agents produce parseable output without per-agent configuration. |
@@ -0,0 +1,779 @@
+# Built-in Role Agent 调研
+
+## 目标
+
+实现一个内置的 role agent（暂称 `uwf-builtin`），不依赖 hermes/openclaw 等外部 agent 进程。
+直接使用 workflow config 中配置的 model，自己实现 agent run loop 和关键 toolkit。
+
+---
+
+## 关键问题
+
+### Q1: Agent 接口协议
+
+现有 agent 是怎么被 CLI 调用的？输入（argv、环境变量）和输出（stdout、CAS）格式是什么？
+
+**调研要点：**
+- `cli-workflow` 里 `spawnAgent` 的完整实现
+- AgentConfig 类型定义
+- agent 进程的 exit code 约定
+- 环境变量传递（UWF_STORAGE_ROOT 等）
+
+**答案：**
+
+#### 调用链
+
+`uwf thread step` → `cmdThreadStepOnce` → moderator 求值下一 role → `resolveAgentConfig` → `spawnAgent`。
+
+#### AgentConfig 类型
+
+```146:149:packages/workflow-protocol/src/types.ts
+export type AgentConfig = {
+  command: string;
+  args: string[];
+};
+```
+
+在 `config.yaml` 的 `agents` 段注册，例如 `hermes: { command: "uwf-hermes", args: [] }`。
+
+#### spawnAgent 行为
+
+```627:653:packages/cli-workflow/src/commands/thread.ts
+function spawnAgent(agent: AgentConfig, threadId: ThreadId, role: string): CasRef {
+  const argv = [...agent.args, threadId, role];
+  let stdout: string;
+  try {
+    stdout = execFileSync(agent.command, argv, {
+      encoding: "utf8",
+      env: process.env,
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+  } catch (e) {
+  // ... stderr 拼进 fail 消息
+  }
+
+  const line = stdout.trim().split("\n").pop()?.trim() ?? "";
+  if (!isCasRef(line)) {
+    fail(`agent stdout is not a valid CAS hash: ${line || "(empty)"}`);
+  }
+  return line;
+}
+```
+
+| 项目 | 约定 |
+|------|------|
+| **argv** | `[...agent.args, <thread-id>, <role>]`，即 `process.argv[2]`=threadId，`process.argv[3]`=role（与 `createAgent` 的 `parseArgv` 一致） |
+| **stdin** | 忽略 |
+| **stdout** | 纯文本，**最后一行**必须是新 `StepNode` 的 CAS hash（13 字符 Crockford Base32） |
+| **stderr** | 失败时 CLI 会附带 stderr；成功时无约定 |
+| **exit code** | `0` = 成功；非 0 时 `execFileSync` 抛错，step 失败 |
+| **环境变量** | 继承父进程 `process.env`（含 storage root、API key 等） |
+| **链头更新** | **不由 agent 负责**；agent 只写 CAS StepNode，CLI 在拿到 stdout hash 后更新 `threads.yaml` |
+
+Agent 解析优先级（`resolveAgentConfig`）：
+
+1. CLI `--agent` override（整段 command + args 字符串）
+2. `config.agentOverrides[workflow.name][role]`
+3. `config.defaultAgent`
+
+#### 环境变量：Storage Root
+
+文档中写的 `UWF_STORAGE_ROOT` **在当前代码中不存在**。实际优先级（`workflow-agent-kit` / `cli-workflow` 一致）：
+
+```33:43:packages/workflow-agent-kit/src/storage.ts
+export function resolveStorageRoot(): string {
+  const internal = process.env.UNCAGED_WORKFLOW_STORAGE_ROOT;
+  if (internal !== undefined && internal !== "") {
+    return internal;
+  }
+  const userOverride = process.env.WORKFLOW_STORAGE_ROOT;
+  if (userOverride !== undefined && userOverride !== "") {
+    return userOverride;
+  }
+  return getDefaultStorageRoot();
+}
+```
+
+Agent 子进程通过继承的 `process.env` 与父 CLI 共享同一 storage root；`createAgent` 内还会 `loadDotenv({ path: getEnvPath(storageRoot) })` 加载 `~/.uncaged/workflow/.env`。
+
+#### Agent 侧职责（设计文档 + 实现）
+
+- 读 `threads.yaml` 链头，构建 context，执行 role
+- 将 `StepNode` 写入 CAS（`output` / `detail` / `agent` / `prev` / `start`）
+- stdout 打印 step hash
+- **不**更新 `threads.yaml`
+
+---
+
+### Q2: createAgent 工厂
+
+workflow-agent-kit 的 `createAgent` 做了什么？它的完整生命周期是什么？
+
+**调研要点：**
+- `AgentOptions` 类型的 `run` 和 `continue` 回调签名
+- `AgentRunResult` 的完整定义
+- retry 逻辑（frontmatter 校验失败后的重试机制）
+- `persistStep` 写入 CAS 的 StepNode 结构
+
+**答案：**
+
+#### 类型定义
+
+```4:35:packages/workflow-agent-kit/src/types.ts
+export type AgentContext = ModeratorContext & {
+  threadId: ThreadId;
+  role: string;
+  store: Store;
+  workflow: WorkflowPayload;
+  outputFormatInstruction: string;
+};
+
+export type AgentRunResult = {
+  output: string;
+  detailHash: CasRef;
+  sessionId: string;
+};
+
+export type AgentContinueFn = (
+  sessionId: string,
+  message: string,
+  store: AgentContext["store"],
+) => Promise<AgentRunResult>;
+
+export type AgentRunFn = (ctx: AgentContext) => Promise<AgentRunResult>;
+
+export type AgentOptions = {
+  name: string;
+  run: AgentRunFn;
+  continue: AgentContinueFn;
+};
+```
+
+- **`run(ctx)`**：首次执行，返回原始 agent 文本 `output`、审计用 `detailHash`、用于续聊的 `sessionId`。
+- **`continue(sessionId, message, store)`**：在同一 session 上追加用户消息（用于 frontmatter 纠错），再次返回 `AgentRunResult`。
+
+`createAgent(options)` 返回 `() => Promise<void>`，作为 agent CLI 的 `main`（见 `uwf-hermes` 的 `cli.ts`）。
+
+#### 生命周期（按执行顺序）
+
+```101:152:packages/workflow-agent-kit/src/run.ts
+export function createAgent(options: AgentOptions): () => Promise<void> {
+  return async function main(): Promise<void> {
+    const { threadId, role } = parseArgv(process.argv);
+    const storageRoot = resolveStorageRoot();
+    loadDotenv({ path: getEnvPath(storageRoot) });
+
+    const ctx = await buildContextWithMeta(threadId, role);
+    // 1. 校验 role 存在
+    // 2. 从 CAS 取 frontmatter JSON Schema → buildOutputFormatInstruction → ctx.outputFormatInstruction
+
+    let agentResult = await options.run(ctx);
+
+    let outputHash = await tryExtractOutput(agentResult.output, roleDef.frontmatter, ctx);
+
+    for (let retry = 0; retry < MAX_FRONTMATTER_RETRIES && outputHash === null; retry++) {
+      const correctionMessage = "Your previous response did not contain valid YAML frontmatter...";
+      agentResult = await options.continue(agentResult.sessionId, correctionMessage, ctx.meta.store);
+      outputHash = await tryExtractOutput(agentResult.output, roleDef.frontmatter, ctx);
+    }
+
+    if (outputHash === null) { fail(...); }
+
+    const stepHash = await persistStep({ ctx, outputHash, detailHash: agentResult.detailHash, agentName });
+    process.stdout.write(`${stepHash}\n`);
+  };
+}
+```
+
+| 阶段 | 行为 |
+|------|------|
+| 解析 argv | `argv[2]=threadId`, `argv[3]=role`，缺失则 `stderr` + `exit(1)` |
+| Context | `buildContextWithMeta` + 可选 `outputFormatInstruction` |
+| Run | `options.run(ctx)` |
+| Extract | **仅** `tryFrontmatterFastPath`（见 Q4）；**不**调用 `extract()` LLM fallback |
+| Retry | 最多 `MAX_FRONTMATTER_RETRIES = 2` 次 `continue` + 再试 fast-path |
+| Persist | `persistStep` → `writeStepNode` |
+| 输出 | stdout 一行 step CAS hash |
+
+#### StepNode 写入结构
+
+```44:68:packages/workflow-agent-kit/src/run.ts
+async function writeStepNode(options: {
+  store: AgentStore["store"];
+  schemas: AgentStore["schemas"];
+  startHash: CasRef;
+  prevHash: CasRef | null;
+  role: string;
+  outputHash: CasRef;
+  detailHash: CasRef;
+  agentName: string;
+}): Promise<CasRef> {
+  const payload: StepNodePayload = {
+    start: options.startHash,
+    prev: options.prevHash,
+    role: options.role,
+    output: options.outputHash,
+    detail: options.detailHash,
+    agent: options.agentName,
+  };
+  // store.put(stepNode schema) + validate
+}
+```
+
+`agentName` 经 `agentLabel(name)` 规范化：已有 `uwf-` 前缀则原样，否则加 `uwf-`（如 `hermes` → `uwf-hermes`）。
+
+`prevHash`：若链头仍是 `StartNode` 则为 `null`，否则为当前 head step hash。
+
+---
+
+### Q3: Context Builder
+
+`buildContextWithMeta` 构建了什么上下文给 agent？
+
+**调研要点：**
+- `AgentContext` 完整类型定义（所有字段）
+- context 构建过程（CAS chain walk）
+- `outputFormatInstruction` 怎么生成的
+- role definition 怎么获取（从 workflow YAML）
+
+**答案：**
+
+#### AgentContext 字段
+
+继承 `ModeratorContext`：
+
+```60:68:packages/workflow-protocol/src/types.ts
+export type ModeratorContext = {
+  start: StartNodePayload;
+  steps: StepContext[];
+};
+```
+
+```48:51:packages/workflow-protocol/src/types.ts
+export type StartNodePayload = {
+  workflow: CasRef;
+  prompt: string;
+};
+```
+
+```61:63:packages/workflow-protocol/src/types.ts
+export type StepContext = Omit<StepRecord, "output"> & {
+  output: unknown;
+};
+```
+
+`AgentContext` 额外字段：
+
+| 字段 | 类型 | 含义 |
+|------|------|------|
+| `threadId` | `ThreadId` | 当前线程 |
+| `role` | `string` | 本步要执行的角色名 |
+| `store` | `Store` | CAS store（读写节点） |
+| `workflow` | `WorkflowPayload` | 已从 CAS 加载的 workflow 定义 |
+| `outputFormatInstruction` | `string` | 由 `createAgent` 根据 role 的 frontmatter schema 生成；`buildContext*` 初始为 `""` |
+
+`buildContextWithMeta` 还返回 `meta`：
+
+```148:154:packages/workflow-agent-kit/src/context.ts
+export type BuildContextMeta = {
+  storageRoot: string;
+  store: Store;
+  schemas: AgentStore["schemas"];
+  headHash: CasRef;
+  chain: ChainState;
+};
+```
+
+#### CAS chain walk
+
+1. 从 `threads.yaml[threadId]` 取 `headHash`
+2. `walkChain`：若 head 是 `StartNode`，`stepsNewestFirst=[]`；否则沿 `prev` 收集所有 `StepNode`， newest-first
+3. `buildHistory`：反转为时间序，`expandOutput` 把每步 `output` CasRef 展开为 JSON payload（供 prompt / moderator 使用）
+4. `loadWorkflow`：从 `start.workflow` CasRef 加载 `WorkflowPayload`
+
+#### Role definition 来源
+
+- 作者写在 workflow YAML 的 `roles.<name>`（`goal`, `capabilities`, `procedure`, `output`, `frontmatter` 等）
+- `uwf workflow put` 时 `frontmatter` 内联 JSON Schema 经 `putSchema` 存入 CAS，workflow 里存的是 **CasRef**
+- Agent 运行时：`ctx.workflow.roles[ctx.role]` → `RoleDefinition`
+
+#### outputFormatInstruction
+
+在 `createAgent` 中，若 `getSchema(store, roleDef.frontmatter)` 非空，则：
+
+```typescript
+ctx.outputFormatInstruction = buildOutputFormatInstruction(frontmatterSchema);
+```
+
+`buildOutputFormatInstruction` 根据 JSON Schema 的 `properties` 生成「必须以 `---` YAML frontmatter 开头」的说明和示例字段列表（见 `build-output-format-instruction.ts`）。
+
+各 agent 实现（Hermes / Claude Code）在组装 prompt 时把该块放在最前，再接 `buildRolePrompt(roleDef)`。
+
+---
+
+### Q4: Extract Pipeline
+
+agent 输出怎么被处理成结构化数据？
+
+**调研要点：**
+- frontmatter fast-path 的完整逻辑
+- LLM extract fallback 的实现（`extract.ts`）
+- frontmatter schema 从哪里来（role 定义里的 `frontmatter` 字段）
+- 校验失败时的 correction prompt 是什么
+
+**答案：**
+
+#### Schema 来源
+
+Workflow YAML 中每个 role 的 `frontmatter:` 段是 JSON Schema 对象；注册时：
+
+```66:76:packages/cli-workflow/src/commands/workflow.ts
+async function resolveFrontmatterRef(..., frontmatter: unknown): Promise<CasRef> {
+  // 校验为 JSON Schema → putSchema → 返回 CasRef
+}
+```
+
+运行时 `roleDef.frontmatter` 即该 schema 的 CAS hash；structured `output` 节点用**同一 schema** 写入 CAS。
+
+#### Frontmatter fast-path（createAgent 实际使用的路径）
+
+```148:195:packages/workflow-agent-kit/src/frontmatter.ts
+export async function tryFrontmatterFastPath(
+  raw: string,
+  outputSchema: CasRef,
+  store: Store,
+): Promise<FrontmatterFastPathResult | null>
+```
+
+流程：
+
+1. `parseFrontmatterMarkdown(raw)` → 标准 agent 字段（`status`, `next`, `confidence`, `artifacts`, `scope`）+ body
+2. `validateFrontmatter` 失败 → `null`
+3. `getSchema(store, outputSchema)` + `extractSchemaFields` 得到 role 需要的属性名
+4. `buildCandidate`：从标准 frontmatter + YAML 原始字段拼出符合 schema 的对象
+5. `store.put(outputSchema, candidate)` + `validate` → 成功则 `{ body, outputHash }`
+
+**永不抛错**，失败返回 `null`。
+
+#### LLM extract fallback（已实现但未接入 createAgent）
+
+```135:181:packages/workflow-agent-kit/src/extract.ts
+export async function extract(
+  rawOutput: string,
+  outputSchema: CasRef,
+  config: WorkflowConfig,
+): Promise<ExtractResult>
+```
+
+- 模型：`resolveExtractModelAlias(config)` → `modelOverrides.extract` → `models.extract` → `models.default` → `defaultModel`
+- HTTP：`POST {baseUrl}/chat/completions`，`response_format: { type: "json_object" }`
+- System：要求按 JSON Schema 从 agent 输出提取单个 JSON 对象
+- 校验通过后 `store.put(outputSchema, structured)`
+
+**重要：`createAgent` 当前未调用 `extract()`**。fast-path 失败且 2 次 `continue` 仍失败则直接 `fail()`。builtin agent 若希望无 frontmatter 也能跑，需在 kit 或 builtin 层显式接入 `extract()`。
+
+#### Correction prompt（retry）
+
+```125:128:packages/workflow-agent-kit/src/run.ts
+const correctionMessage =
+  "Your previous response did not contain valid YAML frontmatter matching the role schema.\n" +
+  "You MUST begin your response with a YAML frontmatter block (--- delimited).\n" +
+  "Please output ONLY the corrected frontmatter block followed by your work.";
+```
+
+通过 `options.continue(sessionId, correctionMessage, store)` 发给外部 agent；builtin 需在自有 message 历史里 append 同等语义的 user 消息。
+
+---
+
+### Q5: Model 配置与 LLM 调用
+
+workflow 怎么配置和使用 model？
+
+**调研要点：**
+- `WorkflowConfig` 中 providers/models/defaultModel/modelOverrides 的完整定义
+- `resolveModel` 函数的实现
+- `chatCompletionText` 的实现（OpenAI 兼容 HTTP 客户端）
+- 有没有 streaming 支持？tool calling 支持？
+
+**答案：**
+
+#### WorkflowConfig
+
+```136:160:packages/workflow-protocol/src/types.ts
+export type ProviderConfig = {
+  baseUrl: string;
+  apiKeyEnv: string;
+};
+
+export type ModelConfig = {
+  provider: ProviderAlias;
+  name: string;
+};
+
+export type WorkflowConfig = {
+  providers: Record<ProviderAlias, ProviderConfig>;
+  models: Record<ModelAlias, ModelConfig>;
+  agents: Record<AgentAlias, AgentConfig>;
+  defaultAgent: AgentAlias;
+  agentOverrides: Record<WorkflowName, Record<RoleName, AgentAlias>> | null;
+  defaultModel: ModelAlias;
+  modelOverrides: Record<Scenario, ModelAlias> | null;
+};
+```
+
+示例见 `docs/architecture.md`（`providers` / `models` / `defaultModel` / `modelOverrides.extract`）。
+
+#### resolveModel
+
+```32:50:packages/workflow-agent-kit/src/extract.ts
+export function resolveModel(config: WorkflowConfig, alias: ModelAlias): ResolvedLlmProvider {
+  const modelEntry = config.models[alias];
+  const providerEntry = config.providers[modelEntry.provider];
+  const apiKey = process.env[providerEntry.apiKeyEnv];
+  return { baseUrl: providerEntry.baseUrl, apiKey, model: modelEntry.name };
+}
+```
+
+`ResolvedLlmProvider = { baseUrl, apiKey, model }`。
+
+Extract 专用别名解析：
+
+```18:30:packages/workflow-agent-kit/src/extract.ts
+export function resolveExtractModelAlias(config: WorkflowConfig): ModelAlias {
+  return config.modelOverrides?.extract ?? (config.models.extract ? "extract" : config.models.default ? "default" : config.defaultModel);
+}
+```
+
+**尚无** `modelOverrides` 按 role/workflow 解析 agent 主模型的函数；builtin 首版可用 `config.defaultModel`，扩展时可加 `modelOverrides.agent` 或与 `agentOverrides` 对称的表。
+
+#### chatCompletionText
+
+```87:124:packages/workflow-agent-kit/src/extract.ts
+async function chatCompletionText(
+  provider: ResolvedLlmProvider,
+  messages: Array<{ role: "system" | "user"; content: string }>,
+): Promise<string>
+```
+
+| 能力 | 现状 |
+|------|------|
+| 协议 | OpenAI 兼容 `POST /chat/completions` |
+| Streaming | **无**（一次性 `response.text()`） |
+| Tool calling | **无**（无 `tools` / `tool_calls` 字段） |
+| 多模态 | **无**（仅 text `content`） |
+| Extract 专用 | `response_format: { type: "json_object" }` |
+
+builtin agent 的 run loop 需要**新写**带 `tools` 的 completion 客户端（可放在 `workflow-agent-builtin` 或扩展 `workflow-agent-kit` 的 `llm/` 模块），不能复用当前 `chatCompletionText` 而不改。
+
+---
+
+### Q6: Hermes Agent 参考实现
+
+`uwf-hermes` 是怎么实现 `run` 和 `continue` 的？
+
+**调研要点：**
+- prompt 怎么组装的（outputFormatInstruction + rolePrompt + task + history）
+- hermes CLI 的调用参数
+- session management（resume）
+- 输出怎么捕获
+
+**答案：**
+
+#### Prompt 组装
+
+```40:53:packages/workflow-agent-hermes/src/hermes.ts
+export function buildHermesPrompt(ctx: AgentContext): string {
+  const roleDef = ctx.workflow.roles[ctx.role];
+  const rolePrompt = roleDef !== undefined ? buildRolePrompt(roleDef) : "";
+  const parts: string[] = [];
+  if (ctx.outputFormatInstruction !== "") {
+    parts.push(ctx.outputFormatInstruction, "");
+  }
+  parts.push(rolePrompt, "", "## Task", ctx.start.prompt);
+  const historyBlock = buildHistorySummary(ctx.steps);
+  if (historyBlock !== "") {
+    parts.push("", historyBlock);
+  }
+  return parts.join("\n");
+}
+```
+
+`buildRolePrompt` 生成 `## Goal` / `## Capabilities` / `## Prepare`（含 `generateCliReference()`）/ `## Procedure` / `## Output`。
+
+`buildHistorySummary`：每步 `role`、`JSON.stringify(step.output)`、`agent`。
+
+Hermes 把**整段 prompt 作为单条 user 消息**传给 `hermes chat -q`（无独立 system channel）。
+
+#### Hermes CLI 参数
+
+首次：
+
+```88:97:packages/workflow-agent-hermes/src/hermes.ts
+spawnHermes(["chat", "-q", prompt, "--yolo", "--max-turns", "90", "--quiet"]);
+```
+
+续聊：
+
+```100:114:packages/workflow-agent-hermes/src/hermes.ts
+spawnHermes(["chat", "--resume", sessionId, "-q", message, "--yolo", "--max-turns", "90", "--quiet"]);
+```
+
+#### Session
+
+- stdout/stderr 中解析 `session_id: <id>`（`parseSessionIdFromStdout`）
+- 会话文件：`~/.hermes/sessions/session_<id>.json`
+- `loadHermesSession` → `storeHermesSessionDetail`：每 assistant/tool 消息写成 CAS turn 节点，汇总为 `detail`；**output 文本** = 最后一条非空 `assistant` 的 `content`
+
+#### 与 createAgent 的衔接
+
+```157:164:packages/workflow-agent-hermes/src/hermes.ts
+export function createHermesAgent(): () => Promise<void> {
+  return createAgent({ name: "hermes", run: runHermes, continue: continueHermes });
+}
+```
+
+`uwf-hermes` 入口：`createHermesAgent()` 即 main。
+
+Claude Code 包（`workflow-agent-claude-code`）结构相同：`buildClaudeCodePrompt` 同构，`claude -p` + `--resume` + JSON stdout 解析。
+
+---
+
+### Q7: Toolkit 需求分析
+
+要实现一个自给自足的 agent，最少需要哪些 tool？
+
+**调研要点：**
+- 现有 workflow example（solve-issue.yaml）里 role 都做什么任务
+- hermes agent 在 workflow 场景下常用哪些 tool
+- 哪些 tool 是 agent loop 必须的（如 file read/write、shell exec、web fetch）
+
+**答案：**
+
+#### solve-issue.yaml 角色能力
+
+| Role | capabilities | 隐含需求 |
+|------|----------------|----------|
+| planner | issue-analysis, planning | 读上下文/仓库、总结，通常不需写代码 |
+| developer | file-edit, shell, testing | **读文件、写文件、执行命令** |
+| reviewer | code-review, static-analysis | 读 diff/文件、静态分析（可读+可选 shell） |
+
+#### Hermes 侧
+
+Hermes 自带完整 agent runtime（`--yolo`、max-turns），tool 集由 Hermes 项目定义，workflow 不配置。从 session JSON 可见 `tool_calls` 被记入 detail，常见包括文件与 shell 类工具。
+
+#### Builtin 最小 toolkit 建议
+
+| 优先级 | Tool | 用途 |
+|--------|------|------|
+| P0 | `read_file` | 读仓库/配置/issue 上下文 |
+| P0 | `write_file` / `edit_file` | developer 改代码 |
+| P0 | `run_command` | 测试、构建、git（需 cwd + timeout + 输出截断） |
+| P1 | `list_dir` / `glob` | 导航代码库 |
+| P1 | `grep` | 搜索符号/引用 |
+| P2 | `fetch_url` | 查文档（planner 偶尔需要） |
+
+**不需要**在 builtin 里实现 moderator / workflow 路由工具——仍由 `uwf thread step` + status-based moderator 负责。
+
+#### Agent loop 必须能力
+
+1. 多轮 LLM 调用 + **OpenAI-style tool_calls** 解析与执行
+2. 将 tool 结果 append 回 messages
+3. 终止条件：模型不再请求 tool，或达到 `maxTurns`
+4. 最终响应须含合法 YAML frontmatter（满足 Q4），供 `createAgent` fast-path
+
+---
+
+## 方案草案
+
+（调研完成后基于以上答案撰写）
+
+### 架构设计
+
+```mermaid
+flowchart TB
+  subgraph cli ["cli-workflow"]
+    Step["uwf thread step"]
+    Spawn["spawnAgent(uwf-builtin, threadId, role)"]
+    Step --> Spawn
+  end
+
+  subgraph builtin_pkg ["@uncaged/workflow-agent-builtin"]
+    Main["createBuiltinAgent() = createAgent({...})"]
+    Prompt["buildBuiltinPrompt(ctx)"]
+    Loop["runBuiltinLoop(provider, messages, tools)"]
+    Tools["Toolkit: read/write/exec/..."]
+    Detail["storeBuiltinDetail(turns)"]
+    Main --> Prompt
+    Main --> Loop
+    Loop --> Tools
+    Loop --> Detail
+  end
+
+  subgraph kit ["workflow-agent-kit"]
+    Ctx["buildContextWithMeta"]
+    FM["tryFrontmatterFastPath"]
+    Persist["persistStep"]
+    Ctx --> Main
+    Main --> FM
+    FM --> Persist
+  end
+
+  subgraph cas ["CAS / config"]
+    Config["config.yaml models/providers"]
+    CAS["cas/ + threads.yaml"]
+  end
+
+  Spawn --> Main
+  Config --> Loop
+  CAS --> Ctx
+  Persist --> CAS
+  Spawn -->|"stdout: step hash"| Step
+```
+
+**新包**：`packages/workflow-agent-builtin`，bin `uwf-builtin`，仅依赖 `workflow-agent-kit`、`workflow-protocol`、`workflow-util`（可选 `@uncaged/json-cas` 写 detail schema）。
+
+**分层**：
+
+| 层 | 职责 |
+|----|------|
+| `createAgent`（kit） | argv、context、frontmatter extract、StepNode、stdout 协议 — **不变** |
+| `builtin/agent.ts` | `run` / `continue` 实现 |
+| `builtin/llm.ts` | OpenAI 兼容 chat + tools（可后续抽到 kit） |
+| `builtin/tools/*.ts` | 各 tool 的 JSON Schema + handler |
+| `builtin/prompt.ts` | 复用 Hermes 的 prompt 拼接逻辑（或抽到 kit 的 `buildAgentPrompt`） |
+| `builtin/detail.ts` | 类似 Hermes：每轮 assistant/tool 写入 CAS detail |
+
+**配置集成**：
+
+```yaml
+agents:
+  builtin:
+    command: "uwf-builtin"
+    args: []
+defaultAgent: "builtin"   # 或 agentOverrides 按 role 指定
+```
+
+模型：首版 `resolveModel(config, config.defaultModel)`；后续可增加 `modelOverrides.agent` 或 per-role 映射。
+
+---
+
+### Agent Run Loop
+
+伪代码（单次 `run(ctx)`）：
+
+```
+1. provider ← resolveModel(loadWorkflowConfig(), defaultModel)
+2. system ← buildBuiltinPrompt(ctx)   // outputFormatInstruction + buildRolePrompt + Task + History
+3. messages ← [{ role: "system", content: system }]
+4. sessionId ← newULID()              // 内存或临时目录，供 continue 使用
+5. turns ← []
+
+6. for turn in 1..MAX_TURNS:
+     response ← chatCompletionWithTools(provider, messages, TOOL_DEFINITIONS)
+     record assistant message + tool_calls in turns
+
+     if response has no tool_calls:
+       finalText ← response.content
+       break
+
+     for each tool_call:
+       result ← executeTool(tool_call, { cwd: process.cwd() })
+       messages.push tool result
+       record in turns
+
+7. if no finalText with valid frontmatter after loop:
+     optionally one-shot "finalize" message without tools
+
+8. detailHash ← storeBuiltinDetail(store, sessionId, turns, metadata)
+9. return { output: finalText, detailHash, sessionId }
+```
+
+**`continue(sessionId, message, store)`**：
+
+- 从内存/磁盘恢复 `messages` + `turns`
+- `messages.push({ role: "user", content: message })`（correction 或续聊）
+- 从步骤 6 继续，步数上限可单独设小一点（如 3）
+- 返回新的 `AgentRunResult`
+
+**与 frontmatter 的配合**：
+
+- system prompt 已含 `outputFormatInstruction`；最后一轮可强制 user：`Now output your final answer with YAML frontmatter only if you have not yet.`
+- 仍依赖 `createAgent` 的 fast-path + 最多 2 次 continue
+
+**安全**：
+
+- `run_command`：白名单或需 `UWF_BUILTIN_ALLOW_SHELL=1`，默认工作区限定在 `process.cwd()` 或 `start` 中将来扩展的 `workspace` 字段
+- 路径：禁止 `..` 逃逸出 workspace root
+
+---
+
+### Toolkit 设计
+
+统一注册表：
+
+```typescript
+type BuiltinTool = {
+  name: string;
+  description: string;
+  parameters: JSONSchema; // object type
+  execute: (args: unknown, ctx: ToolContext) => Promise<string>;
+};
+
+type ToolContext = {
+  cwd: string;
+  storageRoot: string;
+};
+```
+
+| Tool name | OpenAI function | 行为摘要 |
+|-----------|-----------------|----------|
+| `read_file` | `read_file` | `{ path }` → UTF-8 文本，大小上限 |
+| `write_file` | `write_file` | `{ path, content }` → 写盘，返回确认 |
+| `edit_file` | 可选 | search/replace 块，减少 token |
+| `run_command` | `run_command` | `{ command, cwd? }` → stdout/stderr 截断 |
+| `list_dir` | `list_dir` | `{ path }` → 条目列表 |
+| `grep` | `grep` | `{ pattern, path? }` → 匹配行 |
+
+**LLM 请求形状**（扩展 extract 客户端）：
+
+```json
+{
+  "model": "...",
+  "messages": [...],
+  "tools": [{ "type": "function", "function": { "name", "description", "parameters" } }],
+  "tool_choice": "auto"
+}
+```
+
+解析 `choices[0].message.tool_calls`，执行后以 `{ role: "tool", tool_call_id, content }` 回传。
+
+**不提供** streaming 首版；detail CAS 记录每轮 tool 名/参数/结果摘要供 `uwf thread step-details` 调试。
+
+---
+
+### 与现有架构的集成
+
+| 集成点 | 方式 |
+|--------|------|
+| CLI 协议 | 实现标准 agent CLI：`uwf-builtin <thread-id> <role>`，stdout 一行 step hash，exit 0/1 |
+| 工厂 | `export function createBuiltinAgent()` → `createAgent({ name: "builtin", run, continue })` |
+| Context / Prompt | 复用 `buildContextWithMeta`、`buildRolePrompt`、`buildOutputFormatInstruction`；prompt 布局对齐 `buildHermesPrompt` |
+| 结构化输出 | 优先 YAML frontmatter fast-path；可选后续在 `createAgent` 增加 `extract()` fallback 开关 |
+| 配置 | `config.yaml` 增加 `agents.builtin`；`uwf setup` 可选默认 agent |
+| 存储 | `resolveStorageRoot()` + `loadWorkflowConfig` + `getEnvPath`；与 Hermes 相同，**不**改 `threads.yaml` 写入方 |
+| 测试 | 单元测试：tool handlers、prompt 组装、mock LLM tool loop；集成测试：临时 storage root + fake provider |
+| 发布 | 新包 `@uncaged/workflow-agent-builtin`，bin `uwf-builtin`，加入 `scripts/publish-all.mjs` |
+
+**明确不做**：
+
+- 不替代 moderator / 不在 agent 内调用 `uwf thread step`
+- 不依赖 Hermes/OpenClaw/Claude Code 二进制
+- 首版不实现 streaming、不实现 MCP
+
+**建议实现顺序**：
+
+1. `llm.ts`：tool calling HTTP 客户端 + 单测
+2. P0 tools + `runBuiltinLoop`
+3. `createBuiltinAgent` + detail CAS
+4. `config` / docs / `examples` 可选 `agentOverrides` 演示
+5. （可选）`createAgent` 接入 `extract()` fallback
@@ -0,0 +1,73 @@
+# Issue #418: ACP session/resume 返回空文本
+
+## 调研日期: 2026-05-23
+
+## 根因
+
+`session/resume` 在 restore 路径下 `_make_agent()` 失败，异常被静默吞掉。
+
+### 完整调用链
+
+```
+resume_session(sid)
+  → update_cwd(sid)
+    → get_session(sid) → _restore(sid)
+      → _make_agent()
+        → resolve_runtime_provider("custom") 失败（line 548-561）
+        → AIAgent() 抛出 "No LLM provider configured"（line 564）
+      → except Exception 静默吞掉（line 482-484）→ return None
+    → return None
+  → state is None → fallback: create_session()（新 sid，无历史）
+```
+
+### 关键代码位置（acp_adapter/session.py）
+
+- `_restore()` line 426-498: 从 DB 恢复 session，但 except 太宽泛
+- `_make_agent()` line 520-568: provider 解析在 restore 路径下不完整
+- Line 548-561: `resolve_runtime_provider("custom")` 失败后，`base_url` 虽然从 DB 取到了但没传给 AIAgent
+
+### 实测行为
+
+1. Phase 1: `session/new` + `prompt` → 正常，有 `agent_message_chunk`
+2. Phase 2: `session/resume` + `prompt`
+   - resume 返回成功，但 `available_commands_update` 里 sessionId 是新的（create_session fallback）
+   - 用原始 sid 发 prompt → `stopReason: "refusal"`（session 不在内存中）
+   - 用新 sid 发 prompt → 能跑但无历史（agent 回答"不知道 secret code"）
+
+### 验证脚本
+
+```python
+# 直接调用 _restore 验证
+cd ~/.hermes/hermes-agent
+python3 -c "
+import sys; sys.path.insert(0, '.')
+from acp_adapter.session import SessionManager
+sm = SessionManager()
+result = sm._restore('SESSION_ID_HERE')
+print(result)  # None — _make_agent 抛异常被吞掉
+"
+```
+
+### 两个 bug
+
+1. **`_make_agent` provider fallback 不完整**: restore 时 DB 里有 `base_url` 和 `api_mode`，但 `resolve_runtime_provider` 失败后这些值没被正确传递给 AIAgent
+2. **`_restore` 的 except 太宽泛**: 静默吞掉所有异常，连 warning 都只在 debug 级别，导致 resume 失败完全无感知
+
+### Hermes 版本
+
+- v0.10.0 (2026.4.16) — 初始测试
+- v0.14.0 (2026.5.16) — 更新后重新测试，bug 仍在
+- 代码路径: ~/.hermes/hermes-agent/acp_adapter/session.py
+
+### v0.14.0 测试结果 (2026-05-23)
+
+- `_restore` 仍因 `custom` provider 解析失败返回 None
+- 日志更清晰了：`WARNING: Failed to recreate agent for ACP session ...`
+- resume fallback 创建新 session（新 sid），但 agent 居然能回答之前的问题（可能通过 memory/session search）
+- 核心问题不变：sessionId 变了，client 用旧 sid 发 prompt → refusal
+
+### 上游 Issue
+
+- https://github.com/NousResearch/hermes-agent/issues/13489 — 已评论根因分析
+- https://github.com/NousResearch/hermes-agent/issues/8083 — resume 静默创建新 session
+- https://github.com/NousResearch/hermes-agent/issues/18452 — _make_agent fallback 不完整
@@ -75,7 +75,7 @@ uwf thread step 01J7K9M2XNPQR5VWBCDF8G3H4T --agent "bunx uwf-cursor"
 **做的事：**
 1. 读链头 → 当前 StepNode（或 StartNode）
 2. 收集 thread 历史（遍历链）
-3. 调 moderator：评估 JSONata conditions → 得到下一个 role（或 END）
+3. 调 moderator：status-based map lookup → 得到下一个 role（或 END）
 4. 若 END → 归档 thread，输出最后链头，退出
 5. 确定 agent command（`--agent` override > config.yaml per-workflow/role > config.yaml defaultAgent）
 6. 调用：`<agent-cmd> <thread-id> <role>`，捕获 stdout 得到新 StepNode hash
@@ -199,29 +199,21 @@ payload:
 ```

 - `roles` — 内联定义，每个 role 的 `meta` 是独立的 cas_ref（指向 json-cas 内置 JSON Schema 节点）
- `conditions` — `Record<Name, JSONata>`，命名条件，方便画图描述
- `graph` — `Record<Role | "$START", Transition[]>`，每个 Transition = `{ role, condition }`
- `condition` 引用 conditions 中的 key，`null` = fallback
- 按数组顺序求值，第一个匹配的 transition 胜出
+- `graph` — `Record<Role | "$START", Record<Status, Target>>`，每个 Target = `{ role, prompt }`
+- Status 来自上一个 role 输出的 `status` 字段，`$START` 用 `_` 作为初始 status
+- Prompt 模板使用 Mustache 渲染，变量来自 lastOutput
 - 不含 agent binding — agent 配置在 `~/.uncaged/workflow/config.yaml` 中管理

-JSONata 表达式的求值上下文：
+Moderator 的求值逻辑：

-```jsonc
-{
-  "start": {                          // StartNode 信息
-    "workflow": "4KNM2PXR3B1QW",
-    "prompt": "Fix the login bug..."
-  },
-  "steps": [                          // 所有已完成 steps，从旧到新
-    { "role": "planner", "output": { "phases": [...] }, "detail": "7BQST3VW9F2MA", "agent": "uwf-hermes" },
-    { "role": "developer", "output": { "filesChanged": ["src/auth.ts"], "summary": "Fixed redirect" }, "detail": "9KRVW3TN5F1QA", "agent": "uwf-cursor" },
-    { "role": "reviewer", "output": { "approved": false }, "detail": "2MXBG6PN4A8JR", "agent": "uwf-hermes" }
-  ]
-}
+```typescript
+evaluate(graph, lastRole, lastOutput) → { role, prompt }
+// 1. status = lastRole === "$START" ? "_" : lastOutput.status
+// 2. target = graph[lastRole][status]
+// 3. prompt = mustache.render(target.prompt, lastOutput)
 ```

-注：`output` 在上下文中会被自动展开为实际的 CAS 节点内容（而非 hash），方便 JSONata 表达式直接访问字段。
+注：routing 基于 `lastOutput.status` 字段的值，直接在 graph map 中查找对应的 Target。

 #### `StartNode`（Thread 起点）

@@ -350,7 +342,7 @@ OPENROUTER_API_KEY=sk-or-...
 ```
 packages/
 ├── cli-workflow/              # @uncaged/cli-workflow — uwf CLI（thread/workflow 命令）
-├── workflow-moderator/        # @uncaged/workflow-moderator — JSONata moderator 引擎
+├── workflow-moderator/        # @uncaged/workflow-moderator — Status-based moderator 引擎
 ├── workflow-agent-kit/        # @uncaged/workflow-agent-kit — Agent CLI 框架（含 extractor）
 ├── workflow-agent-hermes/     # @uncaged/workflow-agent-hermes — uwf-hermes CLI
 ├── workflow-agent-cursor/ # @uncaged/workflow-agent-cursor — uwf-cursor CLI
@@ -367,7 +359,7 @@ packages/

 ## 4. 关键数据类型

-JSONata 求值上下文本质上是 thread 链表的线性化表达。StepNode payload 和上下文中的 step 共享大量字段，提取为公共类型。
+Moderator 通过 status-based map lookup 进行路由。StepNode payload 和上下文中的 step 共享大量字段，提取为公共类型。

 ### 4.1 公共类型

@@ -378,7 +370,7 @@ type CasRef = string;
 /** Thread ID — ULID, 26-char Crockford Base32 */
 type ThreadId = string;

-/** 一个 step 的核心数据，被 StepNode payload 和 JSONata 上下文共享 */
+/** 一个 step 的核心数据，被 StepNode payload 和 moderator 上下文共享 */
 type StepRecord = {
  role: string;
  output: CasRef;                    // cas_ref → 结构化输出节点（符合 role meta schema）
@@ -399,22 +391,16 @@ type RoleDefinition = {
  meta: CasRef;                      // cas_ref → json-cas 内置 JSON Schema 节点
 };

-type Transition = {
+type Target = {
  role: string;                      // 目标 role 名 或 "$END"
-  condition: string | null;          // 引用 conditions 中的 key，null = fallback
-};
-
-type ConditionDefinition = {
-  description: string;
-  expression: string;                           // JSONata expression
+  prompt: string;                    // Mustache 模板，渲染时注入 lastOutput
 };

 type WorkflowPayload = {
  name: string;
  description: string;
  roles: Record<string, RoleDefinition>;
-  conditions: Record<string, ConditionDefinition>;
-  graph: Record<string, Transition[]>;          // Record<Role | "$START", Transition[]>
+  graph: Record<string, Record<string, Target>>;  // Record<Role | "$START", Record<Status, Target>>
 };
 ```

@@ -432,20 +418,14 @@ type StepNodePayload = StepRecord & {
 };
 ```

-### 4.4 JSONata 求值上下文
+### 4.4 Moderator 求值

-Thread 链表的线性化。`steps[n]` 的字段和 `StepRecord` 一致，但 `output` 被展开为实际内容。
+Moderator 使用 `evaluate(graph, lastRole, lastOutput)` 进行同步 status-based routing：

 ```typescript
-/** JSONata 上下文中的 step — output 被展开 */
-type StepContext = Omit<StepRecord, "output"> & {
-  output: unknown;                   // 展开后的 CAS 节点内容，非 hash
-};
-
-type ModeratorContext = {
-  start: StartNodePayload;
-  steps: StepContext[];              // 从旧到新
-};
+// graph[lastRole][lastOutput.status] → Target { role, prompt }
+// $START 角色使用 "_" 作为初始 status
+// prompt 通过 Mustache 模板渲染，变量来自 lastOutput
 ```

 ### 4.5 CLI 输出
@@ -534,6 +514,5 @@ StepNodePayload ──extends──→ StepRecord ←──maps to──→ Step
    │
    └── start.workflow → WorkflowPayload
                             ├── roles: Record<name, RoleDefinition>
-                             ├── conditions: Record<name, JSONata>
-                             └── graph: Record<role, Transition[]>
+                             └── graph: Record<role, Record<status, Target>>
 ```
@@ -22,6 +22,8 @@ roles:
    frontmatter:
      type: object
      properties:
+        $status:
+          enum: ["_"]
        thesis:
          type: string
        keyPoints:
@@ -30,12 +32,9 @@ roles:
            type: string
        caveats:
          type: string
-      required: [thesis, keyPoints]
-conditions: {}
+      required: [$status, thesis, keyPoints]
 graph:
  $START:
-    - role: "analyst"
-      condition: null
+    _: { role: "analyst", prompt: "Analyze the topic in the task and produce a structured summary with key points." }
  analyst:
-    - role: "$END"
-      condition: null
+    _: { role: "$END", prompt: "Analysis complete. Finish the workflow." }
@@ -0,0 +1,62 @@
+name: "debate"
+description: "Structured debate between two sides. Tests cross-process session resume."
+roles:
+  against:
+    description: "Argues against the proposition"
+    goal: |
+      You are a skilled debater arguing AGAINST the proposition.
+      Be logical, cite evidence, and directly address your opponent's points.
+      Keep each argument concise (under 200 words).
+    capabilities:
+      - argumentation
+      - critical-thinking
+    procedure: |
+      1. If this is the opening, present your strongest argument against the proposition.
+      2. If responding to the other side, directly counter their points with evidence and logic.
+      3. If you find yourself genuinely convinced by the other side, you may concede.
+    output: |
+      Provide your argument in the frontmatter.
+      Set status to "conceded" ONLY if you are genuinely convinced and wish to stop debating.
+      Otherwise set status to "continue".
+    frontmatter:
+      type: object
+      properties:
+        $status:
+          enum: ["continue", "conceded"]
+        argument:
+          type: string
+      required: [$status, argument]
+  for:
+    description: "Argues for the proposition"
+    goal: |
+      You are a skilled debater arguing FOR the proposition.
+      Be logical, cite evidence, and directly address your opponent's points.
+      Keep each argument concise (under 200 words).
+    capabilities:
+      - argumentation
+      - critical-thinking
+    procedure: |
+      1. Read the opposing side's latest argument carefully.
+      2. Counter their points with evidence and logic.
+      3. If you find yourself genuinely convinced by the other side, you may concede.
+    output: |
+      Provide your argument in the frontmatter.
+      Set status to "conceded" ONLY if you are genuinely convinced and wish to stop debating.
+      Otherwise set status to "continue".
+    frontmatter:
+      type: object
+      properties:
+        $status:
+          enum: ["continue", "conceded"]
+        argument:
+          type: string
+      required: [$status, argument]
+graph:
+  $START:
+    _: { role: "against", prompt: "Present your opening argument against the proposition." }
+  against:
+    conceded: { role: "$END", prompt: "The against side conceded. Debate over." }
+    continue: { role: "for", prompt: "Counter the opposing argument: {{{argument}}}" }
+  for:
+    conceded: { role: "$END", prompt: "The for side conceded. Debate over." }
+    continue: { role: "against", prompt: "Counter the opposing argument: {{{argument}}}" }
@@ -3,22 +3,37 @@ description: "End-to-end issue resolution"
 roles:
  planner:
    description: "Creates implementation plan"
-    goal: "You are a planning agent. You analyze issues and create step-by-step plans."
+    goal: "You are a planning agent. You analyze issues and create implementation plans grounded in the actual codebase."
    capabilities:
      - issue-analysis
      - planning
-    procedure: "Analyze the issue and create a detailed, actionable implementation plan."
-    output: "Output the plan summary and list of concrete steps."
+      - file-read
+      - shell
+    procedure: |
+      1. Locate the code repository:
+         - Check if the current working directory is the repo (look for package.json, .git, etc.)
+         - If the task mentions a repo URL, clone it first.
+         - If this is a new project, create the repo and note the path.
+      2. Explore the codebase — read the relevant source files mentioned in the issue. Understand the current architecture, types, and conventions (check CLAUDE.md, CONTRIBUTING.md, .cursor/rules/).
+      3. Identify which files need changes and what the changes should be, with specific code references.
+      4. Output the plan with:
+         - `repoPath`: absolute path to the repository root
+         - `plan`: detailed implementation plan with file paths and code references
+         - `steps`: concrete action items for the developer
+    output: |
+      Provide repoPath, plan summary, and steps in the frontmatter.
+      The plan MUST reference actual file paths and code structures you found by reading the source.
+      Do NOT guess — if you haven't read a file, read it before referencing it.
    frontmatter:
      type: object
      properties:
+        $status:
+          enum: ["_"]
+        repoPath:
+          type: string
        plan:
          type: string
-        steps:
-          type: array
-          items:
-            type: string
-      required: [plan, steps]
+      required: [$status, repoPath, plan]
  developer:
    description: "Implements code changes"
    goal: "You are a developer agent. You implement code changes according to plans."
@@ -26,50 +41,52 @@ roles:
      - file-edit
      - shell
      - testing
-    procedure: "Implement the plan. Write code, tests, and ensure existing tests pass."
+    procedure: |
+      1. Read the planner's output to get the repoPath and implementation plan.
+      2. cd to the repoPath before making any changes.
+      3. Create a feature branch from the default branch.
+      4. Implement the plan — write code, tests, and ensure existing tests pass.
+      5. Run the project's lint/check command (e.g. `bun run check`, `npm run lint`) and fix ALL errors before proceeding. Build and lint must pass cleanly.
+      6. Commit your changes with a descriptive message referencing the issue.
    output: "List all files changed and provide a summary of the implementation."
    frontmatter:
      type: object
      properties:
+        $status:
+          enum: ["_"]
        filesChanged:
          type: array
          items:
            type: string
        summary:
          type: string
-      required: [filesChanged, summary]
+      required: [$status, filesChanged, summary]
  reviewer:
    description: "Reviews code changes"
    goal: "You are a code reviewer. You review implementations for correctness and quality."
    capabilities:
      - code-review
      - static-analysis
-    procedure: "Review the implementation against the plan. Check for bugs, edge cases, and style."
+    procedure: |
+      1. Run hard checks first — build (`bun run build` or equivalent) and lint (`bunx biome check .` or equivalent) MUST pass with zero errors. If they fail, reject immediately.
+      2. Then review code quality: correctness, edge cases, naming, project conventions (CLAUDE.md), and test coverage.
+      3. Only reject for hard check failures or genuine correctness/security issues. Style suggestions alone should not block approval.
    output: "Approve or reject with detailed comments explaining your decision."
    frontmatter:
      type: object
      properties:
-        approved:
-          type: boolean
+        $status:
+          enum: ["approved", "rejected"]
        comments:
          type: string
-      required: [approved, comments]
-conditions:
-  notApproved:
-    description: "Reviewer rejected the implementation"
-    expression: "$last('reviewer').approved = false"
+      required: [$status, comments]
 graph:
  $START:
-    - role: "planner"
-      condition: null
+    _: { role: "planner", prompt: "Analyze the issue described in the task and produce a detailed implementation plan." }
  planner:
-    - role: "developer"
-      condition: null
+    _: { role: "developer", prompt: "Implement the plan from the planner. Write code, tests, and ensure existing tests pass." }
  developer:
-    - role: "reviewer"
-      condition: null
+    _: { role: "reviewer", prompt: "Review the developer's implementation against the plan for correctness and quality." }
  reviewer:
-    - role: "developer"
-      condition: "notApproved"
-    - role: "$END"
-      condition: null
+    approved: { role: "$END", prompt: "The review passed. Complete the workflow." }
+    rejected: { role: "developer", prompt: "The reviewer rejected your implementation. Read their feedback and fix the issues: {{{comments}}}" }
@@ -531,13 +531,25 @@ export async function executeThread(
      timestamp: nowMs,
      parentState: options.parentStateHash,
    },
-    steps: input.steps.map((out, i) => ({
-      role: out.role,
-      contentHash: out.contentHash,
-      meta: out.meta,
-      refs: out.refs,
-      timestamp: replayTs?.[i] ?? prefilled?.[i]?.timestamp ?? nowMs + i,
-    })),
+    steps: await Promise.all(
+      input.steps.map(async (out, i) => {
+        // Resolve content for the last step (most relevant for the next agent).
+        // Earlier steps only carry meta summaries to avoid bloating the prompt.
+        const isLast = i === input.steps.length - 1;
+        let content: string | null = null;
+        if (isLast) {
+          content = await getContentMerklePayload(io.cas, out.contentHash);
+        }
+        return {
+          role: out.role,
+          contentHash: out.contentHash,
+          content,
+          meta: out.meta,
+          refs: out.refs,
+          timestamp: replayTs?.[i] ?? prefilled?.[i]?.timestamp ?? nowMs + i,
+        };
+      }),
+    ),
  };

  const runtime: WorkflowRuntime = {
@@ -71,6 +71,7 @@ export type RoleStep<M extends RoleMeta> = {
    role: K;
    meta: M[K];
    contentHash: string;
+    content: string | null;
    refs: string[];
    timestamp: number;
  };
@@ -71,7 +71,8 @@ async function buildRoleStepsFromStates<M extends RoleMeta>(
  cas: CasStore,
 ): Promise<RoleStep<M>[]> {
  const steps: RoleStep<M>[] = [];
-  for (const st of chronologicalStates) {
+  for (let idx = 0; idx < chronologicalStates.length; idx++) {
+    const st = chronologicalStates[idx];
    if (st.payload.role === END) {
      continue;
    }
@@ -79,10 +80,13 @@ async function buildRoleStepsFromStates<M extends RoleMeta>(
    if (contentParsed === null || contentParsed.kind !== "content") {
      throw new Error(`buildThreadContext: expected content node at ${st.payload.content}`);
    }
+    // Resolve full text content for the last step only
+    const isLast = idx === chronologicalStates.length - 1;
    steps.push({
      role: st.payload.role,
      meta: st.payload.meta,
      contentHash: st.payload.content,
+      content: isLast ? contentParsed.node.payload : null,
      refs: [...contentParsed.node.refs],
      timestamp: st.payload.timestamp,
    } as RoleStep<M>);
@@ -88,6 +88,7 @@ async function advanceOneRound<M extends RoleMeta>(
  const step = {
    role: next,
    contentHash,
+    content: contentPayload,
    meta,
    refs,
    timestamp: Date.now(),
@@ -30,7 +30,7 @@ describe("buildAgentPrompt", () => {
    expect(text).not.toContain("## Tools");
  });

-  test("single step shows hash and meta, and includes tools", async () => {
+  test("single step shows meta and content, and includes tools", async () => {
    const onlyHash = "01HASHSINGLESTEP0000000001";
    const ctx: AgentContext = {
      start: startTask("user task"),
@@ -42,6 +42,7 @@ describe("buildAgentPrompt", () => {
        {
          role: "coder",
          contentHash: onlyHash,
+          content: "Here is my implementation of the feature.",
          meta: { files: ["a.ts"] },
          refs: [onlyHash],
          timestamp: 2,
@@ -52,13 +53,39 @@ describe("buildAgentPrompt", () => {
    expect(text).toContain("## Task");
    expect(text).toContain("user task");
    expect(text).toContain("## Step: coder");
-    expect(text).toContain(`ContentHash: ${onlyHash}`);
    expect(text).toContain('Meta: {"files":["a.ts"]}');
+    expect(text).toContain("<output>");
+    expect(text).toContain("Here is my implementation of the feature.");
+    expect(text).toContain("</output>");
    expect(text).toContain("## Tools");
    expect(text).toContain("uncaged-workflow thread 01TEST000000000000000000TR");
  });

-  test("two or more steps: previous steps are meta-only; latest step includes hash", async () => {
+  test("single step with null content omits output tag", async () => {
+    const onlyHash = "01HASHSINGLESTEP0000000001";
+    const ctx: AgentContext = {
+      start: startTask("user task"),
+      depth: 0,
+      bundleHash: "TESTHASH00001",
+      threadId: "01TEST000000000000000000TR",
+      currentRole: { name: "coder", systemPrompt: "Be helpful." },
+      steps: [
+        {
+          role: "coder",
+          contentHash: onlyHash,
+          content: null,
+          meta: { files: ["a.ts"] },
+          refs: [onlyHash],
+          timestamp: 2,
+        },
+      ],
+    };
+    const text = await buildAgentPrompt(ctx);
+    expect(text).not.toContain("<output>");
+    expect(text).toContain('Meta: {"files":["a.ts"]}');
+  });
+
+  test("two or more steps: previous steps are meta-only; latest step includes content", async () => {
    const plannerHash = "01HASHPLANNER0000000000001";
    const coderHash = "01HASHCODER0000000000000001";
    const ctx: AgentContext = {
@@ -71,6 +98,7 @@ describe("buildAgentPrompt", () => {
        {
          role: "planner",
          contentHash: plannerHash,
+          content: null,
          meta: { plan: "short" },
          refs: [plannerHash],
          timestamp: 2,
@@ -78,6 +106,7 @@ describe("buildAgentPrompt", () => {
        {
          role: "coder",
          contentHash: coderHash,
+          content: "I reviewed the code and found 4 lint issues:\n1. Missing semicolon on line 42\n2. Unused import on line 3",
          meta: { done: true },
          refs: [coderHash],
          timestamp: 3,
@@ -90,10 +119,11 @@ describe("buildAgentPrompt", () => {
    expect(text).toContain("### Step 1: planner");
    expect(text).toContain('Summary: {"plan":"short"}');
    expect(text).toContain("## Latest Step: coder");
-    expect(text).toContain(`ContentHash: ${coderHash}`);
    expect(text).toContain('Meta: {"done":true}');
+    expect(text).toContain("<output>");
+    expect(text).toContain("I reviewed the code and found 4 lint issues:");
+    expect(text).toContain("</output>");
    expect(text).toContain("## Tools");
-    expect(text).toContain("uncaged-workflow thread 01TEST000000000000000000TR");
  });

  test("parentState null omits Parent Context section", async () => {
@@ -125,7 +155,7 @@ describe("buildAgentPrompt", () => {
    expect(text).toContain(`uncaged-workflow cas get ${parentHash}`);
  });

-  test("middle steps show meta summary only and latest shows hash", async () => {
+  test("middle steps show meta summary only and latest shows content", async () => {
    const ha = "01HASHA00000000000000000001";
    const hb = "01HASHB00000000000000000001";
    const hc = "01HASHC00000000000000000001";
@@ -139,6 +169,7 @@ describe("buildAgentPrompt", () => {
        {
          role: "a",
          contentHash: ha,
+          content: null,
          meta: { n: 1 },
          refs: [ha],
          timestamp: 2,
@@ -146,6 +177,7 @@ describe("buildAgentPrompt", () => {
        {
          role: "b",
          contentHash: hb,
+          content: null,
          meta: { n: 2 },
          refs: [hb],
          timestamp: 3,
@@ -153,6 +185,7 @@ describe("buildAgentPrompt", () => {
        {
          role: "c",
          contentHash: hc,
+          content: "Final output from role c",
          meta: { n: 3 },
          refs: [hc],
          timestamp: 4,
@@ -162,7 +195,35 @@ describe("buildAgentPrompt", () => {
    const text = await buildAgentPrompt(ctx);
    expect(text).toContain('Summary: {"n":1}');
    expect(text).toContain('Summary: {"n":2}');
-    expect(text).toContain(`ContentHash: ${hc}`);
    expect(text).toContain("## Latest Step: c");
+    expect(text).toContain("<output>");
+    expect(text).toContain("Final output from role c");
+    expect(text).toContain("</output>");
+  });
+
+  test("content is truncated when exceeding quota", async () => {
+    const longContent = "x".repeat(20_000);
+    const hash = "01HASHLONG000000000000000001";
+    const ctx: AgentContext = {
+      start: startTask("task"),
+      depth: 0,
+      bundleHash: "TESTHASH00001",
+      threadId: "01TEST000000000000000000TR",
+      currentRole: { name: "r", systemPrompt: "S" },
+      steps: [
+        {
+          role: "r",
+          contentHash: hash,
+          content: longContent,
+          meta: {},
+          refs: [],
+          timestamp: 2,
+        },
+      ],
+    };
+    const text = await buildAgentPrompt(ctx);
+    expect(text).toContain("<output>");
+    expect(text).toContain("... (truncated)");
+    expect(text.length).toBeLessThan(20_000);
  });
 });
@@ -5,20 +5,23 @@
    "packages/*"
  ],
  "scripts": {
+    "uwf": "bun packages/cli-workflow/src/cli.ts",
    "build": "bunx tsc --build",
    "check": "bunx tsc --build && biome check . && bash scripts/lint-log-tags.sh",
    "typecheck": "bunx tsc --build",
    "format": "biome format --write .",
-    "test": "bun run --filter '*' test",
+    "test": "bun run --filter './packages/*' test",
    "changeset": "bunx changeset",
    "version": "bunx changeset version",
    "release": "bun run build && bun test && node scripts/publish-all.mjs"
  },
  "devDependencies": {
+    "@agentclientprotocol/sdk": "^0.22.1",
    "@biomejs/biome": "^2.4.14",
    "@changesets/cli": "^2.31.0",
    "@types/node": "^25.7.0",
    "@types/xxhashjs": "^0.2.4",
+    "@uncaged/workflow-agent-hermes": "workspace:*",
    "bun-types": "^1.3.13"
  }
 }
@@ -0,0 +1,211 @@
+# @uncaged/cli-workflow
+
+`uwf` CLI — thread lifecycle, workflow registry, CAS inspection, and setup.
+
+## Overview
+
+Layer 4 entry point for the workflow engine. The `uwf` binary orchestrates one step per invocation: load thread head from `threads.yaml`, run the moderator, spawn the configured agent CLI, run extract, append a CAS step node, and update the head pointer (or archive when `$END`).
+
+### Four-Layer Architecture
+
+```
+workflow → thread → step → turn
+模板定义   执行实例   单步结果   agent内部交互
+```
+
+- **Workflow** (layer 1): YAML template with roles and routing graph
+- **Thread** (layer 2): Single workflow execution instance
+- **Step** (layer 3): One moderator→agent→extract cycle
+- **Turn** (layer 4): Agent-internal interactions (use `step show` or CAS to inspect)
+
+This package has no library `src/index.ts` — it is consumed as a CLI binary only.
+
+**Dependencies:** `@uncaged/json-cas`, `@uncaged/json-cas-fs`, `@uncaged/workflow-agent-kit`, `@uncaged/workflow-moderator`, `@uncaged/workflow-protocol`, `@uncaged/workflow-util`, `commander`, `dotenv`, `yaml`
+
+## Installation
+
+Included as the `uwf` binary when you install `@uncaged/cli-workflow`:
+
+```bash
+bun add -g @uncaged/cli-workflow
+# or from the monorepo:
+bun link packages/cli-workflow
+```
+
+## CLI Usage
+
+### Global options
+
+```
+-V, --version          Show version
+--format <json|yaml>   Output format (default: json)
+-h, --help             Show help
+```
+
+### Thread (Layer 2: Execution Instances)
+
+| Command | Description |
+|---------|-------------|
+| `uwf thread start <workflow> -p <prompt>` | Create a thread without executing |
+| `uwf thread exec <thread-id> [--agent <cmd>] [-c <count>] [--background]` | Execute one or more moderator→agent→extract cycles |
+| `uwf thread show <thread-id>` | Show thread head pointer |
+| `uwf thread list [--status <status>] [--after <date>] [--before <date>] [--skip <n>] [--take <n>]` | List threads filtered by status (idle, running, completed, active, or comma-separated), time range (ISO or relative like '7d'), with pagination |
+| `uwf thread read <thread-id> [--quota N] [--before <hash>] [--start]` | Render thread as readable markdown |
+
+`thread read`, `step list`, and `step show` work on both active and completed threads.
+| `uwf thread stop <thread-id>` | Stop background execution (keep thread active) |
+| `uwf thread cancel <thread-id>` | Cancel thread (stop + archive to history) |
+
+Examples:
+
+```bash
+uwf thread start solve-issue -p "Fix the login redirect bug"
+uwf thread exec 01ARZ3NDEKTSV4RRFFQ69G5FAV
+uwf thread exec 01ARZ3NDEKTSV4RRFFQ69G5FAV -c 3 --agent uwf-builtin
+uwf thread exec 01ARZ3NDEKTSV4RRFFQ69G5FAV --background
+uwf thread list --status running
+uwf thread list --status active
+uwf thread list --status idle,completed
+uwf thread list --after 7d --take 10
+uwf thread read 01ARZ3NDEKTSV4RRFFQ69G5FAV --quota 8000
+uwf thread stop 01ARZ3NDEKTSV4RRFFQ69G5FAV
+```
+
+### Step (Layer 3: Single Cycle Results)
+
+| Command | Description |
+|---------|-------------|
+| `uwf step list <thread-id>` | List all steps in a thread chronologically |
+| `uwf step show <step-hash>` | Show step metadata and frontmatter |
+| `uwf step read <step-hash> [--quota <chars>]` | Read a step's turns as human-readable markdown |
+| `uwf step fork <step-hash>` | Fork a thread from a specific step |
+
+Examples:
+
+```bash
+uwf step list 01ARZ3NDEKTSV4RRFFQ69G5FAV
+uwf step show 32GCDE899RRQ3
+uwf step read 32GCDE899RRQ3 --quota 2000
+uwf step fork 32GCDE899RRQ3
+```
+
+### Workflow (Layer 1: Templates)
+
+| Command | Description |
+|---------|-------------|
+| `uwf workflow add <file.yaml>` | Register a workflow from YAML |
+| `uwf workflow show <name-or-hash>` | Show workflow definition |
+| `uwf workflow list` | List registered workflows |
+
+### CAS
+
+| Command | Description |
+|---------|-------------|
+| `uwf cas get <hash> [--timestamp]` | Read a CAS node |
+| `uwf cas put <type-hash> <data>` | Store a node, print hash |
+| `uwf cas put-text <text>` | Store plain text, print hash |
+| `uwf cas has <hash>` | Check existence |
+| `uwf cas refs <hash>` | List direct references |
+| `uwf cas walk <hash>` | Recursive traversal |
+| `uwf cas reindex` | Rebuild type index |
+| `uwf cas schema list` | List registered schemas |
+| `uwf cas schema get <hash>` | Show a schema |
+
+### Setup
+
+```bash
+uwf setup
+uwf setup --provider openai --base-url https://api.openai.com/v1 \
+  --api-key sk-... --model gpt-4o --agent hermes
+```
+
+Config: `~/.uncaged/workflow/config.yaml`. API keys: `~/.uncaged/workflow/.env`.
+
+### Skill
+
+| Command | Description |
+|---------|-------------|
+| `uwf skill cli` | Print markdown reference of all uwf commands (for agent skills) |
+
+### Log
+
+| Command | Description |
+|---------|-------------|
+| `uwf log list` | List log files with sizes |
+| `uwf log show [--thread <id>] [--process <pid>] [--date YYYY-MM-DD]` | Show filtered log entries |
+| `uwf log clean [--before YYYY-MM-DD]` | Delete old log files |
+
+## Migration Guide
+
+### Breaking Changes (v0.x → v1.x)
+
+The CLI was reorganized to clarify the four-layer architecture. **No backward compatibility** — old commands have been removed.
+
+#### Renamed Commands
+
+| Old Command | New Command | Notes |
+|------------|-------------|-------|
+| `workflow put` | `workflow add` | More intuitive verb |
+| `thread step` | `thread exec` | Eliminates ambiguity with "step" noun |
+| `thread list --all` | `thread list --status completed` | Unified status filtering |
+
+#### Removed Commands (Merged)
+
+| Old Command | New Command | Notes |
+|------------|-------------|-------|
+| `thread running` | `thread list --status running` | Merged into unified list |
+
+#### Removed Commands (Split)
+
+| Old Command | New Commands | Notes |
+|------------|-------------|-------|
+| `thread kill` | `thread stop` or `thread cancel` | `stop` keeps thread active, `cancel` archives it |
+
+#### Moved Commands
+
+| Old Command | New Command | Notes |
+|------------|-------------|-------|
+| `thread steps` | `step list` | Moved to step layer |
+| `thread step-details` | `step show` | Moved to step layer |
+| `thread fork` | `step fork` | Moved to step layer (forks are step-based) |
+
+#### Deprecation Errors
+
+Old commands now show helpful error messages:
+
+```bash
+$ uwf thread step 01ARZ3NDEKTSV4RRFFQ69G5FAV
+Error: Command 'thread step' has been removed.
+Use 'thread exec' instead.
+
+For more information, see: uwf help thread exec
+```
+
+## Internal Structure
+
+```
+src/
+├── cli.ts              Commander entrypoint, command registration
+├── format.ts           JSON/YAML output formatting
+├── store.ts            CAS store + registry initialization
+├── validate.ts         Workflow YAML validation
+├── schemas.ts          CLI-local schema registration
+└── commands/
+    ├── thread.ts       Thread lifecycle and exec
+    ├── step.ts         Step operations (list/show/read/fork)
+    ├── workflow.ts     Workflow registry (add/show/list)
+    ├── cas.ts          CAS inspection and schema ops
+    ├── setup.ts        Interactive/non-interactive setup
+    ├── skill.ts        Built-in skill references
+    └── log.ts          Process debug log management
+```
+
+## Configuration
+
+| File | Purpose |
+|------|---------|
+| `~/.uncaged/workflow/config.yaml` | Providers, models, default agent |
+| `~/.uncaged/workflow/.env` | API keys (referenced by `apiKeyEnv` in config) |
+| `~/.uncaged/workflow/registry.yaml` | Workflow name → CAS hash |
+| `~/.uncaged/workflow/threads.yaml` | Active thread head pointers |
+| `~/.uncaged/workflow/cas/` | Content-addressed node storage |
@@ -0,0 +1,152 @@
+import { execSync } from "node:child_process";
+import { mkdir, rm } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { afterEach, beforeEach, describe, expect, test } from "vitest";
+import { cmdCasPutText } from "../commands/cas.js";
+
+let storageRoot: string;
+let uwfPath: string;
+
+beforeEach(async () => {
+  storageRoot = join(
+    tmpdir(),
+    `uwf-cas-exit-test-${Date.now()}-${Math.random().toString(36).slice(2)}`,
+  );
+  await mkdir(storageRoot, { recursive: true });
+
+  // Find the uwf CLI path
+  uwfPath = join(__dirname, "../../src/cli.ts");
+});
+
+afterEach(async () => {
+  await rm(storageRoot, { recursive: true, force: true });
+});
+
+type ExecResult = {
+  stdout: string;
+  stderr: string;
+  exitCode: number;
+};
+
+function execUwf(args: string[]): ExecResult {
+  try {
+    const stdout = execSync(`bun ${uwfPath} ${args.join(" ")}`, {
+      env: { ...process.env, WORKFLOW_STORAGE_ROOT: storageRoot },
+      encoding: "utf-8",
+      stdio: ["pipe", "pipe", "pipe"],
+    });
+    return { stdout, stderr: "", exitCode: 0 };
+  } catch (error: unknown) {
+    if (
+      error &&
+      typeof error === "object" &&
+      "stdout" in error &&
+      "stderr" in error &&
+      "status" in error
+    ) {
+      return {
+        stdout: (error.stdout as Buffer | string).toString(),
+        stderr: (error.stderr as Buffer | string).toString(),
+        exitCode: error.status as number,
+      };
+    }
+    throw error;
+  }
+}
+
+describe("uwf cas has CLI exit codes", () => {
+  test("exits 0 when hash exists", async () => {
+    // Setup: Create a temp storage root, put a text node, capture hash
+    const putResult = await cmdCasPutText(storageRoot, "test content");
+    const hash = putResult.hash;
+
+    // Execute: uwf cas has <hash>
+    const result = execUwf(["cas", "has", hash]);
+
+    // Assert: stdout contains {"exists":true}, exit code === 0
+    expect(result.stdout).toContain('"exists":true');
+    expect(result.exitCode).toBe(0);
+  });
+
+  test("exits 1 when hash does not exist", () => {
+    // Setup: Create a temp storage root (empty CAS store)
+    // Execute: uwf cas has NOSUCHHASH123
+    const result = execUwf(["cas", "has", "NOSUCHHASH123"]);
+
+    // Assert: stdout contains {"exists":false}, exit code === 1
+    expect(result.stdout).toContain('"exists":false');
+    expect(result.exitCode).toBe(1);
+  });
+
+  test("JSON output format unchanged for exists=true", async () => {
+    // Setup: Create store, put node
+    const putResult = await cmdCasPutText(storageRoot, "test");
+    const hash = putResult.hash;
+
+    // Execute: uwf cas has <hash>
+    const result = execUwf(["cas", "has", hash]);
+
+    // Assert: stdout JSON parses correctly to {exists: true}
+    const parsed = JSON.parse(result.stdout.trim());
+    expect(parsed).toEqual({ exists: true });
+  });
+
+  test("JSON output format unchanged for exists=false", () => {
+    // Setup: Create empty store
+    // Execute: uwf cas has INVALID
+    const result = execUwf(["cas", "has", "INVALID"]);
+
+    // Assert: stdout JSON parses correctly to {exists: false}
+    const parsed = JSON.parse(result.stdout.trim());
+    expect(parsed).toEqual({ exists: false });
+  });
+
+  test("YAML output format preserves exit code behavior for exists=true", async () => {
+    // Setup: Create store with node
+    const putResult = await cmdCasPutText(storageRoot, "test");
+    const hash = putResult.hash;
+
+    // Execute: uwf --format yaml cas has <hash>
+    const result = execUwf(["--format", "yaml", "cas", "has", hash]);
+
+    // Assert: exit code === 0, output is YAML format
+    expect(result.exitCode).toBe(0);
+    expect(result.stdout).toContain("exists:");
+    expect(result.stdout).toContain("true");
+  });
+
+  test("YAML output format preserves exit code behavior for exists=false", () => {
+    // Setup: Create empty store
+    // Execute: uwf --format yaml cas has INVALID
+    const result = execUwf(["--format", "yaml", "cas", "has", "INVALID"]);
+
+    // Assert: exit code === 1, output is YAML format
+    expect(result.exitCode).toBe(1);
+    expect(result.stdout).toContain("exists:");
+    expect(result.stdout).toContain("false");
+  });
+});
+
+describe("regression: other cas commands unaffected", () => {
+  test("uwf cas get still exits 1 on not-found with error message", () => {
+    // Execute: uwf cas get NOSUCHHASH
+    const result = execUwf(["cas", "get", "NOSUCHHASH"]);
+
+    // Assert: exit code === 1, stderr contains "Node not found"
+    expect(result.exitCode).toBe(1);
+    expect(result.stderr).toContain("Node not found");
+  });
+
+  test("uwf cas put-text behavior unchanged", () => {
+    // Execute: uwf cas put-text "hello"
+    const result = execUwf(["cas", "put-text", "hello"]);
+
+    // Assert: exit code === 0, returns hash
+    expect(result.exitCode).toBe(0);
+    const parsed = JSON.parse(result.stdout.trim());
+    expect(parsed).toHaveProperty("hash");
+    expect(typeof parsed.hash).toBe("string");
+    expect(parsed.hash.length).toBe(13); // Crockford Base32 XXH64 hash length
+  });
+});
@@ -0,0 +1,74 @@
+import { mkdir, rm } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { afterEach, beforeEach, describe, expect, test } from "vitest";
+import { cmdCasHas, cmdCasPutText } from "../commands/cas.js";
+
+let storageRoot: string;
+
+beforeEach(async () => {
+  storageRoot = join(tmpdir(), `uwf-cas-test-${Date.now()}-${Math.random().toString(36).slice(2)}`);
+  await mkdir(storageRoot, { recursive: true });
+});
+
+afterEach(async () => {
+  await rm(storageRoot, { recursive: true, force: true });
+});
+
+describe("cmdCasHas", () => {
+  test("returns {exists: true} for existing hash", async () => {
+    // Setup: Create a test store, put a node, get its hash
+    const putResult = await cmdCasPutText(storageRoot, "test content");
+    const hash = putResult.hash;
+
+    // Execute: Call cmdCasHas with the valid hash
+    const result = await cmdCasHas(storageRoot, hash);
+
+    // Assert: Result equals {exists: true}
+    expect(result).toEqual({ exists: true });
+  });
+
+  test("returns {exists: false} for non-existent hash", async () => {
+    // Setup: Create an empty test store
+    // (storageRoot already created in beforeEach)
+
+    // Execute: Call cmdCasHas with an invalid hash
+    const result = await cmdCasHas(storageRoot, "INVALIDHASH12");
+
+    // Assert: Result equals {exists: false}
+    expect(result).toEqual({ exists: false });
+  });
+
+  test("does not throw for non-existent hash", async () => {
+    // Setup: Create an empty test store
+    // Execute & Assert: Does not throw, returns {exists: false}
+    await expect(cmdCasHas(storageRoot, "NOSUCHHASH123")).resolves.toEqual({
+      exists: false,
+    });
+  });
+
+  test("handles malformed hash gracefully", async () => {
+    // Setup: Create a test store
+    // Execute: Call cmdCasHas with a too-short hash
+    const result = await cmdCasHas(storageRoot, "xyz");
+
+    // Assert: Returns {exists: false} (store.has() returns false)
+    expect(result).toEqual({ exists: false });
+  });
+
+  test("handles empty hash string", async () => {
+    // Execute: Call cmdCasHas with an empty string
+    const result = await cmdCasHas(storageRoot, "");
+
+    // Assert: Returns {exists: false}
+    expect(result).toEqual({ exists: false });
+  });
+
+  test("handles hash with special characters", async () => {
+    // Execute: Call cmdCasHas with special characters
+    const result = await cmdCasHas(storageRoot, "HASH!@#");
+
+    // Assert: Returns {exists: false}
+    expect(result).toEqual({ exists: false });
+  });
+});
@@ -0,0 +1,181 @@
+import { mkdir, readdir, rm, writeFile } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { afterEach, beforeEach, describe, expect, test } from "vitest";
+import { cmdLogClean, cmdLogList, cmdLogShow } from "../commands/log.js";
+
+let storageRoot: string;
+
+beforeEach(async () => {
+  storageRoot = join(tmpdir(), `uwf-log-test-${Date.now()}-${Math.random().toString(36).slice(2)}`);
+  await mkdir(join(storageRoot, "logs"), { recursive: true });
+});
+
+afterEach(async () => {
+  await rm(storageRoot, { recursive: true, force: true });
+});
+
+const entry1 = JSON.stringify({
+  ts: "2026-05-20T10:00:00.000Z",
+  pid: "1716200000000-1234",
+  tag: "W9F3RK2M",
+  msg: "process start",
+  thread: "01J1234ABCDEF",
+  workflow: "solve-issue",
+});
+
+const entry2 = JSON.stringify({
+  ts: "2026-05-20T10:00:01.000Z",
+  pid: "1716200000000-1234",
+  tag: "ABC12345",
+  msg: "step executed",
+  thread: "01J1234ABCDEF",
+  workflow: "solve-issue",
+});
+
+const entry3 = JSON.stringify({
+  ts: "2026-05-20T10:00:02.000Z",
+  pid: "1716200000000-5678",
+  tag: "XYZ98765",
+  msg: "different process",
+  thread: "01JOTHER000000",
+  workflow: "review-code",
+});
+
+const oldEntry = JSON.stringify({
+  ts: "2026-05-19T08:00:00.000Z",
+  pid: "1716200000000-9999",
+  tag: "OLD1TAG1",
+  msg: "old entry",
+  thread: "01JOLD0000000",
+  workflow: "solve-issue",
+});
+
+const olderEntry = JSON.stringify({
+  ts: "2026-05-18T08:00:00.000Z",
+  pid: "1716200000000-0001",
+  tag: "OLD2TAG2",
+  msg: "older entry",
+  thread: "01JOLDER00000",
+  workflow: "review-code",
+});
+
+async function writeLogFiles(): Promise<void> {
+  const logsDir = join(storageRoot, "logs");
+  await writeFile(join(logsDir, "2026-05-20.jsonl"), `${[entry1, entry2, entry3].join("\n")}\n`);
+  await writeFile(join(logsDir, "2026-05-19.jsonl"), `${oldEntry}\n`);
+  await writeFile(join(logsDir, "2026-05-18.jsonl"), `${olderEntry}\n`);
+}
+
+describe("cmdLogList", () => {
+  test("lists log files with sizes sorted by date descending", async () => {
+    await writeLogFiles();
+    const result = await cmdLogList(storageRoot);
+    expect(result).toHaveLength(3);
+    expect(result[0].name).toBe("2026-05-20.jsonl");
+    expect(result[0].date).toBe("2026-05-20");
+    expect(result[0].size).toBeGreaterThan(0);
+    expect(result[1].name).toBe("2026-05-19.jsonl");
+    expect(result[2].name).toBe("2026-05-18.jsonl");
+  });
+
+  test("returns empty array when no log files exist", async () => {
+    const result = await cmdLogList(storageRoot);
+    expect(result).toEqual([]);
+  });
+
+  test("returns empty array when logs directory does not exist", async () => {
+    const noLogsRoot = join(storageRoot, "nonexistent");
+    await mkdir(noLogsRoot, { recursive: true });
+    const result = await cmdLogList(noLogsRoot);
+    expect(result).toEqual([]);
+  });
+});
+
+describe("cmdLogShow", () => {
+  test("filters by thread ID", async () => {
+    await writeLogFiles();
+    const result = await cmdLogShow(storageRoot, {
+      thread: "01J1234ABCDEF",
+      process: null,
+      date: null,
+    });
+    expect(result).toHaveLength(2);
+    expect(result.every((e) => e.thread === "01J1234ABCDEF")).toBe(true);
+  });
+
+  test("filters by process ID", async () => {
+    await writeLogFiles();
+    const result = await cmdLogShow(storageRoot, {
+      thread: null,
+      process: "1716200000000-1234",
+      date: null,
+    });
+    expect(result).toHaveLength(2);
+    expect(result.every((e) => e.pid === "1716200000000-1234")).toBe(true);
+  });
+
+  test("filters by date", async () => {
+    await writeLogFiles();
+    const result = await cmdLogShow(storageRoot, {
+      thread: null,
+      process: null,
+      date: "2026-05-19",
+    });
+    expect(result).toHaveLength(1);
+    expect(result[0].msg).toBe("old entry");
+  });
+
+  test("reads all files when no date filter", async () => {
+    await writeLogFiles();
+    const result = await cmdLogShow(storageRoot, { thread: null, process: null, date: null });
+    expect(result).toHaveLength(5);
+    // sorted by ts ascending
+    expect(result[0].ts).toBe("2026-05-18T08:00:00.000Z");
+    expect(result[4].ts).toBe("2026-05-20T10:00:02.000Z");
+  });
+
+  test("returns empty when no matches", async () => {
+    await writeLogFiles();
+    const result = await cmdLogShow(storageRoot, {
+      thread: "NONEXISTENT",
+      process: null,
+      date: null,
+    });
+    expect(result).toEqual([]);
+  });
+
+  test("combined thread + date filter", async () => {
+    await writeLogFiles();
+    const result = await cmdLogShow(storageRoot, {
+      thread: "01J1234ABCDEF",
+      process: null,
+      date: "2026-05-20",
+    });
+    expect(result).toHaveLength(2);
+    expect(result.every((e) => e.thread === "01J1234ABCDEF")).toBe(true);
+  });
+});
+
+describe("cmdLogClean", () => {
+  test("deletes files before given date", async () => {
+    await writeLogFiles();
+    const result = await cmdLogClean(storageRoot, "2026-05-20");
+    expect(result.deleted).toBe(2);
+    const remaining = await readdir(join(storageRoot, "logs"));
+    expect(remaining).toEqual(["2026-05-20.jsonl"]);
+  });
+
+  test("deletes nothing when all files are newer", async () => {
+    await writeLogFiles();
+    const result = await cmdLogClean(storageRoot, "2026-05-18");
+    expect(result.deleted).toBe(0);
+  });
+
+  test("handles missing logs directory gracefully", async () => {
+    const noLogsRoot = join(storageRoot, "nonexistent");
+    await mkdir(noLogsRoot, { recursive: true });
+    const result = await cmdLogClean(noLogsRoot, "2026-05-20");
+    expect(result).toEqual({ deleted: 0 });
+  });
+});
@@ -0,0 +1,108 @@
+import { mkdtemp, rm } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import type { CasRef, ThreadId } from "@uncaged/workflow-protocol";
+import { afterEach, beforeEach, describe, expect, test } from "vitest";
+import { resolveHeadHash } from "../commands/shared.js";
+import { appendThreadHistory, saveThreadsIndex } from "../store.js";
+
+let tmpDir: string;
+
+beforeEach(async () => {
+  tmpDir = await mkdtemp(join(tmpdir(), "cli-uwf-resolve-head-"));
+});
+
+afterEach(async () => {
+  await rm(tmpDir, { recursive: true, force: true });
+});
+
+describe("resolveHeadHash", () => {
+  test("returns head hash from threads.yaml for active thread", async () => {
+    const threadId = "01JTEST0000000000000000001" as ThreadId;
+    const headHash = "active_hash_123" as CasRef;
+
+    await saveThreadsIndex(tmpDir, { [threadId]: headHash });
+
+    const result = await resolveHeadHash(tmpDir, threadId);
+
+    expect(result).toBe(headHash);
+  });
+
+  test("falls back to history.jsonl when thread not in threads.yaml", async () => {
+    const threadId = "01JTEST0000000000000000002" as ThreadId;
+    const headHash = "completed_hash_456" as CasRef;
+    const workflowHash = "workflow_hash_789" as CasRef;
+
+    // No entry in threads.yaml, only in history.jsonl
+    await saveThreadsIndex(tmpDir, {});
+    await appendThreadHistory(tmpDir, {
+      thread: threadId,
+      workflow: workflowHash,
+      head: headHash,
+      completedAt: Date.now(),
+    });
+
+    const result = await resolveHeadHash(tmpDir, threadId);
+
+    expect(result).toBe(headHash);
+  });
+
+  // Note: Testing the error case requires CLI-level testing because resolveHeadHash
+  // calls fail() which does process.exit(1), terminating the test runner.
+  // The error behavior is tested in integration tests below via CLI invocation.
+
+  test("prioritizes active thread over history when thread exists in both", async () => {
+    const threadId = "01JTEST0000000000000000004" as ThreadId;
+    const activeHash = "active_hash_v2" as CasRef;
+    const historicalHash = "historical_hash_v1" as CasRef;
+    const workflowHash = "workflow_hash_xyz" as CasRef;
+
+    // Thread exists in both locations (should not happen normally, but test the precedence)
+    await saveThreadsIndex(tmpDir, { [threadId]: activeHash });
+    await appendThreadHistory(tmpDir, {
+      thread: threadId,
+      workflow: workflowHash,
+      head: historicalHash,
+      completedAt: Date.now(),
+    });
+
+    const result = await resolveHeadHash(tmpDir, threadId);
+
+    // Should return the active head, not the historical one
+    expect(result).toBe(activeHash);
+  });
+
+  test("finds thread from multiple history entries", async () => {
+    const threadId1 = "01JTEST0000000000000000005" as ThreadId;
+    const threadId2 = "01JTEST0000000000000000006" as ThreadId;
+    const threadId3 = "01JTEST0000000000000000007" as ThreadId;
+    const hash1 = "hash_thread1" as CasRef;
+    const hash2 = "hash_thread2" as CasRef;
+    const hash3 = "hash_thread3" as CasRef;
+    const workflowHash = "workflow_hash_abc" as CasRef;
+
+    await saveThreadsIndex(tmpDir, {});
+    await appendThreadHistory(tmpDir, {
+      thread: threadId1,
+      workflow: workflowHash,
+      head: hash1,
+      completedAt: Date.now() - 2000,
+    });
+    await appendThreadHistory(tmpDir, {
+      thread: threadId2,
+      workflow: workflowHash,
+      head: hash2,
+      completedAt: Date.now() - 1000,
+    });
+    await appendThreadHistory(tmpDir, {
+      thread: threadId3,
+      workflow: workflowHash,
+      head: hash3,
+      completedAt: Date.now(),
+    });
+
+    const result = await resolveHeadHash(tmpDir, threadId2);
+
+    expect(result).toBe(hash2);
+  });
+});
@@ -0,0 +1,381 @@
+import { mkdirSync, writeFileSync } from "node:fs";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { afterEach, describe, expect, test, vi } from "vitest";
+import {
+  _discoverAgents,
+  _isBackspace,
+  _isTerminator,
+  _parseWhichOutput,
+  _printModelMenu,
+  _printProviderMenu,
+  _printValidationResult,
+  _resolveModelChoice,
+  _resolveProviderChoice,
+  _searchPathDirs,
+} from "../commands/setup.js";
+
+// ──────────────────────────────────────────────────────────────────────────────
+// 1a. _searchPathDirs
+// ──────────────────────────────────────────────────────────────────────────────
+
+describe("_searchPathDirs", () => {
+  test("returns empty array for empty PATH", async () => {
+    const result = await _searchPathDirs("");
+    expect(result).toEqual([]);
+  });
+
+  test("finds uwf-hermes in a single dir", async () => {
+    const dir = mkdirSync(join(tmpdir(), `uwf-test-${Date.now()}`), { recursive: true }) as
+      | string
+      | undefined;
+    const actualDir = dir ?? join(tmpdir(), `uwf-test-${Date.now()}`);
+    mkdirSync(actualDir, { recursive: true });
+    const filePath = join(actualDir, "uwf-hermes");
+    writeFileSync(filePath, "#!/bin/sh\n", { mode: 0o755 });
+    const result = await _searchPathDirs(actualDir);
+    expect(result).toContain("uwf-hermes");
+  });
+
+  test("skips non-uwf- prefixed binaries", async () => {
+    const dir = join(tmpdir(), `uwf-test-${Date.now()}-2`);
+    mkdirSync(dir, { recursive: true });
+    writeFileSync(join(dir, "hermes"), "#!/bin/sh\n", { mode: 0o755 });
+    writeFileSync(join(dir, "uwf-hermes"), "#!/bin/sh\n", { mode: 0o755 });
+    const result = await _searchPathDirs(dir);
+    expect(result).toEqual(["uwf-hermes"]);
+  });
+
+  test("skips entry named exactly 'uwf'", async () => {
+    const dir = join(tmpdir(), `uwf-test-${Date.now()}-3`);
+    mkdirSync(dir, { recursive: true });
+    writeFileSync(join(dir, "uwf"), "#!/bin/sh\n", { mode: 0o755 });
+    writeFileSync(join(dir, "uwf-hermes"), "#!/bin/sh\n", { mode: 0o755 });
+    const result = await _searchPathDirs(dir);
+    expect(result).toEqual(["uwf-hermes"]);
+  });
+
+  test("skips non-executable files", async () => {
+    const dir = join(tmpdir(), `uwf-test-${Date.now()}-4`);
+    mkdirSync(dir, { recursive: true });
+    writeFileSync(join(dir, "uwf-foo"), "#!/bin/sh\n", { mode: 0o644 });
+    const result = await _searchPathDirs(dir);
+    expect(result).toEqual([]);
+  });
+
+  test("deduplicates across PATH dirs", async () => {
+    const dir1 = join(tmpdir(), `uwf-test-${Date.now()}-5a`);
+    const dir2 = join(tmpdir(), `uwf-test-${Date.now()}-5b`);
+    mkdirSync(dir1, { recursive: true });
+    mkdirSync(dir2, { recursive: true });
+    writeFileSync(join(dir1, "uwf-hermes"), "#!/bin/sh\n", { mode: 0o755 });
+    writeFileSync(join(dir2, "uwf-hermes"), "#!/bin/sh\n", { mode: 0o755 });
+    const result = await _searchPathDirs(`${dir1}:${dir2}`);
+    expect(result).toEqual(["uwf-hermes"]);
+  });
+
+  test("returns sorted array", async () => {
+    const dir = join(tmpdir(), `uwf-test-${Date.now()}-6`);
+    mkdirSync(dir, { recursive: true });
+    writeFileSync(join(dir, "uwf-zoo"), "#!/bin/sh\n", { mode: 0o755 });
+    writeFileSync(join(dir, "uwf-alpha"), "#!/bin/sh\n", { mode: 0o755 });
+    writeFileSync(join(dir, "uwf-mid"), "#!/bin/sh\n", { mode: 0o755 });
+    const result = await _searchPathDirs(dir);
+    expect(result).toEqual(["uwf-alpha", "uwf-mid", "uwf-zoo"]);
+  });
+
+  test("skips inaccessible/nonexistent directories silently", async () => {
+    const result = await _searchPathDirs("/nonexistent-dir-xyz-abc-12345");
+    expect(result).toEqual([]);
+  });
+});
+
+// ──────────────────────────────────────────────────────────────────────────────
+// 1b. _parseWhichOutput
+// ──────────────────────────────────────────────────────────────────────────────
+
+describe("_parseWhichOutput", () => {
+  test("returns empty array for empty string", () => {
+    expect(_parseWhichOutput("")).toEqual([]);
+  });
+
+  test("parses single path", () => {
+    expect(_parseWhichOutput("/usr/local/bin/uwf-hermes")).toEqual(["uwf-hermes"]);
+  });
+
+  test("parses multiple paths", () => {
+    expect(_parseWhichOutput("/usr/local/bin/uwf-hermes\n/usr/bin/uwf-claude-code")).toEqual([
+      "uwf-claude-code",
+      "uwf-hermes",
+    ]);
+  });
+
+  test("deduplicates identical basenames from different dirs", () => {
+    expect(_parseWhichOutput("/a/uwf-hermes\n/b/uwf-hermes")).toEqual(["uwf-hermes"]);
+  });
+
+  test("skips blank lines", () => {
+    expect(_parseWhichOutput("/a/uwf-hermes\n\n/b/uwf-cursor")).toEqual([
+      "uwf-cursor",
+      "uwf-hermes",
+    ]);
+  });
+
+  test("skips entry named exactly 'uwf'", () => {
+    expect(_parseWhichOutput("/usr/bin/uwf")).toEqual([]);
+  });
+
+  test("skips basenames not starting with uwf-", () => {
+    expect(_parseWhichOutput("/usr/bin/node")).toEqual([]);
+  });
+
+  test("returns sorted array", () => {
+    expect(_parseWhichOutput("/a/uwf-zoo\n/a/uwf-alpha")).toEqual(["uwf-alpha", "uwf-zoo"]);
+  });
+});
+
+// ──────────────────────────────────────────────────────────────────────────────
+// 2a. _isTerminator
+// ──────────────────────────────────────────────────────────────────────────────
+
+describe("_isTerminator", () => {
+  test("\\n is a terminator", () => {
+    expect(_isTerminator("\n")).toBe(true);
+  });
+  test("\\r is a terminator", () => {
+    expect(_isTerminator("\r")).toBe(true);
+  });
+  test("\\u0004 (EOT) is a terminator", () => {
+    expect(_isTerminator("")).toBe(true);
+  });
+  test("regular char is not a terminator", () => {
+    expect(_isTerminator("a")).toBe(false);
+  });
+  test("empty string is not a terminator", () => {
+    expect(_isTerminator("")).toBe(false);
+  });
+});
+
+// ──────────────────────────────────────────────────────────────────────────────
+// 2b. _isBackspace
+// ──────────────────────────────────────────────────────────────────────────────
+
+describe("_isBackspace", () => {
+  test("\\u007F is a backspace", () => {
+    expect(_isBackspace("")).toBe(true);
+  });
+  test("\\b is a backspace", () => {
+    expect(_isBackspace("\b")).toBe(true);
+  });
+  test("regular char is not a backspace", () => {
+    expect(_isBackspace("x")).toBe(false);
+  });
+});
+
+// ──────────────────────────────────────────────────────────────────────────────
+// 3a. _printProviderMenu
+// ──────────────────────────────────────────────────────────────────────────────
+
+describe("_printProviderMenu", () => {
+  afterEach(() => {
+    vi.restoreAllMocks();
+  });
+
+  const providers = [
+    { name: "openai", label: "OpenAI", baseUrl: "https://api.openai.com/v1" },
+    { name: "xai", label: "xAI", baseUrl: "https://api.x.ai/v1" },
+  ] as const;
+
+  test("prints correct number of lines (one per provider + custom)", () => {
+    const lines: string[] = [];
+    vi.spyOn(console, "log").mockImplementation((msg: string) => {
+      lines.push(msg);
+    });
+    _printProviderMenu(providers);
+    // 2 providers + 1 custom = 3 lines
+    expect(lines.length).toBe(3);
+  });
+
+  test("custom option number = providers.length + 1", () => {
+    const lines: string[] = [];
+    vi.spyOn(console, "log").mockImplementation((msg: string) => {
+      lines.push(msg);
+    });
+    _printProviderMenu(providers);
+    const lastLine = lines[lines.length - 1] ?? "";
+    expect(lastLine).toMatch(/3\)/);
+  });
+
+  test("each provider line contains its label and baseUrl", () => {
+    const lines: string[] = [];
+    vi.spyOn(console, "log").mockImplementation((msg: string) => {
+      lines.push(msg);
+    });
+    _printProviderMenu(providers);
+    expect(lines[0]).toContain("OpenAI");
+    expect(lines[0]).toContain("https://api.openai.com/v1");
+    expect(lines[1]).toContain("xAI");
+    expect(lines[1]).toContain("https://api.x.ai/v1");
+  });
+});
+
+// ──────────────────────────────────────────────────────────────────────────────
+// 3b. _resolveProviderChoice
+// ──────────────────────────────────────────────────────────────────────────────
+
+describe("_resolveProviderChoice", () => {
+  const providers = [
+    { name: "openai", label: "OpenAI", baseUrl: "https://api.openai.com/v1" },
+    { name: "xai", label: "xAI", baseUrl: "https://api.x.ai/v1" },
+    { name: "deepseek", label: "DeepSeek", baseUrl: "https://api.deepseek.com/v1" },
+  ] as const;
+
+  test("valid index 1 returns first provider", () => {
+    const result = _resolveProviderChoice("1", providers);
+    expect(result).toEqual({ providerName: "openai", baseUrl: "https://api.openai.com/v1" });
+  });
+
+  test("valid index N (last preset) returns last provider", () => {
+    const result = _resolveProviderChoice("3", providers);
+    expect(result).toEqual({ providerName: "deepseek", baseUrl: "https://api.deepseek.com/v1" });
+  });
+
+  test("index providers.length+1 (custom) returns null", () => {
+    const result = _resolveProviderChoice("4", providers);
+    expect(result).toBeNull();
+  });
+
+  test("non-numeric string returns null", () => {
+    expect(_resolveProviderChoice("abc", providers)).toBeNull();
+  });
+
+  test("0 returns null (out of range)", () => {
+    expect(_resolveProviderChoice("0", providers)).toBeNull();
+  });
+
+  test("N+2 returns null (out of range)", () => {
+    expect(_resolveProviderChoice("5", providers)).toBeNull();
+  });
+
+  test("negative number returns null", () => {
+    expect(_resolveProviderChoice("-1", providers)).toBeNull();
+  });
+});
+
+// ──────────────────────────────────────────────────────────────────────────────
+// 3c. _resolveModelChoice
+// ──────────────────────────────────────────────────────────────────────────────
+
+describe("_resolveModelChoice", () => {
+  test("numeric input within range returns model at that index", () => {
+    expect(_resolveModelChoice("2", ["a", "b", "c"])).toBe("b");
+  });
+
+  test("numeric input out of range returns input as-is", () => {
+    expect(_resolveModelChoice("5", ["a"])).toBe("5");
+  });
+
+  test("non-numeric input returns input as-is", () => {
+    expect(_resolveModelChoice("gpt-4o", ["a", "b"])).toBe("gpt-4o");
+  });
+
+  test("numeric input 1 returns first model", () => {
+    expect(_resolveModelChoice("1", ["alpha", "beta"])).toBe("alpha");
+  });
+
+  test("empty models list with numeric input returns input as-is", () => {
+    expect(_resolveModelChoice("1", [])).toBe("1");
+  });
+});
+
+// ──────────────────────────────────────────────────────────────────────────────
+// 3d. _printModelMenu
+// ──────────────────────────────────────────────────────────────────────────────
+
+describe("_printModelMenu", () => {
+  afterEach(() => {
+    vi.restoreAllMocks();
+  });
+
+  test("prints all models — each model name appears in output", () => {
+    const output: string[] = [];
+    vi.spyOn(console, "log").mockImplementation((msg: string) => {
+      output.push(msg);
+    });
+    const models = ["model-a", "model-b", "model-c"];
+    _printModelMenu(models, 100);
+    const combined = output.join("\n");
+    for (const m of models) {
+      expect(combined).toContain(m);
+    }
+  });
+
+  test("single column when termCols is very small", () => {
+    const output: string[] = [];
+    vi.spyOn(console, "log").mockImplementation((msg: string) => {
+      output.push(msg);
+    });
+    _printModelMenu(["a", "b", "c"], 1);
+    // Each model on its own row → 3 lines
+    expect(output.length).toBe(3);
+  });
+
+  test("wide terminal fits multiple columns", () => {
+    const output: string[] = [];
+    vi.spyOn(console, "log").mockImplementation((msg: string) => {
+      output.push(msg);
+    });
+    const models = Array.from({ length: 6 }, (_, i) => `m${i}`);
+    _printModelMenu(models, 200);
+    // With wide terminal and short names, should fit in fewer than 6 rows
+    expect(output.length).toBeLessThan(6);
+  });
+});
+
+// ──────────────────────────────────────────────────────────────────────────────
+// 3e. _printValidationResult
+// ──────────────────────────────────────────────────────────────────────────────
+
+describe("_printValidationResult", () => {
+  afterEach(() => {
+    vi.restoreAllMocks();
+  });
+
+  test("ok=true prints success message containing '✓'", () => {
+    const lines: string[] = [];
+    vi.spyOn(console, "log").mockImplementation((msg: string) => {
+      lines.push(msg);
+    });
+    _printValidationResult({ ok: true, error: null });
+    expect(lines.join("\n")).toContain("✓");
+  });
+
+  test("ok=false prints warning message containing '⚠'", () => {
+    const lines: string[] = [];
+    vi.spyOn(console, "log").mockImplementation((msg: string) => {
+      lines.push(msg);
+    });
+    _printValidationResult({ ok: false, error: "HTTP 401" });
+    expect(lines.join("\n")).toContain("⚠");
+  });
+
+  test("ok=false includes the error string in output", () => {
+    const lines: string[] = [];
+    vi.spyOn(console, "log").mockImplementation((msg: string) => {
+      lines.push(msg);
+    });
+    _printValidationResult({ ok: false, error: "HTTP 401" });
+    expect(lines.join("\n")).toContain("HTTP 401");
+  });
+});
+
+// ──────────────────────────────────────────────────────────────────────────────
+// 4. Regression
+// ──────────────────────────────────────────────────────────────────────────────
+
+describe("_discoverAgents regression", () => {
+  test("returns an array (may be empty) — never throws", async () => {
+    const result = await _discoverAgents();
+    expect(Array.isArray(result)).toBe(true);
+  });
+});
@@ -0,0 +1,150 @@
+import { mkdtemp, rm } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { afterEach, beforeEach, describe, expect, test, vi } from "vitest";
+import { cmdSetup, validateModel } from "../commands/setup.js";
+
+describe("validateModel", () => {
+  const BASE_URL = "https://api.example.com/v1";
+  const API_KEY = "sk-test-key";
+  const MODEL = "test-model";
+
+  afterEach(() => {
+    vi.restoreAllMocks();
+  });
+
+  test("success path — returns ok on 200", async () => {
+    const mockFetch = vi
+      .spyOn(globalThis, "fetch")
+      .mockResolvedValue(new Response(JSON.stringify({}), { status: 200 }));
+
+    const result = await validateModel(BASE_URL, API_KEY, MODEL);
+
+    expect(result).toEqual({ ok: true, value: undefined });
+    expect(mockFetch).toHaveBeenCalledOnce();
+
+    const [url, opts] = mockFetch.mock.calls[0]!;
+    expect(url).toBe(`${BASE_URL}/chat/completions`);
+    expect((opts as RequestInit).headers).toEqual(
+      expect.objectContaining({ Authorization: `Bearer ${API_KEY}` }),
+    );
+    const body = JSON.parse((opts as RequestInit).body as string);
+    expect(body).toEqual({
+      model: MODEL,
+      messages: [{ role: "user", content: "hi" }],
+      max_tokens: 1,
+    });
+  });
+
+  test("HTTP 401 — returns error containing 401", async () => {
+    vi.spyOn(globalThis, "fetch").mockResolvedValue(
+      new Response("Unauthorized", { status: 401, statusText: "Unauthorized" }),
+    );
+
+    const result = await validateModel(BASE_URL, API_KEY, MODEL);
+
+    expect(result.ok).toBe(false);
+    if (!result.ok) {
+      expect(result.error).toContain("401");
+    }
+  });
+
+  test("HTTP 404 — returns error containing 404", async () => {
+    vi.spyOn(globalThis, "fetch").mockResolvedValue(
+      new Response("Not Found", { status: 404, statusText: "Not Found" }),
+    );
+
+    const result = await validateModel(BASE_URL, API_KEY, MODEL);
+
+    expect(result.ok).toBe(false);
+    if (!result.ok) {
+      expect(result.error).toContain("404");
+    }
+  });
+
+  test("network timeout — returns error mentioning timeout", async () => {
+    const err = new DOMException("signal timed out", "AbortError");
+    vi.spyOn(globalThis, "fetch").mockRejectedValue(err);
+
+    const result = await validateModel(BASE_URL, API_KEY, MODEL);
+
+    expect(result.ok).toBe(false);
+    if (!result.ok) {
+      expect(result.error.toLowerCase()).toMatch(/timeout|timed out/);
+    }
+  });
+
+  test("network error (DNS/connection) — returns error mentioning connectivity", async () => {
+    vi.spyOn(globalThis, "fetch").mockRejectedValue(new TypeError("fetch failed"));
+
+    const result = await validateModel(BASE_URL, API_KEY, MODEL);
+
+    expect(result.ok).toBe(false);
+    if (!result.ok) {
+      expect(result.error.toLowerCase()).toMatch(/connect|reach|network/);
+    }
+  });
+
+  test("request body correctness", async () => {
+    const mockFetch = vi
+      .spyOn(globalThis, "fetch")
+      .mockResolvedValue(new Response(JSON.stringify({}), { status: 200 }));
+
+    await validateModel(BASE_URL, API_KEY, "my-special-model");
+
+    const body = JSON.parse((mockFetch.mock.calls[0]![1] as RequestInit).body as string);
+    expect(body).toEqual({
+      model: "my-special-model",
+      messages: [{ role: "user", content: "hi" }],
+      max_tokens: 1,
+    });
+  });
+});
+
+describe("cmdSetup with validation", () => {
+  let storageRoot: string;
+
+  beforeEach(async () => {
+    storageRoot = await mkdtemp(join(tmpdir(), "uwf-setup-validate-"));
+  });
+
+  afterEach(async () => {
+    vi.restoreAllMocks();
+    await rm(storageRoot, { recursive: true, force: true });
+  });
+
+  const setupArgs = () => ({
+    provider: "testprovider",
+    baseUrl: "https://api.test.com/v1",
+    apiKey: "sk-test",
+    model: "test-model",
+    storageRoot,
+  });
+
+  test("includes validation result on success", async () => {
+    vi.spyOn(globalThis, "fetch").mockResolvedValue(
+      new Response(JSON.stringify({}), { status: 200 }),
+    );
+
+    const result = await cmdSetup(setupArgs());
+
+    expect(result.validation).toEqual({ ok: true, value: undefined });
+    // Config files should still be written
+    expect(result.configPath).toBeTruthy();
+    expect(result.envPath).toBeTruthy();
+  });
+
+  test("includes validation failure — config still saved", async () => {
+    vi.spyOn(globalThis, "fetch").mockResolvedValue(
+      new Response("Unauthorized", { status: 401, statusText: "Unauthorized" }),
+    );
+
+    const result = await cmdSetup(setupArgs());
+
+    expect(result.validation).toBeDefined();
+    expect((result.validation as { ok: boolean }).ok).toBe(false);
+    // Config files should still be written despite validation failure
+    expect(result.configPath).toBeTruthy();
+    expect(result.envPath).toBeTruthy();
+  });
+});
@@ -0,0 +1,98 @@
+import { readFile } from "node:fs/promises";
+import { join } from "node:path";
+import type { WorkflowPayload } from "@uncaged/workflow-protocol";
+import { describe, expect, test } from "vitest";
+import { parse } from "yaml";
+
+/**
+ * Test: Issue #474 - tea pr create fails in git worktree directories
+ *
+ * This test verifies that the solve-issue workflow's committer role
+ * includes the --repo flag when running tea pr create, which fixes
+ * the "path segment [0] is empty" error in worktree directories.
+ */
+
+describe("solve-issue workflow: tea pr create worktree fix", () => {
+  // Navigate up from packages/cli-workflow to repo root
+  const workflowPath = join(process.cwd(), "..", "..", ".workflows", "solve-issue.yaml");
+
+  test("committer procedure should include --repo flag in tea pr create command", async () => {
+    const yamlContent = await readFile(workflowPath, "utf-8");
+    const workflow = parse(yamlContent) as WorkflowPayload;
+
+    expect(workflow.roles.committer).toBeDefined();
+    const committerProcedure = workflow.roles.committer?.procedure;
+    expect(committerProcedure).toBeDefined();
+
+    // Verify the procedure includes tea pr create with --repo flag
+    expect(committerProcedure).toContain("tea pr create");
+    expect(committerProcedure).toContain("--repo");
+
+    // Verify the --repo flag appears before or together with tea pr create
+    // This ensures the command is: tea pr create --repo <owner/repo> ...
+    const teaPrCreateMatch = committerProcedure?.match(/tea pr create[^\n]*/);
+    expect(teaPrCreateMatch).not.toBeNull();
+
+    if (teaPrCreateMatch) {
+      const teaCommandLine = teaPrCreateMatch[0];
+      expect(teaCommandLine).toContain("--repo");
+    }
+  });
+
+  test("committer procedure should mention repo extraction from git remote", async () => {
+    const yamlContent = await readFile(workflowPath, "utf-8");
+    const workflow = parse(yamlContent) as WorkflowPayload;
+
+    const committerProcedure = workflow.roles.committer?.procedure;
+    expect(committerProcedure).toBeDefined();
+
+    // Verify the procedure mentions extracting repo info from git remote
+    // This ensures fallback logic is documented
+    expect(committerProcedure).toMatch(/git remote/i);
+  });
+
+  test("committer procedure should include error handling for tea failures", async () => {
+    const yamlContent = await readFile(workflowPath, "utf-8");
+    const workflow = parse(yamlContent) as WorkflowPayload;
+
+    const committerProcedure = workflow.roles.committer?.procedure;
+    expect(committerProcedure).toBeDefined();
+
+    // Verify the procedure includes error handling guidance
+    // This ensures we capture failures and provide actionable output
+    expect(committerProcedure).toMatch(/error|fail/i);
+  });
+
+  test("workflow should be parseable as valid WorkflowPayload", async () => {
+    const yamlContent = await readFile(workflowPath, "utf-8");
+    const workflow = parse(yamlContent) as WorkflowPayload;
+
+    // Basic structure validation
+    expect(workflow.name).toBe("solve-issue");
+    expect(workflow.roles).toBeDefined();
+    expect(workflow.graph).toBeDefined();
+
+    // Verify committer role exists with required fields
+    expect(workflow.roles.committer).toBeDefined();
+    expect(workflow.roles.committer?.description).toBeDefined();
+    expect(workflow.roles.committer?.goal).toBeDefined();
+    expect(workflow.roles.committer?.procedure).toBeDefined();
+    expect(workflow.roles.committer?.output).toBeDefined();
+    expect(workflow.roles.committer?.frontmatter).toBeDefined();
+  });
+
+  test("committer frontmatter schema should be oneOf with $status discriminant", async () => {
+    const yamlContent = await readFile(workflowPath, "utf-8");
+    // Parse as any to access the raw YAML structure (frontmatter is inline JSON Schema in YAML)
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
+    const workflow = parse(yamlContent) as any;
+    const frontmatter = workflow.roles.committer?.frontmatter;
+    expect(frontmatter).toBeDefined();
+    expect(frontmatter?.oneOf).toBeDefined();
+    const committedVariant = frontmatter.oneOf.find(
+      (v: any) => v.properties?.["$status"]?.const === "committed",
+    );
+    expect(committedVariant).toBeDefined();
+    expect(committedVariant.required).toContain("$status");
+  });
+});
@@ -0,0 +1,519 @@
+import { mkdir, mkdtemp, rm } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { bootstrap, putSchema } from "@uncaged/json-cas";
+import { createFsStore } from "@uncaged/json-cas-fs";
+import type { CasRef } from "@uncaged/workflow-protocol";
+import { afterEach, beforeEach, describe, expect, test } from "vitest";
+import { cmdStepRead } from "../commands/step.js";
+import { registerUwfSchemas } from "../schemas.js";
+
+// ── schemas used in tests ────────────────────────────────────────────────────
+
+const TURN_SCHEMA = {
+  title: "hermes-turn",
+  type: "object" as const,
+  required: ["index", "role", "content"],
+  properties: {
+    index: { type: "integer" as const },
+    role: { type: "string" as const },
+    content: { type: "string" as const },
+    toolCalls: {
+      anyOf: [
+        { type: "array" as const, items: { type: "object" as const } },
+        { type: "null" as const },
+      ],
+    },
+    reasoning: { anyOf: [{ type: "string" as const }, { type: "null" as const }] },
+  },
+  additionalProperties: false,
+};
+
+const DETAIL_SCHEMA = {
+  title: "hermes-detail",
+  type: "object" as const,
+  required: ["sessionId", "model", "duration", "turnCount", "turns"],
+  properties: {
+    sessionId: { type: "string" as const },
+    model: { type: "string" as const },
+    duration: { type: "integer" as const },
+    turnCount: { type: "integer" as const },
+    turns: {
+      type: "array" as const,
+      items: { type: "string" as const, format: "cas_ref" },
+    },
+  },
+  additionalProperties: false,
+};
+
+// ── helpers ───────────────────────────────────────────────────────────────────
+
+async function registerDetailSchemas(store: ReturnType<typeof createFsStore>) {
+  await bootstrap(store);
+  const [turn, detail] = await Promise.all([
+    putSchema(store, TURN_SCHEMA),
+    putSchema(store, DETAIL_SCHEMA),
+  ]);
+  return { turn, detail };
+}
+
+function generateContent(size: number, prefix = "Content"): string {
+  const base = `${prefix} `;
+  const repeat = Math.ceil(size / base.length);
+  return base.repeat(repeat).slice(0, size);
+}
+
+// ── fixture ───────────────────────────────────────────────────────────────────
+
+let tmpDir: string;
+
+beforeEach(async () => {
+  tmpDir = await mkdtemp(join(tmpdir(), "cli-uwf-step-read-test-"));
+});
+
+afterEach(async () => {
+  await rm(tmpDir, { recursive: true, force: true });
+});
+
+// ── step read tests ───────────────────────────────────────────────────────────
+
+describe("step read", () => {
+  test("test 1: basic single-step read with 3 turns", async () => {
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+    const detailSchemas = await registerDetailSchemas(store);
+
+    const workflowHash = await store.put(schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do the work.",
+          output: "Summarize the work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await store.put(schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Test task",
+    });
+
+    const outputHash = await store.put(schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    // Create 3 turns
+    const turnHashes: CasRef[] = [];
+    for (let i = 1; i <= 3; i++) {
+      const content = `Turn ${i} content with some text to make it readable.`;
+      const turnHash = await store.put(detailSchemas.turn, {
+        index: i - 1,
+        role: "assistant",
+        content,
+        toolCalls: null,
+        reasoning: null,
+      });
+      turnHashes.push(turnHash);
+    }
+
+    const detailHash = await store.put(detailSchemas.detail, {
+      sessionId: "session-1",
+      model: "test-model",
+      duration: 1000,
+      turnCount: 3,
+      turns: turnHashes,
+    });
+
+    const stepHash = await store.put(schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "worker",
+      output: outputHash,
+      detail: detailHash,
+      agent: "uwf-test",
+    });
+
+    // Read step with large quota
+    const markdown = await cmdStepRead(tmpDir, stepHash, 10000);
+
+    // Assert structure
+    expect(markdown).toContain(`# Step ${stepHash}`);
+    expect(markdown).toContain("**Role:** worker");
+    expect(markdown).toContain("**Agent:** uwf-test");
+    expect(markdown).toContain("## Turn 1");
+    expect(markdown).toContain("## Turn 2");
+    expect(markdown).toContain("## Turn 3");
+    expect(markdown).toContain("Turn 1 content with some text to make it readable.");
+    expect(markdown).toContain("Turn 2 content with some text to make it readable.");
+    expect(markdown).toContain("Turn 3 content with some text to make it readable.");
+  });
+
+  test("test 2: quota enforcement - multiple turns", async () => {
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+    const detailSchemas = await registerDetailSchemas(store);
+
+    const workflowHash = await store.put(schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do the work.",
+          output: "Summarize the work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await store.put(schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Test task",
+    });
+
+    const outputHash = await store.put(schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    // Create 4 turns of ~300 chars each
+    const turnHashes: CasRef[] = [];
+    for (let i = 1; i <= 4; i++) {
+      const content = generateContent(300, `Turn${i}`);
+      const turnHash = await store.put(detailSchemas.turn, {
+        index: i - 1,
+        role: "assistant",
+        content,
+        toolCalls: null,
+        reasoning: null,
+      });
+      turnHashes.push(turnHash);
+    }
+
+    const detailHash = await store.put(detailSchemas.detail, {
+      sessionId: "session-1",
+      model: "test-model",
+      duration: 1000,
+      turnCount: 4,
+      turns: turnHashes,
+    });
+
+    const stepHash = await store.put(schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "worker",
+      output: outputHash,
+      detail: detailHash,
+      agent: "uwf-test",
+    });
+
+    // Read step with limited quota (700 chars)
+    const markdown = await cmdStepRead(tmpDir, stepHash, 700);
+
+    // Assert only most recent turns fit
+    expect(markdown).toContain(`# Step ${stepHash}`);
+    // Should have skip hint
+    expect(markdown).toContain("Earlier turns omitted");
+    // Should include at least Turn 4 (most recent)
+    expect(markdown).toContain("Turn4");
+    // Total length should respect quota (with tolerance for structural overhead)
+    expect(markdown.length).toBeLessThanOrEqual(900); // 700 quota + 200 buffer tolerance
+  });
+
+  test("test 3: minimal quota edge case - always show at least one turn", async () => {
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+    const detailSchemas = await registerDetailSchemas(store);
+
+    const workflowHash = await store.put(schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do the work.",
+          output: "Summarize the work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await store.put(schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Test task",
+    });
+
+    const outputHash = await store.put(schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    // Create 1 turn of 500 chars
+    const content = generateContent(500, "LongTurn");
+    const turnHash = await store.put(detailSchemas.turn, {
+      index: 0,
+      role: "assistant",
+      content,
+      toolCalls: null,
+      reasoning: null,
+    });
+
+    const detailHash = await store.put(detailSchemas.detail, {
+      sessionId: "session-1",
+      model: "test-model",
+      duration: 1000,
+      turnCount: 1,
+      turns: [turnHash],
+    });
+
+    const stepHash = await store.put(schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "worker",
+      output: outputHash,
+      detail: detailHash,
+      agent: "uwf-test",
+    });
+
+    // Read step with minimal quota (1 char)
+    const markdown = await cmdStepRead(tmpDir, stepHash, 1);
+
+    // Assert at least one turn is always shown
+    expect(markdown).toContain("LongTurn");
+    expect(markdown.length).toBeGreaterThan(1);
+  });
+
+  test("test 4: step with no detail field", async () => {
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+
+    const workflowHash = await store.put(schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do the work.",
+          output: "Summarize the work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await store.put(schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Test task",
+    });
+
+    const outputHash = await store.put(schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const stepHash = await store.put(schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "worker",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+
+    // Read step - should return metadata only (no error)
+    const markdown = await cmdStepRead(tmpDir, stepHash, 4000);
+
+    // Assert metadata is present
+    expect(markdown).toContain(`# Step ${stepHash}`);
+    expect(markdown).toContain("**Role:** worker");
+    expect(markdown).toContain("**Agent:** uwf-test");
+    // Should not have turn sections
+    expect(markdown).not.toContain("## Turn");
+  });
+
+  test("test 5: step with detail but no turns array", async () => {
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+    await registerDetailSchemas(store);
+
+    const workflowHash = await store.put(schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do the work.",
+          output: "Summarize the work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await store.put(schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Test task",
+    });
+
+    const outputHash = await store.put(schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    // Create detail with different schema (no turns)
+    const SIMPLE_DETAIL_SCHEMA = {
+      title: "simple-detail",
+      type: "object" as const,
+      required: ["sessionId"],
+      properties: {
+        sessionId: { type: "string" as const },
+      },
+      additionalProperties: false,
+    };
+
+    await bootstrap(store);
+    const simpleDetailType = await putSchema(store, SIMPLE_DETAIL_SCHEMA);
+    const detailHash = await store.put(simpleDetailType, {
+      sessionId: "session-1",
+    });
+
+    const stepHash = await store.put(schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "worker",
+      output: outputHash,
+      detail: detailHash,
+      agent: "uwf-test",
+    });
+
+    // Read step - should return metadata only (no error)
+    const markdown = await cmdStepRead(tmpDir, stepHash, 4000);
+
+    // Assert metadata is present
+    expect(markdown).toContain(`# Step ${stepHash}`);
+    expect(markdown).toContain("**Role:** worker");
+    // Should not have turn sections
+    expect(markdown).not.toContain("## Turn");
+  });
+
+  test("test 6: turn content with special characters", async () => {
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+    const detailSchemas = await registerDetailSchemas(store);
+
+    const workflowHash = await store.put(schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do the work.",
+          output: "Summarize the work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await store.put(schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Test task",
+    });
+
+    const outputHash = await store.put(schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    // Create turn with special markdown characters
+    const content = "This has `backticks`, **bold**, *italic*, and [links](http://example.com)";
+    const turnHash = await store.put(detailSchemas.turn, {
+      index: 0,
+      role: "assistant",
+      content,
+      toolCalls: null,
+      reasoning: null,
+    });
+
+    const detailHash = await store.put(detailSchemas.detail, {
+      sessionId: "session-1",
+      model: "test-model",
+      duration: 1000,
+      turnCount: 1,
+      turns: [turnHash],
+    });
+
+    const stepHash = await store.put(schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "worker",
+      output: outputHash,
+      detail: detailHash,
+      agent: "uwf-test",
+    });
+
+    // Read step
+    const markdown = await cmdStepRead(tmpDir, stepHash, 4000);
+
+    // Assert content is rendered correctly without corruption
+    expect(markdown).toContain("`backticks`");
+    expect(markdown).toContain("**bold**");
+    expect(markdown).toContain("*italic*");
+    expect(markdown).toContain("[links](http://example.com)");
+  });
+});
@@ -0,0 +1,550 @@
+import { mkdir, mkdtemp, rm } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import type { CasRef, ThreadId } from "@uncaged/workflow-protocol";
+import { extractUlidTimestamp, generateUlid } from "@uncaged/workflow-util";
+import { afterEach, beforeEach, describe, expect, test } from "vitest";
+import { createMarker, deleteMarker } from "../background/index.js";
+import { cmdThreadList } from "../commands/thread.js";
+import { parseTimeInput } from "../commands/thread-time-parser.js";
+import type { UwfStore } from "../store.js";
+import { appendThreadHistory, createUwfStore, saveThreadsIndex } from "../store.js";
+
+// ── helpers ───────────────────────────────────────────────────────────────────
+
+async function makeUwfStore(storageRoot: string): Promise<UwfStore> {
+  const casDir = join(storageRoot, "cas");
+  await mkdir(casDir, { recursive: true });
+  return createUwfStore(storageRoot);
+}
+
+async function createTestWorkflow(uwf: UwfStore): Promise<CasRef> {
+  const workflowPayload = {
+    name: "test-workflow",
+    roles: {
+      role1: {
+        goal: "test goal",
+        outputSchema: { type: "object" as const, properties: {} },
+      },
+    },
+    graph: { start: "role1" },
+    conditions: {},
+  };
+  return await uwf.store.put(uwf.schemas.workflow, workflowPayload);
+}
+
+async function createTestThread(
+  uwf: UwfStore,
+  storageRoot: string,
+  workflowHash: CasRef,
+  timestamp: number,
+): Promise<ThreadId> {
+  const threadId = generateUlid(timestamp) as ThreadId;
+  const startPayload = {
+    workflow: workflowHash,
+    prompt: "test prompt",
+  };
+  const headHash = await uwf.store.put(uwf.schemas.startNode, startPayload);
+  const index = await import("../store.js").then((m) => m.loadThreadsIndex(storageRoot));
+  index[threadId] = headHash;
+  await saveThreadsIndex(storageRoot, index);
+  return threadId;
+}
+
+async function markThreadRunning(storageRoot: string, threadId: ThreadId, workflow: CasRef) {
+  await createMarker(storageRoot, {
+    thread: threadId,
+    workflow,
+    pid: process.pid, // Use current process PID so isPidAlive returns true
+    startedAt: Date.now(),
+  });
+}
+
+async function completeThread(
+  storageRoot: string,
+  threadId: ThreadId,
+  workflowHash: CasRef,
+  headHash: CasRef,
+) {
+  const index = await import("../store.js").then((m) => m.loadThreadsIndex(storageRoot));
+  delete index[threadId];
+  await saveThreadsIndex(storageRoot, index);
+  await appendThreadHistory(storageRoot, {
+    thread: threadId,
+    workflow: workflowHash,
+    head: headHash,
+    completedAt: Date.now(),
+  });
+}
+
+// ── test setup ────────────────────────────────────────────────────────────────
+
+let tmpDir: string;
+
+beforeEach(async () => {
+  tmpDir = await mkdtemp(join(tmpdir(), "thread-list-filters-test-"));
+});
+
+afterEach(async () => {
+  await rm(tmpDir, { recursive: true, force: true });
+});
+
+// ── status filter tests ───────────────────────────────────────────────────────
+
+describe("cmdThreadList status filter", () => {
+  test("should return idle and running threads when status=active", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const thread1 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
+    const thread2 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
+    const thread3 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
+
+    await markThreadRunning(tmpDir, thread2, workflowHash);
+
+    const index = await import("../store.js").then((m) => m.loadThreadsIndex(tmpDir));
+    const thread3Head = index[thread3];
+    if (thread3Head === undefined) throw new Error("thread3 head not found");
+    await completeThread(tmpDir, thread3, workflowHash, thread3Head);
+
+    const result = await cmdThreadList(tmpDir, ["idle", "running"], null, null, null, null);
+
+    expect(result).toHaveLength(2);
+    expect(result.map((r) => r.thread).sort()).toEqual([thread1, thread2].sort());
+
+    // Clean up marker after test
+    await deleteMarker(tmpDir, thread2);
+  });
+
+  test("should support comma-separated status values", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const thread1 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
+    const thread2 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
+    const thread3 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
+
+    await markThreadRunning(tmpDir, thread2, workflowHash);
+
+    const index = await import("../store.js").then((m) => m.loadThreadsIndex(tmpDir));
+    const thread3Head = index[thread3];
+    if (thread3Head === undefined) throw new Error("thread3 head not found");
+    await completeThread(tmpDir, thread3, workflowHash, thread3Head);
+
+    const result = await cmdThreadList(tmpDir, ["idle", "completed"], null, null, null, null);
+
+    // Clean up marker
+    await deleteMarker(tmpDir, thread2);
+
+    // thread2 is running (not idle), so should not be included
+    // Expected: thread1 (idle) and thread3 (completed)
+    expect(result).toHaveLength(2);
+    expect(result.map((r) => r.thread).sort()).toEqual([thread1, thread3].sort());
+  });
+
+  test("should support single status filter (backward compat)", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const _thread1 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
+    const _thread2 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
+    const thread3 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
+
+    const index = await import("../store.js").then((m) => m.loadThreadsIndex(tmpDir));
+    const thread3Head = index[thread3];
+    if (thread3Head === undefined) throw new Error("thread3 head not found");
+    await completeThread(tmpDir, thread3, workflowHash, thread3Head);
+
+    const result = await cmdThreadList(tmpDir, ["completed"], null, null, null, null);
+
+    expect(result).toHaveLength(1);
+    expect(result[0]?.thread).toBe(thread3);
+    expect(result[0]?.status).toBe("completed");
+  });
+
+  test("should return all threads when no status filter provided", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const thread1 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
+    const thread2 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
+    const thread3 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
+
+    await markThreadRunning(tmpDir, thread2, workflowHash);
+
+    const index = await import("../store.js").then((m) => m.loadThreadsIndex(tmpDir));
+    const thread3Head = index[thread3];
+    if (thread3Head === undefined) throw new Error("thread3 head not found");
+    await completeThread(tmpDir, thread3, workflowHash, thread3Head);
+
+    const result = await cmdThreadList(tmpDir, null, null, null, null, null);
+
+    expect(result).toHaveLength(3);
+    expect(result.map((r) => r.thread).sort()).toEqual([thread1, thread2, thread3].sort());
+  });
+});
+
+// ── time range filtering tests ────────────────────────────────────────────────
+
+describe("cmdThreadList time filters", () => {
+  test("should filter threads created after given timestamp", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const ts1 = Date.UTC(2026, 4, 20, 0, 0, 0);
+    const ts2 = Date.UTC(2026, 4, 21, 0, 0, 0);
+    const ts3 = Date.UTC(2026, 4, 22, 0, 0, 0);
+
+    const _threadA = await createTestThread(uwf, tmpDir, workflowHash, ts1);
+    const threadB = await createTestThread(uwf, tmpDir, workflowHash, ts2);
+    const threadC = await createTestThread(uwf, tmpDir, workflowHash, ts3);
+
+    // Use a timestamp slightly before ts2 to include threadB
+    const afterMs = Date.UTC(2026, 4, 20, 12, 0, 0);
+    const result = await cmdThreadList(tmpDir, null, afterMs, null, null, null);
+
+    expect(result).toHaveLength(2);
+    expect(result.map((r) => r.thread).sort()).toEqual([threadB, threadC].sort());
+  });
+
+  test("should filter threads created before given timestamp", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const ts1 = Date.UTC(2026, 4, 20, 0, 0, 0);
+    const ts2 = Date.UTC(2026, 4, 21, 0, 0, 0);
+    const ts3 = Date.UTC(2026, 4, 22, 0, 0, 0);
+
+    const threadA = await createTestThread(uwf, tmpDir, workflowHash, ts1);
+    const threadB = await createTestThread(uwf, tmpDir, workflowHash, ts2);
+    const _threadC = await createTestThread(uwf, tmpDir, workflowHash, ts3);
+
+    const beforeMs = Date.UTC(2026, 4, 22, 0, 0, 0);
+    const result = await cmdThreadList(tmpDir, null, null, beforeMs, null, null);
+
+    expect(result).toHaveLength(2);
+    expect(result.map((r) => r.thread).sort()).toEqual([threadA, threadB].sort());
+  });
+
+  test("should support both after and before filters (time range)", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const ts1 = Date.UTC(2026, 4, 20, 0, 0, 0);
+    const ts2 = Date.UTC(2026, 4, 21, 0, 0, 0);
+    const ts3 = Date.UTC(2026, 4, 22, 0, 0, 0);
+
+    const _threadA = await createTestThread(uwf, tmpDir, workflowHash, ts1);
+    const threadB = await createTestThread(uwf, tmpDir, workflowHash, ts2);
+    const _threadC = await createTestThread(uwf, tmpDir, workflowHash, ts3);
+
+    const afterMs = Date.UTC(2026, 4, 20, 12, 0, 0);
+    const beforeMs = Date.UTC(2026, 4, 22, 0, 0, 0);
+    const result = await cmdThreadList(tmpDir, null, afterMs, beforeMs, null, null);
+
+    expect(result).toHaveLength(1);
+    expect(result[0]?.thread).toBe(threadB);
+  });
+});
+
+// ── pagination tests ──────────────────────────────────────────────────────────
+
+describe("cmdThreadList pagination", () => {
+  test("should limit results with --take", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const threads: ThreadId[] = [];
+    for (let i = 0; i < 10; i++) {
+      threads.push(await createTestThread(uwf, tmpDir, workflowHash, Date.now() - i * 1000));
+    }
+
+    const result = await cmdThreadList(tmpDir, null, null, null, null, 5);
+
+    expect(result).toHaveLength(5);
+  });
+
+  test("should skip first N threads with --skip", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const threads: ThreadId[] = [];
+    // Create threads in chronological order, but they'll be sorted newest first
+    for (let i = 0; i < 10; i++) {
+      threads.push(await createTestThread(uwf, tmpDir, workflowHash, Date.now() + i * 100));
+      // Small delay to ensure distinct timestamps
+      await new Promise((resolve) => setTimeout(resolve, 10));
+    }
+
+    const result = await cmdThreadList(tmpDir, null, null, null, 3, null);
+
+    expect(result).toHaveLength(7);
+    // The 3 newest threads should be skipped, so we should get the 7 oldest
+  });
+
+  test("should support skip + take for pagination", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const threads: ThreadId[] = [];
+    for (let i = 0; i < 10; i++) {
+      threads.push(await createTestThread(uwf, tmpDir, workflowHash, Date.now() + i * 100));
+      await new Promise((resolve) => setTimeout(resolve, 10));
+    }
+
+    const result = await cmdThreadList(tmpDir, null, null, null, 5, 3);
+
+    expect(result).toHaveLength(3);
+    // Should skip first 5 (newest), then take 3
+  });
+
+  test("should handle take > available threads", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const _thread1 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
+    const _thread2 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
+    const _thread3 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
+
+    const result = await cmdThreadList(tmpDir, null, null, null, null, 10);
+
+    expect(result).toHaveLength(3);
+  });
+
+  test("should return empty array when skip >= thread count", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 3000);
+    await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
+    await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
+
+    const result = await cmdThreadList(tmpDir, null, null, null, 5, null);
+
+    expect(result).toHaveLength(0);
+  });
+});
+
+// ── combined filters tests ────────────────────────────────────────────────────
+
+describe("combined filters", () => {
+  test("should combine status and time range filters", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const ts1 = Date.UTC(2026, 4, 20, 0, 0, 0);
+    const ts2 = Date.UTC(2026, 4, 21, 0, 0, 0);
+    const ts3 = Date.UTC(2026, 4, 22, 0, 0, 0);
+    const ts4 = Date.UTC(2026, 4, 23, 0, 0, 0);
+
+    const _thread1 = await createTestThread(uwf, tmpDir, workflowHash, ts1);
+    const thread2 = await createTestThread(uwf, tmpDir, workflowHash, ts2);
+    const thread3 = await createTestThread(uwf, tmpDir, workflowHash, ts3);
+    const thread4 = await createTestThread(uwf, tmpDir, workflowHash, ts4);
+
+    await markThreadRunning(tmpDir, thread2, workflowHash);
+
+    const index = await import("../store.js").then((m) => m.loadThreadsIndex(tmpDir));
+    const thread3Head = index[thread3];
+    if (thread3Head === undefined) throw new Error("thread3 head not found");
+    await completeThread(tmpDir, thread3, workflowHash, thread3Head);
+
+    const afterMs = Date.UTC(2026, 4, 20, 12, 0, 0);
+    const result = await cmdThreadList(tmpDir, ["idle"], afterMs, null, null, null);
+
+    expect(result).toHaveLength(1);
+    expect(result[0]?.thread).toBe(thread4);
+    expect(result[0]?.status).toBe("idle");
+
+    // Clean up marker
+    await deleteMarker(tmpDir, thread2);
+  });
+
+  test("should combine status filter and pagination", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const threads: ThreadId[] = [];
+    for (let i = 9; i >= 0; i--) {
+      const thread = await createTestThread(uwf, tmpDir, workflowHash, Date.now() + i * 1000);
+      threads.push(thread);
+      const index = await import("../store.js").then((m) => m.loadThreadsIndex(tmpDir));
+      const headHash = index[thread];
+      if (headHash === undefined) throw new Error("head not found");
+      await completeThread(tmpDir, thread, workflowHash, headHash);
+    }
+
+    const result = await cmdThreadList(tmpDir, ["completed"], null, null, 3, 5);
+
+    expect(result).toHaveLength(5);
+    for (const r of result) {
+      expect(r.status).toBe("completed");
+    }
+  });
+
+  test("should combine time range and pagination", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const threads: ThreadId[] = [];
+    for (let i = 0; i < 20; i++) {
+      const ts = Date.UTC(2026, 4, 1 + i, 0, 0, 0);
+      threads.push(await createTestThread(uwf, tmpDir, workflowHash, ts));
+    }
+
+    const afterMs = Date.UTC(2026, 4, 10, 0, 0, 0);
+    const result = await cmdThreadList(tmpDir, null, afterMs, null, 2, 5);
+
+    expect(result).toHaveLength(5);
+    for (const r of result) {
+      const ts = extractUlidTimestamp(r.thread);
+      expect(ts).not.toBeNull();
+      if (ts !== null) {
+        expect(ts).toBeGreaterThan(afterMs);
+      }
+    }
+  });
+
+  async function setupMixedStatusThreads(
+    uwf: UwfStore,
+    workflowHash: string,
+    count: number,
+  ): Promise<ThreadId[]> {
+    const threads: ThreadId[] = [];
+    for (let i = 0; i < count; i++) {
+      const ts = Date.UTC(2026, 4, 10 + i, 0, 0, 0);
+      const thread = await createTestThread(uwf, tmpDir, workflowHash, ts);
+      threads.push(thread);
+
+      if (i % 2 === 0) {
+        const index = await import("../store.js").then((m) => m.loadThreadsIndex(tmpDir));
+        const headHash = index[thread];
+        if (headHash === undefined) throw new Error("head not found");
+        await completeThread(tmpDir, thread, workflowHash, headHash);
+      } else {
+        await markThreadRunning(tmpDir, thread, workflowHash);
+      }
+    }
+    return threads;
+  }
+
+  async function cleanupRunningMarkers(threads: ThreadId[]): Promise<void> {
+    for (let i = 0; i < threads.length; i++) {
+      if (i % 2 !== 0) {
+        await deleteMarker(tmpDir, threads[i] as ThreadId);
+      }
+    }
+  }
+
+  test("should combine all filters (status + time + pagination)", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+    const threads = await setupMixedStatusThreads(uwf, workflowHash, 15);
+
+    const afterMs = Date.UTC(2026, 4, 14, 12, 0, 0);
+    const beforeMs = Date.UTC(2026, 4, 20, 0, 0, 0);
+    const result = await cmdThreadList(tmpDir, ["idle", "running"], afterMs, beforeMs, 1, 3);
+
+    expect(result.length).toBeLessThanOrEqual(3);
+    for (const r of result) {
+      expect(["idle", "running"]).toContain(r.status);
+      const ts = extractUlidTimestamp(r.thread);
+      if (ts !== null) {
+        expect(ts).toBeGreaterThan(afterMs);
+        expect(ts).toBeLessThan(beforeMs);
+      }
+    }
+
+    await cleanupRunningMarkers(threads);
+  });
+});
+
+// ── edge cases tests ──────────────────────────────────────────────────────────
+
+describe("edge cases", () => {
+  test("should handle empty thread list", async () => {
+    await makeUwfStore(tmpDir);
+    const result = await cmdThreadList(tmpDir, null, null, null, null, null);
+    expect(result).toHaveLength(0);
+  });
+
+  test("should skip threads with invalid ULID when time filtering", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const workflowHash = await createTestWorkflow(uwf);
+
+    const thread1 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 2000);
+    const thread2 = await createTestThread(uwf, tmpDir, workflowHash, Date.now() - 1000);
+
+    const index = await import("../store.js").then((m) => m.loadThreadsIndex(tmpDir));
+    index["INVALID_ULID_FORMAT_HERE" as ThreadId] = "01J6HMVRNQKJV2";
+    await saveThreadsIndex(tmpDir, index);
+
+    const afterMs = Date.now() - 3000;
+    const result = await cmdThreadList(tmpDir, null, afterMs, null, null, null);
+
+    expect(result).toHaveLength(2);
+    expect(result.map((r) => r.thread).sort()).toEqual([thread1, thread2].sort());
+  });
+});
+
+// ── time parsing tests ────────────────────────────────────────────────────────
+
+describe("relative time parsing", () => {
+  test("should parse '7d' as 7 days ago", () => {
+    const nowMs = Date.UTC(2026, 4, 24, 12, 0, 0);
+    const result = parseTimeInput("7d", nowMs);
+    const expected = Date.UTC(2026, 4, 17, 12, 0, 0);
+    expect(result).toBe(expected);
+  });
+
+  test("should parse '24h' as 24 hours ago", () => {
+    const nowMs = Date.UTC(2026, 4, 24, 12, 0, 0);
+    const result = parseTimeInput("24h", nowMs);
+    const expected = Date.UTC(2026, 4, 23, 12, 0, 0);
+    expect(result).toBe(expected);
+  });
+
+  test("should parse '30m' as 30 minutes ago", () => {
+    const nowMs = Date.UTC(2026, 4, 24, 12, 30, 0);
+    const result = parseTimeInput("30m", nowMs);
+    const expected = Date.UTC(2026, 4, 24, 12, 0, 0);
+    expect(result).toBe(expected);
+  });
+
+  test("should parse '1d' as 1 day ago", () => {
+    const nowMs = Date.UTC(2026, 4, 24, 0, 0, 0);
+    const result = parseTimeInput("1d", nowMs);
+    const expected = Date.UTC(2026, 4, 23, 0, 0, 0);
+    expect(result).toBe(expected);
+  });
+});
+
+describe("ISO date parsing", () => {
+  test("should parse ISO date (YYYY-MM-DD)", () => {
+    const nowMs = Date.now();
+    const result = parseTimeInput("2026-05-20", nowMs);
+    const expected = Date.UTC(2026, 4, 20, 0, 0, 0);
+    expect(result).toBe(expected);
+  });
+
+  test("should parse ISO datetime (YYYY-MM-DDTHH:MM:SS)", () => {
+    const nowMs = Date.now();
+    const result = parseTimeInput("2026-05-20T14:30:00", nowMs);
+    const expected = Date.parse("2026-05-20T14:30:00");
+    expect(result).toBe(expected);
+  });
+
+  test("should parse ISO datetime with Z suffix", () => {
+    const nowMs = Date.now();
+    const result = parseTimeInput("2026-05-20T14:30:00Z", nowMs);
+    const expected = Date.UTC(2026, 4, 20, 14, 30, 0);
+    expect(result).toBe(expected);
+  });
+
+  test("should reject invalid date formats", () => {
+    const nowMs = Date.now();
+    expect(() => parseTimeInput("not-a-date", nowMs)).toThrow();
+    expect(() => parseTimeInput("2026-13-01", nowMs)).toThrow();
+    expect(() => parseTimeInput("invalid", nowMs)).toThrow();
+  });
+});
@@ -0,0 +1,583 @@
+import { mkdir, mkdtemp, rm } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { bootstrap, putSchema } from "@uncaged/json-cas";
+import { createFsStore } from "@uncaged/json-cas-fs";
+import type { CasRef, ThreadId } from "@uncaged/workflow-protocol";
+import { afterEach, beforeEach, describe, expect, test } from "vitest";
+import { cmdThreadRead } from "../commands/thread.js";
+import { registerUwfSchemas } from "../schemas.js";
+import { saveThreadsIndex } from "../store.js";
+
+// ── schemas used in tests ────────────────────────────────────────────────────
+
+const TURN_SCHEMA = {
+  title: "hermes-turn",
+  type: "object" as const,
+  required: ["index", "role", "content"],
+  properties: {
+    index: { type: "integer" as const },
+    role: { type: "string" as const },
+    content: { type: "string" as const },
+    toolCalls: {
+      anyOf: [
+        { type: "array" as const, items: { type: "object" as const } },
+        { type: "null" as const },
+      ],
+    },
+    reasoning: { anyOf: [{ type: "string" as const }, { type: "null" as const }] },
+  },
+  additionalProperties: false,
+};
+
+const DETAIL_SCHEMA = {
+  title: "hermes-detail",
+  type: "object" as const,
+  required: ["sessionId", "model", "duration", "turnCount", "turns"],
+  properties: {
+    sessionId: { type: "string" as const },
+    model: { type: "string" as const },
+    duration: { type: "integer" as const },
+    turnCount: { type: "integer" as const },
+    turns: {
+      type: "array" as const,
+      items: { type: "string" as const, format: "cas_ref" },
+    },
+  },
+  additionalProperties: false,
+};
+
+// ── helpers ───────────────────────────────────────────────────────────────────
+
+async function registerDetailSchemas(store: ReturnType<typeof createFsStore>) {
+  await bootstrap(store);
+  const [turn, detail] = await Promise.all([
+    putSchema(store, TURN_SCHEMA),
+    putSchema(store, DETAIL_SCHEMA),
+  ]);
+  return { turn, detail };
+}
+
+function generateContent(size: number, prefix = "Content"): string {
+  const base = `${prefix} `;
+  const repeat = Math.ceil(size / base.length);
+  return base.repeat(repeat).slice(0, size);
+}
+
+// ── fixture ───────────────────────────────────────────────────────────────────
+
+let tmpDir: string;
+
+beforeEach(async () => {
+  tmpDir = await mkdtemp(join(tmpdir(), "cli-uwf-quota-test-"));
+});
+
+afterEach(async () => {
+  await rm(tmpDir, { recursive: true, force: true });
+});
+
+// ── thread read quota enforcement ─────────────────────────────────────────────
+
+describe("thread read --quota flag", () => {
+  test("test 1: basic quota enforcement with 3 steps", async () => {
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+    const detailSchemas = await registerDetailSchemas(store);
+
+    const workflowHash = await store.put(schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do the work.",
+          output: "Summarize the work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await store.put(schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Test task",
+    });
+
+    const outputHash = await store.put(schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    // Create 3 steps with ~500 chars each
+    const steps: CasRef[] = [];
+    for (let i = 1; i <= 3; i++) {
+      const content = generateContent(500, `Step${i}`);
+      const turnHash = await store.put(detailSchemas.turn, {
+        index: 0,
+        role: "assistant",
+        content,
+        toolCalls: null,
+        reasoning: null,
+      });
+      const detailHash = await store.put(detailSchemas.detail, {
+        sessionId: `session-${i}`,
+        model: "test-model",
+        duration: 1000,
+        turnCount: 1,
+        turns: [turnHash],
+      });
+      const stepHash = await store.put(schemas.stepNode, {
+        start: startHash,
+        prev: steps[i - 2] ?? null,
+        role: "worker",
+        output: outputHash,
+        detail: detailHash,
+        agent: "uwf-test",
+      });
+      steps.push(stepHash);
+    }
+
+    const threadId = "01HX2Q3R4S5T6V7W8X9YZ0" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: steps[2] as CasRef });
+
+    // Set quota to 800 chars - should only fit most recent steps
+    const markdown = await cmdThreadRead(tmpDir, threadId, 800, null, false);
+
+    // Quota must be reasonably enforced (allow ~200 char tolerance for skip hint)
+    expect(markdown.length).toBeLessThanOrEqual(1000);
+
+    // Should contain skip hint since not all steps fit
+    expect(markdown).toMatch(/earlier step/);
+
+    // Most recent step should be included
+    expect(markdown).toMatch(/Step3/);
+  });
+
+  test("test 2: quota check order - verifies bug is fixed", async () => {
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+    const detailSchemas = await registerDetailSchemas(store);
+
+    const workflowHash = await store.put(schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do the work.",
+          output: "Summarize the work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await store.put(schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Test task",
+    });
+
+    const outputHash = await store.put(schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    // Create 2 steps: first=300 chars, second=600 chars
+    const step1Content = generateContent(300, "First");
+    const step1TurnHash = await store.put(detailSchemas.turn, {
+      index: 0,
+      role: "assistant",
+      content: step1Content,
+      toolCalls: null,
+      reasoning: null,
+    });
+    const step1DetailHash = await store.put(detailSchemas.detail, {
+      sessionId: "session-1",
+      model: "test-model",
+      duration: 1000,
+      turnCount: 1,
+      turns: [step1TurnHash],
+    });
+    const step1Hash = await store.put(schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "worker",
+      output: outputHash,
+      detail: step1DetailHash,
+      agent: "uwf-test",
+    });
+
+    const step2Content = generateContent(600, "Second");
+    const step2TurnHash = await store.put(detailSchemas.turn, {
+      index: 0,
+      role: "assistant",
+      content: step2Content,
+      toolCalls: null,
+      reasoning: null,
+    });
+    const step2DetailHash = await store.put(detailSchemas.detail, {
+      sessionId: "session-2",
+      model: "test-model",
+      duration: 1000,
+      turnCount: 1,
+      turns: [step2TurnHash],
+    });
+    const step2Hash = await store.put(schemas.stepNode, {
+      start: startHash,
+      prev: step1Hash,
+      role: "worker",
+      output: outputHash,
+      detail: step2DetailHash,
+      agent: "uwf-test",
+    });
+
+    const threadId = "01HX2Q3R4S5T6V7W8X9YZ1" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: step2Hash });
+
+    // Set quota to 500 chars
+    const markdown = await cmdThreadRead(tmpDir, threadId, 500, null, false);
+
+    // Bug fix verification: output must be limited (allow ~200 char tolerance)
+    expect(markdown.length).toBeLessThanOrEqual(1100);
+
+    // Should contain "Second" (most recent step)
+    expect(markdown).toMatch(/Second/);
+
+    // Should skip first step
+    expect(markdown).toMatch(/earlier step/);
+
+    // Verify improvement: before fix would be ~1264, now should be much closer to 500
+    expect(markdown.length).toBeLessThan(1200);
+  });
+
+  test("test 3: quota with --start section", async () => {
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+    const detailSchemas = await registerDetailSchemas(store);
+
+    const workflowHash = await store.put(schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do the work.",
+          output: "Summarize the work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await store.put(schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Test task with a moderately long prompt to test quota accounting",
+    });
+
+    const outputHash = await store.put(schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    // Create 2 steps
+    const steps: CasRef[] = [];
+    for (let i = 1; i <= 2; i++) {
+      const content = generateContent(400, `Step${i}`);
+      const turnHash = await store.put(detailSchemas.turn, {
+        index: 0,
+        role: "assistant",
+        content,
+        toolCalls: null,
+        reasoning: null,
+      });
+      const detailHash = await store.put(detailSchemas.detail, {
+        sessionId: `session-${i}`,
+        model: "test-model",
+        duration: 1000,
+        turnCount: 1,
+        turns: [turnHash],
+      });
+      const stepHash = await store.put(schemas.stepNode, {
+        start: startHash,
+        prev: steps[i - 2] ?? null,
+        role: "worker",
+        output: outputHash,
+        detail: detailHash,
+        agent: "uwf-test",
+      });
+      steps.push(stepHash);
+    }
+
+    const threadId = "01HX2Q3R4S5T6V7W8X9YZ2" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: steps[1] as CasRef });
+
+    // Set tight quota with --start flag
+    const markdown = await cmdThreadRead(tmpDir, threadId, 600, null, true);
+
+    // Quota must be reasonably enforced (allow ~210 char tolerance for structure)
+    expect(markdown.length).toBeLessThanOrEqual(810);
+
+    // Should contain thread header
+    expect(markdown).toMatch(/# Thread/);
+    expect(markdown).toMatch(/test-wf/);
+  });
+
+  test("test 5a: quota edge case - minimal quota", async () => {
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+    const detailSchemas = await registerDetailSchemas(store);
+
+    const workflowHash = await store.put(schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do the work.",
+          output: "Summarize the work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await store.put(schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Test task",
+    });
+
+    const outputHash = await store.put(schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const content = generateContent(500, "Test");
+    const turnHash = await store.put(detailSchemas.turn, {
+      index: 0,
+      role: "assistant",
+      content,
+      toolCalls: null,
+      reasoning: null,
+    });
+    const detailHash = await store.put(detailSchemas.detail, {
+      sessionId: "session-1",
+      model: "test-model",
+      duration: 1000,
+      turnCount: 1,
+      turns: [turnHash],
+    });
+    const stepHash = await store.put(schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "worker",
+      output: outputHash,
+      detail: detailHash,
+      agent: "uwf-test",
+    });
+
+    const threadId = "01HX2Q3R4S5T6V7W8X9YZ4" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: stepHash });
+
+    // Minimal quota
+    const markdown = await cmdThreadRead(tmpDir, threadId, 1, null, false);
+
+    // Should handle gracefully - always shows at least one step
+    expect(markdown.length).toBeGreaterThan(1);
+    expect(markdown).toMatch(/Test/);
+  });
+
+  test("test 5b: quota edge case - very large quota", async () => {
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+    const detailSchemas = await registerDetailSchemas(store);
+
+    const workflowHash = await store.put(schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do the work.",
+          output: "Summarize the work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await store.put(schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Test task",
+    });
+
+    const outputHash = await store.put(schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    // Create 3 steps
+    const steps: CasRef[] = [];
+    for (let i = 1; i <= 3; i++) {
+      const content = generateContent(300, `Step${i}`);
+      const turnHash = await store.put(detailSchemas.turn, {
+        index: 0,
+        role: "assistant",
+        content,
+        toolCalls: null,
+        reasoning: null,
+      });
+      const detailHash = await store.put(detailSchemas.detail, {
+        sessionId: `session-${i}`,
+        model: "test-model",
+        duration: 1000,
+        turnCount: 1,
+        turns: [turnHash],
+      });
+      const stepHash = await store.put(schemas.stepNode, {
+        start: startHash,
+        prev: steps[i - 2] ?? null,
+        role: "worker",
+        output: outputHash,
+        detail: detailHash,
+        agent: "uwf-test",
+      });
+      steps.push(stepHash);
+    }
+
+    const threadId = "01HX2Q3R4S5T6V7W8X9YZ5" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: steps[2] as CasRef });
+
+    // Very large quota
+    const markdown = await cmdThreadRead(tmpDir, threadId, 1000000, null, false);
+
+    // Should show all steps (no skipping)
+    expect(markdown).not.toMatch(/earlier step/);
+    expect(markdown).toMatch(/Step1/);
+    expect(markdown).toMatch(/Step2/);
+    expect(markdown).toMatch(/Step3/);
+  });
+
+  test("test 6: quota with --before parameter", async () => {
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+    const detailSchemas = await registerDetailSchemas(store);
+
+    const workflowHash = await store.put(schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do the work.",
+          output: "Summarize the work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await store.put(schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Test task",
+    });
+
+    const outputHash = await store.put(schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    // Create 5 steps
+    const steps: CasRef[] = [];
+    for (let i = 1; i <= 5; i++) {
+      const content = generateContent(300, `Step${i}`);
+      const turnHash = await store.put(detailSchemas.turn, {
+        index: 0,
+        role: "assistant",
+        content,
+        toolCalls: null,
+        reasoning: null,
+      });
+      const detailHash = await store.put(detailSchemas.detail, {
+        sessionId: `session-${i}`,
+        model: "test-model",
+        duration: 1000,
+        turnCount: 1,
+        turns: [turnHash],
+      });
+      const stepHash = await store.put(schemas.stepNode, {
+        start: startHash,
+        prev: steps[i - 2] ?? null,
+        role: "worker",
+        output: outputHash,
+        detail: detailHash,
+        agent: "uwf-test",
+      });
+      steps.push(stepHash);
+    }
+
+    const threadId = "01HX2Q3R4S5T6V7W8X9YZ6" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: steps[4] as CasRef });
+
+    // Use --before to limit to steps 1-2, then set quota that allows only 1
+    const markdown = await cmdThreadRead(tmpDir, threadId, 500, steps[2] as CasRef, false);
+
+    // Should not contain Step3 or later
+    expect(markdown).not.toMatch(/Step3/);
+    expect(markdown).not.toMatch(/Step4/);
+    expect(markdown).not.toMatch(/Step5/);
+
+    // Quota should select most recent of candidates (Step2)
+    expect(markdown).toMatch(/Step2/);
+
+    // Quota enforcement (allow ~200 char tolerance)
+    expect(markdown.length).toBeLessThanOrEqual(700);
+  });
+});
@@ -0,0 +1,683 @@
+import { mkdir, mkdtemp, rm } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { bootstrap, putSchema } from "@uncaged/json-cas";
+import { createFsStore } from "@uncaged/json-cas-fs";
+import type { CasRef, ThreadId } from "@uncaged/workflow-protocol";
+import { afterEach, beforeEach, describe, expect, test } from "vitest";
+import { cmdThreadRead, THREAD_READ_DEFAULT_QUOTA } from "../commands/thread.js";
+import { registerUwfSchemas } from "../schemas.js";
+import type { UwfStore } from "../store.js";
+import { saveThreadsIndex } from "../store.js";
+
+// ── schemas used in tests ────────────────────────────────────────────────────
+
+const TURN_SCHEMA = {
+  title: "hermes-turn",
+  type: "object" as const,
+  required: ["index", "role", "content"],
+  properties: {
+    index: { type: "integer" as const },
+    role: { type: "string" as const },
+    content: { type: "string" as const },
+    toolCalls: {
+      anyOf: [
+        { type: "array" as const, items: { type: "object" as const } },
+        { type: "null" as const },
+      ],
+    },
+    reasoning: { anyOf: [{ type: "string" as const }, { type: "null" as const }] },
+  },
+  additionalProperties: false,
+};
+
+const DETAIL_SCHEMA = {
+  title: "hermes-detail",
+  type: "object" as const,
+  required: ["sessionId", "model", "duration", "turnCount", "turns"],
+  properties: {
+    sessionId: { type: "string" as const },
+    model: { type: "string" as const },
+    duration: { type: "integer" as const },
+    turnCount: { type: "integer" as const },
+    turns: {
+      type: "array" as const,
+      items: { type: "string" as const, format: "cas_ref" },
+    },
+  },
+  additionalProperties: false,
+};
+
+// ── helpers ───────────────────────────────────────────────────────────────────
+
+async function makeUwfStore(storageRoot: string): Promise<UwfStore> {
+  const casDir = join(storageRoot, "cas");
+  await mkdir(casDir, { recursive: true });
+  const store = createFsStore(casDir);
+  const schemas = await registerUwfSchemas(store);
+  return { storageRoot, store, schemas };
+}
+
+async function registerDetailSchemas(store: ReturnType<typeof createFsStore>) {
+  await bootstrap(store);
+  const [turn, detail] = await Promise.all([
+    putSchema(store, TURN_SCHEMA),
+    putSchema(store, DETAIL_SCHEMA),
+  ]);
+  return { turn, detail };
+}
+
+// ── fixture ───────────────────────────────────────────────────────────────────
+
+let tmpDir: string;
+
+beforeEach(async () => {
+  tmpDir = await mkdtemp(join(tmpdir(), "cli-uwf-test-"));
+});
+
+afterEach(async () => {
+  await rm(tmpDir, { recursive: true, force: true });
+});
+
+// ── thread read XML tag isolation ─────────────────────────────────────────────
+
+describe("thread read XML tag isolation", () => {
+  test("scenario 1: wraps output in XML tags instead of heading", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const detailSchemas = await registerDetailSchemas(uwf.store);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        planner: {
+          description: "Planner",
+          goal: "You are a planning agent. Your task is to...",
+          capabilities: [],
+          procedure: "Plan the work.",
+          output: "Summarize the plan.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Fix issue #459",
+    });
+
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const turnHash = await uwf.store.put(detailSchemas.turn, {
+      index: 0,
+      role: "assistant",
+      content:
+        "---\nstatus: ready\nplan: CMWGHQKT58RY4\n---\n\n# Analysis Complete\n## Issue Summary\nThe issue requires XML tag isolation.",
+      toolCalls: null,
+      reasoning: null,
+    });
+    const detailHash = await uwf.store.put(detailSchemas.detail, {
+      sessionId: "sx",
+      model: "mx",
+      duration: 500,
+      turnCount: 1,
+      turns: [turnHash],
+    });
+
+    const stepHash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "planner",
+      output: outputHash,
+      detail: detailHash,
+      agent: "uwf-claude-code",
+    });
+
+    const threadId = "01JTEST0000000000000001" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: stepHash });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);
+
+    // Should wrap output in XML tags
+    expect(markdown).toContain("<output>");
+    expect(markdown).toContain("</output>");
+
+    // Should not have ### Content heading
+    expect(markdown).not.toContain("### Content");
+
+    // Should preserve markdown headings inside output tags
+    expect(markdown).toContain("# Analysis Complete");
+    expect(markdown).toContain("## Issue Summary");
+  });
+
+  test("scenario 2: wraps prompt in XML tags", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const detailSchemas = await registerDetailSchemas(uwf.store);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        planner: {
+          description: "Planner",
+          goal: "You are a planning agent. Your task is to analyze and plan.",
+          capabilities: [],
+          procedure: "Plan the work.",
+          output: "Summarize the plan.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Fix issue",
+    });
+
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const turnHash = await uwf.store.put(detailSchemas.turn, {
+      index: 0,
+      role: "assistant",
+      content: "---\nstatus: ready\n---\n\nContent here...",
+      toolCalls: null,
+      reasoning: null,
+    });
+    const detailHash = await uwf.store.put(detailSchemas.detail, {
+      sessionId: "sx",
+      model: "mx",
+      duration: 500,
+      turnCount: 1,
+      turns: [turnHash],
+    });
+
+    const stepHash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "planner",
+      output: outputHash,
+      detail: detailHash,
+      agent: "uwf-claude-code",
+    });
+
+    const threadId = "01JTEST0000000000000002" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: stepHash });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);
+
+    // Should wrap prompt in XML tags
+    expect(markdown).toContain("<prompt>");
+    expect(markdown).toContain("</prompt>");
+    expect(markdown).toContain("You are a planning agent. Your task is to analyze and plan.");
+
+    // Should not have ### Prompt heading
+    expect(markdown).not.toContain("### Prompt");
+
+    // Should wrap output in XML tags
+    expect(markdown).toContain("<output>");
+    expect(markdown).toContain("</output>");
+  });
+
+  test("scenario 3: same role repeated does not show prompt twice", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        writer: {
+          description: "Writer",
+          goal: "You are a writer agent.",
+          capabilities: [],
+          procedure: "Write content.",
+          output: "Summarize writing.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Write something",
+    });
+
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const step1 = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "writer",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+
+    const step2 = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: step1 as CasRef,
+      role: "writer",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+
+    const threadId = "01JTEST0000000000000003" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: step2 });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);
+
+    // Should only show prompt tags once
+    const promptCount = (markdown.match(/<prompt>/g) ?? []).length;
+    expect(promptCount).toBe(1);
+  });
+
+  test("scenario 4: step with no detail shows no output tags", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        worker: {
+          description: "Worker",
+          goal: "You are a worker agent.",
+          capabilities: [],
+          procedure: "Do work.",
+          output: "Summarize work.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Do stuff",
+    });
+
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const stepHash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "worker",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+
+    const threadId = "01JTEST0000000000000004" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: stepHash });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);
+
+    // Should not have output tags
+    expect(markdown).not.toContain("<output>");
+    expect(markdown).not.toContain("</output>");
+
+    // Step header should still be displayed
+    expect(markdown).toContain("## Step 1: worker");
+
+    // Prompt should still be shown
+    expect(markdown).toContain("<prompt>");
+  });
+
+  test("scenario 5: empty content shows no output tags", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Do stuff",
+    });
+
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    // A detail ref that doesn't exist → extractLastAssistantContent returns null
+    const missingDetailRef = "missingdetail0" as CasRef;
+
+    const stepHash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "worker",
+      output: outputHash,
+      detail: missingDetailRef,
+      agent: "uwf-test",
+    });
+
+    const threadId = "01JTEST0000000000000005" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: stepHash });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);
+
+    // Should not have output tags
+    expect(markdown).not.toContain("<output>");
+    expect(markdown).not.toContain("</output>");
+  });
+
+  test("scenario 6: thread read with --start flag shows task section", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        roleA: {
+          description: "Role A",
+          goal: "Goal for roleA",
+          capabilities: [],
+          procedure: "Do stuff.",
+          output: "Output.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Initial prompt",
+    });
+
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const stepHash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "roleA",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+
+    const threadId = "01JTEST0000000000000006" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: stepHash });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, true);
+
+    // Should include task section
+    expect(markdown).toContain("# Thread");
+    expect(markdown).toContain("## Task");
+    expect(markdown).toContain("Initial prompt");
+
+    // Prompts should use XML tags
+    expect(markdown).toContain("<prompt>");
+  });
+
+  test("scenario 7: thread read with --before parameter", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        roleA: {
+          description: "Role A",
+          goal: "Goal for roleA",
+          capabilities: [],
+          procedure: "Do stuff.",
+          output: "Output.",
+          meta: "placeholder00" as CasRef,
+        },
+        roleB: {
+          description: "Role B",
+          goal: "Goal for roleB",
+          capabilities: [],
+          procedure: "Do stuff.",
+          output: "Output.",
+          meta: "placeholder00" as CasRef,
+        },
+        roleC: {
+          description: "Role C",
+          goal: "Goal for roleC",
+          capabilities: [],
+          procedure: "Do stuff.",
+          output: "Output.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Initial prompt",
+    });
+
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const step1 = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "roleA",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+
+    const step2 = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: step1 as CasRef,
+      role: "roleB",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+
+    const step3 = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: step2 as CasRef,
+      role: "roleC",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+
+    const threadId = "01JTEST0000000000000007" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: step3 });
+
+    const markdown = await cmdThreadRead(
+      tmpDir,
+      threadId,
+      THREAD_READ_DEFAULT_QUOTA,
+      step2 as CasRef,
+      false,
+    );
+
+    // Should only show roleA
+    expect(markdown).toContain("roleA");
+    expect(markdown).not.toContain("roleB");
+    expect(markdown).not.toContain("roleC");
+
+    // Should use XML tags
+    expect(markdown).toContain("<prompt>");
+  });
+
+  test("scenario 9: special characters in content are preserved", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const detailSchemas = await registerDetailSchemas(uwf.store);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        writer: {
+          description: "Writer",
+          goal: "You are a writer.",
+          capabilities: [],
+          procedure: "Write content.",
+          output: "Summarize.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Write something",
+    });
+
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const turnHash = await uwf.store.put(detailSchemas.turn, {
+      index: 0,
+      role: "assistant",
+      content: "Content with <special> & characters > like <this>",
+      toolCalls: null,
+      reasoning: null,
+    });
+    const detailHash = await uwf.store.put(detailSchemas.detail, {
+      sessionId: "sx",
+      model: "mx",
+      duration: 500,
+      turnCount: 1,
+      turns: [turnHash],
+    });
+
+    const stepHash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "writer",
+      output: outputHash,
+      detail: detailHash,
+      agent: "uwf-test",
+    });
+
+    const threadId = "01JTEST0000000000000008" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: stepHash });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);
+
+    // Special characters should be preserved as-is
+    expect(markdown).toContain("Content with <special> & characters > like <this>");
+  });
+
+  test("scenario 10: quota limit with XML tags", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf",
+      description: "desc",
+      roles: {
+        roleA: {
+          description: "Role A",
+          goal: "Goal for roleA",
+          capabilities: [],
+          procedure: "Do stuff.",
+          output: "Output.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Initial prompt",
+    });
+
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const steps: CasRef[] = [];
+    let prev: CasRef | null = null;
+    for (let i = 0; i < 5; i++) {
+      const step = (await uwf.store.put(uwf.schemas.stepNode, {
+        start: startHash,
+        prev,
+        role: "roleA",
+        output: outputHash,
+        detail: null,
+        agent: "uwf-test",
+      })) as CasRef;
+      steps.push(step);
+      prev = step;
+    }
+
+    const threadId = "01JTEST0000000000000009" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: steps[steps.length - 1]! });
+
+    // Use very small quota
+    const markdown = await cmdThreadRead(tmpDir, threadId, 1, null, false);
+
+    // Should have skip hint
+    expect(markdown).toContain("earlier step");
+
+    // Should have XML tags for displayed steps
+    if (markdown.includes("<prompt>")) {
+      expect(markdown).toContain("</prompt>");
+    }
+  });
+});
@@ -0,0 +1,71 @@
+import { execFileSync } from "node:child_process";
+import { join } from "node:path";
+import { describe, expect, test } from "vitest";
+
+const CLI_PATH = join(import.meta.dirname, "..", "cli.js");
+
+function runCli(args: string[]): { stdout: string; stderr: string; exitCode: number } {
+  try {
+    const stdout = execFileSync("bun", ["run", CLI_PATH, ...args], {
+      encoding: "utf8",
+      env: { ...process.env, WORKFLOW_STORAGE_ROOT: "/tmp/uwf-test-nonexistent" },
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+    return { stdout, stderr: "", exitCode: 0 };
+  } catch (e: unknown) {
+    const err = e as NodeJS.ErrnoException & { stdout?: string; stderr?: string; status?: number };
+    return {
+      stdout: err.stdout ?? "",
+      stderr: err.stderr ?? "",
+      exitCode: err.status ?? 1,
+    };
+  }
+}
+
+describe("thread exec --count CLI parsing", () => {
+  test("--help shows -c/--count option", () => {
+    const result = runCli(["thread", "exec", "--help"]);
+    expect(result.stdout).toContain("--count");
+    expect(result.stdout).toContain("-c");
+  });
+
+  test("description says 'one or more steps'", () => {
+    const result = runCli(["thread", "exec", "--help"]);
+    expect(result.stdout).toContain("one or more steps");
+  });
+});
+
+describe("cmdThreadExec count logic", () => {
+  test("count=0 fails with validation error", () => {
+    const result = runCli(["thread", "exec", "FAKE_THREAD_ID", "-c", "0"]);
+    expect(result.exitCode).not.toBe(0);
+    expect(result.stderr).toContain("positive integer");
+  });
+
+  test("negative count fails with validation error", () => {
+    const result = runCli(["thread", "exec", "FAKE_THREAD_ID", "-c", "-1"]);
+    expect(result.exitCode).not.toBe(0);
+    expect(result.stderr).toContain("positive integer");
+  });
+
+  test("non-integer count fails with validation error", () => {
+    const result = runCli(["thread", "exec", "FAKE_THREAD_ID", "-c", "1.5"]);
+    expect(result.exitCode).not.toBe(0);
+    expect(result.stderr).toContain("positive integer");
+  });
+
+  test("count=1 is the default (no -c flag)", () => {
+    // Without -c, it should attempt to run 1 step (failing on missing thread, not on count validation)
+    const result = runCli(["thread", "exec", "FAKE_THREAD_ID"]);
+    expect(result.exitCode).not.toBe(0);
+    // Should NOT contain "positive integer" error — should fail on thread lookup instead
+    expect(result.stderr).not.toContain("positive integer");
+  });
+
+  test("count=3 passes validation (fails on thread lookup)", () => {
+    const result = runCli(["thread", "exec", "FAKE_THREAD_ID", "-c", "3"]);
+    expect(result.exitCode).not.toBe(0);
+    // Should NOT contain "positive integer" error — should fail on thread/storage lookup
+    expect(result.stderr).not.toContain("positive integer");
+  });
+});
@@ -5,15 +5,15 @@ import { bootstrap, putSchema } from "@uncaged/json-cas";
 import { createFsStore } from "@uncaged/json-cas-fs";
 import type { CasRef, ThreadId } from "@uncaged/workflow-protocol";
 import { afterEach, beforeEach, describe, expect, test } from "vitest";
+import { cmdStepList, cmdStepShow } from "../commands/step.js";
 import {
  cmdThreadRead,
-  cmdThreadStepDetails,
  extractLastAssistantContent,
  THREAD_READ_DEFAULT_QUOTA,
 } from "../commands/thread.js";
 import { registerUwfSchemas } from "../schemas.js";
 import type { UwfStore } from "../store.js";
-import { saveThreadsIndex } from "../store.js";
+import { appendThreadHistory, saveThreadsIndex } from "../store.js";

 // ── schemas used in tests ────────────────────────────────────────────────────

@@ -198,10 +198,10 @@ describe("extractLastAssistantContent", () => {
  });
 });

-// ── cmdThreadRead: ### Content section ───────────────────────────────────────
+// ── cmdThreadRead: <output> section ──────────────────────────────────────────

-describe("cmdThreadRead ### Content section", () => {
-  test("includes ### Content before ### Output when detail has assistant turns", async () => {
+describe("cmdThreadRead <output> section", () => {
+  test("includes <output> tags when detail has assistant turns", async () => {
    const uwf = await makeUwfStore(tmpDir);
    const detailSchemas = await registerDetailSchemas(uwf.store);

@@ -264,17 +264,13 @@ describe("cmdThreadRead ### Content section", () => {

    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);

-    expect(markdown).toContain("### Content");
+    expect(markdown).toContain("<output>");
+    expect(markdown).toContain("</output>");
    expect(markdown).toContain("The assistant response text");
-
-    const contentIdx = markdown.indexOf("### Content");
-    const outputIdx = markdown.indexOf("### Output");
-    expect(contentIdx).toBeGreaterThanOrEqual(0);
-    expect(outputIdx).toBeGreaterThanOrEqual(0);
-    expect(contentIdx).toBeLessThan(outputIdx);
+    expect(markdown).not.toContain("### Content");
  });

-  test("omits ### Content when detail has no matching assistant turns", async () => {
+  test("omits <output> tags when detail has no matching assistant turns", async () => {
    const uwf = await makeUwfStore(tmpDir);

    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
@@ -313,14 +309,15 @@ describe("cmdThreadRead ### Content section", () => {

    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);

+    expect(markdown).not.toContain("<output>");
+    expect(markdown).not.toContain("</output>");
    expect(markdown).not.toContain("### Content");
-    expect(markdown).toContain("### Output");
  });
 });

-// ── cmdThreadStepDetails ──────────────────────────────────────────────────────
+// ── cmdStepShow ───────────────────────────────────────────────────────────────

-describe("cmdThreadStepDetails", () => {
+describe("cmdStepShow", () => {
  test("returns expanded detail node with turns inlined", async () => {
    const uwf = await makeUwfStore(tmpDir);
    const detailSchemas = await registerDetailSchemas(uwf.store);
@@ -368,7 +365,7 @@ describe("cmdThreadStepDetails", () => {
      agent: "uwf-hermes",
    });

-    const result = await cmdThreadStepDetails(tmpDir, stepHash);
+    const result = await cmdStepShow(tmpDir, stepHash);

    expect(result).toMatchObject({
      sessionId: "sess42",
@@ -387,8 +384,646 @@ describe("cmdThreadStepDetails", () => {
      content: "done",
    });
  });
+});

-  test("throws when step hash does not exist", async () => {
-    await expect(cmdThreadStepDetails(tmpDir, "nonexistenth0" as CasRef)).rejects.toThrow();
+// ── cmdThreadRead: <prompt> deduplication ────────────────────────────────────
+
+describe("cmdThreadRead <prompt> deduplication", () => {
+  async function makeThreadWithRoles(uwf: UwfStore, roles: string[]): Promise<string> {
+    const roleMap: Record<string, unknown> = {};
+    for (const r of [...new Set(roles)]) {
+      roleMap[r] = {
+        description: r,
+        goal: `Goal for ${r}`,
+        capabilities: [],
+        procedure: "Do stuff.",
+        output: "Output.",
+        meta: "placeholder00" as CasRef,
+      };
+    }
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "dedup-wf",
+      description: "desc",
+      roles: roleMap,
+      conditions: {},
+      graph: {},
+    });
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Start",
+    });
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    let prev: string | null = null;
+    let stepHash = "";
+    for (const role of roles) {
+      stepHash = await uwf.store.put(uwf.schemas.stepNode, {
+        start: startHash,
+        prev: prev as CasRef | null,
+        role,
+        output: outputHash,
+        detail: null,
+        agent: "uwf-test",
+      });
+      prev = stepHash;
+    }
+    return stepHash;
+  }
+
+  test("same consecutive role shows <prompt> once", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const headHash = await makeThreadWithRoles(uwf, ["writer", "writer"]);
+    const threadId = "01JTEST0000000000000003" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: headHash });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);
+    const count = (markdown.match(/<prompt>/g) ?? []).length;
+    expect(count).toBe(1);
+  });
+
+  test("different consecutive roles each show <prompt>", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const headHash = await makeThreadWithRoles(uwf, ["planner", "coder"]);
+    const threadId = "01JTEST0000000000000004" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: headHash });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);
+    const count = (markdown.match(/<prompt>/g) ?? []).length;
+    expect(count).toBe(2);
+  });
+
+  test("non-consecutive same role shows <prompt> twice", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const headHash = await makeThreadWithRoles(uwf, ["roleA", "roleB", "roleA"]);
+    const threadId = "01JTEST0000000000000005" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: headHash });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);
+    const count = (markdown.match(/<prompt>/g) ?? []).length;
+    expect(count).toBe(2);
+  });
+});
+
+// ── cmdThreadRead: showStart / before / quota ─────────────────────────────────
+
+describe("cmdThreadRead start section / before / quota", () => {
+  async function makeSimpleThread(
+    uwf: UwfStore,
+    roles: string[],
+  ): Promise<{ startHash: CasRef; stepHashes: CasRef[] }> {
+    const uniqueRoles = [...new Set(roles)];
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "simple-wf",
+      description: "desc",
+      roles: Object.fromEntries(
+        uniqueRoles.map((r) => [
+          r,
+          {
+            description: r,
+            goal: `Goal for ${r}`,
+            capabilities: [],
+            procedure: "Do stuff.",
+            output: "Output.",
+            meta: "placeholder00" as CasRef,
+          },
+        ]),
+      ),
+      conditions: {},
+      graph: {},
+    });
+    const startHash = (await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Initial prompt",
+    })) as CasRef;
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const stepHashes: CasRef[] = [];
+    let prev: CasRef | null = null;
+    for (const role of roles) {
+      const stepHash = (await uwf.store.put(uwf.schemas.stepNode, {
+        start: startHash,
+        prev,
+        role,
+        output: outputHash,
+        detail: null,
+        agent: "uwf-test",
+      })) as CasRef;
+      stepHashes.push(stepHash);
+      prev = stepHash;
+    }
+    return { startHash, stepHashes };
+  }
+
+  test("showStart=true includes # Thread header and ## Task section", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const { stepHashes } = await makeSimpleThread(uwf, ["roleA"]);
+    const threadId = "01JTEST0000000000000006" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: stepHashes[stepHashes.length - 1]! });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, true);
+    expect(markdown).toContain("# Thread");
+    expect(markdown).toContain("## Task");
+    expect(markdown).toContain("Initial prompt");
+  });
+
+  test("showStart=false with before=null still shows # Thread header (default behavior)", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const { stepHashes } = await makeSimpleThread(uwf, ["roleA"]);
+    const threadId = "01JTEST0000000000000007" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: stepHashes[stepHashes.length - 1]! });
+
+    // When before=null, the start section is always shown regardless of showStart
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);
+    expect(markdown).toContain("# Thread");
+    expect(markdown).toContain("## Task");
+  });
+
+  test("before filter: only steps before the given hash appear", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const { stepHashes } = await makeSimpleThread(uwf, ["roleA", "roleB", "roleC"]);
+    const [_hashA, hashB, hashC] = stepHashes as [CasRef, CasRef, CasRef];
+    const threadId = "01JTEST0000000000000008" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: hashC });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, hashB, false);
+    expect(markdown).toContain("roleA");
+    expect(markdown).not.toContain("roleB");
+    expect(markdown).not.toContain("roleC");
+  });
+
+  test("quota=1 limits output and includes skip hint", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const { stepHashes } = await makeSimpleThread(uwf, ["roleA", "roleB", "roleC"]);
+    const threadId = "01JTEST000000000000000A" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: stepHashes[stepHashes.length - 1]! });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, 1, null, false);
+    expect(markdown).toContain("earlier step");
+  });
+
+  test("all steps fit in quota: no skip hint", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const { stepHashes } = await makeSimpleThread(uwf, ["roleA"]);
+    const threadId = "01JTEST000000000000000B" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: stepHashes[0]! });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);
+    expect(markdown).not.toContain("earlier step");
+  });
+});
+
+// ── Tests that call process.exit must be last ─────────────────────────────────
+
+describe("cmdStepShow (process.exit tests - must be last)", () => {
+  test("throws when step hash does not exist", async () => {
+    await expect(cmdStepShow(tmpDir, "nonexistenth0" as CasRef)).rejects.toThrow();
+  });
+
+  test("before with unknown hash rejects", async () => {
+    const _uwf = await makeUwfStore(tmpDir);
+    const casDir = join(tmpDir, "cas");
+    await mkdir(casDir, { recursive: true });
+    const store = createFsStore(casDir);
+    const schemas = await registerUwfSchemas(store);
+    const uwfStore: UwfStore = { storageRoot: tmpDir, store, schemas };
+
+    const workflowHash = await uwfStore.store.put(uwfStore.schemas.workflow, {
+      name: "wf2",
+      description: "",
+      roles: {
+        roleA: {
+          description: "r",
+          goal: "g",
+          capabilities: [],
+          procedure: "p",
+          output: "o",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+    const startHash = await uwfStore.store.put(uwfStore.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "p",
+    });
+    const outputHash = await uwfStore.store.put(uwfStore.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+    const stepHash = await uwfStore.store.put(uwfStore.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "roleA",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+    await saveThreadsIndex(tmpDir, { ["01JTEST000000000000000C" as ThreadId]: stepHash as CasRef });
+
+    await expect(
+      cmdThreadRead(
+        tmpDir,
+        "01JTEST000000000000000C" as ThreadId,
+        THREAD_READ_DEFAULT_QUOTA,
+        "unknownhash0" as CasRef,
+        false,
+      ),
+    ).rejects.toThrow();
+  });
+});
+
+// ── cmdStepList / cmdStepShow: completed threads ──────────────────────────────
+
+describe("cmdStepList with completed threads", () => {
+  test("lists steps from active thread", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf-active",
+      description: "desc",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Start prompt",
+    });
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const step1Hash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "role1",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+    const step2Hash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: step1Hash,
+      role: "role2",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+    const step3Hash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: step2Hash,
+      role: "role3",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+
+    const threadId = "01JTEST0000000000000000A1" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: step3Hash });
+
+    const result = await cmdStepList(tmpDir, threadId);
+
+    expect(result.thread).toBe(threadId);
+    expect(result.steps).toHaveLength(4); // start + 3 steps
+    expect(result.steps[1].role).toBe("role1");
+    expect(result.steps[2].role).toBe("role2");
+    expect(result.steps[3].role).toBe("role3");
+  });
+
+  test("lists steps from completed thread", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf-completed",
+      description: "desc",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Start prompt",
+    });
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const step1Hash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "roleA",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+    const step2Hash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: step1Hash,
+      role: "roleB",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+
+    const threadId = "01JTEST0000000000000000A2" as ThreadId;
+    // Thread is NOT in threads.yaml (simulating completed thread)
+    await saveThreadsIndex(tmpDir, {});
+    // But it IS in history.jsonl
+    await appendThreadHistory(tmpDir, {
+      thread: threadId,
+      workflow: workflowHash,
+      head: step2Hash,
+      completedAt: Date.now(),
+    });
+
+    const result = await cmdStepList(tmpDir, threadId);
+
+    expect(result.thread).toBe(threadId);
+    expect(result.steps).toHaveLength(3); // start + 2 steps
+    expect(result.steps[1].role).toBe("roleA");
+    expect(result.steps[2].role).toBe("roleB");
+  });
+});
+
+describe("cmdStepShow with completed threads", () => {
+  test("shows step detail from active thread", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const detailSchemas = await registerDetailSchemas(uwf.store);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf-step-active",
+      description: "desc",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "p",
+    });
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const turnHash = await uwf.store.put(detailSchemas.turn, {
+      index: 0,
+      role: "assistant",
+      content: "Active thread response",
+      toolCalls: null,
+      reasoning: null,
+    });
+    const detailHash = await uwf.store.put(detailSchemas.detail, {
+      sessionId: "sess-active",
+      model: "model-x",
+      duration: 1234,
+      turnCount: 1,
+      turns: [turnHash],
+    });
+
+    const stepHash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "coder",
+      output: outputHash,
+      detail: detailHash,
+      agent: "uwf-hermes",
+    });
+
+    const threadId = "01JTEST0000000000000000B1" as ThreadId;
+    await saveThreadsIndex(tmpDir, { [threadId]: stepHash });
+
+    const result = await cmdStepShow(tmpDir, stepHash);
+
+    expect(result).toMatchObject({
+      sessionId: "sess-active",
+      model: "model-x",
+      duration: 1234,
+      turnCount: 1,
+    });
+  });
+
+  test("shows step detail from completed thread", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+    const detailSchemas = await registerDetailSchemas(uwf.store);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf-step-completed",
+      description: "desc",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "p",
+    });
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const turnHash = await uwf.store.put(detailSchemas.turn, {
+      index: 0,
+      role: "assistant",
+      content: "Completed thread response",
+      toolCalls: null,
+      reasoning: null,
+    });
+    const detailHash = await uwf.store.put(detailSchemas.detail, {
+      sessionId: "sess-completed",
+      model: "model-y",
+      duration: 5678,
+      turnCount: 1,
+      turns: [turnHash],
+    });
+
+    const stepHash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "reviewer",
+      output: outputHash,
+      detail: detailHash,
+      agent: "uwf-hermes",
+    });
+
+    const threadId = "01JTEST0000000000000000B2" as ThreadId;
+    // Thread is NOT in threads.yaml
+    await saveThreadsIndex(tmpDir, {});
+    // But it IS in history.jsonl
+    await appendThreadHistory(tmpDir, {
+      thread: threadId,
+      workflow: workflowHash,
+      head: stepHash,
+      completedAt: Date.now(),
+    });
+
+    const result = await cmdStepShow(tmpDir, stepHash);
+
+    expect(result).toMatchObject({
+      sessionId: "sess-completed",
+      model: "model-y",
+      duration: 5678,
+      turnCount: 1,
+    });
+  });
+});
+
+describe("cmdThreadRead with completed threads", () => {
+  test("reads completed thread context", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf-read-completed",
+      description: "desc",
+      roles: {
+        writer: {
+          description: "Write",
+          goal: "You are a writer.",
+          capabilities: [],
+          procedure: "Write content.",
+          output: "Summary.",
+          meta: "placeholder00" as CasRef,
+        },
+      },
+      conditions: {},
+      graph: {},
+    });
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Write something",
+    });
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const stepHash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "writer",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-hermes",
+    });
+
+    const threadId = "01JTEST0000000000000000C1" as ThreadId;
+    // Thread is NOT in threads.yaml
+    await saveThreadsIndex(tmpDir, {});
+    // But it IS in history.jsonl
+    await appendThreadHistory(tmpDir, {
+      thread: threadId,
+      workflow: workflowHash,
+      head: stepHash,
+      completedAt: Date.now(),
+    });
+
+    const markdown = await cmdThreadRead(tmpDir, threadId, THREAD_READ_DEFAULT_QUOTA, null, false);
+
+    expect(markdown).toContain("writer");
+    expect(markdown).toContain("Write something");
+  });
+
+  test("reads completed thread with before filter", async () => {
+    const uwf = await makeUwfStore(tmpDir);
+
+    const workflowHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "test-wf-read-before",
+      description: "desc",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+    const startHash = await uwf.store.put(uwf.schemas.startNode, {
+      workflow: workflowHash,
+      prompt: "Do task",
+    });
+    const outputHash = await uwf.store.put(uwf.schemas.workflow, {
+      name: "out",
+      description: "",
+      roles: {},
+      conditions: {},
+      graph: {},
+    });
+
+    const step1Hash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: null,
+      role: "roleX",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+    const step2Hash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: step1Hash,
+      role: "roleY",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+    const step3Hash = await uwf.store.put(uwf.schemas.stepNode, {
+      start: startHash,
+      prev: step2Hash,
+      role: "roleZ",
+      output: outputHash,
+      detail: null,
+      agent: "uwf-test",
+    });
+
+    const threadId = "01JTEST0000000000000000C2" as ThreadId;
+    await saveThreadsIndex(tmpDir, {});
+    await appendThreadHistory(tmpDir, {
+      thread: threadId,
+      workflow: workflowHash,
+      head: step3Hash,
+      completedAt: Date.now(),
+    });
+
+    const markdown = await cmdThreadRead(
+      tmpDir,
+      threadId,
+      THREAD_READ_DEFAULT_QUOTA,
+      step2Hash,
+      false,
+    );
+
+    // Should contain step1 (roleX) but not step2 (roleY) or step3 (roleZ)
+    expect(markdown).toContain("roleX");
+    expect(markdown).not.toContain("roleY");
+    expect(markdown).not.toContain("roleZ");
  });
 });
@@ -0,0 +1,366 @@
+import type { WorkflowPayload } from "@uncaged/workflow-protocol";
+import { describe, expect, test } from "vitest";
+import { validateWorkflow } from "../validate-semantic.js";
+
+/** Build a valid two-role workflow that passes all checks. */
+function makeWorkflow(overrides?: Partial<WorkflowPayload>): WorkflowPayload {
+  const base: WorkflowPayload = {
+    name: "test-workflow",
+    description: "A test workflow",
+    roles: {
+      writer: {
+        description: "Writes content",
+        goal: "Write content",
+        capabilities: ["writing"],
+        procedure: "Write it",
+        output: "The content",
+        frontmatter: {
+          type: "object",
+          properties: {
+            $status: { enum: ["_"] },
+            plan: { type: "string" },
+          },
+          required: ["$status", "plan"],
+        } as unknown as string,
+      },
+      reviewer: {
+        description: "Reviews content",
+        goal: "Review content",
+        capabilities: ["reviewing"],
+        procedure: "Review it",
+        output: "The review",
+        frontmatter: {
+          type: "object",
+          oneOf: [
+            {
+              properties: {
+                $status: { const: "approved" },
+                summary: { type: "string" },
+              },
+              required: ["$status", "summary"],
+            },
+            {
+              properties: {
+                $status: { const: "rejected" },
+                reason: { type: "string" },
+              },
+              required: ["$status", "reason"],
+            },
+          ],
+        } as unknown as string,
+      },
+    },
+    graph: {
+      $START: { _: { role: "writer", prompt: "Begin writing" } },
+      writer: { _: { role: "reviewer", prompt: "Review this: {{{plan}}}" } },
+      reviewer: {
+        approved: { role: "$END", prompt: "Done: {{{summary}}}" },
+        rejected: { role: "writer", prompt: "Fix: {{{reason}}}" },
+      },
+    },
+  };
+
+  if (!overrides) return base;
+  return { ...base, ...overrides };
+}
+
+describe("Suite 1: Role Reference Integrity", () => {
+  test("1.1 graph references unknown role", () => {
+    const wf = makeWorkflow();
+    wf.graph.nonexistent = { _: { role: "$END", prompt: "done" } };
+    const errors = validateWorkflow(wf);
+    expect(errors.some((e) => e.includes('unknown role "nonexistent"'))).toBe(true);
+  });
+
+  test("1.2 orphan role not in graph", () => {
+    const wf = makeWorkflow();
+    wf.roles.orphan = {
+      description: "Orphan",
+      goal: "Nothing",
+      capabilities: [],
+      procedure: "None",
+      output: "None",
+      frontmatter: {
+        type: "object",
+        properties: { $status: { enum: ["_"] } },
+        required: ["$status"],
+      } as unknown as string,
+    };
+    const errors = validateWorkflow(wf);
+    expect(
+      errors.some((e) => e.includes('role "orphan" is defined but not referenced in graph')),
+    ).toBe(true);
+  });
+
+  test("1.3 $START in roles", () => {
+    const wf = makeWorkflow();
+    (wf.roles as Record<string, unknown>).$START = {
+      description: "Bad",
+      goal: "Bad",
+      capabilities: [],
+      procedure: "Bad",
+      output: "Bad",
+      frontmatter: { type: "object", properties: {}, required: [] },
+    };
+    const errors = validateWorkflow(wf);
+    expect(errors.some((e) => e.includes('reserved name "$START"'))).toBe(true);
+  });
+
+  test("1.4 $END in roles", () => {
+    const wf = makeWorkflow();
+    (wf.roles as Record<string, unknown>).$END = {
+      description: "Bad",
+      goal: "Bad",
+      capabilities: [],
+      procedure: "Bad",
+      output: "Bad",
+      frontmatter: { type: "object", properties: {}, required: [] },
+    };
+    const errors = validateWorkflow(wf);
+    expect(errors.some((e) => e.includes('reserved name "$END"'))).toBe(true);
+  });
+
+  test("1.5 valid workflow returns no errors", () => {
+    const wf = makeWorkflow();
+    const errors = validateWorkflow(wf);
+    expect(errors).toEqual([]);
+  });
+});
+
+describe("Suite 2: Graph Structure", () => {
+  test("2.1 $START missing from graph", () => {
+    const wf = makeWorkflow();
+    delete wf.graph.$START;
+    const errors = validateWorkflow(wf);
+    expect(errors.some((e) => e.includes("$START must be defined in graph"))).toBe(true);
+  });
+
+  test("2.2 $START has multiple status keys", () => {
+    const wf = makeWorkflow();
+    wf.graph.$START = {
+      _: { role: "writer", prompt: "Begin" },
+      other: { role: "reviewer", prompt: "Also" },
+    };
+    const errors = validateWorkflow(wf);
+    expect(
+      errors.some((e) => e.includes('$START must have exactly one edge with status "_"')),
+    ).toBe(true);
+  });
+
+  test("2.3 $START edge uses non-_ status", () => {
+    const wf = makeWorkflow();
+    wf.graph.$START = { ready: { role: "writer", prompt: "Begin" } };
+    const errors = validateWorkflow(wf);
+    expect(
+      errors.some((e) => e.includes('$START must have exactly one edge with status "_"')),
+    ).toBe(true);
+  });
+
+  test("2.4 $END has outgoing edges", () => {
+    const wf = makeWorkflow();
+    wf.graph.$END = { _: { role: "writer", prompt: "Loop" } };
+    const errors = validateWorkflow(wf);
+    expect(errors.some((e) => e.includes("$END must not have outgoing edges"))).toBe(true);
+  });
+
+  test("2.5 unreachable role", () => {
+    const wf = makeWorkflow();
+    wf.roles.isolated = {
+      description: "Isolated",
+      goal: "Isolated",
+      capabilities: [],
+      procedure: "Isolated",
+      output: "Isolated",
+      frontmatter: {
+        type: "object",
+        properties: { $status: { enum: ["_"] } },
+        required: ["$status"],
+      } as unknown as string,
+    };
+    wf.graph.isolated = { _: { role: "$END", prompt: "done" } };
+    const errors = validateWorkflow(wf);
+    expect(errors.some((e) => e.includes('role "isolated" is not reachable from $START'))).toBe(
+      true,
+    );
+  });
+
+  test("2.6 edge target references invalid role", () => {
+    const wf = makeWorkflow();
+    wf.graph.writer = { _: { role: "ghost", prompt: "Go to ghost" } };
+    const errors = validateWorkflow(wf);
+    expect(errors.some((e) => e.includes('unknown target role "ghost"'))).toBe(true);
+  });
+});
+
+describe("Suite 3: Status-Edge Consistency", () => {
+  test("3.1 single-exit role with multiple graph keys", () => {
+    const wf = makeWorkflow();
+    wf.graph.writer = {
+      _: { role: "reviewer", prompt: "Review" },
+      extra: { role: "$END", prompt: "Done" },
+    };
+    const errors = validateWorkflow(wf);
+    expect(
+      errors.some((e) =>
+        e.includes('role "writer" is single-exit but has status keys other than "_"'),
+      ),
+    ).toBe(true);
+  });
+
+  test("3.2 single-exit role missing _ key", () => {
+    const wf = makeWorkflow();
+    wf.graph.writer = { done: { role: "reviewer", prompt: "Review" } };
+    const errors = validateWorkflow(wf);
+    expect(
+      errors.some((e) => e.includes('role "writer" is single-exit but graph has no "_" key')),
+    ).toBe(true);
+  });
+
+  test("3.3 multi-exit role with extra statuses", () => {
+    const wf = makeWorkflow();
+    wf.graph.reviewer = {
+      approved: { role: "$END", prompt: "Done" },
+      rejected: { role: "writer", prompt: "Fix" },
+      timeout: { role: "$END", prompt: "Timed out" },
+    };
+    const errors = validateWorkflow(wf);
+    expect(
+      errors.some((e) => e.includes('role "reviewer" graph has extra status keys: timeout')),
+    ).toBe(true);
+  });
+
+  test("3.4 multi-exit role missing a status", () => {
+    const wf = makeWorkflow();
+    wf.graph.reviewer = {
+      approved: { role: "$END", prompt: "Done" },
+    };
+    const errors = validateWorkflow(wf);
+    expect(
+      errors.some((e) => e.includes('role "reviewer" graph is missing status keys: rejected')),
+    ).toBe(true);
+  });
+
+  test("3.5 multi-exit role with _ key", () => {
+    const wf = makeWorkflow();
+    wf.graph.reviewer = { _: { role: "$END", prompt: "Done" } };
+    const errors = validateWorkflow(wf);
+    expect(errors.some((e) => e.includes('role "reviewer" is multi-exit but graph uses "_"'))).toBe(
+      true,
+    );
+  });
+});
+
+describe("Suite 4: Mustache Template Variable Existence", () => {
+  test("4.1 prompt references nonexistent variable (single-exit)", () => {
+    const wf = makeWorkflow();
+    wf.graph.writer = { _: { role: "reviewer", prompt: "Review: {{{branch}}}" } };
+    const errors = validateWorkflow(wf);
+    expect(
+      errors.some((e) =>
+        e.includes('prompt variable "branch" not found in role "writer" frontmatter'),
+      ),
+    ).toBe(true);
+  });
+
+  test("4.2 prompt references nonexistent variable (multi-exit)", () => {
+    const wf = makeWorkflow();
+    wf.graph.reviewer = {
+      approved: { role: "$END", prompt: "Done: {{{branch}}}" },
+      rejected: { role: "writer", prompt: "Fix: {{{reason}}}" },
+    };
+    const errors = validateWorkflow(wf);
+    expect(
+      errors.some((e) =>
+        e.includes('prompt variable "branch" not found in role "reviewer" variant "approved"'),
+      ),
+    ).toBe(true);
+  });
+
+  test("4.3 valid mustache variables pass", () => {
+    const wf = makeWorkflow();
+    const errors = validateWorkflow(wf);
+    expect(errors).toEqual([]);
+  });
+
+  test("4.4 $status variable is always valid", () => {
+    const wf = makeWorkflow();
+    wf.graph.writer = { _: { role: "reviewer", prompt: "Status: {{$status}}" } };
+    const errors = validateWorkflow(wf);
+    expect(errors).toEqual([]);
+  });
+});
+
+describe("Suite 5: oneOf Discriminant Validity", () => {
+  test("5.1 oneOf without $status const", () => {
+    const wf = makeWorkflow();
+    wf.roles.reviewer = {
+      ...wf.roles.reviewer,
+      frontmatter: {
+        type: "object",
+        oneOf: [
+          { properties: { summary: { type: "string" } }, required: ["summary"] },
+          { properties: { reason: { type: "string" } }, required: ["reason"] },
+        ],
+      } as unknown as string,
+    };
+    const errors = validateWorkflow(wf);
+    expect(
+      errors.some((e) => e.includes('oneOf variants must have "$status" as const discriminant')),
+    ).toBe(true);
+  });
+
+  test("5.2 oneOf with non-const $status", () => {
+    const wf = makeWorkflow();
+    wf.roles.reviewer = {
+      ...wf.roles.reviewer,
+      frontmatter: {
+        type: "object",
+        oneOf: [
+          {
+            properties: { $status: { type: "string" }, summary: { type: "string" } },
+            required: ["$status", "summary"],
+          },
+          {
+            properties: { $status: { type: "string" }, reason: { type: "string" } },
+            required: ["$status", "reason"],
+          },
+        ],
+      } as unknown as string,
+    };
+    const errors = validateWorkflow(wf);
+    expect(errors.some((e) => e.includes("oneOf variant $status must be a const value"))).toBe(
+      true,
+    );
+  });
+
+  test("5.3 valid oneOf passes", () => {
+    const wf = makeWorkflow();
+    const errors = validateWorkflow(wf);
+    expect(errors).toEqual([]);
+  });
+});
+
+describe("Suite 6: Multiple Errors Collection", () => {
+  test("6.1 multiple errors collected", () => {
+    const wf = makeWorkflow();
+    // orphan role
+    wf.roles.orphan = {
+      description: "Orphan",
+      goal: "Nothing",
+      capabilities: [],
+      procedure: "None",
+      output: "None",
+      frontmatter: {
+        type: "object",
+        properties: { $status: { enum: ["_"] } },
+        required: ["$status"],
+      } as unknown as string,
+    };
+    // unknown graph reference
+    wf.graph.nonexistent = { _: { role: "$END", prompt: "done" } };
+    // bad mustache var
+    wf.graph.writer = { _: { role: "reviewer", prompt: "{{{badvar}}}" } };
+    const errors = validateWorkflow(wf);
+    expect(errors.length).toBeGreaterThanOrEqual(3);
+  });
+});
@@ -0,0 +1,379 @@
+import { mkdir, mkdtemp, rm, writeFile } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import { join } from "node:path";
+import { createFsStore } from "@uncaged/json-cas-fs";
+import type { CasRef, WorkflowPayload } from "@uncaged/workflow-protocol";
+import { afterEach, beforeEach, describe, expect, test } from "vitest";
+import { stringify } from "yaml";
+import { cmdThreadStart } from "../commands/thread.js";
+import { registerUwfSchemas } from "../schemas.js";
+import type { UwfStore } from "../store.js";
+import { loadWorkflowRegistry, saveWorkflowRegistry } from "../store.js";
+
+// ── helpers ───────────────────────────────────────────────────────────────────
+
+async function makeUwfStore(storageRoot: string): Promise<UwfStore> {
+  const casDir = join(storageRoot, "cas");
+  await mkdir(casDir, { recursive: true });
+  const store = createFsStore(casDir);
+  const schemas = await registerUwfSchemas(store);
+  return { storageRoot, store, schemas };
+}
+
+function makeMinimalPayload(name: string, description: string): WorkflowPayload {
+  return {
+    name,
+    description,
+    roles: {
+      worker: {
+        description: "worker role",
+        goal: "do work",
+        capabilities: [],
+        procedure: "",
+        output: "",
+        frontmatter: { type: "0000000000000" } as unknown as CasRef,
+      },
+    },
+    graph: {
+      $START: { _: { role: "worker", prompt: "start working" } },
+      worker: { _: { role: "$END", prompt: "done" } },
+    },
+  };
+}
+
+async function storeWorkflow(uwf: UwfStore, name: string): Promise<CasRef> {
+  const payload = makeMinimalPayload(name, "Test workflow");
+  return await uwf.store.put(uwf.schemas.workflow, payload);
+}
+
+async function createWorkflowYaml(name: string, version: string | null = null): Promise<string> {
+  const payload = makeMinimalPayload(
+    name,
+    version !== null ? `Test workflow (${version})` : "Test workflow",
+  );
+  const yaml = stringify(payload);
+  return yaml;
+}
+
+// ── fixture ───────────────────────────────────────────────────────────────────
+
+let tmpDir: string;
+let storageRoot: string;
+let projectRoot: string;
+
+beforeEach(async () => {
+  tmpDir = await mkdtemp(join(tmpdir(), "cli-uwf-wf-resolve-test-"));
+  storageRoot = join(tmpDir, "storage");
+  projectRoot = join(tmpDir, "project");
+  await mkdir(storageRoot, { recursive: true });
+  await mkdir(projectRoot, { recursive: true });
+});
+
+afterEach(async () => {
+  await rm(tmpDir, { recursive: true, force: true });
+});
+
+// ── Strategy 1: CAS Hash Resolution ───────────────────────────────────────────
+
+describe("Strategy 1: CAS Hash Resolution", () => {
+  test("should resolve valid 13-char Crockford Base32 hash", async () => {
+    const uwf = await makeUwfStore(storageRoot);
+    const hash = await storeWorkflow(uwf, "test-workflow");
+
+    const result = await cmdThreadStart(storageRoot, hash, "test prompt", projectRoot);
+
+    expect(result.workflow).toBe(hash);
+    expect(result.thread).toMatch(/^[0-9A-HJKMNP-TV-Z]{26}$/);
+  });
+
+  test("should fail on invalid hash format (non-Crockford characters)", async () => {
+    await makeUwfStore(storageRoot);
+
+    await expect(
+      cmdThreadStart(storageRoot, "123456789ABCD", "prompt", projectRoot),
+    ).rejects.toThrow();
+  });
+
+  test("should fail on valid-format hash not present in CAS", async () => {
+    await makeUwfStore(storageRoot);
+    const fakeHash = "0000000000000"; // valid format, doesn't exist
+
+    await expect(cmdThreadStart(storageRoot, fakeHash, "prompt", projectRoot)).rejects.toThrow();
+  });
+
+  test("should reject 40-char hex hash (legacy format not supported)", async () => {
+    await makeUwfStore(storageRoot);
+    const hexHash = "a".repeat(40);
+
+    await expect(cmdThreadStart(storageRoot, hexHash, "prompt", projectRoot)).rejects.toThrow();
+  });
+});
+
+// ── Strategy 2: File Path Resolution ──────────────────────────────────────────
+
+describe("Strategy 2: File Path Resolution", () => {
+  test("should load workflow from absolute file path", async () => {
+    await makeUwfStore(storageRoot);
+    const yamlPath = join(tmpDir, "test-workflow.yaml");
+    await writeFile(yamlPath, await createWorkflowYaml("test-workflow"));
+
+    const result = await cmdThreadStart(storageRoot, yamlPath, "prompt", projectRoot);
+
+    expect(result.workflow).toMatch(/^[0-9A-HJKMNP-TV-Z]{13}$/);
+    const uwf = await makeUwfStore(storageRoot);
+    const node = uwf.store.get(result.workflow);
+    expect(node).not.toBeNull();
+    if (node !== null) {
+      expect((node.payload as WorkflowPayload).name).toBe("test-workflow");
+    }
+  });
+
+  test("should load workflow from relative file path", async () => {
+    await makeUwfStore(storageRoot);
+    const yamlPath = "test-workflow.yaml";
+    await writeFile(join(projectRoot, yamlPath), await createWorkflowYaml("test-workflow"));
+
+    const result = await cmdThreadStart(storageRoot, yamlPath, "prompt", projectRoot);
+
+    expect(result.workflow).toMatch(/^[0-9A-HJKMNP-TV-Z]{13}$/);
+  });
+
+  test("should fail when file path does not exist", async () => {
+    await makeUwfStore(storageRoot);
+
+    await expect(
+      cmdThreadStart(storageRoot, "./nonexistent.yaml", "prompt", projectRoot),
+    ).rejects.toThrow();
+  });
+
+  test("should fail on invalid YAML syntax in file", async () => {
+    await makeUwfStore(storageRoot);
+    const yamlPath = join(tmpDir, "bad-syntax.yaml");
+    await writeFile(yamlPath, "invalid: yaml: : :");
+
+    await expect(cmdThreadStart(storageRoot, yamlPath, "prompt", projectRoot)).rejects.toThrow();
+  });
+
+  test("should fail on valid YAML with invalid WorkflowPayload shape", async () => {
+    await makeUwfStore(storageRoot);
+    const yamlPath = join(tmpDir, "invalid-workflow.yaml");
+    await writeFile(yamlPath, "name: test\n# missing roles and graph");
+
+    await expect(cmdThreadStart(storageRoot, yamlPath, "prompt", projectRoot)).rejects.toThrow();
+  });
+
+  test("should enforce filename matches workflow name", async () => {
+    await makeUwfStore(storageRoot);
+    const yamlPath = join(tmpDir, "solve-issue.yaml");
+    await writeFile(yamlPath, await createWorkflowYaml("wrong-name"));
+
+    await expect(cmdThreadStart(storageRoot, yamlPath, "prompt", projectRoot)).rejects.toThrow();
+  });
+});
+
+// ── Strategy 3: Local Discovery (Parent Traversal) ────────────────────────────
+
+describe("Strategy 3: Local Discovery", () => {
+  test("should find workflow in current directory .workflow/", async () => {
+    await makeUwfStore(storageRoot);
+    const workflowDir = join(projectRoot, ".workflow");
+    await mkdir(workflowDir, { recursive: true });
+    await writeFile(join(workflowDir, "solve-issue.yaml"), await createWorkflowYaml("solve-issue"));
+
+    const result = await cmdThreadStart(storageRoot, "solve-issue", "prompt", projectRoot);
+
+    expect(result.workflow).toMatch(/^[0-9A-HJKMNP-TV-Z]{13}$/);
+    const uwf = await makeUwfStore(storageRoot);
+    const node = uwf.store.get(result.workflow);
+    expect(node).not.toBeNull();
+    if (node !== null) {
+      expect((node.payload as WorkflowPayload).name).toBe("solve-issue");
+    }
+  });
+
+  test("should find workflow in parent directory .workflow/", async () => {
+    await makeUwfStore(storageRoot);
+    const workflowDir = join(projectRoot, ".workflow");
+    await mkdir(workflowDir, { recursive: true });
+    await writeFile(join(workflowDir, "solve-issue.yaml"), await createWorkflowYaml("solve-issue"));
+
+    const subdir = join(projectRoot, "packages", "cli-workflow", "src");
+    await mkdir(subdir, { recursive: true });
+
+    const result = await cmdThreadStart(storageRoot, "solve-issue", "prompt", subdir);
+
+    expect(result.workflow).toMatch(/^[0-9A-HJKMNP-TV-Z]{13}$/);
+  });
+
+  test("should stop at filesystem root when traversing", async () => {
+    await makeUwfStore(storageRoot);
+    const deepPath = join(tmpDir, "deep", "path", "that", "does", "not", "have", "workflow");
+    await mkdir(deepPath, { recursive: true });
+
+    await expect(cmdThreadStart(storageRoot, "nonexistent", "prompt", deepPath)).rejects.toThrow();
+  });
+
+  test("should prefer .workflow/ over .workflows/ directory", async () => {
+    await makeUwfStore(storageRoot);
+    const workflowDir = join(projectRoot, ".workflow");
+    const workflowsDir = join(projectRoot, ".workflows");
+    await mkdir(workflowDir, { recursive: true });
+    await mkdir(workflowsDir, { recursive: true });
+
+    await writeFile(
+      join(workflowDir, "solve-issue.yaml"),
+      await createWorkflowYaml("solve-issue", "1"),
+    );
+    await writeFile(
+      join(workflowsDir, "solve-issue.yaml"),
+      await createWorkflowYaml("solve-issue", "2"),
+    );
+
+    const result = await cmdThreadStart(storageRoot, "solve-issue", "prompt", projectRoot);
+
+    const uwf = await makeUwfStore(storageRoot);
+    const node = uwf.store.get(result.workflow);
+    expect(node).not.toBeNull();
+    if (node !== null) {
+      expect((node.payload as WorkflowPayload).description).toBe("Test workflow (1)");
+    }
+  });
+
+  test("should support .yml extension in local discovery", async () => {
+    await makeUwfStore(storageRoot);
+    const workflowDir = join(projectRoot, ".workflow");
+    await mkdir(workflowDir, { recursive: true });
+    await writeFile(join(workflowDir, "solve-issue.yml"), await createWorkflowYaml("solve-issue"));
+
+    const result = await cmdThreadStart(storageRoot, "solve-issue", "prompt", projectRoot);
+
+    expect(result.workflow).toMatch(/^[0-9A-HJKMNP-TV-Z]{13}$/);
+  });
+});
+
+// ── Strategy 4: Global Registry Fallback ──────────────────────────────────────
+
+describe("Strategy 4: Global Registry Resolution", () => {
+  test("should resolve workflow from global registry when not found locally", async () => {
+    const uwf = await makeUwfStore(storageRoot);
+    const hash = await storeWorkflow(uwf, "deploy-pipeline");
+    const registry = await loadWorkflowRegistry(storageRoot);
+    registry["deploy-pipeline"] = hash;
+    await saveWorkflowRegistry(storageRoot, registry);
+
+    const isolatedRoot = join(tmpDir, "isolated");
+    await mkdir(isolatedRoot, { recursive: true });
+
+    const result = await cmdThreadStart(storageRoot, "deploy-pipeline", "prompt", isolatedRoot);
+
+    expect(result.workflow).toBe(hash);
+  });
+
+  test("should fail when workflow not found in any strategy", async () => {
+    await makeUwfStore(storageRoot);
+
+    await expect(cmdThreadStart(storageRoot, "nonexistent", "prompt", tmpDir)).rejects.toThrow();
+  });
+});
+
+// ── Strategy Priority Order ───────────────────────────────────────────────────
+
+describe("Resolution Priority", () => {
+  test("should use explicit file path over local discovery", async () => {
+    await makeUwfStore(storageRoot);
+
+    // Setup: Create workflow in .workflow/ AND as explicit file
+    const workflowDir = join(projectRoot, ".workflow");
+    await mkdir(workflowDir, { recursive: true });
+    await writeFile(
+      join(workflowDir, "solve-issue.yaml"),
+      await createWorkflowYaml("solve-issue", "discovery"),
+    );
+
+    const explicitPath = join(projectRoot, "custom-solve-issue.yaml");
+    await writeFile(explicitPath, await createWorkflowYaml("custom-solve-issue", "explicit"));
+
+    // Execute with explicit path
+    const result = await cmdThreadStart(storageRoot, explicitPath, "prompt", projectRoot);
+
+    const uwf = await makeUwfStore(storageRoot);
+    const node = uwf.store.get(result.workflow);
+    expect(node).not.toBeNull();
+    if (node !== null) {
+      expect((node.payload as WorkflowPayload).description).toBe("Test workflow (explicit)");
+    }
+  });
+
+  test("should use local discovery over global registry", async () => {
+    const uwf = await makeUwfStore(storageRoot);
+
+    // Setup: Register globally
+    const globalHash = await storeWorkflow(uwf, "solve-issue");
+    const registry = await loadWorkflowRegistry(storageRoot);
+    registry["solve-issue"] = globalHash;
+    await saveWorkflowRegistry(storageRoot, registry);
+
+    // Setup: Create local .workflow/
+    const workflowDir = join(projectRoot, ".workflow");
+    await mkdir(workflowDir, { recursive: true });
+    const localYaml = await createWorkflowYaml("solve-issue", "local");
+    await writeFile(join(workflowDir, "solve-issue.yaml"), localYaml);
+
+    const result = await cmdThreadStart(storageRoot, "solve-issue", "prompt", projectRoot);
+
+    const uwf2 = await makeUwfStore(storageRoot);
+    const node = uwf2.store.get(result.workflow);
+    expect(node).not.toBeNull();
+    if (node !== null) {
+      expect((node.payload as WorkflowPayload).description).toBe("Test workflow (local)");
+    }
+  });
+});
+
+// ── Edge Cases ────────────────────────────────────────────────────────────────
+
+describe("Edge Cases", () => {
+  test("should treat '13-char-string.yaml' as file path, not CAS hash", async () => {
+    await makeUwfStore(storageRoot);
+    const fileName = "0123456789ABC.yaml"; // 13 chars + .yaml
+    await writeFile(join(projectRoot, fileName), await createWorkflowYaml("0123456789ABC"));
+
+    const result = await cmdThreadStart(storageRoot, fileName, "prompt", projectRoot);
+
+    expect(result.workflow).toMatch(/^[0-9A-HJKMNP-TV-Z]{13}$/);
+  });
+
+  test("should handle workflow names containing slashes as file paths", async () => {
+    await makeUwfStore(storageRoot);
+    const filePath = "subdir/solve-issue.yaml";
+    const fullPath = join(projectRoot, filePath);
+    await mkdir(join(projectRoot, "subdir"), { recursive: true });
+    await writeFile(fullPath, await createWorkflowYaml("solve-issue"));
+
+    const result = await cmdThreadStart(storageRoot, filePath, "prompt", projectRoot);
+
+    expect(result.workflow).toMatch(/^[0-9A-HJKMNP-TV-Z]{13}$/);
+  });
+
+  test("should handle absolute paths correctly", async () => {
+    await makeUwfStore(storageRoot);
+    const absPath = join(tmpDir, "abs-workflow.yaml");
+    await writeFile(absPath, await createWorkflowYaml("abs-workflow"));
+
+    const result = await cmdThreadStart(storageRoot, absPath, "prompt", projectRoot);
+
+    expect(result.workflow).toMatch(/^[0-9A-HJKMNP-TV-Z]{13}$/);
+  });
+
+  test("should fail on empty workflow ID", async () => {
+    await makeUwfStore(storageRoot);
+
+    await expect(cmdThreadStart(storageRoot, "", "prompt", projectRoot)).rejects.toThrow();
+  });
+
+  test("should fail on whitespace-only workflow ID", async () => {
+    await makeUwfStore(storageRoot);
+
+    await expect(cmdThreadStart(storageRoot, "   ", "prompt", projectRoot)).rejects.toThrow();
+  });
+});
@@ -0,0 +1,147 @@
+import { mkdir, readdir, readFile, rename, rm, writeFile } from "node:fs/promises";
+import { join } from "node:path";
+import type { RunningThreadItem, ThreadId } from "@uncaged/workflow-protocol";
+
+import type { RunningMarker } from "./types.js";
+
+/**
+ * Get the path to the running markers directory.
+ */
+export function getRunningDir(storageRoot: string): string {
+  return join(storageRoot, "running");
+}
+
+/**
+ * Get the path to a specific thread's marker file.
+ */
+export function getMarkerPath(storageRoot: string, threadId: ThreadId): string {
+  return join(getRunningDir(storageRoot), `${threadId}.json`);
+}
+
+/**
+ * Check if a PID is still running.
+ * Returns true if the process exists, false otherwise.
+ */
+export function isPidAlive(pid: number): boolean {
+  try {
+    // process.kill with signal 0 checks existence without killing
+    process.kill(pid, 0);
+    return true;
+  } catch {
+    // ESRCH means process doesn't exist
+    return false;
+  }
+}
+
+/**
+ * Create a marker file for a running thread.
+ * Writes to a temp file in the same directory, then atomically renames.
+ */
+export async function createMarker(storageRoot: string, marker: RunningMarker): Promise<void> {
+  const runningDir = getRunningDir(storageRoot);
+  await mkdir(runningDir, { recursive: true });
+
+  const markerPath = getMarkerPath(storageRoot, marker.thread);
+  const tempPath = join(runningDir, `.${marker.thread}-${process.pid}.tmp`);
+
+  const content = JSON.stringify(marker, null, 2);
+  await writeFile(tempPath, content, "utf8");
+  await rename(tempPath, markerPath);
+}
+
+/**
+ * Delete a marker file for a thread.
+ */
+export async function deleteMarker(storageRoot: string, threadId: ThreadId): Promise<void> {
+  const markerPath = getMarkerPath(storageRoot, threadId);
+  try {
+    await rm(markerPath);
+  } catch {
+    // Ignore errors if file doesn't exist
+  }
+}
+
+/**
+ * Read a marker file. Returns null if file doesn't exist or is invalid.
+ */
+export async function readMarker(
+  storageRoot: string,
+  threadId: ThreadId,
+): Promise<RunningMarker | null> {
+  const markerPath = getMarkerPath(storageRoot, threadId);
+  try {
+    const content = await readFile(markerPath, "utf8");
+    const marker = JSON.parse(content) as RunningMarker;
+    return marker;
+  } catch {
+    return null;
+  }
+}
+
+/**
+ * List all running threads, filtering out stale markers.
+ */
+export async function listRunningThreads(storageRoot: string): Promise<RunningThreadItem[]> {
+  const runningDir = getRunningDir(storageRoot);
+
+  let files: string[];
+  try {
+    files = await readdir(runningDir);
+  } catch {
+    // Directory doesn't exist or can't be read
+    return [];
+  }
+
+  const results: RunningThreadItem[] = [];
+
+  for (const filename of files) {
+    if (!filename.endsWith(".json")) {
+      continue;
+    }
+
+    const threadId = filename.slice(0, -5) as ThreadId;
+    const marker = await readMarker(storageRoot, threadId);
+
+    if (marker === null) {
+      // Invalid marker file
+      continue;
+    }
+
+    if (!isPidAlive(marker.pid)) {
+      // Stale marker - process no longer exists
+      await deleteMarker(storageRoot, threadId);
+      continue;
+    }
+
+    results.push({
+      thread: marker.thread,
+      workflow: marker.workflow,
+      pid: marker.pid,
+      startedAt: marker.startedAt,
+    });
+  }
+
+  return results;
+}
+
+/**
+ * Check if a thread is currently executing in the background.
+ * Returns the marker if running, null otherwise.
+ */
+export async function isThreadRunning(
+  storageRoot: string,
+  threadId: ThreadId,
+): Promise<RunningMarker | null> {
+  const marker = await readMarker(storageRoot, threadId);
+  if (marker === null) {
+    return null;
+  }
+
+  if (!isPidAlive(marker.pid)) {
+    // Stale marker
+    await deleteMarker(storageRoot, threadId);
+    return null;
+  }
+
+  return marker;
+}
@@ -0,0 +1,11 @@
+export {
+  createMarker,
+  deleteMarker,
+  getMarkerPath,
+  getRunningDir,
+  isPidAlive,
+  isThreadRunning,
+  listRunningThreads,
+  readMarker,
+} from "./background.js";
+export type { RunningMarker } from "./types.js";
@@ -0,0 +1,9 @@
+import type { CasRef, ThreadId } from "@uncaged/workflow-protocol";
+
+/** Marker file stored at ~/.uncaged/workflow/running/<thread-id>.json */
+export type RunningMarker = {
+  thread: ThreadId;
+  workflow: CasRef;
+  pid: number;
+  startedAt: number;
+};
@@ -1,8 +1,7 @@
 #!/usr/bin/env bun

-import type { ThreadId } from "@uncaged/workflow-protocol";
+import type { CasRef, ThreadId } from "@uncaged/workflow-protocol";
 import { Command } from "commander";
-import { stringify as yamlStringify } from "yaml";
 import {
  cmdCasGet,
  cmdCasHas,
@@ -14,21 +13,23 @@ import {
  cmdCasSchemaList,
  cmdCasWalk,
 } from "./commands/cas.js";
+import { cmdLogClean, cmdLogList, cmdLogShow } from "./commands/log.js";
 import { cmdSetup, cmdSetupInteractive } from "./commands/setup.js";
 import { cmdSkillCli } from "./commands/skill.js";
+import { cmdStepFork, cmdStepList, cmdStepRead, cmdStepShow } from "./commands/step.js";
 import {
-  cmdThreadFork,
-  cmdThreadKill,
+  cmdThreadCancel,
+  cmdThreadExec,
  cmdThreadList,
  cmdThreadRead,
  cmdThreadShow,
  cmdThreadStart,
-  cmdThreadStep,
-  cmdThreadStepDetails,
-  cmdThreadSteps,
+  cmdThreadStop,
  THREAD_READ_DEFAULT_QUOTA,
+  type ThreadStatus,
 } from "./commands/thread.js";
-import { cmdWorkflowList, cmdWorkflowPut, cmdWorkflowShow } from "./commands/workflow.js";
+import { parseTimeInput } from "./commands/thread-time-parser.js";
+import { cmdWorkflowAdd, cmdWorkflowList, cmdWorkflowShow } from "./commands/workflow.js";
 import { formatOutput, type OutputFormat } from "./format.js";
 import { resolveStorageRoot } from "./store.js";

@@ -51,20 +52,27 @@ const program = new Command();
 const pkg = await import("../package.json", { with: { type: "json" } });
 program
  .name("uwf")
-  .description("Stateless workflow CLI")
+  .description(
+    "Stateless workflow CLI\n\n" +
+      "Four-layer architecture:\n" +
+      "  workflow → thread → step → turn\n" +
+      "  模板定义   执行实例   单步结果   agent内部交互",
+  )
  .version(pkg.default.version, "-V, --version");
 program.option("--format <fmt>", "Output format: json or yaml", "json");

-const workflow = program.command("workflow").description("Workflow registry and CAS");
+const workflow = program
+  .command("workflow")
+  .description("Workflow definitions (layer 1: templates)");

 workflow
-  .command("put")
+  .command("add")
  .description("Register a workflow from YAML")
  .argument("<file>", "Workflow YAML file")
  .action((file: string) => {
    const storageRoot = resolveStorageRoot();
    runAction(async () => {
-      const result = await cmdWorkflowPut(storageRoot, file);
+      const result = await cmdWorkflowAdd(storageRoot, file);
      writeOutput(result);
    });
  });
@@ -92,7 +100,7 @@ workflow
    });
  });

-const thread = program.command("thread").description("Thread lifecycle and execution");
+const thread = program.command("thread").description("Thread execution (layer 2: instances)");

 thread
  .command("start")
@@ -108,18 +116,46 @@ thread
  });

 thread
-  .command("step")
-  .description("Execute one step")
+  .command("exec")
+  .description("Execute one or more steps")
  .argument("<thread-id>", "Thread ULID")
  .option("--agent <cmd>", "Override agent command")
-  .action((threadId: string, opts: { agent: string | undefined }) => {
-    const storageRoot = resolveStorageRoot();
-    runAction(async () => {
-      const agentOverride = opts.agent ?? null;
-      const result = await cmdThreadStep(storageRoot, threadId, agentOverride);
-      writeOutput(result);
-    });
-  });
+  .option("-c, --count <number>", "Number of steps to run (default: 1)")
+  .option("--background", "Run in background and return immediately")
+  .option("--_background-worker", "Internal flag for background worker process", false)
+  .action(
+    (
+      threadId: string,
+      opts: {
+        agent: string | undefined;
+        count: string | undefined;
+        background: boolean;
+        _backgroundWorker: boolean;
+      },
+    ) => {
+      const storageRoot = resolveStorageRoot();
+      runAction(async () => {
+        const agentOverride = opts.agent ?? null;
+        const count = opts.count !== undefined ? Number(opts.count) : 1;
+        const background = opts.background ?? false;
+        const backgroundWorker = opts._backgroundWorker ?? false;
+
+        const results = await cmdThreadExec(
+          storageRoot,
+          threadId,
+          agentOverride,
+          count,
+          background,
+          backgroundWorker,
+        );
+        if (results.length === 1) {
+          writeOutput(results[0]);
+        } else {
+          writeOutput(results);
+        }
+      });
+    },
+  );

 thread
  .command("show")
@@ -133,38 +169,124 @@ thread
    });
  });

+// Helper functions for thread list command parsing
+function parseStatusFilter(status: string | undefined): ThreadStatus[] | null {
+  if (status === undefined) return null;
+  const raw = status.trim();
+  if (raw === "active") return ["idle", "running"];
+
+  const parts = raw.split(",").map((s) => s.trim());
+  const validStatuses: ThreadStatus[] = ["idle", "running", "completed"];
+  for (const part of parts) {
+    if (!validStatuses.includes(part as ThreadStatus)) {
+      process.stderr.write(
+        `Invalid status: ${part}. Must be one of: idle, running, completed, active\n`,
+      );
+      process.exit(1);
+    }
+  }
+  return parts as ThreadStatus[];
+}
+
+function parseTimeFilters(
+  after: string | undefined,
+  before: string | undefined,
+  nowMs: number,
+): { afterMs: number | null; beforeMs: number | null } {
+  try {
+    const afterMs = after !== undefined ? parseTimeInput(after, nowMs) : null;
+    const beforeMs = before !== undefined ? parseTimeInput(before, nowMs) : null;
+    return { afterMs, beforeMs };
+  } catch (e) {
+    const message = e instanceof Error ? e.message : String(e);
+    process.stderr.write(`${message}\n`);
+    process.exit(1);
+  }
+}
+
+function parsePaginationOptions(
+  skip: string | undefined,
+  take: string | undefined,
+): { skip: number | null; take: number | null } {
+  let skipVal: number | null = null;
+  let takeVal: number | null = null;
+
+  if (skip !== undefined) {
+    skipVal = Number.parseInt(skip, 10);
+    if (!Number.isInteger(skipVal) || skipVal < 0) {
+      process.stderr.write("--skip must be a non-negative integer\n");
+      process.exit(1);
+    }
+  }
+  if (take !== undefined) {
+    takeVal = Number.parseInt(take, 10);
+    if (!Number.isInteger(takeVal) || takeVal < 1) {
+      process.stderr.write("--take must be a positive integer\n");
+      process.exit(1);
+    }
+  }
+  return { skip: skipVal, take: takeVal };
+}
+
 thread
  .command("list")
-  .description("List active threads")
-  .option("--all", "Include archived threads")
-  .action((opts: { all: boolean }) => {
+  .description("List threads")
+  .option(
+    "--status <status>",
+    "Filter by status: idle, running, completed, active (idle+running), or comma-separated values",
+  )
+  .option("--after <date>", "Filter threads created after this date (ISO or relative like '7d')")
+  .option("--before <date>", "Filter threads created before this date (ISO or relative like '7d')")
+  .option("--skip <n>", "Skip first n threads")
+  .option("--take <n>", "Return at most n threads")
+  .action(
+    (opts: {
+      status: string | undefined;
+      after: string | undefined;
+      before: string | undefined;
+      skip: string | undefined;
+      take: string | undefined;
+    }) => {
+      const storageRoot = resolveStorageRoot();
+      runAction(async () => {
+        const statusFilter = parseStatusFilter(opts.status);
+        const nowMs = Date.now();
+        const { afterMs, beforeMs } = parseTimeFilters(opts.after, opts.before, nowMs);
+        const { skip, take } = parsePaginationOptions(opts.skip, opts.take);
+
+        const result = await cmdThreadList(
+          storageRoot,
+          statusFilter,
+          afterMs,
+          beforeMs,
+          skip,
+          take,
+        );
+        writeOutput(result);
+      });
+    },
+  );
+
+thread
+  .command("stop")
+  .description("Stop background execution of a thread (keep thread active)")
+  .argument("<thread-id>", "Thread ULID")
+  .action((threadId: string) => {
    const storageRoot = resolveStorageRoot();
    runAction(async () => {
-      const result = await cmdThreadList(storageRoot, opts.all);
+      const result = await cmdThreadStop(storageRoot, threadId);
      writeOutput(result);
    });
  });

 thread
-  .command("kill")
-  .description("Terminate and archive a thread")
+  .command("cancel")
+  .description("Cancel a thread (stop execution and move to history)")
  .argument("<thread-id>", "Thread ULID")
  .action((threadId: string) => {
    const storageRoot = resolveStorageRoot();
    runAction(async () => {
-      const result = await cmdThreadKill(storageRoot, threadId);
-      writeOutput(result);
-    });
-  });
-
-thread
-  .command("steps")
-  .description("List all steps in a thread")
-  .argument("<thread-id>", "Thread ULID")
-  .action((threadId: string) => {
-    const storageRoot = resolveStorageRoot();
-    runAction(async () => {
-      const result = await cmdThreadSteps(storageRoot, threadId);
+      const result = await cmdThreadCancel(storageRoot, threadId);
      writeOutput(result);
    });
  });
@@ -198,28 +320,157 @@ thread
    },
  );

-thread
+const step = program.command("step").description("Step results (layer 3: single cycle)");
+
+step
+  .command("list")
+  .description("List all steps in a thread")
+  .argument("<thread-id>", "Thread ULID")
+  .action((threadId: string) => {
+    const storageRoot = resolveStorageRoot();
+    runAction(async () => {
+      const result = await cmdStepList(storageRoot, threadId);
+      writeOutput(result);
+    });
+  });
+
+step
+  .command("show")
+  .description("Show details of a specific step")
+  .argument("<step-hash>", "CAS hash of the StepNode")
+  .action((stepHash: string) => {
+    const storageRoot = resolveStorageRoot();
+    runAction(async () => {
+      const detail = await cmdStepShow(storageRoot, stepHash as CasRef);
+      writeOutput(detail);
+    });
+  });
+
+step
+  .command("read")
+  .description("Read a step's turns as human-readable markdown")
+  .argument("<step-hash>", "CAS hash of the StepNode")
+  .option("--quota <chars>", "Max output characters", "4000")
+  .action((stepHash: string, opts: { quota: string }) => {
+    const storageRoot = resolveStorageRoot();
+    runAction(async () => {
+      const quota = Number.parseInt(opts.quota, 10);
+      if (!Number.isFinite(quota) || quota < 1) {
+        process.stderr.write("invalid --quota: must be a positive integer\n");
+        process.exit(1);
+      }
+      const markdown = await cmdStepRead(storageRoot, stepHash as CasRef, quota);
+      process.stdout.write(markdown.endsWith("\n") ? markdown : `${markdown}\n`);
+    });
+  });
+
+step
  .command("fork")
  .description("Fork a thread from a specific step")
  .argument("<step-hash>", "CAS hash of the StartNode or StepNode to fork from")
  .action((stepHash: string) => {
    const storageRoot = resolveStorageRoot();
    runAction(async () => {
-      const result = await cmdThreadFork(storageRoot, stepHash);
+      const result = await cmdStepFork(storageRoot, stepHash as CasRef);
      writeOutput(result);
    });
  });

+// ── Deprecation Handlers ──────────────────────────────────────────────────────
+// These commands have been removed. Show helpful error messages.
+
+workflow
+  .command("put")
+  .description("[DEPRECATED] Use 'workflow add' instead")
+  .argument("<file>", "Workflow YAML file")
+  .action(() => {
+    process.stderr.write(`Error: Command 'workflow put' has been removed.
+Use 'workflow add' instead.
+
+For more information, see: uwf help workflow add
+`);
+    process.exit(1);
+  });
+
+thread
+  .command("step")
+  .description("[DEPRECATED] Use 'thread exec' instead")
+  .argument("<thread-id>", "Thread ULID")
+  .allowUnknownOption()
+  .action(() => {
+    process.stderr.write(`Error: Command 'thread step' has been removed.
+Use 'thread exec' instead.
+
+For more information, see: uwf help thread exec
+`);
+    process.exit(1);
+  });
+
+thread
+  .command("steps")
+  .description("[DEPRECATED] Use 'step list' instead")
+  .argument("<thread-id>", "Thread ULID")
+  .action(() => {
+    process.stderr.write(`Error: Command 'thread steps' has been removed.
+Use 'step list' instead.
+
+For more information, see: uwf help step list
+`);
+    process.exit(1);
+  });
+
 thread
  .command("step-details")
-  .description("Dump the full detail node of a step as YAML")
-  .argument("<step-hash>", "CAS hash of the StepNode")
-  .action((stepHash: string) => {
-    const storageRoot = resolveStorageRoot();
-    runAction(async () => {
-      const detail = await cmdThreadStepDetails(storageRoot, stepHash);
-      process.stdout.write(yamlStringify(detail));
-    });
+  .description("[DEPRECATED] Use 'step show' instead")
+  .argument("<step-hash>", "Step hash")
+  .action(() => {
+    process.stderr.write(`Error: Command 'thread step-details' has been removed.
+Use 'step show' instead.
+
+For more information, see: uwf help step show
+`);
+    process.exit(1);
+  });
+
+thread
+  .command("fork")
+  .description("[DEPRECATED] Use 'step fork' instead")
+  .argument("<step-hash>", "Step hash")
+  .action(() => {
+    process.stderr.write(`Error: Command 'thread fork' has been removed.
+Use 'step fork' instead.
+
+For more information, see: uwf help step fork
+`);
+    process.exit(1);
+  });
+
+thread
+  .command("kill")
+  .description("[DEPRECATED] Use 'thread stop' or 'thread cancel' instead")
+  .argument("<thread-id>", "Thread ULID")
+  .action(() => {
+    process.stderr.write(`Error: Command 'thread kill' has been removed.
+Use 'thread stop' to stop background execution (keep thread active),
+or 'thread cancel' to cancel and archive the thread.
+
+For more information, see:
+  uwf help thread stop
+  uwf help thread cancel
+`);
+    process.exit(1);
+  });
+
+thread
+  .command("running")
+  .description("[DEPRECATED] Use 'thread list --status running' instead")
+  .action(() => {
+    process.stderr.write(`Error: Command 'thread running' has been removed.
+Use 'thread list --status running' instead.
+
+For more information, see: uwf help thread list
+`);
+    process.exit(1);
  });

 const skill = program.command("skill").description("Built-in skill references for agents");
@@ -314,7 +565,11 @@ cas
  .action((hash: string) => {
    const storageRoot = resolveStorageRoot();
    runAction(async () => {
-      writeOutput(await cmdCasHas(storageRoot, hash));
+      const result = await cmdCasHas(storageRoot, hash);
+      writeOutput(result);
+      if (!result.exists) {
+        process.exit(1);
+      }
    });
  });

@@ -373,6 +628,55 @@ casSchema
    });
  });

+const log = program.command("log").description("Process-level debug logs");
+
+log
+  .command("list")
+  .description("List log files with sizes")
+  .action(() => {
+    const storageRoot = resolveStorageRoot();
+    runAction(async () => {
+      const result = await cmdLogList(storageRoot);
+      writeOutput(result);
+    });
+  });
+
+log
+  .command("show")
+  .description("Show and filter log entries")
+  .option("--thread <thread-id>", "Filter by thread ID")
+  .option("--process <pid>", "Filter by process ID")
+  .option("--date <date>", "Filter by date (YYYY-MM-DD)")
+  .action(
+    (opts: {
+      thread: string | undefined;
+      process: string | undefined;
+      date: string | undefined;
+    }) => {
+      const storageRoot = resolveStorageRoot();
+      runAction(async () => {
+        const result = await cmdLogShow(storageRoot, {
+          thread: opts.thread ?? null,
+          process: opts.process ?? null,
+          date: opts.date ?? null,
+        });
+        writeOutput(result);
+      });
+    },
+  );
+
+log
+  .command("clean")
+  .description("Delete log files older than given date")
+  .requiredOption("--before <date>", "Delete files before this date (YYYY-MM-DD)")
+  .action((opts: { before: string }) => {
+    const storageRoot = resolveStorageRoot();
+    runAction(async () => {
+      const result = await cmdLogClean(storageRoot, opts.before);
+      writeOutput(result);
+    });
+  });
+
 program.parseAsync(process.argv).catch((e: unknown) => {
  const message = e instanceof Error ? e.message : String(e);
  process.stderr.write(`${message}\n`);
@@ -0,0 +1,116 @@
+import { readdir, readFile, stat, unlink } from "node:fs/promises";
+import { join } from "node:path";
+
+type LogListItem = {
+  name: string;
+  size: number;
+  date: string;
+};
+
+type LogShowFilter = {
+  thread: string | null;
+  process: string | null;
+  date: string | null;
+};
+
+type LogEntry = {
+  ts: string;
+  pid: string;
+  tag: string;
+  msg: string;
+  thread: string | null;
+  workflow: string | null;
+};
+
+type LogCleanResult = {
+  deleted: number;
+};
+
+function logsDir(storageRoot: string): string {
+  return join(storageRoot, "logs");
+}
+
+async function listLogFiles(dir: string): Promise<Array<string>> {
+  try {
+    const files = await readdir(dir);
+    return files.filter((f) => f.endsWith(".jsonl")).sort();
+  } catch {
+    return [];
+  }
+}
+
+function dateFromFilename(name: string): string {
+  return name.replace(".jsonl", "");
+}
+
+async function parseJsonlFile(path: string): Promise<Array<LogEntry>> {
+  const content = await readFile(path, "utf-8");
+  const lines = content
+    .trim()
+    .split("\n")
+    .filter((l) => l.length > 0);
+  return lines.map((line) => JSON.parse(line) as LogEntry);
+}
+
+export async function cmdLogList(storageRoot: string): Promise<Array<LogListItem>> {
+  const dir = logsDir(storageRoot);
+  const files = await listLogFiles(dir);
+  const items: Array<LogListItem> = [];
+  for (const name of files) {
+    const s = await stat(join(dir, name));
+    items.push({ name, size: s.size, date: dateFromFilename(name) });
+  }
+  // sort by date descending
+  items.sort((a, b) => (a.date > b.date ? -1 : a.date < b.date ? 1 : 0));
+  return items;
+}
+
+export async function cmdLogShow(
+  storageRoot: string,
+  filter: LogShowFilter,
+): Promise<Array<LogEntry>> {
+  const dir = logsDir(storageRoot);
+  let files: Array<string>;
+
+  if (filter.date !== null) {
+    files = [`${filter.date}.jsonl`];
+  } else {
+    files = await listLogFiles(dir);
+  }
+
+  let entries: Array<LogEntry> = [];
+  for (const file of files) {
+    try {
+      const parsed = await parseJsonlFile(join(dir, file));
+      entries = entries.concat(parsed);
+    } catch {
+      // file doesn't exist or is unreadable, skip
+    }
+  }
+
+  if (filter.thread !== null) {
+    entries = entries.filter((e) => e.thread === filter.thread);
+  }
+  if (filter.process !== null) {
+    entries = entries.filter((e) => e.pid === filter.process);
+  }
+
+  entries.sort((a, b) => (a.ts < b.ts ? -1 : a.ts > b.ts ? 1 : 0));
+  return entries;
+}
+
+export async function cmdLogClean(storageRoot: string, before: string): Promise<LogCleanResult> {
+  const dir = logsDir(storageRoot);
+  const files = await listLogFiles(dir);
+  let deleted = 0;
+
+  for (const name of files) {
+    const date = dateFromFilename(name);
+    if (date < before) {
+      await unlink(join(dir, name));
+      deleted++;
+    }
+  }
+
+  return { deleted };
+}
@@ -1,10 +1,46 @@
-import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
+import { existsSync, mkdirSync, readdirSync, readFileSync, statSync, writeFileSync } from "node:fs";
 import { join } from "node:path";
 import { stdin as input, stdout as output } from "node:process";
 import { createInterface } from "node:readline/promises";
-
+import type { Result } from "@uncaged/workflow-util";
 import { parse, stringify } from "yaml";

+/**
+ * Send a minimal chat completion request to verify the model is reachable.
+ * Returns ok on 2xx, error with reason string otherwise.
+ */
+export async function validateModel(
+  baseUrl: string,
+  apiKey: string,
+  model: string,
+): Promise<Result<void, string>> {
+  try {
+    const url = `${baseUrl.replace(/\/+$/, "")}/chat/completions`;
+    const res = await fetch(url, {
+      method: "POST",
+      headers: {
+        Authorization: `Bearer ${apiKey}`,
+        "Content-Type": "application/json",
+      },
+      body: JSON.stringify({
+        model,
+        messages: [{ role: "user", content: "hi" }],
+        max_tokens: 1,
+      }),
+      signal: AbortSignal.timeout(15_000),
+    });
+    if (!res.ok) {
+      return { ok: false, error: `HTTP ${res.status} ${res.statusText}` };
+    }
+    return { ok: true, value: undefined };
+  } catch (err: unknown) {
+    if (err instanceof DOMException && err.name === "AbortError") {
+      return { ok: false, error: "Request timed out — model endpoint unreachable" };
+    }
+    return { ok: false, error: `Network error — could not reach endpoint (${String(err)})` };
+  }
+}
+
 /**
 * Preset provider list — embedded to avoid runtime YAML loading dependency.
 * Keep in sync with providers.yaml in cli-workflow.
@@ -101,6 +137,182 @@ function apiKeyEnvName(providerName: string): string {
  return `${providerName.toUpperCase().replace(/[^A-Z0-9]/g, "_")}_API_KEY`;
 }

+// ──────────────────────────────────────────────────────────────────────────────
+// Extracted helpers — _discoverAgents
+// ──────────────────────────────────────────────────────────────────────────────
+
+/**
+ * Scans directories from a PATH string for uwf-* executables.
+ */
+export async function _searchPathDirs(pathEnv: string): Promise<string[]> {
+  if (!pathEnv) return [];
+  const dirs = pathEnv.split(":").filter((d) => d.length > 0);
+  const agents = new Set<string>();
+  for (const dir of dirs) {
+    _scanDirForAgents(dir, agents);
+  }
+  return Array.from(agents).sort();
+}
+
+function _scanDirForAgents(dir: string, agents: Set<string>): void {
+  try {
+    if (!existsSync(dir)) return;
+    const entries = readdirSync(dir);
+    for (const entry of entries) {
+      if (!entry.startsWith("uwf-") || entry === "uwf") continue;
+      if (_isExecutableFile(join(dir, entry))) {
+        agents.add(entry);
+      }
+    }
+  } catch {
+    // Skip inaccessible directories
+  }
+}
+
+function _isExecutableFile(fullPath: string): boolean {
+  try {
+    const s = statSync(fullPath);
+    return s.isFile() && (s.mode & 0o111) !== 0;
+  } catch {
+    return false;
+  }
+}
+
+/**
+ * Parses the stdout of `which -a` into sorted unique basenames.
+ */
+export function _parseWhichOutput(text: string): string[] {
+  if (!text) return [];
+  const agents = new Set<string>();
+  for (const line of text.trim().split("\n")) {
+    if (!line) continue;
+    const basename = line.split("/").pop() ?? "";
+    if (basename.startsWith("uwf-") && basename !== "uwf") {
+      agents.add(basename);
+    }
+  }
+  return Array.from(agents).sort();
+}
+
+/**
+ * Discover uwf-* agent binaries in PATH.
+ * Returns sorted list of binary names (e.g., ["uwf-hermes", "uwf-claude-code"]).
+ */
+export async function _discoverAgents(): Promise<string[]> {
+  try {
+    const agents = await _tryWhichDiscovery();
+    if (agents !== null) return agents;
+    return await _searchPathDirs(process.env.PATH ?? "");
+  } catch {
+    return [];
+  }
+}
+
+async function _tryWhichDiscovery(): Promise<string[] | null> {
+  try {
+    const proc = Bun.spawn(["which", "-a", "uwf-hermes", "uwf-claude-code", "uwf-cursor"], {
+      stdout: "pipe",
+      stderr: "pipe",
+    });
+    const text = await new Response(proc.stdout).text();
+    await proc.exited;
+    if (proc.exitCode !== 0) return null;
+    return _parseWhichOutput(text);
+  } catch {
+    return null;
+  }
+}
+
+// ──────────────────────────────────────────────────────────────────────────────
+// Extracted helpers — onData closure (promptSecret)
+// ──────────────────────────────────────────────────────────────────────────────
+
+/** Returns true for newline, carriage return, or EOF (EOT). */
+export function _isTerminator(c: string): boolean {
+  return c === "\n" || c === "\r" || c === "";
+}
+
+/** Returns true for DEL or backspace. */
+export function _isBackspace(c: string): boolean {
+  return c === "" || c === "\b";
+}
+
+// ──────────────────────────────────────────────────────────────────────────────
+// Extracted helpers — cmdSetupInteractive
+// ──────────────────────────────────────────────────────────────────────────────
+
+type ProviderEntry = { name: string; label: string; baseUrl: string };
+
+/** Prints the numbered provider list and custom option to stdout. */
+export function _printProviderMenu(providers: readonly ProviderEntry[]): void {
+  const numWidth = String(providers.length + 1).length;
+  for (let i = 0; i < providers.length; i++) {
+    const p = providers[i];
+    if (!p) continue;
+    const num = String(i + 1).padStart(numWidth);
+    console.log(`  ${num}) ${p.label.padEnd(28)} ${p.baseUrl}`);
+  }
+  const customNum = String(providers.length + 1).padStart(numWidth);
+  console.log(`  ${customNum}) Custom (enter name and URL manually)\n`);
+}
+
+/** Resolves a numeric choice string to a preset provider, or null for custom/invalid. */
+export function _resolveProviderChoice(
+  choice: string,
+  providers: readonly ProviderEntry[],
+): { providerName: string; baseUrl: string } | null {
+  const n = Number.parseInt(choice, 10);
+  if (Number.isNaN(n) || n < 1 || n > providers.length) return null;
+  const p = providers[n - 1];
+  if (!p) return null;
+  return { providerName: p.name, baseUrl: p.baseUrl };
+}
+
+/** Resolves numeric index or literal model name to a model string. */
+export function _resolveModelChoice(input: string, models: string[]): string {
+  const n = Number.parseInt(input, 10);
+  if (!Number.isNaN(n) && n >= 1 && n <= models.length) {
+    return models[n - 1] ?? input;
+  }
+  return input;
+}
+
+/** Prints the multi-column model list to stdout. */
+export function _printModelMenu(models: string[], termCols: number): void {
+  const nw = String(models.length).length;
+  const maxLen = models.reduce((m, s) => Math.max(m, s.length), 0);
+  const colWidth = nw + 2 + maxLen + 4;
+  const cols = Math.max(1, Math.floor(termCols / colWidth));
+  const rows = Math.ceil(models.length / cols);
+  for (let r = 0; r < rows; r++) {
+    let line = "";
+    for (let c = 0; c < cols; c++) {
+      const idx = c * rows + r;
+      if (idx >= models.length) break;
+      const num = String(idx + 1).padStart(nw);
+      const name = (models[idx] ?? "").padEnd(maxLen);
+      line += `  ${num}) ${name}  `;
+    }
+    console.log(line.trimEnd());
+  }
+}
+
+type ValidationResult = { ok: boolean; error: string | null };
+
+/** Prints the model validation result to stdout. */
+export function _printValidationResult(validation: ValidationResult): void {
+  if (validation.ok) {
+    console.log("✓ Model verified — connection successful.\n");
+  } else {
+    console.log(`\n⚠ Warning: Could not reach model — ${validation.error}`);
+    console.log(
+      "  Config saved, but you may want to try a different model or check your API key.\n",
+    );
+  }
+}
+
+// ──────────────────────────────────────────────────────────────────────────────
+
 /**
 * Merge setup args into config.yaml structure. Non-destructive — preserves existing entries.
 */
@@ -163,15 +375,59 @@ export async function cmdSetup(args: SetupArgs): Promise<Record<string, unknown>
  envData[envName] = args.apiKey;
  saveEnvFile(envPath, envData);

+  // Validate model connectivity
+  const validation = await validateModel(args.baseUrl, args.apiKey, args.model);
+
  return {
    configPath,
    envPath,
    provider: args.provider,
    model: args.model,
    defaultAgent: merged.defaultAgent,
+    validation,
  };
 }

+type SecretState = {
+  buf: string;
+  rawWasSet: boolean;
+  resolve: (value: string) => void;
+  onData: (chunk: string) => void;
+};
+
+function _handleSecretTerminator(state: SecretState): void {
+  if (process.stdin.isTTY) process.stdin.setRawMode(state.rawWasSet);
+  process.stdin.pause();
+  process.stdin.removeListener("data", state.onData);
+  process.stdout.write("\n");
+  state.resolve(state.buf.trim());
+}
+
+function _handleSecretBackspace(state: SecretState): void {
+  if (state.buf.length > 0) {
+    state.buf = state.buf.slice(0, -1);
+    process.stdout.write("\b \b");
+  }
+}
+
+function _handleSecretChar(c: string, state: SecretState): boolean {
+  if (_isTerminator(c)) {
+    _handleSecretTerminator(state);
+    return true;
+  }
+  if (_isBackspace(c)) {
+    _handleSecretBackspace(state);
+    return false;
+  }
+  if (c === "") {
+    if (process.stdin.isTTY) process.stdin.setRawMode(state.rawWasSet);
+    process.exit(130);
+  }
+  state.buf += c;
+  process.stdout.write("*");
+  return false;
+}
+
 /** Read a line with terminal echo disabled (for secrets). */
 async function promptSecret(label: string): Promise<string> {
  process.stdout.write(label);
@@ -183,33 +439,13 @@ async function promptSecret(label: string): Promise<string> {
    process.stdin.resume();
    process.stdin.setEncoding("utf8");

-    let buf = "";
-    const onData = (chunk: string) => {
+    const state: SecretState = { buf: "", rawWasSet, resolve, onData: () => {} };
+    state.onData = (chunk: string) => {
      for (const c of chunk.toString()) {
-        if (c === "\n" || c === "\r" || c === "\u0004") {
-          if (process.stdin.isTTY) process.stdin.setRawMode(rawWasSet);
-          process.stdin.pause();
-          process.stdin.removeListener("data", onData);
-          process.stdout.write("\n");
-          resolve(buf.trim());
-          return;
-        }
-        if (c === "\u007F" || c === "\b") {
-          if (buf.length > 0) {
-            buf = buf.slice(0, -1);
-            process.stdout.write("\b \b");
-          }
-          continue;
-        }
-        if (c === "\u0003") {
-          if (process.stdin.isTTY) process.stdin.setRawMode(rawWasSet);
-          process.exit(130);
-        }
-        buf += c;
-        process.stdout.write("*");
+        if (_handleSecretChar(c, state)) return;
      }
    };
-    process.stdin.on("data", onData);
+    process.stdin.on("data", state.onData);
  });
 }

@@ -235,6 +471,56 @@ async function fetchModels(baseUrl: string, apiKey: string): Promise<string[]> {
  }
 }

+async function _promptProviderSelection(
+  rl: ReturnType<typeof createInterface>,
+): Promise<{ providerName: string; baseUrl: string }> {
+  console.log("Select a provider:\n");
+  _printProviderMenu(PRESET_PROVIDERS);
+
+  const choice = (await rl.question(`Choose [1-${PRESET_PROVIDERS.length + 1}]: `)).trim();
+  const choiceNum = Number.parseInt(choice, 10);
+  if (Number.isNaN(choiceNum) || choiceNum < 1 || choiceNum > PRESET_PROVIDERS.length + 1) {
+    throw new Error(`Invalid choice: ${choice}`);
+  }
+
+  const preset = _resolveProviderChoice(choice, PRESET_PROVIDERS);
+  if (preset) {
+    const selected = PRESET_PROVIDERS[choiceNum - 1];
+    if (selected) {
+      console.log(`\n  → ${selected.label} (${selected.baseUrl})\n`);
+    }
+    return preset;
+  }
+
+  const providerName = (await rl.question("Provider name (e.g. my-proxy): ")).trim();
+  if (!providerName) throw new Error("Provider name required");
+  const baseUrl = (await rl.question("OpenAI-compatible API base URL: ")).trim();
+  if (!baseUrl) throw new Error("Base URL required");
+  return { providerName, baseUrl };
+}
+
+async function _promptModelSelection(
+  rl: ReturnType<typeof createInterface>,
+  baseUrl: string,
+  apiKey: string,
+): Promise<string> {
+  console.log("\nFetching available models...");
+  const models = await fetchModels(baseUrl, apiKey);
+
+  if (models.length === 0) {
+    console.log("Could not fetch models. Enter model name manually.");
+    const model = (await rl.question("Default model (e.g. qwen-plus, gpt-4o): ")).trim();
+    if (!model) throw new Error("Model required");
+    return model;
+  }
+  console.log(`\nAvailable models (${models.length}):\n`);
+  _printModelMenu(models, process.stdout.columns || 100);
+  console.log(`\nChoose a number, or type a model name directly.`);
+  const modelInput = (await rl.question(`Default model [1-${models.length}]: `)).trim();
+  if (!modelInput) throw new Error("Model required");
+  return _resolveModelChoice(modelInput, models);
+}
+
 /**
 * Interactive setup — prompts user for provider, API key, model.
 */
@@ -244,39 +530,7 @@ export async function cmdSetupInteractive(storageRoot: string): Promise<Record<s
  try {
    console.log("Configure LLM provider for uwf workflow agents.\n");

-    // 1. Provider selection
-    const numWidth = String(PRESET_PROVIDERS.length + 1).length;
-    console.log("Select a provider:\n");
-    for (let i = 0; i < PRESET_PROVIDERS.length; i++) {
-      const p = PRESET_PROVIDERS[i];
-      if (!p) continue;
-      const num = String(i + 1).padStart(numWidth);
-      console.log(`  ${num}) ${p.label.padEnd(28)} ${p.baseUrl}`);
-    }
-    const customNum = String(PRESET_PROVIDERS.length + 1).padStart(numWidth);
-    console.log(`  ${customNum}) Custom (enter name and URL manually)\n`);
-
-    const choice = (await rl.question(`Choose [1-${PRESET_PROVIDERS.length + 1}]: `)).trim();
-    const choiceNum = Number.parseInt(choice, 10);
-    if (Number.isNaN(choiceNum) || choiceNum < 1 || choiceNum > PRESET_PROVIDERS.length + 1) {
-      throw new Error(`Invalid choice: ${choice}`);
-    }
-
-    let providerName: string;
-    let baseUrl: string;
-
-    if (choiceNum <= PRESET_PROVIDERS.length) {
-      const selected = PRESET_PROVIDERS[choiceNum - 1];
-      if (!selected) throw new Error("Invalid selection");
-      providerName = selected.name;
-      baseUrl = selected.baseUrl;
-      console.log(`\n  → ${selected.label} (${selected.baseUrl})\n`);
-    } else {
-      providerName = (await rl.question("Provider name (e.g. my-proxy): ")).trim();
-      if (!providerName) throw new Error("Provider name required");
-      baseUrl = (await rl.question("OpenAI-compatible API base URL: ")).trim();
-      if (!baseUrl) throw new Error("Base URL required");
-    }
+    const { providerName, baseUrl } = await _promptProviderSelection(rl);

    // 2. API key
    rl.close();
@@ -285,50 +539,11 @@ export async function cmdSetupInteractive(storageRoot: string): Promise<Record<s

    // 3. Model selection
    const rl2 = createInterface({ input, output });
-    console.log("\nFetching available models...");
-    const models = await fetchModels(baseUrl, apiKey);
-
-    let model: string;
-    if (models.length > 0) {
-      console.log(`\nAvailable models (${models.length}):\n`);
-      const nw = String(models.length).length;
-      // Multi-column layout
-      const maxLen = models.reduce((m, s) => Math.max(m, s.length), 0);
-      const colWidth = nw + 2 + maxLen + 4; // "  N) name    "
-      const termCols = process.stdout.columns || 100;
-      const cols = Math.max(1, Math.floor(termCols / colWidth));
-      const rows = Math.ceil(models.length / cols);
-      for (let r = 0; r < rows; r++) {
-        let line = "";
-        for (let c = 0; c < cols; c++) {
-          const idx = c * rows + r;
-          if (idx >= models.length) break;
-          const num = String(idx + 1).padStart(nw);
-          const name = (models[idx] ?? "").padEnd(maxLen);
-          line += `  ${num}) ${name}  `;
-        }
-        console.log(line.trimEnd());
-      }
-      console.log(`\nChoose a number, or type a model name directly.`);
-      const modelInput = (await rl2.question(`Default model [1-${models.length}]: `)).trim();
-      if (!modelInput) throw new Error("Model required");
-      const modelNum = Number.parseInt(modelInput, 10);
-      if (!Number.isNaN(modelNum) && modelNum >= 1 && modelNum <= models.length) {
-        model = models[modelNum - 1] ?? modelInput;
-      } else {
-        model = modelInput;
-      }
-    } else {
-      console.log("Could not fetch models. Enter model name manually.");
-      model = (await rl2.question("Default model (e.g. qwen-plus, gpt-4o): ")).trim();
-      if (!model) throw new Error("Model required");
-    }
-
+    const model = await _promptModelSelection(rl2, baseUrl, apiKey);
    rl2.close();
-
    console.log(`  → ${providerName}/${model}\n`);

-    await cmdSetup({
+    const setupResult = await cmdSetup({
      provider: providerName,
      baseUrl,
      apiKey,
@@ -336,6 +551,10 @@ export async function cmdSetupInteractive(storageRoot: string): Promise<Record<s
      storageRoot,
    });

+    // Show validation result
+    if (setupResult.validation && typeof setupResult.validation === "object") {
+      _printValidationResult(setupResult.validation as ValidationResult);
+    }
    console.log("Setup complete! Get started:\n");
    console.log("  uwf workflow put <workflow.yaml>   Register a workflow");
    console.log('  uwf thread start <name> -p "..."   Start a thread');
@@ -0,0 +1,231 @@
+import type { Store as CasStore, JSONSchema } from "@uncaged/json-cas";
+import { getSchema } from "@uncaged/json-cas";
+import type {
+  CasRef,
+  StartNodePayload,
+  StepNodePayload,
+  ThreadId,
+} from "@uncaged/workflow-protocol";
+import { findThreadInHistory, loadThreadsIndex, type UwfStore } from "../store.js";
+
+type ChainState = {
+  startHash: CasRef;
+  start: StartNodePayload;
+  stepsNewestFirst: StepNodePayload[];
+  headIsStart: boolean;
+};
+
+type OrderedStepItem = {
+  hash: CasRef;
+  payload: StepNodePayload;
+  timestamp: number;
+};
+
+function fail(message: string): never {
+  process.stderr.write(`${message}\n`);
+  process.exit(1);
+}
+
+function walkChain(uwf: UwfStore, headHash: CasRef): ChainState {
+  const headNode = uwf.store.get(headHash);
+  if (headNode === null) {
+    fail(`CAS node not found: ${headHash}`);
+  }
+
+  if (headNode.type === uwf.schemas.startNode) {
+    return {
+      startHash: headHash,
+      start: headNode.payload as StartNodePayload,
+      stepsNewestFirst: [],
+      headIsStart: true,
+    };
+  }
+
+  if (headNode.type !== uwf.schemas.stepNode) {
+    fail(`head ${headHash} is not a StartNode or StepNode`);
+  }
+
+  const stepsNewestFirst: StepNodePayload[] = [];
+  let hash: CasRef | null = headHash;
+
+  while (hash !== null) {
+    const node = uwf.store.get(hash);
+    if (node === null) {
+      fail(`CAS node not found while walking chain: ${hash}`);
+    }
+    if (node.type !== uwf.schemas.stepNode) {
+      break;
+    }
+    const payload = node.payload as StepNodePayload;
+    stepsNewestFirst.push(payload);
+    hash = payload.prev;
+  }
+
+  const newest = stepsNewestFirst[0];
+  if (newest === undefined) {
+    fail(`empty step chain at head ${headHash}`);
+  }
+
+  const startNode = uwf.store.get(newest.start);
+  if (startNode === null || startNode.type !== uwf.schemas.startNode) {
+    fail(`StartNode not found: ${newest.start}`);
+  }
+
+  return {
+    startHash: newest.start,
+    start: startNode.payload as StartNodePayload,
+    stepsNewestFirst,
+    headIsStart: false,
+  };
+}
+
+function expandOutput(uwf: UwfStore, outputRef: CasRef): unknown {
+  const node = uwf.store.get(outputRef);
+  if (node === null) {
+    return {};
+  }
+  return node.payload;
+}
+
+/**
+ * Recursively expand all cas_ref fields in a CAS node's payload,
+ * replacing hash strings with the referenced node's expanded payload.
+ */
+function expandDeep(store: CasStore, hash: CasRef, visited?: Set<string>): unknown {
+  const seen = visited ?? new Set<string>();
+  if (seen.has(hash)) return hash; // cycle guard
+  seen.add(hash);
+
+  const node = store.get(hash);
+  if (node === null) return hash;
+
+  const schema = getSchema(store, node.type);
+  if (schema === null) return node.payload;
+
+  return expandValue(store, schema, node.payload, seen);
+}
+
+function expandCasRefField(store: CasStore, value: unknown, visited: Set<string>): unknown {
+  if (typeof value === "string") {
+    return expandDeep(store, value as CasRef, visited);
+  }
+  return value;
+}
+
+function expandAnyOfField(
+  store: CasStore,
+  schema: JSONSchema,
+  value: unknown,
+  visited: Set<string>,
+): unknown {
+  if (!Array.isArray(schema.anyOf)) return value;
+  for (const sub of schema.anyOf as JSONSchema[]) {
+    if (sub.format === "cas_ref" && typeof value === "string") {
+      return expandDeep(store, value as CasRef, visited);
+    }
+  }
+  return value;
+}
+
+function expandArrayField(
+  store: CasStore,
+  schema: JSONSchema,
+  value: unknown,
+  visited: Set<string>,
+): unknown {
+  if (!schema.items || !Array.isArray(value)) return value;
+  const itemSchema = schema.items as JSONSchema;
+  return (value as unknown[]).map((item) => expandValue(store, itemSchema, item, visited));
+}
+
+function expandObjectField(
+  store: CasStore,
+  schema: JSONSchema,
+  value: unknown,
+  visited: Set<string>,
+): unknown {
+  if (value === null || typeof value !== "object" || Array.isArray(value) || !schema.properties) {
+    return value;
+  }
+  const props = schema.properties as Record<string, JSONSchema>;
+  const obj = value as Record<string, unknown>;
+  const result: Record<string, unknown> = {};
+  for (const [key, val] of Object.entries(obj)) {
+    const propSchema = props[key];
+    result[key] = propSchema ? expandValue(store, propSchema, val, visited) : val;
+  }
+  return result;
+}
+
+function expandValue(
+  store: CasStore,
+  schema: JSONSchema,
+  value: unknown,
+  visited: Set<string>,
+): unknown {
+  if (schema.format === "cas_ref") return expandCasRefField(store, value, visited);
+  if (Array.isArray(schema.anyOf)) return expandAnyOfField(store, schema, value, visited);
+  if (schema.type === "array") return expandArrayField(store, schema, value, visited);
+  return expandObjectField(store, schema, value, visited);
+}
+
+function collectOrderedSteps(
+  uwf: UwfStore,
+  headHash: CasRef,
+  chain: ChainState,
+): OrderedStepItem[] {
+  let hash: CasRef | null = headHash;
+  const hashToNode = new Map<string, { payload: StepNodePayload; timestamp: number }>();
+  while (hash !== null) {
+    const node = uwf.store.get(hash);
+    if (node === null || node.type !== uwf.schemas.stepNode) {
+      break;
+    }
+    const payload = node.payload as StepNodePayload;
+    hashToNode.set(hash, { payload, timestamp: node.timestamp });
+    hash = payload.prev;
+  }
+
+  let cur: CasRef | null = chain.headIsStart ? null : headHash;
+  const ordered: OrderedStepItem[] = [];
+  while (cur !== null) {
+    const entry = hashToNode.get(cur);
+    if (entry === undefined) {
+      break;
+    }
+    ordered.push({ hash: cur, ...entry });
+    cur = entry.payload.prev;
+  }
+
+  ordered.reverse();
+  return ordered;
+}
+
+async function resolveHeadHash(storageRoot: string, threadId: ThreadId): Promise<CasRef> {
+  const index = await loadThreadsIndex(storageRoot);
+  const activeHead = index[threadId];
+  if (activeHead !== undefined) {
+    return activeHead;
+  }
+  const hist = await findThreadInHistory(storageRoot, threadId);
+  if (hist !== null) {
+    return hist.head;
+  }
+  fail(`thread not found: ${threadId}`);
+}
+
+export {
+  type ChainState,
+  collectOrderedSteps,
+  expandAnyOfField,
+  expandArrayField,
+  expandCasRefField,
+  expandDeep,
+  expandObjectField,
+  expandOutput,
+  expandValue,
+  fail,
+  type OrderedStepItem,
+  resolveHeadHash,
+  walkChain,
+};
@@ -0,0 +1,256 @@
+import type { BootstrapCapableStore } from "@uncaged/json-cas";
+import type {
+  CasRef,
+  StartEntry,
+  StepEntry,
+  StepNodePayload,
+  ThreadForkOutput,
+  ThreadId,
+  ThreadStepsOutput,
+} from "@uncaged/workflow-protocol";
+import { generateUlid } from "@uncaged/workflow-util";
+import { createUwfStore, loadThreadsIndex, saveThreadsIndex } from "../store.js";
+import {
+  collectOrderedSteps,
+  expandDeep,
+  expandOutput,
+  fail,
+  resolveHeadHash,
+  walkChain,
+} from "./shared.js";
+
+type TurnData = {
+  index: number;
+  content: string;
+};
+
+/**
+ * List all steps in a thread (previously: thread steps)
+ */
+export async function cmdStepList(
+  storageRoot: string,
+  threadId: ThreadId,
+): Promise<ThreadStepsOutput> {
+  const headHash = await resolveHeadHash(storageRoot, threadId);
+  const uwf = await createUwfStore(storageRoot);
+  const chain = walkChain(uwf, headHash);
+
+  const startNode = uwf.store.get(chain.startHash);
+  if (startNode === null) {
+    fail(`StartNode not found: ${chain.startHash}`);
+  }
+
+  const startEntry: StartEntry = {
+    hash: chain.startHash,
+    workflow: chain.start.workflow,
+    prompt: chain.start.prompt,
+    timestamp: startNode.timestamp,
+  };
+
+  const stepEntries: StepEntry[] = [];
+  const ordered = collectOrderedSteps(uwf, headHash, chain);
+
+  for (const item of ordered) {
+    stepEntries.push({
+      hash: item.hash,
+      role: item.payload.role,
+      output: expandOutput(uwf, item.payload.output),
+      detail: item.payload.detail ?? null,
+      agent: item.payload.agent,
+      timestamp: item.timestamp,
+    });
+  }
+
+  return {
+    thread: threadId,
+    workflow: chain.start.workflow,
+    steps: [startEntry, ...stepEntries],
+  };
+}
+
+/**
+ * Show details of a specific step (previously: thread step-details)
+ */
+export async function cmdStepShow(storageRoot: string, stepHash: CasRef): Promise<unknown> {
+  const uwf = await createUwfStore(storageRoot);
+  const node = uwf.store.get(stepHash);
+  if (node === null) {
+    fail(`CAS node not found: ${stepHash}`);
+  }
+  if (node.type !== uwf.schemas.stepNode) {
+    fail(`node ${stepHash} is not a StepNode`);
+  }
+  const payload = node.payload as StepNodePayload;
+  if (!payload.detail) {
+    fail(`step ${stepHash} has no detail`);
+  }
+  return expandDeep(uwf.store, payload.detail);
+}
+
+/**
+ * Fork a thread from a specific step (previously: thread fork)
+ */
+export async function cmdStepFork(
+  storageRoot: string,
+  stepHash: CasRef,
+): Promise<ThreadForkOutput> {
+  const uwf = await createUwfStore(storageRoot);
+  const node = uwf.store.get(stepHash);
+  if (node === null) {
+    fail(`CAS node not found: ${stepHash}`);
+  }
+  if (node.type !== uwf.schemas.startNode && node.type !== uwf.schemas.stepNode) {
+    fail(`node ${stepHash} is not a StartNode or StepNode`);
+  }
+
+  const newThreadId = generateUlid(Date.now()) as ThreadId;
+  const index = await loadThreadsIndex(storageRoot);
+  index[newThreadId] = stepHash;
+  await saveThreadsIndex(storageRoot, index);
+
+  return {
+    thread: newThreadId,
+    forkedFrom: {
+      step: stepHash,
+    },
+  };
+}
+
+/**
+ * Load and validate step detail node from CAS store
+ */
+function loadStepDetail(store: BootstrapCapableStore, detailRef: CasRef): Record<string, unknown> {
+  const detailNode = store.get(detailRef);
+  if (detailNode === null) {
+    fail(`detail node not found: ${detailRef}`);
+  }
+  return detailNode.payload as Record<string, unknown>;
+}
+
+/**
+ * Load all turn nodes from CAS store and extract content
+ */
+function loadTurnData(store: BootstrapCapableStore, turns: unknown): TurnData[] {
+  if (!Array.isArray(turns) || turns.length === 0) {
+    return [];
+  }
+
+  const turnData: TurnData[] = [];
+  for (const turnRef of turns) {
+    if (typeof turnRef !== "string") {
+      continue;
+    }
+    const turnNode = store.get(turnRef as CasRef);
+    if (turnNode === null) {
+      continue;
+    }
+    const turn = turnNode.payload as Record<string, unknown>;
+    if (typeof turn.content === "string") {
+      turnData.push({
+        index: typeof turn.index === "number" ? turn.index : turnData.length,
+        content: turn.content,
+      });
+    }
+  }
+  return turnData;
+}
+
+/**
+ * Select turns that fit within quota, working backwards from most recent
+ */
+function selectTurnsForQuota(turnData: TurnData[], availableQuota: number): TurnData[] {
+  const selectedTurns: TurnData[] = [];
+  let totalChars = 0;
+
+  for (let i = turnData.length - 1; i >= 0; i--) {
+    const turn = turnData[i];
+    if (turn === undefined) continue;
+
+    const turnHeader = `## Turn ${turn.index + 1}\n\n`;
+    const turnBlock = turnHeader + turn.content;
+    const separatorCost = selectedTurns.length > 0 ? 2 : 0;
+    const addCost = turnBlock.length + separatorCost;
+
+    if (totalChars + addCost > availableQuota && selectedTurns.length > 0) {
+      break;
+    }
+
+    selectedTurns.unshift(turn);
+    totalChars += addCost;
+  }
+
+  return selectedTurns;
+}
+
+/**
+ * Assemble final markdown output from header and selected turns
+ */
+function formatStepMarkdown(
+  stepHash: CasRef,
+  role: string,
+  agent: string,
+  turnData: TurnData[],
+  selectedTurns: TurnData[],
+): string {
+  const parts: string[] = [];
+  parts.push(`# Step ${stepHash}`);
+  parts.push("");
+  parts.push(`**Role:** ${role}`);
+  parts.push(`**Agent:** ${agent}`);
+
+  if (selectedTurns.length === 0) {
+    return parts.join("\n");
+  }
+
+  const skippedCount = turnData.length - selectedTurns.length;
+  if (skippedCount > 0) {
+    parts.push("");
+    parts.push(`_[Earlier turns omitted due to quota. Use --quota to increase.]_`);
+  }
+
+  for (const turn of selectedTurns) {
+    parts.push("");
+    parts.push(`## Turn ${turn.index + 1}`);
+    parts.push("");
+    parts.push(turn.content);
+  }
+
+  return parts.join("\n");
+}
+
+/**
+ * Read a step's agent turns as human-readable markdown with quota enforcement
+ */
+export async function cmdStepRead(
+  storageRoot: string,
+  stepHash: CasRef,
+  quota: number,
+): Promise<string> {
+  const uwf = await createUwfStore(storageRoot);
+  const node = uwf.store.get(stepHash);
+  if (node === null) {
+    fail(`CAS node not found: ${stepHash}`);
+  }
+  if (node.type !== uwf.schemas.stepNode) {
+    fail(`node ${stepHash} is not a StepNode`);
+  }
+  const payload = node.payload as StepNodePayload;
+
+  if (payload.detail === null) {
+    return formatStepMarkdown(stepHash, payload.role, payload.agent, [], []);
+  }
+
+  const detail = loadStepDetail(uwf.store, payload.detail);
+  const turnData = loadTurnData(uwf.store, detail.turns);
+
+  if (turnData.length === 0) {
+    return formatStepMarkdown(stepHash, payload.role, payload.agent, [], []);
+  }
+
+  const headerSection = formatStepMarkdown(stepHash, payload.role, payload.agent, [], []);
+  const BUFFER = 200;
+  const availableQuota = quota - headerSection.length - BUFFER;
+  const selectedTurns = selectTurnsForQuota(turnData, availableQuota);
+
+  return formatStepMarkdown(stepHash, payload.role, payload.agent, turnData, selectedTurns);
+}
@@ -0,0 +1,23 @@
+/**
+ * Parse time input: ISO date (YYYY-MM-DD, YYYY-MM-DDTHH:MM:SS) or relative (7d, 24h, 30m)
+ * Returns Unix timestamp in milliseconds.
+ */
+export function parseTimeInput(input: string, nowMs: number): number {
+  const trimmed = input.trim();
+
+  // Relative time: 7d, 24h, 30m
+  const relativeMatch = /^(\d+)(d|h|m)$/.exec(trimmed);
+  if (relativeMatch !== null) {
+    const value = Number.parseInt(relativeMatch[1], 10);
+    const unit = relativeMatch[2];
+    const multiplier = unit === "d" ? 86400000 : unit === "h" ? 3600000 : 60000;
+    return nowMs - value * multiplier;
+  }
+
+  // ISO date: try parsing
+  const parsed = Date.parse(trimmed);
+  if (Number.isNaN(parsed)) {
+    throw new Error(`invalid time format: ${trimmed} (expected ISO date or relative like '7d')`);
+  }
+  return parsed;
+}
@@ -2,12 +2,7 @@ import { readFile } from "node:fs/promises";

 import type { JSONSchema } from "@uncaged/json-cas";
 import { putSchema, validate } from "@uncaged/json-cas";
-import type {
-  CasRef,
-  RoleDefinition,
-  Transition,
-  WorkflowPayload,
-} from "@uncaged/workflow-protocol";
+import type { CasRef, RoleDefinition, Target, WorkflowPayload } from "@uncaged/workflow-protocol";
 import { parse } from "yaml";

 import {
@@ -20,6 +15,7 @@ import {
  type UwfStore,
 } from "../store.js";
 import { checkWorkflowFilenameConsistency, parseWorkflowPayload } from "../validate.js";
+import { validateWorkflow } from "../validate-semantic.js";

 export type WorkflowOrigin = "local" | "global";

@@ -29,7 +25,7 @@ export type WorkflowListEntry = {
  origin: WorkflowOrigin;
 };

-export type WorkflowPutOutput = {
+export type WorkflowAddOutput = {
  name: string;
  hash: CasRef;
 };
@@ -51,14 +47,23 @@ function isJsonSchema(value: unknown): value is JSONSchema {
  return typeof value === "object" && value !== null && !Array.isArray(value);
 }

-/** Normalize graph transitions: ensure condition is null (not undefined) for fallback entries. */
-function normalizeGraph(graph: Record<string, Transition[]>): Record<string, Transition[]> {
-  const result: Record<string, Transition[]> = {};
-  for (const [node, transitions] of Object.entries(graph)) {
-    result[node] = transitions.map((t) => ({
-      role: t.role,
-      condition: t.condition ?? null,
-    }));
+/** Normalize graph: validate each status → target mapping. */
+function normalizeGraph(
+  graph: Record<string, Record<string, Target>>,
+): Record<string, Record<string, Target>> {
+  const result: Record<string, Record<string, Target>> = {};
+  for (const [node, statusMap] of Object.entries(graph)) {
+    const normalized: Record<string, Target> = {};
+    for (const [status, target] of Object.entries(statusMap)) {
+      if (typeof target.prompt !== "string" || target.prompt.trim() === "") {
+        fail(`graph[${node}][${status}] → "${target.role}": prompt is required (non-empty string)`);
+      }
+      normalized[status] = {
+        role: target.role,
+        prompt: target.prompt,
+      };
+    }
+    result[node] = normalized;
  }
  return result;
 }
@@ -100,15 +105,14 @@ export async function materializeWorkflowPayload(
    name: raw.name,
    description: raw.description,
    roles,
-    conditions: raw.conditions,
    graph: normalizeGraph(raw.graph),
  };
 }

-export async function cmdWorkflowPut(
+export async function cmdWorkflowAdd(
  storageRoot: string,
  filePath: string,
-): Promise<WorkflowPutOutput> {
+): Promise<WorkflowAddOutput> {
  let text: string;
  try {
    text = await readFile(filePath, "utf8");
@@ -133,6 +137,11 @@ export async function cmdWorkflowPut(
    fail(filenameError);
  }

+  const semanticErrors = validateWorkflow(payload);
+  if (semanticErrors.length > 0) {
+    fail(`workflow validation failed:\n${semanticErrors.map((e) => `  - ${e}`).join("\n")}`);
+  }
+
  const uwf = await createUwfStore(storageRoot);
  const materialized = await materializeWorkflowPayload(uwf, payload);

@@ -7,6 +7,6 @@ export function formatOutput(data: unknown, format: OutputFormat): string {
    case "json":
      return JSON.stringify(data);
    case "yaml":
-      return stringify(data).trimEnd();
+      return stringify(data, { aliasDuplicateObjects: false }).trimEnd();
  }
 }
@@ -0,0 +1,278 @@
+import type { WorkflowPayload } from "@uncaged/workflow-protocol";
+
+type SchemaObj = Record<string, unknown>;
+
+const RESERVED_NAMES = new Set(["$START", "$END"]);
+
+/** Extract mustache variable names from a prompt string. */
+function extractMustacheVars(prompt: string): string[] {
+  const vars: string[] = [];
+  const re = /\{\{\{?([^}]+)\}\}\}?/g;
+  let m: RegExpExecArray | null = re.exec(prompt);
+  while (m !== null) {
+    vars.push(m[1]);
+    m = re.exec(prompt);
+  }
+  return vars;
+}
+
+/** Check if a frontmatter schema is a oneOf (multi-exit) type. */
+function isOneOfSchema(fm: unknown): fm is SchemaObj & { oneOf: SchemaObj[] } {
+  if (typeof fm !== "object" || fm === null) return false;
+  const obj = fm as SchemaObj;
+  return Array.isArray(obj.oneOf);
+}
+
+/** Get property names from a schema object. */
+function getPropertyNames(schema: SchemaObj): Set<string> {
+  const props = schema.properties;
+  if (typeof props !== "object" || props === null) return new Set();
+  return new Set(Object.keys(props as Record<string, unknown>));
+}
+
+/** Extract $status const values from oneOf variants. */
+function getOneOfStatuses(variants: SchemaObj[]): string[] {
+  const statuses: string[] = [];
+  for (const variant of variants) {
+    const props = variant.properties as Record<string, SchemaObj> | undefined;
+    if (props?.$status) {
+      const statusDef = props.$status;
+      if (typeof statusDef.const === "string") {
+        statuses.push(statusDef.const);
+      }
+    }
+  }
+  return statuses;
+}
+
+/** Check reserved names and role/graph reference integrity. */
+function checkRoleReferences(payload: WorkflowPayload, errors: string[]): void {
+  const roleNames = new Set(Object.keys(payload.roles));
+  const graphNodes = new Set(Object.keys(payload.graph));
+
+  for (const name of roleNames) {
+    if (RESERVED_NAMES.has(name)) {
+      errors.push(`reserved name "${name}" must not appear in roles`);
+    }
+  }
+
+  for (const node of graphNodes) {
+    if (!RESERVED_NAMES.has(node) && !roleNames.has(node)) {
+      errors.push(`graph references unknown role "${node}"`);
+    }
+  }
+
+  for (const name of roleNames) {
+    if (RESERVED_NAMES.has(name)) continue;
+    if (!graphNodes.has(name)) {
+      errors.push(`role "${name}" is defined but not referenced in graph`);
+    }
+  }
+}
+
+/** Check $START/$END constraints, edge targets, and reachability. */
+function checkGraphStructure(payload: WorkflowPayload, errors: string[]): void {
+  const roleNames = new Set(Object.keys(payload.roles));
+  const graphNodes = new Set(Object.keys(payload.graph));
+
+  if (!graphNodes.has("$START")) {
+    errors.push("$START must be defined in graph");
+  } else {
+    const startKeys = Object.keys(payload.graph.$START);
+    if (startKeys.length !== 1 || startKeys[0] !== "_") {
+      errors.push('$START must have exactly one edge with status "_"');
+    }
+  }
+
+  if (graphNodes.has("$END")) {
+    errors.push("$END must not have outgoing edges");
+  }
+
+  for (const [node, statusMap] of Object.entries(payload.graph)) {
+    for (const [status, target] of Object.entries(statusMap)) {
+      if (target.role !== "$END" && !roleNames.has(target.role)) {
+        errors.push(`edge ${node}→${status}: unknown target role "${target.role}"`);
+      }
+    }
+  }
+
+  checkReachability(roleNames, collectReachableRoles(payload.graph), errors);
+}
+
+/** BFS to collect all roles reachable from $START. */
+function collectReachableRoles(graph: WorkflowPayload["graph"]): Set<string> {
+  const reachable = new Set<string>();
+  const startEdges = graph.$START;
+  if (!startEdges) return reachable;
+
+  const queue: string[] = [];
+  for (const target of Object.values(startEdges)) {
+    if (target.role !== "$END" && !reachable.has(target.role)) {
+      reachable.add(target.role);
+      queue.push(target.role);
+    }
+  }
+
+  while (queue.length > 0) {
+    const current = queue.shift() as string;
+    const edges = graph[current];
+    if (!edges) continue;
+    for (const target of Object.values(edges)) {
+      if (target.role !== "$END" && !reachable.has(target.role)) {
+        reachable.add(target.role);
+        queue.push(target.role);
+      }
+    }
+  }
+
+  return reachable;
+}
+
+/** Check that all defined roles are reachable from $START. */
+function checkReachability(roleNames: Set<string>, reachable: Set<string>, errors: string[]): void {
+  for (const name of roleNames) {
+    if (RESERVED_NAMES.has(name)) continue;
+    if (!reachable.has(name)) {
+      errors.push(`role "${name}" is not reachable from $START`);
+    }
+  }
+}
+
+/** Check oneOf discriminant validity for a role. */
+function checkOneOfDiscriminant(
+  roleName: string,
+  variants: SchemaObj[],
+  statuses: string[],
+  errors: string[],
+): void {
+  if (statuses.length === variants.length) return;
+
+  let foundMissing = false;
+  for (const variant of variants) {
+    const props = variant.properties as Record<string, SchemaObj> | undefined;
+    if (!props?.$status) {
+      errors.push(`role "${roleName}": oneOf variants must have "$status" as const discriminant`);
+      foundMissing = true;
+      break;
+    }
+    if (typeof props.$status.const !== "string") {
+      errors.push(`role "${roleName}": oneOf variant $status must be a const value`);
+      foundMissing = true;
+      break;
+    }
+  }
+
+  if (!foundMissing) {
+    errors.push(`role "${roleName}": oneOf variant $status must be a const value`);
+  }
+}
+
+/** Check status-edge consistency for a multi-exit role. */
+function checkMultiExitEdges(
+  roleName: string,
+  graphKeys: Set<string>,
+  statusSet: Set<string>,
+  errors: string[],
+): void {
+  if (graphKeys.has("_")) {
+    errors.push(`role "${roleName}" is multi-exit but graph uses "_"`);
+    return;
+  }
+
+  const extraKeys = [...graphKeys].filter((k) => !statusSet.has(k));
+  const missingKeys = [...statusSet].filter((k) => !graphKeys.has(k));
+  if (extraKeys.length > 0) {
+    errors.push(`role "${roleName}" graph has extra status keys: ${extraKeys.join(", ")}`);
+  }
+  if (missingKeys.length > 0) {
+    errors.push(`role "${roleName}" graph is missing status keys: ${missingKeys.join(", ")}`);
+  }
+}
+
+/** Check mustache variables for multi-exit role. */
+function checkMultiExitMustache(
+  roleName: string,
+  graphEntry: Record<string, { role: string; prompt: string }>,
+  variants: SchemaObj[],
+  errors: string[],
+): void {
+  for (const [status, target] of Object.entries(graphEntry)) {
+    const vars = extractMustacheVars(target.prompt);
+    const variant = variants.find((v) => {
+      const props = v.properties as Record<string, SchemaObj> | undefined;
+      return props?.$status?.const === status;
+    });
+    if (!variant) continue;
+    const propNames = getPropertyNames(variant);
+    for (const v of vars) {
+      if (v === "$status") continue;
+      if (!propNames.has(v)) {
+        errors.push(`prompt variable "${v}" not found in role "${roleName}" variant "${status}"`);
+      }
+    }
+  }
+}
+
+/** Check status-edge consistency and mustache for each role. */
+function checkRoleConsistency(payload: WorkflowPayload, errors: string[]): void {
+  for (const [roleName, role] of Object.entries(payload.roles)) {
+    if (RESERVED_NAMES.has(roleName)) continue;
+    const graphEntry = payload.graph[roleName];
+    if (!graphEntry) continue;
+
+    const fm = role.frontmatter as unknown;
+    const graphKeys = new Set(Object.keys(graphEntry));
+
+    if (isOneOfSchema(fm)) {
+      const variants = fm.oneOf as SchemaObj[];
+      const statuses = getOneOfStatuses(variants);
+
+      checkOneOfDiscriminant(roleName, variants, statuses, errors);
+      checkMultiExitEdges(roleName, graphKeys, new Set(statuses), errors);
+      checkMultiExitMustache(roleName, graphEntry, variants, errors);
+    } else {
+      checkSingleExitRole(roleName, graphKeys, graphEntry, fm as SchemaObj | null, errors);
+    }
+  }
+}
+
+/** Check single-exit role status and mustache. */
+function checkSingleExitRole(
+  roleName: string,
+  graphKeys: Set<string>,
+  graphEntry: Record<string, { role: string; prompt: string }>,
+  fm: SchemaObj | null,
+  errors: string[],
+): void {
+  if (graphKeys.size > 1 || (graphKeys.size === 1 && !graphKeys.has("_"))) {
+    if (!graphKeys.has("_")) {
+      errors.push(`role "${roleName}" is single-exit but graph has no "_" key`);
+    } else {
+      errors.push(`role "${roleName}" is single-exit but has status keys other than "_"`);
+    }
+  }
+
+  const singleTarget = graphEntry._;
+  if (!singleTarget) return;
+
+  const vars = extractMustacheVars(singleTarget.prompt);
+  const propNames = fm ? getPropertyNames(fm) : new Set<string>();
+  for (const v of vars) {
+    if (v === "$status") continue;
+    if (!propNames.has(v)) {
+      errors.push(`prompt variable "${v}" not found in role "${roleName}" frontmatter`);
+    }
+  }
+}
+
+/**
+ * Validate a parsed WorkflowPayload for semantic correctness.
+ * Returns an array of error messages. Empty array = valid.
+ */
+export function validateWorkflow(payload: WorkflowPayload): string[] {
+  const errors: string[] = [];
+  checkRoleReferences(payload, errors);
+  checkGraphStructure(payload, errors);
+  checkRoleConsistency(payload, errors);
+  return errors;
+}
@@ -16,7 +16,9 @@ function isRoleDefinition(value: unknown): boolean {
    return false;
  }
  const frontmatter = value.frontmatter;
-  const frontmatterOk = isRecord(frontmatter) && typeof frontmatter.type === "string";
+  const frontmatterOk =
+    isRecord(frontmatter) &&
+    (typeof frontmatter.type === "string" || Array.isArray(frontmatter.oneOf));
  const capabilities = value.capabilities;
  const capabilitiesOk =
    Array.isArray(capabilities) && capabilities.every((c) => typeof c === "string");
@@ -30,21 +32,12 @@ function isRoleDefinition(value: unknown): boolean {
  );
 }

-function isConditionDefinition(value: unknown): boolean {
+function isTarget(value: unknown): boolean {
  if (!isRecord(value)) {
    return false;
  }
-  return typeof value.description === "string" && typeof value.expression === "string";
-}
-
-function isTransition(value: unknown): boolean {
-  if (!isRecord(value)) {
-    return false;
-  }
-  const condition = value.condition;
  return (
-    typeof value.role === "string" &&
-    (condition === null || condition === undefined || typeof condition === "string")
+    typeof value.role === "string" && typeof value.prompt === "string" && value.prompt.trim() !== ""
  );
 }

@@ -60,7 +53,7 @@ function isGraph(value: unknown): boolean {
    return false;
  }
  return Object.values(value).every(
-    (transitions) => Array.isArray(transitions) && transitions.every((t) => isTransition(t)),
+    (statusMap) => isRecord(statusMap) && Object.values(statusMap).every((t) => isTarget(t)),
  );
 }

@@ -99,11 +92,7 @@ export function parseWorkflowPayload(raw: unknown): WorkflowPayload | null {
  if (typeof raw.name !== "string" || typeof raw.description !== "string") {
    return null;
  }
-  if (
-    !isStringRecord(raw.roles, isRoleDefinition) ||
-    !isStringRecord(raw.conditions, isConditionDefinition) ||
-    !isGraph(raw.graph)
-  ) {
+  if (!isStringRecord(raw.roles, isRoleDefinition) || !isGraph(raw.graph)) {
    return null;
  }
  return raw as WorkflowPayload;
@@ -0,0 +1,141 @@
+# @uncaged/workflow-agent-builtin
+
+`uwf-builtin` agent — built-in LLM agent with file read/write and shell tools.
+
+## Overview
+
+Layer 3 agent implementation. Runs an OpenAI-compatible chat completion loop with built-in tools (`read_file`, `write_file`, `run_command`). Uses the configured provider/model from `config.yaml`. Produces frontmatter markdown output and stores turn-by-turn session detail in CAS.
+
+Useful when you want a self-contained agent without an external CLI like Hermes or Claude Code.
+
+**Dependencies:** `@uncaged/json-cas`, `@uncaged/workflow-agent-kit`, `@uncaged/workflow-util`
+
+## Installation
+
+Included as the `uwf-builtin` binary when you install `@uncaged/workflow-agent-builtin`:
+
+```bash
+bun add -g @uncaged/workflow-agent-builtin
+```
+
+## CLI Usage
+
+Invoked by `uwf thread step`:
+
+```bash
+uwf-builtin <thread-id> <role>
+```
+
+Configure as default agent:
+
+```bash
+uwf setup --agent builtin
+```
+
+Override per step:
+
+```bash
+uwf thread step <thread-id> --agent uwf-builtin
+```
+
+Environment variables set by the engine:
+
+| Variable | Purpose |
+|----------|---------|
+| `UWF_EDGE_PROMPT` | Moderator edge instruction for this step |
+
+## API
+
+All exports come from `src/index.ts`.
+
+### Agent factory
+
+```typescript
+function createBuiltinAgent(): () => Promise<void>
+function buildBuiltinMessages(ctx: AgentContext): ChatMessage[]
+```
+
+### LLM loop
+
+```typescript
+const BUILTIN_MAX_TURNS = 30;
+const BUILTIN_CONTINUE_MAX_TURNS = 5;
+
+function runBuiltinLoop(/* options: RunBuiltinLoopOptions */): Promise<RunBuiltinLoopResult>
+function chatCompletionWithTools(
+  provider: ResolvedLlmProvider,
+  messages: ChatMessage[],
+  tools: OpenAiToolDefinition[],
+): Promise<LlmAssistantResponse>
+```
+
+`RunBuiltinLoopOptions` and `RunBuiltinLoopResult` are internal to `loop.ts` and not re-exported from `index.ts`.
+
+### Tools
+
+```typescript
+function getBuiltinTools(): readonly BuiltinTool[]
+function executeBuiltinTool(
+  name: string,
+  args: Record<string, unknown>,
+  ctx: ToolContext,
+): Promise<string>
+```
+
+### Session and detail
+
+```typescript
+function initSessionDir(storageRoot: string): Promise<void>
+function appendSessionTurn(storageRoot: string, sessionId: string, turn: BuiltinTurnPayload): Promise<void>
+function readSessionTurns(storageRoot: string, sessionId: string): Promise<BuiltinTurnPayload[]>
+function removeSession(storageRoot: string, sessionId: string): Promise<void>
+function registerBuiltinSchemas(store: Store): Promise<BuiltinSchemaHashes>
+function storeBuiltinDetail(store: Store, payload: BuiltinDetailPayload): Promise<string>
+```
+
+### Types
+
+```typescript
+type ChatMessage = /* system | user | assistant | tool */;
+type LlmAssistantResponse = { content: string | null; toolCalls: LlmToolCall[] | null };
+type LlmToolCall = { id: string; name: string; arguments: string };
+type BuiltinTool = { name: string; description: string; parameters: Record<string, unknown> };
+type ToolContext = { cwd: string; storageRoot: string };
+type BuiltinDetailPayload = { /* session turns, model, timestamps */ };
+type BuiltinLoopTurn = { /* single loop iteration record */ };
+type BuiltinToolCallRecord = { /* tool call audit */ };
+type BuiltinToolResultRecord = { /* tool result audit */ };
+type BuiltinTurnPayload = { /* persisted turn */ };
+```
+
+## Internal Structure
+
+```
+src/
+├── index.ts
+├── cli.ts              Binary entrypoint
+├── agent.ts            createBuiltinAgent
+├── loop.ts             Multi-turn LLM + tool loop
+├── prompt.ts           buildBuiltinMessages
+├── session.ts          Session directory persistence
+├── detail.ts           CAS detail node storage
+├── schemas.ts          Builtin CAS schemas
+├── types.ts            Detail and turn payload types
+├── llm/
+│   ├── index.ts
+│   ├── llm.ts          chatCompletionWithTools
+│   └── types.ts        ChatMessage, LlmToolCall, etc.
+└── tools/
+    ├── index.ts        getBuiltinTools, executeBuiltinTool
+    ├── read-file.ts
+    ├── write-file.ts
+    ├── run-command.ts
+    ├── path.ts
+    └── types.ts
+```
+
+## Configuration
+
+Requires a configured OpenAI-compatible provider and model in `~/.uncaged/workflow/config.yaml` (via `uwf setup`). API keys are loaded from `~/.uncaged/workflow/.env`.
+
+Tools run with the current working directory as `ToolContext.cwd` (typically the directory where `uwf thread step` was invoked).
@@ -0,0 +1,16 @@
+import { describe, expect, test } from "bun:test";
+
+import type { LlmToolCall } from "../src/llm/types.js";
+
+/** Mirror OpenAI response shape for parser coverage via chatCompletionWithTools integration later. */
+describe("LlmToolCall shape", () => {
+  test("tool call record fields", () => {
+    const call: LlmToolCall = {
+      id: "call_1",
+      name: "read_file",
+      arguments: '{"path":"README.md"}',
+    };
+    expect(call.name).toBe("read_file");
+    expect(JSON.parse(call.arguments)).toEqual({ path: "README.md" });
+  });
+});
@@ -0,0 +1,256 @@
+import { beforeEach, describe, expect, mock, test } from "bun:test";
+
+const mockChatCompletionWithTools = mock(async () => ({
+  content: "---\nstatus: done\n---",
+  toolCalls: [],
+}));
+const mockAppendSessionTurn = mock(async () => {});
+const mockExecuteBuiltinTool = mock(async () => "tool-result");
+
+mock.module("../src/llm/index.js", () => ({
+  chatCompletionWithTools: mockChatCompletionWithTools,
+}));
+mock.module("../src/session.js", () => ({
+  appendSessionTurn: mockAppendSessionTurn,
+}));
+mock.module("../src/tools/index.js", () => ({
+  builtinToolsToOpenAi: () => [],
+  executeBuiltinTool: mockExecuteBuiltinTool,
+  getBuiltinTools: () => [],
+}));
+
+import {
+  executeTurnTools,
+  extractFinalText,
+  runBuiltinLoop,
+  shouldInjectDeadlineWarning,
+  shouldNudge,
+  shouldProcessToolCalls,
+} from "../src/loop.js";
+
+const fakeProvider = {} as any;
+const fakeToolCtx = {} as any;
+
+function makeOptions(overrides: Partial<Parameters<typeof runBuiltinLoop>[0]> = {}) {
+  return {
+    provider: fakeProvider,
+    messages: [{ role: "system" as const, content: "sys" }],
+    toolCtx: fakeToolCtx,
+    maxTurns: 5,
+    storageRoot: "/tmp",
+    sessionId: "sess",
+    noTools: false,
+    ...overrides,
+  };
+}
+
+beforeEach(() => {
+  mockChatCompletionWithTools.mockReset();
+  mockAppendSessionTurn.mockReset();
+  mockExecuteBuiltinTool.mockReset();
+});
+
+describe("shouldNudge", () => {
+  test("2.1 returns true when all conditions met", () => {
+    expect(shouldNudge({ noTools: false, text: "some text", turn: 0, maxTurns: 5 })).toBe(true);
+  });
+  test("2.2 returns false when noTools=true", () => {
+    expect(shouldNudge({ noTools: true, text: "some text", turn: 0, maxTurns: 5 })).toBe(false);
+  });
+  test("2.3 returns false when text starts with ---", () => {
+    expect(shouldNudge({ noTools: false, text: "---\nstatus: done", turn: 0, maxTurns: 5 })).toBe(
+      false,
+    );
+  });
+  test("2.4 returns false on last turn", () => {
+    expect(shouldNudge({ noTools: false, text: "some text", turn: 4, maxTurns: 5 })).toBe(false);
+  });
+  test("2.5 returns true on second-to-last turn", () => {
+    expect(shouldNudge({ noTools: false, text: "some text", turn: 3, maxTurns: 5 })).toBe(true);
+  });
+  test("2.6 leading whitespace before --- suppresses nudge", () => {
+    expect(shouldNudge({ noTools: false, text: "  ---\nstatus: done", turn: 0, maxTurns: 5 })).toBe(
+      false,
+    );
+  });
+});
+
+describe("executeTurnTools", () => {
+  test("4.1 executes each tool call and pushes tool result messages", async () => {
+    mockExecuteBuiltinTool.mockResolvedValue("result");
+    const messages: any[] = [];
+    const calls = [
+      { id: "c1", name: "tool_a", arguments: "{}" },
+      { id: "c2", name: "tool_b", arguments: "{}" },
+    ];
+    const count = await executeTurnTools(calls, fakeToolCtx, messages, "/tmp", "sess");
+    expect(messages.length).toBe(2);
+    expect(messages[0].role).toBe("tool");
+    expect(messages[1].role).toBe("tool");
+    expect(count).toBe(2);
+  });
+  test("4.2 tool result content matches executeBuiltinTool return value", async () => {
+    mockExecuteBuiltinTool.mockResolvedValue("result-A");
+    const messages: any[] = [];
+    await executeTurnTools(
+      [{ id: "c1", name: "read_file", arguments: "{}" }],
+      fakeToolCtx,
+      messages,
+      "/tmp",
+      "sess",
+    );
+    expect(messages[0].content).toBe("result-A");
+  });
+});
+
+describe("runBuiltinLoop integration", () => {
+  test("3.1 single text-only response returns finalText immediately", async () => {
+    mockChatCompletionWithTools.mockResolvedValue({
+      content: "---\nstatus: done\n---",
+      toolCalls: [],
+    });
+    const result = await runBuiltinLoop(makeOptions());
+    expect(result.finalText).toBe("---\nstatus: done\n---");
+    expect(result.turnCount).toBe(1);
+  });
+  test("3.2 noTools=true suppresses tool calls", async () => {
+    mockChatCompletionWithTools.mockResolvedValue({
+      content: "ok",
+      toolCalls: [{ id: "c1", name: "read_file", arguments: "{}" }],
+    });
+    const result = await runBuiltinLoop(makeOptions({ noTools: true }));
+    expect(result.finalText).toBe("ok");
+    expect(result.turnCount).toBe(1);
+  });
+  test("3.3 tool call followed by text response", async () => {
+    mockChatCompletionWithTools
+      .mockResolvedValueOnce({
+        content: null,
+        toolCalls: [{ id: "c1", name: "read_file", arguments: "{}" }],
+      })
+      .mockResolvedValueOnce({ content: "---\nstatus: done\n---", toolCalls: [] });
+    mockExecuteBuiltinTool.mockResolvedValue("file contents");
+    const result = await runBuiltinLoop(makeOptions());
+    expect(result.finalText).toBe("---\nstatus: done\n---");
+    expect(result.turnCount).toBe(3);
+  });
+  test("3.4 nudge cycle inserts nudge message", async () => {
+    mockChatCompletionWithTools
+      .mockResolvedValueOnce({ content: "I am thinking", toolCalls: [] })
+      .mockResolvedValueOnce({ content: "---\nstatus: done\n---", toolCalls: [] });
+    const result = await runBuiltinLoop(makeOptions());
+    expect(result.finalText).toBe("---\nstatus: done\n---");
+    const nudgeMsg = result.messages.find(
+      (m) =>
+        m.role === "user" && typeof m.content === "string" && m.content.includes("frontmatter"),
+    );
+    expect(nudgeMsg).toBeDefined();
+  });
+  test("3.5 maxTurns exhaustion falls back to last assistant content", async () => {
+    mockChatCompletionWithTools.mockResolvedValue({ content: "still thinking", toolCalls: [] });
+    const result = await runBuiltinLoop(makeOptions({ maxTurns: 3 }));
+    expect(result.finalText).toBe("still thinking");
+  });
+  test("3.6 original messages array is not mutated", async () => {
+    mockChatCompletionWithTools.mockResolvedValue({
+      content: "---\nstatus: done\n---",
+      toolCalls: [],
+    });
+    const original = [{ role: "system" as const, content: "sys" }];
+    await runBuiltinLoop(makeOptions({ messages: original }));
+    expect(original.length).toBe(1);
+  });
+});
+
+describe("shouldInjectDeadlineWarning", () => {
+  test("5.1 returns true when turn count reaches warning threshold and not yet warned", () => {
+    expect(shouldInjectDeadlineWarning(7, 10, false, false)).toBe(true);
+  });
+  test("5.2 returns false when already warned", () => {
+    expect(shouldInjectDeadlineWarning(7, 10, true, false)).toBe(false);
+  });
+  test("5.3 returns false when noTools is true", () => {
+    expect(shouldInjectDeadlineWarning(7, 10, false, true)).toBe(false);
+  });
+  test("5.4 returns false when turns remaining > DEADLINE_WARNING_TURNS", () => {
+    expect(shouldInjectDeadlineWarning(5, 10, false, false)).toBe(false);
+  });
+  test("5.5 returns true when exactly at warning threshold", () => {
+    expect(shouldInjectDeadlineWarning(7, 10, false, false)).toBe(true);
+  });
+  test("5.6 returns false when turns remaining is 0", () => {
+    expect(shouldInjectDeadlineWarning(10, 10, false, false)).toBe(false);
+  });
+});
+
+describe("shouldProcessToolCalls", () => {
+  test("6.1 returns true when toolCalls present and noTools=false", () => {
+    expect(shouldProcessToolCalls([{ id: "x", name: "read", arguments: "{}" }], false)).toBe(true);
+  });
+  test("6.2 returns false when toolCalls is null", () => {
+    expect(shouldProcessToolCalls(null, false)).toBe(false);
+  });
+  test("6.3 returns false when toolCalls is empty array", () => {
+    expect(shouldProcessToolCalls([], false)).toBe(false);
+  });
+  test("6.4 returns false when noTools=true", () => {
+    expect(shouldProcessToolCalls([{ id: "x", name: "read", arguments: "{}" }], true)).toBe(false);
+  });
+  test("6.5 returns true when multiple tool calls present", () => {
+    expect(
+      shouldProcessToolCalls(
+        [
+          { id: "x1", name: "read", arguments: "{}" },
+          { id: "x2", name: "write", arguments: "{}" },
+        ],
+        false,
+      ),
+    ).toBe(true);
+  });
+});
+
+describe("extractFinalText", () => {
+  test("7.1 returns last assistant message content", () => {
+    const messages = [
+      { role: "system" as const, content: "sys", tool_calls: null },
+      { role: "assistant" as const, content: "first", tool_calls: null },
+      { role: "assistant" as const, content: "last", tool_calls: null },
+    ];
+    expect(extractFinalText(messages)).toBe("last");
+  });
+  test("7.2 returns empty string when no assistant messages", () => {
+    expect(extractFinalText([{ role: "system" as const, content: "sys", tool_calls: null }])).toBe(
+      "",
+    );
+  });
+  test("7.3 skips assistant messages with null content", () => {
+    const messages = [
+      { role: "assistant" as const, content: "first", tool_calls: null },
+      {
+        role: "assistant" as const,
+        content: null,
+        tool_calls: [{ id: "x", name: "t", arguments: "{}" }],
+      },
+      { role: "assistant" as const, content: "second", tool_calls: null },
+    ];
+    expect(extractFinalText(messages)).toBe("second");
+  });
+  test("7.4 skips assistant messages with empty content", () => {
+    const messages = [
+      { role: "assistant" as const, content: "first", tool_calls: null },
+      { role: "assistant" as const, content: "", tool_calls: null },
+      { role: "user" as const, content: "nudge", tool_calls: null },
+    ];
+    expect(extractFinalText(messages)).toBe("first");
+  });
+  test("7.5 handles empty messages array", () => {
+    expect(extractFinalText([])).toBe("");
+  });
+  test("7.6 handles messages with only user and system roles", () => {
+    const messages = [
+      { role: "system" as const, content: "sys", tool_calls: null },
+      { role: "user" as const, content: "query", tool_calls: null },
+    ];
+    expect(extractFinalText(messages)).toBe("");
+  });
+});
@@ -0,0 +1,21 @@
+import { describe, expect, test } from "bun:test";
+import { resolve } from "node:path";
+import { resolvePath } from "../src/tools/path.js";
+
+describe("resolvePath", () => {
+  test("resolves relative paths against cwd", () => {
+    const root = "/workspace/project";
+    const resolved = resolvePath(root, "src/foo.ts");
+    expect(resolved).toBe(resolve(root, "src/foo.ts"));
+  });
+
+  test("resolves absolute paths as-is", () => {
+    const resolved = resolvePath("/workspace", "/etc/hosts");
+    expect(resolved).toBe("/etc/hosts");
+  });
+
+  test("resolves parent traversal normally", () => {
+    const resolved = resolvePath("/workspace/project", "../other/file.ts");
+    expect(resolved).toBe(resolve("/workspace/project", "../other/file.ts"));
+  });
+});
@@ -0,0 +1,236 @@
+import { describe, expect, test } from "bun:test";
+
+import type { AgentContext } from "@uncaged/workflow-agent-kit";
+
+import { buildBuiltinMessages } from "../src/prompt.js";
+
+function minimalContext(overrides: Partial<AgentContext> = {}): AgentContext {
+  return {
+    threadId: "00000000000000000000000000" as AgentContext["threadId"],
+    role: "developer",
+    store: {} as AgentContext["store"],
+    workflow: {
+      name: "test",
+      description: "test workflow",
+      roles: {
+        developer: {
+          description: "Developer role",
+          goal: "Ship the fix",
+          capabilities: ["file-edit"],
+          procedure: "Edit files",
+          output: "A patch",
+          frontmatter: "schema-hash",
+        },
+      },
+      conditions: {},
+      graph: {},
+    },
+    start: { workflow: "wf-hash", prompt: "Fix the bug" },
+    steps: [],
+    outputFormatInstruction: "---\nstatus: done\n---",
+    edgePrompt: "Implement the fix described in the plan.",
+    isFirstVisit: true,
+    ...overrides,
+  };
+}
+
+describe("buildBuiltinMessages", () => {
+  test("system includes output format and role goal", () => {
+    const messages = buildBuiltinMessages(minimalContext());
+    const system = messages[0];
+    expect(system?.role).toBe("system");
+    if (system?.role === "system") {
+      expect(system.content).toContain("status: done");
+      expect(system.content).toContain("## Goal");
+      expect(system.content).toContain("Ship the fix");
+    }
+  });
+
+  test("first visit produces system + single user message with edge prompt", () => {
+    const messages = buildBuiltinMessages(minimalContext());
+    expect(messages).toHaveLength(2);
+    expect(messages[1]?.role).toBe("user");
+    if (messages[1]?.role === "user") {
+      expect(messages[1].content).toContain("Implement the fix");
+      expect(messages[1].content).not.toContain("## What Happened Since Your Last Turn");
+    }
+  });
+
+  test("first visit with prior steps includes inter-step summary in final user message", () => {
+    const messages = buildBuiltinMessages(
+      minimalContext({
+        steps: [
+          {
+            role: "planner",
+            output: { plan: "step 1" },
+            agent: "uwf-builtin",
+            detail: "detail-hash",
+            edgePrompt: "Create a plan.",
+          },
+        ],
+      }),
+    );
+    expect(messages).toHaveLength(2);
+    const finalUser = messages[1];
+    if (finalUser?.role === "user") {
+      expect(finalUser.content).toContain("Implement the fix");
+      expect(finalUser.content).toContain("## What Happened Since Your Last Turn");
+      expect(finalUser.content).toContain("planner");
+    }
+  });
+
+  test("re-entry reconstructs prior user/assistant turns plus current user message", () => {
+    const messages = buildBuiltinMessages(
+      minimalContext({
+        isFirstVisit: false,
+        edgePrompt: "Fix the reviewer's feedback.",
+        steps: [
+          {
+            role: "developer",
+            output: { summary: "Initial fix" },
+            agent: "uwf-builtin",
+            detail: "detail-1",
+            edgePrompt: "Implement the fix.",
+          },
+          {
+            role: "reviewer",
+            output: { approved: false, comments: "Missing tests" },
+            agent: "uwf-builtin",
+            detail: "detail-2",
+            edgePrompt: "Review the implementation.",
+          },
+        ],
+      }),
+    );
+
+    expect(messages).toHaveLength(4);
+    expect(messages[0]?.role).toBe("system");
+    expect(messages[1]?.role).toBe("user");
+    expect(messages[2]?.role).toBe("assistant");
+    expect(messages[3]?.role).toBe("user");
+
+    if (messages[1]?.role === "user") {
+      expect(messages[1].content).toBe("Implement the fix.");
+    }
+    if (messages[2]?.role === "assistant") {
+      expect(messages[2].content).toBe(JSON.stringify({ summary: "Initial fix" }));
+    }
+    if (messages[3]?.role === "user") {
+      expect(messages[3].content).toContain("Fix the reviewer's feedback.");
+      expect(messages[3].content).toContain("## What Happened Since Your Last Turn");
+      expect(messages[3].content).toContain("reviewer");
+      expect(messages[3].content).toContain("Missing tests");
+    }
+  });
+
+  test("prefix is stable across re-entry for LLM cache hits", () => {
+    const firstVisitMessages = buildBuiltinMessages(
+      minimalContext({
+        edgePrompt: "Implement the fix.",
+        steps: [],
+      }),
+    );
+
+    const reEntryMessages = buildBuiltinMessages(
+      minimalContext({
+        isFirstVisit: false,
+        edgePrompt: "Fix the reviewer's feedback.",
+        steps: [
+          {
+            role: "developer",
+            output: { summary: "Initial fix" },
+            agent: "uwf-builtin",
+            detail: "detail-1",
+            edgePrompt: "Implement the fix.",
+          },
+          {
+            role: "reviewer",
+            output: { approved: false },
+            agent: "uwf-builtin",
+            detail: "detail-2",
+            edgePrompt: "Review the code.",
+          },
+        ],
+      }),
+    );
+
+    expect(reEntryMessages[0]).toEqual(firstVisitMessages[0]);
+    expect(reEntryMessages[1]).toEqual(firstVisitMessages[1]);
+    expect(reEntryMessages[2]?.role).toBe("assistant");
+    if (reEntryMessages[2]?.role === "assistant") {
+      expect(reEntryMessages[2].content).toBe(JSON.stringify({ summary: "Initial fix" }));
+    }
+    expect(reEntryMessages[3]?.role).toBe("user");
+    if (reEntryMessages[3]?.role === "user") {
+      expect(reEntryMessages[3].content).toContain("Fix the reviewer's feedback.");
+    }
+  });
+
+  test("multiple prior visits emit one user/assistant pair per visit", () => {
+    const messages = buildBuiltinMessages(
+      minimalContext({
+        isFirstVisit: false,
+        edgePrompt: "Third round fix.",
+        steps: [
+          {
+            role: "developer",
+            output: { round: 1 },
+            agent: "uwf-builtin",
+            detail: "d1",
+            edgePrompt: "First attempt.",
+          },
+          {
+            role: "reviewer",
+            output: { approved: false },
+            agent: "uwf-builtin",
+            detail: "d2",
+            edgePrompt: "Review round 1.",
+          },
+          {
+            role: "developer",
+            output: { round: 2 },
+            agent: "uwf-builtin",
+            detail: "d3",
+            edgePrompt: "Second attempt.",
+          },
+          {
+            role: "reviewer",
+            output: { approved: false },
+            agent: "uwf-builtin",
+            detail: "d4",
+            edgePrompt: "Review round 2.",
+          },
+        ],
+      }),
+    );
+
+    expect(messages).toHaveLength(6);
+    expect(messages.map((m) => m.role)).toEqual([
+      "system",
+      "user",
+      "assistant",
+      "user",
+      "assistant",
+      "user",
+    ]);
+
+    if (messages[1]?.role === "user") {
+      expect(messages[1].content).toBe("First attempt.");
+    }
+    if (messages[2]?.role === "assistant") {
+      expect(messages[2].content).toBe(JSON.stringify({ round: 1 }));
+    }
+    if (messages[3]?.role === "user") {
+      expect(messages[3].content).toContain("Second attempt.");
+      expect(messages[3].content).toContain("reviewer");
+    }
+    if (messages[4]?.role === "assistant") {
+      expect(messages[4].content).toBe(JSON.stringify({ round: 2 }));
+    }
+    if (messages[5]?.role === "user") {
+      expect(messages[5].content).toContain("Third round fix.");
+      expect(messages[5].content).toContain("### Step 4: reviewer");
+      expect(messages[5].content).toContain('"approved":false');
+    }
+  });
+});
@@ -0,0 +1,34 @@
+{
+  "name": "@uncaged/workflow-agent-builtin",
+  "version": "0.5.0",
+  "files": [
+    "src",
+    "dist",
+    "package.json"
+  ],
+  "type": "module",
+  "bin": {
+    "uwf-builtin": "./src/cli.ts"
+  },
+  "exports": {
+    ".": {
+      "bun": "./src/index.ts",
+      "types": "./dist/index.d.ts",
+      "import": "./dist/index.js"
+    }
+  },
+  "scripts": {
+    "test": "bun test"
+  },
+  "dependencies": {
+    "@uncaged/json-cas": "^0.4.0",
+    "@uncaged/workflow-agent-kit": "workspace:^",
+    "@uncaged/workflow-util": "workspace:^"
+  },
+  "devDependencies": {
+    "typescript": "^5.8.3"
+  },
+  "publishConfig": {
+    "access": "public"
+  }
+}
@@ -0,0 +1,158 @@
+import type { Store } from "@uncaged/json-cas";
+import {
+  type AgentContext,
+  type AgentRunResult,
+  createAgent,
+  loadWorkflowConfig,
+  resolveModel,
+  resolveStorageRoot,
+} from "@uncaged/workflow-agent-kit";
+import { createLogger, generateUlid } from "@uncaged/workflow-util";
+
+import { storeBuiltinDetail } from "./detail.js";
+import type { ChatMessage } from "./llm/index.js";
+import { BUILTIN_CONTINUE_MAX_TURNS, BUILTIN_MAX_TURNS, runBuiltinLoop } from "./loop.js";
+import { buildBuiltinMessages } from "./prompt.js";
+import { initSessionDir } from "./session.js";
+
+const log = createLogger({ sink: { kind: "stderr" } });
+
+const FRONTMATTER_FENCE = "---";
+
+/**
+ * Strip any text before the first `---` fence.
+ * LLMs sometimes emit preamble text before the frontmatter block.
+ */
+function stripPreamble(text: string): string {
+  if (text.startsWith(FRONTMATTER_FENCE)) {
+    return text;
+  }
+  const idx = text.indexOf(`\n${FRONTMATTER_FENCE}\n`);
+  if (idx !== -1) {
+    log("6GWRP3QX", `stripped ${idx + 1} chars of preamble before frontmatter`);
+    return text.slice(idx + 1);
+  }
+  return text;
+}
+
+type SessionRecord = {
+  sessionId: string;
+  model: string;
+  startedAtMs: number;
+  messages: ChatMessage[];
+};
+
+const sessions = new Map<string, SessionRecord>();
+
+function getSession(sessionId: string): SessionRecord {
+  const session = sessions.get(sessionId);
+  if (session === undefined) {
+    throw new Error(`builtin session not found: ${sessionId}`);
+  }
+  return session;
+}
+
+function buildToolContext(storageRoot: string): { cwd: string; storageRoot: string } {
+  return {
+    cwd: process.cwd(),
+    storageRoot,
+  };
+}
+
+async function runBuiltinWithMessages(
+  storageRoot: string,
+  provider: ReturnType<typeof resolveModel>,
+  messages: ChatMessage[],
+  session: SessionRecord,
+  store: Store,
+  maxTurns: number,
+  noTools: boolean,
+): Promise<AgentRunResult> {
+  const loopResult = await runBuiltinLoop({
+    provider,
+    messages,
+    toolCtx: buildToolContext(storageRoot),
+    maxTurns,
+    storageRoot,
+    sessionId: session.sessionId,
+    noTools,
+  });
+
+  session.messages = loopResult.messages;
+
+  if (loopResult.turnCount === 0) {
+    log("5RWTK9NB", "no turns produced, returning empty output");
+    return { output: "", detailHash: "", sessionId: session.sessionId };
+  }
+
+  // Read jsonl → persist turns to CAS → store detail
+  const { detailHash } = await storeBuiltinDetail(
+    store,
+    storageRoot,
+    session.sessionId,
+    session.model,
+    session.startedAtMs,
+  );
+
+  return { output: stripPreamble(loopResult.finalText), detailHash, sessionId: session.sessionId };
+}
+
+async function runBuiltin(ctx: AgentContext): Promise<AgentRunResult> {
+  const storageRoot = resolveStorageRoot();
+  const config = await loadWorkflowConfig(storageRoot);
+  const provider = resolveModel(config, config.defaultModel);
+
+  const sessionId = generateUlid(Date.now());
+  await initSessionDir(storageRoot);
+  const messages = buildBuiltinMessages(ctx);
+
+  const session: SessionRecord = {
+    sessionId,
+    model: provider.model,
+    startedAtMs: Date.now(),
+    messages,
+  };
+  sessions.set(sessionId, session);
+
+  return runBuiltinWithMessages(
+    storageRoot,
+    provider,
+    messages,
+    session,
+    ctx.store,
+    BUILTIN_MAX_TURNS,
+    false,
+  );
+}
+
+async function continueBuiltin(
+  sessionId: string,
+  message: string,
+  store: Store,
+): Promise<AgentRunResult> {
+  const session = getSession(sessionId);
+  const storageRoot = resolveStorageRoot();
+  const config = await loadWorkflowConfig(storageRoot);
+  const provider = resolveModel(config, config.defaultModel);
+
+  const messages: ChatMessage[] = [...session.messages, { role: "user", content: message }];
+
+  return runBuiltinWithMessages(
+    storageRoot,
+    provider,
+    messages,
+    session,
+    store,
+    BUILTIN_CONTINUE_MAX_TURNS,
+    true,
+  );
+}
+
+/** Agent CLI factory: built-in LLM loop with file/shell tools. */
+export function createBuiltinAgent(): () => Promise<void> {
+  return createAgent({
+    name: "builtin",
+    run: runBuiltin,
+    continue: continueBuiltin,
+  });
+}
@@ -0,0 +1,6 @@
+#!/usr/bin/env bun
+
+import { createBuiltinAgent } from "./agent.js";
+
+const main = createBuiltinAgent();
+void main();
@@ -0,0 +1,49 @@
+import { bootstrap, putSchema, type Store } from "@uncaged/json-cas";
+
+import { BUILTIN_DETAIL_SCHEMA, BUILTIN_TURN_SCHEMA } from "./schemas.js";
+import { readSessionTurns } from "./session.js";
+import type { BuiltinDetailPayload } from "./types.js";
+
+type BuiltinSchemaHashes = {
+  turn: string;
+  detail: string;
+};
+
+export async function registerBuiltinSchemas(store: Store): Promise<BuiltinSchemaHashes> {
+  await bootstrap(store);
+  const [turn, detail] = await Promise.all([
+    putSchema(store, BUILTIN_TURN_SCHEMA),
+    putSchema(store, BUILTIN_DETAIL_SCHEMA),
+  ]);
+  return { turn, detail };
+}
+
+/** Read session jsonl, persist each turn to CAS, return detail hash. */
+export async function storeBuiltinDetail(
+  store: Store,
+  storageRoot: string,
+  sessionId: string,
+  model: string,
+  startedAtMs: number,
+  nowMs: number = Date.now(),
+): Promise<{ detailHash: string; turnCount: number }> {
+  const schemas = await registerBuiltinSchemas(store);
+  const turns = await readSessionTurns(storageRoot, sessionId);
+
+  const turnHashes: string[] = [];
+  for (const turn of turns) {
+    const hash = await store.put(schemas.turn, turn);
+    turnHashes.push(hash);
+  }
+
+  const duration = Math.max(0, nowMs - startedAtMs);
+  const detail: BuiltinDetailPayload = {
+    sessionId,
+    model,
+    duration,
+    turnCount: turnHashes.length,
+    turns: turnHashes,
+  };
+  const detailHash = await store.put(schemas.detail, detail);
+  return { detailHash, turnCount: turnHashes.length };
+}
@@ -0,0 +1,16 @@
+export { createBuiltinAgent } from "./agent.js";
+export { registerBuiltinSchemas, storeBuiltinDetail } from "./detail.js";
+export type { ChatMessage, LlmAssistantResponse, LlmToolCall } from "./llm/index.js";
+export { chatCompletionWithTools } from "./llm/index.js";
+export { BUILTIN_CONTINUE_MAX_TURNS, BUILTIN_MAX_TURNS, runBuiltinLoop } from "./loop.js";
+export { buildBuiltinMessages } from "./prompt.js";
+export { appendSessionTurn, initSessionDir, readSessionTurns, removeSession } from "./session.js";
+export type { BuiltinTool, ToolContext } from "./tools/index.js";
+export { executeBuiltinTool, getBuiltinTools } from "./tools/index.js";
+export type {
+  BuiltinDetailPayload,
+  BuiltinLoopTurn,
+  BuiltinToolCallRecord,
+  BuiltinToolResultRecord,
+  BuiltinTurnPayload,
+} from "./types.js";
@@ -0,0 +1,7 @@
+export { chatCompletionWithTools } from "./llm.js";
+export type {
+  ChatMessage,
+  LlmAssistantResponse,
+  LlmToolCall,
+  OpenAiToolDefinition,
+} from "./types.js";
@@ -0,0 +1,139 @@
+import type { ResolvedLlmProvider } from "@uncaged/workflow-agent-kit";
+
+import type {
+  ChatMessage,
+  LlmAssistantResponse,
+  LlmToolCall,
+  OpenAiToolDefinition,
+} from "./types.js";
+
+function isRecord(value: unknown): value is Record<string, unknown> {
+  return typeof value === "object" && value !== null && !Array.isArray(value);
+}
+
+function chatUrl(baseUrl: string): string {
+  const trimmed = baseUrl.replace(/\/+$/, "");
+  return `${trimmed}/chat/completions`;
+}
+
+function parseToolCalls(raw: unknown): LlmToolCall[] | null {
+  if (!Array.isArray(raw) || raw.length === 0) {
+    return null;
+  }
+  const calls: LlmToolCall[] = [];
+  for (const entry of raw) {
+    if (!isRecord(entry)) {
+      continue;
+    }
+    const id = entry.id;
+    const fn = entry.function;
+    if (typeof id !== "string" || !isRecord(fn)) {
+      continue;
+    }
+    const name = fn.name;
+    const args = fn.arguments;
+    if (typeof name !== "string" || typeof args !== "string") {
+      continue;
+    }
+    calls.push({ id, name, arguments: args });
+  }
+  return calls.length > 0 ? calls : null;
+}
+
+function parseAssistantMessage(parsed: unknown): LlmAssistantResponse {
+  if (!isRecord(parsed)) {
+    throw new Error("LLM response is not an object");
+  }
+  const choices = parsed.choices;
+  if (!Array.isArray(choices) || choices.length === 0) {
+    throw new Error("LLM response has no choices");
+  }
+  const c0 = choices[0];
+  if (!isRecord(c0)) {
+    throw new Error("LLM choice is not an object");
+  }
+  const messageObj = c0.message;
+  if (!isRecord(messageObj)) {
+    throw new Error("LLM message is not an object");
+  }
+  const contentRaw = messageObj.content;
+  const content =
+    typeof contentRaw === "string"
+      ? contentRaw
+      : contentRaw === null || contentRaw === undefined
+        ? null
+        : null;
+  const toolCalls = parseToolCalls(messageObj.tool_calls);
+  return { content, toolCalls };
+}
+
+function serializeMessage(message: ChatMessage): Record<string, unknown> {
+  if (message.role === "tool") {
+    return {
+      role: "tool",
+      tool_call_id: message.tool_call_id,
+      content: message.content,
+    };
+  }
+  if (message.role === "assistant") {
+    const base: Record<string, unknown> = {
+      role: "assistant",
+      content: message.content,
+    };
+    if (message.tool_calls !== null && message.tool_calls.length > 0) {
+      base.tool_calls = message.tool_calls.map((call) => ({
+        id: call.id,
+        type: "function",
+        function: { name: call.name, arguments: call.arguments },
+      }));
+    }
+    return base;
+  }
+  return { role: message.role, content: message.content };
+}
+
+/** OpenAI-compatible chat completion with tool calling (non-streaming). */
+export async function chatCompletionWithTools(
+  provider: ResolvedLlmProvider,
+  messages: ChatMessage[],
+  tools: OpenAiToolDefinition[] | null,
+): Promise<LlmAssistantResponse> {
+  const body: Record<string, unknown> = {
+    model: provider.model,
+    messages: messages.map(serializeMessage),
+  };
+  if (tools !== null && tools.length > 0) {
+    body.tools = tools;
+    body.tool_choice = "auto";
+  }
+
+  let response: Response;
+  try {
+    response = await fetch(chatUrl(provider.baseUrl), {
+      method: "POST",
+      headers: {
+        Authorization: `Bearer ${provider.apiKey}`,
+        "Content-Type": "application/json",
+      },
+      body: JSON.stringify(body),
+    });
+  } catch (cause) {
+    const message = cause instanceof Error ? cause.message : String(cause);
+    throw new Error(`LLM network error: ${message}`);
+  }
+
+  const responseText = await response.text();
+  if (!response.ok) {
+    throw new Error(`LLM HTTP ${response.status}: ${responseText.slice(0, 2000)}`);
+  }
+
+  let parsed: unknown;
+  try {
+    parsed = JSON.parse(responseText) as unknown;
+  } catch (cause) {
+    const message = cause instanceof Error ? cause.message : String(cause);
+    throw new Error(`LLM invalid JSON response: ${message}`);
+  }
+
+  return parseAssistantMessage(parsed);
+}
@@ -0,0 +1,29 @@
+export type LlmToolCall = {
+  id: string;
+  name: string;
+  arguments: string;
+};
+
+export type LlmAssistantResponse = {
+  content: string | null;
+  toolCalls: LlmToolCall[] | null;
+};
+
+export type ChatMessage =
+  | { role: "system"; content: string }
+  | { role: "user"; content: string }
+  | {
+      role: "assistant";
+      content: string | null;
+      tool_calls: LlmToolCall[] | null;
+    }
+  | { role: "tool"; tool_call_id: string; content: string };
+
+export type OpenAiToolDefinition = {
+  type: "function";
+  function: {
+    name: string;
+    description: string;
+    parameters: Record<string, unknown>;
+  };
+};
@@ -0,0 +1,303 @@
+import type { ResolvedLlmProvider } from "@uncaged/workflow-agent-kit";
+import { createLogger } from "@uncaged/workflow-util";
+
+import {
+  type ChatMessage,
+  chatCompletionWithTools,
+  type LlmToolCall,
+  type OpenAiToolDefinition,
+} from "./llm/index.js";
+import { appendSessionTurn } from "./session.js";
+import {
+  builtinToolsToOpenAi,
+  executeBuiltinTool,
+  getBuiltinTools,
+  type ToolContext,
+} from "./tools/index.js";
+import type { BuiltinToolCall, BuiltinTurnPayload } from "./types.js";
+
+const log = createLogger({ sink: { kind: "stderr" } });
+
+export const BUILTIN_MAX_TURNS = 30;
+export const BUILTIN_CONTINUE_MAX_TURNS = 5;
+
+export type RunBuiltinLoopOptions = {
+  provider: ResolvedLlmProvider;
+  messages: ChatMessage[];
+  toolCtx: ToolContext;
+  maxTurns: number;
+  storageRoot: string;
+  sessionId: string;
+  /** When true, do not provide tools — force LLM to emit text only. */
+  noTools: boolean;
+};
+
+export type RunBuiltinLoopResult = {
+  finalText: string;
+  messages: ChatMessage[];
+  turnCount: number;
+};
+
+function mapToolCallsForPayload(calls: LlmToolCall[]): BuiltinToolCall[] {
+  return calls.map((call) => ({
+    name: call.name,
+    args: call.arguments,
+  }));
+}
+
+async function appendTurn(
+  storageRoot: string,
+  sessionId: string,
+  payload: BuiltinTurnPayload,
+): Promise<void> {
+  await appendSessionTurn(storageRoot, sessionId, payload);
+}
+
+export async function executeTurnTools(
+  calls: Array<{ id: string; name: string; arguments: string }>,
+  toolCtx: ToolContext,
+  messages: ChatMessage[],
+  storageRoot: string,
+  sessionId: string,
+): Promise<number> {
+  let turnCount = 0;
+  for (const call of calls) {
+    const result = await executeBuiltinTool(call.name, call.arguments, toolCtx);
+    messages.push({ role: "tool", tool_call_id: call.id, content: result });
+    await appendTurn(storageRoot, sessionId, {
+      role: "tool",
+      content: result,
+      toolCalls: null,
+      reasoning: null,
+    });
+    turnCount += 1;
+  }
+  return turnCount;
+}
+
+export type ShouldNudgeOptions = {
+  noTools: boolean;
+  text: string;
+  turn: number;
+  maxTurns: number;
+};
+
+const MAX_NUDGES = 3;
+const DEADLINE_WARNING_TURNS = 3;
+
+export function shouldInjectDeadlineWarning(
+  turn: number,
+  maxTurns: number,
+  alreadyWarned: boolean,
+  noTools: boolean,
+): boolean {
+  const turnsRemaining = maxTurns - turn;
+  return (
+    !noTools && !alreadyWarned && turnsRemaining > 0 && turnsRemaining <= DEADLINE_WARNING_TURNS
+  );
+}
+
+export function shouldProcessToolCalls(toolCalls: LlmToolCall[] | null, noTools: boolean): boolean {
+  return !noTools && toolCalls !== null && toolCalls.length > 0;
+}
+
+export function extractFinalText(messages: ChatMessage[]): string {
+  for (let i = messages.length - 1; i >= 0; i--) {
+    const msg = messages[i];
+    if (
+      msg !== undefined &&
+      msg.role === "assistant" &&
+      msg.content !== null &&
+      msg.content.trim() !== ""
+    ) {
+      return msg.content;
+    }
+  }
+  return "";
+}
+
+function injectDeadlineWarning(messages: ChatMessage[], turnsRemaining: number): void {
+  log("4NRXW6KT", `${turnsRemaining} turns remaining, injecting deadline warning`);
+  messages.push({
+    role: "user",
+    content:
+      `⚠️ You have ${turnsRemaining} turns remaining. ` +
+      "Wrap up your work and output the YAML frontmatter starting with `---`. " +
+      "If you cannot finish in time, output frontmatter with `status: failed` and describe what remains.",
+  });
+}
+
+type HandleTextOnlyTurnResult = {
+  shouldBreak: boolean;
+  finalText: string;
+  turnCount: number;
+  nudgeCount: number;
+  turnAdjustment: number;
+};
+
+async function handleTextOnlyTurn(
+  text: string,
+  messages: ChatMessage[],
+  storageRoot: string,
+  sessionId: string,
+  noTools: boolean,
+  turn: number,
+  maxTurns: number,
+  currentNudgeCount: number,
+): Promise<HandleTextOnlyTurnResult> {
+  await appendTurn(storageRoot, sessionId, {
+    role: "assistant",
+    content: text,
+    toolCalls: null,
+    reasoning: null,
+  });
+  const turnCount = 1;
+  let nudgeCount = currentNudgeCount;
+  let turnAdjustment = 0;
+
+  if (shouldNudge({ noTools, text, turn, maxTurns })) {
+    nudgeCount += 1;
+    log("7FXQM2KN", `text-only turn without frontmatter, nudge ${nudgeCount}/${MAX_NUDGES}`);
+    const nudge =
+      "You stopped calling tools but your response does not start with the required `---` YAML frontmatter. " +
+      "Either continue using tools to complete your work, or output your final response starting with `---`.";
+    messages.push({ role: "user", content: nudge });
+    // Nudge doesn't consume turn budget (up to MAX_NUDGES)
+    if (nudgeCount <= MAX_NUDGES) {
+      turnAdjustment = -1;
+    }
+    return { shouldBreak: false, finalText: "", turnCount, nudgeCount, turnAdjustment };
+  }
+
+  return { shouldBreak: true, finalText: text, turnCount, nudgeCount, turnAdjustment };
+}
+
+async function handleToolCallTurn(
+  content: string,
+  toolCalls: LlmToolCall[],
+  messages: ChatMessage[],
+  storageRoot: string,
+  sessionId: string,
+  toolCtx: ToolContext,
+): Promise<number> {
+  await appendTurn(storageRoot, sessionId, {
+    role: "assistant",
+    content,
+    toolCalls: mapToolCallsForPayload(toolCalls),
+    reasoning: null,
+  });
+  let turnCount = 1;
+
+  // Execute tools
+  turnCount += await executeTurnTools(toolCalls, toolCtx, messages, storageRoot, sessionId);
+
+  return turnCount;
+}
+
+export function shouldNudge({ noTools, text, turn, maxTurns }: ShouldNudgeOptions): boolean {
+  return !noTools && !text.trimStart().startsWith("---") && turn < maxTurns - 1;
+}
+
+type ProcessLoopIterationResult = {
+  shouldBreak: boolean;
+  finalText: string;
+  turnCount: number;
+  nudgeCount: number;
+  turnAdjustment: number;
+};
+
+async function processLoopIteration(
+  options: RunBuiltinLoopOptions,
+  messages: ChatMessage[],
+  openAiTools: OpenAiToolDefinition[],
+  turn: number,
+  nudgeCount: number,
+): Promise<ProcessLoopIterationResult> {
+  const response = await chatCompletionWithTools(
+    options.provider,
+    messages,
+    openAiTools.length > 0 ? openAiTools : null,
+  );
+
+  // When noTools is set, ignore any tool_calls the LLM might still return
+  const effectiveToolCalls = options.noTools ? null : (response.toolCalls ?? null);
+
+  const assistantMessage: ChatMessage = {
+    role: "assistant",
+    content: response.content,
+    tool_calls: effectiveToolCalls,
+  };
+  messages.push(assistantMessage);
+
+  if (!shouldProcessToolCalls(effectiveToolCalls, options.noTools)) {
+    const text = response.content ?? "";
+    const result = await handleTextOnlyTurn(
+      text,
+      messages,
+      options.storageRoot,
+      options.sessionId,
+      options.noTools,
+      turn,
+      options.maxTurns,
+      nudgeCount,
+    );
+    return result;
+  }
+
+  // At this point, effectiveToolCalls is guaranteed to be non-null and non-empty
+  const turnCount = await handleToolCallTurn(
+    response.content ?? "",
+    effectiveToolCalls as LlmToolCall[],
+    messages,
+    options.storageRoot,
+    options.sessionId,
+    options.toolCtx,
+  );
+
+  return {
+    shouldBreak: false,
+    finalText: "",
+    turnCount,
+    nudgeCount,
+    turnAdjustment: 0,
+  };
+}
+
+/** Agent run loop: LLM ↔ tools until no tool_calls or maxTurns. */
+export async function runBuiltinLoop(
+  options: RunBuiltinLoopOptions,
+): Promise<RunBuiltinLoopResult> {
+  const messages = [...options.messages];
+  const openAiTools = options.noTools ? [] : builtinToolsToOpenAi(getBuiltinTools());
+  let finalText = "";
+  let turnCount = 0;
+  let nudgeCount = 0;
+  let deadlineWarned = false;
+
+  for (let turn = 0; turn < options.maxTurns; turn++) {
+    log("8K2M4N7P", `builtin loop turn ${turn + 1}/${options.maxTurns}`);
+
+    // Warn agent when approaching turn limit
+    if (shouldInjectDeadlineWarning(turn, options.maxTurns, deadlineWarned, options.noTools)) {
+      deadlineWarned = true;
+      const turnsRemaining = options.maxTurns - turn;
+      injectDeadlineWarning(messages, turnsRemaining);
+    }
+
+    const result = await processLoopIteration(options, messages, openAiTools, turn, nudgeCount);
+    turnCount += result.turnCount;
+    nudgeCount = result.nudgeCount;
+    turn += result.turnAdjustment;
+
+    if (result.shouldBreak) {
+      finalText = result.finalText;
+      break;
+    }
+  }
+
+  if (finalText === "") {
+    finalText = extractFinalText(messages);
+  }
+
+  return { finalText, messages, turnCount };
+}
@@ -0,0 +1,115 @@
+import { type AgentContext, buildRolePrompt } from "@uncaged/workflow-agent-kit";
+
+import type { ChatMessage } from "./llm/index.js";
+
+type StepContext = AgentContext["steps"][number];
+
+function formatStep(step: StepContext, stepNumber: number): string {
+  return [
+    `### Step ${stepNumber}: ${step.role}`,
+    `Output: ${JSON.stringify(step.output)}`,
+    `Agent: ${step.agent}`,
+  ].join("\n");
+}
+
+function buildStepsSummary(steps: StepContext[], fromIndex: number, toIndex: number): string {
+  if (fromIndex >= toIndex) {
+    return "";
+  }
+
+  const lines: string[] = ["## What Happened Since Your Last Turn"];
+  for (let i = fromIndex; i < toIndex; i++) {
+    const step = steps[i];
+    if (step === undefined) {
+      continue;
+    }
+    lines.push("");
+    lines.push(formatStep(step, i + 1));
+  }
+  return lines.join("\n");
+}
+
+function buildUserTurnContent(edgePrompt: string, summary: string): string {
+  const parts: string[] = [];
+  if (edgePrompt !== "") {
+    parts.push(edgePrompt);
+  }
+  if (summary !== "") {
+    if (parts.length > 0) {
+      parts.push("");
+    }
+    parts.push(summary);
+  }
+  return parts.join("\n");
+}
+
+/**
+ * Reconstruct multi-turn chat messages from thread history for cache-friendly session resume.
+ *
+ * - system: role prompt + output format (stable prefix)
+ * - For each prior visit of this role: user (edgePrompt + inter-step summary) + assistant (output JSON)
+ * - Final user: current edgePrompt + summary since last visit of this role
+ */
+export function buildBuiltinMessages(ctx: AgentContext): ChatMessage[] {
+  const roleDef = ctx.workflow.roles[ctx.role];
+  const rolePrompt = roleDef !== undefined ? buildRolePrompt(roleDef) : "";
+  const systemParts: string[] = [];
+  if (ctx.outputFormatInstruction !== "") {
+    systemParts.push(ctx.outputFormatInstruction, "");
+  }
+  systemParts.push(rolePrompt);
+
+  systemParts.push(
+    "",
+    "## Workflow",
+    "",
+    `Your working directory is: ${process.cwd()}`,
+    "",
+    "You have tools available (read_file, write_file, run_command). " +
+      "Use them to complete your task — read files, run commands, make changes as needed. " +
+      "Your task is described in the user message below — do NOT use uwf or workflow CLI commands to discover your task. " +
+      "When you are done, output your final response with the YAML frontmatter block as specified above. " +
+      "Do NOT output the frontmatter until you have completed all necessary work. " +
+      "If you are running low on turns and cannot finish, output the frontmatter with `status: failed` and explain what remains in the body. " +
+      "CRITICAL: Your final output MUST start with the `---` fence on the very first line — " +
+      "no preamble text, no explanation before it. The parser requires `---` at position 0.",
+  );
+
+  const messages: ChatMessage[] = [{ role: "system", content: systemParts.join("\n") }];
+
+  const roleVisitIndices: number[] = [];
+  for (let i = 0; i < ctx.steps.length; i++) {
+    const step = ctx.steps[i];
+    if (step !== undefined && step.role === ctx.role) {
+      roleVisitIndices.push(i);
+    }
+  }
+
+  let prevVisitIndex = -1;
+  for (const visitIndex of roleVisitIndices) {
+    const visitStep = ctx.steps[visitIndex];
+    if (visitStep === undefined) {
+      continue;
+    }
+
+    const summary = buildStepsSummary(ctx.steps, prevVisitIndex + 1, visitIndex);
+    messages.push({
+      role: "user",
+      content: buildUserTurnContent(visitStep.edgePrompt, summary),
+    });
+    messages.push({
+      role: "assistant",
+      content: JSON.stringify(visitStep.output),
+      tool_calls: null,
+    });
+    prevVisitIndex = visitIndex;
+  }
+
+  const finalSummary = buildStepsSummary(ctx.steps, prevVisitIndex + 1, ctx.steps.length);
+  messages.push({
+    role: "user",
+    content: buildUserTurnContent(ctx.edgePrompt, finalSummary),
+  });
+
+  return messages;
+}
@@ -0,0 +1,45 @@
+import type { JSONSchema } from "@uncaged/json-cas";
+
+const BUILTIN_TOOL_CALL_SCHEMA: JSONSchema = {
+  type: "object",
+  required: ["name", "args"],
+  properties: {
+    name: { type: "string" },
+    args: { type: "string" },
+  },
+  additionalProperties: false,
+};
+
+export const BUILTIN_TURN_SCHEMA: JSONSchema = {
+  title: "builtin-turn",
+  type: "object",
+  required: ["role", "content"],
+  properties: {
+    role: { type: "string", enum: ["assistant", "tool"] },
+    content: { type: "string" },
+    toolCalls: {
+      anyOf: [{ type: "array", items: BUILTIN_TOOL_CALL_SCHEMA }, { type: "null" }],
+    },
+    reasoning: {
+      anyOf: [{ type: "string" }, { type: "null" }],
+    },
+  },
+  additionalProperties: false,
+};
+
+export const BUILTIN_DETAIL_SCHEMA: JSONSchema = {
+  title: "builtin-detail",
+  type: "object",
+  required: ["sessionId", "model", "duration", "turnCount", "turns"],
+  properties: {
+    sessionId: { type: "string" },
+    model: { type: "string" },
+    duration: { type: "integer" },
+    turnCount: { type: "integer" },
+    turns: {
+      type: "array",
+      items: { type: "string", format: "cas_ref" },
+    },
+  },
+  additionalProperties: false,
+};
@@ -0,0 +1,59 @@
+import { appendFile, mkdir, readFile, rm } from "node:fs/promises";
+import { join } from "node:path";
+
+import { createLogger } from "@uncaged/workflow-util";
+
+import type { BuiltinTurnPayload } from "./types.js";
+
+const log = createLogger({ sink: { kind: "stderr" } });
+
+function sessionsDir(storageRoot: string): string {
+  return join(storageRoot, "sessions");
+}
+
+function sessionFile(storageRoot: string, sessionId: string): string {
+  return join(sessionsDir(storageRoot), `${sessionId}.jsonl`);
+}
+
+/** Ensure sessions directory exists. */
+export async function initSessionDir(storageRoot: string): Promise<void> {
+  await mkdir(sessionsDir(storageRoot), { recursive: true });
+}
+
+/** Append a turn to the session jsonl file. */
+export async function appendSessionTurn(
+  storageRoot: string,
+  sessionId: string,
+  turn: BuiltinTurnPayload,
+): Promise<void> {
+  const line = `${JSON.stringify(turn)}\n`;
+  await appendFile(sessionFile(storageRoot, sessionId), line, "utf-8");
+  log("3XQVN8KR", `session ${sessionId} appended ${turn.role} turn`);
+}
+
+/** Read all turns from session jsonl. Returns empty array if file does not exist. */
+export async function readSessionTurns(
+  storageRoot: string,
+  sessionId: string,
+): Promise<BuiltinTurnPayload[]> {
+  try {
+    const content = await readFile(sessionFile(storageRoot, sessionId), "utf-8");
+    const lines = content
+      .trim()
+      .split("\n")
+      .filter((l) => l.length > 0);
+    return lines.map((l) => JSON.parse(l) as BuiltinTurnPayload);
+  } catch {
+    return [];
+  }
+}
+
+/** Remove session jsonl file (called after detail is persisted to step CAS). */
+export async function removeSession(storageRoot: string, sessionId: string): Promise<void> {
+  try {
+    await rm(sessionFile(storageRoot, sessionId));
+    log("7FWDP2MJ", `session ${sessionId} removed`);
+  } catch {
+    // already gone — fine
+  }
+}
@@ -0,0 +1,44 @@
+import type { OpenAiToolDefinition } from "../llm/index.js";
+
+import { readFileTool } from "./read-file.js";
+import { runCommandTool } from "./run-command.js";
+import type { BuiltinTool, ToolContext } from "./types.js";
+import { writeFileTool } from "./write-file.js";
+
+export { resolvePath } from "./path.js";
+export type { BuiltinTool, ToolContext } from "./types.js";
+
+const BUILTIN_TOOLS: BuiltinTool[] = [readFileTool, writeFileTool, runCommandTool];
+
+export function getBuiltinTools(): readonly BuiltinTool[] {
+  return BUILTIN_TOOLS;
+}
+
+export function builtinToolsToOpenAi(tools: readonly BuiltinTool[]): OpenAiToolDefinition[] {
+  return tools.map((tool) => ({
+    type: "function",
+    function: {
+      name: tool.name,
+      description: tool.description,
+      parameters: tool.parameters as Record<string, unknown>,
+    },
+  }));
+}
+
+export async function executeBuiltinTool(
+  name: string,
+  argsJson: string,
+  ctx: ToolContext,
+): Promise<string> {
+  const tool = BUILTIN_TOOLS.find((t) => t.name === name);
+  if (tool === undefined) {
+    return `Error: unknown tool ${name}`;
+  }
+  let args: unknown;
+  try {
+    args = JSON.parse(argsJson) as unknown;
+  } catch {
+    return "Error: tool arguments must be valid JSON";
+  }
+  return tool.execute(args, ctx);
+}
@@ -0,0 +1,6 @@
+import { resolve } from "node:path";
+
+/** Resolve a path relative to the working directory. */
+export function resolvePath(cwd: string, inputPath: string): string {
+  return resolve(cwd, inputPath);
+}
@@ -0,0 +1,41 @@
+import { readFile, stat } from "node:fs/promises";
+import { resolvePath } from "./path.js";
+import type { BuiltinTool } from "./types.js";
+
+const MAX_READ_BYTES = 512 * 1024;
+
+function isRecord(value: unknown): value is Record<string, unknown> {
+  return typeof value === "object" && value !== null && !Array.isArray(value);
+}
+
+export const readFileTool: BuiltinTool = {
+  name: "read_file",
+  description: "Read a UTF-8 text file from the workspace.",
+  parameters: {
+    type: "object",
+    required: ["path"],
+    properties: {
+      path: { type: "string", description: "Relative or absolute path within the workspace." },
+    },
+    additionalProperties: false,
+  },
+  execute: async (args, ctx) => {
+    if (!isRecord(args) || typeof args.path !== "string") {
+      return "Error: path must be a string";
+    }
+    const resolved = resolvePath(ctx.cwd, args.path);
+    try {
+      const info = await stat(resolved);
+      if (!info.isFile()) {
+        return "Error: not a file";
+      }
+      if (info.size > MAX_READ_BYTES) {
+        return `Error: file exceeds ${MAX_READ_BYTES} byte limit`;
+      }
+      return await readFile(resolved, "utf8");
+    } catch (cause) {
+      const message = cause instanceof Error ? cause.message : String(cause);
+      return `Error: ${message}`;
+    }
+  },
+};
@@ -0,0 +1,95 @@
+import { spawn } from "node:child_process";
+import { resolvePath } from "./path.js";
+import type { BuiltinTool } from "./types.js";
+
+const COMMAND_TIMEOUT_MS = 60_000;
+const MAX_OUTPUT_CHARS = 32_000;
+
+function isRecord(value: unknown): value is Record<string, unknown> {
+  return typeof value === "object" && value !== null && !Array.isArray(value);
+}
+
+function truncate(text: string, maxChars: number): string {
+  if (text.length <= maxChars) {
+    return text;
+  }
+  return `${text.slice(0, maxChars)}\n...(truncated)`;
+}
+
+function runShell(
+  command: string,
+  cwd: string,
+): Promise<{ stdout: string; stderr: string; code: number }> {
+  return new Promise((resolve, reject) => {
+    const child = spawn(command, {
+      cwd,
+      env: process.env,
+      shell: true,
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+
+    let stdout = "";
+    let stderr = "";
+    child.stdout?.on("data", (chunk: Buffer) => {
+      stdout += chunk.toString();
+    });
+    child.stderr?.on("data", (chunk: Buffer) => {
+      stderr += chunk.toString();
+    });
+
+    const timer = setTimeout(() => {
+      child.kill("SIGTERM");
+    }, COMMAND_TIMEOUT_MS);
+
+    child.on("error", (cause) => {
+      clearTimeout(timer);
+      const message = cause instanceof Error ? cause.message : String(cause);
+      reject(new Error(message));
+    });
+
+    child.on("close", (code) => {
+      clearTimeout(timer);
+      resolve({ stdout, stderr, code: code ?? 1 });
+    });
+  });
+}
+
+export const runCommandTool: BuiltinTool = {
+  name: "run_command",
+  description: "Run a shell command. Output is truncated to 32KB.",
+  parameters: {
+    type: "object",
+    required: ["command"],
+    properties: {
+      command: { type: "string", description: "Shell command to execute." },
+      cwd: {
+        type: "string",
+        description: "Optional working directory relative to workspace root.",
+      },
+    },
+    additionalProperties: false,
+  },
+  execute: async (args, ctx) => {
+    if (!isRecord(args) || typeof args.command !== "string") {
+      return "Error: command must be a string";
+    }
+    let workDir = ctx.cwd;
+    if (args.cwd !== undefined && args.cwd !== null) {
+      if (typeof args.cwd !== "string") {
+        return "Error: cwd must be a string";
+      }
+      workDir = resolvePath(ctx.cwd, args.cwd);
+    }
+    try {
+      const { stdout, stderr, code } = await runShell(args.command, workDir);
+      const out = truncate(
+        `exit_code: ${code}\n--- stdout ---\n${stdout}\n--- stderr ---\n${stderr}`,
+        MAX_OUTPUT_CHARS,
+      );
+      return out;
+    } catch (cause) {
+      const message = cause instanceof Error ? cause.message : String(cause);
+      return `Error: ${message}`;
+    }
+  },
+};
@@ -0,0 +1,13 @@
+import type { JSONSchema } from "@uncaged/json-cas";
+
+export type ToolContext = {
+  cwd: string;
+  storageRoot: string;
+};
+
+export type BuiltinTool = {
+  name: string;
+  description: string;
+  parameters: JSONSchema;
+  execute: (args: unknown, ctx: ToolContext) => Promise<string>;
+};
@@ -0,0 +1,36 @@
+import { mkdir, writeFile } from "node:fs/promises";
+import { dirname } from "node:path";
+import { resolvePath } from "./path.js";
+import type { BuiltinTool } from "./types.js";
+
+function isRecord(value: unknown): value is Record<string, unknown> {
+  return typeof value === "object" && value !== null && !Array.isArray(value);
+}
+
+export const writeFileTool: BuiltinTool = {
+  name: "write_file",
+  description: "Write UTF-8 text to a file in the workspace (creates parent directories).",
+  parameters: {
+    type: "object",
+    required: ["path", "content"],
+    properties: {
+      path: { type: "string", description: "Relative or absolute path within the workspace." },
+      content: { type: "string", description: "File contents to write." },
+    },
+    additionalProperties: false,
+  },
+  execute: async (args, ctx) => {
+    if (!isRecord(args) || typeof args.path !== "string" || typeof args.content !== "string") {
+      return "Error: path and content must be strings";
+    }
+    const resolved = resolvePath(ctx.cwd, args.path);
+    try {
+      await mkdir(dirname(resolved), { recursive: true });
+      await writeFile(resolved, args.content, "utf8");
+      return `Wrote ${args.content.length} bytes to ${args.path}`;
+    } catch (cause) {
+      const message = cause instanceof Error ? cause.message : String(cause);
+      return `Error: ${message}`;
+    }
+  },
+};
@@ -0,0 +1,49 @@
+import type { ChatMessage } from "./llm/index.js";
+
+export type BuiltinToolCallRecord = {
+  id: string;
+  name: string;
+  args: string;
+};
+
+export type BuiltinToolResultRecord = {
+  toolCallId: string;
+  name: string;
+  content: string;
+};
+
+export type BuiltinLoopTurn = {
+  assistantContent: string | null;
+  toolCalls: BuiltinToolCallRecord[] | null;
+  toolResults: BuiltinToolResultRecord[] | null;
+};
+
+export type BuiltinSessionState = {
+  sessionId: string;
+  model: string;
+  startedAtMs: number;
+  messages: ChatMessage[];
+  turns: BuiltinLoopTurn[];
+};
+
+export type BuiltinTurnRole = "assistant" | "tool";
+
+export type BuiltinToolCall = {
+  name: string;
+  args: string;
+};
+
+export type BuiltinTurnPayload = {
+  role: BuiltinTurnRole;
+  content: string;
+  toolCalls: BuiltinToolCall[] | null;
+  reasoning: string | null;
+};
+
+export type BuiltinDetailPayload = {
+  sessionId: string;
+  model: string;
+  duration: number;
+  turnCount: number;
+  turns: string[];
+};
@@ -0,0 +1,9 @@
+{
+  "extends": "../../tsconfig.json",
+  "compilerOptions": {
+    "rootDir": "src",
+    "outDir": "dist"
+  },
+  "include": ["src"],
+  "references": [{ "path": "../workflow-agent-kit" }, { "path": "../workflow-util" }]
+}
@@ -0,0 +1,91 @@
+# @uncaged/workflow-agent-claude-code
+
+`uwf-claude-code` agent — spawns the Claude Code CLI and captures session detail.
+
+## Overview
+
+Layer 3 agent implementation. Spawns the `claude` CLI with a composed system prompt (role definition, task, prior steps, edge prompt). Parses stream or JSON stdout, caches session IDs for multi-turn continuation, and stores raw output plus structured detail in CAS.
+
+**Dependencies:** `@uncaged/json-cas`, `@uncaged/workflow-agent-kit`
+
+## Installation
+
+Included as the `uwf-claude-code` binary when you install `@uncaged/workflow-agent-claude-code`:
+
+```bash
+bun add -g @uncaged/workflow-agent-claude-code
+```
+
+Requires the `claude` CLI on `PATH`.
+
+## CLI Usage
+
+Invoked by `uwf thread step`:
+
+```bash
+uwf-claude-code <thread-id> <role>
+```
+
+Configure or override the agent:
+
+```bash
+uwf setup --agent claude-code
+uwf thread step <thread-id> --agent uwf-claude-code
+```
+
+Environment variables set by the engine:
+
+| Variable | Purpose |
+|----------|---------|
+| `UWF_EDGE_PROMPT` | Moderator edge instruction for this step |
+
+## API
+
+All exports come from `src/index.ts`.
+
+### Agent factory
+
+```typescript
+function createClaudeCodeAgent(): () => Promise<void>
+function buildClaudeCodePrompt(ctx: AgentContext): string
+```
+
+### Session detail
+
+```typescript
+function parseClaudeCodeStreamOutput(stdout: string): ClaudeCodeParsedResult | null
+function parseClaudeCodeJsonOutput(stdout: string): ClaudeCodeParsedResult | null
+function storeClaudeCodeDetail(
+  store: Store,
+  parsed: ClaudeCodeParsedResult,
+  sessionId: string,
+): Promise<string>
+function storeClaudeCodeRawOutput(store: Store, rawOutput: string): Promise<string>
+```
+
+## Usage (library)
+
+```typescript
+import { createClaudeCodeAgent, buildClaudeCodePrompt } from "@uncaged/workflow-agent-claude-code";
+
+const main = createClaudeCodeAgent();
+void main();
+```
+
+## Internal Structure
+
+```
+src/
+├── index.ts
+├── cli.ts              Binary entrypoint
+├── claude-code.ts      createClaudeCodeAgent, buildClaudeCodePrompt, spawn logic
+├── session-detail.ts   Parse stdout, store CAS detail nodes
+├── schemas.ts          Claude Code detail CAS schemas
+└── types.ts            ClaudeCodeParsedResult, message shapes
+```
+
+## Configuration
+
+Uses session caching from `@uncaged/workflow-agent-kit` (`getCachedSessionId` / `setCachedSessionId`). No separate config file — relies on the Claude Code CLI's own authentication.
+
+Maximum turns per invocation: 90 (constant in `claude-code.ts`).
@@ -0,0 +1,103 @@
+import { describe, expect, test } from "bun:test";
+import type { AgentContext } from "@uncaged/workflow-agent-kit";
+import type { ThreadId } from "@uncaged/workflow-protocol";
+import { buildClaudeCodePrompt } from "../src/claude-code.js";
+
+function makeCtx(overrides: Partial<AgentContext> = {}): AgentContext {
+  return {
+    threadId: "01JTEST0000000000000000000" as ThreadId,
+    edgePrompt: "Proceed with the assigned role.",
+    isFirstVisit: true,
+    workflow: {
+      roles: {
+        developer: {
+          description: "TDD implementation per test spec",
+          goal: "Write code",
+          capabilities: ["coding"],
+          procedure: "1. Read spec\n2. Write code",
+          output: "List files changed",
+          frontmatter: "",
+        },
+      },
+      conditions: {},
+      graph: {},
+    },
+    role: "developer",
+    start: { prompt: "Fix the bug", workflowHash: "abc123", threadId: "t1" },
+    steps: [],
+    store: {} as AgentContext["store"],
+    outputFormatInstruction: "Use YAML frontmatter",
+    ...overrides,
+  };
+}
+
+describe("buildClaudeCodePrompt", () => {
+  test("assembles outputFormatInstruction + role prompt + task prompt", () => {
+    const result = buildClaudeCodePrompt(makeCtx());
+    expect(result).toMatch(/^Use YAML frontmatter/);
+    expect(result).toContain("Write code");
+    expect(result).toContain("## Task\nFix the bug");
+  });
+
+  test("includes previous steps with content on first visit", () => {
+    const ctx = makeCtx({
+      steps: [
+        {
+          role: "planner",
+          output: '{"plan":"do X"}',
+          agent: "hermes",
+          detail: "detail-1",
+          edgePrompt: "Create a plan.",
+          content: "Here is my detailed plan for doing X.",
+        },
+      ],
+    });
+    const result = buildClaudeCodePrompt(ctx);
+    expect(result).toContain("## What Happened Since Your Last Turn");
+    expect(result).toContain("Step 1: planner");
+    expect(result).toContain("do X");
+    // First visit should include step content
+    expect(result).toContain("Here is my detailed plan for doing X.");
+  });
+
+  test("re-entry shows steps since last visit without content", () => {
+    const ctx = makeCtx({
+      isFirstVisit: false,
+      steps: [
+        {
+          role: "developer",
+          output: '{"status":"done"}',
+          agent: "claude-code",
+          detail: "detail-1",
+          edgePrompt: "Implement.",
+          content: "I implemented everything.",
+        },
+        {
+          role: "reviewer",
+          output: '{"approved":false}',
+          agent: "claude-code",
+          detail: "detail-2",
+          edgePrompt: "Review.",
+          content: "Rejected: complexity too high, refactor cmdStepRead.",
+        },
+      ],
+    });
+    const result = buildClaudeCodePrompt(ctx);
+    expect(result).toContain("## What Happened Since Your Last Turn");
+    expect(result).toContain("reviewer");
+    expect(result).toContain("approved");
+  });
+
+  test("omits history section when steps array is empty", () => {
+    const result = buildClaudeCodePrompt(makeCtx({ steps: [] }));
+    expect(result).not.toContain("## What Happened Since Your Last Turn");
+    expect(result).toContain("## Current Instruction");
+  });
+
+  test("works without outputFormatInstruction", () => {
+    const result = buildClaudeCodePrompt(makeCtx({ outputFormatInstruction: "" }));
+    expect(result).not.toMatch(/^\s*\n/);
+    expect(result).toContain("Write code");
+    expect(result).toContain("## Task");
+  });
+});
@@ -0,0 +1,314 @@
+import { describe, expect, test } from "bun:test";
+import { createMemoryStore, walk } from "@uncaged/json-cas";
+import {
+  parseClaudeCodeJsonOutput,
+  parseClaudeCodeStreamOutput,
+  storeClaudeCodeDetail,
+  storeClaudeCodeRawOutput,
+} from "../src/session-detail.js";
+import type { ClaudeCodeParsedResult } from "../src/types.js";
+
+describe("parseClaudeCodeJsonOutput", () => {
+  test("parses valid claude -p --output-format json output", () => {
+    const stdout = JSON.stringify({
+      type: "result",
+      subtype: "success",
+      result: "Done fixing bug",
+      session_id: "75e2167f-abc",
+      num_turns: 3,
+      total_cost_usd: 0.08,
+      duration_ms: 10276,
+      stop_reason: "end_turn",
+      usage: { input_tokens: 100, output_tokens: 50 },
+    });
+    const parsed = parseClaudeCodeJsonOutput(stdout);
+    expect(parsed).not.toBeNull();
+    expect(parsed!.type).toBe("result");
+    expect(parsed!.subtype).toBe("success");
+    expect(parsed!.result).toBe("Done fixing bug");
+    expect(parsed!.sessionId).toBe("75e2167f-abc");
+    expect(parsed!.numTurns).toBe(3);
+    expect(parsed!.totalCostUsd).toBe(0.08);
+    expect(parsed!.durationMs).toBe(10276);
+    expect(parsed!.stopReason).toBe("end_turn");
+    expect(parsed!.usage.inputTokens).toBe(100);
+    expect(parsed!.usage.outputTokens).toBe(50);
+    expect(parsed!.turns).toEqual([]);
+  });
+
+  test("returns null for non-JSON output", () => {
+    const parsed = parseClaudeCodeJsonOutput("Some random text\nwithout JSON");
+    expect(parsed).toBeNull();
+  });
+
+  test("returns null when session_id is missing", () => {
+    const stdout = JSON.stringify({ type: "result", result: "hi", subtype: "success" });
+    const parsed = parseClaudeCodeJsonOutput(stdout);
+    expect(parsed).toBeNull();
+  });
+});
+
+describe("parseClaudeCodeStreamOutput", () => {
+  test("parses stream-json output with turns", () => {
+    const lines = [
+      JSON.stringify({
+        type: "system",
+        subtype: "init",
+        session_id: "sess-123",
+        model: "claude-sonnet-4.5",
+        tools: ["Bash", "Read"],
+      }),
+      JSON.stringify({
+        type: "assistant",
+        message: {
+          role: "assistant",
+          content: [
+            { type: "text", text: "I'll list the files." },
+            { type: "tool_use", id: "tool_1", name: "Bash", input: { command: "ls" } },
+          ],
+        },
+        session_id: "sess-123",
+      }),
+      JSON.stringify({
+        type: "user",
+        message: {
+          role: "user",
+          content: [{ type: "tool_result", tool_use_id: "tool_1", content: "file1.ts\nfile2.ts" }],
+        },
+        session_id: "sess-123",
+      }),
+      JSON.stringify({
+        type: "assistant",
+        message: {
+          role: "assistant",
+          content: [{ type: "text", text: "There are 2 files." }],
+        },
+        session_id: "sess-123",
+      }),
+      JSON.stringify({
+        type: "result",
+        subtype: "success",
+        result: "There are 2 files.",
+        session_id: "sess-123",
+        num_turns: 2,
+        total_cost_usd: 0.05,
+        duration_ms: 5000,
+        stop_reason: "end_turn",
+        usage: {
+          input_tokens: 200,
+          output_tokens: 30,
+          cache_read_input_tokens: 100,
+          cache_creation_input_tokens: 0,
+        },
+      }),
+    ];
+    const stdout = lines.join("\n");
+    const parsed = parseClaudeCodeStreamOutput(stdout);
+
+    expect(parsed).not.toBeNull();
+    expect(parsed!.model).toBe("claude-sonnet-4.5");
+    expect(parsed!.sessionId).toBe("sess-123");
+    expect(parsed!.result).toBe("There are 2 files.");
+    expect(parsed!.stopReason).toBe("end_turn");
+    expect(parsed!.usage.inputTokens).toBe(200);
+    expect(parsed!.usage.outputTokens).toBe(30);
+    expect(parsed!.usage.cacheReadInputTokens).toBe(100);
+
+    // Turns: assistant(text+tool), tool_result, assistant(text)
+    expect(parsed!.turns).toHaveLength(3);
+    expect(parsed!.turns[0]!.role).toBe("assistant");
+    expect(parsed!.turns[0]!.content).toBe("I'll list the files.");
+    expect(parsed!.turns[0]!.toolCalls).toHaveLength(1);
+    expect(parsed!.turns[0]!.toolCalls![0]!.name).toBe("Bash");
+    expect(parsed!.turns[1]!.role).toBe("tool_result");
+    expect(parsed!.turns[1]!.content).toBe("file1.ts\nfile2.ts");
+    expect(parsed!.turns[2]!.role).toBe("assistant");
+    expect(parsed!.turns[2]!.content).toBe("There are 2 files.");
+    expect(parsed!.turns[2]!.toolCalls).toBeNull();
+  });
+
+  test("returns null when no result line", () => {
+    const stdout = JSON.stringify({ type: "system", model: "test" });
+    expect(parseClaudeCodeStreamOutput(stdout)).toBeNull();
+  });
+
+  test("skips invalid JSON lines gracefully", () => {
+    const lines = [
+      "not json",
+      JSON.stringify({
+        type: "result",
+        subtype: "success",
+        result: "ok",
+        session_id: "s1",
+        num_turns: 1,
+        total_cost_usd: 0.01,
+        duration_ms: 1000,
+        stop_reason: "end_turn",
+        usage: {},
+      }),
+    ];
+    const parsed = parseClaudeCodeStreamOutput(lines.join("\n"));
+    expect(parsed).not.toBeNull();
+    expect(parsed!.result).toBe("ok");
+    expect(parsed!.turns).toHaveLength(0);
+  });
+});
+
+describe("parseClaudeCodeStreamOutput — helper extraction", () => {
+  test("processSystemLine sets model from system message", () => {
+    const lines = [
+      JSON.stringify({ type: "system", model: "claude-opus-4" }),
+      JSON.stringify({
+        type: "result",
+        subtype: "success",
+        result: "ok",
+        session_id: "s1",
+        num_turns: 0,
+        total_cost_usd: 0,
+        duration_ms: 0,
+        stop_reason: "end_turn",
+      }),
+    ];
+    const parsed = parseClaudeCodeStreamOutput(lines.join("\n"));
+    expect(parsed).not.toBeNull();
+    expect(parsed!.model).toBe("claude-opus-4");
+  });
+
+  test("processAssistantLine skips empty content", () => {
+    const lines = [
+      JSON.stringify({ type: "assistant", message: { role: "assistant", content: [] } }),
+      JSON.stringify({
+        type: "result",
+        subtype: "success",
+        result: "ok",
+        session_id: "s1",
+        num_turns: 0,
+        total_cost_usd: 0,
+        duration_ms: 0,
+        stop_reason: "end_turn",
+      }),
+    ];
+    const parsed = parseClaudeCodeStreamOutput(lines.join("\n"));
+    expect(parsed).not.toBeNull();
+    expect(parsed!.turns).toHaveLength(0);
+  });
+
+  test("processUserLine skips when no tool_result items", () => {
+    const lines = [
+      JSON.stringify({
+        type: "user",
+        message: { role: "user", content: [{ type: "text", text: "hi" }] },
+      }),
+      JSON.stringify({
+        type: "result",
+        subtype: "success",
+        result: "ok",
+        session_id: "s1",
+        num_turns: 0,
+        total_cost_usd: 0,
+        duration_ms: 0,
+        stop_reason: "end_turn",
+      }),
+    ];
+    const parsed = parseClaudeCodeStreamOutput(lines.join("\n"));
+    expect(parsed).not.toBeNull();
+    expect(parsed!.turns).toHaveLength(0);
+  });
+
+  test("turn indices are sequential across mixed assistant and user lines", () => {
+    const lines = [
+      JSON.stringify({
+        type: "assistant",
+        message: { role: "assistant", content: [{ type: "text", text: "A" }] },
+      }),
+      JSON.stringify({
+        type: "user",
+        message: { role: "user", content: [{ type: "tool_result", content: "R" }] },
+      }),
+      JSON.stringify({
+        type: "assistant",
+        message: { role: "assistant", content: [{ type: "text", text: "B" }] },
+      }),
+      JSON.stringify({
+        type: "result",
+        subtype: "success",
+        result: "ok",
+        session_id: "s1",
+        num_turns: 3,
+        total_cost_usd: 0,
+        duration_ms: 0,
+        stop_reason: "end_turn",
+      }),
+    ];
+    const parsed = parseClaudeCodeStreamOutput(lines.join("\n"));
+    expect(parsed).not.toBeNull();
+    expect(parsed!.turns).toHaveLength(3);
+    expect(parsed!.turns.map((t) => t.index)).toEqual([0, 1, 2]);
+  });
+});
+
+describe("storeClaudeCodeDetail", () => {
+  const baseParsed: ClaudeCodeParsedResult = {
+    type: "result",
+    subtype: "success",
+    result: "The answer",
+    sessionId: "abc-123",
+    numTurns: 5,
+    totalCostUsd: 0.12,
+    durationMs: 15000,
+    model: "claude-sonnet-4.5",
+    stopReason: "end_turn",
+    usage: {
+      inputTokens: 100,
+      outputTokens: 50,
+      cacheReadInputTokens: 0,
+      cacheCreationInputTokens: 0,
+    },
+    turns: [
+      { index: 0, role: "assistant", content: "hello", toolCalls: null },
+      { index: 1, role: "tool_result", content: "world", toolCalls: null },
+    ],
+  };
+
+  test("stores detail with per-turn CAS nodes", async () => {
+    const store = createMemoryStore();
+    const { detailHash, output, sessionId } = await storeClaudeCodeDetail(store, baseParsed);
+
+    expect(detailHash).toHaveLength(13);
+    expect(output).toBe("The answer");
+    expect(sessionId).toBe("abc-123");
+
+    const node = await store.get(detailHash);
+    expect(node).not.toBeNull();
+    expect(node!.payload.model).toBe("claude-sonnet-4.5");
+    expect(node!.payload.stopReason).toBe("end_turn");
+    expect(node!.payload.usage.inputTokens).toBe(100);
+    expect(node!.payload.turns).toHaveLength(2);
+
+    // Verify turn CAS nodes
+    const turn0 = await store.get(node!.payload.turns[0]);
+    expect(turn0).not.toBeNull();
+    expect(turn0!.payload.role).toBe("assistant");
+    expect(turn0!.payload.content).toBe("hello");
+  });
+
+  test("detail node is walkable from root", async () => {
+    const store = createMemoryStore();
+    const { detailHash } = await storeClaudeCodeDetail(store, baseParsed);
+    const visited: string[] = [];
+    walk(store, detailHash, (hash) => visited.push(hash));
+    expect(visited.length).toBeGreaterThan(0);
+  });
+});
+
+describe("storeClaudeCodeRawOutput", () => {
+  test("stores raw text when JSON parsing fails", async () => {
+    const store = createMemoryStore();
+    const rawText = "Claude produced plain text without JSON";
+    const hash = await storeClaudeCodeRawOutput(store, rawText);
+    expect(hash).toHaveLength(13);
+    const node = await store.get(hash);
+    expect(node).not.toBeNull();
+    expect(node!.payload.text).toBe(rawText);
+  });
+});
@@ -0,0 +1,34 @@
+{
+  "name": "@uncaged/workflow-agent-claude-code",
+  "version": "0.1.0",
+  "files": [
+    "src",
+    "dist",
+    "package.json"
+  ],
+  "type": "module",
+  "bin": {
+    "uwf-claude-code": "./src/cli.ts"
+  },
+  "exports": {
+    ".": {
+      "bun": "./src/index.ts",
+      "types": "./dist/index.d.ts",
+      "import": "./dist/index.js"
+    }
+  },
+  "scripts": {
+    "test": "bun test"
+  },
+  "dependencies": {
+    "@uncaged/json-cas": "^0.4.0",
+    "@uncaged/workflow-agent-kit": "workspace:^",
+    "@uncaged/workflow-util": "workspace:^"
+  },
+  "devDependencies": {
+    "typescript": "^5.8.3"
+  },
+  "publishConfig": {
+    "access": "public"
+  }
+}
@@ -0,0 +1,187 @@
+import { spawn } from "node:child_process";
+import type { Store } from "@uncaged/json-cas";
+import {
+  type AgentContext,
+  type AgentRunResult,
+  buildContinuationPrompt,
+  buildRolePrompt,
+  createAgent,
+  getCachedSessionId,
+  setCachedSessionId,
+} from "@uncaged/workflow-agent-kit";
+import { createLogger } from "@uncaged/workflow-util";
+
+import { parseClaudeCodeStreamOutput, storeClaudeCodeDetail } from "./session-detail.js";
+
+const log = createLogger({ sink: { kind: "stderr" } });
+
+const CLAUDE_COMMAND = "claude";
+const CLAUDE_MAX_TURNS = 90;
+const CLAUDE_MODEL = process.env.CLAUDE_MODEL ?? null;
+
+/** Assemble system prompt, task, and prior step outputs for Claude Code. */
+export function buildClaudeCodePrompt(ctx: AgentContext): string {
+  const roleDef = ctx.workflow.roles[ctx.role];
+  const rolePrompt = roleDef !== undefined ? buildRolePrompt(roleDef) : "";
+  const parts: string[] = [];
+  if (ctx.outputFormatInstruction !== undefined && ctx.outputFormatInstruction !== "") {
+    parts.push(ctx.outputFormatInstruction, "");
+  }
+  parts.push(rolePrompt, "", "## Task", ctx.start.prompt);
+
+  if (!ctx.isFirstVisit) {
+    // Re-entry (session will be resumed): show only steps since last visit, meta only
+    parts.push("", buildContinuationPrompt(ctx.steps, ctx.role, ctx.edgePrompt));
+  } else if (ctx.steps.length > 0) {
+    // First visit: show all steps with content for recent ones
+    parts.push(
+      "",
+      buildContinuationPrompt(ctx.steps, ctx.role, ctx.edgePrompt, {
+        includeContent: true,
+        quota: 32000,
+      }),
+    );
+  } else {
+    parts.push("", "## Current Instruction", "", ctx.edgePrompt);
+  }
+
+  return parts.join("\n");
+}
+
+function spawnClaude(args: string[]): Promise<{ stdout: string; stderr: string }> {
+  return new Promise((resolve, reject) => {
+    const child = spawn(CLAUDE_COMMAND, args, {
+      env: process.env,
+      shell: false,
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+
+    let stdout = "";
+    let stderr = "";
+    child.stdout?.on("data", (chunk: Buffer) => {
+      stdout += chunk.toString();
+    });
+    child.stderr?.on("data", (chunk: Buffer) => {
+      stderr += chunk.toString();
+    });
+
+    child.on("error", (cause) => {
+      const message = cause instanceof Error ? cause.message : String(cause);
+      reject(new Error(`claude spawn failed: ${message}`));
+    });
+
+    child.on("close", (code) => {
+      if (code === 0) {
+        resolve({ stdout, stderr });
+        return;
+      }
+      const detail = stderr.trim() !== "" ? ` stderr=${stderr.trim()}` : "";
+      reject(new Error(`claude exited with code ${code ?? "null"}${detail}`));
+    });
+  });
+}
+
+function spawnClaudeRun(prompt: string): Promise<{ stdout: string; stderr: string }> {
+  const args = [
+    "-p",
+    prompt,
+    "--output-format",
+    "stream-json",
+    "--verbose",
+    "--dangerously-skip-permissions",
+    "--max-turns",
+    String(CLAUDE_MAX_TURNS),
+  ];
+  if (CLAUDE_MODEL !== null) {
+    args.push("--model", CLAUDE_MODEL);
+  }
+  return spawnClaude(args);
+}
+
+function spawnClaudeResume(
+  sessionId: string,
+  message: string,
+): Promise<{ stdout: string; stderr: string }> {
+  const args = [
+    "-p",
+    message,
+    "--resume",
+    sessionId,
+    "--output-format",
+    "stream-json",
+    "--verbose",
+    "--dangerously-skip-permissions",
+    "--max-turns",
+    String(CLAUDE_MAX_TURNS),
+  ];
+  if (CLAUDE_MODEL !== null) {
+    args.push("--model", CLAUDE_MODEL);
+  }
+  return spawnClaude(args);
+}
+
+async function processClaudeOutput(stdout: string, store: Store): Promise<AgentRunResult> {
+  const parsed = parseClaudeCodeStreamOutput(stdout);
+
+  if (parsed !== null) {
+    const { detailHash, output, sessionId } = await storeClaudeCodeDetail(store, parsed);
+    return { output, detailHash, sessionId };
+  }
+
+  throw new Error(
+    `Claude Code returned unparseable output (first 200 chars): ${stdout.slice(0, 200)}`,
+  );
+}
+
+async function runClaudeCode(ctx: AgentContext): Promise<AgentRunResult> {
+  const fullPrompt = buildClaudeCodePrompt(ctx);
+
+  log("K7R2M4N8", `prompt for role=${ctx.role} (length=${fullPrompt.length}):\n${fullPrompt}`);
+
+  // Try resuming a cached session for re-entry scenarios (e.g. reviewer reject → developer re-entry).
+  if (!ctx.isFirstVisit) {
+    const cachedSessionId = await getCachedSessionId("claude-code", ctx.threadId, ctx.role);
+    if (cachedSessionId !== null) {
+      try {
+        const { stdout } = await spawnClaudeResume(cachedSessionId, fullPrompt);
+        const result = await processClaudeOutput(stdout, ctx.store);
+        if (result.sessionId !== undefined && result.sessionId !== "") {
+          await setCachedSessionId("claude-code", ctx.threadId, ctx.role, result.sessionId);
+        }
+        return result;
+      } catch (err) {
+        log(
+          "5VKR8N3Q",
+          "resume failed for session %s, falling back to fresh run: %s",
+          cachedSessionId,
+          err,
+        );
+      }
+    }
+  }
+
+  const { stdout } = await spawnClaudeRun(fullPrompt);
+  const result = await processClaudeOutput(stdout, ctx.store);
+  if (result.sessionId !== undefined && result.sessionId !== "") {
+    await setCachedSessionId("claude-code", ctx.threadId, ctx.role, result.sessionId);
+  }
+  return result;
+}
+
+async function continueClaudeCode(
+  sessionId: string,
+  message: string,
+  store: Store,
+): Promise<AgentRunResult> {
+  const { stdout } = await spawnClaudeResume(sessionId, message);
+  return processClaudeOutput(stdout, store);
+}
+
+/** Agent CLI factory: parses argv, runs Claude Code, extracts output, writes StepNode. */
+export function createClaudeCodeAgent(): () => Promise<void> {
+  return createAgent({
+    name: "claude-code",
+    run: runClaudeCode,
+    continue: continueClaudeCode,
+  });
+}
@@ -0,0 +1,6 @@
+#!/usr/bin/env bun
+
+import { createClaudeCodeAgent } from "./claude-code.js";
+
+const main = createClaudeCodeAgent();
+void main();
@@ -0,0 +1,7 @@
+export { buildClaudeCodePrompt, createClaudeCodeAgent } from "./claude-code.js";
+export {
+  parseClaudeCodeJsonOutput,
+  parseClaudeCodeStreamOutput,
+  storeClaudeCodeDetail,
+  storeClaudeCodeRawOutput,
+} from "./session-detail.js";
@@ -0,0 +1,64 @@
+import type { JSONSchema } from "@uncaged/json-cas";
+
+export const CLAUDE_CODE_DETAIL_SCHEMA: JSONSchema = {
+  title: "claude-code-detail",
+  type: "object",
+  required: [
+    "sessionId",
+    "model",
+    "subtype",
+    "durationMs",
+    "numTurns",
+    "totalCostUsd",
+    "stopReason",
+    "usage",
+    "turns",
+  ],
+  properties: {
+    sessionId: { type: "string" },
+    model: { type: "string" },
+    subtype: { type: "string" },
+    durationMs: { type: "integer" },
+    numTurns: { type: "integer" },
+    totalCostUsd: { type: "number" },
+    stopReason: { type: "string" },
+    usage: {
+      type: "object",
+      properties: {
+        inputTokens: { type: "integer" },
+        outputTokens: { type: "integer" },
+        cacheReadInputTokens: { type: "integer" },
+        cacheCreationInputTokens: { type: "integer" },
+      },
+      required: ["inputTokens", "outputTokens", "cacheReadInputTokens", "cacheCreationInputTokens"],
+    },
+    turns: {
+      type: "array",
+      items: { type: "string", format: "cas_ref" },
+    },
+  },
+  additionalProperties: false,
+};
+
+export const CLAUDE_CODE_TURN_SCHEMA: JSONSchema = {
+  title: "claude-code-turn",
+  type: "object",
+  required: ["index", "role", "content", "toolCalls"],
+  properties: {
+    index: { type: "integer" },
+    role: { type: "string" },
+    content: { type: "string" },
+    toolCalls: {},
+  },
+  additionalProperties: false,
+};
+
+export const CLAUDE_CODE_RAW_OUTPUT_SCHEMA: JSONSchema = {
+  title: "claude-code-raw-output",
+  type: "object",
+  required: ["text"],
+  properties: {
+    text: { type: "string" },
+  },
+  additionalProperties: false,
+};
@@ -0,0 +1,263 @@
+import { bootstrap, putSchema, type Store } from "@uncaged/json-cas";
+
+import {
+  CLAUDE_CODE_DETAIL_SCHEMA,
+  CLAUDE_CODE_RAW_OUTPUT_SCHEMA,
+  CLAUDE_CODE_TURN_SCHEMA,
+} from "./schemas.js";
+import type {
+  ClaudeCodeDetailPayload,
+  ClaudeCodeParsedResult,
+  ClaudeCodeToolCall,
+  ClaudeCodeTurnPayload,
+} from "./types.js";
+
+function isRecord(value: unknown): value is Record<string, unknown> {
+  return typeof value === "object" && value !== null && !Array.isArray(value);
+}
+
+function safeNumber(v: unknown, fallback = 0): number {
+  return typeof v === "number" ? v : fallback;
+}
+
+function safeString(v: unknown, fallback = ""): string {
+  return typeof v === "string" ? v : fallback;
+}
+
+/**
+ * Extract tool calls from an assistant message content array.
+ */
+function extractToolCalls(content: unknown[]): ClaudeCodeToolCall[] {
+  const calls: ClaudeCodeToolCall[] = [];
+  for (const item of content) {
+    if (isRecord(item) && item.type === "tool_use" && typeof item.name === "string") {
+      calls.push({
+        name: item.name,
+        input: typeof item.input === "string" ? item.input : JSON.stringify(item.input ?? {}),
+      });
+    }
+  }
+  return calls;
+}
+
+/**
+ * Extract text content from a message content array.
+ */
+function extractTextContent(content: unknown[]): string {
+  const texts: string[] = [];
+  for (const item of content) {
+    if (isRecord(item) && item.type === "text" && typeof item.text === "string") {
+      texts.push(item.text);
+    }
+  }
+  return texts.join("\n");
+}
+
+/**
+ * Extract tool result content from a user message content array.
+ */
+function extractToolResultContent(content: unknown[]): string {
+  const results: string[] = [];
+  for (const item of content) {
+    if (isRecord(item) && item.type === "tool_result") {
+      const text = typeof item.content === "string" ? item.content : "";
+      results.push(text);
+    }
+  }
+  return results.join("\n");
+}
+
+type ParseState = {
+  turns: ClaudeCodeTurnPayload[];
+  resultLine: Record<string, unknown> | null;
+  model: string;
+  turnIndex: number;
+};
+
+function processSystemLine(parsed: Record<string, unknown>, state: ParseState): void {
+  if (typeof parsed.model === "string") {
+    state.model = parsed.model;
+  }
+}
+
+function processAssistantLine(parsed: Record<string, unknown>, state: ParseState): void {
+  if (!isRecord(parsed.message)) return;
+  const content = Array.isArray(parsed.message.content) ? parsed.message.content : [];
+  const textContent = extractTextContent(content as unknown[]);
+  const toolCalls = extractToolCalls(content as unknown[]);
+  if (textContent !== "" || toolCalls.length > 0) {
+    state.turns.push({
+      index: state.turnIndex++,
+      role: "assistant",
+      content: textContent,
+      toolCalls: toolCalls.length > 0 ? toolCalls : null,
+    });
+  }
+}
+
+function processUserLine(parsed: Record<string, unknown>, state: ParseState): void {
+  if (!isRecord(parsed.message)) return;
+  const content = Array.isArray(parsed.message.content) ? parsed.message.content : [];
+  const resultContent = extractToolResultContent(content as unknown[]);
+  if (resultContent !== "") {
+    state.turns.push({
+      index: state.turnIndex++,
+      role: "tool_result",
+      content: resultContent,
+      toolCalls: null,
+    });
+  }
+}
+
+function processLine(line: string, state: ParseState): void {
+  let parsed: unknown;
+  try {
+    parsed = JSON.parse(line);
+  } catch {
+    return;
+  }
+  if (!isRecord(parsed)) return;
+  const type = parsed.type;
+  if (type === "system") processSystemLine(parsed, state);
+  else if (type === "assistant") processAssistantLine(parsed, state);
+  else if (type === "user") processUserLine(parsed, state);
+  else if (type === "result") state.resultLine = parsed;
+}
+
+function assembleResult(state: ParseState): ClaudeCodeParsedResult | null {
+  if (state.resultLine === null) return null;
+  const sessionId = state.resultLine.session_id;
+  const result = state.resultLine.result;
+  const subtype = state.resultLine.subtype;
+  if (typeof sessionId !== "string" || typeof result !== "string" || typeof subtype !== "string") {
+    return null;
+  }
+  const usage = isRecord(state.resultLine.usage) ? state.resultLine.usage : {};
+  return {
+    type: safeString(state.resultLine.type, "result"),
+    subtype: subtype as ClaudeCodeParsedResult["subtype"],
+    result,
+    sessionId,
+    numTurns: safeNumber(state.resultLine.num_turns),
+    totalCostUsd: safeNumber(state.resultLine.total_cost_usd),
+    durationMs: safeNumber(state.resultLine.duration_ms),
+    model: state.model,
+    stopReason: safeString(state.resultLine.stop_reason),
+    usage: {
+      inputTokens: safeNumber(usage.input_tokens),
+      outputTokens: safeNumber(usage.output_tokens),
+      cacheReadInputTokens: safeNumber(usage.cache_read_input_tokens),
+      cacheCreationInputTokens: safeNumber(usage.cache_creation_input_tokens),
+    },
+    turns: state.turns,
+  };
+}
+
+/**
+ * Parse Claude Code stream-json (NDJSON) output.
+ * Each line is a JSON object with type: "system" | "assistant" | "user" | "result".
+ */
+export function parseClaudeCodeStreamOutput(stdout: string): ClaudeCodeParsedResult | null {
+  const lines = stdout.trim().split("\n");
+  const state: ParseState = { turns: [], resultLine: null, model: "", turnIndex: 0 };
+  for (const line of lines) {
+    processLine(line, state);
+  }
+  return assembleResult(state);
+}
+
+/**
+ * Legacy: parse Claude Code plain JSON output (non-streaming).
+ * Falls back when stream-json is not available.
+ */
+export function parseClaudeCodeJsonOutput(stdout: string): ClaudeCodeParsedResult | null {
+  let parsed: unknown;
+  try {
+    parsed = JSON.parse(stdout.trim());
+  } catch {
+    return null;
+  }
+
+  if (!isRecord(parsed)) return null;
+
+  const sessionId = parsed.session_id;
+  const result = parsed.result;
+  const subtype = parsed.subtype;
+
+  if (typeof sessionId !== "string" || typeof result !== "string" || typeof subtype !== "string") {
+    return null;
+  }
+
+  const usage = isRecord(parsed.usage) ? parsed.usage : {};
+
+  return {
+    type: safeString(parsed.type, "result"),
+    subtype: subtype as ClaudeCodeParsedResult["subtype"],
+    result,
+    sessionId,
+    numTurns: safeNumber(parsed.num_turns),
+    totalCostUsd: safeNumber(parsed.total_cost_usd),
+    durationMs: safeNumber(parsed.duration_ms),
+    model: "",
+    stopReason: safeString(parsed.stop_reason),
+    usage: {
+      inputTokens: safeNumber(usage.input_tokens),
+      outputTokens: safeNumber(usage.output_tokens),
+      cacheReadInputTokens: safeNumber(usage.cache_read_input_tokens),
+      cacheCreationInputTokens: safeNumber(usage.cache_creation_input_tokens),
+    },
+    turns: [],
+  };
+}
+
+type ClaudeCodeSchemaHashes = {
+  detail: string;
+  turn: string;
+  rawOutput: string;
+};
+
+async function registerSchemas(store: Store): Promise<ClaudeCodeSchemaHashes> {
+  await bootstrap(store);
+  const [detail, turn, rawOutput] = await Promise.all([
+    putSchema(store, CLAUDE_CODE_DETAIL_SCHEMA),
+    putSchema(store, CLAUDE_CODE_TURN_SCHEMA),
+    putSchema(store, CLAUDE_CODE_RAW_OUTPUT_SCHEMA),
+  ]);
+  return { detail, turn, rawOutput };
+}
+
+/** Store parsed Claude Code result with per-turn breakdown as CAS detail nodes. */
+export async function storeClaudeCodeDetail(
+  store: Store,
+  parsed: ClaudeCodeParsedResult,
+): Promise<{ detailHash: string; output: string; sessionId: string }> {
+  const schemas = await registerSchemas(store);
+
+  // Store each turn as an individual CAS node
+  const turnHashes: string[] = [];
+  for (const turn of parsed.turns) {
+    const hash = await store.put(schemas.turn, turn);
+    turnHashes.push(hash);
+  }
+
+  const detail: ClaudeCodeDetailPayload = {
+    sessionId: parsed.sessionId,
+    model: parsed.model,
+    subtype: parsed.subtype,
+    durationMs: parsed.durationMs,
+    numTurns: parsed.numTurns,
+    totalCostUsd: parsed.totalCostUsd,
+    stopReason: parsed.stopReason,
+    usage: parsed.usage,
+    turns: turnHashes,
+  };
+
+  const detailHash = await store.put(schemas.detail, detail);
+  return { detailHash, output: parsed.result, sessionId: parsed.sessionId };
+}
+
+/** Fallback: store raw text output when JSON parsing fails. */
+export async function storeClaudeCodeRawOutput(store: Store, rawOutput: string): Promise<string> {
+  const schemas = await registerSchemas(store);
+  return store.put(schemas.rawOutput, { text: rawOutput });
+}
@@ -0,0 +1,53 @@
+export type ClaudeCodeResultSubtype = "success" | "error_max_turns" | "error_budget";
+
+/** A single tool call within an assistant turn. */
+export type ClaudeCodeToolCall = {
+  name: string;
+  input: string;
+};
+
+/** A single turn (assistant text, tool use, or tool result). */
+export type ClaudeCodeTurnPayload = {
+  index: number;
+  role: "assistant" | "tool_result";
+  content: string;
+  toolCalls: ClaudeCodeToolCall[] | null;
+};
+
+/** Top-level detail stored as CAS node. */
+export type ClaudeCodeDetailPayload = {
+  sessionId: string;
+  model: string;
+  subtype: string;
+  durationMs: number;
+  numTurns: number;
+  totalCostUsd: number;
+  stopReason: string;
+  usage: {
+    inputTokens: number;
+    outputTokens: number;
+    cacheReadInputTokens: number;
+    cacheCreationInputTokens: number;
+  };
+  turns: string[]; // CAS hashes of ClaudeCodeTurnPayload
+};
+
+/** Intermediate parsed result from stream-json output. */
+export type ClaudeCodeParsedResult = {
+  type: string;
+  subtype: ClaudeCodeResultSubtype;
+  result: string;
+  sessionId: string;
+  numTurns: number;
+  totalCostUsd: number;
+  durationMs: number;
+  model: string;
+  stopReason: string;
+  usage: {
+    inputTokens: number;
+    outputTokens: number;
+    cacheReadInputTokens: number;
+    cacheCreationInputTokens: number;
+  };
+  turns: ClaudeCodeTurnPayload[];
+};
@@ -0,0 +1,6 @@
+{
+  "extends": "../../tsconfig.json",
+  "compilerOptions": { "rootDir": "src", "outDir": "dist" },
+  "include": ["src"],
+  "references": [{ "path": "../workflow-agent-kit" }]
+}
@@ -0,0 +1,90 @@
+# @uncaged/workflow-agent-hermes
+
+`uwf-hermes` agent — spawns Hermes chat via ACP and captures session detail.
+
+## Overview
+
+Layer 3 agent implementation. Wraps the Hermes CLI using the Agent Client Protocol (ACP). On first visit to a role it sends a composed prompt (role definition, task, history, edge prompt); on continuation it resumes the cached session. Session transcripts and raw output are stored as CAS detail nodes.
+
+**Dependencies:** `@uncaged/json-cas`, `@uncaged/workflow-agent-kit`, `@uncaged/workflow-protocol`, `@uncaged/workflow-util`
+
+## Installation
+
+Included as the `uwf-hermes` binary when you install `@uncaged/workflow-agent-hermes`:
+
+```bash
+bun add -g @uncaged/workflow-agent-hermes
+```
+
+Requires the `hermes` CLI on `PATH`.
+
+## CLI Usage
+
+Invoked by `uwf thread step` (not typically run directly):
+
+```bash
+uwf-hermes <thread-id> <role>
+```
+
+Environment variables set by the engine:
+
+| Variable | Purpose |
+|----------|---------|
+| `UWF_EDGE_PROMPT` | Moderator edge instruction for this step |
+
+Configure as the default agent via `uwf setup --agent hermes`.
+
+Override per step:
+
+```bash
+uwf thread step <thread-id> --agent uwf-hermes
+```
+
+## API
+
+All exports come from `src/index.ts`.
+
+### Agent factory
+
+```typescript
+function createHermesAgent(): () => Promise<void>
+function buildHermesPrompt(ctx: AgentContext): string
+```
+
+### ACP client
+
+```typescript
+class HermesAcpClient {
+  // Spawns hermes, handles JSON-RPC over stdio
+}
+```
+
+## Usage (library)
+
+```typescript
+import { createHermesAgent, buildHermesPrompt } from "@uncaged/workflow-agent-hermes";
+
+// CLI entry (src/cli.ts):
+const main = createHermesAgent();
+void main();
+```
+
+## Internal Structure
+
+```
+src/
+├── index.ts
+├── cli.ts              Binary entrypoint
+├── hermes.ts           createHermesAgent, buildHermesPrompt
+├── acp-client.ts       HermesAcpClient — ACP JSON-RPC over stdio
+├── session-cache.ts    Session ID cache (re-exports kit helpers + isResumeDisabled)
+├── session-detail.ts   Parse Hermes session JSON, store CAS detail nodes
+├── schemas.ts          Hermes detail CAS schemas
+└── types.ts            HermesSessionJson, HermesSessionMessage
+```
+
+## Configuration
+
+Uses workflow config from `~/.uncaged/workflow/config.yaml` (via agent-kit). Hermes session files are stored under the workflow storage root (see `session-detail.ts`).
+
+Set `UWF_HERMES_NO_RESUME=1` to disable session resume (see `isResumeDisabled` in `session-cache.ts`).
@@ -0,0 +1,168 @@
+import { afterEach, beforeEach, describe, expect, it } from "bun:test";
+
+import { HermesAcpClient } from "../src/acp-client.js";
+
+const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
+
+describe("handleSessionUpdate — helper extraction", () => {
+  let client: HermesAcpClient;
+
+  beforeEach(() => {
+    client = new HermesAcpClient();
+  });
+
+  afterEach(async () => {
+    await client.close();
+  });
+
+  it("agent_message_chunk accumulates text in messageChunks", () => {
+    (client as any).handleSessionUpdate({
+      sessionUpdate: "agent_message_chunk",
+      content: { type: "text", text: "hello" },
+    });
+    (client as any).handleSessionUpdate({
+      sessionUpdate: "agent_message_chunk",
+      content: { type: "text", text: " world" },
+    });
+    expect((client as any).messageChunks).toEqual(["hello", " world"]);
+  });
+
+  it("agent_thought_chunk accumulates reasoning in reasoningChunks", () => {
+    (client as any).handleSessionUpdate({
+      sessionUpdate: "agent_thought_chunk",
+      content: { type: "text", text: "thinking" },
+    });
+    expect((client as any).reasoningChunks).toEqual(["thinking"]);
+  });
+
+  it("tool_call registers a pending tool and flushes message chunks", () => {
+    (client as any).messageChunks = ["pre-tool text"];
+    (client as any).handleSessionUpdate({
+      sessionUpdate: "tool_call",
+      title: "Bash",
+      rawInput: { command: "ls" },
+      toolCallId: "tc-1",
+    });
+    expect((client as any).pendingTools.get("tc-1")).toEqual({
+      name: "Bash",
+      args: JSON.stringify({ command: "ls" }),
+    });
+    expect((client as any).messageChunks).toEqual([]);
+    expect((client as any).messages).toHaveLength(1);
+    expect((client as any).messages[0].role).toBe("assistant");
+  });
+
+  it("tool_call_update completed pushes tool_call and tool messages", () => {
+    (client as any).pendingTools.set("tc-2", { name: "Read", args: '{"path":"/foo"}' });
+    (client as any).handleSessionUpdate({
+      sessionUpdate: "tool_call_update",
+      status: "completed",
+      toolCallId: "tc-2",
+      rawOutput: "file contents",
+    });
+    const msgs = (client as any).messages as Array<{
+      role: string;
+      tool_calls: unknown;
+      content: string | null;
+    }>;
+    expect(msgs).toHaveLength(2);
+    expect(msgs[0].role).toBe("assistant");
+    expect(msgs[0].tool_calls).toEqual([
+      { function: { name: "Read", arguments: '{"path":"/foo"}' } },
+    ]);
+    expect(msgs[1].role).toBe("tool");
+    expect(msgs[1].content).toBe("file contents");
+    expect((client as any).pendingTools.has("tc-2")).toBe(false);
+  });
+
+  it("tool_call_update with non-string rawOutput JSON-stringifies it", () => {
+    (client as any).pendingTools.set("tc-3", { name: "Fetch", args: "" });
+    (client as any).handleSessionUpdate({
+      sessionUpdate: "tool_call_update",
+      status: "completed",
+      toolCallId: "tc-3",
+      rawOutput: { html: "<p>page</p>" },
+    });
+    const msgs = (client as any).messages as Array<{ role: string; content: string | null }>;
+    expect(msgs[1].content).toBe(JSON.stringify({ html: "<p>page</p>" }));
+  });
+
+  it("unknown updateType is a no-op", () => {
+    (client as any).handleSessionUpdate({ sessionUpdate: "unknown_type", data: {} });
+    expect((client as any).messages).toHaveLength(0);
+    expect((client as any).messageChunks).toHaveLength(0);
+  });
+});
+
+describe("HermesAcpClient", () => {
+  let client: HermesAcpClient;
+
+  beforeEach(() => {
+    client = new HermesAcpClient();
+  });
+
+  afterEach(async () => {
+    await client.close();
+  });
+
+  it(
+    "connect() returns a UUID sessionId",
+    async () => {
+      const sessionId = await client.connect(process.cwd());
+      expect(typeof sessionId).toBe("string");
+      expect(sessionId).toMatch(UUID_RE);
+    },
+    { timeout: 2 * 60 * 1000 },
+  );
+
+  it(
+    "prompt() returns a non-empty text response",
+    async () => {
+      await client.connect(process.cwd());
+      const result = await client.prompt("Reply with exactly the word: PONG");
+      expect(typeof result.text).toBe("string");
+      expect(result.text.length).toBeGreaterThan(0);
+      expect(typeof result.sessionId).toBe("string");
+      expect(result.sessionId).toMatch(UUID_RE);
+    },
+    { timeout: 2 * 60 * 1000 },
+  );
+
+  it(
+    "prompt() can be called twice on the same session (resume)",
+    async () => {
+      await client.connect(process.cwd());
+
+      const first = await client.prompt("Say the word ALPHA and nothing else.");
+      expect(first.text.length).toBeGreaterThan(0);
+
+      const second = await client.prompt("Now say the word BETA and nothing else.");
+      expect(second.text.length).toBeGreaterThan(0);
+
+      expect(first.sessionId).toBe(second.sessionId);
+    },
+    { timeout: 2 * 60 * 1000 },
+  );
+
+  // TODO(#435): flaky — depends on live LLM; mock or move to integration suite
+  it.skip(
+    "prompt() collects structured messages including tool calls",
+    async () => {
+      await client.connect(process.cwd());
+      const result = await client.prompt("Run this command: echo TOOL_DETAIL_TEST");
+      expect(result.messages.length).toBeGreaterThan(0);
+      // Should have at least one tool message (the echo command)
+      const toolMessages = result.messages.filter((m) => m.role === "tool");
+      expect(toolMessages.length).toBeGreaterThan(0);
+      // Tool message should contain the output
+      const toolContent = toolMessages[0]?.content ?? "";
+      expect(toolContent).toContain("TOOL_DETAIL_TEST");
+      // Should have assistant messages with tool_calls
+      const assistantWithTools = result.messages.filter(
+        (m) => m.role === "assistant" && m.tool_calls !== null,
+      );
+      expect(assistantWithTools.length).toBeGreaterThan(0);
+    },
+    { timeout: 2 * 60 * 1000 },
+  );
+});
@@ -0,0 +1,187 @@
+import { describe, expect, test } from "bun:test";
+import type { AgentContext } from "@uncaged/workflow-agent-kit";
+import type { ThreadId } from "@uncaged/workflow-protocol";
+import { buildHermesPrompt } from "../src/hermes.js";
+
+function makeCtx(overrides: Partial<AgentContext> = {}): AgentContext {
+  return {
+    threadId: "01JTEST0000000000000000000" as ThreadId,
+    edgePrompt: "Proceed with the assigned role.",
+    isFirstVisit: true,
+    workflow: {
+      roles: {
+        developer: {
+          description: "TDD implementation per test spec",
+          goal: "Write code",
+          capabilities: ["coding"],
+          procedure: "1. Read spec\n2. Write code",
+          output: "List files changed",
+          frontmatter: "",
+        },
+      },
+      conditions: {},
+      graph: {},
+    },
+    role: "developer",
+    start: { prompt: "Fix the bug", workflow: "abc123" },
+    steps: [],
+    store: {} as AgentContext["store"],
+    outputFormatInstruction: "Use YAML frontmatter",
+    ...overrides,
+  };
+}
+
+describe("buildHermesPrompt", () => {
+  test("first visit uses full role prompt and includes moderator instruction", () => {
+    const result = buildHermesPrompt(
+      makeCtx({ edgePrompt: "Focus on the failing test.", isFirstVisit: true }),
+    );
+
+    expect(result).toMatch(/^Use YAML frontmatter/);
+    expect(result).toContain("Write code");
+    expect(result).toContain("## Task\nFix the bug");
+    expect(result).toContain("## Moderator Instruction");
+    expect(result).toContain("Focus on the failing test.");
+  });
+
+  test("re-entry uses continuation prompt with edge instruction", () => {
+    const ctx = makeCtx({
+      isFirstVisit: false,
+      edgePrompt: "The reviewer rejected your work. Fix the issues.",
+      steps: [
+        {
+          role: "developer",
+          output: { summary: "Initial fix" },
+          agent: "uwf-hermes",
+          detail: "detail-1",
+          edgePrompt: "Implement the fix.",
+          content: null,
+        },
+        {
+          role: "reviewer",
+          output: { approved: false },
+          agent: "uwf-hermes",
+          detail: "detail-2",
+          edgePrompt: "Review the code.",
+          content: null,
+        },
+      ],
+    });
+
+    const result = buildHermesPrompt(ctx);
+
+    expect(result).not.toContain("## Task");
+    expect(result).toContain("## What Happened Since Your Last Turn");
+    expect(result).toContain("## Moderator Instruction");
+    expect(result).toContain("The reviewer rejected your work.");
+  });
+
+  test("forced first visit via isFirstVisit uses initial prompt even when role appears in history", () => {
+    const result = buildHermesPrompt(
+      makeCtx({
+        isFirstVisit: true,
+        steps: [
+          {
+            role: "developer",
+            output: { done: true },
+            agent: "uwf-hermes",
+            detail: "detail-1",
+            edgePrompt: "First attempt.",
+            content: null,
+          },
+        ],
+        edgePrompt: "Retry with a fresh approach.",
+      }),
+    );
+
+    expect(result).toContain("## Task");
+    expect(result).toContain("Retry with a fresh approach.");
+    expect(result).not.toContain("## What Happened Since Your Last Turn");
+  });
+
+  test("first visit includes content from previous steps", () => {
+    const ctx = makeCtx({
+      isFirstVisit: true,
+      steps: [
+        {
+          role: "planner",
+          output: { plan: "hash1" },
+          agent: "uwf-hermes",
+          detail: "detail-1",
+          edgePrompt: "Create the plan.",
+          content: "# Plan\nDetailed plan markdown...",
+        },
+        {
+          role: "developer",
+          output: { files: ["app.ts"] },
+          agent: "uwf-hermes",
+          detail: "detail-2",
+          edgePrompt: "Implement the code.",
+          content: "# Implementation\nCode changes...",
+        },
+        {
+          role: "reviewer",
+          output: { approved: true },
+          agent: "uwf-hermes",
+          detail: "detail-3",
+          edgePrompt: "Review the work.",
+          content: "# Review\nApproved!",
+        },
+      ],
+      role: "committer",
+      edgePrompt: "Commit the reviewed code.",
+    });
+
+    const result = buildHermesPrompt(ctx);
+
+    expect(result).toContain("Use YAML frontmatter");
+    expect(result).toContain("## Task");
+    expect(result).toContain("Fix the bug");
+    expect(result).toContain("## What Happened Since Your Last Turn");
+    expect(result).toContain("### Step 1: planner");
+    expect(result).toContain("#### Step Content");
+    expect(result).toContain("# Plan");
+    expect(result).toContain("Detailed plan markdown");
+    expect(result).toContain("### Step 2: developer");
+    expect(result).toContain("# Implementation");
+    expect(result).toContain("### Step 3: reviewer");
+    expect(result).toContain("# Review");
+    expect(result).toContain("## Moderator Instruction");
+    expect(result).toContain("Commit the reviewed code.");
+  });
+
+  test("re-entry omits content from previous steps", () => {
+    const ctx = makeCtx({
+      isFirstVisit: false,
+      steps: [
+        {
+          role: "developer",
+          output: { files: ["app.ts"] },
+          agent: "uwf-hermes",
+          detail: "detail-1",
+          edgePrompt: "Implement the code.",
+          content: "# Implementation\nCode changes...",
+        },
+        {
+          role: "reviewer",
+          output: { approved: false },
+          agent: "uwf-hermes",
+          detail: "detail-2",
+          edgePrompt: "Review the work.",
+          content: "# Review\nNot approved!",
+        },
+      ],
+      role: "developer",
+      edgePrompt: "Fix the issues.",
+    });
+
+    const result = buildHermesPrompt(ctx);
+
+    expect(result).toContain("## What Happened Since Your Last Turn");
+    expect(result).toContain("### Step 2: reviewer");
+    expect(result).toContain(JSON.stringify({ approved: false }));
+    expect(result).not.toContain("#### Step Content");
+    expect(result).not.toContain("# Review");
+    expect(result).not.toContain("Not approved!");
+  });
+});
@@ -0,0 +1,57 @@
+import { afterEach, describe, expect, it } from "bun:test";
+
+import { HermesAcpClient } from "../src/acp-client.js";
+
+/**
+ * E2E test for cross-process session resume.
+ *
+ * Simulates the workflow re-entry scenario:
+ * 1. Client A: connect → prompt → close (developer first run)
+ * 2. Client B: resume(sessionId) → prompt (developer re-entry after reviewer reject)
+ *
+ * This is what happens when uwf thread step spawns uwf-hermes twice for the same role.
+ */
+describe("HermesAcpClient cross-process resume", () => {
+  const clients: HermesAcpClient[] = [];
+
+  afterEach(async () => {
+    for (const c of clients) {
+      await c.close();
+    }
+    clients.length = 0;
+  });
+
+  // TODO(#435): flaky — depends on live LLM; mock or move to integration suite
+  it.skip(
+    "resume() after close — second prompt returns non-empty text",
+    async () => {
+      // --- Client A: first run ---
+      const clientA = new HermesAcpClient();
+      clients.push(clientA);
+
+      await clientA.connect(process.cwd());
+      const first = await clientA.prompt(
+        "Remember the secret code: WATERMELON. Reply with exactly: ACKNOWLEDGED",
+      );
+      expect(first.text.length).toBeGreaterThan(0);
+      const sessionId = first.sessionId;
+
+      // Close client A (simulates uwf-hermes process exit)
+      await clientA.close();
+
+      // --- Client B: resume (simulates re-entry) ---
+      const clientB = new HermesAcpClient();
+      clients.push(clientB);
+
+      await clientB.resume(sessionId, process.cwd());
+      const second = await clientB.prompt(
+        "What was the secret code I told you earlier? Reply with just the code word.",
+      );
+
+      // The critical assertion: resumed session produces non-empty output
+      expect(second.text.length).toBeGreaterThan(0);
+      expect(second.sessionId).toBe(sessionId);
+    },
+    { timeout: 3 * 60 * 1000 },
+  );
+});
@@ -22,7 +22,9 @@
  },
  "dependencies": {
    "@uncaged/json-cas": "^0.4.0",
-    "@uncaged/workflow-agent-kit": "workspace:^"
+    "@uncaged/workflow-agent-kit": "workspace:^",
+    "@uncaged/workflow-protocol": "workspace:^",
+    "@uncaged/workflow-util": "workspace:^"
  },
  "devDependencies": {
    "typescript": "^5.8.3"
@@ -0,0 +1,395 @@
+import type { ChildProcess } from "node:child_process";
+import { spawn } from "node:child_process";
+import { createInterface } from "node:readline";
+
+import type { HermesSessionMessage } from "./types.js";
+
+const HERMES_COMMAND = "hermes";
+const PROTOCOL_VERSION = 1;
+
+type JsonRpcResponse = {
+  jsonrpc: "2.0";
+  id: number;
+  result?: unknown;
+  error?: { code: number; message: string };
+};
+
+type PendingRequest = {
+  resolve: (value: JsonRpcResponse) => void;
+  reject: (reason: Error) => void;
+};
+
+/** Tracks in-flight tool calls so we can build complete messages when they finish. */
+type PendingToolCall = {
+  name: string;
+  args: string;
+};
+
+export type AcpPromptResult = {
+  text: string;
+  sessionId: string;
+  messages: HermesSessionMessage[];
+};
+
+export class HermesAcpClient {
+  private process: ChildProcess | null = null;
+  private nextId = 1;
+  private sessionId: string | null = null;
+  private stderrBuffer = "";
+  private pending = new Map<number, PendingRequest>();
+
+  // Message collection state
+  private messageChunks: string[] = [];
+  private reasoningChunks: string[] = [];
+  private pendingTools = new Map<string, PendingToolCall>();
+  messages: HermesSessionMessage[] = [];
+
+  /** Spawn hermes acp, initialize, create session */
+  async connect(cwd: string): Promise<string> {
+    await this.ensureProcess();
+    await this.initialize();
+
+    const sessionResponse = (await this.sendRequest("session/new", {
+      cwd,
+      mcpServers: [],
+    })) as { result: { sessionId: string } };
+
+    const sessionId = sessionResponse.result?.sessionId;
+    if (typeof sessionId !== "string" || sessionId === "") {
+      throw new Error(`session/new did not return a sessionId: ${JSON.stringify(sessionResponse)}`);
+    }
+
+    this.sessionId = sessionId;
+    return sessionId;
+  }
+
+  /** Spawn hermes acp, initialize, resume an existing session */
+  async resume(sessionId: string, cwd: string): Promise<string> {
+    await this.ensureProcess();
+    await this.initialize();
+
+    const response = await this.sendRequest("session/resume", {
+      cwd,
+      sessionId,
+      mcpServers: [],
+    });
+
+    if ((response as { error?: unknown }).error !== undefined) {
+      throw new Error(
+        `session/resume failed: ${JSON.stringify((response as { error: unknown }).error)}`,
+      );
+    }
+
+    this.sessionId = sessionId;
+    return sessionId;
+  }
+
+  /** Send prompt and collect full response text + structured messages. */
+  async prompt(text: string): Promise<AcpPromptResult> {
+    if (this.sessionId === null) {
+      throw new Error("Not connected — call connect() first");
+    }
+
+    this.messageChunks = [];
+    this.reasoningChunks = [];
+
+    const response = await this.sendRequest("session/prompt", {
+      sessionId: this.sessionId,
+      prompt: [{ type: "text", text }],
+    });
+
+    if ((response as { error?: unknown }).error !== undefined) {
+      throw new Error(
+        `session/prompt failed: ${JSON.stringify((response as { error: unknown }).error)}`,
+      );
+    }
+
+    // Flush any trailing assistant text that wasn't followed by a tool call.
+    this.flushAssistantMessage();
+
+    // Extract the final assistant text from collected messages.
+    let finalText = "";
+    for (let i = this.messages.length - 1; i >= 0; i--) {
+      const msg = this.messages[i];
+      if (
+        msg !== undefined &&
+        msg.role === "assistant" &&
+        msg.content !== null &&
+        msg.content.trim() !== ""
+      ) {
+        finalText = msg.content;
+        break;
+      }
+    }
+
+    return {
+      text: finalText,
+      sessionId: this.sessionId,
+      messages: this.messages,
+    };
+  }
+
+  /** Close the connection */
+  async close(): Promise<void> {
+    if (this.process === null) {
+      return;
+    }
+    this.sessionId = null;
+    this.process.stdin?.end();
+    const proc = this.process;
+    await new Promise<void>((resolve) => {
+      proc.on("close", () => resolve());
+      setTimeout(resolve, 5000);
+    });
+    this.process = null;
+  }
+
+  // ---- JSON-RPC transport ----
+
+  private sendRequest(
+    method: string,
+    params: Record<string, unknown>,
+    timeoutMs = 10 * 60 * 1000,
+  ): Promise<JsonRpcResponse> {
+    const id = this.nextId++;
+    return new Promise<JsonRpcResponse>((resolve, reject) => {
+      const timer = setTimeout(() => {
+        this.pending.delete(id);
+        reject(new Error(`Timeout waiting for response to ${method} (id=${id})`));
+      }, timeoutMs);
+
+      this.pending.set(id, {
+        resolve: (value) => {
+          clearTimeout(timer);
+          resolve(value);
+        },
+        reject: (err) => {
+          clearTimeout(timer);
+          reject(err);
+        },
+      });
+
+      this.writeLine(JSON.stringify({ jsonrpc: "2.0", id, method, params }));
+    });
+  }
+
+  private sendNotification(method: string, params?: Record<string, unknown>): void {
+    const message: Record<string, unknown> = { jsonrpc: "2.0", method };
+    if (params !== undefined) {
+      message.params = params;
+    }
+    this.writeLine(JSON.stringify(message));
+  }
+
+  private writeLine(line: string): void {
+    if (this.process?.stdin === null || this.process?.stdin === undefined) {
+      throw new Error("Cannot write: hermes acp process stdin not available");
+    }
+    this.process.stdin.write(`${line}\n`);
+  }
+
+  private handleLine(line: string): void {
+    if (line === "") {
+      return;
+    }
+
+    let parsed: unknown;
+    try {
+      parsed = JSON.parse(line);
+    } catch {
+      return;
+    }
+
+    const msg = parsed as Record<string, unknown>;
+
+    const hasId = "id" in msg && msg.id !== undefined && msg.id !== null;
+    const hasMethod = typeof msg.method === "string";
+
+    // JSON-RPC response to one of our requests (has "id" but no "method")
+    if (hasId && !hasMethod) {
+      const response = msg as unknown as JsonRpcResponse;
+      const handler = this.pending.get(response.id);
+      if (handler !== undefined) {
+        this.pending.delete(response.id);
+        handler.resolve(response);
+      }
+      return;
+    }
+
+    // Server-initiated JSON-RPC request: session/request_permission (has "id" + "method")
+    if (msg.method === "session/request_permission" && hasId) {
+      const params = msg.params as Record<string, unknown> | undefined;
+      const options = (params?.options ?? []) as Array<{ optionId?: string }>;
+      const firstOptionId = options[0]?.optionId ?? "";
+      this.writeLine(
+        JSON.stringify({
+          jsonrpc: "2.0",
+          id: msg.id,
+          result: { outcome: { outcome: "selected", optionId: firstOptionId } },
+        }),
+      );
+      return;
+    }
+
+    // JSON-RPC notification — session/update (no "id")
+    if (msg.method === "session/update") {
+      const params = msg.params as Record<string, unknown> | undefined;
+      const update = params?.update as Record<string, unknown> | undefined;
+      if (update !== undefined) {
+        this.handleSessionUpdate(update);
+      }
+      return;
+    }
+  }
+
+  // ---- Session update → structured messages ----
+
+  private handleSessionUpdate(update: Record<string, unknown>): void {
+    switch (update.sessionUpdate as string) {
+      case "agent_message_chunk":
+        this.handleAgentMessageChunk(update);
+        break;
+      case "agent_thought_chunk":
+        this.handleAgentThoughtChunk(update);
+        break;
+      case "tool_call":
+        this.handleToolCall(update);
+        break;
+      case "tool_call_update":
+        this.handleToolCallUpdate(update);
+        break;
+      default:
+        break;
+    }
+  }
+
+  private handleAgentMessageChunk(update: Record<string, unknown>): void {
+    const content = update.content as { type?: string; text?: string } | undefined;
+    if (content?.type === "text" && typeof content.text === "string") {
+      this.messageChunks.push(content.text);
+    }
+  }
+
+  private handleAgentThoughtChunk(update: Record<string, unknown>): void {
+    const content = update.content as { type?: string; text?: string } | undefined;
+    if (content?.type === "text" && typeof content.text === "string") {
+      this.reasoningChunks.push(content.text);
+    }
+  }
+
+  private handleToolCall(update: Record<string, unknown>): void {
+    const title = (update.title as string) ?? "";
+    const rawInput = update.rawInput;
+    const args = rawInput !== undefined && rawInput !== null ? JSON.stringify(rawInput) : "";
+    const toolCallId = update.toolCallId as string;
+    this.pendingTools.set(toolCallId, { name: title, args });
+    this.flushAssistantMessage();
+  }
+
+  private handleToolCallUpdate(update: Record<string, unknown>): void {
+    const status = update.status as string | undefined;
+    if (status !== "completed" && status !== "failed") return;
+    const toolCallId = update.toolCallId as string;
+    const pending = this.pendingTools.get(toolCallId);
+    const toolName = pending?.name ?? toolCallId;
+    const rawOutput = update.rawOutput;
+    const outputStr =
+      rawOutput !== undefined && rawOutput !== null
+        ? typeof rawOutput === "string"
+          ? rawOutput
+          : JSON.stringify(rawOutput)
+        : "";
+    this.messages.push({
+      role: "assistant",
+      content: null,
+      reasoning: null,
+      tool_calls: [{ function: { name: toolName, arguments: pending?.args ?? "" } }],
+    });
+    this.messages.push({
+      role: "tool",
+      content: outputStr,
+      reasoning: null,
+      tool_calls: null,
+    });
+    this.pendingTools.delete(toolCallId);
+  }
+
+  /** Flush any accumulated text/reasoning into an assistant message. */
+  private flushAssistantMessage(): void {
+    const text = this.messageChunks.join("");
+    const reasoning = this.reasoningChunks.join("");
+    if (text !== "" || reasoning !== "") {
+      this.messages.push({
+        role: "assistant",
+        content: text || null,
+        reasoning: reasoning || null,
+        tool_calls: null,
+      });
+    }
+    this.messageChunks = [];
+    this.reasoningChunks = [];
+  }
+
+  private rejectAll(err: Error): void {
+    for (const handler of this.pending.values()) {
+      handler.reject(err);
+    }
+    this.pending.clear();
+  }
+
+  private async ensureProcess(): Promise<void> {
+    if (this.process !== null) {
+      return;
+    }
+
+    const child = spawn(HERMES_COMMAND, ["acp"], {
+      env: process.env,
+      shell: false,
+      stdio: ["pipe", "pipe", "pipe"],
+    });
+
+    this.process = child;
+
+    child.stderr?.on("data", (chunk: Buffer) => {
+      this.stderrBuffer += chunk.toString();
+    });
+
+    child.on("error", (cause) => {
+      const message = cause instanceof Error ? cause.message : String(cause);
+      this.rejectAll(new Error(`hermes acp spawn failed: ${message}`));
+    });
+
+    child.on("close", (code) => {
+      if (code !== 0 && this.pending.size > 0) {
+        const detail = this.stderrBuffer.trim() !== "" ? ` stderr=${this.stderrBuffer.trim()}` : "";
+        this.rejectAll(
+          new Error(`hermes acp exited unexpectedly with code ${code ?? "null"}${detail}`),
+        );
+      }
+    });
+
+    if (child.stdout === null) {
+      throw new Error("hermes acp process stdout is not available");
+    }
+    const rl = createInterface({ input: child.stdout });
+    rl.on("line", (line) => {
+      this.handleLine(line.trim());
+    });
+  }
+
+  private async initialize(): Promise<void> {
+    const initResponse = await this.sendRequest("initialize", {
+      protocolVersion: PROTOCOL_VERSION,
+      clientInfo: { name: "uwf", version: "0.1.0" },
+      capabilities: {},
+    });
+
+    if ((initResponse as { error?: unknown }).error !== undefined) {
+      throw new Error(
+        `initialize failed: ${JSON.stringify((initResponse as { error: unknown }).error)}`,
+      );
+    }
+
+    this.sendNotification("initialized");
+  }
+}
@@ -1,164 +1,175 @@
-import { spawn } from "node:child_process";
 import type { Store } from "@uncaged/json-cas";
-
 import {
  type AgentContext,
  type AgentRunResult,
+  buildContinuationPrompt,
  buildRolePrompt,
  createAgent,
 } from "@uncaged/workflow-agent-kit";
+import { createLogger } from "@uncaged/workflow-util";

-import {
-  loadHermesSession,
-  parseSessionIdFromStdout,
-  storeHermesSessionDetail,
-} from "./session-detail.js";
+import { HermesAcpClient } from "./acp-client.js";
+import { getCachedSessionId, isResumeDisabled, setCachedSessionId } from "./session-cache.js";
+import { storeHermesSessionDetail } from "./session-detail.js";

-const HERMES_COMMAND = "hermes";
-const HERMES_MAX_TURNS = 90;
-
-function buildHistorySummary(steps: AgentContext["steps"]): string {
-  if (steps.length === 0) {
-    return "";
-  }
-
-  const lines: string[] = ["## Previous Steps"];
-  for (let i = 0; i < steps.length; i++) {
-    const step = steps[i];
-    if (step === undefined) {
-      continue;
-    }
-    lines.push("");
-    lines.push(`### Step ${i + 1}: ${step.role}`);
-    lines.push(`Output: ${JSON.stringify(step.output)}`);
-    lines.push(`Agent: ${step.agent}`);
-  }
-  return lines.join("\n");
-}
+const log = createLogger({ sink: { kind: "stderr" } });

 /** Assemble system prompt, task, and prior step outputs for Hermes. */
 export function buildHermesPrompt(ctx: AgentContext): string {
-  const roleDef = ctx.workflow.roles[ctx.role];
-  const rolePrompt = roleDef !== undefined ? buildRolePrompt(roleDef) : "";
  const parts: string[] = [];
-  if (ctx.outputFormatInstruction !== undefined && ctx.outputFormatInstruction !== "") {
+
+  if (ctx.outputFormatInstruction !== "") {
    parts.push(ctx.outputFormatInstruction, "");
  }
-  parts.push(rolePrompt, "", "## Task", ctx.start.prompt);
-  const historyBlock = buildHistorySummary(ctx.steps);
-  if (historyBlock !== "") {
-    parts.push("", historyBlock);
+
+  if (!ctx.isFirstVisit) {
+    // Re-entry: show only steps since last visit, meta only
+    parts.push(buildContinuationPrompt(ctx.steps, ctx.role, ctx.edgePrompt));
+    return parts.join("\n");
  }
+
+  // First visit: show initial context with content for recent steps
+  const roleDef = ctx.workflow.roles[ctx.role];
+  const rolePrompt = roleDef !== undefined ? buildRolePrompt(roleDef) : "";
+  parts.push(rolePrompt, "", "## Task", ctx.start.prompt);
+
+  // Add history with content (last 2-3 steps within quota)
+  if (ctx.steps.length > 0) {
+    parts.push(
+      "",
+      buildContinuationPrompt(ctx.steps, ctx.role, ctx.edgePrompt, {
+        includeContent: true,
+        quota: 32000, // Use THREAD_READ_DEFAULT_QUOTA equivalent
+      }),
+    );
+  } else {
+    parts.push("", "## Moderator Instruction", "", ctx.edgePrompt);
+  }
+
  return parts.join("\n");
 }

-function spawnHermes(args: string[]): Promise<{ stdout: string; stderr: string }> {
-  return new Promise((resolve, reject) => {
-    const child = spawn(HERMES_COMMAND, args, {
-      env: process.env,
-      shell: false,
-      stdio: ["ignore", "pipe", "pipe"],
-    });
-
-    let stdout = "";
-    let stderr = "";
-    child.stdout?.on("data", (chunk: Buffer) => {
-      stdout += chunk.toString();
-    });
-    child.stderr?.on("data", (chunk: Buffer) => {
-      stderr += chunk.toString();
-    });
-
-    child.on("error", (cause) => {
-      const message = cause instanceof Error ? cause.message : String(cause);
-      reject(new Error(`hermes spawn failed: ${message}`));
-    });
-
-    child.on("close", (code) => {
-      if (code === 0) {
-        resolve({ stdout, stderr });
-        return;
-      }
-      const detail = stderr.trim() !== "" ? ` stderr=${stderr.trim()}` : "";
-      reject(new Error(`hermes exited with code ${code ?? "null"}${detail}`));
-    });
-  });
-}
-
-function spawnHermesChat(prompt: string): Promise<{ stdout: string; stderr: string }> {
-  return spawnHermes([
-    "chat",
-    "-q",
-    prompt,
-    "--yolo",
-    "--max-turns",
-    String(HERMES_MAX_TURNS),
-    "--quiet",
-  ]);
-}
-
-function spawnHermesResume(
-  sessionId: string,
-  message: string,
-): Promise<{ stdout: string; stderr: string }> {
-  return spawnHermes([
-    "chat",
-    "--resume",
-    sessionId,
-    "-q",
-    message,
-    "--yolo",
-    "--max-turns",
-    String(HERMES_MAX_TURNS),
-    "--quiet",
-  ]);
-}
-
-function parseSessionId(stdout: string, stderr: string): string {
-  const sessionId = parseSessionIdFromStdout(stderr) ?? parseSessionIdFromStdout(stdout);
-  if (sessionId === null) {
-    throw new Error(
-      "Failed to parse session_id from hermes output.\n" +
-        `stderr (first 200 chars): ${stderr.slice(0, 200)}\n` +
-        `stdout (first 200 chars): ${stdout.slice(0, 200)}`,
-    );
-  }
-  return sessionId;
-}
-
-async function buildResultFromSession(sessionId: string, store: Store): Promise<AgentRunResult> {
-  const session = await loadHermesSession(sessionId);
-  if (session === null) {
-    throw new Error(`Failed to load hermes session file for session_id: ${sessionId}`);
-  }
-  const { detailHash, output } = await storeHermesSessionDetail(store, session);
-  return { output, detailHash, sessionId };
-}
-
-async function runHermes(ctx: AgentContext): Promise<AgentRunResult> {
-  const fullPrompt = buildHermesPrompt(ctx);
-  const { stdout, stderr } = await spawnHermesChat(fullPrompt);
-  const sessionId = parseSessionId(stdout, stderr);
-  return buildResultFromSession(sessionId, ctx.store);
-}
-
-async function continueHermes(
-  sessionId: string,
-  message: string,
+async function storePromptResult(
  store: Store,
-): Promise<AgentRunResult> {
-  const { stdout, stderr } = await spawnHermesResume(sessionId, message);
-  // Resume may return a new session_id
-  const newSessionId = parseSessionIdFromStdout(stderr) ?? parseSessionIdFromStdout(stdout);
-  const resolvedId = newSessionId ?? sessionId;
-  return buildResultFromSession(resolvedId, store);
+  sessionId: string,
+  messages: Awaited<ReturnType<HermesAcpClient["prompt"]>>["messages"],
+): Promise<{ detailHash: string }> {
+  const session = {
+    session_id: sessionId,
+    model: "",
+    session_start: new Date().toISOString(),
+    messages,
+  };
+  return storeHermesSessionDetail(store, session);
 }

-/** Agent CLI factory: parses argv, runs Hermes, extracts output, writes StepNode. */
+type PromptAttempt = {
+  useContinuation: boolean;
+  resumed: boolean;
+};
+
+async function prepareSession(
+  client: HermesAcpClient,
+  ctx: AgentContext,
+  cwd: string,
+): Promise<PromptAttempt> {
+  if (ctx.isFirstVisit || isResumeDisabled()) {
+    await client.connect(cwd);
+    return { useContinuation: false, resumed: false };
+  }
+
+  const cachedSessionId = await getCachedSessionId(ctx.threadId, ctx.role);
+  if (cachedSessionId === null) {
+    log("6RWK3N8Q", `no cached session for ${ctx.threadId}:${ctx.role}, starting new session`);
+    await client.connect(cwd);
+    return { useContinuation: false, resumed: false };
+  }
+
+  try {
+    await client.resume(cachedSessionId, cwd);
+    log("9MHT4V2P", `resumed hermes session ${cachedSessionId} for ${ctx.threadId}:${ctx.role}`);
+    return { useContinuation: true, resumed: true };
+  } catch (error) {
+    const message = error instanceof Error ? error.message : String(error);
+    log("3XPN7K4W", `session resume failed, falling back to new session: ${message}`);
+    await client.close();
+    await client.connect(cwd);
+    return { useContinuation: false, resumed: false };
+  }
+}
+
+/**
+ * Agent CLI factory: parses argv, runs Hermes, extracts output, writes StepNode.
+ *
+ * A single ACP client is shared across run() and continue() calls so that
+ * frontmatter retry loops keep the same Hermes session context.  The client
+ * is closed once the agent process exits (via process.on("exit")).
+ */
 export function createHermesAgent(): () => Promise<void> {
-  return createAgent({
+  const client = new HermesAcpClient();
+
+  // Ensure cleanup regardless of how the process exits.
+  process.on("exit", () => {
+    void client.close();
+  });
+
+  async function runPrompt(ctx: AgentContext, useContinuation: boolean): Promise<AgentRunResult> {
+    const effectiveCtx = useContinuation ? ctx : { ...ctx, isFirstVisit: true };
+    const fullPrompt = buildHermesPrompt(effectiveCtx);
+    const { text, sessionId, messages } = await client.prompt(fullPrompt);
+    const { detailHash } = await storePromptResult(ctx.store, sessionId, messages);
+
+    if (!isResumeDisabled()) {
+      await setCachedSessionId(ctx.threadId, ctx.role, sessionId);
+    }
+
+    return { output: text, detailHash, sessionId };
+  }
+
+  async function runHermes(ctx: AgentContext): Promise<AgentRunResult> {
+    const cwd = process.cwd();
+    const attempt = await prepareSession(client, ctx, cwd);
+
+    try {
+      return await runPrompt(ctx, attempt.useContinuation);
+    } catch (error) {
+      if (!attempt.resumed) {
+        throw error;
+      }
+
+      const message = error instanceof Error ? error.message : String(error);
+      log("8FQW2R6N", `continuation prompt failed, retrying with initial prompt: ${message}`);
+      await client.close();
+      await client.connect(cwd);
+      return runPrompt(ctx, false);
+    }
+  }
+
+  async function continueHermes(
+    _sessionId: string,
+    message: string,
+    store: Store,
+  ): Promise<AgentRunResult> {
+    // Client is already connected from runHermes — same ACP session,
+    // so the agent sees the full conversation history (crucial for retries).
+    const { text, sessionId, messages } = await client.prompt(message);
+    const { detailHash } = await storePromptResult(store, sessionId, messages);
+    return { output: text, detailHash, sessionId };
+  }
+
+  const agentMain = createAgent({
    name: "hermes",
    run: runHermes,
    continue: continueHermes,
  });
+
+  // Wrap to ensure ACP client is closed after agent completes,
+  // so the hermes subprocess exits and bun can terminate.
+  return async () => {
+    try {
+      await agentMain();
+    } finally {
+      await client.close();
+    }
+  };
 }
@@ -1 +1,2 @@
+export { HermesAcpClient } from "./acp-client.js";
 export { buildHermesPrompt, createHermesAgent } from "./hermes.js";
@@ -0,0 +1,34 @@
+// Re-export session cache from the shared agent-kit package with agent name injected.
+
+import {
+  getCachedSessionId as getCachedSessionIdBase,
+  setCachedSessionId as setCachedSessionIdBase,
+} from "@uncaged/workflow-agent-kit";
+import type { ThreadId } from "@uncaged/workflow-protocol";
+
+export async function getCachedSessionId(threadId: ThreadId, role: string): Promise<string | null> {
+  return getCachedSessionIdBase("hermes", threadId, role);
+}
+
+export async function setCachedSessionId(
+  threadId: ThreadId,
+  role: string,
+  sessionId: string,
+): Promise<void> {
+  return setCachedSessionIdBase("hermes", threadId, role, sessionId);
+}
+
+export function isResumeDisabled(): boolean {
+  // Hermes ACP session/resume is broken: _restore fails for custom providers
+  // because resolve_runtime_provider("custom") throws and base_url/api_mode
+  // are lost in the fallback path.  Resume silently creates a new session
+  // (different sessionId, no history), causing empty-text responses.
+  // See: https://github.com/NousResearch/hermes-agent/issues/13489
+  // Disable by default until upstream fixes the bug.  Set UWF_HERMES_RESUME=1
+  // to opt back in.
+  const enableFlag = process.env.UWF_HERMES_RESUME;
+  if (enableFlag === "1" || enableFlag === "true") {
+    return false;
+  }
+  return true;
+}
@@ -0,0 +1,183 @@
+# @uncaged/workflow-agent-kit
+
+Agent framework — `createAgent` factory, context builder, frontmatter fast-path, and LLM extract pipeline.
+
+## Overview
+
+Layer 2 agent framework. Provides the standard entrypoint for all agent CLIs: parse `<thread-id> <role>` from argv, load thread/workflow context from CAS, invoke the agent's `run`/`continue` functions, validate output via frontmatter fast-path or LLM extract, and write a `StepNodePayload` to CAS.
+
+Also exports prompt builders, config/storage helpers, and session ID caching for multi-turn agents.
+
+**Dependencies:** `@uncaged/json-cas`, `@uncaged/json-cas-fs`, `@uncaged/workflow-protocol`, `@uncaged/workflow-util`, `dotenv`, `yaml`
+
+## Installation
+
+```bash
+bun add @uncaged/workflow-agent-kit
+```
+
+## API
+
+All exports come from `src/index.ts`.
+
+### Agent factory
+
+```typescript
+function createAgent(options: AgentOptions): () => Promise<void>
+
+type AgentOptions = {
+  name: string;
+  run: AgentRunFn;
+  continue: AgentContinueFn;
+};
+
+type AgentRunFn = (ctx: AgentContext) => Promise<AgentRunResult>;
+type AgentContinueFn = (
+  sessionId: string,
+  message: string,
+  store: AgentContext["store"],
+) => Promise<AgentRunResult>;
+
+type AgentRunResult = {
+  output: string;
+  detailHash: string;
+  sessionId: string;
+};
+```
+
+Agent CLIs call `createAgent(...)` and invoke the returned function as `main()`.
+
+### Context
+
+```typescript
+function buildContext(threadId: ThreadId, role: string): Promise<AgentContext>
+function buildContextWithMeta(
+  threadId: ThreadId,
+  role: string,
+): Promise<AgentContext & { meta: BuildContextMeta }>
+
+type AgentContext = ModeratorContext & {
+  threadId: ThreadId;
+  role: string;
+  store: Store;
+  workflow: WorkflowPayload;
+  outputFormatInstruction: string;
+  edgePrompt: string;
+  isFirstVisit: boolean;
+};
+
+type BuildContextMeta = {
+  storageRoot: string;
+  store: Store;
+  schemas: AgentStore["schemas"];
+  headHash: CasRef;
+  chain: ChainState;
+};
+```
+
+Requires `UWF_EDGE_PROMPT` in the environment (set by `uwf thread step`).
+
+### Prompt builders
+
+```typescript
+function buildRolePrompt(role: RoleDefinition): string
+function buildOutputFormatInstruction(schema: JSONSchema): string
+function buildContinuationPrompt(
+  steps: StepContext[],
+  role: string,
+  edgePrompt: string,
+  options?: { includeContent?: boolean; quota?: number },
+): string
+```
+
+### Extract pipeline
+
+```typescript
+function resolveExtractModelAlias(config: WorkflowConfig): ModelAlias
+function resolveModel(config: WorkflowConfig, alias: ModelAlias): ResolvedLlmProvider
+function extract(
+  rawOutput: string,
+  outputSchema: CasRef,
+  config: WorkflowConfig,
+): Promise<ExtractResult>
+
+type ResolvedLlmProvider = { baseUrl: string; apiKey: string; model: string };
+type ExtractResult = { value: unknown; hash: CasRef };
+```
+
+### Frontmatter fast-path
+
+```typescript
+function tryFrontmatterFastPath(
+  rawOutput: string,
+  outputSchema: CasRef,
+  store: Store,
+): Promise<FrontmatterFastPathResult | null>
+
+type FrontmatterFastPathResult = { body: string; outputHash: CasRef };
+```
+
+### Session cache
+
+```typescript
+function getCachedSessionId(threadId: ThreadId, role: string): Promise<string | null>
+function setCachedSessionId(
+  threadId: ThreadId,
+  role: string,
+  sessionId: string,
+): Promise<void>
+```
+
+### Config and storage
+
+```typescript
+function getConfigPath(storageRoot: string): string
+function getEnvPath(storageRoot: string): string
+function resolveStorageRoot(): string
+function loadWorkflowConfig(storageRoot: string): Promise<WorkflowConfig>
+```
+
+## Usage
+
+```typescript
+import { createAgent, buildRolePrompt } from "@uncaged/workflow-agent-kit";
+import type { AgentContext, AgentRunResult } from "@uncaged/workflow-agent-kit";
+
+async function run(ctx: AgentContext): Promise<AgentRunResult> {
+  const prompt = buildRolePrompt(ctx.workflow.roles[ctx.role]!);
+  // ... spawn external process, capture output ...
+  return { output: markdown, detailHash: "...", sessionId: "..." };
+}
+
+async function continueSession(
+  sessionId: string,
+  message: string,
+): Promise<AgentRunResult> {
+  // ... continue multi-turn session ...
+  return { output: markdown, detailHash: "...", sessionId };
+}
+
+export const main = createAgent({ name: "my-agent", run, continue: continueSession });
+```
+
+## Internal Structure
+
+```
+src/
+├── index.ts
+├── run.ts                         createAgent entrypoint
+├── context.ts                     Thread chain walk, AgentContext builder
+├── extract.ts                     LLM structured extract fallback
+├── frontmatter.ts                 Frontmatter fast-path validation
+├── build-role-prompt.ts           Role definition → prompt text
+├── build-output-format-instruction.ts
+├── build-continuation-prompt.ts
+├── session-cache.ts               Per-thread/session ID persistence
+├── storage.ts                     CAS store, config, threads index
+├── schemas.ts                     Agent CAS schema registration
+└── types.ts                       AgentContext, AgentOptions, etc.
+```
+
+## Configuration
+
+Reads `config.yaml` and `.env` from the workflow storage root (`~/.uncaged/workflow` by default). See `@uncaged/workflow-protocol` for `WorkflowConfig` shape. Set via `uwf setup`.
@@ -0,0 +1,234 @@
+import type { StepContext } from "@uncaged/workflow-protocol";
+import { describe, expect, test } from "vitest";
+import { buildContinuationPrompt } from "../src/build-continuation-prompt.js";
+
+const reviewerStep: StepContext = {
+  role: "reviewer",
+  output: { approved: false, comments: "Missing tests" },
+  detail: "2MXBG6PN4A8JR",
+  agent: "uwf-hermes",
+  edgePrompt: "Review the developer's work.",
+  content: null,
+};
+
+const developerStep: StepContext = {
+  role: "developer",
+  output: { filesChanged: ["src/app.ts"], summary: "Initial fix" },
+  detail: "1VPBG9SM5E7WK",
+  agent: "uwf-hermes",
+  edgePrompt: "Implement the fix.",
+  content: null,
+};
+
+describe("buildContinuationPrompt", () => {
+  test("includes steps after the last matching role and the edge prompt", () => {
+    const steps: StepContext[] = [
+      developerStep,
+      reviewerStep,
+      {
+        role: "planner",
+        output: { plan: "revise approach" },
+        detail: "7BQST3VW9F2MA",
+        agent: "uwf-hermes",
+        edgePrompt: "Revise the plan.",
+        content: null,
+      },
+    ];
+
+    const result = buildContinuationPrompt(
+      steps,
+      "developer",
+      "The reviewer rejected your implementation. Read their feedback and fix the issues.",
+    );
+
+    expect(result).toContain("## What Happened Since Your Last Turn");
+    expect(result).toContain("### Step 2: reviewer");
+    expect(result).toContain("Missing tests");
+    expect(result).toContain("### Step 3: planner");
+    expect(result).toContain("## Moderator Instruction");
+    expect(result).toContain("The reviewer rejected your implementation.");
+    expect(result).not.toContain("Initial fix");
+  });
+
+  test("uses all steps when the role has not run before", () => {
+    const result = buildContinuationPrompt(
+      [developerStep, reviewerStep],
+      "planner",
+      "Continue from the reviewer feedback.",
+    );
+
+    expect(result).toContain("### Step 1: developer");
+    expect(result).toContain("### Step 2: reviewer");
+    expect(result).toContain("Continue from the reviewer feedback.");
+  });
+
+  test("still includes moderator instruction when there are no intervening steps", () => {
+    const result = buildContinuationPrompt(
+      [developerStep],
+      "developer",
+      "Please revise your work.",
+    );
+
+    expect(result).not.toContain("## What Happened Since Your Last Turn");
+    expect(result).toContain("## Moderator Instruction");
+    expect(result).toContain("Please revise your work.");
+  });
+
+  test("includes step content when includeContent option is true", () => {
+    const stepsWithContent: StepContext[] = [
+      {
+        role: "planner",
+        output: { plan: "hash123" },
+        detail: "detail1",
+        agent: "uwf-hermes",
+        edgePrompt: "",
+        content: "# Plan\nDetailed plan markdown...",
+      },
+      {
+        role: "developer",
+        output: { filesChanged: ["app.ts"] },
+        detail: "detail2",
+        agent: "uwf-hermes",
+        edgePrompt: "",
+        content: "# Implementation\nCode changes...",
+      },
+      {
+        role: "reviewer",
+        output: { approved: false },
+        detail: "detail3",
+        agent: "uwf-hermes",
+        edgePrompt: "",
+        content: "# Review\nFeedback...",
+      },
+    ];
+
+    const result = buildContinuationPrompt(stepsWithContent, "committer", "Commit the changes.", {
+      includeContent: true,
+    });
+
+    expect(result).toContain("## What Happened Since Your Last Turn");
+    expect(result).toContain("### Step 1: planner");
+    expect(result).toContain("#### Step Content");
+    expect(result).toContain("# Plan");
+    expect(result).toContain("Detailed plan markdown");
+    expect(result).toContain("### Step 2: developer");
+    expect(result).toContain("# Implementation");
+    expect(result).toContain("### Step 3: reviewer");
+    expect(result).toContain("# Review");
+    expect(result).toContain("## Moderator Instruction");
+    expect(result).toContain("Commit the changes.");
+  });
+
+  test("omits step content when includeContent is false (default)", () => {
+    const stepsWithContent: StepContext[] = [
+      {
+        role: "developer",
+        output: { filesChanged: ["app.ts"] },
+        detail: "detail1",
+        agent: "uwf-hermes",
+        edgePrompt: "",
+        content: "# Implementation\nCode changes...",
+      },
+      {
+        role: "reviewer",
+        output: { approved: false },
+        detail: "detail2",
+        agent: "uwf-hermes",
+        edgePrompt: "",
+        content: "# Review\nFeedback...",
+      },
+    ];
+
+    const result = buildContinuationPrompt(stepsWithContent, "developer", "Fix the issues.");
+
+    expect(result).toContain("## What Happened Since Your Last Turn");
+    expect(result).toContain("### Step 2: reviewer");
+    expect(result).toContain(JSON.stringify(stepsWithContent[1]?.output));
+    expect(result).not.toContain("#### Step Content");
+    expect(result).not.toContain("# Review");
+  });
+
+  test("respects quota when includeContent is true", () => {
+    const largeContent = "x".repeat(5000);
+    const stepsWithContent: StepContext[] = [
+      {
+        role: "planner",
+        output: { plan: "hash1" },
+        detail: "detail1",
+        agent: "uwf-hermes",
+        edgePrompt: "",
+        content: largeContent,
+      },
+      {
+        role: "developer",
+        output: { files: ["app.ts"] },
+        detail: "detail2",
+        agent: "uwf-hermes",
+        edgePrompt: "",
+        content: largeContent,
+      },
+      {
+        role: "reviewer",
+        output: { approved: true },
+        detail: "detail3",
+        agent: "uwf-hermes",
+        edgePrompt: "",
+        content: "# Review\nLooks good!",
+      },
+    ];
+
+    const result = buildContinuationPrompt(stepsWithContent, "committer", "Commit the changes.", {
+      includeContent: true,
+      quota: 1000,
+    });
+
+    // Should include most recent step(s) within quota
+    expect(result).toContain("### Step 1: reviewer"); // Showing 1 of 3, so step 3 becomes step 1
+    expect(result).toContain("#### Step Content");
+    expect(result).toContain("## Moderator Instruction");
+    expect(result).toContain("Showing 1 of 3 steps (2 omitted due to quota)");
+  });
+
+  test("handles null content gracefully when includeContent is true", () => {
+    const stepsWithMixedContent: StepContext[] = [
+      {
+        role: "planner",
+        output: { plan: "hash1" },
+        detail: "detail1",
+        agent: "uwf-hermes",
+        edgePrompt: "",
+        content: "# Plan\nDetails...",
+      },
+      {
+        role: "developer",
+        output: { files: ["app.ts"] },
+        detail: "detail2",
+        agent: "uwf-hermes",
+        edgePrompt: "",
+        content: null, // No content available
+      },
+      {
+        role: "reviewer",
+        output: { approved: true },
+        detail: "detail3",
+        agent: "uwf-hermes",
+        edgePrompt: "",
+        content: "# Review\nApproved!",
+      },
+    ];
+
+    const result = buildContinuationPrompt(
+      stepsWithMixedContent,
+      "committer",
+      "Commit the changes.",
+      { includeContent: true },
+    );
+
+    expect(result).toContain("### Step 1: planner");
+    expect(result).toContain("# Plan");
+    expect(result).toContain("### Step 2: developer");
+    // Step 2 should not have content section since content is null
+    expect(result).toContain("### Step 3: reviewer");
+    expect(result).toContain("# Review");
+  });
+});
@@ -87,7 +87,7 @@ describe("buildOutputFormatInstruction", () => {
    expect(result).toContain("beta: <number>");
  });

-  test("lists union of fields from a oneOf schema", () => {
+  test("lists union of fields from a oneOf schema (no discriminant — flat merge)", () => {
    const schema = {
      oneOf: [
        {
@@ -101,12 +101,71 @@ describe("buildOutputFormatInstruction", () => {
      ],
    };
    const result = buildOutputFormatInstruction(schema);
+    // No discriminant detected → falls back to flat merge
    expect(result).toContain("`foo`");
    expect(result).toContain("`bar`");
    expect(result).toContain("foo: <string>");
    expect(result).toContain("bar: true  # true | false");
  });

+  test("renders per-variant instructions for discriminated oneOf", () => {
+    const schema = {
+      oneOf: [
+        {
+          type: "object",
+          properties: {
+            $status: { const: "ready" },
+            plan: { type: "string" },
+          },
+          required: ["$status", "plan"],
+        },
+        {
+          type: "object",
+          properties: {
+            $status: { const: "insufficient_info" },
+          },
+          required: ["$status"],
+        },
+      ],
+    };
+    const result = buildOutputFormatInstruction(schema);
+    expect(result).toContain("Choose ONE of the following variants");
+    expect(result).toContain("When `$status: ready`");
+    expect(result).toContain("When `$status: insufficient_info`");
+    expect(result).toContain("plan: <string>");
+    // The insufficient_info variant should NOT mention plan
+    const insufficientBlock = result.split("When `$status: insufficient_info`")[1];
+    expect(insufficientBlock).not.toContain("plan:");
+  });
+
+  test("renders per-variant for single-enum discriminant", () => {
+    const schema = {
+      oneOf: [
+        {
+          type: "object",
+          properties: {
+            $status: { type: "string", enum: ["approved"] },
+            branch: { type: "string" },
+          },
+          required: ["$status"],
+        },
+        {
+          type: "object",
+          properties: {
+            $status: { type: "string", enum: ["rejected"] },
+            comments: { type: "string" },
+          },
+          required: ["$status"],
+        },
+      ],
+    };
+    const result = buildOutputFormatInstruction(schema);
+    expect(result).toContain("When `$status: approved`");
+    expect(result).toContain("When `$status: rejected`");
+    expect(result).toContain("branch: <string>");
+    expect(result).toContain("comments: <string>");
+  });
+
  test("falls back gracefully for a non-object schema with no properties", () => {
    const result = buildOutputFormatInstruction({ type: "string" });
    expect(result).toContain("schema fields will be extracted automatically");
@@ -141,6 +200,28 @@ describe("buildOutputFormatInstruction", () => {
    expect(result).toContain("shared: <string>  # required");
  });

+  test("explicitly forbids extra frontmatter fields", () => {
+    const result = buildOutputFormatInstruction(PLANNER_SCHEMA);
+    expect(result).toMatch(/\b(only|exclusively)\b.*fields/i);
+    expect(result).toMatch(/do not add (extra|additional|other) fields/i);
+  });
+
+  test("forbids extra fields even for empty schema", () => {
+    const result = buildOutputFormatInstruction({});
+    expect(result).toMatch(/do not add (extra|additional|other) fields/i);
+  });
+
+  test("forbids extra fields for anyOf/oneOf schemas", () => {
+    const schema = {
+      anyOf: [
+        { type: "object", properties: { alpha: { type: "string" } } },
+        { type: "object", properties: { beta: { type: "number" } } },
+      ],
+    };
+    const result = buildOutputFormatInstruction(schema);
+    expect(result).toMatch(/do not add (extra|additional|other) fields/i);
+  });
+
  test("includes focus reminder about role scope", () => {
    const result = buildOutputFormatInstruction({});
    expect(result).toContain("Focus exclusively on YOUR role");
@@ -0,0 +1,14 @@
+import { describe, expect, test } from "vitest";
+
+// We need to test buildHistory indirectly through buildContext
+// since buildHistory is not exported. For now, we'll test the integration
+// through the public API in a separate integration test.
+
+describe("context module - content extraction", () => {
+  test("placeholder - content extraction will be tested via integration tests", () => {
+    // This test is a placeholder. The actual testing of content extraction
+    // will be done through integration tests in build-continuation-prompt.test.ts
+    // where we can verify that StepContext objects have the correct content field.
+    expect(true).toBe(true);
+  });
+});
@@ -0,0 +1,247 @@
+import { mkdir, readdir, readFile, rm, stat, writeFile } from "node:fs/promises";
+import { dirname, join } from "node:path";
+import type { ThreadId } from "@uncaged/workflow-protocol";
+import { afterEach, beforeEach, describe, expect, test } from "vitest";
+
+import { getCachedSessionId, getCachePath, setCachedSessionId } from "../src/session-cache.js";
+import { resolveStorageRoot } from "../src/storage.js";
+
+describe("session-cache", () => {
+  let originalStorageRoot: string;
+  let testStorageRoot: string;
+
+  beforeEach(async () => {
+    // Create a temporary test storage root
+    originalStorageRoot = resolveStorageRoot();
+    testStorageRoot = join(originalStorageRoot, "test-cache", `test-${Date.now()}`);
+    await mkdir(testStorageRoot, { recursive: true });
+
+    // Override the storage root for testing
+    process.env.WORKFLOW_STORAGE_ROOT = testStorageRoot;
+  });
+
+  afterEach(async () => {
+    // Clean up test storage root
+    await rm(testStorageRoot, { recursive: true, force: true });
+    delete process.env.WORKFLOW_STORAGE_ROOT;
+  });
+
+  describe("getCachePath", () => {
+    test("returns agent-specific file path", () => {
+      const path = getCachePath("claude-code");
+      expect(path).toMatch(/\/cache\/claude-code-sessions\.json$/);
+    });
+
+    test("returns different paths for different agents", () => {
+      const pathClaudeCode = getCachePath("claude-code");
+      const pathHermes = getCachePath("hermes");
+
+      expect(pathClaudeCode).not.toBe(pathHermes);
+      expect(pathClaudeCode).toMatch(/claude-code-sessions\.json$/);
+      expect(pathHermes).toMatch(/hermes-sessions\.json$/);
+    });
+
+    test("handles agent names with special characters", () => {
+      const path1 = getCachePath("my-agent");
+      const path2 = getCachePath("my_agent");
+
+      expect(path1).toMatch(/my-agent-sessions\.json$/);
+      expect(path2).toMatch(/my_agent-sessions\.json$/);
+    });
+  });
+
+  describe("session isolation", () => {
+    const threadId = "01234567890123456789012345" as ThreadId;
+    const role = "developer";
+
+    test("sessions are isolated per agent", async () => {
+      // Cache different session IDs for each agent
+      await setCachedSessionId("claude-code", threadId, role, "session-cc-001");
+      await setCachedSessionId("hermes", threadId, role, "session-hermes-001");
+
+      // Each agent should retrieve its own session ID
+      const sessionCC = await getCachedSessionId("claude-code", threadId, role);
+      const sessionHermes = await getCachedSessionId("hermes", threadId, role);
+
+      expect(sessionCC).toBe("session-cc-001");
+      expect(sessionHermes).toBe("session-hermes-001");
+    });
+
+    test("updating one agent's cache does not affect another", async () => {
+      // Set initial sessions for both agents
+      await setCachedSessionId("claude-code", threadId, role, "session-cc-001");
+      await setCachedSessionId("hermes", threadId, role, "session-hermes-001");
+
+      // Update claude-code's session
+      await setCachedSessionId("claude-code", threadId, role, "session-cc-002");
+
+      // Hermes's session should remain unchanged
+      const sessionHermes = await getCachedSessionId("hermes", threadId, role);
+      expect(sessionHermes).toBe("session-hermes-001");
+
+      // Claude-code should have the new session
+      const sessionCC = await getCachedSessionId("claude-code", threadId, role);
+      expect(sessionCC).toBe("session-cc-002");
+    });
+
+    test("missing session returns null for specific agent", async () => {
+      const session = await getCachedSessionId("claude-code", threadId, role);
+      expect(session).toBeNull();
+    });
+
+    test("empty session ID is treated as missing", async () => {
+      await setCachedSessionId("claude-code", threadId, role, "");
+
+      const session = await getCachedSessionId("claude-code", threadId, role);
+      expect(session).toBeNull();
+    });
+  });
+
+  describe("file system operations", () => {
+    const threadId = "01234567890123456789012345" as ThreadId;
+    const role = "developer";
+
+    test("cache directory is created if missing", async () => {
+      const cachePath = getCachePath("claude-code");
+      const cacheDir = dirname(cachePath);
+
+      // Ensure cache dir doesn't exist
+      await rm(cacheDir, { recursive: true, force: true });
+
+      // Write a session
+      await setCachedSessionId("claude-code", threadId, role, "session-001");
+
+      // Cache directory should be created
+      const stats = await stat(cacheDir);
+      expect(stats.isDirectory()).toBe(true);
+    });
+
+    test("multiple agents create separate cache files", async () => {
+      // Cache sessions for multiple agents
+      await setCachedSessionId("claude-code", threadId, role, "session-cc-001");
+      await setCachedSessionId("hermes", threadId, role, "session-hermes-001");
+
+      // Separate cache files should exist
+      const pathCC = getCachePath("claude-code");
+      const pathHermes = getCachePath("hermes");
+
+      const contentCC = JSON.parse(await readFile(pathCC, "utf8")) as Record<string, string>;
+      const contentHermes = JSON.parse(await readFile(pathHermes, "utf8")) as Record<
+        string,
+        string
+      >;
+
+      expect(contentCC).toHaveProperty(`${threadId}:${role}`, "session-cc-001");
+      expect(contentHermes).toHaveProperty(`${threadId}:${role}`, "session-hermes-001");
+    });
+
+    test("atomic writes prevent partial reads", async () => {
+      // Write a session
+      await setCachedSessionId("claude-code", threadId, role, "session-001");
+
+      // The final file should exist (no .tmp files left behind)
+      const cachePath = getCachePath("claude-code");
+      const dir = dirname(cachePath);
+      const files = await readdir(dir);
+
+      expect(files).toContain("claude-code-sessions.json");
+      expect(files.every((f) => !f.endsWith(".tmp"))).toBe(true);
+    });
+  });
+
+  describe("legacy migration", () => {
+    const threadId = "01234567890123456789012345" as ThreadId;
+    const role = "developer";
+
+    test("old agent-sessions.json is ignored", async () => {
+      // Create old agent-sessions.json file
+      const oldCachePath = join(resolveStorageRoot(), "cache", "agent-sessions.json");
+      await mkdir(dirname(oldCachePath), { recursive: true });
+      await writeFile(
+        oldCachePath,
+        JSON.stringify({
+          "01234567890123456789012345:developer": "old-session-001",
+        }),
+        "utf8",
+      );
+
+      // Query with the new per-agent cache
+      const session = await getCachedSessionId("claude-code", threadId, role);
+
+      // Should return null (old cache is ignored)
+      expect(session).toBeNull();
+    });
+
+    test("new per-agent cache takes precedence", async () => {
+      // Create both old and new cache files
+      const oldPath = join(resolveStorageRoot(), "cache", "agent-sessions.json");
+      await mkdir(dirname(oldPath), { recursive: true });
+      await writeFile(
+        oldPath,
+        JSON.stringify({
+          [`${threadId}:${role}`]: "old-session",
+        }),
+        "utf8",
+      );
+
+      await setCachedSessionId("claude-code", threadId, role, "new-session");
+
+      // The new per-agent cache value should be returned
+      const session = await getCachedSessionId("claude-code", threadId, role);
+      expect(session).toBe("new-session");
+    });
+  });
+
+  describe("error handling", () => {
+    const threadId = "01234567890123456789012345" as ThreadId;
+    const role = "developer";
+
+    test("invalid JSON in cache file returns empty cache", async () => {
+      // Create a corrupted cache file
+      const cachePath = getCachePath("claude-code");
+      await mkdir(dirname(cachePath), { recursive: true });
+      await writeFile(cachePath, "{ invalid json }", "utf8");
+
+      // Should return null (treating corrupted cache as empty)
+      const session = await getCachedSessionId("claude-code", threadId, role);
+      expect(session).toBeNull();
+    });
+
+    test("non-object JSON in cache file returns empty cache", async () => {
+      // Create a cache file with non-object JSON
+      const cachePath = getCachePath("claude-code");
+      await mkdir(dirname(cachePath), { recursive: true });
+      await writeFile(cachePath, JSON.stringify(["not", "an", "object"]), "utf8");
+
+      // Should return null
+      const session = await getCachedSessionId("claude-code", threadId, role);
+      expect(session).toBeNull();
+    });
+
+    test("cache entries with non-string values are ignored", async () => {
+      // Create a cache file with mixed types
+      const cachePath = getCachePath("claude-code");
+      const cacheData = {
+        "thread1:role1": "valid-session",
+        "thread2:role2": 12345, // number
+        "thread3:role3": null, // null
+        "thread4:role4": "", // empty string
+      };
+      await mkdir(dirname(cachePath), { recursive: true });
+      await writeFile(cachePath, JSON.stringify(cacheData), "utf8");
+
+      // Valid string entries should be returned
+      const session1 = await getCachedSessionId("claude-code", "thread1" as ThreadId, "role1");
+      expect(session1).toBe("valid-session");
+
+      // Invalid entries should return null
+      const session2 = await getCachedSessionId("claude-code", "thread2" as ThreadId, "role2");
+      const session3 = await getCachedSessionId("claude-code", "thread3" as ThreadId, "role3");
+      const session4 = await getCachedSessionId("claude-code", "thread4" as ThreadId, "role4");
+
+      expect(session2).toBeNull();
+      expect(session3).toBeNull();
+      expect(session4).toBeNull(); // empty string is treated as missing
+    });
+  });
+});
--- a/Show More
+++ b/Show More