ci: use test:ci to skip integration tests in CI

The HermesAcpClient integration tests require a live Hermes agent process and always time out (3 × 120s) in CI containers, so every CI run waits ~6 minutes before it reports failure. Switch from `bun run test` to `bun run test:ci`, which is already defined in all testable packages. In workflow-agent-hermes, test:ci runs only the unit tests (__tests__/*.test.ts) and skips integration/.
5 changed files with 302 additions and 330 deletions
@@ -22,4 +22,4 @@ jobs:
        run: bun run check

      - name: Test
-        run: bun test
+        run: bun run test:ci
@@ -1,83 +0,0 @@
-# Test Spec: uwf setup model connectivity validation (#335)
-
-## Context
-
-File: `packages/cli-workflow/src/commands/setup.ts`
-Test file: `packages/cli-workflow/src/__tests__/setup-validate.test.ts`
-
-After `cmdSetup` writes config, it should send a test chat completion request to verify the configured model is reachable. If validation fails, warn the user (don't abort — config is already saved).
-
-## Implementation Notes
-
- Add a `validateModel(baseUrl, apiKey, model)` function that sends a minimal chat completion request (`POST /chat/completions` with `messages: [{role:"user",content:"hi"}]`, `max_tokens: 1`)
- Returns `Result<void, string>` — ok if 2xx response, error with reason string otherwise
- Use `AbortSignal.timeout(15_000)` for the request
- Both `cmdSetup` and `cmdSetupInteractive` should call it after saving config
- `cmdSetup` returns validation result in its return object: `{ ...existing, validation: { ok: true } | { ok: false, error: string } }`
- `cmdSetupInteractive` prints a warning to console if validation fails, success message if it passes
- Use the project logger (`createLogger`) — no raw `console.log` except in interactive CLI output (per CLAUDE.md)
-
-## Test Cases (vitest)
-
-### 1. `validateModel` — success path
- Mock `fetch` to return `{ status: 200, ok: true, json: () => ({}) }`
- Call `validateModel(baseUrl, apiKey, model)`
- Assert returns `{ ok: true, value: undefined }`
- Assert fetch was called with correct URL (`${baseUrl}/chat/completions`), correct headers (`Authorization: Bearer ${apiKey}`), correct body (model, messages, max_tokens: 1)
-
-### 2. `validateModel` — HTTP error (401 unauthorized)
- Mock `fetch` to return `{ status: 401, ok: false, statusText: "Unauthorized" }`
- Call `validateModel(baseUrl, apiKey, model)`
- Assert returns `{ ok: false, error: <string containing "401"> }`
-
-### 3. `validateModel` — HTTP error (404 model not found)
- Mock `fetch` to return `{ status: 404, ok: false, statusText: "Not Found" }`
- Assert returns `{ ok: false, error: <string containing "404"> }`
-
-### 4. `validateModel` — network timeout
- Mock `fetch` to throw `DOMException` with name `AbortError`
- Assert returns `{ ok: false, error: <string containing "timeout" or "unreachable"> }`
-
-### 5. `validateModel` — network error (DNS failure, connection refused)
- Mock `fetch` to throw `TypeError("fetch failed")`
- Assert returns `{ ok: false, error: <string mentioning connectivity> }`
-
-### 6. `cmdSetup` — includes validation result on success
- Mock global `fetch` for `/chat/completions` to succeed
- Call `cmdSetup({ provider, baseUrl, apiKey, model, storageRoot })`
- Assert returned object has `validation: { ok: true, value: undefined }`
- Assert config files are still written (existing behavior preserved)
-
-### 7. `cmdSetup` — includes validation result on failure (config still saved)
- Mock global `fetch` for `/chat/completions` to return 401
- Call `cmdSetup({ ... })`
- Assert returned object has `validation: { ok: false, error: ... }`
- Assert `config.yaml` and `.env` are still written (validation failure doesn't prevent saving)
-
-### 8. `cmdSetupInteractive` — prints success message on validation pass
- Mock `fetch` for both `/models` and `/chat/completions` to succeed
- Mock stdin to provide valid selections
- Capture console output
- Assert output contains a success message like "Model verified" or "✓"
-
-### 9. `cmdSetupInteractive` — prints warning on validation failure
- Mock `fetch`: `/models` succeeds, `/chat/completions` returns 401
- Mock stdin for valid selections
- Capture console output
- Assert output contains a warning about model not being reachable and suggests trying a different model
-
-### 10. `validateModel` — request body correctness
- Mock `fetch` to capture the request body
- Call `validateModel(baseUrl, apiKey, "test-model")`
- Assert body is `{ model: "test-model", messages: [{role: "user", content: "hi"}], max_tokens: 1 }`
-
-## Export Requirements
-
- `validateModel` must be exported (for direct unit testing)
- Signature: `async function validateModel(baseUrl: string, apiKey: string, model: string): Promise<Result<void, string>>`
- `Result` type: `{ ok: true; value: T } | { ok: false; error: E }` (project convention)
-
-## Files to Create/Modify
-
- **New**: `packages/cli-workflow/src/__tests__/setup-validate.test.ts` — all test cases above
- **Modify**: `packages/cli-workflow/src/commands/setup.ts` — add `validateModel`, integrate into `cmdSetup` and `cmdSetupInteractive`
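The removed spec above describes `validateModel` closely enough to sketch it. The following is a minimal illustration under stated assumptions, not the implementation that shipped: the fourth `fetchFn` parameter, the `FetchLike` and `MinimalResponse` types, the `Content-Type` header, and the exact error strings are hypothetical and do not appear in the spec.

```typescript
// Result type as given in the spec's "Export Requirements".
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Assumption: the sketch only reads these three response fields.
type MinimalResponse = { ok: boolean; status: number; statusText: string };

// Assumption: a narrowed fetch signature so a fake can be injected in tests.
type FetchLike = (
  url: string,
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
    signal: AbortSignal;
  },
) => Promise<MinimalResponse>;

export async function validateModel(
  baseUrl: string,
  apiKey: string,
  model: string,
  // Hypothetical parameter, not in the spec; defaults to the global fetch.
  fetchFn: FetchLike = (url, init) => fetch(url, init),
): Promise<Result<void, string>> {
  try {
    const res = await fetchFn(`${baseUrl}/chat/completions`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json", // assumption: JSON endpoint
        Authorization: `Bearer ${apiKey}`,
      },
      // Smallest possible completion request, as the spec describes.
      body: JSON.stringify({
        model,
        messages: [{ role: "user", content: "hi" }],
        max_tokens: 1,
      }),
      signal: AbortSignal.timeout(15_000),
    });
    if (res.ok) return { ok: true, value: undefined };
    return { ok: false, error: `HTTP ${res.status} ${res.statusText}` };
  } catch (err) {
    const name = err instanceof Error ? err.name : "";
    const detail = err instanceof Error ? err.message : String(err);
    // An aborted or timed-out request is reported separately from other failures.
    if (name === "AbortError" || name === "TimeoutError") {
      return { ok: false, error: "timeout: no response within 15s" };
    }
    return { ok: false, error: `unreachable: ${detail}` };
  }
}
```

Injecting the fetch function is one way to cover the spec's mock-based test cases without patching the global `fetch`; the spec itself mocks the global instead.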
@@ -0,0 +1,269 @@
+name: "e2e-walkthrough"
+description: "End-to-end walkthrough of uwf CLI. Dogfooding: uwf tests uwf. Each role validates a phase of the CLI surface inside an isolated Docker container."
+roles:
+  bootstrap:
+    description: "Start Docker container with isolated storage, verify uwf is runnable"
+    goal: "You are an E2E test runner. Set up an isolated Docker environment and verify basic uwf functionality."
+    capabilities:
+      - docker
+      - shell
+    procedure: |
+      1. Start a Docker container with isolated storage:
+         ```
+         docker run -d --name uwf-e2e-$$ \
+           -v $HOME:$HOME \
+           -e HOME=$HOME \
+           -e UNCAGED_WORKFLOW_STORAGE_ROOT=/tmp/uwf-e2e-storage \
+           -w ~/repos/workflow \
+           node:22-bookworm \
+           sleep infinity
+         ```
+      2. Inside the container, install bun, install deps, then `bun link` all packages
+         so that `uwf`, `uwf-hermes`, `uwf-builtin` are on PATH (from source):
+         ```
+         docker exec uwf-e2e-$$ bash -c '
+           # Install bun
+           curl -fsSL https://bun.sh/install | bash
+           export PATH="$HOME/.bun/bin:$PATH"
+
+           # Isolated storage
+           mkdir -p $UNCAGED_WORKFLOW_STORAGE_ROOT
+
+           # Install workspace deps
+           cd ~/repos/workflow && bun install --frozen-lockfile
+
+           # bun link each package that has a bin entry
+           cd packages/cli-workflow && bun link && cd ../..
+           cd packages/workflow-agent-hermes && bun link && cd ../..
+           cd packages/workflow-agent-builtin && bun link && cd ../..
+         '
+         ```
+      3. Verify all three commands are available inside the container:
+         ```
+         docker exec uwf-e2e-$$ bash -c 'export PATH="$HOME/.bun/bin:$PATH" && uwf --version'
+         docker exec uwf-e2e-$$ bash -c 'export PATH="$HOME/.bun/bin:$PATH" && uwf-hermes --help'
+         docker exec uwf-e2e-$$ bash -c 'export PATH="$HOME/.bun/bin:$PATH" && uwf-builtin --help'
+         ```
+      4. Copy host config if it exists:
+         ```
+         docker exec uwf-e2e-$$ bash -c '
+           if [ -f $HOME/.uncaged/workflow/config.yaml ]; then
+             cp $HOME/.uncaged/workflow/config.yaml $UNCAGED_WORKFLOW_STORAGE_ROOT/config.yaml
+           fi
+         '
+         ```
+
+      Report the container name and confirm uwf + agents are working.
+      Set containerName to the Docker container name for subsequent roles.
+    output: "Report uwf version and container readiness. Set $status to pass with containerName, or fail with error."
+    frontmatter:
+      oneOf:
+        - properties:
+            $status: { const: "pass" }
+            containerName: { type: string }
+          required: [$status, containerName]
+        - properties:
+            $status: { const: "fail" }
+            error: { type: string }
+          required: [$status, error]
+
+  config-and-registry:
+    description: "Validate uwf config commands and workflow registration"
+    goal: "You are an E2E test runner. Validate uwf config operations and workflow registration inside the Docker container."
+    capabilities:
+      - docker
+      - shell
+    procedure: |
+      Use the container from the previous step (containerName is in your prompt).
+      All commands run via: `docker exec <containerName> bash -c '...'`
+      All commands use `uwf` (installed via `bun link` inside the container).
+      Remember to set env vars in each exec:
+        export PATH="$HOME/.bun/bin:$PATH"
+        export UNCAGED_WORKFLOW_STORAGE_ROOT=/tmp/uwf-e2e-storage
+
+      Config tests:
+      1. `uwf config list` — verify it returns valid JSON
+      2. `uwf config set models.test.name test-model` — set a test key
+      3. `uwf config get models.test.name` — verify it returns "test-model"
+
+      Workflow registration tests:
+      4. `uwf workflow add ~/repos/workflow/examples/solve-issue.yaml` — register workflow
+      5. Verify the output contains a hash
+      6. `uwf workflow list` — verify non-empty array
+      7. Capture the workflow name from the list
+      8. `uwf workflow show <name>` — verify it returns roles
+
+      Report all test results with pass/fail counts.
+    output: "Report test results. Set $status to pass (with workflowName and containerName) or fail."
+    frontmatter:
+      oneOf:
+        - properties:
+            $status: { const: "pass" }
+            workflowName: { type: string }
+            containerName: { type: string }
+          required: [$status, workflowName, containerName]
+        - properties:
+            $status: { const: "fail" }
+            error: { type: string }
+            containerName: { type: string }
+          required: [$status, error, containerName]
+
+  thread-ops:
+    description: "Test thread start, list, show, and exec"
+    goal: "You are an E2E test runner. Validate thread creation and execution inside the Docker container."
+    capabilities:
+      - docker
+      - shell
+    procedure: |
+      Use the container (containerName) and workflow (workflowName) from your prompt.
+      All commands via: `docker exec <containerName> bash -c '...'`
+      Set env: PATH="$HOME/.bun/bin:$PATH" UNCAGED_WORKFLOW_STORAGE_ROOT=/tmp/uwf-e2e-storage
+
+      1. `uwf thread start <workflowName> -p 'E2E test: what is 2+2?'` — capture thread ID from JSON output
+      2. `uwf thread list` — verify the thread appears in the list
+      3. `uwf thread show <threadId>` — verify head pointer exists
+      4. `uwf thread exec <threadId> --agent uwf-builtin` — execute one step
+      5. Verify exec returns JSON with a head field
+
+      Report results. Pass threadId and containerName forward.
+    output: "Report test results. Set $status to pass (with threadId, workflowName, containerName) or fail."
+    frontmatter:
+      oneOf:
+        - properties:
+            $status: { const: "pass" }
+            threadId: { type: string }
+            workflowName: { type: string }
+            containerName: { type: string }
+          required: [$status, threadId, workflowName, containerName]
+        - properties:
+            $status: { const: "fail" }
+            error: { type: string }
+            containerName: { type: string }
+          required: [$status, error, containerName]
+
+  inspect:
+    description: "Test step list/show, thread read, and CAS operations"
+    goal: "You are an E2E test runner. Validate read and inspect operations inside the Docker container."
+    capabilities:
+      - docker
+      - shell
+    procedure: |
+      Use the container (containerName) and threadId from your prompt.
+      All commands via: `docker exec <containerName> bash -c '...'`
+      Set env: PATH="$HOME/.bun/bin:$PATH" UNCAGED_WORKFLOW_STORAGE_ROOT=/tmp/uwf-e2e-storage
+
+      Step inspection:
+      1. `uwf step list <threadId>` — verify steps array has length > 1
+      2. Capture the last step hash from the output
+      3. `uwf step show <lastStepHash>` — verify it returns a role field
+
+      Thread read:
+      4. `uwf thread read <threadId>` — verify non-empty output
+
+      CAS operations:
+      5. `uwf cas get <lastStepHash>` — verify returns a type field
+      6. `uwf cas has <lastStepHash>` — verify exits 0
+      7. `uwf cas refs <lastStepHash>` — list refs (may be empty)
+      8. `uwf cas walk <lastStepHash>` — verify returns non-empty array
+
+      Report results. Pass threadId, lastStepHash, workflowName, containerName forward.
+    output: "Report test results. Set $status to pass (with threadId, lastStepHash, workflowName, containerName) or fail."
+    frontmatter:
+      oneOf:
+        - properties:
+            $status: { const: "pass" }
+            threadId: { type: string }
+            lastStepHash: { type: string }
+            workflowName: { type: string }
+            containerName: { type: string }
+          required: [$status, threadId, lastStepHash, workflowName, containerName]
+        - properties:
+            $status: { const: "fail" }
+            error: { type: string }
+            containerName: { type: string }
+          required: [$status, error, containerName]
+
+  cancel-and-fork:
+    description: "Test thread cancel, step fork, and log inspection"
+    goal: "You are an E2E test runner. Validate cancel, fork, and log operations inside the Docker container."
+    capabilities:
+      - docker
+      - shell
+    procedure: |
+      Use containerName, threadId, lastStepHash, and workflowName from your prompt.
+      All commands via: `docker exec <containerName> bash -c '...'`
+      Set env: PATH="$HOME/.bun/bin:$PATH" UNCAGED_WORKFLOW_STORAGE_ROOT=/tmp/uwf-e2e-storage
+
+      Cancel:
+      1. Start a second thread: `uwf thread start <workflowName> -p 'E2E cancel test'`
+      2. Cancel it: `uwf thread cancel <secondThreadId>`
+      3. Verify it appears in completed list: `uwf thread list --status completed`
+
+      Fork:
+      4. Fork from the first thread's last step: `uwf step fork <lastStepHash>`
+      5. Verify fork creates a new thread with a different ID
+
+      Logs:
+      6. `uwf log list` — verify output (may be empty)
+      7. `uwf log show --thread <threadId>` — verify runs without error
+
+      Report results with summary.
+    output: "Report test results with summary. Set $status to pass or fail."
+    frontmatter:
+      oneOf:
+        - properties:
+            $status: { const: "pass" }
+            containerName: { type: string }
+            summary: { type: string }
+          required: [$status, containerName, summary]
+        - properties:
+            $status: { const: "fail" }
+            error: { type: string }
+            containerName: { type: string }
+          required: [$status, error, containerName]
+
+  cleanup:
+    description: "Remove Docker container"
+    goal: "You are an E2E test runner. Clean up the Docker container used for testing."
+    capabilities:
+      - docker
+      - shell
+    procedure: |
+      Remove the Docker container (containerName is in your prompt):
+      1. `docker rm -f <containerName>`
+      2. Verify the container is gone: `docker ps -a --filter name=<containerName> --format '{{.Names}}'` should return empty
+
+      Report cleanup result.
+    output: "Report cleanup result. Set $status to pass or fail."
+    frontmatter:
+      oneOf:
+        - properties:
+            $status: { const: "pass" }
+            summary: { type: string }
+          required: [$status, summary]
+        - properties:
+            $status: { const: "fail" }
+            error: { type: string }
+          required: [$status, error]
+
+graph:
+  $START:
+    _: { role: "bootstrap", prompt: "Set up the Docker container and verify uwf is runnable." }
+  bootstrap:
+    pass: { role: "config-and-registry", prompt: "Container {{{containerName}}} is ready. Validate config and workflow registration." }
+    fail: { role: "$END", prompt: "Bootstrap failed: {{{error}}}. No container was created." }
+  config-and-registry:
+    pass: { role: "thread-ops", prompt: "Config and registry OK. Workflow '{{{workflowName}}}' registered. Container: {{{containerName}}}. Now test thread operations." }
+    fail: { role: "cleanup", prompt: "Config/registry failed: {{{error}}}. Clean up container {{{containerName}}}." }
+  thread-ops:
+    pass: { role: "inspect", prompt: "Thread ops OK. threadId={{{threadId}}}, workflowName={{{workflowName}}}, containerName={{{containerName}}}. Now test inspect operations." }
+    fail: { role: "cleanup", prompt: "Thread ops failed: {{{error}}}. Clean up container {{{containerName}}}." }
+  inspect:
+    pass: { role: "cancel-and-fork", prompt: "Inspect OK. threadId={{{threadId}}}, lastStepHash={{{lastStepHash}}}, workflowName={{{workflowName}}}, containerName={{{containerName}}}. Now test cancel, fork, and logs." }
+    fail: { role: "cleanup", prompt: "Inspect failed: {{{error}}}. Clean up container {{{containerName}}}." }
+  cancel-and-fork:
+    pass: { role: "cleanup", prompt: "All tests passed! {{{summary}}}. Clean up container {{{containerName}}}." }
+    fail: { role: "cleanup", prompt: "Cancel/fork failed: {{{error}}}. Clean up container {{{containerName}}}." }
+  cleanup:
+    pass: { role: "$END", prompt: "E2E walkthrough complete. {{{summary}}}" }
+    fail: { role: "$END", prompt: "Cleanup failed: {{{error}}}. Manual cleanup may be needed." }
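The `graph:` section above sends every failure after bootstrap to `cleanup`, so the container is removed on any exit path once it exists. As a rough illustration, the transitions can be copied into a small TypeScript table and walked. The `Edge` type and `walk` helper are hypothetical and are not part of uwf; only the role names and pass/fail targets come from the YAML.

```typescript
type Edge = { pass: string; fail: string };

// Transitions copied from the graph above (prompts omitted).
const graph: Record<string, Edge> = {
  bootstrap: { pass: "config-and-registry", fail: "$END" },
  "config-and-registry": { pass: "thread-ops", fail: "cleanup" },
  "thread-ops": { pass: "inspect", fail: "cleanup" },
  inspect: { pass: "cancel-and-fork", fail: "cleanup" },
  "cancel-and-fork": { pass: "cleanup", fail: "cleanup" },
  cleanup: { pass: "$END", fail: "$END" },
};

// Walk from bootstrap to $END. Every role passes except `failAt`,
// which takes its fail edge. Returns the roles visited, in order.
function walk(failAt: string | null): string[] {
  const visited: string[] = [];
  let role = "bootstrap";
  while (role !== "$END") {
    visited.push(role);
    const edge = graph[role];
    if (edge === undefined) throw new Error(`unknown role: ${role}`);
    role = role === failAt ? edge.fail : edge.pass;
  }
  return visited;
}

// A failure in thread-ops skips the remaining test roles but still runs cleanup.
console.log(walk("thread-ops").join(" -> "));
```

Only a bootstrap failure bypasses `cleanup`, which matches the prompt on that edge: no container was created, so there is nothing to remove.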
@@ -1,210 +0,0 @@
-name: "e2e-walkthrough"
-description: "End-to-end walkthrough of uwf CLI. Dogfooding: uwf tests uwf. Each role validates a phase of the CLI surface inside an isolated Docker container."
-roles:
-  bootstrap:
-    description: "Start Docker container with isolated storage, verify uwf is runnable"
-    goal: "You are an E2E test runner. Set up an isolated Docker environment and verify basic uwf functionality."
-    capabilities:
-      - docker
-      - shell
-    procedure: |
-      1. Create a temp dir for this E2E run: `E2E_DIR=$(mktemp -d /tmp/uwf-e2e-XXXXXX)`
-      2. Start a Docker container with isolated storage:
-         ```
-         docker run -d --name uwf-e2e-$$ \
-           -v $HOME:$HOME \
-           -e HOME=$HOME \
-           -e UNCAGED_WORKFLOW_STORAGE_ROOT=/tmp/uwf-e2e-storage \
-           -w ~/repos/workflow \
-           node:22-bookworm \
-           sleep infinity
-         ```
-      3. Inside the container, install bun, install deps, then `bun link` all packages
-         so that `uwf`, `uwf-hermes`, `uwf-builtin` are on PATH (from source):
-         ```
-         docker exec uwf-e2e-$$ bash -c '
-           # Install bun
-           curl -fsSL https://bun.sh/install | bash
-           export PATH="$HOME/.bun/bin:$PATH"
-
-           # Isolated storage
-           mkdir -p $UNCAGED_WORKFLOW_STORAGE_ROOT
-
-           # Install workspace deps
-           cd ~/repos/workflow && bun install --frozen-lockfile
-
-           # bun link each package that has a bin entry
-           cd packages/cli-workflow && bun link && cd ../..
-           cd packages/workflow-agent-hermes && bun link && cd ../..
-           cd packages/workflow-agent-builtin && bun link && cd ../..
-         '
-         ```
-      4. Verify all three commands are available inside the container:
-         ```
-         docker exec uwf-e2e-$$ bash -c 'export PATH="$HOME/.bun/bin:$PATH" && uwf --version'
-         docker exec uwf-e2e-$$ bash -c 'export PATH="$HOME/.bun/bin:$PATH" && uwf-hermes --help'
-         docker exec uwf-e2e-$$ bash -c 'export PATH="$HOME/.bun/bin:$PATH" && uwf-builtin --help'
-         ```
-      5. Copy host config if it exists:
-         ```
-         docker exec uwf-e2e-$$ bash -c '
-           if [ -f $HOME/.uncaged/workflow/config.yaml ]; then
-             cp $HOME/.uncaged/workflow/config.yaml $UNCAGED_WORKFLOW_STORAGE_ROOT/config.yaml
-           fi
-         '
-         ```
-
-      Report the container name and confirm uwf + agents are working.
-      Set containerName to the Docker container name for subsequent roles.
-    output: "Report uwf version and container readiness. Set $status to pass with containerName, or fail with error."
-    frontmatter:
-      oneOf:
-        - properties:
-            $status: { const: "pass" }
-            containerName: { type: string }
-          required: [$status, containerName]
-        - properties:
-            $status: { const: "fail" }
-            error: { type: string }
-          required: [$status, error]
-
-  setup-and-registry:
-    description: "Validate uwf setup, config commands, and workflow registration"
-    goal: "You are an E2E test runner. Validate uwf config operations and workflow registration inside the Docker container."
-    capabilities:
-      - docker
-      - shell
-    procedure: |
-      Use the container from the previous step (containerName is in your prompt).
-      All commands run via: `docker exec <containerName> bash -c '...'`
-      All commands use `uwf` (installed via `bun link` inside the container).
-      Remember to set env vars in each exec:
-        export PATH="$HOME/.bun/bin:$PATH"
-        export UNCAGED_WORKFLOW_STORAGE_ROOT=/tmp/uwf-e2e-storage
-
-      Phase 2 — Config:
-      1. `uwf config list` — verify it returns valid JSON
-      2. `uwf config set models.test.name test-model` — set a test key
-      3. `uwf config get models.test.name` — verify it returns "test-model"
-
-      Phase 3 — Workflow registration:
-      4. `uwf workflow add ~/repos/workflow/examples/solve-issue.yaml` — register workflow
-      5. Verify the output contains a hash
-      6. `uwf workflow list` — verify non-empty array
-      7. Capture the workflow name from the list
-      8. `uwf workflow show <name>` — verify it returns roles
-
-      Report all test results with pass/fail counts.
-    output: "Report test results. Set $status to pass (with workflowName and containerName) or fail (with error and partial results)."
-    frontmatter:
-      oneOf:
-        - properties:
-            $status: { const: "pass" }
-            workflowName: { type: string }
-            containerName: { type: string }
-            testsPassed: { type: number }
-          required: [$status, workflowName, containerName]
-        - properties:
-            $status: { const: "fail" }
-            error: { type: string }
-          required: [$status, error]
-
-  thread-lifecycle:
-    description: "Test thread start, exec, read, step list/show, and CAS operations"
-    goal: "You are an E2E test runner. Validate the full thread lifecycle and CAS operations."
-    capabilities:
-      - docker
-      - shell
-    procedure: |
-      Use the container (containerName) and workflow (workflowName) from your prompt.
-      All commands via: `docker exec <containerName> bash -c '...'`
-      Set env: PATH, UNCAGED_WORKFLOW_STORAGE_ROOT=/tmp/uwf-e2e-storage
-
-      Phase 4 — Thread lifecycle:
-      1. `uwf thread start <workflowName> -p 'E2E test: what is 2+2?'` — capture thread ID
-      2. `uwf thread list` — verify thread appears
-      3. `uwf thread show <threadId>` — verify head pointer exists
-      4. `uwf thread exec <threadId> --agent uwf-builtin` — execute one step
-      5. Verify exec returns step info with head
-
-      Phase 5 — Read & Inspect:
-      6. `uwf step list <threadId>` — verify steps exist (length > 1)
-      7. Capture last step hash
-      8. `uwf step show <lastStepHash>` — verify it returns role
-      9. `uwf thread read <threadId>` — verify non-empty output
-      10. `uwf cas get <lastStepHash>` — verify returns type
-      11. `uwf cas has <lastStepHash>` — verify exists
-      12. `uwf cas refs <lastStepHash>` — list refs
-      13. `uwf cas walk <lastStepHash>` — verify returns nodes
-
-      Report all results. Pass the threadId and lastStepHash forward.
-    output: "Report test results. Set $status to pass (with threadId, lastStepHash, containerName) or fail."
-    frontmatter:
-      oneOf:
-        - properties:
-            $status: { const: "pass" }
-            threadId: { type: string }
-            lastStepHash: { type: string }
-            containerName: { type: string }
-            testsPassed: { type: number }
-          required: [$status, threadId, lastStepHash, containerName]
-        - properties:
-            $status: { const: "fail" }
-            error: { type: string }
-          required: [$status, error]
-
-  cancel-fork-and-logs:
-    description: "Test thread cancel, step fork, and log inspection"
-    goal: "You are an E2E test runner. Validate cancel, fork, and log operations."
-    capabilities:
-      - docker
-      - shell
-    procedure: |
-      Use containerName, threadId (first thread), lastStepHash, and workflowName from your prompt.
-      All commands via: `docker exec <containerName> bash -c '...'`
-      Set env: PATH, UNCAGED_WORKFLOW_STORAGE_ROOT=/tmp/uwf-e2e-storage
-
-      Phase 6 — Cancel & Fork:
-      1. Start a second thread: `uwf thread start <workflowName> -p 'E2E cancel test'`
-      2. Cancel it: `uwf thread cancel <secondThreadId>`
-      3. Verify it appears in completed list: `uwf thread list --status completed`
-      4. Fork from the first thread's last step: `uwf step fork <lastStepHash>`
-      5. Verify fork creates a new thread with different ID
-
-      Phase 7 — Logs:
-      6. `uwf log list` — check log files exist
-      7. `uwf log show --thread <threadId>` — verify log output (may be empty, that's ok)
-
-      Phase 8 — Cleanup:
-      8. Stop and remove the Docker container: `docker rm -f <containerName>`
-
-      Report final results with full summary of all phases.
-    output: "Report final test results with pass/fail counts. Set $status to pass or fail."
-    frontmatter:
-      oneOf:
-        - properties:
-            $status: { const: "pass" }
-            totalPassed: { type: number }
-            summary: { type: string }
-          required: [$status, totalPassed, summary]
-        - properties:
-            $status: { const: "fail" }
-            error: { type: string }
-            totalPassed: { type: number }
-          required: [$status, error]
-
-graph:
-  $START:
-    _: { role: "bootstrap", prompt: "Set up the Docker container and verify uwf is runnable." }
-  bootstrap:
-    pass: { role: "setup-and-registry", prompt: "Container {{{containerName}}} is ready. Validate config and workflow registration." }
-    fail: { role: "$END", prompt: "Bootstrap failed: {{{error}}}" }
-  setup-and-registry:
-    pass: { role: "thread-lifecycle", prompt: "Config and registry OK. Workflow '{{{workflowName}}}' registered. Container: {{{containerName}}}. Now test thread lifecycle." }
-    fail: { role: "$END", prompt: "Setup/registry failed: {{{error}}}" }
-  thread-lifecycle:
-    pass: { role: "cancel-fork-and-logs", prompt: "Thread lifecycle OK. threadId={{{threadId}}}, lastStepHash={{{lastStepHash}}}, containerName={{{containerName}}}. Now test cancel, fork, logs, and cleanup." }
-    fail: { role: "$END", prompt: "Thread lifecycle failed: {{{error}}}" }
-  cancel-fork-and-logs:
-    pass: { role: "$END", prompt: "All E2E tests passed! {{{summary}}}" }
-    fail: { role: "$END", prompt: "Cancel/fork/logs phase failed: {{{error}}}. Passed: {{{totalPassed}}}" }
@@ -150,46 +150,42 @@ function dbMessageToSessionMessage(row: DbMessageRow): HermesSessionMessage {
 export function loadHermesSessionFromDb(
  sessionId: string,
  dbPath: string | null = null,
-): Promise<HermesSessionJson | null> {
+): HermesSessionJson | null {
  const resolvedPath = dbPath ?? getHermesDbPath();
+  let db: InstanceType<typeof Database> | null = null;
  try {
-    const db = new Database(resolvedPath, { readonly: true });
-    try {
-      const session = db
-        .query("SELECT id, model, started_at FROM sessions WHERE id = ?")
-        .get(sessionId) as DbSessionRow | null;
-      if (session === null) {
-        db.close();
-        return Promise.resolve(null);
-      }
-      const rows = db
-        .query(
-          "SELECT role, content, reasoning, tool_calls FROM messages WHERE session_id = ? ORDER BY id",
-        )
-        .all(sessionId) as DbMessageRow[];
-      db.close();
-
-      const messages: HermesSessionMessage[] = [];
-      for (const row of rows) {
-        const role = row.role;
-        if (role !== "user" && role !== "assistant" && role !== "tool") {
-          continue;
-        }
-        messages.push(dbMessageToSessionMessage(row));
-      }
-
-      return Promise.resolve({
-        session_id: session.id,
-        model: session.model,
-        session_start: new Date(session.started_at * 1000).toISOString(),
-        messages,
-      });
-    } catch {
-      db.close();
-      return Promise.resolve(null);
+    db = new Database(resolvedPath, { readonly: true });
+    const session = db
+      .query("SELECT id, model, started_at FROM sessions WHERE id = ?")
+      .get(sessionId) as DbSessionRow | null;
+    if (session === null) {
+      return null;
    }
+    const rows = db
+      .query(
+        "SELECT role, content, reasoning, tool_calls FROM messages WHERE session_id = ? ORDER BY id",
+      )
+      .all(sessionId) as DbMessageRow[];
+
+    const messages: HermesSessionMessage[] = [];
+    for (const row of rows) {
+      const role = row.role;
+      if (role !== "user" && role !== "assistant" && role !== "tool") {
+        continue;
+      }
+      messages.push(dbMessageToSessionMessage(row));
+    }
+
+    return {
+      session_id: session.id,
+      model: session.model,
+      session_start: new Date(session.started_at * 1000).toISOString(),
+      messages,
+    };
  } catch {
-    return Promise.resolve(null);
+    return null;
+  } finally {
+    db?.close();
  }
 }
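The hunk above replaces a `db.close()` call at each exit with a single `finally` block. The sketch below shows the same control flow in isolation, assuming a stand-in resource: `FakeDb` and `loadName` are hypothetical names used only for illustration, and the real function uses the SQLite `Database` class instead.

```typescript
// Hypothetical stand-in for a database handle; it only counts close() calls.
class FakeDb {
  static closeCount = 0;
  rows: Record<string, string>;
  constructor(rows: Record<string, string>) {
    this.rows = rows;
  }
  get(id: string): string | null {
    if (id === "boom") throw new Error("query failed");
    return this.rows[id] ?? null;
  }
  close(): void {
    FakeDb.closeCount += 1;
  }
}

// Same shape as the refactored function: one try, one catch, one finally.
function loadName(id: string, open: () => FakeDb): string | null {
  let db: FakeDb | null = null;
  try {
    db = open();
    const row = db.get(id);
    if (row === null) {
      return null; // early return: finally still closes the handle
    }
    return row.toUpperCase();
  } catch {
    return null; // open() or get() threw
  } finally {
    db?.close(); // no-op when open() itself failed and db is still null
  }
}
```

Declaring `db` outside the `try` is what lets `finally` see it, and the optional call covers the case where opening the handle threw before assignment.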
Author	SHA1	Message	Date
xingyue	168e604602	ci: use test:ci to skip integration tests in CI [CI / test (pull_request): Successful in 9m13s]	2026-05-26 23:08:16 +08:00
xiaoju	d50159c5a7	refactor: split e2e-walkthrough into 6 roles with dedicated cleanup [CI / test (push): Failing after 11m29s] bootstrap: Docker + bun install + bun link + verify; config-and-registry: config get/set/list + workflow add/show/list; thread-ops: thread start/list/show/exec; inspect: step list/show + thread read + CAS get/has/refs/walk; cancel-and-fork: cancel + fork + logs; cleanup: docker rm -f (all fail paths route here). 小橘 🍊	2026-05-26 14:47:44 +00:00
xiaoju	9a7ad34e55	chore: move e2e-walkthrough to .workflows/, fix CI, clean .plan/ [CI / test (push): Failing after 11m54s] e2e-walkthrough.yaml: examples/ → .workflows/ (project workflows, not examples); .gitea/workflows/ci.yml: bun test → bun run test (avoid legacy-packages); .plan/: removed stale test spec from #335. 小橘 🍊	2026-05-26 14:37:46 +00:00
xiaoju	4193157124	refactor(hermes): clean up loadHermesSessionFromDb [CI / test (push): Failing after 11m14s] Remove unnecessary Promise.resolve() wrappers (sync function); use try/finally for db.close() instead of manual close at each exit; flatten nested try/catch. Follow-up to #535 review nits. 小橘 🍊	2026-05-26 14:27:31 +00:00
xiaomo	6ff1414cf0	Merge pull request 'fix(hermes): add SQLite fallback for loadHermesSession' (#536) from fix/535-sqlite-fallback into main [CI / test (push): Failing after 9m23s]	2026-05-26 14:24:42 +00:00