# united-workforce/legacy-packages/workflows/e2e-walkthrough.yaml

name: "e2e-walkthrough"
description: "End-to-end walkthrough of the uwf CLI. Dogfooding: uwf tests uwf. Each role validates one phase of the CLI surface inside an isolated Docker container."
roles:
  bootstrap:
    description: "Start Docker container with isolated storage, verify uwf is runnable"
    goal: "You are an E2E test runner. Set up an isolated Docker environment and verify basic uwf functionality."
    capabilities:
      - docker
      - shell
    procedure: |
      1. Start a Docker container with isolated storage.
         IMPORTANT: Mount the source code READ-ONLY to prevent the container
         from overwriting host files (e.g. bun install would replace macOS bun with Linux bun).
         Use a container-local HOME so bun/npm installs stay inside the container.
         Add host.docker.internal mapping for LLM API access from inside the container.
         ```
         docker run -d --name uwf-e2e-$$ \
           -v "$(pwd):/workspace:ro" \
           -e HOME=/root \
           -e UWF_HOME=/tmp/uwf-e2e-storage \
           --add-host=host.docker.internal:host-gateway \
           -w /workspace \
           node:22-bookworm \
           sleep infinity
         ```
         NOTE: Run this from the workflow monorepo root directory.
         On macOS Docker Desktop, host.docker.internal is already available;
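         --add-host ensures it also works on Linux Docker.
         `$$` expands to the PID of the shell that runs the command, so it can differ
         between separate shell invocations. After starting the container, read the
         expanded name (e.g. `docker ps --filter name=uwf-e2e- --format '{{.Names}}'`)
         and use that literal name in place of `uwf-e2e-$$` in every later command.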

      2. Inside the container, copy the source to a writable location, install bun, install deps,
         then `bun link` the packages that provide `uwf`, `uwf-hermes`, and `uwf-builtin` so they are on PATH:
         ```
         docker exec uwf-e2e-$$ bash -c '
           # Stop at the first failing command (including failures inside pipes)
           set -eo pipefail

           # Copy source to writable location (mount is read-only)
           cp -r /workspace /root/workflow

           # Install bun
           curl -fsSL https://bun.sh/install | bash
           export PATH="$HOME/.bun/bin:$PATH"

           # Isolated storage
           mkdir -p "$UWF_HOME"

           # Install workspace deps
           cd /root/workflow && bun install

           # bun link each package that has a bin entry; the cd inside each
           # subshell does not change the script's working directory
           (cd packages/cli && bun link)
           (cd packages/agent-hermes && bun link)
           (cd packages/agent-builtin && bun link)
         '
         ```
      3. Verify all three commands are available inside the container:
         ```
         docker exec uwf-e2e-$$ bash -c 'export PATH="$HOME/.bun/bin:$PATH" && uwf --version'
         docker exec uwf-e2e-$$ bash -c 'export PATH="$HOME/.bun/bin:$PATH" && uwf-hermes --help'
         docker exec uwf-e2e-$$ bash -c 'export PATH="$HOME/.bun/bin:$PATH" && uwf-builtin --help'
         ```
      4. Copy host uwf config into the container's isolated storage.
         The host config contains provider credentials and model settings needed for LLM calls.
         Also rewrite any localhost URLs to host.docker.internal so the container can reach host services.
         ```
         docker cp ~/.uwf/config.yaml uwf-e2e-$$:/tmp/uwf-e2e-storage/config.yaml 2>/dev/null || true
         docker exec uwf-e2e-$$ bash -c '
           if [ -f $UWF_HOME/config.yaml ]; then
             sed -i "s|localhost|host.docker.internal|g; s|127\.0\.0\.1|host.docker.internal|g" \
               $UWF_HOME/config.yaml
           fi
         '
         ```
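
      Optional helper (a sketch; `in_container` is a hypothetical name, not part of uwf).
      It wraps `docker exec` so each command runs with bun's bin directory on PATH,
      which avoids repeating the export in every step. Pass the expanded container
      name, not the literal `uwf-e2e-$$`.
      ```bash
      # Run a command inside the E2E container with ~/.bun/bin prepended to PATH.
      # The command and its arguments are forwarded as positional parameters,
      # so no extra quoting is needed.
      in_container() {
        local name="$1"; shift
        docker exec "$name" bash -c 'export PATH="$HOME/.bun/bin:$PATH"; exec "$@"' bash "$@"
      }

      # Usage, e.g. for step 3 (replace the name with the real one):
      #   in_container uwf-e2e-12345 uwf --version
      #   in_container uwf-e2e-12345 uwf-hermes --help
      #   in_container uwf-e2e-12345 uwf-builtin --help
      ```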

      Report the container name and confirm that uwf and both agents are working.
      Set containerName to the expanded Docker container name (e.g. uwf-e2e-12345,
      not the literal `uwf-e2e-$$`) for subsequent roles.
    output: "Report uwf version and container readiness. Set $status to pass with containerName, or fail with error."
    frontmatter:
      oneOf:
        - properties:
            $status: { const: "pass" }
            containerName: { type: string }
          required: [$status, containerName]
        - properties:
            $status: { const: "fail" }
            error: { type: string }
          required: [$status, error]

  config-and-registry:
    description: "Validate uwf config commands and workflow registration"
    goal: "You are an E2E test runner. Validate uwf config operations and workflow registration inside the Docker container."
    capabilities:
      - docker
      - shell
    procedure: |
      Use the container from the previous step (containerName is in your prompt).
      All commands run via: `docker exec <containerName> bash -c '...'`
      All commands use `uwf` (installed via `bun link` inside the container).
      Each `docker exec` starts a fresh shell, so the PATH export from bootstrap
      does not carry over; begin every exec with:
        export PATH="$HOME/.bun/bin:$PATH"
        export UWF_HOME=/tmp/uwf-e2e-storage

      Config tests:
      1. `uwf config list` — verify it returns valid JSON
      2. `uwf config set models.test.name test-model` — set a test key
      3. `uwf config get models.test.name` — verify it returns "test-model"

      Workflow registration tests:
      4. `uwf workflow add /root/workflow/examples/debate.yaml` — register a workflow (use debate.yaml as it has no $SUSPEND dependency)
      5. Verify the output contains a hash
      6. `uwf workflow list` — verify non-empty array
      7. Capture the workflow name from the list
      8. `uwf workflow show <name>` — verify it returns roles
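
      Optional tally helper (a sketch; `check`, `PASS`, and `FAIL` are hypothetical
      names, not part of uwf) for producing the pass/fail counts. Run it in the
      agent's own shell, not inside the container, and call it directly rather
      than inside `$(...)` so the counters are not lost in a subshell.
      ```bash
      PASS=0
      FAIL=0

      # check LABEL COMMAND [ARGS...]: run COMMAND, print PASS/FAIL, update counters.
      check() {
        local label="$1"; shift
        if "$@"; then
          PASS=$((PASS + 1)); echo "PASS: $label"
        else
          FAIL=$((FAIL + 1)); echo "FAIL: $label"
        fi
      }

      # Example for step 3, assuming `uwf config get` prints the bare value:
      #   got="$(docker exec <containerName> bash -c 'export PATH="$HOME/.bun/bin:$PATH"; uwf config get models.test.name')"
      #   check "config get returns test-model" [ "$got" = "test-model" ]
      # Last line of the report:
      #   echo "passed=$PASS failed=$FAIL"
      ```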

      Report all test results with pass/fail counts.
    output: "Report test results. Set $status to pass (with workflowName and containerName) or fail."
    frontmatter:
      oneOf:
        - properties:
            $status: { const: "pass" }
            workflowName: { type: string }
            containerName: { type: string }
          required: [$status, workflowName, containerName]
        - properties:
            $status: { const: "fail" }
            error: { type: string }
            containerName: { type: string }
          required: [$status, error, containerName]

  thread-ops:
    description: "Test thread start, list, show, and exec"
    goal: "You are an E2E test runner. Validate thread creation and execution inside the Docker container."
    capabilities:
      - docker
      - shell
    procedure: |
      Use the container (containerName) and workflow (workflowName) from your prompt.
      All commands via: `docker exec <containerName> bash -c '...'`
      Start every exec with: export PATH="$HOME/.bun/bin:$PATH" UWF_HOME=/tmp/uwf-e2e-storage

      1. `uwf thread start <workflowName> -p 'E2E test: what is 2+2?'` — capture thread ID from JSON output
      2. `uwf thread list` — verify the thread appears in the list
      3. `uwf thread show <threadId>` — verify head pointer exists
      4. `uwf thread exec <threadId> --agent uwf-builtin` — execute one step
      5. Verify exec returns JSON with a head field
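
      Optional check (a sketch; `output_contains` is a hypothetical name, not part
      of uwf). It is a plain substring test, not a JSON parse, so it only suits
      steps 2, 3, and 5, where the requirement is that an ID or field name appears.
      ```bash
      # output_contains TEXT NEEDLE: succeed if NEEDLE occurs anywhere in TEXT.
      # NEEDLE is quoted in the pattern, so characters like * match literally.
      output_contains() {
        case "$1" in
          *"$2"*) return 0 ;;
          *) return 1 ;;
        esac
      }

      # Examples:
      #   list="$(docker exec <containerName> bash -c 'export PATH="$HOME/.bun/bin:$PATH"; uwf thread list')"
      #   output_contains "$list" "<threadId>" || echo "thread missing from list"
      #   out="$(docker exec <containerName> bash -c 'export PATH="$HOME/.bun/bin:$PATH"; uwf thread exec <threadId> --agent uwf-builtin')"
      #   output_contains "$out" '"head"' || echo "exec output has no head field"
      ```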

      Report results. Pass threadId, workflowName, and containerName forward.
    output: "Report test results. Set $status to pass (with threadId, workflowName, containerName) or fail."
    frontmatter:
      oneOf:
        - properties:
            $status: { const: "pass" }
            threadId: { type: string }
            workflowName: { type: string }
            containerName: { type: string }
          required: [$status, threadId, workflowName, containerName]
        - properties:
            $status: { const: "fail" }
            error: { type: string }
            containerName: { type: string }
          required: [$status, error, containerName]

  inspect:
    description: "Test step list/show, thread read, and CAS operations"
    goal: "You are an E2E test runner. Validate read and inspect operations inside the Docker container."
    capabilities:
      - docker
      - shell
    procedure: |
      Use the container (containerName) and threadId from your prompt.
      All commands via: `docker exec <containerName> bash -c '...'`
      Start every exec with: export PATH="$HOME/.bun/bin:$PATH" UWF_HOME=/tmp/uwf-e2e-storage

      Step inspection:
      1. `uwf step list <threadId>` — verify steps array has length > 1
      2. Capture the last step hash from the output
      3. `uwf step show <lastStepHash>` — verify it returns a role field

      Thread read:
      4. `uwf thread read <threadId>` — verify non-empty output

      CAS operations:
      5. `ocas get <lastStepHash>` — verify returns a type field
      6. `ocas has <lastStepHash>` — verify exits 0
      7. `ocas refs <lastStepHash>` — list refs (may be empty)
      8. `ocas walk <lastStepHash>` — verify returns non-empty array
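
      Optional pre-check (a sketch; `require_cmd` is a hypothetical name). Bootstrap
      only verified `uwf`, `uwf-hermes`, and `uwf-builtin`, so confirm that `ocas` is
      on PATH inside the container before steps 5-8, and report a failure if it is not.
      ```bash
      # require_cmd NAME: succeed if NAME resolves to a command; otherwise
      # print a message to stderr and fail.
      require_cmd() {
        if command -v "$1" >/dev/null 2>&1; then
          return 0
        fi
        echo "missing command: $1" >&2
        return 1
      }

      # Inside the container, after the PATH export above:
      #   require_cmd ocas || exit 1
      ```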

      Report results. Pass threadId, lastStepHash, workflowName, containerName forward.
    output: "Report test results. Set $status to pass (with threadId, lastStepHash, workflowName, containerName) or fail."
    frontmatter:
      oneOf:
        - properties:
            $status: { const: "pass" }
            threadId: { type: string }
            lastStepHash: { type: string }
            workflowName: { type: string }
            containerName: { type: string }
          required: [$status, threadId, lastStepHash, workflowName, containerName]
        - properties:
            $status: { const: "fail" }
            error: { type: string }
            containerName: { type: string }
          required: [$status, error, containerName]

  cancel-and-fork:
    description: "Test thread cancel, step fork, and log inspection"
    goal: "You are an E2E test runner. Validate cancel, fork, and log operations inside the Docker container."
    capabilities:
      - docker
      - shell
    procedure: |
      Use containerName, threadId, lastStepHash, and workflowName from your prompt.
      All commands via: `docker exec <containerName> bash -c '...'`
      Start every exec with: export PATH="$HOME/.bun/bin:$PATH" UWF_HOME=/tmp/uwf-e2e-storage

      Cancel:
      1. Start a second thread and capture its ID as secondThreadId: `uwf thread start <workflowName> -p 'E2E cancel test'`
      2. Cancel it: `uwf thread cancel <secondThreadId>`
      3. Verify it appears in cancelled list: `uwf thread list --status cancelled`

      Fork:
      4. Fork from the first thread's last step: `uwf step fork <lastStepHash>`
      5. Verify the fork creates a new thread whose ID differs from threadId

      Logs:
      6. `uwf log list` — verify output (may be empty)
      7. `uwf log show --thread <threadId>` — verify runs without error
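
      Optional check for step 5 (a sketch; `ids_differ` and `<forkThreadId>` are
      hypothetical names). It fails if either ID is empty, so a fork whose ID was
      never captured is not counted as a pass.
      ```bash
      # ids_differ A B: succeed only if both IDs are non-empty and different.
      ids_differ() {
        [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
      }

      # Example (the fork's thread ID comes from the `uwf step fork` output):
      #   ids_differ "<forkThreadId>" "<threadId>" || echo "fork did not create a new thread"
      ```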

      Report results with summary.
    output: "Report test results with summary. Set $status to pass or fail."
    frontmatter:
      oneOf:
        - properties:
            $status: { const: "pass" }
            containerName: { type: string }
            summary: { type: string }
          required: [$status, containerName, summary]
        - properties:
            $status: { const: "fail" }
            error: { type: string }
            containerName: { type: string }
          required: [$status, error, containerName]

  cleanup:
    description: "Remove Docker container"
    goal: "You are an E2E test runner. Clean up the Docker container used for testing."
    capabilities:
      - docker
      - shell
    procedure: |
      Remove the Docker container (containerName is in your prompt):
      1. `docker rm -f <containerName>`
      2. Verify the container is gone: `docker ps -a --filter name=<containerName> --format '{{.Names}}'` should return empty
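
      Optional exact-name check (a sketch; `container_gone` is a hypothetical name).
      Docker's `name` filter matches all or part of a name, so a container such as
      `uwf-e2e-1234` would also match `uwf-e2e-123`; comparing each returned name
      exactly avoids a false "still present" result.
      ```bash
      # container_gone NAME: succeed if no container has exactly this name.
      container_gone() {
        local name="$1" n
        while IFS= read -r n; do
          if [ "$n" = "$name" ]; then
            return 1
          fi
        done < <(docker ps -a --filter "name=$name" --format '{{.Names}}')
        return 0
      }

      # Usage:
      #   if container_gone <containerName>; then echo "removed"; else echo "still present"; fi
      ```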

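      Report the cleanup result. This role also runs after upstream failures, so restate the upstream outcome from your prompt in summary.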
    output: "Report cleanup result. Set $status to pass or fail."
    frontmatter:
      oneOf:
        - properties:
            $status: { const: "pass" }
            summary: { type: string }
          required: [$status, summary]
        - properties:
            $status: { const: "fail" }
            error: { type: string }
          required: [$status, error]

graph:
  $START:
    new: { role: "bootstrap", prompt: "Set up the Docker container and verify uwf is runnable." }
    resume: { role: "bootstrap", prompt: "Review the previous run output and continue the walkthrough." }
  bootstrap:
    pass: { role: "config-and-registry", prompt: "Container {{{containerName}}} is ready. Validate config and workflow registration." }
    fail: { role: "$END", prompt: "Bootstrap failed: {{{error}}}. If the container was already started, remove it manually (docker ps -a --filter name=uwf-e2e-)." }
  config-and-registry:
    pass: { role: "thread-ops", prompt: "Config and registry OK. Workflow '{{{workflowName}}}' registered. Container: {{{containerName}}}. Now test thread operations." }
    fail: { role: "cleanup", prompt: "Config/registry failed: {{{error}}}. Clean up container {{{containerName}}}." }
  thread-ops:
    pass: { role: "inspect", prompt: "Thread ops OK. threadId={{{threadId}}}, workflowName={{{workflowName}}}, containerName={{{containerName}}}. Now test inspect operations." }
    fail: { role: "cleanup", prompt: "Thread ops failed: {{{error}}}. Clean up container {{{containerName}}}." }
  inspect:
    pass: { role: "cancel-and-fork", prompt: "Inspect OK. threadId={{{threadId}}}, lastStepHash={{{lastStepHash}}}, workflowName={{{workflowName}}}, containerName={{{containerName}}}. Now test cancel, fork, and logs." }
    fail: { role: "cleanup", prompt: "Inspect failed: {{{error}}}. Clean up container {{{containerName}}}." }
  cancel-and-fork:
    pass: { role: "cleanup", prompt: "All tests passed! {{{summary}}}. Clean up container {{{containerName}}}." }
    fail: { role: "cleanup", prompt: "Cancel/fork failed: {{{error}}}. Clean up container {{{containerName}}}." }
  cleanup:
    pass: { role: "$END", prompt: "E2E walkthrough complete. {{{summary}}}" }
    fail: { role: "$END", prompt: "Cleanup failed: {{{error}}}. Manual cleanup may be needed." }