docs: upgrade debate example — 3 roles, oneOf routing, bounded termination

Based on xiaonuo's battle-tested debate workflow. Changes: - 3 roles (proponent/opponent/host) instead of 2 - oneOf + const schema (speak/conceded/final) instead of enum - Bounded: 3rd speech forces final statement - Host role delivers summary/highlights/verdict - Critical thinking framework in procedure - Edge prompts pass structured data via {{{argument}}}/{{{closing}}}
2026-06-06 03:01:37 +00:00
21 changed files with 676 additions and 144 deletions
@@ -0,0 +1,12 @@
+---
+"@united-workforce/agent-hermes": patch
+"@united-workforce/agent-claude-code": patch
+"@united-workforce/util-agent": patch
+---
+
+feat: inject thread progress into agent prompt (#127)
+
+Agents now receive a "Thread Progress" section in their prompt showing the
+current step number and how many times the current role has spoken before.
+This eliminates the need for agents to make tool calls (terminal, delegate_task)
+just to count their own turn history.
@@ -0,0 +1,247 @@
+name: "solve-issue"
+description: "TDD-driven issue resolution for small, focused changes. Loop protection relies on engine maxRounds."
+roles:
+  planner:
+    description: "Analyzes issue and outputs a TDD test spec"
+    goal: "You are a planning agent. You analyze Gitea issues and produce a TDD test specification that downstream roles will implement and verify."
+    capabilities:
+      - issue-analysis
+      - planning
+    procedure: |
+      On first run (no previous steps):
+      1. Read the issue and all comments from Gitea using `tea issues <number> -r <owner/repo>`
+      2. Look for project conventions files (CLAUDE.md, CONTRIBUTING.md, .cursor/rules/) in the repo
+      3. Assess whether the issue has enough information to produce a test spec
+      4. If insufficient info: comment on the issue via `echo "..." | tea comment <number> -r <owner/repo>` (skip if you already commented), then output $status=insufficient_info
+      5. If sufficient: produce a detailed TDD test spec in markdown covering all scenarios
+
+      On subsequent runs (bounced back by tester with fix_spec):
+      1. Read the tester's output from the previous step to understand what's wrong with the spec
+      2. Revise the test spec accordingly
+
+      After producing the test spec:
+      1. The test spec is stored in CAS automatically by the uwf pipeline (agents do not need to call `ocas put` directly)
+      2. Put the plan hash in frontmatter.plan (required when $status=ready)
+      3. Set repoPath to the absolute path of the repository root
+
+      IMPORTANT: Extract the repo remote (owner/repo) from git:
+      ```bash
+      git remote get-url origin | sed 's|.*[:/]\([^/]*/[^.]*\).*|\1|'
+      ```
+      Store the result as repoRemote in your frontmatter output so downstream roles can use it for tea/API calls.
+    output: "Output a brief summary of the test spec. Set $status to ready (with plan hash and repoPath) or insufficient_info."
+    frontmatter:
+      oneOf:
+        - properties:
+            $status: { const: "ready" }
+            plan: { type: string }
+            repoPath: { type: string }
+            repoRemote: { type: string }
+          required: [$status, plan, repoPath, repoRemote]
+        - properties:
+            $status: { const: "insufficient_info" }
+            reason: { type: string }
+          required: [$status, reason]
+  developer:
+    description: "TDD implementation per test spec"
+    goal: "You are a developer agent. You implement code changes following TDD — write tests first, then implementation."
+    capabilities:
+      - coding
+    procedure: |
+      IMPORTANT: Always work in a git worktree, NEVER modify the main working directory directly.
+      The repo path and other details are provided in your task prompt.
+
+      Before starting any work, set up an isolated worktree:
+      1. cd into the repo path provided in your task prompt
+      2. `git fetch origin` to get latest refs
+      3. First time (no existing branch):
+         - `git worktree add .worktrees/fix/<issue-number>-<short-slug> -b fix/<issue-number>-<short-slug> origin/main`
+         - `cd .worktrees/fix/<issue-number>-<short-slug> && bun install`
+      4. If bounced back from reviewer or tester (branch already exists):
+         - cd into the existing worktree under `.worktrees/fix/<issue-number>-<short-slug>`
+         - `git fetch origin && git rebase origin/main`
+      5. ALL subsequent work must happen inside the worktree directory.
+
+      Then implement TDD:
+      6. Read the test spec from CAS: `ocas get <plan hash>` (find the hash from the planner's output in your task prompt)
+      7. If bounced back from reviewer or tester: read the previous role's feedback in your task prompt
+      8. Write tests first based on the spec
+      9. Implement the code to make tests pass
+      10. Ensure `bun run build` passes with no errors
+      11. Run `bun test` to verify all tests pass
+          - If tests fail on first run:
+            * Read the test output carefully for missing imports or setup issues
+            * Check if you're running tests from the correct working directory (package root vs workspace root)
+            * Fix the immediate issue and rerun ONCE
+            * If tests still fail after 2 attempts: check the test spec for ambiguities
+            * If stuck after 3 test cycles: set $status=failed with detailed error report rather than continuing blind retries
+      12. MANDATORY VERIFICATION before reporting done:
+          - Run `git branch --show-current` and confirm branch name matches expected
+          - Run `git status` and verify changed files exist
+          - Run `ls -la <key-implementation-files>` to verify they exist on disk
+          - If ANY verification fails: retry the implementation, do NOT report done
+
+      If you cannot complete the implementation (e.g. the issue is too complex, blocked by external factors,
+      or repeated attempts fail), set $status=failed with a reason.
+    output: "List all files changed and provide a summary. Set $status to done (with branch/worktree), or failed (with reason)."
+    frontmatter:
+      oneOf:
+        - properties:
+            $status: { const: "done" }
+            branch: { type: string }
+            worktree: { type: string }
+            repoRemote: { type: string }
+          required: [$status, branch, worktree]
+        - properties:
+            $status: { const: "failed" }
+            reason: { type: string }
+          required: [$status, reason]
+  reviewer:
+    description: "Code standards compliance check"
+    goal: "You are a code reviewer. You verify code standards compliance — NOT functionality (that's the tester's job)."
+    capabilities:
+      - code-review
+      - static-analysis
+    procedure: |
+      The worktree path is provided in your task prompt. cd into it first.
+
+      CRITICAL: You MUST execute every verification command below. Do NOT report results without running the actual commands. Do NOT rely on prior context or assumptions.
+
+      Before reviewing, verify the worktree and branch exist:
+      0. Run `cd <worktree-path> && pwd` to confirm the path is accessible
+         - If the cd fails: the worktree truly doesn't exist, reject with that reason
+         - If the cd succeeds: proceed with step 1 below
+      1. Run `git branch --show-current` — confirm the branch name references the issue number being worked on
+      2. If the branch doesn't correspond to the issue, flag it in your output and reject
+
+      Then perform code review:
+      Hard checks (must all pass):
+      3. `bun run build` — no build errors
+      4. `bunx biome check` — no lint violations
+      5. TypeScript strict mode — no type errors
+
+      Soft checks (review against project conventions if CLAUDE.md / .cursor/rules exist):
+      - Naming conventions, module boundaries, code style
+      - No `console.log` in production code
+      - No dynamic imports in production code
+
+      Only review standards compliance. Do NOT test functionality.
+      If rejecting, you MUST explain the specific reason in your output.
+    output: "Explain your decision with specific file/line references. Set $status to approved (with branch/worktree) or rejected (with comments)."
+    frontmatter:
+      oneOf:
+        - properties:
+            $status: { const: "approved" }
+            branch: { type: string }
+            worktree: { type: string }
+            repoRemote: { type: string }
+          required: [$status, branch, worktree]
+        - properties:
+            $status: { const: "rejected" }
+            comments: { type: string }
+            worktree: { type: string }
+            repoRemote: { type: string }
+          required: [$status, comments, worktree]
+  tester:
+    description: "Functional correctness verification"
+    goal: "You are a tester agent. You verify that the implementation correctly satisfies every scenario in the test spec."
+    capabilities:
+      - testing
+    procedure: |
+      The worktree path is provided in your task prompt. cd into it first.
+
+      1. Run `bun test` for automated test verification
+      2. Read the test spec from CAS: `ocas get <plan hash>` (find the hash from the planner step in the thread history)
+      3. Verify each scenario in the spec is covered and passing
+      4. Determine outcome:
+         - passed: all scenarios verified, tests pass
+         - fix_code: tests fail or implementation doesn't match spec → send back to developer
+         - fix_spec: the spec itself is wrong or incomplete → send back to planner
+    output: "Report test results per scenario. Set $status to passed (with branch/worktree), fix_code (with report), or fix_spec (with report)."
+    frontmatter:
+      oneOf:
+        - properties:
+            $status: { const: "passed" }
+            branch: { type: string }
+            worktree: { type: string }
+            repoRemote: { type: string }
+          required: [$status, branch, worktree]
+        - properties:
+            $status: { const: "fix_code" }
+            report: { type: string }
+            repoRemote: { type: string }
+            worktree: { type: string }
+            branch: { type: string }
+          required: [$status, report]
+        - properties:
+            $status: { const: "fix_spec" }
+            report: { type: string }
+            repoRemote: { type: string }
+            worktree: { type: string }
+            branch: { type: string }
+          required: [$status, report]
+  committer:
+    description: "Commits and creates PR"
+    goal: "You are a committer agent. You create a clean commit and push a PR linking the original issue."
+    capabilities: []
+    procedure: |
+      The worktree path, branch name, and repo remote (owner/repo) are provided in your task prompt.
+      cd into the worktree first.
+
+      Note: You inherit the developer's worktree and branch. Do NOT create a new branch.
+      1. Check `git status` — if working tree is clean and branch is ahead of origin, skip to step 3 (push).
+      2. If there are unstaged/uncommitted changes: `git add -A` then `git commit -m "type: description\n\nFixes #N"`
+      3. Push the branch: `git push -u origin <branch-name>`
+      4. **Verify push succeeded** — run `git ls-remote origin <branch-name>` and confirm it prints a commit hash.
+         - If no output or push failed: capture the error, mark hook_failed
+      5. Create a PR using the Gitea API (do NOT use `tea pr create` — it fails in worktrees):
+         ```bash
+         GITEA_TOKEN=$(cfg get GITEA_TOKEN)
+         curl -s -X POST -H "Authorization: token $GITEA_TOKEN" -H "Content-Type: application/json" \
+           "https://git.shazhou.work/api/v1/repos/<owner>/<repo>/pulls" \
+           -d '{"title":"...","body":"...","head":"<branch>","base":"main"}'
+         ```
+         - The repo remote (owner/repo format, e.g. "shazhou/united-workforce") is given in your task prompt — use it directly.
+         - PR body must include: What / Why / Changes / Ref sections, with `Fixes #N` in Ref
+      6. **Verify PR was created** — parse the curl response JSON: it must contain a `"number"` field. Print the PR URL.
+         - If curl returns an error or no number field: capture the response, mark hook_failed
+      7. After PR creation, clean up the worktree:
+         - cd to the repo root (parent of .worktrees)
+         - `git worktree remove <worktree-path>`
+    output: "Include PR URL on success or error log on failure. Set $status to committed (with prUrl) or hook_failed (with error)."
+    frontmatter:
+      oneOf:
+        - properties:
+            $status: { const: "committed" }
+            prUrl: { type: string }
+            repoRemote: { type: string }
+            worktree: { type: string }
+            branch: { type: string }
+          required: [$status, prUrl]
+        - properties:
+            $status: { const: "hook_failed" }
+            error: { type: string }
+            repoRemote: { type: string }
+            worktree: { type: string }
+            branch: { type: string }
+          required: [$status, error]
+graph:
+  $START:
+    new: { role: "planner", prompt: "Analyze the issue and produce an implementation plan." }
+    resume: { role: "planner", prompt: "Review the previous run output and continue the work." }
+  planner:
+    insufficient_info: { role: "$SUSPEND", prompt: "信息不足，需要补充：{{{reason}}}" }
+    ready: { role: "developer", prompt: "Implement the TDD test spec (CAS hash: {{{plan}}}) in repo {{{repoPath}}}. Repo remote: {{{repoRemote}}}." }
+  developer:
+    done: { role: "reviewer", prompt: "Review branch {{{branch}}} at {{{worktree}}} for code standards compliance. Repo remote: {{{repoRemote}}}." }
+    failed: { role: "$END", prompt: "Developer failed: {{{reason}}}. Ending workflow." }
+  reviewer:
+    rejected: { role: "developer", prompt: "Reviewer rejected: {{{comments}}}. Fix the issues in repo {{{worktree}}}. Repo remote: {{{repoRemote}}}." }
+    approved: { role: "tester", prompt: "Review passed. Run tests on branch {{{branch}}} at {{{worktree}}}. Repo remote: {{{repoRemote}}}." }
+  tester:
+    fix_code: { role: "developer", prompt: "Tests found code issues: {{{report}}}. Fix and re-submit. Worktree: {{{worktree}}}. Repo remote: {{{repoRemote}}}." }
+    fix_spec: { role: "planner", prompt: "Tests found spec issues: {{{report}}}. Revise the test spec. Repo remote: {{{repoRemote}}}." }
+    passed: { role: "committer", prompt: "All tests passed. Commit and push branch {{{branch}}} from {{{worktree}}}. Repo remote (owner/repo): {{{repoRemote}}}." }
+  committer:
+    hook_failed: { role: "developer", prompt: "Push hook failed: {{{error}}}. Fix and re-submit. Worktree: {{{worktree}}}. Repo remote: {{{repoRemote}}}." }
+    committed: { role: "$END", prompt: "PR created: {{{prUrl}}}. Workflow complete." }
@@ -23,7 +23,7 @@ roles:
      type: object
      properties:
        $status:
-          const: done
+          enum: ["done"]
        thesis:
          type: string
        keyPoints:
@@ -1,22 +1,5 @@
 name: debate
-description: "Multi-role structured debate with critical thinking framework and host summary."
-
-# Shared frontmatter schema for debater roles (YAML anchor)
-x-debater-frontmatter: &debater-frontmatter
-  type: object
-  oneOf:
-    - properties:
-        $status: { const: speak }
-        argument: { type: string }
-      required: [$status, argument]
-    - properties:
-        $status: { const: conceded }
-        reason: { type: string }
-      required: [$status, reason]
-    - properties:
-        $status: { const: final }
-        closing: { type: string }
-      required: [$status, closing]
+description: "Structured two-side debate with host summary. Demonstrates multi-role coordination, oneOf output routing, and bounded termination."

 roles:
  proponent:
@@ -26,13 +9,12 @@ roles:
    procedure: |
      You are an experienced scholar arguing FOR the proposition.

-      ## Critical Thinking Framework (execute before every speech)
+      ## Critical Thinking Framework (apply before every response)

-      ### A. Pre-speech reflection (internal, do not output)
-      - Does every step in my argument chain hold? Any hidden assumptions or logical gaps?
+      ### A. Pre-response reflection (internal, do not output)
+      - Does every step in my argument chain hold? Any hidden assumptions?
      - If I were my opponent, how would I attack this? Where am I weakest?
      - Does my evidence actually support my claim, or could it backfire?
-      - Should I go on offense or defense this round?

      ### B. Evidence discipline
      - Verify key numbers — watch for order-of-magnitude errors
@@ -50,7 +32,21 @@ roles:
      4. Otherwise output $status: speak and counter the opponent's points.
      5. Be rigorous, cite evidence, stay concise.
    output: "Debate argument"
-    frontmatter: *debater-frontmatter
+    frontmatter:
+      type: object
+      oneOf:
+        - properties:
+            $status: { const: speak }
+            argument: { type: string }
+          required: [$status, argument]
+        - properties:
+            $status: { const: conceded }
+            reason: { type: string }
+          required: [$status, reason]
+        - properties:
+            $status: { const: final }
+            closing: { type: string }
+          required: [$status, closing]

  opponent:
    description: "Argues AGAINST the proposition"
@@ -59,13 +55,12 @@ roles:
    procedure: |
      You are an experienced scholar arguing AGAINST the proposition.

-      ## Critical Thinking Framework (execute before every speech)
+      ## Critical Thinking Framework (apply before every response)

-      ### A. Pre-speech reflection (internal, do not output)
-      - Does every step in my argument chain hold? Any hidden assumptions or logical gaps?
+      ### A. Pre-response reflection (internal, do not output)
+      - Does every step in my argument chain hold? Any hidden assumptions?
      - If I were my opponent, how would I attack this? Where am I weakest?
      - Does my evidence actually support my claim, or could it backfire?
-      - Should I go on offense or defense this round?

      ### B. Evidence discipline
      - Verify key numbers — watch for order-of-magnitude errors
@@ -83,7 +78,21 @@ roles:
      4. Otherwise output $status: speak and counter the proponent's points.
      5. Be rigorous, cite evidence, stay concise.
    output: "Debate argument"
-    frontmatter: *debater-frontmatter
+    frontmatter:
+      type: object
+      oneOf:
+        - properties:
+            $status: { const: speak }
+            argument: { type: string }
+          required: [$status, argument]
+        - properties:
+            $status: { const: conceded }
+            reason: { type: string }
+          required: [$status, reason]
+        - properties:
+            $status: { const: final }
+            closing: { type: string }
+          required: [$status, closing]

  host:
    description: "Debate moderator — delivers impartial summary and verdict"
@@ -18,7 +18,8 @@ roles:
      type: object
      properties:
        $status:
-          const: done
+          type: string
+          enum: [done]
        summary:
          type: string
      required: [$status, summary]
@@ -1,5 +1,5 @@
 name: "solve-issue"
-description: "TDD-driven issue resolution for small, focused changes. Loop protection relies on engine maxRounds. Uses pnpm."
+description: "TDD-driven issue resolution for small, focused changes. Loop protection relies on engine maxRounds."
 roles:
  planner:
    description: "Analyzes issue and outputs a TDD test spec"
@@ -80,7 +80,7 @@ roles:
      2. `git fetch origin` to get latest refs
      3. First time (no existing branch):
         - `git worktree add .worktrees/fix/<issue-number>-<short-slug> -b fix/<issue-number>-<short-slug> origin/main`
-         - `cd .worktrees/fix/<issue-number>-<short-slug> && pnpm install`
+         - `cd .worktrees/fix/<issue-number>-<short-slug> && bun install`
      4. If continuing on existing branch (prompt says "Continue work on existing branch" or provides a worktree path):
         - cd directly into the worktree path provided in the prompt
         - `git fetch origin && git rebase origin/main`
@@ -95,20 +95,8 @@ roles:
      7. If bounced back from reviewer or tester: read the previous role's feedback in your task prompt
      8. Write tests first based on the spec
      9. Implement the code to make tests pass
-      10. Ensure `pnpm run build` passes with no errors
-      11. Run `pnpm test` to verify all tests pass
-
-      After implementation, before reporting done:
-      12. Add a changeset file (`.changeset/<short-slug>.md`) with correct bump type:
-          - `patch` for bug fixes, internal refactors, test-only changes
-          - `minor` for new features, new CLI commands, new API surfaces
-          - `major` for breaking changes
-          List every affected package in the changeset frontmatter.
-      13. Update documentation if the change affects user-facing behavior:
-          - `README.md` — usage examples, feature descriptions
-          - `.cards/` — architecture decision records (if applicable)
-          - CLI prompt subcommand output (if CLI help text changes)
-          - CLI `--help` text (if flags/commands are added or changed)
+      10. Ensure `bun run build` passes with no errors
+      11. Run `bun test` to verify all tests pass

      If you cannot complete the implementation (e.g. the issue is too complex, blocked by external factors,
      or repeated attempts fail), set $status=failed with a reason.
@@ -139,8 +127,8 @@ roles:

      Then perform code review:
      Hard checks (must all pass):
-      3. `pnpm run build` — no build errors
-      4. `pnpm run check` — no lint violations
+      3. `bun run build` — no build errors
+      4. `bunx biome check` — no lint violations
      5. TypeScript strict mode — no type errors

      Soft checks (review against project conventions if CLAUDE.md / .cursor/rules exist):
@@ -148,14 +136,6 @@ roles:
      - No `console.log` in production code
      - No dynamic imports in production code

-      Documentation & changeset checks:
-      6. Changeset exists in `.changeset/` with correct bump type (`patch`/`minor`/`major`) and lists all affected packages
-      7. If the change is user-facing, documentation is updated:
-         - `README.md` reflects new/changed behavior
-         - `.cards/` architecture cards updated if design decisions changed
-         - CLI prompt subcommand output updated (if it generates skill/reference content)
-         - CLI `--help` text matches new flags/commands
-
      Only review standards compliance. Do NOT test functionality.
      If rejecting, you MUST explain the specific reason in your output.
    output: "Explain your decision with specific file/line references. Set $status to approved (with branch/worktree) or rejected (with comments)."
@@ -179,7 +159,7 @@ roles:
    procedure: |
      The worktree path is provided in your task prompt. cd into it first.

-      1. Run `pnpm test` for automated test verification
+      1. Run `bun test` for automated test verification
      2. Read the test spec from CAS: `ocas get <plan hash>` (find the hash from the planner step in the thread history)
      3. Verify each scenario in the spec is covered and passing
      4. Determine outcome:
@@ -1,6 +1,6 @@
 {
  "name": "@united-workforce/agent-claude-code",
-  "version": "0.1.3",
+  "version": "0.1.2",
  "files": [
    "src",
    "dist",
@@ -176,12 +176,8 @@ async function runClaudeCode(ctx: AgentContext, model: string | null): Promise<A

  log("K7R2M4N8", `prompt for role=${ctx.role} (length=${fullPrompt.length}):\n${fullPrompt}`);

-  // Try resuming a cached session.  This covers both normal re-entry
-  // (e.g. reviewer reject → developer re-entry) AND the case where a
-  // previous run completed but frontmatter validation failed — the step
-  // was never written to CAS so isFirstVisit is still true, but the
-  // session cache holds a valid session we should resume.
-  {
+  // Try resuming a cached session for re-entry scenarios (e.g. reviewer reject → developer re-entry).
+  if (!ctx.isFirstVisit) {
    const cachedSessionId = await getCachedSessionId(
      "claude-code",
      ctx.threadId,
@@ -1,6 +1,6 @@
 {
  "name": "@united-workforce/agent-hermes",
-  "version": "0.1.4",
+  "version": "0.1.3",
  "files": [
    "src",
    "dist",
@@ -12,11 +12,7 @@ const OWN_VERSION = (
  }
 ).version;

-/** Resolve hermes binary: `UWF_HERMES_BIN` override → default `"hermes"` via PATH. */
-function resolveHermesCommand(): string {
-  const override = process.env.UWF_HERMES_BIN;
-  return override !== undefined && override !== "" ? override : "hermes";
-}
+const HERMES_COMMAND = "hermes";
 const PROTOCOL_VERSION = 1;

 type JsonRpcResponse = {
@@ -275,8 +271,7 @@ export class HermesAcpClient {
      return;
    }

-    const hermesCommand = resolveHermesCommand();
-    const child = spawn(hermesCommand, ["acp"], {
+    const child = spawn(HERMES_COMMAND, ["acp"], {
      env: process.env,
      shell: false,
      stdio: ["pipe", "pipe", "pipe"],
@@ -110,15 +110,11 @@ async function prepareSession(
  cwd: string,
  resumeDisabled: boolean,
 ): Promise<PromptAttempt> {
-  if (resumeDisabled) {
+  if (ctx.isFirstVisit || resumeDisabled) {
    await client.connect(cwd);
    return { useContinuation: false, resumed: false };
  }

-  // Check session cache regardless of isFirstVisit.  A previous run may
-  // have completed and cached its session but failed frontmatter
-  // validation — the step never got written to CAS so isFirstVisit is
-  // still true, yet we should resume the existing session.
  const cachedSessionId = await getCachedSessionId(ctx.threadId, ctx.role, ctx.storageRoot);
  if (cachedSessionId === null) {
    log("6RWK3N8Q", `no cached session for ${ctx.threadId}:${ctx.role}, starting new session`);
@@ -21,11 +21,11 @@ describe("solve-issue workflow: Gitea API PR creation", () => {
    "..",
    "..",
    "..",
-    "examples",
+    ".workflows",
    "solve-issue.yaml",
  );

-  test("committer procedure should create PR via tea pr create", async () => {
+  test("committer procedure should use curl API instead of tea pr create", async () => {
    const yamlContent = await readFile(workflowPath, "utf-8");
    const workflow = parse(yamlContent) as WorkflowPayload;

@@ -33,22 +33,25 @@ describe("solve-issue workflow: Gitea API PR creation", () => {
    const committerProcedure = workflow.roles.committer?.procedure;
    expect(committerProcedure).toBeDefined();

-    // Verify the procedure uses tea pr create for PR creation
-    expect(committerProcedure).toContain("tea pr create");
-    expect(committerProcedure).toContain("git push");
-    expect(committerProcedure).toContain("Fixes #N");
+    // Verify the procedure uses curl API, not tea pr create
+    expect(committerProcedure).toContain("curl");
+    expect(committerProcedure).toContain("api/v1/repos");
+    expect(committerProcedure).toContain("/pulls");
+
+    // Verify it explicitly warns against tea pr create
+    expect(committerProcedure).toMatch(/do NOT use.*tea pr create/i);
  });

-  test("committer procedure should extract owner/repo from git remote", async () => {
+  test("committer procedure should reference repoRemote from task prompt", async () => {
    const yamlContent = await readFile(workflowPath, "utf-8");
    const workflow = parse(yamlContent) as WorkflowPayload;

    const committerProcedure = workflow.roles.committer?.procedure;
    expect(committerProcedure).toBeDefined();

-    // Verify the procedure extracts owner/repo from remote
-    expect(committerProcedure).toContain("git remote get-url origin");
-    expect(committerProcedure).toContain("hook_failed");
+    // Verify the procedure mentions repoRemote is provided in task prompt
+    expect(committerProcedure).toMatch(/repo remote.*provided.*task prompt/i);
+    expect(committerProcedure).toMatch(/owner\/repo/i);
  });

  test("committer procedure should include error handling for curl failures", async () => {
@@ -97,42 +100,45 @@ describe("solve-issue workflow: Gitea API PR creation", () => {
    expect(committedVariant.required).toContain("$status");
  });

-  test("developer procedure should include worktree setup", async () => {
+  test("developer procedure should include mandatory verification step", async () => {
    const yamlContent = await readFile(workflowPath, "utf-8");
    const workflow = parse(yamlContent) as WorkflowPayload;

    const developerProcedure = workflow.roles.developer?.procedure;
    expect(developerProcedure).toBeDefined();

-    // Verify the procedure includes worktree setup
-    expect(developerProcedure).toContain("IMPORTANT");
-    expect(developerProcedure).toContain("git worktree add");
-    expect(developerProcedure).toContain("pnpm install");
+    // Verify the procedure includes mandatory verification step
+    expect(developerProcedure).toContain("MANDATORY VERIFICATION");
+    expect(developerProcedure).toContain("git branch --show-current");
+    expect(developerProcedure).toContain("git status");
+    expect(developerProcedure).toMatch(/ls -la|verify.*exist/i);
  });

-  test("reviewer procedure should verify branch and run checks", async () => {
+  test("reviewer procedure should enforce worktree path verification", async () => {
    const yamlContent = await readFile(workflowPath, "utf-8");
    const workflow = parse(yamlContent) as WorkflowPayload;

    const reviewerProcedure = workflow.roles.reviewer?.procedure;
    expect(reviewerProcedure).toBeDefined();

-    // Verify the procedure includes branch verification and build checks
-    expect(reviewerProcedure).toContain("git branch --show-current");
-    expect(reviewerProcedure).toContain("pnpm run build");
-    expect(reviewerProcedure).toContain("pnpm run check");
+    // Verify the procedure includes critical enforcement
+    expect(reviewerProcedure).toContain("CRITICAL");
+    expect(reviewerProcedure).toMatch(/cd.*pwd/);
+    expect(reviewerProcedure).toContain(
+      "Do NOT report results without running the actual commands",
+    );
  });

-  test("developer procedure should include changeset and failure handling", async () => {
+  test("developer procedure should include test debugging escalation", async () => {
    const yamlContent = await readFile(workflowPath, "utf-8");
    const workflow = parse(yamlContent) as WorkflowPayload;

    const developerProcedure = workflow.roles.developer?.procedure;
    expect(developerProcedure).toBeDefined();

-    // Verify the procedure includes changeset requirement and failure path
-    expect(developerProcedure).toContain(".changeset/");
+    // Verify the procedure includes test failure guidance
+    expect(developerProcedure).toMatch(/tests fail.*first run/i);
+    expect(developerProcedure).toMatch(/3 test cycles|after 3 attempts/i);
    expect(developerProcedure).toContain("$status=failed");
-    expect(developerProcedure).toContain("pnpm test");
  });
 });
@@ -1,6 +1,6 @@
 {
  "name": "@united-workforce/eval",
-  "version": "0.1.5",
+  "version": "0.1.3",
  "private": false,
  "files": [
    "src",
@@ -6,7 +6,7 @@ import { formatList, selectEntries } from "./format.js";
 import { readEvalEntries } from "./read.js";

 const log = createLogger({ sink: { kind: "stderr" } });
-const LOG_LIST = "H5KX9R2B";
+const LOG_LIST = "L5KX9R2B";

 type ListCliOptions = {
  task: string | undefined;
@@ -225,34 +225,4 @@ describe("buildOutputFormatInstruction", () => {
    const result = buildOutputFormatInstruction({});
    expect(result).toContain("Focus exclusively on YOUR role");
  });
-
-  test("renders const value as literal in flat schema example", () => {
-    const schema = {
-      type: "object",
-      properties: {
-        $status: { type: "string", const: "greeted" },
-        message: { type: "string" },
-      },
-      required: ["$status", "message"],
-    };
-    const result = buildOutputFormatInstruction(schema);
-    expect(result).toContain("$status: greeted");
-    expect(result).toContain("fixed value");
-    expect(result).not.toContain("$status: <string>");
-  });
-
-  test("renders const value for non-string types", () => {
-    const schema = {
-      type: "object",
-      properties: {
-        count: { type: "number", const: 42 },
-        done: { type: "boolean", const: true },
-      },
-      required: ["count", "done"],
-    };
-    const result = buildOutputFormatInstruction(schema);
-    expect(result).toContain("count: 42");
-    expect(result).toContain("done: true");
-    expect(result).toContain("fixed value");
-  });
 });
@@ -1,6 +1,6 @@
 {
  "name": "@united-workforce/util-agent",
-  "version": "0.1.1",
+  "version": "0.1.0",
  "files": [
    "src",
    "dist",
@@ -74,10 +74,6 @@ function collectObjectSchemas(schema: JSONSchema): JSONSchema[] {
 }

 function resolvePropertySchema(prop: JSONSchema): JSONSchema {
-  if (prop.const !== undefined) {
-    return prop;
-  }
-
  if (Array.isArray(prop.enum) && prop.enum.length > 0) {
    return prop;
  }
@@ -117,11 +113,6 @@ function buildPropertyExampleLine(prop: SchemaProperty): string {
    commentParts.push("required");
  }

-  if (resolved.const !== undefined) {
-    commentParts.push("fixed value");
-    return `${prop.name}: ${formatYamlScalar(resolved.const)}${buildPropertyComment(commentParts)}`;
-  }
-
  if (Array.isArray(resolved.enum) && resolved.enum.length > 0) {
    const enumValues = resolved.enum.map((v) => String(v));
    commentParts.push(...enumValues);
@@ -1,6 +1,6 @@
 {
  "name": "@united-workforce/util",
-  "version": "0.1.4",
+  "version": "0.1.3",
  "files": [
    "src",
    "dist",
@@ -0,0 +1,329 @@
+name: solve-issue
+description: TDD-driven issue resolution adapted for the workflow monorepo with bun + vitest
+roles:
+  planner:
+    description: Analyzes issue and outputs a TDD test spec
+    goal: You are a planning agent. You analyze Gitea issues and produce a TDD test specification that downstream roles will implement and verify.
+    capabilities:
+    - issue-analysis
+    - planning
+    procedure: 'On first run (no previous steps):
+
+      1. Read the issue and all comments from Gitea using `tea issues <number> -r <owner/repo>`
+
+      2. Look for project conventions files (CLAUDE.md, CONTRIBUTING.md) in the repo
+
+      3. Assess whether the issue has enough information to produce a test spec
+
+      4. If insufficient info: comment on the issue via `echo "..." | tea comment <number> -r <owner/repo>` (skip if you already commented), then output $status=insufficient_info
+
+      5. If sufficient: produce a detailed TDD test spec in markdown covering all scenarios
+
+
+      On subsequent runs (bounced back by tester with fix_spec):
+
+      1. Read the tester''s output from the previous step to understand what''s wrong with the spec
+
+      2. Revise the test spec accordingly
+
+
+      After producing the test spec:
+
+      1. The test spec is stored in CAS automatically by the uwf pipeline (agents do not need to call `ocas put` directly)
+
+      2. Put the hash in frontmatter.plan (required when $status=ready)
+
+      3. Set repoPath to the absolute path of the repository root
+
+
+
+      IMPORTANT: Extract the repo remote (owner/repo) from git:
+
+      ```bash
+
+      git remote get-url origin | sed ''s|.*[:/]\([^/]*/[^.]*\).*|\1|''
+
+      ```
+
+      Store the result as repoRemote in your frontmatter output so downstream roles can use it for tea/API calls.'
+    output: Output a brief summary of the test spec. Set $status to ready (with plan hash and repoPath) or insufficient_info.
+    frontmatter:
+      oneOf:
+      - properties:
+          $status:
+            const: ready
+          plan:
+            type: string
+          repoPath:
+            type: string
+          repoRemote:
+            type: string
+        required:
+        - $status
+        - plan
+        - repoPath
+      - properties:
+          $status:
+            const: insufficient_info
+          reason:
+            type: string
+        required:
+        - $status
+        - reason
+  developer:
+    description: TDD implementation per test spec
+    goal: You are a developer agent. You implement code changes following TDD — write tests first, then implementation.
+    capabilities:
+    - coding
+    procedure: "IMPORTANT: Always work in a git worktree, NEVER modify the main working directory directly.\nThe repo path and other details are provided in your task prompt.\n\nBefore starting any work,\
+      \ set up an isolated worktree:\n1. cd into the repo path provided in your task prompt\n2. `git fetch origin` to get latest refs\n3. First time (no existing branch):\n   - `git worktree add .worktrees/fix/<issue-number>-<short-slug>\
+      \ -b fix/<issue-number>-<short-slug> origin/main`\n   - `cd .worktrees/fix/<issue-number>-<short-slug> && bun install`\n4. If bounced back from reviewer or tester (branch already exists):\n   - cd\
+      \ into the existing worktree under `.worktrees/fix/<issue-number>-<short-slug>`\n   - `git fetch origin && git rebase origin/main`\n5. ALL subsequent work must happen inside the worktree directory.\n\
+      \nThen implement TDD:\n6. Read the test spec from CAS: `ocas get <plan hash>` (find the hash from the planner's output in your task prompt)\n7. If bounced back from reviewer or tester: read the\
+      \ previous role's feedback in your task prompt\n8. Write tests first based on the spec (use vitest)\n9. Implement the code to make tests pass\n10. Ensure `bun run build` passes with no errors\n11.\
+      \ Run `bun test` to verify all tests pass\n\nIf you cannot complete the implementation (e.g. the issue is too complex, blocked by external factors,\nor repeated attempts fail), set $status=failed\
+      \ with a reason.\n"
+    output: List all files changed and provide a summary. Set $status to done (with branch/worktree), or failed (with reason).
+    frontmatter:
+      oneOf:
+      - properties:
+          $status:
+            const: done
+          branch:
+            type: string
+          worktree:
+            type: string
+          repoRemote:
+            type: string
+        required:
+        - $status
+        - branch
+        - worktree
+      - properties:
+          $status:
+            const: failed
+          reason:
+            type: string
+          repoRemote:
+            type: string
+        required:
+        - $status
+        - reason
+  reviewer:
+    description: Code standards compliance check
+    goal: You are a code reviewer. You verify code standards compliance — NOT functionality (that's the tester's job).
+    capabilities:
+    - code-review
+    - static-analysis
+    procedure: 'The worktree path is provided in your task prompt. cd into it first.
+
+
+      Before reviewing, verify the git branch:
+
+      1. Run `git branch --show-current` — confirm the branch name references the issue number being worked on
+
+      2. If the branch doesn''t correspond to the issue, flag it in your output and reject
+
+
+      Then perform code review:
+
+      Hard checks (must all pass):
+
+      3. `bun run build` — no build errors
+
+      4. `bunx biome check` — no lint violations
+
+      5. TypeScript strict mode — no type errors
+
+
+      Soft checks (review against project conventions from CLAUDE.md):
+
+      - Functional-first: functions + types, no classes (except for errors or third-party requirements)
+
+      - Named exports only, no default exports
+
+      - No optional properties (use `T | null` instead of `?:`)
+
+      - Folder module discipline: index.ts only re-exports, types in types.ts
+
+      - Crockford Base32 log tags (8-char, unique per call site)
+
+      - No `console.log` in production code (use createLogger from @united-workforce/util)
+
+      - No dynamic imports in production code
+
+
+      Only review standards compliance. Do NOT test functionality.
+
+      If rejecting, you MUST explain the specific reason in your output.
+
+      '
+    output: Explain your decision with specific file/line references. Set $status to approved (with branch/worktree) or rejected (with comments).
+    frontmatter:
+      oneOf:
+      - properties:
+          $status:
+            const: approved
+          branch:
+            type: string
+          worktree:
+            type: string
+          repoRemote:
+            type: string
+        required:
+        - $status
+        - branch
+        - worktree
+      - properties:
+          $status:
+            const: rejected
+          comments:
+            type: string
+          worktree:
+            type: string
+          repoRemote:
+            type: string
+        required:
+        - $status
+        - comments
+        - worktree
+  tester:
+    description: Functional correctness verification
+    goal: You are a tester agent. You verify that the implementation correctly satisfies every scenario in the test spec.
+    capabilities:
+    - testing
+    procedure: "The worktree path is provided in your task prompt. cd into it first.\n\n1. Run `bun test` for automated test verification\n2. Read the test spec from CAS: `ocas get <plan hash>` (find\
+      \ the hash from the planner step in the thread history)\n3. Verify each scenario in the spec is covered and passing\n4. Determine outcome:\n   - passed: all scenarios verified, tests pass\n   - fix_code:\
+      \ tests fail or implementation doesn't match spec → send back to developer\n   - fix_spec: the spec itself is wrong or incomplete → send back to planner\n"
+    output: Report test results per scenario. Set $status to passed (with branch/worktree), fix_code (with report), or fix_spec (with report).
+    frontmatter:
+      oneOf:
+      - properties:
+          $status:
+            const: passed
+          branch:
+            type: string
+          worktree:
+            type: string
+          repoRemote:
+            type: string
+        required:
+        - $status
+        - branch
+        - worktree
+      - properties:
+          $status:
+            const: fix_code
+          report:
+            type: string
+          repoRemote:
+            type: string
+          worktree:
+            type: string
+          branch:
+            type: string
+        required:
+        - $status
+        - report
+      - properties:
+          $status:
+            const: fix_spec
+          report:
+            type: string
+          repoRemote:
+            type: string
+          worktree:
+            type: string
+          branch:
+            type: string
+        required:
+        - $status
+        - report
+  committer:
+    description: Commits and creates PR
+    goal: You are a committer agent. You create a clean commit and push a PR linking the original issue.
+    capabilities: []
+    procedure: "The worktree path, branch name, and repo remote (owner/repo) are provided in your task prompt.\ncd into the worktree first.\n\nNote: You inherit the developer's worktree and branch. Do NOT\
+      \ create a new branch.\n1. Stage all changes: `git add -A`\n2. Commit with a descriptive message referencing the issue: `git commit -m \"type: description\\n\\nFixes #N\"`\n3. Push the branch: `git\
+      \ push -u origin <branch-name>`\n4. **Verify push succeeded** — run `git ls-remote origin <branch-name>` and confirm it prints a commit hash.\n   - If no output or push failed: capture the error, mark hook_failed\n\
+      5. Create a PR using the Gitea API (do NOT use `tea pr create` — it fails in worktrees):\n   ```bash\n   GITEA_TOKEN=$(cfg get GITEA_TOKEN)\n   curl -s -X POST -H \"Authorization: token $GITEA_TOKEN\" -H \"Content-Type: application/json\" \\\n\
+      \     \"https://git.shazhou.work/api/v1/repos/<owner>/<repo>/pulls\" \\\n     -d '{\"title\":\"...\",\"body\":\"...\",\"head\":\"<branch>\",\"base\":\"main\"}'\n   ```\n   - The repo remote (owner/repo format, e.g. \"shazhou/united-workforce\") is given in your task prompt — use it directly.\n\
+      \   - PR body must include: What / Why / Changes / Ref sections, with `Fixes #N` in Ref\n6. **Verify PR was created** — parse the curl response JSON: it must contain a `\"number\"` field. Print the PR URL.\n\
+      \   - If curl returns an error or no number field: capture the response, mark hook_failed\n7. After PR creation, clean up the worktree:\n   - cd to the repo root (parent of .worktrees)\n   - `git worktree remove <worktree-path>`"
+    output: Include PR URL on success or error log on failure. Set $status to committed (with prUrl) or hook_failed (with error).
+    frontmatter:
+      oneOf:
+      - properties:
+          $status:
+            const: committed
+          prUrl:
+            type: string
+          repoRemote:
+            type: string
+          worktree:
+            type: string
+          branch:
+            type: string
+        required:
+        - $status
+        - prUrl
+      - properties:
+          $status:
+            const: hook_failed
+          error:
+            type: string
+          repoRemote:
+            type: string
+          worktree:
+            type: string
+          branch:
+            type: string
+        required:
+        - $status
+        - error
+graph:
+  $START:
+    new:
+      role: planner
+      prompt: Analyze the issue and produce an implementation plan.
+    resume:
+      role: planner
+      prompt: Review the previous run output and continue the work.
+  planner:
+    insufficient_info:
+      role: $SUSPEND
+      prompt: "信息不足，需要补充：{{{reason}}}"
+    ready:
+      role: developer
+      prompt: 'Implement the TDD test spec (CAS hash: {{{plan}}}) in repo {{{repoPath}}}. Repo remote: {{{repoRemote}}}.'
+  developer:
+    done:
+      role: reviewer
+      prompt: 'Review branch {{{branch}}} at {{{worktree}}} for code standards compliance. Repo remote: {{{repoRemote}}}.'
+    failed:
+      role: $END
+      prompt: 'Developer failed: {{{reason}}}. Ending workflow.'
+  reviewer:
+    rejected:
+      role: developer
+      prompt: 'Reviewer rejected: {{{comments}}}. Fix the issues in repo {{{worktree}}}. Repo remote: {{{repoRemote}}}.'
+    approved:
+      role: tester
+      prompt: 'Review passed. Run tests on branch {{{branch}}} at {{{worktree}}}. Repo remote: {{{repoRemote}}}.'
+  tester:
+    fix_code:
+      role: developer
+      prompt: 'Tests found code issues: {{{report}}}. Fix and re-submit. Worktree: {{{worktree}}}. Repo remote: {{{repoRemote}}}.'
+    fix_spec:
+      role: planner
+      prompt: 'Tests found spec issues: {{{report}}}. Revise the test spec. Repo remote: {{{repoRemote}}}.'
+    passed:
+      role: committer
+      prompt: 'All tests passed. Commit and push branch {{{branch}}} from {{{worktree}}}. Repo remote (owner/repo): {{{repoRemote}}}.'
+  committer:
+    hook_failed:
+      role: developer
+      prompt: 'Push hook failed: {{{error}}}. Fix and re-submit. Worktree: {{{worktree}}}. Repo remote: {{{repoRemote}}}.'
+    committed:
+      role: $END
+      prompt: 'PR created: {{{prUrl}}}. Workflow complete.'