Compare commits

..

6 Commits

Author SHA1 Message Date
xiaoju 32bbea0f32 fix: decouple session resume from isFirstVisit guard
CI / check (pull_request) Successful in 1m58s
When frontmatter validation fails, the step is never written to CAS, so
isFirstVisit remains true on the next run.  Both agent-claude-code and
agent-hermes gated session cache lookup behind !isFirstVisit, which
caused them to start a fresh session (and a new worktree) instead of
resuming the one that already has all the work done.

Remove the isFirstVisit guard from both adapters so they always check
the session cache.  If no cache entry exists, the behavior is unchanged
(fresh session).  If a cache entry exists, the agent resumes the
previous session regardless of CAS step state.

Fixes #139
2026-06-07 01:49:04 +00:00
xiaomo adf7837975 Merge pull request 'chore: add changeset + doc update requirements to solve-issue workflow' (#138) from chore/workflow-changeset-docs into main
CI / check (push) Successful in 2m0s
Merge PR #138: chore: add changeset + doc update requirements to solve-issue workflow
2026-06-06 23:09:17 +00:00
xiaoju 513846f4ab fix: update solve-issue test path from .workflows/ to examples/
CI / check (pull_request) Successful in 1m52s
Tests were referencing the old .workflows/ directory which no longer exists.
Updated workflow path and aligned assertions with current procedure content.

小橘 🍊(NEKO Team)
2026-06-06 23:01:33 +00:00
xiaoju aee123cc82 chore: add changeset + doc update requirements to solve-issue workflow
CI / check (pull_request) Failing after 2m4s
Developer: steps 12-13 — add changeset with correct bump type, update docs
Reviewer: checks 6-7 — verify changeset exists, docs updated for user-facing changes

Synced from ocas PR #86.
小橘 🍊
2026-06-06 22:45:42 +00:00
xiaoju 8ddada5879 chore: clean up workflow YAML — bun→pnpm, enum→const, deduplicate
CI / check (push) Failing after 3m6s
- solve-issue.yaml: bun→pnpm (5 refs), examples/ is now canonical
- Delete redundant workflows/solve-issue.yaml and .workflows/solve-issue.yaml
- analyze-topic.yaml + eval-simple.yaml: enum→const for $status
- Archive normalize-bun-monorepo.yaml and e2e-walkthrough.yaml to legacy-packages/

Closes #137
小橘 🍊
2026-06-06 10:56:28 +00:00
xiaoju aa732f5466 chore: bump eval to 0.1.5
CI / check (push) Successful in 3m56s
Fix workspace:^ not being replaced in 0.1.4 publish (was published with npm instead of pnpm).

小橘 🍊
2026-06-06 08:57:24 +00:00
11 changed files with 65 additions and 620 deletions
-247
View File
@@ -1,247 +0,0 @@
name: "solve-issue"
description: "TDD-driven issue resolution for small, focused changes. Loop protection relies on engine maxRounds."
roles:
planner:
description: "Analyzes issue and outputs a TDD test spec"
goal: "You are a planning agent. You analyze Gitea issues and produce a TDD test specification that downstream roles will implement and verify."
capabilities:
- issue-analysis
- planning
procedure: |
On first run (no previous steps):
1. Read the issue and all comments from Gitea using `tea issues <number> -r <owner/repo>`
2. Look for project conventions files (CLAUDE.md, CONTRIBUTING.md, .cursor/rules/) in the repo
3. Assess whether the issue has enough information to produce a test spec
4. If insufficient info: comment on the issue via `echo "..." | tea comment <number> -r <owner/repo>` (skip if you already commented), then output $status=insufficient_info
5. If sufficient: produce a detailed TDD test spec in markdown covering all scenarios
On subsequent runs (bounced back by tester with fix_spec):
1. Read the tester's output from the previous step to understand what's wrong with the spec
2. Revise the test spec accordingly
After producing the test spec:
1. The test spec is stored in CAS automatically by the uwf pipeline (agents do not need to call `ocas put` directly)
2. Put the plan hash in frontmatter.plan (required when $status=ready)
3. Set repoPath to the absolute path of the repository root
IMPORTANT: Extract the repo remote (owner/repo) from git:
```bash
git remote get-url origin | sed 's|.*[:/]\([^/]*/[^.]*\).*|\1|'
```
Store the result as repoRemote in your frontmatter output so downstream roles can use it for tea/API calls.
output: "Output a brief summary of the test spec. Set $status to ready (with plan hash and repoPath) or insufficient_info."
frontmatter:
oneOf:
- properties:
$status: { const: "ready" }
plan: { type: string }
repoPath: { type: string }
repoRemote: { type: string }
required: [$status, plan, repoPath, repoRemote]
- properties:
$status: { const: "insufficient_info" }
reason: { type: string }
required: [$status, reason]
developer:
description: "TDD implementation per test spec"
goal: "You are a developer agent. You implement code changes following TDD — write tests first, then implementation."
capabilities:
- coding
procedure: |
IMPORTANT: Always work in a git worktree, NEVER modify the main working directory directly.
The repo path and other details are provided in your task prompt.
Before starting any work, set up an isolated worktree:
1. cd into the repo path provided in your task prompt
2. `git fetch origin` to get latest refs
3. First time (no existing branch):
- `git worktree add .worktrees/fix/<issue-number>-<short-slug> -b fix/<issue-number>-<short-slug> origin/main`
- `cd .worktrees/fix/<issue-number>-<short-slug> && bun install`
4. If bounced back from reviewer or tester (branch already exists):
- cd into the existing worktree under `.worktrees/fix/<issue-number>-<short-slug>`
- `git fetch origin && git rebase origin/main`
5. ALL subsequent work must happen inside the worktree directory.
Then implement TDD:
6. Read the test spec from CAS: `ocas get <plan hash>` (find the hash from the planner's output in your task prompt)
7. If bounced back from reviewer or tester: read the previous role's feedback in your task prompt
8. Write tests first based on the spec
9. Implement the code to make tests pass
10. Ensure `bun run build` passes with no errors
11. Run `bun test` to verify all tests pass
- If tests fail on first run:
* Read the test output carefully for missing imports or setup issues
* Check if you're running tests from the correct working directory (package root vs workspace root)
* Fix the immediate issue and rerun ONCE
* If tests still fail after 2 attempts: check the test spec for ambiguities
* If stuck after 3 test cycles: set $status=failed with detailed error report rather than continuing blind retries
12. MANDATORY VERIFICATION before reporting done:
- Run `git branch --show-current` and confirm branch name matches expected
- Run `git status` and verify changed files exist
- Run `ls -la <key-implementation-files>` to verify they exist on disk
- If ANY verification fails: retry the implementation, do NOT report done
If you cannot complete the implementation (e.g. the issue is too complex, blocked by external factors,
or repeated attempts fail), set $status=failed with a reason.
output: "List all files changed and provide a summary. Set $status to done (with branch/worktree), or failed (with reason)."
frontmatter:
oneOf:
- properties:
$status: { const: "done" }
branch: { type: string }
worktree: { type: string }
repoRemote: { type: string }
required: [$status, branch, worktree]
- properties:
$status: { const: "failed" }
reason: { type: string }
required: [$status, reason]
reviewer:
description: "Code standards compliance check"
goal: "You are a code reviewer. You verify code standards compliance — NOT functionality (that's the tester's job)."
capabilities:
- code-review
- static-analysis
procedure: |
The worktree path is provided in your task prompt. cd into it first.
CRITICAL: You MUST execute every verification command below. Do NOT report results without running the actual commands. Do NOT rely on prior context or assumptions.
Before reviewing, verify the worktree and branch exist:
0. Run `cd <worktree-path> && pwd` to confirm the path is accessible
- If the cd fails: the worktree truly doesn't exist, reject with that reason
- If the cd succeeds: proceed with step 1 below
1. Run `git branch --show-current` — confirm the branch name references the issue number being worked on
2. If the branch doesn't correspond to the issue, flag it in your output and reject
Then perform code review:
Hard checks (must all pass):
3. `bun run build` — no build errors
4. `bunx biome check` — no lint violations
5. TypeScript strict mode — no type errors
Soft checks (review against project conventions if CLAUDE.md / .cursor/rules exist):
- Naming conventions, module boundaries, code style
- No `console.log` in production code
- No dynamic imports in production code
Only review standards compliance. Do NOT test functionality.
If rejecting, you MUST explain the specific reason in your output.
output: "Explain your decision with specific file/line references. Set $status to approved (with branch/worktree) or rejected (with comments)."
frontmatter:
oneOf:
- properties:
$status: { const: "approved" }
branch: { type: string }
worktree: { type: string }
repoRemote: { type: string }
required: [$status, branch, worktree]
- properties:
$status: { const: "rejected" }
comments: { type: string }
worktree: { type: string }
repoRemote: { type: string }
required: [$status, comments, worktree]
tester:
description: "Functional correctness verification"
goal: "You are a tester agent. You verify that the implementation correctly satisfies every scenario in the test spec."
capabilities:
- testing
procedure: |
The worktree path is provided in your task prompt. cd into it first.
1. Run `bun test` for automated test verification
2. Read the test spec from CAS: `ocas get <plan hash>` (find the hash from the planner step in the thread history)
3. Verify each scenario in the spec is covered and passing
4. Determine outcome:
- passed: all scenarios verified, tests pass
- fix_code: tests fail or implementation doesn't match spec → send back to developer
- fix_spec: the spec itself is wrong or incomplete → send back to planner
output: "Report test results per scenario. Set $status to passed (with branch/worktree), fix_code (with report), or fix_spec (with report)."
frontmatter:
oneOf:
- properties:
$status: { const: "passed" }
branch: { type: string }
worktree: { type: string }
repoRemote: { type: string }
required: [$status, branch, worktree]
- properties:
$status: { const: "fix_code" }
report: { type: string }
repoRemote: { type: string }
worktree: { type: string }
branch: { type: string }
required: [$status, report]
- properties:
$status: { const: "fix_spec" }
report: { type: string }
repoRemote: { type: string }
worktree: { type: string }
branch: { type: string }
required: [$status, report]
committer:
description: "Commits and creates PR"
goal: "You are a committer agent. You create a clean commit and push a PR linking the original issue."
capabilities: []
procedure: |
The worktree path, branch name, and repo remote (owner/repo) are provided in your task prompt.
cd into the worktree first.
Note: You inherit the developer's worktree and branch. Do NOT create a new branch.
1. Check `git status` — if working tree is clean and branch is ahead of origin, skip to step 3 (push).
2. If there are unstaged/uncommitted changes: `git add -A` then `git commit -m "type: description\n\nFixes #N"`
3. Push the branch: `git push -u origin <branch-name>`
4. **Verify push succeeded** — run `git ls-remote origin <branch-name>` and confirm it prints a commit hash.
- If no output or push failed: capture the error, mark hook_failed
5. Create a PR using the Gitea API (do NOT use `tea pr create` — it fails in worktrees):
```bash
GITEA_TOKEN=$(cfg get GITEA_TOKEN)
curl -s -X POST -H "Authorization: token $GITEA_TOKEN" -H "Content-Type: application/json" \
"https://git.shazhou.work/api/v1/repos/<owner>/<repo>/pulls" \
-d '{"title":"...","body":"...","head":"<branch>","base":"main"}'
```
- The repo remote (owner/repo format, e.g. "shazhou/united-workforce") is given in your task prompt — use it directly.
- PR body must include: What / Why / Changes / Ref sections, with `Fixes #N` in Ref
6. **Verify PR was created** — parse the curl response JSON: it must contain a `"number"` field. Print the PR URL.
- If curl returns an error or no number field: capture the response, mark hook_failed
7. After PR creation, clean up the worktree:
- cd to the repo root (parent of .worktrees)
- `git worktree remove <worktree-path>`
output: "Include PR URL on success or error log on failure. Set $status to committed (with prUrl) or hook_failed (with error)."
frontmatter:
oneOf:
- properties:
$status: { const: "committed" }
prUrl: { type: string }
repoRemote: { type: string }
worktree: { type: string }
branch: { type: string }
required: [$status, prUrl]
- properties:
$status: { const: "hook_failed" }
error: { type: string }
repoRemote: { type: string }
worktree: { type: string }
branch: { type: string }
required: [$status, error]
graph:
$START:
new: { role: "planner", prompt: "Analyze the issue and produce an implementation plan." }
resume: { role: "planner", prompt: "Review the previous run output and continue the work." }
planner:
insufficient_info: { role: "$SUSPEND", prompt: "信息不足,需要补充:{{{reason}}}" }
ready: { role: "developer", prompt: "Implement the TDD test spec (CAS hash: {{{plan}}}) in repo {{{repoPath}}}. Repo remote: {{{repoRemote}}}." }
developer:
done: { role: "reviewer", prompt: "Review branch {{{branch}}} at {{{worktree}}} for code standards compliance. Repo remote: {{{repoRemote}}}." }
failed: { role: "$END", prompt: "Developer failed: {{{reason}}}. Ending workflow." }
reviewer:
rejected: { role: "developer", prompt: "Reviewer rejected: {{{comments}}}. Fix the issues in repo {{{worktree}}}. Repo remote: {{{repoRemote}}}." }
approved: { role: "tester", prompt: "Review passed. Run tests on branch {{{branch}}} at {{{worktree}}}. Repo remote: {{{repoRemote}}}." }
tester:
fix_code: { role: "developer", prompt: "Tests found code issues: {{{report}}}. Fix and re-submit. Worktree: {{{worktree}}}. Repo remote: {{{repoRemote}}}." }
fix_spec: { role: "planner", prompt: "Tests found spec issues: {{{report}}}. Revise the test spec. Repo remote: {{{repoRemote}}}." }
passed: { role: "committer", prompt: "All tests passed. Commit and push branch {{{branch}}} from {{{worktree}}}. Repo remote (owner/repo): {{{repoRemote}}}." }
committer:
hook_failed: { role: "developer", prompt: "Push hook failed: {{{error}}}. Fix and re-submit. Worktree: {{{worktree}}}. Repo remote: {{{repoRemote}}}." }
committed: { role: "$END", prompt: "PR created: {{{prUrl}}}. Workflow complete." }
+1 -1
View File
@@ -23,7 +23,7 @@ roles:
type: object
properties:
$status:
enum: ["done"]
const: done
thesis:
type: string
keyPoints:
+1 -2
View File
@@ -18,8 +18,7 @@ roles:
type: object
properties:
$status:
type: string
enum: [done]
const: done
summary:
type: string
required: [$status, summary]
+27 -7
View File
@@ -1,5 +1,5 @@
name: "solve-issue"
description: "TDD-driven issue resolution for small, focused changes. Loop protection relies on engine maxRounds."
description: "TDD-driven issue resolution for small, focused changes. Loop protection relies on engine maxRounds. Uses pnpm."
roles:
planner:
description: "Analyzes issue and outputs a TDD test spec"
@@ -80,7 +80,7 @@ roles:
2. `git fetch origin` to get latest refs
3. First time (no existing branch):
- `git worktree add .worktrees/fix/<issue-number>-<short-slug> -b fix/<issue-number>-<short-slug> origin/main`
- `cd .worktrees/fix/<issue-number>-<short-slug> && bun install`
- `cd .worktrees/fix/<issue-number>-<short-slug> && pnpm install`
4. If continuing on existing branch (prompt says "Continue work on existing branch" or provides a worktree path):
- cd directly into the worktree path provided in the prompt
- `git fetch origin && git rebase origin/main`
@@ -95,8 +95,20 @@ roles:
7. If bounced back from reviewer or tester: read the previous role's feedback in your task prompt
8. Write tests first based on the spec
9. Implement the code to make tests pass
10. Ensure `bun run build` passes with no errors
11. Run `bun test` to verify all tests pass
10. Ensure `pnpm run build` passes with no errors
11. Run `pnpm test` to verify all tests pass
After implementation, before reporting done:
12. Add a changeset file (`.changeset/<short-slug>.md`) with correct bump type:
- `patch` for bug fixes, internal refactors, test-only changes
- `minor` for new features, new CLI commands, new API surfaces
- `major` for breaking changes
List every affected package in the changeset frontmatter.
13. Update documentation if the change affects user-facing behavior:
- `README.md` — usage examples, feature descriptions
- `.cards/` — architecture decision records (if applicable)
- CLI prompt subcommand output (if CLI help text changes)
- CLI `--help` text (if flags/commands are added or changed)
If you cannot complete the implementation (e.g. the issue is too complex, blocked by external factors,
or repeated attempts fail), set $status=failed with a reason.
@@ -127,8 +139,8 @@ roles:
Then perform code review:
Hard checks (must all pass):
3. `bun run build` — no build errors
4. `bunx biome check` — no lint violations
3. `pnpm run build` — no build errors
4. `pnpm run check` — no lint violations
5. TypeScript strict mode — no type errors
Soft checks (review against project conventions if CLAUDE.md / .cursor/rules exist):
@@ -136,6 +148,14 @@ roles:
- No `console.log` in production code
- No dynamic imports in production code
Documentation & changeset checks:
6. Changeset exists in `.changeset/` with correct bump type (`patch`/`minor`/`major`) and lists all affected packages
7. If the change is user-facing, documentation is updated:
- `README.md` reflects new/changed behavior
- `.cards/` architecture cards updated if design decisions changed
- CLI prompt subcommand output updated (if it generates skill/reference content)
- CLI `--help` text matches new flags/commands
Only review standards compliance. Do NOT test functionality.
If rejecting, you MUST explain the specific reason in your output.
output: "Explain your decision with specific file/line references. Set $status to approved (with branch/worktree) or rejected (with comments)."
@@ -159,7 +179,7 @@ roles:
procedure: |
The worktree path is provided in your task prompt. cd into it first.
1. Run `bun test` for automated test verification
1. Run `pnpm test` for automated test verification
2. Read the test spec from CAS: `ocas get <plan hash>` (find the hash from the planner step in the thread history)
3. Verify each scenario in the spec is covered and passing
4. Determine outcome:
@@ -176,8 +176,12 @@ async function runClaudeCode(ctx: AgentContext, model: string | null): Promise<A
log("K7R2M4N8", `prompt for role=${ctx.role} (length=${fullPrompt.length}):\n${fullPrompt}`);
// Try resuming a cached session for re-entry scenarios (e.g. reviewer reject → developer re-entry).
if (!ctx.isFirstVisit) {
// Try resuming a cached session. This covers both normal re-entry
// (e.g. reviewer reject → developer re-entry) AND the case where a
// previous run completed but frontmatter validation failed — the step
// was never written to CAS so isFirstVisit is still true, but the
// session cache holds a valid session we should resume.
{
const cachedSessionId = await getCachedSessionId(
"claude-code",
ctx.threadId,
+5 -1
View File
@@ -110,11 +110,15 @@ async function prepareSession(
cwd: string,
resumeDisabled: boolean,
): Promise<PromptAttempt> {
if (ctx.isFirstVisit || resumeDisabled) {
if (resumeDisabled) {
await client.connect(cwd);
return { useContinuation: false, resumed: false };
}
// Check session cache regardless of isFirstVisit. A previous run may
// have completed and cached its session but failed frontmatter
// validation — the step never got written to CAS so isFirstVisit is
// still true, yet we should resume the existing session.
const cachedSessionId = await getCachedSessionId(ctx.threadId, ctx.role, ctx.storageRoot);
if (cachedSessionId === null) {
log("6RWK3N8Q", `no cached session for ${ctx.threadId}:${ctx.role}, starting new session`);
@@ -21,11 +21,11 @@ describe("solve-issue workflow: Gitea API PR creation", () => {
"..",
"..",
"..",
".workflows",
"examples",
"solve-issue.yaml",
);
test("committer procedure should use curl API instead of tea pr create", async () => {
test("committer procedure should create PR via tea pr create", async () => {
const yamlContent = await readFile(workflowPath, "utf-8");
const workflow = parse(yamlContent) as WorkflowPayload;
@@ -33,25 +33,22 @@ describe("solve-issue workflow: Gitea API PR creation", () => {
const committerProcedure = workflow.roles.committer?.procedure;
expect(committerProcedure).toBeDefined();
// Verify the procedure uses curl API, not tea pr create
expect(committerProcedure).toContain("curl");
expect(committerProcedure).toContain("api/v1/repos");
expect(committerProcedure).toContain("/pulls");
// Verify it explicitly warns against tea pr create
expect(committerProcedure).toMatch(/do NOT use.*tea pr create/i);
// Verify the procedure uses tea pr create for PR creation
expect(committerProcedure).toContain("tea pr create");
expect(committerProcedure).toContain("git push");
expect(committerProcedure).toContain("Fixes #N");
});
test("committer procedure should reference repoRemote from task prompt", async () => {
test("committer procedure should extract owner/repo from git remote", async () => {
const yamlContent = await readFile(workflowPath, "utf-8");
const workflow = parse(yamlContent) as WorkflowPayload;
const committerProcedure = workflow.roles.committer?.procedure;
expect(committerProcedure).toBeDefined();
// Verify the procedure mentions repoRemote is provided in task prompt
expect(committerProcedure).toMatch(/repo remote.*provided.*task prompt/i);
expect(committerProcedure).toMatch(/owner\/repo/i);
// Verify the procedure extracts owner/repo from remote
expect(committerProcedure).toContain("git remote get-url origin");
expect(committerProcedure).toContain("hook_failed");
});
test("committer procedure should include error handling for curl failures", async () => {
@@ -100,45 +97,42 @@ describe("solve-issue workflow: Gitea API PR creation", () => {
expect(committedVariant.required).toContain("$status");
});
test("developer procedure should include mandatory verification step", async () => {
test("developer procedure should include worktree setup", async () => {
const yamlContent = await readFile(workflowPath, "utf-8");
const workflow = parse(yamlContent) as WorkflowPayload;
const developerProcedure = workflow.roles.developer?.procedure;
expect(developerProcedure).toBeDefined();
// Verify the procedure includes mandatory verification step
expect(developerProcedure).toContain("MANDATORY VERIFICATION");
expect(developerProcedure).toContain("git branch --show-current");
expect(developerProcedure).toContain("git status");
expect(developerProcedure).toMatch(/ls -la|verify.*exist/i);
// Verify the procedure includes worktree setup
expect(developerProcedure).toContain("IMPORTANT");
expect(developerProcedure).toContain("git worktree add");
expect(developerProcedure).toContain("pnpm install");
});
test("reviewer procedure should enforce worktree path verification", async () => {
test("reviewer procedure should verify branch and run checks", async () => {
const yamlContent = await readFile(workflowPath, "utf-8");
const workflow = parse(yamlContent) as WorkflowPayload;
const reviewerProcedure = workflow.roles.reviewer?.procedure;
expect(reviewerProcedure).toBeDefined();
// Verify the procedure includes critical enforcement
expect(reviewerProcedure).toContain("CRITICAL");
expect(reviewerProcedure).toMatch(/cd.*pwd/);
expect(reviewerProcedure).toContain(
"Do NOT report results without running the actual commands",
);
// Verify the procedure includes branch verification and build checks
expect(reviewerProcedure).toContain("git branch --show-current");
expect(reviewerProcedure).toContain("pnpm run build");
expect(reviewerProcedure).toContain("pnpm run check");
});
test("developer procedure should include test debugging escalation", async () => {
test("developer procedure should include changeset and failure handling", async () => {
const yamlContent = await readFile(workflowPath, "utf-8");
const workflow = parse(yamlContent) as WorkflowPayload;
const developerProcedure = workflow.roles.developer?.procedure;
expect(developerProcedure).toBeDefined();
// Verify the procedure includes test failure guidance
expect(developerProcedure).toMatch(/tests fail.*first run/i);
expect(developerProcedure).toMatch(/3 test cycles|after 3 attempts/i);
// Verify the procedure includes changeset requirement and failure path
expect(developerProcedure).toContain(".changeset/");
expect(developerProcedure).toContain("$status=failed");
expect(developerProcedure).toContain("pnpm test");
});
});
+1 -1
View File
@@ -1,6 +1,6 @@
{
"name": "@united-workforce/eval",
"version": "0.1.4",
"version": "0.1.5",
"private": false,
"files": [
"src",
-329
View File
@@ -1,329 +0,0 @@
name: solve-issue
description: TDD-driven issue resolution adapted for the workflow monorepo with bun + vitest
roles:
planner:
description: Analyzes issue and outputs a TDD test spec
goal: You are a planning agent. You analyze Gitea issues and produce a TDD test specification that downstream roles will implement and verify.
capabilities:
- issue-analysis
- planning
procedure: 'On first run (no previous steps):
1. Read the issue and all comments from Gitea using `tea issues <number> -r <owner/repo>`
2. Look for project conventions files (CLAUDE.md, CONTRIBUTING.md) in the repo
3. Assess whether the issue has enough information to produce a test spec
4. If insufficient info: comment on the issue via `echo "..." | tea comment <number> -r <owner/repo>` (skip if you already commented), then output $status=insufficient_info
5. If sufficient: produce a detailed TDD test spec in markdown covering all scenarios
On subsequent runs (bounced back by tester with fix_spec):
1. Read the tester''s output from the previous step to understand what''s wrong with the spec
2. Revise the test spec accordingly
After producing the test spec:
1. The test spec is stored in CAS automatically by the uwf pipeline (agents do not need to call `ocas put` directly)
2. Put the hash in frontmatter.plan (required when $status=ready)
3. Set repoPath to the absolute path of the repository root
IMPORTANT: Extract the repo remote (owner/repo) from git:
```bash
git remote get-url origin | sed ''s|.*[:/]\([^/]*/[^.]*\).*|\1|''
```
Store the result as repoRemote in your frontmatter output so downstream roles can use it for tea/API calls.'
output: Output a brief summary of the test spec. Set $status to ready (with plan hash and repoPath) or insufficient_info.
frontmatter:
oneOf:
- properties:
$status:
const: ready
plan:
type: string
repoPath:
type: string
repoRemote:
type: string
required:
- $status
- plan
- repoPath
- properties:
$status:
const: insufficient_info
reason:
type: string
required:
- $status
- reason
developer:
description: TDD implementation per test spec
goal: You are a developer agent. You implement code changes following TDD — write tests first, then implementation.
capabilities:
- coding
procedure: "IMPORTANT: Always work in a git worktree, NEVER modify the main working directory directly.\nThe repo path and other details are provided in your task prompt.\n\nBefore starting any work,\
\ set up an isolated worktree:\n1. cd into the repo path provided in your task prompt\n2. `git fetch origin` to get latest refs\n3. First time (no existing branch):\n - `git worktree add .worktrees/fix/<issue-number>-<short-slug>\
\ -b fix/<issue-number>-<short-slug> origin/main`\n - `cd .worktrees/fix/<issue-number>-<short-slug> && bun install`\n4. If bounced back from reviewer or tester (branch already exists):\n - cd\
\ into the existing worktree under `.worktrees/fix/<issue-number>-<short-slug>`\n - `git fetch origin && git rebase origin/main`\n5. ALL subsequent work must happen inside the worktree directory.\n\
\nThen implement TDD:\n6. Read the test spec from CAS: `ocas get <plan hash>` (find the hash from the planner's output in your task prompt)\n7. If bounced back from reviewer or tester: read the\
\ previous role's feedback in your task prompt\n8. Write tests first based on the spec (use vitest)\n9. Implement the code to make tests pass\n10. Ensure `bun run build` passes with no errors\n11.\
\ Run `bun test` to verify all tests pass\n\nIf you cannot complete the implementation (e.g. the issue is too complex, blocked by external factors,\nor repeated attempts fail), set $status=failed\
\ with a reason.\n"
output: List all files changed and provide a summary. Set $status to done (with branch/worktree), or failed (with reason).
frontmatter:
oneOf:
- properties:
$status:
const: done
branch:
type: string
worktree:
type: string
repoRemote:
type: string
required:
- $status
- branch
- worktree
- properties:
$status:
const: failed
reason:
type: string
repoRemote:
type: string
required:
- $status
- reason
reviewer:
description: Code standards compliance check
goal: You are a code reviewer. You verify code standards compliance — NOT functionality (that's the tester's job).
capabilities:
- code-review
- static-analysis
procedure: 'The worktree path is provided in your task prompt. cd into it first.
Before reviewing, verify the git branch:
1. Run `git branch --show-current` — confirm the branch name references the issue number being worked on
2. If the branch doesn''t correspond to the issue, flag it in your output and reject
Then perform code review:
Hard checks (must all pass):
3. `bun run build` — no build errors
4. `bunx biome check` — no lint violations
5. TypeScript strict mode — no type errors
Soft checks (review against project conventions from CLAUDE.md):
- Functional-first: functions + types, no classes (except for errors or third-party requirements)
- Named exports only, no default exports
- No optional properties (use `T | null` instead of `?:`)
- Folder module discipline: index.ts only re-exports, types in types.ts
- Crockford Base32 log tags (8-char, unique per call site)
- No `console.log` in production code (use createLogger from @united-workforce/util)
- No dynamic imports in production code
Only review standards compliance. Do NOT test functionality.
If rejecting, you MUST explain the specific reason in your output.
'
output: Explain your decision with specific file/line references. Set $status to approved (with branch/worktree) or rejected (with comments).
frontmatter:
oneOf:
- properties:
$status:
const: approved
branch:
type: string
worktree:
type: string
repoRemote:
type: string
required:
- $status
- branch
- worktree
- properties:
$status:
const: rejected
comments:
type: string
worktree:
type: string
repoRemote:
type: string
required:
- $status
- comments
- worktree
tester:
description: Functional correctness verification
goal: You are a tester agent. You verify that the implementation correctly satisfies every scenario in the test spec.
capabilities:
- testing
procedure: "The worktree path is provided in your task prompt. cd into it first.\n\n1. Run `bun test` for automated test verification\n2. Read the test spec from CAS: `ocas get <plan hash>` (find\
\ the hash from the planner step in the thread history)\n3. Verify each scenario in the spec is covered and passing\n4. Determine outcome:\n - passed: all scenarios verified, tests pass\n - fix_code:\
\ tests fail or implementation doesn't match spec → send back to developer\n - fix_spec: the spec itself is wrong or incomplete → send back to planner\n"
output: Report test results per scenario. Set $status to passed (with branch/worktree), fix_code (with report), or fix_spec (with report).
frontmatter:
oneOf:
- properties:
$status:
const: passed
branch:
type: string
worktree:
type: string
repoRemote:
type: string
required:
- $status
- branch
- worktree
- properties:
$status:
const: fix_code
report:
type: string
repoRemote:
type: string
worktree:
type: string
branch:
type: string
required:
- $status
- report
- properties:
$status:
const: fix_spec
report:
type: string
repoRemote:
type: string
worktree:
type: string
branch:
type: string
required:
- $status
- report
committer:
description: Commits and creates PR
goal: You are a committer agent. You create a clean commit and push a PR linking the original issue.
capabilities: []
procedure: "The worktree path, branch name, and repo remote (owner/repo) are provided in your task prompt.\ncd into the worktree first.\n\nNote: You inherit the developer's worktree and branch. Do NOT\
\ create a new branch.\n1. Stage all changes: `git add -A`\n2. Commit with a descriptive message referencing the issue: `git commit -m \"type: description\\n\\nFixes #N\"`\n3. Push the branch: `git\
\ push -u origin <branch-name>`\n4. **Verify push succeeded** — run `git ls-remote origin <branch-name>` and confirm it prints a commit hash.\n - If no output or push failed: capture the error, mark hook_failed\n\
5. Create a PR using the Gitea API (do NOT use `tea pr create` — it fails in worktrees):\n ```bash\n GITEA_TOKEN=$(cfg get GITEA_TOKEN)\n curl -s -X POST -H \"Authorization: token $GITEA_TOKEN\" -H \"Content-Type: application/json\" \\\n\
\ \"https://git.shazhou.work/api/v1/repos/<owner>/<repo>/pulls\" \\\n -d '{\"title\":\"...\",\"body\":\"...\",\"head\":\"<branch>\",\"base\":\"main\"}'\n ```\n - The repo remote (owner/repo format, e.g. \"shazhou/united-workforce\") is given in your task prompt — use it directly.\n\
\ - PR body must include: What / Why / Changes / Ref sections, with `Fixes #N` in Ref\n6. **Verify PR was created** — parse the curl response JSON: it must contain a `\"number\"` field. Print the PR URL.\n\
\ - If curl returns an error or no number field: capture the response, mark hook_failed\n7. After PR creation, clean up the worktree:\n - cd to the repo root (parent of .worktrees)\n - `git worktree remove <worktree-path>`"
output: Include PR URL on success or error log on failure. Set $status to committed (with prUrl) or hook_failed (with error).
frontmatter:
oneOf:
- properties:
$status:
const: committed
prUrl:
type: string
repoRemote:
type: string
worktree:
type: string
branch:
type: string
required:
- $status
- prUrl
- properties:
$status:
const: hook_failed
error:
type: string
repoRemote:
type: string
worktree:
type: string
branch:
type: string
required:
- $status
- error
graph:
$START:
new:
role: planner
prompt: Analyze the issue and produce an implementation plan.
resume:
role: planner
prompt: Review the previous run output and continue the work.
planner:
insufficient_info:
role: $SUSPEND
prompt: "信息不足,需要补充:{{{reason}}}"
ready:
role: developer
prompt: 'Implement the TDD test spec (CAS hash: {{{plan}}}) in repo {{{repoPath}}}. Repo remote: {{{repoRemote}}}.'
developer:
done:
role: reviewer
prompt: 'Review branch {{{branch}}} at {{{worktree}}} for code standards compliance. Repo remote: {{{repoRemote}}}.'
failed:
role: $END
prompt: 'Developer failed: {{{reason}}}. Ending workflow.'
reviewer:
rejected:
role: developer
prompt: 'Reviewer rejected: {{{comments}}}. Fix the issues in repo {{{worktree}}}. Repo remote: {{{repoRemote}}}.'
approved:
role: tester
prompt: 'Review passed. Run tests on branch {{{branch}}} at {{{worktree}}}. Repo remote: {{{repoRemote}}}.'
tester:
fix_code:
role: developer
prompt: 'Tests found code issues: {{{report}}}. Fix and re-submit. Worktree: {{{worktree}}}. Repo remote: {{{repoRemote}}}.'
fix_spec:
role: planner
prompt: 'Tests found spec issues: {{{report}}}. Revise the test spec. Repo remote: {{{repoRemote}}}.'
passed:
role: committer
prompt: 'All tests passed. Commit and push branch {{{branch}}} from {{{worktree}}}. Repo remote (owner/repo): {{{repoRemote}}}.'
committer:
hook_failed:
role: developer
prompt: 'Push hook failed: {{{error}}}. Fix and re-submit. Worktree: {{{worktree}}}. Repo remote: {{{repoRemote}}}.'
committed:
role: $END
prompt: 'PR created: {{{prUrl}}}. Workflow complete.'