Compare commits
83 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 69ec8c2c5e | |||
| 81aa282c92 | |||
| a620defbcf | |||
| 439891f6b6 | |||
| df244c52e8 | |||
| cb6e0d6a11 | |||
| 9d0c6df62c | |||
| 0f5bb1f191 | |||
| 00d960daba | |||
| 3a26285872 | |||
| 13c0812944 | |||
| 2e7e5f6ec4 | |||
| 88c077d439 | |||
| aaadab4445 | |||
| adf7837975 | |||
| 513846f4ab | |||
| aee123cc82 | |||
| 8ddada5879 | |||
| aa732f5466 | |||
| e354fc4341 | |||
| 0e7e3ea44b | |||
| aa454c85dd | |||
| 6dd7d521be | |||
| 950dc056d8 | |||
| d360b85374 | |||
| 509dfad857 | |||
| 58b84e3b3c | |||
| f821ac99f4 | |||
| 2c4700c49f | |||
| 4410afcd4a | |||
| a0e254a681 | |||
| dd77b40f6c | |||
| 5ed6f68e4b | |||
| 1ed0bf1f76 | |||
| d97840cf8d | |||
| b560818f1a | |||
| f989dee85b | |||
| 7e4a59de7e | |||
| 68079cc003 | |||
| 1a37928bb9 | |||
| 57511a93fe | |||
| adc3982a4a | |||
| 4580388270 | |||
| caba82fe36 | |||
| 6aee2ed5ef | |||
| 709b9dc1e5 | |||
| 7a788a9d90 | |||
| e5af5e9027 | |||
| fde87b6274 | |||
| a33f12c74f | |||
| 0ad10b9b6d | |||
| 3be92bfac2 | |||
| 8d6f480b0f | |||
| 5450bc1230 | |||
| f1f122b0b1 | |||
| 57ae6d1755 | |||
| d64d150071 | |||
| c5eb8b79d1 | |||
| 36a3ca6a08 | |||
| eb0b7b514f | |||
| a47871ec4e | |||
| 5851e5d162 | |||
| 61dfb40933 | |||
| fbfd31a042 | |||
| d99a376b60 | |||
| a536efee00 | |||
| 9260d81084 | |||
| c8d884072a | |||
| abeb465f46 | |||
| 28427a973f | |||
| 794f9db568 | |||
| cd585a26f1 | |||
| 1cf8f350d0 | |||
| 427568a21d | |||
| d3a2353acf | |||
| 8085d1d6e0 | |||
| 8764d7bda3 | |||
| 850a3b2f25 | |||
| 3d6a517e83 | |||
| 825f0c641a | |||
| 81bbe1178f | |||
| a0e139935e | |||
| a08775896f |
@@ -1,246 +0,0 @@
|
||||
name: "solve-issue"
|
||||
description: "TDD-driven issue resolution for small, focused changes. Loop protection relies on engine maxRounds."
|
||||
roles:
|
||||
planner:
|
||||
description: "Analyzes issue and outputs a TDD test spec"
|
||||
goal: "You are a planning agent. You analyze Gitea issues and produce a TDD test specification that downstream roles will implement and verify."
|
||||
capabilities:
|
||||
- issue-analysis
|
||||
- planning
|
||||
procedure: |
|
||||
On first run (no previous steps):
|
||||
1. Read the issue and all comments from Gitea using `tea issues <number> -r <owner/repo>`
|
||||
2. Look for project conventions files (CLAUDE.md, CONTRIBUTING.md, .cursor/rules/) in the repo
|
||||
3. Assess whether the issue has enough information to produce a test spec
|
||||
4. If insufficient info: comment on the issue via `echo "..." | tea comment <number> -r <owner/repo>` (skip if you already commented), then output $status=insufficient_info
|
||||
5. If sufficient: produce a detailed TDD test spec in markdown covering all scenarios
|
||||
|
||||
On subsequent runs (bounced back by tester with fix_spec):
|
||||
1. Read the tester's output from the previous step to understand what's wrong with the spec
|
||||
2. Revise the test spec accordingly
|
||||
|
||||
After producing the test spec:
|
||||
1. The test spec is stored in CAS automatically by the uwf pipeline (agents do not need to call `ocas put` directly)
|
||||
2. Put the plan hash in frontmatter.plan (required when $status=ready)
|
||||
3. Set repoPath to the absolute path of the repository root
|
||||
|
||||
IMPORTANT: Extract the repo remote (owner/repo) from git:
|
||||
```bash
|
||||
git remote get-url origin | sed 's|.*[:/]\([^/]*/[^.]*\).*|\1|'
|
||||
```
|
||||
Store the result as repoRemote in your frontmatter output so downstream roles can use it for tea/API calls.
|
||||
output: "Output a brief summary of the test spec. Set $status to ready (with plan hash and repoPath) or insufficient_info."
|
||||
frontmatter:
|
||||
oneOf:
|
||||
- properties:
|
||||
$status: { const: "ready" }
|
||||
plan: { type: string }
|
||||
repoPath: { type: string }
|
||||
repoRemote: { type: string }
|
||||
required: [$status, plan, repoPath, repoRemote]
|
||||
- properties:
|
||||
$status: { const: "insufficient_info" }
|
||||
reason: { type: string }
|
||||
required: [$status, reason]
|
||||
developer:
|
||||
description: "TDD implementation per test spec"
|
||||
goal: "You are a developer agent. You implement code changes following TDD — write tests first, then implementation."
|
||||
capabilities:
|
||||
- coding
|
||||
procedure: |
|
||||
IMPORTANT: Always work in a git worktree, NEVER modify the main working directory directly.
|
||||
The repo path and other details are provided in your task prompt.
|
||||
|
||||
Before starting any work, set up an isolated worktree:
|
||||
1. cd into the repo path provided in your task prompt
|
||||
2. `git fetch origin` to get latest refs
|
||||
3. First time (no existing branch):
|
||||
- `git worktree add .worktrees/fix/<issue-number>-<short-slug> -b fix/<issue-number>-<short-slug> origin/main`
|
||||
- `cd .worktrees/fix/<issue-number>-<short-slug> && bun install`
|
||||
4. If bounced back from reviewer or tester (branch already exists):
|
||||
- cd into the existing worktree under `.worktrees/fix/<issue-number>-<short-slug>`
|
||||
- `git fetch origin && git rebase origin/main`
|
||||
5. ALL subsequent work must happen inside the worktree directory.
|
||||
|
||||
Then implement TDD:
|
||||
6. Read the test spec from CAS: `ocas get <plan hash>` (find the hash from the planner's output in your task prompt)
|
||||
7. If bounced back from reviewer or tester: read the previous role's feedback in your task prompt
|
||||
8. Write tests first based on the spec
|
||||
9. Implement the code to make tests pass
|
||||
10. Ensure `bun run build` passes with no errors
|
||||
11. Run `bun test` to verify all tests pass
|
||||
- If tests fail on first run:
|
||||
* Read the test output carefully for missing imports or setup issues
|
||||
* Check if you're running tests from the correct working directory (package root vs workspace root)
|
||||
* Fix the immediate issue and rerun ONCE
|
||||
* If tests still fail after 2 attempts: check the test spec for ambiguities
|
||||
* If stuck after 3 test cycles: set $status=failed with detailed error report rather than continuing blind retries
|
||||
12. MANDATORY VERIFICATION before reporting done:
|
||||
- Run `git branch --show-current` and confirm branch name matches expected
|
||||
- Run `git status` and verify changed files exist
|
||||
- Run `ls -la <key-implementation-files>` to verify they exist on disk
|
||||
- If ANY verification fails: retry the implementation, do NOT report done
|
||||
|
||||
If you cannot complete the implementation (e.g. the issue is too complex, blocked by external factors,
|
||||
or repeated attempts fail), set $status=failed with a reason.
|
||||
output: "List all files changed and provide a summary. Set $status to done (with branch/worktree), or failed (with reason)."
|
||||
frontmatter:
|
||||
oneOf:
|
||||
- properties:
|
||||
$status: { const: "done" }
|
||||
branch: { type: string }
|
||||
worktree: { type: string }
|
||||
repoRemote: { type: string }
|
||||
required: [$status, branch, worktree]
|
||||
- properties:
|
||||
$status: { const: "failed" }
|
||||
reason: { type: string }
|
||||
required: [$status, reason]
|
||||
reviewer:
|
||||
description: "Code standards compliance check"
|
||||
goal: "You are a code reviewer. You verify code standards compliance — NOT functionality (that's the tester's job)."
|
||||
capabilities:
|
||||
- code-review
|
||||
- static-analysis
|
||||
procedure: |
|
||||
The worktree path is provided in your task prompt. cd into it first.
|
||||
|
||||
CRITICAL: You MUST execute every verification command below. Do NOT report results without running the actual commands. Do NOT rely on prior context or assumptions.
|
||||
|
||||
Before reviewing, verify the worktree and branch exist:
|
||||
0. Run `cd <worktree-path> && pwd` to confirm the path is accessible
|
||||
- If the cd fails: the worktree truly doesn't exist, reject with that reason
|
||||
- If the cd succeeds: proceed with step 1 below
|
||||
1. Run `git branch --show-current` — confirm the branch name references the issue number being worked on
|
||||
2. If the branch doesn't correspond to the issue, flag it in your output and reject
|
||||
|
||||
Then perform code review:
|
||||
Hard checks (must all pass):
|
||||
3. `bun run build` — no build errors
|
||||
4. `bunx biome check` — no lint violations
|
||||
5. TypeScript strict mode — no type errors
|
||||
|
||||
Soft checks (review against project conventions if CLAUDE.md / .cursor/rules exist):
|
||||
- Naming conventions, module boundaries, code style
|
||||
- No `console.log` in production code
|
||||
- No dynamic imports in production code
|
||||
|
||||
Only review standards compliance. Do NOT test functionality.
|
||||
If rejecting, you MUST explain the specific reason in your output.
|
||||
output: "Explain your decision with specific file/line references. Set $status to approved (with branch/worktree) or rejected (with comments)."
|
||||
frontmatter:
|
||||
oneOf:
|
||||
- properties:
|
||||
$status: { const: "approved" }
|
||||
branch: { type: string }
|
||||
worktree: { type: string }
|
||||
repoRemote: { type: string }
|
||||
required: [$status, branch, worktree]
|
||||
- properties:
|
||||
$status: { const: "rejected" }
|
||||
comments: { type: string }
|
||||
worktree: { type: string }
|
||||
repoRemote: { type: string }
|
||||
required: [$status, comments, worktree]
|
||||
tester:
|
||||
description: "Functional correctness verification"
|
||||
goal: "You are a tester agent. You verify that the implementation correctly satisfies every scenario in the test spec."
|
||||
capabilities:
|
||||
- testing
|
||||
procedure: |
|
||||
The worktree path is provided in your task prompt. cd into it first.
|
||||
|
||||
1. Run `bun test` for automated test verification
|
||||
2. Read the test spec from CAS: `ocas get <plan hash>` (find the hash from the planner step in the thread history)
|
||||
3. Verify each scenario in the spec is covered and passing
|
||||
4. Determine outcome:
|
||||
- passed: all scenarios verified, tests pass
|
||||
- fix_code: tests fail or implementation doesn't match spec → send back to developer
|
||||
- fix_spec: the spec itself is wrong or incomplete → send back to planner
|
||||
output: "Report test results per scenario. Set $status to passed (with branch/worktree), fix_code (with report), or fix_spec (with report)."
|
||||
frontmatter:
|
||||
oneOf:
|
||||
- properties:
|
||||
$status: { const: "passed" }
|
||||
branch: { type: string }
|
||||
worktree: { type: string }
|
||||
repoRemote: { type: string }
|
||||
required: [$status, branch, worktree]
|
||||
- properties:
|
||||
$status: { const: "fix_code" }
|
||||
report: { type: string }
|
||||
repoRemote: { type: string }
|
||||
worktree: { type: string }
|
||||
branch: { type: string }
|
||||
required: [$status, report]
|
||||
- properties:
|
||||
$status: { const: "fix_spec" }
|
||||
report: { type: string }
|
||||
repoRemote: { type: string }
|
||||
worktree: { type: string }
|
||||
branch: { type: string }
|
||||
required: [$status, report]
|
||||
committer:
|
||||
description: "Commits and creates PR"
|
||||
goal: "You are a committer agent. You create a clean commit and push a PR linking the original issue."
|
||||
capabilities: []
|
||||
procedure: |
|
||||
The worktree path, branch name, and repo remote (owner/repo) are provided in your task prompt.
|
||||
cd into the worktree first.
|
||||
|
||||
Note: You inherit the developer's worktree and branch. Do NOT create a new branch.
|
||||
1. Check `git status` — if working tree is clean and branch is ahead of origin, skip to step 3 (push).
|
||||
2. If there are unstaged/uncommitted changes: `git add -A` then `git commit -m "type: description\n\nFixes #N"`
|
||||
3. Push the branch: `git push -u origin <branch-name>`
|
||||
4. **Verify push succeeded** — run `git ls-remote origin <branch-name>` and confirm it prints a commit hash.
|
||||
- If no output or push failed: capture the error, mark hook_failed
|
||||
5. Create a PR using the Gitea API (do NOT use `tea pr create` — it fails in worktrees):
|
||||
```bash
|
||||
GITEA_TOKEN=$(cfg get GITEA_TOKEN)
|
||||
curl -s -X POST -H "Authorization: token $GITEA_TOKEN" -H "Content-Type: application/json" \
|
||||
"https://git.shazhou.work/api/v1/repos/<owner>/<repo>/pulls" \
|
||||
-d '{"title":"...","body":"...","head":"<branch>","base":"main"}'
|
||||
```
|
||||
- The repo remote (owner/repo format, e.g. "shazhou/united-workforce") is given in your task prompt — use it directly.
|
||||
- PR body must include: What / Why / Changes / Ref sections, with `Fixes #N` in Ref
|
||||
6. **Verify PR was created** — parse the curl response JSON: it must contain a `"number"` field. Print the PR URL.
|
||||
- If curl returns an error or no number field: capture the response, mark hook_failed
|
||||
7. After PR creation, clean up the worktree:
|
||||
- cd to the repo root (parent of .worktrees)
|
||||
- `git worktree remove <worktree-path>`
|
||||
output: "Include PR URL on success or error log on failure. Set $status to committed (with prUrl) or hook_failed (with error)."
|
||||
frontmatter:
|
||||
oneOf:
|
||||
- properties:
|
||||
$status: { const: "committed" }
|
||||
prUrl: { type: string }
|
||||
repoRemote: { type: string }
|
||||
worktree: { type: string }
|
||||
branch: { type: string }
|
||||
required: [$status, prUrl]
|
||||
- properties:
|
||||
$status: { const: "hook_failed" }
|
||||
error: { type: string }
|
||||
repoRemote: { type: string }
|
||||
worktree: { type: string }
|
||||
branch: { type: string }
|
||||
required: [$status, error]
|
||||
graph:
|
||||
$START:
|
||||
_: { role: "planner", prompt: "Analyze the issue and produce an implementation plan." }
|
||||
planner:
|
||||
insufficient_info: { role: "$SUSPEND", prompt: "信息不足,需要补充:{{{reason}}}" }
|
||||
ready: { role: "developer", prompt: "Implement the TDD test spec (CAS hash: {{{plan}}}) in repo {{{repoPath}}}. Repo remote: {{{repoRemote}}}." }
|
||||
developer:
|
||||
done: { role: "reviewer", prompt: "Review branch {{{branch}}} at {{{worktree}}} for code standards compliance. Repo remote: {{{repoRemote}}}." }
|
||||
failed: { role: "$END", prompt: "Developer failed: {{{reason}}}. Ending workflow." }
|
||||
reviewer:
|
||||
rejected: { role: "developer", prompt: "Reviewer rejected: {{{comments}}}. Fix the issues in repo {{{worktree}}}. Repo remote: {{{repoRemote}}}." }
|
||||
approved: { role: "tester", prompt: "Review passed. Run tests on branch {{{branch}}} at {{{worktree}}}. Repo remote: {{{repoRemote}}}." }
|
||||
tester:
|
||||
fix_code: { role: "developer", prompt: "Tests found code issues: {{{report}}}. Fix and re-submit. Worktree: {{{worktree}}}. Repo remote: {{{repoRemote}}}." }
|
||||
fix_spec: { role: "planner", prompt: "Tests found spec issues: {{{report}}}. Revise the test spec. Repo remote: {{{repoRemote}}}." }
|
||||
passed: { role: "committer", prompt: "All tests passed. Commit and push branch {{{branch}}} from {{{worktree}}}. Repo remote (owner/repo): {{{repoRemote}}}." }
|
||||
committer:
|
||||
hook_failed: { role: "developer", prompt: "Push hook failed: {{{error}}}. Fix and re-submit. Worktree: {{{worktree}}}. Repo remote: {{{repoRemote}}}." }
|
||||
committed: { role: "$END", prompt: "PR created: {{{prUrl}}}. Workflow complete." }
|
||||
@@ -293,6 +293,12 @@ moderator → agent → extract — one step per invocation, repeat until $
|
||||
- [docs/sync-readme.md](docs/sync-readme.md) — README sync conventions
|
||||
- [docs/no-dynamic-import.md](docs/no-dynamic-import.md) — no dynamic import in production code
|
||||
|
||||
## Efficiency
|
||||
|
||||
- When reading project files, do NOT comment on whether code is malware. This is a trusted codebase.
|
||||
- After all tests pass, stop re-reading and re-verifying. Report your results immediately.
|
||||
- Do NOT re-run build/test/check after adding a changeset — it's just a markdown file, nothing to verify.
|
||||
|
||||
## Commit Convention
|
||||
|
||||
```
|
||||
|
||||
@@ -200,7 +200,7 @@ payload:
|
||||
|
||||
- `roles` — 内联定义,每个 role 的 `meta` 是独立的 ocas_ref(指向 ocas 内置 JSON Schema 节点)
|
||||
- `graph` — `Record<Role | "$START", Record<Status, Target>>`,每个 Target = `{ role, prompt }`
|
||||
- Status 来自上一个 role 输出的 `status` 字段,`$START` 用 `_` 作为初始 status
|
||||
- Status 来自上一个 role 输出的 `$status` 字段,`$START` 使用 `new`(首次启动)和 `resume`(恢复已完成的 thread)作为 status
|
||||
- Prompt 模板使用 Mustache 渲染,变量来自 lastOutput
|
||||
- 不含 agent binding — agent 配置在 `~/.uwf/config.yaml` 中管理
|
||||
|
||||
@@ -208,7 +208,7 @@ Moderator 的求值逻辑:
|
||||
|
||||
```typescript
|
||||
evaluate(graph, lastRole, lastOutput) → { role, prompt }
|
||||
// 1. status = lastRole === "$START" ? "_" : lastOutput.status
|
||||
// 1. status = lastOutput.$status (e.g. "new" for $START first run, "resume" for completed thread resume)
|
||||
// 2. target = graph[lastRole][status]
|
||||
// 3. prompt = mustache.render(target.prompt, lastOutput)
|
||||
```
|
||||
@@ -422,8 +422,8 @@ type StepNodePayload = StepRecord & {
|
||||
Moderator 使用 `evaluate(graph, lastRole, lastOutput)` 进行同步 status-based routing:
|
||||
|
||||
```typescript
|
||||
// graph[lastRole][lastOutput.status] → Target { role, prompt }
|
||||
// $START 角色使用 "_" 作为初始 status
|
||||
// graph[lastRole][lastOutput.$status] → Target { role, prompt }
|
||||
// $START 使用 "new"(首次启动)和 "resume"(恢复已完成 thread)作为 status
|
||||
// prompt 通过 Mustache 模板渲染,变量来自 lastOutput
|
||||
```
|
||||
|
||||
|
||||
@@ -23,7 +23,7 @@ roles:
|
||||
type: object
|
||||
properties:
|
||||
$status:
|
||||
enum: ["done"]
|
||||
const: done
|
||||
thesis:
|
||||
type: string
|
||||
keyPoints:
|
||||
@@ -35,6 +35,7 @@ roles:
|
||||
required: [$status, thesis, keyPoints]
|
||||
graph:
|
||||
$START:
|
||||
_: { role: "analyst", prompt: "Analyze the topic in the task and produce a structured summary with key points." }
|
||||
new: { role: "analyst", prompt: "Analyze the topic in the task and produce a structured summary with key points." }
|
||||
resume: { role: "analyst", prompt: "Review the previous analysis output and continue with additional context." }
|
||||
analyst:
|
||||
done: { role: "$END", prompt: "Analysis complete. Finish the workflow." }
|
||||
|
||||
+124
-55
@@ -1,62 +1,131 @@
|
||||
name: "debate"
|
||||
description: "Structured debate between two sides. Tests cross-process session resume."
|
||||
name: debate
|
||||
description: "Multi-role structured debate with critical thinking framework and host summary."
|
||||
|
||||
# Shared frontmatter schema for debater roles (YAML anchor)
|
||||
x-debater-frontmatter: &debater-frontmatter
|
||||
type: object
|
||||
oneOf:
|
||||
- properties:
|
||||
$status: { const: speak }
|
||||
argument: { type: string }
|
||||
required: [$status, argument]
|
||||
- properties:
|
||||
$status: { const: conceded }
|
||||
reason: { type: string }
|
||||
required: [$status, reason]
|
||||
- properties:
|
||||
$status: { const: final }
|
||||
closing: { type: string }
|
||||
required: [$status, closing]
|
||||
|
||||
roles:
|
||||
against:
|
||||
description: "Argues against the proposition"
|
||||
goal: |
|
||||
You are a skilled debater arguing AGAINST the proposition.
|
||||
Be logical, cite evidence, and directly address your opponent's points.
|
||||
Keep each argument concise (under 200 words).
|
||||
capabilities:
|
||||
- argumentation
|
||||
- critical-thinking
|
||||
proponent:
|
||||
description: "Argues FOR the proposition"
|
||||
goal: "Build a compelling case for the proposition through logical reasoning and evidence"
|
||||
capabilities: []
|
||||
procedure: |
|
||||
1. If this is the opening, present your strongest argument against the proposition.
|
||||
2. If responding to the other side, directly counter their points with evidence and logic.
|
||||
3. If you find yourself genuinely convinced by the other side, you may concede.
|
||||
output: |
|
||||
Provide your argument in the frontmatter.
|
||||
Set status to "conceded" ONLY if you are genuinely convinced and wish to stop debating.
|
||||
Otherwise set status to "continue".
|
||||
You are an experienced scholar arguing FOR the proposition.
|
||||
|
||||
## Critical Thinking Framework (execute before every speech)
|
||||
|
||||
### A. Pre-speech reflection (internal, do not output)
|
||||
- Does every step in my argument chain hold? Any hidden assumptions or logical gaps?
|
||||
- If I were my opponent, how would I attack this? Where am I weakest?
|
||||
- Does my evidence actually support my claim, or could it backfire?
|
||||
- Should I go on offense or defense this round?
|
||||
|
||||
### B. Evidence discipline
|
||||
- Verify key numbers — watch for order-of-magnitude errors
|
||||
- Assess data freshness — fast-moving fields have short half-lives
|
||||
- Distinguish primary data from secondary citations, expert opinion, and common assumptions
|
||||
|
||||
### C. Anti-fragility
|
||||
- Anticipate counterarguments; preemptively strengthen or strategically abandon weak points
|
||||
- Catch logical gaps, data misuse, or outdated claims in your opponent's reasoning
|
||||
|
||||
## Rules
|
||||
1. Check Thread Progress to see how many times you have spoken.
|
||||
2. On your 3rd speech, you MUST output $status: final (closing statement).
|
||||
3. If genuinely convinced by the opponent, output $status: conceded.
|
||||
4. Otherwise output $status: speak and counter the opponent's points.
|
||||
5. Be rigorous, cite evidence, stay concise.
|
||||
output: "Debate argument"
|
||||
frontmatter: *debater-frontmatter
|
||||
|
||||
opponent:
|
||||
description: "Argues AGAINST the proposition"
|
||||
goal: "Build a compelling case against the proposition through logical reasoning and evidence"
|
||||
capabilities: []
|
||||
procedure: |
|
||||
You are an experienced scholar arguing AGAINST the proposition.
|
||||
|
||||
## Critical Thinking Framework (execute before every speech)
|
||||
|
||||
### A. Pre-speech reflection (internal, do not output)
|
||||
- Does every step in my argument chain hold? Any hidden assumptions or logical gaps?
|
||||
- If I were my opponent, how would I attack this? Where am I weakest?
|
||||
- Does my evidence actually support my claim, or could it backfire?
|
||||
- Should I go on offense or defense this round?
|
||||
|
||||
### B. Evidence discipline
|
||||
- Verify key numbers — watch for order-of-magnitude errors
|
||||
- Assess data freshness — fast-moving fields have short half-lives
|
||||
- Distinguish primary data from secondary citations, expert opinion, and common assumptions
|
||||
|
||||
### C. Anti-fragility
|
||||
- Anticipate counterarguments; preemptively strengthen or strategically abandon weak points
|
||||
- Catch logical gaps, data misuse, or outdated claims in your opponent's reasoning
|
||||
|
||||
## Rules
|
||||
1. Check Thread Progress to see how many times you have spoken.
|
||||
2. On your 3rd speech, or when the proponent has issued a final statement, you MUST output $status: final.
|
||||
3. If genuinely convinced by the proponent, output $status: conceded.
|
||||
4. Otherwise output $status: speak and counter the proponent's points.
|
||||
5. Be rigorous, cite evidence, stay concise.
|
||||
output: "Debate argument"
|
||||
frontmatter: *debater-frontmatter
|
||||
|
||||
host:
|
||||
description: "Debate moderator — delivers impartial summary and verdict"
|
||||
goal: "Objectively review the debate, analyze both sides, and deliver a verdict"
|
||||
capabilities: []
|
||||
procedure: |
|
||||
You are an experienced academic debate moderator.
|
||||
|
||||
## Task
|
||||
1. Outline each side's core arguments
|
||||
2. Evaluate reasoning quality and evidence use
|
||||
3. Highlight the most impactful exchanges
|
||||
4. Analyze the deeper significance of the topic
|
||||
5. Deliver an overall verdict
|
||||
|
||||
## Style
|
||||
- Impartial but with independent judgment
|
||||
- Substantive, not superficial
|
||||
output: "Debate summary report"
|
||||
frontmatter:
|
||||
type: object
|
||||
properties:
|
||||
$status:
|
||||
enum: ["continue", "conceded"]
|
||||
argument:
|
||||
type: string
|
||||
required: [$status, argument]
|
||||
for:
|
||||
description: "Argues for the proposition"
|
||||
goal: |
|
||||
You are a skilled debater arguing FOR the proposition.
|
||||
Be logical, cite evidence, and directly address your opponent's points.
|
||||
Keep each argument concise (under 200 words).
|
||||
capabilities:
|
||||
- argumentation
|
||||
- critical-thinking
|
||||
procedure: |
|
||||
1. Read the opposing side's latest argument carefully.
|
||||
2. Counter their points with evidence and logic.
|
||||
3. If you find yourself genuinely convinced by the other side, you may concede.
|
||||
output: |
|
||||
Provide your argument in the frontmatter.
|
||||
Set status to "conceded" ONLY if you are genuinely convinced and wish to stop debating.
|
||||
Otherwise set status to "continue".
|
||||
frontmatter:
|
||||
type: object
|
||||
properties:
|
||||
$status:
|
||||
enum: ["continue", "conceded"]
|
||||
argument:
|
||||
type: string
|
||||
required: [$status, argument]
|
||||
$status: { const: done }
|
||||
summary: { type: string }
|
||||
highlights: { type: string }
|
||||
verdict: { type: string }
|
||||
required: [$status, summary, highlights, verdict]
|
||||
|
||||
graph:
|
||||
$START:
|
||||
_: { role: "against", prompt: "Present your opening argument against the proposition." }
|
||||
against:
|
||||
conceded: { role: "$END", prompt: "The against side conceded. Debate over." }
|
||||
continue: { role: "for", prompt: "Counter the opposing argument: {{{argument}}}" }
|
||||
for:
|
||||
conceded: { role: "$END", prompt: "The for side conceded. Debate over." }
|
||||
continue: { role: "against", prompt: "Counter the opposing argument: {{{argument}}}" }
|
||||
new: { role: proponent, prompt: "The debate begins. You are arguing FOR the proposition. Present your opening argument." }
|
||||
resume: { role: proponent, prompt: "The debate continues." }
|
||||
|
||||
proponent:
|
||||
speak: { role: opponent, prompt: "Proponent argues:\n\n{{{argument}}}\n\nYou are the opponent. Counter this argument." }
|
||||
conceded: { role: host, prompt: "The proponent conceded: {{{reason}}}\n\nPlease summarize the debate." }
|
||||
final: { role: opponent, prompt: "Proponent's closing statement:\n\n{{{closing}}}\n\nYou are the opponent. Deliver your final response." }
|
||||
|
||||
opponent:
|
||||
speak: { role: proponent, prompt: "Opponent argues:\n\n{{{argument}}}\n\nYou are the proponent. Counter this argument." }
|
||||
conceded: { role: host, prompt: "The opponent conceded: {{{reason}}}\n\nPlease summarize the debate." }
|
||||
final: { role: host, prompt: "Opponent's closing statement:\n\n{{{closing}}}\n\nThe debate is over. Please summarize." }
|
||||
|
||||
host:
|
||||
done: { role: "$END", prompt: "Summary complete." }
|
||||
|
||||
@@ -18,13 +18,13 @@ roles:
|
||||
type: object
|
||||
properties:
|
||||
$status:
|
||||
type: string
|
||||
enum: [done]
|
||||
const: done
|
||||
summary:
|
||||
type: string
|
||||
required: [$status, summary]
|
||||
graph:
|
||||
$START:
|
||||
_: { role: "fixer", prompt: "Fix the code issue described in the task prompt." }
|
||||
new: { role: "fixer", prompt: "Fix the code issue described in the task prompt." }
|
||||
resume: { role: "fixer", prompt: "Review the previous run output and continue fixing the code issue." }
|
||||
fixer:
|
||||
done: { role: "$END", prompt: "Fix complete." }
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
name: "solve-issue"
|
||||
description: "TDD-driven issue resolution for small, focused changes. Loop protection relies on engine maxRounds."
|
||||
description: "TDD-driven issue resolution for small, focused changes. Loop protection relies on engine maxRounds. Uses pnpm."
|
||||
roles:
|
||||
planner:
|
||||
description: "Analyzes issue and outputs a TDD test spec"
|
||||
@@ -80,7 +80,7 @@ roles:
|
||||
2. `git fetch origin` to get latest refs
|
||||
3. First time (no existing branch):
|
||||
- `git worktree add .worktrees/fix/<issue-number>-<short-slug> -b fix/<issue-number>-<short-slug> origin/main`
|
||||
- `cd .worktrees/fix/<issue-number>-<short-slug> && bun install`
|
||||
- `cd .worktrees/fix/<issue-number>-<short-slug> && pnpm install`
|
||||
4. If continuing on existing branch (prompt says "Continue work on existing branch" or provides a worktree path):
|
||||
- cd directly into the worktree path provided in the prompt
|
||||
- `git fetch origin && git rebase origin/main`
|
||||
@@ -95,8 +95,20 @@ roles:
|
||||
7. If bounced back from reviewer or tester: read the previous role's feedback in your task prompt
|
||||
8. Write tests first based on the spec
|
||||
9. Implement the code to make tests pass
|
||||
10. Ensure `bun run build` passes with no errors
|
||||
11. Run `bun test` to verify all tests pass
|
||||
10. Ensure `pnpm run build` passes with no errors
|
||||
11. Run `pnpm test` to verify all tests pass
|
||||
|
||||
After implementation, before reporting done:
|
||||
12. Add a changeset file (`.changeset/<short-slug>.md`) with correct bump type:
|
||||
- `patch` for bug fixes, internal refactors, test-only changes
|
||||
- `minor` for new features, new CLI commands, new API surfaces
|
||||
- `major` for breaking changes
|
||||
List every affected package in the changeset frontmatter.
|
||||
13. Update documentation if the change affects user-facing behavior:
|
||||
- `README.md` — usage examples, feature descriptions
|
||||
- `.cards/` — architecture decision records (if applicable)
|
||||
- CLI prompt subcommand output (if CLI help text changes)
|
||||
- CLI `--help` text (if flags/commands are added or changed)
|
||||
|
||||
If you cannot complete the implementation (e.g. the issue is too complex, blocked by external factors,
|
||||
or repeated attempts fail), set $status=failed with a reason.
|
||||
@@ -127,8 +139,8 @@ roles:
|
||||
|
||||
Then perform code review:
|
||||
Hard checks (must all pass):
|
||||
3. `bun run build` — no build errors
|
||||
4. `bunx biome check` — no lint violations
|
||||
3. `pnpm run build` — no build errors
|
||||
4. `pnpm run check` — no lint violations
|
||||
5. TypeScript strict mode — no type errors
|
||||
|
||||
Soft checks (review against project conventions if CLAUDE.md / .cursor/rules exist):
|
||||
@@ -136,6 +148,14 @@ roles:
|
||||
- No `console.log` in production code
|
||||
- No dynamic imports in production code
|
||||
|
||||
Documentation & changeset checks:
|
||||
6. Changeset exists in `.changeset/` with correct bump type (`patch`/`minor`/`major`) and lists all affected packages
|
||||
7. If the change is user-facing, documentation is updated:
|
||||
- `README.md` reflects new/changed behavior
|
||||
- `.cards/` architecture cards updated if design decisions changed
|
||||
- CLI prompt subcommand output updated (if it generates skill/reference content)
|
||||
- CLI `--help` text matches new flags/commands
|
||||
|
||||
Only review standards compliance. Do NOT test functionality.
|
||||
If rejecting, you MUST explain the specific reason in your output.
|
||||
output: "Explain your decision with specific file/line references. Set $status to approved (with branch/worktree) or rejected (with comments)."
|
||||
@@ -159,7 +179,7 @@ roles:
|
||||
procedure: |
|
||||
The worktree path is provided in your task prompt. cd into it first.
|
||||
|
||||
1. Run `bun test` for automated test verification
|
||||
1. Run `pnpm test` for automated test verification
|
||||
2. Read the test spec from CAS: `ocas get <plan hash>` (find the hash from the planner step in the thread history)
|
||||
3. Verify each scenario in the spec is covered and passing
|
||||
4. Determine outcome:
|
||||
@@ -215,7 +235,8 @@ roles:
|
||||
required: [$status, error]
|
||||
graph:
|
||||
$START:
|
||||
_: { role: "planner", prompt: "Analyze the issue and produce an implementation plan." }
|
||||
new: { role: "planner", prompt: "Analyze the issue and produce an implementation plan." }
|
||||
resume: { role: "planner", prompt: "Review the previous run output and continue the work." }
|
||||
planner:
|
||||
insufficient_info: { role: "$SUSPEND", prompt: "信息不足,需要补充:{{{reason}}}" }
|
||||
ready: { role: "developer", prompt: "Implement the TDD test spec (CAS hash: {{{plan}}}) in repo {{{repoPath}}}." }
|
||||
|
||||
@@ -264,7 +264,8 @@ roles:
|
||||
|
||||
graph:
|
||||
$START:
|
||||
_: { role: "bootstrap", prompt: "Set up the Docker container and verify uwf is runnable." }
|
||||
new: { role: "bootstrap", prompt: "Set up the Docker container and verify uwf is runnable." }
|
||||
resume: { role: "bootstrap", prompt: "Review the previous run output and continue the walkthrough." }
|
||||
bootstrap:
|
||||
pass: { role: "config-and-registry", prompt: "Container {{{containerName}}} is ready. Validate config and workflow registration." }
|
||||
fail: { role: "$END", prompt: "Bootstrap failed: {{{error}}}. No container was created." }
|
||||
+4
-1
@@ -21,9 +21,12 @@ graph:
|
||||
role: package-metadata
|
||||
prompt: Biome setup failed ({{{reason}}}), but continue. Standardize package metadata for repo at {{{repoPath}}}.
|
||||
$START:
|
||||
_:
|
||||
new:
|
||||
role: workspace
|
||||
prompt: Set up bun workspace structure for repo at {{{repoPath}}}.
|
||||
resume:
|
||||
role: workspace
|
||||
prompt: Review the previous run output and continue setting up the bun workspace structure for repo at {{{repoPath}}}.
|
||||
release:
|
||||
done:
|
||||
role: testing
|
||||
+1
-1
@@ -21,7 +21,7 @@
|
||||
"@agentclientprotocol/sdk": "^0.22.1",
|
||||
"@biomejs/biome": "^2.4.14",
|
||||
"@changesets/cli": "^2.31.0",
|
||||
"@shazhou/proman": "^0.5.1",
|
||||
"@shazhou/proman": "^0.6.3",
|
||||
"@types/node": "^25.7.0",
|
||||
"@types/xxhashjs": "^0.2.4",
|
||||
"@united-workforce/agent-hermes": "workspace:*",
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@united-workforce/agent-builtin",
|
||||
"version": "0.1.0",
|
||||
"version": "0.1.2",
|
||||
"files": [
|
||||
"src",
|
||||
"dist",
|
||||
@@ -21,7 +21,7 @@
|
||||
"test:ci": "vitest run __tests__/"
|
||||
},
|
||||
"dependencies": {
|
||||
"@ocas/core": "^0.3.0",
|
||||
"@ocas/core": "^0.4.0",
|
||||
"@united-workforce/util": "workspace:^",
|
||||
"@united-workforce/util-agent": "workspace:^"
|
||||
},
|
||||
|
||||
@@ -1,4 +1,11 @@
|
||||
#!/usr/bin/env node
|
||||
#!/usr/bin/env -S node --disable-warning=ExperimentalWarning
|
||||
|
||||
// eslint-disable-next-line -- dynamic import for version
|
||||
const pkg = await import("../package.json", { with: { type: "json" } });
|
||||
if (process.argv.includes("--version") || process.argv.includes("-V")) {
|
||||
process.stdout.write(`${pkg.default.version}\n`);
|
||||
process.exit(0);
|
||||
}
|
||||
|
||||
import { createBuiltinAgent } from "./agent.js";
|
||||
|
||||
|
||||
@@ -0,0 +1,8 @@
|
||||
# Changelog
|
||||
|
||||
## 0.1.4 — 2026-06-07
|
||||
|
||||
- fix: decouple session resume from isFirstVisit guard
|
||||
|
||||
When frontmatter validation fails, the step is never written to CAS, so isFirstVisit remains true on the next run. Both adapters now always check the session cache regardless of isFirstVisit. When resuming after a frontmatter-only failure (isFirstVisit + cache hit), a minimal correction prompt is sent via buildFrontmatterRetryPrompt() instead of re-sending the full initial prompt.
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@united-workforce/agent-claude-code",
|
||||
"version": "0.1.0",
|
||||
"version": "0.1.4",
|
||||
"files": [
|
||||
"src",
|
||||
"dist",
|
||||
@@ -21,7 +21,7 @@
|
||||
"test:ci": "vitest run __tests__/"
|
||||
},
|
||||
"dependencies": {
|
||||
"@ocas/core": "^0.3.0",
|
||||
"@ocas/core": "^0.4.0",
|
||||
"@united-workforce/protocol": "workspace:^",
|
||||
"@united-workforce/util": "workspace:^",
|
||||
"@united-workforce/util-agent": "workspace:^"
|
||||
|
||||
@@ -6,7 +6,9 @@ import {
|
||||
type AgentContext,
|
||||
type AgentRunResult,
|
||||
buildContinuationPrompt,
|
||||
buildFrontmatterRetryPrompt,
|
||||
buildRolePrompt,
|
||||
buildThreadProgress,
|
||||
createAgent,
|
||||
getCachedSessionId,
|
||||
setCachedSessionId,
|
||||
@@ -27,6 +29,10 @@ export function buildClaudeCodePrompt(ctx: AgentContext): string {
|
||||
if (ctx.outputFormatInstruction !== undefined && ctx.outputFormatInstruction !== "") {
|
||||
parts.push(ctx.outputFormatInstruction, "");
|
||||
}
|
||||
|
||||
// Inject thread progress so the agent knows step count and role visit count
|
||||
parts.push(buildThreadProgress(ctx.steps, ctx.role), "");
|
||||
|
||||
parts.push(rolePrompt, "", "## Task", ctx.start.prompt);
|
||||
|
||||
if (!ctx.isFirstVisit) {
|
||||
@@ -171,8 +177,12 @@ async function runClaudeCode(ctx: AgentContext, model: string | null): Promise<A
|
||||
|
||||
log("K7R2M4N8", `prompt for role=${ctx.role} (length=${fullPrompt.length}):\n${fullPrompt}`);
|
||||
|
||||
// Try resuming a cached session for re-entry scenarios (e.g. reviewer reject → developer re-entry).
|
||||
if (!ctx.isFirstVisit) {
|
||||
// Try resuming a cached session. This covers both normal re-entry
|
||||
// (e.g. reviewer reject → developer re-entry) AND the case where a
|
||||
// previous run completed but frontmatter validation failed — the step
|
||||
// was never written to CAS so isFirstVisit is still true, but the
|
||||
// session cache holds a valid session we should resume.
|
||||
{
|
||||
const cachedSessionId = await getCachedSessionId(
|
||||
"claude-code",
|
||||
ctx.threadId,
|
||||
@@ -180,13 +190,20 @@ async function runClaudeCode(ctx: AgentContext, model: string | null): Promise<A
|
||||
ctx.storageRoot,
|
||||
);
|
||||
if (cachedSessionId !== null) {
|
||||
// isFirstVisit + cache hit = previous run completed but frontmatter
|
||||
// validation failed. The session already has full context — send a
|
||||
// minimal correction prompt instead of the full initial prompt.
|
||||
const resumePrompt = ctx.isFirstVisit
|
||||
? buildFrontmatterRetryPrompt(ctx.outputFormatInstruction)
|
||||
: fullPrompt;
|
||||
|
||||
try {
|
||||
const { stdout, stderr, exitCode } = await spawnClaudeResume(
|
||||
cachedSessionId,
|
||||
fullPrompt,
|
||||
resumePrompt,
|
||||
model,
|
||||
);
|
||||
const result = await processClaudeOutput(stdout, stderr, exitCode, ctx.store, fullPrompt);
|
||||
const result = await processClaudeOutput(stdout, stderr, exitCode, ctx.store, resumePrompt);
|
||||
if (result.sessionId !== undefined && result.sessionId !== "") {
|
||||
await setCachedSessionId(
|
||||
"claude-code",
|
||||
|
||||
@@ -1,4 +1,11 @@
|
||||
#!/usr/bin/env node
|
||||
#!/usr/bin/env -S node --disable-warning=ExperimentalWarning
|
||||
|
||||
// eslint-disable-next-line -- dynamic import for version
|
||||
const pkg = await import("../package.json", { with: { type: "json" } });
|
||||
if (process.argv.includes("--version") || process.argv.includes("-V")) {
|
||||
process.stdout.write(`${pkg.default.version}\n`);
|
||||
process.exit(0);
|
||||
}
|
||||
|
||||
import { createClaudeCodeAgent } from "./claude-code.js";
|
||||
|
||||
|
||||
@@ -0,0 +1,24 @@
|
||||
# @united-workforce/agent-hermes
|
||||
|
||||
## 0.1.5 — 2026-06-07
|
||||
|
||||
- fix: decouple session resume from isFirstVisit guard
|
||||
|
||||
When frontmatter validation fails, the step is never written to CAS, so isFirstVisit remains true on the next run. Both adapters now always check the session cache regardless of isFirstVisit. When resuming after a frontmatter-only failure (isFirstVisit + cache hit), a minimal correction prompt is sent via buildFrontmatterRetryPrompt() instead of re-sending the full initial prompt.
|
||||
|
||||
## 0.1.1
|
||||
|
||||
### Patch Changes
|
||||
|
||||
- 8085d1d: fix: read token usage from ACP PromptResponse instead of DB
|
||||
|
||||
Token counts (inputTokens, outputTokens) now come from the ACP
|
||||
`PromptResponse.usage` field, which is populated synchronously from
|
||||
`run_conversation()` return data — no WAL race condition.
|
||||
|
||||
Turns (assistant message count) still come from the DB via
|
||||
`snapshotTurns()` before/after delta.
|
||||
|
||||
Previously both tokens and turns were read from the Hermes state DB
|
||||
after the ACP prompt returned, but due to WAL write lag the DB often
|
||||
had incomplete token data at read time (e.g. 235 vs actual 26,080).
|
||||
@@ -15,7 +15,8 @@ describe("Issue #551 — bin entry & engines", () => {
|
||||
const pkg = JSON.parse(readFileSync(join(PKG_ROOT, "package.json"), "utf-8"));
|
||||
const binPath = pkg.bin["uwf-hermes"];
|
||||
const content = readFileSync(join(PKG_ROOT, binPath), "utf-8");
|
||||
expect(content.startsWith("#!/usr/bin/env node")).toBe(true);
|
||||
expect(content.startsWith("#!/usr/bin/env")).toBe(true);
|
||||
expect(content).toContain("node");
|
||||
});
|
||||
|
||||
test("README.md explains uwf-hermes is an adapter", () => {
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
import { describe, expect, test } from "vitest";
|
||||
import { computeUsageDelta, snapshotUsage } from "../src/hermes.js";
|
||||
import type { AcpUsage } from "../src/acp-client.js";
|
||||
import { buildUsage, snapshotTurns } from "../src/hermes.js";
|
||||
import type { HermesSessionJson } from "../src/types.js";
|
||||
|
||||
function makeSession(overrides: Partial<HermesSessionJson> = {}): HermesSessionJson {
|
||||
@@ -14,19 +15,19 @@ function makeSession(overrides: Partial<HermesSessionJson> = {}): HermesSessionJ
|
||||
};
|
||||
}
|
||||
|
||||
describe("snapshotUsage", () => {
|
||||
test("returns zero snapshot for null session", () => {
|
||||
const result = snapshotUsage(null);
|
||||
expect(result).toEqual({ turns: 0, inputTokens: 0, outputTokens: 0 });
|
||||
describe("snapshotTurns", () => {
|
||||
test("returns zero for null session", () => {
|
||||
const result = snapshotTurns(null);
|
||||
expect(result).toEqual({ turns: 0 });
|
||||
});
|
||||
|
||||
test("returns zero snapshot for empty session", () => {
|
||||
const result = snapshotUsage(makeSession());
|
||||
expect(result).toEqual({ turns: 0, inputTokens: 0, outputTokens: 0 });
|
||||
test("returns zero for empty session", () => {
|
||||
const result = snapshotTurns(makeSession());
|
||||
expect(result).toEqual({ turns: 0 });
|
||||
});
|
||||
|
||||
test("counts assistant messages as turns", () => {
|
||||
const result = snapshotUsage(
|
||||
const result = snapshotTurns(
|
||||
makeSession({
|
||||
messages: [
|
||||
{ role: "user", content: "hello", reasoning: null, tool_calls: null },
|
||||
@@ -39,11 +40,11 @@ describe("snapshotUsage", () => {
|
||||
outputTokens: 500,
|
||||
}),
|
||||
);
|
||||
expect(result).toEqual({ turns: 2, inputTokens: 1000, outputTokens: 500 });
|
||||
expect(result).toEqual({ turns: 2 });
|
||||
});
|
||||
|
||||
test("ignores non-assistant messages for turn count", () => {
|
||||
const result = snapshotUsage(
|
||||
const result = snapshotTurns(
|
||||
makeSession({
|
||||
messages: [
|
||||
{ role: "user", content: "hello", reasoning: null, tool_calls: null },
|
||||
@@ -55,11 +56,13 @@ describe("snapshotUsage", () => {
|
||||
});
|
||||
});
|
||||
|
||||
describe("computeUsageDelta", () => {
|
||||
test("first visit: before is zero, after has all values", () => {
|
||||
const before = { turns: 0, inputTokens: 0, outputTokens: 0 };
|
||||
const after = { turns: 3, inputTokens: 5000, outputTokens: 2000 };
|
||||
const result = computeUsageDelta(before, after, 12.5);
|
||||
describe("buildUsage", () => {
|
||||
const acpUsage: AcpUsage = { inputTokens: 5000, outputTokens: 2000, totalTokens: 7000 };
|
||||
|
||||
test("first visit: tokens from ACP, turns from DB delta", () => {
|
||||
const beforeTurns = { turns: 0 };
|
||||
const afterTurns = { turns: 3 };
|
||||
const result = buildUsage(acpUsage, beforeTurns, afterTurns, 12.5);
|
||||
expect(result).toEqual({
|
||||
turns: 3,
|
||||
inputTokens: 5000,
|
||||
@@ -68,43 +71,52 @@ describe("computeUsageDelta", () => {
|
||||
});
|
||||
});
|
||||
|
||||
test("re-entry: computes delta correctly", () => {
|
||||
const before = { turns: 2, inputTokens: 3000, outputTokens: 1000 };
|
||||
const after = { turns: 4, inputTokens: 8000, outputTokens: 3500 };
|
||||
const result = computeUsageDelta(before, after, 7.3);
|
||||
test("re-entry: turn delta computed correctly, tokens from ACP", () => {
|
||||
const beforeTurns = { turns: 2 };
|
||||
const afterTurns = { turns: 4 };
|
||||
const acpDelta: AcpUsage = { inputTokens: 8000, outputTokens: 3500, totalTokens: 11500 };
|
||||
const result = buildUsage(acpDelta, beforeTurns, afterTurns, 7.3);
|
||||
expect(result).toEqual({
|
||||
turns: 2,
|
||||
inputTokens: 5000,
|
||||
outputTokens: 2500,
|
||||
inputTokens: 8000,
|
||||
outputTokens: 3500,
|
||||
duration: 7,
|
||||
});
|
||||
});
|
||||
|
||||
test("floors negative deltas at 0 (defensive)", () => {
|
||||
const before = { turns: 5, inputTokens: 10000, outputTokens: 5000 };
|
||||
const after = { turns: 3, inputTokens: 8000, outputTokens: 4000 };
|
||||
const result = computeUsageDelta(before, after, 1.0);
|
||||
test("floors negative turn deltas at 0, then defaults to 1", () => {
|
||||
const beforeTurns = { turns: 5 };
|
||||
const afterTurns = { turns: 3 };
|
||||
const result = buildUsage(acpUsage, beforeTurns, afterTurns, 1.0);
|
||||
// turns would be negative (-2), floored to 0, then || 1 gives 1
|
||||
expect(result.turns).toBe(1);
|
||||
expect(result.inputTokens).toBe(0);
|
||||
expect(result.outputTokens).toBe(0);
|
||||
});
|
||||
|
||||
test("zero turns delta defaults to 1 (at least one turn happened)", () => {
|
||||
const before = { turns: 3, inputTokens: 1000, outputTokens: 500 };
|
||||
const after = { turns: 3, inputTokens: 2000, outputTokens: 1000 };
|
||||
const result = computeUsageDelta(before, after, 5.0);
|
||||
const beforeTurns = { turns: 3 };
|
||||
const afterTurns = { turns: 3 };
|
||||
const result = buildUsage(acpUsage, beforeTurns, afterTurns, 5.0);
|
||||
// turns delta is 0, || 1 gives 1
|
||||
expect(result.turns).toBe(1);
|
||||
expect(result.inputTokens).toBe(1000);
|
||||
expect(result.outputTokens).toBe(500);
|
||||
});
|
||||
|
||||
test("null ACP usage yields zero tokens", () => {
|
||||
const beforeTurns = { turns: 0 };
|
||||
const afterTurns = { turns: 2 };
|
||||
const result = buildUsage(null, beforeTurns, afterTurns, 10.0);
|
||||
expect(result).toEqual({
|
||||
turns: 2,
|
||||
inputTokens: 0,
|
||||
outputTokens: 0,
|
||||
duration: 10,
|
||||
});
|
||||
});
|
||||
|
||||
test("duration is rounded", () => {
|
||||
const before = { turns: 0, inputTokens: 0, outputTokens: 0 };
|
||||
const after = { turns: 1, inputTokens: 100, outputTokens: 50 };
|
||||
expect(computeUsageDelta(before, after, 3.7).duration).toBe(4);
|
||||
expect(computeUsageDelta(before, after, 3.2).duration).toBe(3);
|
||||
expect(computeUsageDelta(before, after, 0.0).duration).toBe(0);
|
||||
const beforeTurns = { turns: 0 };
|
||||
const afterTurns = { turns: 1 };
|
||||
expect(buildUsage(acpUsage, beforeTurns, afterTurns, 3.7).duration).toBe(4);
|
||||
expect(buildUsage(acpUsage, beforeTurns, afterTurns, 3.2).duration).toBe(3);
|
||||
expect(buildUsage(acpUsage, beforeTurns, afterTurns, 0.0).duration).toBe(0);
|
||||
});
|
||||
});
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@united-workforce/agent-hermes",
|
||||
"version": "0.1.0",
|
||||
"version": "0.1.5",
|
||||
"files": [
|
||||
"src",
|
||||
"dist",
|
||||
@@ -21,7 +21,7 @@
|
||||
"test:ci": "vitest run __tests__/"
|
||||
},
|
||||
"dependencies": {
|
||||
"@ocas/core": "^0.3.0",
|
||||
"@ocas/core": "^0.4.0",
|
||||
"@united-workforce/protocol": "workspace:^",
|
||||
"@united-workforce/util": "workspace:^",
|
||||
"@united-workforce/util-agent": "workspace:^"
|
||||
|
||||
@@ -1,8 +1,22 @@
|
||||
import type { ChildProcess } from "node:child_process";
|
||||
import { spawn } from "node:child_process";
|
||||
import { readFileSync } from "node:fs";
|
||||
import { dirname, join } from "node:path";
|
||||
import { createInterface } from "node:readline";
|
||||
import { fileURLToPath } from "node:url";
|
||||
|
||||
const HERMES_COMMAND = "hermes";
|
||||
const __dirname = dirname(fileURLToPath(import.meta.url));
|
||||
const OWN_VERSION = (
|
||||
JSON.parse(readFileSync(join(__dirname, "..", "package.json"), "utf-8")) as {
|
||||
version: string;
|
||||
}
|
||||
).version;
|
||||
|
||||
/** Resolve hermes binary: `UWF_HERMES_BIN` override → default `"hermes"` via PATH. */
|
||||
function resolveHermesCommand(): string {
|
||||
const override = process.env.UWF_HERMES_BIN;
|
||||
return override !== undefined && override !== "" ? override : "hermes";
|
||||
}
|
||||
const PROTOCOL_VERSION = 1;
|
||||
|
||||
type JsonRpcResponse = {
|
||||
@@ -17,9 +31,17 @@ type PendingRequest = {
|
||||
reject: (reason: Error) => void;
|
||||
};
|
||||
|
||||
/** Token usage returned by ACP PromptResponse. */
|
||||
export type AcpUsage = {
|
||||
inputTokens: number;
|
||||
outputTokens: number;
|
||||
totalTokens: number;
|
||||
};
|
||||
|
||||
export type AcpPromptResult = {
|
||||
text: string;
|
||||
sessionId: string;
|
||||
usage: AcpUsage | null;
|
||||
};
|
||||
|
||||
export class HermesAcpClient {
|
||||
@@ -96,9 +118,25 @@ export class HermesAcpClient {
|
||||
);
|
||||
}
|
||||
|
||||
// Extract token usage from ACP PromptResponse.result.usage (camelCase wire format)
|
||||
const result = (response as { result?: Record<string, unknown> }).result;
|
||||
const rawUsage = result?.usage as Record<string, unknown> | undefined;
|
||||
const usage: AcpUsage | null =
|
||||
rawUsage !== undefined &&
|
||||
typeof rawUsage.inputTokens === "number" &&
|
||||
typeof rawUsage.outputTokens === "number" &&
|
||||
typeof rawUsage.totalTokens === "number"
|
||||
? {
|
||||
inputTokens: rawUsage.inputTokens,
|
||||
outputTokens: rawUsage.outputTokens,
|
||||
totalTokens: rawUsage.totalTokens,
|
||||
}
|
||||
: null;
|
||||
|
||||
return {
|
||||
text: this.messageChunks.join(""),
|
||||
sessionId: this.sessionId,
|
||||
usage,
|
||||
};
|
||||
}
|
||||
|
||||
@@ -237,7 +275,8 @@ export class HermesAcpClient {
|
||||
return;
|
||||
}
|
||||
|
||||
const child = spawn(HERMES_COMMAND, ["acp"], {
|
||||
const hermesCommand = resolveHermesCommand();
|
||||
const child = spawn(hermesCommand, ["acp"], {
|
||||
env: process.env,
|
||||
shell: false,
|
||||
stdio: ["pipe", "pipe", "pipe"],
|
||||
@@ -275,7 +314,7 @@ export class HermesAcpClient {
|
||||
private async initialize(): Promise<void> {
|
||||
const initResponse = await this.sendRequest("initialize", {
|
||||
protocolVersion: PROTOCOL_VERSION,
|
||||
clientInfo: { name: "uwf", version: "0.1.0" },
|
||||
clientInfo: { name: "uwf-hermes", version: OWN_VERSION },
|
||||
capabilities: {},
|
||||
});
|
||||
|
||||
|
||||
@@ -1,4 +1,11 @@
|
||||
#!/usr/bin/env node
|
||||
#!/usr/bin/env -S node --disable-warning=ExperimentalWarning
|
||||
|
||||
// eslint-disable-next-line -- dynamic import for version
|
||||
const pkg = await import("../package.json", { with: { type: "json" } });
|
||||
if (process.argv.includes("--version") || process.argv.includes("-V")) {
|
||||
process.stdout.write(`${pkg.default.version}\n`);
|
||||
process.exit(0);
|
||||
}
|
||||
|
||||
import { createHermesAgent } from "./hermes.js";
|
||||
import { isResumeDisabled } from "./session-cache.js";
|
||||
|
||||
@@ -5,10 +5,12 @@ import {
|
||||
type AgentContext,
|
||||
type AgentRunResult,
|
||||
buildContinuationPrompt,
|
||||
buildFrontmatterRetryPrompt,
|
||||
buildRolePrompt,
|
||||
buildThreadProgress,
|
||||
createAgent,
|
||||
} from "@united-workforce/util-agent";
|
||||
|
||||
import type { AcpUsage } from "./acp-client.js";
|
||||
import { HermesAcpClient } from "./acp-client.js";
|
||||
import { getCachedSessionId, setCachedSessionId } from "./session-cache.js";
|
||||
import { loadHermesSession, storeHermesSessionDetail } from "./session-detail.js";
|
||||
@@ -17,36 +19,37 @@ import type { HermesSessionJson } from "./types.js";
|
||||
const log = createLogger({ sink: { kind: "stderr" } });
|
||||
|
||||
/** Snapshot of session metrics taken before and after a prompt call. */
|
||||
type UsageSnapshot = {
|
||||
type TurnsSnapshot = {
|
||||
turns: number;
|
||||
inputTokens: number;
|
||||
outputTokens: number;
|
||||
};
|
||||
|
||||
const ZERO_SNAPSHOT: UsageSnapshot = { turns: 0, inputTokens: 0, outputTokens: 0 };
|
||||
const ZERO_TURNS: TurnsSnapshot = { turns: 0 };
|
||||
|
||||
/** Extract usage metrics from a session. Returns zeros for null sessions. */
|
||||
export function snapshotUsage(session: HermesSessionJson | null): UsageSnapshot {
|
||||
/** Extract assistant turn count from a session. Returns zero for null sessions. */
|
||||
export function snapshotTurns(session: HermesSessionJson | null): TurnsSnapshot {
|
||||
if (session === null) {
|
||||
return ZERO_SNAPSHOT;
|
||||
return ZERO_TURNS;
|
||||
}
|
||||
return {
|
||||
turns: session.messages.filter((m) => m.role === "assistant").length,
|
||||
inputTokens: session.inputTokens,
|
||||
outputTokens: session.outputTokens,
|
||||
};
|
||||
}
|
||||
|
||||
/** Compute the delta between two snapshots (after minus before). Floors at 0. */
|
||||
export function computeUsageDelta(
|
||||
before: UsageSnapshot,
|
||||
after: UsageSnapshot,
|
||||
/**
|
||||
* Build Usage from ACP token data + DB turn delta.
|
||||
* Tokens come from ACP PromptResponse (synchronous, accurate).
|
||||
* Turns come from DB before/after snapshots (may have WAL lag, but acceptable).
|
||||
*/
|
||||
export function buildUsage(
|
||||
acpUsage: AcpUsage | null,
|
||||
beforeTurns: TurnsSnapshot,
|
||||
afterTurns: TurnsSnapshot,
|
||||
durationSec: number,
|
||||
): Usage {
|
||||
return {
|
||||
turns: Math.max(0, after.turns - before.turns) || 1,
|
||||
inputTokens: Math.max(0, after.inputTokens - before.inputTokens),
|
||||
outputTokens: Math.max(0, after.outputTokens - before.outputTokens),
|
||||
turns: Math.max(0, afterTurns.turns - beforeTurns.turns) || 1,
|
||||
inputTokens: acpUsage?.inputTokens ?? 0,
|
||||
outputTokens: acpUsage?.outputTokens ?? 0,
|
||||
duration: Math.round(durationSec),
|
||||
};
|
||||
}
|
||||
@@ -59,6 +62,9 @@ export function buildHermesPrompt(ctx: AgentContext): string {
|
||||
parts.push(ctx.outputFormatInstruction, "");
|
||||
}
|
||||
|
||||
// Inject thread progress so the agent knows step count and role visit count
|
||||
parts.push(buildThreadProgress(ctx.steps, ctx.role), "");
|
||||
|
||||
if (!ctx.isFirstVisit) {
|
||||
// Re-entry: show only steps since last visit, meta only
|
||||
parts.push(buildContinuationPrompt(ctx.steps, ctx.role, ctx.edgePrompt));
|
||||
@@ -97,6 +103,8 @@ async function storePromptResult(store: Store, sessionId: string): Promise<{ det
|
||||
type PromptAttempt = {
|
||||
useContinuation: boolean;
|
||||
resumed: boolean;
|
||||
/** True when resuming after a frontmatter-only failure (isFirstVisit + cache hit). */
|
||||
frontmatterRetry: boolean;
|
||||
};
|
||||
|
||||
async function prepareSession(
|
||||
@@ -105,28 +113,36 @@ async function prepareSession(
|
||||
cwd: string,
|
||||
resumeDisabled: boolean,
|
||||
): Promise<PromptAttempt> {
|
||||
if (ctx.isFirstVisit || resumeDisabled) {
|
||||
if (resumeDisabled) {
|
||||
await client.connect(cwd);
|
||||
return { useContinuation: false, resumed: false };
|
||||
return { useContinuation: false, resumed: false, frontmatterRetry: false };
|
||||
}
|
||||
|
||||
// Check session cache regardless of isFirstVisit. A previous run may
|
||||
// have completed and cached its session but failed frontmatter
|
||||
// validation — the step never got written to CAS so isFirstVisit is
|
||||
// still true, yet we should resume the existing session.
|
||||
const cachedSessionId = await getCachedSessionId(ctx.threadId, ctx.role, ctx.storageRoot);
|
||||
if (cachedSessionId === null) {
|
||||
log("6RWK3N8Q", `no cached session for ${ctx.threadId}:${ctx.role}, starting new session`);
|
||||
await client.connect(cwd);
|
||||
return { useContinuation: false, resumed: false };
|
||||
return { useContinuation: false, resumed: false, frontmatterRetry: false };
|
||||
}
|
||||
|
||||
try {
|
||||
await client.resume(cachedSessionId, cwd);
|
||||
log("9MHT4V2P", `resumed hermes session ${cachedSessionId} for ${ctx.threadId}:${ctx.role}`);
|
||||
return { useContinuation: true, resumed: true };
|
||||
return {
|
||||
useContinuation: !ctx.isFirstVisit,
|
||||
resumed: true,
|
||||
frontmatterRetry: ctx.isFirstVisit,
|
||||
};
|
||||
} catch (error) {
|
||||
const message = error instanceof Error ? error.message : String(error);
|
||||
log("3XPN7K4W", `session resume failed, falling back to new session: ${message}`);
|
||||
await client.close();
|
||||
await client.connect(cwd);
|
||||
return { useContinuation: false, resumed: false };
|
||||
return { useContinuation: false, resumed: false, frontmatterRetry: false };
|
||||
}
|
||||
}
|
||||
|
||||
@@ -148,12 +164,15 @@ export function createHermesAgent(resumeDisabled: boolean): () => Promise<void>
|
||||
async function runPrompt(
|
||||
ctx: AgentContext,
|
||||
useContinuation: boolean,
|
||||
beforeSnapshot: UsageSnapshot,
|
||||
beforeTurns: TurnsSnapshot,
|
||||
frontmatterRetry: boolean,
|
||||
): Promise<AgentRunResult> {
|
||||
const effectiveCtx = useContinuation ? ctx : { ...ctx, isFirstVisit: true };
|
||||
const fullPrompt = buildHermesPrompt(effectiveCtx);
|
||||
// Frontmatter retry: session has full context, just re-output the format.
|
||||
const fullPrompt = frontmatterRetry
|
||||
? buildFrontmatterRetryPrompt(ctx.outputFormatInstruction)
|
||||
: buildHermesPrompt(useContinuation ? ctx : { ...ctx, isFirstVisit: true });
|
||||
const startMs = Date.now();
|
||||
const { text, sessionId } = await client.prompt(fullPrompt);
|
||||
const { text, sessionId, usage: acpUsage } = await client.prompt(fullPrompt);
|
||||
const durationSec = (Date.now() - startMs) / 1000;
|
||||
const { detailHash } = await storePromptResult(ctx.store, sessionId);
|
||||
|
||||
@@ -161,9 +180,10 @@ export function createHermesAgent(resumeDisabled: boolean): () => Promise<void>
|
||||
await setCachedSessionId(ctx.threadId, ctx.role, sessionId, ctx.storageRoot);
|
||||
}
|
||||
|
||||
// Turns from DB (may lag slightly due to WAL, but acceptable)
|
||||
const afterSession = await loadHermesSession(sessionId);
|
||||
const afterSnapshot = snapshotUsage(afterSession);
|
||||
const usage = computeUsageDelta(beforeSnapshot, afterSnapshot, durationSec);
|
||||
const afterTurns = snapshotTurns(afterSession);
|
||||
const usage = buildUsage(acpUsage, beforeTurns, afterTurns, durationSec);
|
||||
|
||||
return { output: text, detailHash, sessionId, assembledPrompt: fullPrompt, usage };
|
||||
}
|
||||
@@ -173,16 +193,16 @@ export function createHermesAgent(resumeDisabled: boolean): () => Promise<void>
|
||||
const attempt = await prepareSession(client, ctx, cwd, resumeDisabled);
|
||||
|
||||
// Snapshot before prompt: for resumed sessions, captures cumulative state
|
||||
// so we can compute the delta. For new sessions, this is ZERO_SNAPSHOT.
|
||||
// so we can compute the turn delta. For new sessions, this is ZERO_TURNS.
|
||||
const currentSessionId = client.getSessionId();
|
||||
const beforeSession =
|
||||
attempt.resumed && currentSessionId !== null
|
||||
? await loadHermesSession(currentSessionId)
|
||||
: null;
|
||||
const beforeSnapshot = snapshotUsage(beforeSession);
|
||||
const beforeTurns = snapshotTurns(beforeSession);
|
||||
|
||||
try {
|
||||
return await runPrompt(ctx, attempt.useContinuation, beforeSnapshot);
|
||||
return await runPrompt(ctx, attempt.useContinuation, beforeTurns, attempt.frontmatterRetry);
|
||||
} catch (error) {
|
||||
if (!attempt.resumed) {
|
||||
throw error;
|
||||
@@ -193,7 +213,7 @@ export function createHermesAgent(resumeDisabled: boolean): () => Promise<void>
|
||||
await client.close();
|
||||
await client.connect(cwd);
|
||||
// Fresh session after retry — reset snapshot to zero
|
||||
return runPrompt(ctx, false, ZERO_SNAPSHOT);
|
||||
return runPrompt(ctx, false, ZERO_TURNS, false);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -204,20 +224,20 @@ export function createHermesAgent(resumeDisabled: boolean): () => Promise<void>
|
||||
): Promise<AgentRunResult> {
|
||||
// Client is already connected from runHermes — same ACP session,
|
||||
// so the agent sees the full conversation history (crucial for retries).
|
||||
// Snapshot before the continuation prompt for delta computation.
|
||||
// Snapshot turns before the continuation prompt for delta computation.
|
||||
const currentSessionId = client.getSessionId();
|
||||
const beforeSession =
|
||||
currentSessionId !== null ? await loadHermesSession(currentSessionId) : null;
|
||||
const beforeSnapshot = snapshotUsage(beforeSession);
|
||||
const beforeTurns = snapshotTurns(beforeSession);
|
||||
|
||||
const startMs = Date.now();
|
||||
const { text, sessionId } = await client.prompt(message);
|
||||
const { text, sessionId, usage: acpUsage } = await client.prompt(message);
|
||||
const durationSec = (Date.now() - startMs) / 1000;
|
||||
const { detailHash } = await storePromptResult(store, sessionId);
|
||||
|
||||
const afterSession = await loadHermesSession(sessionId);
|
||||
const afterSnapshot = snapshotUsage(afterSession);
|
||||
const usage = computeUsageDelta(beforeSnapshot, afterSnapshot, durationSec);
|
||||
const afterTurns = snapshotTurns(afterSession);
|
||||
const usage = buildUsage(acpUsage, beforeTurns, afterTurns, durationSec);
|
||||
|
||||
return { output: text, detailHash, sessionId, assembledPrompt: "", usage };
|
||||
}
|
||||
|
||||
@@ -1,7 +1,8 @@
|
||||
export type { AcpUsage } from "./acp-client.js";
|
||||
export { HermesAcpClient } from "./acp-client.js";
|
||||
export {
|
||||
buildHermesPrompt,
|
||||
computeUsageDelta,
|
||||
buildUsage,
|
||||
createHermesAgent,
|
||||
snapshotUsage,
|
||||
snapshotTurns,
|
||||
} from "./hermes.js";
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@united-workforce/agent-mock",
|
||||
"version": "0.1.0",
|
||||
"version": "0.1.2",
|
||||
"files": [
|
||||
"src",
|
||||
"dist",
|
||||
@@ -21,7 +21,7 @@
|
||||
"test:ci": "vitest run __tests__/"
|
||||
},
|
||||
"dependencies": {
|
||||
"@ocas/core": "^0.3.0",
|
||||
"@ocas/core": "^0.4.0",
|
||||
"@united-workforce/protocol": "workspace:^",
|
||||
"@united-workforce/util": "workspace:^",
|
||||
"@united-workforce/util-agent": "workspace:^",
|
||||
|
||||
@@ -1,4 +1,11 @@
|
||||
#!/usr/bin/env node
|
||||
#!/usr/bin/env -S node --disable-warning=ExperimentalWarning
|
||||
|
||||
// eslint-disable-next-line -- dynamic import for version
|
||||
const pkg = await import("../package.json", { with: { type: "json" } });
|
||||
if (process.argv.includes("--version") || process.argv.includes("-V")) {
|
||||
process.stdout.write(`${pkg.default.version}\n`);
|
||||
process.exit(0);
|
||||
}
|
||||
|
||||
import { createMockAgent } from "./mock-agent.js";
|
||||
|
||||
|
||||
@@ -0,0 +1,9 @@
|
||||
# @united-workforce/cli
|
||||
|
||||
## 0.1.1
|
||||
|
||||
### Patch Changes
|
||||
|
||||
- 850a3b2: fix: resolve --agent override via config alias before raw command
|
||||
|
||||
`resolveAgentConfig()` now checks `config.agents[alias]` first before falling back to `parseAgentOverride()`. Eval CLI default `--agent` changed from `"hermes"` to `"uwf-hermes"`.
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@united-workforce/cli",
|
||||
"version": "0.1.0",
|
||||
"version": "0.3.0",
|
||||
"files": [
|
||||
"src",
|
||||
"dist",
|
||||
@@ -11,8 +11,8 @@
|
||||
"uwf": "./dist/cli.js"
|
||||
},
|
||||
"dependencies": {
|
||||
"@ocas/core": "^0.3.0",
|
||||
"@ocas/fs": "^0.3.0",
|
||||
"@ocas/core": "^0.4.0",
|
||||
"@ocas/fs": "^0.4.0",
|
||||
"@united-workforce/protocol": "workspace:^",
|
||||
"@united-workforce/util": "workspace:^",
|
||||
"@united-workforce/util-agent": "workspace:^",
|
||||
|
||||
@@ -58,7 +58,10 @@ describe("C1: adapter JSON round-trip integration", () => {
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "Do the work", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "Do the work", location: null },
|
||||
resume: { role: "worker", prompt: "Resume the work", location: null },
|
||||
},
|
||||
worker: { done: { role: "$END", prompt: "completed", location: null } },
|
||||
},
|
||||
});
|
||||
|
||||
@@ -28,9 +28,13 @@ roles:
|
||||
$status: "ready"
|
||||
frontmatter:
|
||||
type: object
|
||||
oneOf:
|
||||
- properties:
|
||||
$status: { const: "ready" }
|
||||
required: ["$status"]
|
||||
- properties:
|
||||
$status: { const: "not-ready" }
|
||||
required: ["$status"]
|
||||
properties:
|
||||
$status: { type: string, enum: ["ready", "not-ready"] }
|
||||
roleB:
|
||||
description: Second role
|
||||
goal: Do B
|
||||
@@ -42,13 +46,17 @@ roles:
|
||||
type: object
|
||||
required: ["$status"]
|
||||
properties:
|
||||
$status: { type: string, enum: ["done"] }
|
||||
$status: { const: "done" }
|
||||
graph:
|
||||
$START:
|
||||
_:
|
||||
new:
|
||||
role: roleA
|
||||
prompt: "Do A"
|
||||
location: null
|
||||
resume:
|
||||
role: roleA
|
||||
prompt: "Resume A"
|
||||
location: null
|
||||
roleA:
|
||||
ready:
|
||||
role: roleB
|
||||
@@ -78,9 +86,13 @@ roles:
|
||||
$status: "pass"
|
||||
frontmatter:
|
||||
type: object
|
||||
oneOf:
|
||||
- properties:
|
||||
$status: { const: "pass" }
|
||||
required: ["$status"]
|
||||
- properties:
|
||||
$status: { const: "fail" }
|
||||
required: ["$status"]
|
||||
properties:
|
||||
$status: { type: string, enum: ["pass", "fail"] }
|
||||
roleB:
|
||||
description: Pass role
|
||||
goal: Do B
|
||||
@@ -92,7 +104,7 @@ roles:
|
||||
type: object
|
||||
required: ["$status"]
|
||||
properties:
|
||||
$status: { type: string, enum: ["done"] }
|
||||
$status: { const: "done" }
|
||||
roleC:
|
||||
description: Fail role
|
||||
goal: Do C
|
||||
@@ -104,13 +116,17 @@ roles:
|
||||
type: object
|
||||
required: ["$status"]
|
||||
properties:
|
||||
$status: { type: string, enum: ["done"] }
|
||||
$status: { const: "done" }
|
||||
graph:
|
||||
$START:
|
||||
_:
|
||||
new:
|
||||
role: roleA
|
||||
prompt: "Do A"
|
||||
location: null
|
||||
resume:
|
||||
role: roleA
|
||||
prompt: "Resume A"
|
||||
location: null
|
||||
roleA:
|
||||
pass:
|
||||
role: roleB
|
||||
@@ -147,13 +163,17 @@ roles:
|
||||
type: object
|
||||
required: ["$status"]
|
||||
properties:
|
||||
$status: { type: string, enum: ["done"] }
|
||||
$status: { const: "done" }
|
||||
graph:
|
||||
$START:
|
||||
_:
|
||||
new:
|
||||
role: worker
|
||||
prompt: "Work"
|
||||
location: null
|
||||
resume:
|
||||
role: worker
|
||||
prompt: "Resume work"
|
||||
location: null
|
||||
worker:
|
||||
done:
|
||||
role: $END
|
||||
|
||||
@@ -36,7 +36,8 @@ roles:
|
||||
required: [$status]
|
||||
graph:
|
||||
$START:
|
||||
_: { role: analyst, prompt: 'Analyze the task' }
|
||||
new: { role: analyst, prompt: 'Analyze the task' }
|
||||
resume: { role: analyst, prompt: 'Review the previous run output and continue the work.' }
|
||||
analyst:
|
||||
analyzed: { role: developer, prompt: 'Implement the change' }
|
||||
developer:
|
||||
|
||||
@@ -25,7 +25,8 @@ roles:
|
||||
required: [$status]
|
||||
graph:
|
||||
$START:
|
||||
_: { role: planner, prompt: 'Plan the task' }
|
||||
new: { role: planner, prompt: 'Plan the task' }
|
||||
resume: { role: planner, prompt: 'Review the previous run output and continue the work.' }
|
||||
planner:
|
||||
ready: { role: worker, prompt: 'Do the work' }
|
||||
worker:
|
||||
|
||||
@@ -28,7 +28,8 @@ roles:
|
||||
required: [$status]
|
||||
graph:
|
||||
$START:
|
||||
_: { role: developer, prompt: 'Implement the change' }
|
||||
new: { role: developer, prompt: 'Implement the change' }
|
||||
resume: { role: developer, prompt: 'Review the previous run output and continue the work.' }
|
||||
developer:
|
||||
review_needed: { role: reviewer, prompt: 'Review the change' }
|
||||
reviewer:
|
||||
|
||||
@@ -27,7 +27,8 @@ roles:
|
||||
required: [$status]
|
||||
graph:
|
||||
$START:
|
||||
_: { role: planner, prompt: 'Plan the task' }
|
||||
new: { role: planner, prompt: 'Plan the task' }
|
||||
resume: { role: planner, prompt: 'Review the previous run output and continue the work.' }
|
||||
planner:
|
||||
ready: { role: worker, prompt: 'Work on branch {{{branch}}} in {{{repoPath}}}' }
|
||||
worker:
|
||||
|
||||
@@ -18,7 +18,8 @@ roles:
|
||||
required: [$status]
|
||||
graph:
|
||||
$START:
|
||||
_: { role: planner, prompt: 'Analyze the task' }
|
||||
new: { role: planner, prompt: 'Analyze the task' }
|
||||
resume: { role: planner, prompt: 'Review the previous run output and continue the work.' }
|
||||
planner:
|
||||
insufficient_info: { role: '$SUSPEND', prompt: 'Need more info: {{{reason}}}' }
|
||||
ready: { role: '$END', prompt: 'Done' }
|
||||
|
||||
@@ -5,7 +5,12 @@ import { evaluate } from "../moderator/evaluate.js";
|
||||
|
||||
const solveIssueGraph: WorkflowPayload["graph"] = {
|
||||
$START: {
|
||||
_: { role: "planner", prompt: "Start planning from the issue in the task.", location: null },
|
||||
new: { role: "planner", prompt: "Start planning from the issue in the task.", location: null },
|
||||
resume: {
|
||||
role: "planner",
|
||||
prompt: "Review the previous run output and continue the work.",
|
||||
location: null,
|
||||
},
|
||||
},
|
||||
planner: {
|
||||
planned: { role: "developer", prompt: "Implement the plan: {{plan}}", location: null },
|
||||
@@ -20,8 +25,8 @@ const solveIssueGraph: WorkflowPayload["graph"] = {
|
||||
};
|
||||
|
||||
describe("evaluate", () => {
|
||||
test("$START → first role (unit status _)", () => {
|
||||
const result = evaluate(solveIssueGraph, "$START", { $status: "_" });
|
||||
test("$START → first role (status new)", () => {
|
||||
const result = evaluate(solveIssueGraph, "$START", { $status: "new" });
|
||||
expect(result).toEqual({
|
||||
ok: true,
|
||||
value: {
|
||||
@@ -32,6 +37,18 @@ describe("evaluate", () => {
|
||||
});
|
||||
});
|
||||
|
||||
test("$START → first role (status resume)", () => {
|
||||
const result = evaluate(solveIssueGraph, "$START", { $status: "resume" });
|
||||
expect(result).toEqual({
|
||||
ok: true,
|
||||
value: {
|
||||
role: "planner",
|
||||
prompt: "Review the previous run output and continue the work.",
|
||||
location: null,
|
||||
},
|
||||
});
|
||||
});
|
||||
|
||||
test("status-based routing (reviewer rejected → developer)", () => {
|
||||
const result = evaluate(solveIssueGraph, "reviewer", {
|
||||
$status: "rejected",
|
||||
@@ -95,7 +112,7 @@ describe("evaluate", () => {
|
||||
});
|
||||
|
||||
test("missing role in graph → error", () => {
|
||||
const result = evaluate(solveIssueGraph, "unknown-role", { $status: "_" });
|
||||
const result = evaluate(solveIssueGraph, "unknown-role", { $status: "new" });
|
||||
expect(result.ok).toBe(false);
|
||||
if (!result.ok) {
|
||||
expect(result.error.message).toBe('no transitions defined for role "unknown-role"');
|
||||
|
||||
@@ -9,31 +9,25 @@ import {
|
||||
cmdPromptAdapterDeveloping,
|
||||
cmdPromptBootstrap,
|
||||
cmdPromptList,
|
||||
cmdPromptSetup,
|
||||
cmdPromptUsage,
|
||||
cmdPromptUsageReference,
|
||||
cmdPromptWorkflowAuthoring,
|
||||
} from "../commands/prompt.js";
|
||||
|
||||
describe("prompt commands", () => {
|
||||
test("prompt list returns new prompt names", () => {
|
||||
test("prompt list returns prompt names (no bootstrap)", () => {
|
||||
const result = cmdPromptList();
|
||||
expect(result).toBeInstanceOf(Array);
|
||||
expect(result).toContain("usage");
|
||||
expect(result).toContain("workflow-authoring");
|
||||
expect(result).toContain("adapter-developing");
|
||||
expect(result).toContain("bootstrap");
|
||||
expect(result).not.toContain("user");
|
||||
expect(result).not.toContain("author");
|
||||
expect(result).not.toContain("developer");
|
||||
expect(result).not.toContain("adapter");
|
||||
expect(result).not.toContain("bootstrap");
|
||||
for (const name of result) {
|
||||
expect(name).toMatch(/^\S+$/);
|
||||
}
|
||||
});
|
||||
|
||||
test("prompt usage-reference returns non-empty markdown string with frontmatter", () => {
|
||||
const result = cmdPromptUsageReference();
|
||||
test("prompt usage returns only the usage reference with frontmatter", () => {
|
||||
const result = cmdPromptUsage();
|
||||
expect(typeof result).toBe("string");
|
||||
expect(result).toContain("uwf");
|
||||
expect(result).toContain("thread");
|
||||
@@ -42,6 +36,9 @@ describe("prompt commands", () => {
|
||||
expect(result).toContain("---");
|
||||
expect(result).toContain("name:");
|
||||
expect(result).toContain("version:");
|
||||
// Should NOT contain other references
|
||||
expect(result).not.toContain("Workflow Authoring Reference");
|
||||
expect(result).not.toContain("Adapter Developing Reference");
|
||||
expect(result.length).toBeGreaterThan(500);
|
||||
});
|
||||
|
||||
@@ -71,44 +68,29 @@ describe("prompt commands", () => {
|
||||
expect(result.length).toBeGreaterThan(500);
|
||||
});
|
||||
|
||||
test("prompt bootstrap returns non-empty skill with frontmatter", () => {
|
||||
test("prompt bootstrap returns framework-agnostic setup instructions", () => {
|
||||
const result = cmdPromptBootstrap();
|
||||
expect(typeof result).toBe("string");
|
||||
expect(result).toContain("uwf");
|
||||
expect(result).toContain("---");
|
||||
expect(result.length).toBeGreaterThan(100);
|
||||
});
|
||||
|
||||
test("prompt usage combines remaining references (no developer)", () => {
|
||||
const result = cmdPromptUsage();
|
||||
expect(typeof result).toBe("string");
|
||||
expect(result).toContain("Usage Reference");
|
||||
expect(result).toContain("Workflow Authoring Reference");
|
||||
expect(result).toContain("Adapter Developing Reference");
|
||||
expect(result).not.toContain("Developer Reference");
|
||||
expect(result).toContain("---");
|
||||
expect(result.length).toBeGreaterThan(2000);
|
||||
});
|
||||
|
||||
test("prompt setup returns simplified setup instructions", () => {
|
||||
const result = cmdPromptSetup();
|
||||
expect(typeof result).toBe("string");
|
||||
expect(result).toContain("uwf Skill Setup");
|
||||
expect(result).toContain("uwf prompt bootstrap");
|
||||
expect(result).toContain("SKILL.md");
|
||||
expect(result).toContain("version");
|
||||
expect(result).not.toMatch(/\bbun (install|run|test|changeset|version|release)\b/);
|
||||
});
|
||||
|
||||
test("prompt setup references new subcommand names", () => {
|
||||
const result = cmdPromptSetup();
|
||||
// Skills installation
|
||||
expect(result).toContain("uwf prompt usage");
|
||||
expect(result).toContain("uwf prompt workflow-authoring");
|
||||
expect(result).toContain("uwf prompt adapter-developing");
|
||||
expect(result).not.toContain("uwf prompt user");
|
||||
expect(result).not.toContain("uwf prompt author");
|
||||
expect(result).not.toContain("uwf prompt developer");
|
||||
expect(result).not.toMatch(/uwf prompt adapter\b(?!-developing)/);
|
||||
expect(result).toContain("uwf-usage");
|
||||
expect(result).toContain("uwf-workflow-authoring");
|
||||
expect(result).toContain("uwf-adapter-developing");
|
||||
// Fresh install scenario
|
||||
expect(result).toContain("Fresh Install");
|
||||
expect(result).toContain("uwf setup");
|
||||
expect(result).toContain("--provider");
|
||||
expect(result).toContain("--api-key");
|
||||
expect(result).toContain("agent adapter");
|
||||
// Upgrade scenario
|
||||
expect(result).toContain("Upgrade");
|
||||
expect(result).toContain("Migrate");
|
||||
// Should NOT contain Hermes-specific paths
|
||||
expect(result).not.toContain("~/.hermes/skills/");
|
||||
expect(result).not.toContain("> ~/.hermes/");
|
||||
expect(result.length).toBeGreaterThan(100);
|
||||
});
|
||||
|
||||
test("prompt help subcommand is suppressed", { timeout: 30_000 }, () => {
|
||||
@@ -119,11 +101,12 @@ describe("prompt commands", () => {
|
||||
});
|
||||
expect(output).not.toMatch(/help\s+\[command\]/i);
|
||||
expect(output).toContain("usage");
|
||||
expect(output).toContain("setup");
|
||||
expect(output).toContain("bootstrap");
|
||||
expect(output).toContain("workflow-authoring");
|
||||
expect(output).toContain("adapter-developing");
|
||||
expect(output).toContain("bootstrap");
|
||||
expect(output).toContain("list");
|
||||
expect(output).not.toContain("developer");
|
||||
// Removed subcommands should not appear as command names
|
||||
expect(output).not.toMatch(/^\s+setup\s/m);
|
||||
expect(output).not.toContain("usage-reference");
|
||||
});
|
||||
});
|
||||
|
||||
@@ -21,11 +21,11 @@ describe("solve-issue workflow: Gitea API PR creation", () => {
|
||||
"..",
|
||||
"..",
|
||||
"..",
|
||||
".workflows",
|
||||
"examples",
|
||||
"solve-issue.yaml",
|
||||
);
|
||||
|
||||
test("committer procedure should use curl API instead of tea pr create", async () => {
|
||||
test("committer procedure should create PR via tea pr create", async () => {
|
||||
const yamlContent = await readFile(workflowPath, "utf-8");
|
||||
const workflow = parse(yamlContent) as WorkflowPayload;
|
||||
|
||||
@@ -33,25 +33,22 @@ describe("solve-issue workflow: Gitea API PR creation", () => {
|
||||
const committerProcedure = workflow.roles.committer?.procedure;
|
||||
expect(committerProcedure).toBeDefined();
|
||||
|
||||
// Verify the procedure uses curl API, not tea pr create
|
||||
expect(committerProcedure).toContain("curl");
|
||||
expect(committerProcedure).toContain("api/v1/repos");
|
||||
expect(committerProcedure).toContain("/pulls");
|
||||
|
||||
// Verify it explicitly warns against tea pr create
|
||||
expect(committerProcedure).toMatch(/do NOT use.*tea pr create/i);
|
||||
// Verify the procedure uses tea pr create for PR creation
|
||||
expect(committerProcedure).toContain("tea pr create");
|
||||
expect(committerProcedure).toContain("git push");
|
||||
expect(committerProcedure).toContain("Fixes #N");
|
||||
});
|
||||
|
||||
test("committer procedure should reference repoRemote from task prompt", async () => {
|
||||
test("committer procedure should extract owner/repo from git remote", async () => {
|
||||
const yamlContent = await readFile(workflowPath, "utf-8");
|
||||
const workflow = parse(yamlContent) as WorkflowPayload;
|
||||
|
||||
const committerProcedure = workflow.roles.committer?.procedure;
|
||||
expect(committerProcedure).toBeDefined();
|
||||
|
||||
// Verify the procedure mentions repoRemote is provided in task prompt
|
||||
expect(committerProcedure).toMatch(/repo remote.*provided.*task prompt/i);
|
||||
expect(committerProcedure).toMatch(/owner\/repo/i);
|
||||
// Verify the procedure extracts owner/repo from remote
|
||||
expect(committerProcedure).toContain("git remote get-url origin");
|
||||
expect(committerProcedure).toContain("hook_failed");
|
||||
});
|
||||
|
||||
test("committer procedure should include error handling for curl failures", async () => {
|
||||
@@ -100,45 +97,42 @@ describe("solve-issue workflow: Gitea API PR creation", () => {
|
||||
expect(committedVariant.required).toContain("$status");
|
||||
});
|
||||
|
||||
test("developer procedure should include mandatory verification step", async () => {
|
||||
test("developer procedure should include worktree setup", async () => {
|
||||
const yamlContent = await readFile(workflowPath, "utf-8");
|
||||
const workflow = parse(yamlContent) as WorkflowPayload;
|
||||
|
||||
const developerProcedure = workflow.roles.developer?.procedure;
|
||||
expect(developerProcedure).toBeDefined();
|
||||
|
||||
// Verify the procedure includes mandatory verification step
|
||||
expect(developerProcedure).toContain("MANDATORY VERIFICATION");
|
||||
expect(developerProcedure).toContain("git branch --show-current");
|
||||
expect(developerProcedure).toContain("git status");
|
||||
expect(developerProcedure).toMatch(/ls -la|verify.*exist/i);
|
||||
// Verify the procedure includes worktree setup
|
||||
expect(developerProcedure).toContain("IMPORTANT");
|
||||
expect(developerProcedure).toContain("git worktree add");
|
||||
expect(developerProcedure).toContain("pnpm install");
|
||||
});
|
||||
|
||||
test("reviewer procedure should enforce worktree path verification", async () => {
|
||||
test("reviewer procedure should verify branch and run checks", async () => {
|
||||
const yamlContent = await readFile(workflowPath, "utf-8");
|
||||
const workflow = parse(yamlContent) as WorkflowPayload;
|
||||
|
||||
const reviewerProcedure = workflow.roles.reviewer?.procedure;
|
||||
expect(reviewerProcedure).toBeDefined();
|
||||
|
||||
// Verify the procedure includes critical enforcement
|
||||
expect(reviewerProcedure).toContain("CRITICAL");
|
||||
expect(reviewerProcedure).toMatch(/cd.*pwd/);
|
||||
expect(reviewerProcedure).toContain(
|
||||
"Do NOT report results without running the actual commands",
|
||||
);
|
||||
// Verify the procedure includes branch verification and build checks
|
||||
expect(reviewerProcedure).toContain("git branch --show-current");
|
||||
expect(reviewerProcedure).toContain("pnpm run build");
|
||||
expect(reviewerProcedure).toContain("pnpm run check");
|
||||
});
|
||||
|
||||
test("developer procedure should include test debugging escalation", async () => {
|
||||
test("developer procedure should include changeset and failure handling", async () => {
|
||||
const yamlContent = await readFile(workflowPath, "utf-8");
|
||||
const workflow = parse(yamlContent) as WorkflowPayload;
|
||||
|
||||
const developerProcedure = workflow.roles.developer?.procedure;
|
||||
expect(developerProcedure).toBeDefined();
|
||||
|
||||
// Verify the procedure includes test failure guidance
|
||||
expect(developerProcedure).toMatch(/tests fail.*first run/i);
|
||||
expect(developerProcedure).toMatch(/3 test cycles|after 3 attempts/i);
|
||||
// Verify the procedure includes changeset requirement and failure path
|
||||
expect(developerProcedure).toContain(".changeset/");
|
||||
expect(developerProcedure).toContain("$status=failed");
|
||||
expect(developerProcedure).toContain("pnpm test");
|
||||
});
|
||||
});
|
||||
|
||||
@@ -253,7 +253,10 @@ describe("thread read timing", () => {
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "go", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "go", location: null },
|
||||
resume: { role: "worker", prompt: "resume", location: null },
|
||||
},
|
||||
worker: { done: { role: "$END", prompt: "", location: null } },
|
||||
},
|
||||
});
|
||||
@@ -319,7 +322,10 @@ describe("thread read timing", () => {
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "go", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "go", location: null },
|
||||
resume: { role: "worker", prompt: "resume", location: null },
|
||||
},
|
||||
worker: { done: { role: "$END", prompt: "", location: null } },
|
||||
},
|
||||
});
|
||||
|
||||
@@ -54,13 +54,17 @@ roles:
|
||||
type: object
|
||||
required: ["$status"]
|
||||
properties:
|
||||
$status: { type: string, enum: ["ready"] }
|
||||
$status: { const: "ready" }
|
||||
graph:
|
||||
$START:
|
||||
_:
|
||||
new:
|
||||
role: planner
|
||||
prompt: "Plan the work"
|
||||
location: null
|
||||
resume:
|
||||
role: planner
|
||||
prompt: "Resume the work"
|
||||
location: null
|
||||
planner:
|
||||
ready:
|
||||
role: $END
|
||||
@@ -110,13 +114,17 @@ roles:
|
||||
type: object
|
||||
required: ["$status"]
|
||||
properties:
|
||||
$status: { type: string, enum: ["ready"] }
|
||||
$status: { const: "ready" }
|
||||
graph:
|
||||
$START:
|
||||
_:
|
||||
new:
|
||||
role: planner
|
||||
prompt: "Plan"
|
||||
location: null
|
||||
resume:
|
||||
role: planner
|
||||
prompt: "Resume"
|
||||
location: null
|
||||
planner:
|
||||
ready:
|
||||
role: $END
|
||||
@@ -153,13 +161,17 @@ roles:
|
||||
type: object
|
||||
required: ["$status"]
|
||||
properties:
|
||||
$status: { type: string, enum: ["ready"] }
|
||||
$status: { const: "ready" }
|
||||
graph:
|
||||
$START:
|
||||
_:
|
||||
new:
|
||||
role: planner
|
||||
prompt: "Plan"
|
||||
location: null
|
||||
resume:
|
||||
role: planner
|
||||
prompt: "Resume"
|
||||
location: null
|
||||
planner:
|
||||
ready:
|
||||
role: $END
|
||||
|
||||
@@ -70,7 +70,10 @@ async function setupSuspendedThread(mode: MockAgentMode): Promise<{
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "Start work", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "Start work", location: null },
|
||||
resume: { role: "worker", prompt: "Resume the work", location: null },
|
||||
},
|
||||
worker: {
|
||||
needs_input: {
|
||||
role: "$SUSPEND",
|
||||
@@ -233,7 +236,10 @@ describe("uwf thread resume", () => {
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "Start", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "Start", location: null },
|
||||
resume: { role: "worker", prompt: "Resume", location: null },
|
||||
},
|
||||
worker: { done: { role: "$END", prompt: "Done", location: null } },
|
||||
},
|
||||
});
|
||||
@@ -479,7 +485,10 @@ describe("uwf thread resume - completed threads", () => {
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "Start work", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "Start work", location: null },
|
||||
resume: { role: "worker", prompt: "Resume the work", location: null },
|
||||
},
|
||||
worker: { done: { role: "reviewer", prompt: "Review the work", location: null } },
|
||||
reviewer: { done: { role: "$END", prompt: "Done", location: null } },
|
||||
},
|
||||
@@ -610,7 +619,7 @@ echo '${adapterJson}'
|
||||
expect(cliOutput.done).toBe(false);
|
||||
|
||||
const capturedPrompt = await readFile(promptCapturePath, "utf8");
|
||||
expect(capturedPrompt).toContain("Previous run completed");
|
||||
expect(capturedPrompt).toContain("Resume the work");
|
||||
expect(capturedPrompt).toContain("Additional context");
|
||||
|
||||
const storeModule = await import("../store.js");
|
||||
@@ -640,7 +649,10 @@ echo '${adapterJson}'
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "Start", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "Start", location: null },
|
||||
resume: { role: "worker", prompt: "Resume", location: null },
|
||||
},
|
||||
worker: { done: { role: "$END", prompt: "Done", location: null } },
|
||||
},
|
||||
});
|
||||
@@ -688,7 +700,10 @@ echo '${adapterJson}'
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "Start", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "Start", location: null },
|
||||
resume: { role: "worker", prompt: "Resume", location: null },
|
||||
},
|
||||
worker: { done: { role: "$END", prompt: "Done", location: null } },
|
||||
},
|
||||
});
|
||||
|
||||
@@ -31,13 +31,17 @@ roles:
|
||||
type: object
|
||||
required: ["$status"]
|
||||
properties:
|
||||
$status: { type: string, enum: ["ready"] }
|
||||
$status: { const: "ready" }
|
||||
graph:
|
||||
$START:
|
||||
_:
|
||||
new:
|
||||
role: planner
|
||||
prompt: "Plan the work"
|
||||
location: null
|
||||
resume:
|
||||
role: planner
|
||||
prompt: "Resume the work"
|
||||
location: null
|
||||
planner:
|
||||
ready:
|
||||
role: $END
|
||||
@@ -66,10 +70,14 @@ roles:
|
||||
question: { type: string }
|
||||
graph:
|
||||
$START:
|
||||
_:
|
||||
new:
|
||||
role: worker
|
||||
prompt: "Start work"
|
||||
location: null
|
||||
resume:
|
||||
role: worker
|
||||
prompt: "Resume work"
|
||||
location: null
|
||||
worker:
|
||||
needs_input:
|
||||
role: $SUSPEND
|
||||
|
||||
@@ -54,13 +54,17 @@ roles:
|
||||
type: object
|
||||
required: ["$status"]
|
||||
properties:
|
||||
$status: { type: string, enum: ["ready"] }
|
||||
$status: { const: "ready" }
|
||||
graph:
|
||||
$START:
|
||||
_:
|
||||
new:
|
||||
role: planner
|
||||
prompt: "Plan the work"
|
||||
location: null
|
||||
resume:
|
||||
role: planner
|
||||
prompt: "Resume the work"
|
||||
location: null
|
||||
planner:
|
||||
ready:
|
||||
role: $END
|
||||
|
||||
@@ -58,7 +58,10 @@ describe("suspend step CAS chain and threads.yaml metadata", () => {
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "Start work", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "Start work", location: null },
|
||||
resume: { role: "worker", prompt: "Resume work", location: null },
|
||||
},
|
||||
worker: {
|
||||
needs_input: {
|
||||
role: "$SUSPEND",
|
||||
|
||||
@@ -55,7 +55,10 @@ describe("suspended thread display", () => {
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "Start work", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "Start work", location: null },
|
||||
resume: { role: "worker", prompt: "Resume work", location: null },
|
||||
},
|
||||
worker: {
|
||||
needs_input: {
|
||||
role: "$SUSPEND",
|
||||
@@ -162,7 +165,10 @@ describe("suspended thread display", () => {
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "Start work", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "Start work", location: null },
|
||||
resume: { role: "worker", prompt: "Resume work", location: null },
|
||||
},
|
||||
worker: {
|
||||
needs_input: {
|
||||
role: "$SUSPEND",
|
||||
@@ -248,7 +254,10 @@ describe("suspended thread display", () => {
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "Start work", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "Start work", location: null },
|
||||
resume: { role: "worker", prompt: "Resume work", location: null },
|
||||
},
|
||||
},
|
||||
});
|
||||
|
||||
|
||||
@@ -17,7 +17,7 @@ function makeWorkflow(overrides?: Partial<WorkflowPayload>): WorkflowPayload {
|
||||
frontmatter: {
|
||||
type: "object",
|
||||
properties: {
|
||||
$status: { enum: ["done"] },
|
||||
$status: { const: "done" },
|
||||
plan: { type: "string" },
|
||||
},
|
||||
required: ["$status", "plan"],
|
||||
@@ -51,7 +51,10 @@ function makeWorkflow(overrides?: Partial<WorkflowPayload>): WorkflowPayload {
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "writer", prompt: "Begin writing", location: null } },
|
||||
$START: {
|
||||
new: { role: "writer", prompt: "Begin writing", location: null },
|
||||
resume: { role: "writer", prompt: "Review previous output and continue", location: null },
|
||||
},
|
||||
writer: { done: { role: "reviewer", prompt: "Review this: {{{plan}}}", location: null } },
|
||||
reviewer: {
|
||||
approved: { role: "$END", prompt: "Done: {{{summary}}}", location: null },
|
||||
@@ -82,7 +85,7 @@ describe("Suite 1: Role Reference Integrity", () => {
|
||||
output: "None",
|
||||
frontmatter: {
|
||||
type: "object",
|
||||
properties: { $status: { enum: ["done"] } },
|
||||
properties: { $status: { const: "done" } },
|
||||
required: ["$status"],
|
||||
} as unknown as string,
|
||||
};
|
||||
@@ -135,27 +138,38 @@ describe("Suite 2: Graph Structure", () => {
|
||||
expect(errors.some((e) => e.includes("$START must be defined in graph"))).toBe(true);
|
||||
});
|
||||
|
||||
test("2.2 $START has multiple status keys", () => {
|
||||
test("2.2 $START missing resume edge", () => {
|
||||
const wf = makeWorkflow();
|
||||
wf.graph.$START = {
|
||||
_: { role: "writer", prompt: "Begin", location: null },
|
||||
other: { role: "reviewer", prompt: "Also", location: null },
|
||||
new: { role: "writer", prompt: "Begin", location: null },
|
||||
};
|
||||
const errors = validateWorkflow(wf);
|
||||
expect(
|
||||
errors.some((e) => e.includes('$START must have exactly one edge with status "_"')),
|
||||
errors.some((e) => e.includes('$START must have edges with statuses "new" and "resume"')),
|
||||
).toBe(true);
|
||||
});
|
||||
|
||||
test("2.3 $START edge uses non-_ status", () => {
|
||||
test("2.3 $START missing new edge", () => {
|
||||
const wf = makeWorkflow();
|
||||
wf.graph.$START = { ready: { role: "writer", prompt: "Begin", location: null } };
|
||||
wf.graph.$START = {
|
||||
resume: { role: "writer", prompt: "Resume", location: null },
|
||||
};
|
||||
const errors = validateWorkflow(wf);
|
||||
expect(
|
||||
errors.some((e) => e.includes('$START must have exactly one edge with status "_"')),
|
||||
errors.some((e) => e.includes('$START must have edges with statuses "new" and "resume"')),
|
||||
).toBe(true);
|
||||
});
|
||||
|
||||
test("2.3b $START with new and resume passes", () => {
|
||||
const wf = makeWorkflow();
|
||||
wf.graph.$START = {
|
||||
new: { role: "writer", prompt: "Begin", location: null },
|
||||
resume: { role: "writer", prompt: "Resume", location: null },
|
||||
};
|
||||
const errors = validateWorkflow(wf);
|
||||
expect(errors.some((e) => e.includes("$START must have edges"))).toBe(false);
|
||||
});
|
||||
|
||||
test("2.4 $END has outgoing edges", () => {
|
||||
const wf = makeWorkflow();
|
||||
wf.graph.$END = { _: { role: "writer", prompt: "Loop", location: null } };
|
||||
@@ -173,7 +187,7 @@ describe("Suite 2: Graph Structure", () => {
|
||||
output: "Isolated",
|
||||
frontmatter: {
|
||||
type: "object",
|
||||
properties: { $status: { enum: ["done"] } },
|
||||
properties: { $status: { const: "done" } },
|
||||
required: ["$status"],
|
||||
} as unknown as string,
|
||||
};
|
||||
@@ -193,15 +207,18 @@ describe("Suite 2: Graph Structure", () => {
|
||||
});
|
||||
|
||||
describe("Suite 3: Status-Edge Consistency", () => {
|
||||
test("3.1 user role using _ graph key is rejected", () => {
|
||||
test("3.1 user role using _ graph key is treated as an unknown status", () => {
|
||||
// "_" is no longer special-cased — it's just a status key that does not
|
||||
// match the role's $status enum, so it surfaces as extra/missing keys.
|
||||
const wf = makeWorkflow();
|
||||
wf.graph.writer = { _: { role: "reviewer", prompt: "Review", location: null } };
|
||||
const errors = validateWorkflow(wf);
|
||||
expect(
|
||||
errors.some((e) =>
|
||||
e.includes('role "writer" must use explicit $status keys in graph, not "_"'),
|
||||
),
|
||||
).toBe(true);
|
||||
expect(errors.some((e) => e.includes('role "writer" graph has extra status keys: _'))).toBe(
|
||||
true,
|
||||
);
|
||||
expect(errors.some((e) => e.includes('role "writer" graph is missing status keys: done'))).toBe(
|
||||
true,
|
||||
);
|
||||
});
|
||||
|
||||
test("3.2 user role graph key not matching $status enum", () => {
|
||||
@@ -240,20 +257,23 @@ describe("Suite 3: Status-Edge Consistency", () => {
|
||||
).toBe(true);
|
||||
});
|
||||
|
||||
test("3.5 multi-exit role with _ key", () => {
|
||||
test("3.5 multi-exit role with _ key is treated as an unknown status", () => {
|
||||
const wf = makeWorkflow();
|
||||
wf.graph.reviewer = { _: { role: "$END", prompt: "Done", location: null } };
|
||||
const errors = validateWorkflow(wf);
|
||||
expect(errors.some((e) => e.includes('role "reviewer" graph has extra status keys: _'))).toBe(
|
||||
true,
|
||||
);
|
||||
expect(
|
||||
errors.some((e) =>
|
||||
e.includes('role "reviewer" must use explicit $status keys in graph, not "_"'),
|
||||
e.includes('role "reviewer" graph is missing status keys: approved, rejected'),
|
||||
),
|
||||
).toBe(true);
|
||||
});
|
||||
});
|
||||
|
||||
describe("Suite 3b: Enum-Based Multi-Exit", () => {
|
||||
test("3b.1 enum multi-exit passes with matching graph keys", () => {
|
||||
describe("Suite 3b: Enum-Based $status is Rejected", () => {
|
||||
test("3b.1 enum multi-exit is rejected (must use oneOf + const)", () => {
|
||||
const wf = makeWorkflow();
|
||||
wf.roles.reviewer = {
|
||||
...wf.roles.reviewer,
|
||||
@@ -271,52 +291,10 @@ describe("Suite 3b: Enum-Based Multi-Exit", () => {
|
||||
rejected: { role: "writer", prompt: "Fix: {{{comments}}}", location: null },
|
||||
};
|
||||
const errors = validateWorkflow(wf);
|
||||
expect(errors).toEqual([]);
|
||||
expect(errors.some((e) => e.includes("must define") && e.includes("const"))).toBe(true);
|
||||
});
|
||||
|
||||
test("3b.2 enum multi-exit with extra graph key", () => {
|
||||
const wf = makeWorkflow();
|
||||
wf.roles.reviewer = {
|
||||
...wf.roles.reviewer,
|
||||
frontmatter: {
|
||||
type: "object",
|
||||
properties: {
|
||||
$status: { enum: ["approved", "rejected"] },
|
||||
comments: { type: "string" },
|
||||
},
|
||||
required: ["$status", "comments"],
|
||||
} as unknown as string,
|
||||
};
|
||||
wf.graph.reviewer = {
|
||||
approved: { role: "$END", prompt: "Done", location: null },
|
||||
rejected: { role: "writer", prompt: "Fix", location: null },
|
||||
timeout: { role: "$END", prompt: "Timed out", location: null },
|
||||
};
|
||||
const errors = validateWorkflow(wf);
|
||||
expect(errors.some((e) => e.includes("extra status keys: timeout"))).toBe(true);
|
||||
});
|
||||
|
||||
test("3b.3 enum multi-exit with missing graph key", () => {
|
||||
const wf = makeWorkflow();
|
||||
wf.roles.reviewer = {
|
||||
...wf.roles.reviewer,
|
||||
frontmatter: {
|
||||
type: "object",
|
||||
properties: {
|
||||
$status: { enum: ["approved", "rejected"] },
|
||||
comments: { type: "string" },
|
||||
},
|
||||
required: ["$status", "comments"],
|
||||
} as unknown as string,
|
||||
};
|
||||
wf.graph.reviewer = {
|
||||
approved: { role: "$END", prompt: "Done", location: null },
|
||||
};
|
||||
const errors = validateWorkflow(wf);
|
||||
expect(errors.some((e) => e.includes("missing status keys: rejected"))).toBe(true);
|
||||
});
|
||||
|
||||
test("3b.4 enum with single explicit value passes", () => {
|
||||
test("3b.2 enum single-exit is rejected (must use const)", () => {
|
||||
const wf = makeWorkflow();
|
||||
wf.roles.writer = {
|
||||
...wf.roles.writer,
|
||||
@@ -331,28 +309,71 @@ describe("Suite 3b: Enum-Based Multi-Exit", () => {
|
||||
};
|
||||
wf.graph.writer = { ready: { role: "reviewer", prompt: "Review: {{{plan}}}", location: null } };
|
||||
const errors = validateWorkflow(wf);
|
||||
expect(errors).toEqual([]);
|
||||
expect(errors.some((e) => e.includes("must define") && e.includes("const"))).toBe(true);
|
||||
});
|
||||
});
|
||||
|
||||
test("3b.5 enum multi-exit mustache var not in frontmatter", () => {
|
||||
describe("Suite 3c: Const-Based Flat Schema", () => {
|
||||
test("3c.1 flat schema with const $status passes validation", () => {
|
||||
const wf = makeWorkflow();
|
||||
wf.roles.reviewer = {
|
||||
...wf.roles.reviewer,
|
||||
wf.roles.writer = {
|
||||
...wf.roles.writer,
|
||||
frontmatter: {
|
||||
type: "object",
|
||||
properties: {
|
||||
$status: { enum: ["approved", "rejected"] },
|
||||
comments: { type: "string" },
|
||||
$status: { const: "done" },
|
||||
plan: { type: "string" },
|
||||
},
|
||||
required: ["$status", "comments"],
|
||||
required: ["$status", "plan"],
|
||||
} as unknown as string,
|
||||
};
|
||||
wf.graph.reviewer = {
|
||||
approved: { role: "$END", prompt: "Done: {{{nonexistent}}}", location: null },
|
||||
rejected: { role: "writer", prompt: "Fix: {{{comments}}}", location: null },
|
||||
const errors = validateWorkflow(wf);
|
||||
expect(errors).toEqual([]);
|
||||
});
|
||||
|
||||
test("3c.2 flat schema with const $status detects extra graph key", () => {
|
||||
const wf = makeWorkflow();
|
||||
wf.roles.writer = {
|
||||
...wf.roles.writer,
|
||||
frontmatter: {
|
||||
type: "object",
|
||||
properties: {
|
||||
$status: { const: "done" },
|
||||
plan: { type: "string" },
|
||||
},
|
||||
required: ["$status", "plan"],
|
||||
} as unknown as string,
|
||||
};
|
||||
wf.graph.writer = {
|
||||
done: { role: "reviewer", prompt: "Review.", location: null },
|
||||
extra: { role: "$END", prompt: "Nope.", location: null },
|
||||
};
|
||||
const errors = validateWorkflow(wf);
|
||||
expect(errors.some((e) => e.includes("nonexistent") && e.includes("not found"))).toBe(true);
|
||||
expect(errors.some((e) => e.includes("extra status keys") && e.includes("extra"))).toBe(true);
|
||||
});
|
||||
|
||||
test("3c.3 flat schema with const $status validates mustache vars", () => {
|
||||
const wf = makeWorkflow();
|
||||
wf.roles.writer = {
|
||||
...wf.roles.writer,
|
||||
frontmatter: {
|
||||
type: "object",
|
||||
properties: {
|
||||
$status: { const: "done" },
|
||||
plan: { type: "string" },
|
||||
},
|
||||
required: ["$status", "plan"],
|
||||
} as unknown as string,
|
||||
};
|
||||
wf.graph.writer = {
|
||||
done: { role: "reviewer", prompt: "Review: {{{nonexistent}}}", location: null },
|
||||
};
|
||||
const errors = validateWorkflow(wf);
|
||||
expect(
|
||||
errors.some(
|
||||
(e) => e.includes('prompt variable "nonexistent"') && e.includes('role "writer"'),
|
||||
),
|
||||
).toBe(true);
|
||||
});
|
||||
});
|
||||
|
||||
@@ -460,7 +481,7 @@ describe("Suite 6: Multiple Errors Collection", () => {
|
||||
output: "None",
|
||||
frontmatter: {
|
||||
type: "object",
|
||||
properties: { $status: { enum: ["done"] } },
|
||||
properties: { $status: { const: "done" } },
|
||||
required: ["$status"],
|
||||
} as unknown as string,
|
||||
};
|
||||
|
||||
@@ -31,14 +31,17 @@ function makeMinimalPayload(name: string, description: string): WorkflowPayload
|
||||
frontmatter: {
|
||||
type: "object",
|
||||
properties: {
|
||||
$status: { type: "string", enum: ["done"] },
|
||||
$status: { const: "done" },
|
||||
},
|
||||
required: ["$status"],
|
||||
} as unknown as CasRef,
|
||||
},
|
||||
},
|
||||
graph: {
|
||||
$START: { _: { role: "worker", prompt: "start working", location: null } },
|
||||
$START: {
|
||||
new: { role: "worker", prompt: "start working", location: null },
|
||||
resume: { role: "worker", prompt: "resume working", location: null },
|
||||
},
|
||||
worker: { done: { role: "$END", prompt: "done", location: null } },
|
||||
},
|
||||
};
|
||||
|
||||
+14
-26
@@ -1,4 +1,4 @@
|
||||
#!/usr/bin/env node
|
||||
#!/usr/bin/env -S node --disable-warning=ExperimentalWarning
|
||||
|
||||
import type { CasRef, ThreadId, ThreadStatus } from "@united-workforce/protocol";
|
||||
import { Command } from "commander";
|
||||
@@ -8,12 +8,10 @@ import {
|
||||
cmdPromptAdapterDeveloping,
|
||||
cmdPromptBootstrap,
|
||||
cmdPromptList,
|
||||
cmdPromptSetup,
|
||||
cmdPromptUsage,
|
||||
cmdPromptUsageReference,
|
||||
cmdPromptWorkflowAuthoring,
|
||||
} from "./commands/prompt.js";
|
||||
import { cmdSetup, cmdSetupInteractive } from "./commands/setup.js";
|
||||
import { cmdSetup, cmdSetupInteractive, resolvePresetBaseUrl } from "./commands/setup.js";
|
||||
import { cmdStepFork, cmdStepList, cmdStepRead, cmdStepShow } from "./commands/step.js";
|
||||
import {
|
||||
cmdThreadCancel,
|
||||
@@ -509,23 +507,16 @@ prompt.addHelpCommand(false);
|
||||
|
||||
prompt
|
||||
.command("usage")
|
||||
.description("Print the complete skill content (all references combined)")
|
||||
.description("Print the usage reference (CLI guide + typical workflows)")
|
||||
.action(() => {
|
||||
console.log(cmdPromptUsage());
|
||||
});
|
||||
|
||||
prompt
|
||||
.command("setup")
|
||||
.description("Print setup instructions for installing the uwf skill")
|
||||
.command("bootstrap")
|
||||
.description("Print setup instructions for installing uwf skills")
|
||||
.action(() => {
|
||||
console.log(cmdPromptSetup());
|
||||
});
|
||||
|
||||
prompt
|
||||
.command("usage-reference")
|
||||
.description("Print the usage reference (CLI guide + typical workflows)")
|
||||
.action(() => {
|
||||
console.log(cmdPromptUsageReference());
|
||||
console.log(cmdPromptBootstrap());
|
||||
});
|
||||
|
||||
prompt
|
||||
@@ -542,13 +533,6 @@ prompt
|
||||
console.log(cmdPromptAdapterDeveloping());
|
||||
});
|
||||
|
||||
prompt
|
||||
.command("bootstrap")
|
||||
.description("Print the bootstrap skill YAML for Hermes agents")
|
||||
.action(() => {
|
||||
console.log(cmdPromptBootstrap());
|
||||
});
|
||||
|
||||
prompt
|
||||
.command("list")
|
||||
.description("List all available prompt names")
|
||||
@@ -558,7 +542,7 @@ prompt
|
||||
|
||||
program
|
||||
.command("setup")
|
||||
.description("Configure provider, model, and agent")
|
||||
.description("Configure provider, model, and agent. Run without options for interactive wizard.")
|
||||
.option("--provider <name>", "Provider name")
|
||||
.option("--base-url <url>", "OpenAI-compatible API base URL")
|
||||
.option("--api-key <key>", "API key")
|
||||
@@ -574,10 +558,14 @@ program
|
||||
}) => {
|
||||
const storageRoot = resolveStorageRoot();
|
||||
runAction(async () => {
|
||||
if (opts.provider && opts.baseUrl && opts.apiKey && opts.model) {
|
||||
// Resolve preset base-url when provider is known but --base-url is omitted
|
||||
const resolvedBaseUrl =
|
||||
opts.baseUrl ??
|
||||
(opts.provider !== undefined ? resolvePresetBaseUrl(opts.provider) : null);
|
||||
if (opts.provider && resolvedBaseUrl && opts.apiKey && opts.model) {
|
||||
const result = await cmdSetup({
|
||||
provider: opts.provider,
|
||||
baseUrl: opts.baseUrl,
|
||||
baseUrl: resolvedBaseUrl,
|
||||
apiKey: opts.apiKey,
|
||||
model: opts.model,
|
||||
agent: opts.agent ?? undefined,
|
||||
@@ -588,7 +576,7 @@ program
|
||||
await cmdSetupInteractive(storageRoot);
|
||||
} else {
|
||||
throw new Error(
|
||||
"Non-interactive setup requires all of: --provider, --base-url, --api-key, --model",
|
||||
"Non-interactive setup requires: --provider, --api-key, --model (--base-url is optional for preset providers)",
|
||||
);
|
||||
}
|
||||
});
|
||||
|
||||
@@ -1,14 +1,38 @@
|
||||
import { readFileSync } from "node:fs";
|
||||
import { dirname, join } from "node:path";
|
||||
import { fileURLToPath } from "node:url";
|
||||
import {
|
||||
generateAdapterDevelopingReference,
|
||||
generateBootstrapReference,
|
||||
generateUsageReference,
|
||||
generateWorkflowAuthoringReference,
|
||||
} from "@united-workforce/util";
|
||||
|
||||
// CLI package version (for bootstrap prompt — uwf --version prints this)
|
||||
// Walk up from __dirname to find the nearest package.json (works from both src/ and dist/)
|
||||
function _findCliVersion(): string {
|
||||
let dir = dirname(fileURLToPath(import.meta.url));
|
||||
for (let i = 0; i < 5; i++) {
|
||||
const candidate = join(dir, "package.json");
|
||||
try {
|
||||
const pkg = JSON.parse(readFileSync(candidate, "utf-8")) as {
|
||||
name?: string;
|
||||
version?: string;
|
||||
};
|
||||
if (pkg.name === "@united-workforce/cli") {
|
||||
return pkg.version ?? "0.0.0";
|
||||
}
|
||||
} catch {
|
||||
// not found, keep walking
|
||||
}
|
||||
dir = dirname(dir);
|
||||
}
|
||||
return "0.0.0";
|
||||
}
|
||||
const CLI_VERSION = _findCliVersion();
|
||||
|
||||
export {
|
||||
generateAdapterDevelopingReference as cmdPromptAdapterDeveloping,
|
||||
generateBootstrapReference as cmdPromptBootstrap,
|
||||
generateUsageReference as cmdPromptUsageReference,
|
||||
generateUsageReference as cmdPromptUsage,
|
||||
generateWorkflowAuthoringReference as cmdPromptWorkflowAuthoring,
|
||||
};
|
||||
|
||||
@@ -16,66 +40,291 @@ const PROMPT_ENTRIES: ReadonlyArray<{ name: string; generate: () => string }> =
|
||||
{ name: "usage", generate: generateUsageReference },
|
||||
{ name: "workflow-authoring", generate: generateWorkflowAuthoringReference },
|
||||
{ name: "adapter-developing", generate: generateAdapterDevelopingReference },
|
||||
{ name: "bootstrap", generate: generateBootstrapReference },
|
||||
];
|
||||
|
||||
export function cmdPromptList(): ReadonlyArray<string> {
|
||||
return PROMPT_ENTRIES.map((e) => e.name);
|
||||
}
|
||||
|
||||
export function cmdPromptUsage(): string {
|
||||
return PROMPT_ENTRIES.filter((e) => e.name !== "bootstrap")
|
||||
.map((e) => e.generate())
|
||||
.join("\n\n---\n\n");
|
||||
}
|
||||
export function cmdPromptBootstrap(): string {
|
||||
return `# uwf Bootstrap
|
||||
|
||||
export function cmdPromptSetup(): string {
|
||||
return `# uwf Skill Setup
|
||||
Set up or upgrade uwf (United Workforce) — from zero to running your first workflow.
|
||||
|
||||
You are being asked to install or update the uwf (United Workforce) skill
|
||||
so that you know how to use the \`uwf\` CLI for workflow orchestration.
|
||||
## Scenario A: Fresh Install
|
||||
|
||||
## Steps
|
||||
### Step 0 — Environment pre-flight check
|
||||
|
||||
1. **Check if uwf CLI is installed:**
|
||||
Run ALL checks below BEFORE installing anything. If any check fails, fix it first — do NOT skip ahead.
|
||||
|
||||
\`\`\`bash
|
||||
# 1. Node.js
|
||||
node --version # need v20+
|
||||
# FIX: install via nvm (https://github.com/nvm-sh/nvm) or your OS package manager
|
||||
|
||||
# 2. Package manager — pnpm (preferred) or npm
|
||||
pnpm --version || npm --version
|
||||
# FIX for pnpm: corepack enable pnpm (Node 20+)
|
||||
# or: npm install -g pnpm
|
||||
# If pnpm is not an option, npm works too — adjust install commands below
|
||||
|
||||
# 3. Global bin directory is in PATH
|
||||
# For pnpm:
|
||||
pnpm bin -g # prints the global bin directory
|
||||
# For npm:
|
||||
npm prefix -g # global prefix; bin is <prefix>/bin
|
||||
# The directory printed above MUST be in your PATH.
|
||||
# Test: echo \\$PATH | tr ':' '\\\\n'
|
||||
# FIX: add the missing directory to your shell's startup file
|
||||
# (e.g. ~/.bashrc, ~/.zshrc, ~/.profile, or fish config):
|
||||
# export PATH="<global-bin-dir>:$PATH"
|
||||
\`\`\`
|
||||
|
||||
**All checks must pass before continuing.** If you had to modify PATH, verify the change persists by opening a new shell or sourcing your shell config.
|
||||
|
||||
### Step 1 — Discover agents and install adapter
|
||||
|
||||
**First, detect which supported agents are already installed on the user's machine:**
|
||||
|
||||
\`\`\`bash
|
||||
# Check for Hermes Agent
|
||||
which hermes 2>/dev/null && hermes --version
|
||||
|
||||
# Check for Claude Code
|
||||
which claude 2>/dev/null && claude --version # should show "X.Y.Z (Claude Code)"
|
||||
\`\`\`
|
||||
|
||||
**Based on the results:**
|
||||
|
||||
- **Only hermes found** → install \`uwf-hermes\` adapter
|
||||
- **Only claude found** → install \`uwf-claude-code\` adapter
|
||||
- **Both found** → ask the user which agent they want uwf to use as default
|
||||
- **Neither found** → the user must install at least one agent first:
|
||||
- Hermes Agent: https://hermes-agent.nousresearch.com/docs
|
||||
- Claude Code: \`npm install -g @anthropic-ai/claude-code\`
|
||||
|
||||
**Install the uwf CLI and the chosen adapter** using pnpm or npm:
|
||||
|
||||
\`\`\`bash
|
||||
# CLI (required)
|
||||
pnpm add -g @united-workforce/cli # or: npm install -g @united-workforce/cli
|
||||
|
||||
# Adapter — install the one matching the detected agent:
|
||||
pnpm add -g @united-workforce/agent-hermes # or: npm i -g @united-workforce/agent-hermes
|
||||
pnpm add -g @united-workforce/agent-claude-code # or: npm i -g @united-workforce/agent-claude-code
|
||||
\`\`\`
|
||||
|
||||
**⚠ Adapter versions are independent from CLI versions.** Do NOT try to match adapter version to CLI version. Just install \`@latest\` (the default).
|
||||
|
||||
**After installing, verify that \`uwf\` and the adapter are available in PATH:**
|
||||
|
||||
\`\`\`bash
|
||||
uwf --version # should print ${CLI_VERSION}
|
||||
uwf-hermes --version # or: uwf-claude-code --version
|
||||
\`\`\`
|
||||
|
||||
If either command is not found, the global bin directory is not in the current shell's PATH. **You must fix this before continuing:**
|
||||
|
||||
1. Find where the binary was installed:
|
||||
\`\`\`bash
|
||||
uwf --help
|
||||
find ~/.local ~/.hermes /usr/local -name uwf -type f 2>/dev/null
|
||||
npm prefix -g # global prefix — bin is <prefix>/bin
|
||||
\`\`\`
|
||||
If not installed: \`npm install -g @united-workforce/cli\`
|
||||
|
||||
2. **Clean up old versions of the skill** (if any exist):
|
||||
- Look for any existing uwf/workflow skill files in your skill directories
|
||||
(e.g. \`~/.hermes/skills/uwf/SKILL.md\` or similar paths).
|
||||
- If a skill exists, read its \`version\` from the YAML frontmatter.
|
||||
- If the version matches the current CLI version (\`uwf --version\`), you're done — no update needed.
|
||||
- If the version is older, **delete the old skill file** before proceeding.
|
||||
|
||||
3. **Install the bootstrap skill:**
|
||||
2. Add the directory to PATH permanently by appending to the user's shell config (e.g. \`~/.bashrc\`, \`~/.zshrc\`, \`~/.profile\`, or fish config):
|
||||
\`\`\`bash
|
||||
uwf prompt bootstrap > ~/.hermes/skills/uwf/SKILL.md
|
||||
export PATH="<global-bin-dir>:$PATH"
|
||||
\`\`\`
|
||||
The bootstrap prompt already includes complete YAML frontmatter — no editing needed.
|
||||
3. Source the updated config or open a new shell, then re-verify the commands work.
|
||||
|
||||
4. **Verify** the skill is loadable by your agent framework.
|
||||
**uwf-hermes** also requires the Hermes ACP plugin. Verify with \`hermes acp --help\`. If not available, install it:
|
||||
\`\`\`bash
|
||||
# Option A: install into hermes venv (recommended)
|
||||
source ~/.hermes/hermes-agent/.venv/bin/activate && pip install hermes-agent[acp]
|
||||
|
||||
## Individual prompts
|
||||
# Option B: pipx
|
||||
pipx install 'hermes-agent[acp]'
|
||||
|
||||
Each prompt outputs a complete SKILL.md with frontmatter — pipe directly to a file:
|
||||
# Option C: if installed from source
|
||||
pip install -e '.[acp]'
|
||||
\`\`\`
|
||||
|
||||
### Step 2 — Configure provider and model
|
||||
|
||||
uwf needs an LLM provider to run agents. **Ask the user** for their provider, API key, and model, then run:
|
||||
|
||||
\`\`\`bash
|
||||
uwf setup --provider <name> --api-key <key> --model <model> --agent <adapter-command>
|
||||
\`\`\`
|
||||
|
||||
**Note:** \`--agent\` takes the adapter **command name** (e.g. \`uwf-hermes\`), not the npm package name.
|
||||
|
||||
**Preset providers** — when using a preset name, \`--base-url\` is auto-filled and can be omitted:
|
||||
|
||||
| Provider | Name | Default base URL |
|
||||
|----------|------|-----------------|
|
||||
| OpenAI | \`openai\` | https://api.openai.com/v1 |
|
||||
| xAI | \`xai\` | https://api.x.ai/v1 |
|
||||
| OpenRouter | \`openrouter\` | https://openrouter.ai/api/v1 |
|
||||
| Venice | \`venice\` | https://api.venice.ai/api/v1 |
|
||||
| Dashscope | \`dashscope\` | https://dashscope.aliyuncs.com/compatible-mode/v1 |
|
||||
| DeepSeek | \`deepseek\` | https://api.deepseek.com/v1 |
|
||||
| SiliconFlow | \`siliconflow\` | https://api.siliconflow.cn/v1 |
|
||||
| VolcEngine | \`volcengine\` | https://ark.cn-beijing.volces.com/api/v3 |
|
||||
| Kimi (Moonshot) | \`kimi\` | https://api.moonshot.cn/v1 |
|
||||
| GLM (Zhipu AI) | \`glm\` | https://open.bigmodel.cn/api/paas/v4 |
|
||||
| StepFun | \`stepfun\` | https://api.stepfun.com/v1 |
|
||||
| MiniMax | \`minimax\` | https://api.minimax.io/v1 |
|
||||
| Ollama (local) | \`ollama\` | http://localhost:11434/v1 |
|
||||
|
||||
For **non-preset providers**, you must specify \`--base-url\` manually.
|
||||
|
||||
Example:
|
||||
\`\`\`bash
|
||||
uwf setup --provider openrouter --api-key sk-or-... --model anthropic/claude-sonnet-4 --agent uwf-hermes
|
||||
\`\`\`
|
||||
|
||||
If the user doesn't know what to choose, suggest \`openrouter\` with \`anthropic/claude-sonnet-4\` as a sensible default.
|
||||
|
||||
Config is saved to \`~/.uwf/config.yaml\`. Verify with \`cat ~/.uwf/config.yaml\`.
|
||||
|
||||
### Step 3 — Install skills
|
||||
|
||||
Run each command and create a skill from its output:
|
||||
|
||||
\`\`\`bash
|
||||
uwf prompt usage # → save as skill "uwf-usage"
|
||||
uwf prompt workflow-authoring # → save as skill "uwf-workflow-authoring"
|
||||
uwf prompt adapter-developing # → save as skill "uwf-adapter-developing"
|
||||
\`\`\`
|
||||
|
||||
Each command outputs a complete SKILL.md with YAML frontmatter. Use your agent framework's skill creation API to save them (e.g. \`skill_manage(action='create', name='uwf-usage', content=<output>)\`).
|
||||
|
||||
Verify skills are installed by listing them (e.g. \`skills_list()\`) and confirming all three appear.
|
||||
|
||||
**⚠ After saving all skills, start a new session** so the agent loads the updated skill content. Skills saved in the current session are not active until the next session.
|
||||
|
||||
### Step 4 — Verify end-to-end
|
||||
|
||||
Create a minimal workflow file to test your setup:
|
||||
|
||||
\`\`\`bash
|
||||
cat > /tmp/hello.yaml << 'YAML'
|
||||
name: hello
|
||||
description: Minimal smoke test
|
||||
roles:
|
||||
greeter:
|
||||
description: "Greet the user"
|
||||
goal: "Respond with a friendly greeting"
|
||||
capabilities: []
|
||||
procedure: "Write a short greeting based on the prompt."
|
||||
output: "A greeting message."
|
||||
frontmatter:
|
||||
type: object
|
||||
properties:
|
||||
$status: { const: done }
|
||||
message: { type: string }
|
||||
required: [$status, message]
|
||||
graph:
|
||||
$START:
|
||||
new: { role: greeter, prompt: "Say hello to the user." }
|
||||
resume: { role: greeter, prompt: "Greet the user again." }
|
||||
greeter:
|
||||
done: { role: "$END", prompt: "Done." }
|
||||
YAML
|
||||
\`\`\`
|
||||
|
||||
Then run:
|
||||
|
||||
\`\`\`bash
|
||||
uwf thread start /tmp/hello.yaml -p "Hello, world!"
|
||||
uwf thread exec <thread-id>
|
||||
uwf thread show <thread-id>
|
||||
\`\`\`
|
||||
|
||||
If the thread reaches \`$END\` with status \`completed\`, the setup is working.
|
||||
|
||||
## Scenario B: Upgrade from Previous Version
|
||||
|
||||
### Step 1 — Update packages
|
||||
|
||||
\`\`\`bash
|
||||
# Using pnpm
|
||||
pnpm add -g @united-workforce/cli@latest
|
||||
|
||||
# Using npm
|
||||
npm install -g @united-workforce/cli@latest
|
||||
\`\`\`
|
||||
|
||||
\`\`\`bash
|
||||
uwf --version # should print ${CLI_VERSION}
|
||||
\`\`\`
|
||||
|
||||
Also update your adapter(s):
|
||||
|
||||
\`\`\`bash
|
||||
# pnpm
|
||||
pnpm add -g @united-workforce/agent-hermes@latest
|
||||
|
||||
# npm
|
||||
npm install -g @united-workforce/agent-hermes@latest
|
||||
\`\`\`
|
||||
|
||||
### Step 2 — Regenerate skills
|
||||
|
||||
Skill content is bundled with the CLI — always regenerate after upgrading:
|
||||
|
||||
\`\`\`bash
|
||||
uwf prompt usage # → update skill "uwf-usage"
|
||||
uwf prompt workflow-authoring # → update skill "uwf-workflow-authoring"
|
||||
uwf prompt adapter-developing # → update skill "uwf-adapter-developing"
|
||||
\`\`\`
|
||||
|
||||
**⚠ After updating skills, start a new session** to load the new skill content.
|
||||
|
||||
### Step 3 — Migrate workflow YAML files (if needed)
|
||||
|
||||
Check the changelog for breaking changes. Known migrations:
|
||||
|
||||
- **v0.2.0**: \`$START._\` → \`$START.new\` + \`$START.resume\`. All workflow YAML files must be updated:
|
||||
\`\`\`yaml
|
||||
# Before (v0.1.x)
|
||||
$START:
|
||||
_: { role: planner, prompt: "..." }
|
||||
|
||||
# After (v0.2.0+)
|
||||
$START:
|
||||
new: { role: planner, prompt: "..." }
|
||||
resume: { role: planner, prompt: "Review previous run and continue." }
|
||||
\`\`\`
|
||||
|
||||
Update all \`.workflow/\` and \`.workflows/\` YAML files in your projects. \`uwf workflow add\` will reject files with the old \`_\` syntax.
|
||||
|
||||
- **v0.2.1**: \`$status: { enum: [value] }\` → \`$status: { const: "value" }\`. The validator no longer accepts \`enum\` for \`$status\`. Update all workflow YAML files:
|
||||
\`\`\`yaml
|
||||
# Before (v0.2.0)
|
||||
$status: { enum: [done] }
|
||||
$status: { type: string, enum: ["ready", "failed"] }
|
||||
|
||||
# After (v0.2.1+)
|
||||
$status: { const: "done" }
|
||||
# For multi-exit, use oneOf with const (unchanged)
|
||||
\`\`\`
|
||||
|
||||
### Step 4 — Verify
|
||||
|
||||
\`\`\`bash
|
||||
uwf thread start <your-workflow> -p "upgrade test"
|
||||
uwf thread exec <thread-id>
|
||||
\`\`\`
|
||||
|
||||
## Available prompts
|
||||
|
||||
\`\`\`bash
|
||||
uwf prompt list # list available prompt names
|
||||
uwf prompt usage > ~/.hermes/skills/uwf-usage/SKILL.md # CLI usage guide
|
||||
uwf prompt workflow-authoring > ~/.hermes/skills/uwf-workflow-authoring/SKILL.md
|
||||
uwf prompt adapter-developing > ~/.hermes/skills/uwf-adapter-developing/SKILL.md
|
||||
uwf prompt bootstrap > ~/.hermes/skills/uwf/SKILL.md # bootstrap skill
|
||||
uwf prompt usage # CLI usage guide
|
||||
uwf prompt workflow-authoring # workflow YAML design guide
|
||||
uwf prompt adapter-developing # building agent adapters
|
||||
uwf prompt bootstrap # this guide
|
||||
\`\`\`
|
||||
|
||||
## Notes
|
||||
|
||||
- The skill content is bundled with the CLI and versioned with it — always use
|
||||
\`uwf prompt usage\` to get the content matching your installed version.
|
||||
- Do NOT hand-edit the skill body. If the CLI is updated, re-run \`uwf prompt setup\`
|
||||
and follow the steps again.
|
||||
- When upgrading, always delete the old skill first to avoid stale instructions.
|
||||
`;
|
||||
}
|
||||
|
||||
@@ -1,3 +1,4 @@
|
||||
import { execFileSync } from "node:child_process";
|
||||
import { existsSync, mkdirSync, readdirSync, readFileSync, statSync, writeFileSync } from "node:fs";
|
||||
import { join } from "node:path";
|
||||
import { stdin as input, stdout as output } from "node:process";
|
||||
@@ -72,6 +73,12 @@ const PRESET_PROVIDERS = [
|
||||
{ name: "ollama", label: "Ollama (local)", baseUrl: "http://localhost:11434/v1" },
|
||||
] as const;
|
||||
|
||||
/** Look up the base URL for a preset provider name. Returns null if not a preset. */
|
||||
export function resolvePresetBaseUrl(providerName: string): string | null {
|
||||
const preset = PRESET_PROVIDERS.find((p) => p.name === providerName);
|
||||
return preset !== undefined ? preset.baseUrl : null;
|
||||
}
|
||||
|
||||
type SetupArgs = {
|
||||
provider: string;
|
||||
baseUrl: string;
|
||||
@@ -175,7 +182,6 @@ export async function _discoverAgents(): Promise<string[]> {
|
||||
|
||||
async function _tryWhichDiscovery(): Promise<string[] | null> {
|
||||
try {
|
||||
const { execFileSync } = await import("node:child_process");
|
||||
const text = execFileSync("which", ["-a", "uwf-hermes", "uwf-claude-code", "uwf-cursor"], {
|
||||
encoding: "utf-8",
|
||||
stdio: ["pipe", "pipe", "pipe"],
|
||||
@@ -391,6 +397,37 @@ function mergeConfig(existing: Record<string, unknown>, args: SetupArgs): Record
|
||||
};
|
||||
}
|
||||
|
||||
/**
|
||||
* Check if the configured adapter binary (and its dependencies) are in PATH.
|
||||
* Returns warnings array — empty means all good.
|
||||
*/
|
||||
export function _checkAdapterAvailability(agentName: string): string[] {
|
||||
const warnings: string[] = [];
|
||||
const binary = `uwf-${agentName}`;
|
||||
|
||||
try {
|
||||
execFileSync("which", [binary], { encoding: "utf8", stdio: ["pipe", "pipe", "pipe"] });
|
||||
} catch {
|
||||
warnings.push(
|
||||
`${binary} not found in PATH. Install it: pnpm add -g @united-workforce/agent-${agentName}`,
|
||||
);
|
||||
return warnings; // skip dependency check if adapter itself is missing
|
||||
}
|
||||
|
||||
// uwf-hermes depends on hermes CLI
|
||||
if (agentName === "hermes") {
|
||||
try {
|
||||
execFileSync("which", ["hermes"], { encoding: "utf8", stdio: ["pipe", "pipe", "pipe"] });
|
||||
} catch {
|
||||
warnings.push(
|
||||
'hermes CLI not found in PATH (required by uwf-hermes). Fix: export PATH="$HOME/.hermes/hermes-agent/.venv/bin:$PATH"',
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
return warnings;
|
||||
}
|
||||
|
||||
/**
|
||||
* Non-interactive setup. All required args provided via CLI flags.
|
||||
*/
|
||||
@@ -405,15 +442,26 @@ export async function cmdSetup(args: SetupArgs): Promise<Record<string, unknown>
|
||||
|
||||
writeFileSync(configPath, stringify(merged, { indent: 2 }), "utf8");
|
||||
|
||||
// Print config path to stderr (stdout is reserved for JSON output)
|
||||
console.error(`Config saved to ${configPath} ✓`);
|
||||
|
||||
// Validate model connectivity
|
||||
const validation = await validateModel(args.baseUrl, args.apiKey, args.model);
|
||||
|
||||
// Check adapter availability
|
||||
const agentName = _agentNameFromBinary(args.agent ?? "hermes");
|
||||
const adapterWarnings = _checkAdapterAvailability(agentName);
|
||||
for (const w of adapterWarnings) {
|
||||
console.error(`⚠ ${w}`);
|
||||
}
|
||||
|
||||
return {
|
||||
configPath,
|
||||
provider: args.provider,
|
||||
model: args.model,
|
||||
defaultAgent: merged.defaultAgent,
|
||||
validation,
|
||||
adapterWarnings,
|
||||
};
|
||||
}
|
||||
|
||||
|
||||
@@ -911,7 +911,7 @@ function resolveEvaluateArgs(
|
||||
chain: ChainState,
|
||||
): { lastRole: string; lastOutput: EvaluateLastOutput } {
|
||||
if (chain.headIsStart) {
|
||||
return { lastRole: START_ROLE, lastOutput: { [STATUS_KEY]: "_" } };
|
||||
return { lastRole: START_ROLE, lastOutput: { [STATUS_KEY]: "new" } };
|
||||
}
|
||||
|
||||
const lastStep = chain.stepsNewestFirst[0];
|
||||
@@ -961,6 +961,12 @@ function resolveAgentConfig(
|
||||
agentOverride: string | null,
|
||||
): AgentConfig {
|
||||
if (agentOverride !== null) {
|
||||
// Try config alias first (e.g. "hermes" → config.agents.hermes),
|
||||
// then fall back to raw command name (e.g. "uwf-hermes" or "/usr/bin/agent").
|
||||
const fromAlias = config.agents[agentOverride as AgentAlias];
|
||||
if (fromAlias !== undefined) {
|
||||
return fromAlias;
|
||||
}
|
||||
return parseAgentOverride(agentOverride);
|
||||
}
|
||||
|
||||
@@ -998,6 +1004,12 @@ function spawnAgent(
|
||||
});
|
||||
} catch (e) {
|
||||
const err = e as NodeJS.ErrnoException & { stderr?: Buffer | string | null };
|
||||
if (err.code === "ENOENT") {
|
||||
failStep(
|
||||
plog,
|
||||
`"${agent.command}" not found in PATH. Install it or check your PATH config. Run: which ${agent.command}`,
|
||||
);
|
||||
}
|
||||
const stderr =
|
||||
err.stderr == null
|
||||
? ""
|
||||
@@ -1031,7 +1043,6 @@ function archiveThread(uwf: UwfStore, threadId: ThreadId, _workflow: CasRef, _he
|
||||
completeThread(uwf.varStore, threadId, "completed");
|
||||
}
|
||||
|
||||
// biome-ignore lint/complexity/noExcessiveCognitiveComplexity: orchestration function with inherent branching
|
||||
export async function cmdThreadResume(
|
||||
storageRoot: string,
|
||||
threadId: ThreadId,
|
||||
@@ -1095,7 +1106,7 @@ export async function cmdThreadResume(
|
||||
|
||||
// status === "completed"
|
||||
const workflow = loadWorkflowPayload(uwf, workflowHash);
|
||||
const startResult = evaluate(workflow.graph, START_ROLE, {});
|
||||
const startResult = evaluate(workflow.graph, START_ROLE, { [STATUS_KEY]: "resume" });
|
||||
if (!startResult.ok) {
|
||||
fail(`failed to evaluate $START: ${startResult.error.message}`);
|
||||
}
|
||||
@@ -1107,11 +1118,7 @@ export async function cmdThreadResume(
|
||||
}
|
||||
|
||||
const startRole = startResult.value.role;
|
||||
const completedPromptPrefix = "Previous run completed. Resuming with additional context.";
|
||||
const completedResumePrompt =
|
||||
supplement !== null && supplement !== ""
|
||||
? `${completedPromptPrefix}\n\n${supplement}`
|
||||
: completedPromptPrefix;
|
||||
const completedResumePrompt = buildResumePrompt(startResult.value.prompt, supplement);
|
||||
|
||||
const updatedEntry = { ...entry, status: "idle" as const, completedAt: null };
|
||||
setThread(uwf.varStore, threadId, updatedEntry);
|
||||
|
||||
@@ -6,11 +6,11 @@ describe("Edge prompt template variable resolution", () => {
|
||||
test("returns error when rendered prompt is empty string", () => {
|
||||
const graph = {
|
||||
$START: {
|
||||
_: { role: "classifier", prompt: "{{{userPrompt}}}", location: null },
|
||||
new: { role: "classifier", prompt: "{{{userPrompt}}}", location: null },
|
||||
},
|
||||
};
|
||||
|
||||
const result = evaluate(graph, "$START", {});
|
||||
const result = evaluate(graph, "$START", { $status: "new" });
|
||||
|
||||
expect(result.ok).toBe(false);
|
||||
if (!result.ok) {
|
||||
@@ -22,11 +22,11 @@ describe("Edge prompt template variable resolution", () => {
|
||||
test("returns error when rendered prompt is whitespace-only", () => {
|
||||
const graph = {
|
||||
$START: {
|
||||
_: { role: "classifier", prompt: " {{{userPrompt}}} ", location: null },
|
||||
new: { role: "classifier", prompt: " {{{userPrompt}}} ", location: null },
|
||||
},
|
||||
};
|
||||
|
||||
const result = evaluate(graph, "$START", {});
|
||||
const result = evaluate(graph, "$START", { $status: "new" });
|
||||
|
||||
expect(result.ok).toBe(false);
|
||||
if (!result.ok) {
|
||||
@@ -38,11 +38,11 @@ describe("Edge prompt template variable resolution", () => {
|
||||
test("succeeds when all template variables resolve to non-empty values", () => {
|
||||
const graph = {
|
||||
$START: {
|
||||
_: { role: "classifier", prompt: "{{{userPrompt}}}", location: null },
|
||||
new: { role: "classifier", prompt: "{{{userPrompt}}}", location: null },
|
||||
},
|
||||
};
|
||||
|
||||
const result = evaluate(graph, "$START", { userPrompt: "Fix the bug" });
|
||||
const result = evaluate(graph, "$START", { $status: "new", userPrompt: "Fix the bug" });
|
||||
|
||||
expect(result.ok).toBe(true);
|
||||
if (result.ok) {
|
||||
@@ -53,11 +53,11 @@ describe("Edge prompt template variable resolution", () => {
|
||||
test("succeeds with static (no-variable) prompt", () => {
|
||||
const graph = {
|
||||
$START: {
|
||||
_: { role: "classifier", prompt: "Classify this input", location: null },
|
||||
new: { role: "classifier", prompt: "Classify this input", location: null },
|
||||
},
|
||||
};
|
||||
|
||||
const result = evaluate(graph, "$START", {});
|
||||
const result = evaluate(graph, "$START", { $status: "new" });
|
||||
|
||||
expect(result.ok).toBe(true);
|
||||
if (result.ok) {
|
||||
@@ -68,11 +68,11 @@ describe("Edge prompt template variable resolution", () => {
|
||||
test("succeeds when prompt has mix of static text and unresolved variables", () => {
|
||||
const graph = {
|
||||
$START: {
|
||||
_: { role: "classifier", prompt: "Please handle: {{{userPrompt}}}", location: null },
|
||||
new: { role: "classifier", prompt: "Please handle: {{{userPrompt}}}", location: null },
|
||||
},
|
||||
};
|
||||
|
||||
const result = evaluate(graph, "$START", {});
|
||||
const result = evaluate(graph, "$START", { $status: "new" });
|
||||
|
||||
expect(result.ok).toBe(true);
|
||||
if (result.ok) {
|
||||
@@ -83,11 +83,11 @@ describe("Edge prompt template variable resolution", () => {
|
||||
test("returns error when ALL variables missing and no static text remains", () => {
|
||||
const graph = {
|
||||
$START: {
|
||||
_: { role: "classifier", prompt: "{{{a}}}{{{b}}}", location: null },
|
||||
new: { role: "classifier", prompt: "{{{a}}}{{{b}}}", location: null },
|
||||
},
|
||||
};
|
||||
|
||||
const result = evaluate(graph, "$START", {});
|
||||
const result = evaluate(graph, "$START", { $status: "new" });
|
||||
|
||||
expect(result.ok).toBe(false);
|
||||
});
|
||||
|
||||
@@ -6,10 +6,7 @@ import type { EvaluateResult, Result } from "./types.js";
|
||||
// Disable HTML escaping — prompts are plain text, not HTML.
|
||||
mustache.escape = (text: string) => text;
|
||||
|
||||
const START_ROLE = "$START";
|
||||
const SUSPEND_ROLE = "$SUSPEND";
|
||||
// $START is a special entry node with no agent output — it always uses this key.
|
||||
const START_STATUS = "_";
|
||||
|
||||
type LastOutput = Record<string, unknown>;
|
||||
|
||||
@@ -21,9 +18,7 @@ export function evaluate(
|
||||
lastOutput: LastOutput,
|
||||
): Result<EvaluateResult, Error> {
|
||||
let status: string;
|
||||
if (lastRole === START_ROLE) {
|
||||
status = START_STATUS;
|
||||
} else if (typeof lastOutput[STATUS_KEY] === "string") {
|
||||
if (typeof lastOutput[STATUS_KEY] === "string") {
|
||||
status = lastOutput[STATUS_KEY] as string;
|
||||
} else {
|
||||
return {
|
||||
|
||||
@@ -24,22 +24,22 @@ function isOneOfSchema(fm: unknown): fm is SchemaObj & { oneOf: SchemaObj[] } {
|
||||
return Array.isArray(obj.oneOf);
|
||||
}
|
||||
|
||||
/** Check if a frontmatter schema declares "$status" as an enum (the required form for user roles). */
|
||||
function hasStatusEnum(fm: unknown): boolean {
|
||||
/** Check if a frontmatter schema declares "$status" as const (flat schema form). */
|
||||
function hasStatusConst(fm: unknown): boolean {
|
||||
if (typeof fm !== "object" || fm === null) return false;
|
||||
const obj = fm as SchemaObj;
|
||||
const props = obj.properties as Record<string, SchemaObj> | undefined;
|
||||
if (!props?.$status) return false;
|
||||
return Array.isArray(props.$status.enum);
|
||||
return typeof props.$status.const === "string";
|
||||
}
|
||||
|
||||
/** Extract status values from an enum-based $status field. */
|
||||
function getEnumStatuses(fm: SchemaObj): string[] {
|
||||
/** Extract status values from a const-based $status field. */
|
||||
function getConstStatuses(fm: SchemaObj): string[] {
|
||||
const props = fm.properties as Record<string, SchemaObj> | undefined;
|
||||
if (!props?.$status) return [];
|
||||
const statusDef = props.$status;
|
||||
if (!Array.isArray(statusDef.enum)) return [];
|
||||
return statusDef.enum as string[];
|
||||
if (typeof statusDef.const === "string") return [statusDef.const];
|
||||
return [];
|
||||
}
|
||||
|
||||
/** Get property names from a schema object. */
|
||||
@@ -97,9 +97,9 @@ function checkGraphStructure(payload: WorkflowPayload, errors: string[]): void {
|
||||
if (!graphNodes.has("$START")) {
|
||||
errors.push("$START must be defined in graph");
|
||||
} else {
|
||||
const startKeys = Object.keys(payload.graph.$START);
|
||||
if (startKeys.length !== 1 || startKeys[0] !== "_") {
|
||||
errors.push('$START must have exactly one edge with status "_"');
|
||||
const startKeys = new Set(Object.keys(payload.graph.$START));
|
||||
if (!startKeys.has("new") || !startKeys.has("resume")) {
|
||||
errors.push('$START must have edges with statuses "new" and "resume"');
|
||||
}
|
||||
}
|
||||
|
||||
@@ -190,22 +190,13 @@ function checkOneOfDiscriminant(
|
||||
}
|
||||
}
|
||||
|
||||
/** Check status-edge consistency for a user role. "_" is reserved for $START and rejected here. */
|
||||
/** Check status-edge consistency for a user role. */
|
||||
function checkStatusEdges(
|
||||
roleName: string,
|
||||
graphKeys: Set<string>,
|
||||
statusSet: Set<string>,
|
||||
errors: string[],
|
||||
): void {
|
||||
if (graphKeys.has("_")) {
|
||||
errors.push(`role "${roleName}" must use explicit $status keys in graph, not "_"`);
|
||||
return;
|
||||
}
|
||||
if (statusSet.has("_")) {
|
||||
errors.push(`role "${roleName}" $status enum must use explicit values, not "_"`);
|
||||
return;
|
||||
}
|
||||
|
||||
const extraKeys = [...graphKeys].filter((k) => !statusSet.has(k));
|
||||
const missingKeys = [...statusSet].filter((k) => !graphKeys.has(k));
|
||||
if (extraKeys.length > 0) {
|
||||
@@ -257,21 +248,21 @@ function checkRoleConsistency(payload: WorkflowPayload, errors: string[]): void
|
||||
checkOneOfDiscriminant(roleName, variants, statuses, errors);
|
||||
checkStatusEdges(roleName, graphKeys, new Set(statuses), errors);
|
||||
checkMultiExitMustache(roleName, graphEntry, variants, errors);
|
||||
} else if (hasStatusEnum(fm)) {
|
||||
const statuses = getEnumStatuses(fm as SchemaObj);
|
||||
} else if (hasStatusConst(fm)) {
|
||||
const statuses = getConstStatuses(fm as SchemaObj);
|
||||
checkStatusEdges(roleName, graphKeys, new Set(statuses), errors);
|
||||
// For enum-based schemas, mustache vars come from the flat properties
|
||||
checkEnumMustache(roleName, graphEntry, fm as SchemaObj, errors);
|
||||
// For const-based flat schemas, mustache vars come from the flat properties
|
||||
checkFlatMustache(roleName, graphEntry, fm as SchemaObj, errors);
|
||||
} else {
|
||||
errors.push(
|
||||
`role "${roleName}" must define "$status" as an enum (or oneOf const) in frontmatter`,
|
||||
`role "${roleName}" must define "$status" as const (or oneOf with const) in frontmatter`,
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/** Check mustache vars in all edge prompts against flat schema properties. */
|
||||
function checkEnumMustache(
|
||||
function checkFlatMustache(
|
||||
roleName: string,
|
||||
graphEntry: Record<string, { role: string; prompt: string }>,
|
||||
fm: SchemaObj,
|
||||
|
||||
@@ -57,13 +57,13 @@ function isGraph(value: unknown): boolean {
|
||||
if (!isRecord(value)) {
|
||||
return false;
|
||||
}
|
||||
return Object.entries(value).every(([node, statusMap]) => {
|
||||
return Object.values(value).every((statusMap) => {
|
||||
if (!isRecord(statusMap)) {
|
||||
return false;
|
||||
}
|
||||
return Object.entries(statusMap).every(([status, target]) => {
|
||||
// "_" is only valid as a status key for the $START entry node.
|
||||
if (status === "_" && node !== "$START") {
|
||||
// "_" is no longer a valid status key anywhere — $START uses "new"/"resume".
|
||||
if (status === "_") {
|
||||
return false;
|
||||
}
|
||||
return isTarget(target);
|
||||
@@ -99,7 +99,7 @@ export function checkWorkflowFilenameConsistency(
|
||||
): string | null {
|
||||
const expected = workflowNameFromPath(filePath);
|
||||
if (payload.name !== expected) {
|
||||
return `workflow name mismatch: file "${basename(filePath)}" implies name "${expected}" but YAML declares name "${payload.name}"`;
|
||||
return `workflow name mismatch: file "${basename(filePath)}" implies name "${expected}" but YAML declares name "${payload.name}". Either rename the file to "${payload.name}.yaml" or change the YAML \`name\` field to "${expected}"`;
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
@@ -0,0 +1,9 @@
|
||||
# @united-workforce/eval
|
||||
|
||||
## 0.1.2
|
||||
|
||||
### Patch Changes
|
||||
|
||||
- 850a3b2: fix: resolve --agent override via config alias before raw command
|
||||
|
||||
`resolveAgentConfig()` now checks `config.agents[alias]` first before falling back to `parseAgentOverride()`. Eval CLI default `--agent` changed from `"hermes"` to `"uwf-hermes"`.
|
||||
@@ -91,6 +91,29 @@ describe("frontmatter-compliance judge", () => {
|
||||
const result = await runFrontmatterJudge("T4");
|
||||
expect(result.score).toBe(0);
|
||||
});
|
||||
|
||||
test("parsed object output with $status → score 1.0", async () => {
|
||||
mockedReadSteps.mockReturnValue([
|
||||
makeStep({ role: "a", output: { $status: "done", summary: "fixed" } as unknown as string }),
|
||||
makeStep({ role: "b", output: { $status: "reviewed" } as unknown as string }),
|
||||
]);
|
||||
|
||||
const result = await runFrontmatterJudge("T5");
|
||||
const data = result.data as { stepsTotal: number; stepsValid: number; invalidSteps: unknown[] };
|
||||
|
||||
expect(result.score).toBe(1.0);
|
||||
expect(data.stepsTotal).toBe(2);
|
||||
expect(data.stepsValid).toBe(2);
|
||||
});
|
||||
|
||||
test("parsed object output missing $status → score 0", async () => {
|
||||
mockedReadSteps.mockReturnValue([
|
||||
makeStep({ role: "a", output: { summary: "no status field" } as unknown as string }),
|
||||
]);
|
||||
|
||||
const result = await runFrontmatterJudge("T6");
|
||||
expect(result.score).toBe(0);
|
||||
});
|
||||
});
|
||||
|
||||
describe("token-stats judge", () => {
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@united-workforce/eval",
|
||||
"version": "0.1.0",
|
||||
"version": "0.1.5",
|
||||
"private": false,
|
||||
"files": [
|
||||
"src",
|
||||
@@ -22,8 +22,8 @@
|
||||
"test:ci": "vitest run __tests__/"
|
||||
},
|
||||
"dependencies": {
|
||||
"@ocas/core": "^0.3.0",
|
||||
"@ocas/fs": "^0.3.0",
|
||||
"@ocas/core": "^0.4.0",
|
||||
"@ocas/fs": "^0.4.0",
|
||||
"@united-workforce/protocol": "workspace:^",
|
||||
"@united-workforce/util": "workspace:^",
|
||||
"commander": "^14.0.3",
|
||||
|
||||
@@ -7,12 +7,15 @@ import {
|
||||
registerRunCommand,
|
||||
} from "./commands/index.js";
|
||||
|
||||
// eslint-disable-next-line -- dynamic import for version
|
||||
const pkg = await import("../package.json", { with: { type: "json" } });
|
||||
|
||||
const program = new Command();
|
||||
|
||||
program
|
||||
.name("uwf-eval")
|
||||
.description("Evaluate uwf workflow quality with real agents")
|
||||
.version("0.1.0");
|
||||
.version(pkg.default.version, "-V, --version");
|
||||
|
||||
registerRunCommand(program);
|
||||
registerReportCommand(program);
|
||||
|
||||
@@ -6,7 +6,7 @@ import { formatList, selectEntries } from "./format.js";
|
||||
import { readEvalEntries } from "./read.js";
|
||||
|
||||
const log = createLogger({ sink: { kind: "stderr" } });
|
||||
const LOG_LIST = "L5KX9R2B";
|
||||
const LOG_LIST = "H5KX9R2B";
|
||||
|
||||
type ListCliOptions = {
|
||||
task: string | undefined;
|
||||
|
||||
@@ -52,7 +52,7 @@ export function registerRunCommand(program: Command): void {
|
||||
program
|
||||
.command("run <task>")
|
||||
.description("Run eval on a task directory or tarball")
|
||||
.option("--agent <name>", "agent adapter to use", "hermes")
|
||||
.option("--agent <name>", "agent adapter to use", "uwf-hermes")
|
||||
.option("--model <model>", "model override")
|
||||
.option("--count <n>", "number of eval runs", "1")
|
||||
.action(async (task: string, opts: RunCliOptions) => {
|
||||
|
||||
@@ -39,6 +39,16 @@ function extractFrontmatterYaml(output: unknown): string | null {
|
||||
|
||||
/** Validate a single step's frontmatter, returning a list of errors (empty = valid). */
|
||||
function validateStepFrontmatter(output: unknown): string[] {
|
||||
// CAS stores the extracted output as a JSON object after the extract pipeline.
|
||||
// Accept both: parsed object (from step.output) or raw markdown string.
|
||||
if (typeof output === "object" && output !== null && !Array.isArray(output)) {
|
||||
const status = (output as Record<string, unknown>).$status;
|
||||
if (typeof status !== "string" || status.trim() === "") {
|
||||
return ["$status field is missing or not a non-empty string"];
|
||||
}
|
||||
return [];
|
||||
}
|
||||
|
||||
const yaml = extractFrontmatterYaml(output);
|
||||
if (yaml === null) {
|
||||
return ["output does not begin with a valid '---' frontmatter block"];
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@united-workforce/protocol",
|
||||
"version": "0.1.0",
|
||||
"version": "0.1.1",
|
||||
"files": [
|
||||
"src",
|
||||
"dist",
|
||||
@@ -18,8 +18,8 @@
|
||||
"test:ci": "vitest run src/__tests__/"
|
||||
},
|
||||
"dependencies": {
|
||||
"@ocas/core": "^0.3.0",
|
||||
"@ocas/fs": "^0.3.0"
|
||||
"@ocas/core": "^0.4.0",
|
||||
"@ocas/fs": "^0.4.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"typescript": "^5.8.3"
|
||||
|
||||
@@ -0,0 +1,8 @@
|
||||
# Changelog
|
||||
|
||||
## 0.1.2 — 2026-06-07
|
||||
|
||||
- fix: decouple session resume from isFirstVisit guard
|
||||
|
||||
When frontmatter validation fails, the step is never written to CAS, so isFirstVisit remains true on the next run. Both adapters now always check the session cache regardless of isFirstVisit. When resuming after a frontmatter-only failure (isFirstVisit + cache hit), a minimal correction prompt is sent via buildFrontmatterRetryPrompt() instead of re-sending the full initial prompt.
|
||||
|
||||
@@ -143,7 +143,7 @@ describe("buildOutputFormatInstruction", () => {
|
||||
{
|
||||
type: "object",
|
||||
properties: {
|
||||
$status: { type: "string", enum: ["approved"] },
|
||||
$status: { const: "approved" },
|
||||
branch: { type: "string" },
|
||||
},
|
||||
required: ["$status"],
|
||||
@@ -151,7 +151,7 @@ describe("buildOutputFormatInstruction", () => {
|
||||
{
|
||||
type: "object",
|
||||
properties: {
|
||||
$status: { type: "string", enum: ["rejected"] },
|
||||
$status: { const: "rejected" },
|
||||
comments: { type: "string" },
|
||||
},
|
||||
required: ["$status"],
|
||||
@@ -225,4 +225,34 @@ describe("buildOutputFormatInstruction", () => {
|
||||
const result = buildOutputFormatInstruction({});
|
||||
expect(result).toContain("Focus exclusively on YOUR role");
|
||||
});
|
||||
|
||||
test("renders const value as literal in flat schema example", () => {
|
||||
const schema = {
|
||||
type: "object",
|
||||
properties: {
|
||||
$status: { type: "string", const: "greeted" },
|
||||
message: { type: "string" },
|
||||
},
|
||||
required: ["$status", "message"],
|
||||
};
|
||||
const result = buildOutputFormatInstruction(schema);
|
||||
expect(result).toContain("$status: greeted");
|
||||
expect(result).toContain("fixed value");
|
||||
expect(result).not.toContain("$status: <string>");
|
||||
});
|
||||
|
||||
test("renders const value for non-string types", () => {
|
||||
const schema = {
|
||||
type: "object",
|
||||
properties: {
|
||||
count: { type: "number", const: 42 },
|
||||
done: { type: "boolean", const: true },
|
||||
},
|
||||
required: ["count", "done"],
|
||||
};
|
||||
const result = buildOutputFormatInstruction(schema);
|
||||
expect(result).toContain("count: 42");
|
||||
expect(result).toContain("done: true");
|
||||
expect(result).toContain("fixed value");
|
||||
});
|
||||
});
|
||||
|
||||
@@ -0,0 +1,59 @@
|
||||
import type { StepContext } from "@united-workforce/protocol";
|
||||
import { describe, expect, test } from "vitest";
|
||||
import { buildThreadProgress } from "../src/build-thread-progress.js";
|
||||
|
||||
function makeStep(role: string): StepContext {
|
||||
return {
|
||||
role,
|
||||
output: {},
|
||||
detail: "0000000000000" as string,
|
||||
agent: "uwf-mock",
|
||||
edgePrompt: "",
|
||||
startedAtMs: 0,
|
||||
completedAtMs: 0,
|
||||
cwd: "",
|
||||
assembledPrompt: null,
|
||||
usage: null,
|
||||
content: null,
|
||||
};
|
||||
}
|
||||
|
||||
describe("buildThreadProgress", () => {
|
||||
test("first step of thread", () => {
|
||||
const result = buildThreadProgress([], "proponent");
|
||||
expect(result).toContain("## Thread Progress");
|
||||
expect(result).toContain("first step");
|
||||
expect(result).toContain("first time");
|
||||
expect(result).toContain("proponent");
|
||||
});
|
||||
|
||||
test("second step, role not seen before", () => {
|
||||
const steps = [makeStep("opponent")];
|
||||
const result = buildThreadProgress(steps, "proponent");
|
||||
expect(result).toContain("Thread step 2");
|
||||
expect(result).toContain("spoken 0 times");
|
||||
});
|
||||
|
||||
test("role has spoken once before", () => {
|
||||
const steps = [makeStep("proponent"), makeStep("opponent")];
|
||||
const result = buildThreadProgress(steps, "proponent");
|
||||
expect(result).toContain("Thread step 3");
|
||||
expect(result).toContain("spoken 1 time before");
|
||||
// singular "time" not "times"
|
||||
expect(result).not.toContain("1 times");
|
||||
});
|
||||
|
||||
test("role has spoken multiple times", () => {
|
||||
const steps = [
|
||||
makeStep("proponent"),
|
||||
makeStep("opponent"),
|
||||
makeStep("proponent"),
|
||||
makeStep("opponent"),
|
||||
makeStep("proponent"),
|
||||
makeStep("opponent"),
|
||||
];
|
||||
const result = buildThreadProgress(steps, "proponent");
|
||||
expect(result).toContain("Thread step 7");
|
||||
expect(result).toContain("spoken 3 times");
|
||||
});
|
||||
});
|
||||
@@ -0,0 +1,23 @@
|
||||
import { describe, expect, test } from "vitest";
|
||||
import { buildFrontmatterRetryPrompt } from "../src/frontmatter-retry-prompt.js";
|
||||
|
||||
describe("buildFrontmatterRetryPrompt", () => {
|
||||
test("includes correction instruction", () => {
|
||||
const result = buildFrontmatterRetryPrompt("Use YAML frontmatter");
|
||||
expect(result).toContain("previous run completed");
|
||||
expect(result).toContain("do NOT need to redo any work");
|
||||
expect(result).toContain("corrected YAML frontmatter");
|
||||
});
|
||||
|
||||
test("includes outputFormatInstruction when provided", () => {
|
||||
const instruction = "---\nstatus: $done | $review\nsummary: string\n---";
|
||||
const result = buildFrontmatterRetryPrompt(instruction);
|
||||
expect(result).toContain(instruction);
|
||||
});
|
||||
|
||||
test("works with empty outputFormatInstruction", () => {
|
||||
const result = buildFrontmatterRetryPrompt("");
|
||||
expect(result).not.toContain("\n\n\n");
|
||||
expect(result).toContain("corrected YAML frontmatter");
|
||||
});
|
||||
});
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@united-workforce/util-agent",
|
||||
"version": "0.1.0",
|
||||
"version": "0.1.2",
|
||||
"files": [
|
||||
"src",
|
||||
"dist",
|
||||
@@ -18,8 +18,8 @@
|
||||
"test:ci": "vitest run __tests__/ src/__tests__/"
|
||||
},
|
||||
"dependencies": {
|
||||
"@ocas/core": "^0.3.0",
|
||||
"@ocas/fs": "^0.3.0",
|
||||
"@ocas/core": "^0.4.0",
|
||||
"@ocas/fs": "^0.4.0",
|
||||
"@united-workforce/protocol": "workspace:^",
|
||||
"@united-workforce/util": "workspace:^",
|
||||
"dotenv": "^16.6.1",
|
||||
|
||||
@@ -74,6 +74,10 @@ function collectObjectSchemas(schema: JSONSchema): JSONSchema[] {
|
||||
}
|
||||
|
||||
function resolvePropertySchema(prop: JSONSchema): JSONSchema {
|
||||
if (prop.const !== undefined) {
|
||||
return prop;
|
||||
}
|
||||
|
||||
if (Array.isArray(prop.enum) && prop.enum.length > 0) {
|
||||
return prop;
|
||||
}
|
||||
@@ -113,6 +117,11 @@ function buildPropertyExampleLine(prop: SchemaProperty): string {
|
||||
commentParts.push("required");
|
||||
}
|
||||
|
||||
if (resolved.const !== undefined) {
|
||||
commentParts.push("fixed value");
|
||||
return `${prop.name}: ${formatYamlScalar(resolved.const)}${buildPropertyComment(commentParts)}`;
|
||||
}
|
||||
|
||||
if (Array.isArray(resolved.enum) && resolved.enum.length > 0) {
|
||||
const enumValues = resolved.enum.map((v) => String(v));
|
||||
commentParts.push(...enumValues);
|
||||
|
||||
@@ -0,0 +1,27 @@
|
||||
import type { StepContext } from "@united-workforce/protocol";
|
||||
|
||||
/**
|
||||
* Build a compact thread-progress summary so the agent knows where it is
|
||||
* in the conversation without making tool calls to count steps.
|
||||
*
|
||||
* Example output:
|
||||
* ## Thread Progress
|
||||
* Thread step 6. You (proponent) have spoken 2 times before this turn.
|
||||
*/
|
||||
export function buildThreadProgress(steps: StepContext[], role: string): string {
|
||||
const totalSteps = steps.length;
|
||||
const roleVisits = steps.filter((s) => s.role === role).length;
|
||||
|
||||
const parts = [`## Thread Progress`];
|
||||
if (totalSteps === 0) {
|
||||
parts.push(
|
||||
`This is the first step of the thread. You (${role}) are speaking for the first time.`,
|
||||
);
|
||||
} else {
|
||||
parts.push(
|
||||
`Thread step ${totalSteps + 1}. You (${role}) have spoken ${roleVisits} time${roleVisits === 1 ? "" : "s"} before this turn.`,
|
||||
);
|
||||
}
|
||||
|
||||
return parts.join("\n");
|
||||
}
|
||||
@@ -0,0 +1,21 @@
|
||||
/**
|
||||
* Build a minimal prompt for retrying frontmatter output on a resumed session.
|
||||
*
|
||||
* Used when a previous run completed successfully but frontmatter validation
|
||||
* failed — the session already has full context, we just need the agent to
|
||||
* re-output correctly formatted frontmatter without redoing any work.
|
||||
*/
|
||||
export function buildFrontmatterRetryPrompt(outputFormatInstruction: string): string {
|
||||
const parts: string[] = [
|
||||
"Your previous run completed all work successfully, but the output format was incorrect.",
|
||||
"You do NOT need to redo any work — all changes are already in place.",
|
||||
"",
|
||||
];
|
||||
if (outputFormatInstruction !== "") {
|
||||
parts.push(outputFormatInstruction, "");
|
||||
}
|
||||
parts.push(
|
||||
"Please output ONLY the corrected YAML frontmatter block (--- delimited) followed by a brief summary of the work you completed.",
|
||||
);
|
||||
return parts.join("\n");
|
||||
}
|
||||
@@ -1,6 +1,7 @@
|
||||
export { buildContinuationPrompt } from "./build-continuation-prompt.js";
|
||||
export { buildOutputFormatInstruction } from "./build-output-format-instruction.js";
|
||||
export { buildRolePrompt } from "./build-role-prompt.js";
|
||||
export { buildThreadProgress } from "./build-thread-progress.js";
|
||||
export type { BuildContextMeta } from "./context.js";
|
||||
export { buildContext, buildContextWithMeta } from "./context.js";
|
||||
export type { ExtractResult, ResolvedLlmProvider } from "./extract.js";
|
||||
@@ -11,6 +12,7 @@ export {
|
||||
} from "./extract.js";
|
||||
export type { FrontmatterFastPathResult } from "./frontmatter.js";
|
||||
export { tryFrontmatterFastPath } from "./frontmatter.js";
|
||||
export { buildFrontmatterRetryPrompt } from "./frontmatter-retry-prompt.js";
|
||||
export { createAgent, parseArgv } from "./run.js";
|
||||
export { getCachedSessionId, getCachePath, setCachedSessionId } from "./session-cache.js";
|
||||
export { getConfigPath, getEnvPath, loadWorkflowConfig, resolveStorageRoot } from "./storage.js";
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@united-workforce/util",
|
||||
"version": "0.1.0",
|
||||
"version": "0.1.4",
|
||||
"files": [
|
||||
"src",
|
||||
"dist",
|
||||
|
||||
@@ -1,40 +0,0 @@
|
||||
export function generateBootstrapReference(): string {
|
||||
return `---
|
||||
name: uwf
|
||||
description: "United Workforce (uwf) — YAML 状态机工作流引擎。任务涉及 workflow 时加载此 skill。"
|
||||
tags: [workflow, uwf]
|
||||
triggers:
|
||||
- uwf
|
||||
- workflow
|
||||
- 工作流
|
||||
---
|
||||
|
||||
# uwf (United Workforce)
|
||||
|
||||
YAML 状态机工作流引擎。当用户提到「workflow」「工作流」时,指的是 **uwf workflow**(YAML 定义的状态机),不是 Hermes skill。用 \`uwf\` CLI 操作,不要混淆。
|
||||
|
||||
## 首次使用
|
||||
|
||||
运行以下命令获取完整用法:
|
||||
|
||||
\`\`\`bash
|
||||
uwf prompt usage # 完整用法文档(所有引用合并)
|
||||
uwf prompt workflow-authoring # workflow 编写指南(role 定义、graph 路由、schema)
|
||||
uwf prompt adapter-developing # adapter 开发指南(构建新的 agent adapter)
|
||||
\`\`\`
|
||||
|
||||
## 快速参考
|
||||
|
||||
\`\`\`bash
|
||||
uwf workflow list # 查看已注册 workflow
|
||||
uwf workflow add <file.yaml> # 注册 workflow
|
||||
uwf thread start <workflow> -p "prompt" # 创建 thread
|
||||
uwf thread exec <thread-id> -c 10 # 执行最多 10 步
|
||||
uwf thread list # 查看所有 thread
|
||||
\`\`\`
|
||||
|
||||
## 示例 workflow
|
||||
|
||||
参考项目 \`examples/\` 目录下的 YAML 文件(analyze-topic、debate、solve-issue)。
|
||||
`;
|
||||
}
|
||||
@@ -2,7 +2,6 @@ export { generateActorReference } from "./actor-reference.js";
|
||||
export { generateAdapterDevelopingReference } from "./adapter-developing-reference.js";
|
||||
export { generateArchitectureReference } from "./architecture-reference.js";
|
||||
export { encodeUint64AsCrockford } from "./base32.js";
|
||||
export { generateBootstrapReference } from "./bootstrap-reference.js";
|
||||
export { generateCliReference } from "./cli-reference.js";
|
||||
export { env } from "./env.js";
|
||||
export type {
|
||||
@@ -16,7 +15,7 @@ export {
|
||||
validateFrontmatter,
|
||||
} from "./frontmatter-markdown/index.js";
|
||||
export { createLogger } from "./logger.js";
|
||||
export { generateModeratorReference } from "./moderator-reference.js";
|
||||
|
||||
export type {
|
||||
CreateProcessLoggerOptions,
|
||||
ProcessLogFn,
|
||||
@@ -36,4 +35,3 @@ export { extractUlidTimestamp, generateUlid } from "./ulid.js";
|
||||
export { generateUsageReference } from "./usage-reference.js";
|
||||
export { VERSION } from "./version.js";
|
||||
export { generateWorkflowAuthoringReference } from "./workflow-authoring-reference.js";
|
||||
export { generateYamlReference } from "./yaml-reference.js";
|
||||
|
||||
@@ -1,56 +0,0 @@
|
||||
export function generateModeratorReference(): string {
|
||||
return `# Moderator Reference
|
||||
|
||||
## Overview
|
||||
|
||||
The moderator is the workflow engine's routing component. It evaluates the directed graph defined in the workflow YAML to determine the next role (or \`$END\`) after each step — with zero LLM cost.
|
||||
|
||||
## Status-Based Routing
|
||||
|
||||
The moderator uses **status-based routing**: it inspects the previous step's extracted output (specifically the \`$status\` field) and looks up the corresponding edge in the graph.
|
||||
|
||||
### Graph Structure
|
||||
|
||||
The graph is a nested map: \`Record<Role | "$START", Record<Status, Target>>\`. Each role maps its possible \`$status\` values to a target with a \`role\` and \`prompt\`:
|
||||
|
||||
\`\`\`yaml
|
||||
graph:
|
||||
$START:
|
||||
_: { role: planner, prompt: "Analyze the issue." }
|
||||
planner:
|
||||
ready: { role: developer, prompt: "Implement the plan (CAS hash: {{{plan}}})." }
|
||||
insufficient_info: { role: $END, prompt: "Not enough info." }
|
||||
developer:
|
||||
done: { role: reviewer, prompt: "Review branch {{{branch}}} at {{{worktree}}}." }
|
||||
failed: { role: $END, prompt: "Developer failed: {{{reason}}}." }
|
||||
reviewer:
|
||||
approved: { role: tester, prompt: "Run tests on {{{branch}}} at {{{worktree}}}." }
|
||||
rejected: { role: developer, prompt: "Fix issues: {{{comments}}}." }
|
||||
\`\`\`
|
||||
|
||||
### Routing Algorithm
|
||||
|
||||
1. Look up \`graph[lastRole]\` to get the status map for the current role
|
||||
2. Look up \`statusMap[lastOutput.$status]\` to get the target
|
||||
3. If target role is \`$END\`, mark thread as completed
|
||||
4. Otherwise, render the edge prompt (Mustache templates with \`{{{field}}}\` from output) and spawn the next agent
|
||||
|
||||
### Edge Prompts and Mustache Templates
|
||||
|
||||
Edge prompts use triple-brace Mustache syntax (\`{{{field}}}\`) to interpolate values from the previous step's output into the next agent's task prompt. This passes structured data (branch names, file paths, CAS hashes) between roles without manual wiring.
|
||||
|
||||
## Special Nodes
|
||||
|
||||
- \`$START\` — entry point; uses status key \`_\` (unconditional) since there is no previous output
|
||||
- \`$END\` — terminal node; thread completes when reached and is moved to history
|
||||
|
||||
## Integration with Steps
|
||||
|
||||
Each \`uwf thread exec\` cycle:
|
||||
1. Moderator reads the thread's head step output
|
||||
2. Looks up \`graph[lastRole][output.$status]\` to pick the next role
|
||||
3. If next is \`$END\`, marks thread as completed
|
||||
4. Otherwise, renders the edge prompt and spawns the agent for the selected role
|
||||
5. Extract pipeline parses agent output → new step node → append to CAS chain
|
||||
`;
|
||||
}
|
||||
@@ -140,5 +140,18 @@ For specific scenarios, run the corresponding \`uwf prompt\` command:
|
||||
|----------|---------|-------------|
|
||||
| Writing workflow YAML | \`uwf prompt workflow-authoring\` | Designing roles, conditions, graphs, and edge prompts |
|
||||
| Building a new agent adapter | \`uwf prompt adapter-developing\` | Creating a new \`uwf-<name>\` CLI adapter |
|
||||
|
||||
## Upgrading
|
||||
|
||||
\`\`\`bash
|
||||
# Install the latest version
|
||||
pnpm add -g @united-workforce/cli@latest @united-workforce/agent-hermes@latest
|
||||
# or: npm install -g @united-workforce/cli@latest @united-workforce/agent-hermes@latest
|
||||
|
||||
# Verify
|
||||
uwf --version
|
||||
|
||||
# Then run uwf prompt bootstrap and follow the upgrade instructions
|
||||
\`\`\`
|
||||
`;
|
||||
}
|
||||
|
||||
@@ -1,2 +1,9 @@
|
||||
// This version is kept in sync with package.json during releases.
|
||||
export const VERSION = "0.1.0";
|
||||
import { readFileSync } from "node:fs";
|
||||
import { dirname, join } from "node:path";
|
||||
import { fileURLToPath } from "node:url";
|
||||
|
||||
const __dirname = dirname(fileURLToPath(import.meta.url));
|
||||
const pkg = JSON.parse(readFileSync(join(__dirname, "..", "package.json"), "utf-8")) as {
|
||||
version: string;
|
||||
};
|
||||
export const VERSION = pkg.version;
|
||||
|
||||
@@ -28,6 +28,7 @@ roles: # named actors
|
||||
2. Do that
|
||||
output: "..." # what the agent should produce
|
||||
frontmatter: # JSON Schema for structured output
|
||||
type: object
|
||||
oneOf:
|
||||
- properties:
|
||||
$status: { const: "ready" }
|
||||
@@ -40,7 +41,8 @@ roles: # named actors
|
||||
|
||||
graph: # status-based routing
|
||||
$START:
|
||||
_: { role: planner, prompt: "Analyze the issue." }
|
||||
new: { role: planner, prompt: "Analyze the issue." }
|
||||
resume: { role: planner, prompt: "Review the previous run output and continue." }
|
||||
planner:
|
||||
ready: { role: developer, prompt: "Implement {{{plan}}}." }
|
||||
failed: { role: $END, prompt: "Failed: {{{error}}}" }
|
||||
@@ -70,10 +72,13 @@ The \`frontmatter\` field is a standard JSON Schema. It defines the structured f
|
||||
|
||||
### \`$status\` Field
|
||||
|
||||
\`$status\` is the only standard field. Its value determines which graph edge the moderator follows. Use \`const\` to constrain each variant:
|
||||
\`$status\` is the only standard field. Its value determines which graph edge the moderator follows.
|
||||
|
||||
**Multi-exit (oneOf)** — use \`const\` to constrain each variant:
|
||||
|
||||
\`\`\`yaml
|
||||
frontmatter:
|
||||
type: object
|
||||
oneOf:
|
||||
- properties:
|
||||
$status: { const: "done" }
|
||||
@@ -85,22 +90,26 @@ frontmatter:
|
||||
required: [$status, error]
|
||||
\`\`\`
|
||||
|
||||
### Custom Fields
|
||||
|
||||
Add any fields you need for data passing between roles. These are available in edge prompts via Mustache templates.
|
||||
|
||||
### Flat Schema (Single Status)
|
||||
|
||||
When a role has only one outcome:
|
||||
**Single-exit (flat schema)** — same syntax, just no \`oneOf\` wrapper:
|
||||
|
||||
\`\`\`yaml
|
||||
frontmatter:
|
||||
type: object
|
||||
properties:
|
||||
$status: { const: "done" }
|
||||
summary: { type: string }
|
||||
required: [$status, summary]
|
||||
\`\`\`
|
||||
|
||||
**Important rules:**
|
||||
- \`type: object\` is **required** at the top level of frontmatter (both flat and oneOf)
|
||||
- \`$status\` always uses \`const: "value"\` — simple and consistent
|
||||
- \`enum\` is **not supported** for \`$status\` — the validator will reject it
|
||||
|
||||
### Custom Fields
|
||||
|
||||
Add any fields you need for data passing between roles. These are available in edge prompts via Mustache templates.
|
||||
|
||||
## Graph Routing
|
||||
|
||||
The graph maps each role's \`$status\` values to the next role:
|
||||
@@ -113,7 +122,7 @@ graph[role][$status] → { role: nextRole, prompt: edgePrompt }
|
||||
|
||||
| Node | Purpose |
|
||||
|------|---------|
|
||||
| \`$START\` | Entry point — status key is always \`_\` (unconditional) |
|
||||
| \`$START\` | Entry point — status keys \`new\` (first start) and \`resume\` (resuming a completed thread) |
|
||||
| \`$END\` | Terminal — thread completes and is archived |
|
||||
|
||||
### Edge Prompts
|
||||
@@ -178,7 +187,7 @@ ocas get <output-hash>
|
||||
1. Every \`$status\` value in a role's frontmatter has a matching edge in the graph
|
||||
2. Every field referenced in edge prompts (\`{{{field}}}\`) exists in the source role's schema
|
||||
3. Every role referenced in the graph exists in \`roles\`
|
||||
4. \`$START\` has exactly one edge with key \`_\`
|
||||
4. \`$START\` has edges with keys \`new\` and \`resume\`
|
||||
5. At least one path leads to \`$END\`
|
||||
6. No orphan roles (defined but never routed to)
|
||||
|
||||
|
||||
@@ -1,82 +0,0 @@
|
||||
export function generateYamlReference(): string {
|
||||
return `# Workflow YAML Schema Reference
|
||||
|
||||
## Top-Level Structure
|
||||
|
||||
A workflow YAML file defines the complete workflow specification:
|
||||
|
||||
\`\`\`yaml
|
||||
name: solve-issue # verb-first kebab-case identifier
|
||||
description: "..." # human-readable description
|
||||
|
||||
roles: # named actors in the workflow
|
||||
planner:
|
||||
description: "Analyzes issue and outputs a plan"
|
||||
goal: "You are a planning agent."
|
||||
capabilities:
|
||||
- issue-analysis
|
||||
- planning
|
||||
procedure: |
|
||||
1. Read the issue
|
||||
2. Produce a test spec
|
||||
output: "Output the plan summary. Set $status to ready or insufficient_info."
|
||||
frontmatter: # JSON Schema for structured output (drives routing)
|
||||
oneOf:
|
||||
- properties:
|
||||
$status: { const: ready }
|
||||
plan: { type: string }
|
||||
required: [$status, plan]
|
||||
- properties:
|
||||
$status: { const: insufficient_info }
|
||||
required: [$status]
|
||||
|
||||
graph: # status-based routing (nested map)
|
||||
$START:
|
||||
_: { role: planner, prompt: "Analyze the issue." }
|
||||
planner:
|
||||
ready: { role: developer, prompt: "Implement plan {{{plan}}}." }
|
||||
insufficient_info: { role: $END, prompt: "Not enough info." }
|
||||
\`\`\`
|
||||
|
||||
## roles
|
||||
|
||||
Each role defines an actor in the workflow:
|
||||
|
||||
| Field | Type | Description |
|
||||
|-------|------|-------------|
|
||||
| \`description\` | string | Short description of the role's purpose |
|
||||
| \`goal\` | string | System-level goal statement for the agent |
|
||||
| \`capabilities\` | string[] | Tags describing what the role can do |
|
||||
| \`procedure\` | string | Step-by-step instructions for the agent |
|
||||
| \`output\` | string | Description of expected output format |
|
||||
| \`frontmatter\` | JSON Schema | Defines the structured output the agent must produce |
|
||||
|
||||
### frontmatter
|
||||
|
||||
The \`frontmatter\` field is a standard JSON Schema object. The extract pipeline validates agent output against it. Key conventions:
|
||||
- \`$status\` field drives routing decisions in the graph
|
||||
- Use \`const\` or \`enum\` to constrain status values
|
||||
- Use \`oneOf\` to define multiple valid output shapes (one per status)
|
||||
- All \`required\` fields must appear in the agent's frontmatter output
|
||||
|
||||
## graph
|
||||
|
||||
The graph is a nested map defining status-based routing:
|
||||
|
||||
\`\`\`
|
||||
Record<Role | "$START", Record<Status, { role: string, prompt: string }>>
|
||||
\`\`\`
|
||||
|
||||
| Level | Key | Value |
|
||||
|-------|-----|-------|
|
||||
| Outer | Role name or \`$START\` | Status map for that role |
|
||||
| Inner | \`$status\` value (or \`_\` for unconditional) | Target: \`{ role, prompt }\` |
|
||||
|
||||
### Special Nodes
|
||||
- \`$START\` — entry point; uses status key \`_\` (unconditional, no previous output)
|
||||
- \`$END\` — terminal node; thread completes when reached
|
||||
|
||||
### Edge Prompts
|
||||
Prompts use triple-brace Mustache templates (\`{{{field}}}\`) to interpolate values from the previous step's output. Example: \`"Implement plan {{{plan}}} in repo {{{repoPath}}}."\`
|
||||
`;
|
||||
}
|
||||
Generated
+38
-36
@@ -18,8 +18,8 @@ importers:
|
||||
specifier: ^2.31.0
|
||||
version: 2.31.0(@types/node@25.9.1)
|
||||
'@shazhou/proman':
|
||||
specifier: ^0.5.1
|
||||
version: 0.5.1(@biomejs/biome@2.4.16)(typescript@5.9.3)(vite@7.3.5(@types/node@25.9.1)(jiti@2.7.0)(lightningcss@1.32.0)(yaml@2.9.0))(vitest@3.2.6(@types/node@25.9.1)(jiti@2.7.0)(lightningcss@1.32.0)(msw@2.14.6(@types/node@25.9.1)(typescript@5.9.3))(yaml@2.9.0))
|
||||
specifier: ^0.6.3
|
||||
version: 0.6.3(@biomejs/biome@2.4.16)(typescript@5.9.3)(vite@7.3.5(@types/node@25.9.1)(jiti@2.7.0)(lightningcss@1.32.0)(yaml@2.9.0))(vitest@3.2.6(@types/node@25.9.1)(jiti@2.7.0)(lightningcss@1.32.0)(msw@2.14.6(@types/node@25.9.1)(typescript@5.9.3))(yaml@2.9.0))
|
||||
'@types/node':
|
||||
specifier: ^25.7.0
|
||||
version: 25.9.1
|
||||
@@ -45,8 +45,8 @@ importers:
|
||||
packages/agent-builtin:
|
||||
dependencies:
|
||||
'@ocas/core':
|
||||
specifier: ^0.3.0
|
||||
version: 0.3.0
|
||||
specifier: ^0.4.0
|
||||
version: 0.4.0
|
||||
'@united-workforce/util':
|
||||
specifier: workspace:^
|
||||
version: link:../util
|
||||
@@ -61,8 +61,8 @@ importers:
|
||||
packages/agent-claude-code:
|
||||
dependencies:
|
||||
'@ocas/core':
|
||||
specifier: ^0.3.0
|
||||
version: 0.3.0
|
||||
specifier: ^0.4.0
|
||||
version: 0.4.0
|
||||
'@united-workforce/protocol':
|
||||
specifier: workspace:^
|
||||
version: link:../protocol
|
||||
@@ -80,8 +80,8 @@ importers:
|
||||
packages/agent-hermes:
|
||||
dependencies:
|
||||
'@ocas/core':
|
||||
specifier: ^0.3.0
|
||||
version: 0.3.0
|
||||
specifier: ^0.4.0
|
||||
version: 0.4.0
|
||||
'@united-workforce/protocol':
|
||||
specifier: workspace:^
|
||||
version: link:../protocol
|
||||
@@ -99,8 +99,8 @@ importers:
|
||||
packages/agent-mock:
|
||||
dependencies:
|
||||
'@ocas/core':
|
||||
specifier: ^0.3.0
|
||||
version: 0.3.0
|
||||
specifier: ^0.4.0
|
||||
version: 0.4.0
|
||||
'@united-workforce/protocol':
|
||||
specifier: workspace:^
|
||||
version: link:../protocol
|
||||
@@ -121,11 +121,11 @@ importers:
|
||||
packages/cli:
|
||||
dependencies:
|
||||
'@ocas/core':
|
||||
specifier: ^0.3.0
|
||||
version: 0.3.0
|
||||
specifier: ^0.4.0
|
||||
version: 0.4.0
|
||||
'@ocas/fs':
|
||||
specifier: ^0.3.0
|
||||
version: 0.3.0
|
||||
specifier: ^0.4.0
|
||||
version: 0.4.0
|
||||
'@united-workforce/protocol':
|
||||
specifier: workspace:^
|
||||
version: link:../protocol
|
||||
@@ -231,11 +231,11 @@ importers:
|
||||
packages/eval:
|
||||
dependencies:
|
||||
'@ocas/core':
|
||||
specifier: ^0.3.0
|
||||
version: 0.3.0
|
||||
specifier: ^0.4.0
|
||||
version: 0.4.0
|
||||
'@ocas/fs':
|
||||
specifier: ^0.3.0
|
||||
version: 0.3.0
|
||||
specifier: ^0.4.0
|
||||
version: 0.4.0
|
||||
'@united-workforce/protocol':
|
||||
specifier: workspace:^
|
||||
version: link:../protocol
|
||||
@@ -256,11 +256,11 @@ importers:
|
||||
packages/protocol:
|
||||
dependencies:
|
||||
'@ocas/core':
|
||||
specifier: ^0.3.0
|
||||
version: 0.3.0
|
||||
specifier: ^0.4.0
|
||||
version: 0.4.0
|
||||
'@ocas/fs':
|
||||
specifier: ^0.3.0
|
||||
version: 0.3.0
|
||||
specifier: ^0.4.0
|
||||
version: 0.4.0
|
||||
devDependencies:
|
||||
typescript:
|
||||
specifier: ^5.8.3
|
||||
@@ -275,11 +275,11 @@ importers:
|
||||
packages/util-agent:
|
||||
dependencies:
|
||||
'@ocas/core':
|
||||
specifier: ^0.3.0
|
||||
version: 0.3.0
|
||||
specifier: ^0.4.0
|
||||
version: 0.4.0
|
||||
'@ocas/fs':
|
||||
specifier: ^0.3.0
|
||||
version: 0.3.0
|
||||
specifier: ^0.4.0
|
||||
version: 0.4.0
|
||||
'@united-workforce/protocol':
|
||||
specifier: workspace:^
|
||||
version: link:../protocol
|
||||
@@ -892,11 +892,13 @@ packages:
|
||||
resolution: {integrity: sha512-oGB+UxlgWcgQkgwo8GcEGwemoTFt3FIO9ababBmaGwXIoBKZ+GTy0pP185beGg7Llih/NSHSV2XAs1lnznocSg==}
|
||||
engines: {node: '>= 8'}
|
||||
|
||||
'@ocas/core@0.3.0':
|
||||
resolution: {integrity: sha512-ejDDZbmQkTj2GoJg+cNjXa3eHlQGybW3PrUZlwERBvBFjjnYBLHOG7AQQYM48bI52UiqucafgZjPEYk9SZd6AQ==}
|
||||
'@ocas/core@0.4.0':
|
||||
resolution: {integrity: sha512-6JvHd3nr5GncMOBNaZTf9ZTWou/txONTfZbkrblmgqL/H+YuRj1FfeFY+b1ndUlfwR7AuJ6bvoSxR5RP+AbC0w==}
|
||||
engines: {node: '>=22.5.0'}
|
||||
|
||||
'@ocas/fs@0.3.0':
|
||||
resolution: {integrity: sha512-/6/nICYVJWXeWx2LcPoHHJAFoqXpJoAtvhLKLS0zpkwtsZX3g0D9X6J5soHCV1QS+BOWybuOJ0+W3cB1FBRkZA==}
|
||||
'@ocas/fs@0.4.0':
|
||||
resolution: {integrity: sha512-AQG6dk1YCL1qpSszUWUgEY+LQhYbTv5hXYrs3J2pHAi2/lY615O2cTgjwEeh6JTcrqHsFwiDsDdKIKMpADchZA==}
|
||||
engines: {node: '>=22.5.0'}
|
||||
|
||||
'@open-draft/deferred-promise@2.2.0':
|
||||
resolution: {integrity: sha512-CecwLWx3rhxVQF6V4bAgPS5t+So2sTbPgAzafKkVizyi7tlwpcFpdFqq+wqF2OwNBmqFuu6tOyouTuxgpMfzmA==}
|
||||
@@ -1152,8 +1154,8 @@ packages:
|
||||
'@sec-ant/readable-stream@0.4.1':
|
||||
resolution: {integrity: sha512-831qok9r2t8AlxLko40y2ebgSDhenenCatLVeW/uBtnHPyhHOvG0C7TvfgecV+wHzIm5KUICgzmVpWS+IMEAeg==}
|
||||
|
||||
'@shazhou/proman@0.5.1':
|
||||
resolution: {integrity: sha512-GmFUvd8SAOUW/eaDIEh31pVKSE3XhbgHOZ5vSpX4xS+F8Zl6lAfhgVCjcjRK8w5d43tsH47CVorwyxQcRaJFfA==}
|
||||
'@shazhou/proman@0.6.3':
|
||||
resolution: {integrity: sha512-KguWl1xHrWXx1YWYrWj47v4NRbaQuKCm7Hd7T8dzrqnkM8UL8em3R9rC7GeDzI8YDDfriFeLTX+xb03UHkhTDA==}
|
||||
hasBin: true
|
||||
peerDependencies:
|
||||
'@biomejs/biome': ^2.0.0
|
||||
@@ -3896,16 +3898,16 @@ snapshots:
|
||||
'@nodelib/fs.scandir': 2.1.5
|
||||
fastq: 1.20.1
|
||||
|
||||
'@ocas/core@0.3.0':
|
||||
'@ocas/core@0.4.0':
|
||||
dependencies:
|
||||
ajv: 8.20.0
|
||||
cborg: 4.5.8
|
||||
liquidjs: 10.27.0
|
||||
xxhash-wasm: 1.1.0
|
||||
|
||||
'@ocas/fs@0.3.0':
|
||||
'@ocas/fs@0.4.0':
|
||||
dependencies:
|
||||
'@ocas/core': 0.3.0
|
||||
'@ocas/core': 0.4.0
|
||||
cborg: 4.5.8
|
||||
|
||||
'@open-draft/deferred-promise@2.2.0': {}
|
||||
@@ -4049,7 +4051,7 @@ snapshots:
|
||||
|
||||
'@sec-ant/readable-stream@0.4.1': {}
|
||||
|
||||
'@shazhou/proman@0.5.1(@biomejs/biome@2.4.16)(typescript@5.9.3)(vite@7.3.5(@types/node@25.9.1)(jiti@2.7.0)(lightningcss@1.32.0)(yaml@2.9.0))(vitest@3.2.6(@types/node@25.9.1)(jiti@2.7.0)(lightningcss@1.32.0)(msw@2.14.6(@types/node@25.9.1)(typescript@5.9.3))(yaml@2.9.0))':
|
||||
'@shazhou/proman@0.6.3(@biomejs/biome@2.4.16)(typescript@5.9.3)(vite@7.3.5(@types/node@25.9.1)(jiti@2.7.0)(lightningcss@1.32.0)(yaml@2.9.0))(vitest@3.2.6(@types/node@25.9.1)(jiti@2.7.0)(lightningcss@1.32.0)(msw@2.14.6(@types/node@25.9.1)(typescript@5.9.3))(yaml@2.9.0))':
|
||||
dependencies:
|
||||
'@biomejs/biome': 2.4.16
|
||||
typescript: 5.9.3
|
||||
|
||||
@@ -1,326 +0,0 @@
|
||||
name: solve-issue
|
||||
description: TDD-driven issue resolution adapted for the workflow monorepo with bun + vitest
|
||||
roles:
|
||||
planner:
|
||||
description: Analyzes issue and outputs a TDD test spec
|
||||
goal: You are a planning agent. You analyze Gitea issues and produce a TDD test specification that downstream roles will implement and verify.
|
||||
capabilities:
|
||||
- issue-analysis
|
||||
- planning
|
||||
procedure: 'On first run (no previous steps):
|
||||
|
||||
1. Read the issue and all comments from Gitea using `tea issues <number> -r <owner/repo>`
|
||||
|
||||
2. Look for project conventions files (CLAUDE.md, CONTRIBUTING.md) in the repo
|
||||
|
||||
3. Assess whether the issue has enough information to produce a test spec
|
||||
|
||||
4. If insufficient info: comment on the issue via `echo "..." | tea comment <number> -r <owner/repo>` (skip if you already commented), then output $status=insufficient_info
|
||||
|
||||
5. If sufficient: produce a detailed TDD test spec in markdown covering all scenarios
|
||||
|
||||
|
||||
On subsequent runs (bounced back by tester with fix_spec):
|
||||
|
||||
1. Read the tester''s output from the previous step to understand what''s wrong with the spec
|
||||
|
||||
2. Revise the test spec accordingly
|
||||
|
||||
|
||||
After producing the test spec:
|
||||
|
||||
1. The test spec is stored in CAS automatically by the uwf pipeline (agents do not need to call `ocas put` directly)
|
||||
|
||||
2. Put the hash in frontmatter.plan (required when $status=ready)
|
||||
|
||||
3. Set repoPath to the absolute path of the repository root
|
||||
|
||||
|
||||
|
||||
IMPORTANT: Extract the repo remote (owner/repo) from git:
|
||||
|
||||
```bash
|
||||
|
||||
git remote get-url origin | sed ''s|.*[:/]\([^/]*/[^.]*\).*|\1|''
|
||||
|
||||
```
|
||||
|
||||
Store the result as repoRemote in your frontmatter output so downstream roles can use it for tea/API calls.'
|
||||
output: Output a brief summary of the test spec. Set $status to ready (with plan hash and repoPath) or insufficient_info.
|
||||
frontmatter:
|
||||
oneOf:
|
||||
- properties:
|
||||
$status:
|
||||
const: ready
|
||||
plan:
|
||||
type: string
|
||||
repoPath:
|
||||
type: string
|
||||
repoRemote:
|
||||
type: string
|
||||
required:
|
||||
- $status
|
||||
- plan
|
||||
- repoPath
|
||||
- properties:
|
||||
$status:
|
||||
const: insufficient_info
|
||||
reason:
|
||||
type: string
|
||||
required:
|
||||
- $status
|
||||
- reason
|
||||
developer:
|
||||
description: TDD implementation per test spec
|
||||
goal: You are a developer agent. You implement code changes following TDD — write tests first, then implementation.
|
||||
capabilities:
|
||||
- coding
|
||||
procedure: "IMPORTANT: Always work in a git worktree, NEVER modify the main working directory directly.\nThe repo path and other details are provided in your task prompt.\n\nBefore starting any work,\
|
||||
\ set up an isolated worktree:\n1. cd into the repo path provided in your task prompt\n2. `git fetch origin` to get latest refs\n3. First time (no existing branch):\n - `git worktree add .worktrees/fix/<issue-number>-<short-slug>\
|
||||
\ -b fix/<issue-number>-<short-slug> origin/main`\n - `cd .worktrees/fix/<issue-number>-<short-slug> && bun install`\n4. If bounced back from reviewer or tester (branch already exists):\n - cd\
|
||||
\ into the existing worktree under `.worktrees/fix/<issue-number>-<short-slug>`\n - `git fetch origin && git rebase origin/main`\n5. ALL subsequent work must happen inside the worktree directory.\n\
|
||||
\nThen implement TDD:\n6. Read the test spec from CAS: `ocas get <plan hash>` (find the hash from the planner's output in your task prompt)\n7. If bounced back from reviewer or tester: read the\
|
||||
\ previous role's feedback in your task prompt\n8. Write tests first based on the spec (use vitest)\n9. Implement the code to make tests pass\n10. Ensure `bun run build` passes with no errors\n11.\
|
||||
\ Run `bun test` to verify all tests pass\n\nIf you cannot complete the implementation (e.g. the issue is too complex, blocked by external factors,\nor repeated attempts fail), set $status=failed\
|
||||
\ with a reason.\n"
|
||||
output: List all files changed and provide a summary. Set $status to done (with branch/worktree), or failed (with reason).
|
||||
frontmatter:
|
||||
oneOf:
|
||||
- properties:
|
||||
$status:
|
||||
const: done
|
||||
branch:
|
||||
type: string
|
||||
worktree:
|
||||
type: string
|
||||
repoRemote:
|
||||
type: string
|
||||
required:
|
||||
- $status
|
||||
- branch
|
||||
- worktree
|
||||
- properties:
|
||||
$status:
|
||||
const: failed
|
||||
reason:
|
||||
type: string
|
||||
repoRemote:
|
||||
type: string
|
||||
required:
|
||||
- $status
|
||||
- reason
|
||||
reviewer:
|
||||
description: Code standards compliance check
|
||||
goal: You are a code reviewer. You verify code standards compliance — NOT functionality (that's the tester's job).
|
||||
capabilities:
|
||||
- code-review
|
||||
- static-analysis
|
||||
procedure: 'The worktree path is provided in your task prompt. cd into it first.
|
||||
|
||||
|
||||
Before reviewing, verify the git branch:
|
||||
|
||||
1. Run `git branch --show-current` — confirm the branch name references the issue number being worked on
|
||||
|
||||
2. If the branch doesn''t correspond to the issue, flag it in your output and reject
|
||||
|
||||
|
||||
Then perform code review:
|
||||
|
||||
Hard checks (must all pass):
|
||||
|
||||
3. `bun run build` — no build errors
|
||||
|
||||
4. `bunx biome check` — no lint violations
|
||||
|
||||
5. TypeScript strict mode — no type errors
|
||||
|
||||
|
||||
Soft checks (review against project conventions from CLAUDE.md):
|
||||
|
||||
- Functional-first: functions + types, no classes (except for errors or third-party requirements)
|
||||
|
||||
- Named exports only, no default exports
|
||||
|
||||
- No optional properties (use `T | null` instead of `?:`)
|
||||
|
||||
- Folder module discipline: index.ts only re-exports, types in types.ts
|
||||
|
||||
- Crockford Base32 log tags (8-char, unique per call site)
|
||||
|
||||
- No `console.log` in production code (use createLogger from @united-workforce/util)
|
||||
|
||||
- No dynamic imports in production code
|
||||
|
||||
|
||||
Only review standards compliance. Do NOT test functionality.
|
||||
|
||||
If rejecting, you MUST explain the specific reason in your output.
|
||||
|
||||
'
|
||||
output: Explain your decision with specific file/line references. Set $status to approved (with branch/worktree) or rejected (with comments).
|
||||
frontmatter:
|
||||
oneOf:
|
||||
- properties:
|
||||
$status:
|
||||
const: approved
|
||||
branch:
|
||||
type: string
|
||||
worktree:
|
||||
type: string
|
||||
repoRemote:
|
||||
type: string
|
||||
required:
|
||||
- $status
|
||||
- branch
|
||||
- worktree
|
||||
- properties:
|
||||
$status:
|
||||
const: rejected
|
||||
comments:
|
||||
type: string
|
||||
worktree:
|
||||
type: string
|
||||
repoRemote:
|
||||
type: string
|
||||
required:
|
||||
- $status
|
||||
- comments
|
||||
- worktree
|
||||
tester:
|
||||
description: Functional correctness verification
|
||||
goal: You are a tester agent. You verify that the implementation correctly satisfies every scenario in the test spec.
|
||||
capabilities:
|
||||
- testing
|
||||
procedure: "The worktree path is provided in your task prompt. cd into it first.\n\n1. Run `bun test` for automated test verification\n2. Read the test spec from CAS: `ocas get <plan hash>` (find\
|
||||
\ the hash from the planner step in the thread history)\n3. Verify each scenario in the spec is covered and passing\n4. Determine outcome:\n - passed: all scenarios verified, tests pass\n - fix_code:\
|
||||
\ tests fail or implementation doesn't match spec → send back to developer\n - fix_spec: the spec itself is wrong or incomplete → send back to planner\n"
|
||||
output: Report test results per scenario. Set $status to passed (with branch/worktree), fix_code (with report), or fix_spec (with report).
|
||||
frontmatter:
|
||||
oneOf:
|
||||
- properties:
|
||||
$status:
|
||||
const: passed
|
||||
branch:
|
||||
type: string
|
||||
worktree:
|
||||
type: string
|
||||
repoRemote:
|
||||
type: string
|
||||
required:
|
||||
- $status
|
||||
- branch
|
||||
- worktree
|
||||
- properties:
|
||||
$status:
|
||||
const: fix_code
|
||||
report:
|
||||
type: string
|
||||
repoRemote:
|
||||
type: string
|
||||
worktree:
|
||||
type: string
|
||||
branch:
|
||||
type: string
|
||||
required:
|
||||
- $status
|
||||
- report
|
||||
- properties:
|
||||
$status:
|
||||
const: fix_spec
|
||||
report:
|
||||
type: string
|
||||
repoRemote:
|
||||
type: string
|
||||
worktree:
|
||||
type: string
|
||||
branch:
|
||||
type: string
|
||||
required:
|
||||
- $status
|
||||
- report
|
||||
committer:
|
||||
description: Commits and creates PR
|
||||
goal: You are a committer agent. You create a clean commit and push a PR linking the original issue.
|
||||
capabilities: []
|
||||
procedure: "The worktree path, branch name, and repo remote (owner/repo) are provided in your task prompt.\ncd into the worktree first.\n\nNote: You inherit the developer's worktree and branch. Do NOT\
|
||||
\ create a new branch.\n1. Stage all changes: `git add -A`\n2. Commit with a descriptive message referencing the issue: `git commit -m \"type: description\\n\\nFixes #N\"`\n3. Push the branch: `git\
|
||||
\ push -u origin <branch-name>`\n4. **Verify push succeeded** — run `git ls-remote origin <branch-name>` and confirm it prints a commit hash.\n - If no output or push failed: capture the error, mark hook_failed\n\
|
||||
5. Create a PR using the Gitea API (do NOT use `tea pr create` — it fails in worktrees):\n ```bash\n GITEA_TOKEN=$(cfg get GITEA_TOKEN)\n curl -s -X POST -H \"Authorization: token $GITEA_TOKEN\" -H \"Content-Type: application/json\" \\\n\
|
||||
\ \"https://git.shazhou.work/api/v1/repos/<owner>/<repo>/pulls\" \\\n -d '{\"title\":\"...\",\"body\":\"...\",\"head\":\"<branch>\",\"base\":\"main\"}'\n ```\n - The repo remote (owner/repo format, e.g. \"shazhou/united-workforce\") is given in your task prompt — use it directly.\n\
|
||||
\ - PR body must include: What / Why / Changes / Ref sections, with `Fixes #N` in Ref\n6. **Verify PR was created** — parse the curl response JSON: it must contain a `\"number\"` field. Print the PR URL.\n\
|
||||
\ - If curl returns an error or no number field: capture the response, mark hook_failed\n7. After PR creation, clean up the worktree:\n - cd to the repo root (parent of .worktrees)\n - `git worktree remove <worktree-path>`"
|
||||
output: Include PR URL on success or error log on failure. Set $status to committed (with prUrl) or hook_failed (with error).
|
||||
frontmatter:
|
||||
oneOf:
|
||||
- properties:
|
||||
$status:
|
||||
const: committed
|
||||
prUrl:
|
||||
type: string
|
||||
repoRemote:
|
||||
type: string
|
||||
worktree:
|
||||
type: string
|
||||
branch:
|
||||
type: string
|
||||
required:
|
||||
- $status
|
||||
- prUrl
|
||||
- properties:
|
||||
$status:
|
||||
const: hook_failed
|
||||
error:
|
||||
type: string
|
||||
repoRemote:
|
||||
type: string
|
||||
worktree:
|
||||
type: string
|
||||
branch:
|
||||
type: string
|
||||
required:
|
||||
- $status
|
||||
- error
|
||||
graph:
|
||||
$START:
|
||||
_:
|
||||
role: planner
|
||||
prompt: Analyze the issue and produce an implementation plan.
|
||||
planner:
|
||||
insufficient_info:
|
||||
role: $SUSPEND
|
||||
prompt: "信息不足,需要补充:{{{reason}}}"
|
||||
ready:
|
||||
role: developer
|
||||
prompt: 'Implement the TDD test spec (CAS hash: {{{plan}}}) in repo {{{repoPath}}}. Repo remote: {{{repoRemote}}}.'
|
||||
developer:
|
||||
done:
|
||||
role: reviewer
|
||||
prompt: 'Review branch {{{branch}}} at {{{worktree}}} for code standards compliance. Repo remote: {{{repoRemote}}}.'
|
||||
failed:
|
||||
role: $END
|
||||
prompt: 'Developer failed: {{{reason}}}. Ending workflow.'
|
||||
reviewer:
|
||||
rejected:
|
||||
role: developer
|
||||
prompt: 'Reviewer rejected: {{{comments}}}. Fix the issues in repo {{{worktree}}}. Repo remote: {{{repoRemote}}}.'
|
||||
approved:
|
||||
role: tester
|
||||
prompt: 'Review passed. Run tests on branch {{{branch}}} at {{{worktree}}}. Repo remote: {{{repoRemote}}}.'
|
||||
tester:
|
||||
fix_code:
|
||||
role: developer
|
||||
prompt: 'Tests found code issues: {{{report}}}. Fix and re-submit. Worktree: {{{worktree}}}. Repo remote: {{{repoRemote}}}.'
|
||||
fix_spec:
|
||||
role: planner
|
||||
prompt: 'Tests found spec issues: {{{report}}}. Revise the test spec. Repo remote: {{{repoRemote}}}.'
|
||||
passed:
|
||||
role: committer
|
||||
prompt: 'All tests passed. Commit and push branch {{{branch}}} from {{{worktree}}}. Repo remote (owner/repo): {{{repoRemote}}}.'
|
||||
committer:
|
||||
hook_failed:
|
||||
role: developer
|
||||
prompt: 'Push hook failed: {{{error}}}. Fix and re-submit. Worktree: {{{worktree}}}. Repo remote: {{{repoRemote}}}.'
|
||||
committed:
|
||||
role: $END
|
||||
prompt: 'PR created: {{{prUrl}}}. Workflow complete.'
|
||||
Reference in New Issue
Block a user