Compare commits

...

5 Commits

Author SHA1 Message Date
xingyue 6324122168 fix(uwf-hermes): read turn data from Hermes session file instead of ACP stream
CI / test (pull_request) Failing after 12m19s
Closes #519

The ACP protocol's tool_call updates only carry a display title (not a
structured tool name) and omit rawInput for polished tools, making the
reconstructed messages unusable for step read/show.

Changes:
- hermes.ts: storePromptResult reads ~/.hermes/sessions/session_{id}.json
  via loadHermesSession() instead of using ACP-reconstructed messages
- acp-client.ts: strip message/tool-call collection logic, keep only
  text chunk accumulation for final response extraction
- step.ts: TurnData gains role + toolCalls fields; formatTurnBody
  renders them in step read markdown output
- README: document sessions.write_json_snapshots requirement
2026-05-25 22:21:03 +08:00
xiaoju 25b411f22e Merge pull request 'fix(validate): support enum-based multi-exit frontmatter schemas' (#518) from fix/enum-multi-exit-validation into main
CI / test (push) Failing after 15m56s
2026-05-25 13:23:10 +00:00
xiaoju 54dc8fcb39 fix(validate): support enum-based multi-exit and upgrade json-cas to 0.5.3
CI / test (pull_request) Failing after 5m55s
Two fixes for 'uwf thread start solve-issue' failures:

1. json-cas 0.5.2 (npm) was missing oneOf in ALLOWED_SCHEMA_KEYS.
   Published json-cas 0.5.3 with the fix, bumped all packages to ^0.5.3.

2. Semantic validator only recognized oneOf-based multi-exit schemas.
   Roles using $status with enum (e.g. enum: [approved, rejected]) were
   incorrectly treated as single-exit. Added isEnumMultiExit() support.

Changes:
- validate-semantic.ts: isEnumMultiExit(), getEnumStatuses(), checkSingleExitMustache()
- All package.json: @uncaged/json-cas ^0.5.2 → ^0.5.3
- validate-semantic.test.ts: 5 new enum multi-exit tests (Suite 3b)
- solve-issue-tea-worktree.test.ts: updated for current workflow structure

小橘 🍊
2026-05-25 13:13:51 +00:00
xiaoju a40e1bb847 fix(cli): remove Chinese text from uwf --help description
CI / test (pull_request) Failing after 33s
Remove the annotation line entirely — the layer names are self-explanatory.

小橘 🍊
2026-05-25 12:42:10 +00:00
xiaomo 2c8bcf7996 Merge pull request 'feat(setup): auto-discover and configure agents during uwf setup' (#515) from feat/424-setup-agent-discovery into main
CI / test (push) Failing after 1m34s
2026-05-25 12:39:34 +00:00
17 changed files with 536 additions and 310 deletions
+175 -69
View File
@@ -1,92 +1,198 @@
name: "solve-issue"
description: "End-to-end issue resolution"
description: "TDD-driven issue resolution for small, focused changes. Loop protection relies on engine maxRounds."
roles:
planner:
description: "Creates implementation plan"
goal: "You are a planning agent. You analyze issues and create implementation plans grounded in the actual codebase."
description: "Analyzes issue and outputs a TDD test spec"
goal: "You are a planning agent. You analyze Gitea issues and produce a TDD test specification that downstream roles will implement and verify."
capabilities:
- issue-analysis
- planning
- file-read
- shell
procedure: |
1. Locate the code repository:
- Check if the current working directory is the repo (look for package.json, .git, etc.)
- If the task mentions a repo URL, clone it first.
- If this is a new project, create the repo and note the path.
2. Explore the codebase — read the relevant source files mentioned in the issue. Understand the current architecture, types, and conventions (check CLAUDE.md, CONTRIBUTING.md, .cursor/rules/).
3. Identify which files need changes and what the changes should be, with specific code references.
4. Output the plan with:
- `repoPath`: absolute path to the repository root
- `plan`: detailed implementation plan with file paths and code references
- `steps`: concrete action items for the developer
output: |
Provide repoPath, plan summary, and steps in the frontmatter.
The plan MUST reference actual file paths and code structures you found by reading the source.
Do NOT guess — if you haven't read a file, read it before referencing it.
On first run (no previous steps):
1. Read the issue and all comments from Gitea using `tea issues <number> -r <owner/repo>`
2. Look for project conventions files (CLAUDE.md, CONTRIBUTING.md, .cursor/rules/) in the repo
3. Assess whether the issue has enough information to produce a test spec
4. If insufficient info: comment on the issue via `echo "..." | tea comment <number> -r <owner/repo>` (skip if you already commented), then output $status=insufficient_info
5. If sufficient: produce a detailed TDD test spec in markdown covering all scenarios
On subsequent runs (bounced back by tester with fix_spec):
1. Read the tester's output from the previous step to understand what's wrong with the spec
2. Revise the test spec accordingly
After producing the test spec:
1. Store it via `uwf cas put-text "<markdown content>"` and capture the returned hash
2. Put the hash in frontmatter.plan (required when $status=ready)
3. Set repoPath to the absolute path of the repository root
output: "Output a brief summary of the test spec. Set $status to ready (with plan hash and repoPath) or insufficient_info."
frontmatter:
type: object
properties:
$status:
enum: ["_"]
repoPath:
type: string
plan:
type: string
required: [$status, repoPath, plan]
oneOf:
- properties:
$status: { const: "ready" }
plan: { type: string }
repoPath: { type: string }
required: [$status, plan, repoPath]
- properties:
$status: { const: "insufficient_info" }
required: [$status]
developer:
description: "Implements code changes"
goal: "You are a developer agent. You implement code changes according to plans."
description: "TDD implementation per test spec"
goal: "You are a developer agent. You implement code changes following TDD — write tests first, then implementation."
capabilities:
- file-edit
- shell
- testing
- coding
procedure: |
1. Read the planner's output to get the repoPath and implementation plan.
2. cd to the repoPath before making any changes.
3. Create a feature branch from the default branch.
4. Implement the plan — write code, tests, and ensure existing tests pass.
5. Run the project's lint/check command (e.g. `bun run check`, `npm run lint`) and fix ALL errors before proceeding. Build and lint must pass cleanly.
6. Commit your changes with a descriptive message referencing the issue.
output: "List all files changed and provide a summary of the implementation."
IMPORTANT: Always work in a git worktree, NEVER modify the main working directory directly.
The repo path and other details are provided in your task prompt.
Before starting any work, set up an isolated worktree:
1. cd into the repo path provided in your task prompt
2. `git fetch origin` to get latest refs
3. First time (no existing branch):
- `git worktree add .worktrees/fix/<issue-number>-<short-slug> -b fix/<issue-number>-<short-slug> origin/main`
- `cd .worktrees/fix/<issue-number>-<short-slug> && bun install`
4. If bounced back from reviewer or tester (branch already exists):
- cd into the existing worktree under `.worktrees/fix/<issue-number>-<short-slug>`
- `git fetch origin && git rebase origin/main`
5. ALL subsequent work must happen inside the worktree directory.
Then implement TDD:
6. Read the test spec from CAS: `uwf cas get <plan hash>` (find the hash from the planner's output in your task prompt)
7. If bounced back from reviewer or tester: read the previous role's feedback in your task prompt
8. Write tests first based on the spec
9. Implement the code to make tests pass
10. Ensure `bun run build` passes with no errors
11. Run `bun test` to verify all tests pass
If you cannot complete the implementation (e.g. the issue is too complex, blocked by external factors,
or repeated attempts fail), set $status=failed with a reason.
output: "List all files changed and provide a summary. Set $status to done (with branch/worktree), or failed (with reason)."
frontmatter:
type: object
properties:
$status:
enum: ["_"]
filesChanged:
type: array
items:
type: string
summary:
type: string
required: [$status, filesChanged, summary]
oneOf:
- properties:
$status: { const: "done" }
branch: { type: string }
worktree: { type: string }
required: [$status, branch, worktree]
- properties:
$status: { const: "failed" }
reason: { type: string }
required: [$status, reason]
reviewer:
description: "Reviews code changes"
goal: "You are a code reviewer. You review implementations for correctness and quality."
description: "Code standards compliance check"
goal: "You are a code reviewer. You verify code standards compliance — NOT functionality (that's the tester's job)."
capabilities:
- code-review
- static-analysis
procedure: |
1. Run hard checks first — build (`bun run build` or equivalent) and lint (`bunx biome check .` or equivalent) MUST pass with zero errors. If they fail, reject immediately.
2. Then review code quality: correctness, edge cases, naming, project conventions (CLAUDE.md), and test coverage.
3. Only reject for hard check failures or genuine correctness/security issues. Style suggestions alone should not block approval.
output: "Approve or reject with detailed comments explaining your decision."
The worktree path is provided in your task prompt. cd into it first.
Before reviewing, verify the git branch:
1. Run `git branch --show-current` — confirm the branch name references the issue number being worked on
2. If the branch doesn't correspond to the issue, flag it in your output and reject
Then perform code review:
Hard checks (must all pass):
3. `bun run build` — no build errors
4. `bunx biome check` — no lint violations
5. TypeScript strict mode — no type errors
Soft checks (review against project conventions if CLAUDE.md / .cursor/rules exist):
- Naming conventions, module boundaries, code style
- No `console.log` in production code
- No dynamic imports in production code
Only review standards compliance. Do NOT test functionality.
If rejecting, you MUST explain the specific reason in your output.
output: "Explain your decision with specific file/line references. Set $status to approved (with branch/worktree) or rejected (with comments)."
frontmatter:
type: object
properties:
$status:
enum: ["approved", "rejected"]
comments:
type: string
required: [$status, comments]
oneOf:
- properties:
$status: { const: "approved" }
branch: { type: string }
worktree: { type: string }
required: [$status, branch, worktree]
- properties:
$status: { const: "rejected" }
comments: { type: string }
worktree: { type: string }
required: [$status, comments, worktree]
tester:
description: "Functional correctness verification"
goal: "You are a tester agent. You verify that the implementation correctly satisfies every scenario in the test spec."
capabilities:
- testing
procedure: |
The worktree path is provided in your task prompt. cd into it first.
1. Run `bun test` for automated test verification
2. Read the test spec from CAS: `uwf cas get <plan hash>` (find the hash from the planner step in the thread history)
3. Verify each scenario in the spec is covered and passing
4. Determine outcome:
- passed: all scenarios verified, tests pass
- fix_code: tests fail or implementation doesn't match spec → send back to developer
- fix_spec: the spec itself is wrong or incomplete → send back to planner
output: "Report test results per scenario. Set $status to passed (with branch/worktree), fix_code (with report), or fix_spec (with report)."
frontmatter:
oneOf:
- properties:
$status: { const: "passed" }
branch: { type: string }
worktree: { type: string }
required: [$status, branch, worktree]
- properties:
$status: { const: "fix_code" }
report: { type: string }
required: [$status, report]
- properties:
$status: { const: "fix_spec" }
report: { type: string }
required: [$status, report]
committer:
description: "Commits and creates PR"
goal: "You are a committer agent. You create a clean commit and push a PR linking the original issue."
capabilities: []
procedure: |
The worktree path, branch name, and repo info are provided in your task prompt.
cd into the worktree first.
Note: You inherit the developer's worktree and branch. Do NOT create a new branch.
1. Stage all changes: `git add -A`
2. Commit with a descriptive message referencing the issue: `git commit -m "type: description\n\nFixes #N"`
3. Push the branch: `git push -u origin <branch-name>`
- If push hook fails: capture the error log in your output, mark hook_failed
4. On push success: create a PR via `tea pr create --repo <owner/repo> --title "..." --description "..."`
- Extract owner/repo from: `git remote get-url origin | sed 's/.*[:/]\([^/]*\/[^.]*\).*/\1/'`
- PR description must include: What / Why / Changes / Ref sections, with `Fixes #N` in Ref
- On tea failure: capture stderr/stdout, include PR details for manual creation, mark hook_failed
5. After PR creation, clean up the worktree:
- cd to the repo root (parent of .worktrees)
- `git worktree remove <worktree-path>`
output: "Include PR URL on success or error log on failure. Set $status to committed (with prUrl) or hook_failed (with error)."
frontmatter:
oneOf:
- properties:
$status: { const: "committed" }
prUrl: { type: string }
required: [$status, prUrl]
- properties:
$status: { const: "hook_failed" }
error: { type: string }
required: [$status, error]
graph:
$START:
_: { role: "planner", prompt: "Analyze the issue described in the task and produce a detailed implementation plan." }
_: { role: "planner", prompt: "Analyze the issue and produce an implementation plan." }
planner:
_: { role: "developer", prompt: "Implement the plan from the planner. Write code, tests, and ensure existing tests pass." }
insufficient_info: { role: "$END", prompt: "Insufficient information to proceed; end the workflow." }
ready: { role: "developer", prompt: "Implement the TDD test spec (CAS hash: {{{plan}}}) in repo {{{repoPath}}}." }
developer:
_: { role: "reviewer", prompt: "Review the developer's implementation against the plan for correctness and quality." }
done: { role: "reviewer", prompt: "Review branch {{{branch}}} at {{{worktree}}} for code standards compliance." }
failed: { role: "$END", prompt: "Developer failed: {{{reason}}}. Ending workflow." }
reviewer:
approved: { role: "$END", prompt: "The review passed. Complete the workflow." }
rejected: { role: "developer", prompt: "The reviewer rejected your implementation. Read their feedback and fix the issues: {{{comments}}}" }
rejected: { role: "developer", prompt: "Reviewer rejected: {{{comments}}}. Fix the issues in repo {{{worktree}}}." }
approved: { role: "tester", prompt: "Review passed. Run tests on branch {{{branch}}} at {{{worktree}}}." }
tester:
fix_code: { role: "developer", prompt: "Tests found code issues: {{{report}}}. Fix and re-submit." }
fix_spec: { role: "planner", prompt: "Tests found spec issues: {{{report}}}. Revise the test spec." }
passed: { role: "committer", prompt: "All tests passed. Commit and push branch {{{branch}}} from {{{worktree}}}." }
committer:
hook_failed: { role: "developer", prompt: "Push hook failed: {{{error}}}. Fix and re-submit." }
committed: { role: "$END", prompt: "PR created: {{{prUrl}}}. Workflow complete." }
+2 -2
View File
@@ -11,8 +11,8 @@
"uwf": "./src/cli.ts"
},
"dependencies": {
"@uncaged/json-cas": "^0.5.2",
"@uncaged/json-cas-fs": "^0.5.2",
"@uncaged/json-cas": "^0.5.3",
"@uncaged/json-cas-fs": "^0.5.3",
"@uncaged/workflow-protocol": "workspace:^",
"@uncaged/workflow-util": "workspace:^",
"@uncaged/workflow-util-agent": "workspace:^",
@@ -453,7 +453,78 @@ describe("step read", () => {
expect(markdown).not.toContain("## Turn");
});
test("test 6: turn content with special characters", async () => {
test("test 6: displays role and tool calls in turn body", async () => {
const casDir = join(tmpDir, "cas");
await mkdir(casDir, { recursive: true });
const store = createFsStore(casDir);
const schemas = await registerUwfSchemas(store);
const detailSchemas = await registerDetailSchemas(store);
const workflowHash = await store.put(schemas.workflow, {
name: "test-wf",
description: "desc",
roles: {
worker: {
description: "Worker",
goal: "You are a worker agent.",
capabilities: [],
procedure: "Do the work.",
output: "Summarize the work.",
meta: "placeholder00" as CasRef,
},
},
conditions: {},
graph: {},
});
const startHash = await store.put(schemas.startNode, {
workflow: workflowHash,
prompt: "Test task",
});
const outputHash = await store.put(schemas.workflow, {
name: "out",
description: "",
roles: {},
conditions: {},
graph: {},
});
const turnHash = await store.put(detailSchemas.turn, {
index: 0,
role: "assistant",
content: "",
toolCalls: [{ name: "terminal", args: '{"command":"echo hi"}' }],
reasoning: null,
});
const detailHash = await store.put(detailSchemas.detail, {
sessionId: "session-1",
model: "test-model",
duration: 1000,
turnCount: 1,
turns: [turnHash],
});
const stepHash = await store.put(schemas.stepNode, {
start: startHash,
prev: null,
role: "worker",
output: outputHash,
detail: detailHash,
agent: "uwf-hermes",
startedAtMs: 1000000000000,
completedAtMs: 1000000005000,
});
const markdown = await cmdStepRead(tmpDir, stepHash, 4000);
expect(markdown).toContain("**Turn role:** assistant");
expect(markdown).toContain("**terminal**");
expect(markdown).toContain('{"command":"echo hi"}');
});
test("test 7: turn content with special characters", async () => {
const casDir = join(tmpDir, "cas");
await mkdir(casDir, { recursive: true });
const store = createFsStore(casDir);
@@ -250,6 +250,110 @@ describe("Suite 3: Status-Edge Consistency", () => {
});
});
describe("Suite 3b: Enum-Based Multi-Exit", () => {
test("3b.1 enum multi-exit passes with matching graph keys", () => {
const wf = makeWorkflow();
wf.roles.reviewer = {
...wf.roles.reviewer,
frontmatter: {
type: "object",
properties: {
$status: { enum: ["approved", "rejected"] },
comments: { type: "string" },
},
required: ["$status", "comments"],
} as unknown as string,
};
wf.graph.reviewer = {
approved: { role: "$END", prompt: "Done" },
rejected: { role: "writer", prompt: "Fix: {{{comments}}}" },
};
const errors = validateWorkflow(wf);
expect(errors).toEqual([]);
});
test("3b.2 enum multi-exit with extra graph key", () => {
const wf = makeWorkflow();
wf.roles.reviewer = {
...wf.roles.reviewer,
frontmatter: {
type: "object",
properties: {
$status: { enum: ["approved", "rejected"] },
comments: { type: "string" },
},
required: ["$status", "comments"],
} as unknown as string,
};
wf.graph.reviewer = {
approved: { role: "$END", prompt: "Done" },
rejected: { role: "writer", prompt: "Fix" },
timeout: { role: "$END", prompt: "Timed out" },
};
const errors = validateWorkflow(wf);
expect(errors.some((e) => e.includes("extra status keys: timeout"))).toBe(true);
});
test("3b.3 enum multi-exit with missing graph key", () => {
const wf = makeWorkflow();
wf.roles.reviewer = {
...wf.roles.reviewer,
frontmatter: {
type: "object",
properties: {
$status: { enum: ["approved", "rejected"] },
comments: { type: "string" },
},
required: ["$status", "comments"],
} as unknown as string,
};
wf.graph.reviewer = {
approved: { role: "$END", prompt: "Done" },
};
const errors = validateWorkflow(wf);
expect(errors.some((e) => e.includes("missing status keys: rejected"))).toBe(true);
});
test("3b.4 enum with single value (not multi-exit) treated as single-exit", () => {
const wf = makeWorkflow();
wf.roles.writer = {
...wf.roles.writer,
frontmatter: {
type: "object",
properties: {
$status: { enum: ["_"] },
plan: { type: "string" },
},
required: ["$status", "plan"],
} as unknown as string,
};
wf.graph.writer = { _: { role: "reviewer", prompt: "Review: {{{plan}}}" } };
const errors = validateWorkflow(wf);
expect(errors).toEqual([]);
});
test("3b.5 enum multi-exit mustache var not in frontmatter", () => {
const wf = makeWorkflow();
wf.roles.reviewer = {
...wf.roles.reviewer,
frontmatter: {
type: "object",
properties: {
$status: { enum: ["approved", "rejected"] },
comments: { type: "string" },
},
required: ["$status", "comments"],
} as unknown as string,
};
wf.graph.reviewer = {
approved: { role: "$END", prompt: "Done: {{{nonexistent}}}" },
rejected: { role: "writer", prompt: "Fix: {{{comments}}}" },
};
const errors = validateWorkflow(wf);
expect(errors.some((e) => e.includes("nonexistent") && e.includes("not found"))).toBe(true);
});
});
describe("Suite 4: Mustache Template Variable Existence", () => {
test("4.1 prompt references nonexistent variable (single-exit)", () => {
const wf = makeWorkflow();
+1 -2
View File
@@ -55,8 +55,7 @@ program
.description(
"Stateless workflow CLI\n\n" +
"Four-layer architecture:\n" +
" workflow → thread → step → turn\n" +
" 模板定义 执行实例 单步结果 agent内部交互",
" workflow → thread → step → turn",
)
.version(pkg.default.version, "-V, --version");
program.option("--format <fmt>", "Output format: json or yaml", "json");
+79 -16
View File
@@ -19,9 +19,16 @@ import {
walkChain,
} from "./shared.js";
type TurnToolCall = {
name: string;
args: string;
};
type TurnData = {
index: number;
role: string;
content: string;
toolCalls: TurnToolCall[] | null;
};
/**
@@ -128,8 +135,74 @@ function loadStepDetail(store: BootstrapCapableStore, detailRef: CasRef): Record
return detailNode.payload as Record<string, unknown>;
}
function parseTurnToolCalls(raw: unknown): TurnToolCall[] | null {
if (!Array.isArray(raw) || raw.length === 0) {
return null;
}
const calls: TurnToolCall[] = [];
for (const entry of raw) {
if (typeof entry !== "object" || entry === null) {
continue;
}
const record = entry as Record<string, unknown>;
const name = record.name;
const args = record.args;
if (typeof name === "string") {
calls.push({ name, args: typeof args === "string" ? args : "" });
}
}
return calls.length > 0 ? calls : null;
}
function formatTurnBody(turn: TurnData): string {
const parts: string[] = [];
parts.push(`**Turn role:** ${turn.role}`);
if (turn.toolCalls !== null) {
for (const call of turn.toolCalls) {
const argsSuffix = call.args !== "" ? `\`${call.args}\`` : "";
parts.push(`- **${call.name}**${argsSuffix}`);
}
}
if (turn.content !== "") {
if (parts.length > 0) {
parts.push("");
}
parts.push(turn.content);
}
return parts.join("\n");
}
function parseSingleTurn(
store: BootstrapCapableStore,
turnRef: unknown,
fallbackIndex: number,
): TurnData | null {
if (typeof turnRef !== "string") {
return null;
}
const turnNode = store.get(turnRef as CasRef);
if (turnNode === null) {
return null;
}
const turn = turnNode.payload as Record<string, unknown>;
const content = typeof turn.content === "string" ? turn.content : "";
const toolCalls = parseTurnToolCalls(turn.toolCalls);
if (content === "" && toolCalls === null) {
return null;
}
return {
index: typeof turn.index === "number" ? turn.index : fallbackIndex,
role: typeof turn.role === "string" ? turn.role : "assistant",
content,
toolCalls,
};
}
/**
* Load all turn nodes from CAS store and extract content
* Load all turn nodes from CAS store and extract display fields
*/
function loadTurnData(store: BootstrapCapableStore, turns: unknown): TurnData[] {
if (!Array.isArray(turns) || turns.length === 0) {
@@ -138,19 +211,9 @@ function loadTurnData(store: BootstrapCapableStore, turns: unknown): TurnData[]
const turnData: TurnData[] = [];
for (const turnRef of turns) {
if (typeof turnRef !== "string") {
continue;
}
const turnNode = store.get(turnRef as CasRef);
if (turnNode === null) {
continue;
}
const turn = turnNode.payload as Record<string, unknown>;
if (typeof turn.content === "string") {
turnData.push({
index: typeof turn.index === "number" ? turn.index : turnData.length,
content: turn.content,
});
const parsed = parseSingleTurn(store, turnRef, turnData.length);
if (parsed !== null) {
turnData.push(parsed);
}
}
return turnData;
@@ -168,7 +231,7 @@ function selectTurnsForQuota(turnData: TurnData[], availableQuota: number): Turn
if (turn === undefined) continue;
const turnHeader = `## Turn ${turn.index + 1}\n\n`;
const turnBlock = turnHeader + turn.content;
const turnBlock = turnHeader + formatTurnBody(turn);
const separatorCost = selectedTurns.length > 0 ? 2 : 0;
const addCost = turnBlock.length + separatorCost;
@@ -213,7 +276,7 @@ function formatStepMarkdown(
parts.push("");
parts.push(`## Turn ${turn.index + 1}`);
parts.push("");
parts.push(turn.content);
parts.push(formatTurnBody(turn));
}
return parts.join("\n");
@@ -23,6 +23,28 @@ function isOneOfSchema(fm: unknown): fm is SchemaObj & { oneOf: SchemaObj[] } {
return Array.isArray(obj.oneOf);
}
/** Check if a frontmatter schema uses enum-based multi-exit ($status with multiple enum values). */
function isEnumMultiExit(fm: unknown): boolean {
if (typeof fm !== "object" || fm === null) return false;
const obj = fm as SchemaObj;
const props = obj.properties as Record<string, SchemaObj> | undefined;
if (!props?.$status) return false;
const statusDef = props.$status;
if (!Array.isArray(statusDef.enum)) return false;
// Filter out "_" (wildcard) — if remaining values > 1, it's multi-exit
const statuses = (statusDef.enum as string[]).filter((s) => s !== "_");
return statuses.length > 1;
}
/** Extract status values from an enum-based $status field. */
function getEnumStatuses(fm: SchemaObj): string[] {
const props = fm.properties as Record<string, SchemaObj> | undefined;
if (!props?.$status) return [];
const statusDef = props.$status;
if (!Array.isArray(statusDef.enum)) return [];
return (statusDef.enum as string[]).filter((s) => s !== "_");
}
/** Get property names from a schema object. */
function getPropertyNames(schema: SchemaObj): Set<string> {
const props = schema.properties;
@@ -230,6 +252,11 @@ function checkRoleConsistency(payload: WorkflowPayload, errors: string[]): void
checkOneOfDiscriminant(roleName, variants, statuses, errors);
checkMultiExitEdges(roleName, graphKeys, new Set(statuses), errors);
checkMultiExitMustache(roleName, graphEntry, variants, errors);
} else if (isEnumMultiExit(fm)) {
const statuses = getEnumStatuses(fm as SchemaObj);
checkMultiExitEdges(roleName, graphKeys, new Set(statuses), errors);
// For enum-based schemas, mustache vars come from the flat properties
checkSingleExitMustache(roleName, graphEntry, fm as SchemaObj, errors);
} else {
checkSingleExitRole(roleName, graphKeys, graphEntry, fm as SchemaObj | null, errors);
}
@@ -265,6 +292,27 @@ function checkSingleExitRole(
}
}
/** Check mustache vars in all edge prompts against flat schema properties. */
function checkSingleExitMustache(
roleName: string,
graphEntry: Record<string, { role: string; prompt: string }>,
fm: SchemaObj,
errors: string[],
): void {
const propNames = getPropertyNames(fm);
for (const [status, target] of Object.entries(graphEntry)) {
const vars = extractMustacheVars(target.prompt);
for (const v of vars) {
if (v === "$status") continue;
if (!propNames.has(v)) {
errors.push(
`prompt variable "${v}" in graph[${roleName}][${status}] not found in role "${roleName}" frontmatter`,
);
}
}
}
}
/**
* Validate a parsed WorkflowPayload for semantic correctness.
* Returns an array of error messages. Empty array = valid.
+1 -1
View File
@@ -22,7 +22,7 @@
"test:ci": "bun test"
},
"dependencies": {
"@uncaged/json-cas": "^0.5.2",
"@uncaged/json-cas": "^0.5.3",
"@uncaged/workflow-util-agent": "workspace:^",
"@uncaged/workflow-util": "workspace:^"
},
@@ -22,7 +22,7 @@
"test:ci": "bun test"
},
"dependencies": {
"@uncaged/json-cas": "^0.5.2",
"@uncaged/json-cas": "^0.5.3",
"@uncaged/workflow-util-agent": "workspace:^",
"@uncaged/workflow-util": "workspace:^"
},
+9
View File
@@ -18,6 +18,15 @@ bun add -g @uncaged/workflow-agent-hermes
Requires the `hermes` CLI on `PATH`.
Hermes must write session JSON snapshots so `uwf-hermes` can load structured tool calls from disk. Add this to `~/.hermes/config.yaml`:
```yaml
sessions:
write_json_snapshots: true
```
Session files are stored at `~/.hermes/sessions/session_{sessionId}.json`.
## CLI Usage
Invoked by `uwf thread step` (not typically run directly):
@@ -2,7 +2,7 @@ import { afterEach, beforeEach, describe, expect, it } from "bun:test";
import { HermesAcpClient } from "../src/acp-client.js";
describe("handleSessionUpdate — helper extraction", () => {
describe("handleSessionUpdate — text extraction", () => {
let client: HermesAcpClient;
beforeEach(() => {
@@ -14,80 +14,41 @@ describe("handleSessionUpdate — helper extraction", () => {
});
it("agent_message_chunk accumulates text in messageChunks", () => {
(client as any).handleSessionUpdate({
(
client as unknown as { handleSessionUpdate: (u: Record<string, unknown>) => void }
).handleSessionUpdate({
sessionUpdate: "agent_message_chunk",
content: { type: "text", text: "hello" },
});
(client as any).handleSessionUpdate({
(
client as unknown as { handleSessionUpdate: (u: Record<string, unknown>) => void }
).handleSessionUpdate({
sessionUpdate: "agent_message_chunk",
content: { type: "text", text: " world" },
});
expect((client as any).messageChunks).toEqual(["hello", " world"]);
expect((client as unknown as { messageChunks: string[] }).messageChunks).toEqual([
"hello",
" world",
]);
});
it("agent_thought_chunk accumulates reasoning in reasoningChunks", () => {
(client as any).handleSessionUpdate({
sessionUpdate: "agent_thought_chunk",
content: { type: "text", text: "thinking" },
it("non-text chunks and other update types are ignored", () => {
(
client as unknown as { handleSessionUpdate: (u: Record<string, unknown>) => void }
).handleSessionUpdate({
sessionUpdate: "agent_message_chunk",
content: { type: "image", text: "ignored" },
});
expect((client as any).reasoningChunks).toEqual(["thinking"]);
});
it("tool_call registers a pending tool and flushes message chunks", () => {
(client as any).messageChunks = ["pre-tool text"];
(client as any).handleSessionUpdate({
(
client as unknown as { handleSessionUpdate: (u: Record<string, unknown>) => void }
).handleSessionUpdate({
sessionUpdate: "tool_call",
title: "Bash",
rawInput: { command: "ls" },
toolCallId: "tc-1",
});
expect((client as any).pendingTools.get("tc-1")).toEqual({
name: "Bash",
args: JSON.stringify({ command: "ls" }),
});
expect((client as any).messageChunks).toEqual([]);
expect((client as any).messages).toHaveLength(1);
expect((client as any).messages[0].role).toBe("assistant");
});
it("tool_call_update completed pushes tool_call and tool messages", () => {
(client as any).pendingTools.set("tc-2", { name: "Read", args: '{"path":"/foo"}' });
(client as any).handleSessionUpdate({
sessionUpdate: "tool_call_update",
status: "completed",
toolCallId: "tc-2",
rawOutput: "file contents",
});
const msgs = (client as any).messages as Array<{
role: string;
tool_calls: unknown;
content: string | null;
}>;
expect(msgs).toHaveLength(2);
expect(msgs[0].role).toBe("assistant");
expect(msgs[0].tool_calls).toEqual([
{ function: { name: "Read", arguments: '{"path":"/foo"}' } },
]);
expect(msgs[1].role).toBe("tool");
expect(msgs[1].content).toBe("file contents");
expect((client as any).pendingTools.has("tc-2")).toBe(false);
});
it("tool_call_update with non-string rawOutput JSON-stringifies it", () => {
(client as any).pendingTools.set("tc-3", { name: "Fetch", args: "" });
(client as any).handleSessionUpdate({
sessionUpdate: "tool_call_update",
status: "completed",
toolCallId: "tc-3",
rawOutput: { html: "<p>page</p>" },
});
const msgs = (client as any).messages as Array<{ role: string; content: string | null }>;
expect(msgs[1].content).toBe(JSON.stringify({ html: "<p>page</p>" }));
});
it("unknown updateType is a no-op", () => {
(client as any).handleSessionUpdate({ sessionUpdate: "unknown_type", data: {} });
expect((client as any).messages).toHaveLength(0);
expect((client as any).messageChunks).toHaveLength(0);
(
client as unknown as { handleSessionUpdate: (u: Record<string, unknown>) => void }
).handleSessionUpdate({ sessionUpdate: "unknown_type", data: {} });
expect((client as unknown as { messageChunks: string[] }).messageChunks).toHaveLength(0);
});
});
@@ -53,23 +53,4 @@ describe("HermesAcpClient", () => {
},
{ timeout: 2 * 60 * 1000 },
);
// TODO(#435): flaky — depends on live LLM; mock or move to integration suite
it.skip(
"prompt() collects structured messages including tool calls",
async () => {
await client.connect(process.cwd());
const result = await client.prompt("Run this command: echo TOOL_DETAIL_TEST");
expect(result.messages.length).toBeGreaterThan(0);
const toolMessages = result.messages.filter((m) => m.role === "tool");
expect(toolMessages.length).toBeGreaterThan(0);
const toolContent = toolMessages[0]?.content ?? "";
expect(toolContent).toContain("TOOL_DETAIL_TEST");
const assistantWithTools = result.messages.filter(
(m) => m.role === "assistant" && m.tool_calls !== null,
);
expect(assistantWithTools.length).toBeGreaterThan(0);
},
{ timeout: 2 * 60 * 1000 },
);
});
+1 -1
View File
@@ -22,7 +22,7 @@
"test:ci": "bun test __tests__/*.test.ts"
},
"dependencies": {
"@uncaged/json-cas": "^0.5.2",
"@uncaged/json-cas": "^0.5.3",
"@uncaged/workflow-util-agent": "workspace:^",
"@uncaged/workflow-protocol": "workspace:^",
"@uncaged/workflow-util": "workspace:^"
@@ -2,8 +2,6 @@ import type { ChildProcess } from "node:child_process";
import { spawn } from "node:child_process";
import { createInterface } from "node:readline";
import type { HermesSessionMessage } from "./types.js";
const HERMES_COMMAND = "hermes";
const PROTOCOL_VERSION = 1;
@@ -19,16 +17,9 @@ type PendingRequest = {
reject: (reason: Error) => void;
};
/** Tracks in-flight tool calls so we can build complete messages when they finish. */
type PendingToolCall = {
name: string;
args: string;
};
export type AcpPromptResult = {
text: string;
sessionId: string;
messages: HermesSessionMessage[];
};
export class HermesAcpClient {
@@ -38,11 +29,8 @@ export class HermesAcpClient {
private stderrBuffer = "";
private pending = new Map<number, PendingRequest>();
// Message collection state
/** Accumulated assistant text chunks from agent_message_chunk updates. */
private messageChunks: string[] = [];
private reasoningChunks: string[] = [];
private pendingTools = new Map<string, PendingToolCall>();
messages: HermesSessionMessage[] = [];
/** Spawn hermes acp, initialize, create session */
async connect(cwd: string): Promise<string> {
@@ -84,14 +72,13 @@ export class HermesAcpClient {
return sessionId;
}
/** Send prompt and collect full response text + structured messages. */
/** Send prompt and collect final assistant text from ACP stream chunks. */
async prompt(text: string): Promise<AcpPromptResult> {
if (this.sessionId === null) {
throw new Error("Not connected — call connect() first");
}
this.messageChunks = [];
this.reasoningChunks = [];
const response = await this.sendRequest("session/prompt", {
sessionId: this.sessionId,
@@ -104,28 +91,9 @@ export class HermesAcpClient {
);
}
// Flush any trailing assistant text that wasn't followed by a tool call.
this.flushAssistantMessage();
// Extract the final assistant text from collected messages.
let finalText = "";
for (let i = this.messages.length - 1; i >= 0; i--) {
const msg = this.messages[i];
if (
msg !== undefined &&
msg.role === "assistant" &&
msg.content !== null &&
msg.content.trim() !== ""
) {
finalText = msg.content;
break;
}
}
return {
text: finalText,
text: this.messageChunks.join(""),
sessionId: this.sessionId,
messages: this.messages,
};
}
@@ -242,94 +210,16 @@ export class HermesAcpClient {
}
}
// ---- Session update → structured messages ----
private handleSessionUpdate(update: Record<string, unknown>): void {
switch (update.sessionUpdate as string) {
case "agent_message_chunk":
this.handleAgentMessageChunk(update);
break;
case "agent_thought_chunk":
this.handleAgentThoughtChunk(update);
break;
case "tool_call":
this.handleToolCall(update);
break;
case "tool_call_update":
this.handleToolCallUpdate(update);
break;
default:
break;
if (update.sessionUpdate !== "agent_message_chunk") {
return;
}
}
private handleAgentMessageChunk(update: Record<string, unknown>): void {
const content = update.content as { type?: string; text?: string } | undefined;
if (content?.type === "text" && typeof content.text === "string") {
this.messageChunks.push(content.text);
}
}
private handleAgentThoughtChunk(update: Record<string, unknown>): void {
const content = update.content as { type?: string; text?: string } | undefined;
if (content?.type === "text" && typeof content.text === "string") {
this.reasoningChunks.push(content.text);
}
}
private handleToolCall(update: Record<string, unknown>): void {
const title = (update.title as string) ?? "";
const rawInput = update.rawInput;
const args = rawInput !== undefined && rawInput !== null ? JSON.stringify(rawInput) : "";
const toolCallId = update.toolCallId as string;
this.pendingTools.set(toolCallId, { name: title, args });
this.flushAssistantMessage();
}
private handleToolCallUpdate(update: Record<string, unknown>): void {
const status = update.status as string | undefined;
if (status !== "completed" && status !== "failed") return;
const toolCallId = update.toolCallId as string;
const pending = this.pendingTools.get(toolCallId);
const toolName = pending?.name ?? toolCallId;
const rawOutput = update.rawOutput;
const outputStr =
rawOutput !== undefined && rawOutput !== null
? typeof rawOutput === "string"
? rawOutput
: JSON.stringify(rawOutput)
: "";
this.messages.push({
role: "assistant",
content: null,
reasoning: null,
tool_calls: [{ function: { name: toolName, arguments: pending?.args ?? "" } }],
});
this.messages.push({
role: "tool",
content: outputStr,
reasoning: null,
tool_calls: null,
});
this.pendingTools.delete(toolCallId);
}
/** Flush any accumulated text/reasoning into an assistant message. */
private flushAssistantMessage(): void {
const text = this.messageChunks.join("");
const reasoning = this.reasoningChunks.join("");
if (text !== "" || reasoning !== "") {
this.messages.push({
role: "assistant",
content: text || null,
reasoning: reasoning || null,
tool_calls: null,
});
}
this.messageChunks = [];
this.reasoningChunks = [];
}
private rejectAll(err: Error): void {
for (const handler of this.pending.values()) {
handler.reject(err);
+10 -16
View File
@@ -10,7 +10,7 @@ import {
import { HermesAcpClient } from "./acp-client.js";
import { getCachedSessionId, isResumeDisabled, setCachedSessionId } from "./session-cache.js";
import { storeHermesSessionDetail } from "./session-detail.js";
import { loadHermesSession, storeHermesSessionDetail } from "./session-detail.js";
const log = createLogger({ sink: { kind: "stderr" } });
@@ -49,17 +49,11 @@ export function buildHermesPrompt(ctx: AgentContext): string {
return parts.join("\n");
}
async function storePromptResult(
store: Store,
sessionId: string,
messages: Awaited<ReturnType<HermesAcpClient["prompt"]>>["messages"],
): Promise<{ detailHash: string }> {
const session = {
session_id: sessionId,
model: "",
session_start: new Date().toISOString(),
messages,
};
async function storePromptResult(store: Store, sessionId: string): Promise<{ detailHash: string }> {
const session = await loadHermesSession(sessionId);
if (session === null) {
throw new Error(`Hermes session file not found: ${sessionId}`);
}
return storeHermesSessionDetail(store, session);
}
@@ -116,8 +110,8 @@ export function createHermesAgent(): () => Promise<void> {
async function runPrompt(ctx: AgentContext, useContinuation: boolean): Promise<AgentRunResult> {
const effectiveCtx = useContinuation ? ctx : { ...ctx, isFirstVisit: true };
const fullPrompt = buildHermesPrompt(effectiveCtx);
const { text, sessionId, messages } = await client.prompt(fullPrompt);
const { detailHash } = await storePromptResult(ctx.store, sessionId, messages);
const { text, sessionId } = await client.prompt(fullPrompt);
const { detailHash } = await storePromptResult(ctx.store, sessionId);
if (!isResumeDisabled()) {
await setCachedSessionId(ctx.threadId, ctx.role, sessionId);
@@ -152,8 +146,8 @@ export function createHermesAgent(): () => Promise<void> {
): Promise<AgentRunResult> {
// Client is already connected from runHermes — same ACP session,
// so the agent sees the full conversation history (crucial for retries).
const { text, sessionId, messages } = await client.prompt(message);
const { detailHash } = await storePromptResult(store, sessionId, messages);
const { text, sessionId } = await client.prompt(message);
const { detailHash } = await storePromptResult(store, sessionId);
return { output: text, detailHash, sessionId };
}
+2 -2
View File
@@ -15,8 +15,8 @@
}
},
"dependencies": {
"@uncaged/json-cas": "^0.5.2",
"@uncaged/json-cas-fs": "^0.5.2"
"@uncaged/json-cas": "^0.5.3",
"@uncaged/json-cas-fs": "^0.5.3"
},
"devDependencies": {
"typescript": "^5.8.3"
+2 -2
View File
@@ -19,8 +19,8 @@
"test:ci": "bun test"
},
"dependencies": {
"@uncaged/json-cas": "^0.5.2",
"@uncaged/json-cas-fs": "^0.5.2",
"@uncaged/json-cas": "^0.5.3",
"@uncaged/json-cas-fs": "^0.5.3",
"@uncaged/workflow-protocol": "workspace:^",
"@uncaged/workflow-util": "workspace:^",
"dotenv": "^16.6.1",