xiaoju
d97840cf8d
chore: release cli@0.3.0 util@0.1.3 agent-hermes@0.1.3 agent-claude-code@0.1.2 agent-builtin@0.1.2 agent-mock@0.1.2
CI / check (push) Successful in 1m46s
2026-06-06 00:13:48 +00:00
xiaoju
f989dee85b
fix: bootstrap — remind to restart session after skill install/update
CI / check (pull_request) Successful in 1m42s
- Step 3 (fresh install): warn skills not active until new session
- Step 2 (upgrade): same reminder after regenerating skills
- Step 3 (upgrade): add v0.2.1 migration note for enum → const
Refs #123
2026-06-05 23:48:53 +00:00
xiaoju
68079cc003
fix: unify $status to const-only, drop enum support (#123)
CI / check (pull_request) Successful in 1m43s
- Validator: hasStatusConst/getConstStatuses replace enum checks
- enum in $status is now rejected with clear error message
- All docs/examples/tests migrated from enum to const/oneOf
- bootstrap hello.yaml updated
Fixes #123
2026-06-05 23:31:56 +00:00
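The const-only rule above could be sketched roughly as follows. The names `hasStatusConst` and `getConstStatuses` come from the commit message, but the schema shape (`StatusProp`, `ObjectSchema`), the signatures, and the error text are assumptions made for illustration, not the project's actual validator.

```typescript
// Hypothetical, simplified schema shape; the real validator's types are not shown in this log.
interface StatusProp {
  const?: unknown;
  enum?: unknown[];
}
interface ObjectSchema {
  properties?: { $status?: StatusProp };
  oneOf?: ObjectSchema[];
}

// Collect every $status const, from a flat schema or from each oneOf branch.
function getConstStatuses(schema: ObjectSchema): string[] {
  const branches = schema.oneOf ?? [schema];
  const statuses: string[] = [];
  for (const branch of branches) {
    const prop = branch.properties?.$status;
    if (prop === undefined) continue;
    if (prop.enum !== undefined) {
      // Assumed wording; the log only says enum is rejected with a clear message.
      throw new Error("enum in $status is not supported; use const (or oneOf with const)");
    }
    if (typeof prop.const === "string") statuses.push(prop.const);
  }
  return statuses;
}

function hasStatusConst(schema: ObjectSchema): boolean {
  return getConstStatuses(schema).length > 0;
}

const flat: ObjectSchema = { properties: { $status: { const: "done" } } };
const multi: ObjectSchema = {
  oneOf: [
    { properties: { $status: { const: "approved" } } },
    { properties: { $status: { const: "rejected" } } },
  ],
};
const flatStatuses = getConstStatuses(flat);
const multiStatuses = getConstStatuses(multi);
```

In this sketch a single-exit role is just a flat schema with one `const`, so the same code path covers both the flat and the `oneOf` case.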
xiaoju
1a37928bb9
fix: workflow-authoring docs — type:object + const vs enum clarity (#123)
CI / check (pull_request) Successful in 1m41s
- Add type:object to all frontmatter examples (flat and oneOf)
- Restructure $status section: Multi-exit (oneOf/const) vs Single-exit (flat/enum)
- Add Important rules box clarifying validation requirements
- Restore Custom Fields subsection
Fixes #123
2026-06-05 23:13:54 +00:00
xiaoju
adc3982a4a
fix: bootstrap agent discovery + adapter version independence (#120)
CI / check (pull_request) Successful in 1m42s
- Step 1: detect hermes/claude before choosing adapter
- Adapter versions independent from CLI — install @latest
- ACP verification: hermes acp --help
- Remove uwf-builtin (not ready)
Refs #120
2026-06-05 22:29:35 +00:00
xiaoju
caba82fe36
fix: bootstrap PATH fix guidance — find binary location + update shell config (#118 #1)
CI / check (pull_request) Successful in 1m44s
2026-06-05 16:45:33 +00:00
xiaoju
6aee2ed5ef
fix: bootstrap docs — pnpm/npm parity, adapter order, preset table (#118)
CI / check (pull_request) Successful in 2m27s
- Show pnpm and npm install commands side-by-side
- Clarify adapter must be installed before uwf setup --agent
- Add version verification steps with PATH troubleshooting
- --agent takes adapter command name (uwf-hermes), not npm package
- Preset providers shown as table with default base URLs
- Non-preset providers must specify --base-url manually
Fixes #118 (#2, #3, #4, #5)
2026-06-05 16:41:35 +00:00
xiaoju
7a788a9d90
fix: suppress ExperimentalWarning, PEP 668 guidance, setup help
CI / check (pull_request) Successful in 2m31s
- All 5 CLI bins: shebang --disable-warning=ExperimentalWarning
- Remove NODE_OPTIONS injection from thread.ts spawn (redundant now)
- Bootstrap pip install: venv (recommended) / pipx / source options
- setup --help mentions interactive wizard mode
- Update shebang test to accept -S flag
Fixes #116
2026-06-05 16:12:06 +00:00
xiaoju
fde87b6274
fix: setup UX improvements — adapter check, ENOENT, SQLite warning, VERSION, PATH docs
CI / check (pull_request) Successful in 2m24s
- setup validates adapter binary availability, prints install command if missing
- setup prints 'Config saved to <path> ✓' on success
- spawn ENOENT gives an actionable error with a `which` command
- SQLite ExperimentalWarning suppressed via NODE_OPTIONS
- bootstrap VERSION reads cli package.json (was reading util)
- bootstrap PATH guidance is shell-agnostic
Fixes #114
2026-06-05 15:42:22 +00:00
xiaoju
3be92bfac2
fix: bootstrap adds Step 0 environment pre-flight check
CI / check (pull_request) Successful in 3m44s
- Node.js, pnpm/npm, global bin PATH, hermes CLI checks with FIX instructions
- Agent must pass all checks before proceeding to install
- Install commands changed from npm to pnpm (with npm fallback)
- hermes PATH guidance moved from Step 1 to Step 0
Fixes #112
2026-06-05 14:09:33 +00:00
xiaoju
5450bc1230
fix: workflow-authoring flat schema, bootstrap PATH guidance
CI / check (pull_request) Successful in 2m18s
- #110.3: flat schema example uses enum: [done] instead of bare const
(bare const fails validate-semantic hasStatusEnum check)
- #110.4: bootstrap adds 'which hermes' PATH check and venv guidance
- #110.1: already fixed in rc.1 (inline hello.yaml)
- #110.2: already fixed in rc.1 (capabilities: [] present)
Fixes #110
2026-06-05 11:44:20 +00:00
xiaoju
57ae6d1755
fix: preset base-url auto-fill, bootstrap ACP docs, friendlier errors
CI / check (pull_request) Successful in 2m26s
- #106: uwf setup --provider <preset> now auto-fills --base-url
- #107: bootstrap documents hermes ACP dependency (pip install hermes-agent[acp])
- #107: verify step uses inline hello.yaml instead of missing examples/eval-simple.yaml
- #108: workflow name mismatch error suggests how to fix (rename file or change YAML name)
Fixes #106, Fixes #107, Fixes #108
2026-06-05 11:06:35 +00:00
xiaoju
c5eb8b79d1
fix: expand bootstrap prompt with full onboarding and upgrade guide
CI / check (pull_request) Successful in 2m56s
- Fresh install: CLI + adapter install, uwf setup, skills, e2e verify
- Upgrade: update packages, regenerate skills, migrate workflows
- Explicitly tells agent to ask user for provider/api-key/model
- Lists all available adapters with install commands
- Documents v0.2.0 $START migration
Fixes #104
2026-06-05 10:35:01 +00:00
xiaoju
36a3ca6a08
chore: bump cli@0.2.0, util@0.1.2
CI / check (push) Successful in 2m25s
2026-06-05 10:11:19 +00:00
xiaoju
a47871ec4e
chore: remove unused moderator-reference and yaml-reference
CI / check (pull_request) Successful in 2m1s
These generate* functions were exported from util but never consumed
by any code. Dead exports are a maintenance burden.
Refs #101
2026-06-05 09:44:50 +00:00
xiaoju
fbfd31a042
feat: replace $START._ status with new/resume semantics
CI / check (pull_request) Successful in 2m27s
BREAKING: All workflow YAML files must update $START._ to $START.new + $START.resume.
The resume edge prompt replaces the previously hardcoded resume message.
- evaluate.ts: remove START_ROLE/START_STATUS special case, use $status like all nodes
- thread.ts: resolveEvaluateArgs passes 'new', cmdThreadResume passes 'resume'
- validate.ts: reject '_' everywhere (no longer valid)
- validate-semantic.ts: require 'new' and 'resume' edges on $START
- All workflow YAMLs and test fixtures updated
Fixes #101
2026-06-05 09:30:09 +00:00
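As a rough illustration of the semantic check described above, the sketch below models the `$START` edges as a plain map from status key to target. The `StartEdges` type, the `checkStartEdges` name, and the error wording are hypothetical; only the rules themselves (reject `_`, require `new` and `resume`) come from the commit message.

```typescript
// Hypothetical: edges of the $START node modeled as status key -> target.
type StartEdges = Record<string, unknown>;

// Returns a list of problems; an empty list means the $START edges pass.
function checkStartEdges(edges: StartEdges): string[] {
  const keys = Object.keys(edges);
  const errors: string[] = [];
  if (keys.includes("_")) {
    errors.push("'_' is no longer valid; use 'new' and 'resume'");
  }
  for (const required of ["new", "resume"]) {
    if (!keys.includes(required)) {
      errors.push(`$START is missing the required '${required}' edge`);
    }
  }
  return errors;
}

const migrated = checkStartEdges({ new: "planner", resume: "planner" });
const legacy = checkStartEdges({ _: "planner" });
```

Returning a list instead of throwing on the first problem lets a legacy workflow see all three issues (the stale `_` key plus both missing edges) in one pass.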
xiaoju
a536efee00
fix: simplify prompt subcommands, framework-agnostic bootstrap
CI / check (pull_request) Successful in 3m24s
- `uwf prompt usage` now outputs only the usage skill (was three combined)
- `uwf prompt bootstrap` replaces `setup` with framework-agnostic instructions
- Remove `usage-reference` and `setup` subcommands
- Remove `generateBootstrapReference` from util (moved to cli)
Fixes #99
小橘 🍊 (NEKO Team)
2026-06-05 08:52:35 +00:00
xiaoju
9260d81084
chore: version bump for --version fix
CI / check (push) Successful in 3m2s
agent-hermes@0.1.2 agent-claude-code@0.1.1 agent-builtin@0.1.1
agent-mock@0.1.1 eval@0.1.3 util@0.1.1
小橘 🍊 (NEKO Team)
2026-06-05 08:12:50 +00:00
xiaoju
abeb465f46
fix: acp-client reports own package version, not util VERSION
CI / check (pull_request) Successful in 2m36s
Address review nit from PR #97: clientInfo.version should be
agent-hermes's own version for correct identification under
independent versioning.
小橘 🍊 (NEKO Team)
2026-06-05 07:50:03 +00:00
xiaoju
794f9db568
fix: add --version to adapter CLIs, read VERSION from package.json
CI / check (pull_request) Successful in 3m29s
- All uwf-* adapter CLIs now support --version / -V
- util VERSION constant reads from package.json at runtime
- agent-hermes ACP clientInfo uses dynamic VERSION
小橘 🍊 (NEKO Team)
2026-06-05 07:29:54 +00:00
xiaoju
1cf8f350d0
fix: read eval CLI version from package.json
CI / check (pull_request) Successful in 3m30s
Fixes #95
小橘 🍊 (NEKO Team)
2026-06-05 06:43:27 +00:00
xiaoju
427568a21d
chore: version bump agent-hermes@0.1.1 cli@0.1.1 eval@0.1.2
CI / check (push) Successful in 2m37s
小橘 🍊 (NEKO Team)
2026-06-05 06:29:25 +00:00
xiaoju
8085d1d6e0
fix: read token usage from ACP response instead of DB
CI / check (pull_request) Successful in 3m10s
Tokens (inputTokens, outputTokens) now come from ACP PromptResponse.usage
which is populated synchronously from run_conversation() — no WAL race.
Turns still come from DB before/after snapshot.
Previously both were read from hermes state.db after ACP prompt returned,
but WAL write lag caused incomplete token data (e.g. 235 vs actual 26,080).
Refs #91
2026-06-05 06:08:11 +00:00
xiaoju
825f0c641a
fix: resolve --agent override via config alias before raw command
CI / check (pull_request) Successful in 3m37s
When --agent is passed to uwf thread exec, try config.agents[alias]
first (e.g. 'hermes' → config.agents.hermes = {command: 'uwf-hermes'}),
then fall back to parseAgentOverride for raw command names.
Also change eval CLI default --agent from 'hermes' to 'uwf-hermes'
so it works without config alias lookup.
Refs #91
2026-06-05 04:20:09 +00:00
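The lookup order described above (config alias first, raw command second) might look something like this. `AgentConfig`, `UwfConfig`, and `resolveAgent` are hypothetical names, and the `parseAgentOverride` here is only a stand-in, since the real function's behavior is not shown in this log.

```typescript
// Hypothetical config shape, based on the example in the commit message.
interface AgentConfig {
  command: string;
}
interface UwfConfig {
  agents: Record<string, AgentConfig>;
}

// Stand-in for the real parseAgentOverride: treats the flag value as a raw command name.
function parseAgentOverride(raw: string): AgentConfig {
  return { command: raw };
}

function resolveAgent(config: UwfConfig, agentFlag: string): AgentConfig {
  // Own-property check so flag values like "toString" never match the prototype.
  const aliased = Object.prototype.hasOwnProperty.call(config.agents, agentFlag)
    ? config.agents[agentFlag]
    : undefined;
  // Alias wins; otherwise fall back to the raw command name.
  return aliased ?? parseAgentOverride(agentFlag);
}

const config: UwfConfig = { agents: { hermes: { command: "uwf-hermes" } } };
const viaAlias = resolveAgent(config, "hermes");
const viaRaw = resolveAgent(config, "uwf-mock");
```

Under these assumptions `--agent hermes` resolves through the alias to `uwf-hermes`, while `--agent uwf-mock` has no alias and is used as the command directly.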
xiaoju
81bbe1178f
chore: release @united-workforce/eval@0.1.1
CI / check (push) Successful in 2m45s
2026-06-05 03:02:05 +00:00
xiaoju
a08775896f
fix: frontmatter judge handles parsed object output
CI / check (pull_request) Successful in 2m38s
The extract pipeline stores step output as a JSON object in CAS,
but the frontmatter judge only checked for raw markdown strings.
Now accepts both formats: parsed objects check $status directly,
raw strings go through YAML frontmatter extraction.
Fixes eval frontmatter-compliance scoring 0 on valid outputs.
2026-06-05 02:55:58 +00:00
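A minimal sketch of a judge input handler that accepts both formats is shown below. The `extractStatus` name is hypothetical, and the raw-string path uses a simplified regular expression for a top-level `$status:` line rather than a real YAML parser, so it is an approximation of frontmatter extraction, not the project's implementation.

```typescript
// Returns the $status value from a parsed object or from raw markdown, or null if absent.
function extractStatus(output: unknown): string | null {
  // Parsed object (as stored by the extract pipeline): read $status directly.
  if (typeof output === "object" && output !== null) {
    const status = (output as Record<string, unknown>)["$status"];
    return typeof status === "string" ? status : null;
  }
  if (typeof output === "string") {
    // Frontmatter is the block between a leading '---' line and the next '---' line.
    const block = /^---\r?\n([\s\S]*?)\r?\n---/.exec(output);
    if (block === null) return null;
    // Simplification: match a top-level `$status: value` line instead of parsing YAML.
    const line = /^\$status:\s*["']?([^"'\r\n]+?)["']?\s*$/m.exec(block[1] ?? "");
    return line === null ? null : line[1] ?? null;
  }
  return null;
}

const fromObject = extractStatus({ $status: "done", summary: "ok" });
const fromMarkdown = extractStatus("---\n$status: done\n---\n# Report\n");
const missing = extractStatus("# Report with no frontmatter\n");
```

A step output then counts as valid whenever the result is non-null, regardless of which representation was stored.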
xiaoju
c892b9125b
chore: remove prepublishOnly guards (proman handles release)
CI / check (push) Successful in 2m26s
2026-06-05 02:29:53 +00:00
xiaoju
5edb67b79d
chore: prepare 0.1.0 release
CI / check (pull_request) Successful in 2m12s
- Remove legacy .changeset/ directory (no longer used)
- Add eval package to proman.yaml
- Set eval package to public for npm publishing
2026-06-05 02:21:24 +00:00
xiaoju
63cb4d3645
fix: remove _ single-exit for user roles
CI / check (pull_request) Successful in 3m7s
$START keeps _ (special entry node). All user-defined roles now require
explicit $status enum in frontmatter + matching graph keys.
- moderator: remove UNIT_STATUS fallback, error on missing $status
- validate: reject _ graph keys for non-$START roles
- validate-semantic: remove checkSingleExitRole(), require $status enum
- update all test fixtures to use explicit status values
- fix examples/analyze-topic.yaml
Fixes #86
2026-06-05 02:00:45 +00:00
xiaoju
ae81e4b5ac
feat: eval report, diff, list commands
CI / check (pull_request) Successful in 1m44s
Implement the 3 read commands for eval framework:
- report: read eval-run from CAS, render formatted text
(task, overall, config, judges table, thread ID)
- diff: side-by-side comparison with ▲/▼ delta indicators
and config change markers
- list: scan @uwf/eval/*/latest variables, sort by timestamp desc,
--task filter, --limit pagination
Architecture: pure formatting functions (format.ts) + data access
(read.ts) + thin CLI handlers. Types in types.ts.
11 new tests (formatReport, formatDiff, formatList, selectEntries)
Refs #72
2026-06-05 00:19:25 +00:00
xiaoju
8c26f16716
feat: builtin judges — frontmatter + token-stats (deterministic) + upstream/hallucination (stubs)
CI / check (pull_request) Successful in 1m45s
Implement 4 builtin judges for eval framework:
- frontmatter-compliance: validates YAML frontmatter with $status field,
score = stepsValid / stepsTotal
- token-stats: aggregates Usage from step nodes, always score 1.0
(informational only)
- upstream-consumption: LLM-as-judge stub (score 0, TODO)
- hallucination: LLM-as-judge stub (score 0, TODO)
Infrastructure:
- judge/builtin/read-steps.ts — shell out to uwf step list
- judge/builtin/types.ts — BuiltinJudge, BuiltinJudgeOutput
- runner/collect.ts — dispatch builtin judges by name
9 new tests (frontmatter validation + token aggregation)
Refs #71
2026-06-05 00:09:06 +00:00
xiaoju
fae9e9ed3a
feat: eval run command — prepare, execute, collect pipeline
CI / check (pull_request) Successful in 1m45s
Implement the uwf-eval run <task-dir> command with 3-phase pipeline:
- prepare: read task.yaml, copy fixture/ to temp workdir
- execute: shell out to uwf thread start + exec
- collect: run judges, compute weighted score, store CAS node,
set @uwf/eval/<task>/latest variable
Changes:
- src/runner/ — types, prepare, execute, collect, index
- src/storage/store.ts — createEvalStore(), setEvalLatest()
- src/commands/run.ts — full pipeline wiring with --agent/--model/--count
- 9 new tests (prepare + collect + weighted scoring)
Builtin judges return placeholder score 0 (Phase 1c).
Refs #70
2026-06-04 23:59:21 +00:00
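The collect phase's weighted score is not spelled out in this log. One plausible reading is a weighted mean over judge results, sketched below; the `JudgeResult` shape, the `weightedScore` name, the formula, and the zero-weight fallback are all assumptions.

```typescript
// Hypothetical record for one judge's result.
interface JudgeResult {
  name: string;
  score: number; // expected in [0, 1]
  weight: number; // relative importance, >= 0
}

// Weighted mean of judge scores; returns 0 when the total weight is 0
// so an empty or all-zero-weight judge list does not divide by zero.
function weightedScore(results: JudgeResult[]): number {
  const totalWeight = results.reduce((sum, r) => sum + r.weight, 0);
  if (totalWeight === 0) return 0;
  const weighted = results.reduce((sum, r) => sum + r.score * r.weight, 0);
  return weighted / totalWeight;
}

const overall = weightedScore([
  { name: "frontmatter-compliance", score: 0.5, weight: 3 },
  { name: "token-stats", score: 1, weight: 1 },
]);
// (0.5 * 3 + 1 * 1) / (3 + 1) = 0.625
```

Normalizing by the total weight keeps the overall score in the same range as the individual judge scores, whatever weights a task assigns.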
xiaoju
99619d85db
feat: eval package scaffold with CLI, schemas, types, task loader
CI / check (pull_request) Successful in 1m42s
New package @united-workforce/eval (uwf-eval CLI):
- CLI skeleton: run/report/diff/list subcommands (stubs)
- 5 OCAS schemas: eval-run, judge-frontmatter, judge-upstream,
judge-hallucination, judge-token-stats
- TaskManifest type + parser/validator for task.yaml
- JudgeOutput/JudgeInput types for judge contract
- EvalRunPayload/EvalRunConfig/EvalJudgeRecord storage types
- 19 unit tests: task loader validation + schema definitions
Refs #69
2026-06-04 23:42:16 +00:00
xiaoju
1593dbb521
fix: compute usage as delta for session re-entry
CI / check (pull_request) Successful in 1m41s
On session resume, turns/inputTokens/outputTokens were cumulative
(entire session history) instead of per-step increments. Now we
snapshot metrics before prompt, compare after, and report the delta.
Changes:
- acp-client: add getSessionId() accessor
- hermes: extract snapshotUsage() + computeUsageDelta() pure functions
- hermes: runPrompt/runHermes/continueHermes use before/after snapshots
- 9 new unit tests for usage delta computation
Refs #68
2026-06-04 23:22:16 +00:00
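The before/after snapshot approach above can be sketched as two pure functions. The names `snapshotUsage` and `computeUsageDelta` appear in the commit message, but the signatures, the `UsageSnapshot` shape, and the clamping at zero are assumptions for illustration.

```typescript
// Hypothetical shape; field names follow the usage fields named in this log.
interface UsageSnapshot {
  turns: number;
  inputTokens: number;
  outputTokens: number;
}

// Copy the cumulative session counters so later mutation cannot change the baseline.
function snapshotUsage(session: UsageSnapshot): UsageSnapshot {
  return {
    turns: session.turns,
    inputTokens: session.inputTokens,
    outputTokens: session.outputTokens,
  };
}

// Per-step usage = after - before. Clamping at zero is an assumption,
// guarding against counters that reset between snapshots.
function computeUsageDelta(before: UsageSnapshot, after: UsageSnapshot): UsageSnapshot {
  const diff = (a: number, b: number): number => Math.max(0, a - b);
  return {
    turns: diff(after.turns, before.turns),
    inputTokens: diff(after.inputTokens, before.inputTokens),
    outputTokens: diff(after.outputTokens, before.outputTokens),
  };
}

// A resumed session that already had 3 turns before this step ran.
const before = snapshotUsage({ turns: 3, inputTokens: 1200, outputTokens: 400 });
const after = snapshotUsage({ turns: 5, inputTokens: 2000, outputTokens: 650 });
const delta = computeUsageDelta(before, after);
// delta is { turns: 2, inputTokens: 800, outputTokens: 250 }
```

Keeping both functions pure is what makes them straightforward to unit test without a live session.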
xiaoju
d1c523c442
feat: agent-hermes reads real token counts from session DB
CI / check (pull_request) Successful in 1m41s
- Add inputTokens/outputTokens to HermesSessionJson type
- Query input_tokens, output_tokens from sessions table in loadHermesSessionFromDb
- Update test fixture schema with token columns
- runPrompt now reports real token counts from Hermes state.db
Refs #76, #68
2026-06-04 23:06:52 +00:00
xiaoju
be92cb2dd2
feat: agent-claude-code reports real $usage from stream-json output
CI / check (pull_request) Successful in 1m40s
- Map parsed numTurns, inputTokens, outputTokens, durationMs to Usage type
- Add @united-workforce/protocol dependency + tsconfig reference
- 747 tests pass
Fixes #77
Refs #68
2026-06-04 22:36:44 +00:00
xiaoju
7681e8b8e2
feat: agent-hermes reports $usage (turns + duration)
CI / check (pull_request) Successful in 1m40s
- Count assistant turns from session messages
- Measure wall-clock duration per prompt call
- inputTokens/outputTokens remain 0 (ACP protocol doesn't expose token data yet)
- Both runPrompt and continueHermes report usage
Fixes #76
Refs #68
2026-06-04 22:30:14 +00:00
xiaoju
248ac710fd
feat: agent-mock emits fixed $usage stats
CI / check (pull_request) Successful in 1m41s
- Mock agent returns {turns:1, inputTokens:0, outputTokens:0, duration:0}
- E2E test 1 (linear workflow) asserts usage in CAS step nodes
- 747 tests pass
Fixes #75
Refs #68
2026-06-04 22:19:29 +00:00
xiaomo
172c232e61
Merge pull request 'feat: add $usage field to adapter protocol' (#80) from feat/74-usage-in-protocol into main
CI / check (push) Successful in 1m41s
feat: add $usage field to adapter protocol (#80)
2026-06-04 22:14:12 +00:00
xiaoju
99f40c2488
feat: add $usage field to adapter protocol
CI / check (pull_request) Successful in 2m28s
- Add Usage type to protocol (turns, inputTokens, outputTokens, duration)
- Add usage to StepRecord, StepNodePayload, StepEntry, STEP_NODE_SCHEMA
- Thread usage through util-agent extract pipeline (writeStepNode → persistStep → createAgent)
- All adapters return usage: null as placeholder (mock, hermes, claude-code, builtin)
- 746 tests pass, no breaking changes (usage not in schema required array)
Fixes #74
Refs #68
2026-06-04 15:41:07 +00:00
xingyue
bf489c59a5
fix: agent bin fields point to dist/cli.js instead of src/cli.ts
CI / check (pull_request) Successful in 3m23s
All three agent packages had bin pointing to ./src/cli.ts (bun-era
leftover). Node cannot execute .ts files directly, causing
ERR_MODULE_NOT_FOUND when spawning agents.
Closes #78
2026-06-04 23:25:39 +08:00
xingyue
83bcda60ff
refactor(prompt): rename subcommands and add frontmatter output
CI / check (pull_request) Successful in 3m1s
- Rename: user→usage-reference, author→workflow-authoring, adapter→adapter-developing
- Remove: developer (content lives in CLAUDE.md)
- All prompts output complete SKILL.md with YAML frontmatter
- Setup instructions simplified: uwf prompt bootstrap > SKILL.md
- Remove all bun references, use pnpm/npm
- Fix CLAUDE.md: fixed→independent versioning
- Delete old reference files (user/author/developer/adapter)
Closes #66
2026-06-04 22:46:11 +08:00
xiaoju
3401873051
chore: rebranding cleanup — reset versions to 0.1.0, bun→pnpm in docs
CI / check (pull_request) Successful in 2m49s
- All 9 packages reset to version 0.1.0
- CLAUDE.md: bun→pnpm, fixed→independent versioning, proman commands
- docs/architecture.md: bun→pnpm in toolchain table
- docs/sync-readme.md: bun→pnpm in conventions
2026-06-04 13:05:26 +00:00
xiaoju
18170a4313
refactor: extract validateCount, replace CLI spawn with direct import
CI / check (pull_request) Successful in 2m24s
- Extract validateCount() from cmdThreadExec (throw instead of process.exit)
- 5 validation tests now import validateCount directly (no subprocess)
- Only --help tests still spawn CLI (need Commander output)
- Test time: 1.7s → 475ms
Fixes #61
2026-06-04 12:31:17 +00:00
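The refactor above, where validation throws instead of calling process.exit so tests can import it directly, might look roughly like this. Only the `validateCount` name comes from the commit message; the signature, the positive-integer rule, and the error text are assumptions.

```typescript
// Hypothetical signature: parse and validate a --count value.
// Throwing (rather than exiting the process) lets tests call this in-process.
function validateCount(raw: string): number {
  const count = Number(raw);
  if (!Number.isInteger(count) || count < 1) {
    throw new Error(`--count must be a positive integer, got "${raw}"`);
  }
  return count;
}

// A CLI handler would catch the error and decide how to exit;
// a test simply asserts on the return value or the thrown error.
const parsed = validateCount("3");
let rejected = false;
try {
  validateCount("abc");
} catch {
  rejected = true;
}
```

Because nothing here spawns a subprocess, tests of this kind avoid the CLI startup cost that the commit reports removing.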
xiaoju
8bf5b88172
chore: remove integration tests, clean up CI exclusion
CI / check (pull_request) Successful in 2m41s
Deleted:
- acp-client.integration.test.ts (3 cases)
- resume-e2e.integration.test.ts (1 case, already skipped)
These tests spawn a real hermes CLI and hit a live LLM, so they
belong to the eval layer (#34), not CI.
ACP protocol parsing is already covered by unit test
acp-client.test.ts.
Also removed the --exclude integration/ hack from test:ci.
Fixes #60
2026-06-04 12:19:24 +00:00
xiaoju
66c2e2a79b
fix: use node dist/cli.js instead of npx tsx in thread-step-count tests
CI / check (pull_request) Successful in 3m30s
npx tsx hangs in CI Docker (30s+ timeout). node dist/cli.js runs in <2s.
2026-06-04 11:57:32 +00:00
xiaoju
58b58d511e
fix: add timeout to cmdThreadExec count logic tests
CI / check (pull_request) Failing after 4m17s
2026-06-04 11:48:46 +00:00
xiaoju
596c05bfcc
fix: use node dist/cli.js instead of npx tsx in prompt help test
CI / check (pull_request) Failing after 3m40s
npx tsx fails in CI (tsx not found, npm tries to install it)
2026-06-04 11:32:09 +00:00
xiaoju
d26f54e8ea
fix: biome format + remove unused noConsole suppressions
CI / check (pull_request) Failing after 3m58s
2026-06-04 11:22:46 +00:00
xiaoju
883bd79bcb
fix: add timeout to CI-slow tests + check stderr for help output
CI / check (pull_request) Failing after 1m55s
2026-06-04 11:18:49 +00:00