united-workforce

Author	SHA1	Message	Date
xiaoju	a536efee00	fix: simplify prompt subcommands, framework-agnostic bootstrap. `uwf prompt usage` now outputs only the usage skill (was three combined); `uwf prompt bootstrap` replaces `setup` with framework-agnostic instructions; remove `usage-reference` and `setup` subcommands; remove `generateBootstrapReference` from util (moved to cli). Fixes #99. 小橘 🍊（NEKO Team）	2026-06-05 08:52:35 +00:00
xiaoju	9260d81084	chore: version bump for --version fix. agent-hermes@0.1.2, agent-claude-code@0.1.1, agent-builtin@0.1.1, agent-mock@0.1.1, eval@0.1.3, util@0.1.1. 小橘 🍊（NEKO Team）	2026-06-05 08:12:50 +00:00
xiaoju	abeb465f46	fix: acp-client reports own package version, not util VERSION. Address review nit from PR #97: clientInfo.version should be agent-hermes's own version for correct identification under independent versioning. 小橘 🍊（NEKO Team）	2026-06-05 07:50:03 +00:00
xiaoju	794f9db568	fix: add --version to adapter CLIs, read VERSION from package.json. All uwf-* adapter CLIs now support --version / -V; util VERSION constant reads from package.json at runtime; agent-hermes ACP clientInfo uses dynamic VERSION. 小橘 🍊（NEKO Team）	2026-06-05 07:29:54 +00:00
xiaoju	1cf8f350d0	fix: read eval CLI version from package.json. Fixes #95. 小橘 🍊（NEKO Team）	2026-06-05 06:43:27 +00:00
xiaoju	427568a21d	chore: version bump agent-hermes@0.1.1 cli@0.1.1 eval@0.1.2. 小橘 🍊（NEKO Team）	2026-06-05 06:29:25 +00:00
xiaoju	8085d1d6e0	fix: read token usage from ACP response instead of DB. Tokens (inputTokens, outputTokens) now come from ACP PromptResponse.usage, which is populated synchronously from run_conversation(), so there is no WAL race. Turns still come from the DB before/after snapshot. Previously both were read from hermes state.db after the ACP prompt returned, but WAL write lag caused incomplete token data (e.g. 235 vs actual 26,080). Refs #91	2026-06-05 06:08:11 +00:00
xiaoju	825f0c641a	fix: resolve --agent override via config alias before raw command. When --agent is passed to uwf thread exec, try config.agents[alias] first (e.g. 'hermes' → config.agents.hermes = {command: 'uwf-hermes'}), then fall back to parseAgentOverride for raw command names. Also change the eval CLI default --agent from 'hermes' to 'uwf-hermes' so it works without config alias lookup. Refs #91	2026-06-05 04:20:09 +00:00
xiaoju	81bbe1178f	chore: release @united-workforce/eval@0.1.1	2026-06-05 03:02:05 +00:00
xiaoju	a08775896f	fix: frontmatter judge handles parsed object output. The extract pipeline stores step output as a JSON object in CAS, but the frontmatter judge only checked for raw markdown strings. It now accepts both formats: parsed objects check $status directly, raw strings go through YAML frontmatter extraction. Fixes eval frontmatter-compliance scoring 0 on valid outputs.	2026-06-05 02:55:58 +00:00
xiaoju	c892b9125b	chore: remove prepublishOnly guards (proman handles release)	2026-06-05 02:29:53 +00:00
xiaoju	5edb67b79d	chore: prepare 0.1.0 release. Remove legacy .changeset/ directory (no longer used); add eval package to proman.yaml; set eval package to public for npm publishing.	2026-06-05 02:21:24 +00:00
xiaoju	63cb4d3645	fix: remove _ single-exit for user roles. $START keeps _ (special entry node). All user-defined roles now require an explicit $status enum in frontmatter plus matching graph keys. moderator: remove UNIT_STATUS fallback, error on missing $status; validate: reject _ graph keys for non-$START roles; validate-semantic: remove checkSingleExitRole(), require $status enum; update all test fixtures to use explicit status values; fix examples/analyze-topic.yaml. Fixes #86	2026-06-05 02:00:45 +00:00
xiaoju	ae81e4b5ac	feat: eval report, diff, list commands. Implement the 3 read commands for the eval framework. report: read eval-run from CAS, render formatted text (task, overall, config, judges table, thread ID); diff: side-by-side comparison with ▲/▼ delta indicators and config change markers; list: scan @uwf/eval/*/latest variables, sort by timestamp desc, --task filter, --limit pagination. Architecture: pure formatting functions (format.ts) + data access (read.ts) + thin CLI handlers. Types in types.ts. 11 new tests (formatReport, formatDiff, formatList, selectEntries). Refs #72	2026-06-05 00:19:25 +00:00
xiaoju	8c26f16716	feat: builtin judges - frontmatter + token-stats (deterministic) + upstream/hallucination (stubs). Implement 4 builtin judges for the eval framework. frontmatter-compliance: validates YAML frontmatter with $status field, score = stepsValid / stepsTotal; token-stats: aggregates Usage from step nodes, always score 1.0 (informational only); upstream-consumption: LLM-as-judge stub (score 0, TODO); hallucination: LLM-as-judge stub (score 0, TODO). Infrastructure: judge/builtin/read-steps.ts (shell out to uwf step list); judge/builtin/types.ts (BuiltinJudge, BuiltinJudgeOutput); runner/collect.ts (dispatch builtin judges by name). 9 new tests (frontmatter validation + token aggregation). Refs #71	2026-06-05 00:09:06 +00:00
xiaoju	fae9e9ed3a	feat: eval run command - prepare, execute, collect pipeline. Implement the uwf-eval run <task-dir> command with a 3-phase pipeline. prepare: read task.yaml, copy fixture/ to temp workdir; execute: shell out to uwf thread start + exec; collect: run judges, compute weighted score, store CAS node, set @uwf/eval/<task>/latest variable. Changes: src/runner/ (types, prepare, execute, collect, index); src/storage/store.ts (createEvalStore(), setEvalLatest()); src/commands/run.ts (full pipeline wiring with --agent/--model/--count); 9 new tests (prepare + collect + weighted scoring). Builtin judges return placeholder score 0 (Phase 1c). Refs #70	2026-06-04 23:59:21 +00:00
xiaoju	99619d85db	feat: eval package scaffold with CLI, schemas, types, task loader. New package @united-workforce/eval (uwf-eval CLI). CLI skeleton: run/report/diff/list subcommands (stubs); 5 OCAS schemas: eval-run, judge-frontmatter, judge-upstream, judge-hallucination, judge-token-stats; TaskManifest type + parser/validator for task.yaml; JudgeOutput/JudgeInput types for judge contract; EvalRunPayload/EvalRunConfig/EvalJudgeRecord storage types; 19 unit tests: task loader validation + schema definitions. Refs #69	2026-06-04 23:42:16 +00:00
xiaoju	1593dbb521	fix: compute usage as delta for session re-entry. On session resume, turns/inputTokens/outputTokens were cumulative (entire session history) instead of per-step increments. Now we snapshot metrics before the prompt, compare after, and report the delta. Changes: acp-client: add getSessionId() accessor; hermes: extract snapshotUsage() + computeUsageDelta() pure functions; hermes: runPrompt/runHermes/continueHermes use before/after snapshots; 9 new unit tests for usage delta computation. Refs #68	2026-06-04 23:22:16 +00:00
xiaoju	d1c523c442	feat: agent-hermes reads real token counts from session DB. Add inputTokens/outputTokens to HermesSessionJson type; query input_tokens, output_tokens from sessions table in loadHermesSessionFromDb; update test fixture schema with token columns; runPrompt now reports real token counts from Hermes state.db. Refs #76, #68	2026-06-04 23:06:52 +00:00
xiaoju	be92cb2dd2	feat: agent-claude-code reports real $usage from stream-json output. Map parsed numTurns, inputTokens, outputTokens, durationMs to Usage type; add @united-workforce/protocol dependency + tsconfig reference; 747 tests pass. Fixes #77. Refs #68	2026-06-04 22:36:44 +00:00
xiaoju	7681e8b8e2	feat: agent-hermes reports $usage (turns + duration). Count assistant turns from session messages; measure wall-clock duration per prompt call; inputTokens/outputTokens remain 0 (ACP protocol doesn't expose token data yet); both runPrompt and continueHermes report usage. Fixes #76. Refs #68	2026-06-04 22:30:14 +00:00
xiaoju	248ac710fd	feat: agent-mock emits fixed $usage stats. Mock agent returns {turns:1, inputTokens:0, outputTokens:0, duration:0}; E2E test 1 (linear workflow) asserts usage in CAS step nodes; 747 tests pass. Fixes #75. Refs #68	2026-06-04 22:19:29 +00:00
xiaomo	172c232e61	Merge pull request 'feat: add $usage field to adapter protocol' (#80) from feat/74-usage-in-protocol into main	2026-06-04 22:14:12 +00:00
xiaoju	99f40c2488	feat: add $usage field to adapter protocol. Add Usage type to protocol (turns, inputTokens, outputTokens, duration); add usage to StepRecord, StepNodePayload, StepEntry, STEP_NODE_SCHEMA; thread usage through util-agent extract pipeline (writeStepNode → persistStep → createAgent); all adapters return usage: null as placeholder (mock, hermes, claude-code, builtin); 746 tests pass, no breaking changes (usage not in schema required array). Fixes #74. Refs #68	2026-06-04 15:41:07 +00:00
xingyue	bf489c59a5	fix: agent bin fields point to dist/cli.js instead of src/cli.ts. All three agent packages had bin pointing to ./src/cli.ts (bun-era leftover). Node cannot execute .ts files directly, causing ERR_MODULE_NOT_FOUND when spawning agents. Closes #78	2026-06-04 23:25:39 +08:00
xingyue	83bcda60ff	refactor(prompt): rename subcommands and add frontmatter output. Rename: user→usage-reference, author→workflow-authoring, adapter→adapter-developing; remove: developer (content lives in CLAUDE.md); all prompts output complete SKILL.md with YAML frontmatter; setup instructions simplified: uwf prompt bootstrap > SKILL.md; remove all bun references, use pnpm/npm; fix CLAUDE.md: fixed→independent versioning; delete old reference files (user/author/developer/adapter). Closes #66	2026-06-04 22:46:11 +08:00
xiaoju	3401873051	chore: rebranding cleanup - reset versions to 0.1.0, bun→pnpm in docs. All 9 packages reset to version 0.1.0; CLAUDE.md: bun→pnpm, fixed→independent versioning, proman commands; docs/architecture.md: bun→pnpm in toolchain table; docs/sync-readme.md: bun→pnpm in conventions.	2026-06-04 13:05:26 +00:00
xiaoju	18170a4313	refactor: extract validateCount, replace CLI spawn with direct import. Extract validateCount() from cmdThreadExec (throw instead of process.exit); 5 validation tests now import validateCount directly (no subprocess); only --help tests still spawn CLI (need Commander output); test time: 1.7s → 475ms. Fixes #61	2026-06-04 12:31:17 +00:00
xiaoju	8bf5b88172	chore: remove integration tests, clean up CI exclusion. Deleted: acp-client.integration.test.ts (3 cases); resume-e2e.integration.test.ts (1 case, already skipped). These tests spawn a real hermes CLI and hit a live LLM, so they belong to the eval layer (#34), not CI. ACP protocol parsing is already covered by unit test acp-client.test.ts. Also removed the --exclude integration/ hack from test:ci. Fixes #60	2026-06-04 12:19:24 +00:00
xiaoju	66c2e2a79b	fix: use node dist/cli.js instead of npx tsx in thread-step-count tests. npx tsx hangs in CI Docker (30s+ timeout). node dist/cli.js runs in <2s.	2026-06-04 11:57:32 +00:00
xiaoju	58b58d511e	fix: add timeout to cmdThreadExec count logic tests	2026-06-04 11:48:46 +00:00
xiaoju	596c05bfcc	fix: use node dist/cli.js instead of npx tsx in prompt help test. npx tsx fails in CI (tsx not found, npm tries to install it).	2026-06-04 11:32:09 +00:00
xiaoju	d26f54e8ea	fix: biome format + remove unused noConsole suppressions	2026-06-04 11:22:46 +00:00
xiaoju	883bd79bcb	fix: add timeout to CI-slow tests + check stderr for help output	2026-06-04 11:18:49 +00:00
xiaoju	63454a4cfd	fix: OCAS_DIR → OCAS_HOME in test helpers + exclude integration tests from CI. Remaining OCAS_DIR references caused test isolation failures; agent-hermes integration tests need 'hermes' CLI, skip in CI. Fixes #58	2026-06-04 11:06:42 +00:00
xiaoju	9f5891169e	fix: add missing workflow destructure in current-role test. The createMarker call used shorthand 'workflow' but the variable was not destructured from cmdThreadStart. Fixes #56	2026-06-04 10:56:44 +00:00
xiaomo	f56e24cf82	Merge pull request 'test: expand E2E coverage - suspend, count, mustache, completed resume' (#51) from test/33-more-e2e into main	2026-06-04 09:04:09 +00:00
xiaoju	974c2b8f1b	test: add E2E tests for suspend/resume, --count, mustache, and completed resume (#33). 4 new E2E scenarios: 4. $SUSPEND → resume lifecycle (suspendedRole/suspendMessage metadata); 5. --count 3 runs entire pipeline in one invocation; 6. mustache template variables rendered into edgePrompt; 7. completed thread resume (ouroboros: end → start, CAS chain preserved). Total: 7 E2E scenarios, all passing.	2026-06-04 09:03:01 +00:00
xingyue	dbb7885ffd	chore: fix biome check errors (40 → 0). Auto-fix: import sorting, formatting (17 files); unsafe auto-fix: unused vars, template literals (7 files); manual: nursery/noConsole → suspicious/noConsole suppression; manual: suppress noExcessiveCognitiveComplexity for cmdThreadResume and parseWorkflowPayload; manual: remove unused destructured vars in current-role tests. Closes #48	2026-06-04 16:45:45 +08:00
xiaomo	cd7e4e77ff	Merge pull request 'feat: agent-mock package for deterministic E2E testing (#33)' (#44) from test/33-mock-agent into main	2026-06-04 08:38:51 +00:00
xiaoju	80e8efb05e	test: E2E integration tests with uwf-mock agent (#33). Three scenarios testing the full CLI pipeline: 1. Linear workflow (planner → worker → $END): CAS chain integrity; 2. Loop workflow (developer ↔ reviewer): moderator routing through cycles; 3. Role mismatch detection: agent catches routing bugs. Uses workflow add → thread start → thread exec with uwf-mock, verifying CAS state, thread lifecycle, and error handling. Updated assertions to use getThread().status === 'completed' (aligned with PR #45 unified thread storage). Refs #33	2026-06-04 08:06:22 +00:00
xiaoju	75fb752a82	feat: add agent-mock package for deterministic E2E testing (#33). New package @united-workforce/agent-mock (uwf-mock CLI): reads pre-scripted outputs from a YAML mock data file (--mock-data); counts existing CAS chain steps to determine step index; validates expected role matches actual moderator routing; stores minimal detail node in CAS for valid step refs; zero LLM, instant execution, 100% deterministic. Usage in config.yaml: agents: {mock: {command: uwf-mock, args: ["--mock-data", "./fixtures/scenario.yaml"]}}. Refs #33	2026-06-04 08:00:07 +00:00
xingyue	06af1dc668	fix: resolve workflow from CAS chain in collectCompletedThreads. Instead of hardcoding workflow as an empty string for completed/cancelled threads, use resolveWorkflowFromHead to get the actual workflow hash from the CAS chain, consistent with active thread handling. Closes #46	2026-06-04 15:35:08 +08:00
xiaomo	bbea89c067	Merge pull request 'refactor: unified thread storage + resume completed threads' (#45) from refactor/39-unified-thread-storage into main	2026-06-04 07:25:56 +00:00
xingyue	bda3e3a861	feat(cli): resume completed threads (ouroboros: end → start). uwf thread resume now supports completed threads: evaluates workflow graph from $START to find first role; clears completed state (status → idle, completedAt → null); builds resume prompt with supplement context; full CAS chain preserved for rich context. Suspended resume behavior unchanged. Cancelled/idle threads still rejected. 425 tests pass. Part of #39, closes #43	2026-06-04 15:13:47 +08:00
xingyue	ca7b68ca5f	refactor(cli): unify thread storage, remove history prefix. store.ts: all threads in @uwf/thread/* with status tag; remove HISTORY_VAR_PREFIX, ThreadHistoryLine, deleteThread; add loadActiveThreads, loadHistoryThreads, completeThread; add migrateHistoryVarsToThreadVars migration; thread.ts: replace deleteThread+addHistoryEntry with completeThread; shared.ts: remove findHistoryEntry fallback; update all tests for unified storage model. 422 tests pass. Part of #39, closes #41, closes #42	2026-06-04 15:01:20 +08:00
xingyue	23e2ae9eb4	refactor(protocol): add status + completedAt to ThreadIndexEntry. ThreadIndexEntry gains status and completedAt fields; createThreadIndexEntry defaults to idle/null; normalizeThreadIndexEntry backward-compat defaults; updateThreadHead resets to idle (ouroboros resume prep); markThreadSuspended sets status=suspended; new markThreadCompleted(entry, status, now) function; serializeThreadIndexEntry includes new fields. Part of #39, closes #40	2026-06-04 14:42:14 +08:00
xiaoju	6b7636b088	refactor: unify env vars (UWF_HOME, OCAS_HOME) + env only in CLI (#37). Breaking changes: UWF_STORAGE_ROOT → UWF_HOME; WORKFLOW_STORAGE_ROOT removed (no fallback); OCAS_DIR → OCAS_HOME (aligned with ocas CLI). Library functions no longer read process.env: util-agent/storage.ts: resolveStorageRoot(override), getGlobalCasDir(override); agent-hermes: isResumeDisabled(flag) pure function, CLI reads env; agent-claude-code: CLI reads CLAUDE_MODEL and passes to agent. Fixes #37	2026-06-04 05:12:05 +00:00
xiaoju	06e959e7a5	test: add unit tests for core modules (#35). Cover high-priority untested modules: util: base32, result, refs-field, storage-root, log-tag; util-agent: storage (normalizeWorkflowConfig, resolveStorageRoot), run (parseArgv); agent-builtin: tools (read-file, write-file, run-command), session, detail. 627 → 719 tests (+92), all passing. Refs #35	2026-06-04 04:35:33 +00:00
xiaoju	90893b0aa8	chore: integrate proman scaffold. Add proman.yaml with 8 packages in dependency order; add @shazhou/proman as devDependency; replace root scripts: build/test/check/format → proman commands; keep typecheck script for standalone tsc --build. Fixes #27	2026-06-04 03:10:14 +00:00
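
Commit 1593dbb521 describes computing per-step usage by snapshotting metrics before the prompt, comparing after, and reporting the delta. A minimal sketch of that idea follows. The `UsageSnapshot` shape, the function signature, the clamp at zero, and the sample numbers are all assumptions for illustration; only the name `computeUsageDelta` and the three counters come from the log, and the real implementation in agent-hermes may differ.

```typescript
// Hypothetical snapshot shape; the counter names come from the commit message.
interface UsageSnapshot {
  turns: number;
  inputTokens: number;
  outputTokens: number;
}

// Per-step usage = cumulative counters after the prompt minus the counters
// captured before it. Clamping at zero is an assumption that guards against
// a counter that was reset between the two snapshots.
function computeUsageDelta(before: UsageSnapshot, after: UsageSnapshot): UsageSnapshot {
  const diff = (a: number, b: number): number => Math.max(0, a - b);
  return {
    turns: diff(after.turns, before.turns),
    inputTokens: diff(after.inputTokens, before.inputTokens),
    outputTokens: diff(after.outputTokens, before.outputTokens),
  };
}

// Illustrative values only: a resumed session that already had 4 turns.
const before: UsageSnapshot = { turns: 4, inputTokens: 12000, outputTokens: 900 };
const after: UsageSnapshot = { turns: 6, inputTokens: 15500, outputTokens: 1300 };
const delta = computeUsageDelta(before, after);
console.log(delta); // { turns: 2, inputTokens: 3500, outputTokens: 400 }
```

Keeping the computation a pure function of two snapshots, as the commit message suggests, lets it be unit-tested without a live session.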

668 Commits