Commit Graph

652 Commits

Author SHA1 Message Date
xiaoju 99619d85db feat: eval package scaffold with CLI, schemas, types, task loader
CI / check (pull_request) Successful in 1m42s
New package @united-workforce/eval (uwf-eval CLI):

- CLI skeleton: run/report/diff/list subcommands (stubs)
- 5 OCAS schemas: eval-run, judge-frontmatter, judge-upstream,
  judge-hallucination, judge-token-stats
- TaskManifest type + parser/validator for task.yaml
- JudgeOutput/JudgeInput types for judge contract
- EvalRunPayload/EvalRunConfig/EvalJudgeRecord storage types
- 19 unit tests: task loader validation + schema definitions

Refs #69
2026-06-04 23:42:16 +00:00
xiaoju 1593dbb521 fix: compute usage as delta for session re-entry
CI / check (pull_request) Successful in 1m41s
On session resume, turns/inputTokens/outputTokens were cumulative
(entire session history) instead of per-step increments. Now we
snapshot metrics before prompt, compare after, and report the delta.

Changes:
- acp-client: add getSessionId() accessor
- hermes: extract snapshotUsage() + computeUsageDelta() pure functions
- hermes: runPrompt/runHermes/continueHermes use before/after snapshots
- 9 new unit tests for usage delta computation

Refs #68
2026-06-04 23:22:16 +00:00
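The before/after snapshot approach described in commit 1593dbb521 can be sketched roughly as follows. The commit names snapshotUsage() and computeUsageDelta() but does not show their signatures, so the UsageSnapshot shape, the parameter lists, and the clamp-at-zero behaviour below are assumptions made for illustration, not the repository's code.

```typescript
// Hypothetical shape: field names follow the commit message
// (turns, inputTokens, outputTokens); the real type may differ.
type UsageSnapshot = {
  turns: number;
  inputTokens: number;
  outputTokens: number;
};

// Copy the cumulative counters so that later changes to the session
// object cannot alter the "before" reading.
function snapshotUsage(session: UsageSnapshot): UsageSnapshot {
  return {
    turns: session.turns,
    inputTokens: session.inputTokens,
    outputTokens: session.outputTokens,
  };
}

// Report only what this step added. A missing "before" snapshot is
// treated as all zeros (fresh session). Clamping at zero is an
// assumption here, guarding against counters that were reset.
function computeUsageDelta(
  before: UsageSnapshot | null,
  after: UsageSnapshot,
): UsageSnapshot {
  const base = before ?? { turns: 0, inputTokens: 0, outputTokens: 0 };
  const diff = (a: number, b: number): number => Math.max(0, a - b);
  return {
    turns: diff(after.turns, base.turns),
    inputTokens: diff(after.inputTokens, base.inputTokens),
    outputTokens: diff(after.outputTokens, base.outputTokens),
  };
}

// Usage: a resumed session already had 3 turns before this prompt.
const before = snapshotUsage({ turns: 3, inputTokens: 1200, outputTokens: 400 });
const after = snapshotUsage({ turns: 5, inputTokens: 2000, outputTokens: 650 });
const stepUsage = computeUsageDelta(before, after);
```

Keeping both helpers pure (no session or database access inside them) is what makes them straightforward to unit test, which matches the commit's note about extracting pure functions.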
xiaoju d1c523c442 feat: agent-hermes reads real token counts from session DB
CI / check (pull_request) Successful in 1m41s
- Add inputTokens/outputTokens to HermesSessionJson type
- Query input_tokens, output_tokens from sessions table in loadHermesSessionFromDb
- Update test fixture schema with token columns
- runPrompt now reports real token counts from Hermes state.db

Refs #76, #68
2026-06-04 23:06:52 +00:00
xiaoju be92cb2dd2 feat: agent-claude-code reports real $usage from stream-json output
CI / check (pull_request) Successful in 1m40s
- Map parsed numTurns, inputTokens, outputTokens, durationMs to Usage type
- Add @united-workforce/protocol dependency + tsconfig reference
- 747 tests pass

Fixes #77
Refs #68
2026-06-04 22:36:44 +00:00
xiaoju 7681e8b8e2 feat: agent-hermes reports $usage (turns + duration)
CI / check (pull_request) Successful in 1m40s
- Count assistant turns from session messages
- Measure wall-clock duration per prompt call
- inputTokens/outputTokens remain 0 (ACP protocol doesn't expose token data yet)
- Both runPrompt and continueHermes report usage

Fixes #76
Refs #68
2026-06-04 22:30:14 +00:00
xiaoju 248ac710fd feat: agent-mock emits fixed $usage stats
CI / check (pull_request) Successful in 1m41s
- Mock agent returns {turns:1, inputTokens:0, outputTokens:0, duration:0}
- E2E test 1 (linear workflow) asserts usage in CAS step nodes
- 747 tests pass

Fixes #75
Refs #68
2026-06-04 22:19:29 +00:00
xiaomo 172c232e61 Merge pull request 'feat: add $usage field to adapter protocol' (#80) from feat/74-usage-in-protocol into main
CI / check (push) Successful in 1m41s
feat: add $usage field to adapter protocol (#80)
2026-06-04 22:14:12 +00:00
xiaoju 99f40c2488 feat: add $usage field to adapter protocol
CI / check (pull_request) Successful in 2m28s
- Add Usage type to protocol (turns, inputTokens, outputTokens, duration)
- Add usage to StepRecord, StepNodePayload, StepEntry, STEP_NODE_SCHEMA
- Thread usage through util-agent extract pipeline (writeStepNode → persistStep → createAgent)
- All adapters return usage: null as placeholder (mock, hermes, claude-code, builtin)
- 746 tests pass, no breaking changes (usage not in schema required array)

Fixes #74
Refs #68
2026-06-04 15:41:07 +00:00
xingyue bf489c59a5 fix: agent bin fields point to dist/cli.js instead of src/cli.ts
CI / check (pull_request) Successful in 3m23s
All three agent packages had bin pointing to ./src/cli.ts (bun-era
leftover). Node cannot execute .ts files directly, causing
ERR_MODULE_NOT_FOUND when spawning agents.

Closes #78
2026-06-04 23:25:39 +08:00
xingyue 83bcda60ff refactor(prompt): rename subcommands and add frontmatter output
CI / check (pull_request) Successful in 3m1s
- Rename: user→usage-reference, author→workflow-authoring, adapter→adapter-developing
- Remove: developer (content lives in CLAUDE.md)
- All prompts output complete SKILL.md with YAML frontmatter
- Setup instructions simplified: uwf prompt bootstrap > SKILL.md
- Remove all bun references, use pnpm/npm
- Fix CLAUDE.md: fixed→independent versioning
- Delete old reference files (user/author/developer/adapter)

Closes #66
2026-06-04 22:46:11 +08:00
xiaoju 3401873051 chore: rebranding cleanup — reset versions to 0.1.0, bun→pnpm in docs
CI / check (pull_request) Successful in 2m49s
- All 9 packages reset to version 0.1.0
- CLAUDE.md: bun→pnpm, fixed→independent versioning, proman commands
- docs/architecture.md: bun→pnpm in toolchain table
- docs/sync-readme.md: bun→pnpm in conventions
2026-06-04 13:05:26 +00:00
xiaoju 18170a4313 refactor: extract validateCount, replace CLI spawn with direct import
CI / check (pull_request) Successful in 2m24s
- Extract validateCount() from cmdThreadExec (throw instead of process.exit)
- 5 validation tests now import validateCount directly (no subprocess)
- Only --help tests still spawn CLI (need Commander output)
- Test time: 1.7s → 475ms

Fixes #61
2026-06-04 12:31:17 +00:00
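The refactor in commit 18170a4313 moves validation into a function that throws instead of calling process.exit, so tests can import it directly. A minimal sketch of that pattern is below; the real validateCount() signature and error text are not shown in the log, so the string parameter, the positive-integer rule, and the message are assumptions.

```typescript
// Hypothetical validator: accepts the raw --count value as a string and
// returns the parsed number, throwing on anything that is not a
// positive integer. The actual rules in the repository may differ.
function validateCount(raw: string): number {
  const n = Number(raw);
  if (!Number.isInteger(n) || n < 1) {
    throw new Error(`--count must be a positive integer, got "${raw}"`);
  }
  return n;
}

// Because the function throws rather than exiting the process, a test
// can call it in-process and inspect the error, with no subprocess.
function didThrow(raw: string): boolean {
  try {
    validateCount(raw);
    return false;
  } catch {
    return true;
  }
}

const parsed = validateCount("3");
const rejected = ["0", "-1", "2.5", "abc", ""].map(didThrow);
```

The CLI entry point can still catch the error and exit with a non-zero code; only the decision to terminate moves out of the validation logic.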
xiaoju 8bf5b88172 chore: remove integration tests, clean up CI exclusion
CI / check (pull_request) Successful in 2m41s
Deleted:
- acp-client.integration.test.ts (3 cases)
- resume-e2e.integration.test.ts (1 case, already skipped)

These tests spawn a real hermes CLI and hit live LLM,
belonging to the eval layer (#34), not CI.

ACP protocol parsing is already covered by unit test
acp-client.test.ts.

Also removed the --exclude integration/ hack from test:ci.

Fixes #60
2026-06-04 12:19:24 +00:00
xiaoju 66c2e2a79b fix: use node dist/cli.js instead of npx tsx in thread-step-count tests
CI / check (pull_request) Successful in 3m30s
npx tsx hangs in CI Docker (30s+ timeout). node dist/cli.js runs in <2s.
2026-06-04 11:57:32 +00:00
xiaoju 58b58d511e fix: add timeout to cmdThreadExec count logic tests
CI / check (pull_request) Failing after 4m17s
2026-06-04 11:48:46 +00:00
xiaoju 596c05bfcc fix: use node dist/cli.js instead of npx tsx in prompt help test
CI / check (pull_request) Failing after 3m40s
npx tsx fails in CI (tsx not found, npm tries to install it)
2026-06-04 11:32:09 +00:00
xiaoju d26f54e8ea fix: biome format + remove unused noConsole suppressions
CI / check (pull_request) Failing after 3m58s
2026-06-04 11:22:46 +00:00
xiaoju 883bd79bcb fix: add timeout to CI-slow tests + check stderr for help output
CI / check (pull_request) Failing after 1m55s
2026-06-04 11:18:49 +00:00
xiaoju 63454a4cfd fix: OCAS_DIR → OCAS_HOME in test helpers + exclude integration tests from CI
CI / check (pull_request) Failing after 2m27s
- Remaining OCAS_DIR references caused test isolation failures
- agent-hermes integration tests need 'hermes' CLI, skip in CI

Fixes #58
2026-06-04 11:06:42 +00:00
xiaoju 9f5891169e fix: add missing workflow destructure in current-role test
CI / check (pull_request) Failing after 1m37s
The createMarker call used shorthand 'workflow' but the variable
was not destructured from cmdThreadStart.

Fixes #56
2026-06-04 10:56:44 +00:00
xiaomo f56e24cf82 Merge pull request 'test: expand E2E coverage — suspend, count, mustache, completed resume' (#51) from test/33-more-e2e into main
CI / check (push) Failing after 1m28s
test: expand E2E coverage — suspend, count, mustache, completed resume (#51)
2026-06-04 09:04:09 +00:00
xiaoju 974c2b8f1b test: add E2E tests for suspend/resume, --count, mustache, and completed resume (#33)
CI / check (pull_request) Failing after 1m40s
4 new E2E scenarios:
4. $SUSPEND → resume lifecycle (suspendedRole/suspendMessage metadata)
5. --count 3 runs entire pipeline in one invocation
6. mustache template variables rendered into edgePrompt
7. completed thread resume (ouroboros: end → start, CAS chain preserved)

Total: 7 E2E scenarios, all passing.
2026-06-04 09:03:01 +00:00
xingyue dbb7885ffd chore: fix biome check errors (40 → 0)
CI / check (pull_request) Failing after 1m39s
- Auto-fix: import sorting, formatting (17 files)
- Unsafe auto-fix: unused vars, template literals (7 files)
- Manual: nursery/noConsole → suspicious/noConsole suppression
- Manual: suppress noExcessiveCognitiveComplexity for cmdThreadResume and parseWorkflowPayload
- Manual: remove unused destructured vars in current-role tests

Closes #48
2026-06-04 16:45:45 +08:00
xiaomo cd7e4e77ff Merge pull request 'feat: agent-mock package for deterministic E2E testing (#33)' (#44) from test/33-mock-agent into main
CI / check (push) Failing after 1m38s
feat: agent-mock package for deterministic E2E testing (#44)
2026-06-04 08:38:51 +00:00
xiaoju 80e8efb05e test: E2E integration tests with uwf-mock agent (#33)
CI / check (pull_request) Failing after 2m30s
Three scenarios testing the full CLI pipeline:
1. Linear workflow (planner → worker → $END): CAS chain integrity
2. Loop workflow (developer ↔ reviewer): moderator routing through cycles
3. Role mismatch detection: agent catches routing bugs

Uses workflow add → thread start → thread exec with uwf-mock,
verifying CAS state, thread lifecycle, and error handling.

Updated assertions to use getThread().status === 'completed'
(aligned with PR #45 unified thread storage).

Refs #33
2026-06-04 08:06:22 +00:00
xiaoju 75fb752a82 feat: add agent-mock package for deterministic E2E testing (#33)
New package @united-workforce/agent-mock (uwf-mock CLI):
- Reads pre-scripted outputs from a YAML mock data file (--mock-data)
- Counts existing CAS chain steps to determine step index
- Validates expected role matches actual moderator routing
- Stores minimal detail node in CAS for valid step refs
- Zero LLM, instant execution, 100% deterministic

Usage in config.yaml:
  agents:
    mock:
      command: uwf-mock
      args: ["--mock-data", "./fixtures/scenario.yaml"]

Refs #33
2026-06-04 08:00:07 +00:00
xingyue 06af1dc668 fix: resolve workflow from CAS chain in collectCompletedThreads
CI / check (pull_request) Failing after 1m28s
Instead of hardcoding workflow as empty string for completed/cancelled
threads, use resolveWorkflowFromHead to get the actual workflow hash
from the CAS chain, consistent with active thread handling.

Closes #46
2026-06-04 15:35:08 +08:00
xiaomo bbea89c067 Merge pull request 'refactor: unified thread storage + resume completed threads' (#45) from refactor/39-unified-thread-storage into main
CI / check (push) Failing after 1m26s
refactor: unified thread storage + resume completed threads (#45)
2026-06-04 07:25:56 +00:00
xingyue bda3e3a861 feat(cli): resume completed threads (ouroboros: end → start)
CI / check (pull_request) Failing after 3m45s
uwf thread resume now supports completed threads:
- Evaluates workflow graph from $START to find first role
- Clears completed state (status → idle, completedAt → null)
- Builds resume prompt with supplement context
- Full CAS chain preserved for rich context

Suspended resume behavior unchanged.
Cancelled/idle threads still rejected.

425 tests pass.

Part of #39, closes #43
2026-06-04 15:13:47 +08:00
xingyue ca7b68ca5f refactor(cli): unify thread storage, remove history prefix
- store.ts: all threads in @uwf/thread/* with status tag
- Remove HISTORY_VAR_PREFIX, ThreadHistoryLine, deleteThread
- Add loadActiveThreads, loadHistoryThreads, completeThread
- Add migrateHistoryVarsToThreadVars migration
- thread.ts: replace deleteThread+addHistoryEntry with completeThread
- shared.ts: remove findHistoryEntry fallback
- Update all tests for unified storage model

422 tests pass.

Part of #39, closes #41, closes #42
2026-06-04 15:01:20 +08:00
xingyue 23e2ae9eb4 refactor(protocol): add status + completedAt to ThreadIndexEntry
- ThreadIndexEntry gains status and completedAt fields
- createThreadIndexEntry defaults to idle/null
- normalizeThreadIndexEntry backward-compat defaults
- updateThreadHead resets to idle (ouroboros resume prep)
- markThreadSuspended sets status=suspended
- New markThreadCompleted(entry, status, now) function
- serializeThreadIndexEntry includes new fields

Part of #39, closes #40
2026-06-04 14:42:14 +08:00
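The status lifecycle introduced in commit 23e2ae9eb4 could look something like the sketch below. The function names come from the commit message; everything else is assumed for illustration: the status union, the `head` field, storing completedAt as an ISO string, passing `now` as a Date, and clearing completedAt in updateThreadHead.

```typescript
// Assumed status values, taken from states mentioned elsewhere in this log.
type ThreadStatus = "idle" | "suspended" | "completed" | "cancelled";

// Hypothetical entry shape; the real ThreadIndexEntry has more fields.
type ThreadIndexEntry = {
  head: string;
  status: ThreadStatus;
  completedAt: string | null;
};

// New entries default to idle/null.
function createThreadIndexEntry(head: string): ThreadIndexEntry {
  return { head, status: "idle", completedAt: null };
}

// Backward compatibility: entries written before the new fields existed
// are filled in with the same defaults.
function normalizeThreadIndexEntry(
  raw: { head: string; status?: ThreadStatus; completedAt?: string | null },
): ThreadIndexEntry {
  return {
    head: raw.head,
    status: raw.status ?? "idle",
    completedAt: raw.completedAt ?? null,
  };
}

// Terminal transition: record the final status and when it happened.
function markThreadCompleted(
  entry: ThreadIndexEntry,
  status: "completed" | "cancelled",
  now: Date,
): ThreadIndexEntry {
  return { ...entry, status, completedAt: now.toISOString() };
}

// Advancing the head puts the thread back into the idle state, which is
// what allows a completed thread to be picked up again.
function updateThreadHead(entry: ThreadIndexEntry, head: string): ThreadIndexEntry {
  return { ...entry, head, status: "idle", completedAt: null };
}

const legacy = normalizeThreadIndexEntry({ head: "abc" });
const done = markThreadCompleted(legacy, "completed", new Date("2026-06-04T06:42:14.000Z"));
const resumed = updateThreadHead(done, "def");
```

Returning new objects instead of mutating the entry keeps each transition easy to test in isolation.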
xiaoju 6b7636b088 refactor: unify env vars (UWF_HOME, OCAS_HOME) + env only in CLI (#37)
CI / check (pull_request) Failing after 3m6s
Breaking changes:
- UWF_STORAGE_ROOT → UWF_HOME
- WORKFLOW_STORAGE_ROOT removed (no fallback)
- OCAS_DIR → OCAS_HOME (aligned with ocas CLI)

Library functions no longer read process.env:
- util-agent/storage.ts: resolveStorageRoot(override), getGlobalCasDir(override)
- agent-hermes: isResumeDisabled(flag) pure function, CLI reads env
- agent-claude-code: CLI reads CLAUDE_MODEL and passes to agent

Fixes #37
2026-06-04 05:12:05 +00:00
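The "env only in CLI" rule from commit 6b7636b088 can be illustrated with a small sketch: the library function takes an explicit override, and only the CLI layer looks at the environment. The commit shows resolveStorageRoot(override); the extra `homeDir` parameter and the storageRootFromEnv helper are hypothetical, added here so the example stays free of Node imports. The `~/.uwf` default comes from the earlier storage-path migration commit in this log.

```typescript
// Library side: a pure function that never reads process.env.
// `homeDir` is an illustrative extra parameter, not part of the
// signature named in the commit.
function resolveStorageRoot(override: string | undefined, homeDir: string): string {
  if (override !== undefined && override.trim() !== "") {
    return override;
  }
  return `${homeDir}/.uwf`;
}

// CLI side (hypothetical helper): the only place that consults the
// environment. In a real CLI, `env` would be process.env.
function storageRootFromEnv(
  env: Record<string, string | undefined>,
  homeDir: string,
): string {
  return resolveStorageRoot(env["UWF_HOME"], homeDir);
}

const explicit = storageRootFromEnv({ UWF_HOME: "/tmp/uwf-test" }, "/home/dev");
const fallback = storageRootFromEnv({}, "/home/dev");
```

With this split, tests can pass an override directly instead of mutating global environment state, which avoids the kind of test isolation failures noted in the later OCAS_DIR → OCAS_HOME fix.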
xiaoju 06e959e7a5 test: add unit tests for core modules (#35)
CI / check (pull_request) Failing after 1m39s
Cover high-priority untested modules:
- util: base32, result, refs-field, storage-root, log-tag
- util-agent: storage (normalizeWorkflowConfig, resolveStorageRoot), run (parseArgv)
- agent-builtin: tools (read-file, write-file, run-command), session, detail

627 → 719 tests (+92), all passing.

Refs #35
2026-06-04 04:35:33 +00:00
xiaoju 90893b0aa8 chore: integrate proman scaffold
CI / check (pull_request) Failing after 1m47s
- Add proman.yaml with 8 packages in dependency order
- Add @shazhou/proman as devDependency
- Replace root scripts: build/test/check/format → proman commands
- Keep typecheck script for standalone tsc --build

Fixes #27
2026-06-04 03:10:14 +00:00
xiaoju d0ef2c4676 chore: upgrade @ocas/* to ^0.3.0, migrate better-sqlite3 → node:sqlite
CI / check (pull_request) Failing after 1m13s
- @ocas/core and @ocas/fs upgraded from ^0.2.2 to ^0.3.0
- agent-hermes: replace better-sqlite3 with node:sqlite (DatabaseSync)
- Remove better-sqlite3 and @types/better-sqlite3 dependencies
- Fix remaining bun references in cli test helpers (execFileSync)

Refs #28
2026-06-04 01:59:00 +00:00
xiaoju fd33bf5ee1 fix: address PR review — await store.cas.put + bun shebang → node
CI / check (pull_request) Failing after 3m59s
- Add missing await on store.cas.put() in run.ts:192
- Replace #!/usr/bin/env bun → #!/usr/bin/env node in all CLI bins
- Update issue-551 test to assert node shebang
2026-06-03 14:58:23 +00:00
xiaoju 8cb74672bc fix: resolve remaining agent-hermes test failures
CI / check (pull_request) Failing after 12s
- Update issue-551 test: assert bun engines removed (not present)
- Migrate session-detail tests from bun:sqlite to better-sqlite3 API
  (db.exec for DDL, db.prepare().run() for inserts)

Refs #26
2026-06-03 14:39:20 +00:00
xiaoju e5e6de2fad chore: migrate from bun to pnpm + vitest + esbuild
- Replace bun:test with vitest across all packages
- Replace bun build with esbuild
- Replace bun:sqlite with better-sqlite3
- Fix OCAS Store API: store.put/get → store.cas.put/get
- Fix vitest vi.mock hoisting (vi.hoisted)
- Add pnpm-workspace.yaml and pnpm-lock.yaml
- Update all package.json test/build scripts

WIP: 8 failures remain in agent-hermes (bun engines check + sqlite migration)

Refs #26
2026-06-03 14:33:03 +00:00
xingyue 4306935dbc refactor: remove legacy symlink migration code
CI / check (pull_request) Failing after 7m43s
Remove migrateStorageIfNeeded() which created symlinks from
~/.uncaged/workflow → ~/.uwf and ~/.uncaged/json-cas → ~/.ocas.

This was temporary migration support. Users who still have old paths
can run a one-time copy manually.

Zero 'uncaged' references remain in active codebase.
2026-06-03 00:22:13 +08:00
xingyue 87b893bd28 refactor: remove all uncaged codename references
CI / check (pull_request) Failing after 8m0s
- Remove UNCAGED_CAS_DIR and UNCAGED_WORKFLOW_STORAGE_ROOT env var fallbacks
- Tests updated to use OCAS_DIR / UWF_STORAGE_ROOT
- All docs, READMEs, scripts, workflows, skills updated
- Only symlink migration code retains .uncaged paths (functional requirement)

Closes #12 (Phase 5 complete)
2026-06-03 00:08:45 +08:00
xingyue e2098e7371 docs: update stale comments and prompts referencing old storage formats
CI / check (pull_request) Failing after 10m12s
- architecture-reference.ts: threads.yaml/history.jsonl/registry.yaml → variable store, storage layout updated (~/.ocas/ + ~/.uwf/)
- protocol types.ts: JSDoc comments updated
- thread-index.ts: serialization comment updated
- util-agent context.ts: buildContext JSDoc updated

Only migration code in store.ts retains old file references (needed to read legacy files).
2026-06-02 23:59:07 +08:00
xingyue 5970456a54 refactor: align package folder names with npm package names
CI / check (pull_request) Failing after 8m30s
Rename packages/ subdirectories to match their @united-workforce/* scope:
  cli-workflow → cli
  workflow-agent-builtin → agent-builtin
  workflow-agent-claude-code → agent-claude-code
  workflow-agent-hermes → agent-hermes
  workflow-dashboard → dashboard
  workflow-protocol → protocol
  workflow-util-agent → util-agent
  workflow-util → util

Updated all tsconfig references, scripts, and active docs.
Historical docs (docs/plans/, docs/superpowers/) left as-is.

Closes #21
2026-06-02 23:45:45 +08:00
xingyue 34ce190e5f fix: agent createAgentStore uses wrong CAS path (~/.uwf/cas instead of ~/.ocas)
CI / check (pull_request) Failing after 8m23s
createAgentStore was calling getCasDir(storageRoot) which resolves to
~/.uwf/cas/, but since Phase 3 all CAS data lives in ~/.ocas/.
getActiveThreadEntry already used getGlobalCasDir() correctly, causing
a split where thread lookup succeeded but CAS node reads failed.

Found during e2e walkthrough after Phase 0-5 migration.
2026-06-02 23:19:14 +08:00
xingyue 3e12e6ebc0 refactor: migrate thread history from JSONL to ocas variable store (Phase 4c)
CI / check (pull_request) Failing after 10m39s
2026-06-02 22:53:24 +08:00
xingyue 93b96987a3 refactor: migrate threads index from YAML to ocas variable store (Phase 4b)
CI / check (pull_request) Failing after 12m38s
- Replace loadThreadsIndex/saveThreadsIndex with granular variable API:
  loadAllThreads, getThread, setThread, deleteThread
- Variable: @uwf/thread/<thread-id>, value=head hash, tags=suspend metadata
- Auto-migration: threads.yaml → variables, renames to .migrated
- Updated ~20 call sites in thread.ts, step.ts, shared.ts
- workflow-util-agent: getActiveThreadEntry reads from variable store
- New test helper: seedThread/seedThreads
- biome fix: removed unused imports
- 22 files changed

Ref #11
2026-06-02 22:22:38 +08:00
xingyue 8052473728 refactor: migrate workflow registry from YAML to ocas variable store (Phase 4a)
CI / check (pull_request) Failing after 8m40s
- UwfStore gains varStore: VariableStore (SQLite at ~/.ocas/variables.db)
- loadWorkflowRegistry reads from @uwf/registry/* variables
- saveWorkflowRegistry writes individual @uwf/registry/<name> variables
- Auto-migration: workflows.yaml → variables on first run, renames to .migrated
- Updated callers in workflow.ts and thread.ts
- Tests updated and passing

Ref #11
2026-06-02 21:58:58 +08:00
xingyue 1aacf11ad9 refactor: remove uwf cas subcommand, use ocas CLI
CI / check (pull_request) Failing after 9m57s
- Remove entire 'uwf cas' command group from CLI
- Delete commands/cas.ts (only used by CLI + tests)
- Delete cas.test.ts and cas-exit-code.test.ts
- Update workflow YAMLs: uwf cas get/has/refs/walk → ocas
- Update e2e-walkthrough script to use ocas
- Update docs and reference files
- Keep store-global-cas.test.ts (internal CAS store tests)

CAS operations now go through 'ocas' CLI exclusively.
Agent text storage handled internally by uwf pipeline.

Closes #10
2026-06-02 21:30:59 +08:00
xingyue eb8e98f67f refactor: migrate storage paths ~/.uncaged/workflow → ~/.uwf
CI / check (pull_request) Failing after 8m2s
- Default storage root: ~/.uncaged/workflow → ~/.uwf
- Default CAS root: ~/.uncaged/json-cas → ~/.ocas
- Env var priority: UWF_STORAGE_ROOT → WORKFLOW_STORAGE_ROOT → UNCAGED_WORKFLOW_STORAGE_ROOT (legacy)
- CAS env var: OCAS_DIR → UNCAGED_CAS_DIR (legacy)
- Auto-migration: symlink old paths on first run + deprecation warning
- Updated all comments, JSDoc, reference docs, CLAUDE.md
- New test: store-storage-root.test.ts

Closes #9
2026-06-02 21:14:48 +08:00
xingyue e067a2f25a refactor: rebrand npm packages @uncaged/* → @united-workforce/*
CI / check (pull_request) Failing after 9m51s
Package mapping:
- @uncaged/cli-workflow → @united-workforce/cli
- @uncaged/workflow-protocol → @united-workforce/protocol
- @uncaged/workflow-util → @united-workforce/util
- @uncaged/workflow-util-agent → @united-workforce/util-agent
- @uncaged/workflow-agent-hermes → @united-workforce/agent-hermes
- @uncaged/workflow-agent-claude-code → @united-workforce/agent-claude-code
- @uncaged/workflow-agent-builtin → @united-workforce/agent-builtin
- @uncaged/workflow-dashboard → @united-workforce/dashboard

Changes:
- 8 package.json name + dependency refs
- 82 files: import statements updated
- .changeset/config.json updated
- CLAUDE.md updated
- bunfig.toml restored for preload

CLI command (uwf) and directory names unchanged.

Closes shazhou/united-workforce#8
2026-06-02 20:56:06 +08:00
xingyue e65e2aec72 refactor: migrate test runner from vitest to bun:test
- Replace vitest with bun:test across all 8 packages (47 test files)
- vi.spyOn → spyOn, vi.restoreAllMocks() → mock.restore() (3 files)
- toHaveBeenCalledOnce → toHaveBeenCalledTimes(1) (bun:test compat)
- Delete all vitest.config.ts files
- Remove vitest from devDependencies
- Add preload.ts for process.exit mock (cli-workflow)
- Fix import ordering (biome check --write)

All tests pass. Closes #601
2026-06-02 18:55:17 +08:00