united-workforce

Author	SHA1	Message	Date
xingyue	df244c52e8	Revert "Merge pull request 'chore: release — bump @ocas/* ^0.4.0, @shazhou/proman ^0.6.3' (#150) from release/bump-ocas-proman into main". This reverts commit `9d0c6df62c`, reversing changes made to `00d960daba`.	2026-06-07 15:25:31 +08:00
xingyue	0f5bb1f191	chore: release — bump @ocas/* ^0.4.0, @shazhou/proman ^0.6.3. Published: @united-workforce/protocol@0.1.1, @united-workforce/util-agent@0.1.2, @united-workforce/agent-builtin@0.1.3, @united-workforce/agent-claude-code@0.1.4, @united-workforce/agent-hermes@0.1.5, @united-workforce/agent-mock@0.1.3, @united-workforce/cli@0.3.1, @united-workforce/eval@0.1.6	2026-06-07 15:06:43 +08:00
xingyue	3a26285872	chore: bump @ocas/* to ^0.4.0 and @shazhou/proman to ^0.6.3	2026-06-07 14:12:03 +08:00
xiaoju	aa732f5466	chore: bump eval to 0.1.5. Fixes workspace:^ not being replaced in the 0.1.4 publish (0.1.4 was published with npm instead of pnpm). 小橘 🍊	2026-06-06 08:57:24 +00:00
xiaoju	e354fc4341	chore: bump eval to 0.1.4. 小橘 🍊（NEKO Team）	2026-06-06 08:02:33 +00:00
xiaoju	0e7e3ea44b	fix: invalid Crockford Base32 log tag in eval list command. L is not a valid Crockford Base32 encoding character; replace it with H. 小橘 🍊（NEKO Team）	2026-06-06 07:57:00 +00:00
xiaoju	9260d81084	chore: version bump for --version fix. agent-hermes@0.1.2, agent-claude-code@0.1.1, agent-builtin@0.1.1, agent-mock@0.1.1, eval@0.1.3, util@0.1.1. 小橘 🍊（NEKO Team）	2026-06-05 08:12:50 +00:00
xiaoju	1cf8f350d0	fix: read eval CLI version from package.json. Fixes #95. 小橘 🍊（NEKO Team）	2026-06-05 06:43:27 +00:00
xiaoju	427568a21d	chore: version bump agent-hermes@0.1.1 cli@0.1.1 eval@0.1.2. 小橘 🍊（NEKO Team）	2026-06-05 06:29:25 +00:00
xiaoju	825f0c641a	fix: resolve --agent override via config alias before raw command. When --agent is passed to uwf thread exec, try config.agents[alias] first (e.g. 'hermes' → config.agents.hermes = {command: 'uwf-hermes'}), then fall back to parseAgentOverride for raw command names. Also change eval CLI default --agent from 'hermes' to 'uwf-hermes' so it works without config alias lookup. Refs #91	2026-06-05 04:20:09 +00:00
xiaoju	81bbe1178f	chore: release @united-workforce/eval@0.1.1	2026-06-05 03:02:05 +00:00
xiaoju	a08775896f	fix: frontmatter judge handles parsed object output. The extract pipeline stores step output as a JSON object in CAS, but the frontmatter judge only checked for raw markdown strings. It now accepts both formats: parsed objects have $status checked directly, and raw strings go through YAML frontmatter extraction. Fixes eval frontmatter-compliance scoring 0 on valid outputs.	2026-06-05 02:55:58 +00:00
xiaoju	c892b9125b	chore: remove prepublishOnly guards (proman handles release)	2026-06-05 02:29:53 +00:00
xiaoju	5edb67b79d	chore: prepare 0.1.0 release. Remove legacy .changeset/ directory (no longer used); add eval package to proman.yaml; set eval package to public for npm publishing.	2026-06-05 02:21:24 +00:00
xiaoju	ae81e4b5ac	feat: eval report, diff, list commands. Implement the 3 read commands for the eval framework: report (read eval-run from CAS, render formatted text: task, overall, config, judges table, thread ID); diff (side-by-side comparison with ▲/▼ delta indicators and config change markers); list (scan @uwf/eval/*/latest variables, sort by timestamp desc, --task filter, --limit pagination). Architecture: pure formatting functions (format.ts) + data access (read.ts) + thin CLI handlers. Types in types.ts. 11 new tests (formatReport, formatDiff, formatList, selectEntries). Refs #72	2026-06-05 00:19:25 +00:00
xiaoju	8c26f16716	feat: builtin judges — frontmatter + token-stats (deterministic) + upstream/hallucination (stubs). Implement 4 builtin judges for the eval framework: frontmatter-compliance (validates YAML frontmatter with $status field, score = stepsValid / stepsTotal); token-stats (aggregates Usage from step nodes, always score 1.0, informational only); upstream-consumption (LLM-as-judge stub, score 0, TODO); hallucination (LLM-as-judge stub, score 0, TODO). Infrastructure: judge/builtin/read-steps.ts (shell out to uwf step list); judge/builtin/types.ts (BuiltinJudge, BuiltinJudgeOutput); runner/collect.ts (dispatch builtin judges by name). 9 new tests (frontmatter validation + token aggregation). Refs #71	2026-06-05 00:09:06 +00:00
xiaoju	fae9e9ed3a	feat: eval run command — prepare, execute, collect pipeline. Implement the uwf-eval run <task-dir> command with a 3-phase pipeline: prepare (read task.yaml, copy fixture/ to temp workdir); execute (shell out to uwf thread start + exec); collect (run judges, compute weighted score, store CAS node, set @uwf/eval/<task>/latest variable). Changes: src/runner/ (types, prepare, execute, collect, index); src/storage/store.ts (createEvalStore(), setEvalLatest()); src/commands/run.ts (full pipeline wiring with --agent/--model/--count); 9 new tests (prepare + collect + weighted scoring). Builtin judges return placeholder score 0 (Phase 1c). Refs #70	2026-06-04 23:59:21 +00:00
xiaoju	99619d85db	feat: eval package scaffold with CLI, schemas, types, task loader. New package @united-workforce/eval (uwf-eval CLI): CLI skeleton with run/report/diff/list subcommands (stubs); 5 OCAS schemas (eval-run, judge-frontmatter, judge-upstream, judge-hallucination, judge-token-stats); TaskManifest type + parser/validator for task.yaml; JudgeOutput/JudgeInput types for judge contract; EvalRunPayload/EvalRunConfig/EvalJudgeRecord storage types; 19 unit tests (task loader validation + schema definitions). Refs #69	2026-06-04 23:42:16 +00:00
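
The Crockford Base32 fix in commit `0e7e3ea44b` follows from the encoding alphabet, which omits the letters I, L, O, and U. A minimal TypeScript sketch of such a check is below; `isCrockfordBase32` is a hypothetical helper written for illustration, not code from this repository, and it assumes tags are written in uppercase canonical symbols.

```typescript
// Crockford Base32 encoding alphabet: 0-9 plus A-Z without I, L, O, U.
const CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Hypothetical helper: returns true only when `tag` is non-empty and every
// character is one of the 32 canonical encoding symbols.
function isCrockfordBase32(tag: string): boolean {
  if (tag.length === 0) return false;
  for (const ch of tag) {
    if (!CROCKFORD_ALPHABET.includes(ch)) return false;
  }
  return true;
}
```

Lenient decoders may accept L as an alias for 1, but it is never produced when encoding, so a strict check like this one rejects it.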

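The lookup order described in commit `825f0c641a` (config alias first, raw command second) might look roughly like the sketch below. The `AgentConfig` and `UwfConfig` shapes, the `resolveAgent` function, and the body of `parseAgentOverride` are all assumptions made for illustration; only the names `config.agents` and `parseAgentOverride` come from the commit message.

```typescript
// Assumed shapes; the real config types are not shown in the log.
interface AgentConfig {
  command: string;
}

interface UwfConfig {
  agents: Record<string, AgentConfig>;
}

// Stand-in for the parseAgentOverride named in the commit message.
// Its real signature and behavior are unknown; this version treats the
// override as a raw command name.
function parseAgentOverride(raw: string): AgentConfig {
  return { command: raw.trim() };
}

// Hypothetical resolver: try the alias table first, then fall back to
// interpreting the value as a raw command.
function resolveAgent(override: string, config: UwfConfig): AgentConfig {
  // Own-property check so inherited keys such as "toString" never match.
  if (Object.prototype.hasOwnProperty.call(config.agents, override)) {
    return config.agents[override];
  }
  return parseAgentOverride(override);
}
```

With `config.agents.hermes = { command: "uwf-hermes" }`, both `--agent hermes` and `--agent uwf-hermes` resolve to the same command under this sketch.
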
18 Commits
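
The frontmatter-compliance behavior described in commits `8c26f16716` and `a08775896f` (score = stepsValid / stepsTotal, accepting both parsed objects and raw markdown) can be sketched as follows. This is a simplified illustration, not the repository's implementation: `StepOutput`, `hasStatus`, and `frontmatterScore` are hypothetical names, the frontmatter check is a line-based regular expression rather than a real YAML parser, and returning 0 for an empty step list is an assumption.

```typescript
// A step output is either raw markdown or an already-parsed object.
type StepOutput = string | Record<string, unknown>;

// Hypothetical check: does this step output carry a $status field?
function hasStatus(output: StepOutput): boolean {
  if (typeof output !== "string") {
    // Parsed object: look at $status directly.
    return typeof output["$status"] === "string";
  }
  // Raw markdown: extract the leading block delimited by --- lines.
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(output);
  if (match === null) return false;
  // Simplified stand-in for YAML parsing: find a "$status: value" line.
  return match[1]
    .split(/\r?\n/)
    .some((line) => /^\$status\s*:\s*\S/.test(line));
}

// score = stepsValid / stepsTotal; 0 for an empty list (assumption).
function frontmatterScore(outputs: StepOutput[]): number {
  if (outputs.length === 0) return 0;
  const stepsValid = outputs.filter(hasStatus).length;
  return stepsValid / outputs.length;
}
```

Handling both input shapes in one predicate keeps the scoring function itself unaware of how the extract pipeline stored each step.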