feat: uwf-eval report, diff, list commands #72

Closed
opened 2026-06-04 15:13:24 +00:00 by xiaoju · 0 comments

Phase 1d of eval framework (#34)

  • uwf-eval report <run-hash> — read @uwf/eval-run from CAS, render via ocas render to markdown
  • uwf-eval diff <hash1> <hash2> — side-by-side comparison of two runs (scores, config, judge data)
  • uwf-eval list — list past runs from @uwf/eval/*/latest variables, show task name + overall score + timestamp
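A minimal sketch of how the `list` output could be laid out, showing task name, overall score, and timestamp per run. The `RunSummary` type and its field names are hypothetical; the issue does not specify the shape of `@uwf/eval-run`, and reading the `@uwf/eval/*/latest` variables is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    # Hypothetical fields; the real @uwf/eval-run schema may differ.
    task: str
    overall_score: float
    timestamp: str  # assumed ISO 8601 in a single timezone, so string order is time order


def format_run_list(runs):
    """Return table lines for the given runs, newest first."""
    ordered = sorted(runs, key=lambda r: r.timestamp, reverse=True)
    # Size the task column to the longest task name, never narrower than the header.
    width = max([len("TASK")] + [len(r.task) for r in ordered])
    lines = [f"{'TASK':<{width}}  SCORE  TIMESTAMP"]
    for r in ordered:
        lines.append(f"{r.task:<{width}}  {r.overall_score:5.2f}  {r.timestamp}")
    return lines


if __name__ == "__main__":
    demo = [
        RunSummary("summarize", 0.50, "2026-06-01T00:00:00Z"),
        RunSummary("translate", 0.75, "2026-06-02T00:00:00Z"),
    ]
    print("\n".join(format_run_list(demo)))
```

Sorting on the timestamp string only works if every run records it in the same format and timezone; otherwise the values would need parsing first.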

Acceptance Criteria

  • All three commands produce readable output
  • report uses ocas render for markdown output
  • diff highlights score changes and config differences
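One possible reading of "highlights score changes and config differences" is sketched below: only entries that differ between the two runs are reported, with a signed delta for numeric scores. The run layout (a dict with `scores` and `config` keys) is an assumption for illustration, not the actual `@uwf/eval-run` schema, and judge data is omitted.

```python
def changed_entries(before, after):
    """Map each key whose value differs to a (before, after) pair.

    Keys present on only one side are reported with None for the missing value.
    """
    changes = {}
    for key in sorted(set(before) | set(after)):
        old_value = before.get(key)
        new_value = after.get(key)
        if old_value != new_value:
            changes[key] = (old_value, new_value)
    return changes


def render_diff(old_run, new_run):
    """Return human-readable lines describing score and config differences."""
    lines = []
    for name, (old, new) in changed_entries(old_run["scores"], new_run["scores"]).items():
        if isinstance(old, (int, float)) and isinstance(new, (int, float)):
            # Numeric scores get a signed delta so regressions stand out.
            lines.append(f"score  {name}: {old} -> {new} ({new - old:+.2f})")
        else:
            lines.append(f"score  {name}: {old} -> {new}")
    for key, (old, new) in changed_entries(old_run["config"], new_run["config"]).items():
        lines.append(f"config {key}: {old!r} -> {new!r}")
    return lines


if __name__ == "__main__":
    # Hypothetical run records, standing in for objects read from CAS.
    run_a = {"scores": {"overall": 0.7, "style": 0.5}, "config": {"model": "a"}}
    run_b = {"scores": {"overall": 0.9, "style": 0.5}, "config": {"model": "b", "seed": 1}}
    print("\n".join(render_diff(run_a, run_b)))
```

Unchanged entries are dropped here to keep the output short; a true side-by-side view would print them as well.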

Depends on: #70
Ref: #34

— 小橘 🍊(NEKO Team)

Reference: shazhou/united-workforce#72