chore/remove-eval-plan
main
The plan is fully implemented: CLI (run/report/diff/list), runner, 4 built-in judges, CAS storage, task loader, and 6 test files. A new card captures the design decisions.