chore: bump eval to 0.1.4

小橘 🍊（NEKO Team）
fix: invalid Crockford Base32 log tag in eval list command
2026-06-06 08:02:33 +00:00 · 2026-06-06 07:57:00 +00:00 · 2026-06-06 04:40:27 +00:00 · 2026-06-06 04:23:12 +00:00 · 2026-06-06 04:16:13 +00:00 · 2026-06-06 04:11:13 +00:00
11 changed files with 176 additions and 76 deletions
@@ -1,12 +0,0 @@
---
-"@united-workforce/agent-hermes": patch
-"@united-workforce/agent-claude-code": patch
-"@united-workforce/util-agent": patch
---
-
-feat: inject thread progress into agent prompt (#127)
-
-Agents now receive a "Thread Progress" section in their prompt showing the
-current step number and how many times the current role has spoken before.
-This eliminates the need for agents to make tool calls (terminal, delegate_task)
-just to count their own turn history.
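The changeset above says only what the Thread Progress section contains: the current step number and how many times the current role has already spoken. The sketch below shows one minimal way to derive both numbers from a thread's turn history. The `Turn` type, the function names, and the rendered heading and wording are all hypothetical; this is not the `@united-workforce/util-agent` implementation.

```typescript
// Hypothetical turn record; the real thread types in @united-workforce may differ.
type Turn = { role: string };

type ThreadProgress = {
  step: number; // 1-based number of the step about to run
  priorTurnsForRole: number; // times `role` has already spoken in this thread
};

// Derive both counters from history alone, so the agent needs no tool calls to count turns.
function computeThreadProgress(history: readonly Turn[], role: string): ThreadProgress {
  const priorTurnsForRole = history.filter((turn) => turn.role === role).length;
  return { step: history.length + 1, priorTurnsForRole };
}

// Hypothetical prompt wording; only the two numbers come from the changeset.
function renderThreadProgress(role: string, progress: ThreadProgress): string {
  return [
    "## Thread Progress",
    `- Current step: ${progress.step}`,
    `- You (${role}) have spoken ${progress.priorTurnsForRole} time(s) before this turn.`,
  ].join("\n");
}

const turns: Turn[] = [
  { role: "proponent" },
  { role: "opponent" },
  { role: "proponent" },
  { role: "opponent" },
];

console.log(renderThreadProgress("proponent", computeThreadProgress(turns, "proponent")));
```

After four turns, the proponent is at step 5 and has spoken twice. That makes this its 3rd speech, which is the point where the debate workflow's Rules require `$status: final`.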
@@ -1,63 +1,131 @@
-name: "debate"
-description: "Structured debate between two sides. Tests cross-process session resume."
+name: debate
+description: "Multi-role structured debate with critical thinking framework and host summary."
+
+# Shared frontmatter schema for debater roles (YAML anchor)
+x-debater-frontmatter: &debater-frontmatter
+  type: object
+  oneOf:
+    - properties:
+        $status: { const: speak }
+        argument: { type: string }
+      required: [$status, argument]
+    - properties:
+        $status: { const: conceded }
+        reason: { type: string }
+      required: [$status, reason]
+    - properties:
+        $status: { const: final }
+        closing: { type: string }
+      required: [$status, closing]
+
 roles:
-  against:
-    description: "Argues against the proposition"
-    goal: |
-      You are a skilled debater arguing AGAINST the proposition.
-      Be logical, cite evidence, and directly address your opponent's points.
-      Keep each argument concise (under 200 words).
-    capabilities:
-      - argumentation
-      - critical-thinking
+  proponent:
+    description: "Argues FOR the proposition"
+    goal: "Build a compelling case for the proposition through logical reasoning and evidence"
+    capabilities: []
    procedure: |
-      1. If this is the opening, present your strongest argument against the proposition.
-      2. If responding to the other side, directly counter their points with evidence and logic.
-      3. If you find yourself genuinely convinced by the other side, you may concede.
-    output: |
-      Provide your argument in the frontmatter.
-      Set status to "conceded" ONLY if you are genuinely convinced and wish to stop debating.
-      Otherwise set status to "continue".
+      You are an experienced scholar arguing FOR the proposition.
+
+      ## Critical Thinking Framework (execute before every speech)
+
+      ### A. Pre-speech reflection (internal, do not output)
+      - Does every step in my argument chain hold? Any hidden assumptions or logical gaps?
+      - If I were my opponent, how would I attack this? Where am I weakest?
+      - Does my evidence actually support my claim, or could it backfire?
+      - Should I go on offense or defense this round?
+
+      ### B. Evidence discipline
+      - Verify key numbers — watch for order-of-magnitude errors
+      - Assess data freshness — fast-moving fields have short half-lives
+      - Distinguish primary data from secondary citations, expert opinion, and common assumptions
+
+      ### C. Anti-fragility
+      - Anticipate counterarguments; preemptively strengthen or strategically abandon weak points
+      - Catch logical gaps, data misuse, or outdated claims in your opponent's reasoning
+
+      ## Rules
+      1. Check Thread Progress to see how many times you have spoken.
+      2. On your 3rd speech, you MUST output $status: final (closing statement).
+      3. If genuinely convinced by the opponent, output $status: conceded.
+      4. Otherwise output $status: speak and counter the opponent's points.
+      5. Be rigorous, cite evidence, stay concise.
+    output: "Debate argument"
+    frontmatter: *debater-frontmatter
+
+  opponent:
+    description: "Argues AGAINST the proposition"
+    goal: "Build a compelling case against the proposition through logical reasoning and evidence"
+    capabilities: []
+    procedure: |
+      You are an experienced scholar arguing AGAINST the proposition.
+
+      ## Critical Thinking Framework (execute before every speech)
+
+      ### A. Pre-speech reflection (internal, do not output)
+      - Does every step in my argument chain hold? Any hidden assumptions or logical gaps?
+      - If I were my opponent, how would I attack this? Where am I weakest?
+      - Does my evidence actually support my claim, or could it backfire?
+      - Should I go on offense or defense this round?
+
+      ### B. Evidence discipline
+      - Verify key numbers — watch for order-of-magnitude errors
+      - Assess data freshness — fast-moving fields have short half-lives
+      - Distinguish primary data from secondary citations, expert opinion, and common assumptions
+
+      ### C. Anti-fragility
+      - Anticipate counterarguments; preemptively strengthen or strategically abandon weak points
+      - Catch logical gaps, data misuse, or outdated claims in your opponent's reasoning
+
+      ## Rules
+      1. Check Thread Progress to see how many times you have spoken.
+      2. On your 3rd speech, or when the proponent has issued a final statement, you MUST output $status: final.
+      3. If genuinely convinced by the proponent, output $status: conceded.
+      4. Otherwise output $status: speak and counter the proponent's points.
+      5. Be rigorous, cite evidence, stay concise.
+    output: "Debate argument"
+    frontmatter: *debater-frontmatter
+
+  host:
+    description: "Debate moderator — delivers impartial summary and verdict"
+    goal: "Objectively review the debate, analyze both sides, and deliver a verdict"
+    capabilities: []
+    procedure: |
+      You are an experienced academic debate moderator.
+
+      ## Task
+      1. Outline each side's core arguments
+      2. Evaluate reasoning quality and evidence use
+      3. Highlight the most impactful exchanges
+      4. Analyze the deeper significance of the topic
+      5. Deliver an overall verdict
+
+      ## Style
+      - Impartial but with independent judgment
+      - Substantive, not superficial
+    output: "Debate summary report"
    frontmatter:
      type: object
      properties:
-        $status:
-          enum: ["continue", "conceded"]
-        argument:
-          type: string
-      required: [$status, argument]
-  for:
-    description: "Argues for the proposition"
-    goal: |
-      You are a skilled debater arguing FOR the proposition.
-      Be logical, cite evidence, and directly address your opponent's points.
-      Keep each argument concise (under 200 words).
-    capabilities:
-      - argumentation
-      - critical-thinking
-    procedure: |
-      1. Read the opposing side's latest argument carefully.
-      2. Counter their points with evidence and logic.
-      3. If you find yourself genuinely convinced by the other side, you may concede.
-    output: |
-      Provide your argument in the frontmatter.
-      Set status to "conceded" ONLY if you are genuinely convinced and wish to stop debating.
-      Otherwise set status to "continue".
-    frontmatter:
-      type: object
-      properties:
-        $status:
-          enum: ["continue", "conceded"]
-        argument:
-          type: string
-      required: [$status, argument]
+        $status: { const: done }
+        summary: { type: string }
+        highlights: { type: string }
+        verdict: { type: string }
+      required: [$status, summary, highlights, verdict]
+
 graph:
  $START:
-    new: { role: "against", prompt: "Present your opening argument against the proposition." }
-    resume: { role: "against", prompt: "Review the previous debate output and continue the argument against the proposition." }
-  against:
-    conceded: { role: "$END", prompt: "The against side conceded. Debate over." }
-    continue: { role: "for", prompt: "Counter the opposing argument: {{{argument}}}" }
-  for:
-    conceded: { role: "$END", prompt: "The for side conceded. Debate over." }
-    continue: { role: "against", prompt: "Counter the opposing argument: {{{argument}}}" }
+    new: { role: proponent, prompt: "The debate begins. You are arguing FOR the proposition. Present your opening argument." }
+    resume: { role: proponent, prompt: "The debate continues." }
+
+  proponent:
+    speak: { role: opponent, prompt: "Proponent argues:\n\n{{{argument}}}\n\nYou are the opponent. Counter this argument." }
+    conceded: { role: host, prompt: "The proponent conceded: {{{reason}}}\n\nPlease summarize the debate." }
+    final: { role: opponent, prompt: "Proponent's closing statement:\n\n{{{closing}}}\n\nYou are the opponent. Deliver your final response." }
+
+  opponent:
+    speak: { role: proponent, prompt: "Opponent argues:\n\n{{{argument}}}\n\nYou are the proponent. Counter this argument." }
+    conceded: { role: host, prompt: "The opponent conceded: {{{reason}}}\n\nPlease summarize the debate." }
+    final: { role: host, prompt: "Opponent's closing statement:\n\n{{{closing}}}\n\nThe debate is over. Please summarize." }
+
+  host:
+    done: { role: "$END", prompt: "Summary complete." }
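The `graph:` block above is a small state machine keyed by role and `$status`. The sketch below transcribes it into TypeScript with shortened prompts. It also shows how the distinct `$status` consts let at most one `oneOf` branch of `&debater-frontmatter` match, and it simulates a debate where both debaters follow their Rules: each gives a final statement on the 3rd speech, and the opponent also closes once the proponent has. The function names and the `{{{field}}}` substitution logic are illustrative assumptions, not the uwf engine's code.

```typescript
// The three oneOf branches of &debater-frontmatter, plus the host's output.
type DebaterFrontmatter =
  | { $status: "speak"; argument: string }
  | { $status: "conceded"; reason: string }
  | { $status: "final"; closing: string };
type Frontmatter = DebaterFrontmatter | { $status: "done" };
type Edge = { role: string; prompt: string };

// Field each $status const requires; distinct consts mean at most one branch can match.
const REQUIRED_FIELD = new Map<string, string>([
  ["speak", "argument"],
  ["conceded", "reason"],
  ["final", "closing"],
]);

function parseDebater(raw: Record<string, unknown>): DebaterFrontmatter {
  const tag = raw["$status"];
  const field = typeof tag === "string" ? REQUIRED_FIELD.get(tag) : undefined;
  if (field === undefined || typeof raw[field] !== "string") {
    throw new Error("frontmatter matches no oneOf branch");
  }
  return raw as DebaterFrontmatter;
}

// Transcribed from graph: above; prompts shortened.
const GRAPH: Record<string, Record<string, Edge>> = {
  proponent: {
    speak: { role: "opponent", prompt: "Proponent argues:\n\n{{{argument}}}" },
    conceded: { role: "host", prompt: "The proponent conceded: {{{reason}}}" },
    final: { role: "opponent", prompt: "Proponent's closing statement:\n\n{{{closing}}}" },
  },
  opponent: {
    speak: { role: "proponent", prompt: "Opponent argues:\n\n{{{argument}}}" },
    conceded: { role: "host", prompt: "The opponent conceded: {{{reason}}}" },
    final: { role: "host", prompt: "Opponent's closing statement:\n\n{{{closing}}}" },
  },
  host: { done: { role: "$END", prompt: "Summary complete." } },
};

// Pick the edge for (role, $status) and fill {{{field}}} from the finished turn's frontmatter.
function route(role: string, fm: Frontmatter): Edge {
  const edge: Edge | undefined = GRAPH[role]?.[fm.$status];
  if (edge === undefined) throw new Error(`no edge for ${role} on ${fm.$status}`);
  const fields = fm as Record<string, unknown>;
  const prompt = edge.prompt.replace(/\{\{\{(\w+)\}\}\}/g, (_m: string, key: string) =>
    String(fields[key] ?? ""),
  );
  return { role: edge.role, prompt };
}

// Both debaters obey the Rules: final on the 3rd speech; the opponent also closes once the proponent has.
function simulate(): string[] {
  const spoken = new Map<string, number>();
  const trace: string[] = [];
  let proponentClosed = false;
  let role = "proponent";
  while (role !== "$END") {
    trace.push(role);
    const count = (spoken.get(role) ?? 0) + 1;
    spoken.set(role, count);
    let fm: Frontmatter;
    if (role === "host") fm = { $status: "done" };
    else if (count >= 3 || (role === "opponent" && proponentClosed))
      fm = parseDebater({ $status: "final", closing: "..." });
    else fm = parseDebater({ $status: "speak", argument: "..." });
    if (role === "proponent" && fm.$status === "final") proponentClosed = true;
    role = route(role, fm).role;
  }
  return trace;
}

console.log(simulate().join(" -> "));
// proponent -> opponent -> proponent -> opponent -> proponent -> opponent -> host
```

The loop always reaches `$END` within seven turns, because each debater's speech count only grows and the 3rd speech always takes a `final` edge.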
@@ -1,6 +1,6 @@
 {
  "name": "@united-workforce/agent-claude-code",
-  "version": "0.1.2",
+  "version": "0.1.3",
  "files": [
    "src",
    "dist",
@@ -1,6 +1,6 @@
 {
  "name": "@united-workforce/agent-hermes",
-  "version": "0.1.3",
+  "version": "0.1.4",
  "files": [
    "src",
    "dist",
@@ -12,7 +12,11 @@ const OWN_VERSION = (
  }
 ).version;

-const HERMES_COMMAND = "hermes";
+/** Resolve hermes binary: `UWF_HERMES_BIN` override → default `"hermes"` via PATH. */
+function resolveHermesCommand(): string {
+  const override = process.env.UWF_HERMES_BIN;
+  return override !== undefined && override !== "" ? override : "hermes";
+}
 const PROTOCOL_VERSION = 1;

 type JsonRpcResponse = {
@@ -271,7 +275,8 @@ export class HermesAcpClient {
      return;
    }

-    const child = spawn(HERMES_COMMAND, ["acp"], {
+    const hermesCommand = resolveHermesCommand();
+    const child = spawn(hermesCommand, ["acp"], {
      env: process.env,
      shell: false,
      stdio: ["pipe", "pipe", "pipe"],
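The precedence in `resolveHermesCommand()` is easier to check in isolation when the environment is passed in rather than read from `process.env`. The variant below is a hypothetical, test-friendly copy; `resolveHermesCommandFrom` is not part of the package. A non-empty `UWF_HERMES_BIN` wins. An unset or empty value falls back to the bare `hermes` command, which `spawn` then looks up on `PATH`. The venv path is only an illustrative example.

```typescript
type Env = Record<string, string | undefined>;

// Same precedence as resolveHermesCommand(), with the environment injected.
function resolveHermesCommandFrom(env: Env): string {
  const override = env.UWF_HERMES_BIN;
  // An exported but empty `UWF_HERMES_BIN=` still falls back to the PATH lookup.
  return override !== undefined && override !== "" ? override : "hermes";
}

console.log(resolveHermesCommandFrom({})); // hermes
console.log(resolveHermesCommandFrom({ UWF_HERMES_BIN: "" })); // hermes
console.log(resolveHermesCommandFrom({ UWF_HERMES_BIN: "/opt/hermes/.venv/bin/hermes" })); // /opt/hermes/.venv/bin/hermes
```

Because `process.env` fits this `Env` shape, the function in the diff behaves like `resolveHermesCommandFrom(process.env)`.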
@@ -1,6 +1,6 @@
 {
  "name": "@united-workforce/eval",
-  "version": "0.1.3",
+  "version": "0.1.4",
  "private": false,
  "files": [
    "src",
@@ -6,7 +6,7 @@ import { formatList, selectEntries } from "./format.js";
 import { readEvalEntries } from "./read.js";

 const log = createLogger({ sink: { kind: "stderr" } });
-const LOG_LIST = "L5KX9R2B";
+const LOG_LIST = "H5KX9R2B";

 type ListCliOptions = {
  task: string | undefined;
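The fix replaces `L` because Crockford's Base32 alphabet omits `I`, `L`, `O` and `U`. The first three are easily confused with `1` and `0`, and `U` is dropped to avoid accidental obscenities. The sketch below shows a minimal check in the same spirit. It assumes log tags are exactly 8 canonical uppercase symbols, as both tags in the diff are. `isCrockfordTag` is a hypothetical helper, not the repository's own validation.

```typescript
// Crockford's Base32 symbols: 0-9 and A-Z without I, L, O, U (32 characters).
const CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

// Strict check on canonical encoded symbols; rejects lowercase and the decode-time
// aliases (I and L read as 1, O read as 0) that Crockford's spec tolerates on input.
function isCrockfordTag(tag: string, length = 8): boolean {
  if (tag.length !== length) return false;
  for (const ch of tag) {
    if (!CROCKFORD_ALPHABET.includes(ch)) return false;
  }
  return true;
}

console.log(isCrockfordTag("L5KX9R2B")); // false: L is not an encoding symbol
console.log(isCrockfordTag("H5KX9R2B")); // true
```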
@@ -225,4 +225,34 @@ describe("buildOutputFormatInstruction", () => {
    const result = buildOutputFormatInstruction({});
    expect(result).toContain("Focus exclusively on YOUR role");
  });
+
+  test("renders const value as literal in flat schema example", () => {
+    const schema = {
+      type: "object",
+      properties: {
+        $status: { type: "string", const: "greeted" },
+        message: { type: "string" },
+      },
+      required: ["$status", "message"],
+    };
+    const result = buildOutputFormatInstruction(schema);
+    expect(result).toContain("$status: greeted");
+    expect(result).toContain("fixed value");
+    expect(result).not.toContain("$status: <string>");
+  });
+
+  test("renders const value for non-string types", () => {
+    const schema = {
+      type: "object",
+      properties: {
+        count: { type: "number", const: 42 },
+        done: { type: "boolean", const: true },
+      },
+      required: ["count", "done"],
+    };
+    const result = buildOutputFormatInstruction(schema);
+    expect(result).toContain("count: 42");
+    expect(result).toContain("done: true");
+    expect(result).toContain("fixed value");
+  });
 });
@@ -1,6 +1,6 @@
 {
  "name": "@united-workforce/util-agent",
-  "version": "0.1.0",
+  "version": "0.1.1",
  "files": [
    "src",
    "dist",
@@ -74,6 +74,10 @@ function collectObjectSchemas(schema: JSONSchema): JSONSchema[] {
 }

 function resolvePropertySchema(prop: JSONSchema): JSONSchema {
+  if (prop.const !== undefined) {
+    return prop;
+  }
+
  if (Array.isArray(prop.enum) && prop.enum.length > 0) {
    return prop;
  }
@@ -113,6 +117,11 @@ function buildPropertyExampleLine(prop: SchemaProperty): string {
    commentParts.push("required");
  }

+  if (resolved.const !== undefined) {
+    commentParts.push("fixed value");
+    return `${prop.name}: ${formatYamlScalar(resolved.const)}${buildPropertyComment(commentParts)}`;
+  }
+
  if (Array.isArray(resolved.enum) && resolved.enum.length > 0) {
    const enumValues = resolved.enum.map((v) => String(v));
    commentParts.push(...enumValues);
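The two `const` hunks work together. `resolvePropertySchema` returns a `const` property unchanged, and `buildPropertyExampleLine` checks `const` before `enum`, so the example line shows the literal value with a `fixed value` comment instead of a `<type>` placeholder. The sketch below reproduces that ordering in a simplified, self-contained form. `PropSchema`, `formatScalar`, `comment`, and the `  # ...` comment layout are stand-ins, because `formatYamlScalar` and `buildPropertyComment` are not shown in the diff.

```typescript
// Stand-in for the relevant slice of JSONSchema.
type PropSchema = { type?: string; const?: unknown; enum?: unknown[] };

// Hypothetical scalar formatter: strings as-is, numbers/booleans/null via JSON.
function formatScalar(value: unknown): string {
  return typeof value === "string" ? value : JSON.stringify(value);
}

function comment(parts: string[]): string {
  return parts.length > 0 ? `  # ${parts.join(", ")}` : "";
}

function exampleLine(name: string, prop: PropSchema, required: boolean): string {
  const parts: string[] = required ? ["required"] : [];
  // const is checked before enum and type: the model must echo this exact value.
  if (prop.const !== undefined) {
    parts.push("fixed value");
    return `${name}: ${formatScalar(prop.const)}${comment(parts)}`;
  }
  if (Array.isArray(prop.enum) && prop.enum.length > 0) {
    parts.push(...prop.enum.map((v) => String(v)));
  }
  return `${name}: <${prop.type ?? "any"}>${comment(parts)}`;
}

console.log(exampleLine("$status", { type: "string", const: "greeted" }, true));
// $status: greeted  # required, fixed value
console.log(exampleLine("count", { type: "number", const: 42 }, true));
// count: 42  # required, fixed value
console.log(exampleLine("message", { type: "string" }, true));
// message: <string>  # required
```

The check uses `prop.const !== undefined` rather than truthiness. That way, `const: 0`, `const: false`, and `const: null` also render as literals.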
@@ -1,6 +1,6 @@
 {
  "name": "@united-workforce/util",
-  "version": "0.1.3",
+  "version": "0.1.4",
  "files": [
    "src",
    "dist",
Author	SHA1	Message	Date
xiaoju	e354fc4341	chore: bump eval to 0.1.4 [CI / check (push): successful in 3m1s] 小橘 🍊（NEKO Team）	2026-06-06 08:02:33 +00:00
xiaoju	0e7e3ea44b	fix: invalid Crockford Base32 log tag in eval list command [CI / check (pull_request): successful in 3m57s; CI / check (push): successful in 3m31s] L is not a valid Crockford Base32 character. Replace with H. 小橘 🍊（NEKO Team）	2026-06-06 07:57:00 +00:00
xiaoju	aa454c85dd	chore: bump versions for release [CI / check (push): successful in 2m56s] - @united-workforce/util: 0.1.3 → 0.1.4 - @united-workforce/util-agent: 0.1.0 → 0.1.1 - @united-workforce/agent-hermes: 0.1.3 → 0.1.4 - @united-workforce/agent-claude-code: 0.1.2 → 0.1.3	2026-06-06 04:40:27 +00:00
xiaomo	6dd7d521be	Merge pull request 'chore: deduplicate debate frontmatter with YAML anchor' (#135) from chore/debate-yaml-cleanup into main [CI / check (push): successful in 2m40s] Merge PR #135: chore: deduplicate debate frontmatter with YAML anchor	2026-06-06 04:23:12 +00:00
xiaoju	950dc056d8	chore: deduplicate debate frontmatter with YAML anchor [CI / check (pull_request): successful in 2m22s] Use &debater-frontmatter anchor for the shared oneOf schema between proponent and opponent roles. Procedure blocks remain duplicated since YAML anchors cannot be embedded inside block scalars. capabilities: [] kept — required by WorkflowPayload type. Addresses review suggestions from #133.	2026-06-06 04:16:13 +00:00
xiaomo	d360b85374	Merge pull request 'docs: upgrade debate example + fix: UWF_HERMES_BIN env support' (#133) from docs/upgrade-debate-example into main [CI / check (push): successful in 3m1s] Merge PR #133: docs: upgrade debate example + fix: UWF_HERMES_BIN env support	2026-06-06 04:11:13 +00:00
xiaoju	509dfad857	fix: support UWF_HERMES_BIN env var for hermes binary path [CI / check (pull_request): successful in 3m28s] Replace hardcoded HERMES_COMMAND constant with resolveHermesCommand() that checks UWF_HERMES_BIN first, falling back to 'hermes' via PATH. This fixes environments where hermes is installed in a venv or non-standard location that isn't in the non-login shell PATH (e.g. ~/.local/bin symlink only available in login shell). Refs #134	2026-06-06 03:59:08 +00:00
xiaoju	58b84e3b3c	docs: upgrade debate example — 3 roles, oneOf routing, bounded termination [CI / check (pull_request): failing after 11m23s] Replace the original 2-role debate with a 3-role version featuring: - proponent/opponent/host roles (was: for/against) - oneOf + const status routing (was: enum) - Critical thinking framework in procedure (pre-speech reflection, evidence discipline, anti-fragility) - Bounded termination via Thread Progress (3rd speech → final) - Host role for impartial summary and verdict Based on xiaonuo's debate workflow design.	2026-06-06 03:30:54 +00:00
xiaomo	f821ac99f4	Merge pull request 'docs: add upgrading section to usage reference' (#132) from feat/usage-upgrade-hint into main [CI / check (push): successful in 2m8s]	2026-06-06 03:00:09 +00:00
xiaomo	4410afcd4a	Merge pull request 'fix: render const values as literals in output format instruction (#129)' (#130) from fix/129-const-prompt into main [CI / check (push): successful in 2m29s]	2026-06-06 01:44:24 +00:00
xiaoju	a0e254a681	fix: render const values as literals in output format instruction (#129) [CI / check (pull_request): successful in 1m48s] buildOutputFormatInstruction now renders const fields with their actual value (e.g. $status: greeted) instead of the type placeholder (<string>). Also adds early return in resolvePropertySchema for const properties. Fixes #129	2026-06-06 01:12:13 +00:00