refactor: simplify ExtractFn to (schema, contentHash)

- Remove extractPrompt from RoleDefinition
- Remove ExtractContext type
- ExtractFn now takes (schema, contentHash) instead of (schema, prompt, ExtractContext)
- createExtract reads CAS content by hash and keeps the ReAct loop with cas_get
- Engine stores the raw agent output in CAS before extraction and passes its hash to ExtractFn
- Cursor agent reads workspace from ctx.currentRole instead of calling extract
- Coder schema uses .describe() for the phase hash hint
- All role definitions, CLI templates, and skill output updated

Refs #180, closes #174, closes #181
2026-05-11 07:53:04 +00:00
parent da6bcb10d6
commit 1742ced6df
22 changed files with 27 additions and 134 deletions
@@ -50,7 +50,6 @@ const greeterMetaSchema = z.object({
 export const greeterRole: RoleDefinition<HelloTemplateMeta["greeter"]> = {
  description: "Says hello — replace with your first role.",
  systemPrompt: "You are a helpful assistant. Reply with one short friendly sentence.",
-  extractPrompt: "Extract the assistant's greeting as message.",
  schema: greeterMetaSchema,
  extractRefs: null,
 };
@@ -93,18 +93,18 @@ Skeleton generated by Init: reusable definitions go under \`templates/\`, and under \`workflows/\`
 ## 2. Core Concepts

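 - **RoleMeta**: \`Record<string, Record<string, unknown>>\`, mapping each role name to the agreed shape of that role's structured meta.
-- **RoleDefinition<Meta>**: pure data: \`description\`, \`systemPrompt\`, \`extractPrompt\`, and \`schema\` (Zod v4). Contains no execution logic.
+- **RoleDefinition<Meta>**: pure data: \`description\`, \`systemPrompt\`, and \`schema\` (Zod v4). Contains no execution logic.
 - **WorkflowDefinition<M extends RoleMeta>**: \`description\` + \`roles\` (the definition of each role) + **Moderator**.
 - **Moderator**: \`(ctx: ModeratorContext<M>) => (role name) | END\`. Synchronous and pure; it only handles routing.
 - **AgentFn**: \`(ctx: AgentContext) => Promise<string>\`, producing raw text output; it reads the current role's \`systemPrompt\` from the context.
-- **ExtractFn**: parses structured data from the context and a prompt (usable by both the engine and Agents).
+- **ExtractFn**: parses structured data from a CAS content hash (usable by both the engine and Agents).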

 Engine loop in brief: **Moderator** → picks a role → **Agent** produces text → **Extract** writes **meta** → a step is appended; repeat until **END**. See the three-phase description in \`docs/architecture.md\` for details.

 ## 3. Development Workflow

 1. **Define RoleMeta**: agree on a TypeScript type for each role's meta (aligned with its Zod schema).
-2. **Write RoleDefinition**: write a Zod \`schema\` for each role and fill in \`systemPrompt\` / \`extractPrompt\` / \`description\`.
+2. **Write RoleDefinition**: write a Zod \`schema\` for each role and fill in \`systemPrompt\` / \`description\`.
 3. **Write the Moderator**: return the next role name or \`END\` based on \`ctx.steps\` and the business state.
 4. **Assemble the WorkflowDefinition**: export the definition from the template's \`index\` (plus any role / moderator exports it needs).
 5. **Instantiate**: in the workflow package, bind the **AgentFn** with \`createWorkflow(def, binding)\` (or the project's own wrapper); the engine injects **ExtractFn** into \`WorkflowRuntime\` from **workflow.yaml**.
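The engine loop described above (Moderator → Agent → Extract → append step, until END) can be sketched in a self-contained way. This is a minimal illustration under stated assumptions, not the engine's code. `MemoryCas` and `toyHash` are hypothetical stand-ins for the CAS store and its hashing. `Schema` is a hypothetical stand-in for `z.ZodType`. `runLoop` is a hypothetical name. The sketch shows the post-refactor ordering: the agent's raw text is written to CAS first, and extraction receives only the schema and that content hash.

```typescript
// Hypothetical sketch of the engine loop after this refactor; not the engine's code.
// MemoryCas and toyHash stand in for the real CAS store and its hashing.

const END = Symbol("END");

type Meta = Record<string, unknown>;
type Step = { role: string; contentHash: string; meta: Meta };

// Stand-in for z.ZodType<T>: anything that can validate unknown input into T.
type Schema<T extends Meta> = { parse(input: unknown): T };

// Same argument shape as the new ExtractFn: (schema, contentHash).
type Extract = <T extends Meta>(schema: Schema<T>, contentHash: string) => Promise<T>;
type Agent = (role: string, steps: readonly Step[]) => Promise<string>;
type Moderator = (steps: readonly Step[]) => string | typeof END;

// 32-bit FNV-1a over UTF-16 code units; a toy content hash, not a real CAS digest.
function toyHash(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

class MemoryCas {
  private readonly nodes = new Map<string, string>();

  put(payload: string): string {
    const hash = toyHash(payload);
    this.nodes.set(hash, payload);
    return hash;
  }

  get(hash: string): string | null {
    return this.nodes.get(hash) ?? null;
  }
}

async function runLoop(
  moderator: Moderator,
  agent: Agent,
  extract: Extract,
  schemas: Record<string, Schema<Meta>>,
  cas: MemoryCas,
  maxRounds: number,
): Promise<Step[]> {
  const steps: Step[] = [];
  for (let round = 0; round < maxRounds; round++) {
    const role = moderator(steps);
    if (role === END) return steps;
    const schema = schemas[role];
    if (schema === undefined) throw new Error(`no schema for role ${role}`);
    const raw = await agent(role, steps);
    // Persist the raw agent text first, then give Extract only its hash.
    const contentHash = cas.put(raw);
    const meta = await extract(schema, contentHash);
    steps.push({ role, contentHash, meta });
  }
  throw new Error(`moderator did not return END within ${maxRounds} rounds`);
}
```

The ordering in `runLoop` follows `advanceOneRound` in this diff. The content node already exists in CAS when extraction runs, so the extract call carries no agent text in its arguments.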
@@ -223,7 +223,6 @@ Each role has:
 |-------|------|---------|
 | \`description\` | string | What the role does |
 | \`systemPrompt\` | string | System prompt for the agent |
-| \`extractPrompt\` | string | Instruction for extracting structured meta |
 | \`schema\` | ZodSchema | Validates the extracted meta |
 | \`extractRefs\` | fn or null | Extracts CAS hashes from meta for DAG linking |

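With `extractPrompt` gone, a role carries only the four fields in the table. Field-level hints for extraction now have to live in the schema; the coder role uses `.describe()` for this. Below is a minimal sketch of such a definition. It assumes a plain `Schema` interface in place of Zod and a hypothetical `phaseRole` modeled on the planner.

```typescript
// Hypothetical sketch: a role definition after extractPrompt was removed.
type Meta = Record<string, unknown>;

// Stand-in for a Zod v4 schema: validates unknown input into T or throws.
type Schema<T extends Meta> = { parse(input: unknown): T };

// Same four fields as the table above.
type RoleDefinition<M extends Meta> = {
  description: string;
  systemPrompt: string;
  schema: Schema<M>;
  extractRefs: ((meta: M) => string[]) | null;
};

type PhaseMeta = { phases: { hash: string; title: string }[] };

const phaseSchema: Schema<PhaseMeta> = {
  parse(input) {
    if (typeof input !== "object" || input === null) throw new Error("meta must be an object");
    const { phases } = input as { phases?: unknown };
    if (!Array.isArray(phases)) throw new Error("phases must be an array");
    return {
      phases: phases.map((phase) => {
        const { hash, title } = (phase ?? {}) as { hash?: unknown; title?: unknown };
        if (typeof hash !== "string" || typeof title !== "string") {
          throw new Error("each phase needs a string hash and title");
        }
        return { hash, title };
      }),
    };
  },
};

// Hypothetical role modeled on the planner: its meta points at CAS phase nodes.
const phaseRole: RoleDefinition<PhaseMeta> = {
  description: "Breaks the task into sequential phases for the coder.",
  systemPrompt: "Split the task into ordered phases and store each one in CAS.",
  schema: phaseSchema,
  // Hashes returned here are what the engine links into the DAG.
  extractRefs: (meta) => meta.phases.map((phase) => phase.hash),
};
```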
@@ -1,12 +1,11 @@
 import { describe, expect, test } from "bun:test";
-import type { ExtractContext, ExtractFn } from "@uncaged/workflow-runtime";
+import type { ExtractFn } from "@uncaged/workflow-runtime";
 import type * as z from "zod/v4";
 import { createCursorAgent, validateCursorAgentConfig } from "../src/index.js";

 const testExtract: ExtractFn = async <T extends Record<string, unknown>>(
  _schema: z.ZodType<T>,
-  _prompt: string,
-  _ctx: ExtractContext,
+  _contentHash: string,
 ): Promise<{ meta: T; contentPayload: string; refs: string[] }> => ({
  meta: { workspace: "/tmp" } as unknown as T,
  contentPayload: "",
@@ -1,4 +1,4 @@
-import type { AgentFn, ExtractContext } from "@uncaged/workflow-runtime";
+import type { AgentFn } from "@uncaged/workflow-runtime";
 import { buildAgentPrompt, type SpawnCliError, spawnCli } from "@uncaged/workflow-util-agent";
 import * as z from "zod/v4";

@@ -44,16 +44,10 @@ export function createCursorAgent(config: CursorAgentConfig): AgentFn {
  const timeoutMs = config.timeout > 0 ? config.timeout : null;

  return async (ctx) => {
-    const extractCtx: ExtractContext = {
-      ...ctx,
-      agentContent: "",
-    };
-    const extracted = await config.extract(
-      cursorWorkspaceSchema,
-      "From the thread context, determine the absolute filesystem path where the project/repository is located.",
-      extractCtx,
-    );
-    const { workspace } = extracted.meta;
+    const { workspace } = ctx.currentRole as unknown as { workspace?: string };
+    if (typeof workspace !== "string" || workspace.length === 0) {
+      throw new Error("cursor agent: ctx.currentRole carries no workspace path");
+    }
    const fullPrompt = await buildAgentPrompt(ctx);
    const args = [
      "-p",
@@ -2,8 +2,7 @@ import { afterEach, describe, expect, test } from "bun:test";
 import { mkdtemp, rm } from "node:fs/promises";
 import { tmpdir } from "node:os";
 import { join } from "node:path";
-import { createCasStore } from "@uncaged/workflow-cas";
-import { type ExtractContext, START } from "@uncaged/workflow-runtime";
+import { createCasStore, putContentNodeWithRefs } from "@uncaged/workflow-cas";
 import * as z from "zod/v4";

 import { createExtract } from "../src/extract/extract-fn.js";
@@ -45,21 +44,9 @@ describe("createExtract — ExtractResult shape", () => {
      );

      const schema = z.object({ confidence: z.number() });
-      const ctx: ExtractContext = {
-        threadId: "01THREADTESTAAAAAAAAAAAAAA",
-        depth: 0,
-        start: {
-          role: START,
-          content: "task text",
-          meta: { maxRounds: 10 },
-          timestamp: 100,
-        },
-        steps: [],
-        currentRole: { name: "analyst", systemPrompt: "be precise" },
-        agentContent: "model says hello",
-      };
+      const contentHash = await putContentNodeWithRefs(cas, "model says hello", []);

-      const out = await extract(schema, "extract fields", ctx);
+      const out = await extract(schema, contentHash);

      expect(out.meta).toEqual({ confidence: 0.9 });
      expect(out.contentPayload).toBe("model says hello");
@@ -1,7 +1,6 @@
 import { type CasStore, getContentMerklePayload } from "@uncaged/workflow-cas";
 import { createLlmFn, createThreadReactor } from "@uncaged/workflow-reactor";
 import type {
-  ExtractContext,
  ExtractFn,
  ExtractResult,
  LlmProvider,
@@ -31,7 +30,7 @@ const CAS_GET_TOOL_DEFINITION = {
  },
 };

-export type ExtractThreadContext = {
+type ExtractThreadContext = {
  cas: CasStore;
 };

@@ -39,41 +38,6 @@ function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
 }

-/** Builds the user-side extraction prompt (thread + agent output + instruction). */
-export async function buildExtractUserContent(
-  ctx: ExtractContext,
-  prompt: string,
-  deps: ExtractDeps,
-): Promise<string> {
-  const lines: string[] = [];
-  lines.push(`## Role: ${ctx.currentRole.name}`);
-  lines.push(ctx.currentRole.systemPrompt);
-  lines.push("");
-  lines.push("## Task");
-  lines.push(ctx.start.content);
-  lines.push("");
-  if (ctx.steps.length > 0) {
-    lines.push("## Thread History");
-    for (const step of ctx.steps) {
-      const body = await getContentMerklePayload(deps.cas, step.contentHash);
-      if (body === null) {
-        throw new Error(`extract: missing CAS blob for step ${step.role}: ${step.contentHash}`);
-      }
-      lines.push(`### ${step.role}`);
-      lines.push(body);
-      lines.push(`Meta: ${JSON.stringify(step.meta)}`);
-      lines.push("");
-    }
-  }
-  lines.push("## Agent Output");
-  lines.push(ctx.agentContent);
-  lines.push("");
-  lines.push("## Extraction Instruction");
-  lines.push(prompt);
-
-  return lines.join("\n");
-}
-
 /**
 * Create an ExtractFn backed by an LLM provider.
 *
@@ -102,7 +66,7 @@ export function createExtract(provider: LlmProvider, deps: ExtractDeps): Extract
      };
    },
    systemPromptForStructuredTool: (structuredToolName) =>
-      `You extract structured metadata from the agent output below. Use cas_get to read Merkle DAG nodes from CAS (YAML: type, payload, refs for content nodes or children for step/thread legacy nodes) when the agent output references hashes you must traverse. When you have the complete structured object, call the ${structuredToolName} tool with JSON arguments matching the schema. You may instead reply with only a JSON object (no prose) when no tools are needed.`,
+      `You extract structured metadata from content. The content is from a CAS node. Use cas_get to read referenced nodes if needed. When ready, call the ${structuredToolName} tool with JSON matching the schema. You may instead reply with only a JSON object (no prose) when no tools are needed.`,
    toolHandler: async (call, thread) => {
      if (call.function.name !== "cas_get") {
        return `Unexpected tool routed to handler: ${call.function.name}`;
@@ -124,10 +88,13 @@ export function createExtract(provider: LlmProvider, deps: ExtractDeps): Extract

  return async <T extends Record<string, unknown>>(
    schema: z.ZodType<T>,
-    prompt: string,
-    ctx: ExtractContext,
+    contentHash: string,
  ): Promise<ExtractResult<T>> => {
-    const text = await buildExtractUserContent(ctx, prompt, deps);
+    const payload = await getContentMerklePayload(deps.cas, contentHash);
+    if (payload === null) {
+      throw new Error(`extract: missing CAS content node for hash ${contentHash}`);
+    }
+    const text = `${payload}\n\nExtract structured metadata according to the schema.`;
    const result = await reactor({
      thread: { cas: deps.cas },
      input: text,
@@ -138,7 +105,7 @@ export function createExtract(provider: LlmProvider, deps: ExtractDeps): Extract
    }
    return {
      meta: result.value,
-      contentPayload: ctx.agentContent,
+      contentPayload: payload,
      refs: [],
    };
  };
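The rewritten `createExtract` no longer assembles role, task, and history sections. It now:

- reads one payload by hash;
- fails loudly if the node is missing;
- appends a fixed instruction;
- returns the payload as `contentPayload`.

Below is a minimal sketch of that flow. It assumes a hypothetical `getPayload` in place of `getContentMerklePayload` and a hypothetical `askModel` in place of the ReAct reactor, so there are no `cas_get` tool calls. A plain `Schema` stands in for Zod.

```typescript
// Hypothetical sketch of the new createExtract flow; not the library's implementation.
type Meta = Record<string, unknown>;
type Schema<T extends Meta> = { parse(input: unknown): T };
type ExtractResult<T extends Meta> = { meta: T; contentPayload: string; refs: string[] };

type ExtractDepsSketch = {
  // Stand-in for getContentMerklePayload(deps.cas, hash): payload text or null.
  getPayload: (contentHash: string) => Promise<string | null>;
  // Stand-in for the reactor: returns the model's JSON answer as text (no cas_get loop here).
  askModel: (input: string) => Promise<string>;
};

function makeExtract(deps: ExtractDepsSketch) {
  return async <T extends Meta>(schema: Schema<T>, contentHash: string): Promise<ExtractResult<T>> => {
    const payload = await deps.getPayload(contentHash);
    if (payload === null) {
      throw new Error(`extract: missing CAS content node for hash ${contentHash}`);
    }
    // Same framing as the diff: the payload, a blank line, then a fixed instruction.
    const text = `${payload}\n\nExtract structured metadata according to the schema.`;
    const answer = await deps.askModel(text);
    const meta = schema.parse(JSON.parse(answer));
    // The payload itself becomes contentPayload; no refs are added at this stage.
    return { meta, contentPayload: payload, refs: [] };
  };
}
```

The test in this diff exercises the same contract. It stores "model says hello" with `putContentNodeWithRefs` and expects it back as `contentPayload`.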
@@ -1,8 +1,4 @@
-export {
-  buildExtractUserContent,
-  createExtract,
-  type ExtractThreadContext,
-} from "./extract-fn.js";
+export { createExtract } from "./extract-fn.js";
 export {
  extractFunctionToolFromZodSchema,
  llmErrorToCause,
@@ -37,9 +37,7 @@ export { EMPTY_CHAIN_STATE } from "./engine/types.js";
 export { getWorkerHostScriptPath } from "./engine/worker-entry-path.js";
 export type { ExtractFn, LlmError, LlmExtractArgs } from "./extract/index.js";
 export {
-  buildExtractUserContent,
  createExtract,
-  type ExtractThreadContext,
  extractFunctionToolFromZodSchema,
  llmErrorToCause,
  llmExtract,
@@ -14,7 +14,6 @@ export type {
  AgentContext,
  AgentFn,
  CasStore,
-  ExtractContext,
  ExtractFn,
  ExtractResult,
  FALLBACK,
@@ -76,10 +76,6 @@ export type AgentContext<M extends RoleMeta = RoleMeta> = ModeratorContext<M> &
  };
 };

-export type ExtractContext<M extends RoleMeta = RoleMeta> = AgentContext<M> & {
-  agentContent: string;
-};
-
 // ── Workflow Completion ────────────────────────────────────────────

 export type WorkflowCompletion = {
@@ -128,8 +124,7 @@ export type ExtractResult<T extends Record<string, unknown>> = {

 export type ExtractFn = <T extends Record<string, unknown>>(
  schema: z.ZodType<T>,
-  prompt: string,
-  ctx: ExtractContext,
+  contentHash: string,
 ) => Promise<ExtractResult<T>>;

 export type AgentFn = (ctx: AgentContext) => Promise<string>;
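Because `ExtractFn` now takes only `(schema, contentHash)`, a test double no longer has to fabricate a thread context. Below is a sketch of a canned-result stub. It assumes a plain `Schema` type in place of `z.ZodType` and a hypothetical `fixedExtract` helper. The stub runs the canned meta through `schema.parse`, which avoids the `as unknown as T` cast used by the cursor-agent test stub.

```typescript
// Hypothetical test double for the post-refactor ExtractFn signature.
type Meta = Record<string, unknown>;
type Schema<T extends Meta> = { parse(input: unknown): T };
type ExtractResult<T extends Meta> = { meta: T; contentPayload: string; refs: string[] };
type ExtractFn = <T extends Meta>(schema: Schema<T>, contentHash: string) => Promise<ExtractResult<T>>;

// Returns the same canned meta for every hash, validated by the caller's schema,
// and records each requested hash so tests can assert on the calls.
function fixedExtract(canned: Meta, calls: string[] = []): ExtractFn {
  return async (schema, contentHash) => {
    calls.push(contentHash);
    return { meta: schema.parse(canned), contentPayload: "", refs: [] };
  };
}
```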
@@ -154,7 +149,6 @@ export type WorkflowFn = (
 export type RoleDefinition<Meta extends Record<string, unknown>> = {
  description: string;
  systemPrompt: string;
-  extractPrompt: string;
  schema: z.ZodType<Meta>;
  extractRefs: ((meta: Meta) => string[]) | null;
 };
@@ -7,7 +7,6 @@ import {
  type AgentContext,
  type AgentFn,
  END,
-  type ExtractContext,
  type ModeratorContext,
  type RoleDefinition,
  type RoleMeta,
@@ -89,15 +88,11 @@ async function advanceOneRound<M extends RoleMeta>(
  const agent = agentForRole(binding, next);
  const raw = await agent(agentCtx as unknown as AgentContext);

-  const extractCtx: ExtractContext<M> = {
-    ...agentCtx,
-    agentContent: raw,
-  };
+  const agentContentHash = await putContentNodeWithRefs(runtime.cas, raw, []);

  const extracted = await runtime.extract(
    roleDef.schema as z.ZodType<Record<string, unknown>>,
-    roleDef.extractPrompt,
-    extractCtx as unknown as ExtractContext,
+    agentContentHash,
  );

  const refsFromMeta = resolveExtractedRefs(
@@ -106,11 +101,10 @@ async function advanceOneRound<M extends RoleMeta>(
  );
  const artifactRefs = mergeUniqueHashes(extracted.refs, refsFromMeta);

-  const contentHash = await putContentNodeWithRefs(
-    runtime.cas,
-    extracted.contentPayload,
-    artifactRefs,
-  );
+  const contentHash =
+    artifactRefs.length === 0
+      ? agentContentHash
+      : await putContentNodeWithRefs(runtime.cas, extracted.contentPayload, artifactRefs);
  const refs = artifactRefs.includes(contentHash) ? artifactRefs : [...artifactRefs, contentHash];

  const step = {
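The engine now stores the raw agent output before extraction. It reuses that hash whenever no artifact refs were found.

The shortcut relies on content addressing. This assumes the CAS key depends only on payload and refs. With the `createExtract` in this diff, `contentPayload` is the payload read back from that same node, so writing it again with no refs would yield the same node.

Below is a toy illustration. It assumes a hypothetical `putNode` whose key is the canonical JSON of payload and refs; a real CAS would hash that form instead.

```typescript
// Toy content-addressed store, for illustration only.
type ContentNode = { payload: string; refs: string[] };
const nodes = new Map<string, ContentNode>();

// Hypothetical put: the key depends only on (payload, refs), as in content addressing.
// A real CAS would hash this canonical form rather than use it verbatim.
function putNode(payload: string, refs: string[]): string {
  const key = JSON.stringify([payload, refs]);
  nodes.set(key, { payload, refs: [...refs] });
  return key;
}

// Mirrors the engine's choice: reuse the pre-extraction hash when there is nothing to link.
function stepContentHash(agentContentHash: string, payload: string, artifactRefs: string[]): string {
  return artifactRefs.length === 0 ? agentContentHash : putNode(payload, artifactRefs);
}
```

In the diff, `contentHash` is appended to `refs` only when it is not already present. The reused hash therefore still ends up in the step's ref list.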
@@ -6,7 +6,6 @@ export type {
  AgentContext,
  AgentFn,
  CasStore,
-  ExtractContext,
  ExtractFn,
  ExtractResult,
  FALLBACK,
@@ -8,7 +8,6 @@ export type {
  AgentContext,
  AgentFn,
  CasStore,
-  ExtractContext,
  ExtractFn,
  ExtractResult,
  FALLBACK,
@@ -2,7 +2,7 @@ import type { RoleDefinition } from "@uncaged/workflow-runtime";
 import * as z from "zod/v4";

 export const coderMetaSchema = z.object({
-  completedPhase: z.string(),
+  completedPhase: z.string().describe("Hash of the planner phase finished this round (exact hash string from the plan). If several phases were finished, use the hash of the last one."),
  filesChanged: z.array(z.string()),
  summary: z.string(),
 });
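Moving the hint into `.describe()` only helps if the description reaches the model. Presumably it does so through the tool parameters that `extractFunctionToolFromZodSchema` builds; that function's body is not in this diff. Zod v4's `z.toJSONSchema` emits `.describe()` text as a `description` keyword. The sketch below imitates that idea without Zod, using a hypothetical `FieldSpec` type and a hypothetical `toToolParameters` helper.

```typescript
// Hypothetical illustration of how a field description can reach a tool's parameter schema.
type FieldSpec = { kind: "string" | "string[]"; description?: string };

type ToolParameters = {
  type: "object";
  properties: Record<string, Record<string, unknown>>;
  required: string[];
};

function toToolParameters(fields: Record<string, FieldSpec>): ToolParameters {
  const properties: Record<string, Record<string, unknown>> = {};
  for (const [name, spec] of Object.entries(fields)) {
    const base: Record<string, unknown> =
      spec.kind === "string" ? { type: "string" } : { type: "array", items: { type: "string" } };
    // Only fields that carry a description get one in the output.
    properties[name] = spec.description === undefined ? base : { ...base, description: spec.description };
  }
  return { type: "object", properties, required: Object.keys(fields) };
}

// Same field layout as coderMetaSchema; only completedPhase carries a hint.
const coderFields: Record<string, FieldSpec> = {
  completedPhase: { kind: "string", description: "Hash of the planner phase finished this round." },
  filesChanged: { kind: "string[]" },
  summary: { kind: "string" },
};
```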
@@ -27,8 +27,6 @@ export const coderRole: RoleDefinition<CoderMeta> = {
  description:
    "Implements the next incomplete planner phase and reports structured completion metadata.",
  systemPrompt: CODER_SYSTEM,
-  extractPrompt:
-    "Extract completedPhase: the planner phase hash finished this round (exact hash string from the plan). If multiple phases were finished in one round, use the last finished phase hash. Extract filesChanged and a summary of the work.",
  schema: coderMetaSchema,
  extractRefs: (meta) => [meta.completedPhase],
 };
@@ -28,8 +28,6 @@ Do not attempt to fix failures yourself.`;
 export const committerRole: RoleDefinition<CommitterMeta> = {
  description: "Creates a branch and commits changes.",
  systemPrompt: COMMITTER_SYSTEM,
-  extractPrompt:
-    "Extract the commit result: committed (with branch and SHA), recoverable failure, or unrecoverable failure. Include error details and log references if applicable.",
  schema: committerMetaSchema,
  extractRefs: null,
 };
@@ -44,8 +44,6 @@ Order phases so earlier steps unblock later ones. Cover root cause, edge cases,
 export const plannerRole: RoleDefinition<PlannerMeta> = {
  description: "Breaks the task into sequential phases for the coder.",
  systemPrompt: PLANNER_SYSTEM,
-  extractPrompt:
-    "Extract the implementation phases from the agent's output. Each phase has a hash (the CAS content-hash returned by the cas put command) and a title (one-line summary).",
  schema: plannerMetaSchema,
  extractRefs: (meta) => meta.phases.map((p) => p.hash),
 };
@@ -37,8 +37,6 @@ Be thorough. A false approve costs more than a false reject.`;
 export const reviewerRole: RoleDefinition<ReviewerMeta> = {
  description: "Runs git diff checks and sets approved when the change is ready.",
  systemPrompt: REVIEWER_SYSTEM,
-  extractPrompt:
-    "Extract the review verdict: approved or rejected. If rejected, list the blocking issues.",
  schema: reviewerMetaSchema,
  extractRefs: null,
 };
@@ -19,8 +19,6 @@ const TESTER_SYSTEM = `You are a tester. Run the project's test suite, build, an
 export const testerRole: RoleDefinition<TesterMeta> = {
  description: "Runs test, build, and lint commands and reports pass or fail with details.",
  systemPrompt: TESTER_SYSTEM,
-  extractPrompt:
-    "Extract the verification result: passed with summary details, or failed with details of what broke.",
  schema: testerMetaSchema,
  extractRefs: null,
 };
@@ -16,21 +16,10 @@ The actual implementation (planning → coding → reviewing → testing → com

 Pass through the task and let the child workflow do the work.`;

-const DEVELOPER_EXTRACT_PROMPT = `The agent output is the root CAS hash of a child workflow thread. Use the cas_get tool to traverse the Merkle DAG and extract the developer summary.
-
-Procedure:
-1. cas_get(<rootHash>) — the root node lists all child step hashes (planner, coder, reviewer, tester, committer).
-2. Find the committer step. cas_get its hash to read the committer's meta — extract branch and commitSha from there.
-3. Find every coder step. cas_get each to read the coder's filesChanged. Union all filesChanged across coder steps.
-4. Compose a short human-readable summary describing what the develop child workflow accomplished (drawn from the coder summaries, or a synthesis of them).
-
-Return: { branch, commitSha, filesChanged, summary }.`;
-
 export const developerRole: RoleDefinition<DeveloperMeta> = {
  description:
    "Delegates the actual implementation to the develop workflow (workflow-as-agent). Produces a summary by traversing the child thread's Merkle DAG.",
  systemPrompt: DEVELOPER_SYSTEM,
-  extractPrompt: DEVELOPER_EXTRACT_PROMPT,
  schema: developerMetaSchema,
  extractRefs: () => [],
 };
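The removed `DEVELOPER_EXTRACT_PROMPT` spelled out a Merkle DAG traversal:

1. Read the root node's step hashes.
2. Take `branch` and `commitSha` from the committer step.
3. Union `filesChanged` over every coder step.
4. Summarize.

After this change, the extract model receives only the agent output (the root hash), the schema, and `cas_get`, so that procedure is no longer part of the role definition. The traversal itself can be sketched deterministically. The sketch assumes hypothetical in-memory node shapes (`RootNode`, `StepNode`) rather than the real YAML nodes.

```typescript
// Hypothetical node shapes; the real CAS nodes are YAML with type, payload, refs/children.
type StepNode = { kind: "step"; role: string; meta: Record<string, unknown> };
type RootNode = { kind: "root"; children: string[] };
type DagNode = StepNode | RootNode;

type DeveloperSummary = {
  branch: string;
  commitSha: string;
  filesChanged: string[];
  summary: string;
};

// Follows the removed prompt: root -> step hashes, committer meta, union of coder filesChanged.
function summarizeDeveloperThread(nodes: Map<string, DagNode>, rootHash: string): DeveloperSummary {
  const root = nodes.get(rootHash);
  if (root === undefined || root.kind !== "root") throw new Error(`not a root node: ${rootHash}`);

  const steps = root.children.map((hash): StepNode => {
    const node = nodes.get(hash);
    if (node === undefined || node.kind !== "step") throw new Error(`missing step node: ${hash}`);
    return node;
  });

  // If a thread ever had several committer steps, take the last one.
  const committer = steps.slice().reverse().find((step) => step.role === "committer");
  if (committer === undefined) throw new Error("no committer step in thread");
  const { branch, commitSha } = committer.meta;
  if (typeof branch !== "string" || typeof commitSha !== "string") {
    throw new Error("committer meta lacks branch/commitSha");
  }

  const files = new Set<string>();
  const summaries: string[] = [];
  for (const step of steps) {
    if (step.role !== "coder") continue;
    const changed = step.meta.filesChanged;
    if (Array.isArray(changed)) {
      for (const file of changed) if (typeof file === "string") files.add(file);
    }
    const summary = step.meta.summary;
    if (typeof summary === "string") summaries.push(summary);
  }

  // Naive stand-in for the model's synthesized summary: join the coder summaries.
  return { branch, commitSha, filesChanged: Array.from(files), summary: summaries.join(" ") };
}
```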
@@ -44,8 +44,6 @@ export const preparerRole: RoleDefinition<PreparerMeta> = {
  description:
    "Locates or clones the target repository, ensures it is up to date, and gathers project context (conventions, toolchain).",
  systemPrompt: PREPARER_SYSTEM,
-  extractPrompt:
-    "Extract repoPath (absolute path), defaultBranch, conventions (summary string or null), and toolchain (packageManager, testCommand, lintCommand, buildCommand — each string or null).",
  schema: preparerMetaSchema,
  extractRefs: null,
 };
@@ -31,13 +31,9 @@ Read the thread for context:

 On any failure (push rejected, gh not authenticated, PR creation failed, etc.), report status="failed" with a short error message. Do not retry — surface the error so the moderator can decide.`;

-const SUBMITTER_EXTRACT_PROMPT =
-  "Extract the submission result. status='submitted' with prUrl on success, or status='failed' with a short error message on failure.";
-
 export const submitterRole: RoleDefinition<SubmitterMeta> = {
  description: "Pushes the developer's branch to the remote and opens a pull request.",
  systemPrompt: SUBMITTER_SYSTEM,
-  extractPrompt: SUBMITTER_EXTRACT_PROMPT,
  schema: submitterMetaSchema,
  extractRefs: null,
 };
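The removed `SUBMITTER_EXTRACT_PROMPT` described two outcomes: `status='submitted'` with `prUrl`, or `status='failed'` with a short error message. With no extract prompt, that distinction now has to be conveyed by `submitterMetaSchema`, which is not shown in this diff. A discriminated union expresses it directly; in Zod v4 the corresponding construct is `z.discriminatedUnion`. Below is a plain-TypeScript sketch. It uses a hypothetical `parseSubmitterMeta` in place of the Zod schema and a hypothetical `error` field name.

```typescript
// Hypothetical shape for the submitter's meta; field names follow the removed prompt.
type SubmitterMeta =
  | { status: "submitted"; prUrl: string }
  | { status: "failed"; error: string };

// Hand-written validator standing in for submitterMetaSchema.parse.
function parseSubmitterMeta(input: unknown): SubmitterMeta {
  if (typeof input !== "object" || input === null) throw new Error("meta must be an object");
  const { status, prUrl, error } = input as { status?: unknown; prUrl?: unknown; error?: unknown };
  if (status === "submitted" && typeof prUrl === "string") return { status: "submitted", prUrl };
  if (status === "failed" && typeof error === "string") return { status: "failed", error };
  throw new Error("expected { status: 'submitted', prUrl } or { status: 'failed', error }");
}

// Consumers branch on the discriminant; each branch sees only its own fields.
function describeSubmission(meta: SubmitterMeta): string {
  return meta.status === "submitted" ? `PR opened: ${meta.prUrl}` : `Submission failed: ${meta.error}`;
}
```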