feat: retrospect-workflow — add Phase 0 validation

- Check workflow exists in current project, block with wrong_project if not - Compare thread's workflow hash vs current version - If versions differ, diff and filter out already-fixed findings - New status: wrong_project → $END with clear error message
2026-05-30 15:32:33 +08:00
parent 5b26602fd4
commit f741729b41
1 changed files with 31 additions and 11 deletions
@@ -9,30 +9,45 @@ roles:
    procedure: |
      You receive a completed thread ID in your task prompt.

+      Phase 0 — Validation (must pass before any analysis):
+      1. Run `uwf step list <thread-id>` to get thread metadata including the workflow hash
+      2. Run `uwf workflow show <workflow-hash>` to get the workflow name
+      3. Verify the workflow exists locally: check `.workflows/<name>.yaml` in the current repo
+         - If NOT found: output $status=wrong_project with the workflow name. Do NOT proceed.
+      4. Compare the thread's workflow hash against the current registered version:
+         - Run `uwf workflow show <name>` to get the current hash
+         - If hashes differ: the thread ran on an older version. Note this — you will need to diff versions after analysis.
+
      Phase 1 — Overview scan:
-      1. Run `uwf step list <thread-id>` to get all steps with role, status, durationMs, and token usage
-      2. For each step, compute a health signal:
+      5. From the step list, compute a health signal for each step:
         - Duration: flag if >2x the median of other steps
         - Output tokens: flag if >2x the median
         - Status flow: flag non-happy-path transitions (rejected, fix_code, fix_spec, hook_failed)
         - Step count: flag if the same role appears more than expected (indicates loops)
-      3. If no anomalies found, output $status=clean with a brief summary
+      6. If no anomalies found AND versions match: output $status=clean
+      7. If no anomalies found BUT versions differ:
+         - Diff the two workflow versions to check if any procedure changes are relevant
+         - If the current version already addresses potential concerns: output $status=clean with a note
+         - Otherwise: proceed to Phase 2

      Phase 2 — Targeted deep-dive (only for flagged steps):
-      4. For each flagged step, run `uwf step show <hash>` to get the detail with turns
-      5. Analyze the turn sequence for:
+      8. For each flagged step, run `uwf step show <hash>` to get the detail with turns
+      9. Analyze the turn sequence for:
         - Repeated tool calls with the same or similar input (blind retries)
         - Tool errors followed by no strategy change (same approach retried)
         - Unnecessary exploration (reading files or running commands unrelated to the task)
         - Hallucinated commands or flags (commands that don't exist or wrong syntax)
         - Excessive turns before reaching the goal
-      6. For each finding, record:
-         - Which role and step hash
-         - What happened (specific turn indices and commands)
-         - Root cause hypothesis (procedure gap, missing pitfall, unclear instruction)
-         - Suggested fix (what to add/change in the procedure)
+      10. For each finding, record:
+          - Which role and step hash
+          - What happened (specific turn indices and commands)
+          - Root cause hypothesis (procedure gap, missing pitfall, unclear instruction)
+          - Suggested fix (what to add/change in the procedure)
+      11. If versions differ: compare findings against the version diff.
+          Mark any finding that is already fixed in the current version as "resolved_in_current".
+          Only report findings that are NOT yet addressed.

-      Output a structured findings report. Set $status=clean if nothing found, or $status=findings if issues detected.
+      Output a structured findings report. Set $status=clean if nothing actionable, $status=findings if unresolved issues exist, or $status=wrong_project if the workflow doesn't belong here.
    output: "A findings report with per-issue root cause and suggested procedure fixes. Set $status to clean or findings (with report hash)."
    frontmatter:
      oneOf:
@@ -45,6 +60,10 @@ roles:
            report: { type: string }
            targetWorkflow: { type: string }
          required: [$status, report, targetWorkflow]
+        - properties:
+            $status: { const: "wrong_project" }
+            workflowName: { type: string }
+          required: [$status, workflowName]
  proposer:
    description: "Translates findings into concrete workflow edits"
    goal: "You are a workflow improvement proposer. You read the analyst's findings and produce specific, minimal edits to the workflow YAML."
@@ -186,6 +205,7 @@ graph:
  analyst:
    clean: { role: "$END", prompt: "No issues found. Thread executed cleanly." }
    findings: { role: "proposer", prompt: "Findings report: {{{report}}}. Target workflow: {{{targetWorkflow}}}. Propose minimal edits." }
+    wrong_project: { role: "$END", prompt: "Thread uses workflow '{{{workflowName}}}' which does not exist in this project. Run retrospect from the correct repo." }
  proposer:
    no_action: { role: "$END", prompt: "No actionable changes needed: {{{reason}}}." }
    ready: { role: "developer", prompt: "Apply the change plan (CAS hash: {{{plan}}}) to the workflow definitions in this repo." }