feat: workflow exit codes & kill mechanism #122
Branch: feat/121-workflow-exit-codes
What
Add Unix-style exit codes to workflow runs and a kill mechanism for active workflows.
Why
Previously, workflows only had started/completed/crashed states, with no way to distinguish failure reasons or to actively stop a running workflow (#121).
Changes
nerve workflow kill command, exit_code display
Ref
Fixes #121
小橘 🍊(NEKO Team)
Code Review Summary
Verdict: Approve ✅. A clean, well-executed implementation: the exit code choices (0/1/2/137/255) follow Unix conventions, and cooperative kill via KillFlag is a safe design.
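The five exit codes named in the verdict can be summarized as a lookup table. This is only a sketch: the ExitCode object and describeExitCode() are hypothetical names, and the meanings given for 1, 2 and 255 are the usual shell conventions, not something the PR confirms. The one value that follows directly from convention is 137, which is 128 + 9, the shell's encoding for a process terminated by SIGKILL.

```typescript
// Hypothetical constants; only the numeric values 0/1/2/137/255 come from the review.
const SIGKILL = 9;

const ExitCode = {
  Success: 0,
  Failure: 1, // general error (common Unix convention)
  Usage: 2, // misuse / invalid input (bash convention); meaning in this PR is assumed
  Killed: 128 + SIGKILL, // 137: "terminated by signal 9", matching the kill path
  Unknown: 255, // abnormal or unknown termination (assumed meaning)
} as const;

// Map a numeric exit code to a short label, e.g. for a CLI listing of runs.
function describeExitCode(code: number): string {
  switch (code) {
    case ExitCode.Success:
      return "success";
    case ExitCode.Failure:
      return "failure";
    case ExitCode.Usage:
      return "usage error";
    case ExitCode.Killed:
      return "killed";
    case ExitCode.Unknown:
      return "unknown";
    default:
      // Other values between 129 and 254 follow the 128 + signal convention.
      return code > 128 && code < 255 ? `signal ${code - 128}` : `exit ${code}`;
  }
}
```

Keeping 137 as `128 + SIGKILL` rather than a bare literal documents why the kill path uses that value.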
⚠️ Worth noting
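- No timeout fallback when killing active threads: killThread() sends a kill-thread message to an active thread and immediately returns true. If the worker is dead or IPC is disconnected (sendKillThread silently swallows errors), the thread stays in state.active forever. Consider adding a timeout-based cleanup mechanism in a follow-up (not blocking this PR).
- Cooperative cancellation latency: killFlag is only checked after executeRole() completes. If a role blocks for a long time (e.g. waiting for an LLM response), the kill does not take effect immediately. This is an inherent trade-off of the cooperative model and is acceptable, but worth explaining in a code comment.
💡 Suggestions (non-blocking)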
✅ What was done well
- Extracting mapWorkflowRunRow eliminated duplicated code
- killed status
Reviewed by 小墨 🖊️
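@@ -544,0 +577,4 @@
const workerEntry = workers.get(workflowName);
if (workerEntry !== undefined) {
  sendKillThread(workerEntry.process, runId);
}
⚠️ If the worker has already crashed or IPC is disconnected, sendKillThread fails silently, yet this code has already returned true. The caller will believe the kill succeeded, while the thread actually stays in state.active indefinitely. Suggestion: add a timeout mechanism in a follow-up. If no killed event arrives within N seconds, forcibly remove the thread from active and record it as killed with exit code 255.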
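The timeout fallback suggested in this comment could look roughly like the sketch below. Everything in it is hypothetical: ThreadRegistry, the injected sendKill callback and the 5-second default stand in for the real workers / state.active plumbing, which the review does not show. killThread() still sends the kill message and returns immediately, but it also arms a timer; if the worker's killed event never arrives (for example because the send failed silently), the timer removes the run from active and records it as killed with exit code 255.

```typescript
type RunId = string;

interface RunRecord {
  status: "killed";
  exitCode: number;
}

// Hypothetical stand-in for the workflow state; not the PR's real types.
class ThreadRegistry {
  readonly active = new Set<RunId>();
  readonly finished = new Map<RunId, RunRecord>();
  private readonly pendingKills = new Map<RunId, ReturnType<typeof setTimeout>>();
  // May fail silently, like sendKillThread when the worker or IPC is gone.
  private readonly sendKill: (runId: RunId) => void;
  private readonly timeoutMs: number;

  constructor(sendKill: (runId: RunId) => void, timeoutMs = 5_000) {
    this.sendKill = sendKill;
    this.timeoutMs = timeoutMs;
  }

  killThread(runId: RunId): boolean {
    if (!this.active.has(runId)) return false;
    if (this.pendingKills.has(runId)) return true; // kill already in flight
    this.sendKill(runId);
    // Fallback in case the worker never confirms the kill.
    const timer = setTimeout(() => {
      this.pendingKills.delete(runId);
      if (this.active.delete(runId)) {
        this.finished.set(runId, { status: "killed", exitCode: 255 });
      }
    }, this.timeoutMs);
    this.pendingKills.set(runId, timer);
    return true;
  }

  // Called when the worker reports the cooperative "killed" event (exit code 137).
  onKilledEvent(runId: RunId, exitCode: number): void {
    const timer = this.pendingKills.get(runId);
    if (timer !== undefined) {
      clearTimeout(timer);
      this.pendingKills.delete(runId);
    }
    if (this.active.delete(runId)) {
      this.finished.set(runId, { status: "killed", exitCode });
    }
  }
}
```

A confirmed kill keeps the worker-reported 137, while 255 marks runs that were only cleaned up by the timer, so the two cases stay distinguishable in the run history.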
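@@ -230,1 +238,4 @@
const result = await executeRole(def, nextRole, start, roleMessages, runId);
if (killFlag.value) {
  sendThreadEvent(runId, "killed", { exitCode: 137 });
💡 killFlag is only checked after a role finishes executing. If a role performs a long blocking operation internally (such as a network request), the response to a kill will be delayed.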
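This is inherent to cooperative cancellation, and the current implementation is fine. If faster response is needed in the future, consider passing killFlag into the role itself or using an AbortController.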
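A rough sketch of the AbortController variant, assuming roles can be changed to accept an AbortSignal. runWorkflow(), slowRole(), Role and ThreadEvent are hypothetical; only the killed / 137 pairing comes from the diff. The signal is still checked after each role (as killFlag is today), and it is also passed into the role, so a role waiting on a slow operation can reject early instead of running to completion.

```typescript
type Role = (input: string, signal: AbortSignal) => Promise<string>;
type ThreadEvent =
  | { kind: "completed"; output: string }
  | { kind: "killed"; exitCode: number };

// Hypothetical role that simulates a slow call (e.g. an LLM request) but honors abort.
function slowRole(ms: number, label: string): Role {
  return (input, signal) =>
    new Promise<string>((resolve, reject) => {
      if (signal.aborted) return reject(signal.reason);
      const timer = setTimeout(() => resolve(`${input}>${label}`), ms);
      signal.addEventListener(
        "abort",
        () => {
          clearTimeout(timer);
          reject(signal.reason);
        },
        { once: true },
      );
    });
}

async function runWorkflow(roles: Role[], signal: AbortSignal): Promise<ThreadEvent> {
  let data = "start";
  try {
    for (const role of roles) {
      data = await role(data, signal);
      // Same check as the existing post-executeRole killFlag test.
      if (signal.aborted) return { kind: "killed", exitCode: 137 };
    }
  } catch (err) {
    // A role rejected because of the abort: report it as a kill, not a crash.
    if (signal.aborted) return { kind: "killed", exitCode: 137 };
    throw err;
  }
  return { kind: "completed", output: data };
}
```

Usage: the kill handler would call `controller.abort()` instead of (or in addition to) setting killFlag.value; roles that ignore the signal still fall back to the between-role check.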