Compare commits

...

7 Commits

Author SHA1 Message Date
xiaomo 49f3d91d1b chore: dead code cleanup — remove unused exports and fix stale docs
- Delete createEchoAgent (daemon, never referenced)
- Delete isDryRun (workflow-utils, deprecated, always false)
- Delete KNOWN_AGENT_ADAPTER_IDS (core, never referenced)
- Remove parseDurationStringToMs, labelSenseTrigger from core public API
- Remove spawnSafe re-export from workflow-utils
- Fix core/README.md stale API names
- Clean stale hermes-options.ts comment

Closes #302
2026-04-30 14:29:45 +00:00
xingyue 4d75c8683f Merge pull request 'docs(cli): sync Hermes SKILL.md with flat workspace and runtime types' (#301) from fix/298-update-hermes-skill into main 2026-04-30 14:20:36 +00:00
xingyue 99b0e58fb6 Merge pull request 'chore: RFC-006 Phase 4 cleanup — delete worker-fork-support.ts' (#300) from chore/rfc-006-cleanup into main 2026-04-30 14:19:44 +00:00
xiaomo a1b1d5eaf1 chore: RFC-006 Phase 4 cleanup — delete worker-fork-support.ts
- Move formatChildExitSummary/formatCapturedStderrTail to worker-runtime.ts
- Move ignoreSessionBroadcastSignals to new worker-signals.ts
- Delete worker-fork-support.ts (teeCapturedStderr no longer used)
- Update .knowledge/worker-isolation.md and architecture.md for WorkerRuntime
- All 167 tests pass, biome check clean

Closes #283
2026-04-30 14:17:16 +00:00
xingyue 7ce46e7735 Merge pull request 'RFC-006 Phase 3: Migrate workflow-manager process logic to WorkerRuntime' (#295) from refactor/rfc-006-workflow-runtime into main 2026-04-30 14:14:05 +00:00
xiaomo 0455f928f5 fix: address review feedback (星月) for Phase 3
1. sendToWorker: IPC send failure now marks thread as failed + dequeues next
2. crashLimitBlocked Set: prevents new startWorkflow from bypassing crash limit
3. "respawning" log skipped when crash limit is active
4. logWorkflowEvent payload: unknown | null (project convention, not ?:)
2026-04-30 14:10:06 +00:00
xiaomo dc4454d23e refactor: migrate workflow-manager process logic to WorkerRuntime (RFC-006 Phase 3)
- workflow-manager.ts: 792 → 498 lines, no more fork/ChildProcess/crash counting
- Extract pure functions to workflow-manager-support.ts (256 lines)
- WorkerRuntime gains: onCrashLimitReached callback, WorkerDrainOpts for per-call timeout
- worker-pool.ts updated for new evict/drain signature (null opts)
- All 167 daemon tests pass

Closes #282
2026-04-30 13:56:43 +00:00
24 changed files with 756 additions and 562 deletions
+1
View File
@@ -33,6 +33,7 @@ Senses own both the "what" (compute logic) and the "when" (config-driven schedul
- One worker per Workflow type (on-demand)
- Workers never talk to each other
- All user code runs in isolated Workers; kernel never loads user code directly
- **`WorkerRuntime`** (`packages/daemon/src/worker-runtime.ts`) centralizes fork lifecycle for both sense groups (`worker-pool.ts`) and workflow types (`workflow-manager.ts`); see `.knowledge/worker-isolation.md`
## Storage Systems
+8 -2
View File
@@ -12,6 +12,12 @@ Kernel (Main Process)
└── Workflow Worker (review) ── review workflow instances
```
### WorkerRuntime (RFC-006)
Forked worker processes are managed by **`WorkerRuntime`** (`worker-runtime.ts`): one Node child per logical key, cold start, optional respawn after crash, drain/evict, and coordinated shutdown over IPC. **`worker-pool.ts`** (sense groups) and **`workflow-manager.ts`** (workflow types) both configure and delegate to `createWorkerRuntime` instead of owning ad-hoc fork logic.
Worker **entrypoints** (`sense-worker.ts`, `workflow-worker.ts`) import lightweight helpers only — e.g. `worker-signals.ts` for session broadcast signal handling — so they do not pull in the parent-side runtime module.
## Isolation Boundaries
### 1. Sense Workers
@@ -111,10 +117,10 @@ workflows:
### Process Management
#### Signal Handling
Workers ignore session broadcast signals (SIGINT/SIGTERM):
Workers ignore session broadcast signals (SIGINT/SIGTERM) via `ignoreSessionBroadcastSignals()` in `worker-signals.ts`:
```typescript
// Workers ignore terminal signals; kernel coordinates shutdown
process.on("SIGINT", () => {});
process.on("SIGINT", () => {});
process.on("SIGTERM", () => {});
```
+146
View File
@@ -0,0 +1,146 @@
# Nerve 单仓死代码分析报告
**分析日期**: 2026-04-30
**范围**: `packages/core`, `daemon`, `store`, `cli`, `khala`, `workflow-utils`, `workflow-meta`, `adapter-cursor`, `adapter-hermes` 的 TypeScript 源码与 `package.json` 依赖。
**方法**: 全仓 `ripgrep` 交叉验证 `import` / `export` 路径;未运行 Knip/TS 死代码专项工具。
**说明**: 未包含 `role-reviewer` / `role-committer` / `skills` 等包(你列出的范围外),但 `workflow-meta` 对其中部分有依赖,相关结论已注明。
---
## 方法限制(读前必读)
| 限制 | 影响 |
|------|------|
| **动态 `import(url)`** | `cli``loadDaemonModule` 在运行时加载 `@uncaged/nerve-daemon`,静态分析无法把 `createKernel` 等映射到具体导出。 |
| **已发布包的公共 API** | 大量导出仅被仓外消费者使用;本报告中的「仓内未引用」≠「应删除」。 |
| **构建入口** | Rslib 多入口(如 `sense-worker``workflow-worker``daemon-bootstrap`)视为存活入口,不作为孤儿文件。 |
| **内部未导出函数** | 未做逐函数调用图;「死函数」仅列出高置信个例。 |
---
## 1. 未使用导出(仓内无 `import`)
以下符号在 **monorepo 内** 没有任何文件从 `@uncaged/nerve-core` / `@uncaged/nerve-workflow-utils` / `@uncaged/nerve-daemon` 等包根导出处引用(测试与定义自身除外)。
### 1.1 `@uncaged/nerve-core`
| 路径 | 未使用项 | 置信度 | 建议 |
|------|----------|--------|------|
| `packages/core/src/index.ts``agent.js` | `KNOWN_AGENT_ADAPTER_IDS` | **高** | **调查**:若 CLI/校验应约束 adapter id,可接入;否则可从公共 API 移除或保留作文档性常量并注明。 |
| `packages/core/src/index.ts``sense.js` | `labelSenseTrigger``senseTriggerLabels``cli`/`daemon` 有使用) | **高(相对仓内)** | **保留或收敛导出**:仅 `senseTriggerLabels` 对外即可;`labelSenseTrigger` 可作为内部函数若不需要单独暴露。 |
| `packages/core/src/index.ts``util.js` | `parseDurationStringToMs`(仅 `config.ts` 内部使用) | **高** | **调查**:无需在 `index` 再导出则改为非导出或移除 re-export,减少 API 面。 |
| `packages/core/src/index.ts``sense.js` | 类型 `SenseModule` | **中** | **保留**:用户文档/外部 Sense 模块常用;仓内未按名引用属正常。 |
| `packages/core/src/index.ts``config.js` | 单独导出的类型 `NerveApiConfig`(仅 `config.ts``index` 出现) | **低** | **保留**:通常随 `NerveConfig` 一并使用;单独 export 多为 TS 公共类型便利。 |
### 1.2 `@uncaged/nerve-workflow-utils`
下列项在 **`packages/*/src/**/*.ts` 中无人从包入口导入**(`workflow-meta``role-committer` 等主要只用 `createRole``decorateRole``onFail``withDryRun``LlmExtractorConfig`)。
| 路径 | 未使用项 | 置信度 | 建议 |
|------|----------|--------|------|
| `packages/workflow-utils/src/index.ts` | `createLlmAdapter` | **高** | **保留(对外 API)** 或标注文档;仓内无调用。 |
| 同上 | `llmExtract``llmExtractWithRetry` | **高** | **保留**:高级用法 / 文档示例;内部与测试使用。 |
| 同上 | `mergeExtractConfig``ExtractConfigLayer` | **高** | **调查**:若 RFC 分层配置仍在推进则保留;否则评估移除导出。 |
| 同上 | `assertZodMetaSchemas``createLlmExtractFn``extractMetaOrThrow``zodMeta``ZodMetaSchema` | **高** | **保留(对外 API)**;当前仅 `workflow-utils` 测试与内部 `create-role` 链路使用 `extractMetaOrThrow`。 |
| 同上 | `readNerveYaml``nerveAgentContext` 及相关类型 | **高** | **调查**:若已无 YAML 上下文注入场景可删导出;否则保留给 Agent 工具链。 |
| 同上 | `spawnSafe``nerveCommandEnv``Spawn*` 类型(从 core 再导出) | **高** | **删除再导出或改为文档链接**:仓内均直接从 `@uncaged/nerve-core` 引用 spawn。 |
| 同上 | `isDryRun` | **高** | **删除或实现**:见 §3「死函数」。 |
| 同上 | `LlmMessage``MetaExtractConfig``LlmChatError``LlmError``LlmProvider` 等类型再导出 | **中** | **保留**:供外部精细类型标注;仓内未单独 import。 |
### 1.3 `@uncaged/nerve-daemon`
| 路径 | 未使用项 | 置信度 | 建议 |
|------|----------|--------|------|
| `packages/daemon/src/index.ts``agent-adapters/echo.js` | `createEchoAgent` | **高** | **调查**:无任何测试或运行时引用;若设计保留 `type: "echo"` 适配器,应在 workflow/agent 装载路径接线或删除导出与文件。 |
| `packages/daemon/src/index.ts`(及其它导出) | 多数 IPC / runtime 符号 | **低** | **保留**:动态加载与外部集成无法静态判定;仅 `createEchoAgent` 可确定为当前静态图下的死角。 |
### 1.4 `@uncaged/nerve-adapter-cursor` / `@uncaged/nerve-adapter-hermes`
| 路径 | 未使用项 | 置信度 | 建议 |
|------|----------|--------|------|
| `packages/adapter-cursor/src/index.ts` | `cursorAdapter``createCursorAdapter`(以及 `cursorAgent` 仅被 `workflow-utils/src/role-cursor.ts` 使用) | **中** | **保留**:默认实例与工厂供下游与工作流仓使用;仓内业务包未直接 import `cursorAdapter` 属预期。 |
| `packages/adapter-hermes/src/index.ts` | `hermesAdapter``createHermesAdapter``hermesAgent``workflow-utils` 间接使用) | **中** | 同上。 |
### 1.5 `@uncaged/nerve-cli`
| 路径 | 未使用项 | 置信度 | 建议 |
|------|----------|--------|------|
| `packages/cli/src/index.ts` | `getNerveRoot``loadDaemonModule` 等程序化导出 | **高(仓内)** | **保留**:无任何 workspace 包引用 `@uncaged/nerve-cli`;面向 CLI 二次开发者。 |
---
## 2. 孤儿文件(非入口、未被其它源码引用)
在当前 Rslib / `cli` 入口定义下,**未发现**「零 import、且非 entry / 非测试」的 `.ts` 业务文件:
- `workflow-utils` 下的 `role-cursor.ts``role-hermes.ts``role-llm.ts``role-react.ts` 虽**未出现在包 `index.ts`**,但被 `packages/workflow-utils/src/__tests__/role-factories.test.ts` 直接引用,**不是孤儿文件**。
- `daemon``sense-worker.ts``workflow-worker.ts` 等为 **显式构建入口**
**置信度**: 对全表扫描为 **中**(未跑依赖图工具;动态路径可能掩盖极少数脚本引用)。
---
## 3. 死函数(内部未导出且未被调用)
| 路径 | 说明 | 置信度 | 建议 |
|------|------|--------|------|
| `packages/workflow-utils/src/role-types.ts` | `isDryRun(_start: StartStep)`:导出在 `index.ts`,但全仓无调用(函数体恒返回 `false` 且标注 deprecated) | **高** | **删除**或改为内部常量;若需向后兼容则保留并文档标注「保留占位」。 |
其它内部函数未系统逐文件枚举。
---
## 4. 未使用的 npm 依赖(package.json)
在声明范围内通过源码 `import` 抽查:**未发现明显「完全未使用」的生产依赖**。
| 包 | 依赖 | 结论 |
|----|------|------|
| `cli` | `citty``yaml``picomatch` | 均有对应源码引用。 |
| `khala` | `hono``jsonata``ulidx``@uncaged/nerve-core` | 均有引用。 |
| `core` / `daemon` | `yaml``drizzle-orm` | 均有引用。 |
| `workflow-utils` / `workflow-meta` | `zod`、workspace 包 | 均有引用。 |
**置信度**: **中**(未对 peer、可选路径、CLI `bin` 专用依赖做自动化扫描)。
---
## 5. 陈旧测试夹具 / 未引用辅助文件
未发现独立的「fixture 目录」明显失联;`cli``e2e-harness``__tests__` 内 helper 均有对应测试引用。
**置信度**: **低**(未枚举每个 `__tests__` 资源文件)。
---
## 6. 重构遗留 / 文档漂移
| 项目 | 位置 | 说明 | 置信度 | 建议 |
|------|------|------|--------|------|
| 已更名 API 仍出现在 README | `packages/core/README.md` | 仍描述 `parseSenseWorkflowDirective``ParsedSenseWorkflowDirective``SenseComputeRoute`;源码已为 `parseWorkflowTrigger` / `routeSenseComputeOutput` / `RoutedSenseOutput` | **高** | **更新文档**(本次分析不改代码,仅记录)。 |
| Hermes 选项合并注释 | `packages/workflow-utils/src/shared/hermes-agent.ts` | 注释称 absorbed from `hermes-options.ts`,该文件已不存在 | **中** | **清理注释**,避免误导。 |
| `KNOWN_AGENT_ADAPTER_IDS``codex` | `packages/core/src/agent.ts` | 仓内无 `codex` 适配器包;与常量未被引用叠加 | **中** | **对齐产品**:实现适配器或从列表移除。 |
未发现用户提到的 `worker-fork-support` 等字符串副本(全仓无匹配)。
---
## 7. 汇总建议优先级
1. **高优先级调查**: `createEchoAgent``KNOWN_AGENT_ADAPTER_IDS` — 要么接入运行时,要么删减以免维护假象。
2. **API 面收敛**: `parseDurationStringToMs``labelSenseTrigger` 若无意对外,可从 `core` 公共导出移除。
3. **`workflow-utils`**: 评估 `isDryRun` 删除;`spawnSafe` 等从 `workflow-utils` 再导出是否仍有必要。
4. **文档**: 修正 `packages/core/README.md` 中 Sense→Workflow 路由 API 名称。
---
## 8. 重新验证命令示例
后续可在本地采用工具增强置信度(非本次执行):
```bash
pnpm add -D -w knip
pnpm exec knip
```
或在各包对 `@uncaged/nerve-*` 的导出做面向消费者的契约测试,避免误删对外 API。
+22 -14
View File
@@ -4,11 +4,11 @@ Shared types and configuration parser for the [nerve](../../README.md) observati
## What's Inside
- **Type definitions** — `Signal`, `SenseConfig`, `SenseInfo`, `SenseReflexConfig`, `ReflexConfig` (sense-only), `WorkflowConfig`, `NerveConfig`, and related types
- **Type definitions** — `Signal`, `SenseConfig`, `SenseInfo`, `WorkflowConfig`, `NerveConfig`, and related types
- **Config parser** — `parseNerveConfig(yaml)` validates and parses `nerve.yaml` into `NerveConfig` (rejects reflex entries that declare a `workflow` key; reflexes only schedule senses)
- **Sense → workflow routing** — `parseSenseWorkflowDirective`, `routeSenseComputeOutput`, and types `ParsedSenseWorkflowDirective`, `SenseComputeRoute`
- **Sense → workflow routing** — `parseWorkflowTrigger`, `routeSenseComputeOutput`, and types `WorkflowTrigger`, `RoutedSenseOutput`
- **Daemon IPC protocol** — request/response types (`DaemonIpcRequest`, `DaemonIpcResponse`, …) and `parseDaemonIpcRequest` for newline-delimited JSON on the CLI ↔ daemon socket
- **Workflow automaton types** — `START` / `END` sentinel constants, `WorkflowMessage`, `StartStep`, `RoleStep`, `ModeratorContext` (`start` + `steps`; empty `steps` on first moderator call), `Moderator` (single `context` argument), `WorkflowDefinition`, `Role`, `SenseResult`, plus `DEFAULT_ENGINE_MAX_ROUNDS`
- **Workflow automaton types** — `START` / `END` sentinel constants, `WorkflowMessage`, `StartStep`, `RoleStep`, `ModeratorContext` (`start` + `steps`; empty `steps` on first moderator call), `Moderator` (single `context` argument), `WorkflowDefinition`, `Role`, `RoleResult`, plus `DEFAULT_ENGINE_MAX_ROUNDS`
- **Result type** — `Result<T>` with `ok()` / `err()` helpers for explicit error handling (no thrown exceptions for parse paths)
## Usage
@@ -26,23 +26,31 @@ if (result.ok) {
### Sense return → signal vs workflow
```typescript
import { parseSenseWorkflowDirective, routeSenseComputeOutput } from "@uncaged/nerve-core";
import { parseWorkflowTrigger, routeSenseComputeOutput } from "@uncaged/nerve-core";
const directive = parseSenseWorkflowDirective("my-workflow|8|Hello from sense");
const directive = parseWorkflowTrigger({
name: "my-workflow",
maxRounds: 8,
prompt: "Hello from sense",
dryRun: false,
});
if (directive.ok) {
console.log(directive.value.workflowName, directive.value.maxRounds, directive.value.prompt);
console.log(directive.value.name, directive.value.maxRounds, directive.value.prompt);
}
const route = routeSenseComputeOutput({
metric: 42,
workflow: "my-workflow|8|Run now",
signal: { metric: 42 },
workflow: {
name: "my-workflow",
maxRounds: 8,
prompt: "Run now",
dryRun: false,
},
});
if (route.kind === "launch") {
// engine starts workflow; no Signal to the bus for this return
console.log(route.launch);
} else {
// normal signal with payload
console.log(route.payload);
if (route.ok && route.value.workflow !== null) {
console.log(route.value.workflow);
} else if (route.ok) {
console.log(route.value.signal);
}
```
-6
View File
@@ -21,9 +21,3 @@ export class ExtractError extends Error {
Object.setPrototypeOf(this, new.target.prototype);
}
}
/**
* Agent adapter ids referenced by tooling / docs (RFC-003).
* Workflows import adapter packages directly; echo may be used in tests via a small factory.
*/
export const KNOWN_AGENT_ADAPTER_IDS = ["echo", "cursor", "hermes", "codex"] as const;
+1 -3
View File
@@ -13,7 +13,7 @@ export type {
} from "./config.js";
export type { Signal, SenseInfo } from "./sense.js";
export type { SenseComputeFn, SenseModule } from "./sense.js";
export { labelSenseTrigger, senseTriggerLabels } from "./sense.js";
export { senseTriggerLabels } from "./sense.js";
export type {
WorkflowMessage,
RoleResult,
@@ -29,7 +29,6 @@ export type {
WorkflowDefinition,
} from "./workflow.js";
export { START, END, DEFAULT_ENGINE_MAX_ROUNDS } from "./workflow.js";
export { parseDurationStringToMs } from "./util.js";
export type { Schema, ExtractFn } from "./agent.js";
export { ExtractError } from "./agent.js";
export type { Result } from "./util.js";
@@ -46,7 +45,6 @@ export { parseNerveConfig } from "./config.js";
export type { KnowledgeConfig } from "./config.js";
export { parseKnowledgeYaml } from "./config.js";
export { isPlainRecord } from "./util.js";
export { KNOWN_AGENT_ADAPTER_IDS } from "./agent.js";
export type { RoutedSenseOutput } from "./sense.js";
export { parseWorkflowTrigger, routeSenseComputeOutput } from "./sense.js";
@@ -28,6 +28,11 @@ function makeMockChild(pid = 1): MockChild {
child.connected = true;
child.exitCode = null;
child.pid = pid;
setImmediate(() => {
if (child.connected) {
child.emit("message", { type: "ready" });
}
});
child.send = vi.fn((msg: unknown) => {
if (
msg !== null &&
@@ -132,6 +137,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
mgr.startWorkflow("my-wf", { prompt: "test 1", maxRounds: 10, dryRun: false });
mgr.startWorkflow("my-wf", { prompt: "test 2", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mgr.activeCount("my-wf")).toBe(2);
// Simulate unexpected exit (not shutdown)
@@ -159,6 +165,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mgr.activeCount("my-wf")).toBe(2);
const child = mockChildren[0];
@@ -183,6 +190,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mockChildren).toHaveLength(1);
const child = mockChildren[0];
@@ -216,6 +224,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const firstChild = mockChildren[0];
firstChild.exitCode = 1;
firstChild.connected = false;
@@ -260,6 +269,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
// Start one thread to fill the concurrency slot (so queued run stays queued on respawn)
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const firstChild = mockChildren[0];
firstChild.exitCode = 1;
firstChild.connected = false;
@@ -285,6 +295,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
const startCall = (child.send as ReturnType<typeof vi.fn>).mock.calls[0];
@@ -322,6 +333,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
const launch = { prompt: "build-docker for myrepo", maxRounds: 10, dryRun: false };
mgr.startWorkflow("my-wf", launch);
await vi.runAllTimersAsync();
const startedCall = logStore.upsertWorkflowRun.mock.calls.find(
(args: any[]) => (args[0] as { type: string }).type === "started",
@@ -357,6 +369,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
// Start one thread to fill the concurrency slot
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const firstChild = mockChildren[0];
// Crash once → respawn → crash again → second respawn
@@ -398,6 +411,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const firstChild = mockChildren[0];
firstChild.exitCode = 1;
firstChild.connected = false;
@@ -428,6 +442,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("crash-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
// Crash the worker 6 times in rapid succession (within CRASH_WINDOW_MS = 60s)
for (let i = 0; i < 6; i++) {
@@ -33,6 +33,11 @@ function makeMockChild(pid = 1): MockChild {
child.connected = true;
child.exitCode = null;
child.pid = pid;
setImmediate(() => {
if (child.connected) {
child.emit("message", { type: "ready" });
}
});
child.send = vi.fn((msg: unknown) => {
if (
msg !== null &&
@@ -114,6 +119,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mockChildren).toHaveLength(1);
// Remove workflow from config before drain completes
@@ -134,6 +140,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mgr.activeCount("my-wf")).toBe(2);
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
@@ -165,6 +172,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mockChildren).toHaveLength(1);
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
@@ -181,6 +189,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mockChildren).toHaveLength(1);
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
@@ -198,6 +207,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
await vi.runAllTimersAsync();
@@ -223,6 +233,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "first", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
await vi.runAllTimersAsync();
@@ -230,6 +241,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
// Start a new thread on the fresh worker
mgr.startWorkflow("my-wf", { prompt: "second", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const newChild = mockChildren[1];
const startCalls = (newChild.send as ReturnType<typeof vi.fn>).mock.calls.filter(
@@ -257,12 +269,13 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
vi.clearAllMocks();
});
it("does not send shutdown while a thread is still active", () => {
it("does not send shutdown while a thread is still active", async () => {
const logStore = makeLogStore();
const config = makeWfConfig({ "my-wf": { concurrency: 1, overflow: "drop" } });
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
mgr.drainWhenIdle("my-wf");
@@ -282,6 +295,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
const runId = (child.send as ReturnType<typeof vi.fn>).mock.calls[0][0] as { runId: string };
@@ -311,6 +325,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
mgr.startWorkflow("my-wf", { prompt: "a", maxRounds: 10, dryRun: false });
mgr.startWorkflow("my-wf", { prompt: "b", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
const sendMock = child.send as ReturnType<typeof vi.fn>;
const runIdA = (sendMock.mock.calls[0][0] as { runId: string }).runId;
@@ -355,6 +370,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
const runId = (child.send as ReturnType<typeof vi.fn>).mock.calls[0][0] as { runId: string };
@@ -388,6 +404,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
const runId = (child.send as ReturnType<typeof vi.fn>).mock.calls[0][0] as { runId: string };
@@ -414,6 +431,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "once", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const firstChild = mockChildren[0];
const runId = (firstChild.send as ReturnType<typeof vi.fn>).mock.calls[0][0] as {
runId: string;
@@ -471,6 +489,7 @@ describe("Kernel — workflow hot reload via file-watcher (Phase 3)", () => {
// Trigger a workflow thread so a worker is spawned
kernel.workflowManager.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
// Manually call drainAndRespawn (simulating what kernel does on workflow file change)
const drainPromise = kernel.workflowManager.drainAndRespawn("my-wf", 1000);
@@ -511,6 +530,7 @@ describe("Kernel — workflow hot reload via file-watcher (Phase 3)", () => {
maxRounds: 10,
dryRun: false,
});
await vi.runAllTimersAsync();
expect(mockChildren).toHaveLength(1);
// Reload config without old-wf
@@ -551,6 +571,7 @@ describe("Kernel — workflow hot reload via file-watcher (Phase 3)", () => {
});
kernel.workflowManager.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const workersBefore = mockChildren.length;
// Reload with updated concurrency — should NOT spawn a new workflow worker
@@ -573,6 +594,7 @@ describe("Kernel — workflow hot reload via file-watcher (Phase 3)", () => {
// Can now start up to 5 concurrent threads (previously only 1)
kernel.workflowManager.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
kernel.workflowManager.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(kernel.workflowManager.activeCount("my-wf")).toBe(3);
const stopPromise = kernel.stop();
@@ -194,6 +194,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
// A workflow worker should be spawned and a start-thread message sent
const workflowWorker = mockChildren.find((c) =>
(c.send as ReturnType<typeof vi.fn>).mock.calls.some(
@@ -252,6 +254,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
// Find the start-thread call and verify triggerPayload
const startThreadCall = mockChildren
.flatMap((c) => (c.send as ReturnType<typeof vi.fn>).mock.calls as [unknown][])
@@ -306,6 +310,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
const senseEntries = logStore.append.mock.calls
.map((c) => c[0] as { source: string; type: string; refId: string | null })
.filter((e) => e.source === "sense" && e.refId === "cpu-usage");
@@ -423,6 +429,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
expect(logStore.upsertWorkflowRun).toHaveBeenCalledWith(
expect.objectContaining({ source: "workflow", type: "started" }),
expect.objectContaining({ workflow: "log-test-workflow", status: "started" }),
@@ -498,6 +506,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
const startThreadCall = mockChildren
.flatMap((c) => (c.send as ReturnType<typeof vi.fn>).mock.calls as [unknown][])
.find(
@@ -582,6 +592,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
const startThreadCall = mockChildren
.flatMap((c) => (c.send as ReturnType<typeof vi.fn>).mock.calls as [unknown][])
.find(
@@ -642,6 +654,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
const stopPromise = kernel.stop();
await vi.runAllTimersAsync();
await expect(stopPromise).resolves.toBeUndefined();
@@ -16,6 +16,7 @@ function baseConfig(script: string) {
onMessage: vi.fn(),
onReady: vi.fn(),
onExit: vi.fn(),
onCrashLimitReached: null,
respawn: {
enabled: true,
maxCrashes: 6,
@@ -84,7 +85,7 @@ describe("createWorkerRuntime", () => {
const rt = track(createWorkerRuntime(baseConfig(echoWorkerPath)));
await rt.start("k");
expect(rt.has("k")).toBe(true);
await rt.evict("k");
await rt.evict("k", null);
expect(rt.has("k")).toBe(false);
await rt.shutdown();
});
@@ -94,7 +95,7 @@ describe("createWorkerRuntime", () => {
await rt.start("k");
const before = rt.pid("k");
expect(before).not.toBeNull();
await rt.drain("k");
await rt.drain("k", null);
const after = rt.pid("k");
expect(after).not.toBeNull();
expect(after).not.toBe(before);
@@ -26,6 +26,11 @@ function makeMockChild(pid = 1): MockChild {
child.connected = true;
child.exitCode = null;
child.pid = pid;
setImmediate(() => {
if (child.connected) {
child.emit("message", { type: "ready" });
}
});
child.send = vi.fn((msg: unknown) => {
if (
msg !== null &&
@@ -110,7 +115,7 @@ describe("WorkflowManager", () => {
});
describe("startWorkflow under concurrency limit dispatches thread", () => {
it("forks a worker and sends start-thread when active < concurrency", () => {
it("forks a worker and sends start-thread when active < concurrency", async () => {
const logStore = makeLogStore();
const config = makeConfig({
"my-workflow": { concurrency: 2, overflow: "drop" },
@@ -118,6 +123,7 @@ describe("WorkflowManager", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-workflow", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mockChildren).toHaveLength(1);
expect(mockChildren[0].send).toHaveBeenCalledWith(
@@ -126,7 +132,7 @@ describe("WorkflowManager", () => {
expect(mgr.activeCount("my-workflow")).toBe(1);
});
it("reuses the same worker for a second thread under the limit", () => {
it("reuses the same worker for a second thread under the limit", async () => {
const logStore = makeLogStore();
const config = makeConfig({
"my-workflow": { concurrency: 3, overflow: "drop" },
@@ -135,6 +141,7 @@ describe("WorkflowManager", () => {
mgr.startWorkflow("my-workflow", { prompt: "test 1", maxRounds: 10, dryRun: false });
mgr.startWorkflow("my-workflow", { prompt: "test 2", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
// Only one forked child — worker is reused
expect(mockChildren).toHaveLength(1);
@@ -142,7 +149,7 @@ describe("WorkflowManager", () => {
expect(mgr.activeCount("my-workflow")).toBe(2);
});
it("logs a 'started' event for each dispatched thread", () => {
it("logs a 'started' event for each dispatched thread", async () => {
const logStore = makeLogStore();
const config = makeConfig({
"my-workflow": { concurrency: 2, overflow: "drop" },
@@ -150,6 +157,7 @@ describe("WorkflowManager", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-workflow", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(logStore.upsertWorkflowRun).toHaveBeenCalledWith(
expect.objectContaining({ source: "workflow", type: "started" }),
@@ -159,7 +167,7 @@ describe("WorkflowManager", () => {
});
describe("startWorkflow at limit with drop overflow drops the request", () => {
it("does NOT send start-thread when at concurrency limit with overflow=drop", () => {
it("does NOT send start-thread when at concurrency limit with overflow=drop", async () => {
const logStore = makeLogStore();
const config = makeConfig({
"drop-wf": { concurrency: 1, overflow: "drop" },
@@ -169,6 +177,7 @@ describe("WorkflowManager", () => {
mgr.startWorkflow("drop-wf", { prompt: "first", maxRounds: 10, dryRun: false });
// now at limit — second call should be dropped
mgr.startWorkflow("drop-wf", { prompt: "second", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mgr.activeCount("drop-wf")).toBe(1);
expect(mgr.queueLength("drop-wf")).toBe(0);
@@ -254,7 +263,7 @@ describe("WorkflowManager", () => {
});
describe("completing a thread dequeues the next one", () => {
it("dispatches the next queued thread when the active thread sends completed", () => {
it("dispatches the next queued thread when the active thread sends completed", async () => {
const logStore = makeLogStore();
const config = makeConfig({
"queue-wf": { concurrency: 1, overflow: "queue", maxQueue: 5 },
@@ -263,6 +272,7 @@ describe("WorkflowManager", () => {
mgr.startWorkflow("queue-wf", { prompt: "first", maxRounds: 10, dryRun: false });
mgr.startWorkflow("queue-wf", { prompt: "second", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mgr.activeCount("queue-wf")).toBe(1);
expect(mgr.queueLength("queue-wf")).toBe(1);
@@ -289,7 +299,7 @@ describe("WorkflowManager", () => {
);
});
it("dispatches next queued thread when active thread sends failed", () => {
it("dispatches next queued thread when active thread sends failed", async () => {
const logStore = makeLogStore();
const config = makeConfig({
"queue-wf": { concurrency: 1, overflow: "queue", maxQueue: 5 },
@@ -298,6 +308,7 @@ describe("WorkflowManager", () => {
mgr.startWorkflow("queue-wf", { prompt: "first", maxRounds: 10, dryRun: false });
mgr.startWorkflow("queue-wf", { prompt: "second", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
const firstRunId = (child.send as ReturnType<typeof vi.fn>).mock.calls[0][0].runId as string;
@@ -325,6 +336,7 @@ describe("WorkflowManager", () => {
mgr.startWorkflow("wf-a", { prompt: "test", maxRounds: 10, dryRun: false });
mgr.startWorkflow("wf-b", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
// Two distinct workers should have been forked
expect(mockChildren).toHaveLength(2);
@@ -1,9 +0,0 @@
import type { AgentConfig, AgentFn, ThreadContext } from "@uncaged/nerve-core";
/**
* Echo adapter (`type: "echo"`) — returns the assembled prompt unchanged.
* Used for tests and dry-run wiring before real adapters exist.
*/
export function createEchoAgent(_config: AgentConfig): AgentFn {
return async (_ctx: ThreadContext, prompt: string) => prompt;
}
-2
View File
@@ -56,5 +56,3 @@ export type {
export { createWorkflowManager } from "./workflow-manager.js";
export type { WorkflowManager } from "./workflow-manager.js";
export { createEchoAgent } from "./agent-adapters/echo.js";
+1 -1
View File
@@ -25,7 +25,7 @@ import type { WorkerToParentMessage } from "./ipc.js";
import { parseParentMessage } from "./ipc.js";
import { executeCompute, loadSenseModule, openSenseDb } from "./sense-runtime.js";
import type { SenseRuntime } from "./sense-runtime.js";
import { ignoreSessionBroadcastSignals } from "./worker-fork-support.js";
import { ignoreSessionBroadcastSignals } from "./worker-signals.js";
// ---------------------------------------------------------------------------
// IPC helpers
@@ -1,45 +0,0 @@
import type { ChildProcess } from "node:child_process";
const STDERR_TAIL_MAX_CHARS = 16_384;
/**
* Forked workers inherit the parent's process group. In foreground `nerve dev`,
* terminal-driven SIGINT/SIGTERM is delivered to the whole group, so workers can exit
* on the default handler before the kernel sends `{ type: "shutdown" }` over IPC.
* Swallow these in worker processes so the parent coordinates shutdown (issue #55).
* Only call when `process.send` is defined (fork IPC); standalone `node …-worker.js` keeps default Ctrl+C behaviour.
*/
export function ignoreSessionBroadcastSignals(): void {
const swallow = (): void => {};
process.on("SIGINT", swallow);
process.on("SIGTERM", swallow);
}
export function teeCapturedStderr(child: ChildProcess, tail: { value: string }): void {
const stream = child.stderr;
if (stream === null || stream === undefined) return;
stream.setEncoding("utf8");
stream.on("data", (chunk: string | Buffer) => {
const text = typeof chunk === "string" ? chunk : chunk.toString("utf8");
process.stderr.write(text);
tail.value = (tail.value + text).slice(-STDERR_TAIL_MAX_CHARS);
});
}
export function formatChildExitSummary(code: number | null, signal: NodeJS.Signals | null): string {
const codeStr = code === null || code === undefined ? "null" : String(code);
if (signal) {
return `code=${codeStr} signal=${signal}`;
}
return `code=${codeStr}`;
}
export function formatCapturedStderrTail(tail: string, maxChars = 800): string {
const trimmed = tail.trim();
if (trimmed.length === 0) return "";
const normalized = trimmed.replace(/\r?\n/g, "\\n");
if (normalized.length <= maxChars) {
return ` worker_stderr=${normalized}`;
}
return ` worker_stderr=…${normalized.slice(-maxChars)}`;
}
+8 -4
View File
@@ -6,8 +6,11 @@ import { dirname, join } from "node:path";
import { fileURLToPath } from "node:url";
import type { ComputeMessage } from "./ipc.js";
import { formatCapturedStderrTail, formatChildExitSummary } from "./worker-fork-support.js";
import { createWorkerRuntime } from "./worker-runtime.js";
import {
createWorkerRuntime,
formatCapturedStderrTail,
formatChildExitSummary,
} from "./worker-runtime.js";
export function resolveWorkerScript(): string {
const __filename = fileURLToPath(import.meta.url);
@@ -66,6 +69,7 @@ export function createSenseWorkerPool(options: SenseWorkerPoolOptions): SenseWor
onReady: (_key, msg) => {
options.onWorkerMessage(msg);
},
onCrashLimitReached: null,
onExit: (group, code, signal) => {
const sig =
signal === null || signal === undefined || signal === ""
@@ -102,13 +106,13 @@ export function createSenseWorkerPool(options: SenseWorkerPoolOptions): SenseWor
return;
}
options.onBeforeGroupRestart(group);
await runtime.drain(group);
await runtime.drain(group, null);
}
function evictGroup(group: string): void {
trackedGroups.delete(group);
evicting.add(group);
void runtime.evict(group).finally(() => {
void runtime.evict(group, null).finally(() => {
evicting.delete(group);
});
}
+45 -9
View File
@@ -8,6 +8,28 @@ import { isPlainRecord } from "@uncaged/nerve-core";
const STDERR_TAIL_MAX_CHARS = 2048;
export function formatChildExitSummary(code: number | null, signal: NodeJS.Signals | null): string {
const codeStr = code === null || code === undefined ? "null" : String(code);
if (signal) {
return `code=${codeStr} signal=${signal}`;
}
return `code=${codeStr}`;
}
export function formatCapturedStderrTail(tail: string, maxChars = 800): string {
const trimmed = tail.trim();
if (trimmed.length === 0) return "";
const normalized = trimmed.replace(/\r?\n/g, "\\n");
if (normalized.length <= maxChars) {
return ` worker_stderr=${normalized}`;
}
return ` worker_stderr=…${normalized.slice(-maxChars)}`;
}
export type WorkerDrainOpts = {
shutdownTimeoutMs: number | null;
};
export type WorkerRuntimeConfig<K extends string> = {
script: string;
argsForKey: (key: K) => string[];
@@ -16,6 +38,8 @@ export type WorkerRuntimeConfig<K extends string> = {
onMessage: (key: K, msg: unknown) => void;
onReady: (key: K, msg: unknown) => void;
onExit: (key: K, code: number | null, signal: string | null) => void;
/** Invoked when automatic respawn is skipped because `maxCrashes` was exceeded in `windowMs`. */
onCrashLimitReached: ((key: K) => void) | null;
respawn: {
enabled: boolean;
maxCrashes: number;
@@ -32,8 +56,8 @@ export type WorkerRuntime<K extends string> = {
/** When the worker is already ready and IPC-connected, sends synchronously (returns true). Otherwise false — caller may fall back to `send`. */
trySendSync: (key: K, msg: unknown) => boolean;
start: (key: K) => Promise<void>;
evict: (key: K) => Promise<void>;
drain: (key: K) => Promise<void>;
evict: (key: K, opts: WorkerDrainOpts | null) => Promise<void>;
drain: (key: K, opts: WorkerDrainOpts | null) => Promise<void>;
shutdown: () => Promise<void>;
has: (key: K) => boolean;
/** True when a child exists but IPC is disconnected (legacy pool skipped sends in this case). */
@@ -264,7 +288,14 @@ export function createWorkerRuntime<K extends string>(
await waitForReady(slot, config.shutdownTimeoutMs);
}
async function gracefulStop(slot: WorkerSlot<K>): Promise<void> {
function resolveShutdownTimeoutMs(opts: WorkerDrainOpts | null): number {
if (opts !== null && opts.shutdownTimeoutMs !== null) {
return opts.shutdownTimeoutMs;
}
return config.shutdownTimeoutMs;
}
async function gracefulStop(slot: WorkerSlot<K>, shutdownTimeoutMs: number): Promise<void> {
if (slot.child === null) {
return;
}
@@ -276,7 +307,7 @@ export function createWorkerRuntime<K extends string>(
} catch {
// IPC channel may have closed between null-check and send
}
await waitForChildExit(child, config.shutdownTimeoutMs);
await waitForChildExit(child, shutdownTimeoutMs);
}
async function handleUnexpectedCrashRecovery(slot: WorkerSlot<K>): Promise<void> {
@@ -295,6 +326,9 @@ export function createWorkerRuntime<K extends string>(
console.error(
`[WorkerRuntime] worker "${String(slot.key)}" exceeded crash limit (${String(config.respawn.maxCrashes)} in ${String(config.respawn.windowMs)}ms); not respawning`,
);
if (config.onCrashLimitReached !== null) {
config.onCrashLimitReached(slot.key);
}
return;
}
@@ -303,7 +337,7 @@ export function createWorkerRuntime<K extends string>(
}
async function shutdownWorker(slot: WorkerSlot<K>): Promise<void> {
await gracefulStop(slot);
await gracefulStop(slot, config.shutdownTimeoutMs);
workers.delete(slot.key);
}
@@ -348,22 +382,24 @@ export function createWorkerRuntime<K extends string>(
});
},
evict: async (key: K) => {
evict: async (key: K, opts: WorkerDrainOpts | null) => {
const slot = getOrCreateSlot(key);
const shutdownMs = resolveShutdownTimeoutMs(opts);
await enqueueOp(slot, async () => {
await gracefulStop(slot);
await gracefulStop(slot, shutdownMs);
workers.delete(key);
});
},
drain: async (key: K) => {
drain: async (key: K, opts: WorkerDrainOpts | null) => {
const slot = getOrCreateSlot(key);
const shutdownMs = resolveShutdownTimeoutMs(opts);
await enqueueOp(slot, async () => {
if (slot.child === null) {
await forkAndWaitReady(slot);
return;
}
await gracefulStop(slot);
await gracefulStop(slot, shutdownMs);
await forkAndWaitReady(slot);
});
},
+17
View File
@@ -0,0 +1,17 @@
/**
* Worker-process signal handling (fork IPC children only).
* Worker entrypoints import this module — not worker-runtime.ts (parent/kernel code).
*/
/**
* Forked workers inherit the parent's process group. In foreground `nerve dev`,
* terminal-driven SIGINT/SIGTERM is delivered to the whole group, so workers can exit
* on the default handler before the kernel sends `{ type: "shutdown" }` over IPC.
* Swallow these in worker processes so the parent coordinates shutdown (issue #55).
* Only call when `process.send` is defined (fork IPC); standalone `node …-worker.js` keeps default Ctrl+C behaviour.
*/
export function ignoreSessionBroadcastSignals(): void {
const swallow = (): void => {};
process.on("SIGINT", swallow);
process.on("SIGTERM", swallow);
}
@@ -0,0 +1,256 @@
/**
* Pure helpers and IPC branching for workflow-manager (keeps workflow-manager.ts lean).
*/
import { dirname, join } from "node:path";
import { fileURLToPath } from "node:url";
import type { WorkflowMessage } from "@uncaged/nerve-core";
import { START, isPlainRecord } from "@uncaged/nerve-core";
import type { LogStore, WorkflowRunStatus } from "@uncaged/nerve-store";
import type { ResumeThreadMessage, ThreadEventMessage } from "./ipc.js";
import type { WorkerToParentMessage } from "./ipc.js";
export type PendingThread = {
runId: string;
prompt: string;
maxRounds: number;
dryRun: boolean;
};
export type WorkflowState = {
active: Set<string>;
queue: PendingThread[];
};
/** Matches legacy manager: 6 crashes within 60s stops respawn (was `length > 5`). */
export const WORKFLOW_WORKER_RESPAWN = {
enabled: true,
maxCrashes: 6,
windowMs: 60_000,
delayMs: 0,
} as const;
/**
* Worker shutdown timeout — must stay in sync with SHUTDOWN_TIMEOUT_MS in workflow-worker.ts.
* The drain timeout passed to drainAndRespawn must be >= this value so the worker has
* enough time to finish in-flight threads before the parent force-kills it.
*/
export const WORKER_SHUTDOWN_TIMEOUT_MS = 10_000;
export const DEFAULT_MAX_QUEUE = 100;
export function readLaunchFromTriggerPayload(
raw: unknown,
engineDefaultMaxRounds: number,
): { prompt: string; maxRounds: number; dryRun: boolean } {
if (isPlainRecord(raw)) {
const o = raw;
if (typeof o.prompt === "string" && typeof o.maxRounds === "number") {
const dryRun = typeof o.dryRun === "boolean" ? o.dryRun : false;
return { prompt: o.prompt, maxRounds: o.maxRounds, dryRun };
}
}
return { prompt: "", maxRounds: engineDefaultMaxRounds, dryRun: false };
}
export function ensureThreadMessagesWithStart(
messages: Array<{ role: string; content: string; meta: unknown; timestamp: number }>,
threadId: string,
fallbackPrompt: string,
fallbackMaxRounds: number,
): WorkflowMessage[] {
const mapped: WorkflowMessage[] = messages.map((m) => ({
role: m.role,
content: m.content,
meta: m.meta,
timestamp: m.timestamp,
}));
if (mapped.length > 0 && mapped[0].role === START) {
return mapped;
}
const start: WorkflowMessage = {
role: START,
content: fallbackPrompt,
meta: { maxRounds: fallbackMaxRounds, threadId },
timestamp: Date.now(),
};
return [start, ...mapped];
}
export function resolveWorkflowWorkerScript(): string {
const __filename = fileURLToPath(import.meta.url);
const __dir = dirname(__filename);
return join(__dir, "workflow-worker.js");
}
export function mapWorkflowRunStatus(eventType: string): WorkflowRunStatus | null {
const map: Record<string, WorkflowRunStatus> = {
started: "started",
queued: "queued",
completed: "completed",
failed: "failed",
crashed: "crashed",
dropped: "dropped",
interrupted: "interrupted",
killed: "killed",
};
return map[eventType] ?? null;
}
export function extractExitCodeFromPayload(payload: unknown): number | null {
if (isPlainRecord(payload) && typeof payload.exitCode === "number") {
return payload.exitCode;
}
return null;
}
export function appendWorkflowRunLog(
logStore: LogStore,
workflowName: string,
runId: string,
eventType: string,
payload: unknown | undefined,
exitCode: number | null,
): void {
const timestamp = Date.now();
const serialised = payload !== undefined ? JSON.stringify(payload) : null;
const status = mapWorkflowRunStatus(eventType);
if (status !== null) {
logStore.upsertWorkflowRun(
{
source: "workflow",
type: eventType,
refId: runId,
payload: serialised,
timestamp,
},
{ runId, workflow: workflowName, status, timestamp, exitCode },
);
} else {
logStore.append({
source: "workflow",
type: eventType,
refId: runId,
payload: serialised,
timestamp,
});
}
}
export function recoverQueuedRun(
workflowName: string,
runId: string,
state: WorkflowState,
logStore: LogStore,
engineMaxRounds: number,
): void {
if (state.queue.some((q) => q.runId === runId)) return;
const launch = readLaunchFromTriggerPayload(logStore.getTriggerPayload(runId), engineMaxRounds);
state.queue.push({
runId,
prompt: launch.prompt,
maxRounds: launch.maxRounds,
dryRun: launch.dryRun,
});
process.stderr.write(
`[workflow-manager] crash-recovery: re-queued thread "${runId}" for "${workflowName}"\n`,
);
}
export function recoverStartedRun(
workflowName: string,
runId: string,
state: WorkflowState,
logStore: LogStore,
engineMaxRounds: number,
sendResume: (wf: string, msg: ResumeThreadMessage) => void,
): void {
if (state.active.has(runId)) return;
const rawMessages = logStore.getThreadMessages(runId);
const launch = readLaunchFromTriggerPayload(logStore.getTriggerPayload(runId), engineMaxRounds);
const messages = ensureThreadMessagesWithStart(
rawMessages,
runId,
launch.prompt,
launch.maxRounds,
);
state.active.add(runId);
const msg: ResumeThreadMessage = {
type: "resume-thread",
runId,
messages,
maxRounds: launch.maxRounds,
dryRun: launch.dryRun,
};
sendResume(workflowName, msg);
process.stderr.write(
`[workflow-manager] crash-recovery: resuming thread "${runId}" for "${workflowName}" (${String(messages.length)} messages)\n`,
);
}
export function recoverThreadsFromStore(
workflowName: string,
logStore: LogStore,
engineMaxRounds: number,
getOrCreateState: (name: string) => WorkflowState,
sendResume: (wf: string, msg: ResumeThreadMessage) => void,
): void {
const activeRuns = logStore.getActiveWorkflowRuns(workflowName);
const state = getOrCreateState(workflowName);
for (const run of activeRuns) {
if (run.status === "queued") {
recoverQueuedRun(workflowName, run.runId, state, logStore, engineMaxRounds);
} else if (run.status === "started") {
recoverStartedRun(workflowName, run.runId, state, logStore, engineMaxRounds, sendResume);
}
}
}
export type WorkflowManagerMessageDeps = {
logStore: LogStore;
handleThreadEvent: (workflowName: string, msg: ThreadEventMessage) => void;
onWorkflowRoleError: (
workflowName: string,
runId: string,
error: string,
exitCode: number,
) => void;
};
export function dispatchWorkflowWorkerMessage(
workflowName: string,
msg: WorkerToParentMessage,
deps: WorkflowManagerMessageDeps,
): void {
if (msg.type === "thread-event") {
deps.handleThreadEvent(workflowName, msg);
return;
}
if (msg.type === "thread-workflow-message") {
deps.logStore.append({
source: "workflow",
type: "thread_workflow_message",
refId: msg.runId,
payload: JSON.stringify(msg.message),
timestamp: Date.now(),
});
return;
}
if (msg.type === "workflow-error") {
process.stderr.write(
`[workflow-manager] workflow-error for runId "${msg.runId}" in "${workflowName}": ${msg.error}\n`,
);
deps.onWorkflowRoleError(workflowName, msg.runId, msg.error, msg.exitCode);
return;
}
if (msg.type === "error") {
process.stderr.write(`[workflow-manager] error from "${workflowName}" worker: ${msg.error}\n`);
}
}
+175 -445
View File
@@ -6,33 +6,27 @@
* Concurrency and overflow (drop/queue) are enforced here in the parent process.
*/
import { fork } from "node:child_process";
import type { ChildProcess } from "node:child_process";
import { dirname, join } from "node:path";
import { fileURLToPath } from "node:url";
import type { NerveConfig, WorkflowConfig, WorkflowStatus } from "@uncaged/nerve-core";
import type {
NerveConfig,
WorkflowConfig,
WorkflowMessage,
WorkflowStatus,
} from "@uncaged/nerve-core";
import { START, isPlainRecord } from "@uncaged/nerve-core";
import type { LogStore, WorkflowRunStatus } from "@uncaged/nerve-store";
import type {
KillThreadMessage,
ResumeThreadMessage,
ShutdownMessage,
StartThreadMessage,
ThreadEventMessage,
} from "./ipc.js";
import type { LogStore } from "@uncaged/nerve-store";
import type { KillThreadMessage, StartThreadMessage, ThreadEventMessage } from "./ipc.js";
import { parseWorkerMessage } from "./ipc.js";
import {
createWorkerRuntime,
formatCapturedStderrTail,
formatChildExitSummary,
teeCapturedStderr,
} from "./worker-fork-support.js";
} from "./worker-runtime.js";
import {
DEFAULT_MAX_QUEUE,
WORKER_SHUTDOWN_TIMEOUT_MS,
WORKFLOW_WORKER_RESPAWN,
type WorkflowState,
appendWorkflowRunLog,
dispatchWorkflowWorkerMessage,
extractExitCodeFromPayload,
recoverThreadsFromStore,
resolveWorkflowWorkerScript,
} from "./workflow-manager-support.js";
export type WorkflowLaunchParams = {
prompt: string;
@@ -74,169 +68,109 @@ export type WorkflowManager = {
stop: () => Promise<void>;
};
type PendingThread = {
runId: string;
prompt: string;
maxRounds: number;
dryRun: boolean;
};
type WorkflowState = {
active: Set<string>;
queue: PendingThread[];
};
type WorkerEntry = {
workflowName: string;
process: ChildProcess;
stopping: boolean;
/** When set, the worker is draining before a hot-reload respawn. */
draining: boolean;
stderrTail: { value: string };
};
// Crash respawn backoff: track crash timestamps per workflow.
const MAX_CRASHES_IN_WINDOW = 5;
const CRASH_WINDOW_MS = 60_000;
/**
* Worker shutdown timeout — must stay in sync with SHUTDOWN_TIMEOUT_MS in workflow-worker.ts.
* The drain timeout passed to drainAndRespawn must be >= this value so the worker has
* enough time to finish in-flight threads before the parent force-kills it.
*/
const WORKER_SHUTDOWN_TIMEOUT_MS = 10_000;
const DEFAULT_MAX_QUEUE = 100;
function readLaunchFromTriggerPayload(
raw: unknown,
engineDefaultMaxRounds: number,
): { prompt: string; maxRounds: number; dryRun: boolean } {
if (isPlainRecord(raw)) {
const o = raw;
if (typeof o.prompt === "string" && typeof o.maxRounds === "number") {
const dryRun = typeof o.dryRun === "boolean" ? o.dryRun : false;
return { prompt: o.prompt, maxRounds: o.maxRounds, dryRun };
}
}
return { prompt: "", maxRounds: engineDefaultMaxRounds, dryRun: false };
}
function ensureThreadMessagesWithStart(
messages: Array<{ role: string; content: string; meta: unknown; timestamp: number }>,
threadId: string,
fallbackPrompt: string,
fallbackMaxRounds: number,
): WorkflowMessage[] {
const mapped: WorkflowMessage[] = messages.map((m) => ({
role: m.role,
content: m.content,
meta: m.meta,
timestamp: m.timestamp,
}));
if (mapped.length > 0 && mapped[0].role === START) {
return mapped;
}
const start: WorkflowMessage = {
role: START,
content: fallbackPrompt,
meta: { maxRounds: fallbackMaxRounds, threadId },
timestamp: Date.now(),
};
return [start, ...mapped];
}
function resolveWorkerScript(): string {
const __filename = fileURLToPath(import.meta.url);
const __dir = dirname(__filename);
return join(__dir, "workflow-worker.js");
}
function spawnWorkflowWorker(
nerveRoot: string,
workflowName: string,
workerScript: string,
stderrTail: { value: string },
): ChildProcess {
const child = fork(workerScript, ["--workflow", workflowName, "--root", nerveRoot], {
stdio: ["ignore", "inherit", "pipe", "ipc"],
});
teeCapturedStderr(child, stderrTail);
// Prevent unhandled EPIPE when writing to a child whose IPC channel closed
child.on("error", (err) => {
if ((err as NodeJS.ErrnoException).code !== "EPIPE") {
console.error("[worker] error:", err.message);
}
});
return child;
}
function sendStartThread(worker: ChildProcess, msg: StartThreadMessage): void {
if (worker.connected === false) return;
try {
worker.send(msg);
} catch {
// IPC channel closed between connected check and send
}
}
function sendShutdown(worker: ChildProcess, entry: WorkerEntry): void {
entry.stopping = true;
if (worker.connected === false) return;
const msg: ShutdownMessage = { type: "shutdown" };
try {
worker.send(msg);
} catch {
// IPC channel closed between connected check and send
}
}
function sendResumeThread(worker: ChildProcess, msg: ResumeThreadMessage): void {
if (worker.connected === false) return;
try {
worker.send(msg);
} catch {
// IPC channel closed between connected check and send
}
}
function sendKillThread(worker: ChildProcess, runId: string): void {
if (worker.connected === false) return;
const msg: KillThreadMessage = { type: "kill-thread", runId };
try {
worker.send(msg);
} catch {
// IPC channel closed between connected check and send
}
}
function waitForExit(child: ChildProcess, timeoutMs: number): Promise<void> {
return new Promise((resolve) => {
const timer = setTimeout(() => {
child.kill("SIGKILL");
resolve();
}, timeoutMs);
child.once("exit", () => {
clearTimeout(timer);
resolve();
});
});
}
export function createWorkflowManager(
nerveRoot: string,
initialConfig: NerveConfig,
logStore: LogStore,
): WorkflowManager {
const workerScript = resolveWorkerScript();
const workerScript = resolveWorkflowWorkerScript();
/**
* Default drain timeout must be at least WORKER_SHUTDOWN_TIMEOUT_MS so the worker
* has enough time to finish in-flight threads before the parent force-kills it.
*/
const DEFAULT_DRAIN_TIMEOUT_MS = Math.max(30_000, WORKER_SHUTDOWN_TIMEOUT_MS + 5_000);
const states = new Map<string, WorkflowState>();
const workers = new Map<string, WorkerEntry>();
const crashTimestamps = new Map<string, number[]>();
const trackedWorkflows = new Set<string>();
const hotReloadEvicting = new Set<string>();
const crashRecoveryPending = new Set<string>();
const crashLimitBlocked = new Set<string>();
let stopped = false;
let config = initialConfig;
const pendingDrains = new Set<string>();
function logWorkflowEvent(
workflowName: string,
runId: string,
eventType: string,
payload: unknown | null = null,
exitCode: number | null = null,
): void {
appendWorkflowRunLog(logStore, workflowName, runId, eventType, payload, exitCode);
}
const runtime = createWorkerRuntime<string>({
script: workerScript,
argsForKey: (workflowName) => ["--workflow", workflowName, "--root", nerveRoot],
forwardStderr: true,
onMessage: (workflowName, raw) => {
handleWorkerMessage(workflowName, raw);
},
onReady: (workflowName, _msg) => {
if (crashRecoveryPending.has(workflowName)) {
crashRecoveryPending.delete(workflowName);
recoverThreadsFromStore(
workflowName,
logStore,
config.maxRounds,
getOrCreateState,
(wf, msg) => {
sendToWorker(wf, msg);
},
);
}
},
onExit: (workflowName, code, signalStr) => {
const sig =
signalStr === null || signalStr === undefined || signalStr === ""
? null
: (signalStr as NodeJS.Signals);
if (hotReloadEvicting.has(workflowName)) {
hotReloadEvicting.delete(workflowName);
markActiveRunsInterrupted(workflowName);
if (!stopped && workflowConfig(workflowName) !== null) {
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" drained, respawning\n`,
);
}
return;
}
if (stopped) {
const state = states.get(workflowName);
if (state !== undefined) {
state.active.clear();
}
crashRecoveryPending.delete(workflowName);
return;
}
const summary = formatChildExitSummary(code, sig);
const stderrExtra = formatCapturedStderrTail(runtime.stderrTail(workflowName));
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" exited (${summary})${stderrExtra}\n`,
);
cleanupAfterUnexpectedWorkerExit(workflowName);
crashRecoveryPending.add(workflowName);
},
onCrashLimitReached: (workflowName) => {
crashRecoveryPending.delete(workflowName);
trackedWorkflows.delete(workflowName);
crashLimitBlocked.add(workflowName);
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" exceeded crash limit (${String(WORKFLOW_WORKER_RESPAWN.maxCrashes)} in ${String(WORKFLOW_WORKER_RESPAWN.windowMs)}ms) — stopping respawn\n`,
);
},
respawn: {
...WORKFLOW_WORKER_RESPAWN,
allowRespawn: (_wf) => !stopped,
},
shutdownTimeoutMs: DEFAULT_DRAIN_TIMEOUT_MS,
});
function getOrCreateState(workflowName: string): WorkflowState {
let state = states.get(workflowName);
if (state === undefined) {
@@ -250,60 +184,38 @@ export function createWorkflowManager(
return config.workflows[workflowName] ?? null;
}
function toWorkflowRunStatus(eventType: string): WorkflowRunStatus | null {
const map: Record<string, WorkflowRunStatus> = {
started: "started",
queued: "queued",
completed: "completed",
failed: "failed",
crashed: "crashed",
dropped: "dropped",
interrupted: "interrupted",
killed: "killed",
};
return map[eventType] ?? null;
}
function extractExitCode(payload: unknown): number | null {
if (isPlainRecord(payload) && typeof payload.exitCode === "number") {
return payload.exitCode;
/** IPC send — matches legacy pool: no-op when IPC is disconnected; cold-start via WorkerRuntime.send. */
function sendToWorker(workflowName: string, msg: unknown): void {
if (crashLimitBlocked.has(workflowName)) {
return;
}
return null;
}
function logWorkflowEvent(
workflowName: string,
runId: string,
eventType: string,
payload?: unknown,
exitCode: number | null = null,
): void {
const timestamp = Date.now();
const serialised = payload !== undefined ? JSON.stringify(payload) : null;
const status = toWorkflowRunStatus(eventType);
if (status !== null) {
logStore.upsertWorkflowRun(
{
source: "workflow",
type: eventType,
refId: runId,
payload: serialised,
timestamp,
},
{ runId, workflow: workflowName, status, timestamp, exitCode },
);
} else {
logStore.append({
source: "workflow",
type: eventType,
refId: runId,
payload: serialised,
timestamp,
trackedWorkflows.add(workflowName);
if (runtime.hasDisconnectedChild(workflowName)) {
return;
}
if (!runtime.trySendSync(workflowName, msg)) {
void runtime.send(workflowName, msg).catch(() => {
// IPC channel closed — mark any thread from this message as failed
if (isStartThreadMsg(msg)) {
const state = states.get(workflowName);
if (state?.active.has(msg.runId)) {
state.active.delete(msg.runId);
logWorkflowEvent(workflowName, msg.runId, "failed", { error: "IPC channel closed" }, 1);
dequeueNext(workflowName);
}
}
});
}
}
function isStartThreadMsg(msg: unknown): msg is StartThreadMessage {
return (
msg !== null &&
typeof msg === "object" &&
(msg as Record<string, unknown>).type === "start-thread"
);
}
function dispatchThread(
workflowName: string,
runId: string,
@@ -314,7 +226,6 @@ export function createWorkflowManager(
const state = getOrCreateState(workflowName);
state.active.add(runId);
const worker = getOrSpawnWorker(workflowName);
const msg: StartThreadMessage = {
type: "start-thread",
runId,
@@ -323,7 +234,7 @@ export function createWorkflowManager(
maxRounds,
dryRun,
};
sendStartThread(worker.process, msg);
sendToWorker(workflowName, msg);
logWorkflowEvent(workflowName, runId, "started", { prompt, maxRounds, dryRun });
}
@@ -367,92 +278,20 @@ export function createWorkflowManager(
if (msg.eventType === "completed" || msg.eventType === "failed" || msg.eventType === "killed") {
state.active.delete(msg.runId);
dequeueNext(workflowName);
const exitCode = extractExitCode(msg.payload);
const exitCode = extractExitCodeFromPayload(msg.payload);
logWorkflowEvent(workflowName, msg.runId, msg.eventType, msg.payload, exitCode);
maybeDeferredHotReloadDrain(workflowName);
}
}
function recoverQueuedRun(workflowName: string, runId: string, state: WorkflowState): void {
if (state.queue.some((q) => q.runId === runId)) return;
const launch = readLaunchFromTriggerPayload(
logStore.getTriggerPayload(runId),
config.maxRounds,
);
state.queue.push({
runId,
prompt: launch.prompt,
maxRounds: launch.maxRounds,
dryRun: launch.dryRun,
});
process.stderr.write(
`[workflow-manager] crash-recovery: re-queued thread "${runId}" for "${workflowName}"\n`,
);
}
function recoverStartedRun(
workflowName: string,
runId: string,
state: WorkflowState,
worker: WorkerEntry,
): void {
if (state.active.has(runId)) return;
const rawMessages = logStore.getThreadMessages(runId);
const launch = readLaunchFromTriggerPayload(
logStore.getTriggerPayload(runId),
config.maxRounds,
);
const messages = ensureThreadMessagesWithStart(
rawMessages,
runId,
launch.prompt,
launch.maxRounds,
);
state.active.add(runId);
const msg: ResumeThreadMessage = {
type: "resume-thread",
runId,
messages,
maxRounds: launch.maxRounds,
dryRun: launch.dryRun,
};
sendResumeThread(worker.process, msg);
process.stderr.write(
`[workflow-manager] crash-recovery: resuming thread "${runId}" for "${workflowName}" (${messages.length} messages)\n`,
);
}
function recoverThreadsForWorker(workflowName: string, worker: WorkerEntry): void {
const activeRuns = logStore.getActiveWorkflowRuns(workflowName);
const state = getOrCreateState(workflowName);
for (const run of activeRuns) {
if (run.status === "queued") {
recoverQueuedRun(workflowName, run.runId, state);
} else if (run.status === "started") {
recoverStartedRun(workflowName, run.runId, state, worker);
}
}
}
function recordCrashAndCheckLimit(workflowName: string): boolean {
const now = Date.now();
const timestamps = (crashTimestamps.get(workflowName) ?? []).filter(
(t) => now - t < CRASH_WINDOW_MS,
);
timestamps.push(now);
crashTimestamps.set(workflowName, timestamps);
return timestamps.length > MAX_CRASHES_IN_WINDOW;
}
function handleWorkerCrash(workflowName: string): void {
function cleanupAfterUnexpectedWorkerExit(workflowName: string): void {
const state = states.get(workflowName);
if (state === undefined) return;
const crashedCount = state.active.size;
if (crashedCount > 0) {
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" crashed with ${crashedCount} active thread(s)\n`,
`[workflow-manager] worker for "${workflowName}" crashed with ${String(crashedCount)} active thread(s)\n`,
);
for (const runId of state.active) {
logWorkflowEvent(workflowName, runId, "crashed", undefined, 255);
@@ -460,26 +299,13 @@ export function createWorkflowManager(
}
state.active.clear();
workers.delete(workflowName);
pendingDrains.delete(workflowName);
if (stopped || workflowConfig(workflowName) === null) return;
if (recordCrashAndCheckLimit(workflowName)) {
const count = crashTimestamps.get(workflowName)?.length ?? 0;
if (!stopped && !crashLimitBlocked.has(workflowName) && workflowConfig(workflowName) !== null) {
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" crashed ${count} times in ${CRASH_WINDOW_MS}ms — stopping respawn\n`,
`[workflow-manager] respawning worker for "${workflowName}" after crash\n`,
);
return;
}
process.stderr.write(
`[workflow-manager] respawning worker for "${workflowName}" after crash\n`,
);
const newWorker = getOrSpawnWorker(workflowName);
setImmediate(() => {
recoverThreadsForWorker(workflowName, newWorker);
});
}
function handleWorkerMessage(workflowName: string, raw: unknown): void {
@@ -490,43 +316,19 @@ export function createWorkflowManager(
);
return;
}
const msg = result.value;
if (msg.type === "thread-event") {
handleThreadEvent(workflowName, msg);
return;
}
if (msg.type === "thread-workflow-message") {
logStore.append({
source: "workflow",
type: "thread_workflow_message",
refId: msg.runId,
payload: JSON.stringify(msg.message),
timestamp: Date.now(),
});
return;
}
if (msg.type === "workflow-error") {
process.stderr.write(
`[workflow-manager] workflow-error for runId "${msg.runId}" in "${workflowName}": ${msg.error}\n`,
);
const state = states.get(workflowName);
if (state !== undefined) {
state.active.delete(msg.runId);
dequeueNext(workflowName);
}
logWorkflowEvent(workflowName, msg.runId, "failed", { error: msg.error }, msg.exitCode);
maybeDeferredHotReloadDrain(workflowName);
return;
}
if (msg.type === "error") {
process.stderr.write(
`[workflow-manager] error from "${workflowName}" worker: ${msg.error}\n`,
);
}
dispatchWorkflowWorkerMessage(workflowName, result.value, {
logStore,
handleThreadEvent,
onWorkflowRoleError: (wf, runId, error, exitCode) => {
const state = states.get(wf);
if (state !== undefined) {
state.active.delete(runId);
dequeueNext(wf);
}
logWorkflowEvent(wf, runId, "failed", { error }, exitCode);
maybeDeferredHotReloadDrain(wf);
},
});
}
function markActiveRunsInterrupted(workflowName: string): void {
@@ -538,67 +340,6 @@ export function createWorkflowManager(
state.active.clear();
}
function handleWorkerExit(
workflowName: string,
code: number | null,
signal: NodeJS.Signals | null,
): void {
const entry = workers.get(workflowName);
if (entry?.draining) {
workers.delete(workflowName);
markActiveRunsInterrupted(workflowName);
if (!stopped && workflowConfig(workflowName) !== null) {
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" drained, respawning\n`,
);
getOrSpawnWorker(workflowName);
}
return;
}
if (entry?.stopping) {
workers.delete(workflowName);
const state = states.get(workflowName);
if (state !== undefined) {
state.active.clear();
}
return;
}
const summary = formatChildExitSummary(code, signal);
const stderrExtra = entry !== undefined ? formatCapturedStderrTail(entry.stderrTail.value) : "";
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" exited (${summary})${stderrExtra}\n`,
);
handleWorkerCrash(workflowName);
}
function getOrSpawnWorker(workflowName: string): WorkerEntry {
const existing = workers.get(workflowName);
if (existing !== undefined && existing.process.exitCode === null) {
return existing;
}
const stderrTail = { value: "" };
const child = spawnWorkflowWorker(nerveRoot, workflowName, workerScript, stderrTail);
child.on("message", (raw: unknown) => {
handleWorkerMessage(workflowName, raw);
});
child.on("exit", (code, signal) => {
handleWorkerExit(workflowName, code, signal ?? null);
});
const entry: WorkerEntry = {
workflowName,
process: child,
stopping: false,
draining: false,
stderrTail,
};
workers.set(workflowName, entry);
return entry;
}
function killThread(runId: string): boolean {
for (const [workflowName, state] of states) {
const queueIdx = state.queue.findIndex((q) => q.runId === runId);
@@ -609,10 +350,8 @@ export function createWorkflowManager(
}
if (state.active.has(runId)) {
const workerEntry = workers.get(workflowName);
if (workerEntry !== undefined) {
sendKillThread(workerEntry.process, runId);
}
const msg: KillThreadMessage = { type: "kill-thread", runId };
sendToWorker(workflowName, msg);
return true;
}
}
@@ -663,7 +402,7 @@ export function createWorkflowManager(
state.queue.push({ runId, prompt, maxRounds, dryRun });
logWorkflowEvent(workflowName, runId, "queued");
process.stderr.write(
`[workflow-manager] queued thread for "${workflowName}" runId "${runId}" (queue length: ${state.queue.length})\n`,
`[workflow-manager] queued thread for "${workflowName}" runId "${runId}" (queue length: ${String(state.queue.length)})\n`,
);
}
@@ -707,35 +446,29 @@ export function createWorkflowManager(
config = newConfig;
}
/**
* Default drain timeout must be at least WORKER_SHUTDOWN_TIMEOUT_MS so the worker
* has enough time to finish in-flight threads before the parent force-kills it.
*/
const DEFAULT_DRAIN_TIMEOUT_MS = Math.max(30_000, WORKER_SHUTDOWN_TIMEOUT_MS + 5_000);
async function drainAndRespawn(
workflowName: string,
drainTimeoutMs: number = DEFAULT_DRAIN_TIMEOUT_MS,
): Promise<void> {
const entry = workers.get(workflowName);
if (entry === undefined) {
// No active worker — nothing to drain
if (!trackedWorkflows.has(workflowName)) {
return;
}
entry.draining = true;
// Send shutdown without setting stopping=true (so the exit handler uses the draining branch)
if (entry.process.connected) {
const msg: ShutdownMessage = { type: "shutdown" };
try {
entry.process.send(msg);
} catch {
// IPC closed
const shutdownMs = Math.max(drainTimeoutMs, WORKER_SHUTDOWN_TIMEOUT_MS);
hotReloadEvicting.add(workflowName);
try {
await runtime.evict(workflowName, { shutdownTimeoutMs: shutdownMs });
trackedWorkflows.delete(workflowName);
if (!stopped && workflowConfig(workflowName) !== null) {
trackedWorkflows.add(workflowName);
await runtime.start(workflowName);
}
} finally {
hotReloadEvicting.delete(workflowName);
}
await waitForExit(entry.process, drainTimeoutMs);
// The exit handler (draining branch) will respawn the worker automatically
}
function drainWhenIdle(workflowName: string): void {
const state = states.get(workflowName);
const hasActiveRuns = state !== undefined && state.active.size > 0;
@@ -761,20 +494,17 @@ export function createWorkflowManager(
pendingDrains.add(workflowName);
process.stderr.write(
`[workflow-manager] deferring hot-reload for "${workflowName}" until ${state.active.size} active run(s) complete\n`,
`[workflow-manager] deferring hot-reload for "${workflowName}" until ${String(state.active.size)} active run(s) complete\n`,
);
}
async function stop(): Promise<void> {
stopped = true;
pendingDrains.clear();
const exitPromises: Promise<void>[] = [];
for (const entry of workers.values()) {
sendShutdown(entry.process, entry);
exitPromises.push(waitForExit(entry.process, 5000));
}
await Promise.all(exitPromises);
workers.clear();
hotReloadEvicting.clear();
crashRecoveryPending.clear();
await runtime.shutdown();
trackedWorkflows.clear();
}
return {
+1 -1
View File
@@ -30,7 +30,7 @@ import type {
WorkerToParentMessage,
} from "./ipc.js";
import { parseParentMessage } from "./ipc.js";
import { ignoreSessionBroadcastSignals } from "./worker-fork-support.js";
import { ignoreSessionBroadcastSignals } from "./worker-signals.js";
// ---------------------------------------------------------------------------
// IPC helpers
-2
View File
@@ -26,13 +26,11 @@ export {
} from "./role-decorators.js";
export {
nerveCommandEnv,
spawnSafe,
type SpawnEnv,
type SpawnError,
type SpawnResult,
type SpawnSafeOptions,
} from "@uncaged/nerve-core";
export type { LlmError, LlmProvider } from "./shared/llm-extract.js";
export { isDryRun } from "./role-types.js";
export type { LlmMessage, MetaExtractConfig } from "./role-types.js";
export type { LlmChatError } from "./shared/llm-chat.js";
+1 -9
View File
@@ -1,16 +1,8 @@
import type { SpawnEnv, StartStep, ThreadContext } from "@uncaged/nerve-core";
import type { SpawnEnv, ThreadContext } from "@uncaged/nerve-core";
import type { z } from "zod";
import type { LlmProvider } from "./shared/llm-extract.js";
/**
* @deprecated `dryRun` has been removed from `StartStep.meta` (RFC-005).
* Use adapter/role-level `dryRun` config instead.
*/
export function isDryRun(_start: StartStep): boolean {
return false;
}
export type CliPromptFn = (ctx: ThreadContext) => Promise<string>;
export type LlmMessage = { role: "system" | "user" | "assistant"; content: string };
@@ -3,7 +3,7 @@ import type { HermesRoleDefaults, HermesRoleRequired } from "../role-types.js";
export { hermesAgent } from "@uncaged/nerve-adapter-hermes";
export type { HermesAgentOptions } from "@uncaged/nerve-adapter-hermes";
// --- Hermes options resolution (absorbed from hermes-options.ts) ---
// --- Hermes role defaults resolution ---
const HERMES_DEFAULTS: HermesRoleDefaults = {
model: null,