Compare commits

..

9 Commits

Author SHA1 Message Date
tuanzi 5db80c99a0 feat(cli): nerve agent inject cursor — generate .cursorrules
Phase 4 of #289:
- Add cursor target to nerve agent inject/remove/status
- Generate .cursorrules from latest SKILL.md content (aligned with PR #301):
  - Parameterless SenseComputeFn compute signature
  - createRole four-tuple for LLM integration
  - Flat roles/<role>.ts layout
  - Sense persistence pitfall
  - nerve agent CLI docs including cursor commands
- Cursor-specific tips (@file, @Folder, @Terminal references)
- Version tracking via .nerve-version + HTML comment
- --path flag for cursor (project-local, not global like hermes)

Closes #299
Ref: #289
2026-04-30 14:38:19 +00:00
xingyue 4d75c8683f Merge pull request 'docs(cli): sync Hermes SKILL.md with flat workspace and runtime types' (#301) from fix/298-update-hermes-skill into main 2026-04-30 14:20:36 +00:00
xingyue 99b0e58fb6 Merge pull request 'chore: RFC-006 Phase 4 cleanup — delete worker-fork-support.ts' (#300) from chore/rfc-006-cleanup into main 2026-04-30 14:19:44 +00:00
xiaomo a1b1d5eaf1 chore: RFC-006 Phase 4 cleanup — delete worker-fork-support.ts
- Move formatChildExitSummary/formatCapturedStderrTail to worker-runtime.ts
- Move ignoreSessionBroadcastSignals to new worker-signals.ts
- Delete worker-fork-support.ts (teeCapturedStderr no longer used)
- Update .knowledge/worker-isolation.md and architecture.md for WorkerRuntime
- All 167 tests pass, biome check clean

Closes #283
2026-04-30 14:17:16 +00:00
xiaoju 1849789c02 docs(cli): sync Hermes SKILL with workspace AGENT and runtime types
Update sense compute docs to SenseComputeFn (no DB/peers). Document AGENT.md, flat roles, moderator/build helpers, createRole + createLlmAdapter, verb-first naming, nerve agent commands, and root npm/pnpm build.

Fixes #298

Made-with: Cursor
2026-04-30 14:15:14 +00:00
xingyue 7ce46e7735 Merge pull request 'RFC-006 Phase 3: Migrate workflow-manager process logic to WorkerRuntime' (#295) from refactor/rfc-006-workflow-runtime into main 2026-04-30 14:14:05 +00:00
xiaomo 0455f928f5 fix: address review feedback (星月) for Phase 3
1. sendToWorker: IPC send failure now marks thread as failed + dequeues next
2. crashLimitBlocked Set: prevents new startWorkflow from bypassing crash limit
3. "respawning" log skipped when crash limit is active
4. logWorkflowEvent payload: unknown | null (project convention, not ?:)
2026-04-30 14:10:06 +00:00
xingyue 11cedfb5a5 Merge pull request 'refactor(cli): dynamic version for nerve agent — Phase 3 of RFC #289' (#297) from feat/agent-inject-phase3 into main 2026-04-30 14:08:03 +00:00
xiaomo dc4454d23e refactor: migrate workflow-manager process logic to WorkerRuntime (RFC-006 Phase 3)
- workflow-manager.ts: 792 → 498 lines, no more fork/ChildProcess/crash counting
- Extract pure functions to workflow-manager-support.ts (256 lines)
- WorkerRuntime gains: onCrashLimitReached callback, WorkerDrainOpts for per-call timeout
- worker-pool.ts updated for new evict/drain signature (null opts)
- All 167 daemon tests pass

Closes #282
2026-04-30 13:56:43 +00:00
18 changed files with 1467 additions and 602 deletions
+1
View File
@@ -33,6 +33,7 @@ Senses own both the "what" (compute logic) and the "when" (config-driven schedul
- One worker per Workflow type (on-demand)
- Workers never talk to each other
- All user code runs in isolated Workers; kernel never loads user code directly
- **`WorkerRuntime`** (`packages/daemon/src/worker-runtime.ts`) centralizes fork lifecycle for both sense groups (`worker-pool.ts`) and workflow types (`workflow-manager.ts`); see `.knowledge/worker-isolation.md`
## Storage Systems
+8 -2
View File
@@ -12,6 +12,12 @@ Kernel (Main Process)
└── Workflow Worker (review) ── review workflow instances
```
### WorkerRuntime (RFC-006)
Forked worker processes are managed by **`WorkerRuntime`** (`worker-runtime.ts`): one Node child per logical key, cold start, optional respawn after crash, drain/evict, and coordinated shutdown over IPC. **`worker-pool.ts`** (sense groups) and **`workflow-manager.ts`** (workflow types) both configure and delegate to `createWorkerRuntime` instead of owning ad-hoc fork logic.
Worker **entrypoints** (`sense-worker.ts`, `workflow-worker.ts`) import lightweight helpers only — e.g. `worker-signals.ts` for session broadcast signal handling — so they do not pull in the parent-side runtime module.
## Isolation Boundaries
### 1. Sense Workers
@@ -111,10 +117,10 @@ workflows:
### Process Management
#### Signal Handling
Workers ignore session broadcast signals (SIGINT/SIGTERM):
Workers ignore session broadcast signals (SIGINT/SIGTERM) via `ignoreSessionBroadcastSignals()` in `worker-signals.ts`:
```typescript
// Workers ignore terminal signals; kernel coordinates shutdown
process.on("SIGINT", () => {});
process.on("SIGINT", () => {});
process.on("SIGTERM", () => {});
```
+587
View File
@@ -0,0 +1,587 @@
<!-- nerve-cli-version: __NERVE_CLI_VERSION__ -->
## Cursor Agent 使用提示
在 Cursor 中与 Agent 对话时,可以用以下方式指代代码与配置:
- **`@Files` / `@file`**:引用单个文件,例如 `@nerve.yaml`、`@senses/cpu-usage/src/index.ts`,减少幻觉并让修改对准正确路径。
- **`@Folder` / `@Codebase`**:需要跨目录理解工作区结构时使用;改动前仍应优先打开相关 sense/workflow 源文件确认。
- **`@Terminal`**:把 CLI 输出纳入上下文,便于对照 `nerve daemon logs`、`nerve sense query` 等结果。
- **`@Docs`**:若项目或依赖有文档索引,可用来对齐 API 与约定。
- 工作区根目录下的 **`nerve.yaml`**、`senses/`、`workflows/` 是 nerve 的核心入口;讨论调度与配置时优先 `@` 这些路径。
- 本规则由 `nerve agent inject cursor` 安装;更新 CLI 后在同一目录再次执行可覆盖为新版。
---
# Nerve — AI Agent 观测引擎
Nerve 是一个轻量级观测引擎守护进程。它持续观测外部状态,通过声明式规则响应变化,编排多步骤工作流。
## 核心架构
```
External World → Sense → Signal → Workflow → Log
```
| 概念 | 说明 |
|------|------|
| **Sense** | 观测函数,`compute()` 采样或推导数据。返回非 null 则发出 Signal,可选触发 Workflow。每个 Sense 有独立 SQLite 数据库。 |
| **Signal** | Sense 返回非 null 时发出的通知。纯事实,无意图。通过内存 Signal Bus 分发,不持久化。 |
| **Workflow** | 有状态的多步骤执行。包含 Role(有副作用的执行者)和 Moderator(纯路由器)。每个实例是一个 Thread,有唯一 runId。 |
| **Log** | 不可变审计日志。记录执行、状态转换、错误。不能触发 Sense(防止反馈循环)。 |
| **Engine** | 内核,持有 Signal Bus、Process Manager、Workflow Manager。不直接加载用户代码。 |
| **Daemon** | 引擎运行时,作为后台进程运行。 |
**关键规则:**
- 因果链单向:External → Sense → Signal → Workflow + Log
- 进程隔离:每个 Sense group 一个 worker(长期),每个 Workflow 类型一个 worker(按需)
- 两个扩展点:Sense(观测什么 + 何时)、Workflow(做什么)
## 工作区结构
由 `nerve init` 生成的工作区根目录(默认 `~/.uncaged-nerve/`)包含 **`AGENT.md`**。实现 sense/workflow 前先阅读该文件:它与本文 skill 对齐,约定目录布局、`createRole` 用法以及**始终在仓库根目录**执行的构建命令。
```
~/.uncaged-nerve/
├── AGENT.md # 人类 / Agent 可读的工作区约定(init 生成)
├── nerve.yaml # 核心配置
├── package.json # 单一根包(sense/workflow 下不再有独立 package)
├── scripts/build.mjs # 根目录 esbuild;通过 npm/pnpm 的 build 脚本调用
├── senses/
│ └── <name>/
│ ├── src/index.ts # exports compute() + table
│ ├── src/schema.ts # Drizzle 表定义
│ └── migrations/ # SQL 迁移
├── workflows/
│ └── <name>/
│ ├── index.ts # default export:WorkflowDefinition
│ ├── moderator.ts # 可选:抽出 moderator,由 index 导入
│ ├── build.ts # 可选:共享常量 / 纯函数(避免 index 臃肿;非 esbuild 入口)
│ └── roles/
│ └── <role>.ts # 每角色单文件(推荐平铺,而非 roles/<role>/index.ts)
└── data/ # 运行时数据(SQLite、blobs)
```
### 命名约定
- **Workflow**:动词开头的 kebab-case(例如 `review-pull-request`、`deploy-staging`)。避免单独名词式命名(如 `notifications`)。
- **Sense**:描述性名词 kebab-case(例如 `cpu-usage`)。
---
## CLI 完整参考
全局选项:`--host <host:port>`(连接远程 daemon)、`--api-token <secret>`(Bearer 认证)
### 初始化与脚手架
```bash
nerve init # 初始化工作区
nerve init --from <git-url> # 从 git 仓库克隆工作区
nerve init workspace # 只初始化工作区结构
nerve create sense <name> # 创建 sense 脚手架
nerve create sense <name> --force # 覆盖已有
nerve create workflow <name> # 创建 workflow 脚手架
nerve create workflow <name> --force
nerve validate # 验证 nerve.yaml 配置
```
### Daemon 管理
```bash
nerve daemon start # 启动后台 daemon
nerve daemon start --port 3000 # 指定 HTTP API 端口
nerve daemon stop # 停止 daemon
nerve daemon restart # 重启
nerve daemon status # 查看状态
nerve daemon logs # 查看日志
nerve daemon logs --follow # 实时日志
nerve daemon logs --n 50 # 最近 50 行
nerve dev # 前台开发模式(不 fork daemon)
nerve dev --port 3000 # 指定端口
```
### Sense 操作
```bash
nerve sense list # 列出所有注册的 sense
nerve sense trigger <name> # 手动触发 sense 计算
nerve sense schema <name> # 查看 sense 数据库表结构
nerve sense schema <name> --json # JSON 格式
nerve sense query <name> <sql> # 对 sense 数据库执行只读 SQL
nerve sense query <name> "SELECT * FROM samples ORDER BY ts DESC LIMIT 10" --json
```
### Workflow 操作
```bash
nerve workflow list # 列出 nerve.yaml 中定义的 workflow
nerve workflow status # 查看运行中的 workflow 状态
nerve workflow trigger <name> # 触发 workflow
nerve workflow trigger <name> --prompt "检查生产环境"
nerve workflow trigger <name> --maxRounds 50
nerve workflow trigger <name> --dryRun # 干跑模式
```
### Thread(Workflow 执行记录)
```bash
nerve thread list # 列出最近的 workflow 执行
nerve thread list --all # 包含已完成/失败的
nerve thread list --workflow <name> # 按 workflow 过滤
nerve thread list --limit 50 # 最多 50 条
nerve thread show <runId> # 查看 role 对话轮次
nerve thread show <runId> --budget 16000 # 增大输出预算(默认 8000 字符)
nerve thread inspect <runId> # 查看详情和事件
nerve thread kill <runId> # 终止运行中/排队中的 thread
```
### Store(日志归档)
```bash
nerve store archive # 导出旧日志到 JSONL 归档
nerve store archive --vacuum # 归档后 VACUUM 数据库
```
### Knowledge(知识库)
```bash
nerve knowledge sync # 从 knowledge.yaml 重建索引
nerve knowledge query "搜索内容" # 搜索知识库
nerve knowledge query "内容" --limit 5
nerve knowledge query "内容" -g # 搜索所有注册仓库
```
### Remote(远程 daemon)
```bash
nerve remote add <name> <host:port> --token <secret>
nerve remote list
nerve remote show <name>
nerve remote set-url <name> <host>
nerve remote set-token <name> <token>
nerve remote remove <name>
nerve remote default <name> # 设为默认远程
```
### Agent(向 Hermes 注入本 skill)
```bash
nerve agent status # CLI 版本与各 Hermes 注入目录中的 skill 版本
nerve agent inject hermes # 安装到 ~/.hermes/skills/nerve
nerve agent inject hermes --profile <name> # 写入 ~/.hermes/profiles/<name>/skills/nerve
nerve agent update # 将所有已注入目录更新到当前 CLI 对应版本
nerve agent remove hermes # 移除默认 profile 的注入
nerve agent remove hermes --profile <name>
nerve agent inject cursor # 在 cwd 生成 .cursorrules
nerve agent inject cursor --path /foo # 在指定目录生成
nerve agent remove cursor [--path /foo]
```
---
## nerve.yaml 配置参考
```yaml
# 引擎全局配置
max_rounds: 100 # moderator 最大轮次(默认 100)
# Sense 配置
senses:
cpu-usage:
group: system # 必填,同 group 的 sense 共享 worker
interval: 10s # 轮询间隔(duration: 5s, 10m, 1h)
throttle: 5s # 最小计算间隔
timeout: 10s # compute 超时
grace_period: null # 优雅关闭等待
retention: 10000 # _signals 表最大行数(默认 10000)
system-health:
group: derived
on: [cpu-usage, disk-usage] # 响应式:被列出的 sense 发出 signal 时触发
throttle: null
timeout: null
# Workflow 配置
workflows:
my-workflow:
concurrency: 1 # 必填,并发数
overflow: drop # 必填,超并发时处理:drop | queue
max_queue: 100 # overflow=queue 时的队列上限(默认 100)
# HTTP API
api:
port: 3000 # null = 不启用 HTTP
host: "127.0.0.1" # 监听地址
token: null # 非 loopback 时必填
# LLM Extract(可选)
extract:
provider: anthropic
model: claude-sonnet-4-20250514
```
---
## Sense 开发指南
### compute 函数签名
Sense 的 `compute` **无参数**。它不接收数据库句柄:daemon 在 worker 内调用 `SenseComputeFn`,由运行时负责把非 null 结果的 `signal` 写入该 sense 的 Drizzle 表并记入 `_signals`。超时由运行时控制(对应 `nerve.yaml` 里的 `timeout`),无需在业务代码里读取 `AbortSignal`。
```typescript
import type { ComputeResult, SenseComputeFn } from "@uncaged/nerve-core";
export const compute: SenseComputeFn<MySignalShape> = async () => {
// ...
};
// 或等价地:
export async function compute(): Promise<ComputeResult<MySignalShape>> {
// ...
}
```
(运行时定义见 `@uncaged/nerve-core` 的 `SenseComputeFn` / `SenseModule`,daemon 侧在 `sense-runtime.ts` 的 `executeCompute` 中插入 `result.signal`。)
### 返回值
```typescript
// 返回 null = 静默,不发 signal
// 返回非 null = 发出 signal(并写入业务表),可选触发 workflow
type ComputeResult<T> =
| null
| { signal: T; workflow: WorkflowTrigger | null };
type WorkflowTrigger = {
name: string; // workflow 名称(对应 nerve.yaml 中的 key)
maxRounds: number; // moderator 最大轮次
prompt: string; // 初始 prompt
dryRun: boolean; // 干跑模式
};
```
若返回值是普通对象且不含 `signal` 字段,内核会按 shorthand 视为 `{ signal: payload, workflow: null }`(见 core 的 `routeSenseComputeOutput`)。
### Sense 模块导出
```typescript
// senses/<name>/src/index.ts
import type { ComputeResult } from "@uncaged/nerve-core";
import { table } from "./schema.js";
type Row = { ts: number; value: number };
export async function compute(): Promise<ComputeResult<Row>> {
const row: Row = { ts: Date.now(), value: Math.random() }; // 替换为真实观测逻辑
return { signal: row, workflow: null };
}
export { table };
```
### Schema 定义
```typescript
// senses/<name>/src/schema.ts
import { sqliteTable, integer, real } from "drizzle-orm/sqlite-core";
export const table = sqliteTable("samples", {
ts: integer("ts").notNull(),
value: real("value").notNull(),
});
```
### 调度方式
1. **interval 轮询**:`interval: 10s` — 每 10 秒执行一次
2. **响应式触发**:`on: [cpu-usage]` — 当 cpu-usage 发出 signal 时触发
3. 两者可以组合
### 调试
```bash
nerve dev # 前台运行,看实时输出
nerve sense trigger <name> # 手动触发一次
nerve sense query <name> "SELECT * FROM samples ORDER BY ts DESC LIMIT 5"
```
### 完整示例:CPU 监控
```typescript
// senses/cpu-usage/src/schema.ts
import { sqliteTable, integer, real } from "drizzle-orm/sqlite-core";
export const table = sqliteTable("samples", {
ts: integer("ts").notNull(),
value: real("value").notNull(),
});
// senses/cpu-usage/src/index.ts
import os from "node:os";
import type { ComputeResult } from "@uncaged/nerve-core";
import { table } from "./schema.js";
type Row = { ts: number; value: number };
export async function compute(): Promise<ComputeResult<Row>> {
const oneMin = os.loadavg()[0];
return { signal: { ts: Date.now(), value: oneMin }, workflow: null };
}
export { table };
```
nerve.yaml:
```yaml
senses:
cpu-usage:
group: system
interval: 10s
throttle: 5s
timeout: 10s
retention: 10000
```
---
## Workflow 开发指南
### 核心类型
```typescript
import type {
WorkflowDefinition,
RoleResult,
ThreadContext,
RoleMeta,
Moderator,
} from "@uncaged/nerve-core";
import { END } from "@uncaged/nerve-core";
// Role<Meta> — (ctx: ThreadContext) => Promise<RoleResult<Meta>>
// RoleResult<Meta> — { content: string; meta: Meta }
// ThreadContext<M extends RoleMeta> — threadId, start(__start__ 帧), steps(各 role 轮次)
// Moderator<M> — (ctx) => 下一个 role 名 | END
// WorkflowDefinition<M extends RoleMeta> — name, roles, moderator
```
### createRole 四元组(接入 LLM 时推荐)
工作区根目录需安装 **`@uncaged/nerve-workflow-utils`**(及所选 agent 适配器包)。默认 `nerve init` 的 `package.json` 不含该依赖时,在 `~/.uncaged-nerve` 下执行 `pnpm add @uncaged/nerve-workflow-utils`(或 npm 等价命令)。
使用 **`createRole`**,按固定顺序传入四件事:
1. **adapter** — `AgentFn`,`(ctx, systemPrompt) => Promise<string>`(原始模型输出文本)。
2. **prompt** — `string`,或 `async (ctx: ThreadContext) => string`。
3. **meta** — `z.ZodType<M>`,供 moderator 路由的结构化 meta。
4. **extract** — `{ provider: LlmProvider; dryRun: boolean | null }`,声明从回复中抽取 meta 时用的 LLM(OpenAI 兼容)及是否 dry-run。
```typescript
import { createLlmAdapter, createRole } from "@uncaged/nerve-workflow-utils";
import type { ThreadContext } from "@uncaged/nerve-core";
import { z } from "zod";
const provider = {
baseUrl: "https://api.example.com/v1",
apiKey: process.env.EXAMPLE_API_KEY!,
model: "gpt-4o-mini",
};
const planMeta = z.object({ next: z.enum(["execute", "stop"]) });
export const planner = createRole(
createLlmAdapter(provider),
async (ctx: ThreadContext) => `规划任务:${ctx.start.content}`,
planMeta,
{ provider, dryRun: null },
);
```
`createLlmAdapter` 仅位于 **`@uncaged/nerve-workflow-utils`**:用 `LlmProvider` 生成 `AgentFn`,单轮对话里 **system** 来自 `createRole` 解析后的 prompt 字符串,**user** 为线程起点 `ctx.start.content`。
### 基本 Workflow 示例(平铺 `roles/<role>.ts`)
```typescript
// workflows/example/roles/main.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";
export async function main(ctx: ThreadContext): Promise<RoleResult<{ round: number }>> {
const prompt = ctx.start.content;
return {
content: `处理完成: ${prompt}`,
meta: { round: ctx.steps.length },
};
}
```
```typescript
// workflows/example/index.ts
import type { ThreadContext, WorkflowDefinition } from "@uncaged/nerve-core";
import { END } from "@uncaged/nerve-core";
import { main } from "./roles/main.js";
type Meta = Record<"main", { round: number }>;
const workflow: WorkflowDefinition<Meta> = {
name: "example",
roles: { main },
moderator(ctx: ThreadContext<Meta>) {
return ctx.steps.length === 0 ? "main" : END;
},
};
export default workflow;
```
可选:将 `moderator` 挪到 `moderator.ts` 再 `import { route } from "./moderator.js"`,保持 `index.ts` 只负责组装 `WorkflowDefinition`。
### 多 Role Workflow 示例
```typescript
// workflows/plan-execute-review/roles/planner.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";
export async function planner(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
void ctx;
return { content: "计划: ...", meta: { status: "planned" } };
}
```
```typescript
// workflows/plan-execute-review/roles/executor.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";
export async function executor(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
void ctx;
return { content: "执行: ...", meta: { status: "executed" } };
}
```
```typescript
// workflows/plan-execute-review/roles/reviewer.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";
export async function reviewer(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
void ctx;
return { content: "审核通过", meta: { status: "approved" } };
}
```
```typescript
// workflows/plan-execute-review/index.ts
import type { WorkflowDefinition, ThreadContext } from "@uncaged/nerve-core";
import { END } from "@uncaged/nerve-core";
import { executor } from "./roles/executor.js";
import { planner } from "./roles/planner.js";
import { reviewer } from "./roles/reviewer.js";
type Roles = Record<"planner" | "executor" | "reviewer", { status: string }>;
const workflow: WorkflowDefinition<Roles> = {
name: "plan-execute-review",
roles: { planner, executor, reviewer },
moderator(ctx: ThreadContext<Roles>) {
if (ctx.steps.length === 0) return "planner";
const last = ctx.steps[ctx.steps.length - 1];
if (last.role === "planner") return "executor";
if (last.role === "executor") return "reviewer";
return END;
},
};
export default workflow;
```
### Agent 适配器
Workflow role 可以集成 AI agent。已知适配器 **ID**:`echo`、`cursor`、`hermes`、`codex`。
```typescript
type AgentFn = (ctx: ThreadContext, systemPrompt: string) => Promise<string>;
```
没有现成 agent 包时,用 **`createLlmAdapter`(`@uncaged/nerve-workflow-utils`)** 从 OpenAI 兼容的 `LlmProvider` 构造 `AgentFn`,再交给 **`createRole`** 的四元组。
### Workflow 运行状态
`queued` → `started` → `completed` | `failed` | `crashed` | `killed` | `interrupted` | `dropped`
---
## 日常操作 Pattern
### 查看系统整体状态
```bash
nerve daemon status # daemon 是否在运行
nerve sense list # 所有 sense 及其调度配置
nerve workflow status # 运行中的 workflow
nerve thread list # 最近的 workflow 执行记录
```
### 检查某个 sense 的历史数据
```bash
nerve sense query cpu-usage "SELECT * FROM samples ORDER BY ts DESC LIMIT 10" --json
nerve sense schema cpu-usage # 查看表结构
```
### 手动触发 workflow
```bash
nerve workflow trigger my-workflow --prompt "手动检查"
nerve thread list --workflow my-workflow # 查看执行状态
nerve thread show <runId> # 查看对话详情
```
### 排查 sense 报错
```bash
nerve daemon logs --follow # 查看实时日志
nerve sense trigger <name> # 手动触发看报错
nerve dev # 前台模式,更详细的输出
```
### 开发新 sense
```bash
nerve create sense my-sensor # 脚手架
# 编辑 senses/my-sensor/src/index.ts 和 schema.ts
nerve validate # 验证配置
nerve dev # 前台测试
nerve sense trigger my-sensor # 单次触发验证
nerve sense query my-sensor "SELECT * FROM ..." # 检查数据
```
### 开发新 workflow
```bash
nerve create workflow my-flow # 脚手架(当前 CLI 可能仍生成 roles/<name>/ 子目录)
# 推荐对齐 AGENT.md:workflows/my-flow/index.ts + roles/<role>.ts(平铺),moderator 可拆到 moderator.ts
nerve validate # 验证配置
cd ~/.uncaged-nerve && npm run build # 工作区根目录构建(等价:pnpm run build);勿在单个 workflow 子目录单独跑 build
nerve workflow trigger my-flow --prompt "测试" --dryRun # 干跑
nerve thread show <runId> # 查看执行轨迹
```
---
## Pitfalls
- **Sense 返回值**:返回 `null` 表示静默(不发 signal);返回 `{ signal, workflow }` 才发 signal。不要返回 undefined。
- **Sense 持久化**:daemon 在 `compute()` 返回非 null 时自动执行 `db.insert(table).values(signal)` 并写入 `_signals`;业务代码不要自行 insert。
- **no optional properties**:nerve 代码规范禁止 `?:`,用 `T | null` 代替。
- **函数式风格**:用 `function` + `type`,不用 `class` + `interface`。
- **workflow 用 default export**:工作区里通常只有 `workflows/<name>/index.ts` 使用 default export(daemon 加载约定)。
- **_signals 表**:每个 sense 自动有 `_signals` 表记录 signal 历史,受 `retention` 配置限制。
- **concurrency + overflow**:workflow 必须配置并发策略,否则验证失败。
- **moderator 是同步函数**:不要加 async,moderator 是纯路由逻辑,不能有副作用。
+148 -73
View File
@@ -36,23 +36,34 @@ External World → Sense → Signal → Workflow → Log
## 工作区结构
`nerve init` 生成的工作区根目录(默认 `~/.uncaged-nerve/`)包含 **`AGENT.md`**。实现 sense/workflow 前先阅读该文件:它与本文 skill 对齐,约定目录布局、`createRole` 用法以及**始终在仓库根目录**执行的构建命令。
```
~/.uncaged-nerve/ # 默认工作区(nerve init 创建)
~/.uncaged-nerve/
├── AGENT.md # 人类 / Agent 可读的工作区约定(init 生成)
├── nerve.yaml # 核心配置
├── package.json # 单一根包(sense/workflow 下不再有独立 package)
├── scripts/build.mjs # 根目录 esbuild;通过 npm/pnpm 的 build 脚本调用
├── senses/
│ └── <name>/
│ ├── src/index.ts # exports compute() + table
│ ├── src/schema.ts # drizzle 表定义
│ ├── src/schema.ts # Drizzle 表定义
│ └── migrations/ # SQL 迁移
├── workflows/
│ └── <name>/
│ ├── index.ts # exports WorkflowDefinition
── roles/<role>/
├── index.ts # role 实现
└── prompt.md # 可选 system prompt
│ ├── index.ts # default exportWorkflowDefinition
── moderator.ts # 可选:抽出 moderator,由 index 导入
├── build.ts # 可选:共享常量 / 纯函数(避免 index 臃肿;非 esbuild 入口)
│ └── roles/
│ └── <role>.ts # 每角色单文件(推荐平铺,而非 roles/<role>/index.ts)
└── data/ # 运行时数据(SQLite、blobs)
```
### 命名约定
- **Workflow**:动词开头的 kebab-case(例如 `review-pull-request``deploy-staging`)。避免单独名词式命名(如 `notifications`)。
- **Sense**:描述性名词 kebab-case(例如 `cpu-usage`)。
---
## CLI 完整参考
@@ -156,6 +167,17 @@ nerve remote remove <name>
nerve remote default <name> # 设为默认远程
```
### Agent(向 Hermes 注入本 skill)
```bash
nerve agent status # CLI 版本与各 Hermes 注入目录中的 skill 版本
nerve agent inject hermes # 安装到 ~/.hermes/skills/nerve
nerve agent inject hermes --profile <name> # 写入 ~/.hermes/profiles/<name>/skills/nerve
nerve agent update # 将所有已注入目录更新到当前 CLI 对应版本
nerve agent remove hermes # 移除默认 profile 的注入
nerve agent remove hermes --profile <name>
```
---
## nerve.yaml 配置参考
@@ -205,22 +227,27 @@ extract:
### compute 函数签名
```typescript
import type { LibSQLDatabase } from "drizzle-orm/libsql";
import type { ComputeResult, WorkflowTrigger } from "@uncaged/nerve-core";
Sense 的 `compute` **无参数**。它不接收数据库句柄:daemon 在 worker 内调用 `SenseComputeFn`,由运行时负责把非 null 结果的 `signal` 写入该 sense 的 Drizzle 表并记入 `_signals`。超时由运行时控制(对应 `nerve.yaml` 里的 `timeout`),无需在业务代码里读取 `AbortSignal`
export async function compute(
db: LibSQLDatabase, // 此 sense 的 Drizzle ORM 数据库
peers: Record<string, LibSQLDatabase>, // 其他 sense 的数据库(只读)
options: { signal: AbortSignal }, // 超时 abort signal
): Promise<ComputeResult<T>>
```typescript
import type { ComputeResult, SenseComputeFn } from "@uncaged/nerve-core";
export const compute: SenseComputeFn<MySignalShape> = async () => {
// ...
};
// 或等价地:
export async function compute(): Promise<ComputeResult<MySignalShape>> {
// ...
}
```
(运行时定义见 `@uncaged/nerve-core``SenseComputeFn` / `SenseModule`,daemon 侧在 `sense-runtime.ts``executeCompute` 中插入 `result.signal`。)
### 返回值
```typescript
// 返回 null = 静默,不发 signal
// 返回非 null = 发出 signal,可选触发 workflow
// 返回非 null = 发出 signal(并写入业务表),可选触发 workflow
type ComputeResult<T> =
| null
| { signal: T; workflow: WorkflowTrigger | null };
@@ -233,21 +260,20 @@ type WorkflowTrigger = {
};
```
若返回值是普通对象且不含 `signal` 字段,内核会按 shorthand 视为 `{ signal: payload, workflow: null }`(见 core 的 `routeSenseComputeOutput`)。
### Sense 模块导出
```typescript
// senses/<name>/src/index.ts
import type { SenseModule, ComputeResult } from "@uncaged/nerve-core";
import type { ComputeResult } from "@uncaged/nerve-core";
import { table } from "./schema.js";
export async function compute(
db: LibSQLDatabase,
_peers: Record<string, LibSQLDatabase>,
_options: { signal: AbortSignal },
): Promise<ComputeResult<number>> {
const value = Math.random(); // 替换为真实观测逻辑
await db.insert(table).values({ ts: Date.now(), value });
return { signal: value, workflow: null };
type Row = { ts: number; value: number };
export async function compute(): Promise<ComputeResult<Row>> {
const row: Row = { ts: Date.now(), value: Math.random() }; // 替换为真实观测逻辑
return { signal: row, workflow: null };
}
export { table };
@@ -292,18 +318,14 @@ export const table = sqliteTable("samples", {
// senses/cpu-usage/src/index.ts
import os from "node:os";
import type { LibSQLDatabase } from "drizzle-orm/libsql";
import type { ComputeResult } from "@uncaged/nerve-core";
import { table } from "./schema.js";
export async function compute(
db: LibSQLDatabase,
_peers: Record<string, LibSQLDatabase>,
_options: { signal: AbortSignal },
): Promise<ComputeResult<number>> {
type Row = { ts: number; value: number };
export async function compute(): Promise<ComputeResult<Row>> {
const oneMin = os.loadavg()[0];
await db.insert(table).values({ ts: Date.now(), value: oneMin });
return { signal: oneMin, workflow: null };
return { signal: { ts: Date.now(), value: oneMin }, workflow: null };
}
export { table };
@@ -331,55 +353,80 @@ import type {
WorkflowDefinition,
RoleResult,
ThreadContext,
StartStep,
RoleStep,
RoleMeta,
Moderator,
} from "@uncaged/nerve-core";
import { END } from "@uncaged/nerve-core";
// Role:执行者,接收上下文返回结果
type Role<Meta> = (ctx: ThreadContext) => Promise<RoleResult<Meta>>;
type RoleResult<Meta> = { content: string; meta: Meta };
// Moderator:路由器,决定下一个 role 或结束
type Moderator<M> = (ctx: ThreadContext<M>) => (keyof M & string) | typeof END;
// ThreadContext:对话上下文
type ThreadContext<M = RoleMeta> = {
threadId: string;
start: StartStep; // 初始 prompt(role: "__start__")
steps: RoleStep<M>[]; // 所有 role 的执行记录
};
// WorkflowDefinition:完整定义
type WorkflowDefinition<M> = {
name: string;
roles: { [K in keyof M & string]: Role<M[K]> };
moderator: Moderator<M>;
};
// Role<Meta> — (ctx: ThreadContext) => Promise<RoleResult<Meta>>
// RoleResult<Meta> — { content: string; meta: Meta }
// ThreadContext<M extends RoleMeta> — threadId, start(__start__ 帧), steps(各 role 轮次)
// Moderator<M> — (ctx) => 下一个 role 名 | END
// WorkflowDefinition<M extends RoleMeta> — name, roles, moderator
```
### 基本 Workflow 示例
### createRole 四元组(接入 LLM 时推荐)
工作区根目录需安装 **`@uncaged/nerve-workflow-utils`**(及所选 agent 适配器包)。默认 `nerve init``package.json` 不含该依赖时,在 `~/.uncaged-nerve` 下执行 `pnpm add @uncaged/nerve-workflow-utils`(或 npm 等价命令)。
使用 **`createRole`**,按固定顺序传入四件事:
1. **adapter**`AgentFn``(ctx, systemPrompt) => Promise<string>`(原始模型输出文本)。
2. **prompt**`string`,或 `async (ctx: ThreadContext) => string`
3. **meta**`z.ZodType<M>`,供 moderator 路由的结构化 meta。
4. **extract**`{ provider: LlmProvider; dryRun: boolean | null }`,声明从回复中抽取 meta 时用的 LLM(OpenAI 兼容)及是否 dry-run。
```typescript
// workflows/example/index.ts
import type { RoleResult, ThreadContext, WorkflowDefinition } from "@uncaged/nerve-core";
import { END } from "@uncaged/nerve-core";
import { createLlmAdapter, createRole } from "@uncaged/nerve-workflow-utils";
import type { ThreadContext } from "@uncaged/nerve-core";
import { z } from "zod";
type Meta = Record<"main", { round: number }>;
const provider = {
baseUrl: "https://api.example.com/v1",
apiKey: process.env.EXAMPLE_API_KEY!,
model: "gpt-4o-mini",
};
async function main(ctx: ThreadContext): Promise<RoleResult<{ round: number }>> {
const planMeta = z.object({ next: z.enum(["execute", "stop"]) });
export const planner = createRole(
createLlmAdapter(provider),
async (ctx: ThreadContext) => `规划任务:${ctx.start.content}`,
planMeta,
{ provider, dryRun: null },
);
```
`createLlmAdapter` 仅位于 **`@uncaged/nerve-workflow-utils`**:用 `LlmProvider` 生成 `AgentFn`,单轮对话里 **system** 来自 `createRole` 解析后的 prompt 字符串,**user** 为线程起点 `ctx.start.content`
### 基本 Workflow 示例(平铺 `roles/<role>.ts`)
```typescript
// workflows/example/roles/main.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";
export async function main(ctx: ThreadContext): Promise<RoleResult<{ round: number }>> {
const prompt = ctx.start.content;
return {
content: `处理完成: ${prompt}`,
meta: { round: ctx.steps.length },
};
}
```
```typescript
// workflows/example/index.ts
import type { ThreadContext, WorkflowDefinition } from "@uncaged/nerve-core";
import { END } from "@uncaged/nerve-core";
import { main } from "./roles/main.js";
type Meta = Record<"main", { round: number }>;
const workflow: WorkflowDefinition<Meta> = {
name: "example",
roles: { main },
moderator(ctx: ThreadContext<Meta>) {
// 执行一次 main 就结束
return ctx.steps.length === 0 ? "main" : END;
},
};
@@ -387,25 +434,50 @@ const workflow: WorkflowDefinition<Meta> = {
export default workflow;
```
可选:将 `moderator` 挪到 `moderator.ts``import { route } from "./moderator.js"`,保持 `index.ts` 只负责组装 `WorkflowDefinition`
### 多 Role Workflow 示例
```typescript
import type { WorkflowDefinition, RoleResult, ThreadContext } from "@uncaged/nerve-core";
import { END } from "@uncaged/nerve-core";
// workflows/plan-execute-review/roles/planner.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";
type Roles = Record<"planner" | "executor" | "reviewer", { status: string }>;
async function planner(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
export async function planner(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
void ctx;
return { content: "计划: ...", meta: { status: "planned" } };
}
```
async function executor(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
```typescript
// workflows/plan-execute-review/roles/executor.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";
export async function executor(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
void ctx;
return { content: "执行: ...", meta: { status: "executed" } };
}
```
async function reviewer(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
```typescript
// workflows/plan-execute-review/roles/reviewer.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";
export async function reviewer(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
void ctx;
return { content: "审核通过", meta: { status: "approved" } };
}
```
```typescript
// workflows/plan-execute-review/index.ts
import type { WorkflowDefinition, ThreadContext } from "@uncaged/nerve-core";
import { END } from "@uncaged/nerve-core";
import { executor } from "./roles/executor.js";
import { planner } from "./roles/planner.js";
import { reviewer } from "./roles/reviewer.js";
type Roles = Record<"planner" | "executor" | "reviewer", { status: string }>;
const workflow: WorkflowDefinition<Roles> = {
name: "plan-execute-review",
@@ -424,12 +496,14 @@ export default workflow;
### Agent 适配器
Workflow role 可以集成 AI agent。已知适配器 ID`echo``cursor``hermes``codex`
Workflow role 可以集成 AI agent。已知适配器 **ID**`echo``cursor``hermes``codex`
```typescript
type AgentFn = (ctx: ThreadContext, systemPrompt: string) => Promise<string>;
```
没有现成 agent 包时,用 **`createLlmAdapter``@uncaged/nerve-workflow-utils`)** 从 OpenAI 兼容的 `LlmProvider` 构造 `AgentFn`,再交给 **`createRole`** 的四元组。
### Workflow 运行状态
`queued``started``completed` | `failed` | `crashed` | `killed` | `interrupted` | `dropped`
@@ -484,9 +558,10 @@ nerve sense query my-sensor "SELECT * FROM ..." # 检查数据
### 开发新 workflow
```bash
nerve create workflow my-flow # 脚手架
# 编辑 workflows/my-flow/index.ts roles/
nerve create workflow my-flow # 脚手架(当前 CLI 可能仍生成 roles/<name>/ 子目录)
# 推荐对齐 AGENT.md:workflows/my-flow/index.ts + roles/<role>.ts(平铺),moderator 可拆到 moderator.ts
nerve validate # 验证配置
cd ~/.uncaged-nerve && npm run build # 工作区根目录构建(等价:pnpm run build);勿在单个 workflow 子目录单独跑 build
nerve workflow trigger my-flow --prompt "测试" --dryRun # 干跑
nerve thread show <runId> # 查看执行轨迹
```
@@ -496,10 +571,10 @@ nerve thread show <runId> # 查看执行轨迹
## Pitfalls
- **Sense 返回值**:返回 `null` 表示静默(不发 signal);返回 `{ signal, workflow }` 才发 signal。不要返回 undefined。
- **Sense 持久化**:daemon 在 `compute()` 返回非 null 时自动执行 `db.insert(table).values(signal)` 并写入 `_signals`;业务代码不要自行 insert。
- **no optional properties**:nerve 代码规范禁止 `?:`,用 `T | null` 代替。
- **函数式风格**:用 `function` + `type`,不用 `class` + `interface`
- **workflow 用 default export**:这是唯一允许 default export 的场景
- **workflow 用 default export**:工作区里通常只有 `workflows/<name>/index.ts` 使用 default export(daemon 加载约定)
- **_signals 表**:每个 sense 自动有 `_signals` 表记录 signal 历史,受 `retention` 配置限制。
- **peers 只读**:sense 的 `peers` 参数提供其他 sense 数据库的只读访问,不要写入。
- **concurrency + overflow**:workflow 必须配置并发策略,否则验证失败。
- **moderator 是同步函数**:不要加 async,moderator 是纯路由逻辑,不能有副作用。
+147 -13
View File
@@ -5,10 +5,11 @@ import {
readFileSync,
readdirSync,
rmSync,
statSync,
writeFileSync,
} from "node:fs";
import { homedir } from "node:os";
import { dirname, join } from "node:path";
import { dirname, join, resolve as resolvePath } from "node:path";
import { fileURLToPath } from "node:url";
import { defineCommand } from "citty";
@@ -62,6 +63,75 @@ function writeVersionFile(skillDir: string, version: string): void {
writeFileSync(join(skillDir, ".nerve-version"), `${version}\n`, "utf8");
}
const CURSOR_VERSION_MARKER_RE = /<!--\s*nerve-cli-version:\s*([^>]+?)\s*-->/;
function resolveCursorProjectDir(pathArg: string | null): string {
const raw = pathArg !== null && pathArg !== "" ? pathArg : process.cwd();
return resolvePath(raw);
}
function assertDirectory(projectDir: string, label: string): void {
if (!existsSync(projectDir)) {
process.stderr.write(`${label} does not exist: ${projectDir}\n`);
process.exit(1);
}
if (!statSync(projectDir).isDirectory()) {
process.stderr.write(`${label} is not a directory: ${projectDir}\n`);
process.exit(1);
}
}
function readCursorInjectVersion(projectDir: string): string | null {
const versionPath = join(projectDir, ".nerve-version");
if (existsSync(versionPath)) {
return readFileSync(versionPath, "utf8").trim();
}
const rulesPath = join(projectDir, ".cursorrules");
if (!existsSync(rulesPath)) return null;
const content = readFileSync(rulesPath, "utf8");
const match = content.match(CURSOR_VERSION_MARKER_RE);
return match !== null ? match[1].trim() : null;
}
function injectCursor(projectDir: string): void {
assertDirectory(projectDir, "Project directory");
const rulesPath = join(projectDir, ".cursorrules");
const existingVer = readCursorInjectVersion(projectDir);
if (existingVer === cliVersion() && existsSync(rulesPath)) {
process.stdout.write(
`✅ Cursor .cursorrules is already up to date (v${cliVersion()}) at ${projectDir}\n`,
);
return;
}
const templatePath = join(getSkillSourceDir(), "cursor", ".cursorrules");
if (!existsSync(templatePath)) {
throw new Error("Cannot locate cursor/.cursorrules template. Is the CLI package intact?");
}
let body = readFileSync(templatePath, "utf8");
body = body.replaceAll("__NERVE_CLI_VERSION__", cliVersion());
writeFileSync(rulesPath, body, "utf8");
writeVersionFile(projectDir, cliVersion());
const action = existingVer !== null ? "Updated" : "Installed";
process.stdout.write(`${action} Cursor .cursorrules v${cliVersion()} at ${projectDir}\n`);
}
function removeCursor(projectDir: string): void {
assertDirectory(projectDir, "Project directory");
const rulesPath = join(projectDir, ".cursorrules");
const versionPath = join(projectDir, ".nerve-version");
if (!existsSync(rulesPath)) {
process.stdout.write(`ℹ️ Cursor .cursorrules is not present at ${projectDir}\n`);
return;
}
rmSync(rulesPath, { force: true });
if (existsSync(versionPath)) {
rmSync(versionPath, { force: true });
}
process.stdout.write(`✅ Removed Cursor .cursorrules from ${projectDir}\n`);
}
function injectHermes(profile: string | null): void {
const sourceDir = join(getSkillSourceDir(), "hermes");
const targetDir = getHermesSkillDir(profile);
@@ -94,9 +164,35 @@ function removeHermes(profile: string | null): void {
process.stdout.write(`✅ Removed Hermes nerve skill${loc}\n`);
}
function printCursorStatusLine(projectDir: string): void {
const rulesPath = join(projectDir, ".cursorrules");
const label = `Cursor (${projectDir})`;
if (!existsSync(rulesPath)) {
process.stdout.write(` ${label}: ❌ not installed\n`);
return;
}
const ver = readCursorInjectVersion(projectDir);
if (ver === null) {
process.stdout.write(
` ${label}: ⚠️ installed (unknown version; run \`nerve agent inject cursor\`)\n`,
);
return;
}
if (ver === cliVersion()) {
process.stdout.write(` ${label}: ✅ v${ver}\n`);
} else {
process.stdout.write(
` ${label}: ⚠️ v${ver} → v${cliVersion()} available (run \`nerve agent inject cursor\`)\n`,
);
}
}
function printStatus(): void {
process.stdout.write(`nerve agent skills (CLI v${cliVersion()})\n\n`);
printCursorStatusLine(process.cwd());
process.stdout.write("\n");
// Default profile
const defaultDir = getHermesSkillDir(null);
const defaultVer = readVersionFile(defaultDir);
@@ -141,20 +237,39 @@ const injectCommand = defineCommand({
args: {
target: {
type: "positional",
description: "Agent target: hermes",
description: "Agent target: hermes | cursor",
},
profile: {
type: "string",
description: "Hermes profile name (default: main profile)",
},
path: {
type: "string",
description: "Project directory for Cursor rules (default: cwd); only used with cursor",
},
},
run({ args }) {
if (args.target !== "hermes") {
process.stderr.write(`❌ Unknown agent target: ${args.target}\n`);
process.stderr.write(" Supported targets: hermes\n");
process.exit(1);
const target = args.target;
if (target === "hermes") {
if (args.path != null && args.path !== "") {
process.stderr.write("❌ --path applies only to the cursor target\n");
process.exit(1);
}
injectHermes(args.profile ?? null);
return;
}
injectHermes(args.profile ?? null);
if (target === "cursor") {
if (args.profile != null && args.profile !== "") {
process.stderr.write("❌ --profile applies only to the hermes target\n");
process.exit(1);
}
const pathArg = args.path != null && args.path !== "" ? args.path : null;
injectCursor(resolveCursorProjectDir(pathArg));
return;
}
process.stderr.write(`❌ Unknown agent target: ${target}\n`);
process.stderr.write(" Supported targets: hermes, cursor\n");
process.exit(1);
},
});
@@ -203,20 +318,39 @@ const removeCommand = defineCommand({
args: {
target: {
type: "positional",
description: "Agent target: hermes",
description: "Agent target: hermes | cursor",
},
profile: {
type: "string",
description: "Hermes profile name (default: main profile)",
},
path: {
type: "string",
description: "Project directory for Cursor rules (default: cwd); only used with cursor",
},
},
run({ args }) {
if (args.target !== "hermes") {
process.stderr.write(`❌ Unknown agent target: ${args.target}\n`);
process.stderr.write(" Supported targets: hermes\n");
process.exit(1);
const target = args.target;
if (target === "hermes") {
if (args.path != null && args.path !== "") {
process.stderr.write("❌ --path applies only to the cursor target\n");
process.exit(1);
}
removeHermes(args.profile ?? null);
return;
}
removeHermes(args.profile ?? null);
if (target === "cursor") {
if (args.profile != null && args.profile !== "") {
process.stderr.write("❌ --profile applies only to the hermes target\n");
process.exit(1);
}
const pathArg = args.path != null && args.path !== "" ? args.path : null;
removeCursor(resolveCursorProjectDir(pathArg));
return;
}
process.stderr.write(`❌ Unknown agent target: ${target}\n`);
process.stderr.write(" Supported targets: hermes, cursor\n");
process.exit(1);
},
});
@@ -28,6 +28,11 @@ function makeMockChild(pid = 1): MockChild {
child.connected = true;
child.exitCode = null;
child.pid = pid;
setImmediate(() => {
if (child.connected) {
child.emit("message", { type: "ready" });
}
});
child.send = vi.fn((msg: unknown) => {
if (
msg !== null &&
@@ -132,6 +137,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
mgr.startWorkflow("my-wf", { prompt: "test 1", maxRounds: 10, dryRun: false });
mgr.startWorkflow("my-wf", { prompt: "test 2", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mgr.activeCount("my-wf")).toBe(2);
// Simulate unexpected exit (not shutdown)
@@ -159,6 +165,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mgr.activeCount("my-wf")).toBe(2);
const child = mockChildren[0];
@@ -183,6 +190,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mockChildren).toHaveLength(1);
const child = mockChildren[0];
@@ -216,6 +224,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const firstChild = mockChildren[0];
firstChild.exitCode = 1;
firstChild.connected = false;
@@ -260,6 +269,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
// Start one thread to fill the concurrency slot (so queued run stays queued on respawn)
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const firstChild = mockChildren[0];
firstChild.exitCode = 1;
firstChild.connected = false;
@@ -285,6 +295,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
const startCall = (child.send as ReturnType<typeof vi.fn>).mock.calls[0];
@@ -322,6 +333,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
const launch = { prompt: "build-docker for myrepo", maxRounds: 10, dryRun: false };
mgr.startWorkflow("my-wf", launch);
await vi.runAllTimersAsync();
const startedCall = logStore.upsertWorkflowRun.mock.calls.find(
(args: any[]) => (args[0] as { type: string }).type === "started",
@@ -357,6 +369,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
// Start one thread to fill the concurrency slot
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const firstChild = mockChildren[0];
// Crash once → respawn → crash again → second respawn
@@ -398,6 +411,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const firstChild = mockChildren[0];
firstChild.exitCode = 1;
firstChild.connected = false;
@@ -428,6 +442,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("crash-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
// Crash the worker 6 times in rapid succession (within CRASH_WINDOW_MS = 60s)
for (let i = 0; i < 6; i++) {
@@ -33,6 +33,11 @@ function makeMockChild(pid = 1): MockChild {
child.connected = true;
child.exitCode = null;
child.pid = pid;
setImmediate(() => {
if (child.connected) {
child.emit("message", { type: "ready" });
}
});
child.send = vi.fn((msg: unknown) => {
if (
msg !== null &&
@@ -114,6 +119,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mockChildren).toHaveLength(1);
// Remove workflow from config before drain completes
@@ -134,6 +140,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mgr.activeCount("my-wf")).toBe(2);
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
@@ -165,6 +172,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mockChildren).toHaveLength(1);
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
@@ -181,6 +189,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mockChildren).toHaveLength(1);
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
@@ -198,6 +207,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
await vi.runAllTimersAsync();
@@ -223,6 +233,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "first", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
await vi.runAllTimersAsync();
@@ -230,6 +241,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
// Start a new thread on the fresh worker
mgr.startWorkflow("my-wf", { prompt: "second", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const newChild = mockChildren[1];
const startCalls = (newChild.send as ReturnType<typeof vi.fn>).mock.calls.filter(
@@ -257,12 +269,13 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
vi.clearAllMocks();
});
it("does not send shutdown while a thread is still active", () => {
it("does not send shutdown while a thread is still active", async () => {
const logStore = makeLogStore();
const config = makeWfConfig({ "my-wf": { concurrency: 1, overflow: "drop" } });
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
mgr.drainWhenIdle("my-wf");
@@ -282,6 +295,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
const runId = (child.send as ReturnType<typeof vi.fn>).mock.calls[0][0] as { runId: string };
@@ -311,6 +325,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
mgr.startWorkflow("my-wf", { prompt: "a", maxRounds: 10, dryRun: false });
mgr.startWorkflow("my-wf", { prompt: "b", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
const sendMock = child.send as ReturnType<typeof vi.fn>;
const runIdA = (sendMock.mock.calls[0][0] as { runId: string }).runId;
@@ -355,6 +370,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
const runId = (child.send as ReturnType<typeof vi.fn>).mock.calls[0][0] as { runId: string };
@@ -388,6 +404,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
const runId = (child.send as ReturnType<typeof vi.fn>).mock.calls[0][0] as { runId: string };
@@ -414,6 +431,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-wf", { prompt: "once", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const firstChild = mockChildren[0];
const runId = (firstChild.send as ReturnType<typeof vi.fn>).mock.calls[0][0] as {
runId: string;
@@ -471,6 +489,7 @@ describe("Kernel — workflow hot reload via file-watcher (Phase 3)", () => {
// Trigger a workflow thread so a worker is spawned
kernel.workflowManager.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
// Manually call drainAndRespawn (simulating what kernel does on workflow file change)
const drainPromise = kernel.workflowManager.drainAndRespawn("my-wf", 1000);
@@ -511,6 +530,7 @@ describe("Kernel — workflow hot reload via file-watcher (Phase 3)", () => {
maxRounds: 10,
dryRun: false,
});
await vi.runAllTimersAsync();
expect(mockChildren).toHaveLength(1);
// Reload config without old-wf
@@ -551,6 +571,7 @@ describe("Kernel — workflow hot reload via file-watcher (Phase 3)", () => {
});
kernel.workflowManager.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const workersBefore = mockChildren.length;
// Reload with updated concurrency — should NOT spawn a new workflow worker
@@ -573,6 +594,7 @@ describe("Kernel — workflow hot reload via file-watcher (Phase 3)", () => {
// Can now start up to 5 concurrent threads (previously only 1)
kernel.workflowManager.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
kernel.workflowManager.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(kernel.workflowManager.activeCount("my-wf")).toBe(3);
const stopPromise = kernel.stop();
@@ -194,6 +194,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
// A workflow worker should be spawned and a start-thread message sent
const workflowWorker = mockChildren.find((c) =>
(c.send as ReturnType<typeof vi.fn>).mock.calls.some(
@@ -252,6 +254,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
// Find the start-thread call and verify triggerPayload
const startThreadCall = mockChildren
.flatMap((c) => (c.send as ReturnType<typeof vi.fn>).mock.calls as [unknown][])
@@ -306,6 +310,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
const senseEntries = logStore.append.mock.calls
.map((c) => c[0] as { source: string; type: string; refId: string | null })
.filter((e) => e.source === "sense" && e.refId === "cpu-usage");
@@ -423,6 +429,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
expect(logStore.upsertWorkflowRun).toHaveBeenCalledWith(
expect.objectContaining({ source: "workflow", type: "started" }),
expect.objectContaining({ workflow: "log-test-workflow", status: "started" }),
@@ -498,6 +506,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
const startThreadCall = mockChildren
.flatMap((c) => (c.send as ReturnType<typeof vi.fn>).mock.calls as [unknown][])
.find(
@@ -582,6 +592,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
const startThreadCall = mockChildren
.flatMap((c) => (c.send as ReturnType<typeof vi.fn>).mock.calls as [unknown][])
.find(
@@ -642,6 +654,8 @@ describe("kernel + workflowManager integration", () => {
});
}
await vi.runAllTimersAsync();
const stopPromise = kernel.stop();
await vi.runAllTimersAsync();
await expect(stopPromise).resolves.toBeUndefined();
@@ -16,6 +16,7 @@ function baseConfig(script: string) {
onMessage: vi.fn(),
onReady: vi.fn(),
onExit: vi.fn(),
onCrashLimitReached: null,
respawn: {
enabled: true,
maxCrashes: 6,
@@ -84,7 +85,7 @@ describe("createWorkerRuntime", () => {
const rt = track(createWorkerRuntime(baseConfig(echoWorkerPath)));
await rt.start("k");
expect(rt.has("k")).toBe(true);
await rt.evict("k");
await rt.evict("k", null);
expect(rt.has("k")).toBe(false);
await rt.shutdown();
});
@@ -94,7 +95,7 @@ describe("createWorkerRuntime", () => {
await rt.start("k");
const before = rt.pid("k");
expect(before).not.toBeNull();
await rt.drain("k");
await rt.drain("k", null);
const after = rt.pid("k");
expect(after).not.toBeNull();
expect(after).not.toBe(before);
@@ -26,6 +26,11 @@ function makeMockChild(pid = 1): MockChild {
child.connected = true;
child.exitCode = null;
child.pid = pid;
setImmediate(() => {
if (child.connected) {
child.emit("message", { type: "ready" });
}
});
child.send = vi.fn((msg: unknown) => {
if (
msg !== null &&
@@ -110,7 +115,7 @@ describe("WorkflowManager", () => {
});
describe("startWorkflow under concurrency limit dispatches thread", () => {
it("forks a worker and sends start-thread when active < concurrency", () => {
it("forks a worker and sends start-thread when active < concurrency", async () => {
const logStore = makeLogStore();
const config = makeConfig({
"my-workflow": { concurrency: 2, overflow: "drop" },
@@ -118,6 +123,7 @@ describe("WorkflowManager", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-workflow", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mockChildren).toHaveLength(1);
expect(mockChildren[0].send).toHaveBeenCalledWith(
@@ -126,7 +132,7 @@ describe("WorkflowManager", () => {
expect(mgr.activeCount("my-workflow")).toBe(1);
});
it("reuses the same worker for a second thread under the limit", () => {
it("reuses the same worker for a second thread under the limit", async () => {
const logStore = makeLogStore();
const config = makeConfig({
"my-workflow": { concurrency: 3, overflow: "drop" },
@@ -135,6 +141,7 @@ describe("WorkflowManager", () => {
mgr.startWorkflow("my-workflow", { prompt: "test 1", maxRounds: 10, dryRun: false });
mgr.startWorkflow("my-workflow", { prompt: "test 2", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
// Only one forked child — worker is reused
expect(mockChildren).toHaveLength(1);
@@ -142,7 +149,7 @@ describe("WorkflowManager", () => {
expect(mgr.activeCount("my-workflow")).toBe(2);
});
it("logs a 'started' event for each dispatched thread", () => {
it("logs a 'started' event for each dispatched thread", async () => {
const logStore = makeLogStore();
const config = makeConfig({
"my-workflow": { concurrency: 2, overflow: "drop" },
@@ -150,6 +157,7 @@ describe("WorkflowManager", () => {
const mgr = createWorkflowManager("/nerve-root", config, logStore);
mgr.startWorkflow("my-workflow", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(logStore.upsertWorkflowRun).toHaveBeenCalledWith(
expect.objectContaining({ source: "workflow", type: "started" }),
@@ -159,7 +167,7 @@ describe("WorkflowManager", () => {
});
describe("startWorkflow at limit with drop overflow drops the request", () => {
it("does NOT send start-thread when at concurrency limit with overflow=drop", () => {
it("does NOT send start-thread when at concurrency limit with overflow=drop", async () => {
const logStore = makeLogStore();
const config = makeConfig({
"drop-wf": { concurrency: 1, overflow: "drop" },
@@ -169,6 +177,7 @@ describe("WorkflowManager", () => {
mgr.startWorkflow("drop-wf", { prompt: "first", maxRounds: 10, dryRun: false });
// now at limit — second call should be dropped
mgr.startWorkflow("drop-wf", { prompt: "second", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mgr.activeCount("drop-wf")).toBe(1);
expect(mgr.queueLength("drop-wf")).toBe(0);
@@ -254,7 +263,7 @@ describe("WorkflowManager", () => {
});
describe("completing a thread dequeues the next one", () => {
it("dispatches the next queued thread when the active thread sends completed", () => {
it("dispatches the next queued thread when the active thread sends completed", async () => {
const logStore = makeLogStore();
const config = makeConfig({
"queue-wf": { concurrency: 1, overflow: "queue", maxQueue: 5 },
@@ -263,6 +272,7 @@ describe("WorkflowManager", () => {
mgr.startWorkflow("queue-wf", { prompt: "first", maxRounds: 10, dryRun: false });
mgr.startWorkflow("queue-wf", { prompt: "second", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
expect(mgr.activeCount("queue-wf")).toBe(1);
expect(mgr.queueLength("queue-wf")).toBe(1);
@@ -289,7 +299,7 @@ describe("WorkflowManager", () => {
);
});
it("dispatches next queued thread when active thread sends failed", () => {
it("dispatches next queued thread when active thread sends failed", async () => {
const logStore = makeLogStore();
const config = makeConfig({
"queue-wf": { concurrency: 1, overflow: "queue", maxQueue: 5 },
@@ -298,6 +308,7 @@ describe("WorkflowManager", () => {
mgr.startWorkflow("queue-wf", { prompt: "first", maxRounds: 10, dryRun: false });
mgr.startWorkflow("queue-wf", { prompt: "second", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
const child = mockChildren[0];
const firstRunId = (child.send as ReturnType<typeof vi.fn>).mock.calls[0][0].runId as string;
@@ -325,6 +336,7 @@ describe("WorkflowManager", () => {
mgr.startWorkflow("wf-a", { prompt: "test", maxRounds: 10, dryRun: false });
mgr.startWorkflow("wf-b", { prompt: "test", maxRounds: 10, dryRun: false });
await vi.runAllTimersAsync();
// Two distinct workers should have been forked
expect(mockChildren).toHaveLength(2);
+1 -1
View File
@@ -25,7 +25,7 @@ import type { WorkerToParentMessage } from "./ipc.js";
import { parseParentMessage } from "./ipc.js";
import { executeCompute, loadSenseModule, openSenseDb } from "./sense-runtime.js";
import type { SenseRuntime } from "./sense-runtime.js";
import { ignoreSessionBroadcastSignals } from "./worker-fork-support.js";
import { ignoreSessionBroadcastSignals } from "./worker-signals.js";
// ---------------------------------------------------------------------------
// IPC helpers
@@ -1,45 +0,0 @@
import type { ChildProcess } from "node:child_process";
const STDERR_TAIL_MAX_CHARS = 16_384;
/**
* Forked workers inherit the parent's process group. In foreground `nerve dev`,
* terminal-driven SIGINT/SIGTERM is delivered to the whole group, so workers can exit
* on the default handler before the kernel sends `{ type: "shutdown" }` over IPC.
* Swallow these in worker processes so the parent coordinates shutdown (issue #55).
* Only call when `process.send` is defined (fork IPC); standalone `node …-worker.js` keeps default Ctrl+C behaviour.
*/
export function ignoreSessionBroadcastSignals(): void {
const swallow = (): void => {};
process.on("SIGINT", swallow);
process.on("SIGTERM", swallow);
}
export function teeCapturedStderr(child: ChildProcess, tail: { value: string }): void {
const stream = child.stderr;
if (stream === null || stream === undefined) return;
stream.setEncoding("utf8");
stream.on("data", (chunk: string | Buffer) => {
const text = typeof chunk === "string" ? chunk : chunk.toString("utf8");
process.stderr.write(text);
tail.value = (tail.value + text).slice(-STDERR_TAIL_MAX_CHARS);
});
}
export function formatChildExitSummary(code: number | null, signal: NodeJS.Signals | null): string {
const codeStr = code === null || code === undefined ? "null" : String(code);
if (signal) {
return `code=${codeStr} signal=${signal}`;
}
return `code=${codeStr}`;
}
export function formatCapturedStderrTail(tail: string, maxChars = 800): string {
const trimmed = tail.trim();
if (trimmed.length === 0) return "";
const normalized = trimmed.replace(/\r?\n/g, "\\n");
if (normalized.length <= maxChars) {
return ` worker_stderr=${normalized}`;
}
return ` worker_stderr=…${normalized.slice(-maxChars)}`;
}
+8 -4
View File
@@ -6,8 +6,11 @@ import { dirname, join } from "node:path";
import { fileURLToPath } from "node:url";
import type { ComputeMessage } from "./ipc.js";
import { formatCapturedStderrTail, formatChildExitSummary } from "./worker-fork-support.js";
import { createWorkerRuntime } from "./worker-runtime.js";
import {
createWorkerRuntime,
formatCapturedStderrTail,
formatChildExitSummary,
} from "./worker-runtime.js";
export function resolveWorkerScript(): string {
const __filename = fileURLToPath(import.meta.url);
@@ -66,6 +69,7 @@ export function createSenseWorkerPool(options: SenseWorkerPoolOptions): SenseWor
onReady: (_key, msg) => {
options.onWorkerMessage(msg);
},
onCrashLimitReached: null,
onExit: (group, code, signal) => {
const sig =
signal === null || signal === undefined || signal === ""
@@ -102,13 +106,13 @@ export function createSenseWorkerPool(options: SenseWorkerPoolOptions): SenseWor
return;
}
options.onBeforeGroupRestart(group);
await runtime.drain(group);
await runtime.drain(group, null);
}
function evictGroup(group: string): void {
trackedGroups.delete(group);
evicting.add(group);
void runtime.evict(group).finally(() => {
void runtime.evict(group, null).finally(() => {
evicting.delete(group);
});
}
+45 -9
View File
@@ -8,6 +8,28 @@ import { isPlainRecord } from "@uncaged/nerve-core";
const STDERR_TAIL_MAX_CHARS = 2048;
export function formatChildExitSummary(code: number | null, signal: NodeJS.Signals | null): string {
const codeStr = code === null || code === undefined ? "null" : String(code);
if (signal) {
return `code=${codeStr} signal=${signal}`;
}
return `code=${codeStr}`;
}
export function formatCapturedStderrTail(tail: string, maxChars = 800): string {
const trimmed = tail.trim();
if (trimmed.length === 0) return "";
const normalized = trimmed.replace(/\r?\n/g, "\\n");
if (normalized.length <= maxChars) {
return ` worker_stderr=${normalized}`;
}
return ` worker_stderr=…${normalized.slice(-maxChars)}`;
}
export type WorkerDrainOpts = {
shutdownTimeoutMs: number | null;
};
export type WorkerRuntimeConfig<K extends string> = {
script: string;
argsForKey: (key: K) => string[];
@@ -16,6 +38,8 @@ export type WorkerRuntimeConfig<K extends string> = {
onMessage: (key: K, msg: unknown) => void;
onReady: (key: K, msg: unknown) => void;
onExit: (key: K, code: number | null, signal: string | null) => void;
/** Invoked when automatic respawn is skipped because `maxCrashes` was exceeded in `windowMs`. */
onCrashLimitReached: ((key: K) => void) | null;
respawn: {
enabled: boolean;
maxCrashes: number;
@@ -32,8 +56,8 @@ export type WorkerRuntime<K extends string> = {
/** When the worker is already ready and IPC-connected, sends synchronously (returns true). Otherwise false — caller may fall back to `send`. */
trySendSync: (key: K, msg: unknown) => boolean;
start: (key: K) => Promise<void>;
evict: (key: K) => Promise<void>;
drain: (key: K) => Promise<void>;
evict: (key: K, opts: WorkerDrainOpts | null) => Promise<void>;
drain: (key: K, opts: WorkerDrainOpts | null) => Promise<void>;
shutdown: () => Promise<void>;
has: (key: K) => boolean;
/** True when a child exists but IPC is disconnected (legacy pool skipped sends in this case). */
@@ -264,7 +288,14 @@ export function createWorkerRuntime<K extends string>(
await waitForReady(slot, config.shutdownTimeoutMs);
}
async function gracefulStop(slot: WorkerSlot<K>): Promise<void> {
function resolveShutdownTimeoutMs(opts: WorkerDrainOpts | null): number {
if (opts !== null && opts.shutdownTimeoutMs !== null) {
return opts.shutdownTimeoutMs;
}
return config.shutdownTimeoutMs;
}
async function gracefulStop(slot: WorkerSlot<K>, shutdownTimeoutMs: number): Promise<void> {
if (slot.child === null) {
return;
}
@@ -276,7 +307,7 @@ export function createWorkerRuntime<K extends string>(
} catch {
// IPC channel may have closed between null-check and send
}
await waitForChildExit(child, config.shutdownTimeoutMs);
await waitForChildExit(child, shutdownTimeoutMs);
}
async function handleUnexpectedCrashRecovery(slot: WorkerSlot<K>): Promise<void> {
@@ -295,6 +326,9 @@ export function createWorkerRuntime<K extends string>(
console.error(
`[WorkerRuntime] worker "${String(slot.key)}" exceeded crash limit (${String(config.respawn.maxCrashes)} in ${String(config.respawn.windowMs)}ms); not respawning`,
);
if (config.onCrashLimitReached !== null) {
config.onCrashLimitReached(slot.key);
}
return;
}
@@ -303,7 +337,7 @@ export function createWorkerRuntime<K extends string>(
}
async function shutdownWorker(slot: WorkerSlot<K>): Promise<void> {
await gracefulStop(slot);
await gracefulStop(slot, config.shutdownTimeoutMs);
workers.delete(slot.key);
}
@@ -348,22 +382,24 @@ export function createWorkerRuntime<K extends string>(
});
},
evict: async (key: K) => {
evict: async (key: K, opts: WorkerDrainOpts | null) => {
const slot = getOrCreateSlot(key);
const shutdownMs = resolveShutdownTimeoutMs(opts);
await enqueueOp(slot, async () => {
await gracefulStop(slot);
await gracefulStop(slot, shutdownMs);
workers.delete(key);
});
},
drain: async (key: K) => {
drain: async (key: K, opts: WorkerDrainOpts | null) => {
const slot = getOrCreateSlot(key);
const shutdownMs = resolveShutdownTimeoutMs(opts);
await enqueueOp(slot, async () => {
if (slot.child === null) {
await forkAndWaitReady(slot);
return;
}
await gracefulStop(slot);
await gracefulStop(slot, shutdownMs);
await forkAndWaitReady(slot);
});
},
+17
View File
@@ -0,0 +1,17 @@
/**
* Worker-process signal handling (fork IPC children only).
* Worker entrypoints import this module — not worker-runtime.ts (parent/kernel code).
*/
/**
* Forked workers inherit the parent's process group. In foreground `nerve dev`,
* terminal-driven SIGINT/SIGTERM is delivered to the whole group, so workers can exit
* on the default handler before the kernel sends `{ type: "shutdown" }` over IPC.
* Swallow these in worker processes so the parent coordinates shutdown (issue #55).
* Only call when `process.send` is defined (fork IPC); standalone `node …-worker.js` keeps default Ctrl+C behaviour.
*/
export function ignoreSessionBroadcastSignals(): void {
const swallow = (): void => {};
process.on("SIGINT", swallow);
process.on("SIGTERM", swallow);
}
@@ -0,0 +1,256 @@
/**
* Pure helpers and IPC branching for workflow-manager (keeps workflow-manager.ts lean).
*/
import { dirname, join } from "node:path";
import { fileURLToPath } from "node:url";
import type { WorkflowMessage } from "@uncaged/nerve-core";
import { START, isPlainRecord } from "@uncaged/nerve-core";
import type { LogStore, WorkflowRunStatus } from "@uncaged/nerve-store";
import type { ResumeThreadMessage, ThreadEventMessage } from "./ipc.js";
import type { WorkerToParentMessage } from "./ipc.js";
export type PendingThread = {
runId: string;
prompt: string;
maxRounds: number;
dryRun: boolean;
};
export type WorkflowState = {
active: Set<string>;
queue: PendingThread[];
};
/** Matches legacy manager: 6 crashes within 60s stops respawn (was `length > 5`). */
export const WORKFLOW_WORKER_RESPAWN = {
enabled: true,
maxCrashes: 6,
windowMs: 60_000,
delayMs: 0,
} as const;
/**
* Worker shutdown timeout — must stay in sync with SHUTDOWN_TIMEOUT_MS in workflow-worker.ts.
* The drain timeout passed to drainAndRespawn must be >= this value so the worker has
* enough time to finish in-flight threads before the parent force-kills it.
*/
export const WORKER_SHUTDOWN_TIMEOUT_MS = 10_000;
export const DEFAULT_MAX_QUEUE = 100;
export function readLaunchFromTriggerPayload(
raw: unknown,
engineDefaultMaxRounds: number,
): { prompt: string; maxRounds: number; dryRun: boolean } {
if (isPlainRecord(raw)) {
const o = raw;
if (typeof o.prompt === "string" && typeof o.maxRounds === "number") {
const dryRun = typeof o.dryRun === "boolean" ? o.dryRun : false;
return { prompt: o.prompt, maxRounds: o.maxRounds, dryRun };
}
}
return { prompt: "", maxRounds: engineDefaultMaxRounds, dryRun: false };
}
export function ensureThreadMessagesWithStart(
messages: Array<{ role: string; content: string; meta: unknown; timestamp: number }>,
threadId: string,
fallbackPrompt: string,
fallbackMaxRounds: number,
): WorkflowMessage[] {
const mapped: WorkflowMessage[] = messages.map((m) => ({
role: m.role,
content: m.content,
meta: m.meta,
timestamp: m.timestamp,
}));
if (mapped.length > 0 && mapped[0].role === START) {
return mapped;
}
const start: WorkflowMessage = {
role: START,
content: fallbackPrompt,
meta: { maxRounds: fallbackMaxRounds, threadId },
timestamp: Date.now(),
};
return [start, ...mapped];
}
export function resolveWorkflowWorkerScript(): string {
const __filename = fileURLToPath(import.meta.url);
const __dir = dirname(__filename);
return join(__dir, "workflow-worker.js");
}
export function mapWorkflowRunStatus(eventType: string): WorkflowRunStatus | null {
const map: Record<string, WorkflowRunStatus> = {
started: "started",
queued: "queued",
completed: "completed",
failed: "failed",
crashed: "crashed",
dropped: "dropped",
interrupted: "interrupted",
killed: "killed",
};
return map[eventType] ?? null;
}
export function extractExitCodeFromPayload(payload: unknown): number | null {
if (isPlainRecord(payload) && typeof payload.exitCode === "number") {
return payload.exitCode;
}
return null;
}
export function appendWorkflowRunLog(
logStore: LogStore,
workflowName: string,
runId: string,
eventType: string,
payload: unknown | undefined,
exitCode: number | null,
): void {
const timestamp = Date.now();
const serialised = payload !== undefined ? JSON.stringify(payload) : null;
const status = mapWorkflowRunStatus(eventType);
if (status !== null) {
logStore.upsertWorkflowRun(
{
source: "workflow",
type: eventType,
refId: runId,
payload: serialised,
timestamp,
},
{ runId, workflow: workflowName, status, timestamp, exitCode },
);
} else {
logStore.append({
source: "workflow",
type: eventType,
refId: runId,
payload: serialised,
timestamp,
});
}
}
export function recoverQueuedRun(
workflowName: string,
runId: string,
state: WorkflowState,
logStore: LogStore,
engineMaxRounds: number,
): void {
if (state.queue.some((q) => q.runId === runId)) return;
const launch = readLaunchFromTriggerPayload(logStore.getTriggerPayload(runId), engineMaxRounds);
state.queue.push({
runId,
prompt: launch.prompt,
maxRounds: launch.maxRounds,
dryRun: launch.dryRun,
});
process.stderr.write(
`[workflow-manager] crash-recovery: re-queued thread "${runId}" for "${workflowName}"\n`,
);
}
export function recoverStartedRun(
workflowName: string,
runId: string,
state: WorkflowState,
logStore: LogStore,
engineMaxRounds: number,
sendResume: (wf: string, msg: ResumeThreadMessage) => void,
): void {
if (state.active.has(runId)) return;
const rawMessages = logStore.getThreadMessages(runId);
const launch = readLaunchFromTriggerPayload(logStore.getTriggerPayload(runId), engineMaxRounds);
const messages = ensureThreadMessagesWithStart(
rawMessages,
runId,
launch.prompt,
launch.maxRounds,
);
state.active.add(runId);
const msg: ResumeThreadMessage = {
type: "resume-thread",
runId,
messages,
maxRounds: launch.maxRounds,
dryRun: launch.dryRun,
};
sendResume(workflowName, msg);
process.stderr.write(
`[workflow-manager] crash-recovery: resuming thread "${runId}" for "${workflowName}" (${String(messages.length)} messages)\n`,
);
}
export function recoverThreadsFromStore(
workflowName: string,
logStore: LogStore,
engineMaxRounds: number,
getOrCreateState: (name: string) => WorkflowState,
sendResume: (wf: string, msg: ResumeThreadMessage) => void,
): void {
const activeRuns = logStore.getActiveWorkflowRuns(workflowName);
const state = getOrCreateState(workflowName);
for (const run of activeRuns) {
if (run.status === "queued") {
recoverQueuedRun(workflowName, run.runId, state, logStore, engineMaxRounds);
} else if (run.status === "started") {
recoverStartedRun(workflowName, run.runId, state, logStore, engineMaxRounds, sendResume);
}
}
}
export type WorkflowManagerMessageDeps = {
logStore: LogStore;
handleThreadEvent: (workflowName: string, msg: ThreadEventMessage) => void;
onWorkflowRoleError: (
workflowName: string,
runId: string,
error: string,
exitCode: number,
) => void;
};
export function dispatchWorkflowWorkerMessage(
workflowName: string,
msg: WorkerToParentMessage,
deps: WorkflowManagerMessageDeps,
): void {
if (msg.type === "thread-event") {
deps.handleThreadEvent(workflowName, msg);
return;
}
if (msg.type === "thread-workflow-message") {
deps.logStore.append({
source: "workflow",
type: "thread_workflow_message",
refId: msg.runId,
payload: JSON.stringify(msg.message),
timestamp: Date.now(),
});
return;
}
if (msg.type === "workflow-error") {
process.stderr.write(
`[workflow-manager] workflow-error for runId "${msg.runId}" in "${workflowName}": ${msg.error}\n`,
);
deps.onWorkflowRoleError(workflowName, msg.runId, msg.error, msg.exitCode);
return;
}
if (msg.type === "error") {
process.stderr.write(`[workflow-manager] error from "${workflowName}" worker: ${msg.error}\n`);
}
}
+175 -445
View File
@@ -6,33 +6,27 @@
* Concurrency and overflow (drop/queue) are enforced here in the parent process.
*/
import { fork } from "node:child_process";
import type { ChildProcess } from "node:child_process";
import { dirname, join } from "node:path";
import { fileURLToPath } from "node:url";
import type { NerveConfig, WorkflowConfig, WorkflowStatus } from "@uncaged/nerve-core";
import type {
NerveConfig,
WorkflowConfig,
WorkflowMessage,
WorkflowStatus,
} from "@uncaged/nerve-core";
import { START, isPlainRecord } from "@uncaged/nerve-core";
import type { LogStore, WorkflowRunStatus } from "@uncaged/nerve-store";
import type {
KillThreadMessage,
ResumeThreadMessage,
ShutdownMessage,
StartThreadMessage,
ThreadEventMessage,
} from "./ipc.js";
import type { LogStore } from "@uncaged/nerve-store";
import type { KillThreadMessage, StartThreadMessage, ThreadEventMessage } from "./ipc.js";
import { parseWorkerMessage } from "./ipc.js";
import {
createWorkerRuntime,
formatCapturedStderrTail,
formatChildExitSummary,
teeCapturedStderr,
} from "./worker-fork-support.js";
} from "./worker-runtime.js";
import {
DEFAULT_MAX_QUEUE,
WORKER_SHUTDOWN_TIMEOUT_MS,
WORKFLOW_WORKER_RESPAWN,
type WorkflowState,
appendWorkflowRunLog,
dispatchWorkflowWorkerMessage,
extractExitCodeFromPayload,
recoverThreadsFromStore,
resolveWorkflowWorkerScript,
} from "./workflow-manager-support.js";
export type WorkflowLaunchParams = {
prompt: string;
@@ -74,169 +68,109 @@ export type WorkflowManager = {
stop: () => Promise<void>;
};
type PendingThread = {
runId: string;
prompt: string;
maxRounds: number;
dryRun: boolean;
};
type WorkflowState = {
active: Set<string>;
queue: PendingThread[];
};
type WorkerEntry = {
workflowName: string;
process: ChildProcess;
stopping: boolean;
/** When set, the worker is draining before a hot-reload respawn. */
draining: boolean;
stderrTail: { value: string };
};
// Crash respawn backoff: track crash timestamps per workflow.
const MAX_CRASHES_IN_WINDOW = 5;
const CRASH_WINDOW_MS = 60_000;
/**
* Worker shutdown timeout — must stay in sync with SHUTDOWN_TIMEOUT_MS in workflow-worker.ts.
* The drain timeout passed to drainAndRespawn must be >= this value so the worker has
* enough time to finish in-flight threads before the parent force-kills it.
*/
const WORKER_SHUTDOWN_TIMEOUT_MS = 10_000;
const DEFAULT_MAX_QUEUE = 100;
function readLaunchFromTriggerPayload(
raw: unknown,
engineDefaultMaxRounds: number,
): { prompt: string; maxRounds: number; dryRun: boolean } {
if (isPlainRecord(raw)) {
const o = raw;
if (typeof o.prompt === "string" && typeof o.maxRounds === "number") {
const dryRun = typeof o.dryRun === "boolean" ? o.dryRun : false;
return { prompt: o.prompt, maxRounds: o.maxRounds, dryRun };
}
}
return { prompt: "", maxRounds: engineDefaultMaxRounds, dryRun: false };
}
function ensureThreadMessagesWithStart(
messages: Array<{ role: string; content: string; meta: unknown; timestamp: number }>,
threadId: string,
fallbackPrompt: string,
fallbackMaxRounds: number,
): WorkflowMessage[] {
const mapped: WorkflowMessage[] = messages.map((m) => ({
role: m.role,
content: m.content,
meta: m.meta,
timestamp: m.timestamp,
}));
if (mapped.length > 0 && mapped[0].role === START) {
return mapped;
}
const start: WorkflowMessage = {
role: START,
content: fallbackPrompt,
meta: { maxRounds: fallbackMaxRounds, threadId },
timestamp: Date.now(),
};
return [start, ...mapped];
}
function resolveWorkerScript(): string {
const __filename = fileURLToPath(import.meta.url);
const __dir = dirname(__filename);
return join(__dir, "workflow-worker.js");
}
function spawnWorkflowWorker(
nerveRoot: string,
workflowName: string,
workerScript: string,
stderrTail: { value: string },
): ChildProcess {
const child = fork(workerScript, ["--workflow", workflowName, "--root", nerveRoot], {
stdio: ["ignore", "inherit", "pipe", "ipc"],
});
teeCapturedStderr(child, stderrTail);
// Prevent unhandled EPIPE when writing to a child whose IPC channel closed
child.on("error", (err) => {
if ((err as NodeJS.ErrnoException).code !== "EPIPE") {
console.error("[worker] error:", err.message);
}
});
return child;
}
function sendStartThread(worker: ChildProcess, msg: StartThreadMessage): void {
if (worker.connected === false) return;
try {
worker.send(msg);
} catch {
// IPC channel closed between connected check and send
}
}
function sendShutdown(worker: ChildProcess, entry: WorkerEntry): void {
entry.stopping = true;
if (worker.connected === false) return;
const msg: ShutdownMessage = { type: "shutdown" };
try {
worker.send(msg);
} catch {
// IPC channel closed between connected check and send
}
}
function sendResumeThread(worker: ChildProcess, msg: ResumeThreadMessage): void {
if (worker.connected === false) return;
try {
worker.send(msg);
} catch {
// IPC channel closed between connected check and send
}
}
function sendKillThread(worker: ChildProcess, runId: string): void {
if (worker.connected === false) return;
const msg: KillThreadMessage = { type: "kill-thread", runId };
try {
worker.send(msg);
} catch {
// IPC channel closed between connected check and send
}
}
function waitForExit(child: ChildProcess, timeoutMs: number): Promise<void> {
return new Promise((resolve) => {
const timer = setTimeout(() => {
child.kill("SIGKILL");
resolve();
}, timeoutMs);
child.once("exit", () => {
clearTimeout(timer);
resolve();
});
});
}
export function createWorkflowManager(
nerveRoot: string,
initialConfig: NerveConfig,
logStore: LogStore,
): WorkflowManager {
const workerScript = resolveWorkerScript();
const workerScript = resolveWorkflowWorkerScript();
/**
* Default drain timeout must be at least WORKER_SHUTDOWN_TIMEOUT_MS so the worker
* has enough time to finish in-flight threads before the parent force-kills it.
*/
const DEFAULT_DRAIN_TIMEOUT_MS = Math.max(30_000, WORKER_SHUTDOWN_TIMEOUT_MS + 5_000);
const states = new Map<string, WorkflowState>();
const workers = new Map<string, WorkerEntry>();
const crashTimestamps = new Map<string, number[]>();
const trackedWorkflows = new Set<string>();
const hotReloadEvicting = new Set<string>();
const crashRecoveryPending = new Set<string>();
const crashLimitBlocked = new Set<string>();
let stopped = false;
let config = initialConfig;
const pendingDrains = new Set<string>();
function logWorkflowEvent(
workflowName: string,
runId: string,
eventType: string,
payload: unknown | null = null,
exitCode: number | null = null,
): void {
appendWorkflowRunLog(logStore, workflowName, runId, eventType, payload, exitCode);
}
const runtime = createWorkerRuntime<string>({
script: workerScript,
argsForKey: (workflowName) => ["--workflow", workflowName, "--root", nerveRoot],
forwardStderr: true,
onMessage: (workflowName, raw) => {
handleWorkerMessage(workflowName, raw);
},
onReady: (workflowName, _msg) => {
if (crashRecoveryPending.has(workflowName)) {
crashRecoveryPending.delete(workflowName);
recoverThreadsFromStore(
workflowName,
logStore,
config.maxRounds,
getOrCreateState,
(wf, msg) => {
sendToWorker(wf, msg);
},
);
}
},
onExit: (workflowName, code, signalStr) => {
const sig =
signalStr === null || signalStr === undefined || signalStr === ""
? null
: (signalStr as NodeJS.Signals);
if (hotReloadEvicting.has(workflowName)) {
hotReloadEvicting.delete(workflowName);
markActiveRunsInterrupted(workflowName);
if (!stopped && workflowConfig(workflowName) !== null) {
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" drained, respawning\n`,
);
}
return;
}
if (stopped) {
const state = states.get(workflowName);
if (state !== undefined) {
state.active.clear();
}
crashRecoveryPending.delete(workflowName);
return;
}
const summary = formatChildExitSummary(code, sig);
const stderrExtra = formatCapturedStderrTail(runtime.stderrTail(workflowName));
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" exited (${summary})${stderrExtra}\n`,
);
cleanupAfterUnexpectedWorkerExit(workflowName);
crashRecoveryPending.add(workflowName);
},
onCrashLimitReached: (workflowName) => {
crashRecoveryPending.delete(workflowName);
trackedWorkflows.delete(workflowName);
crashLimitBlocked.add(workflowName);
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" exceeded crash limit (${String(WORKFLOW_WORKER_RESPAWN.maxCrashes)} in ${String(WORKFLOW_WORKER_RESPAWN.windowMs)}ms) — stopping respawn\n`,
);
},
respawn: {
...WORKFLOW_WORKER_RESPAWN,
allowRespawn: (_wf) => !stopped,
},
shutdownTimeoutMs: DEFAULT_DRAIN_TIMEOUT_MS,
});
function getOrCreateState(workflowName: string): WorkflowState {
let state = states.get(workflowName);
if (state === undefined) {
@@ -250,60 +184,38 @@ export function createWorkflowManager(
return config.workflows[workflowName] ?? null;
}
function toWorkflowRunStatus(eventType: string): WorkflowRunStatus | null {
const map: Record<string, WorkflowRunStatus> = {
started: "started",
queued: "queued",
completed: "completed",
failed: "failed",
crashed: "crashed",
dropped: "dropped",
interrupted: "interrupted",
killed: "killed",
};
return map[eventType] ?? null;
}
function extractExitCode(payload: unknown): number | null {
if (isPlainRecord(payload) && typeof payload.exitCode === "number") {
return payload.exitCode;
/** IPC send — matches legacy pool: no-op when IPC is disconnected; cold-start via WorkerRuntime.send. */
function sendToWorker(workflowName: string, msg: unknown): void {
if (crashLimitBlocked.has(workflowName)) {
return;
}
return null;
}
function logWorkflowEvent(
workflowName: string,
runId: string,
eventType: string,
payload?: unknown,
exitCode: number | null = null,
): void {
const timestamp = Date.now();
const serialised = payload !== undefined ? JSON.stringify(payload) : null;
const status = toWorkflowRunStatus(eventType);
if (status !== null) {
logStore.upsertWorkflowRun(
{
source: "workflow",
type: eventType,
refId: runId,
payload: serialised,
timestamp,
},
{ runId, workflow: workflowName, status, timestamp, exitCode },
);
} else {
logStore.append({
source: "workflow",
type: eventType,
refId: runId,
payload: serialised,
timestamp,
trackedWorkflows.add(workflowName);
if (runtime.hasDisconnectedChild(workflowName)) {
return;
}
if (!runtime.trySendSync(workflowName, msg)) {
void runtime.send(workflowName, msg).catch(() => {
// IPC channel closed — mark any thread from this message as failed
if (isStartThreadMsg(msg)) {
const state = states.get(workflowName);
if (state?.active.has(msg.runId)) {
state.active.delete(msg.runId);
logWorkflowEvent(workflowName, msg.runId, "failed", { error: "IPC channel closed" }, 1);
dequeueNext(workflowName);
}
}
});
}
}
function isStartThreadMsg(msg: unknown): msg is StartThreadMessage {
return (
msg !== null &&
typeof msg === "object" &&
(msg as Record<string, unknown>).type === "start-thread"
);
}
function dispatchThread(
workflowName: string,
runId: string,
@@ -314,7 +226,6 @@ export function createWorkflowManager(
const state = getOrCreateState(workflowName);
state.active.add(runId);
const worker = getOrSpawnWorker(workflowName);
const msg: StartThreadMessage = {
type: "start-thread",
runId,
@@ -323,7 +234,7 @@ export function createWorkflowManager(
maxRounds,
dryRun,
};
sendStartThread(worker.process, msg);
sendToWorker(workflowName, msg);
logWorkflowEvent(workflowName, runId, "started", { prompt, maxRounds, dryRun });
}
@@ -367,92 +278,20 @@ export function createWorkflowManager(
if (msg.eventType === "completed" || msg.eventType === "failed" || msg.eventType === "killed") {
state.active.delete(msg.runId);
dequeueNext(workflowName);
const exitCode = extractExitCode(msg.payload);
const exitCode = extractExitCodeFromPayload(msg.payload);
logWorkflowEvent(workflowName, msg.runId, msg.eventType, msg.payload, exitCode);
maybeDeferredHotReloadDrain(workflowName);
}
}
function recoverQueuedRun(workflowName: string, runId: string, state: WorkflowState): void {
if (state.queue.some((q) => q.runId === runId)) return;
const launch = readLaunchFromTriggerPayload(
logStore.getTriggerPayload(runId),
config.maxRounds,
);
state.queue.push({
runId,
prompt: launch.prompt,
maxRounds: launch.maxRounds,
dryRun: launch.dryRun,
});
process.stderr.write(
`[workflow-manager] crash-recovery: re-queued thread "${runId}" for "${workflowName}"\n`,
);
}
function recoverStartedRun(
workflowName: string,
runId: string,
state: WorkflowState,
worker: WorkerEntry,
): void {
if (state.active.has(runId)) return;
const rawMessages = logStore.getThreadMessages(runId);
const launch = readLaunchFromTriggerPayload(
logStore.getTriggerPayload(runId),
config.maxRounds,
);
const messages = ensureThreadMessagesWithStart(
rawMessages,
runId,
launch.prompt,
launch.maxRounds,
);
state.active.add(runId);
const msg: ResumeThreadMessage = {
type: "resume-thread",
runId,
messages,
maxRounds: launch.maxRounds,
dryRun: launch.dryRun,
};
sendResumeThread(worker.process, msg);
process.stderr.write(
`[workflow-manager] crash-recovery: resuming thread "${runId}" for "${workflowName}" (${messages.length} messages)\n`,
);
}
function recoverThreadsForWorker(workflowName: string, worker: WorkerEntry): void {
const activeRuns = logStore.getActiveWorkflowRuns(workflowName);
const state = getOrCreateState(workflowName);
for (const run of activeRuns) {
if (run.status === "queued") {
recoverQueuedRun(workflowName, run.runId, state);
} else if (run.status === "started") {
recoverStartedRun(workflowName, run.runId, state, worker);
}
}
}
function recordCrashAndCheckLimit(workflowName: string): boolean {
const now = Date.now();
const timestamps = (crashTimestamps.get(workflowName) ?? []).filter(
(t) => now - t < CRASH_WINDOW_MS,
);
timestamps.push(now);
crashTimestamps.set(workflowName, timestamps);
return timestamps.length > MAX_CRASHES_IN_WINDOW;
}
function handleWorkerCrash(workflowName: string): void {
function cleanupAfterUnexpectedWorkerExit(workflowName: string): void {
const state = states.get(workflowName);
if (state === undefined) return;
const crashedCount = state.active.size;
if (crashedCount > 0) {
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" crashed with ${crashedCount} active thread(s)\n`,
`[workflow-manager] worker for "${workflowName}" crashed with ${String(crashedCount)} active thread(s)\n`,
);
for (const runId of state.active) {
logWorkflowEvent(workflowName, runId, "crashed", undefined, 255);
@@ -460,26 +299,13 @@ export function createWorkflowManager(
}
state.active.clear();
workers.delete(workflowName);
pendingDrains.delete(workflowName);
if (stopped || workflowConfig(workflowName) === null) return;
if (recordCrashAndCheckLimit(workflowName)) {
const count = crashTimestamps.get(workflowName)?.length ?? 0;
if (!stopped && !crashLimitBlocked.has(workflowName) && workflowConfig(workflowName) !== null) {
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" crashed ${count} times in ${CRASH_WINDOW_MS}ms — stopping respawn\n`,
`[workflow-manager] respawning worker for "${workflowName}" after crash\n`,
);
return;
}
process.stderr.write(
`[workflow-manager] respawning worker for "${workflowName}" after crash\n`,
);
const newWorker = getOrSpawnWorker(workflowName);
setImmediate(() => {
recoverThreadsForWorker(workflowName, newWorker);
});
}
function handleWorkerMessage(workflowName: string, raw: unknown): void {
@@ -490,43 +316,19 @@ export function createWorkflowManager(
);
return;
}
const msg = result.value;
if (msg.type === "thread-event") {
handleThreadEvent(workflowName, msg);
return;
}
if (msg.type === "thread-workflow-message") {
logStore.append({
source: "workflow",
type: "thread_workflow_message",
refId: msg.runId,
payload: JSON.stringify(msg.message),
timestamp: Date.now(),
});
return;
}
if (msg.type === "workflow-error") {
process.stderr.write(
`[workflow-manager] workflow-error for runId "${msg.runId}" in "${workflowName}": ${msg.error}\n`,
);
const state = states.get(workflowName);
if (state !== undefined) {
state.active.delete(msg.runId);
dequeueNext(workflowName);
}
logWorkflowEvent(workflowName, msg.runId, "failed", { error: msg.error }, msg.exitCode);
maybeDeferredHotReloadDrain(workflowName);
return;
}
if (msg.type === "error") {
process.stderr.write(
`[workflow-manager] error from "${workflowName}" worker: ${msg.error}\n`,
);
}
dispatchWorkflowWorkerMessage(workflowName, result.value, {
logStore,
handleThreadEvent,
onWorkflowRoleError: (wf, runId, error, exitCode) => {
const state = states.get(wf);
if (state !== undefined) {
state.active.delete(runId);
dequeueNext(wf);
}
logWorkflowEvent(wf, runId, "failed", { error }, exitCode);
maybeDeferredHotReloadDrain(wf);
},
});
}
function markActiveRunsInterrupted(workflowName: string): void {
@@ -538,67 +340,6 @@ export function createWorkflowManager(
state.active.clear();
}
function handleWorkerExit(
workflowName: string,
code: number | null,
signal: NodeJS.Signals | null,
): void {
const entry = workers.get(workflowName);
if (entry?.draining) {
workers.delete(workflowName);
markActiveRunsInterrupted(workflowName);
if (!stopped && workflowConfig(workflowName) !== null) {
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" drained, respawning\n`,
);
getOrSpawnWorker(workflowName);
}
return;
}
if (entry?.stopping) {
workers.delete(workflowName);
const state = states.get(workflowName);
if (state !== undefined) {
state.active.clear();
}
return;
}
const summary = formatChildExitSummary(code, signal);
const stderrExtra = entry !== undefined ? formatCapturedStderrTail(entry.stderrTail.value) : "";
process.stderr.write(
`[workflow-manager] worker for "${workflowName}" exited (${summary})${stderrExtra}\n`,
);
handleWorkerCrash(workflowName);
}
function getOrSpawnWorker(workflowName: string): WorkerEntry {
const existing = workers.get(workflowName);
if (existing !== undefined && existing.process.exitCode === null) {
return existing;
}
const stderrTail = { value: "" };
const child = spawnWorkflowWorker(nerveRoot, workflowName, workerScript, stderrTail);
child.on("message", (raw: unknown) => {
handleWorkerMessage(workflowName, raw);
});
child.on("exit", (code, signal) => {
handleWorkerExit(workflowName, code, signal ?? null);
});
const entry: WorkerEntry = {
workflowName,
process: child,
stopping: false,
draining: false,
stderrTail,
};
workers.set(workflowName, entry);
return entry;
}
function killThread(runId: string): boolean {
for (const [workflowName, state] of states) {
const queueIdx = state.queue.findIndex((q) => q.runId === runId);
@@ -609,10 +350,8 @@ export function createWorkflowManager(
}
if (state.active.has(runId)) {
const workerEntry = workers.get(workflowName);
if (workerEntry !== undefined) {
sendKillThread(workerEntry.process, runId);
}
const msg: KillThreadMessage = { type: "kill-thread", runId };
sendToWorker(workflowName, msg);
return true;
}
}
@@ -663,7 +402,7 @@ export function createWorkflowManager(
state.queue.push({ runId, prompt, maxRounds, dryRun });
logWorkflowEvent(workflowName, runId, "queued");
process.stderr.write(
`[workflow-manager] queued thread for "${workflowName}" runId "${runId}" (queue length: ${state.queue.length})\n`,
`[workflow-manager] queued thread for "${workflowName}" runId "${runId}" (queue length: ${String(state.queue.length)})\n`,
);
}
@@ -707,35 +446,29 @@ export function createWorkflowManager(
config = newConfig;
}
/**
* Default drain timeout must be at least WORKER_SHUTDOWN_TIMEOUT_MS so the worker
* has enough time to finish in-flight threads before the parent force-kills it.
*/
const DEFAULT_DRAIN_TIMEOUT_MS = Math.max(30_000, WORKER_SHUTDOWN_TIMEOUT_MS + 5_000);
async function drainAndRespawn(
workflowName: string,
drainTimeoutMs: number = DEFAULT_DRAIN_TIMEOUT_MS,
): Promise<void> {
const entry = workers.get(workflowName);
if (entry === undefined) {
// No active worker — nothing to drain
if (!trackedWorkflows.has(workflowName)) {
return;
}
entry.draining = true;
// Send shutdown without setting stopping=true (so the exit handler uses the draining branch)
if (entry.process.connected) {
const msg: ShutdownMessage = { type: "shutdown" };
try {
entry.process.send(msg);
} catch {
// IPC closed
const shutdownMs = Math.max(drainTimeoutMs, WORKER_SHUTDOWN_TIMEOUT_MS);
hotReloadEvicting.add(workflowName);
try {
await runtime.evict(workflowName, { shutdownTimeoutMs: shutdownMs });
trackedWorkflows.delete(workflowName);
if (!stopped && workflowConfig(workflowName) !== null) {
trackedWorkflows.add(workflowName);
await runtime.start(workflowName);
}
} finally {
hotReloadEvicting.delete(workflowName);
}
await waitForExit(entry.process, drainTimeoutMs);
// The exit handler (draining branch) will respawn the worker automatically
}
function drainWhenIdle(workflowName: string): void {
const state = states.get(workflowName);
const hasActiveRuns = state !== undefined && state.active.size > 0;
@@ -761,20 +494,17 @@ export function createWorkflowManager(
pendingDrains.add(workflowName);
process.stderr.write(
`[workflow-manager] deferring hot-reload for "${workflowName}" until ${state.active.size} active run(s) complete\n`,
`[workflow-manager] deferring hot-reload for "${workflowName}" until ${String(state.active.size)} active run(s) complete\n`,
);
}
async function stop(): Promise<void> {
stopped = true;
pendingDrains.clear();
const exitPromises: Promise<void>[] = [];
for (const entry of workers.values()) {
sendShutdown(entry.process, entry);
exitPromises.push(waitForExit(entry.process, 5000));
}
await Promise.all(exitPromises);
workers.clear();
hotReloadEvicting.clear();
crashRecoveryPending.clear();
await runtime.shutdown();
trackedWorkflows.clear();
}
return {
+1 -1
View File
@@ -30,7 +30,7 @@ import type {
WorkerToParentMessage,
} from "./ipc.js";
import { parseParentMessage } from "./ipc.js";
import { ignoreSessionBroadcastSignals } from "./worker-fork-support.js";
import { ignoreSessionBroadcastSignals } from "./worker-signals.js";
// ---------------------------------------------------------------------------
// IPC helpers