Compare commits
26 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| 06b91c2e63 | |||
| b7200ce51c | |||
| 3eab2e29f5 | |||
| 10c4cf4148 | |||
| 5db80c99a0 | |||
| 49f3d91d1b | |||
| 4d75c8683f | |||
| 99b0e58fb6 | |||
| a1b1d5eaf1 | |||
| 1849789c02 | |||
| 7ce46e7735 | |||
| 0455f928f5 | |||
| 11cedfb5a5 | |||
| ed1bc4e25f | |||
| dc4454d23e | |||
| 7c256620c5 | |||
| 14898e1827 | |||
| 082d2e72f2 | |||
| fbf63e0266 | |||
| 7d89e8ab61 | |||
| e67ddc58d8 | |||
| 06b1e3d785 | |||
| f828ebc28b | |||
| 809a11afe3 | |||
| 4dffcb636b | |||
| c34ec46416 |
+22
-1
@@ -28,8 +28,29 @@ For long-running or incremental agent outputs:
|---------|---------|------|
| `@uncaged/nerve-adapter-cursor` | `cursorAdapter` / `createCursorAdapter()` | cursor-agent CLI |
| `@uncaged/nerve-adapter-hermes` | `hermesAdapter` / `createHermesAdapter()` | hermes chat CLI |
| `@uncaged/nerve-workflow-utils` | `createLlmAdapter(provider)` | OpenAI-compatible HTTP chat (single-turn) |

The Cursor and Hermes adapter packages each export a **default instance** (sensible defaults) and a **factory** for custom config. `createLlmAdapter` is a factory exported only by `@uncaged/nerve-workflow-utils`.

## createLlmAdapter

`createLlmAdapter` builds an `AgentFn` from an `LlmProvider` (`baseUrl`, `apiKey`, `model`). It issues one chat completion per role step: the **system** message is the prompt string passed to `createRole`, and the **user** message is `ctx.start.content` (the thread's start frame). On failure it throws a formatted LLM error.

```ts
import { createLlmAdapter, createRole } from "@uncaged/nerve-workflow-utils";
import { z } from "zod";

const metaSchema = z.object({ ok: z.boolean() });

const planner = createRole(
  createLlmAdapter({ baseUrl: "https://api.example.com/v1", apiKey: "…", model: "gpt-4o-mini" }),
  "You are a planner…",
  metaSchema,
  extractConfig, // meta-extraction config, assumed to be defined elsewhere
);
```

Use this when you want a role backed by an HTTP LLM instead of a subprocess CLI adapter.
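
The system/user mapping described above can be pictured as a plain request builder. This is a minimal sketch, not the package's implementation: `LlmProvider` is re-declared locally as a stand-in, `buildChatRequest` is a hypothetical name, and the `/chat/completions` path is the usual OpenAI convention rather than something this document confirms.

```typescript
// Local stand-in for the LlmProvider shape named in the text (assumption).
type LlmProvider = { baseUrl: string; apiKey: string; model: string };

type ChatMessage = { role: "system" | "user"; content: string };

type ChatRequest = {
  url: string;
  headers: Record<string, string>;
  body: { model: string; messages: ChatMessage[] };
};

// Hypothetical helper: builds (but does not send) one single-turn chat request.
function buildChatRequest(
  provider: LlmProvider,
  systemPrompt: string,
  startContent: string,
): ChatRequest {
  return {
    // Strip trailing slashes so the joined URL has exactly one separator.
    url: `${provider.baseUrl.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${provider.apiKey}`,
    },
    body: {
      model: provider.model,
      messages: [
        { role: "system", content: systemPrompt }, // the role prompt
        { role: "user", content: startContent }, // the thread's start frame
      ],
    },
  };
}
```

Keeping the builder free of I/O makes the message layout easy to check before any network call is involved.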
|
||||
|
||||
## Usage in Workflows
|
||||
|
||||
|
||||
@@ -33,6 +33,7 @@ Senses own both the "what" (compute logic) and the "when" (config-driven schedul
- One worker per Workflow type (on-demand)
- Workers never talk to each other
- All user code runs in isolated Workers; the kernel never loads user code directly
- **`WorkerRuntime`** (`packages/daemon/src/worker-runtime.ts`) centralizes the fork lifecycle for both sense groups (`worker-pool.ts`) and workflow types (`workflow-manager.ts`); see `.knowledge/worker-isolation.md`

## Storage Systems
@@ -12,6 +12,12 @@ Kernel (Main Process)
└── Workflow Worker (review) ── review workflow instances
```

### WorkerRuntime (RFC-006)

Forked worker processes are managed by **`WorkerRuntime`** (`worker-runtime.ts`): one Node child per logical key, cold start, optional respawn after crash, drain/evict, and coordinated shutdown over IPC. **`worker-pool.ts`** (sense groups) and **`workflow-manager.ts`** (workflow types) both configure and delegate to `createWorkerRuntime` instead of owning ad-hoc fork logic.

Worker **entrypoints** (`sense-worker.ts`, `workflow-worker.ts`) import only lightweight helpers, such as `worker-signals.ts` for session broadcast signal handling, so they do not pull in the parent-side runtime module.
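
The "one child per logical key, cold start, evict, coordinated shutdown" idea can be sketched as a keyed registry. This is an illustration under assumptions, not the actual `WorkerRuntime`: `WorkerHandle`, `SpawnFn`, and `createKeyedRuntime` are hypothetical names, the spawn step is injected instead of forking a real process, and respawn-after-crash and IPC are left out.

```typescript
// Hypothetical handle; the real runtime manages forked Node child processes.
type WorkerHandle = { key: string; pid: number; stop: () => void };

type SpawnFn = (key: string) => WorkerHandle;

type KeyedRuntime = {
  getOrStart: (key: string) => WorkerHandle;
  evict: (key: string) => boolean;
  shutdown: () => void;
  size: () => number;
};

function createKeyedRuntime(spawn: SpawnFn): KeyedRuntime {
  const workers = new Map<string, WorkerHandle>();
  return {
    getOrStart(key) {
      const existing = workers.get(key);
      if (existing) return existing; // at most one worker per logical key
      const started = spawn(key); // cold start on first use
      workers.set(key, started);
      return started;
    },
    evict(key) {
      const worker = workers.get(key);
      if (!worker) return false;
      worker.stop();
      workers.delete(key);
      return true;
    },
    shutdown() {
      workers.forEach((worker) => worker.stop());
      workers.clear();
    },
    size: () => workers.size,
  };
}
```

Injecting `spawn` keeps the registry logic testable without forking; a caller for sense groups and a caller for workflow types would each pass their own spawn function.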

## Isolation Boundaries

### 1. Sense Workers
@@ -111,10 +117,10 @@ workflows:
### Process Management

#### Signal Handling
Workers ignore session broadcast signals (SIGINT/SIGTERM) via `ignoreSessionBroadcastSignals()` in `worker-signals.ts`:
```typescript
// Workers ignore terminal signals; kernel coordinates shutdown
process.on("SIGINT", () => {});
process.on("SIGTERM", () => {});
```
@@ -2,6 +2,20 @@

Stateful multi-step execution driven by Roles and a Moderator.

## Workspace Layout (authoring)

User Nerve workspaces use a **flat** build: one root `package.json`, one root bundle script (typically `scripts/build.mjs`, wired from `scripts.build`), and **no** per-workflow `package.json` or `tsconfig.json`.

| Location | Purpose |
|----------|---------|
| `workflows/<name>/index.ts` | Default export: `WorkflowDefinition` (moderator + role map). |
| `workflows/<name>/roles/<role>.ts` | One module per role: schemas, prompts, `createRole` factories, or hand-written async role functions. |
| `dist/workflows/<name>/index.js` | Output of the root build; this is what the daemon loads. |

**Naming:** Workflow ids should be **verb-first** kebab-case phrases (e.g. `deploy-staging`, `scan-dependencies`), not opaque nouns alone.
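
The shape half of the naming rule can be checked mechanically; whether the first word is a verb cannot. The sketch below assumes a hypothetical helper, `isKebabPhrase`, which is not part of the Nerve CLI. It accepts lowercase kebab-case ids with at least two words, so a bare noun such as `notifications` is rejected.

```typescript
// Lowercase words joined by single hyphens; at least two words are required.
const KEBAB_PHRASE = /^[a-z][a-z0-9]*(-[a-z0-9]+)+$/;

// Hypothetical check: validates shape only, not that the first word is a verb.
function isKebabPhrase(id: string): boolean {
  return KEBAB_PHRASE.test(id);
}
```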

Senses follow the same flat pattern: `senses/<name>/src/*.ts`, `migrations/`, root build → `dist/senses/<name>/index.js`. See `.knowledge/sense.md`.

## Core Concepts

- **Workflow**: definition with concurrency strategy
@@ -0,0 +1,146 @@
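# Nerve Monorepo Dead-Code Analysis Report

**Analysis date**: 2026-04-30
**Scope**: TypeScript sources and `package.json` dependencies of `packages/core`, `daemon`, `store`, `cli`, `khala`, `workflow-utils`, `workflow-meta`, `adapter-cursor`, and `adapter-hermes`.
**Method**: Repo-wide `ripgrep` cross-checks of `import` / `export` paths; no dedicated dead-code tool (Knip or TS-based) was run.
**Note**: Packages such as `role-reviewer`, `role-committer`, and `skills` are excluded (outside the scope you listed), but `workflow-meta` depends on some of them; the affected conclusions are flagged.

---

## Method Limitations (read first)

| Limitation | Impact |
|------|------|
| **Dynamic `import(url)`** | `loadDaemonModule` in `cli` loads `@uncaged/nerve-daemon` at runtime, so static analysis cannot map `createKernel` and similar symbols to specific exports. |
| **Public API of published packages** | Many exports are used only by consumers outside the repo; "unreferenced in the repo" in this report does not mean "should be deleted". |
| **Build entries** | Rslib multi-entry files (such as `sense-worker`, `workflow-worker`, `daemon-bootstrap`) are treated as live entry points, not orphan files. |
| **Internal non-exported functions** | No per-function call graph was built; "dead functions" lists only high-confidence individual cases. |

---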
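## 1. Unused Exports (no `import` within the repo)

No file **inside the monorepo** references the following symbols through the root exports of `@uncaged/nerve-core`, `@uncaged/nerve-workflow-utils`, `@uncaged/nerve-daemon`, or the other packages (tests and the definitions themselves excluded).

### 1.1 `@uncaged/nerve-core`

| Path | Unused item | Confidence | Recommendation |
|------|----------|--------|------|
| `packages/core/src/index.ts` → `agent.js` | `KNOWN_AGENT_ADAPTER_IDS` | **High** | **Investigate**: wire it in if the CLI or validation should constrain adapter ids; otherwise remove it from the public API, or keep it as a documentation constant and say so. |
| `packages/core/src/index.ts` → `sense.js` | `labelSenseTrigger` (`senseTriggerLabels` is used in `cli`/`daemon`) | **High (within the repo)** | **Keep or narrow the export**: exposing only `senseTriggerLabels` is enough; `labelSenseTrigger` can become internal if it does not need to be exposed on its own. |
| `packages/core/src/index.ts` → `util.js` | `parseDurationStringToMs` (used only inside `config.ts`) | **High** | **Investigate**: if it does not need re-exporting from `index`, make it non-exported or drop the re-export to shrink the API surface. |
| `packages/core/src/index.ts` → `sense.js` | Type `SenseModule` | **Medium** | **Keep**: commonly used by user docs and external Sense modules; not being referenced by name in the repo is normal. |
| `packages/core/src/index.ts` → `config.js` | Separately exported type `NerveApiConfig` (appears only in `config.ts` and `index`) | **Low** | **Keep**: usually used together with `NerveConfig`; the separate export is mostly a convenience for public TS types. |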
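### 1.2 `@uncaged/nerve-workflow-utils`

No file under `packages/*/src/**/*.ts` imports the following items **from the package entry** (`workflow-meta`, `role-committer`, and others mainly use only `createRole`, `decorateRole`, `onFail`, `withDryRun`, and `LlmExtractorConfig`).

| Path | Unused item | Confidence | Recommendation |
|------|----------|--------|------|
| `packages/workflow-utils/src/index.ts` | `createLlmAdapter` | **High** | **Keep (public API)** or document it; no callers in the repo. |
| Same as above | `llmExtract`, `llmExtractWithRetry` | **High** | **Keep**: advanced usage / documentation examples; used internally and in tests. |
| Same as above | `mergeExtractConfig`, `ExtractConfigLayer` | **High** | **Investigate**: keep if the RFC for layered configuration is still progressing; otherwise consider removing the exports. |
| Same as above | `assertZodMetaSchemas`, `createLlmExtractFn`, `extractMetaOrThrow`, `zodMeta`, `ZodMetaSchema` | **High** | **Keep (public API)**; currently only the `workflow-utils` tests and the internal `create-role` chain use `extractMetaOrThrow`. |
| Same as above | `readNerveYaml`, `nerveAgentContext`, and related types | **High** | **Investigate**: remove the exports if no YAML context-injection scenario remains; otherwise keep them for the Agent toolchain. |
| Same as above | `spawnSafe`, `nerveCommandEnv`, and the `Spawn*` types (re-exported from core) | **High** | **Remove the re-exports or replace them with a documentation link**: everything in the repo imports spawn directly from `@uncaged/nerve-core`. |
| Same as above | `isDryRun` | **High** | **Remove or implement**: see §3, "Dead Functions". |
| Same as above | Type re-exports such as `LlmMessage`, `MetaExtractConfig`, `LlmChatError`, `LlmError`, `LlmProvider` | **Medium** | **Keep**: for fine-grained external type annotations; not imported individually in the repo. |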
### 1.3 `@uncaged/nerve-daemon`

| Path | Unused item | Confidence | Recommendation |
|------|----------|--------|------|
| `packages/daemon/src/index.ts` → `agent-adapters/echo.js` | `createEchoAgent` | **High** | **Investigate**: no test or runtime reference exists; if the design keeps a `type: "echo"` adapter, wire it into the workflow/agent loading path, or delete the export and the file. |
| `packages/daemon/src/index.ts` (and other exports) | Most IPC / runtime symbols | **Low** | **Keep**: dynamic loading and external integrations cannot be judged statically; only `createEchoAgent` is a confirmed dead end in the current static graph. |

### 1.4 `@uncaged/nerve-adapter-cursor` / `@uncaged/nerve-adapter-hermes`

| Path | Unused item | Confidence | Recommendation |
|------|----------|--------|------|
| `packages/adapter-cursor/src/index.ts` | `cursorAdapter`, `createCursorAdapter` (and `cursorAgent`, used only by `workflow-utils/src/role-cursor.ts`) | **Medium** | **Keep**: the default instance and factory serve downstream and workflow repos; it is expected that business packages in the repo do not import `cursorAdapter` directly. |
| `packages/adapter-hermes/src/index.ts` | `hermesAdapter`, `createHermesAdapter` (`hermesAgent` is used indirectly via `workflow-utils`) | **Medium** | Same as above. |

### 1.5 `@uncaged/nerve-cli`

| Path | Unused item | Confidence | Recommendation |
|------|----------|--------|------|
| `packages/cli/src/index.ts` | Programmatic exports such as `getNerveRoot`, `loadDaemonModule` | **High (within the repo)** | **Keep**: no workspace package references `@uncaged/nerve-cli`; these exports target developers building on the CLI. |

---
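## 2. Orphan Files (not entry points, not referenced by other sources)

Under the current Rslib / `cli` entry definitions, **no** `.ts` business file was found that has zero imports and is neither an entry nor a test:

- `role-cursor.ts`, `role-hermes.ts`, `role-llm.ts`, and `role-react.ts` under `workflow-utils` **do not appear in the package `index.ts`**, but `packages/workflow-utils/src/__tests__/role-factories.test.ts` references them directly, so they are **not orphan files**.
- `sense-worker.ts`, `workflow-worker.ts`, and similar files in `daemon` are **explicit build entries**.

**Confidence**: **Medium** for the full scan (no dependency-graph tool was run; dynamic paths may hide a very small number of script references).

---

## 3. Dead Functions (internal, non-exported, never called)

| Path | Description | Confidence | Recommendation |
|------|------|--------|------|
| `packages/workflow-utils/src/role-types.ts` | `isDryRun(_start: StartStep)`: exported from `index.ts`, but never called anywhere in the repo (the body always returns `false` and is marked deprecated) | **High** | **Delete**, or turn it into an internal constant; if backward compatibility is needed, keep it and document it as a reserved placeholder. |

Other internal functions were not systematically enumerated file by file.

---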
## 4. Unused npm Dependencies (package.json)

A spot check of source `import`s within the stated scope found **no production dependency that is clearly entirely unused**.

| Package | Dependencies | Conclusion |
|----|------|------|
| `cli` | `citty`, `yaml`, `picomatch` | All have matching source references. |
| `khala` | `hono`, `jsonata`, `ulidx`, `@uncaged/nerve-core` | All referenced. |
| `core` / `daemon` | `yaml`, `drizzle-orm` | All referenced. |
| `workflow-utils` / `workflow-meta` | `zod`, workspace packages | All referenced. |

**Confidence**: **Medium** (no automated scan of peer dependencies, optional paths, or dependencies used only by the CLI `bin`).

---

## 5. Stale Test Fixtures / Unreferenced Helper Files

No standalone fixture directory appears to be disconnected; the `e2e-harness` and the helpers inside `__tests__` under `cli` are all referenced by tests.

**Confidence**: **Low** (not every resource file under `__tests__` was enumerated).

---

## 6. Refactoring Leftovers / Documentation Drift

| Item | Location | Description | Confidence | Recommendation |
|------|------|------|--------|------|
| Renamed APIs still appear in the README | `packages/core/README.md` | Still describes `parseSenseWorkflowDirective`, `ParsedSenseWorkflowDirective`, and `SenseComputeRoute`; the source now uses `parseWorkflowTrigger` / `routeSenseComputeOutput` / `RoutedSenseOutput` | **High** | **Update the docs** (this analysis changes no code; it only records the finding). |
| Hermes options merge comment | `packages/workflow-utils/src/shared/hermes-agent.ts` | The comment says the code was absorbed from `hermes-options.ts`, but that file no longer exists | **Medium** | **Clean up the comment** to avoid misleading readers. |
| `KNOWN_AGENT_ADAPTER_IDS` includes `codex` | `packages/core/src/agent.ts` | The repo has no `codex` adapter package; this compounds the constant being unreferenced | **Medium** | **Align with the product**: implement the adapter or remove it from the list. |

No copies of strings such as `worker-fork-support` (mentioned by the user) were found; there are no matches repo-wide.

---
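## 7. Summary of Recommended Priorities

1. **High-priority investigation**: `createEchoAgent` and `KNOWN_AGENT_ADAPTER_IDS`. Either wire them into the runtime or cut them, so they do not give a false impression of being maintained.
2. **Narrow the API surface**: `parseDurationStringToMs` and `labelSenseTrigger` can be removed from the public exports of `core` if they are not meant for external use.
3. **`workflow-utils`**: evaluate deleting `isDryRun`, and whether re-exporting `spawnSafe` and related symbols from `workflow-utils` is still necessary.
4. **Docs**: fix the Sense→Workflow routing API names in `packages/core/README.md`.

---

## 8. Example Re-verification Commands

Tooling can be used locally later to raise confidence (not run as part of this analysis):

```bash
pnpm add -D -w knip
pnpm exec knip
```

Alternatively, add consumer-facing contract tests for the `@uncaged/nerve-*` exports in each package, to avoid deleting public API by mistake.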
@@ -10,7 +10,7 @@
  },
  "main": "dist/index.js",
  "types": "dist/index.d.ts",
  "files": ["dist", "skills"],
  "publishConfig": {
    "access": "public"
  },
@@ -0,0 +1,587 @@
<!-- nerve-cli-version: __NERVE_CLI_VERSION__ -->

## Cursor Agent Usage Tips

When talking to the Agent in Cursor, you can refer to code and configuration in the following ways:

- **`@Files` / `@file`**: reference a single file, for example `@nerve.yaml` or `@senses/cpu-usage/src/index.ts`, to reduce hallucination and aim edits at the correct path.
- **`@Folder` / `@Codebase`**: use when the workspace structure must be understood across directories; before changing anything, still open the relevant sense/workflow source files first to confirm.
- **`@Terminal`**: bring CLI output into context, which makes it easier to compare against results from `nerve daemon logs`, `nerve sense query`, and so on.
- **`@Docs`**: if the project or its dependencies have a documentation index, use it to align on APIs and conventions.
- **`nerve.yaml`**, `senses/`, and `workflows/` in the workspace root are Nerve's core entry points; `@` these paths first when discussing scheduling and configuration.
- This rule file is installed by `nerve agent inject cursor`; after updating the CLI, run the command again in the same directory to overwrite it with the new version.

---

# Nerve: AI Agent Observation Engine

Nerve is a lightweight observation-engine daemon. It continuously observes external state, reacts to changes through declarative rules, and orchestrates multi-step workflows.

## Core Architecture

```
External World → Sense → Signal → Workflow → Log
```

| Concept | Description |
|------|------|
| **Sense** | An observation function; `compute()` samples or derives data. A non-null return emits a Signal and can optionally trigger a Workflow. Each Sense has its own SQLite database. |
| **Signal** | The notification emitted when a Sense returns non-null. Pure fact, no intent. Dispatched over the in-memory Signal Bus and not persisted. |
| **Workflow** | Stateful multi-step execution. Consists of Roles (executors with side effects) and a Moderator (a pure router). Each instance is a Thread with a unique runId. |
| **Log** | Immutable audit log. Records executions, state transitions, and errors. Cannot trigger a Sense (prevents feedback loops). |
| **Engine** | The kernel; owns the Signal Bus, Process Manager, and Workflow Manager. Does not load user code directly. |
| **Daemon** | The engine runtime, running as a background process. |

**Key rules:**
- Causality is one-way: External → Sense → Signal → Workflow + Log
- Process isolation: one worker per Sense group (long-lived), one worker per Workflow type (on-demand)
- Two extension points: Sense (what to observe and when), Workflow (what to do)

## Workspace Structure

The workspace root generated by `nerve init` (default `~/.uncaged-nerve/`) contains **`AGENT.md`**. Read it before implementing a sense or workflow: it is aligned with this skill and specifies the directory layout, `createRole` usage, and the build command, which is **always run from the repository root**.

```
~/.uncaged-nerve/
├── AGENT.md              # Workspace conventions readable by humans and agents (generated by init)
├── nerve.yaml            # Core configuration
├── package.json          # Single root package (no separate packages under sense/workflow)
├── scripts/build.mjs     # Root esbuild script; invoked via the npm/pnpm build script
├── senses/
│   └── <name>/
│       ├── src/index.ts  # exports compute() + table
│       ├── src/schema.ts # Drizzle table definition
│       └── migrations/   # SQL migrations
├── workflows/
│   └── <name>/
│       ├── index.ts      # default export: WorkflowDefinition
│       ├── moderator.ts  # Optional: extracted moderator, imported by index
│       ├── build.ts      # Optional: shared constants / pure functions (keeps index lean; not an esbuild entry)
│       └── roles/
│           └── <role>.ts # One file per role (flat layout recommended over roles/<role>/index.ts)
└── data/                 # Runtime data (SQLite, blobs)
```

### Naming Conventions

- **Workflow**: verb-first kebab-case (for example `review-pull-request`, `deploy-staging`). Avoid standalone noun names (such as `notifications`).
- **Sense**: descriptive-noun kebab-case (for example `cpu-usage`).

---

## CLI Reference

Global options: `--host <host:port>` (connect to a remote daemon), `--api-token <secret>` (Bearer authentication)

### Initialization and Scaffolding

```bash
nerve init                          # Initialize a workspace
nerve init --from <git-url>         # Clone a workspace from a git repository
nerve init workspace                # Initialize only the workspace structure

nerve create sense <name>           # Scaffold a sense
nerve create sense <name> --force   # Overwrite an existing one
nerve create workflow <name>        # Scaffold a workflow
nerve create workflow <name> --force

nerve validate                      # Validate the nerve.yaml configuration
```

### Daemon Management

```bash
nerve daemon start                  # Start the background daemon
nerve daemon start --port 3000      # Set the HTTP API port
nerve daemon stop                   # Stop the daemon
nerve daemon restart                # Restart
nerve daemon status                 # Show status
nerve daemon logs                   # Show logs
nerve daemon logs --follow          # Follow logs live
nerve daemon logs --n 50            # Last 50 lines

nerve dev                           # Foreground dev mode (does not fork a daemon)
nerve dev --port 3000               # Set the port
```

### Sense Operations

```bash
nerve sense list                    # List all registered senses
nerve sense trigger <name>          # Manually trigger a sense compute
nerve sense schema <name>           # Show the sense database table schema
nerve sense schema <name> --json    # JSON output
nerve sense query <name> <sql>      # Run read-only SQL against the sense database
nerve sense query <name> "SELECT * FROM samples ORDER BY ts DESC LIMIT 10" --json
```

### Workflow Operations

```bash
nerve workflow list                 # List workflows defined in nerve.yaml
nerve workflow status               # Show the status of running workflows
nerve workflow trigger <name>       # Trigger a workflow
nerve workflow trigger <name> --prompt "Check the production environment"
nerve workflow trigger <name> --maxRounds 50
nerve workflow trigger <name> --dryRun      # Dry-run mode
```

### Thread (workflow execution records)

```bash
nerve thread list                           # List recent workflow executions
nerve thread list --all                     # Include completed/failed ones
nerve thread list --workflow <name>         # Filter by workflow
nerve thread list --limit 50                # At most 50 entries

nerve thread show <runId>                   # Show the role conversation turns
nerve thread show <runId> --budget 16000    # Raise the output budget (default 8000 characters)

nerve thread inspect <runId>                # Show details and events

nerve thread kill <runId>                   # Kill a running or queued thread
```

### Store (log archiving)

```bash
nerve store archive                 # Export old logs to a JSONL archive
nerve store archive --vacuum        # VACUUM the database after archiving
```

### Knowledge (knowledge base)

```bash
nerve knowledge sync                    # Rebuild the index from knowledge.yaml
nerve knowledge query "search text"     # Search the knowledge base
nerve knowledge query "text" --limit 5
nerve knowledge query "text" -g         # Search all registered repositories
```

### Remote (remote daemon)

```bash
nerve remote add <name> <host:port> --token <secret>
nerve remote list
nerve remote show <name>
nerve remote set-url <name> <host>
nerve remote set-token <name> <token>
nerve remote remove <name>
nerve remote default <name>         # Set as the default remote
```

### Agent (inject this skill into Hermes)

```bash
nerve agent status                          # CLI version and the skill version in each Hermes injection directory
nerve agent inject hermes                   # Install to ~/.hermes/skills/nerve
nerve agent inject hermes --profile <name>  # Write to ~/.hermes/profiles/<name>/skills/nerve
nerve agent update                          # Update all injected directories to the version matching the current CLI
nerve agent remove hermes                   # Remove the injection from the default profile
nerve agent remove hermes --profile <name>

nerve agent inject cursor                   # Generate .cursorrules in the cwd
nerve agent inject cursor --path /foo       # Generate in the given directory
nerve agent remove cursor [--path /foo]
```

---
## nerve.yaml Configuration Reference

```yaml
# Global engine settings
max_rounds: 100                  # Maximum moderator rounds (default 100)

# Sense configuration
senses:
  cpu-usage:
    group: system                # Required; senses in the same group share a worker
    interval: 10s                # Polling interval (duration: 5s, 10m, 1h)
    throttle: 5s                 # Minimum interval between computes
    timeout: 10s                 # compute timeout
    grace_period: null           # Graceful-shutdown wait
    retention: 10000             # Maximum rows in the _signals table (default 10000)

  system-health:
    group: derived
    on: [cpu-usage, disk-usage]  # Reactive: triggered when a listed sense emits a signal
    throttle: null
    timeout: null

# Workflow configuration
workflows:
  my-workflow:
    concurrency: 1               # Required; number of concurrent runs
    overflow: drop               # Required; behavior above the concurrency limit: drop | queue
    max_queue: 100               # Queue cap when overflow=queue (default 100)

# HTTP API
api:
  port: 3000                     # null = HTTP disabled
  host: "127.0.0.1"              # Listen address
  token: null                    # Required when not loopback

# LLM Extract (optional)
extract:
  provider: anthropic
  model: claude-sonnet-4-20250514
```
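
How `concurrency`, `overflow`, and `max_queue` interact can be sketched as a small admission function. This is an illustration only: the type and function names are hypothetical, and the behavior when the queue is already full (dropping the request) is an assumption, since the configuration comments above do not state it.

```typescript
// Hypothetical mirror of the per-workflow settings shown in the YAML above.
type WorkflowLimits = { concurrency: number; overflow: "drop" | "queue"; maxQueue: number };

type WorkflowLoad = { running: number; queued: number };

type Admission = "start" | "queue" | "drop";

function admit(limits: WorkflowLimits, load: WorkflowLoad): Admission {
  if (load.running < limits.concurrency) return "start"; // a slot is free
  if (limits.overflow === "drop") return "drop";
  // overflow === "queue": accept until the queue cap is reached.
  // Assumption: a full queue drops the request.
  return load.queued < limits.maxQueue ? "queue" : "drop";
}
```

For example, with `concurrency: 1` and `overflow: drop`, a second trigger that arrives while one run is active is dropped rather than delayed.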

---

## Sense Development Guide

### compute function signature

A Sense's `compute` takes **no arguments**. It does not receive a database handle: the daemon calls `SenseComputeFn` inside the worker, and the runtime writes the `signal` of each non-null result to the sense's Drizzle table and records it in `_signals`. The timeout is enforced by the runtime (the `timeout` setting in `nerve.yaml`), so business code does not need to read an `AbortSignal`.

```typescript
import type { ComputeResult, SenseComputeFn } from "@uncaged/nerve-core";

export const compute: SenseComputeFn<MySignalShape> = async () => {
  // ...
};
// Or equivalently:
export async function compute(): Promise<ComputeResult<MySignalShape>> {
  // ...
}
```

(The runtime definitions are `SenseComputeFn` / `SenseModule` in `@uncaged/nerve-core`; on the daemon side, `executeCompute` in `sense-runtime.ts` inserts `result.signal`.)

### Return value

```typescript
// Returning null = silent, no signal is emitted
// Returning non-null = a signal is emitted (and written to the business table), optionally triggering a workflow
type ComputeResult<T> =
  | null
  | { signal: T; workflow: WorkflowTrigger | null };

type WorkflowTrigger = {
  name: string;      // workflow name (the key in nerve.yaml)
  maxRounds: number; // maximum moderator rounds
  prompt: string;    // initial prompt
  dryRun: boolean;   // dry-run mode
};
```

If the return value is a plain object without a `signal` field, the kernel treats it as shorthand for `{ signal: payload, workflow: null }` (see `routeSenseComputeOutput` in core).
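
The shorthand rule can be sketched as a small normalizer. This is a sketch under assumptions, not the actual `routeSenseComputeOutput`: `normalizeComputeOutput` and `Routed` are hypothetical names, and the types are re-declared locally so the block stands alone.

```typescript
type WorkflowTrigger = { name: string; maxRounds: number; prompt: string; dryRun: boolean };

type Routed = { signal: unknown; workflow: WorkflowTrigger | null };

// Hypothetical helper illustrating the shorthand; the real routing code may differ.
function normalizeComputeOutput(output: object | null): Routed | null {
  if (output === null) return null; // silent: no signal
  if ("signal" in output) {
    // Full form: keep the signal, default a missing workflow to null.
    const full = output as { signal: unknown; workflow: WorkflowTrigger | null | undefined };
    return { signal: full.signal, workflow: full.workflow ?? null };
  }
  // Shorthand: the whole payload is the signal.
  return { signal: output, workflow: null };
}
```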

### Sense module exports

```typescript
// senses/<name>/src/index.ts
import type { ComputeResult } from "@uncaged/nerve-core";
import { table } from "./schema.js";

type Row = { ts: number; value: number };

export async function compute(): Promise<ComputeResult<Row>> {
  const row: Row = { ts: Date.now(), value: Math.random() }; // replace with real observation logic
  return { signal: row, workflow: null };
}

export { table };
```

### Schema definition

```typescript
// senses/<name>/src/schema.ts
import { sqliteTable, integer, real } from "drizzle-orm/sqlite-core";

export const table = sqliteTable("samples", {
  ts: integer("ts").notNull(),
  value: real("value").notNull(),
});
```

### Scheduling modes

1. **Interval polling**: `interval: 10s` runs every 10 seconds
2. **Reactive trigger**: `on: [cpu-usage]` runs when cpu-usage emits a signal
3. The two can be combined
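
Duration strings such as `5s`, `10m`, and `1h` map to milliseconds in an obvious way. The sketch below is a hypothetical parser that covers only those three unit suffixes; it is not the `parseDurationStringToMs` exported by core, whose accepted formats are not shown here.

```typescript
// Hypothetical parser for the duration shapes shown in the config (5s, 10m, 1h).
function durationToMs(text: string): number {
  const match = /^(\d+)([smh])$/.exec(text);
  if (match === null) throw new Error(`unsupported duration: ${text}`);
  const amount = Number(match[1]);
  switch (match[2]) {
    case "s":
      return amount * 1000;
    case "m":
      return amount * 60 * 1000;
    default: // "h" is the only unit left after the regex check
      return amount * 60 * 60 * 1000;
  }
}
```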

### Debugging

```bash
nerve dev                           # Run in the foreground and watch live output
nerve sense trigger <name>          # Trigger once manually
nerve sense query <name> "SELECT * FROM samples ORDER BY ts DESC LIMIT 5"
```

### Full example: CPU monitoring

```typescript
// senses/cpu-usage/src/schema.ts
import { sqliteTable, integer, real } from "drizzle-orm/sqlite-core";

export const table = sqliteTable("samples", {
  ts: integer("ts").notNull(),
  value: real("value").notNull(),
});

// senses/cpu-usage/src/index.ts
import os from "node:os";
import type { ComputeResult } from "@uncaged/nerve-core";
import { table } from "./schema.js";

type Row = { ts: number; value: number };

export async function compute(): Promise<ComputeResult<Row>> {
  const oneMin = os.loadavg()[0]; // 1-minute load average
  return { signal: { ts: Date.now(), value: oneMin }, workflow: null };
}

export { table };
```

nerve.yaml:
```yaml
senses:
  cpu-usage:
    group: system
    interval: 10s
    throttle: 5s
    timeout: 10s
    retention: 10000
```

---
## Workflow Development Guide

### Core types

```typescript
import type {
  WorkflowDefinition,
  RoleResult,
  ThreadContext,
  RoleMeta,
  Moderator,
} from "@uncaged/nerve-core";
import { END } from "@uncaged/nerve-core";

// Role<Meta>: (ctx: ThreadContext) => Promise<RoleResult<Meta>>
// RoleResult<Meta>: { content: string; meta: Meta }
// ThreadContext<M extends RoleMeta>: threadId, start (the __start__ frame), steps (one per role turn)
// Moderator<M>: (ctx) => name of the next role | END
// WorkflowDefinition<M extends RoleMeta>: name, roles, moderator
```

### The createRole four-tuple (recommended when integrating an LLM)

The workspace root must have **`@uncaged/nerve-workflow-utils`** installed (plus the agent adapter package you choose). If the default `package.json` from `nerve init` does not include it, run `pnpm add @uncaged/nerve-workflow-utils` (or the npm equivalent) under `~/.uncaged-nerve`.

Use **`createRole`** and pass four things in a fixed order:

1. **adapter**: an `AgentFn`, `(ctx, systemPrompt) => Promise<string>` (the raw model output text).
2. **prompt**: a `string`, or `async (ctx: ThreadContext) => string`.
3. **meta**: a `z.ZodType<M>`, the structured meta that the moderator routes on.
4. **extract**: `{ provider: LlmProvider; dryRun: boolean | null }`, which declares the (OpenAI-compatible) LLM used to extract meta from the reply and whether to dry-run.

```typescript
import { createLlmAdapter, createRole } from "@uncaged/nerve-workflow-utils";
import type { ThreadContext } from "@uncaged/nerve-core";
import { z } from "zod";

const provider = {
  baseUrl: "https://api.example.com/v1",
  apiKey: process.env.EXAMPLE_API_KEY!,
  model: "gpt-4o-mini",
};

const planMeta = z.object({ next: z.enum(["execute", "stop"]) });

export const planner = createRole(
  createLlmAdapter(provider),
  async (ctx: ThreadContext) => `Plan the task: ${ctx.start.content}`,
  planMeta,
  { provider, dryRun: null },
);
```

`createLlmAdapter` lives only in **`@uncaged/nerve-workflow-utils`**: it turns an `LlmProvider` into an `AgentFn`. In the single-turn exchange, **system** is the prompt string resolved by `createRole`, and **user** is the thread start, `ctx.start.content`.
### Basic workflow example (flat `roles/<role>.ts`)

```typescript
// workflows/example/roles/main.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";

export async function main(ctx: ThreadContext): Promise<RoleResult<{ round: number }>> {
  const prompt = ctx.start.content;
  return {
    content: `Done: ${prompt}`,
    meta: { round: ctx.steps.length },
  };
}
```

```typescript
// workflows/example/index.ts
import type { ThreadContext, WorkflowDefinition } from "@uncaged/nerve-core";
import { END } from "@uncaged/nerve-core";

import { main } from "./roles/main.js";

type Meta = Record<"main", { round: number }>;

const workflow: WorkflowDefinition<Meta> = {
  name: "example",
  roles: { main },
  moderator(ctx: ThreadContext<Meta>) {
    return ctx.steps.length === 0 ? "main" : END;
  },
};

export default workflow;
```

Optional: move `moderator` into `moderator.ts` and `import { route } from "./moderator.js"`, so that `index.ts` only assembles the `WorkflowDefinition`.

### Multi-role workflow example

```typescript
// workflows/plan-execute-review/roles/planner.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";

export async function planner(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
  void ctx;
  return { content: "Plan: ...", meta: { status: "planned" } };
}
```

```typescript
// workflows/plan-execute-review/roles/executor.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";

export async function executor(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
  void ctx;
  return { content: "Execution: ...", meta: { status: "executed" } };
}
```

```typescript
// workflows/plan-execute-review/roles/reviewer.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";

export async function reviewer(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
  void ctx;
  return { content: "Approved", meta: { status: "approved" } };
}
```

```typescript
// workflows/plan-execute-review/index.ts
import type { WorkflowDefinition, ThreadContext } from "@uncaged/nerve-core";
import { END } from "@uncaged/nerve-core";

import { executor } from "./roles/executor.js";
import { planner } from "./roles/planner.js";
import { reviewer } from "./roles/reviewer.js";

type Roles = Record<"planner" | "executor" | "reviewer", { status: string }>;

const workflow: WorkflowDefinition<Roles> = {
  name: "plan-execute-review",
  roles: { planner, executor, reviewer },
  moderator(ctx: ThreadContext<Roles>) {
    if (ctx.steps.length === 0) return "planner";
    const last = ctx.steps[ctx.steps.length - 1];
    if (last.role === "planner") return "executor";
    if (last.role === "executor") return "reviewer";
    return END;
  },
};

export default workflow;
```
### Agent adapters

A workflow role can integrate an AI agent. Known adapter **IDs**: `echo`, `cursor`, `hermes`, `codex`.

```typescript
type AgentFn = (ctx: ThreadContext, systemPrompt: string) => Promise<string>;
```

When no ready-made agent package fits, use **`createLlmAdapter` (`@uncaged/nerve-workflow-utils`)** to build an `AgentFn` from an OpenAI-compatible `LlmProvider`, then pass it to the **`createRole`** four-tuple.

### Workflow run states

`queued` → `started` → `completed` | `failed` | `crashed` | `killed` | `interrupted` | `dropped`
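
The state names above can be captured in a union type with a guard for end states. This is a sketch with hypothetical names (`RunState`, `isTerminal`); it assumes that every state to the right of `started` is terminal, and it does not model which transitions are allowed.

```typescript
// Assumption: all six states listed after `started` are terminal.
const TERMINAL_STATES = [
  "completed",
  "failed",
  "crashed",
  "killed",
  "interrupted",
  "dropped",
] as const;

type TerminalState = (typeof TERMINAL_STATES)[number];

type RunState = "queued" | "started" | TerminalState;

// Type guard: narrows a RunState to one of the terminal states.
function isTerminal(state: RunState): state is TerminalState {
  return (TERMINAL_STATES as readonly string[]).indexOf(state) !== -1;
}
```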

---

## Everyday Operation Patterns

### View overall system status

```bash
nerve daemon status                 # Whether the daemon is running
nerve sense list                    # All senses and their scheduling config
nerve workflow status               # Running workflows
nerve thread list                   # Recent workflow execution records
```

### Inspect a sense's historical data

```bash
nerve sense query cpu-usage "SELECT * FROM samples ORDER BY ts DESC LIMIT 10" --json
nerve sense schema cpu-usage        # Show the table schema
```

### Manually trigger a workflow

```bash
nerve workflow trigger my-workflow --prompt "Manual check"
nerve thread list --workflow my-workflow    # Check execution status
nerve thread show <runId>                   # View conversation details
```

### Troubleshoot sense errors

```bash
nerve daemon logs --follow          # Follow live logs
nerve sense trigger <name>          # Trigger manually to see the error
nerve dev                           # Foreground mode, more detailed output
```

### Develop a new sense

```bash
nerve create sense my-sensor        # Scaffold
# Edit senses/my-sensor/src/index.ts and schema.ts
nerve validate                      # Validate the config
nerve dev                           # Test in the foreground
nerve sense trigger my-sensor       # Verify with a single trigger
nerve sense query my-sensor "SELECT * FROM ..."   # Check the data
```

### Develop a new workflow

```bash
nerve create workflow my-flow       # Scaffold (the current CLI may still generate roles/<name>/ subdirectories)
# Recommended, matching AGENT.md: workflows/my-flow/index.ts + roles/<role>.ts (flat); the moderator can be split into moderator.ts
nerve validate                      # Validate the config
cd ~/.uncaged-nerve && npm run build   # Build from the workspace root (equivalent: pnpm run build); do not run build inside a single workflow subdirectory
nerve workflow trigger my-flow --prompt "test" --dryRun   # Dry run
nerve thread show <runId>           # View the execution trace
```

---
## Pitfalls
|
||||
|
||||
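- **Sense return value**: returning `null` means silence (no signal is emitted); a signal is emitted only when you return `{ signal, workflow }`. Never return `undefined`.
- **Sense persistence**: when `compute()` returns non-null, the daemon automatically runs `db.insert(table).values(signal)` and writes to `_signals`; do not insert from your own code.
- **No optional properties**: the nerve code style forbids `?:`; use `T | null` instead.
- **Functional style**: use `function` + `type`, not `class` + `interface`.
- **Workflows use a default export**: in a workspace, normally only `workflows/<name>/index.ts` uses a default export (the daemon's loading convention).
- **`_signals` table**: every sense automatically gets a `_signals` table that records its signal history, capped by the `retention` setting.
- **concurrency + overflow**: every workflow must configure a concurrency policy, otherwise validation fails.
- **The moderator is a synchronous function**: do not make it `async`; it is pure routing logic and must not have side effects.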
|
||||
@@ -0,0 +1,580 @@
|
||||
---
|
||||
name: nerve
|
||||
version: 0.5.0
|
||||
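description: >
  Nerve: an observation engine for AI agents. Covers nerve's core concepts, CLI operations, and sense/workflow development.
  After loading this skill you can check system status, monitor senses, trigger workflows, and develop new senses and workflows.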
|
||||
metadata:
|
||||
hermes:
|
||||
tags: [nerve, sense, workflow, monitoring, agent-kernel]
|
||||
homepage: https://git.shazhou.work/uncaged/nerve
|
||||
---
|
||||
|
||||
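# Nerve: AI agent observation engine

Nerve is a lightweight observation-engine daemon. It continuously observes external state, reacts to changes through declarative rules, and orchestrates multi-step workflows.

## Core architecture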
|
||||
|
||||
```
|
||||
External World → Sense → Signal → Workflow → Log
|
||||
```
|
||||
|
||||
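| Concept | Description |
|---------|-------------|
| **Sense** | An observation function: `compute()` samples or derives data. A non-null return emits a Signal and can optionally trigger a Workflow. Each Sense has its own SQLite database. |
| **Signal** | The notification emitted when a Sense returns non-null. Pure fact, no intent. Delivered over the in-memory Signal Bus; not persisted. |
| **Workflow** | A stateful multi-step execution made of Roles (executors with side effects) and a Moderator (a pure router). Each instance is a Thread with a unique runId. |
| **Log** | An immutable audit log that records executions, state transitions, and errors. It cannot trigger a Sense (this prevents feedback loops). |
| **Engine** | The kernel; owns the Signal Bus, Process Manager, and Workflow Manager. It does not load user code directly. |
| **Daemon** | The engine runtime, running as a background process. |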
|
||||
|
||||
**Key rules:**

- Causality is one-way: External → Sense → Signal → Workflow + Log
- Process isolation: one long-lived worker per Sense group, one on-demand worker per Workflow type
- Two extension points: Sense (what to observe, and when) and Workflow (what to do)
|
||||
|
||||
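## Workspace structure

The workspace root generated by `nerve init` (default `~/.uncaged-nerve/`) contains **`AGENT.md`**. Read it before implementing a sense or workflow: it is kept in line with this skill and specifies the directory layout, how to use `createRole`, and the build command, which is **always run from the repository root**.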
|
||||
|
||||
```
~/.uncaged-nerve/
├── AGENT.md                 # workspace conventions for humans and agents (generated by init)
├── nerve.yaml               # core configuration
├── package.json             # single root package (senses/workflows no longer have their own package)
├── scripts/build.mjs        # root esbuild script; invoked via the npm/pnpm build script
├── senses/
│   └── <name>/
│       ├── src/index.ts     # exports compute() + table
│       ├── src/schema.ts    # Drizzle table definition
│       └── migrations/      # SQL migrations
├── workflows/
│   └── <name>/
│       ├── index.ts         # default export: WorkflowDefinition
│       ├── moderator.ts     # optional: moderator extracted here, imported by index
│       ├── build.ts         # optional: shared constants / pure functions (keeps index lean; not an esbuild entry)
│       └── roles/
│           └── <role>.ts    # one file per role (flat layout recommended over roles/<role>/index.ts)
└── data/                    # runtime data (SQLite, blobs)
```
|
||||
|
||||
### Naming conventions

- **Workflow**: kebab-case starting with a verb (for example `review-pull-request`, `deploy-staging`). Avoid bare-noun names such as `notifications`.
- **Sense**: a descriptive noun in kebab-case (for example `cpu-usage`).
|
||||
|
||||
---
|
||||
|
||||
## Full CLI reference

Global options: `--host <host:port>` (connect to a remote daemon), `--api-token <secret>` (Bearer authentication)
|
||||
|
||||
### Initialization and scaffolding

```bash
nerve init                          # initialize a workspace
nerve init --from <git-url>         # clone a workspace from a git repository
nerve init workspace                # initialize only the workspace structure

nerve create sense <name>           # scaffold a sense
nerve create sense <name> --force   # overwrite an existing one
nerve create workflow <name>        # scaffold a workflow
nerve create workflow <name> --force

nerve validate                      # validate the nerve.yaml configuration
```
|
||||
|
||||
### Daemon management

```bash
nerve daemon start               # start the background daemon
nerve daemon start --port 3000   # set the HTTP API port
nerve daemon stop                # stop the daemon
nerve daemon restart             # restart
nerve daemon status              # show status
nerve daemon logs                # show logs
nerve daemon logs --follow       # follow logs live
nerve daemon logs --n 50         # last 50 lines

nerve dev                        # foreground dev mode (does not fork a daemon)
nerve dev --port 3000            # set the port
```
|
||||
|
||||
### Sense operations

```bash
nerve sense list                   # list all registered senses
nerve sense trigger <name>         # trigger a sense computation manually
nerve sense schema <name>          # show the sense database schema
nerve sense schema <name> --json   # JSON output
nerve sense query <name> <sql>     # run read-only SQL against the sense database
nerve sense query <name> "SELECT * FROM samples ORDER BY ts DESC LIMIT 10" --json
```
|
||||
|
||||
### Workflow operations

```bash
nerve workflow list               # list workflows defined in nerve.yaml
nerve workflow status             # show the state of running workflows
nerve workflow trigger <name>     # trigger a workflow
nerve workflow trigger <name> --prompt "check the production environment"
nerve workflow trigger <name> --maxRounds 50
nerve workflow trigger <name> --dryRun   # dry-run mode
```
|
||||
|
||||
### Thread (workflow run records)

```bash
nerve thread list                     # list recent workflow runs
nerve thread list --all               # include completed/failed runs
nerve thread list --workflow <name>   # filter by workflow
nerve thread list --limit 50          # at most 50 entries

nerve thread show <runId>             # show the role conversation turns
nerve thread show <runId> --budget 16000   # raise the output budget (default 8000 characters)

nerve thread inspect <runId>          # show details and events

nerve thread kill <runId>             # terminate a running or queued thread
```
|
||||
|
||||
### Store (log archiving)

```bash
nerve store archive            # export old logs to a JSONL archive
nerve store archive --vacuum   # VACUUM the database after archiving
```
|
||||
|
||||
### Knowledge (knowledge base)

```bash
nerve knowledge sync                  # rebuild the index from knowledge.yaml
nerve knowledge query "search text"   # search the knowledge base
nerve knowledge query "text" --limit 5
nerve knowledge query "text" -g       # search all registered repositories
```
|
||||
|
||||
### Remote (remote daemon)

```bash
nerve remote add <name> <host:port> --token <secret>
nerve remote list
nerve remote show <name>
nerve remote set-url <name> <host>
nerve remote set-token <name> <token>
nerve remote remove <name>
nerve remote default <name>   # set as the default remote
```
|
||||
|
||||
### Agent (inject this skill into Hermes)

```bash
nerve agent status                           # CLI version and the skill version in each Hermes injection directory
nerve agent inject hermes                    # install to ~/.hermes/skills/nerve
nerve agent inject hermes --profile <name>   # write to ~/.hermes/profiles/<name>/skills/nerve
nerve agent update                           # update every injected directory to the version matching the current CLI
nerve agent remove hermes                    # remove the injection from the default profile
nerve agent remove hermes --profile <name>
```
|
||||
|
||||
---
|
||||
|
||||
## nerve.yaml configuration reference

```yaml
# Engine-wide settings
max_rounds: 100              # max moderator rounds (default 100)

# Sense configuration
senses:
  cpu-usage:
    group: system            # required; senses in the same group share a worker
    interval: 10s            # polling interval (duration: 5s, 10m, 1h)
    throttle: 5s             # minimum interval between computes
    timeout: 10s             # compute timeout
    grace_period: null       # graceful-shutdown wait
    retention: 10000         # max rows in the _signals table (default 10000)

  system-health:
    group: derived
    on: [cpu-usage, disk-usage]   # reactive: runs when a listed sense emits a signal
    throttle: null
    timeout: null

# Workflow configuration
workflows:
  my-workflow:
    concurrency: 1           # required; number of concurrent runs
    overflow: drop           # required; what to do above the limit: drop | queue
    max_queue: 100           # queue cap when overflow=queue (default 100)

# HTTP API
api:
  port: 3000                 # null = HTTP disabled
  host: "127.0.0.1"          # listen address
  token: null                # required when not on loopback

# LLM Extract (optional)
extract:
  provider: anthropic
  model: claude-sonnet-4-20250514
```
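
To make the `concurrency` / `overflow` / `max_queue` settings concrete, here is a minimal sketch of an admission decision. It is not the daemon's actual implementation: the `admit` function, its argument shape, and the assumption that a full queue drops the new run are all hypothetical and serve only as illustration.

```typescript
// Hypothetical sketch: how concurrency / overflow / max_queue could interact.
type Overflow = "drop" | "queue";
type WorkflowLimits = { concurrency: number; overflow: Overflow; maxQueue: number };
type Admission = "start" | "queue" | "drop";

function admit(limits: WorkflowLimits, running: number, queued: number): Admission {
  if (running < limits.concurrency) return "start"; // a slot is free
  if (limits.overflow === "drop") return "drop"; // over capacity and queueing is off
  // overflow === "queue". Assumption: a full queue drops the new run.
  return queued < limits.maxQueue ? "queue" : "drop";
}

const limits: WorkflowLimits = { concurrency: 1, overflow: "queue", maxQueue: 2 };
console.log(admit(limits, 0, 0)); // start
console.log(admit(limits, 1, 0)); // queue
console.log(admit(limits, 1, 2)); // drop
```

With `overflow: drop`, the second and third calls would both return `drop`, which is the simpler policy when stale triggers are not worth replaying.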
|
||||
|
||||
---
|
||||
|
||||
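## Sense development guide

### compute function signature

A sense's `compute` takes **no arguments**. It does not receive a database handle: the daemon calls the `SenseComputeFn` inside a worker, and the runtime writes the `signal` of any non-null result to that sense's Drizzle table and records it in `_signals`. Timeouts are enforced by the runtime (the `timeout` setting in `nerve.yaml`), so your code does not need to read an `AbortSignal`.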
|
||||
|
||||
```typescript
import type { ComputeResult, SenseComputeFn } from "@uncaged/nerve-core";

export const compute: SenseComputeFn<MySignalShape> = async () => {
  // ...
};
// or, equivalently (pick one form per module):
export async function compute(): Promise<ComputeResult<MySignalShape>> {
  // ...
}
```
|
||||
|
||||
(For the runtime definitions, see `SenseComputeFn` / `SenseModule` in `@uncaged/nerve-core`; on the daemon side, `executeCompute` in `sense-runtime.ts` inserts `result.signal`.)
|
||||
|
||||
### Return value

```typescript
// Returning null = silent, no signal is emitted
// Returning non-null = a signal is emitted (and written to the sense's table), optionally triggering a workflow
type ComputeResult<T> =
  | null
  | { signal: T; workflow: WorkflowTrigger | null };

type WorkflowTrigger = {
  name: string;      // workflow name (the key in nerve.yaml)
  maxRounds: number; // max moderator rounds
  prompt: string;    // initial prompt
  dryRun: boolean;   // dry-run mode
};
```
|
||||
|
||||
If the return value is a plain object without a `signal` field, the kernel treats it as shorthand for `{ signal: payload, workflow: null }` (see `routeSenseComputeOutput` in core).
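
The shorthand rule can be sketched as a small normalization step. This is only an illustration of the behavior described here, not the source of `routeSenseComputeOutput`: the function name `normalizeComputeOutput` and the local type definitions are hypothetical stand-ins for what `@uncaged/nerve-core` provides.

```typescript
// Hypothetical stand-ins for the core types; the real ones live in @uncaged/nerve-core.
type WorkflowTrigger = { name: string; maxRounds: number; prompt: string; dryRun: boolean };
type Normalized = { signal: unknown; workflow: WorkflowTrigger | null } | null;

function normalizeComputeOutput(output: unknown): Normalized {
  if (output === null) return null; // silent: no signal is emitted
  if (typeof output === "object" && output !== null && "signal" in output) {
    // Full form: pass through, defaulting a missing workflow to null.
    const full = output as { signal: unknown; workflow: WorkflowTrigger | null };
    return { signal: full.signal, workflow: full.workflow ?? null };
  }
  // Shorthand: the bare payload becomes the signal, with no workflow trigger.
  return { signal: output, workflow: null };
}

console.log(normalizeComputeOutput({ ts: 1, value: 0.5 }));
// { signal: { ts: 1, value: 0.5 }, workflow: null }
```

Returning the full `{ signal, workflow }` form explicitly is still the clearer choice in sense code, since it makes the "no workflow" decision visible at the call site.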
|
||||
|
||||
### Sense module exports

```typescript
// senses/<name>/src/index.ts
import type { ComputeResult } from "@uncaged/nerve-core";
import { table } from "./schema.js";

type Row = { ts: number; value: number };

export async function compute(): Promise<ComputeResult<Row>> {
  const row: Row = { ts: Date.now(), value: Math.random() }; // replace with real observation logic
  return { signal: row, workflow: null };
}

export { table };
```
|
||||
|
||||
### Schema definition

```typescript
// senses/<name>/src/schema.ts
import { sqliteTable, integer, real } from "drizzle-orm/sqlite-core";

export const table = sqliteTable("samples", {
  ts: integer("ts").notNull(),
  value: real("value").notNull(),
});
```
|
||||
|
||||
### Scheduling modes

1. **Interval polling**: `interval: 10s` runs the sense every 10 seconds
2. **Reactive triggering**: `on: [cpu-usage]` runs the sense whenever cpu-usage emits a signal
3. The two can be combined
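
As a rough illustration of how duration strings such as `10s` and a `throttle` could gate both scheduling modes, consider the sketch below. Both helpers (`parseDuration`, `canCompute`) are hypothetical, and the assumption that only the `s`, `m`, and `h` units exist is taken from the examples in the config reference; the real scheduler may behave differently.

```typescript
// Hypothetical helper: converts "5s", "10m", "1h" to milliseconds.
const UNIT_MS: Record<string, number> = { s: 1_000, m: 60_000, h: 3_600_000 };

function parseDuration(text: string): number {
  const match = /^(\d+)(s|m|h)$/.exec(text);
  if (match === null) throw new Error(`Unsupported duration: ${text}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Assumption: throttle is the minimum gap between two computes,
// whether the run was requested by the interval timer or by an `on` signal.
function canCompute(lastComputeAt: number, now: number, throttle: string | null): boolean {
  if (throttle === null) return true; // no throttle configured
  return now - lastComputeAt >= parseDuration(throttle);
}

console.log(parseDuration("10s")); // 10000
console.log(canCompute(0, 3_000, "5s")); // false
console.log(canCompute(0, 5_000, "5s")); // true
```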
|
||||
|
||||
### Debugging

```bash
nerve dev                    # run in the foreground and watch live output
nerve sense trigger <name>   # trigger once manually
nerve sense query <name> "SELECT * FROM samples ORDER BY ts DESC LIMIT 5"
```
|
||||
|
||||
### Full example: CPU monitoring

```typescript
// senses/cpu-usage/src/schema.ts
import { sqliteTable, integer, real } from "drizzle-orm/sqlite-core";

export const table = sqliteTable("samples", {
  ts: integer("ts").notNull(),
  value: real("value").notNull(),
});

// senses/cpu-usage/src/index.ts
import os from "node:os";
import type { ComputeResult } from "@uncaged/nerve-core";
import { table } from "./schema.js";

type Row = { ts: number; value: number };

export async function compute(): Promise<ComputeResult<Row>> {
  const oneMin = os.loadavg()[0]; // 1-minute load average
  return { signal: { ts: Date.now(), value: oneMin }, workflow: null };
}

export { table };
```
|
||||
|
||||
nerve.yaml:
|
||||
```yaml
senses:
  cpu-usage:
    group: system
    interval: 10s
    throttle: 5s
    timeout: 10s
    retention: 10000
```
|
||||
|
||||
---
|
||||
|
||||
## Workflow development guide

### Core types

```typescript
import type {
  WorkflowDefinition,
  RoleResult,
  ThreadContext,
  RoleMeta,
  Moderator,
} from "@uncaged/nerve-core";
import { END } from "@uncaged/nerve-core";

// Role<Meta>: (ctx: ThreadContext) => Promise<RoleResult<Meta>>
// RoleResult<Meta>: { content: string; meta: Meta }
// ThreadContext<M extends RoleMeta>: threadId, start (the __start__ frame), steps (one entry per role turn)
// Moderator<M>: (ctx) => name of the next role | END
// WorkflowDefinition<M extends RoleMeta>: name, roles, moderator
```
|
||||
|
||||
### The createRole four-argument form (recommended when wiring in an LLM)

The workspace root needs **`@uncaged/nerve-workflow-utils`** installed (plus whichever agent adapter package you choose). If the default `package.json` from `nerve init` does not include that dependency, run `pnpm add @uncaged/nerve-workflow-utils` (or the npm equivalent) in `~/.uncaged-nerve`.

Call **`createRole`** with four arguments, in this fixed order:

1. **adapter**: an `AgentFn`, `(ctx, systemPrompt) => Promise<string>` (the raw model output text).
2. **prompt**: a `string`, or `async (ctx: ThreadContext) => string`.
3. **meta**: a `z.ZodType<M>`, the structured meta the moderator routes on.
4. **extract**: `{ provider: LlmProvider; dryRun: boolean | null }`, which declares the (OpenAI-compatible) LLM used to extract meta from the reply and whether to dry-run.
|
||||
|
||||
```typescript
import { createLlmAdapter, createRole } from "@uncaged/nerve-workflow-utils";
import type { ThreadContext } from "@uncaged/nerve-core";
import { z } from "zod";

const provider = {
  baseUrl: "https://api.example.com/v1",
  apiKey: process.env.EXAMPLE_API_KEY!,
  model: "gpt-4o-mini",
};

const planMeta = z.object({ next: z.enum(["execute", "stop"]) });

export const planner = createRole(
  createLlmAdapter(provider),
  async (ctx: ThreadContext) => `Plan the task: ${ctx.start.content}`,
  planMeta,
  { provider, dryRun: null },
);
```
|
||||
|
||||
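`createLlmAdapter` lives only in **`@uncaged/nerve-workflow-utils`**: it turns an `LlmProvider` into an `AgentFn`. In the single-turn exchange, the **system** message is the prompt string resolved by `createRole`, and the **user** message is the thread's start content, `ctx.start.content`.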
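
To show what that single turn looks like on the wire, here is a sketch of the request an OpenAI-compatible endpoint would expect. The `buildChatRequest` helper and the `ChatRequest` shape are hypothetical and not part of `@uncaged/nerve-workflow-utils`; only the `/chat/completions` path and the `model` / `messages` fields come from the OpenAI chat-completions format, and the real adapter may add further options.

```typescript
// Hypothetical sketch of the single-turn request; not the adapter's real code.
type LlmProvider = { baseUrl: string; apiKey: string; model: string };
type ChatMessage = { role: "system" | "user"; content: string };
type ChatRequest = {
  url: string;
  headers: Record<string, string>;
  body: { model: string; messages: ChatMessage[] };
};

function buildChatRequest(provider: LlmProvider, systemPrompt: string, startContent: string): ChatRequest {
  return {
    url: `${provider.baseUrl}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${provider.apiKey}`,
    },
    body: {
      model: provider.model,
      messages: [
        { role: "system", content: systemPrompt }, // prompt resolved by createRole
        { role: "user", content: startContent }, // ctx.start.content
      ],
    },
  };
}

const request = buildChatRequest(
  { baseUrl: "https://api.example.com/v1", apiKey: "example-key", model: "gpt-4o-mini" },
  "You are a planner.",
  "Check the production environment",
);
console.log(request.url); // https://api.example.com/v1/chat/completions
```

Because only the start frame is sent as the user message, earlier role outputs are not visible to the model unless the prompt function folds them into the system prompt.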
|
||||
|
||||
### Basic workflow example (flat `roles/<role>.ts`)

```typescript
// workflows/example/roles/main.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";

export async function main(ctx: ThreadContext): Promise<RoleResult<{ round: number }>> {
  const prompt = ctx.start.content;
  return {
    content: `Processed: ${prompt}`,
    meta: { round: ctx.steps.length },
  };
}
```
|
||||
|
||||
```typescript
|
||||
// workflows/example/index.ts
|
||||
import type { ThreadContext, WorkflowDefinition } from "@uncaged/nerve-core";
|
||||
import { END } from "@uncaged/nerve-core";
|
||||
|
||||
import { main } from "./roles/main.js";
|
||||
|
||||
type Meta = Record<"main", { round: number }>;
|
||||
|
||||
const workflow: WorkflowDefinition<Meta> = {
|
||||
name: "example",
|
||||
roles: { main },
|
||||
moderator(ctx: ThreadContext<Meta>) {
|
||||
return ctx.steps.length === 0 ? "main" : END;
|
||||
},
|
||||
};
|
||||
|
||||
export default workflow;
|
||||
```
|
||||
|
||||
Optionally, move the moderator into `moderator.ts` and `import { route } from "./moderator.js"`, so that `index.ts` only assembles the `WorkflowDefinition`.
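
A minimal sketch of such a `moderator.ts` follows. To keep it self-contained, `END`, `Step`, and `Ctx` are local stand-ins (assumptions); in a real workspace you would import `END` and `ThreadContext` from `@uncaged/nerve-core` and export `route`.

```typescript
// moderator.ts (sketch). Local stand-ins replace the @uncaged/nerve-core imports.
const END = Symbol("END"); // stand-in; the real END comes from @uncaged/nerve-core
type Step = { role: string };
type Ctx = { steps: Step[] };

// Synchronous and pure: the next role depends only on the steps recorded so far.
function route(ctx: Ctx): "main" | typeof END {
  return ctx.steps.length === 0 ? "main" : END;
}

console.log(route({ steps: [] })); // main
console.log(route({ steps: [{ role: "main" }] }) === END); // true
```

Keeping the router in its own file also makes it easy to unit-test without loading any role code.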
|
||||
|
||||
### Multi-role workflow example

```typescript
// workflows/plan-execute-review/roles/planner.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";

export async function planner(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
  void ctx;
  return { content: "Plan: ...", meta: { status: "planned" } };
}
```

```typescript
// workflows/plan-execute-review/roles/executor.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";

export async function executor(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
  void ctx;
  return { content: "Execution: ...", meta: { status: "executed" } };
}
```

```typescript
// workflows/plan-execute-review/roles/reviewer.ts
import type { RoleResult, ThreadContext } from "@uncaged/nerve-core";

export async function reviewer(ctx: ThreadContext): Promise<RoleResult<{ status: string }>> {
  void ctx;
  return { content: "Review passed", meta: { status: "approved" } };
}
```
|
||||
|
||||
```typescript
|
||||
// workflows/plan-execute-review/index.ts
|
||||
import type { WorkflowDefinition, ThreadContext } from "@uncaged/nerve-core";
|
||||
import { END } from "@uncaged/nerve-core";
|
||||
|
||||
import { executor } from "./roles/executor.js";
|
||||
import { planner } from "./roles/planner.js";
|
||||
import { reviewer } from "./roles/reviewer.js";
|
||||
|
||||
type Roles = Record<"planner" | "executor" | "reviewer", { status: string }>;
|
||||
|
||||
const workflow: WorkflowDefinition<Roles> = {
|
||||
name: "plan-execute-review",
|
||||
roles: { planner, executor, reviewer },
|
||||
moderator(ctx: ThreadContext<Roles>) {
|
||||
if (ctx.steps.length === 0) return "planner";
|
||||
const last = ctx.steps[ctx.steps.length - 1];
|
||||
if (last.role === "planner") return "executor";
|
||||
if (last.role === "executor") return "reviewer";
|
||||
return END;
|
||||
},
|
||||
};
|
||||
|
||||
export default workflow;
|
||||
```
|
||||
|
||||
### Agent adapters

A workflow role can integrate an AI agent. Known adapter **IDs**: `echo`, `cursor`, `hermes`, `codex`.
|
||||
|
||||
```typescript
|
||||
type AgentFn = (ctx: ThreadContext, systemPrompt: string) => Promise<string>;
|
||||
```
|
||||
|
||||
When no ready-made agent package fits, use **`createLlmAdapter`** (from `@uncaged/nerve-workflow-utils`) to build an `AgentFn` from an OpenAI-compatible `LlmProvider`, then hand it to **`createRole`** as the first of its four arguments.

### Workflow run states
|
||||
|
||||
`queued` → `started` → `completed` | `failed` | `crashed` | `killed` | `interrupted` | `dropped`
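
For code that consumes these states (for example a script polling `nerve thread list`), the list can be captured as a union type. The type name `ThreadState` and the `isTerminal` helper are hypothetical; treating every state other than `queued` and `started` as terminal is an assumption read off the arrow notation above.

```typescript
// States as listed above; the grouping into "terminal" is an illustrative assumption.
type ThreadState =
  | "queued"
  | "started"
  | "completed"
  | "failed"
  | "crashed"
  | "killed"
  | "interrupted"
  | "dropped";

const TERMINAL: ReadonlySet<ThreadState> = new Set<ThreadState>([
  "completed",
  "failed",
  "crashed",
  "killed",
  "interrupted",
  "dropped",
]);

function isTerminal(state: ThreadState): boolean {
  return TERMINAL.has(state);
}

console.log(isTerminal("started")); // false
console.log(isTerminal("dropped")); // true
```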
|
||||
|
||||
---
|
||||
|
||||
## Everyday operation patterns

### Check overall system status

```bash
nerve daemon status     # is the daemon running?
nerve sense list        # all senses and their scheduling config
nerve workflow status   # workflows currently running
nerve thread list       # recent workflow runs
```
|
||||
|
||||
### Inspect a sense's historical data

```bash
nerve sense query cpu-usage "SELECT * FROM samples ORDER BY ts DESC LIMIT 10" --json
nerve sense schema cpu-usage   # show the table schema
```
|
||||
|
||||
### Trigger a workflow manually

```bash
nerve workflow trigger my-workflow --prompt "manual check"
nerve thread list --workflow my-workflow   # check the run status
nerve thread show <runId>                  # view the conversation details
```
|
||||
|
||||
### Troubleshoot sense errors

```bash
nerve daemon logs --follow   # follow the live logs
nerve sense trigger <name>   # trigger manually to see the error
nerve dev                    # foreground mode, more detailed output
```
|
||||
|
||||
### Develop a new sense

```bash
nerve create sense my-sensor    # scaffold
# edit senses/my-sensor/src/index.ts and schema.ts
nerve validate                  # validate the configuration
nerve dev                       # test in the foreground
nerve sense trigger my-sensor   # verify with a single trigger
nerve sense query my-sensor "SELECT * FROM ..."   # check the data
```
|
||||
|
||||
### Develop a new workflow

```bash
nerve create workflow my-flow   # scaffold (the current CLI may still generate roles/<name>/ subdirectories)
# Recommended, matching AGENT.md: workflows/my-flow/index.ts + roles/<role>.ts (flat); the moderator can be split into moderator.ts
nerve validate                  # validate the configuration
cd ~/.uncaged-nerve && npm run build   # build from the workspace root (equivalent: pnpm run build); do not run build inside a single workflow subdirectory
nerve workflow trigger my-flow --prompt "test" --dryRun   # dry run
nerve thread show <runId>       # inspect the execution trace
```
|
||||
|
||||
---
|
||||
|
||||
## Pitfalls
|
||||
|
||||
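- **Sense return value**: returning `null` means silence (no signal is emitted); a signal is emitted only when you return `{ signal, workflow }`. Never return `undefined`.
- **Sense persistence**: when `compute()` returns non-null, the daemon automatically runs `db.insert(table).values(signal)` and writes to `_signals`; do not insert from your own code.
- **No optional properties**: the nerve code style forbids `?:`; use `T | null` instead.
- **Functional style**: use `function` + `type`, not `class` + `interface`.
- **Workflows use a default export**: in a workspace, normally only `workflows/<name>/index.ts` uses a default export (the daemon's loading convention).
- **`_signals` table**: every sense automatically gets a `_signals` table that records its signal history, capped by the `retention` setting.
- **concurrency + overflow**: every workflow must configure a concurrency policy, otherwise validation fails.
- **The moderator is a synchronous function**: do not make it `async`; it is pure routing logic and must not have side effects.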
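
The two style rules in this list (no optional properties, `function` + `type`) can be illustrated with a short sketch. The `SenseOptions` type and `describeSchedule` function are invented for this example and do not exist in nerve.

```typescript
// Hypothetical config type: every field is always present; "unset" is null, never omitted.
type SenseOptions = { interval: string | null; throttle: string | null };

// A plain function plus a type alias, instead of a class plus an interface.
function describeSchedule(options: SenseOptions): string {
  const interval = options.interval === null ? "no polling" : `every ${options.interval}`;
  const throttle = options.throttle === null ? "unthrottled" : `throttled to ${options.throttle}`;
  return `${interval}, ${throttle}`;
}

console.log(describeSchedule({ interval: "10s", throttle: null })); // every 10s, unthrottled
```

With `T | null`, callers must state every field explicitly, so a forgotten setting shows up as a type error rather than as a silent `undefined`.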
|
||||
@@ -205,6 +205,10 @@ describe("e2e init", () => {
|
||||
expect(existsSync(join(nerveRoot, "scripts", "build.mjs"))).toBe(true);
|
||||
expect(existsSync(join(nerveRoot, "biome.json"))).toBe(true);
|
||||
expect(existsSync(join(nerveRoot, ".gitignore"))).toBe(true);
|
||||
expect(existsSync(join(nerveRoot, "AGENT.md"))).toBe(true);
|
||||
const agentMd = readFileSync(join(nerveRoot, "AGENT.md"), "utf8");
|
||||
expect(agentMd).toContain("verb-first");
|
||||
expect(agentMd).toContain("createRole");
|
||||
expect(existsSync(join(nerveRoot, "senses", "cpu-usage", "src", "index.ts"))).toBe(true);
|
||||
expect(existsSync(join(nerveRoot, "senses", "cpu-usage", "src", "schema.ts"))).toBe(true);
|
||||
expect(existsSync(join(nerveRoot, "senses", "cpu-usage", "migrations", "0001_init.sql"))).toBe(
|
||||
|
||||
@@ -3,6 +3,7 @@ import "@uncaged/nerve-daemon/experimental-warning-suppression.js";
|
||||
import { defineCommand, runMain } from "citty";
|
||||
|
||||
import { consumeGlobalDaemonCliFlags } from "./cli-global.js";
|
||||
import { agentCommand } from "./commands/agent.js";
|
||||
import { createCommand } from "./commands/create.js";
|
||||
import { daemonCommand } from "./commands/daemon.js";
|
||||
import { devCommand } from "./commands/dev.js";
|
||||
@@ -42,6 +43,7 @@ const main = defineCommand({
|
||||
"Nerve — an AI agent kernel. Global options: --host <host:port> (remote HTTP), --api-token <secret> (Bearer auth).",
|
||||
},
|
||||
subCommands: {
|
||||
agent: agentCommand,
|
||||
init: initCommand,
|
||||
create: createCommand,
|
||||
daemon: daemonCommand,
|
||||
|
||||
@@ -0,0 +1,378 @@
|
||||
import {
|
||||
cpSync,
|
||||
existsSync,
|
||||
mkdirSync,
|
||||
readFileSync,
|
||||
readdirSync,
|
||||
rmSync,
|
||||
statSync,
|
||||
writeFileSync,
|
||||
} from "node:fs";
|
||||
import { homedir } from "node:os";
|
||||
import { dirname, join, resolve as resolvePath } from "node:path";
|
||||
import { fileURLToPath } from "node:url";
|
||||
|
||||
import { defineCommand } from "citty";
|
||||
|
||||
function getPackageRootDir(): string {
|
||||
const thisFile = fileURLToPath(import.meta.url);
|
||||
let dir = dirname(thisFile);
|
||||
for (let i = 0; i < 5; i++) {
|
||||
if (existsSync(join(dir, "package.json"))) return dir;
|
||||
dir = dirname(dir);
|
||||
}
|
||||
throw new Error("Cannot locate package root. Is the CLI package intact?");
|
||||
}
|
||||
|
||||
function getCliVersion(): string {
|
||||
const pkgPath = join(getPackageRootDir(), "package.json");
|
||||
const pkg = JSON.parse(readFileSync(pkgPath, "utf8")) as { version: string };
|
||||
return pkg.version;
|
||||
}
|
||||
|
||||
let _cachedVersion: string | null = null;
|
||||
function cliVersion(): string {
|
||||
if (_cachedVersion === null) _cachedVersion = getCliVersion();
|
||||
return _cachedVersion;
|
||||
}
|
||||
|
||||
function getSkillSourceDir(): string {
|
||||
const root = getPackageRootDir();
|
||||
const skillsDir = join(root, "skills");
|
||||
if (!existsSync(skillsDir)) {
|
||||
throw new Error("Cannot locate skills directory. Is the CLI package intact?");
|
||||
}
|
||||
return skillsDir;
|
||||
}
|
||||
|
||||
function getHermesSkillDir(profile: string | null): string {
|
||||
const hermesHome = join(homedir(), ".hermes");
|
||||
if (profile !== null) {
|
||||
return join(hermesHome, "profiles", profile, "skills", "nerve");
|
||||
}
|
||||
return join(hermesHome, "skills", "nerve");
|
||||
}
|
||||
|
||||
function readVersionFile(skillDir: string): string | null {
|
||||
const versionPath = join(skillDir, ".nerve-version");
|
||||
if (!existsSync(versionPath)) return null;
|
||||
return readFileSync(versionPath, "utf8").trim();
|
||||
}
|
||||
|
||||
function writeVersionFile(skillDir: string, version: string): void {
|
||||
writeFileSync(join(skillDir, ".nerve-version"), `${version}\n`, "utf8");
|
||||
}
|
||||
|
||||
const CURSOR_VERSION_MARKER_RE = /<!--\s*nerve-cli-version:\s*([^>]+?)\s*-->/;
|
||||
|
||||
function resolveCursorProjectDir(pathArg: string | null): string {
|
||||
const raw = pathArg !== null && pathArg !== "" ? pathArg : process.cwd();
|
||||
return resolvePath(raw);
|
||||
}
|
||||
|
||||
function assertDirectory(projectDir: string, label: string): void {
|
||||
if (!existsSync(projectDir)) {
|
||||
process.stderr.write(`❌ ${label} does not exist: ${projectDir}\n`);
|
||||
process.exit(1);
|
||||
}
|
||||
if (!statSync(projectDir).isDirectory()) {
|
||||
process.stderr.write(`❌ ${label} is not a directory: ${projectDir}\n`);
|
||||
process.exit(1);
|
||||
}
|
||||
}
|
||||
|
||||
function readCursorInjectVersion(projectDir: string): string | null {
|
||||
const versionPath = join(projectDir, ".nerve-version");
|
||||
if (existsSync(versionPath)) {
|
||||
return readFileSync(versionPath, "utf8").trim();
|
||||
}
|
||||
const rulesPath = join(projectDir, ".cursorrules");
|
||||
if (!existsSync(rulesPath)) return null;
|
||||
const content = readFileSync(rulesPath, "utf8");
|
||||
const match = content.match(CURSOR_VERSION_MARKER_RE);
|
||||
return match !== null ? match[1].trim() : null;
|
||||
}
|
||||
|
||||
function injectCursor(projectDir: string): void {
|
||||
assertDirectory(projectDir, "Project directory");
|
||||
const rulesPath = join(projectDir, ".cursorrules");
|
||||
const existingVer = readCursorInjectVersion(projectDir);
|
||||
if (existingVer === cliVersion() && existsSync(rulesPath)) {
|
||||
process.stdout.write(
|
||||
`✅ Cursor .cursorrules is already up to date (v${cliVersion()}) at ${projectDir}\n`,
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
const templatePath = join(getSkillSourceDir(), "cursor", ".cursorrules");
|
||||
if (!existsSync(templatePath)) {
|
||||
throw new Error("Cannot locate cursor/.cursorrules template. Is the CLI package intact?");
|
||||
}
|
||||
let body = readFileSync(templatePath, "utf8");
|
||||
body = body.replaceAll("__NERVE_CLI_VERSION__", cliVersion());
|
||||
writeFileSync(rulesPath, body, "utf8");
|
||||
writeVersionFile(projectDir, cliVersion());
|
||||
|
||||
const action = existingVer !== null ? "Updated" : "Installed";
|
||||
process.stdout.write(`✅ ${action} Cursor .cursorrules v${cliVersion()} at ${projectDir}\n`);
|
||||
}
|
||||
|
||||
function removeCursor(projectDir: string): void {
|
||||
assertDirectory(projectDir, "Project directory");
|
||||
const rulesPath = join(projectDir, ".cursorrules");
|
||||
const versionPath = join(projectDir, ".nerve-version");
|
||||
if (!existsSync(rulesPath)) {
|
||||
process.stdout.write(`ℹ️ Cursor .cursorrules is not present at ${projectDir}\n`);
|
||||
return;
|
||||
}
|
||||
rmSync(rulesPath, { force: true });
|
||||
if (existsSync(versionPath)) {
|
||||
rmSync(versionPath, { force: true });
|
||||
}
|
||||
process.stdout.write(`✅ Removed Cursor .cursorrules from ${projectDir}\n`);
|
||||
}
|
||||
|
||||
function injectHermes(profile: string | null): void {
|
||||
const sourceDir = join(getSkillSourceDir(), "hermes");
|
||||
const targetDir = getHermesSkillDir(profile);
|
||||
const existing = readVersionFile(targetDir);
|
||||
|
||||
if (existing === cliVersion()) {
|
||||
const loc = profile !== null ? ` (profile: ${profile})` : "";
|
||||
process.stdout.write(`✅ Hermes nerve skill is already up to date (v${cliVersion()})${loc}\n`);
|
||||
return;
|
||||
}
|
||||
|
||||
mkdirSync(targetDir, { recursive: true });
|
||||
cpSync(sourceDir, targetDir, { recursive: true });
|
||||
writeVersionFile(targetDir, cliVersion());
|
||||
|
||||
const action = existing !== null ? "Updated" : "Installed";
|
||||
const loc = profile !== null ? ` (profile: ${profile})` : "";
|
||||
process.stdout.write(`✅ ${action} Hermes nerve skill v${cliVersion()}${loc}\n`);
|
||||
process.stdout.write(` → ${targetDir}/SKILL.md\n`);
|
||||
}
|
||||
|
||||
function removeHermes(profile: string | null): void {
|
||||
const targetDir = getHermesSkillDir(profile);
|
||||
if (!existsSync(targetDir)) {
|
||||
process.stdout.write("ℹ️ Hermes nerve skill is not installed.\n");
|
||||
return;
|
||||
}
|
||||
rmSync(targetDir, { recursive: true, force: true });
|
||||
const loc = profile !== null ? ` (profile: ${profile})` : "";
|
||||
process.stdout.write(`✅ Removed Hermes nerve skill${loc}\n`);
|
||||
}
|
||||
|
||||
function printCursorStatusLine(projectDir: string): void {
|
||||
const rulesPath = join(projectDir, ".cursorrules");
|
||||
const label = `Cursor (${projectDir})`;
|
||||
if (!existsSync(rulesPath)) {
|
||||
process.stdout.write(` ${label}: ❌ not installed\n`);
|
||||
return;
|
||||
}
|
||||
const ver = readCursorInjectVersion(projectDir);
|
||||
if (ver === null) {
|
||||
process.stdout.write(
|
||||
` ${label}: ⚠️ installed (unknown version; run \`nerve agent inject cursor\`)\n`,
|
||||
);
|
||||
return;
|
||||
}
|
||||
if (ver === cliVersion()) {
|
||||
process.stdout.write(` ${label}: ✅ v${ver}\n`);
|
||||
} else {
|
||||
process.stdout.write(
|
||||
` ${label}: ⚠️ v${ver} → v${cliVersion()} available (run \`nerve agent inject cursor\`)\n`,
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
function printStatus(): void {
|
||||
process.stdout.write(`nerve agent skills (CLI v${cliVersion()})\n\n`);
|
||||
|
||||
printCursorStatusLine(process.cwd());
|
||||
process.stdout.write("\n");
|
||||
|
||||
// Default profile
|
||||
const defaultDir = getHermesSkillDir(null);
|
||||
const defaultVer = readVersionFile(defaultDir);
|
||||
printAgentLine("Hermes (default)", defaultVer);
|
||||
|
||||
// Named profiles
|
||||
const profilesDir = join(homedir(), ".hermes", "profiles");
|
||||
if (existsSync(profilesDir)) {
|
||||
const profiles = readdirSync(profilesDir, { withFileTypes: true })
|
||||
.filter((d) => d.isDirectory())
|
||||
.map((d) => d.name);
|
||||
|
||||
for (const profile of profiles) {
|
||||
const dir = getHermesSkillDir(profile);
|
||||
const ver = readVersionFile(dir);
|
||||
if (ver !== null) {
|
||||
printAgentLine(`Hermes (${profile})`, ver);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
process.stdout.write("\n");
|
||||
}
|
||||
|
||||
function printAgentLine(label: string, version: string | null): void {
|
||||
if (version === null) {
|
||||
process.stdout.write(` ${label}: ❌ not installed\n`);
|
||||
} else if (version === cliVersion()) {
|
||||
process.stdout.write(` ${label}: ✅ v${version}\n`);
|
||||
} else {
|
||||
process.stdout.write(
|
||||
` ${label}: ⚠️ v${version} → v${cliVersion()} available (run \`nerve agent update\`)\n`,
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
const injectCommand = defineCommand({
|
||||
meta: {
|
||||
name: "inject",
|
||||
description: "Inject nerve skill into an AI agent",
|
||||
},
|
||||
args: {
|
||||
target: {
|
||||
type: "positional",
|
||||
description: "Agent target: hermes | cursor",
|
||||
},
|
||||
profile: {
|
||||
type: "string",
|
||||
description: "Hermes profile name (default: main profile)",
|
||||
},
|
||||
path: {
|
||||
type: "string",
|
||||
description: "Project directory for Cursor rules (default: cwd); only used with cursor",
|
||||
},
|
||||
},
|
||||
run({ args }) {
|
||||
const target = args.target;
|
||||
if (target === "hermes") {
|
||||
if (args.path != null && args.path !== "") {
|
||||
process.stderr.write("❌ --path applies only to the cursor target\n");
|
||||
process.exit(1);
|
||||
}
|
||||
injectHermes(args.profile ?? null);
|
||||
return;
|
||||
}
|
||||
if (target === "cursor") {
|
||||
if (args.profile != null && args.profile !== "") {
|
||||
process.stderr.write("❌ --profile applies only to the hermes target\n");
|
||||
process.exit(1);
|
||||
}
|
||||
const pathArg = args.path != null && args.path !== "" ? args.path : null;
|
||||
injectCursor(resolveCursorProjectDir(pathArg));
|
||||
return;
|
||||
}
|
||||
process.stderr.write(`❌ Unknown agent target: ${target}\n`);
|
||||
process.stderr.write(" Supported targets: hermes, cursor\n");
|
||||
process.exit(1);
|
||||
},
|
||||
});
|
||||
|
||||
const updateCommand = defineCommand({
|
||||
meta: {
|
||||
name: "update",
|
||||
description: "Update all injected nerve skills to current CLI version",
|
||||
},
|
||||
run() {
|
||||
let updated = 0;
|
||||
|
||||
// Default profile
|
||||
const defaultDir = getHermesSkillDir(null);
|
||||
if (existsSync(defaultDir)) {
|
||||
injectHermes(null);
|
||||
updated++;
|
||||
}
|
||||
|
||||
// Named profiles
|
||||
const profilesDir = join(homedir(), ".hermes", "profiles");
|
||||
if (existsSync(profilesDir)) {
|
||||
const profiles = readdirSync(profilesDir, { withFileTypes: true })
|
||||
.filter((d) => d.isDirectory())
|
||||
.map((d) => d.name);
|
||||
|
||||
for (const profile of profiles) {
|
||||
const dir = getHermesSkillDir(profile);
|
||||
if (existsSync(dir)) {
|
||||
injectHermes(profile);
|
||||
updated++;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
if (updated === 0) {
|
||||
process.stdout.write("ℹ️ No injected skills found. Run `nerve agent inject hermes` first.\n");
|
||||
}
|
||||
},
|
||||
});
|
||||
|
||||
const removeCommand = defineCommand({
|
||||
meta: {
|
||||
name: "remove",
|
||||
description: "Remove injected nerve skill from an AI agent",
|
||||
},
|
||||
args: {
|
||||
target: {
|
||||
type: "positional",
|
||||
description: "Agent target: hermes | cursor",
|
||||
},
|
||||
profile: {
|
||||
type: "string",
|
||||
description: "Hermes profile name (default: main profile)",
|
||||
},
|
||||
path: {
|
||||
type: "string",
|
||||
description: "Project directory for Cursor rules (default: cwd); only used with cursor",
|
||||
},
|
||||
},
|
||||
run({ args }) {
|
||||
const target = args.target;
|
||||
if (target === "hermes") {
|
||||
if (args.path != null && args.path !== "") {
|
||||
process.stderr.write("❌ --path applies only to the cursor target\n");
|
||||
process.exit(1);
|
||||
}
|
||||
removeHermes(args.profile ?? null);
|
||||
return;
|
||||
}
|
||||
if (target === "cursor") {
|
||||
if (args.profile != null && args.profile !== "") {
|
||||
process.stderr.write("❌ --profile applies only to the hermes target\n");
|
||||
process.exit(1);
|
||||
}
|
||||
const pathArg = args.path != null && args.path !== "" ? args.path : null;
|
||||
removeCursor(resolveCursorProjectDir(pathArg));
|
||||
return;
|
||||
}
|
||||
process.stderr.write(`❌ Unknown agent target: ${target}\n`);
|
||||
process.stderr.write(" Supported targets: hermes, cursor\n");
|
||||
process.exit(1);
|
||||
},
|
||||
});
|
||||
|
||||
const statusCommand = defineCommand({
|
||||
meta: {
|
||||
name: "status",
|
||||
description: "Show injection status of nerve skills across agents",
|
||||
},
|
||||
run() {
|
||||
printStatus();
|
||||
},
|
||||
});
|
||||
|
||||
export const agentCommand = defineCommand({
|
||||
meta: {
|
||||
name: "agent",
|
||||
description: "Manage nerve skill injection for AI agents",
|
||||
},
|
||||
subCommands: {
|
||||
inject: injectCommand,
|
||||
update: updateCommand,
|
||||
remove: removeCommand,
|
||||
status: statusCommand,
|
||||
},
|
||||
});
|
||||
@@ -127,6 +127,70 @@ node_modules/
|
||||
knowledge.db
|
||||
`;
|
||||
|
||||
/** Generated at workspace root so agents can \`cat AGENT.md\` instead of npm skill paths. */
|
||||
const AGENT_MD = `# Nerve workspace — agent guide
|
||||
|
||||
This file is created by \`nerve init\`. Read it before implementing senses or workflows.
|
||||
|
||||
## Directory layout
|
||||
|
||||
| Path | Purpose |
|
||||
|------|---------|
|
||||
| \`nerve.yaml\` | Senses, workflows, intervals, groups |
|
||||
| \`package.json\` | Single root package — no per-sense/per-workflow packages |
|
||||
| \`scripts/build.mjs\` | Root esbuild step; output under \`dist/\` |
|
||||
| \`senses/<name>/src/index.ts\` | Sense \`compute()\` entry |
|
||||
| \`senses/<name>/src/schema.ts\` | Drizzle SQLite schema (TypeScript) |
|
||||
| \`senses/<name>/migrations/*.sql\` | SQL migrations (next to \`src/\`, not inside it) |
|
||||
| \`workflows/<name>/index.ts\` | Default export: \`WorkflowDefinition\` |
|
||||
| \`workflows/<name>/roles/<role>.ts\` | One TypeScript file per role |
|
||||
| \`dist/senses/<name>/index.js\` | Bundled sense (after build) |
|
||||
| \`dist/workflows/<name>/index.js\` | Bundled workflow (after build) |
|
||||
|
||||
There is **no** \`package.json\` or \`tsconfig.json\` inside individual senses or workflows.
|
||||
|
||||
## Naming
|
||||
|
||||
- **Workflows:** verb-first kebab-case (e.g. \`review-pull-request\`, \`deploy-staging\`). Avoid bare nouns like \`notifications\`.
|
||||
- **Senses:** kebab-case descriptive nouns (e.g. \`cpu-usage\`).
|
||||
|
||||
## Workflow roles — four-tuple pattern
|
||||
|
||||
Wire each role with \`createRole\` from \`@uncaged/nerve-workflow-utils\`:
|
||||
|
||||
1. **Adapter** — \`AgentFn\` (LLM call)
|
||||
2. **Prompt builder** — \`async (ctx: ThreadContext) => string\`
|
||||
3. **Meta schema** — Zod object (routing / structured output from the model)
|
||||
4. **Extractor config** — how JSON meta is parsed from replies
|
||||
|
||||
Keep meta small (often one boolean per role). The **moderator** in \`WorkflowDefinition\` routes between role names.
|
||||
|
||||
## Build commands
|
||||
|
||||
Always run from the **workspace root**:
|
||||
|
||||
\`\`\`bash
|
||||
pnpm run build
|
||||
# or: npm run build
|
||||
\`\`\`
|
||||
|
||||
Fix errors until this succeeds. New workflows must appear under \`workflows/<name>/\` and be registered in \`nerve.yaml\`; new senses under \`senses/<name>/\` with matching \`nerve.yaml\` entries.
|
||||
|
||||
## Coding style (Nerve conventions)
|
||||
|
||||
- Use \`type\`, not \`interface\`; prefer \`function\` over classes (except errors / library requirements).
|
||||
- **Named exports only** — no \`export default\` (exception: \`workflows/<name>/index.ts\` uses default export for the daemon loader).
|
||||
- Nullable fields: \`T | null\`, not TypeScript optional \`?:\`.
|
||||
- No dynamic \`import()\` in workspace code (bundling and tooling assume static imports).
|
||||
- Use \`async\`/\`await\`; use a \`Result\` type for expected failures instead of control-flow try/catch.
|
||||
|
||||
## Extra references (optional)
|
||||
|
||||
- \`CONVENTIONS.md\` — project-specific overrides at repo root.
|
||||
- \`.knowledge/*.md\` — deeper docs when working inside the Nerve monorepo.
|
||||
- \`.cursor/skills/\` — Cursor Agent Skills (\`SKILL.md\` per skill).
|
||||
`;
|
||||
|
||||
const NERVE_SKILLS_MDC = `---
|
||||
description: >-
|
||||
Where Agent Skills live in this Nerve workspace and how to use them with Cursor
|
||||
@@ -362,6 +426,7 @@ async function runInitWorkspace(force: boolean, skipInstall = false): Promise<vo
|
||||
writeFile(join(nerveRoot, "scripts", "build.mjs"), BUILD_MJS);
|
||||
writeFile(join(nerveRoot, "biome.json"), BIOME_JSON);
|
||||
writeFile(join(nerveRoot, ".gitignore"), GITIGNORE);
|
||||
writeFile(join(nerveRoot, "AGENT.md"), AGENT_MD);
|
||||
writeFile(join(nerveRoot, "senses", "cpu-usage", "src", "index.ts"), CPU_INDEX_TS);
|
||||
writeFile(join(nerveRoot, "senses", "cpu-usage", "src", "schema.ts"), CPU_SCHEMA_TS);
|
||||
writeFile(
|
||||
|
||||
+22
-14
@@ -4,11 +4,11 @@ Shared types and configuration parser for the [nerve](../../README.md) observati
 
 ## What's Inside
 
-- **Type definitions** — `Signal`, `SenseConfig`, `SenseInfo`, `SenseReflexConfig`, `ReflexConfig` (sense-only), `WorkflowConfig`, `NerveConfig`, and related types
+- **Type definitions** — `Signal`, `SenseConfig`, `SenseInfo`, `WorkflowConfig`, `NerveConfig`, and related types
 - **Config parser** — `parseNerveConfig(yaml)` validates and parses `nerve.yaml` into `NerveConfig` (rejects reflex entries that declare a `workflow` key; reflexes only schedule senses)
-- **Sense → workflow routing** — `parseSenseWorkflowDirective`, `routeSenseComputeOutput`, and types `ParsedSenseWorkflowDirective`, `SenseComputeRoute`
+- **Sense → workflow routing** — `parseWorkflowTrigger`, `routeSenseComputeOutput`, and types `WorkflowTrigger`, `RoutedSenseOutput`
 - **Daemon IPC protocol** — request/response types (`DaemonIpcRequest`, `DaemonIpcResponse`, …) and `parseDaemonIpcRequest` for newline-delimited JSON on the CLI ↔ daemon socket
-- **Workflow automaton types** — `START` / `END` sentinel constants, `WorkflowMessage`, `StartStep`, `RoleStep`, `ModeratorContext` (`start` + `steps`; empty `steps` on first moderator call), `Moderator` (single `context` argument), `WorkflowDefinition`, `Role`, `SenseResult`, plus `DEFAULT_ENGINE_MAX_ROUNDS`
+- **Workflow automaton types** — `START` / `END` sentinel constants, `WorkflowMessage`, `StartStep`, `RoleStep`, `ModeratorContext` (`start` + `steps`; empty `steps` on first moderator call), `Moderator` (single `context` argument), `WorkflowDefinition`, `Role`, `RoleResult`, plus `DEFAULT_ENGINE_MAX_ROUNDS`
 - **Result type** — `Result<T>` with `ok()` / `err()` helpers for explicit error handling (no thrown exceptions for parse paths)
 
 ## Usage
@@ -26,23 +26,31 @@ if (result.ok) {
 ### Sense return → signal vs workflow
 
 ```typescript
-import { parseSenseWorkflowDirective, routeSenseComputeOutput } from "@uncaged/nerve-core";
+import { parseWorkflowTrigger, routeSenseComputeOutput } from "@uncaged/nerve-core";
 
-const directive = parseSenseWorkflowDirective("my-workflow|8|Hello from sense");
+const directive = parseWorkflowTrigger({
+  name: "my-workflow",
+  maxRounds: 8,
+  prompt: "Hello from sense",
+  dryRun: false,
+});
 if (directive.ok) {
-  console.log(directive.value.workflowName, directive.value.maxRounds, directive.value.prompt);
+  console.log(directive.value.name, directive.value.maxRounds, directive.value.prompt);
 }
 
 const route = routeSenseComputeOutput({
-  metric: 42,
-  workflow: "my-workflow|8|Run now",
+  signal: { metric: 42 },
+  workflow: {
+    name: "my-workflow",
+    maxRounds: 8,
+    prompt: "Run now",
+    dryRun: false,
+  },
 });
-if (route.kind === "launch") {
-  // engine starts workflow; no Signal to the bus for this return
-  console.log(route.launch);
-} else {
-  // normal signal with payload
-  console.log(route.payload);
+if (route.ok && route.value.workflow !== null) {
+  console.log(route.value.workflow);
+} else if (route.ok) {
+  console.log(route.value.signal);
 }
 ```
|
||||
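As a reading aid for the updated README example, the sketch below models the shape the new code appears to rely on: a `Result`-style wrapper whose value carries a `signal` and a nullable `workflow` trigger. Everything in it is a local stand-in written for illustration. `routeOutputSketch`, `TriggerSketch`, `RoutedSketch`, and the validation rules are assumptions, not the implementation or exact typing of `routeSenseComputeOutput` in `@uncaged/nerve-core`.

```typescript
// Local stand-ins only; the real types live in @uncaged/nerve-core and may differ.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

type TriggerSketch = {
  name: string;
  maxRounds: number;
  prompt: string;
  dryRun: boolean;
};

type RoutedSketch = {
  signal: Record<string, unknown> | null;
  workflow: TriggerSketch | null;
};

// Assumed rule for this sketch: a trigger needs a non-empty name
// and a positive integer maxRounds.
function checkTrigger(trigger: TriggerSketch): Result<TriggerSketch> {
  if (trigger.name === "") {
    return { ok: false, error: "workflow name is empty" };
  }
  if (!Number.isInteger(trigger.maxRounds) || trigger.maxRounds <= 0) {
    return { ok: false, error: "maxRounds must be a positive integer" };
  }
  return { ok: true, value: trigger };
}

// Passes signal-only output through unchanged; validates the trigger when present.
function routeOutputSketch(output: RoutedSketch): Result<RoutedSketch> {
  if (output.workflow === null) {
    return { ok: true, value: output };
  }
  const checked = checkTrigger(output.workflow);
  if (!checked.ok) {
    return { ok: false, error: checked.error };
  }
  return { ok: true, value: { signal: output.signal, workflow: checked.value } };
}

const routed = routeOutputSketch({
  signal: { metric: 42 },
  workflow: { name: "my-workflow", maxRounds: 8, prompt: "Run now", dryRun: false },
});

if (routed.ok && routed.value.workflow !== null) {
  console.log("launch", routed.value.workflow.name);
} else if (routed.ok) {
  console.log("signal", routed.value.signal);
}
```

The caller branches on `ok` first and on `workflow !== null` second, which matches the two-step check in the README snippet.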
|
||||
@@ -21,9 +21,3 @@ export class ExtractError extends Error {
|
||||
Object.setPrototypeOf(this, new.target.prototype);
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Agent adapter ids referenced by tooling / docs (RFC-003).
|
||||
* Workflows import adapter packages directly; echo may be used in tests via a small factory.
|
||||
*/
|
||||
export const KNOWN_AGENT_ADAPTER_IDS = ["echo", "cursor", "hermes", "codex"] as const;
|
||||
|
||||
@@ -13,7 +13,7 @@ export type {
|
||||
} from "./config.js";
|
||||
export type { Signal, SenseInfo } from "./sense.js";
|
||||
export type { SenseComputeFn, SenseModule } from "./sense.js";
|
||||
export { labelSenseTrigger, senseTriggerLabels } from "./sense.js";
|
||||
export { senseTriggerLabels } from "./sense.js";
|
||||
export type {
|
||||
WorkflowMessage,
|
||||
RoleResult,
|
||||
@@ -29,7 +29,6 @@ export type {
|
||||
WorkflowDefinition,
|
||||
} from "./workflow.js";
|
||||
export { START, END, DEFAULT_ENGINE_MAX_ROUNDS } from "./workflow.js";
|
||||
export { parseDurationStringToMs } from "./util.js";
|
||||
export type { Schema, ExtractFn } from "./agent.js";
|
||||
export { ExtractError } from "./agent.js";
|
||||
export type { Result } from "./util.js";
|
||||
@@ -46,7 +45,6 @@ export { parseNerveConfig } from "./config.js";
|
||||
export type { KnowledgeConfig } from "./config.js";
|
||||
export { parseKnowledgeYaml } from "./config.js";
|
||||
export { isPlainRecord } from "./util.js";
|
||||
export { KNOWN_AGENT_ADAPTER_IDS } from "./agent.js";
|
||||
|
||||
export type { RoutedSenseOutput } from "./sense.js";
|
||||
export { parseWorkflowTrigger, routeSenseComputeOutput } from "./sense.js";
|
||||
|
||||
@@ -28,6 +28,11 @@ function makeMockChild(pid = 1): MockChild {
|
||||
child.connected = true;
|
||||
child.exitCode = null;
|
||||
child.pid = pid;
|
||||
setImmediate(() => {
|
||||
if (child.connected) {
|
||||
child.emit("message", { type: "ready" });
|
||||
}
|
||||
});
|
||||
child.send = vi.fn((msg: unknown) => {
|
||||
if (
|
||||
msg !== null &&
|
||||
@@ -132,6 +137,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test 1", maxRounds: 10, dryRun: false });
|
||||
mgr.startWorkflow("my-wf", { prompt: "test 2", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
expect(mgr.activeCount("my-wf")).toBe(2);
|
||||
|
||||
// Simulate unexpected exit (not shutdown)
|
||||
@@ -159,6 +165,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
expect(mgr.activeCount("my-wf")).toBe(2);
|
||||
|
||||
const child = mockChildren[0];
|
||||
@@ -183,6 +190,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
expect(mockChildren).toHaveLength(1);
|
||||
|
||||
const child = mockChildren[0];
|
||||
@@ -216,6 +224,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
const firstChild = mockChildren[0];
|
||||
firstChild.exitCode = 1;
|
||||
firstChild.connected = false;
|
||||
@@ -260,6 +269,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
|
||||
|
||||
// Start one thread to fill the concurrency slot (so queued run stays queued on respawn)
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
const firstChild = mockChildren[0];
|
||||
firstChild.exitCode = 1;
|
||||
firstChild.connected = false;
|
||||
@@ -285,6 +295,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
const child = mockChildren[0];
|
||||
const startCall = (child.send as ReturnType<typeof vi.fn>).mock.calls[0];
|
||||
@@ -322,6 +333,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
|
||||
|
||||
const launch = { prompt: "build-docker for myrepo", maxRounds: 10, dryRun: false };
|
||||
mgr.startWorkflow("my-wf", launch);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
const startedCall = logStore.upsertWorkflowRun.mock.calls.find(
|
||||
(args: any[]) => (args[0] as { type: string }).type === "started",
|
||||
@@ -357,6 +369,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
|
||||
|
||||
// Start one thread to fill the concurrency slot
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
const firstChild = mockChildren[0];
|
||||
|
||||
// Crash once → respawn → crash again → second respawn
|
||||
@@ -398,6 +411,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
const firstChild = mockChildren[0];
|
||||
firstChild.exitCode = 1;
|
||||
firstChild.connected = false;
|
||||
@@ -428,6 +442,7 @@ describe("WorkflowManager — crash recovery (Phase 3)", () => {
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("crash-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
// Crash the worker 6 times in rapid succession (within CRASH_WINDOW_MS = 60s)
|
||||
for (let i = 0; i < 6; i++) {
|
||||
|
||||
@@ -0,0 +1,9 @@
|
||||
// Ready then crashes on a timer; still echoes IPC so parent tests can send after respawn
|
||||
process.on("message", (msg) => {
|
||||
if (msg && msg.type === "shutdown") {
|
||||
process.exit(0);
|
||||
}
|
||||
process.send({ type: "echo", payload: msg });
|
||||
});
|
||||
process.send({ type: "ready" });
|
||||
setTimeout(() => process.exit(1), 50);
|
||||
@@ -0,0 +1,9 @@
|
||||
// Simple test worker: sends ready, echoes messages, handles shutdown
|
||||
process.on("message", (msg) => {
|
||||
if (msg && msg.type === "shutdown") {
|
||||
process.exit(0);
|
||||
}
|
||||
// Echo back with 'echo' type
|
||||
process.send({ type: "echo", payload: msg });
|
||||
});
|
||||
process.send({ type: "ready" });
|
||||
@@ -0,0 +1,9 @@
|
||||
// Like echo-worker but writes stderr for tail diagnostics
|
||||
console.error("stderr-marker");
|
||||
process.on("message", (msg) => {
|
||||
if (msg && msg.type === "shutdown") {
|
||||
process.exit(0);
|
||||
}
|
||||
process.send({ type: "echo", payload: msg });
|
||||
});
|
||||
process.send({ type: "ready" });
|
||||
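The three fixture workers added in this change share one small message protocol: announce `ready`, exit with code 0 on `shutdown`, and echo anything else back. As a hedged illustration, the handler can be modelled as a pure function. `decideWorkerAction` and `WorkerAction` are hypothetical names introduced here; the real fixtures call `process.send` and `process.exit` directly.

```typescript
// Hypothetical model of the fixture workers' message handling (not code from the diff).
type WorkerAction =
  | { kind: "exit"; code: number }
  | { kind: "send"; message: { type: "echo"; payload: unknown } };

// True only for object messages whose `type` field is "shutdown".
function isShutdown(msg: unknown): boolean {
  return (
    typeof msg === "object" &&
    msg !== null &&
    (msg as { type?: unknown }).type === "shutdown"
  );
}

// Mirrors the fixtures: shutdown exits cleanly, everything else is echoed back.
function decideWorkerAction(msg: unknown): WorkerAction {
  if (isShutdown(msg)) {
    return { kind: "exit", code: 0 };
  }
  return { kind: "send", message: { type: "echo", payload: msg } };
}

console.log(decideWorkerAction({ type: "shutdown" }).kind);
```

The crashing variant in the diff uses the same handler and differs only in scheduling `process.exit(1)` on a timer after sending `ready`.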
@@ -33,6 +33,11 @@ function makeMockChild(pid = 1): MockChild {
|
||||
child.connected = true;
|
||||
child.exitCode = null;
|
||||
child.pid = pid;
|
||||
setImmediate(() => {
|
||||
if (child.connected) {
|
||||
child.emit("message", { type: "ready" });
|
||||
}
|
||||
});
|
||||
child.send = vi.fn((msg: unknown) => {
|
||||
if (
|
||||
msg !== null &&
|
||||
@@ -114,6 +119,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
expect(mockChildren).toHaveLength(1);
|
||||
|
||||
// Remove workflow from config before drain completes
|
||||
@@ -134,6 +140,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
expect(mgr.activeCount("my-wf")).toBe(2);
|
||||
|
||||
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
|
||||
@@ -165,6 +172,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
expect(mockChildren).toHaveLength(1);
|
||||
|
||||
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
|
||||
@@ -181,6 +189,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
expect(mockChildren).toHaveLength(1);
|
||||
|
||||
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
|
||||
@@ -198,6 +207,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
|
||||
await vi.runAllTimersAsync();
|
||||
@@ -223,6 +233,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "first", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
const drainPromise = mgr.drainAndRespawn("my-wf", 5000);
|
||||
await vi.runAllTimersAsync();
|
||||
@@ -230,6 +241,7 @@ describe("WorkflowManager — drainAndRespawn (Phase 3 hot reload)", () => {
|
||||
|
||||
// Start a new thread on the fresh worker
|
||||
mgr.startWorkflow("my-wf", { prompt: "second", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
const newChild = mockChildren[1];
|
||||
const startCalls = (newChild.send as ReturnType<typeof vi.fn>).mock.calls.filter(
|
||||
@@ -257,12 +269,13 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
|
||||
vi.clearAllMocks();
|
||||
});
|
||||
|
||||
it("does not send shutdown while a thread is still active", () => {
|
||||
it("does not send shutdown while a thread is still active", async () => {
|
||||
const logStore = makeLogStore();
|
||||
const config = makeWfConfig({ "my-wf": { concurrency: 1, overflow: "drop" } });
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
const child = mockChildren[0];
|
||||
|
||||
mgr.drainWhenIdle("my-wf");
|
||||
@@ -282,6 +295,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
const child = mockChildren[0];
|
||||
const runId = (child.send as ReturnType<typeof vi.fn>).mock.calls[0][0] as { runId: string };
|
||||
|
||||
@@ -311,6 +325,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "a", maxRounds: 10, dryRun: false });
|
||||
mgr.startWorkflow("my-wf", { prompt: "b", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
const child = mockChildren[0];
|
||||
const sendMock = child.send as ReturnType<typeof vi.fn>;
|
||||
const runIdA = (sendMock.mock.calls[0][0] as { runId: string }).runId;
|
||||
@@ -355,6 +370,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
const child = mockChildren[0];
|
||||
const runId = (child.send as ReturnType<typeof vi.fn>).mock.calls[0][0] as { runId: string };
|
||||
|
||||
@@ -388,6 +404,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
const child = mockChildren[0];
|
||||
const runId = (child.send as ReturnType<typeof vi.fn>).mock.calls[0][0] as { runId: string };
|
||||
|
||||
@@ -414,6 +431,7 @@ describe("WorkflowManager — drainWhenIdle (hot reload without interrupting in-
|
||||
const mgr = createWorkflowManager("/nerve-root", config, logStore);
|
||||
|
||||
mgr.startWorkflow("my-wf", { prompt: "once", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
const firstChild = mockChildren[0];
|
||||
const runId = (firstChild.send as ReturnType<typeof vi.fn>).mock.calls[0][0] as {
|
||||
runId: string;
|
||||
@@ -471,6 +489,7 @@ describe("Kernel — workflow hot reload via file-watcher (Phase 3)", () => {
|
||||
|
||||
// Trigger a workflow thread so a worker is spawned
|
||||
kernel.workflowManager.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
// Manually call drainAndRespawn (simulating what kernel does on workflow file change)
|
||||
const drainPromise = kernel.workflowManager.drainAndRespawn("my-wf", 1000);
|
||||
@@ -511,6 +530,7 @@ describe("Kernel — workflow hot reload via file-watcher (Phase 3)", () => {
|
||||
maxRounds: 10,
|
||||
dryRun: false,
|
||||
});
|
||||
await vi.runAllTimersAsync();
|
||||
expect(mockChildren).toHaveLength(1);
|
||||
|
||||
// Reload config without old-wf
|
||||
@@ -551,6 +571,7 @@ describe("Kernel — workflow hot reload via file-watcher (Phase 3)", () => {
|
||||
});
|
||||
|
||||
kernel.workflowManager.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
const workersBefore = mockChildren.length;
|
||||
|
||||
// Reload with updated concurrency — should NOT spawn a new workflow worker
|
||||
@@ -573,6 +594,7 @@ describe("Kernel — workflow hot reload via file-watcher (Phase 3)", () => {
|
||||
// Can now start up to 5 concurrent threads (previously only 1)
|
||||
kernel.workflowManager.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
kernel.workflowManager.startWorkflow("my-wf", { prompt: "test", maxRounds: 10, dryRun: false });
|
||||
await vi.runAllTimersAsync();
|
||||
expect(kernel.workflowManager.activeCount("my-wf")).toBe(3);
|
||||
|
||||
const stopPromise = kernel.stop();
|
||||
|
||||
@@ -70,6 +70,14 @@ const { createKernel } = await import("../kernel.js");
|
||||
// Helpers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/** Sense worker `fork` runs on the next microtask per scheduled `start`. */
|
||||
async function flushSenseWorkerForkMicrotasks(kernel: { groups: Set<string> }): Promise<void> {
|
||||
const n = kernel.groups.size;
|
||||
for (let i = 0; i < n; i++) {
|
||||
await Promise.resolve();
|
||||
}
|
||||
}
|
||||
|
||||
function makeConfig(overrides: Partial<NerveConfig> = {}): NerveConfig {
|
||||
return {
|
||||
senses: {
|
||||
@@ -142,6 +150,8 @@ describe("kernel — getHealth", () => {
|
||||
},
|
||||
});
|
||||
const kernel = createKernel(config, nerveRoot);
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
const health = kernel.getHealth();
|
||||
expect(health.activeSenses).toBe(3);
|
||||
@@ -171,6 +181,8 @@ describe("kernel — restartGroup", () => {
|
||||
it("sends shutdown to old worker and spawns new one", async () => {
|
||||
const config = makeConfig();
|
||||
const kernel = createKernel(config, nerveRoot);
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
expect(mockChildren.length).toBe(1);
|
||||
const oldChild = mockChildren[0];
|
||||
@@ -178,6 +190,7 @@ describe("kernel — restartGroup", () => {
|
||||
const restartPromise = kernel.restartGroup("system");
|
||||
// The shutdown message triggers exit in the mock
|
||||
await restartPromise;
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
// A new child should have been spawned
|
||||
expect(mockChildren.length).toBe(2);
|
||||
@@ -191,6 +204,8 @@ describe("kernel — restartGroup", () => {
|
||||
it("restartGroup on unknown group does nothing", async () => {
|
||||
const config = makeConfig();
|
||||
const kernel = createKernel(config, nerveRoot);
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
expect(mockChildren.length).toBe(1);
|
||||
await kernel.restartGroup("nonexistent");
|
||||
@@ -218,6 +233,8 @@ describe("kernel — reloadConfig", () => {
|
||||
it("adds new group worker when new sense group appears", async () => {
|
||||
const config = makeConfig();
|
||||
const kernel = createKernel(config, nerveRoot);
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
expect(mockChildren.length).toBe(1); // only system group
|
||||
expect(kernel.groups.has("network")).toBe(false);
|
||||
@@ -249,6 +266,9 @@ describe("kernel — reloadConfig", () => {
|
||||
api: { port: null, token: null, host: "127.0.0.1" },
|
||||
});
|
||||
|
||||
await Promise.resolve();
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
expect(kernel.groups.has("network")).toBe(true);
|
||||
expect(mockChildren.length).toBe(2); // system + network
|
||||
|
||||
@@ -283,6 +303,8 @@ describe("kernel — reloadConfig", () => {
|
||||
api: { port: null, token: null, host: "127.0.0.1" },
|
||||
};
|
||||
const kernel = createKernel(config, nerveRoot);
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
expect(mockChildren.length).toBe(2);
|
||||
expect(kernel.groups.has("network")).toBe(true);
|
||||
@@ -308,6 +330,7 @@ describe("kernel — reloadConfig", () => {
|
||||
});
|
||||
|
||||
expect(kernel.groups.has("network")).toBe(false);
|
||||
await vi.runAllTimersAsync();
|
||||
// Network child should have received shutdown
|
||||
expect(networkChild.send).toHaveBeenCalledWith(expect.objectContaining({ type: "shutdown" }));
|
||||
|
||||
@@ -317,6 +340,8 @@ describe("kernel — reloadConfig", () => {
|
||||
it("health reflects updated sense count after reloadConfig", async () => {
|
||||
const config = makeConfig();
|
||||
const kernel = createKernel(config, nerveRoot);
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
expect(kernel.getHealth().activeSenses).toBe(1);
|
||||
|
||||
|
||||
@@ -29,6 +29,9 @@ type MockChild = EventEmitter & {
|
||||
function makeMockChild(pid = 1): MockChild {
|
||||
const child = new EventEmitter() as MockChild;
|
||||
child.connected = true;
|
||||
setImmediate(() => {
|
||||
child.emit("message", { type: "ready" });
|
||||
});
|
||||
child.send = vi.fn((msg: unknown) => {
|
||||
if (
|
||||
msg !== null &&
|
||||
@@ -136,6 +139,7 @@ describe("kernel.triggerSense()", () => {
|
||||
logStore: makeMockLogStore() as never,
|
||||
});
|
||||
|
||||
await vi.runAllTimersAsync();
|
||||
expect(() => kernel.triggerSense("no-such-sense")).toThrow(/Unknown sense/);
|
||||
|
||||
await kernel.stop();
|
||||
@@ -169,6 +173,7 @@ describe("kernel.triggerSense()", () => {
|
||||
logStore: makeMockLogStore() as never,
|
||||
});
|
||||
|
||||
await vi.runAllTimersAsync();
|
||||
// Two groups → two workers
|
||||
expect(mockChildren.length).toBe(2);
|
||||
|
||||
@@ -214,6 +219,7 @@ describe("kernel.triggerSense()", () => {
|
||||
logStore: makeMockLogStore() as never,
|
||||
});
|
||||
|
||||
await vi.runAllTimersAsync();
|
||||
// Both senses share the "system" group → one worker only
|
||||
expect(mockChildren.length).toBe(1);
|
||||
const worker = mockChildren[0];
|
||||
@@ -237,6 +243,7 @@ describe("kernel.triggerSense()", () => {
|
||||
logStore: makeMockLogStore() as never,
|
||||
});
|
||||
|
||||
await new Promise<void>((resolve) => setImmediate(resolve));
|
||||
const worker = mockChildren[0];
|
||||
worker.connected = false;
|
||||
|
||||
|
||||
@@ -102,6 +102,13 @@ function makeLogStore() {
|
||||
};
|
||||
}
|
||||
|
||||
async function flushSenseWorkerForkMicrotasks(kernel: { groups: Set<string> }): Promise<void> {
|
||||
const n = kernel.groups.size;
|
||||
for (let i = 0; i < n; i++) {
|
||||
await Promise.resolve();
|
||||
}
|
||||
}
|
||||
|
||||
function makeConfig(overrides: Partial<NerveConfig> = {}): NerveConfig {
|
||||
return {
|
||||
senses: {
|
||||
@@ -164,6 +171,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
workerScript: "fake-worker.js",
|
||||
logStore,
|
||||
});
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
// Simulate a sense worker sending a signal with workflow launch payload
|
||||
// The kernel's handleWorkerMessage processes "signal" type messages
|
||||
@@ -185,6 +194,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
});
|
||||
}
|
||||
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
// A workflow worker should be spawned and a start-thread message sent
|
||||
const workflowWorker = mockChildren.find((c) =>
|
||||
(c.send as ReturnType<typeof vi.fn>).mock.calls.some(
|
||||
@@ -222,6 +233,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
workerScript: "fake-worker.js",
|
||||
logStore,
|
||||
});
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
// Simulate sense worker returning a signal plus workflow launch
|
||||
const workerPool = mockChildren[0];
|
||||
@@ -241,6 +254,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
});
|
||||
}
|
||||
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
// Find the start-thread call and verify triggerPayload
|
||||
const startThreadCall = mockChildren
|
||||
.flatMap((c) => (c.send as ReturnType<typeof vi.fn>).mock.calls as [unknown][])
|
||||
@@ -275,6 +290,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
workerScript: "fake-worker.js",
|
||||
logStore,
|
||||
});
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
const workerPool = mockChildren[0];
|
||||
if (workerPool) {
|
||||
@@ -293,6 +310,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
});
|
||||
}
|
||||
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
const senseEntries = logStore.append.mock.calls
|
||||
.map((c) => c[0] as { source: string; type: string; refId: string | null })
|
||||
.filter((e) => e.source === "sense" && e.refId === "cpu-usage");
|
||||
@@ -337,6 +356,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
workerScript: "fake-worker.js",
|
||||
logStore,
|
||||
});
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
// Emit a regular signal (shorthand payload) — should NOT trigger any workflow
|
||||
const workerPool = mockChildren[0];
|
||||
@@ -387,6 +408,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
workerScript: "fake-worker.js",
|
||||
logStore,
|
||||
});
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
// Simulate sense compute returning a signal plus workflow launch
|
||||
const workerPool = mockChildren[0];
|
||||
@@ -406,6 +429,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
});
|
||||
}
|
||||
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
expect(logStore.upsertWorkflowRun).toHaveBeenCalledWith(
|
||||
expect.objectContaining({ source: "workflow", type: "started" }),
|
||||
expect.objectContaining({ workflow: "log-test-workflow", status: "started" }),
|
||||
@@ -440,6 +465,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
workerScript: "fake-worker.js",
|
||||
logStore,
|
||||
});
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
// Reload with a workflow added
|
||||
const newConfig: NerveConfig = {
|
||||
@@ -479,6 +506,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
});
|
||||
}
|
||||
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
const startThreadCall = mockChildren
|
||||
.flatMap((c) => (c.send as ReturnType<typeof vi.fn>).mock.calls as [unknown][])
|
||||
.find(
|
||||
@@ -517,6 +546,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
workerScript: "fake-worker.js",
|
||||
logStore,
|
||||
});
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
// Reload with the workflow removed
|
||||
const newConfig: NerveConfig = {
|
||||
@@ -561,6 +592,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
});
|
||||
}
|
||||
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
const startThreadCall = mockChildren
|
||||
.flatMap((c) => (c.send as ReturnType<typeof vi.fn>).mock.calls as [unknown][])
|
||||
.find(
|
||||
@@ -600,6 +633,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
workerScript: "fake-worker.js",
|
||||
logStore,
|
||||
});
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
// Trigger a workflow via sense compute return value
|
||||
const workerPool = mockChildren[0];
|
||||
@@ -619,6 +654,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
});
|
||||
}
|
||||
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
const stopPromise = kernel.stop();
|
||||
await vi.runAllTimersAsync();
|
||||
await expect(stopPromise).resolves.toBeUndefined();
|
||||
@@ -664,6 +701,8 @@ describe("kernel + workflowManager integration", () => {
|
||||
workerScript: "fake-worker.js",
|
||||
logStore,
|
||||
});
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
await vi.runAllTimersAsync();
|
||||
|
||||
const health = kernel.getHealth();
|
||||
expect(health).toHaveProperty("activeWorkflows");
|
||||
|
||||
@@ -16,10 +16,12 @@ type MockChild = EventEmitter & {
   send: ReturnType<typeof vi.fn>;
   kill: ReturnType<typeof vi.fn>;
   pid: number;
+  connected: boolean;
 };

 function makeMockChild(pid = 1): MockChild {
   const child = new EventEmitter() as MockChild;
+  child.connected = true;
   setImmediate(() => {
     child.emit("message", { type: "ready" });
   });
@@ -27,7 +29,10 @@ function makeMockChild(pid = 1): MockChild {
     if (msg === null || typeof msg !== "object") return;
     const m = msg as Record<string, unknown>;
     if (m.type === "shutdown") {
-      setImmediate(() => child.emit("exit", 0, null));
+      setImmediate(() => {
+        child.connected = false;
+        child.emit("exit", 0, null);
+      });
       return;
     }
     if (m.type === "compute" && typeof m.sense === "string") {
@@ -37,6 +42,7 @@ function makeMockChild(pid = 1): MockChild {
     }
   });
   child.kill = vi.fn((_signal?: string) => {
+    child.connected = false;
     child.emit("exit", null, _signal ?? "SIGKILL");
   });
   child.pid = pid;
@@ -59,6 +65,14 @@ const { createLogStore } = await import("@uncaged/nerve-store");
 // Helpers
 // ---------------------------------------------------------------------------

+/** `WorkerRuntime.start` schedules `fork` on the next microtask — flush one tick per initial group. */
+async function flushSenseWorkerForkMicrotasks(kernel: { groups: Set<string> }): Promise<void> {
+  const n = kernel.groups.size;
+  for (let i = 0; i < n; i++) {
+    await Promise.resolve();
+  }
+}
+
 function makeConfig(overrides: Partial<NerveConfig> = {}): NerveConfig {
   return {
     senses: {
@@ -173,6 +187,7 @@ describe("kernel — message routing", () => {
       },
     });
     const kernel = createKernel(config, nerveRoot);
+    await flushSenseWorkerForkMicrotasks(kernel);

     const child = mockChildren[0];
     child.emit("message", { type: "error", sense: "cpu-usage", error: "compute failed" });
@@ -201,6 +216,7 @@ describe("kernel — message routing", () => {
       },
     });
     const kernel = createKernel(config, nerveRoot);
+    await flushSenseWorkerForkMicrotasks(kernel);

     const child = mockChildren[0];
     const callsBefore = stderrSpy.mock.calls.length;
@@ -228,6 +244,7 @@ describe("kernel — message routing", () => {
       },
     });
     const kernel = createKernel(config, nerveRoot);
+    await flushSenseWorkerForkMicrotasks(kernel);

     const child = mockChildren[0];
     expect(() => child.emit("message", { type: "unknown-type" })).not.toThrow();
@@ -290,6 +307,7 @@ describe("kernel — groupForSense mapping", () => {
       api: { port: null, token: null, host: "127.0.0.1" },
     };
     const kernel = createKernel(config, nerveRoot);
+    await flushSenseWorkerForkMicrotasks(kernel);

     // system and network = 2 unique groups
     expect(mockChildren.length).toBe(2);
@@ -311,8 +329,10 @@ describe("kernel — groupForSense mapping", () => {
|
||||
},
|
||||
});
|
||||
const kernel = createKernel(config, nerveRoot);
|
||||
|
||||
await flushSenseWorkerForkMicrotasks(kernel);
|
||||
const child = mockChildren[0];
|
||||
child.emit("message", { type: "ready" });
|
||||
|
||||
vi.advanceTimersByTime(500);
|
||||
|
||||
expect(child.send).toHaveBeenCalledWith(
|
||||
|
||||
@@ -50,6 +50,7 @@ async function startWorkerWithReady(
   group: string,
 ): Promise<void> {
   const pr = pool.startWorker(group);
+  await Promise.resolve();
   const child = mockChildren[mockChildren.length - 1];
   child.emit("message", { type: "ready" });
   await pr;
@@ -137,6 +138,7 @@ describe("createSenseWorkerPool", () => {
     expect(pool.activeGroupCount()).toBe(1);
     pool.evictGroup("x");
     expect(pool.hasWorkerForGroup("x")).toBe(false);
+    await Promise.resolve();
     expect(mockChildren[0].send).toHaveBeenCalledWith(
       expect.objectContaining({ type: "shutdown" }),
     );
@@ -159,6 +161,7 @@ describe("createSenseWorkerPool", () => {

     const p = pool.restartGroup("g");
     expect(onBeforeGroupRestart).toHaveBeenCalledWith("g");
+    await Promise.resolve();
     expect(mockChildren[0].send).toHaveBeenCalledWith(
       expect.objectContaining({ type: "shutdown" }),
     );
@@ -171,7 +174,7 @@ describe("createSenseWorkerPool", () => {
   });

   it("onWorkerCrashed runs and schedules respawn after non-zero exit", async () => {
-    vi.useFakeTimers({ shouldAdvanceTime: true });
+    vi.useFakeTimers();
     const onWorkerCrashed = vi.fn();
     const pool = createSenseWorkerPool({
       nerveRoot: "/tmp/n",

@@ -0,0 +1,181 @@
+import { dirname, join } from "node:path";
+import { fileURLToPath } from "node:url";
+import { afterEach, describe, expect, it, vi } from "vitest";
+import { createWorkerRuntime } from "../worker-runtime.js";
+
+const fixturesDir = join(dirname(fileURLToPath(import.meta.url)), "fixtures");
+const echoWorkerPath = join(fixturesDir, "echo-worker.js");
+const crashWorkerPath = join(fixturesDir, "crash-worker.js");
+const stderrWorkerPath = join(fixturesDir, "stderr-worker.js");
+
+function baseConfig(script: string) {
+  return {
+    script,
+    argsForKey: () => [],
+    forwardStderr: true,
+    onMessage: vi.fn(),
+    onReady: vi.fn(),
+    onExit: vi.fn(),
+    onCrashLimitReached: null,
+    respawn: {
+      enabled: true,
+      maxCrashes: 6,
+      windowMs: 60_000,
+      delayMs: 80,
+      allowRespawn: null,
+    },
+    shutdownTimeoutMs: 5000,
+  };
+}
+
+describe("createWorkerRuntime", () => {
+  const runtimes: Array<{ shutdown: () => Promise<void> }> = [];
+
+  afterEach(async () => {
+    await Promise.all(runtimes.splice(0).map((r) => r.shutdown()));
+  });
+
+  function track<R extends { shutdown: () => Promise<void> }>(r: R): R {
+    runtimes.push(r);
+    return r;
+  }
+
+  it("start + send message + receive echo", async () => {
+    const incoming: unknown[] = [];
+    const rt = track(
+      createWorkerRuntime({
+        ...baseConfig(echoWorkerPath),
+        onMessage: (_key, msg) => {
+          incoming.push(msg);
+        },
+      }),
+    );
+
+    await rt.start("a");
+    expect(rt.has("a")).toBe(true);
+    await rt.send("a", { type: "ping", n: 1 });
+
+    await vi.waitFor(() => {
+      expect(incoming.some((m) => isEchoOf(m, { type: "ping", n: 1 }))).toBe(true);
+    });
+    await rt.shutdown();
+  });
+
+  it("cold start on send (no explicit start)", async () => {
+    const incoming: unknown[] = [];
+    const rt = track(
+      createWorkerRuntime({
+        ...baseConfig(echoWorkerPath),
+        onMessage: (_key, msg) => {
+          incoming.push(msg);
+        },
+      }),
+    );
+
+    expect(rt.has("x")).toBe(false);
+    await rt.send("x", { type: "hi" });
+    await vi.waitFor(() => {
+      expect(rt.has("x")).toBe(true);
+      expect(incoming.some((m) => isEchoOf(m, { type: "hi" }))).toBe(true);
+    });
+    await rt.shutdown();
+  });
+
+  it("evict stops worker; has() is false", async () => {
+    const rt = track(createWorkerRuntime(baseConfig(echoWorkerPath)));
+    await rt.start("k");
+    expect(rt.has("k")).toBe(true);
+    await rt.evict("k", null);
+    expect(rt.has("k")).toBe(false);
+    await rt.shutdown();
+  });
+
+  it("drain stops and respawns (new pid)", async () => {
+    const rt = track(createWorkerRuntime(baseConfig(echoWorkerPath)));
+    await rt.start("k");
+    const before = rt.pid("k");
+    expect(before).not.toBeNull();
+    await rt.drain("k", null);
+    const after = rt.pid("k");
+    expect(after).not.toBeNull();
+    expect(after).not.toBe(before);
+    await rt.shutdown();
+  });
+
+  it("crash triggers auto-respawn", async () => {
+    const incoming: unknown[] = [];
+    const onExit = vi.fn();
+    const rt = track(
+      createWorkerRuntime({
+        ...baseConfig(crashWorkerPath),
+        onExit,
+        onMessage: (_key, msg) => {
+          incoming.push(msg);
+        },
+      }),
+    );
+
+    await rt.start("c");
+
+    await vi.waitFor(() => expect(onExit.mock.calls.length).toBeGreaterThanOrEqual(1), {
+      timeout: 3000,
+    });
+    await vi.waitFor(() => expect(rt.has("c")).toBe(true), { timeout: 3000 });
+
+    await rt.send("c", { type: "after-crash" });
+    await vi.waitFor(() => {
+      expect(incoming.some((m) => isEchoOf(m, { type: "after-crash" }))).toBe(true);
+    });
+    await rt.shutdown();
+  });
+
+  it("crash limit reached → no more automatic respawns", async () => {
+    const rt = track(
+      createWorkerRuntime({
+        ...baseConfig(crashWorkerPath),
+        respawn: {
+          enabled: true,
+          maxCrashes: 2,
+          windowMs: 60_000,
+          delayMs: 50,
+          allowRespawn: null,
+        },
+      }),
+    );
+
+    await rt.start("z");
+
+    await vi.waitFor(() => expect(rt.has("z")).toBe(false), { timeout: 8000 });
+
+    await rt.shutdown();
+  });
+
+  it("shutdown stops all workers", async () => {
+    const rt = track(createWorkerRuntime(baseConfig(echoWorkerPath)));
+    await rt.start("a");
+    await rt.start("b");
+    expect(rt.keys().sort()).toEqual(["a", "b"].sort());
+    await rt.shutdown();
+    expect(rt.keys()).toEqual([]);
+    expect(rt.has("a")).toBe(false);
+    expect(rt.has("b")).toBe(false);
+  });
+
+  it("stderrTail captures stderr output", async () => {
+    const rt = track(createWorkerRuntime(baseConfig(stderrWorkerPath)));
+    await rt.start("s");
+    await vi.waitFor(() => {
+      expect(rt.stderrTail("s")).toContain("stderr-marker");
+    });
+    await rt.shutdown();
+  });
+});
+
+function isEchoOf(msg: unknown, payload: unknown): boolean {
+  return (
+    typeof msg === "object" &&
+    msg !== null &&
+    (msg as Record<string, unknown>).type === "echo" &&
+    JSON.stringify((msg as Record<string, unknown>).payload) === JSON.stringify(payload)
+  );
+}
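Note on the `respawn` options exercised in the tests above: `maxCrashes`, `windowMs` and `delayMs` read like a sliding-window crash limit. The sketch below shows one way such a limit could be computed. It is an assumption for illustration only, not the logic in `worker-runtime.ts` (which this diff does not show); `createCrashWindow` and `recordCrash` are hypothetical names, and whether the real runtime stops at exactly `maxCrashes` crashes is not confirmed here.

```typescript
// Hypothetical sliding-window crash limiter; not taken from worker-runtime.ts.
type CrashWindow = {
  /** Records a crash at `nowMs` and reports whether another respawn is allowed. */
  recordCrash: (nowMs: number) => boolean;
};

function createCrashWindow(maxCrashes: number, windowMs: number): CrashWindow {
  let crashTimes: number[] = [];
  return {
    recordCrash(nowMs: number): boolean {
      // Forget crashes that have aged out of the window.
      crashTimes = crashTimes.filter((t) => nowMs - t < windowMs);
      crashTimes.push(nowMs);
      // Assumption: the limit is hit once `maxCrashes` crashes fall inside the window.
      return crashTimes.length < maxCrashes;
    },
  };
}

// Usage: two crashes within 60 s exhaust a limit of 2.
const crashWindow = createCrashWindow(2, 60_000);
console.log(crashWindow.recordCrash(0), crashWindow.recordCrash(50));
```

Passing the timestamp in, rather than reading a clock inside the limiter, keeps the sketch deterministic under fake timers.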
@@ -26,6 +26,11 @@ function makeMockChild(pid = 1): MockChild {
   child.connected = true;
   child.exitCode = null;
   child.pid = pid;
+  setImmediate(() => {
+    if (child.connected) {
+      child.emit("message", { type: "ready" });
+    }
+  });
   child.send = vi.fn((msg: unknown) => {
     if (
       msg !== null &&
@@ -110,7 +115,7 @@ describe("WorkflowManager", () => {
   });

   describe("startWorkflow under concurrency limit dispatches thread", () => {
-    it("forks a worker and sends start-thread when active < concurrency", () => {
+    it("forks a worker and sends start-thread when active < concurrency", async () => {
       const logStore = makeLogStore();
       const config = makeConfig({
         "my-workflow": { concurrency: 2, overflow: "drop" },
@@ -118,6 +123,7 @@ describe("WorkflowManager", () => {
       const mgr = createWorkflowManager("/nerve-root", config, logStore);

       mgr.startWorkflow("my-workflow", { prompt: "test", maxRounds: 10, dryRun: false });
+      await vi.runAllTimersAsync();

       expect(mockChildren).toHaveLength(1);
       expect(mockChildren[0].send).toHaveBeenCalledWith(
@@ -126,7 +132,7 @@ describe("WorkflowManager", () => {
       expect(mgr.activeCount("my-workflow")).toBe(1);
     });

-    it("reuses the same worker for a second thread under the limit", () => {
+    it("reuses the same worker for a second thread under the limit", async () => {
       const logStore = makeLogStore();
       const config = makeConfig({
         "my-workflow": { concurrency: 3, overflow: "drop" },
@@ -135,6 +141,7 @@ describe("WorkflowManager", () => {

       mgr.startWorkflow("my-workflow", { prompt: "test 1", maxRounds: 10, dryRun: false });
       mgr.startWorkflow("my-workflow", { prompt: "test 2", maxRounds: 10, dryRun: false });
+      await vi.runAllTimersAsync();

       // Only one forked child — worker is reused
       expect(mockChildren).toHaveLength(1);
@@ -142,7 +149,7 @@ describe("WorkflowManager", () => {
       expect(mgr.activeCount("my-workflow")).toBe(2);
     });

-    it("logs a 'started' event for each dispatched thread", () => {
+    it("logs a 'started' event for each dispatched thread", async () => {
       const logStore = makeLogStore();
       const config = makeConfig({
         "my-workflow": { concurrency: 2, overflow: "drop" },
@@ -150,6 +157,7 @@ describe("WorkflowManager", () => {
       const mgr = createWorkflowManager("/nerve-root", config, logStore);

       mgr.startWorkflow("my-workflow", { prompt: "test", maxRounds: 10, dryRun: false });
+      await vi.runAllTimersAsync();

       expect(logStore.upsertWorkflowRun).toHaveBeenCalledWith(
         expect.objectContaining({ source: "workflow", type: "started" }),
@@ -159,7 +167,7 @@ describe("WorkflowManager", () => {
   });

   describe("startWorkflow at limit with drop overflow drops the request", () => {
-    it("does NOT send start-thread when at concurrency limit with overflow=drop", () => {
+    it("does NOT send start-thread when at concurrency limit with overflow=drop", async () => {
       const logStore = makeLogStore();
       const config = makeConfig({
         "drop-wf": { concurrency: 1, overflow: "drop" },
@@ -169,6 +177,7 @@ describe("WorkflowManager", () => {
       mgr.startWorkflow("drop-wf", { prompt: "first", maxRounds: 10, dryRun: false });
       // now at limit — second call should be dropped
       mgr.startWorkflow("drop-wf", { prompt: "second", maxRounds: 10, dryRun: false });
+      await vi.runAllTimersAsync();

       expect(mgr.activeCount("drop-wf")).toBe(1);
       expect(mgr.queueLength("drop-wf")).toBe(0);
@@ -254,7 +263,7 @@ describe("WorkflowManager", () => {
   });

   describe("completing a thread dequeues the next one", () => {
-    it("dispatches the next queued thread when the active thread sends completed", () => {
+    it("dispatches the next queued thread when the active thread sends completed", async () => {
       const logStore = makeLogStore();
       const config = makeConfig({
         "queue-wf": { concurrency: 1, overflow: "queue", maxQueue: 5 },
@@ -263,6 +272,7 @@ describe("WorkflowManager", () => {

       mgr.startWorkflow("queue-wf", { prompt: "first", maxRounds: 10, dryRun: false });
       mgr.startWorkflow("queue-wf", { prompt: "second", maxRounds: 10, dryRun: false });
+      await vi.runAllTimersAsync();

       expect(mgr.activeCount("queue-wf")).toBe(1);
       expect(mgr.queueLength("queue-wf")).toBe(1);
@@ -289,7 +299,7 @@ describe("WorkflowManager", () => {
       );
     });

-    it("dispatches next queued thread when active thread sends failed", () => {
+    it("dispatches next queued thread when active thread sends failed", async () => {
       const logStore = makeLogStore();
       const config = makeConfig({
         "queue-wf": { concurrency: 1, overflow: "queue", maxQueue: 5 },
@@ -298,6 +308,7 @@ describe("WorkflowManager", () => {

       mgr.startWorkflow("queue-wf", { prompt: "first", maxRounds: 10, dryRun: false });
       mgr.startWorkflow("queue-wf", { prompt: "second", maxRounds: 10, dryRun: false });
+      await vi.runAllTimersAsync();

       const child = mockChildren[0];
       const firstRunId = (child.send as ReturnType<typeof vi.fn>).mock.calls[0][0].runId as string;
@@ -325,6 +336,7 @@ describe("WorkflowManager", () => {

       mgr.startWorkflow("wf-a", { prompt: "test", maxRounds: 10, dryRun: false });
       mgr.startWorkflow("wf-b", { prompt: "test", maxRounds: 10, dryRun: false });
+      await vi.runAllTimersAsync();

       // Two distinct workers should have been forked
       expect(mockChildren).toHaveLength(2);

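Note on the behaviour these `WorkflowManager` tests pin down: a per-workflow `concurrency` limit, an `overflow` policy of `"drop"` or `"queue"` (bounded by `maxQueue`), and dequeuing when a thread completes or fails. The sketch below models that admission rule in isolation. It is a simplified assumption, not the code in `workflow-manager.ts`; `createThreadLimiter`, `start` and `finish` are hypothetical names, and here `maxQueue` is required even for `"drop"` to keep the type simple.

```typescript
// Hypothetical admission-control sketch; names and shapes are illustrative only.
type OverflowPolicy = { concurrency: number; overflow: "drop" | "queue"; maxQueue: number };
type StartResult = "dispatched" | "queued" | "dropped";

function createThreadLimiter<T>(policy: OverflowPolicy, dispatch: (item: T) => void) {
  let active = 0;
  const queue: T[] = [];
  return {
    /** Dispatches immediately while under the limit; otherwise queues or drops. */
    start(item: T): StartResult {
      if (active < policy.concurrency) {
        active++;
        dispatch(item);
        return "dispatched";
      }
      if (policy.overflow === "queue" && queue.length < policy.maxQueue) {
        queue.push(item);
        return "queued";
      }
      return "dropped";
    },
    /** Frees one slot (thread completed or failed) and dispatches the next queued item. */
    finish(): void {
      if (active === 0) return;
      active--;
      if (queue.length > 0) {
        const next = queue.shift() as T;
        active++;
        dispatch(next);
      }
    },
    activeCount: (): number => active,
    queueLength: (): number => queue.length,
  };
}

// Usage: concurrency 1 with a queue, as in the "queue-wf" test config.
const dispatched: string[] = [];
const limiter = createThreadLimiter<string>(
  { concurrency: 1, overflow: "queue", maxQueue: 5 },
  (prompt) => {
    dispatched.push(prompt);
  },
);
limiter.start("first");
limiter.start("second");
limiter.finish();
console.log(dispatched, limiter.activeCount(), limiter.queueLength());
```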
@@ -1,9 +0,0 @@
-import type { AgentConfig, AgentFn, ThreadContext } from "@uncaged/nerve-core";
-
-/**
- * Echo adapter (`type: "echo"`) — returns the assembled prompt unchanged.
- * Used for tests and dry-run wiring before real adapters exist.
- */
-export function createEchoAgent(_config: AgentConfig): AgentFn {
-  return async (_ctx: ThreadContext, prompt: string) => prompt;
-}
@@ -56,5 +56,3 @@ export type {

 export { createWorkflowManager } from "./workflow-manager.js";
 export type { WorkflowManager } from "./workflow-manager.js";
-
-export { createEchoAgent } from "./agent-adapters/echo.js";

@@ -25,7 +25,7 @@ import type { WorkerToParentMessage } from "./ipc.js";
 import { parseParentMessage } from "./ipc.js";
 import { executeCompute, loadSenseModule, openSenseDb } from "./sense-runtime.js";
 import type { SenseRuntime } from "./sense-runtime.js";
-import { ignoreSessionBroadcastSignals } from "./worker-fork-support.js";
+import { ignoreSessionBroadcastSignals } from "./worker-signals.js";

 // ---------------------------------------------------------------------------
 // IPC helpers

@@ -1,45 +0,0 @@
-import type { ChildProcess } from "node:child_process";
-
-const STDERR_TAIL_MAX_CHARS = 16_384;
-
-/**
- * Forked workers inherit the parent's process group. In foreground `nerve dev`,
- * terminal-driven SIGINT/SIGTERM is delivered to the whole group, so workers can exit
- * on the default handler before the kernel sends `{ type: "shutdown" }` over IPC.
- * Swallow these in worker processes so the parent coordinates shutdown (issue #55).
- * Only call when `process.send` is defined (fork IPC); standalone `node …-worker.js` keeps default Ctrl+C behaviour.
- */
-export function ignoreSessionBroadcastSignals(): void {
-  const swallow = (): void => {};
-  process.on("SIGINT", swallow);
-  process.on("SIGTERM", swallow);
-}
-
-export function teeCapturedStderr(child: ChildProcess, tail: { value: string }): void {
-  const stream = child.stderr;
-  if (stream === null || stream === undefined) return;
-  stream.setEncoding("utf8");
-  stream.on("data", (chunk: string | Buffer) => {
-    const text = typeof chunk === "string" ? chunk : chunk.toString("utf8");
-    process.stderr.write(text);
-    tail.value = (tail.value + text).slice(-STDERR_TAIL_MAX_CHARS);
-  });
-}
-
-export function formatChildExitSummary(code: number | null, signal: NodeJS.Signals | null): string {
-  const codeStr = code === null || code === undefined ? "null" : String(code);
-  if (signal) {
-    return `code=${codeStr} signal=${signal}`;
-  }
-  return `code=${codeStr}`;
-}
-
-export function formatCapturedStderrTail(tail: string, maxChars = 800): string {
-  const trimmed = tail.trim();
-  if (trimmed.length === 0) return "";
-  const normalized = trimmed.replace(/\r?\n/g, "\\n");
-  if (normalized.length <= maxChars) {
-    return ` worker_stderr=${normalized}`;
-  }
-  return ` worker_stderr=…${normalized.slice(-maxChars)}`;
-}
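Note on the stderr handling in the file removed above: `teeCapturedStderr` keeps only the newest `STDERR_TAIL_MAX_CHARS` characters, and `formatCapturedStderrTail` flattens that tail into a single log field. The sketch below isolates both steps as pure functions so the truncation is easy to see. `appendToTail` and `formatTailForLog` are hypothetical names introduced here; the second mirrors `formatCapturedStderrTail` as shown in the diff, and the tiny cap in the usage lines is chosen only for demonstration.

```typescript
// Illustrative sketch; `appendToTail` and `formatTailForLog` are hypothetical names.

/** Appends a chunk and keeps only the newest `maxChars` characters (maxChars must be > 0). */
function appendToTail(tail: string, chunk: string, maxChars: number): string {
  return (tail + chunk).slice(-maxChars);
}

/** Flattens newlines to a literal "\n" and truncates from the front, as the removed helper did. */
function formatTailForLog(tail: string, maxChars: number = 800): string {
  const trimmed = tail.trim();
  if (trimmed.length === 0) return "";
  const normalized = trimmed.replace(/\r?\n/g, "\\n");
  if (normalized.length <= maxChars) return ` worker_stderr=${normalized}`;
  return ` worker_stderr=…${normalized.slice(-maxChars)}`;
}

// Usage: feed stderr chunks through a small cap, then format for a one-line log entry.
let stderrTail = "";
for (const chunk of ["first line\n", "second line\n", "fatal: boom\n"]) {
  stderrTail = appendToTail(stderrTail, chunk, 24);
}
console.log(formatTailForLog(stderrTail));
```

Capping by characters on every chunk bounds memory no matter how much a worker writes, at the cost of possibly cutting the oldest retained line in half.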
@@ -1,19 +1,16 @@
 /**
- * Sense worker pool — forked child processes per sense group (IPC lifecycle).
+ * Sense worker pool — thin wrapper around WorkerRuntime (RFC-006): one fork per sense group.
  */

-import { fork } from "node:child_process";
-import type { ChildProcess } from "node:child_process";
 import { dirname, join } from "node:path";
 import { fileURLToPath } from "node:url";

-import type { ComputeMessage, ShutdownMessage } from "./ipc.js";
-import { parseWorkerMessage } from "./ipc.js";
+import type { ComputeMessage } from "./ipc.js";
 import {
+  createWorkerRuntime,
   formatCapturedStderrTail,
   formatChildExitSummary,
-  teeCapturedStderr,
-} from "./worker-fork-support.js";
+} from "./worker-runtime.js";

 export function resolveWorkerScript(): string {
   const __filename = fileURLToPath(import.meta.url);
@@ -21,17 +18,12 @@ export function resolveWorkerScript(): string {
|
||||
return join(__dir, "sense-worker.js");
|
||||
}
|
||||
|
||||
type WorkerEntry = {
|
||||
group: string;
|
||||
process: ChildProcess;
|
||||
};
|
||||
|
||||
export type SenseWorkerPoolOptions = {
|
||||
nerveRoot: string;
|
||||
workerScript: string;
|
||||
/** Invoked for every IPC message from a worker (including ready / signal / error). */
|
||||
onWorkerMessage: (raw: unknown) => void;
|
||||
/** Sense names in a group — used when clearing scheduler state on crash or restart. */
|
||||
/** Sense names in a group — reserved for scheduler-aligned cleanup (kernel passes current config). */
|
||||
sensesForGroup: (group: string) => string[];
|
||||
/**
|
||||
* Called when a worker exits with non-zero code before scheduling a respawn
|
||||
@@ -58,144 +50,107 @@ export type SenseWorkerPool = {
|
||||
activeGroupCount: () => number;
|
||||
};
|
||||
|
||||
function spawnWorker(
|
||||
nerveRoot: string,
|
||||
group: string,
|
||||
workerScript: string,
|
||||
stderrTail: { value: string },
|
||||
): ChildProcess {
|
||||
const child = fork(workerScript, ["--group", group, "--root", nerveRoot], {
|
||||
stdio: ["ignore", "inherit", "pipe", "ipc"],
|
||||
});
|
||||
teeCapturedStderr(child, stderrTail);
|
||||
child.on("error", (err) => {
|
||||
if ((err as NodeJS.ErrnoException).code !== "EPIPE") {
|
||||
console.error("[worker] error:", err.message);
|
||||
}
|
||||
});
|
||||
return child;
|
||||
}
|
||||
|
||||
function sendComputeToProcess(worker: ChildProcess, senseName: string): void {
|
||||
if (worker.connected === false) return;
|
||||
const msg: ComputeMessage = { type: "compute", sense: senseName };
|
||||
try {
|
||||
worker.send(msg);
|
||||
} catch {
|
||||
// IPC channel closed between connected check and send
|
||||
}
|
||||
}
|
||||
|
||||
function sendShutdownToProcess(worker: ChildProcess): void {
|
||||
if (worker.connected === false) return;
|
||||
const msg: ShutdownMessage = { type: "shutdown" };
|
||||
try {
|
||||
worker.send(msg);
|
||||
} catch {
|
||||
// IPC channel closed between connected check and send
|
||||
}
|
||||
}
|
||||
|
||||
function waitForExit(child: ChildProcess, timeoutMs: number): Promise<void> {
|
||||
return new Promise((resolve) => {
|
||||
const timer = setTimeout(() => {
|
||||
child.kill("SIGKILL");
|
||||
resolve();
|
||||
}, timeoutMs);
|
||||
child.once("exit", () => {
|
||||
clearTimeout(timer);
|
||||
resolve();
|
||||
});
|
||||
});
|
||||
}
|
||||
/** Matches legacy pool: long crash window, 1s respawn delay, practical unlimited respawns. */
|
||||
const SENSE_WORKER_RESPAWN = {
|
||||
enabled: true,
|
||||
maxCrashes: 100_000,
|
||||
windowMs: 86_400_000,
|
||||
delayMs: 1000,
|
||||
} as const;
|
||||
|
||||
export function createSenseWorkerPool(options: SenseWorkerPoolOptions): SenseWorkerPool {
|
||||
const workers = new Map<string, WorkerEntry>();
|
||||
|
||||
function startWorker(group: string): Promise<void> {
|
||||
const stderrTail = { value: "" };
|
||||
const child = spawnWorker(options.nerveRoot, group, options.workerScript, stderrTail);
|
||||
|
||||
let workerReadyResolve: (() => void) | undefined;
|
||||
const workerReady = new Promise<void>((resolve) => {
|
||||
workerReadyResolve = resolve;
|
||||
});
|
||||
|
||||
child.on("message", (raw: unknown) => {
|
||||
const result = parseWorkerMessage(raw);
|
||||
if (result.ok && result.value.type === "ready") {
|
||||
workerReadyResolve?.();
|
||||
}
|
||||
const runtime = createWorkerRuntime<string>({
|
||||
script: options.workerScript,
|
||||
argsForKey: (group) => ["--group", group, "--root", options.nerveRoot],
|
||||
forwardStderr: true,
|
||||
onMessage: (_key, raw) => {
|
||||
options.onWorkerMessage(raw);
|
||||
});
|
||||
|
||||
child.on("exit", (code, signal) => {
|
||||
const summary = formatChildExitSummary(code, signal ?? null);
|
||||
},
|
||||
onReady: (_key, msg) => {
|
||||
options.onWorkerMessage(msg);
|
||||
},
|
||||
onCrashLimitReached: null,
|
||||
onExit: (group, code, signal) => {
|
||||
const sig =
|
||||
signal === null || signal === undefined || signal === ""
|
||||
? null
|
||||
: (signal as NodeJS.Signals);
|
||||
const summary = formatChildExitSummary(code, sig);
|
||||
process.stderr.write(
|
||||
`[kernel] worker for group "${group}" exited (${summary})${formatCapturedStderrTail(stderrTail.value)}\n`,
|
||||
`[kernel] worker for group "${group}" exited (${summary})${formatCapturedStderrTail(runtime.stderrTail(group))}\n`,
|
||||
);
|
||||
workerReadyResolve?.();
|
||||
if (!options.isStopped() && code !== 0) {
|
||||
process.stderr.write(`[kernel] respawning worker for group "${group}" in 1s\n`);
|
||||
options.onWorkerCrashed(group);
|
||||
setTimeout(() => {
|
||||
if (!options.isStopped()) {
|
||||
startWorker(group);
|
||||
}
|
||||
}, 1000);
|
||||
}
|
||||
});
|
||||
},
|
||||
respawn: {
|
||||
...SENSE_WORKER_RESPAWN,
|
||||
allowRespawn: (_group) => !options.isStopped(),
|
||||
},
|
||||
shutdownTimeoutMs: 5000,
|
||||
});
|
||||
|
||||
workers.set(group, { group, process: child });
|
||||
return workerReady;
|
||||
/** Groups we have ever started — mirrors legacy Map presence for `restartGroup` no-op when unknown. */
|
||||
const trackedGroups = new Set<string>();
|
||||
/** Marks groups mid-evict so `hasWorkerForGroup` drops immediately (legacy synchronous eviction). */
|
||||
const evicting = new Set<string>();
|
||||
|
||||
async function startWorker(group: string): Promise<void> {
|
||||
trackedGroups.add(group);
|
||||
await runtime.start(group);
|
||||
}
|
||||
|
||||
async function restartGroup(group: string): Promise<void> {
|
||||
const entry = workers.get(group);
|
||||
if (entry === undefined) return;
|
||||
|
||||
options.onBeforeGroupRestart(group);
|
||||
|
||||
sendShutdownToProcess(entry.process);
|
||||
await waitForExit(entry.process, 5000);
|
||||
|
||||
if (!options.isStopped()) {
|
||||
await startWorker(group);
|
||||
if (!trackedGroups.has(group)) {
|
||||
return;
|
||||
}
|
||||
options.onBeforeGroupRestart(group);
|
||||
await runtime.drain(group, null);
|
||||
}
|
||||
|
||||
function evictGroup(group: string): void {
|
||||
const entry = workers.get(group);
|
||||
if (entry === undefined) return;
|
||||
sendShutdownToProcess(entry.process);
|
||||
workers.delete(group);
|
||||
trackedGroups.delete(group);
|
||||
evicting.add(group);
|
||||
void runtime.evict(group, null).finally(() => {
|
||||
evicting.delete(group);
|
||||
});
|
||||
}
|
||||
|
||||
async function shutdownAll(): Promise<void> {
|
||||
const exitPromises: Promise<void>[] = [];
|
||||
for (const entry of workers.values()) {
|
||||
sendShutdownToProcess(entry.process);
|
||||
exitPromises.push(waitForExit(entry.process, 5000));
|
||||
}
|
||||
await Promise.all(exitPromises);
|
||||
await runtime.shutdown();
|
||||
trackedGroups.clear();
|
||||
evicting.clear();
|
||||
}
|
||||
|
||||
function sendCompute(group: string, senseName: string): void {
|
||||
const entry = workers.get(group);
|
||||
if (entry === undefined) return;
|
||||
sendComputeToProcess(entry.process, senseName);
|
||||
if (!trackedGroups.has(group) || evicting.has(group)) {
|
||||
return;
|
||||
}
|
||||
// Legacy pool: `child.send` was a no-op when IPC was closed (a cold start with child === null is still allowed).
|
||||
if (runtime.hasDisconnectedChild(group)) {
|
||||
return;
|
||||
}
|
||||
const msg: ComputeMessage = { type: "compute", sense: senseName };
|
||||
if (!runtime.trySendSync(group, msg)) {
|
||||
void runtime.send(group, msg).catch(() => {
|
||||
// IPC channel may close between scheduling and send — same as legacy try/catch on child.send
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
function getWorkerPid(group: string): number | null {
|
||||
return workers.get(group)?.process.pid ?? null;
|
||||
return runtime.pid(group);
|
||||
}
|
||||
|
||||
/** True once `startWorker` has been called for the group and it is not mid-evict (matches legacy Map key). */
|
||||
function hasWorkerForGroup(group: string): boolean {
|
||||
return workers.has(group);
|
||||
return trackedGroups.has(group) && !evicting.has(group);
|
||||
}
|
||||
|
||||
/** Count of sense groups with a worker slot (includes not-yet-ready), excluding evicted keys. */
|
||||
function activeGroupCount(): number {
|
||||
return workers.size;
|
||||
return trackedGroups.size;
|
||||
}
|
||||
|
||||
return {
|
||||
|
||||
@@ -0,0 +1,440 @@
|
||||
/**
|
||||
* Generic message-routed worker process manager (RFC-006).
|
||||
* One forked Node child per key; cold start, crash respawn, drain/evict, shutdown.
|
||||
*/
|
||||
|
||||
import { type ChildProcess, type Serializable, fork } from "node:child_process";
|
||||
import { isPlainRecord } from "@uncaged/nerve-core";
|
||||
|
||||
const STDERR_TAIL_MAX_CHARS = 2048;
|
||||
|
||||
export function formatChildExitSummary(code: number | null, signal: NodeJS.Signals | null): string {
|
||||
const codeStr = code === null || code === undefined ? "null" : String(code);
|
||||
if (signal) {
|
||||
return `code=${codeStr} signal=${signal}`;
|
||||
}
|
||||
return `code=${codeStr}`;
|
||||
}
|
||||
|
||||
export function formatCapturedStderrTail(tail: string, maxChars = 800): string {
|
||||
const trimmed = tail.trim();
|
||||
if (trimmed.length === 0) return "";
|
||||
const normalized = trimmed.replace(/\r?\n/g, "\\n");
|
||||
if (normalized.length <= maxChars) {
|
||||
return ` worker_stderr=${normalized}`;
|
||||
}
|
||||
return ` worker_stderr=…${normalized.slice(-maxChars)}`;
|
||||
}
|
||||
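As a rough illustration of how the two formatters above combine into the exit log line, the sketch below re-declares them locally so it runs standalone. The signal type is loosened to `string | null` to avoid a Node type dependency, and `describeExit` and the group name `"vision"` are hypothetical, not part of this diff.

```ts
// Local copies of the helpers from this hunk so the sketch runs standalone.
function formatChildExitSummary(code: number | null, signal: string | null): string {
  const codeStr = code === null ? "null" : String(code);
  return signal ? `code=${codeStr} signal=${signal}` : `code=${codeStr}`;
}

function formatCapturedStderrTail(tail: string, maxChars = 800): string {
  const trimmed = tail.trim();
  if (trimmed.length === 0) return "";
  // Newlines are escaped so the tail stays on a single log line.
  const normalized = trimmed.replace(/\r?\n/g, "\\n");
  if (normalized.length <= maxChars) return ` worker_stderr=${normalized}`;
  // Keep only the last maxChars characters, marked with a leading ellipsis.
  return ` worker_stderr=…${normalized.slice(-maxChars)}`;
}

// Hypothetical helper: builds the kind of line the kernel writes when a worker exits.
function describeExit(group: string, code: number | null, signal: string | null, tail: string): string {
  const summary = formatChildExitSummary(code, signal);
  return `[kernel] worker for group "${group}" exited (${summary})${formatCapturedStderrTail(tail)}`;
}

const line = describeExit("vision", 1, null, "Error: boom\n    at main\n");
const killed = formatChildExitSummary(null, "SIGKILL");
const truncated = formatCapturedStderrTail("abcdef", 3);
```

An empty or whitespace-only tail contributes nothing to the line, so a clean exit stays short.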
|
||||
export type WorkerDrainOpts = {
|
||||
shutdownTimeoutMs: number | null;
|
||||
};
|
||||
|
||||
export type WorkerRuntimeConfig<K extends string> = {
|
||||
script: string;
|
||||
argsForKey: (key: K) => string[];
|
||||
/** When false, stderr is not captured into `stderrTail` (e.g. tests without a pipe). */
|
||||
forwardStderr: boolean;
|
||||
onMessage: (key: K, msg: unknown) => void;
|
||||
onReady: (key: K, msg: unknown) => void;
|
||||
onExit: (key: K, code: number | null, signal: string | null) => void;
|
||||
/** Invoked when automatic respawn is skipped because `maxCrashes` was exceeded in `windowMs`. */
|
||||
onCrashLimitReached: ((key: K) => void) | null;
|
||||
respawn: {
|
||||
enabled: boolean;
|
||||
maxCrashes: number;
|
||||
windowMs: number;
|
||||
delayMs: number;
|
||||
/** When non-null, return false to skip automatic respawn after an unexpected exit. */
|
||||
allowRespawn: ((key: K) => boolean) | null;
|
||||
};
|
||||
shutdownTimeoutMs: number;
|
||||
};
|
||||
|
||||
export type WorkerRuntime<K extends string> = {
|
||||
send: (key: K, msg: unknown) => Promise<void>;
|
||||
/** When the worker is already ready and IPC-connected, sends synchronously (returns true). Otherwise false — caller may fall back to `send`. */
|
||||
trySendSync: (key: K, msg: unknown) => boolean;
|
||||
start: (key: K) => Promise<void>;
|
||||
evict: (key: K, opts: WorkerDrainOpts | null) => Promise<void>;
|
||||
drain: (key: K, opts: WorkerDrainOpts | null) => Promise<void>;
|
||||
shutdown: () => Promise<void>;
|
||||
has: (key: K) => boolean;
|
||||
/** True when a child exists but IPC is disconnected (legacy pool skipped sends in this case). */
|
||||
hasDisconnectedChild: (key: K) => boolean;
|
||||
pid: (key: K) => number | null;
|
||||
keys: () => K[];
|
||||
stderrTail: (key: K) => string;
|
||||
};
|
||||
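The `trySendSync` / `send` pair suggests a fast-path-then-fallback pattern, which the sense pool's `sendCompute` appears to follow. A minimal sketch, assuming only the two method signatures from the type above: `Sender`, `sendBestEffort`, and the fake runtime are hypothetical names used for illustration.

```ts
// Minimal slice of WorkerRuntime needed for the pattern.
type Sender<K extends string> = {
  trySendSync: (key: K, msg: unknown) => boolean;
  send: (key: K, msg: unknown) => Promise<void>;
};

// Fast path when the worker is ready; otherwise queue behind a cold start
// and swallow send failures, since the IPC channel may close in between.
function sendBestEffort<K extends string>(runtime: Sender<K>, key: K, msg: unknown): "sync" | "queued" {
  if (runtime.trySendSync(key, msg)) return "sync";
  void runtime.send(key, msg).catch(() => {
    // Intentionally ignored: best-effort delivery.
  });
  return "queued";
}

// Fake runtime for illustration: only the key "warm" counts as ready.
const calls: string[] = [];
const fake: Sender<string> = {
  trySendSync: (key) => {
    calls.push(`try:${key}`);
    return key === "warm";
  },
  send: async (key) => {
    calls.push(`send:${key}`);
  },
};

const warm = sendBestEffort(fake, "warm", { type: "compute", sense: "a" });
const cold = sendBestEffort(fake, "cold", { type: "compute", sense: "a" });
```

The design trade-off is that the caller never learns whether a queued message was delivered, which matches a fire-and-forget compute request but would not suit messages that need acknowledgement.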
|
||||
type WorkerMachineState = "stopped" | "starting" | "ready" | "draining";
|
||||
|
||||
type ReadyWaiter = {
|
||||
resolve: () => void;
|
||||
reject: (err: Error) => void;
|
||||
};
|
||||
|
||||
/** Internal: one forked process slot (ManagedWorker). */
|
||||
type WorkerSlot<K extends string> = {
|
||||
key: K;
|
||||
state: WorkerMachineState;
|
||||
child: ChildProcess | null;
|
||||
pid: number | null;
|
||||
stderrTail: string;
|
||||
crashTimestamps: number[];
|
||||
expectExit: boolean;
|
||||
readyWaiters: ReadyWaiter[];
|
||||
opChain: Promise<void>;
|
||||
};
|
||||
|
||||
function isReadyIpcMessage(raw: unknown): boolean {
|
||||
return isPlainRecord(raw) && raw.type === "ready";
|
||||
}
|
||||
|
||||
function signalToString(signal: NodeJS.Signals | null): string | null {
|
||||
if (signal === null) {
|
||||
return null;
|
||||
}
|
||||
return String(signal);
|
||||
}
|
||||
|
||||
function attachStderrTail<K extends string>(child: ChildProcess, slot: WorkerSlot<K>): void {
|
||||
const stream = child.stderr;
|
||||
if (stream == null) {
|
||||
return;
|
||||
}
|
||||
stream.setEncoding("utf8");
|
||||
stream.on("data", (chunk: string | Buffer) => {
|
||||
const text = typeof chunk === "string" ? chunk : chunk.toString("utf8");
|
||||
slot.stderrTail = (slot.stderrTail + text).slice(-STDERR_TAIL_MAX_CHARS);
|
||||
});
|
||||
}
|
||||
|
||||
function enqueueOp<K extends string>(slot: WorkerSlot<K>, fn: () => Promise<void>): Promise<void> {
|
||||
const run = slot.opChain.then(fn, fn);
|
||||
slot.opChain = run.then(
|
||||
() => {},
|
||||
() => {},
|
||||
);
|
||||
return run;
|
||||
}
|
||||
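The effect of `enqueueOp` is easiest to see in isolation. The sketch below copies its shape onto a stripped-down `Slot` type (an assumption; the real `WorkerSlot` has more fields) and shows that a second operation runs only after the first settles, even when the first rejects.

```ts
type Slot = { opChain: Promise<void> };

// Same shape as enqueueOp in this hunk: run fn after the previous op settles,
// whether it resolved or rejected, and keep the stored chain non-rejecting.
function enqueueOp(slot: Slot, fn: () => Promise<void>): Promise<void> {
  const run = slot.opChain.then(fn, fn);
  slot.opChain = run.then(
    () => {},
    () => {},
  );
  return run;
}

const slot: Slot = { opChain: Promise.resolve() };
const order: string[] = [];

// First op yields once, then fails (standing in for a failed fork).
const first = enqueueOp(slot, async () => {
  order.push("start:begin");
  await Promise.resolve();
  order.push("start:end");
  throw new Error("fork failed");
});

// Second op is queued immediately but must not interleave with the first.
const second = enqueueOp(slot, async () => {
  order.push("drain");
});

// Collects the first op's outcome followed by the observed execution order.
const done: Promise<string[]> = first
  .then(
    () => "first ok",
    () => "first failed",
  )
  .then((firstOutcome) => second.then(() => [firstOutcome, ...order]));
```

Callers still see the rejection on the promise returned for their own operation; only the shared chain swallows it so later operations are not poisoned.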
|
||||
function resolveReadyWaiters<K extends string>(slot: WorkerSlot<K>): void {
|
||||
const waiters = slot.readyWaiters;
|
||||
slot.readyWaiters = [];
|
||||
for (const w of waiters) {
|
||||
w.resolve();
|
||||
}
|
||||
}
|
||||
|
||||
function rejectReadyWaiters<K extends string>(slot: WorkerSlot<K>, err: Error): void {
|
||||
const waiters = slot.readyWaiters;
|
||||
slot.readyWaiters = [];
|
||||
for (const w of waiters) {
|
||||
w.reject(err);
|
||||
}
|
||||
}
|
||||
|
||||
function waitForReady<K extends string>(
|
||||
slot: WorkerSlot<K>,
|
||||
shutdownTimeoutMs: number,
|
||||
): Promise<void> {
|
||||
if (slot.state === "ready" && slot.child !== null && slot.child.connected) {
|
||||
return Promise.resolve();
|
||||
}
|
||||
return new Promise((resolve, reject) => {
|
||||
let settled = false;
|
||||
const timer = setTimeout(() => {
|
||||
if (!settled) {
|
||||
settled = true;
|
||||
reject(new Error(`Worker "${String(slot.key)}" ready timeout`));
|
||||
}
|
||||
}, shutdownTimeoutMs);
|
||||
slot.readyWaiters.push({
|
||||
resolve: () => {
|
||||
if (settled) {
|
||||
return;
|
||||
}
|
||||
settled = true;
|
||||
clearTimeout(timer);
|
||||
resolve();
|
||||
},
|
||||
reject: (err: Error) => {
|
||||
if (settled) {
|
||||
return;
|
||||
}
|
||||
settled = true;
|
||||
clearTimeout(timer);
|
||||
reject(err);
|
||||
},
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
async function waitForChildExit(child: ChildProcess, timeoutMs: number): Promise<void> {
|
||||
await new Promise<void>((resolve) => {
|
||||
const timer = setTimeout(() => {
|
||||
child.kill("SIGKILL");
|
||||
}, timeoutMs);
|
||||
child.once("exit", () => {
|
||||
clearTimeout(timer);
|
||||
resolve();
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
export function createWorkerRuntime<K extends string>(
|
||||
config: WorkerRuntimeConfig<K>,
|
||||
): WorkerRuntime<K> {
|
||||
const workers = new Map<K, WorkerSlot<K>>();
|
||||
|
||||
function getOrCreateSlot(key: K): WorkerSlot<K> {
|
||||
let slot = workers.get(key);
|
||||
if (slot === undefined) {
|
||||
slot = {
|
||||
key,
|
||||
state: "stopped",
|
||||
child: null,
|
||||
pid: null,
|
||||
stderrTail: "",
|
||||
crashTimestamps: [],
|
||||
expectExit: false,
|
||||
readyWaiters: [],
|
||||
opChain: Promise.resolve(),
|
||||
};
|
||||
workers.set(key, slot);
|
||||
}
|
||||
return slot;
|
||||
}
|
||||
|
||||
function handleWorkerMessage(slot: WorkerSlot<K>, msg: unknown): void {
|
||||
if (isReadyIpcMessage(msg)) {
|
||||
if (slot.state === "starting") {
|
||||
slot.state = "ready";
|
||||
config.onReady(slot.key, msg);
|
||||
resolveReadyWaiters(slot);
|
||||
}
|
||||
return;
|
||||
}
|
||||
config.onMessage(slot.key, msg);
|
||||
}
|
||||
|
||||
function onChildExit(
|
||||
slot: WorkerSlot<K>,
|
||||
code: number | null,
|
||||
signal: NodeJS.Signals | null,
|
||||
): void {
|
||||
config.onExit(slot.key, code, signalToString(signal));
|
||||
|
||||
if (slot.child !== null) {
|
||||
slot.child.removeAllListeners("message");
|
||||
slot.child.removeAllListeners("exit");
|
||||
}
|
||||
|
||||
const wasExpect = slot.expectExit;
|
||||
slot.expectExit = false;
|
||||
|
||||
slot.child = null;
|
||||
slot.pid = null;
|
||||
|
||||
if (wasExpect) {
|
||||
slot.state = "stopped";
|
||||
return;
|
||||
}
|
||||
|
||||
rejectReadyWaiters(slot, new Error(`Worker "${String(slot.key)}" exited unexpectedly`));
|
||||
slot.state = "stopped";
|
||||
|
||||
void enqueueOp(slot, async () => {
|
||||
await handleUnexpectedCrashRecovery(slot);
|
||||
});
|
||||
}
|
||||
|
||||
function registerChild(slot: WorkerSlot<K>, child: ChildProcess): void {
|
||||
slot.child = child;
|
||||
slot.pid = child.pid ?? null;
|
||||
if (config.forwardStderr) {
|
||||
attachStderrTail(child, slot);
|
||||
}
|
||||
child.on("message", (msg: unknown) => {
|
||||
handleWorkerMessage(slot, msg);
|
||||
});
|
||||
child.on("exit", (code, sig) => {
|
||||
onChildExit(slot, code, sig ?? null);
|
||||
});
|
||||
}
|
||||
|
||||
async function forkAndWaitReady(slot: WorkerSlot<K>): Promise<void> {
|
||||
if (slot.state === "ready" && slot.child !== null && slot.child.connected) {
|
||||
return;
|
||||
}
|
||||
|
||||
slot.state = "starting";
|
||||
|
||||
let child: ChildProcess;
|
||||
try {
|
||||
child = fork(config.script, config.argsForKey(slot.key), {
|
||||
stdio: ["ignore", "inherit", "pipe", "ipc"],
|
||||
env: process.env,
|
||||
});
|
||||
} catch (e) {
|
||||
slot.state = "stopped";
|
||||
const err = e instanceof Error ? e : new Error(String(e));
|
||||
rejectReadyWaiters(slot, err);
|
||||
throw err;
|
||||
}
|
||||
|
||||
registerChild(slot, child);
|
||||
await waitForReady(slot, config.shutdownTimeoutMs);
|
||||
}
|
||||
|
||||
function resolveShutdownTimeoutMs(opts: WorkerDrainOpts | null): number {
|
||||
if (opts !== null && opts.shutdownTimeoutMs !== null) {
|
||||
return opts.shutdownTimeoutMs;
|
||||
}
|
||||
return config.shutdownTimeoutMs;
|
||||
}
|
||||
|
||||
async function gracefulStop(slot: WorkerSlot<K>, shutdownTimeoutMs: number): Promise<void> {
|
||||
if (slot.child === null) {
|
||||
return;
|
||||
}
|
||||
slot.expectExit = true;
|
||||
slot.state = "draining";
|
||||
const child = slot.child;
|
||||
try {
|
||||
child.send({ type: "shutdown" });
|
||||
} catch {
|
||||
// IPC channel may have closed between null-check and send
|
||||
}
|
||||
await waitForChildExit(child, shutdownTimeoutMs);
|
||||
}
|
||||
|
||||
async function handleUnexpectedCrashRecovery(slot: WorkerSlot<K>): Promise<void> {
|
||||
if (!config.respawn.enabled) {
|
||||
return;
|
||||
}
|
||||
if (config.respawn.allowRespawn !== null && !config.respawn.allowRespawn(slot.key)) {
|
||||
return;
|
||||
}
|
||||
|
||||
const now = Date.now();
|
||||
slot.crashTimestamps.push(now);
|
||||
slot.crashTimestamps = slot.crashTimestamps.filter((t) => now - t <= config.respawn.windowMs);
|
||||
|
||||
if (slot.crashTimestamps.length >= config.respawn.maxCrashes) {
|
||||
console.error(
|
||||
`[WorkerRuntime] worker "${String(slot.key)}" exceeded crash limit (${String(config.respawn.maxCrashes)} in ${String(config.respawn.windowMs)}ms); not respawning`,
|
||||
);
|
||||
if (config.onCrashLimitReached !== null) {
|
||||
config.onCrashLimitReached(slot.key);
|
||||
}
|
||||
return;
|
||||
}
|
||||
|
||||
await new Promise<void>((resolve) => setTimeout(resolve, config.respawn.delayMs));
|
||||
await forkAndWaitReady(slot);
|
||||
}
|
||||
|
||||
async function shutdownWorker(slot: WorkerSlot<K>): Promise<void> {
|
||||
await gracefulStop(slot, config.shutdownTimeoutMs);
|
||||
workers.delete(slot.key);
|
||||
}
|
||||
|
||||
function isActive(slot: WorkerSlot<K>): boolean {
|
||||
return slot.state === "ready" && slot.child !== null && slot.child.connected;
|
||||
}
|
||||
|
||||
return {
|
||||
send: async (key: K, msg: unknown) => {
|
||||
const slot = getOrCreateSlot(key);
|
||||
await enqueueOp(slot, async () => {
|
||||
await forkAndWaitReady(slot);
|
||||
const child = slot.child;
|
||||
if (child === null || !child.connected) {
|
||||
throw new Error(`Worker "${String(key)}" is not connected`);
|
||||
}
|
||||
child.send(msg as Serializable);
|
||||
});
|
||||
},
|
||||
|
||||
trySendSync: (key: K, msg: unknown): boolean => {
|
||||
const slot = workers.get(key);
|
||||
if (slot === undefined || !isActive(slot)) {
|
||||
return false;
|
||||
}
|
||||
const child = slot.child;
|
||||
if (child === null || !child.connected) {
|
||||
return false;
|
||||
}
|
||||
try {
|
||||
child.send(msg as Serializable);
|
||||
return true;
|
||||
} catch {
|
||||
return false;
|
||||
}
|
||||
},
|
||||
|
||||
start: async (key: K) => {
|
||||
const slot = getOrCreateSlot(key);
|
||||
await enqueueOp(slot, async () => {
|
||||
await forkAndWaitReady(slot);
|
||||
});
|
||||
},
|
||||
|
||||
evict: async (key: K, opts: WorkerDrainOpts | null) => {
|
||||
const slot = getOrCreateSlot(key);
|
||||
const shutdownMs = resolveShutdownTimeoutMs(opts);
|
||||
await enqueueOp(slot, async () => {
|
||||
await gracefulStop(slot, shutdownMs);
|
||||
workers.delete(key);
|
||||
});
|
||||
},
|
||||
|
||||
drain: async (key: K, opts: WorkerDrainOpts | null) => {
|
||||
const slot = getOrCreateSlot(key);
|
||||
const shutdownMs = resolveShutdownTimeoutMs(opts);
|
||||
await enqueueOp(slot, async () => {
|
||||
if (slot.child === null) {
|
||||
await forkAndWaitReady(slot);
|
||||
return;
|
||||
}
|
||||
await gracefulStop(slot, shutdownMs);
|
||||
await forkAndWaitReady(slot);
|
||||
});
|
||||
},
|
||||
|
||||
shutdown: async () => {
|
||||
const snapshot = [...workers.values()];
|
||||
await Promise.all(snapshot.map((slot) => enqueueOp(slot, () => shutdownWorker(slot))));
|
||||
},
|
||||
|
||||
has: (key: K) => {
|
||||
const slot = workers.get(key);
|
||||
return slot !== undefined && isActive(slot);
|
||||
},
|
||||
|
||||
hasDisconnectedChild: (key: K): boolean => {
|
||||
const slot = workers.get(key);
|
||||
if (slot === undefined || slot.child === null) {
|
||||
return false;
|
||||
}
|
||||
return !slot.child.connected;
|
||||
},
|
||||
|
||||
pid: (key: K) => {
|
||||
const slot = workers.get(key);
|
||||
if (slot === undefined || !isActive(slot) || slot.pid === null) {
|
||||
return null;
|
||||
}
|
||||
return slot.pid;
|
||||
},
|
||||
|
||||
keys: () => [...workers.values()].filter((slot) => isActive(slot)).map((slot) => slot.key),
|
||||
|
||||
stderrTail: (key: K) => {
|
||||
const slot = workers.get(key);
|
||||
return slot === undefined ? "" : slot.stderrTail;
|
||||
},
|
||||
};
|
||||
}
|
||||
@@ -0,0 +1,17 @@
|
||||
/**
|
||||
* Worker-process signal handling (fork IPC children only).
|
||||
* Worker entrypoints import this module — not worker-runtime.ts (parent/kernel code).
|
||||
*/
|
||||
|
||||
/**
|
||||
* Forked workers inherit the parent's process group. In foreground `nerve dev`,
|
||||
* terminal-driven SIGINT/SIGTERM is delivered to the whole group, so workers can exit
|
||||
* on the default handler before the kernel sends `{ type: "shutdown" }` over IPC.
|
||||
* Swallow these in worker processes so the parent coordinates shutdown (issue #55).
|
||||
* Only call when `process.send` is defined (fork IPC); standalone `node …-worker.js` keeps default Ctrl+C behaviour.
|
||||
*/
|
||||
export function ignoreSessionBroadcastSignals(): void {
|
||||
const swallow = (): void => {};
|
||||
process.on("SIGINT", swallow);
|
||||
process.on("SIGTERM", swallow);
|
||||
}
|
||||
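The doc comment above says the handlers should only be installed under fork IPC. One way a worker entrypoint might guard the call is sketched below; `SignalTarget` and `ignoreSignalsIfForked` are hypothetical, and fakes stand in for `process` so the block runs anywhere.

```ts
// Minimal slice of `process` that the guard needs.
type SignalTarget = {
  send?: unknown;
  on: (event: "SIGINT" | "SIGTERM", handler: () => void) => unknown;
};

// Installs no-op handlers only when an IPC channel exists (the process was forked).
function ignoreSignalsIfForked(proc: SignalTarget): boolean {
  if (typeof proc.send !== "function") return false;
  const swallow = (): void => {};
  proc.on("SIGINT", swallow);
  proc.on("SIGTERM", swallow);
  return true;
}

// Fakes standing in for `process` in a forked worker and in a standalone run.
const registered: string[] = [];
const forked: SignalTarget = { send: () => true, on: (event) => registered.push(event) };
const standalone: SignalTarget = { on: (event) => registered.push(event) };

const forkedResult = ignoreSignalsIfForked(forked);
const standaloneResult = ignoreSignalsIfForked(standalone);
```

In a real entrypoint the argument would presumably be `process` itself, assuming its type is compatible with this narrowed interface.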
@@ -0,0 +1,256 @@
|
||||
/**
|
||||
* Pure helpers and IPC branching for workflow-manager (keeps workflow-manager.ts lean).
|
||||
*/
|
||||
|
||||
import { dirname, join } from "node:path";
|
||||
import { fileURLToPath } from "node:url";
|
||||
|
||||
import type { WorkflowMessage } from "@uncaged/nerve-core";
|
||||
import { START, isPlainRecord } from "@uncaged/nerve-core";
|
||||
|
||||
import type { LogStore, WorkflowRunStatus } from "@uncaged/nerve-store";
|
||||
import type { ResumeThreadMessage, ThreadEventMessage } from "./ipc.js";
|
||||
import type { WorkerToParentMessage } from "./ipc.js";
|
||||
|
||||
export type PendingThread = {
|
||||
runId: string;
|
||||
prompt: string;
|
||||
maxRounds: number;
|
||||
dryRun: boolean;
|
||||
};
|
||||
|
||||
export type WorkflowState = {
|
||||
active: Set<string>;
|
||||
queue: PendingThread[];
|
||||
};
|
||||
|
||||
/** Matches legacy manager: the sixth crash within 60s stops respawn (legacy check was `length > 5`; the runtime checks `length >= maxCrashes`). */
|
||||
export const WORKFLOW_WORKER_RESPAWN = {
|
||||
enabled: true,
|
||||
maxCrashes: 6,
|
||||
windowMs: 60_000,
|
||||
delayMs: 0,
|
||||
} as const;
|
||||
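To make the `maxCrashes: 6` / `windowMs: 60_000` semantics concrete, here is a small sketch of the sliding-window check that `handleUnexpectedCrashRecovery` performs in the runtime. `recordCrash` and `RespawnLimits` are hypothetical names; the filter and the `>=` comparison follow the runtime code in this diff.

```ts
type RespawnLimits = { maxCrashes: number; windowMs: number };

// Records a crash at `now`, drops timestamps older than the window, and reports
// whether the limit has been reached (same `>=` comparison as the runtime).
function recordCrash(
  timestamps: number[],
  now: number,
  limits: RespawnLimits,
): { timestamps: number[]; blocked: boolean } {
  const recent = [...timestamps, now].filter((t) => now - t <= limits.windowMs);
  return { timestamps: recent, blocked: recent.length >= limits.maxCrashes };
}

const limits: RespawnLimits = { maxCrashes: 6, windowMs: 60_000 };

// Six crashes, one second apart: the first five still respawn, the sixth is blocked.
let history: number[] = [];
const decisions: boolean[] = [];
for (let i = 0; i < 6; i++) {
  const result = recordCrash(history, i * 1_000, limits);
  history = result.timestamps;
  decisions.push(result.blocked);
}

// A crash ten minutes later falls outside the window and only counts itself.
const later = recordCrash(history, 10 * 60_000, limits);
```

With these limits a worker gets five automatic respawns per minute; a sixth crash inside the window leaves it stopped until something else restarts it.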
|
||||
/**
|
||||
* Worker shutdown timeout — must stay in sync with SHUTDOWN_TIMEOUT_MS in workflow-worker.ts.
|
||||
* The drain timeout passed to drainAndRespawn must be >= this value so the worker has
|
||||
* enough time to finish in-flight threads before the parent force-kills it.
|
||||
*/
|
||||
export const WORKER_SHUTDOWN_TIMEOUT_MS = 10_000;
|
||||
|
||||
export const DEFAULT_MAX_QUEUE = 100;
|
||||
|
||||
export function readLaunchFromTriggerPayload(
|
||||
raw: unknown,
|
||||
engineDefaultMaxRounds: number,
|
||||
): { prompt: string; maxRounds: number; dryRun: boolean } {
|
||||
if (isPlainRecord(raw)) {
|
||||
const o = raw;
|
||||
if (typeof o.prompt === "string" && typeof o.maxRounds === "number") {
|
||||
const dryRun = typeof o.dryRun === "boolean" ? o.dryRun : false;
|
||||
return { prompt: o.prompt, maxRounds: o.maxRounds, dryRun };
|
||||
}
|
||||
}
|
||||
return { prompt: "", maxRounds: engineDefaultMaxRounds, dryRun: false };
|
||||
}
|
||||
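A short usage sketch of the fallback behaviour of `readLaunchFromTriggerPayload`. The function is re-declared in condensed form, and `isPlainRecord` is a local stand-in whose behaviour (non-null, non-array object) is an assumption about the real helper in `@uncaged/nerve-core`.

```ts
// Stand-in for `isPlainRecord` from @uncaged/nerve-core (assumed behaviour).
function isPlainRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

type Launch = { prompt: string; maxRounds: number; dryRun: boolean };

// Condensed copy of the function from this hunk.
function readLaunchFromTriggerPayload(raw: unknown, engineDefaultMaxRounds: number): Launch {
  if (isPlainRecord(raw) && typeof raw.prompt === "string" && typeof raw.maxRounds === "number") {
    const dryRun = typeof raw.dryRun === "boolean" ? raw.dryRun : false;
    return { prompt: raw.prompt, maxRounds: raw.maxRounds, dryRun };
  }
  // Any payload missing `prompt` or `maxRounds` falls back as a whole.
  return { prompt: "", maxRounds: engineDefaultMaxRounds, dryRun: false };
}

const complete = readLaunchFromTriggerPayload({ prompt: "summarise", maxRounds: 3, dryRun: true }, 10);
const missingRounds = readLaunchFromTriggerPayload({ prompt: "summarise" }, 10);
const notAnObject = readLaunchFromTriggerPayload("summarise", 10);
```

Note that a payload with a valid `prompt` but no numeric `maxRounds` loses its prompt as well, because the fallback replaces the whole launch record rather than individual fields.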
|
||||
export function ensureThreadMessagesWithStart(
|
||||
messages: Array<{ role: string; content: string; meta: unknown; timestamp: number }>,
|
||||
threadId: string,
|
||||
fallbackPrompt: string,
|
||||
fallbackMaxRounds: number,
|
||||
): WorkflowMessage[] {
|
||||
const mapped: WorkflowMessage[] = messages.map((m) => ({
|
||||
role: m.role,
|
||||
content: m.content,
|
||||
meta: m.meta,
|
||||
timestamp: m.timestamp,
|
||||
}));
|
||||
if (mapped.length > 0 && mapped[0].role === START) {
|
||||
return mapped;
|
||||
}
|
||||
const start: WorkflowMessage = {
|
||||
role: START,
|
||||
content: fallbackPrompt,
|
||||
meta: { maxRounds: fallbackMaxRounds, threadId },
|
||||
timestamp: Date.now(),
|
||||
};
|
||||
return [start, ...mapped];
|
||||
}
|
||||
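The following sketch shows `ensureThreadMessagesWithStart` repairing a stored thread that lacks its start frame and leaving an already-complete thread alone. The `START` value and the `WorkflowMessage` shape are hypothetical stand-ins, since the real definitions live in `@uncaged/nerve-core` and are not shown in this diff.

```ts
// Hypothetical stand-in for the START role constant; the real value is not shown here.
const START = "__start__";

// Assumed message shape, taken from the fields the function copies.
type WorkflowMessage = { role: string; content: string; meta: unknown; timestamp: number };

// Condensed copy of the function from this hunk.
function ensureThreadMessagesWithStart(
  messages: WorkflowMessage[],
  threadId: string,
  fallbackPrompt: string,
  fallbackMaxRounds: number,
): WorkflowMessage[] {
  const mapped = messages.map((m) => ({ role: m.role, content: m.content, meta: m.meta, timestamp: m.timestamp }));
  if (mapped[0]?.role === START) return mapped;
  // No start frame stored: synthesise one from the trigger payload values.
  const start: WorkflowMessage = {
    role: START,
    content: fallbackPrompt,
    meta: { maxRounds: fallbackMaxRounds, threadId },
    timestamp: Date.now(),
  };
  return [start, ...mapped];
}

const stored: WorkflowMessage[] = [{ role: "planner", content: "step 1", meta: null, timestamp: 1 }];
const repaired = ensureThreadMessagesWithStart(stored, "run-42", "original prompt", 5);
// Running it again is a no-op: the existing start frame wins over the fallbacks.
const untouched = ensureThreadMessagesWithStart(repaired, "run-42", "ignored", 99);
```

Because the check only looks at the first message, the function is idempotent, which matters when crash recovery may run more than once for the same run.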
|
||||
export function resolveWorkflowWorkerScript(): string {
|
||||
const __filename = fileURLToPath(import.meta.url);
|
||||
const __dir = dirname(__filename);
|
||||
return join(__dir, "workflow-worker.js");
|
||||
}
|
||||
|
||||
export function mapWorkflowRunStatus(eventType: string): WorkflowRunStatus | null {
|
||||
const map: Record<string, WorkflowRunStatus> = {
|
||||
started: "started",
|
||||
queued: "queued",
|
||||
completed: "completed",
|
||||
failed: "failed",
|
||||
crashed: "crashed",
|
||||
dropped: "dropped",
|
||||
interrupted: "interrupted",
|
||||
killed: "killed",
|
||||
};
|
||||
return map[eventType] ?? null;
|
||||
}
|
||||
|
||||
export function extractExitCodeFromPayload(payload: unknown): number | null {
|
||||
if (isPlainRecord(payload) && typeof payload.exitCode === "number") {
|
||||
return payload.exitCode;
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
export function appendWorkflowRunLog(
|
||||
logStore: LogStore,
|
||||
workflowName: string,
|
||||
runId: string,
|
||||
eventType: string,
|
||||
payload: unknown | undefined,
|
||||
exitCode: number | null,
|
||||
): void {
|
||||
const timestamp = Date.now();
|
||||
const serialised = payload !== undefined ? JSON.stringify(payload) : null;
|
||||
const status = mapWorkflowRunStatus(eventType);
|
||||
|
||||
if (status !== null) {
|
||||
logStore.upsertWorkflowRun(
|
||||
{
|
||||
source: "workflow",
|
||||
type: eventType,
|
||||
refId: runId,
|
||||
payload: serialised,
|
||||
timestamp,
|
||||
},
|
||||
{ runId, workflow: workflowName, status, timestamp, exitCode },
|
||||
);
|
||||
} else {
|
||||
logStore.append({
|
||||
source: "workflow",
|
||||
type: eventType,
|
||||
refId: runId,
|
||||
payload: serialised,
|
||||
timestamp,
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
export function recoverQueuedRun(
|
||||
workflowName: string,
|
||||
runId: string,
|
||||
state: WorkflowState,
|
||||
logStore: LogStore,
|
||||
engineMaxRounds: number,
|
||||
): void {
|
||||
if (state.queue.some((q) => q.runId === runId)) return;
|
||||
const launch = readLaunchFromTriggerPayload(logStore.getTriggerPayload(runId), engineMaxRounds);
|
||||
state.queue.push({
|
||||
runId,
|
||||
prompt: launch.prompt,
|
||||
maxRounds: launch.maxRounds,
|
||||
dryRun: launch.dryRun,
|
||||
});
|
||||
process.stderr.write(
|
||||
`[workflow-manager] crash-recovery: re-queued thread "${runId}" for "${workflowName}"\n`,
|
||||
);
|
||||
}
|
||||
|
||||
export function recoverStartedRun(
|
||||
workflowName: string,
|
||||
runId: string,
|
||||
state: WorkflowState,
|
||||
logStore: LogStore,
|
||||
engineMaxRounds: number,
|
||||
sendResume: (wf: string, msg: ResumeThreadMessage) => void,
|
||||
): void {
|
||||
if (state.active.has(runId)) return;
|
||||
const rawMessages = logStore.getThreadMessages(runId);
|
||||
const launch = readLaunchFromTriggerPayload(logStore.getTriggerPayload(runId), engineMaxRounds);
|
||||
const messages = ensureThreadMessagesWithStart(
|
||||
rawMessages,
|
||||
runId,
|
||||
launch.prompt,
|
||||
launch.maxRounds,
|
||||
);
|
||||
state.active.add(runId);
|
||||
const msg: ResumeThreadMessage = {
|
||||
type: "resume-thread",
|
||||
runId,
|
||||
messages,
|
||||
maxRounds: launch.maxRounds,
|
||||
dryRun: launch.dryRun,
|
||||
};
|
||||
sendResume(workflowName, msg);
|
||||
process.stderr.write(
|
||||
`[workflow-manager] crash-recovery: resuming thread "${runId}" for "${workflowName}" (${String(messages.length)} messages)\n`,
|
||||
);
|
||||
}
|
||||
|
||||
export function recoverThreadsFromStore(
|
||||
workflowName: string,
|
||||
logStore: LogStore,
|
||||
engineMaxRounds: number,
|
||||
getOrCreateState: (name: string) => WorkflowState,
|
||||
sendResume: (wf: string, msg: ResumeThreadMessage) => void,
|
||||
): void {
|
||||
const activeRuns = logStore.getActiveWorkflowRuns(workflowName);
|
||||
const state = getOrCreateState(workflowName);
|
||||
|
||||
for (const run of activeRuns) {
|
||||
if (run.status === "queued") {
|
||||
recoverQueuedRun(workflowName, run.runId, state, logStore, engineMaxRounds);
|
||||
} else if (run.status === "started") {
|
||||
recoverStartedRun(workflowName, run.runId, state, logStore, engineMaxRounds, sendResume);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
export type WorkflowManagerMessageDeps = {
|
||||
logStore: LogStore;
|
||||
handleThreadEvent: (workflowName: string, msg: ThreadEventMessage) => void;
|
||||
onWorkflowRoleError: (
|
||||
workflowName: string,
|
||||
runId: string,
|
||||
error: string,
|
||||
exitCode: number,
|
||||
) => void;
|
||||
};
|
||||
|
||||
export function dispatchWorkflowWorkerMessage(
|
||||
workflowName: string,
|
||||
msg: WorkerToParentMessage,
|
||||
deps: WorkflowManagerMessageDeps,
|
||||
): void {
|
||||
if (msg.type === "thread-event") {
|
||||
deps.handleThreadEvent(workflowName, msg);
|
||||
return;
|
||||
}
|
||||
|
||||
if (msg.type === "thread-workflow-message") {
|
||||
deps.logStore.append({
|
||||
source: "workflow",
|
||||
type: "thread_workflow_message",
|
||||
refId: msg.runId,
|
||||
payload: JSON.stringify(msg.message),
|
||||
timestamp: Date.now(),
|
||||
});
|
||||
return;
|
||||
}
|
||||
|
||||
if (msg.type === "workflow-error") {
|
||||
process.stderr.write(
|
||||
`[workflow-manager] workflow-error for runId "${msg.runId}" in "${workflowName}": ${msg.error}\n`,
|
||||
);
|
||||
deps.onWorkflowRoleError(workflowName, msg.runId, msg.error, msg.exitCode);
|
||||
return;
|
||||
}
|
||||
|
||||
if (msg.type === "error") {
|
||||
process.stderr.write(`[workflow-manager] error from "${workflowName}" worker: ${msg.error}\n`);
|
||||
}
|
||||
}
|
||||
@@ -6,33 +6,27 @@
|
||||
* Concurrency and overflow (drop/queue) are enforced here in the parent process.
|
||||
*/
|
||||
|
||||
import { fork } from "node:child_process";
|
||||
import type { ChildProcess } from "node:child_process";
|
||||
import { dirname, join } from "node:path";
|
||||
import { fileURLToPath } from "node:url";
|
||||
import type { NerveConfig, WorkflowConfig, WorkflowStatus } from "@uncaged/nerve-core";
|
||||
|
||||
import type {
|
||||
NerveConfig,
|
||||
WorkflowConfig,
|
||||
WorkflowMessage,
|
||||
WorkflowStatus,
|
||||
} from "@uncaged/nerve-core";
|
||||
import { START, isPlainRecord } from "@uncaged/nerve-core";
|
||||
|
||||
import type { LogStore, WorkflowRunStatus } from "@uncaged/nerve-store";
|
||||
import type {
|
||||
KillThreadMessage,
|
||||
ResumeThreadMessage,
|
||||
ShutdownMessage,
|
||||
StartThreadMessage,
|
||||
ThreadEventMessage,
|
||||
} from "./ipc.js";
|
||||
import type { LogStore } from "@uncaged/nerve-store";
|
||||
import type { KillThreadMessage, StartThreadMessage, ThreadEventMessage } from "./ipc.js";
|
||||
import { parseWorkerMessage } from "./ipc.js";
|
||||
import {
|
||||
createWorkerRuntime,
|
||||
formatCapturedStderrTail,
|
||||
formatChildExitSummary,
|
||||
teeCapturedStderr,
|
||||
} from "./worker-fork-support.js";
|
||||
} from "./worker-runtime.js";
|
||||
import {
|
||||
DEFAULT_MAX_QUEUE,
|
||||
WORKER_SHUTDOWN_TIMEOUT_MS,
|
||||
WORKFLOW_WORKER_RESPAWN,
|
||||
type WorkflowState,
|
||||
appendWorkflowRunLog,
|
||||
dispatchWorkflowWorkerMessage,
|
||||
extractExitCodeFromPayload,
|
||||
recoverThreadsFromStore,
|
||||
resolveWorkflowWorkerScript,
|
||||
} from "./workflow-manager-support.js";
|
||||
|
||||
export type WorkflowLaunchParams = {
|
||||
prompt: string;
|
||||
@@ -74,169 +68,109 @@ export type WorkflowManager = {
|
||||
stop: () => Promise<void>;
|
||||
};
|
||||
|
||||
type PendingThread = {
|
||||
runId: string;
|
||||
prompt: string;
|
||||
maxRounds: number;
|
||||
dryRun: boolean;
|
||||
};
|
||||
|
||||
type WorkflowState = {
|
||||
active: Set<string>;
|
||||
queue: PendingThread[];
|
||||
};
|
||||
|
||||
type WorkerEntry = {
|
||||
workflowName: string;
|
||||
process: ChildProcess;
|
||||
stopping: boolean;
|
||||
/** When set, the worker is draining before a hot-reload respawn. */
|
||||
draining: boolean;
|
||||
stderrTail: { value: string };
|
||||
};
|
||||
|
||||
// Crash respawn backoff: track crash timestamps per workflow.
|
||||
const MAX_CRASHES_IN_WINDOW = 5;
|
||||
const CRASH_WINDOW_MS = 60_000;
|
||||
|
||||
/**
|
||||
* Worker shutdown timeout — must stay in sync with SHUTDOWN_TIMEOUT_MS in workflow-worker.ts.
|
||||
* The drain timeout passed to drainAndRespawn must be >= this value so the worker has
|
||||
* enough time to finish in-flight threads before the parent force-kills it.
|
||||
*/
|
||||
const WORKER_SHUTDOWN_TIMEOUT_MS = 10_000;
|
||||
|
||||
const DEFAULT_MAX_QUEUE = 100;
|
||||
|
||||
function readLaunchFromTriggerPayload(
|
||||
raw: unknown,
|
||||
engineDefaultMaxRounds: number,
|
||||
): { prompt: string; maxRounds: number; dryRun: boolean } {
|
||||
if (isPlainRecord(raw)) {
|
||||
const o = raw;
|
||||
if (typeof o.prompt === "string" && typeof o.maxRounds === "number") {
|
||||
const dryRun = typeof o.dryRun === "boolean" ? o.dryRun : false;
|
||||
return { prompt: o.prompt, maxRounds: o.maxRounds, dryRun };
|
||||
}
|
||||
}
|
||||
return { prompt: "", maxRounds: engineDefaultMaxRounds, dryRun: false };
|
||||
}
|
||||
|
||||
function ensureThreadMessagesWithStart(
|
||||
messages: Array<{ role: string; content: string; meta: unknown; timestamp: number }>,
|
||||
threadId: string,
|
||||
fallbackPrompt: string,
|
||||
fallbackMaxRounds: number,
|
||||
): WorkflowMessage[] {
|
||||
const mapped: WorkflowMessage[] = messages.map((m) => ({
|
||||
role: m.role,
|
||||
content: m.content,
|
||||
meta: m.meta,
|
||||
timestamp: m.timestamp,
|
||||
}));
|
||||
if (mapped.length > 0 && mapped[0].role === START) {
|
||||
return mapped;
|
||||
}
|
||||
const start: WorkflowMessage = {
|
||||
role: START,
|
||||
content: fallbackPrompt,
|
||||
meta: { maxRounds: fallbackMaxRounds, threadId },
|
||||
timestamp: Date.now(),
|
||||
};
|
||||
return [start, ...mapped];
|
||||
}
|
||||
|
||||
function resolveWorkerScript(): string {
|
||||
const __filename = fileURLToPath(import.meta.url);
|
||||
const __dir = dirname(__filename);
|
||||
return join(__dir, "workflow-worker.js");
|
||||
}
|
||||
|
||||
function spawnWorkflowWorker(
|
||||
nerveRoot: string,
|
||||
workflowName: string,
|
||||
workerScript: string,
|
||||
stderrTail: { value: string },
|
||||
): ChildProcess {
|
||||
const child = fork(workerScript, ["--workflow", workflowName, "--root", nerveRoot], {
|
||||
stdio: ["ignore", "inherit", "pipe", "ipc"],
|
||||
});
|
||||
teeCapturedStderr(child, stderrTail);
|
||||
// Prevent unhandled EPIPE when writing to a child whose IPC channel closed
|
||||
child.on("error", (err) => {
|
||||
if ((err as NodeJS.ErrnoException).code !== "EPIPE") {
|
||||
console.error("[worker] error:", err.message);
|
||||
}
|
||||
});
|
||||
return child;
|
||||
}
|
||||
|
||||
function sendStartThread(worker: ChildProcess, msg: StartThreadMessage): void {
|
||||
if (worker.connected === false) return;
|
||||
try {
|
||||
worker.send(msg);
|
||||
} catch {
|
||||
// IPC channel closed between connected check and send
|
||||
}
|
||||
}
|
||||
|
||||
function sendShutdown(worker: ChildProcess, entry: WorkerEntry): void {
|
||||
entry.stopping = true;
|
||||
if (worker.connected === false) return;
|
||||
const msg: ShutdownMessage = { type: "shutdown" };
|
||||
try {
|
||||
worker.send(msg);
|
||||
} catch {
|
||||
// IPC channel closed between connected check and send
|
||||
}
|
||||
}
|
||||
|
||||
function sendResumeThread(worker: ChildProcess, msg: ResumeThreadMessage): void {
|
||||
if (worker.connected === false) return;
|
||||
try {
|
||||
worker.send(msg);
|
||||
} catch {
|
||||
// IPC channel closed between connected check and send
|
||||
}
|
||||
}
|
||||
|
||||
function sendKillThread(worker: ChildProcess, runId: string): void {
|
||||
if (worker.connected === false) return;
|
||||
const msg: KillThreadMessage = { type: "kill-thread", runId };
|
||||
try {
|
||||
worker.send(msg);
|
||||
} catch {
|
||||
// IPC channel closed between connected check and send
|
||||
}
|
||||
}
|
||||
|
||||
function waitForExit(child: ChildProcess, timeoutMs: number): Promise<void> {
  return new Promise((resolve) => {
    const timer = setTimeout(() => {
      child.kill("SIGKILL");
      resolve();
    }, timeoutMs);
    child.once("exit", () => {
      clearTimeout(timer);
      resolve();
    });
  });
}
|
||||
|
||||
export function createWorkflowManager(
|
||||
nerveRoot: string,
|
||||
initialConfig: NerveConfig,
|
||||
logStore: LogStore,
|
||||
): WorkflowManager {
|
||||
-const workerScript = resolveWorkerScript();
+const workerScript = resolveWorkflowWorkerScript();
|
||||
|
||||
/**
|
||||
* Default drain timeout must be at least WORKER_SHUTDOWN_TIMEOUT_MS so the worker
|
||||
* has enough time to finish in-flight threads before the parent force-kills it.
|
||||
*/
|
||||
const DEFAULT_DRAIN_TIMEOUT_MS = Math.max(30_000, WORKER_SHUTDOWN_TIMEOUT_MS + 5_000);
|
||||
|
||||
const states = new Map<string, WorkflowState>();
|
||||
const workers = new Map<string, WorkerEntry>();
|
||||
const crashTimestamps = new Map<string, number[]>();
|
||||
const trackedWorkflows = new Set<string>();
|
||||
const hotReloadEvicting = new Set<string>();
|
||||
const crashRecoveryPending = new Set<string>();
|
||||
const crashLimitBlocked = new Set<string>();
|
||||
let stopped = false;
|
||||
let config = initialConfig;
|
||||
const pendingDrains = new Set<string>();
|
||||
|
||||
function logWorkflowEvent(
|
||||
workflowName: string,
|
||||
runId: string,
|
||||
eventType: string,
|
||||
payload: unknown | null = null,
|
||||
exitCode: number | null = null,
|
||||
): void {
|
||||
appendWorkflowRunLog(logStore, workflowName, runId, eventType, payload, exitCode);
|
||||
}
|
||||
|
||||
const runtime = createWorkerRuntime<string>({
|
||||
script: workerScript,
|
||||
argsForKey: (workflowName) => ["--workflow", workflowName, "--root", nerveRoot],
|
||||
forwardStderr: true,
|
||||
onMessage: (workflowName, raw) => {
|
||||
handleWorkerMessage(workflowName, raw);
|
||||
},
|
||||
onReady: (workflowName, _msg) => {
|
||||
if (crashRecoveryPending.has(workflowName)) {
|
||||
crashRecoveryPending.delete(workflowName);
|
||||
recoverThreadsFromStore(
|
||||
workflowName,
|
||||
logStore,
|
||||
config.maxRounds,
|
||||
getOrCreateState,
|
||||
(wf, msg) => {
|
||||
sendToWorker(wf, msg);
|
||||
},
|
||||
);
|
||||
}
|
||||
},
|
||||
onExit: (workflowName, code, signalStr) => {
|
||||
const sig =
|
||||
signalStr === null || signalStr === undefined || signalStr === ""
|
||||
? null
|
||||
: (signalStr as NodeJS.Signals);
|
||||
|
||||
if (hotReloadEvicting.has(workflowName)) {
|
||||
hotReloadEvicting.delete(workflowName);
|
||||
markActiveRunsInterrupted(workflowName);
|
||||
if (!stopped && workflowConfig(workflowName) !== null) {
|
||||
process.stderr.write(
|
||||
`[workflow-manager] worker for "${workflowName}" drained, respawning\n`,
|
||||
);
|
||||
}
|
||||
return;
|
||||
}
|
||||
|
||||
if (stopped) {
|
||||
const state = states.get(workflowName);
|
||||
if (state !== undefined) {
|
||||
state.active.clear();
|
||||
}
|
||||
crashRecoveryPending.delete(workflowName);
|
||||
return;
|
||||
}
|
||||
|
||||
const summary = formatChildExitSummary(code, sig);
|
||||
const stderrExtra = formatCapturedStderrTail(runtime.stderrTail(workflowName));
|
||||
process.stderr.write(
|
||||
`[workflow-manager] worker for "${workflowName}" exited (${summary})${stderrExtra}\n`,
|
||||
);
|
||||
|
||||
cleanupAfterUnexpectedWorkerExit(workflowName);
|
||||
crashRecoveryPending.add(workflowName);
|
||||
},
|
||||
onCrashLimitReached: (workflowName) => {
|
||||
crashRecoveryPending.delete(workflowName);
|
||||
trackedWorkflows.delete(workflowName);
|
||||
crashLimitBlocked.add(workflowName);
|
||||
process.stderr.write(
|
||||
`[workflow-manager] worker for "${workflowName}" exceeded crash limit (${String(WORKFLOW_WORKER_RESPAWN.maxCrashes)} in ${String(WORKFLOW_WORKER_RESPAWN.windowMs)}ms) — stopping respawn\n`,
|
||||
);
|
||||
},
|
||||
respawn: {
|
||||
...WORKFLOW_WORKER_RESPAWN,
|
||||
allowRespawn: (_wf) => !stopped,
|
||||
},
|
||||
shutdownTimeoutMs: DEFAULT_DRAIN_TIMEOUT_MS,
|
||||
});
|
||||
|
||||
function getOrCreateState(workflowName: string): WorkflowState {
|
||||
let state = states.get(workflowName);
|
||||
if (state === undefined) {
|
||||
@@ -250,60 +184,38 @@ export function createWorkflowManager(
|
||||
return config.workflows[workflowName] ?? null;
|
||||
}
|
||||
|
||||
function toWorkflowRunStatus(eventType: string): WorkflowRunStatus | null {
  const map: Record<string, WorkflowRunStatus> = {
    started: "started",
    queued: "queued",
    completed: "completed",
    failed: "failed",
    crashed: "crashed",
    dropped: "dropped",
    interrupted: "interrupted",
    killed: "killed",
  };
  return map[eventType] ?? null;
}
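The lookup above narrows an arbitrary event-type string to a `WorkflowRunStatus`. One caveat of indexing an object literal with an arbitrary string is that inherited keys such as `"toString"` also resolve to a non-nullish value. The sketch below shows one possible variant that checks membership in a `Set` instead. The status union is copied from the map above; the names `STATUS_LIST` and `isWorkflowRunStatus` are illustrative assumptions, not part of the codebase.

```typescript
// Status union mirrored from the map in the diff above.
type WorkflowRunStatus =
  | "started"
  | "queued"
  | "completed"
  | "failed"
  | "crashed"
  | "dropped"
  | "interrupted"
  | "killed";

// Hypothetical constant: the full list of statuses, used to build the lookup set.
const STATUS_LIST: readonly WorkflowRunStatus[] = [
  "started",
  "queued",
  "completed",
  "failed",
  "crashed",
  "dropped",
  "interrupted",
  "killed",
];

const RUN_STATUSES = new Set<string>(STATUS_LIST);

// Type predicate: true only for strings that are exactly one of the statuses.
function isWorkflowRunStatus(value: string): value is WorkflowRunStatus {
  return RUN_STATUSES.has(value);
}

// Same contract as the function in the diff: a status, or null for any other event.
function toWorkflowRunStatus(eventType: string): WorkflowRunStatus | null {
  return isWorkflowRunStatus(eventType) ? eventType : null;
}
```

With this variant, event types such as `"thread_workflow_message"` or `"toString"` both map to `null`, so only real run statuses would reach an upsert path.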
|
||||
|
||||
function extractExitCode(payload: unknown): number | null {
|
||||
if (isPlainRecord(payload) && typeof payload.exitCode === "number") {
|
||||
return payload.exitCode;
|
||||
/** IPC send — matches legacy pool: no-op when IPC is disconnected; cold-start via WorkerRuntime.send. */
|
||||
function sendToWorker(workflowName: string, msg: unknown): void {
|
||||
if (crashLimitBlocked.has(workflowName)) {
|
||||
return;
|
||||
}
|
||||
return null;
|
||||
}
|
||||
|
||||
function logWorkflowEvent(
|
||||
workflowName: string,
|
||||
runId: string,
|
||||
eventType: string,
|
||||
payload?: unknown,
|
||||
exitCode: number | null = null,
|
||||
): void {
|
||||
const timestamp = Date.now();
|
||||
const serialised = payload !== undefined ? JSON.stringify(payload) : null;
|
||||
const status = toWorkflowRunStatus(eventType);
|
||||
|
||||
if (status !== null) {
|
||||
logStore.upsertWorkflowRun(
|
||||
{
|
||||
source: "workflow",
|
||||
type: eventType,
|
||||
refId: runId,
|
||||
payload: serialised,
|
||||
timestamp,
|
||||
},
|
||||
{ runId, workflow: workflowName, status, timestamp, exitCode },
|
||||
);
|
||||
} else {
|
||||
logStore.append({
|
||||
source: "workflow",
|
||||
type: eventType,
|
||||
refId: runId,
|
||||
payload: serialised,
|
||||
timestamp,
|
||||
trackedWorkflows.add(workflowName);
|
||||
if (runtime.hasDisconnectedChild(workflowName)) {
|
||||
return;
|
||||
}
|
||||
if (!runtime.trySendSync(workflowName, msg)) {
|
||||
void runtime.send(workflowName, msg).catch(() => {
|
||||
// IPC channel closed — mark any thread from this message as failed
|
||||
if (isStartThreadMsg(msg)) {
|
||||
const state = states.get(workflowName);
|
||||
if (state?.active.has(msg.runId)) {
|
||||
state.active.delete(msg.runId);
|
||||
logWorkflowEvent(workflowName, msg.runId, "failed", { error: "IPC channel closed" }, 1);
|
||||
dequeueNext(workflowName);
|
||||
}
|
||||
}
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
function isStartThreadMsg(msg: unknown): msg is StartThreadMessage {
  return (
    msg !== null &&
    typeof msg === "object" &&
    (msg as Record<string, unknown>).type === "start-thread"
  );
}
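The guard above discriminates on the `type` field only, while its caller goes on to read `msg.runId`. A slightly stricter sketch that also checks `runId` is shown below. The `StartThreadMessage` field list is an assumption reconstructed from the fields visible elsewhere in this diff, and `isRecord` is a hypothetical helper.

```typescript
// Assumed message shape; only `type` and `runId` are checked by the guard.
type StartThreadMessage = {
  type: "start-thread";
  runId: string;
  prompt: string;
  maxRounds: number;
  dryRun: boolean;
};

// Hypothetical helper: narrows unknown to a plain key/value record.
function isRecord(value: unknown): value is Record<string, unknown> {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Type predicate that verifies the discriminant and the field the caller relies on.
function isStartThreadMsg(msg: unknown): msg is StartThreadMessage {
  return (
    isRecord(msg) &&
    msg["type"] === "start-thread" &&
    typeof msg["runId"] === "string"
  );
}

const sample: unknown = {
  type: "start-thread",
  runId: "r1",
  prompt: "p",
  maxRounds: 3,
  dryRun: false,
};
```

Whether the extra `runId` check is worthwhile depends on how much the parent trusts its own messages; since `sendToWorker` accepts `unknown`, the stricter form keeps the failure handler from touching an undefined `runId`.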
|
||||
|
||||
function dispatchThread(
|
||||
workflowName: string,
|
||||
runId: string,
|
||||
@@ -314,7 +226,6 @@ export function createWorkflowManager(
|
||||
const state = getOrCreateState(workflowName);
|
||||
state.active.add(runId);
|
||||
|
||||
const worker = getOrSpawnWorker(workflowName);
|
||||
const msg: StartThreadMessage = {
|
||||
type: "start-thread",
|
||||
runId,
|
||||
@@ -323,7 +234,7 @@ export function createWorkflowManager(
|
||||
maxRounds,
|
||||
dryRun,
|
||||
};
|
||||
sendStartThread(worker.process, msg);
|
||||
sendToWorker(workflowName, msg);
|
||||
logWorkflowEvent(workflowName, runId, "started", { prompt, maxRounds, dryRun });
|
||||
}
|
||||
|
||||
@@ -367,92 +278,20 @@ export function createWorkflowManager(
|
||||
if (msg.eventType === "completed" || msg.eventType === "failed" || msg.eventType === "killed") {
|
||||
state.active.delete(msg.runId);
|
||||
dequeueNext(workflowName);
|
||||
-const exitCode = extractExitCode(msg.payload);
+const exitCode = extractExitCodeFromPayload(msg.payload);
|
||||
logWorkflowEvent(workflowName, msg.runId, msg.eventType, msg.payload, exitCode);
|
||||
maybeDeferredHotReloadDrain(workflowName);
|
||||
}
|
||||
}
|
||||
|
||||
function recoverQueuedRun(workflowName: string, runId: string, state: WorkflowState): void {
|
||||
if (state.queue.some((q) => q.runId === runId)) return;
|
||||
const launch = readLaunchFromTriggerPayload(
|
||||
logStore.getTriggerPayload(runId),
|
||||
config.maxRounds,
|
||||
);
|
||||
state.queue.push({
|
||||
runId,
|
||||
prompt: launch.prompt,
|
||||
maxRounds: launch.maxRounds,
|
||||
dryRun: launch.dryRun,
|
||||
});
|
||||
process.stderr.write(
|
||||
`[workflow-manager] crash-recovery: re-queued thread "${runId}" for "${workflowName}"\n`,
|
||||
);
|
||||
}
|
||||
|
||||
function recoverStartedRun(
|
||||
workflowName: string,
|
||||
runId: string,
|
||||
state: WorkflowState,
|
||||
worker: WorkerEntry,
|
||||
): void {
|
||||
if (state.active.has(runId)) return;
|
||||
const rawMessages = logStore.getThreadMessages(runId);
|
||||
const launch = readLaunchFromTriggerPayload(
|
||||
logStore.getTriggerPayload(runId),
|
||||
config.maxRounds,
|
||||
);
|
||||
const messages = ensureThreadMessagesWithStart(
|
||||
rawMessages,
|
||||
runId,
|
||||
launch.prompt,
|
||||
launch.maxRounds,
|
||||
);
|
||||
state.active.add(runId);
|
||||
const msg: ResumeThreadMessage = {
|
||||
type: "resume-thread",
|
||||
runId,
|
||||
messages,
|
||||
maxRounds: launch.maxRounds,
|
||||
dryRun: launch.dryRun,
|
||||
};
|
||||
sendResumeThread(worker.process, msg);
|
||||
process.stderr.write(
|
||||
`[workflow-manager] crash-recovery: resuming thread "${runId}" for "${workflowName}" (${messages.length} messages)\n`,
|
||||
);
|
||||
}
|
||||
|
||||
function recoverThreadsForWorker(workflowName: string, worker: WorkerEntry): void {
|
||||
const activeRuns = logStore.getActiveWorkflowRuns(workflowName);
|
||||
const state = getOrCreateState(workflowName);
|
||||
|
||||
for (const run of activeRuns) {
|
||||
if (run.status === "queued") {
|
||||
recoverQueuedRun(workflowName, run.runId, state);
|
||||
} else if (run.status === "started") {
|
||||
recoverStartedRun(workflowName, run.runId, state, worker);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
function recordCrashAndCheckLimit(workflowName: string): boolean {
  const now = Date.now();
  const timestamps = (crashTimestamps.get(workflowName) ?? []).filter(
    (t) => now - t < CRASH_WINDOW_MS,
  );
  timestamps.push(now);
  crashTimestamps.set(workflowName, timestamps);
  return timestamps.length > MAX_CRASHES_IN_WINDOW;
}
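The function above is a sliding-window crash limiter: it keeps only the crash timestamps inside the window, records the new one, and reports whether the count now exceeds the limit. A self-contained sketch of the same technique follows. The factory name `createCrashTracker` and the numeric values are assumptions for illustration; the real `CRASH_WINDOW_MS` and `MAX_CRASHES_IN_WINDOW` values are not shown in this diff.

```typescript
// Assumed values, for illustration only.
const CRASH_WINDOW_MS = 60_000;
const MAX_CRASHES_IN_WINDOW = 3;

// Hypothetical factory wrapping the per-key timestamp map.
function createCrashTracker(windowMs: number, maxCrashes: number) {
  const crashTimestamps = new Map<string, number[]>();

  // Records one crash for `key` and returns true once the limit is exceeded.
  return function recordCrashAndCheckLimit(key: string, now: number = Date.now()): boolean {
    // Drop crashes that have fallen out of the window.
    const recent = (crashTimestamps.get(key) ?? []).filter((t) => now - t < windowMs);
    recent.push(now);
    crashTimestamps.set(key, recent);
    return recent.length > maxCrashes;
  };
}

const recordCrash = createCrashTracker(CRASH_WINDOW_MS, MAX_CRASHES_IN_WINDOW);
```

Note the strict `>` comparison: with a limit of 3, the fourth crash inside the window is the first one that trips the limiter. Injecting `now` is a choice of this sketch to make the behaviour easy to exercise without waiting on a real clock.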
|
||||
|
||||
function handleWorkerCrash(workflowName: string): void {
|
||||
function cleanupAfterUnexpectedWorkerExit(workflowName: string): void {
|
||||
const state = states.get(workflowName);
|
||||
if (state === undefined) return;
|
||||
|
||||
const crashedCount = state.active.size;
|
||||
if (crashedCount > 0) {
|
||||
process.stderr.write(
|
||||
-`[workflow-manager] worker for "${workflowName}" crashed with ${crashedCount} active thread(s)\n`,
+`[workflow-manager] worker for "${workflowName}" crashed with ${String(crashedCount)} active thread(s)\n`,
|
||||
);
|
||||
for (const runId of state.active) {
|
||||
logWorkflowEvent(workflowName, runId, "crashed", undefined, 255);
|
||||
@@ -460,26 +299,13 @@ export function createWorkflowManager(
|
||||
}
|
||||
|
||||
state.active.clear();
|
||||
workers.delete(workflowName);
|
||||
pendingDrains.delete(workflowName);
|
||||
|
||||
if (stopped || workflowConfig(workflowName) === null) return;
|
||||
|
||||
if (recordCrashAndCheckLimit(workflowName)) {
|
||||
const count = crashTimestamps.get(workflowName)?.length ?? 0;
|
||||
if (!stopped && !crashLimitBlocked.has(workflowName) && workflowConfig(workflowName) !== null) {
|
||||
process.stderr.write(
|
||||
`[workflow-manager] worker for "${workflowName}" crashed ${count} times in ${CRASH_WINDOW_MS}ms — stopping respawn\n`,
|
||||
`[workflow-manager] respawning worker for "${workflowName}" after crash\n`,
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
process.stderr.write(
|
||||
`[workflow-manager] respawning worker for "${workflowName}" after crash\n`,
|
||||
);
|
||||
const newWorker = getOrSpawnWorker(workflowName);
|
||||
setImmediate(() => {
|
||||
recoverThreadsForWorker(workflowName, newWorker);
|
||||
});
|
||||
}
|
||||
|
||||
function handleWorkerMessage(workflowName: string, raw: unknown): void {
|
||||
@@ -490,43 +316,19 @@ export function createWorkflowManager(
|
||||
);
|
||||
return;
|
||||
}
|
||||
const msg = result.value;
|
||||
|
||||
if (msg.type === "thread-event") {
|
||||
handleThreadEvent(workflowName, msg);
|
||||
return;
|
||||
}
|
||||
|
||||
if (msg.type === "thread-workflow-message") {
|
||||
logStore.append({
|
||||
source: "workflow",
|
||||
type: "thread_workflow_message",
|
||||
refId: msg.runId,
|
||||
payload: JSON.stringify(msg.message),
|
||||
timestamp: Date.now(),
|
||||
});
|
||||
return;
|
||||
}
|
||||
|
||||
if (msg.type === "workflow-error") {
|
||||
process.stderr.write(
|
||||
`[workflow-manager] workflow-error for runId "${msg.runId}" in "${workflowName}": ${msg.error}\n`,
|
||||
);
|
||||
const state = states.get(workflowName);
|
||||
if (state !== undefined) {
|
||||
state.active.delete(msg.runId);
|
||||
dequeueNext(workflowName);
|
||||
}
|
||||
logWorkflowEvent(workflowName, msg.runId, "failed", { error: msg.error }, msg.exitCode);
|
||||
maybeDeferredHotReloadDrain(workflowName);
|
||||
return;
|
||||
}
|
||||
|
||||
if (msg.type === "error") {
|
||||
process.stderr.write(
|
||||
`[workflow-manager] error from "${workflowName}" worker: ${msg.error}\n`,
|
||||
);
|
||||
}
|
||||
dispatchWorkflowWorkerMessage(workflowName, result.value, {
|
||||
logStore,
|
||||
handleThreadEvent,
|
||||
onWorkflowRoleError: (wf, runId, error, exitCode) => {
|
||||
const state = states.get(wf);
|
||||
if (state !== undefined) {
|
||||
state.active.delete(runId);
|
||||
dequeueNext(wf);
|
||||
}
|
||||
logWorkflowEvent(wf, runId, "failed", { error }, exitCode);
|
||||
maybeDeferredHotReloadDrain(wf);
|
||||
},
|
||||
});
|
||||
}
|
||||
|
||||
function markActiveRunsInterrupted(workflowName: string): void {
|
||||
@@ -538,67 +340,6 @@ export function createWorkflowManager(
|
||||
state.active.clear();
|
||||
}
|
||||
|
||||
function handleWorkerExit(
|
||||
workflowName: string,
|
||||
code: number | null,
|
||||
signal: NodeJS.Signals | null,
|
||||
): void {
|
||||
const entry = workers.get(workflowName);
|
||||
if (entry?.draining) {
|
||||
workers.delete(workflowName);
|
||||
markActiveRunsInterrupted(workflowName);
|
||||
if (!stopped && workflowConfig(workflowName) !== null) {
|
||||
process.stderr.write(
|
||||
`[workflow-manager] worker for "${workflowName}" drained, respawning\n`,
|
||||
);
|
||||
getOrSpawnWorker(workflowName);
|
||||
}
|
||||
return;
|
||||
}
|
||||
if (entry?.stopping) {
|
||||
workers.delete(workflowName);
|
||||
const state = states.get(workflowName);
|
||||
if (state !== undefined) {
|
||||
state.active.clear();
|
||||
}
|
||||
return;
|
||||
}
|
||||
const summary = formatChildExitSummary(code, signal);
|
||||
const stderrExtra = entry !== undefined ? formatCapturedStderrTail(entry.stderrTail.value) : "";
|
||||
process.stderr.write(
|
||||
`[workflow-manager] worker for "${workflowName}" exited (${summary})${stderrExtra}\n`,
|
||||
);
|
||||
handleWorkerCrash(workflowName);
|
||||
}
|
||||
|
||||
function getOrSpawnWorker(workflowName: string): WorkerEntry {
|
||||
const existing = workers.get(workflowName);
|
||||
if (existing !== undefined && existing.process.exitCode === null) {
|
||||
return existing;
|
||||
}
|
||||
|
||||
const stderrTail = { value: "" };
|
||||
const child = spawnWorkflowWorker(nerveRoot, workflowName, workerScript, stderrTail);
|
||||
|
||||
child.on("message", (raw: unknown) => {
|
||||
handleWorkerMessage(workflowName, raw);
|
||||
});
|
||||
|
||||
child.on("exit", (code, signal) => {
|
||||
handleWorkerExit(workflowName, code, signal ?? null);
|
||||
});
|
||||
|
||||
const entry: WorkerEntry = {
|
||||
workflowName,
|
||||
process: child,
|
||||
stopping: false,
|
||||
draining: false,
|
||||
stderrTail,
|
||||
};
|
||||
workers.set(workflowName, entry);
|
||||
return entry;
|
||||
}
|
||||
|
||||
function killThread(runId: string): boolean {
|
||||
for (const [workflowName, state] of states) {
|
||||
const queueIdx = state.queue.findIndex((q) => q.runId === runId);
|
||||
@@ -609,10 +350,8 @@ export function createWorkflowManager(
|
||||
}
|
||||
|
||||
if (state.active.has(runId)) {
|
||||
const workerEntry = workers.get(workflowName);
|
||||
if (workerEntry !== undefined) {
|
||||
sendKillThread(workerEntry.process, runId);
|
||||
}
|
||||
const msg: KillThreadMessage = { type: "kill-thread", runId };
|
||||
sendToWorker(workflowName, msg);
|
||||
return true;
|
||||
}
|
||||
}
|
||||
@@ -663,7 +402,7 @@ export function createWorkflowManager(
|
||||
state.queue.push({ runId, prompt, maxRounds, dryRun });
|
||||
logWorkflowEvent(workflowName, runId, "queued");
|
||||
process.stderr.write(
|
||||
-`[workflow-manager] queued thread for "${workflowName}" runId "${runId}" (queue length: ${state.queue.length})\n`,
+`[workflow-manager] queued thread for "${workflowName}" runId "${runId}" (queue length: ${String(state.queue.length)})\n`,
|
||||
);
|
||||
}
|
||||
|
||||
@@ -707,35 +446,29 @@ export function createWorkflowManager(
|
||||
config = newConfig;
|
||||
}
|
||||
|
||||
/**
|
||||
* Default drain timeout must be at least WORKER_SHUTDOWN_TIMEOUT_MS so the worker
|
||||
* has enough time to finish in-flight threads before the parent force-kills it.
|
||||
*/
|
||||
const DEFAULT_DRAIN_TIMEOUT_MS = Math.max(30_000, WORKER_SHUTDOWN_TIMEOUT_MS + 5_000);
|
||||
|
||||
async function drainAndRespawn(
|
||||
workflowName: string,
|
||||
drainTimeoutMs: number = DEFAULT_DRAIN_TIMEOUT_MS,
|
||||
): Promise<void> {
|
||||
const entry = workers.get(workflowName);
|
||||
if (entry === undefined) {
|
||||
// No active worker — nothing to drain
|
||||
if (!trackedWorkflows.has(workflowName)) {
|
||||
return;
|
||||
}
|
||||
|
||||
entry.draining = true;
|
||||
// Send shutdown without setting stopping=true (so the exit handler uses the draining branch)
|
||||
if (entry.process.connected) {
|
||||
const msg: ShutdownMessage = { type: "shutdown" };
|
||||
try {
|
||||
entry.process.send(msg);
|
||||
} catch {
|
||||
// IPC closed
|
||||
const shutdownMs = Math.max(drainTimeoutMs, WORKER_SHUTDOWN_TIMEOUT_MS);
|
||||
hotReloadEvicting.add(workflowName);
|
||||
try {
|
||||
await runtime.evict(workflowName, { shutdownTimeoutMs: shutdownMs });
|
||||
trackedWorkflows.delete(workflowName);
|
||||
|
||||
if (!stopped && workflowConfig(workflowName) !== null) {
|
||||
trackedWorkflows.add(workflowName);
|
||||
await runtime.start(workflowName);
|
||||
}
|
||||
} finally {
|
||||
hotReloadEvicting.delete(workflowName);
|
||||
}
|
||||
await waitForExit(entry.process, drainTimeoutMs);
|
||||
// The exit handler (draining branch) will respawn the worker automatically
|
||||
}
|
||||
|
||||
function drainWhenIdle(workflowName: string): void {
|
||||
const state = states.get(workflowName);
|
||||
const hasActiveRuns = state !== undefined && state.active.size > 0;
|
||||
@@ -761,20 +494,17 @@ export function createWorkflowManager(
|
||||
|
||||
pendingDrains.add(workflowName);
|
||||
process.stderr.write(
|
||||
-`[workflow-manager] deferring hot-reload for "${workflowName}" until ${state.active.size} active run(s) complete\n`,
+`[workflow-manager] deferring hot-reload for "${workflowName}" until ${String(state.active.size)} active run(s) complete\n`,
|
||||
);
|
||||
}
|
||||
|
||||
async function stop(): Promise<void> {
|
||||
stopped = true;
|
||||
pendingDrains.clear();
|
||||
const exitPromises: Promise<void>[] = [];
|
||||
for (const entry of workers.values()) {
|
||||
sendShutdown(entry.process, entry);
|
||||
exitPromises.push(waitForExit(entry.process, 5000));
|
||||
}
|
||||
await Promise.all(exitPromises);
|
||||
workers.clear();
|
||||
hotReloadEvicting.clear();
|
||||
crashRecoveryPending.clear();
|
||||
await runtime.shutdown();
|
||||
trackedWorkflows.clear();
|
||||
}
|
||||
|
||||
return {
|
||||
|
||||
@@ -30,7 +30,7 @@ import type {
|
||||
WorkerToParentMessage,
|
||||
} from "./ipc.js";
|
||||
import { parseParentMessage } from "./ipc.js";
|
||||
-import { ignoreSessionBroadcastSignals } from "./worker-fork-support.js";
+import { ignoreSessionBroadcastSignals } from "./worker-signals.js";
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// IPC helpers
|
||||
|
||||
@@ -10,7 +10,7 @@ export type CoderMeta = z.infer<typeof coderMetaSchema>;
|
||||
|
||||
export function coderPrompt({ threadId }: { threadId: string }): string {
|
||||
return `Read the workflow thread for the planner's sense design and any tester feedback: \`nerve thread ${threadId}\`
|
||||
Read the nerve-dev skill for sense file structure and conventions: \`cat node_modules/@uncaged/nerve-skills/nerve-dev/SKILL.md\`
|
||||
Read \`cat AGENT.md\` from the repository root, then \`CONVENTIONS.md\` and \`.knowledge/sense.md\` if present.
|
||||
|
||||
## Your task
|
||||
|
||||
@@ -20,21 +20,21 @@ Implement (or fix) the sense the planner designed. If there is tester feedback i
|
||||
|
||||
You do NOT need to finish everything in one pass. You may return \`done: false\` to continue in the next iteration.
|
||||
|
||||
-## File structure for each sense
+## File structure for each sense (flat workspace)
|
||||
|
||||
- \`senses/<name>/src/index.ts\` — TypeScript compute source; import schema as \`./schema.ts\`
|
||||
The workspace has **one root** \`package.json\` and root \`scripts/build.mjs\` (or equivalent) that bundles all senses. There is **no** per-sense \`package.json\`. Bundled output is \`dist/senses/<name>/index.js\` after a root build.
|
||||
|
||||
- \`senses/<name>/src/index.ts\` — compute entry; import schema as \`./schema.ts\`
|
||||
- \`senses/<name>/src/schema.ts\` — Drizzle schema (TypeScript)
|
||||
- \`senses/<name>/migrations/\` — Drizzle migration files (at sense root, not inside src/)
|
||||
- \`senses/<name>/package.json\` — with esbuild build script
|
||||
- \`senses/<name>/index.js\` — bundled output generated by \`pnpm build\` (do NOT edit by hand)
|
||||
- \`senses/<name>/migrations/\` — SQL migration files (at sense root, not inside \`src/\`)
|
||||
|
||||
Look at existing senses for the package.json template and patterns.
|
||||
Look at existing senses for patterns.
|
||||
|
||||
## When to return done: true
|
||||
|
||||
Return \`done: true\` ONLY when ALL of the following are true:
|
||||
- All required files are created
|
||||
- \`pnpm install --no-cache && pnpm build\` succeeds (run it!)
|
||||
- From the **workspace root**, \`pnpm run build\` or \`npm run build\` succeeds (run it!) and \`dist/senses/<name>/index.js\` exists
|
||||
- \`nerve.yaml\` is updated with the sense config
|
||||
|
||||
Return \`done: false\` if you made progress but there is still work to do.`;
|
||||
|
||||
@@ -12,7 +12,7 @@ export function plannerPrompt({ threadId }: { threadId: string }): string {
|
||||
return `You are planning a new Nerve sense.
|
||||
|
||||
Read the workflow thread for the user's request: \`nerve thread ${threadId}\`
|
||||
Read the nerve-dev skill for sense conventions: \`cat node_modules/@uncaged/nerve-skills/nerve-dev/SKILL.md\`
|
||||
Read the workspace guide: \`cat AGENT.md\` from the repository root (created by \`nerve init\`). Also read \`CONVENTIONS.md\` and \`.knowledge/sense.md\` if present. Optional skills live under \`.cursor/skills/\`.
|
||||
Also look at existing senses in the \`senses/\` directory for patterns.
|
||||
|
||||
Pick a good kebab-case name for this sense. Produce a PLAN (not code) in markdown:
|
||||
|
||||
@@ -17,21 +17,20 @@ export function testerPrompt({
|
||||
**IMPORTANT: The Nerve workspace is at \`${nerveRoot}\`. All paths below are relative to this directory. Always \`cd ${nerveRoot}\` first.**
|
||||
|
||||
Read the workflow thread for context: \`nerve thread ${threadId}\`
|
||||
Read the nerve-dev skill for expected file structure: \`cat ${nerveRoot}/node_modules/@uncaged/nerve-skills/nerve-dev/SKILL.md\`
|
||||
Read \`cat ${nerveRoot}/AGENT.md\`, then \`${nerveRoot}/CONVENTIONS.md\` and \`${nerveRoot}/.knowledge/sense.md\` if they exist.
|
||||
|
||||
Verify the full lifecycle in this order:
|
||||
|
||||
-1. **File check** — all required sense files exist:
+1. **File check** — all required sense files exist (no per-sense \`package.json\`):
|
||||
- \`senses/<name>/src/index.ts\`
|
||||
- \`senses/<name>/src/schema.ts\`
|
||||
- \`senses/<name>/migrations/\`
|
||||
- \`senses/<name>/package.json\`
|
||||
|
||||
-2. **Build** — run inside the sense directory:
+2. **Build** — from the workspace root:
|
||||
\`\`\`
|
||||
-cd ${nerveRoot}/senses/<name> && pnpm install --no-cache && pnpm build
+cd ${nerveRoot} && pnpm run build
|
||||
\`\`\`
|
||||
-Must produce \`index.js\` at sense root without errors.
+(or \`npm run build\` per root \`package.json\`.) Must produce \`${nerveRoot}/dist/senses/<name>/index.js\` without errors.
|
||||
|
||||
3. **Config check** — \`nerve validate\` passes, confirming nerve.yaml is valid.
|
||||
|
||||
|
||||
@@ -10,7 +10,7 @@ export type CoderMeta = z.infer<typeof coderMetaSchema>;
|
||||
|
||||
export function coderPrompt({ threadId }: { threadId: string }): string {
|
||||
return `Read the workflow thread to get the planner's design and any reviewer/tester/committer feedback: \`nerve thread ${threadId}\`
|
||||
Read the nerve-dev skill for workflow file structure and conventions: \`cat node_modules/@uncaged/nerve-skills/nerve-dev/SKILL.md\`
|
||||
Read \`cat AGENT.md\` from the repository root, then \`CONVENTIONS.md\` and \`.knowledge/workflow.md\` if present. Optional skills live under \`.cursor/skills/\`.
|
||||
Also look at existing workflows in the \`workflows/\` directory for patterns.
|
||||
|
||||
## Your task
|
||||
@@ -29,15 +29,13 @@ You do NOT need to finish everything in one pass. You may return \`done: false\`
|
||||
2. Second pass: implement role logic
|
||||
3. Third pass: fix build/lint errors
|
||||
|
||||
-## Workflow file structure
+## Workflow file structure (flat workspace)
|
||||
|
||||
The workspace has **one root** \`package.json\` and **one** root build (\`pnpm run build\` or \`npm run build\`), implemented by \`scripts/build.mjs\`, which emits bundles under \`dist/workflows/<name>/index.js\`. There is **no** per-workflow \`package.json\` or \`tsconfig.json\`.
|
||||
|
||||
Each workflow must have:
|
||||
- \`workflows/<name>/index.ts\` — WorkflowDefinition default export
|
||||
- \`workflows/<name>/build.ts\` — factory function
|
||||
- \`workflows/<name>/moderator.ts\` — moderator + meta types
|
||||
- \`workflows/<name>/roles/<role>.ts\` — meta schema and prompt function per role
|
||||
- \`workflows/<name>/package.json\` — with esbuild build script
|
||||
- \`workflows/<name>/tsconfig.json\` — TypeScript config
|
||||
- \`workflows/<name>/index.ts\` — default export \`WorkflowDefinition\` (moderator and meta types typically live here or are imported from co-located modules)
|
||||
- \`workflows/<name>/roles/<role>.ts\` — one TypeScript file per role (schemas, prompts, \`createRole\` wiring, or plain async role functions)
|
||||
|
||||
For **new workflows**, also update \`nerve.yaml\` with \`workflows.<name>\`.
|
||||
|
||||
@@ -53,7 +51,7 @@ For **new workflows**, also update \`nerve.yaml\` with \`workflows.<name>\`.
|
||||
|
||||
Return \`done: true\` ONLY when ALL of the following are true:
|
||||
- All changes from the plan are implemented
|
||||
- \`cd workflows/<name> && pnpm install --no-cache && pnpm build\` succeeds (run it!)
|
||||
- From the **workspace root**, \`pnpm run build\` or \`npm run build\` succeeds (run it!) so \`dist/workflows/<name>/index.js\` is produced
|
||||
- No lint or type errors remain
|
||||
|
||||
Return \`done: false\` if you made progress but there is still work to do, or if build/lint has errors you plan to fix in the next iteration.`;
|
||||
|
||||
@@ -12,18 +12,18 @@ export function plannerPrompt({ threadId }: { threadId: string }): string {
|
||||
return `You are a Nerve workflow planner. You can **create new workflows** or **modify existing ones**.
|
||||
|
||||
Read the workflow thread for the user's request: \`nerve thread ${threadId}\`
|
||||
Read the nerve-dev skill for workflow conventions: \`cat node_modules/@uncaged/nerve-skills/nerve-dev/SKILL.md\`
|
||||
Read the workspace guide: \`cat AGENT.md\` from the repository root (created by \`nerve init\`). Also read \`CONVENTIONS.md\` if it exists; if \`.knowledge/workflow.md\` exists (e.g. Nerve monorepo), read it for layout and engine behavior. Optional Cursor skills live under \`.cursor/skills/\`.
|
||||
List existing workflows: \`ls workflows/\`
|
||||
|
||||
## Determine the task type
|
||||
|
||||
-1. If the user wants to **modify an existing workflow** — read its current code (\`cat workflows/<name>/moderator.ts\`, \`cat workflows/<name>/build.ts\`, \`ls workflows/<name>/roles/\`, etc.) and understand its current structure before planning changes.
+1. If the user wants to **modify an existing workflow** — read its current code (\`cat workflows/<name>/index.ts\`, \`ls workflows/<name>/roles/\`, \`cat workflows/<name>/roles/<role>.ts\`, etc.) and understand its current structure before planning changes.
|
||||
2. If the user wants to **create a new workflow** — look at existing workflows in \`workflows/\` for patterns to follow.
|
||||
|
||||
## Produce a PLAN (not code) in markdown
|
||||
|
||||
For **new workflows**:
|
||||
- Workflow name (kebab-case, **verb-first** phrase — e.g. \`extract-knowledge\`, \`solve-issue\`; not noun-led names like \`knowledge-extraction\` or \`issue-solver\`)
|
||||
- Workflow name — **verb-first** kebab-case phrase (e.g. \`review-pull-request\`, \`deploy-staging\`), not a bare noun
|
||||
- Roles list (name, purpose, tool)
|
||||
- Flow transitions / moderator routing logic
|
||||
- Validation loops design
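The naming rule in the list above (kebab-case, verb-first) can be partly checked mechanically. The sketch below validates only the kebab-case shape with at least two segments; whether the first segment is actually a verb cannot be decided by a pattern and is left to review. The helper name `isKebabCasePhrase` and the regular expression are illustrative assumptions, not part of Nerve.

```typescript
// Hypothetical helper: checks shape only (lowercase segments joined by hyphens,
// at least two segments). It does not check that the first segment is a verb.
const KEBAB_PHRASE = /^[a-z][a-z0-9]*(?:-[a-z0-9]+)+$/;

function isKebabCasePhrase(name: string): boolean {
  return KEBAB_PHRASE.test(name);
}

const candidates = [
  "review-pull-request",
  "deploy-staging",
  "IssueSolver",
  "deploy_staging",
  "deploy",
];

// Keeps only the names that have the expected shape.
const accepted = candidates.filter(isKebabCasePhrase);
```

Requiring at least one hyphen is a deliberate choice in this sketch: a single word such as `deploy` is rejected because the prompt asks for a phrase rather than a bare word.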
|
||||
|
||||
@@ -17,24 +17,22 @@ export function testerPrompt({
|
||||
**IMPORTANT: The Nerve workspace is at \`${nerveRoot}\`. All paths below are relative to this directory. Always \`cd ${nerveRoot}\` first.**
|
||||
|
||||
Read the workflow thread for context: \`nerve thread ${threadId}\`
|
||||
Read the nerve-dev skill for expected file structure: \`cat ${nerveRoot}/node_modules/@uncaged/nerve-skills/nerve-dev/SKILL.md\`
|
||||
Read \`cat ${nerveRoot}/AGENT.md\`, then \`${nerveRoot}/CONVENTIONS.md\` and \`${nerveRoot}/.knowledge/workflow.md\` if they exist.
|
||||
|
||||
Get the workflow name from the thread (the planner's output).
|
||||
|
||||
Verify the full lifecycle in this order:
|
||||
|
||||
-1. **File check** — all required workflow files exist (under \`${nerveRoot}/\`):
+1. **File check** — all required workflow sources exist (under \`${nerveRoot}/\`):
|
||||
- \`workflows/<name>/index.ts\`
|
||||
- \`workflows/<name>/build.ts\`
|
||||
- \`workflows/<name>/moderator.ts\`
|
||||
- \`workflows/<name>/roles/\` with one \`.ts\` file per role
|
||||
- \`workflows/<name>/package.json\`
|
||||
- \`workflows/<name>/roles/\` with one \`.ts\` file per role (flat files, not per-role packages)
|
||||
- **No** \`workflows/<name>/package.json\` or \`tsconfig.json\` expected
|
||||
|
||||
2. **Build** — run inside the workflow directory:
|
||||
2. **Build** — from the workspace root:
|
||||
\`\`\`
|
||||
cd ${nerveRoot}/workflows/<name> && pnpm install --no-cache && pnpm build
|
||||
cd ${nerveRoot} && pnpm run build
|
||||
\`\`\`
|
||||
Must produce \`dist/index.js\` without errors.
|
||||
(or \`npm run build\` if that is what the root \`package.json\` defines.) Must produce \`${nerveRoot}/dist/workflows/<name>/index.js\` without errors.
|
||||
|
||||
3. **Config check** — \`cd ${nerveRoot} && nerve validate\` passes, confirming nerve.yaml is valid.
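
The file check in step 1 above could be automated roughly as follows. This is a sketch under stated assumptions rather than code from this change set: `checkWorkflowLayout` is a hypothetical helper, and its default `requiredFiles` list is a guess (the diff does not make clear whether `build.ts` is still required, so pass your own list if it is).

```ts
import { existsSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: returns a list of layout problems (empty means the check passed).
function checkWorkflowLayout(
  nerveRoot: string,
  name: string,
  requiredFiles: string[] = ["index.ts", "moderator.ts"], // assumption, adjust as needed
): string[] {
  const dir = join(nerveRoot, "workflows", name);
  const problems: string[] = [];

  // Required top-level sources.
  for (const file of requiredFiles) {
    if (!existsSync(join(dir, file))) problems.push(`missing ${file}`);
  }

  // roles/ should contain flat .ts files, one per role.
  const rolesDir = join(dir, "roles");
  const roleFiles = existsSync(rolesDir)
    ? readdirSync(rolesDir).filter((f) => f.endsWith(".ts"))
    : [];
  if (roleFiles.length === 0) problems.push("roles/ has no .ts files");

  // The flat layout expects no per-workflow package.json or tsconfig.json.
  for (const unexpected of ["package.json", "tsconfig.json"]) {
    if (existsSync(join(dir, unexpected))) problems.push(`unexpected ${unexpected}`);
  }
  return problems;
}
```

Returning a list of problems instead of throwing on the first one lets a tester report every layout issue in a single pass before moving on to the build step.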

@@ -26,13 +26,11 @@ export {
} from "./role-decorators.js";
export {
  nerveCommandEnv,
  spawnSafe,
  type SpawnEnv,
  type SpawnError,
  type SpawnResult,
  type SpawnSafeOptions,
} from "@uncaged/nerve-core";
export type { LlmError, LlmProvider } from "./shared/llm-extract.js";
export { isDryRun } from "./role-types.js";
export type { LlmMessage, MetaExtractConfig } from "./role-types.js";
export type { LlmChatError } from "./shared/llm-chat.js";

@@ -1,16 +1,8 @@
import type { SpawnEnv, StartStep, ThreadContext } from "@uncaged/nerve-core";
import type { SpawnEnv, ThreadContext } from "@uncaged/nerve-core";
import type { z } from "zod";

import type { LlmProvider } from "./shared/llm-extract.js";

/**
 * @deprecated `dryRun` has been removed from `StartStep.meta` (RFC-005).
 * Use adapter/role-level `dryRun` config instead.
 */
export function isDryRun(_start: StartStep): boolean {
  return false;
}

export type CliPromptFn = (ctx: ThreadContext) => Promise<string>;

export type LlmMessage = { role: "system" | "user" | "assistant"; content: string };

@@ -3,7 +3,7 @@ import type { HermesRoleDefaults, HermesRoleRequired } from "../role-types.js";
export { hermesAgent } from "@uncaged/nerve-adapter-hermes";
export type { HermesAgentOptions } from "@uncaged/nerve-adapter-hermes";

// --- Hermes options resolution (absorbed from hermes-options.ts) ---
// --- Hermes role defaults resolution ---

const HERMES_DEFAULTS: HermesRoleDefaults = {
  model: null,