docs(knowledge): update cards via knowledge-extraction workflow (5q/round)

7 cards updated, 4 new cards added. Topics: signal-routing, worker-isolation, storage-layer, adapter-isolation, sense contracts, workflow runtime enforcement, coding conventions details.

小橘 <xiaoju@shazhou.work>
@@ -0,0 +1,171 @@
# Adapter Process Isolation

Describes sandboxing, process isolation, resource limits, and timeout enforcement for adapter invocations in the Nerve workflow system.

## Process Isolation Model

Adapters run in a **two-tier isolation** model:

1. **Workflow Worker Process** — Each workflow type runs in a dedicated Node.js worker process (`workflow-worker.ts`) forked from the main daemon
2. **Adapter Child Process** — Each adapter spawns CLI tools as child processes via `spawnSafe()` with `shell: false`

## Resource Limits & Timeouts

### Adapter-Level Timeouts

- **Default timeout**: 300 seconds (300,000 ms) for both the cursor and hermes adapters
- **Configurable** via `AgentConfig.timeout` in adapter factory functions
- **Wall-clock enforcement** using `setTimeout()` — kills the child process with `SIGTERM` on timeout
- **AbortSignal support** — external cancellation triggers an immediate `SIGTERM`

### Timeout Behavior

```ts
// Timeout resolution priority (packages/core/src/spawn-safe.ts):
// 1. Explicit timeoutMs value
// 2. AbortSignal presence → no internal timer (relies on external abort)
// 3. DEFAULT_TIMEOUT_MS (300_000) fallback
```

- Child process terminated with `SIGTERM` on timeout/abort
- On timeout, returns a `{ kind: "timeout", stdout, stderr }` error result (an abort yields kind `aborted`)
- **No grace period** — immediate kill
- **No SIGKILL escalation** — relies entirely on `SIGTERM` effectiveness
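
The timeout and abort rules above can be sketched with `node:child_process`. This is a minimal sketch, not the actual `spawnSafe()`: the name `spawnWithTimeout`, the option object, and the exact result fields are assumptions; only the priority order, the single `SIGTERM`, and the error kinds come from this card.

```typescript
import { spawn } from "node:child_process";

// Error kinds mirror this card; the exact field names are assumptions.
type SpawnError =
  | { kind: "spawn_failed"; message: string }
  | { kind: "non_zero_exit"; exitCode: number | null; stdout: string; stderr: string }
  | { kind: "timeout"; stdout: string; stderr: string }
  | { kind: "aborted"; stdout: string; stderr: string };

type SpawnResult =
  | { ok: true; value: { stdout: string; stderr: string } }
  | { ok: false; error: SpawnError };

const DEFAULT_TIMEOUT_MS = 300_000;

// 1. explicit timeoutMs, 2. a signal means "no internal timer", 3. default.
function resolveTimeoutMs(timeoutMs: number | null, signal: AbortSignal | null): number | null {
  if (timeoutMs !== null) return timeoutMs;
  if (signal !== null) return null;
  return DEFAULT_TIMEOUT_MS;
}

export function spawnWithTimeout(
  command: string,
  args: string[],
  opts: { cwd: string; timeoutMs: number | null; signal: AbortSignal | null },
): Promise<SpawnResult> {
  return new Promise<SpawnResult>((resolve) => {
    const child = spawn(command, args, { cwd: opts.cwd, shell: false, stdio: ["ignore", "pipe", "pipe"] });
    let stdout = "";
    let stderr = "";
    let stopReason: "timeout" | "aborted" | null = null;
    let timer: ReturnType<typeof setTimeout> | undefined;

    // A single SIGTERM: no grace period and no SIGKILL escalation.
    const stop = (reason: "timeout" | "aborted") => {
      stopReason = reason;
      child.kill("SIGTERM");
    };
    const onAbort = () => stop("aborted");
    const cleanup = () => {
      clearTimeout(timer);
      opts.signal?.removeEventListener("abort", onAbort);
    };

    child.stdout?.setEncoding("utf8");
    child.stderr?.setEncoding("utf8");
    child.stdout?.on("data", (chunk: string) => { stdout += chunk; });
    child.stderr?.on("data", (chunk: string) => { stderr += chunk; });

    // resolve() is idempotent, so whichever event settles first wins.
    child.on("error", (err) => {
      cleanup();
      resolve({ ok: false, error: { kind: "spawn_failed", message: err.message } });
    });
    child.on("close", (code) => {
      cleanup();
      if (stopReason !== null) resolve({ ok: false, error: { kind: stopReason, stdout, stderr } });
      else if (code === 0) resolve({ ok: true, value: { stdout, stderr } });
      else resolve({ ok: false, error: { kind: "non_zero_exit", exitCode: code, stdout, stderr } });
    });

    const ms = resolveTimeoutMs(opts.timeoutMs, opts.signal);
    if (ms !== null) timer = setTimeout(() => stop("timeout"), ms);
    if (opts.signal?.aborted) onAbort();
    else opts.signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```

Because this sketch settles on the child's `close` event, a child that ignores `SIGTERM` leaves the promise pending, which is exactly the failure mode described under SIGTERM Limitations.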

#### SIGTERM Limitations

If a child process **ignores or blocks `SIGTERM`** (e.g., signal handlers, blocked delivery):

- **No fallback to `SIGKILL`** — process may remain alive indefinitely
- **No escalation timer** — `spawnSafe()` does not implement progressive signal escalation
- **Stray process risk** — unresponsive processes keep running and consuming resources
- **No automatic cleanup** — if the daemon exits, a surviving child is reparented (typically to init) and keeps running; the OS reaps it only once it actually exits
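
One common mitigation, which `spawnSafe()` does **not** implement, is to escalate to `SIGKILL` after a short grace period. A sketch, assuming the caller holds the `ChildProcess` handle; `terminateWithEscalation` and `graceMs` are hypothetical names:

```typescript
import type { ChildProcess } from "node:child_process";

// Hypothetical mitigation, NOT part of spawnSafe(): send SIGTERM, then send
// SIGKILL if the child is still alive after graceMs. Resolves with the
// terminating signal name, or the exit code if the child exited on its own.
export function terminateWithEscalation(
  child: ChildProcess,
  graceMs: number,
): Promise<NodeJS.Signals | number | null> {
  return new Promise<NodeJS.Signals | number | null>((resolve) => {
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve(child.signalCode ?? child.exitCode); // already exited
      return;
    }
    const escalation = setTimeout(() => child.kill("SIGKILL"), graceMs);
    child.once("exit", (code, signal) => {
      clearTimeout(escalation);
      resolve(signal ?? code);
    });
    child.kill("SIGTERM");
  });
}
```

Unlike `SIGTERM`, `SIGKILL` cannot be caught or ignored, so the direct child is guaranteed to stop. It does not reach grandchildren the CLI tool may have spawned.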

## Sandboxing Characteristics

### What's Controlled

- **File system**: Child process starts in the specified `cwd` (workflow working directory); this sets the working directory only and is not a filesystem boundary
- **Environment**: Controlled env vars via `nerveCommandEnv()` + optional overrides
- **Network**: No explicit restrictions (inherits parent process network access)
- **Process tree**: Child processes are direct children, not containerized

### What's NOT Sandboxed

- **No resource quotas** (CPU, memory, disk I/O limits)
- **No filesystem chroot/containers** — full filesystem access within user permissions
- **No network isolation** — can make arbitrary network calls
- **No syscall filtering** — no seccomp or similar restrictions

#### Runtime Resource Enforcement

**No active resource monitoring or constraints**:

- **No cgroups** (Linux) — no CPU, memory, or I/O limits enforced
- **No job objects** (Windows) — no resource quotas or process tree limits
- **No per-worker limits** — workflow workers are forked processes (not `worker_threads`), and no memory or CPU limits are applied to them
- **Pure timeout-based enforcement** — only wall-clock time limits via `setTimeout()`
- **OS-scheduled resource sharing** — relies entirely on operating system process scheduling

Adapters can consume unlimited:
- **CPU time** (until timeout)
- **Memory** (until OOM)
- **Disk I/O** (no quotas)
- **Network bandwidth** (no throttling)
- **File descriptors** (until ulimit)

#### Environment Variable Security

The `nerveCommandEnv()` function provides **minimal sanitization**:

```ts
// spawn-safe.ts lines 47-55 (imports and SpawnEnv alias added so the excerpt compiles)
import { homedir } from "node:os";
import { join } from "node:path";

type SpawnEnv = NodeJS.ProcessEnv; // assumed shape; the real SpawnEnv type is not shown here

export function nerveCommandEnv(): SpawnEnv {
  const home = homedir();
  const pnpmHome = join(home, ".local/share/pnpm");
  return {
    ...process.env, // ← Full parent environment inherited
    PNPM_HOME: pnpmHome,
    PATH: `${pnpmHome}:${process.env.PATH ?? ""}`,
  };
}
```

- **No filtering of sensitive keys** — `NODE_OPTIONS`, `LD_PRELOAD`, `PYTHONPATH` passed through unchanged
- **Full environment inheritance** — all parent process environment variables copied
- **Injection risk** — if the daemon's environment is attacker-controlled, malicious env vars (e.g., `NODE_OPTIONS=--require=evil.js`) affect every Node.js child process
- **Path manipulation** — the full parent `PATH` is kept with `~/.local/share/pnpm` prepended, so every directory on it stays available to adapters, and executables in the pnpm directory shadow same-named ones later on `PATH`
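
If stricter handling were wanted, the environment could be filtered before spawning. This is a hypothetical hardening sketch, not Nerve's behavior: `sanitizedCommandEnv` and the denylist contents are assumptions, and an allowlist of known-needed variables would be stricter still.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical denylist: variables that change how child runtimes load code.
const DENIED_ENV_KEYS = new Set(["NODE_OPTIONS", "LD_PRELOAD", "LD_LIBRARY_PATH", "DYLD_INSERT_LIBRARIES", "PYTHONPATH"]);

export function sanitizedCommandEnv(parent: NodeJS.ProcessEnv = process.env): Record<string, string> {
  const env: Record<string, string> = {};
  for (const [key, value] of Object.entries(parent)) {
    // Skip unset values and denied keys; copy everything else unchanged.
    if (value === undefined || DENIED_ENV_KEYS.has(key)) continue;
    env[key] = value;
  }
  // Keep the same pnpm tool discovery that nerveCommandEnv() provides.
  const pnpmHome = join(homedir(), ".local/share/pnpm");
  env.PNPM_HOME = pnpmHome;
  env.PATH = `${pnpmHome}:${parent.PATH ?? ""}`;
  return env;
}
```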

## Security Model

### Execution Context

- Uses `shell: false` to prevent shell injection attacks
- Arguments passed as separate array elements (not shell-parsed)
- PATH includes `~/.local/share/pnpm` for tool discovery
- Inherits parent process user/group permissions

#### File Descriptor Management

```ts
// spawn-safe.ts line 122
stdio: ["ignore", "pipe", "pipe"]
```

- **stdin ignored**: Child's stdin is attached to `/dev/null`, so it receives no input (`stdio[0]: "ignore"`)
- **stdout/stderr captured**: Piped to parent for collection (`stdio[1,2]: "pipe"`)
- **No explicit fd closing**: only the three `stdio` entries are passed explicitly; `spawnSafe()` does not close any other descriptors itself
- **Close-on-exec by default**: Node.js (via libuv) opens its own files and sockets with close-on-exec, so the parent's network connections and database handles are not normally inherited by the child
- **Residual risk**: a descriptor opened without close-on-exec (for example by a native addon) would leak into adapter processes

### Attack Surface

- CLI tools have **full user-level filesystem access**
- Can spawn additional processes (not tracked/limited)
- Network requests unrestricted
- Resource consumption relies on OS-level limits

## Worker Process Management

### Workflow Isolation

- Each workflow type gets a dedicated worker process
- Worker processes handle multiple concurrent threads (runIds)
- Kill flags enable per-thread cancellation without killing the worker
- Graceful shutdown waits up to 10 seconds for in-flight operations

#### Cross-RunId Contamination Risks

**Shared mutable state** poses contamination risks between concurrent runIds:

- **`process.env` mutations**: Environment changes affect all subsequent runIds in the same worker
- **`require.cache` pollution**: Module cache shared across all runIds, so module-level side effects persist
- **Global variables**: Any global state mutations from one runId visible to others
- **Working directory changes**: `process.chdir()` changes the working directory of the entire worker process
- **File descriptors**: Open files/sockets shared between runId executions

**No runId-specific scoping** implemented:
- Worker reuses a single Node.js process for efficiency
- Each role execution sees the cumulative environment from previous runIds
- **Mitigation relies on adapter discipline** — clean implementations avoid global mutations

### Error Handling

- Adapter failures don't crash the worker process
- Timeout/abort errors are isolated to the specific role execution
- Worker process survives adapter failures and continues serving other threads

## Configuration

```yaml
# Example nerve.yaml configuration for timeout overrides
workflows:
  my-workflow:
    roles:
      coder:
        adapter:
          type: cursor
          timeout: 600000 # 10 minutes in milliseconds
```

Timeout configuration happens at the adapter creation level, not as a system-wide sandbox policy.

+22
-1
@@ -9,9 +9,19 @@ type AgentFn = (prompt: string, context: WorkflowContext) => Promise<string>
```

- Input: prompt + context (start frame, messages, workdir, AbortSignal)
- Output: raw string returned as a **single-shot `Promise<string>`** (no streaming support); structured extraction is a separate step
- Adapter handles tool-specific details internally

### Streaming Limitations

The `AgentFn` protocol does **not** support streaming responses (`AsyncIterable<string>` or `ReadableStream`). It is strictly limited to single-shot `Promise<string>` returns.

For long-running or incremental agent outputs:
- The adapter buffers the CLI tool's full output until the process exits
- Timeout enforcement via `timeoutMs` (default 300s)
- No intermediate results exposed to workflow logic
- Progress indication happens at the CLI tool level only

## Available Adapters

| Package | Adapter | Tool |
@@ -45,3 +55,14 @@ extract:
```

Two-level merge: global → role override. Retry once on parse failure (feeds error back to LLM), then throw `ExtractError`.
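
The retry rule can be sketched as follows, assuming an `ask` function that sends a prompt to the LLM and a parser that returns a `Result`-style value instead of throwing. `extractWithRetry`, `ParseResult`, the retry prompt wording, and this `ExtractError` class are illustrative stand-ins; the real `ExtractError` may carry different fields.

```typescript
// Illustrative stand-in; the real ExtractError may carry other fields.
export class ExtractError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ExtractError";
  }
}

type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

export async function extractWithRetry<T>(
  ask: (prompt: string) => Promise<string>,
  prompt: string,
  parse: (raw: string) => ParseResult<T>,
): Promise<T> {
  const first = parse(await ask(prompt));
  if (first.ok) return first.value;

  // Retry exactly once, feeding the parse error back to the model.
  const retryPrompt = `${prompt}\n\nYour previous reply could not be parsed: ${first.error}\nPlease answer again in the required format.`;
  const second = parse(await ask(retryPrompt));
  if (second.ok) return second.value;

  throw new ExtractError(`extraction failed after one retry: ${second.error}`);
}
```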

## Error Handling

When an adapter's underlying CLI tool (e.g., `cursor-agent` or `hermes`) fails, the error surfaces as a **rejection of the returned promise**, with no fallback/retry logic:

- **Missing/unavailable tool**: `spawn_failed` error when the CLI binary is not found in `$PATH`
- **Non-zero exit code**: `non_zero_exit` error with captured stdout/stderr
- **Timeout**: `timeout` error when execution exceeds the configured `timeoutMs`
- **Abort signal**: `aborted` error when the `AbortSignal` triggers cancellation

All errors are thrown as `Error` instances with descriptive messages (e.g., `"cursor-agent: exitCode=7 stdout=... stderr=..."`). There are no automatic retries or fallback adapters.
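
A sketch of how a spawn error kind might become the thrown `Error`, assuming the adapter receives a `Result` from a `spawnSafe()`-like call. The `SpawnError` shape, the helper names, and every message format other than the `exitCode=7 stdout=... stderr=...` example above are assumptions.

```typescript
// Assumed SpawnError shape (kinds from this card; fields are assumptions).
type SpawnError =
  | { kind: "spawn_failed"; message: string }
  | { kind: "non_zero_exit"; exitCode: number | null; stdout: string; stderr: string }
  | { kind: "timeout"; stdout: string; stderr: string }
  | { kind: "aborted"; stdout: string; stderr: string };

type SpawnResult = { ok: true; value: string } | { ok: false; error: SpawnError };

export function toAdapterError(tool: string, error: SpawnError): Error {
  switch (error.kind) {
    case "spawn_failed":
      return new Error(`${tool}: spawn failed: ${error.message}`);
    case "non_zero_exit":
      return new Error(`${tool}: exitCode=${error.exitCode} stdout=${error.stdout} stderr=${error.stderr}`);
    case "timeout":
      return new Error(`${tool}: timed out stdout=${error.stdout} stderr=${error.stderr}`);
    case "aborted":
      return new Error(`${tool}: aborted`);
  }
}

// Throwing inside an async function rejects the promise the caller awaits;
// there is no retry and no fallback adapter.
export async function runOrThrow(tool: string, run: () => Promise<SpawnResult>): Promise<string> {
  const result = await run();
  if (!result.ok) throw toAdapterError(tool, result.error);
  return result.value;
}
```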

@@ -33,3 +33,14 @@ Senses own both the "what" (compute logic) and the "when" (config-driven schedul
- One worker per Workflow type (on-demand)
- Workers never talk to each other
- All user code runs in isolated Workers; kernel never loads user code directly

## Storage Systems

- **Log Store** — SQLite with WAL mode for audit trails and workflow state
- **Sense Databases** — Isolated SQLite per sense group for private data
- **Knowledge Store** — Vector search index for project context
- **Blob Store** — Content-addressable storage for large artifacts

## Signal Flow

Sense compute outputs are routed through signal routing logic: every non-null output emits a signal, and a workflow is started in addition only when the output carries a valid workflow trigger.

@@ -6,6 +6,8 @@

```bash
nerve init # scaffold a new workspace (nerve.yaml, senses/, workflows/)
nerve init --force # reinitialize workspace even if ~/.uncaged-nerve/ exists (preserves data/)
nerve init --from <git-url> # clone existing workspace from git repository
nerve validate # validate nerve.yaml config
nerve dev # run kernel foreground (development, Ctrl+C to stop)
nerve start # start daemon (background)
@@ -14,6 +16,14 @@ nerve status # check daemon health (uptime, senses, workflows)
nerve daemon # restart daemon (stop + start)
```

### Init Behavior

**Default `nerve init`**: Creates the workspace at `~/.uncaged-nerve/`. If this directory already exists and is non-empty, it **exits with an error** requiring the `--force` flag. There is no merge/overwrite logic, which prevents accidental workspace destruction.

**Force mode `nerve init --force`**: Reinitializes the workspace even if `~/.uncaged-nerve/` exists. **Preserves the `data/` directory** (containing sense SQLite databases and logs) but overwrites all config files (`nerve.yaml`, `package.json`, etc.) and example senses.

**Git clone `nerve init --from <url>`**: Clones an existing repository to `~/.uncaged-nerve/`. Requires an empty target directory; fails if the workspace already exists and is non-empty.
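
The non-empty check can be sketched with `node:fs`. `checkInitTarget`, `InitMode`, and the error messages are hypothetical; the sketch covers only the precondition, not scaffolding, cloning, or preserving `data/`.

```typescript
import { existsSync, readdirSync } from "node:fs";

type InitMode = "default" | "force" | "from-git";
type InitCheck = { ok: true } | { ok: false; error: string };

// Hypothetical precondition check mirroring the three init modes above.
export function checkInitTarget(dir: string, mode: InitMode): InitCheck {
  const nonEmpty = existsSync(dir) && readdirSync(dir).length > 0;
  if (!nonEmpty) return { ok: true };
  switch (mode) {
    case "force":
      return { ok: true }; // reinitialize; the real command keeps data/
    case "from-git":
      return { ok: false, error: `${dir} is not empty; git clone needs an empty target` };
    case "default":
      return { ok: false, error: `${dir} already exists and is not empty; rerun with --force` };
  }
}
```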

## Sense Management

```bash

@@ -24,6 +24,21 @@ type Config = { throttle?: string }
- `throw` only for programmer errors (bugs)
- No try-catch for flow control

### Result<T, E> Type

Defined in `@uncaged/nerve-core` (`packages/core/src/result.ts`):

```ts
export type Result<T, E = Error> = { ok: true; value: T } | { ok: false; error: E };
```

**Discriminated union** with a tagged `ok` field. Helper functions:
- `ok(value)` → `{ ok: true, value }`
- `err(error)` → `{ ok: false, error }`

**Handling pattern**: check `if (!result.ok) { handle error }`, then access `result.value`.
The compiler rejects reading `result.value` before narrowing on `ok`, but nothing forces a caller to inspect a returned `Result` at all; that part relies on manual discipline.
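
A sketch of the helpers and the narrowing pattern, assuming `ok`/`err` are thin constructors (their actual signatures in `result.ts` are not shown in this card). `parsePort` is a made-up example function.

```typescript
export type Result<T, E = Error> = { ok: true; value: T } | { ok: false; error: E };

// Assumed helper signatures: one constructor per branch.
export const ok = <T>(value: T): Result<T, never> => ({ ok: true, value });
export const err = <E>(error: E): Result<never, E> => ({ ok: false, error });

// Expected failures are returned as values, not thrown.
export function parsePort(raw: string): Result<number, string> {
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) return err(`invalid port: ${raw}`);
  return ok(port);
}

const result = parsePort("8080");
if (!result.ok) {
  console.error(result.error); // narrowed: only the error branch has `error`
} else {
  console.log(result.value + 1); // narrowed: `value` is a number here
}
// Reading `result.value` without the check is a compile error, because
// `value` does not exist on the `{ ok: false; error: string }` branch.
```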

## Naming

| Type | Style |
@@ -38,9 +53,25 @@ type Config = { throttle?: string }
- Always named exports, never default
- One module = one responsibility

### Module Naming Conventions

**Primary exports** use descriptive, unambiguous names:
- Functions: `createXxx()`, `parseXxx()`, `xxxAgent()` (e.g., `createCursorAdapter`, `cursorAgent`)
- Types: Domain-specific prefixes (e.g., `CursorAgentOptions`, `SenseComputeFn`, `WorkflowContext`)
- Constants: `UPPER_SNAKE_CASE` with context (e.g., `DEFAULT_SENSE_SIGNAL_RETENTION`, `CURSOR_ADAPTER_DEFAULT_MS`)

**Avoiding ambiguity**:
- Package-scoped naming: `@uncaged/nerve-adapter-cursor` exports `cursorAgent`, `createCursorAdapter`
- Factory pattern: `createXxxAdapter()` for configurable instances, `xxxAgent` for the default instance (e.g., `cursorAgent`)
- Descriptive type prefixes prevent collisions (e.g., `CursorAgentOptions` vs `HermesAgentOptions`)

## Async

- Always `async/await`, never `.then()` chains
- Use `AbortSignal` for cancellation: create signals with an `AbortController` and pass them to long-running operations
- `spawn-safe.ts` and adapter functions accept an `abortSignal: AbortSignal | null` parameter
- On abort: child processes receive `SIGTERM`; async operations should check `signal.aborted`
- No Biome/Vitest rules enforce AbortSignal usage (manual discipline required)
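
The cancellation convention can be illustrated with a small polling loop. `sleep` and `pollUntil` are hypothetical helpers; the sketch resolves `false` on abort instead of rejecting, which keeps callers free of try/catch in line with the flow-control rule.

```typescript
// Resolves true after `ms`, or false as soon as `signal` aborts.
function sleep(ms: number, signal: AbortSignal | null): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    if (signal?.aborted) {
      resolve(false);
      return;
    }
    const onAbort = () => {
      clearTimeout(timer);
      resolve(false);
    };
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve(true);
    }, ms);
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}

// Re-runs `check` every `intervalMs` until it passes or the signal aborts.
export async function pollUntil(
  check: () => Promise<boolean>,
  intervalMs: number,
  signal: AbortSignal | null,
): Promise<boolean> {
  while (!signal?.aborted) {
    if (await check()) return true;
    const waited = await sleep(intervalMs, signal);
    if (!waited) break; // aborted while waiting
  }
  return false;
}
```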

## No Dynamic Import


@@ -19,10 +19,20 @@ nerve knowledge query --repo /path "query" # search specific repo

## Embedding

- **Default model**: Dashscope text-embedding-v3 (1024 dimensions)
- **Remote service**: configured via `EMBED_SERVICE_URL` env var (self-hosted Cloudflare Worker + KV cache)
- **Model configuration**: No mechanism to specify alternate models; hardcoded to text-embedding-v3 in the remote service
- **Vector dimensions**: Fixed at 1024 (Float32Array, stored as 4096-byte Buffer blobs in SQLite)
- **Cache**: content-addressable (sha256 of model+text), never expires
- **Fallback**: word-overlap scoring when embed service not configured
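
The storage format and cache key can be sketched as below. The 4096-byte size follows from 1024 × 4-byte floats; the helper names and the `\n` separator in the cache key are assumptions, since the card says only "sha256 of model+text".

```typescript
import { createHash } from "node:crypto";

const EMBEDDING_DIMS = 1024;
const BYTES_PER_FLOAT = 4;

// Float32Array(1024) → 4096-byte Buffer. The Buffer is a view over the same
// memory (no copy); bytes are in platform order (little-endian on common
// x86-64 and ARM systems).
export function vectorToBlob(vector: Float32Array): Buffer {
  if (vector.length !== EMBEDDING_DIMS) throw new Error(`expected ${EMBEDDING_DIMS} dims, got ${vector.length}`);
  return Buffer.from(vector.buffer, vector.byteOffset, vector.byteLength);
}

// 4096-byte blob → Float32Array(1024). Copying first guarantees the 4-byte
// alignment that a Float32Array view requires.
export function blobToVector(blob: Uint8Array): Float32Array {
  if (blob.byteLength !== EMBEDDING_DIMS * BYTES_PER_FLOAT) {
    throw new Error(`expected ${EMBEDDING_DIMS * BYTES_PER_FLOAT} bytes, got ${blob.byteLength}`);
  }
  const aligned = new Uint8Array(blob); // copies into a fresh ArrayBuffer at offset 0
  return new Float32Array(aligned.buffer, 0, EMBEDDING_DIMS);
}

// Content-addressable cache key over model and text (separator is assumed).
export function embeddingCacheKey(model: string, text: string): string {
  return createHash("sha256").update(`${model}\n${text}`).digest("hex");
}
```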

### Configuration

The embedding model is **not configurable** through `knowledge.yaml` or other config files. The remote service at `embed.shazhou.workers.dev` uses Dashscope text-embedding-v3 exclusively. To use different models, you would need to:

1. Deploy your own embedding service compatible with the same API
2. Point `EMBED_SERVICE_URL` to your service
3. Ensure vector dimensions match (1024) or modify knowledge database schema

## Chunking

+34
-9
@@ -12,15 +12,29 @@ export { snapshots as table } from "./schema.ts"; // drizzle table for runtime
export async function compute(): Promise<ComputeResult<T>> { ... } // pure, no args
```

- `compute()` is a **pure function with no arguments** — no db, no peers, no signal
- Returns `ComputeResult<T>` = `null | { signal: T; workflow: WorkflowTrigger | null }`
- `null` → silent, no storage, no signal
- `{ signal: data, workflow: null }` → persist data, emit signal
- `{ signal: data, workflow: { name, prompt } }` → persist data, emit signal, AND trigger workflow
- **Runtime handles persistence** — `db.insert(table).values(result.signal)` is done by `sense-runtime`, not by the sense itself
- Each Sense has its own **independent SQLite database**
- Schema defined with Drizzle ORM (`schema.ts` is single source of truth)
- Types: `SenseComputeFn`, `SenseModule`, `ComputeResult` exported from `@uncaged/nerve-core`

**Function Signature & Input Schema:**
- `compute()` is **parameterless**: it receives no direct inputs, though environment variables are readable
- No database access within compute; the runtime provides an isolated execution context
- Must be a pure function (no side effects, no external API calls)

**Return Value Contract:**
- `ComputeResult<T>` = `null | { signal: T; workflow: WorkflowTrigger | null }`
- `null` → silent, no storage, no signal
- `{ signal: data, workflow: null }` → persist + emit signal
- `{ signal, workflow: WorkflowTrigger }` → persist + emit signal + trigger workflow
- Any other value → treated as `{ signal: value, workflow: null }`

**Error Handling & Serialization:**
- Exceptions caught by the worker, logged as errors (no signal emitted)
- Signal payload must be JSON-serializable (passed via IPC)
- Invalid workflow triggers silently dropped (signal still emitted)

**Timeout & Scheduling Semantics:**
- Timeout priority: explicit config → AbortSignal → DEFAULT_TIMEOUT_MS (30s)
- Enforced via `Promise.race()` with a timeout promise
- Grace period can trigger `process.exit(1)` after timeout (kills worker group)
- Interval translation: YAML config values used directly as milliseconds in `setInterval()`
- Jitter control: a throttle mechanism prevents rapid-fire triggers, allowing a single deferred trigger per throttle window

## Config (nerve.yaml)

@@ -34,3 +48,14 @@ senses:
    interval: 30s # periodic trigger (optional)
    on: [disk-pressure] # trigger on signals from other senses (optional)
```

## Manual Trigger Context

**`nerve sense trigger <name>`** sends an IPC message to the running daemon. The compute context is initialized as follows:

- **SQLite Database**: Opened in **read-write mode** at `data/senses/<name>.db`
- **Migrations**: All `*.sql` files in `senses/<name>/migrations/` applied in lexicographic order
- **Environment**: Inherits the daemon process environment (no special secrets injection)
- **Arguments**: No runtime arguments or mock inputs supported; `compute()` is always a pure function with no parameters
- **Isolation**: Runs in a forked child process (worker) with full filesystem access within user permissions
- **Persistence**: Runtime automatically calls `db.insert(table).values(result.signal)` if compute returns a non-null signal
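
A sketch of lexicographic migration ordering, typed against a minimal `exec(sql)` interface so that `DatabaseSync` from `node:sqlite` (which has `exec`) or a test double can be passed. `applyMigrations` and `SqlExecutor` are hypothetical names, and the sketch does not record which files have already run.

```typescript
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Anything with exec(sql) fits, including node:sqlite's DatabaseSync.
interface SqlExecutor {
  exec(sql: string): void;
}

// Applies every *.sql file in lexicographic filename order and returns the
// file names in the order they were applied.
export function applyMigrations(db: SqlExecutor, migrationsDir: string): string[] {
  const files = readdirSync(migrationsDir)
    .filter((name) => name.endsWith(".sql"))
    .sort(); // default sort compares UTF-16 code units: lexicographic order
  for (const name of files) {
    db.exec(readFileSync(join(migrationsDir, name), "utf8"));
  }
  return files;
}
```

Lexicographic order means `10_x.sql` sorts before `2_x.sql`, so zero-padded prefixes such as `001_` and `002_` keep the intended sequence.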

@@ -0,0 +1,91 @@
# Signal Routing

Signal routing is the core mechanism that determines how Sense outputs flow through the Nerve system.

## Routing Logic

When a Sense `compute()` function returns non-null, the output goes through `routeSenseComputeOutput()` in `packages/core/src/sense-workflow-directive.ts`:

```
Sense compute() → non-null → routeSenseComputeOutput() → { signal, workflow }
↓
kernel.ts → signal ALWAYS emitted + optional workflow start
```

## Two Output Formats

### 1. Explicit Format
```typescript
// Shape of an explicit-format compute output (the alias name is illustrative)
type ExplicitComputeOutput = {
  signal: any; // emitted as signal
  workflow?: {
    // optional workflow trigger; a missing value is treated as null
    name: string;
    maxRounds: number;
    prompt: string;
    dryRun: boolean;
  } | null;
};
```

### 2. Shorthand Format
Any other value is treated as:
```typescript
{ signal: payload, workflow: null }
```

## Workflow Directive Parsing

## Concrete Routing Predicates

The routing decision is implemented in `routeSenseComputeOutput()` using these exact matching criteria:

### 1. Explicit Format Detection
```typescript
if (isPlainRecord(payload) && Object.hasOwn(payload, "signal"))
```
- Payload must be a plain object
- Must have a `signal` property (any value)
- Workflow extracted from the `workflow` property, or defaults to null

### 2. Workflow Validation
When workflow is non-null, it's validated via `parseWorkflowTrigger()`:
- `name`: non-empty string (trimmed)
- `maxRounds`: positive integer (>= 1)
- `prompt`: string
- `dryRun`: boolean

**Critical behavior**: Invalid workflows are silently dropped (become null) but signal emission continues. This prevents malformed workflow config from blocking signals.

### 3. Fallback to Shorthand
Any value that doesn't match explicit format becomes:
```typescript
{ signal: payload, workflow: null }
```
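
Taken together, the three predicates can be sketched as a single function. This is a hypothetical re-implementation for illustration: the real `routeSenseComputeOutput()` appears to return a `Result` (the kernel reads `routeResult.value`), and storing the trimmed `name` is an assumption here.

```typescript
type WorkflowTrigger = { name: string; maxRounds: number; prompt: string; dryRun: boolean };
type Routed = { signal: unknown; workflow: WorkflowTrigger | null };

function isPlainRecord(value: unknown): value is Record<string, unknown> {
  if (typeof value !== "object" || value === null) return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Returns null for any invalid trigger, so the caller can drop it silently.
export function parseWorkflowTrigger(raw: unknown): WorkflowTrigger | null {
  if (!isPlainRecord(raw)) return null;
  const { name, maxRounds, prompt, dryRun } = raw;
  if (typeof name !== "string" || name.trim() === "") return null;
  if (typeof maxRounds !== "number" || !Number.isInteger(maxRounds) || maxRounds < 1) return null;
  if (typeof prompt !== "string" || typeof dryRun !== "boolean") return null;
  return { name: name.trim(), maxRounds, prompt, dryRun };
}

export function routeSenseComputeOutput(payload: unknown): Routed {
  // Explicit format: a plain object with its own `signal` property.
  if (isPlainRecord(payload) && Object.hasOwn(payload, "signal")) {
    const rawWorkflow = payload.workflow ?? null;
    return {
      signal: payload.signal,
      workflow: rawWorkflow === null ? null : parseWorkflowTrigger(rawWorkflow),
    };
  }
  // Shorthand: anything else is the signal itself.
  return { signal: payload, workflow: null };
}
```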

## Processing Flow

```typescript
// In kernel.ts handleSenseWorkerSignal()
const { signal: signalPayload, workflow } = routeResult.value;

// Signal is ALWAYS emitted when compute returns non-null
bus.emit({ id, senseId, payload: signalPayload, timestamp });

// Workflow is started ONLY if workflow is non-null
if (workflow !== null) {
  workflowManager.startWorkflow(workflow.name, { ... });
}
```

## Legacy String Format (Deprecated)

The old `"name|maxRounds|prompt"` string format is converted to the structured format internally but should not be used in new code.

## Key Behaviors

1. **Signal priority**: Every non-null compute result emits a signal, regardless of workflow
2. **Additive behavior**: Valid workflow triggers are executed in addition to signal emission
3. **Failure tolerance**: Invalid workflow directives are silently ignored, signal still emits
4. **Structure-based routing**: No complex predicates - simply checks object structure and property existence

This routing mechanism ensures clean separation between perception (signals) and action (workflows) while maintaining backward compatibility.

@@ -0,0 +1,132 @@
# Storage Layer

Nerve uses multiple storage systems designed for different data types and access patterns.

## Core Storage Components

### 1. Log Store (`logs.db`)
Append-only audit trail implemented in SQLite with WAL mode.

**Schema:**
- `logs` — all system events (signals, workflow transitions, sense outputs)
- `meta` — key-value store for system metadata
- `workflow_runs` — materialized view of workflow execution state

**Key Features:**
- Atomic workflow state updates via transactions
- Thread message persistence for crash recovery
- Configurable log archival to JSONL files
- Full-text search across log entries

### 2. Sense Databases
Each sense group gets its own SQLite database for private state.

**Characteristics:**
- Isolated per sense group (e.g., `system-senses.db`)
- Written by the sense runtime, which persists what `compute()` returns (compute functions do not access the database directly)
- Drizzle ORM integration for schema management
- No cross-sense data sharing

### 3. Knowledge Store (`knowledge.db`)
Vector-enabled search index for project context.

**Contents:**
- Chunked source files with embeddings
- Curated knowledge cards from `.knowledge/`
- Semantic search capabilities
- Global vs. repo-scoped search modes

### 4. Blob Store (CAS)
Content-addressable storage for large artifacts.

**Design:**
- SHA-256 based file naming
- Automatic deduplication
- Used for workflow artifacts and large payloads

## Consistency & Isolation Mechanisms

### SQLite WAL Mode
All SQLite databases use `PRAGMA journal_mode=WAL` for:
- **Reader-writer concurrency** — readers don't block the writer and the writer doesn't block readers (writers are still serialized)
- **Atomic writes** — each transaction is fully applied or rolled back
- **Crash recovery** — WAL provides consistent state after crashes

### Transaction Management

#### Log Store Transactions
Uses `BEGIN IMMEDIATE` transactions (`packages/store/src/log-store.ts`):
```typescript
import type { DatabaseSync } from "node:sqlite";

function runInTransaction<T>(db: DatabaseSync, fn: () => T): T {
  db.exec("BEGIN IMMEDIATE"); // take the write lock up front; WAL readers are not blocked
  try {
    const result = fn();
    db.exec("COMMIT");
    return result;
  } catch (e) {
    db.exec("ROLLBACK");
    throw e;
  }
}
```

**Key Operations:**
- `upsertWorkflowRun()` — atomically writes log entry + workflow state
- `archiveLogs()` — transactional export + delete + watermark update

#### Sense Database Isolation
- Each sense group has its own SQLite file (e.g., `system-senses.db`)
- No cross-sense transactions or coordination required
- Independent schema migrations per sense
- Private `_signals` table for signal history retention

### Process-Level Isolation

#### Worker Process Architecture
- **One worker per sense group** — prevents data races within group
- **One worker per workflow type** — isolated execution contexts
- **No shared memory** — all communication via IPC messages

#### Concurrency Control
Workflow manager enforces limits per workflow:
```yaml
workflows:
  my-workflow:
    concurrency: 2 # Max parallel threads
    overflow: "queue" # or "drop"
    maxQueue: 10 # Queue depth limit
```
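
How `concurrency`, `overflow`, and `maxQueue` might interact can be sketched as a small limiter. `WorkflowLimiter` and its semantics (drop when the queue is full, hand a freed slot straight to the next queued thread) are assumptions based only on the config keys, not the workflow manager's code.

```typescript
type Overflow = "queue" | "drop";

export class WorkflowLimiter {
  private running = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly concurrency: number;
  private readonly overflow: Overflow;
  private readonly maxQueue: number;

  constructor(opts: { concurrency: number; overflow: Overflow; maxQueue: number }) {
    this.concurrency = opts.concurrency;
    this.overflow = opts.overflow;
    this.maxQueue = opts.maxQueue;
  }

  // Resolves "done" once the thread has run, or "dropped" if limits rejected it.
  async run(thread: () => Promise<void>): Promise<"done" | "dropped"> {
    if (this.running < this.concurrency) {
      this.running++;
    } else if (this.overflow === "queue" && this.waiting.length < this.maxQueue) {
      // Wait until a finishing thread hands over its slot (running stays unchanged).
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    } else {
      return "dropped";
    }
    try {
      await thread();
    } finally {
      const next = this.waiting.shift();
      if (next !== undefined) next();
      else this.running--;
    }
    return "done";
  }
}
```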

### Consistency Guarantees & Failure Modes

**Strong Consistency (Single Database)**:
1. **Within Log Store** — ACID transactions with immediate consistency
2. **Within Sense DB** — WAL mode ensures atomic commits per database
3. **Workflow State** — `upsertWorkflowRun()` atomically updates log + materialized view

**No Cross-Database Consistency**:
- No distributed transactions across multiple SQLite files
- Log Store and Sense Databases can temporarily diverge during failures
- Signal emission and workflow triggering are separate, non-atomic operations

**Failure Recovery Mechanisms**:
- **Sense worker crash**: State rebuilt from the sense SQLite database on respawn
- **Workflow worker crash**: Thread state recovered from log store message history
- **Kernel crash**: All workers respawned, state recovered from persistent stores
- **Log Store crash mid-write**: WAL recovery on database open restores the last committed state (this does not repair a corrupted file)
- **Sense DB corruption**: Migrations re-run, `_signals` table rebuilt if needed

**Rollback Scenarios**:
- **Log write failure**: Transaction rolled back, no state changes persisted
- **Sense compute failure**: Error logged, no signal/workflow emitted
- **Workflow failure**: Thread marked as failed in materialized view
- **IPC failure**: Worker respawned, pending operations lost (not rolled back)

## Archive Strategy

Logs older than the retention window (default 30 days) are:
1. Exported to `data/archive/logs/YYYY-MM-DD.jsonl`
2. Deleted from the active database
3. Watermark updated to prevent re-processing

This keeps the active database size bounded while preserving audit trails.
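
Steps 1-3 can be sketched as a pure planning function, assuming each log entry has a numeric `id` and a millisecond `timestamp`, that the file date is the entry's UTC date, and that the watermark is the highest archived `id`. `planArchive` is hypothetical; in the real store, `archiveLogs()` performs the export, delete, and watermark update inside one transaction.

```typescript
type LogEntry = { id: number; timestamp: number; [key: string]: unknown };
type ArchivePlan = { files: Map<string, string>; deleteIds: number[]; watermark: number | null };

const DAY_MS = 24 * 60 * 60 * 1000;

export function planArchive(entries: LogEntry[], now: number, retentionDays = 30): ArchivePlan {
  const cutoff = now - retentionDays * DAY_MS;
  const files = new Map<string, string>();
  const deleteIds: number[] = [];
  let watermark: number | null = null;
  for (const entry of entries) {
    if (entry.timestamp >= cutoff) continue; // still inside the retention window
    const day = new Date(entry.timestamp).toISOString().slice(0, 10); // YYYY-MM-DD, UTC
    const path = `data/archive/logs/${day}.jsonl`;
    files.set(path, (files.get(path) ?? "") + JSON.stringify(entry) + "\n"); // 1. export
    deleteIds.push(entry.id); // 2. delete
    watermark = Math.max(watermark ?? entry.id, entry.id); // 3. watermark
  }
  return { files, deleteIds, watermark };
}
```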

@@ -0,0 +1,152 @@
# Worker Isolation

Nerve's worker architecture runs user code in separate worker processes, isolating it from the kernel and from other worker groups while maintaining system stability.

## Process Architecture

```
Kernel (Main Process)
├── Sense Worker (Group A) ── sense-1, sense-2
├── Sense Worker (Group B) ── sense-3, sense-4
├── Workflow Worker (cleanup) ── cleanup workflow instances
└── Workflow Worker (review) ── review workflow instances
```

## Isolation Boundaries

### 1. Sense Workers
- **One worker per sense group** (configured in `nerve.yaml`)
- Senses in the same group share one child process; each group runs in its own process
- Crash in one sense doesn't affect other groups
- Each group has its own SQLite database

### 2. Workflow Workers
- **One worker per workflow type** (spawned on-demand)
- Multiple threads of the same workflow share a worker process
- Concurrency limits enforced at the workflow level
- Workers terminate when no active threads remain

### 3. Kernel Protection
- **User code never runs in the kernel process**
- All `compute()` and workflow role functions run in workers
- Kernel only handles IPC, scheduling, and coordination
- System remains stable even with infinite loops or crashes in user code

## Worker Lifecycle

### Sense Workers
```
nerve daemon start → spawn worker per group → long-lived process
                                            → hot reload on file changes
                                            → respawn on crash
```

### Workflow Workers
```
workflow trigger → check existing worker → reuse or spawn
                                         → execute thread
                                         → terminate when idle
```

## Communication Patterns

### Kernel ↔ Sense Worker
- IPC via child process stdio
- JSON-formatted messages
- Worker reports signals back to kernel
- Bidirectional: kernel can request immediate computes

### Kernel ↔ Workflow Worker
- Similar IPC protocol
- Workflow definition loaded in worker
- Role execution results streamed back
- Thread state managed in kernel

## Resource Limits & Control

### Timeout Enforcement
Configurable timeouts per sense (in `nerve.yaml`):
```yaml
senses:
  my-sense:
    timeout: 30000 # Execution timeout (ms)
    gracePeriod: 5000 # Grace period before hard kill
```

**Timeout Implementation:**
- `AbortController` for async operations
- `Promise.race()` between compute and timeout
- Grace period triggers `process.exit(1)` to kill entire worker group
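
The three pieces can be combined as sketched below, assuming `compute` accepts an `AbortSignal`. `runWithTimeout`, `Outcome`, and the injected `exitWorker` callback are hypothetical; in a worker, `exitWorker` would be `() => process.exit(1)`.

```typescript
type Outcome<T> =
  | { ok: true; value: T }
  | { ok: false; error: "timeout" | "failed"; cause?: unknown };

export async function runWithTimeout<T>(
  compute: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
  gracePeriodMs: number,
  exitWorker: () => void,
): Promise<Outcome<T>> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  let graceTimer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<Outcome<T>>((resolve) => {
    timer = setTimeout(() => {
      controller.abort();
      // If compute ignores the abort past the grace period, kill the worker.
      graceTimer = setTimeout(exitWorker, gracePeriodMs);
      resolve({ ok: false, error: "timeout" });
    }, timeoutMs);
  });

  // Never rejects: failures become values, so a late rejection is not unhandled.
  const work = (async (): Promise<Outcome<T>> => {
    try {
      return { ok: true, value: await compute(controller.signal) };
    } catch (cause) {
      return { ok: false, error: "failed", cause };
    } finally {
      clearTimeout(timer);
      clearTimeout(graceTimer); // compute settled, so no forced exit is needed
    }
  })();

  return Promise.race([work, timeout]);
}
```

Injecting `exitWorker` keeps the sketch testable; the design point is that the grace timer is cancelled only when `compute` actually settles, so a cooperative compute that honors the abort never triggers the forced exit.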
|
||||
|
||||
### Memory & CPU Limits

**No Application-Level Resource Quotas**:

- No memory caps, CPU throttling, or disk I/O limits enforced by Nerve
- Workers can consume arbitrary system resources, up to OS limits
- No cgroup/container isolation: full filesystem access within the user's permissions
- No syscall filtering (no seccomp restrictions)

**OS-Level Constraints Only**:

- Process memory bounded only by OS limits (`ulimit`) and available RAM; note that `ulimit -m` (`RLIMIT_RSS`) is not enforced by modern Linux kernels
- CPU usage bounded by the OS scheduler only
- Network requests unrestricted
- Workers can spawn additional processes, which Nerve does not track

### Concurrency Control

#### Sense Workers

- One active compute per sense at a time (serialized via promise chains)
- No memory sharing between sense groups
- Crash isolation: a crash in one sense group doesn't affect other groups
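Serializing computes per sense with promise chains can work as shown in this sketch. `SenseSerializer` and `run()` are hypothetical names. Each sense keeps the tail of its chain, and each new compute is chained onto that tail.

```typescript
// Per-sense serialization via promise chaining: a compute for a sense starts
// only after the previous compute for that sense has settled. Illustrative names.
class SenseSerializer {
  private readonly tails = new Map<string, Promise<void>>();

  run<T>(sense: string, compute: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(sense) ?? Promise.resolve();
    const result = previous.then(() => compute());
    // The stored tail never rejects, so one failed compute doesn't block the next.
    const tail = result.then(
      () => undefined,
      () => undefined,
    );
    this.tails.set(sense, tail);
    // Drop the map entry once this compute is the last one queued and has settled.
    void tail.then(() => {
      if (this.tails.get(sense) === tail) this.tails.delete(sense);
    });
    return result;
  }
}
```

A failure is still delivered to the caller through `result`. Only the internal tail swallows it, so later computes for the same sense keep running.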
#### Workflow Workers

Per-workflow limits configured in `nerve.yaml`:

```yaml
workflows:
  my-workflow:
    concurrency: 2 # Max parallel threads
    overflow: "drop" # or "queue"
    maxQueue: 10 # Queue size limit
```
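The `concurrency` / `overflow` / `maxQueue` settings translate into an admission check like the one below. `WorkflowGate` is a hypothetical name. This card does not say what happens when a `"queue"` workflow's queue is already full. The sketch assumes the trigger is dropped.

```typescript
// Mirrors the nerve.yaml keys above; the class itself is hypothetical.
interface WorkflowLimits {
  concurrency: number;
  overflow: "drop" | "queue";
  maxQueue: number;
}

type Admission = "run" | "queued" | "dropped";

class WorkflowGate {
  private running = 0;
  private readonly queue: Array<() => void> = [];
  private readonly limits: WorkflowLimits;

  constructor(limits: WorkflowLimits) {
    this.limits = limits;
  }

  // New trigger: start now if a slot is free, otherwise queue or drop.
  // Assumption: with overflow "queue", a trigger arriving at a full queue is dropped.
  submit(start: () => void): Admission {
    if (this.running < this.limits.concurrency) {
      this.running++;
      start();
      return "run";
    }
    if (this.limits.overflow === "queue" && this.queue.length < this.limits.maxQueue) {
      this.queue.push(start);
      return "queued";
    }
    return "dropped";
  }

  // Thread finished: hand its slot straight to the oldest queued thread, if any.
  finish(): void {
    const next = this.queue.shift();
    if (next !== undefined) next();
    else this.running--;
  }
}
```

With `overflow: "drop"`, this sketch ignores `maxQueue` entirely.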
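### Process Management

#### Signal Handling

Workers install no-op handlers for SIGINT and SIGTERM. As a result, signals broadcast to the whole terminal session do not kill them (for example, Ctrl+C delivers SIGINT to every process in the foreground process group). The kernel coordinates shutdown over IPC instead. Because SIGTERM is a no-op for workers, force termination has to use SIGKILL, which cannot be caught or ignored.

```typescript
// Workers ignore terminal signals; the kernel coordinates shutdown over IPC.
// In Node, installing a listener removes the default exit-on-signal behavior.
process.on("SIGINT", () => {});
process.on("SIGTERM", () => {});
```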
#### Graceful Shutdown & State Handoff

**Sense Workers**:

- IPC `shutdown` message → `process.exit(0)` (immediate)
- No graceful termination period for senses
- State rebuilt from SQLite on respawn (no handoff needed)

**Workflow Workers**:

- IPC `shutdown` → wait for in-flight threads to complete
- Drain timeout: `WORKER_SHUTDOWN_TIMEOUT_MS` (10s)
- If threads don't complete in time → `SIGKILL` force termination
- Thread state preserved in log store for crash recovery
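On the kernel side, the drain-then-kill sequence for workflow workers could look like this. Only the 10 s value and the SIGKILL fallback come from the text. `shutdownWorker` is a hypothetical helper, and the `{ type: "shutdown" }` message shape is assumed. SIGKILL is the only workable fallback because, as Signal Handling notes, workers ignore SIGTERM.

```typescript
import type { ChildProcess } from "node:child_process";

// Value from the text above; the helper around it is a sketch.
const WORKER_SHUTDOWN_TIMEOUT_MS = 10_000;

// Ask a workflow worker to drain over IPC; if it has not exited within the
// drain timeout, force-terminate it with SIGKILL (which cannot be caught).
function shutdownWorker(
  child: ChildProcess,
  timeoutMs: number = WORKER_SHUTDOWN_TIMEOUT_MS,
): Promise<"drained" | "killed"> {
  return new Promise((resolve) => {
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve("drained"); // already gone
      return;
    }
    const timer = setTimeout(() => {
      child.kill("SIGKILL");
      resolve("killed");
    }, timeoutMs);
    child.once("exit", () => {
      clearTimeout(timer);
      resolve("drained");
    });
    // Hypothetical message shape; the real IPC message may differ.
    if (child.connected) child.send({ type: "shutdown" });
  });
}
```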
**State Handoff Mechanism**:

- No explicit state transfer between old and new workers
- Sense workers: SQLite database contains full state
- Workflow workers: log store contains thread message history
- Kernel coordinates recovery via `recoverThreadsForWorker()`

## Failure Handling

### Worker Crashes

- **Sense workers**: Automatic respawn after a 1s delay; state rebuilt from the database
- **Workflow workers**: Crash recovery from the thread messages in the log store
- **Kernel protection**: Main process keeps running and marks affected runs as crashed
- **Crash limits**: At most 5 crashes per workflow within a 60s window (prevents infinite respawn loops)
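The crash limit amounts to a sliding-window counter. `CrashLimiter` and `recordCrash()` are hypothetical names. This card does not say whether the fifth crash inside the window still triggers a respawn; the sketch assumes it does not.

```typescript
// Sliding-window crash limiter. Defaults mirror the figures above
// (5 crashes per 60 s); the class and method names are illustrative.
class CrashLimiter {
  private crashes: number[] = [];
  private readonly maxCrashes: number;
  private readonly windowMs: number;

  constructor(maxCrashes = 5, windowMs = 60_000) {
    this.maxCrashes = maxCrashes;
    this.windowMs = windowMs;
  }

  // Record a crash at `now` (ms). Returns true if the worker may be respawned.
  // Assumption: reaching `maxCrashes` inside the window stops respawning.
  recordCrash(now: number = Date.now()): boolean {
    this.crashes = this.crashes.filter((t) => now - t < this.windowMs);
    this.crashes.push(now);
    return this.crashes.length < this.maxCrashes;
  }
}
```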
### Resource Exhaustion

- **Memory**: Worker process killed by the OS (e.g. the Linux OOM killer); the Nerve kernel respawns it automatically
- **Compute timeout**: Grace period → hard kill → respawn
- **Infinite loops**: Timeout enforcement prevents a compute from hanging indefinitely

This architecture keeps Nerve available when experimental or misbehaving code crashes, hangs, or exhausts memory. It provides fault isolation, not a security sandbox: there are no container, cgroup, or seccomp restrictions, so untrusted code still runs with the user's full permissions.
@@ -57,3 +57,66 @@ const workflow: WorkflowDefinition<MyMeta> = {
- `prompt: string | ((start, messages) => Promise<string>)` (static or dynamic)
- `meta: z.ZodType<M>` (Zod schema, passed directly; no wrapper needed)
- `extract: LlmExtractorConfig` (provider for structured extraction)

## Runtime Enforcement Mechanisms

### Role Authority & Validation

**Role Function Lookup**:

- Roles accessed via `def.roles[nextRole]` dictionary lookup
- Unknown roles trigger an immediate workflow error (`Unknown role: ${nextRole}`)
- No dynamic role registration during execution
**Result Validation** (`validateRoleResult()`):

```typescript
// Required return shape for every role function
type RoleResult = { content: string; meta: Record<string, unknown> };
```

- `content` must be a string (non-string → workflow error)
- `meta` must be a plain object (array/null/primitive → workflow error)
- Validation failure terminates the thread immediately
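The checks listed above could be implemented as follows. This is a sketch, not the actual `validateRoleResult()`: its signature and error messages are assumptions. Treating class instances such as `Date` as non-plain objects is also a stricter reading of "plain object" than the text spells out.

```typescript
type RoleResult = { content: string; meta: Record<string, unknown> };

// True for object literals and Object.create(null); false for arrays,
// null, primitives, and class instances such as Date.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

// Sketch of the checks listed above; the real function's signature and
// error messages are not shown in this card.
function validateRoleResult(value: unknown): RoleResult {
  if (!isPlainObject(value)) {
    throw new Error("Role must return an object { content, meta }");
  }
  const { content, meta } = value;
  if (typeof content !== "string") {
    throw new Error("Role result `content` must be a string");
  }
  if (!isPlainObject(meta)) {
    throw new Error("Role result `meta` must be a plain object");
  }
  return { content, meta };
}
```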
### Moderator Authority & Routing Control

**Next Role Selection**:

- Moderator must return a role name from the `roles` keys or the `END` symbol
- Called after every role completion (receives the full context)
- Role name is not validated until execution is attempted
- Must be a pure function with no side effects (a contract on workflow authors; JavaScript cannot enforce purity at runtime)
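A thread's main loop combines role lookup, moderator routing, and the kill-flag checks. The sketch below shows one way they fit. All names (`runThread`, `WorkflowDef`, `History`, and the local `END` symbol) are hypothetical stand-ins, and the sketch omits IPC, logging, and result validation.

```typescript
// Hypothetical stand-in for the workflow END sentinel.
const END = Symbol("END");

type Meta = Record<string, unknown>;
interface Step { role: string; content: string; meta: Meta }
interface History { start: { content: string; meta: Meta }; steps: readonly Step[] }
type RoleFn = (history: History) => Promise<{ content: string; meta: Meta }>;
interface WorkflowDef {
  roles: Record<string, RoleFn>;
  moderator: (history: History) => string | typeof END | Promise<string | typeof END>;
}

// Moderator-driven loop: ask for the next role, look it up, run it, append
// its output, repeat until END. Illustrative, not Nerve's implementation.
async function runThread(
  def: WorkflowDef,
  start: History["start"],
  killFlag: { value: boolean },
): Promise<Step[] | "killed"> {
  const steps: Step[] = [];
  // Each caller gets a frozen snapshot, so nobody can rewrite earlier steps.
  const history = (): History => ({ start, steps: Object.freeze([...steps]) });
  for (;;) {
    const nextRole = await def.moderator(history());
    if (nextRole === END) return steps;
    // Role names are only checked here, when execution is attempted.
    if (!Object.prototype.hasOwnProperty.call(def.roles, nextRole)) {
      throw new Error(`Unknown role: ${nextRole}`);
    }
    if (killFlag.value) return "killed"; // checked before the role runs
    const result = await def.roles[nextRole](history());
    if (killFlag.value) return "killed"; // and again after it completes
    // validateRoleResult() would run here in the real worker.
    steps.push({ role: nextRole, content: result.content, meta: result.meta });
  }
}
```

The moderator returns a plain string, so a misspelled role name does not fail when the workflow loads. It fails with `Unknown role: ...` at the step where that role would run.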
**Causal Chain Integrity**:

- Moderator receives an immutable history: `{ start, steps }`
- `steps` contains ALL role outputs in chronological order
- No role can modify prior steps or the start metadata
- Thread context rebuilt from the log store on crash recovery
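On recovery, rebuilding `{ start, steps }` from the log store is essentially a replay of the thread's records in append order. The `LogEntry` shape, its `seq` field, and `rebuildThread()` are assumptions for illustration. This card does not describe the log store's schema or how `recoverThreadsForWorker()` works internally.

```typescript
// Hypothetical log-store record; the real schema is not described in this card.
interface LogEntry {
  threadId: string;
  seq: number; // append order within the thread
  kind: "start" | "step";
  role?: string;
  content: string;
  meta: Record<string, unknown>;
}

interface Step { role: string; content: string; meta: Record<string, unknown> }
interface ThreadContext {
  start: { content: string; meta: Record<string, unknown> };
  steps: readonly Step[];
}

// Rebuild `{ start, steps }` for one thread by replaying its entries in order.
// Returns undefined if the thread has no start record.
function rebuildThread(entries: readonly LogEntry[], threadId: string): ThreadContext | undefined {
  const mine = entries
    .filter((e) => e.threadId === threadId)
    .sort((a, b) => a.seq - b.seq);
  const first = mine[0];
  if (first === undefined || first.kind !== "start") return undefined;
  const steps = mine
    .slice(1)
    .filter((e) => e.kind === "step")
    .map((e) => ({ role: e.role ?? "", content: e.content, meta: e.meta }));
  return { start: { content: first.content, meta: first.meta }, steps: Object.freeze(steps) };
}
```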
### Unauthorized Command Event Prevention

**Message Flow Control**:

- Role functions have NO direct access to kernel IPC
- All outputs flow through the `sendWorkflowMessage()` wrapper
- Worker process validates messages before sending them to the kernel
- No direct log store database access from roles

**Process Isolation**:

- Roles execute in forked worker processes (not in the kernel)
- File system access limited only by user permissions
- No network isolation (roles can make arbitrary HTTP calls)
- Workers are expected to read and write only the workflow workspace; nothing at the OS level enforces this

### Concurrent Thread Management
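**Kill Flag Implementation**:

```typescript
type KillFlag = { value: boolean };

// Signature assumed for illustration; sendThreadEvent is provided by the worker runtime.
declare function sendThreadEvent(runId: string, event: string, data: { exitCode: number }): void;

// Hypothetical wrapper; checked before role execution and after completion.
function checkKilled(killFlag: KillFlag, runId: string): boolean {
  if (killFlag.value) {
    // 137 = 128 + 9, the shell convention for a process killed by SIGKILL
    sendThreadEvent(runId, "killed", { exitCode: 137 });
    return true; // caller stops the thread
  }
  return false;
}
```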
**Concurrency Enforcement**:

- Workflow manager in the kernel enforces per-workflow limits
- Excess threads queued or dropped according to the overflow policy
- No role can spawn additional threads (no access to the workflow manager)