nerve/.knowledge/worker-isolation.md
xiaomo a1b1d5eaf1 chore: RFC-006 Phase 4 cleanup — delete worker-fork-support.ts
- Move formatChildExitSummary/formatCapturedStderrTail to worker-runtime.ts
- Move ignoreSessionBroadcastSignals to new worker-signals.ts
- Delete worker-fork-support.ts (teeCapturedStderr no longer used)
- Update .knowledge/worker-isolation.md and architecture.md for WorkerRuntime
- All 167 tests pass, biome check clean

Closes #283
2026-04-30 14:17:16 +00:00


Worker Isolation

Nerve's worker architecture runs all user code in separate worker processes, so a crash or hang in one sense group or workflow type cannot take down the kernel or the other workers.

Process Architecture

Kernel (Main Process)
├── Sense Worker (Group A) ── sense-1, sense-2
├── Sense Worker (Group B) ── sense-3, sense-4
├── Workflow Worker (cleanup) ── cleanup workflow instances
└── Workflow Worker (review) ── review workflow instances

WorkerRuntime (RFC-006)

Forked worker processes are managed by WorkerRuntime (worker-runtime.ts): one Node child per logical key, cold start, optional respawn after crash, drain/evict, and coordinated shutdown over IPC. worker-pool.ts (sense groups) and workflow-manager.ts (workflow types) both configure and delegate to createWorkerRuntime instead of owning ad-hoc fork logic.
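The "one child per logical key" bookkeeping described above can be sketched as follows. This is a minimal illustration, not the code in worker-runtime.ts: `KeyedRuntime`, `WorkerHandle`, and `Spawn` are hypothetical names, and the spawn function is injected so the sketch runs without forking real processes.

```typescript
// Hypothetical sketch: none of these names come from worker-runtime.ts.
interface WorkerHandle {
  kill(): void;
}

type Spawn = (key: string) => WorkerHandle;

class KeyedRuntime {
  private readonly workers = new Map<string, WorkerHandle>();
  private readonly spawn: Spawn;
  private readonly respawnOnCrash: boolean;

  constructor(spawn: Spawn, respawnOnCrash: boolean) {
    this.spawn = spawn;
    this.respawnOnCrash = respawnOnCrash;
  }

  // Cold start: a child is created only the first time its key is requested.
  ensure(key: string): WorkerHandle {
    let worker = this.workers.get(key);
    if (worker === undefined) {
      worker = this.spawn(key);
      this.workers.set(key, worker);
    }
    return worker;
  }

  // Called when the child for `key` exits unexpectedly.
  handleCrash(key: string): void {
    this.workers.delete(key);
    if (this.respawnOnCrash) this.ensure(key);
  }

  // Evict: stop one worker and forget it.
  evict(key: string): void {
    this.workers.get(key)?.kill();
    this.workers.delete(key);
  }

  // Coordinated shutdown: stop every known worker.
  shutdown(): void {
    for (const key of [...this.workers.keys()]) this.evict(key);
  }

  get size(): number {
    return this.workers.size;
  }
}
```

In a real runtime the injected spawn function would presumably wrap `fork()` from `node:child_process`, and `kill()` would send the IPC shutdown message rather than stop the child directly.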

Worker entrypoints (sense-worker.ts, workflow-worker.ts) import lightweight helpers only — e.g. worker-signals.ts for session broadcast signal handling — so they do not pull in the parent-side runtime module.

Isolation Boundaries

1. Sense Workers

  • One worker per sense group (configured in nerve.yaml)
  • Senses in the same group share one child process but have isolated execution contexts
  • A crash in one sense takes down only its own group's worker; other groups are unaffected
  • Each group has its own SQLite database

2. Workflow Workers

  • One worker per workflow type (spawned on-demand)
  • Multiple threads of the same workflow share a worker process
  • Concurrency limits enforced at the workflow level
  • Workers terminate when no active threads remain

3. Kernel Protection

  • User code never runs in the kernel process
  • All compute() and workflow role functions run in workers
  • Kernel only handles IPC, scheduling, and coordination
  • System remains stable even with infinite loops or crashes in user code

Worker Lifecycle

Sense Workers

nerve daemon start → spawn worker per group → long-lived process
                   → hot reload on file changes
                   → respawn on crash

Workflow Workers

workflow trigger → check existing worker → reuse or spawn
                                       → execute thread
                                       → terminate when idle
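The reuse-or-spawn and terminate-when-idle flow above amounts to counting active threads per workflow type. A minimal sketch, assuming a hypothetical `WorkflowWorkerRegistry` that records decisions instead of managing real processes:

```typescript
// Hypothetical sketch: tracks active threads per workflow type and records
// the lifecycle decision taken for each event.
class WorkflowWorkerRegistry {
  // workflow type -> number of threads currently running in its worker
  private readonly active = new Map<string, number>();
  readonly events: string[] = [];

  startThread(workflow: string): void {
    const running = this.active.get(workflow);
    if (running === undefined) {
      this.events.push(`spawn ${workflow}`); // no worker yet: spawn on demand
      this.active.set(workflow, 1);
    } else {
      this.events.push(`reuse ${workflow}`); // worker exists: share it
      this.active.set(workflow, running + 1);
    }
  }

  finishThread(workflow: string): void {
    const running = this.active.get(workflow);
    if (running === undefined) return; // unknown workflow: nothing to do
    if (running <= 1) {
      this.active.delete(workflow);
      this.events.push(`terminate ${workflow}`); // last thread done: worker is idle
    } else {
      this.active.set(workflow, running - 1);
    }
  }
}
```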

Communication Patterns

Kernel ↔ Sense Worker

  • IPC via child process stdio
  • JSON-formatted messages
  • Worker reports signals back to kernel
  • Bidirectional: kernel can request immediate computes
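As an illustration of JSON-formatted, bidirectional messages, the sketch below models each direction as a discriminated union and validates incoming worker messages before use. The message shapes and field names are hypothetical; the actual protocol is not shown in this document.

```typescript
// Hypothetical message shapes; field names are assumptions for illustration.
type KernelToWorker =
  | { type: "compute"; sense: string }
  | { type: "shutdown" };

type WorkerToKernel =
  | { type: "signal"; sense: string; payload: unknown }
  | { type: "error"; sense: string; message: string };

function encode(message: KernelToWorker | WorkerToKernel): string {
  return JSON.stringify(message);
}

// Returns null for anything that is not a well-formed worker message,
// so the kernel never acts on malformed input from user code.
function decodeWorkerMessage(raw: string): WorkerToKernel | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  if (typeof record.sense !== "string") return null;
  if (record.type === "signal") {
    return { type: "signal", sense: record.sense, payload: record.payload };
  }
  if (record.type === "error" && typeof record.message === "string") {
    return { type: "error", sense: record.sense, message: record.message };
  }
  return null;
}
```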

Kernel ↔ Workflow Worker

  • Similar IPC protocol
  • Workflow definition loaded in worker
  • Role execution results streamed back
  • Thread state managed in kernel

Resource Limits & Control

Timeout Enforcement

Configurable timeouts per sense (in nerve.yaml):

senses:
  my-sense:
    timeout: 30000      # Execution timeout (ms)
    gracePeriod: 5000   # Grace period before hard kill

Timeout Implementation:

  • AbortController for async operations
  • Promise.race() between compute and timeout
  • If the compute is still running when the grace period expires, the worker calls process.exit(1), which terminates the entire worker group's process
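The first two mechanisms above can be combined as in the following sketch. `runWithTimeout` is a hypothetical helper, not Nerve's implementation, and the grace-period hard kill is deliberately left out.

```typescript
// Hypothetical helper; not Nerve's actual implementation.
async function runWithTimeout<T>(
  compute: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Rejects after timeoutMs and signals cooperative code to stop.
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new Error(`compute timed out after ${timeoutMs} ms`));
    }, timeoutMs);
  });

  try {
    // Whichever settles first wins: the compute result or the timeout.
    return await Promise.race([compute(controller.signal), timeout]);
  } finally {
    clearTimeout(timer); // avoid a stray timer keeping the process alive
  }
}
```

Note that `Promise.race()` only stops waiting for the compute. The compute itself stops only if it observes the `AbortSignal`, which is where a hard kill after the grace period comes in.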

Memory & CPU Limits

No Application-Level Resource Quotas:

  • No memory caps, CPU throttling, or disk I/O limits enforced by Nerve
  • Workers can consume arbitrary system resources until they hit OS limits
  • No cgroup/container isolation — full filesystem access within user permissions
  • No syscall filtering (no seccomp restrictions)

OS-Level Constraints Only:

  • Process memory is limited only by OS-level limits (ulimit; note that current Linux kernels ignore ulimit -m and enforce ulimit -v instead)
  • CPU usage bounded by scheduler only
  • Network requests unrestricted
  • Can spawn additional processes (not tracked by Nerve)

Concurrency Control

Sense Workers

  • One active compute per sense at a time (serialized via promise chains)
  • No memory sharing between sense groups
  • Crash isolation: one sense crash doesn't affect other groups
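Serialization via a promise chain, as mentioned in the first bullet, can be sketched like this. `SerialQueue` is a hypothetical name; the idea is one queue per sense, with each compute appended to the tail of the chain.

```typescript
// Hypothetical helper illustrating serialization via a promise chain.
class SerialQueue {
  // Always a promise that never rejects, so the chain cannot break.
  private tail: Promise<unknown> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    // Start the task only after the previous one has settled.
    const result = this.tail.then(() => task());
    // Swallow the failure on the chain only; the caller still sees it via `result`.
    this.tail = result.catch(() => undefined);
    return result;
  }
}
```

A failed compute rejects the promise returned to its caller but does not block later computes on the same queue.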

Workflow Workers

Per-workflow limits configured in nerve.yaml:

workflows:
  my-workflow:
    concurrency: 2        # Max parallel threads
    overflow: "drop"      # or "queue"
    maxQueue: 10          # Queue size limit
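One way to read these three settings is as an admission decision made for each new thread. The sketch below is an interpretation, not Nerve's code: `AdmissionController` is hypothetical, and it assumes that a full queue drops new threads.

```typescript
// Hypothetical sketch of the admission decision implied by the config above.
type Overflow = "drop" | "queue";

interface WorkflowLimits {
  concurrency: number;
  overflow: Overflow;
  maxQueue: number;
}

type Admission = "run" | "queued" | "dropped";

class AdmissionController {
  private running = 0;
  private queued = 0;
  private readonly limits: WorkflowLimits;

  constructor(limits: WorkflowLimits) {
    this.limits = limits;
  }

  admit(): Admission {
    if (this.running < this.limits.concurrency) {
      this.running++;
      return "run";
    }
    // Assumption: a full queue behaves like overflow "drop".
    if (this.limits.overflow === "queue" && this.queued < this.limits.maxQueue) {
      this.queued++;
      return "queued";
    }
    return "dropped";
  }

  // Called when a running thread finishes.
  // Returns true if its slot was handed to a queued thread.
  release(): boolean {
    if (this.running === 0) return false;
    if (this.queued > 0) {
      this.queued--; // the freed slot is taken over, so `running` is unchanged
      return true;
    }
    this.running--;
    return false;
  }
}
```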

Process Management

Signal Handling

Workers ignore session broadcast signals (SIGINT/SIGTERM) via ignoreSessionBroadcastSignals() in worker-signals.ts:

// Workers ignore terminal signals; kernel coordinates shutdown
process.on("SIGINT", () => {});
process.on("SIGTERM", () => {});

Graceful Shutdown & State Handoff

Sense Workers:

  • IPC shutdown message → process.exit(0) (immediate)
  • No graceful termination period for senses
  • State rebuilt from SQLite on respawn (no handoff needed)

Workflow Workers:

  • IPC shutdown → wait for in-flight threads to complete
  • Drain timeout: WORKER_SHUTDOWN_TIMEOUT_MS (10s)
  • If threads have not completed when the drain timeout expires → forced termination with SIGKILL
  • Thread state preserved in log store for crash recovery
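The drain-then-kill sequence for workflow workers can be sketched as a race between the drain and a timer. `DrainableWorker` and `drainOrKill` are hypothetical names; in the real system the timeout would be WORKER_SHUTDOWN_TIMEOUT_MS.

```typescript
// Hypothetical interface; a real worker would wrap a forked child process.
interface DrainableWorker {
  requestShutdown(): Promise<void>; // resolves once in-flight threads finish
  forceKill(): void; // e.g. child.kill("SIGKILL")
}

async function drainOrKill(
  worker: DrainableWorker,
  timeoutMs: number,
): Promise<"drained" | "killed"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<"killed">((resolve) => {
    timer = setTimeout(() => resolve("killed"), timeoutMs);
  });

  try {
    const outcome = await Promise.race([
      worker.requestShutdown().then(() => "drained" as const),
      timedOut,
    ]);
    // Drain did not finish in time: fall back to forced termination.
    if (outcome === "killed") worker.forceKill();
    return outcome;
  } finally {
    clearTimeout(timer);
  }
}
```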

State Handoff Mechanism:

  • No explicit state transfer between old/new workers
  • Sense workers: SQLite database contains full state
  • Workflow workers: Log store contains thread message history
  • Kernel coordinates recovery via recoverThreadsForWorker()

Failure Handling

Worker Crashes

  • Sense workers: Automatic respawn after 1s delay, state rebuilt from DB
  • Workflow workers: Crash recovery from log store thread messages
  • Kernel protection: Main process continues, marks affected runs as crashed
  • Crash limits: Max 5 crashes per workflow in 60s window (prevents infinite respawn)
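The crash limit can be pictured as a sliding-window counter. The sketch below is illustrative only: `CrashLimiter` is a hypothetical name, and it assumes respawning stops once more than 5 crashes fall inside the 60s window.

```typescript
// Hypothetical sliding-window limiter; defaults mirror the numbers above.
class CrashLimiter {
  private crashes: number[] = [];
  private readonly maxCrashes: number;
  private readonly windowMs: number;

  constructor(maxCrashes = 5, windowMs = 60_000) {
    this.maxCrashes = maxCrashes;
    this.windowMs = windowMs;
  }

  // Records a crash at time `now` (ms) and reports whether a respawn is still allowed.
  recordCrash(now: number): boolean {
    this.crashes.push(now);
    // Keep only crashes that are still inside the window.
    this.crashes = this.crashes.filter((time) => now - time < this.windowMs);
    // Assumption: up to maxCrashes crashes in the window are tolerated.
    return this.crashes.length <= this.maxCrashes;
  }
}
```

Passing the timestamp in, rather than reading the clock inside the class, keeps the limiter deterministic and easy to test.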

Resource Exhaustion

  • Memory: Worker process killed by OS, kernel respawns automatically
  • Compute timeout: Grace period → hard kill → respawn
  • Infinite loops: Timeout enforcement prevents hanging indefinitely

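This architecture lets Nerve keep the kernel available while running experimental or buggy user code. Because workers have no sandboxing or resource quotas (see Memory & CPU Limits), process isolation alone does not make untrusted code safe to run.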