Worker Isolation
Nerve's worker architecture isolates different types of user code in separate processes so that faults in user code do not destabilize the system.
Process Architecture
Kernel (Main Process)
├── Sense Worker (Group A) ── sense-1, sense-2
├── Sense Worker (Group B) ── sense-3, sense-4
├── Workflow Worker (cleanup) ── cleanup workflow instances
└── Workflow Worker (review) ── review workflow instances
WorkerRuntime (RFC-006)
Forked worker processes are managed by WorkerRuntime (worker-runtime.ts): one Node child per logical key, cold start, optional respawn after crash, drain/evict, and coordinated shutdown over IPC. worker-pool.ts (sense groups) and workflow-manager.ts (workflow types) both configure and delegate to createWorkerRuntime instead of owning ad-hoc fork logic.
Worker entrypoints (sense-worker.ts, workflow-worker.ts) import lightweight helpers only — e.g. worker-signals.ts for session broadcast signal handling — so they do not pull in the parent-side runtime module.
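The keyed lifecycle described above can be sketched as follows. This is a minimal illustration, not the actual `createWorkerRuntime` API: `KeyedRuntime`, `WorkerHandle`, and the injected `spawn` function are hypothetical names, and a real runtime would fork a Node child and add drain/evict and IPC shutdown on top.

```typescript
// Hypothetical handle for a forked child; a real runtime would wrap a ChildProcess.
interface WorkerHandle {
  key: string;
  onExit(listener: (code: number | null) => void): void;
}

type SpawnFn = (key: string) => WorkerHandle;

class KeyedRuntime {
  private readonly workers = new Map<string, WorkerHandle>();
  private readonly spawn: SpawnFn;
  private readonly respawnOnCrash: boolean;

  constructor(spawn: SpawnFn, respawnOnCrash: boolean) {
    this.spawn = spawn;
    this.respawnOnCrash = respawnOnCrash;
  }

  // Cold start: the child for a key is created only on first use.
  ensure(key: string): WorkerHandle {
    const existing = this.workers.get(key);
    if (existing) return existing;

    const handle = this.spawn(key);
    this.workers.set(key, handle);
    handle.onExit((code) => {
      // Ignore exits from handles that were already replaced.
      if (this.workers.get(key) !== handle) return;
      this.workers.delete(key);
      // Assumption: any exit other than code 0 counts as a crash.
      if (code !== 0 && this.respawnOnCrash) this.ensure(key);
    });
    return handle;
  }

  has(key: string): boolean {
    return this.workers.has(key);
  }
}
```

Under this sketch, both a sense-group pool and a workflow manager would only choose the key (group name or workflow type) and the respawn policy, leaving the fork bookkeeping in one place.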
Isolation Boundaries
1. Sense Workers
- One worker per sense group (configured in nerve.yaml)
- Senses in a group share one child process but have isolated execution contexts
- Crash in one sense doesn't affect other groups
- Each group has its own SQLite database
2. Workflow Workers
- One worker per workflow type (spawned on-demand)
- Multiple threads of the same workflow share a worker process
- Concurrency limits enforced at the workflow level
- Workers terminate when no active threads remain
3. Kernel Protection
- User code never runs in kernel process
- All compute() and workflow role functions run in workers
- Kernel only handles IPC, scheduling, and coordination
- System remains stable even with infinite loops or crashes in user code
Worker Lifecycle
Sense Workers
nerve daemon start → spawn worker per group → long-lived process
→ hot reload on file changes
→ respawn on crash
Workflow Workers
workflow trigger → check existing worker → reuse or spawn
→ execute thread
→ terminate when idle
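The reuse-or-spawn and terminate-when-idle steps above can be sketched as a per-workflow thread counter. The names `WorkflowWorkerTable`, `startThread`, and `finishThread` are hypothetical, and `spawn`/`terminate` are injected callbacks standing in for whatever the workflow manager actually does.

```typescript
// Hypothetical bookkeeping for on-demand workflow workers.
class WorkflowWorkerTable {
  private readonly activeThreads = new Map<string, number>();
  private readonly spawn: (workflow: string) => void;
  private readonly terminate: (workflow: string) => void;

  constructor(
    spawn: (workflow: string) => void,
    terminate: (workflow: string) => void,
  ) {
    this.spawn = spawn;
    this.terminate = terminate;
  }

  // Trigger: reuse the worker for this workflow type, or spawn one.
  startThread(workflow: string): void {
    const count = this.activeThreads.get(workflow);
    if (count === undefined) this.spawn(workflow);
    this.activeThreads.set(workflow, (count ?? 0) + 1);
  }

  // Thread finished: terminate the worker once no threads remain on it.
  finishThread(workflow: string): void {
    const count = this.activeThreads.get(workflow);
    if (count === undefined) return; // no worker for this workflow
    if (count <= 1) {
      this.activeThreads.delete(workflow);
      this.terminate(workflow);
    } else {
      this.activeThreads.set(workflow, count - 1);
    }
  }
}
```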
Communication Patterns
Kernel ↔ Sense Worker
- IPC via child process stdio
- JSON-formatted messages
- Worker reports signals back to kernel
- Bidirectional: kernel can request immediate computes
Kernel ↔ Workflow Worker
- Similar IPC protocol
- Workflow definition loaded in worker
- Role execution results streamed back
- Thread state managed in kernel
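As an illustration of JSON-formatted IPC messages, the sketch below models both directions as discriminated unions and validates incoming data before use. The message shapes (`compute`, `shutdown`, `signal`, `error`) and the function names are assumptions made for this example; the actual Nerve protocol may use different fields.

```typescript
// Hypothetical message shapes; the real protocol is not specified here.
type KernelToWorker =
  | { type: "compute"; sense: string } // kernel requests an immediate compute
  | { type: "shutdown" };

type WorkerToKernel =
  | { type: "signal"; sense: string; payload: unknown } // worker reports a signal
  | { type: "error"; sense: string; message: string };

function encodeMessage(msg: KernelToWorker | WorkerToKernel): string {
  return JSON.stringify(msg);
}

// Parses and validates a kernel message; returns null for anything malformed.
function decodeKernelMessage(raw: string): KernelToWorker | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;

  const msg = value as { type?: unknown; sense?: unknown };
  if (msg.type === "shutdown") return { type: "shutdown" };
  if (msg.type === "compute" && typeof msg.sense === "string") {
    return { type: "compute", sense: msg.sense };
  }
  return null;
}
```

Validating at the boundary keeps a malformed message from turning into an exception deep inside the worker loop.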
Resource Limits & Control
Timeout Enforcement
Configurable timeouts per sense (in nerve.yaml):
senses:
my-sense:
timeout: 30000 # Execution timeout (ms)
gracePeriod: 5000 # Grace period before hard kill
Timeout Implementation:
- AbortController for async operations
- Promise.race() between compute and timeout
- Grace period triggers process.exit(1) to kill entire worker group
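The three pieces above can be combined roughly as follows. This is a sketch under stated assumptions, not Nerve's implementation: `runWithTimeout` and the `onHardTimeout` hook are hypothetical names, and the hook is injected so the example does not actually exit the process.

```typescript
// Runs `compute` with a soft timeout (abort + rejection) and a hard deadline.
// `onHardTimeout` is a hypothetical hook for the grace-period escalation.
async function runWithTimeout<T>(
  compute: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
  gracePeriodMs: number,
  onHardTimeout: () => void,
): Promise<T> {
  const controller = new AbortController();
  let softTimer: ReturnType<typeof setTimeout> | undefined;
  let hardTimer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_, reject) => {
    softTimer = setTimeout(() => {
      reject(new Error(`compute timed out after ${timeoutMs}ms`));
      controller.abort(); // ask cooperative async work to stop
      // If the compute has not settled when the grace period ends, escalate.
      hardTimer = setTimeout(onHardTimeout, gracePeriodMs);
    }, timeoutMs);
  });

  // The async wrapper turns a synchronous throw into a rejection.
  const work = (async () => compute(controller.signal))();

  // Once the compute settles either way, cancel any pending timers.
  const clearTimers = () => {
    clearTimeout(softTimer);
    clearTimeout(hardTimer);
  };
  work.then(clearTimers, clearTimers);

  return Promise.race([work, timeout]);
}
```

In a worker, `onHardTimeout` would presumably be `() => process.exit(1)`, matching the grace-period behavior described above. Timers only fire when the compute yields to the event loop, so this in-process sketch covers stalled async work rather than a synchronous busy loop.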
Memory & CPU Limits
No Application-Level Resource Quotas:
- No memory caps, CPU throttling, or disk I/O limits enforced by Nerve
- Workers can consume arbitrary system resources until OS limits
- No cgroup/container isolation — full filesystem access within user permissions
- No syscall filtering (no seccomp restrictions)
OS-Level Constraints Only:
- Process memory limited by system ulimit -m
- CPU usage bounded by scheduler only
- Network requests unrestricted
- Can spawn additional processes (not tracked by Nerve)
Concurrency Control
Sense Workers
- One active compute per sense at a time (serialized via promise chains)
- No memory sharing between sense groups
- Crash isolation: one sense crash doesn't affect other groups
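Serializing computes through promise chains, as mentioned above, can be sketched like this. `SerialQueue` is a hypothetical name, and the sketch assumes the key is the sense name: each new job is chained onto the tail of the previous job for the same key, so at most one runs at a time.

```typescript
// Serializes async jobs per key by chaining each job onto the previous one.
class SerialQueue {
  private readonly tails = new Map<string, Promise<unknown>>();

  run<T>(key: string, job: () => Promise<T>): Promise<T> {
    const previous: Promise<unknown> = this.tails.get(key) ?? Promise.resolve();
    // The stored tail never rejects, so `then` always starts the next job.
    const result = previous.then(() => job());
    // Swallow errors on the tail only; the caller still sees them on `result`.
    this.tails.set(key, result.catch(() => undefined));
    return result;
  }
}
```

Jobs under different keys do not wait on each other, and a failed compute does not block the computes queued behind it.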
Workflow Workers
Per-workflow limits configured in nerve.yaml:
workflows:
my-workflow:
concurrency: 2 # Max parallel threads
overflow: "drop" # or "queue"
maxQueue: 10 # Queue size limit
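One way to read these three settings is as an admission decision made whenever a workflow is triggered. The sketch below is an interpretation, not Nerve's code: `admit` and its types are hypothetical, and it assumes a trigger is dropped when the queue is already full, which the configuration above does not state.

```typescript
type Overflow = "drop" | "queue";

interface WorkflowLimits {
  concurrency: number; // max parallel threads
  overflow: Overflow;  // what to do when all slots are busy
  maxQueue: number;    // queue size limit (only relevant for "queue")
}

type Admission = "run" | "queued" | "dropped";

// Decides what happens to a new trigger given the current load.
function admit(limits: WorkflowLimits, running: number, queued: number): Admission {
  if (running < limits.concurrency) return "run";
  if (limits.overflow === "queue" && queued < limits.maxQueue) return "queued";
  // Assumption: a full queue behaves like overflow "drop".
  return "dropped";
}
```

With the example values above (`concurrency: 2`, `overflow: "drop"`), a third simultaneous trigger would be dropped; switching to `"queue"` would hold up to ten triggers instead.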
Process Management
Signal Handling
Workers ignore session broadcast signals (SIGINT/SIGTERM) via ignoreSessionBroadcastSignals() in worker-signals.ts:
// Workers ignore terminal signals; kernel coordinates shutdown
process.on("SIGINT", () => {});
process.on("SIGTERM", () => {});
Graceful Shutdown & State Handoff
Sense Workers:
- IPC shutdown message → process.exit(0) (immediate)
- No graceful termination period for senses
- State rebuilt from SQLite on respawn (no handoff needed)
Workflow Workers:
- IPC shutdown → wait for in-flight threads to complete
- Drain timeout: WORKER_SHUTDOWN_TIMEOUT_MS (10s)
- If threads don't complete → SIGKILL force termination
- Thread state preserved in log store for crash recovery
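The drain-then-kill sequence for workflow workers can be sketched as a race between in-flight work and a deadline. `drainOrKill` and the `forceKill` hook are hypothetical; in Nerve the deadline would be `WORKER_SHUTDOWN_TIMEOUT_MS` and the hook would send `SIGKILL`, but the text does not say which module owns this logic.

```typescript
// Waits for in-flight threads up to `timeoutMs`, then escalates via `forceKill`.
async function drainOrKill(
  inFlight: Promise<unknown>[],
  timeoutMs: number,
  forceKill: () => void,
): Promise<"drained" | "killed"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<"killed">((resolve) => {
    timer = setTimeout(() => resolve("killed"), timeoutMs);
  });

  // A failed thread still counts as finished for draining purposes.
  const settled = inFlight.map((p) => p.catch(() => undefined));
  const drained = Promise.all(settled).then(() => "drained" as const);

  const outcome = await Promise.race([drained, deadline]);
  clearTimeout(timer);
  if (outcome === "killed") forceKill();
  return outcome;
}
```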
State Handoff Mechanism:
- No explicit state transfer between old/new workers
- Sense workers: SQLite database contains full state
- Workflow workers: Log store contains thread message history
- Kernel coordinates recovery via recoverThreadsForWorker()
Failure Handling
Worker Crashes
- Sense workers: Automatic respawn after 1s delay, state rebuilt from DB
- Workflow workers: Crash recovery from log store thread messages
- Kernel protection: Main process continues, marks affected runs as crashed
- Crash limits: Max 5 crashes per workflow in 60s window (prevents infinite respawn)
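The crash limit above can be illustrated with a sliding-window counter. `CrashLimiter` is a hypothetical name, and the boundary behavior is an assumption: this sketch allows a respawn for the first five crashes inside the window and refuses on the sixth, whereas the real limiter may count differently.

```typescript
// Sliding-window crash counter; timestamps are passed in to keep it testable.
class CrashLimiter {
  private readonly crashes = new Map<string, number[]>();
  private readonly maxCrashes: number;
  private readonly windowMs: number;

  constructor(maxCrashes = 5, windowMs = 60000) {
    this.maxCrashes = maxCrashes;
    this.windowMs = windowMs;
  }

  // Records a crash at `now` (ms) and reports whether a respawn is still allowed.
  recordCrash(key: string, now: number): boolean {
    const previous = this.crashes.get(key) ?? [];
    // Keep only crashes that are still inside the window.
    const recent = previous.filter((t) => now - t < this.windowMs);
    recent.push(now);
    this.crashes.set(key, recent);
    return recent.length <= this.maxCrashes;
  }
}
```

A caller would typically invoke `recordCrash(workflowName, Date.now())` from the worker's exit handler and skip the respawn when it returns `false`.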
Resource Exhaustion
- Memory: Worker process killed by OS, kernel respawns automatically
- Compute timeout: Grace period → hard kill → respawn
- Infinite loops: Timeout enforcement prevents hanging indefinitely
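This architecture lets Nerve run experimental or faulty user code without destabilizing the kernel. Because workers are not sandboxed (see Memory & CPU Limits), it provides crash and hang isolation rather than a security boundary for untrusted code.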