9c832b0e21
7 cards updated, 4 new cards added. Topics: signal-routing, worker-isolation, storage-layer, adapter-isolation, sense contracts, workflow runtime enforcement, coding conventions details. 小橘 <xiaoju@shazhou.work>
6.6 KiB
Adapter Process Isolation
Describes sandboxing, process isolation, resource limits, and timeout enforcement for adapter invocations in the Nerve workflow system.
Process Isolation Model
Adapters run in a two-tier isolation model:
- Workflow Worker Process: Each workflow runs in a dedicated Node.js worker process (workflow-worker.ts) forked from the main daemon
- Adapter Child Process: Each adapter spawns CLI tools as child processes via spawnSafe() with shell: false
Resource Limits & Timeouts
Adapter-Level Timeouts
- Default timeout: 300 seconds (300,000 ms) for both cursor and hermes adapters
- Configurable via AgentConfig.timeout in adapter factory functions
- Wall-clock enforcement using setTimeout(): kills child process with SIGTERM on timeout
- AbortSignal support: external cancellation triggers immediate SIGTERM
Timeout Behavior
// Timeout resolution priority (packages/core/src/spawn-safe.ts):
// 1. Explicit timeoutMs value
// 2. AbortSignal presence → no internal timer (relies on external abort)
// 3. DEFAULT_TIMEOUT_MS (300_000) fallback
- Child process terminated with SIGTERM on timeout/abort
- Returns { kind: "timeout", stdout, stderr } error result
- No grace period: SIGTERM is sent as soon as the timer fires or the abort arrives
- No SIGKILL escalation: relies entirely on SIGTERM effectiveness
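The resolution order and the single-SIGTERM behavior described above can be sketched as follows. This is a minimal illustration, not the actual spawnSafe() source: the names resolveTimeoutMs, runWithTimeout, SpawnOptions, and the "ok" result variant are assumptions, and treating an abort the same as a timeout is one reading of the list above.

```typescript
import { spawn } from "node:child_process";

const DEFAULT_TIMEOUT_MS = 300_000;

// Hypothetical result shape; only the "timeout" variant is named in the text.
type SpawnResult =
  | { kind: "ok"; stdout: string; stderr: string; exitCode: number | null }
  | { kind: "timeout"; stdout: string; stderr: string };

interface SpawnOptions {
  timeoutMs?: number;
  signal?: AbortSignal;
}

// Priority: explicit timeoutMs, then no timer when a signal is supplied, then the default.
function resolveTimeoutMs(opts: SpawnOptions): number | undefined {
  if (opts.timeoutMs !== undefined) return opts.timeoutMs;
  if (opts.signal) return undefined;
  return DEFAULT_TIMEOUT_MS;
}

function runWithTimeout(cmd: string, args: string[], opts: SpawnOptions = {}): Promise<SpawnResult> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { shell: false, stdio: ["ignore", "pipe", "pipe"] });
    let stdout = "";
    let stderr = "";
    let stopped = false;

    child.stdout.on("data", (chunk) => (stdout += chunk));
    child.stderr.on("data", (chunk) => (stderr += chunk));

    // One SIGTERM, sent immediately; there is no SIGKILL follow-up.
    const stop = () => {
      stopped = true;
      child.kill("SIGTERM");
    };

    const ms = resolveTimeoutMs(opts);
    const timer = ms === undefined ? undefined : setTimeout(stop, ms);
    if (opts.signal?.aborted) stop();
    opts.signal?.addEventListener("abort", stop, { once: true });

    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", (exitCode) => {
      clearTimeout(timer);
      opts.signal?.removeEventListener("abort", stop);
      resolve(stopped ? { kind: "timeout", stdout, stderr } : { kind: "ok", stdout, stderr, exitCode });
    });
  });
}
```

Because the promise settles only on the child's close event, a child that ignores SIGTERM would leave the caller waiting indefinitely in this sketch.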
SIGTERM Limitations
If a child process ignores or blocks SIGTERM (e.g., signal handlers, blocked delivery):
- No fallback to SIGKILL: process may remain alive indefinitely
- No escalation timer: spawnSafe() does not implement progressive signal escalation
- Potential runaway/orphan process risk: unresponsive processes continue consuming resources
- OS-level cleanup only: relies on parent process death or OS reaping mechanisms
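For contrast, the escalation that spawnSafe() does not implement could look roughly like the sketch below. Everything here is hypothetical: terminateWithEscalation, spawnStubbornChild, and the grace period are illustrative names and values, not part of the Nerve codebase.

```typescript
import { spawn } from "node:child_process";
import type { ChildProcess } from "node:child_process";

// Hypothetical helper: send SIGTERM, then SIGKILL if the child has not
// exited within graceMs. Resolves with the signal that ended the child.
function terminateWithEscalation(child: ChildProcess, graceMs: number): Promise<NodeJS.Signals | null> {
  return new Promise((resolve) => {
    // Already exited: nothing to do.
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve(child.signalCode);
      return;
    }
    const escalate = setTimeout(() => child.kill("SIGKILL"), graceMs);
    child.once("exit", (_code, signal) => {
      clearTimeout(escalate);
      resolve(signal);
    });
    child.kill("SIGTERM");
  });
}

// Demo child that installs a no-op SIGTERM handler and never exits on its own.
function spawnStubbornChild(): ChildProcess {
  return spawn(
    process.execPath,
    ["-e", "process.on('SIGTERM', () => {}); setInterval(() => {}, 1000);"],
    { stdio: "ignore" },
  );
}
```

A call such as terminateWithEscalation(spawnStubbornChild(), 2000) would bound how long an unresponsive child can linger, at the cost of denying it a clean shutdown once the grace period expires.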
Sandboxing Characteristics
What's Isolated
- File system: Child process runs in specified cwd (workflow working directory)
- Environment: Controlled env vars via nerveCommandEnv() + optional overrides
- Network: No explicit restrictions (inherits parent process network access)
- Process tree: Child processes are direct children, not containerized
What's NOT Sandboxed
- No resource quotas (CPU, memory, disk I/O limits)
- No filesystem chroot/containers — full filesystem access within user permissions
- No network isolation — can make arbitrary network calls
- No syscall filtering — no seccomp or similar restrictions
Runtime Resource Enforcement
No active resource monitoring or constraints:
- No cgroups (Linux) — no CPU, memory, or I/O limits enforced
- No job objects (Windows) — no resource quotas or process tree limits
- No worker_threads resource tracking — Node.js worker processes run unrestricted
- Pure timeout-based enforcement: only wall-clock time limits via setTimeout()
- OS-scheduled resource sharing: relies entirely on operating system process scheduling
Adapters can consume unlimited:
- CPU time (until timeout)
- Memory (until OOM)
- Disk I/O (no quotas)
- Network bandwidth (no throttling)
- File descriptors (until ulimit)
Environment Variable Security
The nerveCommandEnv() function performs essentially no sanitization: it copies the parent environment, sets PNPM_HOME, and prepends that directory to PATH:
// spawn-safe.ts lines 47-55
export function nerveCommandEnv(): SpawnEnv {
  const home = homedir();
  const pnpmHome = join(home, ".local/share/pnpm");
  return {
    ...process.env, // ← Full parent environment inherited
    PNPM_HOME: pnpmHome,
    PATH: `${pnpmHome}:${process.env.PATH ?? ""}`,
  };
}
- No filtering of sensitive keys: NODE_OPTIONS, LD_PRELOAD, PYTHONPATH passed through unchanged
- Full environment inheritance: all parent process environment variables copied
- Injection risk: malicious env vars (e.g., NODE_OPTIONS=--require=evil.js) affect Node.js child processes
- Path manipulation: sensitive PATH entries remain accessible to adapters
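One way to close the gaps listed above is to filter the environment before it reaches the child. The sketch below is a hypothetical mitigation, not existing Nerve code: sanitizeEnv and SENSITIVE_KEYS are illustrative names, and the denylist contains only the three keys named in this section.

```typescript
// Hypothetical denylist built from the keys called out above.
const SENSITIVE_KEYS = new Set(["NODE_OPTIONS", "LD_PRELOAD", "PYTHONPATH"]);

type Env = Record<string, string | undefined>;

// Copy the parent environment, dropping denylisted and undefined entries,
// then apply caller-supplied overrides last.
function sanitizeEnv(parent: Env, overrides: Env = {}): Env {
  const clean: Env = {};
  for (const [key, value] of Object.entries(parent)) {
    if (value !== undefined && !SENSITIVE_KEYS.has(key)) {
      clean[key] = value;
    }
  }
  return { ...clean, ...overrides };
}
```

A denylist only blocks the keys someone thought of. An allowlist (pass through PATH, HOME, and a few known-safe keys, drop everything else) is stricter, though it risks breaking tools that depend on variables nobody listed.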
Security Model
Execution Context
- Uses shell: false to prevent shell injection attacks
- Arguments passed as separate array elements (not shell-parsed)
- PATH includes ~/.local/share/pnpm for tool discovery
- Inherits parent process user/group permissions
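The effect of shell: false with array arguments can be seen in a small standalone example. It uses Node's spawnSync rather than spawnSafe(), and echoArg is a hypothetical helper written only for this illustration.

```typescript
import { spawnSync } from "node:child_process";

// Run the current Node.js binary and have it write its last argument back.
// With shell: false the argument reaches the child verbatim.
function echoArg(arg: string): string {
  const result = spawnSync(
    process.execPath,
    ["-e", "process.stdout.write(process.argv[process.argv.length - 1])", arg],
    { shell: false, encoding: "utf8" },
  );
  return result.stdout;
}
```

Calling echoArg("hello; echo injected") returns the string unchanged, because no shell ever sees the semicolon. The same holds for $(...) and backticks.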
File Descriptor Management
// spawn-safe.ts line 122
stdio: ["ignore", "pipe", "pipe"]
- stdin ignored: Child receives no input (stdio[0]: "ignore" attaches the child's stdin to /dev/null)
- stdout/stderr captured: Piped to parent for collection (stdio[1,2]: "pipe")
- No explicit fd closing: spawnSafe() does not close other descriptors itself, so any parent descriptor not marked close-on-exec is inherited
- Parent sockets/pipes accessible: Child can access parent's open network connections, database handles, etc.
- Security risk: Adapter processes may access unintended parent file descriptors
Attack Surface
- CLI tools have full user-level filesystem access
- Can spawn additional processes (not tracked/limited)
- Network requests unrestricted
- Resource consumption relies on OS-level limits
Worker Process Management
Workflow Isolation
- Each workflow type gets dedicated worker process
- Worker processes handle multiple concurrent threads (runIds)
- Kill flags enable per-thread cancellation without killing worker
- Graceful shutdown waits up to 10 seconds for in-flight operations
Cross-RunId Contamination Risks
Shared mutable state poses contamination risks between concurrent runIds:
- process.env mutations: Environment changes affect all subsequent runIds in same worker
- require.cache pollution: Module cache shared across all runIds, so side effects persist
- Global variables: Any global state mutations from one runId visible to others
- process.cwd() changes: Working directory changes affect entire worker process
- File descriptors: Open files/sockets shared between runId executions
No runId-specific scoping implemented:
- Worker reuses single Node.js process for efficiency
- Each role execution sees cumulative environment from previous runIds
- Mitigation relies on adapter discipline — clean implementations avoid global mutations
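The process.env case is the easiest to reproduce. In the sketch below, leakyRole and scopedRole are hypothetical role implementations and NERVE_DEMO_RUN is an invented variable name, used only to show the difference between mutating worker-wide state and building a per-run copy.

```typescript
// Mutates the environment shared by every runId in this worker process.
function leakyRole(runId: string): void {
  process.env.NERVE_DEMO_RUN = runId;
}

// Builds a per-run copy instead; the result is meant to be passed as the
// env option of the child process, leaving process.env untouched.
function scopedRole(runId: string, baseEnv: NodeJS.ProcessEnv): NodeJS.ProcessEnv {
  return { ...baseEnv, NERVE_DEMO_RUN: runId };
}
```

After leakyRole("run-a"), any later run in the same worker reads "run-a" from process.env.NERVE_DEMO_RUN. scopedRole("run-b", process.env) returns a new object and leaves process.env as it was, which is the kind of adapter discipline the mitigation above depends on.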
Error Handling
- Adapter failures don't crash the worker process
- Timeout/abort errors are isolated to specific role execution
- Worker process survives adapter failures and continues serving other threads
Configuration
# Example nerve.yaml configuration for timeout overrides
workflows:
  my-workflow:
    roles:
      coder:
        adapter:
          type: cursor
          timeout: 600000  # 10 minutes in milliseconds
Timeout configuration happens at the adapter creation level, not as a system-wide sandbox policy.
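A rough sketch of how a per-adapter timeout might flow from configuration into the factory is given below. Only AgentConfig.timeout, the cursor and hermes adapter types, and the 300,000 ms default come from this document; createAdapter, the Adapter shape, and the type field are assumptions made for illustration.

```typescript
const DEFAULT_TIMEOUT_MS = 300_000;

// Assumed config shape: only the timeout field is named in this document.
interface AgentConfig {
  type: "cursor" | "hermes";
  timeout?: number; // milliseconds
}

// Hypothetical adapter handle carrying the resolved timeout.
interface Adapter {
  type: AgentConfig["type"];
  timeoutMs: number;
}

// Hypothetical factory: each adapter instance resolves its own timeout,
// so there is no system-wide policy to consult.
function createAdapter(config: AgentConfig): Adapter {
  return { type: config.type, timeoutMs: config.timeout ?? DEFAULT_TIMEOUT_MS };
}
```

With the YAML above, createAdapter({ type: "cursor", timeout: 600000 }) would carry a 10-minute limit, while an adapter configured without timeout falls back to 300,000 ms.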