RFC: Cloud Workflow Orchestrator for Cross-Agent Coordination #115
Summary
Design a cloud-native (Cloudflare Workers) reactive orchestrator for cross-agent coordination, sharing nerve's workflow/moderator semantics but operating in a purely event-driven (passive) mode — no sense layer.
Motivation
Nerve daemons currently run as single-machine processes with an in-memory signal bus. There is no mechanism for agents to coordinate across machines. Real scenarios include:
Core Model
The cloud orchestrator is a subset of nerve: only reflex + workflow, no sense.
Primitives
Workflow Roles
Two types:
The "Dungeon Queue" Pattern
Cross-agent workflows follow a recruitment → execution lifecycle, modeled as nested workflows:
Agent-side Integration
Each nerve daemon gets a cloud-workflow-adapter (peer to local workflow-manager):
Architecture
Relationship to Pulseflare
Pulseflare (currently a passive event store on CF D1) evolves into this orchestrator. The existing event append/query API becomes the foundation; workflow engine and recruitment logic are built on top.
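To make the append/query foundation concrete, here is a minimal in-memory sketch. The names `EventStore`, `StoredEvent`, `append`, and `query`, along with the single global sequence number, are assumptions for illustration only; Pulseflare's real API and D1 schema are not described in this RFC.

```typescript
// Hypothetical shapes, not Pulseflare's actual API.
interface StoredEvent {
  seq: number;      // position in the append-only log
  stream: string;   // e.g. a workflow instance id (assumed)
  type: string;
  payload: unknown;
}

class EventStore {
  private log: StoredEvent[] = [];

  // Append-only: existing events are never mutated or removed.
  append(stream: string, type: string, payload: unknown): StoredEvent {
    const stored: StoredEvent = { seq: this.log.length + 1, stream, type, payload };
    this.log.push(stored);
    return stored;
  }

  // Return the events of one stream whose seq is greater than afterSeq.
  query(stream: string, afterSeq: number = 0): StoredEvent[] {
    return this.log.filter((e) => e.stream === stream && e.seq > afterSeq);
  }
}

const store = new EventStore();
store.append("wf-1", "task.created", { title: "review" });
store.append("wf-2", "task.created", { title: "deploy" });
store.append("wf-1", "task.claimed", { agent: "agent-a" });

// Everything on wf-1 that happened after seq 1.
const wf1Events = store.query("wf-1", 1);
```

A workflow engine layered on top could rebuild workflow state by replaying `query(stream)` and advance it by calling `append`.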
Schema Validation
Consumer-side validation (Robustness Principle):
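One possible reading of consumer-side validation is a tolerant reader: the consumer checks only the fields it depends on and ignores anything it does not recognise. The sketch below assumes a hypothetical event shape (`type`, `workflowId`, `payload`); none of these field names come from the RFC.

```typescript
// Hypothetical event shape, used only for illustration.
interface IncomingTaskEvent {
  type: string;
  workflowId: string;
  payload: Record<string, unknown>;
}

type ParseResult =
  | { ok: true; value: IncomingTaskEvent }
  | { ok: false; error: string };

// Validate only what this consumer needs; unknown extra fields are ignored.
function parseTaskEvent(raw: unknown): ParseResult {
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "not an object" };
  }
  const fields = raw as Record<string, unknown>;
  const type = fields["type"];
  const workflowId = fields["workflowId"];
  const payload = fields["payload"];
  if (typeof type !== "string") return { ok: false, error: "missing type" };
  if (typeof workflowId !== "string") return { ok: false, error: "missing workflowId" };
  // A missing or malformed payload degrades to an empty object instead of failing.
  const safePayload =
    typeof payload === "object" && payload !== null
      ? (payload as Record<string, unknown>)
      : {};
  return { ok: true, value: { type, workflowId, payload: safePayload } };
}

const accepted = parseTaskEvent({ type: "task.created", workflowId: "wf-1", futureField: 1 });
const rejected = parseTaskEvent({ type: "task.created" });
```

The producer can then add fields without breaking older consumers, while a consumer still rejects input it cannot act on.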
Open Questions
Next Steps
Updates from discussion
Naming
The cloud orchestrator lives in the nerve monorepo as a new package: nerveflare (packages/nerveflare). Replaces pulseflare.
Agent-side Model
The local nerve daemon interacts with nerveflare through standard sense/signal/reflex — no special adapter needed:
Capacity Awareness
"Am I busy?" can itself be a sense — e.g., count active local workflow threads. Reflex for open channel claims only fires when capacity sense reports availability. This keeps the claim decision within nerve's own model.
Revised Architecture (agent side)
This means zero new primitives on the agent side. Cross-agent coordination is just another sense source + workflow target. The entire nerve model (sense → signal → reflex → workflow) stays intact.
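The capacity-gated claim from the Capacity Awareness section can be sketched with ordinary signals and a reflex predicate. All names here (`NerveSignal`, `capacitySense`, `claimReflex`, the `channel.open` signal type) are hypothetical; nerve's real signal and reflex types are not shown in this RFC.

```typescript
// Hypothetical signal shape, not nerve's actual type.
interface NerveSignal {
  source: string;
  type: string;
  data: Record<string, unknown>;
}

// Capacity sense: reports whether the daemon can take on another workflow thread.
function capacitySense(activeThreads: number, maxThreads: number): NerveSignal {
  return {
    source: "capacity",
    type: "capacity.report",
    data: { available: activeThreads < maxThreads },
  };
}

// Reflex predicate: claim an open channel only when the latest capacity
// signal reports availability.
function claimReflex(channel: NerveSignal, capacity: NerveSignal): boolean {
  return channel.type === "channel.open" && capacity.data["available"] === true;
}

// The cloud side is just another signal source.
const openChannel: NerveSignal = {
  source: "nerveflare",
  type: "channel.open",
  data: { channelId: "c-1" },
};

const claimsWhenIdle = claimReflex(openChannel, capacitySense(1, 4));
const claimsWhenBusy = claimReflex(openChannel, capacitySense(4, 4));
```

Both inputs are plain signals, so the claim decision needs no cloud-specific primitive on the agent.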
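Design Review Discussion Notes
Highlights 👍
Discussion Points
1. Transport choice
Recommend supporting both REST and CF DO WebSocket. REST serves as the fallback; WebSocket (DO hibernation API) handles real-time role-turn push. Pure polling has high latency, and pure WebSocket requires reconnection logic.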
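A rough sketch of the dual-transport idea on the client side: use WebSocket while it is connected, fall back to REST otherwise, and space out reconnection attempts. The `TransportSelector` class and the exponential backoff policy (500 ms base, 30 s cap) are assumptions added for illustration; the discussion only says that reconnection logic is needed, not how it should work.

```typescript
type TransportKind = "websocket" | "rest";

// Hypothetical helper that tracks WebSocket health and picks a transport.
class TransportSelector {
  private wsOpen = false;
  private failedAttempts = 0;
  private readonly baseMs: number;
  private readonly capMs: number;

  constructor(baseMs: number = 500, capMs: number = 30000) {
    this.baseMs = baseMs;
    this.capMs = capMs;
  }

  onWebSocketOpen(): void {
    this.wsOpen = true;
    this.failedAttempts = 0; // a successful connection resets the backoff
  }

  onWebSocketClose(): void {
    this.wsOpen = false;
    this.failedAttempts += 1;
  }

  // REST is the fallback whenever the socket is not open.
  current(): TransportKind {
    return this.wsOpen ? "websocket" : "rest";
  }

  // Exponential backoff (assumed policy), capped at capMs.
  nextReconnectDelayMs(): number {
    return Math.min(this.capMs, this.baseMs * 2 ** this.failedAttempts);
  }
}

const selector = new TransportSelector();
const beforeConnect = selector.current();
selector.onWebSocketOpen();
const whileConnected = selector.current();
selector.onWebSocketClose();
selector.onWebSocketClose();
const afterDrops = selector.current();
const retryDelayMs = selector.nextReconnectDelayMs();
```

While the selector reports `rest`, the agent would poll for role-turns over REST and retry the WebSocket after the computed delay.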
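2. Agent offline mid-workflow
3. Event ordering
Causal ordering is sufficient. Cross-agent work is inherently asynchronous and role-turns are ordered by the moderator, so a global total order is unnecessary (it would become a bottleneck).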
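One simple way to get moderator-ordered role-turns without a global order is a per-workflow turn counter, with each consumer buffering turns that arrive early. This is a sketch under assumptions: `Moderator`, `OrderedInbox`, and `TurnEvent` are hypothetical names. The scheme orders turns within a single workflow and deliberately defines no order across workflows.

```typescript
// Hypothetical event shape for a moderator-assigned turn.
interface TurnEvent {
  workflowId: string;
  turn: number;
  role: string;
}

// The moderator numbers turns independently for each workflow.
class Moderator {
  private nextTurn = new Map<string, number>();

  assign(workflowId: string, role: string): TurnEvent {
    const turn = this.nextTurn.get(workflowId) ?? 1;
    this.nextTurn.set(workflowId, turn + 1);
    return { workflowId, turn, role };
  }
}

// Consumer-side inbox for ONE workflow: releases turns in order, buffering gaps.
class OrderedInbox {
  private expected = 1;
  private pending = new Map<number, TurnEvent>();

  receive(incoming: TurnEvent): TurnEvent[] {
    this.pending.set(incoming.turn, incoming);
    const ready: TurnEvent[] = [];
    let next = this.pending.get(this.expected);
    while (next !== undefined) {
      ready.push(next);
      this.pending.delete(this.expected);
      this.expected += 1;
      next = this.pending.get(this.expected);
    }
    return ready;
  }
}

const moderator = new Moderator();
const firstTurnA = moderator.assign("wf-a", "author");
const secondTurnA = moderator.assign("wf-a", "reviewer");
const firstTurnB = moderator.assign("wf-b", "author"); // independent counter

const inbox = new OrderedInbox();
const deliveredEarly = inbox.receive(secondTurnA); // turn 2 arrives first: buffered
const deliveredLater = inbox.receive(firstTurnA);  // turn 1 arrives: releases 1 then 2
```

Because counters are per workflow, unrelated workflows never wait on each other, which is what avoids the bottleneck of a global total order.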
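4. Workflow definition format
Recommend reusing nerve YAML plus a binding: cloud marker to distinguish cloud workflows, so workflow authors do not need to learn two syntaxes.
5. Auth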
Per-agent token + agent-id claim. The orchestrator maintains an agent registry (which agents may take part in which roles); the token is generated and registered at nerve init.
6. Relationship between local nerve and cloud workflow (clarified)
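The local nerve is unaware that cloud-workflow exists. The sense layer simply gains one more "cloud event source": a task arrives or a recruitment claim succeeds → a reflex triggers a local workflow → on completion the result is POSTed back. This is essentially no different from handling a local event such as a CPU alert. The two workflows do not overlap; at most they share some TS types and code logic.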
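A minimal sketch of that flow, showing a cloud-sourced signal and a local CPU alert passing through the same dispatch path. Everything here is hypothetical: the `NerveSignal` and `Reflex` shapes, the signal type names, and the `posted` array, which stands in for an HTTP POST back to the orchestrator.

```typescript
// Hypothetical shapes, not nerve's actual types.
interface NerveSignal {
  source: string;
  type: string;
  data: Record<string, unknown>;
}

interface Reflex {
  matches(signal: NerveSignal): boolean;
  run(signal: NerveSignal): Record<string, unknown>;
}

// One dispatch path for every signal, whatever its source.
function dispatch(signal: NerveSignal, reflexes: Reflex[]): Record<string, unknown>[] {
  return reflexes.filter((r) => r.matches(signal)).map((r) => r.run(signal));
}

const posted: Record<string, unknown>[] = []; // stand-in for POSTing results back
const localLog: string[] = [];

const reflexes: Reflex[] = [
  {
    // Cloud task: run the local workflow (elided), then report the result.
    matches: (s) => s.type === "cloud.task.assigned",
    run: (s) => {
      const result = { taskId: s.data["taskId"], outcome: "done" };
      posted.push(result);
      return result;
    },
  },
  {
    // Purely local event, handled by the same machinery.
    matches: (s) => s.type === "cpu.alert",
    run: (s) => {
      localLog.push(`cpu at ${String(s.data["percent"])}%`);
      return { handled: true };
    },
  },
];

dispatch({ source: "cloud", type: "cloud.task.assigned", data: { taskId: "t-42" } }, reflexes);
dispatch({ source: "local", type: "cpu.alert", data: { percent: 97 } }, reflexes);
```

Nothing in `dispatch` knows which signal came from the cloud; only the reflex that handles the cloud task reports a result back.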
Superseded by #119 — simplified design with stateless agent pool model, no named channels or recruitment phases.