RFC: Khala — Stateless Agent Pool Cloud Workflow Orchestrator #119
Summary
Khala is a cloud-native (Cloudflare Workers + D1 + Durable Objects) reactive workflow orchestrator for cross-agent coordination. It treats all agents as a stateless, homogeneous worker pool — any agent can execute any role turn at any time.
Supersedes #115 (original RFC with named channels and recruitment phases).
Key Insight
Two properties of our agent fleet make the design radically simple:
shazhou/skills) — all agents have equivalent capabilities
Therefore agents are interchangeable work units. No need for named channels, role binding, recruitment phases, or offline recovery.
Architecture
Cloud Workflow Definition
Role Definition
A role is "who you are and what you're responsible for", not "how to do it". Execution details come from the agent's local skills and tools.
Turn Event
When moderator assigns a turn, the agent receives:
Thread Context: Query, Don't Dump
Thread history is NOT bulk-loaded into agent context. Instead, agents get a query interface to pull what they need:
This:
The query capability is injected as a tool/context-provider into the local workflow that executes the turn.
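As a rough illustration of "query, don't dump", the sketch below shows what a pull-style context provider could look like. The message fields, the ThreadQuery filter, and the ThreadContext class are all hypothetical names chosen for this example, and an in-memory array stands in for the D1-backed store.

```typescript
// Hypothetical message shape; field names are assumptions, not from the RFC.
interface ThreadMessage {
  seq: number;      // position within the thread
  role: string;     // role that produced the message
  agentId: string;  // agent that executed the turn
  body: string;
}

// Hypothetical filter an agent could pass when pulling context.
interface ThreadQuery {
  role?: string;     // only messages from this role
  afterSeq?: number; // only messages with seq > afterSeq
  limit?: number;    // keep only the most recent N matches
}

// In-memory stand-in for the persistent store, so the sketch runs anywhere.
class ThreadContext {
  private messages: ThreadMessage[];

  constructor(messages: ThreadMessage[]) {
    this.messages = messages;
  }

  // Return only the slice of history the caller asked for, oldest first.
  query(q: ThreadQuery = {}): ThreadMessage[] {
    const matches = this.messages
      .filter((m) => q.role === undefined || m.role === q.role)
      .filter((m) => q.afterSeq === undefined || m.seq > q.afterSeq)
      .sort((a, b) => a.seq - b.seq);
    if (q.limit === undefined) return matches;
    return q.limit > 0 ? matches.slice(-q.limit) : [];
  }
}

// Usage: a reviewer turn pulls only the latest author message.
const exampleContext = new ThreadContext([
  { seq: 0, role: "author", agentId: "a1", body: "draft v1" },
  { seq: 1, role: "reviewer", agentId: "a2", body: "needs tests" },
  { seq: 2, role: "author", agentId: "a3", body: "draft v2" },
]);
const latestAuthorMessage = exampleContext.query({ role: "author", limit: 1 });
```

The agent's context stays small because the turn decides what to fetch, rather than receiving the whole thread up front.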
Agent-Side Integration
Zero new primitives. Standard nerve sense/signal/reflex:
Capacity management: a CapacitySense checks local workflow load. Reflex only fires when the agent has bandwidth.
Khala Internals
Khala is a nerve subset: only reflex + workflow, no sense.
Transport
Event Ordering
Causal ordering only. Moderator serializes turns within a thread. No global total order needed.
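One way to picture per-thread serialization is a sequence counter keyed by thread. The sketch below is an assumption about how this might be modelled, not the RFC's implementation; TurnSequencer, SequencedTurn, and happenedBefore are hypothetical names.

```typescript
// A turn tagged with its position inside one thread (hypothetical shape).
interface SequencedTurn {
  threadId: string;
  seq: number;
}

// Assigns increasing sequence numbers per thread. There is deliberately
// no shared counter across threads, so no global total order exists.
class TurnSequencer {
  private nextSeq = new Map<string, number>();

  assign(threadId: string): SequencedTurn {
    const seq = this.nextSeq.get(threadId) ?? 0;
    this.nextSeq.set(threadId, seq + 1);
    return { threadId, seq };
  }
}

// Turns are comparable only within the same thread; across threads the
// answer is undefined because the two turns are not ordered at all.
function happenedBefore(a: SequencedTurn, b: SequencedTurn): boolean | undefined {
  if (a.threadId !== b.threadId) return undefined;
  return a.seq < b.seq;
}
```

Because each thread only needs its own counter, the ordering state can live with the thread instead of in a shared coordinator.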
Auth
Per-agent token. Agents register with khala on nerve init. Orchestrator maintains an agent registry.
Relationship to Pulseflare
Khala replaces pulseflare. The D1 event store from pulseflare becomes the persistence layer for thread messages.
Open Questions
Next Steps
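Design Review: Supplementary Discussion
Turn Timeout + Optimistic Locking
Each turn carries a claim_id. When an agent POSTs its response, the claim_id is checked to confirm it is still the current holder:
Two layers of timeout:
Timeout values are configurable in the workflow definition.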
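The claim_id check can be sketched as a small optimistic lock. This is a minimal illustration under stated assumptions: it models a single configurable timeout rather than the two layers mentioned above, and TurnLock, TurnClaim, and the claim-N id format are hypothetical.

```typescript
// The current holder of a turn (hypothetical shape).
interface TurnClaim {
  claimId: string;
  agentId: string;
  expiresAt: number; // epoch milliseconds
}

class TurnLock {
  private current: TurnClaim | undefined;
  private counter = 0;
  private timeoutMs: number;

  // timeoutMs stands in for a value read from the workflow definition.
  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Hand the turn to an agent under a fresh claim_id. Any earlier
  // claim stops being the current holder from this point on.
  assign(agentId: string, now: number): TurnClaim {
    this.counter += 1;
    this.current = {
      claimId: `claim-${this.counter}`,
      agentId,
      expiresAt: now + this.timeoutMs,
    };
    return this.current;
  }

  // Accept a response only if its claim_id is still the current holder
  // and has not timed out. An accepted claim is consumed.
  submit(claimId: string, now: number): boolean {
    const holder = this.current;
    if (holder === undefined) return false;
    if (holder.claimId !== claimId) return false;
    if (now > holder.expiresAt) return false;
    this.current = undefined;
    return true;
  }
}
```

With this shape, a slow agent whose turn was reassigned after a timeout simply has its late response rejected, and no lock needs to be held while the agent works.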
Workflow Initiation: API Only
Cloud workflows are purely reactive, so there is no need to introduce the sense/reflex concepts. A POST /workflows that creates a thread is enough.
Result Aggregation
The moderator defines a terminal state. When it is reached, the last message (or a moderator summary) becomes the workflow result, which is written back to D1, and the initiator is notified (webhook or polling).
Observability
The thread_id is naturally a trace ID. Every turn event carries timestamp + agent_id + role, which forms an audit log in D1. Adding a GET /threads/:id/trace endpoint is enough.
Unified Workflow Naming: Git Semantics
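Local and cloud workflows are managed uniformly, using git-branch-style naming:
origin
nerve workflow run origin/code-review → POST to nerveflare to create a thread
nerve workflow logs origin/code-review#thread-123 → look up the thread history
Config reserves room for multiple remotes (only origin is implemented at first):
Workflow definitions are uniformly YAML, with binding: local or binding: cloud distinguishing the two.
Moderator Format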
Start with JSONata and revisit only if it proves insufficient. Avoid abstracting a DSL prematurely.
CapacitySense
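Suggest skipping this in the first version: hardcode max concurrent turns, then optimize once the system is running and the real bottlenecks are visible.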
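A hardcoded limit of this kind could be as small as the sketch below. The constant's value, the TurnSlots class, and the turn id strings are assumptions made for illustration only.

```typescript
// Assumed value; the RFC does not specify the limit.
const MAX_CONCURRENT_TURNS = 2;

// Tracks which turns this agent is currently executing.
class TurnSlots {
  private active = new Set<string>();

  // Returns true if the turn may start, or false when the agent is full.
  // Acquiring a turn that is already running is a no-op that succeeds.
  tryAcquire(turnId: string): boolean {
    if (this.active.has(turnId)) return true;
    if (this.active.size >= MAX_CONCURRENT_TURNS) return false;
    this.active.add(turnId);
    return true;
  }

  // Free the slot once the local workflow for the turn has finished.
  release(turnId: string): void {
    this.active.delete(turnId);
  }
}
```

A reflex would call tryAcquire before accepting a turn and release when the turn completes, leaving room to swap in a real CapacitySense later without changing the call sites.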
Naming Update
Renamed from Nerveflare to Khala (卡拉).
Inspired by StarCraft's Protoss Khala — a psychic link connecting all individuals as equals, sharing knowledge and consciousness.
Package: packages/khala
Deployment: khala.shazhou.workers.dev