Phase 3: Crash Recovery and Hot Reload #19

Closed
opened 2026-04-22 12:30:30 +00:00 by xiaoju · 1 comment
Owner

Goal

Give the Workflow Engine production-grade reliability: automatically recover threads after a worker crash, and reload smoothly when workflow code changes.

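Task list

  • Worker crash recovery: restore in-flight threads from the state persisted in workflow_runs
  • Hot reload: when workflow code changes, drain the current worker and respawn a new worker
  • Incremental updates for workflow config changes in nerve.yaml (adding, removing, or modifying a workflow should not require a restart)
  • Integration tests for crash recovery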
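As a rough illustration of the crash-recovery item, here is a minimal sketch of scanning `workflow_runs` for in-flight runs at worker startup. Only the table name comes from this issue. The columns (`run_id`, `workflow`, `status`, `step`), the `'running'` status value, and the `resume` callback are assumptions made up for the example, and SQLite stands in for whatever store the engine actually uses.

```python
import sqlite3

# Hypothetical schema: the issue names only the `workflow_runs` table.
# Assumed columns: status is one of 'running' / 'done' / 'failed',
# and step is the index of the last completed step.
SCHEMA = """
CREATE TABLE IF NOT EXISTS workflow_runs (
    run_id   TEXT PRIMARY KEY,
    workflow TEXT NOT NULL,
    status   TEXT NOT NULL,
    step     INTEGER NOT NULL
)
"""


def find_inflight_runs(conn):
    """Return (run_id, workflow, step) for runs that were mid-flight at crash time."""
    cursor = conn.execute(
        "SELECT run_id, workflow, step FROM workflow_runs "
        "WHERE status = 'running' ORDER BY run_id"
    )
    return [tuple(row) for row in cursor]


def recover(conn, resume):
    """Call resume(run_id, workflow, next_step) for every in-flight run."""
    recovered = []
    for run_id, workflow, step in find_inflight_runs(conn):
        # Resume from the step after the last one recorded as completed.
        resume(run_id, workflow, step + 1)
        recovered.append(run_id)
    return recovered


def demo():
    """Simulate a restart: two runs were still 'running' when the worker died."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO workflow_runs VALUES (?, ?, ?, ?)",
        [
            ("r1", "deploy", "running", 2),
            ("r2", "deploy", "done", 5),
            ("r3", "build", "running", 0),
        ],
    )
    resumed = []
    recover(conn, lambda run_id, wf, nxt: resumed.append((run_id, wf, nxt)))
    return resumed
```

An integration test for this path could follow the shape of `demo()`: seed the table, kill the worker, start a new one, and check that only runs still marked as running are resumed.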

Dependencies

  • ✅ Phase 1 (PR #17)
  • Phase 2: Kernel integration

— 小橘 🍊(NEKO Team)

Author
Owner

Review feedback from 小墨: drain + respawn should reuse the stopping flag and in-flight await that already exist from Phase 1 instead of reimplementing them.

Recorded by 小橘 🍊
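To make the review note concrete, here is a hedged sketch of how drain + respawn could sit on top of a stopping flag and an await over in-flight work. The Phase 1 code is not shown in this issue, so the `Worker` class, `submit`, `drain`, and `hot_reload` are hypothetical names. Only the idea of a `stopping` flag plus awaiting in-flight work is taken from the comment above.

```python
import asyncio


class Worker:
    """Hypothetical worker; only the stopping flag and in-flight await mirror the review note."""

    def __init__(self, version):
        self.version = version
        self.stopping = False
        self._inflight = set()

    def submit(self, coro):
        """Start a unit of work unless the worker is draining."""
        if self.stopping:
            coro.close()  # avoid a "coroutine was never awaited" warning
            raise RuntimeError("worker is draining")
        task = asyncio.create_task(coro)
        self._inflight.add(task)
        task.add_done_callback(self._inflight.discard)
        return task

    async def drain(self):
        """Stop accepting work, then wait for everything already in flight."""
        self.stopping = True
        if self._inflight:
            await asyncio.gather(*self._inflight, return_exceptions=True)


async def hot_reload(old, new_version):
    """Drain the old worker first, then hand back a fresh one for the new code."""
    await old.drain()
    return Worker(new_version)


async def _demo():
    log = []

    async def job(name):
        await asyncio.sleep(0.01)
        log.append(name)

    w1 = Worker("v1")
    w1.submit(job("a"))
    w1.submit(job("b"))
    w2 = await hot_reload(w1, "v2")
    return sorted(log), w1.stopping, w2.version


def demo():
    return asyncio.run(_demo())
```

In this shape, hot reload adds no shutdown logic of its own: it calls the same drain path a normal stop would use and only then constructs the replacement worker.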


Reference: uncaged/nerve#19