docs: RFC v3 — named payload fields, refs as GC index, merge parent+ancestors

- payload is source of truth with named fields (start, content, ancestors, compact)
- refs[] auto-derived by collectRefs(), pure GC index
- parent merged into ancestors[0]

小橘 <xiaoju@shazhou.work>
2026-05-09 07:12:29 +00:00
parent 6c2a137aef
commit 8f78a00063
+25 -19
@@ -54,31 +54,29 @@ CAS blob:
   payload: {
     role: "coder",
     meta: { ... },
+    start: "<start_hash>",
+    content: "<content_merkle_hash>",
+    ancestors: ["<parent_hash>", "<grandparent_hash>", ...],
+    compact: null,
     timestamp: 1234567890
   },
-  refs: [
-    <start_hash>, // refs[0]: always the StartNode
-    <parent_hash>, // refs[1]: previous StateNode (null for first step)
-    <content_hash>, // refs[2]: content Merkle node (carries role artifact refs)
-    ...ancestors, // refs[3..N]: skip-list of up to 10 ancestor StateNode hashes
-  ]
+  refs: [<start_hash>, <content_hash>, <parent_hash>, ...]
 }
 ```
 
-**Fixed ref positions:**
-
-| Index | Meaning | Nullable |
-|-------|---------|----------|
-| 0 | StartNode hash | No |
-| 1 | Parent StateNode hash | Yes (null for first step after start) |
-| 2 | Content Merkle node hash | No |
-| 3+ | Ancestor skip-list (≤ 10 most recent ancestors, newest first) | Optional |
-
-**Optional payload fields:**
+**Payload is the source of truth.** Application code reads named fields from payload. `refs[]` is a **GC index** — automatically derived from payload by collecting all CAS hashes. GC only scans `refs[]` without understanding payload structure.
+
+**Payload fields:**
 
 | Field | Type | Meaning |
 |-------|------|---------|
+| `role` | `string` | Role name, or `"__end__"` for completion |
+| `meta` | `object` | Structured metadata extracted from agent output |
+| `start` | `string` | StartNode hash |
+| `content` | `string` | Content Merkle node hash (carries role artifact refs) |
+| `ancestors` | `string[]` | `[parent, grandparent, ...]` — up to 11 entries (1 parent + 10 skip-list). Empty for first step after start. `ancestors[0]` is the direct parent. |
 | `compact` | `string \| null` | CAS hash of a compacted summary of all nodes before this one. When present, LLM context assembly can use this instead of walking the full chain. |
+| `timestamp` | `number` | Unix timestamp in ms |
 
 ### Content Merkle Node
@@ -121,9 +119,13 @@ An end is just a StateNode with `role: "__end__"`:
   payload: {
     role: "__end__",
     meta: { returnCode: 0, summary: "completed successfully" },
+    start: "<start_hash>",
+    content: "<content_hash>",
+    ancestors: ["<parent_hash>", ...],
+    compact: null,
     timestamp: 1234567891
   },
-  refs: [<start_hash>, <parent_hash>, <content_hash>, ...ancestors]
+  refs: [<start_hash>, <content_hash>, <parent_hash>, ...]
 }
 ```
@@ -165,11 +167,11 @@ Benefits:
 ### Ancestor Skip-List
 
-Each StateNode carries up to 10 ancestor hashes in `refs[3..N]` (newest first):
+Each StateNode carries up to 11 entries in `payload.ancestors` (1 parent + 10 skip-list, newest first):
 
 ```
-Node 15: refs = [start, node14, content, node13, node12, node11, node10, node9, node8, node7, node6, node5, node4]
-                                         ^--- ancestors (10 most recent) ---^
+Node 15: ancestors = [node14, node13, node12, node11, node10, node9, node8, node7, node6, node5, node4]
+                      ^parent ^--- skip-list (10 most recent) ---^
 ```
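As an illustration of the new layout, here is a minimal sketch of how a writer might derive `payload.ancestors` for a new StateNode from its parent. The helper name `nextAncestors`, its signature, and the `MAX_ANCESTORS` constant are assumptions made for this example; the RFC only fixes the resulting shape (parent first, at most 11 entries, empty for the first step after start).

```typescript
// Hypothetical helper: the name and signature are illustrative, not part of the RFC.
const MAX_ANCESTORS = 11; // 1 parent + 10 skip-list entries

function nextAncestors(parentHash: string | null, parentAncestors: string[]): string[] {
  // First step after start: there is no parent StateNode, so the list is empty.
  if (parentHash === null) return [];
  // Prepend the parent, then truncate so the oldest entries fall off the end.
  return [parentHash, ...parentAncestors].slice(0, MAX_ANCESTORS);
}

// Rebuild the "Node 15" example by walking the chain from node1 to node15.
let current: string[] = nextAncestors(null, []); // ancestors of node1
for (let n = 2; n <= 15; n++) {
  current = nextAncestors(`node${n - 1}`, current); // ancestors of node n
}
console.log(current.join(", "));
// node14, node13, node12, node11, node10, node9, node8, node7, node6, node5, node4
```

Under this assumption `ancestors[0]` is always the direct parent, so no separate parent field is needed.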
 This enables:
@@ -221,6 +223,10 @@ Simple mark-and-sweep:
 No per-row format parsing needed. GC only needs to understand `refs[]`.
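A rough sketch of a mark-and-sweep pass that looks only at `refs[]` might read as follows. The in-memory `Map` store, the `CasBlob` type, the function names, and the way root hashes are supplied are all assumptions for illustration; this hunk does not show how the real store or root set is defined.

```typescript
// Hypothetical in-memory stand-in for the CAS. GC reads refs[] only; payload stays opaque.
interface CasBlob {
  payload: unknown;
  refs: string[];
}

// Mark: follow refs[] transitively from the root hashes.
function markReachable(store: Map<string, CasBlob>, roots: string[]): Set<string> {
  const marked = new Set<string>();
  const stack = [...roots];
  while (stack.length > 0) {
    const hash = stack.pop() as string;
    if (marked.has(hash)) continue;
    const blob = store.get(hash);
    if (blob === undefined) continue; // dangling hash: nothing to mark
    marked.add(hash);
    stack.push(...blob.refs);
  }
  return marked;
}

// Sweep: delete every blob that was not marked and return the deleted hashes.
function sweep(store: Map<string, CasBlob>, marked: Set<string>): string[] {
  const dead = Array.from(store.keys()).filter((hash) => !marked.has(hash));
  dead.forEach((hash) => store.delete(hash));
  return dead;
}

const store = new Map<string, CasBlob>([
  ["start", { payload: {}, refs: [] }],
  ["content", { payload: {}, refs: [] }],
  ["node1", { payload: {}, refs: ["start", "content"] }],
  ["orphan", { payload: {}, refs: ["start"] }],
]);
const deleted = sweep(store, markReachable(store, ["node1"]));
console.log(deleted); // [ 'orphan' ]
```

Neither function inspects `payload`, which is the property the paragraph above relies on.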
+### refs[] Derivation
+
+`refs[]` is auto-derived from payload at write time via a `collectRefs(payload)` function that extracts all CAS hash strings from named fields (`start`, `content`, `ancestors`, `compact`). Application code never reads `refs[]` — it reads named payload fields. This makes `refs[]` a pure GC optimization with zero semantic coupling.
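One possible shape for `collectRefs` is sketched below. Only the function name and the four source fields come from the RFC text. The `StateNodePayload` interface, the output order (start, content, ancestors, compact, mirroring the example blobs), and the de-duplication step are assumptions.

```typescript
// Assumed payload shape, transcribed from the "Payload fields" table.
interface StateNodePayload {
  role: string;
  meta: Record<string, unknown>;
  start: string;
  content: string;
  ancestors: string[];
  compact: string | null;
  timestamp: number;
}

// Gather every CAS hash named in the payload, in a stable order.
function collectRefs(payload: StateNodePayload): string[] {
  const candidates: Array<string | null> = [
    payload.start,
    payload.content,
    ...payload.ancestors,
    payload.compact,
  ];
  const seen = new Set<string>();
  for (const hash of candidates) {
    if (hash !== null) seen.add(hash); // skip a null compact; Set drops duplicates (assumption)
  }
  return Array.from(seen); // Set preserves insertion order
}

const refs = collectRefs({
  role: "coder",
  meta: {},
  start: "h_start",
  content: "h_content",
  ancestors: ["h_parent", "h_grandparent"],
  compact: null,
  timestamp: 1234567890,
});
console.log(refs); // [ 'h_start', 'h_content', 'h_parent', 'h_grandparent' ]
```

Because GC treats `refs[]` as a set of reachable hashes, the order chosen here should not matter to correctness; it is kept stable only so that identical payloads produce identical blobs.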
 ### Extract Phase
 
 The Extractor is expanded from the current design. Currently it only extracts `meta` from agent output. In the new design it extracts: