RFC: json-cas core design #1

Closed
opened 2026-05-17 09:03:13 +00:00 by xiaoju · 2 comments

Summary

Self-describing content-addressable storage. Every node has a type, and every type is a JSON Schema stored in the CAS itself.

Core Model

Node

{ type: Hash, payload: T, timestamp: number }
  • type — CAS hash of the payload's JSON Schema
  • payload — any JSON-compatible value, described by the schema
  • timestamp — ms since epoch, auto-filled on first write, not part of hash

Hash

Hash = XXH64(type_bytes ++ CBOR_deterministic(payload))
     → 13-char Crockford Base32
  • type participates: same payload under different schemas = different hash
  • timestamp excluded: same fact recorded at different times = same hash
  • Idempotent put: if hash exists, skip (keep original timestamp)
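A minimal sketch of the hash rule above. It rests on several assumptions the RFC does not state: the XXH64 seed is 0, `type_bytes` means the 8 raw XXH64 bytes of the type hash in big-endian order (matching the "XXH64 raw" columns of the storage layout), and the payload has already been CBOR-encoded. The Base32 step treats the 64-bit hash as an unsigned integer padded to 13 digits. The names `xxh64`, `toCrockford`, `fromCrockford` and `casHash` are hypothetical, not the package's exports.

```typescript
// Sketch only: XXH64 (seed 0 by default) with BigInt, plus 13-char Crockford Base32.
const M64 = (1n << 64n) - 1n;
const P1 = 0x9e3779b185ebca87n;
const P2 = 0xc2b2ae3d27d4eb4fn;
const P3 = 0x165667b19e3779f9n;
const P4 = 0x85ebca77c2b2ae63n;
const P5 = 0x27d4eb2f165667c5n;

const rotl = (x: bigint, r: bigint): bigint => ((x << r) | (x >> (64n - r))) & M64;

// One accumulator round of XXH64.
function round(acc: bigint, lane: bigint): bigint {
  acc = (acc + lane * P2) & M64;
  return (rotl(acc, 31n) * P1) & M64;
}
function mergeRound(acc: bigint, v: bigint): bigint {
  acc ^= round(0n, v);
  return (acc * P1 + P4) & M64;
}
// XXH64 reads its input lanes little-endian.
const read64 = (b: Uint8Array, i: number): bigint =>
  new DataView(b.buffer, b.byteOffset + i, 8).getBigUint64(0, true);
const read32 = (b: Uint8Array, i: number): bigint =>
  BigInt(new DataView(b.buffer, b.byteOffset + i, 4).getUint32(0, true));

function xxh64(input: Uint8Array, seed = 0n): bigint {
  const len = input.length;
  let i = 0;
  let h: bigint;
  if (len >= 32) {
    // Four parallel accumulators over 32-byte stripes.
    let v1 = (seed + P1 + P2) & M64;
    let v2 = (seed + P2) & M64;
    let v3 = seed;
    let v4 = (seed - P1) & M64;
    for (; i + 32 <= len; i += 32) {
      v1 = round(v1, read64(input, i));
      v2 = round(v2, read64(input, i + 8));
      v3 = round(v3, read64(input, i + 16));
      v4 = round(v4, read64(input, i + 24));
    }
    h = (rotl(v1, 1n) + rotl(v2, 7n) + rotl(v3, 12n) + rotl(v4, 18n)) & M64;
    for (const v of [v1, v2, v3, v4]) h = mergeRound(h, v);
  } else {
    h = (seed + P5) & M64;
  }
  h = (h + BigInt(len)) & M64;
  // Tail: 8-byte lanes, then at most one 4-byte lane, then single bytes.
  for (; i + 8 <= len; i += 8) {
    h ^= round(0n, read64(input, i));
    h = (rotl(h, 27n) * P1 + P4) & M64;
  }
  if (i + 4 <= len) {
    h ^= (read32(input, i) * P1) & M64;
    h = (rotl(h, 23n) * P2 + P3) & M64;
    i += 4;
  }
  for (; i < len; i++) {
    h ^= (BigInt(input[i]) * P5) & M64;
    h = (rotl(h, 11n) * P1) & M64;
  }
  // Final avalanche.
  h ^= h >> 33n;
  h = (h * P2) & M64;
  h ^= h >> 29n;
  h = (h * P3) & M64;
  h ^= h >> 32n;
  return h;
}

const ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"; // Crockford: no I, L, O, U

// 64 bits fit in 13 five-bit digits (65 bits), so the first digit is at most "F".
function toCrockford(h: bigint): string {
  let out = "";
  for (let k = 0; k < 13; k++) {
    out = ALPHABET[Number(h & 31n)] + out;
    h >>= 5n;
  }
  return out;
}
function fromCrockford(s: string): bigint {
  if (s.length !== 13) throw new Error(`expected 13 chars, got ${s.length}`);
  let h = 0n;
  for (const c of s.toUpperCase()) {
    const d = ALPHABET.indexOf(c);
    if (d < 0) throw new Error(`invalid Crockford digit: ${c}`);
    h = (h << 5n) | BigInt(d);
  }
  if (h > M64) throw new Error("value exceeds 64 bits");
  return h;
}

// Hash = XXH64(type_bytes ++ CBOR(payload)); type_bytes ASSUMED to be 8 raw big-endian bytes.
function casHash(typeHash: string, payloadCbor: Uint8Array): string {
  const buf = new Uint8Array(8 + payloadCbor.length);
  new DataView(buf.buffer).setBigUint64(0, fromCrockford(typeHash), false);
  buf.set(payloadCbor, 8);
  return toCrockford(xxh64(buf));
}
```

Because the type bytes are part of the hashed input, `casHash(A, p)` and `casHash(B, p)` differ for the same payload `p`, which is the "type participates" rule. The timestamp never enters the input, which is the "timestamp excluded" rule.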

Storage Encoding

Nodes are stored as binary blobs: a fixed 24-byte header followed by the payload in CBOR (RFC 8949) deterministic encoding:

┌──────────┬──────────┬──────────┬──────────────┐
│ timestamp│ hash     │ type     │ payload      │
│ 8 bytes  │ 8 bytes  │ 8 bytes  │ CBOR bytes   │
│ u64 BE   │ XXH64 raw│ XXH64 raw│ deterministic│
└──────────┴──────────┴──────────┴──────────────┘
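A sketch of packing and unpacking the blob layout above with `DataView`. It assumes all three 8-byte fields are big-endian (the table only says so for the timestamp) and that hashes are held in memory as `bigint`. `RawNode`, `packNode` and `unpackNode` are hypothetical names.

```typescript
// Hypothetical in-memory view of one stored blob.
type RawNode = { timestamp: number; hash: bigint; type: bigint; payload: Uint8Array };

const HEADER_BYTES = 24; // 8 (timestamp) + 8 (hash) + 8 (type)

function packNode(n: RawNode): Uint8Array {
  const out = new Uint8Array(HEADER_BYTES + n.payload.length);
  const view = new DataView(out.buffer);
  view.setBigUint64(0, BigInt(n.timestamp), false); // u64 BE; timestamp must be an integer
  view.setBigUint64(8, n.hash, false); // ASSUMED big-endian, like the timestamp
  view.setBigUint64(16, n.type, false);
  out.set(n.payload, HEADER_BYTES); // payload bytes are already deterministic CBOR
  return out;
}

function unpackNode(blob: Uint8Array): RawNode {
  if (blob.length < HEADER_BYTES) throw new Error("blob shorter than the 24-byte header");
  const view = new DataView(blob.buffer, blob.byteOffset, blob.byteLength);
  return {
    timestamp: Number(view.getBigUint64(0, false)),
    hash: view.getBigUint64(8, false),
    type: view.getBigUint64(16, false),
    payload: blob.slice(HEADER_BYTES),
  };
}
```

The stored `hash` is redundant with the type and payload, which is what lets an integrity check recompute it and compare.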

CBOR chosen over JSON for:

  • Deterministic encoding built into spec (§4.2) — no need for JCS
  • Unambiguous number encoding (integer vs float)
  • Native UTF-8 strings (no choice between raw characters and \uXXXX escapes)
  • Compact binary representation
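To make "deterministic" concrete, here is a sketch of RFC 8949 §4.2.1 core deterministic encoding for a JSON subset: shortest-form integer heads, definite lengths, and map keys sorted by the bytewise order of their encoded form. To stay short it rejects non-integer numbers; a full encoder must also emit floats in the shortest of half, single or double precision that preserves the value. `encodeCbor` and its helpers are hypothetical names.

```typescript
type Json = null | boolean | number | string | Json[] | { [k: string]: Json };

// Initial byte plus the shortest argument encoding (big-endian).
function head(major: number, n: number | bigint): Uint8Array {
  const v = BigInt(n);
  const m = major << 5;
  if (v < 24n) return Uint8Array.of(m | Number(v));
  if (v < 0x100n) return Uint8Array.of(m | 24, Number(v));
  if (v < 0x10000n) {
    const b = new Uint8Array(3);
    b[0] = m | 25;
    new DataView(b.buffer).setUint16(1, Number(v));
    return b;
  }
  if (v < 0x100000000n) {
    const b = new Uint8Array(5);
    b[0] = m | 26;
    new DataView(b.buffer).setUint32(1, Number(v));
    return b;
  }
  const b = new Uint8Array(9);
  b[0] = m | 27;
  new DataView(b.buffer).setBigUint64(1, v);
  return b;
}

function concat(parts: Uint8Array[]): Uint8Array {
  const out = new Uint8Array(parts.reduce((s, p) => s + p.length, 0));
  let off = 0;
  for (const p of parts) {
    out.set(p, off);
    off += p.length;
  }
  return out;
}

// Bytewise lexicographic order; a proper prefix sorts first.
function compareBytes(a: Uint8Array, b: Uint8Array): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) if (a[i] !== b[i]) return a[i] - b[i];
  return a.length - b.length;
}

const utf8 = new TextEncoder();

function encodeCbor(v: Json): Uint8Array {
  if (v === null) return Uint8Array.of(0xf6);
  if (v === false) return Uint8Array.of(0xf4);
  if (v === true) return Uint8Array.of(0xf5);
  if (typeof v === "number") {
    if (!Number.isSafeInteger(v)) throw new Error("sketch only handles safe integers");
    return v >= 0 ? head(0, v) : head(1, -1 - v); // major 0: unsigned, major 1: negative
  }
  if (typeof v === "string") {
    const b = utf8.encode(v);
    return concat([head(3, b.length), b]);
  }
  if (Array.isArray(v)) return concat([head(4, v.length), ...v.map(encodeCbor)]);
  const entries = Object.entries(v).map(([k, val]) => [encodeCbor(k), encodeCbor(val)] as const);
  entries.sort((a, b) => compareBytes(a[0], b[0]));
  return concat([head(5, entries.length), ...entries.flatMap(([k, val]) => [k, val])]);
}
```

Key order follows the encoded bytes, not plain string order: `"b"` (`61 62`) sorts before `"aa"` (`62 61 61`) because the length is part of the first byte.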

Schema

Schemas are JSON Schema documents, stored as CAS nodes themselves.

  • node.type is the hash of its schema node
  • A schema node's own type is the meta-schema — a self-referencing bootstrap node
  • type == hash(self) for the meta-schema (the only fixpoint in the system)

cas_ref Format

The only format L1 (the core layer) understands is cas_ref:

{
  "properties": {
    "content": { "type": "string", "format": "cas_ref" },
    "child": {
      "oneOf": [
        { "type": "string", "format": "cas_ref" },
        { "type": "null" }
      ]
    }
  }
}

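cas_ref means the string is a CAS hash pointing to another node. Because this marker lives in the schema, L1 can find the edges of any node without knowing what its type means, which enables type-agnostic DAG traversal.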

Other formats (markdown, date-time, uri) are not L1 concerns — upper layers interpret them. Markdown may contain cas:HASH links, but these are soft links (like hyperlinks), not structural edges.
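A sketch of how L1 could extract cas_ref edges by walking a schema and a payload side by side. It assumes schemas only use `properties`, `items`, `oneOf`, `anyOf` and `allOf` (no `$ref`), and it does not work out which `oneOf` branch actually matched; it simply collects every string sitting under a `"format": "cas_ref"` subschema. `Schema` and `collectRefs` are hypothetical, not the RFC's `refs`. Markdown `cas:HASH` soft links are never returned, since they sit inside plain strings.

```typescript
// Hypothetical subset of JSON Schema that this sketch understands.
type Schema = {
  type?: string | string[];
  format?: string;
  properties?: Record<string, Schema>;
  items?: Schema;
  oneOf?: Schema[];
  anyOf?: Schema[];
  allOf?: Schema[];
};

function collectRefs(schema: Schema, value: unknown, out: Set<string> = new Set()): Set<string> {
  // A string under format cas_ref is a structural edge.
  if (schema.format === "cas_ref" && typeof value === "string") out.add(value);
  // Combinators: try every branch against the same value.
  for (const sub of [...(schema.oneOf ?? []), ...(schema.anyOf ?? []), ...(schema.allOf ?? [])]) {
    collectRefs(sub, value, out);
  }
  // Objects: descend into declared properties that are present.
  if (schema.properties && value !== null && typeof value === "object" && !Array.isArray(value)) {
    for (const [key, sub] of Object.entries(schema.properties)) {
      if (Object.prototype.hasOwnProperty.call(value, key)) {
        collectRefs(sub, (value as Record<string, unknown>)[key], out);
      }
    }
  }
  // Arrays: every element is checked against `items`.
  if (schema.items && Array.isArray(value)) {
    for (const item of value) collectRefs(schema.items, item, out);
  }
  return out;
}
```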

Bootstrap

The meta-schema is the seed node. Its type field equals its own hash (self-referencing). This is the only hardcoded element in the system. JSON Schema's meta-schema serves as the fixpoint: a well-known, published spec that already describes itself, so the system does not have to invent one.
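The RFC leaves open how the seed's self-referencing hash is produced. Under the hash rule, type == hash(self) means finding h with h = XXH64(h ++ CBOR(payload)), a fixpoint search the RFC does not describe how to perform. The sketch below assumes one possible convention instead: the seed's hash is computed over the payload alone, its type is set to that value, and verification special-cases the one node whose type equals its own hash. `Hashing`, `hashPayloadOnly`, `makeBootstrapNode` and `verifyNode` are hypothetical; the test plugs in toy hash functions.

```typescript
type Hash = string;
type CasNode = { type: Hash; payload: unknown; timestamp: number }; // mirrors the RFC's Node

type Hashing = {
  hashFn: (typeHash: Hash, payload: unknown) => Hash; // the normal CAS hash
  hashPayloadOnly: (payload: unknown) => Hash; // ASSUMED seed convention, not from the RFC
};

function makeBootstrapNode(metaSchema: unknown, h: Hashing, now: number): { hash: Hash; node: CasNode } {
  const hash = h.hashPayloadOnly(metaSchema);
  return { hash, node: { type: hash, payload: metaSchema, timestamp: now } };
}

function verifyNode(hash: Hash, node: CasNode, h: Hashing): boolean {
  if (node.type === hash) {
    // Only the seed may point at itself; check it under the seed convention.
    return h.hashPayloadOnly(node.payload) === hash;
  }
  return h.hashFn(node.type, node.payload) === hash;
}
```

Whatever convention is chosen, `verify` has to know about it, since no ordinary node can satisfy type == hash.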

Architecture

┌────────────────────────────────────┐
│  @uncaged/cli-json-cas             │  CLI (Node/Bun)
├────────────────────────────────────┤
│  @uncaged/json-cas                 │  Core (pure logic, zero I/O)
│  ├─ hash.ts        XXH64+Base32    │
│  ├─ node.ts        construct/parse │
│  ├─ schema.ts      validation      │
│  └─ store.ts       Store interface │
├────────────────────────────────────┤
│  Storage Backends                  │
│  ├─ @uncaged/json-cas-fs           │  Local filesystem
│  ├─ @uncaged/json-cas-kv           │  CF Workers KV
│  └─ (memory backend in core)       │  Testing
└────────────────────────────────────┘

Key constraint: @uncaged/json-cas imports zero platform APIs. All I/O injected via Store interface. Runs in Node, Bun, CF Workers, browsers.

Core API

type Hash = string;  // 13-char Crockford Base32

type Node<T = unknown> = {
  type: Hash;
  payload: T;
  timestamp: number;
};

type Store = {
  put(typeHash: Hash, payload: unknown): Promise<Hash>;
  get(hash: Hash): Promise<Node | null>;
  has(hash: Hash): Promise<boolean>;
  list(): Promise<Hash[]>;
};

// Hash & verify
function hash(typeHash: Hash, payload: unknown): Hash;
function verify(hash: Hash, node: Node): boolean;

// Schema
function putSchema(store: Store, schema: JSONSchema): Promise<Hash>;
function getSchema(store: Store, typeHash: Hash): Promise<JSONSchema | null>;
function validate(store: Store, node: Node): Promise<boolean>;

// Traversal
function refs(store: Store, node: Node): Promise<Hash[]>;
function walk(store: Store, root: Hash, visitor: Visitor): Promise<void>;

// Bootstrap
function bootstrap(store: Store): Promise<Hash>;
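A sketch of the in-memory backend against the Store type above. It assumes the core `hash` function is injected as `hashFn` and the clock is injectable for tests, which keeps the core free of platform APIs. It implements the idempotent-put rule: a second put of the same content returns the same hash and keeps the first timestamp. `createMemoryStore` is a hypothetical name.

```typescript
type Hash = string;
type CasNode<T = unknown> = { type: Hash; payload: T; timestamp: number }; // mirrors Node
type Store = {
  put(typeHash: Hash, payload: unknown): Promise<Hash>;
  get(hash: Hash): Promise<CasNode | null>;
  has(hash: Hash): Promise<boolean>;
  list(): Promise<Hash[]>;
};

// Payloads are JSON-compatible, so a JSON round trip is a safe deep copy.
function clone<T>(v: T): T {
  return JSON.parse(JSON.stringify(v));
}

function createMemoryStore(
  hashFn: (typeHash: Hash, payload: unknown) => Hash, // stands in for core hash()
  clock: () => number = Date.now,
): Store {
  const nodes = new Map<Hash, CasNode>();
  return {
    async put(typeHash, payload) {
      const h = hashFn(typeHash, payload);
      // Idempotent: an existing node keeps its original timestamp.
      if (!nodes.has(h)) nodes.set(h, { type: typeHash, payload: clone(payload), timestamp: clock() });
      return h;
    },
    async get(hash) {
      const n = nodes.get(hash);
      // Hand out copies so callers cannot mutate stored (immutable) nodes.
      return n ? { ...n, payload: clone(n.payload) } : null;
    },
    async has(hash) {
      return nodes.has(hash);
    },
    async list() {
      return Array.from(nodes.keys());
    },
  };
}
```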

CLI (@uncaged/cli-json-cas)

json-cas init                              # Init .cas/ directory
json-cas put <type-hash> <file.json>       # Store node, print hash
json-cas get <hash>                        # Read node, print JSON
json-cas has <hash>                        # Check existence
json-cas verify <hash>                     # Verify integrity
json-cas list                              # List all hashes

json-cas schema put <schema.json>          # Register schema
json-cas schema get <type-hash>            # View schema
json-cas schema list                       # List schemas
json-cas schema validate <hash>            # Validate node against schema

json-cas refs <hash>                       # List direct cas_ref edges
json-cas walk <hash>                       # Recursive traversal
json-cas walk <hash> --format tree         # Tree view
json-cas walk <hash> --format dot          # Graphviz DOT

json-cas hash <type-hash> <file.json>      # Compute hash without storing
json-cas cat <hash> --payload              # Output payload only
json-cas bootstrap                         # Write meta-schema seed
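A sketch of what `json-cas walk <hash> --format dot` might do: an iterative depth-first walk that visits each shared child once (the graph is a DAG, so one node can be reached by several paths) and prints Graphviz DOT. `refsOf` stands in for resolving a hash and calling `refs`; `walkToDot` and the exact output shape are assumptions, not the CLI's actual behavior.

```typescript
type Hash = string;

async function walkToDot(root: Hash, refsOf: (h: Hash) => Promise<Hash[]>): Promise<string> {
  const seen = new Set<Hash>();
  const lines: string[] = [];
  const stack: Hash[] = [root];
  while (stack.length > 0) {
    const h = stack.pop()!;
    if (seen.has(h)) continue; // shared children are emitted once
    seen.add(h);
    lines.push(`  "${h}";`);
    for (const child of await refsOf(h)) {
      lines.push(`  "${h}" -> "${child}";`); // every edge is kept, even to seen nodes
      stack.push(child);
    }
  }
  return `digraph cas {\n${lines.join("\n")}\n}`;
}
```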

Design Principles

  1. Immutable — nodes are never modified or deleted ("业力不失", "karma is never lost")
  2. Self-describing — type is a hash pointing to a JSON Schema in CAS
  3. Minimal L1 — only understands cas_ref format, everything else is upper-layer
  4. Platform-agnostic — core is pure computation, backends are pluggable
  5. CBOR deterministic — canonical encoding at binary level, no ambiguity

No GC

By design. What happened, happened. Immutable history.

小橘 🍊(NEKO Team)


Phase breakdown

Phase  Issue  Content
1      #3     Core primitives (hash + CBOR + memory store)
2      #4     Schema system (JSON Schema + cas_ref + traversal)
3      #5     Filesystem backend
4      #6     CLI

Completion criteria

  • [ ] Phase 1 (#3)
  • [ ] Phase 2 (#4)
  • [ ] Phase 3 (#5)
  • [ ] Phase 4 (#6)

小橘 🍊(NEKO Team)


Closing: Core design implemented and published

— 小橘 🍊(NEKO Team)

This repo is archived. You cannot comment on issues.

Reference: uncaged/json-cas#1