docs(rfc): knowledge layer — built-in, local-first, repo-scoped

Replaces the Alysaril delegation with a built-in knowledge feature:

- knowledge.yaml at repo root with include/exclude
- knowledge.db (SQLite) stores chunks + embeddings locally
- Remote service only for embedding computation + content-hash cache
- In-memory cosine search (sufficient for project scale)
- CLI: nerve knowledge sync/query with -r and -g flags

Refs #233
2026-04-29 04:12:09 +00:00
parent 3950f0e278
commit aecced587c
1 changed files with 67 additions and 11 deletions
@@ -141,19 +141,75 @@ remains the runtime interface. The new config layer is syntactic sugar — the r

 ## Knowledge Layer

-Project knowledge is **not a nerve feature**. It is managed by [Alysaril](https://git.shazhou.work/uncaged/alysaril) — an independent project knowledge base tool (Zettelkasten cards + semantic search).
+Project knowledge is a **built-in nerve feature**. Scope is the **repo**: each repo has its own knowledge base, configured by a `knowledge.yaml` that is tracked in git.

-Nerve's relationship to project knowledge:
-
-- **Nerve does not hardcode knowledge paths** — no `.nerve/knowledge/` convention in runtime code
-- **Knowledge loading is a prompt concern** — role prompts tell agents to read relevant cards
-- **Agent long-term memory** — domain expertise accumulated across runs (e.g. "this repo uses pnpm"), stored per agent, separate from project knowledge
-- **Workflow context** (`start` + `messages`) serves as the only in-run state — no separate "short-term memory" layer needed
+### Architecture

 ```
-Project knowledge (Alysaril)    Shared, git managed, any agent reads via prompt
-Agent long-term memory          Per agent, domain expertise, cross-run
-Workflow context (start + msgs) Per run, moderator-controlled history
+Local (per repo)                         Remote Service
+┌───────────────────────┐           ┌─────────────────────┐
+│ knowledge.yaml        │           │ Embedding API       │
+│ ├── include/exclude   │   ──→     │ text → vector       │
+│ knowledge.db (SQLite) │   ←──     │ content-hash cache  │
+│ ├── chunk text        │           │ (avoid recompute)   │
+│ ├── embedding bytes   │           └─────────────────────┘
+│ └── cosine search     │
+└───────────────────────┘
+```
+
+- **Local-first** — `knowledge.db` stores chunks + embeddings, search runs locally (in-memory cosine similarity)
+- **Remote service only computes embeddings** — content-addressable cache keyed by text hash, avoids redundant computation across agents
+- **Branch-aware by design** — different agents on different branches naturally have different `knowledge.db` contents
+
+### Configuration (`knowledge.yaml` at repo root)
+
+```yaml
+include:
+  - "src/**/*.ts"
+  - "docs/**/*.md"
+  - "*.md"
+
+exclude:
+  - "node_modules/**"
+  - "dist/**"
+  - "*.test.ts"
+```
+
+`knowledge.yaml` is committed to git. `knowledge.db` is gitignored — it's a local cache rebuilt from source files + remote embedding service.
+
+### CLI
+
+```bash
+nerve knowledge sync              # index/re-index changed files
+nerve knowledge query "how does the signal bus work"
+
+# Scope
+nerve knowledge query "..." # default: cwd repo
+nerve knowledge query -r /path/to/other/repo "..."
+nerve knowledge query -g "..."   # global search (all indexed repos)
+# -r and -g are mutually exclusive
+```
+
+### Search Implementation
+
+Project-scale knowledge (hundreds to low thousands of chunks) does not need vector indices. Full scan with cosine similarity in memory is sufficient and adds zero native dependencies.
+
+```ts
+// Pseudocode
+const chunks = db.all("SELECT slug, chunk, embedding FROM chunks");
+const query_vec = await embed(query);
+const results = chunks
+  .map(c => ({ ...c, score: cosine(query_vec, c.embedding) }))
+  .sort((a, b) => b.score - a.score)
+  .slice(0, limit);
+```
+
+### Knowledge Layers
+
+```
+Project knowledge (knowledge.yaml)  Per repo, git managed, any agent reads
+Agent long-term memory              Per agent, domain expertise, cross-run
+Workflow context (start + msgs)     Per run, moderator-controlled history
 ```

 ## Open Questions
@@ -162,9 +218,9 @@ Workflow context (start + msgs) Per run, moderator-controlled history
 2. **Extract override granularity** — global only, or also per-agent and per-role?
 3. **Context threading** — should `WorkflowContext` expose `workdir` and `signal` alongside the existing `start` + `messages`?
 4. **Agent long-term memory** — storage format and mechanism for persisting domain expertise across runs
+5. **Embedding service** — self-hosted vs managed (Cloudflare Workers AI, Dashscope, etc.), model choice (e.g. `text-embedding-3-small`)

 ## References

 - [RFC-002: Workflow Engine](./rfc-002-workflow-engine.md)
 - Current `Role` / `Moderator` types: `packages/core/src/workflow.ts`
-- [Alysaril](https://git.shazhou.work/uncaged/alysaril) — project knowledge base (independent tool)
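
For illustration, the full-scan search shown as pseudocode under "Search Implementation" in the diff above could be fleshed out roughly as follows. This is a sketch under assumptions the RFC does not state: embeddings are assumed to be stored as raw float32 bytes in platform byte order, and `ChunkRow`, `ScoredChunk`, `encodeEmbedding`, `decodeEmbedding`, and `search` are hypothetical names. Only the `slug`, `chunk`, and `embedding` columns come from the diff. The query vector is passed in already embedded, so the call to the remote embedding service is out of scope here.

```typescript
// Hypothetical row shape; the RFC only names the columns slug, chunk, embedding.
interface ChunkRow {
  slug: string;
  chunk: string;
  embedding: Uint8Array; // assumption: raw float32 bytes, platform byte order
}

interface ScoredChunk {
  slug: string;
  chunk: string;
  score: number;
}

// Encode a vector as raw float32 bytes (the assumed storage format).
function encodeEmbedding(vec: number[]): Uint8Array {
  return new Uint8Array(new Float32Array(vec).buffer);
}

// Decode stored bytes back into floats. Copying into a fresh Uint8Array first
// guarantees a buffer that starts at offset 0, even if `bytes` is a view into
// a larger pooled buffer.
function decodeEmbedding(bytes: Uint8Array): Float32Array {
  const copy = new Uint8Array(bytes);
  return new Float32Array(copy.buffer);
}

// Cosine similarity; returns 0 when either vector has zero length.
function cosine(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Full scan: score every chunk against the query, keep the best `limit`.
function search(
  rows: ChunkRow[],
  queryVec: Float32Array,
  limit: number,
): ScoredChunk[] {
  return rows
    .map((r) => ({
      slug: r.slug,
      chunk: r.chunk,
      score: cosine(queryVec, decodeEmbedding(r.embedding)),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

Sorting the whole scored array keeps the code short. At the scale the RFC targets (hundreds to low thousands of chunks) that cost should be negligible next to the embedding request for the query.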