docs(rfc): knowledge layer — built-in, local-first, repo-scoped

Replaces the Alysaril delegation with a built-in knowledge feature:
- knowledge.yaml at repo root with include/exclude
- knowledge.db (SQLite) stores chunks + embeddings locally
- Remote service only for embedding computation + content-hash cache
- In-memory cosine search (sufficient for project scale)
- CLI: nerve knowledge sync/query with -r and -g flags

Refs #233
This commit is contained in:
2026-04-29 04:12:09 +00:00
parent 3950f0e278
commit aecced587c
+67 -11
View File
@@ -141,19 +141,75 @@ remains the runtime interface. The new config layer is syntactic sugar — the r
## Knowledge Layer
Project knowledge is **not a nerve feature**. It is managed by [Alysaril](https://git.shazhou.work/uncaged/alysaril) — an independent project knowledge base tool (Zettelkasten cards + semantic search).
Project knowledge is a **built-in nerve feature**. Scope is the **repo** — each repo has its own knowledge base, tracked in git.
Nerve's relationship to project knowledge:
- **Nerve does not hardcode knowledge paths** — no `.nerve/knowledge/` convention in runtime code
- **Knowledge loading is a prompt concern** — role prompts tell agents to read relevant cards
- **Agent long-term memory** — domain expertise accumulated across runs (e.g. "this repo uses pnpm"), stored per agent, separate from project knowledge
- **Workflow context** (`start` + `messages`) serves as the only in-run state — no separate "short-term memory" layer needed
### Architecture
```
Project knowledge (Alysaril) Shared, git managed, any agent reads via prompt
Agent long-term memory Per agent, domain expertise, cross-run
Workflow context (start + msgs) Per run, moderator-controlled history
Local (per repo) Remote Service
┌───────────────────────┐ ┌─────────────────────┐
│ knowledge.yaml │ │ Embedding API │
│ ├── include/exclude │ ──→ │ text → vector │
│ knowledge.db (SQLite) │ ←── │ content-hash cache │
│ ├── chunk text │ │ (avoid recompute) │
│ ├── embedding bytes │ └─────────────────────┘
│ └── cosine search │
└───────────────────────┘
```
- **Local-first** — `knowledge.db` stores chunks + embeddings, search runs locally (in-memory cosine similarity)
- **Remote service only computes embeddings** — content-addressable cache keyed by text hash, avoids redundant computation across agents
- **Branch-aware by design** — different agents on different branches naturally have different `knowledge.db` contents
### Configuration (`knowledge.yaml` at repo root)
```yaml
include:
- "src/**/*.ts"
- "docs/**/*.md"
- "*.md"
exclude:
- "node_modules/**"
- "dist/**"
- "*.test.ts"
```
`knowledge.yaml` is committed to git. `knowledge.db` is gitignored — it's a local cache rebuilt from source files + remote embedding service.
### CLI
```bash
nerve knowledge sync # index/re-index changed files
nerve knowledge query "how does the signal bus work"
# Scope
nerve knowledge query "..." # default: cwd repo
nerve knowledge query -r /path/to/other/repo "..."
nerve knowledge query -g "..." # global search (all indexed repos)
# -r and -g are mutually exclusive
```
### Search Implementation
Project-scale knowledge (hundreds to low thousands of chunks) does not need vector indices. Full scan with cosine similarity in memory is sufficient and adds zero native dependencies.
```ts
// Pseudocode
const chunks = db.all("SELECT slug, chunk, embedding FROM chunks");
const query_vec = await embed(query);
const results = chunks
.map(c => ({ ...c, score: cosine(query_vec, c.embedding) }))
.sort((a, b) => b.score - a.score)
.slice(0, limit);
```
### Knowledge Layers
```
Project knowledge (knowledge.yaml) Per repo, git managed, any agent reads
Agent long-term memory Per agent, domain expertise, cross-run
Workflow context (start + msgs) Per run, moderator-controlled history
```
## Open Questions
@@ -162,9 +218,9 @@ Workflow context (start + msgs) Per run, moderator-controlled history
2. **Extract override granularity** — global only, or also per-agent and per-role?
3. **Context threading** — should `WorkflowContext` expose `workdir` and `signal` alongside the existing `start` + `messages`?
4. **Agent long-term memory** — storage format and mechanism for persisting domain expertise across runs
5. **Embedding service** — self-hosted vs managed (Cloudflare Workers AI, Dashscope, etc.), model choice (e.g. `text-embedding-3-small`)
## References
- [RFC-002: Workflow Engine](./rfc-002-workflow-engine.md)
- Current `Role` / `Moderator` types: `packages/core/src/workflow.ts`
- [Alysaril](https://git.shazhou.work/uncaged/alysaril) — project knowledge base (independent tool)