docs(rfc): knowledge layer — built-in, local-first, repo-scoped
Replaces the Alysaril delegation with a built-in knowledge feature: - knowledge.yaml at repo root with include/exclude - knowledge.db (SQLite) stores chunks + embeddings locally - Remote service only for embedding computation + content-hash cache - In-memory cosine search (sufficient for project scale) - CLI: nerve knowledge sync/query with -r and -g flags Refs #233
This commit is contained in:
@@ -141,19 +141,75 @@ remains the runtime interface. The new config layer is syntactic sugar — the r
|
||||
|
||||
## Knowledge Layer
|
||||
|
||||
Project knowledge is **not a nerve feature**. It is managed by [Alysaril](https://git.shazhou.work/uncaged/alysaril) — an independent project knowledge base tool (Zettelkasten cards + semantic search).
|
||||
Project knowledge is a **built-in nerve feature**. Scope is the **repo** — each repo has its own knowledge base, tracked in git.
|
||||
|
||||
Nerve's relationship to project knowledge:
|
||||
|
||||
- **Nerve does not hardcode knowledge paths** — no `.nerve/knowledge/` convention in runtime code
|
||||
- **Knowledge loading is a prompt concern** — role prompts tell agents to read relevant cards
|
||||
- **Agent long-term memory** — domain expertise accumulated across runs (e.g. "this repo uses pnpm"), stored per agent, separate from project knowledge
|
||||
- **Workflow context** (`start` + `messages`) serves as the only in-run state — no separate "short-term memory" layer needed
|
||||
### Architecture
|
||||
|
||||
```
|
||||
Project knowledge (Alysaril) Shared, git managed, any agent reads via prompt
|
||||
Agent long-term memory Per agent, domain expertise, cross-run
|
||||
Workflow context (start + msgs) Per run, moderator-controlled history
|
||||
Local (per repo) Remote Service
|
||||
┌───────────────────────┐ ┌─────────────────────┐
|
||||
│ knowledge.yaml │ │ Embedding API │
|
||||
│ ├── include/exclude │ ──→ │ text → vector │
|
||||
│ knowledge.db (SQLite) │ ←── │ content-hash cache │
|
||||
│ ├── chunk text │ │ (avoid recompute) │
|
||||
│ ├── embedding bytes │ └─────────────────────┘
|
||||
│ └── cosine search │
|
||||
└───────────────────────┘
|
||||
```
|
||||
|
||||
- **Local-first** — `knowledge.db` stores chunks + embeddings, search runs locally (in-memory cosine similarity)
|
||||
- **Remote service only computes embeddings** — content-addressable cache keyed by text hash, avoids redundant computation across agents
|
||||
- **Branch-aware by design** — different agents on different branches naturally have different `knowledge.db` contents
|
||||
|
||||
### Configuration (`knowledge.yaml` at repo root)
|
||||
|
||||
```yaml
|
||||
include:
|
||||
- "src/**/*.ts"
|
||||
- "docs/**/*.md"
|
||||
- "*.md"
|
||||
|
||||
exclude:
|
||||
- "node_modules/**"
|
||||
- "dist/**"
|
||||
- "*.test.ts"
|
||||
```
|
||||
|
||||
`knowledge.yaml` is committed to git. `knowledge.db` is gitignored — it's a local cache rebuilt from source files + remote embedding service.
|
||||
|
||||
### CLI
|
||||
|
||||
```bash
|
||||
nerve knowledge sync # index/re-index changed files
|
||||
nerve knowledge query "how does the signal bus work"
|
||||
|
||||
# Scope
|
||||
nerve knowledge query "..." # default: cwd repo
|
||||
nerve knowledge query -r /path/to/other/repo "..."
|
||||
nerve knowledge query -g "..." # global search (all indexed repos)
|
||||
# -r and -g are mutually exclusive
|
||||
```
|
||||
|
||||
### Search Implementation
|
||||
|
||||
Project-scale knowledge (hundreds to low thousands of chunks) does not need vector indices. Full scan with cosine similarity in memory is sufficient and adds zero native dependencies.
|
||||
|
||||
```ts
|
||||
// Pseudocode
|
||||
const chunks = db.all("SELECT slug, chunk, embedding FROM chunks");
|
||||
const query_vec = await embed(query);
|
||||
const results = chunks
|
||||
.map(c => ({ ...c, score: cosine(query_vec, c.embedding) }))
|
||||
.sort((a, b) => b.score - a.score)
|
||||
.slice(0, limit);
|
||||
```
|
||||
|
||||
### Knowledge Layers
|
||||
|
||||
```
|
||||
Project knowledge (knowledge.yaml) Per repo, git managed, any agent reads
|
||||
Agent long-term memory Per agent, domain expertise, cross-run
|
||||
Workflow context (start + msgs) Per run, moderator-controlled history
|
||||
```
|
||||
|
||||
## Open Questions
|
||||
@@ -162,9 +218,9 @@ Workflow context (start + msgs) Per run, moderator-controlled history
|
||||
2. **Extract override granularity** — global only, or also per-agent and per-role?
|
||||
3. **Context threading** — should `WorkflowContext` expose `workdir` and `signal` alongside the existing `start` + `messages`?
|
||||
4. **Agent long-term memory** — storage format and mechanism for persisting domain expertise across runs
|
||||
5. **Embedding service** — self-hosted vs managed (Cloudflare Workers AI, Dashscope, etc.), model choice (e.g. `text-embedding-3-small`)
|
||||
|
||||
## References
|
||||
|
||||
- [RFC-002: Workflow Engine](./rfc-002-workflow-engine.md)
|
||||
- Current `Role` / `Moderator` types: `packages/core/src/workflow.ts`
|
||||
- [Alysaril](https://git.shazhou.work/uncaged/alysaril) — project knowledge base (independent tool)
|
||||
|
||||
Reference in New Issue
Block a user