9c832b0e21
7 cards updated, 4 new cards added. Topics: signal-routing, worker-isolation, storage-layer, adapter-isolation, sense contracts, workflow runtime enforcement, coding conventions details. 小橘 <xiaoju@shazhou.work>
49 lines
1.9 KiB
Markdown
# Knowledge Layer (RFC-003 Phase 6)

Local-first, repo-scoped knowledge base for project context.

## Files

- `knowledge.yaml`: repo root, defines include/exclude globs
- `knowledge.db`: SQLite, stores chunks + embeddings
- `.knowledge/`: curated knowledge cards (indexed by `sync`)

## Commands

```bash
nerve knowledge sync                        # chunk files, compute embeddings, write to knowledge.db
nerve knowledge query "query"               # search by cosine similarity (or word overlap fallback)
nerve knowledge query -g "query"            # global search across all indexed repos
nerve knowledge query --repo /path "query"  # search specific repo
```

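
The two scoring modes named in the `query` comment can be sketched as follows. This is a minimal illustration, not the code `nerve` ships: the function names `cosineSimilarity` and `wordOverlapScore` are hypothetical, and the tokenization and normalization used for the word-overlap fallback are assumptions, since the card does not specify them.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumed fallback: fraction of distinct query words that also occur in the chunk.
// Splitting on \W+ only handles ASCII word characters; a real implementation
// may tokenize differently.
function wordOverlapScore(query: string, chunk: string): number {
  const tokenize = (s: string): Set<string> =>
    new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const queryWords = tokenize(query);
  const chunkWords = tokenize(chunk);
  if (queryWords.size === 0) return 0;
  const hits = Array.from(queryWords).filter((w) => chunkWords.has(w)).length;
  return hits / queryWords.size;
}
```

Both functions return higher values for better matches, so either one could rank chunks with the same sorting step.
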
## Embedding

- **Default model**: Dashscope text-embedding-v3 (1024 dimensions)
- **Remote service**: configured via the `EMBED_SERVICE_URL` env var (self-hosted Cloudflare Worker + KV cache)
- **Model configuration**: no mechanism to specify alternate models; text-embedding-v3 is hardcoded in the remote service
- **Vector dimensions**: fixed at 1024 (Float32Array, stored as 4096-byte Buffer blobs in SQLite)
- **Cache**: content-addressable (sha256 of model+text), never expires
- **Fallback**: word-overlap scoring when the embed service is not configured

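
The storage and cache details above can be illustrated with a short sketch. It assumes Node.js; the helper names `vectorToBlob`, `blobToVector`, and `cacheKey` are hypothetical, and the way model and text are joined before hashing is an assumption (the card only says "sha256 of model+text").

```typescript
import { createHash } from "node:crypto";

const DIMENSIONS = 1024; // fixed vector size stated in this card

// View a 1024-dim Float32Array as a 4096-byte Buffer (1024 * 4 bytes),
// suitable for a SQLite BLOB column. The Buffer shares memory with `vec`.
function vectorToBlob(vec: Float32Array): Buffer {
  if (vec.length !== DIMENSIONS) {
    throw new Error(`expected ${DIMENSIONS} dimensions, got ${vec.length}`);
  }
  return Buffer.from(vec.buffer, vec.byteOffset, vec.byteLength);
}

// Decode a blob back into a Float32Array. The bytes are copied first so the
// result starts at offset 0 of its own ArrayBuffer (Float32Array needs
// 4-byte alignment, which a pooled Buffer does not guarantee).
function blobToVector(blob: Buffer): Float32Array {
  if (blob.byteLength !== DIMENSIONS * 4) {
    throw new Error(`expected ${DIMENSIONS * 4} bytes, got ${blob.byteLength}`);
  }
  const copy = new Uint8Array(blob);
  return new Float32Array(copy.buffer);
}

// Content-addressable cache key. Assumption: model and text are joined with a
// newline before hashing; the real service may concatenate them differently.
function cacheKey(model: string, text: string): string {
  return createHash("sha256").update(`${model}\n${text}`).digest("hex");
}
```

Because the key depends only on the model name and the input text, identical text always maps to the same cache entry, which is consistent with a cache that never expires. Note that the blob uses the platform's native byte order.
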
### Configuration

The embedding model is **not configurable** through `knowledge.yaml` or other config files. The remote service at `embed.shazhou.workers.dev` uses Dashscope text-embedding-v3 exclusively. To use a different model, you would need to:

1. Deploy your own embedding service that is compatible with the same API
2. Point `EMBED_SERVICE_URL` to your service
3. Ensure its vector dimensions match (1024), or modify the `knowledge.db` schema

## Chunking

- Markdown: split by headings, large sections split further by paragraphs (max 24)
- TypeScript/JS: split by function declarations, fallback to paragraphs
- Other files: single chunk

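
As a rough illustration of the Markdown rule, the sketch below splits a document at ATX headings. It is an assumption-laden example, not the actual chunker: the `Chunk` shape and the `chunkMarkdown` name are hypothetical, and the secondary paragraph split for large sections is left out because the card does not define its limit precisely.

```typescript
// Hypothetical chunk shape: the heading text plus the body beneath it.
interface Chunk {
  heading: string;
  text: string;
}

// Built with repeat() so the fence marker never appears literally in this block.
const FENCE = "`".repeat(3);

// Split Markdown into one chunk per heading. Lines inside fenced code blocks
// are never treated as headings, so shell comments starting with "#" are safe.
function chunkMarkdown(source: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let lines: string[] = [];
  let inFence = false;

  const flush = (): void => {
    const text = lines.join("\n").trim();
    if (heading !== "" || text !== "") {
      chunks.push({ heading, text });
    }
    lines = [];
  };

  for (const line of source.split("\n")) {
    if (line.trim().startsWith(FENCE)) {
      inFence = !inFence;
    }
    const match = inFence ? null : /^(#{1,6})\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the previous section before starting a new one
      heading = match[2].trim();
    } else {
      lines.push(line);
    }
  }
  flush();
  return chunks;
}
```

Text that appears before the first heading becomes a chunk with an empty `heading`, so no content is dropped.
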
## Env Config

```
EMBED_SERVICE_URL=https://embed.shazhou.workers.dev
EMBED_AUTH_TOKEN=<token>
```
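
A caller could decide between remote embeddings and the word-overlap fallback from these two variables roughly as sketched below. The `EmbedMode` type and `resolveEmbedMode` function are hypothetical, and treating `EMBED_AUTH_TOKEN` as optional is an assumption; the card only states that the fallback applies when the embed service is not configured.

```typescript
// Hypothetical result type: either call the remote service or fall back.
type EmbedMode =
  | { kind: "remote"; url: string; token?: string }
  | { kind: "word-overlap" };

// Inspect the environment and pick a mode. An unset or blank
// EMBED_SERVICE_URL is treated as "service not configured".
function resolveEmbedMode(env: Record<string, string | undefined>): EmbedMode {
  const url = env.EMBED_SERVICE_URL?.trim();
  if (!url) {
    return { kind: "word-overlap" };
  }
  const token = env.EMBED_AUTH_TOKEN?.trim();
  return token ? { kind: "remote", url, token } : { kind: "remote", url };
}
```

In a Node.js process this would typically be called as `resolveEmbedMode(process.env)`.
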