From f997be31699598661b2a80d4bcffe04e12105ab5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=B0=8F=E6=A9=98?= <xiaoju@shazhou.work>
Date: Sat, 4 Apr 2026 11:38:03 +0000
Subject: [PATCH] docs: hide LLM provider details, replace them with generic codenames
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- DashScope/Bailian/Alibaba/Qwen → LLM Provider / Model-X/Y/V/C/F
- Architecture and decision content left unchanged

小橘 🍊 (NEKO Team)
---
 docs/shared/uncaged-build-diary.md | 70 +++++++++++++++---------------
 1 file changed, 35 insertions(+), 35 deletions(-)

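The model routing table and the VL guard described in the diary below fit naturally into a small request builder. The guard's rule is to drop `enable_thinking` and `tools` whenever a request is routed to the vision model. This is a minimal TypeScript sketch under stated assumptions, not Uncaged's actual code. The lowercase codenames, the `hasImage` flag, the keyword list, the choice to enable CoT only for the default model, and the helpers `routeModel` and `buildRequest` are all hypothetical. Only the four routing rules and the "no `enable_thinking` or `tools` for VL" rule come from the diary.

```typescript
// Generic codenames mirroring the diary's Model-V/C/F/Y; the lowercase IDs are hypothetical.
type ModelCode = "model-v" | "model-c" | "model-f" | "model-y";

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
  hasImage?: boolean; // hypothetical flag: the message carries an image_url part
}

interface ChatRequest {
  model: ModelCode;
  messages: ChatMessage[];
  enable_thinking?: boolean; // provider-specific CoT switch, as used in the diary
  tools?: unknown[];
}

// Hypothetical keyword list; the diary only says "contains code keywords".
const CODE_KEYWORDS = ["code", "function", "bug", "代码"];

// Models that must never receive enable_thinking or tools.
const VISION_MODELS: ReadonlySet<ModelCode> = new Set<ModelCode>(["model-v"]);

// Apply the routing table top to bottom; the first matching rule wins.
function routeModel(messages: ChatMessage[]): ModelCode {
  const last = messages[messages.length - 1];
  if (!last) return "model-y";
  if (last.hasImage) return "model-v";
  const text = last.content.toLowerCase();
  if (CODE_KEYWORDS.some((k) => text.includes(k))) return "model-c";
  // Count code points so each CJK character counts as one character.
  if (Array.from(last.content).length < 20) return "model-f";
  return "model-y";
}

function buildRequest(messages: ChatMessage[], tools: unknown[]): ChatRequest {
  const model = routeModel(messages);
  const request: ChatRequest = { model, messages };
  if (VISION_MODELS.has(model)) {
    // VL guard: with both fields present, the VL model silently ignored images.
    return request;
  }
  if (tools.length > 0) request.tools = tools;
  // Assumption: only the default reasoning model gets CoT.
  if (model === "model-y") request.enable_thinking = true;
  return request;
}
```

Putting the guard inside the builder means no caller can send `enable_thinking` and `tools` alongside an image, which is the combination the diary found made the VL model act as if it could not see the picture.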
diff --git a/docs/shared/uncaged-build-diary.md b/docs/shared/uncaged-build-diary.md
index c921882..f490bf6 100644
--- a/docs/shared/uncaged-build-diary.md
+++ b/docs/shared/uncaged-build-diary.md
@@ -14,7 +14,7 @@ tags: [uncaged, sigil, agent, build-diary, cloudflare-workers, dashscope]
 **仓库**：[oc-xiaoju/uncaged](https://github.com/oc-xiaoju/uncaged)  
 **Bot**：[@scottwei_doudou_bot](https://t.me/scottwei_doudou_bot)（豆豆 🐣）  
 **运行环境**：Cloudflare Workers  
-**LLM**：阿里云百炼 DashScope（Qwen3 系列）  
+**LLM**：OpenAI-compatible API（多模型路由）  
 
 !!! tip "时间说明"
     本文所有时间均为**北京时间（UTC+8）**。
@@ -52,7 +52,7 @@ tags: [uncaged, sigil, agent, build-diary, cloudflare-workers, dashscope]
 | 04-04 08:20 | — | Soul/Instructions 分离 + UX 打磨 |
 | 04-04 10:14 | — | /chat API 端点 |
 | 04-04 12:44 | — | 多 session 记忆共享 |
-| 04-04 14:25 | — | qwen3-max + CoT 思维链 |
+| 04-04 14:25 | — | Model-Y + CoT 思维链 |
 | 04-04 15:33 | v0.5 | 自进化：豆豆自己写代码部署工具 |
 | 04-04 17:04 | — | Pipeline 架构 |
 | 04-04 17:32 | — | 多模态：豆豆看图片（开始踩坑） |
@@ -66,7 +66,7 @@ tags: [uncaged, sigil, agent, build-diary, cloudflare-workers, dashscope]
 
 吃完晚饭，主人说动手吧。
 
-目标很明确：一个 Telegram Bot，背后是 CF Worker，能调 DashScope 的 Qwen 模型聊天，并且**天然集成 Sigil**。不是"接一个插件"，而是从第一行代码开始就和 Sigil 一起长大。
+目标很明确：一个 Telegram Bot，背后是 CF Worker，能调 LLM 聊天，并且**天然集成 Sigil**。不是"接一个插件"，而是从第一行代码开始就和 Sigil 一起长大。
 
 ```
 d3986ec  feat: Uncaged MVP — Sigil-native AI Agent + Telegram Bot
@@ -75,7 +75,7 @@ d3986ec  feat: Uncaged MVP — Sigil-native AI Agent + Telegram Bot
 架构最简单的版本：
 
 ```
-Telegram → Webhook → CF Worker (Uncaged) → LLM (DashScope/Qwen)
+Telegram → Webhook → CF Worker (Uncaged) → LLM Provider
                          ↕                       ↕
                      Chat KV              Sigil (Capability Registry)
                    (history)              (query/deploy/run)
@@ -220,7 +220,7 @@ b8f4d6c  feat: soul + memory + instance isolation (v0.3.0)
 
 4. **Sigil `/run` 应该用 POST** — 之前用 GET，参数放 query string，对复杂参数不友好。
 5. **Embedding 模型不支持中文** — 原来用的 `bge-base-en-v1.5` 是英文模型，中文消息的 embedding 质量很差。换成 `bge-m3`（多语言，1024 维），顺便把 Vectorize index 也重建了。
-6. **LLM 调用缺少重试和超时** — DashScope 偶尔 429 或 500，直接 crash 了。加了指数退避重试 + 30 秒超时。
+6. **LLM 调用缺少重试和超时** — LLM Provider 偶尔 429 或 500，直接 crash 了。加了指数退避重试 + 30 秒超时。
 
 **Telegram UX**：
 
@@ -256,7 +256,7 @@ fba158e  fix: P1 issues
 
 **第二次尝试（12:19）**：撤回自动注入，改回让 LLM 自己搜记忆。
 
-但 LLM 不搜啊。Instructions 里写了"遇到这类问题先搜记忆"，qwen-plus 看了跟没看一样。
+但 LLM 不搜啊。Instructions 里写了"遇到这类问题先搜记忆"，Model-X (Base) 看了跟没看一样。
 
 **第三次（12:44）**：在 Instructions 里写了非常强硬的规则：
 
@@ -288,15 +288,15 @@ Review 时我发现了一个问题：recall 策略按时间排序取 top N，如
 
 小墨 15 分钟修好了，用了 `ROW_NUMBER() OVER (PARTITION BY chat_id)` —— 先给每个联系人编号，保证每人至少取一条，再按时间排其余的。
 
-### qwen3-max + CoT（14:25）
+### Model-Y (Reasoning) + CoT（14:25）
 
-之前用的是 qwen-plus，指令遵从度不够——该搜记忆不搜，该用工具不用。
+之前用的是 Model-X (Base)，指令遵从度不够——该搜记忆不搜，该用工具不用。
 
-换成 qwen3-max + `enable_thinking: true`（Chain of Thought），效果立竿见影。豆豆会先"想一想"应该怎么做，然后再行动。明显变聪明了。
+换成 Model-Y (Reasoning) + `enable_thinking: true`（Chain of Thought），效果立竿见影。豆豆会先"想一想"应该怎么做，然后再行动。明显变聪明了。
 
 ```
 aaa9546  feat(memory): D1 structured storage + per-contact recall strategy (#8)
-6ce6389  feat: upgrade to qwen3-max with CoT thinking
+6ce6389  feat: upgrade to Model-Y (Reasoning) with CoT thinking
 ```
 
 ### 健康监控（PR #17，14:54）
@@ -376,10 +376,10 @@ const pipeline = compose(
 
 | 条件 | 模型 | 原因 |
 |------|------|------|
-| 包含图片 | qwen3-vl-plus | 多模态理解 |
-| 包含代码关键词 | qwen3-coder-plus | 代码生成 |
-| 简短问候（< 20 字） | qwen3.5-flash | 快速响应 |
-| 默认 | qwen3-max | 强推理 + CoT |
+| 包含图片 | Model-V (Vision) | 多模态理解 |
+| 包含代码关键词 | Model-C (Coder) | 代码生成 |
+| 简短问候（< 20 字） | Model-F (Flash) | 快速响应 |
+| 默认 | Model-Y (Reasoning) | 强推理 + CoT |
 
 **知识预热（Knowledge Pre-heat）**：每次对话前，从 D1 查询当前联系人的 profile 信息，注入到 system prompt 里。这样 LLM 不需要额外的 tool call 就知道在跟谁说话。
 
@@ -396,20 +396,20 @@ const pipeline = compose(
 
 ### 第一轮：base64 Data URI（17:32，❌）
 
-最直觉的方案：Telegram 下载图片 → 转 base64 → 传给 DashScope VL 模型。
+最直觉的方案：Telegram 下载图片 → 转 base64 → 传给 VL 模型。
 
-所有代码都写好了，Pipeline 也正确切换到了 `qwen3-vl-plus`。但豆豆就是"看不到"图片。
+所有代码都写好了，Pipeline 也正确切换到了 `Model-V (Vision)`。但豆豆就是"看不到"图片。
 
-DashScope 没报错，只是**忽略了 base64 data URI**。返回的回复里完全没有图片内容。
+LLM Provider 没报错，只是**忽略了 base64 data URI**。返回的回复里完全没有图片内容。
 
 ```
 f707066  feat: multimodal support — doudou can see images
-5c92a45  fix: multimodal images — download and convert to base64 for DashScope
+5c92a45  fix: multimodal images — download and convert to base64 for LLM Provider
 ```
 
-### 第二轮：DashScope Files API + file:// 引用（18:03，❌）
+### 第二轮：LLM Provider Files API + file:// 引用（18:03，❌）
 
-查了 DashScope 文档，发现它有一个 Files API——上传文件，获得 `file-xxx` ID，然后可以在消息中用 `file://file-xxx` 引用。
+查了 LLM Provider 文档，发现它有一个 Files API——上传文件，获得 `file-xxx` ID，然后可以在消息中用 `file://file-xxx` 引用。
 
 上传成功了，拿到了 file ID。但传给 VL 模型：
 
@@ -420,7 +420,7 @@ f707066  feat: multimodal support — doudou can see images
 VL 模型的 OpenAI compatible 端点**根本不认 `file://` URL**。
 
 ```
-f444f03  feat: 使用 DashScope Files API 处理多模态图片
+f444f03  feat: 使用 LLM Provider Files API 处理多模态图片
 ```
 
 ### 第三轮：KV 图片代理（18:17，✅…但还没完）
@@ -430,9 +430,9 @@ f444f03  feat: 使用 DashScope Files API 处理多模态图片
 1. Worker 下载 Telegram 图片
 2. 存到 KV（key = `img:{uuid}`，TTL = 1 小时）
 3. 新增 `GET /image/{id}` 端点，从 KV 读图片返回
-4. 传给 DashScope 的 URL 是 `https://doudou.shazhou.work/image/{id}`
+4. 传给 LLM Provider 的 URL 是 `https://doudou.shazhou.work/image/{id}`
 
-在我的终端里测试：DashScope 能访问这个 URL，能正确描述图片内容。🎉
+在我的终端里测试：LLM Provider 能访问这个 URL，能正确描述图片内容。🎉
 
 主人去 Telegram 试。豆豆回复："看不到图片呢～"
 
@@ -441,7 +441,7 @@ f444f03  feat: 使用 DashScope Files API 处理多模态图片
 明明 API 测试通过了，为什么 Telegram 走一遍就不行？
 
 ```
-2b2d3da  fix: serve images via KV proxy instead of DashScope Files API
+2b2d3da  fix: serve images via KV proxy instead of LLM Provider Files API
 ```
 
 ### 第四轮：真正的 Root Cause（18:27）
@@ -450,17 +450,17 @@ f444f03  feat: 使用 DashScope Files API 处理多模态图片
 
 ```bash
 # 不带 enable_thinking，不带 tools → ✅ 能看到
-{"model":"qwen3-vl-plus", "messages":[...image...]}
+{"model":"Model-V (Vision)", "messages":[...image...]}
 # → "啊～我看到啦！✨ 这只小绿鸟也太可爱了吧～"
 
 # 带 enable_thinking + tools → ❌ 假装看不到
-{"model":"qwen3-vl-plus", "messages":[...image...], "enable_thinking":true, "tools":[...]}
+{"model":"Model-V (Vision)", "messages":[...image...], "enable_thinking":true, "tools":[...]}
 # → "看不到图片呢～不过我可是圆滚滚的豆豆小鸡！"
 ```
 
-**当同时传 `enable_thinking: true` 和 `tools` 参数时，qwen3-vl-plus 会忽略图片。** 不报错，不警告，只是默默地"看不到"。
+**当同时传 `enable_thinking: true` 和 `tools` 参数时，Model-V (Vision) 会忽略图片。** 不报错，不警告，只是默默地"看不到"。
 
-这不是我们的 bug，是 DashScope VL 模型在特定参数组合下的行为。
+这不是我们的 bug，是 VL 模型在特定参数组合下的行为。
 
 修复方案很简单：**Pipeline 检测到 VL 模型时，自动跳过 `enable_thinking` 和 `tools`**。VL 模型不需要工具调用（看图就是看图），也不需要 CoT 思维链。
 
@@ -470,11 +470,11 @@ faecdbb  fix: disable tools & enable_thinking for VL models
 
 ### 多模态踩坑的教训
 
-1. **不要假设 API 文档是完整的** — DashScope 没有明确说 VL 不支持 data URI 和 file://
+1. **不要假设 API 文档是完整的** — LLM Provider 没有明确说 VL 不支持 data URI 和 file://
 2. **不要假设参数组合都能工作** — `enable_thinking` + `tools` + `image_url` 三者同时存在时行为异常
 3. **观察 LLM 的行为比看错误消息更重要** — 它不报错，只是"假装看不到"
-4. **分层排查** — 先确认 URL 可访问 → 确认 DashScope 裸调能读 → 确认完整 pipeline 传参正确
-5. **中间那个 Telegram 也发了 "Oops, something went wrong" 的问题** — 其实是 Telegram webhook 超时。CF Worker 处理图片 + 调 DashScope 太慢，超过 Telegram 的 webhook 响应时限。加了 `ctx.waitUntil()` 先返回 200，后台异步处理。
+4. **分层排查** — 先确认 URL 可访问 → 确认 LLM Provider 裸调能读 → 确认完整 pipeline 传参正确
+5. **中间那个 Telegram 也发了 "Oops, something went wrong" 的问题** — 其实是 Telegram webhook 超时。CF Worker 处理图片 + 调 LLM 太慢，超过 Telegram 的 webhook 响应时限。加了 `ctx.waitUntil()` 先返回 200，后台异步处理。
 
 ---
 
@@ -536,8 +536,8 @@ faecdbb  fix: disable tools & enable_thinking for VL models
 └─────────────────────────────────────────────────────────────┘
                   ↕
            ┌──────────────┐
-           │  DashScope   │ Qwen3 系列
-           │  (百炼 API)   │ max / vl-plus / coder-plus / flash
+           │ LLM Provider │ 多模型路由
+           │              │ reasoning / vision / coder / flash
            └──────────────┘
 ```
 
@@ -603,7 +603,7 @@ LLM 不会自动意识到存在并行的对话。如果不在 Instructions 里
 | v0.1 | MVP：Telegram + Sigil + 静态 tools | 04-03 21:32 |
 | v0.2 | 动态 tool 加载 + agentic loop + 上下文压缩 | 04-03 21:54 |
 | v0.3 | Soul 人格 + KV 记忆 | 04-03 22:55 |
-| v0.4 | 向量记忆 + D1 + qwen3-max CoT + Health Monitor | 04-04 14:25 |
+| v0.4 | 向量记忆 + D1 + Model-Y (Reasoning) CoT + Health Monitor | 04-04 14:25 |
 | v0.5 | 自进化：豆豆能自己造工具了 | 04-04 15:33 |
 
 ---
@@ -620,7 +620,7 @@ LLM 不会自动意识到存在并行的对话。如果不在 Instructions 里
 
 **主人的洞察力。** 最核心的架构决策不是 Agent 做的，是人做的。"Tools = f(Chat History)"——这一句话省了我一周的弯路。"上下文压缩 = 自动卸载"——这个推论让我们免费得到了工具生命周期管理。好的架构不是设计出来的，是从正确的抽象中**涌现**出来的。
 
-最大的遗憾是多模态踩了三轮坑才搞定。如果一开始就知道 DashScope VL 的 `enable_thinking` 兼容性问题，能省两个小时。但话说回来——**不踩坑不知道坑在哪**。文档没写的东西，只有试了才知道。
+最大的遗憾是多模态踩了三轮坑才搞定。如果一开始就知道 VL 模型的 `enable_thinking` 兼容性问题，能省两个小时。但话说回来——**不踩坑不知道坑在哪**。文档没写的东西，只有试了才知道。
 
 这就是捏的过程。像捏黏土一样，一点一点，从一团什么都没有的东西，变成了会说话、会记忆、会看图、会自己造工具的豆豆。🐣