Session log: MCP v2 PR-1 + PR-1b (policy engine + dispatcher)

Document the fan-out approach (3 Sonnet agents + 1 Haiku), the event/reply RPC pattern, the 10 hard-deny rules and their caveats, the audit + confirm + Always-Allow UX, and the four integration bugs worth remembering (Tauri 2 Emitter trait import, McpError 'static strings, React 18 StrictMode listen() race, lifting the audit subscription out of AuditTab).
2026-05-26 12:29:17 +01:00 · 2026-05-26 12:29:17 +01:00 · 09019a0ad7
commit 09019a0ad7
parent 26ffe8859a
1 changed files with 89 additions and 0 deletions
--- a/memory.md
+++ b/memory.md
@ -51,6 +51,95 @@ Durable memory for this project. Read at session start, update before session en

 ## Session log

+### 2026-05-26 — MCP v2 PR-1 + PR-1b: policy engine, audit log, dispatcher, `set_label` end-to-end
+
+First two of four planned PRs for the MCP write surface. Shipped via fan-out (3 Sonnet agents in parallel + 1 Haiku for fuzz tests, then sequential integration by me). Two clean commits: `464c576` (PR-1 foundation) and `26ffe88` (PR-1b dispatcher + bug fixes).
+
+**Architecture: Pattern A (event/reply RPC across the IPC boundary).** Frontend keeps tree authority (it's `useState` in App.tsx); backend MCP tool handlers can't synchronously call into JS. Tauri 2's `invoke` is JS→Rust only, so a backend-initiated mutation has to round-trip through events:
+
+```
+[MCP tool handler]                    [App.tsx]
+  build {requestId, tool, args, ...}   ⟶ emit "mcp://request"
+  register oneshot in PendingActions    frontend dispatcher:
+  await rx with 30s timeout               1. policy check decided needsConfirm
+                                          2. if needsConfirm → modal queue
+                                          3. runMcpHandler mutates tree
+                                          4. invoke("mcp_action_reply", {id, result})
+                                       ⟵ oneshot resolves
+  emit "mcp://audit" with outcome
+  return to MCP client
+```
+
+`TileService` now holds an `AppHandle` and an `Arc<PendingActions>` (oneshot registry keyed by uuid-shaped id). The dispatch helper centralises policy → emit → await → audit emission for every write tool.
+
+**Policy engine (`src-tauri/src/mcp_policy.rs`, 1152 lines).** Three-tier `allow / ask / deny`, deny-first precedence mirroring Claude Code's `.claude/settings.json` shape — users already know this DSL. Glob matcher (`*` only, not regex) with shell-operator-aware subcommand splitting on `&&`, `||`, `;`, `|`, `|&`, `&`, newline — a deny rule fires if ANY subcommand matches (defeats `safe-cmd && rm -rf /`).
+
+**Hard-deny list — compiled-in, non-overridable, visible-only-in-UI.** Ten regex patterns the user CANNOT disable, applied to `write_pane` shell content:
+1. `rm -rf /` (and option-order variants like `-Rf`)
+2. `rm -rf ~` / `rm -rf $HOME`
+3. `rm -rf /*`
+4. `:(){ :|:& };:` (fork bomb)
+5. `mkfs.<fs> /dev/...`
+6. `dd ... of=/dev/(sd|nvme|hd|disk)...`
+7. `> /etc/(passwd|shadow|sudoers)`
+8. `curl|wget ... | (sudo )?(ba?sh|zsh)` (pipe to shell from network)
+9. `chmod -R 777 /`
+10. `find / ... -delete`
+
+Caveats deliberately disclosed in the UI: best-effort accident prevention only (`\rm`, `${SHELL} -c`, aliases all bypass); POSIX-only in v2 (PowerShell equivalents deferred to v2.1); evaluated on the bytes sent in one `write_pane`, not after the remote shell composes them. *Not a sandbox.*
+
+73 fuzz tests for the matcher (positive variations + lookalike negatives like `rm -rf /tmp/foo`, `dd of=backup.img`, `chmod 777 /tmp/file`). The shape-of-rule test grid is in `mod hard_deny_fuzz` at the bottom of mcp_policy.rs.
+
+**Audit log surface.** Backend emits `mcp://audit` after every tool call resolves with `{tsMs, tool, argsSummary (truncated 80), result: ok|denied|failed, durationMs}`. Ring buffer of 200 entries. Args summary explicitly capped — `write_pane` text would otherwise turn the panel into a secret-leak surface if Claude pastes a token.
+
+**`McpPanel` refactored into three tabs: Config / Audit / Policy.** Config kept the existing snippet/regen UI. Audit is a presentational table with chip-coloured rows. Policy is three vertically-stacked allow/ask/deny buckets with add/remove + a Save button, plus a read-only "Always blocked (built-in)" section showing the 10 hard-deny labels with "Cannot be disabled" badges.
+
+**Confirm modal (`McpConfirm.tsx`).** Amber-bordered modal. Shows tool, policy reason ("default", a matched ask rule, etc.), a human-readable summary built per-tool (`Rename pane "X" → "Y"`), and an expandable raw-args block. Enter = accept, Esc = reject. Third button: **"Always allow {tool}"** — appends bare tool name to the policy allow bucket inline, then resolves the current call. Toast confirms.
+
+**Default policy is empty → every call asks.** Restrictive by design; the user enables parts. Saved to `%APPDATA%\com.megaproxy.tiletopia\mcp-policy.json` via the same atomic tmp+rename pattern as `mcp.json`/`hosts.json`/`workspace.json`.
+
+**Classifier hook scaffold (no-op).** `PolicyClassifier` trait + `ClassifierHint` enum + `NoopClassifier` in mcp_policy.rs. Not wired into `evaluate()` yet — placeholder for v2.1 where a small LLM (Haiku via Anthropic API, or local Ollama) classifies ambiguous Ask calls to maybe-upgrade them to Allow. Architecture supports it without further refactor.
+
+**Demo tool wired end-to-end: `set_label`.** Pure metadata change; reuses the existing `ops.setLabel` → `changeLabel(tree, leafId, label)` path. No PTY, no SSH, no async spawn complexity. Perfect proof-of-concept for the dispatcher — every other v2 tool follows the same shape: arg struct, validate, dispatch_action with stable args_repr, frontend handler in `runMcpHandler` switch.
+
+**Bugs hit during integration:**
+
+1. **Tauri 2 trait-not-in-scope.** `AppHandle::emit` moved onto `tauri::Emitter` trait in Tauri 2. The error message helpfully says "trait `Emitter` which provides `emit` is implemented but not in scope" — just `use tauri::Emitter;` next to `Manager`. Worth remembering for any future event-emission code.
+
+2. **`McpError` constructors want `'static` strings.** Signature is `impl Into<Cow<'static, str>>`. Passing `&format!(...)` or `&e.to_string()` fails (`temporary value dropped while borrowed`). Pass the owned `String` directly — auto-converts to `Cow::Owned`. Bit me at three sites in dispatch_action.
+
+3. **React 18 StrictMode race with `listen()`.** Classic pattern bug: `useEffect(() => { let un; void listen(...).then(fn => { un = fn }); return () => un?.() }, []);` is broken in StrictMode because the cleanup runs before the Promise resolves on the pretend-unmount, leaving the first subscription dangling. Real-world symptom was duplicate audit entries and modal-needs-two-clicks (each event handled by both subscriptions). Fix is the cancelled-flag pattern:
+   ```ts
+   let cancelled = false;
+   let unlisten;
+   void listen(...).then(fn => { if (cancelled) fn(); else unlisten = fn; });
+   return () => { cancelled = true; unlisten?.(); };
+   ```
+   Worth using *anywhere* we subscribe-via-Promise inside `useEffect`, not just for MCP events. Vite HMR also surfaces this if you're not careful — a clean restart confirmed the fix held.
+
+4. **Stale state when audit subscription lived in AuditTab.** AuditTab unmounts when the user switches tabs or closes the panel; events fired during that window were dropped. Lifted subscription to App.tsx, made AuditTab presentational (props in, table out). Same pattern any "always-on log" should follow.
+
+5. **rmcp's DNS-rebinding allowlist re-bit us once.** The earlier session disabled it for WSL connectivity; PR-1 didn't regress this but it's a pattern to keep flagged — `StreamableHttpServerConfig::default().disable_allowed_hosts()` stays mandatory for our use case.
+
+**Frontend ↔ backend contract worth saving:**
+
+- `mcp://request` event payload (camelCase): `{requestId, tool, args, needsConfirm, reason}`
+- `mcp://audit` event payload: `{tsMs, tool, argsSummary, result: {kind:"ok"|"denied"|"failed", ...}, durationMs}`
+- `mcp_action_reply` Tauri command takes `{requestId, result}` where result is externally-tagged `{Ok: value}` or `{Err: msg}` — that's serde's default tagging for `Result<T,E>`, NOT a custom shape.
+- Tauri 2 command argument-name binding: JS sends `{policy}`, Rust receives `policy: McpPolicy` — direct lowercase match. McpPolicy has no `#[serde(rename_all = ...)]`, so field keys (`version`, `permissions`, `deny`, `ask`, `allow`) match identity. Verified with debug-log instrumentation during the save-not-persisting investigation (it was actually working; user's first test predated the cargo rebuild).
+
+Open follow-ups specific to this session:
+
+- **PR-2 (next):** `close_pane`, `swap_panes`, `promote_pane`, `apply_preset`. Same dispatcher shape; the `apply_preset` data-loss case wants an `allow_drops: true` arg rather than a separate modal (per the earlier scope notes).
+- **PR-3 (the hard one):** `spawn_pane`, `write_pane`, `connect_host`. Needs (a) spawn-completion oneshot resolution chain (await `registerPaneId`), (b) per-host SSH confirm even on spawn (Claude opening a shell on prod is equally consequential to writing to it), (c) rate limiter on `write_pane` (per OWASP LLM06 + MCP spec MUST).
+- **PR-4:** `add_host` + `extraArgs` sanitiser (ProxyCommand exfil risk for OpenSSH).
+- **v2.1 classifier:** wire `PolicyClassifier` into `evaluate()` so Ask calls can be auto-upgraded to Allow by a small LLM. Haiku is the cheap/fast pick; needs an API key surface in settings.
+- **PowerShell hard-deny patterns** (`Remove-Item -Recurse -Force C:\`, `Format-Volume`, etc.). Deferred until users actually use PowerShell panes with MCP enabled.
+- **`.mcpb` bundle** — still on the list; PR-1b's stdio-shim recipe is what it would package up.
+- **Confirm modal queueing UX** — currently shows one at a time, FIFO. If Claude burst-sends many tool calls, the user gets serial modals. Probably fine for v2; if it gets annoying, add a "reject all pending" button.
+- **Audit log persistence** — currently ephemeral ring of 200. A `mcp-audit.jsonl` append-only file in app data dir would let users see "what did Claude do overnight". Trade-off: secrets-in-summaries risk if `write_pane` text leaks past the 80-char truncation. Deferred.
+- **xterm.js RenderService errors** (`Cannot read properties of undefined (reading 'dimensions')`) showed up in dev tools during this session — completely unrelated to MCP work, likely a pane being resized or detached mid-render. File when convenient.
+
 ### 2026-05-26 — MCP persistence + Claude Code OAuth bug + `mcp-remote` shim

 Set out to fix two paper cuts (port + token re-rolled every server restart, so firewall rules and `.mcp.json` had to be re-pasted). Ended up unwinding a multi-layer breakage in Claude Code's HTTP-MCP client.