From 09019a0ad73cd764c6e3014fc24f68754880c0c7 Mon Sep 17 00:00:00 2001 From: megaproxy Date: Tue, 26 May 2026 12:29:17 +0100 Subject: [PATCH] Session log: MCP v2 PR-1 + PR-1b (policy engine + dispatcher) Document the fan-out approach (3 Sonnet agents + 1 Haiku), the event/reply RPC pattern, the 10 hard-deny rules and their caveats, the audit + confirm + Always-Allow UX, and the four integration bugs worth remembering (Tauri 2 Emitter trait import, McpError 'static strings, React 18 StrictMode listen() race, lifting the audit subscription out of AuditTab). --- memory.md | 89 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 89 insertions(+) diff --git a/memory.md b/memory.md index 3ca0913..454f242 100644 --- a/memory.md +++ b/memory.md @@ -51,6 +51,95 @@ Durable memory for this project. Read at session start, update before session en ## Session log +### 2026-05-26 — MCP v2 PR-1 + PR-1b: policy engine, audit log, dispatcher, `set_label` end-to-end + +First two of four planned PRs for the MCP write surface. Shipped via fan-out (3 Sonnet agents in parallel + 1 Haiku for fuzz tests, then sequential integration by me). Two clean commits: `464c576` (PR-1 foundation) and `26ffe88` (PR-1b dispatcher + bug fixes). + +**Architecture: Pattern A (event/reply RPC across the IPC boundary).** Frontend keeps tree authority (it's `useState` in App.tsx); backend MCP tool handlers can't synchronously call into JS. Tauri 2's `invoke` is JS→Rust only, so a backend-initiated mutation has to round-trip through events: + +``` +[MCP tool handler] [App.tsx] + build {requestId, tool, args, ...} ⟶ emit "mcp://request" + register oneshot in PendingActions frontend dispatcher: + await rx with 30s timeout 1. policy check decided needsConfirm + 2. if needsConfirm → modal queue + 3. runMcpHandler mutates tree + 4. invoke("mcp_action_reply", {id, result}) + ⟵ oneshot resolves + emit "mcp://audit" with outcome + return to MCP client +``` + +`TileService` now holds an `AppHandle` and an `Arc` (oneshot registry keyed by uuid-shaped id). The dispatch helper centralises policy → emit → await → audit emission for every write tool. + +**Policy engine (`src-tauri/src/mcp_policy.rs`, 1152 lines).** Three-tier `allow / ask / deny`, deny-first precedence mirroring Claude Code's `.claude/settings.json` shape — users already know this DSL. Glob matcher (`*` only, not regex) with shell-operator-aware subcommand splitting on `&&`, `||`, `;`, `|`, `|&`, `&`, newline — a deny rule fires if ANY subcommand matches (defeats `safe-cmd && rm -rf /`). + +**Hard-deny list — compiled-in, non-overridable, visible-only-in-UI.** Ten regex patterns the user CANNOT disable, applied to `write_pane` shell content: +1. `rm -rf /` (and option-order variants like `-Rf`) +2. `rm -rf ~` / `rm -rf $HOME` +3. `rm -rf /*` +4. `:(){ :|:& };:` (fork bomb) +5. `mkfs. /dev/...` +6. `dd ... of=/dev/(sd|nvme|hd|disk)...` +7. `> /etc/(passwd|shadow|sudoers)` +8. `curl|wget ... | (sudo )?(ba?sh|zsh)` (pipe to shell from network) +9. `chmod -R 777 /` +10. `find / ... -delete` + +Caveats deliberately disclosed in the UI: best-effort accident prevention only (`\rm`, `${SHELL} -c`, aliases all bypass); POSIX-only in v2 (PowerShell equivalents deferred to v2.1); evaluated on the bytes sent in one `write_pane`, not after the remote shell composes them. *Not a sandbox.* + +73 fuzz tests for the matcher (positive variations + lookalike negatives like `rm -rf /tmp/foo`, `dd of=backup.img`, `chmod 777 /tmp/file`). The shape-of-rule test grid is in `mod hard_deny_fuzz` at the bottom of mcp_policy.rs. + +**Audit log surface.** Backend emits `mcp://audit` after every tool call resolves with `{tsMs, tool, argsSummary (truncated 80), result: ok|denied|failed, durationMs}`. Ring buffer of 200 entries. Args summary explicitly capped — `write_pane` text would otherwise turn the panel into a secret-leak surface if Claude pastes a token. + +**`McpPanel` refactored into three tabs: Config / Audit / Policy.** Config kept the existing snippet/regen UI. Audit is a presentational table with chip-coloured rows. Policy is three vertically-stacked allow/ask/deny buckets with add/remove + a Save button, plus a read-only "Always blocked (built-in)" section showing the 10 hard-deny labels with "Cannot be disabled" badges. + +**Confirm modal (`McpConfirm.tsx`).** Amber-bordered modal. Shows tool, policy reason ("default", a matched ask rule, etc.), a human-readable summary built per-tool (`Rename pane "X" → "Y"`), and an expandable raw-args block. Enter = accept, Esc = reject. Third button: **"Always allow {tool}"** — appends bare tool name to the policy allow bucket inline, then resolves the current call. Toast confirms. + +**Default policy is empty → every call asks.** Restrictive by design; the user enables parts. Saved to `%APPDATA%\com.megaproxy.tiletopia\mcp-policy.json` via the same atomic tmp+rename pattern as `mcp.json`/`hosts.json`/`workspace.json`. + +**Classifier hook scaffold (no-op).** `PolicyClassifier` trait + `ClassifierHint` enum + `NoopClassifier` in mcp_policy.rs. Not wired into `evaluate()` yet — placeholder for v2.1 where a small LLM (Haiku via Anthropic API, or local Ollama) classifies ambiguous Ask calls to maybe-upgrade them to Allow. Architecture supports it without further refactor. + +**Demo tool wired end-to-end: `set_label`.** Pure metadata change; reuses the existing `ops.setLabel` → `changeLabel(tree, leafId, label)` path. No PTY, no SSH, no async spawn complexity. Perfect proof-of-concept for the dispatcher — every other v2 tool follows the same shape: arg struct, validate, dispatch_action with stable args_repr, frontend handler in `runMcpHandler` switch. + +**Bugs hit during integration:** + +1. **Tauri 2 trait-not-in-scope.** `AppHandle::emit` moved onto `tauri::Emitter` trait in Tauri 2. The error message helpfully says "trait `Emitter` which provides `emit` is implemented but not in scope" — just `use tauri::Emitter;` next to `Manager`. Worth remembering for any future event-emission code. + +2. **`McpError` constructors want `'static` strings.** Signature is `impl Into>`. Passing `&format!(...)` or `&e.to_string()` fails (`temporary value dropped while borrowed`). Pass the owned `String` directly — auto-converts to `Cow::Owned`. Bit me at three sites in dispatch_action. + +3. **React 18 StrictMode race with `listen()`.** Classic pattern bug: `useEffect(() => { let un; void listen(...).then(fn => { un = fn }); return () => un?.() }, []);` is broken in StrictMode because the cleanup runs before the Promise resolves on the pretend-unmount, leaving the first subscription dangling. Real-world symptom was duplicate audit entries and modal-needs-two-clicks (each event handled by both subscriptions). Fix is the cancelled-flag pattern: + ```ts + let cancelled = false; + let unlisten; + void listen(...).then(fn => { if (cancelled) fn(); else unlisten = fn; }); + return () => { cancelled = true; unlisten?.(); }; + ``` + Worth using *anywhere* we subscribe-via-Promise inside `useEffect`, not just for MCP events. Vite HMR also surfaces this if you're not careful — a clean restart confirmed the fix held. + +4. **Stale state when audit subscription lived in AuditTab.** AuditTab unmounts when the user switches tabs or closes the panel; events fired during that window were dropped. Lifted subscription to App.tsx, made AuditTab presentational (props in, table out). Same pattern any "always-on log" should follow. + +5. **rmcp's DNS-rebinding allowlist re-bit us once.** The earlier session disabled it for WSL connectivity; PR-1 didn't regress this but it's a pattern to keep flagged — `StreamableHttpServerConfig::default().disable_allowed_hosts()` stays mandatory for our use case. + +**Frontend ↔ backend contract worth saving:** + +- `mcp://request` event payload (camelCase): `{requestId, tool, args, needsConfirm, reason}` +- `mcp://audit` event payload: `{tsMs, tool, argsSummary, result: {kind:"ok"|"denied"|"failed", ...}, durationMs}` +- `mcp_action_reply` Tauri command takes `{requestId, result}` where result is externally-tagged `{Ok: value}` or `{Err: msg}` — that's serde's default tagging for `Result`, NOT a custom shape. +- Tauri 2 command argument-name binding: JS sends `{policy}`, Rust receives `policy: McpPolicy` — direct lowercase match. McpPolicy has no `#[serde(rename_all = ...)]`, so field keys (`version`, `permissions`, `deny`, `ask`, `allow`) match identity. Verified with debug-log instrumentation during the save-not-persisting investigation (it was actually working; user's first test predated the cargo rebuild). + +Open follow-ups specific to this session: + +- **PR-2 (next):** `close_pane`, `swap_panes`, `promote_pane`, `apply_preset`. Same dispatcher shape; the `apply_preset` data-loss case wants an `allow_drops: true` arg rather than a separate modal (per the earlier scope notes). +- **PR-3 (the hard one):** `spawn_pane`, `write_pane`, `connect_host`. Needs (a) spawn-completion oneshot resolution chain (await `registerPaneId`), (b) per-host SSH confirm even on spawn (Claude opening a shell on prod is equally consequential to writing to it), (c) rate limiter on `write_pane` (per OWASP LLM06 + MCP spec MUST). +- **PR-4:** `add_host` + `extraArgs` sanitiser (ProxyCommand exfil risk for OpenSSH). +- **v2.1 classifier:** wire `PolicyClassifier` into `evaluate()` so Ask calls can be auto-upgraded to Allow by a small LLM. Haiku is the cheap/fast pick; needs an API key surface in settings. +- **PowerShell hard-deny patterns** (`Remove-Item -Recurse -Force C:\`, `Format-Volume`, etc.). Deferred until users actually use PowerShell panes with MCP enabled. +- **`.mcpb` bundle** — still on the list; PR-1b's stdio-shim recipe is what it would package up. +- **Confirm modal queueing UX** — currently shows one at a time, FIFO. If Claude burst-sends many tool calls, the user gets serial modals. Probably fine for v2; if it gets annoying, add a "reject all pending" button. +- **Audit log persistence** — currently ephemeral ring of 200. A `mcp-audit.jsonl` append-only file in app data dir would let users see "what did Claude do overnight". Trade-off: secrets-in-summaries risk if `write_pane` text leaks past the 80-char truncation. Deferred. +- **xterm.js RenderService errors** (`Cannot read properties of undefined (reading 'dimensions')`) showed up in dev tools during this session — completely unrelated to MCP work, likely a pane being resized or detached mid-render. File when convenient. + ### 2026-05-26 — MCP persistence + Claude Code OAuth bug + `mcp-remote` shim Set out to fix two paper cuts (port + token re-rolled every server restart, so firewall rules and `.mcp.json` had to be re-pasted). Ended up unwinding a multi-layer breakage in Claude Code's HTTP-MCP client.