Session log: MCP v2 PR-1 + PR-1b (policy engine + dispatcher)

Document the fan-out approach (3 Sonnet agents + 1 Haiku), the
event/reply RPC pattern, the 10 hard-deny rules and their caveats,
the audit + confirm + Always-Allow UX, and the four integration
bugs worth remembering (Tauri 2 Emitter trait import, McpError
'static strings, React 18 StrictMode listen() race, lifting the
audit subscription out of AuditTab).
This commit is contained in:
megaproxy 2026-05-26 12:29:17 +01:00
parent 26ffe8859a
commit 09019a0ad7

View file

@ -51,6 +51,95 @@ Durable memory for this project. Read at session start, update before session en
## Session log
### 2026-05-26 — MCP v2 PR-1 + PR-1b: policy engine, audit log, dispatcher, `set_label` end-to-end
First two of four planned PRs for the MCP write surface. Shipped via fan-out (3 Sonnet agents in parallel + 1 Haiku for fuzz tests, then sequential integration by me). Two clean commits: `464c576` (PR-1 foundation) and `26ffe88` (PR-1b dispatcher + bug fixes).
**Architecture: Pattern A (event/reply RPC across the IPC boundary).** Frontend keeps tree authority (it's `useState` in App.tsx); backend MCP tool handlers can't synchronously call into JS. Tauri 2's `invoke` is JS→Rust only, so a backend-initiated mutation has to round-trip through events:
```
[MCP tool handler] [App.tsx]
build {requestId, tool, args, ...} ⟶ emit "mcp://request"
register oneshot in PendingActions frontend dispatcher:
await rx with 30s timeout 1. policy check decided needsConfirm
2. if needsConfirm → modal queue
3. runMcpHandler mutates tree
4. invoke("mcp_action_reply", {id, result})
⟵ oneshot resolves
emit "mcp://audit" with outcome
return to MCP client
```
`TileService` now holds an `AppHandle` and an `Arc<PendingActions>` (oneshot registry keyed by uuid-shaped id). The dispatch helper centralises policy → emit → await → audit emission for every write tool.
**Policy engine (`src-tauri/src/mcp_policy.rs`, 1152 lines).** Three-tier `allow / ask / deny`, deny-first precedence mirroring Claude Code's `.claude/settings.json` shape — users already know this DSL. Glob matcher (`*` only, not regex) with shell-operator-aware subcommand splitting on `&&`, `||`, `;`, `|`, `|&`, `&`, newline — a deny rule fires if ANY subcommand matches (defeats `safe-cmd && rm -rf /`).
**Hard-deny list — compiled-in, non-overridable, visible-only-in-UI.** Ten regex patterns the user CANNOT disable, applied to `write_pane` shell content:
1. `rm -rf /` (and option-order variants like `-Rf`)
2. `rm -rf ~` / `rm -rf $HOME`
3. `rm -rf /*`
4. `:(){ :|:& };:` (fork bomb)
5. `mkfs.<fs> /dev/...`
6. `dd ... of=/dev/(sd|nvme|hd|disk)...`
7. `> /etc/(passwd|shadow|sudoers)`
8. `curl|wget ... | (sudo )?(ba?sh|zsh)` (pipe to shell from network)
9. `chmod -R 777 /`
10. `find / ... -delete`
Caveats deliberately disclosed in the UI: best-effort accident prevention only (`\rm`, `${SHELL} -c`, aliases all bypass); POSIX-only in v2 (PowerShell equivalents deferred to v2.1); evaluated on the bytes sent in one `write_pane`, not after the remote shell composes them. *Not a sandbox.*
73 fuzz tests for the matcher (positive variations + lookalike negatives like `rm -rf /tmp/foo`, `dd of=backup.img`, `chmod 777 /tmp/file`). The shape-of-rule test grid is in `mod hard_deny_fuzz` at the bottom of mcp_policy.rs.
**Audit log surface.** Backend emits `mcp://audit` after every tool call resolves with `{tsMs, tool, argsSummary (truncated 80), result: ok|denied|failed, durationMs}`. Ring buffer of 200 entries. Args summary explicitly capped — `write_pane` text would otherwise turn the panel into a secret-leak surface if Claude pastes a token.
**`McpPanel` refactored into three tabs: Config / Audit / Policy.** Config kept the existing snippet/regen UI. Audit is a presentational table with chip-coloured rows. Policy is three vertically-stacked allow/ask/deny buckets with add/remove + a Save button, plus a read-only "Always blocked (built-in)" section showing the 10 hard-deny labels with "Cannot be disabled" badges.
**Confirm modal (`McpConfirm.tsx`).** Amber-bordered modal. Shows tool, policy reason ("default", a matched ask rule, etc.), a human-readable summary built per-tool (`Rename pane "X" → "Y"`), and an expandable raw-args block. Enter = accept, Esc = reject. Third button: **"Always allow {tool}"** — appends bare tool name to the policy allow bucket inline, then resolves the current call. Toast confirms.
**Default policy is empty → every call asks.** Restrictive by design; the user enables parts. Saved to `%APPDATA%\com.megaproxy.tiletopia\mcp-policy.json` via the same atomic tmp+rename pattern as `mcp.json`/`hosts.json`/`workspace.json`.
**Classifier hook scaffold (no-op).** `PolicyClassifier` trait + `ClassifierHint` enum + `NoopClassifier` in mcp_policy.rs. Not wired into `evaluate()` yet — placeholder for v2.1 where a small LLM (Haiku via Anthropic API, or local Ollama) classifies ambiguous Ask calls to maybe-upgrade them to Allow. Architecture supports it without further refactor.
**Demo tool wired end-to-end: `set_label`.** Pure metadata change; reuses the existing `ops.setLabel``changeLabel(tree, leafId, label)` path. No PTY, no SSH, no async spawn complexity. Perfect proof-of-concept for the dispatcher — every other v2 tool follows the same shape: arg struct, validate, dispatch_action with stable args_repr, frontend handler in `runMcpHandler` switch.
**Bugs hit during integration:**
1. **Tauri 2 trait-not-in-scope.** `AppHandle::emit` moved onto `tauri::Emitter` trait in Tauri 2. The error message helpfully says "trait `Emitter` which provides `emit` is implemented but not in scope" — just `use tauri::Emitter;` next to `Manager`. Worth remembering for any future event-emission code.
2. **`McpError` constructors want `'static` strings.** Signature is `impl Into<Cow<'static, str>>`. Passing `&format!(...)` or `&e.to_string()` fails (`temporary value dropped while borrowed`). Pass the owned `String` directly — auto-converts to `Cow::Owned`. Bit me at three sites in dispatch_action.
3. **React 18 StrictMode race with `listen()`.** Classic pattern bug: `useEffect(() => { let un; void listen(...).then(fn => { un = fn }); return () => un?.() }, []);` is broken in StrictMode because the cleanup runs before the Promise resolves on the pretend-unmount, leaving the first subscription dangling. Real-world symptom was duplicate audit entries and modal-needs-two-clicks (each event handled by both subscriptions). Fix is the cancelled-flag pattern:
```ts
let cancelled = false;
let unlisten;
void listen(...).then(fn => { if (cancelled) fn(); else unlisten = fn; });
return () => { cancelled = true; unlisten?.(); };
```
Worth using *anywhere* we subscribe-via-Promise inside `useEffect`, not just for MCP events. Vite HMR also surfaces this if you're not careful — a clean restart confirmed the fix held.
4. **Stale state when audit subscription lived in AuditTab.** AuditTab unmounts when the user switches tabs or closes the panel; events fired during that window were dropped. Lifted subscription to App.tsx, made AuditTab presentational (props in, table out). Same pattern any "always-on log" should follow.
5. **rmcp's DNS-rebinding allowlist re-bit us once.** The earlier session disabled it for WSL connectivity; PR-1 didn't regress this but it's a pattern to keep flagged — `StreamableHttpServerConfig::default().disable_allowed_hosts()` stays mandatory for our use case.
**Frontend ↔ backend contract worth saving:**
- `mcp://request` event payload (camelCase): `{requestId, tool, args, needsConfirm, reason}`
- `mcp://audit` event payload: `{tsMs, tool, argsSummary, result: {kind:"ok"|"denied"|"failed", ...}, durationMs}`
- `mcp_action_reply` Tauri command takes `{requestId, result}` where result is externally-tagged `{Ok: value}` or `{Err: msg}` — that's serde's default tagging for `Result<T,E>`, NOT a custom shape.
- Tauri 2 command argument-name binding: JS sends `{policy}`, Rust receives `policy: McpPolicy` — direct lowercase match. McpPolicy has no `#[serde(rename_all = ...)]`, so field keys (`version`, `permissions`, `deny`, `ask`, `allow`) match identity. Verified with debug-log instrumentation during the save-not-persisting investigation (it was actually working; user's first test predated the cargo rebuild).
Open follow-ups specific to this session:
- **PR-2 (next):** `close_pane`, `swap_panes`, `promote_pane`, `apply_preset`. Same dispatcher shape; the `apply_preset` data-loss case wants an `allow_drops: true` arg rather than a separate modal (per the earlier scope notes).
- **PR-3 (the hard one):** `spawn_pane`, `write_pane`, `connect_host`. Needs (a) spawn-completion oneshot resolution chain (await `registerPaneId`), (b) per-host SSH confirm even on spawn (Claude opening a shell on prod is equally consequential to writing to it), (c) rate limiter on `write_pane` (per OWASP LLM06 + MCP spec MUST).
- **PR-4:** `add_host` + `extraArgs` sanitiser (ProxyCommand exfil risk for OpenSSH).
- **v2.1 classifier:** wire `PolicyClassifier` into `evaluate()` so Ask calls can be auto-upgraded to Allow by a small LLM. Haiku is the cheap/fast pick; needs an API key surface in settings.
- **PowerShell hard-deny patterns** (`Remove-Item -Recurse -Force C:\`, `Format-Volume`, etc.). Deferred until users actually use PowerShell panes with MCP enabled.
- **`.mcpb` bundle** — still on the list; PR-1b's stdio-shim recipe is what it would package up.
- **Confirm modal queueing UX** — currently shows one at a time, FIFO. If Claude burst-sends many tool calls, the user gets serial modals. Probably fine for v2; if it gets annoying, add a "reject all pending" button.
- **Audit log persistence** — currently ephemeral ring of 200. A `mcp-audit.jsonl` append-only file in app data dir would let users see "what did Claude do overnight". Trade-off: secrets-in-summaries risk if `write_pane` text leaks past the 80-char truncation. Deferred.
- **xterm.js RenderService errors** (`Cannot read properties of undefined (reading 'dimensions')`) showed up in dev tools during this session — completely unrelated to MCP work, likely a pane being resized or detached mid-render. File when convenient.
### 2026-05-26 — MCP persistence + Claude Code OAuth bug + `mcp-remote` shim
Set out to fix two paper cuts (port + token re-rolled every server restart, so firewall rules and `.mcp.json` had to be re-pasted). Ended up unwinding a multi-layer breakage in Claude Code's HTTP-MCP client.