Idle filter: pivot per-distro → per-pane via TILETOPIA_PANE_ID env marker

Per-distro suppression (shipped earlier today) broke tiletopia's primary
use case — multiple claude panes per distro means as soon as one runs
claude, ALL Ubuntu panes go silent. Tested live: user couldn't reproduce
idle on any pane because PID 46848 (their main session) tripped the gate.

New mechanism, per-pane via env-var marker:

1. pty.rs tags every WSL spawn with TILETOPIA_PANE_ID=<id> as a Windows
   env var, plus WSLENV=...TILETOPIA_PANE_ID/u (appended to any pre-
   existing WSLENV) so the var forwards into the distro. Pane id is now
   reserved BEFORE build_command so the tag is available at spawn time.
2. probe.rs rewritten — is_watch_process_running(distro, pane_id) runs
   a bash one-liner that pgreps for each watched name, then for each PID
   checks /proc/<pid>/environ for the matching TILETOPIA_PANE_ID line.
   Env inheritance does the work: shell inherits from wsl.exe, claude
   inherits from shell. Cache keyed by (distro, pane_id).
3. Fail-safe INVERTED: probe failure now returns false (don't suppress)
   instead of true (suppress). A transient error should never silence
   the idle indicator permanently. Frontend catch updated to match.
4. LeafPane tracks the pane id in paneIdRef, set by onPaneSpawned; idle
   ticks that fire before the spawn completes pass 0, which won't match
   any real marker, so the pane idles normally.
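
The matching rule in step 2 can be sketched in plain Rust. This is an
illustration only: the shipped probe does the check in bash inside the
distro, and the function name `environ_has_pane_marker` is hypothetical.
It assumes the buffer is the raw, NUL-separated content of
/proc/<pid>/environ.

```rust
/// Hypothetical helper (not part of the commit): returns true if a raw
/// `/proc/<pid>/environ` buffer contains the exact entry
/// `TILETOPIA_PANE_ID=<pane_id>`. Entries are NUL-separated.
fn environ_has_pane_marker(environ: &[u8], pane_id: u64) -> bool {
    let wanted = format!("TILETOPIA_PANE_ID={pane_id}");
    environ
        .split(|&b| b == 0)
        // Whole-entry comparison, so pane 7 does not match pane 70.
        .any(|entry| entry == wanted.as_bytes())
}
```

Comparing whole entries mirrors the `grep -xF` exact-line match in the
probe script, so a prefix such as pane 7 versus pane 70 cannot collide.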

Existing panes won't have the marker until respawned, so the probe never
matches and they always show idle. Users need to open fresh panes once
after deploying this. Documented in memory.md follow-ups.

pnpm check clean. Rust validation: cargo test --lib on Windows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
megaproxy 2026-05-26 17:58:51 +01:00
parent d3474d33b0
commit 6772b8db37
6 changed files with 230 additions and 124 deletions

@@ -320,11 +320,12 @@ pub async fn mcp_hard_deny_labels() -> Result<Vec<&'static str>, String> {
pub async fn is_watch_process_running(
cache: tauri::State<'_, Arc<ProbeCache>>,
distro: String,
pane_id: PaneId,
) -> Result<bool, String> {
// Probe shells out — keep it off the async runtime's thread.
let cache_arc: Arc<ProbeCache> = (*cache).clone();
let running = tokio::task::spawn_blocking(move || {
cache_arc.is_watch_process_running(&distro)
cache_arc.is_watch_process_running(&distro, pane_id)
})
.await
.map_err(|e| format!("probe join failed: {e}"))?;

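The command above leans on a cache keyed by `(distro, pane_id)` with a short TTL. A minimal sketch of that pattern follows, under stated assumptions: `TtlCache` and `get_or_probe` are hypothetical names, the probe is injected as a closure so the sketch runs without `wsl.exe`, and `std::sync::Mutex` stands in for the `parking_lot` mutex the real module uses.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache keyed by (distro, pane_id).
struct TtlCache {
    ttl: Duration,
    entries: Mutex<HashMap<(String, u64), (Instant, bool)>>,
}

impl TtlCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, entries: Mutex::new(HashMap::new()) }
    }

    /// Returns the cached answer if it is still fresh, otherwise runs
    /// `probe` and stores its result.
    fn get_or_probe(&self, distro: &str, pane_id: u64, probe: impl FnOnce() -> bool) -> bool {
        let key = (distro.to_string(), pane_id);
        // Fast path: hold the lock only long enough to read the entry.
        {
            let guard = self.entries.lock().unwrap();
            if let Some(&(at, running)) = guard.get(&key) {
                if at.elapsed() < self.ttl {
                    return running;
                }
            }
        }
        // Slow path: the lock is released while the (slow) probe runs.
        let running = probe();
        self.entries.lock().unwrap().insert(key, (Instant::now(), running));
        running
    }
}
```

Two panes in the same distro get separate entries, so one pane running `claude` does not change the answer for its neighbour.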
@@ -1,63 +1,66 @@
//! "Is a watched process running in distro X?" probe for the idle-detection
//! filter.
//! "Is a watched process running in THIS pane?" probe for the idle filter.
//!
//! Background: tiletopia's idle indicator fires whenever a pane goes 5s
//! without PTY output. When the user is reading a long `claude` response,
//! the pane is silent but there's nothing actionable to surface — the
//! indicator becomes noise. This module lets the frontend ask the backend
//! "is `claude` (or any other watched process) running in this distro?"
//! before flagging a pane idle, and suppresses the indicator if so.
//! Background: tiletopia's idle indicator fires when a pane goes 5s without
//! PTY output. When the user is reading a long `claude` response the pane
//! is silent but nothing actionable is happening — the indicator becomes
//! noise. This module lets the frontend ask "is `claude` running in pane N?"
//! before flagging idle, and suppresses if so.
//!
//! Granularity is per-distro, not per-pane. Identifying which Windows pane
//! corresponds to which Linux-side shell inside the distro is too complex
//! (PIDs aren't visible from Windows; ProcMon-style probes are fragile). If
//! `claude` is running anywhere in distro X, idle is suppressed for ALL
//! panes in distro X. Over-suppression for multi-pane-same-distro users is
//! the agreed trade-off; the previous bug (always notify) was worse.
//! ## Per-pane granularity (revised v2 design)
//!
//! PowerShell + SSH panes don't go through this probe — the frontend short-
//! circuits to "always idle" for them. (PowerShell has no portable `ps`
//! equivalent; SSH processes live on a remote box and would need a separate
//! transport.)
//! v1 of this module was per-distro: one `pgrep` in the distro answered for
//! all panes. That was wrong for tiletopia's primary use case — running
//! multiple claude sessions across panes in the same distro is THE point of
//! the app, and per-distro suppression silenced every pane the moment one
//! ran claude. Revised: per-pane via env-var marker.
//!
//! The probe shells out (`wsl.exe -d <distro> -- pgrep -x ...`), which costs
//! ~100-300ms per call. We cache the answer per-distro for a few seconds so
//! the frontend can poll on every idle tick without storming `wsl.exe`.
//! How it works:
//!
//! 1. `pty.rs` tags every WSL spawn with `TILETOPIA_PANE_ID=<id>` propagated
//! into the distro via `WSLENV`. The user's shell inherits it; every
//! descendant process inherits from the shell. So `claude` running in
//! pane N has `TILETOPIA_PANE_ID=N` in `/proc/<claude_pid>/environ`.
//! 2. This probe runs `pgrep -x <name>` for each watched process, then for
//! each PID it returns reads `/proc/<pid>/environ` (null-separated) and
//! checks for an exact `TILETOPIA_PANE_ID=<target>` entry.
//! 3. Cache keyed by `(distro, pane_id)`; ~3s TTL.
//!
//! PowerShell + SSH panes still skip the probe (frontend short-circuits).
//! SSH processes live on a remote box whose `/proc` this probe can't reach,
//! and Windows has no parallel concept.
use std::collections::HashMap;
use std::time::{Duration, Instant};
use parking_lot::Mutex;
/// Built-in list of process names that suppress idle when running. v1 ships
/// with just `claude`; the user can extend it via the workspace config later.
/// Built-in list of process names whose presence in a pane suppresses idle.
///
/// [[user-watch-list]] TODO: surface this as a user-editable list (workspace
/// config field or dedicated `watch.json`). For now the constant covers the
/// only real-world use case (Anthropic's `claude` CLI taking its time on a
/// long response). Adding entries to the constant is the only knob.
/// [[user-watch-list]] TODO: surface this as a user-editable list
/// (workspace config field or dedicated `watch.json`). For now the constant
/// covers the only real-world use case (Anthropic's `claude` CLI taking its
/// time on a long response). Adding entries to the constant is the only
/// knob today.
pub const DEFAULT_WATCH_PROCESSES: &[&str] = &["claude"];
/// How long a per-distro probe result is reused before we re-shell. Sized
/// against the frontend's 1s idle-tick interval — 3s means roughly one
/// probe per distro per 3 ticks even with many panes polling, while still
/// reacting to "claude just finished" within a few seconds. Trade-off: too
/// short = wsl.exe spam, too long = stale "claude is running" once the
/// process actually exits.
/// How long a probe result is reused before we re-shell. Sized against the
/// frontend's 1s idle-tick interval — 3s means ~one `wsl.exe` call per
/// (distro, pane) per 3 ticks while reacting to "claude finished" within a
/// few seconds. Too short = wsl.exe spam; too long = stale answer once
/// claude actually exits.
const CACHE_TTL: Duration = Duration::from_secs(3);
/// Cache entry: timestamp the probe ran + whether any watched process was
/// found in the distro.
/// found in this specific pane.
#[derive(Clone, Copy)]
struct CacheEntry {
at: Instant,
running: bool,
}
/// Per-distro probe cache. Keyed by distro name (the same string the user
/// sees in the shell picker; the same string we pass as `wsl.exe -d`).
/// Probe cache keyed by `(distro, pane_id)` so panes in the same distro
/// running different processes get independent answers.
pub struct ProbeCache {
cache: Mutex<HashMap<String, CacheEntry>>,
cache: Mutex<HashMap<(String, u64), CacheEntry>>,
}
impl ProbeCache {
@@ -67,17 +70,19 @@ impl ProbeCache {
}
}
/// Returns true iff one of the watched processes is running in the
/// distro. Cached for {@link CACHE_TTL}; cache misses (or stale entries)
/// trigger a fresh probe. On probe failure the result is `true` —
/// **fail-safe is to suppress** the idle indicator, matching the
/// agreed trade-off ("over-suppression beats the previous always-notify
/// behaviour").
pub fn is_watch_process_running(&self, distro: &str) -> bool {
/// Returns true iff one of the watched processes is running in pane
/// `pane_id` of `distro`. Cached for `CACHE_TTL`. On probe failure
/// returns `false` — **fail-safe is to NOT suppress**. The v1 fail-safe
/// of "suppress on error" was wrong: a transient probe failure shouldn't
/// silence the idle indicator. Better to occasionally over-notify than
/// permanently silence.
pub fn is_watch_process_running(&self, distro: &str, pane_id: u64) -> bool {
let key = (distro.to_string(), pane_id);
// Fast path: fresh cached answer.
{
let guard = self.cache.lock();
if let Some(entry) = guard.get(distro) {
if let Some(entry) = guard.get(&key) {
if entry.at.elapsed() < CACHE_TTL {
return entry.running;
}
@@ -85,12 +90,12 @@ impl ProbeCache {
}
// Slow path: re-probe. Drop the lock before shelling out so other
// distros' probes aren't blocked.
let running = probe_distro(distro, DEFAULT_WATCH_PROCESSES);
// probes aren't blocked.
let running = probe_pane(distro, pane_id, DEFAULT_WATCH_PROCESSES);
let mut guard = self.cache.lock();
guard.insert(
distro.to_string(),
key,
CacheEntry {
at: Instant::now(),
running,
@@ -106,67 +111,88 @@ impl Default for ProbeCache {
}
}
/// Run `wsl.exe -d <distro> -- pgrep -x <name>` for each watched name.
/// Returns true on the first hit. On any failure (wsl.exe missing, distro
/// not running, pgrep not installed, timeout) returns true — fail-safe is
/// suppression.
fn probe_distro(distro: &str, watched: &[&str]) -> bool {
/// Bash one-liner: for each watched process name, `pgrep -x` for it; for
/// each matching PID, check `/proc/<pid>/environ` for an exact
/// `TILETOPIA_PANE_ID=<target>` entry (null-separated, so we `tr` it to
/// newlines and exact-line-match with `grep -xF`). Exit 0 = match, 1 = no
/// match, anything else = probe failure (treated as `false` upstream —
/// see fail-safe note on `is_watch_process_running`).
///
/// `bash` (not `sh`) is required for process substitution `< <(pgrep ...)`.
/// Both bash and pgrep are installed by default on every WSL distro
/// tiletopia targets; if a minimal distro is missing them the probe falls
/// to "not running" and the pane goes idle normally (better than the v1
/// fail-safe which kept suppressing forever).
const PROBE_SCRIPT: &str = r#"
target_id="$1"
shift
for name in "$@"; do
while IFS= read -r pid; do
[ -z "$pid" ] && continue
if [ -r "/proc/$pid/environ" ]; then
if tr '\0' '\n' < "/proc/$pid/environ" 2>/dev/null | grep -qxF "TILETOPIA_PANE_ID=$target_id"; then
exit 0
fi
fi
done < <(pgrep -x "$name" 2>/dev/null)
done
exit 1
"#;
fn probe_pane(distro: &str, pane_id: u64, watched: &[&str]) -> bool {
if !cfg!(windows) {
// Non-Windows builds don't actually ship the app; pretend no watched
// process so the idle indicator works for developer test runs.
// Non-Windows builds don't ship the app; pretend no watched process
// so developer test runs see the idle indicator working.
return false;
}
if distro.is_empty() {
// We can't probe an empty distro name; treat as "no info" → fail-safe.
tracing::debug!("probe: empty distro name; defaulting to suppression");
return true;
tracing::debug!("probe: empty distro name; treating as not-running");
return false;
}
for name in watched {
match probe_one(distro, name) {
Ok(true) => return true,
Ok(false) => continue,
Err(e) => {
tracing::debug!(
"probe: wsl pgrep for {name:?} in {distro:?} failed: {e} — suppressing idle"
);
return true;
}
}
}
false
}
// Compose args: bash -c <script> _ <pane_id> <watch>...
// The `_` becomes `$0`, `<pane_id>` is `$1`, and after the script's
// `shift` the watch names are `$@`.
let mut args: Vec<String> = vec![
"-d".to_string(),
distro.to_string(),
"--".to_string(),
"bash".to_string(),
"-c".to_string(),
PROBE_SCRIPT.to_string(),
"_".to_string(),
pane_id.to_string(),
];
args.extend(watched.iter().map(|s| s.to_string()));
/// Single `pgrep -x <name>` invocation. Ok(true) on a match, Ok(false) on
/// exit code 1 (no match), Err on anything else. Wrapped in our standard
/// `quiet_command` so the console window doesn't flash on the Windows
/// desktop every probe.
fn probe_one(distro: &str, name: &str) -> std::io::Result<bool> {
// `pgrep -x` matches the exact comm (no substring), which avoids
// `claude-something-else` false-positives. Stdout/stderr are silenced
// — exit code carries the answer.
//
// Note: `name` is a compile-time string literal in DEFAULT_WATCH_PROCESSES
// (no user input), so shell-quoting concerns don't apply. If we ever
// wire user-supplied process names through here we MUST validate / shell-
// quote them before this point.
let out = crate::pty::quiet_command_pub("wsl.exe")
.args(["-d", distro, "--", "pgrep", "-x", name])
let out = match crate::pty::quiet_command_pub("wsl.exe")
.args(&args)
.stdout(std::process::Stdio::null())
.stderr(std::process::Stdio::null())
.output()?;
.output()
{
Ok(o) => o,
Err(e) => {
tracing::debug!(
"probe: wsl.exe spawn for distro={distro:?} pane={pane_id} failed: {e}"
);
return false;
}
};
match out.status.code() {
Some(0) => Ok(true), // pgrep found at least one match
Some(1) => Ok(false), // pgrep ran but found nothing
Some(0) => true, // watched process matching this pane found
Some(1) => false, // no match
Some(other) => {
// 2 = syntax error in pgrep itself; 3 = fatal error; 127 = command
// not found. None of these mean "definitively no claude running",
// so treat as a probe failure (caller fails-safe to true).
Err(std::io::Error::other(format!(
"pgrep exit code {other}"
)))
tracing::debug!(
"probe: distro={distro:?} pane={pane_id} bash exit={other} — treating as not-running"
);
false
}
None => {
tracing::debug!(
"probe: distro={distro:?} pane={pane_id} killed by signal — treating as not-running"
);
false
}
None => Err(std::io::Error::other("pgrep killed by signal")),
}
}

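The inverted fail-safe in the hunk above reduces to a small mapping from the `bash` exit status to a suppress/don't-suppress decision. The following is a sketch under assumptions, not the shipped code: `ProbeOutcome`, `classify_exit`, and `suppress_idle` are hypothetical names introduced here to separate "what happened" from "what we do about it".

```rust
/// Hypothetical classification of one probe run, before the fail-safe
/// policy is applied.
#[derive(Debug, PartialEq)]
enum ProbeOutcome {
    Match,
    NoMatch,
    Failed(String),
}

/// Maps the exit status of the probe script to an outcome.
/// `None` means the process was terminated by a signal.
fn classify_exit(code: Option<i32>) -> ProbeOutcome {
    match code {
        Some(0) => ProbeOutcome::Match,
        Some(1) => ProbeOutcome::NoMatch,
        Some(other) => ProbeOutcome::Failed(format!("bash exit={other}")),
        None => ProbeOutcome::Failed("killed by signal".to_string()),
    }
}

/// Fail-safe policy from this commit: only a definite match suppresses
/// the idle indicator; failures behave like "not running".
fn suppress_idle(outcome: &ProbeOutcome) -> bool {
    matches!(outcome, ProbeOutcome::Match)
}
```

Keeping the failure reason in the outcome means it can still be logged, while the policy function guarantees that no error path can silence the indicator.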
@@ -154,7 +154,32 @@ impl PtyManager {
_ => None,
};
let (cmd, spawn_err) = build_command(&spec)?;
// Reserve the pane id BEFORE spawning so we can tag the shell's
// env with it — see TILETOPIA_PANE_ID below. We still insert into
// the panes map further down, after the reader thread is wired.
let id = self.next_id.fetch_add(1, Ordering::Relaxed);
let (mut cmd, spawn_err) = build_command(&spec)?;
// WSL panes get a TILETOPIA_PANE_ID env marker so the idle-filter
// probe (probe.rs) can tell which descendant processes belong to
// which pane — inheritance does the work: the shell inherits from
// wsl.exe via WSLENV, and every child (e.g. claude) inherits from
// the shell, so checking `/proc/<pid>/environ` for the marker
// answers "is this process running in pane N?" exactly.
if matches!(spec, SpawnSpec::Wsl { .. }) {
cmd.env("TILETOPIA_PANE_ID", id.to_string());
// WSLENV controls which Windows-side env vars are forwarded into
// the distro. Append our marker rather than clobbering — users
// may have their own WSLENV set up. `/u` = forward the variable
// only when WSL is invoked from Win32 (Windows-to-WSL direction).
let existing = std::env::var("WSLENV").unwrap_or_default();
let combined = if existing.is_empty() {
"TILETOPIA_PANE_ID/u".to_string()
} else {
format!("{existing}:TILETOPIA_PANE_ID/u")
};
cmd.env("WSLENV", combined);
}
let child = pair.slave.spawn_command(cmd).context(spawn_err)?;
// We need to keep the master alive (drop = close the PTY), but we
@@ -170,8 +195,6 @@ impl PtyManager {
let writer: SharedWriter = Arc::new(Mutex::new(writer_raw));
let ring: Arc<Mutex<PaneRing>> = Arc::new(Mutex::new(PaneRing::new()));
let id = self.next_id.fetch_add(1, Ordering::Relaxed);
self.panes.lock().insert(
id,
PaneHandle {
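
The WSLENV handling in the pty.rs hunk can be sketched as a pure function, which makes the append-don't-clobber rule easy to test. `combine_wslenv` and `MARKER` are hypothetical names, and the "already present" branch is an assumption added for illustration; the commit itself always appends.

```rust
/// WSLENV entry for the pane marker; `/u` limits forwarding to the
/// Win32-to-WSL direction.
const MARKER: &str = "TILETOPIA_PANE_ID/u";

/// Hypothetical helper: returns the WSLENV value to set on the child,
/// given the value inherited from the parent environment.
fn combine_wslenv(existing: &str) -> String {
    if existing.is_empty() {
        MARKER.to_string()
    } else if existing.split(':').any(|entry| entry == MARKER) {
        // Assumption (not in the commit): skip the append if the marker
        // is already listed, so nested spawns don't grow the value.
        existing.to_string()
    } else {
        // WSLENV entries are colon-separated.
        format!("{existing}:{MARKER}")
    }
}
```

A caller would pass `std::env::var("WSLENV").unwrap_or_default()` and hand the result to `cmd.env("WSLENV", ...)`, as the hunk does inline.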