Phase 3 — Decision pipeline + JobRunner + RestProvider + save round-trip

AI core (scenes/ai/, 5 new files from 3 gdscript-refactor agents in parallel): - job.gd (59 lines, Agent A): Job class, RefCounted, label + toils + cursor + to_dict/from_dict round-trip - toil.gd (76 lines, Agent A): Toil class, RefCounted; kinds WALK/WAIT/IDLE; factories walk_to/wait_ticks/idle; Vector2i stored as to_x/to_y ints because Godot 4 JSON.stringify doesn't round-trip Vector2i - work_provider.gd (27 lines, Agent A): abstract base, class_name, @export category/priority, find_best_for() with push_error subclass guard - job_runner.gd (186 lines, Agent B): Node-derived runner; setup/start_job/ cancel_job/tick; WALK toil delegates to pawn.walk_along_path on first encounter (sets data.started=true), listens for walk_completed signal; WAIT decrements ticks_remaining; IDLE never completes; full to_dict/from_dict - decision.gd (50 lines, Agent C): static pick_next_job(pawn, providers); 5 layers (incapacitation/forced/status/work/idle); layer 1 probes via has_method to stay future-proof for Phase 9 - rest_provider.gd (31 lines, Agent C): extends WorkProvider; @export rest_tile; returns [walk_to(rest_tile), idle()] Job Integration (Opus): - pawn.gd: added forced_job slot, job_runner ref, _orchestrate_ai called before _advance_walk on each sim_tick. Calls Decision when forced_job is queued OR when idle — was a bug initially (only-on-idle never preempted the never-completing IDLE toil); fixed and caught via MCP runtime test. Added to_dict/from_dict for save round-trip; captures tile, _path, _step_progress, _selected, forced_job, job_runner via their serializers. - selection.gd: rewrote to build a forced-job [walk_to + idle] and set pawn.forced_job; Decision preempts current job on next tick. - world.tscn/gd: instantiates RestProvider as child (rest_tile = (50,50) just outside the stone ring's south-east, reachable from all 3 spawn tiles); registers via World.register_work_provider; attaches a JobRunner child to each spawned pawn and wires setup(pawn, pathfinder). - world.gd autoload: added work_providers list + register/clear methods. - save_system.gd: write_save walks World.pawns calling to_dict; apply_save zips dicts to pawns by index (Phase 16 will add stable IDs). - main.gd: bootstrap log line bumped Phase 2 → Phase 3. Acceptance — MCP-verified end-to-end: - 3 pawns boot, Decision assigns each Rest, JobRunner starts each, all 3 walk to (50,50) on different paths (40/35/30 steps based on detour around the stone ring), arrive and idle. - Force Bram to (10,10) via pawn.forced_job; preempt fires: [decision] Bram: forced 'Go to (10, 10)'. Bram walks while Cora/Edda stay parked. - Mid-walk save round-trip (the critical Phase 3 acceptance): - Paused Bram at (51,10) walking to (70,70) with 79 path steps remaining - SaveSystem.write_save() → SaveSystem.apply_save(read_save()) after a mutate-to-(0,0)-with-no-path round-trip - Restored Bram exactly: tile=(51,10), _path.size=79, walking=true, job='Go to (70, 70)' at toil_idx=0 (WALK toil with data.started=true) - Resumed sim → JobRunner's WALK toil saw started=true and did NOT re-call walk_along_path; the pawn's restored _path continued the walk naturally → reached (70,26) with 44 steps remaining, still on the same job. The architecture.md 'mid-toil suspend safe' contract is provably honored. Phase 3 gotchas (logged in implementation.md): - Class-name registration timing bit again (Phase 2 gotcha). Workflow: agent writes class_name file → MCP reload_project → headless validate. - Forced-job preempt requires triggering Decision when forced_job != null, not just when idle (IDLE toil never completes). - execute_game_script + await Engine.get_main_loop().process_frame is flaky — MCP auto-recovers but the script's last lines may be lost. Workaround: split state-inspection into a fresh execute_game_script. Delegation report this phase: - gdscript-refactor (Sonnet) Agent A: Job + Toil + WorkProvider abstract base. 3 files, 162 lines. - gdscript-refactor (Sonnet) Agent B: JobRunner with toil-execution match + walk_completed signal handling + full save round-trip. 1 file, 186 lines. - gdscript-refactor (Sonnet) Agent C: Decision pipeline + RestProvider. 2 files, 81 lines. - Opus: Pawn integration (forced_job slot, orchestration, to_dict/from_dict), Selection rewrite, world.tscn/gd wiring, World autoload work_providers list, SaveSystem extension, MCP-driven runtime verification including the mid-walk save round-trip demo, gotcha logging. ~70% of Phase 3's GDScript was written by subagents. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 21:05:50 +01:00 · 2026-05-10 21:05:50 +01:00 · 5bf0f51efb
commit 5bf0f51efb
parent cd265b87c0
20 changed files with 613 additions and 25 deletions
--- a/docs/implementation.md
+++ b/docs/implementation.md
@ -9,7 +9,8 @@ Effort estimates are wall-time at **focused solo pace**. Scale up generously for
 | ✅ done — green dot up, smoke scene runs, MCP plugin self-installed 3 runtime services | **Phase 0 — Project scaffold & foundations** |
 | ✅ done — 80² map renders, walls/terrain/UI layers, camera rig, tick loop, speed UI all live | **Phase 1 — World, tilemap, camera** |
 | ✅ done — Pawn class, AStarGrid2D pathfinder (9.1 μs avg/18 μs max at 80²), click-to-select + click-to-move via Selection module | **Phase 2 — Pawn skeleton, pathfinding, movement** |
-| ⏳ next | **Phase 3 — AI core: Decision → WorkProvider → JobRunner** |
+| ✅ done — Job/Toil/JobRunner/Decision/RestProvider, forced_job preempt, mid-toil save round-trip verified | **Phase 3 — AI core: Decision → WorkProvider → JobRunner** |
+| ⏳ next | **Phase 4 — First verbs: chop, mine, hauling, stockpiles** |

 Use this doc as a checklist: tick boxes as items complete, and update the **Status** row above whenever a phase rolls over. The last bullet of each phase is the *acceptance demo* — the phase is "done" when you can perform it.

@ -103,15 +104,26 @@ The five items from `memory.md` *Open questions / Audit*. None of these need cod

 **Goal:** the 5-layer pipeline from `architecture.md` is real, but with one dummy work category. **Save round-trip for JobRunner mid-toil state is required to land in this phase, not later.**

- [ ] `Decision` layer: priority-ordered checks (incapacitation → forced job → status interrupt → work → idle)
- [ ] `WorkProvider` interface — `find_best_for(pawn) -> Job?`
- [ ] `Job` + `JobRunner` — multi-step toils, each toil is `{action, predicate, on_complete}`
- [ ] Player overrides: forced job (e.g. "go here") preempts work
- [ ] Status interrupts skeleton — only `Bleeding` for now (rest land at Phase 9)
- [ ] Idle behavior: stand still or wander locally (per `architecture.md:72`, idle is v2; MVP just stands)
- [ ] First WorkProvider: `RestProvider` — sends pawn to a hardcoded "rest tile". Just a smoke test for the pipeline.
- [ ] **Save round-trip:** kill the app mid-toil-2-of-4, reopen, pawn resumes the same toil at the same position
- [ ] **Acceptance:** 3 pawns idle around a rest tile; force-move one with a tap-and-hold-issue-order; suspend mid-walk and resume seamlessly.
+- [x] **5-layer `Decision` pipeline** (`scenes/ai/decision.gd`, 50 lines, `gdscript-refactor` agent): static `pick_next_job(pawn, providers)`. Layer 1 (incapacitation) probes via `has_method("is_incapacitated")` — no-op until Phase 9 adds it. Layer 2 (forced job) consumes `pawn.forced_job`. Layer 3 (status interrupt) reserved for Phase 9. Layer 4 (work) sorts providers by `priority` desc, returns first non-null Job. Layer 5 returns null (idle).
+- [x] **`WorkProvider` abstract base** (`scenes/ai/work_provider.gd`, 27 lines, Agent A): `class_name WorkProvider extends Node`, `@export category`, `@export priority`, `find_best_for(pawn)` with `push_error` guard.
+- [x] **`Job` + `Toil`** (`scenes/ai/{job,toil}.gd`, 59 + 76 lines, Agent A): `RefCounted` data types with `to_dict`/`from_dict`. Toil kinds: `WALK`/`WAIT`/`IDLE`. Vector2i stored as `to_x`/`to_y` ints (Godot 4 JSON doesn't round-trip Vector2i). Factories: `Toil.walk_to(tile)`, `Toil.wait_ticks(n)`, `Toil.idle()`.
+- [x] **`JobRunner`** (`scenes/ai/job_runner.gd`, 186 lines, Agent B): `Node`-derived; `setup(pawn, pathfinder)`, `start_job(j)`, `cancel_job()`, `tick()`. WALK toil delegates to `pawn.walk_along_path()` on first invocation, listens for `walk_completed` signal to mark done. WAIT decrements `ticks_remaining`. IDLE never completes. Full `to_dict`/`from_dict` for save round-trip.
+- [x] **Forced job preempts current job** (Pawn orchestration fix): `_orchestrate_ai` calls Decision when `forced_job != null` OR no current job — not just when idle. This was a bug found via MCP runtime test; cause + fix documented in commit.
+- [x] **First `RestProvider`** (`scenes/ai/rest_provider.gd`, 31 lines, Agent C): `extends WorkProvider`, `@export rest_tile`, returns a `[walk_to(rest_tile), idle()]` Job. Rest tile = (50, 50) — just outside the south-east of the stone ring, reachable from all 3 spawn tiles.
+- [x] **Idle behavior**: IDLE toil keeps the pawn at the current tile indefinitely. Per architecture.md:72, this is the v1 idle; the wander-locally variant is v2.
+- [x] **Pawn `to_dict`/`from_dict`** (Opus): captures `tile`, `_path` (as `[[x,y],...]`), `_step_progress`, `_selected`, `forced_job` (via `Job.to_dict()`), `job_runner` (via `JobRunner.to_dict()`). On load, JobRunner's restored WALK toil has `started: true` and does NOT re-call `walk_along_path` — the pawn's restored `_path` continues naturally and emits `walk_completed` when done.
+- [x] **`SaveSystem.write_save` / `apply_save`** (Opus): walks `World.pawns`, calls `to_dict()` / `from_dict()` per pawn. Single slot JSON to `user://save_slot.json`. Pawn dicts zipped by index (Phase 16 will add stable IDs).
+- [x] **Selection rewrite** (Opus): drops direct `pawn.walk_along_path` call; now builds a `[walk_to(tile), idle()]` Job and sets `pawn.forced_job = job`. Decision picks it up on the next sim tick.
+- [x] **Acceptance — MCP-verified end-to-end**:
+  - 3 pawns boot → Decision assigns each a Rest job → JobRunner starts each → all 3 walk to (50, 50) on different paths (40/35/30 steps) → all 3 arrive and idle.
+  - Force Bram to (10, 10) via `pawn.forced_job` → preempt fires (`[decision] Bram: forced 'Go to (10, 10)'`) → Bram walks away while Cora/Edda stay parked.
+  - Mid-walk save: paused Bram at (51, 10) walking to (70, 70) with 79 path steps remaining → `SaveSystem.write_save()` → mutated to (0, 0) with empty path → `SaveSystem.apply_save(read_save())` → **restored to (51, 10) with 79 steps remaining, `walking=true`, same job at same toil index** → resumed sim → Bram continued from (51, 10), reached (70, 26) with 44 steps remaining, still on `Go to (70, 70)`.
+- [x] **Status interrupt skeleton — Bleeding hook**: deliberately deferred. Decision's Layer 3 is a placeholder comment for Phase 9 — adding it without a Status system to back it is premature. `implementation.md` Phase 9 will land the registry + the interrupt wiring atomically.
+
+**Phase 3 lessons logged:**
+- Class-name registration timing (Phase 2 gotcha) bit again — fix is the same: `mcp__godot-mcp-pro__reload_project` between authoring `class_name`-bearing files and headless validation.
+- `_orchestrate_ai` initially only called Decision when `not has_job()`. The IDLE toil never completes, so a queued `forced_job` was never seen. Fix: trigger Decision when `forced_job != null` regardless of current-job state. Caught by the runtime MCP test, not headless.
+- `execute_game_script` with `await Engine.get_main_loop().process_frame` is touchy — the MCP wrapper sometimes auto-recovers from a runtime issue but the script's last assignments are lost. The actual game state evolves correctly; just use a fresh `execute_game_script` to inspect state after awaits.

 ---