Add AI receipt scanning with OCR pipeline and debug toggle

- OCR pipeline: Tesseract (images) + pdfplumber (PDFs) → AI text prompt →
  rule-based regex fallback; works with any text model, not just vision models
- Scan Receipt toolbar button parses a photo and pre-fills the transaction form;
  receipt image is automatically attached to the created transaction
- AI settings page: provider, API key (AES-256-GCM encrypted), custom URL,
  model, and per-user debug toggle that gates the OCR/AI debug panel
- Fix CSRF cookie secure=False so HTTP deployments work; add 7-day max_age
- Fix attachment_refs missing from _to_response (attachments never appeared in UI)
- Fix multipart boundary lost when Content-Type was set manually in axios calls
- nginx: raise client_max_body_size to 15 MB, add 120s proxy timeout for OCR
- Migration 0005: add ai_debug boolean to users table
- Update README and CLAUDE.md with AI scanning docs and architecture notes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This commit is contained in:
megaproxy 2026-04-22 22:07:38 +00:00
parent a7c54ca61c
commit 26e2a055db
16 changed files with 397 additions and 99 deletions

View file

@ -14,6 +14,7 @@ Runs entirely on your own hardware via Docker Compose. Designed for LAN access w
- Transfer detection between accounts
- Recurring transaction rules (rrule)
- Receipt and document attachments on transactions (JPEG, PNG, WebP, PDF — up to 10 MB each)
- **AI receipt scanning** — photograph a receipt to auto-extract merchant, amount, date, and description into a new transaction; receipt is automatically attached
- CSV import with **auto-detection** for 10 UK bank formats: Monzo, Starling, Revolut, Barclays, Lloyds, NatWest, HSBC, Santander, Nationwide, and generic fallback
- SHA-256 deduplication prevents re-importing the same transactions
@ -101,6 +102,7 @@ Ten independent security layers:
| Database | PostgreSQL 16 with pgcrypto and RLS |
| Cache / Sessions | Redis 7 |
| ML | Prophet, statsmodels, NumPy, SciPy |
| OCR | Tesseract 5, pdfplumber, pdf2image |
| Background jobs | APScheduler (in-process) |
| Containerisation | Docker Compose |
@ -172,6 +174,34 @@ Forward your domain to `http://<host>:4000`. The frontend nginx serves the React
---
## AI Receipt Scanning
Receipt scanning uses OCR (Tesseract) to extract text from the image first, then optionally passes that text to an AI model to parse it into structured fields. This means **any text-capable LLM works** — you're not limited to vision models.
If no AI is configured, or if the AI call fails, a rule-based parser runs on the OCR text as a fallback (finds totals, dates, and merchant names via regex).
### Setup
Go to **Settings → AI** and fill in:
| Field | Description |
|-------|-------------|
| Provider | `Anthropic` or `OpenAI-compatible` |
| API Key | Your key (stored AES-256-GCM encrypted on your server) |
| Custom API URL | Optional — for Open WebUI, LM Studio, Ollama, etc. |
| Model | Optional — defaults to `claude-haiku-4-5-20251001` or `gpt-4o-mini` |
| Debug mode | Shows OCR text and raw AI response in the scan form when enabled |
For **Open WebUI**: set the provider to `OpenAI-compatible`, enter `http://your-server:port` as the URL (MyMidas appends `/v1/chat/completions`), and enter the model name exactly as shown in Open WebUI's interface.
Use the **Test connection** button to verify your settings before scanning.
### Usage
Click **Scan Receipt** in the transactions toolbar, select a photo or PDF. The form opens pre-filled with extracted fields — review and save. The receipt image is automatically attached to the created transaction.
---
## Backups
Encrypted backups run automatically every night at 3 AM (GPG AES-256 symmetric encryption). Backups are stored in `./data/backups/` and retained for 30 days.