docs(perf): design for Quick-Wins Pass A

Short design covering four changes: mtime-based CSS cache-bust token, Django Debug Toolbar (dev-only) for profiling, N+1 fixes on Dashboard and Payroll pages, and a before/after measurement in the commit message. Scope is deliberately tight: plan B (template splitting) and plan C (full audit) are deferred until plan A evidence lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:23:48 +02:00 · 2026-04-24 00:23:48 +02:00 · d1490c4639
commit d1490c4639
parent 503eff67a0
1 changed files with 175 additions and 0 deletions
--- a/docs/plans/2026-04-24-perf-quick-wins-design.md
+++ b/docs/plans/2026-04-24-perf-quick-wins-design.md
@@ -0,0 +1,175 @@
+# Perf Quick-Wins Pass — Design (24 Apr 2026)
+
+## Origin
+
+Konrad, after a long stretch of feature work (Inline Filters + Adjustments
+tab + filter-bar v2):
+
+> _"the app feel a bit sluggish especially changing between main spaces.
+> Go through the app systematically and look for bugs and un optimized
+> code. systematically go through the code and expertly and thoroughly
+> review and fix it."_
+
+Presented three scopes (A quick-wins / B focused dashboard pass / C full
+systematic audit). Konrad picked **A — quick-wins first**, on the
+principle that perf work is notorious for "big rewrite that didn't help."
+If A moves the needle, we can stop; if not, we escalate with evidence.
+
+## Goal
+
+Make navigation between main spaces (Dashboard ↔ Payroll ↔ Workers ↔
+Report ↔ Teams ↔ Projects) feel snappier. Ship in 1-3 commits. No
+architecture changes. Every change individually revertible.
+
+## Who it's for
+
+Everyone who uses the app — most immediately Konrad, who navigates
+between Dashboard and Payroll dozens of times a day.
+
+## What we already know (pre-measurement)
+
+- `payroll_dashboard.html` is 213 KB / 4,147 lines — all 4 tabs rendered
+  server-side even when only one is visible. Addressed in plan B, not A.
+- `deployment_timestamp` context var is `int(time.time())` per-request
+  → `custom.css?v=<timestamp>` is a new URL every second → Cloudflare
+  edge-cache HIT rate on CSS is effectively 0 → every page load fetches
+  64 KB of CSS from the VM. Documented as a trade-off in CLAUDE.md.
+  This is almost certainly the biggest single contributor to the
+  "heavy navigation" feel.
+- 49 `select_related`/`prefetch_related` calls vs 91
+  `.all()/.first()/.count()` calls in `views.py`. Not damning but worth
+  pointing at hot-path views.
+
+## Scope — 4 changes, in order
+
+### 1. Fix `deployment_timestamp` to bust cache only on real deploys
+
+**File:** `core/context_processors.py`
+
+Today:
+```python
+'deployment_timestamp': int(time.time()),
+```
+
+After:
+```python
+# Cache-bust token tied to the CSS file's mtime — only changes when
+# custom.css actually changes. Falls back to int(time.time()) if the
+# file isn't on disk yet (fresh container, pre-collectstatic).
+try:
+    _css_path = settings.BASE_DIR / 'static' / 'css' / 'custom.css'
+    _token = int(os.path.getmtime(_css_path))
+except OSError:  # FileNotFoundError is a subclass of OSError
+    _token = int(time.time())
+```
+
+Effect: the `?v=...` query string stays constant across requests until
+`custom.css` is modified. Cloudflare can finally hold the file at its
+edge for its full 4h TTL. After the first hit in a 4h window, repeat
+navigation within a session drops from "fetch 64 KB from VM" to a
+browser-cache hit or, at worst, a `304 Not Modified` revalidation.
+
+**Degraded-mode guarantee:** if the file is missing (shouldn't happen in
+normal dev or prod, but could in a fresh container), we degrade to
+today's behaviour (per-request timestamp) rather than crash.
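The token logic and its fallback can be exercised outside Django. The following is a minimal sketch, not the project's code: `css_cache_token` is a hypothetical helper name, and the temporary file stands in for `custom.css`.

```python
import os
import tempfile
import time
from pathlib import Path


def css_cache_token(css_path: Path) -> int:
    """Return a cache-bust token derived from the file's mtime.

    Falls back to the current time when the file cannot be stat'ed,
    which reproduces the old per-request behaviour instead of raising.
    """
    try:
        return int(os.path.getmtime(css_path))
    except OSError:  # covers FileNotFoundError and permission errors
        return int(time.time())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        css = Path(tmp) / "custom.css"  # stand-in for the real stylesheet
        css.write_text("body { margin: 0; }")

        # Two "requests" against an unchanged file share one token.
        first, second = css_cache_token(css), css_cache_token(css)
        print(first == second)

        # A "deploy" that modifies the file yields a different token.
        os.utime(css, (first + 60, first + 60))
        print(css_cache_token(css) == first + 60)
```

One design consequence worth noting: the token depends on the file's mtime as seen by the running process, so a deploy that rewrites the file with identical content still busts the cache once.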
+
+### 2. Profile + fix N+1 on the two busiest pages
+
+**Pages:** `/` (dashboard) and `/payroll/` (payroll dashboard — all 4
+tabs — Pending, History, Loans, Adjustments).
+
+**Tool:** Django Debug Toolbar, added to `requirements.txt` as a
+dev-only dependency. Gated in `config/settings.py` so it only
+initialises when `DJANGO_DEBUG=true` AND `USE_SQLITE=true` (never
+loads in prod).
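The double gate described above could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, and treating only the string `true` (case-insensitive) as "on" is a guess at how `config/settings.py` parses its environment variables. The app label `debug_toolbar` and the middleware path are the ones the toolbar package ships.

```python
import os
from typing import Mapping


def _flag(env: Mapping[str, str], name: str) -> bool:
    """Treat only the literal string 'true' (any case) as enabled."""
    return env.get(name, "").strip().lower() == "true"


def debug_toolbar_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Both gates must be set; a missing variable counts as off."""
    return _flag(env, "DJANGO_DEBUG") and _flag(env, "USE_SQLITE")


def apply_toolbar(installed_apps: list, middleware: list, env=os.environ) -> None:
    """Register the toolbar in place, but only when the gate is open."""
    if not debug_toolbar_enabled(env):
        return
    installed_apps.append("debug_toolbar")
    # Placed first in this sketch; if a response-encoding middleware
    # such as GZipMiddleware is in use, the toolbar must come after it.
    middleware.insert(0, "debug_toolbar.middleware.DebugToolbarMiddleware")
```

In a settings module this would be called as `apply_toolbar(INSTALLED_APPS, MIDDLEWARE)` after both lists are defined. The toolbar also needs its URLs included and the client address listed in `INTERNAL_IPS` before the panel renders, which this sketch leaves out.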
+
+**Process:**
+1. Install toolbar, confirm the SQL panel loads on `/`.
+2. Navigate to `/`, read the SQL tab: flag any query count > ~20,
+   any row with `+N duplicate queries`, and any queryset access
+   that could be answered with `select_related`/`prefetch_related`/
+   `annotate(Count/Sum)`.
+3. Fix each flag with the minimal ORM change. One fix = one commit.
+4. Re-run, confirm query count dropped, confirm no test regressions.
+5. Repeat for `/payroll/?status=pending`, `/payroll/?status=history`,
+   `/payroll/?status=loans`, `/payroll/?status=adjustments`.
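The two flags in step 2 (total count and repeated statements) can also be checked in plain Python against a captured SQL log. This is an illustrative sketch with hypothetical helper names, not a replacement for the toolbar. It assumes the log is a list of SQL strings, for example the `sql` values from `django.db.connection.queries` when `DEBUG` is on.

```python
from collections import Counter


def duplicate_queries(sql_log, threshold=2):
    """Return {sql: count} for statements seen at least `threshold` times."""
    # Collapse runs of whitespace so formatting differences don't hide repeats.
    counts = Counter(" ".join(sql.split()) for sql in sql_log)
    return {sql: n for sql, n in counts.items() if n >= threshold}


def needs_attention(sql_log, max_queries=20):
    """Apply both flags: too many queries overall, or any exact repeat."""
    return len(sql_log) > max_queries or bool(duplicate_queries(sql_log))
```

Exact-match counting only catches identical statements. An N+1 loop that varies a parameter per row produces similar, not identical, SQL once parameters are interpolated, so the total-count flag remains the more reliable of the two.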
+
+**Likely suspects** (prediction — to be confirmed by toolbar):
+- **Dashboard cert-expiry card** — aggregates expired/expiring certs
+  across active workers. If it loops in Python instead of `annotate`-
+  plus-filter, that's an N+1.
+- **Pending payments table** — worker + team + overdue calc per row.
+  The overdue check calls `get_pay_period(team)` per worker; if teams
+  aren't prefetched we're firing one SELECT per row.
+- **Adjustments tab groupings** — we fixed `worker.teams.first()` →
+  `worker.teams.all()` once already (commit `06b3315`); worth
+  double-checking the grouped view for similar patterns.
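For readers unfamiliar with the pattern these suspects share, here is a self-contained illustration using `sqlite3` instead of the Django ORM. The `worker`/`team` schema and its rows are invented for the example and use a plain foreign key, which is simpler than the app's real models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE worker (
        id INTEGER PRIMARY KEY, name TEXT, team_id INTEGER REFERENCES team(id)
    );
    INSERT INTO team VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO worker VALUES (1, 'Ann', 1), (2, 'Bob', 1), (3, 'Cy', 2);
    """
)

counter = {"n": 0}  # number of statements issued through run()


def run(sql, params=()):
    """Execute one statement and count it, like the toolbar's SQL panel."""
    counter["n"] += 1
    return conn.execute(sql, params).fetchall()


def rows_n_plus_one():
    """One query for the workers, then one more per worker for its team."""
    out = []
    for worker_name, team_id in run("SELECT name, team_id FROM worker ORDER BY id"):
        (team_name,) = run("SELECT name FROM team WHERE id = ?", (team_id,))[0]
        out.append((worker_name, team_name))
    return out


def rows_joined():
    """Same rows from a single JOIN, the kind of SQL select_related emits."""
    return run(
        "SELECT w.name, t.name FROM worker w "
        "JOIN team t ON t.id = w.team_id ORDER BY w.id"
    )
```

With three workers the looped version issues four statements and the joined version one. The gap grows linearly with row count, which is why the pending payments table is a plausible suspect.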
+
+**Out of scope for this step:** any fix that requires a template
+rewrite. If something needs more than a `.select_related()` /
+`.prefetch_related()` / `.annotate()` tweak, it goes on the plan-B list.
+
+### 3. Double-check WeasyPrint is not eager-imported anywhere
+
+**Files:** `core/utils.py`, `core/views.py`.
+
+We already lazy-import WeasyPrint in `render_to_pdf()` (per CLAUDE.md).
+I'll grep to confirm nothing else on the app-boot path imports
+`weasyprint` at module level. If anything does, move it into a function
+body. 10 minutes, zero-risk.
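A plain grep cannot tell a module-level import from one inside a function body. As a complement, the check can be sketched with the standard-library `ast` module. `eager_imports` is a hypothetical helper, and it inspects source text only, so it will not see dynamic imports such as `importlib.import_module(...)`.

```python
import ast


def eager_imports(source: str, package: str) -> list:
    """Return line numbers where `package` is imported outside any function body."""
    hits = []

    def visit(node):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
                continue  # function bodies only run when called, so they are lazy
            if isinstance(child, ast.Import):
                if any(alias.name.split(".")[0] == package for alias in child.names):
                    hits.append(child.lineno)
            elif isinstance(child, ast.ImportFrom):
                # level == 0 means an absolute import, not `from . import x`
                if child.level == 0 and (child.module or "").split(".")[0] == package:
                    hits.append(child.lineno)
            visit(child)  # descend into if/try/class blocks, which run at import time

    visit(ast.parse(source))
    return hits
```

Running it over the text of `core/utils.py` and `core/views.py` with `package="weasyprint"` should return an empty list if every import is already lazy.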
+
+### 4. Commit message includes before/after measurement
+
+The final commit's message records:
+- Page size in bytes (serialized DOM) for `/` and `/payroll/`, before and after
+- Network request count on a cold-cache load
+- SQL query count on both pages
+
+If the numbers don't materially improve after steps 1-3, I stop. We
+don't press on to plan B without evidence that plan A helped (or at
+least surfaced what's actually slow).
+
+## What I will NOT touch in this pass
+
+- Splitting `payroll_dashboard.html` into tab partials
+- Any refactoring of `views.py` or extraction of helpers
+- Any visual / UX change
+- Tests — query-count changes don't break the existing tests (they
+  assert URL contract + output shape, not query plans). If a test
+  genuinely needs updating because I materially changed a view's
+  behaviour, I'll note why in the commit
+
+## Risks + rollback
+
+All four changes are individually revertible. Biggest risks:
+
+- **mtime-based token misfires on fresh containers** — mitigated by
+  try/except fallback to today's behaviour
+- **A `select_related` fix changes query semantics** — e.g., eager
+  loading a nullable FK that used to be accessed lazily-with-None. Low
+  risk on Django's ORM, but the test suite (65 tests, all passing at
+  HEAD `503eff6`) will catch any behavioural regression
+- **Django Debug Toolbar pulled in in prod** — mitigated by double-gate
+  (`DJANGO_DEBUG=true` AND `USE_SQLITE=true`) in `config/settings.py`
+
+Rollback: `git revert <sha>` on the offending commit. No data, schema,
+or URL-contract impact.
+
+## Out of scope (explicit non-goals)
+
+- Plan B / C items (template splitting, written baseline doc, whole-app
+  measurement)
+- Moving CDN assets to local / self-hosted
+- Changing Flatlogic's `runserver` → gunicorn
+- Turning on HTTP/2 push, service workers, or other frontend perf tooling
+- Any refactor that requires a migration
+
+## Next step
+
+Generate an implementation plan via the writing-plans skill
+(task-by-task, bite-sized steps) and then execute via
+subagent-driven-development. Auto mode is active — proceed
+continuously, no mid-execution checkpoints (plan A is 4 small
+mechanical changes; a checkpoint adds overhead without value).
+
+Ship alongside current `ai-dev` HEAD (`503eff6`) in the same branch.