docs(perf): design for Quick-Wins Pass A

Short design covering four changes: mtime-based CSS cache-bust token,
Django Debug Toolbar (dev-only) for profiling, N+1 fixes on Dashboard
and Payroll pages, and a before/after measurement in the commit message.
Scope is deliberately tight — plan B (template splitting) and plan C
(full audit) are deferred until plan A evidence lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Konrad du Plessis 2026-04-24 00:23:48 +02:00
parent 503eff67a0
commit d1490c4639


@ -0,0 +1,175 @@
# Perf Quick-Wins Pass — Design (24 Apr 2026)
## Origin
Konrad, after a long stretch of feature work (Inline Filters + Adjustments
tab + filter-bar v2):
> _"the app feel a bit sluggish especially changing between main spaces.
> Go through the app systematically and look for bugs and un optimized
> code. systematically go through the code and expertly and thoroughly
> review and fix it."_
Presented three scopes (A quick-wins / B focused dashboard pass / C full
systematic audit). Konrad picked **A — quick-wins first**, on the
principle that perf work is notorious for "big rewrite that didn't help."
If A moves the needle, we can stop; if not, we escalate with evidence.
## Goal
Make navigation between main spaces (Dashboard ↔ Payroll ↔ Workers ↔
Report ↔ Teams ↔ Projects) feel snappier. Ship in 1-3 commits. No
architecture changes. Every change individually revertible.
## Who it's for
Everyone who uses the app — most immediately Konrad, who navigates
between Dashboard and Payroll dozens of times a day.
## What we already know (pre-measurement)
- `payroll_dashboard.html` is 213 KB / 4,147 lines — all 4 tabs rendered
server-side even when only one is visible. Addressed in plan B, not A.
- `deployment_timestamp` context var is `int(time.time())` per-request →
`custom.css?v=<timestamp>` is a new URL every second → Cloudflare
edge-cache HIT rate on CSS is effectively 0 → every page load fetches
64 KB of CSS from the VM. Documented as a trade-off in CLAUDE.md.
This is almost certainly the biggest single contributor to the
"heavy navigation" feel.
- 49 `select_related`/`prefetch_related` calls vs 91
`.all()/.first()/.count()` calls in `views.py`. Not damning on its own,
but a reason to inspect the hot-path views first.
## Scope — 4 changes, in order
### 1. Fix `deployment_timestamp` to bust cache only on real deploys
**File:** `core/context_processors.py`
Today:
```python
'deployment_timestamp': int(time.time()),
```
After:
```python
# Cache-bust token tied to the CSS file's mtime: only changes when
# custom.css actually changes. Falls back to int(time.time()) if the
# file isn't on disk yet (fresh container, pre-collectstatic).
# Needs `import os`, `import time`, `from django.conf import settings`.
try:
    _css_path = settings.BASE_DIR / 'static' / 'css' / 'custom.css'
    _token = int(os.path.getmtime(_css_path))
except OSError:  # FileNotFoundError is a subclass of OSError
    _token = int(time.time())
```
Effect: the `?v=...` query string stays constant across requests until
`custom.css` is modified. Cloudflare can finally hold the file at its
edge for its full 4h TTL. After the first hit in a 4h window, repeat
navigation within a session drops from "fetch 64 KB from VM" to a
browser-cache hit (at worst a "304 Not Modified" revalidation).
**Degraded-mode guarantee:** if the file is missing (shouldn't happen in
normal dev or prod, but could in a fresh container), we degrade to
today's behaviour (per-request timestamp) rather than crash.
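A minimal, framework-free sketch of the token logic, so the mtime path and the fallback path can be exercised in isolation. The helper name `css_cache_bust_token` is hypothetical (it is not in the codebase); the real change lives inline in `core/context_processors.py`.

```python
import os
import time
from pathlib import Path


def css_cache_bust_token(css_path: Path) -> int:
    """Return a cache-bust token for the stylesheet at css_path.

    Uses the file's mtime, so the token only changes when the file is
    rewritten. If the file cannot be read, fall back to the current time
    (the old per-request behaviour) instead of raising.
    """
    try:
        return int(os.path.getmtime(css_path))
    except OSError:  # covers FileNotFoundError, which subclasses OSError
        return int(time.time())
```

In the context processor this would presumably be called as `css_cache_bust_token(settings.BASE_DIR / 'static' / 'css' / 'custom.css')`. It costs one `stat` call per request, which is cheap next to the CSS fetch it avoids.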
### 2. Profile + fix N+1 on the two busiest pages
**Pages:** `/` (dashboard) and `/payroll/` (payroll dashboard — all 4
tabs — Pending, History, Loans, Adjustments).
**Tool:** Django Debug Toolbar, added to `requirements.txt` as a
dev-only dependency. Gated in `config/settings.py` so it only
initialises when `DJANGO_DEBUG=true` AND `USE_SQLITE=true` (never
loads in prod).
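One way the double gate could look, as a sketch rather than the final settings code. The helper names `env_flag` and `toolbar_enabled` are hypothetical, and the three lists are stand-ins for whatever `config/settings.py` already defines. The `debug_toolbar` app label and middleware path are the ones Django Debug Toolbar documents.

```python
import os


def env_flag(name: str, environ=os.environ) -> bool:
    """True only when the variable is set to 'true' (case-insensitive)."""
    return environ.get(name, "").strip().lower() == "true"


def toolbar_enabled(environ=os.environ) -> bool:
    """Both flags must be on; prod sets neither, so the toolbar never loads."""
    return env_flag("DJANGO_DEBUG", environ) and env_flag("USE_SQLITE", environ)


# Stand-ins for the real lists in config/settings.py (contents assumed).
INSTALLED_APPS = ["django.contrib.staticfiles", "core"]
MIDDLEWARE = ["django.middleware.security.SecurityMiddleware"]
INTERNAL_IPS = []

if toolbar_enabled():
    INSTALLED_APPS.append("debug_toolbar")
    # The toolbar docs ask for this middleware to sit as early as possible.
    MIDDLEWARE.insert(0, "debug_toolbar.middleware.DebugToolbarMiddleware")
    INTERNAL_IPS.append("127.0.0.1")
```

The toolbar also needs its URLs mounted (`path("__debug__/", include("debug_toolbar.urls"))`), which should sit behind the same `toolbar_enabled()` check in the URLconf.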
**Process:**
1. Install toolbar, confirm the SQL panel loads on `/`.
2. Navigate to `/`, read the SQL panel: flag any query count > ~20,
any row with `+N duplicate queries`, any queryset access
that could be answered with `select_related`/`prefetch_related`/
`annotate(Count/Sum)`.
3. Fix each flag with the minimal ORM change. One fix = one commit.
4. Re-run, confirm query count dropped, confirm no test regressions.
5. Repeat for `/payroll/?status=pending`, `/payroll/?status=history`,
`/payroll/?status=loans`, `/payroll/?status=adjustments`.
**Likely suspects** (prediction — to be confirmed by toolbar):
- **Dashboard cert-expiry card** — aggregates expired/expiring certs
across active workers. If it loops in Python instead of `annotate`-
plus-filter, that's an N+1.
- **Pending payments table** — worker + team + overdue calc per row.
The overdue check calls `get_pay_period(team)` per worker; if teams
aren't prefetched we're firing one SELECT per row.
- **Adjustments tab groupings** — we fixed `worker.teams.first()` →
`worker.teams.all()` once already (commit `06b3315`); worth
double-checking grouped view for similar patterns.
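To illustrate the N+1 shape these suspects share, here is a sketch in plain `sqlite3` rather than the Django ORM, so it runs without a project. The `worker`/`team` tables and their rows are hypothetical stand-ins, not the app's schema. The joined variant corresponds to what `select_related` does for a foreign key: one query with a JOIN instead of one extra SELECT per row.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """In-memory database with a hypothetical worker -> team foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE worker (
            id INTEGER PRIMARY KEY,
            name TEXT,
            team_id INTEGER REFERENCES team(id)
        );
        INSERT INTO team VALUES (1, 'Alpha'), (2, 'Bravo');
        INSERT INTO worker VALUES (1, 'Ann', 1), (2, 'Ben', 1), (3, 'Cy', 2);
    """)
    return conn


def count_selects(conn, fn):
    """Run fn(conn) and return (result, number of SELECTs it executed)."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)


def rows_n_plus_one(conn):
    """One query for the workers, then one more query per worker."""
    rows = []
    workers = conn.execute("SELECT name, team_id FROM worker ORDER BY id").fetchall()
    for worker_name, team_id in workers:
        team_name = conn.execute(
            "SELECT name FROM team WHERE id = ?", (team_id,)
        ).fetchone()[0]
        rows.append((worker_name, team_name))
    return rows


def rows_joined(conn):
    """Same rows in a single query, the shape select_related produces."""
    return conn.execute(
        "SELECT worker.name, team.name FROM worker "
        "JOIN team ON team.id = worker.team_id ORDER BY worker.id"
    ).fetchall()
```

Calling `count_selects(build_db(), rows_n_plus_one)` and `count_selects(build_db(), rows_joined)` returns identical rows with different query counts, which is the same before/after signal the toolbar's SQL panel gives per page.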
**Out of scope for this step:** any fix that requires a template
rewrite. If something needs more than a `.select_related()` /
`.prefetch_related()` / `.annotate()` tweak, it goes on the plan-B list.
### 3. Double-check WeasyPrint is not eager-imported anywhere
**Files:** `core/utils.py`, `core/views.py`.
We already lazy-import WeasyPrint in `render_to_pdf()` (per CLAUDE.md).
I'll grep to confirm nothing else on the app-boot path imports
`weasyprint` at module level. If anything does, move it into a function
body. 10 minutes, zero-risk.
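A grep will also match lazy imports inside function bodies, so a small AST check can separate the two cases. This is a sketch under assumptions: `eager_imports` is a hypothetical helper, and it treats anything outside a function body (including top-level `try`/`if` blocks and class bodies) as running at import time.

```python
import ast


def eager_imports(source: str, package: str) -> list[int]:
    """Return line numbers of imports of `package` that run at import time."""
    hits = []

    def visit(node):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
                continue  # function bodies only run when called, so they are lazy
            if isinstance(child, ast.Import):
                names = [alias.name for alias in child.names]
            elif isinstance(child, ast.ImportFrom):
                names = [child.module or ""]
            else:
                names = []
            if any(n == package or n.startswith(package + ".") for n in names):
                hits.append(child.lineno)
            visit(child)

    visit(ast.parse(source))
    return hits
```

Run against the text of `core/utils.py` and `core/views.py` with `package="weasyprint"`, an empty list means nothing in that file imports it eagerly.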
### 4. Commit message includes before/after measurement
The final commit's message records:
- Page size bytes (DOM serialized) for `/` and `/payroll/` before & after
- Network request count on a cold-cache page load
- SQL query count on both pages
If the numbers don't materially improve after steps 1-3, I stop. We
don't press on to plan B without evidence that plan A helped (or at
least surfaced what's actually slow).
## What I will NOT touch in this pass
- Splitting `payroll_dashboard.html` into tab partials
- Any refactoring of `views.py` or extraction of helpers
- Any visual / UX change
- Tests — query-count changes don't break the existing tests (they
assert URL contract + output shape, not query plans). If a test
genuinely needs updating because I materially changed a view's
behaviour, I'll note why in the commit
## Risks + rollback
All four changes are individually revertible. Biggest risks:
- **mtime-based token misfires on fresh containers** — mitigated by
try/except fallback to today's behaviour
- **A `select_related` fix changes query semantics** — e.g., eager
loading a nullable FK that used to be accessed lazily-with-None. Low
risk on Django's ORM, but the test suite (65 tests, all passing at
HEAD `503eff6`) will catch any behavioural regression
- **Django Debug Toolbar loaded in prod** — mitigated by double-gate
(DJANGO_DEBUG=true AND USE_SQLITE=true) in `config/settings.py`
Rollback: `git revert <sha>` on the offending commit. No data, schema,
or URL-contract impact.
## Out of scope (explicit non-goals)
- Plan B / C items (template splitting, written baseline doc, whole-app
measurement)
- Moving CDN assets to local / self-hosted
- Changing Flatlogic's `runserver` → gunicorn
- Turning on HTTP/2 push, service workers, or other frontend perf tooling
- Any refactor that requires a migration
## Next step
Generate an implementation plan via the writing-plans skill
(task-by-task, bite-sized steps) and then execute via
subagent-driven-development. Auto mode is active — proceed
continuously, no mid-execution checkpoints (plan A is 4 small
mechanical changes; a checkpoint adds overhead without value).
Ship alongside current `ai-dev` HEAD (`503eff6`) in the same branch.