docs(perf): design for Quick-Wins Pass A
Short design covering four changes: mtime-based CSS cache-bust token, Django Debug Toolbar (dev-only) for profiling, N+1 fixes on Dashboard and Payroll pages, and a before/after measurement in the commit message. Scope is deliberately tight: plan B (template splitting) and plan C (full audit) are deferred until plan A evidence lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 503eff67a0
commit d1490c4639
175
docs/plans/2026-04-24-perf-quick-wins-design.md
Normal file
@@ -0,0 +1,175 @@
# Perf Quick-Wins Pass — Design (24 Apr 2026)

## Origin

Konrad, after a long stretch of feature work (Inline Filters + Adjustments
tab + filter-bar v2):

> _"the app feel a bit sluggish especially changing between main spaces.
> Go through the app systematically and look for bugs and un optimized
> code. systematically go through the code and expertly and thoroughly
> review and fix it."_

Presented three scopes (A quick-wins / B focused dashboard pass / C full
systematic audit). Konrad picked **A: quick-wins first**, on the
principle that perf work is notorious for the "big rewrite that didn't help."
If A moves the needle, we can stop; if not, we escalate with evidence.

## Goal

Make navigation between main spaces (Dashboard ↔ Payroll ↔ Workers ↔
Report ↔ Teams ↔ Projects) feel snappier. Ship in 1-3 commits. No
architecture changes. Every change individually revertible.

## Who it's for

Everyone who uses the app, most immediately Konrad, who navigates
between Dashboard and Payroll dozens of times a day.

## What we already know (pre-measurement)

- `payroll_dashboard.html` is 213 KB / 4,147 lines, with all 4 tabs rendered
  server-side even when only one is visible. Addressed in plan B, not A.
- `deployment_timestamp` context var is `int(time.time())` per-request
  → `custom.css?v=<timestamp>` is a new URL every second → Cloudflare
  edge-cache HIT rate on CSS is effectively 0 → every page load fetches
  64 KB of CSS from the VM. Documented as a trade-off in CLAUDE.md.
  This is almost certainly the biggest single contributor to the
  "heavy navigation" feel.
- 49 `select_related`/`prefetch_related` calls vs 91
  `.all()/.first()/.count()` calls in `views.py`. Not damning, but worth
  checking against the hot-path views.

## Scope — 4 changes, in order

### 1. Fix `deployment_timestamp` to bust cache only on real deploys

**File:** `core/context_processors.py`

Today:
```python
'deployment_timestamp': int(time.time()),
```

After:
```python
import os
import time

from django.conf import settings

# Cache-bust token tied to the CSS file's mtime: it only changes when
# custom.css actually changes. Falls back to int(time.time()) if the
# file isn't on disk yet (fresh container, pre-collectstatic).
try:
    _css_path = settings.BASE_DIR / 'static' / 'css' / 'custom.css'
    _token = int(os.path.getmtime(_css_path))
except OSError:  # FileNotFoundError is a subclass of OSError
    _token = int(time.time())
```

Effect: the `?v=...` query string stays constant across requests until
`custom.css` is modified. Cloudflare can finally hold the file at its
edge for its full 4h TTL. Repeat navigation within a session drops from
"fetch 64 KB from VM" to a browser-cache hit (or a "304 Not Modified"
revalidation), after the first hit in a 4h window.

**Degraded-mode guarantee:** if the file is missing (shouldn't happen in
normal dev or prod, but could in a fresh container), we degrade to
today's behaviour (per-request timestamp) rather than crash.
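
The fallback behaviour can be checked in isolation. The following is a minimal sketch, not the actual context processor: `css_cache_token` is a hypothetical helper name, and it takes the path as an argument so it can be exercised without Django settings.

```python
import os
import time
from pathlib import Path


def css_cache_token(css_path: Path) -> int:
    """Return a cache-bust token for css_path (hypothetical helper).

    Uses the file's mtime, so the token only changes when the file does.
    Falls back to the current time if the file cannot be stat'ed.
    """
    try:
        return int(os.path.getmtime(css_path))
    except OSError:  # covers FileNotFoundError
        return int(time.time())
```

Calling it twice on an unchanged file should return the same value, which is the property that lets the `?v=...` URL stay stable between deploys.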

### 2. Profile + fix N+1 on the two busiest pages

**Pages:** `/` (dashboard) and `/payroll/` (payroll dashboard, all 4
tabs: Pending, History, Loans, Adjustments).

**Tool:** Django Debug Toolbar, added to `requirements.txt` as a
dev-only dependency. Gated in `config/settings.py` so it only
initialises when `DJANGO_DEBUG=true` AND `USE_SQLITE=true` (never
loads in prod).
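
A minimal sketch of the double gate, assuming both flags are read from environment variables as the literal string `true`. The helper names `env_flag` and `toolbar_enabled` are hypothetical, and how `config/settings.py` actually parses these variables may differ.

```python
import os


def env_flag(name: str, environ=os.environ) -> bool:
    """True only when the variable is set to 'true' (case-insensitive)."""
    return environ.get(name, "").strip().lower() == "true"


def toolbar_enabled(environ=os.environ) -> bool:
    # Both gates must be on; if either is missing the toolbar stays out.
    return env_flag("DJANGO_DEBUG", environ) and env_flag("USE_SQLITE", environ)


# In settings.py this might be used as follows (assumes INSTALLED_APPS
# and MIDDLEWARE are lists already defined above this point):
#
# if toolbar_enabled():
#     INSTALLED_APPS.append("debug_toolbar")
#     MIDDLEWARE.insert(0, "debug_toolbar.middleware.DebugToolbarMiddleware")
#     INTERNAL_IPS = ["127.0.0.1"]
```

Keeping the decision in one function means a production environment that lacks either variable never reaches the toolbar setup at all.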

**Process:**
1. Install toolbar, confirm the SQL panel loads on `/`.
2. Navigate to `/`, read the SQL tab: flag any query count > ~20,
   any row with `+N duplicate queries`, any queryset access
   that could be answered with `select_related`/`prefetch_related`/
   `annotate(Count/Sum)`.
3. Fix each flag with the minimal ORM change. One fix = one commit.
4. Re-run, confirm query count dropped, confirm no test regressions.
5. Repeat for `/payroll/?status=pending`, `/payroll/?status=history`,
   `/payroll/?status=loans`, `/payroll/?status=adjustments`.

**Likely suspects** (prediction, to be confirmed by toolbar):
- **Dashboard cert-expiry card**: aggregates expired/expiring certs
  across active workers. If it loops in Python instead of
  `annotate`-plus-filter, that's an N+1.
- **Pending payments table**: worker + team + overdue calc per row.
  The overdue check calls `get_pay_period(team)` per worker; if teams
  aren't prefetched we're firing one SELECT per row.
- **Adjustments tab groupings**: we fixed `worker.teams.first()` →
  `worker.teams.all()` once already (commit `06b3315`); worth
  double-checking the grouped view for similar patterns.
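
To make the N+1 shape concrete, here is a standalone sketch that uses the standard-library `sqlite3` module rather than the Django ORM, so it runs without the project. The `worker` and `team` tables are hypothetical stand-ins, not the app's schema. The JOIN plays the role that `select_related` would play in a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE worker (
        id INTEGER PRIMARY KEY, name TEXT, team_id INTEGER REFERENCES team(id)
    );
    INSERT INTO team VALUES (1, 'A'), (2, 'B');
    INSERT INTO worker VALUES (1, 'w1', 1), (2, 'w2', 1), (3, 'w3', 2);
""")

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement run


def teams_n_plus_one():
    # One query for the workers, then one extra query per worker row.
    rows = conn.execute("SELECT id, name, team_id FROM worker ORDER BY id").fetchall()
    return [
        (name, conn.execute("SELECT name FROM team WHERE id = ?", (team_id,)).fetchone()[0])
        for _id, name, team_id in rows
    ]


def teams_joined():
    # Same result from a single query.
    return conn.execute(
        "SELECT worker.name, team.name FROM worker "
        "JOIN team ON team.id = worker.team_id ORDER BY worker.id"
    ).fetchall()


statements.clear()
slow = teams_n_plus_one()
n_plus_one_count = len(statements)

statements.clear()
fast = teams_joined()
joined_count = len(statements)
```

Both functions return the same rows; the first issues one statement plus one per worker, the second issues one. That gap is what the toolbar's duplicate-query flag surfaces on a real page.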

**Out of scope for this step:** any fix that requires a template
rewrite. If something needs more than a `.select_related()` /
`.prefetch_related()` / `.annotate()` tweak, it goes on the plan-B list.

### 3. Double-check WeasyPrint is not eager-imported anywhere

**Files:** `core/utils.py`, `core/views.py`.

We already lazy-import WeasyPrint in `render_to_pdf()` (per CLAUDE.md).
I'll grep to confirm nothing else on the app-boot path imports
`weasyprint` at module level. If anything does, move it into a function
body. 10 minutes, zero-risk.
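
As a complement to grep, a small `ast`-based check can tell a module-level import apart from one inside a function body. This is a sketch under stated assumptions: `module_level_imports` is a hypothetical helper, and it only inspects top-level statements, so an import nested in a top-level `if` or `try` block would not be reported.

```python
import ast


def module_level_imports(source: str, package: str) -> list[int]:
    """Return line numbers of top-level imports of `package` in `source`."""
    hits = []
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        if any(n == package or n.startswith(package + ".") for n in names):
            hits.append(node.lineno)
    return hits
```

Running it over the text of `core/utils.py` and `core/views.py` with `package="weasyprint"` should return an empty list if the lazy import is the only one.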

### 4. Commit message includes before/after measurement

The final commit's message records:
- Page size in bytes (serialized DOM) for `/` and `/payroll/`, before and after
- Network request count on a cold-cache load
- SQL query count on both pages

If the numbers don't materially improve after steps 1-3, I stop. We
don't press on to plan B without evidence that plan A helped (or at
least surfaced what's actually slow).
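
One way to keep the before/after lines consistent is a small formatter. This is a sketch only: `format_measurement` is a hypothetical helper, and the numbers in the usage line are placeholders, not measurements.

```python
def format_measurement(label: str, before: float, after: float, unit: str = "") -> str:
    """One line of a before/after summary, with the relative change."""
    if before == 0:
        change = "n/a"  # relative change is undefined from a zero baseline
    else:
        change = f"{(after - before) / before * 100:+.1f}%"
    return f"{label}: {before:g}{unit} -> {after:g}{unit} ({change})"


# Placeholder values for illustration only, not real measurements.
print(format_measurement("SQL queries on /", 40, 10))
```

A relative change next to the raw numbers makes the "did it materially improve" call in the stop rule easier to read at a glance.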

## What I will NOT touch in this pass

- Splitting `payroll_dashboard.html` into tab partials
- Any refactoring of `views.py` or extraction of helpers
- Any visual / UX change
- Tests: query-count changes don't break the existing tests (they
  assert URL contract + output shape, not query plans). If a test
  genuinely needs updating because I materially changed a view's
  behaviour, I'll note why in the commit.

## Risks + rollback

All four changes are individually revertible. Biggest risks:

- **mtime-based token misfires on fresh containers**: mitigated by
  try/except fallback to today's behaviour
- **A `select_related` fix changes query semantics**: e.g., eager
  loading a nullable FK that used to be accessed lazily-with-None. Low
  risk on Django's ORM, but the test suite (65 tests, all passing at
  HEAD `503eff6`) will catch any behavioural regression
- **Django Debug Toolbar pulled into prod**: mitigated by double-gate
  (`DJANGO_DEBUG=true` AND `USE_SQLITE=true`) in `config/settings.py`

Rollback: `git revert <sha>` on the offending commit. No data, schema,
or URL-contract impact.

## Out of scope (explicit non-goals)

- Plan B / C items (template splitting, written baseline doc, whole-app
  measurement)
- Moving CDN assets to local / self-hosted
- Changing Flatlogic's `runserver` → gunicorn
- Turning on HTTP/2 push, service workers, or other frontend perf tooling
- Any refactor that requires a migration

## Next step

Generate an implementation plan via the writing-plans skill
(task-by-task, bite-sized steps) and then execute via
subagent-driven-development. Auto mode is active: proceed
continuously, no mid-execution checkpoints (plan A is 4 small
mechanical changes; a checkpoint adds overhead without value).

Ship alongside current `ai-dev` HEAD (`503eff6`) in the same branch.