38686-vm/docs/plans/2026-04-24-perf-quick-wins-design.md
Konrad du Plessis d1490c4639 docs(perf): design for Quick-Wins Pass A
Short design covering four changes: mtime-based CSS cache-bust token,
Django Debug Toolbar (dev-only) for profiling, N+1 fixes on Dashboard
and Payroll pages, and a before/after measurement in the commit message.
Scope is deliberately tight — plan B (template splitting) and plan C
(full audit) are deferred until plan A evidence lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:23:48 +02:00

Perf Quick-Wins Pass — Design (24 Apr 2026)

Origin

Konrad, after a long stretch of feature work (Inline Filters + Adjustments tab + filter-bar v2):

"the app feel a bit sluggish especially changing between main spaces. Go through the app systematically and look for bugs and un optimized code. systematically go through the code and expertly and thoroughly review and fix it."

Presented three scopes (A quick-wins / B focused dashboard pass / C full systematic audit). Konrad picked A — quick-wins first, on the principle that perf work is notorious for "big rewrite that didn't help." If A moves the needle, we can stop; if not, we escalate with evidence.

Goal

Make navigation between main spaces (Dashboard ↔ Payroll ↔ Workers ↔ Report ↔ Teams ↔ Projects) feel snappier. Ship in 1-3 commits. No architecture changes. Every change individually revertible.

Who it's for

Everyone who uses the app — most immediately Konrad, who navigates between Dashboard and Payroll dozens of times a day.

What we already know (pre-measurement)

  • payroll_dashboard.html is 213 KB / 4,147 lines — all 4 tabs rendered server-side even when only one is visible. Addressed in plan B, not A.
  • deployment_timestamp context var is int(time.time()) per-request → custom.css?v=<timestamp> is a new URL every second → Cloudflare edge-cache HIT rate on CSS is effectively 0 → every page load fetches 64 KB of CSS from the VM. Documented as a trade-off in CLAUDE.md. This is almost certainly the biggest single contributor to the "heavy navigation" feel.
  • 49 select_related/prefetch_related calls vs 91 .all()/.first()/.count() calls in views.py. Not damning on its own, but a reason to check the hot-path views first.

Scope — 4 changes, in order

1. Fix deployment_timestamp to bust cache only on real deploys

File: core/context_processors.py

Today:

'deployment_timestamp': int(time.time()),

After:

# Cache-bust token tied to the CSS file's mtime — only changes when
# custom.css actually changes. Falls back to int(time.time()) if the
# file isn't on disk yet (fresh container, pre-collectstatic).
try:
    _css_path = settings.BASE_DIR / 'static' / 'css' / 'custom.css'
    _token = int(os.path.getmtime(_css_path))
except OSError:  # FileNotFoundError is a subclass of OSError, so this covers it
    _token = int(time.time())

Effect: the ?v=... query string stays constant across requests until custom.css is modified, so Cloudflare can finally hold the file at its edge for its full 4h TTL. After the first hit in a 4h window, repeat navigation within a session no longer fetches 64 KB from the VM: the browser serves the file from its own cache, or at most revalidates it and gets a 304 Not Modified.

Degraded-mode guarantee: if the file is missing (shouldn't happen in normal dev or prod, but could in a fresh container), we degrade to today's behaviour (per-request timestamp) rather than crash.
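A minimal, standalone sketch of the token logic, so the stable-until-modified and degraded-mode behaviours can be checked outside Django. The function name css_cache_bust_token and the temp-file demo are illustrative assumptions; the real change lives inline in core/context_processors.py and reads the path from settings.BASE_DIR.

```python
import os
import tempfile
import time
from pathlib import Path


def css_cache_bust_token(css_path):
    """Return an integer token that only changes when the CSS file changes.

    Hypothetical helper name; mirrors the try/except in the design above.
    """
    try:
        # mtime stays the same across requests until the file is rewritten.
        return int(os.path.getmtime(css_path))
    except OSError:
        # Missing file (e.g. fresh container): fall back to a per-request value.
        return int(time.time())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        css = Path(tmp) / "custom.css"
        css.write_text("body { margin: 0; }")

        # Pin the mtime so the demo is deterministic.
        os.utime(css, (1_700_000_000, 1_700_000_000))
        print(css_cache_bust_token(css))  # 1700000000, on every call

        # Simulate a deploy that rewrites the file 500 seconds later.
        os.utime(css, (1_700_000_500, 1_700_000_500))
        print(css_cache_bust_token(css))  # 1700000500

        # Missing file: degrades to the current time instead of raising.
        print(css_cache_bust_token(Path(tmp) / "missing.css"))
```

One caveat worth keeping in mind: the token tracks the file's mtime, not its content, so a deploy step that rewrites custom.css without changing it would still bust the cache once.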

2. Profile + fix N+1 on the two busiest pages

Pages: / (dashboard) and /payroll/ (payroll dashboard — all 4 tabs — Pending, History, Loans, Adjustments).

Tool: Django Debug Toolbar, added to requirements.txt as a dev-only dependency. Gated in config/settings.py so it only initialises when DJANGO_DEBUG=true AND USE_SQLITE=true (never loads in prod).
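The double gate could look roughly like the sketch below. The helper names (_env_flag, toolbar_enabled, apply_toolbar) and the rule that only the literal string "true" counts as on are assumptions for illustration; config/settings.py may parse these variables differently. The app label debug_toolbar and the middleware path are the ones the toolbar package itself documents.

```python
import os


def _env_flag(name, environ):
    """Assumed parsing rule: a flag is on only when set to 'true' (any case)."""
    return environ.get(name, "").strip().lower() == "true"


def toolbar_enabled(environ=os.environ):
    # Both flags must be on. Prod sets neither, so the toolbar never loads there.
    return _env_flag("DJANGO_DEBUG", environ) and _env_flag("USE_SQLITE", environ)


def apply_toolbar(installed_apps, middleware, environ=os.environ):
    """Return (apps, middleware), with toolbar entries added only if gated on."""
    if not toolbar_enabled(environ):
        return list(installed_apps), list(middleware)
    apps = list(installed_apps) + ["debug_toolbar"]
    # The toolbar middleware should sit as early as possible in the stack.
    mw = ["debug_toolbar.middleware.DebugToolbarMiddleware"] + list(middleware)
    return apps, mw


if __name__ == "__main__":
    dev = {"DJANGO_DEBUG": "true", "USE_SQLITE": "true"}
    half = {"DJANGO_DEBUG": "true"}  # debug on, but not the SQLite dev setup
    print(toolbar_enabled(dev))   # True
    print(toolbar_enabled(half))  # False
    print(apply_toolbar(["core"], ["django.middleware.common.CommonMiddleware"], dev))
```

In settings.py the same gate would wrap the INSTALLED_APPS and MIDDLEWARE additions; the toolbar also needs its URLs included and the client address listed in INTERNAL_IPS before the panel renders.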

Process:

  1. Install toolbar, confirm the SQL panel loads on /.
  2. Navigate to /, read the SQL panel: flag any query count > ~20, any row marked with +N duplicate queries, and any queryset access that could be answered with select_related / prefetch_related / annotate(Count/Sum).
  3. Fix each flag with the minimal ORM change. One fix = one commit.
  4. Re-run, confirm query count dropped, confirm no test regressions.
  5. Repeat for /payroll/?status=pending, /payroll/?status=history, /payroll/?status=loans, /payroll/?status=adjustments.

Likely suspects (prediction — to be confirmed by toolbar):

  • Dashboard cert-expiry card: aggregates expired/expiring certs across active workers. If it loops in Python instead of annotate-plus-filter, that's an N+1.
  • Pending payments table: worker + team + overdue calc per row. The overdue check calls get_pay_period(team) per worker; if teams aren't prefetched we're firing one SELECT per row.
  • Adjustments tab groupings: we fixed worker.teams.first() → worker.teams.all() once already (commit 06b3315); worth double-checking the grouped view for similar patterns.

Out of scope for this step: any fix that requires a template rewrite. If something needs more than a .select_related() / .prefetch_related() / .annotate() tweak, it goes on the plan-B list.
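To make the "one SELECT per row" pattern concrete without depending on the project's models, here is an ORM-free sketch in plain sqlite3 that counts statements for both shapes of the query. The team/worker tables and their rows are hypothetical stand-ins, not the app's schema; the joined version is roughly what select_related emits for a nullable foreign key (a LEFT JOIN).

```python
import sqlite3


def make_db():
    """Build a tiny in-memory database (hypothetical schema, for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE team (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE worker (
            id INTEGER PRIMARY KEY,
            name TEXT,
            team_id INTEGER REFERENCES team(id)
        );
        INSERT INTO team VALUES (1, 'Alpha'), (2, 'Bravo');
        INSERT INTO worker VALUES
            (1, 'w1', 1), (2, 'w2', 1), (3, 'w3', 2), (4, 'w4', NULL);
        """
    )
    return conn


def count_selects(conn, fn):
    """Run fn(conn) and return (result, number of SELECT statements executed)."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        result = fn(conn)
    finally:
        conn.set_trace_callback(None)
    selects = [s for s in seen if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)


def rows_n_plus_one(conn):
    # One query for the workers, then one more per worker to fetch its team.
    workers = conn.execute("SELECT name, team_id FROM worker ORDER BY id").fetchall()
    out = []
    for name, team_id in workers:
        team = conn.execute("SELECT name FROM team WHERE id = ?", (team_id,)).fetchone()
        out.append((name, team[0] if team else None))
    return out


def rows_joined(conn):
    # Same rows from a single LEFT JOIN; workers without a team keep a NULL.
    sql = (
        "SELECT w.name, t.name FROM worker w "
        "LEFT JOIN team t ON t.id = w.team_id ORDER BY w.id"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = make_db()
    slow, n_slow = count_selects(conn, rows_n_plus_one)
    fast, n_fast = count_selects(conn, rows_joined)
    print(slow == fast, n_slow, n_fast)  # True 5 1
```

With four workers the loop issues five SELECTs against one for the join, and the gap grows linearly with the row count. That growth is the signature to look for in the toolbar's SQL panel.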

3. Double-check WeasyPrint is not eager-imported anywhere

Files: core/utils.py, core/views.py.

We already lazy-import WeasyPrint in render_to_pdf() (per CLAUDE.md). I'll grep to confirm nothing else on the app-boot path imports weasyprint at module level. If anything does, move it into a function body. 10 minutes, zero-risk.
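A plain grep also matches imports that already sit inside function bodies, so a slightly stricter check is sketched here using the standard-library ast module. The function name module_level_imports is an assumption for illustration; it reports only imports that would execute when the module itself is imported (top level, or inside if/try/class blocks), and skips anything inside a function.

```python
import ast


def module_level_imports(source, package="weasyprint"):
    """Return line numbers of `package` imports that run at module import time."""
    hits = []

    def visit(node):
        for child in ast.iter_child_nodes(node):
            # Function bodies only run when called, so imports there are lazy.
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)):
                continue
            if isinstance(child, ast.Import):
                names = [alias.name for alias in child.names]
            elif isinstance(child, ast.ImportFrom) and child.level == 0:
                # level == 0 means an absolute import, not `from .x import y`.
                names = [child.module or ""]
            else:
                names = []
            if any(n == package or n.startswith(package + ".") for n in names):
                hits.append(child.lineno)
            # Recurse so imports under if/try/class at module level are seen.
            visit(child)

    visit(ast.parse(source))
    return hits


if __name__ == "__main__":
    eager = "import os\nimport weasyprint\n"
    lazy = "def render_to_pdf():\n    from weasyprint import HTML\n    return HTML\n"
    print(module_level_imports(eager))  # [2]
    print(module_level_imports(lazy))   # []
```

Running it over the text of core/utils.py and core/views.py (for example via pathlib.Path.read_text) should return an empty list for each file if the lazy import is the only one.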

4. Commit message includes before/after measurement

The final commit's message records:

  • Page size bytes (DOM serialized) for / and /payroll/ before & after
  • Network request count on a cold-cache page load
  • SQL query count on both pages

If the numbers don't materially improve after steps 1-3, I stop. We don't press on to plan B without evidence that plan A helped (or at least surfaced what's actually slow).

What I will NOT touch in this pass

  • Splitting payroll_dashboard.html into tab partials
  • Any refactoring of views.py or extraction of helpers
  • Any visual / UX change
  • Tests — query-count changes don't break the existing tests (they assert URL contract + output shape, not query plans). If a test genuinely needs updating because I materially changed a view's behaviour, I'll note why in the commit

Risks + rollback

All four changes are individually revertible. Biggest risks:

  • mtime-based token misfires on fresh containers: mitigated by the try/except fallback to today's behaviour
  • A select_related fix changes query semantics (e.g., eager-loading a nullable FK that used to be accessed lazily-with-None): low risk with Django's ORM, and the test suite (65 tests, all passing at HEAD 503eff6) will catch any behavioural regression
  • Django Debug Toolbar loaded in prod: mitigated by the double gate (DJANGO_DEBUG=true AND USE_SQLITE=true) in config/settings.py

Rollback: git revert <sha> on the offending commit. No data, schema, or URL-contract impact.

Out of scope (explicit non-goals)

  • Plan B / C items (template splitting, written baseline doc, whole-app measurement)
  • Moving CDN assets to local / self-hosted
  • Changing Flatlogic's runserver → gunicorn
  • Turning on HTTP/2 push, service workers, or other frontend perf tooling
  • Any refactor that requires a migration

Next step

Generate an implementation plan via the writing-plans skill (task-by-task, bite-sized steps) and then execute via subagent-driven-development. Auto mode is active — proceed continuously, no mid-execution checkpoints (plan A is 4 small mechanical changes; a checkpoint adds overhead without value).

Ship alongside current ai-dev HEAD (503eff6) in the same branch.