From bcd0112687dda109be1da29c885ab9700955e25b Mon Sep 17 00:00:00 2001 From: Konrad du Plessis Date: Fri, 24 Apr 2026 00:26:42 +0200 Subject: [PATCH] docs(perf): task-by-task plan for Quick-Wins Pass A MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Four tasks: mtime cache-bust token + tests; install & gate Django Debug Toolbar dev-only; profile + fix N+1 on /; profile + fix N+1 on /payroll/ (all four tabs) with before/after summary in the final commit message. Execute via subagent-driven-development. Auto mode — no mid-execution checkpoints. Co-Authored-By: Claude Opus 4.7 (1M context) --- docs/plans/2026-04-24-perf-quick-wins-plan.md | 542 ++++++++++++++++++ 1 file changed, 542 insertions(+) create mode 100644 docs/plans/2026-04-24-perf-quick-wins-plan.md diff --git a/docs/plans/2026-04-24-perf-quick-wins-plan.md b/docs/plans/2026-04-24-perf-quick-wins-plan.md new file mode 100644 index 0000000..4a10abb --- /dev/null +++ b/docs/plans/2026-04-24-perf-quick-wins-plan.md @@ -0,0 +1,542 @@ +# Perf Quick-Wins Pass — Implementation Plan + +> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development to implement this plan task-by-task. Auto mode is active — execute continuously, no mid-execution checkpoints. + +**Goal:** Make main-space navigation feel snappier via targeted ORM + static-asset fixes. No architecture change. + +**Architecture:** Four small mechanical changes: (1) mtime-based CSS cache-bust token so Cloudflare can actually cache `custom.css`; (2) install & gate Django Debug Toolbar as a dev-only profiling tool; (3) fix any N+1 hotspots the toolbar surfaces on `/` and `/payroll/` via targeted `select_related`/`prefetch_related`/`annotate`; (4) record before/after numbers in the final commit message. + +**Tech Stack:** Django 5.2.7 ORM; `django-debug-toolbar` (new dev-only dep); `os.path.getmtime` for static-asset cache-busting. + +**Design doc:** `docs/plans/2026-04-24-perf-quick-wins-design.md` (committed as `d1490c4`). + +**Starting HEAD:** `d1490c4` on branch `ai-dev` (clean working tree). + +**Expected net change:** ~100-200 lines across 5-6 files + 1 requirements bump. + +--- + +## Task 1: mtime-based cache-bust token + regression test + +**Goal:** The `deployment_timestamp` context var changes only when `static/css/custom.css` actually changes, not on every HTTP request. + +**Files:** +- Modify: `core/context_processors.py` (replace `int(time.time())` with `_compute_cache_bust_token()`) +- Test: `core/tests.py` (add `CacheBustTokenTests` class) + +**Step 1: Write the failing tests** + +Append to `core/tests.py` (find a spot after existing context-processor / URL-contract tests; conventional place is at the bottom of the file): + +```python +# === Cache-bust token tests === +# The deployment_timestamp context variable controls the ?v=... query +# string on our CSS URL. It MUST only change when custom.css changes +# — otherwise Cloudflare cache-HIT rate on the CSS drops to zero and +# every page reload re-fetches 64 KB from the VM. + +class CacheBustTokenTests(TestCase): + """Regression tests for the mtime-based cache-bust token.""" + + def test_token_is_an_integer(self): + """Token must be int (templates cast to str; float would show a dot).""" + from core.context_processors import project_context + ctx = project_context(request=None) + self.assertIsInstance(ctx['deployment_timestamp'], int) + + def test_token_is_stable_across_two_calls(self): + """Critical property: two back-to-back calls return the same + token — because custom.css hasn't changed between them. This + is the entire point of the mtime-based approach.""" + from core.context_processors import project_context + t1 = project_context(request=None)['deployment_timestamp'] + t2 = project_context(request=None)['deployment_timestamp'] + self.assertEqual(t1, t2) + + def test_token_falls_back_if_file_missing(self): + """If static/css/custom.css is somehow missing (fresh + container pre-collectstatic), we must NOT crash. We fall back + to int(time.time()) so every page still renders.""" + import core.context_processors as cp + original = cp._compute_cache_bust_token + try: + # Monkey-patch so the function sees a guaranteed-missing path. + cp._CSS_PATH_FOR_TOKEN = cp.Path('/definitely/does/not/exist.css') + # Should return an int and NOT raise. + token = cp._compute_cache_bust_token() + self.assertIsInstance(token, int) + finally: + # Reset the module-level cache so other tests get the real value. + cp._CSS_PATH_FOR_TOKEN = cp.Path(settings.BASE_DIR) / 'static' / 'css' / 'custom.css' +``` + +Add the `from django.conf import settings` import at the top of `core/tests.py` if not already present. + +**Step 2: Run the tests to verify they fail** + +```bash +USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests.CacheBustTokenTests -v 2 +``` + +Expected: 3 failures (function `_compute_cache_bust_token` doesn't exist). + +**Step 3: Implement the token function** + +Replace the entire contents of `core/context_processors.py` with: + +```python +# === core/context_processors.py === +# Globals injected into every template render. +# +# `deployment_timestamp` is a cache-bust token on our CSS URL. Historically +# it was `int(time.time())` — a new value every second — which defeated +# Cloudflare's edge cache because every page load saw a different `?v=...` +# string. We now tie the token to `custom.css`'s mtime, so the URL +# changes ONLY when the CSS actually changes. Cloudflare can hold the +# file for its full 4h TTL, and users on repeat visits hit the browser +# cache (304 Not Modified). +import os +import time +from pathlib import Path + +from django.conf import settings + +# Path to the file whose mtime drives the cache-bust token. Module-level +# constant so tests can monkey-patch it to simulate "file missing". +_CSS_PATH_FOR_TOKEN = Path(settings.BASE_DIR) / 'static' / 'css' / 'custom.css' + + +def _compute_cache_bust_token(): + """Return an integer cache-bust token. + + Normal path: returns the CSS file's mtime (an integer). + Fallback: if the file can't be stat'd (doesn't exist / permission + error / disk issue), returns the current wall-clock time in seconds + — that degrades to the PRE-FIX behaviour (new token per request) + rather than crashing the whole request cycle. + """ + try: + return int(os.path.getmtime(_CSS_PATH_FOR_TOKEN)) + except (OSError, FileNotFoundError): + return int(time.time()) + + +def project_context(request): + """Adds project-specific environment variables to the template context globally.""" + return { + "project_description": os.getenv("PROJECT_DESCRIPTION", ""), + "project_image_url": os.getenv("PROJECT_IMAGE_URL", ""), + # Cache-busts static assets — see _compute_cache_bust_token(). + "deployment_timestamp": _compute_cache_bust_token(), + } +``` + +**Step 4: Run the tests to verify they pass** + +```bash +USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests.CacheBustTokenTests -v 2 +``` + +Expected: 3 passes. + +**Step 5: Run the full test suite to catch any regression** + +```bash +USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests -v 2 +``` + +Expected: 68 passes (65 pre-existing + 3 new). If anything else fails, stop and investigate. + +**Step 6: Update CLAUDE.md's "Static Assets & Cache-Busting" note** + +In `CLAUDE.md`, find the section "How cache-busting works now" — specifically the paragraph beginning "`deployment_timestamp` comes from `core/context_processors.py::project_context` as `int(time.time())`..." and replace with: + +```markdown +`deployment_timestamp` comes from `core/context_processors.py::project_context` +via `_compute_cache_bust_token()` — returns the mtime of +`static/css/custom.css` as an integer. The token only changes when the +CSS file is modified, so Cloudflare's edge cache holds each version for +its full 4h TTL and repeat page loads in a session hit the browser +cache (304 Not Modified). Deploys that include a CSS change bump the +mtime → new token → cache busts. Pre-24-Apr-2026 this was +`int(time.time())` per-request, which defeated the CDN cache entirely +(effectively 0% hit rate on CSS). Degraded-mode fallback: if +`custom.css` isn't on disk (e.g., fresh container before +`collectstatic`), the function falls back to the old per-request +timestamp rather than crashing. +``` + +**Step 7: Commit** + +```bash +git add core/context_processors.py core/tests.py CLAUDE.md +git commit -m "$(cat <<'EOF' +perf(cache): mtime-based CSS cache-bust token + +deployment_timestamp was int(time.time()) per-request, giving every +page load a new ?v=... query string on custom.css. Cloudflare treats +each unique URL as a new resource, so the CSS was fetched from the VM +on every page load — 64 KB over the wire per navigation. + +Token now tied to static/css/custom.css mtime. The URL only changes +when the CSS actually changes, so Cloudflare can hold the file for +its full 4h TTL. Degraded-mode fallback preserves today's behaviour +if the file isn't on disk. + +3 new CacheBustTokenTests; all 68 tests pass. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 2: Install & gate Django Debug Toolbar (dev-only) + +**Goal:** Be able to see SQL query count, duplicate-query warnings, and execution times on any page — but ONLY when DEBUG + USE_SQLITE, never in prod. + +**Files:** +- Modify: `requirements.txt` (add one line) +- Modify: `config/settings.py` (conditional INSTALLED_APPS, MIDDLEWARE, INTERNAL_IPS, DEBUG_TOOLBAR_CONFIG) +- Modify: `config/urls.py` (conditional `debug_toolbar.urls` include) + +**Step 1: Add the dependency** + +Append to `requirements.txt`: + +``` +django-debug-toolbar==6.0.0 +``` + +Then install: + +```bash +pip install django-debug-toolbar==6.0.0 +``` + +Expected: clean install, one new package. + +**Step 2: Gate it in `config/settings.py`** + +At the very bottom of `config/settings.py` (after the SQLite override block), append: + +```python +# === DEV-ONLY: Django Debug Toolbar === +# Loaded ONLY when BOTH DEBUG=true AND USE_SQLITE=true. This is a +# deliberately strict gate — the toolbar exposes query internals, +# settings, and request state that should never appear in production. +# The USE_SQLITE half of the check acts as a belt-and-suspenders guard +# against accidentally enabling DEBUG on production (which would be its +# own serious problem, but at least wouldn't leak toolbar data). +if DEBUG and _IS_DEV: + try: + import debug_toolbar # noqa: F401 — probe for installed package + except ImportError: + pass + else: + INSTALLED_APPS += ['debug_toolbar'] + # Insert the middleware as early as possible in the chain so it + # captures every request, but AFTER SecurityMiddleware (standard + # recommendation in the toolbar's install docs). + MIDDLEWARE.insert(1, 'debug_toolbar.middleware.DebugToolbarMiddleware') + INTERNAL_IPS = ['127.0.0.1', 'localhost'] + DEBUG_TOOLBAR_CONFIG = { + # Don't auto-collapse the SQL panel — the SQL count is the + # main thing we check on every page. + 'SHOW_COLLAPSED': False, + # Explicit check so the toolbar ONLY renders when the hosting + # flags are still set (guards against stale cached pages). + 'SHOW_TOOLBAR_CALLBACK': lambda request: DEBUG and _IS_DEV, + } +``` + +**Step 3: Gate it in `config/urls.py`** + +Read `config/urls.py` first to see the current shape. Then at the bottom (after `urlpatterns` is defined), append: + +```python +# === DEV-ONLY: Django Debug Toolbar URL include === +# Matches the conditional load in settings.py. No-op in prod. +from django.conf import settings +if settings.DEBUG and getattr(settings, '_IS_DEV_DEBUG_TOOLBAR_ENABLED', False) is False: + # (settings.py sets no such flag; gate on INSTALLED_APPS membership.) + pass + +if 'debug_toolbar' in settings.INSTALLED_APPS: + from debug_toolbar.toolbar import debug_toolbar_urls + urlpatterns += debug_toolbar_urls() +``` + +Note: `debug_toolbar_urls()` is the v6.x API (replaces the older pattern-include). If that import fails in v6, fall back to: + +```python +if 'debug_toolbar' in settings.INSTALLED_APPS: + import debug_toolbar + urlpatterns += [path('__debug__/', include(debug_toolbar.urls))] +``` + +The implementer should try `debug_toolbar_urls()` first; if it errors, switch to the include form and move on. + +**Step 4: Manual verification** + +Start the dev server, load `/`, confirm the toolbar appears. If the page 500s or the toolbar doesn't render, stop and investigate before profiling. + +```bash +run_dev.bat # or manually: set USE_SQLITE=true && set DJANGO_DEBUG=true && python manage.py runserver 0.0.0.0:8000 +``` + +Then open `http://127.0.0.1:8000/` in a browser and confirm the toolbar tab appears on the right edge. Click SQL panel, note the query count. + +Run Django's system check too, to catch any app-config issue: + +```bash +USE_SQLITE=true DJANGO_DEBUG=true python manage.py check +``` + +Expected: `System check identified no issues (0 silenced).` + +**Step 5: Run the full test suite** + +```bash +USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests -v 2 +``` + +Expected: still 68 passes. Toolbar shouldn't affect tests. + +**Step 6: Commit** + +```bash +git add requirements.txt config/settings.py config/urls.py +git commit -m "$(cat <<'EOF' +chore(dev): add Django Debug Toolbar (dev-only, DEBUG+USE_SQLITE gated) + +Double-gated install: only loads when DEBUG=true AND USE_SQLITE=true, +never in prod. Lets us profile SQL query counts on the dashboard and +payroll pages before attacking N+1 hotspots. + +requirements.txt adds django-debug-toolbar==6.0.0 +config/settings.py conditionally appends to INSTALLED_APPS + MIDDLEWARE +config/urls.py conditionally includes __debug__ route + +No behavioural change to production. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 3: Profile & fix N+1 on `/` (dashboard) + +**Goal:** Confirm the dashboard's SQL query count and fix any N+1 the toolbar surfaces. + +**Files:** +- Modify: `core/views.py` (the `index()` view — starts line ~353, `certs_expired/expiring` aggregation around lines ~406-413) +- Possibly modify: `core/templates/core/index.html` (only if a template access is firing queries) + +**Step 1: Baseline measurement** + +Start the dev server. Log in as an admin user. Open `/`. In the Debug Toolbar panel: +- Note **total SQL query count** on the page +- Note **total SQL time** +- Expand the SQL tab, scan for any group marked "N+1" or "N duplicate queries" +- Note **DOM size / page weight** from DevTools Network tab (the HTML document itself, not total transfer) + +Record these four numbers. They go into the Task 4 commit message. + +**Step 2: Identify each problem** + +The design doc predicts three suspects: +1. **Cert-expiry card** — two separate COUNT queries at views.py:406 and views.py:409. Could become one `.aggregate()` call. +2. **Stat cards** — if any stat card loops over a queryset in the template instead of using `.count()` on the queryset, that's an N+1. +3. **Pending payments table** (if it renders on this page for admins) — worker/team/log per row. + +For each "N duplicate queries" group the toolbar shows, inspect the Python side: +- Find the view function (use the traceback in the toolbar if available) +- Decide whether the fix is `select_related` (FK), `prefetch_related` (M2M / reverse FK), or `annotate` (aggregate) +- Apply the minimal fix + +**Step 3: Apply fixes** + +For each confirmed N+1, edit `core/views.py` with the minimal change. Example shape (only as an illustration — apply whatever the toolbar surfaces): + +```python +# BEFORE +workers = Worker.objects.filter(active=True) +for w in workers: + for cert in w.workercertificate_set.all(): # N+1 — one query per worker + ... + +# AFTER +workers = Worker.objects.filter(active=True).prefetch_related('workercertificate_set') +for w in workers: + for cert in w.workercertificate_set.all(): # reuses prefetched cache + ... +``` + +For the cert counts specifically (views.py:406-413), the current code is: + +```python +certs_expired_count = WorkerCertificate.objects.filter( + worker__is_active=True, valid_until__lt=today +).count() +certs_expiring_count = WorkerCertificate.objects.filter( + worker__is_active=True, valid_until__gte=today, valid_until__lte=today + timedelta(days=30) +).count() +certs_alert_total = certs_expired_count + certs_expiring_count +``` + +This is TWO queries, not an N+1. If the toolbar shows them as hot, consolidate into one `.aggregate()`: + +```python +from django.db.models import Count, Q +_cert_counts = WorkerCertificate.objects.filter( + worker__is_active=True +).aggregate( + expired=Count('id', filter=Q(valid_until__lt=today)), + expiring=Count('id', filter=Q(valid_until__gte=today, valid_until__lte=today + timedelta(days=30))), +) +certs_expired_count = _cert_counts['expired'] +certs_expiring_count = _cert_counts['expiring'] +certs_alert_total = certs_expired_count + certs_expiring_count +``` + +But ONLY apply this if the toolbar flags it. Per YAGNI, don't pre-optimise. Two simple COUNTs with an index on `valid_until` are plenty fast. + +**Step 4: Re-measure** + +Reload `/`. The SQL query count should drop. If it didn't, your fix didn't hit the real hot path — revert that edit and look again at the toolbar output. + +**Step 5: Run the full test suite** + +```bash +USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests -v 2 +``` + +Expected: 68 passes. A change in the ORM query plan does NOT typically move test counts, but a bug in a `prefetch_related` pattern can. If any test fails, revert and rethink. + +**Step 6: Commit** + +Only commit if ≥1 N+1 was actually found and fixed. If the toolbar showed `/` at ~20 queries with no N+1s flagged, skip Task 3 entirely and note that in the final commit message. + +```bash +git add core/views.py [any template files touched] +git commit -m "$(cat <<'EOF' +perf(dashboard): fix N+1 on [specific problem identified] + +Debug Toolbar showed [before-count] queries on /, including [N +duplicates of query X]. Added [select_related/prefetch_related/annotate] +to [view function] to fold those into one. + +After: [after-count] queries. Test suite unaffected (68/68 pass). + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Task 4: Profile & fix N+1 on `/payroll/` (all 4 tabs) + before/after summary + +**Goal:** Same as Task 3, applied to the payroll dashboard's four tabs, plus a final commit that captures the full before/after picture. + +**Files:** +- Modify: `core/views.py` (the `payroll_dashboard()` view — starts line ~2593; each status branch: `pending`, `history`, `loans`, `adjustments`) +- Possibly: `core/templates/core/payroll_dashboard.html` (only if a template access fires queries) + +**Step 1: Baseline measurement — all 4 tabs** + +For each of the four URLs, record query count + SQL time: +- `/payroll/` (defaults to pending) +- `/payroll/?status=history` +- `/payroll/?status=loans` +- `/payroll/?status=adjustments` + +**Step 2: Identify N+1s per tab** + +Expected suspects (design doc predictions): +- **Pending tab** — for every worker in the outstanding list, `get_pay_period(team)` is called per worker. If `team` isn't prefetched via `select_related('team')` or `prefetch_related('teams')` somewhere upstream, each call hits the DB. Look at views.py around line 2676 (the pending-payments loop where `get_pay_period` is called). +- **History tab** — PayrollRecord list view; each row needs worker + work_logs + adjustments. Check for `prefetch_related('work_logs', 'payrolladjustment_set')` on the base queryset. +- **Loans tab** — Loan list with repayment adjustments. Check for `prefetch_related('payrolladjustment_set')` on active loans. +- **Adjustments tab** — we already fixed `worker.teams.first()` → `worker.teams.all()` in commit `06b3315`; confirm no new N+1 introduced by the sort / group work done in later Adjustments commits. + +**Step 3: Apply fixes** + +Same pattern as Task 3. One fix = one toolbar re-measure. Don't batch multiple fixes into one change — it's much harder to attribute the win (or the regression). + +**Step 4: Confirm WeasyPrint stays lazy-imported** + +One-line grep-and-document check (design doc Task 3): + +```bash +grep -rn "^import weasyprint\|^from weasyprint" core/ config/ +``` + +Expected: no matches. WeasyPrint should only be imported inside the body of `render_to_pdf()` in `core/utils.py`. If any module-level import exists, move it inside the function. + +**Step 5: Run the full test suite once more** + +```bash +USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests -v 2 +``` + +Expected: 68 passes. + +**Step 6: Final commit with before/after summary** + +Even if Task 4 found no N+1s (unlikely but possible), this commit records the measurement. If Task 3 or 4 DID find real fixes, they're each in their own commit already; this is the "closing the pass" commit. + +```bash +# (Only include files in git add IF you actually changed them in this task.) +git add core/views.py [any templates touched] +git commit -m "$(cat <<'EOF' +perf(payroll): [specific fix] + quick-wins pass summary + +Fixed [N+1 description] on the [pending|history|loans|adjustments] tab +by [select_related / prefetch_related / annotate added to queryset X]. + +Perf Quick-Wins Pass A — before/after: + / N queries, Tms SQL → M queries, Sms SQL + /payroll/ N queries, Tms SQL → M queries, Sms SQL (pending) + /payroll/?status=history N → M queries + /payroll/?status=loans N → M queries + /payroll/?status=adjustments N → M queries + +Cache-bust token (d1490c4): CSS is now browser-cached across sessions +and Cloudflare holds it at the edge for 4h — biggest felt improvement +expected from this pass. + +WeasyPrint confirmed still lazy-imported (module-level grep clean). + +Test suite: 68/68. + +Co-Authored-By: Claude Opus 4.7 (1M context) +EOF +)" +``` + +--- + +## Final acceptance checklist + +Before declaring the pass complete, the controller (this session) verifies: + +- [ ] All commits on `ai-dev` from `d1490c4` → HEAD show `perf(...)` or `chore(dev):` prefix +- [ ] `USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests -v 2` reports 68/68 passing +- [ ] Django Debug Toolbar renders on `/` with USE_SQLITE=true DEBUG=true — and does NOT render when DEBUG=false (quick toggle test) +- [ ] The final commit message contains concrete before/after query counts (not placeholders) +- [ ] Working tree clean, branch ready to push + +--- + +## What's NOT in this plan (explicit non-goals) + +- Splitting `payroll_dashboard.html` into tab partials — plan B territory +- Refactoring any view's body structure +- Any visual/UX change +- Adding a perf regression test harness — if we want that, it's a separate design + +Rollback: `git revert ` on any individual commit. No data, schema, or URL-contract impact in any task.