38686-vm/docs/plans/2026-04-24-perf-quick-wins-plan.md
Konrad du Plessis bcd0112687 docs(perf): task-by-task plan for Quick-Wins Pass A
Four tasks: mtime cache-bust token + tests; install & gate Django
Debug Toolbar dev-only; profile + fix N+1 on /; profile + fix N+1 on
/payroll/ (all four tabs) with before/after summary in the final
commit message.

Execute via subagent-driven-development. Auto mode — no mid-execution
checkpoints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 00:26:42 +02:00

22 KiB

Perf Quick-Wins Pass — Implementation Plan

For Claude: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development to implement this plan task-by-task. Auto mode is active — execute continuously, no mid-execution checkpoints.

Goal: Make main-space navigation feel snappier via targeted ORM + static-asset fixes. No architecture change.

Architecture: Four small mechanical changes: (1) mtime-based CSS cache-bust token so Cloudflare can actually cache custom.css; (2) install & gate Django Debug Toolbar as a dev-only profiling tool; (3) fix any N+1 hotspots the toolbar surfaces on / and /payroll/ via targeted select_related/prefetch_related/annotate; (4) record before/after numbers in the final commit message.

Tech Stack: Django 5.2.7 ORM; django-debug-toolbar (new dev-only dep); os.path.getmtime for static-asset cache-busting.

Design doc: docs/plans/2026-04-24-perf-quick-wins-design.md (committed as d1490c4).

Starting HEAD: d1490c4 on branch ai-dev (clean working tree).

Expected net change: ~100-200 lines across 5-6 files + 1 requirements bump.


Task 1: mtime-based cache-bust token + regression test

Goal: The deployment_timestamp context var changes only when static/css/custom.css actually changes, not on every HTTP request.

Files:

  • Modify: core/context_processors.py (replace int(time.time()) with _compute_cache_bust_token())
  • Test: core/tests.py (add CacheBustTokenTests class)

Step 1: Write the failing tests

Append to core/tests.py (find a spot after existing context-processor / URL-contract tests; conventional place is at the bottom of the file):

# === Cache-bust token tests ===
# The deployment_timestamp context variable controls the ?v=... query
# string on our CSS URL. It MUST only change when custom.css changes
# — otherwise Cloudflare cache-HIT rate on the CSS drops to zero and
# every page reload re-fetches 64 KB from the VM.

class CacheBustTokenTests(TestCase):
    """Regression tests for the mtime-based cache-bust token."""

    def test_token_is_an_integer(self):
        """Token must be int (templates cast to str; float would show a dot)."""
        from core.context_processors import project_context
        ctx = project_context(request=None)
        self.assertIsInstance(ctx['deployment_timestamp'], int)

    def test_token_is_stable_across_two_calls(self):
        """Critical property: two back-to-back calls return the same
        token — because custom.css hasn't changed between them. This
        is the entire point of the mtime-based approach."""
        from core.context_processors import project_context
        t1 = project_context(request=None)['deployment_timestamp']
        t2 = project_context(request=None)['deployment_timestamp']
        self.assertEqual(t1, t2)

    def test_token_falls_back_if_file_missing(self):
        """If static/css/custom.css is somehow missing (fresh
        container pre-collectstatic), we must NOT crash. We fall back
        to int(time.time()) so every page still renders."""
        import core.context_processors as cp
        original = cp._compute_cache_bust_token
        try:
            # Monkey-patch so the function sees a guaranteed-missing path.
            cp._CSS_PATH_FOR_TOKEN = cp.Path('/definitely/does/not/exist.css')
            # Should return an int and NOT raise.
            token = cp._compute_cache_bust_token()
            self.assertIsInstance(token, int)
        finally:
            # Reset the module-level cache so other tests get the real value.
            cp._CSS_PATH_FOR_TOKEN = cp.Path(settings.BASE_DIR) / 'static' / 'css' / 'custom.css'

Add the from django.conf import settings import at the top of core/tests.py if not already present.

Step 2: Run the tests to verify they fail

USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests.CacheBustTokenTests -v 2

Expected: 3 failures (function _compute_cache_bust_token doesn't exist).

Step 3: Implement the token function

Replace the entire contents of core/context_processors.py with:

# === core/context_processors.py ===
# Globals injected into every template render.
#
# `deployment_timestamp` is a cache-bust token on our CSS URL. Historically
# it was `int(time.time())` — a new value every second — which defeated
# Cloudflare's edge cache because every page load saw a different `?v=...`
# string. We now tie the token to `custom.css`'s mtime, so the URL
# changes ONLY when the CSS actually changes. Cloudflare can hold the
# file for its full 4h TTL, and users on repeat visits hit the browser
# cache (304 Not Modified).
import os
import time
from pathlib import Path

from django.conf import settings

# Path to the file whose mtime drives the cache-bust token. Module-level
# constant so tests can monkey-patch it to simulate "file missing".
_CSS_PATH_FOR_TOKEN = Path(settings.BASE_DIR) / 'static' / 'css' / 'custom.css'


def _compute_cache_bust_token():
    """Return an integer cache-bust token.

    Normal path: returns the CSS file's mtime (an integer).
    Fallback: if the file can't be stat'd (doesn't exist / permission
    error / disk issue), returns the current wall-clock time in seconds
    — that degrades to the PRE-FIX behaviour (new token per request)
    rather than crashing the whole request cycle.
    """
    try:
        return int(os.path.getmtime(_CSS_PATH_FOR_TOKEN))
    except (OSError, FileNotFoundError):
        return int(time.time())


def project_context(request):
    """Adds project-specific environment variables to the template context globally."""
    return {
        "project_description": os.getenv("PROJECT_DESCRIPTION", ""),
        "project_image_url": os.getenv("PROJECT_IMAGE_URL", ""),
        # Cache-busts static assets — see _compute_cache_bust_token().
        "deployment_timestamp": _compute_cache_bust_token(),
    }

Step 4: Run the tests to verify they pass

USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests.CacheBustTokenTests -v 2

Expected: 3 passes.

Step 5: Run the full test suite to catch any regression

USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests -v 2

Expected: 68 passes (65 pre-existing + 3 new). If anything else fails, stop and investigate.

Step 6: Update CLAUDE.md's "Static Assets & Cache-Busting" note

In CLAUDE.md, find the section "How cache-busting works now" — specifically the paragraph beginning "deployment_timestamp comes from core/context_processors.py::project_context as int(time.time())..." and replace with:

`deployment_timestamp` comes from `core/context_processors.py::project_context`
via `_compute_cache_bust_token()` — returns the mtime of
`static/css/custom.css` as an integer. The token only changes when the
CSS file is modified, so Cloudflare's edge cache holds each version for
its full 4h TTL and repeat page loads in a session hit the browser
cache (304 Not Modified). Deploys that include a CSS change bump the
mtime → new token → cache busts. Pre-24-Apr-2026 this was
`int(time.time())` per-request, which defeated the CDN cache entirely
(effectively 0% hit rate on CSS). Degraded-mode fallback: if
`custom.css` isn't on disk (e.g., fresh container before
`collectstatic`), the function falls back to the old per-request
timestamp rather than crashing.

Step 7: Commit

git add core/context_processors.py core/tests.py CLAUDE.md
git commit -m "$(cat <<'EOF'
perf(cache): mtime-based CSS cache-bust token

deployment_timestamp was int(time.time()) per-request, giving every
page load a new ?v=... query string on custom.css. Cloudflare treats
each unique URL as a new resource, so the CSS was fetched from the VM
on every page load — 64 KB over the wire per navigation.

Token now tied to static/css/custom.css mtime. The URL only changes
when the CSS actually changes, so Cloudflare can hold the file for
its full 4h TTL. Degraded-mode fallback preserves today's behaviour
if the file isn't on disk.

3 new CacheBustTokenTests; all 68 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"

Task 2: Install & gate Django Debug Toolbar (dev-only)

Goal: Be able to see SQL query count, duplicate-query warnings, and execution times on any page — but ONLY when DEBUG + USE_SQLITE, never in prod.

Files:

  • Modify: requirements.txt (add one line)
  • Modify: config/settings.py (conditional INSTALLED_APPS, MIDDLEWARE, INTERNAL_IPS, DEBUG_TOOLBAR_CONFIG)
  • Modify: config/urls.py (conditional debug_toolbar.urls include)

Step 1: Add the dependency

Append to requirements.txt:

django-debug-toolbar==6.0.0

Then install:

pip install django-debug-toolbar==6.0.0

Expected: clean install, one new package.

Step 2: Gate it in config/settings.py

At the very bottom of config/settings.py (after the SQLite override block), append:

# === DEV-ONLY: Django Debug Toolbar ===
# Loaded ONLY when BOTH DEBUG=true AND USE_SQLITE=true. This is a
# deliberately strict gate — the toolbar exposes query internals,
# settings, and request state that should never appear in production.
# The USE_SQLITE half of the check acts as a belt-and-suspenders guard
# against accidentally enabling DEBUG on production (which would be its
# own serious problem, but at least wouldn't leak toolbar data).
if DEBUG and _IS_DEV:
    try:
        import debug_toolbar  # noqa: F401 — probe for installed package
    except ImportError:
        pass
    else:
        INSTALLED_APPS += ['debug_toolbar']
        # Insert the middleware as early as possible in the chain so it
        # captures every request, but AFTER SecurityMiddleware (standard
        # recommendation in the toolbar's install docs).
        MIDDLEWARE.insert(1, 'debug_toolbar.middleware.DebugToolbarMiddleware')
        INTERNAL_IPS = ['127.0.0.1', 'localhost']
        DEBUG_TOOLBAR_CONFIG = {
            # Don't auto-collapse the SQL panel — the SQL count is the
            # main thing we check on every page.
            'SHOW_COLLAPSED': False,
            # Explicit check so the toolbar ONLY renders when the hosting
            # flags are still set (guards against stale cached pages).
            'SHOW_TOOLBAR_CALLBACK': lambda request: DEBUG and _IS_DEV,
        }

Step 3: Gate it in config/urls.py

Read config/urls.py first to see the current shape. Then at the bottom (after urlpatterns is defined), append:

# === DEV-ONLY: Django Debug Toolbar URL include ===
# Matches the conditional load in settings.py. No-op in prod.
from django.conf import settings
if settings.DEBUG and getattr(settings, '_IS_DEV_DEBUG_TOOLBAR_ENABLED', False) is False:
    # (settings.py sets no such flag; gate on INSTALLED_APPS membership.)
    pass

if 'debug_toolbar' in settings.INSTALLED_APPS:
    from debug_toolbar.toolbar import debug_toolbar_urls
    urlpatterns += debug_toolbar_urls()

Note: debug_toolbar_urls() is the v6.x API (replaces the older pattern-include). If that import fails in v6, fall back to:

if 'debug_toolbar' in settings.INSTALLED_APPS:
    import debug_toolbar
    urlpatterns += [path('__debug__/', include(debug_toolbar.urls))]

The implementer should try debug_toolbar_urls() first; if it errors, switch to the include form and move on.

Step 4: Manual verification

Start the dev server, load /, confirm the toolbar appears. If the page 500s or the toolbar doesn't render, stop and investigate before profiling.

run_dev.bat  # or manually: set USE_SQLITE=true && set DJANGO_DEBUG=true && python manage.py runserver 0.0.0.0:8000

Then open http://127.0.0.1:8000/ in a browser and confirm the toolbar tab appears on the right edge. Click SQL panel, note the query count.

Run Django's system check too, to catch any app-config issue:

USE_SQLITE=true DJANGO_DEBUG=true python manage.py check

Expected: System check identified no issues (0 silenced).

Step 5: Run the full test suite

USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests -v 2

Expected: still 68 passes. Toolbar shouldn't affect tests.

Step 6: Commit

git add requirements.txt config/settings.py config/urls.py
git commit -m "$(cat <<'EOF'
chore(dev): add Django Debug Toolbar (dev-only, DEBUG+USE_SQLITE gated)

Double-gated install: only loads when DEBUG=true AND USE_SQLITE=true,
never in prod. Lets us profile SQL query counts on the dashboard and
payroll pages before attacking N+1 hotspots.

requirements.txt adds django-debug-toolbar==6.0.0
config/settings.py conditionally appends to INSTALLED_APPS + MIDDLEWARE
config/urls.py conditionally includes __debug__ route

No behavioural change to production.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"

Task 3: Profile & fix N+1 on / (dashboard)

Goal: Confirm the dashboard's SQL query count and fix any N+1 the toolbar surfaces.

Files:

  • Modify: core/views.py (the index() view — starts line ~353, certs_expired/expiring aggregation around lines ~406-413)
  • Possibly modify: core/templates/core/index.html (only if a template access is firing queries)

Step 1: Baseline measurement

Start the dev server. Log in as an admin user. Open /. In the Debug Toolbar panel:

  • Note total SQL query count on the page
  • Note total SQL time
  • Expand the SQL tab, scan for any group marked "N+1" or "N duplicate queries"
  • Note DOM size / page weight from DevTools Network tab (the HTML document itself, not total transfer)

Record these four numbers. They go into the Task 4 commit message.

Step 2: Identify each problem

The design doc predicts three suspects:

  1. Cert-expiry card — two separate COUNT queries at views.py:406 and views.py:409. Could become one .aggregate() call.
  2. Stat cards — if any stat card loops over a queryset in the template instead of using .count() on the queryset, that's an N+1.
  3. Pending payments table (if it renders on this page for admins) — worker/team/log per row.

For each "N duplicate queries" group the toolbar shows, inspect the Python side:

  • Find the view function (use the traceback in the toolbar if available)
  • Decide whether the fix is select_related (FK), prefetch_related (M2M / reverse FK), or annotate (aggregate)
  • Apply the minimal fix

Step 3: Apply fixes

For each confirmed N+1, edit core/views.py with the minimal change. Example shape (only as an illustration — apply whatever the toolbar surfaces):

# BEFORE
workers = Worker.objects.filter(active=True)
for w in workers:
    for cert in w.workercertificate_set.all():  # N+1 — one query per worker
        ...

# AFTER
workers = Worker.objects.filter(active=True).prefetch_related('workercertificate_set')
for w in workers:
    for cert in w.workercertificate_set.all():  # reuses prefetched cache
        ...

For the cert counts specifically (views.py:406-413), the current code is:

certs_expired_count = WorkerCertificate.objects.filter(
    worker__is_active=True, valid_until__lt=today
).count()
certs_expiring_count = WorkerCertificate.objects.filter(
    worker__is_active=True, valid_until__gte=today, valid_until__lte=today + timedelta(days=30)
).count()
certs_alert_total = certs_expired_count + certs_expiring_count

This is TWO queries, not an N+1. If the toolbar shows them as hot, consolidate into one .aggregate():

from django.db.models import Count, Q
_cert_counts = WorkerCertificate.objects.filter(
    worker__is_active=True
).aggregate(
    expired=Count('id', filter=Q(valid_until__lt=today)),
    expiring=Count('id', filter=Q(valid_until__gte=today, valid_until__lte=today + timedelta(days=30))),
)
certs_expired_count = _cert_counts['expired']
certs_expiring_count = _cert_counts['expiring']
certs_alert_total = certs_expired_count + certs_expiring_count

But ONLY apply this if the toolbar flags it. Per YAGNI, don't pre-optimise. Two simple COUNTs with an index on valid_until are plenty fast.

Step 4: Re-measure

Reload /. The SQL query count should drop. If it didn't, your fix didn't hit the real hot path — revert that edit and look again at the toolbar output.

Step 5: Run the full test suite

USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests -v 2

Expected: 68 passes. A change in the ORM query plan does NOT typically move test counts, but a bug in a prefetch_related pattern can. If any test fails, revert and rethink.

Step 6: Commit

Only commit if ≥1 N+1 was actually found and fixed. If the toolbar showed / at ~20 queries with no N+1s flagged, skip Task 3 entirely and note that in the final commit message.

git add core/views.py [any template files touched]
git commit -m "$(cat <<'EOF'
perf(dashboard): fix N+1 on [specific problem identified]

Debug Toolbar showed [before-count] queries on /, including [N
duplicates of query X]. Added [select_related/prefetch_related/annotate]
to [view function] to fold those into one.

After: [after-count] queries. Test suite unaffected (68/68 pass).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"

Task 4: Profile & fix N+1 on /payroll/ (all 4 tabs) + before/after summary

Goal: Same as Task 3, applied to the payroll dashboard's four tabs, plus a final commit that captures the full before/after picture.

Files:

  • Modify: core/views.py (the payroll_dashboard() view — starts line ~2593; each status branch: pending, history, loans, adjustments)
  • Possibly: core/templates/core/payroll_dashboard.html (only if a template access fires queries)

Step 1: Baseline measurement — all 4 tabs

For each of the four URLs, record query count + SQL time:

  • /payroll/ (defaults to pending)
  • /payroll/?status=history
  • /payroll/?status=loans
  • /payroll/?status=adjustments

Step 2: Identify N+1s per tab

Expected suspects (design doc predictions):

  • Pending tab — for every worker in the outstanding list, get_pay_period(team) is called per worker. If team isn't prefetched via select_related('team') or prefetch_related('teams') somewhere upstream, each call hits the DB. Look at views.py around line 2676 (the pending-payments loop where get_pay_period is called).
  • History tab — PayrollRecord list view; each row needs worker + work_logs + adjustments. Check for prefetch_related('work_logs', 'payrolladjustment_set') on the base queryset.
  • Loans tab — Loan list with repayment adjustments. Check for prefetch_related('payrolladjustment_set') on active loans.
  • Adjustments tab — we already fixed worker.teams.first()worker.teams.all() in commit 06b3315; confirm no new N+1 introduced by the sort / group work done in later Adjustments commits.

Step 3: Apply fixes

Same pattern as Task 3. One fix = one toolbar re-measure. Don't batch multiple fixes into one change — it's much harder to attribute the win (or the regression).

Step 4: Confirm WeasyPrint stays lazy-imported

One-line grep-and-document check (design doc Task 3):

grep -rn "^import weasyprint\|^from weasyprint" core/ config/

Expected: no matches. WeasyPrint should only be imported inside the body of render_to_pdf() in core/utils.py. If any module-level import exists, move it inside the function.

Step 5: Run the full test suite once more

USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests -v 2

Expected: 68 passes.

Step 6: Final commit with before/after summary

Even if Task 4 found no N+1s (unlikely but possible), this commit records the measurement. If Task 3 or 4 DID find real fixes, they're each in their own commit already; this is the "closing the pass" commit.

# (Only include files in git add IF you actually changed them in this task.)
git add core/views.py [any templates touched]
git commit -m "$(cat <<'EOF'
perf(payroll): [specific fix] + quick-wins pass summary

Fixed [N+1 description] on the [pending|history|loans|adjustments] tab
by [select_related / prefetch_related / annotate added to queryset X].

Perf Quick-Wins Pass A — before/after:
  /            N queries, Tms SQL → M queries, Sms SQL
  /payroll/    N queries, Tms SQL → M queries, Sms SQL  (pending)
  /payroll/?status=history     N → M queries
  /payroll/?status=loans       N → M queries
  /payroll/?status=adjustments N → M queries

Cache-bust token (d1490c4): CSS is now browser-cached across sessions
and Cloudflare holds it at the edge for 4h — biggest felt improvement
expected from this pass.

WeasyPrint confirmed still lazy-imported (module-level grep clean).

Test suite: 68/68.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EOF
)"

Final acceptance checklist

Before declaring the pass complete, the controller (this session) verifies:

  • All commits on ai-dev from d1490c4 → HEAD show perf(...) or chore(dev): prefix
  • USE_SQLITE=true DJANGO_DEBUG=true python manage.py test core.tests -v 2 reports 68/68 passing
  • Django Debug Toolbar renders on / with USE_SQLITE=true DEBUG=true — and does NOT render when DEBUG=false (quick toggle test)
  • The final commit message contains concrete before/after query counts (not placeholders)
  • Working tree clean, branch ready to push

What's NOT in this plan (explicit non-goals)

  • Splitting payroll_dashboard.html into tab partials — plan B territory
  • Refactoring any view's body structure
  • Any visual/UX change
  • Adding a perf regression test harness — if we want that, it's a separate design

Rollback: git revert <sha> on any individual commit. No data, schema, or URL-contract impact in any task.