Deploy Readiness Audit + Fork Plan

Created: 21 April 2026
Status: Draft plan, not yet executed
Author: Claude, based on full-repo audit


Goal

Get the app into a known-good state before attempting to test-deploy it on a non-Flatlogic platform, so that any issues surfaced during the test deploy are real deploy issues, not latent code bugs. Then fork the repo into a clean branch for the test-deploy.


BLOCKING QUESTION — Platform choice

You said "test deploy this using SpacetimeDB." I want to confirm before we proceed, because SpacetimeDB is not a platform where this Django app can run.

SpacetimeDB (Clockwork Labs) is a specialized relational database designed for real-time multiplayer games — you write your app logic in Rust or C#, compile it to WebAssembly, and it runs inside the database. It doesn't speak the PostgreSQL/MySQL wire protocol, doesn't host HTTP apps, and isn't a Django deploy target. Using SpacetimeDB would mean rewriting the entire payroll app from scratch in Rust (multiple weeks, not a test deploy).

I think you may have meant one of these instead:

| Candidate | What it is | Django fit |
|---|---|---|
| Fly.io | Containerised Django hosting, cheap, persistent volumes | Drop-in |
| Railway | Similar to Fly.io, simpler UX | Drop-in |
| Render | Mature PaaS, handles Django well | Drop-in |
| Supabase | PostgreSQL-based BaaS (not a Django host, but could replace MySQL) | Partial (you'd still need a Django host) |
| PlanetScale | Serverless MySQL replacement | Partial (same) |
| DigitalOcean App Platform | Containerised Django hosting | Drop-in |

Please confirm which you meant. The rest of this plan assumes Fly.io as a sensible default for "test deploy a Django app cheaply on a platform other than Flatlogic" — swap the platform in if you meant something different.


Part A — Audit findings (prioritised)

I did a full-repo audit looking at performance, latent bugs, and Flatlogic-specific deploy risks. Findings below with severity — fix CRITICALs before any deploy (Flatlogic or elsewhere).

🔴 CRITICAL — fix before ANY deploy

1. Gmail App Password committed to source control

File: config/settings.py:177

EMAIL_HOST_PASSWORD = os.getenv("EMAIL_HOST_PASSWORD", "cwvhpcwyijneukax")

The fallback value cwvhpcwyijneukax is a real 16-character Gmail App Password. Anyone with read access to this repo (public GitHub history included) has full send-access to konrad@foxfitt.co.za.

Also on lines 176 and 188: EMAIL_HOST_USER defaults to Konrad's real email, and SPARK_RECEIPT_EMAIL defaults to a real inbound address.

Fix: Remove the string fallbacks. If env var missing, either raise an error on startup or disable email features. ~10 minutes.

Additional action: Rotate the Gmail App Password immediately after fix lands, since it's already exposed in git history.
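A minimal sketch of the "disable email features" option, assuming the three variables named above are the only ones involved. The helper name read_email_config and the EMAIL_ENABLED flag are hypothetical, not existing code in the repo.

```python
import os


def read_email_config(environ=os.environ):
    """Return email settings from the environment, or None if any are missing.

    No fallback strings: a missing variable disables email instead of
    silently using a credential baked into source control.
    """
    required = ("EMAIL_HOST_USER", "EMAIL_HOST_PASSWORD", "SPARK_RECEIPT_EMAIL")
    values = {name: environ.get(name, "").strip() for name in required}
    if not all(values.values()):
        return None
    return values


# Hypothetical wiring in settings.py: views check EMAIL_ENABLED before sending.
email_config = read_email_config()
EMAIL_ENABLED = email_config is not None
EMAIL_HOST_USER = email_config["EMAIL_HOST_USER"] if EMAIL_ENABLED else ""
EMAIL_HOST_PASSWORD = email_config["EMAIL_HOST_PASSWORD"] if EMAIL_ENABLED else ""
```

With this shape, a deploy that forgets the variables boots with email switched off rather than with a leaked credential.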


2. SECRET_KEY has a weak default

File: config/settings.py:20

SECRET_KEY = os.getenv("DJANGO_SECRET_KEY", "change-me")

If the DJANGO_SECRET_KEY env var isn't set on the deploy platform, Django boots with "change-me", and anything signed with the key (password reset tokens, signed cookies, cookie-based sessions) becomes forgeable.

Fix: Remove the fallback. If env var missing, raise ImproperlyConfigured. ~5 minutes.
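The fix could be sketched as below. The helper name require_env is hypothetical; the import fallback exists only so the snippet also runs outside a Django environment.

```python
import os

try:
    from django.core.exceptions import ImproperlyConfigured
except ImportError:  # fallback so the sketch also runs without Django installed
    class ImproperlyConfigured(Exception):
        pass


def require_env(name, environ=os.environ):
    """Return a required environment variable, or fail loudly at startup."""
    value = environ.get(name, "").strip()
    if not value:
        raise ImproperlyConfigured(f"{name} must be set; refusing to start without it.")
    return value


# In settings.py this would replace the os.getenv call with the weak default:
# SECRET_KEY = require_env("DJANGO_SECRET_KEY")
```

Failing at import time means a misconfigured deploy never serves a request with a guessable key.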


🟠 HIGH — fix before test deploy (and before production)

3. DEBUG defaults to true

File: config/settings.py:21

DEBUG = os.getenv("DJANGO_DEBUG", "true").lower() == "true"

If DJANGO_DEBUG env var isn't set, the app runs in DEBUG mode in production — full tracebacks on 500 errors expose DB schema, file paths, secret fragments.

Fix: Change default to "false". ~2 minutes.
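A small sketch of the same parsing with a safe default. The helper name env_flag is hypothetical; it keeps the original rule that only the string "true" (case-insensitive) enables the flag.

```python
import os


def env_flag(name, default=False, environ=os.environ):
    """Parse a boolean environment variable; an unset variable means `default`."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


# In settings.py: DEBUG = env_flag("DJANGO_DEBUG")  # off unless explicitly enabled
```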


4. Media uploads will be lost on Flatlogic rebuilds

MEDIA_ROOT = BASE_DIR / 'media'. Worker photos, ID documents, cert PDFs, warning PDFs all upload here. On Flatlogic, the app container is rebuilt on every deploy from git — local filesystem is ephemeral, so any uploaded file disappears on the next "Pull Latest".

Currently you have no uploaded files (you mentioned this), so the problem isn't visible yet. The moment you upload a worker photo and do a deploy, it's gone.

Fix options:

  • (a) S3/Cloudflare R2 bucket via django-storages — durable, costs cents/month
  • (b) Ask Flatlogic to mount a persistent volume at /app/media
  • (c) For the test deploy only: use ephemeral storage, accept the risk for testing

Recommend (a) for production, (c) for the test deploy since no uploaded files exist yet.
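Options (a) and (c) can share one settings path. The sketch below assumes a Django version that supports the STORAGES setting (4.2 or later) and a recent django-storages release. The environment variable names MEDIA_BUCKET_NAME and MEDIA_BUCKET_ENDPOINT are hypothetical, and credentials are assumed to come from the standard AWS environment variables that boto3 reads.

```python
import os


def media_storage_settings(environ=os.environ):
    """Build a STORAGES dict: S3-compatible bucket if configured, else local disk."""
    storages = {
        "default": {"BACKEND": "django.core.files.storage.FileSystemStorage"},
        "staticfiles": {"BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage"},
    }
    bucket = environ.get("MEDIA_BUCKET_NAME", "").strip()  # hypothetical name
    if bucket:
        options = {"bucket_name": bucket}
        # Only needed for non-AWS providers such as Cloudflare R2 (hypothetical name).
        endpoint = environ.get("MEDIA_BUCKET_ENDPOINT", "").strip()
        if endpoint:
            options["endpoint_url"] = endpoint
        storages["default"] = {
            "BACKEND": "storages.backends.s3.S3Storage",
            "OPTIONS": options,
        }
    return storages


# In settings.py: STORAGES = media_storage_settings()
```

Leaving the bucket variable unset on the test deploy gives option (c); setting it in production gives option (a) without a code change.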


5. Batch reports load entire tables into Python memory

worker_batch_report, team_batch_report, project_batch_report all build a full list in memory before rendering. At ~14 workers this is fine; at 1,000+ it will strain the 512MB e2-micro RAM especially when WeasyPrint PDF rendering runs concurrently.

Fix: Add pagination (50 rows per page) on the HTML view. CSV/PDF can still be "export all" but should use .iterator() to stream rows rather than building a full list. ~30 minutes.

Not urgent at current data scale — deferrable.
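For the CSV side, a streaming export could look roughly like this. It follows the pseudo-buffer pattern from the Django documentation; the function name iter_csv_lines is hypothetical, and in a real view `rows` would be something like a queryset's `.values_list(...).iterator()` wrapped in a StreamingHttpResponse.

```python
import csv


class Echo:
    """File-like object whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


def iter_csv_lines(header, rows):
    """Yield CSV text one line at a time, without building the full file in memory."""
    writer = csv.writer(Echo())
    yield writer.writerow(header)
    for row in rows:
        yield writer.writerow(row)


# Hypothetical use in a view:
# return StreamingHttpResponse(iter_csv_lines(header, rows), content_type="text/csv")
```

Memory use then depends on the size of one row rather than the size of the table.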


6. CSRF_TRUSTED_ORIGINS has a URL-joining bug

File: config/settings.py:30-40

If someone sets HOST_FQDN=https://example.com (with scheme), the settings code prepends https:// again, producing https://https://example.com. That is not a usable origin, so CSRF validation breaks for every request that relies on the trusted-origins list.

Fix: Check if the value already has a scheme before prepending. ~5 minutes.
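A minimal sketch of the scheme check. The function name trusted_origin is hypothetical, and defaulting to https when no scheme is given is an assumption that matches the current behaviour described above.

```python
def trusted_origin(host_fqdn, default_scheme="https"):
    """Return a CSRF_TRUSTED_ORIGINS entry, adding a scheme only if one is missing."""
    value = host_fqdn.strip().rstrip("/")
    if not value:
        return None
    if "://" in value:
        return value  # already has a scheme: do not prepend another
    return f"{default_scheme}://{value}"
```

Both `HOST_FQDN=example.com` and `HOST_FQDN=https://example.com` then produce the same entry.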


🟡 MEDIUM — worth fixing, not deploy-blocking

7. PDF generation failures not handled in email send path

_send_payslip_email() and create_receipt() both call render_to_pdf() which can return None if WeasyPrint fails. Some paths check if pdf:, but not all — if the return is None, email.attach(filename, None, "application/pdf") may raise or silently send an unattached email. The user would get a notification but no payslip.

Fix: Guard every render_to_pdf() return with if pdf_bytes is None: log + skip attachment. ~15 minutes.
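The guard could be centralised in one helper so no call site forgets it. This is a sketch: attach_pdf_if_rendered is a hypothetical name, and `email` is assumed to be a Django EmailMessage (anything with an `attach(filename, content, mimetype)` method works).

```python
import logging

logger = logging.getLogger(__name__)


def attach_pdf_if_rendered(email, filename, pdf_bytes):
    """Attach the PDF only if rendering succeeded; return True when attached."""
    if pdf_bytes is None:
        logger.error("PDF rendering failed for %s; skipping attachment", filename)
        return False
    email.attach(filename, pdf_bytes, "application/pdf")
    return True
```

The boolean return lets the caller decide whether to send the email without the payslip, or to hold it and tell the user.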


8. price_overtime() silently swallows all exceptions

The view loops over overtime entries and any exception inside the loop is caught and ignored — including typos, DB errors, and missing-record issues. The UI reports "Priced 5" even if 10 silently failed.

Fix: Catch specific expected exceptions only; log the rest. ~10 minutes.
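A sketch of the loop with narrowed exception handling. The names price_entries and price_one are hypothetical, and the default tuple of "expected" exceptions is only a placeholder; in the real view it would more likely contain the relevant model's DoesNotExist.

```python
import logging

logger = logging.getLogger(__name__)


def price_entries(entries, price_one, expected=(LookupError, ValueError)):
    """Price each entry and return (priced, skipped, failed) counts."""
    priced = skipped = failed = 0
    for entry in entries:
        try:
            price_one(entry)
            priced += 1
        except expected as exc:
            # Known, recoverable problem: record it and carry on.
            logger.warning("Skipped overtime entry %r: %s", entry, exc)
            skipped += 1
        except Exception:
            # Anything else is a bug or infrastructure failure: keep the traceback.
            logger.exception("Unexpected error pricing overtime entry %r", entry)
            failed += 1
    return priced, skipped, failed
```

Returning all three counts lets the UI report "Priced 5, skipped 3, failed 2" instead of only the successes.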


9. X_FRAME_OPTIONS = 'ALLOWALL' — clickjacking risk

Deliberately disabled for Flatlogic's iframe preview. Any third party can embed the app in their own iframe and attack logged-in users.

Fix options:

  • On any platform other than Flatlogic: remove the middleware exclusion entirely
  • On Flatlogic: restrict to Flatlogic's iframe parent domain via a custom middleware

Relevant mainly if deploying off Flatlogic.
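For the Flatlogic option, one possible shape is a middleware that sets a Content-Security-Policy frame-ancestors directive, since X-Frame-Options itself has no widely supported allow-list form. This is a sketch under stated assumptions: the class name is hypothetical, the parent origin below is a placeholder (the real Flatlogic preview origin is not known here), and the code overwrites any CSP header already on the response.

```python
class FrameAncestorsMiddleware:
    """Allow framing only by this site and an explicit list of parent origins."""

    # Placeholder value: replace with the real Flatlogic preview origin.
    ALLOWED_PARENTS = ("https://flatlogic-preview.example",)

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        sources = " ".join(("'self'",) + self.ALLOWED_PARENTS)
        # Browsers that support CSP use this to decide who may embed the page.
        response["Content-Security-Policy"] = f"frame-ancestors {sources}"
        return response
```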


10. Hard-coded daily-rate divisor = 20

Worker.daily_rate = monthly_salary / 20. Fine as a business rule, but hardcoded with no validation. If monthly_salary == 0, every payslip becomes R 0.00 silently.

Fix: Add monthly_salary > 0 validation in Worker.save() or the form. Low priority. ~10 minutes.
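The rule and its validation, sketched as a plain function. The name daily_rate_for and the rounding to two decimal places are assumptions; in the model this check would more naturally raise a Django ValidationError from clean() or the form.

```python
from decimal import Decimal

WORKING_DAYS_PER_MONTH = Decimal("20")  # the hard-coded divisor described above


def daily_rate_for(monthly_salary):
    """Return monthly_salary / 20, rejecting zero or negative salaries."""
    salary = Decimal(monthly_salary)
    if salary <= 0:
        raise ValueError("monthly_salary must be greater than zero")
    # Rounding to cents is an assumption, not confirmed behaviour of the app.
    return (salary / WORKING_DAYS_PER_MONTH).quantize(Decimal("0.01"))
```

Naming the divisor also gives a single place to change it if the business rule ever moves away from 20 days.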


🟢 LOW — performance, N+1 refinements

The index dashboard, payroll_dashboard, and work_history views all have some for-loop-with-related-access patterns that would be N+1 without the existing prefetch_related calls. The prefetches ARE present — they're just not documented clearly enough that a future Claude/dev will preserve them.

Fix: Add a comment above each loop explaining the prefetch, to prevent regression. No code change needed. ~30 minutes of doc comments.


Part B — Fork & test-deploy strategy

Overview

The test deploy goal is to verify: can this app run and function correctly on a non-Flatlogic platform, so we're not locked in. The fork gives us an isolated place to experiment without touching the working ai-dev/master branches or our current redesign-weasyprint work.

Fork strategy (git)

Since this repo is already on GitHub at Konradzar/LabourPay_v5, create a fork branch (not a separate GitHub fork — we don't need the ceremony):

  1. From redesign-weasyprint branch (which has all the recent work including WeasyPrint, worker/team/project management), cut a new branch:
    git checkout -b test-deploy-<platform>
    
  2. Commit the CRITICAL fixes (#1, #2) and the DEBUG default fix (#3, rated HIGH) on this branch first. Those changes should eventually land everywhere, not just in the test deploy branch.
  3. Add platform-specific deploy config on top (Dockerfile, fly.toml / railway.json / render.yaml, depending on platform).
  4. Push only this branch — not the other local branches — so GitHub sees it but ai-dev and master stay unaffected.

Flatlogic keeps syncing only ai-dev. The test-deploy branch is invisible to Flatlogic.

Platform-specific deploy config (assuming Fly.io — adjust if you meant different)

  1. Dockerfile — single-stage Python 3.13 image with WeasyPrint system deps:

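    FROM python:3.13-slim
    # WeasyPrint needs Pango (including pangoft2) at runtime; the MySQL client
    # headers and a compiler are needed to build mysqlclient.
    RUN apt-get update && apt-get install -y \
        libpango-1.0-0 libpangoft2-1.0-0 libcairo2 libgdk-pixbuf-2.0-0 \
        libffi-dev shared-mime-info \
        default-libmysqlclient-dev pkg-config build-essential \
        && rm -rf /var/lib/apt/lists/*
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .
    # Build-time placeholder only: once fix #2 lands, settings refuse to load without a key.
    RUN DJANGO_SECRET_KEY=build-only-placeholder python manage.py collectstatic --noinput
    CMD ["gunicorn", "config.wsgi:application", "--bind", "0.0.0.0:8000"]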
    
  2. fly.toml — minimal Fly config:

    app = "foxfitt-test"
    primary_region = "jnb"  # Johannesburg, closest to South Africa
    [env]
      DJANGO_DEBUG = "false"
      USE_SQLITE = "false"
      HOST_FQDN = "foxfitt-test.fly.dev"
    [[services]]
      internal_port = 8000
      protocol = "tcp"
      [[services.ports]]
        handlers = ["http", "tls"]
        port = 443
    
  3. Secrets (set via fly secrets set):

    • DJANGO_SECRET_KEY
    • DB_NAME, DB_USER, DB_PASS, DB_HOST, DB_PORT (for managed MySQL/Postgres)
    • EMAIL_HOST_USER, EMAIL_HOST_PASSWORD
    • SPARK_RECEIPT_EMAIL
    • DEFAULT_FROM_EMAIL
  4. Add WhiteNoise for static file serving (one-liner in MIDDLEWARE + add to requirements.txt) — Apache-style static serving doesn't exist on Fly.

  5. Database decision:

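    • Simplest: provision a managed database on Fly. As far as I know Fly's managed offering is PostgreSQL (needs a settings.py ENGINE tweak); confirm whether a MySQL option exists before relying on it
    • Alternative: external managed MySQL (PlanetScale, AWS RDS); check current pricing, since free tiers come and go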
  6. Media storage decision:

    • For test deploy: ephemeral is fine
    • For anything beyond test: add django-storages + S3 bucket
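For step 4 above, the WhiteNoise change could be sketched as follows. WhiteNoise's documentation asks for its middleware to sit directly after Django's SecurityMiddleware; the helper name with_whitenoise is hypothetical, and simply editing the MIDDLEWARE list by hand achieves the same thing.

```python
SECURITY = "django.middleware.security.SecurityMiddleware"
WHITENOISE = "whitenoise.middleware.WhiteNoiseMiddleware"


def with_whitenoise(middleware):
    """Return a copy of MIDDLEWARE with WhiteNoise placed right after SecurityMiddleware."""
    result = [m for m in middleware if m != WHITENOISE]  # avoid duplicates
    position = result.index(SECURITY) + 1 if SECURITY in result else 0
    result.insert(position, WHITENOISE)
    return result


# In settings.py: MIDDLEWARE = with_whitenoise(MIDDLEWARE)
# requirements.txt also needs a "whitenoise" entry.
```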

Test deploy verification checklist

Once deployed, verify:

  • Home page loads (/)
  • Login works with a seeded admin user
  • Dashboard renders
  • Create a worker via friendly UI
  • Log attendance
  • Process a payment (critical path)
  • Download a payroll report PDF (verifies WeasyPrint + system libs)
  • Generate CSV exports
  • Django admin pages load
  • Session cookies survive across requests
  • Static files (CSS, images) load from /static/
  • Admin can log in and out

Rollback

Test deploy is entirely on its own branch. If it fails catastrophically, delete the Fly app and the branch. Zero impact on Flatlogic production.


I don't recommend executing this as one big batch. Split into four phases (plus an optional fifth), ship each, observe, then proceed:

Phase 1 — Critical security fixes (ship to everything)

  • Remove hardcoded email credentials (#1)
  • Fix SECRET_KEY default (#2)
  • Fix DEBUG default (#3)
  • Fix CSRF_TRUSTED_ORIGINS bug (#6)
  • Rotate exposed Gmail App Password
  • Land this on redesign-weasyprint, merge to ai-dev, let Flatlogic rebuild, verify still working.

Phase 2 — Create the fork branch

  • Cut test-deploy-<platform> off redesign-weasyprint after Phase 1 is in.
  • No code changes — just the branch exists.

Phase 3 — Add deploy config for chosen platform

  • Dockerfile, platform config file, WhiteNoise, static-root adjustments, etc.
  • Commit on test-deploy-<platform> only.

Phase 4 — Deploy + verify

  • Push branch, deploy, run the verification checklist above.
  • Document any platform-specific quirks in a follow-up note.

Phase 5 (optional, only if going beyond test) — Address MEDIUM findings

  • PDF-None handling (#7)
  • price_overtime exception leak (#8)
  • Clickjacking header (#9)
  • Salary validation (#10)

Open questions / decisions needed

  1. What platform did you actually mean? (SpacetimeDB blocker — see top of doc)
  2. For the test deploy, is the database allowed to start empty, or do you want the production MySQL data copied in?
  3. Do you want the CRITICAL fixes merged into Flatlogic production (via ai-dev) as part of Phase 1, or hold off until the whole plan is approved?
  4. Budget sensitivity — Fly.io's free tier is ~$5/mo equivalent; Railway's is similar. Are we constrained to free-tier, or is a few dollars a month OK for test?

Not in scope (explicit)

  • Adding new app features (no new views, models, or migrations beyond what exists)
  • Rewriting in a different language/framework (e.g., actual SpacetimeDB migration)
  • Performance tuning at scale (current data size is small; defer until needed)
  • Removing Flatlogic as the production platform — this is a test deploy to prove portability, not a migration