38686-vm/docs/plans/2026-04-21-deploy-audit-and-fork.md
Konrad du Plessis 3c28387dd3 WIP: 2026-04-22 session checkpoint
Complete working state of the session. Will be split into two deploy
phases (safety scaffolding then feature release) before merging to ai-dev.

Includes:
- Security fixes (email creds / SECRET_KEY / DEBUG / CSRF)
- Backup + restore management commands and browser endpoints
- WeasyPrint migration (replaces xhtml2pdf)
- New Worker fields + WorkerCertificate + WorkerWarning models
- Worker / Team / Project friendly management UIs
- Dashboard cert-expiry card + Manage All buttons
- Bootstrap tooltips (global init + theme-aware CSS)
- Django admin template override (taller M2M pickers)
- Money filter for ZAR currency formatting
- Resources dropdown nav
- Massive CLAUDE.md expansion + deploy plan docs

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 00:19:15 +02:00


# Deploy Readiness Audit + Fork Plan
**Created:** 21 April 2026

**Status:** Draft plan, not yet executed

**Author:** Claude, based on a full-repo audit

---
## Goal
Get the app into a known-good state before test-deploying it on a non-Flatlogic platform, so that any issues the test deploy surfaces are *real deploy issues*, not latent code bugs. Then fork the repo into a clean branch for the test deploy.

---
## ⛔ BLOCKING QUESTION — Platform choice
You said "test deploy this using **SpacetimeDB**." I want to confirm before we proceed, because **SpacetimeDB is not a platform where this Django app can run**.
**SpacetimeDB** (Clockwork Labs) is a specialized relational database designed for real-time multiplayer games — you write your app logic in **Rust or C#**, compile it to WebAssembly, and it runs *inside* the database. It doesn't speak the PostgreSQL/MySQL wire protocol, doesn't host HTTP apps, and isn't a Django deploy target. Using SpacetimeDB would mean **rewriting the entire payroll app from scratch in Rust** (multiple weeks, not a test deploy).
I think you may have meant one of these instead:

| Candidate | What it is | Django fit |
|---|---|---|
| **Fly.io** | Containerised Django hosting, cheap, persistent volumes | ✅ Drop-in |
| **Railway** | Similar to Fly.io, simpler UX | ✅ Drop-in |
| **Render** | Mature PaaS, handles Django well | ✅ Drop-in |
| **Supabase** | PostgreSQL-based BaaS (not a Django host, but could replace MySQL) | Partial: you'd still need a Django host |
| **PlanetScale** | Serverless MySQL replacement | Partial: same |
| **DigitalOcean App Platform** | Containerised Django hosting | ✅ Drop-in |

**Please confirm which you meant.** The rest of this plan assumes **Fly.io** as a sensible default for "test deploy a Django app cheaply on a platform other than Flatlogic"; swap in your platform if you meant something different.

---
## Part A — Audit findings (prioritised)
I did a full-repo audit looking at performance, latent bugs, and Flatlogic-specific deploy risks. Findings below with **severity** — fix CRITICALs before any deploy (Flatlogic or elsewhere).
### 🔴 CRITICAL — fix before ANY deploy
#### 1. Gmail App Password committed to source control
**File:** `config/settings.py:177`
```python
EMAIL_HOST_PASSWORD = os.getenv("EMAIL_HOST_PASSWORD", "cwvhpcwyijneukax")
```
The fallback value `cwvhpcwyijneukax` is a real 16-character Gmail App Password. Anyone with read access to this repo (public GitHub history included) has full send access to `konrad@foxfitt.co.za`.

**Also on lines 176 and 188:** `EMAIL_HOST_USER` defaults to Konrad's real email address, and `SPARK_RECEIPT_EMAIL` defaults to a real inbound address.

**Fix:** Remove the string fallbacks. If an env var is missing, either raise an error on startup or disable the email features. ~10 minutes.
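
A minimal sketch of the "disable email" option, using the variable names `config/settings.py` already reads. When either credential is missing it switches to Django's console email backend (outgoing mail is written to stdout instead of sent); that is one way to disable sending, chosen here for illustration rather than something the app does today. Code that sends to `SPARK_RECEIPT_EMAIL` would also need to cope with it being `None`.

```python
import os

# No string fallbacks: a missing variable comes back as None.
EMAIL_HOST_USER = os.getenv("EMAIL_HOST_USER")
EMAIL_HOST_PASSWORD = os.getenv("EMAIL_HOST_PASSWORD")
SPARK_RECEIPT_EMAIL = os.getenv("SPARK_RECEIPT_EMAIL")

if EMAIL_HOST_USER and EMAIL_HOST_PASSWORD:
    EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"
else:
    # Credentials missing: write outgoing mail to stdout instead of sending it,
    # so the app still boots (e.g. on a test deploy) without a real account.
    EMAIL_BACKEND = "django.core.mail.backends.console.EmailBackend"
```
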
**Additional action:** Rotate the Gmail App Password immediately after the fix lands, since it's already exposed in git history.

---
#### 2. SECRET_KEY has a weak default
**File:** `config/settings.py:20`
```python
SECRET_KEY = os.getenv("DJANGO_SECRET_KEY", "change-me")
```
If the `DJANGO_SECRET_KEY` env var isn't set on the deploy platform, Django boots with `"change-me"`: sessions become forgeable and password reset tokens become predictable.

**Fix:** Remove the fallback. If the env var is missing, raise `ImproperlyConfigured`. ~5 minutes.
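
A sketch of the startup check using Django's real `ImproperlyConfigured` exception; the `require_env` helper name is hypothetical. The `try`/`except` around the import exists only so the snippet runs outside a Django install and can be dropped inside `settings.py`.

```python
import os

try:
    from django.core.exceptions import ImproperlyConfigured
except ImportError:  # only so this sketch runs without Django installed
    class ImproperlyConfigured(Exception):
        pass


def require_env(name: str) -> str:
    """Return an environment variable, refusing to start if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise ImproperlyConfigured(f"The {name} environment variable must be set.")
    return value


# In config/settings.py:
#   SECRET_KEY = require_env("DJANGO_SECRET_KEY")
```
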
---
### 🟠 HIGH — fix before test deploy (and before production)
#### 3. `DEBUG` defaults to `true`
**File:** `config/settings.py:21`
```python
DEBUG = os.getenv("DJANGO_DEBUG", "true").lower() == "true"
```
If the `DJANGO_DEBUG` env var isn't set, the app runs in DEBUG mode in production: full tracebacks on 500 errors expose DB schema, file paths, and secret fragments.

**Fix:** Change the default to `"false"`, i.e. `DEBUG = os.getenv("DJANGO_DEBUG", "false").lower() == "true"`. ~2 minutes.

---
#### 4. Media uploads will be lost on Flatlogic rebuilds
`MEDIA_ROOT = BASE_DIR / 'media'`. Worker photos, ID documents, cert PDFs, warning PDFs all upload here. On Flatlogic, the app container is rebuilt on every deploy from git — **local filesystem is ephemeral**, so any uploaded file disappears on the next "Pull Latest".
Currently you have no uploaded files (you mentioned this), so the problem isn't visible yet. The moment you upload a worker photo and do a deploy, it's gone.
**Fix options:**
- **(a) S3/Cloudflare R2 bucket** via `django-storages` — durable, costs cents/month
- **(b) Ask Flatlogic to mount a persistent volume** at `/app/media`
- **(c) For the test deploy only:** use ephemeral storage, accept the risk for testing

Recommend (a) for production, (c) for the test deploy since no uploaded files exist yet.
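
A sketch of option (a), assuming `django-storages` (with `boto3`) is installed and Django 4.2+ for the `STORAGES` setting. The `MEDIA_BUCKET_*` variable names and the `media_storages` helper are hypothetical. For Cloudflare R2, the endpoint would be the account's `https://<account-id>.r2.cloudflarestorage.com` URL; leaving it unset targets AWS S3.

```python
import os


def media_storages(env=os.environ) -> dict:
    """Build STORAGES so uploaded media lives in an S3-compatible bucket.

    Raises KeyError at startup if a required (hypothetical) variable is missing.
    """
    return {
        "default": {
            "BACKEND": "storages.backends.s3.S3Storage",
            "OPTIONS": {
                "bucket_name": env["MEDIA_BUCKET_NAME"],
                "endpoint_url": env.get("MEDIA_BUCKET_ENDPOINT"),  # None means AWS S3
                "access_key": env["MEDIA_BUCKET_ACCESS_KEY"],
                "secret_key": env["MEDIA_BUCKET_SECRET_KEY"],
                # Signed, expiring URLs (django-storages' default), so links to
                # ID documents don't stay valid forever.
                "querystring_auth": True,
            },
        },
        # Overriding STORAGES replaces Django's defaults, so list staticfiles too.
        "staticfiles": {
            "BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage",
        },
    }


# In config/settings.py:
#   STORAGES = media_storages()
```
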
---
#### 5. Batch reports load entire tables into Python memory
`worker_batch_report`, `team_batch_report`, and `project_batch_report` all build a full list in memory before rendering. At ~14 workers this is fine; at 1,000+ it will strain the e2-micro VM's RAM (1 GB on GCP), especially while WeasyPrint PDF rendering runs concurrently.

**Fix:** Add pagination (50 rows per page) on the HTML view. The CSV export can still be "export all" but should stream rows with `.iterator()` rather than building a full list. The PDF export can't stream the same way, because WeasyPrint renders from a complete HTML document, so it still holds the whole report in memory. ~30 minutes.

**Not urgent at current data scale; deferrable.**
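
A sketch of the streaming-CSV half of the fix, using the pseudo-buffer pattern from Django's "streaming large CSV files" docs: `csv.writer`'s `writerow()` returns whatever the underlying file's `write()` returns, so each row becomes one yielded string and the full export never sits in memory. `stream_csv` and the field names in the usage comment are hypothetical.

```python
import csv
from typing import Any, Iterable, Iterator, Sequence


class Echo:
    """File-like object whose write() hands back the value instead of storing it."""

    def write(self, value: str) -> str:
        return value


def stream_csv(header: Sequence[str], rows: Iterable[Sequence[Any]]) -> Iterator[str]:
    """Yield one CSV-encoded line at a time instead of building the whole file."""
    writer = csv.writer(Echo())
    yield writer.writerow(header)
    for row in rows:
        yield writer.writerow(row)


# Hypothetical use in a batch report's CSV branch (field names are placeholders):
#   rows = Worker.objects.values_list("name", "monthly_salary").iterator(chunk_size=500)
#   return StreamingHttpResponse(stream_csv(["Name", "Monthly salary"], rows),
#                                content_type="text/csv")
```
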
---
#### 6. `CSRF_TRUSTED_ORIGINS` has a URL-joining bug
**File:** `config/settings.py:30-40`

If someone sets `HOST_FQDN=https://example.com` (with a scheme), the settings code prepends `https://` again, producing `https://https://example.com`. That entry can never match a real request origin, so any request that depends on it fails CSRF validation.

**Fix:** Check whether the value already has a scheme before prepending. ~5 minutes.
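
A sketch of the scheme check. The `trusted_origin` helper is hypothetical and assumes `HOST_FQDN` holds a single host, with or without a scheme.

```python
def trusted_origin(host_fqdn: str, default_scheme: str = "https") -> str:
    """Turn HOST_FQDN into a CSRF_TRUSTED_ORIGINS entry without doubling the scheme."""
    value = host_fqdn.strip().rstrip("/")
    if value.startswith(("http://", "https://")):
        return value  # already has a scheme: use it as-is
    return f"{default_scheme}://{value}"


# In config/settings.py:
#   CSRF_TRUSTED_ORIGINS = [trusted_origin(os.environ["HOST_FQDN"])]
```
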
---
### 🟡 MEDIUM — worth fixing, not deploy-blocking
#### 7. PDF generation failures not handled in email send path
`_send_payslip_email()` and `create_receipt()` both call `render_to_pdf()`, which can return `None` if WeasyPrint fails. Some paths check `if pdf:`, but not all. If the return is `None`, `email.attach(filename, None, "application/pdf")` may raise or silently send an unattached email, in which case the user gets a notification but no payslip.

**Fix:** Guard every `render_to_pdf()` return with `if pdf_bytes is None:` (log, then skip the attachment). ~15 minutes.
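
A sketch of the guard. `attach_pdf_or_log` is a hypothetical helper: it accepts any object with Django `EmailMessage`'s `attach(filename, content, mimetype)` method and returns whether the PDF was attached, so the caller can decide whether to send the email at all.

```python
import logging

logger = logging.getLogger(__name__)


def attach_pdf_or_log(email, filename: str, pdf_bytes) -> bool:
    """Attach a rendered PDF only if rendering succeeded.

    `pdf_bytes` is the (possibly None) result of render_to_pdf().
    """
    if pdf_bytes is None:
        logger.error("PDF rendering failed; not attaching %s", filename)
        return False
    email.attach(filename, pdf_bytes, "application/pdf")
    return True


# Hypothetical use in _send_payslip_email():
#   if not attach_pdf_or_log(email, "payslip.pdf", render_to_pdf(...)):
#       return  # or alert an admin instead of sending a payslip-less email
```
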
---
#### 8. `price_overtime()` silently swallows all exceptions
The view loops over overtime entries, and any exception inside the loop is caught and ignored, including `NameError`s from typos, DB errors, and missing-record issues. The UI reports "Priced 5" even if 10 silently failed.

**Fix:** Catch specific expected exceptions only; log the rest. ~10 minutes.
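
A sketch of the narrower handler, written as a standalone function rather than the real view. `EXPECTED_ERRORS` is a placeholder: which exceptions count as expected per-entry failures (for example Django's `ObjectDoesNotExist`) is an assumption to check against the view. Anything else, such as a `NameError` from a typo or a database error, propagates, so Django's error handling logs it with a traceback instead of it vanishing. Returning both counts lets the UI report "Priced 5, 10 failed".

```python
import logging
from typing import Callable, Iterable, Tuple, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")

# Placeholder for the per-entry failures the view actually expects.
EXPECTED_ERRORS = (ValueError, LookupError)


def price_all(entries: Iterable[T], price_one: Callable[[T], object]) -> Tuple[int, int]:
    """Price each entry and return (priced, failed) so the UI can report both."""
    priced = failed = 0
    for entry in entries:
        try:
            price_one(entry)
        except EXPECTED_ERRORS as exc:
            # Expected per-entry problem: record it and carry on with the rest.
            logger.warning("Could not price %r: %s", entry, exc)
            failed += 1
        else:
            priced += 1
    return priced, failed
```
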
---
#### 9. `X_FRAME_OPTIONS = 'ALLOWALL'` — clickjacking risk
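Deliberately disabled for Flatlogic's iframe preview. `ALLOWALL` is not a valid `X-Frame-Options` value, so browsers ignore the header and allow framing from any origin: any third party can embed the app in their own iframe and attack logged-in users (clickjacking).

**Fix options:**
- On any platform *other* than Flatlogic: remove the `ALLOWALL` override and the middleware exclusion entirely, so Django's default `X_FRAME_OPTIONS = 'DENY'` applies
- On Flatlogic: restrict framing to Flatlogic's iframe parent domain via a custom middleware that sends a `Content-Security-Policy: frame-ancestors` header

Relevant mainly if deploying off Flatlogic.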
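
`X-Frame-Options` can't express "allow this one other site" (the old `ALLOW-FROM` value is obsolete and unsupported by modern browsers), so this sketch of the Flatlogic option uses the CSP `frame-ancestors` directive instead, which modern browsers apply in preference to `X-Frame-Options`. The parent origin `https://*.flatlogic.com` and the `FrameAncestorsMiddleware` name are hypothetical; check which domain Flatlogic's preview actually frames the app from.

```python
# Hypothetical parent origin: replace with the domain Flatlogic's preview
# really uses to frame the app.
FRAME_ANCESTORS = "'self' https://*.flatlogic.com"


class FrameAncestorsMiddleware:
    """Allow framing only by the listed origins, via CSP frame-ancestors."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # Assumes nothing else sets a Content-Security-Policy header;
        # this overwrites any existing one.
        response["Content-Security-Policy"] = f"frame-ancestors {FRAME_ANCESTORS}"
        return response
```
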
---
#### 10. Hard-coded daily-rate divisor = 20
`Worker.daily_rate` = `monthly_salary / 20`. That's fine as a business rule, but it's hardcoded with no validation: if `monthly_salary == 0`, every payslip silently becomes R 0.00.

**Fix:** Add `monthly_salary > 0` validation in `Worker.save()` or the form. Low priority. ~10 minutes.
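
A sketch of the validation alongside the existing divide-by-20 rule, in plain Python with `Decimal` for money. The function and constant names are hypothetical; in Django the same check would raise `ValidationError` from `Worker.clean()` or the form's `clean_monthly_salary()`.

```python
from decimal import Decimal

WORKING_DAYS_PER_MONTH = 20  # the existing business rule behind Worker.daily_rate


def validate_monthly_salary(value: Decimal) -> None:
    """Reject missing, zero or negative salaries before they produce R 0.00 payslips."""
    if value is None or value <= 0:
        raise ValueError("Monthly salary must be greater than zero.")


def daily_rate(monthly_salary: Decimal) -> Decimal:
    validate_monthly_salary(monthly_salary)
    return monthly_salary / WORKING_DAYS_PER_MONTH
```
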
---
### 🟢 LOW — performance, N+1 refinements
The `index` dashboard, `payroll_dashboard`, and `work_history` views all have loops that access related objects, which would be N+1 query patterns without the existing `prefetch_related` calls. The prefetches ARE present; they're just not documented clearly enough that a future Claude/dev will preserve them.

**Fix:** Add a comment above each loop explaining the prefetch, to prevent regression. No code change needed. ~30 minutes of doc comments.

---
## Part B — Fork & test-deploy strategy
### Overview
The test deploy goal is to verify: *can this app run and function correctly on a non-Flatlogic platform, so we're not locked in*. The fork gives us an isolated place to experiment without touching the working `ai-dev`/`master` branches or our current `redesign-weasyprint` work.
### Fork strategy (git)
Since this repo is already on GitHub at `Konradzar/LabourPay_v5`, create a fork branch (not a separate GitHub fork — we don't need the ceremony):
1. From the `redesign-weasyprint` branch (which has all the recent work, including WeasyPrint and worker/team/project management), cut a new branch:
```bash
git checkout redesign-weasyprint
git checkout -b test-deploy-<platform>
```
2. Commit the CRITICAL fixes (#1, #2, #3) on this branch **first** — those changes should eventually land everywhere, not just in the test deploy branch.
3. Add platform-specific deploy config on top (Dockerfile, fly.toml / railway.json / render.yaml, depending on platform).
4. Push only this branch (`git push -u origin test-deploy-<platform>`), not the other local branches, so GitHub sees it but `ai-dev` and `master` stay unaffected.

Flatlogic keeps syncing only `ai-dev`. The test-deploy branch is invisible to Flatlogic.
### Platform-specific deploy config (assuming Fly.io — adjust if you meant different)
1. **`Dockerfile`** — single-stage Python 3.13 image with WeasyPrint system deps:
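```dockerfile
FROM python:3.13-slim
# WeasyPrint 53+ no longer uses Cairo or GdkPixbuf; its Debian install docs list
# Pango, PangoFT2 and HarfBuzz (verify against the pinned WeasyPrint version).
# The MySQL headers, pkg-config and compiler are for building mysqlclient.
RUN apt-get update && apt-get install -y \
    libpango-1.0-0 libpangoft2-1.0-0 libharfbuzz-subset0 libffi-dev \
    default-libmysqlclient-dev pkg-config build-essential \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY requirements.txt .
# requirements.txt must list gunicorn (used by CMD) and whitenoise.
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Once fix #2 lands, settings.py refuses to load without DJANGO_SECRET_KEY, so
# collectstatic gets a throwaway build-only value; the real key is a Fly secret.
RUN DJANGO_SECRET_KEY=build-only-not-secret python manage.py collectstatic --noinput
CMD ["gunicorn", "config.wsgi:application", "--bind", "0.0.0.0:8000"]
```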
2. **`fly.toml`** — minimal Fly config:
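```toml
app = "foxfitt-test"
primary_region = "jnb"  # Johannesburg, closest to South Africa

[env]
  DJANGO_DEBUG = "false"
  USE_SQLITE = "false"
  HOST_FQDN = "foxfitt-test.fly.dev"

# Apply migrations in a temporary machine before each release,
# so a fresh database gets its tables.
[deploy]
  release_command = "python manage.py migrate --noinput"

# Fly's proxy terminates TLS and forwards to gunicorn on port 8000;
# force_https redirects plain-HTTP requests to HTTPS.
[http_service]
  internal_port = 8000
  force_https = true
```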
3. **Secrets** (set via `fly secrets set`):
- `DJANGO_SECRET_KEY`
- `DB_NAME`, `DB_USER`, `DB_PASS`, `DB_HOST`, `DB_PORT` (for managed MySQL/Postgres)
- `EMAIL_HOST_USER`, `EMAIL_HOST_PASSWORD`
- `SPARK_RECEIPT_EMAIL`
- `DEFAULT_FROM_EMAIL`
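4. **Add WhiteNoise** for static file serving (one line in `MIDDLEWARE`, plus `whitenoise` in `requirements.txt`). Fly runs only the gunicorn process, and Django doesn't serve static files itself when `DEBUG` is off, so without WhiteNoise nothing serves `/static/`.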
5. **Database decision**:
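- **Simplest:** Fly's managed PostgreSQL (needs a settings.py `ENGINE` tweak plus a PostgreSQL driver such as `psycopg`). Fly has no managed MySQL; MySQL on Fly means running and maintaining it yourself on a volume.
- **Alternative:** an external managed MySQL (AWS RDS; PlanetScale also works but dropped its free tier in 2024)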
6. **Media storage decision**:
- For test deploy: ephemeral is fine
- For anything beyond test: add `django-storages` + S3 bucket
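
For step 4, a sketch of the settings WhiteNoise needs, assuming Django 4.2+ (for `STORAGES`) and that `config/settings.py` already defines `BASE_DIR`. The middleware list shows only a subset of Django's stock entries for context; the project's own list stays as-is apart from the one WhiteNoise line, which WhiteNoise's docs place directly after `SecurityMiddleware`. The `staticfiles` directory name is an assumption.

```python
from pathlib import Path

# Stand-in so this fragment runs on its own; settings.py already defines BASE_DIR.
BASE_DIR = Path(".").resolve()

STATIC_URL = "/static/"
# collectstatic (run in the Dockerfile) copies every static file here,
# and WhiteNoise serves them from this directory.
STATIC_ROOT = BASE_DIR / "staticfiles"

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "whitenoise.middleware.WhiteNoiseMiddleware",  # the one added line
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
]

STORAGES = {
    "default": {"BACKEND": "django.core.files.storage.FileSystemStorage"},
    # Compressed, content-hashed filenames so browsers can cache them long-term.
    "staticfiles": {
        "BACKEND": "whitenoise.storage.CompressedManifestStaticFilesStorage",
    },
}
```
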
### Test deploy verification checklist
Once deployed, verify:
- [ ] Home page loads (`/`)
- [ ] Login works with a seeded admin user (e.g. one created with `python manage.py createsuperuser` inside `fly ssh console`)
- [ ] Dashboard renders
- [ ] Create a worker via friendly UI
- [ ] Log attendance
- [ ] Process a payment (critical path)
- [ ] Download a payroll report PDF (verifies WeasyPrint + system libs)
- [ ] Generate CSV exports
- [ ] Django admin pages load
- [ ] Session cookies survive across requests
- [ ] Static files (CSS, images) load from `/static/`
- [ ] Logout works (the session ends and protected pages redirect to login)
### Rollback
Test deploy is entirely on its own branch. If it fails catastrophically, delete the Fly app (`fly apps destroy foxfitt-test`) and the branch. Zero impact on Flatlogic production.

---
## Part C — Recommended execution order
I don't recommend executing this as one big batch. Split it into four phases plus an optional fifth; ship each, observe, then proceed:

**Phase 1: Critical security fixes (ship to everything)**
- Remove hardcoded email credentials (#1)
- Fix SECRET_KEY default (#2)
- Fix DEBUG default (#3)
- Fix CSRF_TRUSTED_ORIGINS bug (#6)
- **Rotate exposed Gmail App Password**
- Before merging, confirm `DJANGO_SECRET_KEY` (and the email variables) are set in Flatlogic's environment: after #2 the app refuses to start without the key.
- Land this on `redesign-weasyprint`, merge to `ai-dev`, let Flatlogic rebuild, verify still working.

**Phase 2: Create the fork branch**
- Cut `test-deploy-<platform>` off `redesign-weasyprint` after Phase 1 is in.
- No code changes; just the branch exists.

**Phase 3: Add deploy config for chosen platform**
- Dockerfile, platform config file, WhiteNoise, static-root adjustments, etc.
- Commit on `test-deploy-<platform>` only.

**Phase 4: Deploy + verify**
- Push branch, deploy, run the verification checklist above.
- Document any platform-specific quirks in a follow-up note.

**Phase 5 (optional, only if going beyond test): Address MEDIUM findings**
- PDF-None handling (#7)
- `price_overtime` exception leak (#8)
- Clickjacking header (#9)
- Salary validation (#10)
---
## Open questions / decisions needed
1. **What platform did you actually mean?** (SpacetimeDB blocker — see top of doc)
2. For the test deploy, is the database allowed to start empty, or do you want the production MySQL data copied in?
3. Do you want the CRITICAL fixes merged into Flatlogic production (via `ai-dev`) as part of Phase 1, or hold off until the whole plan is approved?
4. Budget sensitivity: Fly.io no longer gives free allowances to new accounts, so even a small test app costs a few dollars a month; Railway's entry plan is similar (~$5/mo). Are we constrained to free tiers, or is a few dollars a month OK for the test?
---
## Not in scope (explicit)
- Adding new app features (no new views, models, or migrations beyond what exists)
- Rewriting in a different language/framework (e.g., actual SpacetimeDB migration)
- Performance tuning at scale (current data size is small; defer until needed)
- Removing Flatlogic as the production platform — this is a *test* deploy to prove portability, not a migration