From 4f5f1bbe1371d955d5ce8585491e733091478345 Mon Sep 17 00:00:00 2001 From: Konrad du Plessis Date: Fri, 12 Jun 2026 22:35:51 +0200 Subject: [PATCH] =?UTF-8?q?docs:=20audit=20bundle=20deployed=20&=20verifie?= =?UTF-8?q?d=20=E2=80=94=20close=20breadcrumbs,=20park=20leftovers,=20reco?= =?UTF-8?q?rd=20gitea-auth=20break=20+=20autosave-diff=20rule?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Claude Fable 5 --- CLAUDE.md | 82 ++++++++++++++++++++------------------- docs/plans/parked-work.md | 44 ++++++++++++++++++--- 2 files changed, 81 insertions(+), 45 deletions(-) diff --git a/CLAUDE.md b/CLAUDE.md index 0209c46..429512a 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -3,16 +3,16 @@ ## What's mid-flight — read this first **Parked / deferred work:** see `docs/plans/parked-work.md`. -**Production status (29 May 2026):** ✅ **fully caught up, verified, -and recovered from a 27-29 May platform incident.** The 36-commit -bundle (Manager/Salaried Pay + pay-type filter + Salary auto-scope -picker + Pay Salary dashboard quick action) is **deployed and -confirmed working on production** (`https://foxlog.flatlogic.app/`, -Konrad verified 17 May 2026 and re-verified post-incident 29 May -2026 via a live test payment + Spark Receipt delivery). -`origin/ai-dev` HEAD `80d96d7` == prod (the only delta over the -functional tip `4c25011` is doc breadcrumbs). Migrations -`0016`/`0017` applied; `static/css/custom.css` collected. +**Production status (12 Jun 2026):** ✅ **fully caught up & verified.** +`origin/ai-dev` HEAD `abfae69` == prod (deployed 12 Jun 2026 via +fetch + `reset --hard` + restart; Konrad browser-verified same day). +Live on prod: the Manager/Salaried bundle (17 May), the SiteReport +removal (migration `0018` confirmed applied), the **12 Jun audit-fix +bundle** (see breadcrumb below), and the Flatlogic preview domain in +`ALLOWED_HOSTS` (`abfae69`).
Migrations applied through `0018`; +static collected (no changes since May). **Known platform nit:** the +VM can no longer push to Flatlogic's gitea mirror (auth failed since +the May incident — see "Git remotes on the VM" below). **🔥 Incident 27-29 May 2026 (now closed) — what future sessions need to know:** Cloudflare Tunnel error 1033 (27 May) → suspected @@ -34,37 +34,33 @@ password manager. **Strategic side note:** SSH access closes the `C:\Users\konra\.claude\plans\prancy-painting-brook.md` (off-platform backup of `media/` is now feasible via `rsync`). -**SiteReport removal — pushed to origin (per git, 12 Jun 2026).** The -"Log Today's Work" / SiteReport removal (migration -`0018_delete_sitereport`, commit `7f5e4c9` + follow-ups through -`663b7d9`) is on `origin/ai-dev` — the old "local only, NOT pushed" -HARD STOP here is resolved. Production deploy status of those commits -should be confirmed with Konrad before assuming prod has them -(deploy needs: pull → `/run-migrate/` for `0018` → restart). Design -knowledge for a future rebuild lives in +**SiteReport removal — DEPLOYED (confirmed 12 Jun 2026).** Migration +`0018_delete_sitereport` shows `[X]` applied on production and the +code lineage was already on the VM. Fully resolved — nothing pending. +Design knowledge for a future rebuild lives in `docs/plans/2026-05-17-site-report-removed-capture.md`; see also the parked rebuild entry in `docs/plans/parked-work.md`. -**🚀 Pushed to origin, awaiting PRODUCTION deploy: 12 Jun 2026 -audit-fix bundle** (`14ab8d0..2d3cc43`, 11 commits, pushed 12 Jun -2026 with Konrad's approval).
A comprehensive technical audit (4 -parallel review agents + manual verification of every finding) -fixed: email-failure 500 after committed payments (the 28 May -incident class), Batch Pay modal silently re-ticking unticked -workers + swallowing server errors, payments with deductions > -earnings now REFUSED (Konrad's decision — no negative -PayrollRecords), attendance date range capped at 31 days, -worker-report views survive junk query params, **worker batch -report's lifetime "Total Paid" column was inflated by the work-log -join (real display bug — fixed + regression test)**, report-page -N+1s killed, money paths standardised on Decimal. Verified before -push: suite **206 OK** + live browser checks of the Batch Pay -filter behaviour, attendance cost estimator, payslip preview, and -the 31-day range rejection (12 Jun 2026, local dev). **Deploy to -prod still pending:** pull → restart (NO new migrations; no -`static/` changes so no collectstatic; restart required — cached -template loader). Mind the deploy ordering rule: confirm the pull -landed `2d3cc43` BEFORE the restart. +**✅ 12 Jun 2026 audit-fix bundle — DEPLOYED & VERIFIED on +production** (`14ab8d0..abfae69`, 13 commits).
A comprehensive +technical audit (4 parallel review agents + manual verification of +every finding) fixed: email-failure 500 after committed payments +(the 28 May incident class — interactive callers now get the warning +toast; only batch re-raises), Batch Pay modal silently re-ticking +unticked workers + swallowing server errors, **payments with +deductions > earnings now REFUSED** (Konrad's decision — no negative +PayrollRecords; exactly-zero net still allowed; see +`DeductionsExceedEarningsError` in `core/views.py`), attendance date +range capped at 31 days, worker-report views survive junk query +params (`_int_param_or_none`), **worker batch report's lifetime +"Total Paid" was inflated by the work-log join** (fixed + +`WorkerReportLifetimeTotalsTests`), report-page N+1s killed, money +display paths standardised on Decimal. Suite **206 OK**; browser +checks + prod smoke test passed (Konrad, 12 Jun 2026). The deploy +also surfaced + properly committed a VM-side autosave fix +(`abfae69`: Flatlogic preview domain in `ALLOWED_HOSTS`) — see the +"reset --hard" warning in the Deployment section. Small leftovers +parked in `docs/plans/parked-work.md` ("Audit leftovers"). **🧊 Backburner — do NOT start in `ai-dev`:** Phase A.2 (manual JournalEntry UI) and Phase B (Letterly inbound webhook) are @@ -1009,7 +1005,7 @@ printing the value. - **Service**: The Django app runs as `django-dev.service` (systemd). Gemini restarts it via `sudo systemctl restart django-dev.service`. It runs `python manage.py runserver 0.0.0.0:8000 --insecure` — a **development server**, not gunicorn/uwsgi (Flatlogic default, works fine at this scale). - **⚠ The `--insecure` flag on runserver is REQUIRED in production (added 29 May 2026).** With `DEBUG=False` (the correct production state), Django's `runserver` refuses to serve `/static/` files by default — every CSS/JS request returns 404, and the dashboard renders as plain unstyled HTML.
The `--insecure` flag explicitly opts in to serving static files even with DEBUG off. **If you ever see "everything works but the page looks unstyled" after a deploy:** check the `ExecStart=` line in `/etc/systemd/system/django-dev.service` (or its drop-in directory) — if `--insecure` is missing, add it, then `sudo systemctl daemon-reload && sudo systemctl restart django-dev.service`. The proper long-term fix is an Apache `Alias /static/ → staticfiles/` directive that bypasses Django entirely, but `--insecure` is a stable workaround. - **⚠ Cloudflare HIT-caches 404 responses for ~4h.** If a static-file URL returned 404 at any point, Cloudflare will keep serving that 404 even after you fix the underlying problem. To verify a fix without waiting for the TTL: append a random query string (`?cb=$(date +%s)`) — that's a cache key Cloudflare hasn't seen, so it fetches from origin. The Flatlogic preview iframe sometimes shows cached-working CSS while a fresh browser tab shows the cached 404; trust the browser tab, not the iframe. -- **⚠ DEPLOY ORDERING — pull THEN restart, not the reverse.** Production runs `DEBUG=False`, so Django uses the **cached template loader**: every `.html` template is compiled into memory once at process start and is NEVER re-read from disk until the process restarts. Symptom of getting this wrong: "I pulled the code, `git log` shows the right commit, but the page still looks old." Cause: the `restart` happened *before* the code reached the target commit (e.g. Flatlogic auto-pulled afterward, or Gemini pulled after restarting). **Fix: restart AGAIN, after confirming `git log --oneline -1` is at the target commit.** Correct deploy order is ALWAYS: (1) `git fetch github ai-dev && git reset --hard github/ai-dev`, (2) `/run-migrate/` if there are new migrations, (3) `collectstatic` if `static/` changed, (4) `sudo systemctl restart django-dev.service` **last**.
Template-only changes still need the restart (cached loader) — unlike local dev where `DEBUG=True` re-reads templates per request. Bit us 15 May 2026: 14 commits of template fixes were "invisible" on prod until a second restart. `git reset --hard github/ai-dev` (not `git pull`) is preferred because the VM accumulates Flatlogic-editor autosave commits that make a plain pull conflict. +- **⚠ DEPLOY ORDERING — pull THEN restart, not the reverse.** Production runs `DEBUG=False`, so Django uses the **cached template loader**: every `.html` template is compiled into memory once at process start and is NEVER re-read from disk until the process restarts. Symptom of getting this wrong: "I pulled the code, `git log` shows the right commit, but the page still looks old." Cause: the `restart` happened *before* the code reached the target commit (e.g. Flatlogic auto-pulled afterward, or Gemini pulled after restarting). **Fix: restart AGAIN, after confirming `git log --oneline -1` is at the target commit.** Correct deploy order is ALWAYS: (1) `git fetch github ai-dev && git reset --hard github/ai-dev`, (2) `/run-migrate/` if there are new migrations, (3) `collectstatic` if `static/` changed, (4) `sudo systemctl restart django-dev.service` **last**. Template-only changes still need the restart (cached loader) — unlike local dev where `DEBUG=True` re-reads templates per request. Bit us 15 May 2026: 14 commits of template fixes were "invisible" on prod until a second restart. `git reset --hard github/ai-dev` (not `git pull`) is preferred because the VM accumulates Flatlogic-editor autosave commits that make a plain pull conflict.
**⚠ BEFORE any reset --hard: `git show --stat` every VM-local commit being discarded, and anchor it with a `pre-*` safety branch.** An "Autosave" commit can contain a REAL fix — on 12 Jun 2026 autosave `98f66e9` held the only copy of the Flatlogic preview domain in `ALLOWED_HOSTS` (a 29 May incident-recovery edit); we read the diff, anchored it, and re-committed it properly via GitHub (`abfae69`). Never discard a VM commit unread. - **CDN**: All production traffic goes through Cloudflare. Response headers show `cf-ray`/`cf-cache-status`. Static assets are cached at the edge for 4h — see "Static Assets & Cache-Busting" section for how the `deployment_timestamp` token breaks stale caches. - **Never edit `ai-dev` directly on GitHub** — Flatlogic pushes overwrite it - **Gemini gotcha**: Flatlogic's Gemini AI reads `__pycache__/*.pyc` and gets confused. Tell it: "Do NOT read .pyc files. Only work with .py source files." @@ -1026,6 +1022,14 @@ which silently confuses deploys. Flatlogic's UI occasionally commits as `Flatlogic Bot ` (autosaves from the in-browser file editor) — those commits land on gitea but don't propagate to GitHub unless someone pushes. +**⚠ gitea auth BROKEN since the May 2026 incident (found 12 Jun 2026):** +the VM gets `Authentication failed` pushing to gitea — credentials +presumably rotated/lost in the platform recovery. Until Erik/Flatlogic +support restores them, deploys are GitHub→VM one-way and the Flatlogic +dashboard may show a stale commit; don't burn time re-diagnosing a +gitea push failure. Once fixed, re-sync with `git push gitea ai-dev` +(may need `--force` — gitea last saw the discarded autosave `98f66e9`).
+ ### VM-local safety branches When doing risky deploys (model migrations, branch resets, history rewrites), we create a safety branch on the VM at the pre-deploy HEAD so Gemini can diff --git a/docs/plans/parked-work.md b/docs/plans/parked-work.md index 7767bbc..e7bafae 100644 --- a/docs/plans/parked-work.md +++ b/docs/plans/parked-work.md @@ -1,12 +1,45 @@ # Parked / deferred work -> Updated 15 May 2026. A small index of features that are designed, +> Updated 12 Jun 2026. A small index of features that are designed, > half-built, blocked on input, or pending an operator step. When a > fresh session opens, glance here first to see what's already on > the workbench. --- +## 🔧 Audit leftovers (12 Jun 2026) — small, deliberate deferrals + +The 12 Jun 2026 technical-audit bundle (deployed; see CLAUDE.md +breadcrumb) deliberately left these on the bench: + +1. **Team/project batch-report per-row queries.** + `_build_team_report_context` and `_build_project_report_context` + (`core/views.py`) still run a per-team/per-project query loop + (~2 queries × ~6 rows on `/teams/report/` and `/projects/report/`). + Same disease as the fixed worker report; skipped because the pages + are rarely used and the fix pattern is already proven (copy the + batched-GROUP-BY-dict restructure from + `_build_worker_report_context`). Do it next time those pages feel + slow or get touched anyway. +2. **JS vs server decimal-separator inconsistency (cosmetic).** + Browser-side money helpers use `toLocaleString('en-ZA')` → comma + decimals (`R 2 400,00`); server-side `money` filter renders dot + decimals (`R 2 400.00`). Pre-dates the audit; harmonising means + choosing ONE convention and touching both sides. Pure cosmetics — + decide when something user-facing forces the question. +3. **Stale VM safety branches.** Five `pre-*` branches on the VM + (four from Apr 2026 + `pre-audit-deploy-20260612`).
After a few + days of confirmed stability post-12-Jun, tell the Flatlogic agent + to delete them all (the 20260612 one anchors autosave `98f66e9`, + whose only real content is now properly committed as `abfae69`). +4. **gitea mirror auth broken** — operator step: Konrad asks Erik + (Flatlogic support) to restore the VM's gitea credentials; then + re-sync with `git push gitea ai-dev` (may need `--force`). Until + then deploys are GitHub→VM one-way. Details in CLAUDE.md "Git + remotes on the VM". + +--- + ## ⏸ Paused — ready to execute (not started, not pushed) ### Site-progress logging — rebuild from scratch (parked) @@ -20,11 +53,10 @@ schema-as-Python pattern, recovery pointers, and the now-superseded `2026-05-15-post-attendance-flow-v2-*` prior thinking, which stays on disk). -**Removal status (local only, HARD STOP — not pushed):** Tasks 1-3 -done — model/table/UI/routes deleted, migration -`0018_delete_sitereport` drops `core_sitereport`, suite **193 OK**. -Un-pushed pending Konrad's local verification (destructive migration -on the daily-use attendance path). +**Removal status: SHIPPED.** Pushed to origin and confirmed deployed +on production (migration `0018_delete_sitereport` shows `[X]` applied +— verified 12 Jun 2026 during the audit-bundle deploy). Nothing +pending; only the future rebuild remains parked. ---
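
The deploy ordering rule recorded in the CLAUDE.md hunk above (fetch and reset first, restart last, and only after confirming HEAD is at the target commit) can be sketched as a small pure-Python helper. This is an illustrative sketch, not part of the repository: the function names `deploy_steps` and `head_matches_target` are hypothetical, and the step strings are copied from the notes rather than executed.

```python
from typing import List

# Step strings are quoted from the deploy notes; nothing is executed here.
FETCH_RESET = "git fetch github ai-dev && git reset --hard github/ai-dev"
RUN_MIGRATE = "/run-migrate/"
COLLECTSTATIC = "collectstatic"
RESTART = "sudo systemctl restart django-dev.service"


def deploy_steps(has_new_migrations: bool, static_changed: bool) -> List[str]:
    """Hypothetical helper: list the deploy steps in the documented order.

    The restart is always last, because with DEBUG=False the cached
    template loader only re-reads templates when the process restarts.
    """
    steps = [FETCH_RESET]
    if has_new_migrations:
        steps.append(RUN_MIGRATE)
    if static_changed:
        steps.append(COLLECTSTATIC)
    steps.append(RESTART)
    return steps


def head_matches_target(git_log_oneline: str, target: str) -> bool:
    """Hypothetical check on the output of `git log --oneline -1`.

    The first token of that output is the abbreviated commit hash. It is
    compared as a prefix in both directions, because the abbreviation
    length printed by git can differ from the one written in the notes.
    """
    tokens = git_log_oneline.split()
    if not tokens or not target:
        return False
    head = tokens[0]
    return head.startswith(target) or target.startswith(head)
```

For the 12 Jun audit bundle (no new migrations, no `static/` changes), `deploy_steps(False, False)` yields just the fetch/reset followed by the restart, and `head_matches_target` would be run against the `git log --oneline -1` output before that restart.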