docs: audit bundle deployed & verified — close breadcrumbs, park leftovers, record gitea-auth break + autosave-diff rule
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent abfae69606
commit 4f5f1bbe13
82
CLAUDE.md
@@ -3,16 +3,16 @@
 ## What's mid-flight — read this first
 **Parked / deferred work:** see `docs/plans/parked-work.md`.

-**Production status (29 May 2026):** ✅ **fully caught up, verified,
-and recovered from a 27-29 May platform incident.** The 36-commit
-bundle (Manager/Salaried Pay + pay-type filter + Salary auto-scope
-picker + Pay Salary dashboard quick action) is **deployed and
-confirmed working on production** (`https://foxlog.flatlogic.app/`,
-Konrad verified 17 May 2026 and re-verified post-incident 29 May
-2026 via a live test payment + Spark Receipt delivery).
-`origin/ai-dev` HEAD `80d96d7` == prod (the only delta over the
-functional tip `4c25011` is doc breadcrumbs). Migrations
-`0016`/`0017` applied; `static/css/custom.css` collected.
+**Production status (12 Jun 2026):** ✅ **fully caught up & verified.**
+`origin/ai-dev` HEAD `abfae69` == prod (deployed 12 Jun 2026 via
+fetch + `reset --hard` + restart; Konrad browser-verified same day).
+Live on prod: the Manager/Salaried bundle (17 May), the SiteReport
+removal (migration `0018` confirmed applied), the **12 Jun audit-fix
+bundle** (see breadcrumb below), and the Flatlogic preview domain in
+`ALLOWED_HOSTS` (`abfae69`). Migrations applied through `0018`;
+static collected (no changes since May). **Known platform nit:** the
+VM can no longer push to Flatlogic's gitea mirror (auth failed since
+the May incident — see "Git remotes on the VM" below).

 **🔥 Incident 27-29 May 2026 (now closed) — what future sessions
 need to know:** Cloudflare Tunnel error 1033 (27 May) → suspected
@@ -34,37 +34,33 @@ password manager. **Strategic side note:** SSH access closes the
 `C:\Users\konra\.claude\plans\prancy-painting-brook.md` (off-platform
 backup of `media/` is now feasible via `rsync`).

-**SiteReport removal — pushed to origin (per git, 12 Jun 2026).** The
-"Log Today's Work" / SiteReport removal (migration
-`0018_delete_sitereport`, commit `7f5e4c9` + follow-ups through
-`663b7d9`) is on `origin/ai-dev` — the old "local only, NOT pushed"
-HARD STOP here is resolved. Production deploy status of those commits
-should be confirmed with Konrad before assuming prod has them
-(deploy needs: pull → `/run-migrate/` for `0018` → restart). Design
-knowledge for a future rebuild lives in
+**SiteReport removal — DEPLOYED (confirmed 12 Jun 2026).** Migration
+`0018_delete_sitereport` shows `[X]` applied on production and the
+code lineage was already on the VM. Fully resolved — nothing pending.
+Design knowledge for a future rebuild lives in
 `docs/plans/2026-05-17-site-report-removed-capture.md`; see also the
 parked rebuild entry in `docs/plans/parked-work.md`.

-**🚀 Pushed to origin, awaiting PRODUCTION deploy: 12 Jun 2026
-audit-fix bundle** (`14ab8d0..2d3cc43`, 11 commits, pushed 12 Jun
-2026 with Konrad's approval). A comprehensive technical audit (4
-parallel review agents + manual verification of every finding)
-fixed: email-failure 500 after committed payments (the 28 May
-incident class), Batch Pay modal silently re-ticking unticked
-workers + swallowing server errors, payments with deductions >
-earnings now REFUSED (Konrad's decision — no negative
-PayrollRecords), attendance date range capped at 31 days,
-worker-report views survive junk query params, **worker batch
-report's lifetime "Total Paid" column was inflated by the work-log
-join (real display bug — fixed + regression test)**, report-page
-N+1s killed, money paths standardised on Decimal. Verified before
-push: suite **206 OK** + live browser checks of the Batch Pay
-filter behaviour, attendance cost estimator, payslip preview, and
-the 31-day range rejection (12 Jun 2026, local dev). **Deploy to
-prod still pending:** pull → restart (NO new migrations; no
-`static/` changes so no collectstatic; restart required — cached
-template loader). Mind the deploy ordering rule: confirm the pull
-landed `2d3cc43` BEFORE the restart.
+**✅ 12 Jun 2026 audit-fix bundle — DEPLOYED & VERIFIED on
+production** (`14ab8d0..abfae69`, 13 commits). A comprehensive
+technical audit (4 parallel review agents + manual verification of
+every finding) fixed: email-failure 500 after committed payments
+(the 28 May incident class — interactive callers now get the warning
+toast; only batch re-raises), Batch Pay modal silently re-ticking
+unticked workers + swallowing server errors, **payments with
+deductions > earnings now REFUSED** (Konrad's decision — no negative
+PayrollRecords; exactly-zero net still allowed; see
+`DeductionsExceedEarningsError` in `core/views.py`), attendance date
+range capped at 31 days, worker-report views survive junk query
+params (`_int_param_or_none`), **worker batch report's lifetime
+"Total Paid" was inflated by the work-log join** (fixed +
+`WorkerReportLifetimeTotalsTests`), report-page N+1s killed, money
+display paths standardised on Decimal. Suite **206 OK**; browser
+checks + prod smoke test passed (Konrad, 12 Jun 2026). The deploy
+also surfaced + properly committed a VM-side autosave fix
+(`abfae69`: Flatlogic preview domain in `ALLOWED_HOSTS`) — see the
+"reset --hard" warning in the Deployment section. Small leftovers
+parked in `docs/plans/parked-work.md` ("Audit leftovers").

 **🧊 Backburner — do NOT start in `ai-dev`:** Phase A.2 (manual
 JournalEntry UI) and Phase B (Letterly inbound webhook) are
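Reviewer note on the hunk above: the "deductions > earnings now REFUSED, exactly-zero net still allowed" rule can be sketched as follows. This is a minimal illustration, not the code in `core/views.py`: only the exception name `DeductionsExceedEarningsError` comes from the breadcrumb, while its base class, the `net_pay` helper, and the rounding to cents are assumptions made for the example.

```python
from decimal import Decimal


class DeductionsExceedEarningsError(ValueError):
    """Assumed shape: raised when deductions would make net pay negative."""


def net_pay(earnings: Decimal, deductions: Decimal) -> Decimal:
    """Hypothetical helper: return net pay, refusing a negative result."""
    cents = Decimal("0.01")
    # Quantize both sides to cents so the comparison is exact.
    earnings = Decimal(earnings).quantize(cents)
    deductions = Decimal(deductions).quantize(cents)
    if deductions > earnings:
        # Strictly greater: a net of exactly zero is still allowed.
        raise DeductionsExceedEarningsError(
            f"deductions {deductions} exceed earnings {earnings}"
        )
    return earnings - deductions
```

Using a strict `>` comparison on `Decimal` values is what keeps the exactly-zero case payable while still refusing any negative PayrollRecord.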
@@ -1009,7 +1005,7 @@ printing the value.
 - **Service**: The Django app runs as `django-dev.service` (systemd). Gemini restarts it via `sudo systemctl restart django-dev.service`. It runs `python manage.py runserver 0.0.0.0:8000 --insecure` — a **development server**, not gunicorn/uwsgi (Flatlogic default, works fine at this scale).
 - **⚠ The `--insecure` flag on runserver is REQUIRED in production (added 29 May 2026).** With `DEBUG=False` (the correct production state), Django's `runserver` refuses to serve `/static/` files by default — every CSS/JS request returns 404, and the dashboard renders as plain unstyled HTML. The `--insecure` flag explicitly opts in to serving static files even with DEBUG off. **If you ever see "everything works but the page looks unstyled" after a deploy:** check the `ExecStart=` line in `/etc/systemd/system/django-dev.service` (or its drop-in directory) — if `--insecure` is missing, add it, then `sudo systemctl daemon-reload && sudo systemctl restart django-dev.service`. The proper long-term fix is an Apache `Alias /static/ → staticfiles/` directive that bypasses Django entirely, but `--insecure` is a stable workaround.
 - **⚠ Cloudflare HIT-caches 404 responses for ~4h.** If a static-file URL returned 404 at any point, Cloudflare will keep serving that 404 even after you fix the underlying problem. To verify a fix without waiting for the TTL: append a random query string (`?cb=$(date +%s)`) — that's a cache key Cloudflare hasn't seen, so it fetches from origin. The Flatlogic preview iframe sometimes shows cached-working CSS while a fresh browser tab shows the cached 404; trust the browser tab, not the iframe.
-- **⚠ DEPLOY ORDERING — pull THEN restart, not the reverse.** Production runs `DEBUG=False`, so Django uses the **cached template loader**: every `.html` template is compiled into memory once at process start and is NEVER re-read from disk until the process restarts. Symptom of getting this wrong: "I pulled the code, `git log` shows the right commit, but the page still looks old." Cause: the `restart` happened *before* the code reached the target commit (e.g. Flatlogic auto-pulled afterward, or Gemini pulled after restarting). **Fix: restart AGAIN, after confirming `git log --oneline -1` is at the target commit.** Correct deploy order is ALWAYS: (1) `git fetch github ai-dev && git reset --hard github/ai-dev`, (2) `/run-migrate/` if there are new migrations, (3) `collectstatic` if `static/` changed, (4) `sudo systemctl restart django-dev.service` **last**. Template-only changes still need the restart (cached loader) — unlike local dev where `DEBUG=True` re-reads templates per request. Bit us 15 May 2026: 14 commits of template fixes were "invisible" on prod until a second restart. `git reset --hard github/ai-dev` (not `git pull`) is preferred because the VM accumulates Flatlogic-editor autosave commits that make a plain pull conflict.
+- **⚠ DEPLOY ORDERING — pull THEN restart, not the reverse.** Production runs `DEBUG=False`, so Django uses the **cached template loader**: every `.html` template is compiled into memory once at process start and is NEVER re-read from disk until the process restarts. Symptom of getting this wrong: "I pulled the code, `git log` shows the right commit, but the page still looks old." Cause: the `restart` happened *before* the code reached the target commit (e.g. Flatlogic auto-pulled afterward, or Gemini pulled after restarting). **Fix: restart AGAIN, after confirming `git log --oneline -1` is at the target commit.** Correct deploy order is ALWAYS: (1) `git fetch github ai-dev && git reset --hard github/ai-dev`, (2) `/run-migrate/` if there are new migrations, (3) `collectstatic` if `static/` changed, (4) `sudo systemctl restart django-dev.service` **last**. Template-only changes still need the restart (cached loader) — unlike local dev where `DEBUG=True` re-reads templates per request. Bit us 15 May 2026: 14 commits of template fixes were "invisible" on prod until a second restart. `git reset --hard github/ai-dev` (not `git pull`) is preferred because the VM accumulates Flatlogic-editor autosave commits that make a plain pull conflict. **⚠ BEFORE any reset --hard: `git show --stat` every VM-local commit being discarded, and anchor it with a `pre-*` safety branch.** An "Autosave" commit can contain a REAL fix — on 12 Jun 2026 autosave `98f66e9` held the only copy of the Flatlogic preview domain in `ALLOWED_HOSTS` (a 29 May incident-recovery edit); we read the diff, anchored it, and re-committed it properly via GitHub (`abfae69`). Never discard a VM commit unread.
 - **CDN**: All production traffic goes through Cloudflare. Response headers show `cf-ray`/`cf-cache-status`. Static assets are cached at the edge for 4h — see "Static Assets & Cache-Busting" section for how the `deployment_timestamp` token breaks stale caches.
 - **Never edit `ai-dev` directly on GitHub** — Flatlogic pushes overwrite it
 - **Gemini gotcha**: Flatlogic's Gemini AI reads `__pycache__/*.pyc` and gets confused. Tell it: "Do NOT read .pyc files. Only work with .py source files."
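Reviewer note on the hunk above: the deploy ordering rule, including the new "read and anchor before reset --hard" step, can be written down as an ordered checklist. The sketch below only builds the list of steps and runs nothing. The function name, its parameters, and the exact `collectstatic` invocation are assumptions; the git and systemctl commands are the ones quoted in the bullet.

```python
from typing import List


def deploy_plan(
    vm_local_commits: List[str],
    safety_branch: str,
    has_new_migrations: bool,
    static_changed: bool,
) -> List[str]:
    """Hypothetical helper: return deploy steps in the order the rule requires."""
    steps = ["git fetch github ai-dev"]
    # Anchor the current VM HEAD before anything is discarded.
    steps.append(f"git branch {safety_branch}")
    # Read every VM-local commit that the reset will throw away.
    steps += [f"git show --stat {sha}" for sha in vm_local_commits]
    steps.append("git reset --hard github/ai-dev")
    # Confirm the target commit landed BEFORE restarting.
    steps.append("git log --oneline -1")
    if has_new_migrations:
        steps.append("run /run-migrate/")
    if static_changed:
        # Assumed invocation; the source only says "collectstatic".
        steps.append("python manage.py collectstatic --noinput")
    # Restart is always last so the cached template loader sees the new code.
    steps.append("sudo systemctl restart django-dev.service")
    return steps
```

For the 12 Jun 2026 deploy this would have been called with `vm_local_commits=["98f66e9"]` and `safety_branch="pre-audit-deploy-20260612"`, with both flags false.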
@@ -1026,6 +1022,14 @@ which silently confuses deploys. Flatlogic's UI occasionally commits as
 `Flatlogic Bot <support@flatlogic.com>` (autosaves from the in-browser file editor) —
 those commits land on gitea but don't propagate to GitHub unless someone pushes.

+**⚠ gitea auth BROKEN since the May 2026 incident (found 12 Jun 2026):**
+the VM gets `Authentication failed` pushing to gitea — credentials
+presumably rotated/lost in the platform recovery. Until Erik/Flatlogic
+support restores them, deploys are GitHub→VM one-way and the Flatlogic
+dashboard may show a stale commit; don't burn time re-diagnosing a
+gitea push failure. Once fixed, re-sync with `git push gitea ai-dev`
+(may need `--force` — gitea last saw the discarded autosave `98f66e9`).
+
 ### VM-local safety branches
 When doing risky deploys (model migrations, branch resets, history rewrites), we
 create a safety branch on the VM at the pre-deploy HEAD so Gemini can
@@ -1,12 +1,45 @@
 # Parked / deferred work

-> Updated 15 May 2026. A small index of features that are designed,
+> Updated 12 Jun 2026. A small index of features that are designed,
 > half-built, blocked on input, or pending an operator step. When a
 > fresh session opens, glance here first to see what's already on
 > the workbench.

 ---

+## 🔧 Audit leftovers (12 Jun 2026) — small, deliberate deferrals
+
+The 12 Jun 2026 technical-audit bundle (deployed; see CLAUDE.md
+breadcrumb) deliberately left these on the bench:
+
+1. **Team/project batch-report per-row queries.**
+`_build_team_report_context` and `_build_project_report_context`
+(`core/views.py`) still run a per-team/per-project query loop
+(~2 queries × ~6 rows on `/teams/report/` and `/projects/report/`).
+Same disease as the fixed worker report; skipped because the pages
+are rarely used and the fix pattern is already proven (copy the
+batched-GROUP-BY-dict restructure from
+`_build_worker_report_context`). Do it next time those pages feel
+slow or get touched anyway.
+2. **JS vs server decimal-separator inconsistency (cosmetic).**
+Browser-side money helpers use `toLocaleString('en-ZA')` → comma
+decimals (`R 2 400,00`); server-side `money` filter renders dot
+decimals (`R 2 400.00`). Pre-dates the audit; harmonising means
+choosing ONE convention and touching both sides. Pure cosmetics —
+decide when something user-facing forces the question.
+3. **Stale VM safety branches.** Five `pre-*` branches on the VM
+(four from Apr 2026 + `pre-audit-deploy-20260612`). After a few
+days of confirmed stability post-12-Jun, tell the Flatlogic agent
+to delete them all (the 20260612 one anchors autosave `98f66e9`,
+whose only real content is now properly committed as `abfae69`).
+4. **gitea mirror auth broken** — operator step: Konrad asks Erik
+(Flatlogic support) to restore the VM's gitea credentials; then
+re-sync with `git push gitea ai-dev` (may need `--force`). Until
+then deploys are GitHub→VM one-way. Details in CLAUDE.md "Git
+remotes on the VM".
+
+---
+
 ## ⏸ Paused — ready to execute (not started, not pushed)

 ### Site-progress logging — rebuild from scratch (parked)
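Reviewer note on audit leftover 1 in the hunk above: the "batched-GROUP-BY-dict restructure" replaces one query per row with a single grouped query whose results are read from a dict. The sketch below shows that pattern with the standard-library `sqlite3` module rather than the Django ORM. The `payment` table, its columns, and both function names are invented for the example and do not reflect the real schema or `_build_worker_report_context`.

```python
import sqlite3
from typing import Dict, Iterable


def make_demo_db() -> sqlite3.Connection:
    """Build a throwaway in-memory table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payment (team_id INTEGER, amount_cents INTEGER)")
    conn.executemany(
        "INSERT INTO payment VALUES (?, ?)",
        [(1, 240000), (1, 10000), (2, 50000)],
    )
    return conn


def totals_per_row(conn: sqlite3.Connection, team_ids: Iterable[int]) -> Dict[int, int]:
    """The per-row loop: one query for every team."""
    totals = {}
    for team_id in team_ids:
        row = conn.execute(
            "SELECT COALESCE(SUM(amount_cents), 0) FROM payment WHERE team_id = ?",
            (team_id,),
        ).fetchone()
        totals[team_id] = row[0]
    return totals


def totals_batched(conn: sqlite3.Connection, team_ids: Iterable[int]) -> Dict[int, int]:
    """The batched version: one GROUP BY query, results looked up in a dict."""
    ids = list(team_ids)
    totals = {team_id: 0 for team_id in ids}  # teams with no rows stay at 0
    if not ids:
        return totals
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        "SELECT team_id, SUM(amount_cents) FROM payment "
        f"WHERE team_id IN ({placeholders}) GROUP BY team_id",
        ids,
    )
    for team_id, total in rows:
        totals[team_id] = total
    return totals
```

Both functions return the same mapping; the batched one issues a single query regardless of how many teams are on the page.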
@@ -20,11 +53,10 @@ schema-as-Python pattern, recovery pointers, and the now-superseded
 `2026-05-15-post-attendance-flow-v2-*` prior thinking, which stays on
 disk).

-**Removal status (local only, HARD STOP — not pushed):** Tasks 1-3
-done — model/table/UI/routes deleted, migration
-`0018_delete_sitereport` drops `core_sitereport`, suite **193 OK**.
-Un-pushed pending Konrad's local verification (destructive migration
-on the daily-use attendance path).
+**Removal status: SHIPPED.** Pushed to origin and confirmed deployed
+on production (migration `0018_delete_sitereport` shows `[X]` applied
+— verified 12 Jun 2026 during the audit-bundle deploy). Nothing
+pending; only the future rebuild remains parked.

 ---
