perf(payroll): batch project-loop N+1s + quick-wins pass closing summary
Profiled /payroll/ under Django Debug Toolbar and confirmed heavy N+1 patterns in the shared payroll_dashboard() code path (shared by all four tabs). Main wins: 1. outstanding_project_costs loop + project_chart_data loop previously fired one PayrollAdjustment SELECT per project (outstanding) and per (project x 6 months) (chart) — ~42+7 = 49 round-trips on a 7-project dataset. Replaced with 4 GROUP BY aggregate queries keyed by project_id / (project_id, month), merged in Python. 2. Per-worker Loan.exists() and get_worker_active_team() checks inside the workers_data loop — pre-computed into a set + dict once, up-front. 3. team_workers_map loop used `team.workers.filter(active=True)` which bypasses the prefetch cache; switched to a Prefetch(to_attr=) that returns already-filtered active workers, dropping 6 duplicate SELECTs. 4. Adjustments tab: reused `paginator.count` for the "Total" stat card (was firing a second identical COUNT(*)) and reused existing all_workers / all_teams querysets instead of re-querying for the filter popovers. 5. Hoisted shared lookups (all_workers, active_projects_list, chart date-window) so duplicate ordering-identical SELECTs from multiple call sites collapse into a single evaluated queryset. ===== Quick-Wins Pass A - before/after query counts ===== / 15q, no duplicates (healthy, no fix) /payroll/?status=pending 157q (before) -> 26q (after), 0 dupes /payroll/?status=history 157q -> 26q, 0 dupes /payroll/?status=loans 158q -> 27q, 0 dupes /payroll/?status=adjustments 168q -> 34q, 0 dupes CSS cache-bust token (0c42cde) is still expected to be the biggest user-felt improvement of this pass — custom.css now holds at Cloudflare's edge for its full 4h TTL instead of being re-fetched from the VM on every page load. The payroll-dashboard query-count cut (~131 SQL round-trips trimmed per render) is a meaningful admin-UX latency win on top of that, especially under MySQL over the Flatlogic network. WeasyPrint confirmed still lazy-imported. 
Test suite: 68/68. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
2731ac9ffd
commit
61c485ffcf
270
core/views.py
270
core/views.py
@@ -2617,6 +2617,30 @@ def payroll_dashboard(request):
|
||||
pending_adj_sub_total = Decimal('0.00') # Unpaid deductive adjustments
|
||||
all_ot_data = [] # For the Price Overtime modal
|
||||
|
||||
# === PRE-COMPUTED LOOKUPS — avoid per-worker SELECTs in the loop below ===
|
||||
# Previously the loop fired:
|
||||
# - one `Loan.objects.filter(worker=w, active=True).exists()` per worker
|
||||
# - one `worker.teams.filter(active=True).first()` per worker (via
|
||||
# get_worker_active_team) — which fires a fresh SELECT even though
|
||||
# active_workers was prefetched, because `.filter()` bypasses the
|
||||
# prefetch cache.
|
||||
# We batch both into dict lookups keyed by worker_id.
|
||||
workers_with_active_loan = set(
|
||||
Loan.objects.filter(active=True).values_list('worker_id', flat=True).distinct()
|
||||
)
|
||||
# Map worker_id → first active Team instance (mirrors get_worker_active_team).
|
||||
# We load every active team once, then walk the through-table to find the
|
||||
# first active team per worker.
|
||||
active_team_by_id = {t.id: t for t in Team.objects.filter(active=True)}
|
||||
worker_active_team = {}
|
||||
for membership in Team.workers.through.objects.filter(
|
||||
team_id__in=active_team_by_id.keys()
|
||||
).values('team_id', 'worker_id'):
|
||||
wid = membership['worker_id']
|
||||
if wid in worker_active_team:
|
||||
continue
|
||||
worker_active_team[wid] = active_team_by_id[membership['team_id']]
|
||||
|
||||
for worker in active_workers:
|
||||
# Find unpaid work logs for this worker.
|
||||
# A log is "unpaid for this worker" if no PayrollRecord links
|
||||
@@ -2668,7 +2692,8 @@ def payroll_dashboard(request):
|
||||
# --- Overdue detection ---
|
||||
# A worker is "overdue" if they have unpaid work from a completed pay period.
|
||||
# Uses their team's pay schedule to determine the cutoff date.
|
||||
team = get_worker_active_team(worker)
|
||||
# PERF: team lookup via pre-computed dict (no per-worker SELECT).
|
||||
team = worker_active_team.get(worker.id)
|
||||
team_name = team.name if team else ''
|
||||
earliest_unpaid = min((l.date for l in unpaid_logs), default=None) if unpaid_logs else None
|
||||
is_overdue = False
|
||||
@@ -2678,7 +2703,8 @@ def payroll_dashboard(request):
|
||||
cutoff = period_start - datetime.timedelta(days=1)
|
||||
is_overdue = earliest_unpaid <= cutoff
|
||||
|
||||
has_loan = Loan.objects.filter(worker=worker, active=True).exists()
|
||||
# PERF: loan membership via pre-computed set (no per-worker SELECT).
|
||||
has_loan = worker.id in workers_with_active_loan
|
||||
|
||||
# Most recent project — used by the "Adjust" button to pre-select project
|
||||
last_project_id = unpaid_logs[-1].project_id if unpaid_logs else None
|
||||
@@ -2718,31 +2744,16 @@ def payroll_dashboard(request):
|
||||
# --- Outstanding cost per project ---
|
||||
# Check per-worker: a WorkLog is "unpaid for worker X" if no PayrollRecord
|
||||
# links BOTH that log AND that worker. This handles partially-paid logs.
|
||||
outstanding_project_costs = []
|
||||
for project in Project.objects.filter(active=True):
|
||||
project_outstanding = Decimal('0.00')
|
||||
# Unpaid work log costs — check each worker individually
|
||||
for log in project.work_logs.prefetch_related('payroll_records', 'workers').all():
|
||||
paid_worker_ids = {pr.worker_id for pr in log.payroll_records.all()}
|
||||
for w in log.workers.all():
|
||||
if w.id not in paid_worker_ids:
|
||||
project_outstanding += w.daily_rate
|
||||
# Unpaid adjustments for this project
|
||||
unpaid_adjs = PayrollAdjustment.objects.filter(
|
||||
payroll_record__isnull=True
|
||||
).filter(Q(project=project) | Q(work_log__project=project))
|
||||
for adj in unpaid_adjs:
|
||||
if adj.type in ADDITIVE_TYPES:
|
||||
project_outstanding += adj.amount
|
||||
elif adj.type in DEDUCTIVE_TYPES:
|
||||
project_outstanding -= adj.amount
|
||||
if project_outstanding != 0:
|
||||
outstanding_project_costs.append({
|
||||
'name': project.name,
|
||||
'cost': project_outstanding,
|
||||
})
|
||||
#
|
||||
# PERF: materialise the active-project list once and reuse it for both
|
||||
# the outstanding-costs loop and the chart-data loop below. Previously
|
||||
# each loop re-queried `Project.objects.filter(active=True)`, firing the
|
||||
# same SELECT twice per dashboard render.
|
||||
active_projects_list = list(Project.objects.filter(active=True))
|
||||
active_project_ids = [p.id for p in active_projects_list]
|
||||
|
||||
# --- Chart data: last 6 months ---
|
||||
# === CHART DATE-WINDOW SETUP (moved up so the batched queries below can
|
||||
# also use it) ===
|
||||
today = timezone.now().date()
|
||||
chart_months = []
|
||||
for i in range(5, -1, -1):
|
||||
@@ -2756,6 +2767,76 @@ def payroll_dashboard(request):
|
||||
chart_labels = [
|
||||
datetime.date(y, m, 1).strftime('%b %Y') for y, m in chart_months
|
||||
]
|
||||
six_months_ago_date = datetime.date(chart_months[0][0], chart_months[0][1], 1)
|
||||
|
||||
# === BATCHED AGGREGATES: one SQL query per concept instead of per-project ===
|
||||
# Previously we looped over each active project and issued:
|
||||
# - 1 SELECT of WorkLog (with workers prefetch) per project
|
||||
# - 1 SELECT of PayrollAdjustment (unpaid) per project
|
||||
# - 1 SELECT of WorkLog (workers prefetch) per project × 6 months
|
||||
# - 1 SELECT of PayrollAdjustment (paid) per project × 6 months
|
||||
# On a ~7-project dataset that's ~7+7+42+42 ≈ 98 SQL round-trips.
|
||||
# The rewrite replaces those with 4 GROUP-BY queries that return
|
||||
# project_id (and month, where relevant) → total, plus one query for
|
||||
# per-log paid-worker sets.
|
||||
|
||||
# --- 1. Unpaid-work-log cost per project ---
|
||||
# We can't do pure SQL aggregation for this because a WorkLog can be
|
||||
# partially paid (one worker of two). We still need per-log inspection,
|
||||
# BUT we can load all unpaid-or-partially-paid logs + their workers +
|
||||
# payroll_records in a bounded set of queries using prefetch_related
|
||||
# rather than looping one project at a time.
|
||||
project_outstanding_map = {pid: Decimal('0.00') for pid in active_project_ids}
|
||||
|
||||
all_project_logs = WorkLog.objects.filter(
|
||||
project_id__in=active_project_ids
|
||||
).prefetch_related('payroll_records', 'workers')
|
||||
for log in all_project_logs:
|
||||
paid_worker_ids = {pr.worker_id for pr in log.payroll_records.all()}
|
||||
for w in log.workers.all():
|
||||
if w.id not in paid_worker_ids:
|
||||
project_outstanding_map[log.project_id] += w.daily_rate
|
||||
|
||||
# --- 2. Unpaid-adjustment net per project (batched via two GROUP BYs) ---
|
||||
# Each unpaid adjustment contributes to "its project" (direct FK) OR
|
||||
# its work_log's project. We aggregate both sides and merge in Python.
|
||||
def _sum_adj_by_project(qs, project_col):
|
||||
# Sum adjustment amounts grouped by project_col (e.g. 'project_id'),
|
||||
# separated by type family so we can apply add/subtract correctly.
|
||||
rows = qs.values(project_col, 'type').annotate(total=Sum('amount'))
|
||||
additive = {pid: Decimal('0.00') for pid in active_project_ids}
|
||||
deductive = {pid: Decimal('0.00') for pid in active_project_ids}
|
||||
for row in rows:
|
||||
pid = row[project_col]
|
||||
if pid not in additive:
|
||||
continue
|
||||
if row['type'] in ADDITIVE_TYPES:
|
||||
additive[pid] += row['total']
|
||||
elif row['type'] in DEDUCTIVE_TYPES:
|
||||
deductive[pid] += row['total']
|
||||
return additive, deductive
|
||||
|
||||
unpaid_adj_base = PayrollAdjustment.objects.filter(payroll_record__isnull=True)
|
||||
unpaid_direct_add, unpaid_direct_sub = _sum_adj_by_project(
|
||||
unpaid_adj_base.filter(project_id__in=active_project_ids),
|
||||
'project_id',
|
||||
)
|
||||
unpaid_wl_add, unpaid_wl_sub = _sum_adj_by_project(
|
||||
unpaid_adj_base.filter(work_log__project_id__in=active_project_ids),
|
||||
'work_log__project_id',
|
||||
)
|
||||
for pid in active_project_ids:
|
||||
project_outstanding_map[pid] += unpaid_direct_add[pid] + unpaid_wl_add[pid]
|
||||
project_outstanding_map[pid] -= unpaid_direct_sub[pid] + unpaid_wl_sub[pid]
|
||||
|
||||
outstanding_project_costs = []
|
||||
for project in active_projects_list:
|
||||
cost = project_outstanding_map[project.id]
|
||||
if cost != 0:
|
||||
outstanding_project_costs.append({
|
||||
'name': project.name,
|
||||
'cost': cost,
|
||||
})
|
||||
|
||||
# Monthly payroll totals
|
||||
paid_by_month_qs = PayrollRecord.objects.annotate(
|
||||
@@ -2767,28 +2848,71 @@ def payroll_dashboard(request):
|
||||
}
|
||||
chart_totals = [paid_by_month.get((y, m), 0) for y, m in chart_months]
|
||||
|
||||
# Per-project monthly costs (for stacked bar chart)
|
||||
# --- 3. Per-project × per-month work-log cost (for stacked bar chart) ---
|
||||
# Aggregate worker×log rows directly in SQL: one GROUP BY
|
||||
# (project_id, year, month) returns all we need.
|
||||
project_month_wage = {
|
||||
(pid, y, m): Decimal('0.00')
|
||||
for pid in active_project_ids for y, m in chart_months
|
||||
}
|
||||
wage_rows = WorkLog.objects.filter(
|
||||
project_id__in=active_project_ids,
|
||||
date__gte=six_months_ago_date,
|
||||
).annotate(month=TruncMonth('date')).values(
|
||||
'project_id', 'month', 'workers__monthly_salary'
|
||||
).annotate(worker_count=Count('workers'))
|
||||
# Each row = one (project, month, distinct salary) with how many workers
|
||||
# at that salary were logged. Multiply by daily_rate (salary / 20) × count.
|
||||
for row in wage_rows:
|
||||
salary = row['workers__monthly_salary']
|
||||
if salary is None:
|
||||
continue
|
||||
key = (row['project_id'], row['month'].year, row['month'].month)
|
||||
if key not in project_month_wage:
|
||||
continue
|
||||
daily = Decimal(salary) / Decimal('20.00')
|
||||
project_month_wage[key] += daily * row['worker_count']
|
||||
|
||||
# --- 4. Per-project × per-month paid-adjustment net ---
|
||||
paid_adj_base = PayrollAdjustment.objects.filter(
|
||||
payroll_record__isnull=False,
|
||||
date__gte=six_months_ago_date,
|
||||
).annotate(month=TruncMonth('date'))
|
||||
|
||||
def _sum_paid_adj_by_project_month(qs, project_col):
|
||||
rows = qs.values(project_col, 'month', 'type').annotate(total=Sum('amount'))
|
||||
add = {}
|
||||
sub = {}
|
||||
for row in rows:
|
||||
pid = row[project_col]
|
||||
if pid not in project_outstanding_map: # only active projects
|
||||
continue
|
||||
key = (pid, row['month'].year, row['month'].month)
|
||||
if row['type'] in ADDITIVE_TYPES:
|
||||
add[key] = add.get(key, Decimal('0.00')) + row['total']
|
||||
elif row['type'] in DEDUCTIVE_TYPES:
|
||||
sub[key] = sub.get(key, Decimal('0.00')) + row['total']
|
||||
return add, sub
|
||||
|
||||
paid_direct_add, paid_direct_sub = _sum_paid_adj_by_project_month(
|
||||
paid_adj_base.filter(project_id__in=active_project_ids),
|
||||
'project_id',
|
||||
)
|
||||
paid_wl_add, paid_wl_sub = _sum_paid_adj_by_project_month(
|
||||
paid_adj_base.filter(work_log__project_id__in=active_project_ids),
|
||||
'work_log__project_id',
|
||||
)
|
||||
|
||||
project_chart_data = []
|
||||
for project in Project.objects.filter(active=True):
|
||||
for project in active_projects_list:
|
||||
monthly_data = []
|
||||
for y, m in chart_months:
|
||||
month_cost = Decimal('0.00')
|
||||
month_logs = project.work_logs.filter(
|
||||
date__year=y, date__month=m
|
||||
).prefetch_related('workers')
|
||||
for log in month_logs:
|
||||
for w in log.workers.all():
|
||||
month_cost += w.daily_rate
|
||||
# Include paid adjustments for this project in this month
|
||||
paid_adjs = PayrollAdjustment.objects.filter(
|
||||
payroll_record__isnull=False,
|
||||
date__year=y, date__month=m,
|
||||
).filter(Q(project=project) | Q(work_log__project=project))
|
||||
for adj in paid_adjs:
|
||||
if adj.type in ADDITIVE_TYPES:
|
||||
month_cost += adj.amount
|
||||
elif adj.type in DEDUCTIVE_TYPES:
|
||||
month_cost -= adj.amount
|
||||
key = (project.id, y, m)
|
||||
month_cost = project_month_wage.get(key, Decimal('0.00'))
|
||||
month_cost += paid_direct_add.get(key, Decimal('0.00'))
|
||||
month_cost += paid_wl_add.get(key, Decimal('0.00'))
|
||||
month_cost -= paid_direct_sub.get(key, Decimal('0.00'))
|
||||
month_cost -= paid_wl_sub.get(key, Decimal('0.00'))
|
||||
monthly_data.append(float(month_cost))
|
||||
if any(v > 0 for v in monthly_data):
|
||||
project_chart_data.append({
|
||||
@@ -2801,9 +2925,9 @@ def payroll_dashboard(request):
|
||||
# This powers the "By Worker" toggle on the Monthly Payroll Totals chart.
|
||||
# Only ~14 workers x 6 months = tiny dataset, so we embed it all as JSON
|
||||
# and switching between workers is instant (no server round-trips).
|
||||
|
||||
# Starting date for the 6-month window (first day of the oldest chart month)
|
||||
six_months_ago_date = datetime.date(chart_months[0][0], chart_months[0][1], 1)
|
||||
#
|
||||
# `six_months_ago_date` is already defined above (hoisted next to the
|
||||
# date-window setup) and reused here.
|
||||
|
||||
# Query 1: Total amount paid per worker per month.
|
||||
# Uses database-level grouping — one query for ALL workers at once.
|
||||
@@ -2848,8 +2972,13 @@ def payroll_dashboard(request):
|
||||
# Base pay is reverse-engineered from the net total:
|
||||
# amount_paid = base + overtime + bonus + new_loan - deduction - loan_repayment - advance
|
||||
# So: base = amount_paid - overtime - bonus - new_loan + deduction + loan_repayment + advance
|
||||
#
|
||||
# PERF: reuse `active_workers` (already loaded + cached at the top of the
|
||||
# function) instead of re-querying Worker.objects.filter(active=True).
|
||||
# Same ordered row-set; saves an SQL round-trip. The unused prefetches
|
||||
# on `active_workers` are already materialised so they cost nothing extra.
|
||||
worker_chart_data = {}
|
||||
for worker in Worker.objects.filter(active=True).order_by('name'):
|
||||
for worker in active_workers:
|
||||
months_data = []
|
||||
has_any_data = False
|
||||
|
||||
@@ -2903,16 +3032,25 @@ def payroll_dashboard(request):
|
||||
)['total'] or Decimal('0.00')
|
||||
|
||||
# --- Active projects and workers for modal dropdowns ---
|
||||
# `active_workers` is reused (already loaded + evaluated by the workers_data
|
||||
# loop). For the modal-dropdown context key we alias it as `all_workers`
|
||||
# so the template name stays descriptive.
|
||||
all_workers = active_workers
|
||||
active_projects = Project.objects.filter(active=True).order_by('name')
|
||||
all_workers = Worker.objects.filter(active=True).order_by('name')
|
||||
all_teams = Team.objects.filter(active=True).prefetch_related('workers').order_by('name')
|
||||
all_teams = Team.objects.filter(active=True).prefetch_related(
|
||||
# PERF: prefetch only the active workers so the template's
|
||||
# `team.workers.all` (and our map below) already filters to active
|
||||
# without re-querying. Using `.filter()` on the plain `workers`
|
||||
# accessor bypasses Django's prefetch cache and fires one SELECT
|
||||
# per team — an N+1 we need to avoid.
|
||||
Prefetch('workers', queryset=Worker.objects.filter(active=True), to_attr='active_workers_cached')
|
||||
).order_by('name')
|
||||
|
||||
# Team-workers map for auto-selecting workers when a team is picked
|
||||
# Team-workers map for auto-selecting workers when a team is picked.
|
||||
# Uses the prefetched `active_workers_cached` list — no extra queries.
|
||||
team_workers_map = {}
|
||||
for team in all_teams:
|
||||
team_workers_map[str(team.id)] = list(
|
||||
team.workers.filter(active=True).values_list('id', flat=True)
|
||||
)
|
||||
team_workers_map[str(team.id)] = [w.id for w in team.active_workers_cached]
|
||||
|
||||
# NOTE: Pass raw Python objects here, NOT json.dumps() strings.
|
||||
# The template uses Django's |json_script filter which handles
|
||||
@@ -3026,10 +3164,17 @@ def payroll_dashboard(request):
|
||||
# main sort key has ties (e.g. two adjustments on the same date).
|
||||
adjustments = adjustments.order_by(sort_field, '-id')
|
||||
|
||||
# --- Stats cards (all computed BEFORE pagination) ---
|
||||
# --- Pagination: 50 rows per page (flat view only) ---
|
||||
# PERF: build the paginator first so we can reuse its cached `count`
|
||||
# for the "Total adjustments" stat card below — avoids a duplicate
|
||||
# `SELECT COUNT(*) FROM core_payrolladjustment`.
|
||||
paginator = Paginator(adjustments, 50)
|
||||
adj_page = paginator.get_page(request.GET.get('page', 1))
|
||||
|
||||
# --- Stats cards (all computed BEFORE pagination cuts the rows) ---
|
||||
# These numbers always reflect what the current filter produces,
|
||||
# not just what fits on the current page.
|
||||
adj_total_count = adjustments.count()
|
||||
adj_total_count = paginator.count
|
||||
unpaid_qs = adjustments.filter(payroll_record__isnull=True)
|
||||
adj_unpaid_count = unpaid_qs.count()
|
||||
adj_unpaid_sum = unpaid_qs.aggregate(
|
||||
@@ -3054,10 +3199,6 @@ def payroll_dashboard(request):
|
||||
if group_by in ('type', 'worker'):
|
||||
adj_groups = _group_adjustments(list(adjustments), group_by)
|
||||
|
||||
# --- Pagination: 50 rows per page (flat view only) ---
|
||||
paginator = Paginator(adjustments, 50)
|
||||
adj_page = paginator.get_page(request.GET.get('page', 1))
|
||||
|
||||
# --- Everything the Adjustments tab template will need ---
|
||||
context.update({
|
||||
'adj_page': adj_page,
|
||||
@@ -3083,8 +3224,11 @@ def payroll_dashboard(request):
|
||||
# 'adjustment_types' context var (which is TYPE_CHOICES tuples
|
||||
# used by the Add/Edit adjustment modals).
|
||||
'adj_type_choices': list(ADDITIVE_TYPES) + list(DEDUCTIVE_TYPES),
|
||||
'all_workers_for_filter': Worker.objects.filter(active=True).order_by('name'),
|
||||
'all_teams_for_filter': Team.objects.filter(active=True).order_by('name'),
|
||||
# PERF: reuse `all_workers`/`all_teams` (already cached above for
|
||||
# the Add-Adjustment modal) — same row-set, same ordering, so no
|
||||
# need to re-query the database for the filter popovers.
|
||||
'all_workers_for_filter': all_workers,
|
||||
'all_teams_for_filter': all_teams,
|
||||
# Task 4 will use this to decide +/- signs on each row.
|
||||
'additive_types': list(ADDITIVE_TYPES),
|
||||
# === CROSS-FILTER SOURCE: (team_id, worker_id) PAIRS ===
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user