VM Deployment Runbook
Operational notes for the standard Flatlogic VM deployment used by this
project. This document describes the VM runtime layout, health checks, and the
June 2026 503 Service Unavailable recovery path.
Runtime Topology
The standard VM runs the app behind Apache and Cloudflare:
Cloudflare
-> Apache :80
-> Frontend Next.js production server :3001
-> Backend API :3000
Do not assume older local development ports on the VM. The standard port split
is frontend 3001 and backend 3000:
| Component | VM process | Port | Notes |
|---|---|---|---|
| Apache | apache2 | 80 | Public entrypoint, reverse proxy |
| Frontend | frontend-dev | 3001 | npm run build, then npm run start |
| Backend | backend-dev | 3000 | NODE_ENV=dev_stage npm run start |
| Telemetry | fl-telemetry | 4317/4318 | Executor telemetry daemon |
| Executor | fl-executor | n/a | VM command/executor bridge |
The backend returns 401 Unauthorized for protected API endpoints without a
JWT. A 401 from http://127.0.0.1:3000/api/... means the backend is alive.
The backend default is port 3000 for dev_stage; an explicit PORT env var
overrides that when needed.
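The liveness rule above can be turned into a small check. This is a minimal sketch: classify_backend_status is a hypothetical helper name, and the labels it prints are illustrative, not something the project defines.

```shell
# Sketch: interpret the HTTP status of an unauthenticated backend probe.
# classify_backend_status is a hypothetical helper, not part of the project.
classify_backend_status() {
  case "$1" in
    401)    echo "alive" ;;        # protected endpoint rejected the request, so the backend is up
    000|"") echo "unreachable" ;;  # curl prints 000 when it received no HTTP response
    5??)    echo "error" ;;        # backend or proxy answered with a server error
    *)      echo "unexpected" ;;   # any other code deserves a manual look
  esac
}

# Usage (assumes curl is installed on the VM):
#   classify_backend_status "$(curl -s -o /dev/null -w '%{http_code}' http://127.0.0.1:3000/api/auth/me)"
```

A result of unreachable usually means nothing is listening on the backend port, which is a PM2 question rather than an API question.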
Process Manager
PM2 is managed by systemd:
sudo systemctl status pm2-ubuntu --no-pager
pm2 status
Expected PM2 apps:
| Name | Purpose |
|---|---|
| frontend-dev | Next.js frontend production server |
| backend-dev | Express API, migrations, seed, watcher |
| fl-telemetry | Local telemetry daemon |
| fl-executor | Standard VM executor bridge |
The frontend PM2 app name may remain frontend-dev for compatibility with the
standard VM image, but the process should run the production script. Build the
VM frontend with:
cd /home/ubuntu/executor/workspace/frontend
npm run build
Start it with:
FRONT_PORT=3001 npm run start
The production frontend is a Next.js server build served by next start.
Do not run the VM frontend with next dev; the dev server displays the Next.js
dev indicator in presentations.
Frontend Release Deploys
Automatic VM pulls should deploy the frontend as immutable releases instead of rebuilding in the live workspace. The executor VCS layer builds a fresh copy under:
/home/ubuntu/executor/frontend-releases/<timestamp>-<git-sha>/frontend
The deploy order is:
- Pull the requested branch into /home/ubuntu/executor/workspace.
- Archive HEAD into a new release directory.
- Copy frontend env files from the live workspace when present: .env, .env.local, .env.production, .env.production.local.
- Run npm ci.
- Run npm run build.
- Remove non-runtime build caches from the new release: .next, .turbo, build/cache. Production runtime assets stay in build; local next dev --turbopack uses .next to avoid conflicts with production build manifests.
- Switch frontend-dev to the new release with FRONT_PORT=3001 pm2 start npm --name frontend-dev -- run start.
- Save PM2 and remove old frontend releases.
The active frontend release is the PM2 frontend-dev working directory. Check
it with:
pm2 jlist | jq '.[] | select(.name=="frontend-dev") | {
cwd:.pm2_env.pm_cwd,
script:.pm2_env.pm_exec_path,
args:.pm2_env.args,
env:{FRONT_PORT:.pm2_env.FRONT_PORT}
}'
Retention defaults to the latest 2 release directories. Override it by setting
FRONTEND_RELEASES_KEEP for the executor process before deploy. Do not delete
the active release directory; next start serves production assets from its
build directory.
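The retention rule can be sketched as a filter over release directory names. This is not the executor's actual pruning code: releases_to_prune is a hypothetical helper, and it assumes the timestamp prefix is fixed-width so that a lexical sort orders releases by age.

```shell
# Sketch: print release directories that are safe to prune.
# Reads release directory names (one per line) on stdin.
#   $1 = how many of the newest releases to keep (defaults to FRONTEND_RELEASES_KEEP, then 2)
#   $2 = name of the active release, which is never printed
releases_to_prune() {
  local keep="${1:-${FRONTEND_RELEASES_KEEP:-2}}" active="${2:-}"
  # Names start with a timestamp, so a reverse lexical sort lists the newest first
  # (assumption: the timestamp format is fixed-width and sortable).
  sort -r | awk -v keep="$keep" -v active="$active" \
    'NR > keep && $0 != active { print }'
}

# Usage (prints candidates only, deletes nothing):
#   ls /home/ubuntu/executor/frontend-releases | releases_to_prune 2 "<active-release-id>"
```

Printing candidates instead of deleting them keeps the step reviewable, which matters because removing the active release breaks next start.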
Manual rollback is possible by starting frontend-dev from an older retained
release:
cd /home/ubuntu/executor/frontend-releases/<release-id>/frontend
pm2 delete frontend-dev
FRONT_PORT=3001 pm2 start npm --name frontend-dev -- run start
pm2 save --force
The PM2 dump is stored at:
~/.pm2/dump.pm2
This file contains environment variables and may contain secrets. Do not paste it into public tools or tickets without redacting tokens, DB passwords, SMTP credentials, API keys, and tunnel credentials.
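If the dump has to be shared, a redaction pass along these lines may help. It is a sketch only: redact_pm2_dump is a hypothetical helper, the key patterns are assumptions that will not catch every secret, and it relies on the dump being JSON and on jq being installed.

```shell
# Sketch: replace values of secret-looking keys in a PM2 dump read from stdin.
# The key patterns below are assumptions; review the output before sharing it.
redact_pm2_dump() {
  jq 'walk(
    if type == "object" then
      with_entries(
        if (.key | test("TOKEN|SECRET|PASS|KEY|CREDENTIAL|DATABASE_URL"))
        then .value = "REDACTED" else . end)
    else . end)'
}

# Usage:
#   redact_pm2_dump < ~/.pm2/dump.pm2 > /tmp/dump.redacted.json
```

Secrets can also sit inside values, such as connection strings stored under neutral key names, so treat the result as a first pass and not as proof that the file is clean.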
Health Checks
Use these checks after a deploy or incident:
df -h
df -ih
free -h
sudo ss -ltnp | grep -E ':80|:3001|:3000|:4317|:4318'
curl -I http://127.0.0.1:3001
curl -I http://127.0.0.1:3000/api/auth/me
curl -I http://tbp.flatlogic.app
pm2 status
Expected healthy responses:
- http://127.0.0.1:3001 returns 200 OK.
- http://127.0.0.1:3000/api/auth/me returns 401 Unauthorized without JWT.
- http://tbp.flatlogic.app returns 200 OK.
- PM2 shows all four apps online.
Recovering From Apache 503 Service Unavailable
If Apache returns:
Service Unavailable
Apache/2.4.x Server at tbp.flatlogic.app Port 80
first check whether upstream app processes are listening:
sudo ss -ltnp | grep -E ':80|:3001|:3000'
curl -I http://127.0.0.1:3001
curl -I http://127.0.0.1:3000/api/auth/me
sudo systemctl status pm2-ubuntu --no-pager
If Apache is listening on port 80 but nothing is listening on 3001 and 3000,
PM2 either failed to restore its saved process list or was stopped. Restart it:
sudo systemctl reset-failed pm2-ubuntu
sudo systemctl restart pm2-ubuntu
pm2 status
Then re-run the health checks.
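The triage above can be expressed as a function over the ss output. This is a sketch under stated assumptions: diagnose_503 and its result labels are hypothetical, and it expects the usual ss -ltn layout in which the fourth column is the local address and port.

```shell
# Sketch: guess the likely cause of a 503 from `ss -ltn` output read on stdin.
# diagnose_503 is a hypothetical helper; the ports follow this runbook.
diagnose_503() {
  local ports
  # Column 4 is "address:port"; keep the part after the last colon.
  ports="$(awk 'NR > 1 { n = split($4, a, ":"); print a[n] }')"
  _listening() { printf '%s\n' "$ports" | grep -qx "$1"; }

  if ! _listening 80; then
    echo "apache-down"
  elif ! _listening 3001 && ! _listening 3000; then
    echo "pm2-not-restored"      # both app ports missing: restart pm2-ubuntu
  elif ! _listening 3001; then
    echo "frontend-down"
  elif ! _listening 3000; then
    echo "backend-down"
  else
    echo "upstreams-listening"   # look at Apache config and app logs instead
  fi
}

# Usage:
#   sudo ss -ltn | diagnose_503
```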
OOM-Kill Diagnosis
A VM can have enough free disk space and still fail if the kernel's OOM killer terminates PM2 or one of its child processes during a memory spike. Check kernel logs:
journalctl -k --since "YYYY-MM-DD HH:MM" --until "YYYY-MM-DD HH:MM" \
| grep -Ei 'oom|killed process|out of memory'
Known June 2026 incident:
- pm2-ubuntu.service failed with Result: oom-kill.
- Kernel killed ffmpeg.
- ffmpeg used about 3.3 GiB RSS on a 3.8 GiB RAM VM.
- PM2 then stopped frontend-dev, backend-dev, fl-telemetry, and fl-executor.
This points to reverse video generation (the FFmpeg job) as the cause, rather than Apache, disk space, or frontend routing.
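To see which process the kernel killed and how much memory it held, the kernel log lines can be condensed as sketched here. summarize_oom_kills is a hypothetical helper. It assumes the message layout "Killed process <pid> (<name>) ... anon-rss:<n>kB" used by recent Linux kernels, and a process name without spaces.

```shell
# Sketch: condense kernel OOM-kill lines read on stdin into one line per kill.
# Assumes "Killed process <pid> (<name>) ... anon-rss:<n>kB" in each message.
summarize_oom_kills() {
  sed -nE 's/.*Killed process ([0-9]+) \(([^)]+)\).*anon-rss:([0-9]+)kB.*/\2 \1 \3/p' |
    awk '{ printf "%s pid=%s anon_rss_mib=%d\n", $1, $2, $3 / 1024 }'
}

# Usage:
#   journalctl -k --since "YYYY-MM-DD HH:MM" --until "YYYY-MM-DD HH:MM" | summarize_oom_kills
```

Comparing the reported anon-rss with the output of free -h shows quickly whether a single process consumed most of the VM's memory.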
FFmpeg and Reverse Video Generation
The backend uses bundled ffmpeg-static/ffprobe-static via
backend/src/services/videoProcessing.ts; manual OS-level FFmpeg installation
is not required for this project.
Reverse video generation can be memory-heavy for large videos. Operational guardrails:
- FFmpeg reversal is serialized by videoProcessing.reverseVideo(): only one FFmpeg process runs at a time in the backend process, and additional reverse generation requests wait in an in-process queue.
- FFmpeg reversal uses -threads 1.
- FFmpeg reversal has a hard timeout (FFMPEG_REVERSE_TIMEOUT_MS, default 600000, exposed as config.resilience.ffmpeg.reverseTimeoutMs) and kills the child process if it exceeds the limit.
- FFmpeg reversal is protected by an in-process circuit breaker (FFMPEG_BREAKER_FAILURE_THRESHOLD, FFMPEG_BREAKER_COOLDOWN_MS, FFMPEG_BREAKER_SUCCESS_THRESHOLD, exposed under config.resilience.ffmpeg.breaker) so repeated media failures stop launching new heavy jobs during the cooldown window.
- FFprobe metadata extraction has a timeout (FFPROBE_TIMEOUT_MS, default 30000, exposed as config.resilience.ffmpeg.ffprobeTimeoutMs).
- TourPagesService deduplicates reverse generation for the same source video storage key.
- Treat large source videos as risky on small VMs.
- Check backend PM2 logs for ffmpeg or publish/save background errors.
- If the VM OOMs, inspect kernel logs before changing Apache or database config.
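When a media command has to be run by hand during an investigation, the same hard-timeout idea can be applied from the shell. The backend enforces its limit in backend/src/services/videoProcessing.ts; the sketch below is only a shell analogue built on coreutils timeout, and run_with_hard_timeout is a hypothetical helper.

```shell
# Sketch: run a command under a hard time limit taken from FFMPEG_REVERSE_TIMEOUT_MS.
# A shell analogue of the backend guardrail, not the backend's own code.
run_with_hard_timeout() {
  local ms="${FFMPEG_REVERSE_TIMEOUT_MS:-600000}"
  # timeout(1) takes seconds; round up so a sub-second limit does not become 0,
  # because a duration of 0 disables the timeout.
  local secs=$(( (ms + 999) / 1000 ))
  # SIGKILL cannot be ignored, so a stuck process is always removed.
  timeout --signal=KILL "${secs}s" "$@"
}

# Usage (the wrapped command line is up to the operator):
#   run_with_hard_timeout <command> [args...]
```

The function returns a non-zero status when the limit is hit, so it can be chained with || to log the failure.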
Remaining hardening work and follow-up:
- Add input duration/resolution/size checks before reversal.
- Structured logs now include reverse-video input/output size and probed media metadata. Continue tuning rejection thresholds as real VM media patterns are observed.
- Consider running media processing in a separate worker with memory limits.
Logs
Useful log commands:
sudo journalctl -u pm2-ubuntu -n 200 --no-pager
pm2 logs frontend-dev --lines 100
pm2 logs backend-dev --lines 100
pm2 logs fl-executor --lines 100
pm2 logs fl-telemetry --lines 100
sudo tail -n 100 /var/log/apache2/error.log
pm2 logs keeps following the log stream by default. Press Ctrl-C to stop it before running the next command.
Executor Notes
The standard VM executor.js in ~/executor is not the web app startup script.
It handles VM commands, VCS operations, AI runner prompts, screenshots, and
telemetry. Starting it manually does not start the frontend/backend app.
Executor workspace path:
/home/ubuntu/executor/workspace
The executor can perform git operations when commanded, including reset/clean workflows through VCS commands. Do not run executor commands blindly when the goal is only to restore the web app. Use PM2/systemd for process recovery.
Node Version
The project requirement is Node.js 20.x LTS. Some standard VMs may report
/usr/bin/node as Node 22 in PM2. If startup fails after a system update,
verify:
node -v
which node
pm2 describe backend-dev
pm2 describe frontend-dev
Changing the VM Node version should be coordinated with PM2 startup paths and a full frontend/backend build check.
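A quick way to compare the installed runtime with the 20.x requirement is sketched below. node_major and node_matches_requirement are hypothetical helper names, and the check only looks at the major version.

```shell
# Sketch: extract the major version from a string such as "v20.11.1".
node_major() {
  local v="${1#v}"   # drop the leading "v"
  echo "${v%%.*}"    # keep everything before the first dot
}

# Succeeds only when the given version is on the 20.x line.
node_matches_requirement() {
  [ "$(node_major "$1")" = "20" ]
}

# Usage:
#   node_matches_requirement "$(node -v)" || echo "Node $(node -v) is not 20.x"
```

Run the check against the interpreter that PM2 actually uses, since the path shown by pm2 describe can differ from the node found on the login shell's PATH.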
Persistence
After changing PM2 process definitions, save the process list:
pm2 save
For an incident-only restart where the process definitions were unchanged,
pm2 save is still safe and keeps the current expected app list for reboot.