👻 Ghost Node - Session History Archive
DO NOT load this at the start of a session: it is archaeology, not active reference. Open it only if asked about historical context or old decisions. Current work → PROGRESS.md | Architecture → CLAUDE.md | Gotchas → MEMORY.md
No session limit. Every completed session gets a full entry here permanently. When a session in PROGRESS.md is old enough to no longer be active context, move its full entry here.
What gets logged here
- Code sessions: any session where files were edited. Full detail: what changed, why, how.
- Q&A sessions that produced lasting decisions: tech stack choices, architectural decisions, strategic direction, important "why we do it this way" answers. Log with a [Q&A] tag.
- Skip: pure troubleshooting Q&A where nothing was decided and no files changed.
Sessions 1–8 - Foundation Build (2026-03-06 to 2026-03-09)
Session 1 - 2026-03-06 - Foundation
- Initial 6-file build: database.py, models.py, worker.py, dashboard.html, requirements.txt, README.md
- Three-thread architecture (A: FastAPI, B: Scraper, C: Telegram C2)
- Basic Playwright scraper, Telegram alerts, seeded DB (keywords, sites, config)
Session 2 - 2026-03-08 Morning - Navigation & Bug Fixes
- Dual-mode navigation: Mode A (direct URL) + Mode B (homepage + search discovery)
- Semantic search box discovery (ARIA, placeholder, label, ID heuristics)
- Browser auto-detection: Edge → Yandex → Chrome → Brave → Chromium
- UI fixes and stability improvements
Session 3 - 2026-03-08 Afternoon - Controls & UI
- Browser selector UI in dashboard Settings tab
- Incognito mode toggle, show/hide browser toggle
- Restart button, Kill button
- Four delay controls: launch / site-open / post-search / page-hold
Session 4 - 2026-03-08 Evening - Extraction System
- Background-window protection (3 layers preventing Chromium throttle)
- Link extraction fix
- Price/currency extraction system supporting 20 currencies
- Time-left extraction with time_left_mins as Float (seconds precision)
- New DB columns: price_raw, currency, time_left, time_left_mins, price_updated_at
Session 5 - 2026-03-08 Late - Thread D & Sortable UI
- Thread D - Price refresh loop (isolated event loop, every 5 min)
- Card-anchored DOM extraction for complex sites (HiBid)
- Sortable Price + Time Left columns in listings table
- Urgency colour coding for closing-soon lots
Session 6 - 2026-03-08 Night - Anti-Bot Humanisation
- 30+ stealth script patches (WebGL, canvas fingerprint noise, audio fingerprint, battery API, network info, media devices, iframe patching)
- Humanize level system: raw/low/medium/heavy
- Colour-coded humanize buttons in UI
Session 7 - 2026-03-09 Morning - Live Countdown
- Live countdown: 1-second ticker + 60-second sync endpoint
- time_left_mins precision fix (seconds-level Float)
- JS syntax bug fix (apostrophe in _humanizeDescs broke all UI buttons; use backtick JS literals always)
Session 8 - 2026-03-09 - Major Feature Pack (N2/N3/N5/N9/N10/N12/N13/N14/N15)
- N2 - CAPTCHA solver: 2captcha + CapSolver wired into navigation
- N3 - Block/rate-limit detection + 30min site cooldown + health tracking
- N5 - Pagination: max_pages per site, universal next-page detection
- N9 - Thread E closing-soon alerts (user-controlled threshold, toggle, only lots with captured time data, 7-day staleness guard)
- N10 - Multi-channel alerts: Telegram + Discord webhook + SMTP email (simultaneous or single)
- N12 - PostgreSQL support via DATABASE_URL env var (auto-detected in database.py)
- N13 - Site health dashboard column (cooldown timer, error count, last error preview)
- N14 - Login session support (persistent browser profile, Login button, pre-scrape session check)
- N15 - Export: CSV, JSON, HTML cyberpunk report + Database Backup & Restore (download/upload .db)
URL Validation Fix - 2026-03-07
- Fixed validator rejecting homepage URLs without {keyword} placeholder
- New rule: block only if neither {keyword} in URL nor search_selector provided
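The rule above can be sketched as a single predicate. This is a minimal illustration, not the validator in worker.py: the function name validate_site_url and its parameter names are hypothetical, and only the {keyword} / search_selector rule comes from the entry above.

```python
from typing import Optional


def validate_site_url(url_template: str, search_selector: Optional[str] = None) -> bool:
    """Return True when the site can be navigated in at least one mode.

    Mode A needs a {keyword} placeholder in the URL; Mode B needs a
    search_selector to find the search box on the homepage.
    """
    has_placeholder = "{keyword}" in url_template
    has_selector = bool(search_selector and search_selector.strip())
    # Block only when neither navigation mode is possible.
    return has_placeholder or has_selector
```

A plain homepage URL is therefore accepted as long as a selector is supplied, which is the case the old validator rejected.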
Sessions 9–15 - Feature Expansion (2026-03-10 to 2026-03-11)
Session 9 - 2026-03-10 - N16: AI Smart Filter
- N16 - AI-powered lot analysis using Groq (free cloud) + Ollama (local, unlimited)
- _ai_analyze() dispatcher, _build_ai_prompt(), _ai_call_groq(), _ai_call_ollama()
- PUT /api/keywords/{id} - update weight and/or ai_target
- POST /api/ai/test - test AI verdict in dashboard
- Keyword.ai_target - new column (Text; natural-language filter description)
- Listing.ai_match (Integer: 1/0/NULL) + Listing.ai_reason (String 200)
- Schema migration: auto-adds ai_match, ai_reason to listings; ai_target to keywords
- New config keys: ai_filter_enabled, ai_provider, ai_model, ai_api_key, ai_base_url
- Pipeline: if keyword has ai_target + AI enabled → AI is judge (score still calculated for display)
Session 10 - 2026-03-10 - N17: Auto-Adapter (AI Selector Generator)
- N17 - AI-powered CSS selector generator using Groq + Ollama
- _clean_html_for_ai(raw_html, max_chars=14000) - strips scripts/styles/SVGs, isolates main content
- _build_selector_prompt(cleaned_html, site_name) - token-efficient prompt for 6-selector JSON
- _generate_selectors_ai(cleaned_html, site_name) - calls Groq or Ollama, extracts JSON
- _validate_selectors(page, sel_dict) - live-tests selectors, computes confidence score 0–100
- _extract_with_selectors(page, ss) - uses stored SiteSelectors rows to extract listings
- adapt_site_now(site_id) - full pipeline: browser → navigate → clean HTML → AI → validate → persist
- SiteSelectors ORM model added: site_id, selector fields, confidence, rates, stale, provider
- 3 new API endpoints: POST/GET/DELETE /api/sites/{id}/adapt|selectors
- Dashboard: confidence badges (≥70%=green, ≥40%=orange, <40%=red), 🤖 Adapt / ↺ Re-adapt / × Clear
Session 11 - 2026-03-10 - Major Feature Pack (N4/N7/N1/Gmail/Multi-Closing/C2/AI-Debug)
- N4 - Currency Display: Optional per-session display currency (ISO code). Rates via frankfurter.app cached 6h. price_usd stored on every listing
- N7 - Price Filters: min_price/max_price on Keyword model. Compared in USD via _convert_price(). 💰 button per keyword
- N7 - Location Capture: location column on Listing. Extracted from lot cards via LOC_SELS in JS_EXTRACT
- N1 - Proxy Rotation: _RoundRobin class, proxy_enabled/proxy_list config keys. Passed to Playwright browser launch
- Gmail: Replaced 6-field SMTP with 3 fields: gmail_address + gmail_app_password + email_to. Hardcoded smtp.gmail.com:587
- Multi-Interval Closing Alerts: closing_alert_schedule (comma-separated, e.g. 60,30,10,5). 0 = no countdown. closing_alerts_sent JSON list tracks which thresholds fired
- Telegram C2 Commands: /top5, /sites, /keywords, /alert on|off <kw>, /help
- AI Debug Log: In-memory circular buffer (deque(maxlen=300)). GET /api/ai/debug/log (with since_id) + DELETE. Dashboard: "🧠 AI Log" tab, filter buttons, colour-coded cards
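The multi-interval closing alert bookkeeping above could look roughly like this. It is a sketch under assumptions: parse_schedule and pending_alert are hypothetical names, and "announce only the tightest newly crossed threshold" is one plausible reading of the entry, not confirmed behaviour. Only closing_alert_schedule, the 0 = disabled convention, and the closing_alerts_sent JSON list come from the log.

```python
import json
from typing import List, Optional, Tuple


def parse_schedule(raw: str) -> List[int]:
    """Parse a closing_alert_schedule string such as '60,30,10,5'.

    Returns thresholds in minutes, largest first. '0' (or garbage) yields
    an empty list, i.e. no countdown alerts.
    """
    thresholds = {int(p) for p in (s.strip() for s in raw.split(",")) if p.isdigit()}
    thresholds.discard(0)
    return sorted(thresholds, reverse=True)


def pending_alert(
    time_left_mins: float, schedule: List[int], sent_json: Optional[str]
) -> Tuple[Optional[int], str]:
    """Return (threshold to announce, updated closing_alerts_sent JSON).

    The threshold is None when every crossed threshold has already fired.
    """
    sent = set(json.loads(sent_json)) if sent_json else set()
    crossed = [t for t in schedule if time_left_mins <= t]
    new = [t for t in crossed if t not in sent]
    if not new:
        return None, json.dumps(sorted(sent, reverse=True))
    # Mark everything crossed as sent so a lot first seen late in its
    # countdown does not trigger a burst of stale alerts.
    sent.update(crossed)
    return min(new), json.dumps(sorted(sent, reverse=True))
```

Storing the fired thresholds as a JSON list on the listing row is what lets a single column replace the old boolean flag.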
Session 12 - 2026-03-10 - UI/UX Overhaul + DB Cleanup
- Inline keyword + site editing (click to rename/edit values)
- Drag-and-drop reordering (⋮⋮ handles): sort_order column on Keyword + TargetSite. POST /api/keywords/reorder + POST /api/sites/reorder
- Activity Log: search/filter box, scroll-to-bottom, clear. Limit raised 80 → 200 lines
- AI Debug Log: search/filter across all cards, manual refresh
- Edit Site modal: full Edit button exposing all fields including login fields
- Listing detail view: click any listing title → panel with all captured data. Toggle: listing_detail_enabled
- Batch keyword import: textarea, one keyword per line, optional keyword:weight suffix
- Currency picker: searchable dropdown, 22 currencies + IQD, Raw to clear
- DELETED closing_alert_sent boolean from ORM; closing_alerts_sent JSON list is the sole tracker
- DELETED dead SMTP seed keys: email_smtp_host/port/user/pass, email_from, closing_alert_mins
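For the batch keyword import format above (one keyword per line, optional keyword:weight suffix), a parser might be sketched as follows. The function name, the integer weight type, and the default weight of 1 are assumptions made for illustration; the log only specifies the line format.

```python
from typing import List, Tuple


def parse_keyword_lines(text: str, default_weight: int = 1) -> List[Tuple[str, int]]:
    """Parse textarea input into (keyword, weight) pairs.

    A trailing ':<integer>' is read as the weight; any other colon is
    treated as part of the keyword itself.
    """
    results: List[Tuple[str, int]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        term, sep, suffix = line.rpartition(":")
        suffix = suffix.strip()
        if sep and term.strip() and suffix.isdigit():
            results.append((term.strip(), int(suffix)))
        else:
            results.append((line, default_weight))
    return results
```

Splitting on the last colon only keeps keywords that contain a colon of their own intact.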
Session 13 - 2026-03-10 - Lot Image Extraction + Display
- JS_EXTRACT module-level constant replaces old inline page.evaluate("""..."""). Adds extractImages(root): tries data-src, data-lazy-src, data-original, data-lazy, then img.src. Skips tiny icons (<40px). Returns up to 5 URLs per card
- Pagination fix: JS_EXTRACT was undefined when pagination ran; both page-1 and paginated pages now use same constant
- images column on Listing (TEXT, JSON array, 0–10 URLs, NULL if none)
- All 4 Listing(...) constructors updated to pass images=json.dumps(images_list[:5])
- Thumbnail in listings table (48×48px, flexbox left of title, onerror hide fallback)
- Gallery in Listing Detail View (140×110px clickable thumbnails, flex-wrap, "LOT IMAGES" header)
Session 13b - 2026-03-10 - Image Dedup Fix
- Root cause found: HiBid CDN uses same path img.axd for ALL images, differentiated only by query params. Old dedup stripped query strings → all 5 images collapsed to 1
- Fix: addUrl() now deduplicates by full URL including query string
- Apollo cache polling added: wait_for_function(JS_APOLLO_WAIT, timeout=8000) before extraction
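The dedup bug above is easy to reproduce outside the browser. The real addUrl() lives in the JS extractor; the Python functions below are an illustrative analogue with hypothetical names and made-up example URLs, contrasting path-only keys with full-URL keys.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def dedupe_by_path(urls: Iterable[str]) -> List[str]:
    """Old behaviour: the query string is ignored when comparing URLs."""
    seen, out = set(), []
    for url in urls:
        parts = urlsplit(url)
        key = f"{parts.scheme}://{parts.netloc}{parts.path}"
        if key not in seen:
            seen.add(key)
            out.append(url)
    return out


def dedupe_by_full_url(urls: Iterable[str], limit: int = 5) -> List[str]:
    """Fixed behaviour: the whole URL, query string included, is the key."""
    seen, out = set(), []
    for url in urls:
        if url and url not in seen:
            seen.add(url)
            out.append(url)
            if len(out) == limit:
                break
    return out


# Hypothetical CDN URLs that share one path and differ only in the query.
urls = [
    "https://cdn.example.com/img.axd?id=101&wid=400",
    "https://cdn.example.com/img.axd?id=102&wid=400",
    "https://cdn.example.com/img.axd?id=101&wid=400",
    "https://cdn.example.com/img.axd?id=103&wid=400",
]
```

With these inputs the path-only version keeps a single image, while the full-URL version keeps the three distinct ones.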
Session 14 - 2026-03-10 - Detail-Page Image Fetch
- Root cause: HiBid search results pages have 0 Lot:* entries in Apollo cache; all images only available on detail pages
- JS_DETAIL_IMAGES - 5-layer image extractor: (1) Apollo cache, (2) JSON-LD, (3) OG meta, (4) DOM img tags, (5) srcset. Reusable module-level constant
- JS_APOLLO_WAIT - reusable Apollo cache readiness poller
- _fetch_listing_images_batch() - visits each new listing's detail page immediately after scraping. Saves full-size gallery
- _price_refresh_pass refactored to use shared JS_DETAIL_IMAGES/JS_APOLLO_WAIT constants
Session 14b - 2026-03-10 - Image Count Edge Cases
- [:5] → [:10] cap in all 4 Listing constructors (raised to match JS extractor)
- Removed > len(existing) guard; detail page result is always authoritative (full-size, not thumbnail)
- _price_refresh_pass: img_urls != existing_imgs comparison replaces len > len; catches quality upgrades
Session 15 - 2026-03-11 - Tech Stack Research + Strategic Direction
- Researched tech stacks used by OpenRouter, Kling AI, SimilarWeb, Accio
- Ghost Node finalized stack: Python (FastAPI) backend, Next.js + React + TypeScript + Tailwind frontend, PostgreSQL (not MongoDB), Redis (cache/queue)
- Key decisions: no Rust/Go (network I/O bound), no PyTorch/TF yet (inference only), vLLM for production
- Ghost Node vision: auction intelligence layer as SaaS; no competitor offers this
- Created HANDOFF.md (later merged into CLAUDE.md in Session 20b restructure)
Sessions 16–20b - Frontend Migration & Polish (2026-03-11 to 2026-03-12)
Session 16 - 2026-03-11 - Next.js Frontend Migration (COMPLETE)
All 13 tasks complete:
- Node.js v22.14.0 installed portable at %LOCALAPPDATA%\nodejs-portable\
- Scaffolded frontend/ with Next.js 16.1.6, React 19, TypeScript, Tailwind CSS v4
- Tailwind v4 discovery: CSS variable-based config via @theme {} in globals.css, NOT tailwind.config.ts
- Phase 0: Scaffold (next.config.ts, Tailwind v4 theme, types.ts, engineStore, useSSE, layout shell)
- Phase 1: Dashboard tab (StatsGrid, ActivityLog)
- Phase 2: Listings tab (ListingsTable, ListingRow, ListingDetailPanel, ImageGallery, useCountdown)
- Phase 3: Keywords tab (drag-drop via @dnd-kit, inline edit, batch import)
- Phase 4: Sites tab (health badges, AI confidence badge, adapt button, drag-drop)
- Phase 5: Settings tab (all 30+ config keys, backup/restore, Telegram test)
- Phase 6: AI Log tab (live polling, filter buttons, search, AILogCard)
- Phase 7: output: 'export' → frontend/out/ → FastAPI StaticFiles mount in worker.py
- 21/21 tests passing (Vitest + React Testing Library)
- Backend: sys.stdout.reconfigure(encoding='utf-8'), → replaced with -> in database.py, \d and \/ escapes fixed
Session 17 - 2026-03-11 - N6/N8/N11 Feature Pack
- N6 - Editable Scoring Rules: ScoringRule(id, signal, delta, category, notes) ORM model. Seeded from old hardcoded lists. calculate_attribute_score() queries DB. 4 new endpoints: GET/POST/PUT/DELETE /api/scoring-rules. ScoringRulesPanel React component: add/edit/delete inline, colour-coded boosts/penalties, TanStack Query sync
- N8 - Scrape Window + Boost Mode: Engine checks time-of-day before each cycle. Config: scrape_window_enabled, scrape_start_hour, scrape_end_hour. Overnight windows supported (start > end triggers overnight logic). Boost: boost_interval_mins replaces timer when any lot has time_left_mins ≤ 30
- N11 - Cross-site Dedup (eBay only): difflib.SequenceMatcher ratio > 0.85 checked before saving any new eBay listing. 24h window, different site_name. Only fires when site name/URL contains "ebay"
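The N8 window check and boost rule above can be sketched in a few lines. Function names are hypothetical, and two details are assumptions rather than documented behaviour: the end hour is treated as exclusive, and equal start/end hours are treated as "always open".

```python
from typing import Iterable, Optional


def in_scrape_window(hour: int, start_hour: int, end_hour: int) -> bool:
    """Return True when `hour` (0-23) falls inside the configured window."""
    if start_hour == end_hour:
        return True  # assumption: identical bounds mean no restriction
    if start_hour < end_hour:
        # Same-day window, e.g. 08:00 to 20:00.
        return start_hour <= hour < end_hour
    # start > end: overnight window, e.g. 22:00 to 06:00 wraps past midnight.
    return hour >= start_hour or hour < end_hour


def effective_interval(
    timer_mins: float,
    boost_interval_mins: float,
    time_left_values: Iterable[Optional[float]],
    boost_threshold_mins: float = 30,
) -> float:
    """Use the boost interval while any lot closes within the threshold."""
    closing_soon = any(
        t is not None and t <= boost_threshold_mins for t in time_left_values
    )
    return boost_interval_mins if closing_soon else timer_mins
```

The overnight branch is the only non-obvious part: when start > end the window is the union of two ranges rather than a single interval.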
Session 17b - 2026-03-11 - AI-First Architecture + Scoring Toggle
- scoring_enabled config key: Default true. When false, score gate skipped entirely; AI sole judge. Three-way filter: (1) AI target + ai_filter_enabled → AI; (2) scoring_enabled=true → score gate; (3) scoring_enabled=false → all lots pass
- ScoringRulesPanel toggle: ON / OFF button. When OFF: rules dim, green "⚡ AI-FIRST MODE ACTIVE" banner
- Keywords → Targets relabelling: Header "TARGETS", column "TARGET LABEL", AI column "AI DESCRIPTION". Guidance text explains search term vs AI description roles
- proxy.ts migration: middleware.ts → proxy.ts, middleware() → proxy() (Next.js 16 requirement)
- 5 new config keys added to CLAUDE.md; ScoringRule added to DB schema
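The three-way filter described in this session can be written as one decision function. This is a sketch, not the pipeline code: decide_lot, the score_threshold parameter, and the choice to treat a missing AI verdict (ai_match of None) as a rejection are all assumptions made for illustration.

```python
from typing import Optional, Tuple


def decide_lot(
    ai_target: Optional[str],
    ai_filter_enabled: bool,
    scoring_enabled: bool,
    ai_match: Optional[int],
    score: int,
    score_threshold: int = 0,
) -> Tuple[bool, str]:
    """Return (passes, judge) where judge is 'ai', 'score', or 'none'."""
    # (1) A keyword with an AI target and the AI filter on: AI decides.
    if ai_target and ai_filter_enabled:
        return ai_match == 1, "ai"
    # (2) Otherwise the score gate applies while scoring is enabled.
    if scoring_enabled:
        return score >= score_threshold, "score"
    # (3) Scoring disabled and no AI judge: every lot passes.
    return True, "none"
```

Note that branch (1) ignores the score entirely, which matches the earlier note that the score is still calculated but only for display.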
Session 18 - 2026-03-11 - Live Reload + AI Log Fix + Auto-Adapt 4 Fixes
- Live reload: _cycle_now = threading.Event() (thread-safe). All 7 write endpoints call _cycle_now.set(). Scraper polls every 5s instead of single blocking asyncio.sleep(timer), so it wakes immediately on change
- AI debug log fix: AILogFeed unwraps data.entries (was storing whole envelope). AILogCard completely rewritten with correct RawEntry interface: call_type, direction, content, tokens_prompt, tokens_completion, verdict, status_code
- Auto-adapt gate fix: Removed auto_adapt_enabled check from manual ADAPT endpoint; toggle only gates auto-trigger on new site creation, not manual clicks
- Auto-adapt max_tokens 300 → 500: 300 was insufficient for 6-selector JSON (~400–500 tokens needed)
- Auto-adapt JSON extraction: _extract_json() 4-strategy helper: (1) direct json.loads, (2) strip markdown fences, (3) brace-depth counter, (4) regex fallback
- ADAPT button UX: adapting local state + 45s timer in SiteRow.tsx. Shows ⏳ ADAPTING… (gold, disabled). Auto-refetches selectors via qc.invalidateQueries on completion
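The four-strategy JSON extraction listed above might be structured like this. It is a reconstruction from the one-line description, not the actual _extract_json() in worker.py: the regexes, the helper _try_load, and the choice to accept only JSON objects are assumptions.

```python
import json
import re
from typing import Optional

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def _try_load(candidate: str) -> Optional[dict]:
    """Parse candidate as JSON; return it only if it is an object."""
    try:
        value = json.loads(candidate)
    except (json.JSONDecodeError, TypeError):
        return None
    return value if isinstance(value, dict) else None


def extract_json(text: str) -> Optional[dict]:
    # (1) The whole reply is already valid JSON.
    result = _try_load(text)
    if result is not None:
        return result
    # (2) JSON wrapped in a markdown code fence.
    fenced = _FENCED.search(text)
    if fenced:
        result = _try_load(fenced.group(1).strip())
        if result is not None:
            return result
    # (3) Brace-depth counter: slice from the first '{' to its matching '}'.
    #     Braces inside string values are not special-cased in this sketch.
    start = text.find("{")
    if start != -1:
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    result = _try_load(text[start : i + 1])
                    if result is not None:
                        return result
                    break
    # (4) Regex fallback: try every flat {...} group in order.
    for match in re.finditer(r"\{[^{}]*\}", text):
        result = _try_load(match.group(0))
        if result is not None:
            return result
    return None
```

Ordering the strategies from strictest to loosest means a well-formed reply never pays for the fallbacks, and chatty model output still has three chances to yield the selector dict.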
Session 18b - 2026-03-11 - Auto-Adapt: Popup Dismiss + CF Bypass
- _auto_dismiss_popups(page): 17 known consent-framework selectors (OneTrust, Cookiebot, etc.) + text fallback (Accept, I Agree, OK, etc.). Called twice: after initial goto() and after Mode B search navigation
- Saved profile reuse: adapt_site_now() checks .browser_profiles/<site_slug>/. If exists and non-empty → launch_persistent_context(). CF cookies/login sessions carry over automatically
- CF non-headless fallback: CF detected in headless → auto-retry with headless=False. Covers ~95% of CF cases with zero user action. Browser briefly appears then auto-closes
- Turnstile CAPTCHA solver: Non-headless also hits CF Turnstile + solver configured → extracts data-sitekey, calls 2captcha/CapSolver Turnstile endpoint, injects token, waits for CF redirect
Session 19 - 2026-03-12 - Image Lightbox + Thumbnail Arrow Navigation
- Lightbox: createPortal(…, document.body) to escape panel z-50 stacking context. Same size/position as detail panel (w-96 h-full fixed right-0 top-0). CLOSE, ‹ PREV / NEXT ›, dot indicators, backdrop-click-to-close, keyboard (←/→/Esc)
- Thumbnail strip arrows: ‹ / › buttons cycle activeIdx without opening lightbox
- activeIdx sync: Strip arrows and lightbox arrows share same state; highlighted thumbnail always reflects lightbox view
- mounted SSR guard: useEffect(() => setMounted(true), []); portal only renders after hydration (document.body unavailable during Next.js SSR)
Session 20 - 2026-03-12 - Bug Fixes Batch
- Fix: ENGINE OFFLINE: useSSE.ts used EventSource('/api/stream') (disabled in static build → instant 404 → onerror → permanent offline). Replaced with 5s setInterval polling of /api/stats
- Fix: Settings stuck "Loading settings…": (1) fetchConfig() called .map() on dict → silent TypeError → loaded never true. (2) saveConfig() sent [{key,value}] array but backend expects flat dict. Both fixed. Catch calls setConfig({}) so form renders on error
- Fix: Header version: "v2.5" → "v2.7" in Header.tsx
- Fix: serve_dashboard() always serving old HTML: Explicit @app.get("/") was beating StaticFiles mount but returning dashboard.html. Now checks frontend/out/index.html first
- /legacy route added: GET /legacy in worker.py always serves dashboard.html
Session 20b - 2026-03-12 - Fix: /legacy Route + MD File Restructure
- Root cause: app.mount("/", StaticFiles(html=True)) registers as Starlette sub-application that captures ALL paths. SPA fallback returns index.html with 200 for unknown paths, shadowing @app.get("/legacy")
- Fix: Replaced with app.mount("/_next", StaticFiles(...)) (assets only) + @app.get("/{full_path:path}") 3-step catch-all: (a) exact file, (b) .html match, (c) index.html SPA fallback
- Why this works: routes are matched in registration order, so explicit routes registered before the parameterised {path} catch-all take priority over it. /legacy hits serve_legacy() first
- MD restructure: HANDOFF.md deleted (content merged into CLAUDE.md). ARCHIVE.md created. PROGRESS.md and MEMORY.md stripped of session history. Dynamic MD update policy embedded in all files
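The 3-step lookup behind the catch-all route can be illustrated without FastAPI. resolve_static is a hypothetical helper, and the path-traversal guard is an addition of this sketch rather than something the log mentions; only the (a) exact file, (b) .html match, (c) index.html order comes from the entry above.

```python
from pathlib import Path
from typing import Optional


def resolve_static(out_dir: Path, full_path: str) -> Optional[Path]:
    """Map a request path onto a file inside the static export directory."""
    root = out_dir.resolve()
    rel = full_path.strip("/")
    candidates = []
    if rel:
        candidates.append(root / rel)              # (a) exact file
        candidates.append(root / (rel + ".html"))  # (b) exported page, e.g. /settings
    candidates.append(root / "index.html")         # (c) SPA fallback
    for candidate in candidates:
        resolved = candidate.resolve()
        # Guard (assumption): never serve anything outside the export dir.
        if root in resolved.parents and resolved.is_file():
            return resolved
    return None
```

In a FastAPI handler the returned path would typically be wrapped in a FileResponse, with the explicit /legacy and /api routes registered ahead of the catch-all.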
Session 21 - 2026-03-12 - MD Restructure Completion + Q&A Decisions [Q&A + Config]
Code changes:
- CLAUDE.md: Added "Why These Decisions" section under Tech Stack (PostgreSQL, Redis, no Rust/Go, no PyTorch yet, vLLM rationale, Ghost Node unique value). Added tech stack reasoning from deleted HANDOFF.md.
- PROGRESS.md: Stripped all session history (now in ARCHIVE.md). Now contains only: Current State, Pending Features, Known Improvements. Added Priority 4 (Frontend Visual Polish).
- ARCHIVE.md: Added Q&A logging rule. Added this session entry. All session history (1–21) now lives here permanently.
- HANDOFF.md: Confirmed deleted (was already removed in 20b).
Q&A decisions made this session:
- node_modules 553MB: Normal for Next.js. out/ is only 1.8MB. Safe to delete node_modules; run npm install to restore before next build.
- Q&A in ARCHIVE.md: Yes, log Q&A sessions that produced lasting decisions. Skip pure troubleshooting Q&A with no lasting output. Tag with [Q&A].
- MD update policy finalised: Dynamic, not static. Each MD file only updated when session touched content relevant to it. Reminder embedded in all MD files.
- Frontend "looks beginner noob": Real concern. UI structure is solid (Next.js + Tailwind v4 + components). Needs visual design pass: gradients, shadows, typography, spacing, transitions. Added as Priority 4 in PROGRESS.md.
Full Session Log (All Sessions - Updated Permanently)
| Date | Session | Key Deliverable |
|---|---|---|
| 2026-03-06 | 1 | Foundation: full 3-thread architecture |
| 2026-03-07 | fix | URL validation for homepage Mode B |
| 2026-03-08 AM | 2 | Dual-mode navigation, browser auto-detect |
| 2026-03-08 PM | 3 | Controls UI, delay system |
| 2026-03-08 Eve | 4 | Price/time extraction, 20 currencies |
| 2026-03-08 Late | 5 | Thread D, sortable listings |
| 2026-03-08 Night | 6 | 30+ stealth patches, humanize levels |
| 2026-03-09 AM | 7 | Live countdown, JS bug fix |
| 2026-03-09 | 8 | N2+N3+N5+N9+N10+N12+N13+N14+N15 |
| 2026-03-10 | 9 | N16: AI Smart Filter (Groq + Ollama) |
| 2026-03-10 | 10 | N17: Auto-Adapter (AI CSS selector generator) |
| 2026-03-10 | 11 | N4+N7+N1+Gmail+Multi-Interval Closing+Telegram C2+AI Debug |
| 2026-03-10 | 12 | Inline edit, drag-drop, log controls, listing detail, batch import, DB cleanup |
| 2026-03-10 | 13 | Lot image extraction, thumbnail, gallery |
| 2026-03-10 | 13b | HiBid image dedup fix (full URL), Apollo cache polling |
| 2026-03-10 | 14 | Detail-page image fetch (_fetch_listing_images_batch) |
| 2026-03-10 | 14b | Image count edge-cases fixed |
| 2026-03-11 | 15 | Tech stack research, strategic direction, HANDOFF.md |
| 2026-03-11 | 16 | Next.js frontend migration complete (all 13 tasks) |
| 2026-03-11 | 17 | N6/N8/N11: scoring rules, scrape window, eBay dedup |
| 2026-03-11 | 17b | AI-first mode, scoring toggle, Keywords→Targets relabel |
| 2026-03-11 | 18 | Live reload, AI log fix, auto-adapt 4 fixes, ADAPT UX |
| 2026-03-11 | 18b | Auto-adapt: popup dismiss, CF bypass, Turnstile solver |
| 2026-03-12 | 19 | Image lightbox + thumbnail arrow navigation |
| 2026-03-12 | 20 | ENGINE OFFLINE fix, Settings fix, /legacy route |
| 2026-03-12 | 20b | /legacy routing fix (StaticFiles refactor), MD restructure |
| 2026-03-12 | 21 [Q&A] | MD restructure completion, tech stack reasoning added, Q&A policy set |