39246-vm/docs/ARCHIVE.md
abbashkyt-creator 7d8ce0e322 V0.1
2026-03-14 04:02:22 +03:00


👻 Ghost Node - Session History Archive

DO NOT load this at the start of a session: it is archaeology, not active reference. Open it only if asked about historical context or old decisions. Current work → PROGRESS.md | Architecture → CLAUDE.md | Gotchas → MEMORY.md

No session limit. Every completed session gets a full entry here permanently. When a session in PROGRESS.md is old enough to no longer be active context, move its full entry here.

What gets logged here

  • Code sessions - any session where files were edited. Full detail: what changed, why, how.
  • Q&A sessions that produced lasting decisions - tech stack choices, architectural decisions, strategic direction, important "why we do it this way" answers. Log with a [Q&A] tag.
  • Skip - pure troubleshooting Q&A where nothing was decided and no files changed.

Sessions 1–8 - Foundation Build (2026-03-06 – 2026-03-09)

Session 1 - 2026-03-06 - Foundation

  • Initial 6-file build: database.py, models.py, worker.py, dashboard.html, requirements.txt, README.md
  • Three-thread architecture (A: FastAPI, B: Scraper, C: Telegram C2)
  • Basic Playwright scraper, Telegram alerts, seeded DB (keywords, sites, config)

Session 2 - 2026-03-08 Morning - Navigation & Bug Fixes

  • Dual-mode navigation: Mode A (direct URL) + Mode B (homepage + search discovery)
  • Semantic search box discovery (ARIA, placeholder, label, ID heuristics)
  • Browser auto-detection: Edge → Yandex → Chrome → Brave → Chromium
  • UI fixes and stability improvements

Session 3 - 2026-03-08 Afternoon - Controls & UI

  • Browser selector UI in dashboard Settings tab
  • Incognito mode toggle, show/hide browser toggle
  • Restart button, Kill button
  • Four delay controls: launch / site-open / post-search / page-hold

Session 4 - 2026-03-08 Evening - Extraction System

  • Background-window protection (3 layers preventing Chromium throttle)
  • Link extraction fix
  • Price/currency extraction system supporting 20 currencies
  • Time-left extraction with time_left_mins as Float (seconds precision)
  • New DB columns: price_raw, currency, time_left, time_left_mins, price_updated_at

Session 5 - 2026-03-08 Late - Thread D & Sortable UI

  • Thread D - Price refresh loop (isolated event loop, every 5 min)
  • Card-anchored DOM extraction for complex sites (HiBid)
  • Sortable Price + Time Left columns in listings table
  • Urgency colour coding for closing-soon lots

Session 6 - 2026-03-08 Night - Anti-Bot Humanisation

  • 30+ stealth script patches (WebGL, canvas fingerprint noise, audio fingerprint, battery API, network info, media devices, iframe patching)
  • Humanize level system: raw / low / medium / heavy
  • Colour-coded humanize buttons in UI

Session 7 - 2026-03-09 Morning - Live Countdown

  • Live countdown: 1-second ticker + 60-second sync endpoint
  • time_left_mins precision fix (seconds-level Float)
  • JS syntax bug fix (apostrophe in _humanizeDescs broke all UI buttons; always use backtick JS literals)

Session 8 - 2026-03-09 - Major Feature Pack (N2/N3/N5/N9/N10/N12/N13/N14/N15)

  • N2 - CAPTCHA solver: 2captcha + CapSolver wired into navigation
  • N3 - Block/rate-limit detection + 30min site cooldown + health tracking
  • N5 - Pagination: max_pages per site, universal next-page detection
  • N9 - Thread E closing-soon alerts (user-controlled threshold, toggle, only lots with captured time data, 7-day staleness guard)
  • N10 - Multi-channel alerts: Telegram + Discord webhook + SMTP email (simultaneous or single)
  • N12 - PostgreSQL support via DATABASE_URL env var (auto-detected in database.py)
  • N13 - Site health dashboard column (cooldown timer, error count, last error preview)
  • N14 - Login session support (persistent browser profile, 🔑 Login button, pre-scrape session check)
  • N15 - Export: CSV, JSON, HTML cyberpunk report + Database Backup & Restore (download/upload .db)

URL Validation Fix - 2026-03-07

  • Fixed validator rejecting homepage URLs without {keyword} placeholder
  • New rule: block only if neither {keyword} in URL nor search_selector provided
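
The rule above can be sketched as a small predicate. This is a minimal illustration rather than the project's validator: the function name is_site_config_valid and its parameter names are hypothetical, and only the {keyword} placeholder and search_selector come from the notes.

```python
from typing import Optional

KEYWORD_PLACEHOLDER = "{keyword}"


def is_site_config_valid(url_template: str, search_selector: Optional[str]) -> bool:
    """Reject a site only when it has neither a {keyword} URL nor a search selector."""
    has_placeholder = KEYWORD_PLACEHOLDER in url_template
    # Treat a blank or whitespace-only selector as "not provided" (an assumption).
    has_selector = bool(search_selector and search_selector.strip())
    return has_placeholder or has_selector
```

Under this reading, a plain homepage URL passes as long as a search selector accompanies it, which is what Mode B navigation needs.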

Sessions 9–15 - Feature Expansion (2026-03-10 – 2026-03-11)

Session 9 - 2026-03-10 - N16: AI Smart Filter

  • N16 - AI-powered lot analysis using Groq (free cloud) + Ollama (local, unlimited)
  • _ai_analyze() dispatcher, _build_ai_prompt(), _ai_call_groq(), _ai_call_ollama()
  • PUT /api/keywords/{id} - update weight and/or ai_target
  • POST /api/ai/test - test AI verdict in dashboard
  • Keyword.ai_target - new column (Text; natural-language filter description)
  • Listing.ai_match (Integer: 1/0/NULL) + Listing.ai_reason (String 200)
  • Schema migration: auto-adds ai_match, ai_reason to listings; ai_target to keywords
  • New config keys: ai_filter_enabled, ai_provider, ai_model, ai_api_key, ai_base_url
  • Pipeline: if keyword has ai_target + AI enabled → AI is judge (score still calculated for display)

Session 10 - 2026-03-10 - N17: Auto-Adapter (AI Selector Generator)

  • N17 - AI-powered CSS selector generator using Groq + Ollama
  • _clean_html_for_ai(raw_html, max_chars=14000) - strips scripts/styles/SVGs, isolates main content
  • _build_selector_prompt(cleaned_html, site_name) - token-efficient prompt for 6-selector JSON
  • _generate_selectors_ai(cleaned_html, site_name) - calls Groq or Ollama, extracts JSON
  • _validate_selectors(page, sel_dict) - live-tests selectors, computes confidence score 0–100
  • _extract_with_selectors(page, ss) - uses stored SiteSelectors rows to extract listings
  • adapt_site_now(site_id) - full pipeline: browser → navigate → clean HTML → AI → validate → persist
  • SiteSelectors ORM model added: site_id, selector fields, confidence, rates, stale, provider
  • 3 new API endpoints: POST/GET/DELETE /api/sites/{id}/adapt|selectors
  • Dashboard: confidence badges (≥70%=green, ≥40%=orange, <40%=red), 🤖 Adapt / ↺ Re-adapt / × Clear

Session 11 - 2026-03-10 - Major Feature Pack (N4/N7/N1/Gmail/Multi-Closing/C2/AI-Debug)

  • N4 - Currency Display: Optional per-session display currency (ISO code). Rates via frankfurter.app cached 6h. price_usd stored on every listing
  • N7 - Price Filters: min_price / max_price on Keyword model. Compared in USD via _convert_price(). 💰 button per keyword
  • N7 - Location Capture: location column on Listing. Extracted from lot cards via LOC_SELS in JS_EXTRACT
  • N1 - Proxy Rotation: _RoundRobin class, proxy_enabled / proxy_list config keys. Passed to Playwright browser launch
  • Gmail: Replaced 6-field SMTP with 3 fields: gmail_address + gmail_app_password + email_to. Hardcoded smtp.gmail.com:587
  • Multi-Interval Closing Alerts: closing_alert_schedule (comma-separated, e.g. 60,30,10,5). 0 = no countdown. closing_alerts_sent JSON list tracks which thresholds fired
  • Telegram C2 Commands: /top5, /sites, /keywords, /alert on|off <kw>, /help
  • AI Debug Log: In-memory circular buffer (deque(maxlen=300)). GET /api/ai/debug/log (with since_id) + DELETE. Dashboard: "🧠 AI Log" tab, filter buttons, colour-coded cards
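
The AI debug log bullet describes a bounded in-memory buffer that clients poll with since_id. A minimal sketch of that pattern follows, assuming each entry carries a monotonically increasing integer id. The helper names (log_ai_event, entries_since, clear_log) and the entry shape are hypothetical; only deque(maxlen=300) and the since_id parameter come from the notes.

```python
from collections import deque
from itertools import count
from typing import Any, Deque, Dict, List

# Bounded buffer: once 300 entries exist, appending drops the oldest one.
_ai_log: Deque[Dict[str, Any]] = deque(maxlen=300)
_next_id = count(1)  # assumed id scheme: 1, 2, 3, ...


def log_ai_event(call_type: str, content: str) -> int:
    """Append one entry and return its id."""
    entry = {"id": next(_next_id), "call_type": call_type, "content": content}
    _ai_log.append(entry)
    return entry["id"]


def entries_since(since_id: int = 0) -> List[Dict[str, Any]]:
    """Return entries newer than since_id, oldest first (what a GET handler might serve)."""
    return [e for e in _ai_log if e["id"] > since_id]


def clear_log() -> None:
    """Empty the buffer (what a DELETE handler might call)."""
    _ai_log.clear()
```

A polling client would remember the highest id it has seen and pass it back as since_id, so each request returns only new entries.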

Session 12 - 2026-03-10 - UI/UX Overhaul + DB Cleanup

  • Inline keyword + site editing (click to rename/edit values)
  • Drag-and-drop reordering (⋮⋮ handles) - sort_order column on Keyword + TargetSite. POST /api/keywords/reorder + POST /api/sites/reorder
  • Activity Log: search/filter box, scroll-to-bottom, clear. Limit raised 80 → 200 lines
  • AI Debug Log: search/filter across all cards, manual refresh
  • Edit Site modal: full ✏ Edit button exposing all fields including login fields
  • Listing detail view: click any listing title → panel with all captured data. Toggle: listing_detail_enabled
  • Batch keyword import: textarea, one keyword per line, optional keyword:weight suffix
  • Currency picker: searchable dropdown, 22 currencies + IQD, ✕ Raw to clear
  • DELETED closing_alert_sent boolean from ORM - closing_alerts_sent JSON list is the sole tracker
  • DELETED dead SMTP seed keys: email_smtp_host/port/user/pass, email_from, closing_alert_mins
  • DELETED dead SMTP seed keys: email_smtp_host/port/user/pass, email_from, closing_alert_mins

Session 13 - 2026-03-10 - Lot Image Extraction + Display

  • JS_EXTRACT module-level constant replaces old inline page.evaluate("""..."""). Adds extractImages(root) - tries data-src, data-lazy-src, data-original, data-lazy, then img.src. Skips tiny icons (<40px). Returns up to 5 URLs per card
  • Pagination fix: JS_EXTRACT was undefined when pagination ran - both page-1 and paginated pages now use same constant
  • images column on Listing (TEXT, JSON array, 0–10 URLs, NULL if none)
  • All 4 Listing(...) constructors updated to pass images=json.dumps(images_list[:5])
  • Thumbnail in listings table (48×48px, flexbox left of title, onerror hide fallback)
  • Gallery in Listing Detail View (140×110px clickable thumbnails, flex-wrap, "LOT IMAGES" header)

Session 13b - 2026-03-10 - Image Dedup Fix

  • Root cause found: HiBid CDN uses same path img.axd for ALL images, differentiated only by query params. Old dedup stripped query strings → all 5 images collapsed to 1
  • Fix: addUrl() now deduplicates by full URL including query string
  • Apollo cache polling added: wait_for_function(JS_APOLLO_WAIT, timeout=8000) before extraction
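
The dedup bug above is easy to reproduce outside the browser. The sketch below is a Python illustration of the logic only; the real addUrl() lives in the JS extractor, and the function names and example.com URLs here are hypothetical.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def dedupe_by_path(urls: Iterable[str]) -> List[str]:
    """Old behaviour: the query string is ignored when comparing URLs."""
    seen, kept = set(), []
    for url in urls:
        parts = urlsplit(url)
        key = f"{parts.scheme}://{parts.netloc}{parts.path}"
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


def dedupe_by_full_url(urls: Iterable[str], limit: int = 5) -> List[str]:
    """Fixed behaviour: the full URL, query string included, is the dedup key."""
    seen, kept = set(), []
    for url in urls:
        if url not in seen:
            seen.add(url)
            kept.append(url)
        if len(kept) >= limit:
            break
    return kept


# Every image shares the img.axd path and differs only in its query string.
sample = [f"https://cdn.example.com/img.axd?id={i}" for i in (1, 2, 3)]
```

With this sample, dedupe_by_path keeps a single URL while dedupe_by_full_url keeps all three, matching the collapse described in the root cause.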

Session 14 - 2026-03-10 - Detail-Page Image Fetch

  • Root cause: HiBid search results pages have 0 Lot:* entries in Apollo cache; all images are only available on detail pages
  • JS_DETAIL_IMAGES - 5-layer image extractor: (1) Apollo cache, (2) JSON-LD, (3) OG meta, (4) DOM img tags, (5) srcset. Reusable module-level constant
  • JS_APOLLO_WAIT - reusable Apollo cache readiness poller
  • _fetch_listing_images_batch() - visits each new listing's detail page immediately after scraping. Saves full-size gallery
  • _price_refresh_pass refactored to use shared JS_DETAIL_IMAGES/JS_APOLLO_WAIT constants

Session 14b - 2026-03-10 - Image Count Edge Cases

  • [:5] → [:10] cap in all 4 Listing constructors (raised to match JS extractor)
  • Removed > len(existing) guard - detail page result is always authoritative (full-size, not thumbnail)
  • _price_refresh_pass: img_urls != existing_imgs comparison replaces len > len - catches quality upgrades

Session 15 - 2026-03-11 - Tech Stack Research + Strategic Direction

  • Researched tech stacks used by OpenRouter, Kling AI, SimilarWeb, Accio
  • Ghost Node finalized stack: Python (FastAPI) backend, Next.js + React + TypeScript + Tailwind frontend, PostgreSQL (not MongoDB), Redis (cache/queue)
  • Key decisions: no Rust/Go (network I/O bound), no PyTorch/TF yet (inference only), vLLM for production
  • Ghost Node vision: auction intelligence layer as SaaS - no competitor offers this
  • Created HANDOFF.md (later merged into CLAUDE.md in Session 20b restructure)

Sessions 16–20b - Frontend Migration & Polish (2026-03-11 – 2026-03-12)

Session 16 - 2026-03-11 - Next.js Frontend Migration (COMPLETE)

All 13 tasks complete:

  • Node.js v22.14.0 installed portable at %LOCALAPPDATA%\nodejs-portable\
  • Scaffolded frontend/ with Next.js 16.1.6, React 19, TypeScript, Tailwind CSS v4
  • Tailwind v4 discovery: CSS variable-based config via @theme {} in globals.css - NO tailwind.config.ts
  • Phase 0: Scaffold (next.config.ts, Tailwind v4 theme, types.ts, engineStore, useSSE, layout shell)
  • Phase 1: Dashboard tab (StatsGrid, ActivityLog)
  • Phase 2: Listings tab (ListingsTable, ListingRow, ListingDetailPanel, ImageGallery, useCountdown)
  • Phase 3: Keywords tab (drag-drop via @dnd-kit, inline edit, batch import)
  • Phase 4: Sites tab (health badges, AI confidence badge, adapt button, drag-drop)
  • Phase 5: Settings tab (all 30+ config keys, backup/restore, Telegram test)
  • Phase 6: AI Log tab (live polling, filter buttons, search, AILogCard)
  • Phase 7: output: 'export' → frontend/out/ → FastAPI StaticFiles mount in worker.py
  • 21/21 tests passing (Vitest + React Testing Library)
  • Backend: sys.stdout.reconfigure(encoding='utf-8'); "→" characters replaced with "->" in database.py; \d and \/ escapes fixed

Session 17 - 2026-03-11 - N6/N8/N11 Feature Pack

  • N6 - Editable Scoring Rules: ScoringRule(id, signal, delta, category, notes) ORM model. Seeded from old hardcoded lists. calculate_attribute_score() queries DB. 4 new endpoints: GET/POST/PUT/DELETE /api/scoring-rules. ScoringRulesPanel React component: add/edit/delete inline, colour-coded boosts/penalties, TanStack Query sync
  • N8 - Scrape Window + Boost Mode: Engine checks time-of-day before each cycle. Config: scrape_window_enabled, scrape_start_hour, scrape_end_hour. Overnight windows supported (start > end triggers overnight logic). Boost: boost_interval_mins replaces timer when any lot has time_left_mins ≤ 30
  • N11 - Cross-site Dedup (eBay only): difflib.SequenceMatcher > 0.85 before saving any new eBay listing. 24h window, different site_name. Only fires when site name/URL contains "ebay"
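
The N8 window check and boost rule can be sketched as two small functions. This is an illustration under stated assumptions, not the engine's code: the function names are hypothetical, the end hour is treated as exclusive, and equal start and end hours are taken to mean "always on". Only the overnight rule (start > end) and the 30-minute boost threshold come from the notes.

```python
from typing import Iterable, Optional


def in_scrape_window(hour: int, start_hour: int, end_hour: int) -> bool:
    """Return True if hour (0-23) falls inside the configured scrape window."""
    if start_hour == end_hour:
        return True  # assumption: identical bounds disable the restriction
    if start_hour < end_hour:
        return start_hour <= hour < end_hour  # same-day window, e.g. 8 -> 20
    # start > end: the window wraps past midnight, e.g. 22 -> 6
    return hour >= start_hour or hour < end_hour


def next_interval_mins(
    normal_mins: float,
    boost_mins: float,
    time_left_values: Iterable[Optional[float]],
) -> float:
    """Use the boost interval when any lot with known time left closes within 30 minutes."""
    closing_soon = any(t is not None and t <= 30 for t in time_left_values)
    return boost_mins if closing_soon else normal_mins
```

For a 22 → 6 window, hours 23 and 3 are inside and hour 12 is outside; lots without captured time data (None) never trigger the boost.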

Session 17b - 2026-03-11 - AI-First Architecture + Scoring Toggle

  • scoring_enabled config key: Default true. When false, score gate skipped entirely - AI sole judge. Three-way filter: (1) AI target + ai_filter_enabled → AI; (2) scoring_enabled=true → score gate; (3) scoring_enabled=false → all lots pass
  • ScoringRulesPanel toggle: ● ON / ○ OFF button. When OFF: rules dim, green "⚡ AI-FIRST MODE ACTIVE" banner
  • Keywords → Targets relabelling: Header "TARGETS", column "TARGET LABEL", AI column "AI DESCRIPTION ★". Guidance text explains search term vs AI description roles
  • proxy.ts migration: middleware.ts → proxy.ts, middleware() → proxy() (Next.js 16 requirement)
  • 5 new config keys added to CLAUDE.md; ScoringRule added to DB schema
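
The three-way filter from this session reduces to an ordered decision. The sketch below only selects which gate applies and leaves the AI verdict and the score threshold out, since the notes do not specify them. The function name choose_gate and its return labels are hypothetical.

```python
from typing import Optional


def choose_gate(ai_target: Optional[str], ai_filter_enabled: bool, scoring_enabled: bool) -> str:
    """Pick the gate for one lot, checked in the order the session notes give."""
    if ai_target and ai_filter_enabled:
        return "ai"        # (1) keyword has an AI target and the AI filter is on
    if scoring_enabled:
        return "score"     # (2) fall back to the score gate
    return "pass_all"      # (3) scoring off and no AI target: every lot passes
```

Because the AI check comes first, turning scoring_enabled off never disables AI filtering for keywords that have an ai_target.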

Session 18 - 2026-03-11 - Live Reload + AI Log Fix + Auto-Adapt 4 Fixes

  • Live reload: _cycle_now = threading.Event() (thread-safe). All 7 write endpoints call _cycle_now.set(). Scraper polls every 5s instead of single blocking asyncio.sleep(timer) - wakes immediately on change
  • AI debug log fix: AILogFeed unwraps data.entries (was storing whole envelope). AILogCard completely rewritten with correct RawEntry interface: call_type, direction, content, tokens_prompt, tokens_completion, verdict, status_code
  • Auto-adapt gate fix: Removed auto_adapt_enabled check from manual ADAPT endpoint - toggle only gates auto-trigger on new site creation, not manual clicks
  • Auto-adapt max_tokens 300 → 500: 300 was insufficient for 6-selector JSON (~400–500 tokens needed)
  • Auto-adapt JSON extraction: _extract_json() 4-strategy helper: (1) direct json.loads, (2) strip markdown fences, (3) brace-depth counter, (4) regex fallback
  • ADAPT button UX: adapting local state + 45s timer in SiteRow.tsx. Shows ⏳ ADAPTING… (gold, disabled). Auto-refetches selectors via qc.invalidateQueries on completion
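
The four-strategy JSON extraction can be sketched as follows. This is one possible reading of the bullet, not the project's _extract_json(): the helper names are hypothetical, and the specific fallback regex (matching flat, non-nested objects) is an assumption, since the notes say only "regex fallback".

```python
import json
import re
from typing import Any, Dict, Optional

FENCE = "`" * 3  # markdown code-fence marker, built indirectly for readability


def _try_load(text: str) -> Optional[Dict[str, Any]]:
    """Return the parsed value if text is a JSON object, else None."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None


def _first_balanced_object(text: str) -> Optional[str]:
    """Scan from the first '{' and return the span where brace depth returns to 0."""
    start = text.find("{")
    if start == -1:
        return None
    depth, in_string, escaped = 0, False, False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:  # braces inside JSON strings must not affect depth
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None


def extract_json(raw: str) -> Optional[Dict[str, Any]]:
    """Try each strategy in order and return the first JSON object that parses."""
    text = raw.strip()
    candidates = [
        text,                                                      # (1) direct json.loads
        re.sub(rf"^{FENCE}[a-zA-Z]*\s*|\s*{FENCE}$", "", text),    # (2) strip markdown fences
    ]
    balanced = _first_balanced_object(text)                        # (3) brace-depth counter
    if balanced is not None:
        candidates.append(balanced)
    candidates.extend(re.findall(r"\{[^{}]*\}", text))             # (4) regex: flat objects only
    for candidate in candidates:
        obj = _try_load(candidate)
        if obj is not None:
            return obj
    return None
```

Ordering the strategies from strictest to loosest means well-formed replies take the cheap path, while chatty model output with surrounding prose still has a chance of yielding the selector object.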

Session 18b - 2026-03-11 - Auto-Adapt: Popup Dismiss + CF Bypass

  • _auto_dismiss_popups(page): 17 known consent-framework selectors (OneTrust, Cookiebot, etc.) + text fallback (Accept, I Agree, OK, etc.). Called twice: after initial goto() and after Mode B search navigation
  • Saved profile reuse: adapt_site_now() checks .browser_profiles/<site_slug>/. If exists and non-empty → launch_persistent_context(). CF cookies/login sessions carry over automatically
  • CF non-headless fallback: CF detected in headless → auto-retry with headless=False. Covers ~95% of CF cases with zero user action. Browser briefly appears then auto-closes
  • Turnstile CAPTCHA solver: If the non-headless retry also hits CF Turnstile and a solver is configured → extracts data-sitekey, calls 2captcha/CapSolver Turnstile endpoint, injects token, waits for CF redirect

Session 19 - 2026-03-12 - Image Lightbox + Thumbnail Arrow Navigation

  • Lightbox: createPortal(…, document.body) to escape panel z-50 stacking context. Same size/position as detail panel (w-96 h-full fixed right-0 top-0). ✕ CLOSE, ‹ PREV / NEXT ›, dot indicators, backdrop-click-to-close, keyboard (←/→/Esc)
  • Thumbnail strip arrows: ‹ / › buttons cycle activeIdx without opening lightbox
  • activeIdx sync: Strip arrows and lightbox arrows share same state - highlighted thumbnail always reflects lightbox view
  • mounted SSR guard: useEffect(() => setMounted(true), []) - portal only renders after hydration (document.body unavailable during Next.js SSR)

Session 20 - 2026-03-12 - Bug Fixes Batch

  • Fix: ENGINE OFFLINE: useSSE.ts used EventSource('/api/stream') (disabled in static build → instant 404 → onerror → permanent offline). Replaced with 5s setInterval polling of /api/stats
  • Fix: Settings stuck "Loading settings…": (1) fetchConfig() called .map() on dict → silent TypeError → loaded never true. (2) saveConfig() sent [{key,value}] array but backend expects flat dict. Both fixed. Catch calls setConfig({}) so form renders on error
  • Fix: Header version: "v2.5" → "v2.7" in Header.tsx
  • Fix: serve_dashboard() always serving old HTML: Explicit @app.get("/") was beating StaticFiles mount but returning dashboard.html. Now checks frontend/out/index.html first
  • /legacy route added: GET /legacy in worker.py always serves dashboard.html

Session 20b - 2026-03-12 - Fix: /legacy Route + MD File Restructure

  • Root cause: app.mount("/", StaticFiles(html=True)) registers as Starlette sub-application that captures ALL paths. SPA fallback returns index.html with 200 for unknown paths, shadowing @app.get("/legacy")
  • Fix: Replaced with app.mount("/_next", StaticFiles(...)) (assets only) + @app.get("/{full_path:path}") 3-step catch-all: (a) exact file, (b) .html match, (c) index.html SPA fallback
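  • Why this works: routes are matched in registration order, so an explicit route such as @app.get("/legacy") that is registered before the parameterised {full_path:path} catch-all is matched first. /legacy hits serve_legacy() before the catch-all is consulted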
  • MD restructure: HANDOFF.md deleted (content merged into CLAUDE.md). ARCHIVE.md created. PROGRESS.md and MEMORY.md stripped of session history. Dynamic MD update policy embedded in all files
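
The 3-step catch-all described in this session can be sketched as a pure path-resolution function, independent of the web framework. The name resolve_static and the traversal guard are assumptions added for illustration; only the (a) exact file, (b) .html match, (c) index.html fallback order comes from the notes.

```python
from pathlib import Path
from typing import Optional


def resolve_static(out_dir: Path, full_path: str) -> Optional[Path]:
    """Map a request path to a file in the exported frontend, or None if unsafe/missing."""
    root = out_dir.resolve()
    candidate = (root / full_path).resolve()
    # Assumed safety guard: refuse anything that escapes the export directory.
    if candidate != root and root not in candidate.parents:
        return None
    if candidate.is_file():
        return candidate                                   # (a) exact file
    if full_path.strip("/"):
        html = candidate.with_name(candidate.name + ".html")
        if html.is_file():
            return html                                    # (b) "settings" -> settings.html
    index = root / "index.html"
    return index if index.is_file() else None              # (c) SPA fallback
```

In a FastAPI handler for /{full_path:path}, the returned path would typically be handed to a FileResponse, with a 404 when the function returns None.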

Session 21 - 2026-03-12 - MD Restructure Completion + Q&A Decisions [Q&A + Config]

Code changes:

  • CLAUDE.md: Added "Why These Decisions" section under Tech Stack (PostgreSQL, Redis, no Rust/Go, no PyTorch yet, vLLM rationale, Ghost Node unique value). Added tech stack reasoning from deleted HANDOFF.md.
  • PROGRESS.md: Stripped all session history (now in ARCHIVE.md). Now contains only: Current State, Pending Features, Known Improvements. Added Priority 4 (Frontend Visual Polish).
  • ARCHIVE.md: Added Q&A logging rule. Added this session entry. All session history (1–21) now lives here permanently.
  • HANDOFF.md: Confirmed deleted (was already removed in 20b).

Q&A decisions made this session:

  • node_modules 553MB → Normal for Next.js. out/ is only 1.8MB. Safe to delete node_modules; run npm install to restore before next build.
  • Q&A in ARCHIVE.md → Yes: log Q&A sessions that produced lasting decisions. Skip pure troubleshooting Q&A with no lasting output. Tag with [Q&A].
  • MD update policy finalised → Dynamic, not static. Each MD file only updated when session touched content relevant to it. Reminder embedded in all MD files.
  • Frontend "looks beginner noob" → Real concern. UI structure is solid (Next.js + Tailwind v4 + components). Needs visual design pass: gradients, shadows, typography, spacing, transitions. Added as Priority 4 in PROGRESS.md.

Full Session Log (All Sessions - Updated Permanently)

| Date | Session | Key Deliverable |
| --- | --- | --- |
| 2026-03-06 | 1 | Foundation - full 3-thread architecture |
| 2026-03-07 | fix | URL validation for homepage Mode B |
| 2026-03-08 AM | 2 | Dual-mode navigation, browser auto-detect |
| 2026-03-08 PM | 3 | Controls UI, delay system |
| 2026-03-08 Eve | 4 | Price/time extraction, 20 currencies |
| 2026-03-08 Late | 5 | Thread D, sortable listings |
| 2026-03-08 Night | 6 | 30+ stealth patches, humanize levels |
| 2026-03-09 AM | 7 | Live countdown, JS bug fix |
| 2026-03-09 | 8 | N2+N3+N5+N9+N10+N12+N13+N14+N15 |
| 2026-03-10 | 9 | N16: AI Smart Filter - Groq + Ollama |
| 2026-03-10 | 10 | N17: Auto-Adapter - AI CSS selector generator |
| 2026-03-10 | 11 | N4+N7+N1+Gmail+Multi-Interval Closing+Telegram C2+AI Debug |
| 2026-03-10 | 12 | Inline edit, drag-drop, log controls, listing detail, batch import, DB cleanup |
| 2026-03-10 | 13 | Lot image extraction, thumbnail, gallery |
| 2026-03-10 | 13b | HiBid image dedup fix (full URL), Apollo cache polling |
| 2026-03-10 | 14 | Detail-page image fetch (_fetch_listing_images_batch) |
| 2026-03-10 | 14b | Image count edge-cases fixed |
| 2026-03-11 | 15 | Tech stack research, strategic direction, HANDOFF.md |
| 2026-03-11 | 16 | Next.js frontend migration complete (all 13 tasks) |
| 2026-03-11 | 17 | N6/N8/N11: scoring rules, scrape window, eBay dedup |
| 2026-03-11 | 17b | AI-first mode, scoring toggle, Keywords→Targets relabel |
| 2026-03-11 | 18 | Live reload, AI log fix, auto-adapt 4 fixes, ADAPT UX |
| 2026-03-11 | 18b | Auto-adapt: popup dismiss, CF bypass, Turnstile solver |
| 2026-03-12 | 19 | Image lightbox + thumbnail arrow navigation |
| 2026-03-12 | 20 | ENGINE OFFLINE fix, Settings fix, /legacy route |
| 2026-03-12 | 20b | /legacy routing fix (StaticFiles refactor), MD restructure |
| 2026-03-12 | 21 | [Q&A] MD restructure completion, tech stack reasoning added, Q&A policy set |