abbashkyt-creator 7d8ce0e322 V0.1

2026-03-14 04:02:22 +03:00

👻 Ghost Node — Session History Archive

DO NOT load this at the start of a session — it is archaeology, not active reference. Open it only if asked about historical context or old decisions. Current work → PROGRESS.md | Architecture → CLAUDE.md | Gotchas → MEMORY.md

No session limit. Every completed session gets a full entry here permanently. When a session in PROGRESS.md is old enough to no longer be active context, move its full entry here.

What gets logged here

Code sessions — any session where files were edited. Full detail: what changed, why, how.

Q&A sessions that produced lasting decisions — tech stack choices, architectural decisions, strategic direction, important "why we do it this way" answers. Log with a [Q&A] tag.

Skip — pure troubleshooting Q&A where nothing was decided and no files changed.

Sessions 1–8 — Foundation Build (2026-03-06 – 2026-03-09)

Session 1 — 2026-03-06 — Foundation

Initial 6-file build: database.py, models.py, worker.py, dashboard.html, requirements.txt, README.md
Three-thread architecture (A: FastAPI, B: Scraper, C: Telegram C2)
Basic Playwright scraper, Telegram alerts, seeded DB (keywords, sites, config)

Session 2 — 2026-03-08 Morning — Navigation & Bug Fixes

Dual-mode navigation: Mode A (direct URL) + Mode B (homepage + search discovery)
Semantic search box discovery (ARIA, placeholder, label, ID heuristics)
Browser auto-detection: Edge → Yandex → Chrome → Brave → Chromium
UI fixes and stability improvements

Session 3 — 2026-03-08 Afternoon — Controls & UI

Browser selector UI in dashboard Settings tab
Incognito mode toggle, show/hide browser toggle
Restart button, Kill button
Four delay controls: launch / site-open / post-search / page-hold

Session 4 — 2026-03-08 Evening — Extraction System

Background-window protection (3 layers preventing Chromium throttle)
Link extraction fix
Price/currency extraction system supporting 20 currencies
Time-left extraction with time_left_mins as Float (seconds precision)
New DB columns: price_raw, currency, time_left, time_left_mins, price_updated_at
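
As a rough illustration of how a raw time-left label could become the Float time_left_mins value, here is a minimal sketch. The label format ("2d 3h 15m 30s") and the function name are assumptions for illustration; the archive does not record the actual formats the scraper handles.

```python
import re
from typing import Optional

# Seconds per unit; totals are summed in whole seconds so the final
# division to minutes keeps seconds-level precision.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}
_PART = re.compile(r"(\d+)\s*([dhms])", re.IGNORECASE)


def time_left_to_mins(text: str) -> Optional[float]:
    """Convert a label such as '1h 30m' or '2m 30s' to minutes as a float.

    Hypothetical helper: returns None when no '<number><unit>' pair is found.
    """
    parts = _PART.findall(text)
    if not parts:
        return None
    total_seconds = sum(int(n) * _UNIT_SECONDS[u.lower()] for n, u in parts)
    return total_seconds / 60
```

Keeping the value in minutes as a float lets a single column drive both sorting and a seconds-level countdown.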

Session 5 — 2026-03-08 Late — Thread D & Sortable UI

Thread D — Price refresh loop (isolated event loop, every 5 min)
Card-anchored DOM extraction for complex sites (HiBid)
Sortable Price + Time Left columns in listings table
Urgency colour coding for closing-soon lots

Session 6 — 2026-03-08 Night — Anti-Bot Humanisation

30+ stealth script patches (WebGL, canvas fingerprint noise, audio fingerprint, battery API, network info, media devices, iframe patching)
Humanize level system: raw / low / medium / heavy
Colour-coded humanize buttons in UI

Session 7 — 2026-03-09 Morning — Live Countdown

Live countdown: 1-second ticker + 60-second sync endpoint
time_left_mins precision fix (seconds-level Float)
JS syntax bug fix (apostrophe in _humanizeDescs broke all UI buttons — use backtick JS literals always)

Session 8 — 2026-03-09 — Major Feature Pack (N2/N3/N5/N9/N10/N12/N13/N14/N15)

N2 — CAPTCHA solver: 2captcha + CapSolver wired into navigation
N3 — Block/rate-limit detection + 30min site cooldown + health tracking
N5 — Pagination: max_pages per site, universal next-page detection
N9 — Thread E closing-soon alerts (user-controlled threshold, toggle, only lots with captured time data, 7-day staleness guard)
N10 — Multi-channel alerts: Telegram + Discord webhook + SMTP email (simultaneous or single)
N12 — PostgreSQL support via DATABASE_URL env var (auto-detected in database.py)
N13 — Site health dashboard column (cooldown timer, error count, last error preview)
N14 — Login session support (persistent browser profile, 🔑 Login button, pre-scrape session check)
N15 — Export: CSV, JSON, HTML cyberpunk report + Database Backup & Restore (download/upload .db)
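
The 30-minute site cooldown from N3 can be pictured as a small in-memory tracker. This is a sketch only: the class and method names are hypothetical, and the real implementation presumably persists its state alongside the health columns that N13 displays.

```python
from datetime import datetime, timedelta
from typing import Dict

COOLDOWN = timedelta(minutes=30)  # cooldown length stated in N3


class SiteCooldowns:
    """Hypothetical tracker: remembers when each blocked site may be retried."""

    def __init__(self) -> None:
        self._until: Dict[str, datetime] = {}

    def mark_blocked(self, site: str, now: datetime) -> None:
        # A block or rate-limit detection starts (or restarts) the cooldown.
        self._until[site] = now + COOLDOWN

    def remaining(self, site: str, now: datetime) -> timedelta:
        # Time left before the site may be scraped again; zero if not cooling.
        until = self._until.get(site)
        if until is None or now >= until:
            return timedelta(0)
        return until - now

    def is_cooling(self, site: str, now: datetime) -> bool:
        return self.remaining(site, now) > timedelta(0)
```

Passing `now` explicitly keeps the tracker deterministic; a scraper loop would skip any site for which `is_cooling()` returns True.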

URL Validation Fix — 2026-03-07

Fixed validator rejecting homepage URLs without {keyword} placeholder
New rule: block only if neither {keyword} in URL nor search_selector provided
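
The new rule reduces to a single boolean check. A minimal sketch, with a hypothetical function name and parameter names:

```python
from typing import Optional


def site_config_is_valid(url_template: str, search_selector: Optional[str] = None) -> bool:
    """Reject a site only if it has neither a {keyword} placeholder nor a search selector."""
    has_placeholder = "{keyword}" in url_template
    has_selector = bool(search_selector and search_selector.strip())
    return has_placeholder or has_selector
```

A bare homepage URL therefore passes as long as a search_selector is supplied, which is the Mode B case the old validator rejected.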

Sessions 9–15 — Feature Expansion (2026-03-10 – 2026-03-11)

Session 9 — 2026-03-10 — N16: AI Smart Filter

N16 — AI-powered lot analysis using Groq (free cloud) + Ollama (local, unlimited)
_ai_analyze() dispatcher, _build_ai_prompt(), _ai_call_groq(), _ai_call_ollama()
PUT /api/keywords/{id} — update weight and/or ai_target
POST /api/ai/test — test AI verdict in dashboard
Keyword.ai_target — new column (Text — natural-language filter description)
Listing.ai_match (Integer: 1/0/NULL) + Listing.ai_reason (String 200)
Schema migration: auto-adds ai_match, ai_reason to listings; ai_target to keywords
New config keys: ai_filter_enabled, ai_provider, ai_model, ai_api_key, ai_base_url
Pipeline: if keyword has ai_target + AI enabled → AI is judge (score still calculated for display)

Session 10 — 2026-03-10 — N17: Auto-Adapter (AI Selector Generator)

N17 — AI-powered CSS selector generator using Groq + Ollama
_clean_html_for_ai(raw_html, max_chars=14000) — strips scripts/styles/SVGs, isolates main content
_build_selector_prompt(cleaned_html, site_name) — token-efficient prompt for 6-selector JSON
_generate_selectors_ai(cleaned_html, site_name) — calls Groq or Ollama, extracts JSON
_validate_selectors(page, sel_dict) — live-tests selectors, computes confidence score 0–100
_extract_with_selectors(page, ss) — uses stored SiteSelectors rows to extract listings
adapt_site_now(site_id) — full pipeline: browser → navigate → clean HTML → AI → validate → persist
SiteSelectors ORM model added: site_id, selector fields, confidence, rates, stale, provider
3 new API endpoints: POST/GET/DELETE /api/sites/{id}/adapt|selectors
Dashboard: confidence badges (≥70%=green, ≥40%=orange, <40%=red), 🤖 Adapt / ↺ Re-adapt / × Clear
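
To make the HTML-cleaning step concrete, here is a regex-based sketch of what a function like _clean_html_for_ai() might do. Only the signature and the max_chars=14000 default come from the entry above; the regexes and the use of a <main> element to isolate content are assumptions, and the real function may use a proper HTML parser.

```python
import re

# Blocks that carry no useful structure for selector generation.
_STRIP_BLOCKS = re.compile(r"<(script|style|svg)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
_COMMENTS = re.compile(r"<!--.*?-->", re.DOTALL)
_MAIN = re.compile(r"<main\b[^>]*>(.*?)</main\s*>", re.IGNORECASE | re.DOTALL)
_WHITESPACE = re.compile(r"\s+")


def clean_html_for_ai(raw_html: str, max_chars: int = 14000) -> str:
    """Strip scripts/styles/SVGs, keep <main> if present, and truncate."""
    html = _COMMENTS.sub("", raw_html)
    html = _STRIP_BLOCKS.sub("", html)
    main = _MAIN.search(html)
    if main:  # assumed heuristic for "main content"
        html = main.group(1)
    html = _WHITESPACE.sub(" ", html).strip()
    return html[:max_chars]
```

Truncating after cleaning, rather than before, spends the character budget on markup the model can actually use.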

Session 11 — 2026-03-10 — Major Feature Pack (N4/N7/N1/Gmail/Multi-Closing/C2/AI-Debug)

N4 — Currency Display: Optional per-session display currency (ISO code). Rates via frankfurter.app cached 6h. price_usd stored on every listing
N7 — Price Filters: min_price / max_price on Keyword model. Compared in USD via _convert_price(). 💰 button per keyword
N7 — Location Capture: location column on Listing. Extracted from lot cards via LOC_SELS in JS_EXTRACT
N1 — Proxy Rotation: _RoundRobin class, proxy_enabled / proxy_list config keys. Passed to Playwright browser launch
Gmail: Replaced 6-field SMTP with 3 fields: gmail_address + gmail_app_password + email_to. Hardcoded smtp.gmail.com:587
Multi-Interval Closing Alerts: closing_alert_schedule (comma-separated, e.g. 60,30,10,5). 0 = no countdown. closing_alerts_sent JSON list tracks which thresholds fired
Telegram C2 Commands: /top5, /sites, /keywords, /alert on|off <kw>, /help
AI Debug Log: In-memory circular buffer (deque(maxlen=300)). GET /api/ai/debug/log (with since_id) + DELETE. Dashboard: "🧠 AI Log" tab, filter buttons, colour-coded cards
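
The multi-interval closing alerts above pair a comma-separated schedule with a JSON list of thresholds that have already fired. The sketch below shows one way the two could interact. The helper names are hypothetical, and treating 0 as "no threshold" is an assumption about what "0 = no countdown" means.

```python
import json
from typing import List, Optional


def parse_schedule(raw: str) -> List[int]:
    """'60,30,10,5' -> [60, 30, 10, 5]; zero and non-numeric parts are dropped."""
    thresholds = set()
    for part in raw.split(","):
        part = part.strip()
        if part.isdigit() and int(part) > 0:
            thresholds.add(int(part))
    return sorted(thresholds, reverse=True)


def due_thresholds(time_left_mins: float, schedule: List[int], sent_json: Optional[str]) -> List[int]:
    """Thresholds the lot has crossed that are not yet in closing_alerts_sent."""
    already_sent = set(json.loads(sent_json)) if sent_json else set()
    return [t for t in schedule if time_left_mins <= t and t not in already_sent]


def mark_sent(sent_json: Optional[str], fired: List[int]) -> str:
    """Return the updated JSON list to store back in closing_alerts_sent."""
    already_sent = set(json.loads(sent_json)) if sent_json else set()
    return json.dumps(sorted(already_sent | set(fired), reverse=True))
```

Tracking thresholds individually is what lets each one fire at most once per lot, something a single boolean flag cannot express.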

Session 12 — 2026-03-10 — UI/UX Overhaul + DB Cleanup

Inline keyword + site editing (click to rename/edit values)
Drag-and-drop reordering (⋮⋮ handles) — sort_order column on Keyword + TargetSite. POST /api/keywords/reorder + POST /api/sites/reorder
Activity Log: search/filter box, scroll-to-bottom, clear. Limit raised 80 → 200 lines
AI Debug Log: search/filter across all cards, manual refresh
Edit Site modal: full ✏ Edit button exposing all fields including login fields
Listing detail view: click any listing title → panel with all captured data. Toggle: listing_detail_enabled
Batch keyword import: textarea, one keyword per line, optional keyword:weight suffix
Currency picker: searchable dropdown, 22 currencies + IQD, ✕ Raw to clear
DELETED closing_alert_sent boolean from ORM — closing_alerts_sent JSON list is the sole tracker
DELETED dead SMTP seed keys: email_smtp_host/port/user/pass, email_from, closing_alert_mins
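
The batch keyword import format (one keyword per line, optional keyword:weight suffix) could be parsed roughly as follows. The function name, the integer weight type, and the default weight of 1 are assumptions for illustration.

```python
import re
from typing import List, Tuple

_INT = re.compile(r"-?\d+")


def parse_keyword_batch(text: str, default_weight: int = 1) -> List[Tuple[str, int]]:
    """Parse textarea input into (keyword, weight) pairs, skipping blank lines."""
    pairs: List[Tuple[str, int]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the LAST colon so keywords that contain ':' still work.
        name, sep, suffix = line.rpartition(":")
        if sep and name.strip() and _INT.fullmatch(suffix.strip()):
            pairs.append((name.strip(), int(suffix)))
        else:
            pairs.append((line, default_weight))
    return pairs
```

Splitting on the last colon means a line like "ratio 16:9 monitor" is kept whole, because "9 monitor" is not a valid weight.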

Session 13 — 2026-03-10 — Lot Image Extraction + Display

JS_EXTRACT module-level constant replaces old inline page.evaluate("""..."""). Adds extractImages(root) — tries data-src, data-lazy-src, data-original, data-lazy, then img.src. Skips tiny icons (<40px). Returns up to 5 URLs per card
Pagination fix: JS_EXTRACT was undefined when pagination ran — both page-1 and paginated pages now use same constant
images column on Listing (TEXT, JSON array, 0–10 URLs, NULL if none)
All 4 Listing(...) constructors updated to pass images=json.dumps(images_list[:5])
Thumbnail in listings table (48×48px, flexbox left of title, onerror hide fallback)
Gallery in Listing Detail View (140×110px clickable thumbnails, flex-wrap, "LOT IMAGES" header)

Session 13b — 2026-03-10 — Image Dedup Fix

Root cause found: HiBid CDN uses same path img.axd for ALL images, differentiated only by query params. Old dedup stripped query strings → all 5 images collapsed to 1
Fix: addUrl() now deduplicates by full URL including query string
Apollo cache polling added: wait_for_function(JS_APOLLO_WAIT, timeout=8000) before extraction
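
The dedup bug is easy to reproduce outside the browser. The original addUrl() lives in the JavaScript extractor; the Python sketch below only mirrors the idea, and the example.com URLs are made up.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def dedupe_ignoring_query(urls: Iterable[str]) -> List[str]:
    """Old behaviour: the key drops the query string, so img.axd?id=1 and ?id=2 collide."""
    seen, kept = set(), []
    for url in urls:
        key = urlsplit(url)._replace(query="", fragment="").geturl()
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept


def dedupe_full_url(urls: Iterable[str], limit: int = 5) -> List[str]:
    """Fixed behaviour: the key is the full URL, query string included."""
    seen, kept = set(), []
    for url in urls:
        if url not in seen:
            seen.add(url)
            kept.append(url)
        if len(kept) == limit:
            break
    return kept


# Hypothetical CDN URLs that share one path and differ only by query parameters.
SAMPLE = [
    "https://cdn.example.com/img.axd?id=101&wid=800",
    "https://cdn.example.com/img.axd?id=102&wid=800",
    "https://cdn.example.com/img.axd?id=101&wid=800",
]
```

On SAMPLE the old key keeps one URL and the full-URL key keeps two, which matches the collapse described in the root cause.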

Session 14 — 2026-03-10 — Detail-Page Image Fetch

Root cause: HiBid search results pages have 0 Lot:* entries in Apollo cache — all images only available on detail pages
JS_DETAIL_IMAGES — 5-layer image extractor: (1) Apollo cache, (2) JSON-LD, (3) OG meta, (4) DOM img tags, (5) srcset. Reusable module-level constant
JS_APOLLO_WAIT — reusable Apollo cache readiness poller
_fetch_listing_images_batch() — visits each new listing's detail page immediately after scraping. Saves full-size gallery
_price_refresh_pass refactored to use shared JS_DETAIL_IMAGES/JS_APOLLO_WAIT constants

Session 14b — 2026-03-10 — Image Count Edge Cases

[:5] → [:10] cap in all 4 Listing constructors (raised to match JS extractor)
Removed > len(existing) guard — detail page result is always authoritative (full-size, not thumbnail)
_price_refresh_pass: img_urls != existing_imgs comparison replaces len > len — catches quality upgrades

Session 15 — 2026-03-11 — Tech Stack Research + Strategic Direction

Researched tech stacks used by OpenRouter, Kling AI, SimilarWeb, Accio
Ghost Node finalized stack: Python (FastAPI) backend, Next.js + React + TypeScript + Tailwind frontend, PostgreSQL (not MongoDB), Redis (cache/queue)
Key decisions: no Rust/Go (network I/O bound), no PyTorch/TF yet (inference only), vLLM for production
Ghost Node vision: auction intelligence layer as SaaS — no competitor offers this
Created HANDOFF.md (later merged into CLAUDE.md in Session 20b restructure)

Sessions 16–20b — Frontend Migration & Polish (2026-03-11 – 2026-03-12)

Session 16 — 2026-03-11 — Next.js Frontend Migration (COMPLETE)

All 13 tasks complete:

Node.js v22.14.0 installed portable at %LOCALAPPDATA%\nodejs-portable\
Scaffolded frontend/ with Next.js 16.1.6, React 19, TypeScript, Tailwind CSS v4
Tailwind v4 discovery: CSS variable-based config via @theme {} in globals.css — NO tailwind.config.ts
Phase 0: Scaffold (next.config.ts, Tailwind v4 theme, types.ts, engineStore, useSSE, layout shell)
Phase 1: Dashboard tab (StatsGrid, ActivityLog)
Phase 2: Listings tab (ListingsTable, ListingRow, ListingDetailPanel, ImageGallery, useCountdown)
Phase 3: Keywords tab (drag-drop via @dnd-kit, inline edit, batch import)
Phase 4: Sites tab (health badges, AI confidence badge, adapt button, drag-drop)
Phase 5: Settings tab (all 30+ config keys, backup/restore, Telegram test)
Phase 6: AI Log tab (live polling, filter buttons, search, AILogCard)
Phase 7: output: 'export' → frontend/out/ → FastAPI StaticFiles mount in worker.py
21/21 tests passing (Vitest + React Testing Library)
Backend: sys.stdout.reconfigure(encoding='utf-8'); → replaced with -> in database.py; \d and \/ escapes fixed

Session 17 — 2026-03-11 — N6/N8/N11 Feature Pack

N6 — Editable Scoring Rules: ScoringRule(id, signal, delta, category, notes) ORM model. Seeded from old hardcoded lists. calculate_attribute_score() queries DB. 4 new endpoints: GET/POST/PUT/DELETE /api/scoring-rules. ScoringRulesPanel React component: add/edit/delete inline, colour-coded boosts/penalties, TanStack Query sync
N8 — Scrape Window + Boost Mode: Engine checks time-of-day before each cycle. Config: scrape_window_enabled, scrape_start_hour, scrape_end_hour. Overnight windows supported (start > end triggers overnight logic). Boost: boost_interval_mins replaces timer when any lot has time_left_mins ≤ 30
N11 — Cross-site Dedup (eBay only): difflib.SequenceMatcher > 0.85 before saving any new eBay listing. 24h window, different site_name. Only fires when site name/URL contains "ebay"
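
The overnight rule under N8 (start > end) is the part of the scrape window that is easiest to get wrong, so here is a minimal sketch of the check. The function name is hypothetical; treating the end hour as exclusive and start == end as "always open" are assumptions the entry does not confirm.

```python
def in_scrape_window(hour: int, start_hour: int, end_hour: int) -> bool:
    """Return True if `hour` (0-23) falls inside the configured scrape window."""
    if start_hour == end_hour:
        return True  # assumption: a zero-length window means "no restriction"
    if start_hour < end_hour:
        # Same-day window, e.g. 08 -> 20
        return start_hour <= hour < end_hour
    # Overnight window, e.g. 22 -> 06: late evening OR early morning
    return hour >= start_hour or hour < end_hour
```

With scrape_start_hour=22 and scrape_end_hour=6, hours 23 and 3 are inside the window and noon is outside.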

Session 17b — 2026-03-11 — AI-First Architecture + Scoring Toggle

scoring_enabled config key: Default true. When false, score gate skipped entirely — AI sole judge. Three-way filter: (1) AI target + ai_filter_enabled → AI; (2) scoring_enabled=true → score gate; (3) scoring_enabled=false → all lots pass
ScoringRulesPanel toggle: ● ON / ○ OFF button. When OFF: rules dim, green "⚡ AI-FIRST MODE ACTIVE" banner
Keywords → Targets relabelling: Header "TARGETS", column "TARGET LABEL", AI column "AI DESCRIPTION ★". Guidance text explains search term vs AI description roles
proxy.ts migration: middleware.ts → proxy.ts, middleware() → proxy() (Next.js 16 requirement)
5 new config keys added to CLAUDE.md; ScoringRule added to DB schema
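
The three-way filter introduced with scoring_enabled can be summarised as a short decision function. This is a sketch: the function name, the ai_verdict argument, and the score_threshold parameter are hypothetical, since the entry does not say how the score gate's cutoff is configured.

```python
from typing import Optional


def lot_passes(
    *,
    ai_target: Optional[str],
    ai_filter_enabled: bool,
    scoring_enabled: bool,
    ai_verdict: Optional[bool],
    score: int,
    score_threshold: int = 0,  # hypothetical cutoff for the score gate
) -> bool:
    # (1) Keyword has an AI target and the AI filter is on: the AI is the judge.
    if ai_target and ai_filter_enabled:
        return bool(ai_verdict)
    # (2) Otherwise fall back to the score gate when scoring is enabled.
    if scoring_enabled:
        return score >= score_threshold
    # (3) Scoring disabled and no AI judge: every lot passes.
    return True
```

The order of the branches matters: an AI verdict overrides the score even when scoring is still enabled, which is consistent with the score being calculated for display only in that case.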

Session 18 — 2026-03-11 — Live Reload + AI Log Fix + Auto-Adapt 4 Fixes

Live reload: _cycle_now = threading.Event() (thread-safe). All 7 write endpoints call _cycle_now.set(). Scraper polls every 5s instead of single blocking asyncio.sleep(timer) — wakes immediately on change
AI debug log fix: AILogFeed unwraps data.entries (was storing whole envelope). AILogCard completely rewritten with correct RawEntry interface — call_type, direction, content, tokens_prompt, tokens_completion, verdict, status_code
Auto-adapt gate fix: Removed auto_adapt_enabled check from manual ADAPT endpoint — toggle only gates auto-trigger on new site creation, not manual clicks
Auto-adapt max_tokens 300 → 500: 300 was insufficient for 6-selector JSON (~400–500 tokens needed)
Auto-adapt JSON extraction: _extract_json() 4-strategy helper: (1) direct json.loads, (2) strip markdown fences, (3) brace-depth counter, (4) regex fallback
ADAPT button UX: adapting local state + 45s timer in SiteRow.tsx. Shows ⏳ ADAPTING… (gold, disabled). Auto-refetches selectors via qc.invalidateQueries on completion
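
The four strategies listed for _extract_json() might be layered as in the sketch below. Only the order of the strategies comes from the entry above; the helper names and regexes are assumptions, and the fence marker is built at runtime so that this example does not contain a literal code fence.

```python
import json
import re
from typing import Any, Dict, Optional

FENCE = "`" * 3  # markdown code-fence marker
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def _try_load(text: str) -> Optional[Dict[str, Any]]:
    try:
        value = json.loads(text)
    except ValueError:
        return None
    return value if isinstance(value, dict) else None


def _first_balanced_object(text: str) -> Optional[str]:
    """Scan from the first '{' and return the span where brace depth returns to 0."""
    start = text.find("{")
    if start == -1:
        return None
    depth, in_string, escaped = 0, False, False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:  # braces inside JSON strings must not affect depth
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start : i + 1]
    return None


def extract_json(reply: str) -> Optional[Dict[str, Any]]:
    # (1) The whole reply is already valid JSON.
    result = _try_load(reply.strip())
    if result is not None:
        return result
    # (2) JSON wrapped in a markdown fence.
    fenced = _FENCED.search(reply)
    if fenced:
        result = _try_load(fenced.group(1).strip())
        if result is not None:
            return result
    # (3) First balanced {...} object found by counting brace depth.
    candidate = _first_balanced_object(reply)
    if candidate:
        result = _try_load(candidate)
        if result is not None:
            return result
    # (4) Last resort: greedy span from the first '{' to the last '}'.
    greedy = re.search(r"\{.*\}", reply, re.DOTALL)
    return _try_load(greedy.group(0)) if greedy else None
```

Ordering the strategies from strictest to loosest means a well-formed reply never pays for the fallbacks, while a chatty model reply still has a chance of being parsed.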

Session 18b — 2026-03-11 — Auto-Adapt: Popup Dismiss + CF Bypass

_auto_dismiss_popups(page): 17 known consent-framework selectors (OneTrust, Cookiebot, etc.) + text fallback (Accept, I Agree, OK, etc.). Called twice: after initial goto() and after Mode B search navigation
Saved profile reuse: adapt_site_now() checks .browser_profiles/<site_slug>/. If exists and non-empty → launch_persistent_context(). CF cookies/login sessions carry over automatically
CF non-headless fallback: CF detected in headless → auto-retry with headless=False. Covers ~95% of CF cases with zero user action. Browser briefly appears then auto-closes
Turnstile CAPTCHA solver: If the non-headless retry also hits CF Turnstile and a solver is configured → extracts data-sitekey, calls 2captcha/CapSolver Turnstile endpoint, injects token, waits for CF redirect

Session 19 — 2026-03-12 — Image Lightbox + Thumbnail Arrow Navigation

Lightbox: createPortal(…, document.body) to escape panel z-50 stacking context. Same size/position as detail panel (w-96 h-full fixed right-0 top-0). ✕ CLOSE, ‹ PREV / NEXT ›, dot indicators, backdrop-click-to-close, keyboard (←/→/Esc)
Thumbnail strip arrows: ‹ / › buttons cycle activeIdx without opening lightbox
activeIdx sync: Strip arrows and lightbox arrows share same state — highlighted thumbnail always reflects lightbox view
mounted SSR guard: useEffect(() => setMounted(true), []) — portal only renders after hydration (document.body unavailable during Next.js SSR)

Session 20 — 2026-03-12 — Bug Fixes Batch

Fix: ENGINE OFFLINE: useSSE.ts used EventSource('/api/stream') (disabled in static build → instant 404 → onerror → permanent offline). Replaced with 5s setInterval polling of /api/stats
Fix: Settings stuck "Loading settings…": (1) fetchConfig() called .map() on dict → silent TypeError → loaded never true. (2) saveConfig() sent [{key,value}] array but backend expects flat dict. Both fixed. Catch calls setConfig({}) so form renders on error
Fix: Header version: "v2.5" → "v2.7" in Header.tsx
Fix: serve_dashboard() always serving old HTML: Explicit @app.get("/") was beating StaticFiles mount but returning dashboard.html. Now checks frontend/out/index.html first
/legacy route added: GET /legacy in worker.py always serves dashboard.html

Session 20b — 2026-03-12 — Fix: /legacy Route + MD File Restructure

Root cause: app.mount("/", StaticFiles(html=True)) registers as Starlette sub-application that captures ALL paths. SPA fallback returns index.html with 200 for unknown paths, shadowing @app.get("/legacy")
Fix: Replaced with app.mount("/_next", StaticFiles(...)) (assets only) + @app.get("/{full_path:path}") 3-step catch-all: (a) exact file, (b) .html match, (c) index.html SPA fallback
Why this works: FastAPI matches routes in registration order, so explicit routes declared before the parameterised {full_path:path} catch-all take priority. /legacy hits serve_legacy() first
MD restructure: HANDOFF.md deleted (content merged into CLAUDE.md). ARCHIVE.md created. PROGRESS.md and MEMORY.md stripped of session history. Dynamic MD update policy embedded in all files
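
The 3-step lookup behind the catch-all route can be expressed as a pure function that a handler for /{full_path:path} could call. This sketch deliberately leaves FastAPI out so it runs with the standard library alone. The function name is hypothetical, and the path-traversal guard is an addition that the entry above does not mention.

```python
from pathlib import Path
from typing import Optional


def resolve_export_path(out_dir: Path, full_path: str) -> Optional[Path]:
    """Map a request path to a file in the static export (e.g. frontend/out)."""
    root = out_dir.resolve()
    candidate = (root / full_path).resolve()

    # Added safety check (not from the original entry): never serve outside root.
    if candidate != root and root not in candidate.parents:
        return None

    # (a) Exact file, e.g. favicon.ico
    if candidate.is_file():
        return candidate

    # (b) Exported page, e.g. "settings" -> settings.html
    if candidate != root:
        html = candidate.with_name(candidate.name + ".html")
        if html.is_file():
            return html

    # (c) SPA fallback
    index = root / "index.html"
    return index if index.is_file() else None
```

Because this lookup only runs inside the catch-all handler, explicit routes such as /legacy that are declared before it are never shadowed by the index.html fallback.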

Session 21 — 2026-03-12 — MD Restructure Completion + Q&A Decisions [Q&A + Config]

Code changes:

CLAUDE.md: Added "Why These Decisions" section under Tech Stack (PostgreSQL, Redis, no Rust/Go, no PyTorch yet, vLLM rationale, Ghost Node unique value). Added tech stack reasoning from deleted HANDOFF.md.
PROGRESS.md: Stripped all session history (now in ARCHIVE.md). Now contains only: Current State, Pending Features, Known Improvements. Added Priority 4 (Frontend Visual Polish).
ARCHIVE.md: Added Q&A logging rule. Added this session entry. All session history (1–21) now lives here permanently.
HANDOFF.md: Confirmed deleted (was already removed in 20b).

Q&A decisions made this session:

node_modules 553MB → Normal for Next.js. out/ is only 1.8MB. Safe to delete node_modules — run npm install to restore before next build.
Q&A in ARCHIVE.md → Yes: log Q&A sessions that produced lasting decisions. Skip pure troubleshooting Q&A with no lasting output. Tag with [Q&A].
MD update policy finalised → Dynamic, not static. Each MD file only updated when session touched content relevant to it. Reminder embedded in all MD files.
Frontend "looks beginner noob" → Real concern. UI structure is solid (Next.js + Tailwind v4 + components). Needs visual design pass: gradients, shadows, typography, spacing, transitions. Added as Priority 4 in PROGRESS.md.

Full Session Log (All Sessions — Updated Permanently)

Date	Session	Key Deliverable
2026-03-06	1	Foundation — full 3-thread architecture
2026-03-07	fix	URL validation for homepage Mode B
2026-03-08 AM	2	Dual-mode navigation, browser auto-detect
2026-03-08 PM	3	Controls UI, delay system
2026-03-08 Eve	4	Price/time extraction, 20 currencies
2026-03-08 Late	5	Thread D, sortable listings
2026-03-08 Night	6	30+ stealth patches, humanize levels
2026-03-09 AM	7	Live countdown, JS bug fix
2026-03-09	8	N2+N3+N5+N9+N10+N12+N13+N14+N15
2026-03-10	9	N16: AI Smart Filter — Groq + Ollama
2026-03-10	10	N17: Auto-Adapter — AI CSS selector generator
2026-03-10	11	N4+N7+N1+Gmail+Multi-Interval Closing+Telegram C2+AI Debug
2026-03-10	12	Inline edit, drag-drop, log controls, listing detail, batch import, DB cleanup
2026-03-10	13	Lot image extraction, thumbnail, gallery
2026-03-10	13b	HiBid image dedup fix (full URL), Apollo cache polling
2026-03-10	14	Detail-page image fetch (`_fetch_listing_images_batch`)
2026-03-10	14b	Image count edge-cases fixed
2026-03-11	15	Tech stack research, strategic direction, HANDOFF.md
2026-03-11	16	Next.js frontend migration complete (all 13 tasks)
2026-03-11	17	N6/N8/N11: scoring rules, scrape window, eBay dedup
2026-03-11	17b	AI-first mode, scoring toggle, Keywords→Targets relabel
2026-03-11	18	Live reload, AI log fix, auto-adapt 4 fixes, ADAPT UX
2026-03-11	18b	Auto-adapt: popup dismiss, CF bypass, Turnstile solver
2026-03-12	19	Image lightbox + thumbnail arrow navigation
2026-03-12	20	ENGINE OFFLINE fix, Settings fix, /legacy route
2026-03-12	20b	/legacy routing fix (StaticFiles refactor), MD restructure
2026-03-12	21 [Q&A]	MD restructure completion, tech stack reasoning added, Q&A policy set
