39246-vm/docs/CLAUDE.md
abbashkyt-creator 7d8ce0e322 V0.1
2026-03-14 04:02:22 +03:00


👻 Ghost Node: Claude Project Briefing

Read this first at the start of every session. This file is the single source of truth for project context, architecture, and rules. Last updated: 2026-03-12 (Session 20b: /legacy routing fix, MD restructure)


⚡ Right Now

  • Version: v2.7 | Status: ✅ Fully operational (21/21 tests passing)
  • Frontend: Next.js (React 19 + Tailwind v4), static build at frontend/out/, served by FastAPI
  • Last session: 27 (2026-03-13): premium frontend redesign with ambient background, glassmorphism, gradient glow cards, Framer Motion animations, badge system
  • Next priority: Redis Cache Layer → Docker Compose → Lot Description Extraction
  • URLs: http://localhost:8000 (React UI) | http://localhost:8000/legacy (old HTML dashboard)

For AI Assistants: How to Use This Project

Read Order at Session Start

  1. This file (CLAUDE.md): architecture, rules, schema, all config keys, all endpoints. Read fully.
  2. PROGRESS.md: recent sessions (15+) and pending features. Read fully.
  3. MEMORY.md: key gotchas, dev paths, tech patterns. Scan for relevant sections.
  4. Abbas.md: what the AI has learned about Abbas in general (portable; usable in any project or conversation). Scan it so responses fit him.
  5. FEEDBACK.md: project-specific notes about Abbas for Ghost Node. Scan when relevant.
  6. ERROR.md: error log (errors and their fixes). Open when debugging or when the same error might recur.
  7. ARCHIVE.md: sessions 1–14. Do NOT load at session start. Open only if asked about old history.

How to Run the Program

  1. python worker.py: starts all 5 threads and FastAPI on port 8000
  2. Visit http://localhost:8000 for the new React dashboard
  3. Visit http://localhost:8000/legacy for the old HTML dashboard (always available as a fallback)
  4. To build/test the frontend, use Node.js (default installer path C:\Program Files\nodejs\, usually already on PATH). If using portable Node instead, set PATH first:
    • PowerShell: $env:PATH = "C:\Program Files\nodejs;$env:PATH" (installer) or $env:PATH = "C:\Users\Abbas\AppData\Local\nodejs-portable\node-v22.14.0-win-x64;$env:PATH" (portable)
    • Bash: export PATH="/c/Program Files/nodejs:$PATH" (installer), or prepend the portable Node directory the same way
  5. Build frontend: npm run build --prefix frontend (run from project root)
  6. Run tests: cd frontend && npx vitest run
  7. Test and build in one line: export PATH="..." && cd frontend && npx vitest run && npm run build

Your Role

You are the lead developer continuing Ghost Node. Abbas is the project owner: sharp, direct, knows exactly what he wants. Your job: implement features, fix bugs, keep everything working.

Abbas's communication style: Direct. Minimal punctuation. Expects you to understand intent from brief messages. When he asks for a change, just do it efficiently without over-explaining.

Rules for AI

  • Never regenerate files from scratch: always read the existing file, then edit it
  • Ask before architectural changes that touch multiple files
  • Read the relevant worker.py section before editing it; it's 5,000+ lines, so navigation matters
  • worker.py is always the definitive version; it may be ahead of any exports or summaries
  • After any frontend change: rebuild (npm run build --prefix frontend) and confirm tests still pass
  • After any backend change: check that the relevant API endpoint still returns expected data
  • MD files: update dynamically (see the Session Rules section for which file to update when)

Project Identity

  • Name: Ghost Node (International Auction Sniper)
  • Version: v2.7
  • Owner: Abbas (tvhomeb3@gmail.com), 20-year-old developer, Baghdad, Iraq
  • Goal: Automated lot monitoring across global auction sites with real-time multi-channel alerts, human-mimicry anti-bot engine, and self-healing site health system
  • Platform: Windows 11 primary, Linux compatible
  • Language: Python 3.10+
  • Vision: Evolve from personal tool into SaaS: the auction intelligence layer that watches every auction site, scores deals, predicts closing behaviour, and tells users when and where to bid. No competitor offers this.

File Roles (Always Use Uploaded Files, Never Regenerate From Scratch)

File | Role | Size
worker.py | All logic: 5 threads, 42 API endpoints, scraper engine, stealth, CAPTCHA, alerts, export, login, backup, AI filter, Auto-Adapter, N4/N7/N8/N11/Proxy/Gmail/C2, reorder, JS_EXTRACT, JS_DETAIL_IMAGES, JS_APOLLO_WAIT, _fetch_listing_images_batch | 5,000+ lines
models.py | ORM models (Listing, Keyword, TargetSite, SiteSelectors, Config, ScoringRule), heuristic scorer, schema migration, DB seeder | ~420 lines
database.py | SQLAlchemy engine: SQLite (default) + PostgreSQL (via DATABASE_URL) | ~76 lines
dashboard.html | Legacy single-file UI; still served as fallback if frontend/out/ is deleted | 3,515 lines
frontend/ | Current UI: Next.js 16 + React 19 + TypeScript + Tailwind v4. out/ is the static build served by FastAPI | -
requirements.txt | pip dependencies | minimal
setup.bat | One-click Windows setup | -
restart.bat | Auto-restart wrapper called by the API | -
CLAUDE.md | This file: architecture, rules, schema, endpoints. Single source of truth | -
PROGRESS.md | Recent sessions (15+), pending features, known improvements | -
MEMORY.md | Key gotchas, dev paths, frontend architecture state | -
Abbas.md | What the AI has learned about Abbas in general; portable to any project or conversation | -
FEEDBACK.md | Project-specific notes about Abbas for Ghost Node (environment, decisions made here) | -
ERROR.md | Error log: errors hit and their fixes, newest first. Use when the same error might recur | -
ARCHIVE.md | Sessions 1–14 history. Do not read at session start | -

Architecture: 5 Threads

worker.py
├── Thread A  - FastAPI + Dashboard (port 8000), 42 REST endpoints
├── Thread B  - nuclear_engine(): async Playwright scraper loop (N8 window + boost)
├── Thread C  - telegram_c2_loop(): polls getUpdates every 3s
├── Thread D  - price_refresh_loop(): revisits lot URLs every 5 min
└── Thread E  - closing_alert_loop(): polls DB every 60s for closing lots

Key module-level constants:
├── JS_EXTRACT         - Main listing extraction JS (cards + fallback ancestor walk)
├── JS_DETAIL_IMAGES   - 5-layer image extractor (Apollo, JSON-LD, OG, DOM, srcset)
└── JS_APOLLO_WAIT     - Apollo cache readiness poller for wait_for_function

Key helpers:
├── _fetch_listing_images_batch()  - Visits detail pages for full image gallery
├── _ai_analyze()                  - AI lot filtering (Groq/Ollama)
├── adapt_site_now()               - AI CSS selector generation (N17)
├── _convert_price()               - Currency conversion to USD
├── _discover_search_input()       - ARIA/placeholder heuristic search box finder
├── _extract_json()                - 4-strategy JSON extractor for AI responses
└── _auto_dismiss_popups()         - 17 consent-framework selectors + text fallback
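The briefing only says _extract_json() uses four strategies to pull JSON out of AI responses; it does not list them. As an illustration, here is a hypothetical sketch (extract_json_sketch is an invented name and the strategy order is an assumption, not the real worker.py code) of how such a fallback chain is commonly built with the standard json module:

```python
import json
import re
from typing import Any, Optional

# Matches a fenced block (three backticks, optional "json" tag) in an AI reply.
_FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json_sketch(text: str) -> Optional[Any]:
    """Hypothetical 4-step JSON extractor; not the actual _extract_json()."""
    # Strategy 1: the whole reply is already valid JSON.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Strategy 2: JSON wrapped in a fenced code block.
    for block in _FENCE_RE.findall(text):
        try:
            return json.loads(block.strip())
        except json.JSONDecodeError:
            continue
    # Strategy 3: decode the first object/array starting at any "{" or "[".
    decoder = json.JSONDecoder()
    for match in re.finditer(r"[{\[]", text):
        try:
            obj, _end = decoder.raw_decode(text, match.start())
            return obj
        except json.JSONDecodeError:
            continue
    # Strategy 4: nothing usable; the caller treats this as "no verdict".
    return None
```

Ordering from strictest to loosest keeps a well-formed reply from being mangled by the more permissive scans.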

Navigation Modes

Mode | Trigger | Behaviour
A (Direct) | {keyword} in url_template | Substitute keyword, navigate directly
B (Homepage) | No {keyword} in url_template | Navigate to homepage, auto-discover search box via ARIA/placeholder/label/ID heuristics
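A minimal sketch of the mode decision above, assuming the keyword is percent-encoded before substitution (worker.py may encode it differently). plan_navigation is a hypothetical name; in Mode B the URL is opened unchanged and the search box is found afterwards, which is the job of _discover_search_input().

```python
from urllib.parse import quote


def plan_navigation(url_template: str, keyword: str) -> tuple[str, str]:
    """Hypothetical helper: pick Mode A or B and return (mode, url_to_open)."""
    if "{keyword}" in url_template:
        # Mode A (Direct): substitute the encoded keyword into the template.
        # quote(..., safe="") encodes spaces as %20, valid in paths and query strings.
        return "A", url_template.replace("{keyword}", quote(keyword, safe=""))
    # Mode B (Homepage): open the template URL as-is; the search box is
    # located later on the loaded page.
    return "B", url_template
```

For example, plan_navigation("https://hibid.com/", "tab s10") returns ("B", "https://hibid.com/").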

Humanize Levels

Level | Behaviour
raw | Instant fill, no mouse, no scroll: fastest, zero protection
low | Single mouse move + 1-2 scroll steps
medium | Bezier mouse curve + scroll pauses + char-by-char typing
heavy | Full: Bezier + micro-tremors + 4-7 scroll steps + 12% typo rate + homepage pre-visit + all stealth patches

Stealth System

30+ navigator/window patches: webdriver hidden, 5 real plugin objects, languages, platform, vendor, hardwareConcurrency, deviceMemory, WebGL renderer (no SwiftShader), canvas fingerprint noise, audio fingerprint noise, battery API (realistic), network info (4G), media devices, iframe contentWindow patching, chrome.runtime, chrome.loadTimes(), Permissions API, screen dimensions, performance timing.


Database Schema

Listing:      id, title, price (Float), currency, price_raw, time_left, time_left_mins (Float),
              price_updated_at, link (unique), score, keyword, site_name, timestamp,
              ai_match (Integer: 1=match, 0=rejected, NULL=not analysed), ai_reason (String 200),
              location (String 200), price_usd (Float), closing_alerts_sent (Text, JSON list),
              images (Text, JSON array of image URLs, 0–10; NULL if none found)
Keyword:      id, term (unique), weight (score multiplier), ai_target (Text, natural-language filter),
              min_price (Float), max_price (Float), sort_order (Integer)
Config:       id, key (unique), value
TargetSite:   id, name, url_template, search_selector, enabled (int 0/1), max_pages,
              last_error, error_count, consecutive_failures, last_success_at, cooldown_until,
              requires_login, login_url, login_check_selector, login_enabled, sort_order (Integer)
SiteSelectors: id, site_id (unique FK), container_sel, title_sel, price_sel, time_sel, link_sel,
              next_page_sel, confidence (Float 0-100), container_count, title_rate, price_rate,
              provider (groq|ollama), generated_at, last_tested_at, stale (Bool), notes (Text)
ScoringRule:  id, signal (String 100), delta (Integer: positive=boost, negative=penalty),
              category (String 50: "positive"|"negative"|"custom"), notes (Text)
              Seeded from hardcoded lists on first startup. calculate_attribute_score() queries this table.
              Bypassed entirely when scoring_enabled=false (AI-first mode).

All Config Keys

Key | Description
telegram_token | Bot token from @BotFather
telegram_chat_id | Telegram chat/group ID
timer | Seconds between scrape cycles (default 120)
browser_choice | auto / edge / yandex / chrome / brave / chromium
incognito_mode | true / false
show_browser | true / false
delay_launch | Seconds to wait after the browser opens
delay_site_open | Seconds to wait after the homepage loads
delay_post_search | Seconds to wait after the results page loads
delay_page_hold | Seconds to hold the results page
humanize_level | raw / low / medium / heavy
captcha_solver | none / 2captcha / capsolver
captcha_api_key | API key for the chosen CAPTCHA service
alert_channels | Comma-separated: telegram,discord,email
discord_webhook | Full Discord webhook URL
gmail_address | Gmail sender address (replaces old SMTP config)
gmail_app_password | 16-char Gmail App Password
email_to | Destination email address for alerts
closing_alert_enabled | true / false
closing_alert_schedule | Comma-separated minutes before closing (e.g. 60,30,10,5). 0 = no countdown alerts
display_currency | ISO 4217 code to convert prices into for display (blank = raw). Rates via frankfurter.app
proxy_enabled | true / false: enable round-robin proxy rotation
proxy_list | Newline-separated proxy URLs (http://..., socks5://...)
db_url | Empty = SQLite, or postgresql://...
site_auto_disable_after | Failures before a 30 min cooldown (0 = never)
ai_filter_enabled | true / false
ai_provider | groq / ollama / none
ai_model | Model name: llama-3.3-70b-versatile (Groq) or llama3.2:3b (Ollama)
ai_api_key | Groq API key (free at console.groq.com)
ai_base_url | Ollama base URL (default http://localhost:11434)
ai_debug | true / false: log AI prompts/responses to an in-memory buffer (viewable in the AI Log tab)
auto_adapt_enabled | true / false: enables N17 AI selector generation on new site creation
scoring_enabled | true / false: when false, the score gate is skipped and AI becomes the sole judge (AI-first mode)
scrape_window_enabled | true / false: restrict scraping to a time-of-day window
scrape_start_hour | 0–23 local hour, window start (default 8). Overnight windows supported (e.g. 22→06)
scrape_end_hour | 0–23 local hour, window end (default 22). Engine sleeps outside this range
boost_interval_mins | Minutes between cycles when a lot closes within 30 min (default 2). Replaces timer in boost mode

All 42 API Endpoints

Method | Path | Purpose
GET | / | Serves React dashboard (or dashboard.html fallback)
GET | /legacy | Always serves legacy dashboard.html
GET | /api/stats | Engine status, scanned count, alert count, uptime
GET | /api/listings?limit=N | All listings, newest first
DELETE | /api/listings/{id} | Delete one listing
DELETE | /api/listings | Clear all listings
GET | /api/listings/countdown-sync | Lightweight: id + time_left_mins only
GET | /api/listings/refresh-status | Last price-refresh timestamp
GET | /api/keywords | All keywords
POST | /api/keywords | Add keyword {term, weight}
DELETE | /api/keywords/{id} | Delete keyword
PUT | /api/keywords/{id} | Update keyword term, weight, ai_target, min_price, max_price, sort_order
POST | /api/keywords/reorder | Bulk-update sort_order from {order: [id, id, ...]}
GET | /api/sites | All target sites with health data
POST | /api/sites | Add site
PUT | /api/sites/{id} | Update site field
DELETE | /api/sites/{id} | Delete site
POST | /api/sites/{id}/login | Open visible browser for manual session save
POST | /api/sites/{id}/adapt | Trigger AI selector generation for site (N17)
GET | /api/sites/{id}/selectors | Get stored AI selectors for site (N17)
DELETE | /api/sites/{id}/selectors | Delete stored AI selectors, forcing a re-adapt (N17)
POST | /api/sites/reorder | Bulk-update site sort_order from {order: [id, id, ...]}
GET | /api/config | All config key-value pairs; returns a flat {key: value} dict
POST | /api/config | Save config (upsert); expects a flat {key: value} dict
POST | /api/engine/pause | Pause engine
POST | /api/engine/resume | Resume engine
POST | /api/engine/restart | Kill + relaunch scraper thread
POST | /api/engine/kill | Hard shutdown via os._exit(0)
POST | /api/telegram/test | Send test message
POST | /api/ai/test | Test AI verdict for a title + ai_target
GET | /api/ai/debug/log | Poll in-memory AI debug log (supports limit + since_id)
DELETE | /api/ai/debug/log | Clear in-memory AI debug log
GET | /api/export/csv | Download all listings as CSV
GET | /api/export/json | Download all listings as JSON
GET | /api/export/html | Download cyberpunk HTML report
GET | /api/debug/db | Raw DB dump for diagnostics
GET | /api/backup/download | Download timestamped .db backup
POST | /api/backup/restore | Upload .db file to restore
GET | /api/scoring-rules | All scoring rules (N6)
POST | /api/scoring-rules | Add rule {signal, delta, notes} (N6)
PUT | /api/scoring-rules/{id} | Update rule signal/delta/notes (N6)
DELETE | /api/scoring-rules/{id} | Delete rule (N6)

Scoring System (N6: DB-backed, optional)

Signals are stored in the scoring_rules table (editable via UI). Seeded from hardcoded defaults on first startup.

score = sum(matched rule.delta values) × keyword.weight

# Filtering logic (in priority order):
# 1. AI target set + ai_filter_enabled=true  → AI is sole judge; score ignored
# 2. scoring_enabled=true (default)           → score < 0 rejects the lot
# 3. scoring_enabled=false (AI-first mode)    → all lots pass; AI must do all filtering

AI-first mode (recommended): set scoring_enabled=false in Settings → Scoring Rules. Every keyword target then needs an AI Description: the AI reads the lot title and accepts or rejects it based on your natural-language description (e.g. "actual Samsung Tab S10 device, not covers or accessories").
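The formula and priority order above can be sketched as follows. This assumes a rule matches when its signal appears as a case-insensitive substring of the lot title, which the briefing does not specify; Rule, attribute_score and gate are hypothetical in-memory stand-ins for ScoringRule and calculate_attribute_score(), and no database is involved.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical in-memory stand-in for a ScoringRule row."""
    signal: str
    delta: int  # positive = boost, negative = penalty


def attribute_score(title: str, rules: list[Rule], weight: float) -> float:
    """score = sum(matched rule.delta values) * keyword.weight"""
    lowered = title.lower()
    return sum(r.delta for r in rules if r.signal.lower() in lowered) * weight


def gate(score: float, *, ai_target: str, ai_filter_enabled: bool,
         scoring_enabled: bool) -> str:
    """Return "ai", "accept" or "reject" following the priority order above."""
    if ai_target and ai_filter_enabled:
        return "ai"  # 1. AI is sole judge; score ignored
    if scoring_enabled:
        return "reject" if score < 0 else "accept"  # 2. score gate
    return "accept"  # 3. AI-first mode: everything passes the score stage
```

For a title like "Tab S10 case (broken clasp)" with rules broken = -10 and case = -3, the score is -13 × weight, so the lot is rejected unless an AI target takes over.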


Browser Resolution Priority (auto mode)

  1. Edge (C:/Program Files (x86)/Microsoft/Edge/Application/msedge.exe)
  2. Yandex (C:/Users/*/AppData/Local/Yandex/...)
  3. Chrome (C:/Program Files/Google/Chrome/Application/chrome.exe)
  4. Brave
  5. Playwright bundled Chromium
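The priority list amounts to "first installed browser wins". A sketch under stated assumptions: resolve_browser is a hypothetical name, and only the Edge and Chrome paths quoted above are included (the Yandex path is elided and no Brave path is given, so those entries are left for the real list). Using glob.glob means wildcard entries such as the Yandex C:/Users/* pattern would also work.

```python
import glob
from typing import Callable, Optional

# Ordered (name, path-or-glob) pairs, highest priority first. Only the two
# fully specified paths from the list above; Yandex and Brave are omitted.
CANDIDATES: list[tuple[str, str]] = [
    ("edge", "C:/Program Files (x86)/Microsoft/Edge/Application/msedge.exe"),
    ("chrome", "C:/Program Files/Google/Chrome/Application/chrome.exe"),
]


def resolve_browser(candidates: list[tuple[str, str]],
                    matcher: Callable[[str], list[str]] = glob.glob
                    ) -> tuple[str, Optional[str]]:
    """Return (browser_name, executable) for the first candidate found on disk."""
    for name, pattern in candidates:
        hits = sorted(matcher(pattern))
        if hits:
            return name, hits[0]
    # Nothing found: fall back to Playwright's bundled Chromium (no path needed).
    return "chromium", None
```

Injecting matcher keeps the lookup testable on machines (or Linux hosts) where none of the Windows paths exist.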

Known Bugs & Rules Never to Break

Issue | Rule
SQLite WAL locking | Always call db.flush() before db.commit()
enabled field | Store as Python int (1/0), not bool; a SQLite filter ==1 breaks on True
Background browser throttling | Three-layer prevention: launch flags + JS init script + no bring_to_front()
JS f-string apostrophes | All JS strings in _humanizeDescs etc. must use backtick literals
HiBid extraction | Two-stage: Stage 1 inside cards, Stage 2 walks DOM ancestors for price siblings
_discover_search_input() | Returns a Locator; must call .element_handle() before page.evaluate()
networkidle timeout | Always have a domcontentloaded fallback
Closing alert stale lots | Only fires on lots captured within the last 7 days
closing_alerts_sent | JSON list (e.g. [60,30]); never overwrite, always json.loads + append
N7 price filter | Compared in USD via _convert_price(): both min/max and the listing price are converted to USD before comparison
display_currency | Frontend fetches rates from frankfurter.app (USD base); price_usd stored in DB is always USD
closing_alert_sent | REMOVED from ORM; use the closing_alerts_sent JSON list instead
Dead SMTP keys | Removed from SEED_CONFIG: email_smtp_host/port/user/pass, email_from, closing_alert_mins
listing_detail_enabled | true / false: clicking a listing title opens the detail panel
N11 dedup only on eBay | Cross-site dedup (difflib > 0.85) only fires when site.name or url_template contains "ebay"
calculate_attribute_score() opens DB | Opens its own SessionLocal() per call; don't call it inside an already-open session loop
N8 overnight windows | scrape_start_hour > scrape_end_hour triggers overnight logic (e.g. 22→06)
StaticFiles root mount | Never use app.mount("/", StaticFiles(html=True)); it shadows all explicit routes. Use a /_next mount + /{full_path:path} catch-all instead
/api/config format | GET returns a flat {key: value} dict. POST expects the same flat dict. Never treat it as an array
threading.Event, not asyncio.Event | asyncio.Event is not thread-safe for cross-thread signalling

Debugging Reference

Error | Fix
Telegram 400 chat not found | Open Telegram and press START on the bot. Groups need a minus-prefixed chat_id
Telegram 401 Unauthorized | Token format wrong: must be number:letters
No listings extracted | Set show_browser=true and check the site is not in cooldown via /api/debug/db
SyntaxError in worker | Almost always an apostrophe in an f-string; use backtick JS template literals
Search box not found (Mode B) | Add a CSS selector to the search_selector field as an override
SQLite database locked | Already WAL-mitigated; if it persists, switch to PostgreSQL
Price shows null | _extract_price_and_currency() couldn't parse it; check price_text in logs
Closing alert not firing | Check: enabled=true, lot has time_left_mins, timestamp within 7 days; check the closing_alerts_sent JSON list
Login check always false | login_check_selector not matching; use show_browser=true to inspect
ENGINE OFFLINE in frontend | useSSE.ts must poll /api/stats every 5s, not use EventSource (disabled in static build)
Settings stuck "Loading" | fetchConfig() must treat the response as a flat dict, not call .map() on it. Check that the catch block calls setConfig({})
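The first two rows can be caught before any request is sent. A minimal sketch, assuming the number:letters token format described above (real Bot API tokens also use digits, "_" and "-" after the colon) and numeric chat IDs only; check_telegram_config is a hypothetical helper, not part of worker.py:

```python
import re

TOKEN_RE = re.compile(r"\d+:[A-Za-z0-9_-]+")
CHAT_ID_RE = re.compile(r"-?\d+")


def check_telegram_config(token: str, chat_id: str) -> list[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems: list[str] = []
    if not TOKEN_RE.fullmatch(token.strip()):
        problems.append("telegram_token should look like <digits>:<letters/digits/_/->")
    if not CHAT_ID_RE.fullmatch(chat_id.strip()):
        problems.append("telegram_chat_id should be an integer (groups use a minus prefix)")
    return problems
```

A format check cannot tell whether a positive ID was meant to be a group or whether START was pressed on the bot; those still have to be checked in Telegram.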

Tested Target Sites

Site | Mode | URL Template | Selector
eBay UK | Direct | https://www.ebay.co.uk/sch/i.html?_nkw={keyword}&_sop=10 | #gh-ac
eBay US | Direct | https://www.ebay.com/sch/i.html?_nkw={keyword}&_sop=10 | #gh-ac
ShopGoodwill | Homepage | https://shopgoodwill.com/home | input#st
HiBid | Homepage | https://hibid.com/ | auto-discover

International Expansion: 12 Sites Ready to Add

Site | Mode | URL Template
Invaluable | Direct | https://www.invaluable.com/buy/?keywords={keyword}
BidSpotter | Direct | https://www.bidspotter.com/en-us/search?q={keyword}
Proxibid | Direct | https://www.proxibid.com/asp/Search.asp?searchStr={keyword}
i-bidder | Direct | https://www.i-bidder.com/en-gb/search/{keyword}
Catawiki | Direct | https://www.catawiki.com/en/search?q={keyword}
LiveAuctioneers | Direct | https://www.liveauctioneers.com/search/?keyword={keyword}
Lot-tissimo | Direct | https://www.lot-tissimo.com/en/search/{keyword}
Auctionet | Direct | https://auctionet.com/en/search?q={keyword}
Bonhams | Direct | https://www.bonhams.com/search/?q={keyword}
Copart | Homepage | https://www.copart.com (requires login)
IAAI | Homepage | https://www.iaai.com (requires login)
Drouot | Homepage | https://www.drouot.com (France)

20 Supported Currency Codes

USD, GBP, EUR, CAD, AUD, JPY, CHF, SEK, NOK, DKK, NZD, HKD, SGD, MXN, BRL, INR, KRW, CNY, ZAR, AED


Finalized Tech Stack

Layer | USE | AVOID
Backend/API | Python (FastAPI) | Java, PHP, Ruby, Go
Scraper | Python (Playwright) | Rust, Puppeteer/Node
AI Inference | Groq + Ollama (now) → vLLM (production) | PyTorch/TensorFlow (until custom model training)
Frontend | TypeScript + React + Next.js + Tailwind | jQuery, vanilla JS, Vue, Angular
Database | PostgreSQL (primary) | MongoDB, MySQL
Cache/Queue | Redis | In-memory Python dicts
Deployment | Docker → Docker Compose → Kubernetes | Manual .bat scripts
Auth | NextAuth.js or Clerk (when multi-user) | Custom auth from scratch

Why These Decisions (Do Not Reverse Without Strong Reason)

  • PostgreSQL NOT MongoDB: auction data is relational (listings→sites, keywords→filters, prices, scores). MongoDB loses JOINs, foreign keys, and our SQLAlchemy ORM. Decided Session 15.
  • Redis chosen: replaces in-memory Python dicts. It survives restarts, enables pub/sub for the live dashboard, and provides a proper job queue for scrape tasks. Without it, all in-memory state is lost on restart.
  • No Rust, no Go: the bottleneck is network I/O (waiting for auction sites), not CPU. Rust adds 10× complexity for zero gain. Only reconsider Go for a dedicated WebSocket server at 10,000+ concurrent users.
  • No PyTorch/TensorFlow yet: we do inference via APIs only. Add PyTorch when building custom auction models (image condition classifier, price predictor). Not now.
  • vLLM for production: replaces Groq and its rate limits with self-hosted inference on rented GPUs, at roughly $0.20–0.50 per million tokens. Groq is fine for personal use.
  • Ghost Node's unique value: the auction intelligence layer. It watches every auction site, scores deals, predicts closing behaviour, and alerts users when and where to bid. No SaaS competitor offers this. Vision: evolve from personal tool → multi-user platform → public API layer.

Session Rules

  1. Always read CLAUDE.md + PROGRESS.md at the start of every session. Scan MEMORY.md for gotchas.
  2. Update MD files dynamically: only update a file when this session had content that belongs in that file. Each file gets only what relates to it. No blanket updates across all files. See the table below.
  3. Before ending a session, update only the MDs that apply: Abbas.md if something general about Abbas was learned (any project); FEEDBACK.md if something project-specific about Abbas was learned; ERROR.md if an error was fixed; others per the table.
  4. Never regenerate source files from scratch; always edit the uploaded versions
  5. Always ask before making architectural changes that affect multiple files
  6. Test logic before writing β€” read the relevant section of worker.py before editing
  7. The uploaded worker.py is always the definitive version β€” it may be ahead of any exports

⚠️ MD Update Rule: DYNAMIC, NOT STATIC

Only update an MD file when this session had something that belongs in that file. Do NOT update every file after every task. Do NOT update a file just because it exists. Each file gets ONLY what is related to it. If nothing for that file this session β€” skip it.

File | Update when… | Skip when…
CLAUDE.md | API endpoints, DB schema, config keys, architecture, or a known bug/rule changed | Change had no schema/API/architecture impact
PROGRESS.md | Any code file was changed (add a session entry) | No code or project files edited
MEMORY.md | New reusable gotcha, dev path, or architecture decision | Fix/lesson already covered or not reusable
Abbas.md | The AI learned something general about Abbas (applies anywhere) | Nothing general about Abbas this session
FEEDBACK.md | Learned something about Abbas in this project's context (environment, decisions here) | Nothing project-specific about Abbas
ERROR.md | We hit an error and fixed it, worth logging for next time | No errors resolved this session
ARCHIVE.md | A PROGRESS.md session entry is old enough to archive (move it here) | Everything else

Examples of correct dynamic behaviour:

  • CSS colour tweak → update PROGRESS.md only
  • New /api/scoring-rules endpoint added → update PROGRESS.md + CLAUDE.md (endpoint list + schema)
  • Discovered FastAPI StaticFiles routing gotcha → update PROGRESS.md + MEMORY.md
  • Pure Q&A session, no files edited → update nothing