# AIEA Local Chrome Screenshot + OpenAI Assistant
Local-only (unpacked) Chrome extension + Python bridge that:
1. Captures the visible tab screenshot.
2. Extracts visible page text hierarchy (reduced HTML tree).
3. Saves files to `./screenshots`.
4. Calls OpenAI with:
   - screenshot image
   - extracted page content
   - system instructions from `AI_EA_INSTRUCTIONS.MD`
   - optional extra instructions from the extension UI
5. Returns short + medium response suggestions per visible post back to the extension popup.

No Chrome Web Store publishing is required.
## Project Structure
- `chrome_screenshot_ext/`: unpacked MV3 extension (popup UI + capture + extraction)
- `tools/local_screenshot_bridge.py`: local HTTP server (`127.0.0.1`) + OpenAI call
- `AI_EA_INSTRUCTIONS.MD`: system instructions fed to the model
- `screenshots/`: generated outputs (`.png`, `.json`, `.content.json`, `.ai.json`)
## Prerequisites
- macOS/Linux shell
- Python 3.9+
- Google Chrome
- OpenAI API key
## Setup (venv)
From repo root:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r requirements.txt
```
Set the API key in either of two ways:

1. Environment variable:
   ```bash
   export OPENAI_API_KEY="your_api_key_here"
   ```
2. `.env` file in the project root:
   ```bash
   OPENAI_API_KEY=your_api_key_here
   ```
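
The two options above can be sketched in Python as a single lookup. This is a minimal illustration only, not the bridge's actual code: the function name `resolve_api_key`, the precedence (shell variable first, then `.env`), and the simple `KEY=value` parsing are all assumptions.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_api_key(env_file: Path = Path(".env")) -> Optional[str]:
    """Return OPENAI_API_KEY from the environment, else from a .env file."""
    # Assumption: a value exported in the shell takes precedence over .env.
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    if not env_file.is_file():
        return None
    for raw in env_file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "OPENAI_API_KEY":
            # Strip optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None
```

A `None` result here corresponds to the situation the bridge reports as `missing_openai_api_key`.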
## Run the Local Server
With OpenAI enabled:
```bash
python3 tools/local_screenshot_bridge.py --port 8765 --out-dir screenshots --ai
```
Without OpenAI (save files only):
```bash
python3 tools/local_screenshot_bridge.py --port 8765 --out-dir screenshots
```
Verbose logging (includes AI request/response IDs and parse details). The second command also raises the output token limit from the default `1400` to `2500`:
```bash
python3 tools/local_screenshot_bridge.py --port 8765 --out-dir screenshots --ai --log-level debug
python3 tools/local_screenshot_bridge.py --port 8765 --out-dir screenshots --ai --log-level debug --ai-max-output-tokens 2500
```
Health check:
```bash
curl http://127.0.0.1:8765/health
```
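
The same check can be scripted, for example before starting a batch of captures. The sketch below assumes only that `/health` answers with HTTP 200 when the server is up; the response body is not inspected, and the helper name `bridge_is_up` is hypothetical.

```python
import urllib.error
import urllib.request


def bridge_is_up(base_url: str = "http://127.0.0.1:8765", timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    # Bypass any configured system proxy: the bridge listens on localhost only.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP status.
        return False


if __name__ == "__main__":
    print("bridge reachable" if bridge_is_up() else "bridge not reachable")
```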
## Load the Chrome Extension (Unpacked)
1. Open `chrome://extensions`
2. Enable Developer mode
3. Click `Load unpacked`
4. Select `chrome_screenshot_ext/`
If you change the extension code, click `Reload` for this extension on the `chrome://extensions` page.
## Use the Extension
1. Open any normal webpage (not a `chrome://` page).
2. Click the extension icon.
3. Keep the endpoint set to `http://127.0.0.1:8765/screenshot`.
4. Optional: add `Extra instructions` (2-line input in the popup).
5. Click `Capture`.
Popup shows:
- Saved file paths
- AI suggestions per detected post (6 total):
  - improved short + medium
  - critical short + medium
  - suggested short + medium
- Any AI error details
History in popup:
- `History` dropdown stores recent generations locally (`chrome.storage.local`)
- `Show` displays a previous result
- `Clear` removes local popup history
## Output Files
Generated in `screenshots/`:
- `YYYYMMDDTHHMMSSZ-<title-slug>.png`: screenshot
- `YYYYMMDDTHHMMSSZ-<title-slug>.json`: metadata
- `YYYYMMDDTHHMMSSZ-<title-slug>.content.json`: extracted visible content tree
- `YYYYMMDDTHHMMSSZ-<title-slug>.ai.json`: structured AI suggestions
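
To make the naming pattern concrete, here is a small sketch that builds the four file names for one capture. The timestamp format follows the pattern above; the slug rule (lowercase, non-alphanumeric runs collapsed to `-`, capped at 60 characters, `untitled` fallback) is an assumption, and the bridge may slugify titles differently.

```python
import re
from datetime import datetime, timezone

SUFFIXES = (".png", ".json", ".content.json", ".ai.json")


def slugify(title: str, max_len: int = 60) -> str:
    """Assumed rule: lowercase, collapse non-alphanumeric runs to '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_len].rstrip("-") or "untitled"


def output_names(title: str, when: datetime) -> list[str]:
    """Return the four file names for one capture; `when` should be timezone-aware."""
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    base = f"{stamp}-{slugify(title)}"
    return [base + suffix for suffix in SUFFIXES]


if __name__ == "__main__":
    for name in output_names("Example Page", datetime.now(timezone.utc)):
        print(name)
```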
## AI Behavior Notes
- System instructions source: `AI_EA_INSTRUCTIONS.MD`
- Extra per-capture instructions source: popup `Extra instructions`
- AI output is sanitized post-generation:
  - quote/dash/space cleanup
  - invisible unicode cleanup
  - collapsed extra spaces/newlines
  - forced lowercase (no capital letters)
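
The sanitization steps listed above could look roughly like the following. This is an illustrative sketch, not the bridge's implementation: the replacement table, the use of Unicode category `Cf` to detect invisible characters, and the function name `sanitize` are assumptions.

```python
import re
import unicodedata

# Assumed mapping; the bridge's real table may differ.
_REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2013": "-", "\u2014": "-",  # en and em dashes
    "\u00a0": " ",                 # non-breaking space
}
_TABLE = str.maketrans(_REPLACEMENTS)


def sanitize(text: str) -> str:
    """Normalize quotes/dashes/spaces, drop invisible characters, lowercase."""
    text = text.translate(_TABLE)
    # Drop invisible format characters (category Cf), e.g. zero-width space, BOM.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)    # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)  # allow at most one blank line
    return text.strip().lower()
```

For example, `sanitize("\u201cHello\u201d \u2014  World")` returns `'"hello" - world'`.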
## Useful CLI Options
```bash
python3 tools/local_screenshot_bridge.py --help
```
Common options:
- `--ai-model` (default: `gpt-5.2`)
- `--ai-max-posts` (default: `12`)
- `--ai-content-max-chars` (default: `120000`)
- `--ai-image-detail` (default: `auto`)
- `--ai-max-output-tokens` (default: `1400`)
- `--run ...` optional post-save hook command
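
For readers who wrap the bridge in their own tooling, the documented defaults can be mirrored in a small `argparse` parser. This is a hypothetical sketch and not the parser in `tools/local_screenshot_bridge.py`: the option names and default values come from the list above, while the argument types and the `store_true` handling of `--ai` are assumptions. Options without a documented default (`--port`, `--out-dir`, `--log-level`, `--run`) are left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Mirror the documented AI options and their defaults (types are assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of the bridge's AI options")
    parser.add_argument("--ai", action="store_true", help="enable the OpenAI call")
    parser.add_argument("--ai-model", default="gpt-5.2")
    parser.add_argument("--ai-max-posts", type=int, default=12)
    parser.add_argument("--ai-content-max-chars", type=int, default=120000)
    parser.add_argument("--ai-image-detail", default="auto")
    parser.add_argument("--ai-max-output-tokens", type=int, default=1400)
    return parser


if __name__ == "__main__":
    print(vars(build_parser().parse_args()))
```

The authoritative list of flags and defaults remains the output of `--help`.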
## Troubleshooting
- `missing_openai_api_key`:
  set `OPENAI_API_KEY` in the shell or in `.env`, then restart the server.
- `missing_openai_sdk`:
  run `pip install -r requirements.txt` inside your venv.
- Capture fails on Chrome internal pages:
  `chrome://*`, Web Store, and some protected tabs cannot be scripted.
- No AI results in popup:
  verify that the server was started with `--ai`.