Initial import
commit f71cbe8d38

APP_CODE_REFERENCE.md
Normal file, 151 lines added
@ -0,0 +1,151 @@
# App Code Reference for Flatlogic

## How the App Loads Recipes

The app uses the `DataManager` class to load recipe data. Key points:

### 1. Recipe Sources (Priority Order)

```javascript
getRecipeSources() {
  // Both branches currently return the same list, in priority order.
  if (this.isNativePlatform()) {
    return [
      'data/recipes_enhanced.json.gz', // Preferred
      'data/recipes.json',             // Fallback
    ];
  }
  return [
    'data/recipes_enhanced.json.gz',
    'data/recipes.json',
  ];
}
```

The app tries each source in order and uses the first one that loads successfully.

### 2. File Loading Methods

**For regular JSON:**
```javascript
async fetchJson(path) {
  const response = await fetch(path);
  return response.json();
}
```

**For gzipped JSON:**
```javascript
async fetchGzipJson(path) {
  // On native platforms, uses Capacitor Filesystem API
  // Then decompresses with DecompressionStream
  // Falls back to fetch on web
}
```

### 3. File Size Constraints

Approximate mobile WebView limitations:
- **Max file size to load via fetch():** ~10-15MB
- **Memory for decompressed data:** ~50-100MB
- **Recommended recipe count:** 15,000-25,000 recipes

**What happens with large files:**
- 52MB gzip file → 162MB uncompressed → Crash on mobile
- 4MB gzip file → 12MB uncompressed → Works fine
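
The limits above can be checked on a development machine before a file ever reaches a device. The following is a minimal sketch, assuming the dataset is a gzipped JSON array. The function names `measure_gzip_json` and `fits_mobile_limits` are hypothetical, and the default thresholds (8MB compressed, 50MB uncompressed) are assumptions taken from the figures in this package, not values read from the app:

```python
import gzip
import json


def measure_gzip_json(data: bytes) -> dict:
    """Report compressed size, uncompressed size, and item count for gzipped JSON bytes."""
    raw = gzip.decompress(data)
    recipes = json.loads(raw.decode("utf-8"))
    return {
        "compressed_mb": len(data) / 1_000_000,
        "uncompressed_mb": len(raw) / 1_000_000,
        "recipe_count": len(recipes),
    }


def fits_mobile_limits(stats: dict,
                       max_compressed_mb: float = 8.0,
                       max_uncompressed_mb: float = 50.0) -> bool:
    """Compare measured sizes with the (assumed) mobile limits."""
    return (stats["compressed_mb"] <= max_compressed_mb
            and stats["uncompressed_mb"] <= max_uncompressed_mb)


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass open(path, "rb").read() instead.
    payload = gzip.compress(json.dumps([{"name": "Example"}]).encode("utf-8"))
    stats = measure_gzip_json(payload)
    print(stats, fits_mobile_limits(stats))
```

Measuring the uncompressed size as well as the compressed size matters here because the crash described above comes from the decompressed data in memory, not only from the download size.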

### 4. Recipe Matching Logic

Recipes are matched against the user's pantry ingredients:

```javascript
// Ingredient matching in mealPlanner.js
// (excerpt: `r` is the recipe being scored, `pantryNames` the user's pantry items)
const matched = r.ingredients.filter(ing => {
  const ingLower = ing.toLowerCase();
  return pantryNames.some(p => {
    const pantryLower = p.toLowerCase();
    // Assumed definition: `pantryWords` is not defined in this excerpt
    const pantryWords = pantryLower.split(/\s+/);
    // "squash" matches "butternut squash" (good)
    // "egg" doesn't match "eggplant" (prevented by word boundary check)
    // Note: "butter" still matches "peanut butter", where it is a whole word
    if (ingLower.includes(pantryLower)) {
      const wordPattern = new RegExp(`\\b${pantryWords[0]}\\b`, 'i');
      return wordPattern.test(ingLower);
    }
    return false;
  });
}).length;
```
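
For checking recipe data offline, the same substring-plus-word-boundary test can be reproduced in Python. This is a sketch, not the app's code: the function name `pantry_matches` is hypothetical, and it assumes `pantryWords` is the pantry item split on whitespace, which the excerpt does not show.

```python
import re


def pantry_matches(ingredient: str, pantry_item: str) -> bool:
    """Return True if the pantry item matches the ingredient as a whole word."""
    ing = ingredient.lower()
    pantry = pantry_item.lower().strip()
    # Step 1: plain substring check, as in the excerpt's includes() call
    if not pantry or pantry not in ing:
        return False
    # Step 2: the first word of the pantry item must appear on word boundaries
    first_word = pantry.split()[0]
    return re.search(r"\b" + re.escape(first_word) + r"\b", ing) is not None


if __name__ == "__main__":
    for ing, item in [("butternut squash", "squash"),
                      ("eggplant", "egg"),
                      ("peanut butter", "butter")]:
        print(f"{item!r} vs {ing!r}: {pantry_matches(ing, item)}")
```

A word-boundary test rejects partial-word hits such as "egg" inside "eggplant", but it accepts "butter" inside "peanut butter" because "butter" is a complete word there. Keeping those two apart would need an extra rule, for example a list of compound ingredients.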

### 5. Required Dependencies

In `package.json`:
```json
{
  "dependencies": {
    "@capacitor/filesystem": "^1.1.0"
  }
}
```

After adding recipes, run:
```bash
npm install
npm run sync  # Syncs to Android/iOS
npm run android:build
```

### 6. File Location

After `npm run sync`, files are copied to:
- Android: `android/app/src/main/assets/public/data/`
- iOS: `ios/App/App/public/data/`

### 7. Performance Considerations

**Bad (current, broken approach):**
- 188,435 recipes
- 52MB gzipped / 162MB uncompressed
- App crashes on mobile or takes 2+ minutes to load

**Good (target):**
- 15,000-20,000 recipes
- 4-6MB gzipped / 12-18MB uncompressed
- Loads in <5 seconds on mobile

**How to achieve this:**
1. Filter to highest-rated recipes (4+ stars)
2. Remove recipes with >20 ingredients (too complex)
3. Remove recipes with <3 ingredients (incomplete)
4. Remove recipes missing images or incomplete data
5. Deduplicate by name
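
The five steps above can be sketched as a single filtering pass. This is an illustration under stated assumptions, not the method used to build the current dataset: the function name `filter_recipes` and its parameters are hypothetical, "incomplete data" is taken to mean a missing name or missing steps, duplicates are detected by case-insensitive name with the first occurrence kept, and the image check is off by default because the schema marks `image` as optional.

```python
def filter_recipes(recipes, min_rating=4.0, min_ingredients=3,
                   max_ingredients=20, require_image=False):
    """Return the recipes that pass the rating, size, completeness, and duplicate checks."""
    seen_names = set()
    kept = []
    for recipe in recipes:
        name = (recipe.get("name") or "").strip()
        ingredients = recipe.get("ingredients") or []
        steps = recipe.get("steps") or []
        rating = recipe.get("rating")

        if not name or not steps:
            continue  # step 4: incomplete data
        if rating is None or rating < min_rating:
            continue  # step 1: keep only highly rated recipes
        if not (min_ingredients <= len(ingredients) <= max_ingredients):
            continue  # steps 2 and 3: too many or too few ingredients
        if require_image and not recipe.get("image"):
            continue  # step 4: optional image requirement

        key = name.lower()
        if key in seen_names:
            continue  # step 5: duplicate name
        seen_names.add(key)
        kept.append(recipe)
    return kept
```

Sorting the input by rating (highest first) before calling the function would make the duplicate step keep the best-rated copy of each name.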

### 8. Testing on Device

After building:
```bash
npm run android:build
adb install -r android/app/build/outputs/apk/debug/app-debug.apk
```

Check console logs for:
```
[DataManager] Loaded XXXXX recipes from data/recipes_enhanced.json.gz
[MealPlanner] Total recipes: XXXXX
```

### 9. Current Data Files

In the handoff package:
- `recipes_sample.json` - 100 sample recipes (uncompressed)
- `recipes_current_full.json.gz` - Full 188k recipe dataset (52MB, too large)
- `DATA_SUMMARY.txt` - Statistics about current data

### 10. What Flatlogic Should Deliver

1. `recipes_flatlogic.json.gz` - 15,000-20,000 recipes
2. `recipes_sample.json` - First 100 uncompressed (for review)
3. `validation_report.txt` - Results of running validate_recipes.py
4. Total size: Under 8MB compressed

## Questions?

If you need the full app source code, it's available at:
[Git repository URL]

DATA_SUMMARY.txt
Normal file, 13 lines added
@ -0,0 +1,13 @@
Current Dataset Summary
========================

Total validated recipes: 188435
Sample provided: 100 recipes

Coverage Stats:
- With ratings: 188435 (100.0%)
- With times: 188435 (100.0%)
- With servings: 7835 (4.2%)
- With instructions: 188435 (100.0%)

Average ingredients per recipe: 9.4

README.md
Normal file, 125 lines added
@ -0,0 +1,125 @@
# Flatlogic Recipe Data Handoff Package

## What's Included

| File | Description | Size |
|------|-------------|------|
| `RECIPE_SCHEMA.md` | Complete schema documentation with examples | 4.7KB |
| `APP_CODE_REFERENCE.md` | How the app loads and uses recipe data | 3.7KB |
| `validate_recipes.py` | Python script to check data quality | 6.6KB |
| `recipes_sample.json` | 100 sample recipes (uncompressed for review) | 291KB |
| `recipes_current_full.json.gz` | Current 188,435 recipes (for reference) | 50MB |
| `DATA_SUMMARY.txt` | Statistics about current dataset | 301B |

## Quick Start for Flatlogic

### 1. Understand the Requirements

**Target Dataset:**
- **15,000-20,000 recipes** (the current dataset has too many: 188k causes crashes)
- **File size:** Under 8MB compressed (gzip)
- **Quality:** At least 85% of recipes should pass validation

### 2. Use the Validation Script

```bash
# Run this on your generated data before delivery
python3 validate_recipes.py recipes_flatlogic.json.gz
```

Expected final lines of output (a score of 95% or higher prints `✅ EXCELLENT - Ready for delivery` instead):
```
Quality Score: 90.0%
⚠️ GOOD - Minor issues, acceptable for delivery
```

### 3. Key Validation Rules

**CRITICAL - Title-Ingredient Matching:**
- "Chicken Alfredo" MUST have "chicken" in ingredients
- "Butternut Squash Soup" MUST have "butternut squash" in ingredients
- "Peanut Butter Cookies" MUST have "peanut butter" in ingredients

The script checks this automatically.

### 4. Deliverables

Submit to client:
1. `recipes_flatlogic.json.gz` - Your compressed dataset
2. `recipes_sample.json` - First 100 recipes uncompressed
3. `validation_report.txt` - Output of validate_recipes.py
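
The first two deliverables can be produced from one in-memory list. A minimal sketch, assuming the final recipes are already loaded as a Python list of dicts; the function name `write_deliverables` and its parameters are hypothetical, and compact separators are an assumed choice to keep the compressed file small:

```python
import gzip
import json
from pathlib import Path


def write_deliverables(recipes, out_dir, sample_size=100):
    """Write the gzipped full dataset and an uncompressed sample of the first recipes."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    full_path = out / "recipes_flatlogic.json.gz"
    sample_path = out / "recipes_sample.json"

    # Full dataset: gzipped JSON array, no extra whitespace
    with gzip.open(full_path, "wt", encoding="utf-8") as f:
        json.dump(recipes, f, ensure_ascii=False, separators=(",", ":"))

    # Sample: first `sample_size` recipes, indented for human review
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump(recipes[:sample_size], f, ensure_ascii=False, indent=2)

    return full_path, sample_path
```

The third deliverable can then be captured with a shell redirect, for example `python3 validate_recipes.py recipes_flatlogic.json.gz > validation_report.txt`.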

## Current Dataset Problems (For Reference)

The included `recipes_current_full.json.gz` (188,435 recipes) has these issues:
- ❌ Too large - 52MB compressed / 162MB uncompressed
- ❌ Causes app crashes on mobile devices
- ❌ Out-of-memory errors on Android WebView
- ❌ 10,000 recipe limit being applied (old fallback data)

**Why this happened:**
We merged the Food.com Kaggle dataset (231k recipes) with existing data and filtered it down to 188k validated recipes, but the mobile WebView can't handle files larger than about 10-15MB.

## Recipe Format Example

```json
{
  "id": 1,
  "name": "Creamy Garlic Butter Chicken",
  "ingredients": [
    "chicken breasts",
    "butter",
    "garlic cloves",
    "heavy cream",
    "parmesan cheese",
    "spinach",
    "salt",
    "black pepper"
  ],
  "steps": [
    "Season chicken with salt and pepper.",
    "Melt butter in skillet over medium heat.",
    "Cook chicken 6-7 minutes per side until golden.",
    "Add minced garlic, cook 1 minute.",
    "Pour in cream and parmesan, simmer 3 minutes.",
    "Stir in spinach until wilted.",
    "Serve immediately."
  ],
  "minutes": 25,
  "tags": ["dinner", "chicken", "quick", "creamy"],
  "rating": 4.7,
  "servings": 4,
  "difficulty": "easy"
}
```

## Size Guidelines

| Recipe Count | Compressed Size | Mobile Performance |
|-------------|-----------------|-------------------|
| 10,000 | 3-4 MB | ✅ Fast loading |
| 20,000 | 6-8 MB | ✅ Good loading |
| 50,000 | 15-20 MB | ⚠️ Slow loading |
| 188,000 | 52 MB | ❌ Crashes app |

## Recommended Data Sources

1. **Food.com / Genius Kitchen** (preferred)
2. **AllRecipes**
3. **BBC Good Food**
4. **NYT Cooking**

Focus on highly-rated recipes (4+ stars) with complete instructions.

## Questions?

Contact the development team with:
- Validation script output
- Sample recipes for review
- Any schema clarifications needed

## Timeline

Target delivery: [Set with client]
Quality threshold: 85%+ valid recipes
File size limit: 8MB compressed

RECIPE_SCHEMA.md
Normal file, 166 lines added
@ -0,0 +1,166 @@
# Recipe Data Schema for Flatlogic

## Required Fields

Each recipe uses the following fields. Fields not marked optional are required. The `//` comments are documentation only; they are not valid JSON and must not appear in the delivered file.

```json
{
  "id": 1,                          // Integer, unique identifier
  "name": "Recipe Title",           // String, max 100 chars
  "ingredients": [                  // Array of strings, required
    "ingredient 1",
    "ingredient 2"
  ],
  "steps": [                        // Array of strings, required
    "Step 1 instruction",
    "Step 2 instruction"
  ],
  "minutes": 30,                    // Integer, prep time in minutes
  "tags": ["dinner", "chicken"],    // Array of strings
  "rating": 4.5,                    // Number 0-5, optional but preferred
  "servings": 4,                    // Integer, optional
  "difficulty": "easy",             // String: easy/medium/hard, optional
  "image": "https://...",           // String URL, optional
  "nutrition": {                    // Object, optional
    "calories": 350,
    "protein": 25,
    "carbs": 40,
    "fat": 12
  },
  "dietary": {                      // Object, all boolean, optional
    "vegan": false,
    "vegetarian": false,
    "glutenFree": false,
    "dairyFree": false,
    "keto": false
  }
}
```

## Data Quality Requirements

### 1. Title-Ingredient Matching (CRITICAL)
- If a recipe title contains a food item, that item MUST be in the ingredients list
- Examples:
  - ✅ "Butternut Squash Soup" must have "butternut squash" in ingredients
  - ❌ "Chicken Alfredo" missing "chicken" or "alfredo" is invalid
  - ❌ "Peanut Butter Cookies" missing "peanut butter" is invalid

### 2. Ingredient Completeness
- Each ingredient should be a specific food item
- Avoid generic terms like "spices" or "seasoning to taste"
- Include common pantry items (salt, pepper, oil) if used

### 3. Instructions Quality
- Each step must be a complete sentence
- Steps should be in logical cooking order
- Include temperatures, times, and visual cues ("golden brown")

### 4. Categorization
Include relevant tags from these categories:
- **Meal type**: breakfast, lunch, dinner, snack, dessert
- **Protein**: chicken, beef, pork, fish, seafood, tofu, eggs, beans
- **Cuisine**: italian, mexican, asian, indian, mediterranean, american
- **Method**: baked, grilled, fried, slow-cooked, no-cook
- **Dietary**: vegan, vegetarian, gluten-free, dairy-free, keto, low-carb

## Size Constraints

### For Mobile App Performance:
- **Maximum recipes**: 25,000 (to prevent memory issues)
- **Maximum file size**: 8MB compressed (gzip)
- **Target**: 15,000-20,000 high-quality recipes

### Recipe Distribution:
- 30% Quick meals (under 30 min)
- 25% Standard dinners (30-60 min)
- 20% Breakfast/brunch
- 15% Healthy/light options
- 10% Desserts/treats

## Data Sources to Use

Preferred sources (in order):
1. **Food.com / Genius Kitchen** - Well-structured, community rated
2. **AllRecipes** - Popular, tested recipes
3. **BBC Good Food** - Reliable, well-written
4. **NYT Cooking** - High quality
5. **Budget Bytes** - Simple, affordable

Avoid:
- Aggregator sites with stolen content
- Recipes without ratings/reviews
- AI-generated recipes without human testing

## File Format

Deliver as:
1. `recipes_flatlogic.json.gz` - Gzipped JSON array
2. `recipes_sample.json` - First 100 recipes uncompressed (for review)
3. `validation_report.txt` - Summary of data quality checks

## Validation Script

Use the provided `validate_recipes.py` script to check:
- Title-ingredient matching
- Required fields present (name, ingredients, steps, minutes)
- At least 2 ingredients and 2 steps per recipe
- Valid time ranges (1 to 1440 minutes)

The script as provided does not check for duplicates or categorization, so review those separately.

Run: `python3 validate_recipes.py recipes_flatlogic.json.gz`
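
A duplicate check can also be run on its own. The sketch below is an assumption about what "duplicate" means here (same name after lowercasing and collapsing whitespace); the function name `find_duplicate_names` is hypothetical and not part of `validate_recipes.py`:

```python
from collections import Counter


def find_duplicate_names(recipes):
    """Return {normalized name: count} for recipe names that occur more than once."""
    counts = Counter(
        " ".join((recipe.get("name") or "").lower().split())
        for recipe in recipes
    )
    # Ignore recipes without a name; those are reported as missing fields elsewhere
    return {name: n for name, n in counts.items() if name and n > 1}
```

An empty result means every named recipe is unique under this normalization.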

## Examples

### Good Recipe:
```json
{
  "id": 1,
  "name": "Creamy Garlic Butter Chicken",
  "ingredients": [
    "chicken breasts",
    "butter",
    "garlic cloves",
    "heavy cream",
    "parmesan cheese",
    "spinach",
    "salt",
    "black pepper"
  ],
  "steps": [
    "Season chicken with salt and pepper.",
    "Melt butter in skillet over medium heat.",
    "Cook chicken 6-7 minutes per side until golden.",
    "Add minced garlic, cook 1 minute.",
    "Pour in cream and parmesan, simmer 3 minutes.",
    "Stir in spinach until wilted.",
    "Serve immediately."
  ],
  "minutes": 25,
  "tags": ["dinner", "chicken", "quick", "creamy"],
  "rating": 4.7,
  "servings": 4,
  "difficulty": "easy",
  "dietary": {
    "glutenFree": true,
    "keto": true
  }
}
```

### Bad Recipe (Don't Do This):
```json
{
  "name": "Peanut Butter & Jelly Sandwich",  // Missing peanut butter!
  "ingredients": ["bread", "jelly"],          // ❌ Missing peanut butter
  "steps": ["Make sandwich"],                 // ❌ Too vague
  "minutes": 5
}
```

## Questions?

Contact: [Your contact info]
App: Main Recipe & Meal Planning App
Platform: Android/iOS/Web

recipes_current_full.json.gz
Normal file, binary
Binary file not shown.

recipes_sample.json
Normal file, 9671 lines added
File diff suppressed because it is too large

validate_recipes.py
Normal file, 192 lines added
@ -0,0 +1,192 @@
#!/usr/bin/env python3
"""
Recipe Validation Script for Flatlogic
Run this on your recipe dataset before delivery
"""

import gzip
import json
import re
import sys  # imported at module level: used by main() and by the exit code below
from pathlib import Path

# Keywords that should appear in ingredients if in title
FOOD_KEYWORDS = {
    'chicken', 'beef', 'pork', 'turkey', 'lamb', 'fish', 'salmon', 'tuna', 'shrimp',
    'pasta', 'spaghetti', 'rice', 'quinoa', 'potato', 'potatoes', 'sweet potato',
    'tomato', 'tomatoes', 'onion', 'onions', 'garlic', 'carrot', 'carrots',
    'broccoli', 'spinach', 'mushroom', 'mushrooms', 'pepper', 'peppers',
    'zucchini', 'squash', 'butternut squash', 'eggplant', 'cheese', 'cheddar',
    'mozzarella', 'parmesan', 'milk', 'butter', 'cream', 'egg', 'eggs',
    'bread', 'tortilla', 'pita', 'peanut butter', 'jam', 'jelly', 'chocolate',
    'apple', 'apples', 'banana', 'bananas', 'orange', 'lemon', 'lime',
    'strawberry', 'blueberry', 'bacon', 'sausage', 'ham', 'bean', 'beans',
    'chickpea', 'chickpeas', 'lentil', 'lentils', 'corn', 'peas', 'avocado',
    'tofu', 'nuts', 'almond', 'almonds', 'walnut', 'walnuts', 'coconut',
    'pineapple', 'mango', 'oil', 'olive oil', 'vinegar', 'soy sauce',
    'flour', 'sugar', 'honey', 'maple syrup', 'salt'
}

# Recipe types to exclude from matching (the end product, not an ingredient)
EXCLUDE_RECIPE_TYPES = {
    'bread', 'cake', 'pie', 'cookies', 'muffins', 'brownies', 'bars',
    'soup', 'stew', 'chili', 'sauce', 'gravy', 'dip', 'spread',
    'salad', 'slaw', 'casserole', 'lasagna', 'pizza', 'smoothie',
    'shake', 'cocktail', 'drink', 'burger', 'sandwich', 'wrap', 'taco', 'burrito'
}


def extract_food_keywords_from_title(title):
    """Extract food keywords that should be in ingredients"""
    title_lower = title.lower()

    # Clean title: drop possessives, marketing/cooking-method words, and connectors
    title_lower = re.sub(r"'s\s+", ' ', title_lower)
    title_lower = re.sub(r'\b(best|easy|quick|simple|homemade|perfect|delicious|amazing|ultimate|fried|baked|grilled|roasted|sauteed|steamed|boiled|poached)\b', '', title_lower)
    title_lower = re.sub(r'\b(with|and|or|in|on)\b', ' ', title_lower)

    found_keywords = []
    for keyword in FOOD_KEYWORDS:
        if keyword in EXCLUDE_RECIPE_TYPES:
            continue
        # Check for whole word matches
        pattern = r'\b' + re.escape(keyword) + r'\b'
        if re.search(pattern, title_lower):
            found_keywords.append(keyword)

    return found_keywords


def check_ingredients_match_title(recipe):
    """Check if title keywords are in ingredients"""
    # A missing or null name is reported separately, so treat it as an empty title here
    title_keywords = extract_food_keywords_from_title(recipe.get('name') or '')
    if not title_keywords:
        return True, []  # No specific food keywords in title

    ingredients_str = ' '.join(recipe.get('ingredients') or []).lower()

    missing = []
    for keyword in title_keywords:
        if keyword not in ingredients_str:
            missing.append(keyword)

    return len(missing) == 0, missing


def validate_recipe(recipe, idx):
    """Validate a single recipe and return a list of error messages"""
    errors = []

    # Required fields
    if not recipe.get('name'):
        errors.append("Missing name")

    if not recipe.get('ingredients') or len(recipe['ingredients']) == 0:
        errors.append("No ingredients")
    elif len(recipe['ingredients']) < 2:
        errors.append("Too few ingredients (minimum 2)")

    if not recipe.get('steps') or len(recipe['steps']) == 0:
        errors.append("No instructions")
    elif len(recipe['steps']) < 2:
        errors.append("Too few steps (minimum 2)")

    # Check time: must be a number between 1 minute and 24 hours
    minutes = recipe.get('minutes')
    if minutes is None:
        errors.append("Missing time")
    elif not isinstance(minutes, (int, float)) or minutes < 1 or minutes > 1440:
        errors.append(f"Invalid time: {minutes} minutes")

    # Check title-ingredient match
    matches, missing = check_ingredients_match_title(recipe)
    if not matches:
        errors.append(f"Title ingredients missing: {', '.join(missing)}")

    return errors


def load_recipes(filepath):
    """Load recipes from JSON or gzipped JSON"""
    path = Path(filepath)

    if path.suffix == '.gz':
        with gzip.open(path, 'rt', encoding='utf-8') as f:
            return json.load(f)
    else:
        with open(path, encoding='utf-8') as f:
            return json.load(f)


def main():
    if len(sys.argv) < 2:
        print("Usage: python3 validate_recipes.py <recipes.json|recipes.json.gz>")
        sys.exit(1)

    filepath = sys.argv[1]
    print(f"Loading recipes from {filepath}...")

    try:
        recipes = load_recipes(filepath)
    except Exception as e:
        print(f"Error loading file: {e}")
        sys.exit(1)

    print(f"\nTotal recipes: {len(recipes)}")
    if not recipes:
        # Avoid dividing by zero in the percentages below
        print("No recipes found - nothing to validate")
        return 0.0

    print("\nValidating...")

    valid_count = 0
    invalid_count = 0
    errors_by_type = {}
    sample_errors = []

    for i, recipe in enumerate(recipes):
        if i % 1000 == 0:
            print(f"  Processed {i}/{len(recipes)}...")

        errors = validate_recipe(recipe, i)

        if errors:
            invalid_count += 1
            for error in errors:
                errors_by_type[error] = errors_by_type.get(error, 0) + 1

            if len(sample_errors) < 5:
                sample_errors.append({
                    'name': recipe.get('name', 'Unknown'),
                    'errors': errors
                })
        else:
            valid_count += 1

    # Report
    print(f"\n{'='*60}")
    print("VALIDATION RESULTS")
    print(f"{'='*60}")
    print(f"Total recipes: {len(recipes)}")
    print(f"Valid recipes: {valid_count} ({valid_count/len(recipes)*100:.1f}%)")
    print(f"Invalid recipes: {invalid_count} ({invalid_count/len(recipes)*100:.1f}%)")

    print("\nError breakdown:")
    for error, count in sorted(errors_by_type.items(), key=lambda x: x[1], reverse=True):
        print(f"  - {error}: {count} recipes")

    if sample_errors:
        print("\nSample errors:")
        for item in sample_errors:
            print(f"  - {item['name']}: {', '.join(item['errors'])}")

    # Quality score
    quality_score = valid_count / len(recipes) * 100
    print(f"\nQuality Score: {quality_score:.1f}%")

    if quality_score >= 95:
        print("✅ EXCELLENT - Ready for delivery")
    elif quality_score >= 85:
        print("⚠️ GOOD - Minor issues, acceptable for delivery")
    elif quality_score >= 70:
        print("❌ NEEDS WORK - Fix major issues before delivery")
    else:
        print("❌ REJECT - Significant data quality issues")

    return quality_score


if __name__ == '__main__':
    score = main()
    sys.exit(0 if score >= 85 else 1)