Initial import
commit
f71cbe8d38
151
APP_CODE_REFERENCE.md
Normal file
@ -0,0 +1,151 @@
# App Code Reference for Flatlogic

## How the App Loads Recipes

The app uses the `DataManager` class to load recipe data. Key points:

### 1. Recipe Sources (Priority Order)

```javascript
getRecipeSources() {
  if (this.isNativePlatform()) {
    return [
      'data/recipes_enhanced.json.gz', // Preferred
      'data/recipes.json', // Fallback
    ];
  }
  return [
    'data/recipes_enhanced.json.gz',
    'data/recipes.json',
  ];
}
```

The app tries sources in order until one succeeds.

### 2. File Loading Methods

**For regular JSON:**
```javascript
async fetchJson(path) {
  const response = await fetch(path);
  return response.json();
}
```

**For gzipped JSON:**
```javascript
async fetchGzipJson(path) {
  // On native platforms, uses Capacitor Filesystem API
  // Then decompresses with DecompressionStream
  // Falls back to fetch on web
}
```

### 3. File Size Constraints

Mobile WebView limitations:
- **Max file size to load via fetch():** ~10-15MB
- **Memory for decompressed data:** ~50-100MB
- **Recommended recipe count:** 15,000-25,000 recipes

**What happens with large files:**
- 52MB gzip file → 162MB uncompressed → Crash on mobile
- 4MB gzip file → 12MB uncompressed → Works fine

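The limits above can be checked before a dataset is shipped. The following is a minimal Python sketch and is not part of the app: the helper name `check_gzip_size` is hypothetical, and its default thresholds (10MB compressed, 50MB uncompressed) are assumptions taken from the lower bounds listed above.

```python
import gzip
import os

MB = 1024 * 1024

def check_gzip_size(path, max_compressed_mb=10, max_uncompressed_mb=50):
    """Return (compressed_mb, uncompressed_mb, ok) for a gzip file.

    The default limits are assumptions based on the lower bounds listed above.
    """
    compressed = os.path.getsize(path)
    uncompressed = 0
    # Stream the file in chunks so a large dataset is never held in memory at once.
    with gzip.open(path, 'rb') as f:
        while True:
            chunk = f.read(MB)
            if not chunk:
                break
            uncompressed += len(chunk)
    ok = (compressed <= max_compressed_mb * MB
          and uncompressed <= max_uncompressed_mb * MB)
    return compressed / MB, uncompressed / MB, ok
```

Calling `check_gzip_size('recipes_flatlogic.json.gz')` would report both sizes in megabytes and whether the file fits under the assumed limits.
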
### 4. Recipe Matching Logic

Recipes are matched against the user's pantry ingredients:

```javascript
// Ingredient matching in mealPlanner.js
const matched = r.ingredients.filter(ing => {
  const ingLower = ing.toLowerCase();
  return pantryNames.some(p => {
    const pantryLower = p.toLowerCase();
    // Assumed definition; pantryWords is not defined in the original excerpt
    const pantryWords = pantryLower.split(/\s+/);
    // "squash" matches "butternut squash" (good)
    // "ham" doesn't match "graham crackers" (prevented by word boundary check)
    // Note: "butter" still matches "peanut butter", where it is a whole word
    if (ingLower.includes(pantryLower)) {
      const wordPattern = new RegExp(`\\b${pantryWords[0]}\\b`, 'i');
      return wordPattern.test(ingLower);
    }
    return false;
  });
}).length;
```

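The effect of a word-boundary check can be illustrated outside the app. The sketch below is in Python rather than the app's JavaScript, and `pantry_matches` is a hypothetical helper, not a function from `mealPlanner.js`; it tests the whole pantry phrase rather than only its first word.

```python
import re

def pantry_matches(pantry_item, ingredient):
    """Return True if pantry_item appears in ingredient as a whole word or phrase."""
    pattern = r'\b' + re.escape(pantry_item.lower()) + r'\b'
    return re.search(pattern, ingredient.lower()) is not None

print(pantry_matches('squash', 'butternut squash'))  # True
print(pantry_matches('ham', 'graham crackers'))      # False: "ham" is inside another word
print(pantry_matches('butter', 'peanut butter'))     # True: "butter" is a whole word here
```

A word boundary alone only rejects matches inside a longer word. Keeping "butter" from matching "peanut butter" would need extra logic, for example a list of known compound ingredients.
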
### 5. Required Dependencies

In `package.json`:
```json
{
  "dependencies": {
    "@capacitor/filesystem": "^1.1.0"
  }
}
```

After adding recipes, run:
```bash
npm install
npm run sync  # Syncs to Android/iOS
npm run android:build
```

### 6. File Location

After `npm run sync`, files are copied to:
- Android: `android/app/src/main/assets/public/data/`
- iOS: `ios/App/App/public/data/`

### 7. Performance Considerations

**Bad (Current broken approach):**
- 188,435 recipes
- 52MB gzipped / 162MB uncompressed
- App crashes on mobile or takes 2+ minutes to load

**Good (Target):**
- 15,000-20,000 recipes
- 4-6MB gzipped / 12-18MB uncompressed
- Loads in <5 seconds on mobile

**How to achieve this:**
1. Filter to highest-rated recipes (4+ stars)
2. Remove recipes with >20 ingredients (too complex)
3. Remove recipes with <3 ingredients (incomplete)
4. Remove recipes missing images or incomplete data
5. Deduplicate by name

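The five reduction steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used to build the current data: `filter_recipes` is a hypothetical helper, the field names come from `RECIPE_SCHEMA.md`, "incomplete data" is interpreted as a missing name or missing steps, and duplicates are resolved by keeping the first occurrence.

```python
def filter_recipes(recipes, min_rating=4.0, min_ingredients=3, max_ingredients=20):
    """Apply the five reduction steps to a list of recipe dicts."""
    seen_names = set()
    kept = []
    for recipe in recipes:
        name = (recipe.get('name') or '').strip()
        ingredients = recipe.get('ingredients') or []
        # Step 1: keep only highly rated recipes.
        if (recipe.get('rating') or 0) < min_rating:
            continue
        # Steps 2-3: enforce the ingredient-count bounds.
        if not (min_ingredients <= len(ingredients) <= max_ingredients):
            continue
        # Step 4: drop recipes missing an image or core data (assumed: name, steps).
        if not name or not recipe.get('steps') or not recipe.get('image'):
            continue
        # Step 5: deduplicate by case-insensitive name, keeping the first occurrence.
        key = name.lower()
        if key in seen_names:
            continue
        seen_names.add(key)
        kept.append(recipe)
    return kept
```

If the result is still above the target count, one option is to sort by `rating` and truncate, although the source does not prescribe a tie-breaking rule.
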
### 8. Testing on Device

After building:
```bash
npm run android:build
adb install -r android/app/build/outputs/apk/debug/app-debug.apk
```

Check console logs for:
```
[DataManager] Loaded XXXXX recipes from data/recipes_enhanced.json.gz
[MealPlanner] Total recipes: XXXXX
```

### 9. Current Data Files

In the handoff package:
- `recipes_sample.json` - 100 sample recipes (uncompressed)
- `recipes_current_full.json.gz` - Full 188k recipe dataset (52MB, too large)
- `DATA_SUMMARY.txt` - Statistics about current data

### 10. What Flatlogic Should Deliver

1. `recipes_flatlogic.json.gz` - 15,000-20,000 recipes
2. `recipes_sample.json` - First 100 uncompressed (for review)
3. `validation_report.txt` - Results of running `validate_recipes.py`
4. Total size: Under 8MB compressed

## Questions?

If you need the full app source code, it's available at:
[Git repository URL]
13
DATA_SUMMARY.txt
Normal file
@ -0,0 +1,13 @@
Current Dataset Summary
========================

Total validated recipes: 188435
Sample provided: 100 recipes

Coverage Stats:
- With ratings: 188435 (100.0%)
- With times: 188435 (100.0%)
- With servings: 7835 (4.2%)
- With instructions: 188435 (100.0%)

Average ingredients per recipe: 9.4
125
README.md
Normal file
@ -0,0 +1,125 @@
# Flatlogic Recipe Data Handoff Package

## What's Included

| File | Description | Size |
|------|-------------|------|
| `RECIPE_SCHEMA.md` | Complete schema documentation with examples | 4.7KB |
| `APP_CODE_REFERENCE.md` | How the app loads and uses recipe data | 3.7KB |
| `validate_recipes.py` | Python script to check data quality | 6.6KB |
| `recipes_sample.json` | 100 sample recipes (uncompressed for review) | 291KB |
| `recipes_current_full.json.gz` | Current 188,435 recipes (for reference) | 50MB |
| `DATA_SUMMARY.txt` | Statistics about current dataset | 301B |

## Quick Start for Flatlogic

### 1. Understand the Requirements

**Target Dataset:**
- **15,000-20,000 recipes** (current has too many - 188k causes crashes)
- **File size:** Under 8MB compressed (gzip)
- **Quality:** 85%+ should pass validation

### 2. Use the Validation Script

```bash
# Run this on your generated data before delivery
python3 validate_recipes.py recipes_flatlogic.json.gz
```

Expected output:
```
Quality Score: 90.0%
⚠️ GOOD - Minor issues, acceptable for delivery
```

### 3. Key Validation Rules

**CRITICAL - Title-Ingredient Matching:**
- "Chicken Alfredo" MUST have "chicken" in ingredients
- "Butternut Squash Soup" MUST have "butternut squash" in ingredients
- "Peanut Butter Cookies" MUST have "peanut butter" in ingredients

The script checks this automatically.

### 4. Deliverables

Submit to client:
1. `recipes_flatlogic.json.gz` - Your compressed dataset
2. `recipes_sample.json` - First 100 recipes uncompressed
3. `validation_report.txt` - Output of `validate_recipes.py`

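The first two deliverables can be produced from an in-memory list of recipes with a few lines of Python. This is a minimal sketch: `write_deliverables` is a hypothetical helper, and the compact JSON separators are an assumption made to keep the compressed file small, not a stated requirement.

```python
import gzip
import json

def write_deliverables(recipes, gz_path='recipes_flatlogic.json.gz',
                       sample_path='recipes_sample.json', sample_size=100):
    """Write the gzipped full dataset and an uncompressed sample of the first recipes."""
    # Full dataset: a gzipped JSON array with compact separators.
    with gzip.open(gz_path, 'wt', encoding='utf-8') as f:
        json.dump(recipes, f, ensure_ascii=False, separators=(',', ':'))
    # Sample: the first sample_size recipes, pretty-printed for review.
    with open(sample_path, 'w', encoding='utf-8') as f:
        json.dump(recipes[:sample_size], f, ensure_ascii=False, indent=2)
```

Since the validation script prints its results to standard output, the third deliverable can be captured with a shell redirect such as `python3 validate_recipes.py recipes_flatlogic.json.gz > validation_report.txt`.
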
## Current Dataset Problems (For Reference)

The included `recipes_current_full.json.gz` (188,435 recipes) has these issues:
- ❌ Too large - 52MB compressed / 162MB uncompressed
- ❌ Causes app crashes on mobile devices
- ❌ Out-of-memory errors on Android WebView
- ❌ 10,000 recipe limit being applied (old fallback data)

**Why this happened:**
We merged the Food.com Kaggle dataset (231k recipes) with existing data and filtered it down to 188k validated recipes, but a mobile WebView can't handle files larger than about 10-15MB.

## Recipe Format Example

```json
{
  "id": 1,
  "name": "Creamy Garlic Butter Chicken",
  "ingredients": [
    "chicken breasts",
    "butter",
    "garlic cloves",
    "heavy cream",
    "parmesan cheese",
    "spinach",
    "salt",
    "black pepper"
  ],
  "steps": [
    "Season chicken with salt and pepper.",
    "Melt butter in skillet over medium heat.",
    "Cook chicken 6-7 minutes per side until golden.",
    "Add minced garlic, cook 1 minute.",
    "Pour in cream and parmesan, simmer 3 minutes.",
    "Stir in spinach until wilted.",
    "Serve immediately."
  ],
  "minutes": 25,
  "tags": ["dinner", "chicken", "quick", "creamy"],
  "rating": 4.7,
  "servings": 4,
  "difficulty": "easy"
}
```

## Size Guidelines

| Recipe Count | Compressed Size | Mobile Performance |
|-------------|-----------------|-------------------|
| 10,000 | 3-4 MB | ✅ Fast loading |
| 20,000 | 6-8 MB | ✅ Good loading |
| 50,000 | 15-20 MB | ⚠️ Slow loading |
| 188,000 | 52 MB | ❌ Crashes app |

## Recommended Data Sources

1. **Food.com / Genius Kitchen** (preferred)
2. **AllRecipes**
3. **BBC Good Food**
4. **NYT Cooking**

Focus on highly-rated recipes (4+ stars) with complete instructions.

## Questions?

Contact the development team with:
- Validation script output
- Sample recipes for review
- Any schema clarifications needed

## Timeline

Target delivery: [Set with client]
Quality threshold: 85%+ valid recipes
File size limit: 8MB compressed
166
RECIPE_SCHEMA.md
Normal file
@ -0,0 +1,166 @@
# Recipe Data Schema for Flatlogic

## Required Fields

Each recipe uses the following fields. Fields marked optional may be omitted; all others are required. The `//` comments are annotations only and must not appear in the delivered JSON.

```json
{
  "id": 1,                          // Integer, unique identifier
  "name": "Recipe Title",           // String, max 100 chars
  "ingredients": [                  // Array of strings, required
    "ingredient 1",
    "ingredient 2"
  ],
  "steps": [                        // Array of strings, required
    "Step 1 instruction",
    "Step 2 instruction"
  ],
  "minutes": 30,                    // Integer, prep time in minutes
  "tags": ["dinner", "chicken"],    // Array of strings
  "rating": 4.5,                    // Number 0-5, optional but preferred
  "servings": 4,                    // Integer, optional
  "difficulty": "easy",             // String: easy/medium/hard, optional
  "image": "https://...",           // String URL, optional
  "nutrition": {                    // Object, optional
    "calories": 350,
    "protein": 25,
    "carbs": 40,
    "fat": 12
  },
  "dietary": {                      // Object, all boolean, optional
    "vegan": false,
    "vegetarian": false,
    "glutenFree": false,
    "dairyFree": false,
    "keto": false
  }
}
```

## Data Quality Requirements

### 1. Title-Ingredient Matching (CRITICAL)
- If a recipe title contains a food item, that item MUST be in the ingredients list
- Examples:
  - ✅ "Butternut Squash Soup" must have "butternut squash" in ingredients
  - ❌ "Chicken Alfredo" missing "chicken" or "alfredo" is invalid
  - ❌ "Peanut Butter Cookies" missing "peanut butter" is invalid

### 2. Ingredient Completeness
- Each ingredient should be a specific food item
- Avoid generic terms like "spices" or "seasoning to taste"
- Include common pantry items (salt, pepper, oil) if used

### 3. Instructions Quality
- Each step must be a complete sentence
- Steps should be in logical cooking order
- Include temperatures, times, and visual cues ("golden brown")

### 4. Categorization
Include relevant tags from these categories:
- **Meal type**: breakfast, lunch, dinner, snack, dessert
- **Protein**: chicken, beef, pork, fish, seafood, tofu, eggs, beans
- **Cuisine**: italian, mexican, asian, indian, mediterranean, american
- **Method**: baked, grilled, fried, slow-cooked, no-cook
- **Dietary**: vegan, vegetarian, gluten-free, dairy-free, keto, low-carb

## Size Constraints

### For Mobile App Performance:
- **Maximum recipes**: 25,000 (to prevent memory issues)
- **Maximum file size**: 8MB compressed (gzip)
- **Target**: 15,000-20,000 high-quality recipes

### Recipe Distribution:
- 30% Quick meals (under 30 min)
- 25% Standard dinners (30-60 min)
- 20% Breakfast/brunch
- 15% Healthy/light options
- 10% Desserts/treats

## Data Sources to Use

Preferred sources (in order):
1. **Food.com / Genius Kitchen** - Well-structured, community rated
2. **AllRecipes** - Popular, tested recipes
3. **BBC Good Food** - Reliable, well-written
4. **NYT Cooking** - High quality
5. **Budget Bytes** - Simple, affordable

Avoid:
- Aggregator sites with stolen content
- Recipes without ratings/reviews
- AI-generated recipes without human testing

## File Format

Deliver as:
1. `recipes_flatlogic.json.gz` - Gzipped JSON array
2. `recipes_sample.json` - First 100 recipes uncompressed (for review)
3. `validation_report.txt` - Summary of data quality checks

## Validation Script

Use the provided `validate_recipes.py` script to check:
- Title-ingredient matching
- Required fields present (name, ingredients, steps, minutes)
- At least 2 ingredients and 2 steps per recipe
- Valid time range (1 to 1,440 minutes)

The script does not check for duplicates or categorization; verify those separately.

Run: `python3 validate_recipes.py recipes_flatlogic.json.gz`

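A duplicate check is not among the checks that `validate_recipes.py` performs in this commit. A minimal sketch of one follows; `find_duplicate_names` is a hypothetical helper, and treating names as equal after lowercasing and collapsing whitespace is an assumption about what counts as a duplicate.

```python
from collections import Counter

def find_duplicate_names(recipes):
    """Return {normalized_name: count} for recipe names that occur more than once."""
    counts = Counter(
        # Normalize: lowercase and collapse runs of whitespace.
        ' '.join((recipe.get('name') or '').lower().split())
        for recipe in recipes
    )
    # Ignore recipes with no name; those are reported by the validator instead.
    return {name: n for name, n in counts.items() if name and n > 1}
```

An empty result means no two recipes share a normalized name.
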
## Examples

### Good Recipe:
```json
{
  "id": 1,
  "name": "Creamy Garlic Butter Chicken",
  "ingredients": [
    "chicken breasts",
    "butter",
    "garlic cloves",
    "heavy cream",
    "parmesan cheese",
    "spinach",
    "salt",
    "black pepper"
  ],
  "steps": [
    "Season chicken with salt and pepper.",
    "Melt butter in skillet over medium heat.",
    "Cook chicken 6-7 minutes per side until golden.",
    "Add minced garlic, cook 1 minute.",
    "Pour in cream and parmesan, simmer 3 minutes.",
    "Stir in spinach until wilted.",
    "Serve immediately."
  ],
  "minutes": 25,
  "tags": ["dinner", "chicken", "quick", "creamy"],
  "rating": 4.7,
  "servings": 4,
  "difficulty": "easy",
  "dietary": {
    "glutenFree": true,
    "keto": true
  }
}
```

### Bad Recipe (Don't Do This):
```json
{
  "name": "Peanut Butter & Jelly Sandwich",  // Missing peanut butter!
  "ingredients": ["bread", "jelly"],         // ❌ Missing peanut butter
  "steps": ["Make sandwich"],                // ❌ Too vague
  "minutes": 5
}
```

## Questions?

Contact: [Your contact info]
App: Main Recipe & Meal Planning App
Platform: Android/iOS/Web
BIN
recipes_current_full.json.gz
Normal file
Binary file not shown.
9671
recipes_sample.json
Normal file
File diff suppressed because it is too large
192
validate_recipes.py
Normal file
@ -0,0 +1,192 @@
#!/usr/bin/env python3
"""
Recipe Validation Script for Flatlogic
Run this on your recipe dataset before delivery
"""

import json
import gzip
import re
from pathlib import Path

# Keywords that should appear in ingredients if in title
FOOD_KEYWORDS = {
    'chicken', 'beef', 'pork', 'turkey', 'lamb', 'fish', 'salmon', 'tuna', 'shrimp',
    'pasta', 'spaghetti', 'rice', 'quinoa', 'potato', 'potatoes', 'sweet potato',
    'tomato', 'tomatoes', 'onion', 'onions', 'garlic', 'carrot', 'carrots',
    'broccoli', 'spinach', 'mushroom', 'mushrooms', 'pepper', 'peppers',
    'zucchini', 'squash', 'butternut squash', 'eggplant', 'cheese', 'cheddar',
    'mozzarella', 'parmesan', 'milk', 'butter', 'cream', 'egg', 'eggs',
    'bread', 'tortilla', 'pita', 'peanut butter', 'jam', 'jelly', 'chocolate',
    'apple', 'apples', 'banana', 'bananas', 'orange', 'lemon', 'lime',
    'strawberry', 'blueberry', 'bacon', 'sausage', 'ham', 'bean', 'beans',
    'chickpea', 'chickpeas', 'lentil', 'lentils', 'corn', 'peas', 'avocado',
    'tofu', 'nuts', 'almond', 'almonds', 'walnut', 'walnuts', 'coconut',
    'pineapple', 'mango', 'oil', 'olive oil', 'vinegar', 'soy sauce',
    'flour', 'sugar', 'honey', 'maple syrup', 'salt', 'pepper'
}

# Recipe types to exclude from matching (the end product, not an ingredient)
EXCLUDE_RECIPE_TYPES = {
    'bread', 'cake', 'pie', 'cookies', 'muffins', 'brownies', 'bars',
    'soup', 'stew', 'chili', 'sauce', 'gravy', 'dip', 'spread',
    'salad', 'slaw', 'casserole', 'lasagna', 'pizza', 'smoothie',
    'shake', 'cocktail', 'drink', 'burger', 'sandwich', 'wrap', 'taco', 'burrito'
}

def extract_food_keywords_from_title(title):
    """Extract food keywords that should be in ingredients"""
    title_lower = title.lower()

    # Clean title: drop possessives, marketing/cooking-method adjectives, and connectives
    title_lower = re.sub(r"'s\s+", ' ', title_lower)
    title_lower = re.sub(r'\b(best|easy|quick|simple|homemade|perfect|delicious|amazing|ultimate|fried|baked|grilled|roasted|sauteed|steamed|boiled|poached)\b', '', title_lower)
    title_lower = re.sub(r'\b(with|and|or|in|on)\b', ' ', title_lower)

    found_keywords = []
    for keyword in FOOD_KEYWORDS:
        if keyword in EXCLUDE_RECIPE_TYPES:
            continue
        # Check for whole word matches
        pattern = r'\b' + re.escape(keyword) + r'\b'
        if re.search(pattern, title_lower):
            found_keywords.append(keyword)

    return found_keywords

def check_ingredients_match_title(recipe):
    """Check if title keywords are in ingredients"""
    # Use .get() so a recipe with a missing name is reported instead of raising KeyError
    title_keywords = extract_food_keywords_from_title(recipe.get('name') or '')
    if not title_keywords:
        return True, []  # No specific food keywords in title

    ingredients_str = ' '.join(recipe.get('ingredients', [])).lower()

    # Substring match, so a singular keyword also matches its plural in the ingredients
    missing = []
    for keyword in title_keywords:
        if keyword not in ingredients_str:
            missing.append(keyword)

    return len(missing) == 0, missing

def validate_recipe(recipe, idx):
    """Validate a single recipe"""
    errors = []

    # Required fields
    if not recipe.get('name'):
        errors.append("Missing name")

    if not recipe.get('ingredients') or len(recipe['ingredients']) == 0:
        errors.append("No ingredients")
    elif len(recipe['ingredients']) < 2:
        errors.append("Too few ingredients (minimum 2)")

    if not recipe.get('steps') or len(recipe['steps']) == 0:
        errors.append("No instructions")
    elif len(recipe['steps']) < 2:
        errors.append("Too few steps (minimum 2)")

    # Check time
    minutes = recipe.get('minutes')
    if minutes is None:
        errors.append("Missing time")
    elif minutes < 1 or minutes > 1440:  # More than 24 hours
        errors.append(f"Invalid time: {minutes} minutes")

    # Check title-ingredient match
    matches, missing = check_ingredients_match_title(recipe)
    if not matches:
        errors.append(f"Title ingredients missing: {', '.join(missing)}")

    return errors

def load_recipes(filepath):
    """Load recipes from JSON or gzipped JSON"""
    path = Path(filepath)

    # Read as UTF-8 explicitly so the result does not depend on the platform default
    if path.suffix == '.gz':
        with gzip.open(path, 'rt', encoding='utf-8') as f:
            return json.load(f)
    else:
        with open(path, encoding='utf-8') as f:
            return json.load(f)

def main():
    import sys

    if len(sys.argv) < 2:
        print("Usage: python3 validate_recipes.py <recipes.json|recipes.json.gz>")
        sys.exit(1)

    filepath = sys.argv[1]
    print(f"Loading recipes from {filepath}...")

    try:
        recipes = load_recipes(filepath)
    except Exception as e:
        print(f"Error loading file: {e}")
        sys.exit(1)

    # Guard against an empty dataset, which would divide by zero in the report
    if not recipes:
        print("Error: no recipes found in file")
        sys.exit(1)

    print(f"\nTotal recipes: {len(recipes)}")
    print("\nValidating...")

    valid_count = 0
    invalid_count = 0
    errors_by_type = {}
    sample_errors = []

    for i, recipe in enumerate(recipes):
        if i % 1000 == 0:
            print(f"  Processed {i}/{len(recipes)}...")

        errors = validate_recipe(recipe, i)

        if errors:
            invalid_count += 1
            for error in errors:
                errors_by_type[error] = errors_by_type.get(error, 0) + 1

            # Keep only the first few failures as examples for the report
            if len(sample_errors) < 5:
                sample_errors.append({
                    'name': recipe.get('name', 'Unknown'),
                    'errors': errors
                })
        else:
            valid_count += 1

    # Report
    print(f"\n{'='*60}")
    print("VALIDATION RESULTS")
    print(f"{'='*60}")
    print(f"Total recipes: {len(recipes)}")
    print(f"Valid recipes: {valid_count} ({valid_count/len(recipes)*100:.1f}%)")
    print(f"Invalid recipes: {invalid_count} ({invalid_count/len(recipes)*100:.1f}%)")

    print(f"\nError breakdown:")
    for error, count in sorted(errors_by_type.items(), key=lambda x: x[1], reverse=True):
        print(f"  - {error}: {count} recipes")

    if sample_errors:
        print(f"\nSample errors:")
        for item in sample_errors:
            print(f"  - {item['name']}: {', '.join(item['errors'])}")

    # Quality score
    quality_score = valid_count / len(recipes) * 100
    print(f"\nQuality Score: {quality_score:.1f}%")

    if quality_score >= 95:
        print("✅ EXCELLENT - Ready for delivery")
    elif quality_score >= 85:
        print("⚠️ GOOD - Minor issues, acceptable for delivery")
    elif quality_score >= 70:
        print("❌ NEEDS WORK - Fix major issues before delivery")
    else:
        print("❌ REJECT - Significant data quality issues")

    return quality_score

if __name__ == '__main__':
    score = main()
    # sys is only imported inside main(), so exit via SystemExit rather than sys.exit
    raise SystemExit(0 if score >= 85 else 1)