# Programmatic Schema Validation and Refactoring ## Why Schema Validation Matters When building a knowledge base with hundreds of articles, consistency is critical. But manually checking every file for proper frontmatter, correct section order, and template compliance is unsustainable. **Solution:** Use bash scripts and command-line tools to validate and refactor files programmatically, saving AI compute for actual content creation. ## The Efficiency Argument **AI-driven validation (slow, expensive):** - AI reads 500 files → 500 LLM calls - Token cost scales linearly with vault size - Slow iteration cycles **Programmatic validation (fast, free):** - Bash script reads 500 files → milliseconds - Zero token cost - Instant feedback loops **Use AI for:** Content creation, pattern recognition, writing **Use scripts for:** Validation, bulk updates, schema enforcement ## Core Frontmatter Schema midimaze uses structured frontmatter with two key tracking fields: ```yaml --- created: 2025-11-08T14:55:58-0800 updated: 2025-11-08T14:55:58-0800 edited_seconds: 0 slug: 3jsvv8n3qmi template_type: feature-guide # ← Enables programmatic filtering schema_validated: 2025-11-08 # ← Tracks last validation tasks_status: tasks_unfinished: tasks_completed: --- ``` ### Key Fields **`template_type`:** Pattern used for this article - Values: `feature-guide`, `conceptual-explainer`, `workflow-guide`, `troubleshooting-guide`, `gear-guide`, etc. - **Why:** Enables bulk operations by category - **Example:** Find all feature guides: `grep -l "template_type: feature-guide" **/*.md` **`schema_validated`:** Date of last validation - Format: `YYYY-MM-DD` - **Why:** Track which files need re-validation after schema changes - **Example:** Find unvalidated files: `grep -L "schema_validated:" **/*.md` ## Validation Operations ### 1. Find Files Missing Required Fields **Check for template_type:** ```bash # Files missing template_type (old schema) grep -L "^template_type:" **/*.md ``` **Check for schema_validated:** ```bash # Files never validated grep -L "^schema_validated:" **/*.md ``` **Check for both:** ```bash # Combine with logical AND grep -L "^template_type:" **/*.md | xargs grep -L "^schema_validated:" ``` ### 2. Validate Field Values **Check for invalid template types:** ```bash # Extract all template_type values grep "^template_type:" **/*.md | sort | uniq # Expected values: # template_type: feature-guide # template_type: conceptual-explainer # template_type: workflow-guide # template_type: troubleshooting-guide # template_type: gear-guide ``` **Find typos:** ```bash # Should return empty (no typos) grep "^template_type:" **/*.md | grep -v -E "(feature-guide|conceptual-explainer|workflow-guide|troubleshooting-guide|gear-guide)" ``` ### 3. Find Files by Pattern Type **All feature guides:** ```bash grep -l "^template_type: feature-guide" **/*.md ``` **All gear guides needing manufacturer field:** ```bash grep -l "^template_type: gear-guide" **/*.md | xargs grep -L "^manufacturer:" ``` **Count by type:** ```bash for type in feature-guide conceptual-explainer workflow-guide troubleshooting-guide gear-guide; do echo "$type: $(grep -l "template_type: $type" **/*.md | wc -l)" done ``` ### 4. Validate Dates **Find invalid schema_validated dates:** ```bash # Should be YYYY-MM-DD format grep "^schema_validated:" **/*.md | grep -v -E "[0-9]{4}-[0-9]{2}-[0-9]{2}" ``` **Find files validated before a certain date:** ```bash # Files validated before template_type was added grep "^schema_validated: 2025-11-0[1-7]" **/*.md ``` ## Refactoring Operations ### 1. Add Missing Fields **Add template_type to untyped files:** ```bash # Find feature guide files without template_type for file in $(grep -l "## What It Does" **/*.md | xargs grep -L "^template_type:"); do # Insert after slug field sed -i '' '/^slug:/a\ template_type: feature-guide ' "$file" echo "Added template_type to $file" done ``` **Add schema_validated with current date:** ```bash for file in $(grep -L "^schema_validated:" **/*.md); do sed -i '' '/^template_type:/a\ schema_validated: '$(date +"%Y-%m-%d")' ' "$file" echo "Added schema_validated to $file" done ``` ### 2. Update Field Values **Update all schema_validated dates after validation:** ```bash # After validating all files for file in **/*.md; do sed -i '' "s/^schema_validated:.*/schema_validated: $(date +"%Y-%m-%d")/" "$file" done ``` **Fix template_type typos:** ```bash # Fix "feature" to "feature-guide" sed -i '' 's/^template_type: feature$/template_type: feature-guide/' **/*.md ``` ### 3. Enforce Field Order **Standardize frontmatter order:** ```bash # Desired order: # created, updated, edited_seconds, slug, template_type, schema_validated, tasks... # This requires YAML parsing (use yq) for file in **/*.md; do yq eval -i '. | sort_keys' "$file" done ``` ### 4. Bulk Updates by Template Type **Add new field to all gear guides:** ```bash for file in $(grep -l "template_type: gear-guide" **/*.md); do # Add manufacturer_url field if missing if ! grep -q "^manufacturer_url:" "$file"; then sed -i '' '/^manufacturer:/a\ manufacturer_url: ' "$file" echo "Added manufacturer_url to $file" fi done ``` ## Validation Scripts ### Complete Validation Script Create `validate-schema.sh`: ```bash #!/bin/bash echo "=== Frontmatter Schema Validation ===" echo # Check for missing required fields echo "Files missing template_type:" grep -L "^template_type:" **/*.md echo echo "Files missing schema_validated:" grep -L "^schema_validated:" **/*.md echo # Validate template_type values echo "Invalid template_type values:" grep "^template_type:" **/*.md | grep -v -E "(feature-guide|conceptual-explainer|workflow-guide|troubleshooting-guide|gear-guide)" echo # Validate date formats echo "Invalid schema_validated dates:" grep "^schema_validated:" **/*.md | grep -v -E "[0-9]{4}-[0-9]{2}-[0-9]{2}" echo # Count files by type echo "Files by template type:" for type in feature-guide conceptual-explainer workflow-guide troubleshooting-guide gear-guide; do count=$(grep -l "template_type: $type" **/*.md 2>/dev/null | wc -l) echo " $type: $count" done echo echo "=== Validation Complete ===" ``` **Usage:** ```bash chmod +x validate-schema.sh ./validate-schema.sh ``` ### Validation + Auto-Fix Script Create `fix-schema.sh`: ```bash #!/bin/bash echo "=== Auto-fixing Schema Issues ===" echo # Add template_type to feature guides missing it echo "Adding template_type to feature guides..." for file in $(grep -l "## What It Does" **/*.md | xargs grep -L "^template_type:" 2>/dev/null); do sed -i '' '/^slug:/a\ template_type: feature-guide ' "$file" echo " Fixed: $file" done # Add schema_validated to all files missing it echo echo "Adding schema_validated..." for file in $(grep -L "^schema_validated:" **/*.md); do sed -i '' '/^template_type:/a\ schema_validated: '$(date +"%Y-%m-%d")' ' "$file" echo " Fixed: $file" done echo echo "=== Auto-fix Complete ===" echo "Run validate-schema.sh to verify" ``` ## Integration with Git ### Pre-Commit Hook Create `.git/hooks/pre-commit`: ```bash #!/bin/bash # Run schema validation before commit ./validate-schema.sh > /tmp/schema-validation.txt if grep -q "Invalid" /tmp/schema-validation.txt; then echo "❌ Schema validation failed!" cat /tmp/schema-validation.txt echo echo "Fix issues or run: ./fix-schema.sh" exit 1 fi echo "✅ Schema validation passed" exit 0 ``` **Make executable:** ```bash chmod +x .git/hooks/pre-commit ``` ### Finding Files Changed Since Last Validation ```bash # Files modified after their last schema_validated date for file in **/*.md; do validated_date=$(grep "^schema_validated:" "$file" | cut -d' ' -f2) file_modified=$(stat -f "%Sm" -t "%Y-%m-%d" "$file") if [[ "$file_modified" > "$validated_date" ]]; then echo "$file needs revalidation (modified: $file_modified, validated: $validated_date)" fi done ``` ## Advanced Validation with yq **Install yq:** ```bash brew install yq ``` ### Extract and Validate Frontmatter **Get all frontmatter as JSON:** ```bash yq eval -o=json '.' file.md ``` **Check if required fields exist:** ```bash yq eval 'has("template_type") and has("schema_validated")' file.md # Output: true or false ``` **Validate all required fields:** ```bash for file in **/*.md; do if ! yq eval 'has("created") and has("slug") and has("template_type")' "$file" | grep -q "true"; then echo "Missing required fields: $file" fi done ``` ### Bulk Update with yq **Set schema_validated on all files:** ```bash for file in **/*.md; do yq eval -i ".schema_validated = \"$(date +"%Y-%m-%d")\"" "$file" done ``` **Add field conditionally:** ```bash for file in **/*.md; do if ! yq eval 'has("template_type")' "$file" | grep -q "true"; then yq eval -i '.template_type = "feature-guide"' "$file" fi done ``` ## Performance Comparison **Scenario:** Validate 500 markdown files | Method | Time | Cost | |--------|------|------| | **AI (GPT-4)** | ~10 min | ~$2.00 (tokens) | | **bash + grep** | ~2 sec | $0.00 | | **yq parsing** | ~10 sec | $0.00 | **Conclusion:** Use programmatic validation for bulk operations. Reserve AI for content quality and pattern detection. ## Best Practices ### 1. Validate After Schema Changes When you add/remove frontmatter fields: 1. Update templates in `_templates/` 2. Update AGENTS.md documentation 3. Run `validate-schema.sh` 4. Run `fix-schema.sh` if needed 5. Commit changes ### 2. Validate Before Publishing ```bash # Pre-publish checklist ./validate-schema.sh # All clear? Publish! ``` ### 3. Track Validation in Git ```bash # Commit validation date updates separately git add **/*.md git commit -m "chore: update schema_validated dates after validation" ``` ### 4. Use template_type for Targeted Updates ```bash # Example: All feature guides need new "Keyboard Shortcut" section for file in $(grep -l "template_type: feature-guide" **/*.md); do # Check if section exists if ! grep -q "## Keyboard Shortcut" "$file"; then # Add section after "How to Use It" sed -i '' '/## How to Use It/a\ \ ## Keyboard Shortcut\ \ **Shortcut:** \ ' "$file" fi done ``` ## Related - [[Template Systems - Templater vs AI Agent Creation]] - [[Verifying File Changes with Terminal Inspection]] - `_Nakul/5. Coding Actions/midimaze/Article Structure Patterns.md`