Programmatic Schema Validation and Refactoring
Programmatic Schema Validation and Refactoring
Why Schema Validation Matters
When building a knowledge base with hundreds of articles, consistency is critical. But manually checking every file for proper frontmatter, correct section order, and template compliance is unsustainable.
Solution: Use bash scripts and command-line tools to validate and refactor files programmatically, saving AI compute for actual content creation.
The Efficiency Argument
AI-driven validation (slow, expensive):
- AI reads 500 files β 500 LLM calls
- Token cost scales linearly with vault size
- Slow iteration cycles
Programmatic validation (fast, free):
- Bash script reads 500 files β milliseconds
- Zero token cost
- Instant feedback loops
Use AI for: Content creation, pattern recognition, writing Use scripts for: Validation, bulk updates, schema enforcement
Core Frontmatter Schema
midimaze uses structured frontmatter with two key tracking fields:
yaml--- created: 2025-11-08T14:55:58-0800 updated: 2025-11-08T14:55:58-0800 edited_seconds: 0 slug: 3jsvv8n3qmi template_type: feature-guide # β Enables programmatic filtering schema_validated: 2025-11-08 # β Tracks last validation tasks_status: tasks_unfinished: tasks_completed: ---
Key Fields
template_type: Pattern used for this article
- Values:
feature-guide,conceptual-explainer,workflow-guide,troubleshooting-guide,gear-guide, etc. - Why: Enables bulk operations by category
- Example: Find all feature guides:
grep -l "template_type: feature-guide" **/*.md
schema_validated: Date of last validation
- Format:
YYYY-MM-DD - Why: Track which files need re-validation after schema changes
- Example: Find unvalidated files:
grep -L "schema_validated:" **/*.md
Validation Operations
1. Find Files Missing Required Fields
Check for template_type:
bash# Files missing template_type (old schema) grep -L "^template_type:" **/*.md
Check for schema_validated:
bash# Files never validated grep -L "^schema_validated:" **/*.md
Check for both:
bash# Combine with logical AND grep -L "^template_type:" **/*.md | xargs grep -L "^schema_validated:"
2. Validate Field Values
Check for invalid template types:
bash# Extract all template_type values grep "^template_type:" **/*.md | sort | uniq # Expected values: # template_type: feature-guide # template_type: conceptual-explainer # template_type: workflow-guide # template_type: troubleshooting-guide # template_type: gear-guide
Find typos:
bash# Should return empty (no typos) grep "^template_type:" **/*.md | grep -v -E "(feature-guide|conceptual-explainer|workflow-guide|troubleshooting-guide|gear-guide)"
3. Find Files by Pattern Type
All feature guides:
bashgrep -l "^template_type: feature-guide" **/*.md
All gear guides needing manufacturer field:
bashgrep -l "^template_type: gear-guide" **/*.md | xargs grep -L "^manufacturer:"
Count by type:
bashfor type in feature-guide conceptual-explainer workflow-guide troubleshooting-guide gear-guide; do echo "$type: $(grep -l "template_type: $type" **/*.md | wc -l)" done
4. Validate Dates
Find invalid schema_validated dates:
bash# Should be YYYY-MM-DD format grep "^schema_validated:" **/*.md | grep -v -E "[0-9]{4}-[0-9]{2}-[0-9]{2}"
Find files validated before a certain date:
bash# Files validated before template_type was added grep "^schema_validated: 2025-11-0[1-7]" **/*.md
Refactoring Operations
1. Add Missing Fields
Add template_type to untyped files:
bash# Find feature guide files without template_type for file in $(grep -l "## What It Does" **/*.md | xargs grep -L "^template_type:"); do # Insert after slug field sed -i '' '/^slug:/a\ template_type: feature-guide ' "$file" echo "Added template_type to $file" done
Add schema_validated with current date:
bashfor file in $(grep -L "^schema_validated:" **/*.md); do sed -i '' '/^template_type:/a\ schema_validated: '$(date +"%Y-%m-%d")' ' "$file" echo "Added schema_validated to $file" done
2. Update Field Values
Update all schema_validated dates after validation:
bash# After validating all files for file in **/*.md; do sed -i '' "s/^schema_validated:.*/schema_validated: $(date +"%Y-%m-%d")/" "$file" done
Fix template_type typos:
bash# Fix "feature" to "feature-guide" sed -i '' 's/^template_type: feature$/template_type: feature-guide/' **/*.md
3. Enforce Field Order
Standardize frontmatter order:
bash# Desired order: # created, updated, edited_seconds, slug, template_type, schema_validated, tasks... # This requires YAML parsing (use yq) for file in **/*.md; do yq eval -i '. | sort_keys' "$file" done
4. Bulk Updates by Template Type
Add new field to all gear guides:
bashfor file in $(grep -l "template_type: gear-guide" **/*.md); do # Add manufacturer_url field if missing if ! grep -q "^manufacturer_url:" "$file"; then sed -i '' '/^manufacturer:/a\ manufacturer_url: ' "$file" echo "Added manufacturer_url to $file" fi done
Validation Scripts
Complete Validation Script
Create validate-schema.sh:
bash#!/bin/bash echo "=== Frontmatter Schema Validation ===" echo # Check for missing required fields echo "Files missing template_type:" grep -L "^template_type:" **/*.md echo echo "Files missing schema_validated:" grep -L "^schema_validated:" **/*.md echo # Validate template_type values echo "Invalid template_type values:" grep "^template_type:" **/*.md | grep -v -E "(feature-guide|conceptual-explainer|workflow-guide|troubleshooting-guide|gear-guide)" echo # Validate date formats echo "Invalid schema_validated dates:" grep "^schema_validated:" **/*.md | grep -v -E "[0-9]{4}-[0-9]{2}-[0-9]{2}" echo # Count files by type echo "Files by template type:" for type in feature-guide conceptual-explainer workflow-guide troubleshooting-guide gear-guide; do count=$(grep -l "template_type: $type" **/*.md 2>/dev/null | wc -l) echo " $type: $count" done echo echo "=== Validation Complete ==="
Usage:
bashchmod +x validate-schema.sh ./validate-schema.sh
Validation + Auto-Fix Script
Create fix-schema.sh:
bash#!/bin/bash echo "=== Auto-fixing Schema Issues ===" echo # Add template_type to feature guides missing it echo "Adding template_type to feature guides..." for file in $(grep -l "## What It Does" **/*.md | xargs grep -L "^template_type:" 2>/dev/null); do sed -i '' '/^slug:/a\ template_type: feature-guide ' "$file" echo " Fixed: $file" done # Add schema_validated to all files missing it echo echo "Adding schema_validated..." for file in $(grep -L "^schema_validated:" **/*.md); do sed -i '' '/^template_type:/a\ schema_validated: '$(date +"%Y-%m-%d")' ' "$file" echo " Fixed: $file" done echo echo "=== Auto-fix Complete ===" echo "Run validate-schema.sh to verify"
Integration with Git
Pre-Commit Hook
Create .git/hooks/pre-commit:
bash#!/bin/bash # Run schema validation before commit ./validate-schema.sh > /tmp/schema-validation.txt if grep -q "Invalid" /tmp/schema-validation.txt; then echo "β Schema validation failed!" cat /tmp/schema-validation.txt echo echo "Fix issues or run: ./fix-schema.sh" exit 1 fi echo "β Schema validation passed" exit 0
Make executable:
bashchmod +x .git/hooks/pre-commit
Finding Files Changed Since Last Validation
bash# Files modified after their last schema_validated date for file in **/*.md; do validated_date=$(grep "^schema_validated:" "$file" | cut -d' ' -f2) file_modified=$(stat -f "%Sm" -t "%Y-%m-%d" "$file") if <span class="wikilink-broken" title="Page not found: "$file_modified" > "$validated_date" "> "$file_modified" > "$validated_date" </span>; then echo "$file needs revalidation (modified: $file_modified, validated: $validated_date)" fi done
Advanced Validation with yq
Install yq:
bashbrew install yq
Extract and Validate Frontmatter
Get all frontmatter as JSON:
bashyq eval -o=json '.' file.md
Check if required fields exist:
bashyq eval 'has("template_type") and has("schema_validated")' file.md # Output: true or false
Validate all required fields:
bashfor file in **/*.md; do if ! yq eval 'has("created") and has("slug") and has("template_type")' "$file" | grep -q "true"; then echo "Missing required fields: $file" fi done
Bulk Update with yq
Set schema_validated on all files:
bashfor file in **/*.md; do yq eval -i ".schema_validated = \"$(date +"%Y-%m-%d")\"" "$file" done
Add field conditionally:
bashfor file in **/*.md; do if ! yq eval 'has("template_type")' "$file" | grep -q "true"; then yq eval -i '.template_type = "feature-guide"' "$file" fi done
Performance Comparison
Scenario: Validate 500 markdown files
| Method | Time | Cost |
|---|---|---|
| AI (GPT-4) | ~10 min | ~$2.00 (tokens) |
| bash + grep | ~2 sec | $0.00 |
| yq parsing | ~10 sec | $0.00 |
Conclusion: Use programmatic validation for bulk operations. Reserve AI for content quality and pattern detection.
Best Practices
1. Validate After Schema Changes
When you add/remove frontmatter fields:
- Update templates in
_templates/ - Update AGENTS.md documentation
- Run
validate-schema.sh - Run
fix-schema.shif needed - Commit changes
2. Validate Before Publishing
bash# Pre-publish checklist ./validate-schema.sh # All clear? Publish!
3. Track Validation in Git
bash# Commit validation date updates separately git add **/*.md git commit -m "chore: update schema_validated dates after validation"
4. Use template_type for Targeted Updates
bash# Example: All feature guides need new "Keyboard Shortcut" section for file in $(grep -l "template_type: feature-guide" **/*.md); do # Check if section exists if ! grep -q "## Keyboard Shortcut" "$file"; then # Add section after "How to Use It" sed -i '' '/## How to Use It/a\ \ ## Keyboard Shortcut\ \ **Shortcut:** \ ' "$file" fi done
Related
- Template Systems - Templater vs AI Agent Creation
- Verifying File Changes with Terminal Inspection
_Nakul/5. Coding Actions/midimaze/Article Structure Patterns.md