# Programmatic Schema Validation and Refactoring
## Why Schema Validation Matters
When building a knowledge base with hundreds of articles, consistency is critical. But manually checking every file for proper frontmatter, correct section order, and template compliance is unsustainable.
**Solution:** Use bash scripts and command-line tools to validate and refactor files programmatically, saving AI compute for actual content creation.
## The Efficiency Argument
**AI-driven validation (slow, expensive):**
- AI reads 500 files → 500 LLM calls
- Token cost scales linearly with vault size
- Slow iteration cycles
**Programmatic validation (fast, free):**
- Bash script reads 500 files → a few seconds at most
- Zero token cost
- Instant feedback loops
**Use AI for:** Content creation, pattern recognition, writing
**Use scripts for:** Validation, bulk updates, schema enforcement
## Core Frontmatter Schema
midimaze uses structured frontmatter with two key tracking fields:
```yaml
---
created: 2025-11-08T14:55:58-0800
updated: 2025-11-08T14:55:58-0800
edited_seconds: 0
slug: 3jsvv8n3qmi
template_type: feature-guide # ← Enables programmatic filtering
schema_validated: 2025-11-08 # ← Tracks last validation
tasks_status:
tasks_unfinished:
tasks_completed:
---
```
### Key Fields
**`template_type`:** Pattern used for this article
- Values: `feature-guide`, `conceptual-explainer`, `workflow-guide`, `troubleshooting-guide`, `gear-guide`, etc.
- **Why:** Enables bulk operations by category
- **Example:** Find all feature guides: `grep -l "^template_type: feature-guide" **/*.md`
**`schema_validated`:** Date of last validation
- Format: `YYYY-MM-DD`
- **Why:** Track which files need re-validation after schema changes
- **Example:** Find unvalidated files: `grep -L "^schema_validated:" **/*.md`
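The body of an article can also contain lines such as `template_type:` (for example inside a code sample), so it can help to read a field from the frontmatter block only. The following is a minimal sketch of that idea, not part of the existing tooling: the function name `frontmatter_field` is hypothetical, and it assumes the frontmatter starts on line 1 and uses simple one-line `key: value` pairs.

```bash
# Print the value of one frontmatter key, looking only between the opening
# and closing --- lines. (frontmatter_field is a hypothetical helper name.)
frontmatter_field() {
  local file=$1 key=$2
  awk -v key="$key" '
    NR == 1 && $0 != "---" { exit }          # file has no frontmatter block
    NR > 1 && $0 == "---"  { exit }          # closing delimiter: stop reading
    NR > 1 && index($0, key ":") == 1 {
      value = substr($0, length(key) + 2)    # text after "key:"
      sub(/^[ \t]+/, "", value)              # trim leading whitespace
      sub(/[ \t]+#.*$/, "", value)           # drop a trailing inline comment
      sub(/[ \t]+$/, "", value)              # trim trailing whitespace
      print value
      exit
    }
  ' "$file"
}
```

For a file that starts with the example frontmatter above, `frontmatter_field article.md template_type` prints `feature-guide` (`article.md` is a placeholder name).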
## Validation Operations
The `**/*.md` pattern in these commands matches files at any depth in zsh (the macOS default shell). In bash 4 or later, run `shopt -s globstar` first; the stock macOS `/bin/bash` (3.2) has no recursive glob, so use `find . -name '*.md'` there.
### 1. Find Files Missing Required Fields
**Check for template_type:**
```bash
# Files missing template_type (old schema)
grep -L "^template_type:" **/*.md
```
**Check for schema_validated:**
```bash
# Files never validated
grep -L "^schema_validated:" **/*.md
```
**Check for both:**
```bash
# Files missing BOTH fields (logical AND); --null and -0 keep paths with spaces intact
grep -L --null "^template_type:" **/*.md | xargs -0 grep -L "^schema_validated:"
```
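The three checks above can be folded into one pass that names the missing fields per file. This is a sketch under assumptions: `list_missing_fields` and the `REQUIRED_FIELDS` variable are hypothetical names, and the field list is only an example to adapt to your schema. It uses `find` rather than `**/*.md`, so it does not depend on the shell's glob settings.

```bash
# Hypothetical setting: the fields every article is expected to carry.
REQUIRED_FIELDS="created slug template_type schema_validated"

# List each Markdown file under DIR (default: current directory) that lacks
# one or more required fields, together with the names of the missing fields.
list_missing_fields() {
  local dir=${1:-.}
  find "$dir" -name '*.md' -type f | sort | while IFS= read -r file; do
    missing=""
    for field in $REQUIRED_FIELDS; do
      grep -q "^$field:" "$file" || missing="$missing $field"
    done
    if [ -n "$missing" ]; then
      echo "$file: missing$missing"
    fi
  done
}
```

Reading file names line by line keeps paths with spaces intact, which matters in a vault whose folders and notes have multi-word names.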
### 2. Validate Field Values
**Check for invalid template types:**
```bash
# Extract all template_type values
grep "^template_type:" **/*.md | sort | uniq
# Expected values:
# template_type: feature-guide
# template_type: conceptual-explainer
# template_type: workflow-guide
# template_type: troubleshooting-guide
# template_type: gear-guide
```
**Find typos:**
```bash
# Should return empty (no typos); the value is anchored so near-misses like "feature-guides" are caught
grep "^template_type:" **/*.md | grep -v -E "template_type: (feature-guide|conceptual-explainer|workflow-guide|troubleshooting-guide|gear-guide)[[:space:]]*$"
```
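An exact-match check against an allow-list avoids the substring pitfalls of a regular expression. The sketch below assumes the five values listed above are the complete set; `check_template_types` and `ALLOWED_TYPES` are hypothetical names, and only lines inside the frontmatter block are inspected.

```bash
# Hypothetical setting: the complete list of accepted template_type values.
ALLOWED_TYPES="feature-guide conceptual-explainer workflow-guide troubleshooting-guide gear-guide"

# Print "file: value" for every frontmatter template_type that is not an
# exact member of ALLOWED_TYPES. DIR defaults to the current directory.
check_template_types() {
  local dir=${1:-.}
  find "$dir" -name '*.md' -type f -exec awk -v allowed="$ALLOWED_TYPES" '
    BEGIN { n = split(allowed, list, " "); for (i = 1; i <= n; i++) ok[list[i]] = 1 }
    FNR == 1 { in_fm = ($0 == "---"); next }      # frontmatter must open on line 1
    in_fm && $0 == "---" { in_fm = 0; next }      # closing delimiter
    in_fm && /^template_type:/ {
      value = $2                                   # first word after the key
      if (!(value in ok)) print FILENAME ": " value
    }
  ' {} +
}
```

An empty result means every typed file uses a known value; files with no `template_type` at all are not reported here.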
### 3. Find Files by Pattern Type
**All feature guides:**
```bash
grep -l "^template_type: feature-guide" **/*.md
```
**All gear guides needing manufacturer field:**
```bash
grep -l --null "^template_type: gear-guide" **/*.md | xargs -0 grep -L "^manufacturer:"
```
**Count by type:**
```bash
for type in feature-guide conceptual-explainer workflow-guide troubleshooting-guide gear-guide; do
  # tr strips the padding that macOS wc adds around the count
  echo "$type: $(grep -l "^template_type: $type" **/*.md | wc -l | tr -d ' ')"
done
```
### 4. Validate Dates
**Find invalid schema_validated dates:**
```bash
# Should be YYYY-MM-DD format; anchored so trailing junk after the date is caught
grep "^schema_validated:" **/*.md | grep -v -E "schema_validated: [0-9]{4}-[0-9]{2}-[0-9]{2}[[:space:]]*$"
```
**Find files validated before a certain date:**
```bash
# Files validated before template_type was added (this pattern matches 2025-11-01 through 2025-11-07)
grep "^schema_validated: 2025-11-0[1-7]" **/*.md
```
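A character-class pattern like `0[1-7]` only covers one week of one month. Because `YYYY-MM-DD` dates sort correctly as plain strings, an arbitrary cutoff can be handled with a string comparison instead. The function name `validated_before` is hypothetical; this is a sketch, not an existing script in the vault.

```bash
# Print "file: date" for every file whose schema_validated date is a
# well-formed YYYY-MM-DD value that sorts before CUTOFF.
# Usage: validated_before CUTOFF [DIR]
validated_before() {
  local cutoff=$1 dir=${2:-.}
  find "$dir" -name '*.md' -type f -exec awk -v cutoff="$cutoff" '
    /^schema_validated:/ {
      # Appending "" forces a string comparison rather than a numeric one
      if ($2 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ && ($2 "") < (cutoff ""))
        print FILENAME ": " $2
    }
  ' {} +
}
```

For example, `validated_before 2025-11-08` lists files under the current directory that were last validated on 2025-11-07 or earlier.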
## Refactoring Operations
### 1. Add Missing Fields
**Add template_type to untyped files:**
```bash
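# Find feature guide files without template_type
# (sed -i '' is the macOS/BSD form; GNU sed takes -i with no empty argument)
# Reading names line by line keeps paths with spaces intact
grep -l "## What It Does" **/*.md | while IFS= read -r file; do
  grep -q "^template_type:" "$file" && continue
  # Insert after slug field
  sed -i '' '/^slug:/a\
template_type: feature-guide
' "$file"
  echo "Added template_type to $file"
done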
```
**Add schema_validated with current date:**
```bash
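today=$(date +"%Y-%m-%d")
grep -L "^schema_validated:" **/*.md | while IFS= read -r file; do
  # The new line is anchored on template_type, so files without that field are skipped
  grep -q "^template_type:" "$file" || continue
  sed -i '' '/^template_type:/a\
schema_validated: '"$today"'
' "$file"
  echo "Added schema_validated to $file"
done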
```
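The `sed -i ''` form is specific to macOS/BSD. Where the same script also has to run on Linux, one option is to do the insertion with `awk` and a temporary file. The sketch below is an assumption-laden alternative, not the author's method: `add_field_after` is a hypothetical helper, it only touches the frontmatter block, and it does nothing when a line starting with the key already exists anywhere in the file.

```bash
# Insert "KEY: VALUE" after the ANCHOR key inside the frontmatter.
# Usage: add_field_after FILE ANCHOR KEY VALUE
add_field_after() {
  local file=$1 anchor=$2 key=$3 value=$4 tmp
  grep -q "^$key:" "$file" && return 0        # key already present: no change
  tmp=$(mktemp) || return 1
  awk -v anchor="$anchor" -v line="$key: $value" '
    NR == 1 && $0 == "---" { in_fm = 1; print; next }   # opening delimiter
    in_fm && $0 == "---"   { in_fm = 0 }                # closing delimiter
    { print }
    in_fm && !inserted && index($0, anchor ":") == 1 { print line; inserted = 1 }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"            # keep the original inode and permissions
  rm -f "$tmp"
}
```

A call such as `add_field_after "$file" slug template_type feature-guide` can be run repeatedly without producing duplicate lines.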
### 2. Update Field Values
**Update all schema_validated dates after validation:**
```bash
# After validating all files
today=$(date +"%Y-%m-%d")
for file in **/*.md; do
  sed -i '' "s/^schema_validated:.*/schema_validated: $today/" "$file"
done
```
**Fix template_type typos:**
```bash
# Fix "feature" to "feature-guide"
sed -i '' 's/^template_type: feature$/template_type: feature-guide/' **/*.md
```
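In-place edits across a whole vault are hard to undo, so it can be worth previewing a substitution first. A minimal sketch, assuming `diff` is available; `preview_change` is a hypothetical helper name:

```bash
# Show what a sed expression would change in FILE without modifying it.
# Usage: preview_change SED_EXPRESSION FILE
preview_change() {
  local expr=$1 file=$2
  # diff exits with status 1 when the files differ, which is expected here
  sed "$expr" "$file" | diff "$file" - || true
}
```

Lines prefixed with `<` are the current content and lines prefixed with `>` are what the substitution would write. No output means the expression matches nothing in that file.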
### 3. Enforce Field Order
**Standardize frontmatter order:**
```bash
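# Desired order:
# created, updated, edited_seconds, slug, template_type, schema_validated, tasks...
# This requires YAML parsing (use yq v4). sort_keys sorts alphabetically, so it cannot
# produce this order; pick() emits the listed keys in the given order, and merging
# with `* .` appends any remaining keys. --front-matter=process rewrites only the
# YAML block and keeps the Markdown body. Try it on a copy of the vault first.
for file in **/*.md; do
  yq eval -i --front-matter=process \
    'pick(["created", "updated", "edited_seconds", "slug", "template_type", "schema_validated"]) * .' "$file"
done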
```
### 4. Bulk Updates by Template Type
**Add new field to all gear guides:**
```bash
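grep -l "^template_type: gear-guide" **/*.md | while IFS= read -r file; do
  # Add manufacturer_url field if missing; it is inserted after manufacturer,
  # so files without a manufacturer line are left untouched
  if ! grep -q "^manufacturer_url:" "$file" && grep -q "^manufacturer:" "$file"; then
    sed -i '' '/^manufacturer:/a\
manufacturer_url:
' "$file"
    echo "Added manufacturer_url to $file"
  fi
done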
```
## Validation Scripts
### Complete Validation Script
Create `validate-schema.sh`:
```bash
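#!/bin/bash
# Uses find + xargs -0 instead of **/*.md: the recursive glob needs
# `shopt -s globstar` (bash 4+), and the stock macOS /bin/bash is 3.2.
# Null-delimited paths also keep file names with spaces intact.
md_grep() { find . -name '*.md' -type f -print0 | xargs -0 grep "$@"; }
types="feature-guide|conceptual-explainer|workflow-guide|troubleshooting-guide|gear-guide"
issues=0
# Print a section title and its findings; remember whether anything was found
report() {
  echo "$1"
  if [ -n "$2" ]; then
    echo "$2"
    issues=1
  fi
  echo
}
echo "=== Frontmatter Schema Validation ==="
echo
# Check for missing required fields
report "Files missing template_type:" "$(md_grep -L "^template_type:")"
report "Files missing schema_validated:" "$(md_grep -L "^schema_validated:")"
# Validate template_type values (anchored so near-misses like "feature-guides" are caught)
report "Invalid template_type values:" \
  "$(md_grep -H "^template_type:" | grep -v -E "template_type: ($types)[[:space:]]*$")"
# Validate date formats
report "Invalid schema_validated dates:" \
  "$(md_grep -H "^schema_validated:" | grep -v -E "schema_validated: [0-9]{4}-[0-9]{2}-[0-9]{2}[[:space:]]*$")"
# Count files by type
echo "Files by template type:"
for type in ${types//|/ }; do
  count=$(md_grep -l "^template_type: $type[[:space:]]*$" | wc -l | tr -d ' ')
  echo "  $type: $count"
done
echo
echo "=== Validation Complete ==="
# A non-zero exit status lets callers such as a pre-commit hook detect problems
exit "$issues"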
```
**Usage:**
```bash
chmod +x validate-schema.sh
./validate-schema.sh
```
### Validation + Auto-Fix Script
Create `fix-schema.sh`:
```bash
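#!/bin/bash
# Uses find + xargs -0 instead of **/*.md (no globstar in the stock macOS bash 3.2)
# and while-read loops so file names with spaces survive.
# sed -i '' is the macOS/BSD form; GNU sed takes -i with no empty argument.
md_grep() { find . -name '*.md' -type f -print0 | xargs -0 grep "$@"; }
today=$(date +"%Y-%m-%d")
echo "=== Auto-fixing Schema Issues ==="
echo
# Add template_type to feature guides missing it
echo "Adding template_type to feature guides..."
md_grep -l "## What It Does" | while IFS= read -r file; do
  grep -q "^template_type:" "$file" && continue
  sed -i '' '/^slug:/a\
template_type: feature-guide
' "$file"
  echo "  Fixed: $file"
done
# Add schema_validated to all files missing it
echo
echo "Adding schema_validated..."
md_grep -L "^schema_validated:" | while IFS= read -r file; do
  # The line is inserted after template_type, so that field must already exist
  if ! grep -q "^template_type:" "$file"; then
    echo "  Skipped (no template_type): $file"
    continue
  fi
  sed -i '' '/^template_type:/a\
schema_validated: '"$today"'
' "$file"
  echo "  Fixed: $file"
done
echo
echo "=== Auto-fix Complete ==="
echo "Run validate-schema.sh to verify"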
```
## Integration with Git
### Pre-Commit Hook
Create `.git/hooks/pre-commit`:
```bash
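#!/bin/bash
# Run schema validation before commit
report=$(mktemp)
./validate-schema.sh > "$report"
status=$?
# The report's section headers always contain the word "Invalid", so searching
# for that word would block every commit. Test for reported file paths
# (lines containing ".md") or a non-zero exit status instead.
if [ "$status" -ne 0 ] || grep -q '\.md' "$report"; then
  echo "❌ Schema validation failed!"
  cat "$report"
  echo
  echo "Fix issues or run: ./fix-schema.sh"
  rm -f "$report"
  exit 1
fi
rm -f "$report"
echo "✅ Schema validation passed"
exit 0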
```
**Make executable:**
```bash
chmod +x .git/hooks/pre-commit
```
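Validating the whole vault on every commit gets slower as it grows. A hook can instead limit itself to the Markdown files that are staged. The sketch below assumes it runs from the repository root; `staged_markdown` and `check_required_fields` are hypothetical helper names, and the check reads the working-tree copy of each file, which can differ from the staged content after a partial `git add`.

```bash
# List staged (added, copied, or modified) Markdown files, one per line.
staged_markdown() {
  git diff --cached --name-only --diff-filter=ACM -- '*.md'
}

# Read file names on stdin; report each missing required field and
# return 1 if anything was reported, 0 otherwise.
check_required_fields() {
  local status=0 file field
  while IFS= read -r file; do
    for field in template_type schema_validated; do
      if ! grep -q "^$field:" "$file"; then
        echo "$file: missing $field"
        status=1
      fi
    done
  done
  return "$status"
}
```

Inside a hook the two combine as `staged_markdown | check_required_fields || exit 1`.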
### Finding Files Changed Since Last Validation
```bash
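# Files modified after their last schema_validated date
# (stat -f is the macOS/BSD form; on GNU/Linux use: stat -c %y "$file" | cut -d' ' -f1)
for file in **/*.md; do
  validated_date=$(grep -m 1 "^schema_validated:" "$file" | cut -d' ' -f2)
  file_modified=$(stat -f "%Sm" -t "%Y-%m-%d" "$file")
  # ISO dates compare correctly as plain strings; an empty date (never validated) always matches
  if [[ "$file_modified" > "$validated_date" ]]; then
    echo "$file needs revalidation (modified: $file_modified, validated: ${validated_date:-never})"
  fi
done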
```
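`stat` takes different flags on macOS/BSD and GNU/Linux, so a loop like the one above only runs on one of them. The following sketch wraps the difference in a helper and tries the GNU form first; `file_mdate` and `needs_revalidation` are hypothetical names, and files without a `schema_validated` line are treated as needing validation.

```bash
# Print a file's modification date as YYYY-MM-DD on GNU or BSD/macOS systems.
file_mdate() {
  local out
  if out=$(stat -c '%y' "$1" 2>/dev/null); then   # GNU coreutils form
    echo "${out%% *}"                              # keep the date, drop the time
  else                                             # BSD / macOS form
    stat -f '%Sm' -t '%Y-%m-%d' "$1"
  fi
}

# Succeed (status 0) when FILE was modified after its schema_validated date
# or has never been validated.
needs_revalidation() {
  local file=$1 validated modified
  validated=$(grep -m 1 '^schema_validated:' "$file" | cut -d' ' -f2)
  [ -n "$validated" ] || return 0
  modified=$(file_mdate "$file")
  [ "$modified" \> "$validated" ]                  # string comparison of ISO dates
}
```

A typical call is `needs_revalidation "$file" && echo "$file needs revalidation"` inside a loop over the vault.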
## Advanced Validation with yq
**Install yq:**
```bash
# Installs mikefarah/yq (v4), whose syntax is used below
brew install yq
```
### Extract and Validate Frontmatter
**Get all frontmatter as JSON:**
```bash
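# --front-matter=extract makes yq read only the YAML block and ignore the Markdown body
yq eval --front-matter=extract -o=json '.' file.md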
```
**Check if required fields exist:**
```bash
yq eval --front-matter=extract 'has("template_type") and has("schema_validated")' file.md
# Output: true or false
```
**Validate all required fields:**
```bash
for file in **/*.md; do
  if ! yq eval --front-matter=extract 'has("created") and has("slug") and has("template_type")' "$file" | grep -q "true"; then
    echo "Missing required fields: $file"
  fi
done
```
### Bulk Update with yq
**Set schema_validated on all files:**
```bash
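today=$(date +"%Y-%m-%d")
for file in **/*.md; do
  # --front-matter=process updates the YAML block in place and keeps the Markdown body
  yq eval -i --front-matter=process ".schema_validated = \"$today\"" "$file"
done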
```
**Add field conditionally:**
```bash
for file in **/*.md; do
  if ! yq eval --front-matter=extract 'has("template_type")' "$file" | grep -q "true"; then
    yq eval -i --front-matter=process '.template_type = "feature-guide"' "$file"
  fi
done
```
## Performance Comparison
**Scenario:** Validate 500 markdown files
| Method | Time | Cost |
|--------|------|------|
| **AI (GPT-4)** | ~10 min | ~$2.00 (tokens) |
| **bash + grep** | ~2 sec | $0.00 |
| **yq parsing** | ~10 sec | $0.00 |
**Conclusion:** Use programmatic validation for bulk operations. Reserve AI for content quality and pattern detection.
## Best Practices
### 1. Validate After Schema Changes
When you add/remove frontmatter fields:
1. Update templates in `_templates/`
2. Update AGENTS.md documentation
3. Run `validate-schema.sh`
4. Run `fix-schema.sh` if needed
5. Commit changes
### 2. Validate Before Publishing
```bash
# Pre-publish checklist
./validate-schema.sh
# All clear? Publish!
```
### 3. Track Validation in Git
```bash
# Commit validation date updates separately
git add -- '*.md'  # quoted so git, not the shell, matches the pattern at every depth
git commit -m "chore: update schema_validated dates after validation"
```
### 4. Use template_type for Targeted Updates
```bash
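# Example: All feature guides need new "Keyboard Shortcut" section
grep -l "^template_type: feature-guide" **/*.md | while IFS= read -r file; do
  # Check if section exists
  grep -q "^## Keyboard Shortcut" "$file" && continue
  # Add the section after the whole "How to Use It" section, i.e. before the
  # next level-2 heading (or at the end of the file if it is the last section).
  # Appending right after the heading line would put the existing
  # "How to Use It" content under the new heading.
  awk '
    /^## How to Use It/ { in_section = 1; print; next }
    in_section && /^## / { print "## Keyboard Shortcut\n\n**Shortcut:** \n"; in_section = 0 }
    { print }
    END { if (in_section) print "\n## Keyboard Shortcut\n\n**Shortcut:** " }
  ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done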
```
## Related
- [[Template Systems - Templater vs AI Agent Creation]]
- [[Verifying File Changes with Terminal Inspection]]
- `_Nakul/5. Coding Actions/midimaze/Article Structure Patterns.md`