← Back to articles

Programmatic Schema Validation and Refactoring

Path: Computer Tech/Productivity/Obsidian/Programmatic Schema Validation and Refactoring.mdUpdated: 2/3/2026

Programmatic Schema Validation and Refactoring

Why Schema Validation Matters

When building a knowledge base with hundreds of articles, consistency is critical. But manually checking every file for proper frontmatter, correct section order, and template compliance is unsustainable.

Solution: Use bash scripts and command-line tools to validate and refactor files programmatically, saving AI compute for actual content creation.

The Efficiency Argument

AI-driven validation (slow, expensive):

  • AI reads 500 files β†’ 500 LLM calls
  • Token cost scales linearly with vault size
  • Slow iteration cycles

Programmatic validation (fast, free):

  • Bash script reads 500 files β†’ milliseconds
  • Zero token cost
  • Instant feedback loops

Use AI for: Content creation, pattern recognition, writing Use scripts for: Validation, bulk updates, schema enforcement

Core Frontmatter Schema

midimaze uses structured frontmatter with two key tracking fields:

yaml
---
created: 2025-11-08T14:55:58-0800
updated: 2025-11-08T14:55:58-0800
edited_seconds: 0
slug: 3jsvv8n3qmi
template_type: feature-guide           # ← Enables programmatic filtering
schema_validated: 2025-11-08           # ← Tracks last validation
tasks_status: 
tasks_unfinished: 
tasks_completed: 
---

Key Fields

template_type: Pattern used for this article

  • Values: feature-guide, conceptual-explainer, workflow-guide, troubleshooting-guide, gear-guide, etc.
  • Why: Enables bulk operations by category
  • Example: Find all feature guides: grep -l "template_type: feature-guide" **/*.md

schema_validated: Date of last validation

  • Format: YYYY-MM-DD
  • Why: Track which files need re-validation after schema changes
  • Example: Find unvalidated files: grep -L "schema_validated:" **/*.md

Validation Operations

1. Find Files Missing Required Fields

Check for template_type:

bash
# Files missing template_type (old schema)
grep -L "^template_type:" **/*.md

Check for schema_validated:

bash
# Files never validated
grep -L "^schema_validated:" **/*.md

Check for both:

bash
# Combine with logical AND
grep -L "^template_type:" **/*.md | xargs grep -L "^schema_validated:"

2. Validate Field Values

Check for invalid template types:

bash
# Extract all template_type values
grep "^template_type:" **/*.md | sort | uniq

# Expected values:
# template_type: feature-guide
# template_type: conceptual-explainer
# template_type: workflow-guide
# template_type: troubleshooting-guide
# template_type: gear-guide

Find typos:

bash
# Should return empty (no typos)
grep "^template_type:" **/*.md | grep -v -E "(feature-guide|conceptual-explainer|workflow-guide|troubleshooting-guide|gear-guide)"

3. Find Files by Pattern Type

All feature guides:

bash
grep -l "^template_type: feature-guide" **/*.md

All gear guides needing manufacturer field:

bash
grep -l "^template_type: gear-guide" **/*.md | xargs grep -L "^manufacturer:"

Count by type:

bash
for type in feature-guide conceptual-explainer workflow-guide troubleshooting-guide gear-guide; do
  echo "$type: $(grep -l "template_type: $type" **/*.md | wc -l)"
done

4. Validate Dates

Find invalid schema_validated dates:

bash
# Should be YYYY-MM-DD format
grep "^schema_validated:" **/*.md | grep -v -E "[0-9]{4}-[0-9]{2}-[0-9]{2}"

Find files validated before a certain date:

bash
# Files validated before template_type was added
grep "^schema_validated: 2025-11-0[1-7]" **/*.md

Refactoring Operations

1. Add Missing Fields

Add template_type to untyped files:

bash
# Find feature guide files without template_type
for file in $(grep -l "## What It Does" **/*.md | xargs grep -L "^template_type:"); do
  # Insert after slug field
  sed -i '' '/^slug:/a\
template_type: feature-guide
' "$file"
  echo "Added template_type to $file"
done

Add schema_validated with current date:

bash
for file in $(grep -L "^schema_validated:" **/*.md); do
  sed -i '' '/^template_type:/a\
schema_validated: '$(date +"%Y-%m-%d")'
' "$file"
  echo "Added schema_validated to $file"
done

2. Update Field Values

Update all schema_validated dates after validation:

bash
# After validating all files
for file in **/*.md; do
  sed -i '' "s/^schema_validated:.*/schema_validated: $(date +"%Y-%m-%d")/" "$file"
done

Fix template_type typos:

bash
# Fix "feature" to "feature-guide"
sed -i '' 's/^template_type: feature$/template_type: feature-guide/' **/*.md

3. Enforce Field Order

Standardize frontmatter order:

bash
# Desired order:
# created, updated, edited_seconds, slug, template_type, schema_validated, tasks...

# This requires YAML parsing (use yq)
for file in **/*.md; do
  yq eval -i '. | sort_keys' "$file"
done

4. Bulk Updates by Template Type

Add new field to all gear guides:

bash
for file in $(grep -l "template_type: gear-guide" **/*.md); do
  # Add manufacturer_url field if missing
  if ! grep -q "^manufacturer_url:" "$file"; then
    sed -i '' '/^manufacturer:/a\
manufacturer_url: 
' "$file"
    echo "Added manufacturer_url to $file"
  fi
done

Validation Scripts

Complete Validation Script

Create validate-schema.sh:

bash
#!/bin/bash

echo "=== Frontmatter Schema Validation ==="
echo

# Check for missing required fields
echo "Files missing template_type:"
grep -L "^template_type:" **/*.md
echo

echo "Files missing schema_validated:"
grep -L "^schema_validated:" **/*.md
echo

# Validate template_type values
echo "Invalid template_type values:"
grep "^template_type:" **/*.md | grep -v -E "(feature-guide|conceptual-explainer|workflow-guide|troubleshooting-guide|gear-guide)"
echo

# Validate date formats
echo "Invalid schema_validated dates:"
grep "^schema_validated:" **/*.md | grep -v -E "[0-9]{4}-[0-9]{2}-[0-9]{2}"
echo

# Count files by type
echo "Files by template type:"
for type in feature-guide conceptual-explainer workflow-guide troubleshooting-guide gear-guide; do
  count=$(grep -l "template_type: $type" **/*.md 2>/dev/null | wc -l)
  echo "  $type: $count"
done
echo

echo "=== Validation Complete ==="

Usage:

bash
chmod +x validate-schema.sh
./validate-schema.sh

Validation + Auto-Fix Script

Create fix-schema.sh:

bash
#!/bin/bash

echo "=== Auto-fixing Schema Issues ==="
echo

# Add template_type to feature guides missing it
echo "Adding template_type to feature guides..."
for file in $(grep -l "## What It Does" **/*.md | xargs grep -L "^template_type:" 2>/dev/null); do
  sed -i '' '/^slug:/a\
template_type: feature-guide
' "$file"
  echo "  Fixed: $file"
done

# Add schema_validated to all files missing it
echo
echo "Adding schema_validated..."
for file in $(grep -L "^schema_validated:" **/*.md); do
  sed -i '' '/^template_type:/a\
schema_validated: '$(date +"%Y-%m-%d")'
' "$file"
  echo "  Fixed: $file"
done

echo
echo "=== Auto-fix Complete ==="
echo "Run validate-schema.sh to verify"

Integration with Git

Pre-Commit Hook

Create .git/hooks/pre-commit:

bash
#!/bin/bash

# Run schema validation before commit
./validate-schema.sh > /tmp/schema-validation.txt

if grep -q "Invalid" /tmp/schema-validation.txt; then
  echo "❌ Schema validation failed!"
  cat /tmp/schema-validation.txt
  echo
  echo "Fix issues or run: ./fix-schema.sh"
  exit 1
fi

echo "βœ… Schema validation passed"
exit 0

Make executable:

bash
chmod +x .git/hooks/pre-commit

Finding Files Changed Since Last Validation

bash
# Files modified after their last schema_validated date
for file in **/*.md; do
  validated_date=$(grep "^schema_validated:" "$file" | cut -d' ' -f2)
  file_modified=$(stat -f "%Sm" -t "%Y-%m-%d" "$file")
  
  if <span class="wikilink-broken" title="Page not found:  "$file_modified" > "$validated_date" "> "$file_modified" > "$validated_date" </span>; then
    echo "$file needs revalidation (modified: $file_modified, validated: $validated_date)"
  fi
done

Advanced Validation with yq

Install yq:

bash
brew install yq

Extract and Validate Frontmatter

Get all frontmatter as JSON:

bash
yq eval -o=json '.' file.md

Check if required fields exist:

bash
yq eval 'has("template_type") and has("schema_validated")' file.md
# Output: true or false

Validate all required fields:

bash
for file in **/*.md; do
  if ! yq eval 'has("created") and has("slug") and has("template_type")' "$file" | grep -q "true"; then
    echo "Missing required fields: $file"
  fi
done

Bulk Update with yq

Set schema_validated on all files:

bash
for file in **/*.md; do
  yq eval -i ".schema_validated = \"$(date +"%Y-%m-%d")\"" "$file"
done

Add field conditionally:

bash
for file in **/*.md; do
  if ! yq eval 'has("template_type")' "$file" | grep -q "true"; then
    yq eval -i '.template_type = "feature-guide"' "$file"
  fi
done

Performance Comparison

Scenario: Validate 500 markdown files

MethodTimeCost
AI (GPT-4)~10 min~$2.00 (tokens)
bash + grep~2 sec$0.00
yq parsing~10 sec$0.00

Conclusion: Use programmatic validation for bulk operations. Reserve AI for content quality and pattern detection.

Best Practices

1. Validate After Schema Changes

When you add/remove frontmatter fields:

  1. Update templates in _templates/
  2. Update AGENTS.md documentation
  3. Run validate-schema.sh
  4. Run fix-schema.sh if needed
  5. Commit changes

2. Validate Before Publishing

bash
# Pre-publish checklist
./validate-schema.sh
# All clear? Publish!

3. Track Validation in Git

bash
# Commit validation date updates separately
git add **/*.md
git commit -m "chore: update schema_validated dates after validation"

4. Use template_type for Targeted Updates

bash
# Example: All feature guides need new "Keyboard Shortcut" section
for file in $(grep -l "template_type: feature-guide" **/*.md); do
  # Check if section exists
  if ! grep -q "## Keyboard Shortcut" "$file"; then
    # Add section after "How to Use It"
    sed -i '' '/## How to Use It/a\
\
## Keyboard Shortcut\
\
**Shortcut:** \
' "$file"
  fi
done

Related