Programmatic Schema Validation and Refactoring

Path: Computer Tech/Productivity/Obsidian/Programmatic Schema Validation and Refactoring.mdUpdated: 2/3/2026

Programmatic Schema Validation and Refactoring

Why Schema Validation Matters

When building a knowledge base with hundreds of articles, consistency is critical. But manually checking every file for proper frontmatter, correct section order, and template compliance is unsustainable.

Solution: Use bash scripts and command-line tools to validate and refactor files programmatically, saving AI compute for actual content creation.

The Efficiency Argument

AI-driven validation (slow, expensive):

AI reads 500 files → 500 LLM calls
Token cost scales linearly with vault size
Slow iteration cycles

Programmatic validation (fast, free):

Bash script reads 500 files → milliseconds
Zero token cost
Instant feedback loops

Use AI for: Content creation, pattern recognition, writing Use scripts for: Validation, bulk updates, schema enforcement

Core Frontmatter Schema

midimaze uses structured frontmatter with two key tracking fields:

yaml
---
created: 2025-11-08T14:55:58-0800
updated: 2025-11-08T14:55:58-0800
edited_seconds: 0
slug: 3jsvv8n3qmi
template_type: feature-guide           # ← Enables programmatic filtering
schema_validated: 2025-11-08           # ← Tracks last validation
tasks_status: 
tasks_unfinished: 
tasks_completed: 
---

Key Fields

template_type: Pattern used for this article

Values: feature-guide, conceptual-explainer, workflow-guide, troubleshooting-guide, gear-guide, etc.
Why: Enables bulk operations by category
Example: Find all feature guides: grep -l "template_type: feature-guide" **/*.md

schema_validated: Date of last validation

Format: YYYY-MM-DD
Why: Track which files need re-validation after schema changes
Example: Find unvalidated files: grep -L "schema_validated:" **/*.md

Validation Operations

1. Find Files Missing Required Fields

Check for template_type:

bash
# Files missing template_type (old schema)
grep -L "^template_type:" **/*.md

Check for schema_validated:

bash
# Files never validated
grep -L "^schema_validated:" **/*.md

Check for both:

bash
# Combine with logical AND
grep -L "^template_type:" **/*.md | xargs grep -L "^schema_validated:"

2. Validate Field Values

Check for invalid template types:

bash
# Extract all template_type values
grep "^template_type:" **/*.md | sort | uniq

# Expected values:
# template_type: feature-guide
# template_type: conceptual-explainer
# template_type: workflow-guide
# template_type: troubleshooting-guide
# template_type: gear-guide

Find typos:

bash
# Should return empty (no typos)
grep "^template_type:" **/*.md | grep -v -E "(feature-guide|conceptual-explainer|workflow-guide|troubleshooting-guide|gear-guide)"

3. Find Files by Pattern Type

All feature guides:

bash
grep -l "^template_type: feature-guide" **/*.md

All gear guides needing manufacturer field:

bash
grep -l "^template_type: gear-guide" **/*.md | xargs grep -L "^manufacturer:"

Count by type:

bash
for type in feature-guide conceptual-explainer workflow-guide troubleshooting-guide gear-guide; do
  echo "$type: $(grep -l "template_type: $type" **/*.md | wc -l)"
done

4. Validate Dates

Find invalid schema_validated dates:

bash
# Should be YYYY-MM-DD format
grep "^schema_validated:" **/*.md | grep -v -E "[0-9]{4}-[0-9]{2}-[0-9]{2}"

Find files validated before a certain date:

bash
# Files validated before template_type was added
grep "^schema_validated: 2025-11-0[1-7]" **/*.md

Refactoring Operations

1. Add Missing Fields

Add template_type to untyped files:

bash
# Find feature guide files without template_type
for file in $(grep -l "## What It Does" **/*.md | xargs grep -L "^template_type:"); do
  # Insert after slug field
  sed -i '' '/^slug:/a\
template_type: feature-guide
' "$file"
  echo "Added template_type to $file"
done

Add schema_validated with current date:

bash
for file in $(grep -L "^schema_validated:" **/*.md); do
  sed -i '' '/^template_type:/a\
schema_validated: '$(date +"%Y-%m-%d")'
' "$file"
  echo "Added schema_validated to $file"
done

2. Update Field Values

Update all schema_validated dates after validation:

bash
# After validating all files
for file in **/*.md; do
  sed -i '' "s/^schema_validated:.*/schema_validated: $(date +"%Y-%m-%d")/" "$file"
done

Fix template_type typos:

bash
# Fix "feature" to "feature-guide"
sed -i '' 's/^template_type: feature$/template_type: feature-guide/' **/*.md

3. Enforce Field Order

Standardize frontmatter order:

bash
# Desired order:
# created, updated, edited_seconds, slug, template_type, schema_validated, tasks...

# This requires YAML parsing (use yq)
for file in **/*.md; do
  yq eval -i '. | sort_keys' "$file"
done

4. Bulk Updates by Template Type

Add new field to all gear guides:

bash
for file in $(grep -l "template_type: gear-guide" **/*.md); do
  # Add manufacturer_url field if missing
  if ! grep -q "^manufacturer_url:" "$file"; then
    sed -i '' '/^manufacturer:/a\
manufacturer_url: 
' "$file"
    echo "Added manufacturer_url to $file"
  fi
done

Validation Scripts

Complete Validation Script

Create validate-schema.sh:

bash
#!/bin/bash

echo "=== Frontmatter Schema Validation ==="
echo

# Check for missing required fields
echo "Files missing template_type:"
grep -L "^template_type:" **/*.md
echo

echo "Files missing schema_validated:"
grep -L "^schema_validated:" **/*.md
echo

# Validate template_type values
echo "Invalid template_type values:"
grep "^template_type:" **/*.md | grep -v -E "(feature-guide|conceptual-explainer|workflow-guide|troubleshooting-guide|gear-guide)"
echo

# Validate date formats
echo "Invalid schema_validated dates:"
grep "^schema_validated:" **/*.md | grep -v -E "[0-9]{4}-[0-9]{2}-[0-9]{2}"
echo

# Count files by type
echo "Files by template type:"
for type in feature-guide conceptual-explainer workflow-guide troubleshooting-guide gear-guide; do
  count=$(grep -l "template_type: $type" **/*.md 2>/dev/null | wc -l)
  echo "  $type: $count"
done
echo

echo "=== Validation Complete ==="

Usage:

bash
chmod +x validate-schema.sh
./validate-schema.sh

Validation + Auto-Fix Script

Create fix-schema.sh:

bash
#!/bin/bash

echo "=== Auto-fixing Schema Issues ==="
echo

# Add template_type to feature guides missing it
echo "Adding template_type to feature guides..."
for file in $(grep -l "## What It Does" **/*.md | xargs grep -L "^template_type:" 2>/dev/null); do
  sed -i '' '/^slug:/a\
template_type: feature-guide
' "$file"
  echo "  Fixed: $file"
done

# Add schema_validated to all files missing it
echo
echo "Adding schema_validated..."
for file in $(grep -L "^schema_validated:" **/*.md); do
  sed -i '' '/^template_type:/a\
schema_validated: '$(date +"%Y-%m-%d")'
' "$file"
  echo "  Fixed: $file"
done

echo
echo "=== Auto-fix Complete ==="
echo "Run validate-schema.sh to verify"

Integration with Git

Pre-Commit Hook

Create .git/hooks/pre-commit:

bash
#!/bin/bash

# Run schema validation before commit
./validate-schema.sh > /tmp/schema-validation.txt

if grep -q "Invalid" /tmp/schema-validation.txt; then
  echo "❌ Schema validation failed!"
  cat /tmp/schema-validation.txt
  echo
  echo "Fix issues or run: ./fix-schema.sh"
  exit 1
fi

echo "✅ Schema validation passed"
exit 0

Make executable:

bash
chmod +x .git/hooks/pre-commit

Finding Files Changed Since Last Validation

bash
# Files modified after their last schema_validated date
for file in **/*.md; do
  validated_date=$(grep "^schema_validated:" "$file" | cut -d' ' -f2)
  file_modified=$(stat -f "%Sm" -t "%Y-%m-%d" "$file")
  
  if <span class="wikilink-broken" title="Page not found:  "$file_modified" > "$validated_date" "> "$file_modified" > "$validated_date" </span>; then
    echo "$file needs revalidation (modified: $file_modified, validated: $validated_date)"
  fi
done

Advanced Validation with yq

Install yq:

bash
brew install yq

Extract and Validate Frontmatter

Get all frontmatter as JSON:

bash
yq eval -o=json '.' file.md

Check if required fields exist:

bash
yq eval 'has("template_type") and has("schema_validated")' file.md
# Output: true or false

Validate all required fields:

bash
for file in **/*.md; do
  if ! yq eval 'has("created") and has("slug") and has("template_type")' "$file" | grep -q "true"; then
    echo "Missing required fields: $file"
  fi
done

Bulk Update with yq

Set schema_validated on all files:

bash
for file in **/*.md; do
  yq eval -i ".schema_validated = \"$(date +"%Y-%m-%d")\"" "$file"
done

Add field conditionally:

bash
for file in **/*.md; do
  if ! yq eval 'has("template_type")' "$file" | grep -q "true"; then
    yq eval -i '.template_type = "feature-guide"' "$file"
  fi
done

Performance Comparison

Scenario: Validate 500 markdown files

Method	Time	Cost
AI (GPT-4)	~10 min	~$2.00 (tokens)
bash + grep	~2 sec	$0.00
yq parsing	~10 sec	$0.00

Conclusion: Use programmatic validation for bulk operations. Reserve AI for content quality and pattern detection.

Best Practices

1. Validate After Schema Changes

When you add/remove frontmatter fields:

Update templates in _templates/
Update AGENTS.md documentation
Run validate-schema.sh
Run fix-schema.sh if needed
Commit changes

2. Validate Before Publishing

bash
# Pre-publish checklist
./validate-schema.sh
# All clear? Publish!

3. Track Validation in Git

bash
# Commit validation date updates separately
git add **/*.md
git commit -m "chore: update schema_validated dates after validation"

4. Use template_type for Targeted Updates

bash
# Example: All feature guides need new "Keyboard Shortcut" section
for file in $(grep -l "template_type: feature-guide" **/*.md); do
  # Check if section exists
  if ! grep -q "## Keyboard Shortcut" "$file"; then
    # Add section after "How to Use It"
    sed -i '' '/## How to Use It/a\
\
## Keyboard Shortcut\
\
**Shortcut:** \
' "$file"
  fi
done

Template Systems - Templater vs AI Agent Creation
Verifying File Changes with Terminal Inspection
_Nakul/5. Coding Actions/midimaze/Article Structure Patterns.md

Programmatic Schema Validation and Refactoring

Programmatic Schema Validation and Refactoring

Why Schema Validation Matters

The Efficiency Argument

Core Frontmatter Schema

Key Fields

Validation Operations

1. Find Files Missing Required Fields

2. Validate Field Values

3. Find Files by Pattern Type

4. Validate Dates

Refactoring Operations

1. Add Missing Fields

2. Update Field Values

3. Enforce Field Order

4. Bulk Updates by Template Type

Validation Scripts

Complete Validation Script

Validation + Auto-Fix Script

Integration with Git

Pre-Commit Hook

Finding Files Changed Since Last Validation

Advanced Validation with yq

Extract and Validate Frontmatter

Bulk Update with yq

Performance Comparison

Best Practices

1. Validate After Schema Changes

2. Validate Before Publishing

3. Track Validation in Git

4. Use template_type for Targeted Updates

Related