
GPS Dataset Architecture with Convex

Updated: 2/3/2026


Problem Statement

Challenge: How to manage a growing catalog of GPS-tagged objects (field recordings, venue locations, water infrastructure, monitoring sites) with rich metadata, semantic search, and integration with Google Maps?

Requirements:

  1. Store GPS coordinates with extensive metadata (equipment, dates, conditions, notes)
  2. Support multiple dataset types (audio recordings, venues, infrastructure)
  3. Enable semantic search ("find recordings with heavy bass near the border")
  4. Sync with Google Maps Datasets API for visualization
  5. Version control via MDX/GeoJSON files
  6. Real-time collaboration and updates
  7. Integrate with existing NNT ecosystem (share song/audio data)

Solution: Convex database with structured schema, Google Maps Datasets API for rendering, dual persistence (Convex + GeoJSON files), semantic search via embeddings.

Architecture Overview

┌─────────────────────────────────────────────────────┐
│  Convex Database                                    │
│  • gpsDatasets - Dataset collections                │
│  • gpsObjects - Individual GPS points/geometries    │
│  • gpsObjectEmbeddings - Semantic search vectors    │
│  • Related: songs, venues, infrastructure           │
└─────────────────────────────────────────────────────┘
                    ↕ (bidirectional sync)
┌─────────────────────────────────────────────────────┐
│  Google Maps Datasets API                           │
│  • Spatial indexing and rendering                   │
│  • Vector tiles for performance                     │
│  • Integration with Maps JavaScript API             │
└─────────────────────────────────────────────────────┘
                    ↕ (version control)
┌─────────────────────────────────────────────────────┐
│  GeoJSON Files (Git)                                │
│  • Source of truth for GPS data                     │
│  • Stored in vault: _Nakul/5. Coding Actions/       │
│  • Exported from Convex periodically                │
└─────────────────────────────────────────────────────┘

Convex Schema

Core Tables

typescript
// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  // GPS Dataset collections (e.g., "Tijuana Field Recordings 2025")
  gpsDatasets: defineTable({
    name: v.string(),
    description: v.string(),
    datasetType: v.union(
      v.literal("audio_recordings"),
      v.literal("music_venues"),
      v.literal("water_infrastructure"),
      v.literal("monitoring_sites"),
      v.literal("custom")
    ),
    
    // Google Maps Datasets API integration
    googleDatasetId: v.optional(v.string()),  // From Maps Datasets API
    lastSyncedAt: v.optional(v.number()),
    
    // File locations
    geojsonPath: v.string(),  // Path in vault: _Nakul/5. Coding Actions/...
    
    // Metadata
    createdBy: v.string(),
    createdAt: v.number(),
    updatedAt: v.number(),
    isPublic: v.boolean()  // Share on midimaze.com?
  })
    .index("by_name", ["name"])
    .index("by_type", ["datasetType"])
    .index("by_google_id", ["googleDatasetId"]),
  
  // Individual GPS objects (points, lines, polygons)
  gpsObjects: defineTable({
    datasetId: v.id("gpsDatasets"),
    
    // GeoJSON geometry
    geometry: v.object({
      type: v.union(
        v.literal("Point"),
        v.literal("LineString"),
        v.literal("Polygon"),
        v.literal("MultiPoint"),
        v.literal("MultiLineString"),
        v.literal("MultiPolygon")
      ),
      coordinates: v.any()  // Array structure varies by type
    }),
    
    // Common properties
    name: v.string(),
    description: v.optional(v.string()),
    
    // Type-specific properties
    properties: v.any(),  // Flexible JSON object
    
    // Relationships
    songId: v.optional(v.id("songs")),  // Link to NNT song if audio recording
    venueId: v.optional(v.string()),    // Link to venue database
    
    // Metadata
    createdAt: v.number(),
    updatedAt: v.number(),
    tags: v.array(v.string())
  })
    .index("by_dataset", ["datasetId"])
    .index("by_song", ["songId"])
    .index("by_tags", ["tags"]),  // Indexes the whole array as one value; it can't look up objects by a single tag
  
  // Embeddings for semantic search
  gpsObjectEmbeddings: defineTable({
    objectId: v.id("gpsObjects"),
    datasetId: v.id("gpsDatasets"),
    
    // Vector embedding
    embedding: v.array(v.float64()),  // 1536 dimensions (OpenAI)
    
    // Searchable text (concatenated from name, description, properties)
    searchText: v.string(),
    
    // Metadata for filtering
    metadata: v.object({
      name: v.string(),
      datasetType: v.string(),
      coordinates: v.array(v.float64()),  // [lng, lat] for distance sorting
      tags: v.array(v.string())
    }),
    
    createdAt: v.number()
  })
    .index("by_object", ["objectId"])
    .index("by_dataset", ["datasetId"]),
  
  // Existing NNT tables (integrate with)
  songs: defineTable({
    title: v.string(),
    artist: v.string(),
    album: v.optional(v.string()),
    year: v.optional(v.number()),
    appleMusicId: v.optional(v.string()),
    spotifyId: v.optional(v.string()),
    youtubeId: v.optional(v.string()),
    duration: v.optional(v.number()),
    createdAt: v.number(),
    updatedAt: v.number()
  })
    .index("by_title_artist", ["title", "artist"])
    .index("by_apple_music", ["appleMusicId"])
});
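Because `coordinates` is declared as `v.any()`, the schema accepts any shape, including a swapped `[lat, lng]` pair. Below is a minimal guard you could call at the top of a mutation handler, written as plain TypeScript. The helper names are hypothetical. The check covers only nesting depth and coordinate ranges, using the GeoJSON `[longitude, latitude]` order that this article follows throughout.

```typescript
// Hypothetical helpers: validate GeoJSON coordinates before inserting into gpsObjects.
// GeoJSON orders each position as [longitude, latitude].
type LngLatPosition = number[];

function isValidPosition(p: unknown): p is LngLatPosition {
  if (!Array.isArray(p) || p.length < 2) return false;
  const [lng, lat] = p;
  return (
    typeof lng === "number" && typeof lat === "number" &&
    Number.isFinite(lng) && Number.isFinite(lat) &&
    lng >= -180 && lng <= 180 &&
    lat >= -90 && lat <= 90
  );
}

// Checks the array nesting expected for each geometry type and the range of every position.
// It does not enforce minimum lengths or polygon ring closure (empty arrays pass).
function isValidGeometry(type: string, coords: unknown): boolean {
  const every = (xs: unknown, f: (x: unknown) => boolean): boolean =>
    Array.isArray(xs) && xs.every(f);
  switch (type) {
    case "Point":
      return isValidPosition(coords);
    case "MultiPoint":
    case "LineString":
      return every(coords, isValidPosition);
    case "MultiLineString":
    case "Polygon":
      return every(coords, ring => every(ring, isValidPosition));
    case "MultiPolygon":
      return every(coords, poly => every(poly, ring => every(ring, isValidPosition)));
    default:
      return false;
  }
}
```

A swapped Tijuana point such as `[32.5327, -117.0363]` fails this check because -117 is outside the latitude range. A swap near the equator could still slip through, so this is a sanity check, not proof of correct ordering.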

Dataset Type Schemas

Audio Recordings Dataset

Properties structure:

typescript
{
  // Recording metadata
  recording_date: "2025-11-01T21:30:00Z",
  equipment: "Zoom H6, Rode NTG3",
  duration: "45:32",
  
  // Audio files
  audio_url: "https://audio.example.com/tj-jazz-2025-11-01.wav",
  waveform_url: "https://audio.example.com/tj-jazz-2025-11-01-waveform.png",
  
  // Recording conditions
  weather: "Clear, 18°C, light winds",
  ambient_noise: "moderate",
  time_of_day: "evening",
  
  // Environmental data (from Google APIs)
  uaqi: 42,  // Universal Air Quality Index (Google Air Quality API)
  pollen_level: "low",
  
  // Technical metadata
  sample_rate: 48000,
  bit_depth: 24,
  channels: 2,
  file_size_mb: 234,
  
  // Notes
  notes: "Live jazz trio, moderate audience noise",
  transcript: "Optional voice annotation transcript"
}
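Here `duration` is a display string (`"45:32"`), while the `songs` table stores `duration` as an optional number. A small parser helps if you need to compare or copy the two. This sketch assumes `songs.duration` holds seconds, which the schema does not actually state. The function name is hypothetical.

```typescript
// Hypothetical helper: converts "mm:ss" or "hh:mm:ss" to whole seconds.
// Returns null for malformed input instead of throwing.
function parseDurationToSeconds(text: string): number | null {
  const parts = text.trim().split(":");
  if (parts.length < 2 || parts.length > 3) return null;
  if (!parts.every(p => /^\d+$/.test(p))) return null;
  const nums = parts.map(Number);
  // Every field after the first (minutes, seconds) must be below 60
  if (nums.slice(1).some(n => n >= 60)) return null;
  // Horner-style accumulation: ((h * 60) + m) * 60 + s
  return nums.reduce((total, n) => total * 60 + n, 0);
}
```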

Example Convex object:

typescript
{
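  // _id and datasetId are illustrative; real Convex IDs are opaque strings returned by ctx.db.insert
  _id: "abc123",
  datasetId: "tijuana-recordings-2025",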
  geometry: {
    type: "Point",
    coordinates: [-117.0363, 32.5327]  // [longitude, latitude]
  },
  name: "Tijuana Jazz Club - Friday Night Set",
  description: "Live jazz trio performance, intimate atmosphere",
  properties: {
    recording_date: "2025-11-01T21:30:00Z",
    equipment: "Zoom H6, Rode NTG3",
    duration: "45:32",
    audio_url: "https://audio.example.com/tj-jazz-2025-11-01.wav",
    weather: "Clear, 18°C, light winds",
    uaqi: 42,
    notes: "Live jazz trio, moderate audience noise"
  },
  songId: "song_xyz789",  // Link to songs table
  tags: ["jazz", "live", "tijuana", "trio"],
  createdAt: 1762032600000,  // ms since epoch, as Date.now() returns (2025-11-01T21:30:00Z)
  updatedAt: 1762032600000
}

Music Venues Dataset

Properties structure:

typescript
{
  // Venue info
  address: "Av. Revolución 1006, Centro, Tijuana",
  capacity: 150,
  genres: ["jazz", "blues", "soul"],
  
  // Contact
  phone: "663 438 3418",
  website: "https://tijuanajazzclub.com",
  email: "[email protected]",
  
  // Hours (JSON)
  hours: {
    monday: "18:30-00:30",
    tuesday: "closed",
    // ...
  },
  
  // Equipment
  sound_system: "QSC K12.2 speakers, Yamaha MG16 mixer",
  stage_size: "12ft x 8ft",
  has_piano: true,
  has_drums: true,
  
  // Reviews
  google_rating: 4.8,
  google_reviews: 127,
  
  // Venue network analysis
  nearest_border_crossing: "San Ysidro",
  distance_to_border_km: 5.2
}
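The `hours` values are `"HH:MM-HH:MM"` strings. A range such as `"18:30-00:30"` runs past midnight, so an end time earlier than the start belongs to the next day. Below is a sketch of an open-at check under that assumed format. The helpers are hypothetical, and both `"closed"` and a missing day count as closed.

```typescript
type WeeklyHours = Partial<Record<string, string>>; // e.g. { monday: "18:30-00:30", tuesday: "closed" }

// Index 0 = Sunday, matching Date.prototype.getDay()
const DAYS = ["sunday", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday"];

function toMinutes(hhmm: string): number {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// Returns [start, end] in minutes after midnight, or null if closed or unparseable
function parseRange(range: string | undefined): [number, number] | null {
  if (!range || range === "closed") return null;
  const match = /^(\d{2}:\d{2})-(\d{2}:\d{2})$/.exec(range);
  if (!match) return null;
  return [toMinutes(match[1]), toMinutes(match[2])];
}

// minute: 0..1439 within the given day
function isOpenAt(hours: WeeklyHours, dayIndex: number, minute: number): boolean {
  const today = parseRange(hours[DAYS[dayIndex]]);
  if (today) {
    const [start, end] = today;
    // Same-day range, or the pre-midnight part of an overnight range
    if (end > start ? minute >= start && minute < end : minute >= start) return true;
  }
  // The previous day's overnight range spills into the early hours of today
  const yesterday = parseRange(hours[DAYS[(dayIndex + 6) % 7]]);
  if (yesterday) {
    const [start, end] = yesterday;
    if (end <= start && minute < end) return true;
  }
  return false;
}
```

With the example hours, Tuesday at 00:15 counts as open because Monday's set is still running, even though Tuesday itself is `"closed"`.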

Water Infrastructure Dataset

Properties structure:

typescript
// Fields vary by feature_type, so most are optional; example values in comments
type WaterInfrastructureProperties = {
  // Infrastructure type
  feature_type: "sewer_main" | "manhole" | "pump_station" | "treatment_plant";

  // Sewer line properties (LineString)
  line_id?: string;             // e.g. "TJ-SW-1234"
  diameter_mm?: number;         // e.g. 450
  material?: string;            // e.g. "PVC"
  install_date?: string;        // e.g. "2010-06-15"
  last_inspection?: string;     // e.g. "2025-08-20"
  condition?: "good" | "moderate_wear" | "needs_repair" | "critical";
  flow_direction?: "gravity" | "pressurized";

  // Manhole properties (Point)
  manhole_id?: string;          // e.g. "MH-0423"
  depth_m?: number;             // e.g. 3.2
  access_type?: "street" | "alley" | "private";

  // Treatment plant properties (Polygon)
  plant_id?: string;            // e.g. "WWTP-La-Morita"
  capacity_mgd?: number;        // Million gallons per day, e.g. 25
  service_population?: number;  // e.g. 150000
  treatment_level?: "secondary" | "tertiary";

  // Maintenance
  priority?: "routine" | "medium" | "high" | "urgent";
  next_inspection_date?: string;  // e.g. "2026-02-15"
  maintenance_notes?: string;     // e.g. "Sediment buildup, schedule cleaning"
};
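The `priority` and `next_inspection_date` fields lend themselves to a maintenance work queue. Below is a sketch that orders items by priority, then by earliest inspection date. The numeric ranking of the four priority levels and the function name are illustrative assumptions.

```typescript
type MaintenancePriority = "routine" | "medium" | "high" | "urgent";

interface MaintenanceItem {
  id: string;
  priority: MaintenancePriority;
  next_inspection_date: string; // ISO date, e.g. "2026-02-15"
}

// Assumed ranking: higher number = more pressing
const PRIORITY_RANK: Record<MaintenancePriority, number> = { routine: 0, medium: 1, high: 2, urgent: 3 };

// Returns a new array: most pressing priority first, then earliest inspection date.
// ISO "YYYY-MM-DD" strings compare chronologically with plain < and >.
function maintenanceQueue(items: MaintenanceItem[]): MaintenanceItem[] {
  return [...items].sort((a, b) => {
    const byPriority = PRIORITY_RANK[b.priority] - PRIORITY_RANK[a.priority];
    if (byPriority !== 0) return byPriority;
    if (a.next_inspection_date < b.next_inspection_date) return -1;
    if (a.next_inspection_date > b.next_inspection_date) return 1;
    return 0;
  });
}
```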

Key Mutations

Create Dataset

typescript
// convex/gpsDatasets.ts
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const createDataset = mutation({
  args: {
    name: v.string(),
    description: v.string(),
    datasetType: v.union(
      v.literal("audio_recordings"),
      v.literal("music_venues"),
      v.literal("water_infrastructure"),
      v.literal("monitoring_sites"),
      v.literal("custom")
    ),
    geojsonPath: v.string(),
    isPublic: v.boolean()
  },
  handler: async (ctx, args) => {
    // ctx.auth exposes getUserIdentity(), not a userId property
    const identity = await ctx.auth.getUserIdentity();
    const now = Date.now();

    const datasetId = await ctx.db.insert("gpsDatasets", {
      name: args.name,
      description: args.description,
      datasetType: args.datasetType,
      geojsonPath: args.geojsonPath,
      createdBy: identity?.subject ?? "nakul",  // Fall back when unauthenticated
      createdAt: now,
      updatedAt: now,
      isPublic: args.isPublic
    });
    
    return datasetId;
  }
});
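`createDataset` takes `geojsonPath` as a free-form string. If you follow the naming in the vault listing under File Storage Structure (lowercase, hyphen-separated names), you can derive the path from the dataset name. The base directory comes from that listing. The slug rules and function names are assumptions.

```typescript
// Base directory taken from the vault listing in this article
const GPS_DATASET_DIR = "_Nakul/5. Coding Actions/midimaze/gps-datasets";

// Assumed slug rules: strip accents, lowercase, collapse runs of other characters to "-"
function slugify(datasetName: string): string {
  return datasetName
    .normalize("NFD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accent marks ("ó" becomes "o")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");        // trim leading/trailing hyphens
}

function geojsonPathFor(datasetName: string): string {
  return `${GPS_DATASET_DIR}/${slugify(datasetName)}.geojson`;
}
```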

Add GPS Object

typescript
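import { mutation } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

// Same geometry shape as the schema, so the insert type-checks against gpsObjects
const geometryValidator = v.object({
  type: v.union(
    v.literal("Point"),
    v.literal("LineString"),
    v.literal("Polygon"),
    v.literal("MultiPoint"),
    v.literal("MultiLineString"),
    v.literal("MultiPolygon")
  ),
  coordinates: v.any()  // Array structure varies by type
});

export const addGPSObject = mutation({
  args: {
    datasetId: v.id("gpsDatasets"),
    geometry: geometryValidator,
    name: v.string(),
    description: v.optional(v.string()),
    properties: v.any(),
    songId: v.optional(v.id("songs")),
    tags: v.array(v.string())
  },
  handler: async (ctx, args) => {
    const now = Date.now();

    // Insert GPS object
    const objectId = await ctx.db.insert("gpsObjects", {
      datasetId: args.datasetId,
      geometry: args.geometry,
      name: args.name,
      description: args.description,
      properties: args.properties,
      songId: args.songId,
      tags: args.tags,
      createdAt: now,
      updatedAt: now
    });
    
    // Generate embedding asynchronously (background job). The scheduler takes a
    // function reference; this assumes generateEmbedding lives in convex/gpsDatasets.ts
    await ctx.scheduler.runAfter(0, internal.gpsDatasets.generateEmbedding, {
      objectId
    });
    
    return objectId;
  }
});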

Generate Embedding (Background Job)

typescript
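import OpenAI from "openai";
import { internalAction, internalMutation, internalQuery } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

// Mutations cannot make network requests, so the OpenAI call runs in an internal
// action; reads and writes go through internal query/mutation helpers.
// (Assumes all three functions live in convex/gpsDatasets.ts.)
export const getObjectForEmbedding = internalQuery({
  args: { objectId: v.id("gpsObjects") },
  handler: async (ctx, args) => {
    const object = await ctx.db.get(args.objectId);
    if (!object) return null;
    const dataset = await ctx.db.get(object.datasetId);
    if (!dataset) return null;
    return { object, datasetType: dataset.datasetType };
  }
});

export const generateEmbedding = internalAction({
  args: { objectId: v.id("gpsObjects") },
  handler: async (ctx, args): Promise<void> => {
    const data = await ctx.runQuery(internal.gpsDatasets.getObjectForEmbedding, args);
    if (!data) return;
    const { object, datasetType } = data;

    // Concatenate searchable text
    const searchText = [
      object.name,
      object.description || "",
      JSON.stringify(object.properties),
      object.tags.join(" ")
    ].join(" ");

    // Generate embedding via OpenAI
    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    const response = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: searchText
    });
    const embedding = response.data[0].embedding;

    // Only Point geometries have a single [lng, lat]; other types fall back to [0, 0]
    const coordinates: number[] =
      object.geometry.type === "Point" ? object.geometry.coordinates : [0, 0];

    await ctx.runMutation(internal.gpsDatasets.storeEmbedding, {
      objectId: args.objectId,
      datasetId: object.datasetId,
      embedding,
      searchText,
      metadata: { name: object.name, datasetType, coordinates, tags: object.tags }
    });
  }
});

// Store embedding
export const storeEmbedding = internalMutation({
  args: {
    objectId: v.id("gpsObjects"),
    datasetId: v.id("gpsDatasets"),
    embedding: v.array(v.float64()),
    searchText: v.string(),
    metadata: v.object({
      name: v.string(),
      datasetType: v.string(),
      coordinates: v.array(v.float64()),
      tags: v.array(v.string())
    })
  },
  handler: async (ctx, args) => {
    await ctx.db.insert("gpsObjectEmbeddings", { ...args, createdAt: Date.now() });
  }
});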
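`JSON.stringify(object.properties)` feeds braces, quotes, and URLs into the embedding input. One alternative flattens properties into readable `key: value` phrases and skips fields ending in `_url`. Treat it as an untested assumption, not a measured improvement. The function names are hypothetical.

```typescript
// Hypothetical alternative to JSON.stringify when building searchText.
// { equipment: "Zoom H6", weather: { temp_c: 18 } } becomes
// ["equipment: Zoom H6", "weather temp c: 18"].
function flattenProperties(props: Record<string, unknown>, prefix = ""): string[] {
  const phrases: string[] = [];
  for (const [key, value] of Object.entries(props)) {
    // Skip empty values and URL fields (assumed to add little meaning to an embedding)
    if (value === null || value === undefined || key.endsWith("_url")) continue;
    const label = (prefix ? prefix + " " : "") + key.replace(/_/g, " ");
    if (Array.isArray(value)) {
      phrases.push(`${label}: ${value.join(", ")}`);
    } else if (typeof value === "object") {
      // Nested objects are flattened with the parent key as a prefix
      phrases.push(...flattenProperties(value as Record<string, unknown>, label));
    } else {
      phrases.push(`${label}: ${String(value)}`);
    }
  }
  return phrases;
}

function buildSearchText(obj: {
  name: string;
  description?: string;
  properties: Record<string, unknown>;
  tags: string[];
}): string {
  return [obj.name, obj.description ?? "", ...flattenProperties(obj.properties), obj.tags.join(" ")]
    .filter(part => part.length > 0)
    .join(". ");
}
```

Changing the text format changes the vectors, so existing embeddings would need regenerating before old and new results are comparable.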

Key Queries

Get Dataset with Objects

typescript
export const getDatasetWithObjects = query({
  args: { datasetId: v.id("gpsDatasets") },
  handler: async (ctx, args) => {
    const dataset = await ctx.db.get(args.datasetId);
    if (!dataset) return null;
    
    const objects = await ctx.db
      .query("gpsObjects")
      .withIndex("by_dataset", q => q.eq("datasetId", args.datasetId))
      .collect();
    
    return { dataset, objects };
  }
});

Semantic Search

typescript
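import OpenAI from "openai";
import { action, internalQuery } from "./_generated/server";
import { internal } from "./_generated/api";
import { Doc } from "./_generated/dataModel";
import { v } from "convex/values";

// Queries cannot make network requests, so the OpenAI call runs in an action
export const semanticSearch = action({
  args: {
    query: v.string(),
    datasetId: v.optional(v.id("gpsDatasets")),
    limit: v.optional(v.number())
  },
  handler: async (
    ctx,
    args
  ): Promise<{ object: Doc<"gpsObjects"> | null; similarity: number }[]> => {
    // Generate query embedding
    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    const response = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: args.query
    });
    const queryEmbedding = response.data[0].embedding;

    // Get all embeddings (optionally filtered by dataset) via an internal query
    const allEmbeddings = await ctx.runQuery(internal.gpsDatasets.listEmbeddings, {
      datasetId: args.datasetId
    });

    // Score every embedding, sort by similarity, and keep the top results
    const results = allEmbeddings
      .map(doc => ({
        objectId: doc.objectId,
        similarity: cosineSimilarity(queryEmbedding, doc.embedding)
      }))
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, args.limit ?? 10);

    // Fetch full GPS objects
    const objects = await ctx.runQuery(internal.gpsDatasets.getObjectsByIds, {
      ids: results.map(r => r.objectId)
    });

    return results.map((r, i) => ({ object: objects[i], similarity: r.similarity }));
  }
});

export const listEmbeddings = internalQuery({
  args: { datasetId: v.optional(v.id("gpsDatasets")) },
  handler: async (ctx, { datasetId }) => {
    if (datasetId === undefined) {
      return await ctx.db.query("gpsObjectEmbeddings").collect();
    }
    const id = datasetId;  // Narrowed copy for use inside the index callback
    return await ctx.db
      .query("gpsObjectEmbeddings")
      .withIndex("by_dataset", q => q.eq("datasetId", id))
      .collect();
  }
});

export const getObjectsByIds = internalQuery({
  args: { ids: v.array(v.id("gpsObjects")) },
  handler: async (ctx, { ids }) => Promise.all(ids.map(id => ctx.db.get(id)))
});

// Helper function (returns 0 if either vector has zero magnitude)
function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  if (magnitudeA === 0 || magnitudeB === 0) return 0;
  return dotProduct / (magnitudeA * magnitudeB);
}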
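`cosineSimilarity` recomputes both magnitudes for every stored vector on every search. If you scale each embedding to unit length once, before storing it, cosine similarity reduces to a plain dot product. Here is a sketch with hypothetical helper names.

```typescript
// Scale a vector to unit length (a zero vector is returned unchanged)
function normalize(vec: number[]): number[] {
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec.slice() : vec.map(x => x / norm);
}

// For unit-length inputs, the dot product equals cosine similarity
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}
```

Under this approach you would normalize each embedding before inserting it into `gpsObjectEmbeddings` and normalize the query embedding once per search. The ranking stays the same; only the per-vector work drops.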

Spatial Queries (Bounding Box)

typescript
export const getObjectsInBounds = query({
  args: {
    datasetId: v.id("gpsDatasets"),
    bounds: v.object({
      north: v.number(),
      south: v.number(),
      east: v.number(),
      west: v.number()
    })
  },
  handler: async (ctx, args) => {
    const objects = await ctx.db
      .query("gpsObjects")
      .withIndex("by_dataset", q => q.eq("datasetId", args.datasetId))
      .collect();
    
    // Filter Point geometries by bounding box; other geometry types are returned unfiltered
    return objects.filter(obj => {
      if (obj.geometry.type !== "Point") return true;
      const [lng, lat] = obj.geometry.coordinates;
      return (
        lat >= args.bounds.south &&
        lat <= args.bounds.north &&
        lng >= args.bounds.west &&
        lng <= args.bounds.east
      );
    });
  }
});
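The bounds check in `getObjectsInBounds` assumes `west <= east`. A viewport that crosses the antimeridian (±180° longitude) has `west > east`, and the simple comparison then rejects every point. Below is a sketch of a check that handles both cases, using the same `bounds` shape. The function name is hypothetical.

```typescript
interface ViewportBounds { north: number; south: number; east: number; west: number }

// True if [lng, lat] is inside the box, including boxes that cross ±180° longitude
function pointInBounds(lng: number, lat: number, b: ViewportBounds): boolean {
  if (lat < b.south || lat > b.north) return false;
  if (b.west <= b.east) {
    return lng >= b.west && lng <= b.east;
  }
  // Crossing the antimeridian: the box covers [west, 180] and [-180, east]
  return lng >= b.west || lng <= b.east;
}
```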

Integration with NNT Ecosystem

Linking Field Recordings to Songs

When you record audio at a venue, you can link the GPS object to an existing song in your NNT ecosystem:

typescript
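import { useMutation } from "convex/react";
import { api } from "../convex/_generated/api";  // Path depends on project layout

// 1. Create GPS object for field recording (from a React component)
const addGPSObject = useMutation(api.gpsDatasets.addGPSObject);
const objectId = await addGPSObject({
  datasetId,  // Id<"gpsDatasets"> returned by createDataset
  geometry: { type: "Point", coordinates: [-117.0363, 32.5327] },
  name: "Tijuana Jazz Club - Friday Night Set",
  properties: {
    recording_date: "2025-11-01T21:30:00Z",
    audio_url: "https://..."
  },
  songId: existingSongId,  // Link to songs table
  tags: ["jazz", "live"]
});

// 2. Now the song has a location: inside a Convex query, find its
//    recordings through the by_song index
const recordings = await ctx.db
  .query("gpsObjects")
  .withIndex("by_song", q => q.eq("songId", existingSongId))
  .collect();

// 3. tscribe component can display map
<TScribe 
  songId={existingSongId}
  showMap={true}  // Renders map with recording location
/>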

Querying Songs by Location

typescript
export const getSongsNearLocation = query({
  args: {
    latitude: v.number(),
    longitude: v.number(),
    radiusKm: v.number()
  },
  handler: async (ctx, args) => {
    // Get all GPS objects linked to songs
    const allObjects = await ctx.db
      .query("gpsObjects")
      .filter(q => q.neq(q.field("songId"), undefined))
      .collect();
    
    // Filter by distance (approximate)
    const nearby = allObjects.filter(obj => {
      if (obj.geometry.type !== "Point") return false;
      const [lng, lat] = obj.geometry.coordinates;
      const distance = haversineDistance(
        args.latitude,
        args.longitude,
        lat,
        lng
      );
      return distance <= args.radiusKm;
    });
    
    // Fetch songs
    const songs = await Promise.all(
      nearby.map(obj => ctx.db.get(obj.songId!))
    );
    
    return nearby.map((obj, i) => ({
      object: obj,
      song: songs[i]
    }));
  }
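});

// Great-circle distance in km between two lat/lng points
// (haversine formula on a sphere with mean Earth radius 6371 km)
function haversineDistance(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const R = 6371;
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.min(1, Math.sqrt(a)));  // Clamp guards float rounding
}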

Performance Considerations

Embedding Storage

Storage per object:

  • 1536 dimensions × 8 bytes (float64) = ~12KB per GPS object
  • 10,000 objects = ~120MB (manageable)

Optimization strategies:

  1. Use smaller embedding models (384 or 768 dimensions)
  2. Only embed rich metadata (skip simple points)
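  3. Use Convex's built-in vector indexes (vectorIndex in the schema, queried with ctx.vectorSearch from an action) before moving embeddings to a separate vector database, and move them only if Convex limits are hit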
  4. Regenerate embeddings periodically (monthly) rather than storing all versions

Query Optimization

Spatial queries:

  • Convex doesn't have native geospatial indexes
  • For large datasets (>10k objects), consider:
    • Bounding box pre-filter before precise distance calculations
    • Quadtree indexing in metadata
    • Offload spatial queries to Google Maps Datasets API
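The bounding-box pre-filter listed above works in three steps: turn the search radius into a lat/lng box, drop points outside it with cheap comparisons, and run the precise haversine distance only on the survivors. Below is a sketch of the first step. It uses the spherical bounding-coordinates method with an assumed mean Earth radius of 6371 km. The function name is hypothetical.

```typescript
interface LatLngBox { north: number; south: number; east: number; west: number }

const EARTH_RADIUS_KM = 6371; // Mean Earth radius (spherical model)
const toRad = (deg: number) => (deg * Math.PI) / 180;
const toDeg = (rad: number) => (rad * 180) / Math.PI;

// A lat/lng box containing every point within radiusKm of (lat, lng).
// If the circle crosses ±180° longitude, the result has west > east.
function radiusToBox(lat: number, lng: number, radiusKm: number): LatLngBox {
  const angular = radiusKm / EARTH_RADIUS_KM; // Angular radius in radians
  const north = lat + toDeg(angular);
  const south = lat - toDeg(angular);
  if (north >= 90 || south <= -90) {
    // The circle contains a pole, so every longitude is in range
    return { north: Math.min(north, 90), south: Math.max(south, -90), east: 180, west: -180 };
  }
  // Widest longitude offset of the circle at this latitude
  const dLng = toDeg(Math.asin(Math.sin(angular) / Math.cos(toRad(lat))));
  let west = lng - dLng;
  let east = lng + dLng;
  if (west < -180) west += 360;
  if (east > 180) east -= 360;
  return { north, south, east, west };
}
```

For `getSongsNearLocation`, you would compute the box once from `latitude`, `longitude`, and `radiusKm`, skip any point outside it, and call the distance function only for the rest. The containment check must treat `west > east` as a box that wraps around ±180°.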

Embedding search:

  • Brute-force cosine similarity costs O(n × d) per query (n embeddings of d dimensions), and every embedding is loaded on each search
  • For datasets >50k objects, consider:
    • Using specialized vector databases (Pinecone, Weaviate)
    • Implementing approximate nearest neighbor (ANN) algorithms
    • Pre-filtering by dataset type or tags before similarity search
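Pre-filtering by dataset type or tags, the last option above, works like this: cheap metadata checks shrink the candidate list, then only the survivors are scored. In the sketch below, the `EmbeddingDoc` shape mirrors the `gpsObjectEmbeddings` fields and the function names are hypothetical.

```typescript
interface EmbeddingDoc {
  objectId: string;
  embedding: number[];
  metadata: { datasetType: string; tags: string[] };
}

function cosine(a: number[], b: number[]): number {
  let dotAB = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dotAB += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dotAB / Math.sqrt(normA * normB);
}

// Keep docs that carry every required tag (and the dataset type, if given), then rank the rest
function rankWithPrefilter(
  docs: EmbeddingDoc[],
  queryEmbedding: number[],
  requiredTags: string[],
  limit: number,
  datasetType?: string
): { objectId: string; similarity: number }[] {
  return docs
    .filter(d => datasetType === undefined || d.metadata.datasetType === datasetType)
    .filter(d => requiredTags.every(t => d.metadata.tags.includes(t)))
    .map(d => ({ objectId: d.objectId, similarity: cosine(queryEmbedding, d.embedding) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, limit);
}
```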

Migration Path

Phase 1: Core Schema (Week 1)

  1. Implement gpsDatasets + gpsObjects tables
  2. Create basic mutations (add dataset, add object)
  3. Build simple React component to display datasets

Phase 2: Embeddings (Week 2)

  1. Add gpsObjectEmbeddings table
  2. Background job to generate embeddings
  3. Semantic search query implementation

Phase 3: Google Maps Integration (Week 3)

  1. Sync Convex → Maps Datasets API (see next article)
  2. Render datasets on interactive maps
  3. Click handlers to show object details

Phase 4: NNT Integration (Week 4)

  1. Link GPS objects to songs table
  2. Update tscribe component to show maps
  3. "Record at location" workflow

Phase 5: Version Control (Week 5)

  1. Export Convex → GeoJSON files
  2. Commit GeoJSON to Git
  3. Seed Convex from GeoJSON on deploy

File Storage Structure

GeoJSON files in vault:

_Nakul/5. Coding Actions/midimaze/gps-datasets/
├── tijuana-field-recordings-2025.geojson
├── music-venues-tijuana.geojson
├── water-infrastructure-tijuana.geojson
└── monitoring-sites-tijuana-river.geojson

Each file synced with:

  • Convex gpsDatasets table (live, mutable)
  • Google Maps Datasets API (rendered on maps)
  • Git version control (source of truth)

See Also