GPS Dataset Architecture with Convex
Problem Statement
Challenge: How to manage a growing catalog of GPS-tagged objects (field recordings, venue locations, water infrastructure, monitoring sites) with rich metadata, semantic search, and integration with Google Maps?
Requirements:
- Store GPS coordinates with extensive metadata (equipment, dates, conditions, notes)
- Support multiple dataset types (audio recordings, venues, infrastructure)
- Enable semantic search ("find recordings with heavy bass near the border")
- Sync with Google Maps Datasets API for visualization
- Version control via MDX/GeoJSON files
- Real-time collaboration and updates
- Integrate with existing NNT ecosystem (share song/audio data)
Solution: Convex database with structured schema, Google Maps Datasets API for rendering, dual persistence (Convex + GeoJSON files), semantic search via embeddings.
Architecture Overview
┌─────────────────────────────────────────────────────┐
│ Convex Database                                     │
│ • gpsDatasets - Dataset collections                 │
│ • gpsObjects - Individual GPS points/geometries     │
│ • gpsObjectEmbeddings - Semantic search vectors     │
│ • Related: songs, venues, infrastructure            │
└─────────────────────────────────────────────────────┘
                ↕ (bidirectional sync)
┌─────────────────────────────────────────────────────┐
│ Google Maps Datasets API                            │
│ • Spatial indexing and rendering                    │
│ • Vector tiles for performance                      │
│ • Integration with Maps JavaScript API              │
└─────────────────────────────────────────────────────┘
                ↓ (version control)
┌─────────────────────────────────────────────────────┐
│ GeoJSON Files (Git)                                 │
│ • Source of truth for GPS data                      │
│ • Stored in vault: _Nakul/5. Coding Actions/        │
│ • Exported from Convex periodically                 │
└─────────────────────────────────────────────────────┘
Convex Schema
Core Tables
```typescript
// convex/schema.ts
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  // GPS Dataset collections (e.g., "Tijuana Field Recordings 2025")
  gpsDatasets: defineTable({
    name: v.string(),
    description: v.string(),
    datasetType: v.union(
      v.literal("audio_recordings"),
      v.literal("music_venues"),
      v.literal("water_infrastructure"),
      v.literal("monitoring_sites"),
      v.literal("custom")
    ),

    // Google Maps Datasets API integration
    googleDatasetId: v.optional(v.string()), // From Maps Datasets API
    lastSyncedAt: v.optional(v.number()),

    // File locations
    geojsonPath: v.string(), // Path in vault: _Nakul/5. Coding Actions/...

    // Metadata
    createdBy: v.string(),
    createdAt: v.number(),
    updatedAt: v.number(),
    isPublic: v.boolean() // Share on midimaze.com?
  })
    .index("by_name", ["name"])
    .index("by_type", ["datasetType"])
    .index("by_google_id", ["googleDatasetId"]),

  // Individual GPS objects (points, lines, polygons)
  gpsObjects: defineTable({
    datasetId: v.id("gpsDatasets"),

    // GeoJSON geometry
    geometry: v.object({
      type: v.union(
        v.literal("Point"),
        v.literal("LineString"),
        v.literal("Polygon"),
        v.literal("MultiPoint"),
        v.literal("MultiLineString"),
        v.literal("MultiPolygon")
      ),
      coordinates: v.any() // Array structure varies by type
    }),

    // Common properties
    name: v.string(),
    description: v.optional(v.string()),

    // Type-specific properties
    properties: v.any(), // Flexible JSON object

    // Relationships
    songId: v.optional(v.id("songs")), // Link to NNT song if audio recording
    venueId: v.optional(v.string()), // Link to venue database

    // Metadata
    createdAt: v.number(),
    updatedAt: v.number(),
    tags: v.array(v.string())
  })
    .index("by_dataset", ["datasetId"])
    .index("by_song", ["songId"])
    .index("by_tags", ["tags"]),

  // Embeddings for semantic search
  gpsObjectEmbeddings: defineTable({
    objectId: v.id("gpsObjects"),
    datasetId: v.id("gpsDatasets"),

    // Vector embedding
    embedding: v.array(v.float64()), // 1536 dimensions (OpenAI)

    // Searchable text (concatenated from name, description, properties)
    searchText: v.string(),

    // Metadata for filtering
    metadata: v.object({
      name: v.string(),
      datasetType: v.string(),
      coordinates: v.array(v.float64()), // [lng, lat] for distance sorting
      tags: v.array(v.string())
    }),
    createdAt: v.number()
  })
    .index("by_object", ["objectId"])
    .index("by_dataset", ["datasetId"]),

  // Existing NNT tables (integrate with)
  songs: defineTable({
    title: v.string(),
    artist: v.string(),
    album: v.optional(v.string()),
    year: v.optional(v.number()),
    appleMusicId: v.optional(v.string()),
    spotifyId: v.optional(v.string()),
    youtubeId: v.optional(v.string()),
    duration: v.optional(v.number()),
    createdAt: v.number(),
    updatedAt: v.number()
  })
    .index("by_title_artist", ["title", "artist"])
    .index("by_apple_music", ["appleMusicId"])
});
```
Dataset Type Schemas
Audio Recordings Dataset
Properties structure:
```typescript
{
  // Recording metadata
  recording_date: "2025-11-01T21:30:00Z",
  equipment: "Zoom H6, Rode NTG3",
  duration: "45:32",

  // Audio files
  audio_url: "https://audio.example.com/tj-jazz-2025-11-01.wav",
  waveform_url: "https://audio.example.com/tj-jazz-2025-11-01-waveform.png",

  // Recording conditions
  weather: "Clear, 18°C, light winds",
  ambient_noise: "moderate",
  time_of_day: "evening",

  // Environmental data (from Google APIs)
  uaqi: 42, // Air Quality Index
  pollen_level: "low",

  // Technical metadata
  sample_rate: 48000,
  bit_depth: 24,
  channels: 2,
  file_size_mb: 234,

  // Notes
  notes: "Live jazz trio, moderate audience noise",
  transcript: "Optional voice annotation transcript"
}
```
Example Convex object:
```typescript
{
  _id: "abc123",
  datasetId: "tijuana-recordings-2025",
  geometry: {
    type: "Point",
    coordinates: [-117.0363, 32.5327] // [longitude, latitude]
  },
  name: "Tijuana Jazz Club - Friday Night Set",
  description: "Live jazz trio performance, intimate atmosphere",
  properties: {
    recording_date: "2025-11-01T21:30:00Z",
    equipment: "Zoom H6, Rode NTG3",
    duration: "45:32",
    audio_url: "https://audio.example.com/tj-jazz-2025-11-01.wav",
    weather: "Clear, 18°C, light winds",
    uaqi: 42,
    notes: "Live jazz trio, moderate audience noise"
  },
  songId: "song_xyz789", // Link to songs table
  tags: ["jazz", "live", "tijuana", "trio"],
  // Timestamps are milliseconds since the Unix epoch, matching Date.now() in the mutations
  createdAt: 1762032600000,
  updatedAt: 1762032600000
}
```
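GeoJSON stores positions as `[longitude, latitude]`, the reverse of the order most people say them aloud, so swapped coordinates are an easy mistake when entering points by hand. The sketch below is one possible sanity check to run before inserting a Point. The helper name `isValidLngLat` is hypothetical and not part of the schema above. It only catches swaps where a value falls outside the valid latitude range, so treat it as a first line of defense rather than full validation.

```typescript
// Hypothetical helper: checks that a GeoJSON Point position is [lng, lat]
// (an optional third altitude element is allowed) and within valid ranges.
function isValidLngLat(coordinates: unknown): boolean {
  if (!Array.isArray(coordinates)) return false;
  if (coordinates.length < 2 || coordinates.length > 3) return false;

  const lng: unknown = coordinates[0];
  const lat: unknown = coordinates[1];
  if (typeof lng !== "number" || typeof lat !== "number") return false;
  if (!Number.isFinite(lng) || !Number.isFinite(lat)) return false;

  // Longitude spans [-180, 180]; latitude spans [-90, 90].
  return lng >= -180 && lng <= 180 && lat >= -90 && lat <= 90;
}
```

For the Tijuana example, `[-117.0363, 32.5327]` passes, while the swapped `[32.5327, -117.0363]` fails because -117.0363 is not a valid latitude.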
Music Venues Dataset
Properties structure:
```typescript
{
  // Venue info
  address: "Av. Revolución 1006, Centro, Tijuana",
  capacity: 150,
  genres: ["jazz", "blues", "soul"],

  // Contact
  phone: "663 438 3418",
  website: "https://tijuanajazzclub.com",
  email: "[email protected]",

  // Hours (JSON)
  hours: {
    monday: "18:30-00:30",
    tuesday: "closed",
    // ...
  },

  // Equipment
  sound_system: "QSC K12.2 speakers, Yamaha MG16 mixer",
  stage_size: "12ft x 8ft",
  has_piano: true,
  has_drums: true,

  // Reviews
  google_rating: 4.8,
  google_reviews: 127,

  // Venue network analysis
  nearest_border_crossing: "San Ysidro",
  distance_to_border_km: 5.2
}
```
Water Infrastructure Dataset
Properties structure:
```typescript
{
  // Infrastructure type
  feature_type: "sewer_main" | "manhole" | "pump_station" | "treatment_plant",

  // Sewer line properties (LineString)
  line_id: "TJ-SW-1234",
  diameter_mm: 450,
  material: "PVC",
  install_date: "2010-06-15",
  last_inspection: "2025-08-20",
  condition: "good" | "moderate_wear" | "needs_repair" | "critical",
  flow_direction: "gravity" | "pressurized",

  // Manhole properties (Point)
  manhole_id: "MH-0423",
  depth_m: 3.2,
  access_type: "street" | "alley" | "private",

  // Treatment plant properties (Polygon)
  plant_id: "WWTP-La-Morita",
  capacity_mgd: 25, // Million gallons per day
  service_population: 150000,
  treatment_level: "secondary" | "tertiary",

  // Maintenance
  priority: "routine" | "medium" | "high" | "urgent",
  next_inspection_date: "2026-02-15",
  maintenance_notes: "Sediment buildup, schedule cleaning"
}
```
Key Mutations
Create Dataset
```typescript
// convex/gpsDatasets.ts
import { mutation } from "./_generated/server";
import { v } from "convex/values";

export const createDataset = mutation({
  args: {
    name: v.string(),
    description: v.string(),
    datasetType: v.union(
      v.literal("audio_recordings"),
      v.literal("music_venues"),
      v.literal("water_infrastructure"),
      v.literal("monitoring_sites"),
      v.literal("custom")
    ),
    geojsonPath: v.string(),
    isPublic: v.boolean()
  },
  handler: async (ctx, args) => {
    // Convex exposes the caller via getUserIdentity(); fall back to "nakul"
    // when the request is unauthenticated.
    const identity = await ctx.auth.getUserIdentity();
    const now = Date.now();

    const datasetId = await ctx.db.insert("gpsDatasets", {
      name: args.name,
      description: args.description,
      datasetType: args.datasetType,
      geojsonPath: args.geojsonPath,
      createdBy: identity?.subject ?? "nakul",
      createdAt: now,
      updatedAt: now,
      isPublic: args.isPublic
    });
    return datasetId;
  }
});
```
Add GPS Object
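```typescript
import { mutation } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";

export const addGPSObject = mutation({
  args: {
    datasetId: v.id("gpsDatasets"),
    // Must match the geometry validator in schema.ts, or the insert fails validation
    geometry: v.object({
      type: v.union(
        v.literal("Point"),
        v.literal("LineString"),
        v.literal("Polygon"),
        v.literal("MultiPoint"),
        v.literal("MultiLineString"),
        v.literal("MultiPolygon")
      ),
      coordinates: v.any()
    }),
    name: v.string(),
    description: v.optional(v.string()),
    properties: v.any(),
    songId: v.optional(v.id("songs")),
    tags: v.array(v.string())
  },
  handler: async (ctx, args) => {
    const now = Date.now();

    // Insert GPS object
    const objectId = await ctx.db.insert("gpsObjects", {
      datasetId: args.datasetId,
      geometry: args.geometry,
      name: args.name,
      description: args.description,
      properties: args.properties,
      songId: args.songId,
      tags: args.tags,
      createdAt: now,
      updatedAt: now
    });

    // Generate embedding asynchronously (background job).
    // The scheduler takes a function reference, not a string name. The path
    // assumes generateEmbedding lives in convex/embeddings.ts; adjust to match.
    await ctx.scheduler.runAfter(0, internal.embeddings.generateEmbedding, { objectId });
    return objectId;
  }
});
```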
Generate Embedding (Background Job)
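```typescript
// convex/embeddings.ts (file name is an assumption; the internal.embeddings.* paths depend on it)
import { internalAction, internalMutation, internalQuery } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";
import { OpenAI } from "openai";

// Mutations cannot call external APIs, so the OpenAI request runs in an
// action that reads and writes the database through these two helpers.
export const getObjectForEmbedding = internalQuery({
  args: { objectId: v.id("gpsObjects") },
  handler: async (ctx, args) => {
    const object = await ctx.db.get(args.objectId);
    if (!object) return null;
    const dataset = await ctx.db.get(object.datasetId);
    if (!dataset) return null;
    return { object, datasetType: dataset.datasetType };
  }
});

export const storeEmbedding = internalMutation({
  args: {
    objectId: v.id("gpsObjects"),
    datasetId: v.id("gpsDatasets"),
    embedding: v.array(v.float64()),
    searchText: v.string(),
    metadata: v.object({
      name: v.string(),
      datasetType: v.string(),
      coordinates: v.array(v.float64()),
      tags: v.array(v.string())
    })
  },
  handler: async (ctx, args) => {
    await ctx.db.insert("gpsObjectEmbeddings", { ...args, createdAt: Date.now() });
  }
});

export const generateEmbedding = internalAction({
  args: { objectId: v.id("gpsObjects") },
  handler: async (ctx, args) => {
    const found = await ctx.runQuery(internal.embeddings.getObjectForEmbedding, args);
    if (!found) return;
    const { object, datasetType } = found;

    // Concatenate searchable text
    const searchText = [
      object.name,
      object.description || "",
      JSON.stringify(object.properties),
      object.tags.join(" ")
    ].join(" ");

    // Generate embedding via OpenAI
    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    const response = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: searchText
    });
    const embedding = response.data[0].embedding;

    // Extract coordinates for Point geometries; other geometry types keep the [0, 0] placeholder
    let coordinates: number[] = [0, 0];
    if (object.geometry.type === "Point") {
      coordinates = object.geometry.coordinates;
    }

    // Store embedding
    await ctx.runMutation(internal.embeddings.storeEmbedding, {
      objectId: args.objectId,
      datasetId: object.datasetId,
      embedding,
      searchText,
      metadata: { name: object.name, datasetType, coordinates, tags: object.tags }
    });
  }
});
```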
Key Queries
Get Dataset with Objects
```typescript
import { query } from "./_generated/server";
import { v } from "convex/values";

export const getDatasetWithObjects = query({
  args: { datasetId: v.id("gpsDatasets") },
  handler: async (ctx, args) => {
    const dataset = await ctx.db.get(args.datasetId);
    if (!dataset) return null;

    const objects = await ctx.db
      .query("gpsObjects")
      .withIndex("by_dataset", q => q.eq("datasetId", args.datasetId))
      .collect();

    return { dataset, objects };
  }
});
```
Semantic Search
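```typescript
// convex/search.ts (file name is an assumption; the internal.search.* paths depend on it)
import { action, internalQuery } from "./_generated/server";
import { internal } from "./_generated/api";
import { v } from "convex/values";
import { OpenAI } from "openai";

// Queries cannot call external APIs, so the search runs as an action and
// reads the database through these internal queries.
export const listEmbeddings = internalQuery({
  args: { datasetId: v.optional(v.id("gpsDatasets")) },
  handler: async (ctx, args) => {
    const datasetId = args.datasetId;
    if (datasetId) {
      return await ctx.db
        .query("gpsObjectEmbeddings")
        .withIndex("by_dataset", q => q.eq("datasetId", datasetId))
        .collect();
    }
    return await ctx.db.query("gpsObjectEmbeddings").collect();
  }
});

export const getObjects = internalQuery({
  args: { ids: v.array(v.id("gpsObjects")) },
  handler: async (ctx, args) => Promise.all(args.ids.map(id => ctx.db.get(id)))
});

export const semanticSearch = action({
  args: {
    query: v.string(),
    datasetId: v.optional(v.id("gpsDatasets")),
    limit: v.optional(v.number())
  },
  handler: async (ctx, args) => {
    // Generate query embedding
    const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
    const response = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: args.query
    });
    const queryEmbedding = response.data[0].embedding;

    // Get all embeddings (optionally filtered by dataset)
    const allEmbeddings = await ctx.runQuery(internal.search.listEmbeddings, {
      datasetId: args.datasetId
    });

    // Calculate cosine similarity, sort, and keep the top results
    const results = allEmbeddings
      .map(doc => ({
        objectId: doc.objectId,
        similarity: cosineSimilarity(queryEmbedding, doc.embedding)
      }))
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, args.limit ?? 10);

    // Fetch full GPS objects
    const objects = await ctx.runQuery(internal.search.getObjects, {
      ids: results.map(r => r.objectId)
    });

    return results.map((r, i) => ({ object: objects[i], similarity: r.similarity }));
  }
});

// Helper function
function cosineSimilarity(a: number[], b: number[]): number {
  const dotProduct = a.reduce((sum, val, i) => sum + val * b[i], 0);
  const magnitudeA = Math.sqrt(a.reduce((sum, val) => sum + val * val, 0));
  const magnitudeB = Math.sqrt(b.reduce((sum, val) => sum + val * val, 0));
  // Guard against zero vectors to avoid dividing by zero
  if (magnitudeA === 0 || magnitudeB === 0) return 0;
  return dotProduct / (magnitudeA * magnitudeB);
}
```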
Spatial Queries (Bounding Box)
```typescript
import { query } from "./_generated/server";
import { v } from "convex/values";

export const getObjectsInBounds = query({
  args: {
    datasetId: v.id("gpsDatasets"),
    bounds: v.object({
      north: v.number(),
      south: v.number(),
      east: v.number(),
      west: v.number()
    })
  },
  handler: async (ctx, args) => {
    const objects = await ctx.db
      .query("gpsObjects")
      .withIndex("by_dataset", q => q.eq("datasetId", args.datasetId))
      .collect();

    // Filter by bounding box (for Point geometries).
    // Lines and polygons are always returned, since they are not tested against the box.
    // Boxes that cross the antimeridian (west > east) are not handled.
    return objects.filter(obj => {
      if (obj.geometry.type !== "Point") return true;
      const [lng, lat] = obj.geometry.coordinates;
      return (
        lat >= args.bounds.south &&
        lat <= args.bounds.north &&
        lng >= args.bounds.west &&
        lng <= args.bounds.east
      );
    });
  }
});
```
Integration with NNT Ecosystem
Linking Field Recordings to Songs
When you record audio at a venue, you can link the GPS object to an existing song in your NNT ecosystem:
```typescript
// Illustrative flow, not runnable as-is: IDs such as "tijuana-recordings-2025"
// stand in for real Convex document IDs.

// 1. Create GPS object for field recording
const objectId = await addGPSObject({
  datasetId: "tijuana-recordings-2025",
  geometry: {
    type: "Point",
    coordinates: [-117.0363, 32.5327]
  },
  name: "Tijuana Jazz Club - Friday Night Set",
  properties: {
    recording_date: "2025-11-01T21:30:00Z",
    audio_url: "https://..."
  },
  songId: existingSongId, // Link to songs table
  tags: ["jazz", "live"]
});

// 2. Now the song has a location
const song = await db.get(existingSongId);
const recording = await db.get(objectId);

// 3. tscribe component can display map
<TScribe
  songId={existingSongId}
  showMap={true} // Renders map with recording location
/>
```
Querying Songs by Location
```typescript
import { query } from "./_generated/server";
import { v } from "convex/values";

// haversineDistance(lat1, lng1, lat2, lng2) must be defined or imported in
// this file; it is expected to return the distance in kilometres.

export const getSongsNearLocation = query({
  args: {
    latitude: v.number(),
    longitude: v.number(),
    radiusKm: v.number()
  },
  handler: async (ctx, args) => {
    // Get all GPS objects linked to songs
    const allObjects = await ctx.db
      .query("gpsObjects")
      .filter(q => q.neq(q.field("songId"), undefined))
      .collect();

    // Filter by distance (approximate)
    const nearby = allObjects.filter(obj => {
      if (obj.geometry.type !== "Point") return false;
      const [lng, lat] = obj.geometry.coordinates;
      const distance = haversineDistance(args.latitude, args.longitude, lat, lng);
      return distance <= args.radiusKm;
    });

    // Fetch songs
    const songs = await Promise.all(nearby.map(obj => ctx.db.get(obj.songId!)));

    return nearby.map((obj, i) => ({ object: obj, song: songs[i] }));
  }
});
```
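The query above calls `haversineDistance`, which this article does not define. A minimal sketch follows, assuming the argument order used in the query (`lat1, lng1, lat2, lng2`, all in degrees), a result in kilometres, and a spherical Earth with a mean radius of 6371 km. The spherical model is slightly off compared with an ellipsoidal one, which should be acceptable for "songs near me" style filtering.

```typescript
const EARTH_RADIUS_KM = 6371; // mean Earth radius (spherical approximation)

function toRadians(degrees: number): number {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance in kilometres between two points given in degrees.
function haversineDistance(
  lat1: number,
  lng1: number,
  lat2: number,
  lng2: number
): number {
  const dLat = toRadians(lat2 - lat1);
  const dLng = toRadians(lng2 - lng1);

  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLng / 2) ** 2;

  // Clamp to 1 so floating-point rounding cannot push asin out of its domain.
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(a)));
}
```

Note that the function takes latitude first, while GeoJSON coordinates are stored longitude first, so the destructuring `const [lng, lat] = ...` in the query matters.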
Performance Considerations
Embedding Storage
Storage per object:
- 1536 dimensions × 8 bytes = ~12KB per GPS object
- 10,000 objects = ~120MB (manageable)
Optimization strategies:
- Use smaller embedding models (384 or 768 dimensions)
- Only embed rich metadata (skip simple points)
- Store embeddings in separate vector database if Convex limits hit
- Regenerate embeddings periodically (monthly) rather than storing all versions
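The storage figures above come from simple arithmetic, and the same calculation shows what a smaller model would save. The sketch below counts only the raw float64 vector payload. It ignores `searchText`, `metadata`, and any database overhead, so real usage will be higher. The function name is hypothetical.

```typescript
const BYTES_PER_FLOAT64 = 8;

// Raw size of the embedding vectors alone, in bytes.
function embeddingStorageBytes(objectCount: number, dimensions: number): number {
  return objectCount * dimensions * BYTES_PER_FLOAT64;
}

const perObject = embeddingStorageBytes(1, 1536);        // 12,288 bytes (~12KB)
const tenThousand = embeddingStorageBytes(10_000, 1536); // 122,880,000 bytes (~120MB)
const smallModel = embeddingStorageBytes(10_000, 384);   // 30,720,000 bytes (~30MB)
```

Dropping from 1536 to 384 dimensions cuts vector storage by a factor of four, at whatever cost in search quality the smaller model carries.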
Query Optimization
Spatial queries:
- Convex doesn't have native geospatial indexes
- For large datasets (>10k objects), consider:
  - Bounding box pre-filter before precise distance calculations
  - Quadtree indexing in metadata
  - Offloading spatial queries to the Google Maps Datasets API
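The bounding box pre-filter mentioned above can be sketched as follows: convert a center point and radius into a box, discard everything outside it with cheap comparisons, and run the precise distance check only on the survivors. The names `Bounds`, `boundsAroundPoint`, and `inBounds` are hypothetical. The box uses a flat approximation that is reasonable for small radii, and it does not handle boxes that cross the antimeridian or reach the poles.

```typescript
interface Bounds {
  north: number;
  south: number;
  east: number;
  west: number;
}

// Kilometres per degree of latitude on a sphere of radius 6371 km.
const KM_PER_DEGREE = (Math.PI / 180) * 6371;

// Approximate box that encloses a circle of radiusKm around (lat, lng).
function boundsAroundPoint(lat: number, lng: number, radiusKm: number): Bounds {
  const dLat = radiusKm / KM_PER_DEGREE;
  // A degree of longitude shrinks with cos(latitude); clamp to avoid dividing by ~0.
  const cosLat = Math.max(Math.cos((lat * Math.PI) / 180), 1e-6);
  const dLng = radiusKm / (KM_PER_DEGREE * cosLat);
  return {
    north: Math.min(lat + dLat, 90),
    south: Math.max(lat - dLat, -90),
    east: lng + dLng,
    west: lng - dLng,
  };
}

// Cheap comparison-only test, suitable as a first pass over many points.
function inBounds(lat: number, lng: number, b: Bounds): boolean {
  return lat >= b.south && lat <= b.north && lng >= b.west && lng <= b.east;
}
```

The `Bounds` shape matches the `bounds` argument of `getObjectsInBounds`, so the result of `boundsAroundPoint` could be passed to that query directly.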
Embedding search:
- Cosine similarity calculation is O(n) across all embeddings
- For datasets >50k objects, consider:
  - Using specialized vector databases (Pinecone, Weaviate)
  - Implementing approximate nearest neighbor (ANN) algorithms
  - Pre-filtering by dataset type or tags before similarity search
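Pre-filtering is the cheapest of these options, because it shrinks the candidate set before any vector math happens. The sketch below filters by a single tag and then ranks the remaining documents by cosine similarity. The `EmbeddingDoc` shape and the `topMatches` function are hypothetical simplifications of the `gpsObjectEmbeddings` rows, not code from this project.

```typescript
interface EmbeddingDoc {
  objectId: string;
  embedding: number[];
  tags: string[];
}

interface ScoredDoc {
  objectId: string;
  similarity: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let magA = 0;
  let magB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    magA += x * x;
    magB += y * y;
  }
  // Zero vectors have no direction; treat them as dissimilar to everything.
  if (magA === 0 || magB === 0) return 0;
  return dot / Math.sqrt(magA * magB);
}

// Filter by tag first (cheap), then score and rank only the survivors.
function topMatches(
  query: number[],
  docs: EmbeddingDoc[],
  limit: number,
  requiredTag?: string
): ScoredDoc[] {
  const candidates =
    requiredTag === undefined ? docs : docs.filter(d => d.tags.includes(requiredTag));
  return candidates
    .map(d => ({ objectId: d.objectId, similarity: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, limit);
}
```

The similarity pass is still linear, but only over the filtered subset, so a selective tag can reduce the work considerably.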
Migration Path
Phase 1: Core Schema (Week 1)
- Implement gpsDatasets + gpsObjects tables
- Create basic mutations (add dataset, add object)
- Build simple React component to display datasets
Phase 2: Embeddings (Week 2)
- Add gpsObjectEmbeddings table
- Background job to generate embeddings
- Semantic search query implementation
Phase 3: Google Maps Integration (Week 3)
- Sync Convex ↔ Maps Datasets API (see next article)
- Render datasets on interactive maps
- Click handlers to show object details
Phase 4: NNT Integration (Week 4)
- Link GPS objects to songs table
- Update tscribe component to show maps
- "Record at location" workflow
Phase 5: Version Control (Week 5)
- Export Convex → GeoJSON files
- Commit GeoJSON to Git
- Seed Convex from GeoJSON on deploy
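The export step in Phase 5 amounts to mapping each `gpsObjects` row onto a GeoJSON Feature and wrapping the result in a FeatureCollection. A minimal sketch of that mapping follows. The `GpsObject` interface is a simplified stand-in for the Convex document type, and the choice to fold `name`, `description`, and `tags` into the feature's `properties` is an assumption about the desired file layout, not something this article specifies.

```typescript
type GeometryType =
  | "Point"
  | "LineString"
  | "Polygon"
  | "MultiPoint"
  | "MultiLineString"
  | "MultiPolygon";

// Simplified stand-in for a gpsObjects row.
interface GpsObject {
  name: string;
  description?: string;
  geometry: { type: GeometryType; coordinates: unknown };
  properties: Record<string, unknown>;
  tags: string[];
}

// Convert rows into a GeoJSON FeatureCollection ready for JSON.stringify.
function toFeatureCollection(objects: GpsObject[]) {
  return {
    type: "FeatureCollection" as const,
    features: objects.map(obj => ({
      type: "Feature" as const,
      geometry: obj.geometry,
      // Top-level fields are written last, so they override same-named keys in properties.
      properties: {
        ...obj.properties,
        name: obj.name,
        description: obj.description ?? null,
        tags: obj.tags,
      },
    })),
  };
}
```

Serializing with `JSON.stringify(toFeatureCollection(objects), null, 2)` gives a stable, diff-friendly file for committing to Git.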
File Storage Structure
GeoJSON files in vault:
_Nakul/5. Coding Actions/midimaze/gps-datasets/
├── tijuana-field-recordings-2025.geojson
├── music-venues-tijuana.geojson
├── water-infrastructure-tijuana.geojson
└── monitoring-sites-tijuana-river.geojson
Each file is synced with:
- Convex gpsDatasets table (live, mutable)
- Google Maps Datasets API (rendered on maps)
- Git version control (source of truth)
See Also
- Syncing Maps Datasets API with Convex - Bidirectional sync workflow
- Semantic Search for GPS Objects - Advanced search patterns
- Building a GPS Dataset Manager - React component guide
- Maps Datasets API - Google's spatial database
- Data Storage Architecture - NNT ecosystem database design
- Langchain with Convex - Embedding implementation patterns