Database Theory
Database Theory
A database is an organized collection of data that can be efficiently queried, updated, and managed. Understanding database fundamentals helps you build scalable, reliable applications.
Core Concepts
CRUD Operations
Every database supports four fundamental operations:
Create - Add new data
typescriptawait db.insert("songs", { title: "Giant Steps", artist: "John Coltrane", year: 1960 });
Read - Retrieve data
typescriptconst song = await db.query("songs") .filter(q => q.eq(q.field("title"), "Giant Steps")) .first();
Update - Modify existing data
typescriptawait db.patch(songId, { year: 1959 });
Delete - Remove data
typescriptawait db.delete(songId);
Schemas
A schema defines the structure of your data - what fields exist and what types they have.
Example schema (Convex):
typescriptimport { defineSchema, defineTable } from "convex/server"; import { v } from "convex/values"; export default defineSchema({ songs: defineTable({ title: v.string(), artist: v.string(), year: v.number(), duration: v.optional(v.number()), // Optional field tags: v.array(v.string()) // Array of strings }) });
Why schemas matter:
- Type safety - Catch errors at compile time
- Data integrity - Ensure consistency
- Performance - Database can optimize storage
Queries
Queries retrieve data based on conditions.
Simple query:
typescript// Get all jazz songs const jazzSongs = await db.query("songs") .filter(q => q.eq(q.field("genre"), "jazz")) .collect();
Complex query:
typescript// Get songs from 1960s by Coltrane, sorted by title const songs = await db.query("songs") .filter(q => q.and( q.eq(q.field("artist"), "John Coltrane"), q.gte(q.field("year"), 1960), q.lte(q.field("year"), 1969) ) ) .order("asc", "title") .collect();
Indexes
Indexes make queries fast by creating lookup tables.
Without index:
- Database scans every record β O(n) time
- 10,000 songs β 10,000 checks
With index:
- Database jumps to match β O(log n) time
- 10,000 songs β ~13 checks
Creating indexes (Convex):
typescriptsongs: defineTable({ title: v.string(), artist: v.string(), year: v.number() }) .index("by_artist", ["artist"]) .index("by_year", ["year"]) .index("by_artist_year", ["artist", "year"]) // Compound index
Using indexes:
typescript// Fast - uses index const coltraneSONGS = await db.query("songs") .withIndex("by_artist", q => q.eq("artist", "John Coltrane")) .collect(); // Slow - no index const oldSongs = await db.query("songs") .filter(q => q.lt(q.field("year"), 1950)) // No index on year < 1950 .collect();
Database Types
Relational Databases
Structure: Tables with rows and columns
Relationships: Foreign keys link tables
Query language: SQL
Examples: PostgreSQL, MySQL, SQLite
Song database (relational):
sql-- Songs table CREATE TABLE songs ( id SERIAL PRIMARY KEY, title VARCHAR(255), artist_id INT REFERENCES artists(id), album_id INT REFERENCES albums(id) ); -- Artists table CREATE TABLE artists ( id SERIAL PRIMARY KEY, name VARCHAR(255) ); -- Query with JOIN SELECT songs.title, artists.name FROM songs JOIN artists ON songs.artist_id = artists.id WHERE artists.name = 'John Coltrane';
Strengths:
- Strong data integrity (ACID transactions)
- Complex relationships
- Mature tooling
Weaknesses:
- Rigid schema (migrations required)
- Harder to scale horizontally
NoSQL Databases
Structure: Flexible documents, key-value pairs, graphs
Relationships: Embedded or referenced
Examples: MongoDB, Convex, Firebase
Song database (NoSQL/Convex):
typescript{ _id: "abc123", title: "Giant Steps", artist: { name: "John Coltrane", born: 1926 }, album: "Giant Steps", tracks: [ { position: 1, title: "Giant Steps", duration: 285 }, { position: 2, title: "Cousin Mary", duration: 342 } ] }
Strengths:
- Flexible schema
- Easy horizontal scaling
- Fast development
Weaknesses:
- Less rigid data integrity
- Complex relationships harder to model
Convex-Specific Concepts
Real-Time Reactivity
Convex automatically updates React components when data changes.
typescriptfunction SongList() { // Subscribes to changes const songs = useQuery(api.songs.list); // Automatically re-renders when songs added/updated/deleted return ( <ul> {songs?.map(song => <li key={song._id}>{song.title}</li>)} </ul> ); }
Queries, Mutations, Actions
Query - Read data (can't modify)
typescriptexport const get = query({ args: { id: v.id("songs") }, handler: async (ctx, args) => { return await ctx.db.get(args.id); } });
Mutation - Write data (deterministic)
typescriptexport const add = mutation({ args: { title: v.string(), artist: v.string() }, handler: async (ctx, args) => { return await ctx.db.insert("songs", args); } });
Action - Call external APIs (non-deterministic)
typescriptexport const fetchFromSpotify = action({ args: { songId: v.string() }, handler: async (ctx, args) => { const data = await fetch(`https://api.spotify.com/...`); // Can call mutations to save results await ctx.runMutation(api.songs.update, { ... }); } });
Document IDs
Convex uses typed IDs: Id<"songs">
typescript// Type-safe ID const songId: Id<"songs"> = "k17abc123"; // Can't accidentally use wrong ID const artist = await db.get(songId); // Error: Expected Id<"artists">
Common Patterns
One-to-Many Relationships
Example: One artist, many songs
Approach 1: Array of IDs
typescriptartists: defineTable({ name: v.string(), songIds: v.array(v.id("songs")) })
Approach 2: Back-reference (better for Convex)
typescriptsongs: defineTable({ title: v.string(), artistId: v.id("artists") }).index("by_artist", ["artistId"]) // Query all songs by artist const songs = await db.query("songs") .withIndex("by_artist", q => q.eq("artistId", artistId)) .collect();
Many-to-Many Relationships
Example: Songs can have multiple tags, tags apply to multiple songs
Junction table:
typescriptsongTags: defineTable({ songId: v.id("songs"), tagId: v.id("tags") }) .index("by_song", ["songId"]) .index("by_tag", ["tagId"])
Denormalization
Principle: Copy data to avoid joins
Example: Store artist name in song (instead of just ID)
typescriptsongs: defineTable({ title: v.string(), artistId: v.id("artists"), artistName: v.string() // Denormalized - avoid extra query })
Trade-off:
- β Faster queries (no join)
- β More storage
- β Must update in multiple places
Best Practices
Design for Your Queries
Bad: Design schema first, then figure out queries
Good: Design queries first, then schema to support them
Example:
- Need to show "Recent songs by artist"?
- Create index:
by_artist_created
Use Indexes Wisely
Too few indexes: Slow queries
Too many indexes: Slow writes, wasted storage
Rule of thumb: Index fields you filter/sort by frequently
Pagination for Large Results
Bad:
typescriptconst allSongs = await db.query("songs").collect(); // 10,000 songs!
Good:
typescriptconst page = await db.query("songs") .order("desc", "createdAt") .take(50); // First 50 only
Atomic Updates
Use transactions to ensure data consistency:
typescript// Transfer plays from one song to another export const transferPlays = mutation({ args: { fromSongId: v.id("songs"), toSongId: v.id("songs") }, handler: async (ctx, args) => { const fromSong = await ctx.db.get(args.fromSongId); const toSong = await ctx.db.get(args.toSongId); // Both updates succeed or both fail (atomic) await ctx.db.patch(args.fromSongId, { plays: 0 }); await ctx.db.patch(args.toSongId, { plays: toSong.plays + fromSong.plays }); } });
Real-World Example: NNT Ecosystem
Requirements:
- Store songs with multiple annotation instances
- Each annotation has start/end times, text, embeddings
- Real-time updates when annotations added
- Fast search by song, time range, semantic content
Schema:
typescriptexport default defineSchema({ songs: defineTable({ title: v.string(), artist: v.string(), appleMusicId: v.optional(v.string()), spotifyId: v.optional(v.string()) }), annotations: defineTable({ songId: v.id("songs"), startTime: v.number(), endTime: v.number(), text: v.string(), embedding: v.array(v.float64()), createdBy: v.string(), createdAt: v.number() }) .index("by_song", ["songId"]) .index("by_song_time", ["songId", "startTime"]), tscribeInstances: defineTable({ articleId: v.string(), songId: v.id("songs"), section: v.string(), startTime: v.number(), endTime: v.number() }).index("by_article", ["articleId"]) });
Query examples:
typescript// Get all annotations for a song, ordered by time const annotations = await db.query("annotations") .withIndex("by_song_time", q => q.eq("songId", songId)) .order("asc", "startTime") .collect(); // Get annotations overlapping specific time range const overlap = await db.query("annotations") .withIndex("by_song_time", q => q.eq("songId", songId) .gte("startTime", 100) .lte("endTime", 200) ) .collect();
See Also
- Getting started with Convex - Convex setup
- Langchain with Convex - Embeddings + vector search
- Data Storage Architecture - NNT-specific schema details
- Database Query Patterns - Efficient Convex queries