Database Theory

Path: Computer Tech/Cloud and Backend/Convex/Database Theory.mdUpdated: 2/3/2026

Database Theory

A database is an organized collection of data that can be efficiently queried, updated, and managed. Understanding database fundamentals helps you build scalable, reliable applications.

Core Concepts

CRUD Operations

Every database supports four fundamental operations:

Create - Add new data

typescript
await db.insert("songs", {
  title: "Giant Steps",
  artist: "John Coltrane",
  year: 1960
});

Read - Retrieve data

typescript
const song = await db.query("songs")
  .filter(q => q.eq(q.field("title"), "Giant Steps"))
  .first();

Update - Modify existing data

typescript
await db.patch(songId, { year: 1959 });

Delete - Remove data

typescript
await db.delete(songId);

Schemas

A schema defines the structure of your data - what fields exist and what types they have.

Example schema (Convex):

typescript
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";

export default defineSchema({
  songs: defineTable({
    title: v.string(),
    artist: v.string(),
    year: v.number(),
    duration: v.optional(v.number()),  // Optional field
    tags: v.array(v.string())          // Array of strings
  })
});

Why schemas matter:

Type safety - Catch errors at compile time
Data integrity - Ensure consistency
Performance - Database can optimize storage

Queries

Queries retrieve data based on conditions.

Simple query:

typescript
// Get all jazz songs
const jazzSongs = await db.query("songs")
  .filter(q => q.eq(q.field("genre"), "jazz"))
  .collect();

Complex query:

typescript
// Get songs from 1960s by Coltrane, sorted by title
const songs = await db.query("songs")
  .filter(q => 
    q.and(
      q.eq(q.field("artist"), "John Coltrane"),
      q.gte(q.field("year"), 1960),
      q.lte(q.field("year"), 1969)
    )
  )
  .order("asc", "title")
  .collect();

Indexes

Indexes make queries fast by creating lookup tables.

Without index:

Database scans every record → O(n) time
10,000 songs → 10,000 checks

With index:

Database jumps to match → O(log n) time
10,000 songs → ~13 checks

Creating indexes (Convex):

typescript
songs: defineTable({
  title: v.string(),
  artist: v.string(),
  year: v.number()
})
  .index("by_artist", ["artist"])
  .index("by_year", ["year"])
  .index("by_artist_year", ["artist", "year"])  // Compound index

Using indexes:

typescript
// Fast - uses index
const coltraneSONGS = await db.query("songs")
  .withIndex("by_artist", q => q.eq("artist", "John Coltrane"))
  .collect();

// Slow - no index
const oldSongs = await db.query("songs")
  .filter(q => q.lt(q.field("year"), 1950))  // No index on year < 1950
  .collect();

Database Types

Relational Databases

Structure: Tables with rows and columns
Relationships: Foreign keys link tables
Query language: SQL
Examples: PostgreSQL, MySQL, SQLite

Song database (relational):

sql
-- Songs table
CREATE TABLE songs (
  id SERIAL PRIMARY KEY,
  title VARCHAR(255),
  artist_id INT REFERENCES artists(id),
  album_id INT REFERENCES albums(id)
);

-- Artists table
CREATE TABLE artists (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255)
);

-- Query with JOIN
SELECT songs.title, artists.name
FROM songs
JOIN artists ON songs.artist_id = artists.id
WHERE artists.name = 'John Coltrane';

Strengths:

Strong data integrity (ACID transactions)
Complex relationships
Mature tooling

Weaknesses:

Rigid schema (migrations required)
Harder to scale horizontally

NoSQL Databases

Structure: Flexible documents, key-value pairs, graphs
Relationships: Embedded or referenced
Examples: MongoDB, Convex, Firebase

Song database (NoSQL/Convex):

typescript
{
  _id: "abc123",
  title: "Giant Steps",
  artist: {
    name: "John Coltrane",
    born: 1926
  },
  album: "Giant Steps",
  tracks: [
    { position: 1, title: "Giant Steps", duration: 285 },
    { position: 2, title: "Cousin Mary", duration: 342 }
  ]
}

Strengths:

Flexible schema
Easy horizontal scaling
Fast development

Weaknesses:

Less rigid data integrity
Complex relationships harder to model

Convex-Specific Concepts

Real-Time Reactivity

Convex automatically updates React components when data changes.

typescript
function SongList() {
  // Subscribes to changes
  const songs = useQuery(api.songs.list);
  
  // Automatically re-renders when songs added/updated/deleted
  return (
    <ul>
      {songs?.map(song => <li key={song._id}>{song.title}</li>)}
    </ul>
  );
}

Queries, Mutations, Actions

Query - Read data (can't modify)

typescript
export const get = query({
  args: { id: v.id("songs") },
  handler: async (ctx, args) => {
    return await ctx.db.get(args.id);
  }
});

Mutation - Write data (deterministic)

typescript
export const add = mutation({
  args: { title: v.string(), artist: v.string() },
  handler: async (ctx, args) => {
    return await ctx.db.insert("songs", args);
  }
});

Action - Call external APIs (non-deterministic)

typescript
export const fetchFromSpotify = action({
  args: { songId: v.string() },
  handler: async (ctx, args) => {
    const data = await fetch(`https://api.spotify.com/...`);
    // Can call mutations to save results
    await ctx.runMutation(api.songs.update, { ... });
  }
});

Document IDs

Convex uses typed IDs: Id<"songs">

typescript
// Type-safe ID
const songId: Id<"songs"> = "k17abc123";

// Can't accidentally use wrong ID
const artist = await db.get(songId);  // Error: Expected Id<"artists">

Common Patterns

One-to-Many Relationships

Example: One artist, many songs

Approach 1: Array of IDs

typescript
artists: defineTable({
  name: v.string(),
  songIds: v.array(v.id("songs"))
})

Approach 2: Back-reference (better for Convex)

typescript
songs: defineTable({
  title: v.string(),
  artistId: v.id("artists")
}).index("by_artist", ["artistId"])

// Query all songs by artist
const songs = await db.query("songs")
  .withIndex("by_artist", q => q.eq("artistId", artistId))
  .collect();

Many-to-Many Relationships

Example: Songs can have multiple tags, tags apply to multiple songs

Junction table:

typescript
songTags: defineTable({
  songId: v.id("songs"),
  tagId: v.id("tags")
})
  .index("by_song", ["songId"])
  .index("by_tag", ["tagId"])

Denormalization

Principle: Copy data to avoid joins

Example: Store artist name in song (instead of just ID)

typescript
songs: defineTable({
  title: v.string(),
  artistId: v.id("artists"),
  artistName: v.string()  // Denormalized - avoid extra query
})

Trade-off:

✅ Faster queries (no join)
❌ More storage
❌ Must update in multiple places

Best Practices

Design for Your Queries

Bad: Design schema first, then figure out queries
Good: Design queries first, then schema to support them

Example:

Need to show "Recent songs by artist"?
Create index: by_artist_created

Use Indexes Wisely

Too few indexes: Slow queries
Too many indexes: Slow writes, wasted storage

Rule of thumb: Index fields you filter/sort by frequently

Pagination for Large Results

Bad:

typescript
const allSongs = await db.query("songs").collect();  // 10,000 songs!

Good:

typescript
const page = await db.query("songs")
  .order("desc", "createdAt")
  .take(50);  // First 50 only

Atomic Updates

Use transactions to ensure data consistency:

typescript
// Transfer plays from one song to another
export const transferPlays = mutation({
  args: { fromSongId: v.id("songs"), toSongId: v.id("songs") },
  handler: async (ctx, args) => {
    const fromSong = await ctx.db.get(args.fromSongId);
    const toSong = await ctx.db.get(args.toSongId);
    
    // Both updates succeed or both fail (atomic)
    await ctx.db.patch(args.fromSongId, { plays: 0 });
    await ctx.db.patch(args.toSongId, {
      plays: toSong.plays + fromSong.plays
    });
  }
});

Real-World Example: NNT Ecosystem

Requirements:

Store songs with multiple annotation instances
Each annotation has start/end times, text, embeddings
Real-time updates when annotations added
Fast search by song, time range, semantic content

Schema:

typescript
export default defineSchema({
  songs: defineTable({
    title: v.string(),
    artist: v.string(),
    appleMusicId: v.optional(v.string()),
    spotifyId: v.optional(v.string())
  }),
  
  annotations: defineTable({
    songId: v.id("songs"),
    startTime: v.number(),
    endTime: v.number(),
    text: v.string(),
    embedding: v.array(v.float64()),
    createdBy: v.string(),
    createdAt: v.number()
  })
    .index("by_song", ["songId"])
    .index("by_song_time", ["songId", "startTime"]),
  
  tscribeInstances: defineTable({
    articleId: v.string(),
    songId: v.id("songs"),
    section: v.string(),
    startTime: v.number(),
    endTime: v.number()
  }).index("by_article", ["articleId"])
});

Query examples:

typescript
// Get all annotations for a song, ordered by time
const annotations = await db.query("annotations")
  .withIndex("by_song_time", q => q.eq("songId", songId))
  .order("asc", "startTime")
  .collect();

// Get annotations overlapping specific time range
const overlap = await db.query("annotations")
  .withIndex("by_song_time", q => 
    q.eq("songId", songId)
     .gte("startTime", 100)
     .lte("endTime", 200)
  )
  .collect();

Database Theory

Database Theory

Core Concepts

CRUD Operations

Schemas

Queries

Indexes

Database Types

Relational Databases

NoSQL Databases

Convex-Specific Concepts

Real-Time Reactivity

Queries, Mutations, Actions

Document IDs

Common Patterns

One-to-Many Relationships

Many-to-Many Relationships

Denormalization

Best Practices

Design for Your Queries

Use Indexes Wisely

Pagination for Large Results

Atomic Updates

Real-World Example: NNT Ecosystem

See Also