# Database Theory
A database is an organized collection of data that can be efficiently queried, updated, and managed. Understanding database fundamentals helps you build scalable, reliable applications.
## Core Concepts
### CRUD Operations
Every database supports four fundamental operations:
**Create** - Add new data
```typescript
await db.insert("songs", {
title: "Giant Steps",
artist: "John Coltrane",
year: 1960
});
```
**Read** - Retrieve data
```typescript
const song = await db.query("songs")
.filter(q => q.eq(q.field("title"), "Giant Steps"))
.first();
```
**Update** - Modify existing data
```typescript
await db.patch(songId, { year: 1959 });
```
**Delete** - Remove data
```typescript
await db.delete(songId);
```
### Schemas
A **schema** defines the structure of your data - what fields exist and what types they have.
**Example schema (Convex):**
```typescript
import { defineSchema, defineTable } from "convex/server";
import { v } from "convex/values";
export default defineSchema({
songs: defineTable({
title: v.string(),
artist: v.string(),
year: v.number(),
duration: v.optional(v.number()), // Optional field
tags: v.array(v.string()) // Array of strings
})
});
```
**Why schemas matter:**
- **Type safety** - Catch errors at compile time
- **Data integrity** - Ensure consistency
- **Performance** - Database can optimize storage
### Queries
Queries retrieve data based on conditions.
**Simple query:**
```typescript
// Get all jazz songs
const jazzSongs = await db.query("songs")
.filter(q => q.eq(q.field("genre"), "jazz"))
.collect();
```
**Complex query:**
```typescript
// Get songs from 1960s by Coltrane, sorted by title
const songs = await db.query("songs")
.filter(q =>
q.and(
q.eq(q.field("artist"), "John Coltrane"),
q.gte(q.field("year"), 1960),
q.lte(q.field("year"), 1969)
)
)
.order("asc", "title")
.collect();
```
### Indexes
**Indexes** make queries fast by creating lookup tables.
**Without index:**
- Database scans every record → O(n) time
- 10,000 songs → 10,000 checks
**With index:**
- Database jumps to match → O(log n) time
- 10,000 songs → ~13 checks
**Creating indexes (Convex):**
```typescript
songs: defineTable({
title: v.string(),
artist: v.string(),
year: v.number()
})
.index("by_artist", ["artist"])
.index("by_year", ["year"])
.index("by_artist_year", ["artist", "year"]) // Compound index
```
**Using indexes:**
```typescript
// Fast - uses index
const coltraneSONGS = await db.query("songs")
.withIndex("by_artist", q => q.eq("artist", "John Coltrane"))
.collect();
// Slow - no index
const oldSongs = await db.query("songs")
.filter(q => q.lt(q.field("year"), 1950)) // No index on year < 1950
.collect();
```
## Database Types
### Relational Databases
**Structure:** Tables with rows and columns
**Relationships:** Foreign keys link tables
**Query language:** SQL
**Examples:** PostgreSQL, MySQL, SQLite
**Song database (relational):**
```sql
-- Songs table
CREATE TABLE songs (
id SERIAL PRIMARY KEY,
title VARCHAR(255),
artist_id INT REFERENCES artists(id),
album_id INT REFERENCES albums(id)
);
-- Artists table
CREATE TABLE artists (
id SERIAL PRIMARY KEY,
name VARCHAR(255)
);
-- Query with JOIN
SELECT songs.title, artists.name
FROM songs
JOIN artists ON songs.artist_id = artists.id
WHERE artists.name = 'John Coltrane';
```
**Strengths:**
- Strong data integrity (ACID transactions)
- Complex relationships
- Mature tooling
**Weaknesses:**
- Rigid schema (migrations required)
- Harder to scale horizontally
### NoSQL Databases
**Structure:** Flexible documents, key-value pairs, graphs
**Relationships:** Embedded or referenced
**Examples:** MongoDB, Convex, Firebase
**Song database (NoSQL/Convex):**
```typescript
{
_id: "abc123",
title: "Giant Steps",
artist: {
name: "John Coltrane",
born: 1926
},
album: "Giant Steps",
tracks: [
{ position: 1, title: "Giant Steps", duration: 285 },
{ position: 2, title: "Cousin Mary", duration: 342 }
]
}
```
**Strengths:**
- Flexible schema
- Easy horizontal scaling
- Fast development
**Weaknesses:**
- Less rigid data integrity
- Complex relationships harder to model
## Convex-Specific Concepts
### Real-Time Reactivity
Convex automatically updates React components when data changes.
```typescript
function SongList() {
// Subscribes to changes
const songs = useQuery(api.songs.list);
// Automatically re-renders when songs added/updated/deleted
return (
<ul>
{songs?.map(song => <li key={song._id}>{song.title}</li>)}
</ul>
);
}
```
### Queries, Mutations, Actions
**Query** - Read data (can't modify)
```typescript
export const get = query({
args: { id: v.id("songs") },
handler: async (ctx, args) => {
return await ctx.db.get(args.id);
}
});
```
**Mutation** - Write data (deterministic)
```typescript
export const add = mutation({
args: { title: v.string(), artist: v.string() },
handler: async (ctx, args) => {
return await ctx.db.insert("songs", args);
}
});
```
**Action** - Call external APIs (non-deterministic)
```typescript
export const fetchFromSpotify = action({
args: { songId: v.string() },
handler: async (ctx, args) => {
const data = await fetch(`https://api.spotify.com/...`);
// Can call mutations to save results
await ctx.runMutation(api.songs.update, { ... });
}
});
```
### Document IDs
Convex uses typed IDs: `Id<"songs">`
```typescript
// Type-safe ID
const songId: Id<"songs"> = "k17abc123";
// Can't accidentally use wrong ID
const artist = await db.get(songId); // Error: Expected Id<"artists">
```
## Common Patterns
### One-to-Many Relationships
**Example:** One artist, many songs
**Approach 1: Array of IDs**
```typescript
artists: defineTable({
name: v.string(),
songIds: v.array(v.id("songs"))
})
```
**Approach 2: Back-reference (better for Convex)**
```typescript
songs: defineTable({
title: v.string(),
artistId: v.id("artists")
}).index("by_artist", ["artistId"])
// Query all songs by artist
const songs = await db.query("songs")
.withIndex("by_artist", q => q.eq("artistId", artistId))
.collect();
```
### Many-to-Many Relationships
**Example:** Songs can have multiple tags, tags apply to multiple songs
**Junction table:**
```typescript
songTags: defineTable({
songId: v.id("songs"),
tagId: v.id("tags")
})
.index("by_song", ["songId"])
.index("by_tag", ["tagId"])
```
### Denormalization
**Principle:** Copy data to avoid joins
**Example:** Store artist name in song (instead of just ID)
```typescript
songs: defineTable({
title: v.string(),
artistId: v.id("artists"),
artistName: v.string() // Denormalized - avoid extra query
})
```
**Trade-off:**
- ✅ Faster queries (no join)
- ❌ More storage
- ❌ Must update in multiple places
## Best Practices
### Design for Your Queries
**Bad:** Design schema first, then figure out queries
**Good:** Design queries first, then schema to support them
**Example:**
- Need to show "Recent songs by artist"?
- Create index: `by_artist_created`
### Use Indexes Wisely
**Too few indexes:** Slow queries
**Too many indexes:** Slow writes, wasted storage
**Rule of thumb:** Index fields you filter/sort by frequently
### Pagination for Large Results
**Bad:**
```typescript
const allSongs = await db.query("songs").collect(); // 10,000 songs!
```
**Good:**
```typescript
const page = await db.query("songs")
.order("desc", "createdAt")
.take(50); // First 50 only
```
### Atomic Updates
Use transactions to ensure data consistency:
```typescript
// Transfer plays from one song to another
export const transferPlays = mutation({
args: { fromSongId: v.id("songs"), toSongId: v.id("songs") },
handler: async (ctx, args) => {
const fromSong = await ctx.db.get(args.fromSongId);
const toSong = await ctx.db.get(args.toSongId);
// Both updates succeed or both fail (atomic)
await ctx.db.patch(args.fromSongId, { plays: 0 });
await ctx.db.patch(args.toSongId, {
plays: toSong.plays + fromSong.plays
});
}
});
```
## Real-World Example: NNT Ecosystem
**Requirements:**
- Store songs with multiple annotation instances
- Each annotation has start/end times, text, embeddings
- Real-time updates when annotations added
- Fast search by song, time range, semantic content
**Schema:**
```typescript
export default defineSchema({
songs: defineTable({
title: v.string(),
artist: v.string(),
appleMusicId: v.optional(v.string()),
spotifyId: v.optional(v.string())
}),
annotations: defineTable({
songId: v.id("songs"),
startTime: v.number(),
endTime: v.number(),
text: v.string(),
embedding: v.array(v.float64()),
createdBy: v.string(),
createdAt: v.number()
})
.index("by_song", ["songId"])
.index("by_song_time", ["songId", "startTime"]),
tscribeInstances: defineTable({
articleId: v.string(),
songId: v.id("songs"),
section: v.string(),
startTime: v.number(),
endTime: v.number()
}).index("by_article", ["articleId"])
});
```
**Query examples:**
```typescript
// Get all annotations for a song, ordered by time
const annotations = await db.query("annotations")
.withIndex("by_song_time", q => q.eq("songId", songId))
.order("asc", "startTime")
.collect();
// Get annotations overlapping specific time range
const overlap = await db.query("annotations")
.withIndex("by_song_time", q =>
q.eq("songId", songId)
.gte("startTime", 100)
.lte("endTime", 200)
)
.collect();
```
## See Also
- [[Getting started with Convex]] - Convex setup
- [[Langchain with Convex]] - Embeddings + vector search
- [[Data Storage Architecture]] - NNT-specific schema details
- [[Database Query Patterns]] - Efficient Convex queries