Milestone 3: Database Selection
Status: Planning
Goal: Choose a database for developer accounts, app metadata, and analytics.
Overview
The database stores all persistent data: developer accounts, app metadata, versions, telemetry events, and audit logs.
Requirements
Data Characteristics
| Data Type | Volume | Access Pattern | Consistency |
|---|---|---|---|
| Developers | 10K rows | Read-heavy, low write | Strong |
| Apps | 100K rows | Read-heavy | Strong |
| Versions | 500K rows | Read-heavy | Strong |
| API Keys | 50K rows | Read-heavy | Strong |
| Telemetry | 100M+ rows | Write-heavy, append | Eventual OK |
| Audit Logs | 10M+ rows | Write-heavy, append | Eventual OK |
Query Patterns
- Get developer by email
- List apps by developer
- Get app with latest version
- Search apps by name/tags
- Aggregate telemetry by app/day
- Time-range queries on events
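As a rough sketch, the first four patterns could look like this in PostgreSQL against the core tables listed under Schema Design. The literal values are placeholders, and treating the highest `version_code` with status `published` as the "latest" version is an assumption. The two telemetry patterns are left out here because they depend on a telemetry table layout that is still undecided.

```sql
-- Get developer by email (email is UNIQUE, so at most one row).
SELECT id, name, verified
FROM developers
WHERE email = 'dev@example.com';

-- List apps by developer, newest first.
SELECT id, package_id, name, status
FROM apps
WHERE developer_id = '00000000-0000-0000-0000-000000000001'
ORDER BY created_at DESC;

-- Get app with latest version.
-- Assumption: "latest" = highest version_code with status 'published'.
SELECT a.id, a.name, v.version_code, v.version_name, v.package_url
FROM apps AS a
LEFT JOIN LATERAL (
    SELECT version_code, version_name, package_url
    FROM app_versions
    WHERE app_id = a.id
      AND status = 'published'
    ORDER BY version_code DESC
    LIMIT 1
) AS v ON TRUE
WHERE a.package_id = 'com.dev.app';

-- Search apps by name or tag.
-- The cast is needed because tags is VARCHAR(50)[], not TEXT[].
SELECT id, name, tags
FROM apps
WHERE status = 'published'
  AND (name ILIKE '%notes%'
       OR tags @> ARRAY['productivity']::VARCHAR[]);
```

The `LEFT JOIN LATERAL` keeps the app row even when it has no published version yet, in which case the version columns come back as NULL.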
Options Analysis
Option A: PostgreSQL
Characteristics
Pros
| Advantage | Details |
|---|---|
| Battle-tested | Decades of reliability |
| ACID compliance | Strong consistency |
| JSON support | JSONB for flexible data |
| Full-text search | No separate search engine needed |
| Extensions | PostGIS, pg_trgm, etc. |
| Tooling | pgAdmin, great ORMs |
Cons
| Disadvantage | Details |
|---|---|
| Ops overhead | Need connection pooling |
| Scaling writes | Vertical scaling limits |
| Time-series | Not optimized for telemetry |
Hosting Options
| Provider | Free Tier | Paid |
|---|---|---|
| Supabase | 500MB | $25/mo |
| Neon | 512MB | $19/mo |
| Railway | 1GB | $5/mo |
| AWS RDS | - | $15/mo+ |
| Self-hosted | - | VPS cost |
Option B: SQLite + Litestream
Characteristics
Pros
| Advantage | Details |
|---|---|
| Zero ops | No separate DB server |
| Fast reads | In-process, no network |
| Simple backup | Litestream handles replication |
| Low cost | Just storage costs |
| Portable | Easy local development |
Cons
| Disadvantage | Details |
|---|---|
| Single writer | Limits write concurrency |
| No horizontal scale | One server only |
| Limited features | Full-text search requires the FTS5 extension |
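A minimal sketch of how two of these limits are commonly softened, assuming the SQLite build has FTS5 compiled in. WAL mode lets readers continue while the single writer is busy, and Litestream also relies on it. The `apps_fts` table and the sample row are hypothetical.

```sql
-- WAL mode: readers are not blocked by the single writer.
PRAGMA journal_mode = WAL;

-- Hypothetical search index over app names and descriptions.
CREATE VIRTUAL TABLE apps_fts USING fts5(name, description);

INSERT INTO apps_fts (rowid, name, description)
VALUES (1, 'Quick Notes', 'A small note-taking app');

-- Prefix query, best match first.
SELECT rowid, name
FROM apps_fts
WHERE apps_fts MATCH 'note*'
ORDER BY rank;
```

WAL does not remove the single-writer constraint; writes are still serialized, which is the real ceiling for telemetry ingestion.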
Cost Estimate
| Component | Cost/month |
|---|---|
| S3 storage (10GB) | $0.25 |
| Compute | Included in app server |
Option C: PostgreSQL + TimescaleDB
Characteristics
Pros
| Advantage | Details |
|---|---|
| Best of both | Relational + time-series |
| Auto-partition | Handles telemetry scale |
| Compression | 90%+ compression ratio (vendor figure; varies with data) |
| Continuous aggregates | Pre-computed rollups |
Cons
| Disadvantage | Details |
|---|---|
| Complexity | More to manage |
| Cost | Higher than plain Postgres |
| Learning curve | New concepts |
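To make the "new concepts" concrete, here is a sketch of the three TimescaleDB features named above. The `telemetry_events` table and its columns are illustrative, not a decided schema, and the calls follow the TimescaleDB 2.x API; newer releases may prefer different spellings.

```sql
CREATE EXTENSION IF NOT EXISTS timescaledb;

-- Hypothetical telemetry table; column names are illustrative.
CREATE TABLE telemetry_events (
    time       TIMESTAMPTZ NOT NULL,
    app_id     UUID        NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    payload    JSONB       DEFAULT '{}'
);

-- Auto-partition: turn the table into a hypertable chunked by time.
SELECT create_hypertable('telemetry_events', 'time');

-- Compression: segment by app, compress chunks older than 7 days.
ALTER TABLE telemetry_events SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'app_id'
);
SELECT add_compression_policy('telemetry_events', INTERVAL '7 days');

-- Continuous aggregate: daily event counts per app and event type.
CREATE MATERIALIZED VIEW telemetry_daily
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 day', time) AS day,
       app_id,
       event_type,
       count(*) AS events
FROM telemetry_events
GROUP BY day, app_id, event_type;
```

The 7-day compression window is a placeholder; the right value depends on the retention answers under Open Questions.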
Option D: Hybrid Approach
Pros
| Advantage | Details |
|---|---|
| Right tool for job | Optimized for each use case |
| Scale independently | Telemetry won't affect main DB |
| Performance | Best possible for each workload |
Cons
| Disadvantage | Details |
|---|---|
| Complexity | Multiple systems to manage |
| Cost | More infrastructure |
| Consistency | Cross-DB transactions hard |
Schema Design
Core Tables
```sql
-- Developers
CREATE TABLE developers (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(100) NOT NULL,
    password_hash VARCHAR(255),
    oauth_provider VARCHAR(50),
    oauth_id VARCHAR(255),
    verified BOOLEAN DEFAULT FALSE,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- API Keys
CREATE TABLE api_keys (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    developer_id UUID REFERENCES developers(id) ON DELETE CASCADE,
    name VARCHAR(100) NOT NULL,
    key_hash VARCHAR(255) NOT NULL,
    key_prefix VARCHAR(10) NOT NULL, -- For display: "mk_abc..."
    permissions JSONB DEFAULT '[]',
    last_used_at TIMESTAMPTZ,
    expires_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Apps
CREATE TABLE apps (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    developer_id UUID REFERENCES developers(id) ON DELETE CASCADE,
    package_id VARCHAR(255) UNIQUE NOT NULL, -- com.dev.app
    name VARCHAR(100) NOT NULL,
    description TEXT,
    category VARCHAR(50),
    tags VARCHAR(50)[] DEFAULT '{}',
    status VARCHAR(20) DEFAULT 'draft', -- draft, published, suspended
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- App Versions
CREATE TABLE app_versions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    app_id UUID REFERENCES apps(id) ON DELETE CASCADE,
    version_code INTEGER NOT NULL,
    version_name VARCHAR(20) NOT NULL,
    package_url TEXT NOT NULL,
    package_size BIGINT NOT NULL,
    signature VARCHAR(512) NOT NULL,
    permissions JSONB DEFAULT '[]',
    min_mosis_version VARCHAR(20),
    release_notes TEXT,
    status VARCHAR(20) DEFAULT 'draft', -- draft, review, approved, published, rejected
    review_notes TEXT,
    published_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(app_id, version_code)
);

-- Developer Signing Keys
CREATE TABLE signing_keys (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    developer_id UUID REFERENCES developers(id) ON DELETE CASCADE,
    name VARCHAR(100) NOT NULL,
    public_key TEXT NOT NULL,
    fingerprint VARCHAR(64) NOT NULL,
    is_active BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```
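One gap worth noting: `DEFAULT NOW()` only fills `updated_at` on insert, so the column goes stale after the first update unless the application sets it. A possible fix, sketched here for PostgreSQL 11 or later, is a shared trigger function. The function and trigger names are hypothetical.

```sql
-- Hypothetical helper: stamp updated_at on every row update.
CREATE OR REPLACE FUNCTION set_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Attach it to the two tables that carry an updated_at column.
CREATE TRIGGER developers_set_updated_at
    BEFORE UPDATE ON developers
    FOR EACH ROW EXECUTE FUNCTION set_updated_at();

CREATE TRIGGER apps_set_updated_at
    BEFORE UPDATE ON apps
    FOR EACH ROW EXECUTE FUNCTION set_updated_at();
```

Setting the column in application code works too; the trigger just removes the chance of forgetting it in one code path.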
Telemetry Tables (if using PostgreSQL)
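No telemetry schema is fixed yet, so the following is only a sketch of one plausible shape using native range partitioning in plain PostgreSQL. The table name, columns, and monthly partition size are all assumptions. There is deliberately no foreign key to `apps`, to keep the write path cheap.

```sql
-- Hypothetical append-only event table, partitioned by event time.
CREATE TABLE telemetry_events (
    app_id      UUID        NOT NULL,
    event_type  VARCHAR(50) NOT NULL,
    payload     JSONB       DEFAULT '{}',
    occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
) PARTITION BY RANGE (occurred_at);

-- One partition per month; new ones must be created ahead of time.
CREATE TABLE telemetry_events_2025_01
    PARTITION OF telemetry_events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- Aggregate telemetry by app/day over a time range.
-- date_trunc on TIMESTAMPTZ uses the session time zone for day boundaries.
SELECT app_id,
       date_trunc('day', occurred_at) AS day,
       count(*) AS events
FROM telemetry_events
WHERE occurred_at >= '2025-01-01'
  AND occurred_at <  '2025-02-01'
GROUP BY app_id, day
ORDER BY app_id, day;
```

Because the `WHERE` clause filters on the partition key, the planner can skip partitions outside the range, and old data can be retired by dropping whole partitions.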
Indexes
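The index list is still to be settled; as a starting point, these would support the query patterns listed under Requirements. Index names are hypothetical. `developers.email`, `apps.package_id`, and `app_versions (app_id, version_code)` already get indexes from their UNIQUE constraints, so they are not repeated here.

```sql
-- List apps by developer.
CREATE INDEX idx_apps_developer_id ON apps (developer_id);

-- Foreign-key lookups for keys owned by a developer.
CREATE INDEX idx_api_keys_developer_id ON api_keys (developer_id);
CREATE INDEX idx_signing_keys_developer_id ON signing_keys (developer_id);

-- API key authentication by hash.
CREATE INDEX idx_api_keys_key_hash ON api_keys (key_hash);

-- Search apps by tag (array containment) and by name (trigram matching).
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_apps_tags ON apps USING GIN (tags);
CREATE INDEX idx_apps_name_trgm ON apps USING GIN (name gin_trgm_ops);
```

The trigram index is what lets `ILIKE '%term%'` searches on `name` avoid a full table scan; without `pg_trgm`, a leading wildcard cannot use a B-tree index.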
Migration Strategy
Approach: Incremental Migrations
Tools
- Go: golang-migrate, goose
- Node.js: Prisma Migrate, Drizzle Kit
- Rust: sqlx migrate, refinery
Rollback Strategy
- Every migration has up/down
- Test rollbacks in staging
- Keep migrations small and focused
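A small, reversible migration might look like the sketch below. It uses goose's single-file layout, where `-- +goose Up` and `-- +goose Down` annotations mark the two directions; golang-migrate would split the same statements into separate `.up.sql` and `.down.sql` files. The `homepage_url` column is a made-up example.

```sql
-- +goose Up
-- Hypothetical change: add an optional homepage link to apps.
ALTER TABLE apps ADD COLUMN homepage_url TEXT;

-- +goose Down
-- Reverses the Up step; any data in the column is lost on rollback.
ALTER TABLE apps DROP COLUMN homepage_url;
```

Adding a nullable column without a default is a metadata-only change in PostgreSQL, which is the kind of small step that keeps rollbacks cheap to test in staging.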
Backup Strategy
PostgreSQL
SQLite + Litestream
Recovery Time Objectives
| Scenario | RTO (recovery time) | RPO (max data loss) |
|---|---|---|
| Hardware failure | 1 hour | 5 minutes |
| Data corruption | 4 hours | 1 hour |
| Disaster recovery | 24 hours | 24 hours |
Recommendation
For MVP/Early Stage
SQLite + Litestream
- Simplest to operate
- Lowest cost
- Good enough for initial scale
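- Migration to PostgreSQL later is straightforward if the schema sticks to portable SQL (the core tables as written use PostgreSQL-only features such as JSONB, arrays, and gen_random_uuid())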
For Production Scale
PostgreSQL + TimescaleDB
- Handles all data types well
- Time-series for telemetry
- Proven at scale
- Good tooling ecosystem
Hybrid (If needed later)
Deliverables
Open Questions
- Expected telemetry volume per day?
- How long to retain raw telemetry?
- Need for real-time analytics vs batch?
- Multi-region requirements?
References