Milestone 3: Database Selection
Status: Decided
Goal: Choose database for developer accounts, app metadata, and analytics.
Decision
SQLite + Litestream for self-hosted deployment on Synology NAS.
Rationale
- Single container - No separate database service needed
- Minimal resources - ~50MB RAM, perfect for NAS
- Zero ops - No connection pooling, no tuning
- Continuous backup - Litestream replicates to local storage
- Point-in-time recovery - Restore to any moment
- Sufficient scale - Handles 1000s of developers easily
Architecture
Litestream Configuration
Overview
The database stores all persistent data: developer accounts, app metadata, versions, telemetry events, and audit logs.
Requirements
Data Characteristics
| Data Type |
Volume |
Access Pattern |
Consistency |
| Developers |
10K rows |
Read-heavy, low write |
Strong |
| Apps |
100K rows |
Read-heavy |
Strong |
| Versions |
500K rows |
Read-heavy |
Strong |
| API Keys |
50K rows |
Read-heavy |
Strong |
| Telemetry |
100M+ rows |
Write-heavy, append |
Eventual OK |
| Audit Logs |
10M+ rows |
Write-heavy, append |
Eventual OK |
Query Patterns
- Get developer by email
- List apps by developer
- Get app with latest version
- Search apps by name/tags
- Aggregate telemetry by app/day
- Time-range queries on events
Options Analysis
Option A: PostgreSQL
Characteristics
Pros
| Advantage |
Details |
| Battle-tested |
Decades of reliability |
| ACID compliance |
Strong consistency |
| JSON support |
JSONB for flexible data |
| Full-text search |
No separate search engine needed |
| Extensions |
PostGIS, pg_trgm, etc. |
| Tooling |
pgAdmin, great ORMs |
Cons
| Disadvantage |
Details |
| Ops overhead |
Need connection pooling |
| Scaling writes |
Vertical scaling limits |
| Time-series |
Not optimized for telemetry |
Hosting Options
| Provider |
Free Tier |
Paid |
| Supabase |
500MB |
$25/mo |
| Neon |
512MB |
$19/mo |
| Railway |
1GB |
$5/mo |
| AWS RDS |
- |
$15/mo+ |
| Self-hosted |
- |
VPS cost |
Option B: SQLite + Litestream
Characteristics
Pros
| Advantage |
Details |
| Zero ops |
No separate DB server |
| Fast reads |
In-process, no network |
| Simple backup |
Litestream handles replication |
| Low cost |
Just storage costs |
| Portable |
Easy local development |
Cons
| Disadvantage |
Details |
| Single writer |
Limits write concurrency |
| No horizontal scale |
One server only |
| Limited features |
No full-text (without FTS5) |
Cost Estimate
| Component |
Cost/month |
| S3 storage (10GB) |
$0.25 |
| Compute |
Included in app server |
Option C: PostgreSQL + TimescaleDB
Characteristics
Pros
| Advantage |
Details |
| Best of both |
Relational + time-series |
| Auto-partition |
Handles telemetry scale |
| Compression |
90%+ compression ratio |
| Continuous aggregates |
Pre-computed rollups |
Cons
| Disadvantage |
Details |
| Complexity |
More to manage |
| Cost |
Higher than plain Postgres |
| Learning curve |
New concepts |
Option D: Hybrid Approach
Pros
| Advantage |
Details |
| Right tool for job |
Optimized for each use case |
| Scale independently |
Telemetry won't affect main DB |
| Performance |
Best possible for each workload |
Cons
| Disadvantage |
Details |
| Complexity |
Multiple systems to manage |
| Cost |
More infrastructure |
| Consistency |
Cross-DB transactions hard |
Schema Design (SQLite)
Core Tables
-- Developers
CREATE TABLE developers (
id TEXT PRIMARY KEY, -- UUID as text
email TEXT UNIQUE NOT NULL,
name TEXT NOT NULL,
password_hash TEXT,
oauth_provider TEXT,
oauth_id TEXT,
verified INTEGER DEFAULT 0,
created_at TEXT DEFAULT (datetime('now')),
updated_at TEXT DEFAULT (datetime('now'))
);
-- API Keys
CREATE TABLE api_keys (
id TEXT PRIMARY KEY,
developer_id TEXT NOT NULL REFERENCES developers(id) ON DELETE CASCADE,
name TEXT NOT NULL,
key_hash TEXT NOT NULL,
key_prefix TEXT NOT NULL, -- For display: "mk_abc..."
permissions TEXT DEFAULT '[]', -- JSON array
last_used_at TEXT,
expires_at TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
-- Apps
CREATE TABLE apps (
id TEXT PRIMARY KEY,
developer_id TEXT NOT NULL REFERENCES developers(id) ON DELETE CASCADE,
package_id TEXT UNIQUE NOT NULL, -- com.dev.app
name TEXT NOT NULL,
description TEXT,
category TEXT,
tags TEXT DEFAULT '[]', -- JSON array
status TEXT DEFAULT 'draft', -- draft, published, suspended
created_at TEXT DEFAULT (datetime('now')),
updated_at TEXT DEFAULT (datetime('now'))
);
-- App Versions
CREATE TABLE app_versions (
id TEXT PRIMARY KEY,
app_id TEXT NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
version_code INTEGER NOT NULL,
version_name TEXT NOT NULL,
package_url TEXT NOT NULL,
package_size INTEGER NOT NULL,
signature TEXT NOT NULL,
permissions TEXT DEFAULT '[]', -- JSON array
min_mosis_version TEXT,
release_notes TEXT,
status TEXT DEFAULT 'draft', -- draft, review, approved, published, rejected
review_notes TEXT,
published_at TEXT,
created_at TEXT DEFAULT (datetime('now')),
UNIQUE(app_id, version_code)
);
-- Developer Signing Keys
CREATE TABLE signing_keys (
id TEXT PRIMARY KEY,
developer_id TEXT NOT NULL REFERENCES developers(id) ON DELETE CASCADE,
name TEXT NOT NULL,
public_key TEXT NOT NULL,
fingerprint TEXT NOT NULL,
is_active INTEGER DEFAULT 1,
created_at TEXT DEFAULT (datetime('now'))
);
Telemetry Tables
Note: For high-volume telemetry, consider:
- Separate SQLite database file for telemetry (isolates write load)
- Monthly table rotation with application-level partitioning
- Aggressive data retention (delete events older than 90 days)
Indexes
Full-text Search: For app search, use SQLite FTS5:
Migration Strategy
Approach: Incremental Migrations
Tools
- Go: golang-migrate, goose
- Node.js: Prisma Migrate, Drizzle Kit
- Rust: sqlx migrate, refinery
Rollback Strategy
- Every migration has up/down
- Test rollbacks in staging
- Keep migrations small and focused
Backup Strategy
PostgreSQL
SQLite + Litestream
Recovery Time Objectives
| Scenario |
RTO |
RPO |
| Hardware failure |
1 hour |
5 minutes |
| Data corruption |
4 hours |
1 hour |
| Disaster recovery |
24 hours |
24 hours |
Recommendation
For MVP/Early Stage
SQLite + Litestream
- Simplest to operate
- Lowest cost
- Good enough for initial scale
- Easy migration to PostgreSQL later
For Production Scale
PostgreSQL + TimescaleDB
- Handles all data types well
- Time-series for telemetry
- Proven at scale
- Good tooling ecosystem
Hybrid (If needed later)
Deliverables
Open Questions
Expected telemetry volume per day? → Start simple, optimize if needed
How long to retain raw telemetry? → 90 days raw, daily aggregates indefinitely
Need for real-time analytics vs batch? → Batch is sufficient for MVP
Multi-region requirements? → Single NAS deployment for now
References