MosisService/DEV_PORTAL_M03_DATABASE.md

Milestone 3: Database Selection

Status: Decided
Goal: Choose database for developer accounts, app metadata, and analytics.

Decision

SQLite + Litestream for self-hosted deployment on Synology NAS.

Database:   SQLite 3.x (WAL mode)
Driver:     modernc.org/sqlite (pure Go, no CGO)
Backup:     Litestream continuous replication
Storage:    Synology volume (/volume1/mosis/)

Rationale

  1. Single container - No separate database service needed
  2. Minimal resources - ~50MB RAM, perfect for NAS
  3. Zero ops - No connection pooling, no tuning
  4. Continuous backup - Litestream replicates to local storage
  5. Point-in-time recovery - Restore to any moment
  6. Sufficient scale - Handles 1000s of developers easily
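
A minimal sketch of wiring this decision up in Go, assuming modernc.org/sqlite's `_pragma` DSN convention for enabling WAL mode and a busy timeout (the `sql.Open` call is left as a comment so the sketch stays dependency-free; enabling `foreign_keys` matters because SQLite does not enforce REFERENCES clauses by default):

```go
package main

import (
	"fmt"
	"strings"
)

// portalDSN builds a connection string for the pure-Go modernc.org/sqlite
// driver. With that driver imported, the database would be opened via:
//   db, err := sql.Open("sqlite", portalDSN("/data/portal.db"))
func portalDSN(path string) string {
	pragmas := []string{
		"_pragma=journal_mode(WAL)",  // decided: WAL mode
		"_pragma=busy_timeout(5000)", // wait up to 5s on lock contention
		"_pragma=foreign_keys(1)",    // SQLite leaves FK enforcement off by default
	}
	return "file:" + path + "?" + strings.Join(pragmas, "&")
}

func main() {
	fmt.Println(portalDSN("/data/portal.db"))
}
```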

Architecture

┌─────────────────────────────────────────┐
│           Synology NAS                   │
│  ┌─────────────────────────────────┐    │
│  │  mosis-portal container         │    │
│  │  ├── Go binary                  │    │
│  │  ├── SQLite (portal.db)         │    │
│  │  └── Litestream                 │    │
│  └──────────────┬──────────────────┘    │
│                 │                        │
│  ┌──────────────▼──────────────────┐    │
│  │  /volume1/mosis/                │    │
│  │  ├── data/portal.db             │    │
│  │  ├── data/portal.db-wal         │    │
│  │  ├── backups/ (litestream)      │    │
│  │  └── packages/ (app uploads)    │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘

Litestream Configuration

dbs:
  - path: /data/portal.db
    replicas:
      - type: file
        path: /backups/portal
        retention: 720h  # 30 days

Overview

The database stores all persistent data: developer accounts, app metadata, versions, telemetry events, and audit logs.


Requirements

Data Characteristics

| Data Type  | Volume     | Access Pattern        | Consistency |
|------------|------------|-----------------------|-------------|
| Developers | 10K rows   | Read-heavy, low write | Strong      |
| Apps       | 100K rows  | Read-heavy            | Strong      |
| Versions   | 500K rows  | Read-heavy            | Strong      |
| API Keys   | 50K rows   | Read-heavy            | Strong      |
| Telemetry  | 100M+ rows | Write-heavy, append   | Eventual OK |
| Audit Logs | 10M+ rows  | Write-heavy, append   | Eventual OK |

Query Patterns

  • Get developer by email
  • List apps by developer
  • Get app with latest version
  • Search apps by name/tags
  • Aggregate telemetry by app/day
  • Time-range queries on events
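
Most of these patterns are plain indexed lookups; the one worth sketching is the app/day aggregation, which hinges on bucketing ISO8601 event timestamps into UTC dates. A minimal helper (names are illustrative):

```go
package main

import (
	"fmt"
	"time"
)

// dayBucket converts an ISO8601/RFC3339 timestamp, as stored with each
// telemetry event, into the UTC YYYY-MM-DD key used for daily aggregates.
// Normalizing to UTC matters: an evening event in UTC-5 belongs to the
// next UTC day.
func dayBucket(iso string) (string, error) {
	t, err := time.Parse(time.RFC3339, iso)
	if err != nil {
		return "", err
	}
	return t.UTC().Format("2006-01-02"), nil
}

func main() {
	d, _ := dayBucket("2025-01-15T23:30:00-05:00")
	fmt.Println(d) // crosses midnight in UTC
}
```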

Options Analysis

Option A: PostgreSQL

Characteristics

Type:       Relational (SQL)
ACID:       Full
JSON:       Native JSONB support
Full-text:  Built-in tsvector
Scaling:    Vertical + read replicas

Pros

| Advantage        | Details                          |
|------------------|----------------------------------|
| Battle-tested    | Decades of reliability           |
| ACID compliance  | Strong consistency               |
| JSON support     | JSONB for flexible data          |
| Full-text search | No separate search engine needed |
| Extensions       | PostGIS, pg_trgm, etc.           |
| Tooling          | pgAdmin, great ORMs              |

Cons

| Disadvantage   | Details                                  |
|----------------|------------------------------------------|
| Ops overhead   | Needs connection pooling                 |
| Scaling writes | Vertical scaling limits                  |
| Time-series    | Not optimized for high-volume telemetry  |

Hosting Options

| Provider    | Free Tier | Paid     |
|-------------|-----------|----------|
| Supabase    | 500MB     | $25/mo   |
| Neon        | 512MB     | $19/mo   |
| Railway     | 1GB       | $5/mo    |
| AWS RDS     | -         | $15/mo+  |
| Self-hosted | -         | VPS cost |

Option B: SQLite + Litestream

Characteristics

Type:       Embedded relational
ACID:       Full
Scaling:    Single writer
Backup:     Litestream replication (S3 or local file)

Pros

| Advantage     | Details                       |
|---------------|-------------------------------|
| Zero ops      | No separate DB server         |
| Fast reads    | In-process, no network        |
| Simple backup | Litestream handles replication|
| Low cost      | Just storage costs            |
| Portable      | Easy local development        |

Cons

| Disadvantage        | Details                                         |
|---------------------|-------------------------------------------------|
| Single writer       | Limits write concurrency                        |
| No horizontal scale | One server only                                 |
| Limited features    | No full-text search without the FTS5 extension  |

Cost Estimate

| Component         | Cost/month             |
|-------------------|------------------------|
| S3 storage (10GB) | $0.25                  |
| Compute           | Included in app server |

Option C: PostgreSQL + TimescaleDB

Characteristics

Type:       Time-series extension
Base:       PostgreSQL
Scaling:    Automatic partitioning
Compression: Native

Pros

| Advantage             | Details                     |
|-----------------------|-----------------------------|
| Best of both          | Relational + time-series    |
| Auto-partition        | Handles telemetry scale     |
| Compression           | 90%+ compression ratio      |
| Continuous aggregates | Pre-computed rollups        |

Cons

| Disadvantage   | Details                   |
|----------------|---------------------------|
| Complexity     | More to manage            |
| Cost           | Higher than plain Postgres|
| Learning curve | New concepts              |

Option D: Hybrid Approach

PostgreSQL          → Developers, Apps, Versions, API Keys
ClickHouse/QuestDB  → Telemetry, Analytics
Redis               → Caching, Sessions

Pros

| Advantage           | Details                          |
|---------------------|----------------------------------|
| Right tool for job  | Optimized for each use case      |
| Scale independently | Telemetry won't affect main DB   |
| Performance         | Best possible for each workload  |

Cons

| Disadvantage | Details                        |
|--------------|--------------------------------|
| Complexity   | Multiple systems to manage     |
| Cost         | More infrastructure            |
| Consistency  | Cross-DB transactions hard     |

Schema Design (SQLite)

Core Tables

-- Developers
CREATE TABLE developers (
    id TEXT PRIMARY KEY,  -- UUID as text
    email TEXT UNIQUE NOT NULL,
    name TEXT NOT NULL,
    password_hash TEXT,
    oauth_provider TEXT,
    oauth_id TEXT,
    verified INTEGER DEFAULT 0,
    created_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now'))
);

-- API Keys
CREATE TABLE api_keys (
    id TEXT PRIMARY KEY,
    developer_id TEXT NOT NULL REFERENCES developers(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    key_hash TEXT NOT NULL,
    key_prefix TEXT NOT NULL,  -- For display: "mk_abc..."
    permissions TEXT DEFAULT '[]',  -- JSON array
    last_used_at TEXT,
    expires_at TEXT,
    created_at TEXT DEFAULT (datetime('now'))
);

-- Apps
CREATE TABLE apps (
    id TEXT PRIMARY KEY,
    developer_id TEXT NOT NULL REFERENCES developers(id) ON DELETE CASCADE,
    package_id TEXT UNIQUE NOT NULL,  -- com.dev.app
    name TEXT NOT NULL,
    description TEXT,
    category TEXT,
    tags TEXT DEFAULT '[]',  -- JSON array
    status TEXT DEFAULT 'draft',  -- draft, published, suspended
    created_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now'))
);

-- App Versions
CREATE TABLE app_versions (
    id TEXT PRIMARY KEY,
    app_id TEXT NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
    version_code INTEGER NOT NULL,
    version_name TEXT NOT NULL,
    package_url TEXT NOT NULL,
    package_size INTEGER NOT NULL,
    signature TEXT NOT NULL,
    permissions TEXT DEFAULT '[]',  -- JSON array
    min_mosis_version TEXT,
    release_notes TEXT,
    status TEXT DEFAULT 'draft',  -- draft, review, approved, published, rejected
    review_notes TEXT,
    published_at TEXT,
    created_at TEXT DEFAULT (datetime('now')),
    UNIQUE(app_id, version_code)
);

-- Developer Signing Keys
CREATE TABLE signing_keys (
    id TEXT PRIMARY KEY,
    developer_id TEXT NOT NULL REFERENCES developers(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    public_key TEXT NOT NULL,
    fingerprint TEXT NOT NULL,
    is_active INTEGER DEFAULT 1,
    created_at TEXT DEFAULT (datetime('now'))
);
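
As an illustration of how the api_keys columns fit together, here is a hedged sketch of key issuance: the developer sees the full key exactly once, while the server stores only a SHA-256 hash (key_hash) and a short displayable prefix (key_prefix). The "mk_" format and the specific lengths are assumptions for illustration, not part of the decided design:

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// newAPIKey returns the plaintext key (shown to the developer once),
// the digest stored in api_keys.key_hash, and the display string
// stored in api_keys.key_prefix (e.g. "mk_abc..." in the UI).
func newAPIKey() (plaintext, keyHash, keyPrefix string, err error) {
	buf := make([]byte, 24) // 192 bits of entropy
	if _, err = rand.Read(buf); err != nil {
		return
	}
	plaintext = "mk_" + hex.EncodeToString(buf)
	sum := sha256.Sum256([]byte(plaintext))
	keyHash = hex.EncodeToString(sum[:]) // only the hash is persisted
	keyPrefix = plaintext[:10] + "..."   // enough to identify, not to use
	return
}

func main() {
	key, hash, prefix, _ := newAPIKey()
	fmt.Println(key, hash, prefix)
}
```

On verification, the server hashes the presented key and compares it to key_hash; the plaintext is never stored.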

Telemetry Tables

-- Telemetry Events (append-only, partition by month via separate tables)
CREATE TABLE telemetry_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    app_id TEXT NOT NULL,
    device_id TEXT NOT NULL,  -- Hashed for privacy
    event_type TEXT NOT NULL,
    event_data TEXT,  -- JSON string
    mosis_version TEXT,
    timestamp TEXT NOT NULL  -- ISO8601 format
);

-- Crash Reports
CREATE TABLE crash_reports (
    id TEXT PRIMARY KEY,
    app_id TEXT NOT NULL,
    app_version TEXT NOT NULL,
    device_id TEXT NOT NULL,
    crash_type TEXT NOT NULL,
    message TEXT,
    stack_trace TEXT,
    context TEXT,  -- JSON string
    mosis_version TEXT,
    timestamp TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
);

-- Daily aggregates (computed by background job)
CREATE TABLE telemetry_daily (
    app_id TEXT NOT NULL,
    date TEXT NOT NULL,  -- YYYY-MM-DD
    event_type TEXT NOT NULL,
    count INTEGER NOT NULL,
    unique_devices INTEGER NOT NULL,
    PRIMARY KEY (app_id, date, event_type)
);

-- Audit Logs
CREATE TABLE audit_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    developer_id TEXT,
    action TEXT NOT NULL,
    resource_type TEXT,
    resource_id TEXT,
    details TEXT,  -- JSON string
    ip_address TEXT,
    user_agent TEXT,
    created_at TEXT DEFAULT (datetime('now'))
);
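
The device_id columns above are noted as hashed for privacy; the exact scheme is not specified in this document, but a common approach is a keyed hash, so raw device identifiers never reach the database and cannot be correlated without the server-side secret. A sketch under that assumption:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashDeviceID derives the stored device_id from a raw device identifier
// using HMAC-SHA256 with a server-side secret. The same device always maps
// to the same value (so unique-device counts still work), but the raw ID
// cannot be recovered without the key.
func hashDeviceID(secret []byte, rawID string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(rawID))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	secret := []byte("server-side-secret") // placeholder; load from config
	fmt.Println(hashDeviceID(secret, "device-1234"))
}
```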

Note: For high-volume telemetry, consider:

  • Separate SQLite database file for telemetry (isolates write load)
  • Monthly table rotation with application-level partitioning
  • Aggressive data retention (delete events older than 90 days)

Indexes

-- Developers
-- (email needs no explicit index: its UNIQUE constraint already creates one)
CREATE INDEX idx_developers_oauth ON developers(oauth_provider, oauth_id);

-- API Keys
CREATE INDEX idx_api_keys_developer ON api_keys(developer_id);
CREATE INDEX idx_api_keys_prefix ON api_keys(key_prefix);

-- Apps
-- (package_id needs no explicit index: its UNIQUE constraint already creates one)
CREATE INDEX idx_apps_developer ON apps(developer_id);
CREATE INDEX idx_apps_status ON apps(status);
CREATE INDEX idx_apps_name ON apps(name);  -- for name searches; LIKE with a leading wildcard cannot use it

-- Versions
CREATE INDEX idx_versions_app ON app_versions(app_id);
CREATE INDEX idx_versions_status ON app_versions(status);

-- Signing Keys
CREATE INDEX idx_signing_keys_developer ON signing_keys(developer_id);
CREATE INDEX idx_signing_keys_fingerprint ON signing_keys(fingerprint);

-- Telemetry
CREATE INDEX idx_telemetry_app ON telemetry_events(app_id, timestamp);
CREATE INDEX idx_telemetry_type ON telemetry_events(event_type, timestamp);

-- Crashes
CREATE INDEX idx_crashes_app ON crash_reports(app_id, timestamp);
CREATE INDEX idx_crashes_type ON crash_reports(crash_type);

-- Audit Logs
CREATE INDEX idx_audit_developer ON audit_logs(developer_id);
CREATE INDEX idx_audit_created ON audit_logs(created_at);

Full-text Search: For app search, use SQLite FTS5:

-- Create FTS5 virtual table for app search
CREATE VIRTUAL TABLE apps_fts USING fts5(
    name,
    description,
    tags,
    content='apps',
    content_rowid='rowid'
);

-- Triggers to keep FTS in sync
CREATE TRIGGER apps_ai AFTER INSERT ON apps BEGIN
    INSERT INTO apps_fts(rowid, name, description, tags)
    VALUES (NEW.rowid, NEW.name, NEW.description, NEW.tags);
END;

CREATE TRIGGER apps_ad AFTER DELETE ON apps BEGIN
    INSERT INTO apps_fts(apps_fts, rowid, name, description, tags)
    VALUES ('delete', OLD.rowid, OLD.name, OLD.description, OLD.tags);
END;

CREATE TRIGGER apps_au AFTER UPDATE ON apps BEGIN
    INSERT INTO apps_fts(apps_fts, rowid, name, description, tags)
    VALUES ('delete', OLD.rowid, OLD.name, OLD.description, OLD.tags);
    INSERT INTO apps_fts(rowid, name, description, tags)
    VALUES (NEW.rowid, NEW.name, NEW.description, NEW.tags);
END;
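
One caveat worth noting: raw user input should not be passed straight into a MATCH expression, since characters such as `-`, `*`, and `"` are FTS5 query syntax. A defensive sketch (an assumption for illustration, not part of the schema above) double-quotes each term so it is matched literally:

```go
package main

import (
	"fmt"
	"strings"
)

// ftsQuery turns free-form user input into a safe FTS5 MATCH expression by
// double-quoting every whitespace-separated term and doubling any embedded
// quotes. The result is bound as a parameter, e.g.:
//   SELECT rowid FROM apps_fts WHERE apps_fts MATCH ?
func ftsQuery(input string) string {
	terms := strings.Fields(input)
	quoted := make([]string, 0, len(terms))
	for _, t := range terms {
		quoted = append(quoted, `"`+strings.ReplaceAll(t, `"`, `""`)+`"`)
	}
	return strings.Join(quoted, " ") // adjacent terms are implicitly ANDed
}

func main() {
	fmt.Println(ftsQuery(`photo-editor "pro"`))
}
```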

Migration Strategy

Approach: Incremental Migrations

migrations/
├── 001_create_developers.sql
├── 002_create_apps.sql
├── 003_create_versions.sql
├── 004_create_telemetry.sql
└── ...

Tools

  • Go: golang-migrate, goose
  • Node.js: Prisma Migrate, Drizzle Kit
  • Rust: sqlx migrate, refinery

Rollback Strategy

  • Every migration has up/down
  • Test rollbacks in staging
  • Keep migrations small and focused
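
Whichever tool is chosen, the core mechanic is the same: sort the numbered files and apply those beyond the schema's current version (golang-migrate and goose add transactional bookkeeping on top of this). A dependency-free sketch of just the selection step:

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// pendingMigrations returns the .sql files whose numeric prefix exceeds the
// current schema version, in apply order. Zero-padded prefixes (001_, 002_)
// make lexicographic order match numeric order.
func pendingMigrations(files []string, current int) []string {
	sort.Strings(files)
	var pending []string
	for _, f := range files {
		n, err := strconv.Atoi(strings.SplitN(f, "_", 2)[0])
		if err != nil || n <= current {
			continue // skip malformed names and already-applied versions
		}
		pending = append(pending, f)
	}
	return pending
}

func main() {
	files := []string{"003_create_versions.sql", "001_create_developers.sql", "002_create_apps.sql"}
	fmt.Println(pendingMigrations(files, 1))
}
```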

Backup Strategy

PostgreSQL

# Daily full backup
pg_dump -Fc $DATABASE_URL > backup_$(date +%Y%m%d).dump

# Continuous WAL archiving to S3
archive_command = 'aws s3 cp %p s3://backups/wal/%f'

SQLite + Litestream

# litestream.yml
dbs:
  - path: /data/portal.db
    replicas:
      - url: s3://backups/mosis
        retention: 720h  # 30 days

Recovery Time Objectives

| Scenario          | RTO      | RPO       |
|-------------------|----------|-----------|
| Hardware failure  | 1 hour   | 5 minutes |
| Data corruption   | 4 hours  | 1 hour    |
| Disaster recovery | 24 hours | 24 hours  |

Recommendation

For MVP/Early Stage

SQLite + Litestream

  • Simplest to operate
  • Lowest cost
  • Good enough for initial scale
  • Easy migration to PostgreSQL later

For Production Scale

PostgreSQL + TimescaleDB

  • Handles all data types well
  • Time-series for telemetry
  • Proven at scale
  • Good tooling ecosystem

Hybrid (If needed later)

PostgreSQL     → Core data (developers, apps)
TimescaleDB    → Telemetry (same cluster, extension)
Redis          → Caching, rate limiting

Deliverables

  • Final database selection (SQLite + Litestream)
  • Complete schema design (core + telemetry + FTS5)
  • Migration scripts (golang-migrate)
  • Backup/restore procedures (Litestream to local storage)
  • Connection pooling setup (not needed for SQLite)
  • Monitoring queries

Open Questions

  1. Expected telemetry volume per day? → Start simple, optimize if needed
  2. How long to retain raw telemetry? → 90 days raw, daily aggregates indefinitely
  3. Need for real-time analytics vs batch? → Batch is sufficient for MVP
  4. Multi-region requirements? → Single NAS deployment for now
