move docs to docs/ folder, merge architecture files, update references
This commit is contained in:
527
docs/DEV_PORTAL_M03_DATABASE.md
Normal file
@@ -0,0 +1,527 @@
# Milestone 3: Database Selection

**Status**: Decided
**Goal**: Choose a database for developer accounts, app metadata, and analytics.

## Decision

**SQLite + Litestream** for self-hosted deployment on a Synology NAS.

```
Database: SQLite 3.x (WAL mode)
Driver: modernc.org/sqlite (pure Go, no CGO)
Backup: Litestream continuous replication
Storage: Synology volume (/volume1/mosis/)
```

### Rationale

1. **Single container** - No separate database service needed
2. **Minimal resources** - ~50MB RAM, well suited to a NAS
3. **Zero ops** - No connection pooling or server tuning
4. **Continuous backup** - Litestream replicates to local storage
5. **Point-in-time recovery** - Restore to any point within the Litestream retention window (30 days)
6. **Sufficient scale** - Comfortably handles thousands of developers

### Architecture

```
┌─────────────────────────────────────────┐
│              Synology NAS               │
│  ┌─────────────────────────────────┐    │
│  │  mosis-portal container         │    │
│  │  ├── Go binary                  │    │
│  │  ├── SQLite (portal.db)         │    │
│  │  └── Litestream                 │    │
│  └──────────────┬──────────────────┘    │
│                 │                       │
│  ┌──────────────▼──────────────────┐    │
│  │  /volume1/mosis/                │    │
│  │  ├── data/portal.db             │    │
│  │  ├── data/portal.db-wal         │    │
│  │  ├── backups/ (litestream)      │    │
│  │  └── packages/ (app uploads)    │    │
│  └─────────────────────────────────┘    │
└─────────────────────────────────────────┘
```

### Litestream Configuration

```yaml
dbs:
  - path: /data/portal.db
    replicas:
      - type: file
        path: /backups/portal
        retention: 720h # 30 days
```

---

## Overview

The database stores all persistent data: developer accounts, app metadata, versions, telemetry events, and audit logs.

---

## Requirements

### Data Characteristics

| Data Type | Volume | Access Pattern | Consistency |
|-----------|--------|----------------|-------------|
| Developers | 10K rows | Read-heavy, low write | Strong |
| Apps | 100K rows | Read-heavy | Strong |
| Versions | 500K rows | Read-heavy | Strong |
| API Keys | 50K rows | Read-heavy | Strong |
| Telemetry | 100M+ rows | Write-heavy, append | Eventual OK |
| Audit Logs | 10M+ rows | Write-heavy, append | Eventual OK |

### Query Patterns

- Get developer by email
- List apps by developer
- Get app with latest version
- Search apps by name/tags
- Aggregate telemetry by app/day
- Time-range queries on events

---

## Options Analysis

### Option A: PostgreSQL

#### Characteristics
```
Type: Relational (SQL)
ACID: Full
JSON: Native JSONB support
Full-text: Built-in tsvector
Scaling: Vertical + read replicas
```

#### Pros
| Advantage | Details |
|-----------|---------|
| Battle-tested | Decades of reliability |
| ACID compliance | Strong consistency |
| JSON support | JSONB for flexible data |
| Full-text search | No separate search engine needed |
| Extensions | PostGIS, pg_trgm, etc. |
| Tooling | pgAdmin, great ORMs |

#### Cons
| Disadvantage | Details |
|--------------|---------|
| Ops overhead | Need connection pooling |
| Scaling writes | Vertical scaling limits |
| Time-series | Not optimized for telemetry |

#### Hosting Options
| Provider | Free Tier | Paid |
|----------|-----------|------|
| Supabase | 500MB | $25/mo |
| Neon | 512MB | $19/mo |
| Railway | 1GB | $5/mo |
| AWS RDS | - | $15/mo+ |
| Self-hosted | - | VPS cost |

---

### Option B: SQLite + Litestream

#### Characteristics
```
Type: Embedded relational
ACID: Full
Scaling: Single writer
Backup: Litestream to S3
```

#### Pros
| Advantage | Details |
|-----------|---------|
| Zero ops | No separate DB server |
| Fast reads | In-process, no network |
| Simple backup | Litestream handles replication |
| Low cost | Just storage costs |
| Portable | Easy local development |

#### Cons
| Disadvantage | Details |
|--------------|---------|
| Single writer | Limits write concurrency |
| No horizontal scale | One server only |
| Limited features | Full-text search requires the FTS5 extension |

#### Cost Estimate
| Component | Cost/month |
|-----------|------------|
| S3 storage (10GB) | $0.25 |
| Compute | Included in app server |

---

### Option C: PostgreSQL + TimescaleDB

#### Characteristics
```
Type: Time-series extension
Base: PostgreSQL
Scaling: Automatic partitioning
Compression: Native
```

#### Pros
| Advantage | Details |
|-----------|---------|
| Best of both | Relational + time-series |
| Auto-partition | Handles telemetry scale |
| Compression | 90%+ compression ratio |
| Continuous aggregates | Pre-computed rollups |

#### Cons
| Disadvantage | Details |
|--------------|---------|
| Complexity | More to manage |
| Cost | Higher than plain Postgres |
| Learning curve | New concepts |

---

### Option D: Hybrid Approach

```
PostgreSQL → Developers, Apps, Versions, API Keys
ClickHouse/QuestDB → Telemetry, Analytics
Redis → Caching, Sessions
```

#### Pros
| Advantage | Details |
|-----------|---------|
| Right tool for job | Optimized for each use case |
| Scale independently | Telemetry won't affect main DB |
| Performance | Best possible for each workload |

#### Cons
| Disadvantage | Details |
|--------------|---------|
| Complexity | Multiple systems to manage |
| Cost | More infrastructure |
| Consistency | Cross-DB transactions hard |

---

## Schema Design (SQLite)

### Core Tables

```sql
-- Note: SQLite enforces REFERENCES / ON DELETE CASCADE only when
-- PRAGMA foreign_keys = ON is set on each connection.

-- Developers
CREATE TABLE developers (
    id TEXT PRIMARY KEY, -- UUID as text
    email TEXT UNIQUE NOT NULL,
    name TEXT NOT NULL,
    password_hash TEXT,
    oauth_provider TEXT,
    oauth_id TEXT,
    verified INTEGER DEFAULT 0,
    created_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now'))
);

-- API Keys
CREATE TABLE api_keys (
    id TEXT PRIMARY KEY,
    developer_id TEXT NOT NULL REFERENCES developers(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    key_hash TEXT NOT NULL,
    key_prefix TEXT NOT NULL, -- For display: "mk_abc..."
    permissions TEXT DEFAULT '[]', -- JSON array
    last_used_at TEXT,
    expires_at TEXT,
    created_at TEXT DEFAULT (datetime('now'))
);

-- Apps
CREATE TABLE apps (
    id TEXT PRIMARY KEY,
    developer_id TEXT NOT NULL REFERENCES developers(id) ON DELETE CASCADE,
    package_id TEXT UNIQUE NOT NULL, -- com.dev.app
    name TEXT NOT NULL,
    description TEXT,
    category TEXT,
    tags TEXT DEFAULT '[]', -- JSON array
    status TEXT DEFAULT 'draft', -- draft, published, suspended
    created_at TEXT DEFAULT (datetime('now')),
    updated_at TEXT DEFAULT (datetime('now'))
);

-- App Versions
CREATE TABLE app_versions (
    id TEXT PRIMARY KEY,
    app_id TEXT NOT NULL REFERENCES apps(id) ON DELETE CASCADE,
    version_code INTEGER NOT NULL,
    version_name TEXT NOT NULL,
    package_url TEXT NOT NULL,
    package_size INTEGER NOT NULL,
    signature TEXT NOT NULL,
    permissions TEXT DEFAULT '[]', -- JSON array
    min_mosis_version TEXT,
    release_notes TEXT,
    status TEXT DEFAULT 'draft', -- draft, review, approved, published, rejected
    review_notes TEXT,
    published_at TEXT,
    created_at TEXT DEFAULT (datetime('now')),
    UNIQUE(app_id, version_code)
);

-- Developer Signing Keys
CREATE TABLE signing_keys (
    id TEXT PRIMARY KEY,
    developer_id TEXT NOT NULL REFERENCES developers(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    public_key TEXT NOT NULL,
    fingerprint TEXT NOT NULL,
    is_active INTEGER DEFAULT 1,
    created_at TEXT DEFAULT (datetime('now'))
);
```
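
The `api_keys` table stores only `key_hash` and a display `key_prefix`, which implies the plaintext key is shown to the developer once and never persisted. A minimal sketch of that flow follows, assuming SHA-256 for the hash, a 24-byte random key, and a 10-character prefix. The `mk_` prefix comes from the schema comment; everything else here (`GenerateAPIKey`, `HashKey`, the lengths) is hypothetical.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// NewAPIKey holds the values produced when a key is issued.
type NewAPIKey struct {
	Plaintext string // shown to the developer once, never stored
	Prefix    string // stored in api_keys.key_prefix for display
	Hash      string // stored in api_keys.key_hash for lookup
}

// HashKey returns the hex-encoded SHA-256 digest of a plaintext key.
// Assumption: keys are long random strings, so a fast hash is acceptable.
func HashKey(plaintext string) string {
	sum := sha256.Sum256([]byte(plaintext))
	return hex.EncodeToString(sum[:])
}

// GenerateAPIKey creates a random key of the form "mk_<48 hex chars>".
func GenerateAPIKey() (NewAPIKey, error) {
	buf := make([]byte, 24)
	if _, err := rand.Read(buf); err != nil {
		return NewAPIKey{}, err
	}
	plaintext := "mk_" + hex.EncodeToString(buf)
	return NewAPIKey{
		Plaintext: plaintext,
		Prefix:    plaintext[:10],
		Hash:      HashKey(plaintext),
	}, nil
}

func main() {
	key, err := GenerateAPIKey()
	if err != nil {
		panic(err)
	}
	fmt.Println("prefix:", key.Prefix)
	fmt.Println("hash:  ", key.Hash)
}
```

On each request the server would hash the presented key with `HashKey` and compare it against `key_hash`, using `key_prefix` only to narrow the candidate rows.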

### Telemetry Tables

```sql
-- Telemetry Events (append-only, partition by month via separate tables)
CREATE TABLE telemetry_events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    app_id TEXT NOT NULL,
    device_id TEXT NOT NULL, -- Hashed for privacy
    event_type TEXT NOT NULL,
    event_data TEXT, -- JSON string
    mosis_version TEXT,
    timestamp TEXT NOT NULL -- ISO8601 format
);

-- Crash Reports
CREATE TABLE crash_reports (
    id TEXT PRIMARY KEY,
    app_id TEXT NOT NULL,
    app_version TEXT NOT NULL,
    device_id TEXT NOT NULL,
    crash_type TEXT NOT NULL,
    message TEXT,
    stack_trace TEXT,
    context TEXT, -- JSON string
    mosis_version TEXT,
    timestamp TEXT NOT NULL,
    created_at TEXT DEFAULT (datetime('now'))
);

-- Daily aggregates (computed by background job)
CREATE TABLE telemetry_daily (
    app_id TEXT NOT NULL,
    date TEXT NOT NULL, -- YYYY-MM-DD
    event_type TEXT NOT NULL,
    count INTEGER NOT NULL,
    unique_devices INTEGER NOT NULL,
    PRIMARY KEY (app_id, date, event_type)
);

-- Audit Logs
CREATE TABLE audit_logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    developer_id TEXT,
    action TEXT NOT NULL,
    resource_type TEXT,
    resource_id TEXT,
    details TEXT, -- JSON string
    ip_address TEXT,
    user_agent TEXT,
    created_at TEXT DEFAULT (datetime('now'))
);
```

**Note**: For high-volume telemetry, consider:
- Separate SQLite database file for telemetry (isolates write load)
- Monthly table rotation with application-level partitioning
- Aggressive data retention (delete events older than 90 days)

### Indexes

```sql
-- Developers
-- developers.email is already indexed by its UNIQUE constraint,
-- so no separate index is needed for lookups by email.
CREATE INDEX idx_developers_oauth ON developers(oauth_provider, oauth_id);

-- API Keys
CREATE INDEX idx_api_keys_developer ON api_keys(developer_id);
CREATE INDEX idx_api_keys_prefix ON api_keys(key_prefix);

-- Apps
-- apps.package_id is already indexed by its UNIQUE constraint.
CREATE INDEX idx_apps_developer ON apps(developer_id);
CREATE INDEX idx_apps_status ON apps(status);
-- NOCASE lets the default case-insensitive LIKE use this index,
-- and only for prefix patterns (name LIKE 'foo%').
CREATE INDEX idx_apps_name ON apps(name COLLATE NOCASE);

-- Versions
CREATE INDEX idx_versions_app ON app_versions(app_id);
CREATE INDEX idx_versions_status ON app_versions(status);

-- Signing Keys
CREATE INDEX idx_signing_keys_developer ON signing_keys(developer_id);
CREATE INDEX idx_signing_keys_fingerprint ON signing_keys(fingerprint);

-- Telemetry
CREATE INDEX idx_telemetry_app ON telemetry_events(app_id, timestamp);
CREATE INDEX idx_telemetry_type ON telemetry_events(event_type, timestamp);

-- Crashes
CREATE INDEX idx_crashes_app ON crash_reports(app_id, timestamp);
CREATE INDEX idx_crashes_type ON crash_reports(crash_type);

-- Audit Logs
CREATE INDEX idx_audit_developer ON audit_logs(developer_id);
CREATE INDEX idx_audit_created ON audit_logs(created_at);
```

**Full-text Search**: For app search, use SQLite FTS5:

```sql
-- Create FTS5 virtual table for app search
-- Caveat: apps has a TEXT primary key, so its rowid is implicit and VACUUM
-- may renumber it. After a VACUUM, resync with:
--   INSERT INTO apps_fts(apps_fts) VALUES ('rebuild');
CREATE VIRTUAL TABLE apps_fts USING fts5(
    name,
    description,
    tags,
    content='apps',
    content_rowid='rowid'
);

-- Triggers to keep FTS in sync
CREATE TRIGGER apps_ai AFTER INSERT ON apps BEGIN
    INSERT INTO apps_fts(rowid, name, description, tags)
    VALUES (NEW.rowid, NEW.name, NEW.description, NEW.tags);
END;

CREATE TRIGGER apps_ad AFTER DELETE ON apps BEGIN
    INSERT INTO apps_fts(apps_fts, rowid, name, description, tags)
    VALUES ('delete', OLD.rowid, OLD.name, OLD.description, OLD.tags);
END;

CREATE TRIGGER apps_au AFTER UPDATE ON apps BEGIN
    INSERT INTO apps_fts(apps_fts, rowid, name, description, tags)
    VALUES ('delete', OLD.rowid, OLD.name, OLD.description, OLD.tags);
    INSERT INTO apps_fts(rowid, name, description, tags)
    VALUES (NEW.rowid, NEW.name, NEW.description, NEW.tags);
END;
```
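
Querying `apps_fts` with raw user input is fragile, because characters such as `"`, `*`, or `:` have meaning in the FTS5 query syntax. A minimal sketch of one way to handle that is to quote every whitespace-separated term as an FTS5 string, so all terms are matched literally and combined with the implicit AND. `FTSQuery` and `searchSQL` are hypothetical names, and the `LIMIT 20` is an arbitrary choice.

```go
package main

import (
	"fmt"
	"strings"
)

// searchSQL joins FTS hits back to apps by rowid and orders by relevance.
const searchSQL = `
SELECT a.id, a.name
FROM apps_fts
JOIN apps a ON a.rowid = apps_fts.rowid
WHERE apps_fts MATCH ?
ORDER BY rank
LIMIT 20;`

// FTSQuery turns free-form input into an FTS5 query in which each term is
// a double-quoted string; embedded double quotes are escaped by doubling.
func FTSQuery(input string) string {
	terms := strings.Fields(input)
	quoted := make([]string, 0, len(terms))
	for _, t := range terms {
		quoted = append(quoted, `"`+strings.ReplaceAll(t, `"`, `""`)+`"`)
	}
	return strings.Join(quoted, " ")
}

func main() {
	fmt.Println(FTSQuery("photo editor"))
	fmt.Println(searchSQL)
}
```

An empty result from `FTSQuery` (blank input) should be handled by the caller before running `searchSQL`, since an empty `MATCH` expression is not a valid query.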

---

## Migration Strategy

### Approach: Incremental Migrations

```
migrations/
├── 001_create_developers.sql
├── 002_create_apps.sql
├── 003_create_versions.sql
├── 004_create_telemetry.sql
└── ...
```
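
Whatever tool is chosen, the numeric filename prefix defines the apply order. The sketch below illustrates that idea, assuming the `NNN_name.sql` naming shown above. Real tools such as golang-migrate or goose have their own naming rules and do this internally, so `ParseMigration` and `Pending` are hypothetical helpers for illustration only.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// Migration describes one file in the migrations/ directory.
type Migration struct {
	Version int
	Name    string
	File    string
}

// ParseMigration splits "003_create_versions.sql" into version 3 and
// name "create_versions".
func ParseMigration(file string) (Migration, error) {
	base := strings.TrimSuffix(file, ".sql")
	if base == file {
		return Migration{}, fmt.Errorf("%q is not a .sql file", file)
	}
	prefix, name, ok := strings.Cut(base, "_")
	if !ok {
		return Migration{}, fmt.Errorf("%q has no version prefix", file)
	}
	version, err := strconv.Atoi(prefix)
	if err != nil {
		return Migration{}, fmt.Errorf("%q: bad version: %w", file, err)
	}
	return Migration{Version: version, Name: name, File: file}, nil
}

// Pending returns the migrations newer than the current schema version,
// sorted in the order they should be applied.
func Pending(files []string, current int) ([]Migration, error) {
	var out []Migration
	for _, f := range files {
		m, err := ParseMigration(f)
		if err != nil {
			return nil, err
		}
		if m.Version > current {
			out = append(out, m)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Version < out[j].Version })
	return out, nil
}

func main() {
	files := []string{
		"003_create_versions.sql", "001_create_developers.sql",
		"004_create_telemetry.sql", "002_create_apps.sql",
	}
	pending, err := Pending(files, 2)
	if err != nil {
		panic(err)
	}
	for _, m := range pending {
		fmt.Println(m.File)
	}
}
```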

### Tools
- **Go**: golang-migrate, goose
- **Node.js**: Prisma Migrate, Drizzle Kit
- **Rust**: sqlx migrate, refinery

### Rollback Strategy
- Every migration has up/down
- Test rollbacks in staging
- Keep migrations small and focused

---

## Backup Strategy

### PostgreSQL
```bash
# Daily full backup
pg_dump -Fc "$DATABASE_URL" > "backup_$(date +%Y%m%d).dump"

# Continuous WAL archiving to S3. This is a postgresql.conf setting,
# not a shell command, and it also requires archive_mode = on:
# archive_command = 'aws s3 cp %p s3://backups/wal/%f'
```

### SQLite + Litestream
```yaml
# litestream.yml
# S3 variant; the chosen NAS deployment uses the file replica shown under Decision.
dbs:
  - path: /data/portal.db
    replicas:
      - url: s3://backups/mosis
        retention: 720h # 30 days
```

### Recovery Time Objectives
| Scenario | RTO | RPO |
|----------|-----|-----|
| Hardware failure | 1 hour | 5 minutes |
| Data corruption | 4 hours | 1 hour |
| Disaster recovery | 24 hours | 24 hours |

---

## Recommendation

### For MVP/Early Stage
**SQLite + Litestream**
- Simplest to operate
- Lowest cost
- Good enough for initial scale
- Easy migration to PostgreSQL later

### For Production Scale
**PostgreSQL + TimescaleDB**
- Handles all data types well
- Time-series for telemetry
- Proven at scale
- Good tooling ecosystem

### Hybrid (If needed later)
```
PostgreSQL → Core data (developers, apps)
TimescaleDB → Telemetry (same cluster, extension)
Redis → Caching, rate limiting
```

---

## Deliverables

- [x] Final database selection (SQLite + Litestream)
- [x] Complete schema design (core + telemetry + FTS5)
- [ ] Migration scripts (golang-migrate)
- [x] Backup/restore procedures (Litestream to local storage)
- [x] ~~Connection pooling setup~~ (not needed for SQLite)
- [ ] Monitoring queries

---

## Open Questions

1. ~~Expected telemetry volume per day?~~ → Start simple, optimize if needed
2. ~~How long to retain raw telemetry?~~ → 90 days raw, daily aggregates indefinitely
3. ~~Need for real-time analytics vs batch?~~ → Batch is sufficient for MVP
4. ~~Multi-region requirements?~~ → Single NAS deployment for now

---

## References

- [PostgreSQL JSONB performance](https://www.postgresql.org/docs/current/datatype-json.html)
- [TimescaleDB vs InfluxDB](https://www.timescale.com/blog/timescaledb-vs-influxdb/)
- [Litestream documentation](https://litestream.io/)
- [SQLite at scale](https://www.sqlite.org/whentouse.html)