Milestone 8: Telemetry System
Status: Planning
Goal: Collect app usage analytics and crash reports while respecting privacy.
Overview
Telemetry gives developers insight into app usage, performance, and crashes. The design must balance usefulness with user privacy.
Privacy Principles
- Minimal collection - Only what's necessary
- No PII by default - Anonymized device IDs
- Transparency - Users know what's collected
- Opt-out available - Users can disable
- Data retention limits - Auto-delete old data
- GDPR compliance - Export/delete on request
Event Types
Automatic Events (Default)
| Event | Description | Data |
|---|---|---|
| app_start | App launched | version, mosis_version |
| app_stop | App closed | duration_seconds |
| app_crash | Unhandled error | crash_type, message |
| lua_error | Lua runtime error | message, stack (no user data) |
Performance Events (Default)
| Event | Description | Data |
|---|---|---|
| perf_frame | Frame time (sampled) | avg_ms, p95_ms |
| perf_memory | Memory usage | used_mb, limit_mb |
| perf_startup | Startup time | duration_ms |
Usage Events (Opt-in)
| Event | Description | Data |
|---|---|---|
| screen_view | Screen navigation | screen_name |
| button_click | UI interaction | element_id |
| feature_used | Feature usage | feature_name |
Data Schema
Event Payload
```json
{
  "app_id": "com.developer.myapp",
  "app_version": "1.2.0",
  "mosis_version": "1.0.0",
  "device_id": "sha256_hashed_id",
  "session_id": "uuid",
  "events": [
    {
      "type": "app_start",
      "timestamp": "2024-01-15T10:30:00Z",
      "data": {}
    },
    {
      "type": "screen_view",
      "timestamp": "2024-01-15T10:30:05Z",
      "data": {
        "screen_name": "home"
      }
    }
  ]
}
```
Crash Report Payload
```json
{
  "app_id": "com.developer.myapp",
  "app_version": "1.2.0",
  "mosis_version": "1.0.0",
  "device_id": "sha256_hashed_id",
  "timestamp": "2024-01-15T10:35:00Z",
  "crash": {
    "type": "lua_error",
    "message": "attempt to index nil value 'user'",
    "stack_trace": "main.lua:42: in function 'loadUser'\nmain.lua:15: in main chunk",
    "context": {
      "screen": "profile.rml",
      "memory_mb": 45,
      "uptime_seconds": 300
    }
  }
}
```
Device ID Hashing
```lua
-- On device
local raw_id = get_android_id()  -- or a similar platform identifier
local hashed = sha256(raw_id .. "mosis_salt_" .. app_id)
-- Result: "a3f2b1c4d5e6..."
-- Cannot be reversed to the original device ID
-- Different per app (can't track across apps)
```
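The same per-app hashing can be mirrored in Go (e.g. in server tooling or tests). The salt string follows the Lua snippet above and is an assumption, not a fixed constant:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hashDeviceID derives an app-scoped, non-reversible device identifier.
// Salting with the app ID means the same device yields different IDs in
// different apps, so devices cannot be tracked across apps.
func hashDeviceID(rawID, appID string) string {
	sum := sha256.Sum256([]byte(rawID + "mosis_salt_" + appID))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := hashDeviceID("android-1234", "com.developer.myapp")
	b := hashDeviceID("android-1234", "com.other.app")
	fmt.Println(a != b)  // different per app: true
	fmt.Println(len(a))  // 64 hex characters
}
```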
Collection Architecture
```
┌──────────┐     ┌──────────┐     ┌──────────┐     ┌──────────┐
│  Device  │────►│  Batch   │────►│   API    │────►│ Storage  │
│          │     │  Queue   │     │          │     │          │
└──────────┘     └────┬─────┘     └──────────┘     └──────────┘
                      │
                      │ Every 60s or
                      │ on app close
                      ▼
                 ┌──────────┐
                 │  Upload  │
                 └──────────┘
```
Client-Side Batching
```lua
-- TelemetryManager on device
local events = {}
local last_flush = os.time()

function track(event_type, data)
  if not telemetry_enabled then return end
  table.insert(events, {
    type = event_type,
    timestamp = os.date("!%Y-%m-%dT%H:%M:%SZ"),
    data = data or {}
  })
  -- Flush if the batch is large or enough time has elapsed
  if #events >= 50 or (os.time() - last_flush) > 60 then
    flush()
  end
end

function flush()
  if #events == 0 then return end
  local payload = {
    app_id = APP_ID,
    app_version = APP_VERSION,
    mosis_version = MOSIS_VERSION,
    device_id = HASHED_DEVICE_ID,
    session_id = SESSION_ID,
    events = events
  }
  -- Async HTTP POST
  http.post(TELEMETRY_URL, json.encode(payload))
  events = {}
  last_flush = os.time()
end
```
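The same batch-and-flush logic can be sketched in Go for a host-side SDK. Thresholds mirror the Lua code (50 events or 60 seconds); all names here are illustrative:

```go
package main

import (
	"fmt"
	"time"
)

type event struct {
	Type      string
	Timestamp time.Time
	Data      map[string]string
}

// Buffer accumulates events and flushes when either the size or the
// age threshold is reached, mirroring the Lua TelemetryManager.
type Buffer struct {
	events    []event
	lastFlush time.Time
	maxBatch  int
	maxAge    time.Duration
	flushFn   func([]event) // in production: async HTTP POST
}

func NewBuffer(flushFn func([]event)) *Buffer {
	return &Buffer{
		lastFlush: time.Now(),
		maxBatch:  50,
		maxAge:    60 * time.Second,
		flushFn:   flushFn,
	}
}

func (b *Buffer) Track(typ string, data map[string]string) {
	b.events = append(b.events, event{Type: typ, Timestamp: time.Now().UTC(), Data: data})
	if len(b.events) >= b.maxBatch || time.Since(b.lastFlush) > b.maxAge {
		b.Flush()
	}
}

func (b *Buffer) Flush() {
	if len(b.events) == 0 {
		return
	}
	b.flushFn(b.events)
	b.events = nil
	b.lastFlush = time.Now()
}

func main() {
	var flushed int
	buf := NewBuffer(func(evs []event) { flushed += len(evs) })
	for i := 0; i < 120; i++ {
		buf.Track("screen_view", map[string]string{"screen_name": "home"})
	}
	buf.Flush() // final drain, as on app close
	fmt.Println(flushed) // 120
}
```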
Storage Options
Option A: PostgreSQL + TimescaleDB
```sql
-- Hypertable for time-series data
CREATE TABLE telemetry_events (
  time TIMESTAMPTZ NOT NULL,
  app_id TEXT NOT NULL,
  device_id TEXT NOT NULL,
  session_id TEXT,
  event_type TEXT NOT NULL,
  event_data JSONB,
  app_version TEXT,
  mosis_version TEXT
);

SELECT create_hypertable('telemetry_events', 'time');

-- Continuous aggregate for daily stats
-- Note: COUNT(DISTINCT ...) is not supported inside TimescaleDB
-- continuous aggregates; unique-device counts may need an
-- approximation (e.g. hyperloglog) or a regular materialized view.
CREATE MATERIALIZED VIEW daily_stats
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 day', time) AS day,
  app_id,
  event_type,
  COUNT(*) AS count,
  COUNT(DISTINCT device_id) AS unique_devices
FROM telemetry_events
GROUP BY day, app_id, event_type;
```
Option B: ClickHouse
```sql
CREATE TABLE telemetry_events (
  timestamp DateTime,
  app_id String,
  device_id String,
  session_id String,
  event_type String,
  event_data String,  -- JSON
  app_version String,
  mosis_version String
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (app_id, timestamp);
```
Option C: Custom + PostgreSQL
Raw events → Write to append-only log
Aggregator → Process hourly → Write to PostgreSQL
Cleanup → Delete raw after 24h
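Option C's hourly aggregator reduces raw events to per-day counts before the raw log is deleted. A minimal sketch of that reduction step (all types and names are illustrative):

```go
package main

import "fmt"

type rawEvent struct {
	AppID, DeviceID, EventType, Day string // Day e.g. "2024-01-15"
}

type aggKey struct{ AppID, Day, EventType string }

type aggRow struct {
	Count         int
	UniqueDevices int
}

// aggregate folds raw events into daily counts with unique-device
// totals, the shape later served by the analytics endpoints.
func aggregate(events []rawEvent) map[aggKey]aggRow {
	counts := map[aggKey]int{}
	devices := map[aggKey]map[string]bool{}
	for _, e := range events {
		k := aggKey{e.AppID, e.Day, e.EventType}
		counts[k]++
		if devices[k] == nil {
			devices[k] = map[string]bool{}
		}
		devices[k][e.DeviceID] = true
	}
	out := map[aggKey]aggRow{}
	for k, c := range counts {
		out[k] = aggRow{Count: c, UniqueDevices: len(devices[k])}
	}
	return out
}

func main() {
	rows := aggregate([]rawEvent{
		{"app", "d1", "app_start", "2024-01-15"},
		{"app", "d1", "app_start", "2024-01-15"},
		{"app", "d2", "app_start", "2024-01-15"},
	})
	r := rows[aggKey{"app", "2024-01-15", "app_start"}]
	fmt.Println(r.Count, r.UniqueDevices) // 3 2
}
```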
Aggregation
Pre-computed Metrics
| Metric | Granularity | Retention |
|---|---|---|
| Daily active users | Day | 2 years |
| Event counts | Day | 1 year |
| Crash counts | Day | 1 year |
| Session duration | Day | 90 days |
| Performance percentiles | Day | 90 days |
Aggregation Queries
```sql
-- Daily active users
SELECT
  DATE_TRUNC('day', time) AS day,
  COUNT(DISTINCT device_id) AS dau
FROM telemetry_events
WHERE app_id = $1
  AND event_type = 'app_start'
  AND time > NOW() - INTERVAL '30 days'
GROUP BY day
ORDER BY day;

-- Crash rate by version
SELECT
  app_version,
  COUNT(*) FILTER (WHERE event_type = 'app_crash') AS crashes,
  COUNT(*) FILTER (WHERE event_type = 'app_start') AS starts,
  ROUND(
    100.0 * COUNT(*) FILTER (WHERE event_type = 'app_crash') /
    NULLIF(COUNT(*) FILTER (WHERE event_type = 'app_start'), 0),
    2
  ) AS crash_rate
FROM telemetry_events
WHERE app_id = $1
  AND time > NOW() - INTERVAL '7 days'
GROUP BY app_version;
```
Crash Grouping
Stack Trace Fingerprinting
```go
import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

func fingerprintCrash(crash CrashReport) string {
	// Normalize the stack trace so the fingerprint is stable across releases
	normalized := normalizeStackTrace(crash.StackTrace)

	// Hash the key components
	key := fmt.Sprintf("%s:%s:%s",
		crash.CrashType,
		crash.Message,
		normalized,
	)
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])[:16]
}

func normalizeStackTrace(stack string) string {
	// Remove line numbers (they change with code updates); memory
	// addresses would be stripped the same way. Function names and
	// file names are kept.
	re := regexp.MustCompile(`:\d+:`)
	return re.ReplaceAllString(stack, ":?:")
}
```
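The point of the normalization is that two crashes differing only in line numbers collapse into one group. A runnable illustration (the `CrashReport` type is assumed here to carry the fields from the crash payload above):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"regexp"
)

type CrashReport struct {
	CrashType, Message, StackTrace string
}

var lineNums = regexp.MustCompile(`:\d+:`)

// normalize masks line numbers, which shift with every code update,
// so the fingerprint stays stable across releases.
func normalize(stack string) string {
	return lineNums.ReplaceAllString(stack, ":?:")
}

func fingerprint(c CrashReport) string {
	sum := sha256.Sum256([]byte(c.CrashType + ":" + c.Message + ":" + normalize(c.StackTrace)))
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	a := CrashReport{"lua_error", "attempt to index nil value 'user'",
		"main.lua:42: in function 'loadUser'"}
	b := CrashReport{"lua_error", "attempt to index nil value 'user'",
		"main.lua:57: in function 'loadUser'"} // same bug, new line number
	fmt.Println(fingerprint(a) == fingerprint(b)) // true
}
```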
Crash Groups Table
```sql
CREATE TABLE crash_groups (
  id UUID PRIMARY KEY,
  app_id TEXT NOT NULL,
  fingerprint TEXT NOT NULL,
  crash_type TEXT NOT NULL,
  message TEXT,
  sample_stack_trace TEXT,
  first_seen TIMESTAMPTZ NOT NULL,
  last_seen TIMESTAMPTZ NOT NULL,
  occurrence_count INT DEFAULT 1,
  affected_versions TEXT[],
  status TEXT DEFAULT 'open',  -- open, resolved, ignored
  UNIQUE(app_id, fingerprint)
);
```
Developer Dashboard
Metrics View
```
┌─────────────────────────────────────────────────────────────┐
│  Analytics - My Calculator                                  │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Date Range: [Last 30 days ▼]                               │
│                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐          │
│  │ Daily Users │  │   Crashes   │  │ Crash-free  │          │
│  │    1,234    │  │     23      │  │    98.1%    │          │
│  │   ▲ +12%    │  │   ▼ -45%    │  │   ▲ +2%     │          │
│  └─────────────┘  └─────────────┘  └─────────────┘          │
│                                                             │
│  ┌────────────────────────────────────────────────────┐     │
│  │ Daily Active Users                                 │     │
│  │ [Line chart showing DAU over time]                 │     │
│  └────────────────────────────────────────────────────┘     │
│                                                             │
│  ┌────────────────────────────────────────────────────┐     │
│  │ Version Distribution                               │     │
│  │ [Pie chart: v1.2.0: 60%, v1.1.0: 30%, v1.0.0: 10%] │     │
│  └────────────────────────────────────────────────────┘     │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
Crashes View
```
┌─────────────────────────────────────────────────────────────┐
│  Crashes - My Calculator                                    │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Filter: [All versions ▼] [Open ▼]                          │
│                                                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │ ● attempt to index nil value 'user'                  │   │
│  │   lua_error • 156 occurrences • v1.2.0               │   │
│  │   First: Jan 10 • Last: Jan 15                       │   │
│  │   [View]                                             │   │
│  ├──────────────────────────────────────────────────────┤   │
│  │ ● memory limit exceeded                              │   │
│  │   sandbox_error • 23 occurrences • v1.1.0, v1.2.0    │   │
│  │   First: Jan 5 • Last: Jan 14                        │   │
│  │   [View]                                             │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```
API Endpoints
```
# Ingestion (from devices)
POST /v1/telemetry/events:
  auth: device_token or api_key
  body: { app_id, device_id, events[] }
  response: { received: number }

POST /v1/telemetry/crash:
  auth: device_token or api_key
  body: { app_id, device_id, crash }
  response: { id: string }

# Dashboard (for developers)
GET /v1/apps/:id/analytics/overview:
  auth: required
  query: { start_date, end_date }
  response: { dau, crashes, crash_free_rate, ... }

GET /v1/apps/:id/analytics/events:
  auth: required
  query: { start_date, end_date, event_type }
  response: { data: [{ date, count, unique_devices }] }

GET /v1/apps/:id/crashes:
  auth: required
  query: { version, status, page, limit }
  response: { crashes: CrashGroup[], total }

GET /v1/apps/:id/crashes/:fingerprint:
  auth: required
  response: { crash_group, recent_occurrences[] }

PATCH /v1/apps/:id/crashes/:fingerprint:
  auth: required
  body: { status: 'resolved' | 'ignored' }
  response: { crash_group }
```
Data Retention
| Data Type | Retention | Reason |
|---|---|---|
| Raw events | 7 days | Debugging |
| Daily aggregates | 2 years | Trends |
| Crash reports | 90 days | Investigation |
| Crash groups | Forever | Issue tracking |
Cleanup Job
```sql
-- Run daily
DELETE FROM telemetry_events
WHERE time < NOW() - INTERVAL '7 days';

DELETE FROM crash_reports
WHERE timestamp < NOW() - INTERVAL '90 days';
```
Privacy Controls
User Settings
```
Settings > Privacy > Analytics
├── [✓] Send crash reports (helps developers fix bugs)
├── [ ] Send usage analytics (how you use apps)
└── [Request Data Deletion]
```
GDPR Endpoints
```
# User requests their data
GET /v1/privacy/export:
  auth: user_token
  response: { download_url }  # JSON export of all data

# User requests deletion
DELETE /v1/privacy/data:
  auth: user_token
  response: { status: 'scheduled' }  # Delete within 30 days
```
Deliverables
- Event schema specification
- Client-side SDK for batching
- Ingestion API endpoints
- Storage setup (TimescaleDB or ClickHouse)
- Aggregation jobs
- Crash grouping logic
- Developer dashboard
- Privacy controls
- Data retention automation
- GDPR export/delete
Open Questions
- Real-time crash alerts (email/Slack)?
- Sampling for high-volume apps?
- Custom events API for developers?
- Benchmarks/comparisons with similar apps?