ADR-009: Logging and Monitoring Infrastructure
Status: Accepted
Context
As the photobooth application grows to include payment integrations, support for multiple printers, and cloud storage, we need logging and monitoring that let us:
- Debug production issues quickly
- Track payment transaction health
- Monitor printer connectivity and status
- Observe system performance and capacity
Decision
We will implement a multi-layer observability stack:
- Application Logging: Structured JSON logs with context (tenant_id, session_id, transaction_id)
- Error Tracking: Sentry for exception capture and grouping
- Metrics & Alerts: OpenTelemetry instrumentation, with Prometheus on Grafana Cloud (free tier)
- Uptime Monitoring: UptimeRobot or similar for endpoint monitoring
Architecture
Logging Strategy
Log Levels
| Level | Usage |
|---|---|
| ERROR | Payment failures, printer errors, exceptions |
| WARN | Retries, degraded performance, invalid inputs |
| INFO | Key events (session start/end, payment success) |
| DEBUG | Detailed flow for debugging |
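The level table implies an ordering in which each environment emits only messages at or above a configured minimum. A minimal sketch of that filter follows; the numeric ranks, the `shouldLog` and `defaultMinLevel` helpers, and the choice of INFO as the production minimum are assumptions for illustration, not part of this decision.

```typescript
// Severity order follows the table above: DEBUG is the most verbose, ERROR the most severe.
type LogLevel = "DEBUG" | "INFO" | "WARN" | "ERROR";

// Hypothetical numeric ranks; only their relative order matters.
const LEVEL_RANK: Record<LogLevel, number> = {
  DEBUG: 10,
  INFO: 20,
  WARN: 30,
  ERROR: 40,
};

// Returns true when a message at `level` should be emitted for the configured minimum.
function shouldLog(level: LogLevel, minLevel: LogLevel): boolean {
  return LEVEL_RANK[level] >= LEVEL_RANK[minLevel];
}

// Assumed default: DEBUG output only outside production.
function defaultMinLevel(environment: "development" | "production"): LogLevel {
  return environment === "production" ? "INFO" : "DEBUG";
}
```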
Required Context Fields
interface LogContext {
  // Identifiers are optional: attach each one as soon as it is known.
  tenant_id?: string;
  session_id?: string;
  transaction_id?: string;
  printer_id?: string;
  user_id?: string;
  // Present on every log line.
  timestamp: string;
  environment: "development" | "production";
}
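To show how these context fields could end up in a structured JSON log line, here is a small sketch. The `LogEntry` shape, the `formatLogLine` helper, the ISO 8601 timestamp format, and the sample IDs are all hypothetical; the interface is repeated only so the snippet compiles on its own.

```typescript
// Mirrors the LogContext interface above so the snippet is self-contained.
interface LogContext {
  tenant_id?: string;
  session_id?: string;
  transaction_id?: string;
  printer_id?: string;
  user_id?: string;
  timestamp: string;
  environment: "development" | "production";
}

// Hypothetical shape of one emitted line: context fields plus level and message.
interface LogEntry extends LogContext {
  level: "ERROR" | "WARN" | "INFO" | "DEBUG";
  message: string;
}

// Builds one JSON log line. The caller passes the IDs it knows about;
// the timestamp is filled in here (ISO 8601 is an assumption).
function formatLogLine(
  level: LogEntry["level"],
  message: string,
  context: Omit<LogContext, "timestamp">,
  now: Date = new Date()
): string {
  const entry: LogEntry = {
    ...context,
    level,
    message,
    timestamp: now.toISOString(),
  };
  return JSON.stringify(entry);
}

// Example with made-up identifiers and a fixed clock.
const line = formatLogLine(
  "INFO",
  "payment confirmed",
  { tenant_id: "tenant-1", session_id: "sess-42", transaction_id: "tx-7", environment: "production" },
  new Date("2024-01-01T00:00:00.000Z")
);
```

Returning a string rather than writing to a destination keeps the formatter independent of where logs are shipped; the result can be passed to `console.log` or any log forwarder.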
Monitoring Targets
1. Payment Flow
- Payment creation success/failure rate
- Webhook delivery success
- Transaction processing time
- QRIS scan to payment confirmation latency
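As a rough illustration of how the payment success rate and the scan-to-confirmation latency could be derived from recorded events, consider the sketch below. The `PaymentEvent` shape and its field names are assumptions, not the payment provider's schema; in the real stack these values would more likely be reported as metrics than computed in memory.

```typescript
// Hypothetical event shape; field names are illustrative only.
interface PaymentEvent {
  transaction_id: string;
  status: "succeeded" | "failed";
  scannedAtMs: number; // when the QRIS code was scanned
  confirmedAtMs?: number; // when confirmation arrived (absent on failure)
}

interface PaymentSummary {
  total: number;
  successRate: number; // 0..1, or 0 when there are no events
  avgConfirmLatencyMs: number | null; // null when nothing was confirmed
}

function summarizePayments(events: PaymentEvent[]): PaymentSummary {
  const succeeded = events.filter((e) => e.status === "succeeded");
  // Latency is measured only for payments that carry a confirmation time.
  const latencies: number[] = [];
  for (const e of succeeded) {
    if (e.confirmedAtMs !== undefined) {
      latencies.push(e.confirmedAtMs - e.scannedAtMs);
    }
  }
  const avg =
    latencies.length === 0
      ? null
      : latencies.reduce((a, b) => a + b, 0) / latencies.length;
  return {
    total: events.length,
    successRate: events.length === 0 ? 0 : succeeded.length / events.length,
    avgConfirmLatencyMs: avg,
  };
}

// Made-up sample: three confirmed payments and one failure.
const paymentSummary = summarizePayments([
  { transaction_id: "tx-1", status: "succeeded", scannedAtMs: 0, confirmedAtMs: 1000 },
  { transaction_id: "tx-2", status: "succeeded", scannedAtMs: 500, confirmedAtMs: 2500 },
  { transaction_id: "tx-3", status: "succeeded", scannedAtMs: 1000, confirmedAtMs: 4000 },
  { transaction_id: "tx-4", status: "failed", scannedAtMs: 2000 },
]);
```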
2. Hardware/Printing
- Printer connection status
- Print job success/failure rate
- Average print time
- Bluetooth disconnect events
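The printing targets above can be derived from a stream of per-printer events. The following sketch folds such a stream into connection status, disconnect count, job outcomes, and average print time. The `PrinterEvent` union and all names are hypothetical, and averaging over successful jobs only is one possible definition of "average print time".

```typescript
// Hypothetical events a printer adapter might report.
type PrinterEvent =
  | { kind: "connected"; printer_id: string }
  | { kind: "disconnected"; printer_id: string }
  | { kind: "job_finished"; printer_id: string; ok: boolean; durationMs: number };

interface PrinterStats {
  online: boolean;
  disconnects: number;
  jobsOk: number;
  jobsFailed: number;
  okPrintMs: number; // total duration of successful jobs
}

// Folds events, in order, into one stats record per printer_id.
function foldPrinterEvents(events: PrinterEvent[]): Record<string, PrinterStats> {
  const stats: Record<string, PrinterStats> = {};
  for (const ev of events) {
    const s: PrinterStats =
      stats[ev.printer_id] ??
      { online: false, disconnects: 0, jobsOk: 0, jobsFailed: 0, okPrintMs: 0 };
    switch (ev.kind) {
      case "connected":
        s.online = true;
        break;
      case "disconnected":
        s.online = false;
        s.disconnects += 1;
        break;
      case "job_finished":
        if (ev.ok) {
          s.jobsOk += 1;
          s.okPrintMs += ev.durationMs;
        } else {
          s.jobsFailed += 1;
        }
        break;
    }
    stats[ev.printer_id] = s;
  }
  return stats;
}

// Average print time over successful jobs, or null when there are none.
function averagePrintMs(s: PrinterStats): number | null {
  return s.jobsOk === 0 ? null : s.okPrintMs / s.jobsOk;
}

// Made-up sample for a single printer.
const printerStats = foldPrinterEvents([
  { kind: "connected", printer_id: "p1" },
  { kind: "job_finished", printer_id: "p1", ok: true, durationMs: 30000 },
  { kind: "job_finished", printer_id: "p1", ok: true, durationMs: 50000 },
  { kind: "job_finished", printer_id: "p1", ok: false, durationMs: 10000 },
  { kind: "disconnected", printer_id: "p1" },
]);
```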
3. System Health
- API response times
- Error rates by endpoint
- Cloud storage upload success
- Session completion rate
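For the system health targets, a sketch of two common calculations may help: error rate per endpoint and a response-time percentile. The `RequestSample` shape is an assumption, as is counting only 5xx responses as errors; the percentile uses the nearest-rank method, which is one of several definitions.

```typescript
// Hypothetical record of one handled API request.
interface RequestSample {
  endpoint: string;
  statusCode: number;
  durationMs: number;
}

// Fraction of requests per endpoint that ended in a 5xx status (an assumed definition of "error").
function errorRateByEndpoint(samples: RequestSample[]): Record<string, number> {
  const totals: Record<string, { count: number; errors: number }> = {};
  for (const s of samples) {
    const t = totals[s.endpoint] ?? { count: 0, errors: 0 };
    t.count += 1;
    if (s.statusCode >= 500) {
      t.errors += 1;
    }
    totals[s.endpoint] = t;
  }
  const rates: Record<string, number> = {};
  for (const endpoint of Object.keys(totals)) {
    rates[endpoint] = totals[endpoint].errors / totals[endpoint].count;
  }
  return rates;
}

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number | null {
  if (values.length === 0) {
    return null;
  }
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Made-up samples; the endpoint paths are placeholders.
const requestSamples: RequestSample[] = [
  { endpoint: "/api/pay", statusCode: 200, durationMs: 100 },
  { endpoint: "/api/pay", statusCode: 500, durationMs: 400 },
  { endpoint: "/api/print", statusCode: 200, durationMs: 200 },
  { endpoint: "/api/print", statusCode: 200, durationMs: 300 },
];
const errorRates = errorRateByEndpoint(requestSamples);
const p95 = percentile(requestSamples.map((s) => s.durationMs), 95);
```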
Tools Selection
| Category | Tool | Tier | Cost |
|---|---|---|---|
| Logging | Grafana Cloud (Loki) | Free | $0 |
| Error Tracking | Sentry | Developer (free) | $0 |
| Metrics | Grafana Cloud (Prometheus) | Free | $0 |
| Uptime | UptimeRobot | Free | $0 |
Note: All selected tools offer free tiers, which should be sufficient for the MVP.
Implementation Phases
Phase 1: Foundation
- Integrate Sentry for frontend/backend error tracking
- Add structured logging to payment flow
- Create error boundary in React
Phase 2: Metrics
- Add OpenTelemetry instrumentation
- Set up Grafana Cloud dashboard
- Create payment success/failure charts
Phase 3: Alerts
- Configure error alerts (Slack/email)
- Set up printer offline notifications
- Set up webhook alerts for payment failures
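An alert of this kind usually reduces to a threshold check over a time window. The sketch below evaluates a failure-rate rule and, when it fires, returns a message body with a `text` field, the shape Slack incoming webhooks accept. The rule fields, the thresholds, and the minimum-sample guard are assumptions; in practice the rule would more likely live in Grafana alerting than in application code.

```typescript
// Hypothetical rule: fire when the failure rate in a window exceeds a threshold.
interface AlertRule {
  name: string;
  maxFailureRate: number; // 0..1
  minSamples: number; // avoids alerting on a handful of events
}

interface WindowCounts {
  succeeded: number;
  failed: number;
}

// Returns a message payload when the rule fires, or null otherwise.
function evaluateAlert(rule: AlertRule, counts: WindowCounts): { text: string } | null {
  const total = counts.succeeded + counts.failed;
  if (total < rule.minSamples) {
    return null;
  }
  const rate = counts.failed / total;
  if (rate <= rule.maxFailureRate) {
    return null;
  }
  const ratePct = (rate * 100).toFixed(1);
  const limitPct = (rule.maxFailureRate * 100).toFixed(1);
  return {
    text: `[${rule.name}] failure rate ${ratePct}% over ${total} events (threshold ${limitPct}%)`,
  };
}

// Made-up rule and window: 10 failures out of 50 payments.
const paymentRule: AlertRule = { name: "payment-failures", maxFailureRate: 0.1, minSamples: 20 };
const fired = evaluateAlert(paymentRule, { succeeded: 40, failed: 10 });
```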
Consequences
Positive
- Faster debugging in production
- Data-driven optimization decisions
- Proactive monitoring before users report issues
- All tools have free tiers
Negative & Risks
- Additional DevOps complexity
- Log storage costs at scale
- Need to balance log verbosity vs. cost
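One possible way to manage the verbosity/cost trade-off is to sample DEBUG logs while always keeping higher levels. This is not something the decision above prescribes; the `keepLog` helper and the sample rate are assumptions for illustration.

```typescript
type Severity = "DEBUG" | "INFO" | "WARN" | "ERROR";

// Keeps every non-DEBUG line and a random fraction of DEBUG lines.
// `random` is injectable so the behaviour can be tested deterministically.
function keepLog(
  level: Severity,
  debugSampleRate: number, // 0 drops all DEBUG lines, 1 keeps all of them
  random: () => number = Math.random
): boolean {
  if (level !== "DEBUG") {
    return true;
  }
  return random() < debugSampleRate;
}
```

With a rate of 0.1, roughly one in ten DEBUG lines would be shipped, while ERROR, WARN, and INFO lines are unaffected.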
Related Files
- RFC: RFC-006-logging-monitoring.md
- ADR-003: ADR-003-payment-integration.md
- ADR-005: ADR-005-hardware-support.md