
ADR-009: Logging and Monitoring Infrastructure

Status: Accepted

Context

As the photobooth application grows in complexity with payment integrations, multiple printer support, and cloud storage, we need robust logging and monitoring to:

  • Debug production issues quickly
  • Track payment transaction health
  • Monitor printer connectivity and status
  • Observe system performance and capacity

Decision

We will implement a multi-layer observability stack:

  1. Application Logging: Structured JSON logs with context (tenant_id, session_id, transaction_id)
  2. Error Tracking: Sentry for exception capture and grouping
  3. Metrics & Alerts: OpenTelemetry + Prometheus/Grafana Cloud (free tier)
  4. Uptime Monitoring: UptimeRobot or similar for endpoint monitoring

Architecture

Logging Strategy

Log Levels

Level | Usage
ERROR | Payment failures, printer errors, exceptions
WARN | Retries, degraded performance, invalid inputs
INFO | Key events (session start/end, payment success)
DEBUG | Detailed flow for debugging
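Below is a minimal sketch of how the four levels could be ranked and filtered against a per-environment minimum level. The names `shouldLog` and `defaultMinLevel` are illustrative, and so is the choice of INFO as the production default. This ADR does not specify either.

```typescript
// Severity ranking for the four levels (lower number = more severe).
type LogLevel = "ERROR" | "WARN" | "INFO" | "DEBUG";

const LEVEL_RANK: Record<LogLevel, number> = {
  ERROR: 0,
  WARN: 1,
  INFO: 2,
  DEBUG: 3,
};

// True when `level` is at least as severe as the configured minimum level.
function shouldLog(level: LogLevel, minLevel: LogLevel): boolean {
  return LEVEL_RANK[level] <= LEVEL_RANK[minLevel];
}

// Hypothetical defaults: everything locally, INFO and above in production.
function defaultMinLevel(environment: "development" | "production"): LogLevel {
  return environment === "production" ? "INFO" : "DEBUG";
}
```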

Required Context Fields

interface LogContext {
  tenant_id?: string;
  session_id?: string;
  transaction_id?: string;
  printer_id?: string;
  user_id?: string;
  timestamp: string;
  environment: "development" | "production";
}
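The sketch below shows one way to emit these fields as a single JSON line per event. It assumes logs go to stdout via `console.log`, where a log shipper forwards them to Loki. It also assumes the timestamp is ISO 8601. `formatLogLine` and `createLogger` are hypothetical helpers, not an existing library API.

```typescript
type LogLevel = "ERROR" | "WARN" | "INFO" | "DEBUG";

interface LogContext {
  tenant_id?: string;
  session_id?: string;
  transaction_id?: string;
  printer_id?: string;
  user_id?: string;
  timestamp: string;
  environment: "development" | "production";
}

type ContextInput = Omit<LogContext, "timestamp">;

// Builds one JSON log line. Optional fields left undefined are omitted,
// because JSON.stringify skips properties whose value is undefined.
function formatLogLine(
  level: LogLevel,
  message: string,
  context: ContextInput,
  now: Date = new Date(),
): string {
  const entry = {
    level,
    message,
    ...context,
    timestamp: now.toISOString(), // assumption: ISO 8601 in UTC
  };
  return JSON.stringify(entry);
}

// Hypothetical helper: bind shared fields (tenant, environment) once,
// then add per-event fields such as session_id or transaction_id.
function createLogger(base: ContextInput) {
  return (
    level: LogLevel,
    message: string,
    extra: Partial<Omit<LogContext, "timestamp" | "environment">> = {},
  ): void => {
    console.log(formatLogLine(level, message, { ...base, ...extra }));
  };
}
```

For example, `createLogger({ environment: "production", tenant_id: "t1" })` could be created once per booth. The returned function would then be called with `{ session_id }` or `{ transaction_id }` as those values become known.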

Monitoring Targets

1. Payment Flow

  • Payment creation success/failure rate
  • Webhook delivery success
  • Transaction processing time
  • QRIS scan to payment confirmation latency
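The payment-flow targets reduce to two numbers: a success ratio and a latency distribution. The sketch below computes both from in-memory samples, just to make the definitions concrete. With the Phase 2 setup, these would more likely be OpenTelemetry counters and histograms queried in Grafana. `PaymentSample`, `confirmLatencyMs`, and the nearest-rank percentile method are all assumptions.

```typescript
type PaymentOutcome = "success" | "failure";

interface PaymentSample {
  outcome: PaymentOutcome;
  // Hypothetical field: milliseconds from QRIS scan to payment confirmation.
  confirmLatencyMs: number;
}

// Fraction of payment attempts that succeeded; NaN when there is no data.
function successRate(samples: PaymentSample[]): number {
  if (samples.length === 0) return NaN;
  const ok = samples.filter((s) => s.outcome === "success").length;
  return ok / samples.length;
}

// Nearest-rank percentile (p in (0, 100]) of confirmation latency,
// computed over successful payments only.
function latencyPercentile(samples: PaymentSample[], p: number): number {
  const sorted = samples
    .filter((s) => s.outcome === "success")
    .map((s) => s.confirmLatencyMs)
    .sort((a, b) => a - b);
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sorted.length);
  // Clamp so p close to 0 or above 100 still returns a valid element.
  return sorted[Math.min(Math.max(rank, 1), sorted.length) - 1];
}
```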

2. Hardware/Printing

  • Printer connection status
  • Print job success/failure rate
  • Average print time
  • Bluetooth disconnect events
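A per-printer tracker for these four signals might look like the sketch below. It assumes the printer integration reports connection changes and job results as events. `PrinterMonitor` and its methods are hypothetical. A disconnect is counted only on a connected-to-disconnected transition, so repeated "still disconnected" reports are not double-counted.

```typescript
interface PrinterStats {
  connected: boolean;
  disconnects: number;
  jobsOk: number;
  jobsFailed: number;
  totalPrintMs: number; // summed over successful jobs only
}

class PrinterMonitor {
  private readonly stats = new Map<string, PrinterStats>();

  // Returns the stats record for a printer, creating it on first use.
  private entry(printerId: string): PrinterStats {
    let s = this.stats.get(printerId);
    if (!s) {
      s = { connected: false, disconnects: 0, jobsOk: 0, jobsFailed: 0, totalPrintMs: 0 };
      this.stats.set(printerId, s);
    }
    return s;
  }

  // Counts a disconnect only when a connected printer drops.
  setConnected(printerId: string, connected: boolean): void {
    const s = this.entry(printerId);
    if (s.connected && !connected) s.disconnects += 1;
    s.connected = connected;
  }

  recordJob(printerId: string, ok: boolean, durationMs: number): void {
    const s = this.entry(printerId);
    if (ok) {
      s.jobsOk += 1;
      s.totalPrintMs += durationMs;
    } else {
      s.jobsFailed += 1;
    }
  }

  // Snapshot of the monitoring targets; NaN where no jobs have run yet.
  summary(printerId: string) {
    const s = this.entry(printerId);
    const jobs = s.jobsOk + s.jobsFailed;
    return {
      connected: s.connected,
      disconnects: s.disconnects,
      jobSuccessRate: jobs === 0 ? NaN : s.jobsOk / jobs,
      avgPrintMs: s.jobsOk === 0 ? NaN : s.totalPrintMs / s.jobsOk,
    };
  }
}
```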

3. System Health

  • API response times
  • Error rates by endpoint
  • Cloud storage upload success
  • Session completion rate
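API response time and error rate per endpoint can both be derived from one record per request. The sketch below assumes HTTP 5xx counts as an error; whether 4xx should also count is a policy choice. It also assumes a handler that throws maps to 500. `recordRequest`, `timed`, and the in-memory map are illustrative stand-ins for real metric instruments.

```typescript
interface EndpointStats {
  requests: number;
  errors: number;
  totalMs: number;
}

const endpointStats = new Map<string, EndpointStats>();

// Records one request; assumption: only 5xx responses count as errors.
function recordRequest(endpoint: string, status: number, durationMs: number): void {
  const s = endpointStats.get(endpoint) ?? { requests: 0, errors: 0, totalMs: 0 };
  s.requests += 1;
  if (status >= 500) s.errors += 1;
  s.totalMs += durationMs;
  endpointStats.set(endpoint, s);
}

// Error rate for an endpoint; 0 when it has not been called yet.
function errorRate(endpoint: string): number {
  const s = endpointStats.get(endpoint);
  return s && s.requests > 0 ? s.errors / s.requests : 0;
}

// Mean response time; NaN when there is no data.
function avgResponseMs(endpoint: string): number {
  const s = endpointStats.get(endpoint);
  return s && s.requests > 0 ? s.totalMs / s.requests : NaN;
}

// Wraps an async handler so its duration and outcome are recorded.
// Assumption: success is recorded as 200 and a thrown error as 500.
async function timed<T>(endpoint: string, handler: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    const result = await handler();
    recordRequest(endpoint, 200, Date.now() - start);
    return result;
  } catch (err) {
    recordRequest(endpoint, 500, Date.now() - start);
    throw err;
  }
}
```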

Tools Selection

Category | Tool | Tier | Cost
Logging | Grafana Cloud (Loki) | Free | $0
Error Tracking | Sentry | Developer (free) | $0
Metrics | Grafana Cloud (Prometheus) | Free | $0
Uptime | UptimeRobot | Free | $0

Note: All selected tools have free tiers generous enough for the MVP stage.

Implementation Phases

Phase 1: Foundation

  • Integrate Sentry for frontend/backend error tracking
  • Add structured logging to payment flow
  • Create error boundary in React

Phase 2: Metrics

  • Add OpenTelemetry instrumentation
  • Set up Grafana Cloud dashboard
  • Create payment success/failure charts

Phase 3: Alerts

  • Configure error alerts (Slack/email)
  • Set up printer offline notifications
  • Payment failure webhook alerts
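The printer-offline and payment-failure conditions could be expressed as simple rule checks, as sketched below. Several details are assumptions: the thresholds (60 s heartbeat timeout, 20% failure rate, at least 10 attempts) and the existence of a printer heartbeat. In practice these rules would more likely be configured as Grafana alert rules than written as application code. Delivery to Slack or email is left out.

```typescript
interface Alert {
  kind: "printer_offline" | "payment_failure_rate";
  message: string;
}

// Hypothetical thresholds; tune against real traffic before enabling alerts.
const PRINTER_HEARTBEAT_TIMEOUT_MS = 60000;
const PAYMENT_FAILURE_RATE_THRESHOLD = 0.2;
const MIN_PAYMENT_ATTEMPTS = 10;

// `lastPrinterHeartbeat` maps printer_id to the epoch ms of its last heartbeat.
function evaluateAlerts(
  now: number,
  lastPrinterHeartbeat: Record<string, number>,
  paymentAttempts: number,
  paymentFailures: number,
): Alert[] {
  const alerts: Alert[] = [];
  for (const [printerId, lastSeen] of Object.entries(lastPrinterHeartbeat)) {
    if (now - lastSeen > PRINTER_HEARTBEAT_TIMEOUT_MS) {
      alerts.push({
        kind: "printer_offline",
        message: `Printer ${printerId} has sent no heartbeat for ${Math.round((now - lastSeen) / 1000)}s`,
      });
    }
  }
  // Require a minimum sample so a single failure at low traffic does not alert.
  if (paymentAttempts >= MIN_PAYMENT_ATTEMPTS) {
    const rate = paymentFailures / paymentAttempts;
    if (rate > PAYMENT_FAILURE_RATE_THRESHOLD) {
      alerts.push({
        kind: "payment_failure_rate",
        message: `Payment failure rate ${(rate * 100).toFixed(1)}% over ${paymentAttempts} attempts`,
      });
    }
  }
  return alerts;
}
```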

Consequences

Positive

  • Faster debugging in production
  • Data-driven optimization decisions
  • Proactive monitoring before users report issues
  • All tools have free tiers

Negative & Risks

  • Additional DevOps complexity
  • Log storage costs at scale
  • Need to balance log verbosity vs. cost
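One common way to balance verbosity against storage cost is to keep every ERROR and WARN line and sample DEBUG lines. The sketch below samples by `session_id`, so a session that is sampled keeps its complete debug trail. Three things here are assumptions, not decisions recorded in this ADR: the 10% DEBUG rate, the FNV-1a bucketing, and the `shouldKeep` name.

```typescript
type LogLevel = "ERROR" | "WARN" | "INFO" | "DEBUG";

// Hypothetical per-level sampling rates (1 = keep everything).
const SAMPLE_RATE: Record<LogLevel, number> = { ERROR: 1, WARN: 1, INFO: 1, DEBUG: 0.1 };

// 32-bit FNV-1a over UTF-16 code units, used only to bucket session ids.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Decides whether to emit a log line. With a session id the decision is
// deterministic, so a session's DEBUG lines are all kept or all dropped.
function shouldKeep(level: LogLevel, sessionId?: string): boolean {
  const rate = SAMPLE_RATE[level];
  if (rate >= 1) return true;
  if (rate <= 0) return false;
  if (!sessionId) return Math.random() < rate;
  return fnv1a(sessionId) / 0x100000000 < rate;
}
```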