Cloudbrief architecture

The 30-second version: Cloudbrief is a FastAPI backend, a Next.js static-export frontend, and an APScheduler worker, all running on a single AWS Lightsail box. Once a day the worker assumes a role into your account, pulls a defined data set, runs the detector layer, optionally invokes Claude for synthesis, and stores the result.

The pieces

                                            ┌───────────────────────────┐
   browser ── HTTPS ── Cloudflare edge ──── │ Lightsail (ap-south-1)    │
                       (proxied, strict)    │                           │
                       cert: CF Origin CA   │  nginx (TLS terminate)    │
                                            │   │                       │
                                            │   ├─→ static frontend     │
                                            │   │   /var/www/analyzer/  │
                                            │   │                       │
                                            │   └─→ /api/* → FastAPI    │
                                            │       (uvicorn :8000)     │
                                            │           │               │
                                            │           ▼               │
                                            │    PostgreSQL 16 +        │
                                            │    pgvector (local)       │
                                            │                           │
                                            │    APScheduler worker     │
                                            │    (heartbeat, daily,     │
                                            │     weekly digest, etc.)  │
                                            └────┬──────────────────────┘
                                                 │
                                                 │ STS:AssumeRole
                                                 ▼
                                            ┌────────────────────────────┐
                                            │  YOUR AWS account          │
                                            │  (cross-account read-only) │
                                            │                            │
                                            │  Cost Explorer · CloudWatch│
                                            │  PI · CloudTrail · ALB · EB│
                                            └────────────────────────────┘
                                                       │
                                                       │  Synthesis data
                                                       ▼
                                            ┌────────────────────────────┐
                                            │  Anthropic Claude API      │
                                            │  (only when detectors fire)│
                                            └────────────────────────────┘
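
The STS:AssumeRole hop in the diagram can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `assume_read_only_role`, the `TemporaryCredentials` type, the session name, and the optional external ID are all assumptions. The only real API it relies on is the STS `assume_role` call as exposed by boto3, which accepts `RoleArn`, `RoleSessionName`, `DurationSeconds`, and optionally `ExternalId`, and returns a `Credentials` mapping. The client is passed in as an argument so the sketch itself needs nothing beyond the standard library.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class TemporaryCredentials:
    """Short-lived credentials returned by STS for the customer account."""
    access_key_id: str
    secret_access_key: str
    session_token: str


def assume_read_only_role(
    sts_client: Any,
    role_arn: str,
    external_id: Optional[str] = None,       # assumption: not mentioned in the doc
    session_name: str = "cloudbrief-daily",  # hypothetical session name
) -> TemporaryCredentials:
    """Assume the cross-account role and return its temporary credentials.

    `sts_client` is expected to behave like boto3.client("sts").
    """
    params = {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "DurationSeconds": 900,  # shortest session duration STS accepts
    }
    if external_id is not None:
        params["ExternalId"] = external_id

    creds = sts_client.assume_role(**params)["Credentials"]
    return TemporaryCredentials(
        access_key_id=creds["AccessKeyId"],
        secret_access_key=creds["SecretAccessKey"],
        session_token=creds["SessionToken"],
    )
```

With boto3 installed, a call would look something like `assume_read_only_role(boto3.client("sts"), "arn:aws:iam::123456789012:role/ExampleReadOnly")`, where the ARN is a placeholder. Keeping the session short fits a once-a-day job that only needs the credentials for the length of one collection pass.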

What runs where

| Component | Lives | Purpose |
|---|---|---|
| Frontend SPA | Cloudflare CDN edge (HTML/JS/CSS) | Renders the dashboard, reports, investigations UI |
| API | AWS Lightsail (analyzer-api systemd unit) | Auth, CRUD, on-demand investigation triggers |
| Worker | AWS Lightsail (analyzer-worker systemd unit) | Scheduled jobs: daily analysis, weekly digest, Paperbrief ingestion |
| Database | AWS Lightsail (Postgres 16, local) | Everything multi-tenant lives here |
| Report blob storage | S3 analyzer-reports-* | Synthesised reports, retained forever |
| Backups | S3 analyzer-backups-* | Daily DB backups (retained 30d) |
| Secrets | AWS SSM Parameter Store | API keys, DB password, pgcrypto key, TLS cert + key |

Daily analysis lifecycle

  1. 09:00 IST cron fires in the worker for every aws_accounts row where daily_enabled = true.
  2. The worker decrypts the AWS credentials (column-encrypted via pgcrypto), then assumes the cross-account role via STS.
  3. The collectors run in parallel: Cost Explorer for the last 8 days, CloudWatch metrics for relevant resources, PI for top SQL, etc.
  4. Raw collected data is fed to every detector in worker/detectors/. Each detector emits zero or more Signal rows into detector_signals, fired or not (un-fired rows are kept for recall analysis).
  5. If any detector fires:
    • The synthesis prompt is built (system + fired signals + narrow data slices)
    • Claude is called with model = claude-sonnet-4-6, streaming off
    • Response is parsed for findings + root-cause chains
    • An analysis_runs row + findings rows are written
    • Report is rendered to HTML, uploaded to S3
    • Email is sent via SES to the account's recipients (if any configured)
  6. If no detector fired:
    • An analysis_runs row is written with status "all_clear"
    • No LLM call, no report HTML, no email content beyond a one-liner
    • Total cost: ~$0.000
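
The control flow of steps 3 to 6 can be sketched as below. This is an illustrative outline under stated assumptions, not the worker's real code: `run_daily_analysis`, the `Signal` dataclass fields, the `Collector` and `Detector` aliases, and the "findings" status string are all hypothetical (only "all_clear" comes from the steps above). The `synthesize` callable stands in for the Claude call, and persistence, report rendering, and email are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Signal:
    """One detector result; un-fired signals are kept as well."""
    detector: str
    fired: bool
    detail: str = ""


# Hypothetical aliases: a collector returns raw data for one source,
# a detector turns all raw data into zero or more signals.
Collector = Callable[[], dict]
Detector = Callable[[Dict[str, dict]], List[Signal]]


def run_daily_analysis(
    collectors: Dict[str, Collector],
    detectors: List[Detector],
    synthesize: Callable[[List[Signal]], str],
) -> dict:
    """Collect in parallel, run every detector, call the LLM only on a hit."""
    # Step 3: run each collector in its own thread and wait for all of them.
    with ThreadPoolExecutor(max_workers=max(1, len(collectors))) as pool:
        futures = {name: pool.submit(fn) for name, fn in collectors.items()}
        raw = {name: fut.result() for name, fut in futures.items()}

    # Step 4: feed the raw data to every detector and keep all signals.
    signals = [s for detect in detectors for s in detect(raw)]
    fired = [s for s in signals if s.fired]

    # Step 6: quiet day, so no LLM call and no report.
    if not fired:
        return {"status": "all_clear", "signals": signals, "report": None}

    # Step 5: only the fired signals are passed to synthesis.
    # The "findings" status name is an assumption for this sketch.
    return {"status": "findings", "signals": signals, "report": synthesize(fired)}
```

The point of the shape is that `synthesize` is never reached on a quiet day, which is what keeps the quiet-day cost at zero, while every signal, fired or not, is still returned for storage.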

Cost model

Per AWS account per day:

  • Data collection: free (CloudWatch / Cost Explorer API usage stays well within the free limits)
  • Detector layer: free (pure Python, runs in worker)
  • LLM synthesis: $0.04–$0.20 per fired-signal day, $0 per quiet day
  • SES email: $0.0001 per email, negligible

Typical monthly spend per AWS account: $0.50–$2 if mostly-quiet, $2–$8 if you actively investigate every week.
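
The arithmetic behind the per-account figures can be worked through with a small helper. This is a rough sketch: `monthly_cost_range` is a hypothetical function, the per-day LLM and per-email prices are the ones listed above, and the assumption of one email per day to one recipient is mine. On-demand investigations are not included because no per-investigation price is given here.

```python
from typing import Tuple


def monthly_cost_range(
    fired_days: int,
    days_in_month: int = 30,
    llm_low: float = 0.04,    # USD per fired-signal day, from the list above
    llm_high: float = 0.20,   # USD per fired-signal day, from the list above
    email_cost: float = 0.0001,
    emails_per_day: int = 1,  # assumption: one recipient, one email a day
) -> Tuple[float, float]:
    """Return a (low, high) USD estimate for one AWS account for one month."""
    if not 0 <= fired_days <= days_in_month:
        raise ValueError("fired_days must be between 0 and days_in_month")
    emails = email_cost * emails_per_day * days_in_month
    low = fired_days * llm_low + emails
    high = fired_days * llm_high + emails
    return round(low, 3), round(high, 3)
```

Under these assumptions, ten fired days in a 30-day month come to roughly $0.40 to $2.00, and a month where a detector fires every day comes to roughly $1.20 to $6.00, with email contributing less than a cent either way.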

Storage shapes

organizations
  ├─ org_members (FK CASCADE)
  └─ aws_accounts (FK CASCADE)
       └─ analysis_runs (FK CASCADE)
            ├─ findings (FK CASCADE)
            └─ detector_signals (FK CASCADE)

Every multi-tenant table has org_id (or transitively belongs to one), enforced at the FK level. No table is shared between Cloudbrief and Paperbrief.
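
To make the effect of the CASCADE chain concrete, here is a small runnable sketch. It uses an in-memory SQLite database purely so it runs without a server; the real store is Postgres. The table names come from the tree above, while the column names (`org_id`, `aws_account_id`, `run_id`, `name`) and the helper functions are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE org_members (
    id INTEGER PRIMARY KEY,
    org_id INTEGER NOT NULL REFERENCES organizations(id) ON DELETE CASCADE
);
CREATE TABLE aws_accounts (
    id INTEGER PRIMARY KEY,
    org_id INTEGER NOT NULL REFERENCES organizations(id) ON DELETE CASCADE
);
CREATE TABLE analysis_runs (
    id INTEGER PRIMARY KEY,
    aws_account_id INTEGER NOT NULL REFERENCES aws_accounts(id) ON DELETE CASCADE
);
CREATE TABLE findings (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES analysis_runs(id) ON DELETE CASCADE
);
CREATE TABLE detector_signals (
    id INTEGER PRIMARY KEY,
    run_id INTEGER NOT NULL REFERENCES analysis_runs(id) ON DELETE CASCADE
);
"""

TABLES = ["organizations", "org_members", "aws_accounts",
          "analysis_runs", "findings", "detector_signals"]


def build_demo_db() -> sqlite3.Connection:
    """Create the schema and insert one row per table for a single org."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO organizations (id, name) VALUES (1, 'demo-org')")
    conn.execute("INSERT INTO org_members (id, org_id) VALUES (1, 1)")
    conn.execute("INSERT INTO aws_accounts (id, org_id) VALUES (1, 1)")
    conn.execute("INSERT INTO analysis_runs (id, aws_account_id) VALUES (1, 1)")
    conn.execute("INSERT INTO findings (id, run_id) VALUES (1, 1)")
    conn.execute("INSERT INTO detector_signals (id, run_id) VALUES (1, 1)")
    conn.commit()
    return conn


def row_counts(conn: sqlite3.Connection) -> dict:
    """Number of rows in each table, keyed by table name."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in TABLES}


conn = build_demo_db()
conn.execute("DELETE FROM organizations WHERE id = 1")  # delete the tenant root
conn.commit()
print(row_counts(conn))  # every count is 0 after the cascade
```

Deleting the organization row removes its members, accounts, runs, findings, and signals in one statement, so tenant removal does not depend on application code remembering every child table.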

Why a single Lightsail box

We started cheap. At our current scale (≤ 10 organisations), one box runs the entire platform comfortably:

  • 1 vCPU, 1 GB RAM (Lightsail small_3_1)
  • ~$12/month
  • DB + worker + API all on the same host

Scaling milestones, in order of when we'd hit them:

  • ~50 organisations: split DB to RDS Postgres (single instance, no replicas yet). Application code change: zero (DATABASE_URL env var).
  • ~200 organisations: split worker to its own box so DB I/O during heavy ingest doesn't block API responses.
  • ~1000 organisations: move from Lightsail to ECS+Fargate so we can scale horizontally and run multiple worker replicas.

We're nowhere near 50 today. Premature horizontal scaling would just cost more.
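
The "zero application code change" claim for the first milestone rests on the connection string coming from the environment. A minimal sketch of that pattern, assuming a hypothetical local default URL and hypothetical helper names (`database_url`, `database_host`), since the real values are not given here:

```python
import os
from urllib.parse import urlsplit

# Hypothetical local default; the real value is not given in this document.
DEFAULT_DATABASE_URL = "postgresql://analyzer@localhost:5432/analyzer"


def database_url() -> str:
    """Return the connection URL, preferring the DATABASE_URL env var."""
    return os.environ.get("DATABASE_URL", DEFAULT_DATABASE_URL)


def database_host() -> str:
    """Host the application would connect to, e.g. for a startup log line."""
    return urlsplit(database_url()).hostname or ""
```

Moving the database to another host then means changing DATABASE_URL in the service's environment and restarting; nothing that calls `database_url()` needs to know where Postgres lives.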

Repo layout

| Path | What |
|---|---|
| backend/app/ | FastAPI + SQLAlchemy + Alembic migrations |
| worker/ | APScheduler entry, detectors, collectors, analyzers |
| frontend/ | Next.js static-export SPA |
| nginx/ | TLS / vhost config installed on the box |
| scripts/ | bootstrap, deploy, backup, restore |

Source-of-truth repo: github.com/manishrgaud7781/aws-connector-ai.