Paperbrief architecture
The 30-second version
Upload → APScheduler worker picks it up → parse text → chunk → embed (OpenAI) → store in pgvector → ready to chat. Chat: embed question → top-K vector search → Claude synthesises answer with citations as a separate SSE event.
What runs where
Same single Lightsail box as Cloudbrief (see Cloudbrief architecture). Paperbrief lives in the same FastAPI process + APScheduler worker, but with its own tables, its own pipeline, its own per-org access toggle.
Ingestion lifecycle
┌───────────────────────────┐
│ POST /api/paperbrief/ │
│ documents (multipart) │ ← from the frontend Upload button
└──────────┬────────────────┘
│
▼
┌───────────────────────────┐
│ API: │
│ • validate size + type │
│ • create document row │ status="pending"
│ • upload bytes to S3 │ paperbrief/{org_id}/{doc_id}/...
│ • commit │
│ • return doc to user │
└──────────┬────────────────┘
│ user sees status=pending; UI polls every 5s
│
▼
┌───────────────────────────┐
│ Worker (every 30s): │
│ SELECT FROM │
│ paperbrief_documents │
│ WHERE status='pending' │
│ LIMIT 5 │
└──────────┬────────────────┘
│ for each pending doc
▼
┌───────────────────────────┐
│ Ingester: │
│ • flip → processing │
│ • S3 download bytes │
│ • parse → text + pages │
│ • if no text + OCR ok: │
│ Claude vision OCR │
│ • language detect │
│ • chunk (RecursiveCT) │
│ • embed (OpenAI batch) │
│ • bulk-insert chunks │
│ • flip → ready │
│ • log usage events │
└──────────┬────────────────┘
│
▼
status=ready in UI;
checkbox enabledFailures along the way flip status to failed with a user-facing status_message. No partial state — chunks are either fully inserted or not at all.
Chat lifecycle
┌─────────────────────────┐
│ POST /api/paperbrief/ │
│ chat (SSE stream) │
└──────────┬──────────────┘
│
▼
┌─────────────────────────┐
│ • emit conversation │ ← so UI can update URL right away
│ event with conv id │
│ • embed question │ OpenAI text-embedding-3-small
│ • vector search │ pgvector cosine, scoped by org_id
│ • emit citations event │ ← UI renders source pills
└──────────┬──────────────┘
│
▼
┌─────────────────────────┐
│ Anthropic streaming: │
│ system + user message │
│ with citations as ctx │
│ │
│ for each token: │
│ emit token event │ ← UI appends to visible answer
│ │
│ emit usage event │ ← UI uses for cost display
│ emit done event │
└──────────┬──────────────┘
│
▼
persist user msg + assistant msg
to paperbrief_messages
write usage event rowData shapes
organizations
├─ org_members (FK CASCADE)
├─ paperbrief_documents (FK CASCADE)
│ └─ paperbrief_chunks (FK CASCADE; embedding vector(1536))
├─ paperbrief_conversations (FK CASCADE)
│ └─ paperbrief_messages (FK CASCADE)
└─ paperbrief_usage_events (FK CASCADE)Every multi-tenant table has org_id. Foreign keys are explicit; cascades are explicit. No ORM "soft" relationships.
pgvector index
Single ivfflat index on paperbrief_chunks.embedding with vector_cosine_ops, lists=100. Tuned for up to ~1M vectors. We'll migrate to HNSW when we cross that threshold (a one-shot reindex; no app code change).
The query plan filters by org_id + document_id ANY(...) BEFORE the ANN scan — so the search space is always small per request, not "all chunks in the system".
What's sent to AI providers
For one chat turn:
- OpenAI text-embedding-3-small receives: your question text. Nothing else.
- Anthropic Claude Sonnet receives: the system prompt (one paragraph instructing it to ground answers), the top-K chunk texts (with document title + chunk index labels for the model's context), and your question.
For one OCR pass:
- Anthropic Claude Sonnet receives: a base64 PNG of one page + a transcription prompt. No other context.
We don't send PII metadata (your name, email, org name) to providers. The document/chunk content itself is whatever you uploaded.
Why this scales
The pipeline is embarrassingly parallel — adding worker capacity means processing more pending docs concurrently. Chat is stateless per-request; the only scaling bottleneck is the LLM API itself.
At our current scale (≤ 10 orgs), one Lightsail box handles everything comfortably. See Cloudbrief architecture → Scaling milestones.
Repo layout (relevant slice)
| Path | What |
|---|---|
backend/app/paperbrief/parsers/ | PDF / DOCX / TXT extractors |
backend/app/paperbrief/chunker.py | RecursiveCharacterTextSplitter w/ Indic separators |
backend/app/paperbrief/embedder.py | OpenAI embeddings (batched, retried) |
backend/app/paperbrief/ocr.py | Claude vision OCR |
backend/app/paperbrief/storage.py | S3 wrapper, per-org prefix |
backend/app/paperbrief/ingest.py | Orchestrator |
backend/app/paperbrief/retrieval.py | pgvector cosine search |
backend/app/paperbrief/chat.py | Claude streaming + SSE events |
backend/app/paperbrief/usage.py | Per-event cost recording |
backend/app/api/paperbrief.py | HTTP endpoints |
worker/jobs/paperbrief_ingester.py | 30-second polling job |