Troubleshooting Paperbrief

Upload stays in "pending" or "processing" forever

The worker polls every 30 seconds and processes up to 5 pending docs per tick. Typical end-to-end time:

TXT/MD: < 5 seconds
DOCX: 5–15 seconds
PDF with text layer: 5–30 seconds
Scanned PDF (OCR): 30 seconds to several minutes (depends on page count)
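The polling numbers above put a rough upper bound on how long a doc waits before the worker even picks it up. This is a minimal sketch under two assumptions the guide doesn't state: pending docs are claimed in FIFO order, and every tick claims a full batch of 5 no matter how long earlier docs take. `ticks_until_pickup` and `max_wait_seconds` are hypothetical helpers, not part of Paperbrief.

```python
POLL_INTERVAL_S = 30  # worker polls every 30 seconds (from the guide)
BATCH_SIZE = 5        # up to 5 pending docs per tick (from the guide)


def ticks_until_pickup(queue_position: int) -> int:
    """Ticks before a doc is claimed; queue_position 0 means next in line.

    Assumes FIFO ordering and that each tick claims a full batch.
    """
    if queue_position < 0:
        raise ValueError("queue_position must be >= 0")
    return queue_position // BATCH_SIZE + 1


def max_wait_seconds(queue_position: int) -> int:
    """Worst-case wait before pickup, excluding parse/OCR time.

    Worst case: the doc lands just after a tick, so it waits one full
    interval for the first tick plus one interval per extra tick needed.
    """
    return ticks_until_pickup(queue_position) * POLL_INTERVAL_S


# A doc with 12 others ahead of it is claimed on the 3rd tick, after 90 s
print(ticks_until_pickup(12), max_wait_seconds(12))
```

The processing times in the table above come on top of this wait.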

If a doc has been pending or processing for more than 5 minutes (longer is possible for large scanned PDFs, see above), something is probably wrong:

Reload the page. The status pill refreshes every 5 seconds, but the status can be cached client-side; a hard reload (Cmd+Shift+R on macOS, Ctrl+Shift+R on Windows/Linux) forces a fresh fetch.
Check the doc card — if status is failed, the message tells you why (see below for common causes).
The worker may be down. If we have an outage, all pending uploads stack up. We see this immediately on our side, and it should resolve within an hour. Email us if it persists.
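To decide whether an upload has crossed the 5-minute line, compare the time of its last status change against the clock. This is a minimal sketch: the status names match the ones used in this guide, but the `updated_at` field and the `looks_stuck` helper are hypothetical, not a documented Paperbrief API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STUCK_AFTER = timedelta(minutes=5)  # threshold from this guide
IN_FLIGHT = {"pending", "processing"}


def looks_stuck(status: str, updated_at: datetime, now: Optional[datetime] = None) -> bool:
    """True if a doc has sat in pending/processing past the threshold.

    updated_at (hypothetical field) must be timezone-aware. Large scanned
    PDFs can legitimately run past 5 minutes, so treat True as a hint.
    """
    if status not in IN_FLIGHT:
        return False  # finished or failed docs are not stuck
    now = now or datetime.now(timezone.utc)
    return now - updated_at > STUCK_AFTER
```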

Upload status went to "failed" — what now?

Click the document card to see the failure message. Common causes:

"File is X MB — the limit is 25 MB"

Self-explanatory. Either compress the file (PDFs often shrink 2-5× with gs -sDEVICE=pdfwrite -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -sOutputFile=smaller.pdf original.pdf) or split it into multiple uploads.
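If you upload in bulk, a pre-flight check catches oversized files before they fail. This is a minimal sketch that assumes "25 MB" means 25 × 1024 × 1024 bytes (the guide doesn't say which definition is used); `over_limit`, `ghostscript_compress_cmd` and `compress` are hypothetical helper names. The Ghostscript flags are standard: `-dNOPAUSE -dBATCH` make it run non-interactively and exit, and `-sOutputFile` names the compressed copy.

```python
import os
import subprocess

LIMIT_BYTES = 25 * 1024 * 1024  # assumption: the 25 MB limit is measured in MiB


def over_limit(path: str, limit: int = LIMIT_BYTES) -> bool:
    """True if the file at `path` is larger than the upload limit."""
    return os.path.getsize(path) > limit


def ghostscript_compress_cmd(src: str, dst: str) -> list[str]:
    """Build a Ghostscript command that rewrites src at /ebook quality."""
    return [
        "gs",
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/ebook",  # downsamples images to ~150 dpi
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={dst}",
        src,
    ]


def compress(src: str, dst: str) -> None:
    """Run Ghostscript; it must be installed and on PATH."""
    subprocess.run(ghostscript_compress_cmd(src, dst), check=True)
```

Re-check the output with `over_limit`; if it's still too large, split it instead.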

"'X.pdf' appears to use an older non-Unicode font that couldn't be decoded"

Legacy Indic font encoding issue. See File formats → Garbled text detection for the fix.

"'X.pdf' produced no usable text chunks"

The parser ran, but the extracted text was all whitespace or too short to reach the chunk threshold. This often means the PDF is image-only and wasn't OCR'd. Either:

Re-upload. If OCR is enabled in the platform config (it is by default in early access), this shouldn't happen, so it's likely a transient failure.
If the PDF is genuinely empty or image-only and OCR is disabled, you'll normally see a different message (OCRRequiredError) instead; ping us to enable OCR.
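The "no usable text chunks" check can be pictured as: split the extracted text, strip each piece, and keep only pieces large enough to index. This is a minimal sketch; splitting on blank lines stands in for Paperbrief's real chunker, which isn't described here, and the threshold `MIN_CHUNK_CHARS = 50` and the helper names are hypothetical.

```python
MIN_CHUNK_CHARS = 50  # hypothetical threshold; the real value isn't documented here


def usable_chunks(text: str, min_chars: int = MIN_CHUNK_CHARS) -> list[str]:
    """Split on blank lines and keep only pieces long enough to index."""
    pieces = (p.strip() for p in text.split("\n\n"))
    return [p for p in pieces if len(p) >= min_chars]


def check_extracted_text(text: str) -> list[str]:
    """Raise if extraction yielded nothing worth indexing (e.g. an image-only PDF)."""
    chunks = usable_chunks(text)
    if not chunks:
        raise ValueError("produced no usable text chunks")
    return chunks
```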

"OCR couldn't extract any text from 'X.pdf'"

OCR ran but produced empty output. Usually means the PDF has only graphics with no readable text content. Workaround: convert to images externally and use a different OCR tool, then upload the text.
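One way to do the external OCR step is with two common open-source tools, Poppler's `pdftoppm` (rasterises pages) and the Tesseract CLI (OCRs images). Neither is part of Paperbrief, and the function names below are hypothetical; this is a sketch of the workaround, not a supported pipeline.

```python
import glob
import os
import subprocess
import tempfile


def pdftoppm_cmd(pdf_path: str, prefix: str, dpi: int = 300) -> list[str]:
    """Rasterise every page to PNG, writing <prefix>-1.png, <prefix>-2.png, ..."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, prefix]


def tesseract_cmd(image_path: str, lang: str = "eng") -> list[str]:
    """OCR one image; the special output base 'stdout' prints the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def pdf_to_text_via_ocr(pdf_path: str, lang: str = "eng") -> str:
    """Requires poppler-utils (pdftoppm) and tesseract on PATH."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "page")
        subprocess.run(pdftoppm_cmd(pdf_path, prefix), check=True)
        # pdftoppm zero-pads page numbers for longer documents,
        # so a plain sort keeps the pages in order
        pages = sorted(glob.glob(prefix + "-*.png"))
        texts = [
            subprocess.run(tesseract_cmd(p, lang), check=True,
                           capture_output=True, text=True).stdout
            for p in pages
        ]
    return "\n\n".join(texts)
```

Save the returned string as a .txt file and upload that; TXT is the fastest format to process.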

"This document produces N chunks; the per-document limit is 2000"

You hit the chunk ceiling. Split the document into pieces (chapters, sections) and upload separately. We may raise the cap on request — tell us your use case.
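When splitting, you can pack consecutive sections into as few uploads as possible while staying under the limit. This is a minimal sketch: the 2000-chunk limit comes from the error message, but `CHARS_PER_CHUNK` is a hypothetical average (Paperbrief's chunk size isn't documented here), so leave yourself some margin. `estimated_chunks` and `group_sections` are hypothetical helpers.

```python
import math

CHUNK_LIMIT = 2000      # per-document limit from the error message
CHARS_PER_CHUNK = 1000  # hypothetical average chunk size


def estimated_chunks(text: str) -> int:
    """Rough chunk count for a piece of text (ignores any chunk overlap)."""
    return math.ceil(len(text) / CHARS_PER_CHUNK)


def group_sections(sections: list[str], limit: int = CHUNK_LIMIT) -> list[list[str]]:
    """Greedily pack consecutive sections into uploads under the chunk limit."""
    groups: list[list[str]] = []
    current: list[str] = []
    current_chunks = 0
    for section in sections:
        n = estimated_chunks(section)
        if n > limit:
            raise ValueError("a single section exceeds the limit; split it further")
        if current and current_chunks + n > limit:
            groups.append(current)  # close this upload and start a new one
            current, current_chunks = [], 0
        current.append(section)
        current_chunks += n
    if current:
        groups.append(current)
    return groups
```

Summing per-section estimates slightly overcounts compared with estimating the combined text, which errs on the safe side.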

"Processing failed unexpectedly"

A bug or a transient failure. Email us with the document name and approximate upload time, and we'll trace it in the worker log.

Chat returns "I could not find this information"

The retrieval found chunks but the model judged them not relevant enough. Try:

Rephrase using terms that appear in the document. Vector search is semantic but still benefits from term overlap.
Select fewer documents so the retrieval is more focused.
Ask more specifically. "What's the liability cap" is better than "what about liability".
Check that the right docs are selected in the sidebar. This message is the model saying it couldn't ground an answer in the selected documents, and that judgment is usually correct.

If you know the content is there and the retrieval misses it consistently, email us the document + the question. We can look at the embedding similarity scores and figure out whether to tune retrieval.
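Embedding similarity is commonly measured with cosine similarity; this guide doesn't say which metric Paperbrief uses, so treat the sketch below as an illustration of the idea only. Toy 3-dimensional vectors stand in for real embeddings, and `top_k` is a hypothetical ranking helper.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0  # guard against zero vectors


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Rank chunk IDs by similarity to the query and keep the best k."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in chunks.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


chunks = {
    "liability-cap": [0.9, 0.1, 0.0],
    "payment-terms": [0.0, 1.0, 0.0],
    "general": [0.5, 0.5, 0.0],
}
# liability-cap ranks first, then general; payment-terms is dropped
print(top_k([1.0, 0.0, 0.0], chunks))
```

With a fixed k, selecting fewer documents means fewer candidates competing for those slots, which is why narrowing the selection can help.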

Chat returns "The model failed to answer"

Three causes, from most to least common:

Anthropic rate limit (429). We got throttled. Retry in 30 seconds.
Anthropic transient error (5xx). Same retry.
Misconfigured model name. If you see this consistently, email us — we may have a deployment issue.

The error is logged on our side. We see the pattern and respond within an hour.
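If you script against an endpoint that surfaces these errors, the usual pattern is to retry only on 429 and 5xx, with a growing delay. This is a minimal sketch: `ApiError` and `call_with_retry` are hypothetical stand-ins, not Paperbrief or Anthropic SDK names, and the 30-second base delay mirrors the advice above. Other client errors (such as a bad model name) won't be fixed by retrying, so they're raised immediately.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ApiError(Exception):
    """Hypothetical error carrying the HTTP status of a failed model call."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    return status == 429 or 500 <= status < 600


def call_with_retry(fn: Callable[[], T], attempts: int = 4, base_delay: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying rate limits (429) and server errors (5xx).

    Waits base_delay, then 2x, 4x, ... plus up to 1 s of random jitter.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except ApiError as err:
            if not is_retryable(err.status) or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    raise RuntimeError("unreachable")
```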

Chat returns no answer, page seems to reload

This was a real bug, fixed in the SP7a patch. If you still see it on a current build:

Hard-reload the page (Cmd+Shift+R)
Try again. If it recurs reproducibly, email us with the question text + selected docs.

Citations look wrong

If a citation pill shows a document title that's clearly unrelated to the answer:

The retrieval pulled in that chunk because of some incidental term match.
The model wisely IGNORED it in the answer (good behavior).
The pill still shows because it's what we sent as context.

If the answer itself contradicts what the cited chunk says, that's a model error — please email us with the conversation ID + the offending answer. We log these to improve prompts.

Document delete didn't fully clean up

Delete cascades to chunks via FK ON DELETE CASCADE. S3 object delete is best-effort (we log on failure but don't block the DB delete). Result: rare orphaned S3 objects under your org's prefix.
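The delete path described above can be illustrated with SQLite, which also supports ON DELETE CASCADE once foreign keys are switched on. This is a minimal sketch: the table layout, `delete_document`, and the `delete_object` callback (standing in for the S3 delete) are hypothetical, and Paperbrief's real database isn't necessarily SQLite.

```python
import logging
import sqlite3
from typing import Callable

log = logging.getLogger("paperbrief.delete")  # hypothetical logger name


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(
        """
        CREATE TABLE documents (id INTEGER PRIMARY KEY, s3_key TEXT NOT NULL);
        CREATE TABLE chunks (
            id INTEGER PRIMARY KEY,
            document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
            body TEXT
        );
        """
    )
    return conn


def delete_document(conn: sqlite3.Connection, doc_id: int,
                    delete_object: Callable[[str], None]) -> None:
    """Delete the DB row (chunks cascade), then try the object store."""
    row = conn.execute("SELECT s3_key FROM documents WHERE id = ?", (doc_id,)).fetchone()
    if row is None:
        return
    with conn:  # commits the DB delete regardless of what the S3 call does
        conn.execute("DELETE FROM documents WHERE id = ?", (doc_id,))
    try:
        delete_object(row[0])
    except Exception:
        # Best-effort: log and move on; the object is left as an orphan
        log.exception("S3 delete failed for %s", row[0])
```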

If you need to verify nothing's left, here's what to expect:

The doc disappears from your library immediately.
All conversations that referenced it still show — chats keep their messages with the document title preserved in citations.
S3 cleanup is async; orphans cost ~$0.023/GB/month. We sweep periodically.

To force-clean stuck S3 objects, email us with the document ID.

"Paperbrief isn't enabled for your company"

Your org doesn't have the paperbrief product flag. Two paths:

If you're the platform admin: Workspace → Platform admin → your org → Product access → toggle Paperbrief on.
If you're not: email manish.gaud@bhavitech.com asking to enable Paperbrief for your org. Reply usually within a business day.

I can't switch to Paperbrief — it's not in the product dropdown

Either:

Paperbrief isn't enabled for your org (see above).
The platform admin enabled it after you loaded the page — hard-reload to pick up the change.

Architecture