Home›Blog›Tutorials

Tutorials

Privacy-First AI PDF Chat: BYOK 2026

By imisspdf Team·May 24, 2026·15 min read

You drop a 40-page contract into ChatPDF and ask “what’s the termination clause?” and get a clean answer in three seconds. Magic. But where did your contract go, and who saw the question?

For most “chat with PDF” tools, the honest answer is: your file went to their server, the recognized text went to OpenAI or Anthropic or Google through their server-side proxy, the answer came back through their server, and a copy of your file is sitting in their storage for the next 1-30 days while a copy of your conversation is in their logs for some period after that.

For non-sensitive documents — a public research paper, a software manual, a published book — none of this matters. For a contract, a payslip, a medical record, an internal strategy doc, an unannounced product roadmap, every step of that data flow is a step where the document is somewhere it shouldn’t be.

This article explains the alternative pattern: BYOK in-browser PDF chat. We’ll cover what BYOK means, why the standard hosted-AI-PDF-chat architecture has hidden privacy risk, how the in-browser BYOK pattern fixes it, which models work best for which kinds of PDF, the cost vs accuracy trade-offs, and when fully local models (Ollama) make sense despite their accuracy limits.

The hidden privacy risk in hosted AI PDF chat

The category is currently dominated by ChatPDF (the original), Humata, Adobe AI Assistant for Acrobat, Microsoft Copilot for Word, and a handful of newer tools (DocuChat, AskYourPDF, ChatDOC). The architecture is similar across all of them:

You upload your PDF to the tool’s server.
The server extracts text from the PDF (often running OCR if it’s scanned).
The server splits the text into chunks and computes embeddings (vector representations) using OpenAI’s or another provider’s embedding API.
The chunks and embeddings are stored in the tool’s vector database.
You ask a question. The server runs a similarity search to find the most relevant chunks.
The server constructs a prompt combining your question with the retrieved chunks.
The server sends the prompt to OpenAI/Anthropic/Google through their AI-provider account, using their own API key.
The provider returns the answer to the server.
The server returns the answer to you.

This is the RAG (Retrieval-Augmented Generation) pattern, and it’s a fine pattern. The problem isn’t the pattern — it’s that steps 1 through 9 all run on the tool company’s infrastructure, which means your document, your question, and the AI’s answer all pass through (and may be stored on) a third party that isn’t the AI provider.

That’s two layers of trust required, not one. Even if you trust OpenAI’s API data policy (which is reasonable — API usage isn’t used to train models by default since March 2023), you also have to trust the tool company’s privacy policy, their data retention, their employee access controls, their breach disclosure track record, and the jurisdiction their servers operate in.

Humata, per their own AppSumo Q&A, does not offer BYOK functionality — your data goes through their stack. ChatPDF’s documentation describes their backend API as the standard server-mediated model. None of the major hosted PDF chat tools we surveyed offer an in-browser BYOK option.

For most users on most documents, this trust layer is acceptable. For documents that contain personal data, financial data, medical data, legal material under NDA, or pre-publication content, it’s two trust layers too many.

The BYOK pattern

BYOK — bring your own key — is the architectural alternative. Instead of using the tool’s hosted AI account, the user supplies their own API key. The tool sends requests as the user, not on the user’s behalf, and the tool’s server is no longer in the request path between you and the AI provider.

There are two flavors of BYOK in practice:

Server-side BYOK (most B2B SaaS): You give the tool your API key, the tool stores it in their database, the tool’s server uses your key to make requests. You’re paying for the AI usage directly to OpenAI rather than to the tool — but the tool’s server is still in the path. This is better than the fully-hosted pattern but only marginally; your file still passes through their server, and your API key now lives in their database too (which is itself a sensitive credential).

In-browser BYOK (what we use): You paste your API key into the tool’s web page. The key is stored in your browser’s localStorage on your device — it never leaves your machine. When you make a chat request, the JavaScript running in your browser tab makes the request directly to OpenAI/Anthropic/Google. The tool’s server is never involved in the AI call. The PDF text extraction and chunking also happen in your browser. No file upload, no key upload, no request proxy.

This is what makes the in-browser BYOK pattern structurally different. The data flow looks like:

Your browser  →  OpenAI/Anthropic/Google API  →  your browser
       ↑                                                    ↓
   PDF file                                          answer displayed
   (never uploaded)                                  (no server log)

Compare to the standard hosted flow:

Your browser  →  Tool's server  →  Tool's vector DB  →  Tool's server  →  AI provider
                                                                                  ↓
       ←        ←        ←         ←        ←         ←        ←        ←        ←
                              (your file in transit at every arrow)

There’s only one third party in the BYOK flow — the AI provider you chose. There are at least two third parties in the hosted flow.

How imisspdf chat-pdf actually works

For full transparency, here’s exactly what happens when you use imisspdf chat-pdf:

1. First visit. You open the tool. Our static page loads with the chat UI and the PDF processing engine. Nothing has touched the AI provider yet.

2. API key setup. On first use, you enter your OpenAI, Anthropic, Claude, or Gemini API key. The key is saved in your browser’s localStorage with the origin imisspdf.com. Our server never receives the key. You can verify this by opening devtools → Network and confirming there’s no outbound POST containing your key.

3. Drop a PDF. The file goes into browser memory. We extract the text using pdf.js running in your tab — no upload. If the PDF is image-based and needs OCR, that happens locally too via Tesseract.js. (See OCR PDF online free: Tesseract.js explained for how that works.)

4. Embeddings and chunking. For long documents, we chunk the text into overlapping segments and request embeddings from your chosen provider. Each embedding request goes from your browser directly to the provider’s API using your key. The embeddings come back to your browser and are stored in memory for the duration of your session.

5. Ask a question. You type the question. Our code does a vector similarity search in your browser to find the most relevant chunks, builds a prompt combining your question with those chunks, and sends the prompt to the AI provider directly from your browser using your key. The answer streams back to your browser.

6. Display. The answer appears in the chat UI. Nothing about the question, the document content, or the answer is logged on our infrastructure. We can’t see your conversation because we’re not in it.

7. Optional: clear. Closing the tab clears the in-memory state (chunks, embeddings, conversation). Your API key stays in localStorage unless you click “forget API key” or clear site data.

The only network traffic between your browser and imisspdf’s server is: (a) the initial page load, (b) any subsequent navigation requests, and (c) anonymous analytics (Plausible). No file content, no questions, no answers, no API keys.

Which model to use, and why

The “chat with my PDF” problem is generic enough that all the major frontier LLMs handle it well. The differences are real but smaller than the marketing implies. Pricing as of May 2026 (sources cited below):

Model	Input price	Output price	Context	Best for
Claude Opus 4.6	$5/M tokens	$25/M tokens	1M (200K default)	Highest-stakes accuracy on dense/legal/academic
Claude Sonnet 4.6	$3/M tokens	$15/M tokens	1M (200K default)	Best general default — long PDFs, fair price
Claude Haiku 4.5	$1/M tokens	$5/M tokens	200K	Cheap, fast, good for casual Q&A
GPT-5.5	$5/M tokens	$30/M tokens	256K	Strong on technical/coding-adjacent PDFs
GPT-4o	$2.50/M tokens	$10/M tokens	128K	Solid mid-tier, mature ecosystem
GPT-4.1 mini	$0.40/M tokens	$1.60/M tokens	128K	Very cheap, surprisingly capable
Gemini 3.1 Pro	$2/M tokens	$12/M tokens	2M	Largest context, best for huge documents
Gemini 3 Flash	$0.50/M tokens	$3/M tokens	1M	Cheap + huge context combination
Ollama (local)	Free	Free	Model-dependent	Maximum privacy, no API needed

A few practical notes from running these against PDFs:

Claude Sonnet 4.6 is the best general default in 2026 if you don’t have a specific reason to use something else. The 1M-token context window (since Anthropic made it generally available in March 2026) means you can fit a 700-page document in a single prompt without chunking, which improves answer quality on questions that span the whole document. Sonnet is strong on reasoning over long context — the place most other models start to drift past 100K tokens.

Gemini 3.1 Pro has the largest context (2M tokens) and is genuinely good for very long documents (multi-volume reports, entire books). It also tends to be slightly more verbose than Claude in answers.

GPT-4o is a fine general default if you already have OpenAI credits. Its smaller context window (128K) means longer documents do need to be chunked, which our tool handles transparently.

GPT-4.1 mini and Gemini 3 Flash are the bargain options. For straightforward Q&A on a 20-page document, they’re indistinguishable from the flagship models at 5-10% of the cost. For nuanced questions where you need the model to weigh subtle context, the bigger models pull ahead.

Claude Opus 4.6 and GPT-5.5 are overkill for most PDF chat — they’re built for tasks that require careful multi-step reasoning. If you’re chatting with a dense legal contract or a research paper where you can’t afford a wrong inference, the cost premium can be worth it.

Ollama with Llama 3.3 70B or Qwen 2.5 72B is the privacy maximalist option. The model runs entirely on your machine — no API, no internet, no third party at all. Llama 3.3 produces output that’s roughly comparable to GPT-4o on most factual extraction tasks, slightly weaker on complex multi-step reasoning. The catch is speed: even on a powerful Mac, generation is 5-30 tokens per second versus 50-200 from a hosted API. And running a 70B model requires roughly 40 GB of RAM, which limits it to capable laptops and workstations.

Cost: a realistic example

Let’s price a single workflow: chat with a 50-page contract, ask 20 questions back and forth.

Rough token math: a 50-page PDF is ~25,000 tokens of text. Each question is ~50 tokens of user input. Each answer is ~200 tokens of model output. With RAG (retrieve top 5 chunks of ~500 tokens each), each request prompt is ~3,000 tokens input.

For 20 questions:

Total input tokens (across 20 prompts): 20 × 3,000 = 60,000
Total output tokens: 20 × 200 = 4,000
Initial embeddings on the full document: 25,000 tokens × OpenAI embedding price ($0.13/M) = $0.003

Cost by model:

Model	Cost for the workflow
Claude Sonnet 4.6	$0.06 ($0.18 input + $0.06 output)
GPT-4o	$0.19 ($0.15 input + $0.04 output)
GPT-4.1 mini	$0.03 ($0.024 input + $0.006 output)
Gemini 3 Flash	$0.04 ($0.03 input + $0.012 output)
Claude Opus 4.6	$0.40 ($0.30 input + $0.10 output)
Ollama (local)	$0

A 50-page contract conversation costs between $0.03 and $0.40 depending on the model. For most users, even the expensive end is cheap enough that the cost isn’t the deciding factor — the privacy posture and the answer quality matter more.

If you’re processing dozens of documents per day, the cheaper models add up to real money. If you process a few per week, run whichever you trust most.

When fully-local (Ollama) is the right pick

There’s a meaningful third option beyond hosted APIs: run the whole stack locally with Ollama. Ollama is an open-source runtime for local LLMs that exposes a simple HTTP API on localhost. Pair it with a model like Llama 3.3 70B, Qwen 2.5 72B, or Mistral Large, and you have a chat-with-PDF stack where nothing leaves your device, not even to OpenAI.

Use Ollama when:

The document is so sensitive that even sending it to a major AI provider’s API is unacceptable (defense, classified-adjacent material, attorney-client privileged content, certain HIPAA workflows)
You’re in a country or jurisdiction with strict data residency rules that the major AI providers can’t meet
You’re rate-limited, on a flaky connection, or working offline
You’re philosophically committed to running open-source models locally regardless of cost
You want to experiment with models and prompts without worrying about API bills

Skip Ollama when:

You don’t have a machine with 16+ GB of RAM (32+ GB for larger models)
You need fast responses (Ollama on consumer hardware is 5-10x slower than hosted APIs)
You need the absolute best model accuracy on complex documents (Claude Opus and GPT-5.5 are still meaningfully ahead of any open-source model on dense reasoning)
You’re a casual user who chats with a few PDFs a month (the setup cost outweighs the privacy benefit at low volume)

Our chat-pdf tool supports Ollama as a provider: you point it at your local Ollama endpoint (usually http://localhost:11434) and pick from your installed models. The same in-browser RAG flow runs — embeddings happen via Ollama’s embedding endpoint, chat completion via Ollama’s chat endpoint. No internet required after model download.

Comparison with the other “chat with PDF” tools

Tool	Architecture	File location	API key location	Per-month cost (typical use)
ChatPDF	Server-hosted, OpenAI proxy	Their server	N/A (they use theirs)	$7-20 (Pro tier)
Humata	Server-hosted, OpenAI proxy	Their server, AES-256 at rest, 30-day retention	N/A (they use theirs)	$15-30
Adobe AI Assistant	Server-hosted, internal AI	Adobe Document Cloud	N/A (Adobe internal)	$5 add-on to Acrobat
Microsoft Copilot	Server-hosted, Microsoft AI	Microsoft 365 storage	N/A (M365 license)	$30 (Copilot Pro)
ChatDOC	Server-hosted, OpenAI proxy	Their server	N/A (they use theirs)	$10-20
imisspdf chat-pdf	In-browser BYOK	Your device only	Your localStorage only	Whatever you pay OpenAI/etc directly (~$1-5/month for casual use)

The competitive landscape is dominated by hosted SaaS tools that monetize a markup on AI provider costs. The BYOK in-browser pattern is rarer because it’s harder to monetize — we make zero money on your API calls because we’re not in the call path.

Practical tips for better answers

A few habits that improve PDF-chat quality regardless of model:

Be specific. “What’s the termination clause?” beats “tell me about this contract”. The retrieval step looks for chunks that match your question semantically; specific questions find the right chunks.
Cite page numbers when asking follow-ups. “On page 14, paragraph 3 says X. Does that conflict with the obligation on page 22?” The model has the full context but pointing it at specific sections improves the precision of the answer.
Ask for citations. “When you make a claim about the document, quote the exact passage you’re drawing from.” This forces the model to ground its answers in the text and makes hallucinations easy to spot.
Use a frontier model for the first pass, a cheaper model for follow-ups. Once you have a frame, follow-up questions usually don’t need the most expensive model.
For dense legal/medical text, prefer Claude. Anthropic’s models have a consistent edge on careful reading of dense prose. For code-adjacent or technical PDFs, GPT-5.5 is often slightly better.
For multilingual documents, prefer Gemini. Google’s training data has been multilingual since the earliest models and Gemini handles non-English content with less degradation than GPT or Claude.
Cache the conversation context. Most providers offer prompt caching that discounts repeat requests on the same context by 50-90%. For multi-turn chat over the same document, this is a meaningful cost reduction.

Try BYOK in-browser chat

If the privacy posture of standard hosted PDF chat tools doesn’t fit your document, try imisspdf chat-pdf →. Bring your own OpenAI, Claude, Gemini, or Ollama setup. The PDF stays on your device. The API key stays on your device. We see none of it.

For documents where you don’t have a strong privacy requirement and you’d rather not deal with API keys, ChatPDF and Humata are legitimate tools with honest privacy policies — they just don’t fit the threat model we built imisspdf for. The frame that works best: decide per document, not per tool.

Frequently asked questions

The FAQ block at the top of this article covers the most common questions about BYOK and AI PDF chat. For related coverage, see Is iLovePDF safe? A 2026 privacy review and imisspdf vs iLovePDF: Privacy-First Alternative.

Sources

Try it now — free, in your browser

Use Chat with PDF: Ask questions and get grounded answers. No signup, nothing uploaded.

Frequently asked questions

BYOK stands for 'bring your own key' — instead of using the tool's hosted AI account, you paste your own OpenAI, Claude, or Gemini API key into the tool. The chat request then goes from your browser directly to OpenAI/Anthropic/Google. The PDF tool's own server is never in the request path. This matters because most 'chat with PDF' tools have two privacy exposures: your file sits on their server, and the AI-generated answer also passes through their server. BYOK removes the second exposure entirely and, in an in-browser tool, the first one too.

Yes — for a specific and important reason. With a typical hosted PDF chat tool, your data goes to two third parties: the tool company and the AI provider. With BYOK, it only goes to one: the AI provider you chose. Even if you trust both, fewer parties touching your file is structurally safer. Beyond that, OpenAI, Anthropic, and Google all have published data-use policies for API usage that are stricter than the consumer-product equivalents — API requests are not used to train models by default, and enterprise tiers add zero data retention. The hosted tool company is the additional layer of risk you remove by going BYOK.

It removes the tool's server entirely from the request flow. In a server-based BYOK tool, you give the tool your API key, the tool stores it, and the tool makes requests on your behalf — your file still passes through their server, and your API key is in their database. In imisspdf's in-browser BYOK, your API key is stored in your browser's localStorage on your device. When you ask a question, the JavaScript running in your tab makes the request directly to OpenAI/Anthropic/Google. The imisspdf server is never in the request path. We don't have your file, your question, or your API key on any of our infrastructure.

Depends on the trade-off you want. For broad capability and the largest context window (1 million tokens), Claude Sonnet 4.6 and Gemini 3.1 Pro both handle very long PDFs in a single pass. For cost-sensitive use on shorter documents, GPT-4o mini and Gemini 3 Flash are cheap and fast. For fully local processing with no API costs at all, Ollama with a model like Llama 3.3 or Qwen 2.5 runs on your laptop — slower and slightly less accurate than the frontier hosted models, but no data leaves your device. For dense academic or legal documents where accuracy matters most, Claude Opus 4.6 and GPT-5.5 produce the most reliable answers but cost the most per query.

It's stored in your browser's localStorage, scoped to imisspdf.com. localStorage is a per-origin storage area in your browser — only pages served from imisspdf.com can read it, and it stays on your device. Our server never receives it. When you make a chat request, the JavaScript in your browser reads the key from localStorage, attaches it to the API request headers, and sends the request directly to OpenAI/Anthropic/Google. You can clear the key any time by clicking 'forget API key' in the tool, by clearing your browser's site data for imisspdf.com, or by switching to an incognito/private window (localStorage doesn't persist in private mode).

imisspdf Team

We build imisspdf — every PDF tool in one place, free and private. Practical guides from the people who make the tools.

How-to

How to Convert TIFF (Multi-Page Scan) to PDF Locally

Convert single or multi-page TIFF scans into PDF. Preserves resolution, handles G4/LZW compression. Browser-only — your scans never upload.

How-to

How to Convert RTF to PDF Properly (Preserve Bold, Italic, Tables)

Turn Rich Text Format files into PDF without losing styling. Browser-only, no upload, handles RTF from WordPad, TextEdit, or legacy Word.

How-to

How to Convert ODT to PDF Without LibreOffice (Browser-Only)

Convert OpenDocument .odt files to PDF in your browser. No LibreOffice install, no upload, formatting preserved.

Tools

Solutions

Company

Product

Privacy-First AI PDF Chat: BYOK 2026

The hidden privacy risk in hosted AI PDF chat

The BYOK pattern

How imisspdf chat-pdf actually works

Which model to use, and why

Cost: a realistic example

When fully-local (Ollama) is the right pick

Comparison with the other “chat with PDF” tools

Practical tips for better answers

Try BYOK in-browser chat

Frequently asked questions

Sources

Frequently asked questions

imisspdf Team

Related articles

How to Convert TIFF (Multi-Page Scan) to PDF Locally

How to Convert RTF to PDF Properly (Preserve Bold, Italic, Tables)

How to Convert ODT to PDF Without LibreOffice (Browser-Only)