How to convert PDF to text
Three steps. Everything runs locally.
Pick a PDF
Drop or select one PDF. It is read locally — no upload.
Pick separator
Choose how page breaks are marked in the output. The default is human-readable page markers.
Download .txt
Plain UTF-8 text file that opens in every editor.
What is "PDF to Text"?
Converting a PDF to text means stripping a PDF document down to its plain words — no fonts, no images, no layout. The result is a UTF-8 .txt file that opens in every editor, every operating system, every decade. It is the format of choice when you need to grep through a document, paste it into a chatbot, feed it to a script, or simply archive what the PDF said in the smallest possible file.
The text in a normal PDF is stored as a sequence of positioned characters. This tool reads those characters back, groups them into lines by Y-coordinate, sorts the lines top-to-bottom, sorts items left-to-right within each line, and writes the result as a single UTF-8 string with one page after another.
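The grouping and sorting described above can be sketched in a few lines of JavaScript. This is a minimal illustration rather than the tool's actual code. It assumes items shaped like PDF.js text items (a `str` plus a `transform` array whose last two entries are the x and y position). The function name `itemsToLines` and the 2-unit Y tolerance are hypothetical choices.

```javascript
// Sketch: cluster positioned text items into lines and order them for reading.
// Assumes PDF.js-style items: { str, transform: [a, b, c, d, x, y] }.
// The name itemsToLines and the default tolerance are illustrative assumptions.
function itemsToLines(items, yTolerance = 2) {
  // PDF coordinates start at the bottom-left, so a larger y is higher on the page.
  const sorted = [...items].sort((a, b) => b.transform[5] - a.transform[5]);

  const lines = [];
  for (const item of sorted) {
    const y = item.transform[5];
    const current = lines[lines.length - 1];
    if (current && Math.abs(current.y - y) <= yTolerance) {
      current.items.push(item); // same line: Y is within tolerance of the line's first item
    } else {
      lines.push({ y, items: [item] }); // start a new line
    }
  }

  // Within each line, order items left-to-right and join their strings.
  return lines.map((line) =>
    line.items
      .sort((a, b) => a.transform[4] - b.transform[4])
      .map((item) => item.str)
      .join(" ")
  );
}
```

Joining with a single space is a simplification. Real PDFs often carry their own whitespace items, so a production version would probably look at the horizontal gap between items before deciding whether to insert a space.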
How PDF to Text works in your browser
When you drop a PDF, your browser reads it into memory. We hand the bytes to PDF.js, Mozilla's open-source PDF engine. For each page we call getTextContent(), which returns every text item with its position. We cluster the items into lines, sort them in reading order, and concatenate them into a single string. There is no OCR step: text already in the PDF as text is extracted directly. If the PDF is a scanned image, the text layer is empty and nothing comes out (use OCR first in that case).
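The per-page loop described above might look roughly like the sketch below. It uses the PDF.js calls `getDocument`, `getPage`, and `getTextContent` as we understand them. The wrapper `extractPages`, its `pageToText` callback, and the `joinItems` helper are hypothetical names introduced for illustration, and the library object is passed in as a parameter so the sketch makes no assumption about how PDF.js is loaded.

```javascript
// Sketch: walk every page of a PDF with PDF.js and collect its text.
// pdfjsLib is passed in so the function does not assume how the library is loaded.
// extractPages and pageToText are hypothetical names, not part of PDF.js.
async function extractPages(pdfjsLib, bytes, pageToText) {
  // getDocument returns a loading task; its promise resolves to the document.
  const pdf = await pdfjsLib.getDocument({ data: bytes }).promise;

  const pages = [];
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber); // pages are 1-indexed
    const content = await page.getTextContent(); // resolves to { items: [...] }
    pages.push(pageToText(content.items));
  }
  return pages;
}

// Simplest possible page formatter: join the raw strings in the order they arrive.
function joinItems(items) {
  return items.map((item) => item.str).join(" ");
}
```

In a browser, the bytes could come from `new Uint8Array(await file.arrayBuffer())` for a dropped `File`. A page that comes back as an empty string is the scanned-image case: there were no text items to join.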
The result is written to a Blob and offered as a download. Nothing is uploaded. The entire pipeline — parsing, extraction, packaging — runs inside your browser tab. You can run it offline and it still works.
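As a rough sketch of the packaging step, the extracted string can be wrapped in a Blob and handed to the browser as a download through a temporary object URL. The helper names `makeTextBlob` and `downloadBlob` are hypothetical, and the second function only runs in a browser because it needs `document`.

```javascript
// Sketch: package extracted text as a UTF-8 Blob and trigger a download.
// makeTextBlob and downloadBlob are hypothetical helper names.
function makeTextBlob(text) {
  // Strings passed to the Blob constructor are encoded as UTF-8.
  return new Blob([text], { type: "text/plain;charset=utf-8" });
}

// Browser-only: needs document and URL.createObjectURL.
function downloadBlob(blob, filename) {
  const url = URL.createObjectURL(blob);
  const link = document.createElement("a");
  link.href = url;
  link.download = filename; // suggested file name for the save dialog
  document.body.appendChild(link);
  link.click();
  link.remove();
  // Defer cleanup so the browser has a chance to start the download first.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}
```

Calling `downloadBlob(makeTextBlob(text), "document.txt")` keeps everything in the tab: the object URL points at memory in the page, not at a server.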
Common use cases
- Searchable archive of a folder of PDFs. Extract text from each PDF, store the .txt next to it, and now grep finds anything in seconds.
- Pasting a PDF into ChatGPT/Claude/Gemini. Most LLM web UIs accept plain text far more cleanly than PDFs — paste the .txt and ask away.
- Diffing two versions of a document. Extract both PDFs, run diff, and see exactly what changed.
- Pre-processing for NLP / scripts. Sentiment analysis, named-entity extraction, summarisation: they all take text, not PDF.
- Accessibility. Convert to text, paste into a screen-reader-friendly editor.
Privacy & security
Contracts, payslips, medical records, internal reports: the kinds of documents people most want to extract text from are exactly the ones they least want sitting on a stranger's server. Most online PDF-to-text tools upload the file, extract the text on their server, and send back a .txt. imisspdf does the same job with PDF.js running inside your tab. There is no upload, no account, no daily limit. See our iLovePDF privacy review for what the standard upload model actually looks like.
Frequently asked questions
Why does my scanned PDF produce no text?
Scanned PDFs are images of pages, not text. There is no text layer to extract; the words you see are just pixels. Run the PDF through our OCR tool first; it adds a text layer over the image, after which pdf-to-text can extract the words. If the PDF was created by photographing or scanning paper, OCR is always the missing step.
Is formatting preserved?
No. The output is plain UTF-8 text: no bold, italic, font sizes, or colours. Lines from the source PDF are kept, but multi-column layouts are flattened into reading order one column at a time. If you need formatting preserved, use PDF to Word instead, which keeps headings and inline styling.
How are tables and multi-column layouts handled?
Best-effort. Tables become roughly tab-separated lines based on the original column positions in the PDF, usually readable but not perfectly aligned. Multi-column articles are extracted one column at a time, top-to-bottom. For accurate table extraction, use PDF to Excel.
Is my PDF uploaded to a server?
No. PDF.js parses the file inside your browser tab, extracts the text using getTextContent(), and writes the result to a Blob that downloads to your computer. Nothing crosses the network. You can verify this by running the tool while offline: it still works.
Can I convert a password-protected PDF?
Not directly. Encrypted PDFs cannot be parsed without the password. Run the file through our Unlock PDF tool first (provide the password), then bring the unlocked PDF here. We refuse encrypted PDFs explicitly with a friendly error rather than silently returning an empty file.
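Two of the failure cases above, encrypted files and scanned files, can be told apart with very little code. The sketch below is an illustration under assumptions, not the tool's implementation. It assumes PDF.js rejects the loading promise with an error whose `name` is `"PasswordException"` when a password is required. The function names and message wording are hypothetical.

```javascript
// Sketch: turn a PDF.js loading failure into a user-facing message.
// Assumes PDF.js rejects with an error named "PasswordException" for encrypted files;
// describeLoadError and the message wording are hypothetical.
function describeLoadError(error) {
  if (error && error.name === "PasswordException") {
    return "This PDF is password-protected. Unlock it first, then try again.";
  }
  return "This file could not be read as a PDF.";
}

// Sketch: if every page came back empty, the document most likely has no text layer.
function looksScanned(pageTexts) {
  return pageTexts.length > 0 && pageTexts.every((text) => text.trim() === "");
}
```

Checking explicitly for both cases is what allows a clear error or an OCR hint to be shown instead of an empty .txt file.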
Tips for best results
- If the file came from a scanner or camera, run OCR first. Scanned PDFs have no text layer to extract.
- Pick "Page markers" for human reading. The default separator makes it easy to find where one page ends and the next begins.
- Use form feed for line printers and legacy tools. If you are piping the .txt into older software, the \f character (form feed, U+000C) is the traditional page break.
- Multi-column PDFs need a clean source. If two columns mix into each other, the underlying PDF likely encodes text out of order — try opening it in Acrobat and re-saving.
- Unlock encrypted PDFs first. We refuse to silently return empty text — Unlock PDF, then come back.
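To make the separator choice concrete, here is a minimal sketch of joining per-page text in either style. The mode names (`"markers"`, `"formfeed"`) and the `--- Page N ---` marker format are assumptions made up for this example; the tool's real marker text may differ.

```javascript
// Sketch: join per-page text with a page-break separator.
// The mode names and the "--- Page N ---" marker format are assumptions for illustration.
function joinPages(pages, mode = "markers") {
  if (mode === "formfeed") {
    return pages.join("\f"); // form feed (U+000C) between pages
  }
  if (mode === "markers") {
    return pages
      .map((text, index) => `--- Page ${index + 1} ---\n${text}`)
      .join("\n\n");
  }
  return pages.join("\n\n"); // fallback: a blank line between pages
}
```

A form-feed file has the side benefit of being trivial to split back into pages in a script with `text.split("\f")`, whereas markers are easier on the eye when reading.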
Related PDF tools
- TXT to PDF — the inverse: turn plain text back into a formatted PDF.
- PDF to Word — preserves headings and inline styling, not just the words.
- OCR PDF — make a scanned PDF text-extractable first.
- Summarize PDF — let an LLM read the PDF and give you the bullet points.