PDF to Text

Extract plain text from a PDF to a .txt file. 100% in your browser — nothing uploaded.

How to convert PDF to text

Three steps. Everything runs locally.

Pick a PDF

Drop or select one PDF. It is read locally — no upload.

Pick separator

Choose how page breaks are marked in the output. The default is human-readable page markers.

Download .txt

Plain UTF-8 text file that opens in every editor.

Keep going

Related PDF tools

PNG to PDF

Convert PNG images to PDF with transparency support.

What is "PDF to Text"?

Converting a PDF to text means stripping a PDF document down to its plain words — no fonts, no images, no layout. The result is a UTF-8 .txt file that opens in every editor, every operating system, every decade. It is the format of choice when you need to grep through a document, paste it into a chatbot, feed it to a script, or simply archive what the PDF said in the smallest possible file.

The text in a normal PDF is stored as a sequence of positioned characters. This tool reads those characters back, groups them into lines by Y-coordinate, sorts the lines top-to-bottom, sorts items left-to-right within each line, and writes the result as a single UTF-8 string with one page after another.
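The grouping described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the `PositionedItem` shape, the `groupIntoLines` name, and the 2-unit Y tolerance are assumptions chosen for the example. It also assumes PDF-style coordinates where Y grows upward, so a larger Y sits higher on the page.

```typescript
// Hypothetical shape: one run of text plus the position where it starts.
interface PositionedItem {
  str: string;
  x: number; // horizontal position, grows to the right
  y: number; // vertical position, grows upward (PDF user space)
}

// Cluster items into lines by Y, order lines top-to-bottom and items
// left-to-right, then join everything into one string.
// yTolerance is an assumed value; real documents may need tuning.
function groupIntoLines(items: PositionedItem[], yTolerance = 2): string {
  // Sort by descending Y so the top of the page comes first.
  const sorted = [...items].sort((a, b) => b.y - a.y);

  const lines: { y: number; items: PositionedItem[] }[] = [];
  for (const item of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current.y - item.y) <= yTolerance) {
      current.items.push(item); // same line: Y is close enough
    } else {
      lines.push({ y: item.y, items: [item] }); // start a new line
    }
  }

  return lines
    .map((line) =>
      line.items
        .sort((a, b) => a.x - b.x) // left-to-right within the line
        .map((item) => item.str)
        .join(" ")
    )
    .join("\n");
}

const sample: PositionedItem[] = [
  { str: "world", x: 60, y: 700 },
  { str: "Second", x: 10, y: 680 },
  { str: "Hello", x: 10, y: 700.5 },
  { str: "line", x: 60, y: 680 },
];
console.log(groupIntoLines(sample));
```

A fixed tolerance is the simplest choice; a more careful version might scale it with the font height of each item.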

How PDF to Text works in your browser

When you drop a PDF, your browser reads it into memory. We hand the bytes to PDF.js, Mozilla's open-source PDF engine. For each page we call getTextContent(), which returns every text item with its position. We cluster the items into lines, sort them in reading order, and concatenate them into a single string. There is no OCR step — text already in the PDF as text is extracted directly. If the PDF is a scanned image, the text layer is empty and nothing comes out (use OCR first in that case).
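A rough sketch of the per-page loop described above. To stay self-contained it does not import PDF.js; instead it declares the small part of the document and page objects it relies on (`numPages`, `getPage`, `getTextContent`), which mirror the PDF.js API. The `extractPages` and `looksScanned` names are hypothetical, and items are simply joined with spaces here rather than clustered into lines. With the real library, the document object would come from `pdfjsLib.getDocument({ data }).promise`.

```typescript
// Minimal structural types mirroring the parts of PDF.js used below.
interface TextItemLike {
  str?: string; // marked-content items carry no `str`
}
interface PageLike {
  getTextContent(): Promise<{ items: TextItemLike[] }>;
}
interface DocLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>; // pages are 1-based
}

// Hypothetical helper: returns one string per page.
async function extractPages(doc: DocLike): Promise<string[]> {
  const pages: string[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const content = await page.getTextContent();
    const text = content.items
      .map((item) => item.str ?? "")
      .filter((s) => s.length > 0)
      .join(" ");
    pages.push(text);
  }
  return pages;
}

// A scanned PDF has no text layer, so every page comes back empty.
function looksScanned(pages: string[]): boolean {
  return pages.every((p) => p.trim() === "");
}

// Stand-in document for demonstration; a real one comes from PDF.js.
const fakeDoc: DocLike = {
  numPages: 2,
  getPage: async (n) => ({
    getTextContent: async () => ({
      items: [{ str: `Page ${n}` }, {}, { str: "text" }],
    }),
  }),
};

extractPages(fakeDoc).then((pages) => console.log(pages, looksScanned(pages)));
```

Checking for an all-empty result is one simple way to tell the user that OCR is needed instead of handing back a blank file.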

The result is written to a Blob and offered as a download. Nothing is uploaded. The entire pipeline — parsing, extraction, packaging — runs inside your browser tab. You can run it offline and it still works.
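Packaging the result might look something like this sketch. `Blob` and `URL.createObjectURL` are standard web APIs; the `toTextBlob` helper name and the `output.txt` filename are assumptions for illustration. The download step is shown only in comments because it needs a page `document` to run.

```typescript
// Wrap extracted text in a plain-text Blob.
// Strings passed to the Blob constructor are encoded as UTF-8.
function toTextBlob(text: string): Blob {
  return new Blob([text], { type: "text/plain;charset=utf-8" });
}

const blob = toTextBlob("héllo\n");
console.log(blob.size, blob.type);

// In a browser tab the Blob could then be offered as a download, e.g.:
//   const url = URL.createObjectURL(blob);
//   const a = document.createElement("a");
//   a.href = url;
//   a.download = "output.txt"; // assumed filename
//   a.click();
//   URL.revokeObjectURL(url);
```

Because the object URL points at memory inside the tab, no network request is involved in the download.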

Common use cases

Searchable archive of a folder of PDFs. Extract text from each PDF, store the .txt next to it, and now grep finds anything in seconds.
Pasting a PDF into ChatGPT/Claude/Gemini. Most LLM web UIs accept plain text far more cleanly than PDFs — paste the .txt and ask away.
Diffing two versions of a document. Extract both PDFs, run diff, and see exactly what changed.
Pre-processing for NLP / scripts. Sentiment analysis, named-entity extraction, summarisation — they all take text, not PDF.
Accessibility. Convert to text, paste into a screen-reader-friendly editor.

Privacy & security

Contracts, payslips, medical records, internal reports — the kind of documents people most want to extract text from are exactly the ones they least want sitting on a stranger's server. Most online PDF-to-text tools upload the file, extract text on their server, and deliver a .txt. imisspdf does the same job with PDF.js running inside your tab. There is no upload, no account, no daily limit. See our iLovePDF privacy review for what the standard upload model actually looks like.

Frequently asked questions

Why does my scanned PDF produce no text?

Scanned PDFs are images of pages, not text. There is no text layer to extract; the words you see are just pixels. Run the PDF through our OCR tool first. It adds a text layer over the image, after which PDF to Text can extract the words. If the PDF was created by photographing or scanning paper, OCR is always the missing step.

Does the output keep formatting?

No. The output is plain UTF-8 text: no bold, italic, font sizes, or colours. Lines from the source PDF are kept, but multi-column layouts are flattened into reading order one column at a time. If you need formatting preserved, use PDF to Word instead, which keeps headings and inline styling.

How are tables and multi-column layouts handled?

Best-effort. Tables become roughly tab-separated lines based on the original column positions in the PDF, usually readable but not perfectly aligned. Multi-column articles are extracted one column at a time, top-to-bottom. For accurate table extraction, use PDF to Excel.

Is my PDF uploaded to a server?

No. PDF.js parses the file inside your browser tab, extracts the text using getTextContent(), and writes the result to a Blob that downloads to your computer. Nothing crosses the network. You can verify this by running the tool while offline; it still works.

Can I extract text from a password-protected PDF?

Not directly. Encrypted PDFs cannot be parsed without the password. Run the file through our Unlock PDF tool first (provide the password), then bring the unlocked PDF here. We refuse encrypted PDFs explicitly with a friendly error rather than silently returning an empty file.

Tips for best results

If the file came from a scanner or camera, run OCR first. Scanned PDFs have no text layer to extract.
Pick "Page markers" for human reading. The default separator makes it easy to find where one page ends and the next begins.
Use form-feed for ASCII printers/old tools. If you are piping the .txt to something old, the \f character is the traditional page break.
Multi-column PDFs need a clean source. If two columns mix into each other, the underlying PDF likely encodes text out of order — try opening it in Acrobat and re-saving.
Unlock encrypted PDFs first. Encrypted files are rejected with an error rather than silently returning empty text, so run Unlock PDF, then come back.
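The separator choice can be pictured with a short sketch. The exact marker text the tool writes is not documented here, so the `--- Page N ---` format, the `joinPages` name, and the mode names are assumptions; only the form-feed character `\f` comes from the tips above.

```typescript
// Hypothetical mode names for the two separators mentioned in the tips.
type SeparatorMode = "markers" | "formFeed";

function joinPages(pages: string[], mode: SeparatorMode): string {
  if (mode === "formFeed") {
    // \f (U+000C) is the traditional page-break character.
    return pages.join("\f");
  }
  // Assumed marker format: a labelled line before each page's text.
  return pages
    .map((text, i) => `--- Page ${i + 1} ---\n${text}`)
    .join("\n\n");
}

const pages = ["First page text", "Second page text"];
console.log(joinPages(pages, "markers"));
console.log(JSON.stringify(joinPages(pages, "formFeed")));
```

Labelled markers are easier to scan by eye, while a bare form-feed keeps the output free of text that was never in the PDF.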

Related PDF tools

TXT to PDF — the inverse: turn plain text back into a formatted PDF.
PDF to Word — preserves headings and inline styling, not just the words.
OCR PDF — make a scanned PDF text-extractable first.
Summarize PDF — let an LLM read the PDF and give you the bullet points.
