OCR — optical character recognition — turns a scanned image of a page into selectable, searchable text. The natural question before you rely on it is: how accurate is it, really? The honest answer: on a clean, high-resolution scan of ordinary printed text, modern OCR reaches roughly 95 to 99 percent character accuracy, but that figure swings widely with the quality of what you feed it. A pristine 300 DPI scan in a common font sits at the top of the range; a faded fax of stylized text or handwriting falls far below.
This guide explains what those numbers mean, the factors that drive accuracy up or down, how browser-based engines like Tesseract compare with cloud services, and the steps that get cleaner results. You can try recognition with the in-browser OCR PDF tool, which processes your file locally without uploading it.
What “OCR accuracy” actually measures
“99 percent accurate” sounds definitive, but accuracy can be measured two ways that tell very different stories.
- Character accuracy is the percentage of individual letters, digits, and symbols recognized correctly. This is the number vendors usually quote because it looks impressive.
- Word accuracy is the percentage of whole words that are completely correct. A single wrong character ruins the entire word, so word accuracy is almost always lower than character accuracy on the same page.
The gap matters. A page at 98 percent character accuracy might be only around 90 percent word accuracy, because those scattered errors land in many different words. Concretely: 98 percent character accuracy on a 2,000-character page means about 40 wrong characters — noticeable when you read it, even though the percentage sounds high.
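As a rough check on those figures, the arithmetic can be worked through under two simplifying assumptions that are not stated above: character errors are independent of one another, and the average word is about five characters long. With character accuracy $c$, word length $n$, and page length $N$:

```latex
\begin{align*}
\text{word accuracy} &\approx c^{\,n} = 0.98^{5} \approx 0.904 \\
\text{expected character errors} &= N\,(1 - c) = 2000 \times 0.02 = 40
\end{align*}
```

Real errors tend to cluster in damaged areas of a page rather than spreading evenly, so measured word accuracy can come out somewhat better than this independent-error estimate.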
Which metric matters depends on your goal:
- For full-text search and indexing, character accuracy is what counts, because most words remain findable even with a few errors scattered around.
- For republishing or quoting text verbatim, word accuracy is the realistic measure, and you will almost certainly need to proofread.
Guides that explain what a searchable PDF is often say that OCR makes a document searchable. That holds at typical character-accuracy levels, but “searchable” is a lower bar than “perfectly transcribed.”
The factors that drive OCR accuracy
Accuracy is not a fixed property of an OCR engine; it is mostly a property of the input. Five factors dominate.
1. Scan resolution (DPI)
This is the biggest lever most people control. OCR engines expect characters of a certain pixel height. Below about 200 DPI the letters become too coarse and errors climb steeply. The established sweet spot is 300 DPI for standard text — enough detail without bloating the file. Going past 600 DPI rarely helps and just slows processing. If your accuracy is poor, re-scanning at 300 DPI is usually the highest-impact fix.
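To make the DPI numbers concrete, here is a small sketch that converts a type size and a scan resolution into pixels. It relies only on the standard definition of 72 typographic points per inch; the function names are illustrative, and the "height" is the nominal type size (the em height), so actual lowercase letters are smaller than the figure it returns.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def nominal_glyph_height_px(point_size: float, dpi: int) -> float:
    """Pixel height of the nominal type size (em height) at a given scan DPI."""
    return point_size / POINTS_PER_INCH * dpi


def page_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    # US Letter page (8.5 x 11 inches) with 12 pt body text.
    for dpi in (150, 200, 300, 600):
        w, h = page_pixels(8.5, 11, dpi)
        height = nominal_glyph_height_px(12, dpi)
        print(f"{dpi} DPI: 12 pt type ~{height:.0f} px, page {w} x {h} px "
              f"({w * h / 1e6:.1f} megapixels)")
```

Doubling the DPI doubles the pixel height of each letter but quadruples the pixel count of the page, which is why 600 DPI costs far more in file size and processing time than it returns in accuracy.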
2. Language and the recognition model
OCR engines use language models to disambiguate similar shapes, so you must tell the engine which language to expect. Run an English model over a French document and accuracy suffers, because its expectations about letter combinations and accents are wrong. Common, widely-supported languages perform best; less common scripts trail behind.
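As an illustration of what "telling the engine" looks like, the sketch below builds a command line for the desktop Tesseract program, which takes the language through its `-l` option using codes such as `eng` or `fra`. This is a stand-in, not the in-browser tool's own code, which the article does not show. The helper name and the small language table are hypothetical, and the matching language data must be installed for the command to work.

```python
import shlex

# Illustrative subset of Tesseract language codes, not the full list.
LANG_CODES = {"english": "eng", "french": "fra", "german": "deu", "spanish": "spa"}


def tesseract_command(image_path: str, output_base: str, languages: list[str]) -> list[str]:
    """Build (but do not run) a Tesseract command for the given languages.

    Raises KeyError for a language missing from LANG_CODES.
    """
    codes = [LANG_CODES[name.lower()] for name in languages]
    # Tesseract accepts several languages joined with '+', e.g. "eng+fra".
    return ["tesseract", image_path, output_base, "-l", "+".join(codes)]


if __name__ == "__main__":
    cmd = tesseract_command("scan.png", "scan_text", ["French"])
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps file names with spaces intact if you later pass it to `subprocess.run`.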
3. Font and print quality
Clean, standard typefaces — the kind used in books and reports — are what engines are trained on and recognize best. Accuracy drops with:
- Decorative, script, or condensed fonts whose letterforms differ from the norm.
- Faded, smudged, or photocopied text with low contrast.
- Very small type, which compounds with low DPI.
A crisp original in a plain font is worth more than any algorithmic cleverness.
4. Image quality: contrast, skew, and noise
Even at high DPI, a poor image hurts. Low contrast, skew (a page scanned at an angle), phone-photo shadows, speckle, and bleed-through from the reverse side all degrade recognition. A flat, square, evenly-lit, high-contrast scan is ideal. Our guide to scanning documents with a phone camera covers getting a clean image, and the Scan PDF tool helps produce a tidy result before OCR.
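One common way to deal with low contrast before recognition is binarization: turning a grayscale image into pure black and white using a threshold chosen from the image itself. The sketch below implements Otsu's method on a flat list of 8-bit grayscale values. The article does not say which cleanup steps the Scan PDF tool applies, so treat this as a generic illustration of the technique, with hypothetical function names.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the threshold (0-255) that best separates dark and light pixels.

    Otsu's method picks the threshold maximizing between-class variance.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    sum_dark = 0      # sum of intensities at or below the threshold
    weight_dark = 0   # number of pixels at or below the threshold
    best_t, best_var = 0, -1.0

    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between_var = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels: list[int]) -> list[int]:
    """Map each pixel to 0 (black) or 255 (white) using the Otsu threshold."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]
```

A single global threshold works well on evenly lit scans; pages with shadows usually need a locally adaptive threshold instead, which is one reason even lighting matters so much.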
5. Handwriting
Standard OCR is built for typeset characters and cannot read handwriting reliably. Neat block printing may yield passable results; natural cursive is often effectively unusable. Reliable handwriting recognition requires a different, harder technology, intelligent character recognition (ICR), which only the most advanced cloud models handle reasonably well. If your document is handwritten and the content matters, plan to verify or retype it.
Realistic accuracy expectations by document type
Putting the factors together, here is what to realistically expect at the character level:
| Document type | Typical character accuracy | Notes |
|---|---|---|
| Clean 300 DPI print, common font, supported language | 98–99%+ | The best case; minor proofing for verbatim use |
| Decent office scan or good phone photo | 95–98% | Usually fine for search; proof for republishing |
| Low-resolution or faxed print | 85–95% | Noticeable errors; verify important fields |
| Stylized, ornate, or very small fonts | 80–93% | Expect manual correction |
| Neat hand printing | 70–90% | Highly variable; check everything |
| Natural cursive handwriting | Often unusable | Needs specialized ICR, not standard OCR |
These ranges are deliberately broad because the real determinant is your specific input. The same engine that hits 99 percent on a clean report might manage 80 percent on a faded fax.
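To find out where your own documents fall, you can hand-correct a short sample and compare it with the raw OCR output. The sketch below does that with the edit (Levenshtein) distance, a common basis for OCR error rates. It is one reasonable way to compute the two metrics, not necessarily how any vendor computes theirs, and the function names are illustrative.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def character_accuracy(truth: str, ocr: str) -> float:
    """1 minus the character error rate; truth must be non-empty."""
    return max(0.0, 1 - edit_distance(truth, ocr) / len(truth))


def word_accuracy(truth: str, ocr: str) -> float:
    """1 minus the word error rate; truth must contain at least one word."""
    truth_words = truth.split()
    return max(0.0, 1 - edit_distance(truth_words, ocr.split()) / len(truth_words))


if __name__ == "__main__":
    truth = "The quick brown fox"
    ocr = "The qu1ck brown f0x"
    print(f"character accuracy: {character_accuracy(truth, ocr):.1%}")
    print(f"word accuracy:      {word_accuracy(truth, ocr):.1%}")
```

In this small sample two wrong characters out of nineteen spoil two of the four words, which is the character-versus-word gap in miniature.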
Tesseract versus cloud OCR
There are two broad categories of OCR you will encounter.
Tesseract is the leading open-source OCR engine, mature and widely used, and it powers browser-based tools that run entirely on your device. On clean, high-resolution scans of standard fonts in supported languages, it is genuinely strong, frequently landing in the same 95-plus percent range as commercial tools.
Cloud OCR services from large providers run on servers trained on enormous proprietary datasets. Their advantage shows up on hard inputs: low-quality scans, complex multi-column layouts, tables, unusual fonts, and — most of all — handwriting, where their machine-learning models meaningfully outperform general-purpose engines.
The trade-off is privacy. Cloud OCR uploads your document to a third-party server; browser OCR keeps the file on your device. For a faded manuscript you do not mind sharing, the cloud’s edge may be worth it. For a scanned contract, tax form, or medical record, the privacy of in-browser OCR usually outweighs the cloud’s marginal gain, because such documents are mostly ordinary printed text, where the gap is small. Our deeper explainer, OCR PDF online free with Tesseract explained, covers how the in-browser engine works and its limits.
How OCR fits into a searchable PDF
How much accuracy you need also depends on what you are producing. OCR has two common outputs:
- Plain extracted text you copy out for use elsewhere. The PDF to Text tool gives you the raw recognized text.
- A searchable PDF, where an invisible text layer is added underneath the original scanned image. The page still looks exactly like the scan, but you can select and search the text behind it.
The searchable-PDF approach is forgiving of moderate errors, because you keep the perfect original image for reading and the imperfect text layer only powers search and copy. Even at 95 percent character accuracy, most search queries succeed because most words are intact. For the concept in full, see what is OCR and how it works.
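A back-of-the-envelope model shows why search tolerates errors. Assume, purely for illustration, that each character is misread independently. The chance that one occurrence of a search term survives intact is then the character accuracy raised to the term's length, and a term that appears several times only needs one clean occurrence to be found. The function names are hypothetical.

```python
def word_intact_probability(char_accuracy: float, word_length: int) -> float:
    """Chance that every character of one occurrence is recognized correctly."""
    return char_accuracy ** word_length


def findable_probability(char_accuracy: float, word_length: int, occurrences: int) -> float:
    """Chance that at least one of several occurrences survives intact."""
    p_intact = word_intact_probability(char_accuracy, word_length)
    return 1 - (1 - p_intact) ** occurrences


if __name__ == "__main__":
    for occurrences in (1, 3, 5):
        p = findable_probability(0.95, 8, occurrences)
        print(f"8-letter term, {occurrences} occurrence(s): {p:.0%} chance of a hit")
```

Under this model a long term that appears only once is the weak spot, so if a search for a rare name or code comes up empty, try a shorter fragment of it before concluding it is absent.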
Practical tips to maximize accuracy
The biggest wins come from the input, not the engine. In rough order of impact:
- Scan at ~300 DPI. This alone fixes most accuracy complaints. Below 200 DPI, expect trouble.
- Maximize contrast and lighting. Black text on white, evenly lit, no shadows. A dim phone photo is a common culprit.
- Keep the page flat and square. Skew confuses line detection.
- Pick the right language before running OCR, so the model matches your text.
- Start from the cleanest original. A first-generation print beats a third-generation photocopy.
- Use a plain font where you have any choice over the source.
- Always proofread documents you will reuse verbatim, especially numbers, names, and codes where one wrong character changes meaning.
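Proofreading the risky parts can be partly assisted. The sketch below is a simple heuristic, not a feature of any OCR tool mentioned here: it lists tokens that mix letters and digits, which is where look-alike swaps such as O for 0 or l for 1 tend to surface. It will also flag legitimate tokens like "A4" and cannot catch errors inside ordinary words, so it narrows the review rather than replacing it.

```python
def tokens_to_review(text: str) -> list[str]:
    """Return tokens containing both letters and digits, in reading order."""
    flagged = []
    for token in text.split():
        core = token.strip(".,;:()[]\"'")  # drop surrounding punctuation
        has_digit = any(ch.isdigit() for ch in core)
        has_letter = any(ch.isalpha() for ch in core)
        if has_digit and has_letter:
            flagged.append(core)
    return flagged


if __name__ == "__main__":
    sample = "Invoice 2O24, total l00 paid on 12 May"
    print(tokens_to_review(sample))
```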
How to run OCR privately, in your browser
The OCR PDF tool recognizes text on your device, with no upload. The PDF is read into your browser’s memory, the Tesseract engine processes the pages locally, and you get back a searchable PDF or extracted text — nothing travels to a server.
- Open the OCR PDF tool and load your scanned PDF.
- Select the document’s language so the engine uses the right model.
- Run recognition. On a clean scan, expect strong accuracy; on a poor one, expect to proofread.
- Download the searchable PDF, or pull the text out with PDF to Text.
This matters because the documents people OCR are often exactly the ones they should not upload: scanned contracts, statements, IDs, and medical records. In-browser OCR makes them searchable without exposing them to a third party.
Conclusion
So, how accurate is PDF OCR? On clean, high-resolution printed text in a supported language, realistically 95 to 99 percent at the character level — good enough to make a document fully searchable and usable. But accuracy is mostly a function of your input: resolution, contrast, straightness, font, language, and above all whether the text is printed or handwritten. Improve the scan and you improve the result, far more than switching engines would. And because the OCR PDF tool runs entirely in your browser, you can recognize text in sensitive documents at strong accuracy without ever uploading them anywhere.
Want to test it on your own document? Run it through the free, no-upload OCR PDF tool and judge the accuracy for yourself.
Frequently asked questions
How accurate is PDF OCR?
On a clean, high-resolution scan of standard printed text, modern OCR engines typically reach 95 to 99 percent character accuracy, which means roughly one to five errors per hundred characters. The exact figure depends heavily on input quality: a crisp 300 DPI scan of a clearly printed document in a common font and a well-supported language sits at the top of that range, while a faxed, low-resolution, or skewed page falls well below it. Read the figure with care, because even 98 percent character accuracy can leave a noticeable scattering of typos across a long document. For searchability and full-text indexing this is usually good enough, but for anything that will be republished verbatim, a human proofread of the OCR output is still wise.
What is the difference between character accuracy and word accuracy?
Character accuracy measures the percentage of individual letters, digits, and symbols recognized correctly, while word accuracy measures the percentage of whole words that are completely correct. The two diverge because a single misread character ruins an entire word, so word accuracy is almost always lower than character accuracy on the same page. For example, 98 percent character accuracy might correspond to only about 90 percent word accuracy, because those scattered character errors land in many different words. When a vendor quotes a high accuracy number, ask which metric they mean; character accuracy looks more impressive but word accuracy better reflects how usable the text feels when you read it. For full-text search, character accuracy matters most, since most words will still be findable.
Does scan resolution (DPI) affect OCR accuracy?
Yes, dramatically, and it is the single biggest lever most people control. OCR engines are trained on characters of a certain pixel height, and below roughly 200 DPI the letters become too coarse for reliable recognition, causing errors to climb sharply. The widely recommended sweet spot is 300 DPI for standard text, which gives the engine enough detail without bloating the file. Going much above 600 DPI rarely improves accuracy further and just produces larger images that process more slowly. If your accuracy is poor, before blaming the engine, re-scan the original at 300 DPI in good lighting with the page flat and square, since a better input almost always beats a better algorithm. Resolution, contrast, and straightness together account for most accuracy problems.
Can OCR read handwriting?
Generally no, not with the same reliability as printed text. Standard OCR engines are built for typeset characters with consistent shapes and spacing, and handwriting violates all of those assumptions with variable letterforms, connected cursive, and inconsistent baselines. On neat block printing an engine may manage passable results, but on natural cursive accuracy drops steeply and can be effectively unusable. Recognizing handwriting reliably requires a specialized technology called intelligent character recognition, or ICR, which is a different and harder problem that the best cloud providers handle far better than general-purpose engines. If your document is handwritten and the text matters, expect to verify or retype most of it, and treat any OCR output as a rough draft rather than a faithful transcription.
Is browser-based OCR as accurate as cloud OCR?
For clean printed documents the gap is smaller than you might expect, but cloud services generally hold an edge on difficult inputs. Browser-based OCR using the Tesseract engine performs very well on crisp, high-resolution scans of standard fonts and supported languages, often landing in the same 95-plus percent range as cloud tools. Where large cloud providers pull ahead is on hard cases: low-quality scans, complex layouts with columns and tables, unusual fonts, and especially handwriting, where their machine-learning models have been trained on enormous datasets. The trade-off is privacy: cloud OCR uploads your document to a third-party server, while browser OCR processes it entirely on your device. For sensitive files, the privacy of in-browser OCR often outweighs the marginal accuracy advantage of the cloud on everyday printed text.
How can I improve OCR accuracy?
Start with the input, because it matters more than the engine. Scan or photograph the original at around 300 DPI, with even lighting, high contrast between text and background, and the page flat and square to the camera so lines are not skewed. Choose the correct recognition language before running OCR, since telling the engine to expect English when the text is French degrades results. Clean documents in a common, well-printed font outperform faded, ornate, or stylized text, so if you can obtain a better copy of the original, do. After OCR, always spot-check the output, paying special attention to numbers, names, and anything where a single wrong character changes meaning. For documents you will reuse verbatim, budget time for a human proofread of the recognized text.