OCR PDF — Extract Text from Scanned PDFs

Run OCR on scanned PDFs to pull out their text as an editable plain-text file. Processing happens 100% in your browser, so large files may take a minute.

Optical Character Recognition — how it works and when to use it

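OCR (Optical Character Recognition) analyses the pixel patterns in a scanned image and converts them into machine-readable text. This tool runs Tesseract, a widely used open-source OCR engine (originally developed at HP, later sponsored by Google, and now maintained by its open-source community), compiled to WebAssembly. Tesseract reads images rather than PDFs, so each page of your PDF is first turned into an image and then recognised entirely inside your browser; the file is never uploaded to a server.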

What Tesseract handles well

  • Clean, high-contrast scans of printed documents (accuracy typically >95%)
  • Standard A4 and letter-size documents with single-column layout
  • Mixed text and images — text regions are extracted, images are skipped
  • Documents in English, with support for additional Tesseract language packs

OCR PDF vs Make PDF Searchable — which do you need?

This tool extracts text as a plain .txt file — ideal for copying into Word, pasting into a database, or feeding into another tool. If you want to keep the original scanned PDF but make it Ctrl+F-searchable, use Make PDF Searchable instead, which adds an invisible text layer without changing how the document looks.

How to use this tool

  1. Upload your scanned PDF.
  2. Wait while OCR runs page by page (typically 5–15 seconds per page, depending on your device and the scan's resolution).
  3. Copy the extracted text from the preview, or download the full output as a .txt file.
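The page-by-page flow in the steps above can be pictured as a sequential loop that recognises one page, appends its text, and moves on. This is a minimal TypeScript sketch, not this tool's actual code: `ocrToText` and the `RecognizePage` callback are hypothetical names, and the "--- Page N ---" separator is an arbitrary choice. In a browser the callback could be backed by a Tesseract.js worker's `recognize` method, but that wiring is assumed rather than shown.

```typescript
// Hypothetical signature: resolves to the OCR text of one page (1-based).
type RecognizePage = (pageNumber: number) => Promise<string>;

// Runs OCR sequentially, one page at a time, and joins the results into a
// single plain-text document. The "--- Page N ---" separator is illustrative.
async function ocrToText(pageCount: number, recognizePage: RecognizePage): Promise<string> {
  const parts: string[] = [];
  for (let page = 1; page <= pageCount; page++) {
    // Awaiting each page before starting the next keeps only one page in flight.
    const text = await recognizePage(page);
    parts.push(`--- Page ${page} ---\n${text.trim()}`);
  }
  return parts.join("\n\n") + "\n";
}

// Usage with a stand-in recognizer; a real one would turn the page into an
// image and run OCR on it.
ocrToText(2, async (n) => `  text from page ${n}  `).then((txt) => console.log(txt));
```

Because pages are processed one after another, total time grows linearly with page count: at 5 to 15 seconds per page, a 20-page scan would take roughly 100 to 300 seconds.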

Frequently asked questions

Which languages does OCR support?

English is loaded by default. Tesseract supports 100+ languages, including Hindi, Tamil, Bangla, Arabic, and Chinese. Each extra language is a separate language pack that has to be downloaded into your browser, so adding languages increases loading time.
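Tesseract identifies each language pack by a short code and accepts several at once joined with `+` (on the command line, for example, `tesseract page.png out -l eng+hin`). A minimal TypeScript sketch of turning user-facing language names into that string; the name-to-code table and `languageSpec` are illustrative and not part of this tool.

```typescript
// Tesseract names each trained language model with a short code
// (eng.traineddata, hin.traineddata, ...). This table covers only the
// languages mentioned on this page.
const TESSERACT_CODES: Record<string, string> = {
  English: "eng",
  Hindi: "hin",
  Tamil: "tam",
  Bangla: "ben",
  Arabic: "ara",
  "Chinese (Simplified)": "chi_sim",
  "Chinese (Traditional)": "chi_tra",
};

// Builds a combined language spec such as "eng+hin". Every code listed is a
// separate language pack that has to be loaded before recognition starts.
function languageSpec(names: string[]): string {
  return names
    .map((name) => {
      const code = TESSERACT_CODES[name];
      if (code === undefined) {
        throw new Error(`No Tesseract code mapped for "${name}"`);
      }
      return code;
    })
    .join("+");
}

console.log(languageSpec(["English", "Hindi"])); // eng+hin
```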

How accurate is the OCR?

Typically 95%+ for clean printed scans at 150 DPI or higher; Tesseract's own documentation recommends at least 300 DPI for best results. Accuracy drops for skewed scans, low-resolution photos, or heavily stylised fonts. Handwriting recognition is limited.
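To see what a DPI figure means in pixels, multiply the page size in inches by the DPI. PDF page sizes are measured in points (1 point = 1/72 inch), so the same arithmetic gives the scale factor for turning a PDF page into an image at a chosen resolution. A short TypeScript sketch; the function names are illustrative, and the resolution this tool actually renders at is not stated here.

```typescript
// PDF page sizes are measured in points; 1 point = 1/72 inch.
const POINTS_PER_INCH = 72;

// Scale factor needed to rasterize a PDF page at the target DPI.
function renderScale(dpi: number): number {
  return dpi / POINTS_PER_INCH;
}

// Pixel dimensions of a page (width/height in points) rendered at `dpi`.
function pixelSize(widthPt: number, heightPt: number, dpi: number): { width: number; height: number } {
  const scale = renderScale(dpi);
  return { width: Math.round(widthPt * scale), height: Math.round(heightPt * scale) };
}

// US Letter is 612 x 792 pt (8.5 x 11 in).
console.log(pixelSize(612, 792, 150)); // { width: 1275, height: 1650 }
```

At 300 DPI the same Letter page is 2550 × 3300 pixels. Rendering a page above the resolution it was originally scanned at enlarges the pixels but adds no detail, so the scan's own DPI is what limits accuracy.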

My PDF has columns — will it extract them in the right order?

Single-column documents usually come out in the correct reading order. Two-column layouts (newspapers, academic papers) may interleave lines from both columns. If that happens, run OCR on individual pages and reorder the text manually.
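When the OCR output includes a position for each line, that manual reordering can be approximated in code. A minimal TypeScript sketch, assuming each line comes with the coordinates of its top-left corner; the `OcrLine` shape and `twoColumnOrder` are hypothetical, and this is a simple midpoint heuristic, not how Tesseract itself analyses page layout.

```typescript
// Hypothetical shape for one recognized text line and its bounding box in pixels.
interface OcrLine {
  text: string;
  x0: number; // left edge
  y0: number; // top edge
}

// Naive two-column reading order: lines starting left of the page midpoint
// form column 1, the rest form column 2. Each column is sorted top to bottom,
// and column 1 is emitted in full before column 2.
function twoColumnOrder(lines: OcrLine[], pageWidth: number): string {
  const mid = pageWidth / 2;
  const byTop = (a: OcrLine, b: OcrLine) => a.y0 - b.y0;
  // filter() returns new arrays, so sorting does not mutate the input.
  const left = lines.filter((l) => l.x0 < mid).sort(byTop);
  const right = lines.filter((l) => l.x0 >= mid).sort(byTop);
  return [...left, ...right].map((l) => l.text).join("\n");
}

// Lines as they might arrive when columns are interleaved.
const interleaved: OcrLine[] = [
  { text: "Left, line 1", x0: 60, y0: 100 },
  { text: "Right, line 1", x0: 660, y0: 100 },
  { text: "Left, line 2", x0: 60, y0: 140 },
  { text: "Right, line 2", x0: 660, y0: 140 },
];
console.log(twoColumnOrder(interleaved, 1275));
```

The midpoint split breaks down for headlines or figures that span both columns and for layouts with three or more columns; those still need manual cleanup.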

What's the difference between this and Scanned PDF to Text?

Same OCR engine, different audience. This page is for users who know the term OCR. Scanned PDF to Text is the same tool framed for users who just want text from a paper document.

Is the file ever uploaded to a server?

No. Tesseract runs via WebAssembly in your browser. Your PDFs never leave your device.
