Question

Python interpreter

May 22, 2023
1 reply
105 views

Eslam242
New

I wrote Python code to extract text from a scanned PDF. It extracts the text well, but not the layout. I need to extract the text with the same layout as the PDF, including tables, margins, alignment, and indentation.


Otis32
New
May 22, 2023


Use an OCR library or tool designed for scanned PDFs. OCR recognizes text in images or scanned documents, and most engines can also report where each word sits on the page, which is the information you need to rebuild the layout. Tesseract is a popular open-source OCR engine that can be used from Python, for example through the pytesseract wrapper.
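
As a rough sketch of that idea, the snippet below rebuilds a plain-text page from word positions. It assumes the pytesseract and pdf2image packages (plus the Tesseract and Poppler binaries) are installed. The helper names `layout_text` and `ocr_pdf_with_layout`, and the `char_width` and `row_tolerance` values, are illustrative assumptions to tune for your scans, not part of either library.

```python
def layout_text(data, char_width=10.0, row_tolerance=10):
    """Rebuild page text from word boxes, keeping horizontal positions.

    `data` is a dict of parallel lists shaped like the result of
    pytesseract.image_to_data(..., output_type=Output.DICT).
    `char_width` (pixels per output column) and `row_tolerance`
    (pixels) are assumed values, not library defaults.
    """
    # Keep only real words; Tesseract also emits empty entries for
    # blocks, paragraphs, and lines.
    words = sorted(
        (data["top"][i], data["left"][i], text)
        for i, text in enumerate(data["text"])
        if text.strip()
    )

    # Group words into visual rows by their top edge, so table cells
    # that sit side by side end up on the same output line.
    rows = []  # each entry: [reference_top, [(left, text), ...]]
    for top, left, text in words:
        if rows and abs(top - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((left, text))
        else:
            rows.append([top, [(left, text)]])

    rendered = []
    for _top, row_words in rows:
        line = ""
        for left, text in sorted(row_words):
            col = int(left / char_width)
            # Pad out to the word's column; keep at least one space
            # between neighbouring words.
            pad = max(col - len(line), 1 if line else 0)
            line += " " * pad + text
        rendered.append(line)
    return "\n".join(rendered)


def ocr_pdf_with_layout(pdf_path, dpi=300):
    """Return one layout-preserving text string per PDF page."""
    # Imported here so layout_text works without the OCR dependencies.
    import pytesseract
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, dpi=dpi)
    return [
        layout_text(
            pytesseract.image_to_data(page, output_type=pytesseract.Output.DICT)
        )
        for page in pages
    ]
```

This keeps indentation and column alignment approximately, but it does not reproduce vertical spacing or table borders. If you need an exact visual copy, Tesseract's hOCR or searchable-PDF output may be a better starting point.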
