Question

How to Improve PDF Data Extraction to Excel


Hi,

I’m using Zapier to extract data from PDFs and send it to Excel using PDF parser apps like PDF.co or Docparser. However, I’ve noticed that the accuracy of data extraction heavily depends on the structure of the PDF. For example, when working with compressed or complex PDFs, the output is similar to what I get when converting the file manually using Foxit PDF Editor.

Any advice or suggestions would be greatly appreciated!

Thank you!


4 replies


Hi ​@romulusio1986 

 

Here is a reply I gave to another recent post about PDF extraction and parsing. It uses Google Docs to turn your PDF into plain text, which may make your parser steps simpler and more accurate.

 

 

If this isn’t helpful, it would be great if you could post actual examples of what is happening, i.e. when it works and when it does not give the desired result.

 

I hope this helps :)

Tell us how it goes


Hi,

Thank you for your insights! I tested extracting data from PDFs to Excel using PDF.co via Zapier, but I noticed an issue: the extracted data is still compressed into a single cell, instead of being spread out in a table format with proper rows and columns.

Is there a way to adjust the settings or add steps in the Zapier workflow to ensure the data is formatted like a table when sent to Excel? Any guidance on how to handle this would be greatly appreciated!

Thanks in advance!

 



@romulusio1986 

It looks like you have the data in a string. I assume you can access this in Zapier Formatter steps etc.? Maybe PDF.co can produce some other line-item output, which someone with experience there can assist with.

Other than that, it will be possible to format this string in Zapier and get it into your spreadsheet (just a technical issue now). If you want to follow that route, I can help.
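
As a rough idea of what formatting the string could look like, here is a minimal Python sketch of the kind of logic a code step might run. It is only an illustration under assumptions, since I haven't seen your actual output: it assumes each table row arrives on its own line and that columns are separated by tabs or by two or more spaces. The function name `text_to_rows`, the `min_gap` parameter, and the sample text are all made up for the example.

```python
import re


def text_to_rows(raw_text, min_gap=2):
    """Split a block of extracted text into a list of rows (lists of cells).

    Assumption: one table row per line, with columns separated by tabs
    or by runs of at least `min_gap` spaces. Real parser output may differ.
    """
    splitter = re.compile(r"\t+| {%d,}" % min_gap)
    rows = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between rows
        rows.append([cell.strip() for cell in splitter.split(line)])
    return rows


# Hypothetical sample of what a parser might return as one long string.
sample = "Item        Qty   Price\nWidget A    2     10.00\n\nWidget B    1     4.50\n"
rows = text_to_rows(sample)
for row in rows:
    print(row)
```

Each inner list would then map to one spreadsheet row, with each cell going to its own column. A single space inside a value (like "Widget A") is kept together, which is why the sketch splits only on wider gaps. If your real data uses a different separator, the pattern would need to change.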

To proceed, it would be good to understand: what is the end goal for this Zap? I.e. will you be getting a similar table from lots of PDFs? Do you want to get any table from any PDF? All of this can add to the complexity of the task.

Would you be able to post an image of what you would like this to look like in Excel? It looks like two columns to me, with a dividing row about 8 rows down. Do you want to keep the Notes etc.?

Are column headers missing from the data?

 

I’ll be looking for your reply


SamB
Community Manager
  • 7396 replies
  • January 15, 2025

Hey there, ​@romulusio1986 👋

It’s been a while since you posted here—were you able to get things sorted on this?

If not, feel free to send over some further details as shane.massey mentioned and we’ll go from there! 🙂