I receive an email with a PDF attachment, and I want to extract a phone number from that attachment. I have it set to send the PDF to Google Drive as a Google Doc. How do I extract the phone number? None of the options I’m seeing seem to pull the number.
Best answer by SamB
Just wanted to follow up here to add a few more suggestions for anyone looking to extract phone numbers from a PDF. :)
Troy’s suggestion of using a Formatter step to extract the phone number is spot on if you’ve got access to the text within the PDF. For that you’d need to use either a Formatter (Text > Extract Phone Number) or Formatter (Text > Extract Pattern) action. The Extract Pattern route may be better if you have multiple phone numbers that you need to extract.
You can find out more about how to extract phone numbers using regex here: How to Extract Email Addresses, Phone Numbers, and Links From Text
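If it helps to see what the pattern matching is doing, here’s a rough Python sketch of the two approaches: grabbing the first phone number (similar in spirit to Extract Phone Number) versus grabbing every match (similar to Extract Pattern). It assumes US/Canada-style numbers, and the pattern and function names are my own for illustration. Zapier’s built-in Formatter uses its own pattern, which may behave differently.

```python
import re

# Hypothetical pattern for US/Canada-style numbers such as
# 555-123-4567, (555) 123-4567, 555.987.6543 or +1 555 123 4567.
PHONE_PATTERN = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def extract_first_phone(text):
    """Return the first phone-number-like match in text, or None if there isn't one."""
    match = PHONE_PATTERN.search(text)
    return match.group(0) if match else None


def extract_all_phones(text):
    """Return every phone-number-like match in text, in the order they appear."""
    # The pattern has no capturing groups, so findall returns the full matches.
    return PHONE_PATTERN.findall(text)
```

For example, with the text "Call (555) 123-4567 or 555.987.6543 for support.", `extract_first_phone` gives back only the first number, while `extract_all_phones` returns both. That’s the same reason Extract Pattern can be the better choice when a document has more than one number.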
What if you don’t have access to the text contained within the PDF?
In that case, it may be necessary to enlist the help of an app like Docparser.
Docparser can extract the individual data fields from the PDF and let you select the phone number in your Zap. We have a blog post that talks more about this here: Extract Structured Data from PDFs with Docparser
What if Docparser isn’t a viable option for you?
It may be possible to use Google Drive to automatically convert the PDF into a Google Docs text file. Google have a guide that talks about this here: Convert PDF and photo files to text
When uploading a file, the Google Drive integration lets you choose whether or not to convert it to a Google Doc.
The slight snag with this is that there’s no guarantee all the text will be converted correctly. For best results, Google recommends using common fonts like Arial or Times New Roman.
But if that’s the route you want to take, you could use an Upload File action to automatically convert the PDF into text. The contents of a Google Doc are available when using either the New Document in Folder or the New Document trigger in Google Docs. So you’d need to set up a separate Zap (from the one that uploads the file) with one of those triggers, so it triggers on the converted doc and can access the text inside it.
You’d then want to follow that up with a Formatter (Text > Extract Phone Number or Extract Pattern) action (as mentioned above) to extract the phone number from the Google Doc.
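If the converted text comes through messy (for example, with extra line breaks or spaces from the conversion), a Code by Zapier (Python) step is another option instead of the Formatter. Below is a minimal sketch that makes a few assumptions. The trigger’s document content is mapped into an input field I’ve called `doc_text`, the numbers are US/Canada style, and `extract_phone_step`, `phone_number`, and `phone_digits` are hypothetical names chosen for this example.

```python
import re

# Hypothetical pattern for US/Canada-style numbers; adjust it to match your documents.
PHONE_PATTERN = re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")


def extract_phone_step(input_data):
    """Return a Zapier-style output dict for the first phone number in the doc text.

    Assumes the Google Docs trigger's document content was mapped into an
    input field named "doc_text" (a hypothetical name chosen in the step setup).
    """
    # Converted text can contain line breaks or runs of spaces, so collapse
    # all whitespace to single spaces before matching.
    text = re.sub(r"\s+", " ", input_data.get("doc_text", ""))
    match = PHONE_PATTERN.search(text)
    if match is None:
        # Return empty values so a later step (e.g. a Filter) can check for them.
        return {"phone_number": "", "phone_digits": ""}
    raw = match.group(0)
    # Keep only the digits so the number is easy to compare or reformat later.
    digits = re.sub(r"\D", "", raw)
    return {"phone_number": raw, "phone_digits": digits}


# In a Code by Zapier (Python) step, `input_data` is provided for you and the
# step's result is whatever you assign to `output`, so you would add:
# output = extract_phone_step(input_data)
```

Collapsing the whitespace first is what lets a number split like "(555)   123-4567" still match. The pattern only allows a single separator between each group of digits.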
If you can access the PDF data, then try using a Formatter step.
Check out this article: https://zapier.com/blog/zapier-formatter-guide/#extract