What is the best way to process large CSV files?

  • 27 October 2022

I have CSV files with between 300,000 and 600,000 records. What is the most efficient, or the recommended, way to process these files? I should mention that I would prefer to have them processed after they are received at a predefined mailbox.

Best answer by RALaBarge 31 October 2022, 15:41

Hi @phorion 

Good question.

Please explain more about what you are trying to do with the data from the CSV files.

For some context…

Formatter > Utilities > Import CSV File: https://help.zapier.com/hc/en-us/articles/8496060898701-How-to-Import-CSV-Files-with-Formatter#common-problems-with-the-import-csv-file-utility-0-1

CSV Import only supports file sizes < 150K error

The utility only supports importing files that are 150K or less (which is around 1000 rows of a 10 column CSV file.) You'll need to split the CSV file into multiple files if it's too large.
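As a rough illustration of the splitting step, here is a minimal Python sketch that breaks CSV text into chunks and repeats the header row in each one. The function names and the 150,000-byte default are assumptions made for this example (the exact byte count behind "150K" is not stated above), and it is meant to run wherever the file is prepared, not as a confirmed Zapier feature.

```python
import csv
import io


def _encode_row(row):
    """Serialize one parsed row back to a line of CSV text."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()


def split_csv(text, max_bytes=150_000):
    """Split CSV text into chunks that each repeat the header row.

    max_bytes is an assumed limit based on the 150K figure quoted above.
    A chunk is closed before adding a row would push it past max_bytes;
    a single row larger than the limit still gets a chunk of its own.
    """
    rows = csv.reader(io.StringIO(text))
    header = _encode_row(next(rows))
    header_size = len(header.encode("utf-8"))

    chunks = []
    current, size = [header], header_size
    for row in rows:
        line = _encode_row(row)
        line_size = len(line.encode("utf-8"))
        # Start a new chunk if this row would not fit in the current one.
        if len(current) > 1 and size + line_size > max_bytes:
            chunks.append("".join(current))
            current, size = [header], header_size
        current.append(line)
        size += line_size

    # Flush the last chunk if it holds at least one data row.
    if len(current) > 1:
        chunks.append("".join(current))
    return chunks
```

Each returned string can be written out as its own file. At 300,000 to 600,000 records this would produce several hundred files, which is worth weighing before going down this route.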

The Looping app, which can be used to handle line items, currently has a 500-record limit.

 

You just got me thinking this through a little more, and I realized that I only need a conduit. The file will be processed by our ETL tool, so if Zapier can monitor a mailbox and, when a file arrives as an attachment, copy it to an Amazon S3 bucket, we should be good.
Source: mailbox → Zapier → Destination: S3 bucket
Thoughts? Is this easier or simpler? It would be great if, when the file is received, we could run something like the Linux wc -l to determine the number of rows in the file, and capture other attributes such as timestamps (when the file was processed).
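For illustration only, a small Python sketch of the wc -l idea: it counts the records in a CSV attachment and stamps the time it was processed. The function name, the returned field names, and the assumption that the file is UTF-8 with a header row are all made up for this example; whether it would run in a Code step or in the ETL tool is left open.

```python
import csv
import io
from datetime import datetime, timezone


def describe_csv(data: bytes, has_header: bool = True) -> dict:
    """Return a record count, byte size, and processing timestamp for a CSV file.

    Assumes UTF-8 content (an optional BOM is stripped). Blank lines are skipped.
    """
    text = data.decode("utf-8-sig")
    reader = csv.reader(io.StringIO(text, newline=""))
    total = sum(1 for row in reader if row)
    records = max(total - 1, 0) if has_header else total
    return {
        "records": records,
        "size_bytes": len(data),
        # UTC time at which this function ran, in ISO 8601 format.
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }
```

Unlike wc -l, this counts CSV records, so a quoted field that contains a line break is still counted as one row.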

Thanks for the reply. We also have an additional requirement for end-to-end PGP encryption: when the file is sourced, it should be PGP encrypted before it is transported or copied to the S3 bucket.

Source: mailbox → complete file PGP encrypted → Zapier → Destination: S3 bucket

@phorion 

Try using this Zap trigger: Gmail - New Attachment

Hey there @phorion !

I agree with Troy that the best email trigger for grabbing attachments would be the Gmail New Attachment trigger. If your emails have additional attachments on them (like photos used in email headers or other things of that sort), you would need to use a Filter by Zapier action as the second step.

At the moment, Zapier offers no apps that will encrypt something with your PGP key. Our Code steps do not let developers call non-standard libraries either, and it appears that neither Python nor Node.js has PGP support built into the base language.

Finally, we do have an Amazon S3 app, and it supports uploading files.

I do see that Amazon S3 supports plugins and that someone has made a PGP encryptor which can run after Zapier drops off the file.

Zapier does encrypt data both at rest and in motion in our service, as well as offering SOC2 compliance.  More on this here.

Let us know if you run into any snags along the way!