Best answer

Convert HTML table to CSV or Google Sheet

  • 25 January 2021
  • 14 replies
  • 3632 views

Userlevel 1

Each day we receive an email report that has an html table with important data. Each table has the same number of columns with the same headers. The number of rows is different each time. It is a simple html table with no classes or ids. I’d love to have these tables automatically archived as a csv or google sheet. Any ideas on how to do that?

icon

Best answer by MarijnVerdult 25 January 2021, 17:42

View original

This post has been closed for comments. Please create a new post if you need help or have a question about this topic.

14 replies

Userlevel 4
Badge +3

Hi @emilysia , it should be 6 steps, between your step 1 and 2 you should also insert two steps to “trim the edges”.

 

Can you send the HTML code you’re inputting?

Userlevel 1

That works! Thanks! I’ll be sure to share if I can think of any improvements.

Okay, I think I’ve almost got it… I just need some minor tweaking but just am missing where to make the correction

 

 

Userlevel 4
Badge +3

It seems like that is step 4 and 5 that you need to tweak
In step 4 you might have typed <tr > with an extra space and it seems like step 5 (the line items) dit not work properly. Try copy-pasting the correct text as the delimiter. 

Hi, I’ve been trying to follow the instructions and wanted to make sure I understand. I would like to extract the information from a HTML table from an email but can’t seem to do the splits described above. So there are supposed to be 4 steps in the zap:

  1. New email matching search
  2. Split text (using the HTML body) as input, separators  “</td>   </tr>   <tr>     <td>”, segment index = ALL (as line items)
  3. Split text (using the HTML body) as input, separators  ““</td>     <td>”, segment index = ALL (as separate fields)
  4. Create Spreadsheet rows

 

I’m stuck on steps 3 and 4 because I am not able to get the table data on the output to use in step #4 and the output is still

</div>

</div></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div dir="ltr" style="color:rgb(136,136,136);font-size:12.8px"><div><span style="font-size:12.8px"><b>Arnau Rovira Muntada</b></span></div><div><span style="font-size:12.8px">Knowledge Management Lead, Analytics</span></div><div><span style="font-size:10pt;color:rgb(249,157,28)">Devex</span><span style="font-size:10pt;color:gray"> – Do Good. Do It Well.<sup>TM</sup></span></div></div><div

 

Appreciate any assistance. Thanks!

Hi @emilysia , it should be 6 steps, between your step 1 and 2 you should also insert two steps to “trim the edges”.

 

Can you send the HTML code you’re inputting?

Hi! Where do I find the “trim the edges” step? So this is the body html (sorry for the length)
 

<div><br></div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">---------- Missatge reenviat ---------<br>De: <strong class="gmail_sendername" dir="auto">Marketo Analytics</strong> <span dir="auto">&lt;<a href="mailto:no-reply@marketo.com">no-reply@marketo.com</a>&gt;</span><br>Data: dt, 26 gen 2021 a les 13:07<br>Assumpte: New Unique Subscribers Report<br>Per a: &lt;<a href="mailto:arnau.rovira@devex.com">arnau.rovira@devex.com</a>&gt;<br></div><br><br>

<u></u>

<div>

<br>

<img vspace="0" hspace="10" border="0" alt="Marketo Analytics" src="cid:17743511048a0511ada1" style="max-width: 100%;">

<br><br>

<table border="0" cellpadding="0" cellspacing="0" width="100%" style="border:1px solid #716fb3">

<tbody><tr>

<td style="background-color:#716fb3;font-size:9px"> </td>

</tr>

<tr>

<td>

<table border="0" cellpadding="0" cellspacing="0">

<tbody><tr>

<td style="padding:20px;color:#45555f;font-family:Tahoma,Helvetica;font-size:12px;line-height:18px">

<font style="color:rgb(113,111,179);font-size:23px;font-weight:bold;line-height:26px">New Unique Subscribers</font>

<p style="color:#333;margin-top:6px"></p><table border="0" cellpadding="2"><tbody><tr><th align="left">Report:</th><td><a href="https://app-sjn.marketo.com?tok=2xUAKZjCKgkO5QFywM6%2BJfCwnGDydZaG%2FbByDv7o%2F9w5hGx7ZOnXfmvEp%2FJnOc8Gdu2EBDREMdZoxFEJS7bTc9Bt6uFsM1aGUGb2moPidosceeQ1j8ZFzKCDhTkgh%2F6Gk0%2FOIwkaUyw9MlfMP%2Bp51m%2B1dpU7TyuI72nSaNZ1xA%3D%3D&amp;loginMsgColor=%23666&amp;ssid=170&amp;host=sjnbe4.marketo.org#/classic/AR1708A1ZN1" target="_blank">New Unique Subscribers</a></td></tr><tr><th align="left">Generated:</th><td>Jan 26, 2021 5:52 AM EST</td></tr><tr><th align="left">Time Frame:</th><td>All Time</td></tr><tr><th align="left">Smart List:</th><td>9 rules</td></tr></tbody></table><p></p>

<p>

1-5 of 474

</p><div style="text-align:right;margin-right:4px;margin-bottom:4px">

</div>

<table cellpadding="10" width="800">

<tbody><tr style="background-color:#cccccc">

<th>Created At ⇓</th><th>Total Leads</th></tr>

<tr>

<td>01/24/2021</td><td align="right">450</td></tr>

<tr bgcolor="#EEEEEE">

<td>01/17/2021</td><td align="right">986</td></tr>

<tr>

<td>01/10/2021</td><td align="right">824</td></tr>

<tr bgcolor="#EEEEEE">

<td>01/03/2021</td><td align="right">775</td></tr>

<tr>

<td>12/27/2020</td><td align="right">494</td></tr>

<tr bgcolor="#EEEEEE">

<td><i><u></u>All Others (469):<u></u></i></td><td align="right"><i>212,798</i></td></tr>

<tr>

<td><b><u></u>Total (474):<u></u></b></td><td align="right"><b>216,327</b></td></tr>

</tbody></table>

<br><br>

<br><br>

<p>If you no longer wish to receive this message, you can <a href="http://na-sjn.marketo.com/access/unsubReport?tok=2wARCJuKaUIE5BEurda3Eoa8kDvxJ7CFo4Z%2BT%2FLr08YyhElsfO3adlPgntpDHddmNe2bFwcIXZNuzUp3M9HFLZoVp8VrCmi5Hi%2Fs3fCMPtwYJO0hjpMn9qz4iTdV7M6XkHXIVB5fFg%3D%3D" target="_blank">unsubscribe from this report</a>.</p>

<p></p></td>

</tr>

</tbody></table>

</td>

</tr>

</tbody></table>

</div>

</div></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div dir="ltr" style="color:rgb(136,136,136);font-size:12.8px"><div><span style="font-size:12.8px"><b>Arnau Rovira Muntada</b></span></div><div><span style="font-size:12.8px">Knowledge Management Lead, Analytics</span></div><div><span style="font-size:10pt;color:rgb(249,157,28)">Devex</span><span style="font-size:10pt;color:gray"> – Do Good. Do It Well.<sup>TM</sup></span></div></div><div dir="ltr" style="color:rgb(136,136,136);font-size:12.8px"><br><span style="font-size:10pt"></span><p><a href="https://www.devex.com/en/" style="color:rgb(17,85,204)" target="_blank"><span style="font-size:10pt">Devex</span></a><span style="font-size:10pt;color:rgb(127,127,127)"> is the media platform for the global development community.</span></p></div></div></div></div></div></div></div></div></div>

Hi, where do I find “trim the edges”?

 

This is the body/ html code of the email

 

<div><br></div><div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">---------- Missatge reenviat ---------<br>De: <strong class="gmail_sendername" dir="auto">Marketo Analytics</strong> <span dir="auto">&lt;<a href="mailto:no-reply@marketo.com">no-reply@marketo.com</a>&gt;</span><br>Data: dt, 26 gen 2021 a les 13:07<br>Assumpte: New Unique Subscribers Report<br>Per a: &lt;<a href="mailto:arnau.rovira@devex.com">arnau.rovira@devex.com</a>&gt;<br></div><br><br>

<u></u>

<div>

<br>

<img vspace="0" hspace="10" border="0" alt="Marketo Analytics" src="cid:17743511048a0511ada1" style="max-width: 100%;">

<br><br>

<table border="0" cellpadding="0" cellspacing="0" width="100%" style="border:1px solid #716fb3">

<tbody><tr>

<td style="background-color:#716fb3;font-size:9px"> </td>

</tr>

<tr>

<td>

<table border="0" cellpadding="0" cellspacing="0">

<tbody><tr>

<td style="padding:20px;color:#45555f;font-family:Tahoma,Helvetica;font-size:12px;line-height:18px">

<font style="color:rgb(113,111,179);font-size:23px;font-weight:bold;line-height:26px">New Unique Subscribers</font>

<p style="color:#333;margin-top:6px"></p><table border="0" cellpadding="2"><tbody><tr><th align="left">Report:</th><td><a href="https://app-sjn.marketo.com?tok=2xUAKZjCKgkO5QFywM6%2BJfCwnGDydZaG%2FbByDv7o%2F9w5hGx7ZOnXfmvEp%2FJnOc8Gdu2EBDREMdZoxFEJS7bTc9Bt6uFsM1aGUGb2moPidosceeQ1j8ZFzKCDhTkgh%2F6Gk0%2FOIwkaUyw9MlfMP%2Bp51m%2B1dpU7TyuI72nSaNZ1xA%3D%3D&amp;loginMsgColor=%23666&amp;ssid=170&amp;host=sjnbe4.marketo.org#/classic/AR1708A1ZN1" target="_blank">New Unique Subscribers</a></td></tr><tr><th align="left">Generated:</th><td>Jan 26, 2021 5:52 AM EST</td></tr><tr><th align="left">Time Frame:</th><td>All Time</td></tr><tr><th align="left">Smart List:</th><td>9 rules</td></tr></tbody></table><p></p>

<p>

1-5 of 474

</p><div style="text-align:right;margin-right:4px;margin-bottom:4px">

</div>

<table cellpadding="10" width="800">

<tbody><tr style="background-color:#cccccc">

<th>Created At ⇓</th><th>Total Leads</th></tr>

<tr>

<td>01/24/2021</td><td align="right">450</td></tr>

<tr bgcolor="#EEEEEE">

<td>01/17/2021</td><td align="right">986</td></tr>

<tr>

<td>01/10/2021</td><td align="right">824</td></tr>

<tr bgcolor="#EEEEEE">

<td>01/03/2021</td><td align="right">775</td></tr>

<tr>

<td>12/27/2020</td><td align="right">494</td></tr>

<tr bgcolor="#EEEEEE">

<td><i><u></u>All Others (469):<u></u></i></td><td align="right"><i>212,798</i></td></tr>

<tr>

<td><b><u></u>Total (474):<u></u></b></td><td align="right"><b>216,327</b></td></tr>

</tbody></table>

<br><br>

<br><br>

<p>If you no longer wish to receive this message, you can <a href="http://na-sjn.marketo.com/access/unsubReport?tok=2wARCJuKaUIE5BEurda3Eoa8kDvxJ7CFo4Z%2BT%2FLr08YyhElsfO3adlPgntpDHddmNe2bFwcIXZNuzUp3M9HFLZoVp8VrCmi5Hi%2Fs3fCMPtwYJO0hjpMn9qz4iTdV7M6XkHXIVB5fFg%3D%3D" target="_blank">unsubscribe from this report</a>.</p>

<p></p></td>

</tr>

</tbody></table>

</td>

</tr>

</tbody></table>

</div>

</div></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div dir="ltr" style="color:rgb(136,136,136);font-size:12.8px"><div><span style="font-size:12.8px"><b>Arnau Rovira Muntada</b></span></div><div><span style="font-size:12.8px">Knowledge Management Lead, Analytics</span></div><div><span style="font-size:10pt;color:rgb(249,157,28)">Devex</span><span style="font-size:10pt;color:gray"> – Do Good. Do It Well.<sup>TM</sup></span></div></div><div dir="ltr" style="color:rgb(136,136,136);font-size:12.8px"><br><span style="font-size:10pt"></span><p><a href="https://www.devex.com/en/" style="color:rgb(17,85,204)" target="_blank"><span style="font-size:10pt">Devex</span></a><span style="font-size:10pt;color:rgb(127,127,127)"> is the media platform for the global development community.</span></p></div></div></div></div></div></div></div></div></div>

Hi, is there a character limit? I can’t seem to sent it here. I was trying to use formatter » replace to remove the excess html characters but doesn’t seem to be working

Userlevel 4
Badge +3

Hi @A Colpitts, sounds like a fun challenge :)

This is definitely something that this function can help you with:
https://zapier.com/developer/public-invite/110676/4234ae90a3437a6fcfa9d55b675db501/

What I would do is to

  1. remove everything before (and including) the first “<td>”, and everything after (and including) “</td> </tr> </table>” (probably the easiest way is to use two text splitters)
  2. create a loop and have as your delimiter within the loop “</td>   </tr>   <tr>     <td>” so every table row is a new iteration. 
  3. create as next step a Text Formatter where you’ll create for all indexes Seperate Fields with the delimiter “</td>     <td>”
  4. here you can import all items (columns) into your Google Sheet

 

Let me know if that works! Curious to see if there are other solutions :)

 

Userlevel 1

Thanks! I could see that working, however I’m worried it may be a bit task-heavy, as there are often many rows in these tables. One potential solution I found was to save the html text of the email as a file in Drive and then change the extension to xml, which allows me to download it and open it in Excel. Which is okay, but would love to simply open it in my browser in sheets or quickly preview it in drive as a table. 

Userlevel 4
Badge +3

I found an easier way @A Colpitts!

The start is the same but it will take a lot less tasks!
 

  1. remove everything before (and including) the first “<td>”, and everything after (and including) “</td> </tr> </table>” (probably the easiest way is to use two text splitters)
  2. split text to line items on “</td>   </tr>   <tr>     <td>”
  3. split text to sepearte fields on “</td>     <td>”
  4. create spreadsheet row(s) (the one with line-item support) and fill you columns; the rows fill be filled in as separate line items

I used this table as test: https://www.w3schools.com/html/html_tables.asp

Userlevel 4
Badge +3

Hi @emilysia 

Thanks for sharing the code! With “trimming the edges” is that you want to only keep the table itself. The seperators are dependent on how the HTML code looks like so I’ve gone ahead and used the code you gave to test it myself.

All steps below but one are Text Splits (Formatter) where you use the output of the previous step as the input


Step 2:
 

 

Step 3:

 

Step 4: Replace (Text Formatter)

 

Step 5:

 

Step 6:

 

Then I end up with the following which is what you want and can use at input for your Sheet:

 

One important note! 

It seems the code you copied is not the whole table (just the first five instances). If you use the zapier integration, does the whole table show/get used?

  • If yes, than you should be all good.
  • If no, then you can only use the first five instances and you should decide what you want to do with the All Other and Total rows which are now still present as output but with the HTML formatting around it. 

 

Let me know how you get on!

Thank you! I get a blank output on step 2 so I wonder if I am following the instructions correctly?

 

 

Userlevel 4
Badge +3

@emilysia - the line breaks are important as well. You can copy it from your own HTML code for good measure

</th></tr>

<tr>

<td>

The text that you search for should be EXACTLY how it is in the HTML code (including spaces, line break, and capitalization)