Today I had to build a Zap that parsed an HTML email template to pick out specific data as fields for a CRM system. I first looked at Email Parser by Zapier. Unfortunately, it quickly became obvious that, despite its multiple matching methods and AI functionality, the layout of the email was too complex for it. The service sending the email also appeared to send HTML only, with no plain-text part, which was the main issue, as I think Email Parser by Zapier is realistically designed for basic templates. There are dedicated email parsing services, but they cost extra or require a further subscription, so I looked at other options.
I then went back to a standard inbound email step followed by a Code by Zapier step using JavaScript, eventually settling on regex to pick out specific parts of the HTML and assign them as fields, with some further parsing. This worked, partly because, by some luck, the HTML elements had unique identifiers for every key piece of data. That meant the regex rules could be fairly specific, with little risk of matching multiple elements, and did not need to be too complex. While I got a working solution, it raised a potential area for Zapier to consider: DOM parsing tools, specifically for HTML/XML responses.
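As a rough illustration of the regex approach described above (this is not the actual code from the Zap; the ids, the sample markup and the `extractById` helper are all invented for the example), picking a value out of the HTML string by a unique id could look something like this:

```javascript
// Hypothetical sample markup; the real email's structure and ids will differ.
const sampleHtml = `
<table>
  <tr><td id="customer-name"> Jane Doe </td></tr>
  <tr><td id="customer-email"><a href="mailto:jane@example.com">jane@example.com</a></td></tr>
  <tr><td id="order-total">42.50</td></tr>
</table>`;

// Escape characters that have a special meaning inside a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the text inside the first element carrying the given id, or null.
// Assumptions: the id is unique, and the element does not contain a nested
// element with the same tag name (the lazy match stops at the first closer).
// HTML entities such as &amp; are NOT decoded here.
function extractById(html, id) {
  const pattern = new RegExp(
    `<([a-zA-Z][\\w-]*)\\b[^>]*\\bid\\s*=\\s*["']${escapeRegex(id)}["'][^>]*>([\\s\\S]*?)<\\/\\1\\s*>`,
    "i"
  );
  const match = pattern.exec(html);
  if (!match) return null;
  return match[2]
    .replace(/<[^>]+>/g, "") // strip any nested tags
    .replace(/\s+/g, " ")    // collapse whitespace
    .trim();
}

const fields = {
  name: extractById(sampleHtml, "customer-name"),
  email: extractById(sampleHtml, "customer-email"),
  total: extractById(sampleHtml, "order-total"),
};
```

In a Code by Zapier step, the HTML would come in through `inputData` (under whatever key you map it to) rather than a hard-coded string, and the `fields` object would be returned as the step's output. It only holds up because each id is unique; as soon as elements nest or repeat, a pattern like this starts to break.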
Looking around, Zapier does have some utilities related to HTML parsing, can use regex in places, and has a web scraping option. The web scraping option only works with URLs, however, whereas an HTML email template is basically one big string. Regex will also only get you so far before it becomes very error-prone and slow. While JavaScript in the browser has DOMParser, Code by Zapier runs on Node.js, where DOMParser is not built in, and as we know we cannot install additional libraries in Code by Zapier steps. I also had to use a Code by Zapier step because, even if the utilities/formatters could pick out the data, I’d have increased the step count of the overall Zap to 3 or 4 times the current amount.
While I would personally consider what I wrote today a complete last resort, against all best practice and judgement, it did suggest there could be an avenue for DOM parsing options that are specifically designed to pick elements by identifier, class, attribute and so on, rather than resorting to regex.
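To make the idea a bit more concrete, here is a minimal sketch of what picking elements by identifier, class or attribute could look like in plain JavaScript with no extra libraries. None of this is an existing Zapier feature, and it is nowhere near a spec-compliant HTML parser (it ignores comments, scripts, entities and malformed markup). The helper names (`parseHtml`, `findAll`, `byId`, `byClass`, `byAttribute`) and the sample markup are all hypothetical.

```javascript
// Tags that never have a closing tag, so they are not kept open on the stack.
const VOID_TAGS = new Set(["br", "hr", "img", "input", "link", "meta"]);

// Parse the attribute portion of an opening tag into a plain object.
function parseAttributes(source) {
  const attributes = {};
  const pattern = /([^\s"'=<>\/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?/g;
  let match;
  while ((match = pattern.exec(source)) !== null) {
    attributes[match[1].toLowerCase()] = match[2] ?? match[3] ?? match[4] ?? "";
  }
  return attributes;
}

// Build a very small element tree by tracking open elements on a stack.
function parseHtml(html) {
  const root = { tag: "#root", attributes: {}, children: [] };
  const stack = [root];
  const tagPattern = /<(\/?)([a-zA-Z][\w-]*)((?:"[^"]*"|'[^']*'|[^'">])*)>/g;
  let cursor = 0;
  let match;
  while ((match = tagPattern.exec(html)) !== null) {
    const [whole, closing, rawTag, rawAttributes] = match;
    const tag = rawTag.toLowerCase();
    // Text between the previous tag and this one belongs to the open element.
    if (match.index > cursor) {
      stack[stack.length - 1].children.push(html.slice(cursor, match.index));
    }
    cursor = match.index + whole.length;
    if (closing) {
      // Pop back to the nearest matching open element; ignore stray closers.
      const openIndex = stack.map((node) => node.tag).lastIndexOf(tag);
      if (openIndex > 0) stack.length = openIndex;
    } else {
      const node = { tag, attributes: parseAttributes(rawAttributes), children: [] };
      stack[stack.length - 1].children.push(node);
      const selfClosing = rawAttributes.trim().endsWith("/");
      if (!VOID_TAGS.has(tag) && !selfClosing) stack.push(node);
    }
  }
  if (cursor < html.length) stack[stack.length - 1].children.push(html.slice(cursor));
  return root;
}

// Concatenate the text of a node and all of its descendants.
function textContent(node) {
  if (typeof node === "string") return node;
  return node.children.map(textContent).join("");
}

// Collect every element for which `predicate` returns true (depth-first).
function findAll(node, predicate, found = []) {
  for (const child of node.children) {
    if (typeof child === "string") continue;
    if (predicate(child)) found.push(child);
    findAll(child, predicate, found);
  }
  return found;
}

// Selector-style predicates: by id, by class token, by exact attribute value.
const byId = (id) => (el) => el.attributes.id === id;
const byClass = (name) => (el) => (el.attributes.class || "").split(/\s+/).includes(name);
const byAttribute = (attr, value) => (el) => el.attributes[attr] === value;

// Hypothetical sample markup.
const orderHtml = `
<div class="order">
  <span id="customer-name">Jane <b>Doe</b></span>
  <span class="line-item" data-sku="A1">Widget</span>
  <span class="line-item" data-sku="B2">Gadget</span>
</div>`;

const tree = parseHtml(orderHtml);
const customerName = textContent(findAll(tree, byId("customer-name"))[0]).trim();
const lineItems = findAll(tree, byClass("line-item")).map((el) => textContent(el).trim());
const gadget = findAll(tree, byAttribute("data-sku", "B2")).map(textContent);
```

The difference from a per-field regex is that nesting and repeated elements are handled by walking a tree, so the selection logic stays the same however the markup is laid out. A built-in tool would ideally offer this kind of selection without anyone having to hand-roll a parser.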
It could have multiple use cases: parsing HTML/XML held as a string value (e.g. a response from another step), parsing the result of a GET request to a website, parsing an HTML email template, and no doubt others.
It would be good for Zapier to provide DOM parsing tools for these types of scenarios. It is certainly a more advanced use case, but it would be better than regex rules.
I created this discussion to share a potential scenario and use case. Feel free to chime in if you have also come across the need for DOM parsing.