Question

Prevent HTML from being truncated when using a GET webhook to retrieve the HTML from a URL

  • 6 April 2023
  • 4 replies
  • 158 views

Hi,

I am creating a Zap where I need to get only the text of the latest post in a Google News Feed.

I am using a “New Item in Feed in RSS by Zapier” trigger. Then, as an action, I use a GET Webhook on the latest URL in the feed to retrieve the raw HTML of the post. Finally, I use a Text Formatter to remove all HTML tags, leaving only the text of that post.


The problem is that the HTML of some posts is so large that it gets truncated at the Webhook step.

I am using the default Webhook configuration to access the URL.

Does anyone know a way to prevent the HTML from being truncated? Thanks in advance.


4 replies


Hi @Pau 

Good question.

The data should still be there but is truncated for display purposes.

 

Thanks @Troy Tessalone, happy to know that :-) The Text Formatter is still not optimal for removing HTML tags, though. It doesn’t strip all the HTML from the text, so it leaves a lot of unwanted text such as menu elements.

 

This is coming from the source URL.

I guess the solution would be to strip everything before the first <body> tag. Are you aware of any way to do so with some kind of text parser in Zapier?

 

Thank you!


@Pau

A Code step can be used instead of a Formatter step to remove the desired HTML tags.

OR

You can try to use AI (OpenAI / ChatGPT) to do the desired formatting.
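
For the Code step option, here is a minimal sketch in Python of what the tag removal could look like. It drops everything before the first <body tag, skips the contents of script and style blocks, and collapses whitespace. The names html_to_text and _TextExtractor are made up for illustration, and the input field name "html" is an assumption: it would need to match whatever input you map from the Webhook step.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(raw_html):
    """Return the visible text of raw_html, starting at the first <body tag."""
    # Drop the <head> section and anything else before <body, if present.
    match = re.search(r"<body\b", raw_html, flags=re.IGNORECASE)
    body = raw_html[match.start():] if match else raw_html

    parser = _TextExtractor()
    parser.feed(body)
    parser.close()

    # Join the text fragments and collapse runs of whitespace.
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


# Assumed wiring inside a Code by Zapier (Python) step, where the step's
# mapped inputs are expected in `input_data` and the result goes in `output`:
# output = {"text": html_to_text(input_data.get("html", ""))}
```

Note that this only removes markup. Menu text inside the body (for example in a nav element) would still come through unless you add those tag names to SKIPPED, and which tags hold the unwanted text depends on the source site.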

Thanks @Troy Tessalone. I do not really know how to remove the HTML tags with a Code step, so I will try prompting ChatGPT to ignore any remaining unwanted text.

I appreciate your help 😀

Best regards
