Question

Unable to scrape web content and return from sub-zap

  • 16 March 2024
  • 5 replies
  • 43 views

I need to be able to automatically visit a URL (a blog/news article) and grab the article on that page. These are usually news articles in foreign languages, mainly Arabic.

 

Zapier Web Parser

I've had a look at several tools that scrape web data including Zapier’s web parser. Unfortunately, it doesn’t work when I pass it some article pages, such as this one from Aljazeera News. The parser returns only a small part of the article. See screenshot below.

 
Apify  

So I tried using Apify's web scraper and it does indeed extract the web content. It returns some links, including the ‘dataset items file urls CSV’ link, which I then pass to a Files by Zapier step to extract the content as line items. One of those line items is the text content of the article.

However, my problem now is that this action sits inside a sub-Zap, and I need to return the extracted content to my main Zap. That return step fails because the value is too large. See screenshot:


I feel like I’m almost there, but have hit a wall.

Do you have any advice on solving the above error? I don’t know why it’s ‘too large’ because it’s just the raw text content of the article, and I can’t see how text content can be that memory intensive.
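
As a rough way to check how large the text really is, it may help to measure its encoded size rather than its character count. The sketch below assumes the value is sent as UTF-8; `utf8_size` is a hypothetical helper, and the actual Sub-Zap size limit is not stated anywhere in this thread. Arabic letters take two bytes each in UTF-8, so an Arabic article occupies roughly twice as many bytes as it has characters.

```python
def utf8_size(text: str) -> int:
    """Return the number of bytes `text` occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


# Three Arabic letters, written as escapes to avoid right-to-left display issues.
sample = "\u062e\u0628\u0631"

char_count = len(sample)        # number of characters
byte_count = utf8_size(sample)  # number of UTF-8 bytes
print(char_count, byte_count)
```

Running the same function on the real article text (for example in a Code by Zapier step) would show how close the value is to whatever limit applies.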

Hi @Ed M! 👋

I couldn’t see any existing bug reports or feature requests regarding the size limits with Sub-Zap by Zapier. So I’d recommend reaching out to our Support team regarding the size limit error you’re running into here.

That said, I wonder if it might be worth moving the Files by Zapier action to the main Zap, then in the Return From a Sub-Zap action you just give it the URL for the CSV file that was generated. Just thinking that approach might help to get around the size limit you’re running into here. 

Can you give that a try and let me know whether that approach works any better?
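
For anyone who would rather handle the CSV in a code step than in Files by Zapier, the idea above (return only the URL, then read the file in the main Zap) could be sketched roughly as follows. This is an illustration under assumptions, not a confirmed recipe: the column name `text` is a guess at how the Apify dataset labels the article body, and `fetch_csv` and `extract_column` are hypothetical helper names.

```python
import csv
import io
import urllib.request


def fetch_csv(url: str) -> str:
    """Download the CSV at `url` and return its contents as text."""
    with urllib.request.urlopen(url) as response:
        # utf-8-sig also strips a byte-order mark if the file starts with one.
        return response.read().decode("utf-8-sig")


def extract_column(csv_text: str, column: str = "text") -> list[str]:
    """Return the non-empty values of `column`, one per CSV row.

    The default column name "text" is an assumption about the dataset.
    """
    # newline="" lets the csv module handle line breaks inside quoted fields.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return [row[column] for row in reader if row.get(column)]
```

In the main Zap, the URL returned by the sub-Zap would be passed to `fetch_csv`, and the result to `extract_column`, so only a short URL string ever crosses the sub-Zap boundary.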


Thanks @SamB. I’ll try that and also submit a bug report. Will keep this thread updated. 


@SamB I skipped the test of that Return From a Sub-Zap action and ran the Zap regardless, and it seems that the sub-Zap actually worked. So the problem seems to be the error message that shows up when testing the action. I’ve raised a ticket about it.

By the way, I would much prefer to use Zapier’s own web parser for this action. Is there a reason why it was unable to extract the article content?


Thanks for letting me know, @Ed M. Looking forward to hearing from you! 🙂 


Thanks for the update here @Ed M. I’m so pleased the Zap worked correctly when it ran live, despite the error when testing in the Zap editor. 🎉
 

“By the way, I would much prefer to use Zapier’s own web parser for this action. Is there a reason why it was unable to extract the article content?”

Hmm, I expect there may have been an issue with the structure of the page content it was attempting to extract.
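
To illustrate how page structure alone can lead to a partial extraction, here is a minimal sketch of a parser that only collects text inside `<p>` elements. It is not how Zapier's web parser works internally (that isn't known from this thread); `ParagraphCollector` and `extract_paragraphs` are hypothetical names. Any article text the page places in other elements is silently skipped.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text found inside <p> elements and ignore everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: list[str] = []
        self._in_paragraph = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_paragraph = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            text = "".join(self._buffer).strip()
            if text:
                self.paragraphs.append(text)
            self._in_paragraph = False

    def handle_data(self, data):
        # Text outside a <p> element never reaches the output.
        if self._in_paragraph:
            self._buffer.append(data)


def extract_paragraphs(html: str) -> list[str]:
    """Return the text of each <p> element in `html`, in document order."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

A page that keeps part of its article body in `<div>` or `<span>` elements would come back incomplete from a parser like this, which matches the kind of symptom described above.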

That said, I did some further digging just now and spotted an existing bug report for a similar issue that might be related, but I can’t confirm that for definite from my side. The bug report I found mentioned not being able to extract the content at all, so it isn’t a 100% match for what you described previously. I think it would be worth reaching out to the Support team about this as well. They’ll be better able to determine whether it was definitely caused by an existing bug or whether there’s another reason why it wasn’t able to extract all the text.

Please do keep us posted on how it goes with them. I’m keen to know what the cause turns out to be! 🙂

