Question: Does anyone know of any other smart ways to scrape data from websites? I am currently scraping data from our own website and turning it into personalized, automated emails and newsletters. These, however, are powered not by import.io (which I'm testing right now) but by an RSS feed generator, which continually generates RSS content from the website; that content is then processed by the newsletter or push-mail system. To be honest, though, I find that workflow a bit lame (even though it works perfectly).
Interesting. Do you mind me asking what platform your website is running on?
What is the actual website?
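```javascript
// Code by Zapier (JavaScript): inputData and callback are supplied by Zapier;
// add an Input Data field named "url" holding the page to fetch.
const url = inputData.url;
fetch(url)
  .then(res => {
    // Fail the step on HTTP errors (e.g. when the site is blocking you)
    if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`);
    return res.text();
  })
  // Pass the URL and the page's raw HTML on to the next steps
  .then(body => callback(null, { url, rawHTML: body }))
  .catch(callback);
```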
It brought back all of the info I needed. The fun part is then parsing it out with code or a Formatter step.
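As a sketch of that parsing step, a second Code by Zapier (JavaScript) step could pull link titles and URLs out of the `rawHTML` field with a regular expression. The helper name `extractLinks` and the pattern are assumptions for illustration: the pattern only handles simple `<a href="...">...</a>` tags, so you would adapt it to the real markup of the page (regex is brittle on arbitrary HTML).

```javascript
// Hypothetical helper: extract { title, href } pairs from anchor tags.
// Assumes simple markup like <a href="/de/job-123.html">Job title</a>;
// adjust the pattern to the actual HTML of the page you scrape.
function extractLinks(html, baseUrl) {
  const pattern = /<a\s+[^>]*href="([^"]+)"[^>]*>([\s\S]*?)<\/a>/gi;
  const results = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    // Strip nested tags and collapse whitespace in the link text
    const title = match[2].replace(/<[^>]+>/g, '').replace(/\s+/g, ' ').trim();
    // Resolve relative links against the page URL
    const href = new URL(match[1], baseUrl).href;
    // Skip links without visible text (e.g. image-only links)
    if (title) results.push({ title, href });
  }
  return results;
}

// Usage inside a Code by Zapier step (map "rawHTML" and "url" as Input Data):
// callback(null, { links: extractLinks(inputData.rawHTML, inputData.url) });
```

A regex keeps the step dependency-free; if the markup gets more complex, an HTML parser is the sturdier choice.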
There are a few things to keep in mind:
- If you run the Zap constantly, the site will temporarily block you; the same happens with Apify or Import.io.
- It works best on a plan that includes Paths.
- It's tedious, but it works really well!
- You'll need to know regex, but if you don't, feel free to PM me. I love this kind of stuff.
Hey
I'll have a look. I don't "know" regex (though I have tinkered with it successfully to get custom Google Analytics reporting), but then I don't "know" many other things that I nonetheless manage to make work.
I'll take you up on your offer when I get around to re-engineering this particular workflow.
PS: I'm already actively using Paths.
As mentioned above, it is hard to give an answer without knowing the website. I highly recommend using Google Sheets to scrape data: the built-in IMPORTXML function does exactly this, using XPath queries (it is easier than it sounds). Having the data in a Google Sheet then gives you a lot of freedom in how you manipulate it with Zaps etc.
https://www.distilled.net/blog/distilled/guide-to-google-docs-importxml/
or
https://rozhon.com/sheets-for-marketers/web-scraping-using-importxml-in-google-spreadsheets/
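For reference, an IMPORTXML formula takes the page URL and an XPath query, e.g. `=IMPORTXML("https://example.com/jobs", "//h2")` to list every `<h2>` heading (depending on the spreadsheet locale, the argument separator may be `;` instead of `,` when you type it by hand). If you generate such formulas from a script (Google Apps Script is JavaScript), a double quote inside a Sheets string literal has to be doubled. The helpers below are a hypothetical sketch of that escaping; the URL and the `job-list` class are placeholders.

```javascript
// Hypothetical helpers: build an IMPORTXML formula string for Google Sheets.
// Inside a Sheets string literal, a double quote is written as two double quotes.
function quoteForSheets(text) {
  return '"' + String(text).replace(/"/g, '""') + '"';
}

function buildImportXmlFormula(url, xpath) {
  return '=IMPORTXML(' + quoteForSheets(url) + ', ' + quoteForSheets(xpath) + ')';
}

// Example: all link targets inside a (hypothetical) element with class "job-list"
const formula = buildImportXmlFormula(
  'https://example.com/jobs',
  '//div[@class="job-list"]//a/@href'
);

// In Google Apps Script you could then write it into a cell, e.g.:
// SpreadsheetApp.getActiveSheet().getRange('A1').setFormula(formula);
```

Writing the XPath with single quotes (`//div[@class='job-list']//a/@href`) sidesteps the escaping entirely.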
FYI, the website in question (among others) from which I generate RSS feeds for further processing is https://www.oebu.ch/de/jobs-191.html
I used a Google Sheets add-on called ImportFromWeb in this example, but the theory and functionality are pretty much the same.
Great stuff, thanks!
Thanks
It looks like this has been solved, but another smart and easy way to scrape data into Zapier is Simplescraper.
The scraped data is sent to Zapier via a webhook, so no code is required. There's a quick guide here: https://simplescraper.io/docs/scraping-data-into-zapier/
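Simplescraper handles the delivery for you, but if you want to check what a Webhooks by Zapier "Catch Hook" trigger receives, a webhook delivery is essentially an HTTP POST with a JSON body. A minimal sketch, assuming Node 18+ (global `fetch`); the function name, the `{ rows }` payload shape, and the hook URL are illustrative placeholders, not Simplescraper's actual format.

```javascript
// Hypothetical sketch: POST scraped rows to a Zapier "Catch Hook" URL as JSON.
// fetchImpl defaults to the global fetch (Node 18+); pass another one for testing.
async function sendToZapier(hookUrl, rows, fetchImpl = fetch) {
  const res = await fetchImpl(hookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ rows }),
  });
  // Surface non-2xx responses instead of failing silently
  if (!res.ok) throw new Error(`Webhook returned HTTP ${res.status}`);
  return res.status;
}

// Usage (placeholder URL; copy yours from the Webhooks by Zapier trigger):
// sendToZapier('https://hooks.zapier.com/hooks/catch/XXXX/YYYY/', [
//   { title: 'Example job', href: 'https://example.com/job-1' },
// ]).then(status => console.log('Delivered with status', status));
```

Injecting `fetchImpl` keeps the function testable without a live hook.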