An opinion on…: Scrape webpages with node.js

The hard part of scraping data from websites is finding a quick, reliable way to pick pieces out of the document object model (DOM). These days I spend a lot of time using jQuery selector syntax to develop my site, so ideally I want a solution that downloads a webpage and then gives me jQuery-like functions and selectors for picking pieces out of the DOM. For this purpose, node.io uses a project called node-soupselect by default, but I found its selector syntax lacking, so I layered another project called cheerio on top. Whatever you do, don’t use jsdom: it is too slow and very strict in how it processes HTML.

 
