An opinion on…: Scrape webpages with node.js

The hard part about scraping data from websites is coming up with ways to quickly and reliably pick out pieces of the document object model (DOM). These days I spend a lot of time using the jQuery selector syntax to develop my site, so ideally I'd find a solution that can download a webpage and then give me jQuery-like functions and selectors to pick out pieces of the DOM. For this purpose, node.io uses a project called node-soupselect by default, but I found its selector syntax lacking, so I layered another project called cheerio on top. Whatever you do, don't use jsdom, as it is too slow and very strict in its processing of HTML.

 
