Introduction
Using Node.js, you can build almost anything you want: a chat website, a social network like LinkedIn or Facebook, and you can also fetch data from the Web.
Background
In the past, I wrote a post on the different options you can use to scrape data from the Web with the HtmlAgilityPack in the .NET development environment. You can achieve the same functionality using the powerful Node.js.
Node.js is a platform built on Chrome's JavaScript runtime for easily building fast, scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices. You can download it here: http://nodejs.org/.
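To see what "non-blocking" means in practice, here is a minimal sketch (the file name example.txt is just a placeholder): reading a file does not stop the program; the callback runs later, once the data is ready.

var fs = require('fs');

// Ask Node.js to read a file; the program does not wait for it to finish.
fs.readFile('example.txt', 'utf8', function (err, data) {
    if (err) throw err;
    console.log('File contents: ' + data);
});

// This line runs first, before the file contents are printed.
console.log('Still running while the file is being read...');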
Using the code
You will need to install Node.js, of course, and on top of it the following modules:
- npm install request: request lets us work with URLs in an easy way.
- npm install cheerio: Cheerio is jQuery for the server side (see the short sketch after this list).
- fs: this module ships with Node.js, so there is nothing to install; we use it to write files to disk.
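If you have not used Cheerio before, here is a minimal sketch of the idea (the HTML string is made up for illustration): you load a piece of HTML and then query it with the same selectors and methods you would use with jQuery in the browser.

var cheerio = require('cheerio');

// Load a small HTML fragment and query it like you would with jQuery.
var $ = cheerio.load('<ul><li class="item">one</li><li class="item">two</li></ul>');

$('li.item').each(function () {
    console.log($(this).text()); // prints "one", then "two"
});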
After you finish the installation process, create a JavaScript file to hold our code. The first thing you must do is require the modules we need in our application:
var request = require('request'), // fetches pages and images over HTTP
    cheerio = require('cheerio'), // parses the HTML on the server side
    fs = require('fs'),           // writes the downloaded files to disk
    urls = [];                    // will hold the image links we find
After that, we add the main call to fetch data from the website, parse it, and work on it. In this case, we will collect the locations of the images found on www.reddit.com, as shown below:
request('http://www.reddit.com/', function (err, resp, body)
{
    if (!err && resp.statusCode == 200)
    {
        var $ = cheerio.load(body);
        // Grab every post title link inside the front-page listing.
        $('a.title', '#siteTable').each(function () {
            var url = $(this).attr('href');
            // Keep only direct links to images hosted on i.imgur.com.
            if (url && url.indexOf('i.imgur.com') != -1)
            {
                urls.push(url);
            }
        });
Now let's do something with the data we have: we will store the images in a directory called "img", which you must create inside your working directory.
        // Download each image and write it into the img directory.
        for (var i = 0; i < urls.length; i++)
        {
            request(urls[i]).pipe(fs.createWriteStream('img/' + i + '.jpg'));
        }
    }
});
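One possible improvement, if you want to try it: not every image hosted on i.imgur.com is a .jpg, so instead of hard-coding the extension you could keep the one from the link itself. A small variation of the loop above, assuming the links end in a plain file extension, could look like this:

var path = require('path'); // ships with Node.js

for (var i = 0; i < urls.length; i++)
{
    // Use the link's own extension (.jpg, .png, .gif, ...), or fall back to .jpg.
    var ext = path.extname(urls[i]) || '.jpg';
    request(urls[i]).pipe(fs.createWriteStream('img/' + i + ext));
}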
Well done. Now, in the Node.js command prompt, type: node Scrapping.js (use the name you gave your file).
Now you have all the pictures from the website stored in your img directory.
Any comments are welcome!