This excerpt is from the new book, ‘Sams Teach Yourself Node.js in 24 Hours’ authored by George Ornbo, published by Pearson/SAMS, Sept. 2012, ISBN 9780672335952, Copyright © 2013 by Pearson Education, Inc. For more info please visit the publisher site: www.informit.com/title/9780672335952
George Ornbo. Published by Sams. ISBN-10: 0-672-33595-6. ISBN-13: 978-0-672-33595-2.
- Receive data from Twitter’s streaming API
- Parse data received from Twitter’s streaming API
- Push third-party data out to clients in real-time
- Create a real-time graph
- Discover whether there is more love or hate in the world by using real-time data from Twitter
Streaming APIs
In Hour 13, "A Socket.IO Chat Server," you learned how to create
a chat server with Socket.IO and Express. This involved sending data from
clients (or browsers) to the Socket.IO server and then broadcasting it out to
other clients. In this hour, you learn about how Node.js and Socket.IO can also
be used to consume data directly from the Web and then broadcast the data to
connected clients. You will be working with Twitter’s streaming Application
Programming Interface (API) and pushing data out to the browser in real-time.
With Twitter’s standard API, the process for getting data is as
follows:
- You open a connection to the API server.
- You send a request for some data.
- You receive the data that you requested from the API.
- The connection is closed.
With Twitter’s streaming API, the process is different:
- You open a connection to the API server.
- You send a request for some data.
- Data is pushed to you from the API.
- The connection remains open.
- More data is pushed to you when it becomes available.
Streaming
APIs allow data to be pushed from the service provider whenever new data is
available. In the case of Twitter, this data can be extremely frequent and high
volume. Node.js is a great fit for this type of scenario where large numbers of
events are happening frequently as data is received. This hour represents
another excellent use case for Node.js and highlights some of the features that
make Node.js different from other languages and frameworks.
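To make the difference concrete, here is a minimal sketch that uses Node's built-in EventEmitter. It does not talk to Twitter: fakeStream and the strings it pushes are invented for illustration. It only shows the shape of the streaming pattern, where the consumer registers a listener once and the producer pushes data whenever it has some.

```javascript
// Hypothetical stand-in for a streaming connection; no network is involved.
var EventEmitter = require('events').EventEmitter;

var fakeStream = new EventEmitter();
var received = [];

// The consumer subscribes once; the "connection" then stays open.
fakeStream.on('data', function (item) {
  received.push(item);
});

// The producer pushes items as they become available.
fakeStream.emit('data', 'first item');
fakeStream.emit('data', 'second item');

console.log(received.length + ' items received');
```

With a request-response API, the consumer would have to ask again to get the second item. Here it simply arrives.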
Signing Up for Twitter
Twitter provides a huge amount of data to developers via a free,
publicly available API. Many Twitter desktop and mobile clients are built on
top of this API, but this is also open to developers to use however they want.
If you
do not already have a Twitter account, you need one for this hour. You can sign
up for an account for free at https://twitter.com/. It takes less than a
minute! Once you have a Twitter account, you
need to sign into the Twitter Developers website with your details at
http://dev.twitter.com/. This site provides
documentation and forums for anything to do with the Twitter API. The documentation is thorough, so if you want, you
can get a good understanding of what types of data you can request from the API
here.
Within the Twitter Developers website, you can also register
applications that you create with the Twitter API. You create a Twitter
application in this hour, so to register your application, do the following:
- Click the link Create an App.
- Pick a name for your application and fill out the form (see
Figure 14.1). Application names on Twitter have to be unique, so if you find
that the name has already been taken, choose another one.
Figure 14.1 Creating a Twitter application
Once you create your application, you need to generate an access
token and an access token secret to gain access to the API from your application.
- At the
bottom of the Details tab is a Create My Access Token button (see Figure 14.2).
Click this button to create an access token and an access token secret.
Figure 14.2 Requesting an access token
- When
the page refreshes, you see that values have been added for access token and
access token secret (see Figure 14.3). Now you are ready to start using the
API!
Figure 14.3 A successful creation of an access token
By The Way
OAuth Is a Way of Allowing Access to Online Accounts
OAuth is an open standard for authorization, typically used
within the context of web applications. It allows users to grant access to all
or parts of an account without handing over a username or password. When a user
grants an application access to their account, a unique token is generated.
This can be used by third-party services to access all or parts of a user’s
account. At any time, the user can revoke access, and the token will no longer
be valid, so the application no longer has access to the account.
Using Twitter’s API with Node.js
Once you create your application within the Twitter Developers
website and request an OAuth access token, you are ready to start using the
Twitter API. An excellent Node.js module called ntwitter is available for interacting with the Twitter
API. This module was initially developed by technoweenie (Rick
Olson), then jdub (Jeff Waugh), and is now maintained by AvianFlu (Charlie
McConnell). All the authors have done an amazing job of abstracting the
complexity of interacting with Twitter’s API to make it trivial to get data and
do things with it. You continue to use Express in this hour, so the
package.json file for the application will include the Express and ntwitter
modules.
{
  "name": "socket.io-twitter-example",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "express": "2.5.4",
    "ntwitter": "0.2.10"
  }
}
To connect to the API, you need four values: a consumer key, a consumer secret,
an access token key, and an access token secret. If you
requested these when you were setting up the application in the Twitter
Developers website, these will be available on the Details page for your
application. If you did not request them when you set up the application, you
need to do so now under the Details tab. Once you have the keys and secrets,
you can create a small Express server to connect to Twitter’s streaming API:
var app = require('express').createServer(),
    twitter = require('ntwitter');

app.listen(3000);

var twit = new twitter({
  consumer_key: 'YOUR_CONSUMER_KEY',
  consumer_secret: 'YOUR_CONSUMER_SECRET',
  access_token_key: 'YOUR_ACCESS_TOKEN_KEY',
  access_token_secret: 'YOUR_ACCESS_TOKEN_SECRET'
});
Of course, you need to remember to replace the values in the
example with your actual values. This is all you need to start interacting with
Twitter’s API! In this example, you answer the question, "Is there more love or
hate in the world?" by using real-time data from Twitter. You request tweets
from Twitter’s streaming API that mention the words "love" or "hate" and
perform a small amount of analysis on the data to answer the question. The
ntwitter module makes it easy to request this data:
twit.stream('statuses/filter', { track: ['love', 'hate'] }, function (stream) {
  stream.on('data', function (data) {
    console.log(data);
  });
});
This
requests data from the 'statuses/filter' endpoint that allows
developers to track tweets by keyword, location, or specific users. In this
case, we are interested in the keywords 'love' and 'hate'. The Express server opens a connection to the API server and
listens for new data being received. Whenever a new data item is received, it
writes the data to the console. In other words, you can see the stream live for
the keywords "love" and "hate" in the terminal.
Figure 14.4 Streaming data to the terminal
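Hard-coding keys and secrets in the server file is fine for a quick experiment, but they are easy to commit to version control by accident. As a minimal sketch, assuming you export four environment variables (the variable names below are invented for this example and are not part of ntwitter), a small helper could collect the values and fail early if one is missing:

```javascript
// Hypothetical helper: reads the four Twitter credentials from an
// environment-like object. The variable names are assumptions.
function loadCredentials(env) {
  var names = {
    consumer_key: 'TWITTER_CONSUMER_KEY',
    consumer_secret: 'TWITTER_CONSUMER_SECRET',
    access_token_key: 'TWITTER_ACCESS_TOKEN_KEY',
    access_token_secret: 'TWITTER_ACCESS_TOKEN_SECRET'
  };
  var credentials = {};
  Object.keys(names).forEach(function (option) {
    var value = env[names[option]];
    if (!value) {
      // Fail early so a missing value is noticed before connecting.
      throw new Error('Missing environment variable ' + names[option]);
    }
    credentials[option] = value;
  });
  return credentials;
}

// Possible usage (not executed here):
// var twit = new twitter(loadCredentials(process.env));
```

The returned object has the same four properties as the options passed to new twitter() earlier in this section.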
Extracting Meaning from the Data
So far, you
created a way to retrieve data in real-time from Twitter, and you saw a
terminal window move very fast with a lot of data. This is good, but in terms
of being able to understand the data, you are not able to answer the question
set. To work toward this, you need to be able to parse the tweets received and
extract information. Twitter provides data in JSON, a subset of JavaScript, and
this is great news for using it with Node.js. For each response, you can simply
use dot notation to retrieve the data that you are interested in. So, if you
wanted to view the screen name of the user along with the tweet, this can be easily
achieved:
twit.stream('statuses/filter', { track: ['love', 'hate'] }, function (stream) {
  stream.on('data', function (data) {
    console.log(data.user.screen_name + ': ' + data.text);
  });
});
Full documentation on the structure of the data received from
Twitter is available in the documentation for the status element, which can be viewed online at https://dev.twitter.com/docs/api/1/get/statuses/show/%3Aid.
Under the section "Example Request," you can see the data structure for a status response. Using dot notation on the data
object returned from Twitter, you are able to access any of these data points. For example, if you
want the URL for the user, you can use data.user.url. Here is the full data available for the
user who posted the tweet:
"user": {
  "profile_sidebar_border_color": "eeeeee",
  "profile_background_tile": true,
  "profile_sidebar_fill_color": "efefef",
  "name": "Eoin McMillan ",
  "profile_image_url": "http://a1.twimg.com/profile_images/1380912173/Screen_shot_2011-06-03_at_7.35.36_PM_normal.png",
  "created_at": "Mon May 16 20:07:59 +0000 2011",
  "location": "Twitter",
  "profile_link_color": "009999",
  "follow_request_sent": null,
  "is_translator": false,
  "id_str": "299862462",
  "favourites_count": 0,
  "default_profile": false,
  "url": "http://www.eoin.me",
  "contributors_enabled": false,
  "id": 299862462,
  "utc_offset": null,
  "profile_image_url_https": "https://si0.twimg.com/profile_images/1380912173/Screen_shot_2011-06-03_at_7.35.36_PM_normal.png",
  "profile_use_background_image": true,
  "listed_count": 0,
  "followers_count": 9,
  "lang": "en",
  "profile_text_color": "333333",
  "protected": false,
  "profile_background_image_url_https": "https://si0.twimg.com/images/themes/theme14/bg.gif",
  "description": "Eoin's photography account. See @mceoin for tweets.",
  "geo_enabled": false,
  "verified": false,
  "profile_background_color": "131516",
  "time_zone": null,
  "notifications": null,
  "statuses_count": 255,
  "friends_count": 0,
  "default_profile_image": false,
  "profile_background_image_url": "http://a1.twimg.com/images/themes/theme14/bg.gif",
  "screen_name": "imeoin",
  "following": null,
  "show_all_inline_media": false
}
There is much more information available with each response
including geographic coordinates, whether the tweet was retweeted, and more.
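One thing to keep in mind when using dot notation is that a streaming connection can also deliver messages that are not tweets, such as deletion notices, and these do not carry user or text fields. The following is a minimal sketch of a defensive helper. The name formatTweet and the sample objects are invented for illustration.

```javascript
// Hypothetical helper: returns "screen_name: text" for a status,
// or null when the expected fields are absent.
function formatTweet(data) {
  if (!data || !data.user || typeof data.text !== 'string') {
    return null;
  }
  return data.user.screen_name + ': ' + data.text;
}

// A status-like object is formatted; anything else is skipped.
console.log(formatTweet({ user: { screen_name: 'imeoin' }, text: 'I love Node.js' }));
console.log(formatTweet({ delete: { status: { id: 1 } } }));
```

Inside the data listener, you could call this helper and only log the result when it is not null.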
Pushing Data to the Browser
Now that data from Twitter is in a more digestible format, you
can push this data out to connected browsers using Socket.IO and use some
client-side JavaScript to display the tweets. This is similar to the patterns
that you saw in Hours 12 and 13 where data is received by a Socket.IO server
and then broadcast to connected clients. To use Socket.IO, it must first be
added as a dependency in the package.json file:
{
  "name": "socket.io-twitter-example",
  "version": "0.0.1",
  "private": true,
  "dependencies": {
    "express": "2.5.4",
    "ntwitter": "0.2.10",
    "socket.io": "0.8.7"
  }
}
Then, Socket.IO must be required in the main server file and
instructed to listen to the Express server. This is exactly the same as the
examples that you worked through in Hours 12 and 13:
var app = require('express').createServer(),
    twitter = require('ntwitter'),
    io = require('socket.io').listen(app);
The streaming API request can now be augmented to push the data
out to any connected Socket.IO clients whenever a new data event is received:
twit.stream('statuses/filter', { track: ['love', 'hate'] }, function (stream) {
  stream.on('data', function (data) {
    io.sockets.volatile.emit('tweet', {
      user: data.user.screen_name,
      text: data.text
    });
  });
});
Instead of logging the data to the console, you are now doing
something useful with the data by pushing it out to connected clients. A simple
JSON structure is created to hold the name of the user and the tweet. If you
want to send more information to the browser, you could simply extend the JSON
object to hold other attributes.
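As a minimal sketch of extending the object, the helper below adds the user's display name and URL, both of which appear in the user data shown earlier. The function name buildPayload and the sample object are invented for this example.

```javascript
// Hypothetical helper that builds the object passed to emit().
// user.name and user.url are fields from the user data shown earlier.
function buildPayload(data) {
  return {
    user: data.user.screen_name,
    name: data.user.name,
    url: data.user.url,
    text: data.text
  };
}

var sample = {
  text: 'I love Node.js',
  user: { screen_name: 'imeoin', name: 'Eoin McMillan', url: 'http://www.eoin.me' }
};
console.log(buildPayload(sample));
```

Keeping the payload small matters here, because one object is sent to every connected client for every tweet received.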
You may have
noticed that, instead of using io.sockets.emit as you did
in Hours 12 and 13, you are now using io.sockets.volatile.emit. This is an additional method provided by Socket.IO for
scenarios where certain messages can be dropped. A message may fail to arrive because of network
issues or because a user is in the middle of a request-response cycle, and this is
more likely when high volumes of messages are being sent to clients.
By using the volatile method, you accept that a particular client may miss a
message, which is fine here because another tweet will arrive moments later.
The Express server is also instructed to serve a single HTML
page so that the data can be viewed in a browser.
app.get('/', function (req, res) {
  res.sendfile(__dirname + '/index.html');
});
On the client side (or browser), some simple client-side
JavaScript is added to the index.html file to listen for new tweets being sent to the browser and display
them to the user. The full HTML file is available in the example that follows:
<ul class="tweets"></ul>
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script src="/socket.io/socket.io.js"></script>
<script>
  var socket = io.connect();
  jQuery(function ($) {
    var tweetList = $('ul.tweets');
    socket.on('tweet', function (data) {
      tweetList.prepend('<li>' + data.user + ': ' + data.text + '</li>');
    });
  });
</script>
An empty unordered list is added to the DOM (Document Object
Model), and this is filled with a new list item containing the screen name of the user and the
tweet each time a new tweet is received. This uses jQuery’s prepend() method to insert data received into a list item within the unordered list. This has the effect of creating a stream on the
page.
Now, whenever Socket.IO pushes a new tweet event out, the browser
receives it and writes it to the page immediately. Instead of viewing the stream of tweets
in a terminal, you can now view it in the browser!
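Because the list item is built by string concatenation, any markup inside a tweet would be interpreted by the browser as HTML. This is not covered in the original example, but as a hedged sketch, a small escaping function (the name escapeHtml is invented here) could be applied to the values before they are inserted:

```javascript
// Hypothetical helper: escapes characters that have special meaning in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// In the browser, this could be used as:
// tweetList.prepend('<li>' + escapeHtml(data.user) + ': ' + escapeHtml(data.text) + '</li>');
console.log(escapeHtml('<b>love</b> & "hate"'));
```

The ampersand is replaced first so that the entities produced by the later replacements are not escaped a second time.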
Creating a Real-Time Lovehateometer
Although the application can now stream tweets to a browser
window, it is still not very useful. It is still impossible to answer the
question of whether there is more love or hate in the world. To answer the
question, you need a way to visualize the data. Assuming that the tweets
received from the API are indicative of human sentiment, you set up several
counters on the server that increment when the words "love" and "hate" are
mentioned in the streaming data that is received. Furthermore, by maintaining
another counter for the total number of tweets with either love or hate in
them, you can calculate whether love or hate is mentioned more often. With this
approach, it is possible to say—in unscientific terms—that there is x% of love
and y% of hate in the world.
To be able to show data in the browser, you need counters on the
server to hold:
- The total number of tweets containing "love" or "hate"
- The total number of tweets containing "love"
- The total number of tweets containing "hate"
This
can be achieved by initializing variables and setting these counters to zero on
the Node.js server:
var app = require('express').createServer(),
    twitter = require('ntwitter'),
    io = require('socket.io').listen(app),
    love = 0,
    hate = 0,
    total = 0;
Whenever new data is received from the API, the love counter
will be incremented if the word "love" is found and so on. JavaScript’s indexOf() string function can be used to look for words within a tweet
and provides a simple way to analyze the content of tweets:
twit.stream('statuses/filter', { track: ['love', 'hate'] }, function (stream) {
  stream.on('data', function (data) {
    var text = data.text.toLowerCase();
    if (text.indexOf('love') !== -1) {
      love++;
      total++;
    }
    if (text.indexOf('hate') !== -1) {
      hate++;
      total++;
    }
  });
});
Because some tweets may contain both "love" and "hate," the
total is incremented each time a word is found. This means that the total counter represents the
total number of times "love" or "hate" was mentioned in a tweet rather than the total number of
tweets.
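The counting behavior is easier to see in isolation. The following sketch pulls the same logic into a standalone function so it can be run without a Twitter connection. The function name tally, the counters object, and the sample strings are invented for illustration.

```javascript
// Hypothetical pure function mirroring the counting logic above.
function tally(counters, text) {
  var lower = text.toLowerCase();
  if (lower.indexOf('love') !== -1) {
    counters.love++;
    counters.total++;
  }
  if (lower.indexOf('hate') !== -1) {
    counters.hate++;
    counters.total++;
  }
  return counters;
}

var counters = { love: 0, hate: 0, total: 0 };
tally(counters, 'I LOVE to hate Mondays'); // contains both words
tally(counters, 'Whatever');               // "whatever" contains "hate"
console.log(counters);
```

The first string increments the total twice. The second shows a limitation of indexOf(): it matches substrings, so a word such as "whatever" is counted as a mention of "hate".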
Now that the application is maintaining a count of the
occurrences of words, this data can be added to the tweet emitter and pushed to connected clients in
real-time. Some simple calculation is also used to send the values as a percentage of the total
number of mentions counted:
io.sockets.volatile.emit('tweet', {
  user: data.user.screen_name,
  text: data.text,
  love: (love / total) * 100,
  hate: (hate / total) * 100
});
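One edge case is worth noting: in JavaScript, 0 / 0 evaluates to NaN, so if an event were emitted before either counter had been incremented, the browser would receive NaN for both values. As a minimal sketch, a helper (the name percentages is invented for this example) could guard against that:

```javascript
// Hypothetical helper: returns love and hate as percentages of total,
// falling back to 0 when nothing has been counted yet.
function percentages(love, hate, total) {
  if (total === 0) {
    return { love: 0, hate: 0 };
  }
  return {
    love: (love / total) * 100,
    hate: (hate / total) * 100
  };
}

console.log(percentages(3, 1, 4));
console.log(percentages(0, 0, 0));
```

The two values returned by such a helper could be passed as the love and hate properties of the emitted object.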
On the client side, by using an unordered list and some
client-side JavaScript, the browser can receive the data and show it to users. Before any data is
received from the server, the values are set to zero:
<ul class="percentage">
<li class="love">0</li>
<li class="hate">0</li>
</ul>
Finally, a client-side listener can be added to receive the
tweet event and replace the percentage values with the ones received from the server. By starting the
server and opening the browser, you can now answer the question!
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script src="/socket.io/socket.io.js"></script>
<script>
  var socket = io.connect();
  jQuery(function ($) {
    var tweetList = $('ul.tweets'),
        loveCounter = $('li.love'),
        hateCounter = $('li.hate');
    socket.on('tweet', function (data) {
      tweetList.prepend('<li>' + data.user + ': ' + data.text + '</li>');
      loveCounter.text(data.love + '%');
      hateCounter.text(data.hate + '%');
    });
  });
</script>
Adding a Real-Time Graph
The application is now able to answer the question. Hurray! In
terms of visualization, though, it is still just data. It would be great if the application
could generate a small bar graph that moved dynamically based on the data received. The server is
already sending this data to the browser so this can be implemented entirely using client-side
JavaScript and some CSS. The application has an unordered list containing the percentages,
and this is perfect to create a simple bar graph. The unordered list will be amended slightly so
that it is easier to style. The only addition here is to wrap the number in a span tag:
<ul class="percentage">
  <li class="love">
    <span>0</span>
  </li>
  <li class="hate">
    <span>0</span>
  </li>
</ul>
Some CSS can then be added to the head of the HTML document that
makes the unordered list look like a bar graph. The list items represent the bars with
colors of pink to represent love and black to represent hate:
<style>
  ul.percentage { width: 100%; }
  ul.percentage li { display: block; width: 0; }
  ul.percentage li span { float: right; display: block; }
  ul.percentage li.love { background: #ff0066; color: #fff; }
  ul.percentage li.hate { background: #000; color: #fff; }
</style>
Finally, some client-side JavaScript allows the bars (the list
items) to be resized dynamically based on the percentage values received from the server:
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script src="/socket.io/socket.io.js"></script>
<script>
  var socket = io.connect();
  jQuery(function ($) {
    var tweetList = $('ul.tweets'),
        loveCounter = $('li.love'),
        hateCounter = $('li.hate'),
        loveCounterPercentage = $('li.love span'),
        hateCounterPercentage = $('li.hate span');
    socket.on('tweet', function (data) {
      loveCounter.css("width", data.love + '%');
      loveCounterPercentage.text(Math.round(data.love * 10) / 10 + '%');
      hateCounter.css("width", data.hate + '%');
      hateCounterPercentage.text(Math.round(data.hate * 10) / 10 + '%');
      tweetList.prepend('<li>' + data.user + ': ' + data.text + '</li>');
    });
  });
</script>
Whenever a
new tweet event is received from Socket.IO, the bar graph is updated by
dynamically setting the CSS width of the list items with the percentage values
received from the server. This has the effect of adjusting the graph each time
a new tweet event is received. You have created a real-time graph!
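The label text uses Math.round(value * 10) / 10, which rounds to one decimal place. As a small sketch, the same expression can be tried on its own in Node.js. The function name formatPercent is invented for this example.

```javascript
// Hypothetical helper mirroring the rounding used for the bar labels.
function formatPercent(value) {
  // Multiply by 10, round to an integer, then divide by 10
  // to keep a single decimal place.
  return Math.round(value * 10) / 10 + '%';
}

console.log(formatPercent(66.6666));
console.log(formatPercent(33.3333));
```

The bar widths use the unrounded values, so only the visible labels are shortened.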
The application that you created provides a visual
representation of whether there is more love than hate in the world based on real-time
data from Twitter. Granted, this is totally unscientific, but it does showcase
the capabilities of Node.js and Socket.IO to receive large amounts of data and
to push it out to the browser. With a little more CSS work, the application can
be styled to look better (see Figure 14.9).
Figure 14.9 The finished application with additional styling
If you want to run this example yourself, this version is
available in the code for this book as hour14/example06.
Summary
In this hour, you answered a fundamental question about human
nature using Node.js, Twitter, and Socket.IO. Not bad for an hour’s work! At
the time of writing, there is more love in the world, so if you take nothing
else from this hour, rejoice! You learned how a Node.js server can receive
large amounts of data from a third-party service and push it out to the browser
in real-time using Socket.IO. You saw how to manipulate the data to extract
meaning from it and performed simple calculations on the data to extract percentage
values. Finally, you added some client-side JavaScript to receive the data and
create a real-time graph. This hour showcased many of the strengths of Node.js,
including the ease with which data can be sent between the server and browser, the
ability to process large amounts of data, and the strong support for
networking.
Q&A
Q. Are there other streaming APIs that I can use to create
applications like this?
A. Yes. An increasing number of streaming
APIs is becoming available to developers. At the time of writing, some APIs of
interest include Campfire, Salesforce, Datasift, and Apigee, with many more
expected to be created.
Q. How accurate is this data?
A. Not very. This data is based on the
"statuses/filter" method from Twitter’s streaming API. More information about
what goes into this feed is available at https://dev.twitter.com/docs/streaming-api/methods. In
short, do not base any anthropological studies on it.
Q. Can I save this data somewhere?
A. The application created in this hour
does not persist data anywhere, so if the server is stopped, the counters and
percentages are reset. Clearly, the longer that data can be collected, the more
accurate the results. The application could be extended to store the counters
with a data store that can handle high volumes of writes, such as Redis. This is
outside the scope of this hour, though!
Workshop
This workshop contains quiz questions and exercises to help
cement your learning in this hour.