Node has grown very quickly in popularity due to its excellent performance and ease of use. It introduces event-driven techniques commonly found in desktop applications to the world of hosted server applications. Google's V8 JavaScript engine is core to the Node framework and the use of JavaScript as the primary language for Node has provided it with a large community of potential developers already familiar with the language. This low learning curve in addition to its excellent performance has contributed greatly to its fast growth.
As with most things, "there is nothing new under the sun", so let's look at a real world example. Many streamlined businesses are way ahead of us when it comes to applying event-driven techniques. After putting in my order at Starbucks, the cashier informs the barista what to make. At this point I step aside and wait while the barista makes my drink and the cashier takes the next order. Once my drink is ready they call my name to come pick it up. This strategy allows them to service many customers at the same time with one cashier. Now imagine what would happen if I did not step aside until my drink was complete. Every customer would have to wait for the full duration of all previous transactions before beginning their own. This would force them to spend a lot of time standing in line. The event-driven process enables us to quickly place an order, sit, relax, and do other tasks while we wait to be called. This also allows Starbucks to maximize profits by minimizing the number of cashiers they need to hire.
Internally, Node keeps an event loop of functions to be completed in sequence, so the code you write never has to deal with the complexities of multi-threading. On top of this, Node provides "non-blocking IO": calls to IO are handed off to run in the background (for example on another thread), and when the work is completed the callback function is put in the event loop. In this way Node can service many IO operations at once while still remaining responsive to events that do not require any IO. This dramatically improves performance because typical IO calls are very slow compared to in-memory operations.
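The ordering described above can be seen in a minimal sketch that uses only Node's built-in fs module. The listWorkingDirectory helper and the order array are illustrative names, not part of the examples that follow.

```javascript
'use strict';
var fs = require('fs');

var order = [];

// Hypothetical helper: asks for a directory listing and returns immediately.
function listWorkingDirectory(done) {
    order.push('listing requested');
    fs.readdir(process.cwd(), function (err, names) {
        // Invoked later from the event loop, once the IO has finished.
        order.push('listing finished');
        done(err, names);
    });
    // Reached before the callback above has had a chance to run.
    order.push('kept working');
}

listWorkingDirectory(function (err, names) {
    if (err) {
        console.warn(err.message);
        return;
    }
    console.log(order.join(' -> '));
    console.log('found %d entries', names.length);
});
```

The 'kept working' entry is recorded before 'listing finished' because the callback only runs after the code that requested the IO has returned to the event loop.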
Note that this callback mechanism will lose the stack trace showing which function made the initial IO request. This can be problematic when debugging, and work is under way to address it. However, it also keeps the stack from getting very deep. Shorter stacks are beneficial because recursive algorithms can be used in places where they would traditionally produce a stack overflow. Caution is still required, because some callbacks, especially those not making IO requests, do not go through the event loop and therefore add to the stack growth.
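The caveat about callbacks that bypass the event loop can be sketched as follows. Here syncStep is a hypothetical function that "completes" immediately and calls back on the same stack, so every step makes the stack deeper.

```javascript
'use strict';

// Hypothetical operation that calls back immediately, without the event loop.
function syncStep(callback) {
    callback();
}

// Counts down by recursing through the callback of each step.
function countDownSync(n, done) {
    if (n === 0) {
        return done();
    }
    syncStep(function () {
        countDownSync(n - 1, done);
    });
}

var overflowed = false;
try {
    countDownSync(1000000, function () {});
} catch (e) {
    // V8 reports an exhausted stack as a RangeError
    overflowed = e instanceof RangeError;
}
console.log('synchronous callbacks overflowed the stack:', overflowed);
```

Had syncStep performed real IO, each callback would have started on a fresh stack and the recursion depth would not have accumulated.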
Node was developed on the Linux platform, and functions best there, but it also works on Windows and OS X. The examples here were run on Ubuntu 11.10 but they should work without much effort on the other platforms. Installers for Windows and OS X can be downloaded from the main site http://nodejs.org/.
Unfortunately, the Ubuntu repository does not keep up with the latest releases due to rapid growth surrounding Node. You can download the Node source from http://nodejs.org/. However, to simplify the install process for the latest stable version on Ubuntu just run the following:
sudo apt-get install python-software-properties
sudo add-apt-repository ppa:chris-lea/node.js
sudo apt-get update
sudo apt-get install nodejs
sudo apt-get install npm
Notice that the last step installed the Node package manager. Node is capable of supporting many kinds of network applications, but it is just a platform. Most of the heavy lifting is done by a rich set of community-supported modules. Because of the large number of developers familiar with JavaScript, the list of useful modules has grown very quickly. To help in finding that perfect module to simplify your task and brighten your day, the community has put together a cool NPM search tool.
The examples below will use the following modules. Take some time to visit their sites. These modules have a lot to offer that we do not have time to demonstrate here.
Use npm to install the modules we need for our simple examples.
npm install mongodb --mongodb:native
npm install socket.io
npm install async
npm install optimist
npm install node-static
The extra argument for installing the mongodb driver demonstrates that modules are not restricted to JavaScript only. In this case the mongodb driver has its own parser in C++ which can be compiled to the native platform for better performance. We will not be using this in the example because the JavaScript implementation turns out to be very performant. In general, modules should stick to JavaScript for simplicity and cross-platform support unless there is a compelling reason to jump out into native code.
Download the appropriate production release of MongoDB for your
platform and extract it into your home folder ~
http://www.mongodb.org/downloads
At the ~$ terminal enter the following to start up
the database.
cd mongo*
mkdir data
./bin/mongod --dbpath ./data
The database server is ready for use when you see
admin web console waiting for connections on port 28017
If you need to stop the server press ctrl-c. Once the server has
stopped, delete all the files in the ./data folder
to start again fresh.
Review the writeExample.js file below. Its usage is
very simple.
node writeExample.js [--async] [--forever]
Each call to the writeExample.js script will write
100000 random documents to the database. If the --forever
argument is passed, it will repeat until ctrl-c is pressed.
This is used as a source of ongoing events for the rest of the
example. MongoDB supports capped collections which have a
fixed size and write over the oldest documents when full.
These are commonly used for logs. However, it will also work well
for the simple example here.
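As a rough analogy only, the overwrite-the-oldest behaviour of a capped collection can be sketched in memory. CappedList is a hypothetical name, and it is capped by document count for simplicity, whereas the example in this article caps its collection by size in bytes.

```javascript
'use strict';

// Hypothetical in-memory stand-in for a capped collection.
function CappedList(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
}

CappedList.prototype.insert = function (doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) {
        this.docs.shift(); // discard the oldest document to make room
    }
};

var log = new CappedList(3);
['a', 'b', 'c', 'd', 'e'].forEach(function (name) {
    log.insert({ name: name });
});

// Only the three newest documents remain.
var remaining = log.docs.map(function (doc) { return doc.name; }).join(',');
console.log(remaining);
```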
Without the --async argument the example will
synchronously write to the database by waiting for each write to
complete before starting the next. With the --async
argument it fires off all the document writes without waiting for
any response.
Applications written in Node may be difficult to follow at first, but with more experience they become straightforward. In traditional applications the sequential nature of the program flows from the top down. This is still true in Node. However, the convention is to make use of callbacks to continue processing after requesting any IO operations.
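A minimal sketch of this convention, using only built-in modules: the flow still reads from the top down, but each step continues inside the callback of the step before it. The writeThenRead helper and the temporary file name are assumptions made for illustration.

```javascript
'use strict';
var fs = require('fs'),
    os = require('os'),
    path = require('path');

// Hypothetical helper: write a file, read it back, then clean up.
// Errors are passed along as the first callback argument.
function writeThenRead(file, text, callback) {
    fs.writeFile(file, text, function (err) {
        if (err) { return callback(err); }
        fs.readFile(file, 'utf8', function (err, contents) {
            if (err) { return callback(err); }
            fs.unlink(file, function (err) {
                callback(err, contents);
            });
        });
    });
}

var demoFile = path.join(os.tmpdir(), 'node-callback-demo-' + process.pid + '.txt');
writeThenRead(demoFile, 'hello from a callback', function (err, contents) {
    if (err) {
        console.warn(err.message);
        return;
    }
    console.log(contents);
});
```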
Node makes use of require('<module name>') to
load needed modules into scope. For example, the optimist
module greatly simplifies command line arguments parsing by
converting the arguments directly into properties. Notice how
the argv object is used inside insertOne() and wrapUp() after the
arguments are converted at the end of the opening var statement.
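To illustrate the idea of turning arguments into properties, here is a deliberately simplified stand-in. It is not how optimist is implemented, and parseFlags is a hypothetical name; the real module handles values, aliases and usage text as well.

```javascript
'use strict';

// Hypothetical helper: turns "--name" arguments into boolean properties.
function parseFlags(args) {
    var flags = {};
    args.forEach(function (arg) {
        if (arg.indexOf('--') === 0 && arg.length > 2) {
            flags[arg.slice(2)] = true;
        }
    });
    return flags;
}

// process.argv holds the node binary and script path first, so skip those two.
var flags = parseFlags(process.argv.slice(2));
console.log(flags.async ? 'async writes requested' : 'synchronous writes requested');
```

Run as `node sketch.js --async --forever` (the file name is arbitrary), the flags object would have both async and forever set to true.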
This example must write a fixed number of documents into a test
database but it does not make use of for, each, while
or any other traditional loop constructs. Instead callbacks
are used and situations that require sequential processing are
easily implemented by the helper methods from the async
module. The async module makes use of the event queue within
the Node framework to break work up and prevent event queue
blocking.
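The kind of helper described here can be sketched in a few lines. This series function is a hypothetical, much-simplified stand-in rather than the async module's own code, and it uses setImmediate (available in current Node releases) to hand control back to the event loop between tasks.

```javascript
'use strict';

// Hypothetical helper: runs tasks one after another. Each task receives a
// callback, and the next task starts only after that callback is called.
function series(tasks, done) {
    var results = [];
    function runTask(index) {
        if (index === tasks.length) {
            return done(null, results);
        }
        tasks[index](function (err, result) {
            if (err) { return done(err, results); }
            results.push(result);
            // defer so tasks that call back synchronously do not deepen the stack
            setImmediate(function () { runTask(index + 1); });
        });
    }
    runTask(0);
}

series([
    function (callback) { setTimeout(function () { callback(null, 'first'); }, 20); },
    function (callback) { callback(null, 'second'); }
], function (err, results) {
    console.log(results.join(', '));
});
```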
MongoDB can support both synchronous and asynchronous writes.
The write example makes use of both of these in the insertOne()
function. In the async side of the conditional, the write is
sent to the database, but insertOne() is not directly
called again. Instead, it's passed into the process.nextTick()
function to be called upon during the next pass of the event
queue. If the code had made a traditional recursive call with
the large number of documents to be inserted it would have resulted
in a stack overflow. The use of process.nextTick()
allows all asynchronous calls in the event queue to be processed
before the next call to insertOne() is processed.
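The pattern can be reduced to a small sketch with no database involved. repeatWithNextTick is a hypothetical helper that repeats a step a fixed number of times without a loop construct and without deepening the stack.

```javascript
'use strict';
var TOTAL = 100000; // same order of magnitude as TOTAL_DOCS in the example

// Hypothetical helper: calls step() `total` times (total must be at least 1),
// scheduling each repeat with nextTick instead of recursing directly.
function repeatWithNextTick(total, step, done) {
    var count = 0;
    function one() {
        step(count);
        count = count + 1;
        if (count < total) {
            process.nextTick(one); // starts on a fresh stack
        } else {
            done(count);
        }
    }
    one();
}

repeatWithNextTick(TOTAL, function () {}, function (count) {
    console.log('completed %d steps without a stack overflow', count);
});
```

One caveat for readers on current Node releases: nextTick callbacks now run before pending IO events are processed, so setImmediate is the closer match if the goal is to let other events run between steps.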
The second half of the conditional could have implemented a simple
call to insertOne() after collection.insert().
However, this example is a little more interesting. In this
part, async.parallel() is used to generate the next
random document and update the progress bar while waiting on the IO
to complete. The async.parallel() calls its
callback only when all the other functions have each called their
callback. In this way our example can take advantage of
background processes without having to write a complex,
multi-threaded implementation.
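The behaviour relied on here can be sketched with a simplified stand-in. This parallel function is hypothetical and is not the async module's implementation; a timer plays the part of the slow database write.

```javascript
'use strict';

// Hypothetical helper: starts every task at once and calls done() only
// after each task has called its own callback (or as soon as one fails).
function parallel(tasks, done) {
    var pending = tasks.length,
        failed = false;
    if (pending === 0) { return done(null); }
    tasks.forEach(function (task) {
        task(function (err) {
            if (failed) { return; }
            if (err) {
                failed = true;
                return done(err);
            }
            pending = pending - 1;
            if (pending === 0) { done(null); }
        });
    });
}

var nextDoc;
parallel([
    // stands in for the slow database write
    function (callback) { setTimeout(callback, 50); },
    // work done while the "write" is still in flight
    function (callback) {
        nextDoc = { firstname: 'Bob_1' };
        callback();
    }
], function (err) {
    console.log('write finished; next document ready:', JSON.stringify(nextDoc));
});
```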
'use strict';
var mongo = require('mongodb'), // driver from https://github.com/christkv/node-mongodb-native
server = new mongo.Server('127.0.0.1', 27017, {}),
db = new mongo.Db('testDB', server, {native_parser: false}),
TOTAL_DOCS = 100000, count = 0, progress = 0, startTime = new Date(), nextDoc,
async = require('async'),
progressTitle = " 0% _ _ _ _ _ _ _ _ _ _ _ _ 25% _ _ _ _ _ _ _ _ _ _ _ _ _ 50% _ _ _ _ _ _ _ _ _ _ _ _ _ 75%_ _ _ _ _ _ _ _ _ _ _ _ 100%",
progressFactor = TOTAL_DOCS / progressTitle.length,
MAX_PARKING_FLOORS = 12, MAX_PARKING_AISLES = 30, MAX_PARKING_ROWS = 50, MAX_SCORE = 100,
data = {
names: [ 'Bob', 'Chuck', 'Mike', 'Moe', 'Larry', 'Curly', 'Shemp', 'Rick', 'Don', 'Sam' ],
foods: [ 'pizza', 'tacos', 'ice cream', 'steak', 'fried chicken' ]
},
argv = require('optimist').usage('Usage: node writeExample.js [--async] [--forever]').argv;
// randomly returns true or false
var randomBoolean = function () {
return Math.random() > 0.5;
};
var updateProgress = function () {
while (progress < (count / progressFactor)) {
process.stdout.write('=');
progress = progress + 1;
}
};
var randomDocument = function () {
// random birthday, food list, course score and parking location
var rnd1 = Math.random(), rnd2 = Math.random(), rnd3 = Math.random(),
personName = data.names[count % data.names.length] + '_' + count,
bDate = new Date(1900 + (rnd1 * 100), rnd2 * 12, rnd3 * 28),
favFoods = data.foods.filter(randomBoolean), artCourse = {
subject: "art",
finalscore: rnd3 * MAX_SCORE
}, parkingSpot = {
row: Math.round(rnd1 * MAX_PARKING_ROWS),
aisle: Math.round(rnd2 * MAX_PARKING_AISLES),
floor: Math.round(rnd3 * MAX_PARKING_FLOORS)
};
return {
firstname: personName,
course: artCourse,
parking: parkingSpot,
birthday: bDate,
food: favFoods
};
};
var wrapUp;
var insertOne = function (collection) {
if (argv.async) {
//look ma no loop!
count = count + 1;
updateProgress();
collection.insert(randomDocument());//fire, forget and hope
//direct recursion in node is not recommended due to stack size limitations
//and it would prevent other events from getting processed
//nextTick however puts this function on the end of the event queue to be called
//after the next pass.
process.nextTick(function () {
if (count < TOTAL_DOCS) {
insertOne(collection);
} else {
wrapUp(collection);
}
});
} else {
//callback event from the insert triggers the next
async.parallel([ function (callback) {
collection.insert(nextDoc,
//the call back will not be called until new doc is saved to disk
{
safe: true
}, callback);
},
//because we are waiting on the write we can get this work done early
function (callback) {
count = count + 1;
updateProgress();
nextDoc = randomDocument();
callback();
} ],
//only called when all the functions in the above array have called their callback
function () {
if (count < TOTAL_DOCS) {
insertOne(collection);
} else {
wrapUp(collection);
}
});
}
};
var writeData = function (collection) {
count = 0;
progress = 0;
startTime = new Date();
collection.count(function (err, originalCount) {
console.log("");
console.log("total docs before insert:" + originalCount);
console.log((argv.async ? "now starting %d async writes"
: "now starting %d synchronous writes"), TOTAL_DOCS);
console.log(progressTitle);
insertOne(collection);
});
};
wrapUp = function (collection) {
process.stdout.write('\n');
collection.count(function (err, finalCount) {
console.log("total docs after insert:" + finalCount);
var sentDocsPerMs = count / (new Date().getTime() - startTime.getTime());
console.log("sent %d documents at %d docs/ms to the server",
TOTAL_DOCS, sentDocsPerMs);
if (argv.forever) {
//this will be repeated forever but we do not need to worry about stack overflow
//wrapUp() is always called from a callback, where the previous stack is already lost
writeData(collection);
} else {
//shutdown
db.close();
}
});
};
// calls function when database is ready
db.open(function (err) {
if (err) {
console.warn(err.message);
process.exit(1);
}
//will not create new collections when referenced
db.strict = true;
db.collection('testCollection', function (err, collection) {
nextDoc = randomDocument();
if (err) {
console.warn("collection not found, building...");
//only create if it was not already found
db.strict = false;
db.createCollection('testCollection', {
'capped': true,
'size': 1073741824 //don't need more than 1G for this
},
function (err, collection) {
if (err) {
console.warn("unable to create collection:" + err.message);
return;
}
console.warn("created new capped collection");
collection.createIndex([
[ '_id' ],
[ '_id', 1 ]
],
function () {
writeData(collection);
});
});
} else {
db.strict = false;
writeData(collection);
}
});
});//open
Review the tailExample.js file below. Its usage is
also simple.
node tailExample.js [--server]
This small script makes use of MongoDB's tailing feature to 'stream'
all the newly written documents to a client indefinitely.
Without the --server argument the script will simply
write all the received documents to the console. With the --server
argument it will open a web server on the local host so a
browser can hit http://127.0.0.1:8081 and watch data
flicker by as the documents per second are tracked. Note that
in order for the server version to work there must be a folder named
www in the same folder with the tailExample.js
file. The www folder must contain the example index.html
file.
MongoDB supports a feature on capped collections called a tailing cursor. When using a traditional cursor the caller will receive a fixed set of results corresponding to the query. When using a tailing cursor it will continue to return new documents that pass the criteria even if they were not in the collection when the cursor was first opened. In this way we can use MongoDB to "push" document events into our example.
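The difference between the two cursor styles can be sketched with an in-memory analogy. TinyCollection and its find and tail methods are hypothetical names for this illustration only and do not reflect the MongoDB driver's API.

```javascript
'use strict';
var EventEmitter = require('events').EventEmitter;

// Hypothetical in-memory collection, used only to contrast the two styles.
function TinyCollection() {
    this.docs = [];
    this.events = new EventEmitter();
}

TinyCollection.prototype.insert = function (doc) {
    this.docs.push(doc);
    this.events.emit('insert', doc);
};

// Traditional cursor: a fixed set of results as of the time of the query.
TinyCollection.prototype.find = function (matches) {
    return this.docs.filter(matches);
};

// Tailing cursor: current matches first, then every later match as it arrives.
TinyCollection.prototype.tail = function (matches, onDoc) {
    this.docs.filter(matches).forEach(onDoc);
    this.events.on('insert', function (doc) {
        if (matches(doc)) { onDoc(doc); }
    });
};

var people = new TinyCollection(),
    isBob = function (doc) { return doc.firstname.indexOf('Bob') === 0; },
    tailed = [];

people.insert({ firstname: 'Bob_0' });
var snapshot = people.find(isBob);
people.tail(isBob, function (doc) { tailed.push(doc.firstname); });
people.insert({ firstname: 'Chuck_1' });
people.insert({ firstname: 'Bob_10' });

console.log('snapshot size: %d, tailed: %s', snapshot.length, tailed.join(','));
```

The snapshot never grows after the query, while the tailing consumer is also handed Bob_10, which did not exist when the tail was opened.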
The sendData() method is connecting to the database
and finding the last document and then building a tailing cursor for
all newer documents. The query restricts the results to the
firstname and birthday fields just to demonstrate how it can be
done. The code looks very similar to the write example in that
the execution flow follows the callbacks. It is a good
practice in Node to use the callback convention whenever appropriate
as was done by the sendData() function
signature. Once the cursor is primed and ready to go it is
sent back to the caller. This helps separate concerns in such
a way that the database work will not be tightly integrated with the
consumer of these documents.
The if (argv.server) block starts up the server instance. Ignoring the
statement that requires socket.io, the rest of this block starts
up a simple HTTP server that will serve any files put into the www
folder. This is all done by the node-static
module which additionally has an in-memory cache for these files so
it is very performant. The socket.io module
is worth pointing out because it is listening to all the
requests coming into the server and can respond as well. This
is not for sending our data but rather it is for sending the socket.io
client-side module down to the HTML page. The index.html
file contains only this element for fetching the required
client-side script.
<script src="http://localhost:8081/socket.io/socket.io.js"></script>
No other files besides index.html were put into the www
example folder, which is often confusing for new users of the socket.io
module. The server-side module watches for the request socket.io/socket.io.js
and replies with the right script file. Once both sides
have the right code, socket.io will let us open a
connection suitable for pushing documents to the client. Socket.io
supports web sockets but it also supports graceful fallback to older
communication technologies based on what the browser can
support. All this complexity is hidden so the only things the
server needs to do are define event handlers for 'connection' and
'disconnect' and emit messages with socket.emit().
The example server can easily support many concurrent browser
connections. Each time a new 'connection' event is received,
the socket for that new browser is added to the array of
clients. When documents are received we could have looped over
this array and emitted documents to each socket, but that would have
added an unnecessary blocking loop and caused the transmissions to
be done in series. Inside nextObject() the async module is used once
again to improve the performance by making better use of Node's event
loop. The forEach function will call the emit
function for all of the clients in parallel. Once all the
functions have called their respective callbacks, it will call its
own callback which will use nextTick() to schedule the
next call to nextObject().
At first glance it might appear that using nextTick()
is not necessary. However, after testing, it appears that if there
is only one function to be called by the forEach()
method it calls back on the same stack context. If that is
allowed to happen it will likely cause a stack overflow. This
is not a unique situation. As stated before, it is something you
must watch for when developing for Node. Something similar
happens in the MongoDB driver when using unsafe writes. The
driver in that case immediately calls back and does so in the same
stack context. The nextTick() function is an
excellent way to guard against this.
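A minimal sketch of this guard, under the assumption of a hypothetical lookup() function that sometimes has its answer immediately: deferring the callback with nextTick ensures the caller's callback never runs on the caller's own stack, whichever path is taken.

```javascript
'use strict';
var cache = {};

// Hypothetical lookup: answers from a cache when it can, otherwise "does IO".
function lookup(key, callback) {
    if (Object.prototype.hasOwnProperty.call(cache, key)) {
        // The answer is already known, but calling back right here would run
        // the callback on the caller's stack, so defer it instead.
        process.nextTick(function () {
            callback(null, cache[key]);
        });
        return;
    }
    setTimeout(function () { // stands in for real IO
        cache[key] = key.toUpperCase();
        callback(null, cache[key]);
    }, 10);
}

var events = [];
cache.bob = 'BOB';
lookup('bob', function (err, value) {
    events.push('callback:' + value);
});
events.push('after lookup');

process.nextTick(function () {
    console.log(events.join(' -> '));
});
```

Even on a cache hit, 'after lookup' is recorded before the callback runs, so callers see the same ordering whether or not IO was needed.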
'use strict';
var mongo = require('mongodb'),
server = new mongo.Server('127.0.0.1', 27017, {}),
db = new mongo.Db('testDB', server),
async = require('async'),
argv = require('optimist').usage('Usage: node tailExample.js [--server]').argv;
var sendData = function (consumerCallback) {
// calls function when database is ready
db.open(function (err) {
if (err) {
console.warn(err.message);
process.exit(1);
}
process.on('SIGINT', function () {
db.close();
console.info("stopped cursor because of interrupt");
process.exit(0);
});
// calls function when the collection is ready
db.collection('testCollection', function (err, collection) {
console.log("starting tail...");
if (err) {
console.warn(err.message);
process.exit(1);
}
// look for last document on startup
collection.findOne({}, {
'limit': 1,
'sort': {
'$natural': -1
}
}, function (err, doc) {
// now find all the new documents after the last one found
var query = (!doc ? {} : {
'_id': {
'$gte': doc._id
}}),
cursor = collection.find(query, {
'firstname': 1,
'birthday': 1,
'_id': 1
},
{
'tailable': 1,
'sort': {
'$natural': 1
}
});
// call for the first object before any connections are made
// because the MongoDB cursor is lazily initialized
cursor.nextObject(function (err, item) {
if (err) {
console.warn("Collection:" + err.message);
process.exit(1);
}
console.log("server is ready");
// note that we are not looping here because we want
// to provide the opportunity to take advantage of Node's
// event loop
consumerCallback(cursor);
});
});
});// db.collection
});// db.open
};
var nextObject;
if (argv.server) {
var clients = [],
ns = require('node-static'),
http = require('http'),
files = new ns.Server('./www'),
app = http.createServer(function (request, response) {
request.addListener('end', function () {
files.serve(request, response);
});
}),
io = require('socket.io').listen(app); // returns client side
// socket.io.js upon request
app.listen(8081);
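io.sockets.on('connection', function (socket) {
console.log("connected");
clients.push(socket);
// 'disconnect' is emitted on the individual socket, not on io.sockets
socket.on('disconnect', function () {
console.log("disconnected");
var index = clients.indexOf(socket);
if (index !== -1) {
// arrays have no pull() method; splice() removes the entry
clients.splice(index, 1);
}
});
});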
nextObject = function (cursor) {
cursor.nextObject(function (err, item) {
if (err) {
console.warn(err);
process.exit(1);
}
if (item) {
async.forEach(clients, function (socket, callback) {
socket.emit('JSON', item);
callback();
}, function () {
process.nextTick(function () {
nextObject(cursor);
});
});
}
});
};
io.set('log level', 1); // very verbose without this
console.log('Server running at http://127.0.0.1:8081/');
} else {
nextObject = function (cursor) {
cursor.each(function (err, item) {
if (err || !item) {
console.warn(err);
process.exit(1);
}
console.log(JSON.stringify(item));
});
};
}
sendData(nextObject);
This index.html file should be put into the www folder which is
relative to the tailExample.js file.
<!DOCTYPE html>
<html>
<head>
<title>socket.io example</title>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body>
<div id="state"></div>
<ul>
<li>Id: <input id="_id" size="24"/></li>
<li>Name: <input id="firstname" size="12"/></li>
<li>Birthday: <input id="birthday" size="24"/></li>
<li>Documents per ms: <input id="dpms" size="6"/></li>
</ul>
<script src="http://localhost:8081/socket.io/socket.io.js"></script>
<script src="http://code.jquery.com/jquery-1.5.2.js"></script>
<script type="text/javascript">
var webSocket = io.connect('http://localhost:8081');
webSocket.on('connect', function() {
$('#state').replaceWith('<b>Connected to server.</b>');
});
var startTime;
var count = 0;
webSocket.on('JSON', function(obj) {
count ++;
$('#_id').val(obj._id);
$('#firstname').val(obj.firstname);
$('#birthday').val(obj.birthday);
if (count%200==0) {
if (!startTime) {
startTime = new Date();
count = 0;//we just started the timer now
} else {
//update the rate
var receivedDocsPerMs = count/(new Date().getTime()-startTime.getTime());
$('#dpms').val(receivedDocsPerMs);
}
}
});
webSocket.on('disconnect', function() {
$('#state').replaceWith('<b>Disconnected from server.</b>');
});
</script>
</body>
</html>
Node has demonstrated surprising performance, ease of use, a devoted fan base, and good technical design. All of these are good reasons for its ongoing future success and growth. However, Node may also be one of those innovations that happened at just the right time. Many companies are running into the limits of their non-event-driven implementations and are looking for ways to support ever greater numbers of users with their existing hardware. They are asking, "How can I do more with less?" The answer might just be node.js.
Home of node.js
NPM search utility
Home of MongoDB
SETT on MongoDB Performance, Durability and Consistency
Example files