Node.js

by
Nathan Tippy, Principal Software Engineer
Object Computing, Inc. (OCI)

Introduction

Node has grown very quickly in popularity due to its excellent performance and ease of use.  It brings the event-driven techniques commonly found in desktop applications to the world of hosted server applications.  Google's V8 JavaScript engine is the core of the Node framework, and the use of JavaScript as Node's primary language provides it with a large community of potential developers already familiar with the language.  This low learning curve, in addition to its excellent performance, has contributed greatly to its fast growth.

What is 'Event-Driven'?

As with most things, "there is nothing new under the sun", so let's look at a real world example.  Many streamlined businesses are way ahead of us when it comes to applying event-driven techniques.  After putting in my order at Starbucks, the cashier informs the barista what to make.  At this point I step aside and wait while the barista makes my drink and the cashier takes the next order.  Once my drink is ready they call my name to come pick it up.  This strategy allows them to service many customers at the same time with one cashier.  Now imagine what would happen if I did not step aside until my drink was complete.  Every customer would have to wait for the full duration of all previous transactions before beginning their own.  This would force them to spend a lot of time standing in line.  The event-driven process enables us to quickly place an order, sit, relax, and do other tasks while we wait to be called.  This also allows Starbucks to maximize profits by minimizing the number of cashiers they need to hire.

Internally, Node keeps an event loop of functions to be completed in sequence, so the code you write never has to deal with the complexities of multi-threading.  In addition, Node provides "non-blocking IO": calls to IO are run on another thread, and when the work is completed the callback function is put on the event loop.  In this way Node can service many IO operations at once while remaining responsive to events that require no IO.  This dramatically improves performance, because typical IO calls are very slow compared to in-memory operations.
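The deferral described above can be seen without any IO at all.  This small sketch queues a function with process.nextTick() and shows that the code after the call runs first:

```javascript
'use strict';

// process.nextTick() places the function on the event loop instead of
// calling it immediately, so the statement after the call runs first.
var order = [];

process.nextTick(function () {
    order.push('callback ran on the next pass of the event loop');
    console.log(order.join(' -> '));
});

order.push('code after the call ran first');
```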

Note that this callback mechanism will 'lose' the stack trace showing which function made the initial IO request.  This can be problematic when debugging, and work is being done to address it.  On the other hand, this mechanism also keeps the stack from getting very deep.  Shorter stacks are very beneficial because recursive algorithms can be used in places where they would traditionally produce a stack overflow.  Caution is still required, however, because some callbacks, especially those not making IO requests, will not pass through the event loop and therefore will add to the stack growth.

Install node.js and npm

Node was developed on the Linux platform, and functions best there, but it also works on Windows and OS X. The examples here were run on Ubuntu 11.10 but they should work without much effort on the other platforms. Installers for Windows and OS X can be downloaded from the main site http://nodejs.org/.

Unfortunately, the Ubuntu repository does not keep up with the latest releases due to rapid growth surrounding Node.  You can download the Node source from http://nodejs.org/. However, to simplify the install process for the latest stable version on Ubuntu just run the following:

sudo apt-get install python-software-properties
sudo add-apt-repository ppa:chris-lea/node.js
sudo apt-get update
sudo apt-get install nodejs
sudo apt-get install npm

Notice that the last step installed npm, the Node package manager.  Node is capable of supporting many kinds of network applications, but it is just a platform.  Most of the heavy lifting is done by a rich set of community-supported modules.  Because so many developers are familiar with JavaScript, the list of useful modules has grown very quickly.  To help you find that perfect module to simplify your task and brighten your day, the community has put together a cool NPM search tool.

The examples below will use the following modules.  Take some time to visit their sites.  These modules have a lot to offer that we do not have time to demonstrate here.

Use npm to install the modules we need for our simple examples.

npm install mongodb --mongodb:native
npm install socket.io
npm install async
npm install optimist
npm install node-static

The extra argument for installing the mongodb driver demonstrates that modules are not restricted to JavaScript only. In this case the mongodb driver has its own parser in C++ which can be compiled to the native platform for better performance.  We will not be using this in the example because the JavaScript implementation turns out to be very performant.   In general, modules should stick to JavaScript for simplicity and cross-platform support unless there is a compelling reason to jump out into native code.

Start up a test database

Download the appropriate production release of MongoDB for your platform and extract it into your home folder (~):

http://www.mongodb.org/downloads

At the ~$ prompt, enter the following to start up the database.

cd mongo*
mkdir data
./bin/mongod --dbpath ./data

The database server is ready for use when you see:

admin web console waiting for connections on port 28017

If you need to stop the server press ctrl-c. Once the server has stopped, delete all the files in the ./data folder to start again fresh.

Execute the writing documents example

Review the writeExample.js file below. Its usage is very simple.

node writeExample.js [--async] [--forever]

Each call to the writeExample.js script will write 100000 random documents to the database.  If the --forever argument is passed, it will repeat until ctrl-c is pressed.  This is used as a source of ongoing events for the rest of the example.  MongoDB supports capped collections which have a fixed size and write over the oldest documents when full.  These are commonly used for logs. However, it will also work well for the simple example here.

Without the --async argument the example will synchronously write to the database by waiting for each write to complete before starting the next.  With the --async argument it fires off all the document writes without waiting for any response.

Reviewing the write document example

Applications written in Node may be difficult to follow at first, but with more experience they become straightforward.  In traditional applications the sequential nature of the program flows from the top down.  This is still true in Node. However, the convention is to make use of callbacks to continue processing after requesting any IO operations.

Node makes use of require('<module name>') to load needed modules into scope.  For example, the optimist module greatly simplifies command-line argument parsing by converting the arguments directly into properties.  Notice how the argv object, built near the top of writeExample.js, is later consulted in insertOne() via argv.async and in wrapUp() via argv.forever.
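As an illustration, here is a simplified, hand-rolled version of the conversion optimist performs.  The real module also handles --key=value pairs, aliases, and usage strings, so this parseFlags() helper is only a hypothetical stand-in:

```javascript
'use strict';

// Minimal sketch of optimist's flag handling: each --flag becomes a
// boolean property on the returned object; everything else is collected
// under the '_' property, just as optimist does.
function parseFlags(args) {
    var argv = { _: [] };
    args.forEach(function (arg) {
        if (arg.indexOf('--') === 0) {
            argv[arg.slice(2)] = true;
        } else {
            argv._.push(arg);
        }
    });
    return argv;
}

var argv = parseFlags(['writeExample.js', '--async', '--forever']);
console.log(argv.async, argv.forever); // true true
```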

This example must write a fixed number of documents into a test database, yet it does not use for, while, or any other traditional loop construct.  Instead, callbacks are used, and situations that require sequential processing are easily implemented with the helper methods of the async module.  The async module makes use of the event queue within the Node framework to break up work and prevent blocking of the event loop.

MongoDB can support both synchronous and asynchronous writes.  The write example makes use of both of these in the insertOne() function.  In the async branch of the conditional, the write is sent to the database, but insertOne() is not directly called again.  Instead, it is passed to process.nextTick() to be called on the next pass of the event loop.  Had the code made a traditional recursive call with this large number of documents to insert, it would have produced a stack overflow.  The use of process.nextTick() allows all pending events in the queue to be processed before the next call to insertOne().
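The difference can be demonstrated in isolation.  In this sketch, deep direct recursion overflows V8's call stack, while the same number of steps scheduled through process.nextTick() completes with a stack that stays one frame deep:

```javascript
'use strict';

// Direct recursion: every call adds a stack frame, so a large enough
// count exhausts the stack and throws a catchable RangeError.
function recurseDirectly(remaining) {
    if (remaining === 0) { return; }
    recurseDirectly(remaining - 1);
}

var overflowed = false;
try {
    recurseDirectly(1000000);
} catch (e) {
    overflowed = true; // RangeError: Maximum call stack size exceeded
}
console.log('direct recursion overflowed: ' + overflowed);

// Scheduled "recursion": each step is queued on the event loop, so the
// stack unwinds completely between steps and never overflows.
function recurseViaNextTick(remaining) {
    if (remaining === 0) {
        console.log('nextTick version finished without overflowing');
        return;
    }
    process.nextTick(function () {
        recurseViaNextTick(remaining - 1);
    });
}
recurseViaNextTick(100000);
```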

The second half of the conditional could have implemented a simple call to insertOne() after collection.insert(). However, this example is a little more interesting.  In this part, async.parallel() is used to generate the next random document and update the progress bar while waiting on the IO to complete.  The async.parallel() function calls its final callback only after every function in its array has called its own callback.  In this way our example can take advantage of background processing without having to write a complex, multi-threaded implementation.
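A stripped-down sketch of the coordination async.parallel() provides looks like this; the real module also gathers errors and results, so this parallel() helper is only an illustration of the counting trick:

```javascript
'use strict';

// Every task is started, completions are counted, and the final
// callback fires only after the last task has called its own callback.
function parallel(tasks, finalCallback) {
    var pending = tasks.length;
    tasks.forEach(function (task) {
        task(function () {
            pending = pending - 1;
            if (pending === 0) {
                finalCallback();
            }
        });
    });
}

var results = [];
parallel([
    function (callback) { results.push('insert sent'); callback(); },
    function (callback) { results.push('next doc built'); callback(); }
], function () {
    results.push('both finished');
});
console.log(results.join(', ')); // insert sent, next doc built, both finished
```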

'use strict';

var mongo = require('mongodb'), // driver from https://github.com/christkv/node-mongodb-native
    server = new mongo.Server('127.0.0.1', 27017, {}),
    db = new mongo.Db('testDB', server, {native_parser: false}),
    TOTAL_DOCS = 100000, count = 0, progress = 0, startTime = new Date(), nextDoc,
    async = require('async'),
    progressTitle = " 0% _ _ _ _ _ _ _ _ _ _ _ _ 25% _ _ _ _ _ _ _ _ _ _ _ _ _ 50% _ _ _ _ _ _ _ _ _ _ _ _ _ 75%_ _ _ _ _ _ _ _ _ _ _ _ 100%",
    progressFactor = TOTAL_DOCS / progressTitle.length,
    MAX_PARKING_FLOORS = 12, MAX_PARKING_AISLES = 30, MAX_PARKING_ROWS = 50, MAX_SCORE = 100,
    data = {
        names: [ 'Bob', 'Chuck', 'Mike', 'Moe', 'Larry', 'Curly', 'Shemp', 'Rick', 'Don', 'Sam' ],
        foods: [ 'pizza', 'tacos', 'ice cream', 'steak', 'fried chicken' ]
    },
    argv = require('optimist').usage('Usage: node writeExample.js [--async] [--forever]').argv;

// randomly returns true or false
var randomBoolean = function () {
    return Math.random() > 0.5;
};

var updateProgress = function () {
    while (progress < (count / progressFactor)) {
        process.stdout.write('=');
        progress = progress + 1;
    }
};

var randomDocument = function () {
    // random birthday, food list, course score and parking location
    var rnd1 = Math.random(), rnd2 = Math.random(), rnd3 = Math.random(),
        personName = data.names[count % data.names.length] + '_' + count,
        bDate = new Date(1900 + (rnd1 * 100), rnd2 * 12, rnd3 * 28),
        favFoods = data.foods.filter(randomBoolean), artCourse = {
            subject: "art",
            finalscore: rnd3 * MAX_SCORE
        }, parkingSpot = {
            row: Math.round(rnd1 * MAX_PARKING_ROWS),
            aisle: Math.round(rnd2 * MAX_PARKING_AISLES),
            floor: Math.round(rnd3 * MAX_PARKING_FLOORS)
        };
    return {
        firstname: personName,
        course: artCourse,
        parking: parkingSpot,
        birthday: bDate,
        food: favFoods
    };
};

var wrapUp;
var insertOne = function (collection) {

    if (argv.async) {
        //look ma no loop!
        count = count + 1;
        updateProgress();
        collection.insert(randomDocument());//fire, forget and hope
        //direct recursion in node is not recommended due to stack size limitations
        //and it would prevent other events from getting processed.
        //nextTick, however, puts this function on the end of the event queue
        //to be called on the next pass.
        process.nextTick(function () {
            if (count < TOTAL_DOCS) {
                insertOne(collection);
            } else {
                wrapUp(collection);
            }
        });

    } else {
        //callback event from the insert triggers the next

        async.parallel([ function (callback) {
            collection.insert(nextDoc,
                //the call back will not be called until new doc is saved to disk
                {
                    safe: true
                }, callback);
        },
            //because we are waiting on the write we can get this work done early
            function (callback) {
                count = count + 1;
                updateProgress();
                nextDoc = randomDocument();
                callback();
            } ],
            //only called when all the functions in the above array have called their callback
            function () {
                if (count < TOTAL_DOCS) {
                    insertOne(collection);
                } else {
                    wrapUp(collection);
                }
            });
    }
};

var writeData = function (collection) {
    count = 0;
    progress = 0;
    startTime = new Date();
    collection.count(function (err, originalCount) {
        console.log("");
        console.log("total docs before insert:" + originalCount);
        console.log((argv.async ? "now starting %d async writes"
            : "now starting %d synchronous writes"), TOTAL_DOCS);
        console.log(progressTitle);
        insertOne(collection);
    });
};

wrapUp = function (collection) {
    process.stdout.write('\n');
    collection.count(function (err, finalCount) {
        console.log("total docs after insert:" + finalCount);

        var sentDocsPerMs = count / (new Date().getTime() - startTime.getTime());
        console.log("sent %d documents at %d docs/ms to the server",
            TOTAL_DOCS, sentDocsPerMs);

        if (argv.forever) {
            //this will repeat forever, but we do not need to worry about stack overflow:
            //wrapUp() is always invoked as a callback, so the previous stack has already unwound
            writeData(collection);
        } else {
            //shutdown
            db.close();
        }
    });
};

// calls function when database is ready
db.open(function (err) {
    if (err) {
        console.warn(err.message);
        process.exit(1);
    }
    //will not create new collections when referenced
    db.strict = true;

    db.collection('testCollection', function (err, collection) {
        nextDoc = randomDocument();
        if (err) {
            console.warn("collection not found, building...");
            //only create if it was not already found
            db.strict = false;

            db.createCollection('testCollection', {
                    'capped': true,
                    'size': 1073741824 //don't need more than 1G for this
                },
                function (err, collection) {
                    if (err) {
                        console.warn("unable to create collection:" + err.message);
                        return;
                    }
                    console.warn("created new capped collection");
                    collection.createIndex([
                        [ '_id' ],
                        [ '_id', 1 ]
                    ],
                        function () {
                            writeData(collection);
                        });

                });
        } else {
            db.strict = false;
            writeData(collection);
        }
    });
});//open

Execute the tail example

Review the tailExample.js file below. Its usage is also simple.

node tailExample.js [--server]

This small script makes use of MongoDB's tailing feature to 'stream' all the newly written documents to a client indefinitely.  Without the --server argument the script will simply write all the received documents to the console.  With the --server argument it will open a web server on the local host so a browser can hit http://127.0.0.1:8081 and watch data flicker by as the document rate is tracked.  Note that for the server version to work there must be a folder named www in the same folder as the tailExample.js file.  The www folder must contain the example index.html file.

Reviewing the tail example

MongoDB supports a feature on capped collections called a tailing cursor.  When using a traditional cursor the caller will receive a fixed set of results corresponding to the query.  When using a tailing cursor it will continue to return new documents that pass the criteria even if they were not in the collection when the cursor was first opened.  In this way we can use MongoDB to "push" document events into our example.

The sendData() function connects to the database, finds the last document, and then builds a tailing cursor for all newer documents.  The query restricts the results to the firstname and birthday fields just to demonstrate how that can be done.  The code looks very similar to the write example in that the execution flow follows the callbacks.  It is good practice in Node to follow the callback convention whenever appropriate, as the sendData() function signature does.  Once the cursor is primed and ready to go, it is handed back to the caller.  This separates concerns so that the database work is not tightly coupled to the consumer of these documents.
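The convention itself is small enough to sketch.  Here findLastDocument() is a hypothetical stand-in that operates on a plain array instead of a real MongoDB collection, but it follows the same error-first callback signature:

```javascript
'use strict';

// The node callback convention in miniature: the function takes a
// callback as its last argument and invokes it with (err, result)
// once the work is done, never both.
function findLastDocument(docs, callback) {
    if (!docs || docs.length === 0) {
        callback(new Error('empty collection'));
        return;
    }
    callback(null, docs[docs.length - 1]);
}

findLastDocument([{ _id: 1 }, { _id: 2 }], function (err, doc) {
    if (err) { throw err; }
    console.log('last _id: ' + doc._id); // last _id: 2
});
```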

The block guarded by argv.server starts up the server instance.  Apart from the require of socket.io, this block starts a simple HTTP server that will serve any files placed in the www folder.  This is all done by the node-static module, which additionally keeps an in-memory cache of these files, so it is very performant.  The socket.io module is worth pointing out because it listens to all the requests coming into the server and can respond as well.  This is not for sending our data; rather, it is for sending the socket.io client-side module down to the HTML page.  The index.html file contains only this element for fetching the required client-side script.

<script src="http://localhost:8081/socket.io/socket.io.js"></script>

No other files besides index.html were put into the www example folder; this is often confusing for new users of the socket.io module.  The server-side module watches for requests for socket.io/socket.io.js and replies with the right script file.  Once both sides have the right code, socket.io lets us open a connection suitable for pushing documents to the client.  Socket.io supports web sockets, but it also falls back gracefully to older communication technologies based on what the browser can support.  All this complexity is hidden, so the only thing the server needs to do is define handlers for the 'connection' and 'disconnect' events and emit messages to each client socket.

The example server can easily support many concurrent browser connections.  Each time a new 'connection' event is received, the socket for that new browser is added to the array of clients.  When documents are received we could have looped over this array and emitted documents to each socket, but that would have added an unnecessary blocking loop and caused the transmissions to be done in series.  Instead, the async module is used once again to improve performance by making better use of Node's event loop.  The forEach() function calls the emit function for all of the clients in parallel.  Once all the functions have called their respective callbacks, it calls its own callback, which uses nextTick() to schedule the next call to nextObject().

At first glance it might appear that using nextTick() is not necessary.  However, testing shows that if there is only one function to be called by the forEach() method, it calls back on the same stack context.  If that is allowed to happen it will likely cause a stack overflow.  This is not a unique situation; as stated before, it is something you must watch for when developing for Node.  Something similar happens in the MongoDB driver when using unsafe writes: the driver immediately calls back, and does so in the same stack context.  The nextTick() function is an excellent way to guard against this.
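A guard of the kind described can be sketched as follows.  The lookup() function and its cache are hypothetical, but they show how process.nextTick() keeps an API that sometimes completes immediately from ever calling back on the caller's stack:

```javascript
'use strict';

var cache = { known: 42 };

// Every completion path is routed through process.nextTick(), so the
// caller gets consistent asynchronous behavior whether or not the
// value was already cached.
function lookup(key, callback) {
    if (cache.hasOwnProperty(key)) {
        // without nextTick a cache hit would invoke the callback on the
        // caller's stack, which deep call chains cannot tolerate
        process.nextTick(function () {
            callback(null, cache[key]);
        });
        return;
    }
    // a real implementation would perform IO here; an IO callback is
    // already asynchronous, so no extra deferral would be needed
    process.nextTick(function () {
        callback(new Error('not found: ' + key));
    });
}

var ranSynchronously = false;
lookup('known', function () {
    ranSynchronously = true;
});
console.log('callback already ran: ' + ranSynchronously); // false
```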

'use strict';

var mongo = require('mongodb'),
    server = new mongo.Server('127.0.0.1', 27017, {}),
    db = new mongo.Db('testDB', server),
    async = require('async'),
    argv = require('optimist').usage('Usage: node tailExample.js [--server]').argv;

var sendData = function (consumerCallback) {
    // calls function when database is ready
    db.open(function (err) {
        if (err) {
            console.warn(err.message);
            process.exit(1);
        }
        process.on('SIGINT', function () {
            db.close();
            console.info("stopped cursor because of interrupt");
            process.exit(0);
        });
        // calls function when the collection is ready
        db.collection('testCollection', function (err, collection) {
            console.log("starting tail...");
            if (err) {
                console.warn(err.message);
                process.exit(1);
            }
            // look for last document on startup
            collection.findOne({}, {
                'limit': 1,
                'sort': {
                    '$natural': -1
                }
            }, function (err, doc) {
                // now find all the new documents after the last one found
                var query = (!doc ? {} : {
                        '_id': {
                            '$gte': doc._id
                        }}),
                    cursor = collection.find(query, {
                            'firstname': 1,
                            'birthday': 1,
                            '_id': 1
                        },
                        {
                            'tailable': 1,
                            'sort': {
                                '$natural': 1
                            }
                        });
                // call for the first object before any connections are made
                // because the MongoDB cursor is lazily initialized
                cursor.nextObject(function (err, item) {
                    if (err) {
                        console.warn("Collection:" + err.message);
                        process.exit(1);
                    }

                    console.log("server is ready");
                    // note that we are not looping here because we want
                    // to provide the opportunity to take advantage of
                    // node's event loop
                    consumerCallback(cursor);
                });
            });
        });// db.collection
    });// db.open
};

var nextObject;

if (argv.server) {

    var clients = [],
        ns = require('node-static'),
        http = require('http'),
        files = new ns.Server('./www'),
        app = http.createServer(function (request, response) {
            request.addListener('end', function () {
                files.serve(request, response);
            });
        }),
        io = require('socket.io').listen(app); // returns client side
    // socket.io.js upon request

    app.listen(8081);

    io.sockets.on('connection', function (socket) {
        console.log("connected");
        clients.push(socket);

        //'disconnect' fires on the individual socket, not the server,
        //and arrays have no pull() method, so splice the socket out
        socket.on('disconnect', function () {
            console.log("disconnected");
            clients.splice(clients.indexOf(socket), 1);
        });
    });

    nextObject = function (cursor) {
        cursor.nextObject(function (err, item) {
            if (err) {
                console.warn(err);
                process.exit(1);
            }
            if (item) {
                async.forEach(clients, function (socket, callback) {
                    socket.emit('JSON', item);
                    callback();
                }, function () {
                    process.nextTick(function () {
                        nextObject(cursor);
                    });
                });
            }
        });
    };
    io.set('log level', 1); // very verbose without this
    console.log('Server running at http://127.0.0.1:8081/');
} else {
    nextObject = function (cursor) {
        cursor.each(function (err, item) {
            if (err || !item) {
                console.warn(err);
                process.exit(1);
            }
            console.log(JSON.stringify(item));
        });
    };
}
sendData(nextObject);

This index.html file should be put into the www folder, which is relative to the tailExample.js file.

<!DOCTYPE html>
<html>
<head>
  <title>socket.io example</title>
  <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body>
  <div id="state"></div>
  <ul>
    <li>Id: <input id="_id" size="24"/></li>
    <li>Name: <input id="firstname" size="12"/></li>
    <li>Birthday: <input id="birthday" size="24"/></li>
    <li>Documents per ms: <input id="dpms" size="6"/></li>
  </ul>

  <script src="http://localhost:8081/socket.io/socket.io.js"></script>
  <script src="http://code.jquery.com/jquery-1.5.2.js"></script>
  <script type="text/javascript">
    var webSocket = io.connect('http://localhost:8081');

    webSocket.on('connect', function() {
      $('#state').replaceWith('<b>Connected to server.</b>');
    });

    var startTime;
    var count = 0;

    webSocket.on('JSON', function(obj) {

      count ++;
      $('#_id').val(obj._id);
      $('#firstname').val(obj.firstname);
      $('#birthday').val(obj.birthday);

      if (count%200==0) {
        if (!startTime) {
          startTime = new Date();
          count = 0;//we just started the timer now
        } else {
          //update the rate
          var receivedDocsPerMs = count/(new Date().getTime()-startTime.getTime());
          $('#dpms').val(receivedDocsPerMs);
        }
      }
    });

    webSocket.on('disconnect', function() {
      $('#state').replaceWith('<b>Disconnected from server.</b>');
    });
  </script>
</body>
</html>

Summary

Node has demonstrated surprising performance, ease of use, a devoted fan base, and good technical design.  All of these are good reasons to expect its continued success and growth. However, Node may also be one of those innovations that happened at just the right time.  Many companies are running into the limits of their non-event-driven implementations and are looking for ways to support ever greater numbers of users with their existing hardware.  They are asking, "How can I do more with less?" The answer might just be node.js.

References

Home of node.js

NPM search utility

Home of MongoDB

SETT on MongoDB Performance, Durability and Consistency

Example files

