One of the things that we wanted to do after blogging about Gigabit File uploads with Node.js was to see how we could improve the performance of the application. In the previous version of the application the code was mostly synchronous, and as a result we had high CPU usage, heavy I/O, and a fair amount of memory consumption. All in all, that version was written to demonstrate the concept of Gigabit File uploads over HTTP rather than for performance.
Now that we have established the concept, it is time to see how the application's performance can be improved.
The Performance Tuning
The areas that we want to look at to address the Gigabit File upload performance are:
- Implementing a reverse proxy server in front of the Node.js server.
- Offloading the file upload requests to the reverse proxy.
- Converting the blocking synchronous code in MergeAll to non-blocking asynchronous code.
- Creating an API for each type of backend request. As it stands, the UploadChunk API call is used to manage all uploads.
- Removing the checksum calculation from the MergeAll API call. A GetChecksum API will be created to calculate the checksum of the uploaded file.
The performance testing was conducted on a CentOS 7 virtual machine running NGINX version 1.9.9 and Node.js version 5.3.0. This is a departure from our previous blog post, in which the work was done on a Windows 2012 platform.
The Reverse Proxy
Node.js allows you to build fast, scalable network applications capable of handling a huge number of simultaneous connections with high throughput. This means that from the very start Node.js is quite capable of handling Gigabit File uploads.
So why would we want to use a reverse proxy in front of our Node.js server in this scenario? Because offloading the file handling to the NGINX web server reduces the overhead on the Node.js backend, and this should provide a performance boost. The following figure shows how this is achieved.
Figure 1 Offloading file upload to NGINX reverse proxy
- The client computer uploads the file chunks by calling the XFileName API (a minimal client-side sketch follows this list). Once the NGINX reverse proxy sees a call to /api/CelerFTFileUpload/UploadChunk/XFileName it saves the file chunk to the NGINX private temporary directory, because we have enabled the NGINX client_body_in_file_only directive. The NGINX private temporary directory can be found under /tmp. This happens because the PrivateTmp option is set to true in the NGINX systemd unit file. Please consult the systemd man pages for more information on the PrivateTmp option.
- After the file chunk has been saved, NGINX sets the X-File-Name header to the name of the file chunk and forwards the request to Node.js.
- Once Node.js receives the X-File-Name header, it moves the file chunk out of the NGINX private temporary directory and saves it to the file upload directory with the correct name.
- Once all of the file chunks have been uploaded, the client calls the MergeAll API, which NGINX passes directly to Node.js. Node.js then merges all of the uploaded file chunks to create the file.
We used the following NGINX configuration:
# redirect CelerFT
location = /api/CelerFTFileUpload/UploadChunk/XFileName {

    aio on;
    directio 10M;

    client_body_temp_path /tmp/nginx 1;
    client_body_in_file_only on;
    client_body_buffer_size 10M;
    client_max_body_size 60M;

    proxy_pass_request_headers on;
    proxy_set_body off;
    proxy_redirect off;
    proxy_ignore_client_abort on;
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    proxy_set_header Host $host;
    ##proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-File-Name $request_body_file;

    # Backend address assumed for illustration; the value was truncated
    # in the original post
    proxy_pass http://localhost:1337;
    # proxy_redirect default;

    proxy_connect_timeout 600;
    proxy_send_timeout 600;
    proxy_read_timeout 600;
    send_timeout 600;

    access_log off;
    error_log /var/log/nginx/nginx.upload.error.log;
}
The key parameter is the X-File-Name header, which is set to the name of the file chunk saved by NGINX. The aio and directio directives let NGINX write the large request bodies to disk efficiently, and client_max_body_size caps each chunk at 60 MB. The Node.js backend then has to process the individual chunks. The crucial part of the code is finding the NGINX private temporary directory, because this is where NGINX writes the file chunks. Under systemd the NGINX private temporary directory has a different name each time NGINX is restarted, so we have to look up the name of that directory before we can move the file chunk to its final destination.
app.post('*/api/CelerFTFileUpload/UploadChunk/XFileName*', function (request, response) {

    if (request.headers['x-file-name']) {

        // Find the NGINX private temporary directory. Under systemd's
        // PrivateTmp the directory name changes on every restart, so we
        // look for the entry that contains "nginx.service".
        var temp_dir = fs.readdirSync('/tmp');
        var nginx_temp_dir = [];

        for (var i = 0; i < temp_dir.length; i++) {
            if (temp_dir[i].match('nginx.service')) {
                nginx_temp_dir.push(temp_dir[i]);
            }
        }

        // The X-File-Name header holds the path of the chunk as NGINX sees
        // it (under its private /tmp), so appending it to the private
        // temporary directory yields the real path on the host
        var temp_path = '/tmp/' + nginx_temp_dir[0] + request.headers['x-file-name'];

        // fs.move is provided by the fs-extra module
        fs.move(temp_path, response.locals.localfilepath, {}, function (err) {
            if (err) {
                response.status(500).send(err);
                return;
            }
            response.status(200).send(response.locals.localfilepath);
            response.end();
        });
    }
});
The MergeAll Asynchronous API
In the previous blog post we used the fs.readdirSync and fs.readFileSync function calls quite extensively. fs.readdirSync was called each time we needed to check whether or not we had uploaded all of the file chunks, and fs.readFileSync was called when we merged the uploaded file chunks to create the file.
Both of these calls are synchronous and blocked the MergeAll API each time they were made. The getfilesWithExtensionName function that was being called in the MergeAll API was replaced with an asynchronous fs.readdir call that checks that we have uploaded all of the file chunks.
The original getfilesWithExtensionName function:
function getfilesWithExtensionName(dir, ext) {

    var matchingfiles = [];

    // fs.ensureDirSync comes from the fs-extra module and creates the
    // directory if it does not already exist
    if (fs.ensureDirSync(dir)) {
        return matchingfiles;
    }

    var files = fs.readdirSync(dir);
    for (var i = 0; i < files.length; i++) {
        if (path.extname(files[i]) === '.' + ext) {
            matchingfiles.push(files[i]);
        }
    }

    return matchingfiles;
}
The MergeAll API was rewritten to use the asynchronous fs.readdir function to check whether we have uploaded all of the file chunks. The fs.readdir callback gives us an array named fileslist containing the file names in the upload directory. Once all of the file chunks are present, we copy the chunk file names into an array named files, as shown.
for (var i = 0; i < fileslist.length; i++) {
    if (path.extname(fileslist[i]) == '.tmp') {
        files.push(fileslist[i]);
    }
}
Next, fs.createWriteStream is used to create the output file.
var outputFile = fs.createWriteStream(filename);
We then use a recursive function named mergefiles to merge the file chunks into the final output file. In mergefiles we use fs.createReadStream to read each file in the files array and write its contents to the output file. mergefiles is first called with the index set to 0, and the index is incremented in each read stream's end event handler, so the chunks are appended strictly in order.
var index = 0;

var mergefiles = function (index) {

    // All chunks written; close the output file
    if (index == files.length) {
        outputFile.end();
        return;
    }

    console.log(files[index]);

    var rstream = fs.createReadStream(localFilePath + '/' + files[index]);

    rstream.on('data', function (data) {
        outputFile.write(data);
    });

    // 'end' fires once the chunk has been fully read, so we recurse
    // to append the next chunk in order
    rstream.on('end', function () {
        mergefiles(index + 1);
    });

    // Delete the chunk once its stream has closed (fs.removeSync is
    // from the fs-extra module)
    rstream.on('close', function () {
        fs.removeSync(localFilePath + '/' + files[index]);
    });

    rstream.on('error', function (err) {
        console.log('Error in file merge - ' + err);
        response.status(500).send(err);
        return;
    });
};

mergefiles(index);
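One caveat with this version: outputFile.write ignores the write stream's return value, so backpressure is not handled. A pipe-based variant would let Node.js manage that automatically. The following is only a sketch of that idea, assuming the same files, localFilePath, and outputFile variables; it is not the code used in CelerFT.

// Sketch only: pipe each chunk into the output stream so Node.js
// handles backpressure. { end: false } keeps the write stream open
// between chunks.
var mergefilesPiped = function (index) {
    if (index == files.length) {
        outputFile.end();
        return;
    }
    var rstream = fs.createReadStream(localFilePath + '/' + files[index]);
    rstream.pipe(outputFile, { end: false });
    rstream.on('end', function () {
        // fs.remove is the asynchronous fs-extra counterpart of fs.removeSync
        fs.remove(localFilePath + '/' + files[index], function () {
            mergefilesPiped(index + 1);
        });
    });
};

mergefilesPiped(0);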
The complete code for the MergeAll API call:
app.get('*/api/CelerFTFileUpload/MergeAll*', function (request, response) {

    if (request.method == 'GET') {

        var extension = path.extname(request.param('filename'));
        var baseFilename = path.basename(request.param('filename'), extension);
        var localFilePath = uploadpath + request.param('directoryname') + '/' + baseFilename;
        var filename = localFilePath + '/' + baseFilename + extension;
        var files = [];

        fs.readdir(localFilePath, function (error, fileslist) {

            if (error) {
                response.status(400).send('Number of file chunks less than total count');
                console.log(error);
                return;
            }

            if ((fileslist.length) != request.param('numberOfChunks')) {
                response.status(400).send('Number of file chunks less than total count');
                return;
            }

            if ((fileslist.length) == request.param('numberOfChunks')) {

                // Collect the .tmp file chunks
                for (var i = 0; i < fileslist.length; i++) {
                    if (path.extname(fileslist[i]) == '.tmp') {
                        files.push(fileslist[i]);
                    }
                }

                if (files.length != request.param('numberOfChunks')) {
                    response.status(400).send('Number of file chunks less than total count');
                    return;
                }

                var outputFile = fs.createWriteStream(filename);

                // Once the merged file has been written, move it up to the
                // upload directory and remove the chunk directory
                outputFile.on('finish', function () {

                    console.log('file has been written ' + filename);

                    var newfilename = uploadpath + request.param('directoryname') + '/' + baseFilename + extension;

                    fs.move(filename, newfilename, {}, function (err) {
                        if (err) {
                            console.log(err);
                            response.status(500).send(err);
                            return;
                        }
                        else {
                            fs.remove(localFilePath, function (err) {
                                if (err) {
                                    response.status(500).send(err);
                                    return;
                                }
                                response.status(200).send('Successfully merged file ' + filename);
                            });
                        }
                    });
                });

                var index = 0;

                var mergefiles = function (index) {

                    if (index == files.length) {
                        outputFile.end();
                        return;
                    }

                    console.log(files[index]);

                    var rstream = fs.createReadStream(localFilePath + '/' + files[index]);

                    rstream.on('data', function (data) {
                        outputFile.write(data);
                    });

                    rstream.on('end', function () {
                        mergefiles(index + 1);
                    });

                    rstream.on('close', function () {
                        fs.removeSync(localFilePath + '/' + files[index]);
                    });

                    rstream.on('error', function (err) {
                        console.log('Error in file merge - ' + err);
                        response.status(500).send(err);
                        return;
                    });
                };

                mergefiles(index);
            }
        });
    }
});
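With the checksum calculation removed from MergeAll, a separate GetChecksum API can hash the merged file as a stream without blocking. The following is a minimal sketch of what that API could look like, assuming the same uploadpath variable; the route name follows the conventions above, but the exact signature and the choice of MD5 are our assumptions.

var crypto = require('crypto');

// Hypothetical sketch of the GetChecksum API: stream the merged file
// through Node's crypto module so the hash is computed incrementally
app.get('*/api/CelerFTFileUpload/GetChecksum*', function (request, response) {

    var filename = uploadpath + request.param('directoryname') + '/' + request.param('filename');
    var hash = crypto.createHash('md5');

    var rstream = fs.createReadStream(filename);

    rstream.on('data', function (data) {
        hash.update(data);
    });

    rstream.on('end', function () {
        response.status(200).send(hash.digest('hex'));
    });

    rstream.on('error', function (err) {
        response.status(500).send(err);
    });
});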
Other Improvements
As mentioned, the other improvement we made was to create an API call for each type of file upload that CelerFT supports; a sketch of the resulting route layout follows the list.
- The Base64 API call handles uploads in which the CelerFT-Encoded header is set to base64.
- The FormData API call handles all multipart/form-data uploads.
- The XFileName API call is used to offload file uploads to the NGINX reverse proxy.
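The routes could be laid out as follows; the Base64 and FormData route paths and the handler names are assumptions based on the XFileName route shown earlier.

// Hypothetical route layout: one endpoint per upload encoding. The
// handler implementations are elided; only the XFileName route is
// confirmed by the code above.
app.post('*/api/CelerFTFileUpload/UploadChunk/Base64*', handleBase64Upload);
app.post('*/api/CelerFTFileUpload/UploadChunk/FormData*', handleFormDataUpload);
app.post('*/api/CelerFTFileUpload/UploadChunk/XFileName*', handleXFileNameUpload);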
The preliminary tests showed marked improvements in the performance of the backend server during file uploads. Please feel free to download CelerFT and provide feedback on its performance.
The code for this project can be found in my GitHub repository under the nginxasync branch.