
A Beginner's Guide to NoSQL

Software Developer's Journal


1 Aug 2013

This article aims to explore the basic ideas and principles of NoSQL databases. NoSQL is relevant to database admins, programmers, coders, web devs, etc.

This article is written by Sufyan bin Uzayr, and was originally published in the June 2013 issue of the Software Developer's Journal. You can find more articles at the SDJ website.

Introduction


Let’s say you’ve decided to set up a website or an application. You’ll obviously need something to manage the data. Yes, that’s right, a database. So, what is it going to be? MySQL, MS-SQL, Oracle or PostgreSQL? After all, nothing can be as amazing as a good old RDBMS that employs SQL to manage the data.

Well, allow me to introduce you to an entirely different and unconventional database model: NoSQL. Just like every other fine article out there, we too shall begin with... eh... disclaimers!

NoSQL is usually read as "Not Only SQL". The idea here is not to oppose SQL, but to provide an alternative way of storing data. And since most users are well versed in SQL, many NoSQL databases strive to provide an SQL-like query interface.

Why NoSQL?

That’s a valid question, indeed. Well, here are the reasons:

Managing Large Chunks of Data: NoSQL databases can handle heavy read/write workloads, many concurrent users, and data volumes in the petabyte range.

Schema? Nah, not needed: Most NoSQL databases are schema-less and therefore very flexible. You are free to decide how each record is structured, which makes it easy to map application objects onto them. Normalization and complex joins are, well, not needed!

Programmer-friendly: NoSQL databases provide simple APIs in most major programming languages, so there is no need for complex ORM frameworks. And in case no API is available for a particular programming language, the data can still be accessed over HTTP via a simple RESTful API, using XML and/or JSON.

Availability: Most distributed NoSQL databases replicate data easily, so the failure of one node does not seriously affect the availability of the data.

Scalability: NoSQL databases do not require a dedicated high performance server. Actually, they can easily be run on a cluster of commodity hardware and scaling out is just as simple as adding a new node.

Low Latency: Unless you are running a cluster of a trillion data servers (give or take a few million), NoSQL can help you achieve very low latency. Of course, latency itself depends on how much of the data can be held in memory.

Triple stores save data in the form subject-predicate-object, with the predicate linking the subject to the object. As such, triple stores are also variants of network databases. For instance, take the statement "Jonny Nitro reads Data Center Magazine." Here, Jonny Nitro is the subject, Data Center Magazine is the object, and the term 'reads' is the predicate linking the two. Mapping such semantic queries into SQL can prove difficult, so NoSQL offers a viable alternative. Some of the major implementations of triple stores are Sesame, Jena, Virtuoso, AllegroGraph, etc.
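To make the subject-predicate-object model concrete, here is a minimal in-memory sketch in Python. The `TripleStore` class and its `match` method are hypothetical names invented for illustration, and the "Cloud Weekly" fact is made up. Real triple stores such as those listed above typically store RDF and are queried with SPARQL; this sketch only shows the data model and wildcard pattern matching.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Hypothetical toy triple store: every fact is one (subject, predicate, object)."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield every triple that fits the pattern; None acts as a wildcard."""
        for s, p, o in self.triples:
            if ((subject is None or s == subject)
                    and (predicate is None or p == predicate)
                    and (obj is None or o == obj)):
                yield (s, p, o)


store = TripleStore()
store.add("Jonny Nitro", "reads", "Data Center Magazine")
store.add("Jonny Nitro", "subscribesTo", "Cloud Weekly")  # hypothetical extra fact

# "Who reads Data Center Magazine?" is a pattern with a wildcard subject
readers = [s for s, _, _ in store.match(predicate="reads", obj="Data Center Magazine")]
print(readers)  # ['Jonny Nitro']
```

Because every fact has the same three-part shape, new kinds of relationships can be added without changing any schema.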

SQL ideology

Basically, NoSQL trades the strict guarantees of traditional SQL databases for the trade-offs described by the CAP theorem, or Brewer's theorem, formulated by Eric Brewer in 2000. The theorem concerns three properties, Consistency, Availability and Partition tolerance (abbreviated as CAP), and states that a distributed database can guarantee at most two of them at the same time. Many NoSQL databases respond by employing eventual consistency, a relaxed form of consistency in which replicas may temporarily disagree but converge to the same state once updates stop arriving. This, in turn, greatly improves availability and scalability. The resulting paradigm is often called BASE: Basically Available, Soft state, Eventual consistency.
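Eventual consistency is easier to grasp with a small simulation. The sketch below is an assumption-laden illustration, not how any particular NoSQL database works: two hypothetical `Replica` objects accept writes independently, timestamps are supplied by hand in place of a real clock, and conflicts are resolved with a last-write-wins rule, which is only one of several possible strategies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """Hypothetical copy of a key-value dataset; each value carries a write timestamp."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key: str, value: str, timestamp: int) -> None:
        self.data[key] = (timestamp, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other: "Replica") -> None:
        """Anti-entropy step with last-write-wins: keep whichever entry is newer.
        Ties are broken by comparing values, so every replica picks the same winner."""
        for key, entry in other.data.items():
            if key not in self.data or self.data[key] < entry:
                self.data[key] = entry


a, b = Replica("a"), Replica("b")
a.write("cart", "1 book", timestamp=1)  # the write reaches replica a only
print(b.read("cart"))                   # None: a stale read from replica b

# Later, the replicas exchange state in both directions and converge.
b.merge_from(a)
a.merge_from(b)
print(a.read("cart"), b.read("cart"))   # 1 book 1 book
```

The stale read before the merge is the "soft state" of BASE; the agreement afterwards is the "eventual" part.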

NoSQL Data Models

The major NoSQL data models are as follows:

1. Document Stores

2. Hierarchical

3. Network

4. Column-oriented

5. Object-oriented

6. Key-value Stores

7. Triple Stores

Document stores

Gone are the days when data organization was as minimal as simple rows and columns. Today, data is more often than not represented as XML or JSON (we're talking about the Web, basically). XML and JSON are favored because both are extremely portable, compact and standardized. Bluntly put, it makes little sense to map XML or JSON documents onto a relational model; a wiser decision is to use a document store. Why? Because document stores are schema-less: there is no predefined schema for an XML or JSON document, and as a result each document is independent of the others. Document stores can be employed for CRM, web-related data, real-time data, etc. Some of the best-known implementations are MongoDB, CouchDB and RavenDB. In fact, MongoDB has been used by websites such as bit.ly and SourceForge.
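The sketch below shows what "schema-less" means in practice: two documents with completely different fields live in the same collection, and queries simply match on whatever fields a document happens to have. The `DocumentCollection` class is hypothetical and only loosely inspired by the equality filters of MongoDB's `find`; it is not the API of any real document store, and the sample data is made up.

```python
import json


class DocumentCollection:
    """Hypothetical schema-less collection: each document is a JSON object."""

    def __init__(self) -> None:
        self._docs: dict = {}
        self._next_id = 1

    def insert(self, document: dict) -> int:
        doc_id = self._next_id
        self._next_id += 1
        # Round-trip through JSON so only JSON-serialisable documents are stored
        self._docs[doc_id] = json.loads(json.dumps(document))
        return doc_id

    def find(self, **criteria) -> list:
        """Return documents whose top-level fields equal every given criterion."""
        return [doc for doc in self._docs.values()
                if all(doc.get(k) == v for k, v in criteria.items())]


users = DocumentCollection()
users.insert({"name": "Jonny Nitro", "city": "Pune"})
# A second document with different fields: no schema change or migration needed
users.insert({"name": "Asha", "email": "asha@example.com", "tags": ["admin"]})

print(users.find(name="Asha"))
# [{'name': 'Asha', 'email': 'asha@example.com', 'tags': ['admin']}]
```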

Hierarchical Databases

These databases store data as a hierarchy, that is, as a tree of parent-child relationships. In relational terms, this is a 1:N relationship. Geospatial databases, for instance, can store location information, which is essentially hierarchical, in a hierarchical form, though the algorithms used may vary. Geotagging and geolocation are in vogue of late, and it is in such uses that a geospatial database becomes very relevant, for example in a Geographic Information System (GIS). Examples of spatial databases include PostGIS, Oracle Spatial, etc. Some of the best-known implementations of hierarchical databases are the Windows Registry by Microsoft and IMS (Information Management System) by IBM.
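A hierarchical model can be sketched as a tree in which every node has exactly one parent and any number of children, and data is reached by walking a path from the root, much like a registry key or a continent/country/state location. The `Node`, `set_value` and `get_value` names below are hypothetical, chosen for this illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node of a tree: a single (implicit) parent, many children (1:N)."""
    name: str
    values: dict = field(default_factory=dict)
    children: dict = field(default_factory=dict)


def set_value(root: Node, path: str, key: str, value: str) -> None:
    """Walk a slash-separated path from the root, creating nodes as needed."""
    node = root
    for part in path.split("/"):
        if part not in node.children:
            node.children[part] = Node(part)
        node = node.children[part]
    node.values[key] = value


def get_value(root: Node, path: str, key: str) -> Optional[str]:
    """Follow the path from the root; return None if any node is missing."""
    node = root
    for part in path.split("/"):
        node = node.children.get(part)
        if node is None:
            return None
    return node.values.get(key)


world = Node("World")
set_value(world, "Asia/India/Maharashtra", "capital", "Mumbai")
print(get_value(world, "Asia/India/Maharashtra", "capital"))  # Mumbai
print(get_value(world, "Asia/Nepal", "capital"))              # None
```

Lookups along a path are cheap, but data that needs several parents does not fit naturally, which is where network and graph models come in.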

Graph Network Databases

Graph databases are the most popular form of network database and are used to store data that can be represented as a graph. The relationships in such data can grow very quickly and change often, so graph databases are well suited to highly connected data that changes frequently. Leaving theory aside, perhaps the best-known example is FlockDB, developed by Twitter to store the graph of who follows whom. FlockDB is built on the Gizzard framework and can handle up to 10,000 queries per second. A common technique for querying a graph is to begin at an arbitrary or specified start node and traverse the graph depth-first or breadth-first, following only the relationships that satisfy the given criteria. Most graph databases offer simple APIs for accomplishing this. For instance, you can ask: "Does Jonny Nitro read Data Center Magazine?" Apart from FlockDB, some of the most popular graph databases are HyperGraphDB and Neo4j.
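The traversal described above can be sketched as a breadth-first search over an adjacency list whose edges carry relationship labels. This is a hypothetical illustration in plain Python, not the query API of FlockDB, Neo4j or any other graph database; the "Alice" and "Cloud Weekly" data are made up.

```python
from collections import deque

# Adjacency list: node -> list of (relationship, neighbour). Illustrative data only.
graph = {
    "Jonny Nitro": [("reads", "Data Center Magazine"), ("follows", "Alice")],
    "Alice": [("reads", "Cloud Weekly")],
}


def reachable(graph, start, goal, relation=None):
    """Breadth-first search from `start`, following only edges labelled `relation`
    (or every edge when relation is None). Return True if `goal` is reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for label, neighbour in graph.get(node, []):
            if (relation is None or label == relation) and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


print(reachable(graph, "Jonny Nitro", "Data Center Magazine", "reads"))  # True
print(reachable(graph, "Jonny Nitro", "Cloud Weekly"))                   # True (via Alice)
print(reachable(graph, "Jonny Nitro", "Cloud Weekly", "reads"))          # False
```

Restricting the traversal to one relationship type is what turns "is there any path?" into a meaningful question such as "does Jonny Nitro read this magazine?".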

Column-oriented Databases

Column-oriented databases came into existence after Google's research paper on Bigtable, its distributed storage system, which is used internally together with the Google File System. Some of the popular implementations are Apache HBase (part of the Hadoop ecosystem), Apache Cassandra, Hypertable, etc.

Such databases work much like sparse three-dimensional maps: the first dimension is the row identifier, the second is the column family plus column identifier, and the third is the timestamp. Column-oriented databases are employed by Facebook, Reddit, Digg, etc.
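The three dimensions above can be modeled as a nested map: row key, then "family:qualifier" column, then timestamp. The sketch below is a hypothetical toy, not the API of HBase, Cassandra or Hypertable; the reversed-domain row key and the `contents`/`anchor` families are illustrative choices in the spirit of the Bigtable paper.

```python
from collections import defaultdict
from typing import Optional


class ColumnFamilyTable:
    """Hypothetical Bigtable-style map: (row, 'family:qualifier', timestamp) -> value.
    Storage is sparse: a row holds only the columns that were actually written."""

    def __init__(self) -> None:
        # row key -> column -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, column: str, value: str, timestamp: int) -> None:
        self._rows[row][column][timestamp] = value

    def get(self, row: str, column: str, timestamp: Optional[int] = None) -> Optional[str]:
        """Return the newest version, or the newest one at or before `timestamp`."""
        versions = self._rows.get(row, {}).get(column, {})
        eligible = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(eligible)] if eligible else None


table = ColumnFamilyTable()
row = "com.example.www"  # hypothetical row key (a reversed domain name)
table.put(row, "contents:html", "<html>v1</html>", timestamp=1)
table.put(row, "contents:html", "<html>v2</html>", timestamp=2)
table.put(row, "anchor:other.example", "Example link", timestamp=1)

print(table.get(row, "contents:html"))               # <html>v2</html>
print(table.get(row, "contents:html", timestamp=1))  # <html>v1</html>
print(table.get(row, "contents:pdf"))                # None (column never written)
```

Keeping old versions under their timestamps is what lets such stores answer "what did this cell contain at time t?" without a separate history table.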

Object-oriented Databases

Whether object-oriented databases are truly NoSQL databases is debatable, yet they are usually considered so because they too depart from traditional relational data models. Such databases store data directly as objects, so application objects are persisted transparently instead of being mapped onto tables. Some of the most popular ones include db4o, NEO, Versant, etc. Object-oriented databases are generally used for research purposes or web-scale production.
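Python's standard-library `shelve` module is not an object-oriented database (it has no query language or transactions), but it gives a small taste of the idea: an object goes in and the same kind of object comes back out, with no table mapping in between. The `Subscriber` class and the temporary storage path are hypothetical, used only for this sketch.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass


@dataclass
class Subscriber:
    """Hypothetical application object to be persisted as-is."""
    name: str
    magazines: list


path = os.path.join(tempfile.mkdtemp(), "objects")  # hypothetical storage location

# Store the object directly; shelve pickles it, so no tables or ORM mapping are needed.
with shelve.open(path) as db:
    db["jonny"] = Subscriber("Jonny Nitro", ["Data Center Magazine"])

# Reopen the store and get an equivalent Subscriber object back.
with shelve.open(path) as db:
    restored = db["jonny"]

print(restored)  # Subscriber(name='Jonny Nitro', magazines=['Data Center Magazine'])
```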

Key-value stores

Key-value stores are (arguably) based on Amazon's Dynamo research paper and on distributed hash tables. This data model is extremely simple: it generally contains just one global set of key-value pairs, with each value associated with a unique key. The database is therefore highly scalable and does not store data relationally. Some popular implementations include Project Voldemort (open-sourced by LinkedIn), Redis, Tokyo Cabinet, etc.
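The Dynamo paper spreads keys across nodes with consistent hashing, where each node owns many points ("virtual nodes") on a hash ring. The sketch below shows that partitioning idea in a few lines; `ConsistentHashRing`, `KeyValueStore` and the node names are hypothetical, replication is left out entirely, and SHA-256 is used only to spread positions evenly.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Hash a string onto the ring; SHA-256 is used only for an even spread."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Map keys to nodes with consistent hashing (each node owns many ring points)."""

    def __init__(self, nodes, points_per_node: int = 100) -> None:
        self.points_per_node = points_per_node  # "virtual nodes" even out the load
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """A key belongs to the first node point clockwise from the key's hash."""
        index = bisect.bisect(self._ring, (ring_position(key), ""))
        return self._ring[index % len(self._ring)][1]


class KeyValueStore:
    """Hypothetical key-value store whose data is partitioned by the ring."""

    def __init__(self, nodes) -> None:
        self.ring = ConsistentHashRing(nodes)
        self.partitions = {node: {} for node in nodes}

    def put(self, key: str, value: str) -> None:
        self.partitions[self.ring.node_for(key)][key] = value

    def get(self, key: str):
        return self.partitions[self.ring.node_for(key)].get(key)


kv = KeyValueStore(["node-a", "node-b", "node-c"])
kv.put("user:42", "Jonny Nitro")
print(kv.get("user:42"))  # Jonny Nitro

# Adding a node moves some keys onto the new node; every other key stays put.
ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
print(len(moved), all(ring.node_for(k) == "node-d" for k in moved))
```

This is why "scaling out is as simple as adding a new node": only the keys that the new node takes over have to be moved.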

Triple stores

Summary

So, what now? Well, you’ve just been introduced to NoSQL. However, does this mean that you should make the switch to it from SQL? Perhaps. Or perhaps not. The answer varies from situation to situation. If you find SQL queries way too much to cope with, chances are you’ll find NoSQL equally difficult. However, if you’re looking for a more flexible alternative and do not mind getting your hands dirty, you should definitely give NoSQL a spin! The choice, obviously, is yours! Happy data managing to you!

About the author

Sufyan bin Uzayr is a 20-year-old freelance writer, graphic artist and photographer based in India. Sufyan has been extensively involved in the field of graphic design and web development, and he has also developed apps for the mobile platform. Currently writing for two print magazines and six blogs, Sufyan is also the Editor-in-Chief of Brave New World, a contemporary electronic journal. Visit Sufyan's website at www.sufyan.co.nr or his e-journal at www.bravenewworld.in. You may also email him at sufyan@live.in

History

Add some missing text at the end (2-08-2013).


License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
