This article is written by Sufyan bin Uzayr, and was originally published in the June 2013 issue of the Software Developer's Journal. You can find more articles at the SDJ website.
Introduction
This article explores the basic ideas and principles behind NoSQL databases. NoSQL caters to database admins, programmers, web developers and others who work with data.
Let’s say you’ve decided to set up a website
or an application. You’ll obviously need something to manage the
data. Yes, that’s right, a database. So, what is it going to be? MySQL, MS-SQL,
Oracle or PostgreSQL? After all, nothing can be as amazing as a good old RDBMS
that employs SQL to manage the data.
Well, allow me to introduce to you an entirely unconventional database model: NoSQL. Just like every other fine article out there, we too shall begin with... eh... disclaimers!
NoSQL stands for not-only-SQL. The idea here is not to oppose SQL, but instead to provide an alternative way of storing data. Yet, because most users are well versed in SQL, many NoSQL databases strive to provide an SQL-like query interface.
Why NoSQL?
That’s a valid question, indeed. Well, here
are the reasons:
- Managing Large Chunks of Data: NoSQL databases are built to handle numerous read/write cycles, many concurrent users and data volumes ranging into petabytes.
- Schema? Nah, not needed: Most NoSQL databases have no fixed schema and are therefore very flexible. They leave you free to structure your data as you see fit and make it easy to map objects into them. Normalization and complex joins are, well, not needed!
- Programmer-friendly: NoSQL databases provide simple APIs for most major programming languages, so there is often no need for a complex ORM framework. And just in case an API is not available for a particular programming language, data can in many cases still be accessed over HTTP via a simple RESTful API, using XML and/or JSON.
- Availability: Most distributed NoSQL databases provide easy replication of data, so the failure of one node does not affect the availability of data in a major way.
- Scalability: NoSQL databases do not require a dedicated high-performance server. They can run on a cluster of commodity hardware, and scaling out is as simple as adding a new node.
- Low Latency: Unless you are running a cluster of a trillion data servers (or something like that, give or take a few million of them), NoSQL can help you achieve extremely low latency. Of course, latency in itself depends on how much of the data can be held in memory.
SQL ideology
Basically, NoSQL drops the traditional SQL ideology in favor of the CAP Theorem, also known as Brewer’s Theorem, formulated by Eric Brewer in 2000. The theorem talks about three basic properties, Consistency, Availability and Partition Tolerance (abbreviated as CAP), and states that a distributed database can satisfy at most two of these at the same time. Many NoSQL databases respond to the theorem by employing Eventual Consistency, a more relaxed form of consistency in which an update reaches all replicas over a sufficient period of time rather than immediately.
This in turn improves availability and scalability to a great extent. This paradigm is often termed BASE: Basically Available, Soft state, Eventual consistency.
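To make eventual consistency a little more concrete, here is a minimal sketch in Python. It is a toy model and not how any particular NoSQL database works: the `Replica` class, the `anti_entropy` function and the last-writer-wins versioning are all assumptions made for illustration.

```python
class Replica:
    """A toy replica that maps each key to a (version, value) pair."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        # Last-writer-wins: keep a value only if its version is newer.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(replicas):
    """Background sync: every replica pushes its entries to the others."""
    for source in replicas:
        for target in replicas:
            if source is not target:
                for key, (version, value) in source.data.items():
                    target.write(key, value, version)


a, b = Replica("a"), Replica("b")
a.write("user:1", "Jonny Nitro", version=1)  # accepted by one node only
stale = b.read("user:1")                     # b has not caught up yet
anti_entropy([a, b])                         # replicas exchange updates
fresh = b.read("user:1")                     # b now sees the write
```

Between the write and the sync, the two replicas disagree, yet both keep answering reads. That is the trade: availability now, consistency a little later.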
NoSQL Data Models
Some of the major and most prominent
differentiations among NoSQL databases are as follows:
1. Document Stores
2. Hierarchical
3. Network
4. Column-oriented
5. Object-oriented
6. Key-value Stores
7. Triple Stores
Document stores
Gone are the days when data organization used to be as minimal as simple rows and columns. Today, data is more often than not represented in the form of XML or JSON (we’re talking about the Web, basically). XML and JSON are favored because both are portable and standardized. Bluntly put, it makes little sense to map XML or JSON documents into a relational model. Instead, a wiser decision would be to utilize the document stores already available. Why? Again, simply because these databases are schema-less: there is no predefined schema for an XML or JSON document, and as a result each document is independent of the others. Such databases can be employed for CRM, web-related data, real-time data, etc. Some of the most well-known implementations are MongoDB, CouchDB and RavenDB. In fact, MongoDB has been used by websites such as bit.ly and SourceForge.
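The schema-less idea can be sketched in a few lines of Python. This is a toy in-memory collection and not the API of MongoDB, CouchDB or RavenDB; the `DocumentCollection` class and its `insert` and `find` methods are hypothetical names chosen for this example.

```python
import json


class DocumentCollection:
    """A toy in-memory document store; no schema is ever declared."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, document):
        doc_id = self._next_id
        self._next_id += 1
        # Round-trip through JSON so the store keeps its own copy
        # and only JSON-compatible documents are accepted.
        self._docs[doc_id] = json.loads(json.dumps(document))
        return doc_id

    def find(self, **criteria):
        # Return every document whose fields match all the criteria.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


people = DocumentCollection()
# Two documents with different fields live side by side.
people.insert({"name": "Jonny Nitro", "reads": ["Data Center Magazine"]})
people.insert({"name": "Ada", "city": "London", "age": 36})

matches = people.find(city="London")
```

Notice that the first document has no `city` field at all, and nothing complains. Each document simply carries whatever fields it needs.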
Hierarchical Databases
These databases store data hierarchically, that is, as a tree of parent-child relationships. In relational terms, this is a 1:N relationship. Geospatial databases, for example, can store location information in a hierarchical form, since such data is essentially hierarchical, though the algorithms may vary. Geotagging and geolocation are in vogue of late, and it is in such uses that a geospatial database becomes very relevant, for instance within a Geographic Information System (GIS). Major examples include PostGIS, Oracle Spatial, etc. Some of the most well-known implementations of hierarchical databases are the Windows Registry by Microsoft and the IMS database by IBM.
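A parent-child (1:N) structure is easy to sketch in Python. The `Node` class and the place names below are assumptions made purely for illustration; real hierarchical databases such as IMS use their own storage formats and APIs.

```python
class Node:
    """One record in a tree: a single parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        # Walk up to the root, then reverse to get a root-first path.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


world = Node("World")
asia = Node("Asia", world)
india = Node("India", asia)
delhi = Node("Delhi", india)
japan = Node("Japan", asia)

delhi_path = delhi.path()
asian_countries = [child.name for child in asia.children]
```

Every record is reached by walking down from the root, much like a path in the Windows Registry or a file system.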
Graph Network Databases
Graph databases are the most popular form of network database and are used to store data that can be represented in the form of a graph. The number of relationships in such data can grow very quickly, and graph databases are well suited to data whose connections change frequently. Theory aside, perhaps the most awesome example of a graph database is FlockDB, developed by Twitter to implement a graph of who follows whom. FlockDB uses the Gizzard framework to query a database up to 10,000 times per second. A general technique for querying a graph is to begin from an arbitrary or specified start node and traverse the graph in a depth-first or breadth-first fashion, following only the relationships that satisfy the given criterion. Most graph databases let the developer accomplish this through simple APIs. For instance, you can make queries such as: “Does Jonny Nitro read Data Center Magazine?” Some of the most popular graph databases include, apart from FlockDB, HyperGraphDB and Neo4j.
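The breadth-first traversal described above can be sketched in Python as follows. The `follows` graph and the `reachable` function are made up for this example; they do not reflect the API of FlockDB, HyperGraphDB or Neo4j.

```python
from collections import deque

# A hypothetical who-follows-whom graph stored as adjacency lists.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def reachable(graph, start, target):
    """Breadth-first search: can we get from start to target?"""
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return False


alice_reaches_dave = reachable(follows, "alice", "dave")
dave_reaches_alice = reachable(follows, "dave", "alice")
```

Swapping `popleft()` for `pop()` would turn the queue into a stack and the search into a depth-first one; the rest of the function stays the same.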
Column-oriented Databases
Column-oriented databases came into existence after Google’s research paper on its BigTable distributed storage system, which is used internally along with the Google File System. Some of the popular implementations are Apache HBase (part of the Hadoop ecosystem), Apache Cassandra, Hypertable, etc.
Such databases are implemented more like three-dimensional arrays, the first dimension being the row identifier, the second being a combination of column family plus column identifier and the third being the timestamp. Column-oriented databases are employed by Facebook, Reddit, Digg, etc.
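The three dimensions (row, column family plus column, timestamp) can be modelled in Python with nested dictionaries. This is only a sketch of the data model; the `ColumnTable` class and its methods are hypothetical and are not the HBase or Cassandra API.

```python
from collections import defaultdict


class ColumnTable:
    """Toy model: row key -> "family:column" -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, column, value, timestamp):
        self._rows[row][f"{family}:{column}"][timestamp] = value

    def get(self, row, family, column, timestamp=None):
        # Use .get() so that reads never create empty rows or columns.
        versions = self._rows.get(row, {}).get(f"{family}:{column}", {})
        candidates = [
            ts for ts in versions if timestamp is None or ts <= timestamp
        ]
        if not candidates:
            return None
        # Newest version at or before the requested timestamp.
        return versions[max(candidates)]


table = ColumnTable()
table.put("user1", "profile", "email", "old@example.com", timestamp=1)
table.put("user1", "profile", "email", "new@example.com", timestamp=5)

latest = table.get("user1", "profile", "email")
as_of_3 = table.get("user1", "profile", "email", timestamp=3)
```

Because the timestamp is part of the address, older versions of a cell stay readable alongside the newest one.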
Object-oriented Databases
Whether or not object-oriented databases are
purely NoSQL databases is debatable, yet they are more often than not
considered to be so because such databases too depart from traditional RDBMS
based data models. Such databases allow the storage of data in the form of objects, which makes persistence largely transparent to the application. Some of the most popular ones include db4o, NEO, Versant, etc. Object-oriented databases are generally used for research purposes or web-scale production.
Key-value stores
Key-value stores are (arguably) based on Amazon’s Dynamo research paper and distributed hash tables. Such data models are extremely simple and generally contain only one set of global key-value pairs, with each value having a unique key associated with it. The database, therefore, is highly scalable and does not store data relationally. Some popular implementations include Project Voldemort (open-sourced by LinkedIn), Redis, Tokyo Cabinet, etc.
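Here is a rough Python sketch of the idea: one global set of key-value pairs, spread over several nodes by hashing the key. The `KeyValueStore` class is hypothetical, and the simple modulo placement is an assumption chosen for brevity; it is not the partitioning scheme of Dynamo, Voldemort or Redis.

```python
import hashlib


class KeyValueStore:
    """Toy key-value store that spreads keys over several 'nodes'."""

    def __init__(self, node_count=3):
        # Each node is just a dictionary in this sketch.
        self._nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Hash the key to decide which node owns it.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self._nodes[int(digest, 16) % len(self._nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)


store = KeyValueStore()
store.put("user:1", "Jonny Nitro")
store.put("user:2", "Ada")

name = store.get("user:1")
missing = store.get("user:99", default="unknown")
```

There are no joins and no relations here: you hand over a key and get back a value, which is exactly why such stores scale out so easily.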
Triple stores
Triple stores save data in the form of subject-predicate-object, with the predicate being the linking factor between subject and object. As such, triple stores too are variants of network databases. For instance, let’s say “Jonny Nitro reads Data Center Magazine.” In this case, Jonny Nitro is the subject, while Data Center Magazine is the object, and the term ‘reads’ acts as the predicate linking the subject with the object. Mapping such semantic queries into SQL tends to be difficult, and therefore NoSQL offers a viable alternative. Some of the major implementations of triple stores are Sesame, Jena, Virtuoso, AllegroGraph, etc.
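The subject-predicate-object model can be tried out with a tiny Python sketch. The `TripleStore` class and its `match` method are invented for this example; real triple stores such as Jena or Sesame are typically queried through languages like SPARQL instead.

```python
class TripleStore:
    """Toy triple store: a set of (subject, predicate, object) tuples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        # None acts as a wildcard for that position.
        return sorted(
            triple for triple in self._triples
            if (subject is None or triple[0] == subject)
            and (predicate is None or triple[1] == predicate)
            and (obj is None or triple[2] == obj)
        )


store = TripleStore()
store.add("Jonny Nitro", "reads", "Data Center Magazine")
store.add("Jonny Nitro", "writes", "Python")
store.add("Ada", "reads", "Data Center Magazine")

# "Who reads Data Center Magazine?"
readers = [s for s, _, _ in store.match(predicate="reads",
                                        obj="Data Center Magazine")]
# "Does Jonny Nitro read Data Center Magazine?"
jonny_reads = bool(store.match("Jonny Nitro", "reads",
                               "Data Center Magazine"))
```

Leaving any of the three positions as a wildcard turns the same function into a different question, which is the essence of querying triples.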
Summary
So, what now? Well, you’ve just been
introduced to NoSQL. However, does this mean that you should make the switch to
it from SQL? Perhaps. Or perhaps not. The answer varies from situation to
situation. If you find SQL queries way too much to cope with, chances are
you’ll find NoSQL equally difficult. However, if you’re looking for a more
flexible alternative and do not mind getting your hands dirty, you should definitely give NoSQL a spin! The choice, obviously, is yours! Happy data managing to you!
About the author
Sufyan bin Uzayr is a 20-year-old freelance writer, graphic artist and photographer based in India. Sufyan has been extensively involved in the field of graphic design and web development, and he has also developed apps for the mobile platform. Currently writing for two print magazines and six blogs, Sufyan is also the Editor-in-Chief of Brave New World, a contemporary electronic journal. Visit Sufyan’s website at www.sufyan.co.nr or his e-journal at www.bravenewworld.in. You may also mail him at sufyan@live.in