Part 6: Google Cloud Datastore
Welcome back to the sixth installment of our ongoing
series on Google Cloud Platform. If you're one of those people who suffers from
FOMO (fear of missing out) you might be better off starting with Part 1
here, but if you're alright with coming a bit late to the party
then please read on - we're building an application on Google Cloud Platform
and in this installment we're going to continue with our investigation of
something that most applications need to do: store data.
As we mentioned last time, Google offers several different
ways of dealing with data storage: Google Cloud SQL, which we covered last time, is for
those applications that want to store data in the time-honored fashion of the
relational database and relational model. There’s also the Google Cloud Storage
Client, which is geared more for "large binaries", like images and videos,
which we’ll get into next time. And lastly, there’s Google Cloud Datastore, a
non-relational "NoSQL" data storage approach that we’ll get into here, in just
a moment.
Before we do so, however, a very serious point bears
repeating: Much as the various technical pundits and evangelists might want to
disagree, none of these is "superior" to the others. Those individuals who
prefer to slavishly follow whatever "best practice" industry pundits are going
on about will hate to hear me say this, but the fact is, each one solves a
different kind of problem, and sometimes the best approach is to use all of
them simultaneously, a technique sometimes called "polyglot persistence" or
"poly-store persistence". Or, as a famous writer once put it, "From each
database, according to its abilities, to each project, according to its needs."
(OK, I admit it, that writer was me, just now. But still, it
sounds good, doesn’t it?)
Google Cloud Datastore
Of the two, Google Cloud SQL has the advantage that
it builds on top of the ever-familiar JDBC programming model, one that Java
developers will be able to code in their sleep. The Google Cloud Datastore API, on the
other hand, is not one that many Java developers will know; fortunately, Google
provides two approaches (and three APIs) for accessing it: a JDO- or JPA-based
approach that encourages developers to build persistent classes and leave the
details of persistence to the library, or a "low-level" API designed to provide
access to the raw details of the Google Cloud Datastore storage layer. (For those who
are truly curious, Google Cloud Datastore is built on top of Google’s "Bigtable"
storage system, one of the earliest of the NoSQL-branded storage systems, which
is described in detail at http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf,
though readers are warned, this is a full-blown academic paper, and pulls no
punches.)
Depending on which approach appeals, the details of using Google Cloud
Datastore differ. The JDO and JPA approaches both require developers to write "persistent
classes", classes that conform to a particular set of restrictions and are
"enhanced" (modified) during the build process to include additional
functionality to make the persistence (mostly) transparent to the application
developer. For developers that don’t want (or need) to bother with low-level
details, this is usually the better approach.
The high-level APIs, however, have the disadvantage that
they are high-level APIs, and there are times when a "raw" low-level approach
will offer certain benefits. So, again, neither approach is "better" than the
other, though it is fairly safe to say that for persisting an instance of a
given class, the high-level APIs will take fewer lines of code than the
low-level APIs, whereas the low-level APIs will offer a finer degree of
control.
Regardless of approach taken, Google Cloud Datastore has several
advantages and disadvantages. For starters, Google Cloud Datastore "entities" are not
stored in a tabular relational format; the format has some vaguely tabular
shape to it, allowing for entities to in turn have data elements stored on them
as directly-dependent data ("properties"), but it doesn’t recognize "relations"
(foreign key relationships) as a core part of the model, and it doesn’t support
an ad-hoc query format like SQL provides. In return, it automatically
distributes data to manage very large data sets, and supports incredibly fast
queries, largely because the queries are known well ahead of time and can be
optimized long before the query actually runs. It’s a tradeoff,
ease-of-accessibility against scale, but the beautiful thing about a poly-store
approach is that highly-relational data can be stored in Google Cloud SQL, and
large-scale data can be stored in Google Cloud Datastore, and both accessed and used
from the same application.
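To picture why a query that is "known ahead of time" can be answered quickly, here is a toy model in plain Java. It is not the Datastore API; ToyIndex and its methods are hypothetical names made up for this sketch. The idea it illustrates is that index entries are kept in sorted order, so every greeting for one target sits in one contiguous range that can be scanned without touching anything else.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of a pre-built index over (target, timestamp); NOT the Datastore API. */
final class ToyIndex {
    // Sorted index entries: "target\u0000timestamp" -> entity id
    private final NavigableMap<String, Long> entries = new TreeMap<>();

    /** Adds one index entry. Assumes target has no \u0000 and the timestamp is non-negative. */
    void put(String target, long timestampMillis, long entityId) {
        entries.put(key(target, timestampMillis), entityId);
    }

    /** Answers "all entities for this target, oldest first" with a single range scan. */
    List<Long> idsForTarget(String target) {
        return new ArrayList<>(
            entries.subMap(target + "\u0000", true, target + "\u0001", false).values());
    }

    private static String key(String target, long timestampMillis) {
        // Zero-padding keeps string order identical to numeric order
        return target + "\u0000" + String.format("%020d", timestampMillis);
    }
}
```

A query that has no matching index (say, "all messages containing the word hello") would have no sorted range to scan, which is roughly the price paid for giving up ad-hoc SQL.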
Entities stored in Google Cloud Datastore aren’t accessed in the
same way that we’re used to from SQL-based databases, either—entities are
structured in a hierarchy (analogously to how files are stored on a
filesystem), and thus have a parent entity, except for the "root" entities in
the system. Finding an entity, then, becomes an exercise in navigating the
child paths to a given entity, much as finding a file on the filesystem is an
exercise in navigating through directories to the file in question.
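The filesystem analogy can be sketched in a few lines of plain Java. This is a toy model only, not the Datastore API: EntityPath, root(), child() and render() are hypothetical names invented for the illustration, and the "kind:name" rendering is just a convenient way to print the path.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a hierarchical entity key; NOT the Datastore API. */
final class EntityPath {
    private final EntityPath parent; // null for a "root" entity
    private final String kind;       // the type of entity, e.g. "Message"
    private final String name;       // identifies the entity among its siblings

    private EntityPath(EntityPath parent, String kind, String name) {
        this.parent = parent;
        this.kind = kind;
        this.name = name;
    }

    static EntityPath root(String kind, String name) {
        return new EntityPath(null, kind, name);
    }

    EntityPath child(String kind, String name) {
        return new EntityPath(this, kind, name);
    }

    boolean isRoot() {
        return parent == null;
    }

    /** Walks up to the root and renders the chain like a file path. */
    String render() {
        Deque<String> parts = new ArrayDeque<>();
        for (EntityPath p = this; p != null; p = p.parent) {
            parts.addFirst(p.kind + ":" + p.name);
        }
        return "/" + String.join("/", parts);
    }
}
```

With this model, EntityPath.root("User", "ted").child("Message", "greeting-1").render() yields "/User:ted/Message:greeting-1". In the real low-level API, the KeyFactory.Builder class plays a similar role in building a key from a chain of ancestors.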
It’s a bit easier to see in code, so let’s look at the two
high-level approaches side by side. If these two don’t really float your boat,
by the way, Google suggests three other possible open-source frameworks that
layer on top of the Google Cloud Datastore API: Objectify (https://code.google.com/p/objectify-appengine/),
Twig (https://code.google.com/p/twig-persist/)
and Slim3 (https://sites.google.com/site/slim3appengine/).
More details on each can be found on their respective home pages.
JDO
Java Data Objects was a predecessor to JPA during the "ORM
Wars" of the JavaEE world, and syntactically looks and feels a lot like the
object-oriented databases that were a big part of the object-oriented world
back in the late 90’s. (Versant, in particular, was a big influence on the JDO
specification, it seems—at least, based on my own time using Versant and then
later using JDO.)
JDO lives in the javax.jdo package (its annotations are in
javax.jdo.annotations), and uses those annotations to
decorate Java classes to describe the entities and the entity’s properties that
are stored in Google Cloud Datastore. It will require an "enhancement" step (recall
that we had to disable this enhancement step in Part 1 of this series, so
really it just means re-enabling that Ant build script step, or not doing
anything at all if you’re working with a fresh copy of the project template
from the Google App Engine SDK), which churns out modified versions of those classes
with the storage functionality ninja’ed in.
Because JDO is a little less known than JPA, and because JPA
is so frequently associated with relational databases (which can sometimes
create some false-equivalences in new users’ minds), we’ll use JDO for the code
examples. Note that, particularly at the most basic usage levels, JDO and JPA
are pretty interchangeable, so readers more comfortable with JPA can freely use
that instead.
JPA
Java Persistence API is the officially-sanctioned API for
managing the object/relational impedance mismatch within the JavaEE stack, and
was largely influenced by the success of the Hibernate open-source project. (In
some respects, it can be called the "winner" of the "ORM Wars", if such a thing
can be said to have a winner.) JPA annotations are defined in the javax.persistence
package, and like JDO, developers will annotate classes to be persisted with
JPA annotations to describe the entities to be stored in the Google Cloud Datastore.
Low-level
A given datastore can also be
accessed from a much lower-level perspective, as can well be imagined
(since it essentially rides on top of Google’s Bigtable system). It
can be helpful to be able to see "underneath" the objects being stored (via JDO
or JPA) into the storage system, for example to use the built-in datatypes offered
by the Google Cloud Datastore API (phone numbers, emails, unlimited-length text
fields, URL links, and so on). For the most part, though, Java developers will
not need to use the low-level API, and it’s mentioned here mostly for
completeness’ sake.
Code
Enough conceptual deconstruction; let’s see some code.
The application has thus far been greeting people as they’ve
come up to the website (or, as we saw last issue, the mobile endpoint), but
without any sense of history. Marketing has decided that the application needs
to track users as they come to us, and those who’ve been here before get more
personalized and/or heartfelt greetings. That means, practically speaking, that
we want to track the date/time a given user (as given by the parameter to the
mobile endpoint) hits the endpoint, as well as the message that we sent them
this time (so as to avoid any obvious repetitions).
First of all, let’s re-define the Message class to be
persistent, and to include both the timestamp of the greeting and the target of
the greeting. We’ll still let the message text be passed in from outside the class
to allow for maximum flexibility in deciding what the message should be.
(Developer aesthetics may differ here: if you prefer to let Message encapsulate
the actual choice of messages, that’s a perfectly reasonable decision.
Personally, I prefer my data-storage types to be pretty dumb data transfer
objects.)
From the JDO perspective, that means that the class needs to
be annotated at the class level with the JDO @PersistenceCapable annotation,
indicating that this class needs to be enhanced, and the fields to be stored
need the JDO @Persistent annotation. There are a few cases where @Persistent isn’t
necessary, but it doesn’t hurt to include it even if it’s redundant. JDO also
demands that there be one field defined on the class that stores the primary
key for the persistent object, so we add one:
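import java.util.Date;

import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

import com.google.appengine.api.datastore.Key;

@PersistenceCapable
class Message
{
    // Primary key, generated by the datastore when the entity is first stored
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;

    @Persistent
    public String target;
    @Persistent
    public String message;
    @Persistent
    public Date timestamp;

    public Message(String t, String m)
    {
        target = t;
        message = m;
        timestamp = new Date(); // the moment the greeting was created
    }

    public String getMessage() { return message; }
    public void setMessage(String value) { message = value; }
    public Date getTimestamp() { return timestamp; }
    public String getTarget() { return target; }
}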
JDO also supports the idea of "serializable types", meaning
that a field whose class is marked Serializable (by implementing the marker
interface) can be stored as a "blob" (a straight array-of-bytes binary
value) for those situations where the entity wants to store some dependent data
but doesn’t really need to query or index over that data. For example, if we
wanted to store images in the Message, that would be easy to do as a
Serializable-implementing field type inside the Message, and wouldn’t require
much to enable it (on App Engine, the field is marked
@Persistent(serialized = "true")), assuming the Image or other class actually
stored were Serializable. (Note that since the standard Collection classes are
Serializable, an entity could store a Collection as a serialized field, and the
items within the Collection, assuming all were also Serializable, would be
stored along with the entity itself. However, the items in the Collection would
all be stored as a binary blob, meaning they would be inaccessible as query
predicate parameters.)
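To make "stored as a blob" a little more concrete, here is a minimal sketch using plain java.io serialization. It only illustrates the general idea of turning a Serializable value into an opaque byte array and back; BlobSketch, toBlob() and fromBlob() are hypothetical helper names, and this is not a claim about exactly what the JDO implementation does internally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

/** Sketch of Serializable-to-bytes round-tripping; helper names are hypothetical. */
final class BlobSketch {
    /** Serializes a value into an opaque byte array. */
    static byte[] toBlob(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the original object from the byte array. */
    static Object fromBlob(byte[] blob) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(blob))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Once the value is reduced to bytes, the storage layer sees nothing but those bytes, which is why the individual items cannot take part in a query filter.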
From the developer’s point of view, this is all that’s
necessary to make Message objects persistent: having made the changes above, we
can do an "ant enhance", which in turn depends on the "compile" task, and Ant
will run the code through the Java compiler, followed by the DataNucleus (the
tool used for both JDO and JPA persistence) enhancer, and deposit the code into
the generated "war" directory right next to the "src" directory in the project
structure.
However, from the Google App Engine buildchain’s perspective, one
other necessary change remains, and that’s the "JDO configuration file" (jdoconfig.xml),
which has to end up in a very particular location: the
war/WEB-INF/classes/META-INF directory. (In essence, the JDO config file must
appear in the "META-INF" directory of the classes it describes, and thus, since
these classes are part of a servlet WAR format, in the WEB-INF/classes
subdirectory.) The default project template comes with a version stored in the
src/META-INF directory that looks like so:
<?xml version="1.0" encoding="utf-8"?>
<jdoconfig xmlns="http://java.sun.com/xml/ns/jdo/jdoconfig"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://java.sun.com/xml/ns/jdo/jdoconfig">
  <persistence-manager-factory name="transactions-optional">
    <property name="javax.jdo.PersistenceManagerFactoryClass"
        value="org.datanucleus.api.jdo.JDOPersistenceManagerFactory"/>
    <property name="javax.jdo.option.ConnectionURL"
        value="appengine"/>
    <property name="javax.jdo.option.NontransactionalRead"
        value="true"/>
    <property name="javax.jdo.option.NontransactionalWrite"
        value="true"/>
    <property name="javax.jdo.option.RetainValues"
        value="true"/>
    <property name="datanucleus.appengine.autoCreateDatastoreTxns"
        value="true"/>
    <property name="datanucleus.appengine.singletonPMFForName"
        value="true"/>
  </persistence-manager-factory>
</jdoconfig>
As with all XML files, make sure the spelling and case of
all element, attribute, and property names are exactly as shown above; usually
the default jdoconfig.xml file is fine, and it’s best to just start with that until a
situation arises that demands changing it.
The next steps come when we want to find all the Messages
that have a particular target as the value of the target field, to decide what
Message to hand back, as well as to store the Message that we created and
handed back. Both of these steps will require the use of a JDO
PersistenceManager, which is obtained by asking the static JDOHelper class for a
PersistenceManagerFactory, which in turn offers an instance of
PersistenceManager:
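// @Named marks the Cloud Endpoints parameter (javax.inject.Named is assumed here)
import javax.inject.Named;
import javax.jdo.JDOHelper;
import javax.jdo.PersistenceManager;
import javax.jdo.PersistenceManagerFactory;

public class Greetings
{
    // One factory for the whole application, looked up by its configured name
    private static final PersistenceManagerFactory pmf =
        JDOHelper.getPersistenceManagerFactory("transactions-optional");

    public Message greet(@Named("target") String target)
    {
        PersistenceManager pm = pmf.getPersistenceManager();
        try
        {
            Message msg = new Message(
                target,
                "Hello, " + target + ", from Google Cloud Endpoints!");
            pm.makePersistent(msg); // store the greeting we are about to return
            return msg;
        }
        finally
        {
            pm.close(); // always release the PersistenceManager
        }
    }
}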
Note that the string used to get the PersistenceManagerFactory
has to match what was listed in the jdoconfig.xml
file; this is to allow developers to be able to use different kinds of
PersistenceManagers (one with transactions required, one with them optional,
and so on). Once we have a PersistenceManager, it becomes pretty easy to store
the Message, using the makePersistent() method call to do the actual storage;
JDO and the Google Cloud Datastore API do the rest of the work from there.
Summary
As mentioned earlier, JDO is not the only way to get at
Google Cloud Datastore; the JPA standard is equally supported, and may, for
some developers, be an easier ramp to getting started with Google Cloud Datastore, if
they’re familiar with it from working with Hibernate or the more recent JavaEE
standard technologies. And, as one might easily surmise, there’s a lot more to
JDO than just what we’ve seen here—the DataNucleus project has a great deal
more documentation on JDO, including some nice examples of how to use it in a
variety of different scenarios; anyone looking to do anything non-trivial with
JDO should spend some serious quality time there. (This is one of the nice
things about Google using established Java API standards like JDO and
JPA—there’s a ton of documentation already out there, so we can leverage that
in learning and using these tools.)
In the meantime, however, we now have a record of those whom
we’ve greeted, and we could perhaps use that data as a way of changing up the
Message—for those who’ve never been here before, offer them a very polite
greeting ("It’s very nice to make your acquaintance"), whereas those who’ve
been here numerous times before get a more casual and friendly greeting
("WHAZZZZUPPPPP?!?"). Future customizations are endless, which is good, because
this is clearly the Internet’s Next Big Thing.
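As a rough sketch of how that history might drive the choice of greeting, consider the plain-Java helper below. Everything in it is an assumption made for illustration: the GreetingChooser name, the "Welcome back" middle tier, and the threshold of five visits are all hypothetical, and the list of previous messages is assumed to have been loaded from the datastore already, oldest first.

```java
import java.util.List;

/** Hypothetical helper that picks a greeting from a visitor's history. */
final class GreetingChooser {
    // Assumed threshold: this many prior greetings makes someone a regular
    static final int REGULAR_VISITOR_THRESHOLD = 5;

    /** previousMessages holds the greetings already sent to this target, oldest first. */
    static String choose(String target, List<String> previousMessages) {
        String polite = "It's very nice to make your acquaintance, " + target + ".";
        String friendly = "Welcome back, " + target + "!";
        String casual = "WHAZZZZUPPPPP, " + target + "?!?";

        String candidate;
        if (previousMessages.isEmpty()) {
            candidate = polite;      // never seen before
        } else if (previousMessages.size() < REGULAR_VISITOR_THRESHOLD) {
            candidate = friendly;    // seen a few times
        } else {
            candidate = casual;      // a regular
        }

        // Avoid obviously repeating the most recent greeting
        String last = previousMessages.isEmpty()
            ? null
            : previousMessages.get(previousMessages.size() - 1);
        if (candidate.equals(last)) {
            candidate = candidate.equals(friendly) ? casual : friendly;
        }
        return candidate;
    }
}
```

Keeping this logic outside of Message fits the earlier preference for dumb data-storage types: the endpoint would load the history, ask the helper for the text, and then store a new Message with the result.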
In the next article, we’ll talk about how to jazz up the
greetings even further by including video or images with the greeting, but for
now, happy coding!