Indexing Columns in Redis

16 Jan 2016
Unlike SQL databases, Redis does not support querying by columns natively, which means that you have to maintain your own indexes. As it turns out, Redis provides a rich set of data types to the programmer to ease this task.

Introduction

Unlike memcached, Redis can also be used for persistent storage, and not just as a volatile cache. As it happens, Redis is a blazingly fast database, and can give your application strikingly better performance if used correctly. As a caveat, there are also risks associated with using Redis as a primary data store, and these risks increase significantly if it is configured incorrectly; you are therefore advised to do your research before deciding to scrap your "regular" SQL database in favor of Redis.

For all the performance boost that its speed gives your application, the fact remains that Redis is fundamentally a key/value store, and does not support secondary indexes. It can therefore be a bit of a challenge when you need to search and retrieve values by anything other than their key. As it turns out, we can work our way around this limitation by using Redis' remarkably useful, natively provided data types. In this article, we explore how to use sets to index records by gender and country, and a sorted set to index records by date, so that we can retrieve them in date order and within date ranges.

Background

I have been using MySQL for a while now, and have grown to love it over the years. For that matter, I use it for all kinds of enterprise applications, and also for .NET applications whenever I am free to choose the database. I'm not about to trash MySQL any time soon; nonetheless, I mention it here so that we have a proper frame of reference for what follows.

I had long dismissed Redis as just an equivalent (and perhaps faster) product, vis-a-vis memcached. I later came across a very good article explaining that Redis can also be used for persistent storage, and that got me thinking. So one weekend I found myself, more out of curiosity than anything else, doing a quick spike with Redis, which turned out quite satisfactorily. Following this, I decided to give it a real try: I have been working on a .NET application and, since the project is still in its nascent stages, decided to give Redis a shot as the primary persistent storage engine, doing away entirely with MySQL in the deal. I ran some unit tests on both the Redis version of my code and the (previous) MySQL version, and the results were astonishing: the performance increase was easily a factor of 5! Of course, this was in my particular case, and your mileage may vary.

Having said this, the "problem" with Redis is that, being basically a key/value store, it does not support indexes. Coming from the SQL world, indexing is a critical requirement for me, but this too was easily solved using Redis' rich data type system. Redis does not just support simple scalar values (simplistically called "strings," which is really a misnomer, since a string value can hold any binary data, including numbers and serialized objects); it also supports compound types like lists, sets, hashes, and yes, sorted sets, one of the data types that I will be talking about in this article.

Prior to this, I was quite flummoxed as to how to index dates, especially when it comes to retrieving records by date ranges. As it turns out, this can easily be accomplished using the sorted sets data type in Redis.

About Redis Complex Types

Redis supports quite a few complex types. As mentioned in the previous section, we have lists, sets, sorted sets, and hashes, though we won't be talking about lists and hashes in this article. However, we do use the other two types, that is, sets and sorted sets, extensively here.

Sets are exactly that: a collection of unique values with no notion of order amongst them. Sorted sets are slightly different beasts though. Every member of a sorted set is saved with a numeric score, the set is kept ordered by that score, and members can be retrieved by score or by a range of scores. As it turns out, this can be quite handy while storing dates. We convert every date to be stored into its 'tick' value (in .NET, DateTime.Ticks, the number of 100-nanosecond intervals since 1 January 0001) and use that as the score, thereby getting a set ordered by date. We can then use Redis' ZRANGEBYSCORE command to retrieve values that fall within a particular range.
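
As a small sketch of the tick conversion described above (the DateScore class and its method names are illustrative and not part of the attached archive): Redis holds sorted-set scores as double-precision floating point numbers, which represent integers exactly only up to 2^53. A tick count for 1971 is around 6.2e17, so it is worth checking which dates survive the conversion unchanged. Whole dates (midnight) do, while an arbitrary instant may be rounded by a few microseconds.

```csharp
using System;

public static class DateScore {
   // Converts a DateTime to the score used in the sorted set: its tick count
   // (100-nanosecond intervals since 0001-01-01) as a double.
   public static double ToScore (DateTime value) {
      return (double) value.Ticks;
   }

   // True when the tick count survives the round trip through a double unchanged.
   public static bool IsExact (DateTime value) {
      return (long) ToScore (value) == value.Ticks;
   }
}
```

With these helpers, DateScore.IsExact (new DateTime (1971, 5, 5)) is true, whereas DateScore.IsExact (new DateTime (1971, 5, 5).AddTicks (1)) is false. For dates of birth this does not matter, but if you index full timestamps you may prefer a coarser unit (seconds, say) as the score.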

Redis, as a key-value store, can store just about any type of value in the database, as long as it has a unique key defined for it, regardless of whether the value is a string type or a complex type.

Using the Code

The code is not terribly efficient: for one thing, we are saving the entire object as is to all the indexes. A "real world" approach could be to save the object as a scalar "string" and its id in the various indexes. However, I could not have done this without increasing the length of my code, so I just skipped it. This has been left as an exercise for the reader.
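
For readers who want a head start on that exercise, here is a minimal sketch of the id-based approach, assuming each Person has some unique string id (the Person class in this article does not define one). The PersonIndex class, its method names, and the key names are all hypothetical; only the StackExchange.Redis calls themselves (StringSet, StringGet, SetAdd, SortedSetAdd, SortedSetRangeByScore) are real library methods.

```csharp
using System.Collections.Generic;
using StackExchange.Redis;

// Hypothetical helper, not part of the attached archive.
public static class PersonIndex {
   // Key names are assumptions chosen for this sketch.
   public const string DobIndex = "idx:person:dob";

   public static string RecordKey (string id) {
      return "person:" + id;
   }

   public static string GenderIndex (string gender) {
      return "idx:person:gender:" + gender.ToLowerInvariant ();
   }

   // Stores the JSON once, and adds only the id to each index.
   public static void Store (IDatabase db, string id, string json, string gender, double dobScore) {
      db.StringSet (RecordKey (id), json);
      db.SetAdd (GenderIndex (gender), id);
      db.SortedSetAdd (DobIndex, id, dobScore);
   }

   // Looks up ids in the date index, then fetches the records themselves.
   public static List<string> FetchByDob (IDatabase db, double fromScore, double toScore) {
      List<string> result = new List<string> ();
      RedisValue[] ids = db.SortedSetRangeByScore (DobIndex, fromScore, toScore);
      if (ids.Length == 0) {
         return result;
      }

      RedisKey[] keys = new RedisKey[ids.Length];
      for (int i = 0; i < ids.Length; ++i) {
         keys[i] = RecordKey (ids[i].ToString ());
      }

      // One multi-key read; a record deleted since indexing comes back empty.
      foreach (RedisValue json in db.StringGet (keys)) {
         if (json.HasValue) {
            result.Add (json.ToString ());
         }
      }
      return result;
   }
}
```

The tradeoff is one extra read per query in exchange for storing each object only once, which also means an update touches a single key instead of every index.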

The attached archive contains several files. We have a Person class that allows us to save Person class objects, with details like the person's name, gender, country of origin, and date of birth. I have deliberately kept the class simple so that we do not have to deal with irrelevant details. There is also a static RedisAdaptor class that has helper functions for both saving Person objects and retrieving them.

It is recommended that you have a look at the attached archive for a more complete understanding of the code.

External Libraries

Insofar as external libraries are concerned, the Redis website maintains a comprehensive list of Redis client libraries for different languages and platforms. I have used the StackExchange.Redis library, since it is open source and has very friendly licensing terms, as against some of the other entries in the C# section. I have also used the Newtonsoft Json.NET library for JSON serialization. This latter library is also free for commercial use, though you can purchase a license should you be inclined to do so.

The Main Program

In the main program, we have a loop that runs through every single day of the arbitrarily chosen year, 1971, and creates a new Person object with that day's date as the date of birth. As for the gender, we alternate between male and female from one day to the next; for the country, since only three countries have been specified in this contrived example, every single Person is either from India, the USA, or Great Britain. For the name, we use a randomly generated string, which is actually a GUID stripped of all non-alphanumeric characters.

static void Main (string[] args) {
   const int YEAR = 1971;

   // We create one Person object for every single day in the given year.
   for (int month = 1; month <= 12; ++month) {
      // DaysInMonth keeps us from ever constructing an illegal date, so we
      // do not need a catch-all handler that could also hide Redis errors.
      int daysInMonth = DateTime.DaysInMonth (YEAR, month);
      for (int day = 1; day <= daysInMonth; ++day) {
         // Get any random name:
         string name = Util.GetAnyName ();
         // And a DoB:
         DateTime dob = new DateTime (YEAR, month, day);
         // As for the gender, let's alternate:
         Gender gender = Gender.FEMALE;
         if (day % 2 == 0) {
            gender = Gender.MALE;
         }
         // And the country, let's round-robin between all three:
         Country country = Country.INDIA;
         if (day % 3 == 1) {
            country = Country.USA;
         } else if (day % 3 == 2) {
            country = Country.GB;
         }

         // Create a new Person object:
         Person person = new Person (name, gender, country, dob);
         //Console.WriteLine ("Created new Person object: {0}", person);

         // We call the function that will store a new person in Redis:
         RedisAdaptor.StorePersonObject (person);
      }
   }

   // At this point, we have 365 Person objects as a sorted set in our Redis database.

   // Next, let's take a date range and retrieve Person objects from within that range.
   // (Constructing the dates directly avoids culture-dependent parsing.)
   DateTime fromDate = new DateTime (YEAR, 5, 5);
   DateTime toDate = new DateTime (YEAR, 5, 7);

   List<Person> persons = RedisAdaptor.RetrievePersonObjects (fromDate, toDate);

   Console.WriteLine ("Retrieved values in specified date range:");
   foreach (Person person in persons) {
      Console.WriteLine (person);
   }

   // Next, let's select some folks who are female AND from the USA.
   // This calls for a set intersection operation.
   List<Person> personsSelection = RedisAdaptor.RetrieveSelection (Gender.FEMALE, Country.USA);

   Console.WriteLine ("Retrieved values in selection:");
   foreach (Person person in personsSelection) {
      Console.WriteLine (person);
   }
}

The RedisAdaptor class has a single function for storing and indexing the value passed to it, a function each for retrieving values by date range, by gender, and by country, and a RetrieveSelection function that combines gender and country. We also have a few static fields and constants for this.

Note that the "indexing" is done during the storage itself (naturally). In this case, we index by gender, by country, and by date of birth.

using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using StackExchange.Redis;

static class RedisAdaptor {
   const string REDIS_HOST = "127.0.0.1";

   private static ConnectionMultiplexer _redis;

   // Date of birth key:
   const string REDIS_DOB_INDEX = "REDIS_DOB_INDEX";

   // Gender keys:
   const string REDIS_MALE_INDEX = "REDIS_MALE_INDEX";
   const string REDIS_FEMALE_INDEX = "REDIS_FEMALE_INDEX";

   // Country keys:
   const string REDIS_C_IN_INDEX = "REDIS_C_IN_INDEX";
   const string REDIS_C_USA_INDEX = "REDIS_C_USA_INDEX";
   const string REDIS_C_GB_INDEX = "REDIS_C_GB_INDEX";

   static RedisAdaptor () {
      // First, init the connection:
      _redis = ConnectionMultiplexer.Connect (REDIS_HOST);
   }

   public static void StorePersonObject (Person person) {
      // We first JSONize the object so that it's easier to save:
      string personJson = JsonConvert.SerializeObject (person);
      //Console.WriteLine ("JSONized new Person object: {0}", personJson);

      // And save it to Redis.

      // First, get the database object:
      IDatabase db = _redis.GetDatabase ();

      // Bear in mind that Redis is fundamentally a key-value store that does not provide
      // indexes out of the box.
      // We therefore work our way around this by creating and managing our own indexes.

      // The first index that we have is for gender.
      // We have two sets for this in Redis: one for males and the other for females.
      if (person.Gender == Gender.MALE) {
         db.SetAdd (REDIS_MALE_INDEX, personJson);
      } else {
         db.SetAdd (REDIS_FEMALE_INDEX, personJson);
      }

      // Next, we index by country.
      if (person.Country == Country.INDIA) {
         db.SetAdd (REDIS_C_IN_INDEX, personJson);
      } else if (person.Country == Country.USA) {
         db.SetAdd (REDIS_C_USA_INDEX, personJson);
      } else if (person.Country == Country.GB) {
         db.SetAdd (REDIS_C_GB_INDEX, personJson);
      }

      // Next, we need to create an index to be able to retrieve values that are in a particular
      // date range.

      // Since we need to index by date, we use the sorted set structure in Redis. Sorted sets
      // require a score (a real) to save a record. Therefore, in our case, we will use the
      // DoB's `ticks' value as the score.
      double dateTicks = (double) person.DoB.Ticks;

      db.SortedSetAdd (REDIS_DOB_INDEX, personJson, dateTicks);
   }

   public static List<Person> RetrievePersonObjects (DateTime fromDate, DateTime toDate) {
      // First, let's convert the dates to tick values:
      double fromTicks = fromDate.Ticks;
      double toTicks = toDate.Ticks;

      // And retrieve values from the sorted set.

      // First, get the database object:
      IDatabase db = _redis.GetDatabase ();

      RedisValue[] vals = db.SortedSetRangeByScore (REDIS_DOB_INDEX, fromTicks, toTicks);
      List<Person> opList = new List<Person> ();
      foreach (RedisValue val in vals) {
         string personJson = val.ToString ();
         Person person = JsonConvert.DeserializeObject<Person> (personJson);
         opList.Add (person);
      }

      return opList;
   }

   public static List<Person> RetrievePersonObjects (Gender gender) {
      // First, get the database object:
      IDatabase db = _redis.GetDatabase ();

      string keyToUse = gender == Gender.MALE ? REDIS_MALE_INDEX : REDIS_FEMALE_INDEX;

      RedisValue[] vals = db.SetMembers (keyToUse);

      List<Person> opList = new List<Person> ();
      foreach (RedisValue val in vals) {
         string personJson = val.ToString ();
         Person person = JsonConvert.DeserializeObject<Person> (personJson);
         opList.Add (person);
      }

      return opList;
   }

   public static List<Person> RetrievePersonObjects (Country country) {
      // First, get the database object:
      IDatabase db = _redis.GetDatabase ();

      string keyToUse = REDIS_C_IN_INDEX;
      if (country == Country.USA) {
         keyToUse = REDIS_C_USA_INDEX;
      } else if (country == Country.GB) {
         keyToUse = REDIS_C_GB_INDEX;
      }

      RedisValue[] vals = db.SetMembers (keyToUse);

      List<Person> opList = new List<Person> ();
      foreach (RedisValue val in vals) {
         string personJson = val.ToString ();
         Person person = JsonConvert.DeserializeObject<Person> (personJson);
         opList.Add (person);
      }

      return opList;
   }

   public static List<Person> RetrieveSelection (Gender gender, Country country) {
      // First, get the database object:
      IDatabase db = _redis.GetDatabase ();

      string keyToUseGender = gender == Gender.MALE ? REDIS_MALE_INDEX : REDIS_FEMALE_INDEX;
      string keyToUseCountry = REDIS_C_IN_INDEX;
      if (country == Country.USA) {
         keyToUseCountry = REDIS_C_USA_INDEX;
      } else if (country == Country.GB) {
         keyToUseCountry = REDIS_C_GB_INDEX;
      }

      RedisKey[] keys = new RedisKey[] { keyToUseGender, keyToUseCountry };

      RedisValue[] vals = db.SetCombine (SetOperation.Intersect, keys);

      List<Person> opList = new List<Person> ();
      foreach (RedisValue val in vals) {
         string personJson = val.ToString ();
         Person person = JsonConvert.DeserializeObject<Person> (personJson);
         opList.Add (person);
      }

      return opList;
   }
}

Each of the const keys defined in this static class, like REDIS_DOB_INDEX, REDIS_MALE_INDEX, REDIS_FEMALE_INDEX, and so forth, is the key of an individual set (or, in the case of REDIS_DOB_INDEX, a sorted set) in the Redis store.

Once we have saved the values and also created indexed sets for them in Redis, we can retrieve them using the various versions of the overloaded function, RetrievePersonObjects, with the date range, gender, or country parameters.

Retrieving by gender is pretty straightforward: based on the specified gender, we dip into either of the two gender sets and retrieve the required values. The same goes for retrieval by country, where we have one set for each of the three countries specified.

To retrieve values by date range, we use the SortedSetRangeByScore method of the Redis database object, which wraps the ZRANGEBYSCORE command. We pass it three arguments: the first is the key of the sorted set itself, while the others are the minimum and maximum scores. By default, both bounds are inclusive.
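
One thing to watch for with inclusive bounds: the query in Main works because every date of birth is stored at midnight. If your dates carried a time of day, a record from the afternoon of the last day would fall outside the range. The sketch below shows one way around this, assuming a half-open range that ends at the start of the following day. The DobRange class and its method names are illustrative; Exclude.Stop is the StackExchange.Redis option that makes the upper bound exclusive.

```csharp
using System;
using StackExchange.Redis;

// Hypothetical helper, not part of the attached archive.
public static class DobRange {
   // Computes score bounds covering every instant from the start of fromDate
   // up to, but not including, the start of the day after toDate.
   public static void ScoreBounds (DateTime fromDate, DateTime toDate, out double min, out double max) {
      min = (double) fromDate.Date.Ticks;
      max = (double) toDate.Date.AddDays (1).Ticks;
   }

   public static RedisValue[] Query (IDatabase db, RedisKey indexKey, DateTime fromDate, DateTime toDate) {
      double min, max;
      ScoreBounds (fromDate, toDate, out min, out max);
      // Exclude.Stop makes the upper bound exclusive: min <= score < max.
      return db.SortedSetRangeByScore (indexKey, min, max, Exclude.Stop);
   }
}
```

For the range used in Main (5 May to 7 May), the bounds span exactly three days' worth of ticks, so the same three records come back as with the inclusive query.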

One of the more interesting features of working with sets in Redis is its splendid support for set semantics. You can specify two or more sets, and have the Redis database compute their union, intersection, or difference. In the last code section above, have a look at the function towards the bottom, RetrieveSelection, which declares two parameters: Gender and Country. This particular function returns values based on both the Person's gender AND their country of origin. To this end, we use the SetCombine method of the Redis database object with SetOperation.Intersect.
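
To see what the intersection should produce without touching Redis at all, here is a small in-memory model. It is only a sketch: the SelectionModel class is hypothetical, and it uses .NET's HashSet in place of Redis sets, applying the same gender and country rules as the Main loop.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model, not part of the attached archive.
public static class SelectionModel {
   // Rebuilds the membership of the FEMALE and USA index sets (by date of
   // birth) using the same rules as the Main loop, then intersects them.
   public static List<DateTime> FemaleAndUsa (int year) {
      HashSet<DateTime> female = new HashSet<DateTime> ();
      HashSet<DateTime> usa = new HashSet<DateTime> ();

      for (int month = 1; month <= 12; ++month) {
         int daysInMonth = DateTime.DaysInMonth (year, month);
         for (int day = 1; day <= daysInMonth; ++day) {
            DateTime dob = new DateTime (year, month, day);
            if (day % 2 != 0) {
               female.Add (dob);   // odd days are FEMALE in Main
            }
            if (day % 3 == 1) {
               usa.Add (dob);      // day % 3 == 1 is USA in Main
            }
         }
      }

      // Keeps only the members present in both sets, as an intersection does.
      female.IntersectWith (usa);

      List<DateTime> result = new List<DateTime> (female);
      result.Sort ();
      return result;
   }
}
```

For 1971 this yields 67 dates (days 1, 7, 13, 19, 25 and, where it exists, 31 of each month), which is the number of records RetrieveSelection (Gender.FEMALE, Country.USA) should return after a single run against an empty database.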

Cleaning up Redis

As you keep running the program, your Redis instance will keep filling up with values from previous runs. If you would like to get rid of the values already stored in Redis, I would recommend that you call flushdb from the Redis CLI (redis-cli) to delete all the keys in the current database. You could also use the more perilous flushall call; do note, however, that flushall deletes all keys across all databases! So I would ask you to use flushall like a loaded gun: very, very carefully.
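
If your Redis database also holds other data, a narrower option is to delete only the six index keys this article creates. The following is a sketch under that assumption; the IndexCleanup class is hypothetical, while KeyDelete is the StackExchange.Redis method that removes the given keys and returns how many of them existed.

```csharp
using StackExchange.Redis;

// Hypothetical helper, not part of the attached archive.
public static class IndexCleanup {
   // The six key names used by RedisAdaptor in this article.
   public static readonly string[] IndexKeys = {
      "REDIS_DOB_INDEX",
      "REDIS_MALE_INDEX",
      "REDIS_FEMALE_INDEX",
      "REDIS_C_IN_INDEX",
      "REDIS_C_USA_INDEX",
      "REDIS_C_GB_INDEX"
   };

   // Deletes the index keys and returns the number of keys actually removed.
   public static long DeleteIndexes (IDatabase db) {
      RedisKey[] keys = new RedisKey[IndexKeys.Length];
      for (int i = 0; i < IndexKeys.Length; ++i) {
         keys[i] = IndexKeys[i];
      }
      return db.KeyDelete (keys);
   }
}
```

Since the article's code stores the serialized objects inside the index sets themselves, removing these six keys removes all of the sample data.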

Points of Interest

Hey, if you're new to Redis, let me add that we haven't even scratched the surface of what Redis can do for you. If this article has piqued your curiosity, I would recommend that you also check out the pub/sub functionality that comes out of the box.

Another very nifty feature is HyperLogLog, available in Redis for a while now, for estimating the cardinality (the number of distinct elements) of whatever you add to it. It is a separate probabilistic data structure with its own commands (PFADD and PFCOUNT), rather than a way to count the members of an existing set or list. Adding an element and counting a single HyperLogLog are both O(1), and yet another advantage is that it uses a small, bounded amount of space (about 12 KB at most per key), regardless of how many elements it has seen. Of course, the tradeoff is that the count is approximate (the standard error is 0.81%), albeit something that you should be able to live with.

You may also want to check out the documentation on the Redis website, which covers each data type in a very minimal format, and a lot more.

History

  • First version

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
