Introduction
How complicated can a search be?
We rarely ask this question in the early stages of development, which is why the architecture of some projects does not match the needs of fast and efficient search.
In those "usual" circumstances the project ends up in a refactoring stage, with constant discussions with product owners about the time and resources needed to rebuild or redesign it for faster, more efficient search.
Cloudant, a distributed database as a service (DBaaS), is engineered to help you solve indexing and search problems by integrating the Apache Lucene search library.
The benefit is that your NoSQL database has built-in indexing made specifically for JSON-formatted data. In addition, the algorithms that calculate and recalculate indexes are designed to run efficiently over chunks of data in distributed cloud environments. And you don't need to write a single line of code or configuration to use this power.
In short
Search indexes, defined in design documents, allow databases to be queried using Lucene Query Parser Syntax. Search indexes are defined by an index function, similar to a map function in MapReduce views. The index function decides what data to index and store in the index. (from https://docs.cloudant.com/search.html)
The above quote is a good description of the Cloudant Search Engine, but the best way to really understand it is with an example. That is why we are going to build an IoT solution that uses Cloudant to store sensor data and query it for monitoring.
Background
What is Apache Lucene?
In 1999 Doug Cutting released the first version of Lucene. Later, in 2001, the project joined the Apache Software Foundation's Jakarta family of open-source products. Many related projects have since branched from Lucene. Today it is one of the most popular open-source full-text search libraries.
Distributed databases can be scaled across multiple racks, data centers and even different cloud providers. The Apache Lucene implementation in Cloudant is built so that scaling out gives you those benefits without losing efficiency and speed.
Using the RESTful API or the design document web UI you can define indexes that are immediately built and ready for use. As we mentioned in previous posts, when key actions occur (like CREATE, UPDATE or DELETE) all indexes, including Lucene indexes, are updated incrementally.
Lucene Query Parser Syntax
The combination of how the index is defined and the query syntax allows you to perform numeric, date, text, Boolean, and geo-spatial queries on any JSON field in your database.
Here are some of the capabilities of search syntax:
- Ranked searching - different ways to order the results
- Powerful query types - Wildcard, Regular Expression, Fuzzy, Proximity, Range and more
- Language-specific analyzers - choosing a language to recognize terms within text
- Faceted search and filtering
- Bookmarking
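To make the list above more concrete, here is a small sketch with a few query strings written in Lucene Query Parser Syntax. The field names (city, userMessage, lat) are the ones indexed later in this article; the sample values are invented for illustration only.

```javascript
// Illustrative Lucene query strings; all values are made-up sample data.
const exampleQueries = {
  // Wildcard: any term in userMessage that starts with "temp"
  wildcard: "userMessage:temp*",
  // Fuzzy: terms similar to the (misspelled) word, tolerating small typos
  fuzzy: "city:sofai~",
  // Proximity: both words within 3 positions of each other
  proximity: 'userMessage:"sensor offline"~3',
  // Inclusive numeric range
  range: "lat:[42 TO 43]",
  // Boolean combination of two clauses
  combined: "city:sofia AND lat:[42 TO 43]",
};

for (const [kind, query] of Object.entries(exampleQueries)) {
  // encodeURIComponent makes the query safe to use as the q URL parameter
  console.log(`${kind}: ?q=${encodeURIComponent(query)}`);
}
```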
Using the code
Sensor Emulator
We will reuse the code structure from previous posts, extending the JSON delivered to Cloudant DBaaS with more content. Our enhanced sensors will send temperature and humidity just as before, but now they will also include device ID, geo-location, user messages and error messages. Data will flow directly into the Cloudant DB.
This is not a common architecture for an IoT solution, because architects usually prefer a middle service that provides security and versioning controls. With Cloudant we can send data directly from the device to the database, using the API-key level of security provided by Cloudant DBaaS. Because the database is not relational, we can store different JSON versions in the same database, processed by the same (or version-upgraded) indexes.
As a result, we have a Sensor class that looks like this:
public class Sensor
{
    public string _id;
    public string recordtitle;
    public string record;
    public string origtime;
    public int displacement;
    public int temp;
    public int hmdt;
    public long modified;
    public string tags;
    public string city;
    public double lat;
    public double lon;
    public string deviceId;
    public string userMessage;
    public string errorMessage;
}
The sensor data generation method has several combinations of device IDs, cities and geo-locations predefined as well as several error messages and user messages.
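The article's generator is part of the C# project and is not listed here, so the following is only a rough JavaScript sketch of the same idea. The device list, messages and the "reading" title are hypothetical sample data; only the property names come from the Sensor class above.

```javascript
// Hypothetical sample data; the article's real predefined values are not shown.
const DEVICES = [
  { deviceId: "device-001", city: "Sofia", lat: 42.6977, lon: 23.3219 },
  { deviceId: "device-002", city: "London", lat: 51.5074, lon: -0.1278 },
];
const USER_MESSAGES = ["", "Filter replaced", "Window left open"];
const ERROR_MESSAGES = ["", "E01 sensor timeout", "E02 low battery"];

// Pick one element of a list using a random source that returns [0, 1).
function pick(list, random) {
  return list[Math.floor(random() * list.length)];
}

// Build one reading with a subset of the Sensor class fields.
// The random source and clock are parameters so the output can be made repeatable.
function makeReading(random = Math.random, now = new Date()) {
  const device = pick(DEVICES, random);
  return {
    recordtitle: "reading",            // assumed title, not from the article
    origtime: now.toISOString(),
    temp: Math.floor(random() * 40),   // integer in 0..39
    hmdt: Math.floor(random() * 100),  // integer in 0..99
    modified: now.getTime(),
    ...device,                         // deviceId, city, lat, lon
    userMessage: pick(USER_MESSAGES, random),
    errorMessage: pick(ERROR_MESSAGES, random),
  };
}

console.log(JSON.stringify(makeReading(), null, 2));
```

Leaving userMessage or errorMessage empty for some readings is deliberate in this sketch: it exercises the guard clauses that an index function needs for optional fields.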
Building Search Indexes
Multi-parameter indexing
Using the Design Document UI we will create a new search index for the six fields we want to query:
<img height="275px" src="1115019/image001.png" width="504px" />
<img height="271px" src="1115019/image002.png" width="504px" />
This search index combines six indexed fields taken from each record. This allows us to execute complex queries that request records in relation to those fields. For example:
- we can request user messages in a city;
- we can query user messages within a range of a geo-location;
- we can request sensor data in a time frame from a city;
- and many more.
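As a rough sketch of how such queries could be sent over REST, the helper below builds a Cloudant search URL of the form /_design/{design document}/_search/{index}. The account URL, database name, design document name ("search") and index name ("multi") are assumptions made for this example, since the real names appear only in the screenshots.

```javascript
// Build the URL of a Cloudant search request.
// All names passed in below are hypothetical placeholders.
function buildSearchUrl(databaseUrl, designDoc, indexName, params) {
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${databaseUrl}/_design/${designDoc}/_search/${indexName}?${query}`;
}

const databaseUrl = "https://example.cloudant.com/sensors"; // placeholder account and database

// User messages from one city that mention a given word
const cityMessagesUrl = buildSearchUrl(databaseUrl, "search", "multi", {
  q: "city:sofia AND userMessage:door",
  limit: 10,
});

// Records inside a latitude/longitude bounding box
const boundingBoxUrl = buildSearchUrl(databaseUrl, "search", "multi", {
  q: "lat:[42 TO 43] AND lon:[23 TO 24]",
  limit: 5,
});

console.log(cityMessagesUrl);
console.log(boundingBoxUrl);
```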
The code below generates named indexes for key properties:
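function (doc) {
    // Guard every field: indexing a value that does not exist raises an error.
    if (doc.deviceId) {
        index("deviceId", doc.deviceId);
    }
    if (doc.origtime) {
        index("time", doc.origtime, { "store" : true });
    }
    // Check the type instead of truthiness so that a coordinate of 0 is still indexed.
    if (typeof doc.lat === "number" && typeof doc.lon === "number") {
        index("lat", doc.lat, { "store" : true });
        index("lon", doc.lon, { "store" : true });
    }
    if (doc.city) {
        index("city", doc.city, { "store" : true });
    }
    if (doc.userMessage && doc.userMessage.length !== 0) {
        index("userMessage", doc.userMessage, { "store" : true });
    }
}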
Note that some indexes are declared with options:
- "store" set to true instructs the search engine to keep the value and return it upon request
- "facet" set to true instructs the search engine to count how many times each distinct value occurs
<img height="272px" src="1115019/image003.png" width="504px" />
The generated search index can be tested from the UI:
<img height="271px" src="1115019/image004.png" width="504px" />
Facet Searches
For easy REST access we will create three separate search indexes for faceting. The fields we use there are city, sensor record title, and error message (error messages work well as facets because they come from a limited set of error codes).
At the end we will have this type of design document:
<img height="435px" src="1115019/image005.png" width="504px" />
Each search index uses the following index function:
facetErrors
function (doc) {
    if (doc.errorMessage && doc.errorMessage.length !== 0) {
        index("errors", doc.errorMessage, { "facet" : true });
    }
}
facetCity
function (doc) {
    if (doc.city) {
        index("city", doc.city, { "store" : true, "facet" : true });
    }
}
facetRecords
function (doc) {
    if (doc.recordtitle) {
        index("record", doc.recordtitle, { "store" : true, "facet" : true });
    }
}
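For readers who prefer the REST API to the web UI, the sketch below shows how the three index functions could be packed into a design document, where each function is stored as a string under "indexes". The design document name "_design/facets" is an assumption for this example, and the "standard" analyzer is spelled out explicitly even though it is Cloudant's default.

```javascript
// The index functions from the article. `index` is supplied by Cloudant
// when the function runs on the server; it is never called locally here.
const facetErrors = function (doc) {
  if (doc.errorMessage && doc.errorMessage.length !== 0) {
    index("errors", doc.errorMessage, { facet: true });
  }
};
const facetCity = function (doc) {
  if (doc.city) {
    index("city", doc.city, { store: true, facet: true });
  }
};
const facetRecords = function (doc) {
  if (doc.recordtitle) {
    index("record", doc.recordtitle, { store: true, facet: true });
  }
};

// Wrap one function as a search index definition (source code as a string).
function searchIndex(fn) {
  return { analyzer: "standard", index: fn.toString() };
}

const designDocument = {
  _id: "_design/facets", // hypothetical design document name
  indexes: {
    facetErrors: searchIndex(facetErrors),
    facetCity: searchIndex(facetCity),
    facetRecords: searchIndex(facetRecords),
  },
};

console.log(JSON.stringify(designDocument, null, 2));
```

The printed JSON is what would be sent in a PUT request to the design document's URL inside the database.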
Azure Web App visualization
For this demo we will build the simplest possible solution for an Azure cloud web site.
<img height="350px" src="1115019/image006.png" width="504px" />
This is why we started from an empty template.
<img height="395px" src="1115019/image007.png" width="504px" />
Next we make several changes to web.config: adding application settings and enabling default documents.
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <add key="username" value="[username]" />
    <add key="password" value="[password]" />
  </appSettings>
  ...
  <system.webServer>
    <defaultDocument enabled="true" />
    ...
  </system.webServer>
</configuration>
Install one NuGet package (needed for REST API calls against the Cloudant DB):
<img height="56px" src="1115019/image008.png" width="504px" />
Add a static helper class for addressing the Cloudant search indexes (see the code on GitHub). The key lines in the three methods that request data from Cloudant are the ones where we specify the filters.
GET Facet Search
request.AddQueryParameter("q", "*:*");
request.AddQueryParameter("counts", "[\"" + counter + "\"]");
request.AddQueryParameter("limit", "0");
Three parameters are sent in this type of request:
- the query for main filtering - in this case *:* because we want to count all indexed records
- the "counts" array of faceted field names - in this case we send separate requests for "errors", "city" and "record"
- the "limit" parameter to limit the returned search results - in this case we need only the counts, so the limit is zero (no records returned)
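The same three parameters, and the way the counts come back, can be sketched in JavaScript as follows. The sample response is invented, but it follows the shape of a Cloudant facet response: a "counts" object keyed by field name, with one count per distinct value.

```javascript
// Query parameters for a counts-only facet request on one field.
function facetCountParams(field) {
  return { q: "*:*", counts: JSON.stringify([field]), limit: "0" };
}

// Return [value, count] pairs for a field, most frequent first.
function topFacetValues(response, field) {
  const counts = (response.counts && response.counts[field]) || {};
  return Object.entries(counts).sort((a, b) => b[1] - a[1]);
}

// Made-up response used only to demonstrate the parsing.
const sampleFacetResponse = {
  total_rows: 5,
  bookmark: "g2o",
  rows: [],
  counts: { city: { London: 2, Sofia: 3 } },
};

console.log(facetCountParams("city"));
console.log(topFacetValues(sampleFacetResponse, "city"));
```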
GET Geo-location Search
request.AddQueryParameter("q", "*:*");
request.AddQueryParameter("sort", "\"<distance,lon,lat," + lon + "," + lat + ",km>\"");
request.AddQueryParameter("limit", "5");
Here we request the five records closest to a given latitude and longitude, ordered by distance.
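The sort value is easy to get wrong because longitude comes before latitude and the whole expression must be a quoted string. A minimal JavaScript sketch of a builder, assuming the indexed fields are named "lon" and "lat" as in this article, could look like this (the range checks are an addition for illustration, not part of the original helper):

```javascript
// Build the value of the "sort" parameter for a distance-ordered search.
// Format: "<distance,lonField,latField,lon,lat,units>" including the quotes.
function distanceSort(lon, lat, units = "km") {
  if (!Number.isFinite(lon) || lon < -180 || lon > 180) {
    throw new RangeError("longitude must be between -180 and 180");
  }
  if (!Number.isFinite(lat) || lat < -90 || lat > 90) {
    throw new RangeError("latitude must be between -90 and 90");
  }
  if (units !== "km" && units !== "mi") {
    throw new RangeError('units must be "km" or "mi"');
  }
  // JSON.stringify adds the surrounding double quotes
  return JSON.stringify(`<distance,lon,lat,${lon},${lat},${units}>`);
}

console.log(distanceSort(23.3219, 42.6977));
```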
GET Text Search
request.AddQueryParameter("q", "userMessage:" + text + "*");
request.AddQueryParameter("limit", "10");
The last one is a simple search where the requested text is extended with a wildcard character and the results are limited to 10 records.
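Because the text comes from the user, characters such as ":" or "(" would be interpreted as query syntax. The sketch below is a variation on the query above, not the article's helper: it escapes the characters that Lucene Query Parser Syntax treats as special and, as an extra assumption, combines several words with AND so that only the last word is treated as a prefix.

```javascript
// Put a backslash in front of characters with special meaning in Lucene queries.
function escapeLucene(text) {
  return text.replace(/[\\+\-!():^[\]"{}~*?|&\/]/g, "\\$&");
}

// Build a prefix query on one field from free-form user text.
// Returns null when the text contains no words.
function prefixQuery(field, userText) {
  const words = userText.trim().split(/\s+/).filter(Boolean).map(escapeLucene);
  if (words.length === 0) {
    return null;
  }
  words[words.length - 1] += "*"; // only the last word becomes a prefix
  return `${field}:(${words.join(" AND ")})`;
}

console.log(prefixQuery("userMessage", "door op"));
console.log(prefixQuery("userMessage", "a:b"));
```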
Index.html
A simple HTML page with a default document name (Index.html, so that it opens by default) loads jQuery and the Google Maps JavaScript API. The resulting view shows the facet counts and pins on the map:
<img height="259px" src="1115019/image009.png" width="504px" />
The text search shows a different type of pin, with a popup showing the user message text:
<img height="235px" src="1115019/image010.png" width="504px" />
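The page script itself is not listed in this article, so the following is only a hypothetical sketch of the step between the search response and the map. It relies on the fact that lat, lon, city and userMessage were indexed with "store" set to true, which is why they come back in the "fields" object of each row; the marker property names are made up for this example.

```javascript
// Convert search result rows into plain marker descriptions.
// Rows without stored coordinates are skipped.
function rowsToMarkers(response) {
  return (response.rows || [])
    .filter((row) => row.fields
      && typeof row.fields.lat === "number"
      && typeof row.fields.lon === "number")
    .map((row) => ({
      id: row.id,
      position: { lat: row.fields.lat, lng: row.fields.lon },
      title: row.fields.city || "",
      popupText: row.fields.userMessage || "",
    }));
}

// Made-up response used only to demonstrate the conversion.
const sampleSearchResponse = {
  total_rows: 2,
  rows: [
    { id: "doc-1", fields: { lat: 42.6977, lon: 23.3219, city: "Sofia", userMessage: "Window left open" } },
    { id: "doc-2", fields: { city: "London" } },
  ],
};

console.log(rowsToMarkers(sampleSearchResponse));
```

Each resulting object holds a position in the { lat, lng } form that Google Maps accepts for a marker, plus the text for its popup.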
Points of Interest
This demo did not cover every way to use Cloudant Search, but it gives us some knowledge of its capabilities: how to combine indexes and how to query them. In this post we built a simple Azure cloud-based web application that uses Cloudant DBaaS for storage and data processing. We also used a different architecture for an IoT solution and achieved a simple implementation with better usability and simple maintenance.