Introduction
In this article: Google like Full Text Search, I had the joy of experimenting with the implementation of a Google like FTS Search engine and was wondering if I could get something more general to work. In that quest, I downloaded and tested the Irony project from http://irony.codeplex.com/.
And I began playing around with it… a lot. There are still some documentation pieces missing but the general idea is really simple and I suggest you test it yourself and try doing some grammars (it’s always useful to remember those Compiler days in the university).
To generalize the idea, I talked with Ivan and converged that the new searcher should be able to search in any kind of database configuration with any type of information. This should allow the user to configure a little library (the one that I will describe here) and provide powerful functionality as the one shown in these examples:
- This example’s configuration is used throughout the article. It allows searching articles in a database filtering by id, code, description and price. In the example, we are asking the articles with id between 100003 and 100009 and with description containing MAGGI.
- In this example, I have downloaded a part of the IMDB database and queried it to know which actors were born on a 15th, June.
Grammars for Different Types
What I did was to simplify the problem to 4 types of information that cover all of the possibilities for a SQL Server column:
- Numeric: All integers, doubles, floats go in this category
- Date: All date, datetime, time types
- String: All type of varchars
- FTS: All those varchars and text that have a full-text index defined
For each of these columns, I defined a grammar with similar operations between them. Some examples are given here:
Type | Description | Example |
NUM | All data bigger than 3 and smaller than 10 | >3 and <10 4:9 |
NUM | Different than 5 or (smaller than 3 and bigger than 0) | !5 or (<3 and >0) |
STR | Like 'vladimir' | vladimir |
STR | Begins with ‘start’ and ends with ‘end’ | start* and *end start* *end |
DAT | Later than 01/01/1980 (Dates are always in dd/mm/yyyy format) | >01/01/1980 |
DAT | In august, 9th. Any year | d(08) m(08) |
DAT | Between the 01/01/1990 and 31/01/1990 | 01/01/1990:31/01/1990 |
FTS | Inflectional forms of ‘lived’ and inflectional forms of ‘in’ and inflectional forms of ‘germany’ | Lived in germany |
FTS | Contains ‘lived in germany’ | "Lived in germany" |
You are welcome to browse each of the grammars defined for this library. They contain the operations needed for all combinations of the above (as long as the column type is coherent with the operation).
Now that we have the required grammars we need to know how to interpret them. For example, the numeric grammar should interpret to a SQL query that conforms to the requested expression.
The library contains the interpretation for all grammars to SQL like expressions. Please note that each of the interpreters contain an association to the column that is going to be filtered with this expression. In the following lines, we will explore the structure that defines the search to be performed.
All these interpreters were inspired by the excellent FTS interpreter given in the link above and so I leave for you the analysis of these classes.
Field Grammar & Search Structure
Up until now, all we have is a way to interpret an expression (in any of the 4 defined types) and form a SQL expression with it. In other words, all we have is the WHERE
clause for each of the columns that participate in the search.
Now, we will define a grammar to define the columns and the search expressions for each of the columns. Let’s say we have 2 searchable columns: bio and name. We would like to have a grammar that understands the following:
bio=lived in germany & name=Jurgen
We could parse this expression, and interpret each of the columns with the corresponding interpreter. Finally, we would obtain a nicely formed SQL expression that would do the search for us. The grammar defined to deal with this is:
Expression =>
ColumnSearch |
Expression QuerySearchOp ColumnSearch
ColumnSearch =>
Name EqualOp Term
QuerySearchOp =>
AndOp |
OrOp
EqualOp =>
"="
AndOp =>
"&"
OrOp =>
"|"
The class defining this grammar gives the correct priorities to our operators and we quickly have a simple grammar to deal with the desired expression. However, the grammar needs to define only one search criteria per line. So, the search component adds a new line before each search term.
The search structure that works with this library contains the following:
A search structure contains a collection of tables. Each table contains a join
(except for the first table) and a collection of columns. A column contains:
- Name of the column: This is the name of the column in the database
- Keyword: The keyword that the grammar will recognize as valid to refer to this column
- Searchable: Whether this column is searchable or not
- Title: The title of the column in the resulting dataset
- Appear in results: Whether the column appears in the resulting dataset or not
- Type: The column type defines the grammar that will be used to interpret the search term for the column
- Appear in Standard Search: This will be discussed in the final subtitles of this article.
Forming the SQL Expression
We are ready to pass a human like query to our interpreter and this should validate the expression and interpret each part of it to produce a valid SQL expression conforming to the given query. Specifically, the intervening classes are:
All we have to do is call the GetQuery()
method of a SqlServerInterpreter
object to obtain the requested SQL query. The constructor of the SqlServerInterpreter
will ask for a structure that defines the columns and tables to search.
This solution is needed many times in a system, for different tables and different columns. This is why we should have many configurations to work with our library. Also, we need the configurations to be read from a database or defined directly on the client. The library provides the ability to collect, in a singleton hash table, the different configurations of the search structures.
Also, the resulting object is very important. The SearchResult
class defines an object that contains the resulting query for the query expression and the resulting DataSet
after performing the search. The class diagram for the Searcher
and the SearchResult
is the following:
Searching
The use of this structure is very simple. Here we have an example of a search to be performed in two tables: tbl_in_articulos
and tbl_fa_precios
. The join
between these two tables is when tbl_in_articulos.id = tbl_fa_precios.id_articulo
. The resulting DataSet
should contain:
id
: numeric and searchable and keyword=id codigo_articulo
: string and searchable and keyword=cod descripcion
: string and searchable and keyword=desc precio
: numeric and searchable and keyword=pre
The constructor expects a SearchStructure
and so we subclass this class, define the columns and tables in the constructor of this subclass, and add them to the structure with the given methods.
1 public class ArticlesSearch : SearchStructure
2 {
3 public ArticlesSearch()
4 {
5 TableStructure basetbl = new TableStructure("tbl_in_articulos");
6
7 Column col = new Column("id", "id", "id",
8 Column.ColumnType.Numeric, basetbl.Name);
9 basetbl.addColumn(col);
10 col = new Column("codigo_articulo", "Codigo", "cod",
11 Column.ColumnType.String, basetbl.Name);
12 basetbl.addColumn(col);
13 col = new Column("descripcion", "Descripcion", "desc",
14 Column.ColumnType.String, basetbl.Name);
15 basetbl.addColumn(col);
16
17 this.addTable(basetbl, null);
18
19 TableStructure tbl = new TableStructure("tbl_fa_precios");
20
21 col = new Column("id_articulo", "id_articulo", "none",
22 Column.ColumnType.Numeric, tbl.Name);
23 col.Searchable = false;
24 col.AppearInResults = false;
25 tbl.addColumn(col);
26 col = new Column("precio", "Precio", "pre",
27 Column.ColumnType.Numeric, tbl.Name);
28 tbl.addColumn(col);
29
30 JoinStatement join = new JoinStatement(basetbl.Name + ".id" ,
31 tbl.Name + ".id_articulo",
32 JoinStatement.JoinType.Inner);
33 this.addTable(tbl, join);
34 }
35 }
Now, all we need to do is instantiate an object of this structure and use it with our searcher:
1 string query = txt.Text;
2 Searcher srch = new Searcher(new ArticlesSearch());
3 srch.ConnectionString =
4 ConfigurationManager.ConnectionStrings["SearchConnectionString"].ConnectionString;
5 SearchResult res = srch.search(query);
6 try
7 {
8 lbl.Text = res.Message.Replace("\n","
") + "
" + res.GeneratedQuery;
9
10 DataView dtView = new DataView(res.Data.Tables[0]);
11 gdv.DataSource = dtView;
12 gdv.DataBind();
13 } catch { ; }
The Searcher
object is instantiated passing an ArticlesSearch
object as a parameter. This defines the grammar to be used to parse the expressions. Next, we can define the connection string to be used by the searcher (the connection string used by default is ‘SearchConnectionString
’). Finally, we search passing the text from the text box in our client.
The result is read as a DataSet
in the data field of the SearchResult
object returned. This can be passed to a gridview
and we can have a very powerful searcher in seconds.
Another way to obtain the same result is to define the structure directly in the database. The library expects the following structure:
This structure can easily be loaded with the scripts distributed with the library below. The only thing here is that you should insert the records directly with SQL instructions. A complete example of a configuration equivalent to the one seen programmatically will be the following:
1 insert into SearchConfig
2 values (newid(), 'test')
3
4
5 insert into SearchTable
6 values
7 (newid(), 'tbl_in_articulos', 0, 'DD9DD3DF-5340-4461-86CA-B3A7A589F404')
8
9 insert into SearchTable
10 values
11 (newid(), 'tbl_fa_precios', 1, 'DD9DD3DF-5340-4461-86CA-B3A7A589F404')
12
13 select * from SearchJoin
14
18
19 insert into SearchJoin
20 values
21 (newid(), 'F84F27CD-D40A-473F-9A5E-B67346E641AA',
22 'tbl_in_articulos.id', 'tbl_fa_precios.id_articulo', 'INNER')
23
24
25 insert into SearchColumn
26 values
27 (newid(), 'FBDDB7EB-66D1-425E-A22F-5DEBC16662DC', 'id'
28 'Id', 'NUM', 'id', 1, 1, 0)
29
30 insert into SearchColumn
31 values
32 (newid(), 'FBDDB7EB-66D1-425E-A22F-5DEBC16662DC', 'codigo_articulo'
33 'Codigo', 'STR', 'cod', 1, 1, 0)
34
35 insert into SearchColumn
36 values
37 (newid(), 'FBDDB7EB-66D1-425E-A22F-5DEBC16662DC', 'descripcion'
38 'Descripcion', 'STR', 'desc', 1, 1, 0)
39
40
41 insert into SearchColumn
42 values
43 (newid(), 'F84F27CD-D40A-473F-9A5E-B67346E641AA', 'id_articulo'
44 'ID_Articulo', 'NUM', 'none', 0, 0, 0)
45
46 insert into SearchColumn
47 values
48 (newid(), 'F84F27CD-D40A-473F-9A5E-B67346E641AA', 'precio'
49 'Precio', 'NUM', 'pre', 1, 1, 0)
In this case, to use the Searcher
, all we need is the following lines:
1 Searcher srch = new Searcher("test");
2 srch.ConnectionString =
3 ConfigurationManager.ConnectionStrings["SearchConnectionString"].ConnectionString;
4 SearchResult res = srch.search(query);
Final Remarks
I have tried to provide the most functionality to the searcher. I currently use it with no problems for most intranet applications.
It sure needs some extra work to publish it to the world in a searcher in internet. The most critical issue regarding an internet publication would be the capability to suggest the possible term to expect. This should easily be done by Irony and I expect to include this feature in the library in the short term.
Another pending task in my to-do list is the ability to apply the search term to all columns. For example, when we just type: ‘lived in Germany’; we would like it to apply to all columns that are searchable. This is why the Column structure has a StandardSearch
flag.
Although the less than and bigger than operators are straight forward there are some issues when using it in an ASP.NET page because of the validation; once that is disabled everything works fine.
Conclusion
A generic searcher is possible combining different Human-like-query-to-SQL interpreters for different grammars. In this article, we have seen how this can be done for 4 different types of data that cover most of the possibilities in a SQL Server.
The library provides a very simple interface to incorporate powerful search capabilities to any table(s) in your database. The view capability (order by, filtering, etc.) is left to be worked in the client. The only task the searcher does is exactly that: search and return a data set.
History
- 24th August, 2010: First release