Introduction
If you need a word search engine for your website or for your application, you must create it, or send your data to One Internet Search engine and send money for indexing your site, but this application can index your text/HTML/ASP files and save the words in its database to search in future.
This application use the SQLite database as its DBMS. This is an Open Source DBMS, and can be used freely. For more information, you can visit this site.
For connecting from .NET code to SQLite, use the .NET wrapper from Finisar Corporation. This wrapper makes the ADO.NET libraries for SQLite, and it is very user-friendly. Its DLL can be download from SourceForge.
using Finisar.SQLite;
Database
You can see the E/R chain diagram of the database in this picture:
This diagram has a relation m<-> n, this relation should be converted to a table (index table) named Word_Page.
Create database (class DBWSE)
I think an application is complete if it can create database in code and its executable file does not include an empty database. For creating the required database, we can use the CREATE
SQL commands.
private void CreateDataBase(string path)
{
try
{
FileStream fi = File.Create(path);
fi.Close();
DBFile = path;
OpenDataBase();
string strSQL = "Create Table words (" +
"WID INTEGER PRIMARY KEY ," +
"Word NVarChar(50)" +
")";
SQLiteCommand sqd = new SQLiteCommand(strSQL, sqconn);
sqd.ExecuteNonQuery();
sqd.CommandText ="Create Table pages(" +
"PID INTEGER PRIMARY KEY ," +
"path NVarChar(100) NOT NULL," +
"nofw INTEGER NOT NULL," +
"Date_add NVARChar(10) NOT NULL" +
")";
sqd.ExecuteNonQuery();
sqd.CommandText ="Create Table word_page(" +
"WID INTEGER FORIGEN KEY REFERENCES words (wid)," +
"PID INTEGER FORIGEN KEY REFERENCES pages (pid)," +
"QTY INTEGER NOT NULL," +
"PRIMARY KEY (WID,PID)" +
")";
sqd.ExecuteNonQuery();
}
catch (Exception ex)
{
throw ex;
}
}
This function creates an empty database with this file.
Add to Word, Page, and Word_Page tables
Now we must send data to the database to be saved. I think to get more speed, we must use SQL commands; this means, "we must use SQL INSERT
commands".
public int AddPage(string path, int nofw)
{
int i = PID(path);
if (i < 0)
{
string strSQL = "INSERT INTO pages (path,nofw,Date_add)" +
"VALUES ('" + path + "'," + nofw + ",'" +
DateTime.Now.ToShortDateString() + "')";
SQLiteCommand sqc = new SQLiteCommand(strSQL, sqconn);
sqc.CommandText = strSQL;
sqc.ExecuteNonQuery();
i = PID(path);
}
return i;
}
Next function calls the "PID function" to find the PID (page ID) of the new page. Moreover, now we add this page data to the database.
private int createWord(string word)
{
string strsql = "INSERT INTO words (word) VALUES ('" + word + "')";
SQLiteCommand sqc = new SQLiteCommand(strsql, sqconn);
sqc.ExecuteNonQuery();
return WID(word);
}
This function adds a word to the "Word" table.
public void AddWord(int page, string word, int QTY)
{
int i = WID(word);
if (i < 0)
i = createWord(word);
string strsql = "INSERT INTO word_page (WID,PID,QTY) VALUES " +
"(" + i.ToString() + "," + page.ToString() +
"," + QTY.ToString() + ")";
SQLiteCommand sqc = new SQLiteCommand(strsql, sqconn);
sqc.ExecuteNonQuery();
}
In this solution, a word cannot be repeated. For this, check if WID
finds a word, if yes, return the ID of the word, otherwise return -1. Moreover, we must add the word to the Words table. In fact, this function saves the index of the page without repeating any of data.
Search in the database
Now, we can search the database for our word.
public DataTable SearchWord(string word)
{
string strSQL = "SELECT pages.path,pages.nofw,pages.Date_add,word_page.QTY " +
"FROM words INNER JOIN word_page ON words.wid=word_page.wid" +
" INNER JOIN pages ON word_page.pid=pages.pid" +
" WHERE words.word='"+word+"'";
SQLiteDataAdapter sqd = new SQLiteDataAdapter(strSQL, sqconn);
DataTable dt = new DataTable();
sqd.Fill(dt);
return dt;
}
This function returns a DataTable
for use in our application.
Create data for saving in the database (index pages)
Now, you have a good database to save pages and its words, but you must create data for this database. This means that you must index the pages!
Initially, this work would be very easy! Because, you can divide a paragraph to words using a splitter. An example is: space , ',' , ')' , '(' , '[' ,'\' and etc., but this solution cannot be very precise.
You must remove trivial and repetitive words, and then you should start indexing.
Split paragraphs
string[] split = words.Split (new Char[] { ' ', ',', '.', ':',';','{','}','[',
']','(',')','\'','"','\\','<','>','=','+','-','/','*','#','&','%','^',
'`','~','0','1','2','3','4','5','6','7','8','9' });
Create a list of words (class ScanFile)
private void StartMonitoring(string p)
{
StreamReader stR = File.OpenText(p);
string words = stR.ReadToEnd();
stR.Close();
string[] split = words.Split(new Char[] { ' ', ',', '.', ':',';','{','}','[',
']','(',')','\'','"','\\','<','>','=','+','-','/','*','#','&','%','^',
'`','~','0','1','2','3','4','5','6','7','8','9' });
max = split.Length;
int index;
int k = 0;
list1.Clear();
for (int i = 0; i < split.Length; i++)
{
WordInfo word = new WordInfo();
word.Word = split[i].Trim();
if (word.Word.Length > 2)
{
SearchableWord = word.Word;
index = list1.FindIndex(HaveWord);
if (index < 0)
{
word++;
list1.Add(word);
k++;
}
else
{
list1[index]++;
k++;
}
}
OnProgress(System.EventArgs.Empty);
}
total = k;
}
private static bool HaveWord(WordInfo str)
{
if (str.Word.ToUpper() == SearchableWord.ToUpper())
return true;
else
return false;
}
In the function, I use the WordInfo
class and the List <Word>
class, and create a new event "OnProgress
" to use when scanning files.
SCF.Progress += new EventHandler(SCF_Progress);
To find a word in List
, we must define a delegate:
private static bool HaveWord(WordInfo str)
However, we cannot find a word by this argument. To solve the problem, I used a private variable for comparing strings:
private static string SearchableWord;
Trivial word (class RepluseTrivial)
To Repulse Trivial words, you must have a list of those words. I think by saving the words in the database, we can quickly access the target word by using a SQL command. For this, create a table of trivial words using a SQL command and insert/delete trivial words in this table using SQL commands.
When creating an object of the RepluseTrivial
class, you must send a list of words to this object and call the Repulse
function to remove the trivial word.
public void Repluse()
{
List<WordInfo> temp = new List<WordInfo>();
for (int i = 0; i < list1.Count; i++)
if (!IsTrivial(list1[i].Word))
temp.Add(list1[i]);
list1 = temp;
}
This function calls the IsTrivial
function to checks for trivial words.
Using the Code
Now you have a list of non-trivial words and a good database to save these words by using the said algorithm, so search by the "search function". However, this function can be developed as an engine that can quietly search such as the Google engine and Microsoft engine etc. (but not equal in performance).
We can show the progress using a progress bar:
public Form1()
{
InitializeComponent();
SCF = new ScanFile();
SCF.Progress += new EventHandler(SCF_Progress);
}
void SCF_Progress(object sender, EventArgs e)
{
progressBar1.Maximum = SCF.MaxProgress;
progressBar1.Value++;
}
To create a list of non-trivial words:
private void button2_Click(object sender, EventArgs e)
{
progressBar1.Value = 0;
SCF.Scan(textBox1.Text);
label2.Text = "Total Word:"+SCF.Total.ToString();
List<WordInfo> list1=new List<WordInfo>();
list1 = SCF.WordList;
string str =
Path.GetDirectoryName(Application.ExecutablePath)+
"\\Trivial.db3";
RepluseTrivial rt = new RepluseTrivial(str);
list1 = rt.Repluse(list1);
listView1.Items.Clear();
for (int i = 0; i < list1.Count; i++)
{
ListViewItem li = new ListViewItem(new string[] { i.ToString(),
list1[i].Count.ToString(), list1[i].Word });
listView1.Items.Add(li);
}
}
Now save the list to the database:
private void button3_Click(object sender, EventArgs e)
{
string str = Path.GetDirectoryName(Application.ExecutablePath);
DBWSE db = new DBWSE(str+""\\WSEDB.db3");
int i=db.AddPage(textBox1.Text,SCF.Total);
progressBar1.Maximum = SCF.WordList.Count;
progressBar1.Value = 0;
for (int j=0;j<SCF.WordList.Count;j++)
{
db.AddWord(i, SCF.WordList[j].Word, SCF.WordList[j].Count);
progressBar1.Value++;
}
}
You can search and show the results in a DataGridView
very easily:
private void button7_Click(object sender, EventArgs e)
{
string str = Path.GetDirectoryName(Application.ExecutablePath);
DBWSE db = new DBWSE(str + "\\WSEDB.db3");
dataGridView2.DataSource = db.SearchWord(textBox3.Text);
}