Introduction
My project is an Exact Phrase Website Live Search Script. It is compatible with ASP.NET 3.5 and can be used on any website or web server, including localhost and hosted domains such as domain.com. The script is written primarily in C#. It is a utility that searches all of a website's .html, .htm, .txt, .rtf, and .nfo files for an exact keyword or phrase. The search requires no database, temporary files, or temporary disk space: it runs live on the server that hosts the website. On a Xeon server, a single search parses at least 500 web pages in about two seconds and lists the matching files in the results right away.
Extended Information
Put this ASP.NET 3.5 script into yourdomain.com/<folder> and search the entire website for an exact word or phrase. The script is based on ASP.NET 3.5, so if .NET 3.5 is installed on your server or localhost, there should not be any problems. The script is freeware for personal and educational use. You can deploy it in universities, schools, and public libraries to search all supported files on a website. The script is upgradable. Because it searches files live, it needs no database, file, or temporary space to index files. What it does require is memory. The search runs live over the entire website and takes only about two seconds; at least 500 web pages and text files can be parsed. The phrase or keyword is searched exactly as you typed it in the search form, except for letter case: all search phrases are converted to lower case, so a user who does not know whether a word is written in upper or lower case will still find every matching word or phrase. A 99.9% exact search is guaranteed in a valid parsed file (.html, .htm, .txt, .nfo, .rtf), so this script is well suited to public libraries that need to search for books, titles, names, etc.
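The case handling described above can be illustrated with a short sketch. It is not taken from the script: the class and method names (PhraseMatchSketch, ContainsPhrase) are hypothetical, and it only assumes that both the file text and the phrase are lower-cased before an exact substring comparison.

```csharp
using System;

public static class PhraseMatchSketch
{
    // Hypothetical helper: returns true when 'phrase' occurs in 'text',
    // ignoring letter case. Empty input never matches.
    public static bool ContainsPhrase(string text, string phrase)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(phrase))
            return false;

        // Lower-case both sides so "Library" matches "library" and "LIBRARY".
        string haystack = text.ToLowerInvariant();
        string needle = phrase.ToLowerInvariant();

        // Ordinal comparison keeps the match exact, character by character.
        return haystack.IndexOf(needle, StringComparison.Ordinal) >= 0;
    }
}
```

For example, ContainsPhrase("The Public LIBRARY opens at nine", "public library") finds a match, while a phrase that differs by even one character does not.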
Background
This is a new search engine module written in ASP.NET and C#. It provides what a demanding webmaster of an educational or free website needs. The script is upgradable because it is organized entirely into methods/functions rather than inline code.
Using the Code
Download the code and extract the files into the <localhost>/search or <domain.com>/search folder, directly in your website, in a virtual directory. With ASP.NET installed, the script runs from the main display file, phrasesearch.aspx, which displays the search form. The phrasesearch.aspx.cs file is the primary C# code file. At its top are some constants and global variables that can be configured manually. There are only a few of them, and they are easy to understand for programmers who know web development and ASP.NET.
The global variables in phrasesearch.aspx.cs are optional. Unless an error comes up, there is no need to configure them, because the script works out the configuration automatically.
public const String mywebsiteaddr = "";
public const String mydomain = "";
public const String mydomainrootaddr = "";
public const String mysearchdir = "~/";
public const Boolean isSetupDone = false;
public static int myfileslimit = 1000;
public const String myphysicalpath = "";
Some Functions in the Code
public static string PageName()
Returns the file name of the running script (<filename.aspx>). This function is optional.
public static Boolean initializeSiteSearching()
This function is called only when a new search is about to take place. It resets all the required variables to null or 0. So that no ASP.NET function has to be called over and over while a large number of files are parsed, the function also obtains the website-wide values once: the website URL, the domain name, and the port number on which the website is running. It also sets a configurable global variable that limits the number of files searched in one go.
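The effect of a file limit on the directory walk can be sketched as follows. This is an illustration under stated assumptions, not the script's ProcessDirectory method, which is not shown in this article: the names FileWalkSketch, IsSearchable, and CollectFiles are hypothetical, and the extension list is taken from the file types the article says are searched.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileWalkSketch
{
    // File types the article lists as searchable.
    private static readonly string[] SearchableExtensions =
        { ".html", ".htm", ".txt", ".rtf", ".nfo" };

    // Hypothetical helper: true when the file extension is in the list above.
    public static bool IsSearchable(string path)
    {
        string ext = Path.GetExtension(path);
        foreach (string allowed in SearchableExtensions)
        {
            if (string.Equals(ext, allowed, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }

    // Walks 'root' and its subdirectories and collects at most 'limit'
    // searchable files, stopping as soon as the limit is reached.
    public static List<string> CollectFiles(string root, int limit)
    {
        List<string> found = new List<string>();
        Stack<string> pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0 && found.Count < limit)
        {
            string dir = pending.Pop();
            foreach (string file in Directory.GetFiles(dir))
            {
                if (found.Count >= limit) break;
                if (IsSearchable(file)) found.Add(file);
            }
            foreach (string sub in Directory.GetDirectories(dir))
                pending.Push(sub);
        }
        return found;
    }
}
```

An explicit stack is used instead of recursion so that the limit check sits in one place; a recursive version would behave the same way for the purposes of this sketch.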
public static String GenerateInternalLink(String strPath)
This function is called whenever a valid file has been parsed and is about to be indexed live on the search results page. It is important because it strips the physical path from the file name, converts what remains to a website-relative URL path, and returns that path so it can be used as the hyperlink of the indexed file in the results.
public static String GetRelavitePathOfFile(String filepath)
This function is also called whenever a valid file has been parsed and is about to be indexed. It parses the physical path relative to the localhost and is the primary function for converting a physical path to a website-relative path/URL. If it returns an error instead of a processed website-relative URL, GenerateInternalLink is called as the fallback to obtain the URL. These two methods are upgradable. Both are called from the FindMatchingCurrentFile() method, which performs the automated tasks: reading the entire contents of the file, stripping tags, etc., and then matching the user's phrase against the text in the file buffer. If the text matches any number of times, the file is indexed, its title is taken from the HTML file, and a link is produced so that the user can click it to view the file that contains the phrase or keyword.
public static Boolean FindMatchingCurrentFile(string path)
This function performs the file-related tasks: it searches the file for the user-specified phrase and indexes the file in the results section when a match is found. It is the primary function for file handling and indexing.
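The two central steps, stripping tags and counting phrase matches, can be sketched as below. This is a simplified illustration and not the body of FindMatchingCurrentFile: the names FileMatchSketch, StripTags, and CountMatches are hypothetical, and the tag-stripping regex is a rough approximation rather than a full HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FileMatchSketch
{
    // Replaces anything that looks like an HTML tag with a space.
    // A rough approximation; it does not handle scripts or comments specially.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, @"<[^>]*>", " ");
    }

    // Counts non-overlapping, case-insensitive occurrences of 'phrase' in 'text'.
    public static int CountMatches(string text, string phrase)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(phrase))
            return 0;

        int count = 0;
        int index = 0;
        while ((index = text.IndexOf(phrase, index,
                    StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            count++;
            index += phrase.Length; // continue after the current match
        }
        return count;
    }
}
```

A file would be treated as a hit when CountMatches(StripTags(contents), phrase) is greater than zero. Because each tag becomes a space, a phrase that is split by markup in the source file may not match exactly in this sketch.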
Some Sample Code Snippets
public static Boolean initializeSiteSearching()
{
    Boolean returnval = false;

    // Reset all per-search state before a new search starts.
    strSearchPhrase = "";
    inumResultsLimit = 0;
    strDomainRootDir = "";
    strDomainUrl = "";
    boolSearchingDone = false;
    arrayPhraseMatches = null;
    iPhraseMatchCount = 0;
    arrayPhrasePositions = null;
    numMatchingFilesFound = 0;
    iCountSearchedFiles = 0;
    iFilesLimitReached = false;

    // Look up the site's physical root, host name, and port once per search.
    strDomainRootDir = HttpContext.Current.Server.MapPath("~/");
    strDomainUrl = HttpContext.Current.Request.Url.Host;
    strDomainPort = HttpContext.Current.Request.ServerVariables["SERVER_PORT"];

    // Fall back to 10000 files if the configured limit is not positive.
    if (myfileslimit <= 0)
        myfileslimit = 10000;

    // Succeed only if both the root path and the host name were found.
    if (strDomainRootDir.Length > 0 && strDomainUrl.Length > 0)
    {
        returnval = true;
    }
    return returnval;
}
public static String GenerateInternalLink(String strPath)
{
    // Replace(...) is a helper defined elsewhere in the script (not shown here).
    String strfinalurl = "";
    String strPathstipped = "";
    String strDomainRootStripped = "";

    // Strip a leading drive prefix such as "C:\" from both paths.
    strPathstipped = Replace(strPath, @"^(([a-zA-Z0-9-]+:+\\))", "",
        -1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    strDomainRootStripped = Replace(strDomainRootDir, @"^(([a-zA-Z0-9-]+:+\\))", "",
        -1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Convert backslashes to forward slashes.
    strPathstipped = Replace(strPathstipped, @"\\", "/",
        -1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    strDomainRootStripped = Replace(strDomainRootStripped, @"\\", "/",
        -1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Remove the site root from the file path; what remains is the relative URL.
    strfinalurl = Replace(strPathstipped, strDomainRootStripped, "",
        -1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);
    return strfinalurl;
}
public static Boolean SearchForPhraseInFiles(String strPhrase, int ireslimit)
{
    // Time the whole search so the elapsed seconds can be reported.
    System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
    sw.Start();
    Boolean returnval = false;

    // Refuse to run unless the request came through phrasesearch.aspx.
    if (!isFormPageLoaded)
    {
        String strstyle = "<div style=\"background-color:silver;" +
            "border:thin; border-style:solid; border-color:olive;\">";
        strstyle += "<b><em>Please load phrasesearch.aspx file. " +
            "That is the main display file... Aborted" +
            "</em></b><br /><br /></div>";
        System.Web.HttpContext.Current.Response.Write(strstyle);
        HttpContext.Current.Response.End();
        return false;
    }

    // Reset state and detect the site root and host name.
    if (initializeSiteSearching() == false)
    {
        System.Web.HttpContext.Current.Response.Write
            ("<p><b>Critical error finding path URLs " +
            "while initializing search! aborting...</b></p>");
        return false;
    }

    strSearchPhrase = strPhrase;
    inumResultsLimit = ireslimit;

    // Show the configured address if setup was done, else the detected host.
    if (isSetupDone)
        System.Web.HttpContext.Current.Response.Write
            ("<b>Searching the domain:<em>" + mywebsiteaddr +
            "</em></b><br /><br />");
    else
        System.Web.HttpContext.Current.Response.Write
            ("<b>Searching the domain:<em>" + strDomainUrl +
            "</em></b><br /><br />");

    // Search the detected root by default, or the configured directory after setup.
    if (!isSetupDone)
    {
        if (ProcessDirectory(strDomainRootDir) == true)
        {
            returnval = true;
        }
    }
    else
    {
        if (ProcessDirectory(mysearchdir) == true)
        {
            returnval = true;
        }
    }

    sw.Stop();
    TimeSpan tsobj = sw.Elapsed;
    HttpContext.Current.Response.Write("Search took: " +
        "(<b style=\"color:purple\"><em>" +
        tsobj.TotalSeconds + "</em></b>) seconds.");
    return returnval;
}
public static String getTitleTagValue(String contents)
{
    Regex pattern;
    Match match;
    pattern = new Regex(@"<title>([\w\s,.:'-]+)</title>",
        RegexOptions.Compiled | RegexOptions.ECMAScript |
        RegexOptions.Multiline | RegexOptions.IgnoreCase);
    match = pattern.Match(contents);
    if (match.Success)
        return match.Groups[1].Value;
    else
        return "";
}
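The pattern in getTitleTagValue accepts only word characters, whitespace, and a few punctuation marks inside a plain <title> tag, so a title containing characters such as "&" or "|", or a <title> tag with attributes, yields an empty string. A more permissive variant might look like the sketch below. It is offered as one possible upgrade, not as part of the released script; the names TitleSketch and GetTitle are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleSketch
{
    // Matches <title ...>anything</title>, lazily, across line breaks.
    private static readonly Regex TitlePattern = new Regex(
        @"<title[^>]*>(.*?)</title>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns the trimmed title text, or "" if none is found.
    public static string GetTitle(string contents)
    {
        if (contents == null) return "";
        Match match = TitlePattern.Match(contents);
        return match.Success ? match.Groups[1].Value.Trim() : "";
    }
}
```

The returned text is not HTML-decoded in this sketch, so an entity such as &amp; is passed through unchanged.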
Code/User Interface Language
The language used is English.
Conclusion
This is the second and latest release. It runs on IIS 5+ with .NET 3.5, is fully automated, and includes bug fixes. Contact me at aroratushar@gmail.com if any problem arises.
History
- 27 August 2009: First release.
- 18 August 2010: Updated article and download files.