
ASP.NET Exact Phrase Website Live Search Version 2.0 for .NET Framework 2 and newer

Tushar Arora


20 Aug 2010

An ASP.NET script that searches a website live for exact keywords or phrases. Works with .NET Framework 2.0 and newer.

Download source (version 2.0) - 156.78 KB

Introduction

My project, Exact Phrase Website Live Search, is an ASP.NET script written in C# that works with .NET Framework 2.0 and above. It can be used on any ASP.NET website, whether on localhost or on a domain such as www.mydomain.com. The script is a free utility that searches an entire domain for an exact keyword or an exact phrase. It reads text-based files such as .html, .htm, .txt, .rtf, .nfo, .doc, .php, .asp, .aspx, .php4, .php5, and .xml, and the list of extensions can be customized. The administrator of the script can impose further limits, and the script can be extended into more advanced web applications.

The script requires no database, no temporary files, and no temporary space: the search runs live against the files on the server that hosts the website. On a Xeon server, a single search parses at least 100 large web pages and lists the matching files in about two seconds. Because live searching puts load on the whole physical server, the script is best suited to powerful servers, for example those of universities, colleges, libraries, and schools. It is a complete web application that gives website owners who do not program, or who simply want keyword and phrase search, a way to add live search to their sites.

Extended Information

This is version 2.0, Revision 8, the successor to version 1.0 of the script. Put this ASP.NET script into yourdomain.com/<folder> and search the entire website for an exact word or phrase. The script is freeware for personal and educational use. You can deploy it in universities, schools, and public libraries to search all files of the configured types on the website. The script is upgradable. Because it searches files live, it needs no database, index file, or temporary space; what it does need is memory and processing time on the server. New files stored on the server are picked up by the next search without any extra step. The phrase or keyword is matched exactly as the user typed it in the search form, except for letter case: searching is done in lower case, so the user does not need to know whether the word appears in upper or lower case in the files. If the phrase exists in a valid parsed file (.html, .htm, .txt, .nfo, .rtf), the search finds it with 99.9% reliability, which makes the script a good fit for public libraries that need to search books, titles, names, etc. in files with the configured extensions.
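The case-insensitive exact-phrase matching described above can be sketched as follows. This is a minimal illustration and not the script's own code: the class `PhraseMatcher` and the method `FindPhrasePositions` are hypothetical names, and the sketch uses `StringComparison.OrdinalIgnoreCase` in place of lower-casing both strings, which gives the same kind of result for plain text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the downloadable script.
public static class PhraseMatcher
{
    // Returns the zero-based start index of every non-overlapping
    // occurrence of phrase in text, ignoring letter case.
    public static List<int> FindPhrasePositions(string text, string phrase)
    {
        List<int> positions = new List<int>();
        if (string.IsNullOrEmpty(text) || string.IsNullOrEmpty(phrase))
            return positions;

        int index = 0;
        while (index <= text.Length - phrase.Length)
        {
            index = text.IndexOf(phrase, index, StringComparison.OrdinalIgnoreCase);
            if (index < 0)
                break;                  // no further occurrence
            positions.Add(index);
            index += phrase.Length;     // continue after this match
        }
        return positions;
    }
}
```

Calling `PhraseMatcher.FindPhrasePositions(fileText, userPhrase)` returns an empty list when the phrase is absent, so the number of matches is simply the list's `Count`.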

Background

This is a free search engine module written in ASP.NET and C#. It provides what a demanding educational site or a non-commercial webmaster needs for a website. The script is upgradable because it is built entirely from methods/functions, is structured and organized, and is not cluttered with ad hoc code.

Using the Code

Download the code and extract the files into the <localhost>/search folder or the <domain.com>/search folder of your website. If ASP.NET is installed, the script runs from its primary file, phrasesearch.aspx, which displays the search form. The phrasesearch.aspx.cs file is the primary C# code file. At its top are a few constants and global variables that can be configured manually, although there is no need to configure them unless required. There are only four to six of these variables, and programmers who know web development and ASP.NET will understand them easily.

In phrasesearch.aspx.cs, the following global variables are optional. There is no need to configure them unless errors arise.

/*****************************************************************
//
//  MANUAL CONFIGURATION SECTION**/
    
//*****************************************************************/
    
// Configure your website address here:
// example: http://www.mydomain.com:80
// example: http://www.mydomain2.com:8080
// example: http://www.mydomain3.com:2200
// also include the port number of your http server along with the website url :XX
// please do not put a slash mark "/" after the fully qualified website address
public const String mywebsiteaddr = "";
    
// Configure your domain here:
// example: http://mydomain.com:9090
// example: http://localhost:50
// example: http://tushar
// X NOT: mydomain.com, mydomain, etc. etc.
// Fully qualified http://domain.com address is required along with port number
// also include port number of your http server along with the website url :XX
public const String mydomain = "";
    
// Configure your domain/website's root directory here:
// example: "public_html" or "wwwroot" or "httpdocs"
// please do not put slash mark "/" after the directory's name
public const String mydomainrootaddr = "";
    
// Configure your starting search directory, which will be searched entirely
// example: "/" (root of your domain) or "/folder1" or "/folder2",
// which are www.lawsofbrahman.com/folder1 and www.lawsofbrahman.com/folder2
// please put forward slash mark "/" after the directory's name
public const String mysearchdir = "/";
    
// Enable the flag that you have filled the variables above:
// example: isSetupDone = true;
public const Boolean isSetupDone = false;    
    
//*****************************************************************/
    
// Configure the maximum number of files to be searched in the entire website.
// example: myfileslimit = 10000 -> will search only Ten thousand files
// example: myfileslimit = 1000 -> will search only One thousand files
// example: myfileslimit = 50 -> will search only Fifty files
// please only put digits and no negative number
// This variable is not under the isSetupDone flag and 
// so is always enabled in the code. 
// When you wish to use this variable, please select [ search unlimited files ] 
// option from the total searched files limit combobox, 
// otherwise this variable won't work.
public static int myfileslimit = 100;
    
// Recommended configuration
// Configure your domain/website's physical path here, 
// if you actually and exactly know it!:
// example: myphysicalpath = "C:\\Inetpub\\wwwroot"
// please do not put a backslash "\" after the directory's name
// please put double backslash between all the 
// directory names in the path: C:\\dir1\\dir2\\dir3
// This setting does not depend on isSetupDone flag.
// Example2: myphysicalpath = "C:\\Inetpub\\vhosts\\lawsofbrahman.com\\httpdocs"
public const String myphysicalpath = "";
    
// arrayFileTypesToSearch is a String Array containing 
// the extensions of the files that are to be parsed. All other files are ignored.
// ".htm" will search the htm webpages
// ".txt" will search the text files
// IMP: PLEASE PUT "." DOT IN FRONT OF THE EXTENSIONS.
// MODIFY THE ARRAY AS IT IS AND DO NOT MAKE IT SOMETHING ELSE! 
// OTHERWISE SCRIPT MIGHT MALFUNCTION.
// LIKE THIS: = {".txt", ".docx", ".html", ".xml"} 
// WITHIN "{" AND "}" CURLY BRACKETS.
public static String[] arrayFileTypesToSearch = 
	{".htm", ".html", ".txt", ".nfo", ".rtf", ".doc", ".xml", ".php", 
	".asp", ".aspx", ".php4", ".php5"};

Configure the variables above manually only if the script does not work as expected.
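To illustrate how an extension list such as arrayFileTypesToSearch and a cap such as myfileslimit might drive the file walk, here is a minimal sketch. It is an assumption about the general approach, not the script's actual traversal code: `FileCollector`, `CollectFiles`, and `HasAllowedExtension` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the downloadable script.
public static class FileCollector
{
    // Walks rootDir and its subdirectories and returns at most filesLimit
    // files whose extension appears in the extensions array.
    public static List<string> CollectFiles(string rootDir, string[] extensions, int filesLimit)
    {
        List<string> results = new List<string>();
        Stack<string> pending = new Stack<string>();
        pending.Push(rootDir);

        while (pending.Count > 0 && results.Count < filesLimit)
        {
            string dir = pending.Pop();
            string[] files;
            string[] subDirs;
            try
            {
                files = Directory.GetFiles(dir);
                subDirs = Directory.GetDirectories(dir);
            }
            catch (UnauthorizedAccessException) { continue; } // skip unreadable folders
            catch (IOException) { continue; }                 // skip missing or broken folders

            foreach (string file in files)
            {
                if (results.Count >= filesLimit)
                    break;
                if (HasAllowedExtension(Path.GetExtension(file), extensions))
                    results.Add(file);
            }
            foreach (string sub in subDirs)
                pending.Push(sub);
        }
        return results;
    }

    // Compares extensions case-insensitively, so ".HTML" matches ".html".
    public static bool HasAllowedExtension(string extension, string[] allowed)
    {
        foreach (string candidate in allowed)
        {
            if (string.Equals(extension, candidate, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```

An explicit stack is used instead of recursion so that a folder the web server account cannot read is skipped without aborting the whole walk.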

public static Boolean initializeSiteSearching()
{
    Boolean returnval = false;
    
    // Reset all static variables before beginning a new search...
    strSearchPhrase = "";
    inumResultsLimit = 0;
    strDomainRootDir = "";
    strDomainUrl = "";
    boolSearchingDone = false;
    arrayPhraseMatches = null;
    iPhraseMatchCount = 0;
    arrayPhrasePositions = null;
    numMatchingFilesFound = 0;
    iCountSearchedFiles = 0;
    iFilesLimitReached = false;
    
    // Find the domain's root directory
    try
    {
        strDomainRootDir = HttpContext.Current.Server.MapPath("/");
    }
    catch (Exception)
    {
        strDomainRootDir = HttpContext.Current.Server.MapPath("~/");
    }
    
    // Find domain's URL
    strDomainUrl = HttpContext.Current.Request.Url.Host;
    
    // Get the server/domain.com's port
    strDomainPort = HttpContext.Current.Request.ServerVariables["SERVER_PORT"];
    
    // Configure the total files limit if user specified negative digits, 
    // reconfigure it with default value
    if (myfileslimit <= 0)
        myfileslimit = 1000;
        
    // Finalize if we got the domain root directory and the domain url
    if (strDomainRootDir.Length > 0 && strDomainUrl.Length > 0)
    {
        returnval = true;
    }
    return returnval;
}

This function prepares a new search: it resets the static state and detects the site's root directory, host name, and port.

Some Functions in the Code

```
public static string PageName()
```
Returns the file name of the running script (<filename.aspx>). This function is optional.
```
public static Boolean initializeSiteSearching()
```
This function is called only when a new search is about to take place. It resets all the required variables to null or 0. So that no ASP.NET function has to be called repeatedly while parsing a large number of files, it also obtains the global website values once: the website URL, the domain name, and the port on which the website is running. Finally, it sets a global, configurable limit on the number of files searched in one go.
```
public static String GenerateInternalLink(String strPath)
```
This function is called whenever a valid file has been parsed and is about to be listed on the search results page. It removes the physical path of the file, converts what remains to a relative URL path, and returns that path so it can be used as the hyperlink of the indexed result. This function is important for resolving website-related paths.
```
public static String GetRelavitePathOfFile(String filepath)
```
This function is also called whenever a valid file has been parsed and is about to be indexed. It is the primary function for converting a physical path to a website-relative path/URL. Only if it cannot produce a website-relative URL is GenerateInternalLink used as the final fallback. Both methods are upgradable and are called from the FindMatchingCurrentFile() method, which performs the automated tasks: reading the entire contents of the file, stripping tags, etc., and then matching the user's phrase against the text in the file buffer. If the text matches, any number of times, the file is indexed, its title is obtained from the HTML, and a link is produced so that the user can open the file containing the phrase or keyword. This function uses GenerateInternalLink() for help in resolving paths.
```
public static Boolean FindMatchingCurrentFile(string path)
```
This function performs the file-related tasks: it searches the file for any match of the user's phrase and adds the file to the results section when a match is found. It is the primary file-handling and indexing function.
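The article does not list the body of FindMatchingCurrentFile(), so the following is only a sketch of the steps it is described as performing: stripping tags, obtaining the title, and testing for the phrase. The class `HtmlTextHelper` and its methods are hypothetical, and the regular expressions are a simplification that will not handle every malformed page.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the downloadable script.
public static class HtmlTextHelper
{
    // Returns the text between <title> and </title>, or "" if there is none.
    public static string ExtractTitle(string html)
    {
        Match m = Regex.Match(html, @"<title[^>]*>(.*?)</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value.Trim() : "";
    }

    // Removes script/style blocks and all remaining tags, then collapses
    // runs of whitespace so a phrase can match across former tag boundaries.
    public static string StripTags(string html)
    {
        string text = Regex.Replace(html, @"<(script|style)[^>]*>.*?</\1>", " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        text = Regex.Replace(text, @"<[^>]+>", " ");
        text = Regex.Replace(text, @"\s+", " ");
        return text.Trim();
    }

    // True if the visible text of the page contains the phrase, ignoring case.
    public static bool ContainsPhrase(string html, string phrase)
    {
        return StripTags(html).IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In a search loop, a file's contents would be read once (for example with `File.ReadAllText`), checked with `ContainsPhrase`, and, on a match, listed under the title returned by `ExtractTitle`.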

Some Sample Code Snippets

public static String GenerateInternalLink(String strPath)
   {
       String strfinalurl = "";
       String strPathstripped = "";
       String strDomainRootStripped = "";

       String strDomainRootPath = "";
       try
       {
           strDomainRootPath = HttpContext.Current.Request.MapPath("/");
       }
       catch (Exception)
       {
           strDomainRootPath = HttpContext.Current.Request.MapPath
			(HttpContext.Current.Request.ApplicationPath);
       }

       // Remove the physical path
       strPathstripped = Replace(strPath, @"\\", "/", -1, 0, 
		RegexOptions.IgnoreCase | RegexOptions.Singleline);
       strDomainRootStripped = Replace(strDomainRootPath, @"\\", 
		"/", -1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);
       strfinalurl = Replace(strPathstripped, strDomainRootStripped, 
		"", -1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);
       return strfinalurl; 
   }
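The listing above relies on the script's own Replace helper with RegexOptions. As an alternative illustration of the same idea (removing the root folder from a physical path and switching to URL separators), here is a minimal sketch that uses only plain string operations, so that backslashes and dots in the path never need regex escaping. `UrlPathHelper` and `ToRelativeUrl` are hypothetical names, and the root path is passed in as a parameter rather than read from HttpContext so the sketch can run outside a web request.

```csharp
using System;

// Hypothetical helper, not part of the downloadable script.
public static class UrlPathHelper
{
    // Converts a physical file path under rootPath to a site-relative URL
    // such as "/docs/page.html". Returns null if the file is not under rootPath.
    public static string ToRelativeUrl(string physicalPath, string rootPath)
    {
        // Normalise both paths to URL-style separators.
        string path = physicalPath.Replace('\\', '/');
        string root = rootPath.Replace('\\', '/').TrimEnd('/');

        // Windows paths are case-insensitive, so compare accordingly.
        if (!path.StartsWith(root + "/", StringComparison.OrdinalIgnoreCase))
            return null;

        // Keep the leading "/" so the result is rooted at the site.
        return path.Substring(root.Length);
    }
}
```

For example, `UrlPathHelper.ToRelativeUrl(@"C:\Inetpub\wwwroot\docs\page.html", @"C:\Inetpub\wwwroot\")` yields `/docs/page.html`, which can then be prefixed with the host and port.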

public static String GetRelavitePathOfFile(String filepath)
    {
        String strresult = "";
        String strfinalurl = "";
        String strRootPath = "";
        try
        {
            strRootPath = HttpContext.Current.Request.MapPath("/");
        }
        catch (Exception)
        {
            strRootPath = HttpContext.Current.Request.MapPath
		(HttpContext.Current.Request.ApplicationPath);
        }

        // First Check if a physical path had been configured.

        // Check if admin user specified an exact physical path
        if (myphysicalpath.Length > 0)
        {
            // A physical file path was specified which will be stripped directly.
            // Strip the website root path from the file path
            strfinalurl = Replace(filepath, strRootPath, "", 
		-1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);

            // Convert the configured physical path to URL path separators
            // so that it can be matched against strfinalurl below
            String modphysicalpath = Replace(myphysicalpath, @"\\", "/", 
		-1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);

            // Replace physical path separators with url path separator
            strfinalurl = Replace(strfinalurl, @"\\", "/",
		-1,0,RegexOptions.IgnoreCase | RegexOptions.Singleline);

            // now finally remove the physical path's unnecessary bytes 
            // by matching modphysicalpath
            strresult = Replace(strfinalurl, modphysicalpath, "", 
		-1, 0, RegexOptions.IgnoreCase | RegexOptions.Singleline);

            // Return the result
            return strresult;
        }

        String strfileurl = GenerateInternalLink(filepath);

        // If the user has not created the setup then proceed with default settings
        if (isSetupDone == false)
        {
            /*
             * NEW CODE
             */
            if (thisDomainName == "localhost") 
		{ strfileurl = "http://localhost:" + strDomainPort + "/" + strfileurl; }
            else { strfileurl = "http://" + thisDomainName + ":" + 
				strDomainPort + "/" + strfileurl; }
            return strfileurl;
        }
        else
        {
            // using user defined variables****************
            // NEW CODE
            if (mydomain == "localhost")
		{strfileurl = "http://localhost:" + strDomainPort + "/" + strfileurl; }
            else { strfileurl = "http://" + mydomain + "/" + strfileurl; }
            return strfileurl;
        }

        // Nothing found
        return "Error in GetRelativePathOfFile method";
    }

Code/User Interface Language

The code and the user interface are in English.

Conclusion

This is my second and latest release. It runs on IIS with .NET 2.0 or newer, is fully automated, and has the earlier bugs fixed. Contact me at aroratushar@gmail.com if any problem arises.

Points of Interest

I was fascinated by keyword and phrase searching, so I tried to build a script-based search engine. I love .NET and am fond of programming in Visual Studio, Visual Basic, and .NET. I was happy to create my own search engine, and I find this script very useful for those who need it.

History

17th August 2010: New updates and bug fixes
27th August 2009: First release

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
