Data Scraping in ASP.NET Using WhoIs Search !!

23 Jun 2004

Sample Image - DataScraping.jpg

About This Article

There are many WHOIS search examples floating around the net, but here I explain one using the data scraping technique. My main concern is not the WHOIS search itself, but the use of data scraping in web applications.

Introduction

Quite often, we want to know who owns a given domain. To obtain the registry information, we go to the respective registry (DENIC, Network Solutions, etc.) and start a so-called WHOIS query (lookup). This is not always possible, though, and sometimes you want to offer this search from your own web forms. This article was written out of the need to build a WHOIS search and a domain availability search into an ASP.NET project, and it demonstrates how to do both using the data scraping technique. Using the source code requires the Microsoft .NET Framework SDK installed on a Web server.

Background

In this article, for data mining / data scraping from a remote Internet resource, I am using the System.Net.WebClient class. This class provides common methods for sending data to and receiving data from a resource identified by a URI; internally, it uses the WebRequest class to access Internet resources. Here I use the DownloadData method of WebClient, which downloads data from a resource and returns it as an array of bytes. The bytes are then decoded into a string using the system's current ANSI code page (Encoding.Default). To parse the received text, I use the Regex class, whose static methods allow the use of other regular expression classes without explicitly instantiating them. Finally, the information we need is extracted with the help of the Match class, which represents the result of a regular expression matching operation.

The project

To show data scraping in ASP.NET, I implement a WHOIS lookup (information about a particular domain) and a domain availability check using the WHOIS server of www.directnic.com. You can use any other server of your choice; lists of such servers are easy to find on the net.

Let us start with our example. When the user enters the domain name whose information they want to see, that name is appended to the URL as a query string value.

 Dim strURL As String = _
  "http://www.directnic.com/whois/index.php?query=" _
  + txtDomain.Text
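
Because the domain name comes straight from the text box, it may be worth trimming and encoding it before it goes into the URL. The following is only a minimal sketch of that idea, not part of the demo project: the BuildWhoisUrl helper and the WhoisUrlSketch module are hypothetical names, and Uri.EscapeDataString assumes .NET Framework 2.0 or later.

```vbnet
Imports System

Module WhoisUrlSketch
    ' Hypothetical helper, not part of the article's project.
    Function BuildWhoisUrl(ByVal domain As String) As String
        Const BaseUrl As String = _
          "http://www.directnic.com/whois/index.php?query="
        ' Trim stray whitespace, then percent-encode characters
        ' such as spaces, "&" and "=" so they cannot alter the query string
        Return BaseUrl & Uri.EscapeDataString(domain.Trim())
    End Function

    Sub Main()
        ' An ordinary domain name passes through unchanged
        Console.WriteLine(BuildWhoisUrl(" example.com "))
        ' Reserved characters are encoded
        Console.WriteLine(BuildWhoisUrl("a b&c=d"))
    End Sub
End Module
```

Inside the page itself, Server.UrlEncode(txtDomain.Text) would be a comparable option; it encodes a space as "+" rather than "%20".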

To get the result for this URL, we need to capture the output returned by the directnic server. For this purpose, we use the WebClient class's DownloadData method, which returns the data as a byte array.

 Dim web As New WebClient()
 ' byte array to store the data downloaded by WebClient
 Dim bufData As Byte()
 bufData = web.DownloadData(strURL)

Half of our work is done: we have the data. Now we need to convert the bytes into a string so that we can parse them easily.

 ' we have bufData; now convert it into a string
 ' firstLevelbufData is a String variable
 firstLevelbufData = Encoding.Default.GetString(bufData)
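
Encoding.Default depends on the machine the code runs on, so a page served in another character set may come out garbled. As a hedged sketch only, the step could take the character set as a parameter and fall back to the system default. The DecodeBody helper and the DecodeSketch module are hypothetical names, and how the charset name is obtained (for example from the response headers) is left open here.

```vbnet
Imports System
Imports System.Text

Module DecodeSketch
    ' Hypothetical helper: decode with a named charset,
    ' falling back to Encoding.Default when the name is missing or unknown
    Function DecodeBody(ByVal data As Byte(), _
      ByVal charsetName As String) As String
        Dim enc As Encoding = Encoding.Default
        If Not String.IsNullOrEmpty(charsetName) Then
            Try
                enc = Encoding.GetEncoding(charsetName)
            Catch ex As ArgumentException
                ' Unknown charset name: keep the system default
            End Try
        End If
        Return enc.GetString(data)
    End Function

    Sub Main()
        ' Made-up sample bytes standing in for a downloaded page
        Dim bytes As Byte() = Encoding.UTF8.GetBytes("Müller GmbH")
        Console.WriteLine(DecodeBody(bytes, "utf-8"))
    End Sub
End Module
```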

Now the data is available as text in the firstLevelbufData string variable. It contains a lot of unnecessary information, and we need to extract only the required part. So, I extract the text that lies between two marker strings, held in the variables first and last. You can change the values of first and last to suit your needs, but to do so you must know the format of the data you are receiving.

 Dim first, last As String
 ' Chr(34) is the double quote (") character
 ' first is the opening tag: <p class="text12">
 first = "<p class=" + Chr(34) + "text12" + Chr(34) + ">"
 last = "</p>"

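A simple way to extract information from HTML-formatted data like this is regular expression parsing. Here, I build a regular expression from the first and last variables using the Regex class from the System.Text.RegularExpressions namespace. The named group MYDATA lazily captures everything that follows first, the lookahead (?=...) stops the capture just before last, and RegexOptions.Singleline lets the dot match line breaks as well.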

 Dim RE As New Regex(first + _
  "(?<MYDATA>.*?(?=" + last + "))", _
  RegexOptions.IgnoreCase Or RegexOptions.Singleline)

The main task remaining is to extract the data that matches the above regular expression. For this, we use the Match class, which represents the result of a regular expression matching operation.

 Dim m As Match = RE.Match(firstLevelbufData)
 txtResult.Text = m.Groups("MYDATA").Value + ""
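
To try the extraction step without calling a remote server, the same pattern can be run against a made-up string. This is a sketch under assumptions: the ExtractBetween helper, the ExtractSketch module and the sample HTML are hypothetical, and Regex.Escape is added here so that characters in the markers are never read as regular expression syntax.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ExtractSketch
    ' Hypothetical helper: returns the text between first and last,
    ' or an empty string when the markers are not found
    Function ExtractBetween(ByVal html As String, _
      ByVal first As String, ByVal last As String) As String
        Dim pattern As String = Regex.Escape(first) & _
          "(?<MYDATA>.*?)(?=" & Regex.Escape(last) & ")"
        Dim m As Match = Regex.Match(html, pattern, _
          RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then Return m.Groups("MYDATA").Value
        Return ""
    End Function

    Sub Main()
        ' Made-up sample page; a real WHOIS page will look different
        Dim html As String = "<html><p class=""text12"">Domain: example.com" & _
          Environment.NewLine & "Status: taken</p></html>"
        Console.WriteLine(ExtractBetween(html, "<p class=""text12"">", "</p>"))
    End Sub
End Module
```

Checking m.Success makes the "nothing found" case explicit instead of relying on the length of the extracted text.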

Listing 1: HTML CODE (Whois.aspx)

 <HTML>
    <HEAD>
        
    </HEAD>
    <body MS_POSITIONING="GridLayout">
        <form id="Form1" method="post" runat="server">
        <TABLE id="Table1" cellSpacing="0" cellPadding="0" width="358" border="0">
            <TR>
            <TD>
                <asp:Label id="Label2" Runat="server">www.</asp:Label>
                <asp:TextBox id="txtDomain" Runat="server">
                Check Domain</asp:TextBox></TD>
            <TD>
                <asp:Button id="btnQuery" Text="Check Domain" Runat="server">
                </asp:Button></TD>
        </TR>
        <TR>
            <TD>
                <asp:Label id="txtResult" Runat="server"></asp:Label></TD>
            <TD></TD>
        </TR>
        </TABLE>
        </form>
    </body>
</HTML>

Listing 2: SERVER SIDE CODE (WhoIs.aspx.vb)

Imports System.Net.Sockets
Imports System.Text
Imports System.IO
Imports System.Collections
Imports System.Net
Imports System.Text.RegularExpressions

Public Class WhoIs
    Inherits System.Web.UI.Page
    Protected WithEvents Label1 As System.Web.UI.WebControls.Label
    Protected WithEvents btnQuery As System.Web.UI.WebControls.Button
    Protected WithEvents txtResult As System.Web.UI.WebControls.Label
    Protected WithEvents Label2 As System.Web.UI.WebControls.Label
    Protected WithEvents txtDomain As System.Web.UI.WebControls.TextBox


    Private Sub Page_Load(ByVal sender As System.Object, _
      ByVal e As System.EventArgs) Handles MyBase.Load
        ' Adds the JavaScript code that clears the existing text
        ' from the text box when the user wants to
        ' enter a new domain name
        txtDomain.Attributes.Add("onclick", "this.value='';")
    End Sub

    Private Sub btnQuery_Click(ByVal sender As System.Object, _
      ByVal e As System.EventArgs) Handles btnQuery.Click
        ' Stores the data downloaded by the WebClient
        Dim firstLevelbufData As String
        Try
            ' similarly, we can select any server address for data mining
            Dim strURL As String = _
              "http://www.directnic.com/whois/index.php?query=" _
              + txtDomain.Text
            Dim web As New WebClient()
            ' byte array to store the data downloaded by WebClient
            Dim bufData As Byte()
            bufData = web.DownloadData(strURL)
            ' we have bufData; now convert it into string form
            firstLevelbufData = Encoding.Default.GetString(bufData)
        Catch ex As System.Net.WebException
            ' raised when the host name cannot be resolved
            ' or on any other connection problem
            txtResult.Text = ex.Message
            Exit Sub
        End Try
        Try
            ' first and last are the marker strings used to
            ' extract the data within two tags;
            ' you can change them according to your requirement
            Dim first, last As String
            ' Chr(34) is the double quote (") character
            first = "<p class=" + Chr(34) + "text12" + Chr(34) + ">"
            last = "</p>"
            Dim RE As New Regex(first + _
              "(?<MYDATA>.*?(?=" + last + "))", _
              RegexOptions.IgnoreCase Or RegexOptions.Singleline)
            ' try to extract the data between the first and last tags
            Dim m As Match = RE.Match(firstLevelbufData)
            ' got the result
            txtResult.Text = m.Groups("MYDATA").Value + ""
            ' check whether any information about that domain is available
            If txtResult.Text.Length < 10 Then _
              txtResult.Text = _
                "Information about this domain is not available !!"
        Catch e3 As Exception
            txtResult.Text = "Sorry the whois information" & _
                                 " is currently not available !!"
        End Try
    End Sub

End Class

Details

The demo project contains:

  • WhoIs.aspx: the Web Forms page (HTML markup).
  • WhoIs.aspx.vb: the code-behind page.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
