About This Article
There are a lot of WHOIS search examples floating around the net, but here I explain it using the data scraping technique. My main concern is not WHOIS search itself, but the use of data scraping in web applications.
Introduction
Quite often, we want to know who owns a given domain. To obtain the registry information, we go to the respective registry (DENIC, Network Solutions, etc.) and start a so-called WHOIS query (lookup). This is not always convenient, and sometimes you want to offer this search from your own web forms. This article was written out of the need to build WHOIS search and domain availability search into an ASP.NET project, and it demonstrates how to perform both using the data scraping (data mining) technique. Using the source code in this article requires the Microsoft .NET Framework SDK installed on a Web server.
Background
In this article, I use the System.Net.WebClient class to scrape data from a remote Internet resource. This class provides common methods for sending data to and receiving data from a resource identified by a URI; internally, it uses the WebRequest class to access Internet resources. Here I use the DownloadData method of the WebClient class, which downloads data from a resource and returns it as an array of bytes. The received bytes are then decoded into a string using the system's current ANSI code page (Encoding.Default). To parse the received data, I use the Regex class, which also contains static methods that allow the use of other regular expression classes without explicitly instantiating them. Finally, the information we need is extracted with the help of the Match class, which represents the results of a regular expression matching operation.
The project
To show data scraping in ASP.NET, I use WHOIS lookup (information about a particular domain) and domain availability as the example, querying the WHOIS server of www.directnic.com. You can use any other WHOIS server you prefer; lists of such servers are easy to find online.
Let us start with our example. When the user enters the domain name to look up, that name is appended to the URL as a query string parameter.
' URL-encode the user input so it cannot alter the query string.
Dim strURL As String = _
    "http://www.directnic.com/whois/index.php?query=" _
    & Server.UrlEncode(txtDomain.Text)
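Because the domain name comes straight from user input, it is worth validating and encoding it before it goes into the query string. The following is a minimal sketch written as a stand-alone console module, not part of the article's project: the names UrlSketch, LooksLikeDomain and BuildWhoisUrl are hypothetical, the validation pattern is a deliberately loose assumption, and Uri.EscapeDataString assumes .NET Framework 2.0 or later.

```vb
Imports System
Imports System.Text.RegularExpressions

Module UrlSketch
    Private Const BaseUrl As String = "http://www.directnic.com/whois/index.php?query="

    ' Hypothetical check: dot-separated labels made of letters, digits and hyphens.
    Public Function LooksLikeDomain(ByVal name As String) As Boolean
        Return Regex.IsMatch(name, "^[a-z0-9-]+(\.[a-z0-9-]+)+$", RegexOptions.IgnoreCase)
    End Function

    ' Percent-encodes the input so characters such as "&" or "=" stay inside the query value.
    Public Function BuildWhoisUrl(ByVal domain As String) As String
        Return BaseUrl & Uri.EscapeDataString(domain.Trim())
    End Function

    Sub Main()
        Console.WriteLine(LooksLikeDomain("example.com"))
        Console.WriteLine(LooksLikeDomain("bad domain"))
        Console.WriteLine(BuildWhoisUrl("example.com"))
        Console.WriteLine(BuildWhoisUrl("a b&c=d"))
    End Sub
End Module
```

Rejecting input that fails LooksLikeDomain before making the request also saves a round trip to the remote server.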
To get the result for this URL, we need to capture the output returned by the directnic server. For this purpose, we use the DownloadData method of the WebClient class, which returns the data as a byte array.
Dim web As New WebClient()
Dim bufData As Byte()
bufData = web.DownloadData(strURL)
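A remote request can fail for reasons outside our control, so the download is usually wrapped in error handling. The sketch below is one possible shape for that, assuming a stand-alone console module and a VB version that supports the Using statement (VB 2005 or later); DownloadSketch and TryDownload are hypothetical names.

```vb
Imports System
Imports System.Net

Module DownloadSketch
    ' Hypothetical helper: returns the raw response bytes, or Nothing when the request fails.
    ' Only WebException is handled here; other exceptions (for example a malformed URL) propagate.
    Public Function TryDownload(ByVal url As String) As Byte()
        Try
            ' Using disposes the WebClient even if DownloadData throws.
            Using web As New WebClient()
                Return web.DownloadData(url)
            End Using
        Catch ex As WebException
            Console.Error.WriteLine("Download failed: " & ex.Message)
            Return Nothing
        End Try
    End Function

    Sub Main(ByVal args As String())
        If args.Length = 0 Then
            Console.WriteLine("Usage: DownloadSketch <url>")
            Return
        End If

        Dim data As Byte() = TryDownload(args(0))
        If data Is Nothing Then
            Console.WriteLine("No data received.")
        Else
            Console.WriteLine(data.Length.ToString() & " bytes received.")
        End If
    End Sub
End Module
```

Returning Nothing on failure keeps the calling code simple: it only needs one check before moving on to decoding.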
Half of our work is done: we have the data. Now we need to convert these bytes into a string so that we can parse them easily.
firstLevelbufData = Encoding.Default.GetString(bufData)
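Encoding.Default is the system's ANSI code page on the .NET Framework, so it only decodes correctly when the remote page happens to use that same encoding. The following sketch illustrates what goes wrong when the encodings differ; it is an illustration under assumptions, not the article's code, and the names EncodingSketch and Decode are hypothetical.

```vb
Imports System
Imports System.Text

Module EncodingSketch
    ' Decodes raw bytes with the named encoding (for example "utf-8" or "iso-8859-1").
    Public Function Decode(ByVal data As Byte(), ByVal encodingName As String) As String
        Return Encoding.GetEncoding(encodingName).GetString(data)
    End Function

    Sub Main()
        ' A name containing u-umlaut (U+00FC), which takes two bytes in UTF-8.
        Dim original As String = "M" & Char.ConvertFromUtf32(&HFC) & "ller"
        Dim bufData As Byte() = Encoding.UTF8.GetBytes(original)

        Dim asUtf8 As String = Decode(bufData, "utf-8")
        Dim asLatin1 As String = Decode(bufData, "iso-8859-1")

        ' The matching encoding restores the text; the wrong one yields one character per byte.
        Console.WriteLine(asUtf8 = original)
        Console.WriteLine(asLatin1.Length)
    End Sub
End Module
```

If the remote server declares its character set, decoding with that encoding instead of Encoding.Default avoids garbled non-ASCII characters in the result.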
Finally, we have the data as text, stored in the firstLevelbufData string variable. This variable contains a lot of unnecessary information, and we need to extract only the required part. So, I extract the information found between two marker strings, held in two variables, first and last. You can change the values of first and last to suit your needs, but to do so you must know the format of the data you are receiving.
Dim first, last As String
first = "<p class=" + Chr(34) + "text12" + Chr(34) + ">"
last = "</p>"
A simple way to extract information from HTML-formatted data with a known layout is regular expression parsing. Here, I create a regular expression from the first and last variables with the help of the Regex class, available in the System.Text.RegularExpressions namespace.
Dim RE As New Regex(first + _
"(?<MYDATA>.*?(?=" + last + "))", _
RegexOptions.IgnoreCase Or RegexOptions.Singleline)
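To see how this pattern behaves, it can be tried against a small hand-written HTML string. The sketch below is a stand-alone console module under assumptions: the sample HTML is invented, RegexSketch and ExtractBetween are hypothetical names, and Regex.Escape is added so that any regex metacharacters in the markers are treated literally.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexSketch
    ' Hypothetical helper: returns the text between two literal markers,
    ' or an empty string when no match is found.
    Public Function ExtractBetween(ByVal html As String, ByVal first As String, ByVal last As String) As String
        ' .*? is lazy, so the capture stops at the first occurrence of the closing marker.
        ' Singleline lets "." match line breaks; IgnoreCase tolerates <P> versus <p>.
        Dim re As New Regex(Regex.Escape(first) & "(?<MYDATA>.*?(?=" & Regex.Escape(last) & "))", _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Return re.Match(html).Groups("MYDATA").Value
    End Function

    Sub Main()
        Dim html As String = "<P class=""text12"">Domain: example.com" & Environment.NewLine & _
            "Status: taken</P><p>footer</p>"
        ' Prints the two lines between the first pair of markers, without the footer.
        Console.WriteLine(ExtractBetween(html, "<p class=""text12"">", "</p>"))
    End Sub
End Module
```

The lookahead (?=...) checks for the closing marker without consuming it, so the captured group contains only the text between the markers.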
The main remaining task is to extract the data that matches the above regular expression. For this purpose, we use the Match class, which represents the results of a regular expression matching operation.
Dim m As Match = RE.Match(firstLevelbufData)
txtResult.Text = m.Groups("MYDATA").Value + ""
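When the pattern does not match, the named group is simply empty, so checking Match.Success makes the "nothing found" case explicit. The sketch below also HTML-encodes the extracted text before it would be displayed; that is an optional design choice assumed here, not something the article does, and it would turn any markup in the WHOIS output into visible text. MatchSketch and ResultOrMessage are hypothetical names, and WebUtility.HtmlEncode assumes .NET Framework 4.0 or later.

```vb
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module MatchSketch
    Private Const NotAvailable As String = "Information about this domain is not available."

    ' Hypothetical helper: returns the encoded MYDATA group, or a fallback message.
    Public Function ResultOrMessage(ByVal m As Match) As String
        If m.Success Then
            Dim data As String = m.Groups("MYDATA").Value.Trim()
            If data.Length > 0 Then
                ' Encode so that text from the remote server is shown literally.
                Return WebUtility.HtmlEncode(data)
            End If
        End If
        Return NotAvailable
    End Function

    Sub Main()
        Dim pattern As String = "<p>(?<MYDATA>.*?)</p>"
        Console.WriteLine(ResultOrMessage(Regex.Match("<p>Owner: <b>Example</b></p>", pattern)))
        Console.WriteLine(ResultOrMessage(Regex.Match("no paragraph here", pattern)))
    End Sub
End Module
```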
Listing 1: HTML CODE (Whois.aspx)
<HTML>
<HEAD>
</HEAD>
<body MS_POSITIONING="GridLayout">
<form id="Form1" method="post" runat="server">
<TABLE id="Table1" cellSpacing="0" cellPadding="0" width="358" border="0">
<TR>
<TD>
<asp:Label id="Label2" Runat="server">www.</asp:Label>
<asp:TextBox id="txtDomain" Runat="server">
Check Domain</asp:TextBox></TD>
<TD>
<asp:Button id="btnQuery" Text="Check Domain" Runat="server">
</asp:Button></TD>
</TR>
<TR>
<TD>
<asp:Label id="txtResult" Runat="server"></asp:Label></TD>
<TD></TD>
</TR>
</TABLE>
</form>
</body>
</HTML>
Listing 2: SERVER SIDE CODE (WhoIs.aspx.vb)
Imports System.Net
Imports System.Text
Imports System.Text.RegularExpressions

Public Class WhoIs
    Inherits System.Web.UI.Page

    Protected WithEvents btnQuery As System.Web.UI.WebControls.Button
    Protected WithEvents txtResult As System.Web.UI.WebControls.Label
    Protected WithEvents Label2 As System.Web.UI.WebControls.Label
    Protected WithEvents txtDomain As System.Web.UI.WebControls.TextBox

    Private Sub Page_Load(ByVal sender As System.Object, _
            ByVal e As System.EventArgs) Handles MyBase.Load
        ' Clear the placeholder text when the user clicks the text box.
        txtDomain.Attributes.Add("onclick", "this.value='';")
    End Sub

    Private Sub btnQuery_Click(ByVal sender As System.Object, _
            ByVal e As System.EventArgs) Handles btnQuery.Click
        Dim firstLevelbufData As String

        ' Step 1: download the result page and decode it into a string.
        Try
            ' URL-encode the user input so it cannot alter the query string.
            Dim strURL As String = _
                "http://www.directnic.com/whois/index.php?query=" _
                & Server.UrlEncode(txtDomain.Text)
            Dim web As New WebClient()
            Dim bufData As Byte()
            bufData = web.DownloadData(strURL)
            firstLevelbufData = Encoding.Default.GetString(bufData)
        Catch ex As System.Net.WebException
            txtResult.Text = ex.Message
            Exit Sub
        End Try

        ' Step 2: extract the text between the two marker strings.
        Try
            Dim first, last As String
            first = "<p class= " + Chr(34) + "text12" + Chr(34) + ">"
            last = "</p>"
            Dim RE As New Regex(first + _
                "(?<MYDATA>.*?(?=" + last + "))", _
                RegexOptions.IgnoreCase Or RegexOptions.Singleline)
            Dim m As Match = RE.Match(firstLevelbufData)
            txtResult.Text = m.Groups("MYDATA").Value + ""
            ' A very short result means the markers were not found or enclosed nothing useful.
            If txtResult.Text.Length < 10 Then _
                txtResult.Text = _
                "Information about this domain is not available !!"
        Catch e3 As Exception
            txtResult.Text = "Sorry the whois information" & _
                " is currently not available !!"
        End Try
    End Sub
End Class
Details
The demo project contains:
- WhoIs.aspx: the Web Form page (markup).
- WhoIs.aspx.vb: the code-behind page containing the WhoIs class.