Introduction
Tired of searching for a free currency exchange and stock information Web Service? Here is a solution. I came up with this idea after failing to find a good, free Web Service that provides currency exchange rates and other stock information.
Information retrieval lets us extract data embedded in HTML pages and put it into a standard format that is easier to work with.
In this example, I present a class called DataGrabber, which I use to retrieve selected information from MSN Money's page. Let's have a look at this page first and determine the areas of interest inside it.
I have marked some contents inside this web page: rate, direction arrow, change, change ratio, and currency converter. These are the services that DataGrabber will provide by parsing the page's HTML code. Additionally, we should be able to get the page's title to accurately specify the quote's name and determine if a symbol was not found.
Getting the Page HTML
First, we have to get the HTML of the desired page. MSN's page URL looks like this: http://moneycentral.msn.com/detail/stock_quote?Symbol=SYMBOL&FormatAs=Index.
All we need is to specify the symbol for which we want the information and send the request to MSN using the WebClient object.
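' At the top of the file:
Imports System.IO
Imports System.Net

' Inside the DataGrabber class (Symbol holds the quote symbol to look up):
Private Const URL As String = "http://moneycentral.msn.com/detail/stock_quote?" _
    & "Symbol={0}&FormatAs=Index"
Private _page As String ' cached HTML of the page

' Downloads the page for the current symbol and returns its HTML.
Private Function GetPageCode() As String
    ' Using blocks close and dispose each object, even if an exception is thrown.
    Using c As New WebClient()
        Using data As Stream = c.OpenRead(String.Format(URL, Symbol))
            Using reader As New StreamReader(data)
                Return reader.ReadToEnd()
            End Using
        End Using
    End Using
End Function

' Returns the cached HTML, downloading it on first access.
Private ReadOnly Property Page() As String
    Get
        If _page Is Nothing Then _page = GetPageCode()
        Return _page
    End Get
End Property

' Discards the cached HTML so that the next access downloads a fresh copy.
Public Sub RefreshData()
    _page = Nothing
End Sub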
Now we have the HTML code we need. Let's start parsing.
Simple HTML Parsing using String Functions
Before going into the HTML parsing functions, let's review some basic functions of the String class. These functions are very useful in our case.
IndexOf(s As String): Returns the zero-based index of the first occurrence of s, or -1 if s was not found.
IndexOf(s As String, startIndex As Integer): Like the previous one, but starts searching from startIndex instead of the beginning of the string.
LastIndexOf(s As String): Returns the index of the last occurrence of s, or -1 if s was not found.
Substring(startIndex As Integer, length As Integer): Returns the substring that starts at startIndex and is length characters long.
ToLower(): Returns a copy of the string in lower case.
Trim(): Removes white space from the beginning and the end of the string.
Trim(ParamArray trimChars() As Char): Removes all occurrences of the specified characters from the beginning and the end of the string.
StartsWith(s As String): Returns True if the string begins with s.
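As a quick illustration of how these functions combine, here is a minimal sketch that pulls the text out of the first span in a made-up HTML fragment. The module name StringFunctionsDemo, the function FirstSpanText, and the sample markup are assumptions for illustration only; they are not part of DataGrabber or the MSN page.

```vbnet
Imports System

Public Module StringFunctionsDemo
    ' Hypothetical sample markup, used only to exercise the String functions.
    Public Const Sample As String = "<SPAN class=""s1"">  12.50% </SPAN><span>x</span>"

    ' Returns the text inside the first <span ...> element, without
    ' surrounding white space or percent signs.
    Public Function FirstSpanText(ByVal html As String) As String
        Dim lower As String = html.ToLower()                   ' case-insensitive search
        Dim openIdx As Integer = lower.IndexOf("<span")        ' first occurrence, or -1
        If openIdx = -1 Then Return String.Empty
        Dim startIdx As Integer = lower.IndexOf(">", openIdx) + 1 ' search from openIdx
        Dim endIdx As Integer = lower.IndexOf("</span>", startIdx)
        If endIdx = -1 Then Return String.Empty
        ' Cut from the original string so the original casing is preserved.
        Return html.Substring(startIdx, endIdx - startIdx).Trim().Trim("%"c)
    End Function

    Sub Main()
        Console.WriteLine(FirstSpanText(Sample))                ' text of the first span
        Console.WriteLine(Sample.ToLower().LastIndexOf("<span")) ' index of the last span
        Console.WriteLine(Sample.StartsWith("<SPAN"))
    End Sub
End Module
```

The search runs on a lower-case copy while the substring is taken from the original string, which is the same trick the properties in the following sections rely on.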
Getting the Page Title
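This read-only property parses the HTML code looking for the <title></title> tag to get the page's title, which includes the details about the current quote.
Public ReadOnly Property Title() As String
    Get
        ' Search a lower-case copy so that <TITLE> and <title> both match.
        Dim lower As String = Page.ToLower()
        Dim i1 As Integer = lower.IndexOf("<title>")
        Dim i2 As Integer = lower.IndexOf("</title>")
        ' Return an empty string if the page has no title element.
        If i1 = -1 OrElse i2 < i1 Then Return String.Empty
        i1 += 7 ' skip past "<title>"
        Return Page.Substring(i1, i2 - i1)
    End Get
End Property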
Getting the Rate
The rate value is placed inside a span with a CSS class called s1. So we search for this span and read the value stored in it. We use the same method we used to get the title.
Private Const S1 As String = "<span class=""s1"">"
Public ReadOnly Property Rate() As Double
    Get
        Dim lower As String = Page.ToLower()
        Dim i1 As Integer = lower.IndexOf(S1) + S1.Length
        Dim i2 As Integer = lower.IndexOf("</span>", i1)
        ' Parse with the invariant culture so that "." is always the decimal separator.
        Return Double.Parse(Page.Substring(i1, i2 - i1), _
            System.Globalization.CultureInfo.InvariantCulture)
    End Get
End Property
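The value read from the span is text, and turning it into a Double depends on how decimal and thousands separators are interpreted. The following sketch shows one way to parse such text independently of the machine's regional settings and without throwing on unexpected input. TryParseNumber and the module name are hypothetical helpers, not part of DataGrabber.

```vbnet
Imports System
Imports System.Globalization

Public Module NumberParsingDemo
    ' Hypothetical helper: parses text such as "1,234.56" or " 0.87 " using "."
    ' as the decimal separator regardless of regional settings.
    ' Returns False instead of throwing when the text is not a number.
    Public Function TryParseNumber(ByVal raw As String, ByRef value As Double) As Boolean
        Return Double.TryParse(raw, NumberStyles.Float Or NumberStyles.AllowThousands, _
                               CultureInfo.InvariantCulture, value)
    End Function

    Sub Main()
        Dim rate As Double
        If TryParseNumber(" 1,234.56 ", rate) Then
            Console.WriteLine(rate.ToString(CultureInfo.InvariantCulture))
        End If
        Console.WriteLine(TryParseNumber("n/a", rate)) ' not a number
    End Sub
End Module
```

Returning a Boolean lets the caller decide what to do when the page layout changes and the extracted text is no longer numeric.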
Getting the Change and Change Ratio
Change could be UP, DOWN, or UNCH (unchanged). We have to know which one applies before trying to read the value, and we can tell by looking for images/up.gif or images/down.gif in the page. If neither exists, we return 0 (unchanged).
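' UP, DOWN, S4, and S5 are string constants defined elsewhere in DataGrabber (not shown here).
Public ReadOnly Property Change() As Double
    Get
        Dim lower As String = Page.ToLower()
        Dim s As String
        ' Pick the span marker that matches the direction arrow on the page.
        If lower.IndexOf(UP) <> -1 Then
            s = S4
        ElseIf lower.IndexOf(DOWN) <> -1 Then
            s = S5
        Else
            Return 0
        End If
        Dim i1, i2 As Integer
        ' The change value is in the first matching span after the rate span.
        i1 = lower.IndexOf(s, lower.IndexOf(S1)) + s.Length
        i2 = lower.IndexOf("</span>", i1)
        Return Double.Parse(Page.Substring(i1, i2 - i1), _
            System.Globalization.CultureInfo.InvariantCulture)
    End Get
End Property
Public ReadOnly Property ChangeRatio() As Double
    Get
        Dim lower As String = Page.ToLower()
        Dim s As String
        If lower.IndexOf(UP) <> -1 Then
            s = S4
        ElseIf lower.IndexOf(DOWN) <> -1 Then
            s = S5
        Else
            Return 0
        End If
        Dim i1, i2 As Integer
        ' The ratio is in the last matching span and ends with a percent sign.
        i1 = lower.LastIndexOf(s) + s.Length
        i2 = lower.IndexOf("</span>", i1)
        Return Double.Parse(Page.Substring(i1, i2 - i1).Trim("%"c), _
            System.Globalization.CultureInfo.InvariantCulture)
    End Get
End Property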
Getting the List of Currencies with Exchange Rates
The last part of the code I will explain performs the exchange rate calculation between currencies based on their exchange rates against USD. This information is stored in JavaScript code rather than HTML, which makes parsing it much easier.
First, let's look at the format in which the currency name, symbol, and value (against USD) are stored. Here is part of the long list you will find in the MSN Money page's HTML code:
curUSD2X['AED'] = new currency(0.272279262542725, 'Emirati Dirham');
curUSD2X['ARS'] = new currency(0.330906689167023, 'Argentine Peso');
curUSD2X['AUD'] = new currency(0.870776772499084, 'Australian Dollar');
curUSD2X['BHD'] = new currency(2.65561938285828, 'Bahraini Dinar');
Each line can be split at fixed delimiters. The following constant strings can be used with the IndexOf() and Substring() functions, as we have seen before, to extract the values we need from each line.
Private Const CUR0 As String = "curUSD2X['"
Private Const CUR1 As String = "'] = new currency("
Private Const CUR2 As String = ", '"
Private Const CUR3 As String = "')"
Note that this list appears in a page's HTML only when the symbol is an exchange rate (e.g., /USDEUR, /ILSUSD), not for other symbols (such as $INDU, MSFT, -CL).
To make currency information easier to work with, we define the Currency class as follows.
Public Class Currency
    Private _name As String
    Private _symbol As String
    Private _amount As Double ' value of one unit of this currency in USD

    Public Sub New(ByVal symbol As String, _
                   ByVal amount As Double, ByVal name As String)
        Me.Symbol = symbol
        Me.Amount = amount
        Me.Name = name
    End Sub

    Public Property Name() As String
        Get
            Return _name
        End Get
        Set(ByVal value As String)
            _name = value
        End Set
    End Property

    Public Property Symbol() As String
        Get
            Return _symbol
        End Get
        Set(ByVal value As String)
            _symbol = value
        End Set
    End Property

    Public Property Amount() As Double
        Get
            Return _amount
        End Get
        Set(ByVal value As Double)
            _amount = value
        End Set
    End Property

    ' Converts amount from fromCur to toCur, using USD as the common base.
    Public Shared Function Convert(ByVal fromCur As Currency, _
            ByVal toCur As Currency, ByVal amount As Double) As Double
        Dim result As Double
        result = fromCur.Amount * (1 / toCur.Amount)
        Return result * amount
    End Function
End Class

' Orders currencies alphabetically by name.
Public Class CurrencyNameComparer
    Implements IComparer(Of Currency)
    Public Function Compare(ByVal x As Currency, ByVal y As Currency) _
            As Integer Implements System.Collections.Generic.IComparer(Of Currency).Compare
        Return String.Compare(x.Name, y.Name)
    End Function
End Class
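To make the arithmetic in Convert concrete: each stored amount is the value of one unit of a currency in USD, so a conversion goes from the source currency to USD and then to the target currency. The sketch below restates the same calculation on plain numbers and applies it to two rates from the sample list above. ConvertViaUsd and the module name are hypothetical and used only for illustration.

```vbnet
Imports System

Public Module ConversionDemo
    ' Algebraically the same calculation as Currency.Convert:
    ' amount (in the source currency) -> USD -> target currency.
    Public Function ConvertViaUsd(ByVal fromUsdValue As Double, _
                                  ByVal toUsdValue As Double, _
                                  ByVal amount As Double) As Double
        Return amount * fromUsdValue / toUsdValue
    End Function

    Sub Main()
        Dim aed As Double = 0.272279262542725 ' USD value of 1 AED, from the sample list
        Dim aud As Double = 0.870776772499084 ' USD value of 1 AUD, from the sample list
        ' 100 AED is roughly 31.27 AUD at these rates.
        Console.WriteLine(ConvertViaUsd(aed, aud, 100).ToString("F2"))
    End Sub
End Module
```

Because every rate is expressed against USD, any pair of currencies in the list can be converted without a separate table of cross rates.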
Back in DataGrabber, we define the following property, which helps us determine the type of the symbol.
Public ReadOnly Property IsCurrency() As Boolean
    Get
        ' Exchange rate symbols start with a slash, e.g. /USDEUR.
        Return Symbol.StartsWith("/")
    End Get
End Property
Finally, this is the function which will iterate over all lines of currency rates inside a page's HTML and return a filled list of currencies.
Public Function GetCurrencyList() As List(Of Currency)
    If Not IsCurrency Then
        Throw New ArgumentException("This works only for currency exchange symbols")
    End If
    Dim l As New List(Of Currency)
    ' Locate the block of curUSD2X[...] assignments; it ends at the first "}" after it.
    Dim startIndex As Integer = Page.IndexOf(CUR0)
    If startIndex = -1 Then Return l ' no currency data in this page
    Dim endIndex As Integer = Page.IndexOf("}", startIndex)
    If endIndex = -1 Then endIndex = Page.Length
    Dim lines() As String = Page.Substring(startIndex, endIndex - startIndex).Split(";"c)
    For Each line As String In lines
        Dim str As String = line.Trim()
        ' Stop at the first statement that is not a currency assignment.
        If Not str.StartsWith(CUR0) Then Exit For
        Dim i1, i2 As Integer
        ' Symbol: between CUR0 and CUR1.
        i1 = str.IndexOf(CUR0) + CUR0.Length
        i2 = str.IndexOf(CUR1)
        Dim sym As String = str.Substring(i1, i2 - i1)
        ' Amount: between CUR1 and CUR2, parsed with "." as the decimal separator.
        i1 = str.IndexOf(CUR1) + CUR1.Length
        i2 = str.IndexOf(CUR2)
        Dim amt As Double = Double.Parse(str.Substring(i1, i2 - i1), _
            System.Globalization.CultureInfo.InvariantCulture)
        ' Name: between CUR2 and CUR3.
        i1 = str.IndexOf(CUR2) + CUR2.Length
        i2 = str.IndexOf(CUR3)
        Dim nam As String = str.Substring(i1, i2 - i1)
        l.Add(New Currency(sym, amt, nam))
    Next
    l.Sort(New CurrencyNameComparer)
    Return l
End Function
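To try the splitting logic without downloading a page, here is a self-contained sketch that applies the same four delimiter constants to a hard-coded snippet taken from the sample list. CurrencyEntry, ParseLine, and ParseAll are hypothetical names; CurrencyEntry is a minimal stand-in for the Currency class, and unlike GetCurrencyList this version skips statements that do not match instead of stopping at the first one.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization

' Hypothetical stand-in for the article's Currency class, kept minimal for this sketch.
Public Class CurrencyEntry
    Public Symbol As String
    Public Amount As Double
    Public Name As String
End Class

Public Module CurrencyLineDemo
    Private Const CUR0 As String = "curUSD2X['"
    Private Const CUR1 As String = "'] = new currency("
    Private Const CUR2 As String = ", '"
    Private Const CUR3 As String = "')"

    ' Parses one statement such as: curUSD2X['AED'] = new currency(0.27, 'Emirati Dirham')
    ' Returns Nothing when the statement does not have the expected shape.
    Public Function ParseLine(ByVal line As String) As CurrencyEntry
        Dim str As String = line.Trim()
        If Not str.StartsWith(CUR0, StringComparison.Ordinal) Then Return Nothing
        Dim i1 As Integer = CUR0.Length
        Dim i2 As Integer = str.IndexOf(CUR1, i1, StringComparison.Ordinal)
        If i2 = -1 Then Return Nothing
        Dim sym As String = str.Substring(i1, i2 - i1)
        i1 = i2 + CUR1.Length
        i2 = str.IndexOf(CUR2, i1, StringComparison.Ordinal)
        If i2 = -1 Then Return Nothing
        Dim amt As Double
        If Not Double.TryParse(str.Substring(i1, i2 - i1), NumberStyles.Float, _
                               CultureInfo.InvariantCulture, amt) Then
            Return Nothing
        End If
        i1 = i2 + CUR2.Length
        i2 = str.LastIndexOf(CUR3, StringComparison.Ordinal)
        If i2 < i1 Then Return Nothing
        Dim entry As New CurrencyEntry()
        entry.Symbol = sym
        entry.Amount = amt
        entry.Name = str.Substring(i1, i2 - i1)
        Return entry
    End Function

    ' Splits the script on ";" and keeps every statement that parses.
    Public Function ParseAll(ByVal script As String) As List(Of CurrencyEntry)
        Dim result As New List(Of CurrencyEntry)
        For Each part As String In script.Split(";"c)
            Dim entry As CurrencyEntry = ParseLine(part)
            If entry IsNot Nothing Then result.Add(entry)
        Next
        Return result
    End Function

    Sub Main()
        Dim script As String = _
            "curUSD2X['AED'] = new currency(0.272279262542725, 'Emirati Dirham');" & _
            "curUSD2X['BHD'] = new currency(2.65561938285828, 'Bahraini Dinar');"
        For Each c As CurrencyEntry In ParseAll(script)
            Console.WriteLine(c.Symbol & " - " & c.Name)
        Next
    End Sub
End Module
```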
Wait! What if Microsoft Changes the HTML Code?
It is obvious from the beginning that the whole solution is highly vulnerable to breaking if Microsoft changes the HTML code of its MSN Money page. To reduce this risk, I tried to bind the parsing to CSS class names, which are more likely to remain constant: Microsoft may change the style rules of a class without renaming it, even if the whole design of the page is altered. In any case, this is a general solution that conveys the idea; you may try to write a more sophisticated one that uses regular expressions and parsing rules stored in an external file, so you can keep up with future changes without rewriting or recompiling your code.
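As one possible starting point for the regular expression approach, the sketch below matches the currency lines with a single pattern. The pattern is kept in a constant here; reading it from an external file would be the next step. The module name, the function ExtractRates, and the pattern itself are assumptions based on the sample lines shown earlier, not part of DataGrabber.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text.RegularExpressions

Public Module RegexCurrencyDemo
    ' Assumed pattern for lines like:
    '   curUSD2X['AED'] = new currency(0.27, 'Emirati Dirham');
    ' In a real setup this string could be loaded from a configuration file.
    Public Const CurrencyPattern As String = _
        "curUSD2X\['(?<sym>[^']+)'\]\s*=\s*new\s+currency\(\s*(?<amt>[0-9]+(?:\.[0-9]+)?)\s*,\s*'(?<name>[^']*)'\s*\)"

    ' Returns a map from currency symbol to its value in USD.
    Public Function ExtractRates(ByVal script As String) As Dictionary(Of String, Double)
        Dim rates As New Dictionary(Of String, Double)
        For Each m As Match In Regex.Matches(script, CurrencyPattern)
            rates(m.Groups("sym").Value) = _
                Double.Parse(m.Groups("amt").Value, CultureInfo.InvariantCulture)
        Next
        Return rates
    End Function

    Sub Main()
        Dim script As String = _
            "curUSD2X['AUD'] = new currency(0.870776772499084, 'Australian Dollar');"
        For Each pair As KeyValuePair(Of String, Double) In ExtractRates(script)
            Console.WriteLine(pair.Key & " = " & pair.Value.ToString(CultureInfo.InvariantCulture))
        Next
    End Sub
End Module
```

With the pattern isolated in one string, a layout change on the page would mean editing that string rather than the parsing code.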
For more articles, please visit my blog at http://vbnet4arab.blogspot.com (in Arabic only).
Happy coding!