
Reading Meta Tags of Any Page Programmatically Without Loading It in a Browser

ASP.NET Community

13 Sep 2009

This article was originally published at wiki.asp.net and has since been given a new home on CodeProject. Editing rights for this article have been set at Bronze or above, so please go in and edit and update it to keep it fresh and relevant.

In this article I will show you how to read meta tags programmatically using C# and ASP.NET. What sets this article apart from others available on the internet is that those samples deal with reading and writing the meta tags of the page itself, whereas here we dynamically download the contents of another page and read the meta tags from it.

First things first: we need to download the content of the page without loading it in a browser. For this we will use the WebRequest class. The code below creates a request to "http://www.microsoft.com/en/us/default.aspx" using the default credentials.

// Create a request for the URL.
WebRequest request = WebRequest.Create("http://www.microsoft.com/en/us/default.aspx");
// If required by the server, set the credentials.
request.Credentials = CredentialCache.DefaultCredentials;

Now we are ready to get the response from the server. GetResponse returns a WebResponse, which we cast to HttpWebResponse because this is an HTTP request:

// Get the response.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();

Once we have the response, we want to load it into an HTML DOM so that we can walk its elements instead of searching raw text. In the next few lines we read the response into a string and write that string into an IHTMLDocument2 instance.

// Get the stream containing content returned by the server.
Stream dataStream = response.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader reader = new StreamReader(dataStream);
// Read the content.
string responseFromServer = reader.ReadToEnd();

// Read the HTML into an HTML document to enable parsing.
IHTMLDocument2 doc = new HTMLDocumentClass();
doc.write(new object[] { responseFromServer });
doc.close();
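The download steps so far (request, response, reading the stream) can be gathered into one helper. The following is only a sketch: the class name `PageDownloader` and the method `DownloadHtml` are hypothetical names introduced here, and decoding the body as UTF-8 is an assumption, since the article does not say which encoding the page uses. It holds the response as the base `WebResponse` type and wraps each disposable object in `using`, so everything is closed even if reading fails.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PageDownloader
{
    // Hypothetical helper (not part of the original article):
    // downloads the resource at 'url' and returns its body as text.
    public static string DownloadHtml(string url)
    {
        WebRequest request = WebRequest.Create(url);
        // Only needed if the server requires credentials.
        request.Credentials = CredentialCache.DefaultCredentials;

        // 'using' closes the response, stream, and reader even if reading throws.
        using (WebResponse response = request.GetResponse())
        using (Stream dataStream = response.GetResponseStream())
        // Assumption: the body is UTF-8 unless a byte order mark says otherwise.
        using (StreamReader reader = new StreamReader(dataStream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With such a helper, the string passed to `doc.write` would come from `PageDownloader.DownloadHtml("http://www.microsoft.com/en/us/default.aspx")`.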

Now that the entire page is loaded in memory as an HTML document, we iterate over its elements and retrieve the meta tags.

// Loop through every element in the document and pick out the META tags.
foreach (IHTMLElement el in (IHTMLElementCollection)doc.all)
{
    // Only META elements carry the content we are after.
    if (el.tagName == "META")
    {
        HTMLMetaElement meta = (HTMLMetaElement)el;
        Response.Write("Content " + meta.content + "<br/>");
    }
}
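Because the meta content comes from a remote page, it may be worth HTML-encoding it before writing it into our own response. The sketch below shows one way to do that; `MetaOutput` and `FormatMetaLine` are hypothetical names introduced here, and `WebUtility.HtmlEncode` assumes .NET 4.0 or later (inside a page, `Server.HtmlEncode` plays the same role).

```csharp
using System;
using System.Net;

public static class MetaOutput
{
    // Hypothetical helper (not part of the original article):
    // builds one output line for a meta tag, encoding the remote text
    // so that markup inside it is displayed rather than interpreted.
    public static string FormatMetaLine(string name, string content)
    {
        string safeName = WebUtility.HtmlEncode(name ?? string.Empty);
        string safeContent = WebUtility.HtmlEncode(content ?? string.Empty);
        return safeName + ": " + safeContent + "<br/>";
    }
}
```

Inside the loop this could be used as `Response.Write(MetaOutput.FormatMetaLine(meta.name, meta.content));`, assuming the interop type exposes `name` alongside `content`.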

Of course, you can do a lot more with the above code, but we will take that up in other articles. For your reference, the complete code is pasted below. For the sample to work, add a reference to mshtml as follows.

Steps:-

1.) In the Solution Explorer, highlight the project to which you want to add the parsing functionality
2.) In the menu, click on Project -> Add Reference
3.) In the dialog box that is shown, under the .NET tab, choose the Microsoft.mshtml assembly
4.) Click the Select button, then click the OK button

Now we can reference this assembly.

Don't forget to add the namespaces:

using System.IO;
using System.Net;
using mshtml;

Response.Write("Button2_Click");

// Create a request for the URL.
WebRequest request = WebRequest.Create("http://www.microsoft.com/en/us/default.aspx");
// If required by the server, set the credentials.
request.Credentials = CredentialCache.DefaultCredentials;
// Get the response.
HttpWebResponse response = (HttpWebResponse)request.GetResponse();
// Display the status.
Console.WriteLine(response.StatusDescription);
// Get the stream containing content returned by the server.
Stream dataStream = response.GetResponseStream();
// Open the stream using a StreamReader for easy access.
StreamReader reader = new StreamReader(dataStream);
// Read the content.
string responseFromServer = reader.ReadToEnd();
// Display the content.
Console.WriteLine(responseFromServer);
// Clean up the streams and the response.
reader.Close();
dataStream.Close();
response.Close();

// Read the HTML into an HTML document to enable parsing.
IHTMLDocument2 doc = new HTMLDocumentClass();
doc.write(new object[] { responseFromServer });
doc.close();

// Loop through every element in the document and pick out the META tags.
foreach (IHTMLElement el in (IHTMLElementCollection)doc.all)
{
    // Only META elements carry the content we are after.
    if (el.tagName == "META")
    {
        HTMLMetaElement meta = (HTMLMetaElement)el;
        Response.Write("Content " + meta.content + "<br/>");
    }
}
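Where adding the mshtml reference is not an option, a regular-expression scan over the downloaded string is a rough substitute. This is a different technique from the DOM approach used in this article, and only a sketch: `MetaTagScanner` and `Extract` are hypothetical names, and the patterns assume reasonably well-formed `<meta>` tags (an attribute value containing `>` would defeat them).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MetaTagScanner
{
    // Matches a whole <meta ...> tag, case-insensitively.
    private static readonly Regex MetaTag =
        new Regex(@"<meta\b[^>]*>", RegexOptions.IgnoreCase);

    // Matches attr="value", attr='value', or attr=value inside a tag.
    private static readonly Regex Attribute = new Regex(
        @"([\w:-]+)\s*=\s*(?:""([^""]*)""|'([^']*)'|([^\s""'>]+))");

    // Returns (name, content) pairs; tags keyed by http-equiv or property
    // are included, tags with none of these attributes are skipped.
    public static List<KeyValuePair<string, string>> Extract(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match tag in MetaTag.Matches(html))
        {
            // Attribute names are compared case-insensitively (NAME == name).
            var attrs = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (Match a in Attribute.Matches(tag.Value))
            {
                string value = a.Groups[2].Success ? a.Groups[2].Value
                             : a.Groups[3].Success ? a.Groups[3].Value
                             : a.Groups[4].Value;
                attrs[a.Groups[1].Value] = value;
            }

            string key;
            if (!attrs.TryGetValue("name", out key)
                && !attrs.TryGetValue("http-equiv", out key)
                && !attrs.TryGetValue("property", out key))
            {
                continue; // e.g. <meta charset=...> has no name to report
            }

            string content;
            attrs.TryGetValue("content", out content);
            result.Add(new KeyValuePair<string, string>(key, content ?? string.Empty));
        }
        return result;
    }
}
```

Calling `MetaTagScanner.Extract(responseFromServer)` would then yield the pairs to write out, without creating an `IHTMLDocument2` at all. The DOM approach remains the more robust choice for messy markup.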

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
