Search for a particular tag in an HTML page

How can I search for a particular tag in an HTML file? For example, if my HTML page has an h1 tag somewhere in the middle, how can I check programmatically that the page contains that tag?

Posted 30-May-10 23:51pm

vinodkalanji87

Updated 31-May-10 0:25am

4 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

charles henington · Accepted Answer · 2010-06-01T03:50:00

Here is the code I use to find all image tags within a web page. Hope this helps.

C#

// Needs: using System; using System.Collections.Generic; using System.IO;
//        using System.Net; using System.Text;
public List<string> FetchImages(string Url)
{
    List<string> imageList = new List<string>();

    // Append http:// if necessary
    if (!Url.StartsWith("http://") && !Url.StartsWith("https://"))
        Url = "http://" + Url;

    string responseUrl;
    // Decode as UTF-8; ASCII would mangle any non-ASCII characters in the page
    string htmlData = Encoding.UTF8.GetString(DownloadData(Url, out responseUrl));

    // After a redirect, resolve relative image paths against the final URL
    if (!string.IsNullOrEmpty(responseUrl))
        Url = responseUrl;

    const string imageHtmlCode = "<img";
    const string imageSrcCode = @"src=""";

    int index = htmlData.IndexOf(imageHtmlCode, StringComparison.OrdinalIgnoreCase);
    while (index != -1)
    {
        // Make sure src="..." is only searched for inside the current img tag
        int bracketEnd = htmlData.IndexOf('>', index);
        if (bracketEnd == -1)
            break;

        int srcPos = htmlData.IndexOf(imageSrcCode, index, bracketEnd - index,
                                      StringComparison.OrdinalIgnoreCase);
        if (srcPos != -1)
        {
            // Extract the text between the two quotes that mark the image's location
            int start = srcPos + imageSrcCode.Length;
            int end = htmlData.IndexOf('"', start);
            if (end > start)
                imageList.Add(htmlData.Substring(start, end - start));
        }

        // Move index to the next image tag
        index = htmlData.IndexOf(imageHtmlCode, bracketEnd, StringComparison.OrdinalIgnoreCase);
    }

    // Turn relative paths such as "logo.png" or "/img/a.png" into absolute URLs
    Uri baseUri = new Uri(Url);
    for (int i = 0; i < imageList.Count; i++)
    {
        Uri absolute;
        if (Uri.TryCreate(baseUri, imageList[i], out absolute))
            imageList[i] = absolute.ToString();
    }

    return imageList;
}

// DownloadData originated from the code at
// http://www.vcskicks.com/download_file_http.html
// The two-argument overload is a minimal sketch (the answer did not post it):
// it downloads the page and reports the final URL after any redirects.
private byte[] DownloadData(string Url, out string responseUrl)
{
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (Stream stream = response.GetResponseStream())
    using (MemoryStream buffer = new MemoryStream())
    {
        responseUrl = response.ResponseUri.ToString();
        stream.CopyTo(buffer);
        return buffer.ToArray();
    }
}

private byte[] DownloadData(string Url)
{
    string ignored;
    return DownloadData(Url, out ignored);
}
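To try the extraction step without downloading anything, here is a minimal sketch that works on an HTML string directly. It swaps the IndexOf scanning above for a regular expression, accepts single- or double-quoted src values and ignores the case of the tag name. ExtractImageSources and the sample markup are hypothetical, and like any regex approach it does not understand comments or script blocks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the src value of every <img> tag in an HTML string.
// Assumes src is quoted with " or '; unquoted values are not matched.
List<string> ExtractImageSources(string html)
{
    var sources = new List<string>();
    // <img, then attributes up to a whitespace-preceded src=, then a quoted value.
    // [^>]*? keeps the search inside the current tag; \1 requires the same closing quote.
    var pattern = new Regex(@"<img\b[^>]*?\ssrc\s*=\s*([""'])(?<url>.*?)\1",
                            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    foreach (Match m in pattern.Matches(html))
        sources.Add(m.Groups["url"].Value);
    return sources;
}

string page = "<p><IMG class=\"logo\" src=\"/logo.png\"></p><img src='img/a.png' alt=\"\">";
foreach (string src in ExtractImageSources(page))
    Console.WriteLine(src);
```

The returned paths are still relative; they can be resolved against the page URL the same way FetchImages does.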

Anıl Yıldız · Accepted Answer · 2010-05-31T11:23:00

IndexOf should work as suggested; as an alternative, you could use the string.Contains() method.

C#

string str = "<h1>some_unnecessary_string</h1>";

Example using the Contains method:

bool result = str.Contains("<h1>");

Or use a regular expression to search for a complete tag:

C#

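using System.Text.RegularExpressions;

// Singleline lets .*? span line breaks; IgnoreCase also matches <H1>...</H1>;
// \b[^>]* allows attributes such as class="title" but rejects tags like <h10>
bool check(string source)
{
    return Regex.IsMatch(source, @"<h1\b[^>]*>.*?</h1>", RegexOptions.Singleline | RegexOptions.IgnoreCase);
}

bool result = check(str);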
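Building on the regex approach, here is a minimal sketch that also returns the text of each heading, so you can see which h1 elements the page contains rather than just whether one exists. GetHeadings and the sample page are hypothetical; the pattern assumes headings are not nested and that no closing h1 tag appears inside comments or scripts.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the trimmed inner text of every <h1>...</h1> element.
// A regex sketch only: it does not understand comments, <script> blocks or nesting.
List<string> GetHeadings(string html)
{
    var headings = new List<string>();
    // \b rejects tags such as <h10>; [^>]* allows attributes; Singleline lets .*? cross newlines.
    var pattern = new Regex(@"<h1\b[^>]*>(?<text>.*?)</h1\s*>",
                            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    foreach (Match m in pattern.Matches(html))
        headings.Add(m.Groups["text"].Value.Trim());
    return headings;
}

string page = "<body>\n<H1 class=\"title\">\n  Welcome\n</H1>\n<p>text</p>\n<h1>Second</h1>\n</body>";
List<string> found = GetHeadings(page);
Console.WriteLine(found.Count > 0 ? "page has an <h1> tag" : "no <h1> tag found");
foreach (string h in found)
    Console.WriteLine(h);
```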

#realJSOP · Accepted Answer · 2010-05-31T01:22:00

Solution 2

Once you have your document loaded as a string, just call IndexOf("<mytag>") on it. If the method returns a value less than 0 (IndexOf returns -1 when there is no match), the string you were looking for doesn't exist. It's faster than going through the DOM.
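One caveat with searching for the literal "<h1>": IndexOf and Contains are case-sensitive and look for that exact text, so they miss <H1> and <h1 class="title">. A minimal sketch of the same IndexOf idea that searches for "<h1" case-insensitively and then checks that the tag name really ends there might look like this. HasTag is a hypothetical helper, and it still matches text inside comments or scripts.

```csharp
using System;

// Hypothetical helper: true if the page contains an opening tag with the given name.
// Finds <H1> and <h1 class="x">, but does not mistake <h10> or <h1x> for <h1>.
// Comments and <script> contents are not skipped.
bool HasTag(string html, string tagName)
{
    string open = "<" + tagName;
    int index = html.IndexOf(open, StringComparison.OrdinalIgnoreCase);
    while (index != -1)
    {
        int after = index + open.Length;
        // The tag name must end here: at '>', '/', or whitespace before attributes.
        if (after < html.Length &&
            (html[after] == '>' || html[after] == '/' || char.IsWhiteSpace(html[after])))
            return true;
        index = html.IndexOf(open, after, StringComparison.OrdinalIgnoreCase);
    }
    return false;
}

string page = "<html><body><H1 id=\"top\">Title</H1></body></html>";
Console.WriteLine(page.IndexOf("<h1>"));   // -1: the literal search misses <H1 id="top">
Console.WriteLine(HasTag(page, "h1"));     // True
```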

Posted 31-May-10 1:22am

#realJSOP

Comments

Dalek Dave 31-May-10 7:29am

Seems the best way, and works on several search criteria (by nesting). 5!

sainath437 · Accepted Answer · 2010-05-30T23:59:00

Solution 1

Hi, try using these DOM methods:

x.getElementById(id) - get the element with a specified id
x.getElementsByTagName(name) - get all elements with a specified tag name

x is a node object (HTML element)
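These methods belong to the browser DOM (JavaScript). In C#, System.Xml.XmlDocument has a GetElementsByTagName method built on the same idea, so a minimal sketch could look like this, assuming the page is well-formed XHTML: ordinary HTML with unclosed tags such as <br> makes LoadXml throw an XmlException. Tag names are matched case-sensitively here. HasElement and the sample page are hypothetical.

```csharp
using System;
using System.Xml;

// Hypothetical helper: the .NET counterpart of the DOM's getElementsByTagName.
// Assumes well-formed XHTML; LoadXml throws XmlException on ordinary HTML.
bool HasElement(string xhtml, string tagName)
{
    var doc = new XmlDocument();
    doc.LoadXml(xhtml);
    // Matches on the element's qualified name, case-sensitively ("h1" != "H1").
    XmlNodeList nodes = doc.GetElementsByTagName(tagName);
    return nodes.Count > 0;
}

string page = "<html><head><title>Test</title></head><body><h1>Title</h1><p>Text</p></body></html>";
Console.WriteLine(HasElement(page, "h1"));   // True
Console.WriteLine(HasElement(page, "h2"));   // False
```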

Posted 30-May-10 23:59pm

sainath437

Comments

Dalek Dave 31-May-10 7:27am

Good Answer!

vinodkalanji87 1-Jun-10 2:59am

Thanks a lot for the answers
