HTML Parser

3 Aug 2006
A C# DLL for use in .NET applications; the code is easy to port to other languages.

Introduction

This is an HTML parser for extracting the title, the text, and the links from a page. It is written in C# and packaged as a DLL, but it can easily be ported to any programming language, provided you know how to fetch a page's HTML source in that language.

Basic Idea

The idea behind this code is to walk through the HTML source character by character. When the parser reaches the title tag, it stores the text that follows as the title string. When it reaches the body tag, it keeps all text that is not script code or CSS. Links are handled the same way.
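
As a rough illustration of the body-text step, here is a minimal sketch. It is not the code shipped in the DLL: the class name TextSketch and the method ExtractText are hypothetical, and the sketch assumes well-formed tags and ignores HTML comments and entities.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not the class shipped in the DLL.
public static class TextSketch
{
    // Returns the text of the page body, skipping tags and the
    // contents of <script> and <style> elements.
    public static string ExtractText(string html)
    {
        // Start at the body tag if there is one, otherwise at the beginning.
        int i = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        if (i < 0) i = 0;

        StringBuilder text = new StringBuilder();
        while (i < html.Length)
        {
            char c = html[i];
            if (c != '<')
            {
                text.Append(c);   // ordinary character: keep it
                i++;
                continue;
            }

            // Read the tag up to its closing '>'.
            int end = html.IndexOf('>', i);
            if (end < 0) break;   // unterminated tag: stop
            string tag = html.Substring(i + 1, end - i - 1).Trim().ToLowerInvariant();
            i = end + 1;

            // For script and style, jump straight to the matching close tag.
            foreach (string name in new string[] { "script", "style" })
            {
                if (IsTagNamed(tag, name))
                {
                    int close = html.IndexOf("</" + name, i, StringComparison.OrdinalIgnoreCase);
                    i = close < 0 ? html.Length : close;
                }
            }
            text.Append(' ');     // a tag separates words
        }

        // Collapse runs of whitespace into single spaces.
        string[] words = text.ToString().Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }

    // True if the tag text starts with the given element name.
    private static bool IsTagNamed(string tag, string name)
    {
        if (!tag.StartsWith(name, StringComparison.Ordinal)) return false;
        return tag.Length == name.Length
            || char.IsWhiteSpace(tag[name.Length])
            || tag[name.Length] == '/';
    }
}
```

Jumping to the close tag matters for scripts in particular, because script code may contain a < character that would otherwise be mistaken for the start of a tag.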

Brief Code Description

I build a lookup table for special characters (HTML entities). For example, when the parser reads the characters &lt; in the HTML source, they represent the < character.
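
A lookup table of this kind might look like the sketch below. The class name EntitySketch, the method Decode, and the five entities listed are assumptions chosen for illustration; the table in the attached DLL may be organised differently and cover more names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not the class shipped in the DLL.
public static class EntitySketch
{
    // A small, illustrative subset of the named HTML entities.
    private static readonly Dictionary<string, char> Entities = new Dictionary<string, char>
    {
        { "lt", '<' },
        { "gt", '>' },
        { "amp", '&' },
        { "quot", '"' },
        { "nbsp", '\u00A0' }
    };

    // Replaces every entity found in the table with its character.
    public static string Decode(string text)
    {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < text.Length)
        {
            if (text[i] == '&')
            {
                int semi = text.IndexOf(';', i + 1);
                char decoded;
                // Treat it as an entity only if the name is in the table.
                if (semi > 0 && Entities.TryGetValue(text.Substring(i + 1, semi - i - 1), out decoded))
                {
                    result.Append(decoded);
                    i = semi + 1;
                    continue;
                }
            }
            result.Append(text[i]);   // anything else is copied unchanged
            i++;
        }
        return result.ToString();
    }
}
```

Names that are not in the table are left exactly as they appear in the source, so an unknown entity does not cause any text to be lost.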

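The GetTitle function shows the approach for the title:

		public string GetTitle(string Source)
		{
			if (Source == null)
				return "";
			int len = Source.Length;
			string window = "      ";	// the last six characters read
			for (int i = 0; i < len; i++)
			{
				// Slide the window forward by one character.
				window = window.Substring(1) + Source[i];
				if (window.ToLowerInvariant() != "<title")
					continue;

				// Skip the rest of the opening tag, including any attributes.
				while (i < len && Source[i] != '>')
					i++;
				i++;	// step past '>'

				// Collect the title text up to the next tag (normally </title>).
				// The bounds checks stop the loops if the page is cut off.
				System.Text.StringBuilder title = new System.Text.StringBuilder();
				while (i < len && Source[i] != '<')
				{
					title.Append(Source[i]);
					i++;
				}
				return title.ToString().Trim();
			}
			return "";	// no title tag found
		}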

The code for extracting the text and the links is in the attached file.

Usage

To use this code, add the library to your project and create an instance of the class, for example Parser.Parser inst = new Parser.Parser(). Then call its functions on the page source:

inst.GetTitle(page) to get the title

inst.GetText(page) to get the text

inst.MakeLinks(page) to build the links

After calling MakeLinks, the results are available in pLink and pLabel, which hold each link and the label shown for it on the page.
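
To make the link and label pairing concrete, here is a minimal sketch of link extraction. It is not the MakeLinks code from the DLL: LinkSketch, ExtractLinks, and ReadQuotedAttribute are hypothetical names, and the two parallel lists are only an assumption about how results such as pLink and pLabel could be stored. The sketch handles quoted href values only and does not strip tags nested inside a label.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not the class shipped in the DLL.
public static class LinkSketch
{
    // Fills two parallel lists: links[i] is the href of the i-th anchor,
    // labels[i] is the text shown for it on the page.
    public static void ExtractLinks(string html, List<string> links, List<string> labels)
    {
        int pos = 0;
        while (true)
        {
            int open = html.IndexOf("<a", pos, StringComparison.OrdinalIgnoreCase);
            if (open < 0) break;

            // Require whitespace after "<a" so that <abbr>, <area> etc. are skipped.
            if (open + 2 >= html.Length || !char.IsWhiteSpace(html[open + 2]))
            {
                pos = open + 2;
                continue;
            }

            int tagEnd = html.IndexOf('>', open);
            if (tagEnd < 0) break;
            int close = html.IndexOf("</a", tagEnd, StringComparison.OrdinalIgnoreCase);
            if (close < 0) break;

            string href = ReadQuotedAttribute(html.Substring(open, tagEnd - open), "href");
            if (href != null)
            {
                links.Add(href);
                labels.Add(html.Substring(tagEnd + 1, close - tagEnd - 1).Trim());
            }
            pos = close + 3;   // continue after "</a"
        }
    }

    // Returns the value of name="..." or name='...' inside a tag, or null.
    private static string ReadQuotedAttribute(string tag, string name)
    {
        int at = tag.IndexOf(name + "=", StringComparison.OrdinalIgnoreCase);
        if (at < 0) return null;
        int start = at + name.Length + 1;
        if (start >= tag.Length) return null;
        char quote = tag[start];
        if (quote != '"' && quote != '\'') return null;
        int end = tag.IndexOf(quote, start + 1);
        return end < 0 ? null : tag.Substring(start + 1, end - start - 1);
    }
}
```

A caller would create two empty lists, pass them to ExtractLinks together with the page source, and then read the link and the label at the same index.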

Resources

C# DLL in .NET 2005

Contact me

If you run into a problem, please contact me at ahmed_a_e2006@yahoo.com.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
