Analysing an HTML string in C#

Question

I have downloaded the HTML source with WebClient.DownloadString(url) and saved it to a .txt file. Until now I analysed such files one line at a time in a loop. But this time the HTML is a bit messy (irregular newlines and spaces). When I view the source of that site in Chrome, Chrome formats it for me and it's easy to read. But the .txt file isn't formatted, so I'm having a hard time analysing it.

Normally I read a line, split it, and then look for things. But now I can't keep track, because the lines are irregular and I can't guess what a given line contains.

Any ideas?
Thanks.

I'm going to rephrase the question.

What I want is to extract information such as image links, image categories and subcategories from a wallpaper website. I need to look for links inside specific tags with specific classes in the HTML. I've been using string matching. But is there a way to walk from tag to tag, to child tags and parent tags, like using the DOM in JavaScript?

Posted 30-Nov-12 1:45am

LolaFile

Updated 30-Nov-12 2:21am

v2

Comments

ZurdoDev 30-Nov-12 8:14am

What specifically are you looking for? There may be a better way to do it.

thursunamy 30-Nov-12 8:24am

Hi,

Look at HTMLAgilityPack.

Regards
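
A minimal sketch of what the HtmlAgilityPack suggestion could look like for the wallpaper case, assuming the HtmlAgilityPack NuGet package is referenced. The markup, the class names (`thumb`, `category`) and the `data-name` attribute are made up for illustration; the real site will use different ones, so the XPath expressions would need adjusting.

```csharp
using System;
using HtmlAgilityPack; // third-party package, not part of the .NET framework

class Program
{
    static void Main()
    {
        // Hypothetical markup standing in for the string returned by
        // WebClient.DownloadString(url); irregular whitespace does not matter.
        string html = @"<div class='category' data-name='Nature'>
  <div class='thumb'><a href='/wallpapers/1'><img src='/img/1.jpg'></a></div>
     <div class='thumb'><a href='/wallpapers/2'>
<img src='/img/2.jpg'></a></div></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath: every <img> anywhere inside a <div class="thumb">.
        HtmlNodeCollection images =
            doc.DocumentNode.SelectNodes("//div[@class='thumb']//img");
        if (images == null)
            return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode img in images)
        {
            string src = img.GetAttributeValue("src", "");

            // Parent tag: the <a> that wraps the image.
            HtmlNode link = img.ParentNode;
            string href = link.GetAttributeValue("href", "");

            // Walk upwards to the enclosing category block (assumed structure).
            HtmlNode category =
                img.SelectSingleNode("ancestor::div[@class='category']");
            string name = category != null
                ? category.GetAttributeValue("data-name", "")
                : "";

            Console.WriteLine("{0} | {1} | {2}", name, href, src);
        }
    }
}
```

Because the parser builds a tree from the whole string, the line breaks in the saved file stop mattering; `ParentNode`, `ChildNodes` and XPath queries give roughly the tag-to-tag navigation the question asks about.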

Ravi Bhavnani 30-Nov-12 13:09pm

HTMLAgilityPack is very cool. Also see my StringParser article - I wrote this class to do exactly what you want to do and have used it with much success in a couple of widely used products. http://www.codeproject.com/Articles/12708/StringParser

/ravi

Ravi Bhavnani 30-Nov-12 13:10pm

You might also find this article helpful:

http://www.codeproject.com/Articles/12709/WebResourceProvider-goes-NET

/ravi

Sergey Alexandrovich Kryukov 30-Nov-12 15:18pm

What is "source in chrome"? :-) Do you have a problem analyzing HTML, or only text?
--SA

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Sergey Alexandrovich Kryukov · Answer 1 · 2012-11-30T09:22:00

Here is where the trouble lies: you formulated the difficulty of analyzing text files as a general problem, without regard to any detail of the file contents. But this general problem cannot have a general solution, simply because the notion of "text file" is not well defined. A text file can be, well… anything. After all, HTML and XML files are text files too, but you seemingly don't have problems with them.

—SA

sachin_jain · Answer 2 · 2012-11-30T12:23:00

It would be better to read the page using AJAX with jQuery and then pick out what you want. I have never tried it, but I hope it will work faster and better.
