I have downloaded HTML source via WebClient.DownloadString(url) and saved it in a .txt file. Until now I have analysed such code one line at a time using loops. But this time the HTML is a bit messy (irregular newlines and spaces). When I view the source of that site in Chrome, Chrome formats it for me and it is easy to read. In the .txt file it is not formatted, so I'm having a hard time analysing it.
Normally I read a line, split it, and then look for things in it. But now I can't keep track, because the lines are irregular and I can't guess what a given line contains.
Any ideas?
Thanks.
Let me rephrase the question.
What I want is to extract information such as image links, image categories, and subcategories from a wallpaper website. I need to look for links inside specific tags with specific classes in the HTML code. So far I have been using string matching. But is there a way to walk from tag to tag, to child tags and parent tags, like using the DOM in JavaScript?
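To illustrate the kind of traversal I mean, here is a minimal sketch in Python using only the standard library's html.parser. It is not the .NET code I am actually running, and the markup, the class names (category, subcategory, wallpaper), the data-name attribute, and the helper names (Node, TreeBuilder, find_ancestor, extract_wallpapers) are all made up for the example. It builds a small tree with parent and child links, so irregular newlines and spaces in the source no longer matter.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not become the "current" parent.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element in a very small DOM-like tree (hypothetical helper)."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []

    def classes(self):
        # The class attribute may be missing or valueless, hence the "or".
        return (self.attrs.get("class") or "").split()

    def walk(self):
        # Depth-first traversal: this node first, then all descendants.
        yield self
        for child in self.children:
            yield from child.walk()


class TreeBuilder(HTMLParser):
    """Builds the Node tree from the parser's start/end tag events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, parent=self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <img ... />: add it, but do not descend.
        self.current.children.append(Node(tag, attrs, parent=self.current))

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def find_ancestor(node, css_class):
    """Walk up the parent links until an element with css_class is found."""
    node = node.parent
    while node is not None and css_class not in node.classes():
        node = node.parent
    return node


def extract_wallpapers(html):
    """Return (href, category, subcategory) for every <a class="wallpaper">."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    results = []
    for node in builder.root.walk():
        if node.tag == "a" and "wallpaper" in node.classes():
            sub = find_ancestor(node, "subcategory")
            cat = find_ancestor(node, "category")
            results.append((
                node.attrs.get("href"),
                cat.attrs.get("data-name") if cat else None,
                sub.attrs.get("data-name") if sub else None,
            ))
    return results


# Deliberately messy sample markup (made up for this example).
SAMPLE = """
<div class="category" data-name="Nature">
   <div class="subcategory"
        data-name="Mountains"><a class="wallpaper"
 href="/img/alps.jpg"><img src="/thumb/alps.jpg"></a>
   <a class="wallpaper" href="/img/andes.jpg">Andes</a></div>
</div>
"""

if __name__ == "__main__":
    for href, category, subcategory in extract_wallpapers(SAMPLE):
        print(href, category, subcategory)
```

Something equivalent for .NET, where I can ask for a tag's children and parent instead of splitting lines, is what I am looking for.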