Hello everybody,
I want to parse HTML pages on the IMDb site (e.g. by using Html Agility Pack) and extract their content. How can I do it? Please help me.
Comments
Afzaal Ahmad Zeeshan 17-Nov-14 13:43pm    
What do you mean by parsing HTML pages? And do you mean IMDb, the movie website?
shima541 17-Nov-14 16:41pm    
Yes. I want to extract the features of 12,000 films, such as 'director', 'genre', 'duration', 'score', etc., from the HTML code of each page on this site.
Can you help me?
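The thread contains no code, so here is a rough sketch of the extraction step for a single page. It uses Python's standard-library `html.parser` only as a stand-in for Html Agility Pack (a .NET library). It assumes, hypothetically, that each field is marked with a schema.org-style `itemprop` attribute (`director`, `genre`, `duration`, and `ratingValue` for the score). The sample markup, the class `ItempropExtractor`, and the function `extract_film_fields` are made up for illustration. Inspect the real page source before relying on any of these names, because the actual markup may differ and can change over time.

```python
from html.parser import HTMLParser

# Hypothetical markup for one film page. The itemprop names are an
# assumption for illustration; check the real page source before use.
SAMPLE = """
<html><body>
<h1 itemprop="name">Example Film</h1>
<div>Director: <span itemprop="director"><a href="/name/x"><span>Jane Doe</span></a></span></div>
<span itemprop="genre">Drama</span> | <span itemprop="genre">Crime</span>
<time itemprop="duration" datetime="PT142M">142 min</time>
<span itemprop="ratingValue">8.5</span><br>
</body></html>
"""


class ItempropExtractor(HTMLParser):
    """Collects the text (or the content attribute) of elements whose
    itemprop attribute is one of the wanted field names."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.fields = {}          # field name -> list of extracted values
        self._cur_prop = None     # field currently being captured, if any
        self._cur_tag = None      # tag name of the element being captured
        self._cur_depth = 0       # nesting depth of same-named tags inside it
        self._cur_text = []       # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if self._cur_prop is not None:
            # Already capturing: only track nesting of the same tag name
            # so we know which end tag closes the captured element.
            if tag == self._cur_tag:
                self._cur_depth += 1
            return
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop not in self.wanted:
            return
        if attrs.get("content") is not None:
            # Value stored in an attribute, e.g. <meta itemprop="duration" content="PT142M">
            self.fields.setdefault(prop, []).append(attrs["content"])
            return
        # Start capturing the text inside this element.
        self._cur_prop, self._cur_tag = prop, tag
        self._cur_depth, self._cur_text = 1, []

    def handle_endtag(self, tag):
        if self._cur_prop is None or tag != self._cur_tag:
            return
        self._cur_depth -= 1
        if self._cur_depth == 0:
            # Collapse runs of whitespace in the captured text.
            text = " ".join("".join(self._cur_text).split())
            self.fields.setdefault(self._cur_prop, []).append(text)
            self._cur_prop = self._cur_tag = None

    def handle_data(self, data):
        if self._cur_prop is not None:
            self._cur_text.append(data)


def extract_film_fields(html_text, wanted=("director", "genre", "duration", "ratingValue")):
    """Parse one page's HTML and return {field: [values, ...]}."""
    parser = ItempropExtractor(wanted)
    parser.feed(html_text)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    print(extract_film_fields(SAMPLE))
```

On the sample this yields `{'director': ['Jane Doe'], 'genre': ['Drama', 'Crime'], 'duration': ['142 min'], 'ratingValue': ['8.5']}`. Values are kept as lists because a field such as genre can occur several times on one page. For 12,000 films the same function would be called once per downloaded page.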

1 solution

It would be good if the document were well-formed XML; then you could parse it using one of the .NET XML parsers. Unfortunately, not all web pages are like that, so you may need an HTML parser that does not require well-formed XML. Try this one: http://www.majestic12.co.uk/projects/html_parser.php

—SA
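To illustrate the point about well-formedness, here is a small sketch in Python. The answer itself concerns .NET; Python's standard library is used here only as a stand-in. A strict XML parser (`xml.etree.ElementTree`) rejects a typical HTML fragment with an unquoted attribute and unclosed `<br>`/`<img>` tags, while a tolerant HTML parser (`html.parser.HTMLParser`) reads it without complaint. The fragment and the helper names `parses_as_xml`, `TagCounter`, and `html_start_tags` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical page fragment: acceptable HTML, but not well-formed XML
# (unquoted attribute value, and <br> / <img> are never closed).
PAGE = '<html><body><p class=plot>First line<br>Second line</p><img src="poster.jpg"></body></html>'


def parses_as_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCounter(HTMLParser):
    """Tolerant HTML parser: records start tags without requiring well-formed input."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(text):
    counter = TagCounter()
    counter.feed(text)
    counter.close()
    return counter.start_tags


if __name__ == "__main__":
    print(parses_as_xml(PAGE))        # strict XML parser rejects the fragment
    print(html_start_tags(PAGE))      # HTML parser still sees every element
```

The XML route only works when the page happens to be well-formed (e.g. `<p>ok<br/></p>`); an HTML parser such as the one linked above, or Html Agility Pack on .NET, avoids that requirement.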
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


