
Web Development

 
Answer: Re: Whats can do???
- Emanuele -, 2-Jun-02 23:28
General: problem using internet transfer control in a web service
suss, 31-May-02 6:28
General: problem using internet transfer control in a web service
suss, 31-May-02 6:26
Question: Windows XP hates Visual Interdev 6.0 and SQL Server 2000?
Romeo, 31-May-02 5:14
Answer: Re: Windows XP hates Visual Interdev 6.0 and SQL Server 2000?
AndyG, 2-Jun-02 12:20
Answer: Re: Windows XP hates Visual Interdev 6.0 and SQL Server 2000?
Christian Graus (protector), 2-Jun-02 13:06
Answer: Re: Windows XP hates Visual Interdev 6.0 and SQL Server 2000?
Not Active (mentor), 3-Jun-02 13:23
General: Web page Content Parsing
arun1234, 30-May-02 21:03
Problem Statement:
My requirement is to access a web page over the internet and parse the complete HTML content returned by the web site.

Current Approach:
Does .NET provide any reusable components or classes for this?
I currently use the classes in the System.Net namespace, namely HttpWebRequest, HttpWebResponse and WebClient, to retrieve the HTML content from the URL. The content comes back in a stream object, which I convert to a string and then parse for each HTML element and attribute. The parsing also identifies all links on the page (and converts them to absolute paths), all images (which are downloaded), any JavaScript code, and any included files such as ".js", ".jpeg" and ".css" files. The end result is a replica of the web page accessed. All of this is done programmatically.
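
The link-conversion step mentioned above (turning the links found on a page into absolute paths) can be sketched as follows. This is Python rather than .NET, used only as a language-neutral illustration; the helper name `to_absolute` and the example.com URLs are hypothetical and not part of the actual code.

```python
from urllib.parse import urljoin


def to_absolute(page_url: str, link: str) -> str:
    """Resolve a link found on a page against the URL the page was fetched from."""
    return urljoin(page_url, link)


# Hypothetical page URL and links, for illustration only.
page_url = "http://example.com/docs/index.html"
for link in [
    "images/logo.jpeg",                # relative to the page's directory
    "/styles/site.css",                # relative to the site root
    "../scripts/menu.js",              # relative to the parent directory
    "http://other.example/page.html",  # already absolute, left unchanged
]:
    print(link, "->", to_absolute(page_url, link))
```

Resolving every link against the URL of the page it came from is what lets the downloaded images and includes be fetched from the right location.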


Drawback of the approach:
It is very tedious, because the string containing the HTML has to be parsed by hand and split into elements, attributes, links, includes, images and so on.
It does not perform well, because the logic involves crawling the content and downloading the referenced files.

My Question:
Is there a better approach to meet this requirement, for example reusable HTML parsers or web page content parsers that are already available, or predefined classes/libraries in .NET?
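
To make clear what kind of reusable parser is meant, here is a minimal sketch in Python (not .NET), using the event-driven `HTMLParser` class from its standard library. The class name `ResourceCollector`, the function `collect_resources` and the sample markup are hypothetical; the point is only that the library does the tokenizing, so the calling code just reacts to tags and attributes instead of slicing strings.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collects links, images, script includes and stylesheets from a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []
        self.scripts = []
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        # The parser passes tag and attribute names in lowercase.
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "img" and attributes.get("src"):
            self.images.append(attributes["src"])
        elif tag == "script" and attributes.get("src"):
            self.scripts.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.stylesheets.append(attributes["href"])


def collect_resources(html: str) -> ResourceCollector:
    """Parse an HTML string and return the collector holding what was found."""
    collector = ResourceCollector()
    collector.feed(html)
    collector.close()
    return collector


# Hypothetical sample page, for illustration only.
sample = """<html><head>
<link rel="stylesheet" href="site.css">
<script src="menu.js"></script>
</head><body>
<a href="about.html">About</a>
<img src="logo.jpeg" alt="Logo">
</body></html>"""

found = collect_resources(sample)
print(found.links, found.images, found.scripts, found.stylesheets)
```

Something equivalent for .NET is what I am looking for.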



Smitha Puranik
General: Clearing browser cache
suss, 30-May-02 18:59
General: Re: Clearing browser cache
Mike.NET, 3-Jun-02 17:56
General: Transfer data from one page to another
BLaZiNiX, 30-May-02 17:30
General: Re: Transfer data from one page to another
Mike.NET, 3-Jun-02 17:51
General: Hierarchical Menu Problem
suss, 30-May-02 6:46
General: Re: Hierarchical Menu Problem
Philip Patrick (professional), 30-May-02 10:48
General: Re: Hierarchical Menu Problem
suss, 30-May-02 12:18
General: Re: Hierarchical Menu Problem
Philip Patrick (professional), 30-May-02 12:27
General: multiple recordsets
Megan Forbes, 30-May-02 6:06
General: Re: multiple recordsets
Not Active (mentor), 30-May-02 6:47
General: cookies and javascript
Guillaume F., 30-May-02 3:47
General: Re: cookies and javascript
Not Active (mentor), 30-May-02 6:35
General: Re: cookies and javascript
Guillaume F., 30-May-02 7:20
General: Re: cookies and javascript
SimonS, 3-Jun-02 5:43
General: help me
HoldMe, 30-May-02 2:42
Question: How ???
HoldMe, 29-May-02 17:07
Answer: Re: How ???
JC Gauthier, 30-May-02 5:39
