Basically, you can convert it to text by removing all tags from the HTML code, which is a simple string-manipulation routine. The other problem is that you would need to convert all HTML character entities to Unicode characters. This is where an HTML parser can be very handy: it gives you the values of all text nodes, with all character entities already replaced by the characters they denote.
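The "strip the tags, then decode the entities" route can be sketched in C# with a regular expression and System.Net.WebUtility.HtmlDecode. This is only a minimal sketch, assuming simple markup: the pattern used here is naive and will mangle script/style content, comments, or attribute values containing '>', which is exactly why a real parser is preferable. The class name HtmlToText is hypothetical.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlToText
{
    // Naive tag pattern: anything from '<' up to the next '>'.
    // Assumption: no <script>/<style> blocks, comments, or '>' inside attribute values.
    private static readonly Regex TagPattern = new Regex("<[^>]*>", RegexOptions.Compiled);

    public static string Convert(string html)
    {
        // Step 1: remove all tags, keeping only the text between them.
        string withoutTags = TagPattern.Replace(html, string.Empty);

        // Step 2: replace character entities (&amp;, &lt;, &copy;, &#169;, ...)
        // with the Unicode characters they stand for.
        return WebUtility.HtmlDecode(withoutTags);
    }
}
```

Tags are stripped before decoding on purpose: decoding first would turn "&lt;b&gt;" into a literal "<b>" that the regular expression would then wrongly remove.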
The simplest way of doing it would be to use an XML parser, but this can only work if your HTML is well-formed XML. XML parsers are readily available in .NET. It is really a shame, but much of the HTML code existing in the real world does not conform to it. In that case, you would need a parser that can work with non-well-formed code, so look at one of the available HTML parsers in C#. The following projects can be found on CodeProject:
An Elementary HTML Parser
Another C# Legacy HTML Parser Using Tag Processing
AfterWork HTML Parser in C#
If this is not enough, do your own search:
http://en.lmgtfy.com/?q=HTML+parser+%22C%23%22
Sharpen your Google skills, by the way; it will greatly help you.
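For the well-formed case, the XML parser built into .NET (LINQ to XML) already does the entity work for you. The following is a minimal sketch, assuming the input is XHTML; the class name XhtmlToText is hypothetical. Note that without a DTD the XML parser only knows the predefined entities (&amp;, &lt;, &gt;, &quot;, &apos;) and numeric references, so HTML-only names such as &nbsp; make the parse fail just like unclosed tags do.

```csharp
using System;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class XhtmlToText
{
    // Works only when the markup is well-formed XML (XHTML).
    public static string Convert(string xhtml)
    {
        XDocument document = XDocument.Parse(xhtml);

        // Concatenate the values of all text nodes in document order;
        // the parser has already replaced entity references with characters.
        return string.Concat(document.DescendantNodes()
                                     .OfType<XText>()
                                     .Select(text => text.Value));
    }

    // Returns null instead of throwing when the input is not well-formed XML,
    // which is the signal to fall back to a real HTML parser.
    public static string TryConvert(string html)
    {
        try { return Convert(html); }
        catch (XmlException) { return null; }
    }
}
```

A null result from TryConvert on real-world pages is the typical outcome that makes one of the HTML parsers listed above necessary.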
Now, what kind of Word document do you want: with formatting close to the HTML source or not?
In the first case, you can simply open the HTML document with Word (and save it as a .doc or .docx file); in the second case, you don't need to do anything else, because plain Unicode text can be considered a special case of a Word document.
If you need to do the conversion automatically (though I don't see why, if the HTML document can be opened with Word anyway), you would need to use Office/Word interop.
To create a Word document, use the Office interop assembly. Basically, in Solution Explorer, right-click your project's "References" node, click "Add Reference", switch to the "COM" tab of the "Add Reference" window, and add a reference to the Microsoft Word Object Library of the required version. Please see:
http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word%28v=office.11%29.aspx
http://msdn.microsoft.com/en-us/library/microsoft.office.interop.word%28v=office.14%29.aspx
(Or a similar piece of documentation for the required version.)
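For the automated route, the interop call sequence can be sketched as follows. This is a minimal sketch, assuming Word 2010 or later is installed and the COM reference described above has been added (SaveAs2 first appeared in the Word 2010 object model; with the Office 2003 library you would call SaveAs instead). The class and method names HtmlToWord.Convert are hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

public static class HtmlToWord
{
    // Opens an HTML file in Word and saves it as .docx.
    // Assumption: Word 2010+ is installed and the Word Object Library is referenced.
    public static void Convert(string htmlPath, string docxPath)
    {
        Word.Application word = new Word.Application();
        word.Visible = false;
        Word.Document document = null;
        try
        {
            // Word opens HTML directly; full paths avoid surprises with Word's working directory.
            document = word.Documents.Open(Path.GetFullPath(htmlPath));
            document.SaveAs2(Path.GetFullPath(docxPath), Word.WdSaveFormat.wdFormatXMLDocument);
        }
        finally
        {
            if (document != null)
            {
                // Cast avoids the ambiguity between the Close method and the Close event.
                ((Word._Document)document).Close(false);
                Marshal.ReleaseComObject(document);
            }
            // Same for Quit; without it an invisible WINWORD.EXE may keep running.
            ((Word._Application)word).Quit(false);
            Marshal.ReleaseComObject(word);
        }
    }
}
```

Use Word.WdSaveFormat.wdFormatDocument instead if you need the older .doc format.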
See also:
http://msdn.microsoft.com/en-us/library/aa192495%28v=office.11%29.aspx
http://msdn.microsoft.com/en-us/office/hh128772.aspx
—SA