Parsing takes too long - How to speed-up??? - XML / XSL Discussion Boards

David Chamberlain 28-Mar-02 11:29

Okay. So I really should have something like this?

[Node1][EOL]"\n"[/EOL]
[Data2]Here is data[/Data2][EOL]"\n"[/EOL]
[/Node1][EOL]"\n"[/EOL]

where the EOL text nodes are effectively "hidden" because they only show up as the newlines?

I guess that makes sense, but it sure isn't what I thought it should be. This, of course, is only of use if I really do need to look at or modify the file in an editor. If I just use the web browser to view the file, none of this matters. And, if I only access the file through the application, then none of it matters.

Thanks,
Dave

"You can say that again." -- Dept. of Redundancy Dept.

Re: Editable XML from MSXML

Michael A. Barnhart 28-Mar-02 11:38

First, my apologies for not being a better writer; I am not getting the story across very well. Every element is a node, but not every node is an element. A node could also be text, a processing instruction, a comment, etc.

In your example:
David Chamberlain wrote:
[Node1][EOL]"\n"[/EOL]
[Data2]Here is data[/Data2][EOL]"\n"[/EOL]
[/Node1][EOL]"\n"[/EOL]

you have added child elements to the element called Node1. What is missing are the child nodes that are not elements; adding more elements is not the answer. I guess it has been too many months since I stepped through the MS DOM model.

What I did to finally get a better feel for what was going on in the MS DOM model was to create a simple dialog app that, when a button was pressed, created an MSXML DOM instance and read in an XML file. I then added code that found the root element of the document and passed it to a function that fetched the list of child nodes and looked at what types they were. When it found a node that was an element, it recursed into the same function. It helped me see all of the items that existed. I am not sure whether I have this saved or not.
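The walk described here can be sketched in a few lines. This is only a minimal Python sketch using the standard library's xml.dom.minidom rather than MSXML, so the API names differ from the MS DOM; SAMPLE, NODE_NAMES and list_children are names made up for illustration. It shows that the newlines around Data2 appear as text-node children of Node1.

```python
from xml.dom import minidom, Node

# Readable names for a few of the numeric nodeType codes.
NODE_NAMES = {
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

def list_children(node, depth=0, rows=None):
    """Record (depth, type, value) for every child; recurse into elements."""
    if rows is None:
        rows = []
    for child in node.childNodes:
        kind = NODE_NAMES.get(child.nodeType, str(child.nodeType))
        # Elements report their tag name; other nodes report their raw data.
        is_element = child.nodeType == Node.ELEMENT_NODE
        value = child.tagName if is_element else child.data
        rows.append((depth, kind, value))
        if is_element:
            list_children(child, depth + 1, rows)
    return rows

# Hypothetical sample document mirroring the Node1/Data2 example.
SAMPLE = "<Node1>\n<Data2>Here is data</Data2>\n</Node1>"
rows = list_children(minidom.parseString(SAMPLE).documentElement)
for depth, kind, value in rows:
    print(depth, kind, repr(value))
```

Node1 ends up with three children here (text, element, text), and only the middle one is an element.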

I left it there and concluded for my needs the class I had written worked fine and I would use it for any manipulation of XML files. I do use the MS DOM to read in files as well as some of the Apache code.

Good ideas are not adopted automatically.
They must be driven into practice with courageous patience. -Admiral Rickover

Re: Editable XML from MSXML

David Chamberlain 28-Mar-02 11:59

Let me first say that I really appreciate your help, even though what appears to be a simple matter has become quite complicated. Thanks to MS, I'm sure.

So, without creating new child elements, I should just create additional text nodes, ending up like this:

[Node1] (Node)
"\n" (Text, child 1 of Node1)
[Data2] (Node, child 2 of Node1)
"Here is data" (Text, child 1 of Data2)
[/Data2]
"\n" (Text, child 3 of Node1)
[/Node1]
"\n" (Text, child ? of parent-of-Node1)

While I hate the vocabulary of "nodes" and "elements," the only real difference I could see was that "elements" allow access to attributes while "nodes" do not. Either one can have children.

Dave


Re: Editable XML from MSXML

Michael A. Barnhart 28-Mar-02 15:13

In general yes to adding the text nodes.

Neville's comment about some control options appears to be correct, and I just had not noticed. With the following code I at one time received all of the nodes, and now I do not receive the nodes that contain only white space. That is exactly the point you made about missing the formatting!!!
It is gone now.

Hopefully this is a start.

The first function initializes the process and reads in a specific file.

The second function then steps through it in two different ways.
If you experiment with putting non-white-space text data in between the elements, I think you will see what I mean.
I am using Win2k with MSXML 4 and also tried this out on WinMe with MSXML 4. Previously I had run something similar, but with only MSXML 3 installed. Whether that is the difference or not I cannot say.

Take Care

void CMsDomTestDlg::OnButtonread()
{
    row = 0;
    CComVariant varFileName = (LPCSTR)"ourtest.xml";
    VARIANT_BOOL varOkay;
    HRESULT hr;

    // Create the MSXML DOM document object.
    IXMLDOMDocument *pXML = NULL;
    hr = CoCreateInstance(CLSID_DOMDocument, NULL, CLSCTX_INPROC_SERVER,
                          IID_IXMLDOMDocument2, (void**)&pXML);
    ASSERT(SUCCEEDED(hr) && pXML != NULL);

    hr = pXML->load(varFileName, &varOkay);

    IXMLDOMElement *pRoot = NULL;
    if (SUCCEEDED(hr))
    {
        // Start the walk at the root element of the document.
        hr = pXML->get_documentElement(&pRoot);
        if (SUCCEEDED(hr) && pRoot != NULL)
        {
            LoadChildren((IXMLDOMNode*)pRoot, 0);
        }
        else
        {
            m_NodeGrid.SetItemText(row, 0, "Model Not Read In");
        }
    }
    Invalidate(TRUE);
}

void CMsDomTestDlg::LoadChildren(IXMLDOMNode* pNode, int depth)
{
    HRESULT hr;
    IXMLDOMNode *child;
    BOOL Method1 = FALSE;   // TRUE: walk with firstChild/nextSibling; FALSE: walk the childNodes list
    CString data;
    CString td;
    CComBSTR txt;
    long listlen, listpos;
    DOMNodeType type;
    IXMLDOMNodeList *childlist;
    if (Method1)
    {
        // Method 1: follow the sibling chain.
        hr = pNode->get_firstChild(&child);
        while (SUCCEEDED(hr) && child != NULL)
        {
            row++;
            data.Format("%d", depth);
            m_NodeGrid.SetItemText(row, 0, data);
            hr = child->get_nodeType(&type);
            data.Format("%d", type);
            m_NodeGrid.SetItemText(row, 1, data);
            hr = child->get_text(&txt);
            if (SUCCEEDED(hr))
            {
                td = txt;
                data.Format("%s length of %d", td, td.GetLength());
                m_NodeGrid.SetItemText(row, 2, data);
            }
            else
            {
                data = "No Text Data";
                m_NodeGrid.SetItemText(row, 2, data);
            }
            hr = child->get_baseName(&txt);
            if (SUCCEEDED(hr))
            {
                data = txt;
                m_NodeGrid.SetItemText(row, 3, data);
            }
            else
            {
                data = "No Base Name";
                m_NodeGrid.SetItemText(row, 3, data);
            }

            // Only elements are searched for further children.
            if (type == NODE_ELEMENT)
            {
                LoadChildren(child, depth + 1);
            }
            hr = child->get_nextSibling(&child);
        }
    }
    else
    {
        // Method 2: index into the child node list.
        hr = pNode->get_childNodes(&childlist);
        childlist->get_length(&listlen);

        for (listpos = 0; listpos < listlen; listpos++)
        {
            hr = childlist->get_item(listpos, &child);
            if (SUCCEEDED(hr) && child != NULL)
            {
                row++;
                data.Format("%d", depth);
                m_NodeGrid.SetItemText(row, 0, data);
                hr = child->get_nodeType(&type);
                data.Format("%d", type);
                m_NodeGrid.SetItemText(row, 1, data);
                hr = child->get_text(&txt);
                if (SUCCEEDED(hr))
                {
                    data = txt;
                    m_NodeGrid.SetItemText(row, 2, data);
                }
                else
                {
                    data = "No Text Data";
                    m_NodeGrid.SetItemText(row, 2, data);
                }
                hr = child->get_baseName(&txt);
                if (SUCCEEDED(hr))
                {
                    data = txt;
                    m_NodeGrid.SetItemText(row, 3, data);
                }
                else
                {
                    data = "No Base Name";
                    m_NodeGrid.SetItemText(row, 3, data);
                }
                if (type == NODE_ELEMENT)
                {
                    LoadChildren(child, depth + 1);
                }
            }
        }
    }
}


Re: Editable XML from MSXML

Michael A. Barnhart 28-Mar-02 16:53

From MSXML 4 documentation
When a text file is opened with the xmlDoc.load method or the xmlDoc.loadXML method (where xmlDoc is an XML DOM document), the parser strips most white space from the file, unless specifically directed otherwise. The parser notes within each node whether one or more spaces, tabs, newlines, or carriage returns follow the node in the text by setting a flag. This method is efficient, reducing both the size of each XML file and the number of calculations required to redisplay the XML in a browser. However, because this information is lost, an XML document stored in this manner can lose formatting information in its content. Tabs, in particular, can be lost, because they are not formally recognized in the default mode as anything but white space.

hr = pXML->put_preserveWhiteSpace(VARIANT_TRUE);
Place this before the load.

And now the spaces are back :)


Re: Editable XML from MSXML

David Chamberlain 29-Mar-02 3:02

Michael A. Barnhart wrote:
However, because this information is lost, an XML document stored in this manner can lose formatting information in its content.

First, I sure didn't expect such investigation into this seemingly trivial matter, but I certainly appreciate all the input and help.

Apparently, the preserve white space option is on by default. I had created an XML file in the Visual Studio IDE in order to plan out the structure and content of the file that would eventually be manipulated and maintained by the application program. Once I had that file, I would call the 'load' function, and then let the application do its thing, one operation being the creation of new nodes, as described in the earlier posts. At the end of execution, and calling 'save', I would then load the file back into the IDE to see what happened, and to check that the application created the new nodes properly.

At that point, what I was seeing was the same file as I had originally created, with all the 'formatting' (spaces, tabs, and newlines) still properly in place, but the new nodes would appear on a single line. They would be in the proper location in the file, in terms of being after the last child of the node being added to, but there were no new lines.

Therefore, although I haven't updated the implementation yet, I believe that the previous suggestion about adding text nodes with newlines (and spaces or tabs if I decide to add those too) will indeed place those into the file and will be preserved upon subsequent 'load' and 'save' operations, by default, even without calling 'preserve white space'.
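The text-node idea can be tried outside MSXML. Below is a minimal sketch using Python's standard xml.dom.minidom, which is only an analogue of the MS DOM and keeps whitespace text nodes on load; the element name Data3 and its content are made up for illustration.

```python
from xml.dom import minidom

doc = minidom.parseString("<Node1>\n<Data2>Here is data</Data2>\n</Node1>")
root = doc.documentElement

# Build the new element (Data3 is a hypothetical name).
new_elem = doc.createElement("Data3")
new_elem.appendChild(doc.createTextNode("More data"))

# Append the element, then a newline text node, so that the closing
# tag of Node1 still starts on its own line when the file is saved.
root.appendChild(new_elem)
root.appendChild(doc.createTextNode("\n"))

result = root.toxml()
print(result)
```

Without the extra text node, the new element and the closing Node1 tag would run together on one line, which matches the behaviour described in the post.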

This particular application is running on Win98 with msxml3, although I plan to update that to msxml4 for the speed and memory considerations.

I also appreciate your code, as seeing how things are done is the best teacher. But, unfortunately, and probably as no surprise, that raises a few more questions.

Based on one of the XML sample projects on CP, I am using the #import [msxml3.dll] in the header file. While it seems to me that the following should be equivalent, one worked and one did not. While I am not familiar with the intricacies of COM, I went with the one that worked.

(1) IXMLDOMNode *pNode;
pNode = m_pXmlDoc->selectNode ("StartTag");

(2) IXMLDOMNodePtr pNode;
pNode = m_pXmlDoc->selectNode ("StartTag");

According to the contents of the generated .tlh file, the selectNode function returns an IXMLDOMNodePtr, and option 1 bombs. While I don't really understand the internal difference, by following the .tlh contents, I was able to get all of the function calls and return values to be of the proper type and operate without bombing. I guess there is not really a question here, other than does this really make any difference, or am I going down a wrong path?

Thanks again for all the help.

Dave


Re: Editable XML from MSXML

Michael A. Barnhart 29-Mar-02 3:14

Dave,
I think you are well along the right path. Good luck :-D

David Chamberlain wrote:
Apparently, the preserve white space option is on by default.

Dave,
As I said earlier, my code worked differently a while back. I now have MSXML4 installed, and the default appears not to include the white space. So I would go ahead and add that line unless your memory is much better than mine.

And go with what works. In COM you have interfaces. So a pointer to an interface is not the same thing as an interface to a pointer to XYZ. Don't we love this.

I needed a little refresher especially with what I learned for differences between 3 and 4.

Take Care and have a nice day. Mike


Parsing takes too long - How to speed-up???

Rybeck 25-Mar-02 6:29

I have 18000 items and it takes about 10 minutes to parse my xml-file, but only a few seconds if I comment out line 6.

What am I doing wrong? How can I speed up the following?

1. Set oXMLNodeList = oXMLElement.selectNodes("data/item")
2. Put #fTestFile, , "Count: " & oXMLNodeList.Length & vbCrLf
3. For Each oItemY In oXMLNodeList
4. Put #fTestFile, , vbCrLf
5. For Each oItemX In oItemY.childNodes
6. Put #fTestFile, , " ;" & oItemX.nodeTypedValue
7. Next oItemY
8. Next oItemY

Re: Parsing takes too long - How to speed-up???

Michael A. Barnhart 27-Mar-02 11:44

I do not work with VB so this is somewhat of an outside observation and could be a worthless comment but:

I would have added a line 4.5:
Set oYList = oItemY.childNodes (assuming this is valid in VB).
Potentially you are copying the data over and over; I have seen this have an impact.
Then line 5 would be For Each oItemX In oYList

Line 7 should be Next oItemX

By commenting out line 6 you never actually access the data.


An interesting example

Michael A. Barnhart 21-Mar-02 15:22

I found this page a few days ago and feel it is an interesting example that shows transformations without usage of what I see (and use) typically employing the

XSLT to write links

MS le Roux 17-Mar-02 23:06

Given the following xml:

<para>The <link url="earth.htm">earth</link> rotates.</para>

Is there a way to use XSLT to write the content of the <para> element like this:

<p>The <a href="earth.htm">earth</a> rotates.</p>

Re: XSLT to write links

David Wengier 17-Mar-02 23:45

Something along the lines of:

<xsl:template match="link">
<a>
<xsl:attribute name="href"><xsl:value-of select="@url" /></xsl:attribute>
<xsl:value-of select="."/>
</a>
</xsl:template>

I think. It's been a while since I have done XSLT.

--
David Wengier

Sonork ID: 100.14177 - Ch00k

Re: XSLT to write links

MS le Roux 18-Mar-02 5:00

That part works if you only want to show links. What I was wondering was if there was a simple way to put the link inside the rest of the text, i.e. in my example the result I wanted was
<p>The <a href="earth.htm">earth</a> rotates.</p>

Re: XSLT to write links

Paul Watson 20-Mar-02 22:12

MarSCoZa wrote:
<para>The <link url="earth.htm">earth</link> rotates.</para>

Technically that kind of XML is not really valid.

e.g.

<para>
The
<link url="earth.htm">earth</link>
rotates.
</para>

The The and Rotates PCDATA sections are "floating".

The reason being that an element can either only contain PCDATA or child-elements. It cannot contain both. Think about creating a DTD which defines that... You cannot really. The DTD element definition cannot contain PCDATA and an elements name.

However, XML is quite forgiving in this case (which is strange considering its normally very unforgiving nature) and you can use the following XSL to transform it.

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
	<xsl:template match="/">
		<html>
			<head>
			</head>
			<body>				
				<xsl:apply-templates />
			</body>
		</html>
	</xsl:template>
	
	<xsl:template match="link">
		<a>
			<xsl:attribute name="href">
				<xsl:value-of select="@url" />
			</xsl:attribute>
			<xsl:value-of select="."/>
		
		</a>
	</xsl:template>
</xsl:stylesheet>
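
A side note on the mixed-content point: XML 1.0 does permit text and child elements inside the same element, and a DTD can declare it, for example <!ELEMENT para (#PCDATA | link)*>. XSLT's built-in rules copy text nodes through, which is why the stylesheet above keeps "The" and "rotates." around the link. The same idea can be sketched outside XSLT; the following is a minimal Python sketch using the standard xml.etree.ElementTree, where para_to_html is a made-up helper name and child elements other than link are simply skipped.

```python
import xml.etree.ElementTree as ET

def para_to_html(para):
    """Return a <p> element mirroring a <para> that has inline <link> children."""
    p = ET.Element("p")
    p.text = para.text                 # text before the first child element
    for child in para:
        if child.tag == "link":
            a = ET.SubElement(p, "a", href=child.get("url", ""))
            a.text = child.text        # the link caption
            a.tail = child.tail        # text that follows the link
    return p

source = ET.fromstring('<para>The <link url="earth.htm">earth</link> rotates.</para>')
html = ET.tostring(para_to_html(source), encoding="unicode")
print(html)
```

In ElementTree the surrounding text lives in the text and tail attributes, which play the role of the "floating" text nodes in a DOM.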

The key is templates. I used to hate them and thought they were these daft, never used bits of XSL. Until I figured out how they worked and went "oooohh yes!" :)

Enjoy!

regards,
Paul Watson
Bluegrass
Cape Town, South Africa

The greatest thing you'll ever learn is just to love, and to be loved in return - Moulin Rouge

Re: XSLT to write links

Atlantys 8-Apr-02 6:26

Paul Watson wrote:
The key is templates. I used to hate them and thought they were these daft, never used bits of XSL. Until I figured out how they worked and went "oooohh yes!"

:OMG:

I've never used XSLT for XML->HTML, but with XML->XML, templates are the god you bow down to and pray. It boggles my mind the thought of NOT using template matching in XSLT. :OMG:

Trying to write a filter

Christian Graus 13-Mar-02 9:41

I have an XML document which I need to turn into a new format, where a small portion of it gets put in <static> tags, the rest does not. There is a LOT of XML on this page.

I've got the first half done, easy enough. Now I want to write a filter which, instead of saying 'include these tags and children', says 'EXCLUDE these tags and children, include everything else'. How do I do that?

Christian

The tragedy of cyberspace - that so much can travel so far, and yet mean so little.

"I'm thinking of getting married for companionship and so I have someone to cook and clean." - Martin Marvinski, 6/3/2002

Re: Trying to write a filter

MS le Roux 13-Mar-02 19:08

Which language and parser are you using to process the XML?

Re: Trying to write a filter

Christian Graus 13-Mar-02 20:28

I'm using XSL, I got it working this afternoon - thanks.

Christian


Re: Trying to write a filter

Paul Watson 14-Mar-02 0:46

Christian Graus wrote:
I got it working this afternoon - thanks.

How did you do it? I might run into that problem one day and need to know how, thanks :)

regards,
Paul Watson
Bluegrass
Cape Town, South Africa

"The greatest thing you will ever learn is to love, and be loved in return" - Moulin Rouge

Sonork ID: 100.9903 Stormfront

Re: Trying to write a filter

Michael A. Barnhart 14-Mar-02 0:53

My first pass would have been to use the count(tag) function and only processed if it returned 0.


Re: Trying to write a filter

Christian Graus 14-Mar-02 3:12

The trick is that a more generic filter will only include items that were not matched by a more specific filter. I wrote specific filters for the tags I needed, then a generic filter, and it excluded all the items that the specific filters caught.
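
The actual stylesheet was not posted, so the following is only an analogue of the "exclude these tags and children, keep everything else" idea, written as a minimal Python sketch with the standard xml.etree.ElementTree rather than XSL. The function name filter_tree and the tag names in the sample are made up for illustration. In XSLT the usual counterpart would be a generic copy-everything template plus more specific, empty templates for the excluded elements.

```python
import xml.etree.ElementTree as ET

def filter_tree(elem, excluded):
    """Copy elem recursively, leaving out excluded tags and their subtrees."""
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        # The specific rule wins: excluded tags are dropped with their children.
        if child.tag not in excluded:
            out.append(filter_tree(child, excluded))
    return out

# Hypothetical sample: everything under <secret> should disappear.
source = ET.fromstring(
    "<doc><keep>a</keep><secret><keep>hidden</keep></secret><other>b</other></doc>"
)
filtered = ET.tostring(filter_tree(source, {"secret"}), encoding="unicode")
print(filtered)
```

Note that the nested keep element inside secret is dropped as well, because the whole subtree is skipped once its parent is excluded.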

If that's not clear LMK, I can post the code from work tomorrow.

Christian


Voice XML

Adrian Metcalfe 13-Mar-02 5:36

Anybody know how to use Voice XML?

What do I need, and how do I set it up?

Any info would be useful.
:confused:

Users.
Can't live with 'em, can't kill em!

Re: Voice XML

Jamie Hale 28-Mar-02 11:05

VoiceXML is just a standard. Go here and grab the specs. Then grab the parser and language of your choice and start writing a processor.

Of course, there might be ones out there. I'm going by the work we did about a year ago for a copy that used VoiceXML. We had to write our own processor. :(

validating XML in .NET

SimonS 13-Mar-02 0:22

I need to validate an XML document against a schema, but I'm not having much luck finding a complete example in MSDN.

This is the pseudo code of what I want to do:

XMLValidater xVal = new XMLValidater("my dtd, etc...");
XMLDocument xDoc = new XMLDocument("my XML doc");
if (xVal.Validate(xDoc))
    //can now use XML document in rest of system

Help would be cool.

Cheers,
Simon

X-5 452 rules.

Re: validating XML in .NET

Not Active 13-Mar-02 5:35

I haven't looked at it but isn't XmlValidatingReader what you need?
