|
Please do not post the same question in multiple forums; read the guidelines.
|
|
|
|
|
Hi there,
I'm a complete nube to this but I need a piece of code that will do the following:
Look inside a large XML file and extract all the entries of
Folder fullpath= (in this case Y:\HQ\)
SizeData Size= (in this case 172204891364)
Put them somewhere (bearing in mind that an Excel doc is limited to 52000 ish rows) in a form that lets me compare this iteration of the XML data with a previous one, matching each "Folder fullpath=" and comparing its "SizeData Size=" to get growth statistics for each folder
The XML data is in the form below; there is one for each folder and there are 10's of thousands of them.
<Folder fullpath="Y:\HQ\" IsFilesNode="0">
<Name>HQ</Name>
<Attributes>16</Attributes>
<LastAccessDate Low="1344632870" High="30030880"/>
<LastChangeDate Low="4221838626" High="30030879"/>
<CreationDate Low="4129018994" High="29882626"/>
<SizeData Size="172204891364" Allocated="173345047862" Wasted="1138287388" CDRom="172711966720" Files="443755" Folders="42448" Compression="1"/>
<FilesSizeData Size="0" Allocated="0" Wasted="0" CDRom="0" Files="0" Folders="3" Compression="1"/>
I will be able to get the comparison stuff sorted out, but my issue is getting the data out of the XML file in a form that I can manipulate.
Any help gratefully received..
|
|
|
|
|
nhsal69 wrote: I'm a complete nube to this but I need a piece of code that will do the following:
0) don't just ask for code, nobody is going to just give you this.
1) Look into the System.Xml namespace.
Some things to get you started:
System.Xml.XmlDocument
System.Xml.XmlElement
nhsal69 wrote: Put them somewhere (bearing in mind that an Excel doc is limited to 52000 ish rows)
2) Why even consider Excel? If this data (the comparison) doesn't need to be saved, just keep it in memory (a DataTable or something like it).
If it does need to be saved, at the very least use Access; SQL Server would be better.
nhsal69 wrote: The XML data is in the form below; there is one for each folder and there are 10's of thousands of them.
3) That's a lot of folders; take into account that processing all of them will most likely take a long time.
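As a rough illustration of point 2, here is a minimal sketch of keeping the extracted values in an in-memory DataTable. The module name, table name and column names are made up for the example; the sample row reuses the values from the question.

```vbnet
Imports System
Imports System.Data

Module FolderTableSketch
    Sub Main()
        ' Hypothetical table layout: one row per Folder element.
        Dim table As New DataTable("FolderSizes")
        table.Columns.Add("FullPath", GetType(String))
        table.Columns.Add("SizeBytes", GetType(Long))

        ' Sample row taken from the XML fragment in the question.
        table.Rows.Add("Y:\HQ\", 172204891364L)

        For Each row As DataRow In table.Rows
            Console.WriteLine(CStr(row("FullPath")) & " " & CLng(row("SizeBytes")).ToString())
        Next
    End Sub
End Module
```

The size column is declared as Long because the sizes in the sample are far larger than an Integer can hold.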
|
|
|
|
|
[quote]
0) don't just ask for code, nobody is going to just give you this.
[/quote]
fair point...
I have been looking at XML File Parsing in VB.NET and getting more and more confused. I can get the example to work, but I can't modify it to read the elements I'm after from the example above...
Getting both fields written into a table in SQL would be ideal..
Will look at your suggestions, ta...
|
|
|
|
|
Your XML file isn't built as neatly as it could be, but it's still doable.
Where are you stuck?
A bit more to get you going (assuming you posted the exact and full XML):
You read the XML in with 'XmlDocument.Load(path)'.
Your first element is 'Folder'.
This element has 2 attributes: 'fullpath' and 'IsFilesNode'.
Your second element is 'Name'.
This element has no attributes, so you'll have to read the 'InnerText' of the element to get your data.
...
This (together with the article) should be enough to find your answer.
Also note that XML is case-sensitive, so element and attribute names must match exactly when you read them.
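To make the attribute versus InnerText distinction concrete, here is a minimal sketch. It assumes a single, properly closed Folder element loaded from an inline string rather than from your file; the module and variable names are only for illustration.

```vbnet
Imports System
Imports System.Xml

Module AttributeVersusInnerText
    Sub Main()
        ' Inline sample (an assumption): one closed Folder element, so no file is needed.
        Dim sampleXml As String = _
            "<Folder fullpath=""Y:\HQ\"" IsFilesNode=""0"">" & _
            "<Name>HQ</Name>" & _
            "</Folder>"

        Dim doc As New XmlDocument()
        doc.LoadXml(sampleXml)

        ' 'fullpath' is an attribute of the Folder element itself.
        Dim folder As XmlElement = doc.DocumentElement
        Dim fullPath As String = folder.GetAttribute("fullpath")

        ' 'Name' is a child element with no attributes, so its data is in InnerText.
        Dim nameNode As XmlNode = folder.SelectSingleNode("Name")
        Dim folderName As String = nameNode.InnerText

        Console.WriteLine(fullPath & " " & folderName)
    End Sub
End Module
```

Note that the names passed to GetAttribute and SelectSingleNode use exactly the same casing as the XML.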
|
|
|
|
|
Hey there,
thanks for the info. I have got it working, in the sense that I can read the Root "Type" value and manipulate it, but for anything further down the XML file I get no output.
Below I have copied the full XML, as I'm guessing that all the <Root>, <Application> etc. is stopping my code from finding the <Folder> and <SizeData> elements that I'm actually after.
Is there a way of ignoring all the other stuff and just getting what I'm after, I dunno masking it out or something??
Thanks again...
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Root Type="TRoot">
<Application>TreeSize Professional</Application>
<Version>5.2.3 (5.2.3.505)</Version>
<Date>15/10/2009 12:30:08</Date>
<Path>C:\</Path>
<ExcludePatterns>
<pattern>*~SNAPSHOT*</pattern>
</ExcludePatterns>
<IncludePatterns>
<pattern>*</pattern>
</IncludePatterns>
<ArchiveBitFilesOnly>0</ArchiveBitFilesOnly>
<ExcludeOfflineFiles>0</ExcludeOfflineFiles>
<CreatedPastDaysOnly>0</CreatedPastDaysOnly>
<Filesystem>NTFS</Filesystem>
<BytesPerCluster>4096</BytesPerCluster>
<Compressed>0</Compressed>
<FileBasedCompression>-1</FileBasedCompression>
<FoldersOccupySpace>0</FoldersOccupySpace>
<IsCompared>0</IsCompared>
<Title>Drive: Local Disk (C:)</Title>
<UserDefinedClusterSize>0</UserDefinedClusterSize>
<UsedBytesOnDrive>80023715840</UsedBytesOnDrive>
<FreeBytesOnDrive>7445061632</FreeBytesOnDrive>
<DoCreateFileAges FileAgesDateType="1">-1</DoCreateFileAges>
<Folder fullpath="C:\" IsFilesNode="0">
<Name>C:\</Name>
<Attributes>0</Attributes>
<LastAccessDate Low="322843223" High="30035338"/>
<LastChangeDate Low="322843223" High="30035338"/>
<CreationDate Low="0" High="0"/>
<SizeData Size="74609238013" Allocated="72396726232" Wasted="234062016" CDRom="74727184384" Files="116733" Folders="12272" Compression="3"/>
<FilesSizeData Size="18745407271" Allocated="18745561088" Wasted="153817" CDRom="18745475072" Files="69" Folders="0" Compression="1"/>
<Folder fullpath="C:\" IsFilesNode="-1">
|
|
|
|
|
nhsal69 wrote: Is there a way of ignoring all the other stuff and just getting what I'm after, I dunno masking it out or something??
Yes, by using the XmlDocument class.
Alright, here's a small code snippet.
Dim odoc as new system.xml.xmldocument
odoc.load("path to the xml file goes here")
dim fullpath as string=""
dim node as system.xml.xmlnode= odoc.selectnode("Folder")
fullpath = foldernode.attributes.getnameditem("fullpath")
dim name as string
node = odoc.selectnode("Name")
name=node.innertext
That should get you the fullpath and name in the xml document.
I didn't test or compile the code (it's from memory) so small adjustments might be in order.
|
|
|
|
|
That's great, ta for that....
One question though..
I get an error when compiling on this line:
Dim node As System.Xml.XmlNode = odoc.SelectNodes("Folder")
now the error is "value of type System.xml.XmlNodeList cannot be converted to System.Xml.XmlNode"
followed by:
fullpath = foldernode.Attributes.GetNamedItem("fullpath")
with an error of "System.Xml.XmlNode cannot be converted to String".
Am I missing something here? Can this element be converted to a variable?
|
|
|
|
|
nhsal69 wrote: Dim node As System.Xml.XmlNode = odoc.SelectNodes("Folder")
it's 'SelectSingleNode'
nhsal69 wrote: fullpath = foldernode.Attributes.GetNamedItem("fullpath")
it's 'foldernode.attributes.getnameditem("fullpath").value'
(did say it was from memory, guess my memory is getting holes in it)
|
|
|
|
|
Thanks again, I would like to point out that your holy memory contains a lot more than my vaguely complete one...
Right this is what I have so far:
Try
    Dim odoc As New System.Xml.XmlDocument
    odoc.Load("c:\file.xml")
    Dim fullpath As String = ""
    Dim node As System.Xml.XmlNode = odoc.SelectSingleNode("Folder")
    fullpath = foldernode.Attributes.GetNamedItem("fullpath").Value
    Dim name As String
    node = odoc.SelectSingleNode("Name")
    name = node.InnerText
    Console.Write("Folder: " & fullpath & " FullPathValue: " & fullpath & " SizeValue: " & name)
    Console.Write(vbCrLf)
    Console.Read()
Catch errorVariable As Exception
    Console.Write(errorVariable.ToString())
    Console.Read()
End Try
Ignore the Console.Write stuff; I just want it to write something to the console so that I can see it working...
However, this gives a "System.NullReferenceException: Object reference not set to an instance of an object." error for the "foldernode" in the "foldernode.Attributes.Ge....." line.
Any thoughts as to why?
|
|
|
|
|
replace 'foldernode' with node
|
|
|
|
|
Sorry, I forgot to state in the last post that I had thought of that and tried running the code, but got the same error in the console window..
|
|
|
|
|
You don't have a declaration for 'foldernode', so you shouldn't even be able to compile this code.
And you still have to replace it with 'node', since that is the variable you load the "Folder" node into.
If it still gives you an error, that means that
Dim node As System.Xml.XmlNode = odoc.SelectSingleNode("Folder")
isn't working, so either the "Folder" node isn't found or something else is wrong.
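One likely reason for the node not being found: when SelectSingleNode is called on the document itself, a plain "Folder" path only matches a top-level element with that name, and here the top-level element is Root. A minimal sketch, assuming the Folder elements sit directly under Root as in the posted sample (the inline XML and the module name are made up for the example):

```vbnet
Imports System
Imports System.Xml

Module FindFolderNode
    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml( _
            "<Root Type=""TRoot"">" & _
            "<Folder fullpath=""C:\"" IsFilesNode=""0""><Name>C:\</Name></Folder>" & _
            "</Root>")

        ' Relative to the document, only Root is a direct child, so this finds nothing.
        Dim missing As XmlNode = doc.SelectSingleNode("Folder")
        Console.WriteLine(missing Is Nothing)

        ' Spelling out the path from the root element does find the node.
        Dim found As XmlElement = TryCast(doc.SelectSingleNode("/Root/Folder"), XmlElement)
        If found IsNot Nothing Then
            Console.WriteLine(found.GetAttribute("fullpath"))
        End If
    End Sub
End Module
```

The XPath "//Folder" would also work and matches Folder elements at any depth, which may be handy if the nesting is not known in advance.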
|
|
|
|
|
Ok, I have altered the code as follows:
Dim fullpath As String = ""
Dim node As System.Xml.XmlNode = odoc.SelectSingleNode("Root")
fullpath = node.Attributes.GetNamedItem("Type").Value
the 2nd line in the XML code after "<?xml version="1.0" encoding="UTF-8" standalone="no"?>" is:
<Root Type="TRoot">
When I run the altered code I get the correct output of "TRoot" in the console window. However, when I change the code to look for "Folder", it appears as though the code loads the XML, looks at the first line, doesn't find "Folder", and then fails with the "Object reference no...." message.
Does this make any sense to you?
|
|
|
|
|
This code:
Try
    Dim odoc As New System.Xml.XmlDocument
    odoc.Load("c:\temp\text.xml")
    Dim node As System.Xml.XmlNode = odoc.SelectSingleNode("Folder")
    Dim fullpath As String = node.Attributes.Item("fullpath").Value
    node = odoc.SelectSingleNode("Name")
    Dim name As String = node.InnerText
    Label1.Text = fullpath & " " & name
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try
gives me the following error (loosely translated): "Unexpected end of file. The following elements are not closed: Folder, Folder, Root. Line 36, position 42."
The error is pretty clear: your XML file isn't well formed. The root node isn't closed and neither are your Folder nodes.
Without well-formed XML you won't be able to read the file with the XML classes.
If you can't correct the XML files, you'll have to fall back on text readers (System.IO.TextReader) and regular expressions, and that's going to get very complicated very fast.
|
|
|
|
|
Sorry, my bad...
The full XML file is over 10 MB, so I didn't want to upload the whole thing; the above is a sample of the first x lines, up to the point where a couple of "Folder" elements are visible...
I didn't think uploading the full thing would be sensible, but I should have told you....
Anyway, try this vastly cut-down version of the XML, which is well formed:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Root Type="TRoot">
<Application>TreeSize Professional</Application>
<Version>5.2.3 (5.2.3.505)</Version>
<Date>15/10/2009 12:30:08</Date>
<Path>C:\</Path>
<ExcludePatterns>
<pattern>*~SNAPSHOT*</pattern>
</ExcludePatterns>
<IncludePatterns>
<pattern>*</pattern>
</IncludePatterns>
<ArchiveBitFilesOnly>0</ArchiveBitFilesOnly>
<ExcludeOfflineFiles>0</ExcludeOfflineFiles>
<CreatedPastDaysOnly>0</CreatedPastDaysOnly>
<Filesystem>NTFS</Filesystem>
<BytesPerCluster>4096</BytesPerCluster>
<Compressed>0</Compressed>
<FileBasedCompression>-1</FileBasedCompression>
<FoldersOccupySpace>0</FoldersOccupySpace>
<IsCompared>0</IsCompared>
<Title>Drive: Local Disk (C:)</Title>
<UserDefinedClusterSize>0</UserDefinedClusterSize>
<UsedBytesOnDrive>80023715840</UsedBytesOnDrive>
<FreeBytesOnDrive>7445061632</FreeBytesOnDrive>
<DoCreateFileAges FileAgesDateType="1">-1</DoCreateFileAges>
<Folder fullpath="C:\" IsFilesNode="0">
<Name>C:\</Name>
<Attributes>0</Attributes>
<LastAccessDate Low="322843223" High="30035338"/>
<LastChangeDate Low="322843223" High="30035338"/>
<CreationDate Low="0" High="0"/>
<SizeData Size="74609238013" Allocated="72396726232" Wasted="234062016" CDRom="74727184384" Files="116733" Folders="12272" Compression="3"/>
<FilesSizeData Size="18745407271" Allocated="18745561088" Wasted="153817" CDRom="18745475072" Files="69" Folders="0" Compression="1"/>
<Folder fullpath="C:\" IsFilesNode="-1">
<Name>[Files]</Name>
<Attributes>0</Attributes>
<LastAccessDate Low="1126896907" High="30035336"/>
<LastChangeDate Low="1142297283" High="30035337"/>
<CreationDate Low="0" High="0"/>
<SizeData Size="18745407271" Allocated="18745561088" Wasted="153817" CDRom="18745475072" Files="69" Folders="0" Compression="1"/>
<FilesSizeData Size="18745407271" Allocated="18745561088" Wasted="153817" CDRom="18745475072" Files="69" Folders="0" Compression="1"/>
<FileAgeDistribution Days_1="1" Size_1="2514485248" Days_7="4" Size_7="20480" Days_30="14" Size_30="168693760" Days_182="19" Size_182="232947712" Days_365="10" Size_365="15804166144" Days_730="18" Size_730="24813568" Days_2147483647="3" Size_2147483647="434176"/>
</Folder>
</Folder>
</Root>
|
|
|
|
|
This code:
Try
    Dim odoc As New System.Xml.XmlDocument
    odoc.Load("c:\temp\text.xml")
    Dim oXmlLog As System.Xml.XmlElement
    Dim text As String = ""
    For Each oXmlLog In odoc.SelectNodes("Root")
        Dim node As System.Xml.XmlElement
        For Each node In oXmlLog.SelectNodes("Folder")
            Dim fullpath As String = node.Attributes.GetNamedItem("fullpath").Value
            Dim subnode = node.SelectSingleNode("Name")
            Dim name As String = subnode.InnerText
            text &= fullpath & " " & name & Environment.NewLine
        Next
    Next
    Label1.Text = text
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try
will give you the fullpath and name.
Note that you made a small error in your XML, so I used this XML instead:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Root Type="TRoot">
<Application>TreeSize Professional</Application>
<Version>5.2.3 (5.2.3.505)</Version>
<Date>15/10/2009 12:30:08</Date>
<Path>C:\</Path>
<ExcludePatterns>
<pattern>*~SNAPSHOT*</pattern>
</ExcludePatterns>
<IncludePatterns>
<pattern>*</pattern>
</IncludePatterns>
<ArchiveBitFilesOnly>0</ArchiveBitFilesOnly>
<ExcludeOfflineFiles>0</ExcludeOfflineFiles>
<CreatedPastDaysOnly>0</CreatedPastDaysOnly>
<Filesystem>NTFS</Filesystem>
<BytesPerCluster>4096</BytesPerCluster>
<Compressed>0</Compressed>
<FileBasedCompression>-1</FileBasedCompression>
<FoldersOccupySpace>0</FoldersOccupySpace>
<IsCompared>0</IsCompared>
<Title>Drive: Local Disk (C:)</Title>
<UserDefinedClusterSize>0</UserDefinedClusterSize>
<UsedBytesOnDrive>80023715840</UsedBytesOnDrive>
<FreeBytesOnDrive>7445061632</FreeBytesOnDrive>
<DoCreateFileAges FileAgesDateType="1">-1</DoCreateFileAges>
<Folder fullpath="C:\" IsFilesNode="0">
<Name>C:\</Name>
<Attributes>0</Attributes>
<LastAccessDate Low="322843223" High="30035338"/>
<LastChangeDate Low="322843223" High="30035338"/>
<CreationDate Low="0" High="0"/>
<SizeData Size="74609238013" Allocated="72396726232" Wasted="234062016" CDRom="74727184384" Files="116733" Folders="12272" Compression="3"/>
<FilesSizeData Size="18745407271" Allocated="18745561088" Wasted="153817" CDRom="18745475072" Files="69" Folders="0" Compression="1"/>
</Folder>
<Folder fullpath="C:\" IsFilesNode="-1">
<Name>[Files]</Name>
<Attributes>0</Attributes>
<LastAccessDate Low="1126896907" High="30035336"/>
<LastChangeDate Low="1142297283" High="30035337"/>
<CreationDate Low="0" High="0"/>
<SizeData Size="18745407271" Allocated="18745561088" Wasted="153817" CDRom="18745475072" Files="69" Folders="0" Compression="1"/>
<FilesSizeData Size="18745407271" Allocated="18745561088" Wasted="153817" CDRom="18745475072" Files="69" Folders="0" Compression="1"/>
<FileAgeDistribution Days_1="1" Size_1="2514485248" Days_7="4" Size_7="20480" Days_30="14" Size_30="168693760" Days_182="19" Size_182="232947712" Days_365="10" Size_365="15804166144" Days_730="18" Size_730="24813568" Days_2147483647="3" Size_2147483647="434176"/>
</Folder>
</Root>
Note where the Folder nodes are closed: the first Folder is closed before the second one opens.
|
|
|
|
|
Thanks again....
When I run this I get as an output:
C:\ C:\
So this is outputting one iteration of the loop, but I would expect to see:
C:\ C:\
C:\ [Files]
as there are two instances of "Folder" each containing a "Name"...
Stepping through the code seems to suggest that the
For Each node In oXmlLog.SelectNodes("Folder")
is only being processed once and is not then moving on to the second instance of "Folder"...
Is this because the second instance is a sub-node of the first??
|
|
|
|
|
nhsal69 wrote: Is this because the second instance is a Sub of the first??
Yes
In the XML I posted, I got around that by closing the first Folder node before the second one opens.
If the real XML really is like that (Folder nodes inside Folder nodes), you'll have to use recursion.
(Have a method call itself to process the Folder nodes until it finds no more Folder nodes.)
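A minimal sketch of that recursion, assuming well-formed XML in which Folder elements are nested inside their parent Folder. ProcessFolder is a made-up helper name and the inline XML is a cut-down stand-in for the real file:

```vbnet
Imports System
Imports System.Xml

Module WalkFolders
    ' Hypothetical helper: prints one Folder element, then calls itself for each nested Folder.
    Sub ProcessFolder(ByVal folder As XmlElement)
        Dim fullPath As String = folder.GetAttribute("fullpath")
        Dim nameNode As XmlNode = folder.SelectSingleNode("Name")
        Dim folderName As String = ""
        If nameNode IsNot Nothing Then folderName = nameNode.InnerText
        Console.WriteLine(fullPath & " " & folderName)

        ' "Folder" here only matches direct children, so each level is visited exactly once.
        For Each child As XmlElement In folder.SelectNodes("Folder")
            ProcessFolder(child)
        Next
    End Sub

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml( _
            "<Root>" & _
            "<Folder fullpath=""C:\""><Name>C:\</Name>" & _
            "<Folder fullpath=""C:\""><Name>[Files]</Name></Folder>" & _
            "</Folder>" & _
            "</Root>")

        ' Start with the Folder elements directly under the root element.
        For Each topFolder As XmlElement In doc.DocumentElement.SelectNodes("Folder")
            ProcessFolder(topFolder)
        Next
    End Sub
End Module
```

If the nesting itself doesn't matter, selecting with the XPath "//Folder" returns every Folder element at any depth in one flat list, which avoids the recursion altogether.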
|
|
|
|
|
Cool. I have tried to get some of the recursion sorted out; I know what I want it to do, but I'm not sure of the correct form. I have come up with this so far:
Dim odoc As New System.Xml.XmlDocument
odoc.Load("C:\test\10g_1.xml")
Dim oXmlLog As System.Xml.XmlElement
Dim text As String = ""
Dim FolderSubNode As System.Xml.XmlElement
Dim FolderExist As Boolean
For Each oXmlLog In odoc.SelectNodes("Root")
    Dim node As System.Xml.XmlElement
    For Each node In oXmlLog.SelectNodes("Folder")
        FolderExist = True
        Dim fullpath As String = node.Attributes.GetNamedItem("fullpath").Value
        Dim subnode = node.SelectSingleNode("Name")
        Dim name As String = subnode.InnerText
        text &= fullpath & " " & name & Environment.NewLine
        Do Until FolderExist = False
            For Each node In oXmlLog.SelectNodes("Folder")
                If node = "folder" Then FolderExist = True
            Next
            Dim fullpath As String = node.Attributes.GetNamedItem("fullpath").Value
            Dim subnode = node.SelectSingleNode("Name")
            Dim name As String = subnode.InnerText
            text &= fullpath & " " & name & Environment.NewLine
        Loop
    Next
Next
Will have a look again later, but any suggestions as to how to go about this would be great..
Cheers
|
|
|
|
|
Thanks for your help. I have decided that I'll script something to change the XML file so that it is correctly formed, and therefore I won't need to do the recursive searches (which are proving tricky because of the dodgy formatting)...
I'll post the final code when I get it together..
Thanks again.
|
|
|
|
|
Well, I got stuck on the recursion, so I got a script to tidy up the XML file so that it is reasonably well formed..
But now I have an issue getting one value out:
<sizedata Size= "VALUE" IGNORE THE REST OF THEM/>
here is a fully formed, but shortened XML file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Root Type="TRoot">
<Date>15/10/2009 12:30:08</Date>
<Folder fullpath="C:\" IsFilesNode="0">
<Name>C:\</Name>
<SizeData Size="74609238013" Allocated="72396726232" Wasted="234062016" CDRom="74727184384" Files="116733" Folders="12272" Compression="3"/>
</Folder>
<Folder fullpath="C:\" IsFilesNode="-1">
<Name>[Files]</Name>
<SizeData Size="18745407271" Allocated="18745561088" Wasted="153817" CDRom="18745475072" Files="69" Folders="0" Compression="1"/>
</Folder>
<Folder fullpath="C:\TEMP_\" IsFilesNode="0">
<Name>TEMP_ALAN</Name>
<SizeData Size="15469174140" Allocated="15489126222" Wasted="19886724" CDRom="15478267904" Files="9570" Folders="1832" Compression="1"/>
</Folder>
<Folder fullpath="C:\TEMP_\mp3\" IsFilesNode="0">
<Name>mp3</Name>
<SizeData Size="11504514137" Allocated="11513361814" Wasted="8829863" CDRom="11508457472" Files="4561" Folders="510" Compression="1"/>
</Folder>
</Root>
The code I have is below:
Try
    Dim odoc As New System.Xml.XmlDocument
    odoc.Load("C:\test\test.xml")
    Dim oXmlLog As System.Xml.XmlElement
    Dim text As String = ""
    For Each oXmlLog In odoc.SelectNodes("Root")
        Dim node As System.Xml.XmlElement
        For Each node In oXmlLog.SelectNodes("Folder")
            Dim fullpath As String = node.Attributes.GetNamedItem("fullpath").Value
            Dim SizeData As String = node.Attributes.GetNamedItem("SizeData").Value
            Dim subnode = node.SelectSingleNode("Name")
            Dim name As String = subnode.InnerText
            Dim Date_ As XmlElement = odoc.DocumentElement
            Dim Date_time As XmlNodeList = Date_.ChildNodes
            Dim Date__time = (Date_time(0).InnerText)
            text &= Date__time & " " & fullpath & " " & name & SizeData & Environment.NewLine
        Next
    Next
    Console.Write(text)
    Console.Read()
Catch ex As Exception
    Console.Write(ex.ToString())
    Console.Read()
End Try
But I can't get the SIZE from SIZEDATA out and into a variable, can you help???
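One possible approach, sketched on the assumption that SizeData is a child element of Folder (as in the sample above) and not an attribute of it: select the SizeData child first, then read its Size attribute. The inline XML is a cut-down copy of the sample, and the module and variable names are only for illustration.

```vbnet
Imports System
Imports System.Xml

Module ReadSizeData
    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml( _
            "<Root Type=""TRoot"">" & _
            "<Folder fullpath=""C:\"" IsFilesNode=""0"">" & _
            "<Name>C:\</Name>" & _
            "<SizeData Size=""74609238013"" Allocated=""72396726232""/>" & _
            "</Folder>" & _
            "</Root>")

        For Each folder As XmlElement In doc.SelectNodes("/Root/Folder")
            ' SizeData is a child element, so it has to be selected before its attributes can be read.
            Dim sizeNode As XmlElement = TryCast(folder.SelectSingleNode("SizeData"), XmlElement)
            If sizeNode IsNot Nothing Then
                ' The sizes are too large for Integer, so parse them as Long.
                Dim sizeBytes As Long = Long.Parse(sizeNode.GetAttribute("Size"))
                Console.WriteLine(folder.GetAttribute("fullpath") & " " & sizeBytes.ToString())
            End If
        Next
    End Sub
End Module
```

In the posted code, node.Attributes.GetNamedItem("SizeData") looks for an attribute called SizeData on the Folder element, finds none, and returns Nothing, which would explain the failure on that line.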
|
|
|
|
|
I have done a bit of VBA before.
Any suggestions for code to find duplicate messages in Microsoft Outlook 2007?
Thanks!
Jwalant Natvarlal Soneji, BE IT, India
|
|
|
|
|
There's a bunch of different ways to do this. But, how you do it is going to depend on what you mean by "duplicate", how your mailbox is organized (folders and such,) and what you want to do with the duplicate messages.
|
|
|
|
|
Thanks.
I would like to search for duplicate mails in the complete mailbox and delete all except 1 mail.
Jwalant Natvarlal Soneji, BE IT, India
|
|
|
|