|
You really want to look for the patient using a better unique Id than name, and then you want to keep a reference to the existing patient node. This code will remove the name node only, which is obviously not what you want. It's probably easier to delete the existing record than to write code that either updates or edits, but either way, name is not a good unique Id, and you want to delete the patient record, not the name node.
Christian Graus - Microsoft MVP - C++
Metal Musings - Rex and my new metal blog
|
|
|
|
|
I am rather new to XML, so I tried your code as an exercise.
I added some logging to see intermediate states.
It did remove the entire patient record from xmlDoc, as you would have hoped.
To change the file, I guess you would need to terminate with xmlDoc.Save
Luc Pattyn
|
|
|
|
|
I have a function that will divide a loop iteration among threads. Each thread will be given its portion of work to do by giving it the array to iterate through and the start and end index that will be the range of indexes it will work on. I'm also assuming that I will not need to synchronize the access to the array since it works on each variable independently but I may be wrong. Here is what I have.
public Range[] ReturnIndexRanges(long[] indexes)
{
Range[] ranges = null;
Thread[] threads = new Thread[Environment.ProcessorCount];
long length = ((end - start) + 1) / Environment.ProcessorCount;
_start = start;
foreach (Thread t in threads)
{
t = new Thread(new ParameterizedThreadStart(ReturnRangesOfIndexes));
t.Start(new ThreadedReturnRangesOfIndexData(ranges, indexes, _start, length - 1));
_start += length;
length = end - length + 1;
}
return JoinDividedRanges(ranges);
}
private class ThreadedReturnRangesOfIndexData
{
public ThreadedReturnRangesOfIndexData(Range[] ranges, long[] indexes, long start, long end)
{
ranges_out = ranges;
indexes_in = indexes;
this.start = start;
this.end = end;
}
public long start, end;
public long[] indexes_in;
public Range[] ranges_out;
}
The ThreadedReturnRangesOfIndexData class is just a small class that hold some information such as the array of the indexes being passed into the method and the array of Ranges that will be returned. It will be like ref but since it is an array I do not need to add ref or out to it because an array is a reference type.
What I need is to figure out how to divide the indexes up evenly in the ReturnIndexRanges method at the top. This is my first attempt at creating a multi-threaded program so I would appreciate some tips if you have any. Should I just forget about threading?
█▒▒▒▒▒██▒█▒██
█▒█████▒▒▒▒▒█
█▒██████▒█▒██
█▒█████▒▒▒▒▒█
█▒▒▒▒▒██▒█▒██
|
|
|
|
|
Hello again,
I have some thoughts about this:
1) I hope you dont go through all this if ProcessorCount==1
or maybe you should, just to make sure it is also correct in that case ?
2) in my experience it works a lot easier when you never use end, but instead
use end+1 (saves a lot of mistakes, and it fits the for-loop paradigma).
3) I would not try to divide the job in exactly even parts, when you split
100 in 48+52 it won't matter that much.
4) I would be very careful not to have data that resides in the same cache line
(more precisely in addresses that differ by less than the cache line size)
be written by different processors. It would cause a lot of invalidate,
flush and reload operations, and would be devastating for performance.
And yes this is hard to achieve in a high-level language.
I hope but am not absolutely sure, arrays are always allocated at
an address that is a multiple of a sufficiently high power of 2
(say a multiple of 256 B). You can check this somewhat by experimenting,
better is to have it specd somewhere, but I dont recall having seen that.
5) 3+4 together means: I would, still thinking of arrays, split on
boundaries that correspond to 256 B or more, hence round up to 64
when dealing with 4B ints, etc.
6) I do recall some Intel articles that suggested making structs artificially
larger when you can afford it, just to achieve my point 4 (e.g. for control
data, such as task control blocks).
One of them showed how you can really kill a multi-processor system: have
two threads do spin locks on 2 data items that belong to the same cache line !
7) on the other hand, if you over-align your data, you increase the possibility
of cache trashing; example: if you need repetitive access to only 1000 bytes,
but all these bytes are at a stride of 1KB, they would constantly miss in
a cache even as big as 2MB, since cache associativity nowadays is something
like 8-way, so there would only be 8 cache line candidates to cache the
requested data (the cache line candidate in each way being determined by
the lowest address bits (but excluding those that correspond to the
cache line width).
8) in conclusion, I suggest you use variables for your basic parameters
(such as number of processors, estimated safe allignment (my 256 B above), etc.
And once you have it running, just do some more experiments with slightly
different values for those variables, just to see what really helps.
Good luck, and please keep posting your progress.
Luc Pattyn
|
|
|
|
|
Well thanks for the information. I think I will stick to the single threaded version of my method until I do more research. I didn't even think about the cache and aligning the data. I wanted to get the absolute max performance I could but it seems that will be much more complex than I thought it would. Here is the single threaded version. Its much simpler.
/// <summary>
/// Finds the ranges that may be in a sorted array of indexes. A range is one or more indexes that increment
/// by one, an array containing 6,7,8,10,400,401,402 contains three ranges 6-8, 10, and 400-402.
/// </summary>
/// <param name="indexes">A sorted array of indexes to search.</param>
/// <returns>An array of Ranges.</returns>
public Range[] FindIndexRanges(long[] indexes)
{
if (indexes.LongLength == 0)
return new Range[0];
if (indexes.LongLength == 1)
return new Range[] { new Range(indexes[0], indexes[0]) };
Range[] ranges = new Range[1];
bool resize = false; //this should be true after 1 range has been found
long firstIndex = indexes[0];
long lastIndex = indexes[0];
for (long i = 1; i < indexes.LongLength; i++)
{
if (indexes[i] == indexes[i - 1] + 1)
{
lastIndex++;
if (i != indexes.LongLength - 1)
continue;
}
if (resize)
Array.Resize<Range>(ref ranges, ranges.Length + 1);
ranges[ranges.Length - 1] = new Range(firstIndex, lastIndex);
firstIndex = lastIndex = indexes[i];
resize = true;
}
return ranges;
}
█▒▒▒▒▒██▒█▒██
█▒█████▒▒▒▒▒█
█▒██████▒█▒██
█▒█████▒▒▒▒▒█
█▒▒▒▒▒██▒█▒██
|
|
|
|
|
My first impression is the optimization requires 3 steps:
1. algorithm optimization, if any
(fits another message board!)
2. regular coding optimization
3. advanced optimization (such as multi-threading)
1. some questions:
- what can you tell about expected average range length ?
- is er an absolute max range length (or an extremely unlikely range length) ?
- do you need ranges in sequential order ?
- how harmful would it be to have a result array with a known length
(=useful data) but an exagerated "capacity" ?
2. I am not familiar with Array.Resize() but am very suspicious about it.
Please run similar code with an ArrayList or so, something that can
grow in chunks much larger than 1. I am afraid right now your CPU spends
a lot of time in growing an array !
3. keep this for later, indeed.
Luc Pattyn
|
|
|
|
|
Luc Pattyn wrote: - what can you tell about expected average range length ?
The average range length will be the length of the text you select in Word or notepad, this is going into a hex editor.
Luc Pattyn wrote: - is er an absolute max range length (or an extremely unlikely range length) ?
If you mean the length between the range's minvalue and maxvalue then it would be long.MaxValue + 1.
Luc Pattyn wrote: - do you need ranges in sequential order ?
Yes, if they are not in order then a copy and paste operation would be a impossible with multiple selections.
Luc Pattyn wrote: - how harmful would it be to have a result array with a known length
(=useful data) but an exagerated "capacity" ?
I would have to set the unused ranges values to -1 to signify that it is not needed. Why would I need to set it to a known length?
Luc Pattyn wrote: I am not familiar with Array.Resize() but am very suspicious about it.
Please run similar code with an ArrayList or so, something that can
grow in chunks much larger than 1. I am afraid right now your CPU spends
a lot of time in growing an array !
Well unless the user selects many items and that are separated(which would cause more than one range to be created) then it could become a performance killer. This is unlikely since FindIndexRanges is not used in the main algorithms, it is a programmer's convenience than can become very useful when using the public API around my hex editor control. I could only imagine a search selecting a million or few hundred thousand items (which are separated like [....] ... [..] ..... [.......]) in a large file and then the user/programmer somehow creating indexes for his needs and then needing to convert them back to ranges.
[you said]
"I forgot one question: what is the source of your array ?
I mean, if there already is something that performs the task
of ordering a lot of numbers and then putting them in an array, seems like
your range-finding requirement performance-wise should be integrated with it !"
When a user selects an a range of data the range will be added to a List<range>. When adding them range it will determan the position to insert using a binaryseach and a bitwise compliment operation. They will always be sorted from the start. No sort methods will need to be called. The source of the array may come from a method long[]GetSelectedIndexes which converted the ranges into an array of indexes. Although I do also have a method called Range[]GetSelectedIndexeRanges which is more likely to be called more than the index array version. Also the indexes could come from anywhere as FindIndexRanges will be exposed to the programmer using the control. If GetSelectedIndexes is called it will return the sorted indexes that are selected. They are already sorted as I said, GetSelectedIndexes uses the Ranges to create the index array.
█▒▒▒▒▒██▒█▒██
█▒█████▒▒▒▒▒█
█▒██████▒█▒██
█▒█████▒▒▒▒▒█
█▒▒▒▒▒██▒█▒██
|
|
|
|
|
I am still confused about your overall intentions.
I know the data is text (not memory blocks) in an existing, probably
unmodifiable, app. But then, I dont follow yet. It would help a lot
if you could sketch a more global view.
when you say " They will always be sorted from the start. No sort methods will need to be called." I understand you have code that adds a new number to
an existing, ordered, collection of numbers. Well that is exactly my point,
assume the existing collection of numbers already is range oriented
(which is true for an empty collection),
then you need to add one number at a time to such range-oriented collection,
which should not be to hard:
- locate in existing list (ur already doing that)
- if fits at end of existing range, enlarge that range
- else if fits at beginning of next existing range, enlarge that range
- else insert new range with length 1
I hope this makes sense...
Regards,
Luc Pattyn
|
|
|
|
|
Luc Pattyn wrote:
when you say " They will always be sorted from the start. No sort methods will need to be called." I understand you have code that adds a new number to
an existing, ordered, collection of numbers. Well that is exactly my point,
assume the existing collection of numbers already is range oriented
(which is true for an empty collection),
then you need to add one number at a time to such range-oriented collection,
which should not be to hard:
- locate in existing list (ur already doing that)
- if fits at end of existing range, enlarge that range
- else if fits at beginning of next existing range, enlarge that range
- else insert new range with length 1
I hope this makes sense...
I am doing that in the SelectData method. Perhaps I could email you the source code to HDPC 1.0 (my previous version of the hex editor control) and the new HDPC 2.0 (which I am just getting started on) If you wish to view the source then take a look at the Data class. The Data class is where all the complicated operations are. You will then understand what FindIndexRanges is used for if you look to see how it is used in the actual hex editor that goes with HDPC 1.0. Version 1.0 is very slow although I was attempting to keep simplicity a number one priority in version 1.0 it became a performance nightmare.
FindIndexRanges may not be needed anymore in the new version, I want to keep it for compatibility. I am still in the pre-alpha stage of development for HDPC 2.0 and somethings may be removed and changed. I may either remove the method(unlikely) or mark is as obsolete since I have new API that does not kill performance, although I cannot forsee how this control will be used if at all I want to allow as much flexibility as possible. The purpose of FindIndexRanges is to take an array of numbers, (it doesn't matter from where) and convert it to a range array. I cannot assume that those numbers ARE the selected data indexes.
█▒▒▒▒▒██▒█▒██
█▒█████▒▒▒▒▒█
█▒██████▒█▒██
█▒█████▒▒▒▒▒█
█▒▒▒▒▒██▒█▒██
|
|
|
|
|
Sure, full code could help, and I am willing to look at it,
but remember I am unfamiliar with your application domain, I do not know
what it is you want to achieve at the functional level.
Regards,
Luc Pattyn
|
|
|
|
|
There is full documentation, at least almost full for version 1.0. Is it possible to send a zip through codeproject?
█▒▒▒▒▒██▒█▒██
█▒█████▒▒▒▒▒█
█▒██████▒█▒██
█▒█████▒▒▒▒▒█
█▒▒▒▒▒██▒█▒██
|
|
|
|
|
dont know about mail thru CP, but I have mailed my e-mail adr to you.
where do I find the doc on DHPC, is there a web site or something ?
regards,
Luc Pattyn
|
|
|
|
|
Luc Pattyn wrote: where do I find the doc on DHPC, is there a web site or something ?
The source code and documentation is being emailed to you now. I have not created a web site for HDPC although I will create an article for it when it is completed or at least in beta. The documentation is in a .pdf document, class view diagrams (.png), readme.txt files, and Visio 2007 files,
█▒▒▒▒▒██▒█▒██
█▒█████▒▒▒▒▒█
█▒██████▒█▒██
█▒█████▒▒▒▒▒█
█▒▒▒▒▒██▒█▒██
|
|
|
|
|
Hi Captain,
I am under the impression your code generates way too many exceptions, and that's what
is slowing things down; optimizing with ranges wont make much of a difference as long
as Paint and MouseMove events generate one or more PointNotVisible events.
Will send you a quick hack to DataPresenterControl.cs
Regards,
Luc Pattyn
|
|
|
|
|
Luc Pattyn wrote:
I am under the impression your code generates way too many exceptions, and that's what
is slowing things down; optimizing with ranges wont make much of a difference as long
as Paint and MouseMove events generate one or more PointNotVisible events.
Yeah, it does generate too many exceptions.
The ranges are not only to improve performance of the core functionality but it is also to greatly reduce memory consumption. Selecting 1 million bytes will not use a ton of memory any more, it will use 2 longs for each selection (plus another long if you count the range variable in the Range struct).
█▒▒▒▒▒██▒█▒██
█▒█████▒▒▒▒▒█
█▒██████▒█▒██
█▒█████▒▒▒▒▒█
█▒▒▒▒▒██▒█▒██
|
|
|
|
|
Have you looked at version 2.0 yet?
█▒▒▒▒▒██▒█▒██
█▒█████▒▒▒▒▒█
█▒██████▒█▒██
█▒█████▒▒▒▒▒█
█▒▒▒▒▒██▒█▒██
|
|
|
|
|
I am getting build errors with HDPC2.0:
2 warnings in Range.cs
and 8 errors in Data.cs; first is:
Error 3 The name 'end' does not exist in the current context H:\...\HDPC\HDPC\Data.cs 235 25 HDPC
If you want me to have a look at 2.0 please provide a zip that builds.
BTW I did not recognize any of the things I looked at in 1.0, seems filenames and code
have changed drastically.
Greetings,
Luc Pattyn
|
|
|
|
|
Comment out Range[] ReturnIndexRanges(long[] indexes), that was the method I was working on when I first posted here about the treads.
Luc Pattyn wrote:
BTW I did not recognize any of the things I looked at in 1.0, seems filenames and code
have changed drastically.
Yes they have and much of it will be different. The public API will be mostly the same though with some additions and maybe slight modifications.
You are looking at the Data class which is the core of HDPC. It does not expose any methods to the programmer as it is internal. It is used by HDPC. I will email you a diagram of the architecture of version 2.0.
█▒▒▒▒▒██▒█▒██
█▒█████▒▒▒▒▒█
█▒██████▒█▒██
█▒█████▒▒▒▒▒█
█▒▒▒▒▒██▒█▒██
|
|
|
|
|
I am now looking at Data.cs in HDPC1.0 and HDPC2.0 in parallel.
Switching _selected and _changed from <int> collections to <range> collections is
definitely a big step in the right direction.
I am sure working with ranges EVRYWHERE will really speed things up.
To facilitate thiseven more, I would suggest to introduce a new class RangeCollection,
which somehow holds any number of non-overlapping ranges (maybe ordered, maybe not,
thats up to your judgement). The class would offer methods such as:
- Clear()
- AddRange(start, length) or (Range)
- RemoveRange(start, length) or (Range)
- GetEnumerator()
There would be no restrictions on AddRange; you must choose some details:
- when adjacent to an existing range, merge or not ? (may even cause a double merge)
- when overlapping, ignore the overlap or throw Exception ?
There would be no restrictions on RemoveRange (it may modify more than one of the
internally held ranges). You must choose the spec when it contains items not currently
represented in the range list (probably an Exception).
The Enumerator would return a collection of ranges that are logically equal to
what was added/removed before, without having to be identical...
In this way, the rest of your app can easily be range oriented, and the RangeCollection
class is not that difficult to create either.
If handled in this way, you may not need an "indexesArray to rangesArray" conversion at all,
or if you do (e.g. for API compatibility), its performance might be of lesser importance.
One final remark: your current Data class is not really thread-safe, and
the RangeCollection class would not be either, so it is not obvious for either of them
how to switch to a multi-threading approach (assuming you would still be
considering that at all; I dont expect it will come to that).
Regards,
Luc Pattyn
|
|
|
|
|
Luc Pattyn wrote: I would suggest to introduce a new class RangeCollection
That is a good idea. That will keep things organized at the very least.
Luc Pattyn wrote: when adjacent to an existing range, merge or not ? (may even cause a double merge)
Just as in the SelectData method ranges are joined even if they are adjacent but not overlapping.
Luc Pattyn wrote: when overlapping, ignore the overlap or throw Exception ?
When overlapping the range being overlapped may be extended or the new range being added may just be ignored as it will not be needed(if it is inside a range [.[...]....])
Luc Pattyn wrote: You must choose the spec when it contains items not currently
represented in the range list (probably an Exception).
If you remove a range that extends beyond any other range or does not even exist then it will do nothing or modify existing ranges.
Luc Pattyn wrote: If handled in this way, you may not need an "indexesArray to rangesArray" conversion at all,
or if you do (e.g. for API compatibility), its performance might be of lesser importance.
In HDPC 2.0 it wont be needed because it will be range orientated, even without the RangeCollection class available. I will keep the index - range conversions because they may be needed for some unforeseen reason. However it is unlikely I will ever use it in my own code.
Threads are beyond my knowledge for now so it will be single threaded. I'm making HDPC 2.0 capable of handling extremely large chucks of data and files. I doubt the CPU will be a bottleneck in this case. Besides thread creation creates additional overhead.
Thanks for all your help.
█▒▒▒▒▒██▒█▒██
█▒█████▒▒▒▒▒█
█▒██████▒█▒██
█▒█████▒▒▒▒▒█
█▒▒▒▒▒██▒█▒██
|
|
|
|
|
You're welcome.
I think it's time now to change the thread's title.
Luc Pattyn
|
|
|
|
|
I forgot one question: what is the source of your array ?
I mean, if there already is something that performs the task
of ordering a lot of numbers and then putting them in an array, seems like
your range-finding requirement performance-wise should be integrated with it !
I would even claim:
if you are handed a lot of numbers one by one, in random order, then
solving the overall task would work faster AND be easier than doing
it in two phases.
Furthermore, it does resemble memory recombination in a dynamic memory
manager, doesnt it ?
And if that is what it is, then there are even better ways of doing it
(taking advantage of the memory it describes !)
Luc Pattyn
|
|
|
|
|
Hey guys,
the page is wonderful, but I've got a problem with C# .NET! I wanna do some Clipboard things in a Timer, but it will not work because of the MTA Mode and I have no clue how to set it to STA Mode.
Pls help me!!!
|
|
|
|
|
Add the [STAThread] attribute above your Main method. The main method will most likely be in the static class Program inside of the Program.cs file.
namespace YourProgram
{
class Program
{
[STAThread]
static void Main()
{
}
}
}
█▒▒▒▒▒██▒█▒██
█▒█████▒▒▒▒▒█
█▒██████▒█▒██
█▒█████▒▒▒▒▒█
█▒▒▒▒▒██▒█▒██
|
|
|
|
|
Well I do recall some adventures we once had around the STA/MTA stuff.
We started of in STA (isnt that the default?) and had to switch to MTA to
achieve a sufficient level of parallellism.
So what I'm telling you is turning MTA down to STA, while solving one problem,
may introduce new problems performance wise. It just seems wrong to me that
there is such a global decision to make... or maybe I never sufficiently
understood it all in the first place.
Luc Pattyn
|
|
|
|
|