|
You could use lock() around the places you absolutely don't want to execute concurrently. This would remove all the uncertainty and make your code more reliable.
|
|
|
|
|
True, but I was aiming for a lock-free solution, as locks are surprisingly expensive.
|
|
|
|
|
It's surprisingly complex for such a short fragment, because of the possible interactions between threads and the shuffling of values among four variables. Maybe it could be rewritten in a simpler way.
A useful step in analyzing code like this is identifying invariant conditions that are guaranteed to always be true at some point in the code. But it looks like the variable shuffling makes this impossible.
|
|
|
|
|
Yeah, I've temporarily gone with a lock, which solves the problem for now. Then I'm going to try Luc's suggestion of using a couple of queues. I already have a lock-free queue that's working elsewhere, which I should be able to use since there would only be one producer for each queue (a limitation for lock-free queues it would seem).
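For context, the single-producer/single-consumer restriction exists because each index of the ring is written by exactly one thread. A minimal sketch of the idea (illustrative Python, not the poster's actual queue; a real lock-free version in C# or C++ would need atomic stores and memory barriers when publishing the indices):

```python
class SpscRingBuffer:
    """Single-producer/single-consumer ring buffer sketch.
    Only the producer writes `head`; only the consumer writes `tail`.
    That ownership split is what makes it work without a lock."""

    def __init__(self, capacity):
        self.buf = [None] * (capacity + 1)  # one slot kept empty to tell full from empty
        self.head = 0  # next write position (producer-owned)
        self.tail = 0  # next read position (consumer-owned)

    def try_enqueue(self, item):
        nxt = (self.head + 1) % len(self.buf)
        if nxt == self.tail:          # queue full
            return False
        self.buf[self.head] = item
        self.head = nxt               # publish only after the slot is written
        return True

    def try_dequeue(self):
        if self.tail == self.head:    # queue empty
            return None
        item = self.buf[self.tail]
        self.tail = (self.tail + 1) % len(self.buf)
        return item
```

With two producers, both could read the same `head` and write the same slot, which is why multi-producer designs need compare-and-swap loops.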
|
|
|
|
|
SK Genius wrote: there would only be one producer ... a limitation for lock-free queues it would seem
a limitation for lock-free whatevers I would say.
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
|
|
|
|
|
I am sure you can find a solution for adding more producers if you are good at lock-free algorithms. This page suggests that it is not impossible:
http://www.1024cores.net/home/lock-free-algorithms/queues
|
|
|
|
|
When you have a producer, a consumer, and a number (at least 2) of equivalent buffers, I find it easiest to have two queues: one holding "empty" buffers (emptied by the consumer, to be filled by the producer), one holding "full" buffers (filled by the producer, to be processed by the consumer). That approach is simple to understand; it works well, it doesn't create new objects once initialized, and it doesn't really need locking, except that the Dequeue operation should block until a buffer has been returned.
In pseudo-code:
emptyQ=CreateQueue();
fullQ=CreateQueue();
for(N times) emptyQ.Queue(new Buffer());
forever {
buf=emptyQ.Dequeue();
...fill buf
fullQ.Queue(buf);
}
forever {
buf=fullQ.Dequeue();
... process buf
emptyQ.Queue(buf);
}
|
|
|
|
|
Good call, I might be able to fit that in - with the change that the consumer won't release a buffer until it has a new one to render. It would be holding 2 buffers at times, so the minimum number of required buffers becomes 3.
|
|
|
|
|
What is the best (by that I mean the fastest) way to merge some smaller files of sorted data into a larger file, and then later merge the larger files into a single sorted file?
Some particulars:
Input: text files, each 6 GB:
- short records: 500,000,000 10-digit numbers, in three orderings - sequential ascending, sequential descending, and random selection order;
- long records: null strings up to 65533-byte strings in random order, with many duplicate strings;
- long unique records: as above, but where the last 10 digits of each line are replaced with one of the random short strings.
A total of 5 different files for 5 different tests.
Memory is allocated with VirtualAlloc (3 GB enabled in boot.ini and in the Link line and 4 GB if memory in the system), memory blocks of 2 GB, 1 GB, and 47 MB, plus 2 more small blocks are allocated.
The first step does the initial sort. The 2GB buffer is used as an input buffer, the 1 GB buffer is used for building mod 16 aligned and sized strings and for the nodes for a RedBlackTree, the 47 MB buffer is used to hold pointers to the strings for the sort tree and for the records moved after the sort. For the short records, this results in about 211 sorted blocks written to a file. This step is complete for now.
The next step combines the 47 MB sorted blocks into a big sort block (2 GB), using the 1 GB buffer for tree nodes; 28 sort blocks are combined into each big block (limited by the available node space - 16 bytes for each record in the tree), and 8 big sort blocks are written to a file. This step is complete for now, but could be changed if there is a better merge solution.
The final step needs to merge these big blocks into a single output file, but has to be done in a fragmented mode because there is not enough memory to hold the files. The 2 GB buffer is split into pieces (8 in this case for the 8 big sort blocks). The header from each sort block and the initial pointers from each sort block and the initial associated records from each sort block are read into the 8 pieces of the 2 GB buffer (with an initial header pointing to each big sort block).
Essentially you need to sort the big blocks into ascending sequence by the first record in each (how to do this efficiently?), then select and output all the records in the first big block that are lower than the first record in the second big block (how to do this efficiently?), then unlink the first buffer and re-link it among the other buffers in ascending sequence (how to do this efficiently?). Each block buffer is refilled as it is exhausted, and a block is deleted once all of its records have been selected and written.
Unfortunately, 6 GB is not necessarily the maximum file size, so the 8 big blocks could become thousands, and linked lists are not efficient at that scale. Arrays of thousands of pointers have insertion-time problems. Trees are better at insertions than arrays, but have timing problems as well (RedBlackTree recursive rotations, especially with deletions).
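For what it's worth, the three "how to do this efficiently?" steps are conventionally handled with a min-heap (priority queue) keyed on each block's current record: the initial ordering, picking the lowest record, and re-linking a refilled block are each O(log k), even with thousands of blocks. A sketch with small in-memory lists standing in for the big sort blocks:

```python
import heapq

def merge_blocks(blocks):
    """K-way merge of pre-sorted blocks via a min-heap.
    Re-ordering after a block advances is a single heap push/pop
    (O(log k)) instead of re-sorting or re-linking a list."""
    heap = []
    for i, block in enumerate(blocks):
        if block:
            # (record, block index, position); the index breaks ties
            heapq.heappush(heap, (block[0], i, 0))
    out = []
    while heap:
        record, i, pos = heapq.heappop(heap)
        out.append(record)
        pos += 1
        if pos < len(blocks[i]):              # "refill" from the same block
            heapq.heappush(heap, (blocks[i][pos], i, pos))
        # else: block exhausted; it simply drops out of the heap
    return out
```

In the real program the lists would be the buffered heads of the big sort blocks, with a disk read replacing the `pos` advance when a buffer runs dry.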
Which way is best (overall) for merging, and why?
Dave.
|
|
|
|
|
Sorry Dave, that does not make much sense to me. You will be wasting lots of time reading, decoding, sorting, and writing data, none of which is very useful.
I would:
- avoid such an amount of data if at all possible;
- avoid large files; I prefer several smaller files;
- avoid text files; binary files are smaller and get processed much faster;
- probably use a database.
|
|
|
|
|
Luc,
I'm sorry to give you a 1 for your answer, but I disagree with all of your points. I got into this because I was sorting such a list using Windows sort.exe and was disappointed with the performance. I had a file that I wanted to sort and it was a text file. I do not believe that a general purpose database can out-perform a tailored sort process, and you would have all of the database overhead in addition to the data itself to write and move around. The following are the timings from my testing thus far:
The current time is: 10:44:34.51
\restart18\debug>if exist c:\longdups.txt sort.exe /REC 65535 c:\longdups.txt /O c:\winlong.txt
The current time is: 11:07:44.81
The current time is: 10:44:34.51
----------
0:23:10.30 = 1,390,300 msec
The current time is: 10:24:20.18
\restart19\debug>if exist c:\longdups.txt ..\sortit.exe c:\longdups.txt c:\mylong.txt
The current time is: 10:30:14.76
The current time is: 10:24:20.18
----------
00:05:54.58 = 354,580 msec, (354,580*100)/1,390,300 = 25.50%
--------------------------------------------------------------------------------
The current time is: 17:20:45.57
\restart18\debug>if exist c:\textseq1.txt sort.exe c:\textseq1.txt /O c:\winshort.txt
The current time is: 18:10:40.32
The current time is: 17:20:45.57
-----------
00:49:54.75 = 2,994,750 msec.
The current time is: 11:10:25.21
\restart19\debug>if exist c:\textseq1.txt ..\sortit.exe c:\textseq1.txt c:\myshort.txt
The current time is: 11:19:27.01
The current time is: 11:10:25.21
-----------
00:09:01.80 = 541,800 msec, (541,800*100)/2,994,750 = 18.09%
--------------------------------------------------------------------------------
The current time is: 13:35:56.95
\restart18\debug>if exist c:\textdseq.txt sort.exe c:\textdseq.txt /O c:\wdshort.txt
The current time is: 14:22:32.76
The current time is: 13:35:56.95
-----------
00:46:35.81 = 2,795,810 msec.
The current time is: 12:13:07.90
\restart19\debug>if exist c:\textdseq.txt ..\sortit.exe c:\textdseq.txt c:\mdshort.txt
The current time is: 12:26:48.50
The current time is: 12:13:07.90
-----------
00:13:40.60 = 820,600 msec, (820,600*100)/2,795,810 = 29.35%
--------------------------------------------------------------------------------
The current time is: 19:23:06.04
\restart18\debug>if exist c:\textuseq.txt sort.exe c:\textuseq.txt /O c:\wushort.txt
The current time is: 20:37:44.10
The current time is: 19:23:06.04
-----------
01:14:38.06 = 4,478,060 msec
The current time is: 14:30:05.59
\restart19\debug>if exist c:\textuseq.txt ..\sortit.exe c:\textuseq.txt c:\mushort.txt
The current time is: 15:09:04.34
The current time is: 14:30:05.59
-----------
00:38:58.75 = 2,338,750 msec, (2,338,750*100)/4,478,060 = 52.23%
-----------
(not complete - need merge)
--------------------------------------------------------------------------------
The current time is: 8:34:48.34
\restart15\debug>if exist c:\longuniq.txt sort.exe c:\longuniq.txt /O c:\wulong.txt
The current time is: 9:00:27.12
The current time is: 8:34:48.34
----------
0:25:38.78 = 1,538,780 msec.
The current time is: 20:37:07.09
\restart19\debug>if exist c:\longuniq.txt ..\sortit.exe c:\longuniq.txt c:\mulong.txt
The current time is: 20:54:34.73
The current time is: 20:37:07.09
-----------
00:17:27.62 = 1,047,620 msec, (1,047,620*100)/1,538,780 = 68.08%
(not complete - need merge)
Still looking for any other hints about ways to do this efficiently.
Dave.
|
|
|
|
|
You fail to see that the moment your data don't fully fit into memory, the file system will become your main bottleneck. A general purpose database will be faster than a general purpose file system by several orders of magnitude!
You say a program tailored to your problem would be faster than a database, but the tailoring would need to include the file system as well!
|
|
|
|
|
Stefan,
I just read the CodeProject "Daily News" for today. Interesting article "What would Feynman do?". It seems that you are suggesting a Feynman solution. I am not talking about a 6-core 16 GB server with a farm of hundreds of drives, I am talking about my simple PC with 4 GB of memory and 2 hard drives running Windows XP. I want a utility string sort solution.
You say "several orders of magnitude better!" I do not think this is very accurate. I'll tell you what I will do. I will supply you with an executable and a small template file. Executing it will create all of my test files (5 files, 6 GB each, the same files that I am using). You can then write your version of a single program to sort these files (case sensitive), execute Windows sort.exe for each file to create a baseline time on your machine, then execute your program for each file and report all of your timing statistics here. You have an advantage in that you already have my current times, so you can refine your algorithm until you get better times. Note that it is not the time for any test or the overall time of all of the tests that matters; it is the percentage of improvement over the baseline sort.exe times that is important. This evens out timing differences between different machines. I just do not believe several orders of magnitude.
Interested in trying? Note: I currently do not have a complete solution, I still need the final merge step for the last two files that is what I was trying to get suggestions about in this thread. I will complete the program in some way and post the final results for the final 2 files.
Dave.
|
|
|
|
|
Sorry, 'several orders of magnitude' was probably overstating. I've been thinking of one particular case where a database solution outperformed a file-based solution by a factor of ~15-20. But when I think about it, part of the acceleration may have been due to the algorithms being used, so 'one order of magnitude' might be closer to the real thing.
I've also thought of another case with a 120 GB database of geographical data that turned out to be so fast that my algorithms to dig some geometrical properties out of a given (and comparatively small) result set were much more time-critical than the two-dimensional search to find the data in a particular geographical region. On a side note, that was about 15 years ago, so I doubt the database server was all that much faster than your current PC.
In both cases I was just the 'user' of the data though - I am not a database specialist. I just wanted to point out that your opinion of databases might be premature, not that I could write down a solution in 5 minutes.
Anyway, if you want me to try you'd have to wait for a month or two, as the single hard disk in my 5-year-old PC has barely enough room to fit one of those 6 GB files; forget about 5.
|
|
|
|
|
Stefan,
I'd be happy to let you try, misery loves company! If you send me an Email I'll respond with an attached zip of the create.exe file and the template file (includes a READ.ME and some testing batch files).
To ease the file size problems, create the test files one at a time, then write the file to an external USB drive, test with sort.exe and with your sort program writing the output files and any intermediate files to the external drive and compare the results, then save the timing results, then delete the test file and the sorted files, then create the next test file. That should keep the files down to 18 GB.
As long as the input and output files are on the same drives for both tests, the relative times should be a good indicator of the relative performance.
To create the files one at a time, create null files for each of the named test files (in the create.exe directory) and create.exe will skip all creations, then delete the file you want to test. Create.exe will create that file (since it is missing), test the file two ways, save the timing results, then replace the test file with a null file and delete the next file to test.
OBTW, create.exe requires SSE.
Dave.
|
|
|
|
|
Skip lists[^] provide a good alternative to R/B trees when space is a major concern (here is a C# implementation on CodeProject[^]). In general, the best approach to multi-way merges is to use a tree-based or skip list-based priority queue. However, if recursive rotations of R/B trees came up as a performance concern, your system is way past the stage of fine-tuning where one-size-fits-all suggestions may still apply.
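To make the suggestion concrete, here is a minimal skip list sketch (insert, search, and in-order walk only; a production version, like the linked C# implementation, would also support deletion and tune the level distribution):

```python
import random

class _Node:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level   # one forward pointer per level

class SkipList:
    """Minimal skip list: expected O(log n) insert and search,
    using randomized levels instead of red-black rebalancing."""
    MAX_LEVEL = 16

    def __init__(self):
        self.head = _Node(None, self.MAX_LEVEL)
        self.level = 1

    def _random_level(self):
        # Promote to the next level with probability 1/2.
        lvl = 1
        while random.random() < 0.5 and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def insert(self, key):
        update = [self.head] * self.MAX_LEVEL  # rightmost node < key, per level
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        lvl = self._random_level()
        self.level = max(self.level, lvl)
        new = _Node(key, lvl)
        for i in range(lvl):               # splice in at each of its levels
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def items(self):
        # Level 0 links every node, so walking it yields sorted order.
        out, node = [], self.head.forward[0]
        while node:
            out.append(node.key)
            node = node.forward[0]
        return out
```

Note there are no rotations anywhere: an insert touches only the pointers on its own levels, which is why deletions are also cheap compared to an R/B tree.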
|
|
|
|
|
dasblinkenlight,
Thank you for the suggestion. I will look up the "skip lists" implementation.
Dave.
|
|
|
|
|
Just out of curiosity, why do you need to sort such large amounts of data? Is somebody planning on reading these files?
Illogical thoughts make me ill
|
|
|
|
|
musefan,
Yes, someone will read that data. I had originally created a program to randomize a list. I wanted to verify that I had selected each entry in the original list once and only once. A sort would re-order the randomized list; then a simple file compare (fc.exe) of the original file and the sorted file could verify the correctness of the randomization.
Sort.exe was very slow (compared to the randomizing), so I went down that rabbit hole to develop a faster sort, and got lost in a maze of twisty little tunnels..... (what an Adventure!)
Another method would have been to bitmap the randomized records and see that they did not overlap, but this would work only for numeric strings, and the randomize program worked with any string data. To bitmap that would require a string sort anyway to equate a string with an entry number.
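The bitmap check itself is cheap when it applies; a sketch for records that are (or can be mapped to) the numbers 0..N-1 (`verify_permutation` is a hypothetical name, not from the original program):

```python
def verify_permutation(records, n):
    """Verify that `records` is a permutation of 0..n-1 using a
    bitmap: one bit per possible value, no sort required."""
    seen = bytearray((n + 7) // 8)   # n bits, rounded up to whole bytes
    count = 0
    for r in records:
        if not 0 <= r < n:           # value outside the expected range
            return False
        byte, bit = divmod(r, 8)
        mask = 1 << bit
        if seen[byte] & mask:        # duplicate selection
            return False
        seen[byte] |= mask
        count += 1
    return count == n                # also catches missing entries
```

As noted, this only works when the records are numeric (or can be equated to an entry number), which for arbitrary strings would require a sort anyway.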
Dave.
|
|
|
|
|
Member 4194593 wrote: Which way is best (overall) for merging, and why?
What Luc said, because he's right. You were competing with Sort.exe, which isn't intended to work with large files.
Now, a database doesn't have any problems with sorting, because it needn't load the entire data set into memory. If you can manipulate the data, then you can add a "key" to each item that needs to be sorted. Next, you could build an index.
Fetch item from disk by key, compare, update index. Building an index on a database can be a relatively time-consuming process, but I imagine that it's faster than forcing Windows to swap a lot of virtual memory, pushing all other processes to the background.
You could also partition your blocks; have the process that "generates" the first block only filter out the lines that start with an "A", the second with a "B".
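That partitioning idea, sketched: bucket lines by their leading character, sort each bucket independently (possibly in separate processes), and concatenate the buckets in key order; the bucket keys alone guarantee the global ordering.

```python
from collections import defaultdict

def partitioned_sort(lines):
    """Partition lines by their first character, sort each
    partition independently, then concatenate in bucket order.
    Every line in the 'A' bucket sorts before every line in
    the 'B' bucket, so no cross-bucket merge is needed."""
    buckets = defaultdict(list)
    for line in lines:
        buckets[line[:1]].append(line)   # '' bucket collects empty lines
    out = []
    for key in sorted(buckets):          # bucket order = global order
        out.extend(sorted(buckets[key]))
    return out
```

The same split works for the disk-based case: each bucket becomes its own intermediate file, small enough to sort in memory.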
I are Troll
|
|
|
|
|
Eddy,
OK. You suggest using a database. What database do you suggest? If I had this database, would I need to program it or configure it in some way? Is it as easy to use as executing sort.exe? Don't get me wrong, I've been programming since '65, I am familiar with databases and their overhead. I still do not believe that any database solution would be as fast, as cheap, or easy to use as something like sort.exe.
My offer to you is the same as to Stefan63 above. Send me an email so I can reply with a zip of an executable (or, if you prefer, the source and a makefile to create the executable). This executable will create the 5 large files (6 GB each) that I was testing with. Write your own sort to sort these files and sort the same files with sort.exe and compare the percentage faster your sort is than sort.exe for each file, and for the overall total testing (comparing this way evens the playing field for CPU, memory, and disk speed differences between your system and mine).
Dave.
|
|
|
|
|
Member 4194593 wrote: You suggest use a database.
No, my suggestion would involve Hadoop and a lot of PS2's
Member 4194593 wrote: What database do you suggest?
SQL Server if you can afford it, although it would be an expensive solution just for sorting. SQL Express would do too, I guess. You can only pack 4 GB into a database in the free version, but there's no problem in adding several databases to the server.
Member 4194593 wrote: If I had this database, would I need to program it or configure it in some way?
Ideally, it'd use a bulk-import. You could also have it generate the index, but it'd be more fun to build such a thing by hand. The advantage would be that you can easily partition your data, even over multiple computers.
Member 4194593 wrote: Is it as easy to use as executing sort.exe?
From the users' point of view, this would be easier as their data simply already appears to be sorted by applying the index.
Member 4194593 wrote: Don't get me wrong, I've been programming since '65, I am familiar with databases and their overhead. I still do not believe that any database solution would be as fast, as cheap, or easy to use as something like sort.exe.
There are free databases that are scalable and well-documented. Yes, I get your point - but most people won't have a dataset that's this large, so the use-case shouldn't be built around your average user
Member 4194593 wrote: Write your own sort to sort these files and sort the same files with sort.exe and compare the percentage faster your sort is than sort.exe for each file, and for the overall total testing (comparing this way evens the playing field for CPU, memory, and disk speed differences between your system and mine).
My gratitude for the challenge, but I don't have the time nor the hardware. Have you considered writing a CodeProject article on the subject?
|
|
|
|
|
Eddy,
Thank you for the detailed reply. I think I'll stick to what I know. About writing an article - I never considered it. No one here on CodeProject would like my code, or what I was trying to do, or the way I was doing it; they wouldn't like it on so many different levels that I wouldn't even try to enumerate them.
Dave.
|
|
|
|
|
Member 4194593 wrote: No one here on the CodeProject would like my code
That's part of the job; when someone comes along and sings "try this", then we'll cut it up and look at it from all sides to see whether or not it's better than our old habits. Now, I might not like the use-case, but that doesn't mean that it's not an interesting topic to a lot of people (otherwise you'd have 0 replies)
..and sometimes, it's more interesting how you got a result - keep in mind that most people have problems with pointers
|
|
|
|
|
Eddy,
1) My program is a console app.
2) It is totally MASM (except API calls for I/O).
3) It honors the Intel ABI, but in addition it saves and restores all registers around calls and saves flags around all API calls.
4) It uses mainly register calling sequences, not calling arguments (I think that is correct - from my ancient Fortran AutoCoder days, "parameters" are what you expect, "arguments" are what you get).
5) It returns no values, only zero or non-zero flags - any returned values are saved in fixed locations.
6) It uses no call stack frame - ebp is used as a general purpose register.
7) It has tons of commentary to explain What I did so I can pick up the program 20 years from now and understand Why I did what I did.
8) It uses many levels of conditional assembly to do validations and/or timing displays, not checking option flags and branching around unused code - the unused code is not even assembled.
9) There are conditional code segments in the RedBlackTree implementation to validate correctness and to display the tree structure (a short 31 element tree build).
10) The conditional code snippets save and restore any used registers to prevent any side effects.
11) It does not use anonymous symbols, instead, it uses lots of meaningful long labels everywhere, including places where a label is not needed but separates code groups.
12) Need I say more?
Nobody will like it here.
Dave.
|
|
|
|
|