|
Hi jherico,
I'm extremely happy my library is actually being used! I wish I had more time to keep it up to date with new versions of MSVC++, but my spare time is very limited nowadays.
If you volunteer to do the testing and some sort of pair programming over the net, maybe we can port it to MSVC++ 7.x. These compilers have a more standard STL implementation, so things should be simple.
If you're up to the job, drop me a line and I'll send you along some code to test.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
|
|
|
|
|
I'm also interested. My present work is an optimizer for cutting plastic rolls (tries to minimize waste). I have to keep a list of all possible combinations of cut sizes and this becomes quite large. At the end of the combinations generation, the list must be sorted. I'm now having performance problems with VC++ 7.x.
Regards,
João Paulo
|
|
|
|
|
I tried to upgrade the library to MSVC 7.x some months ago, but had to do it with a remote peer (I do not have that version of the compiler) and did not complete the porting.
If you have some experience with the compiler and feel like doing some tweaking, I can send you the code as it stood when we left off the porting process. The fixes should be (IMHO) simple.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
|
|
|
|
|
I want to use the block_allocator for list under VS.NET. I included the rebind allocator in the block_allocator class like you do for the custom allocators, but I seem to run into another problem of variable matching. Are there any other things that need to be taken into account when adapting this class for VS.NET? Can anyone help with it?
Thanks in Advance,
Raj
|
|
|
|
|
Hi all and owner,
Did you have any success compiling this under VS .NET 2003 or any previous .NET versions?
Please provide updates. I need this urgently for .NET.
Thanks
|
|
|
|
|
I've been using the block allocator for lists used in my company's code but ran into a serious problem. For a splice operation, a list will simply relink items out of one list into another, if the allocators for the two lists compare equal. By default this is the case with the built-in STL and the latest Dinkumware STL.
However, in a block-allocated list this is not the default. When a block-allocated list does a splice, it implements it as an insert followed by an erase. First off, this changes splice from a constant-time operation to an O(n) operation. Second, and more important, it invalidates iterators that pointed to the items in the original list. Now, the C++ standard says that items moved from one list to another invalidate iterators pointing to them, but this was not the actual behaviour before I started using block-allocated lists, and we relied on the existing behaviour.
In case anyone is interested, I've come up with a solution that works in our case. We only have one type of list (basically a list of int pointers) that we need to concern ourselves with, so I subclassed the block-allocated list to a list of the concrete type we use and changed the constructors to always use the same static allocator. This lets the splice operation return to its old behaviour of being (a) constant time and (b) not invalidating iterators pointing to moved items.
Brad
|
|
|
|
|
This is a tricky issue. Standard allocators are required to compare equal, which is what lets splice, swap and similar operations take constant time. This requirement does not hold for other allocators, though. Why did I decide to implement block_allocator this way? Having all allocators of a given class compare equal is tantamount to saying that these allocators hold no per-object data, i.e. all internal data are static. In my particular case, having a static list of allocated blocks would have broken the per-container thread safety usually taken for granted when working with the STL. To remedy this I'd have had to add static locking around accesses to this list, which I deemed unacceptable for performance reasons. When designing the allocator, I considered implementing these trade-offs as a pluggable policy (so that one could choose between per-object data, global data without locking in single-threaded environments, and global, locked data in multithreaded programs), but the design became too complex and I finally settled for the current approach.
Your solution is a valid one, but keep in mind you can run into trouble if your program is multithreaded and there are more than one thread dealing with containers of the same kind.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
|
|
|
|
|
I have read that custom block allocators can sometimes decrease the memory necessary for each object in the list.
Does this custom allocator decrease the memory usage of a list (or a map or set)?
Thanks for your help. I'm interested in decreasing size as well as increasing speed.
|
|
|
|
|
Does this custom allocator decrease the memory usage of a list (or a map or set)?
On a first approach, no, it doesn't. When an allocator is requested n bytes of fresh memory, it has no option but to provide n bytes of fresh memory, so memory efficiency can only result from second-order effects:
- Alignment issues. An allocator reserving memory on a per-object basis can incur a constant alignment overhead (e.g. always reserving 32 bytes when requested 31). IMHO, in release mode this will only happen in very badly designed allocators; I don't think malloc-based allocation suffers from this defect.
- Memory fragmentation. This can actually be an issue. Reserving memory in larger chunks prevents the formation of many unusable little blocks that can fragment memory to the point that the practical limit of total memory seen by the program is much less than theoretically possible. Here, block allocators have a reputation for doing a good job. Furthermore, block allocation improves locality and hence, due to cache effects, speed.
Anyway, I'd suggest you conduct some tests to see how block_allocator performs with respect to memory efficiency in your particular application.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
|
|
|
|
|
I am using the Dinkumware C++ library. Is this allocator compatible with the Dinkumware STL library? I am in a unique situation where I know exactly the number of items that I want to copy into the set, but set does not have any member function like reserve. Then I looked at the block allocator, and it was specific to the VC++ STL. The problem is that I also use the hash_map from Dinkumware; I cannot switch the library to use this.
I hope I am not bothering you.
Thank you
Thomas
|
|
|
|
|
Well, the STL shipped with VC++ is actually from Dinkumware, so chances are that this improved STL version you're using is allocator-compatible. Why don't you just give it a try?

#define DEFINE_BLOCK_ALLOCATED_SET
#include <blockallocator.h>

int main()
{
    block_allocated_set<int,1000> s;
    s.insert(5);
    return 0;
}

If this compiles and works without assertions, then I guess everything's fine.
Regards,
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
|
|
|
|
|
I am using the set_intersection algorithm on two sorted vectors to produce a resultant vector. When I do a reserve on the result vector, the running time drops from 6 s to 2 s for the same test (a million intersection operations in a loop). But when I do not know the size of the resultant vector, how can I reserve upwards in blocks? If I cannot do that, each insert operation on the vector from the algorithm will result in memory allocations, slowing down the process.
The only other option is to move to another container like list. I was researching options to do this on vector, though.
Thomas
|
|
|
|
|
I'm afraid this kind of allocator is useless for vectors, because these containers require all elements to be placed in a contiguous block of memory. When a vector runs out of preallocated space, it asks for more memory and copies the original block to the new, larger block.
My opinion is that the best solution (if you're not worried about memory consumption) is to preallocate on the result vector as much memory as will ever be needed:

output.reserve(min(input1.capacity(),input2.capacity()));

This ensures no reallocations or copies will occur.
If this is not an option for you (whatever the reason), then switch to a list and give my allocator a try.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
|
|
|
|
|
I tried the code for performance and found it performs well. But the following example causes a runtime error, somewhat like "[HEAP]: block at ... modified after freed":

void f()
{
    typedef block_allocated_map<cstring,int,100> tmap;
    tmap *m1 = new tmap, *m2 = new tmap;
    delete m1; delete m2;
}

Note the order of calls to deallocate() is not LIFO. I found the cause of the bug is the use of the static _Nil in XTREE, incorrectly deallocated from a different instance of block_allocator. So I have rewritten the code...
|
|
|
|
|
The actual bug in the code is that allocators may not contain any instance-specific data, as mine regrettably do. This has implications for thread safety too, with regard to a question some other folk posted several weeks ago.
Currently I'm quite busy, but please look out for a new release of the library in a few weeks.
Joaquín M López Muñoz
|
|
|
|
|
This may come a little late, months after your post, but now that I've had some time to release v1.2 of block_allocator I feel I can clarify this issue.
The bug you reported is caused (as you pointed out) by the deallocation of a static struct called _Nil by a different block_allocator instance than the one that allocated it. In my opinion, the VC++ STL implementation is to blame for this bug, as it fails to check whether allocators are interchangeable before attempting deallocation (this check is, however, systematically performed at all other places where it matters).
One simple workaround is to keep a static container of the offending type so that both allocation and deallocation of _Nil are carried out by it, regardless of the various containers of the same type being constructed and destructed during the life of the program. The following sample code shows the technique:
#define DEFINE_BLOCK_ALLOCATED_MAP
#include "blockallocator.h"
#include <crtdbg.h>

using namespace std;

typedef block_allocated_map<int,int,1024> bmap;

int main(void)
{
    _CrtSetDbgFlag(_CRTDBG_DELAY_FREE_MEM_DF|_CrtSetDbgFlag(_CRTDBG_REPORT_FLAG));
    bmap *m1=new bmap;
    bmap *m2=new bmap;
    delete m1;
    delete m2;
    _CrtCheckMemory();
    return 0;
}

As for the multithreading issues I mentioned in my first reply to your post, a closer reevaluation of the library shows it is as multithread-safe as it can be (more info in the updated article above).
Hope this helps.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
|
|
|
|
|
Apparently SGI's STL offers 4 different allocators for use. Three of them are described as being thread-safe, and one is described as fast but only to be used by programs that have one thread. See
http://www.sgi.com/Technology/STL/Allocators.html for more. Is it safe to use the block allocator that you've written in programs with multiple threads?
|
|
|
|
|
I'm sorry I'm so late in answering this, but I didn't have time to do it before.
block_allocator is as thread-safe as the STL itself. More precisely, block_allocator is thread-safe on a per-container basis, meaning that block-allocated container instances don't mess around with each other and can be treated separately by different threads. This is, IMHO, a reasonable thread-safety guarantee, and as a matter of fact it is the type of safety offered by many libraries (including the STL). You may be glad to know that the thread safety of a program using the STL is not altered by the introduction of block_allocator. More info in the updated article above.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
|
|
|
|
|
Thank you, Joaquin. Your article is very useful.
|
|
|
|
|
Gave this a shot with a project that has half a dozen maps and lists. I was unable to see any speed improvement with this add-in; in fact, speed degraded by the smallest amount (about 0.14 seconds overall). What metrics did you use to measure the speed for this, and where is the example code to show it?
|
|
|
|
|
At the end of this email you'll find a small test program. It does random insertions and extractions on two different lists, both with the default allocator and with block_allocator. I tested this several times and block_allocator consistently achieved an improvement of 42-43%. Maybe you can give it a try to determine whether there's something strange with your test platform. Mine is:
Compiler: MSVC++ 5.0 (no SPs)
Build: Release, Multithreaded CRT, all switches at their default states
OS: Win95 OSR2
Machine: PMMX, 233 MHz, 160 MB
Anyway, this is not an academic exercise. I developed block_allocator to use in an image compression algorithm doing some intense list work. The introduction of block_allocator alone sped up the performance by 15-20%, and this was a real program, not a performance test. I'll be glad to hear about your further experiences with block_allocator. Perhaps some conclusion can be drawn from your test environment or the kind of work you use the containers for.
Regards, Joaquín M López Muñoz
--------------CODE FOLLOWS-------------
#define DEFINE_BLOCK_ALLOCATED_LIST
#include "blockallocator.h"

#include <iostream>
#include <windows.h>

using namespace std;

template <class T1,class T2,class X1,class X2>
DWORD test(T1& l1,T2& l2,X1& x1,X2& x2)
{
    srand(0);
    DWORD dw=GetTickCount();

    for(int n=0;n<500;++n){
        for(int m=0;m<1000;++m){
            if(rand()<RAND_MAX/2)l1.push_front(x1);
            if(rand()<RAND_MAX/2)l2.push_front(x2);
        }
        for(T1::iterator it1=l1.begin();it1!=l1.end(); ){
            if(rand()<RAND_MAX/2)it1=l1.erase(it1);
            else it1++;
        }
        for(T2::iterator it2=l2.begin();it2!=l2.end(); ){
            if(rand()<RAND_MAX/2)it2=l2.erase(it2);
            else it2++;
        }
        for(m=0;m<500;++m){
            if(rand()<RAND_MAX/2)l2.push_back(x2);
            if(rand()<RAND_MAX/2)l1.push_back(x1);
        }
    }

    l1.clear();
    l2.clear();

    return GetTickCount()-dw;
}

struct node
{
    DWORD x,y,z;

    bool operator <(const node &) const;
    bool operator >(const node &) const;
    bool operator ==(const node &) const;
    bool operator !=(const node &) const;
};

int main(void)
{
    DWORD total_default=0;
    DWORD total_block_allocator=0;
    for(int i=1;i<10;++i){
        {
            cout<<"default allocator: ";

            list<int> l1;
            list<node> l2;
            int x1=0;
            node x2;
            DWORD dw=test(l1,l2,x1,x2);
            cout<<dw<<endl;

            total_default+=dw;
        }
        {
            cout<<"block allocator: ";
            block_allocated_list<int,1024> l1;
            block_allocated_list<node,1024> l2;
            int x1=0;
            node x2;
            DWORD dw=test(l1,l2,x1,x2);
            cout<<dw<<endl;
            total_block_allocator+=dw;
        }
    }
    cout<<"----------------------------------------"<<endl;
    cout<<"Total time default allocator: "<<total_default<<endl;
    cout<<"Total time block allocator: "<<total_block_allocator<<endl;
    cout<<"Block allocator improvement: "<<
        (DWORD)(100.0*(total_default-total_block_allocator)/total_default)<<"%"<<endl;

    return 0;
}
|
|
|
|
|
Although I have not tried the block allocator described by this author, I have been designing my own block allocators, and my "speed" improvements roughly match the author's. There are three areas of interest when dealing with speed differences between allocators:
1) Allocation time
2) Deallocation time
3) Access time
A block allocator will generally improve allocation and deallocation times. Where block allocators can really influence speed is in the use of the memory created. Because block allocators group memory together, you reduce memory fragmentation and page faults. The percentage decrease in time to randomly insert into a linked list will be larger than the percentage decrease to insert at the head of a linked list or into a map.
Steven Schimmelpfennig
|
|
|
|
|
I tried this block allocator in one of my projects.
I used several maps, multimaps, sets and lists with up to hundreds of thousands of entries to calculate deformations in volumetric objects in real time. The speed improvement in comparison with the default Microsoft allocators was enormous. But the standard SGI STL version is still a bit faster!
Yours, Lothar.
|
|
|
|
|
I believe the reason the SGI implementation of the STL was faster is that it does not use a variable to track the count of objects currently in the list. This makes inserting much faster, but it makes size() much slower; it is a trade-off of which is more important to you. Using block allocators I could almost match the SGI STL's time for inserting 10,000 integers into a list, while getting the size took no measurable amount of time (less than 0.00 ms), versus 333.33 ms for the SGI STL to count all the elements.
|
|
|
|
|