I have read that custom block allocators can sometimes decrease the memory needed for each object in a list.
Does this custom allocator decrease the memory usage of a list (or of a map or set)?
Thanks for your help. I'm interested in decreasing size as well as increasing speed.
Does this custom allocator decrease the memory usage of a list (or of a map or set)?
On a first approach, no, it doesn't. When an allocator is asked for n bytes of fresh memory, it has no option but to provide n bytes of fresh memory, so any memory savings can only come from second-order effects:
- Alignment issues. An allocator reserving memory on a per-object basis can incur a constant alignment overhead (e.g. always reserving 32 bytes when asked for 31). IMHO, in release mode this will only happen in very badly designed allocators; I don't think malloc-based allocation suffers from this defect.
- Memory fragmentation. This can actually be an issue. Reserving memory in larger chunks prevents the formation of many unusable little blocks, which can fragment memory to the point that the practical limit on total memory seen by the program is much less than theoretically possible. Here, block allocators have a reputation for doing a good job. Furthermore, block allocation improves locality and hence, thanks to the cache, speed.
Anyway, I'd suggest you conduct some tests to see how block_allocator performs with respect to memory efficiency in your particular application.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
I am using the Dinkumware C++ library. Is this allocator compatible with the Dinkumware STL? I am in a particular situation where I know exactly the number of items that I want to copy into the set, but set does not have a member function like reserve. Then I looked at the block allocator, and it was specific to the VC++ STL. The problem is that I also use the hash_map from Dinkumware, so I cannot switch libraries in order to use this.
I hope I am not bothering you.
Thank you
Thomas
Well, the STL shipped with VC++ is actually from Dinkumware, so chances are that the improved STL version you're using is allocator-compatible. Why don't you just give it a try?

#define DEFINE_BLOCK_ALLOCATED_SET
#include <blockallocator.h>

int main()
{
    block_allocated_set<int,1000> s;
    s.insert(5);
    return 0;
}

If this compiles and works without assertions, then I guess everything's fine.
Regards,
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
I am using the set_intersection algorithm on two sorted vectors to produce a result vector. When I do a reserve on the result vector, performance improves from 6 s to 2 s for the same test (a million intersection operations in a loop). But when I do not know the size of the result vector, how can I reserve it upwards in blocks? If I cannot, each insert into the vector from the algorithm may trigger a memory allocation, slowing down the process.
The only other option is to move to another container like list; I was researching ways to do this on vector, though.
Thomas
I'm afraid this kind of allocator is useless for vectors, because these containers require that all elements be placed in a contiguous block of memory. When a vector runs out of preallocated space, it asks for more memory and copies the original block to the new, larger one.
My opinion is that the best solution (if you're not worried about memory consumption) is to preallocate on the result vector as much memory as will ever be needed:

output.reserve(min(input1.size(),input2.size()));

Since an intersection can never hold more elements than the smaller input, this ensures no reallocations or copies will occur.
If this is not an option for you (whatever the reason), then switch to a list and give my allocator a try.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
I tried the code for performance and found it does well. But the following example causes a runtime error, something like [HEAP]: block at ... modified after freed:

void f()
{
    typedef block_allocated_map<cstring,int,100> tmap;
    tmap *m1 = new tmap, *m2 = new tmap;
    delete m1; delete m2;
}

Note the order of calls to deallocate() is not LIFO. I found the cause of the bug to be the use of the static _Nil in XTREE, wrongly deallocated from a different instance of block_allocator. So I have rewritten the code...
The actual bug in the code is that allocators may not contain any instance-specific data, as mine regrettably do. This has implications for thread safety too, with regard to a question someone else posted several weeks ago.
Currently I'm quite busy, but please look forward to a new release of the library in a few weeks.
Joaquín M López Muñoz
This may come a little late, months after your post, but now that I have had some time to release v1.2 of block_allocator, I feel I can clarify this issue.
The bug you reported is caused (as you pointed out) by the deallocation of a static struct called _Nil by a different block_allocator instance than the one that allocated it. In my opinion, the VC++ STL implementation is to blame for this bug, as it fails to check whether allocators are interchangeable before attempting deallocation (a check it systematically performs everywhere else it matters).
One simple workaround for this problem is to have a static container of the offending type, so that both allocation and deallocation of _Nil are carried out by it, regardless of the various containers of the same type being constructed and destructed during the life of the program. The following sample code shows the technique:
#define DEFINE_BLOCK_ALLOCATED_MAP
#include "blockallocator.h"
#include <crtdbg.h>

using namespace std;

typedef block_allocated_map<int,int,1024> bmap;

int main(void)
{
    _CrtSetDbgFlag(_CRTDBG_DELAY_FREE_MEM_DF|_CrtSetDbgFlag(_CRTDBG_REPORT_FLAG));
    bmap *m1=new bmap;
    bmap *m2=new bmap;
    delete m1;
    delete m2;
    _CrtCheckMemory();
    return 0;
}

As for the multithreading issues I mentioned in my first reply to your post, a closer reevaluation of the library shows it is as multithread safe as it can be (more info in the updated article above).
Hope this helps.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Apparently SGI's STL offers four different allocators. Three of them are described as thread-safe, and one is described as fast but only for single-threaded programs. See http://www.sgi.com/Technology/STL/Allocators.html for more. Is it safe to use the block allocator that you've written in programs with multiple threads?
I'm sorry I'm so late in answering this, but I didn't have time to do it before.
block_allocator is as thread safe as the STL itself. More precisely, block_allocator is thread safe on a per-container basis, meaning that block-allocated container instances don't interfere with each other and can be handled separately by different threads. This is, IMHO, a reasonable thread-safety guarantee, and as a matter of fact it is the kind of guarantee offered by many libraries (including the STL). You may be glad to know that the thread safety of a program using the STL is not altered by the introduction of block_allocator. More info in the updated article above.
Joaquín M López Muñoz
Telefónica, Investigación y Desarrollo
Thank you, Joaquin. Your article is very useful.
I gave this a shot with a project that has half a dozen maps and lists. I was unable to see any speed improvement with this add-in; in fact, speed degraded by a tiny amount (about 0.14 seconds overall). What metrics did you use to measure speed, and where is the example code that shows it?
At the end of this email you'll find a small test program. It does random insertions and extractions on two different lists, both with the default allocator and with block_allocator. I ran this several times, and block_allocator consistently achieved an improvement of 42-43%. Maybe you can give it a try to determine whether there's something odd about your test platform. Mine is:
Compiler: MSVC++ 5.0 (no SPs)
Build: Release, Multithreaded CRT, all switches at their default states
OS: Win95 OSR2
Machine: PMMX, 233 MHz, 160 MB
Anyway, this is not an academic exercise. I developed block_allocator to use in an image compression algorithm doing some intense list work. The introduction of block_allocator alone sped up performance by 15-20%, and that was a real program, not a performance test. I'll be glad to hear about your further experiences with block_allocator. Perhaps some conclusion can be drawn from your test environment or the kind of work you use the containers for.
Regards, Joaquín M López Muñoz
--------------CODE FOLLOWS-------------

#define DEFINE_BLOCK_ALLOCATED_LIST
#include "blockallocator.h"

#include <iostream>
#include <windows.h>

using namespace std;

template <class T1,class T2,class X1,class X2>
DWORD test(T1& l1,T2& l2,X1& x1,X2& x2)
{
    srand(0);
    DWORD dw=GetTickCount();

    for(int n=0;n<500;++n){
        for(int m=0;m<1000;++m){
            if(rand()<RAND_MAX/2)l1.push_front(x1);
            if(rand()<RAND_MAX/2)l2.push_front(x2);
        }
        for(T1::iterator it1=l1.begin();it1!=l1.end(); ){
            if(rand()<RAND_MAX/2)it1=l1.erase(it1);
            else it1++;
        }
        for(T2::iterator it2=l2.begin();it2!=l2.end(); ){
            if(rand()<RAND_MAX/2)it2=l2.erase(it2);
            else it2++;
        }
        for(m=0;m<500;++m){
            if(rand()<RAND_MAX/2)l2.push_back(x2);
            if(rand()<RAND_MAX/2)l1.push_back(x1);
        }
    }

    l1.clear();
    l2.clear();

    return GetTickCount()-dw;
}

struct node
{
    DWORD x,y,z;

    bool operator <(const node &) const;
    bool operator >(const node &) const;
    bool operator ==(const node &) const;
    bool operator !=(const node &) const;
};

int main(void)
{
    DWORD total_default=0;
    DWORD total_block_allocator=0;
    for(int i=1;i<10;++i){
        {
            cout<<"default allocator: ";

            list<int> l1;
            list<node> l2;
            int x1=0;
            node x2;
            DWORD dw=test(l1,l2,x1,x2);
            cout<<dw<<endl;

            total_default+=dw;
        }
        {
            cout<<"block allocator: ";
            block_allocated_list<int,1024> l1;
            block_allocated_list<node,1024> l2;
            int x1=0;
            node x2;
            DWORD dw=test(l1,l2,x1,x2);
            cout<<dw<<endl;
            total_block_allocator+=dw;
        }
    }
    cout<<"----------------------------------------"<<endl;
    cout<<"Total time default allocator: "<<total_default<<endl;
    cout<<"Total time block allocator: "<<total_block_allocator<<endl;
    cout<<"Block allocator improvement: "<<
        (DWORD)(100.0*(total_default-total_block_allocator)/total_default)<<"%"<<endl;

    return 0;
}
Although I have not tried the block allocator described by the author, I have been designing my own block allocators, and my speed improvements roughly match the author's. There are three areas of interest when comparing allocator speed:
1) Allocation time
2) Deallocation time
3) Access time
A block allocator will generally improve allocation and deallocation times, but where block allocators can really influence speed is in the use of the memory afterwards. Because block allocators group memory together, you reduce memory fragmentation and page faults. The percentage decrease in time to randomly insert into a linked list will be larger than the percentage decrease to insert at the head of a linked list or into a map.
Steven Schimmelpfennig
I tried this block allocator in one of my projects.
I used several maps, multimaps, sets, and lists with up to hundreds of thousands of entries to calculate deformations in volumetric objects in real time. The speed improvement over the default Microsoft allocators was enormous, but the standard SGI STL version is still a bit faster!
Yours, Lothar.
I believe the reason the SGI implementation of the STL was faster is that it does not use a variable to track the count of objects currently in the list. This makes inserting faster, but it makes size() much slower; it is a tradeoff of which is more important to you. Using block allocators, I could almost match the SGI STL's time for inserting 10,000 integers into a list. Also, getting the size took no measurable amount of time (less than 0.00 ms), while the SGI STL took 333.33 ms to count all the elements.