|
Okay, my mind is completely boggled by this code...
- you allocate an intrinsic array of size whatever
- you decrement the pointer to the array
- you then manually copy another array into this one starting at the same position you allocated the memory at
What's wrong with...
int *src_start = ...
int *src_end = ...
int *p = new int[ src_end - src_start ];
std::copy( src_start, src_end, p );
int *p_to_1_based_alias = p - 1;
If that doesn't at least perform as well as your handrolled loop then trade your compiler in for a new one. You can also get rid of any explicit memory management using a vector.
Another idea would be to just front end the array you get back from the API and convert the index:
template<std::size_t N, typename T>
class N_based_array
{
public:
N_based_array( T *data ) : data_( data ) {}
T &operator[]( std::size_t index )
{
return data_[ index - N ];
}
private:
T *data_;
};
Cheers,
Ash
|
|
|
|
|
I don't get a list of the type I want back from the API function call... I have to call each API method on a per-element basis. Basically, I have a cell-array from Matlab being passed in:
size_t const indexCount = mxGetNumberOfElements(prhs[0]);
size_t const dataSize = mxGetNumberOfElements(prhs[1]);
double const ** data = new double const *[dataSize];
unsigned __int32 const *const indices = static_cast<unsigned __int32 const *>(mxGetData(prhs[0]));
for (size_t i = 0; i < dataSize; ++i) {
data[i] = mxGetPr(mxGetCell(prhs[1], i));
}
--data; That is why I didn't do what you suggested.
Sounds like somebody's got a case of the Mondays
-Jeff
|
|
|
|
|
So what's wrong with the wrapping class solution to convert your array to a 1 based one?
Seeing your code there you could use std::generate on a std::vector to get the same effect as your manual loop. Generally if you're doing low level memory fiddling, pointer arithmetic and looping at the same time in C++ there's usually a simpler way.
Cheers,
Ash
|
|
|
|
|
Skippums wrote: int *const dst = new int[srcLen] - 1;
This is seriously broken. You are allocating a pointer and then decrementing it. This is VERY bad (especially since you declare dst const and now can't delete it.) If you want to use 1 indexing, simply allocate one extra element and don't worry about element zero.
Skippums wrote: for (size_t i = 0; i < srcLen; )
dst[++i] = src[i];
The behavior here is technically undefined, however with VS 2008 it will increment i, then use the result as the index for both arrays.
Skippums wrote: const int temp = src[i];
The "const" here does nothing and ends up being slower, not just because you are confusing the compiler, but because it prevents the compiler from using an indexed assembly call.
Skippums wrote: (I don't really care timing-wise as the loop is only iterating over a couple hundred thousand elements one time per run, which is an insignificant amount of time as a fraction of the program's run-time, but why not write efficient code when you can?)
Because memcpy() will very likely be faster in this case and you aren't concentrating on where the optimizations will actually do some good. Assuming you have a sound algorithm, a general rule is writing sensible, plain C/C++ code will allow the compiler to do the best job optimizing (I've proven this many times to skeptics.)
* * * * *
Note: Out of curiosity, I benchmarked the various algorithms using just an int array of 10,000 and 100,000 elements.
void Test1()
{
for (size_t i = 0; i < len; ++i)
pDst[i + 1] = pSrc[i];
}
void Test2()
{
for (size_t i = 0; i < len; )
{
const int temp = pSrc[i];
pDst[++i] = temp;
}
}
void Test3()
{
memcpy(&pDst[1], pSrc, len * sizeof(int));
}
void Test4()
{
int *src_start = (int*) &pSrc[0];
int *src_end = (int*) &pSrc[len];
std::copy(src_start, src_end, &pDst[1]);
}
These are very artificial tests, but as expected Test3 & Test 4 were fastest by about 15%. Test4 was often slightly faster by a few cycles than Test3. I scratched my head since both end up at memcpy(), but Test4 has more apparent overhead. But then I realized it was the len * sizeof(int) calculation that slightly slows Test3().
Surprisingly, Test2 was ever so slightly faster than Test1 (by about 0.1% - 0.5% on my system.) I suspect the CPU cache covers the "save" of the register.
modified on Tuesday, August 24, 2010 2:20 PM
|
|
|
|
|
Good points.
Joe Woodbury wrote: const int temp = src[i];
The "const" here does nothing and ends up being slower, not just because you are confusing the compiler, but because it prevents the compiler from using an indexed assembly call.
Care to elaborate on this a bit? Why would the const affect the compiler? I would assume the compiler easier could chose an indexed call just because there is a const.
|
|
|
|
|
for (size_t i = 0; i < srcLen; ++i)
dst[i + 1] = src[i];
for (size_t i = 0; i < srcLen; )
{
const int temp = src[i];
dst[++i] = temp;
}
Sorry, I was typing fast and was rather confusing. I should have said that the second doesn't take advantage of the indexing assembly can do. In assembly both calls use something like mov DWORD PTR [ecx+eax*4], edx , the first, however, "sees" the add by one and simply makes the final copy mov DWORD PTR [eax+edx*4+4], ecx the instruction is one extra byte.
Now you could write the function as follows, however, my experience is that it will end up running slower. Partly because the compiler may not make pDst and pSrc register variables (due to not enough room) and partly because of the above.
int* pDst = &dst[1];
int* pSrc = &src[0];
for (size_t i = 0; i < srcLen; ++i)
*pDst++ = *pSrc++;
[Note: Turns out that in an admittedly isolated environment, the second algorithm and third algorithms are slightly faster than the first, which surprised me, but that's the voodoo of assembly and modern CPUs. memcpy() is a solid 15% faster. Of course, memcpy is likely using SSE2 instructions which makes it even faster than rep movsd .
modified on Tuesday, August 24, 2010 2:58 PM
|
|
|
|
|
Thanks for the explanation. It makes sense if it depends on the indexing rather than the const keyword.
|
|
|
|
|
Joe Woodbury wrote: This is seriously broken. You are allocating a pointer and then decrementing it. This is VERY bad (especially since you declare dst const and now can't delete it.) If you want to use 1 indexing, simply allocate one extra element and don't worry about element zero.
Actually you can delete it:
delete[] (dst + 1);
Joe Woodbury wrote: The behavior here is technically undefined, however with VS 2008 it will increment i, then use the result as the index for both arrays.
That was my original question... Thank you for answering it! Even if it would have done what I wanted, I still probably wouldn't have used it due to code maintainability issues.
Joe Woodbury wrote: The "const" here does nothing
Actually, the const there prevents people in the future from modifying the value stored in temp, which may prevent coder error. In the simple loop I proposed, it probably will have no effect either way, since screwing up the loop through human error is unlikely and I'm guessing VS2008 is smart enough not to allocate memory for it. Also, the actual loop I am using is more complex than this example, so the compiler couldn't have used an indexed assembly call anyway.
Joe Woodbury wrote: Because memcpy() will very likely be faster in this case
I can't use memcpy... that is part of the complexity of my problem. See my response to the post immediately before yours if you are curious as to why I can't use that function.
I do fully agree with your statement:
Joe Woodbury wrote: Assuming you have a sound algorithm, a general rule is writing sensible, plain C/C++ code
Thanks for the help!
Sounds like somebody's got a case of the Mondays
-Jeff
|
|
|
|
|
Skippums wrote: I'm guessing VS2008 is smart enough not to allocate memory for it
It actual does since it runs out of registers (I was a little surprised by this, but it makes sense once you look at the dissassembled code.) In practice, though, I'd imagine the cache would prevent this from being a big performance hit.
|
|
|
|
|
One question: Have you measured how much faster those a-priori optimizations are running compared to the very cleanly coded algorithm from the post above (using std::copy)?
|
|
|
|
|
Repeating my edit above:
Out of curiosity, I benchmarked the various algorithms using just an int array of 10,000 and 100,000 elements.
void Test1()
{
for (size_t i = 0; i < len; ++i)
pDst[i + 1] = pSrc[i];
}
void Test2()
{
for (size_t i = 0; i < len; )
{
const int temp = pSrc[i];
pDst[++i] = temp;
}
}
void Test3()
{
memcpy(&pDst[1], pSrc, len * sizeof(int));
}
void Test4()
{
int *src_start = (int*) &pSrc[0];
int *src_end = (int*) &pSrc[len];
std::copy(src_start, src_end, &pDst[1]);
}
These are very artificial tests, but as expected Test3 & Test 4 were fastest by about 15%. Test4 was often slightly faster by a few cycles than Test3. I scratched my head since both end up at memcpy(), but Test4 has more apparent overhead. But then I realized it was the len * sizeof(int) calculation that slightly slows Test3().
Surprisingly, Test2 was ever so slightly faster than Test1 (by about 0.1% - 0.5% on my system.) I suspect the CPU cache covers the "save" of the register.
|
|
|
|
|
Thanks for the insight. A nice lesson for the "early optimizers"...
|
|
|
|
|
If your intent is accessing two arrays within a loop, then - no matter the relation between indices - the fastest way would be to use two pointers to the individual elements and increment these.
If you're so intent on improving performance, consider this: every direct access to an array element via an index value requires
1. loading the start address of the array
2. get the size of an element
3. multiplying that by the index(-1)
4. adding that to the start address
5. dereference this address to get to the actual element.
As opposed to using pointers which just requires
1. load the pointer
2. dereference it
Of course, incrementing the pointers eats up most of this advantage, as you have to add the element size each iteration. And most likely a good optimizer will convert your code into something like this anyway.
What I want to say, using an index is just complicating things unneccesarily if all you want to do is access each element sequentially.
|
|
|
|
|
"why not write efficient code when you can?"
Don't be daft. Write readable code when you can. Optimize when necessary and use a profiler to detect inefficencies.
|
|
|
|
|
Why use a performance profiler when I can just guess which parts will be inefficient during implementation? And for the record, my code is readable; the compiler understands it just fine.
Sounds like somebody's got a case of the Mondays
-Jeff
|
|
|
|
|
Skippums wrote: Why use a performance profiler when I can just guess which parts will be inefficient
Unless you're writing Hello World, you will be surprised.
|
|
|
|
|
I guess I need to be more explicit when I type sarcastic comments. Sadly, I couldn't find the appropriate voice inflection characters on my keyboard.
Sounds like somebody's got a case of the Mondays
-Jeff
|
|
|
|
|
..or don't write what I will read
|
|
|
|
|
hello,
i am trying to write a code which will display the layout of an ogg page! but it seems to me that the result dispalyed is wrong! please can someone help me??
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <math.h>
typedef struct
{
char magic_number[4];
int version;
long header_type;
int granule_position;
long bitstream_serial_number;
int page_sequence_number;
int crc;
int page_segments;
int segment_table;
} OggS;
OggS* lecture(char*) ;
main (void)
{
char nom[100];
OggS* ogg;
printf("Veuillez entrer le nom et chemin du fichier OGG sans l'extension .ogg :\n");
gets(nom);
strcat(nom,".ogg");
ogg=lecture(nom);
printf ("The magic_number is:\t %.4s\n",ogg-> magic_number);
printf ("Version du fichier :%d\n",ogg->version);
printf ("Le type du header du fichier :\t %d\n",ogg->header_type);
printf ("Le granule position du fichier :\t %d\n",ogg->granule_position);
printf ("Le bitstream_serial_number :\t%d\n",ogg->bitstream_serial_number);
printf ("page_sequence_number :\t %d\n",ogg->page_sequence_number);
printf ("crc :\t\t %d\n",ogg->crc);
printf ("page_segments :\t\t %d\n",ogg->page_segments);
printf ("segment_table :\t\t\t %d\n",ogg->segment_table);
}
OggS* lecture(char* nom)
{
unsigned char temp[4];
OggS *ogg;
FILE * fichier;
ogg=(OggS*)malloc(sizeof(OggS));
fichier = fopen (nom,"r");
fread (ogg->magic_number,4,1,fichier);
fread (ogg->version,1,8,fichier);
fread (ogg->header_type,1,8,fichier);
ogg->header_type = conv (temp,2);
fread (&temp,1,1,fichier);
ogg->granule_position = conv (temp,4);
fread (&temp,1,1,fichier);
ogg->bitstream_serial_number = conv(temp,4);
fread (&temp,1,1,fichier);
ogg->page_sequence_number = conv(temp,4);
fread (&temp,1,1,fichier);
ogg->crc = conv (temp,2);
fread (&temp,1,1,fichier);
ogg->page_segments = conv (temp,2);
fread (&temp,1,1,fichier);
ogg->segment_table = conv (temp,4);
fclose(fichier);
return ogg;
}
int conv (unsigned char hex[4],int nb)
{
int res=0,i;
for (i=nb-1;i>=0;i--)res=res*256+hex[i];
return res;
}
|
|
|
|
|
rukita wrote: but it seems to me that the result dispalyed is wrong!
Well without knowing what results you expect to see and what results you actually see it is difficult to guess what the problem is. However I would make some suggestions about your code (I think your fread() calls may be wrong).
In lines like:
fread (ogg->magic_number,4,1,fichier);
Instead of hard coding your size values use the sizeof() operator to ensure you are correct, thus:
fread (ogg->magic_number, sizeof(char), 4, fichier);
You also need to add the addressof operator & for the non-array items thus:
fread (&ogg->version, sizeof(int), 1, fichier);
and the same with your conv() function.
It's time for a new signature.
|
|
|
|
|
hey Richard,
Thanks for your answer. i've changed my code. However i still do not know if what is dipakyed is correct or not. So, is there a way to know it? All what i know about the Ogg layout is, that an ogg page header is composed of:
Capture pattern – 32 bits
The capture pattern or sync code is a magic number used to ensure synchronisation when parsing Ogg files. Every page starts with the four ASCII character sequence OggS.
Version – 8 bits
This field indicates the version of the Ogg bitstream format. It is currently mandated to be 0.
Header type – 8 bits
This is an 8 bit field of flags, which indicates the type of page that follows. The rightmost or least significant bit is considered bit 0, with value 0x01, the next least significant digit is bit 1, with value 0x02. The third is bit 2, with value 0x04, and so on.
Granule position – 64 bits
A granule position is the time marker in Ogg files. It is an abstract value, whose meaning is determined by the codec. It may for example be a count of the number of samples, the number of frames or a more complex scheme.
Bitstream serial number – 32 bits
This field is a serial number that identifies a page as belonging to a particular logical bitstream. Each logical bitstream in a file has a unique value, and this field allows implementations to deliver the pages to the appropriate decoder.
Page sequence number – 32 bits
This field is a monotonically increasing field for each logical bitstream. The first page is 0, the second 1, etc. This allows implementations to detect when data has been lost.
Checksum – 32 bits
This field provides a checksum of the data in the entire page, performed with the checksum field set to 0.
Page segments – 8 bits
This field indicates the number of segments that exist in this page. It also indicates how many bytes are in the segment table that follows this field. There can be a maximum of 255 segments in any one page.
Segment table
The segment table is an 8 bit vector of values indicating the length of each segment within the page body. The number of segments is determined from the preceding Page Segments field. Each segment is between 0 and 255 bytes in length.
and here is my new code:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <math.h>
typedef struct
{
char magic_number[4];
int version;
long header_type;
int granule_position;
long bitstream_serial_number;
int page_sequence_number;
int crc;
int page_segments;
int segment_table;
} OggS;
OggS* lecture(char*) ;
main (void)
{
char nom[100];
OggS* ogg;
printf("Veuillez entrer le nom et chemin du fichier OGG sans l'extension .ogg :\n");
gets(nom);
strcat(nom,".ogg");
ogg=lecture(nom);
printf ("The magic_number is:\t %.4s\n",ogg-> magic_number);
printf ("Version du fichier :%d\n",ogg->version);
printf ("Le type du header du fichier :\t %d\n",ogg->header_type);
printf ("Le granule position du fichier :\t %d\n",ogg->granule_position);
printf ("Le bitstream_serial_number :\t%d\n",ogg->bitstream_serial_number);
printf ("page_sequence_number :\t %d\n",ogg->page_sequence_number);
printf ("crc :\t\t %d\n",ogg->crc);
printf ("page_segments :\t\t %d\n",ogg->page_segments);
printf ("segment_table :\t\t\t %d\n",ogg->segment_table);
}
OggS* lecture(char* nom)
{
unsigned char temp[4];
OggS *ogg;
FILE * fichier;
ogg=(OggS*)malloc(sizeof(OggS));
fichier = fopen (nom,"r");
fread (ogg->magic_number,4,1,fichier);
fread (&ogg->version,sizeof(int),1,fichier);
fread (&ogg->header_type,sizeof(long),8,fichier);
ogg->header_type = conv (temp,2);
fread (&ogg->granule_position,sizeof(int),1,fichier);
ogg->granule_position = conv (temp,4);
fread (&ogg->bitstream_serial_number,sizeof(long),1,fichier);
fread (&ogg->page_sequence_number,sizeof(int),1,fichier);
ogg->page_sequence_number = conv(temp,4);
fread (&ogg->crc,sizeof(int),1,fichier);
ogg->crc = conv (temp,2);
fread (&ogg->page_segments,sizeof(int),1,fichier);
ogg->page_segments = conv (temp,2);
fread (&ogg->segment_table,sizeof(int),1,fichier);
fclose(fichier);
return ogg;
}
int conv (unsigned char hex[4],int nb)
{
int res=0,i;
for (i=nb-1;i>=0;i--)res=res*256+hex[i];
return res;
}
please can u help me??
|
|
|
|
|
Well the first thing I noticed is that you are reading an integer (32 bits) for the version, which you state is 8 bits wide, so you should be reading a byte. Check the rest of your code to ensure it is consistent. You could probably make it much easier by the use of a packed structure (check the #pragma pack() and struct keywords in the C++ documentation.
As for any further information I would suggest you do some research on the internet for documentation about the ogg file format (I have not used it before), as you may well find some existing programs or libraries that will help you to read it.
It's time for a new signature.
|
|
|
|
|
On my local machine are a number of user accounts. When I use either NetUserEnum() or NetQueryDisplayInformation() to get those, I always get a few extra. For example, HelpAssistant and SUPPORT_388945a0.
Here's the code I'm using:
LPUSER_INFO_0 pTmpBuf;
DWORD dwEntriesRead,
dwTotalEntries;
NetUserEnum(NULL, 0, FILTER_NORMAL_ACCOUNT, (LPBYTE *) &pTmpBuf, MAX_PREFERRED_LENGTH, &dwEntriesRead, &dwTotalEntries, NULL);
LPUSER_INFO_0 pBuf = pTmpBuf;
for (; dwTotalEntries > 0; dwTotalEntries--)
{
_tprintf(_T("%s\n"), pBuf->usri0_name);
pBuf++;
}
NetApiBufferFree(pTmpBuf); Any idea how I can exclude those other user accounts?
Thanks.
- DC
"One man's wage rise is another man's price increase." - Harold Wilson
"Fireproof doesn't mean the fire will never come. It means when the fire comes that you will be able to withstand it." - Michael Simmons
"Man who follows car will be exhausted." - Confucius
|
|
|
|
|
Oh boy! I wish it was that easy to remember what I did almost 10 years back. I have done exactly what you trying to do. The gist of it is
a. Initially you get all the accounts, including the ones you never seen or heard of in the system, some don't even show in the accounts list of the User Control Panel
b. I was trying to get the user names which show in the log on screen ONLY
Then you going to call NetUserGetInfo for every user. I think using level 11, USER_INFO_11 will give more detail about the account including the type. I thought there is somewhere you can query for "interactive?" turned on. By interactive it means the user name can be used to login interactively to the system.
I hope, I could be more helpful, but the thought is killing me. I hope this leads you to the right direction.
|
|
|
|
|
Hello all,
I am detecting deletion of files on the system and to do this i am hooking the NtSetInformationFile function. this gets passed to it the file handle, and from this i need the file name. so i am using the API GetFileInformationByHandleEx to get the file name, But the problem is that the file name comes like this "\sample\a.txt", Now this doesn't give me device name(\device\volume) with File name, so i cannot assume from where the file has been accessed, It could be "C:\sample\a.txt" or "D:\sample\a.txt".
So it's quite clear that i must have "\Device\volume0", "\Device\volume1" etc before the filename, Further googling took me to THIS page where the file name can be retrieved from FileHandle, This uses CreateFileMapping, MapViewOfFile, GetMappedFileName, GetLogicalDriveStrings and QueryDosDevice to retrieve the file name, But when i use it CreateFileMapping fail with error 5 which is "Access Denied", Some more google and i found that the file handle must have GENERIC_READ access else CreateFileMapping will fail.
Now i'm not opening the file, explorer is.. So how could i check with which access explorer opens or access the file or how could i change the dwDesiredAccess..
Thanks All..
|
|
|
|
|