|
Are you trying to print those files to the console?
That won't work, since they contain binary data, which may well include the character with ASCII code 0 (which will be treated as the end of the string). What are you trying to do here?
|
|
|
|
|
Hi,
Using C++, std::fstream, Windows.
I have an algorithm that processes a large file (actually it is a series of algorithms, but let's consider only one, since the others have the same problem).
In essence, the algorithm reads chunks of data from one file and writes them to another file, performing some realignment along the way (for reference, I'm realigning 3D volume data). My read chunk size is 512 bytes; the write chunk size is 16 KB.
Usually this algorithm finishes in 1 minute and 50 seconds, but I noticed that sometimes (rarely) it finishes in 24 seconds! Same file, same execution path. I've started searching for the reason this slowdown happens and how I can control it.
1) I have tried increasing the coalescing of the accesses to disk.
2) I have considered fragmentation (the file I write is written in small chunks, so it ends up fragmented, about 600 fragments). When I eliminated the fragmentation (so the file is guaranteed to be contiguous), it didn't help; the access speed is still chaotic.
3) I have investigated the possibility that Windows flushes my memory buffers to the HDD. No, that's not the case.
4) I have found that if I do all of this on another physical disk (not the one with the OS), I get the slowdown more rarely, and the algorithm usually finishes in 40 seconds (but the HDD access speed is still chaotic).
Frequently, during a single run, the access speed rises or falls, sometimes several times.
It looks like my accesses are going out of step with some internal HDD or OS operation, I don't know.
Anyone, have some experience or idea?
Thanks.
|
|
|
|
|
If you read and write data from different files on the same disk, the read head has to move back and forth between two places on the disk, which is slow. If your input and output files are on different drives, this problem disappears. If your program alternates between reading and writing (to the same physical disk) on every iteration, this slowdown is at its worst.
Note that some variation in disk speed can't be prevented in a multitasking OS where other programs might be using the disk too.
If your memory permits, you could consider reading the entire input file to memory at once at the beginning of your algorithm and then processing it and writing the results.
modified 13-Sep-18 21:01pm.
|
|
|
|
|
My files are always on the same drive in all tests, whether it finishes in 1:54 or in 24 seconds.
Thaddeus Jones wrote: Note that some variation in disk speed can't be prevented in a multitasking OS where other programs might be using the disk too.
I always ensure no other work is running. Also I'm looking in the "Resource monitor", no big HDD accesses except my program.
Thaddeus Jones wrote: If your memory permits, you could consider reading the entire input file to memory at once at the beginning of your algorithm and then processing it and writing the results.
No, this is not possible; the algorithm should work with unlimited file sizes.
|
|
|
|
|
Maybe you could then increase your input buffer from 512 bytes to, say, 100 MB, and every time you've processed the 100 MB in memory, read in a new 100 MB. Similarly, writing your output to a memory buffer (say, also 100 MB) and only writing it to file once the buffer is full should help with speed too.
The idea is to concentrate disk access on areas of the disk that are near each other, since those operations are much faster than ones that force the head to be repositioned.
modified 13-Sep-18 21:01pm.
|
|
|
|
|
Thaddeus Jones wrote: Maybe you could then increase your input buffer from 512 bytes to, say, 100 MB, and every time you've processed the 100 MB in memory, read in a new 100 MB. Similarly, writing your output to a memory buffer (say, also 100 MB) and only writing it to file once the buffer is full should help with speed too.
I've tried this approach. Although it gives a slight speed increase, it doesn't solve the problem of the chaotic speed.
|
|
|
|
|
I'm afraid I'm out of ideas then
modified 13-Sep-18 21:01pm.
|
|
|
|
|
The problem is not that the algorithm is slow, but that its speed is so chaotic. Sometimes it finishes 5 times faster than usual, which means it should always be able to finish 5 times faster.
I agree that access speed can vary a little, but 5 times... I think this is something that should be puzzled out.
|
|
|
|
|
progDes wrote: Usually this algorithm finishes in 1 minute and 50 seconds, but I noticed that sometimes (rarely) it finishes in 24 seconds! Same file, same execution path. I've started searching for the reason this slowdown happens and how I can control it.
Caching, perhaps?
"One man's wage rise is another man's price increase." - Harold Wilson
"Fireproof doesn't mean the fire will never come. It means when the fire comes that you will be able to withstand it." - Michael Simmons
"Man who follows car will be exhausted." - Confucius
|
|
|
|
|
There are countless factors that could be contributing to this.
I would guess that the major ones are file caching and delayed writes[^].
File Caching:
When a file is opened and read, its contents are loaded into RAM by the OS, and sections are then copied to your exe as you need them (usually in 4 KB chunks, from which you then read smaller chunks). Once the file is closed it is marked as unused by the OS, but it is not removed from RAM. If another program then needs a lot of memory, the file will be evicted from RAM; however, if that does not happen and your file is still in RAM, your program doesn't actually need to touch the hard disk.
This is most noticeable if you open a program that loads lots of files at startup, say MS Word. If you close the program and open it again shortly afterwards, without running anything else in between, it will load much more quickly the second time.
Delayed Writes:
These generally only occur with slow media such as USB memory sticks, but they can happen with HDDs as well.
When you write to a file and the storage device is busy, the data you write will often be written to a virtual file in RAM, which is then written out to the storage device at a later stage.
Other problems may include HDD head seeks (as mentioned by Thaddeus Jones) and other programs accessing the disk.
If you are running Windows Vista or 7, you can look at disk accesses with the Resource Monitor (resmon.exe).
|
|
|
|
|
Thanks Andrew,
I will consider the file-caching angle; I need to investigate this more.
Meanwhile, do you think that occasional disk accesses by other programs can slow my accesses down by a factor of 5?
I'm making sure that no heavy HDD operations are performed by other programs, but other programs certainly do access the disk, even when idle.
|
|
|
|
|
If you run out of ideas, have you tried disabling anti-virus software? Maybe it performs some weird caching of scanned data.
Even a long shot is a shot...
|
|
|
|
|
Yes, I tried.
Well, actually I think this is a file-caching problem. It seems caching does not always work well in my case. I will try to disable it and do some caching on my own.
|
|
|
|
|
Hi All,
I wanted to know the meaning of a line of code.
A structure is declared in a .h file:
struct xyz
{
int q;
char w;
long s;
};
An object of it is declared in a .cpp file:
struct xyz *obj;
obj = ( struct xyz * ) ( buffer + offset );
where buffer is a char array of 1000 bytes and offset is a long variable with the value 500.
I want to know the meaning of the line:
obj = ( struct xyz * ) ( buffer + offset );
Can anybody help me with this?
Thanks in advance
|
|
|
|
|
The structure pointer obj will now point to the memory address where buffer begins, with an offset of 500 added.
Assuming a 32-bit int, a 64-bit long, and no padding, the structure will map onto the following data:
obj->q has buffer[500] to buffer[503] as its integer value
obj->w is buffer[504]
obj->s has buffer[505] to buffer[512] as its long value
modified 13-Sep-18 21:01pm.
|
|
|
|
|
Of course, that depends on the current padding. For info see the pack[^] pragma. It could just as well be:
obj->q has buffer[500] to buffer[503] as its integer value
obj->w is buffer[504]
obj->s has buffer[508] to buffer[515] as its long value
or something completely different.
|
|
|
|
|
Niklas Lindquist wrote: Of course that depends on the current padding
And, indeed, on the sizes of int and long, as the original response noted. On some 64-bit machine/compiler combinations, sizeof(int) == 8.
Graham
Librarians rule, Ook!
|
|
|
|
|
A bit more explanation: in C/C++, arrays and pointers are almost equivalent (see this C++ Language Tutorial on pointers[^]). Thus buffer + 500 is a pointer equivalent to &buffer[500] .
This sort of construct is often used when reading and writing data to/from devices (such as a disk or a network socket). The device usually works with unstructured data, which is treated as a sequence of bytes. Programmers often use char arrays as a buffer (a char is usually the same size as a byte) to read from and write to the device, but they want to use structs within the rest of the program. The cast operator (struct xyz*) instructs the compiler to convert the pointer on the right-hand side to a pointer to the xyz struct.
Graham
Librarians rule, Ook!
|
|
|
|
|
I am drawing an image using GDI+ functions. The image has black at its corners, and I need to draw that black color as transparent against the window.
How can I do that?
|
|
|
|
|
|
|
You can simply use the ImageAttributes::SetColorKey function.
ImageAttributes imageAttrs;
Color transparentClr(0, 0, 0);
imageAttrs.SetColorKey(transparentClr, transparentClr);
Use these image attributes in Graphics::DrawImage:
graphics.DrawImage(&yourImage, yourRect, 0, 0, yourImage.GetWidth(), yourImage.GetHeight(),
Gdiplus::UnitPixel, &imageAttrs);
|
|
|
|
|
|
Hi all
I have one window with a size of 700*700, containing two OpenGL views, one 500*500 and one 100*100.
I am drawing some objects on the 500*500 OpenGL view.
How can I draw the same objects on the 100*100 view without creating any new objects?
That is, I want to reuse the big view's existing objects in the small view.
Let me know how we can achieve this.
Thanks
|
|
|
|
|
We have the extra large icon view (256 x 256) since Windows Vista.
Now I have a control panel applet running under Win7 and Vista. I have added icon resources with 256 x 256 images into the CPL, but the shortcut (on the Desktop) to this CPL applet does not display the icon correctly.
It is said that assigning the value CPL_DYNAMIC_RES for the idIcon member of CPLINFO structure in the CPL_INQUIRE message handler triggers Windows to send a CPL_NEWINQUIRE message. And then we assign the hIcon member of NEWCPLINFO in the CPL_NEWINQUIRE message handler for the icon information.
But neither AfxGetApp()->LoadIcon nor ::LoadImage (even with the LR_DEFAULTSIZE flag) makes the shortcut's icon on the Desktop display the correctly sized image.
Does anyone know the correct way to handle the CPL_INQUIRE and CPL_NEWINQUIRE to make it display 256 x 256 image for the shortcut on the Desktop?
Thanks in advance.
Maxwell Chen
|
|
|
|