|
leon de boer wrote: The more obvious answer is to get whatever is saving the files to write the ones that contain your string ABCD out under a special name. Then you don't have to search inside the files at all to find the ones you want. What happens if I need to find "BCDE" or "S1 " or "01FA" or ...?
Another obvious choice is to have the files on a ramdisk, as there isn't much data. If I put the folder on the SSD, the process is much faster: 60.9 s on the HD versus 2.9 s on the SSD. The SSD is an "unusual" location for that folder, because all the other files are on the HD, but it's the easiest solution.
Thank you
|
|
|
|
|
Quote: What happens if I need to find "BCDE" or "S1 " or "01FA" or ...?
Label them differently with a special name, obviously; all you are doing is coming up with a file-naming convention.
Hell, use the file extension you already have (*.states) and treat the suffix as a bit mask for which special strings are in the file:
*.states = file with no special tags
*.states1 = file with special tag 1 in it
*.states2 = file with special tag 2 in it
*.states3 = file with special tags 1 & 2 in it
*.states4 = file with special tag 3 in it
*.states5 = file with special tags 1 & 3 in it
*.states6 = file with special tags 2 & 3 in it
*.states7 = file with special tags 1, 2 & 3 in it
You can know which tags are in a file without ever opening it; all you need to know is the filename.
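The scheme can be sketched in a few lines; the enum values and function names here are illustrative, not from the post:

```cpp
#include <string>

// Sketch of the bit-mask naming scheme above: each special tag is one bit,
// and the digit appended to ".states" is the OR of the tags present.
enum Tag : unsigned { TAG1 = 1u, TAG2 = 2u, TAG3 = 4u };

std::string extension_for(unsigned tags)
{
    // No tags -> plain ".states"; otherwise append the mask as a digit.
    return tags == 0 ? std::string(".states")
                     : ".states" + std::to_string(tags);
}

bool has_tag(unsigned tags, Tag t)
{
    return (tags & t) != 0;   // test a bit instead of opening the file
}
```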
This is also obviously a Windows program, so why aren't you using the Windows API for the file open and read?
std::vector<std::string> matches;         // files/contents that match
char buf[65536];                          // read buffer (size to taste)
HANDLE Handle = CreateFile(fd.name, GENERIC_READ, FILE_SHARE_READ,
    0, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
if (Handle != INVALID_HANDLE_VALUE)
{
    DWORD Actual;
    if (ReadFile(Handle, buf, sizeof(buf) - 1, &Actual, 0) && Actual > 0)
    {
        buf[Actual] = 0;                  // NUL-terminate for the compare
        if (_strnicmp(buf, "ABCD", 4) == 0)
            matches.push_back(buf);       // save buf in the std::vector
    }
    CloseHandle(Handle);
}
In vino veritas
modified 2-Feb-19 3:59am.
|
|
|
|
|
Since almost all the files contain the wanted string, I'd need to open almost all of them anyway, so the speed-up would be negligible.
I don't use the Win API because, AFAIK, there is no fgets() equivalent, and there is no speed-up if I read the whole file at once.
|
|
|
|
|
I gave you the fgets equivalent above (it's only a couple of lines of code). I am not convinced it isn't faster, because with the standard library you are opening and reading through the C runtime's generic file handling on top of the OS calls.
Anyhow I will leave you to it
In vino veritas
|
|
|
|
|
I suspect that the reason for your program to run much faster on the second run is that modern drives cache a certain amount of data, and therefore don't need to rely on slow hardware for repeatedly reading the same files.
In your code, you read from each file line by line. Internally, these reads will trigger a request to read some block (or multiple blocks) of data. While each of these blocks is probably cached to be used for consecutive reads, any read request requiring a new block will cause another, slow, access to the hard disk.
You could speed this up by reading the whole file in a single operation: query its size, allocate a sufficiently large buffer, open as binary, and read the whole file into that buffer. Then your internal while loop can request each line from that buffer, which should be considerably faster.
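The suggestion above can be sketched as follows; the function and parameter names are invented for illustration, not the poster's code:

```cpp
#include <cstdio>
#include <string>

// Read the whole file into one buffer in a single operation, then search in
// memory instead of issuing a disk-backed read per line.
bool file_contains(const char* path, const char* tag)
{
    std::FILE* f = std::fopen(path, "rb");   // binary: no newline translation
    if (!f) return false;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);               // query the size
    std::fseek(f, 0, SEEK_SET);
    std::string buf(size > 0 ? static_cast<size_t>(size) : 0, '\0');
    size_t got = buf.empty() ? 0 : std::fread(&buf[0], 1, buf.size(), f);
    std::fclose(f);
    buf.resize(got);                         // keep only what was read
    return buf.find(tag) != std::string::npos;  // one in-memory search
}
```

If the lines themselves are needed, the inner loop can walk the buffer with `find('\n')` instead of calling fgets per line.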
GOTOs are a bit like wire coat hangers: they tend to breed in the darkness, such that where there once were few, eventually there are many, and the program's architecture collapses beneath them. (Fran Poretto)
|
|
|
|
|
The main culprit is fgets. Once you call it, the fopen family of calls immediately loads, I believe, 32K of data, and on top of that fgets is relatively slow. For speed, you may be better off using fread, reading in 4K (or the page size) at a time and parsing the block yourself by simply looking for ABCD. This could be sped up further with a Boyer-Moore search, though since the string is short, simply scanning first for 'A' and then checking the rest may be faster. That said, I believe some newer implementations of the standard library now include a Boyer-Moore searcher.
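The "scan for the first letter, then verify the rest" idea can be sketched like this; `block_contains` and its parameters are invented names for illustration:

```cpp
#include <cstring>

// Jump between occurrences of the tag's first character with memchr, and
// only run the full comparison at those positions.
bool block_contains(const char* block, size_t len, const char* tag)
{
    size_t taglen = std::strlen(tag);
    if (taglen == 0 || len < taglen) return false;
    const char* p   = block;
    const char* end = block + len - taglen + 1;  // last feasible start + 1
    while ((p = static_cast<const char*>(
                std::memchr(p, tag[0], end - p))) != nullptr)
    {
        if (std::memcmp(p, tag, taglen) == 0) return true;
        ++p;                                     // false hit, keep scanning
    }
    return false;
}
```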
Do also note that caching plays a big part here. Just recursing folders will take significantly longer the first pass than the second. This can be deceptive, however, since in actual operation those caches may be flushed between program runs.
|
|
|
|
|
The definitive solution: Ctrl-X the folder from the HDD to the SSD, restart the PC (probably not needed), then Ctrl-X it back from the SSD to the HDD.
Now the process takes 8.2 s instead of 61 s, which seems reasonable to me.
|
|
|
|
|
How can I write an Excel file with the .xlsx extension using C++?
|
|
|
|
|
|
|
Do typedefs, and heavy use of typedefs, make code complicated, especially if the code has to be used by a lot of people?
I can understand cases where a typedef is useful for the person who created it and uses it, but if that person revisits the code 5 years later, or some other individual reviews it, they have to constantly look up the typedefs. If there are a couple of typedefs it might be OK, but if the code is millions of lines and there are 1000 typedefs defined across various projects, doesn't that defeat the purpose of typedef?
I would rather not use typedefs at all because of this and just deal with the pain of typing out the complete syntax.
Any thoughts or revelations on this?
|
|
|
|
|
As with most "shortcuts", you should only use them where they add value or improve readability.
|
|
|
|
|
nitrous_007 wrote: Does typedefs and a lot of use of typedefs make code complicated especially if code has to be used by a lot of people? How do they make code more complicated?
nitrous_007 wrote: ...but if the person revisits the code 5 years later or some other individual is reviewing the code, they have to constantly look up the typedefs. Not if they were named/implemented correctly.
nitrous_007 wrote:
I would rather not use typedef's at all because of this and just deal with the pain of typing or using complete syntax. And what about that one place you forgot?
"One man's wage rise is another man's price increase." - Harold Wilson
"Fireproof doesn't mean the fire will never come. It means when the fire comes that you will be able to withstand it." - Michael Simmons
"You can easily judge the character of a man by how he treats those who can do nothing for him." - James D. Miles
|
|
|
|
|
About the only place I use them is for function prototypes for lambdas.
With intellisense and auto, my typing doesn't increase much and seeing the full type makes the code more clear for me.
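One way to read "function prototypes for lambdas" is an alias for a callable signature; this small sketch uses invented names and std::function as one possible spelling:

```cpp
#include <functional>

// An alias for a callback signature: the one spot where a named type can be
// clearer than repeating std::function<bool(int)> at every call site.
using Predicate = std::function<bool(int)>;

int count_if_match(const int* v, int n, const Predicate& p)
{
    int c = 0;
    for (int i = 0; i < n; ++i)
        if (p(v[i])) ++c;   // any lambda matching bool(int) fits
    return c;
}
```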
|
|
|
|
|
Without typedefs (or using[^]) code could be a mess.
Then again, nothing prevents messy code from taking advantage of typedef to get even messier.
|
|
|
|
|
typedef is essential in metaprogramming, i.e. when you implement template classes that are supposed to fulfil certain criteria. This allows generic algorithms to specify certain dependent types in their implementation (most notably, but not only, return types).
Other than that, typedefs and using-declarations[^] can be used to improve readability. But these mechanisms should not be overused: I prefer to be able to read where a symbol is coming from, rather than a name that may or may not be a local symbol.
That said, modern, language-sensitive text editors can show you what's behind a name very easily, maybe even in a tooltip. (And if yours doesn't, go look for a plugin that does.)
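The metaprogramming point can be illustrated in a few lines; the function name is invented, while value_type is the conventional nested typedef that the standard containers provide:

```cpp
#include <vector>

// A generic algorithm can name a container's element type through its nested
// typedef without ever knowing the concrete type.
template <typename Container>
typename Container::value_type sum(const Container& c)
{
    typename Container::value_type total{};  // dependent type as return type
    for (const auto& x : c)
        total += x;
    return total;
}
```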
GOTOs are a bit like wire coat hangers: they tend to breed in the darkness, such that where there once were few, eventually there are many, and the program's architecture collapses beneath them. (Fran Poretto)
|
|
|
|
|
I don't understand why you would need to look up a typedef constantly; if you want one, you just declare it. What does it matter what the underlying type is? That is sort of the point: to hide the base type and give a little more safety.
Explain a situation where you need to look at what the type is, and I suggest you are probably doing something wrong. You can get the size of any type without knowing what it is with the sizeof operator; that is the usual mistake people make when they think they need to know a typedef's base type.
I don't care if you have thousands of types; they make things easier, not harder, so there is something going on with why you think otherwise.
In vino veritas
|
|
|
|
|
I used to place all my #include directives in the header files, for two reasons:
1) When using C++ I have classes to derive from and to, so my derived class must "know" its ancestors;
2) I like to use the header files as indexes of the source files: the information about which dependencies a module has is more useful in the header file, as I don't even have to look at the source code unless there is trouble.
I don't mix and match, so I always put all my #includes in the header file.
Now I'm using plain C for a project and I had to include a bulky header (<windows.h> in this instance) to access an API needed only inside one .c file (the rest of the code doesn't use Windows APIs).
This made me doubt the soundness of putting all #include directives in the header file: keeping the inclusions confined to their own compilation units would speed up compilation and create more separation between modules; on the other hand, complex structures often include dozens of headers, and the lower levels benefit from the inclusions already managed by their dependencies.
What do you think about this issue? How do you normally operate?
GCS d--(d+) s-/++ a C++++ U+++ P- L+@ E-- W++ N+ o+ K- w+++ O? M-- V? PS+ PE- Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- r+++ y+++* Weapons extension: ma- k++ F+2 X
|
|
|
|
|
If a header file depends for its compilation on another header file, the second header file is included in the first.
#include "header2.h"
If a source file depends for its compilation on a header file, that header file is included in the source file.
#include "header1.h"
In your case, <windows.h> is needed only in the source file, so it is included only there. Note that this also helps porting - the header file (which is O/S-independent) does not need changing, but the source file (which is O/S-dependent) does.
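A concrete sketch of that layering, with invented file and function names, and with <chrono> standing in for the platform-specific header so the listing stays self-contained:

```cpp
// timer.h would hold the O/S-independent interface and include only what
// *it* needs:
//     #include <cstdint>
//     std::uint64_t ticks_ms();
//
// timer.cpp holds the O/S-dependent implementation; <windows.h> (or the
// POSIX headers on another platform) would be included only here, so porting
// touches only the source file.
#include <cstdint>
#include <chrono>   // stand-in for the platform header in this sketch

std::uint64_t ticks_ms()
{
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<milliseconds>(
            steady_clock::now().time_since_epoch()).count());
}
```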
I admit that my method could cause a file to be included multiple times. This is not as big a problem as it used to be:
- Modern compilers often cache header files
- SSDs are much faster than HDDs
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Daniel Pfeffer wrote: I admit that my method could cause a file to be included multiple times.
Not a big problem because widely used headers should be put into precompiled headers if compilation time is important.
Thanks for sharing your modus operandi.
GCS d--(d+) s-/++ a C++++ U+++ P- L+@ E-- W++ N+ o+ K- w+++ O? M-- V? PS+ PE- Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- r+++ y+++* Weapons extension: ma- k++ F+2 X
|
|
|
|
|
I try to avoid this now, as I have run into compile errors due to circular references, in which one include file depends on another. These errors can be hard to find, especially in a large project. This is not so much an issue with the MFC includes as with my own project includes. So I put the includes in the .c or .cpp file.
|
|
|
|
|
speedbump99 wrote: circular references in which one include is depending on another include file. These errors can be hard to find especially with a large project.
Bumped into them; they are quite the PITA to debug. It took me a while to understand how to do things properly... My only excuse is that the project I worked on for 7 years was a horrible mixture of global variables, global functions and poorly designed (damned self-taught programmers) C++ classes.
GCS d--(d+) s-/++ a C++++ U+++ P- L+@ E-- W++ N+ o+ K- w+++ O? M-- V? PS+ PE- Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- r+++ y+++* Weapons extension: ma- k++ F+2 X
|
|
|
|
|
No problem at all:
#ifndef HEADER_XY
#include "HEADER_XY.H"
#endif
Of course, HEADER_XY.H must itself contain
#define HEADER_XY
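The more common arrangement puts the guard inside the header itself, so every includer is protected automatically. This self-contained sketch (with invented names) simulates two inclusions of the same guarded region in one listing:

```cpp
// --- contents of a hypothetical header_xy.h ---
#ifndef HEADER_XY_H
#define HEADER_XY_H
inline int twice(int x) { return 2 * x; }
#endif // HEADER_XY_H

// --- a second inclusion of the same header is now harmless ---
#ifndef HEADER_XY_H
#define HEADER_XY_H
inline int twice(int x) { return 2 * x; }   // skipped: guard already defined
#endif
```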
|
|
|
|
|
I tend to (loosely) follow the guideline: if the file (header or source) doesn't need a header, then it should not include it.
Speeding up compilation is more of an issue with C++ than with C (moving from C++ to C is the real performance boost in compilation times, in my experience).
|
|
|
|
|
That's what I thought... I've been doing it mostly wrong for 7 years. Thanks!
GCS d--(d+) s-/++ a C++++ U+++ P- L+@ E-- W++ N+ o+ K- w+++ O? M-- V? PS+ PE- Y+ PGP t+ 5? X R+++ tv-- b+(+++) DI+++ D++ G e++ h--- r+++ y+++* Weapons extension: ma- k++ F+2 X
|
|
|
|