|
I still have the problem. I don't see what changed to fix it.
|
|
|
|
|
Hi,
Nothing with FormatV() changed. Only Format() changed. I templatized Format in order to allow passing string objects.
You cannot simply pass a string object to FormatV. It doesn't take a variable argument list. Instead, it takes an arglist object which is BUILT from a variable argument list. However, if you're building your own arglist then you're still stuck with the need to cast. No way to templatize that. Sorry. As I mention in the comment, allowing this practice (passing string objects) with Format was a hack designed to get around a dangerous MS hack people have long relied upon.
This has long been an incompatibility between CString and my code. In short, even Microsoft recommends you NOT pass string objects directly to calls to Format(), sprintf() or other such variadic functions. They recommend you first cast them to LPCTSTR. If you do that with the string objects you're passing, everything should be fine.
So instead of this
CStdString sName("Joe");
CStdString sVal;
sVal.Format("My name is %s", sName);
You should instead do this
sVal.Format("My name is %s", (LPCTSTR)sName);
or alternately this:
sVal.Format("My name is %s", sName.c_str());
Please note that this problem is due to a dangerous hack that the CString designers put in which, in my opinion, they never should have done. The only reason CString lets you get away with this practice is because they carefully laid out the binary pattern of the class to enable it. It's a bad habit to get into.
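To make the point about variadic functions concrete, here is a minimal sketch of a function that builds an arglist and hands it to vsnprintf. FormatSketch and Greet are hypothetical names used only for illustration; they are not part of StdString.h, and the fixed 256-byte buffer is an assumption to keep the example short.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper (not from StdString.h): formats into a std::string
// by building a va_list and passing it to vsnprintf.
std::string FormatSketch(const char* fmt, ...)
{
    char buf[256];  // assumed large enough for this illustration
    va_list args;
    va_start(args, fmt);
    // vsnprintf reads each "%s" argument as a raw const char*. It has no
    // way to tell that a string object was passed instead of a pointer.
    std::vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);
    return std::string(buf);
}

// Safe call: pass the character pointer, never the string object itself.
std::string Greet(const std::string& name)
{
    return FormatSketch("My name is %s", name.c_str());
}
```

Calling Greet with "Joe" passes a plain pointer through the "..." and so avoids the problem described above.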
You can find a much fuller discussion of this topic in this feedback thread as long as you set the date filter to go back indefinitely. It's under the thread entitled "Operator[] and other incompatabilities"
-Joe
|
|
|
|
|
Hi Joe,
I meant Format instead of FormatV. I didn't realize it was unsafe to pass a CString object to CString Format. I was excited when I found your class because it is exactly what I need. I don't use MFC, WTL requires its headers on everyone's machine, and the STL string is a mess. It looks like if I want to use it, though, I will have to change a ton of calls and hope that I get them all. I guess I will just stick to what I have and suffer for now.
Thanks
|
|
|
|
|
You say you DID mean Format()? Well then your code SHOULD work. If it's not working I'd like to see it because it's a bug. I definitely made this workaround for calls to Format().
Please send me some sample code which illustrates the problem and I will check it out. You can find my email address in the StdString.h header file
-Joe
|
|
|
|
|
The sample above crashes for me in every test I try. I really don't see what you have done to fix this. The args are passed down to FormatV and FormatV passes them to _vsnprintf and _vsnprintf doesn't do a cast so it crashes.
-Thanks
|
|
|
|
|
I don't see what sample you are talking about. However if the implementation you see simply does what you describe, then you have an outdated version of the code.
OR... I think I may see it now. There are two versions of Format(). One that takes a string literal for the format string:
void Format(const CT* szFmt, ...)
and another that takes a resource ID:
void Format(UINT nId, ...)
Are you calling the one that takes a resource ID? That might be the problem. I only fixed the version that takes the string literal. Sorry, my bad. Just forgetfulness on my part. I'll do the same "fix" with the other version. You can download it here:
http://home.earthlink.net/~jmoleary/code/StdString.zip
Please make sure you have the very latest drop. If you do and you are still having trouble, please email me the code directly at jmoleary@earthlink.net
Thanks,
|
|
|
|
|
One of the most common uses of strings is to concatenate them using operator +=. In general this operation doesn't seem to be very efficient. Here are some of my test results:
For my testing, I concatenate a 100-character string 10000 times.
Using MFC's CString as (just pseudo-code)
CString str, s('0', 100);
for (int i = 0; i < 10000; i++)
{
str += s;
}
It takes about 70 seconds on my machine (
It is almost the same result (actually 68 seconds) if std::string is used (
Interestingly, I was quite surprised that the same thing can be done in VB in 49 seconds - who said VB is slower?
I then investigated some other options.
Using C library functions as
// allocate big enough buffer first
char* str = new char[100*10000+1];
str[0] = '\0';
for (int i = 0; i < 10000; i++)
{
strcat(str, s);
}
This takes about 25 seconds - it's much better, at least better than VB. By the way, it's about the same result if memcpy is used.
Back to using std::string, if I reserve a big enough buffer first like
std::string str;
// allocate big enough buffer first
str.reserve(100*10000);
for (int i = 0; i < 10000; i++)
{
str += s;
}
it is MUCH MUCH MUCH faster. Actually, it takes no time (within 1 second anyway)
Can anybody explain the difference between C and std?
Anyway, it seems that the memory allocation is the most expensive operation here.
While there is a reserve function that lets you allocate a big chunk of memory for a big string in std, I find it inconvenient to use because in many cases you don't know how much you really need. If the reserved memory turns out to be smaller than needed, all remaining concatenations will suffer for the same reason.
A better solution to this, in my opinion, is to define a grow size. I can then set a bigger grow size if I know the string will be concatenated many times. In an application with many string concatenations, it could save a lot of memory reallocations (and thus time) if the grow size is set properly. It can improve the application's overall performance.
MFC's CArray has such a feature. The SetSize function has an optional second parameter nGrowSize which can be very useful when working with a very large array. Unfortunately, CString doesn't have it. And none of the std containers has such a feature (right?).
Therefore, I extended CStdString to have this wrapper function
MYTYPE& Append(CT ch, int nGrowSize = 128)
{
if (this->capacity() < this->size()+1)
this->reserve(this->capacity() + nGrowSize);
return (*this += ch);
}
MYTYPE& Append(PCMYSTR sz, int nGrowSize = 1024)
{
if (this->capacity() < this->size()+sslen(sz))
this->reserve(this->capacity() + nGrowSize);
return (*this += sz);
}
But then I have to use Append instead of +=, which I'm a fan of.
Does anybody have a better idea or comments?
|
|
|
|
|
Like most overloaded operators, operator += is just syntactic sugar. A nicety to make for less typing. Semantically, it is exactly the same thing as the basic_string::append function.
I guess what you want is a version of the operator that would (like your new Append function) allow you to specify the grow length. I guess that would technically be a ternary operator. It certainly wouldn't be operator +=.
An alternative might be to define a string class with a different std::allocator object that ensured memory would be allocated in the chunk size you specify. The basic_string template takes three arguments: A character type, a traits type, and an allocator type.
typedef basic_string<char, char_traits<char>, CMyAllocator> CMyString;
where 'CMyAllocator' is a class defined by you for allocating characters in specific chunks. It would have to follow all the semantics of std::allocator.
One problem with this approach is that you would need to know that size at compile time, not runtime, unless you designed some changeable chunksize setting into your allocator that could be set at runtime. And furthermore, such a string class would not be interchangeable with std::string or std::wstring -- technically it's a different C++ type.
If you look at the definition of my template ('CStdStr') you'll see I did NOT put these arguments into the definition. The only thing that one can specify to my template is the character type. I then derive from the 'default' implementation of basic_string, given that character type.
I did this for a couple of reasons:
1. It keeps the template name short in the debugging information.
2. I was trying to design something that was interchangeable with the existing specializations of basic_string, std::string and std::wstring
Still, it's easy enough to change. Just alter the template definition for CStdStr to take these extra two arguments and supply defaults for them, just like basic_string does. Your template should derive from basic_string but now supply all three of these arguments. Then write your own allocator with these capabilities and pass it in as the argument you want. Just remember, whatever class you instantiate from this will NOT have an "is-a" relationship with std::string or std::wstring.
Another approach might be to write your own version of operator+= for CStdString. It would check some changeable "grow-size" setting inside the CStdString object being appended to. You would then provide member functions to change this setting. However, this approach would entail either a) adding a new member variable to CStdString to hold this grow size -- very, very bad OR b) adding some global variable/static member to hold the setting -- also very bad.
Seems like an awful lot of work just to avoid calling reserve(), doesn't it? I'm all for syntactical niceties myself, but sometimes, you just gotta do the extra work, I think.
-Joe
|
|
|
|
|
A bunch of things can be slow when appending a string:
1) Allocating new memory (and releasing old one).
2) Copying the data from the old memory location to the new one.
3) Scanning for the end of the string (when the string is long) to compute its length.
The C version will suffer from problem #3.
On the other hand, the STD version will suffer from problems #1 and #2. But both of these problems vanish if reallocation is avoided (by pre-allocating memory).
I'm not sure how memory allocation is done for std::string, but according to your performance analysis, I would guess that it uses a small fixed grow size (probably 8, 16 or 32 characters or something like that).
A better way for performance is to double (or multiply by 1.5) the size instead of adding a fixed amount to it.
In applications I work on, we used to specify a small grow size (to conserve memory) at the beginning, but we found that it was very slow when a lot of data was added to the container.
What you should do is reserve a lot of memory and then copy the item into a new one if you want to conserve memory. For example, if you want to add your string to a std::vector, you could do something similar to:
std::vector<std::string> Container;
std::string Buffer;
Buffer.reserve(100000);
for (int i = 0; i < 10000; i++) Buffer += "More text...";
std::string Copy(Buffer);
Container.push_back(Copy);
You may also use the swap trick: create a copy (low memory overhead) and then swap it with the original (high memory overhead), so that the object that is kept for a long period of time won't waste memory after its initialisation (assuming that strings are seldom changed after initialisation).
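As a minimal sketch of the swap trick just mentioned (ShrinkSketch is a hypothetical name; how much capacity the copy actually ends up with is up to the library implementation, so treat the saving as typical rather than guaranteed):

```cpp
#include <string>

// Copy-and-swap: the temporary copy usually allocates only what the
// contents need; swapping hands that smaller buffer to `s`, and the
// oversized buffer is released when the temporary is destroyed.
void ShrinkSketch(std::string& s)
{
    std::string(s).swap(s);
}
```

After a large reserve() followed by a few appends, calling ShrinkSketch on the string keeps its contents while letting go of the unused space.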
The reason that reserve is included in std::string and std::vector is exactly that avoiding reallocation and, more importantly, data copying can have a big effect on performance in situations similar to yours. When a lot of data is appended, you should reserve memory for better performance.
For typical uses of std::string when the resulting string is not so long (generally well under 1k), the overhead won't matter for most applications.
Note that containers like std::vector typically use multiplicative growth, so they are less affected by reallocation. For 1000000 appends, it would take about 20 memory allocations (and data copies). In many cases, I specify the size for a vector only when I do know it (or know an upper bound and do not bother about wasted space; this typically happens when copying data with some filtering).
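The "about 20 allocations" figure can be checked with a small simulation. This is only a sketch of the arithmetic, not a measurement of any real container: ReallocsFixed and ReallocsDoubling are hypothetical helpers, and the starting capacities (0 and 1) are assumptions.

```cpp
#include <cstddef>

// Counts the allocations needed to reach `target` elements when the
// capacity grows by a fixed increment each time it runs out.
std::size_t ReallocsFixed(std::size_t target, std::size_t growBy)
{
    std::size_t capacity = 0, count = 0;
    while (capacity < target) { capacity += growBy; ++count; }
    return count;
}

// Same count when the capacity doubles instead, starting from a single
// element (the initial allocation is counted as one).
std::size_t ReallocsDoubling(std::size_t target)
{
    std::size_t capacity = 1, count = 1;
    while (capacity < target) { capacity *= 2; ++count; }
    return count;
}
```

For a target of 1000000, doubling needs a couple of dozen allocations while a fixed grow size of 128 needs several thousand, which is why the fixed-increment approach gets slow as the data grows.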
Philippe Mori
|
|
|
|
|
The first version, where you use the CString class, is the slowest because it must allocate memory whenever you += characters.
The second version, where you use the C runtime library, is better because you don't have to allocate memory. But inside the strcat() function, it must calculate the length of the original string whenever you append characters. This may take some time if the string is very long.
In the third version, the string class keeps the length of the original string inside the class. So when you += a new string, it doesn't need to call strlen() to calculate the original string length. That is why it is the fastest code.
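To illustrate the difference, here is a minimal sketch of appending to a C buffer while remembering its length, so the existing text is never rescanned. AppendAt is a hypothetical helper, and the caller is assumed to have allocated a large enough buffer.

```cpp
#include <cstddef>
#include <cstring>

// Appends `src` at a known offset instead of letting strcat rescan `dest`
// from the start. Returns the new length.
// Assumption: the caller guarantees `dest` has room for the result.
std::size_t AppendAt(char* dest, std::size_t destLen, const char* src)
{
    const std::size_t srcLen = std::strlen(src);
    std::memcpy(dest + destLen, src, srcLen + 1);  // +1 copies the terminator
    return destLen + srcLen;
}
```

The only strlen() call is on the short piece being appended, which is roughly what a string class that stores its own length does internally.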
|
|
|
|
|
Hi,
I'm trying to compile this using vc5, and I'm getting errors like the ones listed below. Do you have any idea how I could get this working?
Thanks,
T
...stdstring.h(3236) : error C2908: explicit specialization; 'FmtArg<class CStdStr<char> >' has already been specialized from the primary template
...stdstring.h(3243) : error C2908: explicit specialization; 'FmtArg<class CStdStr<unsigned short> >' has already been specialized from the primary template
...stdstring.h(3251) : error C2242: typedef name cannot follow class/struct/union
...stdstring.h(3251) : error C2908: explicit specialization; 'FmtArg<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >' has already been specialized from the pri
|
|
|
|
|
VC5 ???? VC has always had terrible C++ support, even .NET is missing stuff every other C++ compiler supports. I'd suggest that this code is valid C++ and your compiler cannot understand it, because Microsoft suck at standards support.
Buy a new compiler - VC6 must be easy to get 2nd hand now that .NET is out.....
Christian
We're just observing the seasonal migration from VB to VC. Most of these birds will be killed by predators or will die of hunger. Only the best will survive - Tomasz Sowinski 29-07-2002 ( on the number of newbie posters in the VC forum )
|
|
|
|
|
First, Joe, like most of the people here, I wanted to congratulate you on one great class. CStdString is sharp, and I appreciate you making it available.
However, I have run into one oddity. I'm one of those that likes to compile with the warning level set to 4 in MSVC 6.0. Doing so led me to an apparent odd dependency in StdString.h. The following code compiles with a series of minor warnings:
#include <atlbase.h>
#include <yvals.h>
#include "StdString.h"
I directly suppressed the warnings with a pragma block around StdString.h:
#pragma warning (push)
#pragma warning (disable: 4511 4663 4018 4100 4146 4244 4512)
... The body of StdString.h ...
#pragma warning (pop)
and got a clean compile. Then I included StdString.h without first including atlbase.h and yvals.h. I received a series of warnings I felt should have been suppressed by the pragma warning disable above. The warnings appear to be coming from locale, so I played with the SS_NOLOCALE macro. The number and location of warnings changed, but I was unable to create a clean compile without adding atlbase.h and yvals.h before StdString.h.
Admittedly, this situation is strictly cosmetic, but it is a touch mysterious to me. Reading yvals.h did not prove informative to me. Has anyone else seen this behavior? Do you have a solution, an explanation, a recommendation, or at least a good joke?
Thanks again for the great work,
cagey
|
|
|
|
|
Hi,
The fact is yvals.h is one of the oddest headers ever created by MS. It actually enables some warnings explicitly through its own #pragma warning directives. If you search through it, it seems to only enable a few, but I'd swear that it enables more than what appears there. In particular, it somehow seems to enable 4786.
The only way I've ever managed to work cleanly with it is to use the following trick:
1. Disable all warnings I want to disable
2. #include <yvals.h>
3. Re-disable all those same warnings
I have a utility library I use, the first few lines of the StdAfx.h look like this:
#pragma warning(disable: 4786) // symbolic name too long
#pragma warning(disable: 4201) // nonstandard extension used
#pragma warning(disable: 4511) // private copy constructors are good to have
(...etc...)
#include <yvals.h> // now #include the evil yvals.h and do it again
#pragma warning(disable: 4786) // symbolic name too long
#pragma warning(disable: 4201) // nonstandard extension used
#pragma warning(disable: 4511) // private copy constructors are good to have
(...etc...)
That's the only way I've ever found to reliably disable the warnings I want to. They must be disabled before and after the very first yvals.h is included.
There was a big discussion about this recently on the Yahoo newsgroup WinTechOffTopic. Some people use a trick of including one of the iostreams headers first instead of this (but I suspect that only works because those headers end up including yvals.h).
Regardless, <yvals.h> is the culprit. I've been dealing with this problem for years and I still don't understand how it re-enables warnings I've disabled when I can find no #pragmas for them. But it does. And this trick is the only way I've found to make those warnings go away every time.
Give this trick a shot and let me know how it works. If you prefer you can email me directly. My address is in the StdString.h header file.
Also, make sure you're using the very latest version. You can always grab it here:
http://home.earthlink.net/~jmoleary/code/StdString.zip
-Joe
|
|
|
|
|
For years one huge incompatibility between my CStdString and MFC's CString has been that you could pass CString objects to the CString::Format() function to fill in "%s" format specifiers, but with my class you could not.
In other words, with CString you could do this:
CString name("Joe");
CString val;
val.Format("My name is %s ", name);
But if you used CStdString (my class) in that example, the call to Format() would crash.
Well, I am happy to say I have FINALLY figured out a way to work around this incompatibility. You can now pass a CStdString to Format() with no problems.
Important Note: The previous incompatibility still exists for other variadic functions like sprintf(); this only fixes it for Format(). My previous recommendation about using alternatives to variadic functions still applies. I just figured I'd get as much compatibility as I could, since the compiler doesn't (and can't) warn you either way.
Grab the latest code which fixes this incompatibility here:
http://home.earthlink.net/~jmoleary/code/StdString.zip
Previously I couldn't do it because MFC's way to make it work relied upon the binary layout of the class. This was something I had no control over as my class derives from whatever implementation of basic_string is available.
But then I figured out a way to selectively apply strong typing to this function using a simple template trick. It did require me to overload the function based on number of arguments (an AWFUL lot of typing and so the file got a lot bigger). But the good news is that this should not bloat your runtime executables much as the functions are all templates that are inline and merely call through to an underlying format function.
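For readers who want a feel for the idea, here is a minimal sketch of overloading a format function on the number of arguments and converting string objects on the way through. This is NOT the actual code in StdString.h: FmtArgSketch, FormatImpl and FormatSketch are hypothetical names, std::string stands in for CStdString, and only the one- and two-argument overloads are shown.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical converter: most argument types pass through unchanged...
template <typename T>
inline T FmtArgSketch(T arg) { return arg; }

// ...but string objects are turned into the pointer that "%s" expects.
inline const char* FmtArgSketch(const std::string& s) { return s.c_str(); }

// The underlying variadic function that does the real work.
inline std::string FormatImpl(const char* fmt, ...)
{
    char buf[256];  // assumed large enough for this illustration
    va_list args;
    va_start(args, fmt);
    std::vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);
    return std::string(buf);
}

// One strongly typed overload per argument count. Each simply converts
// its arguments and calls through to the variadic function.
template <typename A1>
inline std::string FormatSketch(const char* fmt, const A1& a1)
{
    return FormatImpl(fmt, FmtArgSketch(a1));
}

template <typename A1, typename A2>
inline std::string FormatSketch(const char* fmt, const A1& a1, const A2& a2)
{
    return FormatImpl(fmt, FmtArgSketch(a1), FmtArgSketch(a2));
}
```

Because each overload knows the real types of its arguments, a string object can be converted before it ever reaches the "...", which a plain variadic function cannot do.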
Anyone who has any questions, email me. My email address is at the top of the code header file.
-Joe
|
|
|
|
|
I am attempting to use the StdString class with the Borland command line compiler. As soon as I include the header file in a hello world program, the compiler gives me a long list of error messages generated by the header file objidl.h ( which I am not ( directly ) including ).
I originally included it in a much larger project, which then generated errors from winscard.h, windef.h, and rpcndr.h as well ( I do not include any of these explicitly either ).
Is StdString compatible with the Borland compiler? If so, what do I need to do/set in order to make it work?
|
|
|
|
|
Hi,
The code should work fine on Borland, provided they have a compliant Standard C++ Library implementation. If you could email me the text of the errors you are getting, I will try to help you. My email address is printed in the code header file. It is:
mailto:jmoleary@earthlink.net
Also, make sure you have the very latest version of the code. You can get it here:
http://home.earthlink.net/~jmoleary/code/StdString.zip
This weekend, I will try to download the free version of the Borland compiler and see if I can reproduce your problems.
-Joe
|
|
|
|
|
Anyone planning on downloading and using CStdString should download the latest and greatest version. You may always find it here:
http://home.earthlink.net/~jmoleary/code/StdString.zip
The version on CodeProject is a bit out of date and I never have time to update my article.
-Joe O'Leary
|
|
|
|
|
This seems to "do nothing", i.e. the string remains unchanged:
CStdString strOut ;
...
strOut += CStdString( ((COLUMN_WIDTH - iLastRowSize) * 3), ' ' ) ;
...
This "works":
CStdString strOut, strTmp ;
...
strTmp = CStdString( ((COLUMN_WIDTH - iLastRowSize) * 3), ' ' ) ;
strOut += strTmp ;
...
I traced all the way into CStdStr::CStdStr( MYSIZE nSize, MYVAL ch, const MYALLOC& al=MYALLOC())... which correctly constructs a CStdString object, with the expected string. Then MYTYPE& operator+=(const MYTYPE& str) gets called (as I'd expect)... but something gets lost within and it returns the source string unchanged...
Hmmmm?
tonyB.
|
|
|
|
|
I think this is definitely one of those "d'oh" moments... What I was trying to do was append to a string that I had previously been accessing (and modifying) via a pointer returned from GetBufferSetLength(...). That seems to be my problem.
tonyB.
|
|
|
|
|
Whoops, scrap that reply then. Glad there's no problem!
-Joe
|
|
|
|
|
Thanks for this. Solved my problem beautifully.
|
|
|
|
|
Great! Glad to hear it.
Please make sure you have the very latest version, though. I've fixed a thing or two since I submitted it here and haven't found time to resubmit.
You can always get the very latest version here:
http://home.earthlink.net/~jmoleary/code/StdString.zip
|
|
|
|
|
In trying to replace my usage of CString with this cool class... everything "seemed" to be working... up until I used the CStdStringA strName( ' ', DEF_LEN ) c'tor. Instead of creating a ' ' filled string of length DEF_LEN, I get a '0' filled string...
Changing my params around to give (DEF_LEN, ' ') works.
tonyB.
|
|
|
|
|
Hi,
This is an incompatibility problem that was unavoidable. I mention this specifically on my website but probably not fully enough in my article here.
The problem is that CString has the order of arguments the way you wanted it -- (first character, then count). But basic_string<> has the order of arguments reversed (first count, then character). It is not possible to have both constructors because it leads to compiler ambiguities due to the types involved. Therefore I had to choose one.
In cases like these where there was a conflict between the CString facade and the basic_string base class, I felt it was important to go with the basic_string way of doing things. So I implemented the constructor in the basic_string manner.
Incidentally, another incompatibility much like this was in the array-indexing operator, operator[]. CString's version of this function returns characters by VALUE but basic_string's version returns them by reference. Again, CStdString's implementation returns by reference as the base class version does.
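Both points can be seen with plain std::string, which is used here as a stand-in for the basic_string base class. The function names are hypothetical, and the exact length produced by the swapped call depends on the character set (it is the numeric value of 'x').

```cpp
#include <string>

// basic_string order: count first, then character.
std::string CountThenChar() { return std::string(3, 'x'); }

// CString order used by mistake: this still compiles, because 'x' converts
// to a count and 3 converts to a character. The result is a long string
// filled with the character whose code is 3, not three copies of 'x'.
std::string CharThenCount() { return std::string('x', 3); }

// operator[] returns a reference, so it can be assigned through.
void ReplaceFirst(std::string& s, char ch) { s[0] = ch; }
```

The swapped call is exactly the kind of silent mismatch described above: no compiler error, just a string with unexpected contents.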
For a full description of issues like this and why they were necessary, expand the comments section under this article (you might need to change the date filter to go back further in time) and read the thread entitled "operator[] and other incompatabilites".
You can also read the articles about CStdString on my website (sorely in need of updates, I'm sorry to say) at this URL
http://home.earthlink.net/~jmoleary/stdstring.htm
Please make sure you have the very latest version of the code, as it is more recent even than the CodeProject version. You can always get it here:
http://home.earthlink.net/~jmoleary/code/StdString.zip
Finally, if you ever have any questions, feel free to email me at the address listed at the top of the StdString.h header file.
Hope this helps,
Joe O'Leary
|
|
|
|
|