|
Hi,
I have been using the CStdString class for a few weeks now and it has been working very well. However, I have been using this string recently to generate dynamic SQL statements and I seem to have a problem allocating a string over 481 characters.
I assumed that the max_size() member of basic_string would tell me what the max size for the string could be but this always seems to return -3.
Do you have any suggestions?
James Spibey
I love the word naked, it's brilliant isn't it, 'naked'. When I was a kid I used to write the word naked on a bit of paper hundreds of times and rub my face in it - Jeff, Coupling, BBC2
|
|
|
|
|
Oh my,
I'm so sorry to take so long to reply (3 months!). I was never notified by email of your comment posting and I seem to have completely missed it. In fact I responded to another one after it and still never saw yours.
I assume you have already solved your problem, but for the record I will answer your question. Again, I am terribly sorry to have missed it.
---
There should be no problem allocating strings over 481 characters or strings 1000 times that size.
The reason you are seeing a max_size of -3 is because you are putting the return value of max_size, an unsigned int, into a signed int. Thus it looks to you like -3. But as an unsigned int that's 4294967293. I think that should be plenty large enough for your purposes.
I don't know what's causing the other problems for you. I don't have access to my home email right now, but I hope you managed to contact me personally about this and didn't have to wonder. If you are still having problems I'd be happy to take a look at a test project for you.
Joe O'Leary
|
|
|
|
|
I have tried a lot of alternative string classes.
Most of them have the same bug (like yours):
CStdString s("x");
for (int i = 0; i < 20; i++)
{
    s += s;
}
The problem may be in the standard library.
|
|
|
|
|
Wow! Very sneaky bug. You're right, it is a bug in the implementation of the Standard C++ Library that comes with Visual C++. The good news is that the latest implementation of the Dinkumware Standard C++ Library has fixed this bug. If you are running Visual C++ and you buy the next version it will be fixed. It is also fixed in the commercially available Dinkumware release too, if you just can't wait.
I don't know if STLPort has the problem or has fixed it. I didn't have time to test.
For the record (since you didn't mention it) the effect is that around the fourth time through the loop, the string becomes corrupted and the program crashes.
What happens is that at around the 4th time through the loop, the string object realizes it doesn't have enough capacity to double its size yet *again* so it grows -- it adds some more capacity.
Unfortunately, it has already called c_str() on the string being added. Normally this wouldn't matter, but since the string being added is itself, that means it has a direct pointer to its own string buffer that is going to become invalid once it reallocates its internal memory.
In the meantime, a workaround is to perform checks before adding to see if
1. The string being added lies within its boundaries
2. It is going to need to grow capacity
If so, the string should either grow first, or make a copy of what's being added (to a separate string object) before growing. Obviously this must be done in the library, not in my derived class. You need an update, I'm afraid.
In the meantime, I have posted an updated version of StdString which will make your program run properly. However, it will only work for CStdString objects. It will *not* fix this problem if you were to use simple std::string or std::wstring objects. Thus, it's only a band-aid. The true fix is to update your Standard C++ Library.
You can get the fix here:
http://home.earthlink.net/~jmoleary/code/StdString.zip
Thanks for the report.
Joe O'
Joe O'Leary
|
|
|
|
|
Yes, it is a common issue related to the way buffers are internally released when they need to be grown. Quite complex, by the way.
Try this one - it does not have the bug (but it's not based on STL either)
http://www.utilitycode.com/str
|
|
|
|
|
It is also not free. The above response is essentially an advertisement. I always enjoy it when someone comes along to a site designed to help people out and advertises their products in the guise of helping people.
To the original poster: It has been long enough that I assume you have worked around this problem. However, if you are still concerned about this issue, you can avoid this bug and save yourself some money by just using one of the free implementations of the STL, such as STLPort or GCC. No need to pay someone for just a string library. By all means look at this guy's string library, but don't think for a minute that you don't have many, many free alternatives out there with thousands more hours of development than this library has ever seen. And I'm not talking about anything I've written.
-Joe
|
|
|
|
|
Joe, I wanted to thank you for saving me many hours in standard string debugging time with your excellent CStdString class.
To anyone who wants to be free of MFC: Check this out!
Sincerely,
Desert Eagle
|
|
|
|
|
You saved me a ton of work. Thanks!
Jon Sagara
"Left-handed nunchakus!"
|
|
|
|
|
You're welcome! I'm glad you like it. Please email me if you have any problems.
By the way, you might want to make sure you have the very latest version. You can always get it at this link:
http://home.earthlink.net/~jmoleary/code/StdString.zip
|
|
|
|
|
I tried the following (not exactly like this, of course, but simplified it looks like this). It works fine with CString, but with CStdString you get a runtime error.
CStdString s1, s2;
s1 = "a";
s2.Format("%s", s1);
|
|
|
|
|
Hi,
Yes, this is a problem. You must either call c_str() on the string object or cast it to an LPCTSTR. Something like this:
s2.Format(_T("%s"), s1.c_str());
or alternately you could do this:
s2.Format(_T("%s"), static_cast<LPCTSTR>(s1));
or even this:
s2.Format(_T("%s"), (LPCTSTR)s1);
I neglected to mention this incompatibility in the article. However, if you read through the responses to the article (you might have to adjust the date filter to see them all) you'll find I did discuss it fully underneath my first response to William E. Kempf in the thread entitled
"Operator[] and other incompatabilities"
If you read that message you'll find a much more complete discussion. I will post an update to the article within the next week or so, so that people can be made aware of this problem immediately, without having to read the responses.
The reason it does not work has to do with the binary layout of basic_string, from which CStdString derives. Frankly it's a hack that this code even works with CString -- an intentional hack made by the CString design. Unfortunately, I have no control over the binary layout of basic_string, so the same trick is not available to CStdString.
So I'm afraid you must either call c_str() all the time or cast. You might also consider using stringstreams, as those are type safe and don't have this problem.
Joe O'Leary
|
|
|
|
|
This cannot work in a portable way. The MFC implementation (CString containing only one member, a pointer to the null-terminated string) relies on the way VC++ handles variable argument lists. You can't get this working with a basic_string (except with major overhead in the implementation and/or at runtime), and it's never portable (as far as I know, gcc won't handle it correctly).
So you need to specify the type cast (LPCTSTR) s1 whenever you pass a CStdString to a function where the expected parameter type is not known by the compiler.
Peter
|
|
|
|
|
This is good work! But I found one memory leak using BoundsChecker, like this:
33 bytes allocated by operator new in c:\program files\microsoft visual
stdio\vc98\include\xmemory(30),HANDLE: 0x016D1670.
|
|
|
|
|
BoundsChecker is a good program, but it 'finds' memory leaks in operator new when there are none. I haven't seen the code in question here, but I would stake my life on BoundsChecker being overenthusiastic in this area. We all spent a fair amount of time trying to prove BoundsChecker right on some of our own code and the fact was that it was clearly wrong.
Christian
#include "std_disclaimer.h"
People who love sausage and respect the law should never watch either one being made.
The things that come to those who wait are usually the things left by those who got there first.
|
|
|
|
|
Hi,
The entire implementation of CStdString uses only public member functions of the base class template basic_string. There is no use whatsoever of any implementation details. There is also no use whatsoever of either operator new or of malloc.
In other words, I don't allocate any heap memory in StdString.h. If you are getting a memory leak, the odds are that it lies in
1. Your program OR
2. the Visual C++ implementation of basic_string<> (which service pack do you have?), OR
3. it does not exist at all and this is one of the false positives that BoundsChecker often reports.
If you have less than Visual Studio service pack #5, try updating and run again.
Is this leak proportional to the amount of string work you do, or is it always 33 bytes, no matter how much string processing goes on? If the latter is true, it might be BoundsChecker falsely reporting something that the MS implementation allocates statically or in a just-in-time manner, (to be freed by atexit). I have occasionally seen leaks reported by such code that were not really leaks at all.
Failing all this, can you send me a small sample program to reproduce this? I don't have BoundsChecker, but I could at least put breakpoints down on the CRT malloc/free implementations to verify that the memory reported as leaked is being freed.
Joe O'
Joe O'Leary
|
|
|
|
|
Hi,
I wrote the article "Read and Write application parameters in XML" on Code Project; you can find it at http://www.codeproject.com/soap/paramio.asp.
I was asked how to get the code to compile and work under UNICODE, and I wondered whether using CStdString instead of std::string would do that for us.
I may be totally wrong as I don't know UNICODE at all. Please tell me.
Arnaud
|
|
|
|
|
Hi Arnaud,
You don't need CStdString to work under UNICODE. What you need to understand is what UNICODE is and how it fits into Windows.
UNICODE is a 16-bit character encoding designed to be able to represent almost all scripts in all languages. That's actually an oversimplification but let's not get too technical... Anyway, since the wchar_t type is two bytes on the Windows platform, UNICODE characters fit very neatly into a std::wstring on Windows.
If you need your program to be able to handle characters of any language, UNICODE is an excellent choice. You use wstrings instead of strings and read and write everything as wchar_t-based strings.
This works pretty well until you need to run your UNICODE-string-based program on one of the less powerful versions of Windows (Win95, 98, or ME) and want to call some operating system function that takes a string.
For example, suppose you want to set the text of a window: You'll see that the Win32 headers provide two versions of SetWindowText -- one that takes an ANSI string and one that takes a wide string. Almost all Win32 functions that take strings are declared this way. Here's roughly how the function is declared in the Win32 headers:
    BOOL WINAPI SetWindowTextA(HWND hWnd, LPCSTR lpString);
    BOOL WINAPI SetWindowTextW(HWND hWnd, LPCWSTR lpString);
    #ifdef UNICODE
    #define SetWindowText SetWindowTextW
    #else
    #define SetWindowText SetWindowTextA
    #endif // !UNICODE
Unfortunately, on Win9x and ME, the wide character-based version will not work. Unless your program runs on Windows NT, 2000, or XP, you must manually convert your UNICODE string to ANSI and call SetWindowTextA. Not much fun. And the same rule applies to every Win32 function that takes a string.
Generally when people talk about "doing a UNICODE build" on Windows, they're talking about #define-ing the preprocessor macros UNICODE and _UNICODE (note the leading underscore on the second) early on -- before #include-ing any Win32 or C runtime headers. _UNICODE is the one read by the C runtime header TCHAR.H; it changes the mapping of the generic character type TCHAR and of the generic-text routines. UNICODE is the one read by the Win32 headers, as in the #ifdef above; it maps the generic versions of Win32 functions (e.g. "SetWindowText" above) to the specific one ("SetWindowTextA" or "SetWindowTextW").
I'm tempted to go on and on about this topic but the fact is there are established sources out there which explain it far better than I ever could. One such source is the MSDN itself. Search it for the topic "UNICODE" and you'll find a wealth of articles. Here are some links to a couple of good ones that should get you started:
http://msdn.microsoft.com/library/psdk/winbase/unicode_0mw9.htm
http://msdn.microsoft.com/library/periodic/period99/multilangUnicode.htm
http://msdn.microsoft.com/library/psdk/msaa/msaaovrw_1zcj.htm
If that's not enough, you could consult a good book, such as "International Programming for Microsoft Windows" by David Schmitt
http://www.amazon.com/exec/obidos/ASIN/1572319569/qid=991197339/sr=1-2/ref=sc_b_2/102-1896360-1959348.
or even one about the Unicode standard itself:
http://www.amazon.com/exec/obidos/ASIN/0764546252/ref=sim_books/102-1896360-1959348
Hope this helps
Joe O'Leary
|
|
|
|
|
Hi Joe,
I don't know how to thank you for this very good answer. Thank you very much!
I think I more or less understand, and it works overall now. I'll read the articles at the links you sent me and will try to build something from this.
Thanks a lot,
Arnaud
|
|
|
|
|
I don't understand your comment in the code about operator[]. The standard clearly defines the return type of std::basic_string::operator[] to be of std::basic_string::reference type which will be TCHAR& in your case. This is return by reference, not return by value. So it should be simple to make CStdString behave exactly like *both* std::basic_string and CString in this case. (Pertinent section in the standard is 21.3.4, and the Dinkumware library shipped with VC6 is right in this case.)
There are other areas where you won't get fully compatible semantics, however. CString is reference counted, and while theoretically the standard allows reference counting there are technical reasons that are leading most implementations to *NOT* reference count std::basic_string. Though this won't change the interfaces any, it may lead some code to be non-portable. You also need to be aware that there are some things that are "safe" to do with CString that aren't portable with std::basic_string. For instance:
printf("%s", str);
This will work with CString because of a tricky non-portable hack that MS uses in the data layout of CString, but it will *NOT* work with your CStdString (the implicit cast will never be called).
I personally don't like several things in CString's interface (implicit conversions are evil here, for example) but I understand the desire for many MFC programmers to retain the familiarity in portable code. But users need to be aware of several areas where portable code simply can't be coded using your CStdString, as well of areas where undefined behavior will result. If truly safe and portable code is wanted I'd recommend sticking with std::basic_string.
William E. Kempf
|
|
|
|
|
> This will work with CString because of a tricky non-portable
> hack that MS uses in the data layout of CString
What exactly makes this hack non-portable? Just curious
Tomasz Sowinski -- http://www.shooltz.com.pl
|
|
|
|
|
I meant non-portable to std::basic_string, but this wasn't very clear from my wording. Sorry about that.
William E. Kempf
|
|
|
|
|
Hi William,
The confusion regarding operator[] was a typo by me in the article. I got it backwards. I meant to say that CString returns the characters by value and basic_string returns them by reference, not the other way around. Sorry for the mistake.
In fact let me apologize here for not thoroughly taking my time with the article and covering all the bases. I posted this code long ago at CodeGuru and someone had asked me to post it here. I did the article in a bit of a rush I'm afraid. I was trying to finish it before going on vacation and I left out a couple of important points. Most of this stuff I discuss more fully on my website but even that could use an update. I'll post an update to the article when I get a chance. I'll try to address the issues here point by point:
VARIADIC FUNCTIONS:
-------------------
One of the points brought up by William regards using CStdString in variadic functions: functions that take a variable number of arguments, such as printf. As far as the printf("%s", str) thing goes, yes, it too will not work with CStdString as it does in MFC. In fact that's probably the issue about which people most commonly email me. Again, my apologies. I meant to mention it, but rushed the article out.
The only reason printf("%s", str) works with CString is that the CString designers were extremely careful to lay out the class so that the first 4 bytes are always a pointer to the actual null-terminated string (even though it's reference counted). Even Microsoft recommends that you always cast a CString when using it in a variadic function this way, but you manage to get by anyway because they were looking out for you.
In short, whenever you use a CStdString as one of the variable arguments to a variadic function like printf() or Format(), you must always call c_str(). That's just about the only time you need to, though. Frankly, that's a great argument to go to iostreams and their implicit type safety anyway.
MBCS CHARACTER HANDLING:
------------------------
Another important point that I neglected to mention regards Visual C++ Win32 programmers doing MBCS builds (with compiler flag _MBCS turned on). CStdString will not always handle MBCS characters properly the way CString does.
You would only notice this with true extended MBCS characters that go beyond the standard ASCII set so most people will never have any problem. However you should be aware. In particular, functions which iterate through characters may not bring you the results you would get in a CString build. This was unavoidable, I'm afraid. If your platform supports UNICODE (e.g. WinNT, 2000, XP), I highly recommend it as an alternative.
REFERENCE COUNTING
------------------
Whether or not you have reference counting in your version of CStdString is strictly up to the design of your particular library implementation of basic_string<>. Most implementations are not reference counted. It's true that CString does have a couple of functions that directly address reference counting -- LockBuffer() and UnlockBuffer(). However those were also the two functions I mentioned in the article that I was unable to implement -- for these very reasons.
IMPLICIT CAST OPERATOR
----------------------
Regarding the implicit cast, I'm afraid that's one of those debates in which one side rarely convinces the other. Implicit casts are not "evil". Yes, when used carelessly they can be dangerous. However, when used properly they can be a godsend. Like most features of C++ (e.g. operator overloading), an implicit cast operator is a tool that gives you ease of use at the cost of some risks. Generally a single implicit cast operator is a safe thing. Multiple implicit cast operators in one class are an accident waiting to happen.
The implicit cast to LPCTSTR is one of the things that many people (like me) always loved about CString. In my development, I have to call so many functions (Win32, CRT, etc) that take some type of const string pointer (const char*, LPCTSTR, whatever) that frankly I get tired of having to type c_str() all the time. It clutters up my code, needlessly in my opinion. In fact, the lack of that implicit cast in basic_string was the original reason I created CStdString in the first place, years ago.
However if anyone doesn't like it, you do have the code. By all means, comment out the
operator const CT*() const;
member function in the StdString.h header.
I hope I managed to hit all the points here.
|
|
|
|
|
A comment on Mr. O'Leary's variadic comments:
The CString designers didn't "lay out the class so that the first 4 bytes are always a pointer to the actual null-terminated string", they laid it out so the *ONLY* data in CString is a 4 byte pointer to a null terminated string. The extra data needed by CString (like the ref-count) is hidden in a memory block that resides before the null terminated string in memory. In other words, when CString allocates memory for the "string" it allocates enough memory for both the "string" and a header block and then sets the internal pointer to point at this buffer + sizeof(header) bytes. If you don't understand this then look closely at the implementation of CString. It's a very tricky hack to enable CString to work inside of printf (and other variadic functions that expect a char* to be passed).
I'm not posting this to be nitpicky with Mr. O'Leary's code or his response here, only to point out how tricky this issue really is. In my opinion, MS did a disservice to users of CString when they added this hack. Because they did, many people are unaware of the issues and get themselves into trouble with their own extensions. It also illustrates quite nicely one of the reasons why implicit casts are considered "evil" by many. There are numerous cases where implicit casts produce unexpected results at compile time, while explicit casts do not. The only reason to prefer implicit casts is that they save you some typing, but that's usually a bad reason to make any programming choice. It's an understandable desire but when it can lead to errors in usage...
Of course, like Mr. O'Leary says, it's unlikely I'll persuade anyone who's strongly on the other side of this debate. You just need to understand why the C++ standard doesn't make use of implicit casts. It's also precisely why the "explicit" keyword exists. So as long as you're on the other side of this argument you should know that you'll always be fighting the C++ language and its standard libraries.
William E. Kempf
|
|
|
|
|
I am on the other side of this argument and I have never found myself fighting the language or the standard libraries.
Tim Smith
Descartes Systems Sciences, Inc.
|
|
|
|
|
Well said Tim. I'm with you.
At the end of the day we all have to produce results. That's not to say we shouldn't be aware of what our code is doing - we should. But I'd rather see it work because of a carefully designed "hack" in the MFC implementation than fail with a runtime error (assuming the compiler couldn't catch it).
I reckon that arguments like this one (and the MFC vs STL debate) wouldn't have become so commonplace if the Standard C++ library had appeared when it was needed (1990 or so) instead of much later (and the crap documentation doesn't help - though that's probably down to Redmond).
Give them their due - the MFC dev team produced classes (in particular CString and the collection classes) which were needed at the time, easy to use and well documented.
The fact that there are now standard alternatives (albeit with a steeper learning curve) doesn't take anything away from that achievement, and nor should it.
Andy Metcalfe - Sonardyne International Ltd (andy.metcalfe@lineone.net) http://www.resorg.co.uk
"I used to be a medieval re-enactor, but I'm (nearly) alright now..."
|
|
|
|
|