|
You're welcome! I'm glad you like it. Please email me if you have any problems.
By the way, you might want to make sure you have the very latest version. You can always get it at this link:
http://home.earthlink.net/~jmoleary/code/StdString.zip
|
|
|
|
|
I tried the following (simplified from my real code). It works fine with CString, but with CStdString you get a runtime error.
CStdString s1, s2;
s1 = "a";
s2.Format("%s", s1);
|
|
|
|
|
Hi,
Yes, this is a problem. You must either call c_str() on the string object or cast it to an LPCTSTR, something like this:
s2.Format(_T("%s"), s1.c_str());
or alternately you could do this:
s2.Format(_T("%s"), static_cast<LPCTSTR>(s1));
or even this:
s2.Format(_T("%s"), (LPCTSTR)s1);
I neglected to mention this incompatibility in the article. However, if you read through the responses to the article (you might have to adjust the date filter to see them all) you'll find I did discuss it fully underneath my first response to William E. Kempf in the thread entitled
"Operator[] and other incompatabilities"
If you read that message you'll find a much more complete discussion. I will post an update to the article within the next week or so, so that people can be made aware of this problem immediately, without having to read the responses.
The reason it does not work has to do with the binary layout of basic_string, from which CStdString derives. Frankly it's a hack that this code even works with CString -- an intentional hack made by the CString designers. Unfortunately, I have no control over the binary layout of basic_string, so CStdString cannot play the same trick.
So I'm afraid you must either call c_str() all the time or cast. You might also consider using stringstreams, as those are type safe and don't have this problem.
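Here is a minimal sketch of both workarounds in portable C++. It assumes plain std::string and std::snprintf as stand-ins for CStdString and Format(), so that it compiles without StdString.h or MFC. The helper names format_with_c_str and format_with_stream are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Workaround 1: hand the variadic function a real const char*.
// Passing the string object itself through "..." is undefined behavior.
// This sketch assumes the input fits in the fixed buffer.
std::string format_with_c_str(const std::string& s1)
{
    char buf[64];
    assert(s1.size() < sizeof buf);
    std::snprintf(buf, sizeof buf, "%s", s1.c_str());
    return buf;
}

// Workaround 2: a string stream is type safe, so no cast is needed.
std::string format_with_stream(const std::string& s1)
{
    std::ostringstream out;
    out << s1;
    return out.str();
}
```

With the real class, the first form corresponds to s2.Format(_T("%s"), s1.c_str()) as shown above.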
Joe O'Leary
|
|
|
|
|
This cannot work in a portable way. The MFC implementation (CString containing only one member, a pointer to the NULL-terminated string) relies on the way VC++ handles variable argument lists. You can't get this working with a basic_string (except with major overhead in implementation and/or at runtime), and it's never portable (as far as I know, gcc won't handle it correctly).
So you need to specify the type cast (LPCTSTR) s1 whenever you pass a CStdString to a function where the expected parameter type is not known to the compiler.
Peter
|
|
|
|
|
This is good work! But I found one memory leak using BoundsChecker, like this:
33 bytes allocated by operator new in c:\program files\microsoft visual
stdio\vc98\include\xmemory(30),HANDLE: 0x016D1670.
|
|
|
|
|
BoundsChecker is a good program, but it 'finds' memory leaks in operator new when there are none. I haven't seen the code in question here, but I would stake my life on BoundsChecker being overenthusiastic in this area. We all spent a fair amount of time trying to prove BoundsChecker right on some of our own code and the fact was that it was clearly wrong.
Christian
#include "std_disclaimer.h"
People who love sausage and respect the law should never watch either one being made.
The things that come to those who wait are usually the things left by those who got there first.
|
|
|
|
|
Hi,
The entire implementation of CStdString uses only public member functions of the base class template basic_string. There is no use whatsoever of any implementation details. There is also no use whatsoever of either operator new or of malloc.
In other words, I don't allocate any heap memory in StdString.h. If you are getting a memory leak, the odds are that it lies in
1. Your program OR
2. the Visual C++ implementation of basic_string<> (which service pack do you have?), OR
3. it does not exist at all and this is one of the false positives that BoundsChecker often reports.
If you have less than Visual Studio service pack #5, try updating and run again.
Is this leak proportional to the amount of string work you do, or is it always 33 bytes, no matter how much string processing goes on? If the latter is true, it might be BoundsChecker falsely reporting something that the MS implementation allocates statically or in a just-in-time manner, (to be freed by atexit). I have occasionally seen leaks reported by such code that were not really leaks at all.
Failing all this, can you send me a small sample program to reproduce this? I don't have BoundsChecker, but I could at least put breakpoints down on the CRT malloc/free implementations to verify that the memory reported as leaked is being freed.
Joe O'Leary
|
|
|
|
|
Hi,
I wrote the article "Read and Write application parameters in XML" on Code Project; you can find it at http://www.codeproject.com/soap/paramio.asp.
I was asked how to make the code compile and work under UNICODE, and I wondered whether using CStdString instead of std::string would do that for us.
I may be totally wrong as I don't know UNICODE at all. Please tell me.
Arnaud
|
|
|
|
|
Hi Arnaud,
You don't need CStdString to do work under UNICODE. What you need to understand is what UNICODE is and how it fits into Windows.
UNICODE is a 16-bit character encoding designed to be able to represent almost all scripts in all languages. That's actually an oversimplification but let's not get too technical... Anyway, since the wchar_t type is two bytes on the Windows platform, UNICODE characters fit very neatly into a std::wstring on Windows.
If you need your program to be able to handle characters of any language, UNICODE is an excellent choice. You use wstrings instead of strings and read and write everything as wchar_t-based strings.
This works pretty well until you need to run your UNICODE-string-based program on one of the less powerful versions of Windows (Win95, 98, or ME) and want to call some operating system function that takes a string.
For example, suppose you want to set the text of a window: You'll see that the Win32 headers provide two versions of SetWindowText -- one that takes an ANSI string and one that takes a wide string. Almost all Win32 functions that take strings are declared this way. Here's roughly how the function is declared in the Win32 headers:
BOOL WINAPI SetWindowTextA(HWND hWnd, LPCSTR lpString);
BOOL WINAPI SetWindowTextW(HWND hWnd, LPCWSTR lpString);
#ifdef UNICODE
    #define SetWindowText SetWindowTextW
#else
    #define SetWindowText SetWindowTextA
#endif // !UNICODE
Unfortunately, on Win9x and ME, the wide character-based version will not work. Unless your program runs on Windows NT, 2000, or XP, you must manually convert your UNICODE string to ANSI and call SetWindowTextA. Not much fun. And the same rule applies to every Win32 function that takes a string.
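The manual conversion step might look roughly like the sketch below. It assumes the standard std::wcstombs rather than the Win32 WideCharToMultiByte call, purely so that it compiles on any platform. The helper name narrow is hypothetical, and the result depends on the current C locale.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Convert a wide string to a multibyte string in the current C locale.
// Returns an empty string if some character cannot be converted.
std::string narrow(const std::wstring& wide)
{
    // Worst case: every wide character needs MB_CUR_MAX bytes, plus a terminator.
    std::vector<char> buf(wide.size() * MB_CUR_MAX + 1);
    const std::size_t n = std::wcstombs(buf.data(), wide.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        return std::string();
    assert(n < buf.size());
    return std::string(buf.data(), n);
}
```

The c_str() of the returned string is what you would then pass to SetWindowTextA.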
Generally when people talk about "doing a UNICODE build" on Windows, they're talking about #define-ing the preprocessor macros UNICODE and _UNICODE early on -- before #include-ing any Win32 or C runtime headers. _UNICODE (note the leading underscore) is the one the C runtime header TCHAR.H looks at: it changes the mapping of the generic character type TCHAR and of the generic-text routines such as _tcslen. UNICODE (no underscore) is the one the Win32 headers look at: it maps the generic versions of Win32 functions (e.g. "SetWindowText" above) to the specific one ("SetWindowTextA" or "SetWindowTextW").
I'm tempted to go on and on about this topic but the fact is there are established sources out there which explain it far better than I ever could. One such source is the MSDN itself. Search it for the topic "UNICODE" and you'll find a wealth of articles. Here are some links to a couple of good ones that should get you started:
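To make the mapping idea concrete, here is a small portable sketch of the same pattern. Every name in it (MY_UNICODE, MY_TCHAR, MY_T, Describe, DescribeA, DescribeW, which_build) is hypothetical and stands in for the real macros and functions, which need the Windows headers.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for a narrow/wide pair such as SetWindowTextA/W.
inline std::string DescribeA(const char*)    { return "narrow"; }
inline std::string DescribeW(const wchar_t*) { return "wide"; }

// Same pattern as the real headers: one macro picks the character type,
// the literal prefix, and the function that the generic name maps to.
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T(x)  L##x
#define Describe DescribeW
#else
typedef char MY_TCHAR;
#define MY_T(x)  x
#define Describe DescribeA
#endif

// Code written against the generic names compiles in either build.
inline std::string which_build()
{
    const MY_TCHAR* text = MY_T("hello");
    return Describe(text);
}
```

Compiling the same source with -DMY_UNICODE would switch every generic name to its wide counterpart without touching which_build().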
http://msdn.microsoft.com/library/psdk/winbase/unicode_0mw9.htm
http://msdn.microsoft.com/library/periodic/period99/multilangUnicode.htm
http://msdn.microsoft.com/library/psdk/msaa/msaaovrw_1zcj.htm
If that's not enough, you could consult a good book, such as "International Programming for Microsoft Windows" by David Schmitt:
http://www.amazon.com/exec/obidos/ASIN/1572319569/qid=991197339/sr=1-2/ref=sc_b_2/102-1896360-1959348.
or even one about the Unicode standard itself:
http://www.amazon.com/exec/obidos/ASIN/0764546252/ref=sim_books/102-1896360-1959348
Hope this helps
Joe O'Leary
|
|
|
|
|
Hi Joe,
I don't know how to thank you for this very good answer. Thank you very much!
I think I more or less understand, and it all works now. I'll read the articles at the links you sent me and will try to build something from this.
Thanks a lot,
Arnaud
|
|
|
|
|
I don't understand your comment in the code about operator[]. The standard clearly defines the return type of std::basic_string::operator[] to be of std::basic_string::reference type which will be TCHAR& in your case. This is return by reference, not return by value. So it should be simple to make CStdString behave exactly like *both* std::basic_string and CString in this case. (Pertinent section in the standard is 21.3.4, and the Dinkumware library shipped with VC6 is right in this case.)
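A short check of the return-by-reference point, as a sketch using std::string (that is, basic_string<char>) rather than CStdString itself. The helper name replace_first is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// For a non-const std::string, operator[] yields char&, not char.
static_assert(
    std::is_same<decltype(std::declval<std::string&>()[0]), char&>::value,
    "operator[] on a non-const string returns a reference");

// Because the result is a reference, assigning through it changes the string.
inline std::string replace_first(std::string s)
{
    if (!s.empty())
        s[0] = 'H';
    return s;
}
```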
There are other areas where you won't get fully compatible semantics, however. CString is reference counted, and while theoretically the standard allows reference counting there are technical reasons that are leading most implementations to *NOT* reference count std::basic_string. Though this won't change the interfaces any, it may lead some code to be non-portable. You also need to be aware that there are some things that are "safe" to do with CString that aren't portable with std::basic_string. For instance:
printf("%s", str);
This will work with CString because of a tricky non-portable hack that MS uses in the data layout of CString, but it will *NOT* work with your CStdString (the implicit cast will never be called).
I personally don't like several things in CString's interface (implicit conversions are evil here, for example) but I understand the desire for many MFC programmers to retain the familiarity in portable code. But users need to be aware of several areas where portable code simply can't be coded using your CStdString, as well as areas where undefined behavior will result. If truly safe and portable code is wanted I'd recommend sticking with std::basic_string.
William E. Kempf
|
|
|
|
|
> This will work with CString because of a tricky non-portable
> hack that MS uses in the data layout of CString
What exactly makes this hack non-portable? Just curious
Tomasz Sowinski -- http://www.shooltz.com.pl
|
|
|
|
|
I meant non-portable to std::basic_string, but this wasn't very clear from my wording. Sorry about that.
William E. Kempf
|
|
|
|
|
Hi William,
The confusion regarding operator[] was a typo by me in the article. I got it backwards. I meant to say that CString returns the characters by value and basic_string returns them by reference, not the other way around. Sorry for the mistake.
In fact let me apologize here for not thoroughly taking my time with the article and covering all the bases. I posted this code long ago at CodeGuru and someone had asked me to post it here. I did the article in a bit of a rush I'm afraid. I was trying to finish it before going on vacation and I left out a couple of important points. Most of this stuff I discuss more fully on my website but even that could use an update. I'll post an update to the article when I get a chance. I'll try to address the issues here point by point:
VARIADIC FUNCTIONS:
-------------------
One of the points brought up by William regards using CStdString in variadic functions: functions that take a variable number of arguments, such as printf. As far as the printf("%s", str) thing goes, yes, it too will not work with CStdString as it does in MFC. In fact that's probably the issue about which people most commonly email me. Again, my apologies. I meant to mention it, but rushed the article out.
The only reason printf("%s", str) works with CString is that the CString designers were extremely careful to lay out the class so that the first 4 bytes are always a pointer to the actual null-terminated string (even though it's reference counted). Even Microsoft recommends that you always cast a CString when using it in a variadic function this way, but you manage to get by anyway because they were looking out for you.
In short, whenever you use a CStdString as one of the variable arguments to a variadic function like printf() or Format(), you must always call c_str(). That's just about the only time you need to, though. Frankly, that's a great argument to go to iostreams and their implicit type safety anyway.
MBCS CHARACTER HANDLING:
------------------------
Another important point that I neglected to mention regards Visual C++ Win32 programmers doing MBCS builds (with compiler flag _MBCS turned on). CStdString will not always handle MBCS characters properly the way CString does.
You would only notice this with true extended MBCS characters that go beyond the standard ASCII set so most people will never have any problem. However you should be aware. In particular, functions which iterate through characters may not bring you the results you would get in a CString build. This was unavoidable, I'm afraid. If your platform supports UNICODE (e.g. WinNT, 2000, XP), I highly recommend it as an alternative.
REFERENCE COUNTING:
-------------------
Whether or not you have reference counting in your version of CStdString is strictly up to the design of your particular library implementation of basic_string<>. Most implementations are not reference counted. It's true that CString does have a couple of functions that directly address reference counting -- LockBuffer() and UnlockBuffer(). However those were also the two functions I mentioned in the article that I was unable to implement -- for these very reasons.
IMPLICIT CAST OPERATOR:
-----------------------
Regarding the implicit cast, I'm afraid that's one of those debates in which one side rarely convinces the other. Implicit casts are not "evil". Yes, when used carelessly they can be dangerous. However when used properly they can be a godsend. Like most features of C++ (e.g. operator overloading), an implicit cast operator is a tool that gives you ease of use at the cost of some risks. Generally a single implicit cast operator is a safe thing. Multiple implicit cast operators in one class are an accident waiting to happen.
The implicit cast to LPCTSTR is one of the things that many people (like me) always loved about CString. In my development, I have to call so many functions (Win32, CRT, etc) that take some type of const string pointer (const char*, LPCTSTR, whatever) that frankly I get tired of having to type c_str() all the time. It clutters up my code, needlessly in my opinion. In fact, the lack of that implicit cast in basic_string was the original reason I created CStdString in the first place, years ago.
However if anyone doesn't like it, you do have the code. By all means, comment out the
operator const CT*() const;
member function in the StdString.h header.
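For readers who have not seen the idea in isolation, here is a cut-down sketch of a string class with a single implicit conversion operator. It is a hypothetical illustration, not the actual CStdString code, and the names SketchString and length_via_implicit_cast are made up for it.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical, cut-down illustration; not the real CStdString.
class SketchString : public std::string
{
public:
    SketchString(const char* s) : std::string(s) {}

    // The implicit conversion: lets the object be passed wherever a
    // const char* parameter is declared. Remove it to force c_str().
    operator const char*() const { return c_str(); }
};

inline std::size_t length_via_implicit_cast(const SketchString& s)
{
    return std::strlen(s);   // the compiler inserts the conversion here
}
```

Commenting out the operator turns the std::strlen(s) call into a compile error, which is exactly the trade-off under debate. Note that the conversion only fires where the parameter type is declared, so it does nothing for printf-style variable arguments.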
I hope I managed to hit all the points here.
|
|
|
|
|
A comment on Mr. O'Leary's variadic comments:
The CString designers didn't "lay out the class so that the first 4 bytes are always a pointer to the actual null-terminated string", they laid it out so the *ONLY* data in CString is a 4 byte pointer to a null terminated string. The extra data needed by CString (like the ref-count) is hidden in a memory block that resides before the null terminated string in memory. In other words, when CString allocates memory for the "string" it allocates enough memory for both the "string" and a header block and then sets the internal pointer to point at this buffer + sizeof(header) bytes. If you don't understand this then look closely at the implementation of CString. It's a very tricky hack to enable CString to work inside of printf (and other variadic functions that expect a char* to be passed).
I'm not posting this to be nitpicky with Mr. O'Leary's code or his response here, only to point out how tricky this issue really is. In my opinion, MS did a disservice to users of CString when they added this hack. Because they did, many people are unaware of the issues and get themselves into trouble with their own extensions. It also illustrates quite nicely one of the reasons why implicit casts are considered "evil" by many. There are numerous cases where implicit casts result in unexpected results at compile time, while explicit casts do not. The only reason to prefer implicit casts is because they save you some typing, but that's usually a bad reason to make any programming choice. It's an understandable desire but when it can lead to errors in usage...
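The layout described above can be sketched in a few lines. This is a hypothetical illustration of the technique, not MFC's actual code: the names Header and TinyString are made up, and copying and reference counting are left out.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <new>

// Bookkeeping stored in front of the characters (hypothetical fields).
struct Header
{
    long        refs;
    std::size_t length;
};

class TinyString
{
public:
    explicit TinyString(const char* text)
    {
        assert(text != nullptr);
        const std::size_t len = std::strlen(text);
        // One block: [Header][characters...][terminator]
        void* block = std::malloc(sizeof(Header) + len + 1);
        if (!block) throw std::bad_alloc();
        Header* h = new (block) Header;
        h->refs = 1;
        h->length = len;
        m_data = reinterpret_cast<char*>(h + 1);   // points just past the header
        std::memcpy(m_data, text, len + 1);
    }
    ~TinyString() { std::free(header()); }
    TinyString(const TinyString&) = delete;
    TinyString& operator=(const TinyString&) = delete;

    std::size_t length() const { return header()->length; }
    const char* c_str() const  { return m_data; }

private:
    // Step back from the characters to reach the hidden header.
    Header* header() const { return reinterpret_cast<Header*>(m_data) - 1; }

    char* m_data;   // the ONLY data member
};

static_assert(sizeof(TinyString) == sizeof(char*),
              "the object is exactly one pointer wide");
```

Because the object is bit-for-bit a char*, a compiler that copies it onto the argument stack unchanged makes printf see a valid string pointer. That is the behavior the hack relies on, and the language does not guarantee it.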
Of course, like Mr. O'Leary says, it's unlikely I'll persuade anyone who's strongly on the other side of this debate. You just need to understand why the C++ standard doesn't make use of implicit casts. It's also precisely why the "explicit" keyword exists. So as long as you're on the other side of this argument you should know that you'll always be fighting the C++ language and its standard libraries.
William E. Kempf
|
|
|
|
|
I am on the other side of this argument and I have never found myself fighting the language or the standard libraries.
Tim Smith
Descartes Systems Sciences, Inc.
|
|
|
|
|
Well said Tim. I'm with you.
At the end of the day we all have to produce results. That's not to say we shouldn't be aware of what our code is doing - we should. But I'd rather see it work because of a carefully designed "hack" in the MFC implementation than fail with a runtime error (assuming the compiler couldn't catch it).
I reckon that arguments like this one (and the MFC vs STL debate) wouldn't have become so commonplace if the Standard C++ library had appeared when it was needed (1990 or so) instead of much later (and the crap documentation doesn't help - though that's probably down to Redmond).
Give them their due - the MFC dev team produced classes (in particular CString and the collection classes) which were needed at the time, easy to use and well documented.
The fact that there are now standard alternatives (albeit with a steeper learning curve) doesn't take anything away from that achievement, and nor should it.
Andy Metcalfe - Sonardyne International Ltd (andy.metcalfe@lineone.net) http://www.resorg.co.uk
"I used to be a medieval re-enactor, but I'm (nearly) alright now..."
|
|
|
|
|
Firstly, I never intended to "take anything away from" the MFC programmers' "achievements". In general, the MFC developers did an unbelievable job creating solid code that lived up to the goals set forth before development. What's wrong with MFC is simply in some design decisions, but even in this area one has to be aware that the "state of the art" in design has come about after the creation of MFC. I've met some of the MFC developers and would never say anything disparaging about them or their talents.
As for your remarks on "failing with a runtime error", I actually think that would have been a better result. I've had personal experience with developers being confused by the fact that CString works perfectly here, while their own classes crash and burn. Trying to explain to them why this is so generally leads to confusion and animosity to both the language and surprisingly MFC as well. A runtime error, which would be caught very quickly in testing, would have been better for these folks.
William E. Kempf
|
|
|
|
|
FYI, the CString design -- the object is sizeof(char*), its one attribute points to the char data, and the bookkeeping info (refcount, etc.) is stored in the memory buffer preceding the char data -- is an intentional design on Microsoft's part. I remember seeing a comment about this some years back.
I'm not sure it's intentional only to "solve" the "printf problem".
If you think about how the data's used and locality of reference issues, CString's design is actually quite good.
- Howard
|
|
|
|
|
Well, I may be wrong, of course, but I've read posts in the past claiming the sole reason was to handle variadic functions such as printf correctly because of the large number of complaints given on this issue with earlier CString implementations. I believe these posts came from someone on the MFC development team at the time, but my memory could be faulty. The "locality of reference issues" being solved by this hack appear to be nothing more than a fortunate side effect. If that were the only goal then the internal pointer would point directly at the header info instead of at the string buffer, for instance.
William E. Kempf
|
|
|
|
|
You're not wrong. I too have read such posts. However, they only tell part of the story.
Anyone with a copy of "MFC Internals" by Shepherd and Wingo can read about the motivation for all of this. As Shepherd and Wingo report it (thanks to their MFC insider, Dean McCrory), the MFC team had several reasons to keep a CString object looking (in binary form) exactly like a TCHAR*.
1. To make TCHAR convenient and easy to use
2. To maintain backward compatibility with legacy MFC code which directly accessed the m_pchData member -- reference counting wasn't introduced to CString until MFC 4.0.
3. To get around the variadic function problem we're discussing here.
Frankly, even if you accept the last two reasons, I think reason #1 is a dubious reason at best. I don't see how ease of use is accomplished by keeping the binary layout to be a TCHAR* unless you're into doing stuff like memcpy on C++ objects -- not something I'd ever recommend.
The other two are sort-of-reasonable I guess, if you consider the time around which MFC was introduced and take into account the design guidelines of the library. We didn't have templates in VC back then. We still don't have them completely in VC6 nor will we probably in VC7. Plus there was no Standard C++ Library to speak of.
MFC has always had sort of a Wild-West feel to it compared to something like OWL or the Standard C++ Library. They wanted to maintain backward compatibility AND to let you customize it in just about any way possible, without templates. So you get public member variables, and hacks like the one we see in CString. In short, they made compromises that were appropriate for the time and compatibility constraints.
I'm not trying to justify the implementation. I just like the class' API.
Anyway, in order to satisfy the variadic functions requirement (#3), it wasn't strictly necessary to ensure that the class consisted of ONLY a single TCHAR*. All they really needed to do was ensure that the TCHAR* was the *very first item* in the binary layout of the class. That also would have satisfied the requirements of variadic functions.
However, given that the first item is going to be a TCHAR*, it makes no sense to have the remaining items (i.e. refcount, data length) stored in the class itself. That would be redundant. They belong with the shared data so that's where they reside. You end up with a CString class the size of a single pointer.
Joe O'Leary
|
|
|
|
|
But it's the same sort of design decision as having BSTR == LPOLESTR, which makes lots of "compilable" code possible that just doesn't work. BSTR is a lot like CString in that the implicit pointer aims at the middle of the data. It allows sloppy code which WILL bite you eventually.
Marc
|
|
|
|
|
Hi Marc,
I'm not sure I understand your point.
As long as you are going to have an implicit cast operator (and CString always will), how does CString's practice (of having the returned pointer point at the middle of the data) allow for any sloppier code than having it point at the head of the data?
If your sloppy code were to start doing things with that pointer that it should not -- such as iterating backwards past the first character or memcpy()-ing objects -- you would get bitten no matter *what* class you were to use -- be it basic_string<>, CString, or SGI's Rope class.
Did you have an example in mind that would bite you with CString but not with another class?
(Mind you, I'm not speaking about the argument against the implicit cast operator itself. It's a valid one, but we've beaten that to death here.)
Joe O'Leary
|
|
|
|
|
The implicit cast thing is a problem because there is a NON-IMPLICIT piece of information (that the ref-count and such is just-before the pointer). Yep, we're saddled with that decision, and that's why I don't use CString anymore. I've been bitten too many times. A classic example is someone returning a CString from a function and assigning it to an LPCTSTR; the CString temporary is then destructed, leaving the LPCTSTR pointing at garbage. That's a dead issue, but moving on...
In the case of a BSTR, it's "legal" to pass one to any function defined as taking an LPOLESTR, but ALL of those functions explicitly assume the string terminates at the first NUL character, which is simply NOT TRUE for a BSTR (whose length is given by ::SysStringLen). This can cause BSTRs with embedded NULs all kinds of grief. Even Microsoft can't get it right. Take a look at the errors in _bstr_t and CComBSTR!
Moral of the story? If you have implicit information, you can't have implicit casts to primitive types (like all pointers). Frankly I like the whole c_str() paradigm because it's EXPLICIT whose job it is to manage the memory et al.
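The classic temporary problem can be sketched with std::string, where the conversion at least has to be spelled out. The function names make_name and safe_length are hypothetical, and the dangerous line is deliberately left as a comment because running it would be undefined behavior.

```cpp
#include <cassert>
#include <string>

inline std::string make_name() { return "temporary"; }

// Dangerous pattern (left as a comment on purpose):
//   const char* p = make_name().c_str();   // the temporary dies at the ';'
//   ... any later use of p reads freed memory ...
// With an implicit cast the same bug hides inside an innocent-looking
// assignment of the function result to a pointer variable.

// Safe pattern: keep the string object alive for as long as the pointer is used.
inline std::size_t safe_length()
{
    const std::string keep = make_name();
    const char* p = keep.c_str();   // valid while 'keep' is in scope
    return std::char_traits<char>::length(p);
}
```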
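The embedded-NUL trap is easy to reproduce without COM. The sketch below assumes a std::string with a stored length as a stand-in for a BSTR, and std::strlen as a stand-in for any function that takes a NUL-terminated pointer. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// A counted string with an embedded NUL, standing in for a BSTR.
inline std::string counted_with_nul()
{
    return std::string("ab\0cd", 5);   // the length is stored, not inferred
}

// What a consumer of a NUL-terminated pointer would see.
inline std::size_t length_seen_by_c_api(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

The string knows it holds five characters, but anything that walks the raw pointer stops after two, so the tail is silently lost.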
Marc
|
|
|
|
|
Hi Marco,
I think I mistook your point. I see now (at least I think I do) that what you dislike is not so much the specific internal CString buffer layout itself (irrelevant I believe) but rather the fact that CString has a cast operator at all.
I think we're roughly in agreement here -- at least about the dangers involved if not the relative merits. I agree: implicit casts can be dangerous -- both for the classic reason you mention as well as the BSTR == LPOLESTR problem.
In particular I agree with you about the BSTR/LPOLESTR thing. Back when I was first doing COM, that used to cause me no end of problems. I don't know what they were thinking when they defined it that way.
As for MFC, I haven't used that much in years. I'm a middleware guy.
Joe O'Leary
|
|
|
|
|