|
The MFC7 and ATL7 versions of CStringT contain a couple of new members.
One is GetString(), which is equivalent to the LPCTSTR conversion operator.
Another is Tokenize(), which you can call repeatedly to find tokens in a string.
Here is some sample code for the function; note that it is based on the actual MS implementation.
MYTYPE Tokenize( LPCTSTR pszTokens, int& iStart ) const
{
    // This is based on the ATL7 CStringT::Tokenize
    ASSERT( iStart >= 0 );
    ASSERT( pszTokens );
    LPCTSTR pszPlace = c_str() + iStart;
    LPCTSTR pszEnd = c_str() + GetLength();
    if( pszPlace < pszEnd )
    {
        int nIncluding = (int) _tcsspn( pszPlace, pszTokens );
        if( (pszPlace+nIncluding) < pszEnd )
        {
            pszPlace += nIncluding;
            int nExcluding = (int)_tcscspn( pszPlace, pszTokens );
            int iFrom = iStart+nIncluding;
            int nUntil = nExcluding;
            iStart = iFrom+nUntil+1;
            return( Mid( iFrom, nUntil ) );
        }
    }
    // return empty string, done tokenizing
    iStart = -1;
    return( MYTYPE() );
}
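For anyone who wants to try the calling pattern outside MFC/ATL, here is a minimal sketch. It assumes a free-function port of the logic above onto std::string; the free-function form of Tokenize and the SplitAll helper are illustrative names only, not part of CStringT or StdString.h. The idea is simply to keep calling until iStart comes back as -1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical free-function port of the Tokenize logic, using std::string
// so that it compiles without MFC/ATL or StdString.h.
std::string Tokenize(const std::string& s, const char* delims, int& iStart)
{
    assert(iStart >= 0);
    assert(delims != 0);
    if (static_cast<std::size_t>(iStart) < s.size())
    {
        // Skip any leading delimiters.
        std::size_t from = s.find_first_not_of(delims, static_cast<std::size_t>(iStart));
        if (from != std::string::npos)
        {
            // The token runs up to the next delimiter or the end of the string.
            std::size_t to = s.find_first_of(delims, from);
            if (to == std::string::npos)
                to = s.size();
            iStart = static_cast<int>(to) + 1;
            return s.substr(from, to - from);
        }
    }
    iStart = -1;   // done tokenizing
    return std::string();
}

// Collects every token by calling Tokenize until iStart becomes -1.
std::vector<std::string> SplitAll(const std::string& s, const char* delims)
{
    std::vector<std::string> out;
    int pos = 0;
    std::string tok = Tokenize(s, delims, pos);
    while (pos != -1)
    {
        out.push_back(tok);
        tok = Tokenize(s, delims, pos);
    }
    return out;
}
```

Runs of delimiters are skipped, so splitting "a,b;;c" on ",;" gives a, b and c with no empty tokens in between.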
|
|
|
|
|
Hi !
FYI, I've already submitted the Tokenize function to the author (look back through the threads for my nickname).
Here it is if you want to integrate it into your StdString version.
MYTYPE Tokenize(PCMYSTR tok, int &nFirst) const
{
    int nPrev;
    if ( nFirst < 0 )
        nFirst = 0;
    if ( nFirst >= size() )
        return MYTYPE();
    nPrev = nFirst;
    nFirst = this->find_first_of(tok, nPrev);
    if (nFirst == npos)
        nFirst = size();
    return this->substr(static_cast<mysize>(nPrev), static_cast<mysize>(nFirst++ - nPrev));
}
|
|
|
|
|
Thanks for answering before. CStdString works great now, although I did end up using std::cin and std::cout instead of printf and scanf. The problem is that including the CStdString library causes the output exe file size to go from 220 KB to 1.76 MB.
I had the same problem when I tried to use MFC in a static library instead of a DLL. If I use CStdString with MFC in a dll, the file is small. So is this inescapable? If I use a CString-like library am I always going to end up with huge files? File size and dlls are the reason I wanted to get away from MFC.
|
|
|
|
|
First of all, understand that CStdString is not a library. It is a template and every bit of it lies in that header file. The problem is that CStdString builds upon the Standard C++ Library. Adding the Standard C++ Library (and therefore the C-Runtime Library) to your project is what increases the size. This increase is HUGE if you link statically to the CRT.
That is because when you link statically, ALL of the needed CRT code must be added to your EXE, as a copy. A far better option is to link dynamically to the DLL that holds the CRT. This, of course, requires that the appropriate DLL exists on the user's machine, but that's a safe bet, especially if you generate your code with VC6.
To make your project link dynamically with the CRT in VC6 you change the settings this way:
Project (Menu) >> Settings (Item) >> C++ (Tab) >> Code Generation (Category) >> "Multithreaded DLL"
(for debug builds you would use "Debug Multithreaded DLL")
-Joe
|
|
|
|
|
Well, I created a new project and got rid of stdafx and that cut the filesize down to about half of what it was. Silly Microsoft Wizard put in something that I didn't want... It introduced a bug where my program is no longer finding the file to read, but I think that can be solved.
I just tried what you mentioned with the multithreaded DLL, it works! Back to 220 KB. Thanks. I suppose dlls are okay if this is a standard one that most Windows systems have.
|
|
|
|
|
All DLL issues can easily be solved with a decent install program like InstallShield. However I'll discuss what I know of the situation to give you a better understanding.
When you link dynamically with the CRT in a release build, you are adding a dependence on msvcrt.dll. If your program uses certain parts of the Standard C++ Library, you also add a dependency on msvcp60.dll, I believe. This assumes you are building with VC6.
Do not worry about msvcrt.dll. I don't think it is even possible to buy a version of Windows these days that doesn't have it. The only major concern I might have about it is that a computer (on which you install your application) would have a very out-of-date, buggy version.
Visual Studio .NET apparently moves the CRT to msvcr71.dll. Don't ask me why. Now I'm guessing that any computer which has .NET on it must also necessarily have this DLL but I cannot be sure.
Regardless, all of these DLLs are freely redistributable. Furthermore, somewhere in the on-line help there is an article which discusses all of these issues regarding the DLLs. Search for it and it will explain the situation far better than I.
-Joe
|
|
|
|
|
I was using CStrings with printf and prefer to use printf. The demo program crashes when I attempt to use printf("%s",strName) (Instruction Error at Memory Location...). Cout works fine. Is this a bug, something I'm probably doing wrong, or is it not intended to work with printf, like CStrings do?
|
|
|
|
|
This is not a bug. CString works with printf this way only due to a complete hack. They had complete control over the binary layout of the class and took advantage of special knowledge of the way printf works.
The solution is to call c_str() on the string
printf("%s", strName.c_str());
Alternatively, you could cast it to a const char*
printf("%s", (const char*) strName);
For more on this, scroll through the responses to my article. I have answered this question many times, most recently in answer to a post on March 25th.
I do not have control over the binary layout of basic_string so there is nothing I can do about it. The fact that it works with CString is an ill-advised hack.
-Joe
|
|
|
|
|
Yes, the way it works in CString is bogus. I've been correcting my code as you suggest, adding (const char*), but it's got me wondering what's optimal?
- (const char*) won't work for UNICODE.
- (const TCHAR*) or (LPCTSTR) are MS specific.
- (const wchar_t*) doesn't work. Why not?
Is there a standard-ish type or define for a single portable character? I thought wchar_t was.
Tom
|
|
|
|
|
...is to call c_str(). That will work for everything and is very portable.
Remember, however, you'll still have to specify the correct format specifier (%s or %S), depending on whether you're using a "thin", char-based string (derived from std::string) or a wide, wchar_t-based string (derived from std::wstring).
-Joe
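A small sketch of that advice, assuming plain std::string and std::wstring stand in for the char-based and wchar_t-based CStdString variants. FormatGreeting is a made-up name. For the wide case it uses "%ls" rather than %S, because %ls is the specifier the C standard defines for a wchar_t* argument passed to a narrow printf-family function, whereas the meaning of %S differs between compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Narrow string: "%s" paired with the const char* from c_str().
int FormatGreeting(char* buf, std::size_t n, const std::string& name)
{
    assert(buf != 0);
    return std::snprintf(buf, n, "Hi %s", name.c_str());
}

// Wide string written into a narrow buffer: "%ls" paired with the
// const wchar_t* from c_str(). Characters outside the current locale's
// narrow character set may fail to convert.
int FormatGreeting(char* buf, std::size_t n, const std::wstring& name)
{
    assert(buf != 0);
    return std::snprintf(buf, n, "Hi %ls", name.c_str());
}
```

Either way the object itself never goes through the "..." of the variadic call; only the pointer returned by c_str() does, and the specifier has to match that pointer's character type.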
|
|
|
|
|
The principal problem with std::basic_string<> is the memory management.
Example:
CStdString str("hello ! ");
str += "I don't know ";
str += "what this message means.";
Here, there are 5 memory operations:
- allocate a buffer of characters to hold the string "hello !".
- free the buffer.
- reallocate it and fill it with "hello ! I don't know ".
- free it.
- reallocate it and fill it with "hello ! I don't know what this message means.".
We can do better in memory management:
- we use a class named CStdString which doesn't support any concatenation operations.
- we use a class named CStdStringList which supports concatenation (it is a list of TCHAR*s).
So, we can also write:
CStdStringList strlist("hello !");
strlist += "I don't know ";
strlist += "what this message means.";
|
|
|
|
|
On the contrary. What you have stated is not necessarily true at all. It depends upon the implementation of basic_string.
Consider the case of the implementation that comes along with Visual Studio .NET. This version has a fixed buffer size of 16 characters. Only if the string length exceeds this size does any dynamic memory allocation occur.
So using your example code with this particular implementation:
CStdString str("hello ! ");
str += "I don't know ";
str += "what this message means.";
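There would NOT be any allocation or freeing of memory for the first line, because the 8-character literal fits in the fixed buffer. Dynamic allocation starts only once an append pushes the string past that buffer, which with these literals happens on the second line (8 + 13 = 21 characters).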
This is far simpler and more portable than going to the trouble of writing and debugging your own string classes just to deal with such a relatively uncommon situation. New programmers do not have to learn how YOU think a string class's interface should be, they can simply use the same basic_string interface they have always used with an optimized implementation that takes care of most of the common inefficiencies. Also, they can be far more confident of thoroughly debugged code as the user base for the basic_string<> implementation is guaranteed to be larger than the user base for your string class.
Furthermore, if memory allocation is a bottleneck in your program the thing to do is not to write your own string class. Instead you should write your own, specialized allocator<> template which optimizes away such problems. Supply this allocator as an argument to the instantiation of the basic_string (or to CStdStr) template. This is why the designers of the Standard C++ Library added the concept of allocators in the first place. It is a whole lot easier and safer to use your own allocator<> than it is to try to use your own custom string class. Take my word for it, after 6 years of supporting this CStdString code base, there are few people more acutely aware of this fact than I.
Finally, the major performance bottleneck in most programs these days is not string manipulation. Even reasonably well-written code (with at least a passing thought given towards avoiding inefficient multiple concatenation steps such as those in the example) will have no problems with string-related memory allocation.
-Joe
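To see how a particular basic_string implementation behaves on the example from this thread, one option is to plug in an allocator that does nothing but count. This is only a rough sketch, assuming a C++11 compiler (older libraries expect a fuller allocator interface). CountingAllocator, CountedString, g_allocCount and AllocationsForAppends are made-up names, not part of StdString.h.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical counter: bumped on every heap allocation the string makes.
static std::size_t g_allocCount = 0;

// Minimal C++11-style allocator that forwards to operator new/delete
// and counts the allocations.
template <typename T>
struct CountingAllocator
{
    typedef T value_type;

    CountingAllocator() {}
    template <typename U>
    CountingAllocator(const CountingAllocator<U>&) {}

    T* allocate(std::size_t n)
    {
        ++g_allocCount;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};

template <typename T, typename U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

typedef std::basic_string<char, std::char_traits<char>, CountingAllocator<char> > CountedString;

// Builds the message from the thread and returns how many allocations
// the two appends cost, optionally reserving capacity up front.
std::size_t AllocationsForAppends(bool reserveFirst)
{
    CountedString str("hello ! ");
    if (reserveFirst)
        str.reserve(64);
    const std::size_t before = g_allocCount;
    str += "I don't know ";
    str += "what this message means.";
    assert(str.size() == 45);
    return g_allocCount - before;
}
```

AllocationsForAppends(true) should report 0, since reserve() guarantees room for all 45 characters before the appends start. Without the reserve, the count depends on the library's internal buffer size and growth policy, which is exactly why the answer differs from one implementation to the next.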
|
|
|
|
|
No biggie, but I get the following warnings in VS7.1...
h:\Dev\Common\StdString.h(2907) : warning C4018: '>' : signed/unsigned mismatch ...
h:\Dev\Common\StdString.h(2910) : warning C4018: '>' : signed/unsigned mismatch
h:\Dev\Common\StdString.h(2914) : warning C4018: '<=' : signed/unsigned mismatch
So I added this pragma to the beginning of your file.
#pragma warning( disable : 4018 )
Thanks,
Tom
|
|
|
|
|
I think I have since corrected these warnings. If not, the way for me to fix them is not to use a #pragma. Instead I should apply the appropriate static_cast<> to the mismatch areas.
Please let me know if downloading the latest version solves your problem. If not I must fix it. You can get the latest version here:
http://www.joeo.net/code/StdString.zip
I'm using Visual Studio .NET (albeit an old version) but I am not seeing these warnings.
-Joe
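For reference, the kind of static_cast<> fix meant here looks roughly like the sketch below. It is not the actual code at those lines of StdString.h; IsValidIndex and CharAt are hypothetical helpers that just show the pattern of checking the sign first and then casting, so that both sides of the comparison are unsigned.

```cpp
#include <cassert>
#include <string>

// Returns true when nIndex is a valid position in s.
// Writing "nIndex < s.size()" directly compares int with size_type and
// draws warning C4018; checking the sign and then casting avoids it.
bool IsValidIndex(const std::string& s, int nIndex)
{
    return nIndex >= 0
        && static_cast<std::string::size_type>(nIndex) < s.size();
}

// Returns the character at nIndex; the index must be valid.
char CharAt(const std::string& s, int nIndex)
{
    assert(IsValidIndex(s, nIndex));
    return s[static_cast<std::string::size_type>(nIndex)];
}
```

The sign check matters: casting a negative int straight to size_type would turn it into a huge positive value and hide the bug rather than fix it.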
|
|
|
|
|
Yes, I am using the latest version. The file modify date is 13May03, but internally the last revision is 2003-MAR-14.
|
|
|
|
|
Actually, I just downloaded the most recent version and am seeing the problems. I use VS .NET 2002 (the "original" version.) I am not sure if it has anything to do with the warning level setting?
"When a man sits with a pretty girl for an hour, it seems like a minute. But let him sit on a hot stove for a minute and it's longer than any hour. That's relativity." - Albert Einstein
|
|
|
|
|
I'm using VS 7.0 and I don't see these warnings, even with the latest version. Could you post the exact line numbers you get with the new version? No doubt it's just a static_cast<> needed somewhere but I'd like to be sure.
-Joe
|
|
|
|
|
When compiling in BCB 6.0 I get these errors:
1.[C++ Error] StdString.h(2666): E2268 Call to undefined function 'ssnprintf'
2.[C++ Error] StdString.h(1576): E2227 Extra parameter in call to isspace(int)
I use the class as follows:
1.
CStdString str;
int iID = 1000;
str.Format("%d", iID);
2.
CStdString str = " Good! ";
str.Trim();
What's wrong? Please help! Thanks!
|
|
|
|
|
There is nothing wrong with your code. The problem is in my preprocessor flags. I've got a check for _MSC_VER in there that doesn't belong and I've got to remove and test it. I added it in recently while incorporating some changes from someone else designed to make the code work on another platform. The problem with making such changes is that I am unable to then re-test the code on all other platforms -- leading to problems such as this one.
I'll post back here in a day or two when I fix it. Sorry.
-Joe
|
|
|
|
|
First, thanks for this great piece of work!
I don't know if you plan to stay strictly compatible with CString or if you plan to extend your class with new methods and functionality.
Anyway, I found it useful to add some stuff, like:
- the Tokenize function of the ATL strings :
MYTYPE Tokenize(PCMYSTR tok, int &nFirst) const
{
    int nPrev;
    if ( nFirst < 0 )
        nFirst = 0;
    if ( nFirst >= size() )
        return MYTYPE();
    nPrev = nFirst;
    nFirst = this->find_first_of(tok, nPrev);
    if (nFirst == npos)
        nFirst = size();
    return this->substr(static_cast<mysize>(nPrev), static_cast<mysize>(nFirst++ - nPrev));
}
- and conversion stuff between primary types and strings :
operator const int() const
{
    return atoi(c_str());
}
operator const unsigned int() const
{
    return atoi(c_str());
}
operator const long() const
{
    return atol(c_str());
}
operator const unsigned long() const
{
    return atol(c_str());
}
operator const __int64() const
{
    return _atoi64(c_str());
}
operator const unsigned __int64() const
{
    return _atoi64(c_str());
}
operator const double() const
{
    return atof(c_str());
}
CStrEx(int i)
{
    Format("%d", i);
}
CStrEx(unsigned int ui)
{
    Format("%u", ui);
}
CStrEx(long l)
{
    Format("%ld", l);
}
CStrEx(unsigned long ul)
{
    Format("%lu", ul);
}
CStrEx(__int64 i64)
{
    Format("%I64d", i64);
}
CStrEx(unsigned __int64 ui64)
{
    Format("%I64u", ui64);
}
CStrEx(double d)
{
    Format("%g", d);
}
Ok, I admit the constructors are not essential
|
|
|
|
|
BTW, I renamed your class to CStringEx for my own use
(sounds better to me)
but you keep all credits in the header comments, don't worry
|
|
|
|
|
Hi,
I'm glad you got good use out of the class.
The main reason I don't add conversion operators and constructors such as the ones you mentioned is that they lead to unintended side effects. Read up on Scott Meyers for more on this subject but the basic message is "avoid user-defined conversion operators whenever possible."
I've even had to fight the debate over the one user-defined conversion that I DID put in there -- operator const CT* -- which just calls c_str(). There is no such operator in basic_string due to the dangers I mentioned above. But I was so used to it from CString that I simply had to have it. The convenience is more than worth what is a minor risk, IMHO. Just be aware that some C++ purists do not like that.
Still, the argument against conversion operators is a very sound one. At the very least, those constructors you mentioned should be declared with the "explicit" keyword.
Anyway at this point the class is almost 7 years old. As I have tried to keep it working on all platforms (Windows, Unix, Linux, Solaris, etc) I have basically reached the point where I do not want to add any more functions to it. It is tough enough to keep it working. In fact, right now I'm working on a fix for "billca" (see the recent feedback below).
But you are of course free to add or change anything you want. I'd still recommend against those operators and constructors, but hey, go crazy.
Please let me know if you have any problems.
-Joe
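To make the "explicit" suggestion concrete, here is a minimal sketch. NumString is a hypothetical stand-alone class, not CStdString or CStrEx; it only illustrates explicit constructors together with named conversion functions in place of operator int() and friends.

```cpp
#include <cassert>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical wrapper used only to illustrate the idea.
class NumString
{
public:
    // explicit: "NumString s = 42;" does not compile, "NumString s(42);" does,
    // so an int can never silently turn into a string at a call site.
    explicit NumString(int i)
    {
        std::ostringstream os;
        os << i;
        m_str = os.str();
    }
    explicit NumString(const std::string& s) : m_str(s) {}

    // Named conversions instead of operator int() / operator long():
    // the caller has to ask for the conversion by name.
    int ToInt() const { return std::atoi(m_str.c_str()); }
    long ToLong() const { return std::atol(m_str.c_str()); }

    const std::string& str() const { return m_str; }

private:
    std::string m_str;
};

// Usage sketch: both directions are spelled out where they happen.
inline void Demo()
{
    NumString n(42);
    assert(n.str() == "42");
    assert(n.ToInt() == 42);
}
```

The trade-off is a little more typing at each call site in exchange for the compiler no longer picking a conversion you did not intend, for example when such an object is passed to an overloaded function.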
|
|
|
|
|
Joe, in your comments within the function ssvsprintf(PWSTR pW, size_t nCount, PCWSTR pFmtW, va_list vl), you say you'd like to hear about compile errors related to the various versions of vswprintf on the various platforms. In order to get a clean compile on HP-UX and LINUX, I changed:
#if (!defined(_MSC_VER) &&
!defined (__BORLANDC__) &&
!defined(__GNUC__) &&
!defined(__sgi))
to
#if (!defined(_MSC_VER) &&
!defined (__BORLANDC__) &&
!defined(__GNUC__) &&
!defined(__sgi) &&
!defined(HPUX1100)) ||
defined(LINUX)
and
#elif defined(__sgi)
to
#elif defined(__sgi) || defined(HPUX1100)
I don't know if these new definitions would be the right ones to be used generally, but they seem to work for my installation.
Bill
|
|
|
|
|
This is great stuff. And I have no way of knowing it since all I've got right now is a Windows box. I will definitely add this in to the preprocessor headers on the on-line version. You get a big credit in the header for this (and for all the other stuff you're helping out with here).
I'll try to coordinate with you off-line testing all the changes I've made once we get all these matters resolved and I'll post a new version soon.
Thanks!
-Joe
|
|
|
|
|
Joe, since CStdString doesn't claim to be exactly the same as CString and since CString of course doesn't even work at all on the UNIX platforms that CStdString does, I'm not sure the following should be considered a problem with CStdString, and I don't know if you'll want to change anything, but it certainly is an additional pitfall that one should be aware of when using CStdString in place of CString. Using a simple example, with UNICODE defined, say you code something like:
mystring1.Format(L"%s", (const wchar_t*)mystring2);
With CString, the %s (lower case) format specification is correct as it is with CStdString on Windows, but using CStdString and running on Solaris (and probably other UNIXs too), it needs to be %S (upper case). The underlying vswprintf function apparently expects it that way.
So for platform-independence, rather than have two versions of every Format call, I couldn't think of anything else but to leave the %s(s) and make the necessary adjustment inside StdString.h as follows:
In ssvsprintf(PWSTR pW, size_t nCount, PCWSTR pFmtW, va_list vl), after the #if !defined(_MSC_VER)..., I replaced
return vswprintf(pW, nCount, pFmtW, vl);
with
std::basic_string<wchar_t> pFmtW2 = pFmtW;
std::basic_string<wchar_t>::size_type index = 0;
while ((index = pFmtW2.find(L"%s", index)) != std::basic_string<wchar_t>::npos)
{
    pFmtW2.replace(index + 1, 1, L"S");
    index += 2;
}
return vswprintf(pW, nCount, pFmtW2.c_str(), vl);
I realize this is a bit of a kludge and you may not want to do anything like it in the real thing, but I thought you should know and I'd be interested in your thoughts.
Bill
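One thing to watch with a plain search for "%s" is that it also matches the tail of an escaped "%%s", where the s is literal text. A possible variation is sketched below, assuming the rewrite is pulled out into a helper; WidenStringSpecifiers is a made-up name and is not part of StdString.h. Like the loop above, it does not handle specifiers with flags or widths such as "%10s".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: rewrites "%s" as "%S" in a wide format string,
// leaving escaped percent signs ("%%") and other specifiers untouched.
std::wstring WidenStringSpecifiers(const std::wstring& fmt)
{
    std::wstring out(fmt);
    std::wstring::size_type i = 0;
    while ((i = out.find(L'%', i)) != std::wstring::npos)
    {
        if (i + 1 >= out.size())
            break;                  // trailing lone '%', nothing to rewrite
        if (out[i + 1] == L's')
            out[i + 1] = L'S';      // "%s" becomes "%S"
        // Skip the '%' and the character after it. For "%%" this steps
        // over the escape, so a following 's' is treated as plain text.
        i += 2;
    }
    assert(out.size() == fmt.size());   // the rewrite never changes the length
    return out;
}
```

The result could then be passed to vswprintf in place of pFmtW2.c_str() in the snippet above.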
|
|
|
|
|