|
In my opinion, while a lot of you guys out there pride yourself on writing 100 operations in one ine of code, good luck to the next guy (or you in 2 years) who has to figure out what the heck it is supposed to do.
|
|
|
|
|
I have about 9 years professional C++ experience, and I have to agree chaining operations like this is NOT easier to understand. It is certainly not easier to step through or set break points while debugging.
|
|
|
|
|
In practice, having to type MakeLower or MakeUpper on a separate line is an absolute pain when you work with strings a lot. One supposes that these are not functions you would generally wish to step into.
My solution is to have a class called CStringTools which just acts as a container for a collection of handy static string functions, including versions of MakeLower and MakeUpper which return the modified string. I use these functions *all the time*.
|
|
|
|
|
i'm not talking about 100 operations in 1 line of code.
this is about few obvious operations that can be chained like objStr.LTrim().RTrim().Replace(..) its common practice in other languages like java or C# and in alot of scripting languages.
---------------
Horatiu CRISTEA
|
|
|
|
|
Cleary you missed the tougue in cheek way that I meant it. To me, it might as well be 100 lines of code. It seems that every programmer has very different "dialect" when it comes to stringing operations together. What's obvious to you is NOT often obvious to another. That results in wasted time.
Therefore, in large projects, with shared coding responsibilites, please use "linear" coding. My 25 years of experience coding, and of hiring new coders tells me this is the best approach. So called 'elagant' coding means nothing to me if I am late on a project.
|
|
|
|
|
What folks are trying to point out here is that when you see:
objStr.LTrim();
objStr.RTrim();
objStr.Replace(..);
There is absolutely now doubt which step is done first when an inexperienced programmer reads the code. Further you don't have to read the entire line looking for where you put that blasted Replace(...) statement you're sure needs a little tweak, and a grep might not work because you might have 50 Replace calls in the same file. Sure, those of us who are more experienced aren't going to be slowed down "much" by chained code, but who needs to be slowed down at all? Chained statements are clearer. It's not just our oppinion. Books discussing maintainability of code have actual numbers based on real world projects. Chained code results in more mistakes, whether they be order precedence mistakes, not quite grasping everything that's done in a single line, etc.
These same books talk about limiting the use of the ternary operator (that's the : result = (bool) ? val1 : val2; type of line)due to similar issues. The reader of the code has to take a moment and THINK about whether val1 or val2 is used on a true condition. Further they have to sit back and verify if the data types realy make sense for both pieces, and if val1 and val2 contain there own complex processes, well, things get hairy fast. The same holds true for chaining opperations. If one or two take strings (such as the Replace call) you quickly find folks who will write:
objStr.LTrim().RTrim().Replace(objStr2.LTrim().RTrim().Replace(...));
and much worse. Giving the power means it will be used, and usually by your least experienced programmer which means more maintanance bugs for them and those that follow them. Those who argue against this are talking generally from personal bad experiences with chained code and often from knowledge we've gained in our reading.
Take our advice, don't do it any more than necessary.
|
|
|
|
|
It would be nice to have more operators, like +,+=,<,<=,>,>=.
|
|
|
|
|
Also I like and it is necessary
Does work?
CStr CStr::operator+( CStr &other )<br />
{<br />
CStr sTemp(this->m_pBuf);<br />
return sTemp.Append(other);<br />
}<br />
<br />
CStr CStr::operator+=( CStr &other )<br />
{<br />
return this->Append(other);<br />
}<br />
<br />
...<br />
|
|
|
|
|
You've said so already, STL-lovers should quit reading. Although I do agree with you on the fact that std::string isn't all that great in practice, especially when it comes to type-conversions using stringstreams, I believe a better approach would have been to wrap up an STL-string in your own class.
Comparing your implementation with std::string, we can probably agree that both aren't guaranteed to be thread-safe. There are, however, some features in std::string that your class is missing. Most important, when it comes to performance your implementation will be far behind both std::string and MFC's CString. It doesn't incorporate reference counting for assigning strings to different objects without modifying them. Even worse, it doesn't have special handling for short strings, i.e. a union of either an m_pBuf or a m_pShortBuf[WHATEVER_SEEMS_TO_FIT]. operator new[] isn't really all that spectactular when dealing with small-memory-allocations, so this would be an obvious optimization.
Apart from that, from a short glimpse at the source, there are some more or less severe bugs in your code:
* [Str.cpp:line 25]: by default, operator new throws an exception when it runs out of memory. Granted, MSVC defines a different [non-standard] behaviour, returning a NULL-pointer, unless you turn off MSVC-specific language extensions. At any rate, the correct code would be m_pBuf = new( std::nothrow ) TCHAR[ m_size ];
* [Str.cpp: line 38]: applying operator delete on an array results in undefined behaviour. I'm sure one or the other will start arguing, that it merely misses to invoke the [trivial] d'tors of those stored TCHAR's. The truth however is, that any compiler vendor can choose to produce any code they like to. Moreover, the behaviour can change from one revision to the next without further notice or even the need to document the behaviour. If you want to keep standard-conforming replace that with the correct code, i.e. delete [] m_pBuf;
* [Str.cpp: line 51]: _Alloc returns a BOOLean value. Since you are just ignoring it your code will crash at line 53 if it runs out of memory.
* Illegal symbol names: _Alloc, _Free, _Init, _PaddedSize, _CheckSize, _StrCopy, and __STR__EX_H. According to the ISO standard of c++ all symbol names containing a double underscore or starting with an underscore immediately followed by a capital letter are reserved for use by the implementation. You may call it nitpicking, but library code has to follow just about any rule there is -- or it won't make it to be a library ever.
* _CheckSize is misleading: The name of the method has no indication that it is actually performing a reallocation. Why not go with the STL-naming and call it reserve instead?
* missing exception specification: I know, few ever care about the extra cost of doing this. But hey, if this is supposed to be a library you should put in the extra effort and put as much documentation into the code as possible. At least mark the d'tor and member _Free with throw() to indicate that no exception is raised there.
* missing assertions: A library should use as many as necessary to enforce the correct usage of the class. Your implementation doesn't appear to have any. Does that really mean, a client cannot, under no circumstances use the class in an undesired way?
* #define's: are BAD! They pollute global namespace and especially INT_SIZE and the like aren't distinct enough to minimize the probability that someone using your code hasn't #defined them before or will #define them after your code. Why not make them const objects instead? If nothing else it's at least a bit more type-safe.
There are probably issues with the inner workings of the code, which I didn't have time to check. The class itself may work in your environment. Some of it is pure luck though. Especially the first 3 issues noted above, which I would grade as beginners' mistakes, would make me feel somewhat uneasy, if I was forced to use this class.
I hope this isn't putting the code in too bad a light. As it was mentioned before though, there are some rather strict requirements that have to be met when dealing with library code, which you may get away with in a typical application.
Regards,
.f
|
|
|
|
|
.:fl0yd:. wrote:
Comparing your implementation with std::string, we can probably agree that both aren't guaranteed to be thread-safe. There are, however, some features in std::string that your class is missing. Most important, when it comes to performance your implementation will be far behind both std::string and MFC's CString. It doesn't incorporate reference counting for assigning strings to different objects without modifying them. Even worse, it doesn't have special handling for short strings, i.e. a union of either an m_pBuf or a m_pShortBuf[WHATEVER_SEEMS_TO_FIT]. operator new[] isn't really all that spectactular when dealing with small-memory-allocations, so this would be an obvious optimization.
std::string is not specified to be ref-counted, and most implementations are moving away from that now, as the semantics are difficult to make correct and still obtain any performance benefit. COW needs to be replaced by move semantics (http://www.cuj.com/documents/s=8246/cujcexp2102alexandr/).
Also, the standard says nothing about employing any special handling for small strings. That would be a QoI issue, at best.
William E. Kempf
|
|
|
|
|
Yip, those are valid comments. When I was typing I edited the original post several times and both, ref-counting as well as special handling for small strings do not apply to std::string according to the ISO standard. They are present in may implementations, though, and MFC CString's do still have those features, although Alexandrescu raises some valid concerns about their usability.
Thanks for setting this straight.
|
|
|
|
|
<<<<<<<<
* [Str.cpp:line 25]: by default, operator new throws an exception when it runs out of memory. Granted, MSVC defines a different [non-standard] behaviour, returning a NULL-pointer, unless you turn off MSVC-specific language extensions. At any rate, the correct code would be m_pBuf = new( std::nothrow ) TCHAR[ m_size ];
>>>>>>>>
I changed it for the next version. Anyway malloc/free is used instead of new/delete and if it fails it will throw std::bad_alloc.
And I don't really care about out-of-memory situations, if memory is low there's not much to do anyway.
<<<<<<<<<
* [Str.cpp: line 38]: applying operator delete on an array results in undefined behaviour.;
>>>>>>>>>>
Not on MS's C++ implementation, no plans for portability anyway.
<<<<<<<<<<
* [Str.cpp: line 51]: _Alloc returns a BOOLean value. Since you are just ignoring it your code will crash at line 53 if it runs out of memory.
>>>>>>>>>>
Updated version throws std::bad_alloc.
<<<<<<<<<
* Illegal symbol names: _Alloc, _Free, _Init, _PaddedSize, _CheckSize, _StrCopy, and __STR__EX_H. According to the ISO standard of c++ all symbol names containing a double underscore or starting with an underscore immediately followed by a capital letter are reserved for use by the implementation. You may call it nitpicking, but library code has to follow just about any rule there is -- or it won't make it to be a library ever.
>>>>>>>>>>
It compiles, it works, I see no problem with it. How many C++ compiler are standards compliant? And take a look at the header files of the STL implementation from SGI.
<<<<<<<<<<
* _CheckSize is misleading: The name of the method has no indication that it is actually performing a reallocation. Why not go with the STL-naming and call it reserve instead?
>>>>>>>>>>
I'm not a native english speaker. Update it anyway.
<<<<<<<<<<
* missing exception specification: I know, few ever care about the extra cost of doing this. But hey, if this is supposed to be a library you should put in the extra effort and put as much documentation into the code as possible. At least mark the d'tor and member _Free with throw() to indicate that no exception is raised there.
>>>>>>>>>>
Updated, all methods that throw std::bad_alloc are marked with throw() in their declaration.
<<<<<<<<<<
* missing assertions: A library should use as many as necessary to enforce the correct usage of the class. Your implementation doesn't appear to have any. Does that really mean, a client cannot, under no circumstances use the class in an undesired way?
>>>>>>>>>>
Updated, though I'm not really into using assertions.
<<<<<<<<<<
* #define's: are BAD! They pollute global namespace and especially INT_SIZE and the like aren't distinct enough to minimize the probability that someone using your code hasn't #defined them before or will #define them after your code. Why not make them const objects instead? If nothing else it's at least a bit more type-safe.
>>>>>>>>>>
They are not always bad, without #defines how does MFC handle message maps?
Thanks for your comments.
soso
|
|
|
|
|
soso_pub wrote:
And I don't really care about out-of-memory situations, if memory is low there's not much to do anyway.
Maybe not much, but having the application silently crash is generally not accepted as the best choice
soso_pub wrote:
<<<<<<<<<
* [Str.cpp: line 38]: applying operator delete on an array results in undefined behaviour.;
>>>>>>>>>>
Not on MS's C++ implementation, no plans for portability anyway.
You might be surprised, but it is undefined even with MS's C++ implementation. Let me shed some light on what undefined behaviour means: The ISO Standard doesn't enforce any sort of restrictions on the code creation, i.e. any given implementation can choose freely as to what they do about this erroneous statement. MS happened to choose what you expected. But there are 2 more pieces of information that should make you feel uneasy about it: a) an implementation doesn't have to document the behaviour of choice. b) an implementation can choose to change the behaviour whenever the vendor decides to -- again, no need to document the change.
In addition, you may have more plans for portability than you expected, if you think about what portability actually means: changing the platform, which could be as subtle a change as installing a service pack for your current compiler.
soso_pub wrote:
<<<<<<<<<
* Illegal symbol names: _Alloc, _Free, _Init, _PaddedSize, _CheckSize, _StrCopy, and __STR__EX_H. According to the ISO standard of c++ all symbol names containing a double underscore or starting with an underscore immediately followed by a capital letter are reserved for use by the implementation. [...]
>>>>>>>>>>
It compiles, it works, I see no problem with it. How many C++ compiler are standards compliant? And take a look at the header files of the STL implementation from SGI.
It may just compile currently, on your machine, with your code, with your choice of libraries to use. Now if you say that this is all that is needed to make it compile in any scenario you have quite a bit of the road ahead of you.
How many C++ compilers are standards compliant? Currently, none. It's still a rather weak excuse for not obeying the rules of the language in your code, though.
I'm not sure what your reference to SGI's STL implementation is supposed to say. I assume that you found those 'illegal' symbol names there, too. That is, well, true. Don't forget though, that the STL is part of the C++ implementation and thus SGI is free to use those symbol names.
soso_pub wrote:
Updated, all methods that throw std::bad_alloc are marked with throw() in their declaration.
Damn, I wish C++ didn't screw up this detail. Declaring a method as throw() tells the client that it doesn't throw any exception. You should use throw( std::bad_alloc ) instead. If there is no exception specification explicitly declared a method can throw any exception. This one has upset a number of developers around the globe...
soso_pub wrote:
Updated, though I'm not really into using assertions.
Why? Seriously, I cannot find a valid explanation for not using assertions heavily. They don't slow down your code at all, but present vital information during debugging. Use them or shoot your own leg -- it's up to you.
soso_pub wrote:
<<<<<<<<<<
* #define's: are BAD! [...]
>>>>>>>>>>
They are not always bad, without #defines how does MFC handle message maps?
That is true. I can't think of an alternative for include guards. But in this specific case, I also cannot think of any valid reason that would justify to use a macro instead of a true constant. If you do, please elaborate.
MFC's message maps use macros, granted. The reason isn't so much that there aren't better alternatives but rather the weak compiler support for templates. MFC has been around for quite some time and back when it was designed template support was too weak to be even considered.
My statement that macros are BAD! was rather blunt, agreed. Keeping this advice in the back of my head, though, has forced me a number of times to think about alternatives, that usually turned out to be a lot more powerful and even more importantly way less dangerous. At any rate, the way you are using it could easily be called a mistake.
Anyway, I'm glad that you could use some of those comments to improve your code. Best of luck to you
.f
|
|
|
|
|
I've been also playing with string conversion. You might take a look at IoBind (http://iobind.sourceforge.net) which is a little library for conversion to/from string of objects.
Jonathan de Halleux.
|
|
|
|
|
I won't start into the "what is the best string class" discussion. Here's a little suggestion that could ease up your life:
You can use template functions to get rid of redundant functions such as Set.
#include <iostream>
#include <sstream>
using namespace std;
class A
{
public:
template<typename T>
void set(T const& t)
{
std::ostringstream output;
output<<t;
m_str = output.str();
};
std::string m_str;
};
int main(int argc, char* argv[])
{
A a;
a.set(1);
a.set(2.0f);
a.set(2.0);
a.set("test");
return 0;
}
Jonathan de Halleux.
|
|
|
|
|
God, please don't add this suggestion. Not only is ostringstream horrifically slow for this application, but why add all the extra baggage of iostreams just for type conversions.
Tim Smith
I'm going to patent thought. I have yet to see any prior art.
|
|
|
|
|
I have always been frustrated by the limitations of the Find/ReverseFind functions of the MFC CString class. What if you need to find the second occurence of a given substring ? Need to get the first one, then cut your string by using left/right/mid, then call Find/ReverseFind again. Distressing...
So why not a couple a advanced finding functions to get immediately the Nth occurence ?
MyString.FindNth(_T("ABC"), 5); // Find the 5th occurence of "ABC" in MyString
or perhaps like in the MFC CList objects:
POSITION Position = MyString.FindFirst(_T("ABC"));<br />
<br />
while (Position != NULL)<br />
{<br />
int index = MyString.FindNext((_T("ABC"), Position);<br />
}
|
|
|
|
|
Im not usually one to rush to defend MFC, but both the Find and ReverseFind functions take an offset parameter - you start from 0 to find the first string, then use this result to feed into the next Find (making sure to move onto the next position or you'll find the same string again!)
Admittedly a FindNth would be nicer, but there's definatley no need to be chopping strings
|
|
|
|
|
I already sent a new update to codeproject which supports both FindNth & ReverseFindNth. In a few days should be up.
Thanks for your suggestions, if you have others please let me know.
soso
soso
|
|
|
|
|