Identically Named C++ Variable in Inner Block Hides Like Named Variable in Outer Block

David A. Gray

1 Jun 2015, CPOL

Variable scope in C++ can be devastatingly subtle, as in this function.

Introduction

Several weeks ago, when I was writing and testing a function for use in a library, I inadvertently gave a variable inside a block the same name as a variable defined in the containing block. Initially stunned, I soon realized my error, and understood why the values I saw in the watch window changed suddenly when the execution path entered the inner scope block.

Background

Since I have written more code in C than C++ in the last few months, I had grown accustomed to declaring every local variable once, at the top of the function, where the compiler emits a fatal error if a name is reused.

C, as I write it, keeps all local variables at the top of the function. C89 requires declarations to precede the executable statements of their block, and declaring the same name twice in one scope is a fatal error. Strictly speaking, C has block scope too, and an inner block may redeclare an outer name, but when every declaration sits at the top of the function, everything shares one scope and the question never arises.
C++ lets a declaration appear anywhere a statement can. Anything enclosed in braces is a block, and each block opens a nested scope that sees the names declared in every enclosing block, out to the block that delimits the function body.

However, if the outer and inner blocks both define a variable named Foo, the inner Foo hides the outer Foo from the point of its declaration until execution passes the closing brace of the inner block.
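The hiding rule can be reduced to a few lines. The following is a minimal sketch, not code from my library; the function name ShadowDemo is hypothetical, and Foo simply mirrors the name used above.

```cpp
#include <string>

// Hypothetical demonstration function; it records the value of Foo that is
// visible at three points and returns the three digits as a string.
std::string ShadowDemo ( )
{
    std::string trace ;
    int Foo = 1 ;                           // outer Foo

    trace += std::to_string ( Foo ) ;       // sees the outer Foo (1)

    {
        int Foo = 2 ;                       // new variable; hides the outer Foo
        trace += std::to_string ( Foo ) ;   // sees the inner Foo (2)
        Foo = 3 ;                           // changes only the inner Foo
    }                                       // the inner Foo ceases to exist here

    trace += std::to_string ( Foo ) ;       // outer Foo again, still 1
    return trace ;
}
```

Calling ShadowDemo ( ) returns "121": the assignment of 3 inside the braces never reaches the outer Foo.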

Demonstration of Problem and Its Solution

The first statement in the body of function FB_ReplaceW defines lngFoundPos as a long integer, and initializes it to UNICODE_STRING_MAX_CHARS (32767). Things get really interesting as execution enters the do/while block, in which the first statement defines another long integer, which it also names lngFoundPos, and assigns the value returned by function StrIndex_P6C to it. If lngFoundPos is nonzero, a new value for lngTCharsToCopy is computed, memcpy is invoked to copy a substring, and lngTCharsToCopy is added to lngOutPos, the third variable defined with function scope.

When execution reached the end of the do/while block, I had a brain teaser on my hands. Which lngFoundPos does the while clause evaluate? No fair peeking at the answer below.

C++

LPTSTR __stdcall FB_ReplaceW
(
    LPCTSTR plpStrData ,
    LPCTSTR plpToFind ,
    LPCTSTR plpToReplace ,
    PUINT   puintNewLength
)
{
#define UNICODE_STRING_MAX_CHARS    32767
#define BUFFER_BEGINNING_P6C        0
#define STRLEN_EMPTY_P6C            0
#define NONE_P6C                    0
#define STRPOS_FOUND_P6C            1
#define TRAILING_NULL_ALLOWANCE_P6C 1

    long    lngFoundPos             = UNICODE_STRING_MAX_CHARS ;
    long    lngInPos                = BUFFER_BEGINNING_P6C ;
    long    lngOutPos               = BUFFER_BEGINNING_P6C ;
    long    lngLenToRepl            = StringIsNullOrEmptyWW ( plpToReplace )
                                      ? STRLEN_EMPTY_P6C
                                      : _tcslen ( plpToReplace ) ;
    long    lngInStrLen             = _tcslen ( plpStrData ) ;
    long    lngLenToFind            = _tcslen ( plpToFind ) ;
    long    lngTCharsToCopy         = STRLEN_EMPTY_P6C ;

    do  // while ( lngFoundPos > STRLEN_EMPTY_P6C )
    {   // The loop constraint must test explicitly for nonzero.
        long lngFoundPos            = StrIndex_P6C ( ( plpStrData + ( LONG_PTR ) lngInPos ) ,
                                                     plpToFind ) ;

        //  --------------------------------------------------------------------
        //  There are two distinct conditions to evaluate.
        //
        //  1) Was substring plpToFind found?
        //  2) If so, are there intervening characters to copy?
        //
        //  if ( lngFoundPos ) covers the first test, and the second, which is
        //  skipped unless the first condition is true, is evaluated by the next
        //  statement, if ( lngFoundPos > STRPOS_FOUND_P6C ).
        //
        //  Since positions are ordinals, StrIndex_P6C returns the position of a
        //  substring as a human would see it, rather than as an offset. Hence,
        //  a returned value of +1 means that the next match was immediately
        //  found. This also means that the position where the match was found
        //  must be deducted to determine the number of intervening characters.
        //  --------------------------------------------------------------------

        if ( lngFoundPos )
        {   // Substring found.
            lngTCharsToCopy = lngFoundPos - 1 ;

            if ( lngTCharsToCopy )
            {
                //  ------------------------------------------------------------
                //  For long strings, memcpy is significantly more efficient
                //  than any string copy function, because it copies the word
                //  aligned portion of the string a machine word at a time. All
                //  string copying routines are limited to copying the text one
                //  character at a time. Hence, memcpy can be 2 to 4 times
                //  faster.
                //
                //  Computing offsets is a bit counterintuitive. Its signature
                //  indicates that source and destination are void pointers.
                //
                //      void *  __cdecl memcpy(void *, const void *, size_t);
                //
                //  Nevertheless, if your actual arguments include address math,
                //  memcpy insists on them being cast to a type with a known
                //  size (e. g., LPTSTR). The reason for this becomes clear when
                //  you view the assembly code generated by the call to memcpy,
                //  and it affects the way the offset formulas must be written.
                //
                //  The generated machine code takes into account the size of
                //  the specified type. For example, the size of a LPTSTR is the
                //  width of a TCHAR in the character set of the current trans-
                //  lation unit. While this simplifies coding the offset, it is
                //  a trap for the unwary, because the offset is multiplied by
                //  sizeof (cast) (e. g., sizeof (TCHAR)) under the hood, as
                //  demonstrated in the assembly code emitted to implement the
                //  following call to memcpy.
                //
                //      mov eax, DWORD PTR _lngTCharsToCopy$[ebp]
                //      shl eax, 1
                //      push    eax
                //      mov ecx, DWORD PTR _lngInPos$[ebp]
                //      mov edx, DWORD PTR _plpStrData$[ebp]
                //      lea eax, DWORD PTR [edx+ecx*2]
                //      push    eax
                //      mov ecx, DWORD PTR _lngOutPos$[ebp]
                //      mov edx, DWORD PTR _rlpScratchBuff$[ebp]
                //      lea eax, DWORD PTR [edx+ecx*2]
                //      push    eax
                //      call    _memcpy
                //
                //  Two instructions illustrate my point.
                //
                //      lea eax, DWORD PTR [edx+ecx*2]
                //      lea eax, DWORD PTR [edx+ecx*2]
                //
                //  In both cases, register ECX contains the offset (lngInPos
                //  and lngOutPos, respectively).
                //
                //  In contrast, the count of bytes to copy (the third argument)
                //  is always taken at face value, as shown below.
                //
                //      mov eax, DWORD PTR _lngTCharsToCopy$[ebp]
                //
                //  Hence, to copy a given number of characters from a string,
                //  the character count must be explicitly multiplied by the
                //  width of a TCHAR, (preprocessor variable TCHAR_SIZE_P6C) as
                //  shown below.
                //
                //      mov         eax, DWORD PTR _lngTCharsToCopy$[ebp]
                //      shl         eax, 1
                //
                //  The second instruction (shl eax, 1) is a very efficient way
                //  to multiply an integer of up to 1 billion or so by two.
                //
                //  The original version of this routine incorrectly assumed
                //  that memcpy returned NULL if it failed. However, since
                //  it doesn't, I eliminated the test. Hence, the associated
                //  status code, P6STRINGLIB_LSTRCPYN_ERR_P6C, is no longer
                //  used by this routine.
                //
                //  Since it predates the TcharsToBytesP6C macro, it used an
                //  error prone hard coded mathematical expression. Although
                //  technically correct as it was originally written, I replaced
                //  the expression with the macro.
                //
                //  2013/05/12 - Especially in the case of strings, although it
                //               is very fast, memcpy is a tad hazardous. At the
                //               cost of a few machine instructions and a stack
                //               frame, I am substituting SafeMemCpyTchars_WW,
                //               which guarantees that the supplied buffer has
                //               enough room to copy the string and its terminal
                //               null character.
                //
                //  2015/03/21 - Since this routine is private, and is intended
                //               for use with strings that fall far short of the
                //               4097 character capacity of the output buffer,
                //               and a key design goal is avoidance of the heap,
                //               this routine reverts to the original design,
                //               directly calling memcpy. Since it is already
                //               worked out and tested, I decided to leave the
                //               new "safe" method as a comment.
                //  -----------------------------------------------------------

                memcpy ( ( LPTSTR )  m_lpFBReplaceBuff + ( LONG_PTR ) lngOutPos ,
                         ( LPCTSTR ) plpStrData        + ( LONG_PTR ) lngInPos ,
                         TcharsToBytesP6C ( lngTCharsToCopy ) ) ;

                lngOutPos       += lngTCharsToCopy ;
            }   // if ( lngTCharsToCopy )

            if ( lngLenToRepl )
            {
                //  The replacement text is copied from its first character,
                //  so no input offset applies to plpToReplace.

                memcpy ( ( LPTSTR )  m_lpFBReplaceBuff + ( LONG_PTR ) lngOutPos ,
                         ( LPCTSTR ) plpToReplace ,
                         TcharsToBytesP6C ( lngLenToRepl ) ) ;

                lngOutPos       += lngLenToRepl ;
            }   // if ( lngLenToRepl )

            lngInPos            =   lngInPos
                                  + lngFoundPos
                                  + lngLenToFind
                                  - TRAILING_NULL_ALLOWANCE_P6C ;
        }   // TRUE block, if ( lngFoundPos )
        else
        {
            lngTCharsToCopy     = lngInStrLen != lngInPos
                                  ? lngInStrLen - lngInPos
                                  : NONE_P6C ;

            if ( lngTCharsToCopy )
            {
                memcpy ( ( LPTSTR )  m_lpFBReplaceBuff + ( LONG_PTR ) lngOutPos ,
                         ( LPCTSTR ) plpStrData        + ( LONG_PTR ) lngInPos ,
                         TcharsToBytesP6C ( lngTCharsToCopy ) ) ;
            }   // if ( lngTCharsToCopy )
        }   // FALSE block, if ( lngFoundPos )
    }   while ( lngFoundPos > STRLEN_EMPTY_P6C ) ;

    return m_lpFBReplaceBuff ;
}   // FB_ReplaceW

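I discovered the hard way that the while clause evaluates the outer variable, because the inner block, and the inner lngFoundPos with it, ends at the closing brace that precedes the while keyword. The result was an infinite loop, because the outer lngFoundPos is initialized to UNICODE_STRING_MAX_CHARS, and never changes thereafter.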
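The runaway loop can be reproduced safely in miniature. This is a sketch under stated assumptions rather than an excerpt from FB_ReplaceW: the function name CountPassesWithShadow and the maxPasses safety valve are hypothetical additions, and the inner variable is simply set to zero to stand in for a "not found" result.

```cpp
// Hypothetical reduction of the bug: returns the number of passes made by a
// do/while loop whose body declares a variable with the same name as the one
// that the while clause tests.
int CountPassesWithShadow ( int maxPasses )
{
    long lngFoundPos = 32767 ;          // outer variable, as in FB_ReplaceW
    int  passes      = 0 ;

    do
    {
        long lngFoundPos = 0 ;          // inner variable: "substring not found"
        ( void ) lngFoundPos ;          // silences the unused-variable warning
        ++passes ;

        if ( passes >= maxPasses )
            break ;                     // safety valve; the real code had none
    }   while ( lngFoundPos > 0 ) ;     // tests the OUTER variable, still 32767

    return passes ;
}
```

CountPassesWithShadow ( 1000 ) returns 1000. Although the inner variable reports "not found" on the very first pass, only the safety valve ever stops the loop.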

The solution was obvious and simple. The first statement in the buggy block was as follows.

C++

long lngFoundPos            = StrIndex_P6C ( ( plpStrData + ( LONG_PTR ) lngInPos ) ,
                                             plpToFind ) ;

Eliminating the first keyword (long) turns the declaration into an assignment to the original lngFoundPos, so the while clause sees the updated value, and the loop stops when it should, rather than running off into deep space (high memory, actually). In the final version, I also replaced StrIndex_P6C with _tcsstr, which returns a pointer (hence the new name, lpFoundPos), consolidated the assignment into the if statement that followed it in the original code, simplified the while expression, and initialized lpFoundPos to NULL (zero). The working loop looks like this.

C++

    LPTSTR  lpFoundPos          = NULL ;

    ...

    if ( lpFoundPos = _tcsstr ( lpInPos , plpToFind ) )

        ...

    }   while ( lpFoundPos ) ;
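Since the excerpt above elides most of the loop body, here is a self-contained sketch of the same loop shape. It is not the library routine: the name ReplaceAllSketch is hypothetical, std::wstring stands in for the fixed output buffer, and std::wcsstr stands in for _tcsstr (to which _tcsstr maps in a Unicode build).

```cpp
#include <cwchar>
#include <string>

// Hypothetical stand-in for the corrected loop. lpFoundPos is declared once,
// outside the do/while block, so the while clause tests the value that the
// body assigned on the current pass.
std::wstring ReplaceAllSketch ( const wchar_t * plpStrData ,
                                const wchar_t * plpToFind ,
                                const wchar_t * plpToReplace )
{
    std::wstring        strOut ;
    const std::size_t   lenToFind   = std::wcslen ( plpToFind ) ;
    const wchar_t *     lpInPos     = plpStrData ;
    const wchar_t *     lpFoundPos  = nullptr ;

    if ( lenToFind == 0 )
        return plpStrData ;     // nothing to search for; return a copy

    do
    {
        // Assignment, not declaration: no second lpFoundPos is created.
        if ( ( lpFoundPos = std::wcsstr ( lpInPos , plpToFind ) ) != nullptr )
        {
            strOut.append ( lpInPos , lpFoundPos ) ;    // intervening characters
            strOut.append ( plpToReplace ) ;            // replacement text
            lpInPos = lpFoundPos + lenToFind ;          // resume after the match
        }
        else
        {
            strOut.append ( lpInPos ) ;                 // remainder of the input
        }
    }   while ( lpFoundPos ) ;  // null after the last search, so the loop ends

    return strOut ;
}
```

For example, ReplaceAllSketch ( L"a-b-c" , L"-" , L"+" ) returns L"a+b+c", and the loop makes exactly three passes: two that find a match and one that copies the remainder.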

Points of Interest

Although the role of braces as scope boundary markers is familiar to me, because other languages that borrowed heavily from C++ exhibit similar behavior, the example made crystal clear that the braces form a wall around the code that they enclose. A variable defined inside the block doesn't exist until execution reaches its declaration, and it ceases to exist the instant execution passes the closing brace. Since the while clause lies outside the braces, it can't see any variable that was defined inside them; when a like named variable exists in the enclosing scope, the while clause sees that one instead. Technically, they are two different variables.
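The order of events at the closing brace can be observed directly by giving the inner variable a destructor. The sketch below is illustrative only; Tracer and LifetimeDemo are hypothetical names that do not appear in my library.

```cpp
#include <string>

// Hypothetical helper that records its own construction and destruction.
struct Tracer
{
    std::string & log ;

    explicit Tracer ( std::string & target ) : log ( target ) { log += "ctor " ; }
    ~Tracer ( ) { log += "dtor " ; }
} ;

// Runs a two-pass do/while loop and returns the recorded sequence of events.
std::string LifetimeDemo ( )
{
    std::string log ;
    int         passes = 0 ;

    do
    {
        Tracer inner ( log ) ;      // created anew on every pass
        ++passes ;
    }   while ( ( log += "test " , passes < 2 ) ) ;    // inner is already gone

    return log ;
}
```

LifetimeDemo ( ) returns "ctor dtor test ctor dtor test ". On each pass, the destructor runs before the while clause is evaluated, which confirms that nothing declared inside the braces survives long enough for the loop test to see it.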

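Numerous other languages treat a block as a scope boundary, although they differ on whether an inner declaration may reuse an outer name. I know of the following languages, and I am certain that this list is far from exhaustive.

C# (block scope, but declaring a local in a nested block with the name of an enclosing local is a compile-time error)
Perl (a my variable declared in an inner block hides a like named outer variable, as in C++)
Java (block scope, but redeclaring the name of an enclosing local variable is a compile-time error)
JavaScript (let and const are block scoped and can hide outer names; var is function scoped)

Not every popular language works this way. Python and PHP don't give a block its own scope, so a name assigned inside a loop or if body is the same variable outside it. Pascal declares variables per routine rather than per block, so hiding arises only between nested routines.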

History

Monday, 01 June 2015, Initial Publication

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)