Mixing SetEnvironmentVariable and getenv is asking for trouble, as we recently found to our dismay (and exasperation!). It took some serious debugging to figure out the actual problem. Let’s see if you can figure it out.
Here’s some C++/CLI code that uses getenv to read the value of an environment variable and print it to the console.
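#include <cstdlib>
#include <iostream>
#include <string>

using namespace std;
using namespace System;
using namespace System::Runtime::InteropServices;

public ref class EnvironmentVariablePrinter
{
public:
    void PrintEnv(String ^key)
    {
        // Copy the managed string into unmanaged ANSI memory for getenv.
        IntPtr ptr = Marshal::StringToHGlobalAnsi(key);
        const char *ckey = (const char *)ptr.ToPointer();
        // getenv returns NULL if the variable is not found.
        const char *cval = getenv(ckey);
        string val = cval == NULL ? "" : string(cval);
        cout << "Key is " << ckey << " and value is " << val;
        Marshal::FreeHGlobal(ptr);
    }
};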
PrintEnv merely converts the .NET string into an unmanaged one and calls getenv, printing the returned value to the console.
And here’s some C# code that tests the class.
using System;

class Test
{
    const string key = "MyKey";

    public static void Main()
    {
        Environment.SetEnvironmentVariable(key, "MyValue");
        PrintValue();
    }

    private static void PrintValue()
    {
        EnvironmentVariablePrinter er = new EnvironmentVariablePrinter();
        er.PrintEnv(key);
    }
}
The code uses System.Environment.SetEnvironmentVariable to set the value of the variable and then calls the C++/CLI code to verify that it prints the correct value. Because they are written in different languages, the two pieces of code must reside in different projects, say CPlusPlusLib.dll and ConsoleApplication.exe, with the latter referencing the former.
No surprises here - this works as expected and prints “Key is MyKey and value is MyValue”.
However, a seemingly harmless change breaks the code big time.
class Test
{
    const string key = "MyKey";

    public static void Main()
    {
        Environment.SetEnvironmentVariable(key, "MyValue");
        EnvironmentVariablePrinter er = new EnvironmentVariablePrinter();
        er.PrintEnv(key);
    }
}
All I've done is inline PrintValue, yet running this code prints “Key is MyKey and value is”: getenv is now returning NULL instead of “MyValue”.
It gets even more interesting.
class Test
{
    const string key = "MyKey";

    public static void Main()
    {
        Environment.SetEnvironmentVariable(key, "MyValue");
        // Unused local: it only makes Main reference the type.
        EnvironmentVariablePrinter er = null;
        PrintValue();
    }

    private static void PrintValue()
    {
        EnvironmentVariablePrinter er = new EnvironmentVariablePrinter();
        er.PrintEnv(key);
    }
}
This doesn't work either: the mere declaration of an EnvironmentVariablePrinter variable inside Main makes getenv return NULL for “MyKey”. This can't be good, can it?
Actually yes, because that is a valuable clue: it means the change in behavior has something to do with JITting. As long as all code that references EnvironmentVariablePrinter is in a separate method, everything works fine (in debug mode, at least). Things start going south when any such code is in Main itself.
What would the JITter do differently in the two scenarios? Load the assembly containing EnvironmentVariablePrinter (CPlusPlusLib.dll) at different times, of course. When all code referencing EnvironmentVariablePrinter is inside PrintValue, it has to load the DLL only when JITting PrintValue, whereas in the other case, it has to load it when JITting Main. JITting Main obviously occurs well before JITting PrintValue, so DLL load time (relative to other code) is one big difference between the two scenarios that JITting introduces.
Why would loading CPlusPlusLib.dll a little early make getenv return NULL?
To understand that, you first have to know how the getenv function works. Windows has APIs to set and get environment variables (SetEnvironmentVariable/GetEnvironmentVariable), and the .NET method P/Invokes into the Windows API to set and get values. getenv, on the other hand, is a CRT function, and does not delegate to the Windows API. Instead, the CRT gets all environment variables and their values when it is starting up (using the GetEnvironmentStrings Windows API), and copies them into its own data structures (MSVCR80!environ). getenv works on the copied data from then on.
Now do you see the problem? When CPlusPlusLib.dll is loaded early (when JITting Main), the CRT also gets loaded as one of its dependencies, and the startup code that copies environment variables runs right away. At that point, Main hasn't even finished JITting, let alone started running, so there’s no way our call to System.Environment.SetEnvironmentVariable could have run by that time. And when it actually runs, it’s too late: the CRT environ block was populated much earlier, and calling the Windows API SetEnvironmentVariable has no effect on the cached values. When getenv runs, it looks in the cached values and returns NULL.
It’s easy to see why it works in the first case now: CRT loading occurs when JITting PrintValue, and that occurs after our call to SetEnvironmentVariable has executed. So when the CRT calls GetEnvironmentStrings as part of startup, it gets the variable (and its value) that we just set.
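The ordering dependence is easier to see in a toy model. What follows is only a sketch in plain, portable C++, not the real CRT: ProcessEnv stands in for the environment block that the Windows APIs operate on, and CrtModel is a hypothetical class that copies that block once when it is constructed (our stand-in for the CRT being loaded) and answers lookups from the copy. The names ProcessEnv, CrtModel, lookup, setThenLoad and loadThenSet are all made up for illustration.

```cpp
#include <map>
#include <string>

// Stand-in for the process environment owned by the OS, i.e. what
// SetEnvironmentVariable / GetEnvironmentVariable read and write.
// (Assumption: a plain map is enough to illustrate the ordering issue.)
typedef std::map<std::string, std::string> ProcessEnv;

// Hypothetical model of one CRT instance. It snapshots the process
// environment at construction time ("CRT startup") and never looks at
// the live environment again.
class CrtModel
{
public:
    explicit CrtModel(const ProcessEnv &processEnv) : snapshot_(processEnv) {}

    // Models getenv: consults the snapshot only. Returns "" when the
    // key is missing, mirroring how PrintEnv treats a NULL result.
    std::string lookup(const std::string &key) const
    {
        ProcessEnv::const_iterator it = snapshot_.find(key);
        return it == snapshot_.end() ? std::string() : it->second;
    }

private:
    ProcessEnv snapshot_; // private copy, like the CRT's environ block
};

// Working case: the variable is set first, the CRT loads afterwards
// (all references to the C++/CLI type live in PrintValue).
std::string setThenLoad()
{
    ProcessEnv env;
    env["MyKey"] = "MyValue"; // Environment.SetEnvironmentVariable runs
    CrtModel crt(env);        // CRT loads and takes its snapshot
    return crt.lookup("MyKey");
}

// Failing case: the CRT loads first (Main references the type), and
// the variable is set only after the snapshot has been taken.
std::string loadThenSet()
{
    ProcessEnv env;
    CrtModel crt(env);        // snapshot taken while MyKey is absent
    env["MyKey"] = "MyValue"; // too late: the snapshot is not updated
    return crt.lookup("MyKey");
}
```

In this model, setThenLoad() returns "MyValue" while loadThenSet() returns an empty string, which matches the two behaviors we saw from getenv. The only thing that differs between the two functions is the order of the two middle lines.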
Nasty, ain’t it? The actual scenario was a lot messier: things suddenly stopped working when we linked to a DLL ported to VS2008. We actually worked the problem out backwards. We first saw that the CRT load time was different, theorized how getenv works, verified the theory by stepping through the assembly code and looking at the environ block, and once we understood the problem, figured out what was causing early loading of the CRT. Windbg was awesome for debugging this - things would have been very difficult if not for sxe ld:MSVCR90 and x MSVCR80!environ.
The fix was rather simple: in our case, we merely had to move the code that sets the environment variable so that it runs before the CRT loads. There’s another twist though: mscorwks.dll, which is the heart of the CLR, loads MSVCR80 when it loads, and you can't set your environment variables before that, not from managed code anyway. Fortunately, in our case, the getenv call is from a library that links to MSVCR90, so as long as we set the environment variable before that version of the CRT loads, we're good to go. Until the CLR gets linked to MSVCR90, anyway :).