Aggressive Optimizations for Visual C++

Todd C. Wilson

26 Oct 2002

Save time and space in your release builds - fight bloatware!

Article:

Shortly after VC6 came out, I was rebuilding a program into separate DLL modules and noticed that the combined size of the release-built DLLs was a lot higher than I expected - almost two megabytes larger than the original single EXE. So I went back and compared the size of the EXE that Visual C++ 5.0 was producing versus what 6.0 was producing. The 6.0 version was much larger, even with all the project optimizations turned on.

As it turns out, both VC5 and VC6 default to a code generation method that does not produce the smallest code possible, even when you have that option turned on in the project settings. In addition, VC6 and VC7 use a different padding value than VC5 - 4k vs. 512 bytes. This causes each section (.data, etc.) to be rounded up to the next 4k boundary, which bloats the file.
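To make the padding arithmetic concrete, here is a minimal sketch that rounds a handful of section sizes up to a 512-byte and a 4k boundary and totals them. The section sizes in `exampleSections` are hypothetical values chosen for illustration, not measurements from any real build, and the function names are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Round `size` up to the next multiple of `alignment`.
inline std::size_t alignUp(std::size_t size, std::size_t alignment) {
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

// Total on-disk size of a set of sections once each one has been
// padded out to the given file alignment.
inline std::size_t paddedTotal(const std::vector<std::size_t>& rawSizes,
                               std::size_t alignment) {
    std::size_t total = 0;
    for (std::size_t raw : rawSizes) {
        total += alignUp(raw, alignment);
    }
    return total;
}

// Hypothetical raw section sizes in bytes (illustration only).
inline std::vector<std::size_t> exampleSections() {
    return {600, 5000, 100, 1300};
}
```

With these made-up sizes, `paddedTotal(exampleSections(), 512)` comes to 8192 bytes, while `paddedTotal(exampleSections(), 4096)` comes to 20480 bytes. The smaller the sections, the larger the share of the file that is pure padding.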

The reason for the 4k rounding in VC6 seems to be because of the Win98 file tuning tool - it likes programs that are on the 4k boundary, since they fit nicely in the x86 virtual memory page. More on why this is not a stellar idea later.

Rather than messing around with a bunch of project settings, I've provided a nice and simple header file that you can include in your Stdafx.h (or anywhere - this is not MFC-only stuff). It works for both VC5 and VC6, it works marginally for VC7 (.NET), and may work for VC4 - haven't tried it there. It only kicks in during Release builds, so it's safe to leave in for debug builds, too.

The header tells the compiler to use optimization settings that remove frame pointers from the generated code, which saves space and time. It then has the linker merge the .data section (your text strings, constants, tables, etc.), the .text section (where your code is), the .rdata section (read-only data - consts, etc. - see note), and the .reloc section (relocation data) into a single unit. This cuts down on the space taken by the rounding-up of these areas, and is especially noticeable with small DLLs and CPL programs. The final twist is to tell VC6 to behave like VC5 and use 512-byte padding instead of 4k, which further shrinks the output.

The header file is pretty well commented, and can be dropped into almost any program. I've used this in both small and large applications, and it definitely helps with the output. For example, I got a 93k exe down to 52k with this, and a control panel applet down to 4k from 33k (see RRLoginV3 for an example of this savings). Larger programs do not seem to benefit as much from a file size standpoint, since the space saved is small relative to the file (40k from a 1600k exe doesn't impress many people). Loading times are faster, however.
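The settings described above could be expressed along the following lines. This is a rough sketch only, not the contents of the actual AggressiveOptimize.h. The file name, the `AGGRESSIVE_OPTIMIZE_SKETCH_ACTIVE` macro and the helper function are invented for illustration. Restricting the pragmas to pre-VC7 compilers (`_MSC_VER < 1300`) is a simplifying assumption, as is using `/opt:nowin98` to get the 512-byte padding on VC6.

```cpp
// AggressiveOptimizeSketch.h (hypothetical name): illustration only,
// not the author's actual header.
#ifndef AGGRESSIVE_OPTIMIZE_SKETCH_H
#define AGGRESSIVE_OPTIMIZE_SKETCH_H

// Assumption: only VC5/VC6-era Microsoft compilers in a release build
// (NDEBUG defined) get the settings; every other build is left alone.
#if defined(_MSC_VER) && (_MSC_VER < 1300) && defined(NDEBUG)

// g = global optimizations, s = favor small code, y = omit frame pointers
#pragma optimize("gsy", on)

// Fold the code and relocation sections into .data
#pragma comment(linker, "/merge:.text=.data")
#pragma comment(linker, "/merge:.reloc=.data")

// Merging .rdata is opt-in, since it can make MFC builds larger
#ifdef _MERGE_RDATA_
#pragma comment(linker, "/merge:.rdata=.data")
#endif

// VC6 only (_MSC_VER 1200): go back to the 512-byte padding VC5 used
#if _MSC_VER >= 1200
#pragma comment(linker, "/opt:nowin98")
#endif

#define AGGRESSIVE_OPTIMIZE_SKETCH_ACTIVE 1
#else
#define AGGRESSIVE_OPTIMIZE_SKETCH_ACTIVE 0
#endif

// Lets calling code (or a test) see whether the settings were applied.
inline bool aggressiveOptimizeSketchActive() {
    return AGGRESSIVE_OPTIMIZE_SKETCH_ACTIVE != 0;
}

#endif // AGGRESSIVE_OPTIMIZE_SKETCH_H
```

Because everything sits behind the `NDEBUG` check, including such a file from Stdafx.h leaves debug builds untouched, which matches the behavior described for the real header.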

When using this header, please be sure to fully test your release builds before shipping them. Heck, test them without the header too, just to prove that it's not causing the problem (which I doubt that it is - four years, fifteen major projects I used it with, at least 21,000 other people using it - no problems). Some code that works fine under Debug will break under Release. This is due to Microsoft's optimizations, and is a known problem with all optimizers on all platforms, and with coding in general - amazing how many times we don't check for null pointers or bad data before we use something. Debug will zero things out for us or set them to known values (0xCC etc.), whereas in Release the contents are random. Generally, if you are getting continual GPFs when using this header, either stop using it or track down the line in your code that is causing it.
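As a small illustration of the kind of defensive code that behaves the same in Debug and Release, the sketch below gives every member a definite value and checks pointers before using them. `LoginInfo` and `userNameLength` are hypothetical names invented for this example.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical record type used only for this illustration.
struct LoginInfo {
    const char* userName;
    int retryCount;

    // Give every member a definite value rather than relying on
    // whatever a debug build happens to leave in memory.
    LoginInfo() : userName(nullptr), retryCount(0) {}
};

// Returns the length of the user name, or 0 when the record or the
// name is missing, instead of dereferencing a bad pointer.
inline std::size_t userNameLength(const LoginInfo* info) {
    if (info == nullptr || info->userName == nullptr) {
        return 0;
    }
    return std::strlen(info->userName);
}
```

Code written this way does not depend on what an uninitialized variable happens to contain, so turning the optimizer on should not change its behavior.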

There are some tradeoffs, of course. Linking is slower due to the merging of the data segments into one, and in general, compressing the file will not be any better than before (this is because the empty space is removed before compression, whereas before, it was hidden by compression). For me, these are reasonable tradeoffs, since I do not make release builds every day, and any space saved in the exe generally means a smaller download for my users, and definitely less memory and space used on the user's system!

Notes:

Merging the .rdata section with static MFC will almost always result in a larger EXE. This also seems to affect things when you are mixing static libs (either 3rd-party or your own in-house stuff) with MFC, whether you link MFC statically or not. If you really want to merge the .rdata section with the rest, define _MERGE_RDATA_ in your project or before including the AggressiveOptimize.h header.

Why Not

The argument can be made that doing this is a waste of time, since the "zero bytes" will be compressed out in a zip file or install archive. Not really - it doesn't matter if the data is a string of zeroes or ones or 85858585 - it will still take room (20 bytes in a zip file, 29 bytes if only 4 of the 4k bytes differ) and time to compress that data and decompress it. Also, 20k of zeros is NOT 20k on disk - it's rounded up to the cluster size. On FAT32 systems, 20k can take 32k, and NTFS could make it 24k if you're just 1 byte over (round up). Most end users do not have the dual P4 Xeon systems with two gigs of RDRAM and a RAID 0+1 of Western Digital 120-gig Special Editions that all worthy developers have (all six of us), so they need any space and LOADING TIME savings they can get; taking an extra 32k or more out of your end user's 64 megs of RAM on Windows 98 is Not a Good Thing.
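The cluster rounding mentioned above can be checked with a few lines of arithmetic. This sketch assumes 32k clusters for the FAT32 case and 4k clusters for the NTFS case, which are the sizes implied by the figures in the text. Actual cluster sizes depend on how the volume was formatted, and the function names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>

// Bytes a file occupies on disk when space is allocated in whole clusters.
inline std::size_t sizeOnDisk(std::size_t fileBytes, std::size_t clusterBytes) {
    assert(clusterBytes > 0);
    std::size_t clusters = (fileBytes + clusterBytes - 1) / clusterBytes;
    return clusters * clusterBytes;
}

// Slack: allocated bytes that hold no file data.
inline std::size_t clusterSlack(std::size_t fileBytes, std::size_t clusterBytes) {
    return sizeOnDisk(fileBytes, clusterBytes) - fileBytes;
}
```

Under those assumptions, `sizeOnDisk(20 * 1024, 32 * 1024)` is 32768 bytes (the 32k case), and `sizeOnDisk(20 * 1024 + 1, 4096)` is 24576 bytes (the 24k case), with 4095 bytes of slack.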

.NET

With Visual Studio .NET, aka VC7, this header does not help as much as it does under VC6 - this is because Microsoft is not allowing the compiler options to be set from within the source code like before. You will have to add the switches yourself: "/ignore:4078 /RELEASE /LTCG:NOSTATUS /opt:nowin98" in the Linker Command Line options (/opt:nowin98 is a linker switch, so it belongs here rather than with the compiler options), and "/GL" in the C++ Command Line Options. You should also add "/GA" to your .EXE project's C++ Command Line Options. These will turn off the merging warnings, force release builds on, and perform whole-program optimizations, in addition to removing the padding. The /GA option turns on Windows Application optimization - only do this for EXEs, not DLLs.

VC7 simply does not produce code as tight as VC6 and VC5 do. The exact same code is padded out more, and uses bigger constructs. Benchmarking is hard on the Uber Box, so while this VC7 code might run faster, it's hard to prove it - perhaps it would show on a slower machine, where the runtimes would be more spread out. However, you generally want smaller code, in an effort to fit as much as possible into the CPU's cache and keep it from hitting slower RAM.

Disassembling

Interestingly, and I feel, an added bonus: once the .TEXT segments have been merged with the .DATA segments, the code will no longer be able to be disassembled by DUMPBIN (try it! DUMPBIN /disasm filename.exe) or WinDisasm or any other disassembly tool; you can still hack at it with SoftICE and the like, of course. Credit must be given to Gëzim Pani <gpani@siu.edu> for discovering this and asking "why?".

Now, as to an explanation - as I see it. Code is supposed, by historic default, to live within the .TEXT segment (why not .CODE? Good question). The /merge:.text=.data line causes the .TEXT and .DATA segments to be merged into .DATA, like this (summarized):

Application (.EXE)

C:>DUMPBIN Crc32.exe
    Dump of file Crc32.exe
    File Type: EXECUTABLE IMAGE
        Summary
             2000 .data
             1000 .rdata
             1000 .rsrc

Dynamic Link Lib (.DLL)

C:>DUMPBIN CBase.dll
    Dump of file CBase.dll
    File Type: DLL
        Summary
             4000 .data
             4000 .rdata
             1000 .reloc
             1000 .rsrc

Neither of these files will disassemble, since there are no .text segments. Using DUMPBIN /header on the Crc32.exe file shows two important items. First, the entry point is at RVA 38DF (an RVA is a relative virtual address, an offset from the address at which the image is loaded). Second, that address is squarely in the 2nd segment, .DATA, which starts at virtual address 2000 and is 1CE8 long. What this means is that if you have some program that is hard-coded to expect raw code to be living in the .TEXT segment (such as the disassemblers!), then it's going to fail.

As a test, I tried Shrinker from Blink-Inc (this is a runtime program compressor that puts your DLLs/EXEs in a runtime wrapper to compress/decompress in memory, transparently to the program and user; for a freeware open source version, try UPX), and it worked fine. So unless something is hacking around with the .TEXT segment explicitly, there are no problems.

Since the only things that I know of that do this are code modification tools and viruses, the question arises - does this provide any sort of anti-virus protection? Or does it do the opposite and make things worse, by causing the virus scanner to fail? Oddly enough, neither Symantec nor McAfee deemed us worthy of an answer to our email - McAfee went as far as to demand that we subscribe to their anti-virus service first! I personally think that this will not cause a problem with the scanners, since they are looking for signatures - patterns within the files, which would be independent of how the program was put together. False positives always happen. However, this might not be true for the viruses - if they patch the first .TEXT segment, they will fail. But if they walk the header and patch the entry point (which is how I think they operate, but who knows these days), then this will matter not to them. Since this is pure speculation (but with 20-some odd years of writing code to back it up), it remains just a guess until and unless we get a response from some anti-virus vendor.
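The "which segment holds the entry point" check described above can be sketched as a simple range lookup. The `Section` struct is a simplified stand-in invented for this example (a real tool would read the IMAGE_SECTION_HEADER entries from the file), and only the .data start address, its length, and the entry point RVA come from the text.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-in for one PE section header entry (hypothetical type).
struct Section {
    std::string name;
    std::uint32_t virtualAddress;  // RVA where the section starts
    std::uint32_t virtualSize;     // length of the section in memory
};

// Returns the name of the section whose address range contains `rva`,
// or an empty string when no listed section contains it.
inline std::string sectionContaining(const std::vector<Section>& sections,
                                     std::uint32_t rva) {
    for (const Section& s : sections) {
        if (rva >= s.virtualAddress && rva - s.virtualAddress < s.virtualSize) {
            return s.name;
        }
    }
    return std::string();
}

// The one section whose numbers are quoted for Crc32.exe:
// .data starts at 2000 and is 1CE8 long.
inline std::vector<Section> crc32Sections() {
    return { {".data", 0x2000, 0x1CE8} };
}
```

Looking up the quoted entry point, `sectionContaining(crc32Sections(), 0x38DF)`, yields ".data". A tool that only looks for a section named .text never gets that far, which is consistent with the disassemblers giving up.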

This article and source code are copyrighted © 1999-2002 by Todd C. Wilson (tcw@nopcode.com). No reproduction of this article may be made without proper clearance from the author. Free use of the source as described in the source files is allowed, but may not be claimed as your own work. You may not re-publish this article nor the attached files on any other web site or medium without prior permission; you may refer to NOPcode.com as to where to get it. You may, of course, use this in your own projects. If you are using this in your projects or example code and would like to let the world know, drop us a line!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.
