Download header file - 1 Kb
Shortly after VC6 came out, I was rebuilding a program into separate DLL modules, and noticed that
the sum of the sizes of these release-built DLLs was a lot higher than I
expected - almost two megabytes larger than the original single
EXE. So I went back and compared the size of the EXE that Visual
C++ 5.0 was producing versus what 6.0 was producing - the 6.0 version was much
larger, even with all the project optimizations turned on.
As it turns out, both VC5 and VC6 default to a code generation method that
does not produce the smallest code possible, even when you have that option
turned on in the project settings. In addition, VC6 and VC7 use a different section padding
value than VC5 - 4k vs. 512 bytes. This causes each section (.data, etc) to be
rounded up to the next 4k boundary, which causes excessive file bloat.
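To make the rounding concrete, here is a minimal sketch of the arithmetic. The function names and the four raw section sizes are made up for illustration; they are not taken from any real linker output.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Round `size` up to the next multiple of `alignment`.
inline std::uint32_t align_up(std::uint32_t size, std::uint32_t alignment) {
    assert(alignment > 0);
    return (size + alignment - 1) / alignment * alignment;
}

// Total bytes on disk once every section is padded to `file_alignment`.
inline std::uint32_t padded_total(const std::vector<std::uint32_t>& raw_sizes,
                                  std::uint32_t file_alignment) {
    std::uint32_t total = 0;
    for (std::uint32_t raw : raw_sizes) {
        total += align_up(raw, file_alignment);
    }
    return total;
}

// Hypothetical raw section sizes in bytes (illustrative values, not measured).
inline std::vector<std::uint32_t> example_sections() {
    return {5000, 300, 1200, 700};
}
```

With these assumed sizes, 512-byte padding gives 8192 bytes in total, while 4k padding gives 20480 bytes for the same 7200 bytes of real content. The fewer and smaller the sections, the more the 4k rounding dominates.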
The reason for the 4k rounding in VC6 seems to be the Win98 file
tuning tool - it likes programs whose sections sit on 4k boundaries, since they fit
nicely in the x86 virtual memory page. More on why this is not
a stellar idea later.
Rather than messing around with a bunch of project settings, I've provided a
nice and simple header file that you can include in your Stdafx.h (or anywhere -
this is not MFC-only stuff). It works for both VC5 and VC6, it works
marginally for VC7 (.NET), and may work
for VC4 - haven't tried it there. It only kicks in during Release builds, so
it's safe to leave in for debug builds, too.
The header tells the compiler to use certain optimization settings that
remove frame pointers from the generated code, which saves space and time. It then
has the linker merge the .data (your text strings, constants, tables, etc),
.text (where your code is), the .rdata (read-only data - consts and etc - see note),
and the .reloc (relocation data) sections into a single unit. This cuts down on space
taken by the rounding-up of these areas, and is especially noticeable with small
dlls and CPL programs. The final twist is to tell VC6 to behave like VC5 and use
512 byte padding instead of 4k, which further shrinks the output. The header
file is pretty well commented, and can be dropped into almost any program. I've
used this in both small and large applications, and it definitely helps with the
output. For example, I got a 93k exe down to 52k with this, and a control panel
applet down to 4k from 33k (see RRLoginV3
for an example of this savings). Larger programs do not seem to benefit as much
(from a filesize standpoint), since the space saved is small relative to the larger
file (40k from a 1600k exe doesn't impress many people). Loading times are
faster, however.
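For readers who want to see roughly what such a header involves, here is a rough sketch of the kind of directives described above. It is not the downloadable header itself: the file name, the AGGRESSIVE_OPTIMIZE_ACTIVE macro and the exact version checks are my assumptions for illustration. It is guarded so that it only does anything on VC5/VC6 release builds (_MSC_VER below 1300 with NDEBUG defined) and is inert everywhere else.

```cpp
// AggressiveOptimizeSketch.h - illustrative sketch only, NOT the real header.
#ifndef AGGRESSIVE_OPTIMIZE_SKETCH_H
#define AGGRESSIVE_OPTIMIZE_SKETCH_H

#if defined(_MSC_VER) && (_MSC_VER < 1300) && defined(NDEBUG)

// g = global optimizations, s = favor small code, y = omit frame pointers.
#pragma optimize("gsy", on)

// Fold the code and relocation sections into .data so that fewer
// sections have to be rounded up to the padding boundary.
#pragma comment(linker, "/MERGE:.text=.data")
#pragma comment(linker, "/MERGE:.reloc=.data")

// Opt-in only: merging .rdata can make static MFC builds larger.
#ifdef _MERGE_RDATA_
#pragma comment(linker, "/MERGE:.rdata=.data")
#endif

// Quiet linker warning 4078 (sections merged with different attributes).
#pragma comment(linker, "/IGNORE:4078")

// VC6 (_MSC_VER 1200) pads to 4k; ask for VC5-style 512-byte padding.
#if _MSC_VER >= 1200
#pragma comment(linker, "/FILEALIGN:512")
#endif

#define AGGRESSIVE_OPTIMIZE_ACTIVE 1   // hypothetical marker macro
#else
#define AGGRESSIVE_OPTIMIZE_ACTIVE 0   // debug build or other compiler: no-op
#endif

#endif // AGGRESSIVE_OPTIMIZE_SKETCH_H
```

Because everything sits behind the NDEBUG check, including a file like this from Stdafx.h leaves debug builds untouched.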
When using this header, please be sure to fully test your release builds
before shipping them. Heck, test them without the header too just to prove
that it's not causing the problem (which I doubt that it is - four years,
fifteen major projects I used it with, at least 21,000 other people using
it - no problems). Some code that works fine under Debug will break under
Release - this is due to Microsoft's optimizations, and is a known problem with
all optimizers on all platforms, and coding in general - amazing how many times
we don't check for null pointers or bad data before we use something. Debug will
zero things out for us or set them to known values (0xCC etc), whereas in Release
it's random. Generally, if you are getting continual GPFs
when using this header, either stop using it or else track down the line in your
code that is causing it.
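As a small illustration of the kind of checking meant here, the sketch below validates its input and initializes its locals explicitly, so it behaves the same in Debug and Release. The Record type and name_length function are hypothetical names invented for this example.

```cpp
#include <cstddef>

// Hypothetical record type used only for this illustration.
struct Record {
    const char* name;
    int         count;
};

// Returns the length of rec->name, or 0 when either pointer is null.
inline std::size_t name_length(const Record* rec) {
    // Check before use; do not count on the debug runtime to expose bad input.
    if (rec == nullptr || rec->name == nullptr) {
        return 0;
    }
    std::size_t len = 0;  // explicitly initialized, never left to chance
    while (rec->name[len] != '\0') {
        ++len;
    }
    return len;
}
```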
There are some tradeoffs, of course. Linking is slower due to the merging of
the data segments into one, and in general, compressing the file will not be any
better than before (this is because the empty space is removed before
compression, whereas before, it was hidden by compression). For me, these are reasonable tradeoffs, since I do not make release
builds every day, and any space saved in the exe generally means a smaller download for my
users, and definitely less memory and space used on the user's system!
Merging the .rdata with static MFC will almost always result in a larger EXE.
This also seems to affect things when you are mixing static libs (either
3rd-party or your own in-house stuff) with MFC, linking it static or not. If you
really want to merge the .rdata section with the rest, define _MERGE_RDATA_
in your project or before including the AggressiveOptimize.h header.
Why Not
The argument can be made that doing this is a waste of time, since the
"zero bytes" will be compressed out in a zip file or install archive.
Not really - it doesn't matter if the data is a string of zeroes or ones or
85858585 - it will still take room (20 bytes in a zip file, 29 bytes if only *4*
of those 4k bytes are not the same) and time to compress that data and decompress
it. Also, 20k of zeros is NOT 20k on disk - it takes up whole clusters, slop included:
on FAT32 systems, 20k can be 32k, and NTFS could make it 24k if you're just 1 byte
over (round up). Most end users do not have the dual P4 Xeon systems with two
gigs of RDram and a Raid 0+1 of Western Digital 120gig Special Editions that all
worthy developers have (all six of us), so they need all the space and LOADING
TIME savings they can get; taking an extra 32k or more out of your end user's
64megs of ram on Windows 98 is Not a Good Thing.
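The cluster figures above can be checked with a few lines of arithmetic. This is only a sketch: the function names are mine, and the 32k (FAT32) and 4k (NTFS) cluster sizes are assumptions chosen to match the numbers quoted, since real volumes vary.

```cpp
#include <cassert>
#include <cstdint>

// Bytes a file really occupies when space is handed out in whole clusters.
inline std::uint64_t size_on_disk(std::uint64_t file_bytes,
                                  std::uint64_t cluster_bytes) {
    assert(cluster_bytes > 0);
    const std::uint64_t clusters =
        (file_bytes + cluster_bytes - 1) / cluster_bytes;
    return clusters * cluster_bytes;
}

// Wasted bytes at the end of the last cluster ("cluster slop").
inline std::uint64_t cluster_slop(std::uint64_t file_bytes,
                                  std::uint64_t cluster_bytes) {
    return size_on_disk(file_bytes, cluster_bytes) - file_bytes;
}
```

With a 32k cluster, a 20k file occupies 32k (12k of slop); with a 4k cluster, a file of 20k plus one byte occupies 24k.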
.NET
With Visual Studio .NET, aka VC7, this header does not help as much as it
does under VC6 - this is because Microsoft no longer allows these options
to be set from within the source code like before. You will have to add the
switches yourself - "/ignore:4078 /RELEASE /LTCG:NOSTATUS /opt:nowin98" in
the Linker Command Line options (/opt is a linker switch), and "/GL" in the
C++ Command Line Options. You should also add "/GA" to your .EXE
project's C++ Command Line Options. These will turn off the merging
warnings, force release builds on, and perform whole code optimizations, in
addition to removing the padding. The /GA option turns on Windows
Application optimization - only do this for EXEs, not DLLs.
VC7 simply does not produce code as tight as VC6 and VC5 do. The exact
same code is padded out more, and uses bigger constructs. Benchmarking is hard
on the Uber Box, so while this VC7 code might run faster, it's hard to prove it
- perhaps it would show on a slower machine where the runtimes would be more spread out.
However, you generally want smaller code, in an effort to fit as much into the
CPU's cache as possible and keep it from hitting slower ram.
Disassembling
Interestingly, and I feel an added bonus, once the .TEXT segment has been
merged with the .DATA segment, the code can no longer be
disassembled by DUMPBIN (try it! DUMPBIN /disasm filename.exe) or WinDisasm or
any other disassembly tool that looks for the .TEXT segment; you can still hack at it with SoftICE and the like,
of course. Credit must be given to Gëzim Pani <gpani@siu.edu>
for discovering this and asking "why?".
Now, as to an explanation - as I see it. Code is supposed, by historic default, to
live within the .TEXT segment (why not .CODE? Good question). The /merge:.text=.data
line causes the .TEXT and .DATA segments to be merged into .DATA, like this (summarized):
Application (.EXE):
    C:>DUMPBIN Crc32.exe
    Dump of file Crc32.exe
    File Type: EXECUTABLE IMAGE
    Summary
        2000 .data
        1000 .rdata
        1000 .rsrc
Dynamic Link Lib (.DLL):
    C:>DUMPBIN CBase.dll
    Dump of file CBase.dll
    File Type: DLL
    Summary
        4000 .data
        4000 .rdata
        1000 .reloc
        1000 .rsrc
Neither of these files will disassemble, since there are no .text segments. Using
DUMPBIN /header on the CRC32.exe file shows two important items. First, the
entry point is at RVA 38DF (an RVA is a relative virtual address, that is, an
offset from the address where the image is loaded). Second, that address is squarely
in the 2nd segment, .DATA, which starts at virtual address 2000 and is 1CE8 long.
What this means is that if you have some
program that is hard-coded to expect raw code to be living in the .TEXT segment
(such as the disassemblers!), then it's going to fail. As a test, I tried Shrinker from Blink-Inc (this is a runtime program compressor that puts
your dlls/exes in a runtime wrapper to compress/decompress in memory
transparently to the program & user; for a freeware open source version, try
UPX), and it worked fine. So
unless something is hacking around with the .TEXT segment explicitly, there
are no problems. Since the only things that I know of that do this are code
modification tools and viruses, the question arises - does this provide any sort
of anti-virus protection? Or does it do the opposite and make it worse, by
causing the virus scanner to fail? Oddly enough, neither Symantec nor McAfee
would deem us worthy to answer our email - McAfee went as far as to demand that
we subscribe to their anti-virus service first! I personally think that this
will not cause a problem with the scanners, since they are looking
for signatures - patterns within the files, which would be independent of how
the program was put together. False positives always happen. However, this might
not be true for the viruses themselves - if they patch the first .TEXT segment, they will
fail. But if they walk the header and patch the entry point (which is how I
think they operate, but who knows these days), then this will not matter to
them. Since this is pure speculation (but with 20-some odd years of writing
code to back it up), it remains just a guess unless and until we get a response
from some anti-virus vendor.
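The entry-point observation from the DUMPBIN /header output can be sketched as a simple range check. Only the .data start (2000) and length (1CE8) and the entry point (38DF) come from the output discussed above; the .rdata and .rsrc addresses and sizes, and all the names in the code, are assumptions made up so the example has a complete table to walk.

```cpp
#include <cstdint>
#include <string>
#include <vector>

struct Section {
    std::string   name;
    std::uint32_t virtual_address;  // RVA where the section starts
    std::uint32_t virtual_size;     // length of the section in memory
};

// Returns the name of the section that contains `rva`, or "" if none does.
inline std::string section_for_rva(const std::vector<Section>& sections,
                                   std::uint32_t rva) {
    for (const Section& s : sections) {
        if (rva >= s.virtual_address &&
            rva - s.virtual_address < s.virtual_size) {
            return s.name;
        }
    }
    return std::string();
}

// Section table loosely modelled on Crc32.exe after the merge.
inline std::vector<Section> crc32_exe_sections() {
    return {
        {".rdata", 0x1000, 0x0800},  // assumed values
        {".data",  0x2000, 0x1CE8},  // values quoted in the article
        {".rsrc",  0x4000, 0x0400},  // assumed values
    };
}
```

Looking up 38DF in this table lands in .data. A tool that walks the header this way finds the code no matter what the section is called, while one that goes looking for a section named .text comes up empty.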
This article and source code are copyrighted © 1999-2002 by Todd
C. Wilson (tcw@nopcode.com). No reproduction of this article may be made
without proper clearance from the author. Free use of the source as described in
the source files is allowed, but may not be claimed as your own work. You may not
re-publish this article nor the attached files on any other web site or medium
without prior permission; you may refer to NOPcode.com
as to where to get it. You may, of course, use this in your own projects.
If you are using this in your projects or example code and would like to let the
world know, drop us a line!