Introduction
This article presents a JavaScript compression tool that takes your JavaScript source code and compresses it by removing all comments and extraneous whitespace, optionally removing as many line feeds as possible, and optionally shortening function parameter and variable names. This reduces the script size, which may help your pages load faster and reduce bandwidth consumption. A minor side benefit when line feed removal and variable name compression are enabled is that the output is lightly obfuscated, making it harder for the casual user to read and/or play around with it. It won't stop a determined user from reformatting and reverse engineering it, but that is not the intent of this tool.
I developed this tool for use in my own ASP.NET projects. The code is written in C#, but as long as you have the .NET Framework installed, it can be used to compress JavaScript for any web project, .NET or otherwise. The supplied project file is for Visual Studio 2003, but it can be opened, converted, and successfully compiled under Visual Studio 2005 as well.
There are three levels of compression:
- No Line Feed Removal
Line feeds are not removed from the script (except those deemed extraneous, such as on blank lines). Only comments and extraneous whitespace are removed. This mode provides good compression and ensures that no code is broken.
- Line Feeds Removed Wherever Possible
In this mode, line feeds are removed from the ends of statements wherever it is determined safe to do so, usually resulting in an extra 2% to 5% compression. For example, lines ending in an operator such as *, /, +, -, etc., and those ending in a semi-colon will have any trailing line feeds removed. Several other conditions can also result in removal, and these are described in the code description sections below. Steps are also taken to prevent removal in cases such as missing semi-colons, so as not to break code. However, I may not have caught all such conditions, so if this mode breaks your code, you can fall back to the previous mode. This mode achieves its best results when you are diligent about putting semi-colons after all statements that can use them, to properly mark their endpoints.
- Function Parameter and Variable Name Compression
This can be combined with one of the first two compression options to further reduce the script size. When enabled, as many function parameter and variable names as possible will be renamed and shortened. The naming scheme starts with the names a through z, then _a through _z, _aa through _az, _ba through _bz, etc. With this option enabled, script size can usually be reduced by an additional 10% to 15%. There may be a higher potential for broken code with this option, so it is not enabled by default. If you enable it, thoroughly test all compressed scripts before deploying them.
Code blocks can also be surrounded by special //#pragma NoCompStart and //#pragma NoCompEnd comments to exclude sections from compression. This is useful for including copyright notices in the headers of compressed script files or for skipping sections that you are testing. For example:
//#pragma NoCompStart
function Test()
{
    return true;
}
//#pragma NoCompEnd
The #pragma comments should appear on lines by themselves, and they will be removed from the final compressed script. Any trailing comment text on the same line as a #pragma is ignored and will be removed as well. The compressor doesn't care about spacing or case in the #pragma statements either.
The Programs
Two versions of the program are provided. The first is an interactive version that you can use to test the different modes of compression. It is a Windows Forms application written in C#. After running it, simply paste your JavaScript code into the Original Script text box, turn the Line Feed Removal and Variable Name Compression options on or off, and click the Compress button. The compressed script is then shown in the Compressed Script text box, with some compression statistics displayed below it. The text can be copied to the clipboard from the Compressed Script text box.
Note that when using the Test only variable name compression option, the script code is not compressed. Only parameter and variable names are compressed. This may help locate a problem with the variable name compression code. Although the script code is not compressed, comments are still removed so that the names chosen match those of a full compression (i.e., it won't skip a new name just because a word such as "a", "be", or "to" appears in a comment).
The second and most useful tool is a console mode version of the compressor that can be used as the command for a pre-build step in ASP.NET projects to compress scripts in the project. It can also be used to compress scripts that are stored in custom web controls as embedded resources. The command line syntax is shown below. Options and file specs are case-insensitive, and are processed from left to right as encountered.
JSCompressCL [/options] filespec [[/options] filespec ...]
The available command line options are as follows:
Option | Description
/? | Show help.
/q | Quiet mode. Don't display compression statistics.
/debug | Debug build. Compression is suppressed, and scripts are passed through to the output folder unmodified to make debugging easier. Compression can be forced using the /f option.
/release | Release build. Compression is enabled (the default if no build option is specified).
/k | (Keep) No line feeds are removed unless they are extraneous (i.e., blank lines).
/d | (Delete) Line feeds are removed wherever possible (the default if no line feed removal option is specified).
/v | Compress variable and parameter names.
/t | Variable name compression only (for testing it). Comments are stripped as well, but all other compression options are ignored.
/f | Force compression of processed files in debug builds. Useful for testing compressed scripts in debug builds.
/r | Recurse sub-folders in the file spec too. The sub-folder structure is duplicated in the output folder.
/o:<dir> | Specify the output folder (the current folder if not specified).
filespec | One or more files to compress. Wildcards are accepted.
The debug and release build options are spelled out to make it easy to specify them in a project's pre-build step using one of the IDE macros. This is described below.
At a minimum, you should specify an output folder other than the one in which the scripts to compress reside. For example, you may want to store the uncompressed scripts in a folder called ScriptsDev and tell the compressor to store the compressed scripts in a folder called Scripts that the application will use at runtime. The compressor will not overwrite the source scripts. On debug builds, it also checks for an existing copy of each script in the output folder and, if that copy's timestamp is greater than or equal to the source script's, it skips the script. This avoids recreating unchanged script files each time the project is built during debugging. An "up to date" message is displayed in such cases. Scripts are always processed in release builds to ensure that they are up to date and compressed.
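The debug-build check described above amounts to comparing two file timestamps. Here is a minimal sketch, assuming that last-write times are what get compared. UpToDateCheck, IsUpToDate, and ShouldProcess are hypothetical names and are not part of JSCompressCL:

```csharp
using System;
using System.IO;

// Hypothetical helper illustrating the debug-build "up to date" check.
public static class UpToDateCheck
{
    // True when the output copy exists and its last-write time is greater than
    // or equal to the source script's, so the script can be skipped.
    public static bool IsUpToDate(string sourcePath, string outputPath)
    {
        if (!File.Exists(outputPath))
            return false;

        DateTime sourceTime = File.GetLastWriteTimeUtc(sourcePath);
        DateTime outputTime = File.GetLastWriteTimeUtc(outputPath);
        return outputTime >= sourceTime;
    }

    // Release builds always reprocess; debug builds skip unchanged scripts.
    public static bool ShouldProcess(string sourcePath, string outputPath, bool isDebugBuild)
    {
        return !isDebugBuild || !IsUpToDate(sourcePath, outputPath);
    }
}
```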
If a script is compressed, the tool displays the source and destination filenames along with the compression statistics. The /q command line option can be used to turn them off. Some examples are shown below (lines wrapped for display purposes):
Implied release build with line feed removal,
no stats displayed.
JSCompressCL /q /o:\MyProj\Scripts
\MyProj\ScriptsDev\*.js
Explicit release build with line feed removal,
stats are displayed.
JSCompressCL /release /o:\MyProj\Scripts
\MyProj\ScriptsDev\*.js
Line feed removal disabled for first file set, line feed
removal and variable name compression enabled for second file set.
JSCompressCL /o:\MyProj\Scripts
/k \MyProj\ScriptsDev1\*.js
/d /v \MyProj\ScriptsDev2\*.js
Debug build, no compression. Scripts are passed
through unmodified for debugging purposes.
JSCompressCL /Debug /o:\MyProj\Scripts
\MyProj\ScriptsDev\*.js
Debug build with forced compression. Scripts are
compressed even though it's a debug build.
JSCompressCL /Debug /f /o:\MyProj\Scripts
\MyProj\ScriptsDev\*.js
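Because options are processed from left to right, you can think of each filespec as taking a snapshot of the settings in effect when the parser reaches it. This is how the third example above applies /k to one file set and /d /v to the next. The sketch below illustrates the idea for /k, /d, /v, and /o: only. The types are hypothetical, and the sketch assumes (the original does not confirm this) that /v stays on once given:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical snapshot of the settings that apply to one filespec.
public sealed class FileSetSettings
{
    public string FileSpec = "";
    public bool RemoveLineFeeds = true;   // /d is the default; /k turns it off
    public bool CompressVariables;        // /v
    public string OutputFolder = ".";     // /o:<dir>
}

public static class LeftToRightArgs
{
    public static List<FileSetSettings> Parse(string[] args)
    {
        var result = new List<FileSetSettings>();
        bool removeLineFeeds = true, compressVariables = false;
        string output = ".";

        foreach (string arg in args)
        {
            string lower = arg.ToLowerInvariant();   // options are case-insensitive
            if (lower == "/k") removeLineFeeds = false;
            else if (lower == "/d") removeLineFeeds = true;
            else if (lower == "/v") compressVariables = true;
            else if (lower.StartsWith("/o:", StringComparison.Ordinal)) output = arg.Substring(3);
            else if (!arg.StartsWith("/", StringComparison.Ordinal))
            {
                // A filespec: record the settings in effect at this point.
                result.Add(new FileSetSettings
                {
                    FileSpec = arg,
                    RemoveLineFeeds = removeLineFeeds,
                    CompressVariables = compressVariables,
                    OutputFolder = output
                });
            }
            // Other options (/q, /f, /r, ...) are ignored in this sketch.
        }
        return result;
    }
}
```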
Using the Console Version as a Project's Pre-Build Step
Copy the console version of the application to a folder somewhere on your PC. To use the console version as the pre-build step of a web project, create a folder to contain the uncompressed scripts (ScriptsDev, for example), and another to contain the compressed scripts to be used at runtime by the application (Scripts, for example). To create a new folder in the project, right click on the project name, select Add..., select New Folder, and enter the folder name. Add a new script to the folder by right clicking on it and selecting Add... and then Add New Item... to create a new item, or Add Existing Item... if you copied an existing file to the new folder. Once the script is added to the project folder, right click on it and select Properties. Change the Build Action property from Content to None for the scripts in the development (uncompressed) folder. If you want, you can add copies of the scripts in the compressed folder and leave their build action set to Content.
The next step is to right click on the project name, select Properties, expand the Common Properties folder, and select the Build Events sub-item. Click in the Pre-build Event Command Line option to enter the command line to run. You can click the "..." button to open a dialog with a larger editor and a list of available macros. Below is an example of a common command line that can be used (lines wrapped for display purposes). Replace the path to the tool with the path where you stored it on your PC.
D:\Utils\JSCompressCL /$(ConfigurationName)
/o:$(ProjectDir)Scripts $(ProjectDir)ScriptsDev\*.js
The /$(ConfigurationName) option expands to the configuration name in effect at the time of the build. Assuming the defaults, this will equate to either /Debug or /Release. Compression is therefore turned off for debug builds, so that you can test and debug your scripts, and turned on for release builds. Note that the command line processor looks for an entry starting with "Debug" or "Release", so you can use custom configuration names: as long as they start with either of those two keywords, the appropriate build type is selected. If the configuration name contains spaces, place quote marks around the option. As noted, in debug builds, scripts are passed through to the destination folder as-is to make debugging easier. If you want the scripts compressed in debug builds, add the /f command line option to force compression.
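The prefix match described above might look like the following sketch. BuildType and BuildTypeOption are hypothetical names. The source of the command line processor is not shown in this article, so treat this as an illustration of the rule rather than the tool's actual code:

```csharp
using System;

public enum BuildType { Debug, Release }

public static class BuildTypeOption
{
    // Hypothetical: maps "/Debug", "/release", "/Debug Minified", etc. to a
    // build type by checking only the start of the name, ignoring case.
    // Returns null for arguments that are not build options.
    public static BuildType? TryParse(string arg)
    {
        if (arg.Length < 2 || arg[0] != '/')
            return null;

        string name = arg.Substring(1);
        if (name.StartsWith("Debug", StringComparison.OrdinalIgnoreCase))
            return BuildType.Debug;
        if (name.StartsWith("Release", StringComparison.OrdinalIgnoreCase))
            return BuildType.Release;
        return null;
    }
}
```

A quoted argument such as "/Debug Minified" reaches the program as a single argument, so the prefix check still selects the debug behavior.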
The /o:$(ProjectDir)Scripts option specifies the compressed script folder. For my projects, it is always a subfolder of the main project folder, hence the use of the $(ProjectDir) macro. Modify the path name accordingly for your own projects.
The same applies to the $(ProjectDir)ScriptsDev\*.js option, which tells the tool where to find the scripts that need to be compressed. As above, modify the path name accordingly for your own projects.
Compressing Scripts that are Embedded Resources
If you are developing a web control, for example, that uses scripts that are contained in the assembly as embedded resources, you can still compress them using the above steps. The only difference is that, when setting up the folders as described above, make an initial copy of the scripts, and place them in the compressed script folder. In the project manager, right click on the scripts in the compressed script folder, select Properties, and change the Build Action property to Embedded Resource. When you build the project, the pre-build command will compress the scripts, the project will then be built in the normal fashion, and the compressed scripts will be embedded as resources in the assembly.
How the Code Works
The code for the Windows Forms and the console applications is fairly straightforward, and there is not much to describe. The forms version takes data from the controls and uses it with the JSCompressor class. The console mode version does the same thing, but using command line parameters. The class itself is where the action occurs, and it is described below. The code for the class can be found in the JSCompressor.cs file.
Basic Information
The JSCompressor class is fairly simple. It consists of three constructors, properties to modify the line feed removal mode and variable name compression settings, a public method to compress scripts, and several private data members and methods. The default constructor enables line feed removal. A second constructor takes a boolean parameter that lets you specify the initial state for line feed removal (true for enabled, false for disabled). The LineFeedRemoval property lets you modify the mode after construction. The third constructor takes two boolean parameters that let you specify the initial state for the line feed removal and the variable name compression options. The CompressVariableNames property can be used to modify the variable name compression setting after construction. Variable name compression is off by default. In addition, the TestVariableNameCompression property can be set to true to test the variable name compression code. When it is set to true, script compression is disabled, and only parameter and variable names are compressed. As noted above, comments are still removed, so that you end up with the same set of renamed variables and parameters as in a full compression.
The Compression Process
The Compress method of the JSCompressor class does all of the work. It is passed a copy of the uncompressed script and returns the compressed version.
public string Compress(string strScript)
{
    string strCompressed;
    char[] achScriptChars;
    if(strScript == null || strScript.Length == 0)
        return strScript;
    scLiterals.Clear();
    scNoComps.Clear();
    // Build the regular expressions and match evaluators once, on the first call.
    if(reInsLit == null)
    {
        reExtNoComp = new Regex(@"//\s*#pragma\s*NoCompStart.*?" +
            @"//\s*#pragma\s*NoCompEnd.*?\n",
            RegexOptions.Multiline | RegexOptions.Singleline |
            RegexOptions.IgnoreCase);
        reDelNoComp = new Regex(@"//\s*#pragma\s*NoComp(Start|End).*\n",
            RegexOptions.Multiline | RegexOptions.IgnoreCase);
        reInsLit = new Regex("\xFE|\xFF");
        meInsLit = new MatchEvaluator(OnMarkerFound);
        meExtNoComp = new MatchEvaluator(OnNoCompFound);
        reFuncParams = new Regex(@"function.*?\((.*?)\)(.*?|\n)?\{",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        reFindVars = new Regex(@"(var\s+.*?)(;|$)",
            RegexOptions.IgnoreCase | RegexOptions.Multiline);
        reStripVarPrefix = new Regex(@"^var\s+",
            RegexOptions.IgnoreCase);
        reStripParens = new Regex(@"\(.*?,.*?\)|\[.*?,.*?\]",
            RegexOptions.IgnoreCase);
        reStripAssign = new Regex(@"(=.*?)(,|;|$)",
            RegexOptions.IgnoreCase);
    }
The first part clears the two string collections that will end up containing any "no compression" sections specified by the #pragma comments and any literal strings found during parsing. On the first call, a set of regular expressions and match evaluators is also created to help with the parsing and compression process. Their use is described later.
    strCompressed = reExtNoComp.Replace(strScript, meExtNoComp);
private string OnNoCompFound(Match match)
{
    // Store the section without its #pragma lines, and leave a marker behind.
    scNoComps.Add(reDelNoComp.Replace(match.Value, String.Empty));
    return "\xFE";
}
The next part extracts the sections, if any, that the user does not want compressed, as specified via the #pragma comments (e.g., copyright notices at the top of the file). To do this, a match evaluator is used that adds each found section to the string collection and replaces it in the script with a marker character (\xFE). The marker will be replaced with the uncompressed section at the end of the process. Replacing the section with a marker helps the remainder of the code remove extraneous whitespace by giving it less to look at. The #pragma comments are stripped from the sections before they are stored in the collection.
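You can try the extraction step in isolation. This sketch reuses the reExtNoComp and reDelNoComp patterns from the Compress listing, but it passes a lambda instead of the OnNoCompFound method. NoCompExtraction is a hypothetical wrapper class, and \u00FE is the same character as \xFE:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NoCompExtraction
{
    public const string Marker = "\u00FE";

    // Patterns copied from the Compress method.
    static readonly Regex reExtNoComp = new Regex(
        @"//\s*#pragma\s*NoCompStart.*?//\s*#pragma\s*NoCompEnd.*?\n",
        RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.IgnoreCase);
    static readonly Regex reDelNoComp = new Regex(
        @"//\s*#pragma\s*NoComp(Start|End).*\n",
        RegexOptions.Multiline | RegexOptions.IgnoreCase);

    // Replaces each protected section with the marker and stores the section,
    // minus its #pragma lines, in 'sections' in order of appearance.
    public static string Extract(string script, List<string> sections)
    {
        return reExtNoComp.Replace(script, match =>
        {
            sections.Add(reDelNoComp.Replace(match.Value, string.Empty));
            return Marker;
        });
    }
}
```

For example, with the input "//#PRAGMA NoCompStart (keep header)\n/* Copyright 2006 */\n// #pragma nocompend\nvar x = 1;\n", the script becomes the marker followed by "var x = 1;\n". The stored section is "/* Copyright 2006 */\n". This shows that case, spacing, and trailing text on the #pragma lines make no difference.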
    achScriptChars = strCompressed.ToCharArray();
    CompressArray(achScriptChars);
After the "no compression" sections have been removed, the script is converted to a character array to make parsing simpler. The array is passed to the CompressArray method, which scans the script one character at a time, looking for block comments, line comments, literal strings, and JavaScript regular expressions enclosed in slashes (/ /). Block comments and line comments are removed by setting all characters within the comments to the null character in the array. However, sections between /*@ and @*/ are left in the code, as they indicate a conditional compilation section. The code between the conditional section markers will still be compressed. Note that if you do use conditional compilation comments, it is important to end the line preceding the block with a semi-colon, as the browser will not process the conditional block unless it starts on a distinct line.
Literal strings and regular expressions are extracted and stored in a string collection, and they are replaced by a marker character (\xFF) using a method similar to the one used for the "no compression" sections. Again, this helps the final steps remove extraneous whitespace by giving them less to look at. During this process, carriage returns are converted to line feeds, which makes them easy to remove later on as well.
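CompressArray itself is not listed here. The following is a simplified, hypothetical scanner (CommentScannerSketch is an illustrative name) that follows the same idea. Comment characters are overwritten with '\0', each quoted string is copied to a list and replaced by a single \xFF marker, and carriage returns become line feeds. Unlike the real method, it does not recognize regular expression literals or /*@ ... @*/ blocks, so a quote or slash inside a regex literal could confuse it. For convenience, it also strips the null characters itself:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified scanner in the spirit of CompressArray.
// Assumptions: regular expression literals and /*@ ... @*/ blocks are NOT
// recognized, and the null characters are stripped here rather than by the caller.
public static class CommentScannerSketch
{
    public const char LiteralMarker = '\u00FF';   // same character as \xFF

    public static string Scan(string script, List<string> literals)
    {
        char[] chars = script.ToCharArray();
        int i = 0;
        while (i < chars.Length)
        {
            char c = chars[i];
            char next = i + 1 < chars.Length ? chars[i + 1] : '\0';

            if (c == '/' && next == '*')
            {
                // Block comment: blank everything up to and including "*/".
                int end = script.IndexOf("*/", i + 2, StringComparison.Ordinal);
                int stop = end < 0 ? chars.Length : end + 2;
                for (int j = i; j < stop; j++) chars[j] = '\0';
                i = stop;
            }
            else if (c == '/' && next == '/')
            {
                // Line comment: blank up to, but not including, the line break.
                while (i < chars.Length && chars[i] != '\n' && chars[i] != '\r')
                    chars[i++] = '\0';
            }
            else if (c == '"' || c == '\'')
            {
                // String literal: find the matching quote, honoring backslash escapes.
                int start = i++;
                while (i < chars.Length && chars[i] != c)
                {
                    if (chars[i] == '\\') i++;   // skip the escaped character
                    i++;
                }
                i = Math.Min(i + 1, chars.Length);   // step past the closing quote
                literals.Add(script.Substring(start, i - start));
                chars[start] = LiteralMarker;
                for (int j = start + 1; j < i; j++) chars[j] = '\0';
            }
            else
            {
                if (c == '\r') chars[i] = '\n';   // carriage returns become line feeds
                i++;
            }
        }
        return new string(chars).Replace("\0", string.Empty);
    }
}
```

Note that "//" inside a string literal is not treated as a comment, because the literal is consumed before the scanner can see the slashes.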
    strCompressed = new String(achScriptChars);
    strCompressed = strCompressed.Replace("\0", String.Empty);
    if(!varCompTest)
    {
        strCompressed = Regex.Replace(strCompressed, @"^[\s]+|[ \f\r\t\v]+$",
            String.Empty, RegexOptions.Multiline);
        strCompressed = Regex.Replace(strCompressed, @"([\s]){2,}", "$1");
Once the array has been parsed, it is converted back into a string, and all null characters (representing removed sections) are deleted. After that, regular expressions are used to remove leading and trailing whitespace from all lines, and to condense all runs of two or more whitespace characters to just one. This part and the subsequent steps are skipped if only testing variable name compression.
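To see what those two replacements do on their own, this stand-alone sketch applies the same patterns in the same order. WhitespaceCondenser is a hypothetical wrapper:

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceCondenser
{
    // The two replacements from the Compress method, applied in the same order.
    public static string Condense(string script)
    {
        // Remove leading whitespace (which also swallows blank lines) and
        // trailing spaces/tabs on every line.
        string s = Regex.Replace(script, @"^[\s]+|[ \f\r\t\v]+$",
            string.Empty, RegexOptions.Multiline);
        // Collapse any run of two or more whitespace characters to one.
        return Regex.Replace(s, @"([\s]){2,}", "$1");
    }
}
```

For instance, "  var a;   \n\n   b  =   1;  " condenses to "var a;\nb = 1;".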
        if(removeLineFeeds)
        {
            strCompressed = Regex.Replace(strCompressed, @"([+-])\n\1",
                "$1 $1");
            strCompressed = Regex.Replace(strCompressed, @"([^+-][+-])\n",
                "$1");
            strCompressed = Regex.Replace(strCompressed,
                @"([\xFE{}([,<>/*%&|^!~?:=.;])\n", "$1");
            strCompressed = Regex.Replace(strCompressed,
                @"\n([{}()[\],<>/*%&|^!~?:=.;+-])", "$1");
        }
The next step checks whether line feed removal has been requested. If so, line feeds that follow or precede operators and other punctuation, including single + and - signs, are removed. Care is taken around the + and - characters so that whitespace and line feeds are left around increment and decrement operators (++ and --) where needed, to prevent breaking code.
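The order of the four line feed replacements matters. The sketch below runs the same patterns in the same order on a few inputs. LineFeedRemovalDemo is a hypothetical wrapper. Note that "a +\n+b" becomes "a + +b" rather than "a ++b". Note also that the line feed after "i++" (a statement with no semi-colon) is kept:

```csharp
using System.Text.RegularExpressions;

public static class LineFeedRemovalDemo
{
    // The four replacements from the removeLineFeeds block, in the same order.
    public static string RemoveLineFeeds(string s)
    {
        // "+\n+" or "-\n-": keep a space so the signs don't fuse into ++ or --.
        s = Regex.Replace(s, @"([+-])\n\1", "$1 $1");
        // A single + or - at the end of a line: the expression continues.
        s = Regex.Replace(s, @"([^+-][+-])\n", "$1");
        // Line feeds after operators and punctuation (and the \xFE marker).
        s = Regex.Replace(s, @"([\xFE{}([,<>/*%&|^!~?:=.;])\n", "$1");
        // Line feeds before operators and punctuation.
        s = Regex.Replace(s, @"\n([{}()[\],<>/*%&|^!~?:=.;+-])", "$1");
        return s;
    }
}
```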
        strCompressed = Regex.Replace(strCompressed,
            @"[ \f\r\t\v]?([\n\xFE\xFF/{}()[\];,<>*%&|^!~?:=])[ \f\r\t\v]?",
            "$1");
        strCompressed = Regex.Replace(strCompressed, @"([^+]) ?(\+)", "$1$2");
        strCompressed = Regex.Replace(strCompressed, @"(\+) ?([^+])", "$1$2");
        strCompressed = Regex.Replace(strCompressed, @"([^-]) ?(\-)", "$1$2");
        strCompressed = Regex.Replace(strCompressed, @"(\-) ?([^-])", "$1$2");
A final set of regular expressions is used to strip whitespace from around operators and the marker characters. Again, special care is taken with the + and - operators so as to correctly handle whitespace around increment and decrement operators.
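Running the same five replacements on their own shows how increments are handled: the space between + and ++ survives while the other spaces disappear. OperatorWhitespace is a hypothetical wrapper around the patterns from the listing above:

```csharp
using System.Text.RegularExpressions;

public static class OperatorWhitespace
{
    public static string Strip(string s)
    {
        // Drop at most one whitespace character on either side of punctuation
        // and the two marker characters.
        s = Regex.Replace(s, @"[ \f\r\t\v]?([\n\xFE\xFF/{}()[\];,<>*%&|^!~?:=])[ \f\r\t\v]?", "$1");
        // Drop a space before/after + unless the neighbor is another +.
        s = Regex.Replace(s, @"([^+]) ?(\+)", "$1$2");
        s = Regex.Replace(s, @"(\+) ?([^+])", "$1$2");
        // Same rule for -.
        s = Regex.Replace(s, @"([^-]) ?(\-)", "$1$2");
        s = Regex.Replace(s, @"(\-) ?([^-])", "$1$2");
        return s;
    }
}
```

With these rules, "x = a + ++b;" becomes "x=a+ ++b;", "y = c - 1;" becomes "y=c-1;", and "if (a) { b(); }" becomes "if(a){b();}".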
        if(removeLineFeeds)
        {
            strCompressed = Regex.Replace(strCompressed,
                @"(\W(if|while|for)\([^{]*?\))\n", "$1");
            strCompressed = Regex.Replace(strCompressed,
                @"(\W(if|while|for)\([^{]*?\))((if|while|for)\([^{]*?\))\n",
                "$1$3");
            strCompressed = Regex.Replace(strCompressed,
                @"([;}]else)\n", "$1 ");
        }
    }   // end of if(!varCompTest)
After removing all extraneous whitespace, if line feed removal has been requested, a few additional steps are taken to remove unnecessary line feeds from around if, while, and for statements. This removes line feeds where those statements occur one after the other in any combination, with no intervening brace character. For example, the following would get condensed to a single line:
if(a == 1)
    for(b = 0; b < 10; b++)
        while(!c)
            c = DoSomething();
If the code contains semi-colons on all statements that need them to mark their endpoints, the above process can usually remove all line feeds from the script, reducing it to one long stream of characters, thus providing maximum code compression.
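The two if/while/for patterns work as a pair. In the sketch below (ControlFlowJoin is a hypothetical wrapper), the first pass cannot join a header that directly follows another header. The reason is that the ) it would need as its leading non-word character was already consumed by the previous match, and the second pattern covers that case. The input is the example above after whitespace stripping, preceded by a ; so that the first if has a non-word character in front of it:

```csharp
using System.Text.RegularExpressions;

public static class ControlFlowJoin
{
    public static string Join(string s)
    {
        // Line feed after if(...), while(...) or for(...) with no brace.
        s = Regex.Replace(s, @"(\W(if|while|for)\([^{]*?\))\n", "$1");
        // A header that immediately follows another one is missed above, because
        // the ')' before it was part of the previous match; this pass catches it.
        s = Regex.Replace(s, @"(\W(if|while|for)\([^{]*?\))((if|while|for)\([^{]*?\))\n", "$1$3");
        // Replace the line feed after a bare "else" with a space.
        s = Regex.Replace(s, @"([;}]else)\n", "$1 ");
        return s;
    }
}
```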
    if(compressVarNames || varCompTest)
        strCompressed = CompressVariables(strCompressed);
    // Put the "no compression" sections and literals back in place of the markers.
    noCompCount = literalCount = 0;
    strCompressed = reInsLit.Replace(strCompressed, meInsLit);
    return strCompressed;
}
private string OnMarkerFound(Match match)
{
    if(match.Value == "\xFE")
        return scNoComps[noCompCount++];
    return scLiterals[literalCount++];
}
Variable name compression occurs next, if requested. This process will be described in the next section. The last step is to reinsert the uncompressed sections and literal strings. In a manner similar to extraction, a regular expression and a match evaluator are used. Two private counters are used to keep track of the progress through the string collections. As each marker character is found, the match evaluator is called and, depending on the marker found, it returns the next element from the appropriate collection, which then takes the place of the marker. The matching counter is also incremented ready for the next match. After the insertions have been made, the compressed script is returned to the caller.
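The reinsertion step works because markers are replaced in the same order in which they were created. Here is a stand-alone sketch of it, using hypothetical names and a plain List<string> instead of StringCollection:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-alone version of the reinsertion step.
public sealed class MarkerReinserter
{
    private readonly List<string> noComps;   // text stored for each \u00FE marker
    private readonly List<string> literals;  // text stored for each \u00FF marker
    private int noCompCount, literalCount;   // positions in the two lists

    public MarkerReinserter(List<string> noComps, List<string> literals)
    {
        this.noComps = noComps;
        this.literals = literals;
    }

    public string Reinsert(string script)
    {
        noCompCount = literalCount = 0;   // reset so the object can be reused
        return Regex.Replace(script, "\u00FE|\u00FF", OnMarkerFound);
    }

    private string OnMarkerFound(Match match)
    {
        // Each marker takes the next stored entry from its own list.
        return match.Value == "\u00FE" ? noComps[noCompCount++] : literals[literalCount++];
    }
}
```

Text returned from a MatchEvaluator is inserted literally, so a $ inside a restored string literal is not treated as a substitution.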
Parameter and Variable Name Compression
The CompressVariables method handles the compression of function parameter and variable names. Since there is potential to break code, the method takes a conservative approach to locating and renaming variables.
- Function parameter names appearing within the parentheses of a function declaration are included for compression.
- Variable names on the same line as a var statement are included for compression. However, if the var statement spans lines and extra line feed removal is disabled, some names may be missed. For example:
var string1, string2,
    num1, num2;
In the above example, string1 and string2 will always be included, but num1 and num2 will not be included if the LineFeedRemoval property is set to false, as they will always appear on a line by themselves with no indication that they are variables.
- On a similar note, variable names that appear in the code but are not formally declared with a var statement are always ignored (e.g., global variables declared in another module).
- If you declare global variables that are referenced in other script files, you should wrap their declarations in a #pragma NoCompStart/NoCompEnd section so that they are not renamed within the file in which they are declared.
The actual renaming process occurs as follows:
private string CompressVariables(string script)
{
    StringCollection scVariables = new StringCollection();
    string[] varNames;
    string name = null, matchName;
    bool incVarName;
    // Collect the parameter names of every function declaration.
    MatchCollection matches = reFuncParams.Matches(script);
    foreach(Match m in matches)
    {
        varNames = m.Groups[1].Value.Split(',');
        foreach(string s in varNames)
        {
            name = s.Trim();
            if(name.Length != 0 && !scVariables.Contains(name))
                scVariables.Add(name);
        }
    }
The first part searches for function parameters using a regular expression created earlier. The parameter list is split apart, and each unique parameter name is added to the variable name string collection.
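To see which names the parameter pattern picks up, the sketch below wraps reFuncParams, copied from the listing, in a hypothetical ParameterFinder class and repeats the split-and-trim loop:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParameterFinder
{
    // reFuncParams exactly as created in the Compress method.
    static readonly Regex reFuncParams = new Regex(@"function.*?\((.*?)\)(.*?|\n)?\{",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> FindParameters(string script)
    {
        var names = new List<string>();
        foreach (Match m in reFuncParams.Matches(script))
        {
            // Group 1 holds the raw parameter list, e.g. "x, y".
            foreach (string s in m.Groups[1].Value.Split(','))
            {
                string name = s.Trim();
                if (name.Length != 0 && !names.Contains(name))
                    names.Add(name);
            }
        }
        return names;
    }
}
```

Given a named function Add(x, y), an anonymous function(y, z), and an empty Noop(), it collects x, y, and z, each only once.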
    // Collect the names declared in var statements.
    matches = reFindVars.Matches(script);
    foreach(Match m in matches)
    {
        name = reStripVarPrefix.Replace(m.Groups[1].Value, String.Empty);
        name = reStripParens.Replace(name, String.Empty);
        name = reStripAssign.Replace(name, "$2");
        varNames = name.Split(',');
        foreach(string s in varNames)
        {
            name = s.Trim();
            if(name.Length != 0 && !scVariables.Contains(name))
                scVariables.Add(name);
        }
    }
The next part searches for var statements that contain variable name declarations, using a regular expression created earlier. This step is slightly more complex, as it must account for assignments within the statement as well as bracketed or parenthesized expressions containing commas that might cause an incorrect split to occur. For example:
var num1, string1 = "Test", num2 = array1[3, 0];
var resultString = functionCall("A", "B");
The var prefix is removed from the statement, followed by any parts of the expressions that contain brackets or parentheses enclosing commas (e.g., bracketed index expressions and function call arguments, as shown in the above examples). Once they are removed, a final regular expression removes any remaining assignment text from the equal sign to the next comma or the end of the line. Once this is done, it is safe to split the string on each comma and add the unique names to the variable name string collection.
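The following sketch chains the same four patterns on the two example statements above. VarNameFinder is a hypothetical wrapper. The names it collects are num1, string1, num2, and resultString:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VarNameFinder
{
    // Patterns copied from the Compress method.
    static readonly Regex reFindVars = new Regex(@"(var\s+.*?)(;|$)",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);
    static readonly Regex reStripVarPrefix = new Regex(@"^var\s+", RegexOptions.IgnoreCase);
    static readonly Regex reStripParens = new Regex(@"\(.*?,.*?\)|\[.*?,.*?\]", RegexOptions.IgnoreCase);
    static readonly Regex reStripAssign = new Regex(@"(=.*?)(,|;|$)", RegexOptions.IgnoreCase);

    public static List<string> FindVariables(string script)
    {
        var names = new List<string>();
        foreach (Match m in reFindVars.Matches(script))
        {
            // "var a, b = f(1, 2)" -> "a, b = f(1, 2)" -> "a, b = f" -> "a, b "
            string text = reStripVarPrefix.Replace(m.Groups[1].Value, string.Empty);
            text = reStripParens.Replace(text, string.Empty);
            text = reStripAssign.Replace(text, "$2");
            foreach (string s in text.Split(','))
            {
                string name = s.Trim();
                if (name.Length != 0 && !names.Contains(name))
                    names.Add(name);
            }
        }
        return names;
    }
}
```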
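    newVarName = new char[10];
    newVarName[0] = '\x60';   // the character just before 'a'
    varNamePos = 0;
    incVarName = true;
    foreach(string replaceName in scVariables)
    {
        if(incVarName)
        {
            // Advance to the next generated name that is not already in use.
            do
            {
                IncrementVariableName();
                name = new String(newVarName, 0, varNamePos + 1);
                matchName = @"\W" + name + @"\W";
            } while(Regex.IsMatch(script, matchName));
            incVarName = false;
        }
        // Only rename when the new name is actually shorter.
        if(name.Length < replaceName.Length)
        {
            incVarName = true;
            // Escape the old name in case it contains a regex metacharacter such as $.
            script = Regex.Replace(script,
                @"(\W)" + Regex.Escape(replaceName) + @"(?=\W)", "$1" + name);
        }
    }
    return script;
}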
The final step loops through each unique variable name found and substitutes a shorter name. Once this is done, the compressed script is returned. The naming scheme starts with a through z and, if those run out, adds an underscore prefix and carries on (_a through _z). The underscore ensures that the compressor will not accidentally create a name that matches a keyword once it gets past single-letter names. Should those names be exhausted, it starts appending letters and runs through each set from _aa to _az, _ba to _bz, etc. The code is written so that it will expand the names further if needed, but it is more likely that the script will have fewer unique variables than the number of unique new names that the compressor can generate.
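IncrementVariableName is not listed in the article. One way to produce the same sequence is sketched here as a hypothetical NameAt function. It uses single letters first, then an underscore followed by a bijective base-26 numeral (a, b, ..., z, aa, ab, ...). With this approach the sequence also continues to three letters after _zz:

```csharp
using System.Text;

public static class ShortNameSequence
{
    // Hypothetical: the n-th (zero-based) name in the sequence
    // a..z, _a.._z, _aa.._az, _ba.._bz, ..., _zz, _aaa, ...
    public static string NameAt(int n)
    {
        if (n < 26)
            return ((char)('a' + n)).ToString();

        // After the single letters, emit "_" plus a bijective base-26 numeral,
        // where 1 -> "a", 26 -> "z", 27 -> "aa", and so on.
        int k = n - 25;
        var sb = new StringBuilder();
        while (k > 0)
        {
            k--;
            sb.Insert(0, (char)('a' + k % 26));
            k /= 26;
        }
        return "_" + sb.ToString();
    }
}
```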
As each new name is created, a check is made to ensure that it does not already appear in the script. For example, if common loop variable names such as i or j are already used in the script, those new names are skipped. Likewise, if the new name is not shorter than the existing name, the existing name is not replaced. However, you could remove that check to completely obfuscate the names if necessary.
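The skip and length rules can be sketched separately from the name generator. This hypothetical Assign method takes the candidate names as a list. It uses \b word boundaries instead of the \W checks in the listing, which assumes that the candidates contain only letters and underscores:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NameAssignment
{
    // Hypothetical: walks the candidates in order, skips any that already occur
    // in the script as a whole word, and maps an original name to the current
    // candidate only when the candidate is shorter.
    public static Dictionary<string, string> Assign(string script,
        IList<string> originalNames, IList<string> candidates)
    {
        var map = new Dictionary<string, string>();
        int next = 0;
        foreach (string original in originalNames)
        {
            // Skip candidates that are already used in the script.
            while (next < candidates.Count &&
                   Regex.IsMatch(script, @"\b" + Regex.Escape(candidates[next]) + @"\b"))
                next++;
            if (next == candidates.Count)
                break;   // out of candidates

            if (candidates[next].Length < original.Length)
                map[original] = candidates[next++];   // use up this candidate
            // Otherwise keep the candidate for the next name, as the original does.
        }
        return map;
    }
}
```

In the test script below, "a" is already in use, so count becomes "b". The one-letter name i is left alone because "c" is no shorter, and value then receives "c".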
Conclusion
On average, my own scripts have been reduced in size by 50% to 60%. Adding in variable name compression increases the savings by an additional 10% to 15% in the average script. Naturally, the more you comment your JavaScript code, use indentation to make the code more readable, and use descriptive variable names, the better the compression rates, as there is more stuff to remove. Using semi-colons to mark statement endpoints can also increase the compression rates as it enables the code to remove most if not all of the line feed characters too.
History
06/26/2006 | Modified the compression code to allow for conditional compilation blocks (/*@ ... @*/). Modified the command line compressor to scan and compress sub-folders if the /r option is specified.
03/05/2006 | Added the option to compress function parameter and variable names. Tested the code under Visual Studio 2005 and .NET 2.0. The demo project is a Visual Studio 2003 project, but it will convert and build without any problems under Visual Studio 2005.
07/25/2003 | Initial release.