Uploading and downloading files are common tasks in an ASP.NET application. When a user uploads a file to a web server and later downloads it, they expect to see the original filename in the File Download dialog box. Developers normally use the "Content-Disposition" header field to force the download, with the "filename" parameter suggesting a name for the downloaded file. If the filename contains only US-ASCII characters, there is no problem: the name shown in the File Download dialog box is the same as when the file was uploaded. The problem arises only when the filename contains non-US-ASCII characters, such as Vietnamese or Arabic ones; the name is then corrupted and not displayed as the user expects. The reason is that the filename parameter is limited to US-ASCII. For complete information on the Content-Disposition field, see RFC 2183.
Figure 1: a non-US-ASCII filename is corrupted
In this article, I present three simple alternatives for displaying a non-US-ASCII filename accurately in the File Download dialog box:
- Encoding filename
- URL Rewriting
- "Encoded-word" mechanism
At the end of each section, I also briefly explain when that solution is appropriate.
Figure 2: a non-US-ASCII filename is correctly displayed
In the first solution, encoding the filename, we use the HTML <a> element to link directly to the requested file instead of using the Content-Disposition header field to implement the download. One thing to take into account is that the browser normally sends URLs as UTF-8, so when a file is uploaded to the server, its filename needs to be encoded before saving. Below is the code snippet used to encode the filename:
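// Requires: using System; using System.Text;
// Maps each UTF-8 byte of the filename to the char with the same code
// (U+0000 to U+00FF), so the stored name carries the UTF-8 byte sequence.
public static string EncodeFilename(string filename)
{
    UTF8Encoding utf8 = new UTF8Encoding();
    byte[] bytes = utf8.GetBytes(filename);
    // One output char per UTF-8 byte.
    char[] chars = new char[bytes.Length];
    for (int index = 0; index < bytes.Length; index++)
    {
        chars[index] = Convert.ToChar(bytes[index]);
    }
    return new string(chars);
}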
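To make the effect of this encoding concrete, the following is a minimal sketch of the round trip. The class name FilenameRoundTrip and the DecodeFilename helper are hypothetical and are not part of the demo download; the sketch only assumes that the stored name was produced by the byte-to-char mapping shown above.

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration only.
public static class FilenameRoundTrip
{
    // Same mapping as the article's snippet: one char per UTF-8 byte.
    public static string EncodeFilename(string filename)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(filename);
        char[] chars = new char[bytes.Length];
        for (int index = 0; index < bytes.Length; index++)
        {
            chars[index] = Convert.ToChar(bytes[index]);
        }
        return new string(chars);
    }

    // Assumed inverse: turn each char back into a byte and decode as UTF-8.
    // Convert.ToByte throws OverflowException for a char above U+00FF,
    // i.e. for a name that was not produced by EncodeFilename.
    public static string DecodeFilename(string encoded)
    {
        byte[] bytes = new byte[encoded.Length];
        for (int index = 0; index < encoded.Length; index++)
        {
            bytes[index] = Convert.ToByte(encoded[index]);
        }
        return Encoding.UTF8.GetString(bytes);
    }
}
```

Because every non-US-ASCII character expands to two or more chars, the stored name is longer than the original, which is worth keeping in mind where path length limits apply.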
This solution is simple, and we can use it when the File Download dialog box does not have to be forced. However, we need to do a bit more work to handle duplicate filenames, as users of the application might upload many files with the same name. In addition, this solution does not work properly when the encoded value contains special characters that are not allowed in a filename.
As mentioned above, developers normally use the "Content-Disposition" header field to force the download. If we look at how a Mail User Agent (MUA) processes the Content-Disposition header field, we see that the receiving MUA uses the filename parameter value as the basis for the actual filename in the File Download dialog box. If this parameter is absent, the MUA is likely to display the name of the web page that writes the file contents to the client (in an ASP.NET application, normally the name of an aspx page). So the idea of the second solution, URL rewriting, is this: we request the file to download, and when the request arrives at the server, the URL is rewritten to an aspx page that reads the originally requested file and sends it back to the client. In this aspx page, we use the Content-Disposition header field without specifying the filename parameter.
Generally speaking, URL rewriting can be implemented at either the IIS level or the ASP.NET level, and rewriting at the ASP.NET level happens only when the request is successfully routed from IIS to the ASP.NET engine. As you know, only requests for resources with extensions such as aspx, ascx, ashx, and so on are processed by the ASP.NET engine. Furthermore, we as developers do not know in advance what types of files users will upload to the server, so URL rewriting at the IIS level is the likely answer. In this article, I am not going to show how to implement URL rewriting in IIS with ISAPI filters; a number of third-party ISAPI filters can be used instead. For the demo, I am using the ISAPI_Rewrite Lite version, which is simple and free. You can download the latest version from here.
Once ISAPI_Rewrite is installed, we can define rewriting rules in the httpd.ini file, which should appear in the ISAPI_Rewrite installation directory. For the purposes of this article, we define a single rewriting rule:
RewriteRule (/Sample2/Download/.*?)(id=[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}) /Sample2/Download.aspx\?$2 [L]
Let me briefly explain this rule: when the user requests any file in the Download directory, the URL is rewritten to the Download.aspx page, which is responsible for reading the file and sending it back to the client. Here, the RewriteRule directive defines a single rewriting rule. For example, the user downloads the report.doc file with the URL:
http://localhost/Sample2/Download/report.doc?id=647faba9-a223-45d3-90d5-3bc4de95bd39
At the IIS, the URL will be rewritten to:
http://localhost/Sample2/Download.aspx?id=647faba9-a223-45d3-90d5-3bc4de95bd39
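The mapping performed by this rule can be checked outside IIS. The following is a minimal sketch that emulates the rule with the .NET Regex class; it is not how ISAPI_Rewrite itself is implemented. The class name RewriteRuleSketch is hypothetical, and the ^ and $ anchors are an assumption made here so that the pattern must match the whole path and query.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical emulation of the httpd.ini rule, for illustration only.
public static class RewriteRuleSketch
{
    // Group 1: anything under /Sample2/Download/ (matched lazily).
    // Group 2: the id=<GUID> query parameter.
    private static readonly Regex Rule = new Regex(
        @"^(/Sample2/Download/.*?)" +
        @"(id=[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$");

    // Returns the rewritten path and query, or the input unchanged
    // when the rule does not match.
    public static string Rewrite(string pathAndQuery)
    {
        return Rule.Replace(pathAndQuery, "/Sample2/Download.aspx?$2");
    }
}
```

Only the second group survives in the rewritten URL, so the requested filename serves purely as the name the browser sees, while Download.aspx locates the file by its id.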
With this solution, the problem is probably resolved without any further filename processing in code. However, installing a third-party component in IIS might not be acceptable to everyone; in that case, the "Encoded-word" mechanism may be the answer.
As noted earlier, the filename parameter is limited to US-ASCII, so a filename containing non-US-ASCII characters must be encoded to be displayed correctly in the File Download dialog box. RFC 2184 and RFC 2231 define extensions to the encoded-word mechanism of RFC 2047 that provide a means to specify parameter values in character sets other than US-ASCII. The encoding mechanism used in this third solution is quite simple: in a given filename, every non-US-ASCII character, as well as every character that is neither alphanumeric nor reserved, is replaced with its %xx encoding, where xx is the hexadecimal value of each UTF-8 byte of the character. Below is the code snippet used to encode a character:
// Requires: using System.Text;
// Returns the %xx form of every UTF-8 byte of the given character.
private static string ToHexString(char chr)
{
    UTF8Encoding utf8 = new UTF8Encoding();
    byte[] encodedBytes = utf8.GetBytes(chr.ToString());
    StringBuilder builder = new StringBuilder();
    for (int index = 0; index < encodedBytes.Length; index++)
    {
        // "x2" pads to two hex digits, so a byte below 0x10 becomes e.g. %0a.
        builder.AppendFormat("%{0:x2}", encodedBytes[index]);
    }
    return builder.ToString();
}
For example, if the original filename is Bản Kiểm Kê.doc (to view the filename correctly, the encoding in your web browser should be set to Unicode (UTF-8)), the encoded value is B%e1%ba%a3n%20Ki%e1%bb%83m%20K%c3%aa.doc, and the Content-Disposition field is then specified as below:
Content-Disposition: attachment; filename=B%e1%ba%a3n%20Ki%e1%bb%83m%20K%c3%aa.doc
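The per-character helper can be extended to a whole filename. The following is a minimal sketch, not the demo's actual code: the class and method names are hypothetical, and the set of characters left unencoded (US-ASCII letters, digits, '.', '-', and '_') is an assumption, since the exact reserved set is not listed here. It works on the UTF-8 bytes of the full string rather than on individual chars.

```csharp
using System;
using System.Text;

// Hypothetical helper class for illustration only.
public static class ContentDispositionSketch
{
    // Percent-encodes every UTF-8 byte that is not in the assumed safe set.
    public static string EncodeFilename(string filename)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(filename);
        StringBuilder builder = new StringBuilder(bytes.Length * 3);
        foreach (byte b in bytes)
        {
            char c = (char)b;
            bool keep = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                        (c >= '0' && c <= '9') ||
                        c == '.' || c == '-' || c == '_';
            if (keep)
            {
                builder.Append(c);
            }
            else
            {
                // Two lowercase hex digits per byte, e.g. a space becomes %20.
                builder.AppendFormat("%{0:x2}", b);
            }
        }
        return builder.ToString();
    }

    // Builds the header value in the form used in this article.
    public static string BuildHeaderValue(string filename)
    {
        return "attachment; filename=" + EncodeFilename(filename);
    }
}
```

In an ASP.NET page, the resulting value would typically be passed to Response.AddHeader with the header name Content-Disposition before the file contents are written to the response.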
With this approach, we need to write some code to encode the filename before passing it to the Content-Disposition field. In my opinion, it is a good choice because we do not need to install any third-party component. However, we should be aware that the "Encoded-word" mechanism works only if the MIME processor on the client side understands encoded parameter values; otherwise, the filename is not displayed as we expect. In addition, according to RFCs 2231 and 2184, the extensions defined in these documents should not be used lightly; they should be reserved for situations where a real need for them exists. For more information, see RFCs 2231 and 2184.
The download contains three web applications that demonstrate the three solutions. To try out the demo applications, create three virtual directories in IIS and point them at Sample1, Sample2, and Sample3. The start page should be ListDocument.aspx. For Sample2, you also need to download and install ISAPI_Rewrite and add the above rewriting rule to the httpd.ini file.
The ASPNET account must have read/write access to the Data and Download directories in each application. For the sake of simplicity, I am using XML as the back-end data store.
At the moment, the filename parameter is limited to US-ASCII, so we need to do a bit more work for the filename to be displayed correctly in the File Download dialog box. Hopefully this limitation will be lifted someday, so that non-US-ASCII characters can be used as easily as US-ASCII ones.
References