Other People’s Data, Your Security: Mashup Applications for Enterprises, Part 2

HTML5 Partners


24 Jan 2013 (CPOL)


This article is for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.


In the first article in this series, I covered cross-origin resource sharing (CORS) and iframe sandboxes and described how to use these techniques in mashup applications to consume data from other domains while adding a layer to a defense-in-depth strategy. In this article, I’ll start exploring how to consume data from CORS connections (or any Ajax connection) by defining a level of trust for each provider and then sanitizing its data accordingly. To do this, I’ll build on the guidance provided by Project Silk. First, let’s discuss trust and why mashup applications in the enterprise pose a unique challenge to the existing paradigm.

Trust or the Lack Thereof

In Writing Secure Code, the authors put forth an excellent mantra: "All input is evil." In the world of enterprise mashup applications this is true, but some input is more evil than others. For example, does a data feed from your company’s human resources system pose the same threat as a data feed from Twitter? Another common expression about software security is "all external systems pose a threat." Again, this is true, but security is about risk management, which needs to stay a central focus of your mashup development process. Risk combines the impact of a threat (a threat is an exploitation of a vulnerability in the system) with the probability that the threat is realized. The threats that an HR data feed presents probably have a lower impact than threats that could be realized by consuming a Twitter feed. A Twitter feed can contain any text posted by any user, while the HR feed is a structured set of data points provided and verified by members of the HR department. This gives the HR system a lower risk (assuming its probability of exploitation is lower than or equal to Twitter’s). When weighing the trust of systems, broaden your definition of input to include the content coming from your mashup providers, and weigh trust differently for internal and external providers.

Here are some questions to consider when building your input validation for mashup providers:

What is your organization’s culture toward risk? Are you risk averse, or less concerned about risk when functionality is at stake?
What is the history of the provider? As an example, if you know that an internal provider has had malicious code posted to its site by disgruntled employees, you would obviously want to keep an eye on this provider.
What kind of data are you receiving from the provider? Take Twitter vs. Bing Maps, for example. The type of content coming from Bing Maps is different from the content coming from Twitter (depending on the API). How does that impact your trust of the provider?
Does the data cross a trust boundary in your threat model?
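One way to turn these questions into something code can act on is a small trust registry. The sketch below is purely illustrative: the provider names, the 1-to-5 scores, and the thresholds are assumptions, not values from Project Silk. It scores risk as impact multiplied by likelihood and maps higher risk to a shorter list of HTML tags allowed to survive sanitization; your answers about a provider's history and internal vs. external status would feed into its likelihood score.

```javascript
// Hypothetical trust registry. Impact and likelihood use an assumed
// 1 (low) to 5 (high) scale; none of these values come from Project Silk.
const providers = {
    hrSystem: { impact: 2, likelihood: 1 },   // internal, structured, verified data
    bingMaps: { impact: 2, likelihood: 2 },   // external, mostly structured map data
    twitter:  { impact: 4, likelihood: 4 }    // external, free text from any user
};

// Risk combines the impact of a threat with the likelihood it is realized.
function riskScore(provider) {
    return provider.impact * provider.likelihood;
}

// Higher risk means fewer tags may survive sanitization. Every provider's
// data is still validated and encoded; only the tag allowlist changes.
function allowedTags(provider) {
    const score = riskScore(provider);
    if (score >= 12) return [];
    if (score >= 4) return ["strong", "em"];
    return ["strong", "em", "p", "blockquote"];
}
```

Even the most trusted provider keeps going through validation here, in keeping with "all input is evil"; trust only loosens what markup is allowed through.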

Mashup Data = Input

Overall, your goal is to securely consume data from various sources, some trusted and others not as trusted. "Data from various sources" is a long way of saying "input," which means you need to consider the three elements of input validation: constrain, reject and sanitize. Constraining input is not only limiting what is permitted. It also means reducing the possible entry points into the system. (In information security, we call this "reducing the attack surface.") You need an input chokepoint through which all consumed data must pass. Fortunately, the nice folks at Microsoft patterns & practices have provided a great design to do just that with the Data Manager in Project Silk.

The sample code provided for the Data Manager offers an excellent single point for processing the Ajax requests in our sample application. Because it is the component that handles every request, the Data Manager is the ideal place to house our input validation components. Figure 1 shows the workflow that Ajax requests follow in our application using an enhanced Data Manager.

Figure 1: Workflow originally from Project Silk, updated to show our new subprocess

We have two validation activities:

Validate response: This process performs input validation activities on content coming from an Ajax request. At this point, we have received data from a provider and need to ensure that it adheres to the security standards configured for our application.
Validate cache: This process performs input validation activities on content coming from the cache. Never assume that data in the cache is clean. Even though our application will clean the data prior to storing it in the cache, we need to be sure the data is still clean when it’s extracted from the cache. Manipulating a cache that a developer assumes is clean is sometimes referred to as cache poisoning. This type of attack exploits a developer’s misplaced trust in the cache by injecting malicious content into the cache, subverting any input validation activities for data coming from the cache.
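The validate-on-read rule can be sketched as a cache wrapper that runs the same validator when data goes in and again when it comes out. ValidatingCache and rejectMarkup are hypothetical names used only for illustration; in the sample application the store would be Project Silk's dataStore rather than a Map, and the validator would be the full sanitization process.

```javascript
// Hypothetical cache that never trusts its own contents: values are
// validated on the way in AND on the way out.
class ValidatingCache {
    constructor(validate) {
        this.validate = validate;   // function(value) -> cleaned value, or throws
        this.store = new Map();
    }

    set(key, value) {
        this.store.set(key, this.validate(value));
    }

    get(key) {
        if (!this.store.has(key)) return undefined;
        // Re-validate on read: something else may have written to the store.
        return this.validate(this.store.get(key));
    }
}

// Example validator (an assumption): accept only strings with no angle brackets.
function rejectMarkup(value) {
    if (typeof value !== "string" || /[<>]/.test(value)) {
        throw new Error("Rejected cached value");
    }
    return value;
}
```

Writing directly to cache.store simulates a poisoned cache: the next get throws instead of handing the injected value to the application.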

We’ll use a single widget to manage both validation activities. Using the Project Silk design, we’ll build a Validation widget that performs the three processes of input validation: constrain, reject and sanitize. An excellent, concise resource for input validation techniques can be found in Improving Web Application Security: Threats and Countermeasures (2003). Although Microsoft has retired this content, the book is an excellent resource for web application security, and I highly recommend it even if you are not an ASP.NET developer.

The first step in the process is to constrain data to be known good data. When you see "known good data," instantly think white list. In a white-list approach to input validation you identify what is permitted and deny all else. Here are some sample techniques that you can use to do this:

Type checks: Does the data returned from the request match the data the trust broker says was returned? Implementing a check like this can be a challenge in JavaScript because the idea of a "type" is not always clear-cut. Fortunately, you can limit the data types that your application will accept and accept only content that passes checks against those data types.
Length checks: Using regular expressions, check that the length of the returned data conforms to what you expect. For example, a tweet is 140 characters long, or is it? Be careful about assumptions. Tweets are 140 characters long visually but can carry supporting HTML (such as a shortened URL) that you need to account for as well.
Format checks: Search the returned data for patterns you are expecting. As an example, if you are expecting a list of three comma-separated items to be returned, use regular expressions to check whether the data conforms to this format. Nonconformance means invalid data.
Range checks: Are you expecting a number between 1 and 10? Range checks can limit your data to an appropriately specific range. In an HR system, for example, a value of 259 for a person’s age is probably not valid.
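The four techniques can be sketched as small allowlist predicates. The limits (140 characters, three comma-separated items, ages 1 to 120) and the function names are assumptions chosen to match the examples above. Note that the length check counts UTF-16 code units, and that typeof alone cannot tell an array or null from an object, which is part of why type checks are not clear-cut in JavaScript.

```javascript
// Type check: typeof reports "object" for arrays and null, so handle them explicitly.
function isExpectedType(value, expected) {
    if (expected === "array") return Array.isArray(value);
    if (expected === "null") return value === null;
    if (expected === "object") {
        return value !== null && typeof value === "object" && !Array.isArray(value);
    }
    return typeof value === expected;
}

// Length check: 1 to 140 characters (UTF-16 code units) of any kind.
function isValidLength(text) {
    return /^[\s\S]{1,140}$/.test(text);
}

// Format check: exactly three comma-separated alphanumeric items, e.g. "red,green,blue".
function isThreeItemList(text) {
    return /^[A-Za-z0-9 ]+,[A-Za-z0-9 ]+,[A-Za-z0-9 ]+$/.test(text);
}

// Range check: an integer age between 1 and 120 inclusive.
function isValidAge(value) {
    return Number.isInteger(value) && value >= 1 && value <= 120;
}
```

Each predicate describes what is permitted, so anything unexpected (a fourth list item, an age of 259, a stray angle bracket) fails by default.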

For our widget, we’ll focus on type checks. The other techniques make great extensions to the widget for specific situations. For instance, you could add a length validator for when a UI widget renders the data to the page. What I’ll describe is a skeleton for an input validation widget, and I encourage you to expand on it with the topics discussed in this article.

Restricting Input

The validation widget uses the same setup as the Data Manager from Project Silk. This differs slightly from a standard UI widget because this control does not create an interface; it only cleans data. As with many widgets, we first set up the possible options for the control. These are the default settings, which are replaced when the control is instantiated:

(function (hybrid, $) {
    hybrid.sanitizer = {
        //default options
        options: {
            data: null,             //data to be validated
            dataType: null          //[OPTIONAL] expected data type of the data
        },

There are numerous standard widget methods, such as destroy, _create, and _init, but we will not implement them here. The workhorse is the sanitize method, which takes input from any source, constrains it to our accepted types, and encodes the data so that it is safer to output.

//validate data provided 
        sanitize: function (inputOptions) { 
            var that = this; 
            that.options = $.extend({}, this.options, inputOptions); 
  
            var suppliedDataType = null;         
            var output = null;

First, as in the Project Silk code, we assign this to a variable named that, to avoid confusion in nested functions where this refers to something else. Next, we use the jQuery extend method to create a new options object that combines the default values with the inputOptions values supplied when the sanitize method is called. This step ensures that every option has a value. Two variables hold important information: suppliedDataType, which records the data type recognized from the data itself, and output, which holds what the method returns.
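The merge that $.extend performs can be sketched in plain JavaScript with Object.assign, which also copies properties left to right so that later arguments win. One side effect is worth knowing: because sanitize writes the merged object back to that.options on a shared object, a setting supplied in one call becomes the default for the next. In this widget the effect is small, since dataType is advisory, but keeping the merged result in a local variable avoids it. The function names below are hypothetical.

```javascript
// Defaults mirroring the widget's options object.
const defaults = Object.freeze({ data: null, dataType: null });

// Per-call merge: a new object each time, and the defaults never change.
function resolveOptions(inputOptions) {
    return Object.assign({}, defaults, inputOptions);
}

// Merge written back to shared state, as sanitize does with that.options:
// whatever one call supplies becomes the default for the next call.
const shared = { options: { data: null, dataType: null } };
function resolveOnShared(inputOptions) {
    shared.options = Object.assign({}, shared.options, inputOptions);
    return shared.options;
}
```

Calling resolveOnShared first with dataType "json" and then without a dataType still yields "json"; resolveOptions falls back to null.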

Although we have a dataType setting in the options object, we want to rely on what we can determine from the data. This is one aspect of constraining data. We accept only specific kinds of data. What data you accept is up to you, but for the purpose of this project we need just JSON and XML. With that said, let’s look at some specific concerns about these two data types.

JSON

JavaScript Object Notation (JSON) is a very common data format for mashups. If you use jQuery for your Ajax calls and specify json as your data type, jQuery parses the response for you and hands your success callback a JavaScript object (or reports a parse error if the response is not valid JSON). If you are not using jQuery, how you convert the text to an object matters. Many JavaScript developers reach for eval() when they need to convert text to an object, but there is a better way. Here is some sample code that parses JSON text using jQuery. You could follow the same process using JSON.parse, which is built into newer versions of Internet Explorer (8 and later) and most other browsers. For older browsers that lack it, include Douglas Crockford's JSON2.js as a polyfill.

            if ($.isPlainObject(that.options.data) || $.isArray(that.options.data)) {
                //jQuery already parsed the response (dataType : "json")
                output = that.options.data;
                suppliedDataType = "json";
            }
            else {
                try {
                    //$.parseJSON throws if the text is not valid JSON
                    output = $.parseJSON(that.options.data);
                    if (null !== output)
                        suppliedDataType = "json";
                }
                catch (ex) {
                    //not JSON; fall through to the XML check
                }
            }

If you make an Ajax call using dataType : "json", jQuery has already parsed the response into an object or array, so the first branch accepts that value as is. This also covers data coming back from the cache, which the Data Manager stores after sanitizing it. The try block handles the case when the data type is text, or when multiple data types were requested and for some reason only text was returned: the jQuery parseJSON method checks whether the string passed in the data option is valid JSON. If output has a value (which it will if the text parsed successfully), we set the suppliedDataType variable to json. The parse is wrapped in a try/catch because jQuery’s parseJSON method throws an exception when it cannot parse the data as JSON.

To eval() or Not to eval()

The use of eval() is a tricky subject in the security world. When you describe its function to a security professional, you usually get a reaction that consists of running and screaming. Basically, eval takes data and executes it as code, which means the dreaded crossing of the data and command channels that leads to attacks such as XSS and injection. For developers, however, there are times when eval is the answer to a code problem. As a developer (or development team), you need to balance functionality and security. Security best practices say never to eval untrusted data (which returns us to the earlier examination of what is trusted). Unfortunately, code problems do not always conform to best practices. When you need to use eval, be aware of the risks it introduces into your application, and note its use in your threat models, because the security department will want to know about it.
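The difference between JSON.parse and eval is easy to demonstrate. In this sketch (the payload and function names are made up for illustration), a response that looks like JSON but contains a function call is rejected by JSON.parse with a SyntaxError, while eval runs the embedded code. Never call eval on provider data like this outside a demonstration.

```javascript
// A response body that looks like JSON but smuggles in a function call.
const hostile =
    '{"title": "Flu strikes Kenya", "x": (function () { globalThis.injected = true; return 1; })()}';

// JSON.parse accepts only data, so the function expression is a syntax error.
function parseWithJSON(text) {
    try {
        return { ok: true, value: JSON.parse(text) };
    } catch (err) {
        return { ok: false, error: err.name };
    }
}

// DO NOT do this with untrusted input: eval executes any code in the string.
function parseWithEval(text) {
    return eval("(" + text + ")");
}
```

parseWithJSON(hostile) reports a SyntaxError and leaves globalThis.injected untouched; parseWithEval(hostile) returns an object and, as a side effect, sets globalThis.injected to true.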

JSONP

If you want to retrieve a JSON object from an origin other than your application’s (remember the same-origin policy from the first article?), JSONP might be your answer. JSONP works by inserting a script tag into the page whose src attribute points at the remote site’s JSON endpoint. The remote site receives the request and wraps the JSON data in a call to the function named by the "callback" query string parameter. The browser then executes the returned script in the context of your page (script tags are not restricted by the same-origin policy), which calls your callback function with the JSON data as its argument. When everything is complete, the script tag is removed from the page, and your callback holds the data. For example, let’s say I performed the following JSONP request:

http://travel.contoso.com/alerts/medical-alerts.ashx?callback=viewAlerts

The server would respond with the following script, which the browser executes:

viewAlerts([{ "ID": 1, "Title": "Flu strikes Kenya" }]);


If this sounds like script injection, you’re correct. To skirt the limitations of the same-origin policy, JSONP loads a script from another origin and executes it with the full privileges of your application’s origin. Whatever the provider returns, data or malicious code, runs as if it were part of your application, so JSONP opens a significant security risk. As a best practice, limit use of JSONP to trusted providers and still run the JSON output through your sanitization processes. Unfortunately, some providers offer only JSONP data, and you must accept the risk if you want to access data from those providers. Consult with your information security team about policies and procedures surrounding JSONP (or consumption of external scripting resources). Make note of JSONP implementations in your threat models and document why each is necessary.

Numerous people are brainstorming crafty ways to make JSONP more secure (executing it in an iframe sandbox and standardizing how JSON-P is called are two examples), but until JSONP matures, it remains a security risk to your application.
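The standardization idea can be sketched as a strict check that a JSON-P response is nothing more than callbackName(&lt;valid JSON&gt;). This assumes you can inspect the raw response text before it runs, for example in a server-side proxy; a script tag in the browser executes the response before any check is possible. The function name and the callback-name pattern are assumptions for illustration.

```javascript
// Allow simple identifiers and dotted paths such as "app.viewAlerts".
const CALLBACK_NAME = /^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/;

// Accept a response only if it is exactly  callbackName(<JSON>)  with an
// optional trailing semicolon, and return the parsed JSON payload.
function parseStrictJsonp(responseText, expectedCallback) {
    if (!CALLBACK_NAME.test(expectedCallback)) {
        throw new Error("Invalid callback name");
    }
    const prefix = expectedCallback + "(";
    const body = responseText.trim();
    if (!body.startsWith(prefix)) {
        throw new Error("Response does not call the expected callback");
    }
    let inner;
    if (body.endsWith(");")) {
        inner = body.slice(prefix.length, -2);
    } else if (body.endsWith(")")) {
        inner = body.slice(prefix.length, -1);
    } else {
        throw new Error("Response is not a single callback invocation");
    }
    // JSON.parse throws on anything that is not pure data, such as a
    // second statement smuggled in after the payload.
    return JSON.parse(inner);
}
```

Applied to the viewAlerts response above, this returns the array of alerts; a response such as viewAlerts([]);alert(document.cookie); fails because the text between the parentheses is no longer valid JSON.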

XML

Another common data format is XML. As with JSON, if you use jQuery for your Ajax request and specify XML as the data type, jQuery parses the response into an XML document using your browser’s XML support. jQuery also provides a few good helpers to check XML data and parse it into a valid XML document:

if (null === suppliedDataType) {
                if (jQuery.isXMLDoc(that.options.data)) {
                    suppliedDataType = "xml";
                    output = that.options.data;
                }
                else {
                    try {
                        //$.parseXML throws if the string is not well-formed XML
                        output = $.parseXML(that.options.data);
                        if (null !== output)
                            suppliedDataType = "xml";
                    }
                    catch (ex) {
                        //not XML either; rejected below
                    }
                }
            }

It is possible that we requested dataType : "xml" and jQuery has already returned an XML document, so testing for this condition comes first. The jQuery.isXMLDoc method spares you the chaos of XML in the browser (because different browsers do things differently): it returns true if the value is an XML document or a node within one. If the data is not already XML, the code tries to parse it with the $.parseXML method. When handed a string that is not well-formed XML, parseXML throws an exception (it returns null only when it receives something other than a non-empty string), so the call is wrapped in a try/catch just like the JSON check. After successfully parsing that.options.data as XML, we set suppliedDataType to xml.

What About Other Data Types?

In our application, we accept only JSON and XML; any other type of data is rejected. In input validation terms, we constrain input to JSON and XML and reject all other data types. If we receive text, it must be convertible to JSON or XML, or it will be rejected. This limitation reduces the kinds of data flowing through our system. As we adopt more data types, we can expand the solution, but for now we focus only on "known good data." Figure 2 shows our rejection process.

switch (suppliedDataType) {
                case "json":
                    this._recursiveJSONSanitizer(output, this._sanitizeJSON);
                    break;
                case "xml":
                    this._recursiveXMLNodeSanitizer(output.firstChild, this._sanitizeXML);
                    break;
                default:
                    //reject everything that is not JSON or XML
                    throw new Error("The supplied data type is either not supported or not recognized.");
            }
  
            return output; 
        },

Figure 2: Unless the data is JSON or XML, the application rejects it.

Sanitization: Making Data Safe and Nonexecutable

Now that we have set constraints for what are valid data types and rejected all else, we need to clean up what came in from our Ajax request. This cleanup requires a few subprocesses:

Canonicalize our data
Ensure that injected code is rendered inert
Ensure that only permitted HTML is available

Here we’ll examine the _sanitizeJSON method and how it performs these steps. The _sanitizeXML and _sanitizeJSON methods work the same way: each is called from a recursive method that steps through every node and works with the node's text:

_sanitizeJSON: function (data) {
            var tainted_data = data;
            //throws if multiple encodings are detected
            var canon_data = $.encoder.canonicalize(tainted_data);

            //return the cleaned string; reassigning the data parameter has no effect outside this method
            if (null != window.toStaticHTML)
                return window.toStaticHTML(canon_data);
            else
                return $.encoder.encodeForHTML(canon_data);
        },

First we capture the data in a "tainted" variable. This is just to signal to the developer that the data has not been through the sanitization process yet. Next we check the data for multiple encodings by using the $.encoder plugin from Chris Schmidt. If the canonicalize command detects multiple encodings, it throws an exception, because the presence of multiple encodings is a sign of a security threat. From there we use window.toStaticHTML to render inert any script in the canonicalized data. toStaticHTML is a function built into Internet Explorer that removes dynamic HTML elements and attributes, such as script blocks and event handlers, from an HTML fragment so that malicious script cannot execute in the data channel. If you are targeting other browsers, the $.encoder.encodeForHTML(canon_data) method achieves a similar but stricter effect: rather than removing dangerous markup, it HTML-encodes all markup. The method returns the cleaned string to its recursive caller. Finally, if you have a list of permitted HTML tags and are using encodeForHTML, you could restore each encoded permitted tag with a global replace, as shown here:

return $.encoder.encodeForHTML(canon_data).replace(/&lt;p&gt;/g, "<p>").replace(/&lt;\/p&gt;/g, "</p>");

If you are permitting tags to be rendered, maintain a list of acceptable tags in your threat model; the security team needs to be aware of this information. Some HTML tags, such as the formatting tags <strong>, <em>, and <blockquote>, are considered safer than others. Be wary of the following tags (the list comes from http://msdn.microsoft.com/en-us/library/ff649310.aspx), because they are known to be used to inject script or other nasty code into your application:

<applet>
<body>
<embed>
<frame>
<script>
<frameset>
<html>
<iframe>
<img>
<style>
<layer>
<link>
<ilayer>
<meta>
<object>
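The sanitization steps above (canonicalize, encode, then restore a short list of permitted tags) can be sketched without jQuery. This is not Chris Schmidt's $.encoder plugin: canonicalize, decodeOnce, encodeForHTML, and restoreAllowedTags are hypothetical stand-ins that decode only a small set of HTML entities plus percent-encoding, treat more than one layer of encoding as an attack, and restore only bare, attribute-free tags that are not on the list above.

```javascript
// Tags from the list above that must never be restored, even if allowlisted.
const DANGEROUS = new Set(["applet", "body", "embed", "frame", "script", "frameset",
    "html", "iframe", "img", "style", "layer", "link", "ilayer", "meta", "object"]);

const NAMED_ENTITIES = { lt: "<", gt: ">", amp: "&", quot: '"', apos: "'" };

// Remove one layer of encoding: a few HTML entities, then percent-encoding.
function decodeOnce(text) {
    const decoded = text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, ref) => {
        if (ref[0] === "#") {
            const isHex = ref[1] === "x" || ref[1] === "X";
            const code = isHex ? parseInt(ref.slice(2), 16) : parseInt(ref.slice(1), 10);
            return code <= 0x10ffff ? String.fromCodePoint(code) : match;
        }
        return NAMED_ENTITIES[ref.toLowerCase()] ?? match;
    });
    try {
        return decodeURIComponent(decoded);
    } catch (err) {
        return decoded; // a stray "%" is not valid percent-encoding; keep the text
    }
}

// Decode to a canonical form, rejecting input that was encoded more than once.
function canonicalize(text) {
    let current = text;
    let layers = 0;
    while (layers <= 1) {
        const next = decodeOnce(current);
        if (next === current) return current;
        current = next;
        layers++;
    }
    throw new Error("Multiple encoding detected");
}

// HTML-encode the characters that matter in element content and attributes.
function encodeForHTML(text) {
    const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
    return text.replace(/[&<>"']/g, ch => map[ch]);
}

// Turn encoded, attribute-free permitted tags back into markup.
function restoreAllowedTags(encoded, allowlist) {
    let out = encoded;
    for (const tag of allowlist) {
        if (DANGEROUS.has(tag)) throw new Error("Tag not permitted: " + tag);
        out = out.split("&lt;" + tag + "&gt;").join("<" + tag + ">")
                 .split("&lt;/" + tag + "&gt;").join("</" + tag + ">");
    }
    return out;
}

function sanitize(text, allowlist) {
    return restoreAllowedTags(encodeForHTML(canonicalize(text)), allowlist);
}
```

Because restoration matches only the exact encoded form of a bare tag, a permitted tag carrying attributes (for example, a p tag with an onclick handler) stays encoded and inert.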

I have not covered all aspects of the sanitization widget. The complete code for the sanitization widget can be found at the MSDN Code Gallery. It is a shell that I encourage you to build on; please submit your improvements.
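The recursive method that feeds _sanitizeJSON is not listed in this article. A minimal stand-in, assuming the data has already been parsed, walks objects and arrays and passes every string, keys included, to a sanitizer function. sanitizeJSONValue is a hypothetical name, and unlike an in-place walker it returns a new structure and leaves the input untouched.

```javascript
// Walk a parsed JSON value and pass every string (values and keys) to
// sanitizeString. Numbers, booleans, and null pass through unchanged.
function sanitizeJSONValue(value, sanitizeString) {
    if (typeof value === "string") return sanitizeString(value);
    if (Array.isArray(value)) {
        return value.map(item => sanitizeJSONValue(item, sanitizeString));
    }
    if (value !== null && typeof value === "object") {
        // Object.fromEntries defines own properties, so a "__proto__" key
        // from the provider stays plain data instead of changing the prototype.
        return Object.fromEntries(
            Object.entries(value).map(([key, child]) =>
                [sanitizeString(key), sanitizeJSONValue(child, sanitizeString)])
        );
    }
    return value;
}
```

Any string sanitizer can be plugged in, for example an HTML encoder like the one sketched earlier or the widget's own _sanitizeJSON.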

For now we will close up the widget:

} 
} (this.hybrid = this.hybrid || {}, jQuery));

and then use it as part of the sendRequest method in Project Silk’s Data Manager, as shown in Figure 3.


sendRequest: function (options) { 
            var that = hybrid.dataManager; 
            var cachedData = hybrid.dataStore.get(options.url); 
  
            var callerOptions = 
              $.extend({ cache: options.cache }, 
                 that.dataDefaults, options); 
  
            if (callerOptions.cache && cachedData) { 
                var sanitized_cacheData = 
                  hybrid.sanitizer.sanitize({ 
                    data: cachedData, 
                    dataType: callerOptions.dataType }); 
                options.success(sanitized_cacheData); 
                return; 
            } 
  
            callerOptions.success = function (data) { 
                var tainted_data = data;    //setup the data as a tainted component that needs  
                                            //sanitization 
                var sanitized_data = 
                  hybrid.sanitizer.sanitize({ 
                    data: tainted_data, 
                    dataType: callerOptions.dataType }); 
  
                if (callerOptions.cache) { 
                    hybrid.dataStore.set(callerOptions.url, 
                      sanitized_data); 
                } 
                options.success(sanitized_data); 
            }; 
  
            $.ajax(callerOptions); 
        },

Figure 3: Integrating our widget with the Project Silk Data Manager

Comment Topic

In the world of mashups, does the mantra "All input is evil" still apply? Can you build a great application if you distrust all the data and functionality that makes up the application?

Constrain, Reject and Sanitize

This article focuses on the activities of input validation. I began with a discussion of trust and its role in the mashup application universe. Using the Data Manager from Project Silk, we then looked at a widget that could be used to perform the three elements of input validation on data returned from an Ajax request. This is yet another layer in a defense-in-depth strategy for protecting users of a mashup application. In the next and final article, I’ll look at what might be the strongest line of defense, the built-in security features of modern browsers, like XSS Filtering, SafeMode and defenses against more than the OWASP Top Ten.

About the Author

This article was written by Tim Kulp. For the past ten years Tim has been building web applications using JavaScript, ASP.NET and C#. Now Tim leads the development team at FrontierMEDEX building online medical and security intelligence tools. Tim is a CISSP and CEH specializing in secure software development.

Find Tim on:

Tim's Blog
Twitter: @seccode

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)