Introduction
As the internet has evolved over the last decade, the infrastructure required to host its services and the development paradigms that support such scale have changed significantly. This growth in infrastructure has given birth to varied technologies, for example the data centre in a container and the "cloud on a chip" [Intel]. Similarly, software development languages and paradigms have evolved from scripts [Perl, csh, and bash] hosted via cgi-bin to virtual machine based application servers like Tomcat, .NET, Parrot. The adoption of modern languages to support such large-scale development and parallelism has also brought changes in architectural styles. The styles currently in vogue, and which have demonstrably been successful, use a Service Oriented Architecture. Such styles have found coherence with the underlying infrastructure and development kits, which has helped communities adopt them. Currently, REST based systems show much promise in handling the conceptual model required for development and deployment on large-scale infrastructure. In this article, we present the birth of the idea of REST, its concepts and its subsequent industry adoption.
Representational State Transfer
REST was first introduced and defined in 2000 by Roy Fielding at the University of California, Irvine, in his academic dissertation, "Architectural Styles and the Design of Network-based Software Architectures". It is an architectural style derived from several existing network architectural styles. It acts as a guiding framework for web standards and for designing web services. It exploits the full potential of the web by using existing web standards and adding constraints to them, in order to model a mature distributed hypermedia system.
REST Architectural Style
When designing any distributed hypermedia system (such as the modern web architecture), properties such as scalability, simplicity and visibility should be ensured. Existing network architectural styles each provide some of these properties, but no single existing style combines them all. REST evolved by identifying the strengths and weaknesses of the existing network styles. The network styles that compose REST are:
- Client-Server: The client-server constraint ensures that the concern related to the view of data (i.e. the user interface) is separated from the way the data is stored on the server. This loose coupling between the data and its view helps the client to be ported across various technologies. The server also becomes independent of any technology constraint of the client. Hence, both the server and the client can evolve independently.
- Stateless: The stateless constraint ensures that every request is complete and independent of other requests. This simplifies monitoring. There is no need to correlate many requests. Failures can be recovered quickly just by analysing a single request. Scalability is ensured as no client state is stored in the Server. The Server can free up its resources as soon as a single request is processed.
- Cacheable: Caching eliminates some client-server interaction. It caches the information which does not change very often. This improves overall user-perceived performance. Adding caching to various components helps to further reduce network and bandwidth usage. For example, by implementing a cache at the client side (such as a proxy), requests need not always be sent to the origin server. Hence, the information can be served to the client quickly.
- Layered System: By distributing the overall system into various layers, the complexity of the system can be reduced. Each layer has to only know how to interact with the next layer. Each layer has its own responsibility. Hence, the entire system can be managed easily.
- Code on Demand: The client can download the functionality and execute the code at runtime. The executable code can be in the form of applet or script. However, this constraint cannot always be ensured. Some components such as a firewall may not allow the executable code to pass through the network. Due to this reason, this constraint is added as optional in REST.
- Uniform Interface: When the system consists of many components, the communication between them becomes difficult. All the components need to understand the contracts and the protocol of the communication. By having a uniform interface, components do not have to understand application specific semantics. It makes the communication much easier. In order to attain the uniform interface, there are some constraints added on the behavior of the components. This behavior is also known as Data Elements in REST style.
REST Architectural Elements
Data Elements
The key aspect of REST is the nature and state of its data elements. In a REST style, there are four concepts that depict the behavior and state of the information. They are:
- Resource: It is an entity (logical or physical) that is available on the web. It can be a document stored in a server file system or a row in a database table. The end user interacts with the resource to reach some goal. To design a system with REST, a designer has to think of business entities as resources and decide how these resources can be made addressable.
- URI: It uniquely identifies a resource. It makes a resource addressable and capable of being modified. A resource and its URIs have a one-to-many relationship: one resource may be identified by several URIs, but each URI identifies exactly one resource. Resources are changed using an application protocol such as HTTP.
- Representation: It is the data and metadata of the resource. A representation is a view of a resource's state at an instant of time. The client receives the representation of a resource when its URI is requested. A view of a resource can be encoded in one or more transferable formats such as XML, HTML, JSON, RSS, etc. The format can be chosen through the content negotiation mechanism.
- Link: It allows the application to make a transition from one state to another. Each resource should be connected to other resources. A representation should offer links to the next possible transitions. A well connected application allows the user to discover the interface on their own.
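To make the four data elements concrete, here is a minimal sketch in Python. It assumes a hypothetical user resource at /user/jack and renders its state either as JSON or as XML depending on the Accept header, embedding links to related resources. The URIs, the USER dictionary and the render function are illustrative assumptions, and the negotiation is deliberately naive (it ignores q-values).

```python
import json
from xml.etree import ElementTree as ET

# Hypothetical resource state for the user identified by /user/jack.
USER = {
    "name": "Jack",
    "links": {"self": "/user/jack", "messages": "/user/jack/messages"},
}

def render(resource, accept):
    """Return (content_type, body) for the first supported type in Accept."""
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in ("application/json", "*/*"):
            return "application/json", json.dumps(resource, sort_keys=True)
        if media_type == "application/xml":
            root = ET.Element("user")
            ET.SubElement(root, "name").text = resource["name"]
            # Links tell the client which state transitions are available.
            for rel, href in resource["links"].items():
                ET.SubElement(root, "link", rel=rel, href=href)
            return "application/xml", ET.tostring(root, encoding="unicode")
    return None  # nothing acceptable; a server would answer 406 Not Acceptable
```

The same resource state yields two different representations, and in both of them the client finds the link to the messages resource instead of having to construct that URI itself.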
Connector
A connector is an abstract interface that mediates communication between the components. As REST interactions are stateless, the connector does not have to store any state information. Hence, communication between the components can happen in parallel.
Client and server are the primary REST connectors. A client initiates the request and a server processes it. Examples:
- Server Connector: Apache API
- Client Connector: Http Client
Cache is another connector. Caching can be implemented at client, server or intermediary layers. It reduces latency and network usage. Various types of cache are:
- Local cache: It stores representations of resources from various origin servers on behalf of a single user-agent.
- Proxy cache: It stores representations from various origin servers on behalf of many user-agents.
- Reverse proxy cache: It stores representations from one origin server on behalf of many consumers.
Resolver: It translates a partial or complete request into the actual address of the resource. It is responsible for initiating and sequencing the queries that ultimately lead to a full resolution of the resource. An example is a DNS resolver translating a domain name into an IP address.
Components
Components perform a set of well-defined methods on a resource producing a representation to capture the current or intended state of that resource.
- User-Agent: It uses client connector to initiate the request.
- Origin-server: It uses server-connector to respond to the request.
- Proxy: It is an intermediary used at client end to provide interface encapsulation of other services. It also performs data translation and security protection.
- Gateway: It is an intermediary used at the server end to provide interface encapsulation of other services. It also performs data translation and security protection.
HTTP and REST
The Hypertext Transfer Protocol (HTTP) has a very special role in web architecture. Its primary role is to work as an application protocol for the communication between various web components, through which resource representations are transferred. Many changes were made to HTTP to support the modern web architecture.
HTTP and Cache
REST has enforced some guidelines on the usage of HTTP headers in order to benefit from caching, which improves network efficiency. The idea is to constrain the semantics of these HTTP headers when they are used with different HTTP methods. For example, for the HTTP GET and HEAD methods these headers have the semantics of a cache refresh (Cache-Control); for other HTTP methods, they have the meaning of a precondition.
The caching information is sent as control data alongside the representation of the resource. This control data, such as the Cache-Control header, tells the client whether or not to cache the response. The "Vary" header was introduced in HTTP 1.1. It lists all the request headers on which the response representation may depend. For example, if the "Vary" header lists Accept, then a request with a different Accept header value may receive a different representation. A cache uses this header to decide whether a stored response can be returned to a client.
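The effect of "Vary" on a cache can be sketched as follows. This is a simplified illustration rather than a complete HTTP cache: the cache_key function and the in-memory dictionary are assumptions made for the example, and a real cache would read the Vary value from the stored response instead of receiving it as an argument.

```python
def cache_key(method, uri, request_headers, vary):
    """Build a cache key from the URI plus the request headers named in Vary."""
    # Header names are case-insensitive, so normalise them before lookup.
    headers = {name.lower(): value for name, value in request_headers.items()}
    varying = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (method, uri) + tuple((n, headers.get(n, "")) for n in varying)

cache = {}
# Store the XML response; the origin server sent "Vary: Accept" with it.
xml_key = cache_key("GET", "/user/jack", {"Accept": "application/xml"}, "Accept")
cache[xml_key] = "<user><name>Jack</name></user>"
```

A later request for the same URI with Accept: application/json produces a different key, so the cached XML is not served to a client that asked for JSON.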
When caching is introduced in various layers, it is assumed that the consumer can sometimes work with stale data. To increase the consistency of the cached data with the actual data, certain techniques are used in REST. Invalidation is the process of notifying consumers and caches about changes to the resources for which they hold cached representations. But to notify the consumers, the server needs to keep a list of all of them, which is against the REST guidelines. To resolve this, another technique known as validation is used. Here the consumer (or cache) keeps its copy of the resource representation up to date by verifying the local copy with the origin server. This requires a validation request to be sent to the server. For re-validating the local copy against the origin server, REST uses two HTTP response headers in conditional GETs: ETag, checked with the If-Match and If-None-Match request headers, and Last-Modified, checked with the If-Unmodified-Since and If-Modified-Since request headers.
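The validation step can be sketched on the server side as follows. The function names are hypothetical, and the sketch only handles a single entity tag in If-None-Match (the header may also carry a list of tags or "*").

```python
import hashlib

def make_etag(body):
    """Derive an entity tag from the representation bytes."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def conditional_get(body, request_headers):
    """Return (status, headers, body), honouring If-None-Match."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The cached copy is still valid: send headers only, no body.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body
```

A cache first fetches the representation and remembers its ETag, then sends that value back in If-None-Match. As long as the representation is unchanged, the server answers 304 (Not Modified) and the body does not cross the network again.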
Problem Areas in HTTP
The key problem areas in HTTP that were identified by REST:
- Deployment of new protocol versions: Communication between the components requires that each connector obeys the constraints placed on the HTTP-version protocol element included in each message. REST addresses this by using the minor and major version numbers of the HTTP protocol. The HTTP-version of a message represents the protocol capabilities of the sender and the gross compatibility (major version number) of the message being sent. For example, if the sender sends a request with HTTP 1.0, the server should send a response that is HTTP 1.0 compatible. With this, each connection on a request/response chain can operate at its best protocol level in spite of the limitations of some clients or servers that are part of the chain. This mechanism makes the system extensible.
- Separating message parsing from HTTP semantics: HTTP separates the logic of parsing and forwarding messages from the semantics associated with the protocol elements. For example, in a response to HEAD the Content-Length field does not describe the length of the response actually sent. HEAD is similar to GET, except that only the meta information (headers such as Content-Length) is returned and no body is sent, so Content-Length here means the length of the body that would have been sent had the request been a GET. The same separation applies to the general classes of response codes:
  - 100-199 indicating that the message contains a provisional information response
  - 200-299 indicating that the request succeeded
  - 300-399 indicating that the request needs to be redirected to another resource
  - 400-499 indicating that the client made an error that should not be repeated
  - 500-599 indicating that the server encountered an error
- Self descriptive messages: Each message contains, in a standardized form, all the information needed to process it. REST introduces some headers and techniques to enable this, such as:
- Host: The Host header was added in HTTP 1.1. It enables several domains to share one IP address. It requires every request to carry this header with the host name. Currently there are many browsers that do not send the Host header in the request, so applying this mechanism at a global level may take some time.
- Content Negotiation: Content negotiation is the way in which the representation of the resource can be negotiated and rendered as per the client's choice.
- Persistent connections: An HTTP persistent connection, also called HTTP keep-alive or HTTP connection reuse, allows the same TCP connection to carry multiple HTTP requests and responses. In HTTP 1.1, all connections are persistent by default. To opt out of this default, the sender includes the connection directive "close" (Connection: close) in the message. Persistent connections reduce resource usage and network congestion, as fewer TCP connections are opened.
- Authoritative and non-authoritative responses: When a request is sent from the user-agent, the user-agent does not know whether the response it receives comes from the origin server or from some intermediary component such as a cache. This may lead to a situation where the user-agent wants fresh data but receives stale data. To resolve this problem, some headers were introduced that let the user-agent control the response. Adding the header Cache-Control: no-cache to the request makes sure that the origin server is consulted for the response. Using this in bulk may cause performance issues. The other solution is the "Warning" header, which warns the user-agent that the response is not from the origin server.
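The response code classes listed earlier in this section let a component react to a status code it has never seen before, because only the first digit needs to be understood. A minimal sketch, where the function name and the class labels are illustrative choices:

```python
def status_class(code):
    """Map an HTTP status code to its general class using the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError("not an HTTP status code: %r" % code)
    return classes[code // 100]
```

An intermediary that does not know a newer status code can still treat it as a client error or a server error and forward or handle the message accordingly.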
Restful Web Services
RESTful web APIs are web services which are implemented using HTTP and follow the REST guidelines. The strength of REST web services lies in their simplicity of development and testing. Despite the existing WS-* standards, REST is gaining popularity day by day, and the reason behind it is its simplicity and its use of existing standards.
When designing REST web services, the application domain and its entities are studied. The entities which need to be exposed are the resources. In a simple chat application, the user and the messages posted by the user are resources. To access a resource, there should be some endpoint that the client can use. These endpoints are known as URIs. In the chat application, the user and the messages are accessed using URIs such as /user/, /user/messages/, etc. When a resource is fetched, its representation is returned by the service. For example, a user representation in XML will look like <user><name>Jack</name></user>. The format of the data in the representation can be negotiated by the client using content negotiation. The response and error codes sent by REST web services are the standard HTTP status codes.
In REST, CRUD (Create, Read, Update, Delete) operations are performed using HTTP verbs. An HTTP verb tells the server what to do with the data identified by the URL. The most important HTTP verbs for a RESTful system are GET, POST, PUT and DELETE.
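The pairing of HTTP verbs with URIs can be sketched as a small routing table for the chat application. The URI templates below (with a user name in the path) and the dispatch function are hypothetical; they only illustrate that the verb selects the CRUD operation while the URI selects the resource.

```python
import re

# Hypothetical URI templates for the chat example: (verb, pattern, operation).
ROUTES = [
    ("POST", r"^/user/$", "create"),
    ("GET", r"^/user/(?P<name>[^/]+)$", "read"),
    ("PUT", r"^/user/(?P<name>[^/]+)$", "update"),
    ("DELETE", r"^/user/(?P<name>[^/]+)$", "delete"),
    ("GET", r"^/user/(?P<name>[^/]+)/messages/$", "read"),
    ("POST", r"^/user/(?P<name>[^/]+)/messages/$", "create"),
]

def dispatch(method, path):
    """Return (operation, path parameters), or None if nothing matches."""
    for verb, pattern, operation in ROUTES:
        match = re.match(pattern, path)
        if verb == method and match:
            return operation, match.groupdict()
    return None
```

For brevity the sketch returns None both for an unknown URI and for an unsupported verb; a real service would distinguish the two with 404 (Not Found) and 405 (Method Not Allowed).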
Creating a resource with POST
The client can create a resource that does not yet exist in the system using POST. A POST request either creates the new resource or fails with an error. If the resource is created successfully, HTTP status code 201 (Created) is returned to the client (some services return 200 (OK) instead), and the link to the newly created resource is included in the response, typically in the Location header. In case of error, either 400 (Bad Request) or 500 (Internal Server Error) is sent to the client.
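A minimal sketch of resource creation with POST, using an in-memory dictionary as a stand-in for real storage. The post_user function, the numeric identifiers and the /user/ URI prefix are assumptions for illustration. The sketch answers with 201 (Created), the status HTTP defines for a successful creation, and returns the new URI in the Location header.

```python
import itertools

users = {}                      # in-memory stand-in for real storage
_next_id = itertools.count(1)   # the server, not the client, chooses identifiers

def post_user(representation):
    """Create a user under a server-generated URI."""
    if "name" not in representation:
        return 400, {}, {"error": "name is required"}
    uri = "/user/%d" % next(_next_id)
    users[uri] = dict(representation)
    # Location carries the link to the newly created resource.
    return 201, {"Location": uri}, users[uri]
```

Because the server generates the URI, sending the same POST twice creates two distinct resources, which is why POST is not considered idempotent.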
Getting resource state with GET
The client fetches the representation of the resource using GET. The outcome of a GET request is either 200 (OK), i.e. the representation is successfully fetched, or an error such as 404 (Not Found), 400 (Bad Request) or 500 (Internal Server Error).
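A GET handler can be sketched in a few lines. The users dictionary and the get_user function are hypothetical; the point of the sketch is that GET only reads state, so repeating it has no effect on the server.

```python
users = {"/user/1": {"name": "Jack"}}   # hypothetical stored resources

def get_user(uri):
    """Fetch a representation without changing server state."""
    if uri not in users:
        return 404, None
    # Return a copy so that callers cannot mutate the stored resource.
    return 200, dict(users[uri])
```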
Updating resource with PUT
PUT is used to update or create the resource identified by a URI generated by the client. There is a convention for when to use PUT and when to use POST, as the choice can be confusing. The URI of a resource is generated either by the client or by the server. If the resource can be identified by a client generated identifier, the client can pass the URI when creating the new resource; this is called a client generated URI. If the identifier is generated by the server, the URI is a server generated URI. The general convention is summarized below:
- Use POST to create a resource identified by a server generated URI.
- Use POST to append a resource to a collection identified by a server generated URI.
- Use PUT to create or update a resource identified by a client generated URI.
The outcome of a PUT request is either success, reported as 200 (OK) or 204 (No Content), or an error, which can be 404 (Not Found), 409 (Conflict) or 500 (Internal Server Error).
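The convention for client generated URIs can be sketched as follows. The put_user function and the in-memory store are assumptions for the example. In addition to the codes listed above, the sketch answers 201 (Created) when the PUT creates the resource, which is the status HTTP defines for that case.

```python
def put_user(store, uri, representation):
    """Create or replace the resource at a client-chosen URI."""
    created = uri not in store
    store[uri] = dict(representation)
    # 201 when the PUT created the resource, 200 when it replaced one.
    return (201 if created else 200), store[uri]
```

Sending the same PUT twice leaves the store in the same state as sending it once. This idempotence is what distinguishes PUT from POST and makes a PUT safe to retry after a network failure.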
Removing resource with DELETE
When the client decides to delete a resource from the system, it can issue a DELETE request. The possible outcomes are 200 (OK) or 204 (No Content) when the resource is successfully deleted, or an error, which can be 404 (Not Found), 405 (Method Not Allowed) or 503 (Service Unavailable).
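A corresponding sketch for DELETE, again with a hypothetical function name and an in-memory store. It answers 204 (No Content) on success because there is no representation left to return.

```python
def delete_user(store, uri):
    """Remove the resource at uri and return the HTTP status code."""
    if uri not in store:
        return 404
    del store[uri]
    return 204
```

In this sketch a repeated DELETE returns 404 because the resource is already gone. The server state is the same after one or many calls, so the operation is still idempotent.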
REST and Security
REST does not provide any built-in support for security. When designing REST web services, it is very important that the security requirements and design are decided upfront. REST web services use HTTP GET, POST, PUT and DELETE for CRUD operations. PUT and DELETE are not supported by many browsers and are often disabled at the server level because of their security implications. If the server and client are not properly configured, anybody will be able to create a resource using the PUT method or destroy a resource using DELETE. When designing the security requirements for the web services, the following points should be considered.
- Decide which services are public and which are protected.
- For all the protected services, use authentication models with strong encryption. Consider using HMAC (Hash-based Message Authentication Code), which combines a secret key with a hash function and is therefore more secure than a plain SHA (Secure Hash Algorithm) or MD5 (Message Digest Algorithm) digest.
- HTTP methods such as PUT and DELETE call for different security principles at the server and application levels.
- If state is maintained with the help of a unique token ID, make sure that the token ID is encrypted.
- Any key that is shared between client and server should be encrypted.
- Always make sure that the authorization rules are defined for the services that are exposed.
- If any third party is involved, do not share the credentials with it, consider using OAuth. Twitter has adopted OAuth for authentication and authorizing REST APIs.
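The HMAC recommendation above can be sketched with Python's standard hmac module. The message layout (method, path and body joined by newlines) and the function names are assumptions for illustration; real services define their own canonical string and usually add a timestamp or nonce to prevent replay, which this sketch omits.

```python
import hashlib
import hmac

def sign(secret, method, path, body):
    """Compute an HMAC-SHA256 signature over the parts of a request."""
    message = b"\n".join([method.encode(), path.encode(), body])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify(secret, method, path, body, signature):
    """Check a signature using a constant-time comparison."""
    expected = sign(secret, method, path, body)
    return hmac.compare_digest(expected, signature)
```

The client sends the signature in a request header and the server recomputes it with the shared secret. A request whose method, path or body was altered in transit no longer verifies, and the secret itself never travels over the network.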
Commercial adoption of REST
Twitter/Amazon: Twitter and Amazon both expose REST APIs. Twitter exposes the core Twitter data, while Amazon's Simple Storage Service (S3) exposes APIs to store and retrieve any amount of data. Amazon exposes both REST and SOAP web services, but REST is more widely accepted because clients can consume it easily. The Uniform Interface constraint of REST ensures that the services are easy to consume.
Akamai (Content Delivery Network): Akamai is a content delivery system which stores content from the origin servers of its customers. The idea is that whenever a user-agent asks for content from an origin server, the request comes to Akamai and the content is delivered from the geographically closest server. Akamai has extended the REST caching concept by replicating the server to various locations and delivering the content to the user from the geographically nearest replica. The stateless nature of requests and representations makes it possible to cache the data and to process requests independently on the replicated servers. Network performance improves because a request does not have to go to the origin server every time and can be served by a replica.
REST has been accepted by many commercial applications because of its lightweight nature. Other architectural styles may offer these features through customized protocols, but the beauty of REST lies in the fact that it uses the existing web standards to exploit the benefits of the web in a much less complex and more general way.
REST v/s SOAP-based web service(WS*)
- A SOAP-based web service is protocol specific, while REST is just a guideline.
- To use a SOAP-based web service, the client needs to create a proxy (stub) for the web service and then invoke the method on the proxy, while in REST the client can just use the URI of a web service and invoke the service with an HTTP GET on that URI.
- A SOAP-based web service requires help from external vendors to create proxy classes, etc., while in REST no external vendor is required.
- SOAP uses HTTP as a transport protocol for transferring SOAP messages. The operation that needs to be executed is named inside the SOAP message. REST uses HTTP as an application protocol. The HTTP verbs themselves state the operation that needs to be performed.
- REST web services are stateless, which helps in scaling the system. The data format transported in REST can be chosen as per the client's wish, which also helps to optimize performance.
At the level of principle, both WS-* and REST provide the same features. At the conceptual and technology level, however, WS-* is complex but is supported by various tools that hide most of that complexity, while REST is simple but requires a lot of low-level coding.
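The difference between HTTP as a transport protocol and HTTP as an application protocol can be seen by building the two requests side by side. The sketch below only constructs the request objects and sends nothing; the example.com URLs, the GetUser operation and the envelope contents are hypothetical.

```python
import urllib.request

# SOAP: the operation (GetUser) is named inside the message body,
# and HTTP POST merely transports the envelope.
SOAP_BODY = b"""<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body><GetUser><name>Jack</name></GetUser></soap:Body>
</soap:Envelope>"""

soap_request = urllib.request.Request(
    "http://example.com/UserService",
    data=SOAP_BODY,
    headers={"Content-Type": "text/xml; charset=utf-8"},
    method="POST",
)

# REST: the HTTP verb is the operation and the URI names the resource.
rest_request = urllib.request.Request("http://example.com/user/jack", method="GET")
```

An intermediary looking at the REST request can tell from the first line alone that it is a safe, cacheable read, whereas the SOAP request would have to be parsed to learn what it does.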
Conclusion
From simple RPC to SOAP-based web services and now to REST, the web has seen a lot of changes, and more changes are inevitable. From Web 1.0 to Web 2.0, everyone has become involved in the web. In the last few years REST has seen much wider commercial adoption, for example at Amazon and Yahoo, and the reason has been its simplicity and ease of use. REST has evolved as a foundation of the modern web architecture, by analyzing the flaws of the pre-existing styles and introducing new additions to them. The aim of REST is to use the existing styles with a coordinated set of constraints, to minimize latency and network communication and to maximize the independent evolution of the components in order to attain scalability. It is a novel architecture for distributed hypermedia systems. With more and more devices such as mobile phones and iPads, the adoption of the web and the demands on its scalability will shoot up. The future is not far off in which websites will want to turn into web services. For the consumption of these web services, a more standardized style is needed, which REST gives. In the future, we may see SOAP and REST go along together, with SOAP adopting REST guidelines.
Future of Web
In Web 2.0, humans are responsible for creating the data. The focus of Web 3.0 will be machines creating the data. Studies are being done on the important features of Web 3.0, i.e., the Semantic Web and personalization. Work has started on the Semantic Web, where data is not just about navigating through links. It is more focused on understanding the context of the data and the relationships within a given knowledge domain. Various technologies, such as the Resource Description Framework and ontologies, are intended to provide these features. How the REST guidelines help to achieve these features is something to watch in the future.
History
- 12th September, 2011: Initial version