This is a beginner's guide to how HTTP works.
Introduction
As a daily routine, you launch a web browser, type an address such as http://peterleowblog.com into the address bar, and wait while a web page loads. A typical web page consists of text, images, and links to other web pages. You can then navigate to another web page by clicking on one of those links. For ordinary users, that is all they care about. For web developers, however, there is more to this run-of-the-mill practice than meets the eye: the unseen conversation between the browser and the web server that is triggered by each page request.
The protocol that governs the communication between the browser and the web server is none other than the well-known HTTP (HyperText Transfer Protocol). HTTP is a text-based, stateless protocol: the server does not remember prior requests. A typical HTTP session starts with the client, usually a web browser, establishing a connection to the web server, followed by a series of request-response cycles that, in a nutshell, go like this:
Step 1. The client sends its request formatted as an HTTP request message to the web server via a URL, e.g. http://www.example.com, and waits for the response.
Step 2. On receiving a request, the web server at the URL processes the request and sends its answer back to the client formatted as an HTTP response message.
Step 3. Repeat from Step 1 for each subsequent request.
Apart from serving static HTML files, the web server may return dynamic content that is generated on the fly by server-side scripts and database operations, with the help of other software such as the PHP engine and MySQL.
Both request and response messages share a similar structure — each consists of a list of text directives, separated by CRLF (carriage return, followed by line feed), and organized into three sections: a start line section at the beginning, a header section that contains some header fields and an ending blank line in the middle, and a data section that contains any payload at the end of the message.
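The three-section structure described above can be sketched in a few lines of Python. This is only an illustration: the function name split_message and the sample message are made up for this example, and real HTTP parsers handle many more cases.

```python
def split_message(raw: bytes):
    """Split a raw HTTP/1.x message into start line, header lines, and body."""
    # The first blank line (CRLF CRLF) marks the end of the header section.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line, header_lines = lines[0], lines[1:]
    return start_line, header_lines, body


# A made-up response, used only to illustrate the three sections.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)

start_line, header_lines, body = split_message(sample)
print(start_line)     # HTTP/1.1 200 OK
print(header_lines)   # ['Content-Type: text/plain', 'Content-Length: 5']
print(body)           # b'hello'
```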
Seeing is Believing
Let's walk through an example: on the document root of your local web server, create a directory called testsite that contains an HTML file named index.html and an image file named ball.png. The HTML file contains the following HTML markup:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>HTTP Headers</title>
</head>
<body>
<h1>HTTP Headers</h1>
<img src="ball.png">
</body>
</html>
In a browser, this HTML file will be rendered as a web page as shown in Figure 1:
Figure 1: index.html
To get the page shown in Figure 1, start the web server, then enter this URL in the browser address bar:
http://localhost/testsite/index.html
This way, you do not get to see the raw conversation that took place over HTTP. Let's skip the browser part and access the index.html via a telnet session from a text terminal, such as the Command Prompt of Windows, instead. Follow me...
HTTP over Telnet
In the Command Prompt of Windows, type:
telnet localhost 80
and press Enter to open a connection to the web server on port 80. Next, copy and paste the following text (including the ending blank line which is mandatory to signal the end of the header section) to the terminal and hit the Enter key:
GET /testsite/index.html HTTP/1.1
Host: localhost
Accept: text/html
Unknowingly, you have just composed and submitted an HTTP request message (which is usually done by the browser) to the web server.
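For readers without a telnet client, the same exchange could be sketched with a plain TCP socket in Python. The helper names build_get_request and fetch_raw are made up for this example, and the Connection: close header is an addition (not part of the telnet request above) so that the server closes the connection once the reply has been sent.

```python
import socket


def build_get_request(path: str, host: str) -> bytes:
    """Compose the GET request typed into telnet, plus Connection: close."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: text/html",
        "Connection: close",  # assumption: ask the server to close when done
        "",                   # blank line: end of the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host: str, port: int, request: bytes) -> bytes:
    """Send the request over a plain TCP socket and return the raw reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


request = build_get_request("/testsite/index.html", "localhost")
print(request.decode("ascii"))
# With the local web server running, the reply could be read with:
# print(fetch_raw("localhost", 80, request).decode("iso-8859-1"))
```

The network call is left commented out because it only works while the local web server from this walkthrough is running.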
On receiving the HTTP request message for index.html, the web server locates the file and embeds its content in an HTTP response message, which it returns to the client. In the terminal, the response may read like this:
HTTP/1.1 200 OK
Date: Thu, 16 Nov 2017 16:40:10 GMT
Server: Apache/2.4.23 (Win32) OpenSSL/1.0.2h PHP/5.6.28
Last-Modified: Thu, 16 Nov 2017 16:28:27 GMT
ETag: "a4-55e1c1c1e1486"
Accept-Ranges: bytes
Content-Length: 164
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>HTTP Headers</title>
</head>
<body>
<h1>HTTP Headers</h1>
<img src="ball.png">
</body>
</html>
View the whole process as animated in Figure 2:
Figure 2: HTTP over Telnet
Note
An HTTP request message starts with a request line that consists of the method to be applied to the resource (GET), the identifier of the resource (/testsite/index.html), and the HTTP protocol version in use (HTTP/1.1). This is followed by some request headers that give the web server additional information about the request in the form of name: value pairs (Host: localhost and Accept: text/html). Observe that the request headers section ends with a blank line, after which the data section follows if there is any payload to send to the server; in this example there is none. We will explore an example of an HTTP request message with payload shortly.
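As a rough sketch of how a server might read the request line and header fields described in this note, consider the following Python snippet. The function name parse_request_head is made up for this example, and the parsing is deliberately simplistic.

```python
def parse_request_head(text: str):
    """Parse the request line and header fields of an HTTP/1.1 request."""
    lines = text.split("\r\n")
    # Request line: method, request target, protocol version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if line == "":  # blank line: end of the header section
            break
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


raw = (
    "GET /testsite/index.html HTTP/1.1\r\n"
    "Host: localhost\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)
method, target, version, headers = parse_request_head(raw)
print(method, target, version)  # GET /testsite/index.html HTTP/1.1
print(headers)                  # {'host': 'localhost', 'accept': 'text/html'}
```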
Check out the different request methods and request headers from the following links:
Note
An HTTP response message starts with a status line that consists of the HTTP protocol version in use (HTTP/1.1), a 3-digit status code (200), and the reason phrase (OK) associated with the status code. This is followed by some response headers that contain additional information about the response in the form of name: value pairs (Server: Apache/2.4.23 (Win32) OpenSSL/1.0.2h PHP/5.6.28, Content-Length: 164, etc.). The payload, which is the content of the requested resource (index.html), comes after the blank line (nothing but a CRLF) that indicates the end of the response headers section. Check out the different status codes and response headers from the following links:
HTTP over Web
Instead of the cumbersome terminal, you can check out the raw conversation over HTTP using the developer tools provided by modern browsers. A quick way to access the developer tools: open a web page in Chrome or Firefox and hit Ctrl+Shift+I on Windows / Linux or Command+Option+I on Mac. The developer tools window opens at the bottom of the browser, as shown for Chrome in Figure 3.
Figure 3: Developer Tools
The developer tools comprise a set of functional tools that allow developers to, among other things, inspect DOM, edit CSS, debug scripts, and profile a web page. They are accessible from the list of tabs in the toolbar of the developer tools window. Clicking on a tab opens a corresponding panel where you can perform specific tasks provided by the tool of that tab. Here, we are only interested in the Network tab. The Network panel under the Network tab provides information on network activities that occurred on a web page, including HTTP headers, response, cookies, etc.
With the Network panel open and the All filter option selected, enter http://localhost/testsite/index.html in the browser address bar and hit Enter. You should get a screen that looks like the one in Figure 4:
Figure 4: HTTP Request and Response for index.html
The screen in Figure 4 reveals, under the Headers tab, the HTTP request and response messages pertaining to index.html, the entry selected in the Name panel. The payload of the response is shown separately under the Response tab, as shown in Figure 5:
Figure 5: The Response Payload for index.html
Wait, the story hasn't ended yet. Did you notice ball.png appearing below index.html in the Name panel? If you are curious, click on it, and you will see another set of HTTP request and response messages under the Headers tab, like those shown in Figure 6:
Figure 6: HTTP Request and Response for ball.png
When the browser interpreting index.html comes across the image markup, i.e., <img src="ball.png">, it initiates a new request-response cycle with the web server to request that image resource. The address of the web page that references the requested image (index.html referencing ball.png in this case) can be found in the request header field called Referer, as shown:
Referer: http://localhost/testsite/index.html
The same goes for any external resources, such as audio, video, CSS files, JavaScript files, plug-ins, and so on, that are specified in a web page. In other words, a complete download of a web page may take several request-response cycles, depending on the number of external resources specified. This is illustrated in Figure 7:
Figure 7: HTTP Request-Response Cycles
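To get a feel for how many request-response cycles a page needs, the sketch below scans the index.html markup for a few kinds of external resources and resolves them against the page address, much as a browser would. The class name ResourceFinder is made up for this example, and only a handful of tags are covered; a real browser looks at many more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceFinder(HTMLParser):
    """Collect URLs a browser would fetch in addition to the page itself."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Only a few common cases are covered here.
        if tag in ("img", "script", "audio", "video") and attrs.get("src"):
            self.resources.append(urljoin(self.page_url, attrs["src"]))
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(urljoin(self.page_url, attrs["href"]))


page_url = "http://localhost/testsite/index.html"
html = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>HTTP Headers</title>
</head>
<body>
<h1>HTTP Headers</h1>
<img src="ball.png">
</body>
</html>"""

finder = ResourceFinder(page_url)
finder.feed(html)
print(finder.resources)           # ['http://localhost/testsite/ball.png']
print(1 + len(finder.resources))  # 2 request-response cycles in total
```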
Mimicking HTTP's Conversation in the Real World
If the web browser and the web server were real human beings, how would the HTTP conversation have taken place in natural language? Try this:
Browser: Hi, I'm Mozilla (User-Agent: Mozilla/5.0), can you send me the HTML file (Accept: text/html) at http://localhost/testsite/index.html (GET /testsite/index.html HTTP/1.1)?
Server: Hi, I'm Apache (Server: Apache/2.4.23). I have succeeded in finding the file (HTTP/1.1 200 OK). It is in HTML text (Content-Type: text/html). It reads ... (payload).
Browser: Hi, I'm Mozilla (User-Agent: Mozilla/5.0), may I have the image file (Accept: image/*) at http://localhost/testsite/ball.png (GET /testsite/ball.png HTTP/1.1) which is referenced in http://localhost/testsite/index.html (Referer: http://localhost/testsite/index.html)?
Server: Hi, I'm Apache (Server: Apache/2.4.23). I have succeeded in finding the file (HTTP/1.1 200 OK). It is an image (Content-Type: image/png).
In the pseudo conversation above, parts of the sentences are annotated in parentheses with the corresponding parts of the HTTP messages. Computers are generally weak at handling unstructured natural language. To overcome this, HTTP request and response messages are structured into a start line and header fields, each of which carries a pre-defined role and meaning in the HTTP process.
HTTP Request with Payload
So far, you have seen an example of an HTTP request without payload. Let's walk through one with payload. A request with payload is usually initiated by a user submitting data via an HTML form. In the testsite directory, add an HTML file called enquiry.html that contains the following HTML markup:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Books Enquiry</title>
</head>
<body>
<h1>Books Enquiry</h1>
<form action="response.php" method="get">
My Name:<br>
<input type="text" name="name">
<br><br>
Title of Book:<br>
<input type="text" name="booktitle">
<br><br>
<input type="submit" value="Submit">
</form>
</body>
</html>
In a browser, this HTML file will be rendered as a web page with two text fields and one Submit button as shown in Figure 8.
Figure 8: enquiry.html
Now, enter a name and a book title, say Peter Leow and Hands-on with PHP, into the respective text fields and hit the Submit button. This will initiate an HTTP request using the GET method (specified in the method attribute of the <form> tag), carrying the entered name and book title as data to the web server, where they are picked up by response.php (specified in the action attribute of the <form> tag), which contains the following script:
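<?php
// Very rudimentary code, for demonstration only.
// $_REQUEST merges the request parameters (GET and POST among them).
// Missing fields default to "", and values are escaped before being echoed into HTML.
$name = htmlspecialchars(isset($_REQUEST["name"]) ? $_REQUEST["name"] : "");
$booktitle = htmlspecialchars(isset($_REQUEST["booktitle"]) ? $_REQUEST["booktitle"] : "");
// Assume that code here searched a database and found the book.
echo "Dear $name<br><br>The book titled \"$booktitle\" is currently on loan.";
?>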
On receiving the request, response.php reads the name and book title, supposedly searches for the book in a database, and then generates a reply based on the outcome of the search. An example output of response.php is shown in Figure 9.
Figure 9: response.php
As shown in Figure 9, payload data sent via the GET method are represented as name=value pairs (name=Peter+Leow&booktitle=Hands-on+with+PHP), delimited with the & symbol, and URL-encoded (spaces are replaced with +). They are visibly appended to the URL as query string parameters, so strictly speaking they travel on the request line of the HTTP request message rather than in its data section:
GET /testsite/response.php?name=Peter+Leow&booktitle=Hands-on+with+PHP HTTP/1.1
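The encoding step that the browser performs on form submission can be reproduced with Python's standard urllib.parse module. The variable names below are made up for this example.

```python
from urllib.parse import urlencode

form_data = {"name": "Peter Leow", "booktitle": "Hands-on with PHP"}

# urlencode joins name=value pairs with "&" and replaces spaces with "+".
query_string = urlencode(form_data)
request_line = f"GET /testsite/response.php?{query_string} HTTP/1.1"

print(query_string)  # name=Peter+Leow&booktitle=Hands-on+with+PHP
print(request_line)
```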
In place of enquiry.html, you can also call response.php with the data via a telnet session, as you did previously.
Copy and paste the following text (including the ending blank line, which is mandatory to signal the end of the header section) to the telnet console and hit the Enter key:
GET /testsite/response.php?name=Peter+Leow&booktitle=Hands-on+with+PHP HTTP/1.1
Host: localhost
Accept: text/html
You should receive the following response from response.php:
HTTP/1.1 200 OK
Date: Thu, 16 Nov 2017 17:40:10 GMT
Server: Apache/2.4.23 (Win32) OpenSSL/1.0.2h PHP/5.6.28
X-Powered-By: PHP/5.6.28
Content-Length: 80
Content-Type: text/html; charset=UTF-8

Dear Peter Leow<br><br>The book titled "Hands-on with PHP" is currently on loan.
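The Content-Length: 80 header in this response is simply the size of the payload in bytes. The sketch below is a hypothetical Python counterpart of what response.php does with the query string (it is not the article's PHP script): it decodes the parameters, builds the reply, and measures its length.

```python
from html import escape
from urllib.parse import parse_qs


def render_reply(query_string: str) -> str:
    """Hypothetical Python counterpart of response.php's output logic."""
    fields = parse_qs(query_string)  # decodes "+" and %XX escapes
    name = fields.get("name", [""])[0]
    booktitle = fields.get("booktitle", [""])[0]
    # Escape user input before placing it into HTML.
    return (
        f"Dear {escape(name)}<br><br>"
        f'The book titled "{escape(booktitle)}" is currently on loan.'
    )


reply = render_reply("name=Peter+Leow&booktitle=Hands-on+with+PHP")
print(reply)
print(len(reply.encode("utf-8")))  # 80, matching Content-Length above
```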
Alternatively, you can send data to the web server using the POST method. The following HTTP request message uses the POST method to send data as payload to response.php via a telnet session:
POST /testsite/response.php HTTP/1.1
Host: localhost
Content-Length: 43
Content-Type: application/x-www-form-urlencoded

name=Peter+Leow&booktitle=Hands-on+with+PHP
Like its GET counterpart, payload data sent via the POST method are represented as name=value pairs (name=Peter+Leow&booktitle=Hands-on+with+PHP), delimited with the & symbol, and URL-encoded (spaces are replaced with +). Unlike its GET counterpart, where they are visibly appended to the URL, payload data sent via the POST method are placed after the blank line that ends the request header section, so they do not show up in the browser address bar.
If you were to mimic an HTTP request with payload in the real world, it might go like this:
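A POST request like the one above could be assembled programmatically as sketched below. The helper name build_post_request is made up for this example. Note how Content-Length is computed from the encoded payload rather than typed by hand.

```python
from urllib.parse import urlencode


def build_post_request(path: str, host: str, form_data: dict) -> bytes:
    """Compose a form-style POST request with the payload after the headers."""
    body = urlencode(form_data).encode("ascii")
    head = "\r\n".join([
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        f"Content-Length: {len(body)}",  # size of the payload in bytes
        "Content-Type: application/x-www-form-urlencoded",
        "",  # blank line: end of the header section
        "",
    ]).encode("ascii")
    return head + body


request = build_post_request(
    "/testsite/response.php",
    "localhost",
    {"name": "Peter Leow", "booktitle": "Hands-on with PHP"},
)
print(request.decode("ascii"))
```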
enquiry.html: Hi, I'm Mozilla (User-Agent: Mozilla/5.0), my name is Peter Leow (name=Peter+Leow), I am looking for this book titled "Hands-on with PHP" (booktitle=Hands-on+with+PHP). Is it available in your library (GET /testsite/response.php?name=Peter+Leow&booktitle=Hands-on+with+PHP HTTP/1.1)?
(On receiving the request from enquiry.html, response.php looked up the book and found out that it was on loan.)
response.php: Dear Peter Leow, I'm response.php (HTTP/1.1 200 OK) from Apache (Server: Apache/2.4.23). The book titled "Hands-on with PHP" which you are looking for is currently on loan (Dear Peter Leow<br><br>The book titled "Hands-on with PHP" is currently on loan.).
End of Conversation
More often than not, users pay no attention to how the raw conversation takes place over HTTP since it just happens and works out of the box between the browser and the web server. For developers, however, understanding the mechanism of HTTP enables them to, among other things:
- set values of response headers through server-side scripting to implement certain useful functionalities that would otherwise not be possible. Some of these functionalities include redirecting the browser to a specific URL and downloading resources as files instead of displaying them on screen.
- perform analytics based on data gathered from request headers received by the server.
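To make the first point more concrete, here is a rough sketch of the raw response messages a script would cause the server to send for a redirect and for a file download. It is written in Python for illustration only; the helper name build_response and the file name report.txt are made up, while Location and Content-Disposition are the standard header fields used for these two purposes.

```python
def build_response(status: str, headers: dict, body: bytes = b"") -> bytes:
    """Assemble a raw HTTP/1.1 response from a status, headers, and a body."""
    lines = [f"HTTP/1.1 {status}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# Redirect: the Location header tells the browser where to go next.
redirect = build_response("302 Found", {"Location": "/testsite/index.html"})

# Download: Content-Disposition asks the browser to save the payload as a
# file instead of displaying it. The file name here is hypothetical.
download = build_response(
    "200 OK",
    {
        "Content-Type": "text/plain",
        "Content-Disposition": 'attachment; filename="report.txt"',
    },
    b"sample file content",
)

print(redirect.decode("ascii"))
print(download.decode("ascii"))
```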
History
- 2nd January, 2018: Initial version