Making GET Requests in Python - Tutorial

Lenny Cheng

15 Apr 2016

We use Chrome DevTools together with the urllib and BeautifulSoup libraries to programmatically GET content from www.codeproject.com.

Introduction

In this article, we examine how to make GET requests with Python. We will be using the urllib library to make GET requests and the BeautifulSoup library to parse the contents of the response. We will also use Chrome DevTools to identify HTML elements on a webpage.

You can also download the associated Python file from https://github.com/NanoBreeze/Making-GET-Requests-in-Python-Tutorial

Background

If you’re new to web development or networking, this section may provide some context for making GET requests.

The Hypertext Transfer Protocol (HTTP) is a set of rules for transferring resources between servers and clients, most notably documents written in Hypertext Markup Language (HTML), the language websites are built upon. HTTP defines several request methods, each a way for a client to indicate to a server what it seeks. One of the most frequently used request methods is GET.

A client makes a GET request when it seeks a specific resource located on the server, such as an HTML file, an image, or a Word document. When you search for “hello world” in Google’s search bar, your browser (the client) makes a GET request to Google’s servers for websites related to “hello world”, and Google’s servers respond with a list of related websites. Likewise, after you enter a URL into a browser’s address bar, the browser sends a GET request to the server hosting that website. The server then responds with the resource associated with the URL you typed (often a web page).
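To make the search example concrete, here is a minimal sketch of how such a GET request can be described in Python without actually sending it. The Google search URL and its q parameter are assumptions for illustration only. The point is that the search terms travel in the URL, and that urllib treats a request without a body as a GET.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed URL for illustration; the search terms are encoded into the query string.
query = urlencode({"q": "hello world"})
search_url = "https://www.google.com/search?" + query

# Building a Request object does not contact the server.
request = Request(search_url)

print(search_url)            # https://www.google.com/search?q=hello+world
print(request.get_method())  # GET
```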

Today, let’s learn how to make GET requests programmatically without using a browser.

(Side Note: Another frequently used request method is the POST method. A client makes a POST request when it submits information to a server, such as form submissions and logins.)
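For contrast, the sketch below builds a request that carries form data. The endpoint and the field names are hypothetical, and nothing is sent over the network. It only shows that urllib switches the method to POST once a request body is supplied.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical login endpoint and form fields, used only for illustration.
form_data = urlencode({"username": "alice", "password": "secret"}).encode("ascii")
login_request = Request("http://www.example.com/login", data=form_data)

# Nothing is sent here; we only inspect the request that would be made.
print(login_request.get_method())  # POST
print(login_request.data)          # b'username=alice&password=secret'
```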

Making the GET Request

The urllib library's request module contains a function called urlopen(url), which makes a GET request to the given URL and returns an HTTPResponse object. An HTTPResponse object has a read() method, which returns the body of the response (here, the HTML of the webpage) as bytes.

Python

from urllib.request import urlopen

codeProjectHtml = urlopen("http://www.codeproject.com/")
print(codeProjectHtml.read())
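Because read() returns raw bytes rather than text, it can help to decode the response before working with it. The following is a minimal sketch, not part of the original code: decode_body and fetch_html are hypothetical helper names, and falling back to UTF-8 when the server declares no character set is an assumption.

```python
from urllib.request import urlopen

def decode_body(raw, charset=None):
    """Decode response bytes into text, falling back to UTF-8 (an assumption)."""
    return raw.decode(charset or "utf-8", errors="replace")

def fetch_html(url):
    """Make a GET request and return the response body as a string."""
    # The with statement closes the connection once the body has been read.
    with urlopen(url) as response:
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)

# Offline demonstration of the decoding step on made-up bytes.
print(decode_body(b"<p>caf\xc3\xa9</p>"))  # <p>café</p>
```

Calling print(fetch_html("http://www.codeproject.com/")) would then print the page as ordinary text rather than as a bytes literal.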

Here's the HTML: [screenshot of the output of print(codeProjectHtml.read())]

Although we can see the HTML, it is difficult to read: read() returns a bytes object, so print shows it as one long run in which line breaks appear as escape sequences such as \n rather than as actual new lines. We can make the HTML more readable by using the BeautifulSoup library and adding one more line to our existing code: we instantiate a BeautifulSoup object and pass it two parameters:

the HTML from codeProjectHtml.read()
the string "html.parser", which instructs the BeautifulSoup instance to use the HTML parser built into Python’s standard library

Instead of printing the HTML, we print the BeautifulSoup instance. Here's our code:

Python

from urllib.request import urlopen
from bs4 import BeautifulSoup

codeProjectHtml = urlopen("http://www.codeproject.com/")
bsInstance = BeautifulSoup(codeProjectHtml.read(), "html.parser")
print(bsInstance)

The HTML is much more readable: [screenshot of the output of print(bsInstance)]
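If you also want each nested element indented, BeautifulSoup objects offer a prettify() method. The snippet below is a small offline sketch on a made-up HTML string, not the CodeProject page.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML string so the example runs without a network connection.
sample_html = "<div><p>Hello</p></div>"
soup = BeautifulSoup(sample_html, "html.parser")

# prettify() returns a string with one tag per line, indented by nesting depth.
pretty = soup.prettify()
print(pretty)
```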

GET Request in Practice

In addition to beautifying HTML, BeautifulSoup can also search the HTML for elements, attributes, and text. Let’s use an exercise to learn how to use BeautifulSoup to search for elements: let’s find the number of members online at www.codeproject.com. There are four steps for solving this exercise:

1. Since the number of members online is shown on codeproject.com, we need to obtain the HTML of www.codeproject.com (we already did this in the section above)
2. Determine which element contains the number of members online (we use Chrome DevTools)
3. Search the HTML from Step 1 for the element from Step 2 (we use BeautifulSoup)
4. Display the number of members online
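The steps above can be sketched as two small functions. This is only an outline under stated assumptions: extract_text and scrape_text are hypothetical names, Step 2 still has to be done by hand in DevTools, and the sample page and its id "members" are made up so that the check runs offline.

```python
from urllib.request import urlopen
from bs4 import BeautifulSoup

def extract_text(html, tag, element_id):
    """Steps 3 and 4: find the element and return its text (None if absent)."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(tag, {"id": element_id})
    return element.get_text() if element is not None else None

def scrape_text(url, tag, element_id):
    """Step 1 followed by Steps 3 and 4; the tag and id come from Step 2."""
    with urlopen(url) as response:
        return extract_text(response.read(), tag, element_id)

# Offline check on a made-up page; the id "members" is hypothetical.
sample = '<html><body><div id="members">42 members online</div></body></html>'
print(extract_text(sample, "div", "members"))  # 42 members online
```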

Completing Step 1: Get HTML

We write:

Python

codeProjectHtml = urlopen("http://www.codeproject.com/")

Completing Step 2: Determine element with Chrome DevTools

Chrome DevTools is a set of debugging tools built into Chrome. We can use it to find the HTML of any element by right-clicking the element and selecting “Inspect”.

Chrome DevTools then highlights in light gray the HTML associated with the element containing the number of members online. At the same time, the text containing the number of members online is highlighted in light blue.

We note that the number of members online is in a <div> element whose id is ctl00_MemberMenu_GenInfo.

Completing Step 3: Searching for element with BeautifulSoup

To find the element in the HTML from Step 1, we create a BeautifulSoup instance from that HTML and call its find(...) method. The find(...) method accepts several optional arguments; we pass it two: the first is the element type, and the second is a dictionary whose key and value are the element’s attribute name and attribute value. We write:

Python

bsInstance = BeautifulSoup(codeProjectHtml.read(), "html.parser")
memberMenu = bsInstance.find("div", {"id" : "ctl00_MemberMenu_GenInfo"})
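The attribute dictionary is one of several ways to describe the element to find(...). As a small offline sketch, the made-up HTML below reuses the id noted in Step 2 and shows three calls that locate the same element.

```python
from bs4 import BeautifulSoup

# Made-up HTML that reuses the id noted in Step 2, so the example runs offline.
html = '<div id="ctl00_MemberMenu_GenInfo">123 members online</div><div id="other">x</div>'
soup = BeautifulSoup(html, "html.parser")

by_dict = soup.find("div", {"id": "ctl00_MemberMenu_GenInfo"})  # attribute dictionary
by_keyword = soup.find("div", id="ctl00_MemberMenu_GenInfo")    # keyword argument
by_id_only = soup.find(id="ctl00_MemberMenu_GenInfo")           # any tag with that id

print(by_dict is by_keyword is by_id_only)  # True
```

When no element matches, find(...) returns None instead of raising an error, so a mistyped id only shows up later, when a method is called on the result.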

Completing Step 4: Printing result

Since we seek the text inside the <div>, we will call the get_text() method on memberMenu. We print the text in the element:

Python

print(memberMenu.get_text())
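If the page layout changes or the id is mistyped, find(...) returns None and calling get_text() on it raises an AttributeError. A minimal defensive sketch follows, where text_or_default is a hypothetical helper and the HTML is made up for an offline demonstration.

```python
from bs4 import BeautifulSoup

def text_or_default(element, default="(element not found)"):
    """Return the element's stripped text, or a default when find() returned None."""
    if element is None:
        return default
    return element.get_text(strip=True)

# Made-up HTML for an offline demonstration.
soup = BeautifulSoup('<div id="info">\n  7 members online \n</div>', "html.parser")

print(text_or_default(soup.find("div", {"id": "info"})))     # 7 members online
print(text_or_default(soup.find("div", {"id": "missing"})))  # (element not found)
```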

Full Code

Python

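from urllib.request import urlopen
from bs4 import BeautifulSoup

# Step 1: GET the HTML of the home page
codeProjectHtml = urlopen("http://www.codeproject.com/")
# Step 3: parse the HTML and find the <div> identified in Step 2
bsInstance = BeautifulSoup(codeProjectHtml.read(), "html.parser")
memberMenu = bsInstance.find("div", {"id" : "ctl00_MemberMenu_GenInfo"})
# Step 4: print the text inside the <div>
print(memberMenu.get_text())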

Exercise

Let’s try to find the name of the first article that appears on www.codeproject.com when we search “Python”.

Once again, we can split this task into four parts.

Completing Step 1: Get HTML

We manually type “Python” into CodeProject’s search bar and note that the resulting URL is http://www.codeproject.com/search.aspx?q=python

Python

codeProjectPythonSearchHtml = urlopen("http://www.codeproject.com/search.aspx?q=python")
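The search term is carried in the q parameter of the URL, so the same URL can be built for other terms. The sketch below uses a hypothetical helper, build_search_url, and assumes the site accepts the standard form encoding that urlencode produces (for example, a space encoded as +).

```python
from urllib.parse import urlencode

def build_search_url(term):
    """Build a CodeProject search URL of the observed form for an arbitrary term."""
    return "http://www.codeproject.com/search.aspx?" + urlencode({"q": term})

print(build_search_url("python"))       # http://www.codeproject.com/search.aspx?q=python
print(build_search_url("hello world"))  # http://www.codeproject.com/search.aspx?q=hello+world
```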

Completing Step 2: Determine element with Chrome DevTools

We inspect the element that contains the name of the first article (at the time of this writing, it’s “Python Code Generator Written in Python”) and note that it’s an <a> element with an id of ctl00_MC_Results_ctl00_DocTitle. To confirm that this id is not specific to a search for “Python”, we can make a different search and examine the element containing the name of its first article.
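The same check can be done in code once the HTML of two result pages is available. This is a rough sketch: has_element is a hypothetical helper, and the two pages and the id "first_result_title" are made-up stand-ins for real search results.

```python
from bs4 import BeautifulSoup

def has_element(html, tag, element_id):
    """Return True if the page contains a <tag> element with the given id."""
    return BeautifulSoup(html, "html.parser").find(tag, {"id": element_id}) is not None

# Two made-up result pages standing in for two different searches.
page_a = '<a id="first_result_title">Article about Python</a>'
page_b = '<a id="first_result_title">Article about C#</a>'

print(all(has_element(page, "a", "first_result_title") for page in (page_a, page_b)))  # True
```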

Completing Step 3: Searching for element with BeautifulSoup

We create a BeautifulSoup instance from the search results, call its find(...) method, and store the result in firstArticle:

Python

bsInstance = BeautifulSoup(codeProjectPythonSearchHtml.read(), "html.parser")
firstArticle = bsInstance.find("a", {"id" : "ctl00_MC_Results_ctl00_DocTitle"})

Completing Step 4: Printing result

We print the element's text:

Python

print(firstArticle.get_text())

Full Code

Python

from urllib.request import urlopen
from bs4 import BeautifulSoup

# Step 1: GET the HTML of the search results page
codeProjectPythonSearchHtml = urlopen("http://www.codeproject.com/search.aspx?q=python")
# Step 3: parse the HTML and find the <a> identified in Step 2
bsInstance = BeautifulSoup(codeProjectPythonSearchHtml.read(), "html.parser")
firstArticle = bsInstance.find("a", {"id" : "ctl00_MC_Results_ctl00_DocTitle"})
# Step 4: print the name of the first article
print(firstArticle.get_text())

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
