Introduction
In this article, we examine how to make GET requests with Python. We will be using the urllib library to make GET requests and the BeautifulSoup library to parse the contents of the response. We will also use Chrome DevTools to identify HTML elements on a webpage.
You can also download the associated Python file from https://github.com/NanoBreeze/Making-GET-Requests-in-Python-Tutorial
Background
If you’re new to web development or networking, this section may provide some context for making GET requests.
The Hypertext Transfer Protocol (HTTP) is a set of rules for transferring resources between servers and clients, most notably pages written in Hypertext Markup Language (HTML), the language websites are built upon. HTTP defines several request methods, which are ways for a client to indicate to a server what it wants. One of the most frequently used request methods is GET.
A client makes a GET request to the server when it seeks a specific resource located on the server, such as an HTML file, an image, or a Word document. When you search “hello world” in Google’s search bar, your browser (the client) makes a GET request to Google’s servers for websites related to “hello world”. Google’s servers then respond to your browser’s GET request with a list of related websites. Similarly, after you enter a URL into a browser’s address bar, the browser sends a GET request to the server hosting that website. The server then responds with the resource associated with the URL you typed (often a web page).
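To make this concrete, here is a minimal sketch that takes a search URL apart using only Python's standard library. The address https://www.example.com/search and the parameter name q are placeholders chosen for illustration, not any real search service's API. The point is that a GET request names what it wants entirely through the URL: the server, the resource on that server, and any extra parameters.

```python
from urllib.parse import parse_qs, urlsplit

# Placeholder search URL, used only for illustration.
url = "https://www.example.com/search?q=hello+world"

parts = urlsplit(url)

# The server the request is sent to.
print("server:", parts.netloc)

# The resource being asked for on that server.
print("resource:", parts.path)

# Extra parameters sent along with the request (here, the search terms).
print("parameters:", parse_qs(parts.query))
```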
Today, let’s learn how to make GET requests programmatically without using a browser.
(Side Note: Another frequently used request method is POST. A client makes a POST request when it submits information to a server, for example when submitting a form or logging in.)
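As a small illustration of the difference, the sketch below builds two request objects with urllib.request.Request without sending either of them. The URL and the form fields are placeholders. It only shows that urllib treats a request without a body as a GET and a request with a body as a POST.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder URL; nothing is sent over the network in this sketch.
url = "https://www.example.com/login"

# No body: urllib would send this as a GET request.
get_request = Request(url)

# With a body (which must be bytes): urllib would send this as a POST request.
form_data = urlencode({"user": "alice", "password": "secret"}).encode("ascii")
post_request = Request(url, data=form_data)

print(get_request.get_method())
print(post_request.get_method())
```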
Making the GET Request
The urllib.request module contains a function called urlopen(url), which makes a GET request to the URL and returns an HTTPResponse object. An HTTPResponse object has a read() method, which returns the body of the response (here, the HTML of the webpage) as bytes.
from urllib.request import urlopen
codeProjectHtml = urlopen("http://www.codeproject.com/")
print(codeProjectHtml.read())
Here's the HTML:
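Note that the printed value is a bytes object rather than ordinary text. As a sketch of one way to get a string instead, the hypothetical helper fetch_html below (it is not part of urllib) closes the response with a with block, sets a timeout, and decodes the bytes using the character set the server declares. Falling back to UTF-8 when none is declared is an assumption, as is the 10-second timeout.

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Hypothetical helper: GET `url` and return the response body as a str."""
    # The with block closes the connection even if reading fails.
    with urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header; assume UTF-8 if absent.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# What decoding does, shown on a small hard-coded bytes value (no network needed):
raw = b"<html>\n  <body>Hello</body>\n</html>"
print(raw)                  # printed on one line, with \n shown as an escape
print(raw.decode("utf-8"))  # printed as text, with real line breaks
```

Calling fetch_html("http://www.codeproject.com/") would then return the page as a string, provided the site is reachable.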
Although we can see the HTML, it is difficult to read: read() returns bytes, so the output is printed as one long line with escaped line breaks (such as \n) and no visible indentation. We can make the HTML more readable by using the BeautifulSoup library and adding one more line to our existing code: we instantiate a BeautifulSoup object and pass it two parameters:
- the HTML from codeProjectHtml.read()
- the string "html.parser", which instructs the BeautifulSoup instance to use Python’s built-in HTML parser
Instead of printing the HTML, we print the BeautifulSoup instance. Here's our code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
codeProjectHtml = urlopen("http://www.codeproject.com/")
bsInstance = BeautifulSoup(codeProjectHtml.read(), "html.parser")
print(bsInstance)
The HTML is much more readable:
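For indentation that follows the nesting of the elements, BeautifulSoup also provides a prettify() method. Here is a minimal sketch on a small hard-coded HTML string, made up for this example so that it runs without a network connection:

```python
from bs4 import BeautifulSoup

# Made-up HTML on a single line, standing in for a downloaded page.
html = "<html><body><div id='menu'><p>Hello</p></div></body></html>"

soup = BeautifulSoup(html, "html.parser")

# prettify() returns a string spread over several lines,
# indented according to how deeply each element is nested.
pretty = soup.prettify()
print(pretty)
```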
GET Request in Practice
In addition to beautifying HTML, BeautifulSoup can also search the HTML for elements, attributes, and text. Let’s use an exercise to learn how to use BeautifulSoup to search for elements: let’s find the number of members online at www.codeproject.com. There are four steps for solving this exercise:
- Since the number of members online is shown on codeproject.com, we need to obtain the HTML of www.codeproject.com (we already did this in the section above)
- Determine which element contains the number of members online (we use Chrome DevTools)
- Search the HTML from Step 1 for the element from Step 2 (we use BeautifulSoup)
- Display the number of members online
Completing Step 1: Get HTML
We write:
codeProjectHtml = urlopen("http://www.codeproject.com/")
Completing Step 2: Determine element with Chrome DevTools
Chrome DevTools is a set of debugging tools built into Chrome. We can use it to find the HTML of any element by right-clicking the element and selecting “Inspect”:
Chrome DevTools then highlights in light gray the HTML associated with the element containing the number of members online. Simultaneously, the text containing the number of members online is also highlighted in light blue:
We note that the number of members online is in a <div> element whose id is ctl00_MemberMenu_GenInfo.
Completing Step 3: Searching for element with BeautifulSoup
To find the element in the HTML from Step 1, we create a BeautifulSoup instance from that HTML and call its find(...) method. The find(...) method accepts several optional arguments, and we pass it two: the first is the element type; the second is a dictionary whose key and value are the element’s attribute name and attribute value. We write:
bsInstance = BeautifulSoup(codeProjectHtml.read(), "html.parser")
memberMenu = bsInstance.find("div", {"id": "ctl00_MemberMenu_GenInfo"})
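BeautifulSoup offers a few equivalent ways to express this lookup. The sketch below runs three of them against a small hard-coded HTML string rather than the live page. The string sample_html and the text inside its <div> elements are made up for this example; only the id is the one found in Step 2.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the page; only the id matches the one found in Step 2.
sample_html = """
<html><body>
  <div id="ctl00_MemberMenu_GenInfo">12,345 members online</div>
  <div id="other">Something else</div>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# 1. Element name plus a dictionary of attributes (the form used in this article).
by_dict = soup.find("div", {"id": "ctl00_MemberMenu_GenInfo"})

# 2. Element name plus the attribute as a keyword argument.
by_keyword = soup.find("div", id="ctl00_MemberMenu_GenInfo")

# 3. A CSS selector: "div#some_id" means a <div> whose id is some_id.
by_selector = soup.select_one("div#ctl00_MemberMenu_GenInfo")

# Each lookup finds the same element, so each line printed is the same text.
for element in (by_dict, by_keyword, by_selector):
    print(element.get_text())
```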
Completing Step 4: Printing result
Since we seek the text inside the <div>, we call the get_text() method on memberMenu and print the result:
print(memberMenu.get_text())
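One thing to watch for: find(...) returns None when no element matches, for example if the id is mistyped or the site's markup has changed, and calling get_text() on None raises an AttributeError. Here is a minimal sketch of a guard, using a hypothetical helper named text_by_id and a made-up HTML string in place of the live page:

```python
from bs4 import BeautifulSoup


def text_by_id(html, tag_name, element_id):
    """Hypothetical helper: return the text of the matching element, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find(tag_name, {"id": element_id})
    if element is None:
        # Nothing matched, so there is no text to return.
        return None
    # strip=True trims whitespace around each piece of text; " " joins the pieces.
    return element.get_text(" ", strip=True)


# Made-up stand-in for the downloaded page.
sample_html = '<div id="ctl00_MemberMenu_GenInfo"> 12,345 members online </div>'

print(text_by_id(sample_html, "div", "ctl00_MemberMenu_GenInfo"))
print(text_by_id(sample_html, "div", "no_such_id"))
```

Returning None leaves it to the caller to decide what to do when the element is missing, such as printing a message instead of crashing.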
Full Code
from urllib.request import urlopen
from bs4 import BeautifulSoup
codeProjectHtml = urlopen("http://www.codeproject.com/")
bsInstance = BeautifulSoup(codeProjectHtml.read(), "html.parser")
memberMenu = bsInstance.find("div", {"id": "ctl00_MemberMenu_GenInfo"})
print(memberMenu.get_text())
Exercise
Let’s try to find the name of the first article that appears on www.codeproject.com when we search “Python”.
Once again, we can split this task into four parts.
Completing Step 1: Get HTML
We manually type “Python” into CodeProject’s search bar and note that the URL is http://www.codeproject.com/search.aspx?q=python. We write:
codeProjectPythonSearchHtml = urlopen("http://www.codeproject.com/search.aspx?q=python")
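Rather than typing each search URL by hand, we can build the query string from the search term. This is a minimal sketch, assuming the search page keeps using the q parameter seen in the URL above; search_url is a hypothetical helper, not part of urllib.

```python
from urllib.parse import urlencode

SEARCH_PAGE = "http://www.codeproject.com/search.aspx"


def search_url(term):
    """Hypothetical helper: build the search URL for `term`."""
    # urlencode escapes spaces and other special characters in the term.
    query = urlencode({"q": term})
    return f"{SEARCH_PAGE}?{query}"


print(search_url("python"))
print(search_url("web scraping"))
```

The result can be passed straight to urlopen(...) in place of the hard-coded URL.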
Completing Step 2: Determine element with Chrome DevTools
We inspect the element that contains the name of the first article (at the time of this writing, it’s “Python Code Generator Written in Python”) and note that it’s an <a> element with an id of ct100_MC_Results_ct100_DocTitle. To confirm that this id is not specific to a search for “Python”, we can make a different search and examine the element containing the name of the first article.
Completing Step 3: Searching for element with BeautifulSoup
We call the find(...) method on bsInstance and store the result in firstArticle:
bsInstance = BeautifulSoup(codeProjectPythonSearchHtml.read(), "html.parser")
firstArticle = bsInstance.find("a", {"id": "ct100_MC_Results_ct100_DocTitle"})
Completing Step 4: Printing result
We print the element's text:
print(firstArticle.get_text())
Full Code
from urllib.request import urlopen
from bs4 import BeautifulSoup
codeProjectPythonSearchHtml = urlopen("http://www.codeproject.com/search.aspx?q=python")
bsInstance = BeautifulSoup(codeProjectPythonSearchHtml.read(), "html.parser")
firstArticle = bsInstance.find("a", {"id": "ct100_MC_Results_ct100_DocTitle"})
print(firstArticle.get_text())
Further Reading
Thank you for reading! I hope this article was helpful to you. If you have any feedback, please leave a comment below.