Python – Search Youtube for Video

5 Feb 2015 | GPL3 | 2 min read
This code is for Python 3.

I was surprised to discover that I couldn’t find a good way to do this when I Googled for a solution. I just kept getting results for Google’s YouTube API, which is great… but also massive overkill for what I wanted to do. I just wanted to search for a YouTube video and return the top result. Here’s some simple code showing you how to do exactly that. If you don’t care how it works, just skip to “Using It”.

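import urllib.request
import urllib.parse
import re

# Turn the text the user types into a URL-encoded query string.
query_string = urllib.parse.urlencode({"search_query": input()})
# Fetch the search results page.
html_content = urllib.request.urlopen("http://www.youtube.com/results?" + query_string)
# Capture the 11-character ID from every href="/watch?v=..." link on the page.
search_results = re.findall(r'href="/watch\?v=(.{11})', html_content.read().decode())
# Build the watch URL for the first (top) result.
print("http://www.youtube.com/watch?v=" + search_results[0])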

How It Works

First, let’s look at the anatomy of a YouTube search URL. Here’s the one I experimented with:

http://www.youtube.com/results?search_query=Epic+Rock+-+Ready+For+This+%282014%29%28Battle+Rock%29%28All+Good+Things%29

If you ignore the encoded part at the end, it’s really not that complicated. Anything you search for is just http://www.youtube.com/results?search_query= with your URL-encoded search string tacked onto it. So that’s step one. The first line of code after the imports takes whatever the user enters and changes it from a human-readable query into a URL-encoded one. It looks like this:

Before: Epic Rock - Ready For This (2014)(Battle Rock)(All Good Things)
After: Epic+Rock+-+Ready+For+This+%282014%29%28Battle+Rock%29%28All+Good+Things%29
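The encoding step can be tried on its own, without touching the network. This is a small sketch using the same search text as above, typed with a plain hyphen; the variable names query and search_url are just for illustration.

```python
import urllib.parse

# The example search text from the article, with a plain ASCII hyphen.
query = "Epic Rock - Ready For This (2014)(Battle Rock)(All Good Things)"

# urlencode takes key-value pairs and returns a single "key=value" string,
# turning spaces into "+" and parentheses into %28 / %29.
query_string = urllib.parse.urlencode({"search_query": query})
search_url = "http://www.youtube.com/results?" + query_string

print(query_string)
print(search_url)
```

The first line printed should be the same search_query=... string that appears at the end of the search URL above.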

The urllib.parse.urlencode function takes key-value pairs and returns them as a URL-encoded query string; in this case it returns search_query=Epic+Rock+-+Ready+For+This+%282014%29%28Battle+Rock%29%28All+Good+Things%29. The next line simply “browses” to the URL and returns a file-like response object.

Now we want to find the top result. As it turns out, the video results always follow the syntax href="/watch?v=<11_CHARACTER_IDENTIFIER>", so we just want to find all instances of that pattern. We use a regular expression to do that. The html_content.read().decode() part simply reads the file object and decodes it into a text string for our regular expression to parse.

The regular expression basically says: match href="/watch?v= and then, with the .{11} part, match any 11 characters. The parentheses around .{11} are what’s called a group. We’re really not interested in the href part of the expression; we just want the 11-character identifier of the YouTube video. Because the pattern contains a group, re.findall returns just a list of the groups matched. So we’d get something like this: ['4fmwMXEUWOI', '<another 11-character identifier>', ...].
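To see the group behaviour in isolation, here is a minimal sketch that runs the same pattern over a hand-written HTML fragment. The fragment and its video IDs are made up for illustration; a real results page is far larger and may be laid out differently.

```python
import re

# Hypothetical stand-in for the downloaded results page (IDs are invented).
sample_html = (
    '<a href="/watch?v=AAAAAAAAAAA">First video</a>\n'
    '<a href="/about">Not a video link</a>\n'
    '<a href="/watch?v=BBBBBBBBBBB&amp;list=xyz">Second video</a>\n'
)

# Same pattern as in the article: the literal prefix, then a group that
# captures exactly 11 characters of any kind.
pattern = r'href="/watch\?v=(.{11})'

# With one group in the pattern, findall returns only the captured text.
video_ids = re.findall(pattern, sample_html)
print(video_ids)  # ['AAAAAAAAAAA', 'BBBBBBBBBBB']
```

Note that .{11} accepts any character at all. If you want to be stricter, a character class such as [\w-]{11} (letters, digits, underscore and hyphen) is one possible alternative, on the assumption that video identifiers only use those characters.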

The last line of the program also leverages the predictable nature of YouTube URLs. The video URL will always be http://www.youtube.com/watch?v=<11-character identifier>, so we just concatenate the predictable part with the first result from our list.

Example Output

For this code snippet you might type in: Epic Rock - Ready For This (2014)(Battle Rock)(All Good Things)
The program would spit out: http://www.youtube.com/watch?v=4fmwMXEUW0I

Using It

To use the code, simply copy and paste it into your program. Replace input() in the query_string line with whatever string you want to query YouTube for, and then change the last line to use the results as needed. The search_results variable will be a list of the 11-character video identifiers. If you just want the first one, it’s search_results[0].
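If you would rather call this from your own code than paste the lines inline, one way to package it is sketched below. The function names (extract_video_ids, search_youtube, top_result_url) are hypothetical, not part of any library, and two things are my additions rather than part of the original snippet: repeated IDs are dropped, and an empty result returns None instead of raising IndexError. It also assumes the results page still contains href="/watch?v=..." links as described above.

```python
import re
import urllib.parse
import urllib.request

# Same pattern as the original snippet, compiled once for reuse.
VIDEO_LINK_PATTERN = re.compile(r'href="/watch\?v=(.{11})')


def extract_video_ids(html):
    """Return the video IDs found in html, in page order, without repeats."""
    ids = VIDEO_LINK_PATTERN.findall(html)
    # dict.fromkeys keeps the first occurrence of each ID and preserves order.
    return list(dict.fromkeys(ids))


def search_youtube(query):
    """Fetch the results page for query and return the video IDs on it."""
    query_string = urllib.parse.urlencode({"search_query": query})
    url = "http://www.youtube.com/results?" + query_string
    with urllib.request.urlopen(url) as response:
        html = response.read().decode()
    return extract_video_ids(html)


def top_result_url(video_ids):
    """Return the watch URL for the first ID, or None if the list is empty."""
    if not video_ids:
        return None
    return "http://www.youtube.com/watch?v=" + video_ids[0]
```

You would then call something like top_result_url(search_youtube("your search text")). Keeping the parsing in extract_video_ids, separate from the download, means the regular expression can be checked against a saved page without making a request.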

Hope this saves someone else some time.

Grant

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)