
Go From Bytes to Strings

HTTP Requests With Python's urllib.request Alexandra Davis 03:57

When you use urllib.request.urlopen(), the body of the response is a bytes object. In this lesson, you’ll decode those bytes into strings using their character encoding. You’ll probably be safe defaulting to UTF-8 because it is the dominant character encoding, with 98 percent of web pages today being encoded with UTF-8.

00:00 In this lesson, you will learn how to go from bytes to strings. When you use urllib.request.urlopen(), the body of the response is a bytes object.

00:09 The first thing you may want to do is convert the bytes object to a string. To do this, you need to decode the bytes. With Python, all you need to do is find out the character encoding used. A character encoding is often also called a character set, or charset.
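To see why the character encoding matters, here is a minimal sketch that decodes the same bytes with two different encodings. The sample bytes are a made-up stand-in for a response body, not output from the lesson's script.

```python
# Hypothetical sample data: the UTF-8 bytes for the word "café".
raw = "café".encode("utf-8")

# Decoding with the encoding that produced the bytes recovers the text.
text = raw.decode("utf-8")

# Decoding with a different encoding (here Latin-1) raises no error,
# but silently yields the wrong characters.
garbled = raw.decode("latin-1")

print(raw)      # b'caf\xc3\xa9'
print(text)     # café
print(garbled)  # cafÃ©
```

Both calls succeed, which is why a wrong guess about the encoding can go unnoticed until the text is displayed.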

00:27 You’ll probably be safe defaulting to UTF-8 because it is the dominant character encoding, with 98 percent of web pages today being encoded with UTF-8. This number is the result of a recent web technology survey.

00:38 You can find a link to it in the text below this video. You can see what decoding this looks like in the code. First, make sure you have urlopen imported from the urllib.request module.

00:52 Next, you’ll use the with statement to create a context in which you can open the website and read the response.

01:01 You’ll assign the return value of response.read() to the body variable. The contents of the website are now stored in the body variable, but they’re still a bytes object.

01:10 To convert them to a string, you’ll need to decode the bytes using the .decode() method.

01:20 Pass "utf-8" into the .decode() method. To output the decoded content, you can use the print() function. To avoid showing the whole content, you can slice it, for example to the first 30 characters.

01:33 This will give you a good impression of how the decoded body looks.
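Put together, the steps described above might look like the following sketch. The function name `preview_body` and its `length` parameter are made up for illustration; the lesson itself writes these steps directly in a script.

```python
from urllib.request import urlopen


def preview_body(url, length=30):
    """Return the first `length` characters of the response body at `url`.

    Hypothetical helper; assumes the body is encoded as UTF-8.
    """
    # The with statement closes the response once the block ends.
    with urlopen(url) as response:
        body = response.read()  # bytes

    decoded_body = body.decode("utf-8")  # str
    return decoded_body[:length]
```

For example, `print(preview_body("https://www.example.com"))` would print the start of that page's HTML. The URL is a placeholder assumption, since the transcript doesn't name the site being opened.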

01:38 Once again, you can run your script: type py urllib_requests.py and hit Enter. The script prints the first thirty characters of the HTML document as a decoded string.

01:54 The output will show <!doctype html>, <html>, and <head>, all of which mark the beginning of an HTML document.
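The script described in this lesson hard-codes UTF-8. As a possible refinement, and not something the lesson itself shows, you could check whether the server declares a character set in its Content-Type header and fall back to UTF-8 otherwise. The helper name `read_text` and its `default_encoding` parameter are hypothetical.

```python
from urllib.request import urlopen


def read_text(url, default_encoding="utf-8"):
    """Fetch `url` and decode the body using the declared charset, if any."""
    with urlopen(url) as response:
        body = response.read()
        # get_content_charset() returns the charset parameter of the
        # Content-Type header, or None when the header doesn't include one.
        charset = response.headers.get_content_charset()

    return body.decode(charset or default_encoding)
```

This only consults the HTTP header. A page can also declare its encoding inside the HTML itself, which this sketch does not inspect.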
