I have a Python program that downloads stock data for the symbol AAPL. When I run this code, I get the following error:
cannot use a string pattern on a bytes-like object


Python
import urllib
import re

import urllib.request as ur

s = ur.urlopen("http://finance.yahoo.com/quote/q?s=AAPL&ql=1")
sl = s.read()
print(sl)

#htmlfile = urllib.openurl("http://finance.yahoo.com/quote/q?s=AAPL&ql=1")

htmltext = s.read()

regex = '<span class="Fw(b) Fz(36px) Mb(-4px)">(.+?)</span>'
pattern = re.compile(regex)

price = re.findall(pattern, htmltext)

print(price)


Posted
Updated 5-Dec-16 21:36pm

1 solution

The data returned by read() on the urlopen() response is a bytes object, because urlopen() does not decode the byte stream for you. A string pattern cannot be matched against bytes, so you have to determine the encoding first (dynamically, or a fixed one if you know it) and decode the data.

Example for UTF-8:
Python
htmltext = s.read().decode('utf-8')
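To see the difference without touching the network, here is a minimal offline sketch. The `sample` bytes and the `class="price"` markup are made up for illustration and are not what Yahoo actually returns.

```python
import re

# Hypothetical HTML fragment standing in for the bytes returned by s.read()
sample = b'<span class="price">116.52</span>'

pattern = re.compile(r'<span class="price">(.+?)</span>')  # str pattern

# A str pattern applied to bytes raises TypeError
try:
    re.findall(pattern, sample)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Option 1: decode the bytes to str first, then use the str pattern
prices = pattern.findall(sample.decode("utf-8"))

# Option 2: keep the data as bytes and use a bytes pattern instead
bytes_prices = re.findall(rb'<span class="price">(.+?)</span>', sample)

print(error_message)
print(prices, bytes_prices)
```

Decoding first is usually the more convenient choice, because the matches come back as ordinary strings. Separately from the encoding issue, parentheses in a regex are group syntax, so literal parentheses in a class name such as `Fw(b)` would have to be escaped (for example with `re.escape`) before the pattern can match them.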


The encoding is often specified in the charset parameter of the Content-Type header. It can be read with s.info().get_content_charset() or s.headers.get_content_charset(). Both return None when the header declares no charset, so check the result first and fall back to a fixed encoding if it is None.
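The check described above could be sketched as follows. The helper names `decode_body` and `fetch_text` are hypothetical, and the UTF-8 fallback is an assumption that may not suit every site.

```python
import urllib.request as ur
from typing import Optional


def decode_body(raw: bytes, charset: Optional[str], fallback: str = "utf-8") -> str:
    """Decode raw bytes with the declared charset, or with the fallback if none was declared."""
    # errors="replace" avoids a UnicodeDecodeError on stray bytes (an assumption;
    # use the default strict mode if you would rather fail loudly)
    return raw.decode(charset or fallback, errors="replace")


def fetch_text(url: str) -> str:
    """Download url and return the response body as str."""
    with ur.urlopen(url) as s:
        raw = s.read()  # read only once; a second read() returns b""
        charset = s.headers.get_content_charset()  # None if no charset is declared
    return decode_body(raw, charset)


# Usage (needs network access):
# htmltext = fetch_text("http://finance.yahoo.com/quote/q?s=AAPL&ql=1")
```

Note that the question's code calls s.read() twice. The first call consumes the whole response, so the second one returns empty bytes; the sketch reads once and reuses the result.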
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


