TypeError: can't use a string pattern on a bytes-like object in Python

Dung Do Tien Sep 16 2021

Hello everyone, I am new to Python and still learning it.

I have a small piece of code that should collect all the links found on a given page. It looks like this:

import re
import urllib.request

url = 'http://google.com'
linkregex = re.compile('<a\s*href=[\'|"](.*?)[\'"].*?>')
m = urllib.request.urlopen(url)
msg = m.read()  # read() returns bytes, not str
links = linkregex.findall(msg)  # the TypeError is raised here
print(links)

But I get the following TypeError when I run the code above:

 TypeError: can't use a string pattern on a bytes-like object

I am using Python 3.8.2.

Can anyone explain this to me? How can I solve it?

Thanks for any response.

2 answers found.
  • Gerardo Valle Sep 16 2021

    You are applying a string pattern to a bytes object: m.read() returns bytes, while the pattern was compiled from a str. Use a bytes pattern instead.

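    Replace:

    linkregex = re.compile('<a\s*href=[\'|"](.*?)[\'"].*?>')

    With:

    linkregex = re.compile(rb'<a\s*href=[\'"](.*?)[\'"].*?>')

    The b prefix makes the pattern a bytes pattern, and the r prefix keeps \s from being read as a string escape. The | is dropped because inside a character class it matches a literal pipe rather than meaning "or". Note that the matches come back as bytes objects.

    I hope this is useful for you.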
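
To see the difference without making a network request, here is a minimal sketch. The sample_html value is a made-up stand-in for whatever m.read() would return, and the variable names are only for illustration. It shows the str pattern failing on bytes and the bytes pattern succeeding on the same data:

```python
import re

# Hypothetical stand-in for the bytes that urllib.request.urlopen(url).read() returns.
sample_html = b'<a href="http://example.com/a">A</a> <a href=\'/b\' class="x">B</a>'

str_pattern = re.compile(r'<a\s*href=[\'"](.*?)[\'"].*?>')
bytes_pattern = re.compile(rb'<a\s*href=[\'"](.*?)[\'"].*?>')

# A str pattern applied to bytes raises the TypeError from the question.
try:
    str_pattern.findall(sample_html)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# A bytes pattern applied to bytes works, and each match is a bytes object.
links = bytes_pattern.findall(sample_html)

print(error_message)
print(links)
```

The exact wording of the error message can differ slightly between Python versions, so it is safer to catch TypeError than to compare the text.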

  • đặng thái sơn Sep 16 2021

    The URL you used for Google didn't work for me, so I replaced it with http://www.google.com/ig?hl=en, which works for me.

    Try this:

    import re
    import urllib.request

    url = "http://www.google.com/ig?hl=en"
    linkregex = re.compile(r'<a\s*href=[\'"](.*?)[\'"].*?>')
    m = urllib.request.urlopen(url)
    msg = m.read()
    # Decode the bytes to str so the string pattern applies; UTF-8 is assumed here.
    links = linkregex.findall(msg.decode('utf-8', errors='replace'))
    print(links)
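
A note on converting the bytes to text: calling str() on a bytes object returns its printed representation (the b'...' form), not the decoded text, so non-ASCII characters end up as escape sequences in the results. The sketch below compares the two conversions on a made-up sample that stands in for m.read(). The UTF-8 encoding is an assumption about the page, not something the original post states:

```python
import re

# Hypothetical stand-in for the bytes returned by m.read(); the text is UTF-8 encoded.
raw = '<a href="/café">Café</a>'.encode('utf-8')

linkregex = re.compile(r'<a\s*href=[\'"](.*?)[\'"].*?>')

# str(raw) yields the repr of the bytes, so "é" shows up as literal \xc3\xa9 text.
links_from_repr = linkregex.findall(str(raw))

# decode() yields the real text; 'utf-8' is an assumption about the page encoding.
links_from_text = linkregex.findall(raw.decode('utf-8'))

print(links_from_repr)
print(links_from_text)
```

With a real response, the charset declared by the server can usually be read with m.headers.get_content_charset(), which returns None when no charset is declared, so a fallback such as 'utf-8' is still needed.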