Convert Shift_JIS bytes to a UTF-8 string in Python

This is a basic question. I'm trying to fetch a page's HTML with urllib in Python 3.

    import urllib.request

    request = urllib.request.Request(url)
    response = urllib.request.urlopen(request)
    html = response.read().decode('utf-8')  # assumes the page is UTF-8

I fetch the source as shown above. This works when the page is encoded in UTF-8, but when it is encoded in Shift_JIS I get this error:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 228: invalid start byte

I assume it would work if I converted the Shift_JIS bytes into UTF-8 bytes first. Is that possible?
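To illustrate what the question asks, here is a minimal sketch (the helper name sjis_to_utf8_bytes and the sample text are hypothetical, not from the post). Converting Shift_JIS bytes to UTF-8 bytes is possible, but it goes through a str: decode with the source encoding, then encode with the target one.

```python
def sjis_to_utf8_bytes(raw: bytes) -> bytes:
    """Hypothetical helper: re-encode Shift_JIS bytes as UTF-8 bytes."""
    text = raw.decode("shift_jis")  # bytes (Shift_JIS) -> str
    return text.encode("utf-8")     # str -> bytes (UTF-8)


sjis = "日本語".encode("shift_jis")  # sample Shift_JIS input
utf8 = sjis_to_utf8_bytes(sjis)
print(utf8.decode("utf-8"))
```

Note that the intermediate str is already the decoded text, so when the goal is just to read the HTML, the final encode step is usually unnecessary.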

python python3

2022-09-29 21:25

1 Answer

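Simply writing html = response.read().decode('ShiftJIS') is fine. decode() turns the Shift_JIS bytes directly into a str, so there is no need to convert them to UTF-8 bytes first.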
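A small offline sketch of the error and the fix. There is no network access here, so a hypothetical Shift_JIS byte string stands in for response.read(); the sample HTML is an assumption, not the asker's page.

```python
import codecs

# Hypothetical stand-in for response.read() on a Shift_JIS page.
raw = "<p>こんにちは</p>".encode("shift_jis")

utf8_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # Hiragana in Shift_JIS start with byte 0x82, which is not a valid UTF-8 start byte.
    utf8_error = exc.reason
print("utf-8 decode error:", utf8_error)

# 'ShiftJIS' resolves to Python's shift_jis codec (lookup is case-insensitive).
print(codecs.lookup("ShiftJIS").name)

html = raw.decode("ShiftJIS")
print(html)
```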

In general, to decide which encoding to use you have to examine every clue available, such as the HTTP headers, the lang attribute of the html element, meta elements, and so on, but if you want to go that far, you'd be better off using .
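The tool the answer recommends is missing from the post, so the following is not that tool. It is only a rough, stdlib-only sketch of the kind of detection described, under these assumptions: the Content-Type header's charset wins, then a charset declared in a meta element within the first 2048 bytes, then UTF-8 if the bytes are valid UTF-8, otherwise Shift_JIS. The function names guess_encoding and decode_html are hypothetical, and the lang attribute is not checked.

```python
import re
from email.message import Message
from typing import Optional

# Matches <meta charset="..."> and <meta http-equiv=... content="...; charset=...">.
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def guess_encoding(raw: bytes, content_type: Optional[str] = None) -> str:
    """Hypothetical heuristic: header charset, then meta charset, then a fallback."""
    # 1. charset parameter of the HTTP Content-Type header (returned lowercased).
    if content_type:
        msg = Message()
        msg["Content-Type"] = content_type
        charset = msg.get_content_charset()
        if charset:
            return charset
    # 2. charset declared in a <meta> element near the top of the document.
    match = _META_CHARSET.search(raw[:2048])
    if match:
        return match.group(1).decode("ascii").lower()
    # 3. Assumed fallback: UTF-8 if the bytes decode cleanly, otherwise Shift_JIS.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "shift_jis"


def decode_html(raw: bytes, content_type: Optional[str] = None) -> str:
    # errors="replace" keeps going on stray bad bytes; an unknown charset name
    # from the header or meta element still raises LookupError.
    return raw.decode(guess_encoding(raw, content_type), errors="replace")
```

With urllib, the header value can be passed in as decode_html(response.read(), response.headers.get('Content-Type')).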

2022-09-29 21:25
