Convert "Shift-JIS bytes" to a "UTF-8 string" in Python

Asked 2 years ago, Updated 2 years ago, 31 views

A basic question. I'm trying to fetch the HTML of a page using urllib in Python 3.

  import urllib.request

  request = urllib.request.Request(url)
  response = urllib.request.urlopen(request)
  html = response.read().decode('utf-8')

I fetch the page source as shown above. This works when the page is encoded in UTF-8, but when it is encoded in Shift-JIS

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 228: invalid start byte

I get this error. I think it would work if I converted the Shift-JIS bytes to UTF-8 bytes. Is that possible?

python python3

2022-09-29 21:25

1 Answer

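Simply using html = response.read().decode('ShiftJIS') is fine. decode() already turns the bytes into a str, so there is no need to convert them to UTF-8 bytes first.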
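For illustration, here is a minimal sketch of that round trip without any network access. The sample text and the variable names raw, text and utf8_bytes are assumptions made up for this example; raw stands in for what response.read() would return on a Shift-JIS page.

```python
# Stand-in for response.read() on a Shift-JIS page (sample text is an assumption).
raw = "こんにちは".encode("shift_jis")

# bytes -> str: decode with the encoding the page was actually written in.
text = raw.decode("shift_jis")

# str -> UTF-8 bytes: only needed if you really want bytes, e.g. to write a file.
utf8_bytes = text.encode("utf-8")

print(type(text).__name__, type(utf8_bytes).__name__)
```

In most cases the decoded str is all you need; re-encoding to UTF-8 only matters when the result has to be stored or sent as bytes.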

To decide which encoding to use, you have to look at all the information available, such as the HTTP headers, the lang attributes of HTML elements, meta elements, and so on, but if you're trying to do that, you'd better use .
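As a rough sketch of the manual approach, the helper below checks the Content-Type header first and then looks for a charset in a meta element. The function name guess_encoding, its parameters, the 2048-byte search window and the UTF-8 fallback are all assumptions for illustration, and it ignores lang attributes entirely, so treat it as a starting point rather than a complete detector.

```python
import re
from email.message import Message


def guess_encoding(content_type, body, default="utf-8"):
    """Hypothetical helper: guess a page's encoding from header, then meta tag."""
    # 1. charset parameter of the HTTP Content-Type header, if present.
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    charset = msg.get_content_charset()
    if charset:
        return charset

    # 2. charset declared in a <meta> element near the top of the document.
    match = re.search(rb'<meta[^>]+charset=["\']?([\w-]+)', body[:2048], re.IGNORECASE)
    if match:
        return match.group(1).decode("ascii").lower()

    # 3. Nothing found: fall back to the assumed default.
    return default


page = '<html><head><meta charset="Shift_JIS"></head><body>日本語</body></html>'.encode("shift_jis")
encoding = guess_encoding("text/html", page)
html = page.decode(encoding)
```

With the question's code, the first argument could come from response.headers.get("Content-Type") and the second from response.read().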


2022-09-29 21:25


