Encoding problem after crawling Naver blog?

Asked 2 years ago, Updated 2 years ago, 75 views

I crawled Naver blogs, but the saved results show up as unreadable text instead of Korean. What should I do to make the output display as Korean?

The code is as follows:

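from gn_libs3.naver_api_caller2 import get1000Result
import json

keywords = ["Small but certain happiness"]
results = []  # renamed from `list` so the built-in type is not shadowed

# Collect the search results for every keyword
for keyword in keywords:
    result = get1000Result(keyword)
    results = results + result
    print(len(results))

# Save the collected results as JSON; the file is closed automatically
with open("./search_sohachang.json", "w+") as file:
    file.write(json.dumps(results))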

The results are as follows

How can I make the content display in Korean?

crawling

2022-09-22 08:33

1 Answer

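The Hangul the asker wants is an ordinary Python str.

By default, Python's json module escapes every non-ASCII character when serializing, so 한글 is written as \ud55c\uae00. This is valid JSON, not corrupted data.

In other words, the JSON string has to be decoded back into a Python str with json.loads. Alternatively, the escaping can be turned off by passing ensure_ascii=False to json.dumps.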

import json

# json.dumps("한글", ensure_ascii=False) would keep the Hangul unescaped instead.
json_encoded_str = json.dumps("한글")
print(json_encoded_str)

# json.loads decodes the \uXXXX escapes back into a Python str
python_str = json.loads(json_encoded_str)
print(python_str)

"\ud55c\uae00"
한글
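
Applied to the file-writing step in the question, the same idea might look like the sketch below. It does not call get1000Result, since that helper is not shown here. The sample records and the output path are hypothetical stand-ins. The two changes that matter are ensure_ascii=False, which leaves the Hangul unescaped, and encoding="utf-8", which makes the file encoding explicit instead of relying on the platform default.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the list returned by the crawler.
results = [{"title": "한글", "description": "한글 블로그"}]

# Hypothetical output path in the system temp directory.
path = os.path.join(tempfile.gettempdir(), "search_sample.json")

# ensure_ascii=False writes Hangul as-is instead of \uXXXX escapes;
# encoding="utf-8" avoids depending on the platform's default encoding.
with open(path, "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)

# Read the file back to confirm the text is readable and round-trips.
with open(path, encoding="utf-8") as f:
    raw_text = f.read()

loaded = json.loads(raw_text)
print(raw_text)
```

A file that was already written with \uXXXX escapes is not damaged: loading it with json.load returns the same Korean strings, so it can be re-saved this way without crawling again.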


2022-09-22 08:33

