How do I log in to Job Planet and scrape it? ㅜㅜ

I need to get the annual salary for each position on the Job Planet site, which is only visible after logging in. I tried the code below, but I really can't get it to work.

from bs4 import BeautifulSoup
import http.cookiejar
import urllib.parse
import urllib.request
cj = http.cookiejar.LWPCookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj)) 
urllib.request.install_opener(opener)

headers = {'User-Agent': 'Mozilla/5.0'}

params = urllib.parse.urlencode({"mode":"login", "user_email":"*******", "user_password":"******"})
params = params.encode('utf-8')
req = urllib.request.Request("https://www.jobplanet.co.kr/users/sign_in", headers=headers)
rej = urllib.request.Request("https://www.jobplanet.co.kr/companies/20575/salaries/", headers=headers)
res = opener.open(rej)

html = res.read()

python urllib login crawling scraping

2022-09-22 20:25

1 Answer

The site in the question does not authenticate by reading credentials from form data.

Instead, send a JSON string containing the credentials in the request body.
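To make the difference concrete, here is a small sketch that builds both kinds of request body with the standard library only. The field names are copied from the question's code and from the login code in this answer; the credentials are made-up placeholders, and nothing is sent over the network.

```python
import json
import urllib.parse

# Placeholder credentials, for illustration only.
email = "user@example.com"
password = "secret"

# Form-encoded body, as built in the question: key=value pairs joined by '&'.
form_body = urllib.parse.urlencode(
    {"mode": "login", "user_email": email, "user_password": password}
).encode("utf-8")

# JSON body, as used in this answer: a nested object serialized to a string.
json_body = json.dumps(
    {"user": {"email": email, "password": password, "remember_me": "true"}}
).encode("utf-8")

print(form_body)
print(json_body)
```

The two bodies carry the same credentials but in different wire formats, so a server that expects one will generally not parse the other.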

import requests

URL = 'https://www.jobplanet.co.kr/users/sign_in'

# Browser-like User-Agent; the credentials are sent as JSON, not form data.
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36'
headers = {'Content-type': 'application/json', 'Accept': 'text/plain', 'User-Agent':user_agent}
# Fill in your own email and password.
login_data = {'user':{'email':'', 'password':'', 'remember_me':'true'}}

# A session keeps the cookies set at login and reuses them on later requests.
client = requests.session()
login_response = client.post(URL, json = login_data, headers = headers)
print(login_response.content.decode('utf-8'))

# Request a page with the same session to check the logged-in state.
index = client.get(URL)
print(index.content.decode('utf-8'))
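Once the login succeeds, the same session can request the salaries page from the question. The following is a minimal sketch under some assumptions: `build_login_payload` and `fetch_salary_html` are hypothetical helper names, `session` is expected to behave like a `requests.Session`, and it has not been verified that the site still accepts this login format.

```python
SIGN_IN_URL = "https://www.jobplanet.co.kr/users/sign_in"
SALARIES_URL = "https://www.jobplanet.co.kr/companies/{company_id}/salaries/"

USER_AGENT = "Mozilla/5.0"
LOGIN_HEADERS = {
    "Content-type": "application/json",
    "Accept": "text/plain",
    "User-Agent": USER_AGENT,
}


def build_login_payload(email, password):
    """Return the JSON login payload in the shape used by the answer above."""
    return {"user": {"email": email, "password": password, "remember_me": "true"}}


def fetch_salary_html(session, email, password, company_id):
    """Log in, then fetch the salaries page with the same session.

    `session` is assumed to expose post() and get() like requests.Session,
    so the cookies set at login are sent along with the second request.
    """
    login = session.post(
        SIGN_IN_URL,
        json=build_login_payload(email, password),
        headers=LOGIN_HEADERS,
    )
    # Only catches HTTP error statuses; a rejected login may still return 200.
    login.raise_for_status()

    page = session.get(
        SALARIES_URL.format(company_id=company_id),
        headers={"User-Agent": USER_AGENT},
    )
    page.raise_for_status()
    return page.text
```

With requests installed, this could be called as `fetch_salary_html(requests.Session(), email, password, 20575)`, and the returned HTML passed to `BeautifulSoup(html, "html.parser")` as in the question. Which elements hold the salary figures depends on the page markup, so the selectors are left out here.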

2022-09-22 20:25
