Is there another way to save crawled data to a database in Python?

Asked 1 year ago, Updated 1 year ago, 406 views

import requests
from bs4 import BeautifulSoup
import pymysql

conn = pymysql.connect(host='localhost', user='root', password='db_password', charset='utf8', db='pythondb')
cur = conn.cursor()

for page in range(1, 5):
    url = "https://www.10000recipe.com/recipe/list.html?order=reco&page=" + str(page)
    print(url)
    response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
lis = soup.select('#contents_area_full > ul > ul > li')

for li in lis:  
    food = li.select_one('div.common_sp_caption > div.common_sp_caption_tit.line2')
    if food is not None:
        title = food.text
        sql = 'INSERT INTO test (Name) VALUES (%s)'
        cur.execute(sql, (title,))

for i in range(2, 17):
    trees = soup.select_one(f'div:nth-child(1) > a:nth-child({i})').get_text()
    cur.execute('INSERT INTO test (Category) VALUES (%s)', (trees,))
    conn.commit()
conn.close()

Column 1 (Name) holds the titles, but column 2 (Category) is NULL for the first 160 rows. After those, the rows have the "trees" values in column 2 (Category) and NULL in column 1 (Name). Do I need a different way of storing the data? I want to get rid of the NULL values in column 1 and sort by column 2, but I can't get the sorting to work.

python

2023-02-19 22:04

1 Answer

Your code uses the requests and BeautifulSoup libraries to crawl data and store it in a MySQL database. However, the value of the "title" variable and the value of the "trees" variable are inserted separately.

"INSERT INTO test (Name) VALUES (%s)" stores data only in the "Name" column, and "INSERT INTO test (Category) VALUES (%s)" stores data only in the "Category" column. Each INSERT creates a new row, so the names and the categories end up in different rows, with NULL in whichever column that statement did not set.
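To see why this produces NULLs, here is a minimal sketch that replays the two separate INSERT statements. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server (with pymysql the placeholder is %s instead of ?); the two-column table layout and the sample values are assumptions for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL "test" table
# (assumption: the real table has just the two columns Name and Category).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE test (Name TEXT, Category TEXT)")

# Two separate INSERT statements, as in the question: each one creates its own row.
cur.execute("INSERT INTO test (Name) VALUES (?)", ("recipe A",))        # hypothetical title
cur.execute("INSERT INTO test (Category) VALUES (?)", ("category X",))  # hypothetical category
conn.commit()

rows = cur.execute("SELECT Name, Category FROM test").fetchall()
print(rows)  # [('recipe A', None), (None, 'category X')]
conn.close()
```

Each row has a value in only one column; the column the statement did not mention is filled with NULL (None in Python).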

In the first loop no value is supplied for the "Category" column, so NULL is stored there. If you want to store the value of the variable "trees" in the "Category" column of the same row as the title, you can modify the code as follows:

for li in lis:
    food = li.select_one('div.common_sp_caption > div.common_sp_caption_tit.line2')
    if food is not None:
        title = food.text
        # Name and Category are inserted by one statement, so they land in the same row
        sql = 'INSERT INTO test (Name, Category) VALUES (%s, %s)'
        cur.execute(sql, (title, trees))
        conn.commit()
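The same idea can be sketched as a batch insert. This is a minimal sketch, not the asker's crawler: sqlite3 again stands in for MySQL, and the `pairs` list of (title, category) tuples is hypothetical, since how each recipe maps to its category on the real page is an assumption here. pymysql cursors also provide `executemany`, with %s placeholders.

```python
import sqlite3

# Hypothetical (title, category) pairs; in the real crawler these would come
# from the page, and the recipe-to-category mapping is an assumption.
pairs = [("recipe A", "category X"), ("recipe B", "category Y")]

conn = sqlite3.connect(":memory:")  # stand-in for the pymysql connection
cur = conn.cursor()
cur.execute("CREATE TABLE test (Name TEXT, Category TEXT)")

# One row per pair with both columns filled, so neither column is NULL.
# With pymysql the placeholders would be %s instead of ?.
cur.executemany("INSERT INTO test (Name, Category) VALUES (?, ?)", pairs)
conn.commit()  # a single commit after the batch instead of one per row

rows = cur.execute("SELECT Name, Category FROM test").fetchall()
null_count = cur.execute(
    "SELECT COUNT(*) FROM test WHERE Name IS NULL OR Category IS NULL"
).fetchone()[0]
print(rows, null_count)
conn.close()
```

Collecting the pairs first and inserting them in one batch keeps each title and its category together, and committing once avoids a round trip per row.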

This saves the value of the variable "title" in the "Name" column and the value of the variable "trees" in the "Category" column of the same row. Before "trees" can be saved, the text returned by "soup.select_one(f'div:nth-child(1) > a:nth-child({i})').get_text()" must be assigned to the "trees" variable inside a for loop. For example, you can assign a value to "trees" inside the for loop as shown below.

for i in range(2, 17):
    trees = soup.select_one(f'div:nth-child(1) > a:nth-child({i})').get_text()
    sql = 'INSERT INTO test (Category) VALUES (%s)'
    cur.execute(sql, (trees,))
    conn.commit()
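When passing a single query parameter, it is safest to write it as a one-element tuple such as `(trees,)`. Parentheses alone, as in `(trees)`, only group the expression and leave a plain string, while the Python DB-API (PEP 249) describes query parameters as a sequence or mapping. The short sketch below, with a hypothetical value for `trees`, shows the difference.

```python
# Hypothetical value; in the crawler this would come from get_text().
trees = "category X"

not_a_tuple = (trees)  # parentheses only group: this is still the string itself
one_tuple = (trees,)   # the trailing comma is what makes a one-element tuple

print(type(not_a_tuple).__name__)        # str
print(type(one_tuple).__name__)          # tuple
print(len(not_a_tuple), len(one_tuple))  # 10 1
```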

This saves the value of the variable "trees" in the "Category" column. To sort the stored data, use an ORDER BY clause. For example, to sort in ascending order by the "Category" column, you can write the query as follows:

SELECT * FROM test ORDER BY Category ASC;

This outputs the data sorted in ascending order by the "Category" column. Note that the query above reads the whole table and sorts it, so it may take longer as the table grows. You can reduce processing time by retrieving and sorting only the data you need.
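A minimal sketch of retrieving only what you need: select just the columns of interest, filter out NULLs with WHERE, sort with ORDER BY, and cap the result with LIMIT. In MySQL, NULL values come first in an ascending sort, so filtering them out keeps them from cluttering the sorted list. The optional DELETE removes the leftover rows whose Name is NULL; run it only if those rows are not needed. sqlite3 stands in for MySQL here, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the MySQL database
cur = conn.cursor()
cur.execute("CREATE TABLE test (Name TEXT, Category TEXT)")
# Hypothetical rows mixing the patterns described in the question.
cur.executemany(
    "INSERT INTO test (Name, Category) VALUES (?, ?)",
    [("recipe B", "soup"), ("recipe A", None), (None, "dessert"), ("recipe C", "bread")],
)

# Optional cleanup: delete the category-only rows whose Name is NULL.
cur.execute("DELETE FROM test WHERE Name IS NULL")
conn.commit()

# Fetch only the needed columns, skip NULL categories, sort, and cap the result size.
rows = cur.execute(
    "SELECT Name, Category FROM test "
    "WHERE Category IS NOT NULL "
    "ORDER BY Category ASC LIMIT 10"
).fetchall()
print(rows)  # [('recipe C', 'bread'), ('recipe B', 'soup')]
conn.close()
```

The same SQL statements can be sent to MySQL through a pymysql cursor.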


2023-02-20 09:17



© 2024 OneMinuteCode. All rights reserved.