Handling missing items when crawling

Asked 1 year ago, Updated 1 year ago, 356 views

Hello, I'm an office worker who has just started learning Python.

I'm crawling paper pages on a journal's website.
I'm using Selenium and Beautiful Soup 4.
On each paper's page the title is always in the same position, so I was able to get it like this:

driver.find_element_by_class_name("c-article-title").text

The items Received, Revised, Accepted, Published, Issue Date, and DOI all belong to the same class, c-article-subject-list, so I scraped them in order by index, like this:

driver.find_elements_by_class_name("c-bibliographic-information__value")[0].text

The problem is that some papers are missing some of these items, such as Revised.
One paper has Received (0), Revised (1), Accepted (2), Published (3), Issue Date (4), DOI (5);
another has only Received (0), Accepted (1), Published (2), Issue Date (3), DOI (4);
and some have just Published (0), DOI (1).

Because I scrape every paper in the same loop with fixed indices, I've confirmed that these differences cause errors.
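A minimal sketch with hypothetical values (plain Python lists standing in for the scraped `.text` strings) shows both ways fixed positions go wrong: a shifted item is silently read under the wrong name, and a trailing index can fall off the end of the list. The names `paper_a`, `paper_b`, `REVISED`, and `DOI` are illustrative only.

```python
# Hypothetical scraped values for two papers; paper_b has no "Revised" item.
paper_a = ["01 March 2021", "20 April 2021", "10 May 2021",
           "15 June 2021", "July 2021", "10.0000/a"]
paper_b = ["01 March 2021", "10 May 2021",
           "15 June 2021", "July 2021", "10.0000/b"]

# Fixed positions that are only correct when all six items exist.
REVISED, DOI = 1, 5

print(paper_a[REVISED])  # the real Revised date
print(paper_b[REVISED])  # actually the Accepted date, read as Revised

try:
    print(paper_b[DOI])
except IndexError as exc:
    # paper_b has only 5 values, so index 5 does not exist
    print("paper_b has no index 5:", exc)
```

The first failure is the more dangerous one, because it produces wrong data without raising any error.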

I searched Google, but with my limited skills I couldn't find a proper answer, so I'm asking here.

When every item shares the same class name, as above, and which items are present changes from page to page, how can I scrape the data I want?
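One way to make position irrelevant is to read each item's label together with its value from the same list entry, store them in a dictionary, and then look up the fields you want with `dict.get`, which returns `None` for anything missing. The sketch below assumes markup where each item is an `<li>` whose text holds the label and whose value sits in a `span` with class `c-bibliographic-information__value`; that markup, the sample HTML, and the names `BibInfoParser` and `parse_bib_info` are hypothetical, so check the real page's HTML first. It uses the standard library's `html.parser` only so it runs without extra installs; Beautiful Soup can do the same pairing more concisely, and with Selenium the page HTML is available as `driver.page_source`.

```python
from html.parser import HTMLParser

# The six items listed in the question.
FIELDS = ["Received", "Revised", "Accepted", "Published", "Issue Date", "DOI"]


class BibInfoParser(HTMLParser):
    """Collect {label: value} pairs from <li> items (assumed markup).

    Text inside the value span becomes the value; any other text in the
    same <li> becomes the label. Nested <span>s inside the value span are
    not handled in this simple sketch.
    """

    def __init__(self):
        super().__init__()
        self.items = {}
        self._in_li = False
        self._in_value = False
        self._label = []
        self._value = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li":
            self._in_li = True
            self._label, self._value = [], []
        elif tag == "span" and "c-bibliographic-information__value" in classes:
            self._in_value = True

    def handle_endtag(self, tag):
        if tag == "span" and self._in_value:
            self._in_value = False
        elif tag == "li" and self._in_li:
            label = "".join(self._label).strip()
            self.items[label] = "".join(self._value).strip()
            self._in_li = False

    def handle_data(self, data):
        if self._in_value:
            self._value.append(data)
        elif self._in_li:
            self._label.append(data)


def parse_bib_info(html):
    """Return a dict with every field in FIELDS; missing items map to None."""
    parser = BibInfoParser()
    parser.feed(html)
    parser.close()
    return {field: parser.items.get(field) for field in FIELDS}


# Hypothetical page fragment with only three of the six items.
sample = """
<ul>
  <li><p>Received<span class="c-bibliographic-information__value">01 March 2021</span></p></li>
  <li><p>Published<span class="c-bibliographic-information__value">15 June 2021</span></p></li>
  <li><p>DOI<span class="c-bibliographic-information__value">https://doi.org/10.0000/example</span></p></li>
</ul>
"""
print(parse_bib_info(sample))
```

Because the result always has the same six keys in the same order, every paper can be saved as a row with the same columns, with empty cells where an item is absent.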


python crawling beautifulsoup selenium-webdrive

2022-12-19 17:19

1 Answer

```python
from selenium.webdriver.common.by import By

# Then what you actually need to loop over is "every item that can exist".
attributes = ['Received', 'Revised', 'Accepted', ..., 'DOI']  # replace ... with the remaining item names

elements = driver.find_elements(By.CLASS_NAME, "c-bibliographic-information__value")

# For each possible item:
for a in attributes:
    for e in elements:
        if a in e.text:  # assumes the element's text contains the item's label
            # What to do when the item is present
            break
    else:
        # What to do when the item is absent (runs only if no element matched)
        pass

# After this loop, every item defined in attributes has been handled,
# whether or not it was in the crawled data.
```
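Because `WebElement.text` is just a string, the answer's loop can be tried out on plain strings without a browser. This is a minimal sketch, assuming (as the answer does) that each element's text contains the item's label; the function name `collect` and the sample texts are hypothetical.

```python
# The six items listed in the question.
ATTRIBUTES = ["Received", "Revised", "Accepted", "Published", "Issue Date", "DOI"]


def collect(texts):
    """Mirror the answer's nested loop on plain strings.

    `texts` stands in for [e.text for e in elements]; each text is assumed
    to contain its item's label (hypothetical input format).
    Returns {attribute: matching text, or None if no text mentions it}.
    """
    row = {}
    for a in ATTRIBUTES:
        for t in texts:
            if a in t:
                row[a] = t  # item present: keep the first matching text
                break
        else:
            row[a] = None  # inner loop finished without break: item absent
    return row


texts = ["Received: 01 March 2021", "Published: 15 June 2021", "DOI: 10.0000/example"]
print(collect(texts))
```

The `else` on the inner `for` runs only when no `break` happened, so the "absent" case is decided once per attribute after all elements have been checked, rather than once for every element that doesn't match.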


2022-12-19 23:18



© 2024 OneMinuteCode. All rights reserved.