Python Naver Blog Crawling Questions

Asked 2 years ago, Updated 2 years ago, 18 views

Hello, I'm trying to fetch a post from a Naver blog with Python and automatically upload it to another blog (Tistory, Daum).

import io
import re

import requests
from bs4 import BeautifulSoup
from PIL import Image


def GetNaverBloginfo():
    NAVERBLOGURL = 'https://blog.naver.com/qualisports/222647124138'
    try:
        # The post body is served inside an iframe, so fetch the outer page first.
        response = requests.get(NAVERBLOGURL)
        soup = BeautifulSoup(response.text, 'html.parser')
        ifra = soup.find('iframe', id='mainFrame')
        post_url = 'https://blog.naver.com' + ifra['src']
        res = requests.get(post_url)
        soup2 = BeautifulSoup(res.text, 'html.parser')

        # Title: drop newlines and characters that are not allowed in file names.
        titles = soup2.find_all('div', {'class': re.compile('^se-module se-module-text se-title-tex.*')})
        Navertitle = titles[0].text.replace('\n', '')
        special_char = '\\/:*?"<>|.'
        for c in special_char:
            Navertitle = Navertitle.replace(c, '')

        # Body text: collected in its own pass, separately from the images.
        Navewrcontents = ''
        txt_contents = soup2.find_all('div', {'class': "se-module se-module-text"})
        for p_span in txt_contents:
            for txt in p_span.find_all('span'):
                Navewrcontents += txt.get_text() + '\n'

        # Images: collected in a second pass, so their position relative to the text is lost.
        imgs = soup2.find_all('img', class_='se-image-resource')
        cnt = 1
        for img in imgs:
            img_url = img.get('data-lazy-src')
            if not img_url:
                continue
            res_img = requests.get(img_url).content  # download once and reuse the bytes
            if len(res_img) <= 100:
                continue  # too small to be a real image
            # Let Pillow detect the format; fall back to jpg when it cannot.
            img_format = Image.open(io.BytesIO(res_img)).format
            img_name = str(cnt) + '.' + (img_format if img_format else 'jpg')
            with open(img_name, 'wb') as f:
                f.write(res_img)
            cnt += 1

        print('Naver blog title: ' + Navertitle + '\n' + 'Naver blog contents: ' + Navewrcontents)
    except Exception as e:
        # Report the failure instead of silently swallowing it.
        print('Failed to read the blog post:', e)


GetNaverBloginfo()

That is the whole source. I found it through a Google search and modified it a little.

As I understand it, this approach finds and collects all the values for each div class separately. The output I want is the post as it appears when you open the blog post address, with the text and pictures in their original order.

The blog post mixes text and pictures, but when I run the source, the pictures are fetched separately from the text, so their order is lost. I want to get the text and the pictures in the order they appear in the post. I have thought about it but could not work it out, so I am posting this question. What should I do? Thanks in advance for any hints.
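One possible way to keep the order, shown only as a minimal sketch: match the text modules and the images in a single find_all pass, because BeautifulSoup returns matches in document order. The class names are taken from the source above. The sample HTML, the se-title-text class name, the helper names is_text_or_image and extract_in_order, and the example.com URLs are assumptions made for illustration, and the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup. The class names come from the question's code,
# but the actual structure of the Naver page is an assumption.
SAMPLE_HTML = """
<div class="se-main-container">
  <div class="se-module se-module-text"><p><span>First paragraph</span></p></div>
  <div class="se-module se-module-image"><img class="se-image-resource" data-lazy-src="https://example.com/a.jpg"></div>
  <div class="se-module se-module-text"><p><span>Second paragraph</span></p></div>
  <div class="se-module se-module-image"><img class="se-image-resource" data-lazy-src="https://example.com/b.png"></div>
</div>
"""


def is_text_or_image(tag):
    """Match body-text modules and post images in one pass."""
    classes = tag.get('class') or []
    if tag.name == 'div' and 'se-module-text' in classes:
        # Skip the title module (class name assumed to be se-title-text).
        return 'se-title-text' not in classes
    return tag.name == 'img' and 'se-image-resource' in classes


def extract_in_order(html):
    """Return ('text', str) and ('image', url) tuples in document order."""
    soup = BeautifulSoup(html, 'html.parser')
    items = []
    for tag in soup.find_all(is_text_or_image):
        if tag.name == 'img':
            url = tag.get('data-lazy-src') or tag.get('src')
            if url:
                items.append(('image', url))
        else:
            text = '\n'.join(s.get_text() for s in tag.find_all('span'))
            if text.strip():
                items.append(('text', text))
    return items


for kind, value in extract_in_order(SAMPLE_HTML):
    print(kind, value)
```

With a list like this, each image can be downloaded at the moment its entry is reached, so the numbered files line up with the surrounding text.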

python

2022-09-20 10:56

1 Answer

Judging by the question, this doesn't look like something a developer employed by a company would ask; it seems you are working on your own.

By the way, how much do you know about copyright?

Seeing that you are trying to copy the contents of a Naver blog and post them to a Tistory blog, I can't shake the feeling that this is being done for a specific purpose.

Personally, I don't understand the reason for this kind of work, but I hope you stay away from anything questionable and keep out of legal trouble.


2022-09-20 10:56



© 2024 OneMinuteCode. All rights reserved.