I want to split a PDF into specified page ranges and save each part under a specified name.

The PDF is 62 pages in total.
For this PDF, I have created a dictionary like the following. Each entry holds the name to save under and the number of pages.

pdf_dic={
    'tokyocaffe': 3,
    'yokohamabook': 10,
    'saitamahouse': 5,
    'tokyoshline': 19,
    'aichicoffee': 7,
    'Fukuokfood': 9,
    'tokyobook': 3,
    'kyotocaffe': 2,
    'shigafood': 3,
    'tokyogoods' : 1
}

I would like to create the PDFs in order, for example pages 1-3 as tokyocaffe.pdf and pages 4-13 as yokohamabook.pdf.

I can split out individual pages using PyPDF2, but I don't know how to drive the split from the contents of the dictionary.
If you know how, please let me know.

python pdf

2022-12-30 22:36

1 Answer

You can extract a range of pages by passing a tuple of (start position, end position + 1) as the pages argument of PyPDF2.PdfMerger.append. (Start and end positions are 0-based.)
Combining this with a loop over the dictionary gives the code you want.
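
As a small illustration of how the page counts turn into those tuples, here is a sketch in plain Python (no PyPDF2 needed). The helper name page_ranges is made up for this example, and the dictionary is just the first three entries from the question.

```python
from itertools import accumulate


def page_ranges(pdf_dic):
    """Map each name to a 0-based, half-open (start, end) page tuple."""
    # Running totals of the page counts are the end positions (+1).
    ends = list(accumulate(pdf_dic.values()))
    # Each range starts where the previous one ended; the first starts at 0.
    starts = [0] + ends[:-1]
    return {name: (s, e) for name, s, e in zip(pdf_dic, starts, ends)}


# First three entries of the dictionary from the question
pdf_dic = {'tokyocaffe': 3, 'yokohamabook': 10, 'saitamahouse': 5}
ranges = page_ranges(pdf_dic)
for name, (start, end) in ranges.items():
    # start + 1 .. end is the same range in 1-based page numbers
    print(f"{name}.pdf: pages {start + 1}-{end} -> pages=({start}, {end})")
```

Each tuple can be passed as is to the pages argument, since the end value is already "end position + 1".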

Sample code
So that the sample runs as a single script, it downloads the Digital Agency's open data guidelines (7 pages in total). (Requires pip install requests.)

import pathlib

# Download the open data guidelines if the file is not already present
file_name = "20220523_resources_data_guideline_01.pdf"
if not pathlib.Path(file_name).exists():
    import requests  # requires: pip install requests
    url = "https://www.digital.go.jp/assets/contents/node/basic_page/field_ref_resources/f7fde41d-ffca-4b2a-9b25-94b8a701a037/7c57e1a9/20220523_resources_data_guideline_01.pdf"
    res = requests.get(url)
    with open(file_name, "wb") as f:
        f.write(res.content)

import PyPDF2

pdf_dic = {
    'hoge': 3,
    'fuga': 2,
    'piyo': 2,
}

start = 0  # start position (0-based)
for key in pdf_dic:
    merger = PyPDF2.PdfMerger()
    end = start + pdf_dic[key]  # end position + 1
    merger.append(file_name, pages=(start, end))  # extract pages start .. end-1
    merger.write(f"{key}.pdf")  # save as <key>.pdf
    merger.close()
    start = end
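
Before running the loop on the real 62-page PDF, it may be worth checking that the page counts in the dictionary add up to the total. The sketch below is one way to do that. The function name check_page_counts is hypothetical, and the total is passed in by hand here. With PyPDF2 it could presumably be read with len(PyPDF2.PdfReader(file_name).pages).

```python
def check_page_counts(pdf_dic, total_pages):
    """Raise ValueError unless the counts exactly cover total_pages pages."""
    # Every output file needs at least one page.
    bad = [name for name, n in pdf_dic.items() if n < 1]
    if bad:
        raise ValueError(f"page count must be at least 1: {bad}")
    assigned = sum(pdf_dic.values())
    if assigned != total_pages:
        raise ValueError(
            f"dictionary covers {assigned} pages, but the PDF has {total_pages}"
        )
    return assigned


# Dictionary from the question; the PDF is stated to have 62 pages
question_dic = {
    'tokyocaffe': 3, 'yokohamabook': 10, 'saitamahouse': 5,
    'tokyoshline': 19, 'aichicoffee': 7, 'Fukuokfood': 9,
    'tokyobook': 3, 'kyotocaffe': 2, 'shigafood': 3, 'tokyogoods': 1,
}
print(check_page_counts(question_dic, 62))
```

If the totals do not match, the last files would otherwise come out short or pages would be left over, so failing early makes the mismatch easier to spot.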

2022-12-30 23:44
