I want Python to save each split PDF in a folder with the same name as the file
I split a PDF with PyPDF2. The splitting works, but I don't know how to save each split file into a specific folder. What I want is to put the files into the folders named "1", "2" and "3" inside the test folder.
That is, I'd like 1.pdf, 2.pdf and 3.pdf sorted into those folders.
The folders have already been created.
1.pdf→1 folder
2.pdf→2 folder
3.pdf→3 folder
The development environment is Windows.
Can this be done with Python, or with a Windows batch file?
If possible, please let me know.
import os
import pathlib
import PyPDF2

# Program 2: get all PDFs in the current folder
curdir = os.getcwd()
files = list(pathlib.Path(curdir).glob('*.pdf'))

# Program 3: process every PDF in the folder
for file in files:
    merge = PyPDF2.PdfFileMerger()
    merge.append('test1.pdf', pages=(1, 9))
    merge.write('test4.pdf')
    merge.close()
You can specify both the folder and the file name in a single path by separating them with a path separator, / or \ (for example, 1/1.pdf).
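For the layout in the question, where 1.pdf, 2.pdf and 3.pdf already sit in the test folder, Python's standard library alone can move each file into the folder that matches its name. The sketch below is one way to do it, not the only one: the helper name move_into_same_name_folders is hypothetical, and it assumes the target folder is named after the file's stem (1.pdf goes to 1). It builds paths with pathlib's / operator, which inserts the correct separator for the OS, so you do not have to choose between / and \ yourself.

```python
import shutil
from pathlib import Path


def move_into_same_name_folders(base_dir):
    """Move each PDF in base_dir into the sub-folder named after its stem.

    Hypothetical helper (not part of PyPDF2): assumes files such as
    base_dir/1.pdf and that the folder name equals the file stem ("1").
    Returns the list of new file paths.
    """
    base = Path(base_dir)
    moved = []
    # sorted() builds the full list first, so moving files cannot disturb the glob;
    # glob("*.pdf") only looks directly inside base_dir, not in sub-folders
    for src in sorted(base.glob("*.pdf")):
        # The "/" operator joins with the right separator ("\" on Windows)
        target_dir = base / src.stem
        target_dir.mkdir(exist_ok=True)  # fine if the folder was created beforehand
        dest = target_dir / src.name
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

For example, move_into_same_name_folders("test") would move test\1.pdf to test\1\1.pdf, and likewise for 2.pdf and 3.pdf.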
The number of pages in a PDF is available from PyPDF2's numPages
property, so you can loop over the pages, starting from the first, with code like the following:
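from PyPDF2 import PdfFileReader, PdfFileWriter

src = "test1.pdf"  # the PDF to split (the file name used in your code)
pdf = PdfFileReader(src)
for i in range(pdf.numPages):
    # Build a one-page PDF for page i
    dest = PdfFileWriter()
    dest.addPage(pdf.getPage(i))
    # Page i+1 goes into folder "i+1", e.g. ./1/1.pdf
    with open(f"./{i+1}/{i+1}.pdf", "wb") as f:
        dest.write(f)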
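The loop above writes to ./1/1.pdf, ./2/2.pdf and so on, so it relies on the numbered folders already existing. The question says 1, 2 and 3 were created beforehand, but a PDF with a fourth page would make open() fail with FileNotFoundError. A minimal sketch that creates one folder per page before the loop, assuming the folders live under the current directory (the helper name make_page_folders is hypothetical):

```python
import os
from pathlib import Path


def make_page_folders(num_pages, base_dir="."):
    """Create base_dir/1 ... base_dir/<num_pages> and return their paths.

    Hypothetical helper; num_pages would come from pdf.numPages.
    exist_ok=True leaves folders that already exist untouched.
    """
    folders = []
    for i in range(num_pages):
        folder = Path(base_dir) / str(i + 1)  # page index i -> folder "i+1"
        os.makedirs(folder, exist_ok=True)
        folders.append(folder)
    return folders
```

Calling make_page_folders(pdf.numPages) just before the for loop makes the open() call safe for any page count.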
Also, because your code collects every PDF in the folder with glob, naming the split files 1.pdf, 2.pdf and so on for every source file means each PDF overwrites the pages written by the previous one.
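To see the collision concretely, the sketch below only computes where each page would be written and never opens a PDF. The page counts are hypothetical: a.pdf with 2 pages and b.pdf with 3 pages. With uniform names, page 1 of both files maps to 1/1.pdf (1\1.pdf on Windows), so the later file overwrites the earlier one; prefixing with the source file's stem gives every page its own path.

```python
from pathlib import Path


def planned_outputs(page_counts, prefix_with_stem):
    """Return the output path for every page of every source PDF.

    page_counts maps a source file name to its page count (hypothetical data).
    """
    outputs = []
    for src, num_pages in page_counts.items():
        stem = Path(src).stem
        for i in range(num_pages):
            if prefix_with_stem:
                name = f"{stem}_{i + 1}.pdf"  # e.g. a_1.pdf, b_1.pdf
            else:
                name = f"{i + 1}.pdf"  # same name for every source file
            outputs.append(Path(str(i + 1)) / name)
    return outputs


counts = {"a.pdf": 2, "b.pdf": 3}
uniform = planned_outputs(counts, prefix_with_stem=False)
prefixed = planned_outputs(counts, prefix_with_stem=True)
print(len(uniform), len(set(uniform)))    # 5 pages, but only 3 distinct paths
print(len(prefixed), len(set(prefixed)))  # 5 pages, 5 distinct paths
```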
If you want to use the original file name as a prefix for the split file names (for example, the first page of TEAMS_content_health.pdf
becomes TEAMS_content_health_1.pdf
), you can use code like the sample below.
I found the intent of your question hard to follow, because the for
statement in your code merges PDFs and does not split anything.
If this answer misses the point, please add a comment or more detail to your question.
Sample code
import os
import urllib.request
from pathlib import Path

from PyPDF2 import PdfFileReader, PdfFileWriter

# Download the sample PDFs (if they cannot be downloaded, use other PDFs instead)
urls = ["https://x54cwjdqkdu7-so-docs.netlify.app/pdf/teams/b_b/general_information/TEAMS_content_health.pdf",
        "https://x54cwjdqkdu7-so-docs.netlify.app/pdf/teams/b_b/getting_started/TEAMS_ask_a_question.pdf"]
for url in urls:
    src = url.split("/")[-1]
    urllib.request.urlretrieve(url, src)

# Loop over the PDFs in the current folder
for src in Path().glob('*.pdf'):
    # Save each page of the PDF as a separate file
    pdf = PdfFileReader(str(src))
    for i in range(pdf.numPages):
        # Build a one-page PDF for page i
        dest = PdfFileWriter()
        dest.addPage(pdf.getPage(i))
        # File name after the split (e.g. page 1 of TEAMS_content_health.pdf -> TEAMS_content_health_1.pdf)
        name = f"{src.stem}_{i+1}.pdf"
        # Create the numbered folder for this page number (no error if it already exists)
        os.makedirs(str(i+1), exist_ok=True)
        # Write the page into the numbered folder under its new name
        with open(f"./{i+1}/{name}", "wb") as f:
            dest.write(f)
Run Results
>tree/F test
Folder Path List: Volume Windows
Volume serial number is FFFF-FFFF
{Full Path}\TEST
├─1
│ TEAMS_ask_a_question_1.pdf.pdf
│ TEAMS_content_health_1.pdf.pdf
│
├─2
│ TEAMS_ask_a_question_2.pdf.pdf
│ TEAMS_content_health_2.pdf.pdf
│
├─3
│ TEAMS_content_health_3.pdf.pdf
│
└─4
TEAMS_content_health_4.pdf.pdf
© 2024 OneMinuteCode. All rights reserved.