AttributeError when retrieving text: 'Page' object has no attribute 'getText'

I want to read a PDF in Python and write its text to an Excel file.
I followed this site (https://fastclassinfo.com/entry/python_pdf_to_excel/), but got the following error message:

AttributeError
'Page' object has no attribute 'getText'

Thank you for your help.

import fitz  # PyMuPDF
import openpyxl as px
from openpyxl.styles import Alignment

# Program 2 | Create a list to store the PDF text
item_list = []

# Program 3 | Open the PDF file
filename = '20180319001_1.pdf'
doc = fitz.open(filename)

# Program 4 | Get the text from the PDF one page at a time
for page in range(len(doc)):
    textblocks = doc[page].getText('blocks')  # this line raises the AttributeError
    for textblock in textblocks:
        if not textblock[4].isspace():
            item_list.append([page, textblock[4]])

# Program 5 | Create a new Excel workbook
wb = px.Workbook()
ws = wb.active

# Program 6 | Excel formatting
myalignment = Alignment(wrap_text=True, shrink_to_fit=False)
ws.column_dimensions['C'].width = 100

# Program 7 | Write the Excel header row
headers = ['No', 'Page', 'Content']
for i, header in enumerate(headers):
    ws.cell(row=1, column=1+i, value=headers[i])

# Program 8 | Write the PDF text data to Excel
for y, row in enumerate(item_list):
    ws.cell(row=y+2, column=1, value=y+1)
    for x, cell in enumerate(row):
        ws.cell(row=y+2, column=x+2, value=item_list[y][x])
        ws.cell(row=y+2, column=x+2).alignment = myalignment

# Program 9 | Save the Excel file
excelname = f'{filename}_excel_convert.xlsx'
wb.save(excelname)

Additional information:
Python 3.9.7
openpyxl Version: 3.0.4
PyMuPDF 1.21.1

python python3

2023-01-04 10:43

1 Answer

In PyMuPDF 1.20.0 it seems to work if you change getText to get_text.

I don't know whether it behaves exactly the same as getText, but I was able to get the text information. Could the asker please verify that the same information is obtained?
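
For reference, here is a minimal sketch of the page loop with the renamed method. The helper names extract_blocks and collect_text are hypothetical (they are not part of PyMuPDF), and the sketch assumes that only the method name differs between versions: it tries get_text first and falls back to getText on installs that still have the old name.

```python
def extract_blocks(page):
    """Return the text blocks of a page using whichever method name exists."""
    # Assumption: newer PyMuPDF pages expose get_text, older ones getText.
    method = getattr(page, "get_text", None) or getattr(page, "getText", None)
    if method is None:
        raise AttributeError("page object has neither get_text nor getText")
    return method("blocks")


def collect_text(doc):
    """Build [page_number, text] rows, skipping whitespace-only blocks."""
    rows = []
    for page_number, page in enumerate(doc):
        for block in extract_blocks(page):
            text = block[4]  # index 4 of a block tuple holds the block's text
            if not text.isspace():
                rows.append([page_number, text])
    return rows
```

In the question's code, item_list = collect_text(doc) would take the place of the loop in Program 4. I have not compared the output against the old getText, so please check the result.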

2023-01-04 15:36
