I want to extract each graph (figure) together with its description from the PDF of a paper.

Asked 2 years ago, Updated 2 years ago, 165 views

I wanted to do what the title describes, and while searching for a way I found pdffigures2: https://github.com/allenai/pdffigures2
However, it is written in Scala, and I would prefer a program that I can use from Python if possible. If you have any recommendations, please let me know.

2022-09-30 21:40

1 Answer

A program that calls the Scala program from Python has been published.
It is experimental code accompanying the paper itself, and its authors say it is not meant for practical use, but it might be helpful.

allenai/deepfigures-open
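
If you only need to drive the Scala tool from Python, a thin subprocess wrapper may be enough. The following is a minimal sketch, not the deepfigures-open code: the main class name, the -m / -d flags, and the JSON field names ("name", "caption", "renderURL") are written from memory of the pdffigures2 README and should be checked against the repository. The function names are hypothetical, and sbt must be installed.

```python
import json
import subprocess
from pathlib import Path


def build_pdffigures2_command(pdf_dir, image_prefix, data_prefix):
    """Build the sbt command for pdffigures2's batch CLI.

    Assumption: main class and flags (-m image prefix, -d JSON prefix)
    are as described in the pdffigures2 README; verify before use.
    """
    pdf_dir = Path(pdf_dir).as_posix() + "/"
    run_main = (
        "runMain org.allenai.pdffigures2.FigureExtractorBatchCli "
        f"{pdf_dir} -m {image_prefix} -d {data_prefix}"
    )
    return ["sbt", run_main]


def extract_figures(repo_dir, pdf_dir, image_prefix, data_prefix):
    """Run pdffigures2 from a local checkout of its repository."""
    cmd = build_pdffigures2_command(pdf_dir, image_prefix, data_prefix)
    subprocess.run(cmd, cwd=repo_dir, check=True)


def load_figure_caption_pairs(json_path):
    """Pair each rendered figure image with its caption.

    Assumption: the JSON file is a list of objects that carry
    "name", "caption" and "renderURL" keys.
    """
    with open(json_path, encoding="utf-8") as f:
        figures = json.load(f)
    return [
        (fig.get("name"), fig.get("renderURL"), fig.get("caption"))
        for fig in figures
    ]
```

After extract_figures() finishes, calling load_figure_caption_pairs() on each JSON file written under the data prefix should give the image path and the caption as one tuple per figure, which is the "set" asked for in the question.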

I don't know whether the other tools can extract figures and descriptions as a set, but PDFMiner and PyPDF2 seem to be the well-known Python libraries for PDFs.

PDFMiner
How to extract text from PDF in Python
How to extract jpeg images from PDF in Python
[PDFMiner] Extracting text from PDF
Reporting the results of validation of text data from PDF (pdfminer.six) / Python

PyPDF2
Notes on extracting images from PDFs with Python
Extracting PDF Metadata and Text With Python
How to robustly extract author names from pdf papers?
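
With these libraries the captions have to be located by hand. The following is a rough sketch under stated assumptions: it uses extract_text from pdfminer.six to get the text and then looks for lines that start with "Figure N:" or "Fig. N." using a regular expression. The pattern and the function names are illustrative only, it captures just the first line of each caption, and it does not locate or save the figure image itself.

```python
import re

# Lines such as "Figure 1: ...", "Fig. 2. ..." or "Table 3: ...".
CAPTION_RE = re.compile(
    r"^(Fig(?:ure|\.)?|Table)\s*(\d+)\s*[:.]\s*(.+)", re.IGNORECASE
)


def find_captions(text):
    """Return (label, number, first caption line) for each caption-like line."""
    captions = []
    for line in text.splitlines():
        match = CAPTION_RE.match(line.strip())
        if match:
            label, number, body = match.groups()
            captions.append((label, int(number), body.strip()))
    return captions


def captions_from_pdf(pdf_path):
    """Extract text with pdfminer.six and scan it for captions."""
    # Imported here so find_captions() works without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return find_captions(extract_text(pdf_path))
```

Because PDF text order does not always follow reading order, treat the result as a starting point. Tools such as pdffigures2 exist precisely because matching captions to figure regions reliably is harder than this.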

An article introducing Python tools:
Working with PDFs in Python: Reading and Splitting Pages

There was also a Q&A post that introduces various tools, not only Python ones.
Extracting information from PDFs of research papers [closed]

There is also a set of tools written in C++.
XpdfReader

2022-09-30 21:40


© 2025 OneMinuteCode. All rights reserved.