How to Determine Whether Furigana Exists with python-docx

Asked 2 years ago, Updated 2 years ago, 47 views

I am extracting text from a .docx file with the python-docx module.
Is it possible to check whether the text contains furigana (ruby)?
If so, how can I get the contents of the ruby text?
I would appreciate any information you can share.
The environment is Windows 10 and Python 3.7.
Thank you in advance.

python python3 windows python-docx

2022-09-30 19:55

1 Answer

According to the question "Use Python docx to create phonetic guide/'Ruby text' in Word?", python-docx does not support reading ruby text.

I also read through the current python-docx documentation and found no API for getting ruby text.

However, python-docx exposes the underlying OOXML, so you can parse the markup around each ruby element directly and extract the furigana from it.
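To make the markup concrete, here is a minimal sketch of what a ruby element looks like in WordprocessingML and how the base text and furigana can be pulled out of it. The fragment is hand-written for illustration (real files carry extra properties such as w:rubyPr), and the helper names `_text` and `ruby_pairs` are hypothetical. The sketch uses the standard library's ElementTree instead of python-docx so that it runs on its own.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, the one bound to the "w:" prefix.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"

# Hand-written fragment for illustration only (an assumption, not taken
# from a real file): one plain run followed by one run holding a ruby.
SAMPLE = f"""
<w:p xmlns:w="{W_NS}">
  <w:r><w:t>今日は</w:t></w:r>
  <w:r>
    <w:ruby>
      <w:rt><w:r><w:t>かんじ</w:t></w:r></w:rt>
      <w:rubyBase><w:r><w:t>漢字</w:t></w:r></w:rubyBase>
    </w:ruby>
  </w:r>
</w:p>
"""


def _text(parent):
    """Join every w:t under parent; return "" if parent is missing."""
    if parent is None:
        return ""
    return "".join(t.text or "" for t in parent.iter(W + "t"))


def ruby_pairs(root):
    """Return (base text, furigana) for each w:ruby element under root."""
    return [
        (_text(ruby.find(W + "rubyBase")), _text(ruby.find(W + "rt")))
        for ruby in root.iter(W + "ruby")
    ]


pairs = ruby_pairs(ET.fromstring(SAMPLE.strip()))
print(pairs)        # one (base, furigana) tuple per w:ruby element
print(bool(pairs))  # True when the fragment contains furigana
```

Because lxml elements offer the same `iter` and `find` methods with `{namespace}tag` names, passing `document.element` from python-docx to `ruby_pairs` should also work, though that is untested here.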

The sample code below shows three ways to extract furigana.
If you want to check for and extract furigana run by run, use method 3 in the code.

Sample code

from docx import Document

with open('sample.docx', 'rb') as f:
    document = Document(f)

print("#1. Show the ruby text in the document")
for t in document.element.xpath("//w:ruby/w:rt/w:r/w:t"):
    print(t.text)

print("#2. Show the ruby text and base text in the document")
ns = document.element.nsmap
for ruby in document.element.xpath("//w:ruby"):
    # Pass namespaces here; otherwise
    # "lxml.etree.XPathEvalError: Undefined namespace prefix" is raised.
    r = ruby.xpath("w:rt/w:r/w:t", namespaces=ns)[0].text
    t = ruby.xpath("w:rubyBase/w:r/w:t", namespaces=ns)[0].text
    print(f"{t}({r})")

print("#3. Show the ruby text and base text of each run")
for p in document.paragraphs:
    for run in p.runs:
        ruby = run._r.xpath("w:ruby")
        # Check whether this run contains furigana
        if ruby:
            r = run._r.xpath("w:ruby/w:rt/w:r/w:t")[0].text
            t = run._r.xpath("w:ruby/w:rubyBase/w:r/w:t")[0].text
            print(f"{t}({r})")
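If all you need is the yes/no answer from the question, a standard-library check may be enough. This is a different technique from the python-docx code: it opens the .docx as a zip archive and looks for any w:ruby element. It assumes the main document part is stored at word/document.xml, which is the usual location but is not guaranteed, and it does not look at headers, footers, or footnotes. The function name `has_furigana` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace bound to the "w:" prefix in WordprocessingML, in Clark notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def has_furigana(docx_file):
    """Return True if the main document part contains a w:ruby element.

    docx_file may be a path or a binary file-like object.
    Assumption: the main part is stored at word/document.xml.
    """
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # iter() is lazy, so this stops at the first w:ruby it finds.
    return next(root.iter(W + "ruby"), None) is not None


# Example call (requires a real file, so it is left commented out):
# print(has_furigana("sample.docx"))
```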

Reference:

  • If you want to add furigana, this is for your reference


2022-09-30 19:55



© 2024 OneMinuteCode. All rights reserved.