How to Determine if Furigana Exists in python-docx

I am extracting text from a .docx file with the python-docx module.
I have one question: is it possible to check whether the text contains furigana (ruby)?
And if so, how can I get the contents of the ruby?
I would appreciate any information you can share.
The environment is Windows 10, Python 3.7.
Thank you for your cooperation.

python python3 windows python-docx

2022-09-30 19:55

1 Answer

According to the question "Use Python docx to create phonetic guide/'Ruby text' in Word?", python-docx does not support reading ruby.

I also read through the current python-docx documentation and found no API for getting ruby text.

However, python-docx gives you access to the underlying lxml elements, so you can analyze the OOXML directly and extract the ruby yourself, based on how ruby is stored in OOXML (see "(OOXML) Notes on ruby" in the references).

The sample code below shows three ways to extract furigana.
If you want to check for and extract the kana run by run, use method 3 in the code.

sample code

from docx import Document

with open('sample.docx', 'rb') as f:
    document = Document(f)

print("#1. View the rubies in the document")
# document.element.xpath() already knows the "w:" prefix, so no namespaces= is needed here
for t in document.element.xpath("//w:ruby/w:rt/w:r/w:t"):
    print(t.text)

print("#2. View the rubies and their base text")
ns = document.element.nsmap
for ruby in document.element.xpath("//w:ruby"):
    # namespaces= is required here, otherwise
    # "lxml.etree.XPathEvalError: Undefined namespace prefix" is raised
    r = ruby.xpath("w:rt/w:r/w:t", namespaces=ns)[0].text
    t = ruby.xpath("w:rubyBase/w:r/w:t", namespaces=ns)[0].text
    print(f"{t}({r})")

print("#3. View the rubies and base text run by run")
for p in document.paragraphs:
    for run in p.runs:
        # check whether this run contains furigana
        if run._r.xpath("w:ruby"):
            r = run._r.xpath("w:ruby/w:rt/w:r/w:t")[0].text
            t = run._r.xpath("w:ruby/w:rubyBase/w:r/w:t")[0].text
            print(f"{t}({r})")

References:

Configuring Word Documents
(OOXML) Notes on ruby
Toho Questions (13) Python, lxml, Default Namespace and XPath
Handling underlines, emphasis dots, and furigana in python-docx
- Useful if you want to add furigana
