How to Normalize Characters in a Word Document to Unicode NFC

Asked 1 year ago, Updated 1 year ago, 59 views

Is there an easy way to normalize characters to Unicode NFC in Microsoft Word? I am picturing something like Emacs's ucs-normalize-NFC-region.

In plain-text and Word documents created on macOS, kana with dakuten and handakuten (voiced and semi-voiced sound marks) often end up as a mix of combining character sequences and precomposed characters. I suspect this comes from differences in how applications behave when copying and pasting between them. Left as is, the mix makes searching unreliable, so I would like to normalize the text to NFC.
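To illustrate the problem, here is a small Python sketch (not part of the original question) using the standard unicodedata module. It compares the precomposed form of が with the combining sequence か + U+3099, which render identically but do not match in a plain string comparison or search.

```python
import unicodedata

precomposed = "\u304c"      # が as one precomposed code point
combining = "\u304b\u3099"  # か followed by the combining voiced sound mark (dakuten)

# The two strings look the same on screen but are different code point sequences.
same_before = precomposed == combining

# After NFC normalization the combining sequence collapses to the precomposed form.
same_after = unicodedata.normalize("NFC", combining) == precomposed

print(same_before, same_after)
```

The same applies to handakuten, for example は + U+309A versus the precomposed ぱ.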

Keeping the Word document format is a requirement. Also, I am asking about the content of the document, not the file name.

unicode ms-word

2022-09-30 21:26

1 Answer

Word's .docx format is a ZIP archive of XML files with a different file extension. The text of the document is stored in XML elements, so NFC normalization should not be difficult.

If the document is encrypted, the text inside it is of course encrypted as well, so it has to be decrypted first.
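A minimal sketch of this approach in Python, using only the standard library. The function names normalize_xml_text and normalize_docx_nfc are made up for illustration. The sketch assumes that the XML parts are UTF-8 and that text runs use the w: prefix (<w:t> elements), which is what Word normally writes. It normalizes only the text between the tags rather than the whole XML, because a combining mark directly after a markup character such as > could otherwise be composed with it.

```python
import re
import unicodedata
import zipfile

# Matches one <w:t> text element. The "w:" prefix is an assumption based on
# what Word normally writes; <w:tab/>, <w:tbl> and similar tags do not match.
W_T = re.compile(r"(<w:t(?:\s[^>]*)?>)([^<]*)(</w:t>)")


def normalize_xml_text(xml: str) -> str:
    """NFC-normalize only the text inside <w:t> elements; markup is untouched."""
    return W_T.sub(
        lambda m: m.group(1) + unicodedata.normalize("NFC", m.group(2)) + m.group(3),
        xml,
    )


def normalize_docx_nfc(src_path: str, dst_path: str) -> None:
    """Copy a .docx, rewriting the XML parts under word/ with NFC text."""
    with zipfile.ZipFile(src_path) as zin, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.startswith("word/") and item.filename.endswith(".xml"):
                # Assumption: the part is UTF-8 encoded.
                data = normalize_xml_text(data.decode("utf-8")).encode("utf-8")
            # Every other part (images, relationships, ...) is copied as is.
            zout.writestr(item, data)
```

It could be called as normalize_docx_nfc("input.docx", "output.docx"), writing to a new file so the original stays intact. One limitation: if a base character and its combining mark ended up in two separate <w:t> runs, this sketch will not compose them. A password-protected document is not expected to open as a ZIP archive at all, so zipfile should raise an error until the file has been decrypted.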


2022-09-30 21:26



© 2024 OneMinuteCode. All rights reserved.