UnicodeEncodeError and garbled text when processing XML in Python

Asked 2 years ago, Updated 2 years ago, 105 views

I'm trying to run the Python script below against the XML file below, but I don't know how to get rid of the error. Also, is there a way to deal with garbled characters (the files use UTF-8)?

Error Code (Results)

C:\Users\g21125\python_xml_ex>python all-element.py
recipe
    dish
Traceback (most recent call last):
  File "all-element.py", line 32, in <module>
    printAllElement(xdoc.documentElement)
  File "all-element.py", line 18, in printAllElement
    printAllElement(child, hierarchy+1)
  File "all-element.py", line 18, in printAllElement
    printAllElement(child, hierarchy+1)
  File "all-element.py", line 24, in printAllElement
    if data!='\n': print("{0}{1}".format(space, node.data))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

C:\Users\g21125\python_xml_ex>

sample.xml

<?xml version="1.0" encoding="UTF-8"?>
<recipe name="bread" preparations-time="5 minutes" cooking-time="3 hours">
    <dish>Basic Bread</dish>
    <material quantity='3' unit='cup'>flour</material>
    <material quantity='0.25' unit='ounce'>yeast</material>
    <material quantity='1.5' unit='cup'>water</material>
    <material quantity='1' unit='teaspoon'>salt</material>
    <point>
        <process>Mix all ingredients together.</process>
        <process>Finely knead.</process>
        <process>Cover with cloth and leave in a warm room for an hour.</process>
        <process>Combine again.</process>
        <process>Place in baking container.</process>
        <process>Cover with cloth and leave in a warm room for an hour.</process>
        <process>Place in oven and bake at 180°C for 30 minutes.</process>
    </point>
</recipe>

all-element.py

# coding: utf-8
# Access all elements

from xml.dom import minidom

# Display tag names or text for all elements
def printAllElement(node, hierarchy=0):
    # Build the indentation
    space = ''
    for i in range(hierarchy*4):
        space += ' '

    # Display tag names for element nodes
    if node.nodeType == node.ELEMENT_NODE:
        print("{0}{1}".format(space, node.tagName))
        # Recursive call
        for child in node.childNodes:
            printAllElement(child, hierarchy+1)
    # Display data if text or comment
    elif node.nodeType in [node.TEXT_NODE, node.COMMENT_NODE]:
        # Remove spaces
        data = node.data.replace(' ', '')
        # Display only if the node is not just a line break
        if data!='\n': print("{0}{1}".format(space, node.data))

# Load the sample.xml file
xdoc = minidom.parse("sample.xml")

# View all elements
printAllElement(xdoc.documentElement)

Run Results (Successful)

recipe
    dish
        Basic Bread
    material
        flour
    material
        yeast
    material
        water
    material
        salt
    point
        process
            Mix all ingredients together.
        process
            Finely knead.
        process
            Cover with cloth and leave in a warm room for an hour.
        process
            Combine again.
        process
            Place in baking container.
        process
            Cover with cloth and leave in a warm room for an hour.
        process
            Place in oven and bake at 180°C for 30 minutes.

python xml

2022-09-30 20:59

2 Answers

Change this line:

if data!='\n': print("{0}{1}".format(space, node.data))

to use a unicode format string:

if data!='\n': print(u"{0}{1}".format(space, node.data))


2022-09-30 20:59

Switching to Python 3 is probably the easiest fix.
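A minimal sketch of what the Python 3 route might look like. The inline XML, the Japanese sample text, and the name element_lines() are illustrative assumptions, not the original files; the whitespace check is also simplified to strip().

```python
# Python 3 sketch of the question's traversal. In Python 3 every str is
# Unicode, so str.format() and concatenation never fall back to ASCII.
from xml.dom import minidom

# Illustrative stand-in for sample.xml (not the original file contents).
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<recipe name="パン">
    <dish>基本的なパン</dish>
    <point>
        <process>180°C</process>
    </point>
</recipe>"""


def element_lines(node, depth=0):
    """Yield one indented line per element name and per non-blank text node."""
    indent = " " * (depth * 4)
    if node.nodeType == node.ELEMENT_NODE:
        yield indent + node.tagName
        for child in node.childNodes:
            yield from element_lines(child, depth + 1)
    elif node.nodeType in (node.TEXT_NODE, node.COMMENT_NODE):
        text = node.data.strip()
        if text:  # skip whitespace-only nodes between elements
            yield indent + text


# Parse UTF-8 bytes, as minidom.parse() would when reading a UTF-8 file.
xdoc = minidom.parseString(SAMPLE_XML.encode("utf-8"))
lines = list(element_lines(xdoc.documentElement))

if __name__ == "__main__":
    for line in lines:
        print(line)
```

Whether the text then displays correctly still depends on the console's encoding, which is what the stdout-related workarounds in this answer address.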

if data!='\n': print("{0}{1}".format(space,node.data))

In this line, str.format() is called on a byte string (str), so it tries to convert (encode) the unicode string node.data to a str. The encoding it uses is the one returned by sys.getdefaultencoding().

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

The default encoding is ascii, so any non-ASCII character raises UnicodeEncodeError.

>>> "{}".format(u"あいう")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
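For readers on Python 3, the same failure can be reproduced by doing the encoding step by hand. This is only a sketch of what Python 2 does implicitly; the sample string is an arbitrary three-character example, not text from the question.

```python
# Python 3 sketch: perform by hand the ASCII encoding that Python 2's
# str.format() applies implicitly to unicode arguments.
text = "\u3042\u3044\u3046"  # three non-ASCII characters (arbitrary sample)

try:
    text.encode("ascii")  # what Python 2 effectively attempted
except UnicodeEncodeError as exc:
    error_message = str(exc)
    print(error_message)

# Encoding explicitly as UTF-8 succeeds.
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes))
```

The error message names the same 'ascii' codec and the same character positions as the Python 2 traceback, which is why choosing the encoding explicitly avoids the problem.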

There are several workarounds.

Explicitly convert node.data (unicode string) to str string (UTF-8) instead of relying on Python auto-conversion.

if data!='\n':
  print("{0}{1}".format(space, node.data.encode('utf-8')))

Change the default encoding from ascii to UTF-8.

:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

# Load the sample.xml file
xdoc=minidom.parse("sample.xml")
          :

Use the print statement with % formatting instead of the format() method.

if data!='\n': print "%s%s"%(space,node.data)

Set the encoding of the standard output (sys.stdout) to UTF-8.

          :
import sys
import codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)

# Load the sample.xml file
xdoc=minidom.parse("sample.xml")
          :
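In Python 3, sys.stdout is already a text stream, so the codecs.getwriter() wrapper is usually replaced by sys.stdout.reconfigure(encoding="utf-8") (available from Python 3.7). The sketch below shows the same idea with an in-memory buffer standing in for the real stdout, so the bytes that would be written can be inspected; the buffer and the sample text are illustrative assumptions.

```python
# Python 3 sketch: a text stream that encodes everything written to it as
# UTF-8, similar in spirit to codecs.getwriter('utf-8')(sys.stdout) in Python 2.
import io

raw = io.BytesIO()  # stand-in for the byte stream underneath sys.stdout
utf8_out = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")

print("180\u00b0C", file=utf8_out)  # text containing a non-ASCII degree sign
utf8_out.flush()

written = raw.getvalue()  # the UTF-8 bytes that reached the underlying stream
print(written)

# On a real Python 3.7+ interpreter the standard stream itself can be switched:
#     import sys
#     sys.stdout.reconfigure(encoding="utf-8")
```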

The same effect can be achieved without changing the source code by setting the environment variable PYTHONIOENCODING.

$ PYTHONIOENCODING='UTF-8' python all-element.py

PYTHONIOENCODING

If this is set before running the interpreter, it overrides the encoding used for stdin/stdout/stderr, in the syntax encodingname:errorhandler. The errorhandler part is optional and has the same meaning as in str.encode().
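The effect of both forms of the variable can be checked from Python itself. The following is a sketch written for Python 3; the helper name run_child() and the one-character sample are illustrative assumptions. It starts a child interpreter with PYTHONIOENCODING set and captures the raw bytes the child writes to stdout.

```python
# Sketch: run a child interpreter with PYTHONIOENCODING set and capture
# the bytes it writes to its standard output.
import os
import subprocess
import sys

# The child prints one non-ASCII character (written as an escape so that
# the command line itself stays pure ASCII).
CHILD_CODE = r"print('\u3042')"


def run_child(io_encoding):
    """Run CHILD_CODE with PYTHONIOENCODING=io_encoding; return stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.strip()  # drop the trailing newline


utf8_bytes = run_child("utf-8")        # encodingname only
replaced = run_child("ascii:replace")  # encodingname:errorhandler
print(utf8_bytes, replaced)
```

With "utf-8" the child emits the character's UTF-8 bytes. With "ascii:replace" the replace handler substitutes a question mark instead of raising UnicodeEncodeError.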


2022-09-30 20:59



© 2024 OneMinuteCode. All rights reserved.