I'm trying to run the Python script below against the XML file below, but I don't know how to get rid of the error. Also, is there a way to deal with the garbled characters (the files are UTF-8)?
Error Code (Results)
C:\Users\g21125\python_xml_ex>python all-element.py
recipe
    dish
Traceback (most recent call last):
  File "all-element.py", line 32, in <module>
    printAllElement(xdoc.documentElement)
  File "all-element.py", line 18, in printAllElement
    printAllElement(child, hierarchy+1)
  File "all-element.py", line 18, in printAllElement
    printAllElement(child, hierarchy+1)
  File "all-element.py", line 24, in printAllElement
    if data!='\n': print("{0}{1}".format(space, node.data))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)
C:\Users\g21125\python_xml_ex>
sample.xml
<?xml version="1.0" encoding="UTF-8"?>
<recipe name="bread" preparations-time="5 minutes" cooking-time="3 hours">
    <dish>Basic Bread</dish>
    <material quality='3' unit='cup'>flour</material>
    <material quality='0.25' unit='ounce'>Yeast</material>
    <material quality='1.5' unit='cup'>water</material>
    <material quality='1' unit='teaspoon'>salt</material>
    <point>
        <process>Mix all ingredients together.</process>
        <process>Finely knead.</process>
        <process>Cover with cloth and leave in a warm room for an hour.</process>
        <process>Combine again.</process>
        <process>Place in baking container.</process>
        <process>Cover with cloth and leave in a warm room for an hour.</process>
        <process>Place in oven and bake at 180°C for 30 minutes.</process>
    </point>
</recipe>
all-element.py
# coding: utf-8
# Access all elements
from xml.dom import minidom
# Display tag names or text for all elements
def printAllElement(node, hierarchy=0):
    # Build the indentation (4 spaces per level)
    space = ''
    for i in range(hierarchy*4):
        space += ' '
    # Display tag names for element nodes
    if node.nodeType == node.ELEMENT_NODE:
        print("{0}{1}".format(space, node.tagName))
        # Recursive call for each child node
        for child in node.childNodes:
            printAllElement(child, hierarchy+1)
    # Display data if text or comment
    elif node.nodeType in [node.TEXT_NODE, node.COMMENT_NODE]:
        # Remove spaces
        data = node.data.replace(' ', '')
        # Display only when the node is not just a line break
        if data != '\n': print("{0}{1}".format(space, node.data))
# Load the sample.xml file
xdoc = minidom.parse("sample.xml")
# View all elements
printAllElement(xdoc.documentElement)
Run Results (Successful)
recipe
    dish
        Basic Bread
    material
        flour
    material
        Yeast
    material
        water
    material
        salt
    point
        process
            Mix all ingredients together.
        process
            Finely knead.
        process
            Cover with cloth and leave in a warm room for an hour.
        process
            Combine again.
        process
            Place in baking container.
        process
            Cover with cloth and leave in a warm room for an hour.
        process
            Place in oven and bake at 180°C for 30 minutes.
Change this line:
if data!='\n': print("{0}{1}".format(space, node.data))
so that the format string is a unicode literal:
if data!='\n': print(u"{0}{1}".format(space, node.data))
That said, switching to Python 3 is probably the easiest way.
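For reference, here is a minimal sketch of the same traversal on Python 3, where str is already Unicode and format() performs no implicit ASCII encoding. The inline XML string and the helper name collect_lines are assumptions made for illustration; the original script parses sample.xml and prints directly.

```python
# Python 3 sketch: str is Unicode, so format() needs no implicit encoding.
from xml.dom import minidom

# Hypothetical inline sample standing in for sample.xml.
# "\u30d1\u30f3" is the Japanese word for bread (non-ASCII on purpose).
SAMPLE = u"<recipe><dish>\u30d1\u30f3</dish><point><process>180\u2103</process></point></recipe>"

def collect_lines(node, hierarchy=0, lines=None):
    """Return indented tag names and text, mirroring printAllElement."""
    if lines is None:
        lines = []
    space = ' ' * (hierarchy * 4)
    if node.nodeType == node.ELEMENT_NODE:
        lines.append(u"{0}{1}".format(space, node.tagName))
        for child in node.childNodes:
            collect_lines(child, hierarchy + 1, lines)
    elif node.nodeType in (node.TEXT_NODE, node.COMMENT_NODE):
        # Skip whitespace-only nodes such as the line breaks between tags
        if node.data.strip():
            lines.append(u"{0}{1}".format(space, node.data))
    return lines

# Parse from UTF-8 bytes, the same encoding the XML declaration names.
xdoc = minidom.parseString(SAMPLE.encode("utf-8"))
lines = collect_lines(xdoc.documentElement)
```

Joining lines with "\n" and printing the result gives the same indented listing as the original script, provided the console can display the characters.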
if data!='\n': print("{0}{1}".format(space,node.data))
The str.format() method attempts to convert (encode) the unicode string node.data to a str. The encoding it uses is the one returned by sys.getdefaultencoding().
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
The default encoding is ascii, so a UnicodeEncodeError occurs whenever the text contains non-ASCII characters.
>>> "{}".format(u"あいう")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
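The implicit conversion amounts to calling encode('ascii') on the unicode string. The following small sketch makes that step explicit and should run on both Python 2 and Python 3; the sample text and the helper name try_encode are chosen purely for illustration.

```python
# Make explicit the encoding step that Python 2's str.format() performs implicitly.

def try_encode(text, encoding):
    """Return the encoded bytes, or the error message if encoding fails."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return str(exc)

# Three hiragana characters, written as escapes to keep this file ASCII-only.
sample = u"\u3042\u3044\u3046"

ascii_result = try_encode(sample, "ascii")  # fails: these characters have no ASCII form
utf8_result = try_encode(sample, "utf-8")   # succeeds: 3 bytes per character here
```

Here ascii_result holds the same "ordinal not in range(128)" message seen in the traceback, while utf8_result holds the encoded bytes.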
There are several workarounds.
Explicitly convert node.data (unicode string) to str string (UTF-8) instead of relying on Python auto-conversion.
if data!='\n':
    print("{0}{1}".format(space, node.data.encode('utf-8')))
Change the default encoding from ascii to UTF-8.
:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
# Load the sample.xml file
xdoc=minidom.parse("sample.xml")
:
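Use the print statement with % formatting instead of the format() method. With %, a str format string combined with a unicode argument yields a unicode result, so no implicit ASCII encoding takes place at formatting time.
if data!='\n': print "%s%s" % (space, node.data)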
:
import sys
import codecs
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
# Load the sample.xml file
xdoc=minidom.parse("sample.xml")
:
Set the encoding of the standard output (sys.stdout) to UTF-8.
sys.stdout = codecs.getwriter('utf-8')(sys.stdout)
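To see what the writer wrapper does, here is a minimal sketch in which an in-memory byte stream stands in for sys.stdout (an assumption made so the result can be inspected). It should run on both Python 2 and Python 3. On Python 3, sys.stdout is a text stream, so the equivalent wrapper would be applied to sys.stdout.buffer instead.

```python
import codecs
import io

# An in-memory byte stream stands in for sys.stdout in this demo.
raw = io.BytesIO()
writer = codecs.getwriter('utf-8')(raw)

# Unicode text written to the wrapper is encoded to UTF-8 before it reaches raw.
writer.write(u"\u30d1\u30f3\n")  # the Japanese word for bread, plus a newline

encoded = raw.getvalue()
```

After the write, encoded contains the UTF-8 bytes of the text, which is what the console or a redirected file would receive.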
The same effect can be achieved without modifying the source code by setting the environment variable PYTHONIOENCODING.
$ PYTHONIOENCODING='UTF-8' python all-element.py
On the Windows command prompt, run set PYTHONIOENCODING=UTF-8 first and then run the script.
PYTHONIOENCODING
If this is set before running the interpreter, it overrides the encoding used for stdin/stdout/stderr, in the syntax encodingname:errorhandler. The :errorhandler part is optional and has the same meaning as in str.encode().
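The effect of the variable, including the optional error handler, can be checked with a short sketch that launches a child interpreter with PYTHONIOENCODING set and captures its raw output. The helper name run_child and the one-line child program are assumptions made for illustration.

```python
import os
import subprocess
import sys

def run_child(code, io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set; return raw stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.check_output([sys.executable, "-c", code], env=env)

# The child prints a single non-ASCII character (U+3042).
code = "print(u'\\u3042')"

# encodingname only: the child's stdout is encoded as UTF-8.
utf8_bytes = run_child(code, "utf-8").strip()

# encodingname:errorhandler: unencodable characters are replaced instead of raising.
replaced = run_child(code, "ascii:replace").strip()
```

With "utf-8" the captured bytes are the UTF-8 encoding of the character; with "ascii:replace" the child no longer raises UnicodeEncodeError and emits a question mark instead.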