I want to load a file containing Japanese text with Python's NetworkX.

Asked 2 years ago, Updated 2 years ago, 127 views

I tried to load a file containing Japanese text (a .prn file) with NetworkX, but I got the following error.
I'm sorry for such a basic question, but I would appreciate any advice.

Error
QT ------------------------------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)
<ipython-input-28-f4f3d26af7f2> in <module>()
      2 G = nx.DiGraph()
      3 # Create the edge list by loading the file
----> 4 G = nx.read_edgelist("sm10.prn", nodetype=int, create_using=nx.DiGraph())
      5 
      6 

<C:\Users\IWAMOTO MOMOKA\Anaconda2\lib\site-packages\decorator.pyc:decorator-gen-703> in read_edgelist(path, comments, delimiter, create_using, nodetype, data, edgetype, encoding)

C:\Users\IWAMOTO MOMOKA\Anaconda2\lib\site-packages\networkx\utils\decorators.pyc in _open_file(func_to_be_decorated, *args, **kwargs)
    238         # Finally, we call the original function, making sure to close the fobj
    239         try:
--> 240             result = func_to_be_decorated(*new_args, **kwargs)
    241         finally:
    242             if close_fobj:

C:\Users\IWAMOTO MOMOKA\Anaconda2\lib\site-packages\networkx\readwrite\edgelist.pyc in read_edgelist(path, comments, delimiter, create_using, nodetype, data, edgetype, encoding)
    367     return parse_edgelist(lines, comments=comments, delimiter=delimiter,
    368                           create_using=create_using, nodetype=nodetype,
--> 369                           data=data)
    370 
    371 

C:\Users\IWAMOTO MOMOKA\Anaconda2\lib\site-packages\networkx\readwrite\edgelist.pyc in parse_edgelist(lines, comments, delimiter, create_using, nodetype, data)
    267             except:
    268                 raise TypeError("Failed to convert nodes %s,%s to type %s."
--> 269                                 % (u, v, nodetype))
    270 
    271         if len(d) == 0 or data is False:

<type 'str'>: (<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii', u"Failed to convert nodes \u9d8f,\u305f\u307e\u3054 to type <type 'int'>.", 24, 25, 'ordinal not in range(128)'))

UNQT -----------------------------------------------------------------

The original data file and code are as follows:

QT ---------------------------------------------------------------------

Chicken egg
rice egg
rice cooked with omelet
charha omelet rice
chicken with a long-nosed on its back

UNQT -------------------------------------------------------------

QT ----------------------------------------------------------

# coding=UTF-8
# Import libraries
import networkx as nx
import string
import pandas as pd
import collections
import itertools
import matplotlib.pyplot as plt
import numpy as np

# Specify a directed graph
G = nx.DiGraph()
# Create the edge list by loading the file
G = nx.read_edgelist("sm10.prn", nodetype=int, create_using=nx.DiGraph())

# Print the number of nodes (vertices)
print(nx.number_of_nodes(G))
# Print the number of edges
print(nx.number_of_edges(G))
# Print basic network information
print(nx.info(G))
# Print the degree distribution
print(nx.degree_histogram(G))

UNQT -------------------------------------------------------------

python networkx

2022-09-30 17:34

1 Answer

Identifying the cause
First of all, I got the program to the point where it runs without errors and prints output.

The two points I had already raised:

  • Convert the encoding of the data file to UTF-8
  • Change nodetype=int to nodetype=str

In addition, the strings on the third and fourth lines of the data file are not separated, so they are not valid edge data. This may just be a transcription error made when the question was written, but after I corrected the data as follows, the file was processed.

Chicken egg
rice egg
rice with omelet rice
fried rice omelet rice
chicken with a long-nosed on its back
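
A check like the following can catch lines where the two node names are not separated before the file is handed to NetworkX. This is only a minimal sketch: `find_malformed_lines` is a hypothetical helper, the sample words other than the first pair are made up for illustration, and it assumes a plain two-column file (NetworkX itself also accepts extra data columns, which this stricter check would flag).

```python
def find_malformed_lines(lines, comment="#"):
    """Return (line_number, text) pairs for lines that are not 'source target'."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        # Drop anything after the comment marker, then surrounding whitespace.
        text = raw.split(comment, 1)[0].strip()
        if not text:
            continue  # blank or comment-only lines are skipped
        if len(text.split()) != 2:
            problems.append((number, text))
    return problems


# Hypothetical sample: lines 3 and 4 lack the separator between the two nodes.
sample = [
    "鶏 たまご",
    "ご飯 たまご",
    "ご飯オムライス",
    "チャーハンオムライス",
]
bad = find_malformed_lines(sample)
print([number for number, _ in bad])
```

With this sample the reported line numbers are 3 and 4, the same two lines that had to be corrected by hand.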

The output of the program with the corrected data is as follows:

6
5
Name:
Type: DiGraph
Number of nodes: 6
Number of edges: 5
Average in degree: 0.8333
Average out degree: 0.8333
[0, 2, 4]

My environment is Windows 10 64-bit, Python 3.7.6, NetworkX 2.4, pandas 1.0.1, matplotlib 3.2.0, and numpy 1.18.1.
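
For reference, the fixes can be put together roughly as follows. This is a sketch under assumptions, not the exact script I ran: the file name and the Japanese words other than the first pair are made up, and the file is written as UTF-8 inside the example so that it is self-contained. It avoids `nx.info`, which newer NetworkX releases no longer provide.

```python
import os
import tempfile

import networkx as nx

# Hypothetical two-column edge list; only the first pair comes from the error message.
edges_text = (
    "鶏 たまご\n"
    "ご飯 たまご\n"
    "ご飯 オムライス\n"
    "チャーハン オムライス\n"
    "鶏 ひよこ\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_edges.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(edges_text)

    # nodetype=str keeps the Japanese labels as they are; encoding matches the file.
    G = nx.read_edgelist(
        path, nodetype=str, create_using=nx.DiGraph(), encoding="utf-8"
    )

print(G.number_of_nodes())
print(G.number_of_edges())
print(nx.degree_histogram(G))
```

With this sample data the graph has 6 nodes and 5 edges and the degree histogram is [0, 2, 4], the same shape as the output shown above.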

Below is my initial answer.

The cause is simply that the data file was created in UTF-16 instead of UTF-8.
You can either convert the data file to UTF-8 or specify encoding='utf-16' when calling read_edgelist.

read_edgelist

read_edgelist(path, comments='#', delimiter=None, create_using=None, nodetype=None, data=True, edgetype=None, encoding='utf-8')
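
If the file really is UTF-16, the conversion can be done with the standard library alone. The sketch below assumes the source encoding is 'utf-16'; `convert_to_utf8` and the file names are hypothetical, and the UTF-16 input file is created inside the example only so that it can be run as is. The other route is to leave the file alone and pass encoding='utf-16' through the `encoding` parameter in the signature above.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="utf-16"):
    """Re-encode a text file as UTF-8 (the source encoding is an assumption)."""
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


with tempfile.TemporaryDirectory() as tmp:
    utf16_path = os.path.join(tmp, "edges_utf16.prn")  # hypothetical names
    utf8_path = os.path.join(tmp, "edges_utf8.prn")

    # Create a small UTF-16 file (with BOM) to stand in for the real data file.
    with open(utf16_path, "w", encoding="utf-16", newline="") as f:
        f.write("鶏 たまご\n")

    convert_to_utf8(utf16_path, utf8_path)

    with open(utf8_path, "rb") as f:
        converted_bytes = f.read()

# True when the converted file holds exactly the UTF-8 bytes of the text.
print(converted_bytes == "鶏 たまご\n".encode("utf-8"))
```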

Looking at the character codes in the error message, they are UTF-16 code units.

<type 'str'>: (<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii', u"Failed to convert nodes \u9d8f,\u305f\u307e\u3054 to type <type 'int'>.", 24, 25, 'ordinal not in range(128)'))

\u9d8f 鶏
\u305f た
\u307e ま
\u3054 ご
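
To check which characters the escapes in the error message stand for, they can be decoded with the standard library. This is a small sketch; it prints Unicode names rather than the characters themselves so that it also works on a console that cannot display Japanese.

```python
import unicodedata

# Node names exactly as they appear (escaped) in the error message.
escaped = r"\u9d8f,\u305f\u307e\u3054"

# Turn the literal backslash escapes into the actual characters.
decoded = escaped.encode("ascii").decode("unicode_escape")

for ch in decoded:
    if ch != ",":
        print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
```

The first node is a single kanji and the second is three hiragana letters (ta, ma, go), so the two node names were read from the file correctly.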

The 0x0a in the error you mentioned in your comment is a newline code. It would normally be handled as part of a character, so it is a little strange that it causes an error.

So I looked into the data format. Judging from the articles below, how about specifying nodetype=str instead of nodetype=int?
Introduction to NetworkX in Python
Here's an example of the data.

0 1
0 2
0 3
0 4
 :

Here's a sample program

# Create the graph
G=nx.read_edgelist('facebook_combined.txt', nodetype=int)

Analyze and visualize your network with Python! Summary of required steps
Here's an example of the data.

Actual data: edgelist.txt

A B
A C
A D
A E
A F
 :

Here's a sample program

G=nx.read_edgelist('edgelist.txt', nodetype=str)
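
The difference between the two samples comes down to whether each node label can be converted to the requested type. The sketch below imitates that step in a simplified way; `convert_node` is a hypothetical stand-in and not NetworkX's actual code, although the traceback in the question shows the library raising a similar TypeError when the conversion fails.

```python
def convert_node(token, nodetype):
    """Hypothetical stand-in for the per-node conversion that nodetype triggers."""
    try:
        return nodetype(token)
    except Exception:
        raise TypeError(
            "Failed to convert node %r to type %s." % (token, nodetype)
        )


print(convert_node("12", int))        # numeric labels convert cleanly
print(convert_node("A", str) == "A")  # any label survives nodetype=str

try:
    convert_node("\u9d8f", int)       # first node name from the error message
except TypeError:
    print("TypeError")
```

Numeric labels such as those in facebook_combined.txt work with nodetype=int, while labels such as A, B or Japanese words need nodetype=str.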


2022-09-30 17:34
