I would like to count the number of cities by prefecture from the national address data CSV;KEN_ALL.CSV using python.
Contents of National Address Data CSV
import csv
result={}
filename = 'KEN_ALL.CSV'
prefs, cities = [ ], [ ]
with open(filename, encoding='Shift JIS', newline=')asf:
csvreader=csv.reader(f)
for row in csvreader:
prefs.append(row[6])#7th column: State/Province name
cities.append(row[7])#8th column: City name
# Remove Duplicate
prefix=[*{*prefs}]
cities=[*{*cities}]
cities=list(filter(lambdax:x.endswith('city'), cities))
length = len(cities)
print (number of elements in f'cities: {length}')
# 今回 part of this question
for pref in prefs:
result [pref] = 0
For city incities
for pref in prefs:
if city.startswith(pref):
result [pref] + = 1
break
print(result)
print(prefs)#47 cases
print(cities)#1894 cases
Doing this
{'Tokushima Prefecture':0, 'Ishikawa Prefecture':0, 'Tottori Prefecture':0, 'Gunma Prefecture':0, 'Miyazaki Prefecture':0, 'Kumamoto Prefecture':0, 'Kochi Prefecture':0, 'Osaka Prefecture':0, 'Toyama Prefecture':0, 'Hokkaido':0, 'Fukui Prefecture'
As the number of cases is large, the city's count will be zero, although only a city.How do I count the number of cities in cities and describe them by province?
python
If you want only the number of prefectures and cities from KEN_ALL.CSV
, you can do the following.
Also, if .endswith('city')
is used to determine whether the city is a city or not, the designated city is not included.
If you want to use the name of the city later or include a designated city, additional processing is required.
import csv
result={}
filename = 'KEN_ALL.CSV'
with open(filename, encoding='Shift JIS', newline=')asf:
csvreader=csv.reader(f)
pref='"
city='"
cities = 0
for row in csvreader:
p=row[6]
if pref and (pref!=p):
result [pref] = cities
cities = 0
pref = p
c=row[7]
#### If you want to include designated cities by Cabinet Order, the following is effective.
# if c. ends with ('ku') and ('city' in c):
# c=c[:(c.rfind('city')+1)]
if c.ends with ('city') and (c!=city):
cities+=1
city = c
if cities! = 0:
result [pref] = cities
print(result)
For your information, if you are using Pandas, you can write as follows:
import pandas as pd
num_cities=pd.read_csv('ken_all.zip', encoding='cp932', header=None,
usecols=[6,7], names=['State', 'City'])\
.drop_duplicates().query('City.str.endswith('City')')\
.value_counts('State').to_dict()
ref=pd.read_html('https://www.j-lis.go.jp/spd/code-address/kenbetsu-inspection/cms_11914151.html',
index_col=1)[0].drop('total')['city'].to_dict()
print(num_cities==ref)
I think it will be easy to handle this kind of process (apart from the assignment) (personally) using pandas.
CSV is It seems to be a nationwide batch (1,691,520 Byte) of zip code data, so I'll calculate it from zip.
update:simplify comments
(The site says ShiftJIS
, so just in case)
import pandas as pd
fname='/path/to/ken_all.zip'
df=pd.read_csv(fname, encoding='ShiftJIS', header=None, usecols=[6,7])
df=df.drop_duplicates().rename (columns={6:'State', 7:'City'})
df[df['City'].str.endswith('City')].groupby('State', sort=False).count()
If you want to use Python instead of Pandas, you need to find and replace equivalent features.
drop_duplicates
setset or dictdf.groupby
iitertools.groupbyimport csv
from itertools import groupby
with open(csvfile, encoding='ShiftJIS', newline=') as fp:
reader=csv.reader(fp)
dct = {k:sum(c.endswith('city') for_,king)
for k, gin groupby (dict.fromkeys(row[6], row[7]) for row in reader),
key=lambdax:x[0])}
For confirmation, compare with List of municipalities by prefecture
df, =pd.read_html ('https://www.j-lis.go.jp/spd/code-address/kenbetsu-inspection/cms_11914151.html', index_col=0)
df.rename (columns={df.columns[0]: 'State'}, replace=True)
display(df.tail())
df.City.to_list()[:-1]==list(dct.values())#True
df.set_index('Province').drop('Total').City.to_dict()==dct#True
© 2024 OneMinuteCode. All rights reserved.