I want to count the number of cities by province from the csv file.

Asked 1 years ago, Updated 1 years ago, 234 views

I would like to count the number of cities by prefecture from the national address data CSV;KEN_ALL.CSV using python.

National Address Data CSV
Contents of National Address Data CSV

import csv
result={}
filename = 'KEN_ALL.CSV'
prefs, cities = [ ], [ ]
with open(filename, encoding='Shift JIS', newline=')asf:
    csvreader=csv.reader(f)
    for row in csvreader:
        prefs.append(row[6])#7th column: State/Province name
        cities.append(row[7])#8th column: City name

 # Remove Duplicate
prefix=[*{*prefs}]
cities=[*{*cities}]

cities=list(filter(lambdax:x.endswith('city'), cities))

length = len(cities)
print (number of elements in f'cities: {length}')

# 今回 part of this question
for pref in prefs:
    result [pref] = 0

For city incities
    for pref in prefs:
        if city.startswith(pref):
            result [pref] + = 1
            break

print(result)
print(prefs)#47 cases
print(cities)#1894 cases

Doing this

{'Tokushima Prefecture':0, 'Ishikawa Prefecture':0, 'Tottori Prefecture':0, 'Gunma Prefecture':0, 'Miyazaki Prefecture':0, 'Kumamoto Prefecture':0, 'Kochi Prefecture':0, 'Osaka Prefecture':0, 'Toyama Prefecture':0, 'Hokkaido':0, 'Fukui Prefecture'

As the number of cases is large, the city's count will be zero, although only a city.How do I count the number of cities in cities and describe them by province?

python

2022-12-30 02:00

3 Answers

If you want only the number of prefectures and cities from KEN_ALL.CSV, you can do the following.
Also, if .endswith('city') is used to determine whether the city is a city or not, the designated city is not included.
If you want to use the name of the city later or include a designated city, additional processing is required.

import csv
result={}
filename = 'KEN_ALL.CSV'
with open(filename, encoding='Shift JIS', newline=')asf:
    csvreader=csv.reader(f)
    pref='"
    city='"
    cities = 0
    for row in csvreader:
        p=row[6]
        if pref and (pref!=p):
            result [pref] = cities
            cities = 0
        pref = p
        c=row[7]
        #### If you want to include designated cities by Cabinet Order, the following is effective.
        # if c. ends with ('ku') and ('city' in c):
        #     c=c[:(c.rfind('city')+1)]
        if c.ends with ('city') and (c!=city):
            cities+=1
            city = c
    
    if cities! = 0:
        result [pref] = cities

print(result)


2022-12-30 02:21

For your information, if you are using Pandas, you can write as follows:

import pandas as pd

num_cities=pd.read_csv('ken_all.zip', encoding='cp932', header=None,
                         usecols=[6,7], names=['State', 'City'])\
               .drop_duplicates().query('City.str.endswith('City')')\
               .value_counts('State').to_dict()

ref=pd.read_html('https://www.j-lis.go.jp/spd/code-address/kenbetsu-inspection/cms_11914151.html',
                   index_col=1)[0].drop('total')['city'].to_dict()

print(num_cities==ref)


2022-12-31 10:07

How to use groupby

I think it will be easy to handle this kind of process (apart from the assignment) (personally) using pandas.
CSV is It seems to be a nationwide batch (1,691,520 Byte) of zip code data, so I'll calculate it from zip.

update:simplify comments
(The site says ShiftJIS, so just in case)

import pandas as pd

fname='/path/to/ken_all.zip'
df=pd.read_csv(fname, encoding='ShiftJIS', header=None, usecols=[6,7])
df=df.drop_duplicates().rename (columns={6:'State', 7:'City'})
df[df['City'].str.endswith('City')].groupby('State', sort=False).count()

If you want to use Python instead of Pandas, you need to find and replace equivalent features.

  • drop_duplicates setset or dict
  • df.groupby iitertools.groupby
import csv
from itertools import groupby

with open(csvfile, encoding='ShiftJIS', newline=') as fp:
    reader=csv.reader(fp)
    dct = {k:sum(c.endswith('city') for_,king)
            for k, gin groupby (dict.fromkeys(row[6], row[7]) for row in reader),
                               key=lambdax:x[0])}

For confirmation, compare with List of municipalities by prefecture

df, =pd.read_html ('https://www.j-lis.go.jp/spd/code-address/kenbetsu-inspection/cms_11914151.html', index_col=0)
df.rename (columns={df.columns[0]: 'State'}, replace=True)
display(df.tail())

df.City.to_list()[:-1]==list(dct.values())#True
df.set_index('Province').drop('Total').City.to_dict()==dct#True


2022-12-31 17:51

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.