Using the Seoul Metropolitan Government's public data, I want to analyze how the number of students per teacher changes over time in each district. The data is available here: http://data.seoul.go.kr/openinf/linkview.jsp?infId=OA-11997&tMenu=11
I loaded the Excel file with the pandas module in Python.
I kept only columns E, H, K, and M and assigned the result to df.
This is where I'm stuck.
In the raw Excel data, Jongno-gu, which first appears in B5, appears again in B31, so the list of districts repeats every 26 rows.
How can I add a new column holding the year (2004, 2005, 2006, and so on) so that the data is organized by year for each district, such as Jongno-gu, Jung-gu, Dongjak-gu...?
python pandas dataframe data
First, build clean column names from the header rows. Then, when calling read_csv, set header=None and pass the prepared column names through the names argument.
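import pandas as pd

# file_name is the path to the tab-separated data file (not shown here)
col_names = []
with open(file_name, 'r', encoding='utf-8') as f:
    for i, line in enumerate(f):
        line = line.strip()
        if i > 3:
            break
        if i == 0:
            # First header row
            col_names1 = line.split('\t')
        if i == 2:
            # Third header row: merge it with the first row into one name per column
            for c, n in zip(col_names1, line.split('\t')):
                if c == n:
                    col_names.append(c)
                else:
                    col_names.append(c + '-' + n)

# Skip the three header rows and use the merged names instead
df = pd.read_csv(file_name, skiprows=3, names=col_names,
                 header=None, delimiter='\t', thousands=',')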
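For the repeating district blocks described in the question, one possible approach is to derive the year from each row's position and then pivot. The following is only a minimal sketch on made-up data: the column names district and students_per_teacher, the sample values, and first_year = 2004 are assumptions for illustration, and it assumes every block lists the same districts in the same order (the block size would be 26 for the sheet in the question, 3 here).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded data: one block of rows per year,
# with the same districts repeating in the same order in every block.
districts = ["Jongno-gu", "Jung-gu", "Dongjak-gu"]
df = pd.DataFrame({
    "district": districts * 3,
    "students_per_teacher": [20.1, 19.5, 22.3,
                             19.8, 19.0, 21.7,
                             19.2, 18.6, 21.0],
})

block_size = len(districts)  # 26 in the question's sheet, 3 in this toy data
first_year = 2004            # assumed year of the first block

# Integer-divide each row position by the block size to get the block number,
# then offset it by the first year.
df["year"] = first_year + np.arange(len(df)) // block_size

# One row per year, one column per district.
wide = df.pivot(index="year", columns="district", values="students_per_teacher")
print(wide)
```

With the year column in place, the long form (df) suits groupby operations, while the pivoted form (wide) gives one time series per district.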