I have a question about handling a repeating data range with the Python pandas module.

Asked 2 years ago, Updated 2 years ago, 111 views

Using the Seoul Metropolitan Government's public data, I want to analyze how the number of students per teacher changes over time in each district. Link to the data: http://data.seoul.go.kr/openinf/linkview.jsp?infId=OA-11997&tMenu=11

The downloaded data has the following format:

I loaded the Excel file with the pandas module in Python and kept only columns E, H, K, and M, assigning the result back to df.

I'm lost here.

In the raw Excel data, Jongno-gu, which first appears in B5, reappears in B31, so the list of districts repeats every 26 rows.

What should I do if I want to add a new column and arrange the data for 2004, 2005, 2006, and so on by district (Jongno-gu, Jung-gu, Dongjak-gu, ...)?

python pandas dataframe data

2022-09-22 19:12

1 Answer

First, build clean column names from the header rows. Then, when calling read_csv, set header=None and pass those column names through the names argument.

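import pandas as pd

# file_name is assumed to be the path to a tab-separated text export of the data.
col_names = []
with open(file_name, 'r', encoding='utf-8') as f:
    for i, line in enumerate(f):
        # Drop only the newline so empty cells at the ends keep their position.
        line = line.rstrip('\n')
        if i > 2:
            break
        if i == 0:
            # First header row: top-level labels.
            col_names1 = line.split('\t')
        if i == 2:
            # Third header row: sub-labels, joined to the top-level labels.
            for c, n in zip(col_names1, line.split('\t')):
                if c == n:
                    col_names.append(c)
                else:
                    col_names.append(c + '-' + n)

# Skip the three header rows and use the merged names instead.
df = pd.read_csv(file_name, skiprows=3, names=col_names,
                 header=None, delimiter='\t', thousands=',')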
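
Once the data is loaded, the repeating district blocks can be turned into a year column. The following is only a rough sketch on toy data: the column names `district` and `students_per_teacher`, the numeric values, the three-district block (instead of the 26-row block in the question), and the starting year 2004 are all assumptions made for illustration. It also assumes the blocks appear in ascending year order with no missing rows.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the loaded data: the district list repeats once per year.
# All values below are made up for illustration.
districts = ["Jongno-gu", "Jung-gu", "Dongjak-gu"]
block_size = len(districts)  # would be 26 for the file in the question
first_year = 2004            # assumed year of the first block

df = pd.DataFrame({
    "district": districts * 3,
    "students_per_teacher": [20.1, 19.5, 22.3,
                             19.8, 19.0, 21.7,
                             19.2, 18.6, 21.0],
})

# Each consecutive block of `block_size` rows belongs to one year,
# so integer division of the row position gives the block number.
df["year"] = first_year + np.arange(len(df)) // block_size

# One row per district, one column per year.
wide = df.pivot(index="district", columns="year",
                values="students_per_teacher")
print(wide)
```

If the real file has a different block length or starting year, only `block_size` and `first_year` should need to change. `pivot` raises an error when a district appears twice in the same year, which is a useful check that the block length is right.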


2022-09-22 19:12
