I have a question about arranging data with the Python pandas module.

Using the Seoul Metropolitan Government's public data, I want to analyze how the number of students per teacher changes over time in each district. The data is available here: http://data.seoul.go.kr/openinf/linkview.jsp?infId=OA-11997&tMenu=11

The data I downloaded is formatted as follows:

I loaded the Excel file using the pandas module in Python.

I kept only columns E, H, K, and M in df.

I'm lost here.

In the raw Excel data, Jongno-gu, which appears in B5, appears again in B31, so each district repeats every 26 rows.

How can I add a new column so that the values for 2004, 2005, 2006, and so on are lined up by district (Jongno-gu, Jung-gu, Dongjak-gu, ...)?

python pandas dataframe data

2022-09-22 19:12

1 Answer

First, build clean column names. Then, when calling read_csv, pass header=None and supply the column names you built through the names argument.

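import pandas as pd

# file_name is the path to the tab-separated data file (not defined in this answer).
# Merge the two header rows (lines 0 and 2 of the file) into one list of column names.
col_names = []
with open(file_name, 'r', encoding='utf-8') as f:
    for i, line in enumerate(f):
        line = line.strip()
        if i > 3: break
        if i == 0:
            # Top-level header row
            col_names1 = line.split('\t')
        if i == 2:
            # Sub-header row: join it to the top-level name when the two differ
            for c, n in zip(col_names1, line.split('\t')):
                if c == n:
                    col_names.append(c)
                else:
                    col_names.append(c+'-'+n)


# Skip the three header lines and use the merged names instead.
df = pd.read_csv(file_name, skiprows=3, names=col_names, 
                 header=None, delimiter='\t', thousands=',')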
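Once the data is loaded, the year column asked about in the question could be derived from the row position. The following is only a minimal sketch under several assumptions: the rows are stacked in blocks of 26 (as described in the question), each block is one year, the blocks are in consecutive order, and the first block is 2004. The function name `add_year_column`, the column names `district`, `students_per_teacher`, and `year`, and the numbers in the toy frame are all hypothetical placeholders, not values from the Seoul data.

```python
import numpy as np
import pandas as pd

ROWS_PER_YEAR = 26  # from the question: Jongno-gu in B5 reappears in B31
FIRST_YEAR = 2004   # assumption: the first block of rows is 2004


def add_year_column(df, rows_per_year=ROWS_PER_YEAR, first_year=FIRST_YEAR):
    """Return a copy of df with a 'year' column derived from row position.

    Assumes the rows are stacked in equal-sized blocks, one block per
    consecutive year, starting at first_year.
    """
    out = df.reset_index(drop=True).copy()
    # Rows 0..rows_per_year-1 get first_year, the next block first_year+1, etc.
    out["year"] = first_year + np.arange(len(out)) // rows_per_year
    return out


# Toy frame with placeholder values: two districts, two yearly blocks.
demo = pd.DataFrame({
    "district": ["Jongno-gu", "Jung-gu", "Jongno-gu", "Jung-gu"],
    "students_per_teacher": [1.0, 2.0, 3.0, 4.0],
})
demo = add_year_column(demo, rows_per_year=2)

# One row per district, one column per year.
wide = demo.pivot(index="district", columns="year",
                  values="students_per_teacher")
print(wide)
```

With the real data you would call `add_year_column(df)` with the default block size of 26 and then pivot on your own district and value columns. If the blocks are not all the same size, or a year is missing, the position-based approach would mislabel rows, so it is worth checking a few districts by hand.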

2022-09-22 19:12
