Please tell me a good way to keep the vertical data horizontally in Python's Pandas.

Asked 2 years ago, Updated 2 years ago, 46 views

I'm a python beginner.
I've been doing a lot of research, but I don't feel comfortable, so could someone tell me?

◆ This is a way to convert vertical data to horizontal data in Pandas data frames.

Suppose you have the following vertical data (read from csv):

ID number item
0 A 111 nn
1 A 123 mm
2 A 211 op
3 B 121 nm
4 B 333 nn

I would like to make this horizontal data as follows.
I also want to output this in a data frame.

ID number item number item number item number item
A111 nn 123 mm 211 op
B211 nm 333 nn

I was able to execute it steadily with For statements, but considering the speed of the processing, I think it would be good.
Also, as I am not good at it, I will refrain from posting it here...

Thank you for your cooperation.

python pandas

2022-09-30 17:05

3 Answers

Apply the function f to group with groupby and combine elements in the group with apply.DataFrame, so you can't use the same column name here, so add a new number to the number1, number(code>, number2...

deff(a):    
    a. index = [0 for i in range(len(a))]
    dela ['ID']
    out = a [0:1]
    for i in range (1,len(a)):
        out=out.join(a[i:i+1], rsuffix='{0}'.format(i))
    return out

df = pd.DataFrame([['A', 111, 'nn'], ['A', 123, 'mm'], ['A', 211, 'op'],
  ['B', 121, 'nm'], ['B', 333, 'nn'], columns = ['ID', 'number', 'item'])

df2 = df.groupby(df['ID']).apply(f)

The output shows the following.

>>>print(df2)
     item item 1 item 2 number number 1 number 2
ID                                             
A 0 nn mm op 111 123 211.0
B0nmnnNaN121333Nan  

However, it is annoying that the index is multiindex with unnecessary 0s, so if you do the following, you can erase 0 and make it singleindex.

>>>df3=pandas.DataFrame(df2.to_records())
>>> del df3 ['level_0']
> > print (df3)
 ID item item1 item2 number number1 number2
0 Ann mm op 111 123 211.0
1BnmnnNaN121333NaN    


2022-09-30 17:05

Dataframe is the ideal output.

It's far from simple, but it looks like this:

>>import pandas as pd
>>> from ittertools import chain
>>>df=pd.DataFrame([
      ['A', 111, 'nn'], ['A', 123, 'mm'], ['A', 211, 'op'],
      ['B', 121, 'nm'], ['B', 333, 'nn'],
    ], columns = ['ID', 'number', 'item')

>>df
  ID number item
0 A 111 nn
1 A 123 mm
2 A 211 op
3 B 121 nm
4 B 333 nn

>>>df2=df.groupby('ID')\
      .apply(
        lambdax:
          pd.DataFrame([list(x.ID)[0]]+
                       list(chain.from_iterable(zip(x.number,x.item))))))\
      .reset_index(drop=True)

>>>df2.columns = ['ID'] + list(chain.from_iterable(
                    [['number'+str(i+1), 'item'+str(i+1)]
                       for i in range ((len(df2.columns)-1)/2)])

>>df2
  ID number 1 item 1 number 2 item 2 number 3 item 3
0 A 111 nn 123 mm 211 op
1B 121 nm 333 nn NaN NaN

The missing column is NaN.If you want to include a specific value, you may want to use filna().


2022-09-30 17:05

I can do this.
You can define an index for each ID, so you can apply it to some extent.

df["tmpindex"] = df.index
df.tmpindex=df.groupby('ID').tmpindex.rank(ascending=True)

This is what the molding looks like.

df["numberindex"] = df.tmpindex.apply(lambdax:"number{}".format(int(x)))
df["itemindex"] = df.tmpindex.apply(lambdax:"item{}".format(int(x)))
df_pv = pd.concat(
                [
                 df.pivot(index="ID", columns="numberindex", values="number"),
                 df.pivot(index="ID", columns="itemindex", values="item")
                 ],
                axis=1)
df_pv["ID"] = df_pv.index
df_pv.reset_index(drop=True).ix[:,[-1]+list(range(df_pv.shape[1]-1)]


2022-09-30 17:05

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.