I'm a python beginner.
I've been doing a lot of research, but I don't feel comfortable, so could someone tell me?
◆ This is a way to convert vertical data to horizontal data in Pandas data frames.
Suppose you have the following vertical data (read from csv):
ID number item
0 A 111 nn
1 A 123 mm
2 A 211 op
3 B 121 nm
4 B 333 nn
I would like to make this horizontal data as follows.
I also want to output this in a data frame.
ID number item number item number item number item
A111 nn 123 mm 211 op
B211 nm 333 nn
I was able to execute it steadily with For statements, but considering the speed of the processing, I think it would be good.
Also, as I am not good at it, I will refrain from posting it here...
Thank you for your cooperation.
python pandas
Apply the function The output shows the following. However, it is annoying that the index is multiindex with unnecessary 0s, so if you do the following, you can erase 0 and make it singleindex.f
to group with groupby
and combine elements in the group with apply
.DataFrame
, so you can't use the same column name here, so add a new number to the number1
, number(code>,
number2...
deff(a):
a. index = [0 for i in range(len(a))]
dela ['ID']
out = a [0:1]
for i in range (1,len(a)):
out=out.join(a[i:i+1], rsuffix='{0}'.format(i))
return out
df = pd.DataFrame([['A', 111, 'nn'], ['A', 123, 'mm'], ['A', 211, 'op'],
['B', 121, 'nm'], ['B', 333, 'nn'], columns = ['ID', 'number', 'item'])
df2 = df.groupby(df['ID']).apply(f)
>>>print(df2)
item item 1 item 2 number number 1 number 2
ID
A 0 nn mm op 111 123 211.0
B0nmnnNaN121333Nan
>>>df3=pandas.DataFrame(df2.to_records())
>>> del df3 ['level_0']
> > print (df3)
ID item item1 item2 number number1 number2
0 Ann mm op 111 123 211.0
1BnmnnNaN121333NaN
Dataframe is the ideal output.
It's far from simple, but it looks like this:
>>import pandas as pd
>>> from ittertools import chain
>>>df=pd.DataFrame([
['A', 111, 'nn'], ['A', 123, 'mm'], ['A', 211, 'op'],
['B', 121, 'nm'], ['B', 333, 'nn'],
], columns = ['ID', 'number', 'item')
>>df
ID number item
0 A 111 nn
1 A 123 mm
2 A 211 op
3 B 121 nm
4 B 333 nn
>>>df2=df.groupby('ID')\
.apply(
lambdax:
pd.DataFrame([list(x.ID)[0]]+
list(chain.from_iterable(zip(x.number,x.item))))))\
.reset_index(drop=True)
>>>df2.columns = ['ID'] + list(chain.from_iterable(
[['number'+str(i+1), 'item'+str(i+1)]
for i in range ((len(df2.columns)-1)/2)])
>>df2
ID number 1 item 1 number 2 item 2 number 3 item 3
0 A 111 nn 123 mm 211 op
1B 121 nm 333 nn NaN NaN
The missing column is NaN
.If you want to include a specific value, you may want to use filna()
.
I can do this.
You can define an index for each ID, so you can apply it to some extent.
df["tmpindex"] = df.index
df.tmpindex=df.groupby('ID').tmpindex.rank(ascending=True)
This is what the molding looks like.
df["numberindex"] = df.tmpindex.apply(lambdax:"number{}".format(int(x)))
df["itemindex"] = df.tmpindex.apply(lambdax:"item{}".format(int(x)))
df_pv = pd.concat(
[
df.pivot(index="ID", columns="numberindex", values="number"),
df.pivot(index="ID", columns="itemindex", values="item")
],
axis=1)
df_pv["ID"] = df_pv.index
df_pv.reset_index(drop=True).ix[:,[-1]+list(range(df_pv.shape[1]-1)]
© 2024 OneMinuteCode. All rights reserved.