I want to get the total number of values for each column except NaN in the order of df to dd index.
If the previous df value is NaN, I want it to be counted without that row.
The hardcoded ones are as follows.
In the example, there are only 4 columns, but in case there are more, I want to use the Lambda application to make it simple, but it's hard.
If not, even a for statement.
02 -> 5 pieces (foo, fid, bar, cute, you)
06 -> 4 pieces (q, d, e, vd)
07 -> 2 pieces (g, z)
08 -> 1 piece (a)
[5, 4, 2, 1] I hope the result comes out, but I don't know what to do.
import pandas as pd
import numpy as np
dd = {'02':1,'06':2,'07':3,'08':4}
dd = pd.DataFrame.from_dict(dd, orient = 'index', columns = ['step'])
print(dd)
left = pd.DataFrame({'02':['foo','fid','bar','cute'], '06':['q','d','e','vd']})
right = pd.DataFrame({'02':['foo','you','bar'],'07':['g','v','z'], '08':['a','b','c']})
df = pd.merge(left,right, on=['02'], how='outer')
df.iloc[2,3] = np.nan
print()
print(df)
df2 = df.loc[df['06'].notnull()]
df3 = df2.loc[df2['07'].notnull()]
df4 = df3.loc[df3['08'].notnull()]
a = df['02'].notnull().sum()
b = df2['06'].notnull().sum()
c = df3['07'].notnull().sum()
d = df4['08'].notnull().sum()
print()
print([a,b,c,d])
import pandas as pd
import numpy as np
dd = {"02": 1, "06": 2, "07": 3, "08": 4}
dd_df = pd.DataFrame.from_dict(dd, orient="index", columns=["step"])
print(dd_df)
left = pd.DataFrame({"02": ["foo", "fid", "bar", "cute"], "06": ["q", "d", "e", "vd"]})
right = pd.DataFrame({"02": ["foo", "you", "bar"], "07": ["g", "v", "z"], "08": ["a", "b", "c"]})
df = pd.merge(left, right, on=["02"], how="outer")
df.iloc[2, 3] = np.nan
print()
print(df)
"""
df2 = df.loc[df['06'].notnull()]
df3 = df2.loc[df2['07'].notnull()]
df4 = df3.loc[df3['08'].notnull()]
a = df['02'].notnull().sum()
b = df2['06'].notnull().sum()
c = df3['07'].notnull().sum()
d = df4['08'].notnull().sum()
print()
print([a,b,c,d])
"""
df = df.notnull()
print(df)
sorted_col = sorted(dd.keys(), key=lambda c: dd[c])
# # print(sorted_col)
for col1, col2 in zip(sorted_col[:-1], sorted_col[1:]):
df[col2] = df[col1] & df[col2]
print(f"-- after ANDing {col1} and {col2} --")
print(df)
print(df.sum(axis=0))
step
02 1
06 2
07 3
08 4
02 06 07 08
0 foo q g a
1 fid d NaN NaN
2 bar e z NaN
3 cute vd NaN NaN
4 you NaN v b
02 06 07 08
0 True True True True
1 True True False False
2 True True True False
3 True True False False
4 True False True True
-- -- after ANDing 02 and 06 --
02 06 07 08
0 True True True True
1 True True False False
2 True True True False
3 True True False False
4 True False True True
-- -- after ANDing 06 and 07 --
02 06 07 08
0 True True True True
1 True True False False
2 True True True False
3 True True False False
4 True False False True
-- -- after ANDing 07 and 08 --
02 06 07 08
0 True True True True
1 True True False False
2 True True True False
3 True True False False
4 True False False False
02 5
06 4
07 2
08 1
dtype: int64
© 2024 OneMinuteCode. All rights reserved.