Count the number of non-NaN values in Python Pandas data frames + some conditions

I want to get the total number of values for each column except NaN in the order of df to dd index.

If the previous df value is NaN, I want it to be counted without that row. The hardcoded ones are as follows.
In the example, there are only 4 columns, but in case there are more, I want to use the Lambda application to make it simple, but it's hard. If not, even a for statement.

02 -> 5 pieces (foo, fid, bar, cute, you)  
06 -> 4 pieces (q, d, e, vd)  
07 -> 2 pieces (g, z)  
08 -> 1 piece (a)

[5, 4, 2, 1] I hope the result comes out, but I don't know what to do.

import pandas as pd
import numpy as np

dd = {'02':1,'06':2,'07':3,'08':4}
dd = pd.DataFrame.from_dict(dd, orient = 'index', columns = ['step'])
print(dd)

left = pd.DataFrame({'02':['foo','fid','bar','cute'], '06':['q','d','e','vd']})
right = pd.DataFrame({'02':['foo','you','bar'],'07':['g','v','z'], '08':['a','b','c']})
df = pd.merge(left,right, on=['02'], how='outer')
df.iloc[2,3] = np.nan
print()
print(df)

df2 = df.loc[df['06'].notnull()]
df3 = df2.loc[df2['07'].notnull()]
df4 = df3.loc[df3['08'].notnull()]

a = df['02'].notnull().sum()
b = df2['06'].notnull().sum()
c = df3['07'].notnull().sum()
d = df4['08'].notnull().sum()

print()
print([a,b,c,d])

python pandas

2022-09-20 08:57

1 Answers


import pandas as pd
import numpy as np

dd = {"02": 1, "06": 2, "07": 3, "08": 4}
dd_df = pd.DataFrame.from_dict(dd, orient="index", columns=["step"])
print(dd_df)

left = pd.DataFrame({"02": ["foo", "fid", "bar", "cute"], "06": ["q", "d", "e", "vd"]})
right = pd.DataFrame({"02": ["foo", "you", "bar"], "07": ["g", "v", "z"], "08": ["a", "b", "c"]})
df = pd.merge(left, right, on=["02"], how="outer")
df.iloc[2, 3] = np.nan
print()
print(df)

"""
df2 = df.loc[df['06'].notnull()]
df3 = df2.loc[df2['07'].notnull()]
df4 = df3.loc[df3['08'].notnull()]

a = df['02'].notnull().sum()
b = df2['06'].notnull().sum()
c = df3['07'].notnull().sum()
d = df4['08'].notnull().sum()

print()
print([a,b,c,d])
"""

df = df.notnull()
print(df)

sorted_col = sorted(dd.keys(), key=lambda c: dd[c])
# # print(sorted_col)

for col1, col2 in zip(sorted_col[:-1], sorted_col[1:]):
    df[col2] = df[col1] & df[col2]
    print(f"-- after ANDing {col1} and {col2} --")
    print(df)

print(df.sum(axis=0))

    step
02     1
06     2
07     3
08     4

     02   06   07   08
0   foo    q    g    a
1   fid    d  NaN  NaN
2   bar    e    z  NaN
3  cute   vd  NaN  NaN
4   you  NaN    v    b
     02     06     07     08
0  True   True   True   True
1  True   True  False  False
2  True   True   True  False
3  True   True  False  False
4  True  False   True   True
-- -- after ANDing 02 and 06 --
     02     06     07     08
0  True   True   True   True
1  True   True  False  False
2  True   True   True  False
3  True   True  False  False
4  True  False   True   True
-- -- after ANDing 06 and 07 --
     02     06     07     08
0  True   True   True   True
1  True   True  False  False
2  True   True   True  False
3  True   True  False  False
4  True  False  False   True
-- -- after ANDing 07 and 08 --
     02     06     07     08
0  True   True   True   True
1  True   True  False  False
2  True   True   True  False
3  True   True  False  False
4  True  False  False  False
02    5
06    4
07    2
08    1
dtype: int64

2022-09-20 08:57

If you have any answers or tips

Popular Tags

python x 4647

android x 1593

java x 1494

javascript x 1427

c x 927

c++ x 878

ruby-on-rails x 696

php x 692

python3 x 685

html x 656