Comparing NaN and None in a pandas DataFrame

Asked 2 years ago, Updated 2 years ago, 39 views

I have two pandas DataFrames, raw1 and raw2. I want to compare raw1's x1 and x2 with raw2's y1 and y2 and output only the rows that differ (x1 != y1 or x2 != y2). However, missing values are NaN on one side and None on the other, so I am having trouble comparing them.

raw1 = raw1.where(pd.notnull(raw1), None)
With the line above I can convert NaN to None, but when I then run

out = pd.merge(raw1, raw2, on='key')
out = out[(out.x1 is not out.y1)]

There are no errors, but rows where both x1 and y1 are None are still left in the result, which is not what I want.

Ultimately I want to output only the differing rows with
out = out[(out.x1 is not out.y1) or (out.x2 is not out.y2)]
but this does not work. The following error appears:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

If anyone is familiar with this, I would appreciate your advice.
Thank you in advance.

Addendum

import numpy as np
import pandas as pd

x = pd.DataFrame({'col_0': ["zero", "one", None],
                  'col_1': np.arange(3, 6),
                  'col_2': ("6", "7", None)},
                 index=['row_0', 'row_1', 'row_2'])
out = x[(x.col_0 is not x.col_2)]


The last line raises
KeyError: True
whereas
out = x[(x.col_0 != x.col_2)]
raises no error, but out and x are exactly the same (i.e. nothing is filtered at all).

I also wonder whether converting NaN to None is right in the first place.
Is it right to convert NaN to None, given that a column containing NaN must be numeric? Or is there a way to compare None and NaN directly?

Run Environment
Windows 10 Python 3.7

python pandas

2022-09-30 16:34

1 Answer


import pandas as pd
import numpy as np
from operator import is_not

x = pd.DataFrame({'col_0': ["zero", "one", None],
                  'col_1': np.arange(3, 6),
                  'col_2': ("6", "7", None)},
                 index=['row_0', 'row_1', 'row_2'])
print(x)

# Apply "a is not b" element by element, then keep the rows where it is True
out = x.iloc[np.where(np.vectorize(is_not)(x.col_0, x.col_2))]
print(out)

Output:

      col_0  col_1 col_2
row_0  zero      3     6
row_1   one      4     7
row_2  None      5  None

      col_0  col_1 col_2
row_0  zero      3     6
row_1   one      4     7
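
The same filter can also be written with pandas' own element-wise operations instead of np.vectorize. The following is only a sketch, assuming the same sample frame as above. The names differs and both_missing are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({'col_0': ['zero', 'one', None],
                  'col_1': np.arange(3, 6),
                  'col_2': ('6', '7', None)},
                 index=['row_0', 'row_1', 'row_2'])

# Element-wise "not equal". A missing value on either side is not
# considered equal to anything, so this alone would keep row_2.
differs = x['col_0'].ne(x['col_2'])

# True only where both sides are missing (isna() covers None and NaN alike)
both_missing = x['col_0'].isna() & x['col_2'].isna()

# Keep rows that differ, except those where both values are missing
out = x[differs & ~both_missing]
print(out)
```

Because isna() treats None and NaN the same way, this form does not depend on which of the two markers a column happens to contain.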

Addendum

I wonder whether converting NaN to None is right in the first place. Is it right to convert NaN to None, given that a column containing NaN must be numeric? Or is there a way to compare None and NaN?

One approach is to put the comparison logic into a function.

import pandas as pd
import numpy as np

x = pd.DataFrame({'col_0': ['zero', 'one', np.nan],
                  'col_1': np.arange(3, 6),
                  'col_2': ('6', '7', None)},
                 index=['row_0', 'row_1', 'row_2'])
print(x)

def f(a, b):
    # Treat the pair as equal when both values are missing (None or NaN);
    # otherwise compare them normally.
    return (
        False if all(v in (None, np.nan) for v in (a, b))
        else (a != b))

out = x.iloc[np.where(np.vectorize(f)(x.col_0, x.col_2))]
print(out)

=>
      col_0  col_1 col_2
row_0  zero      3     6
row_1   one      4     7
row_2   NaN      5  None

      col_0  col_1 col_2
row_0  zero      3     6
row_1   one      4     7
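
For the merged raw1/raw2 case in the question, the same idea can be applied to both column pairs. `is not` compares the two Series objects themselves rather than their elements, and `or` needs a single True/False value, which is why the ValueError appears. The element-wise operators are `!=` (or .ne()) and `|`. The following is a rough sketch. The sample data and the helper name differs are hypothetical, since the real raw1 and raw2 are not shown in the post.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for raw1 and raw2 (the real data is not shown)
raw1 = pd.DataFrame({'key': [1, 2, 3, 4],
                     'x1': [1.0, 2.0, np.nan, 4.0],
                     'x2': ['a', 'b', 'c', 'd']})
raw2 = pd.DataFrame({'key': [1, 2, 3, 4],
                     'y1': pd.Series([1.0, 9.0, None, 4.0], dtype=object),
                     'y2': ['a', 'b', 'c', None]})

def differs(left, right):
    """Element-wise inequality that treats two missing values as equal."""
    # Compare as object columns so NaN and None are handled uniformly
    left, right = left.astype(object), right.astype(object)
    both_missing = left.isna() & right.isna()
    return left.ne(right) & ~both_missing

out = pd.merge(raw1, raw2, on='key')
# "|" combines the two boolean Series element by element
mask = differs(out['x1'], out['y1']) | differs(out['x2'], out['y2'])
out = out[mask]
print(out)
```

With this sample data, the row where x1 is NaN and y1 is None is dropped, while a row with a value on one side and a missing value on the other is kept. No prior conversion of NaN to None is needed with this approach.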


2022-09-30 16:34



© 2024 OneMinuteCode. All rights reserved.