I want to extract only the duplicated rows that match a condition from two DataFrames.

Given the data for A and B shown below, I would like to extract only the elements where both A and B are the same as in the previous element.
In this case, the "A4" in column A should be extracted.

I tried using for statements as below, but I could not extract only the duplicates.
I would appreciate any advice.
Thank you.

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A11', 'A3'],
                    'B': [0, 1, 0, 4]},
                   index=[0, 1, 2, 3])

# A needs four values to match B and the index
df2 = pd.DataFrame({'A': ['A4', 'A4', 'A7', 'A7'],
                    'B': [3, 3, 8, 7]},
                   index=[4, 5, 6, 7])

df3 = pd.concat([df1, df2])

for i in range(len(df3["A"]) - 1):
    for j in range(len(df3["B"]) - 1):
        if df3["A"][i] == df3["A"][i+1] and df3["B"][i] == df3["B"][i+1]:
            print(df3["B"][j])

python pandas

2022-09-30 21:59

4 Answers

For reference, here is a way to do it without a for loop.

>>> df3[(df3 == df3.shift(1)).all(axis=1)]
    A  B
5  A4  3

pandas.DataFrame.shift
pandas.DataFrame.all
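
To make the one-liner above easier to follow, here is a minimal sketch that splits it into steps. It rebuilds the question's data directly (assuming df2's column A was meant to hold four values), and the names same_as_prev, repeated, and both_rows are only illustrative. It also shows one possible way to keep both rows of a matching pair, in case that is what is wanted.

```python
import pandas as pd

# Same data as df3 in the question, built in one step (index 0..7).
df3 = pd.DataFrame(
    {"A": ["A0", "A1", "A11", "A3", "A4", "A4", "A7", "A7"],
     "B": [0, 1, 0, 4, 3, 3, 8, 7]}
)

cols = ["A", "B"]

# shift(1) moves every row down by one, so each row is compared
# with the row directly above it. all(axis=1) requires A and B to match.
same_as_prev = (df3[cols] == df3[cols].shift(1)).all(axis=1)

# Only the second row of each consecutive pair.
repeated = df3[same_as_prev]

# Both rows of each pair: also flag the row just before a match.
both_rows = df3[same_as_prev | same_as_prev.shift(-1, fill_value=False)]

print(repeated)
print(both_rows)
```

The first row never matches, because shifting leaves a missing value above it.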

2022-09-30 21:59

Shouldn't we use j as the subscript for B?

    if df3["A"][i] == df3["A"][i+1] and df3["B"][j] == df3["B"][j+1]:

2022-09-30 21:59

How about the following?

for i in range(1, len(df3["A"])):
    if df3["A"][i] == df3["A"][i-1] and df3["B"][i] == df3["B"][i-1]:
        print(df3["A"][i])
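
One caveat: df3["A"][i] looks rows up by index label, so the loop above works here because the concatenated index happens to run from 0 to 7. As a rough sketch of a variant that does not depend on the index labels, the rows can be paired with their predecessors by position. The names rows and matches are only illustrative, and the data assumes df2's column A was meant to hold four values.

```python
import pandas as pd

df3 = pd.DataFrame(
    {"A": ["A0", "A1", "A11", "A3", "A4", "A4", "A7", "A7"],
     "B": [0, 1, 0, 4, 3, 3, 8, 7]}
)

# Plain (A, B) tuples in row order, ignoring the index labels.
rows = list(df3[["A", "B"]].itertuples(index=False, name=None))

# Pair each row with the one before it and keep A when both columns match.
matches = [cur[0] for prev, cur in zip(rows, rows[1:]) if prev == cur]

print(matches)  # ['A4']
```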

2022-09-30 21:59

A pattern without a for statement

df3.groupby(['A', 'B']).filter(lambda x: len(x) >= 2)

#     A  B
# 4  A4  3
# 5  A4  3
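
Note that grouping finds rows that repeat anywhere in the frame, not only rows that repeat the one directly before them. The sketch below uses a small made-up frame (not the question's data) in which the two approaches differ, and also shows DataFrame.duplicated with keep=False as another way to get the "anywhere" result.

```python
import pandas as pd

# Hypothetical data: ("A4", 3) appears at rows 0, 2 and 3,
# but only row 3 repeats the row directly above it.
df = pd.DataFrame({"A": ["A4", "A7", "A4", "A4"], "B": [3, 8, 3, 3]})

# Rows whose (A, B) pair occurs at least twice anywhere in the frame.
grouped = df.groupby(["A", "B"]).filter(lambda x: len(x) >= 2)
anywhere = df[df.duplicated(subset=["A", "B"], keep=False)]

# Rows whose (A, B) pair equals the previous row.
adjacent = df[(df == df.shift(1)).all(axis=1)]

print(grouped.index.tolist())
print(anywhere.index.tolist())
print(adjacent.index.tolist())
```

Which one fits depends on whether "overlap compared to the previous element" should include repeats that are not next to each other.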

2022-09-30 21:59
