I want to extract rows from a pandas DataFrame in Python

Asked 2 years ago, Updated 2 years ago, 45 views

I would like to extract rows from a DataFrame.

A1 B1 C1
A1 B1 C1
A2 B2 C2
A2 B2 C3
A3 B3 C4
A3 B3 C4
A3 B3 C5

Given three columns of data like the above, I would like to remove only the groups that consist entirely of duplicate rows.
(I don't want to keep the rows where column 0 is A1, but I do want to keep all the A3 rows.)

A2 B2 C2
A2 B2 C3
A3 B3 C4
A3 B3 C4
A3 B3 C5

However, this is what I get with df.duplicated():

A2 B2 C2
A2 B2 C3
A3 B3 C5

With df.duplicated(keep='last'), this is what I get:

A1 B1 C1
A2 B2 C2
A2 B2 C3
A3 B3 C4
A3 B3 C5

How should I write it?

python

2022-09-30 20:22

1 Answer

As a first approach, I used groupby() to decide separately for each Ax group whether to keep it.
I haven't considered speed or elegance.

import pandas as pd

data = [
    ['A1', 'B1', 'C1'],
    ['A1', 'B1', 'C1'],
    ['A2', 'B2', 'C2'],
    ['A2', 'B2', 'C3'],
    ['A3', 'B3', 'C4'],
    ['A3', 'B3', 'C4'],
    ['A3', 'B3', 'C5'],
]

df = pd.DataFrame(data)

# Collect the groups to keep, then concatenate them once at the end.
kept = []
for _, g in df.groupby(0):
    dup = g.duplicated()  # True for every repeat of an earlier row
    if not dup.any():  # No duplication
        kept.append(g)
    # There are duplicates, but there are other values too
    # (more than one distinct row in the group).
    elif (~dup).sum() > 1:
        kept.append(g)

df2 = pd.concat(kept, ignore_index=True)

print(df2)
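
For comparison, the same per-group decision can probably be written more compactly with groupby().filter(). This is only a sketch under the assumption that a group should be dropped exactly when it has more than one row and all of its rows are identical; keep_group is a hypothetical helper name, not something from the question.

```python
import pandas as pd

df = pd.DataFrame([
    ['A1', 'B1', 'C1'],
    ['A1', 'B1', 'C1'],
    ['A2', 'B2', 'C2'],
    ['A2', 'B2', 'C3'],
    ['A3', 'B3', 'C4'],
    ['A3', 'B3', 'C4'],
    ['A3', 'B3', 'C5'],
])


def keep_group(g):
    # Hypothetical helper: keep single-row groups, and keep any group
    # that contains more than one distinct row.
    return len(g) == 1 or len(g.drop_duplicates()) > 1


# filter() keeps every row of the groups for which keep_group returns True,
# in their original order.
result = df.groupby(0).filter(keep_group).reset_index(drop=True)
print(result)
```

With the sample data, the A1 group is dropped because both of its rows are identical, while all rows of A2 and A3 are kept, including the repeated A3 B3 C4 row.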


2022-09-30 20:22


© 2024 OneMinuteCode. All rights reserved.