[Python] Find duplicate values in a pandas column

Asked 1 year ago, Updated 1 year ago, 364 views

I want to find duplicate values in the data frame.

Example

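| Sentence_org    | Sentence        |
|:----------------|:----------------|
| Ganadara Ramaba | Ganadara Ramaba |
| Ganadara        | Ganadara        |
| Azachakatapa    | Azachakatapa    |
| Ganadara        | Ganadara        |

Based on the Sentence column, I want to find the duplicated values across the whole data frame, keep one row per value, and count how many times each value is repeated:

| Sentence_org    | Sentence        | Count |
|:----------------|:----------------|------:|
| Ganadara Ramaba | Ganadara Ramaba |     0 |
| Ganadara        | Ganadara        |     1 |
| Azachakatapa    | Azachakatapa    |     0 |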

I'd like to get a result like the one above. Thank you.

pandas python

2022-12-20 00:12

1 Answer

>>> import pandas as pd
>>> df = pd.DataFrame({
...     "sent0": ["Ganadara Ramaba", "Ganadara", "Azachaka", "Ganadara"],
...     "sent": ["Ganadara Ramaba", "Ganadara", "Azachakatapa", "Ganadara"],
... })
>>> print(df.to_markdown())
|    | sent0           | sent            |
|---:|:----------------|:----------------|
|  0 | Ganadara Ramaba | Ganadara Ramaba |
|  1 | Ganadara        | Ganadara        |
|  2 | Azachaka        | Azachakatapa    |
|  3 | Ganadara        | Ganadara        |
>>> df.duplicated("sent")
0    False
1    False
2    False
3     True
dtype: bool
>>> df_ = df[~df.duplicated("sent")]
>>> print(df_.to_markdown())
|    | sent0           | sent            |
|---:|:----------------|:----------------|
|  0 | Ganadara Ramaba | Ganadara Ramaba |
|  1 | Ganadara        | Ganadara        |
|  2 | Azachaka        | Azachakatapa    |

Use pd.DataFrame.duplicated to find the rows whose "sent" value has already appeared, then filter them out.
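As a side note, here is a minimal sketch of how the `keep` argument changes what `duplicated` flags, and of `drop_duplicates` as a shorthand for the filtering step. The sample frame is an assumption that mirrors the one used in this answer.

```python
import pandas as pd

# Sample data assumed for illustration; it mirrors the frame in the answer.
df = pd.DataFrame({
    "sent0": ["Ganadara Ramaba", "Ganadara", "Azachaka", "Ganadara"],
    "sent": ["Ganadara Ramaba", "Ganadara", "Azachakatapa", "Ganadara"],
})

# keep="first" (the default) flags only the repeats after the first occurrence.
later_repeats = df.duplicated("sent")

# keep=False flags every row whose "sent" value occurs more than once.
all_repeats = df.duplicated("sent", keep=False)

# drop_duplicates keeps the same rows as df[~df.duplicated("sent")].
deduped = df.drop_duplicates(subset="sent")

print(later_repeats.tolist())
print(all_repeats.tolist())
print(deduped)
```

Use `keep=False` when you want to inspect every member of a duplicated group rather than remove the later ones.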

>>> count = df.groupby("sent").size()
>>> count
sent
Azachakatapa       1
Ganadara           2
Ganadara Ramaba    1
dtype: int64

Use groupby to get the number of occurrences of each value.
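For comparison, `Series.value_counts` produces the same counts; a small sketch follows, assuming the same sample values as above. The only difference shown here is the ordering: `groupby(...).size()` sorts by the key, while `value_counts` sorts by count, largest first.

```python
import pandas as pd

# Sample data assumed for illustration.
df = pd.DataFrame({
    "sent": ["Ganadara Ramaba", "Ganadara", "Azachakatapa", "Ganadara"],
})

# Occurrences per value, ordered by the "sent" key.
by_groupby = df.groupby("sent").size()

# Same counts, ordered from most to least frequent.
by_value_counts = df["sent"].value_counts()

print(by_groupby)
print(by_value_counts)
```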

>>> df__ = df_.set_index("sent")
>>> df__["count"] = count - 1
>>> df__
                           sent0  count
sent
Ganadara Ramaba  Ganadara Ramaba      0
Ganadara                Ganadara      1
Azachakatapa            Azachaka      0
>>> df___ = df__.reset_index()
>>> print(df___.to_markdown())
|    | sent            | sent0           |   count |
|---:|:----------------|:----------------|--------:|
|  0 | Ganadara Ramaba | Ganadara Ramaba |       0 |
|  1 | Ganadara        | Ganadara        |       1 |
|  2 | Azachakatapa    | Azachaka        |       0 |

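Finally, combine the two: index the de-duplicated frame by "sent" and attach the counts minus one, so that 0 means the value appears only once.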
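The steps can also be folded into a single helper. This is only a sketch under assumptions: the function name `count_extra_occurrences` is hypothetical, and it uses `groupby(...).transform("size")` instead of the `set_index` step, so the column order of the result differs slightly from the output shown above.

```python
import pandas as pd

def count_extra_occurrences(df, column):
    """Hypothetical helper: keep the first row for each value of `column`
    and add a "count" column with the number of additional occurrences."""
    # transform("size") broadcasts each group's size back onto its rows.
    extra = df.groupby(column)[column].transform("size") - 1
    with_count = df.assign(count=extra)
    # Keep the first row per value and renumber the index from 0.
    return with_count.drop_duplicates(subset=column).reset_index(drop=True)

# Sample data assumed for illustration; it mirrors the frame in the answer.
df = pd.DataFrame({
    "sent0": ["Ganadara Ramaba", "Ganadara", "Azachaka", "Ganadara"],
    "sent": ["Ganadara Ramaba", "Ganadara", "Azachakatapa", "Ganadara"],
})

result = count_extra_occurrences(df, "sent")
print(result)
```

Because the count is computed before the duplicates are dropped, the rows keep their original order and no index alignment is needed.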


2022-12-20 09:08



© 2024 OneMinuteCode. All rights reserved.