I want to find duplicate values in the data frame.
Example
Sentence_org     Sentence
Ganadara Ramaba  Ganadara Ramaba
Ganadara         Ganadara
Azachakatapa     Azachakatapa
Ganadara         Ganadara
I want to find the duplicated rows across the whole data frame based on the Sentence column, and count how many times each value is repeated:
Sentence_org     Sentence         Count
Ganadara Ramaba  Ganadara Ramaba  0
Ganadara         Ganadara         1
Azachakatapa     Azachakatapa     0
I'd like to get a result like the one above. Thank you.
pandas python
>>> import pandas as pd
>>> df = pd.DataFrame({"sent0": ["Ganadara Ramaba", "Ganadara", "Azachaka", "Ganadara"], "sent": ["Ganadara Ramaba", "Ganadara", "Azachakatapa", "Ganadara"]})
>>> print(df.to_markdown())
|    | sent0           | sent            |
|---:|:----------------|:----------------|
|  0 | Ganadara Ramaba | Ganadara Ramaba |
|  1 | Ganadara        | Ganadara        |
|  2 | Azachaka        | Azachakatapa    |
|  3 | Ganadara        | Ganadara        |
>>> df.duplicated("sent")
0 False
1 False
2 False
3 True
dtype: bool
>>> df_ = df[~df.duplicated("sent")]
>>> print(df_.to_markdown())
|    | sent0           | sent            |
|---:|:----------------|:----------------|
|  0 | Ganadara Ramaba | Ganadara Ramaba |
|  1 | Ganadara        | Ganadara        |
|  2 | Azachaka        | Azachakatapa    |
Use pd.DataFrame.duplicated to flag the repeated rows, then filter with the negated mask to remove them.
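As a side note, here is a small sketch of how the `keep` argument changes which rows get flagged. It assumes the same example frame as above; the variable names (`mask_first`, `mask_all`, `deduped`) are only illustrative.

```python
import pandas as pd

# Same example frame as in the answer above.
df = pd.DataFrame({
    "sent0": ["Ganadara Ramaba", "Ganadara", "Azachaka", "Ganadara"],
    "sent": ["Ganadara Ramaba", "Ganadara", "Azachakatapa", "Ganadara"],
})

# keep="first" (the default) flags only the later repeats of a value.
mask_first = df.duplicated("sent")

# keep=False flags every row whose "sent" value occurs more than once.
mask_all = df.duplicated("sent", keep=False)

# drop_duplicates is shorthand for df[~df.duplicated("sent")].
deduped = df.drop_duplicates(subset="sent")

print(mask_first.tolist())
print(mask_all.tolist())
print(deduped)
```

If you need to see all copies of a repeated sentence rather than drop them, `df[mask_all]` is probably the more useful view.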
>>> count = df.groupby("sent").size()
>>> count
sent
Azachakatapa       1
Ganadara           2
Ganadara Ramaba    1
dtype: int64
Count how many times each value occurs with groupby.
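The same counts can also be obtained with `Series.value_counts`, which may read more directly. A minimal sketch, assuming the example `sent` column from above; `by_size` and `by_value_counts` are illustrative names.

```python
import pandas as pd

# Only the column that is being counted, taken from the example above.
df = pd.DataFrame({
    "sent": ["Ganadara Ramaba", "Ganadara", "Azachakatapa", "Ganadara"],
})

# groupby(...).size() returns counts indexed by the sorted group keys.
by_size = df.groupby("sent").size()

# value_counts() returns the same counts, ordered by frequency instead.
by_value_counts = df["sent"].value_counts()

print(by_size)
print(by_value_counts)
```

Both results are indexed by the sentence text, so either one aligns with a frame that has `sent` as its index.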
>>> df__ = df_.set_index("sent")
>>> df__["count"] = count - 1
>>> df__
                           sent0  count
sent
Ganadara Ramaba  Ganadara Ramaba      0
Ganadara                Ganadara      1
Azachakatapa            Azachaka      0
>>> df___ = df__.reset_index()
>>> print(df___.to_markdown())
|    | sent            | sent0           |   count |
|---:|:----------------|:----------------|--------:|
|  0 | Ganadara Ramaba | Ganadara Ramaba |       0 |
|  1 | Ganadara        | Ganadara        |       1 |
|  2 | Azachakatapa    | Azachaka        |       0 |
Finally, combine the two results: the de-duplicated frame and the counts.
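The steps above can probably be collapsed into a single chain. This is only a sketch under the same assumptions (the example frame and column names from this answer), using `groupby(...).transform("size")` to attach the per-value count to every row before dropping the repeats; `result` is an illustrative name.

```python
import pandas as pd

# Same example frame as in the answer above.
df = pd.DataFrame({
    "sent0": ["Ganadara Ramaba", "Ganadara", "Azachaka", "Ganadara"],
    "sent": ["Ganadara Ramaba", "Ganadara", "Azachakatapa", "Ganadara"],
})

result = (
    # Occurrences of each "sent" value, minus one for the row that is kept.
    df.assign(count=df.groupby("sent")["sent"].transform("size") - 1)
      # Keep the first row for each "sent" value.
      .drop_duplicates(subset="sent")
      # Renumber the remaining rows 0..n-1.
      .reset_index(drop=True)
)

print(result)
```

Unlike the set_index / reset_index version, this keeps the original column order (`sent0`, `sent`, `count`) and needs no intermediate frames.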