Extract the most common words per group in Python

Asked 2 years ago, Updated 2 years ago, 116 views

I'd like to extract the three most frequent words for each group. For example, given the data above, I would like to get the output below.

A: Persimmon 3, Banana 2, Apple 1, Strawberry 1

B: Flatfish 2, Rockfish 1

Can I use Counter's most_common() together with group-by?

group-by counter python pandas

2022-09-20 19:22

2 Answers


Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:57:54) [MSC v.1924 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> import pandas as pd

>>> # "Gam" is the Korean word for persimmon.
>>> df = pd.DataFrame({"cat": list("AAABB"), "cont": [["Apple", "Banana", "Gam"], ["Strawberry", "Banana", "Gam"], ["Gam"], ["Flatfish", "Rockfish"], ["Flatfish"]]})
>>> df
  cat                       cont
0   A       [Apple, Banana, Gam]
1   A  [Strawberry, Banana, Gam]
2   A                      [Gam]
3   B       [Flatfish, Rockfish]
4   B                 [Flatfish]
>>> # Summing lists concatenates them, so each group ends up with one combined list.
>>> df.groupby("cat").cont.sum()
cat
A    [Apple, Banana, Gam, Strawberry, Banana, Gam, Gam]
B                        [Flatfish, Rockfish, Flatfish]
Name: cont, dtype: object
>>> grouped = df.groupby("cat").cont.sum()
>>> grouped
cat
A    [Apple, Banana, Gam, Strawberry, Banana, Gam, Gam]
B                        [Flatfish, Rockfish, Flatfish]
Name: cont, dtype: object
>>> df2 = grouped.to_frame()
>>> df2
                                                   cont
cat
A    [Apple, Banana, Gam, Strawberry, Banana, Gam, Gam]
B                        [Flatfish, Rockfish, Flatfish]
>>> from collections import Counter
>>> # Each row of df2 unpacks into (group label, combined word list).
>>> for cat, words in df2.itertuples():
    print(Counter(words))

Counter({'Gam': 3, 'Banana': 2, 'Apple': 1, 'Strawberry': 1})
Counter({'Flatfish': 2, 'Rockfish': 1})
>>> for cat, words in df2.itertuples():
    print(cat, Counter(words))

A Counter({'Gam': 3, 'Banana': 2, 'Apple': 1, 'Strawberry': 1})
B Counter({'Flatfish': 2, 'Rockfish': 1})
>>> 
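The session above stops at the full Counter for each group, while the question asks for the top three. As a minimal sketch of one way to combine group-by with most_common(), the snippet below iterates over the groups and counts each group's words directly. The helper name top_words and the variable top3 are made up for this illustration, and the DataFrame is assumed to have the same "cat" and "cont" columns as in the session.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Same sample data as in the session above.
df = pd.DataFrame({
    "cat": list("AAABB"),
    "cont": [
        ["Apple", "Banana", "Gam"],
        ["Strawberry", "Banana", "Gam"],
        ["Gam"],
        ["Flatfish", "Rockfish"],
        ["Flatfish"],
    ],
})


def top_words(lists, n=3):
    """Hypothetical helper: n most common words across an iterable of word lists."""
    return Counter(chain.from_iterable(lists)).most_common(n)


# Iterating over a grouped column yields (group label, Series of lists) pairs.
top3 = {cat: top_words(group) for cat, group in df.groupby("cat")["cont"]}
print(top3)
```

Note that most_common(3) cuts ties by first-seen order, so for group A only one of the two words with a count of 1 is kept.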


2022-09-20 19:22

Since only the contents matter, it is simpler to concatenate the word lists of each category and apply Counter to the result than to use groupby.

from collections import defaultdict, Counter

data = [{'category': 'A', 'content': ['Apple', 'Banana', 'Gam']},
        {'category': 'A', 'content': ['Strawberry', 'Banana', 'Gam']},
        {'category': 'A', 'content': ['Gam']},
        {'category': 'B', 'content': ['Flatfish', 'Rockfish']},
        {'category': 'B', 'content': ['Flatfish']}]

# Concatenate the word lists of each category.
result = defaultdict(list)
for d in data:
    result[d['category']] += d['content']

# Count the words per category and keep the three most common.
{k: Counter(g).most_common(3) for k, g in result.items()}

{'A': [('Gam', 3), ('Banana', 2), ('Apple', 1)], 'B': [('Flatfish', 2), ('Rockfish', 1)]}
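A possible variation on the same idea, sketched here under the assumption that the input is the same list of dicts: keep one Counter per category and update it row by row, so no intermediate combined list is needed. The function format_top and the names counts and lines are invented for this example. It prints each category on one line, close to the layout asked for in the question.

```python
from collections import Counter, defaultdict

data = [{'category': 'A', 'content': ['Apple', 'Banana', 'Gam']},
        {'category': 'A', 'content': ['Strawberry', 'Banana', 'Gam']},
        {'category': 'A', 'content': ['Gam']},
        {'category': 'B', 'content': ['Flatfish', 'Rockfish']},
        {'category': 'B', 'content': ['Flatfish']}]

# One Counter per category, updated in place with each row's words.
counts = defaultdict(Counter)
for row in data:
    counts[row['category']].update(row['content'])


def format_top(counts, n=3):
    """Hypothetical helper: one 'category word count ...' line per category."""
    lines = []
    for category, counter in counts.items():
        pairs = ' '.join(f'{word} {count}' for word, count in counter.most_common(n))
        lines.append(f'{category} {pairs}')
    return lines


lines = format_top(counts)
print('\n'.join(lines))
```

Passing a larger n to format_top (for example 4) would also include Strawberry for category A.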


2022-09-20 19:22


