I'd like to extract the three most frequent words for each group. For example, given the data above, I would like to get the output below.
A Gam 3 Banana 2 Apple 1 Strawberry 1
B Flatfish 2 Rockfish 1
Can I combine Counter.most_common() with groupby?
group-by counter python pandas
Python 3.8.5 (tags/v3.8.5:580fbb0, Jul 20 2020, 15:57:54) [MSC v.1924 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license()" for more information.
>>> import pandas as pd
>>> df = pd.DataFrame({"cat": list("AAABB"), "cont": [["Apple", "Banana", "Gam"], ["Strawberry", "Banana", "Gam"], ["Gam"], ["Flatfish", "Rockfish"], ["Flatfish"]]})
>>> df
  cat                       cont
0   A       [Apple, Banana, Gam]
1   A  [Strawberry, Banana, Gam]
2   A                      [Gam]
3   B       [Flatfish, Rockfish]
4   B                 [Flatfish]
>>> df.groupby("cat").cont.sum()
cat
A    [Apple, Banana, Gam, Strawberry, Banana, Gam, Gam]
B                        [Flatfish, Rockfish, Flatfish]
Name: cont, dtype: object
>>> grouped = df.groupby("cat").cont.sum()
>>> grouped
cat
A    [Apple, Banana, Gam, Strawberry, Banana, Gam, Gam]
B                        [Flatfish, Rockfish, Flatfish]
Name: cont, dtype: object
>>> df2 = grouped.to_frame()
>>> df2
                                                   cont
cat
A    [Apple, Banana, Gam, Strawberry, Banana, Gam, Gam]
B                        [Flatfish, Rockfish, Flatfish]
>>> from collections import Counter
>>> for c, l in df2.itertuples():
    print(Counter(l))

Counter({'Gam': 3, 'Banana': 2, 'Apple': 1, 'Strawberry': 1})
Counter({'Flatfish': 2, 'Rockfish': 1})
>>> for c, l in df2.itertuples():
    print(c, Counter(l))

A Counter({'Gam': 3, 'Banana': 2, 'Apple': 1, 'Strawberry': 1})
B Counter({'Flatfish': 2, 'Rockfish': 1})
>>>
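The session above stops at the full counts per group. As a minimal sketch of getting the top three in a single groupby step, the lists in each group can be flattened and passed to Counter inside apply. The helper name top_words is only an illustration, not part of pandas, and the DataFrame is rebuilt here so the snippet runs on its own.

```python
from collections import Counter
from itertools import chain

import pandas as pd

df = pd.DataFrame({
    "cat": list("AAABB"),
    "cont": [
        ["Apple", "Banana", "Gam"],
        ["Strawberry", "Banana", "Gam"],
        ["Gam"],
        ["Flatfish", "Rockfish"],
        ["Flatfish"],
    ],
})


def top_words(lists, n=3):
    """Hypothetical helper: flatten one group's lists and return the n most common words."""
    return Counter(chain.from_iterable(lists)).most_common(n)


# apply() calls top_words once per group with that group's "cont" values.
top3 = df.groupby("cat")["cont"].apply(top_words)

for cat, pairs in top3.items():
    print(cat, pairs)
# A [('Gam', 3), ('Banana', 2), ('Apple', 1)]
# B [('Flatfish', 2), ('Rockfish', 1)]
```

Words with equal counts keep the order in which they were first seen, which is why Apple is listed and Strawberry is dropped from group A.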
Since only the contents are needed, it is simpler to concatenate the lists that share a category and apply Counter to each combined list than to use groupby.
data = [{'category': 'A', 'content': ['Apple', 'Banana', 'Gam']},
        {'category': 'A', 'content': ['Strawberry', 'Banana', 'Gam']},
        {'category': 'A', 'content': ['Gam']},
        {'category': 'B', 'content': ['Flatfish', 'Rockfish']},
        {'category': 'B', 'content': ['Flatfish']}]
from collections import defaultdict, Counter
result = defaultdict(list)
for d in data:
    result[d['category']] += d['content']
{k: Counter(g).most_common(3) for k, g in result.items()}
{'A': [('Gam', 3), ('Banana', 2), ('Apple', 1)], 'B': [('Flatfish', 2), ('Rockfish', 1)]}
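As a variation on the same idea, here is a sketch that keeps one Counter per category and updates it as the records are read, so the concatenated lists are never built. It then prints one line per group in the layout the question asks for. The variable names counts and lines are illustrative, and the data is repeated so the snippet is self-contained.

```python
from collections import Counter, defaultdict

data = [
    {"category": "A", "content": ["Apple", "Banana", "Gam"]},
    {"category": "A", "content": ["Strawberry", "Banana", "Gam"]},
    {"category": "A", "content": ["Gam"]},
    {"category": "B", "content": ["Flatfish", "Rockfish"]},
    {"category": "B", "content": ["Flatfish"]},
]

# One Counter per category; update() adds the counts of each new list.
counts = defaultdict(Counter)
for d in data:
    counts[d["category"]].update(d["content"])

# Format each group's top three as "category word count word count ...".
lines = []
for category, counter in counts.items():
    parts = " ".join(f"{word} {n}" for word, n in counter.most_common(3))
    lines.append(f"{category} {parts}")

print("\n".join(lines))
# A Gam 3 Banana 2 Apple 1
# B Flatfish 2 Rockfish 1
```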