[Python] Building several defaultdicts in a loop (append doesn't work)


I have data in dfs[0], dfs[1], and so on up to dfs[8].

from transformers import AutoTokenizer
from collections import defaultdict
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased", do_lower_case = True)


# Number zero
word_freqs_0 = defaultdict(int)
for text in dfs[0]['comment']:
    words_with_offsets = tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)
    new_words = [word for word, offset in words_with_offsets]
    for word in new_words:
        word_freqs_0[word] = word_freqs_0[word] + 1

# Number one
word_freqs_1 = defaultdict(int)
for text in dfs[1]['comment']:
    words_with_offsets = tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)
    new_words = [word for word, offset in words_with_offsets]
    for word in new_words:
        word_freqs_1[word] = word_freqs_1[word] + 1

I've written this same block out by hand for every DataFrame, up through number 8. I want to turn it into a loop that produces word_freqs[0] through word_freqs[8]. How should I do this?

word_freqs = defaultdict(int)

I tried declaring the variable like this and then calling append on it, but that doesn't work.
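`defaultdict(int)` is a dictionary, not a list, so it has no `append` method, which is why that call fails. If you want to keep one variable and index it as `word_freqs[0]` through `word_freqs[8]`, one option is a nested `defaultdict` whose outer key is the DataFrame number. Below is a minimal sketch; the sample texts (`comments_per_df`) and the use of `str.split` in place of the BERT pre-tokenizer are assumptions made only so it runs on its own.

```python
from collections import defaultdict

# Hypothetical stand-in for dfs[i]['comment']: one list of texts per DataFrame.
comments_per_df = [
    ["the cat sat", "the dog"],
    ["a cat", "a cat ran"],
]

# Outer key: DataFrame index; inner defaultdict(int): word -> count.
word_freqs = defaultdict(lambda: defaultdict(int))

for i, texts in enumerate(comments_per_df):
    for text in texts:
        # str.split() is an assumption standing in for pre_tokenize_str().
        for word in text.split():
            word_freqs[i][word] += 1

# defaultdict is a dict subclass, so there is no append method to call.
print(hasattr(word_freqs, "append"))  # False
print(dict(word_freqs[0]))  # {'the': 2, 'cat': 1, 'sat': 1, 'dog': 1}
```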

python dictionary refactoring

2022-09-20 11:29

1 Answer

This is a straightforward refactoring problem: pull the repeated code out into a function.

# 0
word_freqs_0 = defaultdict(int)
for text in dfs[0]['comment']:
    words_with_offsets = tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)
    new_words = [word for word, offset in words_with_offsets]
    for word in new_words:
        word_freqs_0[word] = word_freqs_0[word] + 1

# Number one
word_freqs_1 = defaultdict(int)
for text in dfs[1]['comment']:
    words_with_offsets = tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)
    new_words = [word for word, offset in words_with_offsets]
    for word in new_words:
        word_freqs_1[word] = word_freqs_1[word] + 1

Comparing the two blocks, only two things differ: the input (dfs[0] vs. dfs[1]) and the output variable (word_freqs_0 vs. word_freqs_1). Everything else is identical. So you can write a function that takes a df as input and returns its word_freq, and then call that function repeatedly.

import pandas as pd
from collections import defaultdict

# Assumes `tokenizer` has already been created as in the question.
def get_word_freq_from_df(df: pd.DataFrame):
    word_freq = defaultdict(int)
    for text in df['comment']:
        words_with_offsets = tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)
        new_words = [word for word, offset in words_with_offsets]
        for word in new_words:
            word_freq[word] = word_freq[word] + 1
    return word_freq

That completes the function.
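The manual `word_freq[word] = word_freq[word] + 1` counting can also be handed to `collections.Counter` from the standard library. The sketch below is an alternative, not the answer's original code: it takes any iterable of texts (for example `df['comment']`) plus a `pre_tokenize` callable, a hypothetical parameter into which you would pass `tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str`. The `whitespace_pre_tokenize` helper is likewise hypothetical and only mimics the `(word, offset)` pairs that the real pre-tokenizer returns.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple

# Shape of the pre-tokenizer output: a list of (word, (start, end)) pairs.
PreTokenized = List[Tuple[str, Tuple[int, int]]]

def get_word_freq(texts: Iterable[str],
                  pre_tokenize: Callable[[str], PreTokenized]) -> Counter:
    """Count how often each pre-tokenized word appears across all texts."""
    return Counter(
        word
        for text in texts
        for word, _offset in pre_tokenize(text)
    )

def whitespace_pre_tokenize(text: str) -> PreTokenized:
    """Hypothetical stand-in for pre_tokenize_str: split on spaces, keep offsets."""
    result = []
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        end = start + len(word)
        result.append((word, (start, end)))
        pos = end
    return result

freq = get_word_freq(["the cat sat", "the dog"], whitespace_pre_tokenize)
print(freq["the"], freq["dog"])  # 2 1
```

With the real tokenizer the call would be `get_word_freq(df['comment'], tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str)`. Like `defaultdict(int)`, a `Counter` returns 0 for a missing word, and it also provides `most_common()`.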

Since you want to run this once for each DataFrame and collect the results, the loop looks like this:

word_freqs = []
for df in dfs:
    word_freq = get_word_freq_from_df(df)
    word_freqs.append(word_freq)

Or, rewritten as a list comprehension:

word_freqs = [get_word_freq_from_df(df) for df in dfs]
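Because list indices start at 0, the list above can already be read as `word_freqs[0]` through `word_freqs[8]`. The question does not say whether `dfs` is a list or a dict, though, and iterating over a dict directly yields its keys rather than its DataFrames. If `dfs` is a dict, a dict comprehension over `.items()` keeps the same keys. A minimal sketch, where `count_words` is a hypothetical stand-in for `get_word_freq_from_df` and the DataFrames are replaced by plain lists of strings:

```python
from collections import defaultdict

def count_words(texts):
    """Hypothetical stand-in for get_word_freq_from_df (splits on whitespace)."""
    word_freq = defaultdict(int)
    for text in texts:
        for word in text.split():
            word_freq[word] += 1
    return word_freq

# Case 1: dfs is a list, so a list comprehension gives word_freqs[i] for dfs[i].
dfs_list = [["a b", "a"], ["c"]]
word_freqs = [count_words(df) for df in dfs_list]

# Case 2: dfs is a dict keyed 0..8, so a dict comprehension keeps the same keys.
dfs_dict = {0: ["a b", "a"], 1: ["c"]}
word_freqs_by_key = {i: count_words(df) for i, df in dfs_dict.items()}

print(word_freqs[0]["a"], word_freqs_by_key[1]["c"])  # 2 1
```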


2022-09-20 11:29



© 2024 OneMinuteCode. All rights reserved.