[Python] Appending defaultdicts with a loop

I have data in dfs[0], dfs[1], and so on up to dfs[8].

from transformers import AutoTokenizer
from collections import defaultdict
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased", do_lower_case = True)


# Number zero
word_freqs_0 = defaultdict(int)
for text in dfs[0]['comment']:
    words_with_offsets = tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)
    new_words = [word for word, offset in words_with_offsets]
    for word in new_words:
        word_freqs_0[word] = word_freqs_0[word] + 1

# Number one
word_freqs_1 = defaultdict(int)
for text in dfs[1]['comment']:
    words_with_offsets = tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)
    new_words = [word for word, offset in words_with_offsets]
    for word in new_words:
        word_freqs_1[word] = word_freqs_1[word] + 1

I've written blocks like this all the way up to number 8. I want to turn the code above into word_freqs[0] through word_freqs[8] using a loop. How should I do that?

word_freqs = defaultdict(int)

After declaring the variable like this, I tried to call append on it, but it doesn't work.


python
dictionary
refactoring
		2022-09-20 11:29



			

			
1 Answer

This is a simple refactoring problem: extract the repeated code into a function.
# 0
word_freqs_0 = defaultdict(int)
for text in dfs[0]['comment']:
    words_with_offsets = tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)
    new_words = [word for word, offset in words_with_offsets]
    for word in new_words:
        word_freqs_0[word] = word_freqs_0[word] + 1

# Number one
word_freqs_1 = defaultdict(int)
for text in dfs[1]['comment']:
    words_with_offsets = tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)
    new_words = [word for word, offset in words_with_offsets]
    for word in new_words:
        word_freqs_1[word] = word_freqs_1[word] + 1
Looking at the repeated blocks, only two things change: dfs[0] becomes dfs[1], and word_freqs_0 becomes word_freqs_1. Everything else is identical. So you can write a function that takes a df as input and returns its word_freq, and then call that function in a loop.
import pandas as pd

def get_word_freq_from_df(df: pd.DataFrame):
    # Count how often each pre-tokenized word appears in df['comment'].
    # Uses the tokenizer and defaultdict already set up in the question.
    word_freq = defaultdict(int)
    for text in df['comment']:
        words_with_offsets = tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str(text)
        new_words = [word for word, offset in words_with_offsets]
        for word in new_words:
            word_freq[word] = word_freq[word] + 1
    return word_freq
That completes the function.
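As a side note, the counting loop inside the function can also be written with collections.Counter from the standard library. The following is only a minimal sketch and not part of the original code: get_word_counter is a hypothetical name, and split_words is a hypothetical whitespace-splitting stand-in for tokenizer.backend_tokenizer.pre_tokenizer.pre_tokenize_str, used so the example runs without downloading a model.

```python
from collections import Counter

def split_words(text):
    # Hypothetical stand-in for pre_tokenize_str: plain whitespace split.
    return text.split()

def get_word_counter(comments):
    # Counter.update adds 1 for every element of the iterable it receives.
    counter = Counter()
    for text in comments:
        counter.update(split_words(text))
    return counter

counts = get_word_counter(["good movie", "good plot"])
print(counts["good"])    # 2
print(counts["absent"])  # 0, missing keys count as zero
```

Like defaultdict(int), a Counter returns 0 for words it has not seen, so the rest of the code can treat it the same way.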
Since you want to run this once for each DataFrame in turn, the code looks like this:
word_freqs = []
for df in dfs:
    word_freq = get_word_freq_from_df(df)
    word_freqs.append(word_freq)
Rewritten as a list comprehension:
word_freqs = [get_word_freq_from_df(df) for df in dfs]
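To see the whole pattern run end to end, and why calling append on a defaultdict fails, here is a minimal self-contained sketch. It makes assumptions that are not in the original post: pre_tokenize_str below is a hypothetical regex-based stand-in for the tokenizer call, and each entry of dfs is a plain dict with a 'comment' list standing in for a DataFrame.

```python
import re
from collections import defaultdict

def pre_tokenize_str(text):
    # Hypothetical stand-in for the tokenizer's pre_tokenize_str:
    # returns (word, (start, end)) pairs for whitespace-separated words.
    return [(m.group(), m.span()) for m in re.finditer(r"\S+", text)]

def get_word_freq(comments):
    word_freq = defaultdict(int)
    for text in comments:
        for word, _offset in pre_tokenize_str(text):
            word_freq[word] += 1
    return word_freq

# Hypothetical sample data: each dict stands in for one DataFrame in dfs.
dfs = [
    {"comment": ["good movie", "good plot"]},
    {"comment": ["bad movie"]},
]

# A defaultdict is a dict, so it has no append method.
print(hasattr(defaultdict(int), "append"))  # False

# A list holds one defaultdict per entry of dfs.
word_freqs = [get_word_freq(df["comment"]) for df in dfs]

print(dict(word_freqs[0]))  # {'good': 2, 'movie': 1, 'plot': 1}
print(dict(word_freqs[1]))  # {'bad': 1, 'movie': 1}
```

The list is what collects one dictionary per DataFrame, so word_freqs[i] plays the role that word_freqs_i played in the repeated blocks.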
					2022-09-20 11:29