<Problem>
Characteristics of English words and word frequency
The second project deals with the British National Corpus (BNC), a text corpus (word collection) of about 100 million English words.
Let's plot the frequencies of the 10,000 words stored in the words.txt file and apply linear regression. Read the comments and complete the do_linear_regression() function so that the execution results are output.
<Task>
Read through the commented code and check the return value of each function.
Write the do_linear_regression() function (44th line).
Press the Run button and check the chart being printed.
Uncomment the 21st and 22nd lines of the main() function. Press the Run button and check the newly printed chart.
Compare the printed graph with the [execution results] below and press the Submit button.
<Code>
import operator
from sklearn.linear_model import LinearRegression
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import elice_utils

def main():
    words = read_data()

    # Sort the words read from words.txt by frequency.
    words = sorted(____)

    # Store each word as an integer (its rank) in the X list and its frequency in the Y list.
    X = list(range(1, len(words)+1))
    Y = [x[1] for x in words]

    # Convert the X and Y lists into arrays and apply log() to every element.
    X, Y = np.array(X), np.array(Y)
    X, Y = np.log(X), np.log(Y)

    # Compute the slope and intercept, then draw the chart.
    slope, intercept = do_linear_regression(X, Y)
    draw_chart(X, Y, slope, intercept)

    return slope, intercept

def read_data():
    # Read words.txt, convert it to the form
    # [[word 1, frequency], [word 2, frequency], ...] and return it.
    words = []

    return words

def do_linear_regression(X, Y):
    # Write the do_linear_regression() function.

    return (slope, intercept)

def draw_chart(X, Y, slope, intercept):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    plt.scatter(X, Y)

    # Compute the two endpoints of the regression line and draw it.
    min_X = min(X)
    max_X = max(X)
    min_Y = min_X * slope + intercept
    max_Y = max_X * slope + intercept
    plt.plot([min_X, max_X], [min_Y, max_Y],
             color='red',
             linestyle='--',
             linewidth=3.0)

    # Write the fitted equation (slope and intercept) on the chart.
    ax.text(min_X, min_Y + 0.1, r'$y = %.2fx + %.2f$' % (slope, intercept), fontsize=15)

    plt.savefig('chart.png')
    elice_utils.send_image('chart.png')

if __name__ == "__main__":
    main()
I have to write three parts, but I have no idea how. Please help me.
linear regression-analysis
Since this is a linear model with one feature, the fitted line is a first-order (straight-line) graph described by a single weight w (the slope) and a single bias (the intercept).
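As a rough sketch of that idea, do_linear_regression() could be written with the LinearRegression class that the question already imports. This assumes X and Y are 1-D NumPy arrays, as they are in main(); the demo_* names and the toy data are made up for illustration and are not part of the exercise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def do_linear_regression(X, Y):
    """Fit Y ~ slope * X + intercept and return (slope, intercept)."""
    # scikit-learn expects a 2-D feature matrix of shape (n_samples, n_features),
    # so the single feature is reshaped into one column.
    X_2d = np.asarray(X, dtype=float).reshape(-1, 1)
    model = LinearRegression()
    model.fit(X_2d, np.asarray(Y, dtype=float))
    slope = float(model.coef_[0])        # the single weight w
    intercept = float(model.intercept_)  # the bias
    return slope, intercept


# Illustrative data only: points that lie exactly on y = 2x + 1.
demo_X = np.array([1.0, 2.0, 3.0, 4.0])
demo_Y = 2.0 * demo_X + 1.0
demo_slope, demo_intercept = do_linear_regression(demo_X, demo_Y)
```

The reshape is the step that is easiest to miss: passing the 1-D X directly to fit() raises an error, because a 1-D array is ambiguous between one sample with many features and many samples with one feature.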
I did not have words.txt, so I extracted the words from a Shakespeare text file and tested with those.
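For reference, counting words in a plain text and sorting them by frequency might look roughly like the sketch below. The count_words() helper, the regular expression, and the sample sentence are all hypothetical stand-ins (the real test used a Shakespeare file). Sorting in descending order is also an assumption, based on main() using the position in the list as the X value.

```python
import operator
import re
from collections import Counter


def count_words(text):
    """Return [[word, frequency], ...] for the words in `text` (hypothetical helper)."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    return [[word, freq] for word, freq in Counter(tokens).items()]


# Stand-in text; any plain-text file content could be passed instead.
sample = "to be or not to be that is the question to be"
words = count_words(sample)

# Sort by the frequency (index 1 of each pair), most frequent first.
words = sorted(words, key=operator.itemgetter(1), reverse=True)
```

The same key=operator.itemgetter(1), reverse=True arguments are one plausible way to fill in the sorted(____) blank in main(), since each element there is also a [word, frequency] pair.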
Please take a look at the notebook below and try running it.
https://notebooks.azure.com/wincommerce/projects/hashcode/html/9326.ipynb
https://notebooks.azure.com/wincommerce/projects/hashcode/html/shakespeare.txt
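If you do have words.txt, read_data() could be sketched as follows. The file format is not shown in the question, so this assumes one "word,frequency" pair per line; the sep parameter and the demo file contents (including the numbers) are made up for illustration and should be adjusted to the real file.

```python
import os
import tempfile


def read_data(path="words.txt", sep=","):
    """Read `path` and return [[word, frequency], ...].

    Assumption: each non-empty line looks like "word<sep>frequency".
    """
    words = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Split on the last separator so the frequency is always the final field.
            word, freq = line.rsplit(sep, 1)
            words.append([word, int(freq)])
    return words


# Demonstration with a temporary file holding made-up values.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "words.txt")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("the,6187\nof,2941\n\nand,2682\n")
    demo_words = read_data(demo_path)
```

Converting the frequency to int matters here, because main() later applies np.log() to the Y values and that needs numbers rather than strings.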
© 2024 OneMinuteCode. All rights reserved.