Look at the given code. The skeleton code implements the following model:
Sales = \beta_0 X_1^2 + \beta_1 X_2 + \beta_2 X_2 X_3 + \beta_3 X_3 + \beta_4
Try to lower the MSE as much as possible by adding or removing combinations of variables in the given model.
Make the MSE on the test data less than 1. Note that making the model more complex can lead to overfitting.
<Code Created>
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import csv
# Columns 1-3 are the input variables X_1..X_3; column 4 is the target (Sales).
x=[]
y=[]
with open("data/Advertising.csv") as f:
    csvreader=csv.reader(f)
    next(csvreader)  # skip the header row
    for line in csvreader:
        x_i=[float(line[1]), float(line[2]), float(line[3])]
        y_i=[float(line[4])]
        x.append(x_i)
        y.append(y_i)
X=np.array(x)
Y=np.array(y)
# Build the feature matrix for the model above.
X_poly = []
for x_i in X:
    X_poly.append([
        x_i[0] ** 2, # X_1^2
        x_i[1], # X_2
        x_i[1] * x_i[2], # X_2 * X_3
        x_i[2] # X_3
    ])
x_train, x_test, y_train, y_test = train_test_split(X_poly, Y, test_size=0.2, random_state=0)
lrmodel = LinearRegression()
lrmodel.fit(x_train, y_train)
predicted_y_train = lrmodel.predict(x_train)
mse_train = mean_squared_error(y_train, predicted_y_train)
print("MSE on train data: {}".format(mse_train))
predicted_y_test = lrmodel.predict(x_test)
mse_test = mean_squared_error(y_test, predicted_y_test)
print("MSE on test data: {}".format(mse_test))
Running this gives:
MSE on train data: 4.589288715884171
MSE on test data: 7.356365735074988
How can I lower the MSE on the test data from here?
mse python machine-learning scikit-learn regression-analysis
The dataset is widely available online, for example:
https://zetawiki.com/wiki/Advertising.csv
If you change the features as below, the MSE comes out at around 0.2.
for x_i in X:
    X_poly.append([
        x_i[0] ** 0.4, # X_1^0.4
        x_i[1], # X_2
        x_i[0] ** 0.4 * x_i[1], # X_1^0.4 * X_2
        x_i[2] # X_3
    ])
MSE on train data: 0.2132098266677575
MSE on test data: 0.20428712970540414
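Since the question warns about overfitting, it may help to check a feature set with cross-validation instead of a single train/test split. The following is only a sketch under assumptions: the helper names build_features and cv_mse and the parameters power and folds are made up for illustration and are not part of the original code. It builds the same four columns as the loop above in vectorized form and reports the mean cross-validated MSE.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def build_features(X, power=0.4):
    """Return the four columns used above: X_1^p, X_2, X_1^p * X_2, X_3.

    `power` is a hypothetical parameter; 0.4 matches the snippet above.
    """
    X = np.asarray(X, dtype=float)
    x1_p = X[:, 0] ** power
    return np.column_stack([x1_p, X[:, 1], x1_p * X[:, 1], X[:, 2]])


def cv_mse(X, y, power, folds=5):
    """Mean k-fold cross-validated MSE of a linear model on build_features(X, power)."""
    scores = cross_val_score(
        LinearRegression(),
        build_features(X, power),
        np.ravel(y),  # accept y with shape (n,) or (n, 1)
        scoring="neg_mean_squared_error",  # scikit-learn returns negated MSE
        cv=folds,
    )
    return -scores.mean()
```

With X and Y loaded as in the question, calling cv_mse(X, Y, p) for a few exponents p (for example 0.3, 0.4, 0.5 and 2) shows whether a low MSE holds across folds or only on one particular split.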