How to lower the MSE of a Python LinearRegression model

Asked 2 years ago, Updated 2 years ago, 73 views

Look at the given code. The skeleton code implements the following model:

Sales = \beta_0 X_1^2 + \beta_1 X_2 + \beta_2 X_2 X_3 + \beta_3 X_3 + \beta_4

Try to lower the MSE as much as possible by adding or removing combinations of variables in the given model.

Get the MSE on the test data below 1. Note that making the model too complex leads to overfitting.
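
One way to compare variable combinations without tuning against the test data is k-fold cross-validation. The sketch below is only illustrative: it uses synthetic data in place of Advertising.csv, and the candidate names and the `cv_mse` helper are made up for this example rather than part of the assignment.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical synthetic stand-in for the three predictors (NOT the real
# Advertising.csv): the target depends on X_1, X_2 and the product X_1 * X_2.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 100.0, size=(200, 3))
y = (3.0 + 0.02 * X[:, 0] + 0.03 * X[:, 1]
     + 0.001 * X[:, 0] * X[:, 1] + rng.normal(0.0, 0.5, size=200))

# Candidate feature sets, each a function from the raw columns to a design matrix.
candidates = {
    "skeleton": lambda raw: np.column_stack(
        [raw[:, 0] ** 2, raw[:, 1], raw[:, 1] * raw[:, 2], raw[:, 2]]),
    "linear": lambda raw: raw,
    "linear + X1*X2": lambda raw: np.column_stack([raw, raw[:, 0] * raw[:, 1]]),
}

def cv_mse(features, target, folds=5):
    """Mean squared error averaged over k cross-validation folds."""
    scores = cross_val_score(LinearRegression(), features, target,
                             scoring="neg_mean_squared_error", cv=folds)
    return -scores.mean()

results = {name: cv_mse(build(X), y) for name, build in candidates.items()}
for name, mse in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: CV MSE = {mse:.3f}")
```

On the real data, the same loop would run on the training split only, so the test split stays untouched until a final check.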

My code:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

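import csv

x=[]
y=[]

# Read the predictors (columns 1-3) and the target Sales (column 4),
# skipping the header row; the with-block closes the file afterwards.
with open("data/Advertising.csv", newline="") as f:
    csvreader=csv.reader(f)
    next(csvreader)
    for line in csvreader:
        x_i=[float(line[1]), float(line[2]), float(line[3])]
        y_i=[float(line[4])]

        x.append(x_i)
        y.append(y_i)

X=np.array(x)
Y=np.array(y)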


X_poly = []
for x_i in X:
    X_poly.append([
        x_i[0] ** 2, # X_1^2
        x_i[1], # X_2
        x_i[1] * x_i[2], # X_2 * X_3
        x_i[2] # X_3
    ])


x_train, x_test, y_train, y_test = train_test_split(X_poly, Y, test_size=0.2, random_state=0)

lrmodel = LinearRegression()
lrmodel.fit(x_train, y_train)

predicted_y_train = lrmodel.predict(x_train)
mse_train = mean_squared_error(y_train, predicted_y_train)
print("MSE on train data: {}".format(mse_train))

predicted_y_test = lrmodel.predict(x_test)
mse_test = mean_squared_error(y_test, predicted_y_test)
print("MSE on test data: {}".format(mse_test))

Running this gives:

MSE on train data: 4.589288715884171
MSE on test data: 7.356365735074988

With results like these, how can I lower the MSE on the test data?

mse python machine-learning scikit-learn regression-analysis

2022-09-22 14:54

1 Answer

The dataset is easy to find online, for example:

https://zetawiki.com/wiki/Advertising.csv

Changing the features as below brings the MSE down to around 0.2.

X_poly = []
for x_i in X:
    X_poly.append([
        x_i[0] ** 0.4, # X_1^0.4
        x_i[1], # X_2
        x_i[0] ** 0.4 * x_i[1], # X_1^0.4 * X_2
        x_i[2] # X_3
    ])

MSE on train data: 0.2132098266677575
MSE on test data: 0.20428712970540414
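
The answer does not say how the exponent 0.4 was found. One possible approach, sketched here under assumptions, is to sweep a small grid of exponents and score each by cross-validation on the training split, so the test split is only used once at the end. The data below is synthetic (not Advertising.csv), and `build_features`, `grid`, and the coefficients are hypothetical names and values chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical synthetic data (NOT Advertising.csv): the target is built
# with exponent 0.4 on X_1 so the sweep has something to recover.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(1.0, 300.0, size=200),  # stand-in for X_1
    rng.uniform(0.0, 50.0, size=200),   # stand-in for X_2
    rng.uniform(0.0, 100.0, size=200),  # stand-in for X_3
])
x1_root = X[:, 0] ** 0.4
y = (2.0 + 0.5 * x1_root + 0.05 * X[:, 1]
     + 0.02 * x1_root * X[:, 1] + rng.normal(0.0, 0.1, size=200))

def build_features(raw, p):
    """Same layout as the answer's features, with a tunable exponent p."""
    x1_p = raw[:, 0] ** p
    return np.column_stack([x1_p, raw[:, 1], x1_p * raw[:, 1], raw[:, 2]])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Score each exponent by 5-fold cross-validation on the training split only.
grid = [0.2, 0.4, 0.6, 0.8, 1.0, 2.0]
cv_by_p = {}
for p in grid:
    scores = cross_val_score(LinearRegression(), build_features(X_train, p),
                             y_train, scoring="neg_mean_squared_error", cv=5)
    cv_by_p[p] = -scores.mean()

best_p = min(cv_by_p, key=cv_by_p.get)

# Refit with the chosen exponent and evaluate once on the held-out test split.
model = LinearRegression().fit(build_features(X_train, best_p), y_train)
test_mse = mean_squared_error(y_test, model.predict(build_features(X_test, best_p)))
print(f"best exponent: {best_p}, test MSE: {test_mse:.4f}")
```

Picking the exponent by test MSE directly would also lower the reported number, but it would no longer be an honest estimate of performance on unseen data.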


2022-09-22 14:54



© 2024 OneMinuteCode. All rights reserved.