Creating and selecting classification models
I have prepared four models: logistic regression, decision tree, random forest, and SVM. I want to use a for statement to take each model out of the list, train it, make predictions, and output its F1 score so the models can be compared, but I don't know how to write that code. Please help.
Code
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.model_selection import train_test_split
# Configure graph display in Jupyter Notebook
%matplotlib inline
# Configuring DataFrame to Display All Columns
pd.options.display.max_columns=None
# Load the dataset and keep only the columns to be used
dataset=sns.load_dataset("titanic")
dataset=dataset[['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked']]
dataset.head()
# Display summary statistics
dataset.describe()
# Count the number of missing values in each column
a=dataset.isnull().sum()
b=pd.isnull(dataset[['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked']]).sum()
print(a)
print(b)
# Fill missing 'age' values with the mean
dataset['age'] = dataset['age'].fillna(dataset['age'].mean())
dataset.head()
# Use the value_counts method to check which embarkation port has the most passengers
dataset_em=dataset['embarked'].value_counts()
print(dataset_em)
# Confirm that the filled column has no missing values; 'embarked' still has 2 missing values
dataset.info()
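If you also want to fill the two missing 'embarked' values, one option (an assumption on my part, not part of the original code) is to use the most frequent embarkation port found with value_counts above:
# Assumption: fill the 2 missing 'embarked' values with the most frequent port
dataset['embarked']=dataset['embarked'].fillna(dataset['embarked'].value_counts().idxmax())
dataset.info()  # confirm that no missing values remain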
# Convert the dataset's 'sex' and 'embarked' columns to dummy variables and assign the result to dataset2
dataset2=pd.get_dummies(data=dataset,columns=['sex','embarked'])
# Display the first five rows of dataset2
dataset2.head()
# Get the columns for the target and explanatory variables from dataset2, convert them to NumPy arrays, and store them in variables Y and X
# Y: column corresponding to the target variable
Y = np.array(dataset2['survived'])
# X: columns corresponding to the explanatory variables; exclude 'survived' from dataset2
X = np.array(dataset2[['pclass', 'age', 'sibsp', 'parch', 'fare', 'sex_female', 'sex_male', 'embarked_C', 'embarked_Q', 'embarked_S']])
# Check the shape
print("Y=",Y.shape,",X=",X.shape)
# Split X and Y into training data and test data at a ratio of 7:3 (X_train, X_test, Y_train, Y_test)
X_train, X_test, Y_train, Y_test=train_test_split(X,Y,test_size=0.3, random_state=0)
# Split the training data further into training data and validation data at a ratio of 7:3 (X_train, X_valid, Y_train, Y_valid)
X_train, X_valid, Y_train, Y_valid=train_test_split(X_train, Y_train, test_size=0.3, random_state=0)
# Check shape: X_train, X_valid, X_test, Y_train, Y_valid, Y_test
print("Y_train",Y_train.shape,",X_train",X_train.shape)
print("Y_valid", Y_valid.shape, ", X_valid", X_valid.shape)
print("Y_test",Y_test.shape,",X_test",X_test.shape)
Please create and compare logistic regression, decision tree, random forest, and SVM models by completing the model selection part in the code below.
#Import required libraries
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score
# Prepare an empty list
model_list = [ ]
# Add each model to the list. The arguments are set so that no warnings are raised.
model_list.append(LogisticRegression(solver='lbfgs', multi_class='multinomial', max_iter=1000))
model_list.append(DecisionTreeClassifier(criterion='entropy'))
model_list.append(RandomForestClassifier(n_estimators=100))
model_list.append(SVC(gamma='scale'))
# Use a for statement to take each model out of the list, train it, predict, and output the F1 score ← this is the part I don't understand
"# Added model to list.US>Each argument is set to not warn you.
「# In the for statement, we have added (append) four models to the array model_list between the lines "Learn, predict, and output F1 values", so the array should look like this:
model_list[0] is LogisticRegression(solver='lbfgs', multi_class='multinomial', max_iter=1000)
model_list[1] is DecisionTreeClassifier(criterion='entropy')
model_list[2] is RandomForestClassifier(n_estimators=100)
model_list[3] is SVC(gamma='scale')
The rest would be something like this:
f_value_list=[]  # Prepare a list to record the F1 scores
for i in range(4):  # Train and predict with each of the 4 models, obtaining its F1 score and recording it in f_value_list
    model=model_list[i]  # Take out the i-th model
    # Train and predict with the model and compute its F1 score.
    # (I don't know the details of your data, so the actual training/prediction and F1 calculation code is omitted here.)
    # Assume the resulting F1 score is stored in the variable FV
    f_value_list.append(FV)
for i in range(4):  # Display the F1 scores
    print("F1 score of model {0}: {1}".format(i+1, f_value_list[i]))
How about proceeding along those lines?
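For reference, here is a minimal sketch of the omitted training/prediction part, assuming the X_train, X_valid, Y_train, Y_valid splits and the model_list built in your code (since 'survived' is a binary label, f1_score can be used with its default settings):
# Sketch: train each model, predict on the validation data, and record its F1 score
f_value_list=[]
for model in model_list:
    model.fit(X_train, Y_train)  # train on the training split
    Y_pred=model.predict(X_valid)  # predict on the validation split
    f_value_list.append(f1_score(Y_valid, Y_pred))  # F1 score of this model
# Display the F1 scores
for i, fv in enumerate(f_value_list):
    print("F1 score of model {0}: {1:.4f}".format(i+1, fv))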