How to evaluate a classification problem

I am new to machine learning and am using it in my research for the first time. For a classification problem in this study, I would like to calculate accuracy, precision, and recall.
The dataset is small (about 30 samples), so the scores depend heavily on how the data is split. The cross-validation results vary widely (for example, accuracy is [0.83333333 0.727273 0.4444444]), and I am unsure how to evaluate the model. In this case, is it acceptable to repeat cross-validation 100 or 500 times and report the average? This is a very basic question, but I would appreciate any advice.

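import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# X, y (features and labels) and ITER (number of repetitions) are defined elsewhere.
ava = []  # mean accuracy per repetition
avp = []  # mean macro precision per repetition
avr = []  # mean macro recall per repetition
estimators = [("MinMaxScaler", MinMaxScaler()),
              ("SVC", SVC(kernel='linear', class_weight='balanced',
                          C=1, decision_function_shape='ovr'))]
pl = Pipeline(estimators)

for i in range(ITER):
    # Each call creates its own shuffled split, so the three metrics use different folds.
    accuracy = cross_val_score(pl, X, y, cv=StratifiedKFold(n_splits=3, shuffle=True))
    precision = cross_val_score(pl, X, y, scoring='precision_macro', cv=StratifiedKFold(n_splits=3, shuffle=True))
    recall = cross_val_score(pl, X, y, scoring='recall_macro', cv=StratifiedKFold(n_splits=3, shuffle=True))
    ava.append(np.mean(accuracy))
    avp.append(np.mean(precision))
    avr.append(np.mean(recall))
print("cross-val-score accuracy {} times average: ".format(ITER), np.mean(ava), "\n")
print("cross-val-score precision {} times average: ".format(ITER), np.mean(avp), "\n")
print("cross-val-score recall {} times average: ".format(ITER), np.mean(avr), "\n")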

python machine-learning scikit-learn

2022-09-30 14:28

2 Answers

If the dataset is small, I think LOOCV (leave-one-out cross-validation) would be a good way to check generalization performance. With about 30 cases in each class, you would build datasets in which one of the 120 cases is the test data and the remaining 119 are the training data, which also lets you see which data points affect accuracy. I hope this helps.
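A minimal sketch of the LOOCV idea, under some assumptions: the data here is synthetic (make_classification with 30 samples and 2 classes stands in for the real X and y), and the names loo_accuracy, loo_precision, loo_recall, and misclassified are illustrative only. Because each LOOCV test fold holds a single sample, per-fold precision and recall are not meaningful, so this sketch collects all held-out predictions with cross_val_predict and computes each metric once.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): 30 samples, 2 classes.
X, y = make_classification(n_samples=30, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=2, random_state=0)

# Same pipeline as in the question: scaling is refit inside every fold.
pl = Pipeline([("MinMaxScaler", MinMaxScaler()),
               ("SVC", SVC(kernel="linear", class_weight="balanced", C=1))])

# Each sample is predicted by a model trained on the remaining 29 samples.
y_pred = cross_val_predict(pl, X, y, cv=LeaveOneOut())

# Compute each metric once over all held-out predictions.
loo_accuracy = accuracy_score(y, y_pred)
loo_precision = precision_score(y, y_pred, average="macro", zero_division=0)
loo_recall = recall_score(y, y_pred, average="macro", zero_division=0)
print("LOOCV accuracy:", loo_accuracy)
print("LOOCV macro precision:", loo_precision)
print("LOOCV macro recall:", loo_recall)

# Indices of the samples that were misclassified when held out.
misclassified = [i for i in range(len(y)) if y_pred[i] != y[i]]
print("misclassified sample indices:", misclassified)
```

LOOCV is deterministic for a given dataset, so no repetition or averaging over random splits is needed; the list of misclassified indices shows which individual samples the model gets wrong.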

2022-09-30 14:28

Even without the for i in range(ITER): loop, if you set cv= or n_splits= to as large a natural number as possible (10-120?) and average the results, the larger number of iterations should give you more accurate metrics.
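As a related sketch, repeated cross-validation can also be averaged without a hand-written loop. This is a swapped-in alternative rather than exactly what the answer above describes: it uses scikit-learn's RepeatedStratifiedKFold together with cross_validate, so all three metrics are computed on the same folds. The data is synthetic (an assumption standing in for the real X and y), and n_repeats=100 is just an example value.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real data (assumption): 30 samples, 2 classes.
X, y = make_classification(n_samples=30, n_features=5, n_informative=3,
                           n_redundant=0, n_classes=2, random_state=0)

pl = Pipeline([("MinMaxScaler", MinMaxScaler()),
               ("SVC", SVC(kernel="linear", class_weight="balanced", C=1))])

# 3-fold stratified CV repeated 100 times with different shuffles: 300 folds.
cv = RepeatedStratifiedKFold(n_splits=3, n_repeats=100, random_state=0)

# One call evaluates all metrics on identical train/test splits.
metrics = ["accuracy", "precision_macro", "recall_macro"]
scores = cross_validate(pl, X, y, cv=cv, scoring=metrics)

for name in metrics:
    values = scores["test_" + name]
    # Report the spread as well as the mean; with 30 samples it is large.
    print("{}: mean={:.3f}, std={:.3f}".format(name, np.mean(values), np.std(values)))
```

Reporting the standard deviation alongside the mean makes the variability visible instead of hiding it behind a single averaged number.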

2022-09-30 14:28
