Code for evaluating random forest feature importances for each fold of stratified k-fold cross-validation in Python

Asked 2 years ago, Updated 2 years ago, 57 views

In Python, I would like to write code that evaluates random forest feature importances for each fold of stratified 5-fold cross-validation.

I want to compute the feature importances on each of the five splits and show five plots, each listing the features in descending order of importance, but only one plot appears.
Does this code correctly split the data into five stratified folds and train a random forest on each of them?

Please let me know.

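import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

K = 5
kf = StratifiedKFold(n_splits=K, shuffle=True, random_state=42)

for fold, (train_indices, test_indices) in enumerate(kf.split(X, y)):
    X_train, X_test = X[train_indices], X[test_indices]
    y_train, y_test = y[train_indices], y[test_indices]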


    # Build the random forest model
    model = RandomForestClassifier(n_estimators=100,
                                   n_jobs=-1, random_state=42, max_samples=None)
    model.fit(X_train, y_train)

    # Feature importances
    feature = model.feature_importances_
    # Feature names
    label = df.columns[1:]
    # Sort order of the importances ([::1] keeps argsort's ascending order,
    # so barh draws the largest bar at the top)
    indices = np.argsort(feature)[::1]

    # Plot
    x = range(len(feature))
    y = feature[indices]
    y_label = label[indices]
    plt.figure(figsize=(30, 42))
    plt.barh(x, y, align='center')
    plt.yticks(x, y_label)
    plt.xlabel("importance_num")
    plt.ylabel("label")
    plt.rcParams["font.size"] = 9
    plt.show()

    # Feature names
    label = df.columns[1:]

    # Order of the feature importances (descending)
    indices = np.argsort(feature)[::-1]
    for i in range(len(feature)):
        print(str(i + 1) + " " + str(label[indices[i]]) + " " + str(feature[indices[i]]))

python machine-learning

2022-09-30 14:53

1 Answer

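The problem is probably caused by overwriting the variable y in the plotting code. After the first fold, y holds the sorted importances instead of the target labels, so y[train_indices] no longer selects the training labels on the next iteration. Rename the plotting variable, for example to y_.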

    # Plot
    x_ = range(len(feature))
    y_ = feature[indices]
    y_label = label[indices]
    plt.figure(figsize=(30, 42))
    plt.barh(x_, y_, align='center')
    plt.yticks(x_, y_label)
    plt.xlabel("importance_num")
    plt.ylabel("label")
    plt.rcParams["font.size"] = 9
    plt.show()
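
For reference, here is a minimal end-to-end sketch of the loop with the plotting variables renamed so that the labels y are never overwritten. The dataset comes from make_classification and the feature names (feature_0, feature_1, ...) are made up, since the asker's df, X and y are not shown; the Agg backend and the smaller figure size are also my own choices so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the asker's data (assumption, not the original dataset)
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=42)
# Hypothetical feature names; the original takes them from df.columns[1:]
feature_names = np.array([f"feature_{i}" for i in range(X.shape[1])])

K = 5
kf = StratifiedKFold(n_splits=K, shuffle=True, random_state=42)

figures = []
for fold, (train_indices, test_indices) in enumerate(kf.split(X, y)):
    X_train, y_train = X[train_indices], y[train_indices]

    model = RandomForestClassifier(n_estimators=50, random_state=42)
    model.fit(X_train, y_train)

    importances = model.feature_importances_
    # Ascending order: barh draws the last element at the top,
    # so the most important feature ends up as the top bar
    order = np.argsort(importances)

    positions = np.arange(len(importances))
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.barh(positions, importances[order], align="center")
    ax.set_yticks(positions)
    ax.set_yticklabels(feature_names[order])
    ax.set_xlabel("importance_num")
    ax.set_ylabel("label")
    ax.set_title(f"fold {fold + 1}")
    figures.append(fig)  # in an interactive session, call plt.show() here instead
```

With this structure each pass through the loop creates its own figure, so five plots are produced, one per fold.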

Cross-validation is primarily meant for comparing scores. Comparing feature importances across folds mostly tells you which features happened to be learned as important on that particular training subset. The model may be overfitting, so you should look at the importances together with each fold's score.
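
One way to look at both together is to record the test score and the importances for every fold and then summarize how much the importances vary. This is only a sketch under assumptions: the data is synthetic (make_classification), the feature names are made up, and accuracy via model.score is just one possible choice of score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the asker's data (assumption)
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=42)

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

scores = []
importances = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X, y)):
    model = RandomForestClassifier(n_estimators=50, random_state=42)
    model.fit(X[train_idx], y[train_idx])

    # Accuracy on the held-out fold
    scores.append(model.score(X[test_idx], y[test_idx]))
    importances.append(model.feature_importances_)
    print(f"fold {fold + 1}: accuracy={scores[-1]:.3f}")

# One row per fold, one column per feature
importances = np.vstack(importances)
mean_imp = importances.mean(axis=0)
std_imp = importances.std(axis=0)

# Features sorted by mean importance, descending
for i in np.argsort(mean_imp)[::-1]:
    print(f"feature_{i}: mean={mean_imp[i]:.3f} std={std_imp[i]:.3f}")
```

A feature whose importance has a large standard deviation across folds, or folds whose score is clearly lower than the others, are signs that the ranking depends on the particular split.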


2022-09-30 14:53
