In Python, I want to evaluate random forest feature importances for each fold of a stratified 5-fold cross-validation.
I want to compute the feature importances for each of the five splits and display five charts, each showing the features in descending order of importance, but only one chart appears.
Also, does this code correctly split the data into five stratified folds and train a random forest on each one?
Any help would be appreciated.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

K = 5
kf = StratifiedKFold(n_splits=K, shuffle=True, random_state=42)
for fold, (train_indices, test_indices) in enumerate(kf.split(X, y)):
    X_train, X_test = X[train_indices], X[test_indices]
    y_train, y_test = y[train_indices], y[test_indices]
    # Build the random forest model
    model = RandomForestClassifier(n_estimators=100,
                                   n_jobs=-1, random_state=42, max_samples=None)
    model.fit(X_train, y_train)
    # Feature importances
    feature = model.feature_importances_
    # Feature names
    label = df.columns[1:]
    # Sort order of the feature importances for the plot
    indices = np.argsort(feature)[::1]
    # Plot
    x = range(len(feature))
    y = feature[indices]
    y_label = label[indices]
    plt.figure(figsize=(30, 42))
    plt.barh(x, y, align='center')
    plt.yticks(x, y_label)
    plt.xlabel("importance_num")
    plt.ylabel("label")
    plt.rcParams["font.size"] = 9
    plt.show()
    # Feature names
    label = df.columns[1:]
    # Sort order of the feature importances (descending)
    indices = np.argsort(feature)[::-1]
    for i in range(len(feature)):
        print(str(i + 1) + " " + str(label[indices[i]]) + " " + str(feature[indices[i]]))
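On the stratification part of the question: `StratifiedKFold.split(X, y)` uses `y` to keep the class proportions roughly the same in every fold. A minimal way to check this is to compare the class ratio of each test fold with the overall ratio. The sketch below uses made-up labels (`X_demo` and `y_demo` are assumptions, not the data from the question).

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1.
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.zeros((len(y_demo), 1))  # split() only needs the number of rows

kf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
overall = np.bincount(y_demo) / len(y_demo)
print("overall class ratio:", overall)

fold_ratios = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X_demo, y_demo)):
    # Share of each class among the test samples of this fold.
    ratio = np.bincount(y_demo[test_idx], minlength=2) / len(test_idx)
    fold_ratios.append(ratio)
    print(f"fold {fold + 1} test class ratio:", ratio)
```

If the per-fold ratios are close to the overall ratio, the split is stratified as intended.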
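The problem is probably that the plotting code overwrites the variable y. Inside the loop, y = feature[indices] replaces the label array with the sorted importances, so on the second iteration y[train_indices] no longer indexes the labels and the loop most likely stops with an error, leaving only the first chart. Rename the plotting variable, for example to y_: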
# Plot
x_ = range(len(feature))
y_ = feature[indices]
y_label = label[indices]
plt.figure(figsize=(30, 42))
plt.barh(x_, y_, align='center')
plt.yticks(x_, y_label)
plt.xlabel("importance_num")
plt.ylabel("label")
plt.rcParams["font.size"] = 9
plt.show()
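For reference, here is a self-contained sketch of the whole loop with the renaming applied, so that `X` and `y` are only read inside the loop and never reassigned. The synthetic data from `make_classification`, the helper names `fold_importances` and `plot_fold`, and the `feature_i` names are all assumptions for illustration; substitute your own arrays and column names.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold


def fold_importances(X, y, n_splits=5, seed=42):
    """Fit one random forest per stratified fold.

    Returns an array of shape (n_splits, n_features) with one row of
    feature importances per fold.
    """
    kf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    rows = []
    for train_idx, _test_idx in kf.split(X, y):
        model = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=seed)
        # X and y are only indexed here, never overwritten.
        model.fit(X[train_idx], y[train_idx])
        rows.append(model.feature_importances_)
    return np.vstack(rows)


def plot_fold(importances, names, fold):
    """Draw one horizontal bar chart with the most important feature on top."""
    order = np.argsort(importances)  # ascending: barh draws the last item at the top
    positions = np.arange(len(order))
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.barh(positions, importances[order], align="center")
    ax.set_yticks(positions)
    ax.set_yticklabels(names[order])
    ax.set_xlabel("importance")
    ax.set_ylabel("feature")
    ax.set_title(f"Fold {fold + 1}")
    return fig


# Synthetic stand-in data (an assumption; replace with your own X, y and names).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)
names_demo = np.array([f"feature_{i}" for i in range(X_demo.shape[1])])

all_importances = fold_importances(X_demo, y_demo)

if __name__ == "__main__":
    for fold, imp in enumerate(all_importances):
        plot_fold(imp, names_demo, fold)
    plt.show()  # displays one figure per fold
```

Keeping the fitting and the plotting in separate functions gives each its own local variables, which makes this kind of accidental overwrite harder to reintroduce.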
Cross-validation is primarily meant for comparing scores. Differences in feature importances between folds may only mean that, on one particular subset of the data, a feature happened to be learned as important. The model may also be overfitting, so look at the importances together with the score for each fold.
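One way to look at both at once is sketched below, under the assumption that accuracy (the default score for a classifier) is an acceptable metric. It uses scikit-learn's `cross_validate` with `return_estimator=True` to keep each fold's fitted forest, then prints the train and test score per fold and the mean and standard deviation of each importance across folds. The data is synthetic and the `feature_i` labels are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data (an assumption; replace with your own X and y).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
result = cross_validate(
    RandomForestClassifier(n_estimators=100, random_state=42),
    X_demo,
    y_demo,
    cv=cv,
    return_train_score=True,  # a large train/test gap hints at overfitting
    return_estimator=True,    # keep the fitted forest of every fold
)

# One row of importances per fold.
importances = np.array([est.feature_importances_ for est in result["estimator"]])

for fold, (train, test) in enumerate(zip(result["train_score"], result["test_score"])):
    print(f"fold {fold + 1}: train accuracy {train:.3f}, test accuracy {test:.3f}")

# Importances that vary a lot between folds deserve less trust.
mean_imp = importances.mean(axis=0)
std_imp = importances.std(axis=0)
for i in np.argsort(mean_imp)[::-1]:
    print(f"feature_{i}: {mean_imp[i]:.3f} +/- {std_imp[i]:.3f}")
```

A feature that ranks high in every fold, with a small spread and a reasonable test score, is a more convincing signal than one that is important in a single fold only.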