I tried running the grid search code below for sklearn's random forest. Setting n_jobs to -1 runs the computation in parallel on all available cores, so I did that. With n_jobs=1, it finishes in a few seconds. Memory and CPU utilization are close to 100%, so it appears to be running.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

tuned_parameters = [{'n_estimators': [5, 10, 30, 50, 100], 'max_features': ['sqrt', 'log2', None]}]
clf = GridSearchCV(RandomForestClassifier(), tuned_parameters, cv=2, scoring='accuracy', n_jobs=-1)
I don't know the details, but here is some relevant information.
In short, on a Mac, when n_jobs > 1, Python's multiprocessing is not fork-safe, and this is considered a bug.
For Python 3.4+, the suggested workaround is:
import multiprocessing
# other imports, custom code, load data, define model...
if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')
    # call scikit-learn utilities with n_jobs > 1 here
Apparently, doing it this way should work.
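Putting the question's grid search together with the workaround above, a minimal sketch might look like the following. The synthetic data from make_classification and the helper name run_grid_search are assumptions for illustration, not part of the original code, and 'forkserver' is only available on Unix-like systems. As far as I know, recent joblib versions use their own loky backend by default, so the start method setting may not be needed there.

```python
import multiprocessing

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Same parameter grid as in the question.
tuned_parameters = [{'n_estimators': [5, 10, 30, 50, 100],
                     'max_features': ['sqrt', 'log2', None]}]


def run_grid_search(X, y, n_jobs):
    """Hypothetical helper: fit the grid search and return the fitted object."""
    clf = GridSearchCV(RandomForestClassifier(random_state=0), tuned_parameters,
                       cv=2, scoring='accuracy', n_jobs=n_jobs)
    clf.fit(X, y)
    return clf


if __name__ == '__main__':
    # Set the start method once, before any worker processes are created.
    multiprocessing.set_start_method('forkserver')
    # Synthetic stand-in data (assumption): replace with your own X and y.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    clf = run_grid_search(X, y, n_jobs=-1)
    print(clf.best_params_)
```

Keeping the fitting code inside a function and the parallel call under the `if __name__ == '__main__':` guard matters here, because worker processes started with 'forkserver' import the main module again.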
For reference, this is the measured computation speed when running predict on 10,000 samples with a trained RandomForestClassifier model.
Parallel processing slows down significantly when the machine is doing other work at the same time.
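A rough sketch of how such a predict timing comparison could be reproduced is shown here. The synthetic dataset, the number of trees, and the helper name time_predict are assumptions for illustration. The numbers will vary by machine and by whatever else is running.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def time_predict(model, X, n_jobs, repeats=3):
    """Hypothetical helper: best wall-clock time in seconds of model.predict(X)."""
    model.set_params(n_jobs=n_jobs)  # n_jobs is also used at predict time
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        model.predict(X)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == '__main__':
    # Synthetic stand-in for the 10,000 samples mentioned above (assumption).
    X, y = make_classification(n_samples=10000, n_features=20, random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    for n_jobs in (1, -1):
        print(f'n_jobs={n_jobs}: {time_predict(model, X, n_jobs):.4f} s')
```

Taking the best of a few repeats reduces noise from other processes, which is the effect described above.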