About n_jobs in sklearn


I tried to run a grid search over sklearn's random forest with the code below. Setting n_jobs=-1 runs the computation in parallel on all available CPU cores, so I used that. With n_jobs=1, it takes only a few seconds. While it runs, memory and CPU utilization stay close to 100%.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
tuned_parameters = [{'n_estimators': [5, 10, 30, 50, 100], 'max_features': ['sqrt', 'log2', None]}]
clf = GridSearchCV(RandomForestClassifier(), tuned_parameters, cv=2, scoring='accuracy', n_jobs=-1)
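How an n_jobs value turns into a worker count can be sketched as a small function. resolve_n_jobs is a hypothetical helper, not a scikit-learn or joblib API (joblib's own equivalent is joblib.effective_n_jobs). It follows joblib's documented rule that a negative value means n_cpus + 1 + n_jobs, so -1 means all CPUs and -2 all but one. It also assumes None means 1, which is scikit-learn's default outside a joblib backend context.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Hypothetical helper: mimic how joblib interprets an n_jobs value."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs is None:
        # Assumption: no joblib backend context is active, so None means 1.
        return 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all but one, ...; never fewer than 1 worker.
        return max(n_cpus + 1 + n_jobs, 1)
    # Positive values are used as given, even if larger than n_cpus.
    return n_jobs


print(resolve_n_jobs(-1, n_cpus=8))  # 8
print(resolve_n_jobs(-2, n_cpus=8))  # 7
```

On an 8-core machine, n_jobs=-1 therefore behaves like n_jobs=8. It means "all cores", not an automatically tuned "best" number of cores.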


2022-09-30 21:24

2 Answers

I don't know the details, but here is some relevant information.

http://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n-jobs-1-under-osx-or-linux

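In short, on macOS (and, per the FAQ, on Linux) with n_jobs>1, Python's multiprocessing starts worker processes with fork() but without a following exec(). Some libraries that manage their own internal thread pools (for example Accelerate/vecLib on macOS) are not fork-safe, so the worker processes can freeze or crash.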

For Python 3.4+, the FAQ suggests switching the start method to 'forkserver':

import multiprocessing

# other imports, custom code, load data, define model...

if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')

    # call scikit-learn utilities with n_jobs > 1 here
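set_start_method() changes the default for the whole program, and the multiprocessing docs say it should not be called more than once. A hedged alternative for your own pools is multiprocessing.get_context(), which selects a start method for a single pool only. The sketch below prefers 'forkserver', which is not available on Windows, and falls back to 'spawn'. pick_start_method is a hypothetical helper. This only affects pools you create yourself. It does not by itself change how scikit-learn or joblib start their workers.

```python
import math
import multiprocessing


def pick_start_method(preferred=("forkserver", "spawn")):
    """Hypothetical helper: return the first preferred start method available here."""
    available = multiprocessing.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    # None of the preferred methods exist: use the platform default.
    return multiprocessing.get_start_method()


if __name__ == "__main__":
    # A context object applies the start method to this pool only,
    # instead of changing the global default via set_start_method().
    ctx = multiprocessing.get_context(pick_start_method())
    with ctx.Pool(processes=2) as pool:
        # math.factorial is a built-in, so it is pickled by name and the
        # workers do not need anything defined in this script.
        print(pool.map(math.factorial, range(6)))  # [1, 1, 2, 6, 24, 120]
```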

Reportedly, this fixes the problem.



For reference, here are the actual times I measured when running predict on 10,000 samples with a trained RandomForestClassifier model.
Parallel processing slows down significantly when other jobs are running at the same time.

  • n_jobs=8: 1303.20 sec
  • n_jobs=1: 148.83 sec
  • n_jobs=1 (using Parallel(n_jobs=8)): 42.75 sec
  • n_jobs=1 (using Parallel(n_jobs=8), with other jobs running): 475.95 sec
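The answer does not show how Parallel(n_jobs=8) was combined with a model set to n_jobs=1. One plausible setup keeps the forest single-threaded and lets joblib's Parallel/delayed run predict on 8 row chunks at once. Everything specific here is an illustrative assumption, not the answerer's code: the synthetic data from make_classification, the chunking with np.array_split, the forest size, and the helper name predict_in_chunks.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption: the original data set is not shown).
X, y = make_classification(n_samples=10000, n_features=20, random_state=0)

# The model itself runs single-threaded (n_jobs=1); parallelism comes from Parallel below.
clf = RandomForestClassifier(n_estimators=20, n_jobs=1, random_state=0)
clf.fit(X, y)


def predict_in_chunks(model, data, n_chunks=8):
    """Hypothetical helper: split rows into n_chunks pieces and predict each in its own worker."""
    chunks = np.array_split(data, n_chunks)
    parts = Parallel(n_jobs=n_chunks)(delayed(model.predict)(chunk) for chunk in chunks)
    # Chunks keep their original row order, so concatenation restores the full result.
    return np.concatenate(parts)


preds = predict_in_chunks(clf, X)
print(preds.shape)  # (10000,)
```

Each worker predicts only its own slice of rows, so the concatenated result matches a single clf.predict(X) call. The model still has to be sent to every worker, and, as the last timing shows, other processes competing for the same cores can wipe out the gain.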





© 2024 OneMinuteCode. All rights reserved.