About n_jobs in sklearn

I tried running a grid search over sklearn's random forest with the code below. I read that setting n_jobs to -1 runs the computation in parallel on all available cores, so I set it that way. With n_jobs=1, it finishes in a few seconds. Memory and CPU utilization are close to 100% while it runs.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
tuned_parameters = [{'n_estimators': [5, 10, 30, 50, 100], 'max_features': ['sqrt', 'log2', None]}]
clf = GridSearchCV(RandomForestClassifier(), tuned_parameters, cv=2, scoring='accuracy', n_jobs=-1)
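
For reference, here is a minimal sketch of how a negative n_jobs value maps to a worker count, following the rule in the joblib documentation (n_cpus + 1 + n_jobs for negative values). The helper name resolve_n_jobs and its n_cpus parameter are hypothetical and only for illustration; this is not joblib's actual code.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Hypothetical helper mirroring joblib's documented rule for n_jobs."""
    if n_cpus is None:
        # os.cpu_count() can return None, so fall back to a single CPU.
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # -1 means all CPUs, -2 means all but one, and so on; at least 1 worker.
        return max(n_cpus + 1 + n_jobs, 1)
    # Positive values are taken as the requested worker count.
    return n_jobs


print(resolve_n_jobs(-1, n_cpus=8))  # 8
print(resolve_n_jobs(-2, n_cpus=8))  # 7
print(resolve_n_jobs(1, n_cpus=8))   # 1
```

So on an 8-core machine, n_jobs=-1 asks for 8 workers, which matches the n_jobs=8 setting that appears in the timings further down.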

python scikit-learn

2022-09-30 21:24

2 Answers

I don't know the details, but here is some relevant information.

http://scikit-learn.org/stable/faq.html#why-do-i-sometime-get-a-crash-freeze-with-n-jobs-1-under-osx-or-linux

In short, on a Mac, when n_jobs > 1, Python's multiprocessing forks worker processes in a way that is not fork-safe with some libraries, so the FAQ treats the resulting crash or freeze as a known issue.

For Python 3.4+,

import multiprocessing

# other imports, custom code, load data, define model...

if __name__ == '__main__':
    multiprocessing.set_start_method('forkserver')

    # call scikit-learn utilities with n_jobs > 1 here

According to the FAQ, doing it this way should work.
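
One caveat with the snippet above: 'forkserver' is not available on every platform (for example, it is not offered on Windows), and set_start_method raises an error for an unsupported method. A minimal sketch of a guard, assuming that falling back to 'spawn' is acceptable; the helper name pick_start_method is hypothetical:

```python
import multiprocessing


def pick_start_method(preferred="forkserver"):
    """Hypothetical helper: return `preferred` if this platform supports it, else 'spawn'."""
    available = multiprocessing.get_all_start_methods()
    return preferred if preferred in available else "spawn"


if __name__ == "__main__":
    method = pick_start_method()
    print("available:", multiprocessing.get_all_start_methods())
    print("using:", method)
    # force=True assumes it is acceptable to replace a start method that was already set.
    multiprocessing.set_start_method(method, force=True)

    # call scikit-learn utilities with n_jobs > 1 here
```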

2022-09-30 21:24

For reference, here are the times I actually measured when calling predict on 10,000 samples with a trained RandomForestClassifier.
Parallel processing slows down significantly when other jobs are running at the same time.

n_jobs=8: 1303.20 sec
n_jobs=1: 148.83 sec
n_jobs=1 (using Parallel(n_jobs=8)): 42.75 sec
n_jobs=1 (using Parallel(n_jobs=8) + running other jobs): 475.95 sec
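
The code behind these timings is not shown, so the following is only a guess at what "n_jobs=1 (using Parallel(n_jobs=8))" could look like: the model itself runs with n_jobs=1, and the rows to predict are split into chunks that joblib's Parallel hands to separate workers. The helper name chunked_predict, the synthetic data, and the chunk count are all assumptions for illustration.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def chunked_predict(model, X, n_jobs=8, n_chunks=8):
    """Hypothetical helper: predict row chunks in separate joblib workers."""
    # Assumes X has at least n_chunks rows, so that no chunk is empty.
    chunks = np.array_split(X, n_chunks)
    parts = Parallel(n_jobs=n_jobs)(delayed(model.predict)(c) for c in chunks)
    # Parallel returns results in input order, so concatenation keeps row order.
    return np.concatenate(parts)


if __name__ == "__main__":
    # Synthetic stand-in for the 10,000 samples mentioned above.
    X, y = make_classification(n_samples=10000, n_features=20, random_state=0)
    clf = RandomForestClassifier(n_estimators=50, n_jobs=1, random_state=0).fit(X, y)
    pred = chunked_predict(clf, X, n_jobs=8, n_chunks=8)
    print(pred.shape)
```

Timings will depend heavily on the machine, the data, and whatever else is running, so this sketch is not expected to reproduce the numbers above.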

2022-09-30 21:24
