About Errors in String Data That Can't Be Converted to Python Float

Thank you for your help.In the process of estimating the annual income of job offers
Learning パラ I have a question because I failed to adjust the parameters.

It generates these characteristics and

After learning, I adjusted the parameters in the random forest as follows.

X=train_data [feature_cols]
y=train_data [target_col]
X_train, X_valid, y_train, y_valid=train_test_split(X,y,test_size=0.3, random_state=1234)

from sklearn.model_selection import GridSearchCV

rf = RandomForestRegressor (random_state=1234)
params={"n_estimators": [700,720,740,760,780,800], "max_depth": [3,4,5,6,7]}
gscv = GridSearchCV(rf, param_grid = params, verbose = 1, cv = 3,
                    scoring='neg_mean_squared_error', n_jobs=-1)
gscv.fit(X_train,y_train)

So, which of the 'company id' is affected by the 'c1070dcb' string data that cannot be converted to float?
"I checked, but there was nothing like that, so I didn't know how to deal with it, so I came here."

Is there a problem with the company_id itself because everything should be dummy variable?

python

2022-09-30 14:51

1 Answers

First of all, the reason why company_id cannot float is because it is not data that can float.In fact, if you look at the raw data, everything contains a string like Cd346dca + hexadecimal notation, which cannot be float instead of decimal.

In the first place, this data is an ID, as the name suggests, like the name of the line.If it is used for machine learning, it would be inappropriate to use it as a feature amount.I think it would be better to just not use this line.Alternatively, some libraries can treat specific columns as columns (index columns) representing row names, so you can use them.

2022-09-30 14:51

If you have any answers or tips

Popular Tags

python x 4647

android x 1593

java x 1494

javascript x 1427

c x 927

c++ x 878

ruby-on-rails x 696

php x 692

python3 x 685

html x 656