About Errors in String Data That Can't Be Converted to Python Float

Asked 2 years ago, Updated 2 years ago, 18 views

Thank you for your help.In the process of estimating the annual income of job offers
Learning パラ I have a question because I failed to adjust the parameters.

It generates these characteristics and characteristic

After learning, I adjusted the parameters in the random forest as follows.

X=train_data [feature_cols]
y=train_data [target_col]
X_train, X_valid, y_train, y_valid=train_test_split(X,y,test_size=0.3, random_state=1234)

from sklearn.model_selection import GridSearchCV

rf = RandomForestRegressor (random_state=1234)
params={"n_estimators": [700,720,740,760,780,800], "max_depth": [3,4,5,6,7]}
gscv = GridSearchCV(rf, param_grid = params, verbose = 1, cv = 3,
                    scoring='neg_mean_squared_error', n_jobs=-1)
gscv.fit(X_train,y_train)

Error Contents

So, which of the 'company id' is affected by the 'c1070dcb' string data that cannot be converted to float?
"I checked, but there was nothing like that, so I didn't know how to deal with it, so I came here."

Required Data Column

Is there a problem with the company_id itself because everything should be dummy variable?

python

2022-09-30 14:51

1 Answers

First of all, the reason why company_id cannot float is because it is not data that can float.In fact, if you look at the raw data, everything contains a string like Cd346dca + hexadecimal notation, which cannot be float instead of decimal.

In the first place, this data is an ID, as the name suggests, like the name of the line.If it is used for machine learning, it would be inappropriate to use it as a feature amount.I think it would be better to just not use this line.Alternatively, some libraries can treat specific columns as columns (index columns) representing row names, so you can use them.


2022-09-30 14:51

If you have any answers or tips


© 2024 OneMinuteCode. All rights reserved.