Thank you for your help.In the process of estimating the annual income of job offers
Learning パラ I have a question because I failed to adjust the parameters.
It generates these characteristics and
After learning, I adjusted the parameters in the random forest as follows.
X=train_data [feature_cols]
y=train_data [target_col]
X_train, X_valid, y_train, y_valid=train_test_split(X,y,test_size=0.3, random_state=1234)
from sklearn.model_selection import GridSearchCV
rf = RandomForestRegressor (random_state=1234)
params={"n_estimators": [700,720,740,760,780,800], "max_depth": [3,4,5,6,7]}
gscv = GridSearchCV(rf, param_grid = params, verbose = 1, cv = 3,
scoring='neg_mean_squared_error', n_jobs=-1)
gscv.fit(X_train,y_train)
So, which of the 'company id' is affected by the 'c1070dcb' string data that cannot be converted to float?
"I checked, but there was nothing like that, so I didn't know how to deal with it, so I came here."
Is there a problem with the company_id itself because everything should be dummy variable?
python
First of all, the reason why company_id cannot float is because it is not data that can float.In fact, if you look at the raw data, everything contains a string like Cd346dca
+ hexadecimal notation, which cannot be float instead of decimal.
In the first place, this data is an ID, as the name suggests, like the name of the line.If it is used for machine learning, it would be inappropriate to use it as a feature amount.I think it would be better to just not use this line.Alternatively, some libraries can treat specific columns as columns (index columns) representing row names, so you can use them.
© 2024 OneMinuteCode. All rights reserved.