import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

mushroom = pd.read_csv("../data/mushroom.csv", header=None)

# Column 0 is the class label: "p" (poisonous) -> 1.0, "e" (edible) -> 0.0
mushroom[0] = mushroom[0].replace("p", float(1))
mushroom[0] = mushroom[0].replace("e", float(0))

# One-hot encode column 0; all other columns are passed through unchanged
ct = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(categories='auto'), [0])],
    remainder='passthrough'
)
feat = mushroom.iloc[1:]
mush = ct.fit_transform(feat)
y = mushroom[0] == 1
X_train, X_test, y_train, y_test = train_test_split(mush, y, random_state=0)
I'm trying to predict whether mushrooms are edible or poisonous. Since the data is all strings, I converted only the edible/poisonous label to float (1 for poisonous, 0 for edible) and handled the rest of the features with a one-hot encoder. When I try to split the data into a training set and a test set, I get the following error:
ValueError: Found input variables with inconsistent numbers of samples: [8123, 8124]
Suspecting that the number of samples in the data did not match the number of labels, I checked the shapes:
mush.shape
(8123, 24)
y.shape
(8124,)
I found that the data is one row shorter after passing through the one-hot encoder.
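For reference, the length mismatch alone is enough to trigger this error: train_test_split checks that all inputs have the same number of samples before it splits anything. A minimal reproduction, using hypothetical arrays with 3 rows of features but 4 labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical arrays: 3 feature rows but 4 labels, mimicking (8123, 24) vs (8124,)
X = np.zeros((3, 2))
y = np.array([0, 1, 0, 1])

message = ""
try:
    train_test_split(X, y, random_state=0)
except ValueError as err:
    # Raised before any splitting happens, because the sample counts differ
    message = str(err)

print(message)
```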
python scikit-learn
>>> import pandas as pd
>>> df = pd.DataFrame({"A":[1,2,1,1,1,1], "B":[33,24,52,66,22,111]})
>>> df
   A    B
0  1   33
1  2   24
2  1   52
3  1   66
4  1   22
5  1  111
>>> df.shape
(6, 2)
>>> f = df.iloc[1:]
>>> f
   A    B
1  2   24
2  1   52
3  1   66
4  1   22
5  1  111
>>> y = df['A'] == 1
>>> y
0     True
1    False
2     True
3     True
4     True
5     True
Name: A, dtype: bool
>>> f.shape
(5, 2)
>>> y.shape
(6,)
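This is the expected result: y is built from every row of df, while df.iloc[1:] starts from the second row, so f has one row fewer.
The example above uses a very simple DataFrame to show this.
As you can see, no row is lost during one-hot encoding; the first row is already dropped by iloc[1:] before the encoder runs. What you probably wanted is mushroom.iloc[:, 1:], which keeps every row and drops only the first column (the label).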
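The claim that one-hot encoding keeps every row can also be checked directly. A minimal sketch on a hypothetical three-row frame laid out like the mushroom data (label in column 0, categorical features after it):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical mini data set: column 0 is the label, columns 1 and 2 are categorical features
df = pd.DataFrame([["p", "x", "s"],
                   ["e", "b", "y"],
                   ["e", "x", "s"]])

# All rows, feature columns only: the encoder returns one output row per input row
encoded_features = OneHotEncoder().fit_transform(df.iloc[:, 1:])

# Rows from index 1 onward, all columns: the first row is gone before encoding starts
encoded_tail = OneHotEncoder().fit_transform(df.iloc[1:])

print(len(df), encoded_features.shape[0], encoded_tail.shape[0])  # 3 3 2
```

The row count only changes in the second case, and it changes because of the iloc[1:] slice, not because of the encoder.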
>>> f1 = df.iloc[:,1:]
>>> f1
     B
0   33
1   24
2   52
3   66
4   22
5  111
>>> f1.shape
(6, 1)
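Putting this together, a minimal end-to-end sketch of the corrected flow, assuming column 0 holds the "p"/"e" label as in the question. The eight-row DataFrame is hypothetical and stands in for the CSV file; the label is taken from every row, and only the feature columns are one-hot encoded:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for pd.read_csv("../data/mushroom.csv", header=None):
# column 0 is the class label ("p"/"e"), the other columns are categorical features
mushroom = pd.DataFrame([
    ["p", "x", "s", "n"],
    ["e", "x", "s", "y"],
    ["e", "b", "s", "w"],
    ["p", "x", "y", "w"],
    ["e", "x", "s", "g"],
    ["e", "b", "y", "n"],
    ["p", "b", "s", "w"],
    ["e", "x", "y", "y"],
])

# Label from every row: poisonous -> 1.0, edible -> 0.0
y = mushroom[0].map({"p": 1.0, "e": 0.0})

# Features from every row: all columns except the label
features = mushroom.iloc[:, 1:]

# One-hot encode every feature column; the integers are positions within `features`
ct = ColumnTransformer(
    [("one_hot_encoder", OneHotEncoder(), list(range(features.shape[1])))]
)
X = ct.fit_transform(features)

# Row counts now agree, so train_test_split no longer raises
assert X.shape[0] == len(y)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape)
```

ColumnTransformer is kept only to mirror the question's code; since every feature column is encoded, OneHotEncoder().fit_transform(features) on its own would give the same encoding (possibly as a sparse matrix). Series.map also turns any label missing from the dictionary into NaN, which makes unexpected values easy to spot.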
© 2024 OneMinuteCode. All rights reserved.