When building a decision tree in Python, we need to compute the weighted entropy (Weighted_Entropy).
In the process, I couldn't understand the code that computes this entropy value, so I'm asking a question.
Weighted_Entropy = np.sum([(counts[i] / np.sum(counts))
                           * entropy(data.where(data[feature] == vals[i]).dropna()[class])
                           for i in range(len(vals))])
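(For reference, entropy itself isn't shown above. I assume it's an ordinary Shannon entropy over the class counts, roughly like the sketch below; the function body and the log base 2 are my assumptions, not code from the tutorial.)
import numpy as np

def entropy(column):
    # Assumed behavior: Shannon entropy (in bits) of a pandas Series
    # of class labels
    probs = column.value_counts(normalize=True).to_numpy()
    return -np.sum(probs * np.log2(probs))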
The concept I understood for this entropy calculation is: among the rows where feature equals vals[i], you take the probability of each class value and compute H = -sum(p * log2(p)) over the classes.
I think this understanding is right, but I don't know why dropna() was used here, and I don't understand syntactically what it means to put [class] right after it; that part of the code confuses me. If my understanding is wrong, I'd also welcome being told so.
If there's anyone who knows this well, please explain it!
python machine-learning
In data.where(data[feature] == vals[i]), the data.where function keeps the rows of data where the condition is true and turns the rows where the condition is false into NaN.
That's why .dropna() was used: it removes the NaN rows from the DataFrame returned by data.where(data[feature] == vals[i]). The result still contains every column, so the [class] indexing at the end picks out just the class column and returns only that column.
You'll understand if you look at the code and its output below.
# -*- coding: utf-8 -*-
import pandas as pd

# Small demo DataFrame: 4 rows, 3 columns
A = [[101, 'a', 'z'], [102, 'b', 'y'], [103, 'c', 'x'], [104, 'd', 'w']]
data = pd.DataFrame(A)
data.columns = ['number', 'class', 'school']

print('=========')
print(data)                                                 # the full DataFrame
print('=========')
print(data.where(data['number']==101))                      # non-matching rows become NaN
print('=========')
print(data.where(data['number']==101).dropna())             # NaN rows removed
print('=========')
print(data.where(data['number']==101).dropna()['class'])    # only the 'class' column
print('=========')
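To tie this back to the original question, here is a minimal sketch of the whole weighted-entropy computation on a toy DataFrame. The entropy helper and the column names ('weather' as the feature, 'class' as the target) are my own assumptions for illustration, not the asker's actual data.
import numpy as np
import pandas as pd

def entropy(column):
    # Shannon entropy (in bits) of a Series of class labels
    probs = column.value_counts(normalize=True).to_numpy()
    return -np.sum(probs * np.log2(probs))

# Toy data: 'weather' is the feature we split on, 'class' is the target
data = pd.DataFrame({'weather': ['sunny', 'sunny', 'rainy', 'rainy', 'rainy'],
                     'class':   ['yes',   'no',    'yes',   'yes',   'no']})

feature = 'weather'
vals, counts = np.unique(data[feature], return_counts=True)

# For each feature value: weight = fraction of rows with that value,
# entropy = entropy of the 'class' column restricted to those rows
Weighted_Entropy = np.sum([(counts[i] / np.sum(counts))
                           * entropy(data.where(data[feature] == vals[i]).dropna()['class'])
                           for i in range(len(vals))])
print(Weighted_Entropy)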