Writing a custom loss function in Chainer

Asked 2 years ago, Updated 2 years ago, 136 views

I would like to use a Bernoulli-style cross entropy as a loss function.

The first argument is the true probability, and the second is the predicted probability for each label, obtained with softmax. I want to try this cross entropy.

The documentation describes this function:
http://docs.chainer.org/en/stable/reference/generated/chainer.functions.bernoulli_nll.html

It uses a sigmoid function, so I can't use it because I want to use softmax. There is also:
http://docs.chainer.org/en/stable/reference/generated/chainer.functions.softmax_cross_entropy.html
This one takes labels for the calculation, so I can't use it either because I want to handle the probabilities themselves.

Thank you for your cooperation

python python3 python2

2022-09-30 17:46

1 Answer

from chainer import functions as F

# t: target probabilities, y: raw network outputs (logits), both shaped (batch, classes)
crossEntropy = -F.sum(t * F.log_softmax(y))
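To check what that one-liner computes, here is a minimal framework-independent sketch in plain Python. The function names and the sample values are hypothetical and not part of Chainer; the code only mirrors the formula, summing -t * log softmax(y) over every class and every sample.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of one row of raw scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]

def soft_cross_entropy(t_batch, y_batch):
    """Sum over the batch of -sum_k t_k * log softmax(y)_k."""
    total = 0.0
    for t_row, y_row in zip(t_batch, y_batch):
        log_q = log_softmax(y_row)
        total -= sum(t * lq for t, lq in zip(t_row, log_q))
    return total

# Hypothetical data: two samples, three classes.
t = [[0.7, 0.2, 0.1], [0.0, 1.0, 0.0]]  # target probabilities, each row sums to 1
y = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]  # raw network outputs (logits)
loss = soft_cross_entropy(t, y)
```

Note that the result is a sum over the batch. If a per-sample average is wanted, divide by the number of rows.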

If you want to minimize the distance between probability distributions, I think you should use the KL divergence. Please consult with your mentor.

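# Entropy of the target distribution; zero entries are skipped because 0 * log(0) is taken as 0
entropy = -F.sum(t[t.data.nonzero()] * F.log(t[t.data.nonzero()]))
# KL divergence = cross entropy - entropy, averaged over the batch
klDiversity = (crossEntropy - entropy) / y.shape[0]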
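The identity behind those two lines, KL(t || q) = H(t, q) - H(t), can be checked on a small example. This is a plain-Python sketch for a single sample with hypothetical values. The helper names are made up for illustration, and q stands for the softmax output.

```python
import math

def cross_entropy(t, q):
    """H(t, q) = -sum_k t_k * log q_k, skipping entries where t_k is 0."""
    return -sum(ti * math.log(qi) for ti, qi in zip(t, q) if ti > 0)

def entropy(t):
    """H(t) = -sum_k t_k * log t_k, with 0 * log(0) taken as 0."""
    return -sum(ti * math.log(ti) for ti in t if ti > 0)

def kl_divergence(t, q):
    """KL(t || q) = sum_k t_k * log(t_k / q_k), computed directly."""
    return sum(ti * math.log(ti / qi) for ti, qi in zip(t, q) if ti > 0)

t = [0.5, 0.5, 0.0]    # target distribution (contains a zero entry)
q = [0.25, 0.25, 0.5]  # predicted distribution

via_identity = cross_entropy(t, q) - entropy(t)
direct = kl_divergence(t, q)
```

Both routes give the same number. When t is a one-hot vector its entropy is 0, so the KL divergence and the cross entropy coincide.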

Note:
I don't know why this was downvoted, but it seems someone still has this question, so I'll leave a note.
Which distance between probability distributions is appropriate depends on how certain the target distribution is.

In an ordinary classification task where the labels are known, the one-hot vector is the true probability distribution, whereas the estimated label distribution is the uncertain one. In such cases, it is appropriate to compute the KL divergence of the estimated distribution from the true distribution.

The code for the KL divergence is listed above, so please refer to it.

On the other hand, the KL divergence is a poor choice when you want to bring two estimated distributions closer to each other. It is asymmetric: the divergence from distribution A to distribution B differs from the divergence from B to A.
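The asymmetry is easy to see numerically. The following plain-Python sketch uses two made-up distributions. The values are only illustrative.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) = sum_k p_k * log(p_k / q_k), skipping entries where p_k is 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

a = [0.9, 0.1]  # hypothetical distribution A
b = [0.5, 0.5]  # hypothetical distribution B

kl_ab = kl_divergence(a, b)  # roughly 0.37
kl_ba = kl_divergence(b, a)  # roughly 0.51
```

Swapping the arguments changes the result, so neither direction can be treated as "the" distance between A and B.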

In such cases, the earth mover's distance (Wasserstein distance) is commonly used. I think you will find code if you search for "Chainer" and "WGAN".



2022-09-30 17:46



© 2024 OneMinuteCode. All rights reserved.