I have a question about the implementation of the batch Affine layer in Python (reference: Deep Learning from Scratch, pp. 150-152).
It concerns how the bias is handled in the batch version of the Affine layer, specifically the following explanation from the book and the accompanying source code.
"Be careful when adding the bias. In forward propagation, the bias is added to each data item (the first, the second, ...). Therefore, in backpropagation, the backpropagated values from each data item must be accumulated into the corresponding elements of the bias."
db = np.sum(dY, axis=0)
I don't understand why the sum of the dY values across the batch is computed here.
In the book's computational graph for backpropagation, the "+" node is described as passing the upstream gradient to the downstream node unchanged. So I thought it would be natural to pass dY on to the downstream node as it is.
Why does the batch version instead pass the sum of dY over the N rows of data to the downstream node?
I would appreciate an explanation. Thank you in advance.
In the batch case, when the forward pass computes np.dot(x, W) + b, the bias b is automatically broadcast to match the batch size before it is added.
As a result, the input to the "+" node is not b itself, but b replicated N times (once per data item).
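Here is a minimal sketch (my own illustration, not the book's code) of what that broadcasting does with a batch of two samples; the array values are made up for demonstration:

import numpy as np

# Two data items (N = 2); b is broadcast so the same bias row is
# added to every row of np.dot(x, W).
x = np.array([[1.0, 2.0],
              [3.0, 4.0]])        # shape (2, 2)
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])   # shape (2, 3)
b = np.array([10.0, 20.0, 30.0])  # shape (3,)

Y = np.dot(x, W) + b  # b is implicitly expanded to shape (2, 3)

# The value that actually enters the "+" node is this expanded copy of b:
b_expanded = np.broadcast_to(b, Y.shape)
print(b_expanded)
# [[10. 20. 30.]
#  [10. 20. 30.]]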
You can also view the bias as an ordinary weight: the bias term has a constant input of 1 and weight b, so the forward pass effectively multiplies a length-N vector of ones by b. Seen that way, the backpropagation for db should take the same form as for dW.
Since dW is computed as dW = np.dot(x.T, dY), the corresponding calculation for the bias is db = np.dot(ones, dY), where ones is a vector of ones of length N (the batch size). That is exactly
self.db = np.sum(dY, axis=0)
Why not think of it that way?
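As a small check (again my own illustration, not the book's code, with an arbitrary dY), treating the bias input as a vector of ones gives the same result as the sum over the batch axis:

import numpy as np

N = 4                                # batch size
dY = np.arange(12.0).reshape(N, 3)   # some upstream gradient, shape (N, 3)

ones = np.ones(N)                    # the constant "input" feeding the bias

# Same pattern as dW = np.dot(x.T, dY), with x replaced by the ones vector:
db_dot = np.dot(ones, dY)            # shape (3,)
db_sum = np.sum(dY, axis=0)          # what the book's layer computes

print(np.allclose(db_dot, db_sum))   # True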
For your reference, the source code can be found at https://github.com/oreilly-japan/deep-learning-from-scratch/blob/master/common/layers.py.