About the parameter update code in Deep Learning from Scratch


This is regarding section 5.7.4, "Learning using the error backpropagation method," on page 162 of Deep Learning from Scratch by Yasuki Saito, specifically the parameter update part of the code below.

for key in ('W1', 'b1', 'W2', 'b2'):

    network.params[key] -= learning_rate * grads[key]

My question is about the line

network.params[key] -= learning_rate * grads[key]

If this line is rewritten with a plain assignment as

network.params[key] = network.params[key] - learning_rate * grads[key]

then learning no longer works and the accuracy stays constant.

From my own investigation, unlike with -=, the plain assignment repeatedly creates and updates new objects at addresses completely different from the params attribute of the original TwoLayerNet object.
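For example, the address behavior can be checked directly with id() on a plain NumPy array (a minimal check of my own, independent of the book's code):

import numpy as np

params = {'W1': np.array([1.0, 2.0, 3.0])}
grads = {'W1': np.array([0.1, 0.1, 0.1])}
learning_rate = 0.1

print(id(params['W1']))                        # original address
params['W1'] -= learning_rate * grads['W1']
print(id(params['W1']))                        # same address: updated in place

params['W1'] = params['W1'] - learning_rate * grads['W1']
print(id(params['W1']))                        # different address: a new array was bound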

But I'm stuck, because I can't even tell whether this line of thinking is misguided.
I would appreciate it if anyone who has already worked through this book could let me know.

Additional information

I thought about it further while considering the reply.

The layers always refer to the address of the net.params[...] arrays created when the TwoLayerNet was generated, so the code uses the augmented assignment operator to avoid changing that address.

I understand that much, but I don't understand why the layers always keep referring to the address of net.params[...].

I tested whether this could be reproduced with a simple program, but in this case, even though the plain assignment changed the address as expected, the instance attribute self.a of the test class still referred to that same new address when checked from inside the instance.

import numpy as np

class test:

    def __init__(self, b):
        self.a = b          # self.a refers to the same object as the argument

    def in_instance(self):
        print(id(self.a))   # address of self.a as seen from inside the instance


input_b = np.array([1, 2, 3])
added_c = np.array([2, 3, 4])

t = test(input_b)

print(id(t.a))              # address before the assignment
t.a = t.a + added_c         # plain assignment: binds t.a to a new array
# t.a += added_c            # augmented assignment: would keep the same address
print(id(t.a))              # different address after plain assignment

t.in_instance()             # the new address is also seen from inside the instance

What the book's code does differently, I think, is that it generates an instance of another class inside the class, unlike this test.
If class a (TwoLayerNet) generates an instance of class b (Affine or Relu) with one of class a's own attributes as an argument, that argument keeps referring to the address of the attribute that class a originally had (see the sketch below).
Therefore, if the address of the net.params[...] arrays created when generating the TwoLayerNet changes, the parameters held by the layers are not updated and the gradient stays constant.
Is this like a class version of a closure?
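For illustration, here is a minimal sketch with toy classes of my own (Inner and Outer are hypothetical stand-ins for Affine/Relu and TwoLayerNet, not the book's code):

import numpy as np

class Inner:                    # stands in for Affine/Relu
    def __init__(self, w):
        self.w = w              # keeps a reference to the array it was given

class Outer:                    # stands in for TwoLayerNet
    def __init__(self):
        self.params = {'W1': np.array([1.0, 2.0, 3.0])}
        self.inner = Inner(self.params['W1'])

net = Outer()

net.params['W1'] -= 1.0         # in place: the shared array is modified
print(net.inner.w)              # [0. 1. 2.] -- the inner object sees the update

net.params['W1'] = net.params['W1'] - 1.0   # rebinds the dict entry to a new array
print(net.inner.w)              # still [0. 1. 2.] -- the inner object holds the old array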

python

2022-09-30 17:30

1 Answer

From my own investigation, unlike with -=, the plain assignment repeatedly creates and updates new objects at addresses completely different from the params attribute of the original TwoLayerNet object.

I don't think it's wrong.

I looked into the difference between the augmented assignment

network.params[key] -= learning_rate * grads[key]

and the plain assignment

network.params[key] = network.params[key] - learning_rate * grads[key]
The result of the subtraction itself was the same in both cases.

However, with the plain assignment, the value of network.layers['Affine1'].b was different when the key was 'b1'. network.layers['Affine1'].b refers to the network.params['b1'] array (and likewise .W refers to network.params['W1']), but after a plain assignment it still refers to the array from before the assignment [the subtraction creates a new array, and the assignment rebinds the dict entry to it].

The pre-assignment array is never updated, so no matter how many iterations you run, network.layers['Affine1'].b is not updated either. The same goes for the other network.params[key] entries.

I think that is why it does not work as intended.

The code I investigated is here:
https://github.com/oreilly-japan/deep-learning-from-scratch/blob/master/ch05/
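This can also be checked directly with is. Below is a sketch that assumes the repository's two_layer_net.py (ch05) is importable; the class and attribute names follow that file:

# a sketch assuming two_layer_net.py from the repository above is on the path
from two_layer_net import TwoLayerNet

network = TwoLayerNet(input_size=784, hidden_size=50, output_size=10)

# the layer initially holds the very same array object as params
print(network.layers['Affine1'].b is network.params['b1'])   # True

network.params['b1'] -= 0.01                                 # in place
print(network.layers['Affine1'].b is network.params['b1'])   # still True

network.params['b1'] = network.params['b1'] - 0.01           # rebinds the entry
print(network.layers['Affine1'].b is network.params['b1'])   # False: stale reference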

[Additional note]

Is this like a class version of a closure?

I don't think closures are the right way to explain it. I think it's simpler than that.

In the following code, a += 1 changes a itself, and so b, which is the very same object as a, changes too. It is not that b was changed; b simply is the changed a.

import numpy as np

a = np.array([1, 2, 3])
print(a)   # [1 2 3]
b = a      # b refers to the same object as a
a += 1     # in place: the shared object is modified
print(b)   # [2 3 4]

In the following code, a = a + 1 does not change the object a referred to before the assignment. The b that holds a reference to the pre-assignment a does not change either, because b is the pre-assignment a itself.

import numpy as np

a = np.array([1, 2, 3])
print(a)   # [1 2 3]
b = a      # b refers to the same object as a
a = a + 1  # a is rebound to a new array; b still refers to the old one
print(b)   # [1 2 3]

In a = a + 1, the expression a + 1 creates a new object, distinct from the one a held before the operation.
The variable b, which references the object a held before the operation, is unaffected by that new object.
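The same point can be made explicit with is (a small additional check along the same lines):

import numpy as np

a = np.array([1, 2, 3])
b = a
print(b is a)   # True: one object, two names

a += 1          # modifies the shared object in place
print(b is a)   # True: still the same object

a = a + 1       # binds a to a newly created array
print(b is a)   # False: b still names the old object
print(b)        # [2 3 4] -- only the in-place change is visible through b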


2022-09-30 17:30


