Understanding How to Obtain Gradient Information in Chainer

Asked 2 years ago, Updated 2 years ago, 61 views

I have a question about Chainer.

I want to load a trained NN model and get the gradient information for each layer when I feed in an image, but I am stuck on how to implement it.

The network is defined as follows:

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import Variable


class Model1(chainer.Chain):

    def __init__(self, input_chs, n_outputs):
        super(Model1, self).__init__()
        with self.init_scope():
            self.I = []
            self.conv1 = L.Convolution2D(input_chs, 16, 5, stride=1, pad=0)
            self.conv1_hidden = []
            self.relu1_hidden = []
            self.maxpool1_hidden = []
            self.conv2 = L.Convolution2D(16, 32, 5, stride=1, pad=0)
            self.conv2_hidden = []
            self.relu2_hidden = []
            self.maxpool2_hidden = []
            self.conv3 = L.Convolution2D(32, 64, 5, stride=1, pad=0)
            self.conv3_hidden = []
            self.relu3_hidden = []
            self.l4 = L.Linear(1*1*64, 100)
            self.l4_hidden = []
            self.l5 = L.Linear(100, n_outputs, nobias=True)
            self.pred = []

    def __call__(self, X):

        # ->Input
        self.I=Variable(X)

        # ->14*14*16
        self.conv1_hidden=self.conv1(self.I)
        self.relu1_hidden=F.relu(self.conv1_hidden)
        self.maxpool1_hidden=F.max_pooling_2d(self.relu1_hidden, 2)

        # ->5*5*32
        self.conv2_hidden=self.conv2(self.maxpool1_hidden)
        self.relu2_hidden=F.relu(self.conv2_hidden)
        self.maxpool2_hidden=F.max_pooling_2d(self.relu2_hidden, 2)

        # ->1*1*64
        self.conv3_hidden=self.conv3(self.maxpool2_hidden)
        self.relu3_hidden=F.relu(self.conv3_hidden)

        # ->100
        self.l4_hidden=self.l4(self.relu3_hidden)
        self.relu4_hidden=F.relu(self.l4_hidden)

        # ->n_outputs
        self.pred=self.l5(self.relu4_hidden)

        return self.pred

However, when I load the model in main and try to retrieve the gradient information as follows, None is printed.

def main():

    # Abbreviated
    # (Load the Model1 model into a variable called model, and store the loss from feeding in the image in loss.)

    loss.backward(retain_grad=True)
    print(model.I.grad)

Is it that grad is simply not stored when the model is defined as a class like this? If so, how can I get the gradient information for each layer?
Thank you in advance.

python chainer

2022-09-30 19:53

3 Answers

In short, use chainer.grad (see Example 3 for a use case).
Here are the details.

Q71.MSK's answer is incorrect. First, you can normally get the gradient of the input after Variable.backward().

Example 1

import chainer
from chainer import Variable
from chainer import functions as F

import numpy as np


inp = Variable(np.array([1., 2., 3.]))
intermediate = inp * 3
out = F.sum(intermediate)

out.backward()

print(inp.grad)

Output 1

[3. 3. 3.]

In this way, you can take the gradient of the input value.
However, as in your question, you cannot take the gradient of intermediate features this way.

Example 2

import chainer
from chainer import Variable
from chainer import functions as F

import numpy as np


inp = Variable(np.array([1., 2., 3.]))
intermediate = inp * 3
out = F.sum(intermediate)

out.backward()

print(intermediate.grad)

Output 2

None

This is intentional behavior aimed at keeping the computational graph memory-efficient: to save (GPU) memory, Variable.backward() does not keep the grad of anything other than leaf nodes.
Therefore, the correct code is as follows.

Example 3

import chainer
from chainer import Variable
from chainer import functions as F

import numpy as np


inp = Variable(np.array([1., 2., 3.]))
intermediate = inp * 3
out = F.sum(intermediate)

intermediate_grad = chainer.grad((out,), (intermediate,))[0]
print(intermediate_grad)

Output 3

variable([1. 1. 1.])
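Applied to the model in the question, the same idea would look roughly like the sketch below (untested; it reuses the imports from Example 3, and x, t and softmax_cross_entropy are placeholders for whatever main() actually uses).

# Sketch only: chainer.grad applied to the questioner's model.
# `x` and `t` are placeholder input/label arrays, and softmax_cross_entropy
# is just an assumed loss; substitute whatever main() really uses.
pred = model(x)                           # the forward pass fills the *_hidden attributes
loss = F.softmax_cross_entropy(pred, t)
grads = chainer.grad(
    (loss,),
    (model.I, model.conv1_hidden, model.conv2_hidden, model.conv3_hidden),
)
for name, g in zip(("I", "conv1_hidden", "conv2_hidden", "conv3_hidden"), grads):
    print(name, g.shape)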


2022-09-30 19:53

Sorry, I made a mistake. The questioner wants to take the gradient with respect to the input value. As I understand it, gradients are computed in order to update values, that is, to learn. So, do we learn the input value? Of course not, so I don't think that gradient will be computed. What do you think?
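For what it's worth, the gradients that are used for learning are the ones attached to the link parameters, and those are available after backward() without any extra options. A minimal sketch, assuming model and loss are set up as in the questioner's main():

model.cleargrads()                 # clear any old parameter gradients
loss.backward()
print(model.conv1.W.grad.shape)    # gradient w.r.t. conv1's weights
print(model.l5.W.grad.shape)       # gradient w.r.t. l5's weights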

Second, the gradients of the layers have already been taken. They should be in the grad of the variables with the _hidden suffix.
In addition, even if the forward pass is written without keeping the intermediate results, as in

y = relu(conv2d(y))
y = relu(conv2d(y))
return y

you can still walk back to the previous results by following creator.inputs from the last output. However, there is a known problem with this when certain conditions are met (Chainer v3 or later combined with certain functions), which is registered as issue4220. As a workaround, keep a reference to the intermediate results somewhere, as the questioner does.
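A rough sketch of that traversal is below (the attribute names are the Chainer v3+ ones, so treat it as an illustration rather than a drop-in snippet):

import numpy as np
from chainer import Variable
from chainer import functions as F

inp = Variable(np.array([1., 2., 3.], dtype=np.float32))
out = F.sum(F.relu(inp * 3))       # intermediate results are not kept in variables here

node = out.creator_node            # FunctionNode that produced `out`
while node is not None:
    # print each function and the shapes of its inputs while walking backwards
    print(node.label, [x.shape for x in node.inputs])
    node = node.inputs[0].creator_node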


2022-09-30 19:53

To Q71.MSK: it would have been better to open a separate topic for this, but I'll answer here since it would be a hassle to wait for one.

Example 4

from chainer import Chain, Variable
from chainer import functions as F
from chainer import links as L

import numpy as np


class Model1(Chain):
    def __init__(self, input_chs, n_outputs):
        super().__init__()
        with self.init_scope():
            self.conv1 = L.Convolution2D(input_chs, 16, 5, stride=1, pad=0)
            self.conv2 = L.Convolution2D(16, 32, 5, stride=1, pad=0)
            self.conv3 = L.Convolution2D(32, 64, 5, stride=1, pad=0)
            self.l4 = L.Linear(1*1*64, 100)
            self.l5 = L.Linear(100, n_outputs, nobias=True)

    def __call__(self, X):
        # ->Input
        self.I=Variable(X)

        # ->14*14*16
        self.conv1_hidden=self.conv1(self.I)
        self.relu1_hidden=F.relu(self.conv1_hidden)
        self.maxpool1_hidden=F.max_pooling_2d(self.relu1_hidden, 2)

        # ->5*5*32
        self.conv2_hidden=self.conv2(self.maxpool1_hidden)
        self.relu2_hidden=F.relu(self.conv2_hidden)
        self.maxpool2_hidden=F.max_pooling_2d(self.relu2_hidden, 2)

        # ->1*1*64
        self.conv3_hidden=self.conv3(self.maxpool2_hidden)
        self.relu3_hidden=F.relu(self.conv3_hidden)

        # ->100
        self.l4_hidden=self.l4(self.relu3_hidden)
        self.relu4_hidden=F.relu(self.l4_hidden)

        # ->n_outputs
        self.pred=self.l5(self.relu4_hidden)

        return self.pred


model = Model1(3, 1)
inp = np.arange(1*3*30*30, dtype=np.float32).reshape(1, 3, 30, 30)
model_pred = model(inp)
out = F.sum(model_pred)

out.backward(retain_grad=True)
print(type(model.pred.grad))
print(type(model.I.grad))

Output 4

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>

backward(retain_grad=True) also works correctly in the questioner's code.

The behavior in the issue you mention also comes down to memory efficiency.

To put it very briefly, even with backward(retain_grad=True), a variable whose lifetime has expired, that is, one that is no longer referenced by anyone, does not keep its grad.

[Additional note]
Basically, you do not need to worry about Variable lifetimes.
If you want to access the temporary variables of already-implemented functions for debugging purposes, you can retrieve their grad with the method described above.
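As a small illustration of that workaround, using the same toy graph as Examples 1-3: as long as the intermediate Variable is still referenced from somewhere, backward(retain_grad=True) keeps its grad.

import numpy as np
from chainer import Variable
from chainer import functions as F

inp = Variable(np.array([1., 2., 3.], dtype=np.float32))
intermediate = inp * 3             # keeping this reference is the important part
out = F.sum(intermediate)

out.backward(retain_grad=True)
print(intermediate.grad)           # [1. 1. 1.] instead of None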


2022-09-30 19:53


