Back Propagation

ComputerVision Lecture 9-1


Model Fitting #

$$ L(w) = \lambda ||w||^2_2 + \sum_{i=1}^n(y_i - w^Tx_i)^2 $$
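
As a quick sketch, the loss above can be written directly in NumPy; the names `ridge_loss`, `X`, `y`, and `lam` are illustrative, not from the lecture.

```python
import numpy as np

# Minimal sketch of the regularized squared loss L(w) above (names are assumptions).
def ridge_loss(w, X, y, lam):
    residual = y - X @ w                      # y_i - w^T x_i for every sample
    return lam * np.dot(w, w) + np.sum(residual ** 2)
```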

Gradient Descent #

$$ w_{i+1} = w_i - \alpha \nabla_wL(f(w_i)) $$
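
A minimal sketch of this update rule applied to the loss above, using its analytic gradient; the step size, iteration count, and function names are assumptions.

```python
import numpy as np

# Sketch of w <- w - alpha * grad L(w) for the regularized squared loss,
# with the analytic gradient: grad L(w) = 2*lam*w - 2 * X^T (y - X w).
def gradient_descent(X, y, lam=0.1, alpha=0.01, steps=1000):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * lam * w - 2 * X.T @ (y - X @ w)
        w = w - alpha * grad                  # one gradient-descent step
    return w
```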

Example #

$$ \begin{align*} &f(x) = (-x+3)^2 \newline &f = q^2\quad q=r+3\quad r=-x\newline &\frac{\partial f}{\partial q} = 2q, \quad \frac{\partial q}{\partial r} = 1,\quad \frac{\partial r}{\partial x} = -1\newline &\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \frac{\partial q}{\partial r} \frac{\partial r}{\partial x} = 2q \cdot 1 \cdot (-1) = -2q\newline &= -2(-x+3) = 2x-6 \end{align*} $$
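
A quick numerical sanity check of the hand-derived gradient; the finite-difference comparison is an illustration, not part of the lecture.

```python
# Compare the analytic derivative 2x - 6 of f(x) = (-x + 3)^2
# against a central finite difference.
def f(x):
    return (-x + 3) ** 2

x, eps = 5.0, 1e-5
analytic = 2 * x - 6
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
print(analytic, numeric)   # both are approximately 4.0
```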

Propagation #

Forward Pass #

Given a forward input n, each block computes f(n).

$$ \begin{align*} &f(x) = (-x+3)^2 \newline &x\longrightarrow [-n] \longrightarrow [n+3] \longrightarrow [n^2] \Longrightarrow (-x+3)^2 \newline \end{align*} $$
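
A minimal sketch of this forward chain as three separate blocks; the function names are made up for illustration.

```python
# Forward pass through the chain of elementary blocks.
def negate_forward(n):
    return -n            # [-n] block

def add3_forward(n):
    return n + 3         # [n+3] block

def square_forward(n):
    return n ** 2        # [n^2] block

x = 5.0
r = negate_forward(x)    # r = -x
q = add3_forward(r)      # q = r + 3
f = square_forward(q)    # f = q^2 = (-x + 3)^2
print(f)                 # 4.0
```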

Backward Pass #

Given the upstream (backward) input g, the gradient coming from the output side, each block returns $g*\dfrac{\partial f(n)}{\partial n}$.

$$ \begin{align*} &\dfrac{\partial f}{\partial f} = 1 \longrightarrow 1 \cdot \dfrac{\partial}{\partial n} n^2 = 2n \longrightarrow 2n \cdot \dfrac{\partial}{\partial n}(n+3) = 2n \newline &\longrightarrow 2n \cdot \dfrac{\partial}{\partial n} (-n) = -2n \Longrightarrow -2n = -2(-x+3) = 2x-6 \end{align*} $$

```mermaid
%%{init: {'flowchart': {'curve': 'linear'}}}%%
flowchart LR
    %% Forward Pass
    n -->|"input n"| fblock[f]
    fblock -->|"f(n)"| out[f]

    %% Backward Pass (Gradient)
    g_in["g"] ---|"upstream gradient"| fblock
    fblock ---|"g × ∂f(n)/∂n"| grad_out["g × ∂f(n)/∂n"]
```
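
The backward pass can be sketched the same way: each block receives the upstream gradient g and multiplies it by its local derivative. The function names and the evaluation point below are illustrative.

```python
# Each block returns g * d f(n) / d n, its local derivative times the upstream gradient.
def square_backward(g, n):
    return g * 2 * n          # d(n^2)/dn = 2n

def add3_backward(g, n):
    return g * 1              # d(n+3)/dn = 1

def negate_backward(g, n):
    return g * -1             # d(-n)/dn = -1

x = 5.0
r, q = -x, -x + 3             # values cached during the forward pass
g = 1.0                       # df/df = 1 at the output
g = square_backward(g, q)     # 2q
g = add3_backward(g, r)       # 2q
g = negate_backward(g, x)     # -2q = 2x - 6
print(g, 2 * x - 6)           # both 4.0
```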

Sigmoid #

For a complicated expression like the sigmoid, instead of propagating through every elementary operation one at a time, the whole expression can be collapsed into a single block and handled with one local gradient.

$$ \sigma(n) = \frac{1}{1+e^{-n}}\newline \dfrac{\partial}{\partial n}\sigma(n) = (1-\sigma(n))\sigma(n) $$
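
A minimal sketch of the sigmoid treated as a single block, caching the forward output so the backward pass can reuse the local gradient above; the class name is an assumption.

```python
import numpy as np

class SigmoidLayer:
    def forward(self, n):
        self.out = 1.0 / (1.0 + np.exp(-n))   # sigma(n), cached for backward
        return self.out

    def backward(self, grad_output):
        # g * (1 - sigma(n)) * sigma(n)
        return grad_output * (1.0 - self.out) * self.out
```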

Implementation #

Code that runs forward and backward propagation through the ReLU function is shown below.

```python
import numpy as np

class ReLULayer:
    def forward(self, x):
        self.x = x                       # cache the input for the backward pass
        return np.maximum(x, 0)          # ReLU(x) = max(x, 0)

    def backward(self, grad_output):
        grad_input = grad_output.copy()  # copy so the upstream gradient is not modified in place
        grad_input[self.x < 0] = 0       # gradient is 0 where the input was negative
        return grad_input
```
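
A hypothetical usage example: push a small batch forward, then propagate a gradient of ones back through the layer.

```python
layer = ReLULayer()
x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
out = layer.forward(x)                    # [0.  0.  0.  1.5 3. ]
grad = layer.backward(np.ones_like(x))    # [0.  0.  1.  1.  1. ]
```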