Back Propagation
ComputerVision Lecture 9-1
Model Fitting #
$$ L(w) = \lambda ||w||^2_2 + \sum_{i=1}^n(y_i - w^Tx_i)^2 $$
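As a quick numerical check, the loss above can be evaluated directly with NumPy. The names `X`, `y`, `w`, and `lam` below are illustrative assumptions, not notation from the lecture.

```python
import numpy as np

def ridge_loss(w, X, y, lam):
    """L(w) = lam * ||w||_2^2 + sum_i (y_i - w^T x_i)^2."""
    residuals = y - X @ w                      # vector of y_i - w^T x_i
    return lam * np.dot(w, w) + np.sum(residuals ** 2)
```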
Gradient Descent #
$$ w_{i+1} = w_i - \alpha \nabla_wL(f(w_i)) $$
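A minimal sketch of this update rule, assuming the ridge loss above and its hand-derived gradient; the step size `alpha`, the iteration count `num_steps`, and the helper `ridge_grad` are illustrative names.

```python
import numpy as np

def ridge_grad(w, X, y, lam):
    # d/dw [ lam*||w||^2 + sum_i (y_i - w^T x_i)^2 ] = 2*lam*w - 2*X^T (y - X w)
    return 2 * lam * w - 2 * X.T @ (y - X @ w)

def gradient_descent(w, X, y, lam, alpha=0.01, num_steps=100):
    for _ in range(num_steps):
        w = w - alpha * ridge_grad(w, X, y, lam)   # w_{i+1} = w_i - alpha * grad
    return w
```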
Example #
$$ \begin{align*} &f(x) = (-x+3)^2 \newline &f = q^2\quad q=r+3\quad r=-x\newline &\frac{\partial f}{\partial q} = 2q, \quad \frac{\partial q}{\partial r} = 1,\quad \frac{\partial r}{\partial x} = -1\newline &\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q} \frac{\partial q}{\partial r} \frac{\partial r}{\partial x} = 2q \cdot 1 \cdot (-1) = -2q\newline &= -2(-x+3) = 2x-6 \end{align*} $$
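The same chain of local derivatives can be checked numerically with a central difference; this sketch and its function names are illustrative.

```python
def f(x):
    return (-x + 3) ** 2

def grad_f(x):
    # Chain rule: df/dq * dq/dr * dr/dx with q = r + 3, r = -x
    q = -x + 3
    return 2 * q * 1 * -1        # = 2x - 6

x = 5.0
eps = 1e-6
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)   # central difference
print(grad_f(x), numeric)        # both ≈ 4.0
```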
Propagation #
Forward Pass #
Given a forward input n, compute f(n).
$$ \begin{align*} &f(x) = (-x+3)^2 \newline &x\longrightarrow [-n] \longrightarrow [n+3] \longrightarrow [n^2] \Longrightarrow (-x+3)^2 \newline \end{align*} $$
Backward Pass #
Given a backward input g (the upstream gradient flowing in from the output), return $g\cdot\dfrac{\partial f(n)}{\partial n}$.
$$ \begin{align*} &\dfrac{\partial f}{\partial f} = 1 \longrightarrow 1 \cdot \dfrac{\partial}{\partial n} n^2 = 2n \longrightarrow 2n \cdot \dfrac{\partial}{\partial n}(n+3) = 2n \newline & \quad \quad \quad \longrightarrow 2n \cdot \dfrac{\partial}{\partial n} (-n) = -2n \Longrightarrow -2n = -2(-x+3) = 2x-6 \end{align*} $$
```mermaid
%%{init: {'flowchart': {'curve': 'linear'}}}%%
flowchart LR
    %% Forward Pass
    n -->|"input n"| fblock[f]
    fblock -->|"f(n)"| out[f]
    %% Backward Pass (Gradient)
    g_in["g"] ---|"upstream gradient"| fblock
    fblock ---|"g × ∂f(n)/∂n"| grad_out["g × ∂f(n)/∂n"]
```
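Putting the forward and backward passes together, here is a minimal sketch of the $f(x) = (-x+3)^2$ graph as three blocks, each exposing the forward/backward interface from the diagram. The class names are illustrative and only mirror the ReLULayer pattern shown later.

```python
class NegateLayer:          # r = -x
    def forward(self, x):
        return -x
    def backward(self, grad_output):
        return grad_output * -1          # g * dr/dx

class AddThreeLayer:        # q = r + 3
    def forward(self, r):
        return r + 3
    def backward(self, grad_output):
        return grad_output * 1           # g * dq/dr

class SquareLayer:          # f = q^2
    def forward(self, q):
        self.q = q                       # cache input for the backward pass
        return q ** 2
    def backward(self, grad_output):
        return grad_output * 2 * self.q  # g * df/dq

layers = [NegateLayer(), AddThreeLayer(), SquareLayer()]

x = 5.0
out = x
for layer in layers:                 # forward pass: computes (-x+3)^2
    out = layer.forward(out)

grad = 1.0                           # seed: df/df = 1
for layer in reversed(layers):       # backward pass: g * df(n)/dn at each block
    grad = layer.backward(grad)

print(out, grad)                     # 4.0, 4.0  (= 2x - 6 at x = 5)
```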
Sigmoid #
A complex expression such as the sigmoid does not have to be propagated one elementary operation at a time; it can be collapsed into a single block and propagated as one unit.
$$ \sigma(n) = \frac{1}{1+e^{-n}}\newline \dfrac{\partial}{\partial n}\sigma(n) = (1-\sigma(n))\sigma(n) $$
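A minimal sketch of a sigmoid block that reuses the closed-form local gradient above; the class name is illustrative and follows the same forward/backward pattern.

```python
import numpy as np

class SigmoidLayer:
    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))   # sigma(n)
        return self.out

    def backward(self, grad_output):
        # g * d sigma(n)/dn = g * (1 - sigma(n)) * sigma(n)
        return grad_output * (1.0 - self.out) * self.out
```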
Implementation #
Code that propagates the ReLU function is shown below.
```python
import numpy as np

class ReLULayer:
    def forward(self, x):
        self.x = x                        # cache the input for the backward pass
        return np.maximum(x, 0)           # ReLU(x) = max(x, 0)

    def backward(self, grad_output):
        grad_input = grad_output.copy()   # NumPy arrays use .copy(), not .clone()
        grad_input[self.x < 0] = 0        # gradient is 0 where the input was negative
        return grad_input
```
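For example, with illustrative input values:

```python
import numpy as np

relu = ReLULayer()
x = np.array([-2.0, -0.5, 1.0, 3.0])
out = relu.forward(x)                     # [0. 0. 1. 3.]
grad = relu.backward(np.ones_like(x))     # [0. 0. 1. 1.]
print(out, grad)
```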