Image Transformation

ComputerVision Lecture05: Image transformation

March 31, 2025 · 3 min read

#Computervision #Imagetransformation

Review #

Detection: Harris/Laplacian of Gaussian
Descriptor: SIFT

Model Fitting #

Model fitting requires the following three components.

Data
Model
Objective function (evaluation)

Least-Squares #

Data: $(x_1, y_1),\ (x_2, y_2), \cdots,\ (x_k,y_k)$
Model: $(m, b)$ with $y_i = mx_i + b$, or $w$ with $y_i = w^Tx_i$ (where $x_i$ is augmented to $[x_i, 1]^T$)
Objective function: $(y_i - {w}^Tx_i)^2$

Least-Squares Setup $$ \sum_{i=1}^k(y_i-w^Tx_i)^2 = ||Y-Xw||_2^2 $$

$$ Y = \begin{bmatrix}y_1\newline \vdots \newline y_k\end{bmatrix},\quad X = \begin{bmatrix}x_1 & 1\newline \vdots & \vdots\newline x_k & 1\end{bmatrix},\quad w = \begin{bmatrix}m\newline b\end{bmatrix} $$

Running Least Squares #

$||Y-Xw||_2^2$ could be minimized by evaluating it directly for candidate values of $w$, but $w$ can also be obtained in closed form through a matrix inverse. $$ w = (X^TX)^{-1}X^TY $$
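As a minimal sketch of the closed-form solution above, assuming NumPy and a small synthetic dataset (the function name `fit_line_least_squares` and the sample values are made up for illustration):

```python
import numpy as np

def fit_line_least_squares(x, y):
    """Return (m, b) minimizing sum_i (y_i - (m * x_i + b))^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Each row of X is [x_i, 1], so that X @ [m, b] = m * x_i + b.
    X = np.column_stack([x, np.ones_like(x)])
    # Normal equations w = (X^T X)^{-1} X^T Y, solved without forming the inverse.
    w = np.linalg.solve(X.T @ X, X.T @ y)
    return w[0], w[1]

# Hypothetical data lying exactly on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
m, b = fit_line_least_squares(x, y)
```

In practice `np.linalg.lstsq(X, y, rcond=None)` is often preferred, since it avoids forming $X^TX$ and tends to be numerically more stable.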

RANSAC #

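A model should explain as many data points as possible, which means it must not overfit. Running least squares over the entire dataset lets outliers pull the fit, and trying every candidate subset one at a time takes far too long. RANSAC¹ instead draws random samples from the dataset and evaluates the model fitted to each sample.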

PseudoCode for RANSAC

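bestLine, bestCount = None, -1
for trial in range(numTrials):
    # Fit a candidate line to a random minimal sample (two points).
    subset = pickPairOfPoints(data)
    line = totalLeastSquares(subset)
    # Score the candidate against every point in the dataset.
    E = linePointDistance(data, line)
    inliers = E < threshold
    numInliers = inliers.sum()
    # Keep the candidate that explains the most points.
    if numInliers > bestCount:
        bestLine, bestCount = line, numInliers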

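bestLine is the best model found so far, and bestCount is the number of points that lie within threshold of it. Each of the numTrials iterations samples a pair of points, fits a line to them, and evaluates that line against all of the data with linePointDistance. A point whose distance is below threshold is an inlier. If the number of inliers is larger than that of the best model so far, bestLine and bestCount are replaced.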
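The loop described above could be made runnable roughly as follows. This is a sketch under assumptions: it uses NumPy, represents a line as $(a, b, c)$ with $ax + by + c = 0$, and the helper names `line_through` and `line_point_distance` are illustrative stand-ins for the pseudocode's `totalLeastSquares` and `linePointDistance` (for two points, the total-least-squares line is simply the line through them).

```python
import numpy as np

def line_through(p, q):
    """Line a*x + b*y + c = 0 through points p and q, with a^2 + b^2 = 1."""
    d = q - p
    normal = np.array([-d[1], d[0]]) / np.linalg.norm(d)
    return normal[0], normal[1], -(normal @ p)

def line_point_distance(data, line):
    """Perpendicular distance from every row of data to the line."""
    a, b, c = line
    return np.abs(a * data[:, 0] + b * data[:, 1] + c)

def ransac_line(data, num_trials=100, threshold=0.1, seed=0):
    rng = np.random.default_rng(seed)
    best_line, best_count = None, -1
    for _ in range(num_trials):
        # Minimal sample: two distinct points define a candidate line.
        i, j = rng.choice(len(data), size=2, replace=False)
        if np.allclose(data[i], data[j]):
            continue  # duplicate coordinates do not define a line
        line = line_through(data[i], data[j])
        # Score the candidate on the whole dataset.
        count = int((line_point_distance(data, line) < threshold).sum())
        if count > best_count:
            best_line, best_count = line, count
    return best_line, best_count
```

A common follow-up, not shown here, is to refit the line with least squares on the inliers of `best_line`.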

Advantages of RANSAC

Simple.
Effective.
General-purpose.

Disadvantages of RANSAC

Its parameters have to be tuned.
It is inefficient when there are many outliers.
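To make the last point concrete, a standard estimate of how many trials are needed is $N = \log(1-p) / \log(1-w^s)$, where $w$ is the inlier ratio, $s$ the sample size, and $p$ the desired probability of drawing at least one all-inlier sample. The sketch below evaluates it; the function name `ransac_trials` is illustrative, and it assumes $0 < w < 1$ and independent samples, whereas in practice $w$ has to be guessed or estimated.

```python
import math

def ransac_trials(inlier_ratio, sample_size, success_prob=0.99):
    """Trials needed so that, with probability success_prob, at least one
    random sample consists only of inliers."""
    # Probability that one sample of sample_size points is all inliers.
    p_clean = inlier_ratio ** sample_size
    return math.ceil(math.log(1.0 - success_prob) / math.log(1.0 - p_clean))

# The required number of trials grows quickly as the inlier ratio drops.
for ratio in (0.9, 0.5, 0.1):
    print(ratio, ransac_trials(ratio, sample_size=2))
```

The same formula also shows why models with larger minimal samples are more expensive to fit: $w^s$ shrinks as $s$ grows.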

Affine Transformation #

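Types of Geometric Transformation

Translation: shifts the whole image by a fixed distance in the x and y directions
Rotation: rotates the image by a given angle
Scaling: enlarges or shrinks the image
Shearing: slants the image so that straight lines become tilted
Reflection: flips the image horizontally or vertically
Affine: a linear map followed by a translation; it covers translation, rotation, scaling, and shearing
Homography: an affine transformation extended with perspective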
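The transformations listed above can all be written as $3\times 3$ matrices acting on homogeneous coordinates $[x, y, 1]^T$. The following is a sketch assuming NumPy; the function names and the example values are illustrative only.

```python
import numpy as np

def translation(tx, ty):
    return np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=float)

def scaling(sx, sy):
    # A negative factor gives a reflection, e.g. scaling(-1, 1).
    return np.diag([sx, sy, 1.0])

def shearing(kx, ky):
    return np.array([[1, kx, 0], [ky, 1, 0], [0, 0, 1]], dtype=float)

def apply(T, points):
    """Apply a 3x3 transform to an (N, 2) array of points."""
    homog = np.column_stack([points, np.ones(len(points))])
    mapped = homog @ T.T
    # Divide by the last coordinate; it is 1 for affine maps but not for a homography.
    return mapped[:, :2] / mapped[:, 2:3]

square = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)

# Composing the pieces gives an affine map (last row stays [0, 0, 1]).
affine = translation(2, 3) @ rotation(np.pi / 2) @ scaling(2, 2)
moved = apply(affine, square)

# A homography additionally allows a non-trivial last row (perspective).
H = np.array([[1, 0, 0], [0, 1, 0], [0.5, 0, 1]], dtype=float)
warped = apply(H, square)
```

Matrices compose right to left, so `affine` scales first, then rotates, then translates.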

Fitting Models

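Data: $(x_i,y_i,x^\prime_i,y^\prime_i)$ (here $x^\prime,y^\prime$ are the values observed in the actual image)
Model: $[x_i^\prime,y_i^\prime] = M[x_i,y_i]+t$ (here $x^\prime,y^\prime$ are predicted values rather than observed ones, and $M$ is a $2\times 2$ matrix)
Objective function: $||[x^\prime_i,y^\prime_i]-(M[x_i,y_i]+t)||^2$

Each data point is 4-dimensional rather than 2-dimensional: it pairs a pixel location in the first image with a pixel location in the second image, two coordinates each.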

$$ \begin{bmatrix}x_i^\prime\newline y_i^\prime\end{bmatrix} = \begin{bmatrix}m_1 & m_2 \newline m_3 & m_4\end{bmatrix}\begin{bmatrix}x_i\newline y_i\end{bmatrix} + \begin{bmatrix}t_x\newline t_y\end{bmatrix} $$ The equation above can be rewritten as follows.

$$ \begin{bmatrix}\vdots \newline x_i^\prime\newline y_i^\prime\newline \vdots\end{bmatrix} = \begin{bmatrix}\cdots&\cdots &\cdots & \cdots&\cdots &\cdots \newline x_i & y_i & 0 & 0 & 1 & 0\newline 0 & 0 & x_i & y_i & 0 & 1\newline \cdots&\cdots &\cdots & \cdots&\cdots &\cdots \end{bmatrix}\begin{bmatrix}m_1\newline m_2\newline m_3\newline m_4\newline t_x\newline t_y\end{bmatrix} $$

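This matrix form shows that each correspondence contributes 2 equations, while there are 6 unknowns. At least $6/2=3$ correspondences are therefore needed, so three point pairs are enough to determine an affine transformation.

After selecting the data with RANSAC, solve $\argmin\limits_x||Ax-b||^2$.

Here $Ax-b$ is the residual: a correspondence counts as an inlier when its residual is smaller than the threshold.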
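A sketch of this least-squares step, assuming NumPy and correspondences given as two `(N, 2)` arrays (the function name `fit_affine` and the array layout are assumptions for illustration). It builds $A$ and $b$ exactly as in the stacked matrix form above, with the unknowns ordered as $[m_1, m_2, m_3, m_4, t_x, t_y]$.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit from (N, 2) src points to (N, 2) dst points, N >= 3."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = len(src)
    A = np.zeros((2 * n, 6))
    # Row for x'_i: [x_i, y_i, 0, 0, 1, 0]
    A[0::2, 0:2] = src
    A[0::2, 4] = 1.0
    # Row for y'_i: [0, 0, x_i, y_i, 0, 1]
    A[1::2, 2:4] = src
    A[1::2, 5] = 1.0
    b = dst.reshape(-1)  # [x'_1, y'_1, x'_2, y'_2, ...]
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    M = params[:4].reshape(2, 2)
    t = params[4:]
    return M, t
```

Inside a RANSAC loop, the same function could be called on 3 random correspondences per trial, with the per-point residual `np.linalg.norm(src @ M.T + t - dst, axis=1)` compared against the threshold.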

Homography Transformation #

Fitting Models

Data: $(x_i,y_i,x_i^\prime,y_i^\prime)$
Model: $[x_i^\prime,y_i^\prime,1] \equiv \mathbf{H}[x_i,y_i,1]$
Objective function: complicated

$$ \begin{bmatrix} 0^T & -p_1^T & y_1^\prime p_1^T\newline p_1^T & 0^T & -x_1^\prime p_1^T\newline & \vdots & \newline 0^T & -p_n^T & y_n^\prime p_n^T\newline p_n^T & 0^T & -x_n^\prime p_n^T \end{bmatrix} \begin{bmatrix} h_1\newline h_2\newline h_3 \end{bmatrix} = 0 $$

Here $p_i = [x_i, y_i, 1]^T$, and $h_1, h_2, h_3$ are the rows of $\mathbf{H}$ written as column vectors.

Calling the stacked matrix above $A$, the objective function is as follows.

$$ h^* = \argmin_{||h||=1} ||Ah||^2 $$
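The constrained minimizer above is the right singular vector of $A$ that belongs to its smallest singular value, so it can be read off an SVD. The sketch below assumes NumPy and at least 4 correspondences in general position; the function name `fit_homography` is illustrative, and dividing by `H[2, 2]` is just one convenient way to fix the arbitrary scale.

```python
import numpy as np

def fit_homography(src, dst):
    """Estimate the 3x3 matrix H with [x', y', 1] ~ H [x, y, 1] from N >= 4 point pairs."""
    rows = []
    zero = np.zeros(3)
    for (x, y), (xp, yp) in zip(src, dst):
        p = np.array([x, y, 1.0])
        # Two rows per correspondence, matching the block matrix in the text.
        rows.append(np.concatenate([zero, -p, yp * p]))
        rows.append(np.concatenate([p, zero, -xp * p]))
    A = np.array(rows)
    # Singular values come sorted in descending order, so the last row of Vt
    # is the unit vector h minimizing ||A h||.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # assumes H[2, 2] is not (close to) zero
```

With noisy pixel coordinates it usually helps to normalize the points (zero mean, unit average scale) before building $A$; that step is omitted here for brevity.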

Summary #

Order of the transformation steps

Detection
Descriptor
Match by Nearest Neighbor
Fit h via RANSAC
Blend image

RANdom SAmple Consensus ↩︎