跳转至

运动恢复结构

Structure from Motion

Camera Calibration

Perspective-n-Point (PnP) Problem

The goal of the Perspective-n-Point (PnP) problem is to estimate the position and orientation of a calibrated camera from known 3D-to-2D point correspondences.


Direct Linear Transform (DLT)

Suppose the relation between the 2D-coordinate and 3D-coordinate of a point is

\[ \begin{bmatrix} u^{(i)} \\ v^{(i)} \\ 1 \end{bmatrix} \equiv \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \\ \end{bmatrix} \begin{bmatrix} x_w^{(i)} \\ y_w^{(i)} \\ z_w^{(i)} \\ 1 \end{bmatrix} \]

That is

\[ u^{(i)} = \frac{p_{11}x_w^{(i)} + p_{12}y_w^{(i)} + p_{13}z_w^{(i)} + p_{14}}{p_{31}x_w^{(i)} + p_{32}y_w^{(i)} + p_{33}z_w^{(i)} + p_{34}} \\~\\ v^{(i)} = \frac{p_{21}x_w^{(i)} + p_{22}y_w^{(i)} + p_{23}z_w^{(i)} + p_{24}}{p_{31}x_w^{(i)} + p_{32}y_w^{(i)} + p_{33}z_w^{(i)} + p_{34}} \]

Rerange the equations, we can get

\[ \bm{x}_w^{(i)} = \begin{bmatrix} x_w^{(i)} & y_w^{(i)} & z_w^{(i)} & 1 \end{bmatrix}, \bm{p}_i = \begin{bmatrix} p_{i1} \\ p_{i2} \\ p_{i3} \\ p_{i4} \end{bmatrix} \\~\\ \begin{bmatrix} \bm{x}_w^{(1)} & \bm{0} & -u_1\bm{x}_w^{(1)} \\ \bm{0} & \bm{x}_w^{(1)} & -v_1\bm{x}_w^{(1)} \\ \vdots & \vdots & \vdots \\ \bm{x}_w^{(n)} & \bm{0} & -u_n\bm{x}_w^{(n)} \\ \bm{0} & \bm{x}_w^{(n)} & -v_n\bm{x}_w^{(n)} \\ \end{bmatrix} \begin{bmatrix} \bm{p}_1 \\ \bm{p}_2 \\ \bm{p}_3 \end{bmatrix} = \bm{0} \\~\\ \Rightarrow A\bm{p} = \bm{0} \]

Because the projection matrix is defined up to scale, we can add a constraint condition \(\left\| \bm{p} \right\|^2 = 1\), then the solution is

\[ \underset{p}{\min}\left\| A\bm{p} \right\|^2 \text{ such that } \left\| \bm{p} \right\|^2 = 1 \]

The eigenvector with smallest eigenvalue \(\lambda\) of matrix \(A^TA\) is the solution.

Then, to decompose projection matrices to intrinsic and extrinsic matrices, we can use QR factorization:

\[ \begin{bmatrix} p_{11} & p_{12} & p_{13} \\ p_{21} & p_{22} & p_{23} \\ p_{31} & p_{32} & p_{33} \\ \end{bmatrix} = \begin{bmatrix} f_x & 0 & o_x \\ 0 & f_y & o_y \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \\ \end{bmatrix} = KR \\~\\ \begin{bmatrix} p_{14} \\ p_{24} \\ p_{34} \end{bmatrix} = K\bm{t} \]


P3P

P3P uses the geometric relationships of three points, the 3D-to-2D point correspondences, and the known internal parameters of the camera to get the external parameters of the camera.

Firstly, convert 2D points to corresponding 3D points in camera coordinate system

\[ x_c = \frac{u - o_x}{f_x}, \quad y_c = \frac{v - o_y}{f_y}, \quad z_c = 1 \\ \]

By these, we can get \(\cos\left< a, b \right>, \cos\left< b, c \right>, \cos\left< a, c \right>\). Then, by the cosine theorem, we can get

\[ \left\{ \begin{aligned} & OA^2 + OB^2 - 2OA\cdot OB\cdot \cos\left< a, b \right> = AB^2 \\ & OB^2 + OC^2 - 2OB\cdot OC\cdot \cos\left< b, c \right> = BC^2 \\ & OA^2 + OC^2 - 2OA\cdot OC\cdot \cos\left< a, c \right> = AC^2 \\ \end{aligned} \right. \]

Devided both sides by \(OC^2\), and let \(\displaystyle x = \frac{OA}{OC}, y = \frac{OB}{OC}, v = \frac{AB^2}{OC^2}, u = \frac{BC^2}{AB^2}, w = \frac{AC^2}{AB^2}\), then the equations become

\[ \left\{ \begin{aligned} & x^2 + y^2 - 2xy\cos\left< a, b \right> - v = 0 \\ & y^2 + 1 - 2y\cos\left< b, c \right> - uv = 0 \\ & x^2 + 1 - 2y\cos\left< a, c \right> - wv = 0 \\ \end{aligned} \right. \]

Rerange the equations, then we can get

\[ \left\{ \begin{aligned} & (1 - u)y^2 - ux^2 - \cos\left< b, c \right> y + 2uxy\cos\left< a, b \right> + 1 = 0 \\ & (1 - w)x^2 - wy^2 - \cos\left< b, c \right> x + 2wxy\cos\left< a, b \right> + 1 = 0 \\ \end{aligned} \right. \]

This binary quadratic equation has 4 possible answers, we use one extra point to determine the most possible one.


Bundle Adjustment (BA)

Suppose \(P_i\) is the coordinate of a point in the world coordinate system, \(p_i\) is the homogeneous coordinate of corresponding point in camera coordinate system, \(\omega_i\) is the depth of correspoinding point, then we can define the reprojection error

\[ e(R, t) = \sum_i \left\| p_i - \frac{1}{\omega_i}K(RP_i + t) \right\|^2 \]

We can initialize \(R, t\) by P3P, and then optimize them by Gauss-Newton.


Structure from Motion (SFM)

Structure from motion (SfM) is a photogrammetric range imaging technique for estimating three-dimensional structures from two-dimensional image sequences that may be coupled with local motion signals.


Epipolar Geometry

Epipolar geometry uses the 2D point correspondences and the known internal parameters of cameras to get \(R, t\) between two cameras.

Baseline (\(O_lO_r\)): Line connecting the two camera centers. Epipoles (\(e_l\) and \(e_r\)): Intersections of baseline with image planes. Epipolar Plane (\(XO_lO_r\)): Plane containing \(X, O\) and \(O'\). Epipolar Lines (\(l_l\) and \(l_r\)): Intersections of epipolar plane with image planes.


The normal vecter of the epipolar plane is $\bm{n} = \bm{t} \times \bm{x}_l $, so \(\bm{x}_l \cdot (\bm{t} \times \bm{x}_l) = 0\). That is

\[ \begin{bmatrix} x_l & y_l & z_l \end{bmatrix} \begin{bmatrix} t_yz_l - t_zy_l \\ t_zx_l - t_xz_l \\ t_xy_l - t_yx_l \end{bmatrix} = 0 \\~\\ \Rightarrow \begin{bmatrix} x_l & y_l & z_l \end{bmatrix} \begin{bmatrix} 0 & -t_z & t_y \\ t_z & 0 & -t_x \\ -t_y & t_x & 0 \\ \end{bmatrix} \begin{bmatrix} x_l \\ y_l \\ z_l \end{bmatrix} = 0 \\~\\ \Rightarrow \bm{x}_l^T T_X \bm{x}_l = 0 \]

Substitute $\bm{x}_l = R\bm{x}_r + \bm{t} $ into \(\bm{x}_l^T T_X \bm{x}_l = 0\), we can get

\[ \bm{x}_l^T (T_XR\bm{x}_r + T_X\bm{t}) = 0 \\~\\ \Rightarrow \bm{x}_l^T (T_XR\bm{x}_r + \bm{t}\times \bm{t}) = 0 \\~\\ \Rightarrow \bm{x}_l^T E \bm{x}_r = 0 \]

where \(E = T_XR\) is called essential matrix.

Given that \(T_X\) is a skew-symmetric matrix ($a_{ij} = -a_{ji} $) and \(R\) is an orthonormal matrix, we can decouple \(T_X\) and \(R\) from \(E\) using singular value decomposition, so our goal becomes getting \(E\).


To calculate \(E\), fistly, we have

\[ z_l\begin{bmatrix} u_l \\ v_l \\ 1 \end{bmatrix} = \begin{bmatrix} f_x^{(l)} & 0 & o_x^{(l)} \\ 0 & f_y^{(l)} & o_y^{(l)} \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} x_l \\ y_l \\ z_l \end{bmatrix}, z_r\begin{bmatrix} u_r \\ v_r \\ 1 \end{bmatrix} = \begin{bmatrix} f_x^{(r)} & 0 & o_x^{(r)} \\ 0 & f_y^{(r)} & o_y^{(r)} \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} x_r \\ y_r \\ z_r \end{bmatrix} \\~\\ \Rightarrow \bm{x}_l^T = \begin{bmatrix} u_l & v_l & 1 \end{bmatrix}z_l K_l^{{-1}^T}, \bm{x}_r = K_r^{-1} z_r \begin{bmatrix} u_r \\ v_r \\ 1 \end{bmatrix} \]

Substitute these into \(\bm{x}_l^T E \bm{x}_r = 0\), we can get:

\[ \begin{bmatrix} u_l & v_l & 1 \end{bmatrix} K_l^{{-1}^T} E K_r^{-1} \begin{bmatrix} u_r \\ v_r \\ 1 \end{bmatrix} = 0 \\~\\ \Rightarrow \bm{u}_l^T F \bm{u}_r = 0 \]

where \(F = K_l^{{-1}^T} E K_r^{-1}\) is called fundamental matrix.


Expand \(\bm{u}_l^T F \bm{u}_r = 0\), we can get

\[ \begin{aligned} & (f_{11}u_r^{(i)} + f_{12}v_r^{(i)} + f_{13})u_l^{(l)} + (f_{21}u_r^{(i)} + f_{22}v_r^{(i)} + f_{23})v_l^{(l)} + \\ & f_{31}u_r^{(i)} + f_{32}v_r^{(i)} + f_{33} = 0 \end{aligned} \]

Rerange the terms, and set \(\bm{p}_r^{(i)} = \begin{bmatrix} u_r^{(i)} & v_r^{(i)} & 1 \end{bmatrix}\), then we can get:

\[ \begin{bmatrix} u_l^{(1)}\bm{p}_r^{(1)} & v_l^{(1)}\bm{p}_r^{(1)} & \bm{p}_r^{(1)} \\ \vdots & \vdots & \vdots \\ u_l^{(i)}\bm{p}_r^{(i)} & v_l^{(i)}\bm{p}_r^{(i)} & \bm{p}_r^{(i)} \\ \vdots & \vdots & \vdots \\ u_l^{(n)}\bm{p}_r^{(n)} & v_l^{(n)}\bm{p}_r^{(n)} & \bm{p}_r^{(n)} \\ \end{bmatrix} \begin{bmatrix} f_{11} \\ f_{12} \\ f_{13} \\ f_{21} \\ f_{22} \\ f_{23} \\ f_{31} \\ f_{32} \\ f_{33} \end{bmatrix} = \bm{0} \\~\\ \Rightarrow A\mathbf{f} = \bm{0} \]

Because \(\mathbf{f}\) is defined up to scale, we can add a constraint condition \(\left\| \mathbf{f} \right\|^2 = 1\), then the solution is

\[ \underset{\mathbf{f}}{\min}\left\| A\mathbf{f} \right\|^2 \text{ such that } \left\| \mathbf{f} \right\|^2 = 1 \]

The eigenvector with smallest eigenvalue \(\lambda\) of matrix \(A^TA\) is the solution.


Triangulation

Linear Solution

Triangulation uses 2D point correspondence and all parameters of two cameras to get 3D coordinate of a point.

Firstly, we have imaging equations

\[ \begin{bmatrix} u_{r} \\ v_{r} \\ 1 \end{bmatrix} \equiv M_{int}^{(r)} \bm{\tilde{x}}_r = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \\ \end{bmatrix} \begin{bmatrix} x_{r} \\ y_{r} \\ z_{r} \\ 1 \end{bmatrix} \\~\\ \begin{bmatrix} u_l \\ v_l \\ 1 \end{bmatrix} \equiv M_{int}^{(l)} \bm{\tilde{x}}_l = M_{int}^{(l)} M_{ext} \bm{\tilde{x}}_r = \begin{bmatrix} p_{11} & p_{12} & p_{13} & p_{14} \\ p_{21} & p_{22} & p_{23} & p_{24} \\ p_{31} & p_{32} & p_{33} & p_{34} \\ \end{bmatrix} \begin{bmatrix} x_{r} \\ y_{r} \\ z_{r} \\ 1 \end{bmatrix} \]

That is

\[ u_r = \frac{m_{11}x_r + m_{12}y_r + m_{13}z_r + m_{14}}{m_{31}x_r + m_{32}y_r + m_{33}z_r + m_{34}} \\~\\ v_r = \frac{m_{21}x_r + m_{22}y_r + m_{23}z_r + m_{24}}{m_{31}x_r + m_{32}y_r + m_{33}z_r + m_{34}} \\~\\ u_l = \frac{p_{11}x_r + p_{12}y_r + p_{13}z_r + p_{14}}{p_{31}x_r + p_{32}y_r + p_{33}z_r + p_{34}} \\~\\ v_l = \frac{p_{21}x_r + p_{22}y_r + p_{23}z_r + p_{24}}{p_{31}x_r + p_{32}y_r + p_{33}z_r + p_{34}} \]

Rerange the equations, we can get

\[ \begin{bmatrix} u_rm_{31} - m_{11} & u_rm_{32} - m_{12} & u_rm_{33} - m_{13} \\ v_rm_{31} - m_{11} & v_rm_{32} - m_{12} & v_rm_{33} - m_{13} \\ u_lp_{31} - p_{11} & u_lp_{32} - p_{12} & u_lp_{33} - p_{13} \\ v_lp_{31} - p_{11} & v_lp_{32} - p_{12} & v_lp_{33} - p_{13} \\ \end{bmatrix} \begin{bmatrix} x_{r} \\ y_{r} \\ z_{r} \end{bmatrix} = \begin{bmatrix} m_{14} - u_rm_{34} \\ m_{24} - v_rm_{34} \\ p_{14} - u_lp_{34} \\ p_{24} - v_lp_{34} \\ \end{bmatrix} \\~\\ \Rightarrow A\bm{x}_r = \mathbf{b} \]

The least squares solution is

\[\bm{x}_r = (A^TA)^{-1} A^T \mathbf{b} \]


Non-Linear Solution

Suppose \(\mathbf{P}\) is the coordinate of a point, \(\bm{\hat{u}}_l, \bm{\hat{u}}_r\) are projections of the point on the two image planes, then we can define the reprojection error

\[\text{cost}(\mathbf{P}) = \left\| \bm{u}_l - \bm{\hat{u}}_l \right\|^2 + \left\| \bm{u}_r - \bm{\hat{u}}_r \right\|^2\]

The solution is \(\mathbf{P}\) such that minimize the reprojection error.