# Optimization

## Model Fitting
A typical approach to model fitting is to minimize the mean squared error (MSE). Given data points \(\{(u_i, d_i)\}\) and a model \(f(u; x)\) with parameters \(x\),

\[
\hat{x} = \arg\min_x \sum_i \left( d_i - f(u_i; x) \right)^2
\]

That is because we assume the data is corrupted by Gaussian noise, i.e. \(d_i = f(u_i; x) + \epsilon_i\) with \(\epsilon_i \sim \mathcal{N}(0, \sigma^2)\), so

\[
P(d_i \mid x) \propto \exp\left( -\frac{(d_i - f(u_i; x))^2}{2\sigma^2} \right)
\]

If the data points are independent, we can get

\[
P(d \mid x) = \prod_i P(d_i \mid x) \propto \exp\left( -\frac{1}{2\sigma^2} \sum_i \left( d_i - f(u_i; x) \right)^2 \right)
\]

Maximum likelihood estimation (MLE) maximizes this likelihood to find the best \(x\). Taking the negative logarithm turns the product into the sum of squared residuals, so MSE = MLE under the Gaussian noise assumption.
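As a concrete illustration, here is a minimal NumPy sketch; the line model \(d = au + b\), the noise level, and the data are assumptions made for the example. It fits the parameters by minimizing the MSE:

```python
import numpy as np

# Assumed setup: fit a line d = a*u + b to noisy data by minimizing
# the MSE, which is the MLE under Gaussian noise.
rng = np.random.default_rng(0)
u = np.linspace(0, 1, 50)
d = 2.0 * u + 1.0 + rng.normal(scale=0.1, size=u.shape)  # data with Gaussian noise

# Design matrix: each row is [u_i, 1]; parameters x = [a, b].
A = np.stack([u, np.ones_like(u)], axis=1)

# Minimizing sum_i (d_i - A_i @ x)^2 is an ordinary least-squares problem.
x_hat, *_ = np.linalg.lstsq(A, d, rcond=None)
print(x_hat)  # close to [2.0, 1.0]
```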
## Numerical Methods

### Steepest Descent Method
Do a first-order Taylor expansion at \(x_k\):

\[
F(x_k + \Delta x) \approx F(x_k) + J_F \Delta x, \quad J_F = \left. \frac{\partial F}{\partial x} \right|_{x_k}
\]

When the direction of \(\Delta x\) is the same as \(-J_F^T\), the objective function descends the steepest, giving the update \(x_{k+1} = x_k + \alpha \Delta x\) with \(\Delta x = -J_F^T\).

To find an appropriate step size \(\alpha\), we can use the backtracking algorithm: initialize \(\alpha\) with a large value, then decrease \(\alpha\) until

\[
F(x_k + \alpha \Delta x) \le F(x_k) + \gamma \alpha J_F \Delta x
\]

where \(0 < \gamma < 1\).
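A minimal sketch of steepest descent with backtracking, assuming a generic objective `F` and gradient `grad_F` supplied by the caller; the initial step size and the shrink factor are assumptions:

```python
import numpy as np

def steepest_descent(F, grad_F, x0, gamma=0.5, shrink=0.5, iters=100):
    """Steepest descent with backtracking line search (sketch).

    gamma is the sufficient-decrease constant (0 < gamma < 1);
    shrink is the factor used to decrease alpha (an assumption here).
    """
    x = x0
    for _ in range(iters):
        g = grad_F(x)                  # J_F^T
        dx = -g                        # steepest-descent direction
        alpha = 1.0                    # start from a big step size
        # Backtracking: decrease alpha until the sufficient-decrease
        # condition F(x + alpha*dx) <= F(x) + gamma*alpha*J_F@dx holds.
        while F(x + alpha * dx) > F(x) + gamma * alpha * g @ dx:
            alpha *= shrink
        x = x + alpha * dx
    return x

# Usage: minimize F(x) = ||x||^2, whose minimum is at the origin.
print(steepest_descent(lambda x: x @ x, lambda x: 2 * x, np.array([3.0, -2.0])))
```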
### Newton Method
Do a second-order Taylor expansion at \(x_k\):

\[
F(x_k + \Delta x) \approx F(x_k) + J_F \Delta x + \frac{1}{2} \Delta x^T H_F \Delta x
\]

where \(H_F\) is the Hessian of \(F\) at \(x_k\). To minimize \(F(x_k + \Delta x)\), we take the derivative with respect to \(\Delta x\) and set it to zero:

\[
J_F^T + H_F \Delta x = 0
\]

So the optimal direction is

\[
\Delta x = -H_F^{-1} J_F^T
\]
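A minimal Newton-method sketch under the same assumptions (caller-supplied gradient and Hessian); solving the linear system avoids forming \(H_F^{-1}\) explicitly:

```python
import numpy as np

def newton(grad_F, hess_F, x0, iters=20):
    """Newton's method (sketch): x <- x - H_F^{-1} J_F^T."""
    x = x0
    for _ in range(iters):
        # Solve H_F @ dx = -J_F^T instead of inverting the Hessian.
        dx = np.linalg.solve(hess_F(x), -grad_F(x))
        x = x + dx
    return x

# Usage: F(x) = x1^2 + 10*x2^2 is quadratic, so one Newton step suffices.
grad = lambda x: np.array([2 * x[0], 20 * x[1]])
hess = lambda x: np.diag([2.0, 20.0])
print(newton(grad, hess, np.array([5.0, 3.0])))  # -> [0. 0.]
```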
### Gauss-Newton Method
This is useful for solving nonlinear least squares. Define

\[
F(x) = \sum_i r_i(x)^2, \quad R(x) = \begin{bmatrix} r_1(x) & r_2(x) & \cdots & r_m(x) \end{bmatrix}^T
\]

\(R(x)\) is called the residual vector. Then

\[
F(x) = R(x)^T R(x)
\]

Instead of expanding \(F(x)\), we expand \(R(x)\):

\[
R(x_k + \Delta x) \approx R(x_k) + J_R \Delta x
\quad \Rightarrow \quad
F(x_k + \Delta x) \approx \left\| R(x_k) + J_R \Delta x \right\|^2
\]

The optimal \(\Delta x\) satisfies

\[
J_R^T J_R \Delta x = -J_R^T R(x_k)
\]

So the optimal direction is

\[
\Delta x = -\left( J_R^T J_R \right)^{-1} J_R^T R(x_k)
\]
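A minimal Gauss-Newton sketch; the exponential-fit toy problem in the usage lines is an assumption made for the example:

```python
import numpy as np

def gauss_newton(R, J_R, x0, iters=20):
    """Gauss-Newton (sketch): solve J_R^T J_R dx = -J_R^T R at each step."""
    x = x0
    for _ in range(iters):
        r, J = R(x), J_R(x)
        dx = np.linalg.solve(J.T @ J, -J.T @ r)
        x = x + dx
    return x

# Usage (assumed toy problem): fit y = exp(a*t) to data, with x = [a].
t = np.linspace(0, 1, 20)
y = np.exp(0.7 * t)
R = lambda x: np.exp(x[0] * t) - y               # residual vector R(x)
J_R = lambda x: (t * np.exp(x[0] * t))[:, None]  # m-by-1 Jacobian
print(gauss_newton(R, J_R, np.array([0.0])))     # converges to a ~= 0.7
```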
### Levenberg-Marquardt Method
When \(J_R^T J_R\) is singular, the Gauss-Newton algorithm becomes unstable. LM employs regularization to overcome this:

\[
\Delta x = -\left( J_R^T J_R + \lambda I \right)^{-1} J_R^T R(x_k)
\]

When \(\lambda > 0\), \(J_R^T J_R + \lambda I\) is guaranteed to be positive definite, since \(J_R^T J_R\) is positive semi-definite. If \(\lambda \rightarrow \infty\), the step behaves like gradient descent; if \(\lambda \rightarrow 0\), it behaves like the Gauss-Newton step.
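A minimal LM sketch; the \(10\times\) schedule for adapting \(\lambda\) is a common heuristic, not part of the derivation above:

```python
import numpy as np

def levenberg_marquardt(R, J_R, x0, lam=1e-3, iters=50):
    """Levenberg-Marquardt (sketch): damped Gauss-Newton with a simple
    heuristic for adapting lambda (the 10x update factor is an assumption)."""
    x = x0
    cost = R(x) @ R(x)
    for _ in range(iters):
        r, J = R(x), J_R(x)
        # Regularized normal equations: (J^T J + lambda I) dx = -J^T r
        dx = np.linalg.solve(J.T @ J + lam * np.eye(len(x)), -J.T @ r)
        x_new = x + dx
        cost_new = R(x_new) @ R(x_new)
        if cost_new < cost:   # accept the step, trust the quadratic model more
            x, cost, lam = x_new, cost_new, lam / 10
        else:                 # reject the step, fall back toward gradient descent
            lam *= 10
    return x
```

With `R` and `J_R` defined as in the Gauss-Newton sketch above, `levenberg_marquardt(R, J_R, np.array([0.0]))` reaches the same solution while staying stable when \(J_R^T J_R\) is near-singular.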
## Robust Estimation

An inlier obeys the model assumption; an outlier deviates significantly from it. MSE penalizes the square of the residual, so it is heavily affected by outliers.
### Random Sample Consensus (RANSAC)
RANSAC is one of the most widely used methods for handling outliers. The procedure (see the sketch below) is:

1. Randomly choose a minimal set of samples and fit the model to them.
2. Count the samples that do not deviate significantly from the fitted model (the consensus set).
3. Repeat, and keep the model with the largest consensus set; its supporting samples are taken as the inliers.
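A minimal RANSAC sketch for 2D line fitting; the inlier threshold and iteration count are assumptions one would tune per problem:

```python
import numpy as np

def ransac_line(points, iters=200, threshold=0.05, seed=0):
    """RANSAC for fitting a line to 2D points (sketch)."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        # 1. Randomly choose a minimal sample (2 points) and fit a line.
        i, j = rng.choice(len(points), size=2, replace=False)
        p, q = points[i], points[j]
        n = np.array([q[1] - p[1], p[0] - q[0]])  # normal to the line p-q
        norm = np.linalg.norm(n)
        if norm < 1e-12:
            continue  # degenerate sample: the two points coincide
        n = n / norm
        c = -n @ p                                # line: n @ x + c = 0
        # 2. Count samples that do not deviate significantly (consensus set).
        inliers = np.abs(points @ n + c) < threshold
        # 3. Keep the model with the largest consensus set.
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers  # refit the model on these inliers afterwards
```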
## Graph Cut and MRF

### Graph Cut
Define pixel dissimilarity as

\[
S(f_i, f_j) = \sum_k \left( f_{ik} - f_{jk} \right)^2
\]

where \(f_i\) is the feature vector of pixel \(i\). Define pixel affinity as

\[
w(i, j) = \exp\left( -\frac{S(f_i, f_j)}{2\sigma^2} \right)
\]

Suppose cut \(C = (V_A, V_B)\) is a partition of the vertices \(V\) of a graph \(G\) into two disjoint subsets \(V_A\) and \(V_B\); then the cost of the cut is

\[
\mathrm{cut}(V_A, V_B) = \sum_{u \in V_A,\, v \in V_B} w(u, v)
\]

Graph cut seeks the partition that minimizes this cost. Moreover, we can normalize the cut:

\[
\mathrm{Ncut}(V_A, V_B) = \frac{\mathrm{cut}(V_A, V_B)}{\mathrm{assoc}(V_A, V)} + \frac{\mathrm{cut}(V_A, V_B)}{\mathrm{assoc}(V_B, V)}
\]

where \(\mathrm{assoc}(V_A, V) = \sum_{u \in V_A,\, t \in V} w(u, t)\) is the total connection from \(V_A\) to all vertices. Minimizing the normalized cut is an NP-complete problem, but it has an approximate solution by eigenvalue decomposition: relax it to the generalized eigenvalue problem \((D - W) y = \lambda D y\), where \(W\) is the affinity matrix and \(D\) its diagonal degree matrix, and take the eigenvector with the second-smallest eigenvalue.
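A minimal normalized-cut sketch following the Shi-Malik relaxation above; the toy affinity matrix and the threshold-at-zero split are assumptions:

```python
import numpy as np
from scipy.linalg import eigh

def ncut_bipartition(W):
    """Approximate normalized cut (sketch): solve (D - W) y = lambda * D y
    and threshold the second-smallest eigenvector to split the graph."""
    D = np.diag(W.sum(axis=1))
    # Generalized eigenproblem; eigh returns eigenvalues in ascending order.
    vals, vecs = eigh(D - W, D)
    y = vecs[:, 1]            # eigenvector of the second-smallest eigenvalue
    return y > 0              # boolean partition of the vertices

# Usage: two dense clusters that are only weakly connected to each other.
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0)
print(ncut_bipartition(W))    # splits {0,1,2} from {3,4,5}
```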
### Segmentation by MRF
Define the joint probability of the observed image \(x\) and the labels \(y\) as

\[
P(x, y) = \frac{1}{Z} \exp\left( -E(x, y) \right)
\]

Then we can define the energy function

\[
E(x, y) = \sum_i \varphi(x_i, y_i) + \sum_{(i, j) \in \mathcal{N}} \psi(y_i, y_j)
\]

where \(\varphi(x_i, y_i)\) is called the unary term, representing the likelihood of the label for each pixel, and \(\psi(y_i, y_j)\) is called the pairwise term, representing the consistency between neighboring pixels \((i, j)\) in the neighborhood system \(\mathcal{N}\). Segmentation then amounts to finding the labeling \(y\) that minimizes \(E(x, y)\).
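Minimizing this energy exactly is hard in general (graph cuts and belief propagation are common choices); the sketch below instead uses iterated conditional modes (ICM), a simple greedy minimizer, with an assumed Potts pairwise term:

```python
import numpy as np

def icm(unary, pairwise_weight, iters=5):
    """Iterated conditional modes (sketch): greedily update each pixel's
    label to reduce the MRF energy. `unary` is a float array of shape
    (H, W, L) with unary[i, j, l] = phi(x_ij, l); the Potts pairwise term
    psi(y_i, y_j) = pairwise_weight * [y_i != y_j] is an assumption."""
    H, W, L = unary.shape
    labels = unary.argmin(axis=2)          # initialize from the unary term only
    for _ in range(iters):
        for i in range(H):
            for j in range(W):
                costs = unary[i, j].copy()
                # Add the pairwise cost against each 4-connected neighbor.
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if 0 <= ni < H and 0 <= nj < W:
                        costs += pairwise_weight * (np.arange(L) != labels[ni, nj])
                labels[i, j] = costs.argmin()
    return labels
```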