Dense disparity estimation using stereo images
“Stereo vision” refers to the ability to infer information about the three-dimensional (3D) structure and distance/depth of a scene from two images captured from different viewpoints. It imitates one of the tasks performed by the human brain with the two eyes. In stereo vision, a scene point is projected onto different locations on the two image planes (left and right cameras), and the main goal is to find the corresponding pixels, i.e., pixels resulting from the projection of the same 3D point onto the two image planes. The displacement between corresponding pixels is called “disparity”, and obtaining it at every pixel location forms a dense disparity map. However, disparity estimation is an ill-posed problem and hence in practice is solved by formulating it as a global energy minimization problem. The energy function combines a “data term” with a “prior term” that restricts the solution space, and choosing suitable data and prior models leads to accurate dense disparity estimates. In this thesis, we address the problem of dense disparity map estimation from rectified stereo images with known camera calibration and propose various approaches for solving it in a global energy minimization framework. We utilize “graph cuts”, an efficient and fast optimization technique, for minimizing our energy functions.

We first propose a method for dense disparity estimation in which the disparity map is modelled using an inhomogeneous Gaussian Markov random field (IGMRF) prior. The estimated IGMRF parameters help us obtain a smooth solution while preserving sharp depth discontinuities. To model the data term, we use a pixel-based intensity matching cost based on the brightness constancy assumption for corresponding pixels. A learning-based approach provides an initial disparity map, which is used to estimate the IGMRF parameters.
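As a concrete illustration of the brightness-constancy data term and the disparity search (a minimal winner-take-all sketch, not the thesis's graph-cut optimization), the matching cost below is the absolute intensity difference between a left pixel and its candidate match in the right image; the function name and the toy images are illustrative.

```python
import numpy as np

def disparity_wta(left, right, max_disp):
    """Winner-take-all disparity from a brightness-constancy data term:
    the cost of assigning disparity d to left pixel (y, x) is the
    absolute intensity difference |left[y, x] - right[y, x - d]|."""
    h, w = left.shape
    cost = np.full((max_disp + 1, h, w), np.inf)
    for d in range(max_disp + 1):
        # compare each left pixel with the right pixel shifted by d
        cost[d, :, d:] = np.abs(left[:, d:] - right[:, : w - d])
    return cost.argmin(axis=0)  # lowest-cost disparity per pixel

# toy pair: every right-image pixel shows what the left image sees
# two columns further along, i.e. a uniform true disparity of 2
left = np.tile(np.arange(10.0), (4, 1))
right = np.roll(left, -2, axis=1)
dmap = disparity_wta(left, right, max_disp=4)
```

A global method replaces the per-pixel `argmin` with a prior term that couples neighbouring pixels, which is exactly what the IGMRF model contributes here.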
The dense disparity map is obtained by minimizing the energy function using graph cuts. In this case, the quality of the final solution is strongly governed by the accuracy of the IGMRF parameters. Though the IGMRF prior captures smoothness with discontinuities, it fails to capture higher-order dependencies such as sparseness in the disparity map. This motivates us to use an additional prior that represents sparsity in disparities. In our next work, we combine the IGMRF and sparsity priors in our energy minimization framework to obtain a dense disparity map. Here, the sparsity prior is defined using a learned overcomplete sparse representation of disparity patches. In this work, instead of making a brightness constancy assumption, we use an intensity matching cost as the data term that is robust against outliers and insensitive to image sampling. We use two different approaches to obtain the sparse representation of disparities. In the first, the sparse representation is obtained with an overcomplete dictionary learned using the “K-singular value decomposition” (K-SVD) algorithm. To better represent the sparseness, a “sparse autoencoder”, a non-linear model, is then used. A two-phase iterative approach yields the final solution. To achieve better performance, a good initial estimate is obtained using a classical local stereo method along with a set of post-processing operations for disparity refinement. The combination of IGMRF and sparsity priors serves as a better regularizer, but the choice of an appropriate data model also plays a key role in obtaining a better disparity map. Although the data term used earlier, based on pixel-wise intensity matching, is robust against outliers and insensitive to image sampling, it relies on raw pixel values; its use may therefore result in ambiguous and erroneous disparities in textureless areas and near depth discontinuities.
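The sparse-coding step that K-SVD alternates with its dictionary update can be sketched with Orthogonal Matching Pursuit. This toy uses a trivially orthonormal dictionary purely for illustration (the thesis learns an overcomplete dictionary of disparity patches); it shows how a signal is represented by a few atoms.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Orthogonal Matching Pursuit: greedily add the dictionary atom
    most correlated with the residual, then refit all selected
    coefficients by least squares."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit over the current support
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

# toy dictionary: 4 orthonormal atoms in R^4; the signal is a
# 2-sparse combination, which OMP recovers exactly
D = np.eye(4)
y = 3.0 * D[:, 1] + 1.5 * D[:, 3]
x = omp(D, y, n_nonzero=2)
```

In the thesis's setting, `y` would be a vectorized disparity patch and the recovered sparse code is what the sparsity prior scores inside the energy function.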
Taking this into account, in our next work we propose a method that incorporates feature matching into the energy function. Hierarchical features of a given stereo image pair are learned using a “deconvolutional network”, a deep learning model trained in an unsupervised way on a database consisting of a large number of stereo images. Combining the feature matching with the intensity matching in our energy function further restricts the solution space.
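A minimal sketch of such a combined data term, assuming per-pixel feature maps have already been extracted (the function names, the L1 distance, and the weight `lam` are illustrative assumptions, not taken from the thesis):

```python
import numpy as np

def feature_match_cost(f_left, f_right, d):
    """L1 distance between per-pixel feature vectors of the left image
    and the right image shifted by disparity d; f_* have shape
    (H, W, C), one C-dimensional learned feature per pixel."""
    h, w, _ = f_left.shape
    cost = np.full((h, w), np.inf)
    cost[:, d:] = np.abs(f_left[:, d:] - f_right[:, : w - d]).sum(axis=2)
    return cost

def combined_data_cost(c_intensity, c_feature, lam=0.5):
    """Hypothetical weighted sum of intensity- and feature-matching
    costs for use as the data term (lam is an assumed weight)."""
    return c_intensity + lam * c_feature

# toy check: 3-channel features shifted by d=1 match exactly
f_left = np.random.default_rng(0).random((2, 5, 3))
f_right = np.roll(f_left, -1, axis=1)
c = feature_match_cost(f_left, f_right, d=1)
```

Because learned features summarize a neighbourhood rather than a single intensity, the feature term stays discriminative in textureless regions where the raw intensity cost is ambiguous.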