Abstract
Accurate depth estimation is crucial for stereo vision systems, which play an essential role in applications such as autonomous driving, robotics, and augmented reality, especially in complex environments. This thesis presents a new approach to improving depth estimation of distant objects by incorporating multi-scale image blending with deep super-resolution techniques.
The proposed approach consists of three main stages. First, multi-scale image fusion is used to combine information from images captured at different resolutions, and a fully automated, unsupervised learning method for multi-camera system calibration is built on this fusion. Second, a fully automated, supervised learning approach based on state-of-the-art deep learning algorithms is applied to object detection. Finally, a fully automated depth estimation method is presented, whose accuracy is validated against ground truth data.
The proposed method was rigorously tested on collected stereo vision datasets covering various scenarios, including different object distances and the presence of multiple objects. A range of evaluation metrics, both subjective and quantitative, was employed to assess the quality of the proposed system. The experimental results demonstrate significant improvements in both resolution and depth estimation accuracy; in particular, the method provides more reliable and realistic depth information in complex real-world situations.
Depth estimation was performed using two methods (R-CNN and YOLO) across two models: the Multi-Camera Calibration System and Multi-Camera Video Generation using Panoramic Images. In the Camera Calibration Model, the YOLO method significantly outperformed R-CNN, with an average error of only 9.4 cm compared to 30.43 cm when measured against ground truth data. Similarly, in the Panoramic Image Model, the YOLO method demonstrated superior accuracy, achieving an average error of 17.2 cm compared to 23.55 cm for R-CNN.