Sensors and Materials, Vol. 36, No. 6 (2024), pp. 2569–2583
S&M 3687, Research Paper of Special Issue
https://doi.org/10.18494/SAM4822
Published: June 28, 2024

Monocular Depth Estimation of 2D Images Based on Optimized U-net with Transfer Learning

Ming-Tsung Yeh, Tsung-Chi Chen, Neng-Sheng Pai, and Chi-Huan Cheng
(Received December 13, 2023; Accepted May 22, 2024)

Keywords: depth estimation, transfer-learning-based U-net, convolutional autoencoder, depth classification
Estimating depth from 2D images is vital in various applications, such as object recognition, scene reconstruction, and navigation, and offers significant advantages in augmented reality, image refocusing, and segmentation. In this paper, we propose an optimized U-net based on a transfer learning encoder and an advanced decoder structure to estimate depth from a single 2D image. The encoder–decoder architecture uses ResNet152V2 as the encoder and an improved U-net-based decoder to achieve accurate depth predictions. The ResNet152V2 encoder was pretrained on the large-scale ImageNet dataset, so its weights already extract rich, generalizable features learned from large-scale image classification. This prior knowledge reduces training time and improves object position recognition. The proposed composite up-sampling block (CUB) in the decoder applies 2× and 4× bilinear interpolation combined with stride-1 transpose convolutions to expand the low-resolution feature maps from the encoder, enabling the network to recover finer details. Skip connections enhance the representation power of the decoder: the output of each up-sampling block is concatenated with the corresponding pooling layer of the encoder. This fusion of features from different scales captures both local and global context, contributing to more accurate depth predictions. The network is trained on RGB images and the corresponding depth maps from the NYU Depth Dataset V2. The experimental results demonstrate that the transfer-learning-based encoder, coupled with the proposed decoder and data augmentation techniques, transforms complex RGB images into accurate depth maps. The system accurately classifies different depth ranges over depth data from 0.4 to 10 m.
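The core decoder operation described above can be illustrated with a minimal NumPy sketch. The function names `bilinear_upsample_2x` and `cub_step` are hypothetical, and the learned stride-1 transpose convolution is omitted for brevity, so this shows only the interpolate-and-concatenate part of a CUB step, not the authors' full implementation.

```python
import numpy as np

def bilinear_upsample_2x(x):
    """Bilinearly upsample a (H, W, C) feature map to (2H, 2W, C).

    Uses half-pixel sampling centers, the convention most deep learning
    frameworks use for align_corners=False resizing.
    """
    h, w, _ = x.shape
    # Source coordinates for each output pixel (half-pixel centers).
    ys = np.clip((np.arange(2 * h) + 0.5) / 2 - 0.5, 0, h - 1)
    xs = np.clip((np.arange(2 * w) + 0.5) / 2 - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]   # vertical interpolation weights
    wx = (xs - x0)[None, :, None]   # horizontal interpolation weights
    top = x[y0][:, x0] * (1 - wx) + x[y0][:, x1] * wx
    bot = x[y1][:, x0] * (1 - wx) + x[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def cub_step(low_res, skip):
    """One decoder step: upsample low-resolution features 2x and fuse
    them with the encoder skip connection by channel concatenation."""
    up = bilinear_upsample_2x(low_res)
    return np.concatenate([up, skip], axis=-1)
```

In the full architecture, the concatenated output would then pass through convolutions (including the stride-1 transpose convolution of the CUB) before the next up-sampling stage.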
By mapping different depths to corresponding colors using gradational color scales, precise depth classification can be performed on 2D images.
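The depth-to-color mapping can be sketched as follows, assuming (hypothetically) that the 0.4–10 m range is discretized into ten uniform classes and colored with a simple blue-to-red gradient; the exact class boundaries and color scale used by the authors are not specified in the abstract.

```python
import numpy as np

DEPTH_MIN, DEPTH_MAX, N_CLASSES = 0.4, 10.0, 10  # assumed discretization

def depth_to_class(depth_m):
    """Map each depth value (meters) to an integer class 0..N_CLASSES-1."""
    d = np.clip(depth_m, DEPTH_MIN, DEPTH_MAX)
    t = (d - DEPTH_MIN) / (DEPTH_MAX - DEPTH_MIN)  # normalize to [0, 1]
    return np.minimum((t * N_CLASSES).astype(int), N_CLASSES - 1)

def class_to_color(cls):
    """Gradational color scale: near = blue, far = red (RGB in 0..255)."""
    t = cls / (N_CLASSES - 1)
    r = (255 * t).astype(np.uint8)
    b = (255 * (1 - t)).astype(np.uint8)
    return np.stack([r, np.zeros_like(r), b], axis=-1)
```

Applying these two functions pixelwise to a predicted depth map yields a false-color image in which each color band corresponds to one depth class.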
Corresponding author: Neng-Sheng Pai

This work is licensed under a Creative Commons Attribution 4.0 International License.

Cite this article: Ming-Tsung Yeh, Tsung-Chi Chen, Neng-Sheng Pai, and Chi-Huan Cheng, Monocular Depth Estimation of 2D Images Based on Optimized U-net with Transfer Learning, Sens. Mater., Vol. 36, No. 6, 2024, pp. 2569–2583.