
RANSAC-Scaled Depth: A Dual-Teacher Framework for Metric Depth Annotation in Data-Scarce Scenarios
M.V. Lazukov 1,2, A.V. Shoshin 2, P.V. Belyaev 2, E.A. Shvets 2

1 Moscow Institute of Physics and Technology,
117303, Moscow, Russia, Kerchenskaya St. 1A;
2 NVI Solutions LLC,
115191, Moscow, Russia, Kholodilny Lane 3, block 1, building 3, office 3103


DOI: 10.18287/COJ1810

Pages: 1174-1181.

Article language: English.

Abstract:
This paper addresses the problem of training metric monocular depth estimation models for specialized domains in the absence of labeled real-world data. We propose a hybrid pseudo-labeling method that combines the predictions of two models: a metric "teacher," trained on synthetic data to obtain the correct scale, and a foundational relative "teacher" that provides structurally accurate scene geometry. The relative depth map is calibrated via a linear transformation whose parameters are found with the outlier-robust RANSAC algorithm on a subset of "support" points. Experiments on the KITTI dataset show that the proposed approach improves the quality of the pseudo-labels, reducing the commonly used AbsRel error metric by 21.6 % compared to the baseline method. A compact "student" model trained on these labels outperforms the baseline model, achieving a 23.8 % reduction in AbsRel and a 13.8 % reduction in RMSE log. The results confirm that the proposed method significantly improves adaptation from the general-purpose domain to the specific target domain, allowing high-precision metric models to be created without collecting and annotating large volumes of real data.
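For illustration, the sketch below shows one possible form of the calibration step described above: fitting d_metric ≈ a·d_rel + b with RANSAC over sparse support points, where the relative map comes from the relative "teacher" and the metric values at the support points come from the metric "teacher." It is a minimal sketch assuming NumPy; the function name, the inlier threshold, and the use of a boolean support mask are illustrative assumptions, not the authors' implementation.

import numpy as np

def calibrate_relative_depth(rel_depth, metric_depth, support_mask,
                             n_iters=1000, inlier_thresh=0.5, rng=None):
    """Fit d_metric ~= a * d_rel + b with RANSAC on sparse support points.

    rel_depth    : (H, W) relative depth map from the relative "teacher".
    metric_depth : (H, W) metric depth map from the metric "teacher".
    support_mask : (H, W) boolean mask of support points used for fitting.
    Returns the calibrated (H, W) metric depth map and the (a, b) pair.
    Hypothetical sketch: parameter names and thresholds are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rel_depth[support_mask].ravel()     # relative depths at support points
    y = metric_depth[support_mask].ravel()  # metric depths at the same points

    best_inliers, best_params = 0, (1.0, 0.0)
    for _ in range(n_iters):
        i, j = rng.choice(x.size, size=2, replace=False)
        if np.isclose(x[i], x[j]):
            continue
        # Line through the two sampled correspondences.
        a = (y[j] - y[i]) / (x[j] - x[i])
        b = y[i] - a * x[i]
        inliers = np.abs(a * x + b - y) < inlier_thresh
        if inliers.sum() > best_inliers:
            best_inliers = inliers.sum()
            # Least-squares refit of (a, b) on the current inlier set.
            A = np.stack([x[inliers], np.ones(best_inliers)], axis=1)
            best_params = tuple(np.linalg.lstsq(A, y[inliers], rcond=None)[0])

    a, b = best_params
    return a * rel_depth + b, (a, b)

How the support points are selected is not specified here; the mask is treated as given.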

Keywords:
monocular metric depth estimation, synthetic data, RANSAC, pseudo-labeling, domain adaptation.

Citation:
Lazukov MV, Shoshin AV, Belyaev PV, Shvets EA. RANSAC-Scaled Depth: A Dual-Teacher Framework for Metric Depth Annotation in Data-Scarce Scenarios. Computer Optics 2025; 49(6): 1174-1181. DOI: 10.18287/COJ1810.

References:

  1. Zhang J. Survey on Monocular Metric Depth Estimation. arXiv Preprint. 2025. Source: <https://arxiv.org/abs/2501.11841>. DOI: 10.48550/arXiv.2501.11841.
  2. Zhao Y, Bian H, Chen K, Ji P, Qu L, Lin S-Y, Yu W, Li H, Chen H, Shen J, Raj B, Xu M. Metric from Human: Zero-shot Monocular Metric Depth Estimation via Test-time Adaptation. NIPS '24: Proceedings of the 38th International Conference on Neural Information Processing Systems 2024; 104724-104753. ISBN: 9798331314385.
  3. Piccinelli L, Yang Y-H, Sakaridis C, Segu M, Li S, Van Gool L, Yu F. UniDepth: Universal monocular metric depth estimation. In: Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition (CVPR); 2024. DOI: 10.1109/CVPR52733.2024.00963.
  4. Zhang Z, Zhang Y, Li Y, Wu L. Review of monocular depth estimation methods. Journal of Electronic Imaging 2025; 34(2):020901. DOI: 10.1117/1.JEI.34.2.020901.
  5. Chen J, Lu W, Yuan L, Wu Y, Xue F. Estimating construction waste truck payload volume using monocular vision. Resources, Conservation and Recycling 2022; 177:106013. DOI: 10.1016/j.resconrec.2021.106013.
  6. Yang T, Wei S, Fan L, Zhang L. Perspective transform-based depth estimation of monocular camera for electrocution threat determination of construction machinery. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences 2024; XLVIII-4-2024:725-730. DOI: 10.5194/isprs-archives-XLVIII-4-2024-725-2024.
  7. Wang L, Wang B, Wang S, Ma F, Dong X, Yao L, Ma H, Mohamed MA. An effective method for sensing power safety distance based on monocular vision depth estimation. International Transactions on Electrical Energy Systems 2023; 2023(1):8480342. DOI: 10.1155/2023/8480342.
  8. Butt M, Nasir N, Rashid R. A review of perception sensors, techniques, and hardware architectures for autonomous low-altitude UAVs in non-cooperative local obstacle avoidance. Robotics and Autonomous Systems 2024; 173:104629. DOI: 10.1016/j.robot.2024.104629.
  9. Andriyanov N. Estimating object coordinates using convolutional neural networks and Intel Real Sense D415/D455 depth maps. International Conference on Information Technology and Nanotechnology (ITNT) 2022; 1-4. DOI: 10.1109/ITNT55410.2022.9848700.
  10. Kokhan VL, Konyushenko ID, Bocharov DA, Seleznev IO, Nikolaev IP, Nikolaev DP. TSQ-2024: A categorized dataset of 2D LiDAR images of moving dump trucks in various environment conditions. In: Osten W, Nikolaev D, Debayle J, eds. ICMV 2024; 2024. 13517:1351709-1-1351709-6. DOI: 10.1117/12.3055203.
  11. Vasiljevic I, Kolkin N, Zhang S, Luo R, Wang H, Dai FZ, Daniele AF, Mostajabi M, Basart S, Walter MR, Shakhnarovich G. DIODE: A Dense Indoor and Outdoor Depth Dataset. arXiv Preprint. 2019. Source: <https://arxiv.org/abs/1908.00463>. DOI: 10.48550/arXiv.1908.00463.
  12. Dong Q, Zhou Z, Qiu X, Zhang L. A survey on self-supervised monocular depth estimation based on deep neural networks. IEEE Transactions on Neural Networks and Learning Systems 2025; 1-21. DOI: 10.1109/TNNLS.2025.3552598.
  13. Zhihang S, He Z, Qiming M, Ming R, Mao Z, Pei H, Peng L, Hu J, Yao D, Zhang Y. Synthetic datasets for autonomous driving: A survey. IEEE Transactions on Intelligent Vehicles 2023; PP:1-19. DOI: 10.1109/TIV.2023.3331024.
  14. Izquierdo S, Civera J. SfM-TTR: Using structure from motion for test-time refinement of single-view depth networks. In: Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition (CVPR); 2023. 21466-21476. DOI: 10.1109/CVPR52729.2023.02056.
  15. Li Z, Snavely N. MegaDepth: Learning single-view depth prediction from Internet photos. In: Proc IEEE/CVF Conf on Computer Vision and Pattern Recognition (CVPR); 2018. 2041-2050. DOI: 10.1109/CVPR.2018.00218.
  16. Liu X, Sinha A, Ishii M, Hager GD, Reiter A, Taylor RH, Unberath M. Dense depth estimation in monocular endoscopy with self-supervised learning methods. IEEE Transactions on Medical Imaging 2020; 39(5):1438-1447. DOI: 10.1109/TMI.2019.2950936.
  17. Roberts M, Ramapuram J, Ranjan A, Kumar A, Bautista MA, Paczan N, Webb R, Susskind JM. Hypersim: A photorealistic synthetic dataset for holistic indoor scene understanding. In: International Conference on Computer Vision (ICCV); 2021. Source: <https://arxiv.org/pdf/2011.02523>.
  18. Gaidon A, Wang Q, Cabon Y, Vig E. Virtual worlds as proxy for multi-object tracking analysis. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. 4340-4349. DOI: 10.1109/CVPR.2016.470.
  19. Ros G, Sellart L, Materzynska J, Vazquez D, Lopez AM. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016. 3234-3243. DOI: 10.1109/CVPR.2016.352.
  20. Karlo K. CARLA dataset for monocular depth estimation with varying camera parameters. Zenodo. 2023. DOI: 10.5281/zenodo.7899804.
  21. McCormac J, Handa A, Leutenegger S, Davison AJ. SceneNet RGB-D: Can 5M synthetic images beat generic ImageNet pre-training on indoor segmentation? In: IEEE International Conference on Computer Vision (ICCV); 2017. 2697-2706. DOI: 10.1109/ICCV.2017.292.
  22. Yang L, Kang B, Huang Z, Zhao Z, Xu X, Feng J, Zhao H. Depth Anything V2. arXiv Preprint. 2024. Source: <https://arxiv.org/abs/2406.09414>. DOI: 10.48550/arXiv.2406.09414.
  23. Bochkovskii A, Delaunoy A, Germain H, Santos M, Zhou Y, Richter S-R, Koltun V. Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. arXiv Preprint. 2024. Source: <https://arxiv.org/abs/2410.02073>. DOI: 10.48550/arXiv.2410.02073.
  24. Li Z, Bhat SF, Wonka P. PatchRefiner: Leveraging synthetic data for real-domain high-resolution monocular metric depth estimation. Computer Vision - ECCV 2024; 2024. 15125:250-267. DOI: 10.1007/978-3-031-72855-6_15.
  25. Ke B, Obukhov A, Huang S, Metzger N, Daudt RC, Schindler K. Repurposing diffusion-based image generators for monocular depth estimation. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2024. 9492-9502. DOI: 10.1109/CVPR52733.2024.00907.
  26. Man K, Chahl J. A review of synthetic image data and its use in computer vision. Journal of Imaging 2022; 8(11):310. DOI: 10.3390/jimaging8110310.
  27. Toldo M, Maracani A, Michieli U, Zanuttigh P. Unsupervised domain adaptation in semantic segmentation: A review. Technologies 2020; 8(2):35. DOI: 10.3390/technologies8020035.
  28. Wilson G, Cook DJ. A survey of unsupervised deep domain adaptation. ACM Transactions on Intelligent Systems and Technology 2020; 11(5):1–46. DOI: 10.1145/3400066.
  29. Chhabra S, Venkateswara H, Li B. Domain Adaptation Using Pseudo Labels. arXiv Preprint. 2024. Source: <https://arxiv.org/abs/2402.06809>. DOI: 10.48550/arXiv.2402.06809.
  30. Yen Y-T, Lu C-N, Chiu W-C, Tsai Y-H. 3D-PL: Domain adaptive depth estimation with 3D-aware pseudo-labeling. Computer Vision - ECCV 2022. 710-728. DOI: 10.1007/978-3-031-19812-0_41.
  31. Kage P, Rothenberger JC, Andreadis P, Diochnos DI. A Review of Pseudo-Labeling for Computer Vision. arXiv Preprint. 2024. Source: <https://arxiv.org/abs/2408.07221>. DOI: 10.48550/arXiv.2408.07221.
  32. Pham D-H, Do T, Nguyen P, Hua B-S, Nguyen K, Nguyen R. SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation. arXiv Preprint. 2024. Source: <https://arxiv.org/abs/2411.18229>. DOI: 10.48550/arXiv.2411.18229.
  33. Marsal R, Chapoutot A, Xu P, Filliat D. A Simple yet Effective Test-Time Adaptation for Zero-Shot Monocular Metric Depth Estimation. arXiv Preprint. 2024. Source: <https://arxiv.org/abs/2412.14103>. DOI: 10.48550/arXiv.2412.14103.
  34. Skoryukina N, Arlazarov VV, Nikolaev DP. Fast method of ID documents location and type identification for mobile and server application. In: ICDAR 2019. 850–857. DOI: 10.1109/ICDAR.2019.00141.
  35. Bugai OA, Kulagin PA, Polevoy DV, Nikolaev DP. Orthotropic alignment for X-ray computed tomography images. In: Pang X, editor. Fifth Symposium on Pattern Recognition and Applications; 2025. 135400E-1–135400E-7. DOI: 10.1117/12.3056308.
  36. Rybakova EO, Trusov AV, Limonova EE, Skoryukina NS, Bulatov KB, Nikolaev DP. PESAC, the generalized framework for RANSAC-based methods on SIMD computing platforms. IEEE Access 2023; 11:82151-82166. DOI: 10.1109/ACCESS.2023.3301777.
  37. Geiger A, Lenz P, Urtasun R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Proc IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2012. 3354-3361. DOI: 10.1109/CVPR.2012.6248074.
  38. Eigen D, Puhrsch C, Fergus R. Depth map prediction from a single image using a multi-scale deep network. In: Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS); 2014. 2:2366-2374.
  39. Cabon Y, Murray N, Humenberger M. Virtual KITTI 2. arXiv Preprint. 2020. Source: <https://arxiv.org/abs/2001.10773>. DOI: 10.48550/arXiv.2001.10773.
