Learning Depth With Very Sparse Supervision


Loquercio, Antonio; Dosovitskiy, Alexey; Scaramuzza, Davide (2020). Learning Depth With Very Sparse Supervision. IEEE Robotics and Automation Letters, 5(4):5542-5549.

Abstract

Motivated by the astonishing capabilities of natural intelligent agents and inspired by theories from psychology, this paper explores the idea that perception gets coupled to 3D properties of the world via interaction with the environment. Existing works for depth estimation require either massive amounts of annotated training data or some form of hard-coded geometrical constraint. This paper explores a new approach to learning depth perception requiring neither of those. Specifically, we propose a novel global-local network architecture that can be trained with the data observed by a robot exploring an environment: images and extremely sparse depth measurements, down to even a single pixel per image. From a pair of consecutive images, the proposed network outputs a latent representation of the camera's and scene's parameters, and a dense depth map. Experiments on several datasets show that, when ground truth is available even for just one of the image pixels, the proposed network can learn monocular dense depth estimation up to 22.5% more accurately than state-of-the-art approaches. We believe that this work, in addition to its scientific interest, lays the foundations to learn depth with extremely sparse supervision, which can be valuable to all robotic systems acting under severe bandwidth or sensing constraints.
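To make concrete what supervision from "a single pixel per image" could look like in a training objective, here is a minimal sketch. It is not taken from the paper: the L1 penalty, the function name `sparse_l1_loss`, and the use of `None` to mark pixels without a depth measurement are all assumptions chosen for illustration, and the paper's global-local network is not modeled here.

```python
from typing import List, Optional

# Hypothetical representation: None marks a pixel with no depth measurement.
SparseDepth = List[List[Optional[float]]]


def sparse_l1_loss(pred: List[List[float]], sparse_gt: SparseDepth) -> float:
    """Mean absolute error over only the pixels that carry a measurement."""
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, sparse_gt):
        for p, g in zip(pred_row, gt_row):
            if g is not None:  # skip unsupervised pixels entirely
                total += abs(p - g)
                count += 1
    if count == 0:
        raise ValueError("no supervised pixels in this image")
    return total / count


# A 2x3 predicted depth map (metres) supervised at exactly one pixel.
pred = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
gt = [[None, None, None],
      [None, 5.5, None]]
loss = sparse_l1_loss(pred, gt)
```

In this toy case only the pixel at row 1, column 1 contributes to the loss; every other prediction is unconstrained by the ground truth, which is the regime the abstract describes.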

Additional indexing

Item Type: Journal Article, refereed, original work
Communities & Collections: 03 Faculty of Economics > Department of Informatics
Dewey Decimal Classification: 000 Computer science, knowledge & systems
Scopus Subject Areas:
  Physical Sciences > Control and Systems Engineering
  Physical Sciences > Biomedical Engineering
  Physical Sciences > Human-Computer Interaction
  Physical Sciences > Mechanical Engineering
  Physical Sciences > Computer Vision and Pattern Recognition
  Physical Sciences > Computer Science Applications
  Physical Sciences > Control and Optimization
  Physical Sciences > Artificial Intelligence
Language: English
Date: 2020
Deposited On: 27 Jan 2021 08:37
Last Modified: 28 Jan 2021 21:00
Publisher: Institute of Electrical and Electronics Engineers
ISSN: 2377-3766
OA Status: Green
Publisher DOI: https://doi.org/10.1109/lra.2020.3009067
Related URLs: https://ieeexplore.ieee.org/document/9140363
Other Identification Number: merlin-id:20319

Download

Green Open Access

Content: Accepted Version
Filetype: PDF
Size: 5MB