Abstract

Convolutional Neural Network for Depth and Odometry Estimation from Monocular Video

Student:	Mawe Sprenger
Year:	2019
Topic:	Robotics
Keywords:	3D maps, convolutional neural network, depth estimation, ego-motion, monocular video, odometry, pose estimation, unsupervised learning

Three-dimensional maps are an important source of information for the safe navigation of self-driving cars or autonomous robots. Creating such maps requires two data sources: continuous depth measurements, and the position of the sensor at each measurement, so that all local measurements can be fused into a global map. However, exact measurements of depth and position are only possible with expensive equipment. To overcome this problem, this thesis develops and implements a system for the simultaneous estimation of depth and camera ego-motion from monocular video streams. For this purpose, a convolutional neural network is trained using a novel unsupervised learning method that enables learning of absolute-scale depth and odometry without ground-truth pose labels. To obtain depth and pose estimates that are as accurate as possible, different aspects of the learning method and of the implemented system are tested and evaluated for their relevance. Since the quality of pose estimation depends significantly on accurate depth estimation, methods that make use of temporal depth cues are tested for possible improvements. The developed system is evaluated with regard to its performance, and the estimated depth and odometry data are used to create 3D maps.
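The fusion of local depth measurements into a global map can be illustrated with a minimal sketch. It is not the implementation from the thesis. It assumes a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`) and a 4x4 camera-to-world pose per frame. The function names `backproject`, `to_world`, and `fuse` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]


def backproject(depth: Matrix, fx: float, fy: float,
                cx: float, cy: float) -> List[Point]:
    """Turn a depth map into 3D points in the camera frame.

    Assumes a pinhole model: pixel (u, v) with depth z maps to
    ((u - cx) * z / fx, (v - cy) * z / fy, z).
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0.0:  # skip pixels without a valid depth estimate
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_world(points: Sequence[Point], pose: Matrix) -> List[Point]:
    """Apply a 4x4 camera-to-world pose (row-major nested lists)."""
    world: List[Point] = []
    for x, y, z in points:
        world.append((
            pose[0][0] * x + pose[0][1] * y + pose[0][2] * z + pose[0][3],
            pose[1][0] * x + pose[1][1] * y + pose[1][2] * z + pose[1][3],
            pose[2][0] * x + pose[2][1] * y + pose[2][2] * z + pose[2][3],
        ))
    return world


def fuse(frames: Sequence[Tuple[Matrix, Matrix]], fx: float, fy: float,
         cx: float, cy: float) -> List[Point]:
    """Accumulate the points of all (depth, pose) frames into one cloud."""
    cloud: List[Point] = []
    for depth, pose in frames:
        cloud.extend(to_world(backproject(depth, fx, fy, cx, cy), pose))
    return cloud
```

In this sketch both the depth maps and the poses are simply given as inputs. In the system described above they would be estimated by the network from monocular video, so the scale of the resulting map depends on how well absolute scale is recovered.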