Self-supervised Estimation of Depth and Ego-motion to Detect Moving Objects in an Unconstrained Monocular Video

M. Afroze

Detecting moving objects in dynamic scenes is challenging because the apparent motion in the image combines the camera's ego-motion with the independent motion of the objects. This thesis concentrates on classifying objects returned by an object detection system as moving or stationary while the camera itself is moving. To this end, depth and ego-motion are first estimated from consecutive frames of a monocular video by an end-to-end self-supervised network. The supervision signal comes from the video data itself, so no ground-truth data is required. Different deep neural network architectures will be evaluated, and an appropriate model for moving object detection will be selected. Furthermore, spatio-temporal information from consecutive frames will be exploited to devise a suitable loss function. Subsequently, the objects are classified as moving or stationary based on their 2D projection onto the target image, computed from the estimated depth and ego-motion of the scene. The proposed approach will be trained on the KITTI and Cityscapes datasets. Finally, the developed system will be evaluated on the KITTI2015 dataset, as it provides ground-truth data for the segmentation of moving cars.
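
The projection-based classification step can be illustrated with a minimal NumPy sketch. It is only one possible reading of the idea, not the method developed in the thesis. It assumes a pinhole camera with known intrinsics `K`, an estimated relative pose (`R`, `t`) that maps source-camera coordinates to target-camera coordinates, per-pixel depth for the object's pixels, and observed pixel correspondences in the target frame (for example from tracking). The function names `reproject` and `is_moving` and the threshold `threshold_px` are hypothetical.

```python
import numpy as np


def reproject(pixels, depth, K, R, t):
    """Project source-frame pixels into the target frame, assuming a static scene.

    pixels: (N, 2) pixel coordinates in the source frame
    depth:  (N,) estimated depth of each pixel in the source frame
    K:      (3, 3) camera intrinsics (assumed known)
    R, t:   estimated rotation (3, 3) and translation (3,) from source to target
    """
    ones = np.ones((pixels.shape[0], 1))
    homogeneous = np.hstack([pixels, ones])        # (N, 3)
    rays = homogeneous @ np.linalg.inv(K).T        # back-project to unit-depth rays
    points = rays * depth[:, None]                 # 3D points in the source camera
    moved = points @ R.T + t                       # same points in the target camera
    projected = moved @ K.T                        # apply intrinsics
    return projected[:, :2] / projected[:, 2:3]    # perspective division


def is_moving(src_pixels, observed_tgt_pixels, depth, K, R, t, threshold_px=3.0):
    """Flag an object as moving if ego-motion alone does not explain its displacement.

    threshold_px is an illustrative value, not one taken from the thesis.
    """
    predicted = reproject(src_pixels, depth, K, R, t)
    residual = np.linalg.norm(predicted - observed_tgt_pixels, axis=1)
    return float(np.median(residual)) > threshold_px
```

If the object is stationary, the positions predicted from depth and ego-motion should agree with the observed positions up to estimation noise. A large median residual suggests that the object has motion of its own. The median is used here only as a simple way to tolerate a few bad depth estimates.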