
Semantic 3D Scene Analysis in Interior Environments


Summary

Equipping autonomous robots with techniques of artificial intelligence has become a central aspect of robotics research. One of the most essential challenges for autonomous robots is their ability to adapt to and interact with their environment.
To increase these abilities, numerous techniques for object detection and recognition have been applied over the last decades, most of them based on 2D image processing. In this project, we focus on recognizing objects and understanding entire scenes based on 3D information about the environment instead of 2D images.
One way to acquire such 3D information is with actuated laser rangefinders (LRF) installed on mobile robots. The acquired data is made available as a cloud of 3D points. Such point clouds typically contain several hundred thousand unordered points, a quantity that impedes an instantaneous interpretation of the whole data set.
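
As a purely illustrative sketch (not the exact sensor model used in this project), the following Python snippet converts the raw readings of one actuated LRF sweep into an unordered cloud of 3D points; the angle conventions are assumptions that depend on how the scanner is actually mounted and actuated.

import numpy as np

def lrf_sweep_to_points(ranges, beam_angles, tilt_angles):
    """Convert one actuated LRF sweep (range, in-plane beam angle and actuation tilt angle
    per reading) into an (N, 3) array of Cartesian points in the scanner frame."""
    r = np.asarray(ranges, dtype=float)
    az = np.asarray(beam_angles, dtype=float)   # azimuth within the scan plane
    el = np.asarray(tilt_angles, dtype=float)   # elevation introduced by the actuation unit
    x = r * np.cos(el) * np.cos(az)
    y = r * np.cos(el) * np.sin(az)
    z = r * np.sin(el)
    return np.column_stack((x, y, z))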

The aim of this project is to analyze interior environments, such as domestic living spaces, offices or laboratories, with the help of autonomous robots. A major portion of this analysis focuses on the automatic generation of semantic maps. Such map representations do not only hold distance values projected into one common coordinate system; they also hold descriptions of how particular combinations of geometric features relate to domains or classes of objects. For instance, we can determine what type of room we are dealing with, what kinds of objects are located within that environment, and which functions and interaction schemes can be related to those objects.
Based on the category of an environment and the objects it contains, we can assign more complex tasks to a robot (manipulation tasks, personal assistance, etc.).
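
As a rough illustration of what such a representation could look like (not the data structure actually used in this project), a semantic map may couple the raw geometry with per-point labels and a table of object records:

from dataclasses import dataclass, field
import numpy as np

@dataclass
class SemanticMap:
    points: np.ndarray    # (N, 3) point positions in a common map frame
    labels: np.ndarray    # (N,) object id per point, -1 = not yet labeled
    objects: dict = field(default_factory=dict)
    # objects maps an object id to a description such as
    # {"class": "chair", "room": "office", "interactions": ["sit", "move"]}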

Topics


The following topics are addressed in this project:


Scene analysis of individual interiors

For many tasks it is important to reconstruct the physical arrangement of the surrounding environment in an initial processing step. Understanding the spatial extent of a scene supports the categorization of the environment as well as the interpretation of the objects it contains. We would like to develop methods that allow for a fast reconstruction of a room from a single panoramic 3D scan capturing its main components.
A first step towards such a reconstruction can be taken with segmentation techniques that decompose the cloud into several regions based on local point features. In domestic rooms, where mostly planar boundaries can be expected, it is convenient to apply a segmentation that identifies coherent points forming planar surfaces. As a consequence, we can segment the point cloud into regions, in particular those corresponding to the principal planar boundaries of a room such as walls, floors and ceilings. In most real-life environments, however, this differentiation becomes difficult, as walls can be largely occluded by cluttered furniture and certain sections of surfaces can be isolated by windows or doors.
Hence, advanced methods are required to discriminate between points belonging to the planar boundaries of a room and those belonging to objects contained within the room (in complicated cases located close to the boundary or possessing similar features). The existence of openings additionally requires deciding whether points are located inside or outside the current room.
To be able to apply our system in various environments, we would like to cope with more complex scenarios such as ceiling parts that are not necessarily parallel to the floor, multiple height levels within one room, etc.
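
A minimal sketch of such a planar segmentation, assuming a simple RANSAC-style plane search (the thresholds and iteration counts are arbitrary illustration values, not tuned parameters of our system):

import numpy as np

def ransac_plane(points, n_iter=500, inlier_dist=0.02, seed=0):
    """Fit one dominant plane n·p + d = 0; return (normal, d) and the inlier mask."""
    rng = np.random.default_rng(seed)
    best_mask, best_model = None, None
    for _ in range(n_iter):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        if np.linalg.norm(normal) < 1e-9:
            continue  # degenerate (collinear) sample
        normal = normal / np.linalg.norm(normal)
        d = -normal @ sample[0]
        mask = np.abs(points @ normal + d) < inlier_dist
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask, best_model = mask, (normal, d)
    return best_model, best_mask

def segment_planes(points, max_planes=6, min_inliers=5000):
    """Greedily extract planar regions (walls, floor, ceiling); the rest stays unsegmented."""
    remaining, planes = points, []
    for _ in range(max_planes):
        model, mask = ransac_plane(remaining)
        if mask is None or mask.sum() < min_inliers:
            break
        planes.append((model, remaining[mask]))
        remaining = remaining[~mask]
    return planes, remaining
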

Intelligent exploration and navigation

The range of applications deploying mobile robots for exploring interior environments should not be restricted to single rooms. Instead, it is desirable to navigate corridors or even entire buildings as far as they are accessible to the robot. For a continuous extension of its map representation, the robot needs to identify those locations at which it is beneficial to acquire new scans of areas not yet contained in the map. This includes the ability of the exploration behavior to pass beyond the confinement of one room and, further, to decide which areas cannot be navigated due to the condition of the scene (e.g. the ground surface) or the restrictions of the robot (e.g. dimensions, mobility).

Special treatment needs to be devoted to the transition between rooms, that is, passing through doorways from one room to another. In this case we may perceive a large portion of the current room and little of the target room. Once we perform another 3D scan after reaching the target room, we may perceive many features of that room and little of the previous room. This unbalanced overlap ratio may cause an erroneous registration of the scans and thus an inconsistent map. Therefore, we would like to differentiate between multiple types of pathways, such as doors, windows and passages, and regard them as landmarks within our map. To enhance the semantic representation, we are interested in storing additional information as attributes. This can, for instance, include whether these landmarks are reachable or which states can be assigned to them, e.g. an open vs. a closed state for doors.
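
The unbalanced overlap mentioned above can be made explicit with a simple (illustrative) measure: the fraction of points in each scan that have a close neighbor in the other scan once both are expressed in a common frame. The distance threshold below is an arbitrary example value.

import numpy as np
from scipy.spatial import cKDTree

def overlap_ratios(scan_a, scan_b, radius=0.05):
    """Fraction of points of each registered scan that have a neighbor in the other scan."""
    d_ab, _ = cKDTree(scan_b).query(scan_a)   # nearest neighbor of each a-point in b
    d_ba, _ = cKDTree(scan_a).query(scan_b)
    # strongly differing ratios hint at an unbalanced overlap, e.g. across a doorway
    return (d_ab < radius).mean(), (d_ba < radius).mean()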

Semantic 3D mapping

The accumulation of individual 3D laser scans taken from different poses leads to a global map representation that connects multiple scans from different viewpoints. In order to merge multiple scans, the point clouds have to be registered (that is, the transformation between the clouds has to be determined) and, in case of a successful registration, transformed into a common map representation (cf. fig. 4). However, plain points need not be the only information stored in such a map: we would like to be able to annotate semantic properties in the map, for instance to which object a certain subset of points belongs and how this set of points relates to other points contained in the map.
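
As an illustration of the registration step, the following sketch estimates the rigid transformation between two clouds from a set of corresponding point pairs (e.g. as produced within one iteration of an ICP-style method) using the standard SVD-based least-squares solution; the correspondences themselves are assumed to be given.

import numpy as np

def rigid_transform(src, dst):
    """Least-squares R, t with R @ src[i] + t ≈ dst[i] for corresponding points (Kabsch/SVD)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:   # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t
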
Another question deals with update policies for static map sections. Assume certain sets of points in the map have already been categorized and labeled. If the map is now extended with new scans, it is possible that the new points are located in areas that have already been labeled. Here, the system has to determine which of the new points can be associated with existing labels without initiating a recognition process on the entire point cloud.

The process of appending points should also cope with the dynamic character of the environment. Assume a robot scans a door in a closed state at one instant of time, and in a later scan the same door is found in an open state. The system should be able to detect this anomaly and mark it appropriately in the map. Besides storing the information that a change occurred, it would also be interesting to gain information about the kinematic structure of the object causing the change.

Finally, the mapping system should be able to determine which parts of the map are still incomplete and require further scanning. It should estimate the next best scan positions and calculate a route to those locations. If only partial scans are needed to complete the map, it should determine the sufficient scanning field of view and actuation angles in order to reduce processing time.
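
For the association of newly merged points with existing labels, one straightforward (purely illustrative) policy is a nearest-neighbor lookup against the already labeled map points; points without a sufficiently close labeled neighbor remain unknown and become candidates for a new recognition pass. The distance threshold is an example value.

import numpy as np
from scipy.spatial import cKDTree

def propagate_labels(map_points, map_labels, new_points, max_dist=0.05, unknown=-1):
    """Give each new point the label of its nearest labeled map point, if close enough."""
    dist, idx = cKDTree(map_points).query(new_points)
    return np.where(dist < max_dist, np.asarray(map_labels)[idx], unknown)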

Occlusion analysis & scene prediction

Due to the single viewpoint of individual scans, objects in the foreground (with respect to the scanning device) may obscure objects located further in the background. As a result, the scanning device cannot acquire distance values for object sections located in the occluded area; a subset of scan points is missing and the point cloud exhibits holes.
If multiple 3D scans from different viewpoints are combined, occlusions may be reduced to a certain degree. In cluttered environments, however, not all occlusions can be avoided, as not every next-best-view position can actually be reached by the robot. Within our project, we would like to address the detection and the recovery of occlusions that remain even after merging multiple 3D scans.
The problem of resolving such occlusions depends on the complexity of the occluded objects as well as the complexity of the objects responsible for the occlusion. Although it might not always be possible to recover scenes involving overly complex shapes, we strive for algorithms that identify the structures around holes resulting from occlusions, approximate and consistently continue the surface data of occluded areas, and reconnect fractured object parts.
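
For the simplest case of a planar surface that is partially occluded (for example a wall behind furniture), a possible starting point is to continue the fitted plane across the hole and keep only those sampled positions that have no observed neighbor. The following sketch is purely illustrative and assumes that the plane normal and its inlier points are already available, e.g. from the segmentation step described above.

import numpy as np
from scipy.spatial import cKDTree

def fill_planar_hole(inliers, normal, step=0.02, gap=0.04):
    """Sample candidate points on the plane spanned by the inliers and keep those
    without an observed neighbor, i.e. a rough fill of the occluded region."""
    # build an orthonormal basis (u, v) within the plane
    u = np.cross(normal, [0.0, 0.0, 1.0])
    if np.linalg.norm(u) < 1e-6:
        u = np.cross(normal, [0.0, 1.0, 0.0])
    u = u / np.linalg.norm(u)
    v = np.cross(normal, u)
    origin = inliers.mean(axis=0)
    coords = (inliers - origin) @ np.column_stack((u, v))   # 2D coordinates on the plane
    us = np.arange(coords[:, 0].min(), coords[:, 0].max(), step)
    vs = np.arange(coords[:, 1].min(), coords[:, 1].max(), step)
    uu, vv = np.meshgrid(us, vs)
    candidates = origin + uu.reshape(-1, 1) * u + vv.reshape(-1, 1) * v
    dist, _ = cKDTree(inliers).query(candidates)
    return candidates[dist > gap]   # sampled positions that plausibly lie in the hole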

Point cloud fusion from different sensors

The acquisition of point clouds with laser rangefinders (LRF) has the advantage of delivering relatively precise data even at longer distances; on the other hand, the acquisition of 3D point clouds cannot be performed instantaneously, as the scanner needs to be actuated (e.g. swept). Another reasonable option to obtain point clouds is the Xbox Kinect sensor. The Kinect delivers ordered point clouds with RGB color values assigned to each 3D point. However, its perception is restricted to a certain range and depth resolution, and with increasing acquisition distance the data gradually loses precision. A desirable feature would be to merge Kinect and LRF data to make use of the high precision and range of the LRF on the one hand, and the density, additional color information and acquisition rate of the Kinect on the other.
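
Assuming the extrinsic calibration between the two sensors is known, a first (illustrative) fusion step could simply transform the Kinect cloud into the LRF frame and concatenate both clouds, discarding Kinect points beyond the range at which their depth estimates become too coarse; the range limit below is an example value.

import numpy as np

def fuse_clouds(lrf_points, kinect_points, R_kinect_to_lrf, t_kinect_to_lrf, max_range=3.5):
    """Transform the Kinect cloud into the LRF frame and concatenate both clouds,
    discarding Kinect points beyond the range at which the depth resolution degrades."""
    near = np.linalg.norm(kinect_points, axis=1) < max_range   # range in the Kinect frame
    kinect_in_lrf = kinect_points[near] @ R_kinect_to_lrf.T + t_kinect_to_lrf
    return np.vstack((lrf_points, kinect_in_lrf))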

Testing scenarios in virtual reality

For many of our experiments, we rely on appropriate testing data of interior environments. However, testing and developing algorithms on real 3D data is not always advantageous. Oftentimes, particular tests only require certain sections of point clouds, scenes with a limited level of complexity, or surfaces with no noise or with a predefined noise level.
For this reason, we deploy a 3D simulation framework in addition to our hardware systems. With this framework, point clouds can be acquired in virtual environments under controlled conditions. In addition, entire perception-action cycles of a robotic system can be tested in individually equipped environments. With respect to this project, we are focusing on the simulation of laser rangefinders and 3D cameras. We wish to provide adequate noise models and to accelerate and simplify the point cloud synthesis. Another aspect of our research is an automatic testing functionality linked to our simulator, enabling rapid prototyping of newly developed algorithms.
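
A simple (illustrative) noise model for simulated range readings is zero-mean Gaussian noise whose standard deviation grows with the measured distance; the parameters below are example values, not calibrated sensor characteristics.

import numpy as np

def add_range_noise(ranges, sigma0=0.005, sigma_slope=0.002, seed=None):
    """Perturb simulated range readings with zero-mean Gaussian noise whose standard
    deviation grows linearly with the measured distance."""
    rng = np.random.default_rng(seed)
    r = np.asarray(ranges, dtype=float)
    return r + rng.normal(0.0, sigma0 + sigma_slope * r)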

Contact

If you have any further questions, do not hesitate to contact: