Artificial Visual Attention based on a Growing Neural Gas

Summary

Whenever human observers perceive their surroundings, their visual system is guided by attention. Interesting parts of a scene draw significantly more attention than uninteresting ones, and the more interesting an area is, the earlier it is processed by higher cognitive layers. But how is it determined what is interesting and which parts of a visual scene can be ignored? Psychological research shows that the concept of saliency has a significant impact on the deployment of attention. Saliency is a measure of local feature contrast within a scene, where the term feature refers to properties such as color, orientation, or size. In autonomous systems such as rescue robots, the concepts of visual attention, and saliency in particular, can be applied to pre-select interesting parts of a scene, so that computationally intensive algorithms such as object recognition only need to process small areas of the visual input.
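The idea of saliency as local feature contrast can be illustrated with a minimal center-surround sketch. It assumes a single feature (for example intensity) stored as a 2-D list and defines the saliency of a cell as the absolute difference between its value and the mean of its neighbors. This is a generic illustration of the concept, not the saliency measure used in this project; `local_contrast` and `radius` are hypothetical names.

```python
def local_contrast(feature_map, radius=1):
    """Center-surround contrast: |value - mean of the surrounding window|."""
    h, w = len(feature_map), len(feature_map[0])
    saliency = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect all cells in the (clipped) window except the center itself.
            neighbors = [
                feature_map[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
                if (i, j) != (x, y)
            ]
            # A 1x1 map has no surround; treat it as having no contrast.
            surround = sum(neighbors) / len(neighbors) if neighbors else feature_map[y][x]
            saliency[y][x] = abs(feature_map[y][x] - surround)
    return saliency


# A single bright cell on a dark background stands out most.
fmap = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
sal = local_contrast(fmap)
```

The bright center receives the highest value, while cells next to it receive a smaller, nonzero contrast because the bright cell is part of their surround.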

The focus of this research project is to further develop a novel approach to artificial visual attention. The approach is based on learning the underlying structure of a visual scene with an unsupervised learning algorithm. For this purpose, the Growing Neural Gas algorithm [1] was modified to approximate the underlying structure of color images. Over a fixed number of learning iterations, the algorithm produces a graph of nodes interconnected by edges.

This graph is then separated based on the color difference between nodes, which yields several independent sub-graphs. Sub-graphs with large bounding boxes are identified as representing the background of the image. Subsequently, learning on the graph produced by the Growing Neural Gas algorithm is resumed. In this second learning phase, however, only foreground pixels (i.e. pixels whose color differs significantly from that of the background sub-graphs) are used as samples, which concentrates the graph structure exclusively on foreground objects.

The resulting graph is separated again. This time, however, a set of Superpixels [2] is assigned to each sub-graph. The Superpixel algorithm produces an oversegmentation of an image into sets of pixels, and this assignment determines which pixels of the image are represented by which sub-graph. For each sub-graph, pixel-based feature magnitudes and saliencies are computed in the dimensions color, orientation, size, eccentricity, and symmetry. This yields one saliency map per feature dimension; saliency maps are gray-scale images that represent how conspicuous each area of the image is. The feature saliency maps are combined into an overall saliency map from which the most conspicuous area of the image can be determined and extracted. The complete workflow of the algorithm is depicted in Figure 1.
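To make the learning step more concrete, here is a minimal sketch of the standard Growing Neural Gas loop from [1]. It is the textbook algorithm, not the modified variant developed in this project. It assumes each pixel has already been turned into a sample vector such as (x, y, r, g, b); the function name `gng` and the parameter defaults (`eps_b`, `eps_n`, `max_age`, `lam`, `alpha`, `decay`) are illustrative choices.

```python
import random


def gng(samples, max_nodes=50, iterations=5000, eps_b=0.05, eps_n=0.006,
        max_age=50, lam=100, alpha=0.5, decay=0.995, seed=0):
    """Minimal Growing Neural Gas (after Fritzke, 1995).

    samples: list of equal-length tuples, e.g. (x, y, r, g, b) per pixel.
    Returns (nodes, edges): nodes maps id -> weight vector, edges is a set
    of frozenset({id_a, id_b}) pairs.
    """
    rng = random.Random(seed)
    nodes = {0: list(rng.choice(samples)), 1: list(rng.choice(samples))}
    error = {0: 0.0, 1: 0.0}
    edges = {}  # frozenset({a, b}) -> age
    next_id = 2

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    def neighbors(n):
        return [m for e in edges if n in e for m in e if m != n]

    for step in range(1, iterations + 1):
        xi = rng.choice(samples)
        # Winner s1 and runner-up s2 (nearest and second-nearest node).
        s1, s2 = sorted(nodes, key=lambda n: dist2(nodes[n], xi))[:2]
        # Age all edges of s1 and accumulate its squared error.
        for e in edges:
            if s1 in e:
                edges[e] += 1
        error[s1] += dist2(nodes[s1], xi)
        # Move the winner and its topological neighbors towards the sample.
        for n, eps in [(s1, eps_b)] + [(m, eps_n) for m in neighbors(s1)]:
            nodes[n] = [w + eps * (x - w) for w, x in zip(nodes[n], xi)]
        # Connect s1-s2 (or reset the age), then prune old edges and isolated nodes.
        edges[frozenset((s1, s2))] = 0
        edges = {e: a for e, a in edges.items() if a <= max_age}
        connected = {n for e in edges for n in e}
        for n in list(nodes):
            if n not in connected:
                del nodes[n], error[n]
        # Every lam steps, insert a node between the worst node and its worst neighbor.
        if step % lam == 0 and len(nodes) < max_nodes:
            q = max(nodes, key=error.get)
            f = max(neighbors(q), key=error.get)
            r = next_id
            next_id += 1
            nodes[r] = [(a + b) / 2 for a, b in zip(nodes[q], nodes[f])]
            del edges[frozenset((q, f))]
            edges[frozenset((q, r))] = 0
            edges[frozenset((r, f))] = 0
            error[q] *= alpha
            error[f] *= alpha
            error[r] = error[q]
        # Global decay of all accumulated errors.
        for n in error:
            error[n] *= decay
    return nodes, set(edges)
```

For an image, each pixel could contribute one sample (x, y, r, g, b), with coordinates and colors scaled to comparable ranges; this encoding is an assumption, not necessarily the one used in the project.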

Fig. 1: The four steps required to analyze a single image with the artificial attention system based on a Growing Neural Gas and to separate the foreground object from the background.
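The separation steps of this workflow can be sketched as follows: cut the graph by color difference, flag sub-graphs with large bounding boxes as background, and keep only pixels whose color differs from every background node. The sketch assumes each node carries a vector (x, y, r, g, b). The thresholds `max_color_diff`, `min_area_frac` and `min_diff`, as well as all function names, are hypothetical.

```python
import math
from collections import deque


def color_dist(a, b):
    # Euclidean distance on the color part (indices 2..4) of (x, y, r, g, b).
    return math.dist(a[2:5], b[2:5])


def split_by_color(nodes, edges, max_color_diff):
    """Drop edges joining dissimilar colors; return the connected components."""
    adj = {n: [] for n in nodes}
    for a, b in (tuple(e) for e in edges):
        if color_dist(nodes[a], nodes[b]) <= max_color_diff:
            adj[a].append(b)
            adj[b].append(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first search over the remaining edges
            n = queue.popleft()
            comp.append(n)
            for m in adj[n]:
                if m not in seen:
                    seen.add(m)
                    queue.append(m)
        components.append(comp)
    return components


def background_components(nodes, components, width, height, min_area_frac=0.25):
    """Flag components whose (x, y) bounding box covers a large part of the image."""
    background = []
    for comp in components:
        xs = [nodes[n][0] for n in comp]
        ys = [nodes[n][1] for n in comp]
        area = (max(xs) - min(xs)) * (max(ys) - min(ys))
        if area >= min_area_frac * width * height:
            background.append(comp)
    return background


def is_foreground(color, nodes, background, min_diff):
    """A pixel counts as foreground if its color differs from every background node."""
    return all(math.dist(color, nodes[n][2:5]) > min_diff
               for comp in background for n in comp)
```

Only pixels for which `is_foreground` holds would then be fed back as samples when learning resumes.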

Ongoing research topics

Graph-based top-down influences: Like other artificial attention systems, the presented system can incorporate top-down influences into the saliency computation. This topic examines the nuances and possibilities of integrating top-down influences into a graph-based system.
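A common way for top-down influences to enter the combination step is to weight the individual feature saliency maps before summing them. The sketch below normalizes each map to its own maximum, forms a weighted sum and returns the most conspicuous location. The weighting scheme and the names `combine_maps` and `most_salient` are assumptions for illustration; the project computes saliency per sub-graph and may combine maps differently.

```python
def combine_maps(feature_maps, weights=None):
    """Weighted sum of per-feature saliency maps, each normalized to [0, 1]."""
    names = list(feature_maps)
    weights = weights or {name: 1.0 for name in names}  # bottom-up: equal weights
    h = len(feature_maps[names[0]])
    w = len(feature_maps[names[0]][0])
    total = [[0.0] * w for _ in range(h)]
    for name in names:
        fmap = feature_maps[name]
        peak = max(max(row) for row in fmap) or 1.0  # avoid division by zero
        for y in range(h):
            for x in range(w):
                total[y][x] += weights.get(name, 0.0) * fmap[y][x] / peak
    return total


def most_salient(saliency):
    """(row, column) of the maximum of the overall saliency map."""
    cells = ((y, x) for y in range(len(saliency)) for x in range(len(saliency[0])))
    return max(cells, key=lambda p: saliency[p[0]][p[1]])


maps = {"color": [[0, 1], [0, 0]], "orientation": [[0, 0], [0, 2]]}
color_biased = combine_maps(maps, {"color": 1.0, "orientation": 0.5})
orientation_biased = combine_maps(maps, {"color": 0.2, "orientation": 1.0})
```

Shifting weight from one feature dimension to another moves the most salient location, which is the effect a top-down bias is meant to have.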

Adaptive graphs in dynamic scenes: The Growing Neural Gas algorithm offers a considerable advantage when learning in dynamic scenes (i.e. videos): graphs learned for previous frames can be reused in subsequent frames instead of being learned from scratch. This ongoing research topic explores this property further.
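Why reusing a graph helps can be illustrated with a deliberately simplified sketch: node positions learned on one frame are adapted to the next frame with winner-only updates (no edges, no growth). This is a simplification for illustration, not the project's method for dynamic scenes; `adapt`, `quantization_error` and the step size `eps` are hypothetical.

```python
import math
import random


def adapt(nodes, samples, steps, eps=0.05, seed=0):
    """Move the nearest node towards randomly drawn samples (winner-only update)."""
    rng = random.Random(seed)
    nodes = [list(n) for n in nodes]  # copy, so the previous frame's graph is kept
    for _ in range(steps):
        xi = rng.choice(samples)
        winner = min(nodes, key=lambda n: math.dist(n, xi))
        for i, (w, x) in enumerate(zip(winner, xi)):
            winner[i] = w + eps * (x - w)
    return nodes


def quantization_error(nodes, samples):
    """Mean distance from each sample to its nearest node."""
    return sum(min(math.dist(n, s) for n in nodes) for s in samples) / len(samples)


# Nodes from the previous frame; in the next frame the scene has shifted slightly.
frame1_nodes = [(0.0, 0.0), (10.0, 10.0)]
frame2 = [(1.0, 0.0), (11.0, 10.0)]
before = quantization_error(frame1_nodes, frame2)
adapted = adapt(frame1_nodes, frame2, steps=100)
after = quantization_error(adapted, frame2)
```

Because the warm-started nodes already lie close to the new data, a small number of adaptation steps suffices to follow the change.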

Hierarchical processing of input images: Saliency computation on complex images benefits from hierarchical processing, which splits the learning process into multiple layers of graphs with different granularity. Guided by image heuristics, this allows the algorithm to concentrate the learning effort on heterogeneous parts of an image while processing homogeneous areas quickly.
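The text does not specify which image heuristics are used. As a stand-in, a quadtree driven by color variance shows how a coarse-to-fine split can separate homogeneous from heterogeneous regions: homogeneous areas remain large leaves, while heterogeneous areas are subdivided and would receive more learning effort. The heuristic, the thresholds `max_var` and `min_size`, and the function names are assumptions.

```python
def variance(pixels):
    """Color variance of a region, summed over the r, g and b channels."""
    n = len(pixels)
    mean = [sum(p[c] for p in pixels) / n for c in range(3)]
    return sum((p[c] - mean[c]) ** 2 for p in pixels for c in range(3)) / n


def quadtree(image, x0, y0, w, h, max_var, min_size=2):
    """Split a region into quadrants while its color variance is high.

    image: 2-D list of (r, g, b) tuples. Returns leaf regions (x, y, w, h).
    """
    pixels = [image[y][x] for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    if w <= min_size or h <= min_size or variance(pixels) <= max_var:
        return [(x0, y0, w, h)]
    hw, hh = w // 2, h // 2
    leaves = []
    for qx, qy, qw, qh in [(x0, y0, hw, hh), (x0 + hw, y0, w - hw, hh),
                           (x0, y0 + hh, hw, h - hh), (x0 + hw, y0 + hh, w - hw, h - hh)]:
        leaves += quadtree(image, qx, qy, qw, qh, max_var, min_size)
    return leaves


# 8x8 black image with a small white patch in the top-left corner.
img = [[(0, 0, 0)] * 8 for _ in range(8)]
for y in range(2):
    for x in range(2):
        img[y][x] = (255, 255, 255)
leaves = quadtree(img, 0, 0, 8, 8, max_var=1.0)
```

Only the quadrant containing the white patch is subdivided further; the three uniform quadrants stay as single leaves.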

Graph-based affordance estimation: Affordances describe the action possibilities that an object offers; for example, the handle of a cup offers a grasping possibility. Affordances can guide attention towards objects. Affordance estimation has been integrated into the presented algorithm as a prototype [3]; research into a more sophisticated integration is ongoing.

Integration of depth information: Depth information enables us to further distinguish between different but similarly colored objects. Analyzing how such information can be integrated into a graph-based system is another interesting current research direction.
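Assuming depth is available per pixel (for example from an RGB-D sensor), one straightforward option is to append scaled depth to each color sample. Similarly colored objects at different distances then end up far apart in feature space. The scale `depth_weight` (with depth in meters) and the example values are hypothetical.

```python
import math


def feature_vector(r, g, b, depth, depth_weight=100.0):
    """Append scaled depth to an RGB feature (depth_weight is a hypothetical scale)."""
    return (r, g, b, depth_weight * depth)


# Two similarly red surfaces at different distances from the camera.
cup = feature_vector(200, 30, 30, depth=0.8)
wall = feature_vector(205, 28, 32, depth=2.5)

color_only = math.dist(cup[:3], wall[:3])  # about 5.7: nearly indistinguishable
with_depth = math.dist(cup, wall)          # about 170: clearly separated
```

The choice of `depth_weight` decides how strongly depth dominates color and would need tuning for a given sensor and scene.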

References

1.	Fritzke, B. (1995). A Growing Neural Gas Network Learns Topologies. In: Advances in Neural Information Processing Systems.
2.	Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P. & Süsstrunk, S. (2010). SLIC Superpixels. In: EPFL Technical Report no. 149300.
3.	Tünnermann, J., Born, C. & Mertsching, B. (2014). Integrating Object Affordances with Artificial Visual Attention. In: Computer Vision - ECCV 2014 Workshops - Zurich, Switzerland, September 6-7 and 12, 2014, Proceedings, Part II, 2015, pp. 427-437.

Contact

Do you have any questions or comments? Please contact:

Jan Tünnermann