Unsupervised Deep Learning-based Shape Retrieval Using Invariant Contour Features

P. Tondgaonkar


The objective of this work was to develop an unsupervised deep learning-based method for shape retrieval. The method uses scale- and rotation-invariant keypoints detected at the curvature extrema of object contours, which, according to information theory, are among the most informative points of a shape. Each detected keypoint is described by a feature vector and carries three pieces of geometric information: position, scale, and orientation. Because contour segments enclosed by such keypoints can be similar across different objects, local features alone may not suffice to distinguish between shapes. To address this, graphs are used to model the spatial arrangement of the keypoints from their geometric information and feature vectors. The development of the method involves three main steps: first, the creation of training data in the form of graphs from the geometric information and corresponding feature vectors of the keypoints; second, the selection, implementation, and testing of suitable graph neural network (GNN) architectures to learn shape representations; and third, measuring the similarity between shapes using the learned representations to perform shape retrieval. The MPEG-7 data set, which consists of 1400 shape samples in 70 distinct classes with 20 samples per class, is used in this work. Retrieval performance is evaluated with a metric called the bull’s eye score (BES). Two shape retrieval methods based on two unsupervised learning techniques were implemented and tested: GNN-based autoencoders and self-supervised graph contrastive learning. In particular, a GNN-based autoencoder, the mod-GSAE, achieved an overall BES of 54.96 %. The retrieval results were then systematically analyzed, and it was found that the learned representations become less distinct when the keypoints are unevenly distributed over the shape. The findings of this systematic analysis are presented in this work.
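To make the graph-modeling step concrete, the following minimal Python sketch illustrates one plausible way to assemble keypoints into a graph. The keypoint array layout, the ring-shaped contour connectivity, and the name build_keypoint_graph are illustrative assumptions, not the construction used in this work.

```python
import numpy as np

def build_keypoint_graph(keypoints, descriptors):
    """Assemble a shape graph from contour keypoints.

    keypoints   : (N, 4) array of (x, y, scale, orientation) per keypoint,
                  assumed ordered along the contour (hypothetical layout).
    descriptors : (N, D) array of local feature vectors.

    Returns node features X (geometry concatenated with the descriptor)
    and a symmetric adjacency matrix A linking consecutive keypoints.
    """
    n = len(keypoints)
    # Node features combine geometric information and feature vectors,
    # as described in the abstract.
    X = np.concatenate([keypoints, descriptors], axis=1)

    # Ring topology: each keypoint is linked to its two contour neighbours.
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = 1.0
        A[(i + 1) % n, i] = 1.0
    return X, A
```

Connecting consecutive contour keypoints preserves the ordering along the boundary; denser connectivity, such as k-nearest neighbors in image space, would be an equally valid design choice.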
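The mod-GSAE architecture itself is not reproduced here. As a generic point of reference, the sketch below shows a graph autoencoder in the spirit of Kipf and Welling's GAE, with a two-layer GCN encoder and an inner-product decoder; all dimensions and the mean-pooled graph embedding are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One graph-convolution step with symmetric normalization."""
    def __init__(self, in_dim, out_dim, activate=True):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)
        self.activate = activate

    def forward(self, X, A):
        # Propagate over D^{-1/2} (A + I) D^{-1/2}: self-loops + normalization.
        A_hat = A + torch.eye(A.size(0))
        d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
        H = self.lin((d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]) @ X)
        return torch.relu(H) if self.activate else H

class GraphAutoencoder(nn.Module):
    """Generic GAE-style sketch, NOT the mod-GSAE architecture itself:
    two-layer GCN encoder plus inner-product adjacency decoder."""
    def __init__(self, in_dim, hid_dim=64, emb_dim=32):
        super().__init__()
        self.enc1 = GCNLayer(in_dim, hid_dim)
        self.enc2 = GCNLayer(hid_dim, emb_dim, activate=False)

    def forward(self, X, A):
        Z = self.enc2(self.enc1(X, A), A)   # per-keypoint embeddings
        A_rec = torch.sigmoid(Z @ Z.t())    # reconstructed adjacency
        return Z, A_rec

    def graph_embedding(self, X, A):
        Z, _ = self(X, A)
        return Z.mean(dim=0)                # one vector per shape
```

Trained with a binary cross-entropy loss between A_rec and A, the mean-pooled embedding could then serve as the shape representation that retrieval compares.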
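The bull’s eye score on MPEG-7 follows a standard protocol: each of the 1400 shapes is used as a query, its 40 most similar shapes are retrieved (the query itself included), the retrieved shapes of the query's class are counted, and the total is divided by the maximum attainable 20 × 1400. The sketch below implements this protocol, assuming the learned representations are compared with Euclidean distance.

```python
import numpy as np
from scipy.spatial.distance import cdist

def bulls_eye_score(embeddings, labels, top_k=40, class_size=20):
    """Bull's eye score (BES) for MPEG-7-style retrieval.

    embeddings : (N, D) array, one learned representation per shape.
    labels     : (N,) array of class labels (70 classes, 20 shapes each).

    Euclidean distance between embeddings is an assumption here; the
    protocol itself (top-40 retrieval per query) is the standard one.
    """
    dists = cdist(embeddings, embeddings)        # pairwise Euclidean
    hits = 0
    for i in range(len(labels)):
        nearest = np.argsort(dists[i])[:top_k]   # 40 closest shapes
        hits += int(np.sum(labels[nearest] == labels[i]))
    return hits / (len(labels) * class_size)     # e.g. /(1400 * 20)
```

A perfect system scores 1.0; the 54.96 % reported above means that, on average, roughly 11 of the 20 same-class shapes appear among each query's top 40 matches.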