GET Lab, University of Paderborn

News

Master's Thesis Presentation: Exploration of Attention Mechanisms in Vision Transformers for Disaster-Related Image Classification
 
Date: 2024/04/09
Time: 16:30
Location: P 1.6.02.1
Author(s): Yaznik Venkatraman Joshi
 

On Tuesday, April 9, Yaznik Venkatraman Joshi will present the results of his master's thesis, entitled:

Exploration of Attention Mechanisms in Vision Transformers for Disaster-Related Image Classification

Abstract: This thesis explores the role of different attention mechanisms in Vision Transformers for image classification, used as a means of pre-training to acquire rich visual features that can be applied to other downstream tasks where annotated data may be scarce. One of the main objectives is to develop an efficient Vision Transformer model for fast inference on resource-limited devices such as mobile rescue robots. To this end, the image classification will be performed on disaster image datasets derived from the Incidents1M dataset. Based on the literature research, Vision Transformers such as SwiftFormer, MobileViTv2, and CrossFormer are selected as initial architectures for implementation. Their performance will be compared and evaluated in terms of accuracy, training time, model size, and computational cost. This evaluation will then be used to select an architecture for further evaluation and optimization. Furthermore, a proposed Transpose Additive Attention mechanism is integrated into the SwiftFormer architecture and analyzed experimentally with different datasets and configurations. Possible optimizations will be explored by compressing the attention mechanism in the SwiftFormer and the proposed TAAFormer architectures to improve resource utilization and performance. In addition, visualizations of the attention maps will be created to provide interpretability of the architecture.
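For readers unfamiliar with additive attention, the following is a minimal PyTorch sketch of an additive attention block in the spirit of SwiftFormer's efficient additive attention, in which a single learnable vector scores the query tokens and the resulting global query interacts element-wise with the keys. All dimensions, names, and details are illustrative assumptions; this is not the thesis's TAAFormer variant.

import torch
import torch.nn as nn

class EfficientAdditiveAttention(nn.Module):
    """Illustrative sketch of an additive attention block (SwiftFormer-style),
    with linear cost in the number of tokens."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.w_a = nn.Parameter(torch.randn(dim, 1))  # learnable scoring vector
        self.scale = dim ** -0.5
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        q = self.to_q(x)
        k = self.to_k(x)
        # Per-token scores from a single learnable vector, normalized over tokens
        alpha = torch.softmax((q @ self.w_a) * self.scale, dim=1)  # (B, N, 1)
        # Global query: attention-weighted sum of all query tokens
        q_global = (alpha * q).sum(dim=1, keepdim=True)            # (B, 1, dim)
        # Element-wise interaction of the global query with each key, plus a residual
        return self.proj(q_global * k) + q                         # (B, N, dim)

# Usage with hypothetical patch tokens (batch of 2, 14x14 patches, embedding dim 64)
tokens = torch.randn(2, 196, 64)
attn = EfficientAdditiveAttention(dim=64)
print(attn(tokens).shape)  # torch.Size([2, 196, 64])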