GET-Forschungsseminar Abstracts

Master's Thesis Intermediate Presentation: Exploration of Attention Mechanisms in Vision Transformers for Disaster-Related Image Classification

Yaznik Venkatraman Joshi, GET Lab

Presentation: 09.04.2024, 16:30h, P 1.6.02.1

Abstract:

This thesis explores the role of different attention mechanisms in Vision Transformers for image classification. A main objective is to develop an efficient Vision Transformer model for fast inference on resource-limited devices such as mobile rescue robots. To this end, image classification is performed on a disaster image dataset derived from the Incidents1M dataset. Based on a literature review, Vision Transformers such as SwiftFormer, MobileViTv2, and CrossFormer are selected as initial architectures for implementation. They will be compared in terms of accuracy, training time, model size, and inference latency, and this evaluation will be used to assess how efficiently each architecture uses resources and computation time. Furthermore, an image classification pipeline will be built around the best-performing attention mechanism and used for further experiments, which will modify the selected attention mechanism to explore possible improvements in resource utilization and latency.
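As an illustration of the inference-latency comparison mentioned above, the following is a minimal sketch of a timing harness using only the Python standard library. It is not the thesis implementation: the names `measure_latency` and `dummy_model` are hypothetical, and `dummy_model` is a placeholder standing in for the forward pass of a trained classifier. The warm-up and run counts are arbitrary assumptions.

```python
import statistics
import time


def measure_latency(predict, sample, warmup=5, runs=30):
    """Time repeated calls of predict(sample) and return summary statistics in ms.

    `predict` is any callable; in practice it would wrap a model's forward pass.
    """
    # Warm-up calls are excluded so one-time setup costs do not skew the result.
    for _ in range(warmup):
        predict(sample)

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "runs": runs,
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


def dummy_model(pixels):
    """Hypothetical stand-in for a classifier: maps an input to a class index."""
    return sum(pixels) % 10


if __name__ == "__main__":
    fake_image = list(range(224 * 224))  # assumed input size, for illustration only
    stats = measure_latency(dummy_model, fake_image)
    print(f"median latency: {stats['median_ms']:.3f} ms over {stats['runs']} runs")
```

Reporting the median alongside the mean makes the comparison less sensitive to occasional slow runs. If the model runs on a GPU, the device would typically need to be synchronized before each timestamp is read, since kernel launches are asynchronous.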