Attention is an essential mechanism by which humans process massive amounts of data: it omits the trivial parts and focuses on the important ones. For example, to reconstruct a source, we only need to remember the keywords of a long sentence or the principal objects in an image. It is therefore crucial to build attention networks that allow artificial intelligence to solve problems the way humans do. This mechanism has been thoroughly explored in text-based tasks, such as machine translation, reading comprehension, and sentiment analysis, as well as in vision-based tasks, such as image recognition, object detection, and action recognition. In this work, we explore the attention mechanism in multi-modal tasks, which involve the inputs of both text...
Each time we ask for an object, describe a scene, follow directions or read a document containi...
Visual Question Answering~(VQA) requires a simultaneous understanding of images and questions. Exist...
In this paper, we aim to obtain improved attention for a visual question answering (VQA) task. It is...
Attention mechanisms have been widely applied in the Visual Question Answering (VQA) task, as they h...
We propose a novel attention-based deep learning architecture for the visual question answering task (V...
© 2019 Association for Computational Linguistics. Visual dialog (VisDial) is a task which requires a d...
Visual question answering (VQA) is regarded as a multi-modal fine-grained feature fusion task, which...
Visual question answering (VQA) is a challenging problem in machine perception, which requires a dee...
CVPR 2019 accepted paper. Multimodal attentional networks are currently state-of-...
Recently, several deep learning models have been proposed that operate on graph-structured data. These mod...
Visual Question Answering (VQA) is a recently proposed multimodal task in the general area of machin...
Visual Question Answering (VQA) is a task for evaluating image scene understanding abilities and sho...
© 2017 IEEE. Visual question answering (VQA) is challenging because it requires a simultaneous under...
This paper proposes to improve visual question answering (VQA) with structured representations of bo...
Visual Relational Reasoning is crucial for many vision-and-language based tasks, such as Visual Ques...