A Sparse Transformer-Based Approach for Image Captioning

Zhou Lei
Congcong Zhou
Shengbo Chen
Yiyong Huang
Xianrui Liu

Publication date

January 2020

DOI

10.1109/ACCESS.2020.3024639

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Journal

IEEE Access

Abstract

Image captioning is the task of providing a natural language description for an image. It has attracted significant attention from both the computer vision and natural language processing communities. Most image captioning models adopt deep encoder-decoder architectures to achieve state-of-the-art performance. However, it is difficult to model the relationships between pairs of input image regions in the encoder. Furthermore, words in the decoder have little knowledge of their correlation to specific image regions. In this article, a novel deep encoder-decoder model for image captioning is proposed, built on a sparse Transformer framework. The encoder adopts a multi-level representation of image features based on self-attention to ...
