Text detection and recognition in natural images have long been considered as two separate tasks that are processed sequentially. Jointly training two tasks is non-trivial due to significant differences in learning difficulties and convergence rates. In this work, we present a conceptually simple yet efficient framework that simultaneously processes the two tasks in a united framework. Our main contributions are three-fold: (1) we propose a novel text-alignment layer that allows it to precisely compute convolutional features of a text instance in arbitrary orientation, which is the key to boost the performance; (2) a character attention mechanism is introduced by using character spatial information as explicit supervision, leading to large ...
Matching two texts is a fundamental problem in many natural language processing tasks. An effective ...
Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-langua...
Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of...
Text recognition has attracted considerable research interests because of its various applications. ...
Driven by deep learning and a large volume of data, scene text recognition has evolved rapidly in re...
In spite of significant research efforts, the existing scene text detection methods fall short of th...
Text detection and recognition in natural images have conventionally been seen in the prior art as a...
This thesis addresses the problem of text spotting - being able to automatically detect and recognis...
Automatic text localization and segmentation in a normal environment with vertical or curved texts a...
The real-life scene images exhibit a range of variations in text appearances, including complex shap...
Accurate and efficient text detection in the natural scene is a fundamental yet challenging task in ...
Although an automated reader for the blind first appeared nearly two-hundred years ago, computers ca...
Recognizing texts in images plays an important role in many applications, such as industrial intelli...
We study a novel multimodal-learning problem, which we call text matching: given an image containing...
This paper addresses the problem of detecting and recognizing text in images acquired ‘in the wild’....
Matching two texts is a fundamental problem in many natural language processing tasks. An effective ...
Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-langua...
Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of...
Text recognition has attracted considerable research interests because of its various applications. ...
Driven by deep learning and a large volume of data, scene text recognition has evolved rapidly in re...
In spite of significant research efforts, the existing scene text detection methods fall short of th...
Text detection and recognition in natural images have conventionally been seen in the prior art as a...
This thesis addresses the problem of text spotting - being able to automatically detect and recognis...
Automatic text localization and segmentation in a normal environment with vertical or curved texts a...
The real-life scene images exhibit a range of variations in text appearances, including complex shap...
Accurate and efficient text detection in the natural scene is a fundamental yet challenging task in ...
Although an automated reader for the blind first appeared nearly two-hundred years ago, computers ca...
Recognizing texts in images plays an important role in many applications, such as industrial intelli...
We study a novel multimodal-learning problem, which we call text matching: given an image containing...
This paper addresses the problem of detecting and recognizing text in images acquired ‘in the wild’....
Matching two texts is a fundamental problem in many natural language processing tasks. An effective ...
Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-langua...
Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of...