Dominant scene text recognition models commonly contain two building blocks, a visual model for feature extraction and a sequence model for text transcription. This hybrid architecture, although accurate, is complex and less efficient. In this study, we propose a Single Visual model for Scene Text recognition within the patch-wise image tokenization framework, which dispenses with the sequential modeling entirely. The method, termed SVTR, firstly decomposes an image text into small patches named character components. Afterward, hierarchical stages are recurrently carried out by component-level mixing, merging and/or combining. Global and local mixing blocks are devised to perceive the inter-character and intra-character patterns, leading to...
Scene text recognition (STR) enables computers to recognize and read the text in various real-world ...
Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-langua...
Although an automated reader for the blind first appeared nearly two-hundred years ago, computers ca...
The area of scene text recognition focuses on the problem of recognizing arbitrary text in images of...
Scene text recognition has inspired great interests from the computer vision community in recent yea...
Scene text recognition (STR) attracts much attention over the years because of its wide application....
Abstract—Understanding text captured in real-world scenes is a challenging problem in the field of v...
International audienceUnderstanding text captured in real-world scenes is a challenging problem in t...
Leveraging the advances of natural language processing, most recent scene text recognizers adopt an ...
Driven by deep learning and a large volume of data, scene text recognition has evolved rapidly in re...
Driven by the wide range of applications, scene text de-tection and recognition have become active r...
Abstract—Recognizing text in images taken in the wild is a challenging problem that has received gre...
In this work we present a framework for the recognition of natural scene text. We use purely data-dr...
Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-langua...
Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recogn...
Scene text recognition (STR) enables computers to recognize and read the text in various real-world ...
Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-langua...
Although an automated reader for the blind first appeared nearly two-hundred years ago, computers ca...
The area of scene text recognition focuses on the problem of recognizing arbitrary text in images of...
Scene text recognition has inspired great interests from the computer vision community in recent yea...
Scene text recognition (STR) attracts much attention over the years because of its wide application....
Abstract—Understanding text captured in real-world scenes is a challenging problem in the field of v...
International audienceUnderstanding text captured in real-world scenes is a challenging problem in t...
Leveraging the advances of natural language processing, most recent scene text recognizers adopt an ...
Driven by deep learning and a large volume of data, scene text recognition has evolved rapidly in re...
Driven by the wide range of applications, scene text de-tection and recognition have become active r...
Abstract—Recognizing text in images taken in the wild is a challenging problem that has received gre...
In this work we present a framework for the recognition of natural scene text. We use purely data-dr...
Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-langua...
Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recogn...
Scene text recognition (STR) enables computers to recognize and read the text in various real-world ...
Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-langua...
Although an automated reader for the blind first appeared nearly two-hundred years ago, computers ca...