Abstract:
A challenging aspect of scene text recognition is handling text with distortions or irregular layouts. In particular, perspective text and curved text are common in natural scenes and are difficult to recognize. In this work, we introduce ASTER, an end-to-end neural network model that comprises a rectification network and a recognition network. The rectification network adaptively transforms an input image into a new one, rectifying the text in it. It is powered by a flexible Thin-Plate Spline transformation, which handles a variety of text irregularities and is trained without human annotations. The recognition network is an attentional sequence-to-sequence model that predicts a character sequence directly from the rectified image. The whole model is trained end to end, requiring only images and their ground-truth text. Through extensive experiments, we verify the effectiveness of the rectification and demonstrate the state-of-the-art recognition performance of ASTER. Furthermore, we demonstrate that ASTER is a powerful component in end-to-end recognition systems, owing to its ability to enhance the detector.
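The Thin-Plate Spline (TPS) transformation mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the ASTER implementation: the function names `tps_kernel`, `fit_tps`, and `apply_tps` are hypothetical, the control points are hard-coded here rather than predicted by a network, and the sketch only maps coordinates (image sampling is omitted). It assumes the common TPS formulation with radial kernel r^2 log r plus an affine term.

```python
import numpy as np

def tps_kernel(r2):
    # Radial basis U(r) = r^2 * log(r), written in terms of r^2 and
    # defined as 0 at r = 0 (log(1) = 0 is substituted there).
    return 0.5 * r2 * np.log(np.where(r2 == 0, 1.0, r2))

def fit_tps(dst_pts, src_pts):
    """Solve for TPS parameters that map dst_pts (K, 2) onto src_pts (K, 2).

    dst_pts: control points on the rectified (output) image.
    src_pts: corresponding control points on the input image.
    Returns a (K + 3, 2) array: K kernel weights, then 3 affine rows.
    """
    k = dst_pts.shape[0]
    d2 = np.sum((dst_pts[:, None, :] - dst_pts[None, :, :]) ** 2, axis=-1)
    p = np.hstack([np.ones((k, 1)), dst_pts])   # affine basis [1, x, y]
    a = np.zeros((k + 3, k + 3))
    a[:k, :k] = tps_kernel(d2)
    a[:k, k:] = p
    a[k:, :k] = p.T                             # side conditions on the weights
    b = np.zeros((k + 3, 2))
    b[:k] = src_pts
    return np.linalg.solve(a, b)

def apply_tps(params, dst_pts, query):
    """Map query points (N, 2) in output coordinates to input coordinates."""
    k = dst_pts.shape[0]
    d2 = np.sum((query[:, None, :] - dst_pts[None, :, :]) ** 2, axis=-1)
    p = np.hstack([np.ones((query.shape[0], 1)), query])
    return tps_kernel(d2) @ params[:k] + p @ params[k:]

# Hypothetical control points along the top and bottom image borders,
# in normalized coordinates; the "input" points bend the text upward.
dst = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0],
                [0.0, 1.0], [0.5, 1.0], [1.0, 1.0]])
src = dst + np.array([[0.0, 0.1], [0.0, -0.1], [0.0, 0.1],
                      [0.0, 0.1], [0.0, -0.1], [0.0, 0.1]])
params = fit_tps(dst, src)
mapped = apply_tps(params, dst, dst)  # control points land on their targets
```

In a rectification setting, `apply_tps` would be evaluated on a dense grid of output pixel coordinates, and the input image would then be sampled at the mapped locations with a differentiable interpolation so that gradients can flow back to whatever predicts the control points.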
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence (Volume: 41, Issue: 9, 01 September 2019)
- IEEE Keywords
- Index Terms
- Scene Text
- Text Recognizer
- Distortion
- Neural Network
- Input Image
- Recognition System
- Recognition Performance
- Natural Scenes
- Optical Character Recognition
- Recognition Network
- Thin-plate Spline
- Sequence Of Characters
- Convolutional Neural Network
- Decoding
- Convolutional Layers
- Feature Maps
- Local Network
- Recurrent Neural Network
- Sequence Features
- Attention Mechanism
- Control Points
- Detection Boxes
- Recognition Model
- Image Borders
- Fully-connected Layer
- Beam Search
- Attention Weights
- Google Street View
- Language Model
- Spatial Transformer Network