Abstract:
The safety and reliability of urban intelligent transportation systems (ITS) largely depend on accurate and sufficient scene perception, especially detecting and understanding text information such as traffic signs and road instructions. Urban scene text has arbitrarily shaped boundaries and overlapping instances, which are hard to represent in the spatial domain. To address these challenges, we propose the Discrete Cosine Transform Network (DCTNet), in which a discrete cosine transform boundary representation converts scene text boundaries from the spatial domain to the frequency domain; the exact boundary is then recovered through the inverse discrete cosine transform, solving the geometric-representation and overlap problems of arbitrarily shaped boundaries. The backbone is built on Contrastive Language-Image Pre-Training (CLIP) to extract refined features, while Deformable Convolutional Networks (DCN) in the detection head of DCTNet adaptively perceive category and edge features. DCTNet achieves state-of-the-art performance on Total-Text and MSRA-TD500, with F1 scores of 90.8% and 89.6%, respectively. The high efficiency and accuracy of DCTNet provide a solid foundation for scene perception in intelligent transportation systems, further improving the safety and reliability of ITS.
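The core idea of the boundary representation can be illustrated with a minimal sketch (not the paper's implementation): sample points along a text boundary, take the DCT of the x/y coordinate sequences, keep only the first few low-frequency coefficients as a compact descriptor, and recover an approximate boundary with the inverse DCT. The function names and the choice of 8 coefficients here are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dct, idct

def contour_to_dct(points, k=8):
    """Encode an (N, 2) boundary as its first k low-frequency DCT coefficients per axis."""
    coeffs = dct(points, axis=0, norm="ortho")
    return coeffs[:k]  # truncation = lossy, compact boundary descriptor

def dct_to_contour(coeffs, n):
    """Zero-pad the truncated coefficients back to n samples and invert the DCT."""
    full = np.zeros((n, coeffs.shape[1]))
    full[: coeffs.shape[0]] = coeffs
    return idct(full, axis=0, norm="ortho")

# Toy example: a circular contour sampled at 64 points,
# compressed to 8 coefficients per coordinate and reconstructed.
t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
contour = np.stack([np.cos(t), np.sin(t)], axis=1)
coeffs = contour_to_dct(contour, k=8)
recon = dct_to_contour(coeffs, n=64)
print("max reconstruction error:", np.abs(recon - contour).max())
```

Because most of a smooth boundary's energy concentrates in low DCT frequencies, a handful of coefficients reconstructs the shape closely, which is what makes the frequency-domain representation compact for arbitrarily shaped text regions.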
Published in: IEEE Transactions on Intelligent Transportation Systems (Early Access)