FSConformer: A Frequency-Spatial-Domain CNN-Transformer Two-Stream Network for Compressed Video Action Recognition | IEEE Conference Publication | IEEE Xplore