Abstract:
Video classification is more difficult than image classification, since the additional motion features between image frames and the large amount of redundancy in videos must be taken into account. In this work, we propose a new deep learning architecture called the recurrent convolutional neural network (RCNN), which combines convolution operations and recurrent links for video classification tasks. Our architecture can extract local and dense features from image frames as well as learn the temporal features between consecutive frames. We also explore the effectiveness of sequential sampling and random sampling when training our models, and find that random sampling is necessary for video classification. The feature maps from our learned model preserve motion across image frames, which is analogous to the persistence of vision in the human visual system. We achieved 81.0% classification accuracy without optical flow and 86.3% with optical flow on the UCF-101 dataset, both competitive with state-of-the-art methods.
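The abstract does not specify the exact layer equations, but the core idea of a recurrent convolutional layer — a hidden feature map updated by convolving both the current frame and the previous hidden state — can be sketched as follows. This is a minimal, hypothetical NumPy illustration (single channel, 3x3 kernels, tanh nonlinearity), not the paper's actual implementation:

```python
import numpy as np

def conv2d_same(x, k):
    """Naive single-channel 2D convolution (cross-correlation form)
    with zero padding so the output has the same spatial size."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def rcnn_step(x_t, h_prev, w_x, w_h):
    """One recurrent-convolutional step: the new hidden feature map
    combines an input-to-hidden convolution of the current frame with
    a hidden-to-hidden convolution of the previous hidden state."""
    return np.tanh(conv2d_same(x_t, w_x) + conv2d_same(h_prev, w_h))

# Run a short frame sequence through the recurrent convolution.
rng = np.random.default_rng(0)
frames = rng.standard_normal((4, 8, 8))   # 4 frames of 8x8 pixels
w_x = rng.standard_normal((3, 3)) * 0.1   # input-to-hidden kernel
w_h = rng.standard_normal((3, 3)) * 0.1   # hidden-to-hidden kernel
h = np.zeros((8, 8))                      # initial hidden feature map
for x_t in frames:
    h = rcnn_step(x_t, h, w_x, w_h)
print(h.shape)  # (8, 8)
```

Because the same hidden map is convolved at every time step, features extracted from earlier frames persist into later ones, which is consistent with the paper's observation that the learned feature maps preserve motion across frames.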
Date of Conference: 11-15 July 2016
Date Added to IEEE Xplore: 29 August 2016
Electronic ISSN: 1945-788X