An Efficient Temporal Feature Aggregation of Audio-Video Signals for Human Emotion Recognition | IEEE Conference Publication | IEEE Xplore