Improving Valence-Arousal Estimation with Spatiotemporal Relationship Learning and Multimodal Fusion | IEEE Conference Publication | IEEE Xplore