Multi-Modal Video Summarization Based on Two-Stage Fusion of Audio, Visual, and Recognized Text Information | IEEE Conference Publication | IEEE Xplore