A Multimodal Frame Sampling Algorithm for Semantic Hyperlapses with Musical Alignment

Abstract:

Producing visually engaging and semantically meaningful hyperlapses presents unique challenges, particularly when integrating an audio track to enhance the watching experience. This paper introduces a novel multimodal algorithm to create hyperlapses that optimize semantic content retention, visual stability, and the alignment of playback speed to the liveliness of an accompanying song. We use object detection to estimate the semantic importance of each frame and analyze the song's perceptual loudness to determine its liveliness. Then, we align the most important segments of the video—where the hyperlapse slows down—with the quieter parts of the song, signaling a shift in attention from the music to the video. Our experiments show that our approach outperforms existing methods in semantic retention and loudness-speed correlation, while maintaining comparable performance in camera stability and temporal continuity.
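The loudness-to-speed alignment described in the abstract can be illustrated with a minimal sketch. It assumes a mono song given as a list of float samples and uses windowed RMS level in dB as a crude stand-in for perceptual loudness. A true perceptual measure (e.g. ITU-R BS.1770) applies frequency weighting, and the paper's exact loudness model is not reproduced here. The function names, the `min_speed`/`max_speed` bounds, and the linear mapping (quiet window, slow playback; loud window, fast playback) are hypothetical choices for illustration, not the authors' algorithm. The Pearson correlation at the end only mirrors the idea of a loudness-speed correlation measure.

```python
import math
from typing import List


def window_loudness_db(samples: List[float], window: int) -> List[float]:
    """Per-window RMS level in dB, used here as a rough proxy for perceptual loudness."""
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        # Floor the RMS to avoid log10(0) on silent windows.
        levels.append(20.0 * math.log10(max(rms, 1e-10)))
    return levels


def loudness_to_speed(levels_db: List[float], min_speed: float = 2.0,
                      max_speed: float = 12.0) -> List[float]:
    """Hypothetical linear map: the quietest window gets min_speed, the loudest max_speed."""
    lo, hi = min(levels_db), max(levels_db)
    span = hi - lo if hi > lo else 1.0
    return [min_speed + (max_speed - min_speed) * (level - lo) / span
            for level in levels_db]


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sa * sb)


if __name__ == "__main__":
    # Synthetic "song": one second of a quiet tone followed by one second of a loud tone.
    sr = 1000
    quiet = [0.05 * math.sin(2 * math.pi * 5 * t / sr) for t in range(sr)]
    loud = [0.8 * math.sin(2 * math.pi * 5 * t / sr) for t in range(sr)]
    levels = window_loudness_db(quiet + loud, window=200)
    speeds = loudness_to_speed(levels)
    print([round(s, 1) for s in speeds])
    print(round(pearson(levels, speeds), 3))
```

In this toy input the five quiet windows receive the slowest speed-up and the five loud windows the fastest, so the slow (semantically important) parts of a hyperlapse would be placed under the quiet part of the song.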

Published in: 2024 37th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)

Date of Conference: 30 September 2024 - 03 October 2024

Date Added to IEEE Xplore: 18 October 2024

DOI: 10.1109/SIBGRAPI62404.2024.10716336

Conference Location: Manaus, Brazil

I. Introduction

Over the past two decades, recording daily activities has become accessible with the advent of smartphones, wearable devices, and personal action cameras, such as GoPro™. Sharing photos and videos through social media services has also become commonplace, leading to an ever-growing accumulation of visual data competing for our attention. Hands-free recordings of daily activities often contain repetitive or irrelevant content because the wearer is focused on the activity itself rather than on managing the camera, which can make the video unpleasant to watch. Egocentric video summarization aims to infer the intent of the wearer, reduce irrelevant content, and produce a summary that is pleasant to watch [1]. In particular, dynamic fast-forward methods assign semantic importance scores to parts of the video according to domain-specific criteria, such as route guidance [2] or the presence of people [3]. These scores are then used to lower the playback speed during important segments and raise it during unimportant ones, producing a representative summary video with no gaps between scenes.
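The dynamic fast-forward principle can be made concrete with a minimal sketch, assuming per-frame semantic scores in [0, 1] are already available (for example, derived from a person detector). The linear score-to-speed mapping, the `slow`/`fast` bounds, and the function names are hypothetical; the cited methods [2], [3] use their own criteria and optimization. The sketch only shows that high-score frames are played slowly and low-score frames quickly, while every scene still contributes at least some frames.

```python
from typing import List


def speed_from_scores(scores: List[float], slow: float = 1.0,
                      fast: float = 10.0) -> List[float]:
    """Hypothetical mapping: score 1 -> `slow` speed-up, score 0 -> `fast` speed-up."""
    return [fast - (fast - slow) * s for s in scores]


def sample_frames(speeds: List[float]) -> List[int]:
    """Walk through the video, advancing by the local speed-up at each kept frame."""
    if any(s <= 0 for s in speeds):
        raise ValueError("speed-ups must be positive, or the walk never advances")
    kept, pos = [], 0.0
    while int(pos) < len(speeds):
        idx = int(pos)
        kept.append(idx)
        # A larger speed-up means a bigger jump, so fewer frames are kept there.
        pos += speeds[idx]
    return kept


if __name__ == "__main__":
    # 40 frames: an unimportant stretch followed by an important one (e.g. people appear).
    scores = [0.0] * 20 + [1.0] * 20
    print(sample_frames(speed_from_scores(scores)))
```

Here the unimportant first half is reduced to two frames (0 and 10), while all 20 frames of the important second half are kept. Real systems usually also penalize abrupt speed changes and camera shake, which this sketch ignores.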
