I. Introduction
Over the past two decades, recording daily activities has become accessible with the advent of smartphones, wearable devices, and personal action cameras such as GoPro™. Sharing photos and videos through social media services has also become commonplace, leading to an ever-growing accumulation of visual data competing for our attention. Hands-free recordings of daily activities often contain repetitive or irrelevant content because the wearer is focused on the activity itself rather than on managing the camera, which can make the video unpleasant to watch. Egocentric video summarization aims to infer the intent of the wearer, reduce irrelevant content, and produce a summary that is pleasant to watch [1]. In particular, dynamic fast-forward methods assign semantic importance scores to the video according to domain-specific criteria, such as route guidance [2] or the presence of people [3]. These scores are then used to lower the playback speed during important segments and raise it during unimportant ones, producing a representative summary video that has no gaps between scenes.
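To make the dynamic fast-forward idea concrete, the following is a minimal sketch of how importance scores could drive playback speed. It is an illustration under stated assumptions and not the method of [2] or [3]: the per-frame scores in [0, 1], the linear score-to-speedup mapping, the function names `speedup_from_score` and `select_frames`, and the speedup bounds are all hypothetical choices.

```python
def speedup_from_score(score, min_speedup=1.0, max_speedup=10.0):
    """Map an importance score in [0, 1] to a playback speedup.

    Assumption: a linear mapping, where score 1.0 gives the slowest
    playback (min_speedup) and score 0.0 gives the fastest (max_speedup).
    """
    score = min(max(score, 0.0), 1.0)  # clamp out-of-range scores
    return max_speedup - score * (max_speedup - min_speedup)


def select_frames(scores, min_speedup=1.0, max_speedup=10.0):
    """Return the indices of the frames kept in the summary.

    The video is traversed from start to end, and the step to the next
    kept frame equals the speedup at the current frame. Every region is
    still visited, so the summary has no gaps between scenes; unimportant
    regions are simply sampled more sparsely.
    """
    kept = []
    position = 0.0
    while position < len(scores):
        index = int(position)
        kept.append(index)
        position += speedup_from_score(scores[index], min_speedup, max_speedup)
    return kept


# Hypothetical scores: 20 unimportant frames, 5 important, 20 unimportant.
example_scores = [0.0] * 20 + [1.0] * 5 + [0.0] * 20
summary = select_frames(example_scores)
```

In this sketch the important middle region is kept at full frame rate, while the surrounding regions are sampled once every ten frames. Published methods typically select frames with more elaborate criteria; the linear mapping here only illustrates the speed-versus-importance trade-off.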