Videogenic: Video Highlight Generation via Photogenic Moments


This paper investigates the challenge of extracting highlight moments from videos. To perform this task, a system needs to understand what constitutes a highlight for a given video domain while also scaling across different domains. Our key insight is that photographs taken by photographers tend to capture the most remarkable or photogenic moments of an activity. Drawing on this insight, we present Videogenic, a system capable of creating domain-specific highlight videos for a wide range of domains. In a human evaluation study (N=50), we show that a high-quality photograph collection combined with encodings from CLIP, a neural network with semantic knowledge of images, can serve as an excellent prior for finding video highlights. In a within-subjects expert study (N=12), we demonstrate the usefulness of Videogenic in helping video editors create highlight videos with lighter workload, shorter task completion time, and better usability.
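The idea of using a photograph collection as a prior can be illustrated with a minimal sketch. It assumes that embeddings for the domain photographs and for the sampled video frames have already been computed (for example, with a CLIP image encoder) and are available as plain lists of floats. Averaging the photograph embeddings into a single prior vector and ranking frames by cosine similarity is an assumption made here for illustration, not a description of the system's actual scoring. The function names `mean_embedding`, `score_frames`, and `top_k_frames` are hypothetical.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def mean_embedding(embeddings):
    """Average unit-length embeddings into one prior vector (an assumed choice)."""
    unit = [normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(mean)


def score_frames(frame_embeddings, photo_embeddings):
    """Score each frame by cosine similarity to the photograph prior."""
    prior = mean_embedding(photo_embeddings)
    return [
        sum(a * b for a, b in zip(normalize(frame), prior))
        for frame in frame_embeddings
    ]


def top_k_frames(scores, k):
    """Return the indices of the k highest-scoring frames in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Toy 2-D vectors standing in for real image embeddings (made-up values).
photos = [[1.0, 0.0], [0.9, 0.1]]
frames = [[0.0, 1.0], [1.0, 0.05], [0.5, 0.5], [1.0, 0.0]]

scores = score_frames(frames, photos)
highlight_indices = top_k_frames(scores, k=2)
```

In this toy example the frames at indices 1 and 3 point in nearly the same direction as the photograph embeddings, so they are the ones selected. A practical system would likely also need to group adjacent high-scoring frames into clips, which this sketch leaves out.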

Example Results

Original videos:

Example Highlight Template