Videogenic: Video Highlight Generation via Photogenic Moments

Abstract

This paper investigates the challenge of extracting highlight moments from videos. To perform this task, a system needs to understand what constitutes a highlight for a given video domain while also scaling across different domains. Our key insight is that photographs taken by photographers tend to capture the most remarkable or photogenic moments of an activity. Drawing on this insight, we present Videogenic, a system capable of creating domain-specific highlight videos for a wide range of domains. In a human evaluation study (N=50), we show that a high-quality photograph collection combined with encodings of CLIP, a neural network with semantic knowledge of images, can serve as an excellent prior for finding video highlights. In a within-subjects expert study (N=12), we demonstrate the usefulness of Videogenic in helping video editors create highlight videos with lighter workload, shorter task completion time, and better usability.

Example Results

For example, Videogenic highlights the officiant's address at the wedding, the cars drifting, the skateboard kickflip, the graduation hat toss, the breakdance headspin, the bird carrying its prey, and the weightlifter completing the clean and jerk.


Original videos:

Example Highlight Graphs

Some example highlight graphs and the reference photos used.
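
As a rough illustration of how a highlight graph like these could be computed, the sketch below scores each video frame by its average cosine similarity to a set of reference photos. It is a minimal sketch under stated assumptions, not the Videogenic implementation: it assumes the frames and photos have already been encoded into vectors (for example with CLIP's image encoder), the mean-similarity scoring rule is an assumption, and the names `cosine`, `highlight_scores`, and `top_highlight` are hypothetical. The toy two-dimensional vectors stand in for real embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def highlight_scores(frame_embeddings, photo_embeddings):
    """Score each frame by its mean similarity to the reference photos.

    Assumption: averaging over photos is one plausible scoring rule;
    the page does not state the rule Videogenic uses.
    """
    if not photo_embeddings:
        raise ValueError("need at least one reference photo embedding")
    return [
        sum(cosine(f, p) for p in photo_embeddings) / len(photo_embeddings)
        for f in frame_embeddings
    ]


def top_highlight(scores, window):
    """Return the start index of the contiguous window with the highest total score."""
    if window <= 0 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    current = sum(scores[:window])
    best_start, best_sum = 0, current
    for start in range(1, len(scores) - window + 1):
        # Slide the window one frame to the right.
        current += scores[start + window - 1] - scores[start - 1]
        if current > best_sum:
            best_start, best_sum = start, current
    return best_start


# Toy stand-ins for precomputed embeddings (hypothetical values).
photos = [[1.0, 0.0], [0.8, 0.6]]
frames = [[0.0, 1.0], [0.6, 0.8], [1.0, 0.0], [0.9, 0.1], [0.0, -1.0]]

scores = highlight_scores(frames, photos)  # one value per frame: the "graph"
start = top_highlight(scores, window=2)    # best two-frame segment
```

Plotting `scores` against frame index would give a curve analogous to the graphs below, with peaks where frames most resemble the reference photos.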


Skydiving

skydiving highlight graph

Wedding

wedding highlight graph

Fireworks

fireworks highlight graph

Breakdance

breakdance highlight graph

Rafting

rafting highlight graph

Limitation

A failure case is shown for a gaming video. The system can separate gaming scenes from scenes showing the players or title scenes. However, it may not always identify highlight moments that are specific to the game itself.

gaming highlight graph

Example Highlight Template

An example highlight template: Sheesh by Surfaces (upbeat, 10 seconds). Future creators may create new and more creative templates.


Example Manually-Edited Video

We evaluated our system with professional editors by asking them to create videos both with Videogenic and by manual editing in Premiere. As a comparison to the video created with Videogenic above, below is a video created by manual editing. Interestingly, the videos created with Videogenic excel at displaying highlight moments of activities, while the manually edited videos have the edge in telling more of a story. For future editing workflows, Videogenic may support editors as a component for effectively surfacing highlight moments while editors direct the overall storytelling and creative cut decisions.


Demo