VideoMap: Video Editing in Latent Space
Abstract
Video editing is a creative and complex endeavor and we believe that there is potential for reimagining a new video editing interface that provides better support for this process. We take inspiration from latent space exploration tools that help users find patterns and connections within complex datasets. We introduce VideoMap, a proof-of-concept video editing interface that operates on video frames projected onto a latent space. We support intuitive navigation through map-inspired navigational elements and facilitate transitioning between different latent spaces using swappable lenses. We built three VideoMap components to support editors in three common video editing tasks: organizing video footage, identifying suitable video transitions, and rapidly prototyping rough cuts. In a user study with both professionals and non-professionals (N=14), editors found that VideoMap provides a user-friendly editing experience, reduces tedious grunt work, enhances the overview capability of video footage, helps identify continuous video transitions, and enables a more exploratory approach to video editing. We further demonstrate the versatility of VideoMap by implementing three extended applications.
Project Panel
The figure shows a collection of videos organized under the semantic
lens. The editor can play through the videos by scrubbing the landmarks from left to right (a). Example semantic clusters of videos are shown in (b) for several videos containing streets and buildings and (c) for concert videos.
Swappable Lenses
The editor can switch between different lenses to rearrange the layout. Using the semantic
lens, videos containing semantically similar concepts are close together. Using the color
lens, videos with similar color schemes are close together. Using the shape
lens, videos containing objects of similar shapes are close together.
Prompts
The editor can filter videos using natural language prompts. Click on the arrows to check out different prompts.
Paths Explorer
The editor can select a video frame to display ten video transition suggestions based on the selected lens (a). For example, a video frame with a similar color composition is recommended under the color
lens (b).
Example transitions created by user study participants
Route Planner
The editor can select several video clips to automatically generate a rough cut video based on the selected lens (a). For example, the editor can create a video of a person running with various backgrounds under the shape
lens (b). Route Planner automatically finds the optimal video transitions (c1 and c2).
Example videos created by user study participants
Extended Applications
Video Summarization
VideoMap's Project Panel can be extended to create summary videos. We automatically create "semantic districts" that approximately represent the main activities of a video using k-means clustering under the semantic
lens. The editor can select several landmarks to specify the activities to include in the summary video. Selections are highlighted with red borders (a) and displayed as a storyboard (b).
Video Highlight
VideoMap's Paths Explorer can be extended to create highlight videos. The editor can upload a photograph (i.e., a custom landmark) depicting an activity (e.g., skydiving) (a). Our key insight is that photographs taken by photographers tend to capture the most highlight-worthy moments of an activity (e.g., when the skydiver jumps out of the aircraft). We then generate a highlight video using near neighbor video frames to the custom landmark in the semantic
space (b).
Text-Based Video Editing
VideoMap's Route Planner can be extended to edit videos using text. The editor can describe a desired video using descriptive sentences, like writing a story (a). We then match each sentence to the closest video clip in the semantic
space and generate a video by finding the shortest route along the clips (b).