VideoMap: Video Editing in Latent Space
Video editing is a creative and complex endeavor, and we believe there is potential to reimagine the video editing interface so that it better supports this process. We take inspiration from latent space exploration tools that help users find patterns and connections within complex datasets. We introduce VideoMap, a proof-of-concept video editing interface that operates on video frames projected onto a latent space. We support intuitive navigation through map-inspired navigational elements and facilitate transitioning between different latent spaces using swappable lenses. We built three VideoMap components to support editors in three common video editing tasks: organizing video footage, identifying suitable video transitions, and rapidly prototyping rough cuts. In a user study with both professionals and non-professionals (N=14), editors found that VideoMap provides a user-friendly editing experience, reduces tedious grunt work, enhances the overview capability of video footage, helps identify continuous video transitions, and enables a more exploratory approach to video editing. We further demonstrate the versatility of VideoMap by implementing three extended applications.
The figure shows a collection of videos organized under the semantic lens. The editor can play through the videos by scrubbing the landmarks from left to right (a). Example semantic clusters of videos are shown in (b) for several videos containing streets and buildings and (c) for concert videos.
The editor can switch between different lenses to rearrange the layout. Using the semantic lens, videos containing semantically similar concepts are close together. Using the color lens, videos with similar color schemes are close together. Using the shape lens, videos containing objects of similar shapes are close together.
The editor can filter videos using natural language prompts.
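One way to picture prompt-based filtering is as a similarity test in a shared text-image embedding space. The sketch below is only an illustration under that assumption: the source does not say which model or threshold VideoMap uses, the function name `filter_by_prompt` and the threshold value are hypothetical, and the vectors are toy stand-ins for real embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filter_by_prompt(prompt_vec, frame_vecs, threshold=0.8):
    """Return ids of frames whose embedding is close to the prompt embedding.

    Assumption: prompt and frames were embedded into the same space by some
    joint text-image model; the threshold is illustrative only.
    """
    return [fid for fid, vec in frame_vecs.items()
            if cosine(prompt_vec, vec) >= threshold]

# Toy 3-D embeddings standing in for real model outputs.
frames = {
    "street_01": [0.9, 0.1, 0.0],
    "concert_01": [0.0, 0.2, 0.9],
    "street_02": [0.8, 0.3, 0.1],
}
prompt = [1.0, 0.0, 0.0]  # hypothetical embedding of a prompt such as "a city street"
matches = filter_by_prompt(prompt, frames)
print(matches)
```

Frames that fall below the threshold would simply be hidden from the map, leaving the layout of the remaining frames unchanged.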
The editor can select a video frame to display ten video transition suggestions based on the selected lens (a). For example, a video frame with a similar color composition is recommended under the color lens (b).
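Transition suggestions of this kind can be read as a nearest-neighbor query in the latent space of the active lens. The following is a minimal sketch assuming Euclidean distance over per-frame vectors; the distance metric, the function name `suggest_transitions`, and the toy color vectors are assumptions made for illustration and are not taken from the source.

```python
import math

def suggest_transitions(selected_id, frame_vecs, k=10):
    """Return up to k frame ids nearest to the selected frame (Euclidean distance)."""
    query = frame_vecs[selected_id]
    # Score every other frame by its distance to the selected frame.
    scored = [(math.dist(query, vec), fid)
              for fid, vec in frame_vecs.items() if fid != selected_id]
    scored.sort()
    return [fid for _, fid in scored[:k]]

# Toy "color lens" vectors (for example, a mean RGB value per frame).
frames = {
    "sunset_a": [0.90, 0.50, 0.20],
    "sunset_b": [0.85, 0.45, 0.25],
    "forest": [0.10, 0.60, 0.20],
    "ocean": [0.10, 0.30, 0.80],
}
suggestions = suggest_transitions("sunset_a", frames, k=2)
print(suggestions)
```

Swapping the lens would amount to passing a different set of vectors for the same frame ids, which changes the suggestions without changing the query logic.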
Example transitions created by user study participants
The editor can select several video clips to automatically generate a rough cut video based on the selected lens (a). For example, the editor can create a video of a person running with various backgrounds under the shape lens (b). Route Planner automatically finds the optimal video transitions (c1 and c2).
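To make the idea of an optimal transition concrete, the sketch below searches for the pair of frames, one from each of two clips, that lie closest together in the lens space, and treats that pair as the cut point. This is an assumption about what "optimal" could mean here; the source does not describe Route Planner's actual algorithm, and `best_transition` is a hypothetical name.

```python
import math

def best_transition(clip_a, clip_b):
    """Return (i, j, distance): leave clip_a after frame i, enter clip_b at frame j.

    Exhaustive search over all frame pairs, which is fine for a sketch but
    quadratic in clip length.
    """
    return min(
        ((i, j, math.dist(fa, fb))
         for i, fa in enumerate(clip_a)
         for j, fb in enumerate(clip_b)),
        key=lambda t: t[2],
    )

# Toy 2-D lens vectors, one per frame.
clip_a = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
clip_b = [[5.0, 0.0], [2.1, 0.1], [3.0, 1.0]]

i, j, d = best_transition(clip_a, clip_b)
# Rough cut: play clip_a up to and including frame i, then clip_b from frame j on.
rough_cut = clip_a[: i + 1] + clip_b[j:]
print(i, j, round(d, 3))
```

With more than two clips, the same pairwise search could be applied to each consecutive pair in the sequence the editor selected.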
Example videos created by user study participants
VideoMap's Project Panel can be extended to create summary videos. We automatically create "semantic districts" that approximately represent the main activities of a video using k-means clustering under the semantic lens. The editor can select several landmarks to specify the activities to include in the summary video. Selections are highlighted with red borders (a) and displayed as a storyboard (b).
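The clustering step can be illustrated with a plain implementation of Lloyd's k-means over per-frame vectors. This is a sketch only: the source names k-means but gives no details, so the farthest-point initialization, the iteration count, and the toy 2-D vectors are assumptions chosen to keep the example deterministic.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's k-means; returns (centroids, labels)."""
    # Farthest-point initialization (an assumption): deterministic, spreads seeds out.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each frame joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

# Toy semantic vectors: two well-separated activities in one video.
frames = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
          [5.0, 5.1], [5.1, 5.0], [4.9, 5.0]]
centroids, labels = kmeans(frames, k=2)
print(labels)
```

Each label groups frames into one "district"; how VideoMap picks the number of districts or the landmark shown for each one is not stated in the source.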
VideoMap's Paths Explorer can be extended to create highlight videos. The editor can upload a photograph (i.e., a custom landmark) depicting an activity (e.g., skydiving) (a). Our key insight is that photographs taken by photographers tend to capture the most highlight-worthy moments of an activity (e.g., when the skydiver jumps out of the aircraft). We then generate a highlight video from the video frames that are the nearest neighbors of the custom landmark in the semantic space (b).
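As a rough illustration of this selection, the sketch below picks the k frames nearest to the landmark's vector and merges consecutive frame indices into playable segments. The merging step, the value of k, and the name `highlight_segments` are assumptions added for the example; the source only states that nearest-neighbor frames are used.

```python
import math

def highlight_segments(landmark_vec, frame_vecs, k=4):
    """Pick the k frames nearest the landmark and merge them into time-ordered segments.

    frame_vecs is a list ordered by time; the result is a list of
    (first_frame, last_frame) index pairs, both inclusive.
    """
    ranked = sorted(range(len(frame_vecs)),
                    key=lambda i: math.dist(landmark_vec, frame_vecs[i]))
    chosen = sorted(ranked[:k])  # back to temporal order
    segments = []
    for idx in chosen:
        if segments and idx == segments[-1][1] + 1:
            segments[-1][1] = idx        # extend the current segment
        else:
            segments.append([idx, idx])  # start a new segment
    return [tuple(s) for s in segments]

# Toy 1-D semantic vectors for seven consecutive frames.
frames = [[0.0], [0.1], [0.9], [1.0], [0.95], [0.2], [0.85]]
landmark = [1.0]  # hypothetical embedding of the uploaded photograph
segments = highlight_segments(landmark, frames, k=4)
print(segments)
```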
Text-Based Video Editing
VideoMap's Route Planner can be extended to edit videos using text. The editor can describe a desired video using descriptive sentences, like writing a story (a). We then match each sentence to the closest video clip in the semantic space and generate a video by finding the shortest route along the clips (b).
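The sentence-to-clip matching step might look like the sketch below, which assigns each sentence, in story order, to the clip with the nearest vector. It assumes sentences and clips have already been embedded into one shared semantic space by some model; the name `match_sentences` and the toy vectors are hypothetical, and the route-finding step that follows matching is not shown.

```python
import math

def match_sentences(sentence_vecs, clip_vecs):
    """For each sentence (in story order), return the id of the closest clip."""
    return [min(clip_vecs, key=lambda cid: math.dist(s, clip_vecs[cid]))
            for s in sentence_vecs]

# Toy 2-D semantic vectors, one per clip.
clips = {
    "beach": [1.0, 0.0],
    "city": [0.0, 1.0],
    "forest": [0.7, 0.7],
}
# Hypothetical embeddings of three sentences of a short story.
story = [[0.1, 0.9], [0.9, 0.1], [0.6, 0.8]]

sequence = match_sentences(story, clips)
print(sequence)
```

In this simple form the same clip can be matched by more than one sentence; whether VideoMap allows that is not stated in the source.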