An Intelligent Authoring Tool for Non-Programmers Using the Informedia Video Library

Digital Library Initiative-2
IIS-9817527

Brad Myers, Albert Corbett, and Scott Stevens
Human Computer Interaction Institute
School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213-3891

Increasing amounts of digital video are available on the World-Wide Web and in digital libraries, but using this information is still quite difficult with today's tools. Whereas many researchers are studying searching and summarization techniques for text, images, and other material, there is very little work concentrating on video or on how to use the video once appropriate pieces are found. Our research will investigate how to create a comprehensive tool that will allow the general public to author interesting compositions using digital video. In particular, it will allow the user to specify sophisticated interactive behaviors for the videos and for extra graphical drawings (called synthetic graphics) layered on top of the videos. For example, users might specify which objects in the video can be clicked on to choose the next video clip, or that an arrow should be drawn that shows the path that an object will follow, or that the video is part of a lesson and a viewer's answer to a question determines the next action. Children and their teachers will be able to create interesting interactive compositions using videos. The goal is to make it as easy to use the video material found in a digital library as it is to use textual material found in today's libraries.
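As a concrete illustration of the kinds of interactive behaviors we have in mind, the sketch below shows one plausible way such a composition could be represented as data. All of the class and field names are hypothetical, introduced only for illustration; they are not part of Informedia or of the proposed tool.

    from dataclasses import dataclass, field
    from typing import List, Optional

    # Hypothetical data model for an interactive video composition; all names
    # below are illustrative only and are not part of the Informedia system.

    @dataclass
    class Region:
        """A clickable area in the video, tracked over a range of frames."""
        label: str                # e.g. "ball" or "car"
        start_frame: int
        end_frame: int

    @dataclass
    class Behavior:
        """What happens when the viewer interacts with a tracked region."""
        trigger: str                        # e.g. "click"
        region: Region
        next_clip: Optional[str] = None     # clip to play next, if any
        overlay_arrow: bool = False         # draw a synthetic arrow along the object's path

    @dataclass
    class Clip:
        source: str                         # video file or library identifier
        start_sec: float
        end_sec: float
        behaviors: List[Behavior] = field(default_factory=list)

    # Example: clicking the tracked "ball" in the first clip jumps to a second clip.
    clip1 = Clip("lesson/ball.mpg", 0.0, 12.5)
    ball = Region("ball", start_frame=30, end_frame=300)
    clip1.behaviors.append(Behavior("click", ball, next_clip="lesson/goal.mpg"))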

We will exploit the substantial capabilities provided by the Informedia Digital Video Library Project to provide advanced features for authoring multimedia compositions using video. We will create an Intelligent Video Editor that provides high-level tools for working with the source video. Informedia provides a transcript of the audio of all video, which can be used for searching and for selecting ranges of the video to copy or delete. Intelligent controls will be included that jump backwards and forwards to the beginning and end of syntactically meaningful units, such as camera movements, fades, and scene changes, rather than just moving by a fixed number of frames. We will incorporate algorithms that can identify and track objects within the video, such as a moving ball or car, and use them to make it easy to select a sequence of frames where an object is visible, to specify an action to take when the user clicks on an object, or even to introduce a synthetic graphic that follows an object in the video. We will also provide techniques for organizing sets of videos and video clips, such as outlines, hierarchies, and collections of clips.
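The following sketch illustrates two of the editor operations described above, under the assumption that the library supplies a word-aligned transcript and a precomputed list of scene boundaries. The data formats and function names are our own assumptions, not the Informedia API.

    # Minimal sketch of two editor operations, assuming (a) a word-aligned
    # transcript and (b) precomputed scene boundaries; the data formats here
    # are assumptions for illustration, not Informedia's interfaces.

    from bisect import bisect_right
    from typing import List, Optional, Tuple

    # (word, start_sec, end_sec) triples from the aligned transcript.
    Transcript = List[Tuple[str, float, float]]

    def select_range_by_phrase(transcript: Transcript, phrase: str) -> Optional[Tuple[float, float]]:
        """Return the (start, end) time range covering the first occurrence of phrase."""
        words = [w.lower() for w, _, _ in transcript]
        target = phrase.lower().split()
        for i in range(len(words) - len(target) + 1):
            if words[i:i + len(target)] == target:
                return transcript[i][1], transcript[i + len(target) - 1][2]
        return None

    def jump_to_next_boundary(current_sec: float, boundaries: List[float]) -> float:
        """Jump forward to the next scene-change or fade boundary after current_sec."""
        i = bisect_right(boundaries, current_sec)
        return boundaries[i] if i < len(boundaries) else current_sec

    transcript = [("the", 0.0, 0.2), ("rocket", 0.2, 0.8), ("lifts", 0.8, 1.1), ("off", 1.1, 1.4)]
    print(select_range_by_phrase(transcript, "rocket lifts off"))  # (0.2, 1.4)
    print(jump_to_next_boundary(5.0, [2.0, 7.5, 14.0]))            # 7.5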

Previous approaches to authoring interactive behaviors for video and other multimedia presentations have used scripting languages based on conventional control structures, which research shows are hard for non-programmers to use. To address this problem, we propose to integrate demonstrational and natural programming techniques into our tool. In a demonstrational interface, the user gives examples of the desired actions and results, and the system generates the code to perform the same actions at run time. Demonstrational techniques have proven successful for specifying simple behaviors; the research challenge is to automatically generalize the user's actions so that the generated code works in a variety of circumstances and is therefore more robust and useful. The code itself will be expressed in a more natural programming language. Human-factors studies have investigated how people naturally express algorithms, and they reveal general principles that will be applied to the design of the new language: for example, people tend to state the general case first and the exceptions afterwards, and they avoid loops by applying operations to sets of objects. Using these results, along with findings from the fields of Empirical Studies of Programmers and Human-Computer Interaction, we will create a language that is easier to learn and more effective to use, enabling a wider range of people to read, generate, and modify the code. However, designing a good language will not be sufficient; a supporting environment will also be required to help with editing and debugging the code. By integrating tools for finding and organizing videos, editing video, demonstrating behaviors, writing code in a more natural programming language, and testing and debugging that code, we can provide an environment that enables more people to author interesting compositions with video. The tools we create will be tested continuously with schoolchildren and adults to evaluate and refine their features.
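The sketch below suggests, in simplified form, how a demonstrational system might generalize two user demonstrations into a single rule, and how the inferred rule could read in a set-oriented style rather than as an explicit loop. It is an assumption about one possible design, not a specification of the proposed language.

    # Illustrative sketch only: one simple way a demonstrational system might
    # generalize two user examples into a rule. Everything here is an
    # assumption about a possible design, not the proposed language itself.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Example:
        clicked_object: str    # label of the tracked object the user clicked
        resulting_clip: str    # clip the user then chose to play

    def generalize(examples: List[Example]) -> str:
        """Infer a rule from demonstrations: if every example pairs an object
        label with a clip named after it, generalize over the label."""
        if all(e.resulting_clip == f"{e.clicked_object}.mpg" for e in examples):
            # General case stated once, applying to the whole set of objects.
            return "when any tracked object is clicked, play the clip named after it"
        # Otherwise fall back to literal rules for just the demonstrated cases.
        return "; ".join(f"when {e.clicked_object} is clicked, play {e.resulting_clip}"
                         for e in examples)

    demos = [Example("ball", "ball.mpg"), Example("car", "car.mpg")]
    print(generalize(demos))  # one general rule covering both demonstrations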