Excerpt from the introduction:
More and more visual information is available in digital form, in various places and on various media. The emergence of digital video and its proliferation in multimedia applications have created a significant demand for content-based
representation of visual information. The main purpose of video segmentation is to enable content-based representation by extracting objects of interest from a series of consecutive video frames. Briefly, the applications that motivate video
segmentation fall into four groups: indexing and retrieval; compression and coding; recognition, identification, and understanding of video scenes; and editing, manipulation, and animation.
Video databases on the market today offer only limited or domain-specific search capabilities, based on characteristics such as color, texture, and simple motion statistics. If video can be stored in the form of individual objects, indexing
and retrieval of visual information can become as simple as that of textual information. An essential tool in the management of visual records is the ability to automatically describe and index the content of video sequences in a meaningful
manner. Such a facility would allow recovery of desired video segments or objects from a very large database of video sequences. The efficient use of stock film archives and identification of specific activities in surveillance
videos are among the potential applications.
From a compression point of view, video segmentation is essential for object-based video coding standards such as MPEG-4. Because video sequences are so large, communicating
digital video over bandwidth-limited networks demands efficient coding techniques. With an object-based representation scheme that identifies the important parts of image frames, video sequences can be encoded
efficiently enough to satisfy transmission requirements. Videoconferencing is one of the applications that benefit from object-based coding.
Video segmentation is key to many robotic vision applications. Most vision-based autonomous
vehicles acquire information about their surroundings by analyzing video. In particular, segmentation is required for high-level image understanding and scene interpretation, such as spotting and tracking special events in surveillance video.
For instance, pedestrian and highway traffic can be regulated using density estimates obtained by segmenting people and vehicles. Object segmentation makes it possible to detect speeding or suspiciously moving cars, road obstacles, and unusual
activities. Forbidden zones, parking lots, and elevators can be monitored automatically. Gesture recognition and visual biometric extraction can be performed for user interfaces.
With a good segmentation, it is possible
to access and manipulate objects in video. To illustrate, traffic enforcement currently employs supervised video segmentation tools to identify speeding or trespassing cars. The infotainment industry uses video segmentation
for editing, manipulation, and animation.
Although humans can quickly interpret the semantic content embedded in the information carried by different modalities, computer understanding of visual information
is still at a primitive stage. Good segmentation tools are crucial to the success of future standards, but automatically segmenting image sequences into semantically meaningful objects proves to be very challenging. We
currently have a reasonably good understanding of the basic mechanisms underlying visual information processing; still, many questions remain open to investigation, some desperately waiting for an answer.