Workload Characterization of a Parallel Video Mining Application on a 16-Way Shared-Memory Multiprocessor System

6 Author(s)

As video data become more pervasive, mining information from multimedia data sources becomes increasingly important, e.g., automatically extracting highlights from soccer game video. However, the large computational cost of mining the data of interest limits its wide use in practice. Since the hardware imperative behind computer architecture is shifting from uniprocessors to multi-core processors, exploiting the thread-level parallelism present in multimedia mining applications is critical to utilizing hardware resources and accelerating the complex processing of highlight event detection. In this paper we analyze the view type and playfield detection application, which is widely used in sports video mining systems, and we present several schemes for parallelizing it: task-level, data-slicing-level, and a hybrid parallel scheme, as well as variations of the hybrid scheme. The hybrid parallel scheme, which exploits data-level and task-slicing-level parallelism, outperforms the basic task-level and data-slicing-level schemes, delivering much better execution time and speedup. On a 16-way shared-memory multiprocessor system with hardware prefetch enabled, the hybrid scheme achieves a speedup of 10.6x. Detailed performance analysis shows that, because of its large working set, the workload often requires data from off-chip memory. Bus bandwidth saturation is therefore the likely bottleneck preventing perfect scalability. With hardware prefetch enabled, the bus utilization rate on the 16-processor system is about 76% for the hybrid scheme, and the projected bus bandwidth requirement for perfect scalability is about 3.1 GB/s for 16 processors and 6.2 GB/s for 32 processors.
In addition, our experiments reveal no other obvious scaling-limiting factors: synchronization overhead and load imbalance remain very low even with up to 16 processors.
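The figures quoted in the abstract can be related with some simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the linear per-processor bandwidth model is an assumption inferred from the two projected figures (3.1 GB/s and 6.2 GB/s), not a method stated by the authors.

```python
# Illustrative arithmetic on the numbers reported in the abstract.
# All function names are hypothetical; the linear scaling model is an
# assumption, not something the paper is confirmed to use.

def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal (linear) speedup actually achieved."""
    return speedup / processors


def per_processor_bandwidth(total_gb_per_s: float, processors: int) -> float:
    """Bus bandwidth attributed to each processor under a linear model."""
    return total_gb_per_s / processors


def project_bandwidth(per_proc_gb_per_s: float, processors: int) -> float:
    """Projected total bus bandwidth if demand grows linearly."""
    return per_proc_gb_per_s * processors


# Reported: 10.6x speedup on 16 processors (hybrid scheme, prefetch on).
efficiency_16 = parallel_efficiency(10.6, 16)

# Reported: about 3.1 GB/s needed for perfect scalability on 16 processors.
per_proc = per_processor_bandwidth(3.1, 16)
projected_32 = project_bandwidth(per_proc, 32)

print(f"Parallel efficiency on 16 processors: {efficiency_16:.1%}")
print(f"Assumed per-processor bandwidth: {per_proc:.3f} GB/s")
print(f"Projected requirement for 32 processors: {projected_32:.1f} GB/s")
```

Under this assumed linear model, doubling the processor count doubles the projected bandwidth requirement, which matches the 6.2 GB/s figure quoted for 32 processors.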

Published in:

Workload Characterization, 2006 IEEE International Symposium on

Date of Conference:

25-27 Oct. 2006