Extensive research has been devoted to real-time face detection, but heavy computation and excessive memory access remain among the main obstacles to improving its speed and accuracy. This paper proposes a novel hardware architecture that improves the utilization of hardware resources while reducing memory bandwidth and memory access time. The proposed architecture allows image data loaded into a line buffer to be used three times, for operations at different scales. The size of the line buffer is reduced by partitioning the input image into sub-images. For the parallel execution of weak classifiers, the hardware is optimized for features with two rectangles, which account for 88% of all features. This optimization improves hardware utilization and consequently reduces the number of execution cycles. Compared with the previous architecture, the memory bandwidth, memory access time, execution cycles and line buffer size are reduced by 39.5%, 59.3%, 13.2% and 24.7%, respectively.
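As an illustration of what a two-rectangle feature involves, the following is a minimal software sketch. It assumes that the weak classifiers evaluate Haar-like rectangle features on an integral image (summed-area table), as is common in cascade-based face detectors; the abstract itself does not state this, and the function names `integral_image`, `rect_sum` and `two_rect_feature` are hypothetical. It does not model the proposed hardware, line buffer or scaling scheme.

```python
# Illustrative sketch only: assumes Haar-like features on an integral image.
# None of these names or choices come from the paper.

def integral_image(img):
    """Return an (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the w x h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left rectangle minus the horizontally adjacent right rectangle (each w x h)."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)


if __name__ == "__main__":
    img = [[1, 2, 3, 4],
           [5, 6, 7, 8]]
    ii = integral_image(img)
    print(two_rect_feature(ii, 0, 0, 2, 2))
```

Under this assumption, the two adjacent rectangles share two corners, so a two-rectangle feature needs only six distinct table entries, whereas the sketch above reads eight for simplicity.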