The Hadoop Distributed File System (HDFS) is widely used for large-scale data storage and processing, and Hadoop employs the MapReduce programming model for parallel computation. The work presented in this paper proposes a novel Hadoop plugin for processing image files with the MapReduce model. The plugin introduces image-related I/O formats and novel classes for creating records from input files. HDFS is designed to work with a small number of large files; the proposed technique therefore merges multiple small files into one large file, preventing the performance loss that stems from handling a large number of small files. In this way, each task becomes capable of processing multiple images in a single run cycle. The effectiveness of the proposed technique is demonstrated through an application scenario for face detection on distributed image files.
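The core merging idea can be sketched outside Hadoop as a simple length-prefixed container: many small image payloads are packed into one large byte stream (key = file name, value = raw bytes), which a single task can then iterate in one pass. This mirrors the role a Hadoop SequenceFile plays in such plugins, but all class and method names below (`ImagePacker`, `pack`, `unpack`) are illustrative assumptions, not the paper's actual API.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Illustrative sketch of small-file merging: pack many small "image" files
// into one container so one task reads them all sequentially, instead of
// paying per-file overhead for each tiny input.
public class ImagePacker {

    // Serialize records as [nameLen][nameBytes][dataLen][dataBytes]...
    public static byte[] pack(Map<String, byte[]> images) throws IOException {
        ByteArrayOutputStream container = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(container);
        for (Map.Entry<String, byte[]> e : images.entrySet()) {
            byte[] name = e.getKey().getBytes(StandardCharsets.UTF_8);
            out.writeInt(name.length);
            out.write(name);
            out.writeInt(e.getValue().length);
            out.write(e.getValue());
        }
        return container.toByteArray();
    }

    // Read records back until the stream is exhausted.
    public static Map<String, byte[]> unpack(byte[] container) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(container));
        Map<String, byte[]> records = new LinkedHashMap<>();
        while (true) {
            int nameLen;
            try { nameLen = in.readInt(); } catch (EOFException eof) { break; }
            byte[] name = new byte[nameLen];
            in.readFully(name);
            byte[] data = new byte[in.readInt()];
            in.readFully(data);
            records.put(new String(name, StandardCharsets.UTF_8), data);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Simulate three small image files merged into one large container.
        Map<String, byte[]> images = new LinkedHashMap<>();
        images.put("face1.jpg", new byte[]{1, 2, 3});
        images.put("face2.jpg", new byte[]{4, 5});
        images.put("face3.jpg", new byte[]{6, 7, 8, 9});
        Map<String, byte[]> recovered = unpack(pack(images));
        System.out.println(recovered.size() + " images recovered from one container");
    }
}
```

In the actual plugin, an analogous record layout would be exposed to MapReduce through a custom InputFormat/RecordReader pair, so each map invocation receives one image (or a batch of images) rather than one whole file split.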