We present a multicore-enabled smart storage system for clusters in general and MapReduce clusters in particular. The goal of this research is to improve the performance of data-intensive parallel applications on clusters by offloading data processing to multicore processors in storage nodes. Unlike traditional storage devices, next-generation disks will have the computing capability to reduce the computational load on host processors (CPUs). With advances in processor and memory technologies, smart storage systems are promising devices for performing complex on-disk operations. The proposed smart storage system can avoid moving huge amounts of data back and forth between the storage nodes and the computing nodes of a cluster. To enhance the performance of data-intensive applications, we have designed a smart storage system called Multicore-enabled Smart Storage (McSD), in which a multicore processor is integrated into storage nodes. We have implemented a programming framework for data-intensive applications running on a computing system coupled with McSD; the framework aims to balance the load between computing nodes and multicore-enabled smart storage nodes. To fully utilize the multicore processors in smart storage nodes, we have implemented the MapReduce model for McSDs to handle parallel computing on a cluster. A prototype of McSD has been implemented in a cluster connected by Gigabit Ethernet. Experimental results show that McSD can significantly reduce the execution times of three real-world applications: word count, string matching, and matrix multiplication. We demonstrate that integrating multicore-enabled smart storage with MapReduce clusters is a promising approach to improving the overall performance of data-intensive applications on clusters.
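The abstract does not describe how McSD's MapReduce runtime is implemented. As a rough illustration only, the following sketch shows the general idea of running a MapReduce word count across the cores of a single node. It assumes each Python worker process stands in for one core of the storage node's multicore processor. The function names (`map_chunk`, `reduce_counts`, `split_into_chunks`, `mapreduce_word_count`) are hypothetical and are not part of McSD.

```python
from collections import Counter
from multiprocessing import Pool


def map_chunk(lines):
    """Map phase: emit partial word counts for one chunk of input lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Reduce phase: merge the partial counts produced by every mapper."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def split_into_chunks(lines, n):
    """Split the input into at most n contiguous chunks, one per core."""
    size = max(1, -(-len(lines) // n))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def mapreduce_word_count(lines, workers=4):
    """Run map tasks on `workers` processes (hypothetically one per core), then reduce."""
    chunks = split_into_chunks(lines, workers)
    if workers == 1:
        partials = map(map_chunk, chunks)  # sequential fallback, no process pool
    else:
        with Pool(processes=workers) as pool:
            partials = pool.map(map_chunk, chunks)
    return reduce_counts(partials)


if __name__ == "__main__":
    data = ["the quick brown fox", "the lazy dog", "the end"]
    print(mapreduce_word_count(data, workers=2).most_common(1))  # [('the', 3)]
```

Each map task runs near its own chunk of data and returns only a small `Counter`, so only compact intermediate results need to be merged. This mirrors the motivation for processing data on the storage node instead of shipping raw data to computing nodes.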