A novel indexing scheme for efficient handling of small files in Hadoop Distributed File System

5 Author(s)
Chandrasekar, S. ; Dept. of Comput. Sci. & Eng., SSN Coll. of Eng., Kalavakkam, India ; Dakshinamurthy, R. ; Seshakumar, P.G. ; Prabavathy, B.

Hadoop Distributed File System (HDFS) is designed for reliable storage and management of very large files. All files in HDFS are managed by a single server, the NameNode, which keeps metadata in its main memory for every file stored in HDFS. As a consequence, HDFS suffers a performance penalty as the number of small files grows: storing and managing a large number of small files imposes a heavy burden on the NameNode, and the number of files that HDFS can hold is constrained by the size of the NameNode's main memory. Further, HDFS does not take the correlation among files into account, and it does not provide any prefetching mechanism to improve I/O performance. To improve the efficiency of storing and accessing small files on HDFS, we propose a solution based on the work of Dong et al., namely Extended Hadoop Distributed File System (EHDFS). In this approach, a set of correlated files, as identified by the client, is combined into a single large file to reduce the file count. An indexing mechanism has been built to access the individual files within the corresponding combined file. Further, index prefetching is provided to improve I/O performance and minimize the load on the NameNode. The experimental results indicate that EHDFS reduces the metadata footprint in the NameNode's main memory by 16% and also improves the efficiency of storing and accessing a large number of small files.
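
The combine-and-index idea described in the abstract can be illustrated with a minimal in-memory sketch. This is not the EHDFS implementation and uses no HDFS API: the function names `combine_files` and `read_file`, the `(offset, length)` index layout, and the sample file names are all assumptions made for illustration. It shows only the general principle that many small files can be stored as one large object while remaining individually addressable through an index.

```python
from typing import Dict, Tuple

# Hypothetical index layout: file name -> (offset, length) in the combined file.
Index = Dict[str, Tuple[int, int]]


def combine_files(files: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and record where each one lives."""
    combined = bytearray()
    index: Index = {}
    for name, data in files.items():
        # The file starts at the current end of the combined blob.
        index[name] = (len(combined), len(data))
        combined.extend(data)
    return bytes(combined), index


def read_file(combined: bytes, index: Index, name: str) -> bytes:
    """Recover one small file from the combined blob using the index."""
    offset, length = index[name]
    return combined[offset:offset + length]


# Illustrative data only: three small "files" supplied by a client.
small_files = {"a.txt": b"alpha", "b.txt": b"bravo!", "c.txt": b""}
combined, index = combine_files(small_files)
```

In this sketch a storage layer would track one object (the combined blob) instead of three, and a lookup such as `read_file(combined, index, "b.txt")` slices the requested file back out. How EHDFS actually lays out its index, and where the index is kept, is not specified in the abstract.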

Published in: Computer Communication and Informatics (ICCCI), 2013 International Conference on

Date of Conference: 4-6 Jan. 2013
