In essence, computers are tools that help us with our daily lives: CPUs are extensions of our reasoning capability, whereas disks are extensions of our memory. However, the simple hierarchical namespace of existing file systems is inadequate for managing today's files, which carry rich semantics. In this paper, we argue for integrating semantic information into the storage system. We propose Sedar, a deep archival file system. Sedar is one of the first archival file systems to integrate semantic storage and retrieval capabilities. In addition, Sedar introduces several novel features: "semantic hashing", which reduces storage consumption and is robust against misalignment of documents; "virtual snapshots" of the namespace; and "conceptual deletion" of files and directories. It exposes a semantic catalog on which other semantic-based tools (e.g., visualization and statistical analysis) can be built, and it uses a decentralized peer-to-peer storage utility that enables horizontal scalability.
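The abstract does not say how semantic hashing is computed. As an illustration only, the sketch below shows why hashing content at content-derived boundaries can resist misalignment better than hashing fixed-size blocks. It assumes, hypothetically, that a "semantic unit" is a paragraph separated by a blank line; the functions `fixed_block_hashes`, `unit_hashes`, and `shared_fraction` are illustrative names, not Sedar's design.

```python
import hashlib


def fixed_block_hashes(text: str, block_size: int = 16) -> set[str]:
    """Hash fixed-size byte blocks; an insertion shifts every later boundary."""
    data = text.encode("utf-8")
    return {
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    }


def unit_hashes(text: str) -> set[str]:
    """Hash hypothetical 'semantic units' (here: blank-line-separated paragraphs)."""
    units = [u.strip() for u in text.split("\n\n") if u.strip()]
    return {hashlib.sha256(u.encode("utf-8")).hexdigest() for u in units}


def shared_fraction(old: set[str], new: set[str]) -> float:
    """Fraction of the new version's hashes already stored for the old version."""
    return len(old & new) / len(new) if new else 1.0


v1 = "Intro paragraph.\n\nMethods paragraph.\n\nResults paragraph."
v2 = "Note.\n\n" + v1  # a short insertion at the front misaligns fixed blocks

print(shared_fraction(fixed_block_hashes(v1), fixed_block_hashes(v2)))  # 0.0
print(shared_fraction(unit_hashes(v1), unit_hashes(v2)))                # 0.75
```

With fixed-size blocks, the 7-byte insertion shifts every block boundary, so none of the new version's blocks match stored ones. With unit-level hashing, three of the four paragraphs are unchanged and can be deduplicated.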