Scheduled System Maintenance:
On Wednesday, July 29th, IEEE Xplore will undergo scheduled maintenance from 7:00-9:00 AM ET (11:00-13:00 UTC). During this time there may be intermittent impact on performance. We apologize for any inconvenience.
By Topic

An approximation to the greedy algorithm for differential compression

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $31
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Agarwal, R.C. ; IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, California 95120, USA ; Gupta, K. ; Jain, S. ; Amalapurapu, S.

We present a new differential compression algorithm that combines the hash value techniques and suffix array techniques of previous work. The term “differential compression” refers to encoding a file (a version file) as a set of changes with respect to another file (a reference file). Previous differential compression algorithms can be shown empirically to run in linear time, but they have certain drawbacks; namely, they do not find the best matches for every offset of the version file. Our algorithm, hsadelta (hash suffix array delta), finds the best matches for every offset of the version file, with respect to a certain granularity and above a certain length threshold. The algorithm has two variations depending on how we choose the block size. We show that if the block size is kept fixed, the compression performance of the algorithm is similar to that of the greedy algorithm, without the associated expensive space and time requirements. If the block size is varied linearly with the reference file size, the algorithm can run in linear time and constant space. We also show empirically that the algorithm performs better than other state-of-the-art differential compression algorithms in terms of compression and is comparable in speed.

Note: The Institute of Electrical and Electronics Engineers, Incorporated is distributing this Article with permission of the International Business Machines Corporation (IBM) who is the exclusive owner. The recipient of this Article may not assign, sublicense, lease, rent or otherwise transfer, reproduce, prepare derivative works, publicly display or perform, or distribute the Article.  

Published in:

IBM Journal of Research and Development  (Volume:50 ,  Issue: 1 )