The shortest common superstring problem: average case analysis for both exact and approximate matching

Authors: En-hui Yang (Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada); Zhang, Z.

The shortest common superstring problem and its extension to approximate matching are considered in a probability model in which every string in a given set has the same length and the letters of the strings are drawn independently from a finite set. The savings of a superstring is defined as the difference between the total length of the strings in the given set and the length of the superstring. In the exact matching case, several algorithms proposed in the literature are shown to be asymptotically optimal: for each of these algorithms, the ratio of the savings of the superstring it constructs to the optimal savings of the shortest superstring converges in probability to 1 as the number of strings in the given set increases. In the approximate matching case, a modified version of the shortest common approximate matching superstring problem is analyzed. It is demonstrated that the optimal savings in this case is given approximately by n log n / I_l(Q, Q, 2D), where n is the number of strings in the given set, Q is the probability distribution governing the selection of the letters of the strings, I_l(Q, Q, 2D) is the lower mutual information between Q and Q with respect to 2D, and D ≥ 0 is the distortion allowed in approximate matching. In addition, an approximation algorithm is proposed and proved to be asymptotically optimal.
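
To make the notion of savings in the exact matching case concrete, here is a minimal Python sketch. The abstract does not name the algorithms it analyzes, so the greedy merge heuristic below is an assumption chosen only for illustration, and the function names `overlap`, `greedy_superstring`, and `savings` are hypothetical. It is not the authors' method.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings):
    """Repeatedly merge the pair of strings with the largest overlap.

    Illustrative heuristic only; it returns a common superstring,
    not necessarily a shortest one.
    """
    # Drop duplicates and strings already contained in another string.
    items = list(dict.fromkeys(strings))
    items = [s for s in items if not any(s != t and s in t for t in items)]

    while len(items) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the best pair, writing the overlapping part only once.
        merged = items[best_i] + items[best_j][best_k:]
        items = [s for idx, s in enumerate(items) if idx not in (best_i, best_j)]
        items.append(merged)

    return items[0] if items else ""


def savings(strings, superstring):
    """Total length of the input strings minus the superstring length."""
    return sum(len(s) for s in strings) - len(superstring)


if __name__ == "__main__":
    example = ["abcd", "cdef", "efgh"]  # equal-length strings, as in the model
    sup = greedy_superstring(example)
    print(sup, savings(example, sup))
```

The savings ratio discussed in the abstract would compare the value returned by `savings` for a heuristic superstring against the same quantity for a shortest superstring.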

Published in: IEEE Transactions on Information Theory (Volume: 45, Issue: 6)