With the rapid development of high-throughput DNA sequencing technologies, the volume of DNA sequence data is growing exponentially. This influx of data creates new challenges for storage and transmission. This paper proposes a novel adaptive particle swarm optimization-based memetic algorithm (POMA) for DNA sequence compression. POMA combines comprehensive learning particle swarm optimization (CLPSO), which serves as the global search, with a local search based on an adaptive intelligent single particle optimizer (AdpISPO). It draws on the strengths of both to optimize the design of the approximate repeat vector (ARV) codebook for DNA sequence compression. The ARV, introduced for the first time in this paper, represents fragments that repeat across multiple sequences in direct, mirror, pairing, and inverted patterns. In POMA, candidate ARV codebooks are encoded as particles, and the optimal solution, namely the codebook that covers the most approximately repeated fragments with the fewest base variations, is identified through the exploration and exploitation of the search. In each iteration of POMA, leader particles are selected from the swarm based on weighted fitness values, and each leader is fine-tuned with an AdpISPO-based local search to accelerate convergence within its local region. A detailed comparison between POMA and counterpart algorithms is performed on 29 benchmark functions (23 basic and 6 composite) and 11 real DNA sequences. POMA obtains better or competitive performance with a limited number of function evaluations. It also attains lower bits-per-base than other state-of-the-art DNA-specific algorithms on DNA sequence data. The experimental results suggest that the cooperation of CLPSO and AdpISPO within a memetic algorithm framework can search the ARV codebook space efficiently.
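The four repeat patterns named above can be illustrated with a minimal Python sketch. It assumes the definitions commonly used in DNA compression (direct: the same fragment; mirror: the reversed fragment; pairing: the base-wise complement; inverted: the reverse complement) and counts base variations as position-wise mismatches between equal-length strings. The abstract does not specify the ARV layout or the fitness function, so the function names `repeat_variants`, `mismatches`, and `best_pattern` are hypothetical and do not reproduce the paper's implementation.

```python
# Watson-Crick complement table: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_variants(fragment):
    """Return the fragment under the four assumed repeat patterns."""
    return {
        "direct": fragment,                                # same order, same bases
        "mirror": fragment[::-1],                          # reversed order
        "pairing": fragment.translate(COMPLEMENT),         # complemented bases
        "inverted": fragment[::-1].translate(COMPLEMENT),  # reverse complement
    }


def mismatches(a, b):
    """Count position-wise base variations between two equal-length fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_pattern(codeword, fragment):
    """Pick the pattern under which the codeword approximates the fragment best.

    Returns a (pattern_name, mismatch_count) pair; ties are resolved in the
    order direct, mirror, pairing, inverted.
    """
    scored = (
        (name, mismatches(variant, fragment))
        for name, variant in repeat_variants(codeword).items()
    )
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # The codeword AACG matches the fragment CGTA best as an inverted repeat.
    print(best_pattern("AACG", "CGTA"))
```

Under these assumptions, a codebook entry that matches a fragment with few mismatches in any of the four patterns can stand in for that fragment, with only the pattern and the differing bases stored separately. This is the intuition behind preferring codebooks that cover many fragments with few base variations.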