I. Introduction
Sorting is an essential function for many scientific and data processing applications. Extensive research has optimized multiple software sorting algorithms for general-purpose computing, helping applications achieve higher performance. The need for higher performance has also motivated the migration of sorting algorithms into specialized hardware. However, many of the assumptions made to increase performance on a general-purpose processor do not hold for custom hardware implementations. When directly translated to a hardware processor, software algorithms can quickly degrade into a series of data retrievals, comparisons, swaps, and writes. The cost of these operations can be magnified in systems with low processor speeds, limited storage, disabled caches, and high-latency memory access.