Thanks to their parallel processing capabilities, multicore processors and graphics processing units (GPUs) can be combined to implement signal-processing algorithms for communication systems efficiently. This paper proposes a fully parallel fixed-complexity soft-output detector that is suitable for GPU implementation and considerably reduces the computation time of the data detection stage in multiple-input multiple-output (MIMO) systems. A novel channel matrix preprocessing stage, based on column-norm ordering, is developed to efficiently match the multicore architecture. The throughput of the implementation is shown to outperform other recent implementations and to support some of the configurations in the Long-Term Evolution (LTE) standard.
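To make the idea of column-norm ordering concrete, the following minimal sketch sorts the columns of a small channel matrix by their Euclidean norm. It is an illustration only, not the preprocessing stage of the paper: the function name `column_norm_order`, the example matrix `H`, and the `descending` flag are hypothetical, and the sort direction a detector would actually need is an assumption left open here.

```python
import math


def column_norm_order(H, descending=False):
    """Sort the columns of H (a list of rows of complex gains) by Euclidean norm.

    Returns the column permutation, the per-column norms (in original
    column order), and the permuted matrix. The sort direction is an
    assumption; it is exposed as a parameter for that reason.
    """
    n_rows = len(H)
    n_cols = len(H[0])
    # Euclidean norm of each column: sqrt(sum over rows of |h_rc|^2).
    norms = [
        math.sqrt(sum(abs(H[r][c]) ** 2 for r in range(n_rows)))
        for c in range(n_cols)
    ]
    # Column indices ordered by norm (sorted() is stable, so ties keep
    # their original relative order).
    order = sorted(range(n_cols), key=lambda c: norms[c], reverse=descending)
    # Apply the permutation to every row.
    H_perm = [[row[c] for c in order] for row in H]
    return order, norms, H_perm


# Hypothetical 2x3 channel matrix (2 receive antennas, 3 transmit streams).
H = [
    [1 + 1j, 0.5 + 0j, 2 - 1j],
    [0 - 1j, 0.1 + 0.2j, 1 + 1j],
]
order, norms, H_perm = column_norm_order(H)
print(order)  # column indices from smallest to largest norm
```

Because each column norm depends only on that column, the norms can be computed independently of one another, which is one plausible reason such an ordering maps well onto a parallel architecture; the sketch above runs sequentially and makes no claim about the GPU implementation itself.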