Abstract:
As the HPC industry moves to exascale-class systems and applications, on-chip and off-chip parallel communication continues to pose scalability challenges. In particular, applications with shared data experience long data transfer latencies between cores, which negatively impact execution time. Prior research has proposed communication protocols that proactively fetch data by building complex data sharing predictors that attempt to track, identify, and predict exact producer-consumer relationships. Due to the complexity of such predictors, these methods have not been adopted; consequently, modern processors are not optimized for data sharing. In our research, we show that an efficient communication protocol does not need the exact identities of producers and consumers, but only information on whether shared data involves two participants (single producer-single consumer data) or many participants (widely-shared data). This limited sharing information can be easily tracked and stored in processors with negligible area impact. Based on this insight, we propose CONCORD, an adaptive communication architecture that uses consumer count detection to build an adaptive data transfer mechanism. We show that CONCORD can improve performance on a diverse set of HPC applications by up to 9% with negligible impact on area.
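As a rough illustration of the consumer-count idea in the abstract, the sketch below classifies a shared cache line using only a small saturating counter, with no record of which cores produced or consumed the data. It is not the CONCORD design, whose details are not given here. The names `Sharing`, `ConsumerCounter`, `on_write`, and `on_read_miss` are hypothetical, and the sketch assumes that each consumer issues at most one read miss per line between two writes, since it then holds a local copy.

```python
from enum import Enum


class Sharing(Enum):
    UNSHARED = 0          # no consumer seen since the last write
    SINGLE_CONSUMER = 1   # single producer-single consumer data
    WIDELY_SHARED = 2     # two or more consumers


class ConsumerCounter:
    """Hypothetical per-line tracker: a counter that saturates at 2.

    Assumption: each consumer misses at most once per write epoch,
    so counting read misses approximates counting consumers without
    storing any core identity.
    """

    SATURATION = 2

    def __init__(self):
        self.count = 0

    def on_write(self):
        # A new write starts a new epoch, so the count restarts.
        self.count = 0

    def on_read_miss(self):
        # Saturating increment: anything above one consumer is "many".
        self.count = min(self.count + 1, self.SATURATION)

    def classify(self):
        if self.count == 0:
            return Sharing.UNSHARED
        if self.count == 1:
            return Sharing.SINGLE_CONSUMER
        return Sharing.WIDELY_SHARED


if __name__ == "__main__":
    line = ConsumerCounter()
    line.on_write()
    line.on_read_miss()
    print(line.classify())
    line.on_read_miss()
    print(line.classify())
```

A two-state count like this fits in a couple of bits per tracked line, which is consistent in spirit with the abstract's claim that the sharing information has negligible area cost. A protocol could then pick a different transfer policy for each class.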
Published in: 2019 IEEE/ACS 16th International Conference on Computer Systems and Applications (AICCSA)
Date of Conference: 03-07 November 2019
Date Added to IEEE Xplore: 16 March 2020