Multicore Application-Specific Instruction-Set Processors (MCASIP) offer an interesting alternative for implementing parallel applications in MPSoCs. Flexible MCASIP architecture templates allow the instruction- and task-level parallelism provided by the processor to be matched to the requirements of the application at hand. The processing throughput of shared memory (SM) multicores is commonly limited by the SM bandwidth. Synchronizing the execution of multiple threads using lock variables that reside in the SM adds further to this bottleneck. In this paper, we present a technique for reducing SM contention in MCASIPs, where application-specific hardware customization can be used. The proposed solution uses customized Datapath Integrated Lock Units (DILU), which enable the implementation of lightweight synchronization primitives that minimize SM traffic. An experiment with a 48-core MCASIP shows that the SM impact of the proposed DILU-based fast barrier is up to 64% smaller than that of a basic SM polling barrier. The size of the DILU hardware is negligible.
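To make the baseline concrete, the following is a minimal sketch of what a basic SM polling barrier could look like, written as a centralized sense-reversing barrier with C11 atomics. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the names `sm_barrier_t`, `sm_barrier_init`, and `sm_barrier_wait` are hypothetical, and the DILU hardware itself is not modeled. The point it shows is that every waiting core repeatedly reads a variable held in shared memory, which is the traffic a datapath-integrated lock unit is meant to avoid.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical baseline barrier; all names are illustrative assumptions.
   Both fields that threads synchronize on live in shared memory. */
typedef struct {
    atomic_uint count;     /* arrivals still outstanding in this episode */
    atomic_bool sense;     /* flips each time the barrier opens */
    unsigned    nthreads;  /* number of participating cores */
} sm_barrier_t;

void sm_barrier_init(sm_barrier_t *b, unsigned nthreads) {
    atomic_init(&b->count, nthreads);
    atomic_init(&b->sense, false);
    b->nthreads = nthreads;
}

/* Each thread keeps its own local_sense flag, initially false.
   Returns the number of polling reads this call issued to shared memory. */
unsigned long sm_barrier_wait(sm_barrier_t *b, bool *local_sense) {
    unsigned long polls = 0;
    bool my_sense = !*local_sense;
    *local_sense = my_sense;

    if (atomic_fetch_sub(&b->count, 1) == 1) {
        /* Last arrival: re-arm the counter, then release the waiters. */
        atomic_store(&b->count, b->nthreads);
        atomic_store(&b->sense, my_sense);
    } else {
        /* Every loop iteration is one more read of the shared variable. */
        while (atomic_load(&b->sense) != my_sense) {
            polls++;
        }
    }
    return polls;
}
```

In this sketch the last thread to arrive returns a poll count of zero, while each earlier thread spins on `sense` until it flips. With many cores waiting at once, those polling reads compete with ordinary data accesses for the same SM bandwidth.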