I. Introduction
Emerging processing-in-memory (PIM) systems attempt to overcome the memory-wall bottleneck by rethinking one of the core principles of computing systems: the separation of storage and logic units. This separation has been followed since the introduction of the von Neumann architecture in the 1940s, when computing systems were primarily utilized for serial program execution. Yet, the recent emergence of data-intensive applications requires parallel high-throughput execution, causing the separation to become a massive bottleneck known as the memory wall [1]. Therefore, PIM integrates logic within the memory to bypass the bandwidth-limited memory interface and enable massive in-memory computational parallelism [2].