
PSQ: An Automatic Search Framework for Data-Free Quantization on PIM-based Architecture


Abstract:

Crossbar-based Process-In-Memory (PIM) architecture is considered a promising solution for accelerating Deep Neural Networks (DNNs). Due to the ever-increasing model size and computational budget of DNNs, model compression is a critical step in their deployment. However, when deploying DNNs on PIM architectures, fine-grained quantization of DNN weight matrices is difficult because of the inflexible data path inside the crossbar. To this end, this paper studies the feasibility and efficiency of PSQ, a novel fine-grained quantization scheme for PIM-based designs. The scheme tightly couples the quantization search principle with the PIM architecture to provide smooth, hardware-friendly quantization. We leverage weight locality and the variety of weight distributions across blocks to facilitate the fine-grained quantization process. Meanwhile, we propose a lightweight search framework that adaptively allocates quantization parameters (e.g., scale, bitwidth). During the search, suitable quantization parameters are assigned directly to each fine-grained block, keeping the weight distributions before and after quantization as close as possible and thus minimizing quantization error. Our evaluation shows that PSQ achieves a 3.5× reduction in occupied crossbars with negligible accuracy loss. Moreover, PSQ completes this process in just a few seconds on a single CPU, without model retraining or expensive computation.
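
The paper itself provides no code here, but the abstract's core idea, assigning a (scale, bitwidth) pair to each crossbar-sized weight block so that the quantized block stays as close as possible to the original, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes uniform symmetric quantization, MSE as the proxy for distribution distance, a small grid of candidate scales, and 128×128 blocks; all function names are hypothetical.

```python
# Hypothetical sketch of a per-block quantization-parameter search
# in the spirit of PSQ's abstract. Assumptions (not from the paper):
# uniform symmetric quantization, MSE as the distribution-distance
# proxy, and crossbar-sized 128x128 blocks.
import numpy as np

def quantize(block, scale, bits):
    """Uniform symmetric quantization of a weight block."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(block / scale), -qmax - 1, qmax)
    return q * scale

def search_block_params(block, bitwidths=(2, 4, 8), n_scales=20):
    """Pick the (scale, bits) pair minimizing MSE between the block
    and its quantized copy (a stand-in for distribution closeness)."""
    best_scale, best_bits, best_err = None, None, np.inf
    max_abs = max(np.abs(block).max(), 1e-8)  # guard all-zero blocks
    for bits in bitwidths:
        qmax = 2 ** (bits - 1) - 1
        # Candidate scales: shrink the clipping range from max|w| downward.
        for frac in np.linspace(0.2, 1.0, n_scales):
            scale = frac * max_abs / qmax
            err = np.mean((block - quantize(block, scale, bits)) ** 2)
            if err < best_err:
                best_scale, best_bits, best_err = scale, bits, err
    return best_scale, best_bits, best_err

def quantize_layer(weight, block_rows=128, block_cols=128):
    """Split a weight matrix into crossbar-sized blocks and search
    quantization parameters for each block independently."""
    out = np.empty_like(weight)
    for r in range(0, weight.shape[0], block_rows):
        for c in range(0, weight.shape[1], block_cols):
            blk = weight[r:r + block_rows, c:c + block_cols]
            scale, bits, _ = search_block_params(blk)
            out[r:r + block_rows, c:c + block_cols] = quantize(blk, scale, bits)
    return out

# Usage: quantize a random layer; the whole search is a simple grid
# scan per block, which is why a CPU-only pass can finish in seconds.
if __name__ == "__main__":
    w = np.random.randn(512, 512).astype(np.float32)
    wq = quantize_layer(w)
    print("layer MSE:", np.mean((w - wq) ** 2))
```

Because each block is searched independently, blocks with narrow weight distributions can receive lower bitwidths (and thus fewer crossbar bit-slices) than outlier-heavy blocks, which is consistent with the abstract's claim of reducing occupied crossbars without retraining.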
Date of Conference: 06-08 November 2023
Date Added to IEEE Xplore: 22 December 2023
Conference Location: Washington, DC, USA
