Harnessing the Effects of Process Variability to Mitigate Aging in Cloud Servers


Abstract:

The increasing number of cores on a single chip lets more powerful cloud servers better exploit request-level parallelism. However, it also leads to unforeseen temperature issues, which may accelerate aging and cause soft errors or even failures. Managing temperature has therefore become critical, and it is hard to do predictably because of inherent process variability: temperature varies across cores even when they operate at the same frequency. We therefore propose a framework that optimizes cloud servers’ lifetime by automatically distributing the workload among the cores and applying dynamic voltage and frequency scaling (DVFS) based on the applications’ behavior and the system’s status.
Date of Conference: 20-23 June 2023
Date Added to IEEE Xplore: 06 September 2023
Conference Location: Foz do Iguacu, Brazil
I. Introduction

The growing demand for software-as-a-service has increased the pressure on warehouse infrastructures to support more robust cloud services, which involve applications from many domains, such as machine learning, biomedicine, and video/audio processing. In such clouds, workloads are highly heterogeneous and often result from requests from different clients. The main challenge lies in providing the service with the lowest possible latency and using the available resources wisely while exploiting Request-Level Parallelism (RLP).
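As an illustration only, the general idea of variability-aware request placement combined with DVFS can be sketched as follows. This is a minimal sketch under stated assumptions, not the framework proposed in the paper: the `Core` fields, the `leak_factor` variability weight, the `FREQS_GHZ` levels, and the stress heuristic (temperature times variability weight) are all hypothetical names and choices introduced here.

```python
from dataclasses import dataclass

# Hypothetical DVFS levels in GHz, in ascending order (an assumption).
FREQS_GHZ = [1.2, 1.8, 2.4]


@dataclass
class Core:
    core_id: int
    temp_c: float       # current temperature reading (hypothetical sensor value)
    leak_factor: float  # hypothetical variability weight; > 1 means the core tends to run hotter
    busy: bool = False


def pick_core(cores):
    """Return the idle core with the lowest estimated thermal stress, or None."""
    idle = [c for c in cores if not c.busy]
    if not idle:
        return None
    # Illustrative stress estimate: temperature scaled by the variability weight.
    return min(idle, key=lambda c: c.temp_c * c.leak_factor)


def pick_frequency(work_gcycles, deadline_s):
    """Return the lowest frequency that meets the deadline, else the highest one."""
    for f in FREQS_GHZ:
        if work_gcycles / f <= deadline_s:
            return f
    return FREQS_GHZ[-1]


def dispatch(cores, work_gcycles, deadline_s):
    """Assign one request; return (core_id, frequency_ghz) or None if all cores are busy."""
    core = pick_core(cores)
    if core is None:
        return None
    core.busy = True
    return core.core_id, pick_frequency(work_gcycles, deadline_s)
```

Choosing the lowest frequency that still meets the request deadline is one simple way to trade latency headroom for lower temperature; a real system would also need measured per-core variability data and an aging model, neither of which is modeled here.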

