throttLL’eM: Predictive GPU Throttling for Energy Efficient LLM Inference Serving

IEEE Conference Publication | IEEE Xplore