Exploiting Structured Feature and Runtime Isolation for High-Performant Recommendation Serving


Abstract:

Recommendation serving with deep learning models is one of the most valuable services of modern E-commerce companies. In production, serving systems must accommodate billions of recommendation queries under stringent service level agreements, so high-performant recommendation serving is essential to meeting this demand. Unfortunately, existing model serving frameworks fail to serve efficiently because of two challenges unique to this setting: 1) a mismatch between the input format the service needs and the format the model can accept, and 2) heavy software contention when concurrently executing the constrained operations. To address these challenges, we propose RecServe, a high-performant serving system for recommendation built on two optimized designs: structured features and SessionGroups. With structured features, RecServe packs single-user-multiple-candidates inputs by semi-automatically transforming computation graphs with annotated input tensors, which significantly reduces redundant network transmission, data movement, and useless computation. With SessionGroups, RecServe adopts resource isolation for multiple compute streams and a cost-aware operator scheduler with a critical-path-based scheduling policy to enable concurrent kernel execution, further improving serving throughput. The experimental results demonstrate that RecServe achieves speedups of up to 12.3× and 22.0× over the state-of-the-art serving system on CPU and GPU platforms, respectively.
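
The single-user-multiple-candidates packing described in the abstract can be illustrated with a small sketch. This is not RecServe's implementation, which the abstract does not detail. It is a toy example under stated assumptions: a linear scoring model, and hypothetical names (`W_USER`, `W_ITEM`, `score_flat`, `score_packed`, `payload_size`) introduced only for illustration. It compares a conventional layout, where the user's features are repeated in every candidate row, with a packed layout, where they are sent and evaluated once.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical weights of a toy linear model (assumption, not from the paper).
W_USER = [0.5, -1.0, 2.0]   # user-side weights
W_ITEM = [1.0, 0.25]        # candidate-side weights


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def score_flat(rows: List[List[float]]) -> List[float]:
    """Score rows laid out as [user features + candidate features].

    The user-side term is recomputed for every row, even though all
    rows carry the same user features.
    """
    n_user = len(W_USER)
    return [dot(W_USER, r[:n_user]) + dot(W_ITEM, r[n_user:]) for r in rows]


def score_packed(request: Dict[str, list]) -> List[float]:
    """Score a packed request: one user vector plus a batch of candidates.

    The user-side term is computed once and reused for each candidate.
    """
    user_part = dot(W_USER, request["user"])
    return [user_part + dot(W_ITEM, c) for c in request["candidates"]]


def payload_size(obj: Union[dict, list, tuple, float]) -> int:
    """Count the scalar feature values contained in a request."""
    if isinstance(obj, dict):
        return sum(payload_size(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(payload_size(v) for v in obj)
    return 1


user = [1.0, 2.0, 3.0]
candidates = [[1.0, 0.0], [0.0, 4.0], [2.0, 2.0]]

flat = [user + c for c in candidates]                 # user repeated per row
packed = {"user": user, "candidates": candidates}     # user sent once

print(score_flat(flat) == score_packed(packed))       # same scores
print(payload_size(flat), payload_size(packed))       # values transmitted
```

In this toy setting both layouts produce identical scores, while the packed request carries 9 feature values instead of 15 and evaluates the user-side term once rather than once per candidate. The gap grows with the number of candidates and the width of the user features.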
Published in: IEEE Transactions on Computers (Volume: 73, Issue: 11, November 2024)
Page(s): 2474 - 2487
Date of Publication: 28 August 2024
