
Two-Phase Multi-armed Bandit for Online Recommendation


Abstract:

Personalized online recommendations strive to adapt their services to individual users by making use of both item and user information. Despite recent progress, the issue of balancing exploitation and exploration (EE) [1] remains challenging. In this paper, we model the personalized online recommendation of e-commerce as a two-phase multi-armed bandit problem. This is the first time that "big arms" and "small arms" are introduced into the multi-armed bandit (MAB) framework, and a two-phase strategy is adopted to provide target users with the most suitable recommendation list. In the first phase, an MAB is used to obtain a subset of items that users may be interested in from a large number of items. Unlike existing related models, which use individual items as arms, we use item categories as arms to control the arm scale and reduce computational complexity. In the second phase, we directly use the items generated in the first phase as arms of another MAB and obtain rewards through fine-grained implicit feedback from users. Empirical studies on three real-world datasets show that our proposed method, TPBandit, outperforms state-of-the-art bandit-based recommendation methods on several evaluation metrics, including Precision, Recall, and Hit Ratio. Moreover, the two-phase method improves recommendation performance by nearly 50% over the one-phase method in the best case.
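The abstract does not specify TPBandit's arm-selection rule, so the following is only a minimal sketch of the two-phase idea, assuming UCB1 as the policy in both phases and binary click feedback; the names `catalog`, `UCB1`, and `simulate_user_feedback` are hypothetical illustrations, not the paper's implementation.

```python
import math
import random
from collections import defaultdict

class UCB1:
    """Standard UCB1 index policy over a fixed set of arms."""
    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = defaultdict(int)      # pulls per arm
        self.rewards = defaultdict(float)   # cumulative reward per arm
        self.t = 0                          # total selection rounds

    def _index(self, arm):
        if self.counts[arm] == 0:
            return float("inf")             # force initial exploration
        mean = self.rewards[arm] / self.counts[arm]
        return mean + math.sqrt(2 * math.log(self.t) / self.counts[arm])

    def select(self, k=1):
        """Return the k arms with the highest UCB index."""
        self.t += 1
        return sorted(self.arms, key=self._index, reverse=True)[:k]

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.rewards[arm] += reward

def simulate_user_feedback(item):
    # Hypothetical stand-in for real implicit feedback (e.g., clicks).
    return 1 if random.random() < 0.1 else 0

# Phase 1 treats categories as "big arms"; phase 2 keeps one item-level
# bandit per category whose "small arms" are that category's items.
catalog = {"books": ["b1", "b2", "b3"], "music": ["m1", "m2"]}
big = UCB1(catalog)
small = {c: UCB1(items) for c, items in catalog.items()}

for step in range(1000):
    category = big.select()[0]              # phase 1: pick a category
    items = small[category].select(k=2)     # phase 2: pick the item list
    for item in items:
        clicked = simulate_user_feedback(item)
        small[category].update(item, clicked)
        # Propagating item-level reward up to the category bandit is an
        # assumption here; the paper may aggregate feedback differently.
        big.update(category, clicked)
```

The two-phase structure keeps the first bandit's arm count at the number of categories rather than the number of items, which is the computational saving the abstract describes.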
Date of Conference: 06-09 October 2021
Date Added to IEEE Xplore: 20 October 2021
Conference Location: Porto, Portugal

