Abstract:
Constrained combinatorial contextual bandits have emerged as popular tools in intelligent systems and networks for modeling reward and cost signals under combinatorial decision-making. On one hand, both signals are complex functions of the context; e.g., in federated learning, training loss (negative reward) and energy consumption (cost) are nonlinear functions of edge devices’ system conditions (context). On the other hand, there are cumulative constraints on costs; e.g., the accumulated energy consumption must stay within the available energy budget. Moreover, real-time systems often require such constraints to be guaranteed at any time or in each round, e.g., ensuring anytime fairness in task assignment to maintain the credibility of crowdsourcing platforms for workers. This setting poses the challenge of maximizing reward while satisfying anytime cumulative constraints. To address this challenge, we propose a primal-dual algorithm (Neural-PD) whose primal component uses multi-layer perceptrons to estimate the reward and cost functions and whose dual component estimates the Lagrange multiplier with a virtual queue. By integrating neural tangent kernel theory and Lyapunov-drift techniques, we prove that Neural-PD achieves a sharp regret bound and zero constraint violation. We also show that Neural-PD outperforms existing algorithms in extensive experiments on both synthetic and real-world datasets.
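As a rough illustration of the primal-dual idea described above, the following Python sketch shows how a virtual queue can act as a Lagrange-multiplier estimate when selecting a set of arms under a per-round cost budget. It is not the Neural-PD algorithm itself: the multi-layer perceptron estimates are replaced here by the true (randomly drawn) rewards and costs, and all names (`select_super_arm`, `update_queue`, `run`), the trade-off weight `v`, and the budget value are hypothetical choices made only for this example.

```python
import random


def select_super_arm(rewards, costs, queue, v, k):
    """Primal step (assumed form): pick the k arms with the largest
    reward weighted by v minus cost weighted by the virtual queue."""
    scores = [v * r - queue * c for r, c in zip(rewards, costs)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def update_queue(queue, round_cost, budget_per_round):
    """Dual step: the queue grows when the round overspends the budget,
    drains when it underspends, and never goes below zero."""
    return max(queue + round_cost - budget_per_round, 0.0)


def run(num_rounds=200, num_arms=6, k=2, budget_per_round=0.8, v=5.0, seed=0):
    """Simulate the loop with random rewards and costs in [0, 1)."""
    rng = random.Random(seed)
    queue, total_reward, total_cost = 0.0, 0.0, 0.0
    for _ in range(num_rounds):
        # Stand-ins for the estimated reward and cost of each arm.
        rewards = [rng.random() for _ in range(num_arms)]
        costs = [rng.random() for _ in range(num_arms)]
        chosen = select_super_arm(rewards, costs, queue, v, k)
        round_cost = sum(costs[i] for i in chosen)
        total_reward += sum(rewards[i] for i in chosen)
        total_cost += round_cost
        queue = update_queue(queue, round_cost, budget_per_round)
    return total_reward, total_cost, queue


if __name__ == "__main__":
    reward, cost, final_queue = run()
    print(f"reward={reward:.2f} cost={cost:.2f} final queue={final_queue:.2f}")
```

In this sketch the cumulative overspend (total cost minus the number of rounds times the per-round budget) can never exceed the final queue length, which is the intuition behind bounding the queue to control constraint violation. A larger `v` favors reward, while a smaller `v` makes the selection react more strongly to the queue.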
Date of Conference: 17-20 May 2023
Date Added to IEEE Xplore: 29 August 2023