A Revised Approach for Risk-Averse Multi-Armed Bandits under CVaR Criterion

Najakorn Khajonchotpanya,
Yilin Xue,
Napat Rujeerapaiboon

This research is supported by the Ministry of Education, Singapore, under its 2019 Academic Research Fund Tier 3 grant call (Award ref: MOE-2019-T3-1-010).
ABSTRACT

We study multi-armed bandit problems in which conditional value-at-risk serves as the underlying risk measure. In particular, we propose a new upper confidence bound algorithm and compare it with state-of-the-art alternatives with respect to various definitions of regret from the risk-averse online learning literature. For each comparison, we demonstrate that our algorithm achieves either a strictly better or a comparable regret bound. Finally, we complement our theoretical findings with a numerical experiment that showcases the competitiveness of the proposed algorithm.
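For concreteness, the following is one common convention for the conditional value-at-risk of a reward distribution at level $\alpha \in (0,1]$, stated here only as a reminder; the abstract does not specify which convention the paper adopts, so this particular form is an assumption.
\[
\operatorname{CVaR}_{\alpha}(X) \;=\; \sup_{z \in \mathbb{R}} \Bigl\{ z - \tfrac{1}{\alpha}\, \mathbb{E}\bigl[(z - X)^{+}\bigr] \Bigr\},
\]
which, for a reward $X$ with continuous distribution function $F_X$, coincides with the mean of the worst $\alpha$-fraction of outcomes, $\tfrac{1}{\alpha}\int_{0}^{\alpha} F_X^{-1}(u)\,\mathrm{d}u$. In this reading, a risk-averse bandit algorithm seeks the arm with the largest lower-tail average rather than the largest expected reward.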