The information revolution is spawning systems that require frequent decisions and provide high volumes of data that can be used to analyze past outcomes. Examples arise across consumer web services, such as those that recommend content and products, match supply and demand, or train personnel; across operational functions such as inventory management, marketing, pricing, and capacity allocation; and across physical systems such as robots, automated vehicles, and energy management systems. In each of these areas, changes in information technology are enabling the design of systems that are interactive, adaptive, and personalized. Capturing the immense potential economic and societal benefit requires sophisticated new algorithms that are capable of systematic experimentation, learning over time to make increasingly effective decisions.
This workshop offers an introduction to Thompson sampling, an algorithm that has become widely used in recent years as a foundation for such learning problems. The essential idea is simple: select each action by randomly sampling according to the probability that it is optimal. We will explore the range of problems to which this approach can be applied and the computational methods required to implement it. We will also cover theoretical analyses that develop an understanding of how and why Thompson sampling works. Problems addressed by Thompson sampling can be viewed as variations of the multi-armed bandit, and the first day of the workshop will offer background on the classical version of this problem and the celebrated Gittins index theorem. On the last day of the workshop, we will discuss problems on which Thompson sampling performs very poorly, an information-theoretic view of the approach, and more sophisticated ideas that may address its shortcomings. Time permitting, we will also discuss exploration methods for reinforcement learning that have been inspired by Thompson sampling.
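To make the essential idea concrete, here is a minimal sketch of Thompson sampling on a Bernoulli multi-armed bandit. It assumes independent Beta priors over each arm's success probability; sampling one draw per arm from the posterior and pulling the arm with the largest draw is equivalent to selecting each arm with the probability that it is optimal under the current posterior. The function name, arm probabilities, and horizon are illustrative, not from the workshop materials.

```python
import random


def thompson_sampling_bernoulli(true_probs, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling on a multi-armed bandit.

    Maintains a Beta(alpha, beta) posterior for each arm's unknown
    success probability, starting from a uniform Beta(1, 1) prior.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1.0] * k  # posterior parameter: 1 + observed successes
    beta = [1.0] * k   # posterior parameter: 1 + observed failures
    total_reward = 0
    for _ in range(horizon):
        # Draw one plausible success rate per arm from its posterior;
        # the arm with the largest draw is selected, which happens with
        # exactly the posterior probability that the arm is optimal.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update the chosen arm's posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return total_reward, alpha, beta


total, alpha, beta = thompson_sampling_bernoulli([0.2, 0.5, 0.8], horizon=2000)
pulls = [alpha[i] + beta[i] - 2 for i in range(3)]
```

Because arms that look promising are sampled more often while uncertain arms still receive occasional draws, play concentrates on the best arm over time without any explicit exploration schedule.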