Abstract: Linear bandits and contextual bandits are online learning problems with a wide range of applications, including recommendation systems, healthcare, and many others. For instance, a search for “linear contextual bandit” on Google Scholar over the past five years yields over 17,000 papers. In contextual bandits, a learner interacts with an environment by observing side information (a context), selecting an action from a set of actions, and observing a reward that is a noisy function of the selected action and the observed context. The learner’s objective is to maximize the sum of rewards by estimating the reward function from previous interactions with the environment. The context thus models side information, allowing the action to be tailored, for example, to users with different profiles.
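To make the protocol concrete, the following is a minimal Python sketch of one such interaction loop, assuming a linear reward model and a greedy ridge-regression learner; the dimensions, noise level, and learner are illustrative choices, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 1000                 # feature dimension, actions per round, horizon
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unknown reward parameter

V, b = np.eye(d), np.zeros(d)         # ridge-regression statistics
theta_hat = np.zeros(d)
total_reward = 0.0

for t in range(T):
    # Environment draws a stochastic context: K feature vectors, one per action.
    X = rng.normal(size=(K, d))
    # Learner picks the action that looks best under its current estimate
    # (greedy for brevity; real algorithms add exploration, e.g. LinUCB).
    a = int(np.argmax(X @ theta_hat))
    # Reward is linear in the chosen feature vector, plus noise.
    r = X[a] @ theta_star + 0.1 * rng.normal()
    total_reward += r
    # Update the ridge-regression estimate of the reward parameter.
    V += np.outer(X[a], X[a])
    b += r * X[a]
    theta_hat = np.linalg.solve(V, b)

print(f"average reward: {total_reward / T:.3f}")
```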
Linear bandits assume a single, fixed context. Although more limited in applications, linear bandits are much better understood in theory than contextual linear bandits and can in general achieve lower regret bounds. Surprisingly, in this talk we show that the stochastic contextual problem can be solved as if it were a linear bandit problem. To accomplish this, we present a novel reduction framework that transforms any stochastic contextual linear bandit instance into a linear bandit instance. Our framework opens up a new way to approach stochastic contextual linear bandit problems and enables significant savings in communication cost in distributed setups. Furthermore, it yields improved regret bounds in a number of instances. As a consequence of the reduction, we also propose the first computationally efficient batched algorithm for contextual linear bandits, with nearly optimal performance.
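As a rough illustration of what such a reduction enables, the sketch below wires a contextual learner to an arbitrary linear bandit algorithm through a generic interface. Everything here is a hypothetical placeholder: `EpsGreedyLinearBandit`, `ContextualViaLinear`, and the inner-product mapping from the linear bandit's choice back to the context do not reproduce the actual transformation presented in the talk.

```python
import numpy as np

class EpsGreedyLinearBandit:
    """Toy linear bandit over a fixed action set; an illustrative
    stand-in for any off-the-shelf linear bandit algorithm."""
    def __init__(self, actions, eps=0.1, seed=0):
        self.actions = np.asarray(actions)        # fixed (n, d) action set
        self.rng = np.random.default_rng(seed)
        d = self.actions.shape[1]
        self.V, self.b = np.eye(d), np.zeros(d)   # ridge-regression statistics
        self.eps = eps

    def select(self):
        if self.rng.random() < self.eps:          # occasional exploration
            return self.actions[self.rng.integers(len(self.actions))]
        theta_hat = np.linalg.solve(self.V, self.b)
        return self.actions[np.argmax(self.actions @ theta_hat)]

    def update(self, x, r):
        self.V += np.outer(x, x)
        self.b += r * x

class ContextualViaLinear:
    """Wrapper showing the *shape* of such a reduction: each decision is
    deferred to a linear bandit over a fixed action set, then mapped back
    to the current context. The mapping below (inner-product matching) is
    an illustrative surrogate, not the talk's transformation."""
    def __init__(self, linear_alg):
        self.alg = linear_alg

    def act(self, context_features):
        target = self.alg.select()                # linear bandit's pick
        a = int(np.argmax(context_features @ target))
        return a, context_features[a]

    def feed(self, x, r):
        self.alg.update(x, r)                     # pass feedback through

# Usage (all sizes illustrative): 10 context-dependent actions in d = 5.
rng = np.random.default_rng(1)
learner = ContextualViaLinear(EpsGreedyLinearBandit(rng.normal(size=(50, 5))))
context = rng.normal(size=(10, 5))
a, x = learner.act(context)
learner.feed(x, float(x @ rng.normal(size=5)))
```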
Bio: Osama A. Hanna is currently pursuing his Ph.D. degree in Electrical and Computer Engineering at the University of California, Los Angeles (UCLA). He received his B.S. and M.S. degrees in Electrical Engineering from the Faculty of Engineering, Cairo University, and Nile University in Egypt in 2014 and 2018, respectively. He received the Award of Excellence from Cairo University in 2014, as well as a Master's Fellowship and a graduate Research Assistantship from Nile University for the years 2014-2018. In 2017, he received the Marie Sklodowska-Curie Fellowship, and in 2018, the Electrical and Computer Engineering Department Fellowship from UCLA. His research interests are in online learning theory, information theory, and algorithms.