Online Preference Learning with Bandit Algorithms

Applicant Professor Dr. Eyke Hüllermeier, since 3/2017

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing

Term from 2017 to 2022

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 317046553

Project Description

In machine learning, the notion of a multi-armed bandit (MAB) refers to a class of online learning problems in which an agent must simultaneously explore and exploit a given set of choice alternatives in the course of a sequential decision process. Combining theoretical challenge with practical usefulness, MABs have received considerable attention in recent machine learning research. This project is devoted to a variant of standard MABs that we refer to as the preference-based multi-armed bandit (PB-MAB) problem. Instead of learning from stochastic feedback in the form of real-valued rewards for the choice of single alternatives, a PB-MAB agent is allowed to compare pairs of alternatives in a qualitative manner. The goal of this project is twofold. First, by consolidating existing work and addressing a number of open theoretical questions and algorithmic problems, we wish to provide a complete and coherent understanding of the PB-MAB setting. Second, we shall develop methods for practically motivated extensions of this setting, namely contextual PB-MABs, which allow preferences between alternatives to depend on a decision context, and PB-MABs with generalized feedback, which go beyond pairwise comparisons and permit preference information of different kinds.
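As an illustration of the pairwise feedback described above, the following minimal Python sketch simulates a PB-MAB environment and identifies the best alternative from noisy comparisons alone. It is not the project's method: the preference matrix `PREF`, the helper names `duel` and `estimate_copeland_winner`, and the uniform sampling scheme are all assumptions made for this example. Entry `PREF[i][j]` is taken to be the probability that alternative `i` wins a comparison against alternative `j`, and the winner is chosen by a simple Copeland-style count of pairwise majorities.

```python
import random

def duel(pref, i, j, rng):
    """Return True if arm i wins one noisy comparison against arm j.

    pref[i][j] is the (assumed) probability that i is preferred to j.
    """
    return rng.random() < pref[i][j]

def estimate_copeland_winner(pref, duels_per_pair, rng):
    """Compare every pair of arms equally often and return (winner, scores).

    This is plain uniform exploration, used here only to illustrate
    learning from qualitative pairwise feedback; it makes no attempt
    to balance exploration and exploitation.
    """
    k = len(pref)
    wins = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            for _ in range(duels_per_pair):
                if duel(pref, i, j, rng):
                    wins[i][j] += 1
                else:
                    wins[j][i] += 1
    # Copeland-style score: number of opponents beaten in a majority of duels.
    scores = [
        sum(1 for j in range(k) if j != i and wins[i][j] > wins[j][i])
        for i in range(k)
    ]
    winner = max(range(k), key=lambda i: scores[i])
    return winner, scores

# Hypothetical preference matrix over three alternatives (arm 0 is best).
PREF = [
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
]

winner, scores = estimate_copeland_winner(PREF, duels_per_pair=2000, rng=random.Random(0))
print("estimated winner:", winner, "scores:", scores)
```

Note that the agent never observes a numeric reward for a single alternative; it only sees which of two alternatives won each comparison. A practical PB-MAB algorithm would choose the pairs to compare adaptively rather than sampling all pairs equally often.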

DFG Programme Research Grants

Former Applicant Dr. Robert Busa-Fekete, until 2/2017
