Project Details

Preference-based Monte Carlo Tree Search

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing
Term from 2015 to 2021
Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 280805726
 
Multi-armed bandits are a classical machine learning setting in which the task is to find out which of a set of actions yields the highest long-term reward. In recent years, this setting has been extended in two directions. On the one hand, preference-based or dueling bandits replace the assumption of quantitative numeric feedback on the chosen actions with the weaker assumption of qualitative comparative feedback. On the other hand, Monte Carlo tree search (MCTS) makes it possible to extend the basic case of a single-state decision problem to the more realistic case of sequential decision problems.

In this project, we will investigate preference-based Monte Carlo tree search, which can be viewed both as an extension of MCTS to problems with qualitative feedback and as an extension of preference-based bandits to sequential decision problems. Building on novel methods for preference learning, the basic idea is to provide the MCTS agent with qualitative policy models, such as ranking functions. While a first adaptation of MCTS to such a preference-based setting seems straightforward, a multitude of practical problems has to be solved in this setting. Most notably, if each trial in a decision node compares a pair of actions rather than probing a single action, a naive roll-out experiment requires an exponential number of trajectories. Our main objective is to solve these problems and to develop the theoretical and methodological foundations of preference-based MCTS.

In order to test the developed techniques in realistic settings, the theoretical work will be accompanied by two real-world case studies: general game playing, where MCTS is the current state of the art but a preference-based formulation seems to be a useful extension, and recommender systems, which we consider sequential decision problems in which exact numerical rewards are not available.
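To illustrate the dueling-bandit setting described above, the following is a minimal sketch of a bandit that only ever receives qualitative feedback ("arm i beat arm j in a duel") and never a numeric reward. It is not part of the project's methodology; the class name, the empirical Copeland-score selection rule, and the random tie-breaking are all assumptions made for the example.

```python
import random

class DuelingBandit:
    """Toy preference-based (dueling) bandit: the learner picks a
    pair of arms and observes only which of the two won the duel."""

    def __init__(self, n_arms):
        self.n = n_arms
        # wins[i][j] = number of duels in which arm i beat arm j
        self.wins = [[0] * n_arms for _ in range(n_arms)]

    def copeland(self, i):
        """Fraction of opponents that arm i beats on empirical majority."""
        beats = 0
        for j in range(self.n):
            if j == i:
                continue
            total = self.wins[i][j] + self.wins[j][i]
            if total and self.wins[i][j] / total > 0.5:
                beats += 1
        return beats / (self.n - 1)

    def select_pair(self):
        """Duel the two arms with the highest empirical Copeland
        scores; tiny random noise breaks ties (and forces early
        exploration while all scores are still zero)."""
        scores = [self.copeland(i) + 1e-6 * random.random()
                  for i in range(self.n)]
        order = sorted(range(self.n), key=lambda i: -scores[i])
        return order[0], order[1]

    def update(self, winner, loser):
        """Record the outcome of one pairwise comparison."""
        self.wins[winner][loser] += 1

# Demo: hand-fed duel outcomes; arm 0 wins a majority against both rivals.
b = DuelingBandit(3)
for winner, loser in [(0, 1), (0, 1), (0, 1), (1, 0), (0, 2), (0, 2)]:
    b.update(winner, loser)
print(b.copeland(0))  # → 1.0 (arm 0 beats every opponent it has faced)
```

The key point of the sketch is the interface: `update` takes only a (winner, loser) pair, so any comparison-based selection rule (here a naive Copeland greedy) can be dropped in without ever assuming numeric rewards. Extending this to trees, as the project proposes, means each tree node would compare pairs of actions rather than probing one, which is exactly the source of the exponential roll-out problem mentioned above.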
DFG Programme Research Grants
 
 
