Preference-based Monte Carlo Tree Search

Applicant Professor Dr. Johannes Fürnkranz

Subject Area Image and Language Processing, Computer Graphics and Visualisation, Human Computer Interaction, Ubiquitous and Wearable Computing

Term from 2015 to 2021

Project identifier Deutsche Forschungsgemeinschaft (DFG) - Project number 280805726

Final Report Year 2020

Final Report Abstract

The main outcome of this project are algorithms that allow to employ Monte-Carlo tree search, a very successful strategy for solving sequential decision problems, to scenarios where numerical feedback is not naturally available or has to carefully tuned. Instead, we only assume the availability of pairwise comparisons between diﬀerent lines of actions (in the general case), or, more speciﬁcally, the availability of an ordering of the available reward signals, but without the use of numerical distances. We developed a proof-of-concept implementation which works for the general case, but was expectedly too ineﬃcient for practical purposes, the main problem being the quadratic number of comparisons that has to be performed. To tackle this problem, we developed an algorithm that works for the case of ordinal preference signals, an important subclass of the problem with many practical applications. For this setting, we worked out an eﬃcient base algorithm, which we subsequently improved. As the computational complexity of this algorithm crucially depended on the number of diﬀerent ordinal feedback levels, a key idea for improvement was the an on-line algorithm that allows to dynamically bin feedback levels into a set of ordinal buckets, thereby simplifying the decision problem via abstraction. We showed that our ordinal MCTS algorithm could obtain competitive results in various games of the general video game playing AI competition (GVGAI), where we have developed one of the leading systems. We also worked on employing Monte Carlo tree search in sequential recommendation problems, but developed a slightly diﬀerent solution for this scenario, where iterative exploration was limited by the oﬀ-line available data points. The key idea to this approach was to ﬁrst abstract available preferences into a numerical utility function, which could then be used by a conventional MCTS variant for eﬃciently selecting the best sequence and hence a product that maximizes this learned utility. Finally, we also transposed our framework for handling ordinal feedback values to the related problem of reinforcement learning, which has – in the context of deep learning – received increased attention recently. The main result of this line of work was the ﬁrst algorithm for deep ordinal reinforcement learning, for which we could show that in many cases, it is competitive to its numerical counterparts even though it uses only the order information of the received feedback signals, and is thus much robust and less sensitive to a careful tuning of a numerical reward signal.

Publications

Informed hybrid game tree search for general video game playing. IEEE Transactions on Games, 10(1):78–90, March 2018
Joppen, T., Moneke, M., Schröder, N., Wirth, C., and Fürnkranz, J.
(See online at https://doi.org/10.1109/TCIAIG.2017.2722235)
Monte Carlo tree search without numerical rewards. In From Multiple Criteria Decision Aid to Preference Learning (Proc. of DA2PL- 18), Poznan, Poland, 2018
Joppen, T.
Preference-based Monte Carlo tree search. In Trollmann, F. and Turhan, A.-Y. (eds.), Proceedings of the 41st German Conference on Artiﬁcial Intelligence (KI-18), volume 11117 of Lecture Notes in Computer Science, pp. 327–340, Berlin, Germany, 2018. Springer
Joppen, T., Wirth, C., and Fürnkranz, J.
(See online at https://doi.org/10.1007/978-3-030-00111-7_28)
Ordinal bucketing for game trees using dynamic quantile approximation. In Proceedings of the IEEE Conference on Games (CoG-19), London, United Kingdom, 2019. IEEE
Joppen, T., Strübig, T., and Fürnkranz, J.
(See online at https://doi.org/10.1109/CIG.2019.8847965)
Personalized transaction kernels for recommendation using MCTS. In Benzmüller, C. and Stuckenschmidt, H. (eds.), Proceedings of the 42nd German Conference on AI (KI-19), volume 11793 of Lecture Notes in Computer Science, pp. 338–352, Kassel, Germany, 2019. Springer
Tavakol, M., Joppen, T., Brefeld, U., and Fürnkranz, J.
(See online at https://doi.org/10.1007/978-3-030-30179-8_31)
Deep ordinal reinforcement learning. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML- PKDD-19), 2019
Zap, A., Joppen, T., and Fürnkranz, J.
(See online at https://doi.org/10.1007/978-3-030-46133-1_1)

Servicenavigation

Hauptnavigation

Preference-based Monte Carlo Tree Search

Final Report Abstract

Publications

Additional Information

Servicenavigation

Hauptnavigation

Preference-based Monte Carlo Tree Search

Final Report Abstract

Publications

Additional Information

Textvergrößerung und Kontrastanpassung