Scalarization based Pareto optimal set of arms identification algorithms
Host Publication: 2014 International Joint Conference on Neural Networks (IJCNN)
Authors: M. Drugan and A. Nowé
Publication Date: Jun. 2014
Abstract: Multi-objective multi-armed bandits (MOMAB) is an extension of the multi-armed bandits framework that considers reward vectors instead of scalar reward values. Scalarization functions transform the reward vectors into scalar reward values so that standard multi-armed bandit (MAB) algorithms can be used. However, for many applications it is not obvious how to choose a good set of scalarization functions, and therefore there is a need for MAB algorithms that discover the whole Pareto set of arms. Our approach to this multi-objective MAB problem is twofold: i) identify the set of Pareto optimal arms, and ii) identify the minimum subset of scalarization functions that optimize the set of Pareto optimal arms. We experimentally compare the proposed MOMAB algorithms on a multi-objective Bernoulli problem.
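To make the two notions in the abstract concrete, the following is a minimal sketch, not the authors' algorithms. It assumes the mean reward vectors of the arms are already known (a bandit algorithm would have to estimate them from samples), that every objective is maximized, and that the scalarization functions are linear weightings. The function names and the example means are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Set


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u Pareto-dominates v (all objectives maximized)."""
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_optimal_arms(means: Sequence[Sequence[float]]) -> List[int]:
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


def linear_scalarization(reward: Sequence[float], weights: Sequence[float]) -> float:
    """Collapse a reward vector into a scalar with a weighted sum."""
    return sum(w * r for w, r in zip(weights, reward))


def arms_reached(
    means: Sequence[Sequence[float]], weight_set: Sequence[Sequence[float]]
) -> Set[int]:
    """Arms that are the best arm under at least one weight vector."""
    reached = set()
    for weights in weight_set:
        best = max(
            range(len(means)),
            key=lambda i: linear_scalarization(means[i], weights),
        )
        reached.add(best)
    return reached


# Hypothetical two-objective Bernoulli arms: each pair holds success probabilities.
example_means = [(0.9, 0.1), (0.55, 0.55), (0.1, 0.9), (0.4, 0.4), (0.6, 0.3)]
example_weights = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]

if __name__ == "__main__":
    print("Pareto optimal arms:", pareto_optimal_arms(example_means))
    print("Arms reached by linear scalarization:",
          sorted(arms_reached(example_means, example_weights)))
```

In this hypothetical instance the arm with means (0.6, 0.3) is Pareto optimal, yet no linear weighting selects it, because it lies below the line segments joining its neighbouring Pareto optimal arms. This illustrates why a fixed scalarization set may miss part of the Pareto set, which is the difficulty the abstract points to.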