Epsilon-approximate Pareto optimal set of arms identification in multi-objective multi-armed bandits Host Publication: BENELEARN 2014 - 23rd annual Belgian-Dutch Conference on Machine Learning Authors: M. Drugan and A. Nowé Publication Date: May. 2014 Number of Pages: 8
Abstract: Many real-world stochastic environments are inherently multi-objective environments with multiple possibly conflicting objectives.
Techniques from multi-objective optimization are imported into the multi-armed bandits (MAB) problem for efficient exploration/exploitation mechanisms of reward vectors. We introduce the $\varepsilon$-approximate Pareto MAB algorithm that uses the $\varepsilon$-dominance relation such that its upper confidence bound does not depend on the number of best arms, an important feature for environments with relatively many optimal arms. We experimentally show that the $\varepsilon$-approximate Pareto MAB algorithms outperform the performance of the Pareto UCB1 algorithm on a multi-objective Bernoulli problem inspired by a real world control application.
|