ABSTRACT:
SUBMITTER: Sledge IJ
PROVIDER: S-EPMC7512671 | biostudies-literature | 2018 Feb
REPOSITORIES: biostudies-literature
Sledge, Isaac J.; Príncipe, José C.
Entropy (Basel, Switzerland), 2018-02-28, issue 3
In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. ...