The detective board game CLUE® can be viewed as a benchmark example of the treasure hunt problem, in which a sensor path is planned based on the expected value of information gathered from targets along the path. The sensor is viewed as an information-gathering agent that makes imperfect measurements or observations of the targets and uses them to infer one or more hidden variables, such as target features or classifications. The treasure hunt problem arises in many modern surveillance systems, such as demining and reconnaissance robotic sensors. It also arises in the board game CLUE®, where pawns must visit the rooms of a mansion to gather information from which the hidden cards can be inferred. In this paper, Q-learning is used to develop an automated neural computer player that plans the path of its pawn, makes suggestions about the hidden cards, and infers the answer, often winning the game. A neural network is trained to approximate the decision-value function representing the value of information, for which no general closed-form representation exists. Bayesian inference, test (suggestion), and action (motion) decision making are unified within a Markov decision process (MDP) framework. The resulting computer player is shown to outperform other computer players implementing Bayesian networks or constraint satisfaction.
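To illustrate the kind of Q-learning the abstract refers to, the following is a minimal sketch, not the paper's implementation: a toy "mansion" of four rooms arranged in a ring, where visiting one designated room yields a reward standing in for the expected value of information. The room count, dynamics, rewards, and hyperparameters are all hypothetical; the paper uses a neural network to approximate the value function, whereas this sketch uses a plain table for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ROOMS = 4    # toy "mansion": 4 rooms on a ring (hypothetical)
N_ACTIONS = 2  # 0 = move clockwise, 1 = move counter-clockwise
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
GOAL = 3       # room whose observation yields the highest value of information

# Tabular stand-in for the decision-value function the paper approximates
# with a neural network.
Q = np.zeros((N_ROOMS, N_ACTIONS))

def step(room, action):
    """Deterministic toy dynamics on the ring of rooms."""
    nxt = (room + 1) % N_ROOMS if action == 0 else (room - 1) % N_ROOMS
    reward = 1.0 if nxt == GOAL else 0.0  # stand-in for value of information
    return nxt, reward

for episode in range(500):
    room = int(rng.integers(N_ROOMS))
    for _ in range(20):
        # Epsilon-greedy exploration
        a = int(rng.integers(N_ACTIONS)) if rng.random() < EPS else int(np.argmax(Q[room]))
        nxt, r = step(room, a)
        # One-step Q-learning update: bootstrap on the greedy successor value
        Q[room, a] += ALPHA * (r + GAMMA * np.max(Q[nxt]) - Q[room, a])
        room = nxt

# Greedy pawn-motion policy derived from the learned values
policy = np.argmax(Q, axis=1)
```

After training, the greedy policy sends the pawn toward the high-value room by the shortest route (counter-clockwise from room 0, clockwise from room 2), mirroring how the learned decision-value function drives the pawn's path planning in the full game.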