Planning and learning for weakly-coupled distributed agents
Abstract
Stochastic planning has gained popularity over classical planning in recent years by offering principled ways to model the uncertainty inherent in many real-world applications. In particular, approaches based on Markov decision processes (MDPs) are well suited to problems that involve sequential decision making. Partially observable stochastic games (POSGs) are a multi-agent extension of the MDP framework to decentralized decision making. While elegant and expressive, this model has been shown to be intractable. Many real-world problems exhibit additional structure that can be leveraged for computational gain. The focus of this thesis is on a class of problems in which the agents are largely independent but may interact by constraining each other’s actions. The best-known solution concepts in game theory are computing a Nash equilibrium and performing strategy elimination, and this dissertation addresses both. First, we show that finding a Nash equilibrium in which each agent achieves a reward of at least k is NP-hard. Then, we present the Multi-Agent Action Elimination (MAAE) algorithm, which performs iterated elimination of dominated strategies. We show experimentally that it is able to solve much larger problems than the general-purpose algorithm by pruning a large portion of the policy space. When an agent does not have a model of the environment, it must learn to act with only the reward as feedback. For this setting, we explore approaches based on reinforcement learning. In particular, we develop the Bound-Based Exploration with Soft Pruning (BBESP) algorithm, a Q-learning-based algorithm that carries over many of the key ideas from our planning algorithm: it uses learned bounds both to softly prune actions and to focus an agent’s exploration. We show that this approach learns faster than traditional Q-learning on this class of problems.
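The abstract does not spell out BBESP's mechanics, but the general idea of using learned bounds to softly prune actions and to focus exploration can be illustrated with a minimal tabular Q-learning sketch. Everything below is an assumption made for illustration only (the class name BoundGuidedQLearner, the interval-width exploration weights, the bound-update rules, and the simplifying assumptions of deterministic transitions and a known reward range); it is not the algorithm developed in the thesis.

    import random
    from collections import defaultdict

    class BoundGuidedQLearner:
        # Hypothetical sketch: tabular Q-learning in which per-action upper/lower
        # value bounds softly prune unpromising actions and focus exploration.
        def __init__(self, actions, alpha=0.1, gamma=0.95, r_min=0.0, r_max=1.0):
            self.actions = list(actions)
            self.alpha, self.gamma = alpha, gamma
            # Loose initial value bounds derived from the assumed reward range.
            v_lo, v_hi = r_min / (1 - gamma), r_max / (1 - gamma)
            self.q = defaultdict(lambda: dict.fromkeys(self.actions, 0.0))
            self.lower = defaultdict(lambda: dict.fromkeys(self.actions, v_lo))
            self.upper = defaultdict(lambda: dict.fromkeys(self.actions, v_hi))

        def select_action(self, state):
            # Soft pruning: an action whose upper bound falls below the best lower
            # bound keeps only a tiny selection weight; the remaining actions are
            # weighted by the width of their bound interval, which focuses
            # exploration on actions whose value is still uncertain.
            best_lower = max(self.lower[state].values())
            weights = [1e-3 if self.upper[state][a] < best_lower
                       else self.upper[state][a] - self.lower[state][a] + 1e-3
                       for a in self.actions]
            return random.choices(self.actions, weights=weights, k=1)[0]

        def update(self, state, action, reward, next_state):
            # Standard Q-learning update, plus monotone updates to the bounds.
            # For simplicity this treats transitions as deterministic, so a
            # one-step backup of a bound is itself a valid bound.
            target = reward + self.gamma * max(self.q[next_state].values())
            self.q[state][action] += self.alpha * (target - self.q[state][action])
            lo = reward + self.gamma * max(self.lower[next_state].values())
            hi = reward + self.gamma * max(self.upper[next_state].values())
            self.lower[state][action] = max(self.lower[state][action], lo)
            self.upper[state][action] = min(self.upper[state][action], hi)

In this sketch, pruned actions retain a small nonzero selection probability, mirroring the "soft" pruning described above, rather than being removed outright.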