Value-Based Planning for Teams of Agents in Stochastic Partially Observable Environments

This page provides some information on the work I did for my PhD and the resulting thesis, titled Value-Based Planning for Teams of Agents in Stochastic Partially Observable Environments.


The following people were involved in my supervision:


A key requirement of decision support systems is the ability to reason about uncertainty. This is a complex problem, especially when multiple decision makers are involved. For instance, consider a team of firefighting agents whose goal is to extinguish a large fire in a residential area using only local observations. In this case, the environment is stochastic and the agents may be uncertain with respect to:

  1. the effect of their actions,
  2. the true state of the environment, and
  3. the actions the other agents take.

These uncertainties render the problem computationally intractable. In this thesis such decision-making problems are formalized using a stochastic discrete-time model called the decentralized partially observable Markov decision process (Dec-POMDP).

The first part of this thesis describes a value-based (i.e. based on value functions) approach for Dec-POMDPs, making use of Bayesian games. In the second part, different forms of structure in this approach are identified and exploited to realize better scaling behavior.

In a bit more detail

Making decisions is hard. Even though humans are very good at making many decisions, human decision making may be substantially improved when assisted by intelligent decision support systems. This thesis focuses on complex decision problems, and particularly on situations where there are multiple decision makers, or agents, that act under various forms of uncertainty:

  1. Outcome uncertainty - the agents cannot perfectly predict the effect of their actions.
  2. State uncertainty - the agents may be uncertain about the state of their environment.
  3. Uncertainty with respect to the actions of other agents.

Decision-theoretic planning (DTP) compactly specifies sequential decision problems and provides methods to generate plans whose quality can be measured. This thesis focuses on decision-theoretic planning for teams of cooperative agents that are situated in a stochastic, partially observable environment. It adopts the decentralized partially observable Markov decision process (Dec-POMDP) as its central framework because it is an expressive model that allows for the principled treatment of the uncertainty faced by such teams.
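To make the model a little more concrete, here is a minimal sketch of what a tabular Dec-POMDP could look like in code. It is not taken from the thesis: the class and field names, the two-agent toy fire problem, and all probabilities and rewards are assumptions made up for illustration. The sketch only shows where outcome uncertainty (the transition table) and state uncertainty (the observation table) enter, and that the team shares a single reward.

```python
import random
from dataclasses import dataclass


@dataclass
class DecPOMDP:
    """Minimal tabular Dec-POMDP; all field names are illustrative assumptions."""
    states: list
    actions: list         # actions[i]: individual actions of agent i
    observations: list    # observations[i]: individual observations of agent i
    transition: dict      # (state, joint_action) -> {next_state: probability}
    observation_fn: dict  # (joint_action, next_state) -> {joint_observation: probability}
    reward: dict          # (state, joint_action) -> float, shared by the team


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def step(model, state, joint_action, rng):
    """Simulate one time step: returns (next_state, joint_observation, team_reward)."""
    team_reward = model.reward[(state, joint_action)]
    next_state = sample(model.transition[(state, joint_action)], rng)
    joint_obs = sample(model.observation_fn[(joint_action, next_state)], rng)
    return next_state, joint_obs, team_reward


def toy_fire_model():
    """Two agents, one house; every number below is invented for illustration."""
    states = ["burning", "out"]
    acts = ["fight", "wait"]
    obs = ["flames", "no_flames"]
    transition, observation_fn, reward = {}, {}, {}
    for joint_action in [(a1, a2) for a1 in acts for a2 in acts]:
        # Outcome uncertainty: fighting only extinguishes the fire with some probability.
        p_out = {0: 0.0, 1: 0.6, 2: 0.9}[joint_action.count("fight")]
        transition[("burning", joint_action)] = {"out": p_out, "burning": 1.0 - p_out}
        transition[("out", joint_action)] = {"out": 1.0}
        reward[("burning", joint_action)] = -1.0
        reward[("out", joint_action)] = 0.0
        for s in states:
            # State uncertainty: each agent independently gets a noisy local observation.
            p_flames = 0.8 if s == "burning" else 0.1
            per_agent = {"flames": p_flames, "no_flames": 1.0 - p_flames}
            observation_fn[(joint_action, s)] = {
                (o1, o2): per_agent[o1] * per_agent[o2] for o1 in obs for o2 in obs
            }
    return DecPOMDP(states, [acts, acts], [obs, obs], transition, observation_fn, reward)


if __name__ == "__main__":
    model = toy_fire_model()
    rng = random.Random(0)
    print(step(model, "burning", ("fight", "wait"), rng))
```

During execution each agent would see only its own component of the joint observation and would not see the other agent's action, which is where the third form of uncertainty comes from.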

This thesis aims at advancing planning for Dec-POMDPs by

  1. extending the theory of Dec-POMDPs and establishing further connections with the area of game theory (chapters 3 and 4),
  2. proposing optimal and approximate solution methods that scale better with respect to the number of agents (chapter 5), and
  3. proposing optimal solution methods that scale better with respect to the planning horizon (chapter 6).

Additionally, chapter 2 provides an extensive introduction to decision-theoretic planning methods for multiagent systems (MASs). Its first part provides the necessary game-theoretic background, covering both one-shot and sequential decision models. Its second part gives a formal introduction to the Dec-POMDP model and solution methods.

Get your copy

Interested? You can download a copy here.

It should also be possible to get a hard copy.

  • It is on Google Books.
  • The publisher should print it on demand; see Amsterdam University Press.
  • If you happen to be co-located in space-time with both me and the stack of copies I have left, I can give you one.


    author =    {Frans A. Oliehoek},
    title =     {Value-Based Planning for Teams of Agents in
                Stochastic Partially Observable Environments},
    school =    {Informatics Institute, University of Amsterdam},
    year =      {2010},
    month =     feb,