Student Projects

I am very happy to supervise BSc / MSc student projects if they relate to my research interests.


Prospective and starting students: please have a look at my guidelines and resources.

What background / skills do I need for a project in ILDM?

If you are interested in working on ILDM, consider following these courses:

strongly recommended:

  • Artificial Intelligence Techniques
  • Algorithms for Intelligent Decision Making
  • Deep Learning
  • Machine Learning 1

also recommended:

  • Machine Learning 2
  • Information Theory
  • Advanced Algorithms
  • Evolutionary Algorithms
  • Seminar Research Methodology for Data Science

I select students based on their motivation to work in my field, and I like to see some evidence that they will actually be able to do the research they would like to do. For example:

  • If you want to pick up an empirical question, I would want to be convinced that you have the required coding skills.
  • If you want to do something theoretical, I would like to see a first analysis that you have done.

In particular, if you want to do something with deep (reinforcement) learning, you will need to show me evidence of a successfully implemented deep learning project (e.g., in TensorFlow or PyTorch).

How to approach me?

When contacting me about a project, please include:

  • evidence that you have sufficient skills in the required background (e.g., a grade list).
  • a single page with two possible research ideas that you would be interested in.

If I think there could be a match, I will suggest talking to one of my team members to explore further. (I usually co-supervise MSc projects together with my postdocs and PhD students.)

What if I want to do a project with a company?

In general, the university allows students to do their graduation project at a company. For me to engage in such an arrangement, there needs to be some value in it. The rule of thumb is that the topic must be interesting to me, and there should still be some actual scientific value, but I am open to discussion.

In general, except for the companies listed below, you would need to find the company yourself.

Currently, I welcome students who are interested in exploring a thesis project at:

  • TNO, The Hague, in the area of intelligent traffic and transportation.
  • NS Reizigers, see description below.

Open projects

Intelligent Traffic Control

I would be quite keen to supervise student projects focusing on applying AI techniques in the open source ‘SUMO’ traffic simulator. Examples of such projects would be:

  • Reinforcement Learning for large-scale Urban Traffic Light Control
  • Assessing the Impact of Improved Sensing for RL-Based Urban Traffic Light Control

Such projects would require good coding skills and a decent background in reinforcement learning or MDPs.

Deep Multi-Agent RL for Traffic Control

Training multiple agents at the same time to perform a task is a difficult challenge in RL. When learning, each individual agent might adapt to the behavior of the others. However, the fact that this behavior is continuously changing may create non-stationarities in the agents’ learning dynamics. The goal of this project is to evaluate how much of a problem this is when training independent agents to control the traffic lights in a city. In particular, we will:

  • Investigate the influence that different agents exert on each other. 
  • Study how that influence affects their learning curves.

This project combines reinforcement learning with game theory and influence-based abstraction.
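
The non-stationarity described above can be illustrated with a toy example. The following is a minimal sketch, not the project's method: the two-agent coordination game, the epsilon-greedy independent Q-learners, and all names and parameters are assumptions made for illustration, and a traffic network would replace the 2x2 game.

```python
import random


def reward(a0, a1):
    """Hypothetical coordination game: both agents earn 1 if their actions match."""
    return 1.0 if a0 == a1 else 0.0


def epsilon_greedy(q_values, eps, rng):
    """Pick a random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def train_independent(episodes=500, alpha=0.1, eps=0.2, seed=0):
    rng = random.Random(seed)
    # q[agent][action]: each agent learns as if the other were part of the environment.
    q = [[0.0, 0.0], [0.0, 0.0]]
    # Payoff that agent 0 would get for action 0 against agent 1's current greedy action.
    history = []
    for _ in range(episodes):
        a0 = epsilon_greedy(q[0], eps, rng)
        a1 = epsilon_greedy(q[1], eps, rng)
        r = reward(a0, a1)
        q[0][a0] += alpha * (r - q[0][a0])
        q[1][a1] += alpha * (r - q[1][a1])
        greedy_a1 = max(range(2), key=q[1].__getitem__)
        history.append(reward(0, greedy_a1))
    return q, history


q, history = train_independent()
# Each switch means the value of a fixed action for agent 0 changed
# because agent 1 changed its behavior, not because the game changed.
switches = sum(1 for x, y in zip(history, history[1:]) if x != y)
print("times the payoff of action 0 changed for agent 0:", switches)
```

From agent 0's point of view, the reward of a fixed action is a moving target whenever the other agent's greedy action changes, which is the effect the project sets out to measure at the scale of a city.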

Learning to solve the train unit shunting problem with disturbances

I am looking for a student who is interested in doing a project with NS Reizigers.


The Train Unit Shunting Problem (TUSP) is a difficult sequential decision-making problem faced by Dutch Railways (NS). At a service site, several train units need to be inspected, cleaned, repaired, and parked during the night. In addition, train units of a specific type need to depart at a specific time.

At NS, a local search heuristic that evaluates all components of the Train Unit Shunting Problem simultaneously was developed to create shunt plans. The heuristic attempts to find a feasible shunt plan by iteratively applying several search operators to move through the search space. In every iteration, the search operators are shuffled into a random order, which is the order in which they will be evaluated. Starting with the first search operator, the heuristic evaluates the set of candidate solutions that can be reached through that operator. A candidate solution is immediately accepted as the new solution for the next iteration if it is an improvement over the current solution. If the candidate solution is worse, it is accepted with a certain probability that depends on the difference in solution quality and the progress of the search process. This probabilistic technique is called simulated annealing.
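
The search loop described above can be sketched as follows. This is a minimal sketch and not the NS heuristic: the toy "plan" (a tuple of numbers), the inversion-count cost, the two operators, the geometric cooling schedule, and the exp(-delta / T) acceptance probability are all assumptions chosen for illustration.

```python
import math
import random


def anneal(initial, operators, cost, iterations=2000, t_start=1.0, t_end=0.01, seed=0):
    """Local search with shuffled operators and a simulated-annealing acceptance rule."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for it in range(iterations):
        # Temperature decays geometrically as the search progresses (assumed schedule).
        temperature = t_start * (t_end / t_start) ** (it / max(1, iterations - 1))
        order = list(operators)
        rng.shuffle(order)  # operators are evaluated in a fresh random order
        accepted = False
        for op in order:
            for candidate in op(current):
                delta = cost(candidate) - current_cost
                # Improvements are accepted at once; equal or worse candidates
                # are accepted with probability exp(-delta / temperature).
                if delta < 0 or rng.random() < math.exp(-delta / temperature):
                    current, current_cost = candidate, current_cost + delta
                    accepted = True
                    break
            if accepted:
                break
        if current_cost < best_cost:
            best, best_cost = current, current_cost
    return best, best_cost


def swap_adjacent(plan):
    """Hypothetical operator: swap two neighbouring elements."""
    for i in range(len(plan) - 1):
        yield plan[:i] + (plan[i + 1], plan[i]) + plan[i + 2:]


def move_to_front(plan):
    """Hypothetical operator: move one element to the front."""
    for i in range(1, len(plan)):
        yield (plan[i],) + plan[:i] + plan[i + 1:]


def inversions(plan):
    """Toy cost: number of out-of-order pairs (0 means sorted)."""
    n = len(plan)
    return sum(1 for i in range(n) for j in range(i + 1, n) if plan[i] > plan[j])


initial = (5, 3, 6, 1, 4, 2)
best, best_cost = anneal(initial, [swap_adjacent, move_to_front], inversions)
print(best, best_cost)
```

In a shunting setting, the plan, the operators, and the cost function would be far richer, but the accept-or-reject logic keeps the same shape.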

One challenging aspect of train unit shunting is that the problem specification frequently changes slightly at the last minute. For example, a train might be delayed, causing the trains to arrive in a different order than expected. A facility might suddenly be unavailable, or some extra tasks might be added to a train due to technical problems. When disturbances occur, making a completely new plan is not desirable. Making a new plan might take a lot of time, and even if a new plan can be made, it would require all train drivers, maintenance workers, etc. to change their anticipated operations. Instead, the existing plan is typically adapted in such a way that the adapted plan stays close to the original plan.

In this project, we would like to explore making this approach more robust. For instance, it could be possible to apply (multi-agent) deep reinforcement learning to learn to guide the search operators of local search, in order to find the minimal number of changes (number of operators) that must be applied to the original plan such that the adapted plan is feasible. Another direction could be to learn models of the (probability of) disturbances, and investigate ways to incorporate these in the fitness function directly.

Deep representation learning for RL

It is not always easy to understand what and how deep RL agents actually learn, and how this relates to their performance. One way to gain a better understanding is to think about what such agents should ideally learn. When we speak of what a deep RL agent learns, we mean the internal representation that a neural network forms of the environment, that is, the activation patterns that arise in each hidden network layer as the result of feeding (histories of) observations to the network. Among the desirable properties of such an internal representation are that it should make only necessary distinctions between (histories of) observations, allow the agent to learn to act optimally, and enable generalization to new irrelevant feature values and modified dynamics. A representation that has these properties is one in which the distances between states are proportional to a bisimulation metric, which measures how “behaviorally different” states are. Some interesting questions to explore are:

  • Do different representation learning methods for RL (e.g. autoencoder-based, augmentation-based contrastive coding) cause such a representation to be learned, and if so, how quickly?
  • How do the representations learned by different model-free methods compare to such a representation, both during and at the end of training?
  • How can we formulate a scalable auxiliary loss that pushes networks to form such a representation?

It is also important to consider how we can scalably compute how similar the learned representations are to this bisimulation-based representation, so that evaluations on commonly used benchmarks such as MuJoCo become possible.
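
To make the notion of a bisimulation metric concrete, the sketch below computes one exactly on a tiny deterministic MDP by fixed-point iteration. It assumes the common formulation d(s, t) = max_a [ |r(s, a) - r(t, a)| + gamma * d(s', t') ], which is what the Wasserstein-based definition reduces to when transitions are deterministic. The three-state MDP and all names are hypothetical, and this exact computation does not scale to benchmarks such as MuJoCo.

```python
def bisimulation_metric(rewards, next_state, gamma=0.9, tol=1e-8):
    """Fixed-point iteration for a bisimulation metric on a deterministic MDP.

    rewards[s][a] is the reward and next_state[s][a] the successor state
    for taking action a in state s.
    """
    n = len(rewards)
    actions = range(len(rewards[0]))
    d = [[0.0] * n for _ in range(n)]
    while True:
        # One application of the operator: reward gap plus discounted
        # distance between successors, maximized over actions.
        new = [[max(abs(rewards[s][a] - rewards[t][a])
                    + gamma * d[next_state[s][a]][next_state[t][a]]
                    for a in actions)
                for t in range(n)]
               for s in range(n)]
        change = max(abs(new[s][t] - d[s][t]) for s in range(n) for t in range(n))
        d = new
        if change < tol:
            return d


# Hypothetical MDP: states 0 and 1 behave identically (they differ only in an
# irrelevant feature), while state 2 has the opposite reward structure.
rewards = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
next_state = [[0, 2], [1, 2], [2, 0]]

d = bisimulation_metric(rewards, next_state)
print("d(0, 1) =", d[0][1])
print("d(0, 2) =", round(d[0][2], 4))
```

States 0 and 1 end up at distance zero, so an ideal representation would map them to the same point, whereas state 2 lies at distance 1 / (1 - gamma) from both.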

Your topic

If you are interested, or have your own ideas for a project relating to reinforcement learning, multiagent learning, or adversarial learning, please contact me.