My main research interests lie in what I call interactive learning and decision making: the intersection of AI, machine learning and game theory. I try to generate fundamental knowledge about algorithms and models for complex tasks. In addition, I think about how such abstract models might be applied to challenging real-world tasks such as collaboration in multi-robot systems, optimization of traffic control systems, intelligent e-commerce agents, etc.
For more information about my research, look here.
For research opportunities, look here.
For more information about possible student projects (current TU Delft students), look here.
February 23rd, 2021: AAMAS’21: Difference Rewards Policy Gradients
At the next AAMAS, Jacopo Castellini, Sam Devlin, Rahul Savani and I will present our work on combining difference rewards and policy gradient methods.
Main idea: differencing only works well if the function being differenced is quite accurate. Differencing learned Q-functions (as COMA does) may therefore not be ideal. We instead perform the differencing on the reward function, which may be known and, if it is not, is easier to learn because it is stationary. Our results show potential for great improvements, especially for larger numbers of agents.
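To make the differencing idea concrete, here is a minimal sketch. It is an illustration only and not the implementation from the paper: it assumes the reward function is known and that each agent has a fixed default action to compare against. The names `difference_rewards`, `toy_reward` and `default_action` are hypothetical.

```python
from typing import Callable, List, Tuple

JointAction = Tuple[int, ...]


def difference_rewards(
    reward_fn: Callable[[object, JointAction], float],
    state: object,
    joint_action: JointAction,
    default_action: int = 0,
) -> List[float]:
    """Per-agent difference rewards: R(s, a) - R(s, (a_-i, default)).

    Assumption: a single default action is a sensible counterfactual
    for every agent. Other choices (e.g. averaging over an agent's
    actions) are possible.
    """
    base = reward_fn(state, joint_action)
    diffs = []
    for i in range(len(joint_action)):
        # Replace only agent i's action; keep everyone else's fixed.
        counterfactual = (
            joint_action[:i] + (default_action,) + joint_action[i + 1:]
        )
        diffs.append(base - reward_fn(state, counterfactual))
    return diffs


def toy_reward(state: object, joint_action: JointAction) -> float:
    """Hypothetical team reward: number of distinct tasks covered.

    Action 0 means 'do nothing'; any other integer is a task id.
    """
    return float(len({a for a in joint_action if a != 0}))


# Agents 0 and 1 both pick task 1, agent 2 picks task 2.
# Only agent 2's action is individually necessary for the team reward.
print(difference_rewards(toy_reward, None, (1, 1, 2)))  # [0.0, 0.0, 1.0]
```

Each agent's difference reward can then replace the shared team reward in its own policy gradient update, so that an agent is credited only for what its action changed.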
February 19th, 2021: Are Multiple Agents the Solution, and not the Problem, to Non-Stationarity?
That is what we explore in our AAMAS’21 blue sky paper.
The idea is to explicitly model non-stationarity as part of an environmental shift game (ESG). This enables us to predict and even steer the shifts that would occur, while dealing with epistemic uncertainty in a robust manner.
February 19th, 2021: AAMAS’21 camready: AIP loss bounds
Our AAMAS’21 paper on loss bounds for influence-based abstraction is online.
In this paper, we derive conditions for ‘approximate influence predictors’ (AIPs) to give small value loss when used in small (abstracted) MDPs. From these conditions we conclude that learning such AIPs with cross-entropy loss seems sensible.
November 5th, 2020: NeurIPS Camready: Multi-agent active perception with prediction rewards
This paper shows that we can also employ “prediction rewards” for active perception in decentralized multiagent settings. Intuitively, this leads to a type of voting that we try to optimize.
November 5th, 2020: NeurIPS Camready: Influence-Augmented Online Planning
The camready version of Influence-Augmented Online Planning for Complex Environments is now available.
In this work, we show that by learning approximate representations of influence, we can speed up online planning (POMCP) sufficiently to get better performance when the time for online decision making is constrained.
April 24th, 2020: AAMAS: Maximizing Information Gain via Prediction Rewards
This paper tackles the problem of active perception: taking actions to minimize one’s uncertainty. It further formalizes the link between information gain and prediction rewards, and uses this to propose a deep-learning approach to optimize active perception from a data set, thus obviating the need for a complex POMDP model.
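One standard way to see a link between uncertainty reduction and prediction rewards is the logarithmic score: under a belief b, the expected log-score of a prediction q is maximised by predicting q = b, and the maximum equals the negative entropy of b. The sketch below checks this numerically. It illustrates that general fact only and may not match the exact formulation used in the paper; the function names are hypothetical.

```python
import math
from typing import Sequence


def negative_entropy(belief: Sequence[float]) -> float:
    """Negative Shannon entropy of a discrete belief (in nats)."""
    return sum(p * math.log(p) for p in belief if p > 0)


def expected_log_prediction_reward(
    belief: Sequence[float], prediction: Sequence[float]
) -> float:
    """Expected reward when predicting `prediction` under `belief`.

    The reward for predicting q when the true state is s is log q(s).
    Assumption: `prediction` is strictly positive wherever `belief` is.
    """
    return sum(
        p * math.log(q) for p, q in zip(belief, prediction) if p > 0
    )


belief = [0.7, 0.2, 0.1]
uniform = [1 / 3, 1 / 3, 1 / 3]

# Predicting the belief itself attains the negative entropy ...
print(expected_log_prediction_reward(belief, belief))
print(negative_entropy(belief))
# ... while any other prediction, e.g. uniform, scores lower.
print(expected_log_prediction_reward(belief, uniform))
```

Because the best achievable prediction reward grows as the belief becomes more peaked, rewarding accurate predictions gives an agent an incentive to reduce its uncertainty without computing entropy explicitly.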
- August 2017 Great news: I have won an ERC Starting Grant!
- August 2017 Our article on active perception is published in Autonomous Robots.
- May 2017 ICML accepted our paper ‘Learning in POMDPs with Monte Carlo Tree Search’
- May 2017 EPSRC will fund my ‘first grant’ proposal. I will be looking for a postdoc with interests in multiagent systems, deep/reinforcement learning.
- April 2017 1,000 citations – yay! 😉
- April 2017 Our AAMAS paper is nominated for best paper!
- Feb 2017 Daniel Claes has been doing awesome work getting our Warehouse Commissioning planning methods to work on real robots:
- Feb 2017 MADP Toolbox 0.4.1 released!
- Feb 2017 AAMAS accepted our paper on Multi-Robot Warehouse Commissioning, yay!
- Oct 2016 My student Elise’s thesis on traffic light control via deep reinforcement learning will be presented at BNAIC. We also have a video that nicely illustrates the behavior of our novel deep Q-learning + transfer planning approach (also described in this demo paper). A NIPS workshop paper is under submission; let me know if you are interested.
- Aug 2016 I am co-organising the NIPS workshop on Learning, Inference and Control of Multi-Agent Systems taking place Friday 9th December 2016, Barcelona, Spain. Join us!
- Jul 2016 The book on Dec-POMDPs I wrote with Chris Amato got published! Check it out here or at Springer.
- Apr 2016 Our paper on PAC Greedy Maximisation with Efficient Bounds on Information Gain for Sensor Selection is accepted at IJCAI. I will also present the poster at AAMAS in Singapore.
- Dec 2015 AAAI ’16 has accepted 4 of my papers and they are now on my publication page.
- Sept 2015 I will give an invited talk at the NIPS 2015 Workshop on Learning, Inference and Control of Multi-Agent Systems, Montreal
- July 2015 I am co-organizing the AAAI spring symposium on Challenges and Opportunities in Multiagent Learning for the Real World. While multiagent learning has been an active area of research for the last two decades, many challenges (such as uncertainty, partial observability, communication limitations, etc.) remain to be solved!
- June 2015 I am co-organizing the Sequential Decision Making for Intelligent Agents AAAI fall symposium. We aim to bring together researchers who work on formal decision making methods (MDPs, POMDPs, Dec-POMDPs, I-POMDPs, etc.).
- April 2015 MADP v0.3.1 released!
- April 2015 Yay – IJCAI has accepted three of my papers!
- February 2015 Our article Computing Convex Coverage Sets for Faster Multi-Objective Coordination was accepted for publication in the Journal of AI Research.
- January 2015 Our work on computing upper bounds for factored Dec-POMDPs will appear at AAMAS as an extended abstract. Check out the extended version.
- January 2015 AAMAS has accepted our work on spatial task allocation problems (SPATAPs), see here.
- November 2014 AAAI has accepted two of my papers! One is about exploiting factored value functions in POMCP. The other investigates the use of sub-modularity in the full sequential POMDP setting.
- October 2014 A recent development in Dec-POMDP town is that these beasts can be reduced to a special case of (centralized) POMDPs. Chris and I decided it was time to try and give a quick overview of this approach in this technical report.
- July 2014 I started as Lecturer at the University of Liverpool.
- May 2014 It’s been a while, but we are releasing a new version of the MADP Toolbox! Download it now: here. We are interested in your feedback, so let us know if you have problems or suggestions.
- April 2014 Chris Amato and I are doing interesting things on Bayesian reinforcement learning for MASs under state uncertainty. See our MSDM paper and technical report for more information.
- December 2013 Our paper Bounded Approximations for Linear Multi-Objective Planning under Uncertainty was accepted for publication at ICAPS.
- December 2013 Good news from AAMAS: A POMDP Based Approach to Optimally Select Sellers in Electronic Marketplaces and Linear Support for Multi-Objective Coordination Graphs both got accepted as full papers!
- October 2013 Our paper Effective Approximations for Spatial Task Allocation Problems was runner up for best paper at BNAIC.
- July 2013 I have been awarded a Veni grant for three years of research!
- June 2013 Presentations of AAMAS and an invited talk at CWI can now be found under Research highlights.
- April 2013 My paper Sufficient Plan-Time Statistics for Decentralized POMDPs is accepted at IJCAI 2013!
- April 2013 Christopher Amato and I are working on Bayesian RL for multiagent systems under uncertainty. See our paper here.
- February 2013 JAIR has accepted our article Incremental Clustering and Expansion for Faster Optimal Planning in Decentralized POMDPs for publication!
- December 2012 Our paper Approximate Solutions for Factored Dec-POMDPs with Many Agents is accepted as a full paper at AAMAS 2013!
- December 2012 Our paper Multi-Objective Variable Elimination for Collaborative Graphical Games is accepted as an extended abstract at AAMAS 2013.
- June 2012 Our paper Exploiting Structure in Cooperative Bayesian Games was accepted at UAI 2012!
- June 2012 Yay! I won the best PC member award at AAMAS 2012!
- June 2012 The 7th MSDM workshop was very interesting! Check the proceedings at the MSDM website.
- April 2012 Two of my papers will appear at AAAI!
- January 2012 My book chapter on decentralized POMDPs is finally published! It can be used as an introduction to Dec-POMDPs and also provides insight into how the current state-of-the-art algorithms (forward heuristic search and backward dynamic programming) for finite-horizon Dec-POMDPs relate to each other.
- December 2011 Two of my papers will appear at AAMAS: a full paper about heuristic search of multiagent influence and an extended abstract about an effective method for performing the vector-based backup in settings with delayed communication.
- July 2010 I’ve started at MIT on July 15th!
- Look, my thesis is on Google Books!
- February 2010 On February 12th 2010 I successfully defended my thesis, titled “Value-Based Planning for Teams of Agents in Stochastic Partially Observable Environments”.
- Prashant Doshi and his students Christopher Jackson and Kenneth Bogert used the Multiagent Decision Process (MADP) Toolbox to compute policies in the game of StarCraft. Watch the (pretty cool) video.