Publications

Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning

Zuzanna Osika, Jazmin Zatarain-Salazar, Frans A. Oliehoek, and Pradeep K. Murukannaiah. Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning. In ECAI 2024 - 27th European Conference on Artificial Intelligence (ECAI), pp. 2919–2926, IOS Press, 2024.

Abstract

Multi-objective reinforcement learning (MORL) is used to solve problems involving multiple objectives. An MORL agent must make decisions based on the diverse signals provided by distinct reward functions. Training an MORL agent yields a set of solutions (policies), each presenting distinct trade-offs among the objectives (expected returns). MORL enhances explainability by enabling fine-grained comparisons of policies in the solution set based on their trade-offs as opposed to having a single policy. However, the solution set is typically large and multi-dimensional, where each policy (e.g., a neural network) is represented by its objective values. We propose an approach for clustering the solution set generated by MORL. By considering both policy behavior and objective values, our clustering method can reveal the relationship between policy behaviors and regions in the objective space. This approach can enable decision makers (DMs) to identify overarching trends and insights in the solution set rather than examining each policy individually. We tested our method in four multi-objective environments and found it outperformed traditional k-medoids clustering. Additionally, we include a case study that demonstrates its real-world application.
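
The abstract compares the proposed clustering against traditional k-medoids clustering. For readers unfamiliar with that baseline, the sketch below shows what k-medoids does on a MORL solution set when each policy is represented only by its vector of expected returns. It is not the authors' method: it ignores policy behavior entirely. It uses the simple alternating (Voronoi-iteration) variant rather than full PAM. The function names, the farthest-point initialisation, and the example return vectors are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: one vector of expected returns (objective values) per policy.
Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two objective vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points: List[Point], medoids: List[int]) -> List[int]:
    """Label each point with the index of its nearest medoid (ties go to the lower index)."""
    return [
        min(range(len(medoids)), key=lambda c: euclidean(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternating (Voronoi-iteration) k-medoids; returns (medoid indices, labels)."""
    if not 0 < k <= len(points):
        raise ValueError("k must satisfy 0 < k <= len(points)")
    # Deterministic farthest-point initialisation (an assumption; PAM uses a greedy BUILD step).
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(
            range(len(points)),
            key=lambda i: min(euclidean(points[i], points[m]) for m in medoids),
        ))
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster (e.g. duplicate points): keep the old medoid
                new_medoids.append(old)
                continue
            # New medoid: the member with the smallest total distance to the rest of its cluster.
            new_medoids.append(min(
                members, key=lambda i: sum(euclidean(points[i], points[j]) for j in members)
            ))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Final assignment is consistent with the returned medoids.
    return medoids, assign(points, medoids)
```

The input below is a made-up example. Six two-objective return vectors forming two trade-off groups, `[(0.0, 10.0), (0.5, 9.5), (1.0, 9.0), (9.0, 1.0), (9.5, 0.5), (10.0, 0.0)]`, are split by `k_medoids(points, 2)` into labels `[0, 0, 0, 1, 1, 1]`, with medoids at indices 1 and 4. Unlike k-means centroids, every medoid is an actual policy from the solution set. Each cluster's representative is therefore a concrete policy that a decision maker could inspect.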

BibTeX Entry

@inproceedings{Osika24ECAI,
    title=      {Navigating Trade-offs: Policy Summarization for 
                 Multi-Objective Reinforcement Learning},
    author=     {Osika, Zuzanna and Zatarain-Salazar, Jazmin and 
                 Oliehoek, Frans A. and Murukannaiah, Pradeep K.},
    booktitle=  {ECAI 2024 - 27th European Conference on Artificial Intelligence (ECAI)},
    pages=      {2919--2926},
    year=       2024,
    publisher=  {IOS Press},
    keywords =  {refereed},
    abstract =  {
    Multi-objective reinforcement learning (MORL) is used to solve problems
    involving multiple objectives. An MORL agent must make decisions based
    on the diverse signals provided by distinct reward functions. Training
    an MORL agent yields a set of solutions (policies), each presenting
    distinct trade-offs among the objectives (expected returns). MORL
    enhances explainability by enabling fine-grained comparisons of
    policies in the solution set based on their trade-offs as opposed to
    having a single policy. However, the solution set is typically large
    and multi-dimensional, where each policy (e.g., a neural network) is
    represented by its objective values. We propose an approach for
    clustering the solution set generated by MORL. By considering both
    policy behavior and objective values, our clustering method can reveal
    the relationship between policy behaviors and regions in the objective
    space. This approach can enable decision makers (DMs) to identify
    overarching trends and insights in the solution set rather than
    examining each policy individually. We tested our method in four
    multi-objective environments and found it outperformed traditional
    k-medoids clustering. Additionally, we include a case study that
    demonstrates its real-world application.
    }
}

Generated by bib2html.pl (written by Patrick Riley) on Wed Jun 17, 2026 13:52:25 UTC