Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes

Krishnendu Chatterjee; Zuzana Křetínská; Jan Křetínský

doi:10.23638/LMCS-13(2:15)2017

lmcs:3757 - Logical Methods in Computer Science, July 3, 2017, Volume 13, Issue 2 - https://doi.org/10.23638/LMCS-13(2:15)2017

We consider Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) objectives. There exist two different views: (i) the expectation semantics, where the goal is to optimize the expected mean-payoff objective, and (ii) the satisfaction semantics, where the goal is to maximize the probability of runs such that the mean-payoff value stays above a given vector.
We consider optimization with respect to both objectives at once, thus unifying the existing semantics. Precisely, the goal is to optimize the expectation while ensuring the satisfaction constraint. Our problem captures the notion of optimization with respect to strategies that are risk-averse (i.e., that ensure a certain probabilistic guarantee). Our main results are as follows. First, we present algorithms for the decision problems, which are always polynomial in the size of the MDP. We also show that an approximation of the Pareto curve can be computed in time polynomial in the size of the MDP and in the approximation factor, but exponential in the number of dimensions. Second, we present a complete characterization of the strategy complexity (in terms of memory bounds and randomization) required to solve our problem.
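
To make the combined objective concrete, the following minimal sketch evaluates both semantics on a toy single-dimensional model. The model, the names `OUTCOMES`, `evaluate` and `best_risk_averse`, and the thresholds `sat` and `pr` are illustrative assumptions and do not come from the article. The grid search only stands in for a proper optimization and is not the authors' algorithm.

```python
from fractions import Fraction

# Toy model (hypothetical, not from the article): from the initial state the
# controller picks one of two actions, after which the run is trapped in a
# sink whose self-loop reward equals the mean payoff of the run.
#   "safe":  mean payoff 4 with probability 1
#   "risky": mean payoff 10 or 0, each with probability 1/2
OUTCOMES = {
    "safe": [(Fraction(1), Fraction(4))],
    "risky": [(Fraction(1, 2), Fraction(10)), (Fraction(1, 2), Fraction(0))],
}


def evaluate(p_risky, sat):
    """Return (expected mean payoff, P[mean payoff >= sat]) for the strategy
    that plays "risky" with probability p_risky and "safe" otherwise."""
    mix = {"risky": p_risky, "safe": 1 - p_risky}
    expectation = Fraction(0)
    satisfaction = Fraction(0)
    for action, weight in mix.items():
        for prob, payoff in OUTCOMES[action]:
            expectation += weight * prob * payoff      # expectation semantics
            if payoff >= sat:
                satisfaction += weight * prob          # satisfaction semantics
    return expectation, satisfaction


def best_risk_averse(sat, pr, steps=100):
    """Grid-search the mixing probability: maximise the expectation among
    strategies whose satisfaction probability is at least pr."""
    best = None
    for k in range(steps + 1):
        p = Fraction(k, steps)
        expectation, satisfaction = evaluate(p, sat)
        if satisfaction >= pr and (best is None or expectation > best[1]):
            best = (p, expectation, satisfaction)
    return best


if __name__ == "__main__":
    # Require mean payoff >= 3 with probability at least 4/5.
    print(best_risk_averse(Fraction(3), Fraction(4, 5)))
```

In this toy model, always playing "risky" has the higher expectation (5) but meets the threshold only half of the time, while always playing "safe" meets it surely with expectation 4. Under the constraint above, the search settles on a randomized mix of the two, which illustrates why randomization can matter once both views are combined.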

Comment: Extended journal version of the LICS'15 paper

Source: arXiv.org:1502.00611

Volume: Volume 13, Issue 2

Published on: July 3, 2017

Imported on: July 3, 2017

Keywords: Computer Science - Logic in Computer Science

Licence: arXiv.org - Non-exclusive license to distribute

Funding:

Source: OpenAIRE Graph

Formal methods for the design and analysis of complex systems; Code: Z 211
Quantitative Reactive Modeling; Funder: European Commission; Code: 267989
PUMA Programm- und Modell-Analyse; Funder: Deutsche Forschungsgemeinschaft; Code: 47140942/GRK 1480
Quantitative Graph Games: Theory and Applications; Funder: European Commission; Code: 279307
International IST Postdoctoral Fellowship Programme; Funder: European Commission; Code: 291734
Modern Graph Algorithmic Techniques in Formal Verification; Funder: European Commission; Code: P 23499
