Markov Decision Processes with Multiple Long-run Average Objectives

Tomáš Brázdil; Václav Brožek; Krishnendu Chatterjee; Vojtěch Forejt; Antonín Kučera

doi:10.2168/LMCS-10(1:13)2014


lmcs:1156 - Logical Methods in Computer Science, February 14, 2014, Volume 10, Issue 1 - https://doi.org/10.2168/LMCS-10(1:13)2014



We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with k limit-average functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs such that the limit-average value stays above a given vector. We show that under the expectation objective, in contrast to the case of one limit-average function, both randomization and memory are necessary for strategies even for epsilon-approximation, and that finite-memory randomized strategies are sufficient for achieving Pareto optimal values. Under the satisfaction objective, in contrast to the case of one limit-average function, infinite memory is necessary for strategies achieving a specific value (i.e., randomized finite-memory strategies are not sufficient), whereas memoryless randomized strategies are sufficient for epsilon-approximation, for all epsilon > 0. We further prove that the decision problems for both expectation and satisfaction objectives can be solved in polynomial time and the trade-off curve (Pareto curve) can be epsilon-approximated in time polynomial in the size of the MDP and 1/epsilon, and exponential in the number of limit-average functions, for all epsilon > 0. Our analysis also reveals flaws in previous work for MDPs with multiple mean-payoff functions under the expectation objective, corrects the flaws, and allows us to obtain improved results.
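To make the expectation objective concrete, the following is a minimal sketch on a hypothetical one-state MDP that is not taken from the article. The state has two self-loop actions, `a` with reward vector (1, 0) and `b` with reward vector (0, 1); the names `REWARDS`, `expected_mean_payoff`, and `meets` are illustrative assumptions. In this toy model a memoryless strategy that picks `a` with probability p at every step has expected limit-average vector (p, 1 - p), so the sketch only illustrates why a memoryless strategy may need randomization to meet a threshold vector. It does not reproduce the article's stronger result that both randomization and memory are necessary in general.

```python
from fractions import Fraction

# Hypothetical one-state MDP (an assumption for illustration):
# two self-loop actions, each carrying a 2-dimensional reward vector.
REWARDS = {
    "a": (Fraction(1), Fraction(0)),
    "b": (Fraction(0), Fraction(1)),
}


def expected_mean_payoff(p_a):
    """Expected limit-average vector of the memoryless strategy that
    plays 'a' with probability p_a and 'b' otherwise at every step."""
    p_a = Fraction(p_a)
    weights = {"a": p_a, "b": 1 - p_a}
    return tuple(
        sum(weights[act] * REWARDS[act][i] for act in REWARDS)
        for i in range(2)
    )


def meets(value, threshold):
    """Componentwise comparison: does value dominate threshold?"""
    return all(v >= t for v, t in zip(value, threshold))


threshold = (Fraction(1, 2), Fraction(1, 2))

# The two memoryless deterministic strategies: always 'b' (p_a = 0)
# and always 'a' (p_a = 1).
deterministic = [expected_mean_payoff(p) for p in (0, 1)]

# A memoryless randomized strategy mixing both actions evenly.
randomized = expected_mean_payoff(Fraction(1, 2))

print([meets(v, threshold) for v in deterministic])
print(meets(randomized, threshold))
```

Sweeping `p_a` over [0, 1] traces the trade-off (Pareto) curve of this toy model, which here is simply the line segment between (0, 1) and (1, 0).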


Source: arXiv.org:1104.3489

Volume: 10, Issue 1

Published on: February 14, 2014

Imported on: December 1, 2011

Keywords: Computer Science - Computer Science and Game Theory

Licence: arXiv.org - Non-exclusive license to distribute

Funding:

Source: OpenAIRE Graph

Quantitative Graph Games: Theory and Applications; Funder: European Commission; Code: 279307
Modern Graph Algorithmic Techniques in Formal Verification; Funder: European Commission; Code: P 23499


