Tomáš Brázdil ; Václav Brožek ; Krishnendu Chatterjee ; Vojtěch Forejt ; Antonín Kučera - Markov Decision Processes with Multiple Long-run Average Objectives

lmcs:1156 - Logical Methods in Computer Science, February 14, 2014, Volume 10, Issue 1 - https://doi.org/10.2168/LMCS-10(1:13)2014
Markov Decision Processes with Multiple Long-run Average Objectives

Authors: Tomáš Brázdil ; Václav Brožek ; Krishnendu Chatterjee ORCID-iD; Vojtěch Forejt ; Antonín Kučera ORCID-iD

    We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely, expectation and satisfaction objectives. Given an MDP with k limit-average functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs such that the limit-average value stays above a given vector. We show that under the expectation objective, in contrast to the case of one limit-average function, both randomization and memory are necessary for strategies even for epsilon-approximation, and that finite-memory randomized strategies are sufficient for achieving Pareto optimal values. Under the satisfaction objective, in contrast to the case of one limit-average function, infinite memory is necessary for strategies achieving a specific value (i.e. randomized finite-memory strategies are not sufficient), whereas memoryless randomized strategies are sufficient for epsilon-approximation, for all epsilon>0. We further prove that the decision problems for both expectation and satisfaction objectives can be solved in polynomial time and the trade-off curve (Pareto curve) can be epsilon-approximated in time polynomial in the size of the MDP and 1/epsilon, and exponential in the number of limit-average functions, for all epsilon>0. Our analysis also reveals flaws in previous work for MDPs with multiple mean-payoff functions under the expectation objective, corrects the flaws, and allows us to obtain improved results.


    Volume: Volume 10, Issue 1
    Published on: February 14, 2014
    Accepted on: June 25, 2015
    Submitted on: December 1, 2011
    Keywords: Computer Science - Computer Science and Game Theory
    Fundings :
      Source : OpenAIRE Research Graph
    • Quantitative Graph Games: Theory and Applications; Funder: European Commission; Code: 279307

    Linked data

    Source : ScholeXplorer IsCitedBy ARXIV 2201.10825
    Source : ScholeXplorer IsCitedBy DOI 10.4230/lipics.concur.2022.22
    Source : ScholeXplorer IsCitedBy DOI 10.48550/arxiv.2201.10825
    • 10.4230/lipics.concur.2022.22
    • 10.48550/arxiv.2201.10825
    • 2201.10825
    Different Strokes in Randomised Strategies: Revisiting Kuhn’s Theorem Under Finite-Memory Assumptions
    Main, James C. A. ; Randour, Mickael ;

    23 Documents citing this article

    Share

    Consultation statistics

    This page has been seen 533 times.
    This article's PDF has been downloaded 350 times.