ON (SUBGAME PERFECT) SECURE EQUILIBRIUM IN QUANTITATIVE REACHABILITY GAMES

We study turn-based quantitative multiplayer non zero-sum games played on finite graphs with reachability objectives. In such games, each player aims at reaching his own goal set of states as soon as possible. A previous work on this model showed that Nash equilibria (resp. secure equilibria) are guaranteed to exist in the multiplayer (resp. two-player) case. The existence of secure equilibria in the multiplayer case remains an open problem. In this paper, we focus our study on the concept of subgame perfect equilibrium, a refinement of Nash equilibrium well-suited to the framework of games played on graphs. We also introduce the new concept of subgame perfect secure equilibrium. We prove the existence of subgame perfect equilibria (resp. subgame perfect secure equilibria) in multiplayer (resp. two-player) quantitative reachability games. Moreover, we provide an algorithm deciding the existence of secure equilibria in the multiplayer case.


Introduction
General framework. The construction of correct and efficient computer systems (hardware or software) is recognized as an extremely difficult task. To support the design and verification of such systems, mathematical logic, automata theory [HU79] and more recently model-checking [CGP00] have been intensively studied. The efficiency of the model-checking approach is widely recognized when applied to systems that can be accurately modeled as finite-state automata. In contrast, the application of these techniques to more complex systems like embedded systems or distributed systems has been less successful. This can be partly explained as follows: classical automata-based models do not faithfully capture the complex behavior of modern computational systems, which are usually composed of several interacting components that also interact with an environment only partially under control. One recent trend to improve the automata models used in the classical approach of verification is to generalize these models with the more flexible and mathematically deeper game-theoretic framework [Nas50,OR94].
The first steps to extend computational models with concepts from game theory were made with the so-called two-player zero-sum games played on graphs [GTW02]. Those games are adequate to model controller-environment interaction problems [Tho95,Tho08]. Moves of player 1 model actions of the controller whereas moves of player 2 model the uncontrollable actions of the environment, and a winning strategy for player 1 is an abstract form of a control program that enforces the control objective. However, only purely antagonistic interactions between a controller and a hostile environment can be modeled in this framework. In order to study more complex systems with more than two components and objectives that are not necessarily antagonistic, we need multiplayer non zero-sum games. While in zero-sum games we look for winning or optimal strategies, in non zero-sum games we rather try to find relevant notions of equilibria, like the famous notion of Nash equilibrium [Nas50]. The secure equilibrium [CHJ06] is a more recent concept that is especially well-suited for assume-guarantee synthesis [CH07,CR10].
There is another interesting extension in such games: moving from qualitative to quantitative objectives. A player has a qualitative objective if his aim is to enforce some specification (for instance, reaching a certain set of target states of the graph), whereas a quantitative objective implies that he wants to minimize or maximize his gain. For example, a player may wish to reach a set of target states quickly or with a minimal consumption of energy. Until now, qualitative objectives have been studied more than quantitative objectives. However, the latter are as natural as the former, and so ought to be considered. Consequently, we investigate here equilibria for multiplayer non zero-sum games played on graphs with quantitative objectives. This article provides some new results in this research direction; in particular, it is another step in the quest for solution concepts well-suited for the computer-aided synthesis and verification of multi-agent systems.
Our contribution. We study turn-based multiplayer non zero-sum games played on finite graphs with quantitative reachability objectives, continuing work initiated in [BBDP10]. In this framework each player aims at reaching his own goal as soon as possible. In [BBDP10], among other results, it has been proved that a finite-memory Nash (resp. secure) equilibrium always exists in multiplayer (resp. two-player) games.
In this paper we consider alternative solution concepts to the classical notion of Nash equilibrium. In particular, in the present framework of games on graphs, it is very natural to consider the notion of subgame perfect equilibrium [Sel65]: a choice of strategies is not only required to be a Nash equilibrium from the initial vertex, but also after every possible initial history of the game. Indeed, if the initial state or the initial history of the system is not known, then a robust controller should be subgame perfect. We introduce a new and even stronger solution concept with the notion of subgame perfect secure equilibrium, which combines the sequential nature of subgame perfect equilibria with the verification-oriented aspects of secure equilibria. These different notions of equilibria are precisely defined in Section 1.
In this paper, we address the following problems:
Problem 1. Given a multiplayer quantitative reachability game $\mathcal{G}$, does there exist a Nash (resp. secure, subgame perfect, subgame perfect secure) equilibrium in $\mathcal{G}$?
Problem 2. Given a Nash (resp. secure, subgame perfect, subgame perfect secure) equilibrium in a multiplayer quantitative reachability game $\mathcal{G}$, does there exist such an equilibrium with finite memory?
These questions have been positively solved by some of the authors in [BBDP10] for Nash equilibria in multiplayer games, and for secure equilibria in two-player games. Notice that these problems and related ones have been extensively investigated in the qualitative framework (see [GU08]).
Here we go a step further and establish the following results about subgame perfect and secure equilibria:
• in every multiplayer quantitative reachability game, there exists a subgame perfect equilibrium (Theorem 2.1),
• in every two-player quantitative reachability game, there exists a subgame perfect secure equilibrium (Theorem 3.1),
• in every multiplayer quantitative reachability game, one can decide whether there exists a secure equilibrium in ExpSpace (Theorem 4.1),
• if there exists a secure equilibrium in a multiplayer quantitative reachability game, then there exists one that is finite-memory (Theorem 4.2).
The results in this paper first appeared in the proceedings of FoSSaCS 2012 [BBDPG12]. We provide their complete proofs here.
Related work. Several recent papers have considered two-player zero-sum games played on finite graphs with regular objectives enriched by some quantitative aspects. Let us mention some of them: games with finitary objectives [CH06], games with prioritized requirements [AKW08], request-response games where the waiting times between the requests and the responses are minimized [HTW08,Zim09], and games whose winning conditions are expressed via quantitative languages [BCHJ09].
Other work concerns qualitative non zero-sum games. In [CHJ06], where the notion of secure equilibrium was introduced, it is proved that a unique maximal payoff profile of secure equilibria always exists for two-player non zero-sum games with regular objectives. In [GU08], general criteria ensuring existence of Nash equilibria and subgame perfect equilibria (resp. secure equilibria) are provided for multiplayer (resp. two-player) games, as well as complexity results. In [BBM10], the existence of Nash equilibria is studied for timed games with qualitative reachability objectives. Complexity issues about Nash equilibria in multiplayer concurrent games with Büchi objectives are discussed in [BBMU11].
Finally, let us mention work that combines both quantitative and non zero-sum aspects. In [BG09], the authors study games played on graphs with terminal vertices where quantitative payoffs are assigned to the players. These games may have cycles but all the infinite plays form a single outcome (like in chess where every infinite play is a draw). That paper gives criteria that ensure the existence of Nash (and subgame perfect) equilibria in pure and memoryless strategies. In [KLST12], the studied games are played on priced graphs similar to the ones considered in this article, but in a concurrent way. In this concurrent framework, Nash equilibria are no longer guaranteed to exist. The authors provide an algorithm to decide existence of Nash equilibria, thanks to a Büchi automaton accepting all Nash equilibria outcomes. The complexity of some related decision problems is also studied. In [PS09], the authors study Muller games on finite graphs where players have a preference ordering on the sets of the Muller table. They show that Nash equilibria always exist for such games, and that it is decidable whether there exists a subgame perfect equilibrium. In both cases they give a procedure to compute an equilibrium strategy profile (when it exists). In [FKMY+10] (respectively [PS11]), it is shown that every multiplayer sequential game has a subgame-perfect ε-equilibrium for every ε > 0 if the payoff functions of the players are bounded and lower-semicontinuous (respectively upper-semicontinuous).
Organization of the paper. Section 1 is dedicated to definitions. We present the kinds of games and equilibria that we study in this paper. In Section 2, we positively solve Problem 1 for subgame perfect equilibria. In Section 3, this problem is also positively solved for subgame perfect secure equilibria, but only in the two-player case. Finally, in Section 4, we study Problems 1 and 2 in the context of secure equilibria. We partially solve Problem 1 by providing an algorithm that decides the existence of a secure equilibrium, and we positively solve Problem 2 for secure equilibria.

Preliminaries
1.1. Games, Strategy Profiles and Equilibria. In this paper, we distinguish between qualitative and quantitative games. In a qualitative game, each player has a qualitative objective, meaning that he wants to guarantee that some property holds. In this case, his payoff for a play of the game is either 1 or 0 (the play does or does not satisfy the property, respectively). On the other hand, in a quantitative game, each player has a quantitative objective: he aims at minimizing (or maximizing) a certain value. His payoff for a play can then be a real number or ±∞.
We consider here quantitative games played on a graph where all the players have reachability objectives. This means that, given a certain set of vertices $\mathrm{Goal}_i$, each player $i$ wants to reach one of these vertices as soon as possible. We recall the basic notions about these games and we introduce different kinds of equilibria, like Nash equilibria. This section is inspired by [BBDP10].
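Definition 1.1. An infinite turn-based multiplayer quantitative reachability game is a tuple $\mathcal{G} = (\Pi, V, (V_i)_{i\in\Pi}, E, (\mathrm{Goal}_i)_{i\in\Pi})$ where
• $\Pi$ is a finite set of players,
• $G = (V, (V_i)_{i\in\Pi}, E)$ is a finite directed graph where $V$ is the set of vertices, $(V_i)_{i\in\Pi}$ is a partition of $V$ into the vertex sets of the players, and $E \subseteq V \times V$ is the set of edges, such that for all $v \in V$ there exists $v' \in V$ with $(v, v') \in E$ (i.e., each vertex has at least one outgoing edge), and
• $\mathrm{Goal}_i \subseteq V$ is the non-empty goal set of player $i$.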
From now on, we often use the term game to denote a multiplayer quantitative reachability game according to Definition 1.1.
It is often useful to specify an initial vertex $v_0 \in V$ for a game $\mathcal{G}$. We call the pair $(\mathcal{G}, v_0)$ an initialized game. Sometimes we omit the word "initialized" and just talk about games. The game $(\mathcal{G}, v_0)$ is played as follows. A token is first placed on the vertex $v_0$. Player $i$, such that $v_0 \in V_i$, has to choose one of the outgoing edges of $v_0$ and put the token on the vertex $v_1$ reached when following this edge. Then it is the turn of the player who owns $v_1$, and so on.
A play $\rho \in V^\omega$ (resp. a history $h \in V^+$) of $(\mathcal{G}, v_0)$ is an infinite (resp. a finite) path through the graph $G$ starting from vertex $v_0$. Note that a history is always non-empty because it starts with $v_0$. The set $H \subseteq V^+$ is made up of all the histories of $\mathcal{G}$, and for $i \in \Pi$, the set $H_i$ is the set of all histories $h \in H$ whose last vertex belongs to $V_i$.
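For any play $\rho = \rho_0\rho_1\ldots$ of $\mathcal{G}$, we define $\mathrm{Cost}_i(\rho)$, the cost of player $i$, as:
\[ \mathrm{Cost}_i(\rho) = \begin{cases} l & \text{if } l \text{ is the least index such that } \rho_l \in \mathrm{Goal}_i, \\ +\infty & \text{otherwise.} \end{cases} \tag{1.1} \]
We write $\mathrm{Cost}(\rho) = (\mathrm{Cost}_i(\rho))_{i\in\Pi}$ for the cost profile of the play $\rho$. Each player $i$ aims to minimize the cost he has to pay, i.e. to reach his goal set as soon as possible. The cost profile for a history $h$ is defined similarly.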
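As a minimal illustration of the cost function, the following Python sketch computes the cost profile of an ultimately periodic play given as a finite prefix followed by a cycle repeated forever. It reads $\mathrm{Cost}_i(\rho)$ as the least index at which $\rho$ visits $\mathrm{Goal}_i$, and $+\infty$ if there is none. The function names, the prefix/cycle encoding and the goal sets used in the last lines are assumptions made for this example only.

```python
import math


def cost(prefix, cycle, goal):
    """Cost of the play prefix . cycle^omega for a player with goal set `goal`.

    Returns the least index l such that the l-th vertex of the play is in
    `goal`, or math.inf if the play never visits `goal`.
    """
    # Every vertex occurring in the play occurs in prefix + one copy of cycle,
    # and its first occurrence in the play is at the same index.
    for index, vertex in enumerate(list(prefix) + list(cycle)):
        if vertex in goal:
            return index
    return math.inf


def cost_profile(prefix, cycle, goals):
    """Cost profile of the play; `goals` maps each player to his goal set."""
    return {player: cost(prefix, cycle, goal) for player, goal in goals.items()}


# Hypothetical instance: the play A A A B^omega with goal sets {A} and {B}.
profile = cost_profile(["A", "A", "A"], ["B"], {1: {"A"}, 2: {"B"}})
print(profile)  # {1: 0, 2: 3}
```

Note that the cost counts edges, in line with the convention that the length of a history is its number of edges.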
A prefix (resp. proper prefix) $\alpha$ of a history $h = h_0 \ldots h_k$ is a finite sequence $h_0 \ldots h_l$, with $l \le k$ (resp. $l < k$), denoted by $\alpha \le h$ (resp. $\alpha < h$). We similarly consider a prefix $\alpha$ of a play $\rho$, denoted by $\alpha < \rho$. The function Last returns, given a history $h = h_0 \ldots h_k$, the last vertex $h_k$ of $h$, and the length $|h|$ of $h$ is the number $k$ of its edges. Note that the length is not defined as the number of vertices. Given a play $\rho = \rho_0\rho_1\ldots$, we denote by $\rho_{\le l}$ the prefix of $\rho$ of length $l$, i.e. $\rho_{\le l} = \rho_0\rho_1\ldots\rho_l$. Similarly, $\rho_{<l} = \rho_0\rho_1\ldots\rho_{l-1}$.
We say that a play $\rho = \rho_0\rho_1\ldots$ visits a set $S \subseteq V$ (resp. a vertex $v \in V$) if there exists $l \in \mathbb{N}$ such that $\rho_l$ is in $S$ (resp. $\rho_l = v$). The same terminology is used for a history $h$. More precisely, we say that $\rho$ visits a set $S$ at (resp. before) depth $d \in \mathbb{N}$ if $\rho_d$ is in $S$ (resp. if there exists $l \le d$ such that $\rho_l$ is in $S$). For any play $\rho$ we denote by $\mathrm{Visit}(\rho)$ the set of players $i \in \Pi$ such that $\rho$ visits $\mathrm{Goal}_i$. The set $\mathrm{Visit}(h)$ for a history $h$ is defined similarly.
A strategy of player $i$ in $\mathcal{G}$ is a function $\sigma : H_i \to V$ assigning to each history $h \in H_i$ a next vertex $\sigma(h)$ such that $(\mathrm{Last}(h), \sigma(h))$ belongs to $E$. We say that a play $\rho = \rho_0\rho_1\ldots$ of $\mathcal{G}$ is consistent with a strategy $\sigma$ of player $i$ if $\rho_{k+1} = \sigma(\rho_0 \ldots \rho_k)$ for all $k \in \mathbb{N}$ such that $\rho_k \in V_i$. The same terminology is used for a history $h$ of $\mathcal{G}$. A strategy profile of $\mathcal{G}$ is a tuple $(\sigma_i)_{i\in\Pi}$ where $\sigma_i$ is a strategy for player $i$. It determines a unique play in the initialized game $(\mathcal{G}, v_0)$ consistent with each strategy $\sigma_i$, called the outcome of $(\sigma_i)_{i\in\Pi}$ and denoted by $\langle (\sigma_i)_{i\in\Pi} \rangle_{v_0}$. We write $\sigma_{-j}$ for $(\sigma_i)_{i\in\Pi\setminus\{j\}}$, the tuple of strategies $\sigma_i$ for all the players except player $j$.
A strategy $\sigma$ of player $i$ is memoryless if $\sigma$ depends only on the current vertex, i.e. $\sigma(h) = \sigma(\mathrm{Last}(h))$ for all $h \in H_i$. More generally, $\sigma$ is a finite-memory strategy if the equivalence relation $\approx_\sigma$ on $V^*$ defined by $h \approx_\sigma h'$ if $\sigma(h\delta) = \sigma(h'\delta)$ for all $\delta \in V^* V_i$ has finite index. In other words, a finite-memory strategy is a strategy that can be implemented by a finite automaton with output. A strategy profile $(\sigma_i)_{i\in\Pi}$ is called memoryless or finite-memory if each $\sigma_i$ is a memoryless or a finite-memory strategy, respectively.
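The view of a finite-memory strategy as a finite automaton with output can be sketched as follows. The class name, the dictionary encoding of the automaton and the two-vertex example are hypothetical choices made for illustration; a memoryless strategy corresponds to the special case of a single memory state.

```python
class FiniteMemoryStrategy:
    """Strategy implemented by a finite automaton with output."""

    def __init__(self, initial, update, choice):
        self.initial = initial  # initial memory state
        self.update = update    # (memory, vertex) -> next memory state
        self.choice = choice    # (memory, vertex) -> chosen next vertex

    def __call__(self, history):
        memory = self.initial
        # Read every vertex of the history except the current (last) one.
        for vertex in history[:-1]:
            memory = self.update[(memory, vertex)]
        return self.choice[(memory, history[-1])]


# Hypothetical example on vertices A and B: stay on A twice, then move to B.
# The memory counts the occurrences of A already read, capped at 2.
update = {(m, "A"): min(m + 1, 2) for m in range(3)}
choice = {(m, "A"): ("A" if m < 2 else "B") for m in range(3)}
sigma = FiniteMemoryStrategy(0, update, choice)

print(sigma(("A",)), sigma(("A", "A")), sigma(("A", "A", "A")))  # A A B
```

The strategy above needs three memory states, whereas no memoryless strategy can produce both choices at vertex A.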
For a strategy profile $(\sigma_i)_{i\in\Pi}$ with outcome $\rho$ and a strategy $\sigma'_j$ of player $j$, we say that player $j$ deviates from $\rho$ if there exists a prefix $h$ of $\rho$, consistent with $\sigma'_j$, such that $h \in H_j$ and $\sigma'_j(h) \neq \sigma_j(h)$.
We now introduce different notions of equilibria in the quantitative framework and give several examples to clarify the presented concepts. We begin with the definition of Nash equilibrium.
Definition 1.2. A strategy profile $(\sigma_i)_{i\in\Pi}$ of a game $(\mathcal{G}, v_0)$ is a Nash equilibrium if for every player $j \in \Pi$ and every strategy $\sigma'_j$ of player $j$, we have:
\[ \mathrm{Cost}_j(\rho) \le \mathrm{Cost}_j(\rho') \]
where $\rho = \langle (\sigma_i)_{i\in\Pi} \rangle_{v_0}$ and $\rho' = \langle \sigma'_j, \sigma_{-j} \rangle_{v_0}$.
This definition means that for all $j \in \Pi$, player $j$ has no incentive to deviate since he cannot strictly decrease his cost when using $\sigma'_j$ instead of $\sigma_j$. Keeping the notations of Definition 1.2 in mind, a strategy $\sigma'_j$ such that $\mathrm{Cost}_j(\rho) > \mathrm{Cost}_j(\rho')$ is called a profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$. In this case, either player $j$ pays an infinite cost for $\rho$ and a finite cost for $\rho'$ (i.e. $\rho'$ visits $\mathrm{Goal}_j$, but $\rho$ does not), or player $j$ pays a finite cost for $\rho$ and a strictly lower cost for $\rho'$ (i.e. $\rho'$ visits $\mathrm{Goal}_j$ for the first time earlier than $\rho$ does).
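We now define the concept of secure equilibrium (see footnote 1). We first need to associate a binary relation $\prec_j$ on cost profiles with each player $j$. Given two cost profiles $(x_i)_{i\in\Pi}$ and $(y_i)_{i\in\Pi}$:
\[ (x_i)_{i\in\Pi} \prec_j (y_i)_{i\in\Pi} \quad\text{iff}\quad (x_j > y_j) \;\lor\; \big( x_j = y_j \,\land\, (\forall i \in \Pi\;\; x_i \le y_i) \,\land\, (\exists i \in \Pi\;\; x_i < y_i) \big). \]
We then say that player $j$ prefers $(y_i)_{i\in\Pi}$ to $(x_i)_{i\in\Pi}$. In other words, player $j$ prefers a cost profile to another one either if he has a strictly lower cost, or if he keeps the same cost, the other players have a greater or equal cost, and at least one of them has a strictly greater cost.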
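Definition 1.3. A strategy profile $(\sigma_i)_{i\in\Pi}$ of a game $(\mathcal{G}, v_0)$ is a secure equilibrium if for every player $j \in \Pi$, there does not exist any strategy $\sigma'_j$ of player $j$ such that:
\[ \mathrm{Cost}(\rho) \prec_j \mathrm{Cost}(\rho') \]
where $\rho = \langle (\sigma_i)_{i\in\Pi} \rangle_{v_0}$ and $\rho' = \langle \sigma'_j, \sigma_{-j} \rangle_{v_0}$.
In other words, player $j$ has no incentive to deviate w.r.t. the relation $\prec_j$. A strategy $\sigma'_j$ such that $\mathrm{Cost}(\rho) \prec_j \mathrm{Cost}(\rho')$ is called a $\prec_j$-profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$. Clearly, any secure equilibrium is a Nash equilibrium.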
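The preference relation behind secure equilibria can be made concrete with a short Python sketch. It follows the informal description of the relation: player j prefers y to x if his own cost is strictly lower in y, or if it is unchanged, no player has a lower cost in y, and some player has a strictly greater cost in y. The function name and the dictionary encoding of cost profiles are assumptions of this example.

```python
import math


def prefers(j, x, y):
    """Return True iff player j prefers cost profile y to cost profile x.

    Cost profiles are dictionaries mapping each player to a cost
    (a non-negative integer or math.inf).
    """
    if x[j] > y[j]:
        return True  # strictly lower own cost in y
    nobody_gains = all(x[i] <= y[i] for i in x)
    somebody_loses = any(x[i] < y[i] for i in x)
    return x[j] == y[j] and nobody_gains and somebody_loses


# Player 1 keeps cost 2 while player 2 goes from cost 2 to an infinite cost.
print(prefers(1, {1: 2, 2: 2}, {1: 2, 2: math.inf}))  # True
print(prefers(1, {1: 2, 2: math.inf}, {1: 2, 2: 2}))  # False
```

A strategy profile is then a secure equilibrium exactly when no unilateral deviation leads to a cost profile that the deviating player prefers in this sense.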
In a secure equilibrium, each player tries first to minimize his own cost, and then to maximize the costs of the other players. According to [CHJ06], a secure profile can be seen as a contract between the players which strengthens cooperation in the following sense: any unilateral selfish deviation by one player cannot put the other players at a disadvantage if they follow the contract. For more intuition and motivation about secure equilibria, see [CHJ06,CH07,CR10].
We now introduce a third type of equilibrium: the subgame perfect equilibrium. In this case, a strategy profile is not only required to be a Nash equilibrium from the initial vertex, but also after every possible initial history of the game. Before giving the definition, we introduce the concept of subgame and explain some notations.
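Given a game $\mathcal{G} = (\Pi, V, (V_i)_{i\in\Pi}, E, (\mathrm{Goal}_i)_{i\in\Pi})$, an initial vertex $v_0$, and a history $hv$ of $(\mathcal{G}, v_0)$, with $v \in V$ ($h$ might be empty), the subgame $(\mathcal{G}|_h, v)$ of $(\mathcal{G}, v_0)$ with history $hv$ is the game $\mathcal{G}|_h = (\Pi, V, (V_i)_{i\in\Pi}, E, (\mathrm{Goal}_i)_{i\in\Pi})$ initialized at $v$ and such that the cost of a play $\pi$ of $(\mathcal{G}|_h, v)$ for player $i$ is given by $\mathrm{Cost}_i(h\pi)$. Notice that the only difference between $(\mathcal{G}, v_0)$ and $(\mathcal{G}|_h, v)$ lies in the costs of the plays. The cost of a play in the subgame $(\mathcal{G}|_h, v)$ depends on the considered history $h$ (the goal set $\mathrm{Goal}_i$ could have already been visited by $h$). Given a strategy $\sigma_i$ for player $i$ in $\mathcal{G}$, we define the strategy $\sigma_i|_h$ in $\mathcal{G}|_h$ by $\sigma_i|_h(h') = \sigma_i(hh')$ for all histories $h'$ of $(\mathcal{G}|_h, v)$ such that $\mathrm{Last}(h') \in V_i$. Let $\sigma$ be the strategy profile $(\sigma_i)_{i\in\Pi}$; we write $\sigma|_h$ for $(\sigma_i|_h)_{i\in\Pi}$, and $h\langle \sigma|_h \rangle_v$ for the play in $(\mathcal{G}, v_0)$ with prefix $h$ that is consistent with $\sigma|_h$ from $v$.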
1 Our definition naturally extends the notion of secure equilibrium proposed in [CHJ06] to the quantitative framework.

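Then, we say that $(\sigma_i|_h)_{i\in\Pi}$ is a Nash equilibrium in the subgame $(\mathcal{G}|_h, v)$ if for every player $j \in \Pi$ and every strategy $\sigma'_j$ of player $j$, we have that $\mathrm{Cost}_j(\rho) \le \mathrm{Cost}_j(\rho')$, where $\rho = h\langle (\sigma_i|_h)_{i\in\Pi} \rangle_v$ and $\rho' = h\langle \sigma'_j|_h, \sigma_{-j}|_h \rangle_v$. The definition of a secure equilibrium in $(\mathcal{G}|_h, v)$ is given similarly.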
A subgame perfect equilibrium is a strategy profile that is a Nash equilibrium after every possible history of the game, i.e. in every subgame. In particular, a subgame perfect equilibrium is also a Nash equilibrium.
Definition 1.4. A strategy profile $(\sigma_i)_{i\in\Pi}$ of a game $(\mathcal{G}, v_0)$ is a subgame perfect equilibrium if for all histories $hv$ of $(\mathcal{G}, v_0)$, with $v \in V$, $(\sigma_i|_h)_{i\in\Pi}$ is a Nash equilibrium in the subgame $(\mathcal{G}|_h, v)$.
We now introduce the last kind of equilibrium that we study. It is a new notion that combines the concepts of subgame perfect equilibrium and secure equilibrium in the following way.
Definition 1.5. A strategy profile $(\sigma_i)_{i\in\Pi}$ of a game $(\mathcal{G}, v_0)$ is a subgame perfect secure equilibrium if for all histories $hv$ of $(\mathcal{G}, v_0)$, with $v \in V$, $(\sigma_i|_h)_{i\in\Pi}$ is a secure equilibrium in the subgame $(\mathcal{G}|_h, v)$.
Notice that a subgame perfect secure equilibrium is a secure equilibrium, as well as a subgame perfect equilibrium.
In order to understand the differences between the various notions of equilibria, we provide three simple examples of games limited to two players and to finite trees.
Example 1.6. Let $\mathcal{G} = (V, V_1, V_2, E, \mathrm{Goal}_1, \mathrm{Goal}_2)$ be the two-player game depicted in Fig. 1. The vertices of player 1 (resp. 2) are represented by circles (resp. squares), that is, $V_1 = \{A, D, E, F\}$ and $V_2 = \{B, C\}$. The initial vertex $v_0$ is $A$. The vertices of $\mathrm{Goal}_1$ are shaded whereas the vertices of $\mathrm{Goal}_2$ are doubly circled; thus $\mathrm{Goal}_1 = \{D, F\}$ and $\mathrm{Goal}_2 = \{F\}$ in $\mathcal{G}$. The number 2 labeling the edge $(B, D)$ is a shortcut to indicate that there are two consecutive edges from $B$ to $D$ (through one intermediate vertex). We will keep these conventions throughout the article. In the games $\mathcal{G}$, $\mathcal{G}'$ and $\mathcal{G}''$ of Fig. 1, 2 and 3 (played on the same graph), we define two strategies $\sigma_1, \sigma'_1$ of player 1 and two strategies $\sigma_2, \sigma'_2$ of player 2 in the following way: $\sigma_1(A) = B$, $\sigma'_1(A) = C$, $\sigma_2(C) = E$ and $\sigma'_2(C) = F$.
In $(\mathcal{G}, A)$, one can easily check that the strategy profile $(\sigma_1, \sigma_2)$ is a secure equilibrium (and thus a Nash equilibrium) with cost profile $(3, +\infty)$. Such a secure equilibrium exists because player 2 threatens player 1 to go to vertex $E$ in the case where vertex $C$ is reached. This threat is not credible since, by acting this way, player 2 gets an infinite cost instead of a cost of 2 (that he could obtain by reaching $F$). For this reason, $(\sigma_1, \sigma_2)$ is not a subgame perfect equilibrium (and thus not a subgame perfect secure equilibrium). However, one can check that the strategy profile $(\sigma'_1, \sigma'_2)$ is a subgame perfect secure equilibrium.
Let us now consider the game $(\mathcal{G}', A)$ depicted in Fig. 2 (notice that the number 2 has disappeared from the edge $(B, D)$, but $\mathrm{Goal}_1$ and $\mathrm{Goal}_2$ remain the same). One can verify that the strategy profile $(\sigma'_1, \sigma'_2)$ is a subgame perfect equilibrium which is not a secure equilibrium (and thus not a subgame perfect secure equilibrium). A subgame perfect secure equilibrium for $(\mathcal{G}', A)$ is given by the strategy profile $(\sigma_1, \sigma'_2)$.
Finally, for the game $(\mathcal{G}'', A)$ depicted in Fig. 3 (where $\mathrm{Goal}_1 = \{D, F\}$ and $\mathrm{Goal}_2 = \{E, F\}$), one can check that the strategy profile $(\sigma_1, \sigma'_2)$ is both a subgame perfect equilibrium and a secure equilibrium. However, it is not a subgame perfect secure equilibrium. In particular, this shows that being a subgame perfect secure equilibrium is not equivalent to being both a subgame perfect equilibrium and a secure equilibrium. On the other hand, $(\sigma_1, \sigma_2)$ is a subgame perfect secure equilibrium in $(\mathcal{G}'', A)$.
The general philosophy of our work is to investigate interesting concepts of equilibria in multiplayer quantitative reachability games. In these games, each player aims at reaching his goal set as soon as possible. With that in mind, a play where a goal set is visited for the first time after cycles where no new goal set is visited does not seem to be a desirable behavior (see the definition of unnecessary cycle below). It thus appears reasonable to seek equilibrium concepts with outcomes that do not present this undesirable feature.
Example 1.8. Let us exhibit an example of this phenomenon on the two-player game $(\mathcal{G}, A)$ depicted in Fig. 4 (we use the same conventions as in Example 1.6). For $n > 0$, let us consider the play $A^n B^\omega$. Along this play, the cycles $A^{n-1}$, for $n > 1$, are unnecessary cycles. Indeed, once $\mathrm{Goal}_1$ is visited (in $A$), looping $n$ times on $A$ just delays the visit of $\mathrm{Goal}_2$ (in $B$). However, for each $n > 0$, one can build a subgame perfect equilibrium $(\sigma^n_1, \sigma_2)$ whose outcome is $A^n B^\omega$ and whose cost profile is $(0, n)$, as follows:
This allows us to conclude that the notion of subgame perfect equilibrium does not prevent the existence of outcomes with unnecessary cycles. We can notice that $(\sigma^n_1, \sigma_2)$ is not a secure equilibrium, for all $n > 0$. However, we will see in the next example that secure equilibria can also allow this kind of undesirable behavior.
Let us consider the game of Fig. 5 initialized at $A$. For $n > 1$, the cycles $A^{n-1}$ are unnecessary along the play $A^n B C^\omega$. However, for each $n > 0$, we can build a secure equilibrium $(\sigma^n_1, \sigma^n_2)$ whose outcome is $A^n B C^\omega$ and whose cost profile is $(n + 1, n + 1)$, as follows:
For each $n > 0$, the fact that $(\sigma^n_1, \sigma^n_2)$ is a secure equilibrium is based on the following threat of player 2 against player 1: player 2 pretends that he will only decide to visit vertex $C$ if player 1 has visited vertex $A$ exactly $n$ times. This behavior is not credible since player 2's interest is to reach vertex $C$ as soon as possible. In other words, $(\sigma^n_1, \sigma^n_2)$ is not a subgame perfect equilibrium (and thus not a subgame perfect secure equilibrium).
Those examples motivate the introduction of the notion of subgame perfect secure equilibrium. We believe that this notion can help in avoiding the undesirable behaviors of unnecessary cycles. More generally, a deeper understanding of the studied equilibria whose outcomes have unnecessary cycles could be very useful. A more subtle example of a three-player game will be discussed in Example 4.9.
In the sequel, we study and partially solve Problem 1 and Problem 2. The next three sections contain useful material for the proofs of our results.
1.2. Qualitative Two-player Zero-sum Games. In this section we recall well-known properties of qualitative two-player zero-sum games [Tho08]. They will be useful for our proofs, especially in the context of deviations of a player with respect to a strategy profile: we thus face a two-player zero-sum game where the player who deviates plays against the coalition of the other players.
We first recall the notion of weak parity game.
Definition 1.9. A qualitative two-player zero-sum weak parity game is a tuple $\mathcal{G} = (V, V_1, V_2, E, c)$, where $G = (V, V_1, V_2, E)$ is a finite directed graph, $(V_1, V_2)$ is a partition of $V$ into the vertex sets of player 1 and player 2, and $c : V \to \mathbb{N}$ is the coloring function.
Given an initial vertex $v_0 \in V$, the notions of play, history and strategy are the same as the ones defined in Section 1.1. The game is said to be zero-sum because every play is won by exactly one of the two players.
In zero-sum games, it is interesting to know whether one of the players can play in such a way that he is sure to win, no matter how the other player plays. This is formalized with the notion of winning strategy. A strategy $\sigma_i$ for player $i$ is a winning strategy from an initial vertex $v$ if all the plays of $\mathcal{G}$ starting from $v$ that are consistent with $\sigma_i$ are won by player $i$. If player $i$ has a winning strategy in $\mathcal{G}$ from $v$, we say that player $i$ wins the game $\mathcal{G}$ from $v$. We say that a game $\mathcal{G}$ is determined if for all $v \in V$, one of the two players has a winning strategy from $v$.
Martin showed [Mar75] that every qualitative two-player zero-sum game with a Borel type winning condition is determined. In particular, we have the following proposition:
Proposition 1.10. Let $\mathcal{G} = (V, V_1, V_2, E, c)$ be a qualitative two-player zero-sum weak parity game. Then for all $v \in V$, one of the two players has a memoryless winning strategy from $v$ (in particular, $\mathcal{G}$ is determined).
We consider here three special cases of the weak parity condition: reachability, safety, and reachability under safety conditions. A qualitative two-player zero-sum reachability under safety game is denoted $\mathcal{G} = (V, V_1, V_2, E, R, S)$ where $R, S \subseteq V$ and $R \neq \emptyset$. In such a game, player 1 wins a play $\rho$ iff $\rho$ visits $R$ (i.e., $\exists i\ \rho_i \in R$) while staying in $S$ (i.e., $\forall i\ \rho_i \in S$). The reachability under safety condition can be encoded with a weak parity condition by a suitable definition of the coloring function $c$. A reachability (resp. safety) game is the special case where $S = V$ (resp. $R = V$). We can now state a corollary of Proposition 1.10.
Corollary 1.11. Let $\mathcal{G} = (V, V_1, V_2, E, R, S)$ be a qualitative two-player zero-sum reachability under safety game. Then the game $\mathcal{G}$ is determined and player 1 has a memoryless strategy $\nu_1$ that enables him to reach $R$ within $|V| - 1$ edges, while staying in $S$, from each vertex $v$ from which he wins the game.
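The winning vertices and a memoryless strategy of the kind described in Corollary 1.11 can be computed by two classical fixpoint iterations. The sketch below is one standard way to do so, not necessarily the construction used in the cited proofs: it first computes the vertices from which player 1 can stay in S forever, then the layers of vertices from which he can force a visit to R inside that region. All names and the example graph are hypothetical, and every vertex is assumed to have at least one outgoing edge.

```python
def reach_under_safety(vertices, owned_by_1, edges, R, S):
    """Return (rank, strategy) for player 1 in a reachability under safety game.

    rank[v] is the maximal number of edges player 1 needs to reach R from a
    winning vertex v; strategy maps vertices of player 1 to a chosen successor.
    """
    succ = {v: [w for (u, w) in edges if u == v] for v in vertices}

    # Phase 1: greatest fixpoint, vertices from which player 1 can stay in S.
    safe = {v for v in vertices if v in S}
    changed = True
    while changed:
        changed = False
        for v in list(safe):
            inside = [w in safe for w in succ[v]]
            if not (any(inside) if v in owned_by_1 else all(inside)):
                safe.discard(v)
                changed = True

    # Phase 2: attractor of R within the safe region, computed layer by layer.
    rank = {v: 0 for v in safe if v in R}
    strategy = {}
    step = 0
    while True:
        step += 1
        frontier = set(rank)
        added = {}
        for v in safe - frontier:
            if v in owned_by_1:
                good = [w for w in succ[v] if w in frontier]
                if good:
                    added[v] = good[0]
            elif all(w in frontier for w in succ[v]):
                added[v] = None
        if not added:
            break
        for v, choice in added.items():
            rank[v] = step
            if choice is not None:
                strategy[v] = choice

    # Elsewhere in the safe region, player 1 simply keeps the play safe.
    for v in safe:
        if v in owned_by_1 and v not in strategy:
            strategy[v] = next(w for w in succ[v] if w in safe)
    return rank, strategy


# Hypothetical game: player 1 owns only vertex a; vertex e lies outside S.
V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("a", "c"), ("b", "d"), ("b", "e"),
     ("c", "d"), ("d", "d"), ("e", "e")]
rank, nu1 = reach_under_safety(V, {"a"}, E, R={"d"}, S={"a", "b", "c", "d"})
print(rank, nu1)
```

In this instance player 1 must avoid b, from which the opponent can leave S, and the ranks never exceed |V| - 1, in line with the bound of the corollary.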
In the sequel, we apply Corollary 1.11 to particular two-player games. Given a multiplayer quantitative reachability game $\mathcal{G} = (\Pi, V, (V_i)_{i\in\Pi}, E, (\mathrm{Goal}_i)_{i\in\Pi})$ and a player $i \in \Pi$, we denote by $\mathcal{G}_i = (V, V_i, V \setminus V_i, E, R, S)$ the qualitative two-player zero-sum reachability under safety game associated with player $i$. This game is played on the graph $G_i = (V, V_i, V \setminus V_i, E)$, where player $i$ plays against the coalition of all the other players. Player $i$ controls the vertices of $V_i$ and the coalition those of $V \setminus V_i$; player $i$ aims at reaching $R$ while staying in $S$, and the coalition wants to prevent this.
1.3. Unraveling. In the proofs of this article, it will often be useful to unravel the graph $G = (V, (V_i)_{i\in\Pi}, E)$ from an initial vertex $v_0$, which yields an infinite tree, denoted by $T$. This tree can be seen as a new graph where the set of vertices is the set $H$ of histories of $\mathcal{G}$, the initial vertex is $v_0$, and a pair $(h, hv) \in H \times H$ is an edge of $T$ if $(\mathrm{Last}(h), v) \in E$. We denote by $\mathcal{T}$ the related game. This game $\mathcal{T}$ played on the unraveling $T$ of $G$ from $v_0$ is equivalent to the game $(\mathcal{G}, v_0)$ played on the graph $G$ in the following sense. A play $(\rho_0)(\rho_0\rho_1)(\rho_0\rho_1\rho_2)\ldots$ in $\mathcal{T}$ induces a unique play $\rho = \rho_0\rho_1\rho_2\ldots$ in $(\mathcal{G}, v_0)$, and conversely. Thus, we denote a play in $\mathcal{T}$ by the respective play in $(\mathcal{G}, v_0)$. The bijection between plays of $(\mathcal{G}, v_0)$ and plays of $\mathcal{T}$ allows us to use the same cost function Cost, and to easily transform strategies in $\mathcal{G}$ into strategies in $\mathcal{T}$ (and conversely).
For practical reasons, we often consider $\mathcal{T}$ in our proofs instead of $(\mathcal{G}, v_0)$, and the equilibria defined in $\mathcal{T}$ are obviously equilibria in $(\mathcal{G}, v_0)$. Moreover, the figures given in proofs as an aid to understanding roughly represent the unraveling $T$ of $G$ and plays in the game $\mathcal{T}$.
We also need to study the tree $T$ limited to a certain depth $d \in \mathbb{N}$: we denote by $\mathrm{Trunc}_d(T)$ the truncated tree of $T$ of depth $d$ and by $\mathrm{Trunc}_d(\mathcal{T})$ the finite game played on $\mathrm{Trunc}_d(T)$. More precisely, the set of vertices of $\mathrm{Trunc}_d(T)$ is the set of histories $h \in H$ of length $\le d$; the edges of $\mathrm{Trunc}_d(T)$ are defined in the same way as for $T$, except that for the histories $h$ of length $d$, there exists no edge $(h, hv)$. A play $\rho$ in $\mathrm{Trunc}_d(\mathcal{T})$ corresponds to a history of $(\mathcal{G}, v_0)$ of length equal to $d$. The notions of cost and strategy are defined exactly as in the game $\mathcal{T}$, but limited to the depth $d$. For instance, a player pays an infinite cost for a play $\rho$ (of length $d$) if his goal set is not visited by $\rho$.
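The truncated unraveling can be built explicitly by enumerating histories level by level. The following Python sketch is only an illustration: histories are encoded as tuples of vertices, the successor map and the two-vertex graph are hypothetical, and the number of histories may grow exponentially with the depth.

```python
def truncated_unraveling(succ, v0, depth):
    """Vertices and edges of the unraveling from v0, truncated at `depth`.

    Vertices are histories (tuples of graph vertices) of length <= depth,
    where the length of a history is its number of edges.
    """
    nodes = [(v0,)]
    tree_edges = []
    level = [(v0,)]
    for _ in range(depth):
        next_level = []
        for h in level:
            for v in succ[h[-1]]:
                hv = h + (v,)
                tree_edges.append((h, hv))  # edge (h, hv) of the tree
                next_level.append(hv)
        nodes.extend(next_level)
        level = next_level
    return nodes, tree_edges


# Hypothetical graph: A has a self-loop and an edge to B, B only a self-loop.
succ = {"A": ["A", "B"], "B": ["B"]}
nodes, tree_edges = truncated_unraveling(succ, "A", 2)
print(nodes)
```

Histories of length exactly `depth` have no outgoing edge in the result, which matches the definition of the truncated tree.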
1.4. Kuhn's Theorem. This section is devoted to the classical Kuhn's theorem [Kuh53]. It claims the existence of a subgame perfect equilibrium (resp. subgame perfect secure equilibrium) in multiplayer games played on finite trees.
A preference relation is a total, reflexive and transitive binary relation.
Theorem 1.12 (Kuhn's theorem). Let $\Gamma$ be a finite tree and $\mathcal{G}$ a game played on $\Gamma$. For each player $i \in \Pi$, let $\preceq_i$ be a preference relation on cost profiles. Then there exists a strategy profile $(\sigma_i)_{i\in\Pi}$ such that for every history $hv$ of $\mathcal{G}$, every player $j \in \Pi$, and every strategy $\sigma'_j$ of player $j$ in $\mathcal{G}$, we have
\[ \mathrm{Cost}(\rho') \preceq_j \mathrm{Cost}(\rho) \]
where $\rho = h\langle (\sigma_i|_h)_{i\in\Pi} \rangle_v$ and $\rho' = h\langle \sigma'_j|_h, \sigma_{-j}|_h \rangle_v$.
One can easily be convinced that the binary relation on cost profiles used to define the notion of Nash equilibrium (see Definition 1.2) is total, reflexive and transitive. We thus have the following corollary.
Corollary 1.13. Let $(\mathcal{G}, v_0)$ be a game and $T$ be the unraveling of $G$ from $v_0$. Let $\mathrm{Trunc}_d(\mathcal{T})$ be the game played on the truncated tree of $T$ of depth $d \in \mathbb{N}$. Then there exists a subgame perfect equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$.
Let $\preceq_j$ be the relation defined by $x \preceq_j y$ iff $x \prec_j y$ or $x = y$, where $\prec_j$ is the relation used in Definition 1.3. We notice that in the two-player case, this relation is total, reflexive and transitive. However, when there are more than two players, $\preceq_j$ is no longer total. Nevertheless, it is proved in [LR09] that Kuhn's theorem remains true when $\preceq_j$ is only transitive. So, the next corollary holds.
Corollary 1.14. Let $(\mathcal{G}, v_0)$ be a game and $T$ be the unraveling of $G$ from $v_0$. Let $\mathrm{Trunc}_d(\mathcal{T})$ be the game played on the truncated tree of $T$ of depth $d \in \mathbb{N}$. Then there exists a subgame perfect secure equilibrium in $\mathrm{Trunc}_d(\mathcal{T})$.
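Kuhn's theorem is classically proved by backward induction, and this can be sketched in Python for the truncated game of Corollary 1.13. The code below only handles the preference "minimize one's own cost" (the Nash case), not the relation used for secure equilibria. The function name, the encoding of histories as tuples, the tie-breaking rule and the example graph (with an intermediate vertex X) are assumptions of this sketch.

```python
import math


def backward_induction(succ, owner, goals, v0, depth):
    """Subgame perfect equilibrium of the game truncated at `depth`.

    Returns (strategy, profile): strategy maps each history of length
    < depth to the chosen next vertex, profile is the resulting cost profile.
    """
    strategy = {}

    def solve(h, costs):
        # Record the first visit of each goal set along the history h.
        costs = {i: (len(h) - 1 if costs[i] == math.inf and h[-1] in goals[i]
                     else costs[i]) for i in goals}
        if len(h) - 1 == depth:
            return costs  # leaf of the truncated tree
        mover = owner[h[-1]]
        best_vertex, best = None, None
        for v in succ[h[-1]]:
            outcome = solve(h + (v,), costs)
            # The mover keeps the first successor minimizing his own cost.
            if best is None or outcome[mover] < best[mover]:
                best_vertex, best = v, outcome
        strategy[h] = best_vertex
        return best

    profile = solve((v0,), {i: math.inf for i in goals})
    return strategy, profile


# Hypothetical two-player graph: B reaches D through the extra vertex X.
succ = {"A": ["B", "C"], "B": ["X"], "X": ["D"], "C": ["E", "F"],
        "D": ["D"], "E": ["E"], "F": ["F"]}
owner = {"A": 1, "B": 2, "X": 2, "C": 2, "D": 1, "E": 1, "F": 1}
goals = {1: {"D", "F"}, 2: {"F"}}
strategy, profile = backward_induction(succ, owner, goals, "A", 3)
print(strategy[("A",)], strategy[("A", "C")], profile)  # C F {1: 2, 2: 2}
```

Each history is solved once its continuations are solved, so the chosen move is a best response in every subgame of the finite tree; the running time is exponential in the depth.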

Existence of a Subgame Perfect Equilibrium
In this section, we positively solve Problem 1 for subgame perfect equilibria.
Theorem 2.1.In every multiplayer quantitative reachability game, there exists a subgame perfect equilibrium.
The proof uses techniques completely different from the ones given in [BBDP10,BBDP11] for the existence of Nash equilibria, and secure equilibria in two-player games.
Let (G, v_0) be a game and T be the infinite game played on the unraveling T of G from v_0. Kuhn's theorem (and in particular Corollary 1.13) guarantees the existence of a subgame perfect equilibrium in each finite game Trunc_n(T), for every depth n ∈ N. Given a sequence of such equilibria, the key point is to derive the existence of a subgame perfect equilibrium in the infinite game T. This is possible by the following lemma.
Lemma 2.2. Let (σ^n)_{n∈N} be a sequence of strategy profiles such that for every n ∈ N, σ^n is a strategy profile in the truncated game Trunc_n(T). Then there exists a strategy profile σ⋆ in the game T with the property:

∀d ∈ N, ∃n ≥ d, σ⋆ and σ^n coincide on histories of length up to d.   (2.1)

Proof. This result is a direct consequence of the compactness of the set of infinite trees with bounded outdegree [Kec95]. An alternative proof is as follows. We give a tree structure, denoted by Γ, to the set of all strategy profiles in the games Trunc_n(T), n ∈ N: the nodes of Γ are the strategy profiles, and we draw an edge from a strategy profile σ in Trunc_n(T) to a strategy profile σ′ in Trunc_{n+1}(T) if and only if σ is the restriction of σ′ to histories of length less than n. It means that the nodes at depth d correspond to strategy profiles of Trunc_d(T). We then consider the tree Γ′ derived from Γ where we only keep the nodes σ^n, n ∈ N, and their ancestors. Since Γ′ is infinite and has finite outdegree, it has an infinite path by König's lemma. This path goes through infinitely many nodes that are ancestors of nodes in the set {σ^n, n ∈ N}. Therefore there exists a strategy profile σ⋆ in the infinite game T (given by the previous infinite path in Γ′) with property (2.1).
Proof of Theorem 2.1.Let G = (Π, V, (V i ) i∈Π , E, (Goal i ) i∈Π ) be a multiplayer quantitative reachability game, v 0 be an initial vertex, and T be the game played on the unraveling of G from v 0 .For all n ∈ N, we consider the finite game Trunc n (T ) and get a subgame perfect equilibrium σ n = (σ n i ) i∈Π in this game by Corollary 1.13.According to Lemma 2.2, there exists a strategy profile σ ⋆ in the game T with property (2.1).
It remains to show that σ⋆ is a subgame perfect equilibrium in T, and thus in (G, v_0). Let hv be a history of the game (with v ∈ V). We have to prove that σ⋆|_h is a Nash equilibrium in the subgame (T|_h, v). As a contradiction, suppose that there exists a profitable deviation σ′_j for some player j ∈ Π w.r.t. σ⋆|_h in (T|_h, v). Let ρ be the play of T with prefix h that follows σ⋆ after h, and let ρ′ be the play with prefix h in which player j follows σ′_j after h while the other players follow σ⋆. Then Cost_j(ρ′) < Cost_j(ρ), that is, ρ′ visits Goal_j for the first time at a certain depth d, such that |h| < d < +∞, and ρ visits Goal_j at a depth strictly greater than d, possibly never (see Figure 6).

According to property (2.1), there exists n ≥ d such that σ⋆ coincides with σ^n on histories of length up to d. Let π and π′ be the plays of Trunc_n(T) defined like ρ and ρ′, with σ^n in place of σ⋆. It follows that Cost_j(π′) = Cost_j(ρ′) < Cost_j(π) (see Figure 6), as π′ and ρ′ coincide up to depth d, and so do π and ρ. And so, σ′_j is a profitable deviation for player j w.r.t. σ^n|_h in (Trunc_n(T)|_h, v), which leads to a contradiction with the fact that σ^n is a subgame perfect equilibrium in Trunc_n(T) by hypothesis.
As an extension, we consider multiplayer quantitative reachability games with tuples of costs on edges (as in [BBDP11]). In these games, we assume that edges are labelled with tuples of strictly positive costs (one cost for each player). Here we do not only count the number of edges to reach the goal of a player, but we sum up his costs along the path until his goal is reached. His aim is still to minimize his global cost for a play. Let us give the formal definition.

Definition 2.3. A multiplayer quantitative reachability game with tuples of costs on edges is a tuple G = (Π, V, (V_i)_{i∈Π}, E, (Cost_i)_{i∈Π}, (Goal_i)_{i∈Π}) where
• Π is a finite set of players,
• (V, (V_i)_{i∈Π}, E) is a finite directed graph where V is the set of vertices, (V_i)_{i∈Π} is a partition of V into the state sets of each player, and E ⊆ V × V is the set of edges, such that for all v ∈ V, there exists v′ ∈ V with (v, v′) ∈ E,
• Cost_i, which maps each edge to a strictly positive cost, is the cost function of player i defined on the edges of the graph,
• Goal_i ⊆ V is the non-empty goal set of player i.
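In this context, we adapt the definition of Cost_i(ρ), the cost of player i for a play ρ = ρ_0 ρ_1 …:

$$\mathrm{Cost}_i(\rho) = \begin{cases} \sum_{k=1}^{l} \mathrm{Cost}_i((\rho_{k-1}, \rho_k)) & \text{if } l \text{ is the least index such that } \rho_l \in \mathrm{Goal}_i, \\ +\infty & \text{otherwise.} \end{cases} \tag{2.2}$$

In this framework, we also prove the existence of a subgame perfect equilibrium. The proof is similar to the one of Theorem 2.1; the only difference lies in the choice of the considered depth d.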
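The adapted cost can be computed on any finite prefix of a play by summing the edge costs of the player until his goal set is first visited. The following is a minimal sketch, where `edge_cost` stands for the cost function of one fixed player; the names and the sample values are hypothetical.

```python
import math

def cost_with_edge_costs(play, goal, edge_cost):
    """Sum of the player's edge costs up to the first visit of `goal`, or +infinity."""
    if play[0] in goal:
        return 0  # empty sum: the goal set is visited immediately
    total = 0
    for u, w in zip(play, play[1:]):
        total += edge_cost[(u, w)]
        if w in goal:
            return total
    return math.inf

# Hypothetical costs of one player on two edges.
edge_cost = {("a", "b"): 2, ("b", "c"): 5}
reached = cost_with_edge_costs(["a", "b", "c"], {"c"}, edge_cost)
missed = cost_with_edge_costs(["a", "b", "c"], {"z"}, edge_cost)
```

When every edge has cost 1 for every player, this coincides with counting the number of edges, which is the setting of Theorem 2.1.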
Theorem 2.4.In every multiplayer quantitative reachability game with tuples of costs on edges, there exists a subgame perfect equilibrium.
Let us introduce some notations that will be useful for the proof of this theorem. We define c_min := min_{i∈Π} min_{e∈E} Cost_i(e), c_max := max_{i∈Π} max_{e∈E} Cost_i(e) and K := ⌈c_max / c_min⌉.

Proof of Theorem 2.4. Let G be a multiplayer quantitative reachability game with tuples of costs on edges, v_0 be an initial vertex, and T be the game played on the unraveling of G from v_0. For all n ∈ N, we consider the finite game Trunc_n(T) and get a subgame perfect equilibrium σ^n = (σ^n_i)_{i∈Π} in this game by Corollary 1.13. According to Lemma 2.2, there exists a strategy profile σ⋆ in the game T with property (2.1).
We then show that σ⋆ is a subgame perfect equilibrium in T, and thus in (G, v_0). Let hv be a history of the game (v ∈ V). We have to prove that σ⋆|_h is a Nash equilibrium in the subgame (T|_h, v). As a contradiction, suppose that there exists a profitable deviation σ′_j for some player j ∈ Π w.r.t. σ⋆|_h in (T|_h, v). Let ρ (resp. ρ′) denote the resulting play when player j follows σ⋆_j (resp. σ′_j) after h.
Thus ρ ′ visits Goal j for the first time at a certain depth d ′ , such that |h| < d ′ < +∞.
We define some depth d depending on the fact that ρ visits Goal j or not.
According to property (2.1), there exists n ≥ d such that σ⋆ coincides with σ^n on histories of length up to d.
If ρ visits Goal j , then it holds that Cost j (π) = Cost j (ρ) by definition of d, and so Cost j (π) > Cost j (π ′ ).If ρ does not visit Goal j , then the following inequalities hold: The first inequality comes from the fact that π ′ visits Goal j at depth d ′ , the second one from the definition of d, and the last one from the fact that if π visits Goal j , it must happen after depth d (as ρ does not visit Goal j ).
In both cases Cost j (π) > Cost j (π ′ ), and we conclude that σ ′ j is a profitable deviation for player j w.r.t.σ n | h in (Trunc n (T )| h , v), which leads to a contradiction with the fact that σ n is a subgame perfect equilibrium in Trunc n (T ) by hypothesis.
Remark 2.5.We can transform the cost functions (Cost i ) i∈Π ((1.1) or (2.2)) of our games in the following way: for any player i and any play ρ, These new cost functions (Cost ′ i ) i∈Π are bounded and continuous (in the product topology on V ω ).Moreover, a subgame perfect equilibrium in a game with the cost functions (Cost i ) i∈Π is a subgame perfect equilibrium in this game with the new cost functions (Cost ′ i ) i∈Π , and conversely.Then, Theorems 2.1 and 2.4 are consequences of [Har85,FL83].
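To make the remark concrete, here is one transformation with the stated properties. It is an illustrative assumption, not necessarily the formula intended in the remark: it maps a finite cost c to 1 − 1/(c + 1) and an infinite cost to 1. Being strictly increasing, it preserves every comparison between costs, and its values lie in [0, 1]. The function name is hypothetical.

```python
import math

def bounded_cost(c):
    """Strictly increasing map from [0, +infinity] into [0, 1] (illustrative choice)."""
    return 1.0 if math.isinf(c) else 1.0 - 1.0 / (c + 1.0)

costs = [0, 1, 2, 10, math.inf]
images = [bounded_cost(c) for c in costs]
```

Because the order between costs is unchanged, a deviation is profitable for the original costs exactly when it is profitable for the transformed ones.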

Existence of a Subgame Perfect Secure Equilibrium
Regarding subgame perfect secure equilibria, we positively solve Problem 1 but only in the case of two-player games.
Theorem 3.1.In every two-player quantitative reachability game, there exists a subgame perfect secure equilibrium.
The main ideas of the proof are similar to the ones for Theorem 2.1.
Proof of Theorem 3.1. Let G be a two-player quantitative reachability game, v_0 be an initial vertex, and T be the game played on the unraveling of G from v_0. For every n ∈ N, we consider the finite game Trunc_n(T) and get a subgame perfect secure equilibrium σ^n = (σ^n_1, σ^n_2) in this game by Corollary 1.14. According to Lemma 2.2, there exists a strategy profile σ⋆ in the game T such that σ⋆ has property (2.1).
We show that σ⋆ = (σ⋆_1, σ⋆_2) is a subgame perfect secure equilibrium in T. Let hv be a history of the game (v ∈ V). We have to prove that σ⋆|_h is a secure equilibrium in the subgame (T|_h, v). As a contradiction, suppose that there exists a ≺_j-profitable deviation σ′_j for some player j ∈ {1, 2} w.r.t. σ⋆|_h in (T|_h, v). Let us assume w.l.o.g. that j = 1. As σ⋆|_h is a Nash equilibrium in (T|_h, v) (see the proof of Theorem 2.1), player 1 cannot strictly decrease his cost, and so we know that Cost_1(ρ) = Cost_1(ρ′) and Cost_2(ρ) < Cost_2(ρ′), where ρ is the play with prefix h that follows σ⋆ after h, and ρ′ is the play with prefix h in which player 1 follows σ′_1 and player 2 follows σ⋆_2 after h. Thus it implies that Cost_2(ρ) is finite. Let d be the maximum between Cost_1(ρ) and Cost_2(ρ) if Cost_1(ρ) is finite, or Cost_2(ρ) otherwise. Remark that d > |h|. According to property (2.1), there exists n ≥ d such that the strategy profiles σ⋆ and σ^n coincide on histories of length up to d.
Let us show that σ′_1 would then be a ≺_1-profitable deviation for player 1 w.r.t. σ^n|_h in (Trunc_n(T)|_h, v).
In this aim we first prove that are finite plays in Trunc n (T ) (see Fig. 7).By definition of d and according to property (2.1), we have that Cost . Otherwise, we have that Cost 2 (π ′ ) > d as ρ ′ and π ′ coincide until depth d (by property (2.1)), and then Figure 7: The game T with its subgame (T | h , v).
We now consider Cost 1 (π) and Cost 1 (π ′ ).Let us study the next two cases.
• If Cost 1 (ρ) < +∞, then we have that (3.4) As a contradiction suppose that Cost 1 (π) < +∞.Consider vertex ρ d , the first vertex of ρ that belongs to Goal 2 (we recall that Cost 2 (ρ) = d).Suppose that player 1 has a winning strategy to reach his goal from vertex ρ d in the zero-sum reachability game G 1 = (G 1 , Goal 1 , V ) (as defined in Section 1.2).Then this contradicts the fact that σ ⋆ is a subgame perfect equilibrium in T (see the proof of Theorem 2.1).Therefore, by determinacy of G 1 (Corollary 1.11), player 2 has a winning strategy from vertex ρ d to prevent player 1 from reaching Goal 1 .But in this case, this strategy is a ≺ 2 -profitable deviation w.r.t.σ n | h in (Trunc n (T )| h , v), because player 2 can keep his cost while strictly increasing player 1's cost.This is impossible as σ n is a subgame perfect secure equilibrium in Trunc n (T ).Thus, we must have that Cost 1 (π) = +∞.In all possible situations, we proved that σ ′ 1 is a ≺ 1 -profitable deviation for player 1 w.r.t.
So we get a contradiction with the fact that σ n is a subgame perfect secure equilibrium in Trunc n (T ) by hypothesis.
Unfortunately the proof does not seem to extend to the multiplayer case. Indeed we face the same kind of problems as those encountered in [BBDP10,BBDP11], where the existence of secure equilibria is proved for two-player games and left open for multiplayer games.

Decidability of the Existence of a Secure Equilibrium
In this section, we study Problems 1 and 2 in the context of secure equilibria. Both problems have been positively solved in [BBDP10] for two-player games only. To the best of our knowledge, the existence of secure equilibria in the multiplayer framework is still an open problem. We here provide an algorithm that decides the existence of a secure equilibrium. We also show that if there exists a secure equilibrium, then there exists one that is finite-memory.
Theorem 4.1. In every multiplayer quantitative reachability game, one can decide in ExpSpace whether there exists a secure equilibrium.
Theorem 4.2.If there exists a secure equilibrium in a multiplayer quantitative reachability game, then there exists one that is finite-memory.
The proof of Theorem 4.1 is inspired by ideas developed in [BBDP10,BBDP11]. The key point is to show that the existence of a secure equilibrium in a game (G, v_0) is equivalent to the existence of a secure equilibrium (with two additional properties) in the finite game Trunc_d(T) for a well-chosen depth d. The existence of the latter equilibrium is decidable. Notice that by Corollary 1.14 a secure equilibrium always exists in Trunc_d(T); however we do not know if a secure equilibrium with the two required additional properties always exists in Trunc_d(T).
Let us formally introduce these two properties. The first one requires that the secure equilibrium is goal-optimized, meaning that all the goal sets visited along its outcome are visited for the first time before a certain given depth. For any game G played on a graph with |V| vertices by |Π| players, we fix a constant d_goal(G), which depends on |V| and |Π|.

Definition 4.3. Given a game (G, v_0) and a strategy profile (σ_i)_{i∈Π} in G, with outcome ρ, we say that (σ_i)_{i∈Π} is goal-optimized if and only if for all i ∈ Π such that Cost_i(ρ) < +∞, we have that Cost_i(ρ) < d_goal(G).
The second property asks for a secure equilibrium that is deviation-optimized, meaning that whenever a player deviates from its outcome, he realizes within a certain given number of steps that his deviation is not profitable for him.

Definition 4.4. Given a game (G, v_0) and a secure equilibrium (σ_i)_{i∈Π} in G, with outcome ρ, we say that (σ_i)_{i∈Π} is deviation-optimized if and only if for every player j ∈ Π and every strategy σ′_j of player j, we have that Cost(ρ_{<d_dev}) ⊀_j Cost(ρ′_{<d_dev}), where ρ′ is the outcome obtained from v_0 when player j plays σ′_j against σ_{−j}, and ρ_{<d_dev} (resp. ρ′_{<d_dev}) is the play ρ (resp. ρ′) truncated at depth d_dev.

Let us remark that in Theorem 4.2, the finite-memory secure equilibrium is created from the one given by hypothesis and the construction is made in such a way that the set of players whose goal set is visited along the outcome is the same for both equilibria.
The proof of Proposition 4.5 is long and technical. The next two sections are devoted to the two parts of this proposition.

4.1. Part (i) of Proposition 4.5. This section is devoted to the proof of Proposition 4.5, Part (i). We begin with a useful characterisation of a deviation-optimized secure equilibrium.
Lemma 4.6. With the previous notations of Definition 4.4, a secure equilibrium (σ_i)_{i∈Π} is deviation-optimized if and only if for every player j ∈ Π and every strategy σ′_j of player j, with ρ′ the outcome of (σ′_j, σ_{−j}) from v_0, the following holds: if
(i) Cost_j(ρ) = Cost_j(ρ′),
(ii) for all i ∈ Π, Cost_i(ρ) < +∞ implies Cost_i(ρ) ≤ Cost_i(ρ′),
(iii) there exists i ∈ Π such that Cost_i(ρ) < Cost_i(ρ′),
then there exists l ∈ Π such that Cost_l(ρ) = +∞ and Cost_l(ρ′) < d_dev.

Proof. Let us first assume that (σ_i)_{i∈Π} is a deviation-optimized secure equilibrium whose outcome is denoted by ρ. Given any player j ∈ Π, let σ′_j be a strategy fulfilling the hypotheses of the lemma and ρ′ the outcome of (σ′_j, σ_{−j}) from v_0. Let us denote respectively by (x_i)_{i∈Π} and (y_i)_{i∈Π} the cost profiles of the histories ρ_{<d_dev} and ρ′_{<d_dev}. Notice that by definition of d_dev, Cost_i(ρ) = x_i for all i. For ρ′, we have Cost_i(ρ′) = y_i provided Cost_i(ρ′) < d_dev. Otherwise, it may happen that y_i = +∞ and Cost_i(ρ′) < +∞. So, it holds that Cost_i(ρ′) ≤ y_i for all i. These observations will often be used in the sequel of the proof.
Since (σ_i)_{i∈Π} is deviation-optimized, we have Cost(ρ_{<d_dev}) ⊀_j Cost(ρ′_{<d_dev}), meaning that:

$$(x_j \le y_j) \;\wedge\; \big( (x_j < y_j) \;\vee\; (\exists i \in \Pi,\ x_i > y_i) \;\vee\; (\forall i \in \Pi,\ x_i \ge y_i) \big) \tag{4.1}$$

By hypothesis (i), x_j = y_j. By hypothesis (iii), we cannot have ∀i ∈ Π, x_i ≥ y_i. Therefore to satisfy (4.1), there must exist a player i such that x_i > y_i. If Cost_i(ρ) < +∞, then Cost_i(ρ) = x_i > y_i ≥ Cost_i(ρ′), in contradiction with hypothesis (ii). Therefore Cost_i(ρ) = +∞. From x_i > y_i, it follows that Cost_i(ρ′) < d_dev, which concludes the first implication of the proof.
For the converse, let us now assume that (σ_i)_{i∈Π} is a secure equilibrium that fulfills the property stated in Lemma 4.6. We will prove that it is deviation-optimized, that is, for any player j ∈ Π, and any deviation σ′_j of player j, we have that Cost(ρ_{<d_dev}) ⊀_j Cost(ρ′_{<d_dev}), where ρ is the outcome of (σ_i)_{i∈Π} from v_0 and ρ′ the outcome of (σ′_j, σ_{−j}) from v_0. By denoting respectively by (x_i)_{i∈Π} and (y_i)_{i∈Π} the cost profiles of ρ_{<d_dev} and ρ′_{<d_dev}, it is equivalent to prove (4.1). Since (σ_i)_{i∈Π} is a secure equilibrium, we know that σ′_j is not a ≺_j-profitable deviation. In particular, player j cannot strictly decrease his cost along ρ′, and thus x_j ≤ y_j. It remains to prove that the second conjunct of (4.1) is true. For this, we first show that as soon as one of the hypotheses among (i), (ii) or (iii) is not fulfilled, this conjunct is satisfied.
• If Cost_j(ρ) < Cost_j(ρ′), by choice of d_dev, we also have that x_j < y_j. Moreover, the case Cost_j(ρ) > Cost_j(ρ′) is not possible as (σ_i)_{i∈Π} is a secure equilibrium.
• If there exists i ∈ Π such that Cost_i(ρ) < +∞ and Cost_i(ρ) > Cost_i(ρ′), then x_i > y_i.
• If for all i ∈ Π, Cost_i(ρ) ≥ Cost_i(ρ′), we also have that x_i ≥ y_i, for all i.

Thus the remaining deviations to consider fulfill hypotheses (i), (ii) and (iii). In this case, there exists l ∈ Π such that Cost_l(ρ) = +∞ and Cost_l(ρ′) < d_dev. In particular we have that x_l > y_l, and the second conjunct of (4.1) is true.
The ideas of the proof for Part (i) of Proposition 4.5 are as follows. Suppose that there exists a goal-optimized and deviation-optimized secure equilibrium (σ_i)_{i∈Π} in Trunc_d(T), for d = d_goal(G) + 3·|V|. To get from (σ_i)_{i∈Π} a finite-memory secure equilibrium in (G, v_0), we use a construction similar to that of [BBDP11, Proposition 25], where it is shown, in the context of two-player games, how to extend a secure equilibrium in a finite truncation of (G, v_0) to a secure equilibrium in (G, v_0). The rough idea is as follows. Due to the hypotheses, the outcome π of (σ_i)_{i∈Π} has a prefix αβ such that all goal sets visited by π are already visited by α, and such that β is a cycle. The required secure equilibrium is specified such that its outcome is equal to αβ^ω and any deviating player is punished by the coalition of the other players in a way that this deviation is not profitable for him. This secure equilibrium can be constructed in a way to be finite-memory.
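The decomposition of the outcome into a prefix α followed by a cycle β can be found by scanning the play for the first repeated vertex after a given depth. The sketch below is only an illustration of this pigeonhole step: the parameter `start` plays the role of the depth after which no new goal set is visited, α is represented as a list ending at the repeated vertex, and β lists the vertices that follow α up to and including the return to that vertex. The function names are hypothetical.

```python
def lasso(play, start):
    """Split a finite play as (alpha, beta) with at least `start` edges in alpha
    and beta a cycle on the last vertex of alpha; return None if no vertex repeats."""
    seen = {}
    for k in range(start, len(play)):
        v = play[k]
        if v in seen:
            i = seen[v]
            return play[:i + 1], play[i + 1:k + 1]
        seen[v] = k
    return None

def unroll(alpha, beta, length):
    """First `length` vertices of the infinite play alpha beta^omega."""
    out = list(alpha)
    while len(out) < length:
        out.extend(beta)
    return out[:length]

alpha, beta = lasso(["a", "b", "c", "b", "c", "b"], 1)
prefix = unroll(alpha, beta, 6)
```

A repetition is guaranteed as soon as the scanned part of the play contains more than |V| vertices, which is why the depth of the truncation is taken larger than the goal bound by a multiple of |V|.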

Proof of Proposition 4.5, Part (i).
Let us set Π = {1, …, n}. Let (τ_i)_{i∈Π} be a goal-optimized and deviation-optimized secure equilibrium in the game Trunc_d(T) and π its outcome. By the choice of d, we can write π = αβγ, where |α| ≥ d_goal(G) and β is a cycle. We have that Visit(α) = Visit(αβγ) (no new goal set is visited after α) because |α| ≥ d_goal(G) and (τ_i)_{i∈Π} is goal-optimized. This enables us to use [BBDP11, Lemma 15] as follows. Let j ∈ Π be such that α does not visit Goal_j, and suppose that player j deviates from the history α. This lemma states that for all histories hv consistent with τ_{−j} and such that |hv| ≤ |αβ|, the coalition formed by all the players i ∈ Π \ {j} can play to prevent player j from reaching his goal set Goal_j from vertex v. It means that this coalition has a memoryless winning strategy ν^v_{−j} from vertex v in the zero-sum reachability game G_j = (G_j, Goal_j, V) (see Corollary 1.11). For each player i ≠ j, let ν^v_{i,j} be the memoryless strategy of player i in G induced by ν^v_{−j}.

We define a finite-memory secure equilibrium in the game T using the same idea as in the proof of [BBDP11, Proposition 25]. The idea is to specify the required secure equilibrium as follows: each player i plays according to αβ^ω (which is the outcome of this equilibrium) and punishes player j ≠ i if he deviates from αβ^ω, by playing according to τ_i until depth |α|, and after that, by playing arbitrarily if α visits Goal_j, and according to ν^v_{i,j} otherwise (where v is the vertex visited at depth |α| when deviating).
Formally we first need to specify a punishment function P .For the initial vertex v 0 , we define P (v 0 ) = ⊥ and for all histories hv ∈ H such that h ∈ H i , we let: Then the definition of the secure equilibrium (σ i ) i∈Π in T is as follows.For all i ∈ Π and h ∈ H i , where arbitrary means that the next vertex is chosen arbitrarily (in a memoryless way).Clearly the outcome of (σ i ) i∈Π is the play αβ ω .
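The punishment function can be read as bookkeeping of who left the outcome first. The sketch below is written under an assumption: P stays equal to ⊥ as long as the history is a prefix of αβ^ω, and otherwise records the player who controls the vertex from which the first deviating move was taken. The representation of ⊥ by `None`, the helper names and the sample arena are hypothetical.

```python
BOTTOM = None  # plays the role of the symbol ⊥

def outcome_prefix_checker(alpha, beta):
    """Build a test telling whether a history is a prefix of alpha beta^omega."""
    def on_outcome(prefix):
        for k, v in enumerate(prefix):
            if k < len(alpha):
                expected = alpha[k]
            else:
                expected = beta[(k - len(alpha)) % len(beta)]
            if v != expected:
                return False
        return True
    return on_outcome

def punish(history, owner, on_outcome):
    """Return BOTTOM while `history` follows the outcome, else the first deviating player."""
    for k in range(1, len(history)):
        if not on_outcome(history[:k + 1]):
            # The move from history[k - 1] to history[k] left the outcome.
            return owner[history[k - 1]]
    return BOTTOM

on_outcome = outcome_prefix_checker(["a", "b"], ["c", "b"])
owner = {"a": 1, "b": 2, "c": 1}
```

Once a deviator is recorded, later moves do not change the value, so each player only needs to remember one player name in addition to his position on the outcome.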
Let us show that (σ i ) i∈Π is a secure equilibrium in the game T .Assume by contradiction that there exists a ≺ j -profitable deviation σ ′ j for player j w.r.t.(σ i ) i∈Π in T .Let τ ′ j be the strategy σ ′ j restricted to Trunc d (T ).We are going to show that τ ′ j is a ≺ j -profitable deviation for player j w.r.t.(τ i ) i∈Π in Trunc d (T ), which is impossible by hypothesis.Here are some useful notations: Notice that the play π ′ coincide with the play ρ ′ at least until depth |α| (by definition of τ ′ j and σ −j ); they can differ afterwards.Clearly π and ρ coincide at least until depth |αβ|.
(1) y ′ j < y j < +∞.As ρ = αβ ω and Visit(α) = Visit(αβγ), it means that α visits Goal j , and then y j = x j .Since y ′ j < |α|, we also have x ′ j = y ′ j (as π ′ and ρ ′ coincide until depth |α|).Therefore x ′ j < x j , and (x 1 , . . ., , it follows that x j = y j = +∞.Thus x ′ j < x j , and so (x 1 , . . ., x n ) ≺ j (x ′ 1 , . . ., x ′ n ).We show that the case y ′ j > |α| is impossible.By definition of σ −j , the play ρ ′ is consistent with τ −j until depth |α|, and then with ν v −j from ρ ′ |α| (as y j = +∞).The play ρ ′ cannot visit Goal j after a depth > |α| by definition of ν v −j . (3) We show that the case x ′ j = x j is impossible.We can show that for all i ∈ Π such that x i < +∞, we have x i ≤ x ′ i , and that there exists i ∈ Π such that x i < x ′ i .Since (τ i ) i∈Π is deviation-optimized, Lemma 4.6 implies that there exists some l ∈ Π such that x l = +∞, and This is in contradiction with (τ i ) i∈Π being a secure equilibrium in Trunc d (T ), and therefore, (σ i ) i∈Π is a secure equilibrium in T , thus in (G, v 0 ).
It remains to show that (σ_i)_{i∈Π} is a finite-memory strategy profile. This proof is very similar to the proof of [BBDP11, Proposition 25] and thus is not given in detail. Roughly speaking, a finite amount of memory is enough to produce the outcome αβ^ω; outside of this outcome it is enough to remember how (σ_i)_{i∈Π} is defined for histories up to length |α| (after depth |α|, memoryless strategies are used).
Remark 4.7. This proof in fact shows a slightly stronger result: if there exists a goal-optimized and deviation-optimized secure equilibrium in Trunc_d(T), then there exists a finite-memory secure equilibrium in (G, v_0) with the same cost profile.

Proposition 4.8. If there exists a secure equilibrium in a game (G, v_0), then there exists one in (G, v_0) which is goal-optimized and deviation-optimized.
To get a goal-optimized equilibrium, the idea is to eliminate some unnecessary cycles (see Definition 1.7). Such an idea has already been developed in [BBDP11, Lemma 19] for Nash equilibria. Unfortunately, this lemma cannot be applied to secure equilibria (as shown in Example 4.9). Adapting it to the context of secure equilibria is not trivial, and the underlying constructions are more involved: we need to modify the strategies of the coalition against a deviating player. By using specific punishing strategies for the coalitions, we are then able to get a goal-optimized equilibrium that is also deviation-optimized, due to the particular form of these strategies.
Example 4.9. Consider the three-player game of Fig. 9 initialized at A, where V_1 = {A, C, D}, V_2 = {B} and V_3 = ∅, Goal_1 = Goal_2 = {A} and Goal_3 = {D}. The strategy profile (σ_1, σ_2, σ_3) defined below is a secure equilibrium whose outcome is ABCBD^ω and cost profile (0, 0, 5):

In Example 1.8, we gave two equilibria whose outcome has unnecessary cycles. Here, we also face such a situation, with the cycle BCB. If we modify (σ_1, σ_2, σ_3) in order to remove this cycle, as done in [BBDP11, Lemma 19] for Nash equilibria, the resulting strategy profile is a Nash equilibrium with outcome ABD^ω and cost profile (0, 0, 3); however it is no longer a secure equilibrium. Indeed player 1 has a ≺_1-profitable deviation by taking the edge (A, D) instead of (A, B), which leads to a cost of 4 for player 3 (instead of 3). In the sequel we show how to modify the approach of [BBDP11, Lemma 19] in a way to keep the property of secure equilibrium.

In order to prove Proposition 4.8, we need three lemmas: Lemmas 4.11, 4.12 and 4.13. Given a secure equilibrium, Lemma 4.11 describes some particular memoryless strategies for the coalition when a player deviates. Lemma 4.12 (counterpart of [BBDP11, Lemma 19] for secure equilibria) states that we can remove a cycle from the outcome of a secure equilibrium, but the strategies have to be somewhat modified with these specific coalition strategies. This lemma is used in the proof of Proposition 4.8 to get a goal-optimized secure equilibrium. Lemma 4.13 states that we can also get a deviation-optimized secure equilibrium.

Memoryless coalition strategies.
Given a secure equilibrium in a game (G, v 0 ), we here prove the existence of interesting memoryless strategies for the coalition against a deviating player.
Let us first introduce the definition of a j-promising history for some deviating player j.Intuitively player j deviates from a strategy profile (σ i ) i∈Π and constructs a history h consistent with σ −j .This history h is called j-promising w.r.t.(σ i ) i∈Π if player j does not know yet if this deviation will be ≺ j -profitable for him w.r.t.(σ i ) i∈Π , but he can still hope that it will be, without knowing what he will play after h.Definition 4.10.Let (σ i ) i∈Π be a strategy profile in a game (G, v 0 ), with cost profile (x i ) i∈Π .Let us assume that Π = {1, . . ., n} and where 0 ≤ k < n.Let h be a history of the game such that x k ≤ |h| < x k+1 .
For any player j ∈ Π, we say that h is j-promising w.r.t.(σ i ) i∈Π if h is consistent with σ −j and if • in the case where x k+1 < +∞: we have that Cost j (h) = +∞; • in the case where x k+1 = +∞: In the case where x k+1 < +∞ and j ≤ k, along h, player j has been able to get the same cost as along ρ (Cost j (h) = x j ) and to not decrease the cost of the other players (Cost i (h) ≥ x i ).After h, he hopes to be able to play such that the resulting deviation hρ ′ will satisfy (x i ) i∈Π ≺ j Cost(hρ ′ ).In the case where j > k, player j has not visited his goal set along h, so he does not know yet if his deviation will be ≺ j -profitable for him.However he hopes to visit it early enough after h along hρ ′ , such that Cost j (hρ ′ ) < x j , or to get the same cost while increasing the cost of the other players in a way that (x i ) i∈Π ≺ j Cost(hρ ′ ).
In the case where x_{k+1} = +∞, the history ρ_{≤|h|} has visited all the goal sets Goal_i such that Cost_i(ρ) < +∞. Thus player j could have a ≺_j-profitable deviation hρ′ if he can avoid visiting the goal sets Goal_i, where i ≥ k + 1 (i ≠ j).
Given a j-promising history h of player j, the next lemma describes the existence of interesting memoryless strategies of the coalition Π \ {j} from the last vertex of h.This lemma uses some qualitative two-player zero-sum reachability under safety games G −j = (G −j , R, S) associated with the coalition Π \ {j} (where G −j = (V, V \ V j , V j , E)).In such games, the coalition Π \ {j} aims at reaching R while staying in S, and player j wants to prevent this.Lemma 4.11.Let (σ i ) i∈Π be a secure equilibrium in a game (G, v 0 ), with outcome ρ and cost profile (x i ) i∈Π .Let h be a j-promising history w.r.t.(σ i ) i∈Π for some player j ∈ Π.Let us assume w.l.o.g. that Π = {1, . . ., n}.If where 0 ≤ k ≤ l ≤ n, then the coalition Π \ {j} has a memoryless winning strategy µ v −j from v = Last(h) in the qualitative two-player zero-sum game G −j = (G −j , R, S) where Goal i , and S = V \ Goal j , • if l < j and Cost(ρ ≤|h| ) j Cost(h), then R = V , and S = V \ Goal j .
In this lemma, either all goal sets are visited by ρ and l = n, or l < n and the last visited goal set is Goal_l. Also notice that R ≠ ∅ in all cases. Indeed, k ≠ n as h is j-promising, and then the set R in the case j ≤ k of this lemma is not empty. In the third case, it is not empty either, otherwise we would have k + 1 = l + 1 = n = j, but such a situation is impossible because h is j-promising w.r.t. (σ_i)_{i∈Π} (see the last case of Definition 4.10) and (σ_i)_{i∈Π} is a secure equilibrium.
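The winning region of the coalition in such a reachability under safety game, together with a memoryless winning strategy, can be computed by two fixpoints. The sketch below rests on an assumption about the objective: the coalition wins a play when it visits R at least once and never leaves S, which matches the description of the opponent's objective as staying outside R or reaching V \ S. It also assumes that every vertex has a successor. Function and variable names are hypothetical.

```python
def safe_region(succ, coalition_vertices, S):
    """Greatest set of vertices from which the coalition can stay in S forever."""
    safe = set(S)
    changed = True
    while changed:
        changed = False
        for v in list(safe):
            nxt = [w for w in succ[v] if w in safe]
            # Coalition vertices need one safe successor, opponent vertices need all.
            ok = bool(nxt) if v in coalition_vertices else len(nxt) == len(succ[v])
            if not ok:
                safe.discard(v)
                changed = True
    return safe

def reach_while_safe(vertices, succ, coalition_vertices, R, S):
    """Return (winning region, memoryless coalition strategy on that region)."""
    safe = safe_region(succ, coalition_vertices, S)
    win = {v for v in safe if v in R}
    strategy = {}
    changed = True
    while changed:  # attractor of R computed inside the safe region
        changed = False
        for v in vertices:
            if v in win or v not in safe:
                continue
            inside = [w for w in succ[v] if w in win]
            if v in coalition_vertices:
                if inside:
                    win.add(v)
                    strategy[v] = inside[0]
                    changed = True
            elif len(inside) == len(succ[v]):
                win.add(v)
                changed = True
    # After R has been visited, any move that stays in the safe region is fine.
    for v in safe:
        if v in coalition_vertices and v not in strategy:
            strategy[v] = next(w for w in succ[v] if w in safe)
    return win, strategy

# Hypothetical arena: the opponent owns b, the coalition owns the other vertices.
vertices = ["a", "b", "c", "d", "e"]
succ = {"a": ["b", "e"], "b": ["c", "d"], "c": ["c"], "d": ["d"], "e": ["d"]}
win, strategy = reach_while_safe(vertices, succ, {"a", "c", "d", "e"}, {"d"}, {"a", "b", "d", "e"})
```

In the example, the coalition must avoid b, from which the opponent can leave S, and therefore reaches d through e.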
Proof of Lemma 4.11.By contradiction assume that the coalition Π \ {j} has no winning strategy from v in the game G −j = (G −j , R, S), i.e. no winning strategy from v to reach R while staying in S. By Corollary 1.11, it implies that player j has a memoryless winning strategy µ v j from v to stay outside R or to reach V \ S. Recall that h is consistent with σ −j as it is j-promising w.r.t.(σ i ) i∈Π .Let ρ ′ be the play with prefix h that is consistent with σ −j , and with µ v j from v (see Fig. 10).In the four cases of the lemma, we then prove that (x i ) i∈Π ≺ j (Cost i (ρ ′ )) i∈Π , meaning that player j has a ≺ j -profitable deviation w.r.t.(σ i ) i∈Π , which is impossible.
Figure 10: Play ρ and its deviation ρ ′ with prefix h.
The strategy µ v j enables to avoid all goal sets Goal i where i > k.As h is j-promising, we have that Cost j (h) = x j and ∀i ∈ Π, Cost i (h) ≥ x i .By construction of ρ ′ and as x k ≤ |h| < x k+1 , we have that Then for all i ∈ Π, we have that Cost i (ρ ′ ) ≥ x i .It remains to show that the cost of one player is strictly increased in ρ ′ compared with ρ.In the case where x k+1 < +∞, i.e. k < l, we have in particular that x l < +∞ and Cost l (ρ ′ ) = +∞.And in the case where x k+1 = +∞ (k = l), we have that (x i ) i∈Π ≺ j Cost(h) (by definition of j-promising), i.e. there exists i ∈ Π such that x i < Cost i (h).Either Cost i (h) = Cost i (ρ ′ ) and then As µ v j is memoryless, this strategy enables player j to reach his goal set Goal j from v within |V | steps.Thus, we have that since k < j ≤ l, and so, (x i ) i∈Π ≺ j (Cost i (ρ ′ )) i∈Π .
The strategy µ v j enables to avoid all goal sets Goal i where i > k and i = j, or to visit the goal set Goal j .On one hand, if ρ ′ visits Goal j , then Cost j (ρ ′ ) < +∞ = x j as j > l, and so, (x i ) i∈Π ≺ j (Cost i (ρ ′ )) i∈Π .On the other hand, if ρ ′ does not visit Goal j , then ρ ′ does not visit either any Goal i with i > k.Since Cost(ρ ≤|h| ) j Cost(h), the situation is quite similar to the first case, and we can deduce that Thus, for all i ∈ Π, we have that Cost i (ρ ′ ) ≥ x i .Moreover, exactly like in the case j ≤ k, we can show that there exists i ∈ Π such that x i < Cost i (ρ ′ ).Then it implies that (x i ) i∈Π ≺ j (Cost i (ρ ′ )) i∈Π .
Like in the second case, the strategy μ^v_j enables player j to reach his goal set Goal_j from v. Then we have that Cost_j(ρ′) < +∞ = x_j and so, (x_i)_{i∈Π} ≺_j (Cost_i(ρ′))_{i∈Π}.

Removing a cycle. The next lemma states that it is possible to modify the strategy profile of a secure equilibrium in a way to eliminate an unnecessary cycle in its outcome. In the notations of this lemma, notice that β is the eliminated cycle (condition Last(α) = Last(αβ)); notice also that a new goal set is visited after αβγ (condition Visit(ρ) ≠ Visit(α)). The elimination of the cycle is possible by modifying the strategies of the coalitions into strategies as described in Lemma 4.11.
Proof.Let (x i ) i∈Π be the cost profile of ρ.Let us assume w.l.o.g. that Π = {1, . . ., n} and Let us define the required strategy profile (τ i ) i∈Π with the aim to get the outcome αγ ρ by eliminating β in ρ.For all i ∈ Π and all histories h ∈ H i , we set In this definition, arbitrary means that the next vertex is chosen arbitrarily, and the punishment function P is defined as in the proof of Proposition 4.5, Part (i) (adapted to the play αγ ρ).Moreover, when a player j deviates, each player i = j plays according to σ i , except in the case of a j-promising history h of length |α| from which he plays according to µ v −j , with v = Last(h) (see Lemma 4.11).Notation µ v i,j means the memoryless strategy of player i induced by µ v −j .
We observe that the outcome of (τ i ) i∈Π is the play π = αγ ρ (see Fig. 11 and 12).Let us write its cost profile as (y 1 , . . ., y n ).It follows that for all i ∈ Π, y i ≤ x i .More precisely, Figure 12: Play π and possible deviations.
Assume that there exists a ≺_j-profitable deviation τ′_j for player j w.r.t. (τ_i)_{i∈Π}. Let π′ be the outcome of the strategy profile (τ′_j, τ_{−j}) from v_0, and (y′_1, …, y′_n) its cost profile. Then we know that (y_1, …, y_n) ≺_j (y′_1, …, y′_n). Two possible situations occur according to where player j deviates from π. We show that the first situation is impossible. In the second one, we construct a ≺_j-profitable deviation σ′_j for player j w.r.t. (σ_i)_{i∈Π}, and then get a contradiction with (σ_i)_{i∈Π} being a secure equilibrium.

(i) Player j deviates from π strictly before depth |α| (see the play π′_1 in Fig. 12). Let us consider the prefix h of π′ of length |α|. We first state that h cannot visit Goal_j in a way that Cost_j(h) < x_j, because h is consistent with σ_{−j} (by definition of (τ_i)_{i∈Π}), and (σ_i)_{i∈Π} is a secure equilibrium. Therefore, h is a j-promising history w.r.t. (σ_i)_{i∈Π}.
The coalition Π \ {j} forces the play π′ to visit Goal_i, for a certain i > k, i ≠ j, before depth |α| + |V|, while avoiding the visit of Goal_j (then, y_j = y′_j = +∞). As in the first case, this leads to a contradiction with the fact that (y_1, …, y_n) ≺_j (y′_1, …, y′_n).
We define for all histories $h \in H_j$:
As player $j$ deviates after $\alpha$ with the strategy $\tau'_j$, one can prove that $\pi' = \alpha\tilde{\pi}'$ and $\rho' = \alpha\beta\tilde{\pi}'$ by definition of $(\tau_i)_{i\in\Pi}$ (see the play $\rho'_2$ in Fig. 11). Since $\mathrm{Visit}(\alpha) = \mathrm{Visit}(\alpha\beta)$, Equations (4.3), (4.4) and (4.5) also hold when replacing $x_i$ with $x'_i$ and $y_i$ with $y'_i$ (but the value of $l$ might be different). Then $\sigma'_j$ is a $\prec_j$-profitable deviation for player $j$ w.r.t. $(\sigma_i)_{i\in\Pi}$, and this is a contradiction.
Goal- and deviation-optimized secure equilibrium. The next lemma uses the ideas developed in the proof of Lemma 4.12 to show that any secure equilibrium can be transformed into one that is deviation-optimized. It is the last step before proving Proposition 4.8, and finally Part (ii) of Proposition 4.5.
Lemma 4.13. Let $(\sigma_i)_{i\in\Pi}$ be a secure equilibrium in a game $(G, v_0)$, with outcome $\rho$. Then there exists a deviation-optimized secure equilibrium $(\tau_i)_{i\in\Pi}$ in $(G, v_0)$ with outcome $\rho$.
Proof. Let $\alpha$ be the prefix of $\rho$ of length $\max\{\mathrm{Cost}_i(\rho) \mid \mathrm{Cost}_i(\rho) < +\infty\}$. It follows that $\mathrm{Visit}(\rho) = \mathrm{Visit}(\alpha)$. Then we define the required strategy profile $(\tau_i)_{i\in\Pi}$ exactly as in the proof of Lemma 4.12, except that we remove the first line of the definition: $\tau_i(h) = \sigma_i(\alpha\beta\delta)$ if $h = \alpha\delta$. One can check that $(\tau_i)_{i\in\Pi}$ and $(\sigma_i)_{i\in\Pi}$ have the same outcome $\rho$. We prove in exactly the same way that $(\tau_i)_{i\in\Pi}$ is a secure equilibrium in $(G, v_0)$ (here, $k = l$).
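The choice of the prefix $\alpha$ in this proof can be illustrated with a minimal Python sketch. It assumes that the cost of a player is the index of the first vertex of the play lying in his goal set ($+\infty$ if there is none), that the length of a history counts its edges, and that a play is given as a finite unfolding long enough to contain every goal visit. The function names, the example play and the goal sets are hypothetical.

```python
import math

def cost_profile(play, goals):
    """Cost_i = index of the first vertex of `play` in goals[i], or +inf."""
    return {
        player: next(
            (depth for depth, vertex in enumerate(play) if vertex in goal),
            math.inf,
        )
        for player, goal in goals.items()
    }

def visit(history, goals):
    """Visit(h): players whose goal set is hit somewhere along h."""
    return {p for p, goal in goals.items() if any(v in goal for v in history)}

def goal_prefix(play, goals):
    """Prefix alpha of length max{Cost_i | Cost_i < +inf} (length = edges)."""
    finite = [c for c in cost_profile(play, goals).values() if c < math.inf]
    length = max(finite, default=0)  # assumption: empty prefix if no goal is reached
    return play[: length + 1]

# Hypothetical finite unfolding of a play; player 3 never reaches his goal.
play = ["a", "b", "c", "d", "b", "c", "d"]
goals = {1: {"b"}, 2: {"d"}, 3: {"z"}}
alpha = goal_prefix(play, goals)
```

On this example, `alpha` stops at the first visit of the last goal set that is reached, so `visit(alpha, goals)` and `visit(play, goals)` coincide, as stated in the proof.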
We are now able to prove Proposition 4.8, which states that if there exists a secure equilibrium in a game $(G, v_0)$, then there exists one which is goal-optimized and deviation-optimized.
Proof of Proposition 4.8. Let $(\sigma_i)_{i\in\Pi}$ be a secure equilibrium in $(G, v_0)$ with outcome $\rho = \langle(\sigma_i)_{i\in\Pi}\rangle_{v_0}$ and cost profile $(x_i)_{i\in\Pi}$. Let us assume w.l.o.g. that $\Pi = \{1, \ldots, n\}$ and
and while it is still the case, we apply the following procedure to get a goal-optimized secure equilibrium.
By applying this procedure finitely many times, we can assume w.l.o.g. that $(\sigma_i)_{i\in\Pi}$ is a secure equilibrium with a cost profile $(x_1, \ldots, x_n)$ such that
meaning that $(\sigma_i)_{i\in\Pi}$ is a goal-optimized secure equilibrium. Moreover, by Lemma 4.13, there exists a deviation-optimized secure equilibrium with the same outcome, i.e. a goal-optimized and deviation-optimized secure equilibrium. This concludes the proof.
Remark 4.14. Regarding the costs, this proof shows that if there exists a secure equilibrium with cost profile $(a_i)_{i\in\Pi}$ in a game $(G, v_0)$, then there exists a goal-optimized and deviation-optimized secure equilibrium with cost profile $(b_i)_{i\in\Pi}$ in $(G, v_0)$, such that for all $i \in \Pi$, $b_i \le a_i$. In particular, the cost profile is usually not preserved.
Proof of Proposition 4.5, Part (ii). Let $(\sigma_i)_{i\in\Pi}$ be a secure equilibrium in $(G, v_0)$ with outcome $\rho$. By Proposition 4.8, we can suppose w.l.o.g. that $(\sigma_i)_{i\in\Pi}$ is goal-optimized and deviation-optimized. Let us define the strategy profile $(\tau_i)_{i\in\Pi}$ in $\mathrm{Trunc}_d(T)$ as the strategy profile $(\sigma_i)_{i\in\Pi}$ restricted to the finite tree $\mathrm{Trunc}_d(T)$. We prove that $(\tau_i)_{i\in\Pi}$ is a secure equilibrium in $\mathrm{Trunc}_d(T)$, which is clearly goal-optimized ($d > d_{\mathrm{goal}}(G)$).
Remark 4.15. This proof shows in particular that if there exists a goal-optimized and deviation-optimized secure equilibrium in $(G, v_0)$, then there exists a goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(T)$ with the same cost profile. Together with Remark 4.14, we have thus proved the following result: if there exists a secure equilibrium with cost profile $(a_i)_{i\in\Pi}$ in $(G, v_0)$, then there exists a goal-optimized and deviation-optimized secure equilibrium with cost profile $(b_i)_{i\in\Pi}$ in $\mathrm{Trunc}_d(T)$, such that for all $i \in \Pi$, $b_i \le a_i$.
Remarks 4.7 and 4.15 imply the proposition below.
Proposition 4.16. Given a multiplayer quantitative reachability game and a tuple of thresholds $(t_i)_{i\in\Pi} \in (\mathbb{R} \cup \{+\infty\})^{\Pi}$, one can decide in ExpSpace whether there exists a secure equilibrium with cost profile $(c_i)_{i\in\Pi}$ such that for all $i \in \Pi$, $c_i \le t_i$.
The decision problem related to Proposition 4.16 is equivalent to deciding whether there exists a goal-optimized and deviation-optimized secure equilibrium with cost profile $(a_i)_{i\in\Pi}$ in $\mathrm{Trunc}_d(T)$, where $d = d_{\mathrm{goal}}(G) + 3 \cdot |V|$, such that for all $i \in \Pi$, $a_i \le t_i$. Notice that $d$ does not depend on $(t_i)_{i\in\Pi}$.
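The threshold test of this decision problem can be sketched in Python as follows. This is only an illustration under assumptions: the value of $d_{\mathrm{goal}}(G)$ is passed as a parameter, and the cost profiles are supposed to come from a hypothetical enumeration of the goal-optimized and deviation-optimized secure equilibria of $\mathrm{Trunc}_d(T)$, which is not implemented here.

```python
import math

def truncation_depth(d_goal, num_vertices):
    """d = d_goal(G) + 3 * |V|; note that no threshold appears here."""
    return d_goal + 3 * num_vertices

def meets_thresholds(costs, thresholds):
    """True iff c_i <= t_i for every player i (+inf allowed on both sides)."""
    return all(costs[i] <= thresholds[i] for i in thresholds)

def exists_within_thresholds(equilibrium_costs, thresholds):
    """Check whether some candidate cost profile respects all thresholds.

    `equilibrium_costs` is assumed to list the cost profiles of the relevant
    equilibria in the truncated tree (hypothetical input).
    """
    return any(meets_thresholds(c, thresholds) for c in equilibrium_costs)

# Hypothetical data: two players, two candidate cost profiles.
candidates = [{1: 2, 2: math.inf}, {1: 5, 2: 3}]
depth = truncation_depth(d_goal=8, num_vertices=4)
answer = exists_within_thresholds(candidates, {1: 5, 2: 4})
```

The depth is computed once and reused for every tuple of thresholds, which reflects the remark that $d$ does not depend on $(t_i)_{i\in\Pi}$.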

Conclusion and Perspectives
In this paper, we study the concept of subgame perfect equilibrium, a refinement of Nash equilibrium well-suited to the framework of games played on graphs. We also introduce the new concept of subgame perfect secure equilibrium. We prove the existence of subgame perfect equilibria in multiplayer quantitative reachability games. We also prove the existence of subgame perfect secure equilibria, but only in the two-player framework. Finally, we provide an algorithm deciding in ExpSpace the existence of secure equilibria in the multiplayer case. On the one hand, the first two results have been obtained by topological techniques, which are completely different from the techniques used in [BBDP10, BBDP11]. On the other hand, proofs of the last result are strongly inspired by proofs developed in these references, but have required new ideas about the coalition strategies.
There are several interesting directions for future research. We are currently working on the model of quantitative games enriched by allowing $n$-tuples of positive weights on edges (see Theorem 2.4). We believe that our results remain true in this context. The case of Nash equilibria is already treated in [BBDP11]. Notice that our results trivially generalize to the particular case where the weights of the edges are of the form $(c, \ldots, c)$ with $c \in \mathbb{N}_0$. Indeed, it is enough to replace each such edge by a path of length $c$ composed of $c$ new edges (of cost 1).
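The edge replacement mentioned above can be sketched in Python. This is a minimal illustration under assumptions: edges are given as triples `(u, v, c)` with a positive integer weight `c` shared by all players, there is at most one edge between two given vertices, and the fresh intermediate vertices are represented by hypothetical tuples `(u, v, step)`. Each fresh vertex has a single successor, so the player who owns it has no choice to make.

```python
def subdivide_weighted_edges(edges):
    """Replace each edge (u, v, c) by a path of c unit-cost edges."""
    unit_edges = []
    for u, v, c in edges:
        if c < 1:
            raise ValueError("weights are assumed to be positive integers")
        previous = u
        # c - 1 fresh vertices are inserted between u and v.
        for step in range(1, c):
            fresh = (u, v, step)  # hypothetical name for a new vertex
            unit_edges.append((previous, fresh))
            previous = fresh
        unit_edges.append((previous, v))
    return unit_edges

# Hypothetical weighted graph: one edge of weight 3 and one of weight 1.
unit_graph = subdivide_weighted_edges([("a", "b", 3), ("b", "a", 1)])
```

In this example the edge of weight 3 becomes a path of three edges through two fresh vertices, while the edge of weight 1 is kept unchanged.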
To the best of our knowledge, the existence of secure equilibria in the multiplayer framework is still an open problem. We prove that the existence of a secure equilibrium in an infinite game is equivalent to the existence of a goal-optimized and deviation-optimized secure equilibrium in a finite game. This open problem could be positively solved if Corollary 1.14 could be adapted so as to get a goal-optimized and deviation-optimized secure equilibrium in the finite game, and then by applying Proposition 4.5. A deeper understanding of equilibria with unnecessary cycles could also be helpful. For the moment, we are not able to solve this problem with more than two players. The same kind of question is also open for subgame perfect secure equilibria.
Another research direction concerns a deeper study of the memory needed in the different kinds of equilibria. In the case of subgame perfect equilibria and subgame perfect secure equilibria, the topological techniques give no results on the memory needed. However, in the case of secure equilibria, we prove that we can restrict ourselves to finite-memory equilibria.

Figure 6: The game $T$ with its subgame $(T|_h, v)$.
where $d_{\mathrm{dev}} = \max\{\mathrm{Cost}_i(\rho) \mid \mathrm{Cost}_i(\rho) < +\infty\} + |V|$ and $\rho' = \langle\sigma'_j, \sigma_{-j}\rangle_{v_0}$.
Remark that Definitions 4.3 and 4.4 extend to games $\mathrm{Trunc}_d(T)$ where $d \ge d_{\mathrm{goal}}(G)$. We can now state the key proposition for proving Theorems 4.1 and 4.2.
Proposition 4.5. Let $(G, v_0)$ be a game, and $d = d_{\mathrm{goal}}(G) + 3 \cdot |V|$.
(i) If there exists a goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(T)$, then there exists a secure equilibrium in $(G, v_0)$ that is finite-memory.
(ii) If there exists a secure equilibrium in $(G, v_0)$, then there exists a goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(T)$.
At this stage, it is difficult to give some intuition about the choice of the values $d_{\mathrm{goal}}(G)$, $d_{\mathrm{dev}}$ and $d = d_{\mathrm{goal}}(G) + 3 \cdot |V|$. These values are linked to the proofs contained in this section.
Proof of Theorem 4.1. By Proposition 4.5, there exists a secure equilibrium in $(G, v_0)$ iff there exists a goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(T)$, with $d = d_{\mathrm{goal}}(G) + 3 \cdot |V|$. The latter property is decidable in NExpSpace (in $|V|$ and $|\Pi|$). Indeed, $\mathrm{Trunc}_d(T)$ has exponential size. Guessing a strategy profile $(\sigma_i)_{i\in\Pi}$ in this tree also requires exponential space. Then we can test in exponential space whether $(\sigma_i)_{i\in\Pi}$ is a goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(T)$. By Savitch's theorem, deciding the existence of a secure equilibrium is thus in ExpSpace.
Proof of Theorem 4.2. This theorem is a direct consequence of Proposition 4.5. Indeed, consider a secure equilibrium in a game $(G, v_0)$. We first apply Proposition 4.5 (Part (ii)) to this strategy profile to get a goal-optimized and deviation-optimized secure equilibrium $(\sigma_i)_{i\in\Pi}$ in $\mathrm{Trunc}_d(T)$, for $d = d_{\mathrm{goal}}(G) + 3 \cdot |V|$. Then we apply Proposition 4.5 (Part (i)) to the equilibrium $(\sigma_i)_{i\in\Pi}$, to get a finite-memory secure equilibrium back in $(G, v_0)$.
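The size of $\mathrm{Trunc}_d(T)$ invoked in the proof of Theorem 4.1 can be illustrated with a short Python sketch. It assumes that the nodes of the truncated tree are the histories of length at most $d$ starting in $v_0$, where the length counts edges, and that the graph is given as a hypothetical dictionary mapping each vertex to its list of successors.

```python
def count_truncated_tree_nodes(successors, v0, depth):
    """Number of histories of length <= depth (in edges) starting in v0."""
    layer = {v0: 1}  # layer[v] = number of histories of the current length ending in v
    total = 1        # the history reduced to v0
    for _ in range(depth):
        next_layer = {}
        for vertex, count in layer.items():
            for succ in successors[vertex]:
                next_layer[succ] = next_layer.get(succ, 0) + count
        layer = next_layer
        total += sum(layer.values())
    return total

# Hypothetical graph in which every vertex has two successors.
graph = {"a": ["a", "b"], "b": ["a", "b"]}
nodes_at_depth_3 = count_truncated_tree_nodes(graph, "a", 3)
```

With two successors per vertex the count is $2^{d+1} - 1$, so the tree grows exponentially with the depth $d$ even though the counting itself only keeps one number per vertex.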

4.2. Part (ii) of Proposition 4.5.
Part (ii) of Proposition 4.5 states that if there exists a secure equilibrium in a game $(G, v_0)$, then there exists a goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(T)$, for $d = d_{\mathrm{goal}}(G) + 3 \cdot |V|$. The proof needs several steps. Suppose that there exists a secure equilibrium $(\sigma_i)_{i\in\Pi}$ in $(G, v_0)$. The first step consists in transforming $(\sigma_i)_{i\in\Pi}$ into a goal-optimized and deviation-optimized secure equilibrium in $(G, v_0)$ (Proposition 4.8); the second step consists in showing that its restriction to $\mathrm{Trunc}_d(T)$ with $d = d_{\mathrm{goal}}(G) + 3 \cdot |V|$ is still a goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(T)$.

Finally, on the basis of Proposition 4.8, we are able to prove Part (ii) of Proposition 4.5: given a game $(G, v_0)$, if there exists a secure equilibrium in $(G, v_0)$, then there exists a goal-optimized and deviation-optimized secure equilibrium in $\mathrm{Trunc}_d(T)$, for $d = d_{\mathrm{goal}}(G) + 3 \cdot |V|$.