Subgame-perfect Equilibria in Mean-payoff Games

In this paper, we provide an effective characterization of all the subgame-perfect equilibria in infinite duration games played on finite graphs with mean-payoff objectives. To this end, we introduce the notion of requirement, and the notion of negotiation function. We establish that the plays that are supported by SPEs are exactly those that are consistent with the least fixed point of the negotiation function. Finally, we show that the negotiation function is piecewise linear, and can be analyzed using the linear algebraic tool box. As a corollary, we prove the decidability of the SPE constrained existence problem, whose status was left open in the literature.


Introduction
The notion of Nash equilibrium (NE) is one of the most important and most studied solution concepts in game theory. A profile of strategies is an NE when no rational player has an incentive to change their strategy unilaterally, i.e. while the other players keep their strategies. Thus an NE models a stable situation. Unfortunately, it is well known that, in sequential games, NEs suffer from the problem of non-credible threats, see e.g. [Osb04]. In those games, some NEs only exist when some players do not play rationally in subgames and so use non-credible threats to force the NE. This is why, in sequential games, the stronger notion of subgame-perfect equilibrium is used instead: a profile of strategies is a subgame-perfect equilibrium (SPE) if it is an NE in all the subgames of the sequential game. Thus SPEs impose rationality even after a deviation has occurred.
In this paper, we study sequential games that are infinite-duration games played on graphs with mean-payoff objectives, and focus on SPEs. While NEs are guaranteed to exist in infinite duration games played on graphs with mean-payoff objectives, it is known that it is not the case for SPEs, see e.g. [SV03, BBMR15]. We provide in this paper a constructive characterization of the entire set of SPEs, which allows us to decide, among other problems, the SPE threshold problem. This problem was left open in previous contributions on the subject. More precisely, our contributions are described in the next paragraphs.
1.1. Contributions. First, we introduce two important new notions that allow us to capture NEs, and more importantly SPEs, in infinite duration games played on graphs with mean-payoff objectives¹: the notion of requirement and the notion of negotiation function.
A requirement λ is a function that assigns to each vertex v ∈ V of a game graph a value in R ∪ {−∞, +∞}. The value λ(v) represents a requirement on any play ρ = ρ_0 ρ_1 ... ρ_n ... that traverses this vertex: if one wants the player who controls the vertex v to follow ρ and to give up deviating from ρ, then the play must offer to that player a payoff that is at least λ(v). An infinite play ρ is λ-consistent if, for each player i, the payoff of ρ for player i is larger than or equal to the largest value of λ on vertices occurring along ρ and controlled by player i.
We first use these notions to rephrase a classical result about NEs: if λ maps each vertex v to the largest value that the player who controls v can secure against a fully adversarial coalition of the other players, i.e. if λ(v) is the zero-sum worst-case value, then the set of plays that are λ-consistent is exactly the set of plays that are supported by an NE (Theorem 3.8).
As SPEs force players to play rationally in all subgames, we cannot rely on the zero-sum worst-case value to characterize them. Indeed, when considering the worst-case value, we allow adversaries to play fully adversarially after a deviation and so potentially in an irrational way w.r.t. their own objective. In fact, in an SPE, a player is deterred from deviating when opposed by a coalition of rational adversaries. To characterize this relaxation of the notion of worst-case value, we rely on our notion of negotiation function.
The negotiation function nego operates from the set of requirements into itself. To understand the purpose of the negotiation function, let us consider its application on the requirement λ that maps every vertex v to the worst-case value as above. Now, we can naturally formulate the following question: given v and λ, can the player who controls v improve the value that they can ensure against all the other players, if only plays that are consistent with λ are proposed by the other players? In other words, can this player enforce a better value when playing against the other players if those players are not willing to give away their own worst-case value? Clearly, securing this worst-case value can be seen as a minimal goal for any rational adversary. So nego(λ)(v) returns this value; and this reasoning can be iterated. One of the contributions of this paper is to show that the least fixed point λ* of the negotiation function characterizes exactly the set of plays supported by SPEs (Theorem 4.4).
To turn this fixed point characterization of SPEs into algorithms, we additionally draw links between the negotiation function and two classes of zero-sum games, that are called abstract and concrete negotiation games (see Theorem 6.3). The abstract negotiation game is conceptually simple but is played on an uncountably infinite graph, and therefore cannot be turned into an effective algorithm. However, it captures the intuition behind the concrete negotiation game, which is played on a finite graph. We show that in the concrete negotiation game, one of the players has a memoryless optimal strategy (Lemma 6.4), which can be used to solve it effectively. Thus, the negotiation function is computable. However, that is not sufficient to compute its least fixed point, because the sequence of Kleene-Tarski iterations may require a transfinite number of steps to reach a fixed point (Theorem 5.3).
¹ A large part of our results apply to the larger class of games with prefix-independent objectives. For the sake of readability of this introduction, we focus here on mean-payoff games but the technical results in the paper usually cover broader classes of games.
Nevertheless, we prove that the concrete negotiation game can be used to construct a geometrical representation of the fixed points of the negotiation function, from which one can extract its least fixed point (Theorem 7.3). Thus, the SPE threshold problem is decidable (Theorem 8.3).
All the previous results also apply to ε-SPEs, a classical quantitative relaxation of SPEs (see for example [FP16]). In particular, Theorem 8.3 also applies to the ε-SPE threshold problem.
1.2. Related works. Non-zero-sum infinite duration games have attracted a lot of attention in recent years, with applications targeting reactive synthesis problems. We refer the interested reader to the survey papers [BCH+16, Bru17] and their references for the relevant literature. We detail below contributions more closely related to the work presented here.
In [BDS13], Brihaye et al. offer a characterization of NEs in quantitative games for cost-prefix-linear reward functions based on the worst-case value. The mean-payoff is cost-prefix-linear. In their paper, the authors do not consider the stronger notion of SPE, which is the central solution concept studied in our paper. In [BMR14], Bruyère et al. study secure equilibria that are a refinement of NEs. Secure equilibria are not subgame-perfect and are, as classical NEs, subject to non-credible threats in sequential games.
In [Umm06], Ummels proves that there always exists an SPE in games with ω-regular objectives and defines algorithms based on tree automata to decide constrained SPE problems. Strategy logics, see e.g. [CHP10], can be used to encode the concept of SPE in the case of ω-regular objectives with application to the rational synthesis problem [KPV16] for instance. In [FKM+10], Flesch et al. show that the existence of ε-SPEs is guaranteed when the reward function is lower-semicontinuous. The mean-payoff reward function is neither ω-regular, nor lower-semicontinuous, and so the techniques defined in the papers cited above cannot be used in our setting. Furthermore, as already recalled above, see e.g. [SV03, BBMR15], contrary to the ω-regular case, SPEs in games with mean-payoff objectives may fail to exist.
In [BBMR15], Brihaye et al. introduce and study the notion of weak subgame-perfect equilibria, which is a weakening of the classical notion of SPE. This weakening is equivalent to the original SPE concept on reward functions that are continuous. This is the case for example for the quantitative reachability reward function, on which Brihaye et al. solve the SPE threshold problem in [BBG+19]. On the contrary, the mean-payoff reward function is not continuous, and the techniques used in [BBMR15], and generalized in [BRPR17], cannot be used to characterize SPEs for it.
In [Meu16], Meunier develops a method based on Prover-Challenger games to solve the problem of the existence of SPEs on games with a finite number of possible payoffs. This method is not applicable to the mean-payoff reward function, as the number of payoffs in this case is uncountably infinite.
In [FP17], Flesch and Predtetchinski present another characterization of SPEs on games with finitely many possible payoffs, based on a game structure that we will present here under the name of abstract negotiation game. Our contributions differ from this paper in two fundamental aspects. First, we lift the restriction to finitely many possible payoffs. This is crucial as mean-payoff games violate this restriction. Instead, we identify a class of games, that we call games with steady negotiation, that encompasses mean-payoff games and for which some of the conceptual tools introduced in that paper can be generalized. Second, the procedure developed by Flesch and Predtetchinski is not an algorithm in the computer-science sense: it needs to solve infinitely many games that are not represented effectively, and furthermore it needs a transfinite number of iterations. On the contrary, our procedure is effective and leads to a complete algorithm in the classical sense: it is guaranteed to terminate in finite time and is applied to effective representations of games.
In [CDE+10], Chatterjee et al. study mean-payoff automata, and give a result that can be translated into an expression of all the possible payoff vectors in a mean-payoff game. In [BR15], Brenguier and Raskin give an algorithm to build the Pareto curve of a multi-dimensional two-player zero-sum mean-payoff game. Techniques defined in these papers are used in several technical steps of our algorithm.
1.3. Structure of the paper. In Section 2, we introduce the necessary background. Section 3 defines the notion of requirement and the negotiation function. Section 4 shows that the plays that are supported by an SPE are exactly those that are λ-consistent, where λ is the least fixed point of the negotiation function. Section 5 draws a link between the negotiation function and the abstract negotiation game. Section 6 shows that the abstract negotiation game can be transformed into a game on a finite graph, the concrete negotiation game, which can be solved to compute effectively the negotiation function. Section 7 uses the concrete negotiation game to prove that the negotiation function is a piecewise affine function, of which one can compute an effective representation. Finally, Section 8 applies these results to prove that the SPE threshold problem in mean-payoff games is decidable, 2ExpTime-easy and NP-hard.

Background
2.1. Games, strategies, equilibria. In what follows, we use the word game for infinite duration turn-based quantitative games on finite graphs with complete information.
Definition 2.1 (Non-initialized game). A non-initialized game (or game for short) is a tuple G = (Π, V, (V_i)_{i∈Π}, E, µ), where:
• Π is a finite set of players;
• (V, E) is a directed graph, called the underlying graph of G, whose vertices are sometimes called states and whose edges are sometimes called transitions, and in which every state has at least one outgoing transition. For simplicity, a transition (v, w) ∈ E will often be written vw;
• (V_i)_{i∈Π} is a partition of V, in which V_i is the set of states controlled by player i;
• µ : V^ω → R^Π is a payoff function, which maps each play ρ to the tuple µ(ρ) = (µ_i(ρ))_{i∈Π} of the players' payoffs.
Definition 2.2 (Initialized game). An initialized game (or game for short) is a tuple (G, v_0), often written G↾v_0, where G is a non-initialized game and v_0 ∈ V is a state called the initial state.
When the context is clear, we often use the word game for both initialized and non-initialized games.
Definition 2.3 (Play, history). A play (resp. history) in the game G is an infinite (resp. finite) path in the graph (V, E). It is also a play (resp. history) in the initialized game G↾v_0, where v_0 is its first vertex. The set of plays (resp. histories) in the game G (resp. the initialized game G↾v_0) is denoted by Plays G (resp. Plays G↾v_0, Hist G, Hist G↾v_0). We write Hist_i G (resp. Hist_i G↾v_0) for the set of histories in G (resp. G↾v_0) of the form hv, where v is a vertex controlled by player i.
Given a play ρ (resp. a history h), we write Occ(ρ) (resp. Occ(h)) for the set of vertices that appear in ρ (resp. h), and Inf(ρ) for the set of vertices that appear infinitely often in ρ. For a given index k, we write ρ_{≤k} (resp. h_{≤k}), or ρ_{<k+1} (resp. h_{<k+1}), for the finite prefix ρ_0 ... ρ_k (resp. h_0 ... h_k), and ρ_{≥k} (resp. h_{≥k}), or ρ_{>k−1} (resp. h_{>k−1}), for the infinite (resp. finite) suffix ρ_k ρ_{k+1} ... (resp. h_k h_{k+1} ... h_{|h|−1}). Finally, we write first(ρ) (resp. first(h)) for the first vertex of ρ (resp. h), and last(h) for the last vertex of h.
Definition 2.4 (Strategy, strategy profile). A strategy for player i in the initialized game G↾v_0 is a function σ_i : Hist_i G↾v_0 → V, such that vσ_i(hv) is an edge of (V, E) for every hv. A history h is compatible with a strategy σ_i if and only if h_{k+1} = σ_i(h_0 ... h_k) for all k such that h_k ∈ V_i. A play ρ is compatible with σ_i if all its prefixes are.
A strategy profile for P ⊆ Π is a tuple σ_P = (σ_i)_{i∈P}, where each σ_i is a strategy for player i in G↾v_0. A play or a history is compatible with σ_P if it is compatible with every σ_i for i ∈ P. A complete strategy profile, usually written σ, is a strategy profile for Π. Exactly one play is compatible with the strategy profile σ: we call it its outcome and write it ⟨σ⟩.
When i is a player and when the context is clear, we will often write −i for the set Π \ {i}. We will often refer to Π \ {i} as the environment against player i. When τ_P and τ′_Q are two strategy profiles with P ∩ Q = ∅, (τ_P, τ′_Q) denotes the strategy profile σ_{P∪Q} such that σ_i = τ_i for i ∈ P, and σ_i = τ′_i for i ∈ Q. In a strategy profile σ_P, the domains of the strategies σ_i are pairwise disjoint. Therefore, we can consider σ_P as one function: for hv ∈ Hist G↾v_0 such that v ∈ ⋃_{i∈P} V_i, we liberally write σ_P(hv) for σ_i(hv) with i such that v ∈ V_i.
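The fact that a complete strategy profile has exactly one compatible play can be made concrete with a minimal sketch. The encoding below is an assumption made for illustration only: strategies are Python functions from histories (tuples of vertices) to vertices, `owner` maps each vertex to the player controlling it, and the two-state game and its strategies are hypothetical. The sketch does not check that the chosen successors are edges of the graph.

```python
from typing import Callable, Dict, List, Tuple

History = Tuple[str, ...]
Strategy = Callable[[History], str]

def outcome_prefix(owner: Dict[str, int],
                   profile: Dict[int, Strategy],
                   v0: str,
                   steps: int) -> List[str]:
    """Return the first `steps + 1` vertices of the play compatible with `profile`."""
    history: List[str] = [v0]
    for _ in range(steps):
        player = owner[history[-1]]  # player controlling the last vertex
        history.append(profile[player](tuple(history)))
    return history

# Hypothetical game: player 0 controls a, player 1 controls b,
# with edges a->b, b->a and b->b.
owner = {"a": 0, "b": 1}
profile = {
    0: lambda h: "b",                                # always move to b
    1: lambda h: "a" if h.count("b") == 1 else "b",  # return to a once, then loop on b
}
prefix = outcome_prefix(owner, profile, "a", 5)
print(prefix)
```

Since each history is extended by the strategy of the unique player controlling its last vertex, the loop has no choice to make, which is why the outcome ⟨σ⟩ is well defined.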
Before moving on to SPEs, let us recall the notion of Nash equilibrium.
Definition 2.5 (Nash equilibrium). Let G↾v_0 be a game. A strategy profile σ is a Nash equilibrium (or NE for short) in G↾v_0 if and only if for each player i and for every strategy σ′_i, called a deviation of σ_i, we have the inequality µ_i(⟨σ′_i, σ_{−i}⟩) ≤ µ_i(⟨σ⟩).
To define SPEs, we need the notion of subgame.
Definition 2.6 (Subgame, substrategy). Let hv be a history in the game G. The subgame of G after hv is the game G↾hv = (Π, V, (V_i)_i, E, µ↾hv)↾v, where µ↾hv maps each play to its payoff in G, assuming that the history hv has already been played: formally, for every ρ ∈ Plays G↾hv, we have µ↾hv(ρ) = µ(hρ).
If σ_i is a strategy in G↾v_0, its substrategy after hv is the strategy σ_i↾hv in G↾hv, defined by σ_i↾hv(h′) = σ_i(hh′) for every h′ ∈ Hist_i G↾hv.
Remark 2.7. The initialized game G↾v_0 is also the subgame of G after the one-state history v_0.
Definition 2.8 (Subgame-perfect equilibrium). Let G↾v_0 be a game. The strategy profile σ is a subgame-perfect equilibrium (or SPE for short) in G↾v_0 if and only if for every history h in G↾v_0, the strategy profile σ↾h is a Nash equilibrium in the subgame G↾h.
The notion of subgame-perfect equilibrium can be seen as a refinement of Nash equilibrium: it is a stronger equilibrium which excludes players resorting to non-credible threats.
Example 2.9.In the game represented in Figure 1a, where the square state is controlled by player and the round states by player , if both players get the payoff 1 by reaching the state d and 0 in the other cases, there are actually two NEs: one, in blue, where goes to the state b and then player goes to d, and both win, and one, in red, where player goes to c because player was planning to go to e.However, only the blue one is an SPE, as moving from b to e is irrational for player in the subgame G ↾ab .
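The difference between the two equilibria of Example 2.9 can be checked by brute force. The following sketch is an illustration under assumptions: it encodes the game as a tree in which a hypothetical player 0 moves at a (to b or c) and a hypothetical player 1 moves at b (to d or e), with c, d and e terminal; which of the two players owns the square state is not used. Since the game is a tree, a strategy is just one choice per decision state, and the subgames are the ones rooted at a and at b.

```python
from itertools import product

# Assumed encoding of the tree game: decision states and their possible moves.
MOVES = {"a": ["b", "c"], "b": ["d", "e"]}
OWNER = {"a": 0, "b": 1}  # hypothetical player names

def payoff(leaf):
    # Both players get 1 when d is reached, and 0 in the other cases.
    return (1, 1) if leaf == "d" else (0, 0)

def play_from(start, profile):
    """Follow the profile from `start` until a terminal state is reached."""
    v = start
    while v in MOVES:
        v = profile[v]
    return v

def is_ne_from(start, profile):
    """True if no player gains by a unilateral deviation in the subgame at `start`."""
    base = payoff(play_from(start, profile))
    for player in (0, 1):
        own = [v for v in MOVES if OWNER[v] == player]
        for choices in product(*(MOVES[v] for v in own)):
            deviation = dict(profile, **dict(zip(own, choices)))
            if payoff(play_from(start, deviation))[player] > base[player]:
                return False
    return True

profiles = [dict(zip(MOVES, c)) for c in product(*MOVES.values())]
nes = [p for p in profiles if is_ne_from("a", p)]
spes = [p for p in profiles if all(is_ne_from(v, p) for v in MOVES)]
print("NEs: ", nes)
print("SPEs:", spes)
```

Under these assumptions, the profile where a moves to c and b plans to move to e passes the NE test from a but fails it in the subgame rooted at b, which is the non-credible threat described in the example.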
An ε-SPE is a strategy profile which is almost an SPE: if a player deviates after some history, they will not be able to improve their payoff by more than a quantity ε ≥ 0.
Hereafter, we focus on prefix-independent games, and in particular mean-payoff games.

Mean-payoff games.
Definition 2.12 (Mean-payoff, mean-payoff game). In a graph (V, E), we associate to each reward function r : E → Q the mean-payoff function:
\[ \mathrm{MP}_r : h_0 \dots h_n \mapsto \frac{1}{n} \sum_{k=0}^{n-1} r(h_k h_{k+1}). \]
A game G = (Π, V, (V_i)_i, E, µ) is a mean-payoff game if its underlying graph is finite, and if there exists a tuple (r_i)_{i∈Π} of reward functions, such that for each player i and every play ρ:
\[ \mu_i(\rho) = \liminf_{n \to \infty} \mathrm{MP}_{r_i}(\rho_0 \dots \rho_n). \]
In a mean-payoff game, the quantities given by the function r_i represent the immediate reward that each action gives to player i. The final payoff of player i is their average payoff along the play, classically defined as the limit inferior over n (since the limit may not be defined) of the average payoff after n steps. When the context is clear, we liberally write MP_i(h) for MP_{r_i}(h), and MP(h) for (MP_i(h))_i, as well as r(uv) for (r_i(uv))_i.
Definition 2.13 (Prefix-independent game). A game G is prefix-independent if, for every history h and for every play ρ, we have µ(hρ) = µ(ρ). We also say, in that case, that the payoff function µ is prefix-independent.
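As a small illustration of the mean-payoff function, here is a sketch that computes the average reward of a finite history, and the payoff of an ultimately periodic play made of a finite prefix followed by a cycle repeated forever. The reward table and the function names are hypothetical. For such a play, the running averages converge to the average reward of the cycle, so the limit inferior is that average and the prefix plays no role.

```python
from fractions import Fraction

def mean_payoff(history, reward):
    """Average reward over the edges of a finite history with at least one edge."""
    edges = list(zip(history, history[1:]))
    return Fraction(sum(reward[e] for e in edges), len(edges))

def lasso_payoff(prefix, cycle, reward):
    """Payoff of the play prefix . cycle^omega.

    The prefix is ignored: only the repeated cycle contributes in the limit.
    """
    closed = list(cycle) + [cycle[0]]  # close the cycle to count its last edge
    return mean_payoff(closed, reward)

# Hypothetical rewards for a single player on a two-state graph.
reward = {("a", "a"): 0, ("a", "b"): 3, ("b", "a"): 1}
print(mean_payoff(["a", "b", "a"], reward))          # average of 3 and 1
print(lasso_payoff(["a", "a"], ["a", "b"], reward))  # cycle a -> b -> a
```

Using exact fractions avoids rounding issues, which is convenient since the rewards are rational.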
Mean-payoff games are prefix-independent. A first important result that we need is the characterization of the set of possible payoffs in a mean-payoff game, which has been introduced in [CDE+10]. Given a graph (V, E), we write SC(V, E) for the set of simple cycles it contains. Given a finite set D of dimensions and a set X ⊆ R^D, we write Conv X for the convex hull of X. We will often use the subscript notation Conv_{x∈X} f(x) for the set Conv f(X).
MP(c).
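The payoff vectors MP(c) of the simple cycles c ∈ SC(V, E) are the finitely many points from which that characterization is built. The sketch below enumerates the simple cycles of a small graph and computes their payoff vectors. The two-state graph and its edge rewards are assumptions chosen for illustration, and the function names are hypothetical; the sketch does not compute the convex hull itself.

```python
from fractions import Fraction

def simple_cycles(edges):
    """Enumerate simple cycles, each one once, starting from its smallest vertex."""
    succ = {}
    for (u, v) in edges:
        succ.setdefault(u, []).append(v)
    cycles = []

    def extend(path):
        for w in succ.get(path[-1], []):
            if w == path[0]:
                cycles.append(tuple(path))      # the path closes into a cycle
            elif w > path[0] and w not in path:
                extend(path + [w])              # only visit larger, unvisited vertices

    for start in sorted(succ):
        extend([start])
    return cycles

def cycle_payoff(cycle, reward):
    """Vector of average rewards along one traversal of the cycle."""
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))
    dims = len(next(iter(reward.values())))
    return tuple(Fraction(sum(reward[e][d] for e in edges), len(edges))
                 for d in range(dims))

# Assumed rewards (first player, second player) on a two-state graph.
reward = {("a", "a"): (0, 1), ("a", "b"): (2, 2),
          ("b", "a"): (2, 2), ("b", "b"): (1, 0)}
for c in simple_cycles(reward):
    print(c, cycle_payoff(c, reward))
```

Restricting the search to vertices larger than the starting one is a standard way to report each cycle under a single rotation.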
2.3.The ε-SPE threshold problem.In the sequel, we prove the decidability of the ε-SPE threshold problem, which is a generalization of the SPE threshold problem (since SPEs are 0-SPEs and conversely, by Remark 2.11), defined as follows.
That problem is illustrated by the two following examples.
Example 2.18 (A game without SPEs). Let G be the mean-payoff game of Figure 1b, where each edge is labelled by the rewards r and r . No reward is given for the edges ac and bd since they can be used only once, and therefore do not influence the final payoff. For now, the reader should not pay attention to the red labels below the states. As shown in [BRPR16], this game does not have any SPE, neither from the state a nor from the state b. Indeed, the only NE outcomes from the state b are the plays where player eventually leaves the cycle ab and goes to d: if he stays in the cycle ab, then player would be better off leaving it, and if she does, player would be better off leaving it before. From the state a, if player knows that player will leave, she has no incentive to do it before: there is no NE where leaves the cycle and plans to do it if ever she does not. Therefore, there is no SPE where leaves the cycle. But then, after a history that terminates in b, player has actually no incentive to leave if player never plans to do it afterwards: contradiction.
to the gray and blue areas in Figure 2b. Indeed, following exclusively one of the three simple cycles a, ab and b of the game graph during a play yields the payoffs 01, 22 and 10, respectively. By combining those cycles with well chosen frequencies, one can obtain any payoff in the convex hull of those three points. Now, it is also possible to obtain the point 00 by using the properties of the limit inferior: it is for instance the payoff of the play a^2 b^4 a^16 b^256 ... a^(2^(2^n)) b^(2^(2^(n+1))) ... . In fact, one can construct a play that yields any payoff in the convex hull of the four points 00, 10, 01, and 22.
We claim that the payoffs of SPE plays correspond to the red-circled area in Figure 2b: there exists an SPE σ in G↾a with ⟨σ⟩ = ρ if and only if µ (ρ), µ (ρ) ≥ 1. That statement will be a direct consequence of the results we show in the remaining sections, but let us give a first intuition: a play with such a payoff necessarily uses infinitely often both states. It is an NE outcome because none of the players can get a better payoff by looping forever on their state, and they can both force each other to follow that play, by threatening to loop forever on their state whenever they can. But such a strategy profile is clearly not an SPE.
It can be transformed into an SPE as follows: when a player deviates, say player , then player can punish him by looping on a, not forever, but a large number of times, until player 's mean-payoff gets very close to 1. Afterwards, both players follow again the play that was initially planned. Since that threat is temporary, it does not affect player 's payoff in the long term, but it really punishes player if that one tries to deviate infinitely often.
Note that such an SPE requires infinite memory.
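The role of the limit inferior in the play a^2 b^4 a^16 b^256 ... can be observed numerically. The sketch below computes the exact running averages at the end of each block. The edge rewards are assumptions: they are chosen so that looping on a yields 01, looping on b yields 10 and the cycle ab yields 22, but the individual rewards of the edges ab and ba are hypothetical. Block n is assumed to contain 2^(2^n) occurrences of its state.

```python
from fractions import Fraction

# Assumed rewards (first player, second player).
REWARD = {("a", "a"): (0, 1), ("b", "b"): (1, 0),
          ("a", "b"): (2, 2), ("b", "a"): (2, 2)}

def averages_at_block_ends(num_blocks):
    """Running mean-payoff vectors at the end of each block of a^2 b^4 a^16 ..."""
    totals = [0, 0]
    edges = 0
    result = []
    for n in range(num_blocks):
        state = "a" if n % 2 == 0 else "b"
        other = "b" if state == "a" else "a"
        if n > 0:
            # One edge is used to enter the new block.
            r = REWARD[(other, state)]
            totals = [totals[0] + r[0], totals[1] + r[1]]
            edges += 1
        loops = 2 ** (2 ** n) - 1  # a block of k occurrences uses k - 1 loop edges
        r = REWARD[(state, state)]
        totals = [totals[0] + loops * r[0], totals[1] + loops * r[1]]
        edges += loops
        result.append((Fraction(totals[0], edges), Fraction(totals[1], edges)))
    return result

for n, avg in enumerate(averages_at_block_ends(6)):
    print(n, float(avg[0]), float(avg[1]))
```

At the end of the long a-blocks the first coordinate gets close to 0, and at the end of the long b-blocks the second one does, because each block dwarfs everything played before it. Both limit inferiors are therefore 0, which is consistent with the payoff 00 even though every edge gives a positive reward to at least one player.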
2.4. Two-player zero-sum games. The concept of SPEs has been designed for non-zero-sum games with arbitrarily many players, but the methods we will present in the sequel will bring us back to the more classical framework of two-player zero-sum games, with more complex payoff functions. We will therefore need the following notions and results.
Definition 2.21 (Borel game). A game G is Borel if the function µ, from the set V^ω equipped with the product topology to the Euclidean space R^Π, is Borel, i.e. if, for every Borel set B ⊆ R^Π, the set µ^{−1}(B) is Borel.
Definition 2.24 (Optimal strategy). Let G↾v_0 be a zero-sum Borel game, with
Now, let us define memoryless strategies, and state a condition under which they can be optimal.
Definition 2.25 (Memoryless strategy). A strategy σ_i in a game G↾v_0 is memoryless if for all vertices v ∈ V_i and for all histories h and h′, we have σ_i(hv) = σ_i(h′v).
For every game G↾v_0 and each player i, we write ML_i(G↾v_0), or ML(G↾v_0) when the context is clear, for the set of memoryless strategies for player i in G↾v_0.
Lemma 2.29.In a two-player zero-sum game played on a finite graph, every player whose payoff function is concave has an optimal strategy that is memoryless.
Proof. According to [Kop06], this result is true for qualitative objectives, i.e. when µ can only take the values 0 and 1. It follows that for every α ∈ R, if a player i, whose payoff function is concave, has a strategy that ensures µ_i(ρ) ≥ α (understood as a qualitative objective), then they have a memoryless one. Hence the equality:
Since the underlying graph (V, E) is assumed to be finite, there exists a finite number of memoryless strategies, hence the infimum above is realized by a memoryless strategy σ_1 that is, therefore, optimal.
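The finiteness argument can be illustrated on the simplest case, a two-player zero-sum game with a single scalar mean-payoff objective. The sketch below enumerates the memoryless strategies of each player and computes the value by brute force. It relies on the classical fact, not restated in this section, that both players have optimal memoryless strategies in such games; the game, the rewards and the function names are hypothetical, and player 0 is assumed to be the maximizer.

```python
from fractions import Fraction
from itertools import product

def memoryless_strategies(succ, owner, player):
    """All memoryless strategies of `player`: one successor per controlled vertex."""
    own = sorted(v for v in succ if owner[v] == player)
    return [dict(zip(own, choice)) for choice in product(*(succ[v] for v in own))]

def lasso_mean(choice, v0, reward):
    """Mean payoff of the unique play when every vertex has a fixed successor."""
    seen, path, v = {}, [], v0
    while v not in seen:
        seen[v] = len(path)
        path.append(v)
        v = choice[v]
    cycle = path[seen[v]:]  # the part of the play that repeats forever
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))
    return Fraction(sum(reward[e] for e in edges), len(edges))

def value(succ, owner, reward, v0):
    """Best payoff player 0 can secure, searching memoryless strategies only."""
    return max(
        min(lasso_mean({**s0, **s1}, v0, reward)
            for s1 in memoryless_strategies(succ, owner, 1))
        for s0 in memoryless_strategies(succ, owner, 0))

# Hypothetical game: player 0 controls a, player 1 controls b.
succ = {"a": ["a", "b"], "b": ["a", "b"]}
owner = {"a": 0, "b": 1}
reward = {("a", "a"): 0, ("a", "b"): 3, ("b", "a"): 1, ("b", "b"): 1}
print(value(succ, owner, reward, "a"))
```

The number of memoryless strategies is the product of the out-degrees of the controlled vertices, so the enumeration is finite but exponential in general; the sketch is meant to illustrate the argument, not to be efficient.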

Requirements and negotiation
We will now see that SPEs are strategy profiles that respect some requirements about the payoffs, depending on the states they traverse.In this part, we develop the notions of requirement and negotiation.
3.1. Requirement. In the method we will develop further, we will need to analyze the players' behaviours when they have some requirement to satisfy. Intuitively, one can see requirements as rationality constraints for the players, that is, a threshold payoff value under which a player will not accept to follow a play. In what follows, R̄ denotes the set R ∪ {±∞}.
Definition 3.1 (Requirement). A requirement on the game G is a mapping λ : V → R̄.
For a given state v, the quantity λ(v) represents the minimal payoff that the player controlling v will require in a play beginning in v.
The set of the λ-consistent plays from a state v is denoted by λCons(v).
Definition 3.3 (λ-rationality). Let λ be a requirement on a mean-payoff game G. Let i ∈ Π. A strategy profile σ_{−i} is λ-rational if and only if there exists a strategy σ_i such that, for every history hv compatible with σ_{−i}, the play ⟨σ↾hv⟩ is λ-consistent. We then say that the strategy profile σ_{−i} is λ-rational assuming σ_i. The set of λ-rational strategy profiles in G↾v is denoted by λRat(v).
Note that λ-rationality is a property of a strategy profile for all the players but one, player i. Intuitively, their rationality is justified by the fact that they collectively assume that player i will, eventually, play according to the strategy σ i : if it is the case, then everyone gets their payoff satisfied.
Finally, let us define a particular requirement: the vacuous requirement, that requires nothing, and with which every play is consistent.
Definition 3.4 (Vacuous requirement). In any game, the vacuous requirement, denoted by λ_0, is the requirement constantly equal to −∞.
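The notion of λ-consistency described in the introduction can be sketched as follows for prefix-independent payoffs. The encoding is an assumption: a play is summarized by the set of vertices it visits and by its payoff vector, `owner` maps each vertex to the player controlling it, and all names and values are hypothetical.

```python
import math

def is_consistent(visited, payoff, owner, lam):
    """True if each player's payoff is at least lam[v] on every visited vertex v they control."""
    return all(payoff[owner[v]] >= lam[v] for v in visited)

owner = {"a": 0, "b": 1}
vacuous = {"a": -math.inf, "b": -math.inf}  # the vacuous requirement
lam = {"a": 1, "b": 1}                      # a hypothetical stronger requirement

print(is_consistent({"a"}, (0, 1), owner, vacuous))   # nothing is required
print(is_consistent({"a", "b"}, (2, 2), owner, lam))  # both requirements are met
print(is_consistent({"a"}, (0, 1), owner, lam))       # player 0 gets less than lam[a]
```

With the vacuous requirement, the test succeeds for every play, whatever its payoff.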
3.2. Negotiation. We will show that SPEs in prefix-independent games are characterized by the fixed points of a function on requirements. That function captures a negotiation process: when a player has a requirement to satisfy, another player can hope for a better payoff than what they can secure in general, and therefore update their own requirement. Note that we always use the convention inf ∅ = +∞.
Definition 3.5 (Negotiation function, steady negotiation). Let G be a game. The negotiation function is the function that transforms any requirement λ on G into a requirement nego(λ) on G, such that for each i ∈ Π and v ∈ V_i, we have:
\[ \mathrm{nego}(\lambda)(v) = \inf_{\sigma_{-i} \in \lambda\mathrm{Rat}(v)} \; \sup_{\sigma_i} \; \mu_i(\langle \sigma_{-i}, \sigma_i \rangle). \]
If that infimum is realized for every λ, i and v ∈ V_i such that λRat(v) ≠ ∅, then the game G is called a game with steady negotiation².
Remark 3.6.The negotiation function satisfies the following properties.
• There exists a λ-rational strategy profile from v against the player controlling v if and only if nego(λ)(v) ̸ = +∞.
In the general case, the quantity nego(λ)(v) represents the worst case value that the player controlling v can ensure, assuming that the other players play λ-rationally.
Example 3.7. Let us consider the game of Example 2.18: in Figure 1b, on the first two lines below the states, we present the requirements λ_0 and λ_1 = nego(λ_0), which is easy to compute since any strategy profile is λ_0-rational: for each v, λ_1(v) is the classical worst-case value or antagonistic value of v, i.e. the best value the player controlling v can enforce against a fully hostile environment. Let us now compute the requirement λ_2 = nego(λ_1).
From c, there exists exactly one λ_1-rational strategy profile σ_− = σ , which is the empty strategy since player has never to choose anything. Against that strategy, the best and the only payoff player can get is 1, hence λ_2(c) = 1. For the same reasons, λ_2(d) = 2.
From b, player can force to get the payoff 2 or less, with the strategy profile σ : h ↦ c. Such a strategy is λ_1-rational, assuming the strategy σ : h ↦ d. Therefore, we have λ_2(b) = 2.
From a, player can force to get the payoff 2 or less, with the strategy profile σ : h ↦ d. Such a strategy is λ_1-rational, assuming the strategy σ : h ↦ c. But, he cannot force her to get less than the payoff 2, because she can force the access to the state b, and the only λ_1-consistent plays from b are the plays of the form (ba)^k bd^ω. Therefore, λ_2(a) = 2.
It will be proved in Section 6 that mean-payoff games are games with steady negotiation.
3.3. Link with Nash equilibria. Requirements and the negotiation function are able to capture Nash equilibria. Indeed, if λ_0 is the vacuous requirement, then nego(λ_0) characterizes the NE outcomes, in the following formal sense:
Theorem 3.8. Let G be a game with steady negotiation. Then, a play ρ in G is an NE outcome if and only if ρ is nego(λ_0)-consistent.

Proof.
• Let σ be a Nash equilibrium in G↾v_0, for some state v_0, and let ρ = ⟨σ⟩: let us prove that the play
• Conversely, let ρ be a nego(λ_0)-consistent play from a state v_0. Let us define a strategy profile σ such that ⟨σ⟩ = ρ, by:
- ⟨σ⟩ = ρ;
- for each history of the form ρ_0 ... ρ_k v with v ≠ ρ_{k+1}, let i be the player controlling ρ_k. Since the game G is with steady negotiation, the infimum:
−i be λ_0-rational strategy profile from ρ_k realizing that minimum, and let τ^k_i be some strategy from ρ_k such that τ^k_i(ρ_k) = v. Then, we define:
- for every other history h, the state σ(h) is defined arbitrarily.
Let us prove that σ is an NE: let σ′_i be a deviation of σ_i, let ρ′ = ⟨σ_{−i}, σ′_i⟩ and let ρ_0 ... ρ_k be the longest common prefix of ρ and ρ′. Let v = ρ′_{k+1}. Then, we have:
Example 3.9. Let us consider again the game of Example 2.18, with the requirement λ_1 given in Figure 1b. The only λ_1-consistent plays in this game, starting from the state a, are ac^ω, and (ab)^k d^ω with k ≥ 1. One can check that those plays are exactly the NE outcomes in that game.
In the following section, we will prove that, just as nego(λ_0) characterizes the NEs, the requirement that is the least fixed point of the negotiation function characterizes the SPEs.

Link between negotiation and SPEs
The notion of negotiation will enable us to find the SPEs, but also more generally the ε-SPEs, in a game.For that purpose, we need the notion of ε-fixed points of a function.
Remark 4.2.A 0-fixed point is a fixed point, and conversely.By Tarski's fixed point theorem, the negotiation function, which is a monotone function from a complete lattice to itself, has a least fixed point.That result can be generalized to ε-fixed points.
Proof. The following proof is a generalization of a classical proof of Tarski's fixed point theorem. Let Λ be the set of the ε-fixed points of the negotiation function. The set Λ is not empty, since it contains at least the requirement v ↦ +∞. Let λ* be the requirement defined by:
\[ \lambda^* : v \mapsto \inf_{\lambda \in \Lambda} \lambda(v). \]
For every ε-fixed point λ of the negotiation function, we have then λ*(v) ≤ λ(v) for each v, and then nego(λ*)(v) ≤ nego(λ)(v) since nego is monotone; and therefore, we have nego(λ*)(v) ≤ λ(v) + ε. As a consequence, we have:
\[ \mathrm{nego}(\lambda^*)(v) \leq \inf_{\lambda \in \Lambda} \lambda(v) + \varepsilon = \lambda^*(v) + \varepsilon. \]
The requirement λ* is an ε-fixed point of the negotiation function, and is therefore the least of them.
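As a complement to the lattice argument above, the sketch below shows the other classical way to reach a least fixed point, namely iterating a monotone function from the bottom element. It is only an illustration on a toy function: `toy` is a hypothetical monotone map on requirements over two states, not the negotiation function, and the iteration is cut off after `max_steps` rounds since, as recalled in the introduction, such iterations may need transfinitely many steps for the negotiation function itself.

```python
import math

def least_fixed_point(f, bottom, max_steps=1000):
    """Iterate f from `bottom` until the value no longer changes."""
    current = bottom
    for _ in range(max_steps):
        nxt = f(current)
        if nxt == current:
            return current
        current = nxt
    raise RuntimeError("no fixed point reached within max_steps")

def toy(lam):
    # Hypothetical monotone function: a requires at least 1, b requires at least what a does.
    return {"a": max(lam["a"], 1), "b": max(lam["b"], lam["a"])}

bottom = {"a": -math.inf, "b": -math.inf}  # the vacuous requirement
lfp = least_fixed_point(toy, bottom)
print(lfp)
```

On this toy function the iteration stabilizes after two rounds; the value it reaches is below every other fixed point, since any fixed point must give at least 1 to a and at least as much to b.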
In what follows, for a given game G and a given ε ≥ 0, we will write λ* for the least ε-fixed point of the negotiation function. Intuitively, the requirement λ* is such that, from every vertex v, the player i controlling v cannot enforce a payoff greater than λ*(v) + ε against a λ*-rational behaviour. Therefore, the λ*-consistent plays are such that if one player tries to deviate, it is possible for the other players to prevent them from improving their payoff by more than ε, while still playing rationally, which defines ε-SPE outcomes. Formally:
Theorem 4.4. Let G↾v_0 be a prefix-independent game played on a finite graph, and let ε ≥ 0. Let θ be a play starting in v_0. If there exists an ε-SPE σ such that ⟨σ⟩ = θ, then θ is λ*-consistent. If G is also a game with steady negotiation, then conversely, if θ is λ*-consistent, then it is an ε-SPE outcome.
Proof.
Let us define a requirement λ by, for each i ∈ Π and v ∈ V_i:
Then, for every history hv starting in v_0, the play ⟨σ↾hv⟩ is λ-consistent. In particular, the play θ is. Let us now prove that λ is an ε-fixed point of nego. We will then have λ ≥ λ*, which implies that the play θ is λ*-consistent. Let i ∈ Π, let v ∈ V_i, and let us assume toward contradiction (since the negotiation function is non-decreasing) that nego(λ)(v) > λ(v) + ε, that is to say:
Then, since all the plays generated by the strategy profile σ are λ-consistent, and therefore since any strategy profile of the form σ_{−i}↾hv is λ-rational, we have:
Therefore, there exists a history hv such that:
which is impossible if the strategy profile σ is an ε-SPE. Therefore, there is no such v, and the requirement λ is an ε-fixed point of the negotiation function.
• If G is a game with steady negotiation and θ is λ * -consistent, then θ is an ε-SPE outcome.
-A particular case: if there exists v accessible from v 0 such that λ * (v) = +∞.
In that case, for each u such that uv ∈ E, if the player controlling u chooses to go to v, no λ*-consistent play can be proposed to them from there; hence there is no λ*-rational strategy profile against that player from u, and nego(λ*)(u) = +∞. Since ε is finite and since λ* is an ε-fixed point of the negotiation function, it follows that λ*(u) = +∞. Since v is accessible from v_0, we can repeat this argument and show that λ*(v_0) = +∞; in that case, there is no λ*-consistent play θ from v_0, and the proof is done. Therefore, for the rest of the proof, we assume that for all v, we have λ*(v) ≠ +∞. As a consequence, since λ* is an ε-fixed point of the function nego, for each v accessible from v_0, we have nego(λ*)(v) ≠ +∞, which implies that for each such v, there exists a λ*-rational strategy profile against the player controlling v, starting from v.
The rest of the proof constructs the strategy profile σ and proves that it is an ε-SPE. That construction is illustrated by Figure 3.

Figure 3. The construction of σ

- Spare parts: the strategy profiles τ^{v*}.

Recall that since G is a game with steady negotiation, for the requirement λ*, for every player i and for every state v ∈ V_i, since by the previous point we assume nego(λ*)(v) ≠ +∞, we know that there exists a strategy profile τ^v_{−i} from v that is λ*-rational assuming a strategy τ^v_i and that satisfies the inequality: i.e. there exists a worst λ*-rational strategy profile against player i from the state v, with regard to player i's payoff. Our goal in this part of the proof is to construct a strategy profile τ^{v*}_{−i} that is λ*-rational assuming a strategy τ^{v*}_i, and that will be used to punish player i when they deviate from σ, until another player deviates. The strategy profile τ^v_{−i} and the strategy τ^v_i are not sufficient for that purpose, because if some history h compatible with τ^v_{−i} is such that μ_i(⟨τ^v↾h⟩) < μ_i(⟨τ^v⟩), then in the corresponding subgame, it may be possible for player i to deviate and get a payoff that would be smaller than or equal to μ_i(⟨τ^v⟩), but greater than μ_i(⟨τ^v↾h⟩). On the other hand, the construction of τ^{v*}_{−i} will ensure that each time player i deviates, the other players punish them at least as harshly as they were planning to do before the deviation.

Let us construct inductively the strategy profile τ^{v*}. We define it only on histories that are compatible with τ^{v*}_{−i}, since it can be defined arbitrarily on other histories. We proceed by assembling the strategy profiles of the form τ^w for various w ∈ V_i, and the histories after which we follow a new τ^w will be called the resets of τ^{v*}: they will be histories of the form hw′, where h is empty or last(h) = w. First, we set ⟨τ^{v*}⟩ = ⟨τ^v⟩: the one-state history v is then the first reset of τ^{v*}_{−i}. Then, for every history hww′ from v such that h is compatible with τ^{v*}_{−i}, w ∈ V_i, and w′ ≠ τ^{v*}_i(hw): let us decompose hww′ = h_1 h_2, so that the history h_1 first(h_2) is the longest reset of τ^{v*}_{−i} among the prefixes of hw. In other words, the strategy profile τ^{v*}↾h_1 first(h_2) has been defined as equal to τ^u over the prefixes of h_2 until w, where u = v if h_1 is empty, or u = last(h_1) otherwise. By prefix-independence of G and by definition of τ^u and τ^w, we have: Let us now separate two cases.

* Suppose first that there is equality: Then, we choose ⟨τ^{v*}↾hww′⟩ = ⟨τ^u↾uh_2⟩: the coalition of players against player i keeps following the same strategy profile.

* Suppose now that the inequality is strict: Then, we choose ⟨τ^{v*}↾hww′⟩ = ⟨τ^w↾ww′⟩: player i has done something that lowers the payoff they can ensure, and therefore the other players have to update their strategy profile in order to punish them more. The history hw is a reset of τ^{v*}_{−i}.

Since there are finitely many histories of each length, this process completely defines τ^{v*}. Moreover, all the plays constructed are λ*-consistent, hence the strategy profile τ^{v*}_{−i} is λ*-rational assuming τ^{v*}_i, as desired.

- Construction of σ.
Let us now construct inductively the strategy profile σ: we will prove in the next part of the proof that it is an ε-SPE. We proceed inductively, by defining all the plays ⟨σ↾hv⟩, for hv ∈ Hist(G↾v_0) with v ≠ σ(h). We maintain the induction hypothesis that such a play is always λ*-consistent.
* Let now huv be a history such that the strategy profile σ has been defined on all the prefixes of hu, which we now assume to be nonempty, but not on huv itself, and such that v ≠ σ(hu). Let i be the player controlling the state u.
Then, we define ⟨σ↾huv⟩ = ⟨τ^{u*}↾uv⟩, and inductively, for every history h′w starting from v and compatible with σ_{−i}↾huv, we define ⟨σ↾huh′w⟩ = ⟨τ^{u*}↾uh′w⟩. The strategy profile σ↾huv is then equal to τ^{u*}↾uv on any history compatible with τ^{u*}_{−i}. Since there are finitely many histories of each length, this process completely defines σ.
Consider a history h_0 w ∈ Hist(G↾v_0), a player i ∈ Π, and a deviation σ′_i of σ_i. Let ρ = h_0 ⟨σ↾h_0 w⟩, and let ρ′ = h_0 ⟨σ_{−i}↾h_0 w, σ′_i↾h_0 w⟩. We wish to prove that μ_i(ρ′) ≤ μ_i(ρ) + ε. First, if the play ρ′ is compatible with σ_i, then ρ′ = ρ and the proof is immediate. Now, if it is not, we let ρ′_{≤n} denote the shortest prefix of ρ′ such that ρ′_{n−1} ∈ V_i and ρ′_n ≠ σ_i(ρ′_{<n}), and such that ρ′_{≥n} is compatible with σ_{−i}↾ρ′_{≤n}. Thus, the transition ρ′_{n−1} ρ′_n marks the time when player i begins to deviate unilaterally from σ_i. Note, however, that ρ′_{≤n} can be either longer or shorter than h_0 w: player i may have already deviated in h_0 w, or may wait until later to effectively deviate. Be that as it may, the history ρ′_{<n} is a common prefix of the plays ρ and ρ′, and the substrategy profile σ↾ρ′_{≤n} has been defined during the construction of σ as equal to τ^{v*}↾ρ′_{n−1} ρ′_n, where v = ρ′_{n−1}, on any history compatible with σ_{−i}↾ρ′_{≤n}.

By construction of τ^{v*}, the sequence (nego(λ*)(ρ′_k))_{k ≥ n−1, ρ′_k ∈ V_i} is non-increasing. It is therefore stationary (or finite), because it can take only a finite number of values. Consequently, there is a finite number of resets along the play ρ′_{≥n−1}. Let ρ′_{n−1} … ρ′_m be the last (longest) one. Afterwards, the play ρ′_{≥m} is compatible with the strategy profile τ^w_{−i}↾ρ′_{m−1} ρ′_m, where w = ρ′_{m−1}. By definition of that strategy profile, we have the inequality μ_i(ρ′) ≤ nego(λ*)(ρ′_{m−1}).

We now need to prove nego(λ*)(ρ′_{m−1}) ≤ μ_i(ρ) + ε. Let ρ_{≤p} = ρ′_{≤p} denote the longest common prefix of ρ and ρ′ such that ρ_p ∈ V_i. Since player i does not control any vertex between ρ_p and ρ_{n−1}, and therefore cannot deviate, we have ρ_{≥p} = ⟨σ↾ρ_{≤p}⟩, which is λ*-consistent. As a consequence, we have μ_i(ρ) ≥ λ*(ρ_p).

Finally, since the sequence of the quantities nego(ρ
. Consequently, we have: The strategy profile σ is an ε-SPE.

5. A first way to handle negotiation: the abstract negotiation game

5.1. Informal definition. We have now proved that SPEs are characterized by the requirements that are fixed points of the negotiation function; but we need to know how to compute, in practice, the quantity nego(λ) for a given requirement λ. We first define an abstract negotiation game, which is conceptually simple but not directly usable for an algorithmic purpose, because it is defined on an uncountably infinite state space.
A similar definition was given in [FP17], as a tool in a general method to compute SPE outcomes in games whose payoff functions have finite range, which is not the case of mean-payoff games. Here, linking that game with our concepts of requirements, negotiation function and steady negotiation enables us to present an effective algorithm in the case of mean-payoff games, by constructing a finite version of the abstract negotiation game, the concrete negotiation game.
The abstract negotiation game from a state v_0 ∈ V_i, with regard to a requirement λ, is denoted by Abs_{λi}(G)↾v_0 and opposes two players, Prover and Challenger, with the following rules:
• First, Prover proposes a λ-consistent play ρ from v_0 (or loses, if she has no play to propose).
• Then, either Challenger accepts the play and the game terminates; or he chooses an edge ρ_k ρ_{k+1}, with ρ_k ∈ V_i, from which he can make player i deviate, using another edge ρ_k v with v ≠ ρ_{k+1}: then, the game starts again from v instead of v_0.
• In the resulting play (either eventually accepted by Challenger, or constructed by an infinity of deviations), Prover wants player i's payoff to be low, and Challenger wants it to be high.

That game gives us the basis of a method to compute nego(λ) from λ: the maximal payoff that Challenger (or C for short) can ensure in Abs_{λi}(G)↾[v_0], with v_0 ∈ V_i, is also the maximal payoff that player i can ensure in G↾v_0 against a λ-rational environment; hence the equality val_C(Abs_{λi}(G)↾[v_0]) = nego(λ)(v_0). A proof of that statement, with a complete formalization of the abstract negotiation game, is presented in Appendix A.
Example 5.1. Let us consider again the game of Example 2.18: the requirement λ_2 = nego(λ_1), computed in Section 3.2, is also presented on the third line below the states in Figure 1b. Let us use the abstract negotiation game to compute the requirement λ_3 = nego(λ_2).
From a, Prover can propose the play abd^ω, and the only deviation available to Challenger is going to c; he has of course no incentive to do so. Therefore, λ_3(a) = 2. From b, whatever Prover proposes at first, Challenger can deviate and go to a. Then, from a, Prover cannot propose the play ac^ω, which is not λ_2-consistent: she has to propose a play beginning with ab, and to let Challenger deviate once more. He can then deviate infinitely often that way, and generate the play (ba)^ω: therefore, λ_3(b) = 3. The other states keep the same values. Note that there exists no λ_3-consistent play from a or b, hence nego(λ_3)(a) = nego(λ_3)(b) = +∞. This proves that there is no SPE in that game.

5.2. An imperfect method: the negotiation sequence. A classical way to compute the least fixed point of a function is, as in the example above, to compute its iterations on the least element of the set we are considering until reaching a fixed point, which is then the least one. We call this sequence the negotiation sequence, and write it (λ_n)_{n∈N} = (nego^n(λ_0))_n. In many simple examples, computing the negotiation sequence with the abstract negotiation game is, in practice, how we will find the least fixed point of the negotiation function and solve SPE problems.
Example 5.2. Let G be the game of Figure 4, where each edge is labelled by the rewards r and r . Below the states, we present the requirements λ_0 : v ↦ −∞, λ_1 = nego(λ_0), λ_2 = nego(λ_1), λ_3 = nego(λ_2), and λ_4 = nego(λ_3). Let us explain those computations, using the abstract negotiation game.

From λ_0 to λ_1: since every play is λ_0-consistent, Prover can always propose whatever she wants. From the state a, whatever she (trying to minimize player 's payoff) proposes, Challenger can always make player deviate in order to loop on the state a. Then, in the game G, player gets the payoff 1, hence λ_1(a) = 1. From the state b, Prover (trying to minimize player 's payoff) can propose the play (bc)^ω. If Challenger makes player deviate to go to the state a, then Prover can propose the play a(bc)^ω. Even if Challenger makes player deviate infinitely often, he cannot give him more than the payoff 0, hence λ_1(b) = 0. Similar situations happen from the states c and d, hence

From λ_1 to λ_2: now, from the state b, whatever Prover proposes at first, Challenger can make player deviate and go to the state a. From there, since we have λ_1(a) = 1, Prover has to propose a play in which player gets the payoff 1. The only such plays also give the payoff 1 to player , hence λ_2(b) = 1. Similar situations explain λ_3(c) = 1, and λ_4(c) = 1. Finally, plays ending with the loop a^ω are all λ_4-consistent, hence Prover can always propose them, hence the requirement λ_4 is a fixed point of the negotiation function, and therefore the least.
The interested reader will find other such examples in Appendix B. However, this cannot be turned into an effective algorithm: the negotiation sequence is not always stationary.
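The iteration scheme, and the reason it must be bounded, can be sketched in Python. This is a hedged illustration only: `negotiation_sequence`, its `max_steps` bound and the two toy maps are hypothetical names introduced here, requirements are represented as dictionaries, and `slow_nego` is an artificial monotone map whose iterates increase towards 2 without reaching it; it is not the negotiation function of the game used in Theorem 5.3.

```python
import math
from fractions import Fraction

def negotiation_sequence(nego, vertices, max_steps=100):
    """Iterate nego from the least requirement v -> -infinity.

    Returns (sequence, stationary): stationary is True if a fixed point
    was reached within max_steps applications of nego.
    """
    lam = {v: -math.inf for v in vertices}
    seq = [lam]
    for _ in range(max_steps):
        nxt = nego(lam)
        if nxt == lam:
            return seq, True
        seq.append(nxt)
        lam = nxt
    return seq, False

# Hypothetical map with a fixed point reached after one step.
def constant_nego(lam):
    return {"a": 1}

# Hypothetical monotone map whose iterates 1, 3/2, 7/4, ... tend to 2.
def slow_nego(lam):
    a = lam["a"]
    if a == -math.inf:
        return {"a": Fraction(1)}
    return {"a": max(Fraction(1), (a + 2) / 2)}

seq_const, ok_const = negotiation_sequence(constant_nego, ["a"])
seq_slow, ok_slow = negotiation_sequence(slow_nego, ["a"], max_steps=5)
```

Exact rationals are used so that the non-stationary behaviour is not hidden by floating-point rounding. On `slow_nego`, no bound on the number of steps would ever make the loop terminate with a fixed point.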
Theorem 5.3. There exists a mean-payoff game on which the negotiation sequence is not stationary.

From a, the worst play that player could propose to player would be a combination of the cycles cd and d giving her exactly 1. But then, player will deviate to go to b, from which if player proposes plays in the strongly connected component containing c and d, then player will always deviate and generate the play (ab)^ω, and then get the payoff 2.
Then, in order to give her a payoff lower than 2, player has to go to the state e. Since player does not control any state in that strongly connected component, the play he will propose will be accepted: he will, then, propose the worst possible combination of the cycles ef and f for player , such that he gets at least his requirement λ_n(b). The payoff λ_{n+1}(a) is then the minimal solution of the system: 2 , and by induction, for all n > 0: which converges to 2 but never reaches it.

6. A tool to compute negotiation: the concrete negotiation game

6.1. Definition. In the abstract negotiation game, Prover has to propose complete plays, on which we can make the hypothesis that they are λ-consistent. In practice, there will often be an infinity of such plays, and therefore that game cannot be used directly for an algorithmic purpose. Instead, those plays can be given edge by edge, in a finite-state game. Its definition is more technical, but it can be shown that it is equivalent to the abstract one.

Definition 6.1 (Concrete negotiation game). Let G be a prefix-independent game played on a finite graph, let i ∈ Π and v_0 ∈ V_i, and let λ be a requirement on G. The concrete negotiation game of G↾v_0 is the two-player zero-sum game Conc_{λi}(G)↾s_0 = ({P, C}, S, (S_P, S_C), ∆, ν)↾s_0, defined as follows:
• Player P is called Prover, and player C is called Challenger.
• The set of states controlled by Prover is S_P = V × 2^V, where the state s = (v, M) contains the information of the current state v on which Prover has to define the strategy profile, and the memory M of the states that have been traversed so far since the last deviation, and that define the requirements Prover has to satisfy. The initial state is s_0 = (v_0, {v_0}).
• The set of states controlled by Challenger is S_C = E × 2^V, where in the state s = (uv, M), the edge uv is the edge proposed by Prover.
• The set ∆ contains three types of transitions: proposals, acceptations and deviations.
- The proposals are transitions in which Prover proposes an edge of the game G: they are the transitions of the form (v, M)(vw, M), with vw ∈ E.
- The acceptations are transitions in which Challenger accepts to follow the edge proposed by Prover (it is in particular his only possibility when that edge begins on a state that is not controlled by player i): they are the transitions of the form (vw, M)(w, M ∪ {w}). Note that the memory is updated.
- The deviations are transitions in which Challenger refuses to follow the edge proposed by Prover, as he can if that edge begins in a state controlled by player i: they are the transitions of the form (uv, M)(w, {w}), with u ∈ V_i, uw ∈ E and w ≠ v. The memory is erased, and only the new state the deviating edge leads to is memorized.
• Histories of the concrete negotiation game are projected onto G: the projection of the history H is the history Ḣ = h_0 … h_n in the game G, where h_0, …, h_n are the current states of the successive Prover states of H. That definition is naturally extended to plays.
• The payoff function ν_C = −ν_P measures player i's payoff, with a winning condition if the constructed strategy profile is not λ-rational, that is to say if, after finitely many deviations of player i, it can generate a play which is not λ-consistent: ν_C(π) = +∞ if after some index n ∈ N, the play π_{≥2n} contains no deviation and the play π̇_{≥n} is not λ-consistent; and ν_C(π) = μ_i(π̇) otherwise.

As in the abstract negotiation game, the goal of Prover is to find a λ-rational strategy profile that forces the worst possible payoff for player i, and the goal of Challenger is to find a possibly deviating strategy for player i that gives them the highest possible payoff.

Remark 6.2. The concrete negotiation game has the following properties.
• When π_{≥2n} contains no deviation, the memory of its states is increasing, and therefore eventually equal to the memory M = Occ(π̇_{≥n}). If it is the longest such suffix of π, it means that the play π̇_{≥n} is λ-consistent if and only if for each player j and each vertex v ∈ M ∩ V_j, we have μ_j(π̇) ≥ λ(v).
• When G is a mean-payoff game and when λ has finite values, the concrete negotiation game can be seen as a multidimensional two-player zero-sum mean-payoff game, with one dimension for each player, meant to control that each player gets the payoff they require, plus a special dimension ⋆, meant to measure player i's actual payoff. The rewards of the proposals are all equal to 0, and the rewards of acceptations and deviations are r̂_⋆((uv, M)(v′, N)) = 2 r_i(uv′), and r̂_j((uv, M)(v′, N)) = 2 r_j(uv′) − 2 max{λ(w) | w ∈ M ∩ V_j}. The payoff ν_C(π) then equals +∞ for every π that contains finitely many deviations and such that for some j ∈ Π, the mean-payoff μ̂_j(π) is negative, and ν_C(π) = μ̂_⋆(π) otherwise.

3 When we combine the notations π̇ and π_{≥n}, the notation π̇ is applied first; that is, the play π̇_{≥n} is the projection of the play π_{≥2n}, not π_{≥n}.
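The transition structure of Definition 6.1 can be made concrete with a short Python sketch. It is an illustration under assumptions that are not in the text: vertices of G are strings, edges are pairs of strings, memories are frozensets, and the function name `successors` and its parameters are hypothetical. It follows the reading in which a proposal leaves the memory unchanged, an acceptation adds the target state to the memory, and a deviation resets the memory to the new state alone.

```python
def successors(edges, controlled_by_i, state):
    """Successors of a state of the concrete negotiation game.

    A Prover state is (v, M); a Challenger state is ((u, v), M), where
    (u, v) is the edge proposed by Prover and M is a frozenset of vertices.
    """
    head, memory = state
    if isinstance(head, str):
        # Prover state: propose any edge of G leaving the current vertex.
        return [((head, w), memory) for (u, w) in edges if u == head]
    u, v = head
    # Acceptation: follow the proposed edge and extend the memory.
    result = [(v, memory | {v})]
    if u in controlled_by_i:
        # Deviations: take another edge from u and reset the memory.
        result += [(w, frozenset({w})) for (x, w) in edges if x == u and w != v]
    return result

# Hypothetical three-vertex graph in which player i controls only "a".
edges = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "c")]
s0 = ("a", frozenset({"a"}))
```

Since memories are subsets of V and every state carries one vertex or one edge, the set of reachable states is finite, which is the point of replacing the abstract game by this one.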
6.2. Link with the negotiation function. The concrete negotiation game is equivalent to the abstract one: the only differences are that the plays proposed by Prover are proposed edge by edge, and that their λ-consistency is not written in the rules of the game but in its payoff function.

Theorem 6.3. Let G be a Borel prefix-independent game played on a finite graph. Let λ be a requirement, let i be a player and let v_0 ∈ V_i. Then, we have val_C(Conc_{λi}(G)↾s_0) = nego(λ)(v_0). Moreover, if for each player i and every state v_0 ∈ V_i, Prover has an optimal strategy in Conc_{λi}(G)↾(v_0, {v_0}), then G is a game with steady negotiation.

Proof.
Let τ_P be a strategy such that sup_{τ_C} ν_C(⟨τ⟩) ≠ +∞, and let σ be the strategy profile defined by σ(Ḣ) = w for every history H compatible with τ_P (by induction, the projection is injective on the histories compatible with τ_P) with τ_P(H) = (vw, ·), and arbitrarily defined on any other histories. We prove that the strategy profile σ_{−i} is λ-rational assuming the strategy σ_i, and that sup

- The strategy profile σ_{−i} is λ-rational, assuming the strategy σ_i. Indeed, let us assume it is not. Then, there exists a history h = h_0 … h_n in G↾v_0 compatible with σ_{−i} such that the play ⟨σ↾h⟩ is not λ-consistent. Then, let: be the only history in Conc_{λi}(G)↾s_0 compatible with τ_P such that Ḣ = h. Let τ_C be a strategy constructing the history h, defined by: for every k, and: τ_C(H′(vw, M)) = (w, M ∪ {w}) for any other history H′(vw, M). Then, the play π = ⟨τ⟩ contains finitely many deviations (Challenger stops the deviations after having constructed the history h), and the play π̇_{≥n} is not λ-consistent. Therefore, we have ν_C(π) = +∞, which is false by hypothesis.

- Now, let us prove the inequality sup Let σ′_i be a strategy for player i, and let η = ⟨σ_{−i}, σ′_i⟩. Let τ_C be a strategy such that for every k: i.e. a strategy forcing η against τ_P. Then, since ν_C(⟨τ⟩) ≠ +∞ by hypothesis on τ_P, we have Moreover, if τ_P is optimal, then the λ-rational strategy profile σ_{−i} realizes the infimum: hence if there exists such an optimal strategy for every vertex v_0, then the game G is with steady negotiation.
Let σ_{−i} be a λ-rational strategy profile from v_0, assuming the strategy σ_i; let us define a strategy τ_P by τ_P(H(v, ·)) = (vσ(Ḣv), ·) for every history H and for every v ∈ V. Let us prove the inequality . Let τ_C be a strategy for Challenger, and let π = ⟨τ⟩. If ν_C(π) = +∞, then there exists n such that the play π_{≥2n} contains no deviation, i.e. π̇_{≥n} = ⟨σ↾π̇_{≤n}⟩, and that play is not λ-consistent, which is impossible. Therefore, we have ν_C(π) ≠ +∞, and as a consequence , hence the desired inequality.

The dotted arrows indicate the deviations, and the transitions have been labelled with the immediate rewards defined as in the remark above. The transitions that are not labelled are either zero for the three coordinates, or meaningless since they cannot be used more than once. The red arrows indicate a (memoryless) optimal strategy for Challenger. Against that strategy, the lowest payoff Prover can ensure is 2. Therefore, we have nego(λ_1)(v_0) = 2, in line with the abstract game in Example 5.1.

6.4. Resolution. We now know that nego(λ)(v), for a given requirement λ, a given player i and a given state v ∈ V_i, is the value of the concrete negotiation game Conc_{λi}(G)↾(v, {v}). But we still do not know how to compute that value. We present here an important result for that purpose.
For any game G↾v_0 and any memoryless strategy σ_i, we write G↾v_0[σ_i] for the graph induced by σ_i, defined as the underlying graph of G where all the transitions that are not compatible with σ_i, and all the vertices that are then no longer accessible from v_0, have been omitted.

Lemma 6.4. Let G↾v_0 be a mean-payoff game, let i be a player, let λ be a requirement and let Conc_{λi}(G)↾s_0 be the corresponding concrete negotiation game. There exists a memoryless strategy τ_C that is optimal for Challenger.
Proof. The structure of that proof is inspired by the proof of Lemma 14 in [VCD+15].
Let ν′_C be the payoff function defined by:
• ν′_C(π) = +∞ if there exists n such that π_{≥2n} contains no deviation, and such that the play π̇_{≥n} is not λ-consistent.
The payoff function ν′_C is then defined as ν_C, but with a limit superior instead of inferior. The payoff function ν′_C is concave. Indeed, let π and χ be two plays in Conc_{λi}(G)↾s_0, and let ξ be a shuffling of them. Let us check that ν Otherwise, we also have ν′_C(ξ) ≠ +∞: if either π or χ contains infinitely many deviations, then so does ξ. If both contain finitely many deviations, then so does ξ: the states of ξ therefore eventually have the same memory M, which is also the memory of, eventually, the states of both π and χ. Now, since ν′_C(π), ν′_C(χ) ≠ +∞, we have μ_j(π̇), μ_j(χ̇) ≥ λ(v) for each player j and every v ∈ M ∩ V_j. Since mean-payoff functions are convex, it is also the case for the play ξ, which is a shuffling of π and χ. Hence ν′_C(ξ) ≠ +∞.
Therefore, by Lemma 2.29, Challenger has a memoryless strategy that is optimal with regard to the payoff function ν′_C: let us write it τ_C. Now, we want to prove that the memoryless strategy τ_C is also optimal with regard to ν_C. Note that for every play π, we have ν_C(π) ≤ ν′_C(π), and therefore val_C(Conc_{λi}(G)↾s_0) ≤ α, where α is the value of the game Conc_{λi}(G)↾s_0 with the payoff function ν′_C instead of ν_C. Therefore, we have proven that τ_C is optimal with regard to ν_C if we prove that inf_{τ_P} ν_C(⟨τ⟩) ≥ α.
Let π be a play compatible with τ_C, i.e. an infinite path from s_0 in the graph and by Lemma 2.16, we have: where C is the set of the simple cycles of the graph Conc_{λi}(G)↾s_0[τ_C]. Now, for each such cycle, there exists a history H such that the play Hc^ω is compatible with the strategy τ_C, and therefore satisfies ν′_C(Hc^ω) ≥ α, and consequently MP_i(ċ) ≥ α. Therefore, we have μ_i(π̇) ≥ α, and the strategy τ_C is optimal with regard to the payoff function ν_C.

Using this lemma, computing nego(λ) for any given λ amounts to looking for an optimal path for Prover in each graph Conc_{λi}(G)↾s_0[τ_C]. When (V, E) is a graph, we write SConn(V, E) for the set of its strongly connected components accessible from the vertex v.
Let K be a strongly connected subgraph of a concrete negotiation game. If K contains no deviation, then the states of K all share the same memory: let us write it Mem(K). If K contains at least one deviation, we define Mem(K) = ∅. Then, we write: The set in which the variable x evolves is the set of payoffs of plays that Prover can construct against Challenger, when she chooses to go in the strongly connected component K, and observes the requirements she has to observe: those stored in the common memory of the states of K if K contains no deviations, and none otherwise. Hence the following formal result.

Lemma 6.5. Let G be a mean-payoff game, let λ be a requirement, let i be a player, and let v ∈ V_i. Then, we have: Moreover, mean-payoff games are games with steady negotiation.
Proof. By Lemma 6.4, in the game Conc_{λi}(G)↾s, where s = (v, {v}), there exists a memoryless strategy τ_C which is optimal for Challenger. Therefore, the best payoff that Prover can ensure against every strategy of Challenger is the best payoff she can ensure against τ_C. It follows from Theorem 6.3 that the highest value player i can enforce against a hostile λ-rational environment is the minimal payoff of Challenger in a path in the graph Conc_{λi}(G)↾s[τ_C] starting from s. For any such path π, there exists a strongly connected component K of Conc_{λi}(G)↾s[τ_C] such that after a finite number of steps, the path π is a path in K. Let us now prove that the least payoff of Challenger in such a play is given by opt(K). Let us distinguish two cases.
• If there is at least one deviation in K.
Then, for every play π in K, it is possible to transform π into a play π′ with μ(π̇′) = μ(π̇), which contains infinitely many deviations: it suffices to add round trips to a deviation, endlessly, but less and less often. Therefore, the outcomes ν_C(π) of plays in K are exactly the mean-payoffs μ_i(π̇) of plays in K (plus possibly +∞); and in particular, the lowest payoff Challenger can get in K is the quantity: By Lemma 2.16, the set of possible values of μ(π̇) for all plays π in K is exactly the set:
• If there is no deviation in K.
Since all the plays in K contain finitely many deviations (actually none), for every path π in K, we have ν_C(π) = +∞ if and only if there exists j ∈ Π and u ∈ V_j ∩ Mem(K) such that μ_j(π̇) < λ(u). Then, the lowest outcome Prover can get in K is: that is to say opt(K).
Theorem 6.3 then yields the desired formula. Moreover, let us notice that in all cases, Prover can choose one optimal play against each memoryless strategy of Challenger. By determinacy of Borel games, it follows that Prover has an optimal strategy; hence, by Theorem 6.3, mean-payoff games are games with steady negotiation.
We are now able to compute nego(λ) for a given λ. However, because of Theorem 5.3, that is not sufficient to compute the least ε-fixed point λ*, and then to decide the SPE threshold problem. Nevertheless, we will prove that λ* can also be extracted from the concrete negotiation game.

Analysis of the negotiation function

where every P ∈ Φ_d is a polyhedron, such that for each such P there exist ā_P ∈ R^D \ {0̄} and b_P ∈ R such that for every x̄ ∈ P, the vector ȳ = f(x̄) satisfies: where · is the canonical scalar product.
From now on, we can therefore drop the downward sealing. Moreover, let us note that the projection x ↦ x_i, as an affine mapping over the polytope Q ∩ R_λ, finds its minimum on a vertex of that polytope. Each vertex {x} of Q ∩ R_λ is the intersection between a face F of the polytope Q and a face F′ of the polyhedron R_λ. Such a face F is of the form Conv_{c∈C} MP(ċ), where C is a subset of SC(K); and such a face F′ is of the form H_{λW}, where W is a subset of Mem(K), and where H_{λw} is the hyperplane {x | x_j = λ(w)} for each j and w ∈ V_j. Thus, if we define the set: it is included in the polytope Q ∩ R_λ, and contains all its vertices; hence the equality opt(K) = inf{x_i | x ∈ X}. We can therefore write: where f_CW is, for every C and W, the function defined by: Let us now study each of those functions f_CW. Given C, W and λ, we have f_CW(λ) ≠ +∞ if and only if the three following conditions are satisfied.
• First, the intersection I is a singleton. The elements of I are the points of the form x = (Σ_{c∈C} α_c MP_j(ċ))_j with ᾱ ∈ R^C, Σ_c α_c = 1, and x_j = λ(w) for each j and w ∈ W ∩ V_j. The set I is therefore a singleton if and only if the matrix: is such that there exists exactly one vector ᾱ ∈ R^C satisfying: That condition is satisfied if and only if A is invertible, which can be decided in a time polynomial in the size of A, and does not actually depend on λ: either it is not satisfied, and the function f_CW is constantly equal to +∞, or it is, and only the following conditions must be considered.
• Second, the unique element of I belongs to Conv_{c∈C} MP(ċ). That is the case if and only if the vector: Thus, the vector ᾱ = A^{−1}(Bλ̄ + β) has non-negative coordinates if and only if λ̄ belongs to the set: which, as a pre-image of a polyhedron by an affine function, is itself a polyhedron, which can be constructed in a time polynomial in the size of A, B and β.
• Third, the vector: x = (MP_j(ċ))_{j∈Π, c∈C} ᾱ is such that for each j and each w ∈ Mem(K) (not only in W), we have x_j ≥ λ(w). The set P_1 of requirements λ̄ satisfying that condition can itself be written as the pre-image of a polyhedron by an affine function, and is therefore itself a polyhedron, which we can construct in a time polynomial in the size of A, B, β and (MP_j(ċ))_{j∈Π, c∈C}.
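For a fixed requirement, the enumeration over the pairs (C, W) can be sketched in Python. This is a hedged illustration, not the construction of the paper: the function names, the encoding of cycle payoffs as dictionaries and of the requirements as (player, bound) pairs are assumptions made here, and the sketch evaluates opt(K) at one given λ instead of building the polyhedra P_0 and P_1 symbolically. For each C and each W with |W| = |C| − 1, it solves the square system formed by Σ_c α_c = 1 and the equalities x_j = bound, keeps the solution if ᾱ is non-negative and all requirements hold, and returns the least x_i found.

```python
from fractions import Fraction
from itertools import combinations

def solve_square(A, b):
    """Gauss-Jordan elimination over exact rationals.

    Returns the unique solution of A x = b, or None if A is singular.
    """
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def opt(cycle_payoffs, constraints, i):
    """Least x_i over convex combinations x of the cycle payoffs such that
    x_j >= bound for every (j, bound) in constraints; None stands for +inf.
    """
    best = None
    m = len(cycle_payoffs)
    for k in range(1, m + 1):
        for C in combinations(range(m), k):
            for W in combinations(range(len(constraints)), k - 1):
                # Rows: sum of alpha = 1, then one tight requirement per w in W.
                A = [[1] * k] + [
                    [cycle_payoffs[c][constraints[w][0]] for c in C] for w in W
                ]
                b = [1] + [constraints[w][1] for w in W]
                alpha = solve_square(A, b)
                if alpha is None or any(a < 0 for a in alpha):
                    continue
                x = {
                    j: sum(a * cycle_payoffs[c][j] for a, c in zip(alpha, C))
                    for j in cycle_payoffs[0]
                }
                if all(x[j] >= bound for j, bound in constraints):
                    if best is None or x[i] < best:
                        best = x[i]
    return best

# Hypothetical component with two simple cycles and two players "i" and "j".
cycles = [{"i": 0, "j": 0}, {"i": 2, "j": 1}]
```

With the requirement x_j ≥ 1/2, the minimum is reached by mixing both cycles equally, which gives player i the payoff 1. The enumeration is exponential in the number of cycles, in line with the exponential bound discussed in the text.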
Therefore, the function f_CW is equal to +∞ outside of the polyhedron P_0 ∩ P_1, and satisfies: inside it. It is therefore an affine function of which a representation can be constructed in a time polynomial in the size of K. Therefore, a representation of opt(K) = inf_{C,W} f_CW(λ̄) as a piecewise affine function of λ̄ can be constructed in a time exponential in the size of K, and the negotiation function, expressed by: nego : λ̄ ↦ sup

• If such an SPE exists: let us write it σ, and let ρ = ⟨σ⟩. Since μ_S(ρ) = 1, the sink state ⊥ is never visited. Let us define a valuation ν on X as follows: for each variable x, we have ν(x) = 1 if and only if μ_x(ρ) < 1. Now, let C be a clause of φ: since C, as a state, is necessarily visited infinitely often and with a fixed frequency in the play ρ (because no player ever goes to the sink state ⊥), one of its successors, say (C, L), is visited with a non-negligible frequency (more formally, the time between two occurrences of (C, L) is bounded). If L is a positive literal, say x, then by definition of ν, we have ν(x) = 1 and the clause C is satisfied.
If L has the form ¬x, then each time the state (C, ¬x) is traversed, player x has the possibility to deviate and to go to the sink state ⊥, where he is sure to get the payoff 1. Since σ is an SPE, it means that he already gets the payoff 1 in the play ρ. By definition of ν, we then have ν(x) = 0, hence the literal ¬x is satisfied, hence so is the clause C.
The valuation ν satisfies all the clauses of φ, and therefore satisfies the formula φ itself.
• If φ is satisfied by some valuation ν: let us define a strategy profile σ by: σ_S(hC) = (C, L) for each history hC where C is a clause of φ, where L is a literal of C that is satisfied in the valuation ν; and σ_x(h(C, ¬x)) = ⊥ if and only if ν(x) = 1 for each history h(C, ¬x) where C is a clause of φ and x is a variable. Any other state has only one successor, hence we have now completely defined a strategy profile. Now, let us prove it is an SPE, in which Solver gets the payoff 1.
Let hC be a history, where C is a clause of φ.We want to prove that σ↾hC is a Nash equilibrium, in which Solver gets the payoff 1.Let ρ = ⟨σ ↾hC ⟩.If µ S (ρ) < 1, i.e. if ρ is of the form hD(D, ¬x)⊥ ω , then by definition of σ we have ν(x) = 0.But then, we cannot But then, at the third step, from the state b: whatever Prover proposes at first, Challenger can deviate to reach the state a.Then, Prover has to propose a λ 2 -consistent play from a, i.e. a play in which player gets at least the payoff 2: such a play necessarily end in the state d, i.e. after possibly some prefix, Prover proposes the play abd ω .But then, Challenger can always deviate to go back to the state a; and the play which is thus created is (ab) ω which gives player the payoff 3. Finally, from the states a and b, there exists no λ 3 -consistent play, and therefore no λ-rational strategy profile.and for all n ≥ 4, λ n = λ 4 .
Example B.2. In all the previous examples, all the games whose underlying graphs were strongly connected contained SPEs. Here is an example of a game with a strongly connected underlying graph that does not contain SPEs. This game is similar to the game of Example 2.18, hence we do not give the details of the computation of the negotiation sequence. At the first step, the requirement $\lambda_1$ captures the antagonistic values. Then, from the state $c$, if player forces the access to the state $b$, then player must get at least 1: the worst play that can be proposed to player is then $(babc)^\omega$, which gives player the payoff $\frac{3}{2}$. From the state $f$, if player forces the access to the state $g$, then the worst play that can be proposed to them is $g^\omega$. Then, from the state $d$, if player forces the access to the state $c$, then player must get at least $\frac{3}{2}$: the worst play that can be proposed to player is then $(cccd)^\omega$, which gives player the payoff $\frac{1}{2}$. At the same time, from the state $e$, player can now force the access to the state $f$: then, the worst play that can be proposed to them is $fg^\omega$. But then, from the state $c$, player can now force the access to the state $e$: then, the worst play that can be proposed to them is $efg^\omega$. And finally, from that point, if from the state $d$ player forces the access to the state $c$, then player must have at least the payoff 2; and therefore, the worst play that can be proposed to player is now $(ccd)^\omega$, which gives her the payoff

Definition 2.14 (Downward sealing). Given a set $Y \subseteq \mathbb{R}^D$, the downward sealing of $Y$ is the set $\llcorner Y = \left\{ \left( \min_{z \in Z} z_d \right)_{d \in D} \;\middle|\; Z \text{ is a finite subset of } Y \right\}$.

Two NEs and one SPE
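The downward sealing of Definition 2.14 can be illustrated on a finite set of points by a minimal sketch. It assumes that $Y$ is finite, that dimensions are indexed by tuple positions, and that only non-empty subsets $Z$ are considered; the function name is ours, and the brute-force enumeration is exponential in the size of $Y$.

```python
from itertools import combinations
from typing import Iterable, Set, Tuple

Point = Tuple[float, ...]


def downward_sealing(points: Iterable[Point]) -> Set[Point]:
    """Return all coordinate-wise minima of non-empty subsets of a finite set.

    Assumption: every point has the same number of coordinates.
    """
    pts = sorted(set(points))
    sealed: Set[Point] = set()
    for size in range(1, len(pts) + 1):
        for subset in combinations(pts, size):
            dims = range(len(subset[0]))
            # Take the minimum of each coordinate over the chosen subset.
            sealed.add(tuple(min(z[d] for z in subset) for d in dims))
    return sealed


# The two points (1, 3) and (2, 0) generate one additional point, (1, 0).
example = downward_sealing([(1, 3), (2, 0)])
```

Singleton subsets return the points of $Y$ themselves, so the result always contains $Y$.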

Figure 2. A game with an infinity of SPEs

6.3. Example. Let us consider again the game from Example 2.18. Figure 6 represents the game $\mathrm{Conc}_{\lambda_1}(G)$ (with $\lambda_1(a) = 1$ and $\lambda_1(b) = 2$), where the dashed states are controlled by Challenger, and the other ones by Prover.

Definition 7.1 (Piecewise affine function). Let $D$ be a finite set of dimensions. A function $f : \mathbb{R}^D \to \mathbb{R}^D$ is piecewise affine if for each $d \in D$, there exists a finite partition $\Phi_d$ of $\mathbb{R}^D$,

Figure 7. The negotiation function on the games of Examples 2.19 and 2.18

Figure 8. The game $G_\varphi$

3. This example shows how a new requirement can emerge from the combination of several cycles. Let $G$ be the following game: The requirement $\lambda_5$ is a fixed point of the negotiation function.
Two examples of games

Example 2.15. In $\mathbb{R}^2$, if $Y$ is the blue area in Figure 2b, then $\llcorner Y$ is obtained by adding the gray area.

Lemma 2.16 [CDE+10]. Let $G$ be a mean-payoff game whose underlying graph is strongly connected. The set of the payoffs $\mu(\rho)$, where $\rho$ is a play in $G$, is exactly the set: $\llcorner\, \mathrm{Conv}_{c \in \mathrm{SC}(V,E)}$

Proof. Let $i \in \Pi$, let $v \in V_i$, and let $\lambda$ be a requirement. Let $\tau_C$ be a memoryless strategy of Challenger in the game $\mathrm{Conc}_{\lambda i}(G)$, and let $K$ be a strongly connected component of the graph $\mathrm{Conc}_{\lambda i}(G){\restriction_s}$, where $s = (v, \{v\})$. The result will follow from the fact that the quantity $\mathrm{opt}(K)$ is, itself, a piecewise affine function of $\bar\lambda$. Note that the underlying graph of the game $\mathrm{Conc}_{\lambda i}(G)$ does not depend on $\lambda$. For a given $\lambda$, let us consider the polytope $Q$

Remark 7.2. The function $f$ is fully represented by the family $(\Phi_d, (\bar{a}_P, b_P)_{P \in \Phi_d})_{d \in D}$. That representation is finite if each polyhedron $P \in \Phi_d$, for each dimension $d$, is defined by rational equations, and if each $\bar{a}_P$ and each $b_P$ has rational or infinite values.

Theorem 7.3. Let us identify every requirement $\lambda$ with the vector $\bar\lambda = (\lambda(v))_{v \in V}$. Then, the negotiation function is piecewise affine, and a finite representation of it can be constructed in time doubly exponential in the size of $G$.
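To make the representation of Remark 7.2 concrete, here is a minimal sketch of how a family $(\Phi_d, (\bar{a}_P, b_P)_{P \in \Phi_d})_{d \in D}$ could be stored and evaluated. It rests on assumptions that are not taken from the paper: on each piece $P$, coordinate $d$ of $f$ is taken to be $\bar{a}_P \cdot x + b_P$; each polyhedron is encoded as a list of non-strict inequalities; the first matching piece is used on shared boundaries; and only $b_P$ may be infinite. All names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: a half-space (coeffs, bound) means coeffs . x <= bound.
HalfSpace = Tuple[Sequence[float], float]
# A piece is (polyhedron, a_P, b_P); assumed value on it: a_P . x + b_P.
Piece = Tuple[List[HalfSpace], Sequence[float], float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def in_polyhedron(x: Sequence[float], halfspaces: List[HalfSpace]) -> bool:
    """Check every inequality defining the polyhedron."""
    return all(dot(coeffs, x) <= bound for coeffs, bound in halfspaces)


def evaluate(pieces_by_dim: Dict[str, List[Piece]], x: Sequence[float]) -> Dict[str, float]:
    """Evaluate each coordinate on the first piece that contains x."""
    result = {}
    for dim, pieces in pieces_by_dim.items():
        for halfspaces, a, b in pieces:
            if in_polyhedron(x, halfspaces):
                result[dim] = dot(a, x) + b
                break
        else:
            raise ValueError(f"no piece of dimension {dim!r} contains {x!r}")
    return result


# One-dimensional illustration: x + 1 on the half-line x <= 2, +infinity elsewhere.
f = {
    "v": [
        ([([1.0], 2.0)], [1.0], 1.0),
        ([([-1.0], -2.0)], [0.0], math.inf),
    ]
}
inside = evaluate(f, (1.0,))
outside = evaluate(f, (3.0,))
```

The illustration mirrors the shape of a function that is affine inside a polyhedron and equal to $+\infty$ outside of it. Floating-point numbers are used only for brevity; exact rationals would be the natural choice for a faithful representation.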
This work is licensed under the Creative Commons Attribution License. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/ or send a letter to Creative Commons, 171 Second St, Suite 300, San Francisco, CA 94105, USA, or Eisenacher Strasse 2, 10777 Berlin, Germany