Games Where You Can Play Optimally with Arena-Independent Finite Memory

For decades, two-player (antagonistic) games on graphs have been a framework of choice for many important problems in theoretical computer science. A notorious one is controller synthesis, which can be rephrased through the game-theoretic metaphor as the quest for a winning strategy of the system in a game against its antagonistic environment. Depending on the specification, optimal strategies might be simple or quite complex, for example having to use (possibly infinite) memory. Hence, research strives to understand which settings allow for simple strategies. In 2005, Gimbert and Zielonka provided a complete characterization of preference relations (a formal framework to model specifications and game objectives) that admit memoryless optimal strategies for both players. In the last fifteen years however, practical applications have driven the community toward games with complex or multiple objectives, where memory -- finite or infinite -- is almost always required. Despite much effort, the exact frontiers of the class of preference relations that admit finite-memory optimal strategies still elude us. In this work, we establish a complete characterization of preference relations that admit optimal strategies using arena-independent finite memory, generalizing the work of Gimbert and Zielonka to the finite-memory case. We also prove an equivalent to their celebrated corollary of great practical interest: if both players have optimal (arena-independent-)finite-memory strategies in all one-player games, then it is also the case in all two-player games. Finally, we pinpoint the boundaries of our results with regard to the literature: our work completely covers the case of arena-independent memory (e.g., multiple parity objectives, lower- and upper-bounded energy objectives), and paves the way to the arena-dependent case (e.g., multiple lower-bounded energy objectives).

Preference relations encode virtually all classical game objectives, both qualitative and quantitative, and let us reason in a well-founded framework under minimal assumptions. See Example 2.3 for illustrations of classical objectives encoded as preference relations.
Memoryless optimal strategies. Remarkably, several canonical classes of games that have been around for decades and have proved their usefulness over and over (e.g., mean-payoff [EM79], parity [EJ88, Zie98], or energy games [CdAHS03]) share a desirable property: they all admit memoryless optimal strategies for both players. That is, for every strategy σ i of P i, there is a strategy σ i ML which is at least as good (i.e., wins whenever σ i wins or ensures at least the same payoff) and that uses no memory at all. Such a memoryless strategy always picks the same edge when in the same state, regardless of what happened earlier in the game.
Memoryless strategies are the simplest kind of strategies one can use in a turn-based game on a graph. Therefore, it is quite interesting that they suffice for objectives as rich as the ones we just discussed. Following this observation, a lot of effort has been put into understanding which games admit memoryless optimal strategies, and in identifying the exact frontiers of memoryless determinacy. Let us mention, non-exhaustively, works by Gimbert and Zielonka [GZ04,GZ05] (culminating in a complete characterization), Aminof and Rubin [AR17] (through the prism of first-cycle games), and Kopczyński [Kop06] and Bianco et al. [BFMM11] (half-positional determinacy). All these advances were built by identifying the common underlying mechanisms in ad hoc proofs for specific classes of games, and generalizing them to wide classes (e.g., the first-cycle games of Aminof and Rubin are inspired by the seminal paper of Ehrenfeucht and Mycielski on mean-payoff games [EM79]).
Gimbert and Zielonka's approach. Arguably, the most important result in this direction is the complete characterization of preference relations admitting memoryless optimal strategies, established in [GZ05], fifteen years ago. By complete characterization, we mean sufficient and necessary conditions on the preference relations.
This result can be stated as follows: a preference relation admits memoryless optimal strategies for both players on all arenas if and only if the relation (used by P 1 ) and its inverse (used by P 2 ) are monotone and selective. These concepts will be defined formally in Section 3.1, but let us give an intuition here. Roughly, a preference relation is monotone if it is stable under prefix addition: that is, given two sequences of colors such that one is strictly preferred to the other, it is impossible to reverse this order of preference by adding the same prefix to both sequences. Selectivity is similarly defined with regard to cycle mixing: if a preference relation is selective, then, starting from two sequences of colors, it is impossible to create a third one by mixing the first two in such a way that the third one is strictly preferred to the first two. Observe that these elegant notions coincide with the natural intuition that memoryless strategies suffice if there is no interest in behaving differently in a state depending on what happened earlier.
In addition to this complete characterization, Gimbert and Zielonka proved another great result, of high interest in practice [GZ05, Corollary 7]: as a by-product of their approach, they obtain that if memoryless strategies suffice in all one-player games of P 1 and all one-player games of P 2, they also suffice in all two-player games. Such a lifting corollary provides a neat and easy way to prove that a preference relation admits memoryless optimal strategies without proving monotony and selectivity at all: proving it in the two one-player settings suffices.

This lifting fails for finite memory: in the game of Figure 1, P 1 has no finite-memory winning strategy. This can be shown using a standard argument: whatever the amount of memory used by P 1, P 2 may loop in s 2 long enough to exceed the bound up to which P 1 can track the sum accurately, thus dooming P 1 to fail to reset the sum to zero in s 1 infinitely often.

Figure 1: P 1 (circle) needs infinite memory to win in this game (by always resetting the sum of weights to zero by looping long enough in s 1 before going back to s 2), whereas both players have finite-memory optimal strategies in all one-player games using the same preference relations.
This modest example proves that Gimbert and Zielonka's approach cannot work in full generality in the finite-memory case, and for good reasons. Informally, in this case, the corollary breaks down because of (the absence of some sort of) monotony. In the case of memoryless strategies, as in [GZ05], P 1 is already doomed in one-player games in the absence of monotony: two prefixes to distinguish (in order to play optimally) can be hardcoded as different paths leading to the same state in a game arena, as if they were chosen by P 2 in a two-player game. In the case of finite-memory strategies, however, the situation is different. In one-player games, the number of such paths that can be hardcoded in an arena is always bounded, hence finite memory might suffice to react, i.e., to keep track of which prefix is the current one and how to behave accordingly. However, in two-player games, P 2 might create an infinite number of prefixes to distinguish (using a cycle), thus requiring P 1 to use infinite memory to do so. This is exactly what happens in the example above: in any one-player game, the largest sum that P 1 has to track is bounded, whereas P 2 can make this sum as large as he wants in two-player games.
Our approach. In a nutshell, we generalize Gimbert and Zielonka's results -characterization and lifting corollary -to the case of arena-independent finite memory. That is, we encompass all situations where the memory needed by the two players is solely dependent on the preference relation (e.g., colors, dimensions of weight vectors), and not on the game arena (i.e., number of edges/states). Let us take some classical examples to illustrate this notion.
• All memoryless-determined relations, studied in [GZ05], use arena-independent memory: the memory required (none) is the same for all arenas.

• Combinations of parity objectives use arena-independent memory [CHP07]: the memory only depends on the number of objectives and the number of priorities, both parameters of the preference relation, not on the size of the arena.

This informal concept of arena-independent memory is transparent in our work: in all our results, we use memory skeletons (essentially Mealy machines without a next-action function; see Section 2) that suffice for all arenas, and that are at the basis of the strategies we build. A quick look at our main concepts (Section 3.1) and results (Section 3.2) suffices to grasp the formalism behind this intuition. This restriction to arena-independent memory is natural given the counterexample to a general approach presented above. It is also important to note that it is not as restrictive as it may seem, as hinted by the examples above: we are not restricted to constant memory, but to memory depending only on the parameters of the preference relation (or, equivalently, the objective), and not of the arena. This framework thus already encompasses many objectives from the literature (e.g., [EM79, EJ88, Zie98, CdAHS03, BMR + 18, FH10, CHP07, BHR16, CDRR15, BFL + 08, BHM + 17]), as well as possible extensions. We discuss this topic in more detail in Section 6, where we provide a precise description of the frontiers of our results within the current research landscape.
Let us also highlight that the arena-independent case, which we solve here, is an exact equivalent to Gimbert and Zielonka's results in the finite-memory case: the memoryless case is de facto arena-independent. Therefore, this paper strictly generalizes [GZ05] by allowing the study of any arena-independent memory skeleton instead of the unique trivial one corresponding to memoryless strategies.
Outline of our contributions. Informally, our characterization can be stated as follows: given a preference relation and a memory skeleton M, both players have optimal finite-memory strategies based on skeleton M in all games if and only if the relation and its inverse are M-monotone and M-selective.
These last two concepts are key to our approach. Intuitively, they correspond to Gimbert and Zielonka's monotony and selectivity, modulo a memory skeleton. Recall that monotony and selectivity are related to stability of the preference relation with regard to prefix addition and cycle mixing, respectively. Our more general concepts of M-monotony and M-selectivity serve the same purpose, but they only compare sequences of colors that are deemed equivalent by the memory skeleton. For the sake of illustration, take selectivity: it implies that one has no interest in mixing different cycles of the game arena. For its generalization, the memory skeleton is taken into account: M-selectivity implies that one has no interest in mixing cycles of the game arena that are read as cycles on the same memory state in the skeleton M.
Let us give a quick breakdown of our approach. In Section 2, we introduce all basic notions, including the memory skeletons, and we establish several technical results. We also discuss optimal strategies and Nash equilibria, their relationship, and their roles in our approach.
Section 3 is dedicated to our characterization, and consists of three parts. In Section 3.1, we introduce the concepts of M-monotony and M-selectivity, cornerstones of our work. We also present two essential tools to establish the characterization: prefix-covers and cyclic-covers of arenas. Section 3.2 states formally our characterization (Theorem 3.6), as well as the corresponding lifting corollary (Corollary 3.9), from one-player to two-player games. We close this overview with an example of application, in Section 3.3.
The proof of the characterization (Theorem 3.6) is split in two. In Section 4, we establish the implication from (the sufficiency of) finite memory based on M to M-monotony (Theorem 4.1) and M-selectivity (Theorem 4.2) of the preference relation. The main idea here is to build game arenas based on automata recognizing the languages involved in the two concepts, and to use the existence of finite-memory optimal strategies in these arenas to prove that M-monotony and M-selectivity hold.
In Section 5, we prove the converse implication. We proceed in two steps, first establishing the existence of memoryless optimal strategies in "covered" arenas (Lemma 5.1 and Theorem 5.3), and then building on it to obtain the existence of finite-memory optimal strategies in general arenas (Corollary 5.5). The main technical tools we use are Nash equilibria and the aforementioned notions of prefix-covers and cyclic-covers.
We close the paper with a discussion of our characterization, presented in Section 6: we highlight some limitations and interesting features, compare its scope with the current research landscape, and sketch directions for future work.
Technical overview. Naturally, our technical approach is inspired by the one of Gimbert and Zielonka for the memoryless case [GZ05], which can actually be rediscovered through our results using a trivial memory skeleton. Two of the most important challenges we had to overcome were: (1) establishing natural concepts of monotony and selectivity modulo memory that are exactly as powerful as required to maintain a complete characterization (i.e., sufficient and necessary conditions) in the finite-memory case; (2) circumventing the seemingly unavoidable coupling between the memory skeleton and the arena in the inductive argument needed to prove the implication from M-monotony and M-selectivity to finite-memory optimal strategies, which we were able to do using our notions of prefix-covers and cyclic-covers. All along our paper, we highlight the similarities and discrepancies between our work and Gimbert and Zielonka's [GZ05]. Whenever possible, we also go further, using weaker hypotheses and proving stronger results, and addressing core problems left untouched in [GZ05] even though they have an important impact on the approach (e.g., the role of the zero-sum hypothesis). In that respect, we hope to shed new light on the seminal results of [GZ05] while generalizing them.
Critical analysis. Before jumping to the technical part of this work, let us take a step back and assess the place of our work in its larger line of research. The natural endgame is characterizing all preference relations admitting finite-memory optimal strategies, including those using arena-dependent memory, and pinpointing the frontiers of application of the lifting corollary -that is, under which conditions is finite-memory determinacy preserved when going from one-player to two-player games?
The road is long from Gimbert and Zielonka's characterization in the memoryless case [GZ05] to such a general result, and this work is but a first step. We have already established that Gimbert and Zielonka's approach cannot be fully transposed for finite memory. Our focus on arena-independent memory is a way to study the frontiers of this approach while providing an extension of practical interest. While it may seem limited at first, note that our framework already encompasses arguably rich classes of games such as, e.g., generalized parity games and fully-bounded energy games.
Let us stress that our result, relating a memory skeleton M and the preference relations for which this skeleton suffices, cannot be obtained by simply considering product arenas and invoking Gimbert and Zielonka's result on memoryless determinacy [GZ05]. While, of course, memoryless strategies on product arenas correspond to memoryful strategies on original arenas (as we will formally establish in Lemma 2.4), invoking [GZ05] requires the ability to quantify over all arenas, not only product arenas. Filling this gap is exactly the goal of this paper, and it is made possible through the new concepts we sketched above.
From a practical point of view, our equivalence result has limitations as it inherently uses the memory skeleton M. At this point, our approach neither helps in finding an appropriate skeleton, nor in determining the minimal one; two highly interesting questions from a practical standpoint. Nonetheless, to advance toward answering these questions and to be able to find good skeletons automatically, one first has to understand their theoretical characteristics, which we do here as a necessary stepping stone. Focusing on applications, let us note that the equivalence result is often not the most suited tool: this is instead where the lifting corollary shines. As noted before, reasoning on one-player games (i.e., graphs) is generally much easier than in two-player games (e.g., [BMR + 18, BHM + 17, BHRR19]). Hence, a reasonably easy way to tackle practical cases is to find skeletons sufficient for P 1 and P 2 in their respective one-player games and to use our constructive result to build a skeleton that suffices for both in two-player games: interestingly, the product of the two one-player-game skeletons is sufficient for both players in all two-player games. Hence the memory blowup is mild.
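To illustrate the last point, here is a minimal Python sketch of the synchronized product of two memory skeletons; the encoding of a skeleton as an initial state plus an update dictionary, and the two concrete example skeletons, are our own illustrative assumptions, not from the paper.

```python
# Synchronized product of two memory skeletons (illustrative sketch).
# A skeleton is encoded as (initial state, update dict from (state, color) to state).

def skeleton_product(init1, upd1, init2, upd2, colors):
    """Product skeleton: pairs of states, updated component-wise on each color."""
    states1 = {m for (m, _) in upd1}
    states2 = {m for (m, _) in upd2}
    upd = {((m1, m2), c): (upd1[(m1, c)], upd2[(m2, c)])
           for m1 in states1 for m2 in states2 for c in colors}
    return (init1, init2), upd

# Example: a parity-of-'a' skeleton and a last-color skeleton over colors {'a', 'b'}.
upd1 = {(0, 'a'): 1, (1, 'a'): 0, (0, 'b'): 0, (1, 'b'): 1}
upd2 = {('a', 'a'): 'a', ('a', 'b'): 'b', ('b', 'a'): 'a', ('b', 'b'): 'b'}
init, upd = skeleton_product(0, upd1, 'a', upd2, {'a', 'b'})
```

The product has |M 1| · |M 2| states, which is the mild memory blowup mentioned above.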
Finally, we believe it should be possible to generalize our approach to some extent to the arena-dependent case, through some function associating memory skeletons to arenas (e.g., skeletons encoding bounded counters, with bounds growing with the size of the arena, as for multiple lower-bounded energy objectives). Again, the previous example proves that this would not hold in full generality, but our hope is to establish conditions on this function (which is induced by the preference relation) under which the approach would hold. We leave this question open for now: this paper paves the way to this more general setting.
Related work. We already discussed the most important related papers, notably [GZ05]. Let us highlight here some works where similar approaches have been considered to establish "meta-theorems" applying to general classes of games. First and foremost is the determinacy theorem by Martin that guarantees determinacy (without considering the complexity of strategies) for Borel winning conditions [Mar75].
Following the same motivation as our work, namely the need to characterize (combinations of) objectives admitting finite-memory optimal strategies, Le Roux et al. [LPR18] take another road: whereas our work permits lifting results from one-player games to two-player games, they provide a lifting from the single-objective case to the multi-objective one.
Our work focuses on deterministic turn-based two-player games. Our results were recently extended to stochastic games [BORV21] (both the characterization in terms of generalizations of the monotony and the selectivity concepts, and the lifting corollary). Sufficient conditions for memoryless determinacy were also previously provided for stochastic models (e.g., [Gim07,GK14]). Some sufficient criteria, orthogonal to our approach, were studied for concurrent games in [Le 18]. A recent preprint [BRV21] revisits our work in the context of infinite arenas, providing a game-theoretic characterization of ω-regular objectives.
Finally, we recently discovered unpublished content in Kopczyński's PhD thesis [Kop08]. Kopczyński distinguishes chromatic memory (which corresponds to our definition of memory skeleton), and the more powerful chaotic memory, where transitions of the memory can depend on the actual edges of the arenas, rather than simply on the colors of the edges. Chaotic memory is thus intrinsically arena-dependent. Our notion of an arena being both prefix-and cyclic-covered by a memory skeleton M is equivalent to a notion in [Kop08,Definition 8.12], which defines that an arena adheres to chromatic memory M if it is possible to assign a state of M to every state of the arena such that moving along the edges of the arena updates these memory states in a consistent way. Our definitions of prefix-and cyclic-cover can be seen as two distinct sides of this idea of adherence, which when added up, are actually equivalent to it.
Comparison with conference version. Our paper presents in full detail the contributions published in a preceding conference version [BLO + 20]. All sections have been supplemented with extra explanations, remarks, and examples. All proofs are now directly provided in the main text. In particular, the previous "Technical sketch" section has been replaced by two sections (Sections 4 and 5), each detailing the proof of one implication of our main equivalence (Theorem 3.6). These sections contain extra intermediate lemmas which, albeit more technical, are of interest on their own. A new section (Section 3.4) was also added to formally prove statements about the counterexample (Figure 1) sketched in this introduction.

Preliminaries
Automata and languages of colors. Let C be an arbitrary set of colors.
We recall classical notions on automata on finite words. A non-deterministic finite-state automaton (NFA) is a tuple N = (Q, B, δ, Q init , Q fin ), where Q is a finite set of states, B ⊆ C is a finite alphabet of colors, δ ⊆ Q × B × Q is a set of transitions, Q init ⊆ Q is a set of initial states, and Q fin ⊆ Q is a set of final states. Given a state q ∈ Q and a word w ∈ B*, we denote by δ(q, w) the set of states that can be reached from q after reading w. Without loss of generality, we assume all NFA to be coaccessible, i.e., for all q ∈ Q, there exists w ∈ B* such that δ(q, w) ∩ Q fin ≠ ∅. Recall that NFA precisely recognize regular languages.
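As a purely illustrative sketch of these definitions, one can represent an NFA with plain Python sets and implement δ(q, w) and the coaccessibility check directly; the example automaton below is our own, not from the paper.

```python
# Illustrative sketch: an NFA as a set of transition triples (q, color, q'),
# with delta extended to words and a coaccessibility check.

def delta_word(delta, q, w):
    """States reachable from q after reading the word w (a string of colors)."""
    current = {q}
    for c in w:
        current = {q2 for q1 in current
                   for (qq, cc, q2) in delta if qq == q1 and cc == c}
    return current

def is_coaccessible(states, delta, finals):
    """Every state can reach a final state (backward saturation from finals)."""
    reach_final = set(finals)
    changed = True
    while changed:
        changed = False
        for (q1, _, q2) in delta:
            if q2 in reach_final and q1 not in reach_final:
                reach_final.add(q1)
                changed = True
    return reach_final == set(states)

# Example: NFA over B = {'a', 'b'} accepting words ending in 'b'.
states = {0, 1}
delta = {(0, 'a', 0), (0, 'b', 0), (0, 'b', 1)}
finals = {1}
```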
For any finite subset B ⊆ C, we denote by Reg(B) the set of all regular languages over B. Let R(C) = ⋃_{B ⊆ C, |B| < ∞} Reg(B), that is, the set of all regular languages built over C.
Let K ⊆ C* be a language of finite words. We denote by Prefs(K) the set of all prefixes of the words in K. We define the set of infinite words [K] = {w = c 1 c 2 . . . ∈ C ω | ∀ n ≥ 1, c 1 . . . c n ∈ Prefs(K)}, which contains all infinite words for which every finite prefix is a prefix of a word in K. Intuitively, if K is regular, [K] is the language of infinite words that correspond to infinite paths that can always branch and reach a final state on an automaton for K: we will formalize this in Lemma 2.2. Given a finite word w ∈ C* and a language K ⊆ C*, we write wK for their concatenation, i.e., the language wK = {ww′ | w′ ∈ K} ⊆ C*.
The following observation, already noted in [GZ05], will come in handy too.
Lemma 2.1. For all languages K 1 , K 2 ⊆ C*, [K 1 ∪ K 2 ] = [K 1 ] ∪ [K 2 ].

Proof. Let w ∈ [K 1 ∪ K 2 ]. Every finite prefix of w is in Prefs(K 1 ∪ K 2 ) = Prefs(K 1 ) ∪ Prefs(K 2 ). Assume w.l.o.g. that infinitely many prefixes of w are in Prefs(K 1 ). This implies that all prefixes of w are in Prefs(K 1 ) (intuitively, because Prefs(K 1 ) is closed under taking prefixes, and it contains arbitrarily long prefixes of w). Hence, w ∈ [K 1 ] ⊆ [K 1 ] ∪ [K 2 ]. Conversely, let w ∈ [K 1 ] (resp. [K 2 ]): every finite prefix of w is in Prefs(K 1 ) (resp. Prefs(K 2 )), so in particular it is in Prefs(K 1 ∪ K 2 ). Hence, w ∈ [K 1 ∪ K 2 ].
Arenas. We consider two players: player 1 (P 1 ) and player 2 (P 2 ). An arena is a tuple A = (S 1 , S 2 , E) such that S = S 1 ⊎ S 2 (disjoint union) is a finite set of states partitioned into states of P 1 (S 1 ) and P 2 (S 2 ), and E ⊆ S × C × S is a finite set of edges. Let col : E → C be the projection of edges to colors and col its natural extension to sequences of edges. For an edge e ∈ E, we use in(e) and out(e) to denote its starting state and arrival state respectively, i.e., e = (in(e), col(e), out(e)). We assume all arenas to be non-blocking, i.e., for all s ∈ S, there exists e ∈ E such that in(e) = s. For i ∈ {1, 2}, we call an arena A = (S 1 , S 2 , E) a P i 's one-player arena if for all s ∈ S 3−i , |{e ∈ E | in(e) = s}| = 1, that is, P 3−i has no choice.
Let Hists(A, s) denote the set of histories in A from initial state s ∈ S, i.e., finite sequences of edges ρ = e 1 . . . e n ∈ E + such that in(e 1 ) = s and for all i, 1 ≤ i < n, out(e i ) = in(e i+1 ). Let Plays(A, s) denote the set of plays in A from initial state s ∈ S, i.e., infinite sequences of edges π = e 1 e 2 . . . ∈ E ω such that in(e 1 ) = s and for all i ≥ 1, out(e i ) = in(e i+1 ). We write Hists(A, S ) and Plays(A, S ) for the unions over subsets of initial states S ⊆ S, and write Hists(A) and Plays(A) for the unions over all states of A.
Let ρ = e 1 . . . e n ∈ Hists(A) (resp. π = e 1 e 2 . . . ∈ Plays(A)): we extend the operator in to histories (resp. plays) by identifying in(ρ) (resp. in(π)) to in(e 1 ). We proceed similarly for out and histories: out(ρ) = out(e n ). For the sake of convenience, we consider that any set Hists(A, s) contains the empty history λ s such that in(λ s ) = out(λ s ) = s. We write Hists i (A, s) and Hists i (A) for the subsets of histories ρ such that out(ρ) ∈ S i , i ∈ {1, 2}, i.e., histories whose last state belongs to P i .
For any set of histories H ⊆ Hists(A), we write col(H) for its projection to colors, i.e., col(H) = { col(ρ) | ρ ∈ H}. We do the same for sets of plays.

Memory skeletons.
A memory skeleton is a tuple M = (M, m init , α upd ) where M is a finite set of states, m init ∈ M is a fixed initial state, and α upd : M × C → M is an update function. We also write α upd for its natural extension to sequences of colors in C*. Note that memory skeletons are deterministic and might have an infinite number of transitions, in contrast to NFA. We define the trivial memory skeleton, with only one state, as M triv = (M = {m init }, m init , α upd : {m init } × C → {m init }): it allows us to formalize memoryless strategies [GZ05] in our framework.
Let M = (M, m init , α upd ) be a memory skeleton. For m, m′ ∈ M, we define the language L m,m′ = {w ∈ C* | α upd (m, w) = m′} that contains all words that can be read from m to m′ in M.
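These definitions can be sketched in Python as follows; the dictionary encoding of the update function and the two-state parity skeleton are illustrative assumptions on our part.

```python
# Illustrative sketch: a memory skeleton as an update dict from (state, color)
# to state, with the update function extended to finite words; membership in
# L_{m,m'} is a single deterministic run.

def upd_word(upd, m, w):
    """Natural extension of the update function to a sequence of colors."""
    for c in w:
        m = upd[(m, c)]
    return m

def in_L(upd, m, m2, w):
    """w belongs to L_{m,m2} iff reading w from m ends in m2."""
    return upd_word(upd, m, w) == m2

# Example skeleton: two states tracking the parity of the number of 'a's seen.
m_init = 0
upd = {(0, 'a'): 1, (1, 'a'): 0, (0, 'b'): 0, (1, 'b'): 1}
```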
Product arenas. Let A = (S 1 , S 2 , E) be an arena and M = (M, m init , α upd ) be a memory skeleton. We define their product A ⋉ M as the arena (S′ 1 , S′ 2 , E′) where S′ 1 = S 1 × M, S′ 2 = S 2 × M, and E′ ⊆ S′ × C × S′, with S′ = S′ 1 ⊎ S′ 2 , is such that ((s 1 , m 1 ), c, (s 2 , m 2 )) ∈ E′ if and only if (s 1 , c, s 2 ) ∈ E and α upd (m 1 , c) = m 2 . That is, the memory is updated according to the colors of the edges in E. Note that even though M might contain an infinite number of transitions since C might be infinite, A ⋉ M is always finite, as E is finite in A. Since we assume arena A is non-blocking, so is arena A ⋉ M.
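The product construction can be sketched as follows, assuming arenas are encoded as sets of edge triples and skeletons as update dictionaries; the concrete arena and skeleton below are illustrative choices of ours.

```python
# Illustrative sketch: the product of an arena (edge set) with a memory
# skeleton (update dict): edge ((s1, m1), c, (s2, m2)) exists iff (s1, c, s2)
# is an arena edge and the skeleton maps (m1, c) to m2.

def product_arena(edges, upd):
    """Edges of the product arena, over all pairs (arena state, memory state)."""
    memory_states = {m for (m, _) in upd}
    prod = set()
    for (s1, c, s2) in edges:
        for m1 in memory_states:
            prod.add(((s1, m1), c, (s2, upd[(m1, c)])))
    return prod

# Two-state arena with edges colored 'a' and 'b'; skeleton remembers the last color.
edges = {('s', 'a', 't'), ('t', 'b', 's')}
upd = {(0, 'a'): 1, (1, 'a'): 1, (0, 'b'): 0, (1, 'b'): 0}
prod = product_arena(edges, upd)
```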
Arena induced by an NFA. Let N = (Q, B, δ, Q init , Q fin ) be an NFA. We say that a state q ∈ Q is essential if there exists an infinite path in N starting in q. Let Q ess = {q ∈ Q | q is essential}. We define the corresponding one-player arena Arena(N ) = (S 1 = Q ess , S 2 = ∅, E ⊆ Q ess × B × Q ess ), where e = (q, c, q′) ∈ E if (q, c, q′) ∈ δ. Intuitively, Arena(N ) transforms N into a non-blocking arena thanks to the restriction to essential states.
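In a finite automaton, a state is essential exactly when it survives repeatedly pruning states with no remaining outgoing transition; a small illustrative Python sketch (the example automaton is our own assumption):

```python
# Illustrative sketch: computing the essential states of an NFA (those with an
# infinite outgoing path) by iteratively removing states whose every outgoing
# transition leads out of the surviving set. What survives induces Arena(N).

def essential_states(states, delta):
    ess = set(states)
    while True:
        blocking = {q for q in ess
                    if not any(q1 == q and q2 in ess for (q1, _, q2) in delta)}
        if not blocking:
            return ess
        ess -= blocking

# Example: states 0 and 1 form a cycle; state 2 is a dead end, hence not essential.
states = {0, 1, 2}
delta = {(0, 'a', 1), (1, 'b', 0), (1, 'a', 2)}
```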
We may now state formally the link between [K] and the underlying automaton for K. Our result (and its proof) is similar to [GZ05, Lemma 4].
Lemma 2.2. Let N = (Q, B, δ, Q init , Q fin ) be a (coaccessible) NFA recognizing the regular language K ⊆ C*. Let Q′ init = Q init ∩ Q ess . The following equality holds: col(Plays(Arena(N ), Q′ init )) = [K]. In particular, [K] is non-empty if and only if there exists an essential initial state in N.
Intuitively, [K] is the language of infinite words that correspond to infinite paths that can always branch and reach a final state, on the automaton N recognizing K.
Proof. If Q′ init is empty, the equality trivially holds: col(Plays(Arena(N ), Q′ init )) and [K] are both empty. Hence, from now on, we assume Q′ init ≠ ∅.
We start with the left-to-right inclusion. Let w = c 1 c 2 . . . ∈ [K]. We first prove that for all n ≥ 1, it holds that c 1 . . . c n ∈ col(Hists(Arena(N ), Q′ init )). Assume, on the contrary, that there exists n ≥ 1 such that c 1 . . . c n ∉ col(Hists(Arena(N ), Q′ init )).
As Arena(N ) is a restriction of the states of N to Q ess , this means that no matter how c 1 . . . c n is read on N, it goes through a state in Q \ Q ess . As there is no infinite path from these states, this contradicts that w ∈ [K]: there cannot be arbitrarily long prefixes starting with c 1 . . . c n .

We now use the property that we have just proved along with König's lemma to show that w ∈ col(Plays(Arena(N ), Q′ init )). We build a forest of trees F. The vertices of F are paths ρ ∈ Hists(Arena(N ), Q′ init ) such that col(ρ) is a prefix of w and in(ρ) ∈ Q′ init . For every q ∈ Q′ init , there is one tree in F whose root is the empty path λ q . There is a transition from a vertex ρ to a vertex ρ′ if there exists e ∈ E such that ρ · e = ρ′. As there is at least one vertex for each prefix c 1 . . . c n , (at least) one of the trees of F must be infinite. Moreover, F is finitely branching. By König's lemma, we obtain that there must be an infinite path π starting from a root λ q for some q ∈ Q′ init . By construction, col(π) = w, so w ∈ col(Plays(Arena(N ), Q′ init )).
We now prove the right-to-left inclusion. Let π = e 1 e 2 . . . ∈ Plays(Arena(N ), Q′ init ). For n ≥ 1, the word col(e 1 . . . e n ) is the color of a path in N, since every edge of Arena(N ) corresponds to a transition of N. As N is coaccessible, there is a path in N from the state corresponding to out(e n ) to a final state in Q fin . Thus, the word col(e 1 . . . e n ) is a prefix of an accepted word of N, i.e., a prefix of a word in K; as this holds for all n ≥ 1, we obtain that col(π) ∈ [K].

Strategies.
A strategy σ i for P i , i ∈ {1, 2}, on arena A = (S 1 , S 2 , E), is a function σ i : Hists i (A) → E such that for all ρ ∈ Hists i (A), in(σ i (ρ)) = out(ρ). Let Σ i (A) be the set of all strategies of P i on A.
A finite-memory strategy σ i is a strategy that can be encoded as a Mealy machine, i.e., a memory skeleton M = (M, m init , α upd ) with transitions over a finite subset of colors B ⊆ C, enriched with a next-action function α nxt : M × S i → E such that for all m ∈ M, s ∈ S i , in(α nxt (m, s)) = s. Given a Mealy machine Γ σ i = (M, α nxt ), strategy σ i is defined as follows: for all ρ ∈ Hists i (A), σ i (ρ) = α nxt (α upd (m init , col(ρ)), out(ρ)), i.e., the action picked in the current arena state is determined by the memory state reached after reading the colors of the history. We denote by Σ FM i (A) the set of all finite-memory strategies of P i on A. We say that a strategy σ i ∈ Σ FM i (A) is based on memory skeleton M if it can be encoded as a Mealy machine Γ σ i = (M, α nxt ), as above. We always implicitly assume that strategies of Σ FM i (A) are built by restricting the transitions of their skeleton M to the actual subset of colors appearing in A. A strategy σ i is memoryless if it is a function σ i : S i → E, or equivalently, if it is based on the trivial memory skeleton M triv . We denote by Σ ML i (A) the set of all memoryless strategies of P i on A.
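The way a Mealy machine induces a strategy can be sketched in Python as follows, abstracting a history by its sequence of colors; the one-state arena and two-state skeleton are illustrative assumptions of ours.

```python
# Illustrative sketch: a finite-memory strategy from a skeleton plus a
# next-action function. Read the colors of the history into the skeleton,
# then look up the action for the resulting memory state and current state.

def fm_strategy(m_init, upd, nxt, history, current_state):
    """history is the sequence of colors of the play so far."""
    m = m_init
    for c in history:
        m = upd[(m, c)]
    return nxt[(m, current_state)]

# Example: one arena state 's' with two self-loop edges colored 'a' and 'b';
# a two-state skeleton makes the strategy alternate between them.
upd = {(0, 'a'): 1, (0, 'b'): 1, (1, 'a'): 0, (1, 'b'): 0}
nxt = {(0, 's'): ('s', 'a', 's'), (1, 's'): ('s', 'b', 's')}
```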
We denote by Plays(A, s, σ i ) the set of plays consistent with a strategy σ i of P i from an initial state s, i.e., all plays π = e 1 e 2 . . . ∈ Plays(A, s) such that for all prefixes ρ = e 1 . . . e n , out(ρ) ∈ S i =⇒ σ i (ρ) = e n+1 . We write Plays(A, s, σ 1 , σ 2 ) for the singleton set containing the unique play consistent with a couple of strategies for the two players. We use similar notations for histories.
Preference relations. Let ⊑ be a total preorder on C ω , called a preference relation. We consider antagonistic games, where the objective of P 1 is to create the best possible play with regard to ⊑, whereas the objective of P 2 is to obtain the worst possible one. That is, P 2 uses the inverse relation ⊑ −1 . This corresponds to zero-sum games when using a quantitative framework.
Given w, w′ ∈ C ω , we write w ⊏ w′ if we have ¬(w′ ⊑ w), which is a strict relation since the preorder is total. We extend the relation to subsets of C ω as follows: for W, W′ ⊆ C ω , W ⊑ W′ if for all w ∈ W, there exists w′ ∈ W′ such that w ⊑ w′. Note that W ⊏ W′ if and only if ¬(W′ ⊑ W), and that transitivity is preserved when considering sets.
We sometimes compare words w ∈ C ω with languages K ⊆ C ω , by simply identifying word w to its singleton language {w}.

Games.
A (deterministic turn-based two-player) game is a tuple G = (A, ) where A is an arena and is a preference relation. As discussed in Section 1, all the classical objectives from the literature (both qualitative and quantitative) can be expressed in the general framework of preference relations. For i ∈ {1, 2}, a P i 's one-player game is a game G = (A, ) such that A is a P i 's one-player arena.
Example 2.3. There are two prominent ways to formalize game objectives in the literature: through payoff functions and through winning conditions. We take an example of each. First, consider (lim inf) mean-payoff games [EM79]. In this setting, colors are integers, i.e., C = Z, and the goal of P 1 is to create a play π, with w = col(π) = c 1 c 2 . . . , maximizing the following payoff function: MP(w) = lim inf n→∞ (1/n) Σ_{i=1}^{n} c i . Such a payoff function induces a natural preference relation ⊑_MP between sequences of colors as follows: for all w, w' ∈ C ω , w ⊑_MP w' if and only if MP(w) ≤ MP(w'). Such quantitative games are zero-sum, hence P 2 uses the natural inverse relation ⊑_MP⁻¹: he is a minimizer player in the payoff formulation of these games.
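As a concrete illustration, the mean payoff of an ultimately periodic word can be computed exactly: for a lasso prefix · cycle^ω, the lim inf of the running averages equals the average weight of the repeated cycle, the prefix being amortized away. A small Python sketch (our own lasso representation of words, not the paper's):

```python
from fractions import Fraction

def mean_payoff_lasso(prefix, cycle):
    """MP of the ultimately periodic word prefix . cycle^omega.

    For such words the lim inf of the running averages exists and equals
    the average weight of the cycle; the finite prefix does not matter.
    """
    assert cycle
    return Fraction(sum(cycle), len(cycle))

def prefers_MP(w, w2):
    """w <=_MP w2 iff MP(w) <= MP(w2); words are given as (prefix, cycle) lassos."""
    return mean_payoff_lasso(*w) <= mean_payoff_lasso(*w2)

# The prefix is amortized away: (-100) . (1 2)^omega has mean payoff 3/2.
print(mean_payoff_lasso([-100], [1, 2]))  # 3/2
print(prefers_MP(([0], [0]), ([-100], [1, 2])))  # True
```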
Second, consider reachability games (e.g., [FH10]). In this setting, only two colors are needed: one for edges in the target set, and one for the other edges. Let us use ⊤ and ⊥ respectively to color these two sets of edges, i.e., C = {⊤, ⊥}. Then, the winning condition can be simply written as W = {w ∈ C ω | ⊤ ∈ w} ⊆ C ω , i.e., W is the set of plays seeing ⊤ at least once. In such games, the goal of P 1 is to create a play π such that col(π) ∈ W , called a winning play. Defining a corresponding preference relation ⊑_reach is straightforward: for all w, w' ∈ C ω , w ⊏_reach w' if and only if w ∉ W and w' ∈ W . That is, ⊑_reach defines two equivalence classes: losing and winning plays. This qualitative setting is antagonistic, hence P 2 uses the inverse relation ⊑_reach⁻¹: his winning condition is W' = C ω \ W in the classical formulation of these games.
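The two-class structure of the reachability preference relation is easy to make executable; a small sketch (our own lasso encoding of ultimately periodic plays, with stand-in color names):

```python
TOP, BOT = "T", "B"   # stand-ins for the two colors (target / non-target)

def winning(lasso):
    """A play colored by prefix . cycle^omega sees the target color iff it
    occurs in the prefix or in the cycle."""
    prefix, cycle = lasso
    return TOP in prefix or TOP in cycle

def prefers_reach(w, w2):
    """w is at most as good as w2: only two classes, losing below winning."""
    return (not winning(w)) or winning(w2)

lose = ([BOT], [BOT])
win = ([BOT, TOP], [BOT])
print(prefers_reach(lose, win), prefers_reach(win, lose))  # True False
```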
As explained in Section 1, quantitative games are often reduced to qualitative ones by fixing a threshold to achieve.
Optimal strategies. Let G = (A, ⊑) be a game on arena A = (S 1 , S 2 , E). Given a P i -strategy σ i ∈ Σ i (A) and a state s ∈ S, we define UCol ⊑ (A, s, σ i ) = {w ∈ C ω | ∃π ∈ Plays(A, s, σ i ), col(π) ⊑ w} and DCol ⊑ (A, s, σ i ) = {w ∈ C ω | ∃π ∈ Plays(A, s, σ i ), w ⊑ col(π)}. Note that DCol ⊑ (A, s, σ i ) = UCol ⊑⁻¹ (A, s, σ i ). Intuitively, UCol and DCol represent the upward and downward closures of the sequences of colors (consistent with a strategy) with respect to the preference relation.
Taking the standpoint of P 1 , we say that a strategy σ 1 ∈ Σ 1 (A) is at least as good as a strategy σ' 1 ∈ Σ 1 (A) from a state s ∈ S if UCol ⊑ (A, s, σ 1 ) ⊆ UCol ⊑ (A, s, σ' 1 ). Intuitively, σ 1 is at least as good as σ' 1 if the "worst-case" plays consistent with σ 1 are at least as good as the ones consistent with σ' 1 . The UCol operator is useful to define this notion properly even in the case where there is no "worst-case" play for a strategy (i.e., if the infimum used in the classical quantitative setting is not reached). Similar notions have been used before, e.g., in [Le 13].
Symmetrically, for P 2 , we say that a strategy σ 2 ∈ Σ 2 (A) is at least as good as a strategy σ' 2 ∈ Σ 2 (A) from a state s ∈ S if DCol ⊑ (A, s, σ 2 ) ⊆ DCol ⊑ (A, s, σ' 2 ). Now, we say that a strategy σ i ∈ Σ i (A) of P i is optimal from a state s ∈ S, aka s-optimal, if it is at least as good as every other strategy σ' i ∈ Σ i (A) from s. We extend this notion to subsets of states in the natural way, and we say that a strategy σ i is uniformly-optimal if it is S-optimal. The goal of our paper is to characterize the preference relations that admit uniformly-optimal finite-memory (UFM) strategies based on a given skeleton M in all arenas. We also discuss the simpler case of uniformly-optimal memoryless (UML) strategies, which corresponds to the subset of preference relations studied by Gimbert and Zielonka [GZ05], using the trivial skeleton M triv .
In that respect, the following link is important to observe.

Lemma 2.4. Let ⊑ be a preference relation, M = (M, m init , α upd ) be a memory skeleton, and A = (S 1 , S 2 , E) be an arena. Then, P i has a uniformly-optimal strategy based on M in G = (A, ⊑) if and only if P i has an (S × {m init })-optimal memoryless strategy in G' = (A ⊗ M, ⊑), where A ⊗ M denotes the product of the arena with the skeleton.

Proof. We first aim to define a bijection H : Hists(A) → {ρ' ∈ Hists(A ⊗ M) | in(ρ') ∈ S × {m init }}. Let ρ = e 1 . . . e n ∈ Hists(A), with e j = (s j , c j , s j+1 ). We set m 1 = m init and, for 2 ≤ j ≤ n + 1, m j = α upd (m j−1 , c j−1 ). We define e' j = ((s j , m j ), c j , (s j+1 , m j+1 )), and H(ρ) = e' 1 . . . e' n .
Notice that col(H(ρ)) = col(ρ). Furthermore, H is bijective; as the initial state of the memory m init is fixed and the memory skeleton is deterministic, the memory states added to ρ to obtain H(ρ) are uniquely determined.
We now show that there is a correspondence between strategies of Σ i (A) and strategies of Σ i (A ⊗ M): intuitively, augmenting the arena with the skeleton allows some strategies to be played using less memory, but does not fundamentally change each player's possibilities.
We define a function f : Σ i (A) → Σ i (A ⊗ M) as follows: for τ i ∈ Σ i (A) and ρ' ∈ Hists(A ⊗ M) with in(ρ') ∈ S × {m init } and out(ρ') = (s, m) ∈ S i × M , if τ i (H −1 (ρ')) = (s, c, s'), we set f (τ i )(ρ') = ((s, m), c, (s', α upd (m, c))). The histories induced by strategies τ i and f (τ i ) correspond: if ρ = H −1 (ρ'), then we have

H(ρ · τ i (ρ)) = ρ' · f (τ i )(ρ'). (2.1)

We are only interested in the behavior of strategies of Σ i (A ⊗ M) on histories ρ' with in(ρ') ∈ S × {m init } (in what follows, we will only consider histories and plays starting in such states). If we restrict the image of f to strategies of Σ i (A ⊗ M) considered up to their behavior on such histories, f becomes a bijection. Now, let σ i ∈ Σ FM i (A) be a strategy based on M, and consider the next-action function α nxt of strategy σ i . Formally, we have to transform α nxt into a proper memoryless strategy in Σ ML i (A ⊗ M). This can be done through the bijection f , yielding the memoryless strategy α' nxt = f (σ i ), which corresponds to α nxt interpreted over the product arena, and is well-defined for all histories starting in S × {m init }.

For all s ∈ S and all strategies τ 1 ∈ Σ 1 (A), τ 2 ∈ Σ 2 (A),

col(Plays(A, s, τ 1 , τ 2 )) = col(Plays(A ⊗ M, (s, m init ), f (τ 1 ), f (τ 2 ))). (2.2)
This can easily be proved by induction using Equation (2.1). Indeed, at each step, both strategy τ 1 (resp. τ 2 ) and strategy f (τ 1 ) (resp. f (τ 2 )) pick an edge with the same color. We finish the proof assuming that σ i = σ 1 ∈ Σ FM 1 (A); the proof is symmetric for P 2 . Equation (2.2) implies that for all s ∈ S, τ 1 ∈ Σ 1 (A),

UCol ⊑ (A, s, τ 1 ) = UCol ⊑ (A ⊗ M, (s, m init ), f (τ 1 )), (2.3)

where the equality uses that the aforementioned restriction of f is bijective, so that the plays consistent with τ 1 and those consistent with f (τ 1 ) range over the same sets of color sequences.
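The product arena used throughout this proof can be sketched in a few lines; the encoding below (edges as triples, skeletons as update functions) is our own simplification:

```python
def product_arena(edges, m_states, alpha_upd):
    """Product of an arena with a memory skeleton: states are pairs (s, m),
    and each arena edge (s, c, s2) becomes ((s, m), c, (s2, alpha_upd(m, c)))."""
    prod = []
    for m in m_states:
        for (s, c, s2) in edges:
            prod.append(((s, m), c, (s2, alpha_upd(m, c))))
    return prod

# Toy arena with one state and two colored self-loops; the skeleton
# remembers the parity of the number of "a"s seen.
edges = [("s", "a", "s"), ("s", "b", "s")]
upd = lambda m, c: (1 - m) if c == "a" else m
prod = product_arena(edges, [0, 1], upd)
print((("s", 0), "a", ("s", 1)) in prod)  # True
```

Each edge of the product updates the memory component exactly as the skeleton would when reading the edge's color, which is the invariant behind the bijection H above.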
Using Equation (2.3), we can obtain that a strategy σ 1 based on M is uniformly-optimal in G if and only if the memoryless strategy f (σ 1 ) is (S × {m init })-optimal in G' = (A ⊗ M, ⊑), which concludes the proof.

Nash equilibria. We use Nash equilibria [OR94] as tools to establish the existence of optimal strategies in some of our proofs. Let G = (A, ⊑) be a game on arena A = (S 1 , S 2 , E). Formally, a Nash equilibrium (NE) from a state s ∈ S is a couple of strategies (σ 1 , σ 2 ) ∈ Σ 1 (A) × Σ 2 (A) such that, for all σ' 1 ∈ Σ 1 (A) and all σ' 2 ∈ Σ 2 (A),

col(Plays(A, s, σ' 1 , σ 2 )) ⊑ col(Plays(A, s, σ 1 , σ 2 )) ⊑ col(Plays(A, s, σ 1 , σ' 2 )). (2.4)
Similarly to optimal strategies, we call an NE uniform if it is an NE from all states s ∈ S. It is worth taking a moment to discuss the link between optimal strategies and Nash equilibria in our specific context of antagonistic games. Both notions seem closely related, and indeed, in [GZ05], Gimbert and Zielonka did choose Equation (2.4) -i.e., the definition of a Nash equilibrium -as their definition of a pair of optimal strategies. This could lead the reader to believe that both notions coincide. However, they do not in full generality, as we discuss in the following.
Additionally, defining optimality for a pair of strategies gives rise to difficulties as one naturally wants to reason about optimal strategies of a player without talking about the (possibly optimal) strategy of its adversary. Our definition of optimal strategy has the advantage of giving a clear and precise definition that does not involve the standpoint of the adversary.
As stated before, the ultimate goal of our paper is to characterize preference relations that admit finite-memory optimal strategies, but Nash equilibria will serve as tools in our endeavor. Let us establish two interesting properties of Nash equilibria in antagonistic games.
First, it is possible to mix different Nash equilibria.
Lemma 2.5. Let G = (A, ⊑) be a game on arena A = (S 1 , S 2 , E), and let s ∈ S be a state. If (σ 1 , σ 2 ) and (σ' 1 , σ' 2 ) are two Nash equilibria from s, then (σ 1 , σ' 2 ) is also a Nash equilibrium from s.
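This mixing property generalizes the classical interchangeability of equilibria in finite zero-sum games, which can be checked by brute force on a small matrix game (our own toy example, not from the paper):

```python
import itertools

def is_pure_NE(payoff, i, j):
    """Zero-sum matrix game: P1 (rows) maximizes payoff[i][j], P2 (columns)
    minimizes it. (i, j) is a pure Nash equilibrium iff no unilateral
    deviation helps either player."""
    col = [payoff[k][j] for k in range(len(payoff))]
    row = payoff[i]
    return payoff[i][j] == max(col) and payoff[i][j] == min(row)

# A game with several pure equilibria.
A = [[1, 1, 3],
     [0, 1, 1],
     [1, 1, 2]]
nes = [(i, j) for i in range(3) for j in range(3) if is_pure_NE(A, i, j)]
print(nes)  # [(0, 0), (0, 1), (2, 0), (2, 1)]

# Mixing: for any two equilibria (i, j) and (i2, j2), (i, j2) is one as well.
print(all(is_pure_NE(A, i, j2)
          for (i, j), (i2, j2) in itertools.product(nes, nes)))  # True
```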
We now establish that Nash equilibria induce optimal strategies (again, in our antagonistic context).
Lemma 2.7. Let G = (A, ⊑) be a game on arena A = (S 1 , S 2 , E), and let s ∈ S be a state. Let (σ 1 , σ 2 ) ∈ Σ 1 (A) × Σ 2 (A) be a Nash equilibrium from s. Then, both σ 1 and σ 2 are s-optimal strategies.
As noted above, optimal strategies do not always coincide with Nash equilibria. Intuitively, they do coincide in the classical quantitative formulation (using payoff functions) if the value of the game exists, that is, if the best payoff that P 1 can guarantee is equal to the worst payoff that P 2 can guarantee [OR94]. In a sense, our concepts of UCol and DCol are meant to mimic the classical sup inf and inf sup formulations in our abstract context where objectives are described as preference relations.
So (quantitative) games do not always have a value, and similarly, in our context, optimal strategies as defined above do not always induce a Nash equilibrium. That being said, they do for arguably all reasonable preference relations, as Martin's determinacy result grants the existence of winning strategies -in our formalism, Nash equilibria -for all Borel winning conditions [Mar75]. That is, for the equivalence to fail, we would need a preference relation capable of inducing non-Borel sets of winning plays (once a threshold is chosen, as explained in Section 1). Hence, the two notions are virtually equivalent -yet, the above clarification is needed to circumvent slight technical issues in the original proofs of Gimbert and Zielonka [GZ05].
Remark 2.8. In one-player games, the two visions - optimal strategies and Nash equilibria - coincide.
Remark 2.9. Lemma 2.4 can be restated in terms of Nash equilibria, using a similar reasoning.

Characterization
We are now able to establish our characterization of preference relations admitting finite-memory optimal strategies based on a given memory skeleton M. We proceed in three steps. First, in Section 3.1, we present the core concepts of this characterization, i.e., the properties that preference relations must verify to yield UFM strategies. Second, we state our equivalence result in Section 3.2, alongside a corollary of practical interest that lets one lift results from the one-player case to the two-player one. We defer the formal proofs of both directions of the equivalence to Section 4 and Section 5, and only explain here how to combine them. Finally, we provide an illustrative application of our characterization in Section 3.3.

Concepts.
Generalizing monotony and selectivity. As discussed in Section 1, Gimbert and Zielonka's characterization [GZ05] relies on notions of monotony and selectivity of the preference relation. Intuitively, the main difference between Gimbert and Zielonka's technical approach and ours is the following. In the memoryless setting, all the reasoning can be abstracted away from the underlying arena and done at the level of sequences of colors. In the finite-memory one, however, one has to pay attention to how sequences of colors are composed and compared, to maintain consistency with regard to the memory and the underlying game arena. This need to intertwine abstract reasoning on arbitrary sequences of colors with concrete tracking of memory updates is the key obstacle to overcome.
Much of our effort was thus spent on trying to define concepts that would preserve the elegance of monotony and selectivity while allowing us to lift the theory to the finite-memory case. As is often the case in these endeavors, the right concepts turned out to be the most natural ones, capturing the intuitive idea that one needs monotony and selectivity modulo a memory skeleton. Formally, ⊑ is M-monotone if for all m ∈ M , for all K 1 , K 2 ∈ R(C),

[∃w ∈ L m init ,m , [wK 1 ] ⊏ [wK 2 ]] =⇒ [∀w' ∈ L m init ,m , [w'K 1 ] ⊑ [w'K 2 ]]. (3.1)

Recall that a memory skeleton M has a fixed initial state m init . Intuitively, M-monotony extends Gimbert and Zielonka's monotony by asking one to compare prefixes belonging to the same language L m init ,m , that is, prefixes that are deemed equivalent by the memory skeleton. This property roughly captures that ⊑ is stable with regard to prefix addition, for memory-equivalent prefixes. The original monotony notion is exactly equivalent to our M-monotony with M being the trivial skeleton M triv : that is, the memoryless case is naturally a specific subcase of our framework.
Similarly, ⊑ is M-selective if for all w ∈ C * with m = α upd (m init , w), for all K 1 , K 2 ∈ R(C) such that K 1 , K 2 ⊆ L m,m , and for all K 3 ∈ R(C),

[w(K 1 ∪ K 2 ) * K 3 ] ⊑ [wK * 1 ] ∪ [wK * 2 ] ∪ [wK 3 ]. (3.2)

M-selectivity extends Gimbert and Zielonka's selectivity by asking one to compare sequences of colors belonging to the same language L m,m , that is, sequences read as cycles on the memory skeleton. Note also that the memory state m should be consistent with the prefix w read from the initial memory state m init . This property roughly captures that ⊑ is stable with regard to cycle mixing, for memory-equivalent cycles.
Again, the original selectivity notion is exactly equivalent to M triv -selectivity. In a nutshell, M-monotony deals with prefixes up to the first cycle (on memory) and M-selectivity deals with the cycles thereafter; we will see that memory skeletons can be built in a compositional way based on these two orthogonal yet complementary tasks. We present an example illustrating both concepts and their application in Section 3.3.
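The classification of prefixes by the languages L m init ,m is easy to make concrete; in the toy sketch below (our own encoding, not the paper's), the skeleton remembers whether a distinguished color has been seen, so two prefixes are memory-equivalent iff they agree on that fact.

```python
# Classify a finite word by the memory state it reaches from m_init:
# the language L_{m_init, m} is exactly the set of words classified to m.

def classify(upd, m_init, word):
    m = m_init
    for c in word:
        m = upd(m, c)
    return m

upd = lambda m, c: m or c == "a"  # memory states: False ("a" not yet seen), True

# "ba" and "aab" belong to the same language L_{m_init, m}; "bb" does not.
print(classify(upd, False, "ba") == classify(upd, False, "aab"))  # True
print(classify(upd, False, "bb"))  # False
```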
Our notions respect the natural intuition that access to additional memory should always be helpful: if a skeleton M is sufficient to classify sequences of colors in a way that guarantees M-monotony and M-selectivity, then it should also be the case for "more powerful" skeletons. For instance, let M' be another skeleton and assume that ⊑ is M-monotone, that is, for all m ∈ M , for all K 1 , K 2 ∈ R(C), Equation (3.1) holds. Then ⊑ is also (M ⊗ M')-monotone, that is, the analogue of Equation (3.1) holds for all (m, m') ∈ M × M' and all K 1 , K 2 ∈ R(C): indeed, every language L (m init ,m' init ),(m,m') of the product skeleton is included in the corresponding language L m init ,m of M, so the conclusion follows from the M-monotony of ⊑.

Gimbert and Zielonka's argument for the memoryless case proceeds by induction on the number of choices in an arena. Intuitively, we want to use a similar approach for UFM strategies, but because of the unavoidable coupling between the memory skeleton and the arena (e.g., Lemma 2.4), the induction argument breaks, as adding one choice in the arena results in adding many in the product arena (as many as there are memory states), where the reasoning needs to take place. New insight and techniques are thus needed to patch this induction scheme.
To solve this issue, we decouple the two aspects (see Section 5). Intuitively, we first establish that, on arenas that inherently share the same good properties as product arenas (that is, they already "classify" prefixes and cycles as the memory would), we can deploy the induction argument and obtain UML strategies. Then, we obtain the result for UFM strategies on general arenas as a corollary. The crux is identifying such "good" arenas: this is done through the following notions. We say that M is a prefix-cover of S cov in A if for all s ∈ S, there exists m s ∈ M such that, for all ρ ∈ Hists(A) such that in(ρ) ∈ S cov , out(ρ) = s, and out(ρ') ≠ s for all proper prefixes ρ' of ρ, we have α upd (m init , col(ρ)) = m s .
Intuitively, M is a prefix-cover for a set of states S cov if the histories starting in S cov and visiting a given state s ∈ S for the first time are read up to the same memory state in the memory skeleton. Similarly, M is a cyclic-cover of A if the cycles 1 of A are read as cycles in the memory skeleton, once the memory has been initialized properly.
As hinted above, the canonical example of a prefix-covered and cyclic-covered arena is a product arena (but many more arenas can be covered, hence it is beneficial to be general with these concepts).
Lemma 3.5. Let M = (M, m init , α upd ) be a memory skeleton and A = (S 1 , S 2 , E) be an arena. Then M is both a prefix-cover and a cyclic-cover for S cov = S × {m init } in the product arena A ⊗ M.
Proof. The main argument that we will be using in this proof is that if there is a history ρ with in(ρ) = (s, m) and out(ρ) = (s', m') in the product arena A ⊗ M, then reading col(ρ) from m in the memory skeleton M leads to m' (i.e., α upd (m, col(ρ)) = m'). This can be easily proved by induction on the length of ρ, thanks to how the product arena is built.
We first show that M is a prefix-cover for S cov = S × {m init } in the product arena A ⊗ M. What we have to prove, instantiating the definition of prefix-cover in this case, is that for all (s, m) ∈ S × M , there exists m (s,m) ∈ M such that, for all ρ ∈ Hists(A ⊗ M) such that in(ρ) ∈ S cov , out(ρ) = (s, m), and out(ρ') ≠ (s, m) for all proper prefixes ρ' of ρ, we have α upd (m init , col(ρ)) = m (s,m) . Let (s, m) ∈ S × M ; we take m (s,m) = m. Then, if ρ ∈ Hists(A ⊗ M) is such that in(ρ) ∈ S cov (that is, in(ρ) is equal to (s', m init ) for some s' ∈ S) and out(ρ) = (s, m), we have by the main argument above that α upd (m init , col(ρ)) = m = m (s,m) , as required. The cyclic-cover part follows from the same argument: if ρ is a cycle of A ⊗ M on a state (s, m) whose memory component has been initialized from m init , then reading col(ρ) from m in M leads back to m, i.e., col(ρ) ∈ L m,m .
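The "main argument" of this proof can be sanity-checked mechanically on a small instance; the following sketch (our own toy arena and skeleton) enumerates histories of the product arena and verifies the invariant:

```python
# Check, on a toy instance, the key invariant: along any history of the
# product arena from (s, m) to (s2, m2), reading its colors from m in the
# skeleton leads exactly to m2.

edges = [("x", "a", "y"), ("y", "b", "x"), ("x", "b", "x")]
upd = lambda m, c: (m + 1) % 2 if c == "a" else m  # parity of "a"s seen

prod_edges = [((s, m), c, (s2, upd(m, c)))
              for m in (0, 1) for (s, c, s2) in edges]

def run(m, colors):
    for c in colors:
        m = upd(m, c)
    return m

# Enumerate all histories of the product arena of length up to 5.
hists, all_hists = [[e] for e in prod_edges], []
for _ in range(5):
    all_hists += hists
    hists = [h + [e] for h in hists for e in prod_edges if e[0] == h[-1][2]]

ok = all(run(h[0][0][1], [c for (_, c, _) in h]) == h[-1][2][1] for h in all_hists)
print(ok)  # True
```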

Main results.
Equivalence. We now have the necessary ingredients to state our general equivalence result formally.

Theorem 3.6. Let ⊑ be a preference relation and M be a memory skeleton. Both players have UFM strategies based on M in all games G = (A, ⊑) if and only if both ⊑ and ⊑⁻¹ are M-monotone and M-selective.

We state this theorem broadly and with a focus on UFM strategies. The actual results we have for each direction of the equivalence - which we develop in Section 4 and Section 5 - are a bit stronger, of wider applicability and/or more interesting, but this statement carries the take-home message of our work. It is also meant to mirror the seminal result of Gimbert and Zielonka [GZ05, Theorem 2]: their result can be retrieved from Theorem 3.6 by taking the trivial memory skeleton M triv . As such, our work brings a strict generalization of Gimbert and Zielonka's results [GZ05] to the finite-memory case.
Remark 3.7. We will refine the statement of our intermediate results in order to use weaker hypotheses and/or grant stronger conclusions, whenever possible. For example, we only need optimal strategies in the left-to-right direction (Theorem 4.1 and Theorem 4.2) -and not the stronger notion of Nash equilibrium -while we do prove the existence of finite-memory Nash equilibria in the other direction (Theorem 5.3 and Corollary 5.5).
Similarly, we study the two implications of the equivalence in a compositional way: we split the reasoning for M-monotony and M-selectivity, using different skeletons for each whenever meaningful, as well as for the players, again when beneficial. Additionally, we distinguish between arenas where the players do not need memory and the ones where they do, the first essentially being arenas that already share the good properties of product arenas (as in "product with a memory skeleton").
While such a level of care is not necessary to obtain Theorem 3.6, it has two advantages. First, from a practical standpoint, it permits to obtain more useful results 2 when focusing on a particular direction of the equivalence (as often required in applications). Second, from a theoretical standpoint, it permits to isolate each concept and each element of the reasoning and to highlight their true roles 3 in the underlying mechanisms that lead to the existence of UFM strategies.
To prove Theorem 3.6, we invoke the results we will prove in Section 4 and Section 5. Proof of Theorem 3.6. The left-to-right implication trivially follows from Theorem 4.1 (intuitively, the sufficiency of finite-memory strategies based on M in all one-player arenas implies M-monotony) and Theorem 4.2 (similar, but for M-selectivity), applied to each player with respect to his preference relation. The converse implication is established in Corollary 5.5 (intuitively, M-monotony together with M-selectivity implies the existence of a uniform finite-memory NE with strategies based on M in all two-player arenas), which can be restated in terms of UFM strategies through Lemma 2.7.
As a by-product of our method, we also obtain a similar equivalence by solely considering one of the two players and the corresponding one-player arenas.

Theorem 3.8. Let ⊑ be a preference relation and M be a memory skeleton. P 1 has UFM strategies based on M in all his one-player games G = (A, ⊑) if and only if ⊑ is M-monotone and M-selective.

Although this looks like a weak version of Theorem 3.6 at first sight, this is actually a distinct result as both sides of the equivalence are weaker: on the left side, it only handles the memory requirements for P 1 's one-player games; on the right side, it does not assume anything about the inverse preference relation ⊑⁻¹.
Albeit close, this is distinct from the half-positional determinacy result from [BFMM11, Theorem 3], which gives sufficient conditions about a winning condition for a player to admit memoryless optimal strategies on every two-player arena - in Theorem 3.8, we give a necessary and sufficient condition for a player to admit UFM strategies on his one-player arenas only. The sufficient conditions from [BFMM11] (strong monotony and strong concavity) imply M triv -monotony and M triv -selectivity, but not the other way around. Given a preference relation, it is possible for a player to have UFM strategies on his one-player arenas, but not on all two-player arenas: e.g., the example used in [BFMM11, Lemma 15]. In such an example, Theorem 3.8 could be applied, but not the result from [BFMM11].
Proof of Theorem 3.8. The left-to-right implication trivially follows from Theorem 4.1 (for M-monotony) and Theorem 4.2 (for M-selectivity). The converse implication is established in Corollary 5.6.
Lifting corollary. As discussed in Section 1, the work of Gimbert and Zielonka contains not one, but two great results. Alongside the aforementioned equivalence result, Gimbert and Zielonka provide a corollary of high practical interest [GZ05, Corollary 7]: they essentially obtain as a by-product of their approach that if memoryless strategies suffice in all one-player games of P 1 and in all one-player games of P 2 , these strategies also suffice in all two-player games.
This provides an elegant way to prove that a preference relation (or equivalently an objective) admits memoryless optimal strategies without proving monotony and selectivity at all: proving it in the two one-player subcases, which is generally much easier 4 as it boils down to graph reasoning, and then lifting the result to the general two-player case through the corollary.
Again, we are able to lift this corollary to the arena-independent finite-memory case, as follows.
Corollary 3.9. Let ⊑ be a preference relation and M 1 , M 2 be two memory skeletons. Assume that (1) for all one-player arenas A = (S 1 , S 2 = ∅, E), P 1 has a UFM strategy σ 1 ∈ Σ FM 1 (A) based on memory skeleton M 1 in G = (A, ⊑); (2) for all one-player arenas A = (S 1 = ∅, S 2 , E), P 2 has a UFM strategy σ 2 ∈ Σ FM 2 (A) based on memory skeleton M 2 in G = (A, ⊑). Then, for all two-player arenas A = (S 1 , S 2 , E), both P 1 and P 2 have UFM strategies σ i ∈ Σ FM i (A) based on memory skeleton M = M 1 ⊗ M 2 in G = (A, ⊑).

We highlight the two (possibly different) skeletons of the two players to maintain a compositional approach, but if the same skeleton M works in both one-player 5 versions, it also suffices in the two-player version.
Proof. By Theorem 4.1 and Theorem 4.2 - which essentially state that the left-to-right implication of Theorem 3.6 holds already in one-player games - the hypothesis yields that ⊑ is M 1 -monotone and M 1 -selective, while ⊑⁻¹ is M 2 -monotone and M 2 -selective. Now it suffices to apply Corollary 5.5 - essentially the right-to-left implication of Theorem 3.6 - to get the claim.
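The product skeleton M 1 ⊗ M 2 of the corollary simply runs both skeletons in parallel on the same sequence of colors; a minimal sketch with our own encoding (skeletons as update functions):

```python
def skeleton_product(m1_init, upd1, m2_init, upd2):
    """Product skeleton: state (m1, m2), both components updated in parallel."""
    init = (m1_init, m2_init)
    upd = lambda m, c: (upd1(m[0], c), upd2(m[1], c))
    return init, upd

def read(init, upd, colors):
    m = init
    for c in colors:
        m = upd(m, c)
    return m

# Two one-bit skeletons: "was color 'a' seen?" and "was color 'b' seen?".
upd_a = lambda m, c: m or c == "a"
upd_b = lambda m, c: m or c == "b"
init, upd = skeleton_product(False, upd_a, False, upd_b)

print(read(init, upd, "cabc"))  # (True, True)
```

The product thus classifies prefixes and cycles at least as finely as each of its components, which is why it suffices for both players at once.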

Example of application.
We present an illustrative application of our results, thereby proving the existence of UFM strategies for a specific preference relation: the conjunction of two reachability objectives, a subcase of generalized reachability games, studied extensively in [FH10]. Let C be an arbitrary set of colors, and T 1 , T 2 ⊆ C be two target sets of colors that have to be reached at least once. Formally, let the winning condition W ⊆ C ω be the set of infinite words w = c 1 c 2 . . . such that there exist i, j ≥ 1 with c i ∈ T 1 and c j ∈ T 2 . This winning condition induces a two-level (i.e., win/lose) preference relation ⊑ as discussed in Example 2.3.
In this example, we will use Theorem 3.6 directly in order to provide one thorough illustration of the definitions of M-monotony and M-selectivity. However, in practice, using Corollary 3.9 is preferable, as it yields a much shorter proof: by exhibiting the right skeletons for P 1 and P 2 , we simply have to show that these skeletons are sufficient to play optimally on both players' one-player arenas, which amounts to graph reasoning.
We start by showing that this preference relation is not M triv -monotone (that is, is not monotone in the sense of [GZ05]). Assume c 1 ∈ T 1 \ T 2 , c 2 ∈ T 2 \ T 1 , and c 3 ∉ T 1 ∪ T 2 . Take K 1 = c * 1 and K 2 = c * 2 . For the prefix c 1 , we have [c 1 K 1 ] ⊏ [c 1 K 2 ] (words of [c 1 K 1 ] never reach T 2 , while [c 1 K 2 ] contains winning words), but for the prefix c 2 , the comparison is reversed: [c 2 K 2 ] ⊏ [c 2 K 1 ]. This means that the preference relation is not stable with regard to prefix addition (at least, without distinguishing different classes of prefixes). Similarly, it is not M triv -selective (take w as the empty word, K 1 = c * 1 , K 2 = c * 2 , K 3 = c * 3 : to win, K 1 and K 2 need to be mixed). We exhibit two memory skeletons M p = (M p , m p init , α p upd ) and M c = (M c , m c init , α c upd ) such that ⊑ is M p -monotone and M c -selective: they are pictured in Figure 2. Note that such skeletons are obviously not unique. 5

5 In Corollary 3.9 - and in other places further in this paper - we use a slightly more restrictive definition of one-player arenas (no state of the opponent) than in Section 2 (no choice in states of the opponent). Both definitions are morally equivalent, and our use of the more restrictive version here is without loss of generality (as it yields a weaker hypothesis). We use this definition for the sake of readability whenever possible.

Let us prove that ⊑ is M p -monotone. Let m ∈ M p and K 1 , K 2 ∈ R(C); assume that there exists w ∈ L m p init ,m such that [wK 1 ] ⊏ [wK 2 ], and let w' be another prefix in L m p init ,m . We study the possible values of m.

• If m = m p 2 , then w and w' reach T 1 . Clearly, w cannot reach T 2 (as [wK 1 ] would be winning). This implies that [K 2 ] must contain a word reaching T 2 ; as w' reaches T 1 , the concatenation of w' with the word of [K 2 ] reaching T 2 means that there is a winning word in [w'K 2 ], hence [w'K 1 ] ⊑ [w'K 2 ].

Let us now prove that ⊑ is M c -selective. Let w ∈ C * , m = α c upd (m c init , w), K 1 , K 2 ∈ R(C) such that K 1 , K 2 ⊆ L m,m , and K 3 ∈ R(C). We show that Equation (3.2) is satisfied, i.e., that [w(K 1 ∪ K 2 ) * K 3 ] ⊑ [wK * 1 ] ∪ [wK * 2 ] ∪ [wK 3 ]. If all words of [w(K 1 ∪ K 2 ) * K 3 ] are losing, this equation trivially holds; we thus assume that this set contains a winning word. We therefore have to show that there is a winning word in [wK * 1 ], [wK * 2 ], or [wK 3 ]. We study the two possible values of m separately.
• If m = m c init , then w does not reach T 1 nor T 2 , and the same holds for all words of K 1 and K 2 , as K 1 , K 2 ⊆ L m c init ,m c init . Hence, any winning word of [w(K 1 ∪ K 2 ) * K 3 ] sees both targets in its [K 3 ] suffix, so [wK 3 ] also contains a winning word.

Similar arguments can be laid out to show that the preference relation ⊑⁻¹ of P 2 is M p -monotone and M triv -selective (where M triv is the trivial memory skeleton defined earlier). Let M = M p ⊗ M c ⊗ M triv be the product of all the considered skeletons, depicted in . Note that the number of states of memory skeleton M is minimal (no memory skeleton with two states or fewer suffices for P 1 to play optimally in all arenas [FH10]). Notice also that the one-player equivalence (Theorem 3.8) gives us a more precise result for one-player games of P 2 : in these games, P 2 can play with memory M p ⊗ M triv (which corresponds to M p ). We provide an example of a one-player arena A = (S 1 , S 2 = ∅, E) in Figure 3, and show that there is a UFM strategy for the preference relation ⊑ based on skeleton M. To do so, we invoke Lemma 2.4: we show equivalently that the product A ⊗ M admits an (S × {m init })-optimal memoryless strategy for ⊑. Notice that no memoryless strategy suffices to play optimally in G = (A, ⊑), as when starting in s 2 , P 1 should first visit s 1 before going to s 3 . Also, the (S × {m init })-optimal memoryless strategy for the product arena is only optimal if the initial state is in S × {m init }; it is for instance not optimal from state (s 2 , m 2 ).
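For intuition, memory for the conjunction of two reachability objectives can be sketched as a subset-tracking skeleton recording which targets have been seen; note that this four-state variant is our own simplification for readability, not the minimal three-state skeleton M discussed above.

```python
# Hypothetical subset-tracking skeleton (our own encoding): the memory
# records which of the target color sets T1, T2 have been visited so far.

T1, T2 = {"a"}, {"b"}

def upd(m, c):
    seen1, seen2 = m
    return (seen1 or c in T1, seen2 or c in T2)

def classify(word, m_init=(False, False)):
    m = m_init
    for c in word:
        m = upd(m, c)
    return m

# Two prefixes are memory-equivalent iff they saw the same targets; a play
# is winning once the memory reaches (True, True).
print(classify("ccac"))                 # (True, False)
print(classify("ab") == (True, True))   # True
```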
3.4. Counterexample to a general lifting corollary. We discuss in full detail the counterexample presented in Section 1. We recall that the goal of this counterexample is to show that a lifting corollary for general finite-memory (instead of arena-independent finite-memory) determinacy is not possible.
Let C = Z. We consider the following two winning conditions: W 1 = {w = c 1 c 2 . . . ∈ C ω | lim n→∞ Σ_{i=1}^{n} c i = +∞} and W 2 = {w = c 1 c 2 . . . ∈ C ω | Σ_{i=1}^{n} c i = 0 for infinitely many n's}.
If the play obtained by playing a game is π, P 1 wins if and only if col(π) lies in W = W 1 ∪ W 2 , and P 2 wins if and only if col(π) lies in W' = C ω \ W = (C ω \ W 1 ) ∩ (C ω \ W 2 ) (which corresponds to the description given in Section 1). We prove that P 1 and P 2 have finite-memory optimal strategies in their respective one-player games.
Let us fix some terminology beforehand: we say that a cycle in an arena is a zero cycle if the sum of its weights is zero, a positive cycle if this sum is strictly positive, and a negative cycle if this sum is strictly negative.
We first consider P 1 's one-player games. In a one-player arena, P 1 can create a play π such that col(π) ∈ W 1 if and only if there is a reachable positive cycle. In this case, P 1 can win with a memoryless strategy (simply reaching the cycle and then looping in it). If that is not possible, in order to win, P 1 has to induce a play π such that col(π) ∈ W 2 . We show that if possible, this can be done using finite memory. Let us assume that there exists a play π = e 1 e 2 . . . such that col(π) = c 1 c 2 . . . ∈ W 2 . Let us consider two indices k, l ∈ N such that k < l, e k = e l , Σ_{i=1}^{k} c i = 0, and Σ_{i=1}^{l} c i = 0. Two such indices necessarily exist, as there are finitely many edges in the arena, but infinitely many indices for which the running sum of weights is 0. Notice in particular that Σ_{i=k+1}^{l} c i = 0. Now, consider the play π' = e 1 . . . e k e k+1 . . . e l e k+1 . . . e l e k+1 . . . , with the sequence of edges e k+1 . . . e l repeating ad infinitum (π' is a "lasso"). This is a valid play since e k = e l . Moreover, we have that col(π') ∈ W 2 as after repeating m times the sequence e k+1 . . . e l , the sum of the weights equals Σ_{i=1}^{k} c i + m · Σ_{i=k+1}^{l} c i = 0 + m · 0 = 0. The play π' can be implemented with finite memory, as it consists of a finite prefix and a repeated finite sequence, which corresponds to a zero cycle.
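The first case of this argument, detecting a reachable positive cycle, is standard graph reasoning; a sketch using Bellman-Ford on negated weights (our own code, with edges as (source, weight, target) triples):

```python
# In a one-player arena, P1 can win W1 iff a cycle of strictly positive total
# weight is reachable; negating the weights turns this into reachable
# negative-cycle detection, solved by Bellman-Ford.

def reachable_positive_cycle(states, edges, source):
    """edges: list of (s, weight, t). Returns True iff a strictly positive
    cycle is reachable from source."""
    INF = float("inf")
    dist = {s: INF for s in states}
    dist[source] = 0
    neg = [(s, -w, t) for (s, w, t) in edges]
    for _ in range(len(states) - 1):
        for s, w, t in neg:
            if dist[s] + w < dist[t]:
                dist[t] = dist[s] + w
    # One extra relaxation round: any improvement witnesses a negative
    # (i.e., originally positive) cycle reachable from source.
    return any(dist[s] + w < dist[t] for (s, w, t) in neg)

states = ["u", "v"]
edges = [("u", 1, "v"), ("v", -1, "u"), ("v", 0, "v")]
print(reachable_positive_cycle(states, edges, "u"))                      # False
print(reachable_positive_cycle(states, edges + [("u", 2, "u")], "u"))    # True
```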
We now turn our attention to P 2 's one-player games; P 2 wins a play π such that col(π) = c 1 c 2 . . . ∈ C ω if and only if the running sum of weights does not tend to +∞ and Σ_{i=1}^{n} c i = 0 for at most finitely many n's.
In a one-player arena, if there is a reachable negative cycle, P 2 can ensure to win by pumping it forever, and can therefore win with a memoryless strategy. We now consider an arena that has no reachable negative cycle. As we did for P 1 , we show that if P 2 can win a game in such an arena, then he can do so using finite memory. If P 2 can win, let π = e 1 e 2 . . . be a winning play for P 2 , i.e., col(π) = c 1 c 2 . . . ∈ (C ω \ W 1 ) ∩ (C ω \ W 2 ). Let s be a state visited infinitely often when π is played, and m ∈ N be the first index such that out(e m ) = s. We can decompose π into a finite prefix e 1 . . . e m followed by an infinite sequence of cycles, all starting in s. Since there is no negative cycle, we cannot have that infinitely many of these cycles are positive, as this would imply that col(π) ∈ W 1 . Thus, infinitely many zero cycles are taken from s. As col(π) ∉ W 2 , there exists such a cycle e k . . . e l (that is, in(e k ) = out(e l ) = s and Σ_{i=k}^{l} c i = 0) such that for all k ≤ n ≤ l, it holds that Σ_{i=1}^{n} c i ≠ 0. This also implies that Σ_{i=1}^{k−1} c i ≠ 0, i.e., the history up to this cycle has a non-zero sum. Now, let us consider the play π' = e 1 . . . e k−1 e k . . . e l e k . . . e l e k . . . , with the sequence of edges e k . . . e l repeating ad infinitum (π' is a "lasso"). This is a valid play as in(e k ) = out(e l ). As Σ_{i=k}^{l} c i = 0, the running sum of π' is bounded, hence col(π') ∉ W 1 . Moreover, every time the cycle starts again, the running sum of weights is equal to the same non-zero value Σ_{i=1}^{k−1} c i . Therefore, as the running sum of weights does not reach zero up to the first cycle, and it also never reaches zero along the cycle, it can never reach zero after index k − 1. Hence, col(π') is not in W 2 either, and π' is winning for P 2 . For the same reason as for P 1 , play π' only requires finite memory to be implemented. As argued in Section 1, the two-player game from Figure 1 illustrates that P 1 might need infinite memory to play optimally in the two-player case.
This proves that Gimbert and Zielonka's approach cannot work in full generality in the finite-memory case, as we cannot obtain the existence of finite-memory optimal strategies in all two-player games from the existence of finite-memory optimal strategies in all one-player games.
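The periodicity argument behind P 2 's lasso can be checked mechanically; the sketch below (our own example weights) verifies that a zero-sum cycle entered with a non-zero running sum keeps the running sum non-zero forever:

```python
from itertools import accumulate, chain

# Running sums of the lasso prefix . cycle^omega: if the sums are non-zero up
# to the end of the first cycle and the cycle sums to zero, the same non-zero
# values repeat forever, so the running sum never hits zero again.

def running_sums(prefix, cycle, repetitions):
    weights = chain(prefix, *([cycle] * repetitions))
    return list(accumulate(weights))

prefix, cycle = [3, -2], [1, -1]     # running sums: 3, 1, 2, 1, 2, 1, ...
assert sum(cycle) == 0
sums = running_sums(prefix, cycle, 50)
print(all(s != 0 for s in sums))  # True
```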

From finite memory based on M to M-monotony and M-selectivity
Monotony. We want to keep our approach as compositional as possible, hence we consider the two notions separately. Let us start with M-monotony.
Theorem 4.1. Let M = (M, m init , α upd ) be a memory skeleton and ⊑ be a preference relation. Assume that for all one-player arenas A = (S 1 , S 2 = ∅, E), for all s, s' ∈ S, P 1 has an s-optimal and s'-optimal strategy σ ∈ Σ FM 1 (A), encoded as a Mealy machine Γ σ = (M, α nxt ), in G = (A, ⊑). Then ⊑ is M-monotone.
Note that the same holds for P 2 and ⊑⁻¹ symmetrically. Also, we do not require full uniformity of the strategy, but only uniformity with regard to the fixed pair of states (i.e., strategy σ does not need to be optimal from other states).

Figure 4: Automaton N built to establish M-monotony.
Our proof can be sketched as follows. We need to establish that Equation (3.1) holds. We first instantiate the four languages involved in it: {w}, {w′}, K_1 and K_2. We take NFA recognizing them and build an NFA N that joins them in such a way that, when N is considered as a game arena (see Lemma 2.2), its plays correspond exactly to the languages of infinite words considered in Equation (3.1). This arena is essentially composed of two chains emulating the two prefixes w and w′ and leading to a state t where P1 has to pick a side corresponding to the two languages [K_1] and [K_2] (Figure 4). Now, establishing the M-monotony of ⊑ boils down to invoking an optimal strategy σ in the corresponding game, the crux being that this strategy always picks the same edge in t (i.e., the same side between subarenas corresponding to [K_1] and [K_2]) as both prefixes w and w′ are deemed equivalent by the memory skeleton M.
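The crux of this sketch, namely that a Mealy-machine strategy's choice at t depends only on the current memory state, can be illustrated with a toy memory skeleton. All names, the skeleton, and the next-move function below are illustrative, not taken from the paper.

```python
# A memory skeleton as a deterministic transition table over colors.
# Hypothetical skeleton: track the parity of the number of 'a's seen.
M = {("even", "a"): "odd", ("even", "b"): "even",
     ("odd", "a"): "even", ("odd", "b"): "odd"}
m_init = "even"

def alpha_upd(m, word):
    """Run the skeleton on a finite word of colors."""
    for c in word:
        m = M[(m, c)]
    return m

# Two distinct prefixes w and w' that the skeleton cannot distinguish:
w, w_prime = "ab", "bbabb"
m = alpha_upd(m_init, w)
assert alpha_upd(m_init, w_prime) == m  # both reach the same memory state

# A strategy encoded as a Mealy machine (M, alpha_nxt) is a *function*
# of (memory state, arena state): at the branching state t it must
# therefore prescribe the same edge after w and after w'.
alpha_nxt = {("even", "t"): "edge_to_K1", ("odd", "t"): "edge_to_K2"}
assert alpha_nxt[(m, "t")] == alpha_nxt[(alpha_upd(m_init, w_prime), "t")]
print(alpha_nxt[(m, "t")])  # edge_to_K2
```

This determinism is exactly what forces σ to pick the same side of the automaton N for every prefix in L_{m_init,m}.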
Proof. Let M = (M, m_init, α_upd) be a memory skeleton and ⊑ be a preference relation satisfying the hypothesis. Let us prove that ⊑ is M-monotone, i.e., that for all m ∈ M, for all K_1, K_2 ∈ R(C),

(∃ w ∈ L_{m_init,m}, [wK_1] ⊏ [wK_2]) ⟹ (∀ w′ ∈ L_{m_init,m}, [w′K_1] ⊑ [w′K_2]).   (4.1)

Let m ∈ M, K_1, K_2 ∈ R(C). We assume that K_1, K_2 ≠ ∅, otherwise Equation (4.1) holds trivially: if K_1 is empty, the conclusion of the implication is true regardless of K_2; and if K_2 is empty, the premise is false. Now, assume there exists w ∈ L_{m_init,m} such that [wK_1] ⊏ [wK_2], and let w′ be another prefix in L_{m_init,m}. We will prove that [w′K_1] ⊑ [w′K_2]. Let N_w = (Q^w, B^w, δ^w, q^w_init, Q^w_fin), N_{w′} = (Q^{w′}, B^{w′}, δ^{w′}, q^{w′}_init, Q^{w′}_fin), N_{K_1} = (Q^{K_1}, B^{K_1}, δ^{K_1}, q^{K_1}_init, Q^{K_1}_fin) and N_{K_2} = (Q^{K_2}, B^{K_2}, δ^{K_2}, q^{K_2}_init, Q^{K_2}_fin) respectively denote NFA recognizing the languages {w}, {w′}, K_1 and K_2. They exist since all these languages are regular. We assume w.l.o.g. that automaton N_w (resp. N_{w′}, N_{K_1}, N_{K_2}) is coaccessible and has only one initial state q^w_init (resp. q^{w′}_init, q^{K_1}_init, q^{K_2}_init) with no ingoing transition. We can do this since K_1 and K_2 are non-empty. We also assume w.l.o.g. that N_w (resp. N_{w′}) has only one final state q^w_fin (resp. q^{w′}_fin) with no outgoing transition. Actually, N_w and N_{w′} can be taken as "chains" recognizing a unique word, and being coaccessible and deterministic.
We build an automaton N = (Q, B, δ, Q_init, Q_fin) by "merging" states q^{K_1}_init, q^{K_2}_init, q^w_fin, and q^{w′}_fin. We call this new merged state t. Formally, we build it as follows.
• the set of states is Q = ((Q^w ∪ Q^{w′} ∪ Q^{K_1} ∪ Q^{K_2}) \ {q^{K_1}_init, q^{K_2}_init, q^w_fin, q^{w′}_fin}) ∪ {t};
• Q_init = {q^w_init, q^{w′}_init} and Q_fin = Q^{K_1}_fin ∪ Q^{K_2}_fin;
• and finally, the transition relation δ simply takes into account the merging on t: transitions that entered q^w_fin or q^{w′}_fin now enter t, and transitions that left q^{K_1}_init or q^{K_2}_init now leave t.
This construction is illustrated in Figure 4. The language recognized by N from q^w_init is w(K_1 ∪ K_2), whereas from q^{w′}_init, it is w′(K_1 ∪ K_2). Observe that N is coaccessible since both N_{K_1} and N_{K_2} are coaccessible.
Recall that we assume [wK_1] ⊏ [wK_2]. By definition, this implies that [wK_2] ≠ ∅, hence we also have that [K_2] ≠ ∅. From this, we get that t is essential in N (Lemma 2.2). Thus, it is also the case for q^w_init and q^{w′}_init. We will now interpret this NFA as an arena and use the hypothesis. Let A = Arena(N). By Lemma 2.2, we have that col(Plays(A, q^w_init)) = [w(K_1 ∪ K_2)] and col(Plays(A, q^{w′}_init)) = [w′(K_1 ∪ K_2)]. By hypothesis, P1 has a q^w_init-optimal and q^{w′}_init-optimal strategy σ ∈ Σ^FM_1(A), encoded as a Mealy machine Γ_σ = (M, α_nxt), in G = (A, ⊑).
Let π ∈ Plays(A, q^w_init, σ) be the only play consistent with strategy σ from q^w_init. By definition of A, this play π necessarily contains a history ρ = e_1 … e_n such that out(e_n) = t and for all i, 1 ≤ i < n, out(e_i) ≠ t. Observe that col(ρ) = w. Recall that m = α_upd(m_init, w) is the memory state reached after reading w since w ∈ L_{m_init,m}. Let e = α_nxt(m, t) be the edge chosen by σ in t when t is visited (note that t will be visited only once by construction of A).
Assume, toward a contradiction, that e belongs to the part of the arena generated by N_{K_1}: then col(π) ∈ [wK_1]. Since σ is q^w_init-optimal, we have [wK_2] ⊑ col(π), hence we can conclude that [wK_2] ⊑ [wK_1], which contradicts the hypothesis that [wK_1] ⊏ [wK_2]. Hence, we have established that e belongs to N_{K_2}. Now let us consider π′ ∈ Plays(A, q^{w′}_init, σ), the only play consistent with strategy σ from q^{w′}_init. Again, by definition of A, this play π′ necessarily contains a history ρ′ = e′_1 … e′_n such that out(e′_n) = t and for all i, 1 ≤ i < n, out(e′_i) ≠ t. Observe that col(ρ′) = w′. Since w′ ∈ L_{m_init,m}, we also have that α_upd(m_init, w′) = m, i.e., the memory state reached after reading w′ is the same as the one reached after reading w. Recall that α_nxt is deterministic by definition: i.e., for a given memory state and state of the arena, it always prescribes the same edge. Hence, we have that e = α_nxt(m, t) is exactly the same as before, and therefore belongs to N_{K_2}. Thus, col(π′) ∈ [w′K_2]. Finally, since σ is also q^{w′}_init-optimal and applying the same reasoning as above, we have that [w′K_1] ⊑ col(π′) ∈ [w′K_2], hence [w′K_1] ⊑ [w′K_2], which proves Equation (4.1) and concludes our proof.
Selectivity. We now turn to selectivity, which focuses on stability with regard to cycle mixing.
Theorem 4.2. Let M = (M, m_init, α_upd) be a memory skeleton and ⊑ be a preference relation. Assume that for all one-player arenas A = (S_1, S_2 = ∅, E), for all s ∈ S, P1 has an s-optimal strategy σ ∈ Σ^FM_1(A), encoded as a Mealy machine Γ_σ = (M, α_nxt), in G = (A, ⊑). Then ⊑ is M-selective.
Note that the same holds for P2 and ⊑^{-1} symmetrically. Again, observe that our hypothesis is as weak as possible as no uniformity is required.
Our proof bears similarities with the case of monotony. We need to establish that Equation (3.2) holds. We first instantiate the four languages involved in it: {w}, K_1, K_2 and K_3. We take NFA recognizing them and build an NFA N that joins them in such a way that, when N is considered as a game arena (see Lemma 2.2), its plays correspond exactly to the languages of infinite words considered in Equation (3.2). This arena is essentially composed of a chain emulating the prefix w and leading to a state t where P1 can visit sides that generate cycles from K_1 and K_2 (forever or for a finite time) or branch to a side corresponding to K_3 (Figure 5). Now, establishing the M-selectivity of ⊑ boils down to invoking an optimal strategy σ in the corresponding game, the crux being that this strategy⁶ always picks the same edge in t (i.e., the same side between subarenas corresponding to [K_1^*], [K_2^*] and [K_3]) as all cycles on t are deemed equivalent by the memory skeleton M. The main difference with the previous construction is clear in the last sentence: it is now possible to come back to t, possibly infinitely often, and our proof takes that into account (as illustrated in Figure 5).

⁶ One can easily get from the definition using the UCol-operator that, in this one-player game, σ is q^w_init-optimal if and only if for all σ′ ∈ Σ(A), col(Plays(A, q^w_init, σ′)) ⊑ col(Plays(A, q^w_init, σ)).
Proof. Let M = (M, m_init, α_upd) be a memory skeleton and ⊑ a preference relation satisfying the hypothesis. Let us prove that ⊑ is M-selective, i.e., that for all w ∈ C^*, m = α_upd(m_init, w), for all K_1, K_2 ∈ R(C) such that K_1, K_2 ⊆ L_{m,m}, for all K_3 ∈ R(C),

[w(K_1 ∪ K_2)^* K_3] ⊑ [wK_1^*] ∪ [wK_2^*] ∪ [wK_3].   (4.2)

Let w ∈ C^* and m = α_upd(m_init, w). Let K_1, K_2, K_3 ∈ R(C), with K_1, K_2 ⊆ L_{m,m}. In the following, we assume all three languages K_1, K_2 and K_3 to be non-empty. Indeed, if K_3 is empty, so is the left-hand side of Equation (4.2), hence it trivially holds. If both K_1 and K_2 are empty, Equation (4.2) compares [wK_3] to itself, hence it trivially holds again. Finally, if K_1 is the only empty language among the three, then Equation (4.2) can be restated as [wK_2^* K_3] ⊑ [wK_2^*] ∪ [wK_3], where the inequality to prove involves only non-empty sets. A symmetric argument holds if K_2 is the only empty language. We also assume that K_1 and K_2 do not contain the empty word for technical convenience: this is w.l.o.g. thanks to the Kleene stars used in the regular expressions to consider.
As for monotony, we start by considering NFA for all these languages: let N_w = (Q^w, B^w, δ^w, q^w_init, Q^w_fin), N_{K_1} = (Q^{K_1}, B^{K_1}, δ^{K_1}, q^{K_1}_init, Q^{K_1}_fin), N_{K_2} = (Q^{K_2}, B^{K_2}, δ^{K_2}, q^{K_2}_init, Q^{K_2}_fin) and N_{K_3} = (Q^{K_3}, B^{K_3}, δ^{K_3}, q^{K_3}_init, Q^{K_3}_fin) respectively denote NFA recognizing the languages {w}, K_1, K_2 and K_3. They exist since all these languages are regular. We assume w.l.o.g. that automaton N_w (resp. N_{K_1}, N_{K_2}, N_{K_3}) is coaccessible and has only one initial state q^w_init (resp. q^{K_1}_init, q^{K_2}_init, q^{K_3}_init) with no ingoing transition. We can do this since K_1, K_2 and K_3 are non-empty. We also assume w.l.o.g. that N_w (resp. N_{K_1}, N_{K_2}) has only one final state q^w_fin (resp. q^{K_1}_fin, q^{K_2}_fin) with no outgoing transition. Again N_w can simply be a "chain" recognizing a unique word, being both coaccessible and deterministic.
Similarly to Theorem 4.1, we build an automaton N = (Q, B, δ, Q_init, Q_fin) by "merging" states q^{K_1}_init, q^{K_2}_init, q^{K_3}_init, q^w_fin, q^{K_1}_fin, and q^{K_2}_fin. We call this new merged state t. Formally, we build it as follows.
• the set of states is Q = ((Q^w ∪ Q^{K_1} ∪ Q^{K_2} ∪ Q^{K_3}) \ {q^{K_1}_init, q^{K_2}_init, q^{K_3}_init, q^w_fin, q^{K_1}_fin, q^{K_2}_fin}) ∪ {t};
• Q_init = {q^w_init} and Q_fin = Q^{K_3}_fin;
• and finally, the transition relation δ simply takes into account the merging on t.
This construction is illustrated in Figure 5. The language recognized by N is w(K_1 ∪ K_2)^* K_3. Observe that N is coaccessible since N_{K_1}, N_{K_2} and N_{K_3} are coaccessible. Also observe that t is essential by construction: by merging the initial and final states of N_{K_1} (resp. N_{K_2}), we created cycles on t. Thus, q^w_init is also essential.

11:30 P. Bouyer, S. Le Roux, Y. Oualhadj, M. Randour, and P. Vandenhove Vol. 18:1

Figure 5: Automaton N built to establish M-selectivity.
We will now interpret this NFA as an arena and use the hypothesis. Let A = Arena(N). By Lemma 2.2, we have that col(Plays(A, q^w_init)) = [w(K_1 ∪ K_2)^* K_3]. By hypothesis, P1 has a q^w_init-optimal strategy σ ∈ Σ^FM_1(A), encoded as a Mealy machine Γ_σ = (M, α_nxt), in G = (A, ⊑).
Let π ∈ Plays(A, q^w_init, σ) be the only play consistent with σ from q^w_init. By q^w_init-optimality, we have that

[w(K_1 ∪ K_2)^* K_3] ⊑ col(π).   (4.3)

By definition of A, this play π necessarily contains a history ρ = e_1 … e_n such that out(e_n) = t and for all i, 1 ≤ i < n, out(e_i) ≠ t. Observe that col(ρ) = w. Recall that m = α_upd(m_init, w) is the memory state reached after reading w since w ∈ L_{m_init,m}. Let e = α_nxt(m, t) be the edge chosen by σ in t when t is first visited. Note that in contrast to the construction in Theorem 4.1, t could be visited many times here, and even infinitely often (using cycles from K_1 and K_2). We consider two cases in the following.
First, assume that e belongs to the part of the arena generated by N_{K_3}. Since t (originally q^{K_3}_init) has no incoming transition in N_{K_3}, we conclude that π never visits t again, and that col(π) ∈ [wK_3]. By Equation (4.3), we verify Equation (4.2). Now, assume that e belongs to the part of the arena generated by N_{K_1} (the same reasoning will apply symmetrically for N_{K_2}). We want to show that col(π) ∈ [wK_1^*], i.e., that σ never switches to another part of the arena. Two cases are possible: either (a) π visits t only once, or (b) π visits t at least twice.
Case (a). Since π visits t only once and t is the only state where the play could switch to a different automaton, we have that π = ρ · π′ for a suffix π′ starting in t and entirely contained in N_{K_1}. Hence, we have col(π) = w · col(π′) with col(π′) ∈ [K_1]. Thus, col(π) ∈ [wK_1] ⊆ [wK_1^*].

Case (b). Let π = ρ · ρ′ · π″, such that ρ′ ends with the second visit of t. Recall that w = col(ρ), m = α_upd(m_init, w), and e = α_nxt(m, t). Now, by definition of K_1, we have that col(ρ′) ∈ L_{m,m}. Hence, α_upd(m, col(ρ′)) = m. Intuitively, the memory skeleton is back to the same memory state after reading the cycle ρ′. As argued in Theorem 4.1, α_nxt is deterministic, and both the state of the arena and the memory state are identical after ρ and after ρ · ρ′. Therefore σ(ρ · ρ′) = σ(ρ) = e. Iterating this reasoning (as all cycles on t in N_{K_1} are read as cycles on m in the memory), we conclude that π = ρ · (ρ′)^ω. This implies that col(π) ∈ [wK_1^*].

Hence, in both cases, we have that col(π) ∈ [wK_1^*]. Now, by Equation (4.3), we verify Equation (4.2).
Wrapping everything up, we have that whatever the part of the arena to which e belongs, Equation (4.2) is verified. Therefore, we have shown that ⊑ is indeed M-selective.
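The mechanism behind case (b), where cycles in L_{m,m} bring the skeleton back to the same memory state so that a Mealy-machine strategy repeats its choice and produces a lasso, can be simulated directly. The encoding below is a hypothetical sketch, not the paper's formal definitions.

```python
def simulate(arena, strategy, skeleton, m_init, start, steps):
    """Run a Mealy-machine strategy on a one-player arena.
    arena: state -> {edge: (color, successor)};
    strategy: (memory, state) -> edge;  skeleton: (memory, color) -> memory."""
    s, m, trace = start, m_init, []
    for _ in range(steps):
        e = strategy[(m, s)]
        color, s_next = arena[s][e]
        trace.append(e)
        m = skeleton[(m, color)]  # memory is updated by the color read
        s = s_next
    return trace

# Illustrative arena: from t one can "stay" in a cycle t -> u -> t or "leave".
arena = {"t": {"stay": ("a", "u"), "leave": ("b", "v")},
         "u": {"back": ("a", "t")},
         "v": {"loop": ("b", "v")}}
# Illustrative skeleton: flips on color 'a', so the cycle "aa" returns to "m".
skeleton = {("m", "a"): "m2", ("m2", "a"): "m",
            ("m", "b"): "m", ("m2", "b"): "m2"}
strategy = {("m", "t"): "stay", ("m2", "u"): "back", ("m", "v"): "loop"}

trace = simulate(arena, strategy, skeleton, "m", "t", 8)
# The cycle t -> u -> t reads "aa", returning the memory to "m", so the
# strategy repeats the choice "stay" forever: the play is the lasso (t u)^ω.
print(trace)  # ['stay', 'back', 'stay', 'back', 'stay', 'back', 'stay', 'back']
```

Once the pair (arena state, memory state) repeats, the play is forced into a cycle, which is exactly the iteration argument concluding that π = ρ · (ρ′)^ω.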
Wrap-up. We have established that the existence of finite-memory optimal strategies based on a skeleton M in one-player games implies both M-monotony and M-selectivity of the preference relation, under mild uniformity assumptions. It is interesting to observe that this holds already for one-player games (a fortiori, for two-player games too). Next, we consider the converse: we will prove that M-monotony and M-selectivity, when satisfied by the preference relation and its inverse, imply the existence of UFM strategies, not only in one-player games, but even in two-player ones.

5. From M-monotony and M-selectivity to finite memory based on M

Induction step. To prove the sought implication (Theorem 5.3), we first focus on memoryless strategies in "covered" arenas, as discussed in Section 3.1. Intuitively, a "covered" arena resembles a product arena (with a memory skeleton): hence studying memoryless strategies on such arenas is very close to studying finite-memory strategies on general arenas.
We will proceed by induction on the number of choices in an arena, as sketched in Section 3.1. This induction will require us to mix different Nash equilibria (one for each player) in a proper way, to maintain the desired property. For the sake of readability, we thus start by proving the induction step for one player.
For an arena A = (S_1, S_2, E), we write n_A = |E| − |S| for its number of choices. We also define the notion of subarena: we say that an arena A′ = (S_1, S_2, E′) is a subarena of an arena A = (S_1, S_2, E) if E′ ⊆ E. That is, arena A′ is a subarena of A if it can be obtained from A by removing some edges of A (while keeping it non-blocking). We say that a set of arenas A is closed under the subarena operation if for all A ∈ A, for all subarenas A′ of A, A′ ∈ A.
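These two definitions are simple enough to state as code. The following sketch uses a hypothetical edge-set encoding of arenas; colors and the player partition are omitted, as they play no role in counting choices.

```python
def num_choices(states, edges):
    """n_A = |E| - |S|: total edges minus one mandatory outgoing edge
    per state (arenas are non-blocking, so every state has at least one)."""
    assert all(any(s == src for src, _ in edges) for s in states), "non-blocking"
    return len(edges) - len(states)

def is_subarena(states, edges_sub, edges):
    """A' is a subarena of A: same states, E' ⊆ E, and still non-blocking."""
    return (edges_sub <= edges
            and all(any(s == src for src, _ in edges_sub) for s in states))

states = {"s", "t"}
edges = {("s", "t"), ("s", "s"), ("t", "s"), ("t", "t")}
print(num_choices(states, edges))                            # 2
print(is_subarena(states, {("s", "t"), ("t", "s")}, edges))  # True
print(is_subarena(states, {("s", "t")}, edges))              # False: t is blocked
```

In particular, n_A = 0 means every state has exactly one outgoing edge, which is the trivial base case of the upcoming induction.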
Lemma 5.1. Let ⊑ be a preference relation, M^p and M^c be two memory skeletons, and A be a set of arenas closed under the subarena operation. Assume that ⊑ is M^p-monotone and M^c-selective, and that for all P2's one-player arenas A = (S_1, S_2, E) ∈ A, for all subsets of states S_cov ⊆ S for which M^p is a prefix-cover and M^c is a cyclic-cover, P2 has an optimal strategy from S_cov.
Let n ∈ ℕ. Assume that for all arenas A′ = (S_1, S_2, E′) ∈ A such that n_{A′} < n, for all subsets of states S_cov ⊆ S for which M^p is a prefix-cover and M^c is a cyclic-cover, there exists a memoryless Nash equilibrium (σ_1, σ_2) ∈ Σ^ML_1(A′) × Σ^ML_2(A′) from S_cov in G′ = (A′, ⊑).
Then, for all arenas A = (S_1, S_2, E) ∈ A such that n_A = n, for all subsets of states S_cov ⊆ S for which M^p is a prefix-cover and M^c is a cyclic-cover, there exists a Nash equilibrium (σ_1, σ_2) ∈ Σ^ML_1(A) × Σ_2(A) from S_cov in G = (A, ⊑) such that σ_1 is memoryless.

Note that the same holds for P2 and ⊑^{-1} symmetrically. Intuitively, Lemma 5.1 states that under the hypotheses of M^p-monotony and M^c-selectivity, if both players can play optimally with memoryless strategies in "small" and "covered" arenas, the same property holds for at least P1 in "covered" arenas where an additional choice exists.
This lemma calls for several comments. First, observe that the property is about Nash equilibria. Indeed, as explained in Section 2, the result we prove is actually slightly stronger than the existence of optimal strategies, as it can be stated for Nash equilibria.
Second, this lemma is focused on proving the existence of an NE in which P1's strategy is memoryless: proving that this holds for both players will be done in Theorem 5.3.
Third, as motivated in Section 3.2, we state our result as the existence of memoryless optimal strategies in "covered" arenas: the existence of UFM strategies in general arenas will follow (Corollary 5.5), but taking this road allows us to keep optimal strategies memoryless for many arenas (which already share the "classifying" properties that a product with a memory skeleton would grant).
Fourth, we use two different skeletons, one for monotony (i.e., dealing with prefixes) and one for selectivity (i.e., dealing with cycles). Obviously, one can use a single combined skeleton using Lemma 3.3 and Lemma 3.5, but our approach has the advantage of being compositional and highlighting how each skeleton / property impacts the reasoning in the proof: we will see that they have different uses.
Lastly, the notions of prefix-covers and cyclic-covers are defined with regard to a covered set of states S_cov in order to keep the need for uniformity minimal, in the same spirit as what we did in Section 4.
As mentioned above, our proof is essentially an induction step. Starting from an arena A with n_A = n choices, we identify a state t in which P1 has at least two outgoing edges (the proof is symmetric for P2). By splitting the edges in t into two sets, we obtain two corresponding subarenas A_a and A_b such that n_{A_a}, n_{A_b} < n, along with the corresponding subgames. The induction hypothesis gives us two memoryless Nash equilibria (from S_cov) in these subgames: (σ^a_1, σ^a_2) and (σ^b_1, σ^b_2). The arguments can then be unfolded intuitively as follows. First, using M^p-monotony and M^p being a prefix-cover, we identify one subarena (say A_a) which is clearly at least as good as the other for P1. Second, we build a strategy profile (σ^#_1, σ^#_2), which we claim to be an NE in G, in the following way: P1 uses strategy σ^a_1 (the one from the best subarena) and P2 reacts to P1's actions by playing the corresponding best-response strategy. I.e., if P1 plays in A_a, P2 plays according to σ^a_2, and otherwise he plays according to σ^b_2. Third, it remains to prove the two inequalities of Equation (2.4). The rightmost one is easy, as well as the leftmost one in the subcase where the unique play π ∈ Plays(A, s, σ^#_1, σ^#_2) does not visit state t: they can both be proved essentially thanks to the induction hypothesis and easy construction arguments. The crux of the proof is thus in the last step: proving that the leftmost inequality holds when the play visits t. This can be achieved thanks to M^c-selectivity and M^c being a cyclic-cover, Lemma 2.1, inherent properties of the preference relation, A_a being the best subarena thanks to M^p-monotony, and the induction hypothesis, in that order.
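The first move of the induction step, splitting the outgoing edges of t into two non-empty sets so that both subarenas have strictly fewer choices, can be sketched as follows. The encoding is hypothetical, and splitting into halves is just one arbitrary valid choice.

```python
def split_at(edges, t):
    """Split the outgoing edges of t into two non-empty halves, yielding
    the edge sets of two subarenas A_a and A_b of the induction step."""
    out_t = sorted(e for e in edges if e[0] == t)
    assert len(out_t) >= 2, "t must offer an actual choice"
    half_a, half_b = out_t[: len(out_t) // 2], out_t[len(out_t) // 2 :]
    others = {e for e in edges if e[0] != t}
    return others | set(half_a), others | set(half_b)

edges = {("t", "x"), ("t", "y"), ("x", "x"), ("y", "y")}
A_a, A_b = split_at(edges, "t")
# Each subarena keeps only part of t's alternatives, so with
# n_A = |E| - |S|, both subarenas have strictly fewer choices than A.
print(len(A_a) < len(edges) and len(A_b) < len(edges))  # True
```

Both halves keep at least one outgoing edge of t, so the subarenas remain non-blocking, as the definition of subarena requires.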
Obviously, M-monotony (Definition 3.1), M-selectivity (Definition 3.2), prefix-covers and cyclic-covers (Definition 3.4) were defined to be sufficient to establish Lemma 5.1: one of the main challenges was to make them no more powerful than needed, so as to keep them also necessary, as proved in Section 4.
Memoryless Nash equilibria. We are now armed to establish the implication sketched earlier. As motivated before, we first state the result in the context of memoryless NE on "covered" arenas; the finite-memory case on general arenas will follow almost trivially. We first show the result for one-player arenas, and use it to obtain the two-player case.
Lemma 5.2. Let ⊑ be a preference relation and M^p, M^c be two memory skeletons. Assume that ⊑ is M^p-monotone and M^c-selective. Then, for all P1's one-player arenas A = (S_1, S_2, E), for all subsets of states S_cov ⊆ S for which M^p is a prefix-cover and M^c is a cyclic-cover, there exists a memoryless optimal strategy σ_1 ∈ Σ^ML_1(A) from S_cov in G = (A, ⊑).
Proof. Let A be the set of all P1's one-player arenas, which is closed under the subarena operation. By hypothesis, ⊑ is M^p-monotone and M^c-selective, and clearly P2 has optimal strategies on all his one-player arenas in A, as P2 has no choice in any of these arenas.
We proceed by induction on the number of choices in the arena. The base case, n_A = 0, is trivial. Now let n ∈ ℕ \ {0} and assume the result holds for all arenas A′ with n_{A′} < n. Let A = (S_1, S_2, E) ∈ A be a P1's one-player arena such that n_A = n, and let S_cov ⊆ S be a subset of states for which M^p is a prefix-cover and M^c is a cyclic-cover. We can invoke Lemma 5.1 (note that as we only consider P1's one-player arenas, the existence of a Nash equilibrium coincides with the existence of an optimal strategy for P1; see Remark 2.8), and obtain an optimal strategy for P1 from S_cov.
We are now ready for the two-player case, which requires monotony and selectivity assumptions for both relation ⊑ and relation ⊑^{-1}.
Theorem 5.3. Let ⊑ be a preference relation and M^p_1, M^p_2, M^c_1 and M^c_2 be four memory skeletons. Assume that ⊑ is M^p_1-monotone and M^c_1-selective, and that ⊑^{-1} is M^p_2-monotone and M^c_2-selective. Then, for all arenas A = (S_1, S_2, E), for all subsets of states S_cov ⊆ S for which M^p_1 and M^p_2 are prefix-covers, and M^c_1 and M^c_2 are cyclic-covers, there exists a memoryless Nash equilibrium (σ_1, σ_2) ∈ Σ^ML_1(A) × Σ^ML_2(A) from S_cov in G = (A, ⊑).

As always, we want to keep our results as general and compositional as possible, hence we consider different skeletons for the two players. As argued before, one can always take a single skeleton for the two players, as well as for the two notions, by taking their product and using Lemma 3.3 and Lemma 3.5.
As discussed previously, this theorem in particular implies the existence of memoryless S_cov-optimal strategies for both players (via Lemma 2.7).
It is fairly straightforward to prove Theorem 5.3 once Lemma 5.1 is established: the main idea is to invoke Lemma 5.1 for both players while doing the induction, and obtain two Nash equilibria, each of which is memoryless for only one player. Then, to conclude, we resort to Lemma 2.5, which gives us the possibility to mix these two NE into one that is memoryless for both players.
Remark 5.4. Recall that a crucial hypothesis for Lemma 2.5 to hold is that our games are antagonistic, i.e., that we consider ⊑ and its inverse relation ⊑^{-1}. It is quite interesting to observe that our use of Lemma 2.5 is the only circumstance in which this hypothesis matters (and it is indeed essential) in all our reasoning.⁷ In other words, most of our arguments would hold for two different preference relations, ⊑_1 and ⊑_2, without the hypothesis that ⊑_2 equals (⊑_1)^{-1}. The problem would be that we could not mix the two equilibria into a single equilibrium with both strategies being memoryless, while we do need this in the hypothesis of the induction step, Lemma 5.1.

⁷ To be more precise: we wrote everything in the antagonistic setting, but Equation (2.4) can be written as two inequalities in the general setting, namely col(Plays(A, s, σ′_1, σ_2)) ⊑_1 col(Plays(A, s, σ_1, σ_2)) and col(Plays(A, s, σ_1, σ′_2)) ⊑_2 col(Plays(A, s, σ_1, σ_2)), and all our previous reasoning can be rewritten accordingly.
Whether the same reasoning can be extended to (general) Nash equilibria by adapting Lemma 5.1 to take into account the unavoidable blow-up of memory is a question we leave open for future work. Note that the memory bounds would be awful in any case: as the induction would unroll, the memory needed in the equilibria would build up (essentially one bit of memory is added at each call of the induction step in our easier setting, which is then discarded thanks to Lemma 5.1).
Proof. Let ⊑ be a preference relation and M^p_1, M^p_2, M^c_1 and M^c_2 be four memory skeletons such that ⊑ is M^p_1-monotone and M^c_1-selective, and ⊑^{-1} is M^p_2-monotone and M^c_2-selective. We consider the set of all arenas, A, which is closed under the subarena operation. By Lemma 5.2, we immediately obtain that for all P1's (resp. P2's) one-player arenas A = (S_1, S_2, E), for all subsets of states S_cov ⊆ S for which M^p_1 and M^p_2 are prefix-covers, and M^c_1 and M^c_2 are cyclic-covers, P1 (resp. P2) has an optimal strategy from S_cov. We will proceed by induction on the number of choices in the arena, as described before. The base case, n_A = 0, is trivial. Now let n ∈ ℕ \ {0} and assume the result holds for n_A < n. Let A = (S_1, S_2, E) ∈ A be an arena such that n_A = n, and let S_cov ⊆ S be a subset of states for which M^p_1 and M^p_2 are prefix-covers, and M^c_1 and M^c_2 are cyclic-covers. Focusing on P1 and ⊑, we invoke Lemma 5.1 (using M^p_1 ⊗ M^p_2 and M^c_1 ⊗ M^c_2, and the induction hypothesis) and obtain an NE (σ^♠_1, σ^♠_2) ∈ Σ^ML_1(A) × Σ_2(A) from S_cov in G = (A, ⊑). Note that this NE is only memoryless for P1! Symmetrically, focusing on P2 and ⊑^{-1}, we invoke Lemma 5.1 (using M^p_1 ⊗ M^p_2 and M^c_1 ⊗ M^c_2, and the induction hypothesis) and obtain an NE (σ^♣_1, σ^♣_2) ∈ Σ_1(A) × Σ^ML_2(A) from S_cov in G = (A, ⊑). Again, note that this NE is only memoryless for P2. To conclude, we use Lemma 2.5 to mix these two equilibria into a single Nash equilibrium (σ_1, σ_2) ∈ Σ^ML_1(A) × Σ^ML_2(A) from S_cov in G = (A, ⊑), which is memoryless for both players.
Finite-memory Nash equilibria and UFM strategies. Finally, we conclude this section by establishing our result as a corollary. As usual in this section, we state our result for the slightly stronger notion of Nash equilibria: it involves in particular the existence of UFM strategies for both players. As for Theorem 5.3, we use four memory skeletons to keep the approach compositional and player-based, and we provide strategies based on their product memory. However, if there exists a skeleton M that is already such that both ⊑ and ⊑^{-1} are M-monotone and M-selective, this skeleton suffices to build both strategies (this is clear in the following proof).

This corollary is fairly easy to obtain. We build the joint memory skeleton M as defined above. By Lemma 3.3 and Lemma 3.5, we can invoke Theorem 5.3 on the product arena and obtain a memoryless NE on it, or equivalently, a finite-memory one on the original arena, through Lemma 2.4. Consider the product arena A′ of A with the skeleton M, as defined in Section 2. Recall that S′ = S × M. By Lemma 3.5, the set of states S′_cov = S × {m_init} ⊆ S′ is both prefix-covered and cyclic-covered by M.
Putting the last two arguments together, we may invoke Theorem 5.3 on A′ and obtain a memoryless Nash equilibrium (σ′_1, σ′_2) ∈ Σ^ML_1(A′) × Σ^ML_2(A′) from S′_cov in G′ = (A′, ⊑). To conclude, it suffices to use Lemma 2.4 (stated using NE, as discussed in Remark 2.9): the memoryless equilibrium (σ′_1, σ′_2) in the product game G′ can be seen as a finite-memory equilibrium, based on the skeleton M, in the original game G.

We can also formulate a version of this last result focusing on one-player arenas.
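The product of an arena with a memory skeleton, on which the memoryless equilibrium lives, can be sketched as follows. The encoding below is a hypothetical illustration of the idea: states become pairs (s, m), and taking an edge with color c moves the memory component along α_upd.

```python
def product_arena(states, edges, memory, alpha_upd):
    """Product of an arena with a memory skeleton.
    edges: set of (source, color, target);  alpha_upd: (m, c) -> m.
    A memoryless strategy on the product corresponds to a finite-memory
    strategy, based on the skeleton, on the original arena."""
    prod_states = {(s, m) for s in states for m in memory}
    prod_edges = {((s, m), c, (s2, alpha_upd[(m, c)]))
                  for (s, c, s2) in edges for m in memory}
    return prod_states, prod_edges

states = {"s", "t"}
edges = {("s", "a", "t"), ("t", "b", "s"), ("t", "a", "t")}
memory = {"m0", "m1"}
alpha_upd = {("m0", "a"): "m1", ("m1", "a"): "m0",
             ("m0", "b"): "m0", ("m1", "b"): "m1"}
ps, pe = product_arena(states, edges, memory, alpha_upd)
print(len(ps), len(pe))  # prints: 4 6
```

Only the slice S × {m_init} is relevant as a set of starting states, which is why the covered set S′_cov = S × {m_init} suffices in the argument above.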
Corollary 5.6. Let ⊑ be a preference relation and M^p, M^c be two memory skeletons. Assume that ⊑ is M^p-monotone and M^c-selective. Then, for all P1's one-player arenas A = (S_1, S_2, E), there exists a UFM strategy σ_1 ∈ Σ^FM_1(A) in G = (A, ⊑), such that strategy σ_1 is encoded as a Mealy machine Γ_{σ_1} = (M, α_nxt) based on the joint memory skeleton M = M^p ⊗ M^c.
Proof sketch. The proof is very similar to (and easier than) the proof of Corollary 5.5, but uses the one-player implication of Lemma 5.2 instead of Theorem 5.3.
Remark 5.7. Our whole induction scheme is bottom-up, and presented through the prism of covered arenas. It is possible to obtain a similar proof scheme by going top-down, starting from product arenas, which are particular cases of covered arenas (Lemma 3.5). Our approach pursues two objectives: first, extracting the main technical elements needed and describing them through the concepts of prefix- and cyclic-covers, to give a better grasp of how things work and where; second, providing memoryless optimal strategies in all covered arenas (Theorem 5.3).

6. Discussion
We close our paper with a discussion of the assets and limits of our approach, its applicability with regard to the current research landscape, and the directions we aim to follow in future work.

Technical features of our approach. As observed through Remark 3.7, our results are established using fine-grained assumptions and conclusions, in an effort to push the approach to its limits. They also preserve compositionality, splitting the reasoning for M-monotony and M-selectivity, and for the two players. Alongside M-monotony and M-selectivity, we define two other key concepts to solve the technical issues related to the induction on product arenas: prefix-covers and cyclic-covers. These notions are crucial tools to prove the results in Section 5.
Some advantages. The aforementioned concepts of prefix-covers and cyclic-covers also have benefits from a practical point of view: given a preference relation and the corresponding memory skeleton M, they let us identify game arenas where memoryless strategies suffice whereas finite memory (based on M) might be necessary in general. Such arenas are the ones covered by M.⁸ Hence in practice, this approach yields UML strategies for many arenas where a coarser approach would only provide UFM ones.
Our approach yields two methods to establish that a preference relation (or equivalently a payoff function or a winning condition) admits UFM strategies. The first one, exhibiting appropriate memory skeletons and proving M-monotony and M-selectivity, is based on Theorem 3.6 and can be used compositionally through Corollary 5.5. The second one follows the lifting corollary, Corollary 3.9: one only has to study the one-player subcases and then invoke this result to lift the existence of UFM strategies to the two-player case, without checking for M-monotony and M-selectivity at all. Hence this second method is often painless in practice.
Two interesting facts can be seen through Corollary 3.9. First, there is no blow-up in the memory required when going from one-player games to two-player games: the overall memory simply combines the memory skeletons of the two players. Second, assuming that one has an algorithm to solve⁹ one-player games (say for P1) for a winning condition satisfying our hypotheses, this lifting corollary also induces a naive algorithm for the two-player case for free: thanks to the bounds on memory, one may enumerate the strategies of the adversary, P2 (or guess one if one aims for a non-deterministic algorithm), and solve the corresponding P1's game(s) where the strategy of P2 is fixed. Note that while such a simple algorithm might not be optimal, it does correspond to the approach giving the best complexity class known for the renowned family of games in NP ∩ coNP, such as, e.g., parity or mean-payoff games (e.g., [Jur98]). These last two cases could already be dealt with thanks to Gimbert and Zielonka's result since they involve memoryless strategies, but now a similar road can be taken for any objective that admits arena-independent finite-memory optimal strategies, such as, e.g., generalized parity games.
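The naive algorithm just described can be sketched for a concrete simple case. In the illustration below, a reachability objective stands in for an arbitrary winning condition, and the adversary's skeleton is trivial, so his arena-independent finite-memory strategies reduce to memoryless ones; we enumerate them and solve each resulting one-player game by graph search. All encodings are hypothetical.

```python
from itertools import product

def p1_wins(s1_states, s2_states, edges, init, target):
    """Naive two-player solver: P1 wins the reachability game from init
    iff P1 can reach the target against *every* memoryless P2 strategy."""
    succ = {s: [t for (u, t) in edges if u == s] for s in s1_states | s2_states}
    s2_list = sorted(s2_states)
    # Enumerate all memoryless strategies of the adversary P2.
    for choice in product(*(succ[s] for s in s2_list)):
        fixed = dict(zip(s2_list, choice))  # one memoryless P2 strategy
        # With P2 fixed, solving P1's one-player game is plain reachability.
        seen, stack, reached = set(), [init], False
        while stack:
            s = stack.pop()
            if s in seen:
                continue
            seen.add(s)
            if s == target:
                reached = True
                break
            stack.extend([fixed[s]] if s in fixed else succ[s])
        if not reached:
            return False  # this P2 strategy defeats all of P1's responses
    return True

s1, s2 = {"a", "goal"}, {"b"}
edges = {("a", "b"), ("b", "goal"), ("b", "a"), ("goal", "goal")}
print(p1_wins(s1, s2, edges, "a", "goal"))  # False: P2 can always return to a
```

The enumeration is exponential in the number of P2's states, which matches the remark that such a simple algorithm might not be optimal, but it is correct whenever the memory bounds guarantee that the enumerated strategy class suffices for the adversary.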
Applicability. Let us give a quick tour of some classical (combinations of) objectives, expressed through winning conditions, payoffs or preference relations, and assess whether our approach permits establishing the existence of UFM strategies in the corresponding games.
Note that when considering multiple (quantitative) objectives, optimal strategies usually do not exist, and one has to settle for Pareto-optimal ones (e.g., [DKQR20]). However, in many cases, the (decision) problem under study is as follows: given a threshold (vector), define the winning condition as all the plays achieving at least this threshold, and check for a winning strategy. Hence multi-objective quantitative games are often de facto reduced to qualitative win-lose games for this so-called threshold problem. Observe that, given a multi-objective setting, if UFM strategies exist for all threshold problems, then finite-memory strategies suffice to realize the Pareto front (as each point of this front can be considered as a threshold). Therefore, our approach also enables reasoning about the existence of finite-memory Pareto-optimal strategies in multi-objective games.

⁸ The follow-up paper [BORV21] further discusses how to know if an arena is covered.
⁹ I.e., decide who has a winning strategy from a given state.
We start our overview with some game settings that fall under the scope of our approach. Obviously, all memoryless-determined objectives are among them, since we generalize Gimbert and Zielonka's work [GZ05]: this includes, e.g., mean-payoff [EM79], parity [EJ88, Zie98], energy [CdAHS03], and average-energy games [BMR+18]. As established in Section 1, our results encompass all cases where arena-independent memory suffices. Hence they permit rediscovering the existence of UFM strategies for games such as, e.g., generalized reachability [FH10], generalized parity [CHP07], Muller [DJW97, Cas21], window parity [BHR16], some variants of window mean-payoff [CDRR15], and lower- and upper-bounded (multi-dimension) energy games [BFL+08, BMR+18, BHM+17]. Our approach can also be useful to extend these known results to more general combinations, either via appropriate memory skeletons or through the lifting corollary (see an application in Section 3.3).
There are many games that do not fit our approach for good reasons, as they do not admit UFM strategies in general: e.g., multi-dimension mean-payoff [VCD+15], mean-payoff parity [CHJ05], finitary parity and Streett [CHH09], or energy mean-payoff games [BHRR19]. More interesting are games for which finite-memory strategies exist but the memory is arena-dependent. These notably include games with multi-dimension lower-bounded energy objectives and no upper bound [CRR14, JLS15], and other variants of window mean-payoff games [CDRR15]. In such games, the players usually have to keep track of information such as, e.g., the sum of weights along an acyclic path, which is bounded for any given arena, but by a value that grows with the arena. Hence the need for memory that grows with the arena parameters. Our results cannot be applied directly to such cases to obtain the existence of finite-memory strategies for all games. An adaptation of our approach could potentially be used for subclasses of arenas where the parameters are bounded (in order to regain a skeleton working on all arenas of the class).
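To see why the memory is arena-dependent in such cases, consider the running sums a player must track for a lower-bounded energy objective: along any acyclic path, the sum of weights is bounded in absolute value by roughly (number of vertices) × (largest absolute weight), a quantity that grows with the arena. The brute-force sketch below (the graph encoding is ours, and the enumeration is feasible only on tiny arenas) computes this bound for a given arena:

```python
def max_acyclic_sum(edges, n_vertices):
    """Largest |sum of weights| over simple (acyclic) paths of the arena,
    by exhaustive DFS. This bounds the energy levels a player must
    distinguish -- and it depends on the arena."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
    best = 0

    def dfs(u, seen, total):
        nonlocal best
        best = max(best, abs(total))
        for v, w in adj.get(u, []):
            if v not in seen:
                dfs(v, seen | {v}, total + w)

    for start in range(n_vertices):
        dfs(start, {start}, 0)
    return best

# A 3-vertex cycle with weights -2, -3, 4: the worst acyclic path
# (0 -> 1 -> 2) accumulates -5. Doubling all weights doubles the bound,
# so no single arena-independent skeleton can serve every arena.
edges = [(0, 1, -2), (1, 2, -3), (2, 0, 4)]
```

This is exactly the obstruction discussed above: the quantity is finite on each arena, but unbounded over the class of all arenas.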
Comparison with related work. We have already discussed the most important related articles [GZ04, GZ05, Kop06, BFMM11, AR17, Mar75, LPR18] extensively in Section 1, alongside a technical comparison between our work and Gimbert and Zielonka's seminal result [GZ05]. Here, we simply highlight interesting directions of research inspired by some of these papers.
First, Aminof and Rubin provide a simpler (but incomplete) approach to memoryless determinacy through the prism of first-cycle games in [AR17]: a similar take on finite-memory determinacy could be appealing, as it could provide sufficient conditions that are easier to test than M-monotony and M-selectivity.
Second, Bianco et al. establish sufficient (and relaxed) conditions ensuring the existence of UML strategies for one player in two-player games in [BFMM11]: it would be interesting to study the corresponding problem in the finite-memory case. Indeed, in many games where infinite memory is needed, it is needed for only one of the players (e.g., [VCD+15, CHJ05, BHRR19]), and such conditions could thus prove useful. Note that this is different from Theorem 3.8, which gives a necessary and sufficient condition, but for one-player games only. Finally, recall that Le Roux et al. give a rather tight characterization of the combinations of objectives preserving the sufficiency of finite-memory strategies in [LPR18]. Their techniques, as well as the scope of their results, are somewhat orthogonal to ours. Whether both approaches can be intertwined to obtain results in more general settings remains an open question.
Limits and future work. To close this paper, we recall three limits of our approach, and the corresponding open problems.
First, as explained throughout the paper, our results cover all cases where arena-independent memory suffices, and are limited to these cases. We have argued that the approach cannot be fully lifted to the general case, for good reasons, as the lifting corollary breaks in some situations (see Sections 1 and 3.4). Still, we hope to generalize our approach to some extent to the arena-dependent case, through some function associating memory skeletons to arenas, as discussed in Section 1. Obtaining a lifting corollary (under well-chosen conditions) in the arena-dependent case would be of tremendous help in practice: see for example [BMR+18, BHM+17, BHRR19]. Hence this is clearly the next step in our quest.
Second, our result is a characterization instantiated by a memory skeleton M. While the lifting corollary is helpful in applications, it would be fantastic to find an appropriate skeleton automatically, and to determine whether a given skeleton is minimal (with regard to a preference relation). This paper is a first step toward these long-term objectives.
Lastly, as explained in Remarks 2.6 and 5.4, most of our arguments carry over to the case of general Nash equilibria, that is, to not-necessarily-antagonistic games where the two players use different (not necessarily inverse) preference relations. Whether our approach can be adapted to this case, at the price of an unavoidable blow-up of memory, is an open question worth considering. In particular, we want to study the links between our results (including the lifting from one-player to two-player games) and recent results lifting finite-memory determinacy in two-player games to the existence of finite-memory Nash equilibria in multi-player games [LP18].