Arena-Independent Finite-Memory Determinacy in Stochastic Games

We study stochastic zero-sum games on graphs, which are prevalent tools to model decision-making in the presence of an antagonistic opponent in a random environment. In this setting, an important question is that of strategy complexity: what kinds of strategies are sufficient or required to play optimally (e.g., randomization or memory requirements)? Our contributions further the understanding of arena-independent finite-memory (AIFM) determinacy, i.e., the study of objectives for which memory is needed, but in a way that only depends on limited parameters of the game graphs. First, we show that objectives for which pure AIFM strategies suffice to play optimally also admit pure AIFM subgame perfect strategies. Second, we show that we can reduce the study of objectives for which pure AIFM strategies suffice in two-player stochastic games to the easier study of one-player stochastic games (i.e., Markov decision processes). Third, we characterize the sufficiency of AIFM strategies through two intuitive properties of objectives. This work extends a line of research started on deterministic games to stochastic ones.


The proof technique for the one-to-two-player lift follows a similar outline in [33,34,7] and in this paper: it relies on an induction on the number of edges in arenas to show the existence of memoryless optimal strategies. This edge-induction technique is frequently used in comparable ways in other works about memoryless determinacy [35,31,32,18]. In the AIFM case, the extra challenge consists of applying such an induction to the right set of arenas in order for a result about memoryless strategies to imply something about AIFM strategies. Work in [7] paved the way to neatly overcoming this technical hindrance, and we were able to factor out the main argument in Lemma 6.
Applicability. Let us discuss objectives that admit, or not, pure AIFM optimal strategies in stochastic arenas.
Objectives for which AIFM optimal strategies exist include the aforementioned memoryless-determined objectives [29,22,48,10], as explained earlier. Such objectives could already be studied through the lens of a one-to-two-player lift [34], but our two other main results also apply to them. Pure AIFM optimal strategies also exist in lexicographic reachability-safety games [23, Theorem 4]: the memory depends only on the number of targets to visit or avoid, but not on parameters of the arena (number of states or transitions). Muller objectives whose probability must be maximized [15] also admit pure AIFM optimal strategies: the number of memory states depends only on the colors and on the Muller condition.
In general, every ω-regular objective admits pure AIFM optimal strategies, as it can be seen as a parity objective (for which pure memoryless strategies suffice) after taking the product of the game graph with a deterministic parity automaton accepting the objective [42,20]. This parity automaton can be taken as an arena-independent memory structure. It is therefore possible to use our results to investigate precise memory bounds in stochastic games for multiple ω-regular objectives which have been studied in deterministic games or in one-player stochastic games: generalized parity games [21], lower- and upper-bounded energy games [5], some window objectives [13,11], weak parity games [49]. There are objectives for which finite-memory strategies suffice for some player, but with an underlying memory structure depending on parameters of the arena (an example is provided by the Gain objective in [40, Theorem 6]). Many objectives also require infinite memory, such as generalized mean-payoff games [18] (both in deterministic and stochastic games) and energy-parity games (only in stochastic games [17,39]). Our characterizations provide a more complete understanding of why AIFM strategies do not suffice in such cases.
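For concreteness, the product construction mentioned at the start of this paragraph can be sketched in a few lines of Python (the encoding and names below are our own illustration, not notation from the paper): composing a game graph with a deterministic automaton reading colors yields a graph over pairs of a game state and an automaton state, and the automaton then plays the role of an arena-independent memory structure.

```python
# Minimal sketch (our own encoding) of the product of a game graph with a
# deterministic automaton on colors. `edges` is a set of (s, a, s') triples,
# `col` maps (s, a) to a color, and the automaton is given by its initial
# state and its update function. States of the product are pairs (s, q).
def product_graph(edges, col, s_init, q_init, upd):
    prod_edges, seen = set(), {(s_init, q_init)}
    frontier = [(s_init, q_init)]
    while frontier:
        s, q = frontier.pop()
        for (u, a, v) in edges:
            if u != s:
                continue
            q2 = upd(q, col[(u, a)])        # the automaton reads the color
            prod_edges.add(((s, q), a, (v, q2)))
            if (v, q2) not in seen:
                seen.add((v, q2))
                frontier.append((v, q2))
    return prod_edges

# Example: an automaton remembering the parity of the number of 1-colors seen.
edges = {("s", "a", "t"), ("t", "b", "s")}
col = {("s", "a"): 1, ("t", "b"): 0}
print(product_graph(edges, col, "s", 0, lambda q, c: (q + c) % 2))
```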
Deterministic and stochastic games. There are natural ways to extend classical objectives for deterministic games to a stochastic context: typically, for qualitative objectives, a natural stochastic extension is to maximize the probability to win. Still, in general, memory requirements may increase when switching to the stochastic context. To show that understanding the deterministic case is insufficient to understand the stochastic case, we outline three situations displaying different behaviors. As mentioned above, for many classical objectives, memoryless strategies suffice both in deterministic and in stochastic games. AIFM strategies may suffice both for deterministic and stochastic games, but with a difference in the size of the required memory structure. One such example is provided by the weak parity objective [49], for which memoryless strategies suffice in deterministic games, but which requires memory in stochastic games (this was already noticed in [34, Section 4.4]). Yet, it is possible to show that pure AIFM strategies suffice in stochastic games using the results from our paper. This shows that to go from the deterministic to the stochastic case, a "constant" increase in memory may be necessary and sufficient. There are also objectives for which memoryless strategies suffice in deterministic games, but even AIFM strategies do not suffice in stochastic games. One such example is maximizing the probability to obtain a non-negative discounted sum (which is different from maximizing the expected value of the discounted sum, for which memoryless strategies suffice, as shown in [48]). Formal proofs for these last two examples are provided in the full version [9]. These three situations further highlight the significance of establishing results about memory requirements in stochastic games, even for objectives whose deterministic version is well-understood.
Outline. We introduce our framework and notations in Section 2. We discuss AIFM strategies and tools to relate them to memoryless strategies in Section 3, which allows us to prove our result about subgame perfect strategies. The one-to-two-player lift is presented in Section 4, followed by the one-player characterization in Section 5. Due to a lack of space, we choose to focus on Section 5 and only sketch Section 4; the complete proofs and technical details are found in the full version of the article [9].

Preliminaries
Let C be an arbitrary set of colors.
Arenas. For a measurable space (Ω, F) (resp. a finite set Ω), we write Dist(Ω, F) (resp. Dist(Ω)) for the set of probability distributions on (Ω, F) (resp. on Ω). For Ω a finite set and µ ∈ Dist(Ω), we write Supp(µ) = {ω ∈ Ω | µ(ω) > 0} for the support of µ. We consider stochastic games played by two players, called P 1 (for player 1) and P 2 (for player 2), who play in a turn-based fashion on arenas. A (two-player stochastic turn-based) arena is a tuple A = (S 1 , S 2 , A, δ, col), where: S 1 and S 2 are two disjoint finite sets of states, respectively controlled by P 1 and P 2 ; we denote S = S 1 ⊎ S 2 ; A is a finite set of actions; δ : S × A → Dist(S) is a partial function called probabilistic transition function; col : S × A → C is a partial function called coloring function. For a state s ∈ S, we write A(s) for the set of actions that are available in s, that is, the set of actions a for which δ(s, a) is defined. For s ∈ S, function col must be defined for all pairs (s, a) such that a is available in s. We require that for all s ∈ S, A(s) ≠ ∅ (i.e., arenas are non-blocking).
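As a concrete illustration, here is one possible plain-data encoding of such an arena in Python (a sketch with field names of our own choosing, not from the paper), together with checks of the requirements of the definition.

```python
# Illustrative encoding of a two-player stochastic arena as plain Python data
# (field names are ours). `delta` maps (state, action) to a dict
# {successor: probability}; `col` maps (state, action) to a color.
arena = {
    "states_1": {"s", "t"},          # states controlled by P1
    "states_2": {"u"},               # states controlled by P2
    "delta": {
        ("s", "a"): {"t": 0.5, "u": 0.5},
        ("s", "b"): {"s": 1.0},
        ("t", "a"): {"u": 1.0},
        ("u", "a"): {"s": 1.0},
    },
    "col": {("s", "a"): 1, ("s", "b"): 0, ("t", "a"): 2, ("u", "a"): 0},
}

def available(arena, s):
    """Actions available in state s (pairs for which delta is defined)."""
    return {a for (q, a) in arena["delta"] if q == s}

# Non-blocking check: every state has at least one available action.
all_states = arena["states_1"] | arena["states_2"]
assert all(available(arena, s) for s in all_states)
# Probabilities out of each (state, action) pair sum to 1.
assert all(abs(sum(d.values()) - 1.0) < 1e-9 for d in arena["delta"].values())
```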
For s, s ′ ∈ S and a ∈ A(s), we denote δ(s, a, s ′ ) instead of δ(s, a)(s ′ ) for the probability to reach s ′ in one step by playing a in s, and we write (s, a, s ′ ) ∈ δ if and only if δ(s, a, s ′ ) > 0. An interesting subclass of (stochastic) arenas is the class of deterministic arenas: an arena is deterministic if for all s ∈ S and a ∈ A(s), δ(s, a) is a Dirac distribution (i.e., assigns probability 1 to a single successor). A play of A is an infinite sequence of states and actions s 0 a 1 s 1 a 2 s 2 . . . ∈ (SA) ω such that for all i ≥ 0, (s i , a i+1 , s i+1 ) ∈ δ. A prefix of a play is an element in S(AS) * and is called a history; the set of all histories starting in a state s ∈ S is denoted Hists(A, s). For S ′ ⊆ S, we write Hists(A, S ′ ) for the union of Hists(A, s) over all states s ∈ S ′ . For a history ρ = s 0 a 1 s 1 . . . a n s n , we write out(ρ) for s n . For i ∈ {1, 2}, we write Hists i (A, s) and Hists i (A, S ′ ) for the corresponding histories ρ such that out(ρ) ∈ S i . For s, s ′ ∈ S, we write Hists(A, s, s ′ ) for the histories ρ ∈ Hists(A, s) such that out(ρ) = s ′ .
A one-player arena of P i is an arena A = (S 1 , S 2 , A, δ, col) such that for all s ∈ S 3−i , |A(s)| = 1. A one-player arena corresponds to a Markov decision process (MDP) [44,2].
An initialized arena is a pair (A, S init ) such that A is an arena and S init is a non-empty subset of the states of A, called the set of initial states. We assume w.l.o.g. that all states of A are reachable from S init following transitions with positive probabilities in the probabilistic transition function of A. In case of a single initial state s ∈ S, we write (A, s) for (A, {s}).
We will consider classes (sets) of initialized arenas, which are usually denoted by the letter A. Typical classes that we will consider consist of all one-player or two-player, deterministic or stochastic initialized arenas. We use initialized arenas throughout the paper for technical reasons, but our results can be converted to results using the classical notion of arena.

Memory.
We define a notion of memory based on complete deterministic automata on colors. The goal of using colors instead of states/actions for transitions of the memory is to define memory structures independently of arenas. A memory skeleton is a tuple M = (M, m init , α upd ) where M is a set of memory states, m init ∈ M is an initial state and α upd : M × C → M is an update function. We add the following constraint: for all finite sets of colors B ⊆ C, the number of states reachable from m init with transitions restricted to the colors of B must be finite. Memory skeletons with a finite state space are all encompassed by this definition, but it also allows some skeletons with infinitely many states. For example, if C = N, the tuple (N, 0, (m, n) → max{m, n}), which remembers the largest color seen, is a valid memory skeleton: for any finite B ⊆ C, we only need to use memory states up to max B. However, the tuple (N, 0, (m, n) → m + n) remembering the current sum of colors seen is not a memory skeleton, as infinitely many states are reachable from 0, even if only B = {1} is used. We write α upd : M × C * → M for the natural extension of α upd to finite sequences of colors. Let M 1 = (M 1 , m 1 init , α 1 upd ) and M 2 = (M 2 , m 2 init , α 2 upd ) be two memory skeletons; their product M 1 ⊗ M 2 is the memory skeleton with state space M 1 × M 2 , initial state (m 1 init , m 2 init ), and update function ((m 1 , m 2 ), c) → (α 1 upd (m 1 , c), α 2 upd (m 2 , c)), which runs both skeletons in parallel.
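The two examples above can be made concrete with a small Python sketch (the encoding is ours): a skeleton is given by its initial state and update function, and we can test how many memory states a finite set of colors B makes reachable. We also include the product construction, used later in Theorem 8.

```python
# M_max remembers the largest color seen so far and is a valid skeleton: for
# a finite color set B, only finitely many memory states are reachable from 0.
M_max = (0, lambda m, c: max(m, c))
M_sum = (0, lambda m, c: m + c)   # NOT a valid skeleton, as witnessed below

def reachable(skeleton, colors, bound=10_000):
    """Memory states reachable from the initial state using only `colors`
    (exploration capped at `bound` states to witness divergence)."""
    m_init, upd = skeleton
    seen, frontier = {m_init}, [m_init]
    while frontier and len(seen) < bound:
        m = frontier.pop()
        for c in colors:
            m2 = upd(m, c)
            if m2 not in seen:
                seen.add(m2)
                frontier.append(m2)
    return seen

print(len(reachable(M_max, {1, 4, 7})))   # 4 states: {0, 1, 4, 7}
print(len(reachable(M_sum, {1})))         # hits the cap: infinitely many states

def product(sk1, sk2):
    """Product skeleton running sk1 and sk2 in parallel (componentwise updates)."""
    (i1, u1), (i2, u2) = sk1, sk2
    return ((i1, i2), lambda m, c: (u1(m[0], c), u2(m[1], c)))
```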

Strategies.
Given an initialized arena (A = (S 1 , S 2 , A, δ, col), S init ) and i ∈ {1, 2}, a strategy of P i on (A, S init ) is a function σ i : Hists i (A, S init ) → Dist(A) such that for all ρ ∈ Hists i (A, S init ), Supp(σ i (ρ)) ⊆ A(out(ρ)). A strategy is pure if it always plays a single action with probability 1. A pure memoryless strategy of P i can be simply specified as a function S i → A. A strategy σ i of P i on (A, S init ) is finite-memory if it can be encoded as a Mealy machine Γ = (M, α nxt ), with M = (M, m init , α upd ) a memory skeleton and α nxt : S i × M → Dist(A) a next-action function: after a history ρ, the strategy plays α nxt (out(ρ), α upd (m init , col(ρ))), where col(ρ) = col(s 0 , a 1 ) . . . col(s n−1 , a n ) denotes the sequence of colors along ρ = s 0 a 1 s 1 . . . a n s n . If σ i can be encoded as a Mealy machine (M, α nxt ), we say that σ i is based on (memory) M.
If σ i is based on M and is pure, then the next-action function can be specified as a function S i × M → A. Memoryless strategies correspond to finite-memory strategies based on the trivial memory skeleton M triv = ({m init }, m init , (m init , c) → m init ) that has a single state.
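A minimal sketch of a pure strategy based on a memory skeleton, seen as a Mealy machine; the particular next-action function below is an arbitrary example of ours.

```python
# A pure strategy of P1 based on a skeleton: a next-action function
# nxt : S_1 x M -> A, queried with the memory state obtained by reading the
# colors of the history from m_init.
m_init, upd = 0, lambda m, c: max(m, c)            # the skeleton M_max
nxt = lambda s, m: "a" if m % 2 == 1 else "b"      # example next-action function

def act(history_colors, current_state):
    """Action chosen after a history whose color sequence is history_colors."""
    m = m_init
    for c in history_colors:       # extension of upd to finite color sequences
        m = upd(m, c)
    return nxt(current_state, m)

print(act([0, 3, 2], "s"))   # memory state 3 (largest color seen), so plays "a"
```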
We write Σ PFM i (A, S init ) (resp. Σ P i (A, S init ), Σ GFM i (A, S init ), Σ G i (A, S init )) for the set of pure finite-memory (resp. pure, finite-memory, general) strategies of P i on (A, S init ). A type of strategies is an element X ∈ {PFM, P, GFM, G} corresponding to one of these subsets.
Outcomes. Let (A = (S 1 , S 2 , A, δ, col), S init ) be an initialized arena. When both players have decided on a strategy and an initial state has been chosen, the generated object is a (finite or countably infinite) Markov chain, which induces a probability distribution on the plays. For strategies σ 1 of P 1 and σ 2 of P 2 on (A, S init ) and s ∈ S init , we denote P σ1,σ2 A,s for the probability distribution on plays induced by σ 1 and σ 2 , starting from state s.
We define F to be the σ-algebra on C ω generated by the cylinder sets (i.e., the sets of infinite words sharing a fixed finite prefix over C). In particular, every probability distribution P σ1,σ2 A,s naturally induces a probability distribution over (C ω , F) through the col function, which we denote Pc σ1,σ2 A,s .

Preferences.
To specify each player's objective, we use the general notion of preference relation. A preference relation ⊑ (on C) is a total preorder over Dist(C ω , F). The idea is that P 1 favors the distributions in Dist(C ω , F) that are the largest for ⊑, and as we are studying zero-sum games, P 2 favors the distributions that are the smallest for ⊑. For ⊑ a preference relation and µ, µ ′ ∈ Dist(C ω , F), we write µ ⊏ µ ′ if µ ⊑ µ ′ and not µ ′ ⊑ µ. We lift ⊑ to sets of distributions: for Λ, Λ ′ ⊆ Dist(C ω , F), we write Λ ⊑ Λ ′ if for all µ ∈ Λ, there exists µ ′ ∈ Λ ′ such that µ ⊑ µ ′ . Depending on the context, it might not be necessary to define a preference relation as total: it is sufficient to order the distributions that can arise as an element P σ1,σ2 A,s . For example, in the specific case of deterministic games in which only pure strategies are considered, all distributions that arise are Dirac distributions on a single infinite word in C ω . In this context, it is therefore sufficient to define a total preorder over all Dirac distributions (which we can then see as infinite words, giving a definition of preference relation similar to [33,7]). We give some examples to illustrate our notion of preference relation.
▶ Example 1. We give three examples corresponding to three different ways to encode preference relations. First, a preference relation can be induced by an event W ∈ F called a winning condition, which consists of infinite sequences of colors. The objective of P 1 is to maximize the probability that the event W happens. An event W naturally induces a preference relation ⊑ W such that for µ, µ ′ ∈ Dist(C ω , F), µ ⊑ W µ ′ if and only if µ(W ) ≤ µ ′ (W ). For C = N, we give the example of the weak parity winning condition W wp [49], defined as W wp = {c 1 c 2 . . . ∈ C ω | max j≥1 c j exists and is even}. In finite arenas, the value max j≥1 c j always exists, as there are only finitely many colors that appear. This is different from the classical parity condition, which requires the maximal color seen infinitely often to be even, and not just the maximal color seen.
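To make the difference concrete, the following sketch compares weak parity with classical parity on an ultimately periodic word u · v ω , encoded (by us) as two finite lists.

```python
# Weak parity vs. parity on an ultimately periodic word u . v^omega. Weak
# parity looks at the maximal color seen at all; parity looks at the maximal
# color seen infinitely often (i.e., occurring in the repeated cycle v).
def weak_parity_wins(u, v):
    return max(u + v) % 2 == 0     # every color of u and v is seen

def parity_wins(u, v):
    return max(v) % 2 == 0         # only colors of v are seen infinitely often

u, v = [3, 4], [1]                 # max color seen: 4; max seen infinitely often: 1
print(weak_parity_wins(u, v))      # True  (4 is even)
print(parity_wins(u, v))           # False (1 is odd)
```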
A preference relation can also be induced by a Borel (real) payoff function f : C ω → R: the goal of P 1 is then to maximize the expected value of f . A payoff function f induces the preference relation ⊑ f such that for µ, µ ′ ∈ Dist(C ω , F), µ ⊑ f µ ′ if and only if E µ (f ) ≤ E µ ′ (f ), where E µ (f ) is the expected value of f under µ. Payoff functions are more general than winning conditions: for W a winning condition, the preference relation induced by the indicator function of W corresponds to the preference relation induced by W .
It is also possible to specify preference relations that cannot be expressed as a payoff function. An example is given in [27]: we assume that the goal of P 1 is to see color c ∈ C with probability precisely 1/2. We denote the event of seeing color c as ♢c ∈ F. Then for µ, µ ′ ∈ Dist(C ω , F), we set µ ⊑ µ ′ if and only if |µ ′ (♢c) − 1/2| ≤ |µ(♢c) − 1/2|: the closer the probability of ♢c is to 1/2, the better the distribution.

A (two-player stochastic turn-based zero-sum) initialized game is a tuple G = (A, S init , ⊑), where (A, S init ) is an initialized arena and ⊑ is a preference relation.
For (A, S init ) an initialized arena, s ∈ S init , σ 1 ∈ Σ G 1 (A, S init ) a strategy of P 1 , and X a type of strategies, let UCol X ⊑ (A, s, σ 1 ) = {µ ∈ Dist(C ω , F) | ∃σ 2 ∈ Σ X 2 (A, S init ), Pc σ1,σ2 A,s ⊑ µ}. The set UCol X ⊑ (A, s, σ 1 ) corresponds to all the distributions that are at least as good for P 1 (w.r.t. ⊑) as a distribution that P 2 can induce by playing a strategy σ 2 of type X against σ 1 ; this set is upward-closed w.r.t. ⊑.
For σ 1 , σ ′ 1 ∈ Σ G 1 (A, S init ) two strategies of P 1 and s ∈ S init , we say that σ 1 is at least as good as σ ′ 1 from s under X strategies if UCol X ⊑ (A, s, σ 1 ) ⊆ UCol X ⊑ (A, s, σ ′ 1 ). This inclusion means that the best replies of P 2 against σ ′ 1 yield an outcome that is at least as bad for P 1 (w.r.t. ⊑) as the best replies of P 2 against σ 1 . We can define symmetrical notions for strategies of P 2 .
Let G = (A, S init , ⊑) be an initialized game and X ∈ {PFM, P, GFM, G} be a type of strategies. A strategy σ 1 ∈ Σ X 1 (A, S init ) of P 1 is X-optimal in G if it is at least as good under X strategies as any other strategy in Σ X 1 (A, S init ) from all s ∈ S init . When the considered preference relation ⊑ is clear, we often talk about X-optimality in an initialized arena (A, S init ) to refer to X-optimality in the initialized game (A, S init , ⊑). Given a preference relation, a class of arenas, and a type of strategies, our goal is to understand what kinds of strategies are sufficient to play optimally. In the following definition, abbreviations AIFM and FM stand respectively for arena-independent finite-memory and finite-memory.
▶ Definition 2. Let ⊑ be a preference relation, A be a class of initialized arenas, and X ∈ {PFM, P, GFM, G} be a type of strategies. We say that pure AIFM strategies suffice to play X-optimally in A for P 1 if there exists a memory skeleton M such that for all (A, S init ) ∈ A, P 1 has a pure strategy based on M that is X-optimal in (A, S init ). We say that pure FM strategies suffice to play X-optimally in A for P 1 if for all (A, S init ) ∈ A, there exists a memory skeleton M such that P 1 has a pure strategy based on M that is X-optimal in (A, S init ).
Since memoryless strategies are exactly the finite-memory strategies based on the trivial memory skeleton M triv , the sufficiency of pure memoryless strategies is equivalent to the sufficiency of pure strategies based on M triv , and is therefore a specific case of the sufficiency of pure AIFM strategies. Notice the difference between the order of quantifiers for AIFM and FM strategies: the sufficiency of pure AIFM strategies implies the sufficiency of pure FM strategies, but the converse is false in general (an example is given in [17]).
▶ Example 3. Let us reconsider the weak parity winning condition W wp introduced in Example 1: the goal of P 1 is to maximize the probability that the greatest color seen is even. To play optimally in any stochastic game, it is sufficient for both players to remember the largest color already seen, which can be implemented by the memory skeleton M max = (N, 0, (m, n) → max{m, n}). As explained above, this memory skeleton has an infinite state space, but as there are only finitely many colors in every (finite) arena, only a finite part of the skeleton is sufficient to play optimally in any given arena. The size of the skeleton used for a fixed arena depends on the appearing colors, but for a fixed number of colors, it does not depend on parameters of the arena (such as its state and action spaces). Therefore pure AIFM strategies suffice to play optimally for both players, and more precisely pure strategies based on M max suffice for both players. ⌟

We define a second, stronger notion related to optimality of strategies, that of subgame perfect strategy: a strategy is subgame perfect in a game if it reacts optimally to all histories consistent with the arena, even histories not consistent with the strategy itself, or histories that only a non-rational adversary would play [43]. This is a desirable property of strategies that is stronger than optimality, since a subgame perfect strategy is not only optimal from the initial position, but from any arbitrary stage (subgame) of the game. In particular, if an opponent plays non-optimally, an optimal strategy that is not subgame perfect does not always fully exploit the advantage that the opponent's suboptimal behavior provides, and may yield a result that is not optimal when starting in a subgame. We first need some extra definitions.
For w ∈ C * and µ ∈ Dist(C ω , F), we define the shifted distribution wµ as the distribution such that for an event E ∈ F, wµ(E) = µ({w ′ ∈ C ω | ww ′ ∈ E}). For (A, S init ) an initialized arena, for σ i ∈ Σ G i (A, S init ), and for ρ = s 0 a 1 s 1 . . . a n s n ∈ Hists(A, S init ), we define the shifted strategy σ i [ρ] ∈ Σ G i (A, out(ρ)) as the strategy such that, for all histories ρ ′ ∈ Hists i (A, out(ρ)), σ i [ρ](ρ ′ ) = σ i (ρ · ρ ′ ), where ρ · ρ ′ is the concatenation of ρ and ρ ′ glued at out(ρ). For ⊑ a preference relation and w ∈ C * , we define the shifted preference relation ⊑ [w] as the preference relation such that for µ, µ ′ ∈ Dist(C ω , F), µ ⊑ [w] µ ′ if and only if wµ ⊑ wµ ′ . A strategy σ 1 ∈ Σ X 1 (A, S init ) of P 1 is X-subgame perfect (X-SP) in (A, S init , ⊑) if for all histories ρ ∈ Hists(A, S init ), the shifted strategy σ 1 [ρ] is X-optimal in (A, out(ρ), ⊑ [col(ρ)] ). Strategies that are X-SP are in particular X-optimal; the converse is not true in general.

Coverability and subgame perfect strategies
In this section, we establish a key tool (Lemma 6) which can be used to reduce questions about the sufficiency of AIFM strategies in reasonable classes of initialized arenas to the sufficiency of memoryless strategies in a subclass. We then describe the use of this lemma to obtain our first main result (Theorem 7), which shows that the sufficiency of pure AIFM strategies implies the existence of pure AIFM SP strategies. Technical details are in [9].

We say that an initialized arena (A, S init ) is covered by a memory skeleton M = (M, m init , α upd ) if there exists a function ϕ : S → M such that ϕ(s) = m init for all s ∈ S init , and for all (s, a, s ′ ) ∈ δ, ϕ(s ′ ) = α upd (ϕ(s), col(s, a)). The coverability property means that it is possible to assign a unique memory state to each arena state such that transitions of the arena always update the memory state in a way that is consistent with the memory skeleton. A covered initialized arena already carries in some way sufficient information to play with memory M without actually using memory: using the memory skeleton M would not be more powerful than using no memory at all. This property is linked to the classical notion of product arena with M: a strategy based on M corresponds to a memoryless strategy in a product arena (e.g., [7, Lemma 1] and [9]). For our results, coverability is a key technical definition, as the class of initialized arenas covered by a memory skeleton is sufficiently well-behaved to support edge-induction arguments, whereas it is difficult to perform such techniques directly on the class of product arenas: removing a single edge from a product arena makes it hard to express as a product arena, whereas it is clear that coverability is preserved. Every initialized arena is covered by M triv , which is witnessed by the function ϕ associating m init to every state.
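When it exists, the covering function ϕ can be computed by simple forward propagation from the initial states. The following Python sketch (with our own dict-based encoding) returns ϕ or reports a conflict.

```python
# Propagate memory states from the initial states and verify that each arena
# state receives a unique memory state, consistently with the skeleton.
# Assumes (as in the paper) that all states are reachable from init_states.
def covered_by(delta, col, init_states, m_init, upd):
    """Return the covering function phi : S -> M if it exists, else None."""
    phi = {s: m_init for s in init_states}
    frontier = list(init_states)
    while frontier:
        s = frontier.pop()
        for (q, a), succs in delta.items():
            if q != s:
                continue
            m2 = upd(phi[s], col[(s, a)])
            for s2, p in succs.items():
                if p == 0:
                    continue
                if s2 not in phi:
                    phi[s2] = m2
                    frontier.append(s2)
                elif phi[s2] != m2:   # two inconsistent memory states for s2
                    return None
    return phi

# Every initialized arena is covered by the trivial skeleton:
delta = {("s", "a"): {"t": 1.0}, ("t", "a"): {"s": 1.0}}
col = {("s", "a"): 1, ("t", "a"): 2}
print(covered_by(delta, col, {"s"}, "m_init", lambda m, c: "m_init"))
```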
Definitions close to our notion of coverability by M were introduced for deterministic arenas in [36,7]. The definition of adherence with M in [36, Definition 8.12] is very similar, but does not distinguish initial states from the rest (neither in the arena nor in the memory skeleton). Our property of (A, S init ) being covered by M is also equivalent to A being both prefix-covered and cyclic-covered by M from S init [7]. Distinguishing the two notions provides insight in [7], as they are used at different places in proofs (prefix-covered along with monotony, and cyclic-covered along with selectivity). Here, we opt for a single concise definition.
The following lemma sums up our practical use of the idea of coverability.
▶ Lemma 6. Let ⊑ be a preference relation, M be a memory skeleton, and X ∈ {PFM, P, GFM, G} be a type of strategies. Let A be the class of one-player or two-player, stochastic or deterministic initialized arenas. Then, P 1 has an X-optimal (resp. X-SP) strategy based on M in all initialized arenas in A if and only if P 1 has a memoryless X-optimal (resp. X-SP) strategy in all initialized arenas covered by M in A.
We now state one of our main results, which shows that the sufficiency of pure strategies based on the same memory skeleton M implies that pure SP strategies based on M exist.

▶ Theorem 7. Let ⊑ be a preference relation, M be a memory skeleton, and X ∈ {PFM, P, GFM, G} be a type of strategies. Let A be the class of one-player or two-player, stochastic or deterministic initialized arenas. If P 1 has pure X-optimal strategies based on M in initialized arenas of A, then P 1 has pure X-SP strategies based on M in initialized arenas of A.
Proof sketch. By Lemma 6, we can prove instead that P 1 has a pure memoryless X-SP strategy in every initialized arena covered by M, based on the hypothesis that P 1 has a pure memoryless X-optimal strategy in every initialized arena covered by M. For (A, S init ) ∈ A covered by M, P 1 has a pure memoryless X-optimal strategy σ 0 1 . If this strategy is not X-SP, there must be a history ρ ∈ Hists(A, S init ) such that σ 0 1 is not X-optimal in (A, out(ρ), ⊑ [col(ρ)] ). Then, we extend (A, S init ) by adding a "chain" of states with colors col(ρ) leading to out(ρ), and add the first state of this chain as an initial state. This new arena is still covered by M, so P 1 has a pure memoryless X-optimal strategy in this arena that is now X-optimal after seeing ρ. If this strategy is not X-SP, we iterate our reasoning. This iteration necessarily ends, as we consider finite arenas, on which there are finitely many pure memoryless strategies. ◀

This result shows a major distinction between the sufficiency of AIFM strategies and the more general sufficiency of FM strategies: if a player can always play optimally with the same memory, then SP strategies may be played with the same memory as optimal strategies; if a player can play optimally but needs arena-dependent finite memory, then infinite memory may still be required to obtain SP strategies. One such example is provided in [38, Example 16] for average-energy games with a lower energy bound in deterministic arenas: P 1 can always play optimally with pure finite-memory strategies [6, Theorem 13], but infinite memory is needed for SP strategies. As will be further explained later, we also use Theorem 7 to gain technical insight into the proof of the main result of Section 5.
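Returning to the proof sketch of Theorem 7, the "chain" extension can be made concrete with a short sketch (the encoding and the name add_chain are ours).

```python
# Fresh states chain_0, ..., chain_{n-1} replay the color sequence of rho and
# lead deterministically to out(rho). Reading these colors from m_init
# reproduces exactly the memory state that the covering function assigns to
# out(rho), so the extended arena is still covered by M.
def add_chain(delta, col, colors_of_rho, out_rho):
    assert colors_of_rho, "assumes rho has at least one transition"
    delta, col = dict(delta), dict(col)
    n = len(colors_of_rho)
    for i, c in enumerate(colors_of_rho):
        succ = f"chain_{i+1}" if i + 1 < n else out_rho
        delta[(f"chain_{i}", "step")] = {succ: 1.0}   # deterministic transition
        col[(f"chain_{i}", "step")] = c
    return delta, col, "chain_0"   # chain_0 is added as a new initial state

delta = {("s", "a"): {"s": 1.0}}
col = {("s", "a"): 2}
print(add_chain(delta, col, [2, 5], "s"))
```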

One-to-two-player lift
Our goal in this section is to present a practical tool to help study the memory requirements of two-player stochastic (or deterministic) games. This tool reduces the study of the sufficiency of pure AIFM strategies for both players in two-player games to the study of one-player games. We first state our result, then explain how it relates to similar results from the literature and sketch its proof. A slightly generalized result with a more fine-grained quantification on the classes of initialized arenas is in [9], along with the complete proof.
▶ Theorem 8 (Pure AIFM one-to-two-player lift). Let ⊑ be a preference relation, M 1 and M 2 be two memory skeletons, and X ∈ {PFM, P, GFM, G} be a type of strategies. Let A be the class of all initialized stochastic or deterministic arenas. Assume that in all initialized one-player arenas of P 1 in A, P 1 can play X-optimally with a pure strategy based on M 1 , and in all initialized one-player arenas of P 2 in A, P 2 can play X-optimally with a pure strategy based on M 2 . Then in all initialized two-player arenas in A, both players have a pure X-SP strategy based on M 1 ⊗ M 2 .
The practical usage of this result can be summed up as follows: to determine whether pure AIFM strategies are sufficient for both players in stochastic (resp. deterministic) arenas to play X-optimally, it is sufficient to prove it for stochastic (resp. deterministic) one-player arenas. Our theorem deals in a uniform manner with stochastic and deterministic arenas, under different types of strategies. Studying memory requirements of one-player arenas is significantly easier than doing so in two-player arenas, as a one-player arena can be seen as a graph (in the deterministic case) or an MDP (in the stochastic case). Still, we bring more tools to study memory requirements of one-player arenas in Section 5.
Theorem 8 generalizes known one-to-two-player lifts: for pure memoryless strategies in deterministic [33] and stochastic [34] games, and for pure AIFM strategies in deterministic games [7]. Very briefly, our proof technique consists of extending the lift for pure memoryless strategies in stochastic games [34] in order to deal with initialized arenas. Then, we show that this pure memoryless one-to-two-player lift can be applied to the class of initialized arenas covered by M 1 ⊗ M 2 (using an edge-induction technique), and Lemma 6 allows us to go back from pure memoryless strategies to pure strategies based on M 1 ⊗ M 2 . Thanks to Theorem 7, we also go further in our understanding of optimal strategies: we obtain the existence of X-SP strategies instead of the seemingly weaker existence of X-optimal strategies.

AIFM characterization
For this section, we fix ⊑ a preference relation, X ∈ {PFM, P, GFM, G} a type of strategies, and M = (M, m init , α upd ) a memory skeleton. We distinguish two classes of initialized arenas: the class A D P1 of all initialized one-player deterministic arenas of P 1 , and the class A S P1 of all initialized one-player stochastic arenas of P 1 . A class of arenas will therefore be specified by a letter Y ∈ {D, S}, which we fix for the whole section. Our aim is to give a better understanding of the preference relations for which pure strategies based on M suffice to play X-optimally in A Y P1 , by characterizing this sufficiency through two intuitive conditions. All definitions and proofs are stated from the point of view of P 1 . As we only work with one-player arenas in this section, we abusively write P σ1 A,s and Pc σ1 A,s for the distributions on plays and colors induced by a strategy σ 1 of P 1 on (A, s), with the unique, trivial strategy for P 2 .
For A ∈ A Y P1 and s a state of A, we write [A] X s = {Pc σ1 A,s | σ 1 ∈ Σ X 1 (A, s)} for the set of all distributions over (C ω , F) induced by strategies of type X in A from s.
For m 1 , m 2 ∈ M , we write L m1,m2 = {w ∈ C * | α upd (m 1 , w) = m 2 } for the language of words that are read from m 1 up to m 2 in M. Such a language can be specified by the deterministic automaton that is simply the memory skeleton M with m 1 as the initial state and m 2 as the unique final state. We extend the shifted distribution notation to sets of distributions: for w ∈ C * , for Λ ⊆ Dist(C ω , F), we write wΛ for the set {wµ | µ ∈ Λ}.
For two arenas A 1 and A 2 with disjoint state spaces, if s 1 and s 2 are two states controlled by P 1 that are respectively in A 1 and A 2 and have disjoint sets of available actions, we write (A 1 , s 1 ) ⊔ (A 2 , s 2 ) for the merged arena in which s 1 and s 2 are merged, and everything else is kept the same. The merged state which comes from the merge of s 1 and s 2 is usually called t. Formally, let A 1 = (S 1 1 , S 1 2 , A 1 , δ 1 , col 1 ), A 2 = (S 2 1 , S 2 2 , A 2 , δ 2 , col 2 ), s 1 ∈ S 1 1 , and s 2 ∈ S 2 1 . We assume that S 1 ∩ S 2 = ∅ and that A 1 (s 1 ) ∩ A 2 (s 2 ) = ∅. In the merged arena, t replaces s 1 and s 2 , its available actions are A(t) = A 1 (s 1 ) ⊎ A 2 (s 2 ), and for i ∈ {1, 2}, δ(t, a) = δ i (s i , a) and col(t, a) = col i (s i , a) if a ∈ A i (s i ); all the other available actions, transitions and colors are kept the same as in the original arenas (with transitions going to s 1 or s 2 being redirected to t). A symmetrical definition can be written if s 1 and s 2 are both controlled by P 2 .
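Under a dict-based encoding of the transition function (our own illustration), the merge operation can be sketched as follows; colors are handled analogously.

```python
# s1 and s2 (which have disjoint sets of available actions) are fused into a
# state "t", and transitions entering s1 or s2 are redirected to "t".
def merge(delta1, delta2, s1, s2):
    redirect = lambda q: "t" if q in (s1, s2) else q
    merged = {}
    for delta in (delta1, delta2):
        for (q, a), succs in delta.items():
            merged[(redirect(q), a)] = {redirect(r): p for r, p in succs.items()}
    return merged

d1 = {("s1", "a"): {"s1": 1.0}}              # a self-loop on s1 in A1
d2 = {("s2", "b"): {"s2": 1.0}}              # a self-loop on s2 in A2
print(merge(d1, d2, "s1", "s2"))             # both loops now live on "t"
```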
We can now present the two properties of preference relations at the core of our characterization. These properties are called X-Y-M-monotony and X-Y-M-selectivity; they depend on a type of strategies X, a type of arenas Y, and a memory skeleton M. The first appearance of the monotony (resp. selectivity) notion was in [33], which dealt with deterministic arenas and memoryless strategies; their monotony (resp. selectivity) is equivalent to our P-D-M triv -monotony (resp. P-D-M triv -selectivity). In [7], these definitions were generalized to deal with the sufficiency of strategies based on M in deterministic arenas; their notion of M-monotony (resp. M-selectivity) is equivalent to our P-D-M-monotony (resp. P-D-M-selectivity).

▶ Definition 9 (Monotony). We say that ⊑ is X-Y-M-monotone if for all m ∈ M , for all initialized one-player arenas (A 1 , s 1 ), (A 2 , s 2 ) ∈ A Y P1 , there exists i ∈ {1, 2} such that for all w ∈ L minit,m , w[A 3−i ] X s3−i ⊑ w[A i ] X si .
The crucial part of the definition is the order of the last two quantifiers: of course, given a w ∈ L minit,m , as ⊑ is total, one of the two sets w[A 1 ] X s1 and w[A 2 ] X s2 is always preferred to the other, with the preferred index possibly depending on w. However, we ask for something stronger: it must be the case that the same set of distributions w[A i ] X si is preferred to w[A 3−i ] X s3−i for every word w ∈ L minit,m . The original monotony definition [33] states that when presented once with a choice among two possible continuations, if a continuation is better than the other one after some prefix, then this continuation is also at least as good after all prefixes. This property alone does not guarantee the sufficiency of pure memoryless strategies, as it does not guarantee that if the same choice presents itself multiple times in the game, the same continuation should always be chosen: alternating between both continuations might still be beneficial in the long run; this is dealt with by selectivity. If memory M is necessary to play optimally, then it makes sense that there are different optimal choices depending on the current memory state and that we should only compare prefixes that reach the same memory state. The point of taking into account a memory skeleton M in our definition of X-Y-M-monotony is to only compare prefixes that are read up to the same memory state from m init .

▶ Definition 10 (Selectivity). We say that ⊑ is X-Y-M-selective if for all m ∈ M , for all w ∈ L minit,m , for all initialized one-player arenas (A 1 , s 1 ), (A 2 , s 2 ) ∈ A Y P1 that are covered by the memory skeleton (M, m, α upd ) (that is, M with m as its initial state), we have w[(A 1 , s 1 ) ⊔ (A 2 , s 2 )] X t ⊑ w([A 1 ] X s1 ∪ [A 2 ] X s2 ) (where t comes from the merge of s 1 and s 2 ).
Our formulation of the selectivity concept differs from the original definition [33] and its AIFM counterpart [7] in order to take into account the particularities of the stochastic context, even if it can be proven that they are equivalent in the pure deterministic case. The idea is still the same: the original selectivity definition states that when presented with a choice among multiple possible continuations after some prefix, if a continuation is better than the others, then as the game goes on, if the same choice presents itself again, it is sufficient to always pick the same continuation to play optimally; there is no need to alternate between continuations. This property alone does not guarantee the sufficiency of pure memoryless strategies, as it does not guarantee that for all prefixes, the same initial choice is always the one we should commit to; this is dealt with by monotony. The point of memory skeleton M in our definition is to guarantee that every time the choice is presented, we are currently in the same memory state.
An interesting property is that both notions are stable under product with a memory skeleton: if ⊑ is X-Y-M-monotone (resp. X-Y-M-selective), then for all memory skeletons M ′ , ⊑ is also X-Y-(M ⊗ M ′ )-monotone (resp. X-Y-(M ⊗ M ′ )-selective). The reason is that in each definition, we quantify universally over the class of all prefixes w that reach the same memory state m; if we consider classes that are subsets of the original classes, then the definition still holds. This property matches the idea that playing with more memory is never detrimental.
Taken together, it is intuitively reasonable that X-Y-M-monotony and X-Y-M-selectivity are equivalent to the sufficiency of pure strategies based on M to play X-optimally in A Y P1 : monotony tells us that when a single choice has to be made given a state of the arena and a memory state, the best choice is always the same no matter what prefix has been seen, and selectivity tells us that once a good choice has been made, we can commit to it in the future of the game. We formalize this idea in Theorem 13. First, we add an extra restriction on preference relations which is useful when stochasticity is involved.
▶ Definition 11 (Mixing is useless). We say that mixing is useless for ⊑ if for all sets I at most countable, for all positive reals (λ i ) i∈I such that ∑ i∈I λ i = 1, for all families (µ i ) i∈I and (µ ′ i ) i∈I of distributions in Dist(C ω , F) such that µ i ⊑ µ ′ i for all i ∈ I, we have ∑ i∈I λ i µ i ⊑ ∑ i∈I λ i µ ′ i . That is, if we can write a distribution as a convex combination of distributions, then it is never detrimental to improve a distribution appearing in the convex combination.
▶ Remark 12. All preference relations encoded as Borel real payoff functions (as defined in Example 1) satisfy this property (it is easy to show the property for indicator functions, and we can then extend this fact to all Borel functions thanks to properties of the Lebesgue integral). The third preference relation from Example 1 (reaching c ∈ C with probability precisely 1/2) does not satisfy this property: if µ 1 (♢c) = 0, µ ′ 1 (♢c) = 1/2, and µ 2 (♢c) = 1, we have µ 1 ⊏ µ ′ 1 and µ 2 ⊑ µ 2 , but (1/2)µ ′ 1 + (1/2)µ 2 ⊏ (1/2)µ 1 + (1/2)µ 2 . In deterministic games with pure strategies, only Dirac distributions on infinite words occur as distributions induced by an arena and a strategy, so the requirement that mixing is useless is not needed. ⌟

▶ Theorem 13. Assume that no stochasticity is involved (that is, X ∈ {P, PFM} and Y = D), or that mixing is useless for ⊑. Then pure strategies based on M suffice to play X-optimally in all initialized one-player arenas in A Y P1 for P 1 if and only if ⊑ is X-Y-M-monotone and X-Y-M-selective.
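Remark 12's counterexample can be checked numerically. In the following sketch (our encoding), a distribution is summarized by the probability it assigns to ♢c, and the preference favors probabilities closer to 1/2.

```python
# Preference: a distribution with probability q of seeing color c is at least
# as good as one with probability p iff q is at least as close to 1/2.
better_or_equal = lambda p, q: abs(q - 0.5) <= abs(p - 0.5)   # p ⊑ q

p1, p1_prime, p2 = 0.0, 0.5, 1.0
assert better_or_equal(p1, p1_prime)            # µ1 ⊑ µ′1 (even strictly)
mix_improved = 0.5 * p1_prime + 0.5 * p2        # 0.75, off target
mix_original = 0.5 * p1 + 0.5 * p2              # 0.5, exactly on target
assert not better_or_equal(mix_original, mix_improved)   # mixing got worse
print("mixing is not useless for this preference relation")
```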
Proof sketch. We sketch both directions of the proof (available in [9]). The proof of the necessary condition of Theorem 13 is the easiest direction. The main idea is to build the right arenas (using the arenas occurring in the definitions of monotony and selectivity) so that we can use the hypothesis about the existence of pure X-optimal strategies based on M to immediately deduce X-Y-M-monotony and X-Y-M-selectivity. It is not necessary that mixing is useless for ⊑ for this direction of the equivalence.
For the sufficient condition, we first reduce as usual the statement to the existence of pure memoryless strategies in covered initialized arenas, using Lemma 6. We proceed with an edge-induction in these arenas (as for Theorem 8). The base case is trivial (as in an arena in which all states have a single available action, there is a single strategy which is pure and memoryless). For the induction step, we take an initialized arena (A ′ , S init ) ∈ A Y P1 covered by M, and we pick a state t with (at least) two available actions. A memory state ϕ(t) is associated to t thanks to coverability. We consider arenas (A ′ a , S init ) obtained from (A ′ , S init ) by leaving a single action a available in t, to which we can apply the induction hypothesis and obtain a pure memoryless X-optimal strategy σ a 1 . It is left to prove that one of these strategies is also X-optimal in (A ′ , S init ); this is where X-Y-M-monotony and X-Y-M-selectivity come into play.
The property of X-Y-M-monotony tells us that one of these subarenas (A ′ a * , S init ) is preferred to the others w.r.t. ⊑ after reading any word in L minit,ϕ(t) . We now want to use X-Y-M-selectivity to conclude that there is no reason to use actions different from a * when coming back to t, and that σ a * 1 is therefore also X-optimal in (A ′ , S init ). To do so, we take any strategy σ 1 ∈ Σ X 1 (A ′ , s) for s ∈ S init and we condition distribution P σ1 A ′ ,s over all the ways it reaches (or not) t, which gives a convex combination of probability distributions. We want to state that once t is reached, no matter how, switching to strategy σ a * 1 is always beneficial. For this, we would like to use X-subgame-perfection of σ a * 1 rather than simply X-optimality: this is why in the actual proof, our induction hypothesis is about X-SP strategies and not X-optimal strategies. Luckily, Theorem 7 indicates that requiring subgame perfection is not really stronger than what we want to prove. We then use that mixing is useless for ⊑ (Definition 11) to replace all the parts that go through t in the convex combination by a better distribution induced by σ a * 1 from t. ◀

The literature provides some sufficient conditions for preference relations to admit pure memoryless optimal strategies in one-player stochastic games (for instance, in [31]). Here, we obtain a full characterization when mixing is useless for ⊑ (in particular, this is a full characterization for Borel real payoff functions), which can deal not only with memoryless strategies, but also with the more general AIFM strategies. It therefore provides a more fundamental understanding of preference relations for which AIFM strategies suffice or do not suffice. In particular, there are examples in which the known sufficient conditions are not verified even though pure memoryless strategies suffice (one such example is provided in [10]), and that is for instance where our characterization can help.
It is interesting to relate the concepts of monotony and selectivity to other properties from the literature to simplify the use of our characterization. For instance, if a real payoff function f : C ω → R is prefix-independent (i.e., for all w ∈ C * and w ′ ∈ C ω , f (ww ′ ) = f (w ′ )), then the induced preference relation ⊑ f is X-Y-M-monotone for any X, Y, and M; therefore, the sufficiency of pure AIFM strategies immediately reduces to analyzing selectivity.

Conclusion
We have studied stochastic games and given an overview of desirable properties of preference relations that admit pure arena-independent finite-memory optimal strategies. Our analysis provides general tools to help study memory requirements in stochastic games, both with one player (Markov decision processes) and with two players, and links the two problems. It generalizes both work on deterministic games [33,7] and work on stochastic games [34].
A natural question that remains open is the link between the memory requirements of a preference relation in deterministic and in stochastic games; our results can be invoked independently to study both problems, but do not yet describe a bridge from one to the other. Also, our results can only be used to show the optimality of pure strategies with some fixed memory, but in some cases, using randomized strategies allows for lower memory requirements [16,41]. Investigating whether extensions of our results dealing with randomized strategies hold would therefore be valuable.