Algorithms for Omega-Regular Games with Imperfect Information

We study observation-based strategies for two-player turn-based games on graphs with omega-regular objectives. An observation-based strategy relies on imperfect information about the history of a play, namely, on the past sequence of observations. Such games occur in the synthesis of a controller that does not see the private state of the plant. Our main results are twofold. First, we give a fixed-point algorithm for computing the set of states from which a player can win with a deterministic observation-based strategy for any omega-regular objective. The fixed point is computed in the lattice of antichains of state sets. This algorithm has the advantages of being directed by the objective and of avoiding an explicit subset construction on the game graph. Second, we give an algorithm for computing the set of states from which a player can win with probability 1 with a randomized observation-based strategy for a Büchi objective. This set is of interest because in the absence of perfect information, randomized strategies are more powerful than deterministic ones. We show that our algorithms are optimal by proving matching lower bounds.


Introduction
Two-player games on graphs play an important role in computer science. In particular, the controller synthesis problem asks, given a model for a plant, to construct a model for a controller such that the behaviors resulting from the parallel composition of the two models respect a given specification (e.g., are included in an ω-regular set). Controllers can be synthesized as winning strategies in a game graph whose vertices represent the plant states, and whose players represent the plant and the controller [18,17]. Other applications of game graphs include realizability and compatibility checking, where the players represent parallel processes of a system, or its environment [1,11,6].
Most results about two-player games played on graphs make the hypothesis of perfect information. In this setting, the controller knows, during its interaction with the plant, the exact state of the plant. In practice, this hypothesis is often not reasonable. For example, in the context of hybrid systems, the controller acquires information about the state of the plant using sensors with finite precision, which return imperfect information about the state. Similarly, if the players represent individual processes, then a process has only access to the public variables of the other processes, not to their private variables [19,2].
Two-player games of imperfect information are considerably more complicated than games of perfect information. First, decision problems for imperfect-information games usually lie in higher complexity classes than their perfect-information counterparts [19,14,2]. The algorithmic difference is often exponential, due to a subset construction that, similar to the determinization of finite automata, turns an imperfect-information game into an equivalent perfect-information game. Second, because of the determinization, no symbolic algorithms are known to solve imperfect-information games. This is in contrast to the perfect-information case, where (often) simple and elegant fixed-point algorithms exist [12,8]. Third, in the context of imperfect information, deterministic strategies are sometimes insufficient. A game is turn-based if in every state one of the players chooses a successor state. In turn-based games of perfect information the set of winning states coincides with the set of states where the probability of winning is 1, and so deterministic strategies suffice to win (and thus also to win with probability 1). In contrast, in turn-based games of imperfect information the set of winning states is in general a strict subset of the set of states where the probability of winning is 1, and so randomized strategies are required to win with probability 1 (see Example 2.3). Fourth, winning strategies for imperfect-information games need memory even for simple objectives such as safety and reachability (see Example 4.4). This is again in contrast to the perfect-information case, where turn-based safety and reachability games can be won with memoryless strategies.
The contributions of this paper are twofold. First, we provide a symbolic fixed-point algorithm to compute winning states in games of imperfect information for arbitrary ω-regular objectives. The novelty is that our algorithm is symbolic; it does not carry out an explicit subset construction. Instead, we compute fixed points on the lattice of antichains of state sets. Antichains of state sets can be seen as a symbolic and compact representation for ⊆-downward-closed sets of sets of states.¹ This solution extends our recent result [10] from safety objectives to all ω-regular objectives. To justify the correctness of the algorithm, we transform games of imperfect information into games of perfect information while preserving the existence of winning strategies for every objective. The reduction is only part of the proof, not part of the algorithm. For the special case of parity objectives, we obtain a symbolic Exptime algorithm for solving parity games of imperfect information. This is optimal, as the reachability problem for games of imperfect information is known to be Exptime-hard [19].
¹ We recently used this symbolic representation of ⊆-downward-closed sets of state sets to propose a new algorithm for solving the universality problem of nondeterministic finite automata. First experiments show a very promising performance; see [9] for details.
Second, we study randomized strategies and winning with probability 1 for imperfect-information games. To our knowledge, for these games no algorithms (symbolic or not) are present in the literature. Following [7], we refer to winning with probability 1 as almost-sure winning (almost winning, for short), in contrast to sure winning with deterministic strategies. We provide a symbolic Exptime algorithm to compute the set of almost-winning states for games of imperfect information with Büchi objectives (reachability objectives can be obtained as a special case, and for safety objectives almost winning and sure winning coincide). Our solution is again justified by a reduction to games of perfect information. However, for randomized strategies the reduction is different, and considerably more complicated. We prove our algorithm to be optimal, showing that computing the almost-winning states for reachability games of imperfect information is Exptime-hard. Whether the almost-winning states for coBüchi objectives under imperfect information can be computed in Exptime remains an open problem.
The paper is organized as follows. Section 2 presents the definitions; Section 3 gives the algorithm for the case of sure winning with deterministic strategies; Section 4, for the case of almost winning with randomized strategies; and Section 5 provides the lower bounds.
Related work. In [17], Pnueli and Rosner study the synthesis of reactive modules. In their framework, there is no game graph; instead, the environment and the objective are specified using an LTL formula. In [14], Kupferman and Vardi extend these results in two directions: they consider CTL* objectives and imperfect information. Again, no game graph, but a specification formula is given to the synthesis procedure. We believe that our setting, where a game graph is given explicitly, is more suited to fully and uniformly understand the role of imperfect information. For example, Kupferman and Vardi claim that imperfect information comes at no cost, because if the specification is given as a CTL (or CTL*) formula, then the synthesis problem is complete for Exptime (resp. 2Exptime), just as in the perfect-information case. These hardness results, however, depend on the fact that the specification is given compactly as a formula. In our setting, with an explicit game graph, reachability games of perfect information are Ptime-complete, whereas reachability games of imperfect information are Exptime-complete [19]. None of the above papers provide symbolic solutions, and none of them consider randomized strategies.
It is known that for Partially Observable Markov Decision Processes (POMDPs) with boolean rewards and limit-average objectives the quantitative analysis (whether the value is greater than a specified threshold) is Exptime-complete [15]. However, almost winning is a qualitative question, and our hardness result for almost winning of imperfect-information games does not follow from the known results on POMDPs. We give in Section 5 a detailed proof of the hardness result of [19] for sure winning of imperfect-information games with reachability objectives, and we show that this proof can be extended to almost winning as well. To the best of our knowledge, this is the first hardness result that applies to the qualitative analysis of almost winning in imperfect-information games.
A class of semiperfect-information games, where one player has imperfect information and the other player has perfect information, is studied in [4]. That class is simpler than the games studied here; it can be solved in NP ∩ coNP for parity objectives.

Definitions
A game structure (of imperfect information) is a tuple $G = \langle L, l_0, \Sigma, \Delta, O, \gamma \rangle$, where $L$ is a finite set of states, $l_0 \in L$ is the initial state, $\Sigma$ is a finite alphabet, $\Delta \subseteq L \times \Sigma \times L$ is a set of labeled transitions, $O$ is a finite set of observations, and $\gamma : O \to 2^L \setminus \{\emptyset\}$ maps each observation to the set of states that it represents. We require the following two properties on $G$: (i) for all $\ell \in L$ and all $\sigma \in \Sigma$, there exists $\ell' \in L$ such that $(\ell, \sigma, \ell') \in \Delta$; and (ii) the set $\{\gamma(o) \mid o \in O\}$ partitions $L$. We say that $G$ is a game structure of perfect information if $O = L$ and $\gamma(\ell) = \{\ell\}$ for all $\ell \in L$. We often omit $(O, \gamma)$ in the description of games of perfect information. For $\sigma \in \Sigma$ and $s \subseteq L$, let $\mathsf{Post}^G_\sigma(s) = \{\ell' \in L \mid \exists \ell \in s : (\ell, \sigma, \ell') \in \Delta\}$.

Plays. In a game structure, in each turn, Player 1 chooses a letter in $\Sigma$, and Player 2 resolves nondeterminism by choosing the successor state. A play in $G$ is an infinite sequence $\pi = \ell_0 \sigma_0 \ell_1 \ldots \sigma_{n-1} \ell_n \sigma_n \ldots$ such that (i) $\ell_0 = l_0$, and (ii) for all $i \geq 0$, we have $(\ell_i, \sigma_i, \ell_{i+1}) \in \Delta$. The prefix up to $\ell_n$ of the play $\pi$ is denoted by $\pi(n)$; its length is $|\pi(n)| = n + 1$; and its last element is $\mathsf{Last}(\pi(n)) = \ell_n$. The observation sequence of $\pi$ is the unique infinite sequence $\gamma^{-1}(\pi) = o_0 \sigma_0 o_1 \ldots \sigma_{n-1} o_n \sigma_n \ldots$ such that for all $i \geq 0$, we have $\ell_i \in \gamma(o_i)$. Similarly, the observation sequence of $\pi(n)$ is the prefix up to $o_n$ of $\gamma^{-1}(\pi)$. The set of infinite plays in $G$ is denoted $\mathsf{Plays}(G)$, and the set of corresponding finite prefixes is denoted $\mathsf{Prefs}(G)$. A state $\ell \in L$ is reachable in $G$ if there exists a prefix $\rho \in \mathsf{Prefs}(G)$ such that $\mathsf{Last}(\rho) = \ell$. For a prefix $\rho \in \mathsf{Prefs}(G)$, the cone $\mathsf{Cone}(\rho) = \{\pi \in \mathsf{Plays}(G) \mid \rho \text{ is a prefix of } \pi\}$ is the set of plays that extend $\rho$.

The knowledge associated with a finite observation sequence $\tau = o_0 \sigma_0 o_1 \sigma_1 \ldots \sigma_{n-1} o_n$ is the set $K(\tau)$ of states in which a play can be after this sequence of observations, that is, $K(\tau) = \{\mathsf{Last}(\rho) \mid \rho \in \mathsf{Prefs}(G) \text{ and } \gamma^{-1}(\rho) = \tau\}$.

Lemma 2.1. Let $G = \langle L, l_0, \Sigma, \Delta, O, \gamma \rangle$ be a game structure of imperfect information. For $\sigma \in \Sigma$, $\ell \in L$, and $\rho, \rho' \in \mathsf{Prefs}(G)$ with $\rho' = \rho \cdot \sigma \cdot \ell$, let $o_\ell \in O$ be the unique observation such that $\ell \in \gamma(o_\ell)$. Then $K(\gamma^{-1}(\rho')) = \mathsf{Post}^G_\sigma(K(\gamma^{-1}(\rho))) \cap \gamma(o_\ell)$.
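The knowledge K(τ) can be maintained incrementally: after Player 1 plays a letter and then receives an observation, the new knowledge consists of the successors of the old knowledge that are compatible with that observation. The following is a minimal sketch of this update. The toy game, the state names, and the function names `post` and `update_knowledge` are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Transition = Tuple[str, str, str]  # (state, letter, successor)

def post(delta: Set[Transition], sigma: str, s: Iterable[str]) -> FrozenSet[str]:
    """Post_sigma(s): all sigma-successors of the states in s."""
    sources = set(s)
    return frozenset(l2 for (l1, a, l2) in delta if a == sigma and l1 in sources)

def update_knowledge(delta: Set[Transition],
                     gamma: Dict[str, FrozenSet[str]],
                     knowledge: FrozenSet[str],
                     sigma: str,
                     obs: str) -> FrozenSet[str]:
    """Knowledge after playing sigma from `knowledge` and then observing obs."""
    return post(delta, sigma, knowledge) & gamma[obs]

# Hypothetical toy game: Player 1 cannot tell l2 from l2p (both in o2).
DELTA = {
    ("l1", "a", "l2"), ("l1", "a", "l2p"), ("l1", "b", "l1"),
    ("l2", "a", "l3"), ("l2", "b", "l1"),
    ("l2p", "a", "l1"), ("l2p", "b", "l3"),
    ("l3", "a", "l3"), ("l3", "b", "l3"),
}
GAMMA = {
    "o1": frozenset({"l1"}),
    "o2": frozenset({"l2", "l2p"}),
    "o3": frozenset({"l3"}),
}

k0 = frozenset({"l1"})                               # initial knowledge {l_0}
k1 = update_knowledge(DELTA, GAMMA, k0, "a", "o2")   # after o1 a o2
k2 = update_knowledge(DELTA, GAMMA, k1, "a", "o3")   # after o1 a o2 a o3
```

In this toy game the knowledge grows to two states after the first move and shrinks back to a singleton once the observation o3 rules out one of the successors.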
Sure winning and almost winning. A strategy $\lambda_i$ for Player $i$ in $G$ is sure winning for an objective $\varphi$ if for all $\pi \in \mathsf{Outcome}_i(G, \lambda_i)$, we have $\pi \models \varphi$. Given a game structure $G$ and a state $\ell$ of $G$, we write $G_\ell$ for the game structure that results from $G$ by changing the initial state to $\ell$, that is, if $G = \langle L, l_0, \Sigma, \Delta, O, \gamma \rangle$, then $G_\ell = \langle L, \ell, \Sigma, \Delta, O, \gamma \rangle$. An event is a measurable set of plays, and given strategies $\alpha$ and $\beta$ for the two players, the probabilities of events are uniquely defined [22]. For a Borel objective $\varphi$, we denote by $\Pr^{\alpha,\beta}_{\ell}(\varphi)$ the probability that $\varphi$ is satisfied in the game $G_\ell$ given the strategies $\alpha$ and $\beta$. A strategy $\alpha$ for Player 1 in $G$ is almost winning for the objective $\varphi$ if for all randomized strategies $\beta$ for Player 2, we have $\Pr^{\alpha,\beta}_{l_0}(\varphi) = 1$. The set of sure-winning (resp. almost-winning) states of a game structure $G$ for the objective $\varphi$ is the set of states $\ell$ such that Player 1 has a deterministic sure-winning (resp. randomized almost-winning) observation-based strategy in $G_\ell$ for the objective $\varphi$.

Notice that deterministic strategies suffice for sure winning a game: given a randomized strategy $\alpha$ for Player 1, let $\alpha^D$ be the deterministic strategy such that for all $\rho \in \mathsf{Prefs}(G)$, the strategy $\alpha^D(\rho)$ chooses an input letter from $\mathsf{Supp}(\alpha(\rho))$. Then $\mathsf{Outcome}_1(G, \alpha^D) \subseteq \mathsf{Outcome}_1(G, \alpha)$, and thus, if $\alpha$ is sure winning, then so is $\alpha^D$. The result also holds for observation-based strategies. However, for almost winning, randomized strategies are more powerful than deterministic strategies, as shown by Example 2.3.

Example 2.3.
Consider the game structure shown in Figure 1. The transitions are shown as labeled edges in the figure, and the initial state is $\ell_1$. The objective of Player 1 is $\mathsf{Reach}(\{o_4\})$, to reach state $\ell_4$. We argue that the game is not sure winning for Player 1. Let $\alpha$ be any deterministic strategy for Player 1. Consider the deterministic strategy $\beta$ for Player 2 as follows: for all $\rho \in \mathsf{Prefs}(G)$ such that $\mathsf{Last}(\rho) \in \gamma(o_2)$, if $\alpha(\rho) = a$, then in the previous round $\beta$ chooses the state $\ell_2$, and if $\alpha(\rho) = b$, then in the previous round $\beta$ chooses the state $\ell'_2$. Given $\alpha$ and $\beta$, the play $\mathsf{outcome}(G, \alpha, \beta)$ never reaches $\ell_4$. Similarly, Player 2 has no sure-winning strategy for the dual objective $\mathsf{Safe}(\{o_1, o_2, o_3\})$. Hence the game is not determined. However, the game $G$ is almost winning for Player 1. Consider the randomized strategy that plays $a$ and $b$ uniformly at random at all states. Every time the game visits observation $o_2$, for any strategy for Player 2, the game visits $\ell_3$ and $\ell'_3$ with probability $\frac{1}{2}$, and hence also reaches $\ell_4$ with probability $\frac{1}{2}$. It follows that against all Player 2 strategies the play eventually reaches $\ell_4$ with probability 1.
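The last step of the example can be spelled out as a short calculation. It assumes, as the example suggests, that a play that has not yet reached $\ell_4$ keeps returning to observation $o_2$, and that Player 1's coin flips at successive visits are independent. The index $n$, counting visits to $o_2$, is our notation.

```latex
\Pr\bigl(\ell_4 \text{ not reached within the first } n \text{ visits to } o_2\bigr)
  \;\le\; \left(\tfrac{1}{2}\right)^{n}
  \;\xrightarrow[\,n \to \infty\,]{}\; 0
```

Under these assumptions the probability of never reaching $\ell_4$ is 0, whatever Player 2 does, which is the almost-winning claim.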
Spoiling strategies. To spoil a strategy of Player 1 (for sure winning), Player 2 does not need the full memory of the history of the play; counting strategies are enough. We say that a deterministic strategy $\beta : \mathsf{Prefs}(G) \times \Sigma \to L$ for Player 2 is counting if for all prefixes $\rho, \rho' \in \mathsf{Prefs}(G)$ such that $|\rho| = |\rho'|$ and $\mathsf{Last}(\rho) = \mathsf{Last}(\rho')$, and for all $\sigma \in \Sigma$, we have $\beta(\rho, \sigma) = \beta(\rho', \sigma)$. Let $\mathcal{B}^c_G$ be the set of counting strategies for Player 2. The only memory needed by a counting strategy is the number of turns that have been played. This type of strategy is sufficient to spoil the non-winning strategies of Player 1.
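Proof. We prove the equivalent statement that if an observation-based strategy of Player 1 can be spoiled by some strategy of Player 2, then it can be spoiled by a counting strategy of Player 2. Let $\alpha^o$ be an arbitrary observation-based strategy for Player 1 in $G$. Let $\beta \in \mathcal{B}_G$ be a strategy for Player 2 such that $\mathsf{outcome}(G, \alpha^o, \beta) \notin \varphi$. Let $\mathsf{outcome}(G, \alpha^o, \beta) = \ell_0 \sigma_0 \ell_1 \ldots \sigma_{n-1} \ell_n \sigma_n \ldots$ and define a counting strategy $\beta^c$ for Player 2 such that $\forall \rho \in \mathsf{Prefs}(G) \cdot \forall \sigma \in \Sigma$: if $\mathsf{Last}(\rho) = \ell_{n-1}$ and $\sigma = \sigma_{n-1}$ for $n = |\rho|$, then $\beta^c(\rho, \sigma) = \ell_n$, and otherwise $\beta^c(\rho, \sigma)$ is fixed arbitrarily in the set $\mathsf{Post}^G_\sigma(\mathsf{Last}(\rho))$. Clearly, $\beta^c$ is a counting strategy and we have $\mathsf{outcome}(G, \alpha^o, \beta^c) = \mathsf{outcome}(G, \alpha^o, \beta)$, hence $\mathsf{outcome}(G, \alpha^o, \beta^c) \notin \varphi$.

Remarks. First, the hypothesis that the observations form a partition of the state space can be weakened to a covering of the state space, where observations can overlap [10]. In that case, Player 2 chooses both the next state of the game $\ell$ and the next observation $o$ such that $\ell \in \gamma(o)$. The definitions related to plays, strategies, and objectives are adapted accordingly. Such a game structure $G$ with overlapping observations can be encoded by an equivalent game structure $G'$ of imperfect information, whose state space is the set of pairs $(\ell, o)$ such that $\ell \in \gamma(o)$. The games $G$ and $G'$ are equivalent in the sense that for every Borel objective $\varphi$, there exists a sure (resp. almost) winning strategy for Player $i$ in $G$ for $\varphi$ if and only if there exists such a winning strategy for Player $i$ in $G'$ for $\varphi$.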
Second, it is essential that the objective is expressed in terms of the observations. Indeed, the games of imperfect information with a nonobservable winning condition are more complicated to solve. For instance, the universality problem for Büchi automata can be reduced to such games, but the construction that we propose in Section 3 cannot be used. More involved constructions à la Safra are needed [20].

Sure Winning
First, we show that a game structure $G$ of imperfect information can be encoded by a game structure $G^K$ of perfect information such that for every objective $\varphi$, there exists a deterministic observation-based sure-winning strategy for Player 1 in $G$ for $\varphi$ if and only if there exists a deterministic sure-winning strategy for Player 1 in $G^K$ for $\varphi$. We obtain $G^K$ using a subset construction similar to Reif's construction for safety objectives [19]. Each state in $G^K$ is a set of states of $G$ which represents the knowledge of Player 1. In the worst case, the size of $G^K$ is exponentially larger than the size of $G$. Second, we present a fixed-point algorithm based on antichains of sets of states [10], whose correctness relies on the subset construction, but which avoids the explicit construction of $G^K$.
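3.1. Subset construction for sure winning.
Subset construction. Given a game structure of imperfect information $G = \langle L, l_0, \Sigma, \Delta, O, \gamma \rangle$, we define the knowledge-based subset construction of $G$ as the following game structure of perfect information: $G^K = \langle \mathcal{L}, \{l_0\}, \Sigma, \Delta^K \rangle$, where $\mathcal{L} = 2^L \setminus \{\emptyset\}$, and $(s_1, \sigma, s_2) \in \Delta^K$ iff there exists an observation $o \in O$ such that $s_2 = \mathsf{Post}^G_\sigma(s_1) \cap \gamma(o)$ and $s_2 \neq \emptyset$. Notice that for all $s \in \mathcal{L}$ and all $\sigma \in \Sigma$, there exists a set $s' \in \mathcal{L}$ such that $(s, \sigma, s') \in \Delta^K$. A (deterministic or randomized) strategy in $G^K$ is called a knowledge-based strategy. To distinguish between a general strategy in $G$, an observation-based strategy in $G$, and a knowledge-based strategy in $G^K$, we often use the notations $\alpha$, $\alpha^o$, and $\alpha^K$, respectively.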
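To make the subset construction concrete, the sketch below explores the knowledge sets reachable from the initial knowledge {l_0}. It assumes the usual transition rule for such constructions: a knowledge set s has a σ-successor s' for each observation o such that s' = Post_σ(s) ∩ γ(o) is nonempty. The toy game and the name `knowledge_game` are hypothetical.

```python
from collections import deque

def post(delta, sigma, s):
    """Post_sigma(s): all sigma-successors of the states in s."""
    return frozenset(l2 for (l1, a, l2) in delta if a == sigma and l1 in s)

def knowledge_game(l0, alphabet, delta, gamma):
    """Reachable part of the knowledge-based subset construction.

    Returns (states, edges), where each state is a frozenset of game
    states and each edge is a triple (s, sigma, s2).
    """
    init = frozenset({l0})
    seen = {init}
    edges = set()
    queue = deque([init])
    while queue:
        s = queue.popleft()
        for sigma in alphabet:
            successors = post(delta, sigma, s)
            # Player 2 picks the next state, hence the next observation.
            for obs_states in gamma.values():
                s2 = successors & obs_states
                if s2:  # only nonempty knowledge sets are states
                    edges.add((s, sigma, s2))
                    if s2 not in seen:
                        seen.add(s2)
                        queue.append(s2)
    return seen, edges

# Hypothetical toy game: l2 and l2p share the observation o2.
DELTA = {
    ("l1", "a", "l2"), ("l1", "a", "l2p"), ("l1", "b", "l1"),
    ("l2", "a", "l3"), ("l2", "b", "l1"),
    ("l2p", "a", "l1"), ("l2p", "b", "l3"),
    ("l3", "a", "l3"), ("l3", "b", "l3"),
}
GAMMA = {
    "o1": frozenset({"l1"}),
    "o2": frozenset({"l2", "l2p"}),
    "o3": frozenset({"l3"}),
}

K_STATES, K_EDGES = knowledge_game("l1", ["a", "b"], DELTA, GAMMA)
```

Only reachable knowledge sets are built here; the full state space 2^L \ {∅} is exponential, which is exactly the cost the antichain algorithm is designed to avoid.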
Proof. First, the property holds for $s = \{l_0\}$, the initial state in $G^K$, as it is a singleton. Second, we show that the property holds for any successor $s'$ of any state $s$ in $G^K$. Assume that $(s, \sigma, s') \in \Delta^K$. Then we know that $s' = \mathsf{Post}^G_\sigma(s) \cap \gamma(o)$ for some observation $o \in O$.

Abusing the notation, for a play $\pi = s_0 \sigma_0 s_1 \ldots \sigma_{n-1} s_n \sigma_n \ldots \in \mathsf{Plays}(G^K)$ we define its observation sequence as the infinite sequence $\gamma^{-1}(\pi) = o_0 \sigma_0 o_1 \ldots \sigma_{n-1} o_n \sigma_n \ldots$ such that $s_i \subseteq \gamma(o_i)$ for all $i \geq 0$.

The correctness of the subset construction $G^K$ is established by the following two lemmas, which generalize the result of [19] for safety objectives to any kind of objective. For Lemma 3.3, the proof of [19] is not sufficient, since violation of a safety objective can be witnessed by a finite prefix of a play, while general objectives need an infinite witness.
Lemma 3.2. If Player 1 has a deterministic sure-winning strategy in G K for an objective φ, then he has a deterministic observation-based sure-winning strategy in G for φ.
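Proof. Let $\alpha^K$ be a deterministic sure-winning strategy for Player 1 in $G^K$ with the objective $\varphi$. Define a strategy $\alpha^o$ for Player 1 in $G$ as follows: for every $\rho = \ell_0 \sigma_0 \ell_1 \ldots \sigma_{n-1} \ell_n \in \mathsf{Prefs}(G)$, let $\alpha^o(\rho) = \alpha^K(\rho^K)$, where $\rho^K = s_0 \sigma_0 s_1 \ldots \sigma_{n-1} s_n$ and $s_i = K(\gamma^{-1}(\ell_0 \sigma_0 \ell_1 \ldots \sigma_{i-1} \ell_i))$ for each $0 \leq i \leq n$. By contradiction, assume that $\alpha^o$ is not a sure-winning strategy for Player 1 in $G$ with the objective $\varphi$. Then there exists a play $\pi \in \mathsf{Outcome}_1(G, \alpha^o)$ such that $\pi \not\models \varphi$. Let $\pi = \ell_0 \sigma_0 \ell_1 \sigma_1 \ldots$ and consider the infinite sequence $\pi^K = s_0 \sigma_0 s_1 \sigma_1 \ldots$ where $s_i = K(\gamma^{-1}(\pi(i)))$ for each $i \geq 0$. We show that $\pi^K \in \mathsf{Outcome}_1(G^K, \alpha^K)$. First, we have $s_0 = K(\gamma^{-1}(\pi(0))) = K(\gamma^{-1}(\ell_0)) = \{\ell_0\}$. Second, for any $i \geq 0$, we have $s_i = K(\gamma^{-1}(\pi(i)))$ and by Lemma 2.1 we have $s_{i+1} = \mathsf{Post}^G_{\sigma_i}(s_i) \cap \gamma(o_{i+1})$, where $o_{i+1}$ is the observation of $\ell_{i+1}$, so that $(s_i, \sigma_i, s_{i+1}) \in \Delta^K$. Moreover, $\pi$ and $\pi^K$ have the same observation sequence, and thus $\pi^K \not\models \varphi$, which contradicts the fact that $\alpha^K$ is a sure-winning strategy for Player 1 in $G^K$ with the objective $\varphi$. Therefore, $\alpha^o$ is a sure-winning strategy for Player 1 in $G$ with the objective $\varphi$.

Lemma 3.3. If Player 1 has a deterministic observation-based sure-winning strategy in $G$ for an objective $\varphi$, then Player 1 has a deterministic sure-winning strategy in $G^K$ for $\varphi$.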
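Proof. First, it is easy to show by induction that for every finite prefix of play $\rho^K = s_0 \sigma_0 s_1 \ldots \sigma_{n-1} s_n$ in $\mathsf{Prefs}(G^K)$, there exists a prefix of play $\rho = \ell_0 \sigma_0 \ell_1 \ldots \sigma_{n-1} \ell_n$ in $\mathsf{Prefs}(G)$ that generates $\rho^K$, that is, such that $s_i = K(\gamma^{-1}(\ell_0 \sigma_0 \ell_1 \ldots \sigma_{i-1} \ell_i))$ for each $0 \leq i \leq n$; and for every other prefix of play $\rho'$ that generates $\rho^K$, we have $\gamma^{-1}(\rho) = \gamma^{-1}(\rho')$ (by Lemma 2.1). Now, let $\alpha^o$ be a deterministic observation-based strategy for Player 1 in $G$ that is sure-winning for $\varphi$. We construct a deterministic strategy $\alpha^K$ for Player 1 in $G^K$ as follows: for every $\rho^K \in \mathsf{Prefs}(G^K)$, let $\alpha^K(\rho^K) = \alpha^o(\rho)$, where $\rho$ is any prefix of $G$ that generates $\rho^K$. By contradiction, assume that $\alpha^K$ is not sure-winning for Player 1 in $G^K$ with objective $\varphi$. Then, there exists a play $\pi^K \in \mathsf{Outcome}_1(G^K, \alpha^K)$ with $\pi^K \not\models \varphi$. Let $\pi^K = s_0 \sigma_0 s_1 \sigma_1 \ldots$. We associate with this play a directed acyclic graph $D = \langle V, E \rangle$, where $V = \{(\ell, i) \mid i \geq 0 \text{ and } \ell \in \mathsf{Last}(\pi^K(i))\}$ and $E = \{((\ell, i), (\ell', i+1)) \mid (\ell, \sigma_i, \ell') \in \Delta\}$. By definition of $G^K$, for all $i \geq 0$, we have $\mathsf{Last}(\pi^K(i)) \neq \emptyset$ and for all $\ell \in \mathsf{Last}(\pi^K(i))$, there is a path in $D$ from $(\ell_0, 0)$ to $(\ell, i)$. Therefore, $V$ is infinite and, since $D$ is finitely branching, by König's Lemma there exists an infinite path $(\ell_0, 0)(\ell_1, 1) \ldots$ in $D$ and thus a play $\pi = \ell_0 \sigma_0 \ell_1 \sigma_1 \ldots$ in $G$ such that $\pi \in \mathsf{Outcome}_1(G, \alpha^o)$ and $\pi \not\models \varphi$. This is in contradiction with the assumption that $\alpha^o$ is sure-winning in $G$ for $\varphi$. Hence $\alpha^K$ is sure-winning for Player 1 in $G^K$ with objective $\varphi$.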

3.2.
Two interpretations of the µ-calculus. From the results of Section 3.1, we can solve a game $G$ of imperfect information with objective $\varphi$ by constructing the knowledge-based subset construction $G^K$ and solving the resulting game of perfect information for the objective $\varphi$ using standard methods. For the important class of ω-regular objectives, there exists a fixed-point theory, the µ-calculus, for this purpose [8]. When run on $G^K$, these fixed-point algorithms compute sets of sets of states of the game $G$. An important property of those sets is that they are downward closed with respect to set inclusion: if Player 1 has a deterministic strategy to win the game $G$ when her knowledge is a set $s$, then she also has a deterministic strategy to win the game when her knowledge is $s'$ with $s' \subseteq s$. And thus, if $s$ is a sure-winning state of $G^K$, then so is $s'$. Based on this property, we devise a new algorithm for solving games of imperfect information.
An antichain of nonempty sets of states is a set $q \subseteq 2^L \setminus \{\emptyset\}$ such that for all $s, s' \in q$, we have $s \not\subset s'$. Let $A$ be the set of antichains of nonempty subsets of $L$, and consider the following partial order on $A$: for all $q, q' \in A$, let $q \sqsubseteq q'$ iff $\forall s \in q \cdot \exists s' \in q' : s \subseteq s'$. For $q \subseteq 2^L$, define the set of maximal elements of $q$ by $\lceil q \rceil = \{s \in q \mid s \neq \emptyset \text{ and } \forall s' \in q : s \not\subset s'\}$. Clearly, $\lceil q \rceil$ is an antichain. The least upper bound of $q, q' \in A$ is $q \sqcup q' = \lceil \{s \mid s \in q \text{ or } s \in q'\} \rceil$, and their greatest lower bound is $q \sqcap q' = \lceil \{s \cap s' \mid s \in q \text{ and } s' \in q'\} \rceil$. The definition of these two operators extends naturally to sets of antichains, and the greatest element of $A$ is $\top = \{L\}$ and the least element is $\bot = \emptyset$. The partially ordered set $\langle A, \sqsubseteq, \sqcup, \sqcap, \top, \bot \rangle$ forms a complete lattice. We view antichains of state sets as a symbolic representation of ⊆-downward-closed sets of state sets.
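The lattice operations on antichains translate directly into a few set comprehensions. The sketch below represents an antichain as a Python set of frozensets, read as a set of pairwise ⊆-incomparable nonempty sets; the function names `maximal`, `below`, `join`, and `meet` are our own choices.

```python
def maximal(sets):
    """Maximal nonempty elements of a family of sets (always an antichain)."""
    family = {frozenset(s) for s in sets if s}
    return {s for s in family if not any(s < t for t in family)}

def below(q1, q2):
    """Partial order on antichains: every set of q1 is covered by a set of q2."""
    return all(any(s <= t for t in q2) for s in q1)

def join(q1, q2):
    """Least upper bound: maximal elements of the union."""
    return maximal(set(q1) | set(q2))

def meet(q1, q2):
    """Greatest lower bound: maximal elements of pairwise intersections."""
    return maximal({s & t for s in q1 for t in q2})

# Small illustration over the state space {1, 2, 3}.
Q1 = {frozenset({1, 2}), frozenset({3})}
Q2 = {frozenset({2, 3})}
J = join(Q1, Q2)   # {{1, 2}, {2, 3}}: {3} is absorbed by {2, 3}
M = meet(Q1, Q2)   # {{2}, {3}}
```

Each antichain stands for the downward-closed family of all nonempty subsets of its elements, so the join and meet above correspond to union and intersection of those families.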
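A game lattice is a complete lattice $V$ together with a predecessor operator $\mathsf{CPre} : V \to V$. Given a game structure $G = \langle L, l_0, \Sigma, \Delta, O, \gamma \rangle$ of imperfect information, and its knowledge-based subset construction $G^K = \langle \mathcal{L}, \{l_0\}, \Sigma, \Delta^K \rangle$, we consider two game lattices: the lattice of subsets $\langle S, \subseteq, \cup, \cap, \mathcal{L}, \emptyset \rangle$, where $S = 2^{\mathcal{L}}$ and $\mathsf{CPre} : S \to S$ is defined by $\mathsf{CPre}(q) = \{s \in \mathcal{L} \mid \exists \sigma \in \Sigma \cdot \forall s' \in \mathcal{L} : \text{if } (s, \sigma, s') \in \Delta^K, \text{ then } s' \in q\}$; and the lattice of antichains $\langle A, \sqsubseteq, \sqcup, \sqcap, \{L\}, \emptyset \rangle$, where $\lceil \mathsf{CPre} \rceil : A \to A$ is defined by $\lceil \mathsf{CPre} \rceil(q) = \lceil \{s \in \mathcal{L} \mid \exists \sigma \in \Sigma \cdot \forall o \in O \cdot \exists s' \in q : \mathsf{Post}^G_\sigma(s) \cap \gamma(o) \subseteq s'\} \rceil$.

The µ-calculus formulas are generated by the grammar $\phi ::= o \mid x \mid \phi \lor \phi \mid \phi \land \phi \mid \mathsf{pre}(\phi) \mid \mu x.\phi \mid \nu x.\phi$ for observations $o \in O$ and variables $x$.

Downward closure. Given a set $q \in S$, the downward closure of $q$ is the set $q{\downarrow} = \{s \in \mathcal{L} \mid \exists s' \in q : s \subseteq s'\}$. Observe that in particular, for all $q \in S$, we have $\emptyset \notin q{\downarrow}$ and $\lceil q \rceil{\downarrow} = q{\downarrow}$. The sets $q{\downarrow}$, for $q \in S$, are the downward-closed sets. A valuation $E$ for the variables in the lattice $S$ of subsets is downward closed if every variable $x$ is mapped to a downward-closed set, that is, $E(x) = E(x){\downarrow}$.

Lemma 3.6. All downward-closed sets $q, q' \in S$ satisfy $\lceil q \cap q' \rceil = \lceil q \rceil \sqcap \lceil q' \rceil$ and $\lceil q \cup q' \rceil = \lceil q \rceil \sqcup \lceil q' \rceil$.

Proof. We prove this by induction on the structure of $\phi$.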
Using Lemma 3.6 and Lemma 3.7, we have successively: The proof is similar to the previous case.
Consider a game structure $G$ of imperfect information and a parity objective $\varphi$. From Theorems 3.4 and 3.5 and Lemma 3.8, we can decide the existence of a deterministic observation-based sure-winning strategy for Player 1 in $G$ for $\varphi$ without explicitly constructing the knowledge-based subset construction $G^K$, by instead evaluating a fixed-point formula in the lattice of antichains.

Corollary 3.10. Let $G$ be a game structure of imperfect information, let $p$ be a priority function, and let $\ell$ be a state of $G$. Whether $\ell$ is a sure-winning state in $G$ for the parity objective $\mathsf{Parity}(p)$ can be decided in Exptime.

Corollary 3.10 is proved as follows: for a parity objective $\varphi$, an equivalent µ-calculus formula $\phi$ can be obtained, where the size and the number of fixed-point quantifier alternations of $\phi$ are polynomial in $\varphi$. Thus given $G$ and $\varphi$, we can evaluate $\phi$ in $G^K$ in Exptime.
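As an illustration of evaluating a fixed point directly on antichains, the sketch below handles the simplest case, a safety objective, as a greatest fixed point. It assumes that the predecessor operator on antichains keeps the maximal sets s for which some letter σ guarantees that, for every observation o, the set Post_σ(s) ∩ γ(o) is contained in an element of the current antichain. The toy games and all function names are hypothetical, and the enumeration of choice functions is exponential in the number of observations.

```python
from itertools import product

def post(delta, sigma, s):
    return frozenset(l2 for (l1, a, l2) in delta if a == sigma and l1 in s)

def maximal(sets):
    family = {frozenset(s) for s in sets if s}
    return {s for s in family if not any(s < t for t in family)}

def meet(q1, q2):
    return maximal({s & t for s in q1 for t in q2})

def below(q1, q2):
    return all(any(s <= t for t in q2) for s in q1)

def cpre(q, states, alphabet, delta, gamma):
    """Assumed antichain predecessor: maximal s such that some letter sends
    each observation slice of Post(s) inside an element of q."""
    observations = list(gamma)
    candidates = set()
    for sigma in alphabet:
        # A choice assigns to every observation one target set of q.
        for choice in product(list(q), repeat=len(observations)):
            candidates.add(frozenset(
                l for l in states
                if all(post(delta, sigma, {l}) & gamma[o] <= target
                       for o, target in zip(observations, choice))))
    return maximal(candidates)

def sure_winning_safety(states, alphabet, delta, gamma, safe):
    """Greatest fixed point of q -> {safe} meet cpre(q), on antichains."""
    top = {frozenset(safe)}
    q = top
    while True:
        nxt = meet(top, cpre(q, states, alphabet, delta, gamma))
        if nxt == q:
            return q
        q = nxt

# Hypothetical game: from l1 only 'a' is safe, from l2 only 'b' is safe.
STATES = {"l0", "l1", "l2", "bad"}
DELTA = {("l0", x, y) for x in "ab" for y in ("l1", "l2")} | {
    ("l1", "a", "l0"), ("l1", "b", "bad"),
    ("l2", "a", "bad"), ("l2", "b", "l0"),
    ("bad", "a", "bad"), ("bad", "b", "bad"),
}
SAFE = {"l0", "l1", "l2"}
BLIND = {"o0": frozenset({"l0"}), "o12": frozenset({"l1", "l2"}),
         "obad": frozenset({"bad"})}
SIGHTED = {"o0": frozenset({"l0"}), "o1": frozenset({"l1"}),
           "o2": frozenset({"l2"}), "obad": frozenset({"bad"})}

WIN_BLIND = sure_winning_safety(STATES, "ab", DELTA, BLIND, SAFE)
WIN_SIGHTED = sure_winning_safety(STATES, "ab", DELTA, SIGHTED, SAFE)
INIT = {frozenset({"l0"})}
```

In this toy game Player 1 wins from l0 only when l1 and l2 carry different observations; the initial state is sure winning exactly when the antichain `INIT` lies below the computed fixed point.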

Almost Winning
Given a game structure G of imperfect information, we first construct a game structure H in which the knowledge of Player 1 is made explicit. However, the construction is different from the one used for sure winning. Then, we establish certain equivalences between randomized strategies in G and H. Finally, we show how the reduction can be used to obtain a symbolic Exptime algorithm for computing almost-winning states in G for Büchi objectives. An Exptime algorithm for almost winning for coBüchi objectives under imperfect information remains unknown.
Intuitively, when $H$ is in state $(s, \ell)$, it corresponds to $G$ being in state $\ell$ and the knowledge of Player 1 being $s$. Two states $q = (s, \ell)$ and $q' = (s', \ell')$ of $H$ are equivalent, written $q \approx q'$, if $s = s'$, that is, when the knowledge of Player 1 is the same in the two states. Two prefixes $\rho = q_0 \sigma_0 q_1 \ldots \sigma_{n-1} q_n$ and $\rho' = q'_0 \sigma'_0 q'_1 \ldots \sigma'_{n-1} q'_n$ of $H$ are equivalent, written $\rho \approx \rho'$, if for all $0 \leq i \leq n$, we have $q_i \approx q'_i$, and for all $0 \leq i \leq n - 1$, we have $\sigma_i = \sigma'_i$. Two plays $\pi, \pi' \in \mathsf{Plays}(H)$ are equivalent, written $\pi \approx \pi'$, if for all $i \geq 0$, we have $\pi(i) \approx \pi'(i)$. For a state $q \in Q$, we denote by $[q]_\approx = \{q' \in Q \mid q \approx q'\}$ the ≈-equivalence class of $q$. We define equivalence classes for prefixes and plays similarly. We cannot reuse the results of Section 3 to compute almost-winning states of $G$, as the randomized strategies in $H$ should not distinguish equivalent states.
Equivalence-preserving strategies and objectives. A strategy $\alpha$ for Player 1 in $H$ is positional if it is independent of the prefix of plays and depends only on the last state, that is, for all $\rho, \rho' \in \mathsf{Prefs}(H)$ with $\mathsf{Last}(\rho) = \mathsf{Last}(\rho')$, we have $\alpha(\rho) = \alpha(\rho')$. A positional strategy $\alpha$ can be viewed as a function $\alpha : Q \to \mathcal{D}(\Sigma)$. A strategy $\alpha$ for Player 1 in $H$ is equivalence-preserving if for all $\rho, \rho' \in \mathsf{Prefs}(H)$ with $\rho \approx \rho'$, we have $\alpha(\rho) = \alpha(\rho')$. We denote by $\mathcal{A}_H$, $\mathcal{A}^P_H$, and $\mathcal{A}^{\approx}_H$ the set of all Player-1 strategies, the set of all positional Player-1 strategies, and the set of all equivalence-preserving Player-1 strategies in $H$, respectively. We write $\mathcal{A}^{\approx(P)}_H = \mathcal{A}^{\approx}_H \cap \mathcal{A}^P_H$ for the set of equivalence-preserving positional strategies. An objective $\varphi$ for $H$ is a subset of $(Q \times \Sigma)^\omega$, that is, the objective $\varphi$ is a set of plays. The objective $\varphi$ is equivalence-preserving if for all plays $\pi \in \varphi$, we have $[\pi]_\approx \subseteq \varphi$.
Relating strategies for Player 1. We define two strategy mappings $g : \mathcal{A}_H \to \mathcal{A}_G$ and $h : \mathcal{A}_G \to \mathcal{A}_H$. Given a Player-1 strategy $\alpha^H$ in $H$, we construct a Player-1 strategy $\alpha^G = g(\alpha^H)$ in $G$ as follows: for all $\rho \in \mathsf{Prefs}(G)$, let $\alpha^G(\rho) = \alpha^H(h(\rho))$. Similarly, given a Player-1 strategy $\alpha^G$ in $G$, we construct a Player-1 strategy $\alpha^H = h(\alpha^G)$ in $H$ as follows: for all $\rho \in \mathsf{Prefs}(H)$, let $\alpha^H(\rho) = \alpha^G(h^{-1}(\rho))$. The following properties hold: (i) for all strategies $\alpha^H \in \mathcal{A}_H$, if $\alpha^H$ is equivalence-preserving, then $g(\alpha^H)$ is observation-based; and (ii) for all strategies $\alpha^G \in \mathcal{A}_G$, if $\alpha^G$ is observation-based, then $h(\alpha^G)$ is equivalence-preserving.
Lemma 4.1. The following assertions hold.
(1) For all ρ H ∈ Prefs(H), for every equivalence preserving strategy α H , for every strategy β H we have (2) For all ρ G ∈ Prefs(G), for every observation-based strategy α G , for every strategy β G we have equivalence-preserving Player-1 strategies α H in H, and all Player-2 strategies β H in H, Proof. By the Caratheódary unique-extension theorem, a probability measure defined on cones has a unique extension to all Borel objectives. The theorem then follows from Lemma 4.1.

Corollary 4.3 follows from Theorem 4.2.
Corollary 4.3. For every Borel objective Φ G for G, we have

4.2.
Almost winning for Büchi objectives. We first illustrate the need for memory and randomization for almost winning in imperfect-information games with Büchi objectives. We show that Player 1 has no observation-based sure-winning strategy in this game. This is because, once an observation-based strategy for Player 1 is fixed, Player 2 has a spoiling strategy that keeps the game in the states $\{\ell_0, \ell_1, \ell_2\}$. Indeed, at $\ell_0$, the only reasonable choice for Player 1 is to play $a$. Then Player 2 can choose to go to either $\ell_1$ or $\ell_2$. In both cases, the observation will be the same for Player 1. After seeing $o_1 a o_2$, if the strategy of Player 1 is to play $a$ then Player 2 chooses $\ell_2$; otherwise, if the strategy of Player 1 is to play $b$, then Player 2 chooses $\ell_1$. This can be repeated, and so Player 2 has a spoiling strategy against any observation-based strategy of Player 1.
The strategy α is almost-winning against any randomized strategy of Player 2. Note that the strategy α uses memory and this is necessary because when receiving observation AS . Lemma 4.5 follows from the construction of H from G, and yields Lemma 4.6.
Given an equivalence-preserving Player-1 strategy α H ∈ A H , a prefix ρ ∈ Prefs(H), and a state q ∈ Q, if there exists a Player-2 strategy β H ∈ B H such that Pr α H ,β H q (Cone(ρ)) > 0, then for every prefix ρ ′ ∈ Prefs(H) with ρ ≈ ρ ′ , there exist a Player-2 strategy β ′ H ∈ B H and a state q ′ ∈ [q] ≈ such that Pr (Buchi(B T )) < 1 }. It follows from Lemma 4.6 that if a play starts in Q ≈ AS and reaches Q \ Q ≈ AS with positive probability, then for all equivalence-preserving strategies for Player 1, there is a Player 2 strategy that ensures that the Büchi objective Buchi(B T ) is satisfied with probability strictly lower than 1.
Notation. For a state q ∈ Q and Y ⊆ Q, let Allow(q, (Buchi(B T )) = 1. Let $\rho = q_0 \sigma_0 q_1 \ldots \sigma_{n-1} q_n$ be a prefix in $\mathsf{Prefs}(H)$ such that for all $0 \leq i \leq n$, we have $q_i \in Q^{\approx}_{AS}$. If there is a Player-2 strategy $\beta^H \in \mathcal{B}_H$ and a state $q' \in [q]_\approx$ such that $\Pr^{\alpha^H,\beta^H}_{q'}(\mathsf{Cone}(\rho)) > 0$, then $\mathsf{Supp}(\alpha^H(\rho)) \subseteq \mathsf{Allow}([q_n]_\approx, Q^{\approx}_{AS})$.

Notation. We inductively define the ranks of states in Q ≈ AS as follows: let )}, and let $Q^* = \mathsf{Rank}(j^*)$. We say that the set $\mathsf{Rank}(j + 1) \setminus \mathsf{Rank}(j)$ contains the states of rank $j + 1$, for all $j \geq 0$.

We now prove that $Q^{\approx}_{AS} \subseteq Q^*$. Assume towards a contradiction that $X = Q^{\approx}_{AS} \setminus Q^* \neq \emptyset$. For all states $q \in X$ and all $\sigma \in \mathsf{Allow}([q]_\approx, Q^{\approx}_{AS})$, we have $\mathsf{Post}^H_\sigma(q) \cap X \neq \emptyset$, because otherwise $q$ would have been in $Q^*$. Hence, for all $q \in X$ and all $\sigma \in \mathsf{Allow}([q]_\approx, Q^{\approx}_{AS})$, there exists a $q' \in X$ such that $(q, \sigma, q') \in \Delta_H$. Fix a strategy $\beta^H$ for Player 2 as follows: for a state $q \in X$ and the input letter $\sigma \in \mathsf{Allow}([q]_\approx, Q^{\approx}_{AS})$, choose a successor $q' \in X$ such that $(q, \sigma, q') \in \Delta_H$. Consider a state $q \in X$ and an equivalence-preserving almost-winning strategy $\alpha^H$ for Player 1 from $q$ for the objective $\mathsf{Buchi}(\mathcal{B}_T)$. By Lemma 4.8, for every prefix $\rho$ satisfying the condition of Lemma 4.8, we have $\mathsf{Supp}(\alpha^H(\rho)) \subseteq \mathsf{Allow}([\mathsf{Last}(\rho)]_\approx, Q^{\approx}_{AS})$. It follows that $\Pr^{\alpha^H,\beta^H}_{q}(\mathsf{Safe}(X)) = 1$. Since $\mathcal{B}_T \cap Q^{\approx}_{AS} \subseteq Q^*$, it follows that $\mathcal{B}_T \cap X = \emptyset$. Hence $\Pr^{\alpha^H,\beta^H}_{q}(\mathsf{Reach}(\mathcal{B}_T)) = 0$, and therefore $\Pr^{\alpha^H,\beta^H}_{q}(\mathsf{Buchi}(\mathcal{B}_T)) = 0$. This contradicts the fact that $\alpha^H$ is an almost-winning strategy.
Direct symbolic algorithm. As in Section 3.2, the subset structure $H$ does not have to be constructed explicitly. Instead, we can evaluate a fixed-point formula on a well-chosen lattice. The fixed-point formula to compute the set $Q^{\approx}_{AS}$ is evaluated on the lattice $\langle 2^Q, \subseteq, \cup, \cap, Q, \emptyset \rangle$. It is easy to show that the sets computed by the fixed-point algorithm are downward closed for the following order on $Q$: for $(s, \ell), (s', \ell') \in Q$, let $(s, \ell) \preceq (s', \ell')$ iff $\ell = \ell'$ and $s \subseteq s'$. Then, we can define an antichain over $Q$ as a set of pairwise incomparable elements of $Q$, and compute the almost-sure winning states in the lattice of antichains over $Q$, without explicitly constructing the exponential game structure $H$.
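To illustrate how a downward-closed set can be stored through its maximal elements only, the following is a minimal sketch in Python. It is not the algorithm of the paper: the encoding of an element (s, ℓ) as a pair of a frozenset of integers and an integer label, and the helper names leq, maximal, member, join, meet and fs, are assumptions made for this illustration. The sketch further assumes that every pair (s, ℓ) is a legal element of Q.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical encoding of an element (s, l) of Q: s is a set of game
# states (as integers) and l is treated as an opaque integer label.
Elem = Tuple[FrozenSet[int], int]


def leq(a: Elem, b: Elem) -> bool:
    """(s, l) is below (s', l') iff l = l' and s is a subset of s'."""
    return a[1] == b[1] and a[0] <= b[0]


def maximal(elems: Set[Elem]) -> Set[Elem]:
    """Antichain of maximal elements; it represents the downward closure."""
    return {a for a in elems if not any(a != b and leq(a, b) for b in elems)}


def member(x: Elem, antichain: Set[Elem]) -> bool:
    """x is in the represented set iff some antichain element dominates it."""
    return any(leq(x, a) for a in antichain)


def join(a1: Set[Elem], a2: Set[Elem]) -> Set[Elem]:
    """Union of the two represented downward-closed sets."""
    return maximal(a1 | a2)


def meet(a1: Set[Elem], a2: Set[Elem]) -> Set[Elem]:
    """Intersection of the two represented sets.

    Assumption: every pair (s, l) is a legal element of Q.
    """
    return maximal({(x[0] & y[0], x[1])
                    for x in a1 for y in a2 if x[1] == y[1]})


def fs(*xs: int) -> FrozenSet[int]:
    """Small helper to build frozensets in examples."""
    return frozenset(xs)
```

For instance, maximal({(fs(1, 2), 0), (fs(1), 0)}) keeps only (fs(1, 2), 0), since the second element lies below the first. The design choice is that membership, union and intersection only ever touch the maximal elements, so the represented set is never enumerated.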

Lower Bounds
We show that deciding the existence of a deterministic (resp. randomized) observation-based sure-winning (resp. almost-winning) strategy for Player 1 in games of imperfect information is Exptime-hard already for reachability objectives. A first proof for sure-winning was given in [19]. We give all the details of the reduction used in the proof and show that it extends to almost-winning as well.
Sure winning. To show the lower bound, we use a reduction from the membership problem for polynomial-space alternating Turing machines. An alternating Turing machine (ATM) is a tuple $M = \langle Q, q_0, g, \Sigma_i, \Sigma_t, \delta, F \rangle$ where:
• $Q$ is a finite set of control states;
• $q_0 \in Q$ is the initial state;
• $g : Q \to \{\wedge, \vee\}$;
• $\Sigma_i = \{0, 1\}$ is the input alphabet;
• $\Sigma_t = \{0, 1, 2\}$ is the tape alphabet and 2 is the blank symbol;
• $\delta \subseteq Q \times \Sigma_t \times Q \times \Sigma_t \times \{-1, 1\}$ is a transition relation; and
• $F \subseteq Q$ is the set of accepting states.
We say that $M$ is a polynomial-space ATM if for some polynomial $p(\cdot)$, the space used by $M$ on any input word $w$ is bounded by $p(|w|)$.
Without loss of generality, we assume that the initial control state of the machine is a ∨-state and that transitions link ∨-states to ∧-states and vice versa. A word $w$ is accepted by an ATM $M$ if there exists a run tree of $M$ on $w$ all of whose leaf nodes are accepting configurations (see [3] for details). The AND-OR graph of the polynomial-space ATM $(M, p)$ on the input word $w \in \Sigma^*$ is $G(M, p) = \langle S_\vee, S_\wedge, s_0, \Rightarrow, R \rangle$ where […]
• $((q_1, h_1, t_1), (q_2, h_2, t_2)) \in {\Rightarrow}$ iff there exists $(q_1, t_1(h_1), q, \gamma, d) \in \delta$ such that $q_2 = q$, $h_2 = h_1 + d$, $t_2(h_1) = \gamma$ and $t_2(i) = t_1(i)$ for all $i \neq h_1$;
[…] $p)$.
The membership problem is to decide if a given word $w$ is accepted by a given polynomial-space ATM $(M, p)$. This problem is known to be Exptime-hard [3].
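The successor relation of the AND-OR graph and the acceptance condition can be sketched as follows in Python. This is an illustrative sketch under stated assumptions, not the construction of the paper: configurations are triples (q, h, t) with a 0-indexed head position and the tape as a tuple, control states are strings, g maps a control state to "or" or "and", and a non-accepting configuration without successors is treated as losing. The function names successors and accepts are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Config = Tuple[str, int, Tuple[int, ...]]  # (control state, head, tape)
Trans = Tuple[str, int, str, int, int]     # (q, read, q', written, d)


def successors(c: Config, delta: List[Trans]) -> List[Config]:
    """Configurations reachable from c in one step of the relation =>."""
    q1, h1, t1 = c
    out = []
    for (q, a, q2, gamma, d) in delta:
        h2 = h1 + d
        # The transition must match the state and the symbol under the
        # head, and the head must stay within the bounded tape.
        if q == q1 and a == t1[h1] and 0 <= h2 < len(t1):
            t2 = t1[:h1] + (gamma,) + t1[h1 + 1:]  # only cell h1 changes
            out.append((q2, h2, t2))
    return out


def accepts(s0: Config, delta: List[Trans],
            g: Dict[str, str], accepting: Set[str]) -> bool:
    """Least fixed point over the reachable part of the AND-OR graph."""
    succ: Dict[Config, List[Config]] = {}
    stack = [s0]
    while stack:  # explore the reachable configurations
        c = stack.pop()
        if c not in succ:
            succ[c] = successors(c, delta)
            stack.extend(succ[c])
    win = {c for c in succ if c[0] in accepting}
    changed = True
    while changed:
        changed = False
        for c, nxt in succ.items():
            if c in win or not nxt:
                continue  # assumption: dead ends that do not accept lose
            if g[c[0]] == "or":
                ok = any(n in win for n in nxt)
            else:
                ok = all(n in win for n in nxt)
            if ok:
                win.add(c)
                changed = True
    return s0 in win
```

The sketch enumerates configurations explicitly, so it may visit exponentially many of them. This is exactly what the game reduction avoids by never storing the full tape in a game state.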
Idea of the reduction. Given a polynomial space ATM M and a word w, we construct a game of size polynomial in the size of (M, w) to simulate the execution of M on w. Player 1 makes choices in ∨-states and Player 2 makes choices in ∧-states. Furthermore, Player 1 is responsible for maintaining the symbol under the tape head. The objective is to reach an accepting configuration of the ATM.
Each turn proceeds as follows. In an ∨-state, by choosing a letter (t, a) in the alphabet of the game, Player 1 reveals (i) the transition t of the ATM that he has chosen (this way he also reveals the symbol that is currently under the tape head) and (ii) the symbol a under the next position of the tape head. If Player 1 lies about the current or the next symbol under the tape head, he should lose the game; otherwise the game proceeds. The machine is now in an ∧-state and Player 1 has no choice: he announces a special symbol ε and Player 2, by resolving nondeterminism on ε, chooses a transition of the Turing machine that is compatible with the current symbol under the tape head revealed by Player 1 at the previous turn. The state of the ATM is updated and the game proceeds. The transition chosen by Player 2 is visible in the next state of the game, and so Player 1 can update his knowledge about the configuration of the ATM. Player 1 wins whenever an accepting configuration of the ATM is reached, that is, whenever w is accepted.
The difficulty is to ensure that Player 1 loses when he announces a wrong content for the cell under the tape head. As the number of configurations of the polynomial-space ATM is exponential, we cannot directly encode the full configuration of the ATM in the states of the game. To overcome this difficulty, we use the power of imperfect information as follows. Initially, Player 2 chooses a position k, 1 ≤ k ≤ p(|w|), on the tape: this number, as well as the symbol σ ∈ {0, 1, 2} that lies in tape cell number k, is maintained throughout the game in the non-observable portion of the game states. The pair (σ, k) is thus private to Player 2 and invisible to Player 1. Hence, at any point in the game, Player 2 can check whether Player 1 is lying when announcing the content of cell number k, and go to a sink state if Player 1 cheats (no other state can be reached from there). Since Player 1 does not know which cell is monitored by Player 2 (k is private), to avoid losing, he should not lie about any of the tape cells, and thus he should faithfully simulate the machine. Then, he wins the game if and only if the ATM accepts the word w.
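The monitoring of a single hidden cell can be sketched as follows in Python. This is a simplified illustration rather than the transition relation of the game: the function name run, the encoding of the tape as a dictionary indexed from 1, and the encoding of one round as a tuple (read, written, d, next_claim) are assumptions made for this example. The head is assumed to start at position 1.

```python
from typing import Dict, List, Tuple

# One round: the symbol Player 1 claims to read, the symbol written,
# the head move d, and the symbol claimed under the next head position.
Move = Tuple[int, int, int, int]


def run(tape: Dict[int, int], k: int, moves: List[Move]) -> bool:
    """Return True iff the monitor of cell k detects no lie.

    Only the pair (k, sigma) is stored, never the whole tape.
    """
    sigma = tape[k]  # private content of the monitored cell
    head = 1
    for read, written, d, next_claim in moves:
        if head == k:
            if read != sigma:
                return False  # lie about the current symbol
            sigma = written   # keep the monitored cell up to date
        head += d
        if head == k and next_claim != sigma:
            return False      # lie about the next symbol
    return True


tape = {1: 0, 2: 1, 3: 2}
honest = [(0, 1, 1, 1), (1, 0, 1, 2)]
lying = [(0, 1, 1, 0), (0, 0, 1, 2)]  # wrong claim about cell 2
caught_by = [k for k in tape if not run(tape, k, lying)]
```

In this example the honest sequence passes the check for every choice of k, while the lying sequence is caught by the monitor of the cell it misreports. Since Player 1 cannot observe k, any lie risks detection, which is the point of the argument above.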
Almost winning. To establish the lower bound for almost-winning, we can use the same reduction. Randomization cannot help Player 1 in this game. Indeed, at any point of the game, if Player 1 takes a chance in either not faithfully simulating the ATM or lying about the symbol under the tape head, the sink state may be reached. In those cases, the probability of reaching the sink state is positive, and so the probability of winning the game is strictly less than one. We now present the details of the reduction used in the hardness proof.
Reduction. Given a polynomial-space ATM $(M, p)$, with $M = \langle Q, q_0, g, \Sigma_i, \Sigma_t, \delta, F \rangle$ and a word $w$, we construct the following game structure $G_{M,p,w} = \langle L, l_0, \Sigma, \Delta, O, \gamma \rangle$, where:
• […] $\ldots, p(|w|)\} \times \Sigma_t$. A state $(t, q, h, k, \sigma)$ consists of a transition $t \in \delta$ of the ATM chosen by Player 2 at the previous round (or $-$ if this is the first round where Player 1 plays), the current control state $q$ of $M$, the position $h$ of the tape head, and the pair $(k, \sigma)$ such that the $k$-th symbol of the tape is $\sigma$; this pair $(k, \sigma)$ is kept invisible to Player 1. $L_2 = Q \times \{1, \ldots, p(|w|)\} \times \Sigma_t \times \{1, \ldots, p(|w|)\} \times \Sigma_t$. A state $(q, h, \gamma, k, \sigma)$ consists of $q, h, k, \sigma$ as in $L_1$, and $\gamma$ is the symbol that Player 1 claims to be under the tape head. The objective for Player 1 will be to reach a state $\ell \in L$ associated with an accepting control state of $M$.
• $l_0 = \mathrm{init}$.
• $\Sigma = \{\varepsilon\} \cup (\delta \times \Sigma_t)$.
It follows that Player 1 has an observation-based sure-winning (or almost-winning) strategy in the game $G_{M,p,w}$ for the objective φ iff the word $w$ is accepted by the polynomial-space ATM $(M, p)$. This gives us Lemma 5.1, and Theorem 5.2 follows from the lemma.
Lemma 5.1. Player 1 has a deterministic (resp. randomized) observation-based sure-winning (resp. almost-winning) strategy in the game $G_{M,p,w}$ for the objective φ iff the word $w$ is accepted by the polynomial-space ATM $(M, p)$.
Theorem 5.2 (Lower bounds). Let $G$ be a game structure of imperfect information, let $T$ be a set of observations, and let ℓ be a state of $G$. Deciding whether ℓ is a sure-winning state in $G$ for the reachability objective Reach($T$) is Exptime-hard. Deciding whether ℓ is an almost-winning state in $G$ for Reach($T$) is also Exptime-hard.
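To make the size argument behind the reduction concrete, the following Python sketch compares the number of game states of the two described kinds with the number of ATM configurations. It rests on assumptions: the count for states (t, q, h, k, σ) is read off the textual description of their components, the initial state and any sink state are not counted, and the function names are hypothetical.

```python
def num_game_states(n_delta: int, n_q: int, p: int, n_tape: int = 3) -> int:
    """States of the forms (t, q, h, k, sigma) and (q, h, gamma, k, sigma).

    n_delta: number of ATM transitions, n_q: number of control states,
    p: the space bound p(|w|), n_tape: size of the tape alphabet.
    """
    # t ranges over delta plus the placeholder '-', h and k over 1..p.
    first_kind = (n_delta + 1) * n_q * p * p * n_tape
    # gamma and sigma range over the tape alphabet.
    second_kind = n_q * p * n_tape * p * n_tape
    return first_kind + second_kind


def num_atm_configs(n_q: int, p: int, n_tape: int = 3) -> int:
    """Control state, head position, and full tape content."""
    return n_q * p * n_tape ** p
```

The first count grows polynomially in p, whereas the second grows exponentially, which is why the game stores only one hidden cell instead of the whole tape.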