A Bit of Nondeterminism Makes Pushdown Automata Expressive and Succinct

We study the expressiveness and succinctness of history-deterministic pushdown automata (HD-PDA) over finite words, that is, pushdown automata whose nondeterminism can be resolved based on the run constructed so far, but independently of the remainder of the input word. These are also known as good-for-games pushdown automata. We prove that HD-PDA recognise more languages than deterministic PDA (DPDA) but not all context-free languages (CFL). This class is orthogonal to unambiguous CFL. We further show that HD-PDA can be exponentially more succinct than DPDA, while PDA can be double-exponentially more succinct than HD-PDA. We also study HDness in visibly pushdown automata (VPA), which enjoy better closure properties than PDA, and for which we show that deciding HDness is ExpTime-complete. HD-VPA can be exponentially more succinct than deterministic VPA, while VPA can be exponentially more succinct than HD-VPA. Both of these lower bounds are tight. We then compare HD-PDA with PDA for which composition with games is well-behaved, i.e. good-for-games automata. We show that these two notions coincide, but only if we consider potentially infinitely branching games. Finally, we study the complexity of resolving nondeterminism in HD-PDA. Every HD-PDA has a positional resolver, a function that resolves nondeterminism and that depends only on the current configuration. Pushdown transducers are sufficient to implement the resolvers of HD-VPA, but not those of HD-PDA. HD-PDA with finite-state resolvers are determinisable.


Introduction
Nondeterminism adds both expressiveness and succinctness to deterministic pushdown automata. Indeed, the class of context-free languages (CFL), recognised by nondeterministic pushdown automata (PDA), is strictly larger than the class of deterministic context-free languages (DCFL), recognised by deterministic pushdown automata (DPDA), both over finite and infinite words. Even when restricted to languages in DCFL, there is no computable bound on the relative succinctness of PDA [Har80, Val76]. In other words, nondeterminism is remarkably powerful, even for representing deterministic languages. The cost of such succinct representations is algorithmic: problems such as universality and solving games with a CFL winning condition are undecidable for PDA [Fin01, HU79], while they are decidable for DPDA [Wal01]. Intermediate forms of automata that lie between deterministic and nondeterministic models have the potential to mitigate some of the disadvantages of fully nondeterministic automata while retaining some of the benefits of the deterministic ones.
Unambiguity and bounded ambiguity, for example, restrict nondeterminism by requiring each word to have at most one accepting run, or at most k accepting runs for some fixed k. Holzer and Kutrib survey the noncomputable succinctness gaps between unambiguous PDA and both PDA and DPDA [HK10], while Okhotin and Salomaa show that unambiguous visibly pushdown automata are exponentially more succinct than DPDA [OS15]. Universality of unambiguous PDA is decidable, as it is decidable for unambiguous context-free grammars [SS78], which are effectively equivalent [Her97]. However, to the best of our knowledge, unambiguity is not known to reduce the algorithmic complexity of solving games with a context-free winning condition.
Another important type of restricted nondeterminism that is known to reduce the complexity of universality and solving games has been studied under the names of good-for-games nondeterminism [HP06] and history-determinism [Col09] (HD)¹. Intuitively, a nondeterministic automaton is HD if its nondeterminism can be resolved on-the-fly, i.e. without knowledge of the remainder of the input word to be processed.
For finite automata on finite words, where nondeterminism adds succinctness, but not expressiveness, HD nondeterminism does not even add succinctness: every HD-NFA contains an equivalent DFA [BKS17], which can be obtained by pruning transitions from the HD-NFA. Thus, HD-NFA cannot be more succinct than DFA. But for finite automata on infinite words, where nondeterminism again only adds succinctness, but not expressiveness, HD co-Büchi automata can be exponentially more succinct than deterministic automata [KS15]. Finally, for certain quantitative automata over infinite words, HD nondeterminism adds as much expressiveness as arbitrary nondeterminism [Col09].
Recently, pushdown automata on infinite words with HD nondeterminism (ω-HD-PDA) were shown to be strictly more expressive than ω-DPDA, while universality and solving games for ω-HD-PDA are not harder than for ω-DPDA [LZ22]. Thus, HD nondeterminism adds expressiveness without increasing the complexity of these problems, i.e. pushdown automata with HD nondeterminism induce a novel and intriguing class of context-free ω-languages.
¹ After the publication of the conference version [GJLZ21], which uses the term good-for-games, it has come to light that the notions of history-determinism and good-for-gameness do not always coincide [BL21]. We therefore prefer to use the term history-determinism as it corresponds better to our definitions. In Section 8, we show that at least for pushdown automata over finite words and potentially infinitely branching games, the two notions coincide.
Structure of the paper. We begin with some preliminaries in Section 2 and introduce HD-PDA in Section 3. We study their expressiveness in Section 4 by comparing them to DPDA, PDA, and unambiguous PDA. Afterwards, in Section 5, we exhibit the succinctness of HD-PDA in relation to DPDA and PDA. The important subclass of HD-VPA is studied in Section 6, focusing on succinctness, decidability of HDness, and connections to the good-enough synthesis problem. Closure properties of HD-PDA are studied in Section 7, while we compare the notions of HDness and good-for-gameness for PDA on finite words in Section 8. Then, in Section 9, we solve synthesis with specifications given by HD-PDA as well as universality of such automata. Next, the resources required to resolve the nondeterminism in an HD-PDA are studied in Section 10, before we conclude with some open problems in Section 11.

Previous version.
A conference version of this paper was presented at MFCS 2021 [GJLZ21]. The present article contains all proofs, a more detailed discussion of closure properties, and a new section on compositionality.
Related work. The notion of HD nondeterminism has emerged independently several times, at least as Colcombet's history-determinism [Col09], in Henzinger and Piterman's HD automata [HP06], and as Kupferman, Safra, and Vardi's nondeterminism for recognising derived languages, that is, the language of trees of which all branches are in a regular language [KSV06]. Related notions have also emerged in the context of XML document parsing. Indeed, preorder typed visibly pushdown languages and 1-pass preorder typeable tree languages, considered by Kumar, Madhusudan, and Viswanathan [KMV07] and Martens, Neven, Schwentick, and Bex [MNSB06] respectively, also consider nondeterminism which can be resolved on-the-fly. However, the restrictions there are stronger than simple HD nondeterminism, as they also require the typing to be unique, roughly corresponding to unambiguity in automata models and grammars. This motivates the further study of unambiguous HD automata, although this remains out of scope for the present paper. The XML extension AXML has also inspired Active Context Free Games [MSS06], in which one player, aiming to produce a word within a target regular language, chooses positions on a word and the other player chooses a rewriting rule from a context-free grammar. Restricting the strategies of the first player to moving from left to right makes finding the winner decidable [MSS06, BSSK13]; however, since the player still knows the future of the word, this restriction is not directly comparable to HD nondeterminism.
Unambiguity, or bounded ambiguity, is an orthogonal way of restricting nondeterminism by limiting the number of permitted accepting runs per word. For regular languages, it leads to polynomial equivalence and containment algorithms [SI85]. Minimization remains NP-complete for both unambiguous automata [JR93, BM12] and HD automata [Sch20] (at least when acceptance is defined on states, see [RK19]). On pushdown automata, increasing the permitted degree of ambiguity leads to both greater expressiveness and unbounded succinctness [Her97]. Finally, let us mention two more ways of measuring, and restricting, nondeterminism in PDA: bounded nondeterminism, as studied by Herzog [Her97], counts the branching in the run-tree of a word, while the minmax measure [SY93, GLW05] counts the number of nondeterministic guesses required to accept a word. The natural generalisation of history-determinism as the width of an automaton [KM19] has not yet, to the best of our knowledge, been studied for PDA.

Preliminaries
An alphabet Σ is a finite nonempty set of letters. The empty word is denoted by ε, the length of a word w is denoted by |w|, and the n-th letter of w is denoted by w(n) (starting with n = 0). The set of (finite) words over Σ is denoted by Σ*, the set of nonempty (finite) words over Σ by Σ+, and the set of finite words of length at most n by Σ⩽n. A language over Σ is a subset of Σ*.
Figure 1: The PDA P from Example 2.2. Grey states are final, and X is an arbitrary stack symbol.
A run of P is a finite sequence ρ = c_0 τ_0 c_1 τ_1 ⋯ τ_{n-1} c_n of configurations and transitions, with c_0 being the initial configuration and each τ_{n'} leading from c_{n'} to c_{n'+1}. We say that ρ is accepting if it ends in a configuration whose state is final. The language L(P) recognized by P contains all w ∈ Σ* such that P has an accepting run on w.
Hence, whenever convenient, we treat a sequence of transitions as a run if it indeed induces one (not every such sequence induces a run, e.g., if a transition τ_{n'} is not enabled in c_{n'}).
We say that a PDA P is deterministic (DPDA) if
• every mode of P enables at most one a-transition for every a ∈ Σ ∪ {ε}, and
• for every mode of P, if it enables some ε-transition, then it does not enable any Σ-transition.
Hence, for every input and for every run prefix on it, there is at most one enabled transition to continue the run. Still, due to the existence of ε-transitions, a DPDA can have more than one run on a given input. However, these runs only differ by trailing ε-transitions.
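The two conditions above are easy to check mechanically. The following sketch (the tuple encoding of transitions is ours, not the paper's: a transition is (state, top symbol, letter, target state, pushed word), with the empty string standing for ε) tests determinism of a given transition set:

```python
from collections import defaultdict

def is_deterministic(transitions) -> bool:
    """Check the two DPDA conditions on a set of transitions.

    A transition is (state, top, letter, target, push); a mode is the
    pair (state, top); the empty string '' encodes an epsilon-transition.
    """
    letters_by_mode = defaultdict(list)
    for (q, top, a, _target, _push) in transitions:
        letters_by_mode[(q, top)].append(a)
    for letters in letters_by_mode.values():
        # at most one a-transition per mode, for every a (including epsilon)
        if len(letters) != len(set(letters)):
            return False
        # an epsilon-transition excludes all letter-transitions in that mode
        if "" in letters and len(letters) > 1:
            return False
    return True
```

For instance, adding an ε-transition to a mode that already enables a letter-transition makes the check fail, as required by the second condition.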
The class of languages recognized by PDA is denoted by CFL, the class of languages recognized by DPDA by DCFL.
Example 2.2. The PDA P depicted in Figure 1 recognises the language

History-deterministic Pushdown Automata
Here, we introduce history-deterministic pushdown automata on finite words (HD-PDA for short), nondeterministic pushdown automata whose nondeterminism can be resolved based on the run prefix constructed so far and on the next input letter to be processed, but independently of the continuation of the input beyond the next letter.
As an example, consider the PDA P from Example 2.2. It is nondeterministic, but knowing whether the first transition of the run processed an a or a b allows the nondeterminism to be resolved in a configuration of the form (q, γN) when processing a d: in the former case, take the transition to state q_1; in the latter case, take the transition to state q_2. Afterwards, there are no nondeterministic choices to make, and the resulting run is accepting whenever the input is in the language. This automaton is therefore history-deterministic.
We implement this intuition with the notion of a resolver, that is, a function that, given the run so far and the next input letter, produces the sequence of transitions (ε-transitions and otherwise) to be taken to process this letter. After the last letter has been processed, the run produced so far must be accepting, without trailing ε-transitions. We will show at the end of the section that this is not a serious restriction.
Fix a PDA P = (Q, Σ, Γ, q_I, ∆, F). A (nondeterminism) resolver for P is a function r : ∆* × Σ → ∆ such that for every w ∈ L(P), there is an accepting run ρ = c_0 τ_0 ⋯ τ_n c_{n+1} on w that has no trailing ε-transitions and satisfies τ_{n'} = r(τ_0 ⋯ τ_{n'-1}, a_{n'}) for every 0 ⩽ n' ⩽ n, where a_{n'} is the next letter of w to be processed after the run prefix induced by τ_0 ⋯ τ_{n'-1}. Note that a_{n'} is defined for all 0 ⩽ n' ⩽ n, as ρ has no trailing ε-transitions. Note also that ρ is unique if it exists. We say that P is history-deterministic (HD) if it has a resolver. We denote the class of languages recognised by HD-PDA by HD-CFL.
Note that the input prefix processed so far can be recovered from r's first argument, i.e., it is ℓ(ρ). However, the converse is not true due to the existence of ε-transitions. This is the reason that the run prefix, and not the input prefix, is the argument of the resolver.
We require the run induced by a resolver to have no trailing ε-transitions, as a resolver requires the next letter to be processed as part of its input. This is obviously undefined once the input has ended. Nevertheless, at the end of this section, we study a variant of resolvers with end-of-word markers, which will allow trailing ε-transitions, showing that they do not add expressiveness.
Intuitively, every DPDA should be HD, as there is no nondeterminism to resolve during a run. However, in order to reach a final state, a run of a DPDA on some input w may traverse trailing ε-transitions after the last letter of w is processed. On the other hand, the run of an HD-PDA on w consistent with any resolver has to end with the transition processing the last letter of w. Hence, not every DPDA recognises the same language when viewed as an HD-PDA. Nevertheless, we show, using standard pushdown automata constructions, that every DPDA can be turned into an equivalent HD-PDA. As every HD-PDA is a PDA by definition, we obtain a hierarchy of languages.
Lemma 3.1. DCFL ⊆ HD-CFL ⊆ CFL.

Proof. We only consider the first inclusion, as the second one is trivial. So, let L ∈ DCFL, say it is recognised by the DPDA P = (Q, Σ, Γ, q_I, ∆, F). We say that a mode m of P is a reading mode if it does not enable an ε-transition. Hence, due to determinism, a reading mode m enables at most one a-transition for every a ∈ Σ. Now, consider some nonempty word w(0) ⋯ w(n) ∈ L(P) (we take care of the empty word later on), say with accepting run ρ (treated, for notational convenience, as a sequence of transitions). This run can be decomposed as ρ = ρ_0 τ_0 ρ_1 τ_1 ⋯ ρ_n τ_n ρ_{n+1}, where τ_i processes w(i) and each ρ_i is a (possibly empty) sequence of ε-transitions. Each run prefix induced by some ρ_0 τ_0 ρ_1 ⋯ ρ_i ends in a configuration with a reading mode.
Intuitively, we have to eliminate the trailing ε-transitions in ρ_{n+1}. To do so, we postpone the processing of each letter w(i) to the end of ρ_{i+1}. Instead, we guess that the next input letter is w(i) by turning the original w(i)-transition τ_i into an ε-transition τ'_i that stores w(i) in the state space of the modified automaton. Then, ρ_{i+1} is simulated and a dummy transition τ^d_i processing the stored letter w(i) is executed. Hence, the resulting run of the modified automaton on w has the form ρ_0 τ'_0 ρ_1 τ^d_0 τ'_1 ρ_2 τ^d_1 ⋯ τ'_n ρ_{n+1} τ^d_n, where each τ'_i is now an ε-transition, each ρ_i is a (possibly empty) sequence of ε-transitions, and each τ^d_i is a dummy transition processing w(i). Hence, the run ends with the transition processing the last letter w(n) of the input.
The resulting PDA is HD, as a resolver has access to the next letter to be processed, which is sufficient to resolve the nondeterminism introduced by the guessing of the next letter.
More formally, consider the PDA P' = (Q ∪ (Q × Σ), Σ, Γ, q_I, ∆', F ∪ I), where I = {q_I} if ε ∈ L(P) and I = ∅ otherwise, and ∆' is the union of the following sets of transitions:
- {τ ∈ ∆ | ℓ(τ) = ε}, which is used to simulate the leading sequence of ε-transitions before the first letter is processed by P, i.e., the transitions in ρ_0 above.
- {(q, X, ε, (q', a), γ) | (q, X, a, q', γ) ∈ ∆ and a ∈ Σ}, which are used to guess and store the next letter to be processed, i.e., the transitions τ'_i above.
- {((q, a), X, ε, (q', a), γ) | (q, X, ε, q', γ) ∈ ∆}, which are used to simulate ε-transitions after a letter has been guessed, but not yet processed, i.e., the transitions in some ρ_i with i > 0 above.
- {((q, a), X, a, q, X) | (q, X) is a reading mode}, the dummy transitions used to actually process the guessed and stored letter.
Now, formalising the intuition given above, one can show that P' has a resolver witnessing that it recognises L(P). In particular, the empty word is in L(P') if and only if it is in L(P), as the run induced by the resolver on ε ends in the initial configuration, which is final if and only if ε ∈ L(P).
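The four sets of transitions above can be generated mechanically. The following sketch is one way to do so (the tuple encoding (state, top, letter, target, push) with the empty string for ε is ours, and the reading modes are passed in explicitly):

```python
def delay_letters(transitions, sigma, reading_modes):
    """Build the transition set of the modified PDA from that of P.

    States of the result are either original states q or pairs (q, a)
    storing a guessed letter a; the empty string '' encodes epsilon.
    """
    new = set()
    for (q, top, a, q2, push) in transitions:
        if a == "":
            # leading epsilon-transitions (the rho_0 part) survive unchanged
            new.add((q, top, "", q2, push))
            # ... and are lifted to states carrying a guessed letter b
            for b in sigma:
                new.add(((q, b), top, "", (q2, b), push))
        else:
            # guess-and-store: the a-transition becomes an epsilon-transition
            new.add((q, top, "", (q2, a), push))
    # dummy transitions finally process the stored letter in a reading mode
    for (q, top) in reading_modes:
        for b in sigma:
            new.add(((q, b), top, b, q, top))
    return new
```

Note that the letter stored in a state pair is only consumed by a dummy transition, which leaves the stack untouched, mirroring the construction above.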
HD-PDA are by definition required to end their run with the last letter of the input word. Instead, one could also consider a model where they are allowed to take some trailing ε-transitions after the last input letter has been processed. As a resolver has access to the next input letter, which is undefined in this case, we need resolvers with end-of-word markers to signal to the resolver that the last letter has been processed. In the following, we show that HD-PDA with end-of-word resolvers are as expressive as standard HD-PDA, albeit exponentially more succinct.
Fix some distinguished end-of-word marker #, which takes the role of the next input letter to be processed if there is none, i.e., after the last letter of the input word has been processed. Let P = (Q, Σ, Γ, q_I, ∆, F) be a PDA with # ∉ Σ. An EoW-resolver for P is a function r : ∆* × (Σ ∪ {#}) → ∆ such that for every w ∈ L(P), there is an accepting run ρ = c_0 τ_0 ⋯ τ_n c_{n+1} on w satisfying τ_{n'} = r(τ_0 ⋯ τ_{n'-1}, a_{n'}) for all 0 ⩽ n' ⩽ n. Note that the second argument a_{n'} given to the resolver is a letter of w#, which is equal to # if the run prefix induced by τ_0 ⋯ τ_{n'-1} has already processed the full input w.

Lemma 3.2. HD-PDA with EoW-resolvers are as expressive as HD-PDA.
Proof. A (standard) resolver can be turned into an EoW-resolver that ignores the EoW-marker. Hence, every HD-PDA is an HD-PDA with an EoW-resolver recognising the same language. So, it only remains to consider the other inclusion.
To this end, let P = (Q, Σ, Γ, q_I, ∆, F) be a PDA with an EoW-resolver. The language encoding final configurations of P is regular. Hence, the language C of configurations from which a final configuration of P is reachable using only ε-transitions can be shown to be regular as well, by applying saturation techniques [Bü64] to the restriction of P to ε-transitions. If P reaches a configuration c ∈ C after processing an input w, then w ∈ L(P), even if c's state is not final. Let A = (Q_A, Γ_⊥ ∪ Q, q_I^A, δ_A, F_A) be a DFA recognising C. We extend the stack alphabet of P to Γ × Q_A × (Q_A ∪ {u}), where u is a fresh symbol. Then, we extend the transition relation such that it keeps track of the unique run of A on the stack content: if P reaches a stack content ⊥(X_1, q_1, q'_1)(X_2, q_2, q'_2) ⋯ (X_s, q_s, q'_s), then we have q_j = δ*_A(q_I^A, ⊥X_1 ⋯ X_j) for every 1 ⩽ j ⩽ s, as well as q'_j = q_{j-1} for every 2 ⩽ j ⩽ s and q'_1 = u. Here, δ*_A is the standard extension of δ_A to words. The adapted PDA is still HD, as no new nondeterminism has been introduced, and it keeps track of the state of A reached by processing the stack content, as well as the shifted sequence of states of A, which is useful when popping the top stack symbol: if the topmost stack symbol (X, q, q') is popped off the stack, then q' is the state of A reached when processing the remaining stack content. Now, we double the state space of P, making one copy final, and adapt the transition relation again so that a final state is reached whenever P would reach a configuration in C. Whether a configuration in C is reached can be determined from the current state of P being simulated, as well as from the top stack symbol, which contains information on the run of A on the current stack content. The resulting PDA P' recognises L(P) and has, on every word w ∈ L(P), an accepting run without trailing ε-transitions. Furthermore, an EoW-resolver for P can be turned into a (standard) resolver for P', as the tracking of stack contents and the doubling of the state space does not introduce nondeterminism.
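The stack annotation used in this proof is a standard trick that can be sketched concretely: each pushed symbol carries the state the DFA reaches after reading the stack up to and including that symbol, so popping restores the DFA's state in constant time. In the sketch below (class and method names are ours), `delta` plays the role of δ_A:

```python
class AnnotatedStack:
    """Stack whose symbols carry the run of a DFA over the stack content."""

    def __init__(self, delta, q_init, bottom="Z"):
        # delta maps (dfa_state, stack_symbol) to the next dfa_state
        self.delta = delta
        self.stack = [(bottom, delta[(q_init, bottom)])]

    def push(self, symbol):
        below = self.stack[-1][1]          # DFA state on the stack so far
        self.stack.append((symbol, self.delta[(below, symbol)]))

    def pop(self):
        return self.stack.pop()[0]         # the annotation below stays valid

    def dfa_state(self):
        return self.stack[-1][1]           # DFA state on the whole stack word
```

With, say, a DFA tracking the parity of A's on the stack, a pop never requires rereading the stack, which is exactly why the construction above adds no nondeterminism.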
As A has at most exponential size, P' is also at most exponential (both in the size of P). This exponential blowup incurred by removing the end-of-word marker is in general unavoidable. In Theorem 5.1, we show that the language L_n of bit strings whose n-th bit from the end is a 1 requires exponentially-sized HD-PDA. On the other hand, it is straightforward to devise a polynomially-sized HD-PDA P_EoW with EoW-marker recognising L_n: the underlying PDA stores the input word on the stack, guesses nondeterministically that the word has ended, uses trailing ε-transitions to pop off the last n − 1 letters stored on the stack, and then checks that the topmost stack symbol is a 1. With an EoW-resolver, the end of the input does not have to be guessed, but is marked by the EoW-marker. Hence, P_EoW is HD.
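The behaviour of P_EoW can be sketched directly (the function name is ours): the input bits are pushed, and the end-of-word marker licenses the trailing pops, so the end of the word is never guessed.

```python
def ln_accepts_with_eow(word: str, n: int) -> bool:
    """Decide membership in L_n following the strategy of P_EoW: push all
    bits, then, triggered by the EoW-marker, pop the last n - 1 bits and
    check that the exposed top of the stack is a 1."""
    stack = []
    for bit in word:
        stack.append(bit)        # the PDA stores the input on the stack
    # the marker replaces the nondeterministic guess that the word ended
    for _ in range(n - 1):
        if not stack:
            return False         # the word has fewer than n bits
        stack.pop()
    return bool(stack) and stack[-1] == "1"
```

Without the marker, the automaton would have to guess where the word ends, which is exactly the nondeterminism that no resolver can fix and that forces the exponential blowup of Theorem 5.1.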
Finally, let us remark that the history-determinism of PDA and context-free languages is undecidable. These problems were shown to be undecidable for ω-HD-PDA and ω-HD-CFL by reductions from the inclusion and universality problem for PDA on finite words [LZ22, Theorem 6.1]. The same reductions also show that these problems are undecidable over PDA on finite words.
Theorem 3.3. The following problems are undecidable:
(1) Given a PDA P, is P an HD-PDA?
(2) Given a PDA P, is L(P) ∈ HD-CFL?

Expressiveness
Here we show that HD-PDA are more expressive than DPDA but less expressive than PDA.
To show that HD-PDA are more expressive than deterministic ones, we consider the language B_2 = {a^i $ a^j $ b^k $ | k ⩽ max(i, j)}. It is recognised by the PDA P_B2 depicted in Figure 2, hence B_2 ∈ CFL. We show that B_2 ∈ HD-CFL by proving that P_B2 is history-deterministic: the only nondeterministic choice, between moving to p_1 or to p_2 upon reading the second $, can be made based only on the prefix a^i $ a^j processed so far, which deterministically leads to q_2.

Lemma 4.2. B_2 ∈ HD-CFL.
Proof. Let us summarise the behaviour of the pushdown automaton P_B2 recognising B_2 (see Figure 2). First, the automaton copies the two blocks of a's on the stack. Then, when it processes the second $, it transitions nondeterministically to either p_1 or p_2. In p_1, it erases the second a-block from the stack, so that the first block is at the top of the stack, and then transitions to p_2. In p_2, the automaton compares the number of b's in the input with the number of a's in the topmost block of the stack. If the latter is larger than or equal to the former, P_B2 pops one a for each b in the input, and then transitions to the final state when it processes the third $.
When processing the second $, knowing whether the first or second block of the prefix contains more a's allows the nondeterminism to be resolved: if the first block contains more a's, take the transition to the state p_1; if the second block contains more a's, take the transition to the state p_2.

Now, in order to show that B_2 is not in DCFL, we prove that its complement B_2^c is not a context-free language. Since DCFL is closed under complementation, this implies the desired result.
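The resolver for P_B2 can be made fully concrete. The following sketch (state names and stack encoding are ours) simulates P_B2, resolving the single nondeterministic choice at the second $ from the history of the run, namely the sizes of the two a-blocks read so far:

```python
def accepts_B2(word: str) -> bool:
    """Simulate P_B2 on `word`; the choice between p1 and p2 at the
    second '$' depends only on the run so far, as a resolver requires."""
    state, stack = "q0", []
    first = second = 0                     # a's seen in each block
    for c in word:
        if state == "q0":                  # read the first a-block
            if c == "a":
                stack.append("A"); first += 1
            elif c == "$":
                stack.append("#"); state = "q1"
            else:
                return False
        elif state == "q1":                # read the second a-block
            if c == "a":
                stack.append("A"); second += 1
            elif c == "$":
                # resolver: if the first block is at least as large, take
                # p1 and erase the second block so the first one is on top
                if first >= second:
                    while stack and stack[-1] == "A":
                        stack.pop()
                    stack.pop()            # remove the separator '#'
                state = "p2"
            else:
                return False
        elif state == "p2":                # match b's against the top block
            if c == "b":
                if not stack or stack[-1] != "A":
                    return False
                stack.pop()
            elif c == "$":
                state = "f"
            else:
                return False
        else:
            return False                   # nothing is read after the third '$'
    return state == "f"
```

The decision `first >= second` only inspects the prefix processed so far, never the remaining input, which is exactly the history-determinism of Lemma 4.2.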
Lemma 4.3. The complement B_2^c of B_2 is not in CFL.

Proof. Assume, for the sake of contradiction, that the complement B_2^c of B_2 is in CFL. Now consider the regular language A = {a^i $ a^j $ b^k $ | i, j, k ⩾ 0}. Since the intersection of a context-free language and a regular language is context-free, we have that B_2^c ∩ A ∈ CFL. Therefore, B_2^c ∩ A satisfies the pumping lemma for context-free languages (see, e.g., [HU79]): there exists m ∈ N such that the word z = a^m $ a^m $ b^{m+1} $ ∈ B_2^c ∩ A can be decomposed as z = uvwxy such that
(1) vx ≠ ε and |vwx| ⩽ m, and
(2) uv^n wx^n y ∈ B_2^c ∩ A for every n ⩾ 0.
Note that Item 2 directly implies that both v and x are in the language {a}* ∪ {b}*, as otherwise uv^2 wx^2 y is not in A. On top of that, Item 1 implies that either v or x is in {a}+ ∪ {b}+. We conclude by proving, through a case distinction, that Item 2 cannot hold, as either uwy or uv^2 wx^2 y is in B_2.
• Assume that neither v nor x is in {b}+. Then both are in {a}*, and one of them is in {a}+. Hence, pumping v and x up in z increases the size of at least one of the a-blocks to at least m + 1, while the number of b's remains m + 1; the enlarged block may be the first or the second one. In both cases, we get uv^2 wx^2 y ∈ B_2.
• Assume that either v or x is in {b}+. Then pumping v and x down in z reduces the number of b's to at most m, while reducing the size of at most one of the a-blocks; the other block still contains m a's. In both cases, we get uwy ∈ B_2.
As every possible case results in a contradiction, our initial hypothesis is false: B_2^c ∉ CFL.

The previous two lemmata and Lemma 3.1 yield DCFL ⊊ HD-CFL. Finally, to show that PDA are more expressive than HD-PDA, we consider the language L = {a^n b^n | n ⩾ 0} ∪ {a^n b^{2n} | n ⩾ 0}. We note that L ∈ CFL, while we show below that L ∉ HD-CFL, using arguments similar to the classical proof showing that L is not in DCFL. Hence, the following lemma completes the proof of Theorem 4.1.

Lemma 4.4. L ∉ HD-CFL.
Proof. We show that there does not exist an HD-PDA recognising L. In fact, we show that if there exists an HD-PDA P recognising L, then we can construct a PDA P̃ recognising the language L̃ = L ∪ {a^n b^n c^n | n ⩾ 0}. A straightforward application of the pumping lemma for context-free languages (see, e.g., [HU79]) to words of the form a^n b^n c^n shows that L̃ is not in CFL. Thus, we reach a contradiction. The idea behind the construction is to replicate the part of the control unit of P which processes the suffix b^n of an input word a^n b^{2n}, with the difference that in the newly added parts, the transitions caused by input symbol b are replaced with similar ones for input symbol c. This new part of the control unit, with state set Q̃, may be entered after P has processed a^n b^n.
Now we show that L(P̃) = L̃. First, we show that L(P̃) ⊆ L̃. Consider a word w ∈ L(P̃). There are two cases:
(i) Assume P̃ has an accepting run on w that does not visit a state in Q̃. In this case, we have that w ∈ L(P) = L ⊆ L̃.
(ii) Assume there exists an accepting run of P̃ on w that visits a state in Q̃. Since P recognises L, and since, by construction of P̃, a state q̃_i ∈ Q̃ can be reached from a state q_i ∈ Q only if q_i ∈ F and q̃_i ∈ F̃, and the corresponding transition is an ε-transition, we have that, starting from the initial configuration (q_I, ⊥), a state in Q̃ is reached for the first time only after processing an input prefix a^n b^n or a^n b^{2n} for some n ⩾ 0. If this prefix of w is a^n b^{2n}, then w = a^n b^{2n}. This is because if w = a^n b^{2n} c^m for some m > 0 (recall that after visiting a state q̃_i in Q̃, the only non-ε-transitions possible are on the letter c), then, by the construction of P̃, P can accept the word a^n b^{2n} b^m, which is not in the language L. On the other hand, let the prefix be a^n b^n when a state q̃_i ∈ Q̃ is visited for the first time. Note that q_i ∈ F, and let (q̃_i, γ_i) be the corresponding configuration. If a sequence of transitions τ̃_i, …, τ̃_j from (q̃_i, γ_i) to (q̃_j, γ_j) is possible such that not all of τ̃_i, …, τ̃_j are ε-transitions, that is, the transitions process c^m for some m ∈ N, and q̃_j ∈ F̃, then a sequence of transitions τ_i, …, τ_j of the same length processing b^m is possible from (q_i, γ_i) to (q_j, γ_j) with q_j ∈ F. Since this leads to an accepting run from (q_I, ⊥) to (q_j, γ_j) visiting only states in Q while processing a^n b^n b^m with m > 0, we have m = n, and hence w = a^n b^n c^n ∈ L̃.
On the other hand, if all transitions τ̃_i, …, τ̃_j are ε-transitions, then w = a^n b^n ∈ L̃.
Now we prove the other direction, that is, L̃ ⊆ L(P̃). Here, we rely on the fact that the accepting run of P on a^n b^n induced by a resolver r is a prefix of the accepting run of P on a^n b^{2n} induced by r. This allows P̃ to switch to the copied states Q̃ after processing a^n b^n and then process c^n instead of b^n.
Consider a word w ∈ L̃ such that w ∈ L. By construction of P̃, we have that w ∈ L(P̃), since P̃ accepts all words that are accepted by P. Now suppose that w ∈ L̃ but w ∉ L, that is, w is of the form a^n b^n c^n for some n ⩾ 1. Since, by assumption, P is an HD-PDA recognising the language L, there exists a resolver r that induces an accepting run for every word in L. Let (q_i, γ_i) be the configuration of P reached after processing the prefix a^n b^n in the run induced by r on the input a^n b^{2n}.
Note that q_i ∈ F, since r also induces an accepting run for the input a^n b^n. Now, if for the input a^n b^{2n}, the sequence of transitions chosen by r from (q_i, γ_i) after processing a^n b^n is τ_i, τ_{i+1}, …, τ_j, leading from (q_i, γ_i) to (q_j, γ_j) with q_j ∈ F, and the sequence τ_i, …, τ_j processes b^n, then, by the construction of P̃, there exists a sequence of transitions τ̃_i, τ̃_{i+1}, …, τ̃_j of the same length from (q̃_i, γ_i) to (q̃_j, γ_j) with q̃_j ∈ F̃ such that there is an ε-transition from (q_i, γ_i) to (q̃_i, γ_i) and the sequence τ̃_i, τ̃_{i+1}, …, τ̃_j processes c^n; hence, w ∈ L(P̃).
Thus, we have that L̃ = L(P̃). Hence, we have shown that if P is an HD-PDA recognising L, then we can construct a PDA P̃ recognising L̃, which is not a CFL, leading to a contradiction to our assumption that L is in HD-CFL.

Unambiguous context-free languages, i.e., those generated by grammars for which every word in the language has a unique leftmost derivation, are another class sitting between DCFL and CFL. Thus, it is natural to ask how unambiguity and history-determinism are related. To conclude this section, we show that the two notions are independent.
Theorem 4.5.There is an unambiguous context-free language that is not in HD-CFL and a language in HD-CFL that is inherently ambiguous.

An unambiguous grammar for the language L = {a^n b^n | n ⩾ 0} ∪ {a^n b^{2n} | n ⩾ 0} ∉ HD-CFL is easy to construct, and we show that the language B = {a^i b^j c^k | i, j, k ⩾ 1, k ⩽ max(i, j)} ∈ HD-CFL is inherently ambiguous. Its inclusion in HD-CFL is easily established using a similar argument as for the language B_2 = {a^i $ a^j $ b^k $ | k ⩽ max(i, j)} above. The dollars add clarity to the HD-PDA, but are cumbersome in the proof of inherent ambiguity.
We show that B = {a i b j c k | i, j, k ⩾ 1, k ⩽ max(i, j)} is inherently ambiguous, i.e. for every grammar generating B there is at least one word that has two different leftmost derivations.
We use standard definitions and notation for context-free grammars, as in [HU79, Section 4.2]. We say that a grammar is reduced if every variable is reachable from the start variable, every variable can be reduced to a word of terminals, and for no variable A it holds that A ⇒+ A. We denote by D(G) the set of variables A of G for which there is a derivation A ⇒* xAy with xy ≠ ε. We say that a grammar G is almost-looping if (1) G is reduced, (2) all variables, possibly other than the start variable S, belong to D(G), and (3) either S ∈ D(G) or S occurs only once in the leftmost derivation of any word in L(G). Now we state the following lemma from [Mau69].
Lemma 4.6. For every unambiguous CFG G, there exists an unambiguous almost-looping CFG G' with L(G') = L(G).

An example of an almost-looping grammar for the language B is given in Figure 3. Now we prove the following, using techniques inspired by Maurer's proof [Mau69].

Lemma 4.7. The language B is inherently ambiguous.
Proof.Assume, towards a contradiction, that G is an unambiguous grammar for B, which, from Lemma 4.6, we can assume, without loss of generality, to be an almost-looping grammar.Let A be a variable of G.
(1) A is of Type 1 if there is a derivation A ⇒* xAy where xy = a^{n_{A,1}} for some n_{A,1} > 0.
(2) A is of Type 2 if there is a derivation A ⇒* xAy where xy = b^{n_{A,2}} for some n_{A,2} > 0.
(3) A is of Type 3 if there is a derivation A ⇒* xAy where x = a^{ℓ_{A,3}} and y = c^{r_{A,3}} for some ℓ_{A,3} ⩾ r_{A,3} > 0.
(4) A is of Type 4 if there is a derivation A ⇒* xAy where x = b^{ℓ_{A,4}} and y = c^{r_{A,4}} for some ℓ_{A,4} ⩾ r_{A,4} > 0.
(5) A is of Type 5 if there is a derivation A ⇒* xAy where x = a^{ℓ_{A,5}} and y = b^{r_{A,5}} for some ℓ_{A,5}, r_{A,5} > 0.
Note that some variables may be of multiple types (e.g., the variable D in Figure 3 has Type 2 and Type 4).
First, we show that each variable in D(G) has at least one of these five types. So, let A ∈ D(G). Then, there exists a derivation A ⇒* xAy with xy ≠ ε. Note that both x and y belong to a*, b*, or c*, since otherwise, due to G being reduced, one could derive words that are not in the language. Next, we note that the cases where x belongs to c* and y belongs to a* or b* cannot happen. Similarly, the case where x belongs to b* and y belongs to a* cannot happen. Also, we cannot have xy ∈ c*, since this would allow us to derive words with an arbitrary number of c's, which can exceed the number of a's and b's; such a word is not in the language.
Further, we cannot have x = a^ℓ and y = c^r with 0 < ℓ < r. Otherwise, consider a derivation S ⇒* αAβ ⇒* a^s b^u c^v of some word a^s b^u c^v ∈ B that uses A. Now, towards a contradiction, assume we indeed have A ⇒* xAy with x = a^ℓ, y = c^r, and ℓ < r. Then, pumping in q copies of x and y, for some suitably large q ∈ N, yields a derivation S ⇒* αAβ ⇒* α x^q A y^q β ⇒* a^{s+ℓq} b^u c^{v+rq} such that v + rq > max(s + ℓq, u), i.e., we have derived a word that is not in B. Similarly, we cannot have x = b^ℓ and y = c^r for some 0 < ℓ < r. Altogether, this implies that A indeed has at least one of the five types stated above.
Moreover, we claim that there is a t ∈ N such that the following three properties are true for every word w ∈ B:
(1) If w contains more than t c's, then every derivation of w uses a variable of Type 3 or Type 4.
(2) If w contains more than t a's, then every derivation of w uses a variable of Type 1, Type 3, or Type 5.
(3) If w contains more than t b's, then every derivation of w uses a variable of Type 2, Type 4, or Type 5.
We prove these properties as follows: we denote by d the width of the grammar G, which is the maximum number of symbols appearing on the right-hand side of some production rule of G. Further, we denote by m the number of variables appearing in G. We argue that t = d^{m+1} satisfies the three properties above. We focus on Property 1; the two other proofs are similar. Suppose that w contains more than d^{m+1} c's and consider the derivation tree of that word. The weight ω(v) of a vertex v in the derivation tree is defined as the number of c's in the subtree rooted at v. Hence, the root of the derivation tree has weight at least d^{m+1}. We build a finite path v_0, v_1, ..., v_k from the root of this tree to one of its leaves as follows: the initial vertex v_0 is the root, and at each step, we choose as successor of v_i its child v_{i+1} with the largest weight. A vertex v_i of this path is decreasing if ω(v_i) > ω(v_{i+1}). There are at least m + 1 decreasing vertices on the path because ω(v_0) ⩾ d^{m+1}, ω(v_k) = 1, and ω(v_{i+1}) ⩾ ω(v_i)/d for every i, as v_{i+1} is the child of v_i with the largest weight among at most d children. Thus, by the pigeonhole principle, there are two decreasing vertices on the path that are labelled by the same variable A such that there is a derivation of the form A =⇒* xAy with some c in xy.
Let p > t be a positive integer divisible by the least common multiple of the n_{A,i}, ℓ_{A,i}, and r_{A,i} for all A ∈ D(G) and i ∈ {1, ..., 5}, where we define n_{A,1} = 1 if A is not of Type 1, and similarly for all other i > 1. We show that the word w = a^{2p} b^{2p} c^{2p} ∈ B has two leftmost derivations.
First consider the derivation of the word w_b = a^{2p} b^p c^{2p} ∈ B. As we have more than t c's in w_b, Property 1 shows that the derivation contains a variable of Type 3 or Type 4. Next, we argue that it cannot contain a variable of Type 4: the occurrence of such a variable would allow us either to produce a word that is not in a*b*c* or to inject b^p c^r for p ⩾ r > 0, leading to the derivation of a^{2p} b^{2p} c^{2p+r}, which is not in the language. Thus, the derivation of w_b uses at least one variable of Type 3. Also, since w_b has p > t b's, Property 3 implies that the (unique) leftmost derivation of w_b has the form

S =⇒* αAβ =⇒* αxAyβ =⇒* a^{2p} b^p c^{2p},

where A =⇒* xAy and xy contains at least one b. Thus, A is a variable of Type 2 or Type 5 (note that we have already ruled out Type 4 above). More precisely, we have that x belongs to a^+ or b*, and y = b^j for some j ∈ N. Now we show that the case where x belongs to a^+ is not possible. Assume for contradiction that x = a^i for some i > 0. Then we also have the derivation

S =⇒* αAβ =⇒* a^{2p−i} b^{p−j} c^{2p},

obtained by removing the loop A =⇒* xAy, which yields a word that is not in B, as 2p > max(2p − i, p − j). Therefore, A is a Type 2 variable that is used in the derivation of w_b, which can be used to inject another b^p, yielding a derivation of w. Thus, we have exhibited a derivation of w that uses a variable of Type 3.

Now consider a derivation of the word w_a = a^p b^{2p} c^{2p}. Such a derivation cannot contain a variable of Type 3, since this would allow us either to produce a word that is not in a*b*c* or to inject a^p c^r for p ⩾ r > 0, leading to the derivation of a^{2p} b^{2p} c^{2p+r} ∉ B. Further, arguing as above, some variable of Type 1 must appear in the derivation of w_a, which is used to obtain a sufficient number of a's in the derivation of w_a. Such a variable of Type 1 can be used to inject a^p into w_a, which leads to a derivation of w. Thus, we have exhibited a derivation of w that does not contain a variable of Type 3.
Altogether, there are two different leftmost derivations of the word w: one using a variable of Type 3 and one not. Thus, G is not unambiguous, yielding the desired contradiction.

Succinctness
We show that HD-PDA are not only more expressive than DPDA, but also more succinct. Similarly, we show that PDA are more succinct than HD-PDA. Recall that the size of a PDA is the sum of the sizes of its state set and its stack alphabet.

Theorem 5.1. HD-PDA can be exponentially more succinct than DPDA, and PDA can be double-exponentially more succinct than HD-PDA.
We first show that HD-PDA can be exponentially more succinct than DPDA. To this end, we construct a family (C_n)_{n∈N} of languages such that C_n is recognised by an HD-PDA of size O(n), yet every DPDA recognising C_n has at least exponential size in n.
Let c_n ∈ (${0, 1}^n)* be the word describing an n-bit binary counter counting from 0 to 2^n − 1. For example, c_2 = $00$01$10$11. We consider the family of languages C_n = {0, 1, $, #}* \ {c_n #} of bad counters. We show that the language C_n is recognised by an HD-PDA of size O(n) and that every DPDA D recognising C_n has exponential size in n. Observe that this result implies that even HD-PDA that are equivalent to DPDA are not determinisable by pruning. In contrast, for NFA, HDness implies determinisability by pruning [BKS17].
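As a small sanity check, the counter word c_n and the membership test for C_n can be sketched in Python; the helper names are ours, not the paper's:

```python
# Build c_n = $b(0)$b(1)...$b(2^n - 1), with n-bit binary blocks.
def counter_word(n: int) -> str:
    return "".join("$" + format(i, "0" + str(n) + "b") for i in range(2 ** n))

# C_n contains every word over {0, 1, $, #} except the single word c_n#.
def in_C(n: int, w: str) -> bool:
    return w != counter_word(n) + "#"
```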
Lemma 5.2. The language C_n is recognised by an HD-PDA of size O(n).
Proof. We define a PDA P = (Q, Σ, Γ, q_I, ∆, F) that recognises C_n. The automaton P operates in three phases: a push phase, followed by a check phase, and then a final phase. These phases work as follows. Suppose that P receives an input w ∈ {0, 1, $, #}*. During the first phase, P pushes the input processed onto the stack until the sequence 1^n appears. If it never appears, the input is accepted. During this phase, P also checks whether the prefix w′ of w processed up to this point is a sequence of counter values starting with 0^n, i.e. whether w′ is in the language L_c = $0^n(${0, 1}^n)*. If w′ ∉ L_c, then P immediately accepts. Otherwise, P moves to the second phase. During the check phase, P pops the stack. At any point, P can nondeterministically guess that the top symbol of the stack is evidence of bad counting (details are described below). It then accepts the input if the guess was correct. If P completely pops the stack without correctly guessing an error in the counter, it moves to the final phase. Since the prefix w′ processed up to this point ends with the sequence 1^n, if P now processes any suffix different from a single #, then the input is not equal to c_n #, and can be accepted.
The stack alphabet of P has constant size 3. The push phase requires 3(n + 1) states:
• First, P checks whether $0^n is a prefix of the input. This can be done with n + 2 states.
• Then, P checks whether the following {0, 1}* segments are n bits wide, and only the last one is 1^n. This can be done with 2n + 1 additional states: repeatedly, P processes n + 1 symbols, checks whether only the first of them is a $, and keeps track of whether at least one of them is 0.
We now show that 6(n + 1) additional states are enough for the check phase. To this end, we study the errors that P needs to check. Note that, to increment the counter correctly, we need to change the value of all the bits starting from the last 0, and leave the previous bits unchanged. Therefore, P can recognise with 6(n + 1) states whether the top symbol of the stack does not correspond to a correct counter increment: P pops the top n + 1 stack symbols while keeping in memory
• the value of the first symbol popped;
• whether we have not yet popped a $ (there is exactly one $ in the top n + 1 stack symbols, as the stack content is in L_c), or a $ but no 0 afterwards, or a $ and at least one 0 afterwards.
The input is accepted whenever the first symbol popped and the top stack symbol after popping match yet no 0 has been popped between the $ and the last symbol, or they differ yet at least one 0 has been popped between the $ and the last symbol. Finally, only three states are needed for the final phase: when the bottom of the stack is reached, P transitions to a new state, and from there it checks whether the suffix is in the language {0, 1, $, #}* \ {#}.
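The local error condition that the check phase guesses can be made explicit. The following Python sketch (our own illustration, not from the paper, with blocks indexed most-significant bit first) tests whether position i witnesses an incorrect increment between two consecutive counter blocks:

```python
def bad_increment(block: str, next_block: str, i: int) -> bool:
    """True iff bit i shows that next_block is not block incremented by 1.

    Incrementing flips exactly the bits from the last 0 onwards, so bit i
    must flip iff block has no 0 strictly after position i (i.e. the carry
    propagates up to position i)."""
    must_flip = "0" not in block[i + 1:]
    return (block[i] != next_block[i]) != must_flip
```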
To conclude, note that P is history-deterministic: the only nondeterministic choice happens during the check phase, and the resolver knows which symbols of the stack are evidence of bad counting. Note that this choice only depends on the current stack content.
As mentioned above, each C_n is recognised by a DPDA; now we show that such a DPDA must be large.
Lemma 5.3. Every DPDA recognising the language C_n has at least exponential size in n.
Proof. It is known that every DPDA can be complemented at the cost of multiplying its number of states by three [HU79]. Therefore, to prove the statement, we show that even every PDA recognising the complement {c_n #} of C_n has at least exponential size in n:

Claim 5.4. Every PDA P = (Q, Σ, Γ, q_I, ∆, F) recognising {c_n #} has size greater than 2^{(n−1)/3}.

To prove the claim, we transform P into a context-free grammar generating the singleton language {c_n #}, and then we show that such a grammar requires exponentially many variables. This is a direct consequence of the mk-Lemma [CLL+05], but proving it directly using similar techniques yields a slightly better bound.
Before changing P into a grammar, we slightly modify its acceptance condition: we add to P a fresh final state f in which the stack can be completely popped, including the bottom-of-stack symbol ⊥ (which normally cannot be touched according to our definition of PDA). Moreover, we allow P to transition towards f nondeterministically from all of its other final states. This new automaton, which accepts by empty stack, is easily transformed into a grammar G using the standard transformation [HU79]:
• The terminals of G are 0, 1, $, and #.
• The variables of G are the triples (p, X, q) for all states p, q ∈ Q ∪ {f} and stack symbols X ∈ Γ_⊥.
• The initial variable is (q_I, ⊥, f), where q_I is the initial state of P and f is the fresh final state.
• Each transition (p, X, a, q, γ) ∈ ∆ yields production rules as follows:
(1) If γ = ε, then G has the production rule (p, X, q) → a;
(2) If γ = Y, then G has the production rule (p, X, q_1) → a(q, Y, q_1) for all q_1 ∈ Q;
(3) If γ = Y Z, then G has the production rule (p, X, q_2) → a(q, Y, q_1)(q_1, Z, q_2) for all q_1, q_2 ∈ Q.
The variables can be interpreted as follows: for every p, q ∈ Q and X ∈ Γ, the variable (p, X, q) can be derived into any input word w ∈ {0, 1, $, #}* that P can process starting in state p and ending in state q while consuming the symbol X from the top of the stack. Therefore, in particular, since the initial variable is (q_I, ⊥, f), G generates the same language as P.
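The triple construction above is mechanical; here is a Python sketch (our own encoding, not from the paper) that enumerates the production rules generated by a set of transitions:

```python
# A transition is (p, X, a, q, gamma), where gamma is a tuple of at most
# two stack symbols; a rule is (head, body) with heads being triples.
def triple_grammar(states, transitions):
    rules = []
    for (p, X, a, q, gamma) in transitions:
        if len(gamma) == 0:                      # pop: (p,X,q) -> a
            rules.append(((p, X, q), (a,)))
        elif len(gamma) == 1:                    # swap: guess midpoint q1
            (Y,) = gamma
            for q1 in states:
                rules.append(((p, X, q1), (a, (q, Y, q1))))
        else:                                    # push: guess q1 and q2
            Y, Z = gamma
            for q1 in states:
                for q2 in states:
                    rules.append(
                        ((p, X, q2), (a, (q, Y, q1), (q1, Z, q2))))
    return rules
```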

Remember that c_n = $d_0$d_1 ··· $d_{2^n−1} represents an n-bit binary counter counting from 0 to 2^n − 1, where d_i ∈ {0, 1}^n is the binary representation of i. Consider the derivation tree T of c_n # in G. For each 0 ⩽ i ⩽ 2^n − 1, let us consider the vertex v_i of T such that the counter value d_i is an infix of the word derived from v_i, but of none of its children. In other words, d_i is split between the derivations of the children of v_i. By definition of the grammar G, each vertex of T has at most three children, hence at most two counter values can be split amongst the children of a given vertex. Therefore, every vertex v of T satisfies v = v_i for at most two indices i. As a consequence, the vertices v_0, v_2, v_4, ..., v_{2^n−2} are all distinct. Finally, since c_n # is the only word recognised by P and each counter value d ∈ {0, 1}^n appears a single time as an infix of c_n #, the 2^{n−1} variables labelling these vertices need to be distinct.

Now, we consider the gap between HD-PDA and PDA. We show that there exists a family (L_n)_{n>0} of languages such that L_n is recognised by a PDA of size O(log n) while every HD-PDA recognising this language has at least exponential size in n.
Formally, we set L_n = (0 + 1)* 1 (0 + 1)^{n−1}, that is, the n-th bit from the end is a 1. Here, we count starting from 1, so that the last bit is the 1st bit from the end. Note that this is the standard example for showing that NFA can be exponentially more succinct than DFA, and it has been used for many other succinctness results ever since.
Lemma 5.5. There exists a PDA of size O(log n) recognising L_n.
Proof. We describe a PDA P_n that recognises L_n. The PDA P_n nondeterministically guesses the n-th bit from the end, checks that it is a 1, and switches to a counting gadget that checks that the word ends in n steps, as follows:
(i) It pushes the binary representation of n − 2 onto the stack. For example, if n = 8, then 110 is pushed onto the stack with 0 at the top. Note that O(log(n − 2)) states suffice for pushing the binary representation of n − 2. If n = 1, then instead of pushing anything onto the stack, the automaton directly moves to a final state without any enabled transitions.
(ii) Then P_n moves to a state that attempts to decrement the counter by one for each successive input letter, as follows: when an input letter is processed, it pops 0's until a 1 is at the top of the stack, say m 0's. Then, it replaces the 1 with a 0, and finally pushes m 1's back onto the stack before processing the next letter. If the stack empties before a 1 is at the top of the stack, then the counter value is 0 and the automaton moves to a final state with no enabled transitions. Note that O(log n) states again suffice for this step.
Thus, P_n has O(log n) states. Note that for all n, P_n uses only three stack symbols, 0, 1, and ⊥. Altogether, the size of P_n is O(log n), and P_n recognises L_n.
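The decrement step of the counting gadget can be simulated directly; the following Python sketch (our own illustration) uses a list as the stack, with the top at the end:

```python
def decrement(stack: list) -> bool:
    """Consume one input letter: decrement the binary counter on the stack.

    Returns True if the decrement succeeded, and False if the stack
    emptied, i.e. the counter had already reached 0."""
    zeros = 0
    while stack and stack[-1] == "0":   # pop 0's until a 1 is on top
        stack.pop()
        zeros += 1
    if not stack:                        # counter was 0
        return False
    stack[-1] = "0"                      # replace the topmost 1 by a 0 ...
    stack.extend("1" * zeros)            # ... and push the 0's back as 1's
    return True
```

Starting from the binary representation of n − 2 with the least-significant bit on top, the call succeeds exactly n − 2 times before the stack empties.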
Lemma 5.6. Every HD-PDA recognising L_n has at least exponential size in n.
Towards proving this, we define the following notions. We say that a word w of length n is rotationally equivalent to a word w′ if w′ is obtained from w by rotating it. For example, the word w = 1101 is rotationally equivalent to w′ = 1110, since w′ can be obtained from w by rotating it once to the right. Note that the words that are rotationally equivalent form an equivalence class, and thus rotational equivalence partitions {0, 1}^n. Since the size of each class is at most n, the number of equivalence classes is at least 2^n/n. Now, we define the stack height of a configuration c = (q, γ) as sh(c) = |γ| − 1, and we define steps of a run as usual: consider a run c_0 τ_0 c_1 τ_1 ··· c_{n−1} τ_{n−1} c_n. A position s ∈ {0, ..., n} is a step if for all s′ ⩾ s, we have that sh(c_{s′}) ⩾ sh(c_s), that is, the stack height is always at least sh(c_s) after position s. Any infinite run of a PDA has infinitely many steps. We have the following observation.
Remark 5.7. If two runs of a PDA have steps s_0 and s_1, respectively, with the same mode, then the suffix of the first run following the step s_0 can replace the suffix of the other run following the step s_1, and the resulting run is a valid run of the PDA.

Now, we are ready to prove Lemma 5.6. Here, we work with infinite inputs for HD-PDA. The run induced by a resolver on such an input is the limit of the runs on the prefixes.
Proof. Let P be an HD-PDA with resolver r that recognises L_n. We show that |Q| · |Γ| ⩾ 2^n/n, where Q is the set of states and Γ is the stack alphabet of P.
Towards a contradiction, assume that |Q| · |Γ| < 2^n/n. Then there exist two words w_0 and w_1 of length n that are not rotationally equivalent and such that the runs ρ_0 and ρ_1 of P induced by r on w_0^ω and w_1^ω contain steps with the same mode, at positions s_0 and s_1 in ρ_0 and ρ_1 respectively, such that at least n letters are processed before s_0 and s_1. Now consider in each of these two runs the sequence of input letters of length n preceding and including the step position. Let these n-letter words be w′_0 and w′_1 respectively. Since w_0 and w_1 are not rotationally equivalent, w′_0 and w′_1 differ in at least one position j ⩽ n. W.l.o.g., assume that for w′_0, the bit at position j is 0, while it is 1 at position j for w′_1. Since the resolver chooses a run such that every word whose n-th letter from the end is a 1 is accepted, this implies that ρ_0 does not visit a final state after processing j − 1 letters after s_0, while ρ_1 visits a final state after processing j − 1 letters after s_1. Now we reach a contradiction as follows. The suffix of ρ_0 starting from position s_0 + 1 can be replaced with the suffix of ρ_1 starting from position s_1 + 1. By Remark 5.7, this yields a valid run ρ of P. However, since the state that occurs after j − 1 letters are processed after position s_1 in ρ_1 is final, after the replacement, the state that occurs after j − 1 letters are processed after position s_0 in ρ is final as well. However, the n-th letter from the end of the word processed by this accepting run of P is a 0, contradicting that P recognises L_n.
Thus, we have that |Q| · |Γ| is at least the number of rotational-equivalence classes, that is, |Q| · |Γ| ⩾ 2^n/n. Hence, the size of P is at least 2^{n/2}/√n.
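To make the counting argument concrete, rotational-equivalence classes can be enumerated by mapping each word to its lexicographically least rotation; this Python sketch (our own, not from the paper) confirms the 2^n/n lower bound for small n:

```python
from itertools import product

def canonical_rotation(w: str) -> str:
    """Least rotation of w; two words are rotationally equivalent
    iff they have the same canonical rotation."""
    return min(w[i:] + w[:i] for i in range(len(w)))

def num_classes(n: int) -> int:
    words = ("".join(bits) for bits in product("01", repeat=n))
    return len({canonical_rotation(w) for w in words})
```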

History-deterministic Visibly Pushdown Automata
As we shall see in Section 7, one downside of HD-PDA is that, like ω-HD-PDA, they have poor closure properties, and checking HDness is undecidable. We therefore consider a well-behaved class of HD-PDA, namely HD visibly pushdown automata, HD-VPA for short, which is closed under union, intersection, and complementation. Let Σ_c, Σ_r, and Σ_int be three disjoint sets of call symbols, return symbols, and internal symbols, respectively. Let Σ = Σ_c ∪ Σ_r ∪ Σ_int. A visibly pushdown automaton [AM04] (VPA) P = (Q, Σ, Γ, q_I, ∆, F) is a restricted PDA that pushes onto the stack only when it reads a call symbol, pops the stack only when a return symbol is read, and does not use the stack when reading an internal symbol. Formally:
• A letter a ∈ Σ_c is only processed by transitions of the form (q, X, a, q′, XY) with X ∈ Γ_⊥, i.e. some stack symbol Y ∈ Γ is pushed onto the stack.
• A letter a ∈ Σ_r is only processed by transitions of the form (q, X, a, q′, ε) with X ≠ ⊥ or (q, ⊥, a, q′, ⊥), i.e. the topmost stack symbol is removed, or, if the stack is empty, it is left unchanged.
• A letter a ∈ Σ_int is only processed by transitions of the form (q, X, a, q′, X) with X ∈ Γ_⊥, i.e. the stack is left unchanged.
• There are no ε-transitions.
Intuitively, the stack height of the last configuration of a run processing some w ∈ (Σ_c ∪ Σ_r ∪ Σ_int)* only depends on w.
We denote by HD-VPA the VPA that are history-deterministic. Every VPA (and hence every HD-VPA) can be determinised, i.e. all three classes of automata recognise the same class of languages, denoted by VPL, which is a strict subset of DCFL [AM04].
6.1. Succinctness. While all three classes of VPA are equally expressive, VPA can be exponentially more succinct than deterministic VPA (DVPA) [AM04]. We show that there is an exponential gap both between the succinctness of HD-VPA and DVPA and between VPA and HD-VPA. The proof of the former gap again uses a language of bad counters, similar to the language C_n used in Theorem 5.1, which we adapt for the VPA setting by adding a suffix allowing the automaton to pop the stack. Furthermore, for the gap between VPA and HD-VPA, we similarly adapt the language L_n of words whose n-th bit from the end is a 1 from the proof of Theorem 5.1, by making sure that the stack height is always bounded by 1. Such an HD-VPA is essentially an HD-NFA, and therefore determinisable by pruning, which means that it is as big as a deterministic automaton for the language.

Theorem 6.1. HD-VPA can be exponentially more succinct than DVPA, and VPA can be exponentially more succinct than HD-VPA.
We split the proof into two parts.

Lemma 6.2. HD-VPA can be exponentially more succinct than DVPA.
Proof. We construct a family (C′_n)_{n∈N} of languages such that there is an HD-VPA of size O(n) recognising C′_n, yet every DVPA recognising C′_n has at least exponential size in n. This family is obtained by adapting the family (C_n)_{n∈N} that we used to prove the succinctness of HD-PDA in Section 5: once again, we consider the word c_n ∈ (${0, 1}^n)* describing an n-bit binary counter counting from 0 to 2^n − 1. We consider the languages C′_n = {0, 1, $, #}* \ {c_n #^{2^n(n+1)}} of bad counters, where 0, 1, and $ are call symbols and # is a return symbol. The only difference with C_n is that the forbidden word is c_n #^{2^n(n+1)} instead of c_n #, where 2^n(n + 1) = |c_n| is the number of # symbols needed to empty the stack. An HD-VPA of size O(n) recognising C′_n is obtained by a small modification of the construction presented in the proof of Lemma 5.2. We adapt the construction of the automaton P recognising C_n as follows:
• The push phase is identical;
• The check phase is performed by consuming the # symbols instead of having ε-transitions.
While the stack is not empty, P accepts, even if it has not found evidence of bad counting yet. Moreover, P transitions towards a final sink state if a non-# symbol is read. Once the stack is empty, it transitions towards the final phase;
• In the final phase, since the prefix processed up to this point ends with an empty stack, if the suffix left to read is non-empty, then the input is not equal to c_n #^{2^n(n+1)}, and can be accepted.
Finally, we can prove that every DPDA (and in particular every DVPA) recognising C′_n has at least exponential size in n in the exact same way as we proved Lemma 5.3: the proof only uses the fact that c_n is an infix of the single word not in C_n, thus C′_n can be treated identically. Note that this lower bound is independent of the partition of the letters into calls, returns, and internals.

Lemma 6.3. VPA can be exponentially more succinct than HD-VPA.
We show that there exists a family (L′_n)_{n∈N} of languages such that there exists a VPA of size O(n) recognising L′_n, while every HD-VPA recognising the same language has size at least 2^{⌈n/6⌉}. Towards this, we consider the language L′_n of words in (01 + 10)* · (ε + 0 + 1) whose n-th last letter is a 1. We first note that L′_n can be recognised by a VPA with O(n) states, which checks that the input is in (01 + 10)* · (ε + 0 + 1), nondeterministically guesses the n-th last letter, and verifies that it is a 1.
First, we note that every DFA recognising L′_n has exponential size, which can be shown by counting the equivalence classes of the Myhill-Nerode congruence of L′_n (see, e.g., [HU79]).

Remark 6.4. Every DFA recognising L′_n has at least 2^{⌈n/2⌉} states.

Using this, we obtain an exponential lower bound on the size of HD-VPA recognising L′_n, thereby completing the proof of Lemma 6.3.

Lemma 6.5. Every HD-VPA recognising L′_n has size at least 2^{⌈n/6⌉}.

Proof. The proof is based on the fact that HD-NFA can be determinised by pruning [BKS17], that is, they always contain an equivalent DFA, i.e. the lower bound of Remark 6.4 is applicable to HD-NFA as well.
Let P be an HD-VPA recognising L′_n. We consider the following cases:
(1) Both 0 and 1 are either return symbols or internal symbols: In this case, P can essentially be seen as an HD-NFA with the same set of states, since the stack is not used (it always equals ⊥). Given that HD-NFA are determinisable by pruning, by Remark 6.4 such an HD-NFA has at least 2^{⌈n/2⌉} states.
(2) At least one of 0 and 1 is a call symbol while the other one is a call or an internal symbol: Let Q be the set of states and Γ be the stack alphabet of P. Since the height of the stack is non-decreasing, P only has access to the top stack symbol. We can thus construct an equivalent HD-NFA over finite words with states in Q × Γ. Since HD-NFA are determinisable by pruning, and using Remark 6.4 again, we have that |Q| · |Γ| ⩾ 2^{⌈n/2⌉}. Thus, either |Q| ⩾ 2^{n/4} or |Γ| ⩾ 2^{n/4}. Hence, for this case, the size of the HD-VPA is at least 2^{n/4}.
(3) One of 0 and 1 is a call symbol while the other one is a return symbol: Note that, since a word in L′_n is composed of sequences of 10 and 01, the stack height can always be restricted to 2. Thus, the configuration space of P, restricted to configurations on accepting runs, is finite, and there is an equivalent HD-NFA of size at most |Q| · |Γ|^2. Thus, |Q| · |Γ|^2 ⩾ 2^{⌈n/2⌉}, giving either |Q| ⩾ 2^{⌈n/6⌉} or |Γ| ⩾ 2^{⌈n/6⌉}. Again by the determinisability-by-pruning argument, we have that the size of the HD-VPA P is at least 2^{⌈n/6⌉}.

6.2. Deciding HDness of VPA. We now turn to the question of deciding whether a given VPA is HD. We show decidability using the one-token game, introduced by Bagnol and Kuperberg [BK18]. It is easier to decide than the game-based characterisation of HDness of ω-regular automata by Henzinger and Piterman [HP06]. While the one-token game does not characterise the HDness of Büchi automata [BK18], here we show that it suffices for VPA.
Theorem 6.6. The following problem is ExpTime-complete: Given a VPA P, is P HD?

Proof. We first define the one-token game, introduced by Bagnol and Kuperberg [BK18] in the context of regular languages, for VPA. Fix a VPA P = (Q, Σ, Γ, q_I, ∆, F), which we assume w.l.o.g. to be complete, i.e. every mode has at least one enabled a-transition for every a ∈ Σ. The positions of the one-token game consist of pairs of configurations (c, c′), starting from the pair containing the initial configuration of P twice. At each round i, from position (c_i, c′_i):
• Player 1 picks a letter a_i ∈ Σ, then
• Player 2 picks an a_i-transition τ_i ∈ ∆ enabled in c_i, leading to a configuration c_{i+1}, then
• Player 1 picks an a_i-transition τ′_i ∈ ∆ enabled in c′_i, leading to a configuration c′_{i+1}.
• Then round i ends. In round i + 1, the play proceeds from the position (c_{i+1}, c′_{i+1}).
Note that, due to the completeness of P, there is always at least one move available for each player.
The moves of the two players during a play induce an infinite word a_0 a_1 ··· and two infinite runs c_0 τ_0 c_1 τ_1 ··· and c′_0 τ′_0 c′_1 τ′_1 ···, built by Player 2 and Player 1 respectively. Player 2 wins if, for all n ⩾ 0, whenever the run c′_0 τ′_0 ··· τ′_n c′_{n+1} constructed by Player 1 is accepting, then the run c_0 τ_0 ··· τ_n c_{n+1} constructed by Player 2 is accepting as well. Recall that VPA do not have ε-transitions, so the two runs proceed in lockstep, i.e. both runs process a_0 ··· a_n.
Observe that this game can be seen as a safety game on an infinite arena induced by the configuration graph of a visibly pushdown automaton, obtained by taking the product of P with itself. This game, in turn, is solvable in exponential time [Wal01].
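On a finite arena, such a safety game can be solved by a greatest-fixpoint computation. The following Python sketch (our own illustration on complete NFAs over finite words, not from the paper; for VPA the same game is played on the infinite configuration graph) decides the winner of the one-token game:

```python
def player2_wins_one_token(states, alphabet, delta, initial, finals):
    """Decide the one-token game on a complete NFA.

    delta maps (state, letter) to the set of successor states; the
    safety condition: after every round, if Player 1's state is final,
    then Player 2's state must be final too."""
    def ok(p2, p1):
        return p2 in finals or p1 not in finals
    # Greatest fixpoint: pairs from which Player 2 can stay "ok" forever.
    safe = {(p2, p1) for p2 in states for p1 in states if ok(p2, p1)}
    changed = True
    while changed:
        changed = False
        for (p2, p1) in set(safe):
            # For every letter Player 1 may pick, Player 2 needs some
            # transition such that every reply of Player 1 stays safe.
            if not all(
                any(all((q2, q1) in safe for q1 in delta[(p1, a)])
                    for q2 in delta[(p2, a)])
                for a in alphabet
            ):
                safe.discard((p2, p1))
                changed = True
    return (initial, initial) in safe
```

For instance, on the classic NFA for {a, aa} that guesses at the first letter whether the word ends immediately, Player 2 loses the one-token game, reflecting that the nondeterminism cannot be resolved on the fly.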
It now suffices to argue that this game characterises whether the VPA P is HD: we argue that P is HD if and only if Player 2 wins the one-token game on P. One direction is immediate: if P is HD, then the resolver induces a strategy for Player 2 in the one-token game which ensures that the run constructed by Player 2 is accepting whenever Player 1 picks a word that is accepted by P. This covers, in particular, the cases where the run constructed by Player 1 is accepting, which suffices for Player 2 to win.
For the converse direction, we show that a winning strategy for Player 2 in the one-token game can be turned into a resolver for P. To this end, consider the family of copycat strategies for Player 1, which copy the transition chosen by Player 2 until she plays an a-transition from a configuration c to a configuration c′ such that there is a word aw that is accepted from c but w is not accepted from c′. We call such transitions non-residual. If Player 2 plays such a non-residual transition, then the copycat strategies stop copying and instead play the letters of w and the transitions of an accepting run over aw from c.
If Player 2 wins the one-token game with a strategy σ, she wins, in particular, against every copycat strategy for Player 1. Observe that copycat strategies win any play along which Player 2 plays a non-residual transition. Therefore, σ must avoid ever playing a non-residual transition. We can now use σ to construct a resolver r_σ for P: r_σ maps a sequence of transitions over a word w and a letter a to the transition chosen by σ in the one-token game in which Player 1 played wa and used a copycat strategy to construct his run. Then, r_σ never produces a non-residual transition. As a result, if a word w is in L(P), then the run induced by r_σ over every prefix v of w leads to a configuration that accepts the remainder of w. This is in particular the case for w itself, for which r_σ induces an accepting run. This concludes our argument that r_σ is indeed a resolver, and P is therefore HD.
Thus, to decide whether a VPA P is HD, it suffices to solve the one-token game on P, which can be done in exponential time. The matching lower bound follows from a reduction from the inclusion problem for VPA, which is ExpTime-hard [AM04], to HDness (see [LZ22, Theorem 6.1(1)] for details of the reduction in the context of ω-HD-PDA).
Thus, HDness for VPA is decidable. On the other hand, the problem of deciding whether the language of a VPA is history-deterministic, i.e. recognised by some HD-VPA, is trivial, as VPA are determinisable [AM04].
6.3. Good-enough Synthesis. Finally, we relate the HDness problem to the good-enough synthesis problem [AK20], also known as the uniformization problem [CL15], which is similar to Church's synthesis problem, except that the system is only required to satisfy the specification on inputs in the projection of the specification on the first component.
Let w ∈ Σ_1^* and w′ ∈ Σ_2^* with |w| = |w′|. Then, for the sake of readability, we write w w′ for the word w(0)w′(0) w(1)w′(1) ··· w(|w|−1)w′(|w|−1) over Σ_1 × Σ_2, where each w(i)w′(i) denotes the pair (w(i), w′(i)).

We now prove that the ge-synthesis problem for HD-VPA and DVPA is as hard as the HDness problem for VPA, giving us the following corollary of Theorem 6.6.

Corollary 6.8. The ge-synthesis problem for inputs given by HD-VPA or DVPA is ExpTime-complete.

Proof. We show that deciding ge-synthesis and HDness, which is ExpTime-complete (see Theorem 6.6), are polynomially equivalent. We show the upper bound for HD-VPA and the lower bound for DVPA.
We first reduce the good-enough synthesis problem to the HDness problem. Given an HD-VPA P = (Q, Σ_1 × Σ_2, Γ, q_I, ∆, F) with resolver r, let P′ be P projected onto the first component: P′ = (Q, Σ_1, Γ, q_I, ∆′, F) has the same states, stack alphabet, and final states as P, but the transitions of P′ are obtained by replacing each label (a, b) of a transition of P by a. Let each transition of P′ be annotated with the Σ_2-letter of a corresponding P-transition. Thus, P′ recognises the projection of L(P) onto the first component.
We show that P has a ge-synthesis function if and only if P′ is HD. First, a ge-synthesis function f for P, combined with r, induces a resolver r′ for P′ by using f to choose output letters and r to choose which transition of P to use; together, these uniquely determine a transition in P′. Then, if w ∈ L(P′), f guarantees that the annotation of the run induced by r′ in P′ is a witness w′ such that w w′ ∈ L(P), and then r guarantees that the run is accepting, since the corresponding run in P over w w′ must be accepting. Conversely, assume P′ is HD. Then, a resolver for P′ induces a ge-synthesis function for P by reading the Σ_2-annotation of the chosen transitions in P′. Indeed, whenever w is in the projection of L(P), the resolver produces an accepting run whose annotation is a witness w′ with w w′ ∈ L(P).

For non-closure under complementation, recall the language B_2 from Section 4. We prove in Lemma 4.2 that B_2 ∈ HD-CFL, yet Lemma 4.3 shows that its complement B_2^c is not even a context-free language. Closure under set difference implies closure under complementation, since for every language L over an alphabet Σ, the complement L^c is equal to Σ* \ L.

Now we show that HD-CFL is not closed under concatenation. Consider again the languages L_1 = {a^n b^n | n ⩾ 0} and L_2 = {a^n b^{2n} | n ⩾ 0}. We showed that L_1 ∪ L_2 is not an HD-CFL. Now let L_5 = cL_1 ∪ L_2. Clearly, L_5 is a DCFL, and hence an HD-CFL, since the initial letter determines whether the word is in cL_1 or in L_2. Now, the language represented by the regular expression c* is regular, and hence an HD-CFL. We argue that c*L_5 is not an HD-CFL, and hence HD-CFL are not closed under concatenation.
Assume for contradiction that c*L_5 is an HD-CFL. Let L_6 = c*L_5 ∩ ca*b* = cL_1 ∪ cL_2. Since HD-CFL are closed under intersection with regular languages (see Theorem 7.1), L_6 is also an HD-CFL. Now, if L_6 is an HD-CFL, then there exists an HD-PDA P_6 with a resolver r such that, for a word w = cv ∈ L_6 with v ∈ L_1 ∪ L_2, the resolver r, after reading c, reaches some state q of P_6 from which it induces an accepting run on v. This allows us to construct from P_6 an HD-PDA P_{1∪2} accepting L_1 ∪ L_2 with q as the initial state. We reach a contradiction, since L_1 ∪ L_2 is not an HD-CFL.

Now we show that HD-CFL are also not closed under Kleene star. Again, we consider the language L_5 above, which is an HD-CFL. We show that L_5^* is not an HD-CFL. Had L_5^* been an HD-CFL, this would imply that L_5^* ∩ ca*b* is an HD-CFL. However, L_5^* ∩ ca*b* = cL_1 ∪ cL_2, which is the language L_6 defined above. As we have already shown that the language L_6 is not an HD-CFL, this implies that L_5^* is also not an HD-CFL.

For non-closure under homomorphism, there is a language that is recognised by a DPDA, and hence by an HD-PDA by Lemma 3.1, whose projection (which is a homomorphism) cannot be recognised by an HD-PDA (see Lemma 4.4).

Compositionality with games
One of the appeals of history-deterministic automata is the good compositional properties they enjoy, which allow them to be used to solve games. Namely, we consider arena-based games, in which two players, called Player 1 and Player 2, build a path, with the owner of the current position choosing an outgoing edge at each turn; Player 2 wins if the label of the path is in the winning condition. If the winning condition is the language of a deterministic or history-deterministic automaton D with acceptance condition C, taking the product (or composition) of the arena with D yields an arena which, when treated as a game with winning condition C, has the same winner as the original game with winning condition L(D). In short, this product construction reduces games with winning conditions recognised by deterministic or history-deterministic C-automata to C-games. For example, it can be used to reduce games with ω-regular winning conditions, which are recognised by history-deterministic and deterministic parity automata, to parity games. In this sense, history-deterministic automata are just as good as deterministic automata when it comes to solving games via composition, making them appealing for applications such as reactive synthesis, which can be solved via games.
Here we first show that HD-PDA, like regular history-deterministic automata, enjoy this compositional property, both with respect to finite and infinite arenas. We then consider the converse question, namely, whether all PDA that behave well with respect to composition are HD-PDA. This is the case for (ω-)regular automata [BL19], but not for quantitative automata [BL21]. We show that for PDA over finite words, those that enjoy compositionality with infinitely branching games are exactly those that are history-deterministic. On the other hand, unlike in the (ω-)regular setting, compositionality with finitely branching games does not suffice to guarantee that the automaton is history-deterministic, even for automata over finite words.
Our proof of equivalence of compositionality with games and history-determinism for PDA over finite words uses the determinacy of safety games. Generalising this proof to automata over infinite words would require determinacy of games with ω-context-free winning conditions, which is a large cardinal assumption [Fin12].
A Σ-arena A = (V, V 1 , V 2 , E, ι, ℓ) consists of a directed graph (V, E) whose positions V are partitioned into those belonging to Player 1, V 1 , and those belonging to Player 2, V 2 , whose edges E are labelled with Σ ∪ {end} via a labelling ℓ : E → Σ ∪ {end}, and which is rooted at an initial position ι ∈ V . We assume that all end-edges lead to a terminal position, that is, one without outgoing edges, that all other positions have at least one successor, and that all edges leading to a terminal position are labelled end.
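A minimal sketch of this definition, assuming a finite arena encoded with Python dictionaries (the class and field names are ours); the well-formedness check mirrors the stated assumptions on end-edges and terminal positions:

```python
from dataclasses import dataclass

@dataclass
class Arena:
    positions: set       # V
    p1_positions: set    # V1; Player 2 owns positions - p1_positions
    edges: dict          # (source, target) -> label in Σ ∪ {"end"}
    initial: object      # ι ∈ V

    def well_formed(self) -> bool:
        # successors of each position; a position with no successors
        # is terminal by definition in this encoding
        succ = {p: [t for (s, t) in self.edges if s == p] for p in self.positions}
        for (s, t), label in self.edges.items():
            # every end-edge must lead to a terminal position ...
            if label == "end" and succ[t]:
                return False
            # ... and every edge into a terminal position must be an end-edge
            if not succ[t] and label != "end":
                return False
        return self.initial in self.positions
```

For instance, an arena with edges {(0, 1): "a", (1, 2): "end"} is well-formed, while one where the end-edge leads to a non-terminal position is not.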
A play on A is a finite path ending in a terminal position, or an infinite path. A strategy for a player is a mapping from finite paths ending in a position that belongs to that player to one of its outgoing edges. A play is consistent with Player i's strategy σ if, for all of its prefixes π ending in V i , the next edge is σ(π).
An (arena-based) game (A, L) consists of a Σ-arena A and a language L ⊆ Σ * . A finite play is winning for Player 2 if it is labelled with a word in L • end. Note that Player 2 loses, in particular, all infinite plays. Winning strategies are defined in the usual way, and we say that a player wins the game if they have a winning strategy.

We now consider the composition of an arena and an automaton, that is, a product game in which Player 2 not only has to guarantee that the label of the play in the arena is in the language of the automaton, but she also has to build an accepting run over that word, transition by transition, as the play in the arena progresses.
Formally, the positions of the product game G(A, P) of a Σ-arena A = (V, V 1 , V 2 , E, ι, ℓ) and a PDA P = (Q, Σ, Γ, q I , ∆, F ) consist of a position in V and a configuration of P, the initial position being (ι, q I , ⊥). In each round i, from position (v, q, γ), the player who owns the position v chooses an outgoing edge (v, v ′ ) ∈ E, labelled by some letter ℓ(v, v ′ ) = a. If a ≠ end, Player 2 then chooses a possibly empty sequence of ε-transitions followed by one a-transition, so that the combined sequence induces a finite run from (q, γ) leading to some (q ′ , γ ′ ). The play then proceeds in round i + 1 from (v ′ , q ′ , γ ′ ). If a = end, then the play ends in (v ′ , q, γ), as v ′ is by assumption a terminal position of the arena. Player 2 wins by reaching a position (v, q, γ) where v is a terminal position and q is final. Again, strategies, winning strategies, and winning the game are defined as expected.
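One round of this product game can be sketched as follows. To keep Player 2's (possibly infinite) set of choices finite for illustration, we bound the length of ε-sequences by a parameter max_eps; the PDA encoding (a transition map delta, a state, and a stack written bottom-to-top as a string) is our own simplification:

```python
def pda_steps(delta, q, stack, letter):
    """All (q', stack') reachable by one `letter`-transition.
    delta maps (state, letter, top-of-stack) to a list of
    (state', pushed) pairs; the top symbol is replaced by `pushed`."""
    out = []
    if stack:
        for q2, pushed in delta.get((q, letter, stack[-1]), []):
            out.append((q2, stack[:-1] + pushed))
    return out

def round_successors(delta, q, stack, letter, max_eps=3):
    """Player 2's choices after the arena edge labelled `letter`:
    up to `max_eps` ε-transitions, then one `letter`-transition."""
    frontier, results = [(q, stack)], []
    for _ in range(max_eps + 1):
        next_frontier = []
        for q1, s1 in frontier:
            results.extend(pda_steps(delta, q1, s1, letter))
            next_frontier.extend(pda_steps(delta, q1, s1, "eps"))
        frontier = next_frontier
    return results
```

Without the bound on ε-sequences, the successor set of a single round can be infinite, which is exactly the source of infinite branching in G(A, P) noted below.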
Observe that G(A, P) may have an infinite number of positions, and may have infinite branching, due to unbounded ε-transition sequences. If A is finitely representable, so is G(A, P). In particular, if A is finite, then G(A, P) is a reachability game on the configuration graph of a pushdown automaton.

Definition 8.1. Let P be a PDA over an alphabet Σ.
• P is compositional if the following holds for all (finitely or infinitely branching) Σ-arenas A: Player 2 wins the game (A, L(P)) if and only if she wins G(A, P).
• P is weakly compositional if the following holds for all finitely branching Σ-arenas A: Player 2 wins the game (A, L(P)) if and only if she wins G(A, P).
Note that compositionality implies weak compositionality. In the remainder of this section, we compare history-determinism and (weak) compositionality.
Lemma 8.2. Every HD-PDA is compositional.

Proof. Let P be an HD-PDA over Σ and let A be a Σ-arena. If Player 2 wins the game (A, L(P)) with a strategy σ, then she wins G(A, P) by using σ in the A-component of G(A, P) and the resolver r that witnesses the history-determinism of P in the P-component of G(A, P). Then, σ guarantees that the word built in A is in L(P) • end, and r guarantees that Player 2 reaches a final state at the end of the word in L(P), before end is encountered.
Conversely, a winning strategy for Player 2 in G(A, P) projected onto the A-component is a winning strategy for Player 2 in (A, L(P)).
We now consider the converse: whether all (weakly) compositional PDA are HD-PDA. In the regular setting, automata are weakly compositional if and only if they are compositional [BL19]. Here, however, we observe that the answer depends on whether infinite branching is permitted in the arenas considered.
Lemma 8.3. There exists a PDA that is weakly compositional, but not history-deterministic.
Proof. Consider the 3-state PDA in Fig. 8 that recognises a * b by first guessing an upper bound m on the number of a's in the word, then processing a no more than m times, followed by a b. The automaton accepts a n b by pushing m ⩾ n elements onto the stack before reading n times the letter a, followed by b.
It is clearly not an HD-PDA, as the resolver would have to predict the number of a's to be seen. However, for all finitely branching Σ-arenas A, if Player 2 wins the game (A, L(P)), she also wins G(A, P). Indeed, if Player 2 wins (A, L(P)) with a strategy σ, then, by König's lemma, there is some bound n on the number of a's in any play that is consistent with σ.
Then, in G(A, P), Player 2 wins by first pushing n letters onto the stack in P and then playing σ in the A-component. Since σ guarantees that there are at most n occurrences of a in the play, this strategy is winning for Player 2.
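The guessing PDA above can be sketched with the guess m made explicit as a parameter (this is an illustrative simulation of a run with a fixed guess, not the automaton itself): the run first pushes m stack symbols, each a pops one, and b leads to the final state. A resolver would have to pick m before seeing the word, which is impossible, whereas a strategy in G(A, P) may pick m = n using König's bound n on σ-consistent plays.

```python
def accepts_with_guess(word: str, m: int) -> bool:
    """Does the run of the a*b PDA that guesses the bound m accept `word`?"""
    height = m                          # stack height after the guessing phase
    for i, letter in enumerate(word):
        if letter == "a":
            if height == 0:             # no stack symbol left to pop: run is stuck
                return False
            height -= 1
        elif letter == "b":
            return i == len(word) - 1   # b must be the last letter
        else:
            return False                # letter outside {a, b}
    return False                        # no b read: run does not reach a final state
```

For example, a guess of m = 2 accepts aab, but any word with more than m occurrences of a gets the run stuck, even though the word is in the language.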
In contrast, HD-PDA are exactly the PDA that preserve winners when composed with infinitely branching arenas.
Theorem 8.4. A PDA is history-deterministic if and only if it is compositional.
Proof. One direction of the equivalence was shown in Lemma 8.2: every HD-PDA is compositional.
For the other direction, we show that from a PDA that is not history-deterministic, we can construct an arena that witnesses the failure of compositionality. To characterise automata that are not history-deterministic, we rely on the so-called letter game, a game-based characterisation of history-determinism due to Henzinger and Piterman [HP06].
Given a PDA P, the letter game is played by two players, called Challenger and Resolver, and P is history-deterministic if and only if Resolver wins the letter game induced by P. The game is played in rounds: in each round i, Challenger chooses a letter a i and then Resolver responds with ρ i , the concatenation of a (potentially empty) sequence of ε-transitions and an a i -transition. Resolver's winning condition is a safety condition: before each round i, either the word w = a 0 a 1 • • • a i−1 built by Challenger so far is not in the language L(P), or the sequence ρ 0 ρ 1 • • • ρ i−1 played by Resolver so far induces an accepting run of P over w. This game characterises history-determinism, as a winning strategy for Resolver is exactly a resolver witnessing the history-determinism of P.
We now crucially rely on the determinacy of the letter game, which we can easily show for PDA over finite words. Indeed, the winning condition for Resolver is a safety condition, in the sense that all losing plays have a finite prefix all of whose continuations are losing. The winning condition is therefore Borel and, by Martin's theorem [Mar75], determined. This implies that if P is not history-deterministic, then Challenger has a winning strategy σ C in the letter game.
We now build an arena A σ C , based on σ C , such that Player 2 wins (A σ C , L(P)), but loses the composition game G(A σ C , P), thus witnessing that P is not compositional.
The arena A σ C emulates the strategy tree of σ C , in the sense that all plays of the arena are plays that are consistent with the strategy σ C . Furthermore, the branching in the arena corresponds to the branching of the strategy tree of σ C , which represents Resolver's choices in the letter game. All positions of A σ C belong to Player 1.
Note that a play in the letter game has the form a 0 ρ 0 a 1 ρ 1 a 2 ρ 2 • • • where each a i is a letter in Σ and each ρ i is a (possibly empty) sequence of ε-transitions concatenated with

Figure 3: An example CFG for language B

Property 1: If w has more than t c's, then the (unique) leftmost derivation of w has the form S * =⇒ αAβ * =⇒ αxAyβ * =⇒ w such that xy contains a c. Thus, A has type 3 or type 4.

Property 2: If w has more than t a's, then the (unique) leftmost derivation of w has the form S * =⇒ αAβ * =⇒ αxAyβ * =⇒ w such that xy contains an a. Thus, A has type 1, type 3, or type 5.

Property 3: If w has more than t b's, then the (unique) leftmost derivation of w has the form S * =⇒ αAβ * =⇒ αxAyβ * =⇒ w such that xy contains a b. Thus, A has type 2, type 4, or type 5.

Compare the following definitions to that of Gale-Stewart games in Section 9, which capture Church's synthesis problem.

Now, we consider the closure properties of HD-PDA.

Theorem 7.2. HD-PDA are not closed under union, intersection, complementation, set difference, concatenation, Kleene star, and homomorphism.

Proof. The proofs for union, intersection, complementation, and set difference are similar to those used for ω-HD-PDA [LZ22, Theorem 5.1]. We state them here for completeness. To show non-closure under union, consider the languages L 1 = {a n b n | n ⩾ 0} and L 2 = {a n b 2n | n ⩾ 0}. There exist a DPDA recognising L 1 and a DPDA recognising L 2 . Hence, by Lemma 3.1, there also exist an HD-PDA recognising L 1 and an HD-PDA recognising L 2 . However, we show in Lemma 4.4 that L 1 ∪ L 2 is not recognised by any HD-PDA. For intersection, consider the languages L 3 = {a n b n c m | m, n ⩾ 0} and L 4 = {a m b n c n | m, n ⩾ 0}. There exist DPDA recognising L 3 and L 4 . Hence, by Lemma 3.1, there exist HD-PDA recognising L 3 and L 4 . Now let L = L 3 ∩ L 4 = {a n b n c n | n ⩾ 0}. As L is not a CFL, no HD-PDA recognises L.
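To make the intersection argument concrete, here are plain membership checkers for L 3 , L 4 , and their intersection (a sketch; these are ordinary functions, not the DPDA from the proof):

```python
import re

def in_L3(w: str) -> bool:
    """L3 = { a^n b^n c^m | m, n >= 0 }."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return m is not None and len(m.group(1)) == len(m.group(2))

def in_L4(w: str) -> bool:
    """L4 = { a^m b^n c^n | m, n >= 0 }."""
    m = re.fullmatch(r"(a*)(b*)(c*)", w)
    return m is not None and len(m.group(2)) == len(m.group(3))

def in_intersection(w: str) -> bool:
    """L3 ∩ L4 = { a^n b^n c^n | n >= 0 }, which is not context-free."""
    return in_L3(w) and in_L4(w)
```

Each of L 3 and L 4 only requires one comparison of block lengths, which a single stack handles deterministically; their intersection requires both comparisons simultaneously, which no stack discipline can achieve.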

Figure 4: A PDA recognising a * b that is weakly compositional but not history-deterministic. Grey states are final, and X is an arbitrary stack symbol.