Collapsible Pushdown Graphs of Level 2 are Tree-Automatic

We show that graphs generated by collapsible pushdown systems of level 2 are tree-automatic. Even if we allow epsilon-contractions and reachability predicates (with regular constraints) for pairs of configurations, the structures remain tree-automatic whence their first-order logic theories are decidable. As a corollary we obtain the tree-automaticity of the second level of the Caucal-hierarchy.


Introduction
Higher-order pushdown systems were first introduced by Maslov [10,11] as accepting devices for word languages. Later, Knapik et al. [8] studied them as generators for trees. They obtained an equi-expressivity result for higher-order pushdown systems and for higher-order recursion schemes that satisfy the constraint of safety, which is a rather unnatural syntactic condition. Recently, Hague et al. [6] introduced collapsible pushdown systems as extensions of higher-order pushdown systems and proved that these have exactly the same power as higher-order recursion schemes as methods for generating trees.
Both higher-order and collapsible pushdown systems also form interesting devices for generating graphs. Carayol and Wöhrle [3] showed that the graphs generated by higher-order pushdown systems of level l coincide with the graphs in the l-th level of the Caucal-hierarchy, a class of graphs introduced by Caucal [4]. Every level of this hierarchy is obtained from the preceding level by applying graph unfoldings and MSO interpretations. Both operations preserve the decidability of the MSO theory, whence the Caucal-hierarchy forms a rather large class of graphs with decidable MSO theories. If we use collapsible pushdown systems as generators for graphs, we obtain a different situation. Hague et al. showed that even the second level of the hierarchy contains a graph with undecidable MSO theory. However, they showed the decidability of the modal µ-calculus theories of all graphs in the hierarchy. This turns graphs generated by collapsible pushdown systems into an interesting class from a model-theoretic point of view. There are few natural classes that share these properties. In fact, the author only knows one further example, viz. nested pushdown trees. Alur et al. [1] introduced these graphs for µ-calculus model checking purposes. We proved in [7] that nested pushdown trees also have decidable first-order theories. We gave an effective model checking algorithm using pumping techniques, but we also proved that nested pushdown trees are tree-automatic structures. Tree-automatic structures were introduced by Blumensath [2]. These structures enjoy decidable first-order theories due to the good closure properties of finite automata on trees.
In this paper, we are going to extend our previous result to the second level of the collapsible pushdown hierarchy. All graphs of the second level are tree-automatic. This subsumes our previous result, as nested pushdown trees are first-order interpretable in collapsible pushdown graphs of level two. Furthermore, we show that collapsible pushdown graphs of level 2 are still tree-automatic when expanded by a reachability predicate, i.e., by the binary relation which contains all pairs of configurations such that there is a path from the first to the second configuration. Thus, first-order logic extended by reachability predicates is decidable on level 2 collapsible pushdown graphs.
In the next section, we introduce the necessary notions concerning tree-automaticity, and in Section 3 we define collapsible pushdown graphs. We explain the translation of configurations into trees in Section 4. Section 5 is a sketch of the proof that this translation yields tree-automatic representations of collapsible pushdown graphs, even when enriched with certain regular reachability predicates. The last section contains some concluding remarks about questions arising from our result.

Preliminaries
We write MSO for monadic second-order logic and FO for first-order logic. For words $w_1, w_2 \in \Sigma^*$, we write $w_1 \sqcap w_2$ for the greatest common prefix of $w_1$ and $w_2$. A $\Sigma$-labelled tree is a function $T : D \to \Sigma$ for a finite $D \subseteq \{0, 1\}^*$ which is closed under prefixes. For $d \in D$ we denote by $T_d$ the subtree rooted at $d$.
Sometimes it is useful to define trees inductively by describing their left and right subtrees. For this purpose we fix the following notation. Let $\hat{T}_0$ and $\hat{T}_1$ be $\Sigma$-labelled trees and $\sigma \in \Sigma$. Then we write $T := \sigma(\hat{T}_0, \hat{T}_1)$ for the $\Sigma$-labelled tree $T$ with the following three properties: 1. $T(\varepsilon) = \sigma$, 2. $T_0 = \hat{T}_0$, and 3. $T_1 = \hat{T}_1$.
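The tree notation above can be made concrete with a small sketch. It assumes a representation that is not part of the paper: a tree is a Python dictionary mapping positions (strings over 0 and 1) to labels, and `None` stands for the empty tree. The names `is_tree`, `subtree` and `node` are hypothetical helpers.

```python
from typing import Dict, Optional

# Assumed representation: positions in {0,1}* are strings, labels are strings.
Tree = Dict[str, str]


def is_tree(t: Tree) -> bool:
    """Check that the domain is a prefix-closed set of words over {0, 1}."""
    return all(
        set(d) <= {"0", "1"} and (d == "" or d[:-1] in t)
        for d in t
    )


def subtree(t: Tree, d: str) -> Tree:
    """Return the subtree rooted at position d."""
    return {e[len(d):]: lab for e, lab in t.items() if e.startswith(d)}


def node(sigma: str, left: Optional[Tree] = None,
         right: Optional[Tree] = None) -> Tree:
    """Build sigma(left, right): root labelled sigma with the given subtrees."""
    t = {"": sigma}
    for bit, child in (("0", left), ("1", right)):
        for d, lab in (child or {}).items():
            t[bit + d] = lab
    return t


example = node("a", node("b"), node("c", None, node("d")))
```

Here `example` has root label a, left subtree `subtree(example, "0")` consisting of the single node b, and right subtree `subtree(example, "1")` with root c.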
In the rest of this section, we briefly present the notion of a tree-automatic structure as introduced by Blumensath [2].
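The convolution of two $\Sigma$-labelled trees $T$ and $T'$ is given by a function $T \otimes T' : \mathrm{dom}(T) \cup \mathrm{dom}(T') \to (\Sigma \cup \{\Box\})^2$, where $\Box$ is a new symbol for padding and
$$(T \otimes T')(d) := \begin{cases} (T(d), T'(d)) & \text{if } d \in \mathrm{dom}(T) \cap \mathrm{dom}(T'), \\ (T(d), \Box) & \text{if } d \in \mathrm{dom}(T) \setminus \mathrm{dom}(T'), \\ (\Box, T'(d)) & \text{if } d \in \mathrm{dom}(T') \setminus \mathrm{dom}(T). \end{cases}$$
By a tree automaton we mean a nondeterministic finite automaton that labels a finite tree top-down.
A structure $B = (B, E_1, \dots, E_n)$ with binary relations $E_i$ is tree-automatic if there are tree automata $A_B, A_{E_1}, \dots, A_{E_n}$ and a bijection $f : L \to B$ for $L$ the language accepted by $A_B$ such that the following holds. For $T, T' \in L$, the automaton $A_{E_i}$ accepts $T \otimes T'$ if and only if $(f(T), f(T')) \in E_i$.
Tree-automatic structures form a nice class because automata-theoretic techniques may be used to decide first-order formulas on these structures:
Theorem ([2]). If $B$ is tree-automatic, then its first-order theory is decidable.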
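As an illustration of the convolution of two labelled trees, the following sketch pairs the labels position by position and pads wherever only one of the trees is defined. The dictionary representation of trees and the concrete padding character are assumptions made for this example only.

```python
from typing import Dict, Tuple

# Assumed padding symbol; it must not occur in the label alphabet.
PAD = "□"


def convolution(t1: Dict[str, str],
                t2: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Overlay two trees given as maps from positions in {0,1}* to labels."""
    dom = set(t1) | set(t2)
    return {d: (t1.get(d, PAD), t2.get(d, PAD)) for d in dom}


left_heavy = {"": "a", "0": "b"}
right_heavy = {"": "c", "1": "d"}
overlay = convolution(left_heavy, right_heavy)
```

A tree automaton for a binary relation then reads the single tree `overlay` instead of the two input trees.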
We will use the classical result that regular sets of trees are MSO definable.
Theorem 2.3 ([12], [5]). For a set $\mathcal{T}$ of finite $\Sigma$-labelled trees, there is a tree automaton recognising $\mathcal{T}$ if and only if $\mathcal{T}$ is MSO definable.

Definition of Collapsible Pushdown Graphs (CPG)
In this section we fix our notation for collapsible pushdown systems. For a more comprehensive introduction, we refer the reader to [6].
Let us fix a 2-word $s \in \Sigma^{*2}$ which consists of an ordered list $w_1, w_2, \dots, w_m \in \Sigma^*$. We separate the words of this list by colons, writing $s = w_1 : w_2 : \dots : w_m$. By $|s|$ we denote the number of words $s$ consists of, i.e., $|s| = m$.
For another 2-word $s' = w_1' : w_2' : \dots : w_n' \in \Sigma^{*2}$, we write $s : s'$ for the concatenation $w_1 : w_2 : \dots : w_m : w_1' : w_2' : \dots : w_n'$. If $w \in \Sigma^*$, we write $[w]$ for the 2-word that consists of a list of one word which is $w$. A level 2 collapsible pushdown stack is a special element of $(\Sigma \times \{1, 2\} \times \mathbb{N})^{+2}$ that is generated by certain stack operations from an initial stack, which we introduce in the following definitions. The natural numbers following the stack symbol represent the so-called collapse pointer: every element in a collapsible pushdown stack has a pointer to some substack, and applying the collapse operation returns the substack to which the topmost symbol of the stack points. Here, the first number denotes the collapse level. If it is 1, the collapse pointer always points to the symbol below the topmost symbol and the collapse operation just removes the topmost symbol. The more interesting case is when the collapse level of the topmost symbol of the stack $s$ is 2. Then the stack obtained by the collapse contains the first $n$ words of $s$, where $n$ is the second number in the topmost element of $s$.
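The initial level 1 stack is $\bot_1 := (\bot, 1, 0)$ and the initial level 2 stack is $\bot_2 := [\bot_1]$. For $k \in \{1, 2\}$ and for a 2-word $s = w_1 : w_2 : \dots : w_n$ with $w_n = a_1 a_2 \dots a_m$, we define the topmost $(k-1)$-word of $s$ as
$$\mathrm{top}_k(s) := \begin{cases} w_n & \text{if } k = 2, \\ a_m & \text{if } k = 1. \end{cases}$$
For $s$, $w_n$ and $k$ as before, $\sigma \in \Sigma \setminus \{\bot\}$, and $w_n' := a_1 \dots a_{m-1}$, we define the stack operations
$$\begin{aligned}
\mathrm{clone}_2(s) &:= w_1 : \dots : w_{n-1} : w_n : w_n, \\
\mathrm{push}_{\sigma,k}(s) &:= w_1 : \dots : w_{n-1} : w_n (\sigma, k, l) \quad \text{with } l = m \text{ if } k = 1 \text{ and } l = n - 1 \text{ if } k = 2, \\
\mathrm{pop}_1(s) &:= w_1 : \dots : w_{n-1} : w_n', \\
\mathrm{pop}_2(s) &:= w_1 : \dots : w_{n-1}, \\
\mathrm{collapse}(s) &:= \begin{cases} w_1 : \dots : w_l & \text{if } \mathrm{top}_1(s) = (\tau, 2, l), \\ \mathrm{pop}_1(s) & \text{if } \mathrm{top}_1(s) = (\tau, 1, l). \end{cases}
\end{aligned}$$
The set of level 2 operations is $\mathrm{OP} := \{\mathrm{push}_{\sigma,1}, \mathrm{push}_{\sigma,2}, \mathrm{clone}_2, \mathrm{pop}_1, \mathrm{pop}_2, \mathrm{collapse}\}$. The set of level 2 stacks, $\mathrm{Stck}(\Sigma)$, is the smallest set that contains $\bot_2$ and is closed under all operations from $\mathrm{OP}$. Note that collapse- and $\mathrm{pop}_k$-operations are only allowed if the resulting stack is in $(\Sigma^+)^+$. This avoids the special treatment of empty words or stacks. Furthermore, a collapse on level 2 summarises a non-empty sequence of $\mathrm{pop}_2$-operations. For example, starting from $\bot_2$, we can apply a $\mathrm{clone}_2$, a $\mathrm{push}_{\sigma,2}$, a $\mathrm{clone}_2$, and finally a collapse. This sequence first creates a level 2 stack that contains 3 words, then performs the collapse, and ends in the initial stack again. This example shows that $\mathrm{clone}_2$-operations are responsible for the fact that collapse-operations on level 2 may remove more than one word from the stack.
For $s, s' \in \mathrm{Stck}(\Sigma)$, we call $s'$ a substack of $s$ if there are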

Collapsible Pushdown Systems and Collapsible Pushdown Graphs
Now we introduce collapsible pushdown systems and graphs (of level 2) which are analogues of pushdown systems and pushdown graphs using collapsible pushdown stacks instead of ordinary stacks.
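The level 2 stack operations on which such systems operate can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formal definition: a stack is a tuple of words, a word is a tuple of triples (symbol, collapse level, link), a level 2 link stores the number of words below the current one, and a level 1 link stores the length of the word below the pushed symbol (an assumption chosen to agree with the initial symbol (⊥, 1, 0)).

```python
BOTTOM = ("⊥", 1, 0)
INITIAL = ((BOTTOM,),)  # the initial level 2 stack: one word holding ⊥


def push(s, sigma, k):
    """Push sigma with collapse level k onto the topmost word."""
    *rest, top = s
    # Assumed link convention: words below (level 2) or letters below (level 1).
    link = len(s) - 1 if k == 2 else len(top)
    return (*rest, top + ((sigma, k, link),))


def clone2(s):
    """Duplicate the topmost word."""
    return s + (s[-1],)


def pop1(s):
    """Remove the topmost symbol; the topmost word must stay non-empty."""
    *rest, top = s
    if len(top) < 2:
        raise ValueError("pop_1 would create an empty word")
    return (*rest, top[:-1])


def pop2(s):
    """Remove the topmost word; the stack must stay non-empty."""
    if len(s) < 2:
        raise ValueError("pop_2 would create an empty stack")
    return s[:-1]


def collapse(s):
    """Follow the collapse link of the topmost symbol."""
    _, level, link = s[-1][-1]
    if level == 1:
        return pop1(s)
    if link < 1:
        raise ValueError("collapse would create an empty stack")
    return s[:link]


# clone_2, push_{a,2}, clone_2 builds a stack of three words.
three_words = clone2(push(clone2(INITIAL), "a", 2))
back_again = collapse(three_words)
```

In this sketch `back_again` equals `INITIAL`, so the single collapse removes two words at once, matching the behaviour described for collapse on level 2.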
The next definition introduces runs of collapsible pushdown systems.
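Definition 3.4. Let $S$ be a CPS. A run $r$ of $S$ of length $n$ is a function $r : \{0, 1, \dots, n\} \to Q \times \mathrm{Stck}(\Sigma)$ such that each configuration $r(i+1)$ is obtained from $r(i)$ by a transition of $S$. We write $\ln(r) := n$ and call $r$ a run from $r(0)$ to $r(n)$. We say $r$ visits a stack $s$ at $i$ if $r(i) = (q, s)$.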
For runs $r, r'$ of length $n$ and $m$, respectively, with $r(n) = r'(0)$, we define the composition $r \circ r'$ of $r$ and $r'$ in the obvious manner.
Remark 3.5. Note that we do not require runs to start in the initial configuration.
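One way to read the composition of runs is sketched below. The representation is an assumption for illustration: a run of length n is a list of n + 1 configurations, and the check that consecutive configurations are connected by transitions is left out.

```python
def length(run):
    """Length of a run given as a list of configurations."""
    return len(run) - 1


def compose(r1, r2):
    """Glue two runs where the first ends in the configuration the second starts in."""
    if r1[-1] != r2[0]:
        raise ValueError("the first run must end where the second one starts")
    # The shared configuration is kept only once.
    return r1 + r2[1:]


first = [("q0", "s0"), ("q1", "s1")]
second = [("q1", "s1"), ("q2", "s2"), ("q0", "s0")]
combined = compose(first, second)
```

The composed run has length `length(first) + length(second)`.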

Encoding of Collapsible Pushdown Graphs in Trees
In this section we prove that CPG are tree-automatic. For this purpose we have to encode stacks in trees. The idea is to divide a stack into blocks and to encode different blocks in different subtrees. The crucial observation is that every stack is a list of words that share the same first letter. A block is a maximal list of words in the stack that share the same two first letters. If we remove the first letter of every word of such a block, the resulting 2-word decomposes again as a list of blocks. Thus, we can inductively carry on to decompose parts of a stack into blocks and code every block in a different subtree. The roots of these subtrees are labelled with the first letter of the corresponding block. This results in a tree in which every initial left-closed path represents one word of the stack. By left-closed, we mean that the last element of the path has no left successor.
It turns out that, via this encoding, each stack operation corresponds to a simple MSO-definable tree-operation. The main difficulty is to provide a tree-automaton that checks whether there is a run to the configuration represented by some tree. This problem is addressed in Section 5.
Now we encode a $(\sigma, n, m)$-blockline $l$ in a tree by labelling the root with $(\sigma, n)$, by encoding the blockline induced by the first block of $l$ in the left subtree, and by encoding the rest of the blockline in the right subtree. In order to avoid repetitions, we do not repeat the symbol $(\sigma, n)$ in the right subtree, but replace it by the default letter $\varepsilon$.
Definition 4.2. Let $s = w_1 : w_2 : \dots : w_n \in (\Sigma \times \{1, 2\} \times \mathbb{N})^{+2}$ be a $(\sigma, l, k)$-blockline. Let $w_i'$ be words such that $s = (\sigma, l, k) \backslash [w_1' : w_2' : \dots : w_n']$ and set $s' := w_1' : w_2' : \dots : w_n'$. As an abbreviation we write ${}_h s_i := w_h : w_{h+1} : \dots : w_i$. Furthermore, let $w_1 : w_2 : \dots : w_j$ be a maximal block of $s$. Note that $j > 1$ implies , $\mathrm{Enc}({}_{j+1} s_n, \varepsilon))$ otherwise. $\mathrm{Enc}(s) := \mathrm{Enc}(s, (\bot, 1))$ is called the (tree-)encoding of the stack $s \in \mathrm{Stck}(\Sigma)$.
Figure 3 shows a configuration and its encoding.
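The block-wise encoding can be sketched in a few lines. This is a hedged illustration based on the informal description (first block in the left subtree, remaining blocks in the right subtree under a root labelled ε), not a transcription of Definition 4.2. The representation is assumed: a stack is a tuple of words, a word is a tuple of triples (symbol, level, link), and a tree is a dictionary from positions in {0,1}* to labels. The names `encode_blockline` and `encode_stack` are hypothetical.

```python
EPS = "ε"  # default letter used at the roots of right subtrees


def encode_blockline(words, root_label, pos="", tree=None):
    """Encode a non-empty list of non-empty words that share their first letter."""
    if tree is None:
        tree = {}
    tree[pos] = root_label
    first = words[0]
    j = 1  # number of words in the first block
    if len(first) > 1:
        # The first block is the maximal prefix sharing the first two letters.
        while j < len(words) and words[j][:2] == first[:2]:
            j += 1
        # Drop the common first letter; label the left child by the second one
        # projected to (symbol, level).
        stripped = [w[1:] for w in words[:j]]
        encode_blockline(stripped, first[1][:2], pos + "0", tree)
    if j < len(words):
        # The remaining blocks go to the right; the first letter becomes ε.
        encode_blockline(words[j:], EPS, pos + "1", tree)
    return tree


def encode_stack(stack):
    """Encode a stack whose words all start with the bottom symbol."""
    return encode_blockline(list(stack), stack[0][0][:2])


B = ("⊥", 1, 0)
A = ("a", 2, 1)
tree = encode_stack(((B,), (B, A), (B, A)))
```

For this three-word stack the sketch yields a tree with exactly three positions that are empty or end in 1, one per word of the stack.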
Remark 4.3. In this encoding, the first block of a $(\sigma, l, k)$-blockline is encoded in a subtree whose root $d$ is labelled $(\sigma, l)$. We can restore $k$ from the position of $d$ in the tree $\mathrm{Enc}(s)$ as follows. If $l = 1$ then $k = |d|_0$, i.e., the number of occurrences of 0 in $d$. This is due to the fact that level 1 links always point to the preceding letter and that we always introduce a left-successor tree in order to encode letters that are higher in the stack.
The case l = 2 needs some closer inspection.Assume that some d ∈ T := Enc(s) is labelled (σ, 2).Then it encodes a letter (σ, 2, k) and this is not a cloned element.
By induction, one easily sees that for each such pair $e, e1 \in T$ all the letters that are in words left of the letter encoded by $e1$ are encoded in lexicographically smaller elements. Furthermore, the size of $((0^*)1)^* \cap T$ corresponds to the number of words in $s$, since the introduction of a 1-successor corresponds to the separation of the first block of some blockline from the other blocks. Each of these separations can also be seen as the separation of the last word of the first block from the first word of the second block of this blockline. Note that we separate two words that are next to each other in exactly one blockline. Putting these facts together, our claim is proved.
Another view on this correspondence is the bijection $f : \{1, 2, \dots, |s|\} \to R$, where $R := ((0^*)1)^* \cap \mathrm{dom}(T)$ and $i$ is mapped to the $i$-th element of $R$ in lexicographic order. $f(i)$ is exactly the position where the $(i-1)$-st word is separated from the $i$-th one for all $i \ge 2$. In order to state the properties of $f$, we need some more notation. We write $\pi$ for the canonical projection $\pi : (\Sigma \times \{1, 2\} \times \mathbb{N})^* \to (\Sigma \times \{1, 2\})^*$ and $w_i$ for the $i$-th word of $s$. Furthermore, let $w_i'$ be a word such that $w_i = (w_i \sqcap w_{i-1}) \circ w_i'$ (here we set $w_0 := \varepsilon$). Then the word along the path from the root to $f(i)$ is exactly $\pi(w_i \sqcap w_{i-1})$ for all $2 \le i \le |s|$, and the path from $f(j)$ to $f(j) \cdot 0^m$ for maximal $m \in \mathbb{N}$ is $\pi(w_j')$ for all $1 \le j \le |s|$.
In order to encode a configuration $c := (q, s)$, we add $q$ as a new root of the tree and attach the encoding of $s$ as the left subtree, i.e., $\mathrm{Enc}(c) := q(\mathrm{Enc}(s), \emptyset)$.
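The set R and the bijection f described above are easy to compute from the domain of an encoding. The sketch below assumes that the domain is given as a collection of strings over 0 and 1; it uses the fact that a word lies in ((0*)1)* exactly if it is empty or ends in 1. The function names are hypothetical.

```python
def separation_positions(dom):
    """Elements of R = ((0*)1)* ∩ dom in lexicographic order."""
    # Python's string order is lexicographic with prefixes first and "0" < "1".
    return sorted(d for d in dom if d == "" or d.endswith("1"))


def f(dom, i):
    """Position assigned to the i-th word (1-based) of the encoded stack."""
    return separation_positions(dom)[i - 1]


def end_of_word_path(dom, d):
    """Follow 0-successors from d as long as possible."""
    while d + "0" in dom:
        d += "0"
    return d


# Domain of the encoding of a stack with three words.
dom = {"", "1", "10", "101"}
positions = separation_positions(dom)
```

Here `len(positions)` is the number of words of the encoded stack, and `end_of_word_path(dom, f(dom, 2))` is the node that encodes the last letter of the second word.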
The image of this encoding function contains only trees of a very specific type. We call this class $T_{\mathrm{Enc}}$. In the next definition we state the characterising properties of $T_{\mathrm{Enc}}$. This class is MSO definable, whence automata-recognisable.
Definition 4.4. Let $T_{\mathrm{Enc}}$ be the class of all trees $T$ that satisfy the following conditions.
(1) The root of T is labelled by some element of Q (T (ε) ∈ Q).
Our encoding turns the transitions of a CPG into regular tree-operations. The tree-operations corresponding to $\mathrm{pop}_2$ and collapse can be seen in Figures 4 and 5. For the $\mathrm{pop}_2$, note that if $v_1$ is the 0-successor of $v_0$ then $v_0$ and $v_1$ encode symbols in the same word of the encoded stack. As a $\mathrm{pop}_2$ removes the rightmost word, we have to remove all the nodes encoding information about this word. As the rightmost leaf corresponds to the topmost symbol of the stack, we have to remove this leaf and all its 0-ancestors.
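The tree-operation for pop_2 can be sketched directly from this description. The sketch assumes that the encoding of a stack is given as a dictionary from positions in {0,1}* to labels, and it reads "0-ancestors" as the nodes reached from the rightmost leaf by going up along 0-edges only; both are assumptions of this illustration.

```python
def pop2_on_tree(tree):
    """Remove the rightmost leaf and all nodes above it reached via 0-edges."""
    leaf = max(tree)          # lexicographically largest position
    start = leaf.rstrip("0")  # topmost node on the 0-path ending in the leaf
    if start == "":
        raise ValueError("pop_2 is undefined for a stack with a single word")
    removed = {start + "0" * i for i in range(len(leaf) - len(start) + 1)}
    return {d: lab for d, lab in tree.items() if d not in removed}


# Encoding of a stack with three words (labels as in the text, ε for clones).
three = {"": ("⊥", 1), "1": "ε", "10": ("a", 2), "101": "ε"}
two = pop2_on_tree(three)
one = pop2_on_tree(two)
```

Applying the operation twice to `three` leaves only the root, which encodes the stack consisting of the single word ⊥.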
For the collapse (on level 2), we note that each $\varepsilon$ represents a cloned element. The collapse induced by such an element produces the same stack as a $\mathrm{pop}_2$ of its original version. The original symbol of the rightmost leaf is its first ancestor not labelled by $\varepsilon$.
Note that the operations corresponding to $\mathrm{pop}_2$ and collapse are clearly MSO definable. All other transitions in CPG correspond to MSO definable tree-operations, too. Due to space restrictions we skip the details.
Lemma 4.7. Let $C$ be the set of encodings of configurations of a CPS $S$. Then there are automata $A_{(q,op)}$ for all $q \in Q$ and all $op \in \mathrm{OP}$ such that for all $c_1, c_2 \in C$, $A_{(q,op)}$ accepts $\mathrm{Enc}(c_1) \otimes \mathrm{Enc}(c_2)$ iff $c_1 \vdash_{(q,op)} c_2$.

Recognising Reachable Configurations
We show that Enc maps the reachable configurations of a given CPS to a regular set. For this purpose we introduce milestones of a stack $s$. It turns out that these are exactly those substacks of $s$ that every run to $s$ has to visit. Furthermore, the milestones of $s$ are represented by the nodes of $\mathrm{Enc}(s)$: with every $d \in \mathrm{Enc}(s)$, we can associate a subtree of $\mathrm{Enc}(s)$ which encodes a milestone. Moreover, the substack relation on the milestones corresponds exactly to the lexicographical order $\le_{lex}$ of the elements of $\mathrm{Enc}(s)$. For every $d \in \mathrm{Enc}(s)$ we can guess the state in which the corresponding milestone is visited for the last time by some run to $s$, and we can check the correctness of this guess using MSO or, equivalently, tree-automata.
We prove that we can check the correctness of such a guess by introducing a special type of run, called loop, which is basically a run that starts and ends with the same stack.A run from one milestone to the next will mainly consist of loops combined with a finite number of stack operations.

Note that the substack relation ≤ linearly orders MS(s).
Lemma 5.2. If $s, t, m$ are stacks with $m \in \mathrm{MS}(t)$ but $m \not\le s$, then every run from $s$ to $t$ visits $m$. Thus, for every run $r$ from the initial configuration to $s$, the function $s' \mapsto \max\{i \in \mathrm{dom}(r) : r(i) = (q, s') \text{ for some } q \in Q\}$ is an order embedding with respect to the substack relation on the milestones and the natural order of $\mathrm{dom}(r)$.
In order to state the close correspondence between milestones of a stack $s$ and the elements of $\mathrm{Enc}(s)$, we need the following definition.
$\mathrm{LStck}(d, s)$ is a substack of $s$ for all $d \in \mathrm{dom}(\mathrm{Enc}(s))$. This observation follows from Remark 4.3 combined with the fact that the left stack is induced by a lexicographically downward closed subset. In fact, $\mathrm{LStck}(d, s)$ is a milestone of $s$.
Lemmas 5.5 and 5.2 imply that every run $r$ decomposes as $r = r_1 \circ r_2 \circ \dots \circ r_n$ where $r_i$ is a run from the $i$-th milestone of $r(\ln(r))$ to the $(i+1)$-st milestone.
In order to describe the structure of the $r_i$, we have to introduce the notion of a loop. Informally speaking, a loop is a run $r$ that starts and ends with the same stack $s$ and which does not look too much into $s$.
Definition 5.6. Let $r$ be a run of length $n$ with $r(i) = (q_i, s_i)$ for all $0 \le i \le n$.
• $r$ is called a simple high loop if $s_0 = s_n$ and if $s_0 < s_i$ for all $0 < i < n$.
• $r$ is called a simple low loop of $s$ if $s_0 = s_n = s$, between 0 and $n$ the stack $s$ is never visited, $s_1 = \mathrm{pop}_1(s)$, $\mathrm{CLvl}(s) = 1$, $|s_i| \ge |s|$ for all $0 \le i \le n$, and $r{\restriction}_{[2,n-1]}$ is the composition of simple low loops and simple high loops of $\mathrm{pop}_1(s)$.
• $r$ is called a loop if it is a finite composition of low loops and high loops.
Lemma 5.7. Let $s$ be some stack, $m_1, m_2$ milestones of $s$, and $r$ a run from $m_1$ to $m_2$ that never visits any other milestone of $s$.
where each $l_i$ is a loop, and all $p_i$, $p$, and $c$ are runs of length 1, $p$ performs one $\mathrm{push}_{\sigma,k}$, $c$ performs one $\mathrm{clone}_2$, and the $p_i$ perform one $\mathrm{pop}_1$ each.
This lemma motivates why we only define low loops for stacks $s$ with $\mathrm{CLvl}(s) = 1$. Whenever the topmost symbol of a milestone $m$ is not a cloned element, then $\mathrm{pop}_1(m)$ is another milestone. Hence, the $l_i$ can only contain low loops if they start at a stack with cloned topmost symbol. But any stack $s$ with cloned topmost symbol and $\mathrm{CLvl}(s) = 2$ cannot be restored from $\mathrm{pop}_1(s)$ without passing $\mathrm{pop}_2(s)$, since a $\mathrm{push}_{\sigma,2}$-operation would create the wrong link-level.
From Lemma 5.7 we can derive that deciding whether there is a run from one milestone to the next is possible if we know the pairs of initial and final states of loops of certain stacks $s$. Hence we are interested in the sets $\mathrm{Loops}(s) \subseteq Q \times Q$ with $(q_1, q_2) \in \mathrm{Loops}(s)$ if and only if there is a loop from $(q_1, s)$ to $(q_2, s)$. The crucial observation is that $\mathrm{Loops}(s)$ may be calculated by a finite automaton reading $\mathrm{top}_2(s)$.
Lemma 5.8. For every CPS there exists a finite automaton $A$ that calculates on input $w \in (\Sigma \times \{1, 2\})^*$ the set $\mathrm{Loops}(s)$ for all stacks $s$ such that $w = \pi(\mathrm{top}_2(s))$. Here, $\pi : (\Sigma \times \{1, 2\} \times \mathbb{N})^* \to (\Sigma \times \{1, 2\})^*$ is the projection onto the symbols and collapse-levels.

Detection of Reachable Configurations
We have already seen that every run to a valid configuration $(q, s)$ passes all the milestones of $s$. Now, we use the last state in which a run $r$ to $(q, s)$ visits each milestone as a certificate for the reachability of $(q, s)$. To be precise, a certificate for the reachability of $(q, s)$ is a map $f : \mathrm{dom}(\mathrm{Enc}(q, s)) \setminus \{\varepsilon\} \to Q$ such that there is some run $r$ from $\bot_2$ to $(q, s)$ and $f(d) = q'$ if and only if $r(i) = (q', \mathrm{LStck}(d))$ for $i$ the maximal position in $r$ where $\mathrm{LStck}(d)$ is visited.
Lemma 5.9. For every CPG $G$, there is a tree-automaton that checks for each map $f : \mathrm{dom}(\mathrm{Enc}(q, s)) \setminus \{\varepsilon\} \to Q$ whether $f$ is a certificate of the reachability of $(q, s)$, i.e., whether $f$ is induced by some run $r$ from the initial configuration to $(q, s)$.
The proof of the lemma uses Lemma 5.8 and the fact that the path from the root to some d ∈ Enc(s) encodes the topmost word of LStck(d, Enc(s)).Hence, a tree automaton reading Enc(s) is able to calculate for each position d ∈ Enc(s) the pairs of initial and final states of loops of LStck(d).As every run decomposes as a sequence of loops separated by a single operation, knowing Loops(s ′ ) for each s ′ ≤ s enables the automaton to check the correctness of a candidate for a certificate of reachability.
As a tree-automaton may non-deterministically guess a certificate of the reachability of a configuration, the encodings of reachable configurations form a regular set.

Extension to Regular Reachability
By now, we have already established the tree-automaticity of each CPG $G$, since we have seen that our encoding yields a regular image of the vertices of $G$ and the transition relations are turned into regular relations of the tree encoding. Using similar techniques, we can improve this result:
Theorem 5.10. If $G$ is the $\varepsilon$-closure of some CPG $G'$, then $(G, \mathrm{Reach})$ is tree-automatic, where $\mathrm{Reach}$ is the binary predicate that is true on a pair $(c_1, c_2)$ of configurations if there is a path from $c_1$ to $c_2$ in $G$.
Remark 5.11.Each graph in the second level of the Caucal-hierarchy can be obtained as the ε-contraction of some level 2 CPG (see [3]) whence all these graphs are tree-automatic.
For a CPS $S$ let $R \subseteq \Delta^*$ be a regular language over the transitions of $S$. As collapsible pushdown graphs are closed under products with finite automata, even the reachability predicate $\mathrm{Reach}_R$ with restriction to $R$ is tree-automatic. Here, $\mathrm{Reach}_R\, xy$ holds if there is a path from $x$ to $y$ in $\mathrm{CPG}(S)$ that uses a sequence of transitions in $R$. If $A$ is the automaton recognising $R$, we obtain that $\mathrm{Reach}_R\, (q, s)(q', s')$ holds in $\mathrm{CPG}(S)$ iff $\mathrm{Reach}\, ((q, q_i), s)\, ((q', q_f), s')$ holds in $\mathrm{CPG}(S \times A)$, where $q_i$ is the initial and $q_f$ the unique final state of $A$. Using this idea one can define a CPG $G'$ which is basically $\mathrm{CPG}(S \cup (S \times A))$ extended by transitions from $(q, s)$ to $((q, q_i), s)$ and to $((q, q_f), s)$. $\mathrm{CPG}(S)$ as well as $\mathrm{Reach}_R$ w.r.t. $\mathrm{CPG}(S)$ are FO[Reach]-interpretable in $G'$. Hence we obtain:
Theorem 5.12. Given a collapsible pushdown graph of level 2, its $\mathrm{FO}[\mathrm{Reach}_R]$ theory is decidable for each regular $R \subseteq \Delta^*$.
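The reduction of regular reachability to plain reachability in a product can be illustrated on a finite edge-labelled graph. This is only a sketch of the product idea: collapsible pushdown graphs are infinite in general, so the breadth-first search below is not the decision procedure of the paper. The graph, the deterministic automaton given as a transition dictionary, and all names are assumptions of this example.

```python
from collections import deque


def reach_r(edges, delta, q_init, q_final, x, y):
    """Is there a path from x to y whose label sequence the automaton accepts?

    edges: set of triples (source, label, target)
    delta: dict mapping (state, label) to the next automaton state
    """
    start, goal = (x, q_init), (y, q_final)
    seen, queue = {start}, deque([start])
    while queue:
        v, q = queue.popleft()
        if (v, q) == goal:
            return True
        for source, label, target in edges:
            if source == v and (q, label) in delta:
                nxt = (target, delta[(q, label)])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


# A three-vertex cycle and an automaton for the single word "ab".
edges = {(0, "a", 1), (1, "b", 2), (2, "a", 0)}
delta = {("p0", "a"): "p1", ("p1", "b"): "p2"}
ok = reach_r(edges, delta, "p0", "p2", 0, 2)
```

Plain reachability in the product of graph and automaton, from (x, q_init) to (y, q_final), plays the role of the predicate Reach in the construction above.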

Computation of concrete tree-automatic representations of CPG
Up to now, we have only seen that there is a tree-automatic representation for each CPG. For computing a concrete representation, we rely on the following lemma.
Lemma 5.13. Given some CPS $S = (\Gamma, Q, \Delta, q_0)$, some $q \in Q$, and some stack $s$, it is decidable whether $(q, s)$ is a vertex of $\mathrm{CPG}(S)$.
The proof is based on the idea that a stack is uniquely determined by its top element and the information which substacks can be reached via collapse- and $\mathrm{pop}_i$-operations. Hence we can construct an extension $S'$ of $S$ and a modal formula $\varphi_{q,s}$ such that there is some element $v \in \mathrm{CPG}(S')$ satisfying $\mathrm{CPG}(S'), v \models \varphi_{q,s}$ iff $(q, s) \in \mathrm{CPG}(S)$. $S'$ basically contains new states for every substack of $s$ and connects the different states via the appropriate $\mathrm{pop}_i$-operations, which are only applied if the topmost symbol of the stack agrees with the symbol we would expect when starting the $\mathrm{pop}_i$-sequence in configuration $(q, s)$.
From this lemma we can derive the computability of the automata in Lemma 5.8.Having obtained these automata, the construction of a tree-automatic representation of some CPG is directly derived from the proofs yielding the following theorem.
For a stack $s$ with $\mathrm{top}_1(s) = (\sigma, i, j)$, we define the topmost symbol $\mathrm{Sym}(s) := \sigma$, the collapse-level of the topmost element $\mathrm{CLvl}(s) := i$, and the collapse-link of the topmost element $\mathrm{CLnk}(s) := j$.

Figure 1: Example of a collapsible pushdown graph

Figure 2: Example of blocks in a stack. These form a c-blockline.