Decidable Classes of Tree Automata Mixing Local and Global Constraints Modulo Flat Theories

We define a class of ranked tree automata TABG generalizing both the tree automata with local tests between brothers of Bogaert and Tison (1992) and with global equality and disequality constraints (TAGED) of Filiot et al. (2007). TABG can test for equality and disequality modulo a given flat equational theory between brother subterms and between subterms whose positions are defined by the states reached during a computation. In particular, TABG can check that all the subterms reaching a given state are distinct. This constraint is related to monadic key constraints for XML documents, meaning that every two distinct positions of a given type have different values. We prove decidability of the emptiness problem for TABG. This solves, in particular, the open question of the decidability of emptiness for TAGED. We further extend our result by allowing global arithmetic constraints for counting the number of occurrences of some state or the number of different equivalence classes of subterms (modulo a given flat equational theory) reaching some state during a computation. We also adapt the model to unranked ordered terms. As a consequence of our results for TABG, we prove the decidability of a fragment of the monadic second order logic on trees extended with predicates for equality and disequality between subtrees, and cardinality.


Introduction
Tree automata techniques are widely used in several domains such as automated deduction (see e.g. [CDG+07]), static analysis of programs [BT05] or protocols [VGL07, FGVTT04], and XML processing [Sch07]. However, a severe limitation of standard tree automata (TA) is that they cannot test for equality (isomorphism) or disequality between subterms of an input term. For instance, the language of terms matching a non-linear pattern such as $f(x, x)$ is not regular (i.e. there exists no TA recognizing this language). Let us illustrate how this limitation can be problematic in the context of XML document processing. XML documents are commonly represented as labeled trees, and they can be constrained by XML schemas, which define both typing restrictions and integrity constraints. All the typing formalisms currently used for XML are based on finite tree automata. Key constraints for databases are common integrity constraints expressing that every two distinct positions of a given type have different values. This is typically the kind of constraint that cannot be characterized by TA.
A first approach to overcome this limitation of TA consists in adding the possibility of making equality or disequality tests at each step of the computation of the automaton. The tests are performed locally, between subterms at a bounded distance from the current computation position in the input term. The emptiness problem, i.e. whether the language recognized by a given automaton is empty, is undecidable with such tests [Mon81]. A decidable subclass is obtained by restricting the tests to sibling subterms [BT92] (see [CDG+07] for a survey).
Another approach was proposed more recently in [FTT07, FTT08] with the definition of tree automata with global equality and disequality tests (TAGED). The TAGED do not perform the tests during the computation steps but globally on the term, at the end of the computation, at positions which are defined by the states reached during the computation. For instance, they can express that all the subterms that reached a given state $q$ are equal, or that every two subterms that reached respectively the states $q$ and $q'$ are different. Nevertheless, arbitrary disequalities are not allowed in TAGED, since such $q$ and $q'$ must be different. Emptiness has been shown decidable for several subclasses of TAGED [FTT07, FTT08], but the decidability of emptiness for the whole class remained a challenging open question.
In this paper, we answer this question positively, for a class of tree recognizers more general than TAGED. We propose (in Section 3) a class of tree automata with local constraints between siblings and global constraints (TABG) which significantly extends TAGED in several directions: (i) TABG combine global constraints à la TAGED with local equality and disequality constraints between brother subterms à la [BT92], (ii) the equality and disequality constraints are treated modulo a given flat equational theory (here flat means that both sides of the equation have the same variables and height, and that this height is bounded by 1), which allows relations more general than syntactic equalities and disequalities, e.g. structural equalities and disequalities, (iii) testing global disequality constraints between subterms that reached the same state is allowed (such tests specify key constraints, which are not expressible with TAGED), (iv) the global constraints are arbitrary Boolean combinations (including negation) of atomic equalities and disequalities (in TAGED, only conjunctions of atoms are allowed, without negation).
In Section 4, we consider the addition to TABG of global counting constraints on the number $|q|$ of occurrences of a given state $q$ in a computation, or the number $\|q\|$ of distinct equivalence classes (modulo the flat theory) of subterms reaching a given state $q$ in a computation. These counting constraints are only allowed to compare states to constants, as in $|q| \leq 5$ or $\|q\| + 2\|q'\| \geq 9$ (if counting constraints could compare state cardinalities, as in $|q| = |q'|$, the emptiness problem would become undecidable). Using this formalism as an intermediate step, we show that negative literals and disjunctions can be eliminated without loss of generality in the global constraints of TABG, i.e. that TABG whose global constraints are restricted to be conjunctions of positive literals (namely positive conjunctive TABG) already have the same expressiveness as the full TABG class. In particular, the counting constraints do not improve the expressiveness of TABG.
Our main result, presented in Section 5, is that emptiness is decidable for positive conjunctive TABG (and hence for TABG). The decision algorithm uses an involved pumping argument: every sufficiently large term recognized by the given TABG can be reduced by an operation of parallel pumping into a smaller term which is still recognized. The existence of the bound for the minimum accepted term is based on a particular well quasi-ordering.
We show that the emptiness decision algorithm of Section 5 can also be applied to a generalization, computing on unranked ordered labeled trees, of the subclass TAG of TABG without local constraints (Section 6). This demonstrates the robustness of the method.
As an application of our results, in Section 7 we present a (strict) extension of the monadic second order logic on trees whose existential fragment corresponds exactly to TAG. In particular, we conclude its decidability.

Related Work. TABG is a strict (decidable) extension of TAG and TA with local equality and disequality constraints, since the expressiveness of both subclasses is incomparable (see e.g. [JKV09]).
The tree automata model of [BT92] has been generalized from ranked trees to unranked ordered trees into a decidable class called UTASC [WL07, LW09]. In unranked trees, the number of brothers (under a position) is unbounded, and UTASC transitions use MSO formulae (on words) with 2 free variables in order to select the sibling positions to be tested for equality and disequality. The decidable generalization of TAG to unranked ordered trees proposed in Section 6 and the automata of [WL07, LW09] are incomparable. The combination of both formalisms could be the object of a further study.
Another way to handle subterm equalities is to use automata computing on DAG representations of terms [Cha99, ANR05]. This model is incomparable to TAG whose constraints are conjunctions of equalities [JKV09]. The decidable extension of TA with one tree shaped memory [CC05] can simulate TAG with equality constraints only, provided that at most one state per run can be used to test equalities [FTT07].
We show in Section 3 that the TABG strictly generalize the TAGED of [FTT07, FTT08]. The latter have been introduced as a tool to decide a fragment of the spatial logic TQL [FTT07]. Subclasses of TAGED were also shown decidable in correspondence with fragments of monadic second order logic on trees extended with predicates for subtree (dis)equality tests. In Section 7, we generalize this correspondence to TAG and a more natural extension of MSO.
There have been several approaches to extend TA with arithmetic constraints on the cardinalities $|q|$ described above: the constraints can be added to transitions in order to count between siblings [SSM03, DL06] (in this case we could call them local by analogy with equality tests) or they can be global [KR02]. We compare in Section 4 the latter approach (closer to our setting) with our extension of TABG, with respect to emptiness decision. To our knowledge, this is the first time that arithmetic constraints on cardinalities of the form $\|q\|$ are studied.

Preliminaries
2.1. Terms, Positions, Replacements. We use the standard notations for terms and positions, see [BN98]. A signature $\Sigma$ is a finite set of function symbols with arity. We sometimes denote $\Sigma$ explicitly as $\{f_1 : a_1, \ldots, f_n : a_n\}$ where $f_1, \ldots, f_n$ are the function symbols, and $a_1, \ldots, a_n$ are the corresponding arities, or as $\{f_1, \ldots, f_n\}$ when the arities are omitted. We denote the subset of function symbols of $\Sigma$ of arity $m$ as $\Sigma_m$. The set of (ranked) terms over the signature $\Sigma$ is defined recursively as $T(\Sigma) := \{f(t_1, \ldots, t_m) \mid f : m \in \Sigma,\ t_1, \ldots, t_m \in T(\Sigma)\}$. Note that the base case of this definition is $\{f \mid f : 0 \in \Sigma\}$, which coincides with $\Sigma_0$ by omitting the arity. Elements of this subset are called constants.
Positions in terms are denoted by sequences of natural numbers. With $\lambda$ we denote the empty sequence (root position), and $p.p'$ denotes the concatenation of positions $p$ and $p'$. The set of positions of a term is defined recursively as $\mathit{Pos}(f(t_1, \ldots, t_m)) = \{\lambda\} \cup \{i.p \mid i \in \{1, \ldots, m\} \wedge p \in \mathit{Pos}(t_i)\}$. A term $t \in T(\Sigma)$ can be seen as a function from its set of positions $\mathit{Pos}(t)$ into $\Sigma$. For this reason, the symbol labeling the position $p$ in $t$ shall be denoted by $t(p)$. By $p < p'$ and $p \leq p'$ we denote that $p$ is a proper prefix of $p'$, and that $p$ is a prefix of $p'$, respectively. In these cases, $p'$ is necessarily of the form $p.p''$, and we define $p' - p$ as $p''$. Two positions $p_1, p_2$ incomparable with respect to the prefix ordering are called parallel, which is denoted by $p_1 \parallel p_2$. The subterm of $t$ at position $p$, denoted $t|_p$, is defined recursively as $t|_\lambda = t$ and $f(t_1, \ldots, t_m)|_{i.p} = t_i|_p$. The replacement in $t$ of the subterm at position $p$ by $s$, denoted $t[s]_p$, is defined recursively as $t[s]_\lambda = s$ and $f(t_1, \ldots, t_{i-1}, t_i, t_{i+1}, \ldots, t_m)[s]_{i.p} = f(t_1, \ldots, t_{i-1}, t_i[s]_p, t_{i+1}, \ldots, t_m)$. The height of a term $t$, denoted $h(t)$, is the maximal length of a position of $\mathit{Pos}(t)$. In particular, the length of $\lambda$ is 0.

2.2. Tree automata. A tree automaton (TA) is a tuple $A = \langle Q, \Sigma, F, \Delta\rangle$ where $Q$ is a finite set of states, $\Sigma$ is a signature, $F \subseteq Q$ is the subset of final states and $\Delta$ is a set of transition rules of the form $f(q_1, \ldots, q_m) \to q$ where $f : m \in \Sigma$ and $q_1, \ldots, q_m, q \in Q$. Sometimes, we shall refer to $A$ as a subscript of its components, as in $Q_A$ to indicate that this is the set of states of $A$.
A run of $A$ is a pair $r = \langle t, M\rangle$ where $t$ is a term in $T(\Sigma)$ and $M : \mathit{Pos}(t) \to \Delta_A$ is a mapping satisfying the following statement for each $p \in \mathit{Pos}(t)$: if $t|_p$ is written in the form $f(t_1, \ldots, t_m)$, and $M(p.1), \ldots, M(p.m)$ are rules with right-hand side states $q_1, \ldots, q_m \in Q_A$, respectively, then $M(p)$ is a rule of the form $f(q_1, \ldots, q_m) \to q$ for some $q \in Q_A$. We write $r(p)$ for the right-hand side state of $M(p)$, and say that $r$ is a run of $A$ on $t$. Moreover, by $\mathit{term}(r)$ we refer to $t$, and by $\mathit{symbol}(r)$ we refer to $t(\lambda)$. The run $r$ is called successful (or accepting) if $r(\lambda)$ is in $F_A$. The language $L(A)$ of $A$ is the set of terms $t$ for which there exists a successful run of $A$. A language $L$ is called regular if there exists a TA $A$ satisfying $L = L(A)$. For ease of explanation, we shall use term-like notations for runs, defined as follows in the natural way. For a run $r = \langle t, M\rangle$, by $\mathit{Pos}(r)$ we denote $\mathit{Pos}(t)$, and by $h(r)$ we denote $h(t)$. Similarly, by $r|_p$ we denote the run $\langle t|_p, M|_p\rangle$, where $M|_p$ is defined as $M|_p(p') = M(p.p')$ for each $p'$ in $\mathit{Pos}(t|_p)$, and say that $r|_p$ is a subrun of $r$. Moreover, for a run $r' = \langle t', M'\rangle$ such that the states $r'(\lambda)$ and $r(p)$ coincide, by $r[r']_p$ we denote the run $\langle t[t']_p, M[M']_p\rangle$, where $M[M']_p(p'')$ is defined as $M'(p'' - p)$ if $p \leq p''$, and as $M(p'')$ otherwise.
2.3. Tree automata with local constraints between brothers. A tree automaton with constraints between brothers (defined in [BT92] and called TACBB in [CDG+07]) is a tuple $A = \langle Q, \Sigma, F, \Delta\rangle$ where $Q$, $\Sigma$ and $F$ are defined as for TA, but with the difference that $\Delta$ is a set of constrained rules of the form $f(q_1, \ldots, q_m) \xrightarrow{C} q$, where $C$ is a set of equalities and disequalities of the form $i \approx j$ or $i \not\approx j$ for $i, j \in \{1, \ldots, m\}$. We call $C$ a local constraint between brothers. By $\mathit{ta}(A)$ we denote the TA obtained from $A$ by removing all constraints from $\Delta$.
A run of a TACBB $A$ is a pair $r = \langle t, M\rangle$ defined similarly to the case of TA; $t$ is a term in $T(\Sigma)$ and the mapping $M : \mathit{Pos}(t) \to \Delta_A$ satisfies the following statement for each $p \in \mathit{Pos}(t)$: if $t|_p$ is written in the form $f(t_1, \ldots, t_m)$, and $M(p.1), \ldots, M(p.m)$ are rules with right-hand side states $q_1, \ldots, q_m \in Q_A$, respectively, then $M(p)$ is a rule of the form $f(q_1, \ldots, q_m) \xrightarrow{C} q$ for some $q \in Q_A$ and constraint between brothers $C$. Moreover, for each equality $i \approx j$ in $C$, $t_i = t_j$ holds, and for each disequality $i \not\approx j$ in $C$, $t_i \neq t_j$ holds. The notions of successful run and recognized language are defined for TACBB analogously to the case of TA.
2.4. Term equations. Given a set of variables $X$, the set of (ranked) terms over $\Sigma$ and $X$ is defined as $T(\Sigma \cup X)$ by considering arity 0 for the elements of $X$. A substitution $\sigma$ is a mapping from variables to terms $\sigma : X \to T(\Sigma \cup X)$. It is also considered as a function from arbitrary terms to terms $\sigma : T(\Sigma \cup X) \to T(\Sigma \cup X)$ by the recursive definition $\sigma(f(t_1, \ldots, t_m)) = f(\sigma(t_1), \ldots, \sigma(t_m))$ for every function symbol $f$ and subterms $t_1, \ldots, t_m$.
An equation between terms is an unordered pair of terms denoted $l \approx r$. Given a set of equations $E$ and two terms $s, t$, we say that $s$ and $t$ are equivalent modulo $E$, denoted $s =_E t$, if there exist terms $s_1, s_2, \ldots, s_n$, $n \geq 1$ satisfying the following statement: $s = s_1$, $s_n = t$, and for each $i \in \{1, \ldots, n-1\}$, there exists an equation $l \approx r$ in $E$, a substitution $\sigma$, and a position $p$, such that $s_i|_p = \sigma(l)$ and $s_{i+1} = s_i[\sigma(r)]_p$. A flat equation is an equation $l \approx r$ where $l$ and $r$ are terms satisfying $h(l) = h(r) \leq 1$, and any variable $x$ occurs in $l$ if and only if $x$ occurs in $r$. A flat theory is a set of flat equations.
The following technical lemma shows that equivalence modulo a flat theory is preserved by certain replacements of subterms. It will be useful in Section 5.
be terms satisfying the following conditions:

Proof. We prove the left-to-right direction only. The other one is analogous by swapping the roles of $s$ and $t$ with the roles of $s'$ and $t'$, respectively. Since $s =_E t$ holds, there exist terms $u_1, u_2, \ldots, u_k$, $k \geq 1$ satisfying the following statement: $s = u_1$, $u_k = t$, and for each $i \in \{1, \ldots, k-1\}$, there exists an equation $l \approx r$ in $E$, a substitution $\sigma$, and a position $p$, such that $u_i|_p = \sigma(l)$ and $u_{i+1} = u_i[\sigma(r)]_p$. We prove the statement by induction on $k$. For $k = 1$, $s = t$ holds. Thus, $g$ is $f$, $m$ is $n$, and for each $i \in \{1, \ldots, n\}$, $s_i = t_i$ holds. In particular, each $s_i =_E t_i$ holds. Therefore, each $s'_i =_E t'_i$ also holds, and hence Let $l \approx r$, $p$ and $\sigma$ be the rule, position and substitution satisfying $u_1|_p = \sigma(l)$ and $u_2 = u_1[\sigma(r)]_p$. Recall that $u_1$ is $s$. First, suppose that $p$ is not $\lambda$. Then, $p$ is of the form $j.p'$ for some $j \in \{1, \ldots, n\}$ and position $p'$. Note that $u_2|_j =_E u_1|_j$ holds, and for each $i \in \{1, \ldots, n\} \setminus \{j\}$, $u_2|_i = u_1|_i$ holds. Thus, $u_2$ is of the form $f(v_1, \ldots, v_n)$ and for each $i \in \{1, \ldots, n\}$, $v_i =_E s_i$ holds. Moreover, since $E$ is a flat theory, the step at $p$ preserves the height, and hence, for each $i \in \{1, \ldots, n\}$, From the statement of the lemma, the following conditions follow: holds, and we are done. Now, consider the case where $p$ is $\lambda$. In this case $s = u_1 = \sigma(l)$, and $u_2 = \sigma(r)$. Since $E$ is a flat theory, $l$ and $r$ are of the form $f(\alpha_1, \ldots, \alpha_n)$ and $h(\beta_1, \ldots, \beta_\mu)$, where either $n, \mu > 0$ or $n = \mu = 0$, and $\alpha_1, \ldots, \alpha_n, \beta_1, \ldots, \beta_\mu$ are either constants or variables. Moreover, a variable occurs in $l$ if and only if it occurs in $r$. Note that $\sigma(\alpha_1) = s_1, \ldots, \sigma(\alpha_n) = s_n$ holds. We define $v_1 = \sigma(\beta_1), \ldots, v_\mu = \sigma(\beta_\mu)$. Note that $u_2 = h(v_1, \ldots, v_\mu)$. We define terms $v'_1, \ldots, v'_\mu$ as follows for each $i$ in $\{1, \ldots, \mu\}$. If $v_i$ is a constant, then we define $v'_i$ as $v_i$. Otherwise, if $v_i$ is not a constant, then $\beta_i$ is a variable $x$. Since $E$ is a flat theory, some $\alpha_j$ (we choose any) must be $x$. In this case we define $v'_i$ as $s'_j$. With these definitions, the following conditions follow: holds, and we are done.
2.5. Well quasi-orderings. A well quasi-ordering [Gal91] $\leq$ on a set $S$ is a reflexive and transitive relation such that any infinite sequence of elements $e_1, e_2, \ldots$ of $S$ contains an increasing pair $e_i \leq e_j$ with $i < j$.

Tree Automata with Global Constraints
In this section, we define a class of tree automata with global constraints strictly generalizing both the TACBB of [BT92] and the TAGED of [FTT08]. The generalization consists in considering more general global constraints, and interpreting all the constraints modulo a flat equational theory.
As an intermediate step, we define an extension of the TACBB of [BT92] where the local constraints between brothers are considered modulo a flat equational theory.

Definition 3.1. A tree automaton with constraints between brothers modulo a flat theory (TAB) is a tuple $A = \langle Q, \Sigma, F, \Delta, E\rangle$ where $\langle Q, \Sigma, F, \Delta\rangle$ is a TACBB and $E$ is a flat equational theory.
By $\mathit{ta}(A)$ we denote $\mathit{ta}(\langle Q, \Sigma, F, \Delta\rangle)$. A run of a TAB $A = \langle Q, \Sigma, F, \Delta, E\rangle$ is a pair $r = \langle t, M\rangle$ defined analogously to a run of a TACBB, except that the constraints between brothers are interpreted modulo $E$. More specifically, for each position $p$ in $\mathit{Pos}(t)$, if $t|_p$ is written in the form $f(t_1, \ldots, t_m)$, and $M(p.1), \ldots, M(p.m)$ are rules with right-hand side states $q_1, \ldots, q_m \in Q$, respectively, then $M(p)$ is a transition rule of $\Delta_A$ of the form $f(q_1, \ldots, q_m) \xrightarrow{C} q$ for some $q \in Q$ and constraint between brothers $C$. Moreover, for each equality $i \approx j$ in $C$, $t_i =_E t_j$ holds, and for each disequality $i \not\approx j$ in $C$, $t_i \neq_E t_j$ holds. The notions of successful run and recognized language are defined for TAB analogously to the case of TA.
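We further extend this class TAB with global equality and disequality constraints generalizing those of TAGED [FTT08].

Definition 3.2. A tree automaton with global and brother constraints modulo a flat theory (TABG) is a tuple $A = \langle Q, \Sigma, F, \Delta, E, C\rangle$ such that $\langle Q, \Sigma, F, \Delta, E\rangle$ is a TAB, denoted $\mathit{tab}(A)$, and $C$ is a Boolean combination of atomic constraints of the form $q \approx q'$ or $q \not\approx q'$, where $q, q' \in Q$.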

By ta(A) we denote ta(tab(A)).
A run of a TABG $A = \langle Q, \Sigma, F, \Delta, E, C\rangle$ is a run $r = \langle t, M\rangle$ of $\mathit{tab}(A)$ such that $r$ satisfies $C$, denoted $r \models C$, where the satisfiability of constraints is defined as follows. For atomic constraints, $r \models q \approx q'$ (respectively $r \models q \not\approx q'$) holds if and only if for all different positions $p, p' \in \mathit{Pos}(t)$ such that $r(p) = q$ and $r(p') = q'$, $t|_p =_E t|_{p'}$ (respectively $t|_p \neq_E t|_{p'}$) holds. This notion of satisfiability is extended to Boolean combinations as usual. As for TA, we say that $r$ is a run of $A$ on $t$. A run $r$ of $A$ on $t \in T(\Sigma)$ is successful (or accepting) if $r(\lambda) \in F$. The language $L(A)$ of $A$ is the set of terms $t$ for which there exists a successful run of $A$.
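The satisfaction of global constraints by a run can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the paper's procedure: the theory $E$ is taken to be empty (so equivalence is syntactic equality), a run is reduced to a dictionary from positions to the reached states, and the encoding of constraints as nested tuples is hypothetical.

```python
# Assumptions: empty theory E; terms are (symbol, child_1, ..., child_m);
# a run is a dict {position: state}; constraints are nested tuples:
#   ("eq", q, q2), ("neq", q, q2), ("and", c1, ...), ("or", c1, ...), ("not", c).

def subterm(t, p):
    """t|_p for a position p given as a tuple of 1-based indices."""
    return t if not p else subterm(t[p[0]], p[1:])

def satisfies(term, run, c):
    """r |= C, with the 'for all different positions' semantics of atoms."""
    kind = c[0]
    if kind == "and":
        return all(satisfies(term, run, d) for d in c[1:])
    if kind == "or":
        return any(satisfies(term, run, d) for d in c[1:])
    if kind == "not":
        return not satisfies(term, run, c[1])
    _, q1, q2 = c
    want_equal = (kind == "eq")
    for p, s in run.items():
        for p2, s2 in run.items():
            if p != p2 and s == q1 and s2 == q2:
                if (subterm(term, p) == subterm(term, p2)) != want_equal:
                    return False
    return True

a = ("a",)
term = ("f", a, ("f", a, a))   # f(a, f(a, a))
# Every proper subterm reaches q; the root reaches qf.
run = {(): "qf", (1,): "q", (2,): "q", (2, 1): "q", (2, 2): "q"}
```

On this run, the subterms reaching `q` are `a` (three times) and `f(a, a)`. Hence neither `("eq", "q", "q")` nor `("neq", "q", "q")` is satisfied, while `("not", ("eq", "q", "q"))` is: the negation of an atomic equality is weaker than the corresponding atomic disequality, because both atoms quantify universally over pairs of positions.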
It is important to note that the semantics of $\neg(q \approx q')$ and $q \not\approx q'$ differ, as well as the semantics of $\neg(q \not\approx q')$ and $q \approx q'$. This is because we have a "for all" quantifier in both definitions of the semantics of $q \approx q'$ and $q \not\approx q'$.

Let us introduce some notations, summarized in Figure 1, that we use below to characterize some classes of tree automata related to TABG (Figure 1 also refers to a class defined in Section 4). A TABG $A$ is called positive if $C_A$ is a disjunction of conjunctions of atomic constraints and it is called positive conjunctive if $C_A$ is a conjunction of atomic constraints. The subclass of positive conjunctive TABG is denoted by TABG$^\wedge$.
We recall that a TAB where all the constraints are empty is just a TA. For a TABG $A$, when the theory $E_A$ is empty and $\mathit{tab}(A)$ is just a TA, we say that $A$ is a tree automaton with global constraints (TAG). Its subclass with positive conjunctive constraints is denoted TAG$^\wedge$.
With the notation TABG[$\tau_1, \ldots, \tau_m$], we characterize the class of tree automata with global and brother constraints modulo a flat theory whose global constraints are Boolean combinations of atomic constraints of types $\tau_1, \ldots, \tau_m$. The types $\approx$ and $\not\approx$ denote respectively the atomic constraints of the form $q \approx q'$ and $q \not\approx q'$, where $q, q'$ are states. For instance, the abbreviation TABG used in Definition 3.2 stands for TABG[$\approx, \not\approx$]. This notation is extended to the positive conjunctive fragment by TABG$^\wedge$[$\tau_1, \ldots, \tau_k$] and to the fragment without local constraints between brothers by TAG[$\tau_1, \ldots, \tau_k$].

Figure 1: Decidable classes of TA with local and global constraints (classes shown include TAB, TACBB and TA; arrows denote effective strict inclusion or effective equivalence).
3.1. Expressiveness. The class of regular languages is strictly included in the class of TABG languages due to the constraints.
Example 3.3. Let $\Sigma = \{a : 0, f : 2\}$. The set $\{f(t, t) \mid t \in T(\Sigma)\}$ is not a regular tree language (this can be shown using a classical pumping argument). However, it is recognized by the following TAB: and it is also recognized by the following TAG[$\approx$]: where $t \to q \mid q_r$ is an abbreviation for $t \to q$ and $t \to q_r$. An example of a successful run of $A$ on $t = f(f(a, a), f(a, a))$ is $q_f(q_1(q_0, q_0), q_1(q_0, q_0))$, where we use term-like notation for marking the reached state at each position.
Moreover, the TAGED of [FTT08] are also a particular case of TAG[$\approx, \not\approx$], since they can be redefined in our setting as restricted TAG$^\wedge$[$\approx, \not\approx$], where the equational theory is empty, and where $q$ and $q'$ are required to be distinct in any atomic constraint of the form $q \not\approx q'$. Reflexive disequality constraints such as $q \not\approx q$ correspond to monadic key constraints for XML documents, meaning that every two distinct positions of type $q$ have different values. A state $q$ of a TAG[$\approx, \not\approx$] can be used for instance to characterize unique identifiers as in the following example, which presents a TAG[$\approx, \not\approx$] whose language cannot be recognized by a TAGED. This example will be referred to several times in Section 5, in order to illustrate the definitions used in the decision procedure of the emptiness problem for TAG[$\approx, \not\approx$].

Example 3.4. The TAG[$\approx, \not\approx$] of our running example accepts (in state $q_M$) lists of dishes called menus, where every dish is associated with one identifier (state $q_{id}$) and the time needed to cook it (state $q_t$). We have other states accepting digits ($q_d$), numbers ($q_N$) and lists of dishes ($q_L$).
The constraint $C$ ensures that all the identifiers of the dishes in a menu are pairwise distinct (i.e. that $q_{id}$ is a key) and that the time to cook is the same for all dishes: $C = q_{id} \not\approx q_{id} \wedge q_t \approx q_t$. A term in $L(A)$ together with an associated successful run are depicted in Figure 2.
Although this is a simple exercise, let us establish formally that TAG[$\approx, \not\approx$] are strictly more expressive than TAGED.
Lemma 3.5. The class of languages recognized by TAG$^\wedge$[$\approx, \not\approx$] strictly includes the class of languages recognized by TAGED.
Proof. Since a TAGED is just a TAG$^\wedge$[$\approx, \not\approx$] where no constraint of the form $q \not\approx q$ occurs, the inclusion holds. In order to see that it is strict, it suffices to exhibit a language $L$ which can be recognized by a TAG$^\wedge$[$\approx, \not\approx$] but not by a TAGED.
Let $\Sigma = \{a : 0, s : 1, f : 2\}$. The set $L$ of terms of $T(\Sigma)$ of the form $f(s^{n_1}(a), f(s^{n_2}(a), \ldots, f(s^{n_k}(a), a) \ldots))$, such that $k \geq 0$ and the natural numbers $n_i$, for $i \leq k$, are pairwise distinct, is recognized by the following TAG$^\wedge$[$\approx, \not\approx$]:

Assume that there exists a TAG$^\wedge$[$\approx, \not\approx$] $A$ without reflexive disequality constraints of the form $q \not\approx q$ (i.e. a TAGED), recognizing this language $L$. Then, there exists an accepting run $r$ of $A$ on the term $t = f(s(a), f(s^2(a), \ldots f(s^{|Q_A|+1}(a), a) \ldots)) \in L$. Therefore, $r \models C_A$ (the global constraint of $A$, which is positive by hypothesis).
There are two different positions and $r$ is a run of $A$ on $t$, $r'$ is a run of $\mathit{ta}(A)$ on $t'$. Hence, it suffices to prove that the constraint $C_A$ is satisfied by $r'$. Consider a position $p$ of the form $2.2.\ldots.2$ with $|p| < j$. We start by proving that any atomic constraint involving $r'(p)$ is satisfied. Note that $r'(p) = r(p)$ holds, and that the subterm $t|_p$ has only this occurrence in $t$. Thus, any atomic constraint involving $r(p)$ and a state $q$ occurring in $r$ is necessarily of the form $r(p) \not\approx q$. Since any state occurring in $r'$ occurs also in $r$, any atomic constraint involving $r'(p)$ and a state $q$ occurring in $r'$ is of the form $r'(p) \not\approx q$. Moreover, the subterm $t'|_p$ has only this occurrence in $t'$. Thus, such a constraint is satisfied. Now consider two different positions $p_1, p_2$ which are not of the form described above. It remains to see that any atomic constraint involving $r'(p_1)$ and $r'(p_2)$ is satisfied. In the case where $r'|_{p_1}$ and $r'|_{p_2}$ are different, this is a direct consequence of the fact that both subruns $r'|_{p_1}$ and $r'|_{p_2}$ are also subruns of $r$ at different positions. Otherwise, in the case where $r'|_{p_1}$ and $r'|_{p_2}$ are the same subrun, $r'(p_1) = r'(p_2)$ holds, and any atomic constraint involving $r'(p_1)$ and $r'(p_2)$ must be of the form $r'(p_1) \approx r'(p_2)$ because $A$ has no reflexive disequalities. Thus, the atomic constraint is also satisfied in this case.
The following example shows a TABG recognizing a language that cannot be recognized by a TAG[$\approx, \not\approx$]. The proof is a simple exercise and it is left to the reader.
Example 3.6. Assume that the terms of Example 3.4 are now used to record the activity of a restaurant. To this end, we transform the TAG of Example 3.4 into a TABG as follows. First, in order to simplify the example we omit the restriction that all cooking times coincide, i.e. $C = q_{id} \not\approx q_{id}$. Second, we add a new argument of type $q_t$ to $L_0$, $L$ and $M$, so that the old argument $q_t$ characterizes the theoretical time to cook, and the new $q_t$ characterizes the real time that was needed to cook the dish. Let us replace the transitions with $L_0$, $L$ and $M$ in input by $L_0(q_{id}, q_t, q_t) \xrightarrow{2 \approx 3} q_L$, $L_0(q_{id}, q_t, q_t) \xrightarrow{2 \not\approx 3} q'_L$, $L(q_{id}, q_t, q_t, q_L) \xrightarrow{2 \approx 3} q_L$, $L(q_{id}, q_t, q_t, q_L) \xrightarrow{2 \not\approx 3} q'_L$, $M(q_{id}, q_t, q_t, q_L) \xrightarrow{2 \approx 3} q_M$, $M(q_{id}, q_t, q_t, q_L) \xrightarrow{2 \not\approx 3} q'_M$, where $q'_L$ is a new state meaning that there was an anomaly. We also add the transitions $L(q_{id}, q_t, q_t, q'_L) \to q'_L$ and $M(q_{id}, q_t, q_t, q'_L) \to q'_M$ to propagate $q'_L$. By keeping the set of final states as $\{q_M\}$, the recognized language of the TABG obtained is the set of well-cooked records, i.e. such that for all dishes, the real time to cook is equal to the theoretical time. By redefining the set of final states as $\{q'_M\}$, the recognized language is the set of records with an anomaly.

3.2. Decision Problems.
Membership is the problem of deciding, given a term $t \in T(\Sigma)$ and a TABG $A$ over $\Sigma$, whether $t \in L(A)$.
Proposition 3.7. Membership is NP-complete for TABG, assuming that the maximum arity of the signature $\Sigma$ is a constant of the problem.
Proof. In order to prove that this problem is in NP, given a TABG $A = \langle Q, \Sigma, F, \Delta, E, C\rangle$ and a term $t \in T(\Sigma)$, we can non-deterministically guess a function $M$ from $\mathit{Pos}(t)$ into $\Delta$, and check that $\langle t, M\rangle$ is a successful run of $A$ on $t$. The checking can be performed in polynomial time. In particular, testing equivalence modulo $E$ can be performed in polynomial time using a dynamic programming scheme, by assuming that the maximum arity of $\Sigma$ is a constant of the problem, which is a usual assumption. More general results are given in [Nie96, CHJ94]. For NP-hardness, [FTT08, JKV09] present PTIME reductions of the satisfiability of Boolean expressions into membership for TAG$^\wedge$[$\approx$] whose constraints are conjunctions of equalities of the form $q \approx q$.
Recall that for plain TA, membership is in PTIME.
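The guess-and-check argument can be mimicked deterministically by enumerating all candidate runs, at an exponential cost in general. The following Python sketch illustrates this for the restricted case of a TAG with a conjunction of atomic constraints and an empty theory; the encodings, the function names and the sample automaton (intended to recognize terms of the form f(t, t) with a single constraint on a state here called `qr`) are assumptions made for illustration.

```python
from itertools import product

# Assumptions: empty theory, no brother constraints. A term is
# (symbol, child_1, ..., child_m); a rule is (symbol, lhs_states, target);
# atoms are ("eq", q, q2) or ("neq", q, q2), read conjunctively.

def subterm(t, p):
    return t if not p else subterm(t[p[0]], p[1:])

def runs(t, rules):
    """Yield every run of the underlying TA on t as (root_state, {position: state})."""
    child_runs = [list(runs(c, rules)) for c in t[1:]]
    for combo in product(*child_runs):
        lhs = tuple(state for state, _ in combo)
        for symbol, rule_lhs, target in rules:
            if symbol == t[0] and rule_lhs == lhs:
                labeling = {(): target}
                for i, (_, sub) in enumerate(combo, start=1):
                    labeling.update({(i,) + p: q for p, q in sub.items()})
                yield target, labeling

def satisfies_atom(t, labeling, atom):
    """Universal semantics over pairs of different positions."""
    kind, q1, q2 = atom
    for p, s in labeling.items():
        for p2, s2 in labeling.items():
            if p != p2 and s == q1 and s2 == q2:
                if (subterm(t, p) == subterm(t, p2)) != (kind == "eq"):
                    return False
    return True

def member(t, rules, final, atoms):
    """Brute-force membership: some successful run satisfies all atoms."""
    return any(root in final and all(satisfies_atom(t, lab, a) for a in atoms)
               for root, lab in runs(t, rules))

# Hypothetical TAG over {a:0, f:2}: both arguments of the root reach qr,
# and all subterms reaching qr must be equal.
RULES = [
    ("a", (), "q"), ("a", (), "qr"),
    ("f", ("q", "q"), "q"), ("f", ("q", "q"), "qr"),
    ("f", ("qr", "qr"), "qf"),
]
FINAL = {"qf"}
ATOMS = [("eq", "qr", "qr")]

a = ("a",)
```

For instance, `member(("f", ("f", a, a), ("f", a, a)), RULES, FINAL, ATOMS)` holds, whereas the term f(a, f(a, a)) is rejected because its two arguments differ.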
Universality is the problem of deciding, given a TABG $A$ over $\Sigma$, whether $L(A) = T(\Sigma)$. It is known to be undecidable already for a small subclass of TAG.
The following consequence is a new result for TAGED.

Proposition 3.9. It is undecidable whether the language of a given TAG$^\wedge$[$\approx$] is regular.
Proof. We show that universality is reducible to regularity using a new function symbol $f$ with arity 2, and any non-regular language $L$ which is recognizable by a TAG$^\wedge$[$\approx$] (such a language exists).
Let $A$ be an input of universality for TAG$^\wedge$[$\approx$] and let Thus, in order to conclude, it suffices to show that $L(A) = T(\Sigma)$ if and only if $L(A')$ is regular. For this purpose let us first define the quotient of a term language $R$ by a term $s$ with respect to a function symbol $f$: $R/s := \{t \mid f(s, t) \in R\}$. This operation preserves regular languages: for all $s$ and $f$, if

Emptiness is the problem of deciding, given a TABG $A$, whether $L(A) = \emptyset$. The proof that it is decidable for TABG is rather involved and is presented in Section 5.

Arithmetic Constraints and Reduction to TABG ∧
This section has two goals. The first goal is to present an extension of TABG by allowing certain global arithmetic constraints. They are interesting by themselves since they allow the representation of several natural properties in a simple way. The second goal is to show that the class of TABG languages coincides (in expressiveness) with the class of TABG$^\wedge$ languages. In other words, for each TABG there exists a TABG$^\wedge$ recognizing the same language. This reduction will be very useful in Section 5 in order to prove decidability of emptiness of TABG.
The reason for presenting both results in the same section is that arithmetic constraints simplify the task of transforming a TABG into a TABG ∧ representing the same language.This is because negations can be replaced by arithmetic constraints with an equivalent meaning in a first intermediate step, and such constraints are easier to deal with.
All this work is developed in Subsection 4.2. Before that, in Subsection 4.1 we present a more general form of arithmetic constraints for which emptiness is undecidable. The motivation of this first subsection is to show the limits of positive results in this setting, and to justify the limited form of the constraints in Subsection 4.2.

4.1. Global Integer Linear Constraints. Let $Q$ be a set of states. A linear inequality over $Q$ is an expression of the form $\sum_{q \in Q} a_q \cdot |q| \geq a$ or $\sum_{q \in Q} a_q \cdot \|q\| \geq a$ where every $a_q$ and $a$ belong to $\mathbb{Z}$. We consider the above linear inequalities as atomic constraints of tree automata with global constraints, and denote by $|.|_{\mathbb{Z}}$ and $\|.\|_{\mathbb{Z}}$ their respective types. The type $\mathbb{Z}$ denotes $|.|_{\mathbb{Z}}$ and $\|.\|_{\mathbb{Z}}$ together.
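Using the notation introduced in Section 3, TABG[$\approx, \not\approx, |.|_{\mathbb{Z}}, \|.\|_{\mathbb{Z}}$] (or TABG[$\approx, \not\approx, \mathbb{Z}$]) denotes the class of tree automata with global and brother constraints modulo a flat theory of the form $A = \langle Q, \Sigma, F, \Delta, E, C\rangle$ such that $\langle Q, \Sigma, F, \Delta, E\rangle$ is a TAB and $C$ is a Boolean combination of atomic constraints which can be linear inequalities as above or equality or disequality constraints of the form $q \approx q'$ or $q \not\approx q'$, with $q, q' \in Q$.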
Let $A$ be a TABG[$\approx, \not\approx, |.|_{\mathbb{Z}}, \|.\|_{\mathbb{Z}}$] over $\Sigma$ with state set $Q$ and flat equational theory $E$, let $r$ be a run of $\mathit{tab}(A)$ on a term $t \in T(\Sigma)$ and let $q \in Q$. Intuitively, the interpretation of $|q|$ with respect to $r$ is the number of occurrences of $q$ in $r$, i.e. the number of positions $p$ such that $r(p) = q$. The interpretation of $\|q\|$ with respect to $r$ is the number of different subterms (modulo $E$) in $t$ reaching state $q$ with $r$, i.e. the maximum number $n$ of positions $p_1, p_2, \ldots, p_n$ such that $r(p_1) = r(p_2) = \ldots = r(p_n) = q$ and the terms $t|_{p_1}, t|_{p_2}, \ldots, t|_{p_n}$ are pairwise different (modulo $E$). More formally, the interpretations of $|q|$ and $\|q\|$ with respect to $r$ (and $t$) are defined, respectively, by the following cardinalities: $|q|_r = |\{p \in \mathit{Pos}(t) \mid r(p) = q\}|$ and $\|q\|_r = |\{[t|_p]_E \mid p \in \mathit{Pos}(t),\ r(p) = q\}|$, where $[s]_E$ denotes the equivalence class of $s$ modulo $E$. This permits us to define the satisfiability of linear inequalities with respect to $r$ and $t$: $r \models \sum_{q \in Q} a_q \cdot |q| \geq a$ holds if and only if $\sum_{q \in Q} a_q \cdot |q|_r \geq a$ holds, and $r \models \sum_{q \in Q} a_q \cdot \|q\| \geq a$ holds if and only if $\sum_{q \in Q} a_q \cdot \|q\|_r \geq a$ holds. The satisfiability of the global constraint $C_A$ of $A$ by $r$, denoted $r \models C_A$, is defined accordingly, and if $r \models C_A$ then $r$ is called a run of $A$. A run of $A$ on $t \in T(\Sigma)$ is successful (or accepting) if $r(\lambda) \in F_A$. The language $L(A)$ of $A$ is the set of terms $t$ for which there exists a successful run of $A$.
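The two counting measures can be computed directly from a run. The Python sketch below is only illustrative and assumes an empty theory $E$ (so that equivalence classes are singletons and can be counted with a set of subterms); the encoding of runs as dictionaries from positions to states and the function names are hypothetical.

```python
# Assumptions: empty theory E; terms are (symbol, child_1, ..., child_m);
# a run is a dict {position: state} with positions as tuples of 1-based indices.

def subterm(t, p):
    return t if not p else subterm(t[p[0]], p[1:])

def count_occurrences(run, q):
    """|q|_r: number of positions whose reached state is q."""
    return sum(1 for s in run.values() if s == q)

def count_distinct(term, run, q):
    """||q||_r: number of distinct subterms reaching q (syntactic equality)."""
    return len({subterm(term, p) for p, s in run.items() if s == q})

def holds(coeffs, bound, measure):
    """Check sum_q a_q * measure(q) >= bound for coeffs = {q: a_q}."""
    return sum(a_q * measure(q) for q, a_q in coeffs.items()) >= bound

a = ("a",)
term = ("f", a, ("f", a, a))   # f(a, f(a, a))
run = {(): "qf", (1,): "q", (2,): "q", (2, 1): "q", (2, 2): "q"}

def occ(q):
    return count_occurrences(run, q)

def dist(q):
    return count_distinct(term, run, q)
```

In this run, `occ("q")` is 4 while `dist("q")` is 2. A constraint of the shape $\|q\| \leq 1$ can be written as $-\|q\| \geq -1$, i.e. `holds({"q": -1}, -1, dist)`, which fails here because two different subterms reach `q`.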
Example 4.1. Let us add a new argument to the dishes of the menu of Example 3.4 which represents the price coded on two digits by a term $N(d_1, d_0)$. We add a new state $q_p$ for the type of prices, and other states $q_{\mathit{cheap}}$, $q_{\mathit{moderate}}$, $q_{\mathit{expensive}}$, $q_{\mathit{chic}}$ describing price level ranges, and transitions $0|1 \to q_{\mathit{cheap}}$, $2|3 \to q_{\mathit{moderate}}$, $4|5|6 \to q_{\mathit{expensive}}$, $7|8|9 \to q_{\mathit{chic}}$ and $N(q_{\mathit{cheap}}, q_d) \to q_p$, $\ldots$. The price is a new argument of $L_0$, $L$ and $M$, hence we replace the transitions with these symbols in input by $L_0(q_{\mathit{id}}, q_t, q_p) \to q_L$, $L(q_{\mathit{id}}, q_t, q_p, q_L) \to q_L$, $M(q_{\mathit{id}}, q_t, q_p, q_L) \to q_M$. We can use a linear inequality $|q_{\mathit{cheap}}| + |q_{\mathit{moderate}}| - |q_{\mathit{expensive}}| - |q_{\mathit{chic}}| \geq 0$ to characterize the moderate menus, and $|q_{\mathit{expensive}}| + |q_{\mathit{chic}}| \geq 6$ to characterize the menus with too many expensive dishes. A linear inequality $\|q_p\| \leq 1$ expresses that all the dishes have the same price.
The class TAG[$|.|_{\mathbb{Z}}$] has been studied under different names (e.g. Parikh automata in [KR02], linear constraint tree automata in [BMSL09]) and it has a decidable emptiness test. Indeed, the set of successful runs of a given TA with state set $Q$ is a context-free language (seeing runs as words of $Q^*$), and the Parikh projection (the set of tuples of $\mathbb{N}^{|Q|}$ whose components are the $|q|_r$ for every run $r$) of such a language is a semi-linear set. The idea for deciding emptiness for a TAG[$|.|_{\mathbb{Z}}$] $A$ is to compute this semi-linear set and to test the emptiness of its intersection with the set of solutions in $\mathbb{N}^{|Q|}$ of $C_A$, the arithmetic constraint of $A$ (a Boolean combination of linear inequalities of type $|.|_{\mathbb{Z}}$), which is also semi-linear. This can be done in NPTIME, see [BMSL09].
To our knowledge, the class TAG[$\|.\|_{\mathbb{Z}}$], with global constraints counting the number of distinct subterms in each state, has not been studied, even modulo an empty theory.
Combining constraints of type $\approx$ and counting constraints of type $|.|_{\mathbb{Z}}$, however, leads to undecidability.
Proof. We consider Hilbert's tenth problem, that is, solvability of an input equation $P = 0$ where $P$ is a polynomial with integer coefficients and variables ranging over the natural numbers. This problem is known to be undecidable, and with the addition of new variables it is easily reducible to a question of the form $\exists x_1 \ldots \exists x_n : e_1 \wedge \ldots \wedge e_m$, where $x_1, \ldots, x_n$ are variables ranging over the natural numbers, and $e_1, \ldots, e_m$ are equations that are either of the form $x_j + x_k = x_t$ or $x_j * x_k = x_t$ or $x_j = 1$ or $x_j = 0$. We reduce this last problem to emptiness for automata with constraints of type $\approx$ and $|.|_{\mathbb{Z}}$.

We consider an instance $\varphi \equiv \exists x_1 \ldots \exists x_n : e_1 \wedge \ldots \wedge e_m$. Without loss of generality, we assume that $e_1, \ldots, e_{m'}$ for $m' \leq m$ are all the equations of the form $x_j * x_k = x_t$, and that for each of such equations, the indexes $j, k, t$ are different. We will construct an automaton $A$ such that $\varphi$ is true if and only if $L(A)$ is not empty.

Since the construction of $A$ is technical, let us first give some intuitions (see Figure 3). Consider a possible assignment $x_1 := v_1, \ldots, x_n := v_n$. A concrete run of $A$ will be able to check whether this assignment proves that $\varphi$ is true, and only accept the corresponding term if the answer is positive. In this run, there will be $v_1$ occurrences of state $q_{|x_1|}$, $v_2$ occurrences of state $q_{|x_2|}$, and so on. Equations of the form $x_j + x_k = x_t$, $x_j = 1$ and $x_j = 0$ can directly be checked by constraints of the form $|q_{|x_j|}| + |q_{|x_k|}| = |q_{|x_t|}|$, $|q_{|x_j|}| = 1$ and $|q_{|x_j|}| = 0$, respectively. For each equation $e_i$ of the form $x_j * x_k = x_t$ there will be $v_k$ occurrences of a state called $q_{e_i,|x_k|}$. This is ensured by the constraint $|q_{e_i,|x_k|}| = |q_{|x_k|}|$. Under each of these occurrences, there will be the same term, reaching a state $q_{e_i,x_j}$, and containing $v_j$ occurrences of a state $q_{e_i,|x_t|}$. The uniqueness of this term, as well as the number of occurrences of $q_{e_i,|x_t|}$, are both ensured by an equality constraint $q_{x_j} \approx q_{e_i,x_j}$. In summary, there will be $v_j * v_k$ occurrences of state $q_{e_i,|x_t|}$. The satisfiability of the equation $x_j * x_k = x_t$ will be checked by the constraint $|q_{e_i,|x_t|}| = |q_{|x_t|}|$.

The transition rules of $A$ include the following:

$$\begin{array}{l}
\{\ldots,\ f(q_{x_1}, \ldots, q_{x_n}, q_{e_1}, \ldots, q_{e_{m'}}) \to q_{\mathit{accept}}\}\ \cup \\
\{g(q_a) \to q_{|x_j|},\ g(q_a) \to q_{x_j},\ g(q_{|x_j|}) \to q_{|x_j|},\ g(q_{|x_j|}) \to q_{x_j} \mid j \in \{1, \ldots, n\}\}\ \cup \\
\{g(q_a) \to q_{e_i,|x_t|},\ g(q_a) \to q_{e_i,x_j},\ g(q_{e_i,|x_t|}) \to q_{e_i,|x_t|},\ g(q_{e_i,|x_t|}) \to q_{e_i,x_j}, \\
\ \ h(q_{e_i,x_j}, q_a) \to q_{e_i,|x_k|},\ h(q_{e_i,x_j}, q_{e_i,|x_k|}) \to q_{e_i,|x_k|},\ h(q_a, q_{e_i,|x_k|}) \to q_{e_i},\ h(q_a, q_a) \to q_{e_i} \mid i \in \{1, \ldots, m'\}, \ldots\}
\end{array}$$

Figure 3: Accepting run of $s = f(g^{v_1+1}(a), \ldots, g^{v_n+1}(a), s_{e_1}, \ldots, s_{e_{m'}})$ and the subrun of $s_{e_i}$, where $e_i$ is of the form $x_j * x_k = x_t$.

It remains to prove that $\varphi$ is true if and only if $L(A)$ is not empty. To this end, let us first assume that $x_1 := v_1, \ldots, x_n := v_n$ is a solution of $\varphi$. In order to simplify the presentation, we denote the term $h(a, h(s, h(s, \ldots, h(s, a) \ldots)))$, with $k$ occurrences of $s$, by $h[a, s, \ldots (k) \ldots, s, a]$, and given an equation $e_i \equiv x_j * x_k = x_t$, we denote the term $h[a, g^{v_j+1}(a), \ldots (v_k) \ldots, g^{v_j+1}(a), a]$ by $s_{e_i}$. Let us consider the term $s = f(g^{v_1+1}(a), \ldots, g^{v_n+1}(a), s_{e_1}, \ldots, s_{e_{m'}})$. It is not difficult to see that the run of Figure 3 is an accepting run of $s$. Note that for each equation $e_i \equiv x_j * x_k = x_t$, the constraints associated with $e_i$ are satisfied.

Now, assume that there is an accepting run $r$ of $A$ on a term $s$. Since $r$ is accepting, the transition rule $f(q_{x_1}, \ldots, q_{x_n}, q_{e_1}, \ldots, q_{e_{m'}}) \to q_{\mathit{accept}}$ is applied at the root of $s$. According to the form of the rules involving $q_{x_1}, \ldots, q_{x_n}$, it holds that $s$ is of the form $s = f(g^{v_1+1}(a), \ldots, g^{v_n+1}(a), s_{e_1}, \ldots, s_{e_{m'}})$, for some natural numbers $v_1, \ldots, v_n$ and some terms $s_{e_1}, \ldots, s_{e_{m'}}$. Moreover, the states $q_{|x_1|}, \ldots, q_{|x_n|}$ have $v_1, \ldots, v_n$ occurrences, respectively. It remains to see that the assignment $x_1 := v_1, \ldots, x_n := v_n$ makes $\varphi$ true. The satisfiability of a constraint of the form $|q_{|x_j|}| + |q_{|x_k|}| = |q_{|x_t|}|$ (or $|q_{|x_j|}| = 1$ or $|q_{|x_j|}| = 0$) implies $v_j + v_k = v_t$ (or $v_j = 1$ or $v_j = 0$, respectively), thus an equation of the form $x_j + x_k = x_t$ (or $x_j = 1$ or $x_j = 0$) holds with this assignment. It remains to see that every equation $e_i$ of the form $x_j * x_k = x_t$ also holds with this assignment. According to the form of the rules of $A$ and the satisfiability of the constraints $|q_{e_i,|x_k|}| = |q_{|x_k|}|$ and $q_{x_j} \approx q_{e_i,x_j}$, the state $q_{e_i,|x_t|}$ has $v_j * v_k$ occurrences. Therefore, by the satisfiability of the constraint $|q_{e_i,|x_t|}| = |q_{|x_t|}|$, it follows $v_j * v_k = v_t$, and hence the equation $x_j * x_k = x_t$ holds with this assignment, and we are done.

4.2. Global Natural Linear Constraints. We present now a restriction on linear inequalities which enables a decidable emptiness test when combined with $\approx$ and $\not\approx$ as global constraints. A natural linear inequality over $Q$ is a linear inequality as above whose coefficients $a_q$ and $a$ all have the same sign. We call them natural since it is equivalent to consider inequalities in both directions whose coefficients are all non-negative, like $\sum a_q \cdot |q| \leq a$, with $a_q, a \in \mathbb{N}$, to refer to $\sum -a_q \cdot |q| \geq -a$. We also consider linear equalities $\sum a_q \cdot |q| = a$, with $a_q, a \in \mathbb{N}$, to refer to a conjunction of two natural linear inequalities.
The types of the natural linear inequalities are denoted by $|.|_{\mathbb{N}}$ and $\|.\|_{\mathbb{N}}$. Below, we shall abbreviate these two types by $\mathbb{N}$.
The main difference between the linear inequalities of type $|.|_{\mathbb{Z}}$ and $|.|_{\mathbb{N}}$ (and respectively $\|.\|_{\mathbb{Z}}$ and $\|.\|_{\mathbb{N}}$) is that the former permits comparing the respective numbers of occurrences of two states, as e.g. in $|q| \leq |q'|$, whereas the latter only permits comparing the number of occurrences of one state (or a sum of the numbers of occurrences of several states with coefficients) to a constant, as e.g. in $|q| \leq 4$ or $|q| + 2|q'| \leq 9$.
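The distinction between the two kinds of inequalities can be made concrete with a small Python sketch. It assumes a hypothetical encoding where an inequality "sum of a_q times |q| is at least a" is given as a dictionary of integer coefficients plus the integer bound; zero coefficients are treated as compatible with either sign, which is a choice of this sketch rather than something stated in the text.

```python
def is_natural(coeffs, bound):
    """True if all coefficients and the bound share a sign (zeros allowed)."""
    values = list(coeffs.values()) + [bound]
    return all(v >= 0 for v in values) or all(v <= 0 for v in values)

def holds(coeffs, bound, counts):
    """Evaluate sum(a_q * counts[q]) >= bound for given occurrence counts."""
    return sum(a * counts.get(q, 0) for q, a in coeffs.items()) >= bound

# |q| + 2|q'| <= 9 is written as -|q| - 2|q'| >= -9: natural.
bounded = ({"q": -1, "q2": -2}, -9)
# |q| <= |q'| is written as |q'| - |q| >= 0: mixed signs, not natural.
comparison = ({"q": -1, "q2": 1}, 0)
```

Here `is_natural(*bounded)` is true and `is_natural(*comparison)` is false, matching the two examples given in the paragraph above.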
In the rest of the subsection we show that TABG[$\approx, \not\approx, \mathbb{N}$] has the same expressiveness as TABG∧[$\approx, \not\approx$]. The proof works in several steps:
• First, we define the notion of normalized TABG[$\approx, \not\approx, \mathbb{N}$], that is a TABG[$\approx, \not\approx, \mathbb{N}$] with a constraint being a disjunction of conjunctions of literals in a simple form.
• Second, we remove negative literals of the form $\neg(q \approx q')$ or $\neg(q \not\approx q')$, obtaining a list of TABG

Remember that the form of the positive arithmetic literals can be either $a_1 \|q_1\| + \ldots + a_n \|q_n\| \otimes k$ or $a_1 |q_1| + \ldots + a_n |q_n| \otimes k$, with $\otimes$ in $\{\geq, \leq, =\}$, $n > 0$, $k \geq 0$ and strictly positive $a_1, \ldots, a_n$.

Lemma 4.4. Any TABG[$\approx, \not\approx, \mathbb{N}$] can be effectively transformed into a normalized TABG[$\approx, \not\approx, \mathbb{N}$] with the same equational theory and preserving the language.
Proof. First, by applying De Morgan's laws, negations are moved inwards so that each negation is applied to just an atom. Second, negative arithmetic literals are made positive by simple transformations: inequalities are inverted and equalities become disjunctions of inequalities. Third, strict inequalities are converted into non-strict ones by adding or subtracting 1 on one side. Fourth, by applying simple arithmetic operations all such literals are brought into the required form $a_1 \|q_1\| + \ldots + a_n \|q_n\| \otimes k$ or $a_1 |q_1| + \ldots + a_n |q_n| \otimes k$, for $\otimes$ in $\{\geq, \leq, =\}$, $n > 0$ and strictly positive $a_1, \ldots, a_n$. In this step, a trivially false literal is replaced by false, and a trivially true literal is replaced by true. Finally, by applying the standard transformation into disjunctive normal form we get the desired result.
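The second to fourth steps of this proof can be sketched in Python for a single arithmetic literal. The sketch assumes a hypothetical encoding in which a literal "lhs ⊗ k" is reduced to the pair `(op, k)`, since the left-hand side is a sum with positive coefficients and therefore always a natural number; an empty result list stands for the constant false.

```python
def satisfies(value, op, k):
    """Check the literal `value op k` for a concrete left-hand side value."""
    if op == ">=":
        return value >= k
    if op == "<=":
        return value <= k
    if op == "=":
        return value == k
    raise ValueError(op)

def negate_literal(op, k):
    """Return positive literals (op, k) whose disjunction is equivalent to
    the negation of `lhs op k`, where lhs ranges over the naturals."""
    if op == ">=":
        candidates = [("<=", k - 1)]          # not (lhs >= k)  iff  lhs <= k-1
    elif op == "<=":
        candidates = [(">=", k + 1)]          # not (lhs <= k)  iff  lhs >= k+1
    elif op == "=":
        candidates = [("<=", k - 1), (">=", k + 1)]
    else:
        raise ValueError(op)
    # A literal `lhs <= c` with c < 0 is trivially false over the naturals.
    return [(o, c) for (o, c) in candidates if not (o == "<=" and c < 0)]
```

For instance, negating `lhs = 0` yields the single literal `lhs >= 1`, and negating `lhs >= 0` yields the empty disjunction, i.e. false.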
In order to remove negative equality and disequality literals and positive arithmetic constraints, we use the idea of inserting new states which are synonyms of existing states. Intuitively, a synonym is a new state $\bar{q}$ that behaves analogously to an existing state $q$, i.e. the rules and constraints are modified such that the relation of $\bar{q}$ with the other states is the same as for $q$. Nevertheless, the constraints are further modified to ensure that, whenever $q$ occurs in an execution, $\bar{q}$ also occurs. Moreover, all subterms reaching $\bar{q}$ are the same (or equivalent modulo the relation induced by the flat theory), but are different from (nonequivalent to) the ones reaching $q$. This way, an execution of the original automaton with occurrences of $q$ can be transformed into an execution of the new automaton, where the occurrences of a concrete subterm (up to the equivalence relation) reaching $q$ in the original execution now reach $\bar{q}$ instead.

Definition 4.5. Let $q$ be a state in $Q$. Let $\bar{q}$ be a state not in $Q$.
We define $F_{q \leadsto \bar{q}}$ as $F$ if $q$ is not in $F$, and as $F \cup \{\bar{q}\}$ if $q$ is in $F$. We define $\Delta_{q \leadsto \bar{q}}$ as the set of rules obtained from the rules of $\Delta$ with all possible replacements of occurrences of $q$ by $\bar{q}$. More formally, $\Delta_{q \leadsto \bar{q}}$ is $\{f(q'_1, \ldots, q'_n) \to q'_{n+1} \mid \exists f(q_1, \ldots, q_n) \to q_{n+1} \in \Delta : \forall i \in \{1, \ldots, n+1\} : (q_i = q'_i \vee (q_i = q \wedge q'_i = \bar{q}))\}$. We define $C_{q \leadsto \bar{q}}$ as the constraint $\bigl((\|q\| = 0 \wedge \|\bar{q}\| = 0) \vee (\|\bar{q}\| = 1 \wedge q \not\approx \bar{q})\bigr) \wedge C'$, where $C'$ is obtained from the normalization of $C$ by replacing each literal by a new formula according to the following description.
We define $A_{q \leadsto \bar{q}}$ as $\langle Q \cup \{\bar{q}\}, \Sigma, F_{q \leadsto \bar{q}}, \Delta_{q \leadsto \bar{q}}, E, C_{q \leadsto \bar{q}} \rangle$. We write $(F_{q \leadsto \bar{q}})_{q' \leadsto \bar{q}'}$ for $q \neq q'$ and $\bar{q} \neq \bar{q}'$ more succinctly as $F_{q,q' \leadsto \bar{q},\bar{q}'}$, and similarly for $\Delta_{q,q' \leadsto \bar{q},\bar{q}'}$, $C_{q,q' \leadsto \bar{q},\bar{q}'}$ and $A_{q,q' \leadsto \bar{q},\bar{q}'}$. The condition $(\|q\| = 0 \wedge \|\bar{q}\| = 0)$ added to $C_{q \leadsto \bar{q}}$ is necessary to satisfy $L(A_{q \leadsto \bar{q}}) = L(A)$, as it is proved in Lemma 4.6. This lemma is not used in the rest of the article, since the introduction of synonyms is combined with other constraints in further transformations. Nevertheless, we preserve Lemma 4.6 since its proof gives intuition about the definition of synonyms, and the arguments are similar to other ones appearing later.
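Lemma 4.6. Let $A = \langle Q, \Sigma, F, \Delta, E, C \rangle$ be a TABG[$\approx, \not\approx, \mathbb{N}$]. Let $q$ be a state in $Q$. Let $\bar{q}$ be a state not in $Q$. Then, $L(A_{q \leadsto \bar{q}}) = L(A)$ holds.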
Proof. Accepting runs of $A$ having no occurrence of $q$ are also accepting runs of $A_{q \leadsto \bar{q}}$. An accepting run of $A$ having occurrences of $q$ can be converted into an accepting run of $A_{q \leadsto \bar{q}}$ by choosing one subterm $t$ reaching $q$ and replacing $q$ by $\bar{q}$ at all positions with subterms equivalent to $t$ by the relation induced by $E$.
Accepting runs of $A_{q \leadsto \bar{q}}$ can be converted into accepting runs of $A$ by replacing each occurrence of $\bar{q}$ by $q$.
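The rule set and final-state set of the synonym construction in Definition 4.5 can be sketched in Python as follows. The encoding is hypothetical: a rule is a triple `(symbol, argument_states, target_state)`, brother constraints attached to rules are ignored, and the constraint component is not modelled at all.

```python
from itertools import product

def synonym_rules(rules, q, q_bar):
    """All rules obtained by replacing any subset of the occurrences of q
    (in arguments or target) by the fresh synonym state q_bar."""
    result = set()
    for symbol, args, target in rules:
        slots = list(args) + [target]
        # Each slot holding q may keep q or switch to q_bar; others are fixed.
        options = [(s, q_bar) if s == q else (s,) for s in slots]
        for choice in product(*options):
            result.add((symbol, tuple(choice[:-1]), choice[-1]))
    return result

def synonym_finals(finals, q, q_bar):
    """Final states after adding the synonym: q_bar is final iff q is."""
    if q in finals:
        return set(finals) | {q_bar}
    return set(finals)

example_rules = {("a", (), "p"), ("g", ("q",), "q")}
```

With `example_rules`, the rule `g(q) → q` has two occurrences of `q`, so it gives rise to four rules, while `a → p` is left unchanged.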
The following lemma makes use of synonyms in order to remove a negative literal of the form $\neg(q \approx q')$ preserving the language. The next one, Lemma 4.8, analogously permits to remove a negative literal of the form $\neg(q \not\approx q')$.
Proof. Accepting runs of $A$ can be converted into accepting runs of $A'$ as follows. First, we choose two subterms $t$ and $t'$ different modulo the equivalence relation induced by $E$ and reaching $q$ and $q'$, respectively. Note that these terms must exist in order to satisfy the literal $\neg(q \approx q')$ of $C$. Second, we replace $q$ by $\bar{q}$ at all the positions with subterms equivalent to $t$ by the relation induced by $E$. Similarly, we replace $q'$ by $\bar{q}'$ at all the positions with subterms equivalent to $t'$ by the relation induced by $E$. This way, the subconstraint $\|\bar{q}\| = 1 \wedge \|\bar{q}'\| = 1 \wedge \bar{q} \not\approx \bar{q}'$ is satisfied, but also $C'_{q,q' \leadsto \bar{q},\bar{q}'}$ is satisfied.

Accepting runs of $A'$ can be converted into accepting runs of $A$ by replacing each occurrence of $\bar{q}$ by $q$, and each occurrence of $\bar{q}'$ by $q'$. Note that the subconstraint $\|\bar{q}\| = 1 \wedge \|\bar{q}'\| = 1 \wedge \bar{q} \not\approx \bar{q}'$ ensures the existence of such occurrences, and with subterms which are different modulo the equivalence relation induced by $E$. Thus, the literal $\neg(q \approx q')$ of $C$ is satisfied. The constraint $C'$ is also satisfied.

Lemma 4.8. Consider the same assumptions as in Lemma 4.7, except that $C$ is of the form $\neg(q \not\approx q') \wedge C'$ and the constraint of …

Proof. Analogous to the proof of Lemma 4.7.
The following definition will be used to remove literals of type $\|.\|_{\mathbb{N}}$.

Definition 4.9. Let $C$ be a constraint, and let $k$ be a natural number. By $C_{\|q\| \leadsto k}$ we define the constraint obtained from $C$ by replacing all occurrences of $\|q\|$ by $k$.
The following two lemmas show how to remove literals of the form $\|q\| = 1$ or $\|q\| = 0$ preserving the language.
Proof.Accepting runs of A ′ and A coincide because the constraints C and C A ′ have the same semantics.

Lemma 4.11. Let
Proof. Accepting runs of $A'$ and $A$ coincide because the constraints $C$ and $C_{A'}$ have the same semantics.

Now, we will use the above lemmas in order to iteratively remove all negative literals and the arithmetic literals of type $\|.\|_{\mathbb{N}}$. Each removal step is not defined for arbitrary normalized TABG[$\approx, \not\approx, \mathbb{N}$], but just for normalized conjunctive TABG[$\approx, \not\approx, \mathbb{N}$]. For this reason, we first describe how to transform a given normalized TABG[$\approx, \not\approx, \mathbb{N}$] into a list of normalized conjunctive TABG[$\approx, \not\approx, \mathbb{N}$] such that the union of their languages coincides with the language of the original TABG[$\approx, \not\approx, \mathbb{N}$].

Definition 4.12. Let $A = \langle Q, \Sigma, F, \Delta, E, C \rangle$ be a normalized TABG[$\approx, \not\approx, \mathbb{N}$], where $C$ is a disjunction $C_1 \vee \ldots \vee C_n$ of conjunctions of literals. The subdivision of $A$ is the list of automata $\langle Q, \Sigma, F, \Delta, E, C_1 \rangle, \ldots, \langle Q, \Sigma, F, \Delta, E, C_n \rangle$. These automata are conjunctive and normalized and, moreover, the union of their languages coincides with $L(A)$.

Iteratively, we will transform a list of normalized conjunctive TABG[$\approx, \not\approx, \mathbb{N}$] into a new list of automata of the same kind but with simplified constraints, preserving the language. In order to show that this process terminates, we define a measure on normalized conjunctive TABG[$\approx, \not\approx, \mathbb{N}$] which will decrease at each step. Moreover, a case with minimal measure corresponds to a positive TABG[$\approx, \not\approx, |.|_{\mathbb{N}}$]. This measure is a pair of natural numbers which depends on the constraint $C$ of the normalized conjunctive TABG[$\approx, \not\approx, \mathbb{N}$]. In the first component we have the number of negative literals in $C$. In the second component we have the sum of the isolated constants in all arithmetic literals of type $\|.\|_{\mathbb{N}}$ plus the number of uses of the function symbol $\|.\|$.

Definition 4.13. We define the measure of a normalized conjunctive constraint $C$ as a pair of natural numbers. We describe it by distinguishing the following cases.
• If $C$ is of the form $q_1 \approx q_2$ or $q_1 \not\approx q_2$, then its measure is $\langle 0, 0 \rangle$.
• If $C$ is of the form $(a_1 \|q_1\| + \ldots + a_n \|q_n\| \otimes k)$, where $\otimes$ is in $\{=, \geq, \leq\}$, then its measure is $\langle 0, k + n \rangle$.

The measure of $A$ is defined as the measure of its constraint $C$. We say that $A_1$ is bigger than $A_2$ (or, equivalently, that $A_2$ is smaller than $A_1$), denoted $A_1 > A_2$ (or $A_2 < A_1$), if the measure of $A_1$ is bigger than the measure of $A_2$ according to the lexicographic extension of the relation $>$ on natural numbers.
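The measure described before Definition 4.13 can be sketched in Python. The encoding of literals as tagged tuples is hypothetical, and the second component follows the prose description (sum of the isolated constants of the distinct-subterm literals plus the number of uses of the distinct-subterm counting symbol); Python tuples are compared lexicographically, which matches the ordering used for the termination argument.

```python
def measure(literals):
    """Measure of a conjunctive constraint given as a list of literals.

    Hypothetical literal encodings:
      ("not", atom)                      negative equality/disequality literal
      ("eq", q1, q2), ("neq", q1, q2)    positive global constraints
      ("count", coeffs, op, k)           literal over |q| (occurrences)
      ("distinct", coeffs, op, k)        literal over ||q|| (distinct subterms)
    """
    negatives = sum(1 for lit in literals if lit[0] == "not")
    weight = 0
    for lit in literals:
        if lit[0] == "distinct":
            _tag, coeffs, _op, k = lit
            weight += k + len(coeffs)   # constant plus number of ||.|| uses
    return (negatives, weight)

before = [("not", ("eq", "q", "p")), ("distinct", {"q": 2, "p": 1}, "<=", 3)]
after = [("eq", "q", "p"), ("distinct", {"p": 1}, "<=", 3)]
```

Here `measure(before)` is `(1, 5)` and `measure(after)` is `(0, 4)`, so the second constraint is strictly smaller in the lexicographic order.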
The following lemma shows that any normalized conjunctive TABG[$\approx, \not\approx, \mathbb{N}$] with non-minimal measure can be transformed into a list of TABG[$\approx, \not\approx, \mathbb{N}$] of the same kind with smaller measures and preserving the language.

Lemma 4.14. Let $A = \langle Q, \Sigma, F, \Delta, E, C \rangle$ be a normalized conjunctive TABG[$\approx, \not\approx, \mathbb{N}$] whose measure is not $\langle 0, 0 \rangle$. Then one can construct normalized conjunctive TABG[$\approx, \not\approx, \mathbb{N}$] $A_1, \ldots, A_n$ with the same equational theory $E$, each of them having a measure smaller than $A$ and such that $L(A) = L(A_1) \cup \ldots \cup L(A_n)$ holds.
Proof. In the case where $C$ has some negative literal $\neg(q \approx q')$ or $\neg(q \not\approx q')$, the transformations described in Lemmas 4.7 and 4.8 give a new TABG[$\approx, \not\approx, \mathbb{N}$] $A'$, and the subdivision $A_1, \ldots, A_n$ of the normalization of $A'$ (as defined in Definition 4.12) is such that the constraints $C_{A_1}, \ldots, C_{A_n}$ have one fewer negative literal than $C$. Thus, the measure of each of these automata is smaller than the measure of $A$.
In the case where $C$ has no negative literals of the form $\neg(q \approx q')$ or $\neg(q \not\approx q')$, its measure is of the form $\langle 0, m \rangle$ for $m > 0$. It follows that there is at least one literal of the form $(a \|q\| + \sum a_i \cdot \|q_i\| \otimes k)$, where $\otimes$ is in $\{=, \geq, \leq\}$. We consider a new state $\bar{q}$ and the automaton $A_{q \leadsto \bar{q}}$. Its constraint $C_{q \leadsto \bar{q}}$ is of the form $\bigl((\|q\| = 0 \wedge \|\bar{q}\| = 0) \vee (\|\bar{q}\| = 1 \wedge q \not\approx \bar{q})\bigr) \wedge C'$. Note that, according to Definition 4.5, $C'$ is a conjunction because there are no negative literals of the form $\neg(q \approx q')$ or $\neg(q \not\approx q')$ in $C$. Thus, $C_{q \leadsto \bar{q}}$ can be rewritten as the disjunction of two conjunctions $C_1$ and $C_2$, where $C_1$ is $\|q\| = 0 \wedge \|\bar{q}\| = 0 \wedge C'$ and $C_2$ is $\|\bar{q}\| = 1 \wedge q \not\approx \bar{q} \wedge C'$. Hence, the subdivision of the normalization of $A_{q \leadsto \bar{q}}$ consists of the automata $A_1, A_2$ obtained from $A_{q \leadsto \bar{q}}$ by replacing its constraint by $C_1$ and $C_2$, respectively. The measures of $C_1$ and $C_2$ may be bigger than the one of $C$. In order to conclude, for each case we show that additional transformations can be applied to $A_1$ and $A_2$, producing automata with smaller measures than the one of $A$ and preserving the represented language.
• The literals of $C_1$ of type $\|.\|_{\mathbb{N}}$ are $\|q\| = 0$ and $\|\bar{q}\| = 0$, and those obtained from the literals of $C$ of type $\|.\|_{\mathbb{N}}$ by replacing $\|q\|$ by $\|q\| + \|\bar{q}\|$. Note that original literals of the form $(a \|q\| + \sum a_i \cdot \|q_i\| \otimes k)$ have been converted into $(a \|q\| + a \|\bar{q}\| + \sum a_i \cdot \|q_i\| \otimes k)$, and recall that there is at least one literal of this form in $C$. Applying to $A_1$ the transformation described in Lemma 4.11 for $q$ and $\bar{q}$, each one of the above literals is transformed into $(a \cdot 0 + a \cdot 0 + \sum a_i \cdot \|q_i\| \otimes k)$, which has a smaller measure than the original literal $(a \|q\| + \sum a_i \cdot \|q_i\| \otimes k)$. Moreover, the literals $\|q\| = 0$ and $\|\bar{q}\| = 0$ are converted into $|q| = 0$ and $|\bar{q}| = 0$, respectively. In summary, the measure of $((C_1)_{\|q\| \leadsto 0})_{\|\bar{q}\| \leadsto 0}$ is smaller than the one of $C$.
• Similarly, the literals of $C_2$ of type $\|.\|_{\mathbb{N}}$ are $\|\bar{q}\| = 1$ and those obtained from the literals of $C$ of type $\|.\|_{\mathbb{N}}$ by replacing $\|q\|$ by $\|q\| + \|\bar{q}\|$. As above, note that original literals of the form $(a \|q\| + \sum a_i \cdot \|q_i\| \otimes k)$ have been transformed into $(a \|q\| + a \|\bar{q}\| + \sum a_i \cdot \|q_i\| \otimes k)$, and recall that there is at least one literal of this form in $C$. Applying to $C_2$ the transformation described in Lemma 4.10 for $\bar{q}$, each one of the above literals is converted into $(a \cdot \|q\| + a \cdot 1 + \sum a_i \cdot \|q_i\| \otimes k)$. The normalization of such a literal is the normalization of $(a \cdot \|q\| + \sum a_i \cdot \|q_i\| \otimes k - a)$, which might be already normalized or must be replaced by true or false in order to normalize it, depending on $k - a$ and $\otimes$. In every case, the resulting literal has a smaller measure than the original literal $(a \|q\| + \sum a_i \cdot \|q_i\| \otimes k)$. Moreover, the literal $\|\bar{q}\| = 1$ is replaced by $|\bar{q}| \geq 1 \wedge \bar{q} \approx \bar{q}$ as a consequence of the transformation of Lemma 4.10. To summarize, the measure of $(C_2)_{\|\bar{q}\| \leadsto 1}$ is smaller than the one of $C$.

Proof. Without loss of generality, the constraint $C$ can be assumed to be normalized. The subdivision of $A$ is a collection of normalized conjunctive TABG[$\approx, \not\approx, \mathbb{N}$] such that the union of their languages coincides with $L(A)$.
By iterated application of Lemma 4.14 to each automaton of the subdivision, combined with the fact that the ordering on measures is well founded, we conclude the effective existence of normalized conjunctive TABG[$\approx, \not\approx, \mathbb{N}$] $A_1, \ldots, A_n$ such that $L(A) = L(A_1) \cup \ldots \cup L(A_n)$ and each of them has measure $\langle 0, 0 \rangle$. This kind of automata are, in fact, TABG∧[$\approx, \not\approx, |.|_{\mathbb{N}}$], since measure $\langle 0, 0 \rangle$ implies that negative literals and literals of type $\|.\|_{\mathbb{N}}$ do not occur.

Now, in order to remove all arithmetic constraints, it remains to remove the ones of type $|.|_{\mathbb{N}}$. This is a rather easy task. For a given such automaton $A$, we define an automaton $A_N$ whose purpose is to simulate the computations of $A$. To this end, the states of $A_N$ count the number of occurrences of the states of $A$ in the simulated computation, up to a certain maximum value. This allows $A_N$ to check the constraints of type $|.|_{\mathbb{N}}$ of $A$ directly through states. Thus, each state of $A_N$ is of the form $q_M$ for a state $q$ of $A$ and a mapping $M : Q_A \to \mathbb{N}$, that is, a mapping counting the number of occurrences of each state.
We define $\mathit{max}_A$ as one plus the maximum isolated constant occurring in the literals of $C$ of type $|.|_{\mathbb{N}}$, i.e. one plus the maximum constant $k$ occurring in a literal of $C$ of the form $(a_1 |q_1| + \ldots + a_n |q_n| \otimes k)$. Given two mappings $M_1 : Q \to \{0, \ldots, \mathit{max}_A\}$ and $M_2 : Q \to \{0, \ldots, \mathit{max}_A\}$, the sum of $M_1$ and $M_2$ is defined as the mapping $M_1 + M_2 : Q \to \{0, \ldots, \mathit{max}_A\}$ satisfying $(M_1 + M_2)(q) = \min(M_1(q) + M_2(q), \mathit{max}_A)$. Given a state $q$ in $Q$ we define $M_q : Q \to \{0, \ldots, \mathit{max}_A\}$ as the mapping satisfying $M_q(q) = 1$ and $M_q(q') = 0$ for all $q' \in Q \setminus \{q\}$.
Proof. The accepting runs of $A$ can be converted into accepting runs of $A_N$ and vice versa, following the transformations described below.
• A run $r_N$ of $A_N$ can be converted into a run $r$ of $A$ by replacing each occurrence of a state $q_M$ by the corresponding state $q$.
• A run $r$ of $A$ can be converted into a run $r_N$ of $A_N$. The transformation can be defined recursively as follows. Let $r$ be a run of the form $(f(q_1, \ldots, q_m) \xrightarrow{D} q)(r_1, \ldots, r_m)$. Let $(r_1)_N, \ldots, (r_m)_N$ be the transformations of $r_1, \ldots, r_m$, and let $(q_1)_{M_1}, \ldots, (q_m)_{M_m}$ be the states reached by $(r_1)_N, \ldots, (r_m)_N$, respectively. Then, $r_N$ is $(f((q_1)_{M_1}, \ldots, (q_m)_{M_m}) \xrightarrow{D} q_{M_1 + \ldots + M_m + M_q})((r_1)_N, \ldots, (r_m)_N)$.

Each one of the two above transformations is the inverse of the other. Thus, they describe a bijection between runs of $A$ and runs of $A_N$. Moreover, for each run $r$ of $A$, the state $q_M$ reached by $r_N$ satisfies $M(q') = \min(|r^{-1}(q')|, \mathit{max}_A)$ for each $q' \in Q$ (note that $r^{-1}(q')$ is the set of positions reaching state $q'$). Hence, by the definition of $F_N$, it follows that $q$ is in $F$ and $r$ satisfies the arithmetic constraints of $C$ if and only if $q_M$ is in $F_N$. As a consequence, $r$ is accepting if and only if $r_N$ is accepting. Thus, $L(A_N) = L(A)$ holds.
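The bottom-up computation of the counting mappings can be sketched in Python. The encoding is hypothetical: a run is a nested pair `(state, children)`, a mapping M is a dictionary from states to capped counts, and `cap` plays the role of the maximum value defined above (assumed to be at least 1). The sketch only illustrates the saturating sum and the invariant on M stated in the proof; it does not build the automaton itself.

```python
from collections import Counter

def saturating_sum(m1, m2, cap):
    """Pointwise sum of two mappings, truncated at cap."""
    keys = set(m1) | set(m2)
    return {q: min(m1.get(q, 0) + m2.get(q, 0), cap) for q in keys}

def annotate(run, cap):
    """Return (state, M) for the root of the run, where M is the saturating
    sum of the children's mappings plus the unit mapping of the root state."""
    state, children = run
    total = {state: 1}                      # the unit mapping M_q
    for child in children:
        _child_state, child_map = annotate(child, cap)
        total = saturating_sum(total, child_map, cap)
    return state, total

def count_states(run):
    """Exact number of occurrences of each state in the run."""
    state, children = run
    counts = Counter({state: 1})
    for child in children:
        counts += count_states(child)
    return counts

example_run = ("q", [("p", [("p", [])]), ("p", [])])
```

On `example_run` with `cap = 2`, the state `p` occurs three times but is recorded as 2, which agrees with the invariant that M records the number of occurrences truncated at the cap.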
The following corollary is a consequence of Corollary 4.15 combined with Lemma 4.17.
Then, one can construct some TABG∧[$\approx, \not\approx$] $A_1, \ldots, A_n$ with the same equational theory such that $L(A) = L(A_1) \cup \ldots \cup L(A_n)$ holds.

As a final step, we show that TABG∧[$\approx, \not\approx$] are closed under union for a fixed $E$.
Lemma 4.19. Let $A_1$ and $A_2$ be TABG∧[$\approx, \not\approx$] with the same equational theory $E$. Then, a TABG∧[$\approx, \not\approx$] $A$ with the same equational theory $E$ can be effectively constructed satisfying $L(A) = L(A_1) \cup L(A_2)$.

Proof. Without loss of generality we can assume that the sets of states $Q_1$ and $Q_2$ are disjoint.
In the case where C 1 is just false the result follows by defining A := A 2 .Similarly, in the case where C 2 is just false the result follows by defining A := A 1 .From now on we assume that these cases do not take place.
We define $A$ as $\langle Q_1 \cup Q_2, \Sigma, F_1 \cup F_2, \Delta_1 \cup \Delta_2, E, C_1 \wedge C_2 \rangle$. It is clear that any accepting run of $A$ is also an accepting run of either $A_1$ or $A_2$. Moreover, it can be proved that any accepting run of either $A_1$ or $A_2$ is also an accepting run of $A$. We show this fact only for $A_1$, since the case for $A_2$ is analogous.
Let $r$ be an accepting run of $A_1$. Then, $r \models C_1$ holds. In order to see that it is, in fact, an accepting run of $A$, it remains to prove $r \models C_2$. Since $A_2$ is a TABG∧[$\approx, \not\approx$], $C_2$ is a conjunction of positive literals of type $\approx$, $\not\approx$ applied to states of $Q_2$. Therefore, $r \models C_2$ holds, since $C_2$ is not false and any positive literal holds because $r$ uses only states from $Q_1$.

Then, one can construct a TABG∧[$\approx, \not\approx$] $A'$ with the same equational theory $E$ such that $L(A') = L(A)$.
Corollary 4.21.The class of TABG languages (modulo the same equational theory) is closed under union.
In order to complete the closure results for TABG languages under basic set operations, we show that they are also closed under intersection, but not under complementation.Proof.We use a classical Cartesian product of sets of states, with a careful redefinition of constraints on this product.
Lemma 4.23.The class of TABG languages is not closed under complementation.
Proof. To prove the statement it suffices to define a language $L$ such that $L$ is not recognizable by TABG but its complement $\overline{L}$ is. In order to simplify the presentation, we denote terms of the form f(g … Let $L$ be the language defined as: …

In order to prove that $L$ is not recognizable by TABG, by Corollary 4.20, it suffices to prove it for TABG∧[$\approx, \not\approx$]. We proceed by contradiction, assuming that there exists a TABG∧[$\approx, \not\approx$] $A$ such that $L(A) = L$. Let $t \in L$ be the term $[1, \ldots, n, n, \ldots, 1]$, where $n > |Q_A|$, and let $r$ be an accepting run of $A$ on $t$. By the pigeonhole principle, there exist $i, j \in \{1, \ldots, n\}$, with $i < j$, such that the positions $p_i = 2.\ldots.2$ (with $i - 1$ occurrences of 2) and $p_j = 2.\ldots.2$ (with $j - 1$ occurrences of 2) satisfy $r(p_i) = r(p_j)$. Let $r'$ be the replacement $r[r|_{p_j}]_{p_i}$. Note that $r'$ is an accepting run of $\mathit{ta}(A)$ on the term $[1, \ldots, i - 1, j, \ldots, n, n, \ldots, 1]$, which is not in $L$. To conclude, it remains to prove that the constraints of $A$ are satisfied in $r'$. First, note that this replacement only introduces new subterms at the positions $P = \{p \in \mathit{Pos}(r) \mid p < p_i\}$. Moreover, the rules applied by $r'$ at positions in $P$ are the same as in $r$, and any constraint affecting a position in $P$ in $r$ is necessarily a disequality, since $\mathit{term}(r|_p) \neq_{E_A} \mathit{term}(r|_{p'})$ holds for $p \in P$ and $p' \in \mathit{Pos}(r) \setminus \{p\}$. By the definition of $r'$, necessarily $\mathit{term}(r'|_p) \neq_{E_A} \mathit{term}(r'|_{p'})$ holds also for $p \in P$ and $p' \in \mathit{Pos}(r') \setminus \{p\}$. Therefore, $r'$ satisfies all the constraints, and hence, $r'$ is an accepting run of $A$, a contradiction.
It remains to prove that $\overline{L}$ can be recognized by a TABG. We start by decomposing $\overline{L}$ into simpler languages. First, let $L_1$ be the language of the malformed terms, i.e. the terms over $\{f : 2, g : 1, a : 0\}$ that are not of the form $[n_1, \ldots, n_k]$. Second, let $L_2$ be the language of the well-formed terms $[n_1, \ldots, n_k]$ such that for some $i \in \{1, \ldots, k\}$ there exists no $j \in \{1, \ldots, k\} \setminus \{i\}$ satisfying $n_i = n_j$. Third, let $L_3$ be the language of the well-formed terms $[n_1, \ldots, n_k]$ such that there exist different $i_1, i_2, i_3 \in \{1, \ldots, k\}$ satisfying

Emptiness Decision Algorithm
In this section we prove the decidability of the emptiness problem for TABG∧. As a consequence of this result and the results of Section 4, the decidability of emptiness follows for TABG and, even more, for TABG[$\approx, \not\approx, \mathbb{N}$].
The decidability of emptiness for TABG∧ is proved in three steps. In Subsection 5.1, we present a new notion of pumping which allows transforming a run into a smaller run under certain conditions. In Subsection 5.2, we define a well quasi-ordering $\leq$ on a certain set $S$. In Subsection 5.3, we connect the two previous subsections by describing how to compute, for each run $r$ with height $h = h(r)$, a certain sequence $e_h, \ldots, e_1$ of elements of $S$ satisfying the following fact: there exists a pumping on $r$ if and only if $e_i \leq e_j$ for some $h \geq i > j \geq 1$. Moreover, each $e_i$ of the computed sequence is chosen among a finite number of possibilities. Finally, all of these constructions are used as follows. Suppose the existence of an accepting run $r$. If $r$ is "too high", the fact that $\leq$ is a well quasi-ordering and the properties of the sequence imply the existence of such $i, j$. Thus, there exists a pumping providing a smaller accepting run $r'$. We conclude the existence of a computable bound for the height of a minimum accepting run, and hence, decidability of emptiness.
5.1. Global Pumpings. Pumping is a traditional concept in automata theory, and in particular, it is very useful in order to reason about tree automata. The basic idea is to convert a given run $r$ into another run by replacing a subrun at a certain position $p$ in $r$ by a run $r'$, thus obtaining a run $r[r']_p$. Pumpings are useful for deciding emptiness: if a "big" run can always be reduced by a pumping, then decision of emptiness is obtained by a search of an accepting "small" run. For plain tree automata, a necessary and sufficient condition to ensure that $r[r']_p$ is a run is that the resulting states of $r|_p$ and $r'$ coincide, since the correct application of a rule at a certain position depends only on the resulting states of the subruns of the direct children. In this case, an accepting run with height bounded by the number of states exists, whenever the accepted language is not empty.
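For plain tree automata, the consequence of this pumping argument is the usual emptiness test by state reachability, which can be sketched in Python as below. This is the textbook fixpoint for automata without constraints, not the procedure developed in this section for TABG∧; the encoding of rules as triples `(symbol, argument_states, target_state)` is an assumption of the sketch.

```python
def reachable_states(rules):
    """Least set of states reachable by some run (bottom-up fixpoint)."""
    reached = set()
    changed = True
    while changed:
        changed = False
        for _symbol, args, target in rules:
            # A rule fires once all of its argument states are reachable;
            # rules for constants have no arguments and fire immediately.
            if target not in reached and all(s in reached for s in args):
                reached.add(target)
                changed = True
    return reached

def is_empty(rules, final_states):
    """The language is empty iff no final state is reachable."""
    return not (reachable_states(rules) & set(final_states))

plain_rules = [
    ("a", (), "qa"),
    ("g", ("qa",), "qg"),
    ("f", ("qg", "qb"), "qf"),   # qb is never produced, so qf is unreachable
]
```

Each pass of the loop adds at least one state or stops, so the number of passes is bounded by the number of states, mirroring the height bound mentioned above.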
When the tree automaton has equality and disequality constraints, the constraints may be falsified when replacing a subrun by a new run.For TABG ∧ , we will define a notion of pumping ensuring that the constraints are satisfied.This notion of pumping requires to perform several replacements in parallel.We first define the sets of positions involved in such kind of pumping.
Definition 5.1.Let A be a TABG ∧ .Let r be a run of A. Let i be an integer between 1 and h(r).We define Example 5.2.According to Definition 5.1, for our running example (Example 3.4), we have the H i , Ȟi and Hi presented in Figure 4.
The following lemma is rather straightforward from the previous definition.
Lemma 5.3.Let A be a TABG ∧ .Let r be a run of A. Let i be an integer between 1 and h(r).Then, any two different positions in H i ∪ Ȟi ∪ Hi are parallel, and for any arbitrary position p in Pos(r) there is a position p in H i ∪ Ȟi ∪ Hi such that, either p is a prefix of p, or p is a prefix of p.
Proof.For the first fact, note that any proper prefix p of a position p in H i ∪ Ȟi ∪ Hi satisfies h(r| p ) > i.Thus, such a p is not in H i ∪ Ȟi ∪ Hi .For the second fact, consider any p in Pos(r).If h(r| p ) ≤ i holds, then the smallest position p satisfying p ≤ p and h(r| p) ≤ i is in H i ∪ Ȟi ∪ Hi , and we are done.Otherwise, if h(r| p ) > i holds, then the smallest position p of the form p.1. . . ..1 and satisfying h(r| p) ≤ i is in H i ∪ Ȟi ∪ Hi , and we are done.
Definition 5.4.Let A be a TABG ∧ .Let E be E A .Let r be a run of A. Let i, j be integers satisfying 1 Let {p 1 , . . ., pn } be H i ∪ Ȟi ∪ Hi more explicitly written.The run r[r| I(p 1 ) ] p1 . . .[r| I(pn) ] pn is called a global pumping on r with indexes i, j, and injection I.
] pn is a run of ta(A), but it is still necessary to prove that it is a run of A. By abuse of notation, when we write r[r| I(p 1 ) ] p1 . . .[r| I(pn) ] pn , we sometimes consider that I and {p 1 , . . ., pn } are still explicit, and say that it is a global pumping with some indexes 1 ≤ j < i ≤ h(r).
Our goal is to prove that any global pumping r[r| I(p 1 ) ] p1 . . .[r| I(pn) ] pn is a run, and in particular, that all equality and disequality constraints are satisfied.To this end we first state the following intermediate statement, which determines the height of the terms • Assume that both p 1 and p 2 are proper prefixes of positions in H i ∪ Ȟi ∪ Hi .Note that, in this case, symbol(r ′ | p 1 ) = symbol(r| p 1 ) and symbol(r ′ | p 2 ) = symbol(r| p 2 ) hold.Let symbol(r| p 1 ) and symbol(r| p 2 ) be f and g, with arities n and m, respectively.Recall that I is the identity for the positions in Hi , and hence, a position α in {1, . . ., n} satisfies symbol(r| Similarly, a position β in {1, . . ., m} satisfies symbol(r| Moreover, since such positions p 1 .αand p 2 .βare prefixes of positions in H i ∪ Ȟi ∪ Hi , by induction hypothesis, (term(r| )) for all such α in {1, . . ., n} and β in {1, . . ., m}.By Lemma 2.1, (term(r| follows, and we are done.Now we prove that the result of a global pumping preserves the satisfaction of the global constraints. Lemma 5.9.Let A be a TABG ∧ .Let r be a run of A. Let r ′ be the global pumping r[r| I(p 1 ) ] p1 . . .[r| I(pn) ] pn with indexes 1 ≤ j < i ≤ h(r) and injection I.Then, r ′ satisfies all global constraints of A.
• Suppose that one of p 1 , p 2 , say p 1 , is a proper prefix of a position in H i ∪ Ȟi ∪ Hi , and that p 2 satisfies that some position in H i ∪ Ȟi ∪ Hi is a prefix of p 2 . Then, h(r′| p 2 ) is smaller than or equal to j, and r′| p 2 is also a subrun of r. Moreover, p 1 is also a position of r, r′(p 1 ) = r(p 1 ) holds, and h(r| p 1 ) = i + k holds for some k > 0. Hence, term(r| p 1 ) ≠ E term(r′| p 2 ) holds. Since r is a run and r′| p 2 is a subrun of r, the atom involving r(p 1 ) and r′(p 2 ) is necessarily of the form r(p 1 ) ≉ r′(p 2 ). Thus, the atom involving r′(p 1 ) and r′(p 2 ) is necessarily of the form r′(p 1 ) ≉ r′(p 2 ). By Lemma 5.6, h(r′| p 1 ) is j + k. Therefore, also term(r′| p 1 ) ≠ E term(r′| p 2 ) holds, and hence, such an atom is satisfied for such positions in r′.
• Suppose that both p 1 , p 2 are proper prefixes of positions in H i ∪ Ȟi ∪ Hi . Then, p 1 , p 2 are positions of r satisfying h(r| p 1 ), h(r| p 2 ) ≥ i. Moreover, r(p 1 ) = r′(p 1 ) and r(p 2 ) = r′(p 2 ) hold. Since r is a run, the atom involving r(p 1 ) and r(p 2 ) is satisfied in the run r for positions p 1 and p 2 . By Lemma 5.8, (term(r| p 1 ) = E term(r| p 2 )) ⇔ (term(r′| p 1 ) = E term(r′| p 2 )) holds. Thus, the atom involving r′(p 1 ) and r′(p 2 ) is satisfied in the run r′ for positions p 1 and p 2 .
Finally, we prove that the result of a global pumping preserves the satisfaction of the constraints between brothers.
Lemma 5.10. Let A be a TABG^∧. Let r be a run of A. Let r′ be the global pumping r[r| I(p 1 ) ] p1 . . . [r| I(pn) ] pn with indexes 1 ≤ j < i ≤ h(r) and injection I. Then, r′ satisfies all constraints between brothers of A.
Proof. Let us consider a position p of Pos(r′) and two positions i 1 , i 2 involved in a constraint of the rule used at position p in r′, i.e. either γ = (i 1 ≈ i 2 ) or γ = (i 1 ≉ i 2 ) occurs in this constraint. According to Lemma 5.3, we can distinguish the following cases:
• Suppose that a position in H i ∪ Ȟi ∪ Hi is a prefix of p. Then, r′| p is also a subrun of r. Thus, since r is a run, the constraint is satisfied.
holds. Thus, the atom involving i 1 and i 2 is satisfied in the run r′ for position p.
As a consequence of the previous lemmas, we have that the result of a global pumping satisfies all constraints.
Corollary 5.11. Let A be a TABG^∧. Let r be a run of A. Let r′ be the global pumping r[r| I(p 1 ) ] p1 . . . [r| I(pn) ] pn with indexes 1 ≤ j < i ≤ h(r) and injection I. Then, r′ is a run of A.

5.2. A well quasi-ordering. In this subsection we define a well quasi-ordering. It ensures the existence of a computable bound on the length of certain sequences of elements of the corresponding well quasi-ordered set. It will be connected with global pumpings in the next subsection.
We define the extension of ≤ to pairs of multisets of n-tuples of natural numbers as ⟨P 1 , P̌ 1 ⟩ ≤ ⟨P 2 , P̌ 2 ⟩ if P 1 ≤ P 2 and P̌ 1 ≤ P̌ 2 .
As a direct consequence of Higman's Lemma [Gal91] we have the following:

Lemma 5.13. Given n, ≤ is a well quasi-ordering for pairs of multisets of n-tuples of natural numbers.
In any infinite sequence e 1 , e 2 , . . . of elements from a well quasi-ordered set there always exist two indexes i < j satisfying e i ≤ e j . In general, this fact does not imply the existence of a bound for the length of sequences without such indexes. For example, the relation ≤ between natural numbers is a well quasi-ordering, but there exist arbitrarily long sequences x 1 , . . ., x k of natural numbers such that x i > x j for all 1 ≤ i < j ≤ k. In order to bound the length of such sequences, it is sufficient to force that the first element and each next element of the sequence are chosen among a finite number of possibilities. Indeed, in this case, by König's lemma, the prefix tree describing all such (finite) sequences is finite. As a particular case of this fact we have the following result (the proof is standard, but we include it for completeness).
Lemma 5.14. There exists a computable function B : N × N → N such that, given two natural numbers a, n, B(a, n) is a bound for the length ℓ of any sequence ⟨T 1 , Ť1 ⟩, . . ., ⟨T ℓ , Ťℓ ⟩ of pairs of multisets of n-tuples of natural numbers such that the following conditions hold:
(1) The tuple ⟨0, . . ., 0⟩ does not occur in any T i , Ťi for i in {1, . . ., ℓ}.
Proof. For proving the statement, we first construct a rooted tree S = (V, E) labelled by sequences of pairs of multisets of n-tuples, where the depth of each node is equal to the length of the sequence labelling it and such that the set of internal nodes of S corresponds exactly to the set of sequences satisfying conditions (1) to (4). Second, we show that S is finite. This concludes the proof, since finiteness of S and its constructive definition imply that S is computable, and B(a, n) can be defined as the maximal depth of S.
We define V as the set of all the sequences ⟨T 1 , Ť1 ⟩, . . ., ⟨T ℓ , Ťℓ ⟩ of pairs of multisets of n-tuples satisfying the conditions (1) to (3) and such that there are no i, j satisfying 1 ≤ i < j < ℓ and ⟨T i , Ťi ⟩ ≤ ⟨T j , Ťj ⟩. This last condition, that we will refer to as (5), is weaker than (4) since in (5) we have j < ℓ instead of j ≤ ℓ. Thus, all sequences satisfying conditions (1) to (4) belong to V. Note that V contains the empty sequence, which we denote as ε. We define E ⊆ V² as the set of edges containing ⟨T 1 , Ť1 ⟩, . . ., ⟨T i , Ťi ⟩ −→ ⟨T 1 , Ť1 ⟩, . . ., ⟨T i , Ťi ⟩, ⟨T i+1 , Ťi+1 ⟩ for every such pair of sequences in V.
Clearly, S = (V, E) is a tree rooted at ε, since ε does not have an incoming edge, each sequence of length 1 has a unique incoming edge coming from ε, and each sequence of length i > 1 has a unique incoming edge coming from its unique prefix sequence of length i − 1. Also, the set of internal nodes of S is exactly the set of sequences satisfying conditions (1) to (4), and the set of leaves of S is exactly the set of sequences satisfying conditions (1) to (3), and (5), but not (4).
It remains to show that S is finite. To this end, it suffices to see that S is finitely branching and that there is no path with infinite length.
First, we prove that each node v ∈ V has a finite branching: ε links to all the sequences of length 1, the number of which is bounded by conditions (1) and (2); and each sequence ⟨T 1 , Ť1 ⟩, . . ., ⟨T i , Ťi ⟩ can only link to sequences of the form ⟨T 1 , Ť1 ⟩, . . ., ⟨T i , Ťi ⟩, ⟨T i+1 , Ťi+1 ⟩, the number of which is bounded by conditions (1) and (3).
In order to bound the height of a term of minimum height accepted by a given TABG^∧ A, Lemma 5.14 will be used by taking a to be the maximum arity of the signature of A, and n to be the number of states of A.
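To illustrate the remark on sequences of natural numbers made before Lemma 5.14, the following minimal Python sketch computes the maximal length of a sequence x 1 , . . ., x k with no indexes i < j satisfying x i ≤ x j , once the first element is restricted to a finite range. The names `longest_from`, `longest_bad_sequence` and `first_bound` are hypothetical and only serve the illustration; this is not the function B of Lemma 5.14, which deals with pairs of multisets of tuples.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def longest_from(last):
    """Maximal length of a sequence of naturals starting with `last` in which
    no earlier element is <= a later one (i.e. strictly decreasing)."""
    # Every admissible next element is strictly smaller than `last`,
    # so each node of the prefix tree has finitely many children.
    return 1 + max((longest_from(nxt) for nxt in range(last)), default=0)


def longest_bad_sequence(first_bound):
    """Maximal length when the first element ranges over {0, ..., first_bound}."""
    return max(longest_from(x1) for x1 in range(first_bound + 1))
```

With the first element bounded by `first_bound`, the maximum is `first_bound + 1`; without such a bound no maximum exists, which matches the observation that well quasi-ordering alone does not bound the length.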

5.3. Mapping a run to a sequence of the well quasi-ordered set. We will associate, to each number i in {1, . . ., h(r)}, a pair of multisets of n-tuples of natural numbers, which can be compared with other pairs according to the definition of ≤ in the previous subsection. To this end, we first associate n-tuples to terms and multisets of n-tuples to sets of positions.
Definition 5.15. Let A be a TABG^∧. Let E be E A . Let q 1 , . . ., q n be the states of A. Let r be a run of A. Let P be a set of positions of r. Let t be a term. We define r t,P as the following tuple of natural numbers:

$$r_{t,P} = \big\langle\, |\{p \in P \mid \mathrm{term}(r|_p) =_E t \wedge r(p) = q_1\}|,\ \ldots,\ |\{p \in P \mid \mathrm{term}(r|_p) =_E t \wedge r(p) = q_n\}| \,\big\rangle$$

Definition 5.16. Let A be a TABG^∧. Let E be E A . Let r be a run of A. Let P be a set of positions of r. Let {[t 1 ], . . ., [t k ]} be the set of equivalence classes modulo E of the set of terms {term(r| p ) | p ∈ P } with representatives t 1 , . . ., t k . We define r P as the multiset [r t 1 ,P , . . ., r t k ,P ].
Example 5.17. Following our running example, for the representation of the n-tuples of natural numbers we order the states as q d , q N , q id , q t , q L , q M . The multisets r H i , r Ȟi and r Hi are presented in Figure 6.
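Definitions 5.15 and 5.16 can also be read operationally. The sketch below is a minimal illustration under stated assumptions: a run is represented as a dictionary mapping each position to a pair (state, class), where the second component is a canonical name for the equivalence class modulo E of the subterm at that position. How such class names are computed is left out, and the function names are hypothetical.

```python
from collections import Counter


def class_tuple(run, positions, states, cls):
    """Tuple r_{t,P}: for each state q_k (in the fixed order `states`), the
    number of positions p in P with run[p] == (q_k, cls), where `cls` is the
    assumed canonical name of the class of t modulo E."""
    return tuple(
        sum(1 for p in positions if run[p] == (q, cls))
        for q in states
    )


def position_multiset(run, positions, states):
    """Multiset r_P: one tuple per equivalence class occurring at a position of P."""
    classes = {run[p][1] for p in positions}
    return Counter(class_tuple(run, positions, states, c) for c in classes)
```

For instance, for a run on f(a, a) where both leaves reach the same state, the leaf class contributes a single tuple whose entry for that state is 2, rather than two separate tuples.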
The following lemma connects the existence of a pump-injection with the quasi-ordering relation.
Lemma 5.18. Let A be a TABG^∧. Let r be a run of A. Let i, j be integers satisfying 1 ≤ j < i ≤ h(r). Then, there exists a pump-injection I : (H i ∪ Ȟi ∪ Hi ) → (H j ∪ Ȟj ∪ Hj ) if and only if r H i ≤ r H j and r Ȟi ≤ r Ȟj .

Proof. Although we prove both directions of the double implication, the left-to-right one is technical but not conceptually difficult, and it is not necessary for the rest of the paper. In the following, we write E for E A .
⇒) Assume that there exists a pump-injection I : (H i ∪ Ȟi ∪ Hi ) → (H j ∪ Ȟj ∪ Hj ). We just prove r H i ≤ r H j , since r Ȟi ≤ r Ȟj can be proved analogously. By Condition (C 1 ) of the definition of pump-injection, I(H i ) ⊆ H j holds. We write the equivalence classes of

The decidability of emptiness of A follows, since the existence of successful runs implies that one of them can be found among a computable and finite set of possibilities.
Using Corollary 4.20 and Theorem 5.22, we can conclude the decidability of emptiness for TABG, and more generally for TABG[≈, ≉, N].

Unranked Ordered Trees
Our tree automata models and results can be generalized from ranked to unranked ordered terms. In this setting, Σ is called an unranked signature, meaning that there is no arity fixed for its symbols, i.e. that in a term a(t 1 , . . ., t n ), the number n of children is arbitrary and does not depend on a. Let us denote by U(Σ) the set of unranked ordered terms over Σ. The notions of positions, subterms, etc., are defined for unranked terms of U(Σ) as for ranked terms of T(Σ).
We extend the definition of automata for unranked ordered terms, called hedge automata [Mur99], with global constraints. We do not consider constraints between brothers nor flat theories in this setting.

Definition 6.1. A hedge automaton with global constraints (HAG) over an unranked signature Σ is a tuple A = ⟨Q, Σ, F, ∆, C⟩ where Q is a finite set of states, F ⊆ Q is the subset of final states, C is a Boolean combination of atomic constraints of the form q ≈ q′ or q ≉ q′, with q, q′ ∈ Q, and ∆ is a set of transition rules of the form a(L) → q where a ∈ Σ, q ∈ Q and L ⊆ Q* is a regular (word) language, assumed given by a finite state automaton with input alphabet Q.
The notion of run of TAG is extended to HAG in the natural way. A run of a HAG A is a pair r = ⟨t, M⟩ where t ∈ U(Σ) is an unranked ordered term and M is a mapping from Pos(t) into ∆ A such that for each position p ∈ Pos(t) with n children, if M(p.1), . . ., M(p.n) are rules with right-hand side states q 1 , . . ., q n ∈ Q A , respectively, then M(p) is a transition rule of the form t(p)(L) → q in ∆, and the word q 1 · · · q n belongs to L. Moreover, r |= C A , where satisfiability of C A by r is defined as in Section 3. A run r is called successful (or accepting) if r(λ) ∈ F A .
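The local condition on runs (the word of children states must belong to L) can be sketched as a bottom-up computation of the states reachable at the root of a term. This is only an illustration under simplifying assumptions that are not part of the definition above: states are single characters, each language L is given as a Python regular expression instead of a finite state automaton, and the global constraint C is ignored. The name `reachable_states` and the rule format are hypothetical.

```python
import re
from itertools import product


def reachable_states(term, rules):
    """States q such that some assignment of rules to the positions of `term`
    satisfies the local condition of a hedge automaton run and ends in q.

    `term` is a pair (label, list_of_children); `rules` is a list of triples
    (symbol, pattern, target) standing for a rule symbol(L) -> target, where
    `pattern` is a regular expression over single-character state names."""
    label, children = term
    child_sets = [reachable_states(child, rules) for child in children]
    result = set()
    for symbol, pattern, target in rules:
        if symbol != label:
            continue
        # Try every word q_1 ... q_n of states reachable at the children.
        for word in product(*child_sets):
            if re.fullmatch(pattern, "".join(word)):
                result.add(target)
                break
    return result
```

A full emptiness or membership check for a HAG would additionally have to keep track of which subterms reach which states in order to evaluate C, which this sketch does not attempt.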
The emptiness decision results of Corollary 5.24 can be transposed from TAG into HAG using a standard transformation from unranked to ranked binary terms, like the extension encoding described in [CDG + 07], Chapter 8.
Let us associate to the unranked signature Σ the (ranked) signature Σ @ := {a : 0 | a ∈ Σ} ∪ {@ : 2} where @ is a new symbol not in Σ. The operator curry is a bijection from U(Σ) into T(Σ @ ) recursively defined as follows:

$$\mathrm{curry}(a) = a \ \text{ for all } a \in \Sigma, \qquad \mathrm{curry}\big(a(t_1, \ldots, t_n)\big) = @\big(\mathrm{curry}(a(t_1, \ldots, t_{n-1})),\ \mathrm{curry}(t_n)\big)$$

An example of application of this operator is presented in Figure 7. We extend the application of the operator curry to sets of unranked ordered terms by curry(L) = {curry(t) | t ∈ L}.

Proof. Let A be ⟨Q, Σ, F, ∆, C⟩ more explicitly written. Without loss of generality, we assume that for each a ∈ Σ, q ∈ Q, the set of rules ∆ contains exactly one transition of the form a(L) → q, and we denote by Āa,q the NFA recognizing the corresponding language L. Recall that such automata have Q as input alphabet. Without loss of generality, we assume that the sets of states of A and all Āa,q are pairwise disjoint. Let Q̄ be the union of all states of all the automata Āa,q . Intuitively, the transitions of the automaton A′ will simulate both the transitions of A and the transitions of the NFAs Āa,q , when running on curry(t) for some t ∈ U(Σ). Let A′ be the automaton whose set of transitions ∆′ contains the following transitions for each a ∈ Σ, q ∈ Q:
• a → q if Āa,q recognizes the empty word,
• a → q̄ where q̄ is the initial state of Āa,q ,
• @(q̄, q′) → q̄′ if there is a transition from q̄ to q̄′ reading q′ in Āa,q , and
• @(q̄, q′) → q if there is a transition from q̄ to q̄′ reading q′ in Āa,q and q̄′ is a final state of Āa,q .
It is not difficult to see that there exists an accepting run of A if and only if there exists an accepting run of A ′ .
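The operator curry used in this construction can be sketched as follows. The representation is an assumption made for the example: an unranked term is a pair (label, list of children), a constant of Σ @ is a plain string, and an application of @ is a triple whose first component is the string "@". The inverse `uncurry` is included only to illustrate that the encoding is a bijection.

```python
def curry(term):
    """curry(a) = a, and
    curry(a(t_1, ..., t_n)) = @(curry(a(t_1, ..., t_{n-1})), curry(t_n))."""
    label, children = term
    if not children:
        return label
    return ("@", curry((label, children[:-1])), curry(children[-1]))


def uncurry(encoded):
    """Inverse of curry on the representation assumed above."""
    if isinstance(encoded, str):
        return (encoded, [])
    _, left, right = encoded
    label, children = uncurry(left)
    return (label, children + [uncurry(right)])
```

Since every subterm t of the unranked term appears unchanged as curry(t) in the right argument of some @ (or as the whole encoding when t is the root), equality between subterms of the original term is preserved by this encoding.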
There exist alternative encodings from unranked to ranked trees in the literature, e.g., the first-child next-sibling encoding: see Figure 8 for an example of this transformation.This alternative encoding makes the representation of equality and disequality between subterms of the original unranked term difficult, since the transformed subterms may have original siblings occurring now as their subterms.For example, in Figure 8, the two occurrences of the subterm c correspond to different terms in the result of the transformation.
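The difficulty with the first-child next-sibling encoding can be reproduced on a small term. The sketch below uses the same assumed representation of unranked terms as pairs (label, list of children), and a padding symbol "#" for an absent first child or next sibling; both choices, as well as the term a(c, c), are illustrative and are not taken from Figure 8.

```python
def fcns(hedge):
    """First-child next-sibling encoding of a list of sibling terms.

    The result is "#" for the empty list, and otherwise the triple
    (label, encoding of the children, encoding of the following siblings)."""
    if not hedge:
        return "#"
    (label, children), rest = hedge[0], hedge[1:]
    return (label, fcns(children), fcns(rest))


def fcns_term(term):
    """Encoding of a single unranked term, seen as a one-element hedge."""
    return fcns([term])
```

In the encoding of a(c, c), the first occurrence of c carries its right sibling as a subterm while the second does not, so two equal subterms of the original term are mapped to different subterms.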

Logics on Trees
In this section, we discuss the application of our results to second order logics interpreted over domains defined by terms. We propose a strict extension of the monadic second order logic of the tree with equality, disequality and arithmetic constraints, and show that satisfiability is decidable for this extension thanks to a correspondence with TAG[≈, ≉, N].

7.1. MSO on Ranked Terms. A ranked term t ∈ T(Σ) over Σ can be seen as a model for logical formulae, with an interpretation domain which is the set of positions Pos(t). We consider monadic second order formulae interpreted on such models, built with the usual Boolean connectors, with quantifications over first order variables (interpreted as positions), denoted x, y, . . . and over unary predicates (i.e. second order variables interpreted as sets of positions), denoted X, Y, . . ., and with the following predicates:
• equality: x = y,
• membership: X(x),
• labeling: a(x), for a ∈ Σ,
• navigation: S i (x, y), for all i smaller than or equal to the maximal arity of symbols of Σ (we call +1 the type of such predicates),
• term equality: X ≈ Y, term disequality: X ≉ Y (predicate types ≈ and ≉),
• linear inequalities: ∑ a i · |X i | ≥ a or ∑ a i · ‖X i ‖ ≥ a, where every a i and a belong to Z (predicate types |.| Z and ‖.‖ Z ).
We write MSO[τ 1 , . . ., τ k ] for the set of monadic second order logic formulae with equality, membership, labeling predicates and other predicates of types τ 1 , . . ., τ k , amongst the above types +1, ≈, ≉, and |.| Z , ‖.‖ Z . We also use the notations |.| N and ‖.‖ N for natural linear inequalities (linear inequalities whose coefficients all have the same sign) and the abbreviations Z and N of Section 4.
On the other hand, the fragment ∃MSO[+1, |.| Z ] is decidable [KR02], and a fragment of ∃MSO[+1, ≈, ≉] is shown decidable in [FTT08] for a restricted variant of ≈, using a two-way correspondence between these formulae and a decidable subclass of TAGED.
Proof. Following the same proof scheme as [FTT08], we show that for every closed formula φ in ∃MSO[+1, ≈, ≉, N], we can construct a TAG[≈, ≉, N] recognizing exactly the set of models of φ. Then, the decidability of the logic follows from Theorem 5.24.
Without loss of generality, we may assume that φ is of the form ∃X 1 . . . ∃X n (φ 0 (X̄) ∧ φ ≈ (X̄) ∧ φ N (X̄)) where φ 0 (X̄) is a MSO[+1] formula with free variables X̄ = X 1 , . . ., X n , and φ ≈ (X̄) and φ N (X̄) are Boolean combinations of atoms of the respective forms X i ≈ X j , X i ≉ X j and ∑ a i · |X i | ≥ a, ∑ a i · ‖X i ‖ ≥ a. Moreover, we shall also assume that φ ≈ (X̄) and φ N (X̄) are conjunctions of atoms or negations of atoms of the above form. Otherwise, we put them into disjunctive normal form and then split φ into an equivalent formula φ 1 ∨ . . . ∨ φ k , where each φ i , i ≤ k, is of the form requested: ∃X 1 . . . ∃X n (φ i 0 (X̄) ∧ φ i ≈ (X̄) ∧ φ i N (X̄)), where φ i 0 (X̄) ∈ MSO[+1] and φ i ≈ (X̄) and φ i N (X̄) are conjunctions of atoms or negations of atoms as above, and we solve satisfiability separately for each φ i .
The above transformation also works in the other direction (this result is not necessary for the proof of Theorem 7.2 though): for every TAG[≈, ≉, N] A, we can construct a formula φ in ∃MSO[+1, ≈, ≉, N] whose set of models is L(A).
Note that ∃MSO[+1, ≈] is strictly more expressive than MSO, since the equality between subterms is not expressible in MSO (see e.g. [CDG+07]). The TA construction of [TW68] for the decidability of MSO[+1] involves the closure under projection on components for TA languages over signatures made of tuples of symbols (for the elimination of ∃ quantifiers). TAG languages are not closed under projection on some components of tuples, as is already the case for simpler forms of tree automata with equality [Tre00]. Thus, the same approach cannot be used to prove decidability of emptiness of TAG.
7.2. MSO on Unranked Ordered Terms. In unranked ordered terms of U(Σ), the number of children of a position is unbounded. Therefore, for navigating in such terms with logical formulae, the successor predicates S i (x, y) of Section 7.1 are not sufficient. In order to describe unranked ordered terms as models, we replace the above predicates S i by:
• S ↓ (x, y) (y is a child of x),
• S → (x, y) (y is the successor sibling of x).
The type of these predicates is still called +1. Note that the above predicates S 1 , S 2 , . . . can be expressed using these two predicates only.
It is shown in [SSM03] that the extension MSO[+1, |.| Z ] is undecidable for unranked ordered terms when counting constraints are applied to sibling positions. Using the results of Section 6, and an easy adaptation of the automata construction in the proof of Theorem 7.2, we can generalize Theorem 7.2 to ∃MSO over unranked ordered terms.

Conclusion
We have answered (positively) the open problem of decidability of the emptiness problem for the TAGED [FTT08], by proposing a decision algorithm for a class TABG of tree automata with global constraints strictly extending the global constraints of TAGED in several directions. Moreover, the TABG combine the global constraints with local tests between brother subterms à la [BT92] and equality interpreted modulo flat theories. Our method for emptiness decision, presented in Section 5, appeared to be robust enough to deal with several extensions like global counting constraints, and generalization to unranked terms.
A challenging question would be to investigate the precise complexity of the emptiness problem, avoiding the use of Higman's Lemma in the algorithm. For instance, in [FTT08], it is shown, using a direct reduction into solving positive and negative set constraints [CP94, GTT94, Ste94], that emptiness is decidable in NEXPTIME for TAGED (i.e. for TAG^∧[ ≈] modulo an empty theory and such that in every atomic constraint q ≈ q′, q and q′ are distinct states). On the other hand, the best known lower bound for emptiness decision for TABG is EXPTIME-hardness (this holds already for TAG^∧[≈] as shown in [FTT08]).
Another interesting problem mentioned in the introduction is the combination of the HAG of Section 6 with the unranked tree automata with tests between siblings, UTASC [WL07, LW09]. Perhaps the techniques of Section 5 could help for the emptiness decision for a formalism using for instance MSO binary querying (following e.g. [NPTT05]) for selecting the test position of global constraints.
Finally, another branch of research related to TABG concerns automata and logics for data trees, i.e. trees labeled over an infinite (countable) alphabet (see [Seg06] for a survey). Indeed, data trees can be represented by terms over a finite alphabet, with an encoding of the data values into terms. This can be done in several ways, and with such encodings, the data equality relation becomes the equality between subterms. Therefore, this could be worth studying in order to relate our results on TAG to decidability results on automata or logics on data trees like those in [JL07, BMSL09].
∧ [≈, ≉, N] such that the union of their languages coincides with the language of the original TABG[≈, ≉, N]. In this step we use arithmetic constraints for simulating the removed negative literals.
• Third, we remove arithmetic literals of type ‖.‖ N , obtaining a new list of TABG^∧[≈, ≉, |.| N ] such that the union of their languages coincides with the language of the original TABG[≈, ≉, N]. In this step we use positive literals of types ≈, ≉, and |.| N in order to simulate the removed literals of type ‖.‖ N .
• Fourth, we remove arithmetic literals of type |.| N , obtaining a new list of TABG^∧[≈, ≉] such that the union of their languages coincides with the language of the original TABG[≈, ≉, N]. In this step, new states are used for counting the number of occurrences of original states.
• Finally, we show that TABG^∧[≈, ≉] are closed under union. Hence, we obtain a single TABG^∧[≈, ≉] whose language coincides with the one of the original TABG[≈, ≉, N].

Definition 4.3. Let A = ⟨Q, Σ, F, ∆, E, C⟩ be a TABG[≈, ≉, N]. The constraint C is normalized if it is either true or false or a disjunction of conjunctions of literals, where all arithmetic literals are positive.

Lemma 4.22. The class of TABG languages (modulo the same equational theory) is closed under intersection.
an injective function such that the following conditions hold:
(C 1 ) I(H i ) ⊆ H j , I( Ȟi ) ⊆ Ȟj and I( Hi ) ⊆ Hj . Moreover, I restricted to Hi is the identity, i.e. I(p) = p for each p in Hi .
(C 2 ) For each p in H i ∪ Ȟi ∪ Hi , r(p) = r(I(p)).

Figure 8: First-child next-sibling encoding of an unranked term.