Residuality and Learning for Nondeterministic Nominal Automata

We are motivated by the following question: which data languages admit an active learning algorithm? This question was left open in previous work by the authors, and is particularly challenging for languages recognised by nondeterministic automata. To answer it, we develop the theory of residual nominal automata, a subclass of nondeterministic nominal automata. We prove that this class has canonical representatives, which can always be constructed via a finite number of observations. This property enables active learning algorithms, and makes up for the fact that residuality -- a semantic property -- is undecidable for nominal automata. Our construction for canonical residual automata is based on a machine-independent characterisation of residual languages, for which we develop new results in nominal lattice theory. Studying residuality in the context of nominal languages is a step towards a better understanding of learnability of automata with some sort of nondeterminism.


Introduction
Formal languages over infinite alphabets have received considerable attention recently. They include data languages for reasoning about XML databases [NSV04], trace languages for analysis of programs with resource allocation [GDPT13], and behaviour of programs with data flows [HJV19]. Typically, these languages are accepted by register automata, first introduced in the seminal paper [KF94]. Another appealing model is that of nominal automata [BKL14]. While nominal automata are as expressive as register automata, they enjoy convenient properties. For example, the deterministic ones admit canonical minimal models, and the theory of formal languages and many textbook algorithms generalise smoothly.
In this paper, we investigate the properties of so-called residual nominal automata. An automaton accepting a language L is residual whenever the language of each state is a derivative of L. In the context of regular languages over finite alphabets, residual finite state automata (RFSAs) are a subclass of nondeterministic finite automata (NFAs) introduced by Denis et al. [DLT02] as a solution to the well-known problem of NFAs not having unique minimal representatives. They show that every regular language admits a unique canonical RFSA, which can be much smaller than the canonical deterministic automaton. Residual automata play a key role in the context of exact learning, in which one computes an automaton representation of an unknown language via a finite number of observations. The defining property of residual automata allows one to (eventually) observe the semantics of each state independently. In the finite-alphabet setting, residuality underlies the seminal algorithm L⋆ for learning deterministic automata [Ang87] (deterministic automata are always residual), and enables efficient algorithms for learning nondeterministic [BHKL09] and alternating automata [AEF15,BLLR17]. Residuality has also been studied for learning probabilistic automata [DE08]. Existence of canonical residual automata is crucial for the convergence of these algorithms.
Footnote: by non-guessing we denote classes where automata are not allowed to guess values, i.e., to store symbols in registers without explicitly reading them.
Our investigation of residuality in the context of data languages is motivated by the question: which data languages admit an exact learning algorithm? In previous work [MSS+17], we have shown that the L⋆ algorithm generalises smoothly to data languages, meaning that deterministic nominal automata can be learned. However, the nondeterministic case proved to be significantly more challenging. In fact, in stark contrast with the finite-alphabet case, nondeterministic nominal automata are strictly more expressive than deterministic ones, so that residual automata are not just succinct representations of deterministic languages. As a consequence, our attempt to generalise the NL⋆ algorithm for learning nondeterministic finite automata [BHKL09] to nominal automata only partially succeeded: we only proved that it converges for deterministic languages, leaving the nondeterministic case open. By investigating residual data languages, we are finally able to settle this case.
In summary, our contributions are as follows:
• Section 3: We refine classes of data languages as depicted in Figure 1, by giving separating languages for each class.
• Section 4: We develop new results in nominal lattice theory, from which we prove the main characterisation theorem (Theorem 4.10). This provides a machine-independent characterisation of the languages accepted by residual nominal automata and constructs canonical automata which: a) are minimal in their respective class and unique up to isomorphism; b) can be constructed via a finite number of observations of the language. We also give an analogous result for non-guessing automata (Theorem 4.17).
• Section 5: We study decidability and closure properties for residual nominal automata. We prove that, like for nondeterministic nominal automata, equivalence is undecidable. On the other hand, universality is decidable.
• Section 6: We settle important open questions about exact learning of data languages. We show that residuality does not imply convergence of existing algorithms, and we give a (modified) NL⋆-style algorithm that works precisely for residual languages.
This research mirrors that of residual probabilistic automata [DE08]. There, too, one has distinct classes of which the deterministic and residual ones admit canonical automata and have an algebraic characterisation. We believe that our results contribute to a better understanding of learnability of automata with some sort of nondeterminism.
1.1. Differences with conference version. This paper is the extended version of [MS20], published at CONCUR'20. Since then we have added:
• full proofs for all the results;
• a new result stating that equivalence is undecidable;
• a greatly expanded Section 6, with the learning algorithm, proofs and bounds;
• a more elaborate discussion in which we consider alternating and unambiguous nominal automata, as well as other symmetries.

Preliminaries
2.1. Nominal Sets. Our paper is based on the theory of nominal sets of [Pit13], whose basic notions we now briefly recall. Let A = {a, b, c, . . .} be a countably infinite set of atoms and let Perm(A) be the set of finite permutations on A, i.e., the bijective functions π : A → A such that the set {a ∈ A | π(a) ≠ a} is finite. Finite permutations form a group where the unit is given by the identity function, the inverse by functional inverse, and multiplication by function composition.
A function · : Perm(A) × X → X is a group action if it satisfies π · (π′ · x) = (π ∘ π′) · x and id · x = x. We often omit the · and write πx instead. We say that a set of atoms C ⊆ A supports x ∈ X whenever πx = x for all the permutations π that fix C pointwise (i.e., π(c) = c for all c ∈ C). A nominal set is a set X together with a group action such that each x ∈ X has a finite support (that is, each x is finitely supported). Every element x ∈ X of a nominal set has a least finite support, which we denote by supp(x).
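As a concrete illustration of the group action and of supports, consider words over the alphabet A, on which a permutation acts letter by letter. The Python sketch below is only illustrative and assumes an encoding that is not part of the paper: atoms are strings, and a finite permutation is a dictionary listing only the atoms it moves (assumed to be a bijection). Since the support condition quantifies over infinitely many permutations, the hypothetical function `supports` can only test it for one given π.

```python
from typing import Dict, Sequence, Tuple

# Illustrative encoding: atoms are strings, and a finite permutation is a
# dict that lists only the atoms it moves (all other atoms are fixed).
Perm = Dict[str, str]


def apply(pi: Perm, atom: str) -> str:
    """pi(a), defaulting to the identity on atoms not listed in pi."""
    return pi.get(atom, atom)


def act(pi: Perm, word: Sequence[str]) -> Tuple[str, ...]:
    """Letter-wise action on words: pi . (a1 ... an) = pi(a1) ... pi(an)."""
    return tuple(apply(pi, a) for a in word)


def compose(pi: Perm, sigma: Perm) -> Perm:
    """(pi o sigma)(a) = pi(sigma(a)), again storing only moved atoms."""
    moved = set(pi) | set(sigma)
    result = {a: apply(pi, apply(sigma, a)) for a in moved}
    return {a: b for a, b in result.items() if a != b}


def supp(word: Sequence[str]) -> frozenset:
    """Least support of a word: the set of atoms occurring in it."""
    return frozenset(word)


def supports(C: frozenset, word: Sequence[str], pi: Perm) -> bool:
    """Test the support condition for ONE permutation pi: if pi fixes C
    pointwise, then pi . word must equal word."""
    if any(apply(pi, c) != c for c in C):
        return True  # pi does not fix C, so the condition says nothing
    return act(pi, word) == tuple(word)
```

For example, swapping a and b maps the word aca to bcb, and `compose` satisfies the action law act(compose(p, s), w) == act(p, act(s, w)).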
Given a nominal set X, the group action extends to subsets of X. Let U ⊆ X be a subset; then we define the group action as π · U := {πx | x ∈ U}. A subset U that is supported by the empty set is called equivariant, and for such U we have πU = U for all permutations π. This definition extends to relations and functions. For instance, a function f : X → Y between nominal sets is equivariant whenever πf(x) = f(πx) for all π ∈ Perm(A) and x ∈ X.
Two elements x, y ∈ X of a nominal set can be considered equivalent if there is a permutation π such that πx = y. This defines an equivalence relation and partitions the set into classes. Such an equivalence class is called an orbit. Concretely, the orbit of x is given by orb(x) := {πx | π ∈ Perm(A)}. A nominal set X is orbit-finite whenever it is a finite union of orbits. More generally, for a finite set of atoms C ⊆ A we consider C-orbits, that is, orbits defined over permutations fixing C pointwise. […]
(2) if P is orbit-finite and non-empty, then it has a minimal element (i.e., some m ∈ P such that n ≤ m implies n = m).
Proof. Ad (1), let x, y be in the same orbit with x ≤ y. There is a finite permutation π such that y = πx, which gives x ≤ y = πx. By equivariance of ≤ we get πx ≤ π²x, and repeating this argument gives x ≤ πx ≤ π²x ≤ ⋯. Since π is a finite permutation we have π^k = id for some k ≥ 1, and so x ≤ y ≤ ⋯ ≤ π^k x = x, proving x = y. Ad (2), consider the finite set of orbits Orb(P) := {orb(x) | x ∈ P} and the induced relation given by o₁ ⊑ o₂ if and only if x ≤ y for some x ∈ o₁ and y ∈ o₂. This relation is clearly reflexive. For transitivity, consider o₁ ⊑ o₂ ⊑ o₃ with witnesses x ≤ y and y′ ≤ z; there is a permutation π such that y = πy′, giving x ≤ y = πy′ ≤ πz, which shows o₁ ⊑ o₃. For antisymmetry, consider o₁ ⊑ o₂ and o₂ ⊑ o₁ with witnesses x ≤ y and y′ ≤ x′. Again there is a permutation π such that y = πy′, which gives x ≤ y = πy′ ≤ πx′. Since x and πx′ lie in the same orbit, (1) gives πx′ = x, hence x = y and o₁ = o₂ as required. We have shown that (Orb(P), ⊑) is a finite poset, therefore it has a minimal orbit o in Orb(P), and any element of o is minimal in P.
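The step π^k = id in Ad (1) holds because a finite permutation moves only finitely many atoms and therefore splits into finitely many finite cycles; its order is the least common multiple of the cycle lengths. The following Python sketch computes this order, assuming a hypothetical encoding in which a permutation is a dictionary listing only the atoms it moves (and is assumed to be a bijection, otherwise the cycle walk need not terminate).

```python
from math import gcd
from typing import Dict

# Hypothetical encoding: a finite permutation is a dict of the atoms it moves.
Perm = Dict[str, str]


def apply_power(pi: Perm, atom: str, k: int) -> str:
    """Compute pi^k(atom) by applying pi k times."""
    for _ in range(k):
        atom = pi.get(atom, atom)
    return atom


def order(pi: Perm) -> int:
    """Smallest k >= 1 with pi^k = id: the lcm of the cycle lengths of pi.
    Only the finitely many atoms listed in pi need to be inspected."""
    seen = set()
    k = 1
    for start in pi:
        if start in seen:
            continue
        length, atom = 0, start
        while True:  # walk the cycle through `start`
            seen.add(atom)
            atom = pi.get(atom, atom)
            length += 1
            if atom == start:
                break
        k = k * length // gcd(k, length)
    return k
```

For instance, the permutation with cycles (a b c) and (d e) has order lcm(3, 2) = 6, so applying it six times fixes every atom.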
The above property does not hold for the ordered atoms (Q, ≤). To see this, consider the nominal poset Q itself (which is a single orbit set) with the given order. This poset has no minimal element.
2.4. Nominal Automata. The theory of nominal automata seamlessly extends classical automata theory by having orbit-finite nominal sets and equivariant functions in place of finite sets and functions.
Definition 2.4. A (nondeterministic) nominal automaton A consists of an orbit-finite nominal set Σ, the alphabet, an orbit-finite nominal set of states Q, and the following equivariant subsets: a set of initial states I ⊆ Q, a set of final states F ⊆ Q, and a transition relation δ ⊆ Q × Σ × Q. The usual notions of acceptance and language apply. We denote the language accepted by A by L(A), and the language accepted by a state q by ⟦q⟧. Note that the language L(A) ∈ P_fs(Σ*) is equivariant, and that ⟦q⟧ ∈ P_fs(Σ*) need not be equivariant, but it is supported by supp(q).
Remark 2.5. In most examples we take the alphabet to be Σ = A, but it can be any orbit-finite nominal set. For instance, Σ = Act × A, where Act is a finite set of actions, represents actions act(x) with one parameter x ∈ A (actions with arity n can be represented via n-fold products of A).
We recall the notion of derivative language [DLT02]. Definition 2.6. Given a language L and a word u ∈ Σ*, we define the derivative of L with respect to u as u⁻¹L := {w | uw ∈ L} and the set of all derivatives as Der(L) := {u⁻¹L | u ∈ Σ*}. These definitions seamlessly extend to the nominal setting. Note that w⁻¹L is finitely supported whenever L is.
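Derivatives are easy to compute when a language is given by a membership test. A minimal Python sketch, assuming (for illustration only) that languages are represented as predicates on tuples of atoms; the example language "some atom occurs twice" reappears as L_ng,r in Section 3, and all function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Word = Tuple[str, ...]
Language = Callable[[Word], bool]  # a language, given by its membership test


def derivative(u: Sequence[str], L: Language) -> Language:
    """u^{-1} L = { w | u w in L }, returned again as a membership test."""
    prefix = tuple(u)
    return lambda w: L(prefix + tuple(w))


def some_atom_twice(w: Word) -> bool:
    """Example language: some atom occurs at least twice in w."""
    return len(set(w)) < len(w)
```

Note that a⁻¹(b⁻¹L) = (ba)⁻¹L, so iterated derivatives are again derivatives; with this encoding, `derivative(("a",), derivative(("b",), L))` and `derivative(("b", "a"), L)` agree on every word.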
Of special interest are the deterministic, residual, and non-guessing nominal automata, which we introduce next.
Definition 2.7. A nominal automaton A is:
• Deterministic if I = {q₀}, and for each q ∈ Q and a ∈ Σ there is a unique q′ such that (q, a, q′) ∈ δ. In this case, the relation is in fact functional δ : Q × Σ → Q.
• Residual if each state q ∈ Q accepts a derivative of L(A), formally: ⟦q⟧ = w⁻¹L(A) for some word w ∈ Σ*. The words w such that ⟦q⟧ = w⁻¹L(A) are called characterising words for the state q.
• Non-guessing if supp(q₀) = ∅ for each q₀ ∈ I, and supp(q′) ⊆ supp(q) ∪ supp(a) for each (q, a, q′) ∈ δ.
Observe that the transition function of a deterministic automaton preserves supports (i.e., if C supports (q, a) then C also supports δ(q, a)). Consequently, all deterministic automata are non-guessing. For the sake of succinctness, in the following we drop the qualifier "nominal" when referring to these classes of nominal automata.
For many examples, it is useful to define the notion of an anchor. Given a state q, a word w is an anchor if δ(I, w) = {q}, that is, the word w leads to q and no other state. Every anchor for q is also a characterising word for q (but not vice versa). A state with an anchor is called anchored, and we call an automaton anchored if all states have anchors.
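The anchor condition δ(I, w) = {q} can be checked directly once δ(I, w) is computed. The following Python sketch does this for an ordinary finite automaton over a finite alphabet; this is a simplification, since in the nominal setting δ(I, w) is a finitely supported and possibly infinite set, which would require computation with atoms. The example automaton and all names are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable, Iterable, Set, Tuple

State = Hashable
Delta = Dict[Tuple[State, str], Set[State]]


def post(delta: Delta, states: Iterable[State], word: Iterable[str]) -> FrozenSet[State]:
    """delta(S, w): the set of states reachable from S by reading w."""
    current = set(states)
    for a in word:
        current = {r for q in current for r in delta.get((q, a), set())}
    return frozenset(current)


def is_anchor(delta: Delta, initial: Iterable[State], word: Iterable[str], q: State) -> bool:
    """w is an anchor for q iff delta(I, w) = {q}."""
    return post(delta, initial, word) == frozenset({q})


# Example NFA over {a, b}: "the second-to-last letter is a".
DELTA: Delta = {
    (0, "a"): {0, 1}, (0, "b"): {0},  # state 0 loops and may guess the 'a'
    (1, "a"): {2}, (1, "b"): {2},     # state 1 reads the last letter
}
INITIAL = {0}
```

Here state 0 is anchored (by the empty word or by b), whereas states 1 and 2 have no anchor: every word reaching them also reaches state 0.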
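Finally, we recall the Myhill-Nerode theorem for nominal automata: a language L is accepted by a deterministic nominal automaton if and only if Der(L) is orbit-finite.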

Separating languages
Deterministic, nondeterministic and residual automata have the same expressive power when dealing with finite alphabets. The situation is more nuanced in the nominal setting. We now give one language for each class in Figure 1. For simplicity, we mostly use the one-orbit nominal set of atoms A as alphabet. These languages separate the different classes, meaning that they belong to the respective class, but not to the classes below or beside it. For each example language L, we depict: a nominal automaton recognising L (on the left); the set of derivatives Der(L) (on the right). We make explicit the poset structure of Der(L): grey rectangles represent orbits of derivatives, and lines stand for set inclusions (we grey out irrelevant ones). This poset may not be orbit-finite, in which case we depict a small, indicative part. Observing the poset structure of Der(L) explicitly is important for later, where we show that the existence of residual automata depends on it. We write aa⁻¹L to mean (aa)⁻¹L. Variables a, b, . . . are always atoms and u, w, . . . are always words.
Deterministic: First symbol equals last symbol. Consider the language L_d of words whose first symbol equals their last symbol. This is accepted by the following deterministic nominal automaton (Figure 2). The automaton is actually infinite-state, but we represent it symbolically using a register-like notation, where we annotate each state with the current value of a register. Note that the derivatives a⁻¹L_d, b⁻¹L_d, . . . are in the same orbit. In total Der(L_d) has three orbits, which correspond to the three orbits of states in the deterministic automaton. The derivative awa⁻¹L_d, for example, equals aa⁻¹L_d.
Non-guessing residual: Some atom occurs twice. The language is L_ng,r := {uavaw | u, v, w ∈ A*, a ∈ A}.
The poset Der(L_ng,r) is not orbit-finite, so by the nominal Myhill-Nerode theorem there is no deterministic automaton accepting L_ng,r. However, derivatives of the form ab⁻¹L_ng,r with a ≠ b can be written as a union ab⁻¹L_ng,r = a⁻¹L_ng,r ∪ b⁻¹L_ng,r. In fact, we only need an orbit-finite set of derivatives to recover Der(L_ng,r). These orbits are highlighted in the diagram on the right (Figure 3). Selecting the "right" derivatives is the key idea behind constructing residual automata in Theorem 4.10.
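The union ab⁻¹L_ng,r = a⁻¹L_ng,r ∪ b⁻¹L_ng,r (for distinct atoms a and b) can be sanity-checked by brute force. The Python sketch below compares both sides on all words up to a given length over a small pool of atoms, where the atoms other than a and b play the role of fresh atoms. This is only a bounded check under that assumption, not a proof, and the function names are illustrative.

```python
from itertools import product
from typing import Tuple

Word = Tuple[str, ...]


def L_ngr(w: Word) -> bool:
    """Membership in L_ng,r: some atom occurs (at least) twice in w."""
    return len(set(w)) < len(w)


def check_union(a: str, b: str, pool: Tuple[str, ...], max_len: int) -> bool:
    """Bounded check of ab^{-1}L = a^{-1}L ∪ b^{-1}L for L = L_ng,r, on all
    words over `pool` of length at most max_len."""
    for n in range(max_len + 1):
        for w in product(pool, repeat=n):
            lhs = L_ngr((a, b) + w)                   # w in ab^{-1}L
            rhs = L_ngr((a,) + w) or L_ngr((b,) + w)  # w in a^{-1}L ∪ b^{-1}L
            if lhs != rhs:
                return False
    return True
```

For a = b the check fails already on the empty word, since aa ∈ L_ng,r but a ∉ L_ng,r; this is why the union only applies to distinct atoms.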
[Figure 3: part of the poset Der(L_ng,r), showing L_ng,r, a⁻¹L_ng,r, b⁻¹L_ng,r, ab⁻¹L_ng,r, abc⁻¹L_ng,r and aa⁻¹L_ng,r.]
Nondeterministic: Last letter is unique. We consider the language L_n of words whose last letter does not occur earlier in the word. Derivatives a⁻¹L_n are again unions of smaller languages: a⁻¹L_n = ⋃_{b≠a} ab⁻¹L_n. However, the poset Der(L_n) has an infinite descending chain of languages (with an increasing support), namely a⁻¹L_n ⊋ ab⁻¹L_n ⊋ abc⁻¹L_n ⊋ ⋯. Figure 4 shows this descending chain, where we have omitted languages like aa⁻¹L_n, as they only differ from a⁻¹L_n on the empty word. The existence of such a chain implies that L_n cannot be accepted by a residual automaton. This is a consequence of Theorem 4.10, as we shall see later.
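The inclusions in the descending chain of L_n can be sanity-checked by brute force. In the hypothetical Python sketch below, `included(v, u, ...)` tests v⁻¹L_n ⊆ u⁻¹L_n on all words up to a fixed length over a small pool of atoms, so it is a bounded check rather than a proof. Strictness of each step is witnessed by a single letter: b ∈ a⁻¹L_n but b ∉ ab⁻¹L_n.

```python
from itertools import product
from typing import Iterator, Tuple

Word = Tuple[str, ...]


def L_n(w: Word) -> bool:
    """Last letter is unique: w is non-empty and its last atom does not occur earlier."""
    return len(w) > 0 and w[-1] not in w[:-1]


def words(pool: Tuple[str, ...], max_len: int) -> Iterator[Word]:
    """All words over `pool` of length at most max_len."""
    for n in range(max_len + 1):
        yield from product(pool, repeat=n)


def included(v: Word, u: Word, pool: Tuple[str, ...], max_len: int) -> bool:
    """Bounded check of v^{-1}L_n ⊆ u^{-1}L_n on words up to max_len."""
    return all(L_n(u + w) for w in words(pool, max_len) if L_n(v + w))
```

The same helpers confirm, within the bound, that aa⁻¹L_n and a⁻¹L_n agree on all non-empty words and differ only on the empty word.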
Residual: Last letter is unique but anchored. We reconsider the previous automaton A_n and add a transition in order to make the automaton residual. First, we extend the alphabet to Σ = A ∪ {Anc(a) | a ∈ A}, where Anc is nothing more than a label. Then, we add the transitions (a, Anc(a), a) for each a ∈ A, see Figure 5. We denote the language accepted by the new automaton by L_r. Here, we have forced the automaton to be residual, by adding an anchor to the first state. Nevertheless, guessing is still necessary. In the poset, we note that all elements in the descending chain can now be obtained as unions of Anc(a)⁻¹L_r.
Non-guessing nondeterministic: Repeated atom with different successor. Consider the language L_ng of words in which some atom a occurs twice, once followed by an atom b and once followed by an atom c, with b ≠ c. Here, we allow a = b or a = c in this definition. This is a language which can be accepted by a non-guessing automaton (Figure 6). However, there is no residual automaton for this language. The poset structure of Der(L_ng) is very complicated. We will return to this example after Theorem 4.10.

Canonical Residual Nominal Automata
In this section we will give a characterisation of canonical residual automata. We will first introduce notions of nominal lattice theory, then we will state our main result (Theorem 4.10). We conclude the section by providing similar results for non-guessing automata.
4.1. Theory of Nominal Join-Semilattices. We abstract away from words and languages and consider the set P_fs(Z) for an arbitrary nominal set Z. This is a Boolean algebra of which the operations ∧, ∨, ¬ are all equivariant maps [GLP11]. Moreover, the finitely supported union ⋃ : P_fs(P_fs(Z)) → P_fs(Z) is also equivariant. We note that this is more general than a binary union, although P_fs(Z) is not a complete join-semilattice (only finitely supported unions are available). Hereafter, we shall denote set inclusion by ≤ (< when strict).
Definition 4.1. Given a nominal set Z and an equivariant X ⊆ P_fs(Z), we define the set generated by X as ⟨X⟩ := {⋃𝔵 | 𝔵 ⊆ X finitely supported}.
Remark 4.2. The set ⟨X⟩ is closed under the operation ⋃, and moreover is the smallest equivariant set closed under ⋃ containing X. In other words, ⟨−⟩ defines a closure operator. We will often say "X generates Y", by which we mean Y ⊆ ⟨X⟩.
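Definition 4.3. Let X ⊆ P_fs(Z) be equivariant and x ∈ X. We say that x is join-irreducible in X if x = ⋃𝔵 ⟹ x ∈ 𝔵, for every finitely supported 𝔵 ⊆ X. The subset of all join-irreducible elements is denoted by JI(X) := {x ∈ X | x join-irreducible in X}. This is again an equivariant set. For convenience, we may use the following equivalent definition of join-irreducible: x is non-empty and (∀x₀ ∈ 𝔵. x₀ < x) ⟹ ⋃𝔵 < x, for every finitely supported 𝔵 ⊆ X.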
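For a finite family of finite sets, the equivalent formulation above gives a direct test: x is join-irreducible iff the union of all elements of X strictly below x is strictly smaller than x. A minimal Python sketch of this finite analogue, with frozensets standing in for elements of P_fs(Z); in the nominal case X is infinite and one would need orbit-finite computation instead. The function name is hypothetical.

```python
from typing import FrozenSet, Iterable, Set


def join_irreducibles(X: Iterable[FrozenSet]) -> Set[FrozenSet]:
    """Finite analogue of JI(X): x is join-irreducible iff the union of all
    elements of X strictly below x is strictly smaller than x. The empty set
    is never join-irreducible, since it is the empty union."""
    X = list(X)
    result = set()
    for x in X:
        below = [y for y in X if y < x]  # '<' on frozensets: strict inclusion
        if frozenset().union(*below) != x:
            result.add(x)
    return result
```

For X = {{1}, {2}, {1, 2}, {3}, {1, 2, 3}} this yields {1}, {2} and {3}, since {1, 2} and {1, 2, 3} are unions of smaller elements.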
Remark 4.4. In lattice and order theory, join-irreducible elements are usually defined only for a lattice (see, e.g., [DP02]). However, we define them for arbitrary subsets of a lattice. (Note that a subset of a lattice is not a sub-lattice in general.) This generalisation will be needed later, when we consider the poset Der(L), which is contained in the lattice P fs (Σ * ), but it is not a sub-lattice.
Remark 4.5. The notion of join-irreducible, as defined here, corresponds to the notion of prime in some papers on learning nondeterministic automata [BHKL09, DLT02, MSS+17]. Unfortunately, the word prime has a slightly different meaning in lattice theory. We stick to the terminology of lattice theory.
If a set Y is well-behaved, then its join-irreducible elements will actually generate the set Y. This is normally proven with a descending chain condition. We first restrict our attention to orbit-finite sets. The following lemma extends [DP02, Lemma 2.45] to the nominal setting. The proof is analogous to the ordinary case, except that it relies on the specific structure of equality atoms (Lemma 2.3).
Lemma 4.6. Let X ⊆ P fs (Z) be an orbit-finite and equivariant set.
(1) Let a ∈ X, b ∈ P_fs(Z) and a ≰ b. Then there is x ∈ JI(X) such that x ≤ a and x ≰ b.
(2) Let a ∈ X, then a = ⋃{x ∈ X | x join-irreducible in X and x ≤ a}.
Proof. Ad 1. Consider the set S = {x ∈ X | x ≤ a, x ≰ b}. This is a finitely supported and supp(S)-orbit-finite set, and it is non-empty because a ∈ S. Hence it has some minimal element m ∈ S by Lemma 2.3 (here we are using its generalisation to finitely supported sets). We shall prove that m is join-irreducible in X. Note first that m is non-empty, since m ≰ b. Let 𝔵 ⊆ X be finitely supported and assume that x₀ < m for each x₀ ∈ 𝔵.
Note that x₀ < m ≤ a, so that x₀ ∉ S (otherwise m would not be minimal). Hence x₀ ≤ b (by definition of S). So ⋃𝔵 ≤ b, whereas m ≰ b; this shows that ⋃𝔵 ≠ m, and so ⋃𝔵 < m as required.
Ad 2. Consider the set T = {x ∈ JI(X) | x ≤ a}. This set is finitely supported, so we may define the element b = ⋃T ∈ P_fs(Z). It is clear that b ≤ a; we shall prove equality by contradiction. Suppose a ≰ b; then by (1) there is a join-irreducible x such that x ≤ a and x ≰ b. By the first property of x we have x ∈ T, so that x ≤ b = ⋃T, which is a contradiction. We conclude that a = b, i.e., a = ⋃T as required.
(Footnote to Definition 4.1: A similar definition could be given for finitely supported X. In fact, all results in this section generalise to finitely supported X, but we use equivariance for convenience.)
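In the finite case, both parts of Lemma 4.6 can be executed literally. The Python sketch below is a finite analogue only (finite families of finite sets, hypothetical function names): part (1) picks a minimal element of S exactly as in the proof, which exists here simply because S is finite, whereas the nominal proof needs Lemma 2.3; part (2) checks that every element is the union of the join-irreducibles below it.

```python
from typing import FrozenSet, List, Set


def join_irreducibles(X: List[FrozenSet]) -> Set[FrozenSet]:
    """Finite analogue of JI(X): x is not the union of the elements below it."""
    return {x for x in X
            if frozenset().union(*[y for y in X if y < x]) != x}


def witness(X: List[FrozenSet], a: FrozenSet, b: FrozenSet) -> FrozenSet:
    """Part (1): for a in X with a not <= b, return a minimal element of
    S = {x in X | x <= a and not x <= b}; the proof shows it is join-irreducible."""
    S = [x for x in X if x <= a and not x <= b]
    for m in S:
        if not any(y < m for y in S):
            return m
    raise ValueError("a <= b, so S is empty")


def generated_by_ji(X: List[FrozenSet]) -> bool:
    """Part (2): every a in X is the union of the join-irreducibles below it."""
    ji = join_irreducibles(X)
    return all(frozenset().union(*[x for x in ji if x <= a]) == a for a in X)
```

For X = {{1}, {2}, {1, 2}, {3}, {1, 2, 3}}, a = {1, 2, 3} and b = {1, 2}, the set S is {{3}, {1, 2, 3}} and its minimal element {3} is indeed join-irreducible.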
So far, we have defined join-irreducible elements relative to some fixed set. We will now show that these elements remain join-irreducible when considering them in a bigger set, as long as the bigger set is generated by the smaller one. This will later allow us to talk about the join-irreducible elements.
Proof. (⊇) Let x ∈ X be join-irreducible in X. Suppose that x = ⋃𝔶 for some finitely supported 𝔶 ⊆ Y. Note that also 𝔶 ⊆ X. Then x = y₀ for some y₀ ∈ 𝔶, and so x is […] The last set is a finitely supported subset of Y, and so there is a y₀ in it such that y = y₀. Moreover, this y₀ is below some x₀ ∈ 𝔵, which gives y₀ ≤ x₀ ≤ y. We conclude that y = x₀ for some x₀ ∈ 𝔵.
In other words, the join-irreducibles of X are the smallest set generating X.
Corollary 4.9. If an orbit-finite set Y generates X, then JI(X) ⊆ Y .

Characterising Residual Languages.
We are now ready to state and prove the main theorem of this paper. We fix the alphabet Σ. Recall that the nominal Myhill-Nerode theorem tells us that a language is accepted by a deterministic automaton if and only if Der(L) is orbit-finite. Here, we give a similar characterisation for languages accepted by residual automata. Moreover, the following result gives a canonical construction.
Theorem 4.10. Given a language L ∈ P_fs(Σ*), the following are equivalent: (1) L is accepted by a residual automaton.
(2) There is some orbit-finite set J ⊆ Der(L) which generates Der(L).
(3) The set JI(Der(L)) is orbit-finite and generates Der(L).
Proof. We prove three implications: (1 ⇒ 2). Let A := (Σ, Q, I, F, δ) be a residual automaton accepting L. Take the set of languages accepted by the states: J := {⟦q⟧ | q ∈ Q}. This is clearly orbit-finite, since Q is, and J ⊆ Der(L) by residuality. Moreover, each derivative is generated as w⁻¹L = ⋃{⟦q⟧ | q ∈ δ(I, w)}, where δ(I, w) denotes the finitely supported set of states reached from I by reading w.
(2 ⇒ 3). We can apply Lemma 4.8 with Y = J and X = Der(L) and obtain JI(J) = JI(Der(L)). It follows that JI(Der(L)) is orbit-finite (since it is a subset of J) and generates Der(L) (by Lemma 4.6, JI(J) generates J, which in turn generates Der(L)).
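(3 ⇒ 1). Consider the following residual automaton: Q := JI(Der(L)), I := {q ∈ Q | q ≤ L}, F := {q ∈ Q | ε ∈ q}, and δ := {(q, a, q′) ∈ Q × Σ × Q | q′ ≤ a⁻¹q}. This is a well-defined nominal automaton. In fact, all the components are orbit-finite, and equivariance of ≤ implies equivariance of δ.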
We shall now prove that the language of this automaton is exactly L. As a first step, we prove that ⟦w⁻¹L⟧ = w⁻¹L for every state w⁻¹L ∈ Q, by induction over words.
At step (i) we have used the induction hypothesis (u is a shorter word than au) and the fact that − preserves unions. At step (ii, right-to-left) we have used that v⁻¹L is join-irreducible. The other steps are unfolding definitions. Now, note that L = ⋃{w⁻¹L ∈ Q | w⁻¹L ≤ L}, since the join-irreducible languages generate all derivatives (including L = ε⁻¹L). In other words, the initial states (together) accept L.
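Over a finite alphabet, the construction in (3 ⇒ 1) specialises to the canonical RFSA of [DLT02]. The Python sketch below builds it from a minimal complete DFA, where each DFA state stands for the derivative it accepts; language inclusions are decided by exploring pairs of a state and a set of states. All names and the DFA encoding are assumptions made for illustration; the nominal version would need computation with atoms over orbit-finite sets.

```python
from itertools import product
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Delta = Dict[Tuple[int, str], int]  # transition table of a complete DFA


def included_in_union(delta: Delta, finals: Set[int], sigma: str,
                      s: int, ts: Iterable[int]) -> bool:
    """Decide L_s ⊆ ⋃_{t in ts} L_t by exploring pairs (state, set of states)."""
    start = (s, frozenset(ts))
    seen, stack = {start}, [start]
    while stack:
        p, qs = stack.pop()
        if p in finals and not (qs & finals):
            return False  # some word is accepted from s but from no t in ts
        for a in sigma:
            nxt = (delta[(p, a)], frozenset(delta[(q, a)] for q in qs))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return True


def canonical_rfsa(states: List[int], sigma: str, delta: Delta, init: int, finals: Set[int]):
    """Canonical residual automaton of the language of a minimal complete DFA."""
    leq = {(s, t): included_in_union(delta, finals, sigma, s, [t])
           for s in states for t in states}

    def strictly_below(s: int) -> List[int]:
        return [t for t in states if leq[(t, s)] and not leq[(s, t)]]

    # Join-irreducible derivatives: not the union of strictly smaller ones.
    Q = [s for s in states
         if not included_in_union(delta, finals, sigma, s, strictly_below(s))]
    I = {q for q in Q if leq[(q, init)]}                    # q <= L
    F = {q for q in Q if q in finals}                       # epsilon in q
    T = {(q, a): {r for r in Q if leq[(r, delta[(q, a)])]}  # r <= a^{-1} q
         for q in Q for a in sigma}
    return Q, I, F, T


def nfa_accepts(I, F, T, word: str) -> bool:
    current = set(I)
    for a in word:
        current = {r for q in current for r in T[(q, a)]}
    return bool(current & F)


def dfa_accepts(delta: Delta, init: int, finals: Set[int], word: str) -> bool:
    s = init
    for a in word:
        s = delta[(s, a)]
    return s in finals


def all_words(sigma: str, max_len: int) -> Iterator[str]:
    for n in range(max_len + 1):
        for w in product(sigma, repeat=n):
            yield "".join(w)


# Example: "second-to-last letter is a"; DFA states remember the last two letters.
# 0 = bb (initial), 1 = ba, 2 = ab, 3 = aa; final iff the older letter is a.
DELTA: Delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 3, (1, "b"): 2,
                (2, "a"): 1, (2, "b"): 0, (3, "a"): 3, (3, "b"): 2}
STATES, SIGMA, INIT, FINALS = [0, 1, 2, 3], "ab", 0, {2, 3}
```

In this example the derivative of state 3 is the union of those of states 1 and 2, so the canonical residual automaton keeps only the three join-irreducible states 0, 1 and 2. The sketch assumes the DFA is minimal; otherwise equal derivatives would appear as duplicate states.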
Corollary 4.11. The construction above defines a canonical residual automaton with the following uniqueness property: it has the minimal number of orbits of states and the maximal number of orbits of transitions.
Proof. State minimality follows from Corollary 4.9, where we note that the states of any residual automaton accepting L form a generating subset of Der(L). Maximality of transitions follows from the fact that no transitions can be added without changing the language.
For finite alphabets, the classes of languages accepted by DFAs and NFAs are the same (by determinising an NFA). This means that Der(L) is always finite if L is accepted by an NFA, and we can always construct the canonical RFSA. Here, this is not the case, that is why we need to stipulate (in Theorem 4.10) that the set JI(Der(L)) is orbit-finite and actually generates Der(L). Either condition may fail, as we will see in Example 4.13.
Example 4.12. In this example we show that residual automata can also be used to compress deterministic automata. The language L := {abb⋯b | a ≠ b} can be accepted by a deterministic automaton of 4 orbits, and this is minimal. (A zero amount of bs is also accepted in L.) The minimal residual automaton, however, has only 2 orbits, given by the join-irreducible languages ε⁻¹L = L and ab⁻¹L (with a ≠ b). The trick in defining the automaton is that the a-transition from ε⁻¹L to ab⁻¹L guesses the value b. In the next section (Section 4.3), we will define the canonical non-guessing residual automaton, which has 3 orbits.
Example 4.13. We return to the examples L_n and L_ng from Section 3. We claim that neither language can be accepted by a residual automaton. For L_n we note that there is an infinite descending chain of derivatives a⁻¹L_n ⊋ ab⁻¹L_n ⊋ abc⁻¹L_n ⊋ ⋯. Each of these languages can be written as a union of smaller derivatives. For instance, a⁻¹L_n = ⋃_{b≠a} ab⁻¹L_n. This means that JI(Der(L_n)) = ∅, hence it does not generate Der(L_n) and by Theorem 4.10 there is no residual automaton.
In the case of L_ng, we have an infinite ascending chain, for distinct atoms a_0, a_1, a_2, . . .:
a_1a_0⁻¹L_ng ⊊ a_2a_1a_0⁻¹L_ng ⊊ a_3a_2a_1a_0⁻¹L_ng ⊊ ⋯ (4.1)
This in itself is not a problem: the language L_ng,r also has an infinite ascending chain. However, for L_ng, none of the languages in this chain are a union of smaller derivatives, which we shall now prove formally.
Claim 4.14. All the languages in (4.1) are join-irreducible.
Proof. Consider the word w = a_k ⋯ a_1a_0 with k ≥ 1 and all a_i distinct atoms. We will prove that w⁻¹L_ng is join-irreducible in Der(L_ng), by considering all u⁻¹L_ng ⊆ w⁻¹L_ng. Observe that if u is a suffix of w, then u⁻¹L_ng ⊆ w⁻¹L_ng. This is easily seen from the given automaton, since it may skip any prefix. We now show that u being a suffix of w is also a necessary condition.
Assume that u is not a suffix of w, so there is an i ≥ 0 with x ≠ a_i and u contains the suffix xa_{i−1} ⋯ a_0. Take a fresh atom a_{−1}. If x = a_j for some j, let c := a_{j−1} (note that we may use a_{−1} here); otherwise let c be fresh. Then a_{−1}xc is in u⁻¹L_ng, since we have repeated x with a different successor. However, regarding w⁻¹L_ng: If x does not occur in w, then c is fresh and a_{−1}xc is clearly not in w⁻¹L_ng (all atoms are distinct). If x = a_j (and so c = a_{j−1}), then wa_{−1}a_ja_{j−1} mentions only a_j and a_{j−1} twice, but not with distinct successors; hence a_{−1}xc ∉ w⁻¹L_ng. We conclude that if u is not a suffix of w, then u⁻¹L_ng is not a subset of w⁻¹L_ng.
So far, we have shown that u⁻¹L_ng ⊆ w⁻¹L_ng if and only if u is a suffix of w. To see that w⁻¹L_ng is indeed join-irreducible, we consider the join X = ⋃{u⁻¹L_ng | u is a strict suffix of w}. Note that a_ka_k ∉ X, but a_ka_k ∈ w⁻¹L_ng. We conclude that w⁻¹L_ng ≠ ⋃{u⁻¹L_ng | u⁻¹L_ng ⊊ w⁻¹L_ng} as required.
This result implies that the set JI(Der(L_ng)) is not orbit-finite. By Theorem 4.10, we can conclude that there is no residual automaton accepting L_ng.
Remark 4.15. For arbitrary (nondeterministic) languages there is also a characterisation in the style of Theorem 4.10. Namely, L is accepted by an automaton iff there is an orbit-finite set Y ⊆ P_fs(Σ*) which generates the derivatives. However, note that the set Y need not be a subset of the set of derivatives. In these cases, we do not have a canonical construction for the automaton. Different choices for Y define different automata and there is no way to pick Y naturally.
4.3. Automata without guessing. We reconsider the above results for non-guessing automata. Nondeterminism in nominal automata allows naturally for guessing, meaning that the automaton may store symbols in registers without explicitly reading them. For instance, in Figure 4 the automaton non-deterministically stores a(ny) symbol in the initial state without actually reading it, and by doing so it "guesses" which symbol will be read at the end of a word. The original definition of register automata in [KF94] does not allow for guessing, and non-guessing automata remain actively researched [MQ19]. Register automata with guessing were introduced in [KZ10], because it was realised that non-guessing automata are not closed under reversal.
To adapt our theory to non-guessing automata, we need to introduce a more restricted form of powerset. We say that U ⊆ X is uniformly finitely supported (ufs for short) if ⋃_{x∈U} supp(x) is finite. The ufs powerset is defined as P_ufs(X) := {U ⊆ X | U is ufs}. This too comes with its notion of ufs-join, performing the union of ufs sets.
The key insight for this section is that the constraints on supports for non-guessing automata (see Definition 2.7) imply that the transition relation can be expressed as a function of the form δ : Q × Σ → P_ufs(Q). Intuitively, whenever a symbol a is read from a state q, the support of every successor state is contained in supp(q) ∪ supp(a), which implies that the union of their supports is finite (i.e., they form a ufs set). The consequence of shifting from P_fs to P_ufs is that, when giving a specialised version of Theorem 4.10 for non-guessing automata, we can consider the join-semilattice structure given by ufs sets and ufs unions. We first characterise join-irreducibles for such join-semilattices.
Definition 4.16. Let X ⊆ P_fs(Z) be equivariant and x ∈ X. We say that x is ufs-join-irreducible in X if x = ⋃𝔵 ⟹ x ∈ 𝔵, for every finitely supported 𝔵 ⊆ X such that supp(x₀) ⊆ supp(x) for each x₀ ∈ 𝔵. The set of all ufs-join-irreducible elements is denoted by JI_ufs(X) := {x ∈ X | x ufs-join-irreducible in X}.
The only change required is an additional condition on the elements and supports in 𝔵. In particular, the sets 𝔵 are ufs sets, hence their union is ufs.
All the lemmas from the previous section are proven similarly. We state the main result for non-guessing automata.
Theorem 4.17. Given a language L ∈ P_fs(Σ*), the following are equivalent: (1) L is accepted by a non-guessing residual automaton.
(2) There is some orbit-finite set J ⊆ Der(L) which generates Der(L) by ufs unions.
(3) The set JI_ufs(Der(L)) is orbit-finite and generates Der(L) by ufs unions.
For direction (3 ⇒ 1) we need a slightly different definition of the canonical automaton, with set of states Q := JI_ufs(Der(L)) […]. The fact that this automaton accepts L can be proven similarly to the proof of Theorem 4.10.
We need to show that this automaton is indeed non-guessing, namely: (1) supp(I) = ∅; (2) supp(q′) ⊆ supp(q) ∪ supp(a) for each (q, a, q′) ∈ δ. The first condition follows from L being equivariant. For the second one, we have […]
To better understand the structure of the canonical non-guessing residual automaton, we recall the following fact. Lemma 4.18. For orbit-finite nominal sets Q, we have P_ufs(Q) = P_fin(Q).
As a consequence, the transition function of non-guessing automata can be written as δ : Q × Σ → P_fin(Q). This shows that the canonical non-guessing residual automaton has finite nondeterminism. It also shows that it is sufficient to consider finite unions in Theorem 4.17, instead of uniformly supported ones.

Decidability and Closure Results
In this section we investigate decidability and closure properties of residual automata. First, a positive result: universality is decidable for residual automata. This is in contrast to the nondeterministic case, where universality is undecidable, even for non-guessing automata [Boj19].
In the constructions below, we use computation with atoms. This is a computation paradigm which allows algorithmic manipulation of infinite, but orbit-finite, nominal sets. For instance, it allows looping over such a set in finite time. Important here is that this paradigm is equivalent to regular computability (see [BT18]) and implementations exist to compute with atoms [KS16,KT17].
Proof. We will sketch an algorithm that, given a residual automaton A, answers whether L(A) = Σ*. The algorithm decides negatively in the following cases:
• I = ∅. In this case the language accepted by A is empty.
• Suppose there is a q ∈ Q with q ∉ F. By residuality we have ⟦q⟧ = w⁻¹L(A) for some w. Note that q is not accepting, so that ε ∉ w⁻¹L(A). Put differently: w ∉ L(A). (We note that w is not used by the algorithm. It is only needed for the correctness.)
• Suppose there is a q ∈ Q and a ∈ Σ such that δ(q, a) = ∅. Again ⟦q⟧ = w⁻¹L(A) for some w. Note that a is not in ⟦q⟧. This means that wa is not in the language.
When none of these three cases hold, the algorithm decides positively. We shall prove that this is indeed the correct decision. If none of the above conditions hold, then I ≠ ∅, Q = F, and for all q ∈ Q, a ∈ Σ we have δ(q, a) ≠ ∅. Here we can prove, by induction on the length of words, that the language of each state is ⟦q⟧ = Σ*, since every state is accepting and has an a-successor for every a ∈ Σ. Given that there is an initial state, the automaton accepts Σ*. Note that the operations on sets performed in the above cases all terminate, because all involve orbit-finite sets.
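For a finite automaton over a finite alphabet, the three checks of the proof reduce to simple scans. The Python sketch below mirrors them; the function name and the encoding are hypothetical, and in the nominal setting the same loops would range over orbit-finite sets using computation with atoms. The answer is only guaranteed correct under the residuality assumption.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

State = Hashable
Delta = Dict[Tuple[State, str], Set[State]]


def residual_universal(states: Iterable[State], sigma: Iterable[str],
                       initial: Set[State], finals: Set[State], delta: Delta) -> bool:
    """Universality test from the proof above. Only sound for *residual*
    automata: for a non-residual automaton a negative answer may be wrong."""
    states, sigma = list(states), list(sigma)
    if not initial:
        return False  # case 1: the language is empty
    if any(q not in finals for q in states):
        return False  # case 2: some characterising word w is not in L
    if any(not delta.get((q, a)) for q in states for a in sigma):
        return False  # case 3: some word wa is not in L
    return True       # every state accepts Sigma*
```

An automaton with a single accepting initial state looping on every letter is reported universal. Adding an unreachable, non-accepting state without transitions leaves the language equal to Σ*, yet the check answers negatively: that state accepts ∅, which is not a derivative of Σ*, so the automaton is not residual.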
Next we consider equivalence of residual automata and checking whether an automaton is residual. Both turn out to be undecidable, and we use the following construction to prove this.
Construction 5.2. Let A = (Σ, Q, I, F, δ) be a nondeterministic automaton. Let be an extended alphabet, where we assume the new symbols q and q to be disjoint from Σ. We now construct two residual automata from A, where those symbols are used as anchors: A anc = (Σ , Q anc , I anc , F anc , δ anc ), where A = (Σ , Q , I , F , δ ), where Note that these constructions are effective, as they involve computations over orbit-finite sets. We observe the following facts about A anc and A : (1) The states q and q (in both automata) are anchored by the words q and q respectively. Moreover, any symbol from the original alphabet a ∈ Σ is a characterising word for the state in A . So we conclude that both automata are residual.
(2) For q ∈ Q we note that the languages q anc and q are the same (where q anc ∈ Q anc and q ∈ Q denote the "same" state). Similarly we have q anc = q . Proof. We show undecidability by reducing the universality problem for nondeterministic automata to the equivalence problem. (Note that a reduction from universality of residual automata will not work as that is decidable.) We use the above construction and prove that  (3). When considering all words on Σ * , we note that the anchors will lead to single states q which accept the same languages by (2). So L(A anc ) = L(A ) as required.
Conversely, suppose that the two constructed automata accept the same language. Then by (3) we can conclude that L(A) = Σ*. Hence universality of nondeterministic automata reduces to equivalence of residual automata, and since the former is undecidable, equivalence of residual automata is undecidable.
Finally, determining whether an automaton is residual is undecidable. In other words, residuality cannot be characterised as a syntactic property. This makes learning techniques all the more valuable, as they provide automata that are residual by construction.
Proposition 5.4. The problem of determining whether a given nondeterministic nominal automaton is residual is undecidable.
Proof. The construction is inspired by [DLT02, Proposition 8.4]. We show undecidability by reducing the universality problem for nondeterministic nominal automata to the residuality problem.
Let A = (Σ, Q, I, F, δ) be a nominal (nondeterministic) automaton on the alphabet Σ. We apply Construction 5.2 and extend the alphabet Σ further by where we assume {$, #} to be disjoint from Σ. We define A = (Σ , Q , I , F , δ ) by  Before we assume anything about A, let us analyse A . In particular, let us consider whether the residuality property holds for each state. From (1) we know that this holds for A anc . For the states x and z we have z = Σ * = $ −1 L(A ) and x = # −1 L(A ) (see Figure 7). The only remaining state for which we do not yet know whether the residuality property holds is state y.
If L(A) = Σ* (i.e., the original automaton is universal), then ⟦y⟧ = ⟦x⟧. In this case ⟦y⟧ is the derivative by # of the language of the constructed automaton, so every state satisfies the residuality property and the constructed automaton is residual.
Conversely, suppose that the constructed automaton is residual, and let L be its language. Then ⟦y⟧ = w⁻¹L for some word w. Provided that L(A) is not empty, there is some u ∈ L(A), and so $u ∈ ⟦y⟧. Hence w cannot start with a letter a ∈ Σ, with one of the anchor symbols for q ∈ Q, or with $, as the corresponding derivatives do not contain $u. The only possibility is that w = #^k for some k > 0. This implies ⟦y⟧ = ⟦x⟧, meaning that the language of A is universal.
This proves that A is universal if and only if the constructed automaton is residual.
These results also hold for the subclass of non-guessing automata, as the constructions do not introduce any guessing and universality for non-guessing nondeterministic nominal automata is undecidable.
Closure properties. We will now show that several closure properties fail for residual languages. Interestingly, this parallels the situation for probabilistic languages: residual ones are not even closed under convex sums. We emphasise that residual automata were devised for learning purposes, where closure properties play no significant role. In fact, one typically exploits closure properties of the wider class of nondeterministic models, e.g., for automata-based verification. The following results show that in our setting this is indeed unavoidable.
Consider the alphabet Σ = A ∪ {Anc(a) | a ∈ A} and the residual language L_r from Section 3. We consider a second language L_2 = A*, which can be accepted by a deterministic (hence residual) automaton. We have the following non-closure results.
Union: The language L = L_r ∪ L_2 cannot be accepted by a residual automaton. Indeed, although derivatives of the form Anc(a)⁻¹L are still join-irreducible (see Section 3, residual case), they have no summand A*, which means that they cannot generate a⁻¹L = A* ∪ ⋃_{b≠a} Anc(b)⁻¹L. By Theorem 4.10(3) it follows that L is not residual.
Intersection: The language L = L_r ∩ L_2 = L_n cannot be accepted by a residual automaton, as we have seen in Section 3.
Reversal: The language {aw | a does not occur in w} is residual (even deterministic), but its reverse is L_n, which cannot be accepted by a residual automaton.
Complement: Consider the language L_ng,r of words in which some atom occurs twice. Its complement, the language of words in which all atoms are distinct (every atom is fresh), cannot even be recognised by a nondeterministic nominal automaton [BKL14].
Closure under concatenation and Kleene star is yet to be settled.
5.1. Length of characterising words. We end this section with a result about the length of characterising words. In the finite case, the characterising words of an n-state residual automaton have length at most 2^n, since one can determinise automata. In our case this no longer holds: we show that the length of characterising words cannot be bounded in terms of the number of states alone. We state this result in terms of register automata to help intuition.
Proposition 5.5. There is a family of residual register automata A_k (k ≥ 1) with two states and k registers whose characterising words have length k.
Proof. We define a variation on the automaton A_r from Section 3, using the alphabet Σ = A ∪ {Anc(a) | a ∈ A}. The automaton A_k is defined by the following sets, where P_k(A) denotes the set of k-element subsets of A (note that A_1 = A_r): A state S = {a_1, ..., a_k} ∈ P_k(A) is anchored by the word w = Anc(a_1) ⋯ Anc(a_k), which has length k. This is also the shortest characterising word for that state. Note that Q has only two orbits, so a register automaton equivalent to this nominal automaton requires only two states.

Exact learning
In our previous paper on learning nominal automata [MSS+17], we gave an algorithm for learning residual automata which converges for deterministic languages. However, we observed experimentally that the algorithm was also able to learn certain nondeterministic languages. At that point we did not know which class of languages could be accepted by residual nominal automata, so it was left open whether the algorithm converges for all residual languages. In this section we answer this question negatively, but also provide a modified algorithm which always converges.
6.1. Angluin-style learning. We briefly review the classical automata learning algorithms L* by Angluin [Ang87] for deterministic automata, and NL* by Bollig et al. [BHKL09] for residual automata.
Both algorithms can be seen as a game between two players: the learner and the teacher. The learner aims to construct the minimal automaton for an unknown language L over a finite alphabet Σ. To do this, it may ask the teacher, who knows the language, two types of queries:
• Membership query: is a given word w in the target language, i.e., w ∈ L?
• Equivalence query: does a given hypothesis automaton H recognise the target language, i.e., is L = L(H)?
If the teacher replies yes to an equivalence query, then the algorithm terminates, as the hypothesis H is correct. Otherwise, the teacher must supply a counterexample, that is, a word in the symmetric difference of L and L(H). The availability of equivalence queries may seem like a strong assumption, and in fact it is often weakened by allowing only random sampling (see [KV94] or [Vaa17] for details).
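To make the two query types concrete, here is a minimal Python sketch of a teacher for a regular target language over a finite alphabet, given as a complete DFA. The class and method names are hypothetical. The equivalence query accepts a nondeterministic hypothesis and searches breadth-first over pairs (target state, set of hypothesis states), so the counterexample it returns is a shortest word in the symmetric difference of L and L(H). This only covers the finite-alphabet picture; a teacher for nominal languages has to work with orbit-finite sets.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Optional, Set, Tuple

# Hypothetical teacher; class and method names are illustrative only.
Word = Tuple[str, ...]


class DFATeacher:
    """Teacher that knows the target language as a complete DFA."""

    def __init__(self, alphabet: Iterable[str], start: Hashable,
                 final: Iterable[Hashable], delta: Dict[Tuple[Hashable, str], Hashable]):
        self.alphabet = sorted(set(alphabet))
        self.start = start
        self.final = set(final)
        self.delta = delta  # total transition function of the target DFA

    def membership(self, word: Word) -> bool:
        """Membership query: is word in L?"""
        q = self.start
        for a in word:
            q = self.delta[(q, a)]
        return q in self.final

    def equivalence(self, initial: Iterable[Hashable], final: Iterable[Hashable],
                    trans: Dict[Tuple[Hashable, str], Set[Hashable]]) -> Optional[Word]:
        """Equivalence query for an NFA hypothesis H = (initial, final, trans).

        Returns None if L(H) = L, otherwise a shortest counterexample."""
        final_h = set(final)
        start = (self.start, frozenset(initial))
        seen = {start}
        queue = deque([(start, ())])
        while queue:
            (q, hs), word = queue.popleft()
            # The word is a counterexample iff exactly one side accepts it.
            if (q in self.final) != any(h in final_h for h in hs):
                return word
            for a in self.alphabet:
                succ = frozenset(t for h in hs for t in trans.get((h, a), ()))
                nxt = (self.delta[(q, a)], succ)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, word + (a,)))
        return None
```

With the target "words ending in a" (DFA states 0 and 1, state 1 accepting), a hypothesis that accepts nothing receives the counterexample ("a",), while the two-state NFA that guesses the last letter is accepted.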
Observations about the language made by the learner via queries are stored in an observation table T. This is a table whose rows and columns range over two finite sets of words S, E ⊆ Σ*, respectively, and T(u, v) = 1 if and only if uv ∈ L. Intuitively, each row of T approximates a derivative of L; in fact, we have T(u) ⊆ u⁻¹L. However, the information contained in T may be incomplete: some derivatives w⁻¹L are not reached yet because no membership queries for w have been posed, and some pairs of rows T(u), T(v) may seem equal to the learner, because no word has been seen yet which distinguishes them. The learning algorithm adds new words to S when new derivatives are discovered, and to E when words distinguishing two seemingly identical derivatives are discovered.
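Once membership queries are available, the table itself is simple to represent. A minimal Python sketch, assuming words are tuples of letters and using a hypothetical function name: each row is stored as the set of columns v ∈ E with uv ∈ L, i.e., the approximation u⁻¹L ∩ E of the derivative u⁻¹L.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

# Hypothetical helper; the name fill_table is illustrative only.
Word = Tuple[str, ...]


def fill_table(S: Iterable[Word], E: Iterable[Word],
               member: Callable[[Word], bool]) -> Dict[Word, FrozenSet[Word]]:
    """Return T(u) = {v in E | uv in L} for each u in S.

    Each entry approximates the derivative u^-1 L from below: it only
    records which columns of E are accepted after reading u."""
    columns = list(E)
    return {u: frozenset(v for v in columns if member(u + v)) for u in S}
```

For the language of words ending in ab, the rows of ε and a coincide (both empty) while E = {ε}; adding the column b separates them, since ab ∈ L but b ∉ L.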
The table T is closed whenever one-letter extensions of derivatives are already in the table, i.e., T has a row for (ua)⁻¹L for all u ∈ S and a ∈ Σ. If the table is closed, L* is able to construct an automaton from T, whose states are the distinct rows (i.e., derivatives). The construction follows the classical one for the canonical automaton of a language from its derivatives [Ner58]. The NL* algorithm uses a modified notion of closedness, where one is allowed to take unions (i.e., a one-letter extension may be written as a union of rows in T), and hence is able to learn an RFSA accepting the target language.
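The difference between the two notions of closedness can be checked directly on a small table. The Python sketch below is a simplified illustration with hypothetical function names: is_closed is the L* condition, and is_union_closed only asks that each row(sa) be a union of rows indexed by S. NL* itself, and Definition 6.4 of this paper, restrict the union to prime (join-irreducible) rows, which this sketch does not do.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Sequence, Tuple

# Hypothetical, simplified closedness checks; names are illustrative only.
Word = Tuple[str, ...]
Row = FrozenSet[Word]


def rows_for(words: Iterable[Word], E: List[Word],
             member: Callable[[Word], bool]) -> Dict[Word, Row]:
    """row(w) = {e in E | we in L} for each word w."""
    return {w: frozenset(e for e in E if member(w + e)) for w in words}


def is_closed(S: List[Word], alphabet: Sequence[str], E: List[Word],
              member: Callable[[Word], bool]) -> bool:
    """L*-style closedness: every row(sa) already occurs as a row of S."""
    upper = set(rows_for(S, E, member).values())
    ext = rows_for([s + (a,) for s in S for a in alphabet], E, member)
    return all(r in upper for r in ext.values())


def is_union_closed(S: List[Word], alphabet: Sequence[str], E: List[Word],
                    member: Callable[[Word], bool]) -> bool:
    """Union-closedness: every row(sa) is a union of rows of S.

    A set r is such a union iff it equals the union of all rows of S contained in r."""
    upper = set(rows_for(S, E, member).values())
    ext = rows_for([s + (a,) for s in S for a in alphabet], E, member)
    return all(frozenset().union(*[u for u in upper if u <= r]) == r
               for r in ext.values())
```

On the finite language {a, ab, ba, bb} with S = {ε, a} and E = {a, b}, the row of b is {a, b}. It is not a row of S, but it is the union of the rows of ε and a, so the table is union-closed but not closed.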
When the table is not closed, a derivative is missing, and a corresponding row needs to be added. Once an automaton is constructed, it is submitted in an equivalence query. If a counterexample is returned, the table is extended again and the process is repeated. The L* and NL* algorithms adopt different counterexample-handling strategies: the former adds a new row, the latter a new column. Both result in a new derivative being detected.
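The two counterexample strategies can be sketched as follows, assuming the textbook formulations in which L* adds every prefix of the counterexample t as a row and NL* adds every suffix of t as a column. The function names are hypothetical, and words are represented as tuples of letters.

```python
from typing import List, Tuple

# Hypothetical helpers; names are illustrative only.
Word = Tuple[str, ...]


def prefixes(t: Word) -> List[Word]:
    """All prefixes of t, shortest first."""
    return [t[:i] for i in range(len(t) + 1)]


def suffixes(t: Word) -> List[Word]:
    """All suffixes of t, longest first."""
    return [t[i:] for i in range(len(t) + 1)]


def add_counterexample_lstar(S: List[Word], t: Word) -> List[Word]:
    """L*-style handling: new rows for the prefixes of t."""
    return S + [p for p in prefixes(t) if p not in S]


def add_counterexample_nlstar(E: List[Word], t: Word) -> List[Word]:
    """NL*-style handling: new columns for the suffixes of t."""
    return E + [s for s in suffixes(t) if s not in E]
```

Adding columns for all suffixes keeps E suffix-closed, a property that the correctness argument later in this section relies on.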
6.2. The nominal case. In [MSS+17] we gave nominal versions of L* and NL*, called νL* and νNL*, respectively. They seamlessly extend the original algorithms by operating on orbit-finite sets. This allows us to learn automata over infinite alphabets while using only finitely many queries. The algorithm νL* always terminates for deterministic languages, because such a language has only orbit-finitely many distinct derivatives (Theorem 2.8), and hence the algorithm only needs orbit-finitely many distinct rows in the observation table. However, it never terminates for languages not accepted by deterministic automata (such as residual or nondeterministic languages). The nondeterministic case is more interesting. Using Theorem 4.10, we can finally establish which nondeterministic languages can be characterised via orbit-finitely many observations.
Corollary 6.2 (of Theorem 4.10). Let L be a nondeterministic nominal language. If L is a residual language, then there exists an observation table with orbit-finitely many rows and columns from which we can construct the canonical residual automaton.
This explains why νNL* in [MSS+17] was able to learn some residual nondeterministic automata: an orbit-finite observation table exists, which allows νNL* to construct the canonical residual automaton. Unfortunately, νNL* is not guaranteed to find this orbit-finite observation table; we only have that guarantee for deterministic languages. The following example shows that νNL* may indeed diverge when trying to close the table.
Example 6.3. Suppose νNL* tries to learn the residual language L accepted by the automaton below over the alphabet Σ = A ∪ {Anc(a) | a ∈ A}. This is a slight modification of the residual language of Section 3. The algorithm starts by considering the row for the empty word ε and its one-letter extensions ε·a = a and ε·Anc(a) = Anc(a). These rows correspond to the derivatives ε⁻¹L = L, a⁻¹L and Anc(a)⁻¹L. Column labels are initialised to the empty word ε. At this point a⁻¹L and Anc(a)⁻¹L appear identical, as the only column does not distinguish them. However, they appear different from ε⁻¹L, so the algorithm adds the row for either a or Anc(a) in order to close the table. Suppose the algorithm decides to add a. Then it will consider the one-letter extensions ab, abc, abcd, etc. Since these correspond to different derivatives, each strictly smaller than the previous one, the algorithm gets stuck in an attempt to close the table. At no point will it try to close the table with the word Anc(a), since the row of Anc(a) stays equal to that of a. So in this case νNL* does not terminate. However, if the algorithm instead adds Anc(a) to the row labels, it will then also add Anc(a)Anc(b), which is a characterising word for the initial state. In that case, νNL* terminates.
6.3. Modified νNL*. We modify the νNL* algorithm from [MSS+17] to ensure that it always terminates. We do this by changing how the table is closed. The algorithm is shown in Algorithm 1, with the changes made to νNL* in red. In short, the change is as follows: when the algorithm adds a word w to the set of rows, it also adds all other words of length |w|. Since all words of bounded length are added, the algorithm eventually finds all characterising words of the canonical residual automaton, and it is therefore able to reconstruct this automaton. We briefly recall the notation we use in the algorithm and afterwards prove convergence.
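For a finite alphabet, the modified closing step can be sketched in a few lines of Python. This is a simplified illustration with hypothetical names: Σ^{≤l} becomes a finite list, whereas in the nominal setting it is an orbit-finite set handled by computation with atoms. The sketch adds all words of length at most |sa|, which coincides with adding the words of length |sa| whenever S already contains all shorter words. It also makes the cost of the modification visible, since the number of added words grows exponentially in l.

```python
from itertools import product
from typing import List, Sequence, Tuple

# Hypothetical finite-alphabet sketch; names are illustrative only.
Word = Tuple[str, ...]


def words_up_to(alphabet: Sequence[str], l: int) -> List[Word]:
    """All words of length at most l over a finite alphabet, shortest first."""
    letters = sorted(set(alphabet))
    return [w for n in range(l + 1) for w in product(letters, repeat=n)]


def close_by_length(S: List[Word], sa: Word, alphabet: Sequence[str]) -> List[Word]:
    """Modified closing step: instead of adding only the offending word sa
    to the rows, add every word of length at most |sa|."""
    return S + [w for w in words_up_to(alphabet, len(sa)) if w not in S]
```

For Σ = {a, b}, adding the single word a to S = {ε} also adds b, and Σ^{≤3} already contains 15 words.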
We denote the observation table T by the pair T = (S, E) of row and column indices. Membership is queried for the set SE ∪ SΣE, and these observations are stored during the algorithm. The function row_T : S ∪ SΣ → P_fs(E) returns the content of each row, i.e., row_T(t) := {e ∈ E | te ∈ L}. Since row_T takes values in a nominal join-semilattice, we use the notation (≤, <, ∨, ...) from Section 4.1 on rows. Note that row_T(s) = s⁻¹L ∩ E can be thought of as an approximation to s⁻¹L. We omit the subscript T in row.
Algorithm 1: Modified nominal NL* algorithm for Theorem 6.6.
Modified νNL* learner
1   Initialise T with S, E = {ε}
2   repeat
3     while T is not join-closed or not join-consistent
4       if T is not join-closed
5         find s ∈ S, a ∈ Σ such that row(sa) ∈ JI(Rows(T)) \ Rows↑(T)
6         l = length of the word sa
7         S = S ∪ Σ^l
8       if T is not join-consistent
9         find s_1, s_2 ∈ S, a ∈ Σ, and e ∈ E such that row(s_1) ≤ row(s_2) but e ∈ row(s_1 a) and e ∉ row(s_2 a)
10        E = E ∪ orb(ae)
11    Query H = A(T) for equivalence
12    if the Teacher replies no, with a counterexample t
13      E = E ∪ {orb(t_0) | t_0 is a suffix of t}
14  until the Teacher replies yes to the equivalence query
15  return H
Given an observation table T = (S, E), we define the set of rows as Rows(T) := {row(t) | t ∈ S ∪ SΣ}. This is an orbit-finite poset, ordered by ≤, that is, the order on P_fs(E) given by subset inclusion. We define the set of upper rows as Rows↑(T) := {row(s) | s ∈ S} ⊆ Rows(T).
Definition 6.4. A table T = (S, E) is
• join-closed if for each s ∈ S and a ∈ Σ we have row(sa) = ⋁{row(s′) | s′ ∈ S, row(s′) ≤ row(sa), row(s′) ∈ JI(Rows(T))}; in words: each extended row sa can be obtained as a join of join-irreducible rows in S;
• join-consistent if for all s_1, s_2 ∈ S and a ∈ Σ we have row(s_1) ≤ row(s_2) ⟹ row(s_1 a) ≤ row(s_2 a).
Another way to define join-closedness would be to consider the set JI(Rows↑(T)) instead of JI(Rows(T)) ∩ Rows↑(T). This would change the algorithm slightly, but not substantially. We stick to the original description of NL* [BHKL09]. This closely follows the definition of the canonical residual automaton in Theorem 4.10, but only uses the information from the observation table. This construction is effective, because one can decide whether r ∈ Rows(T) is a join-irreducible element: this is the case if r ≠ ⋁{y ∈ Rows(T) | y < r} and r is non-empty. This is a set-builder expression in the programming language developed in [Boj19]. The rest of the construction is directly given as set-builder expressions.
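The join-irreducibility test and the two conditions of Definition 6.4 can be illustrated on a finite table, where rows are finite sets of column words and the join is set union. The following Python sketch uses hypothetical names and assumes that the dictionary row has exactly the words of S ∪ SΣ as keys; in the nominal algorithm the same set-builder expressions range over orbit-finite sets instead.

```python
from typing import Dict, FrozenSet, Sequence, Set, Tuple

# Hypothetical finite sketch of Definition 6.4; names are illustrative only.
Word = Tuple[str, ...]
Row = FrozenSet[Word]


def join_irreducibles(rows: Set[Row]) -> Set[Row]:
    """r is join-irreducible iff r is non-empty and r differs from the
    join (here: union) of all rows strictly below it."""
    return {r for r in rows
            if r and frozenset().union(*[y for y in rows if y < r]) != r}


def is_join_closed(S: Sequence[Word], alphabet: Sequence[str],
                   row: Dict[Word, Row]) -> bool:
    """Each row(sa) is the join of the join-irreducible upper rows below it.

    Assumes the keys of `row` are exactly the words of S ∪ SΣ."""
    all_rows = set(row.values())                    # Rows(T)
    upper = {row[s] for s in S}                     # Rows↑(T)
    primes = join_irreducibles(all_rows) & upper    # JI(Rows(T)) ∩ Rows↑(T)
    return all(
        frozenset().union(*[p for p in primes if p <= row[s + (a,)]]) == row[s + (a,)]
        for s in S for a in alphabet
    )


def is_join_consistent(S: Sequence[Word], alphabet: Sequence[str],
                       row: Dict[Word, Row]) -> bool:
    """row(s1) <= row(s2) implies row(s1 a) <= row(s2 a) for every letter a."""
    return all(
        row[s1 + (a,)] <= row[s2 + (a,)]
        for s1 in S for s2 in S if row[s1] <= row[s2]
        for a in alphabet
    )
```

For the finite language {a, ab, ba, bb} with S = {ε, a} and E = {a, b}, the join-irreducible rows are {a} and {b}, while {a, b} is their join and ∅ is the empty join; this table is both join-closed and join-consistent.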
In the remainder of this section we prove termination of our modified learning algorithm. In the following, |X| is the orbit-count of an orbit-finite set X. By the atom-dimension of X we mean the maximal size of supports of elements of X. Let p(k) denote the number of orbits of X_k × X_k, where X_k = {(a_1, ..., a_k) | a_i ∈ A, a_i ≠ a_j for all i ≠ j} is the set of k-tuples of distinct atoms; p(k) equals the number of partial permutations on a k-element set.
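The orbit of a pair of k-tuples of distinct atoms is determined by which positions of the first tuple coincide with which positions of the second, that is, by a partial bijection of {1, ..., k}. The following Python sketch (function names are hypothetical) counts these partial permutations with the closed formula Σ_i C(k, i)² · i! and cross-checks it by brute-force enumeration. The values for small k illustrate how quickly the factor p(dl) in the bound below grows.

```python
from itertools import product
from math import comb, factorial

# Hypothetical helpers; names are illustrative only.


def p(k: int) -> int:
    """Number of partial permutations of a k-element set: choose i points of the
    domain, i points of the codomain, and one of the i! bijections between them."""
    return sum(comb(k, i) ** 2 * factorial(i) for i in range(k + 1))


def p_brute_force(k: int) -> int:
    """Count maps {0..k-1} -> {0..k-1} ∪ {None} that are injective on defined points."""
    count = 0
    for images in product(list(range(k)) + [None], repeat=k):
        defined = [x for x in images if x is not None]
        if len(defined) == len(set(defined)):
            count += 1
    return count


print([p(k) for k in range(5)])  # [1, 2, 7, 34, 209]
```

For k = 1 the two orbits of A × A are the pairs with equal atoms and the pairs with distinct atoms, which matches p(1) = 2.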
Theorem 6.6. Algorithm 1 query learns residual nominal languages. Moreover, it uses at most O(l + |Σ^{≤l}|² · p(dl)) equivalence queries, where l is the length of the longest characterising word and d is the atom-dimension of Σ.
The theorem will be proven with the help of the following lemmata.
Lemma 6.7. Every observation table (S, E) during the execution can be extended to a join-consistent table (S, E′) by adding orbit-finitely many columns.
Proof. Note that the row function defines a preorder ⊑ on S by s_1 ⊑ s_2 iff row(s_1) ≤ row(s_2). Each time columns are added in line 10 to resolve a join-inconsistency, this preorder is strictly refined: after adding ae we have ae ∈ row(s_1) but ae ∉ row(s_2), so s_1 ⋢ s_2. So we obtain a chain of preorders ⊑_1 ⊋ ⊑_2 ⊋ ⊑_3 ⊋ ⋯, all contained in S × S. When the preorder has been maximally refined it only contains identity pairs, so join-consistency trivially holds. Since the set S × S is orbit-finite, this chain has length at most |S × S|. The number of orbits in S × S is bounded by |S|² · p(k), where k is the atom-dimension of S.
Lemma 6.8. If the set S of an observation table T = (S, E) contains all characterising words, then the table is join-closed.
Proof. For this lemma, we consider the idealised observation table T̄ = (S, Σ*). In this table, we have row_T̄(s) = s⁻¹L for each s ∈ S. Since S contains all characterising words, we have JI(Rows(T̄)) = JI(Der(L)). This means that the idealised table is join-closed, that is, for s ∈ S and a ∈ Σ we have row_T̄(sa) = ⋁_{s′ ∈ I} row_T̄(s′) for a suitable orbit-finite set of row indices I ⊆ S. This equation still holds if we intersect both sides with E, and so the table T is join-closed.
Lemma 6.9. Given an observation table (S, E) for a residual language, the algorithm extends it to a join-closed and join-consistent table (S′, E′) in finitely many steps.
Proof. We show that lines 7 and 10 are executed finitely many times. Line 7 adds Σ^l to the set S for increasing l, and l is only increased until Σ^{≤l} contains all the required characterising words (Lemma 6.8). When line 10 is executed, we have two cases: if the resulting table (S, E′) is join-closed but not join-consistent, we keep adding columns,
ε ∈ ⟦row(s)⟧ ∩ E ⟺ row(s) ∈ F and ε ∈ E ⟺ ε ∈ row(s) ∧ ε ∈ E
au ∈ ⟦row(s)⟧ ∩ E ⟺ u ∈ ⟦δ(row(s), a)⟧ ⟺ ∃s′ with row(s′) ≤ row(sa) and u ∈ ⟦row(s′)⟧ ∩ E ⟺ ∃s′ with row(s′) ≤ row(sa) and u ∈ row(s′) and u ∈ E ⟺ u ∈ ⋁δ(row(s), a) and u ∈ E ⟺ au ∈ row(s) and u ∈ E
Note that we need E to be suffix-closed and that the empty word is in E.
We can now prove the main theorem of this section.
Proof of Theorem 6.6. Each time a counterexample is added, the next hypothesis (which will always be constructed, by Lemma 6.9) will be different (Lemma 6.10). But this can only happen if a column or row is added, which we only need to do finitely many times (Lemmas 6.8 and 6.7). To be precise, this happens at most l + |Σ^{≤l}|² · p(dl) times, where p is as defined before Theorem 6.6, d is the atom-dimension of Σ, and l is the least number such that Σ^{≤l} contains a characterising word for each state of the canonical residual automaton. (Put differently: consider the shortest characterising words for all states; then l is the length of the longest of these.) So we conclude that the algorithm terminates after finitely many equivalence queries, and it only needs finitely many membership queries, since the table (S ∪ SΣ) × E is orbit-finite.
Unfortunately, considering all words up to a certain length requires many membership queries. In fact, characterising words can be exponential in length [DLT02], meaning that this algorithm may need doubly exponentially many membership queries.⁸
⁸ The reader should not interpret this as a complexity upper bound. In fact, no upper bound is known on the length of characterising words.
Remark 6.11. Note that our termination argument is not concerned with the implementation of the teacher. This is standard for Angluin-style algorithms, which assume that the teacher is always able to provide correct answers to queries. As mentioned, this assumption is often too strong, and in our setting a direct equivalence check is not available due to Proposition 5.1. In practice, however, it is common to use testing techniques [Vaa17].
7. Discussion
7.1. Conclusion. In this paper we have investigated a subclass of nondeterministic automata over infinite alphabets. This class naturally arises in the context of query learning, where automata have to be constructed from finitely many observations. Although there are many classes of data languages, we have shown that our class of residual languages admits canonical automata. The states of these automata correspond to join-irreducible elements.
In the context of learning, we show that convergence of standard Angluin-style algorithms is not guaranteed, even for residual languages. We propose a modified algorithm which guarantees convergence at the expense of an increase in the number of observations.
We emphasise that, unlike other algorithms based on residuality such as NL* [BHKL09] and AL* [AEF15], our algorithm does not depend on the size, or even the existence, of the minimal deterministic automaton for the target language. This is a crucial difference, since dependence on the minimal deterministic automaton hinders generalisation to nondeterministic nominal automata, which are strictly more expressive. Ideally, in the residual case, one would like to have an efficient algorithm for which the complexity depends only on the length of characterising words, which is an intrinsic feature of residual automata. To the best of our knowledge, no such algorithm exists in the finite setting.
Finally, another interesting open question is whether all nondeterministic automata can be efficiently learned. We note that nondeterministic automata can be enumerated, and hence can be learned via equivalence queries only, although this results in a highly inefficient algorithm. This parallels the current understanding of learning probabilistic languages: although efficient algorithms for learning deterministic and residual probabilistic languages in the limit exist [DE04], the general case is still open.