Learning Regular Languages over Large Ordered Alphabets

This work is concerned with regular languages defined over large alphabets, either infinite or just too large to be expressed enumeratively. We define a generic model where transitions are labeled by elements of a finite partition of the alphabet. We then extend Angluin's L* algorithm for learning regular languages from examples for such automata. We have implemented this algorithm and we demonstrate its behavior where the alphabet is a subset of the natural or real numbers. We sketch the extension of the algorithm to a class of languages over partially ordered alphabets.


Introduction
The main contribution of this paper is a generic algorithm for learning regular languages defined over a large alphabet Σ. Such an alphabet can be infinite, like N or R, or just so large, like B^n for very large n or large subsets of N, that it is impossible or impractical to treat it in an enumerative way, that is, to write down the entries of the transition function δ(q, a) for every a ∈ Σ. The obvious solution is to use a symbolic representation where transitions are labeled by predicates which are applicable to the alphabet in question. Learning algorithms infer an automaton from a finite set of words (the sample) for which membership is known. Over small alphabets, the sample should include the set S of all the shortest words that lead to each state (access sequences) and, in addition, the set S · Σ of all their Σ-continuations. Over large alphabets this is not a practical option and as an alternative we develop a symbolic learning algorithm over symbolic words which are only partially backed up by the sample. In a sense, our algorithm is a combination of automaton learning and learning of non-temporal predicates. Before getting technical, let us discuss briefly some motivation.
Finite automata are among the cornerstones of Computer Science. From a practical point of view they are used routinely in various domains ranging from syntactic analysis, design of user interfaces or administrative procedures to implementation of digital hardware and verification of software and hardware protocols. Regular languages admit a very [...]
Our approach is different: we do allow the values of the input symbols to influence transitions via predicates, possibly of a restricted complexity. These predicates involve domain constants and they partition the alphabet into finitely many classes. For example, over the integers a state may have transitions labeled by conditions of the form c_1 ≤ x ≤ c_2 which give real (but of limited resolution) access to the input domain. On the other hand, we insist on a finite (and small) memory so that the exact value of x cannot be registered and has no future influence beyond the transition it has triggered. Many control systems, artificial (sequential machines working on quantized numerical inputs) as well as natural (central nervous system, the cell), are believed to operate in this manner. The automata that we use, like the symbolic automata and transducers studied in [HV11, VHL+12, VB12], are geared toward languages recognized by automata having a large alphabet and a relatively small state space.
We then develop a symbolic version of Angluin's L* algorithm for learning regular sets from queries and counter-examples whose output is a symbolic automaton. The main difference relative to the concrete algorithm is that in the latter, every transition δ(q, a) in a conjectured automaton has at least one word in the sample that exercises it. In the symbolic case, a transition δ(q, a), where a stands for a set of concrete symbols, will be backed up in the sample only by a subset of a. Thus, unlike concrete algorithms where a counter-example always leads to a discovery of one or more new states, in our algorithm it may sometimes only modify the boundaries between partition blocks without creating new states. There are some similarities between our work and another recent adaptation of the L* algorithm to symbolic automata, the Σ* algorithm of [BB13]. This work is incomparable to ours as they use a richer model of transducers and more general predicates on inputs and outputs. Consequently their termination result is weaker and is relative to the termination of the counter-example guided abstraction refinement procedure.
The rest of the paper is organized as follows. In Section 1 we provide a quick summary of learning algorithms over small alphabets. In Section 2 we define symbolic automata and then extend the structure which underlies all automaton learning algorithms, namely the observation table, to be symbolic, where symbolic letters represent sets, and where entries in the table are supported only by partial evidence. In Section 4 we write down a symbolic learning algorithm, an adaptation of L* for totally ordered alphabets such as R or N, and illustrate the behavior of a prototype implementation. The algorithm is then extended to languages over partially ordered alphabets such as N^d and R^d where, in each state, the labels of outgoing transitions form a monotone partition of the alphabet represented by finitely many points. We conclude by a discussion of past and future work.

Learning Regular Sets
We briefly survey Angluin's L* algorithm [Ang87] for learning regular sets from membership queries and counter-examples, with slightly modified definitions to accommodate its symbolic extension. Let Σ be a finite alphabet and let Σ* be the set of sequences (words) over Σ. Any order relation < over Σ can be naturally lifted to a lexicographic order over Σ*. With a language L ⊆ Σ* we associate a characteristic function f : Σ* → {+, −}, where f(w) = + if the word w ∈ Σ* belongs to L and f(w) = − otherwise.
A deterministic finite automaton over Σ is a tuple A = (Σ, Q, δ, q_0, F), where Q is a non-empty finite set of states, q_0 ∈ Q is the initial state, δ : Q × Σ → Q is the transition function, and F ⊆ Q is the set of final or accepting states. The transition function δ can be extended to δ : Q × Σ* → Q, where δ(q, ε) = q, and δ(q, u · a) = δ(δ(q, u), a) for q ∈ Q, a ∈ Σ and u ∈ Σ*. A word w ∈ Σ* is accepted by A if δ(q_0, w) ∈ F, otherwise w is rejected. The language recognized by A is the set of all accepted words and is denoted by L(A).
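The extended transition function and the acceptance condition can be sketched in a few lines of Python. This is only an illustration: the dictionary encoding of δ, the names `run`, `accepts`, `PARITY_DELTA` and `PARITY_FINAL`, and the parity language itself are assumptions made for the example and do not come from the paper.

```python
def run(delta, q, word):
    """Extended transition function: run(q, eps) = q and run(q, u.a) = delta(run(q, u), a)."""
    for a in word:
        q = delta[(q, a)]
    return q


def accepts(delta, q0, final, word):
    """A word is accepted when its run from q0 ends in an accepting state."""
    return run(delta, q0, word) in final


# Hypothetical automaton: words over {0, 1} containing an odd number of 1s.
PARITY_DELTA = {
    ("even", 0): "even", ("even", 1): "odd",
    ("odd", 0): "odd", ("odd", 1): "even",
}
PARITY_FINAL = {"odd"}

if __name__ == "__main__":
    print(accepts(PARITY_DELTA, "even", PARITY_FINAL, [1, 0, 0]))
```

Storing δ as a dictionary keyed by (state, letter) is exactly the enumerative representation that becomes impractical for large alphabets.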
Learning algorithms, represented by the learner, are designed to infer an unknown regular language L (the target language). The learner aims to construct a finite automaton that recognizes L by gathering information from the teacher. The teacher knows L and can provide information about it. It can answer two types of queries: membership queries, i.e., whether a given word belongs to the target language, and equivalence queries, i.e., whether a conjectured automaton suggested by the learner is the right one. If this automaton fails to accept L, the teacher responds to the equivalence query with a counter-example, a word misclassified by the conjectured automaton.
In the L* algorithm, the learner starts by asking membership queries. All information provided is suitably gathered in a table structure, the observation table. Then, when the information is sufficient, the learner constructs a hypothesis automaton and poses an equivalence query to the teacher. If the answer is positive then the algorithm terminates and returns the conjectured automaton. Otherwise the learner accommodates the information provided by the counter-example into the table, asks additional membership queries until it can suggest a new hypothesis and so on, until termination.
A prefix-closed set S ⊎ R ⊂ Σ* (where ⊎ denotes disjoint union) is a balanced Σ-tree if for every a ∈ Σ: 1) for every s ∈ S, s · a ∈ S ∪ R, and 2) for every r ∈ R, r · a ∉ S ∪ R. Elements of R are called boundary elements or leaves.
Definition 1.1 (Observation Table). An observation table is a tuple T = (Σ, S, R, E, f) such that Σ is an alphabet, S ∪ R is a balanced Σ-tree, E is a subset of Σ* and f : (S ∪ R) · E → {−, +} is the classification function, a restriction of the characteristic function of the target language L.
The set (S ∪ R) · E is the sample associated with the table, that is, the set of words whose membership is known. The elements of S admit a tree structure isomorphic to a spanning tree of the transition graph rooted in the initial state. Each s ∈ S corresponds to a state q of the automaton for which s is an access sequence, one of the shortest words that lead from the initial state to q. The elements of R should tell us about the back- and cross-edges in the automaton and the elements of E are "experiments" that should be sufficient to distinguish between states. This works by associating with every s ∈ S ∪ R a specialized classification function f_s : E → {−, +}, defined as f_s(e) = f(s · e), which characterizes the row of the observation table labeled by s. To build an automaton from a table it should satisfy certain conditions.
Definition 1.2 (Closed, Reduced and Consistent Tables). An observation table T is:
• closed if for every r ∈ R there exists an s ∈ S such that f_r = f_s;
• reduced if f_s ≠ f_s′ for every two distinct s, s′ ∈ S;
• consistent if for every s, s′ ∈ S, f_s = f_s′ implies f_{s·a} = f_{s′·a} for every a ∈ Σ.
Note that a reduced table is trivially consistent and that for a closed and reduced table we can define a function g : R → S mapping every r ∈ R to the unique s ∈ S such that f_s = f_r. From such an observation table T = (Σ, S, R, E, f) one can construct an automaton A_T = (Σ, Q, q_0, δ, F) where Q = S, q_0 = ε, F = {s ∈ S : f_s(ε) = +} and δ(s, a) = s · a when s · a ∈ S, and δ(s, a) = g(s · a) when s · a ∈ R.
The learner attempts to keep the table closed at all times. The table is not closed when there is some r ∈ R such that f_r is different from f_s for all s ∈ S. To close the table, the learner moves r from R to S and adds the Σ-successors of r, i.e., all words r · a for a ∈ Σ, to R. The extended table is then filled up by asking membership queries until it becomes closed.
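The closedness test and the construction of the hypothesis from a closed and reduced table can be sketched as follows. This is a minimal illustration under stated assumptions: words are Python tuples, the classification function f is a callable returning True for + (standing in for a table already filled by membership queries), ε is assumed to belong to E, and the names `row`, `unclosed`, `hypothesis` and `odd_ones` are hypothetical.

```python
def row(f, s, E):
    """Row f_s of the table: the classification of s.e for every experiment e in E."""
    return tuple(f(s + e) for e in E)


def unclosed(f, S, R, E):
    """Boundary words whose row differs from the row of every element of S."""
    state_rows = {row(f, s, E) for s in S}
    return [r for r in R if row(f, r, E) not in state_rows]


def hypothesis(f, S, E, alphabet):
    """Return (q0, delta, final) for a closed and reduced table."""
    # Reduced table: rows of S are pairwise distinct, so a row identifies a state.
    by_row = {row(f, s, E): s for s in S}
    # s.a is either in S (its own row) or in R (mapped by g to the equivalent state).
    delta = {(s, a): by_row[row(f, s + (a,), E)] for s in S for a in alphabet}
    final = {s for s in S if f(s)}  # f_s(eps) = +
    return (), delta, final


def odd_ones(word):
    """Hypothetical target language: an odd number of 1s."""
    return sum(word) % 2 == 1
```

With S = [()] and R = [(0,), (1,)], `unclosed` reports the word (1,); once it is moved to S and its successors are added to R, the table is closed and `hypothesis` yields a two-state automaton.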
Variants of the L* algorithm differ in the way they treat counter-examples, as described in more detail in [BR04]. The original algorithm [Ang87] adds all the prefixes of the counter-example to S, thus possibly creating an inconsistency that should be fixed. The version proposed in [MP95] for learning ω-regular languages adds all the suffixes of the counter-example to E. The advantage of this approach is that the table always remains consistent and reduced with S corresponding exactly to the set of states. A disadvantage is the possible introduction of redundant columns that do not contribute to further discrimination between states. The symbolic algorithm that we develop in this paper is based on an intermediate variant, referred to in [BR04] as the reduced observation algorithm, where some prefixes of the counter-example are added to S and some suffixes are added to E.
Example 1.3. We illustrate the behavior of the L* algorithm while learning a language L over Σ = {1, 2, 3, 4, 5}. We use the tuple (w, +) to indicate a counter-example w ∈ L rejected by the conjectured automaton, and (w, −) for the opposite case. Initially, the observation table is T_0 = (Σ, S, R, E, f) with S = E = {ε} and R = Σ, and we ask membership queries for all words in (S ∪ R) · E to obtain table T_0, shown in Fig. 1. The table is not closed so we move word 1 to S, add its continuations 1 · Σ to R and ask membership queries to obtain table T_1, which is now closed. We construct a hypothesis A_1 (Fig. 2) from this table, and pose an equivalence query for which the teacher returns counter-example (3 · 1, −). We add 3 · 1 and its prefix 3 to set S and add all their continuations to the boundary of the table, resulting in table T_2 of Fig. 1. This table is not consistent: two elements ε and 3 in S are equivalent but their successors 1 and 3 · 1 are not. In order to distinguish the two strings we add to E the suffix 1 and end up with a closed and consistent table T_3. The new hypothesis for this table is A_3, shown in Fig. 2. Once more the equivalence query returns a counter-example, (1 · 3 · 3, −). We again add the counter-example and its prefixes to the table, ask membership queries to fill in the table and resolve the inconsistency that appears for 1 and 1 · 3 by adding suffix 3 to the table. The table now corresponds to the correct hypothesis A_5, and the algorithm terminates.

Symbolic Automata
In this section we introduce the variant of symbolic automata that we use. Symbolic automata [HV11, VB12] give a more succinct representation for languages over large finite alphabets and can also represent languages over infinite alphabets such as N, R, or R^n. The size of a standard automaton for a language grows linearly with the size of the alphabet and so does the complexity of learning algorithms such as L*. As we shall see, symbolic automata admit a variant of the L* algorithm whose complexity is independent of the alphabet size. Let Σ be a large, possibly infinite, alphabet, to which we will refer from now on as the concrete alphabet. We define a symbolic automaton to be an automaton over Σ where each state has a small number of outgoing transitions labeled by symbols that represent subsets of Σ. For every state, these subsets form a (possibly different) partition of Σ and hence the automaton is complete and deterministic. We start with an arbitrary alphabet viewed as an unstructured set and present the concept in a purely semantic manner before we move to ordered sets and inequalities in subsequent sections.
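Let $\boldsymbol{\Sigma}$ be a finite alphabet, which we call the symbolic alphabet, and whose elements we call symbolic letters or symbols. Let $\psi : \Sigma \to \boldsymbol{\Sigma}$ map concrete letters into symbolic ones. The Σ-semantics of a symbolic letter $\boldsymbol{a} \in \boldsymbol{\Sigma}$ is defined as $[\boldsymbol{a}]_\psi = \{a \in \Sigma : \psi(a) = \boldsymbol{a}\}$, and the set $\{[\boldsymbol{a}]_\psi : \boldsymbol{a} \in \boldsymbol{\Sigma}\}$ forms a partition of Σ. We will often omit ψ from the notation and use $[\boldsymbol{a}]$ when ψ, which is always present, is clear from the context. The Σ-semantics can be extended to symbolic words of the form $\boldsymbol{w} = \boldsymbol{a}_1 \cdot \boldsymbol{a}_2 \cdots \boldsymbol{a}_k \in \boldsymbol{\Sigma}^*$ as the concatenation of the concrete one-letter languages associated with the respective symbolic letters or, recursively speaking, $[\varepsilon] = \{\varepsilon\}$ and $[\boldsymbol{w} \cdot \boldsymbol{a}] = [\boldsymbol{w}] \cdot [\boldsymbol{a}]$ for $\boldsymbol{w} \in \boldsymbol{\Sigma}^*$ and $\boldsymbol{a} \in \boldsymbol{\Sigma}$.
Definition 2.1 (Symbolic Automaton). [...]
• $\delta$ and $\boldsymbol{\delta}$ are the concrete and symbolic transition functions respectively, such that $\delta(q, a) = \boldsymbol{\delta}(q, \psi_q(a))$,
• $q_0$ is the initial state and $F$ is a set of accepting states.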
The transition function is extended to words as in the concrete case and the symbolic automaton can be viewed as an acceptor of a concrete language. When at q and reading a concrete letter a, the automaton will take the transition $\boldsymbol{\delta}(q, \boldsymbol{a})$ where $\boldsymbol{a}$ is the unique element of $\boldsymbol{\Sigma}_q$ satisfying $a \in [\boldsymbol{a}]$. Hence L(A) consists of all concrete words whose run leads from q_0 to a state in F. A language L over alphabet Σ is symbolic recognizable if there exists a symbolic automaton A such that L = L(A).
Remark: The association of a symbolic language with a symbolic automaton is more subtle because we allow different partitions of Σ and hence different symbolic input alphabets at different states. The transition to be taken while being in a state q and reading a symbol $\boldsymbol{a} \notin \boldsymbol{\Sigma}_q$ is well defined only when $[\boldsymbol{a}] \subseteq [\boldsymbol{a}']$ for some $\boldsymbol{a}' \in \boldsymbol{\Sigma}_q$. Such a model can be transformed into an automaton which is complete over a symbolic alphabet which is common to all states as follows. Let [...] and let Σ = {b ∈ Σ′ : [b] ≠ ∅}. Then we define A = (Σ, Q, δ, q_0, F) where, by construction, for every b ∈ Σ and every q ∈ Q, there is a unique a ∈ Σ_q such that [b] ⊆ [a] and hence one can define the transition function as δ(q, b) = δ(q, a). This model is more comfortable for language-theoretic studies but in the learning context it introduces an unnecessary blowup in the alphabet size and the number of queries for every state. For this reason we stick in this paper to Definition 2.1, which is more economical. A similar approach of state-local abstraction has been taken in [IHS13] for learning parameterized languages. The construction of Σ′ is similar to the minterm construction of [DV14] used to create a common alphabet in order to apply the minimization algorithm of Hopcroft to symbolic automata. In any case, in our learning framework symbolic automata are used to read concrete and not symbolic words.
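For an ordered alphabet, reading a concrete letter amounts to locating the block of the current state's partition that contains it. The following sketch assumes left-closed, right-open interval blocks stored as sorted left endpoints; the class name `SymbolicAutomaton`, its fields, and the instance `EXAMPLE` are hypothetical and chosen only to illustrate state-local partitions.

```python
from bisect import bisect_right


class SymbolicAutomaton:
    """Symbolic automaton whose symbolic letters denote intervals (assumed encoding)."""

    def __init__(self, partitions, q0, final):
        # partitions[q] = (bounds, targets). bounds is a sorted list of left endpoints,
        # bounds[0] being the minimal letter; block i is [bounds[i], bounds[i + 1]).
        # No check is made that a letter lies inside the alphabet.
        self.partitions = partitions
        self.q0 = q0
        self.final = final

    def psi(self, q, a):
        """Index of the symbolic letter of state q whose semantics contains a."""
        bounds, _ = self.partitions[q]
        return bisect_right(bounds, a) - 1

    def step(self, q, a):
        """Concrete transition: delta(q, a) = symbolic delta(q, psi_q(a))."""
        _, targets = self.partitions[q]
        return targets[self.psi(q, a)]

    def accepts(self, word):
        q = self.q0
        for a in word:
            q = self.step(q, a)
        return q in self.final


# Hypothetical two-state automaton over [0, 100) with a different partition per state.
EXAMPLE = SymbolicAutomaton(
    partitions={0: ([0, 50], [1, 0]), 1: ([0, 30], [1, 0])},
    q0=0,
    final={1},
)
```

Each state stores only its own endpoints, so the cost of a step depends on the number of blocks at that state and not on the size of the concrete alphabet.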
It is straightforward that for a finite concrete alphabet Σ the set of languages accepted by symbolic automata coincides with the set of recognizable regular languages over Σ. Moreover, even when the alphabet is infinite, closure under Boolean operations is preserved.
Proposition 2.2 (Closure under Boolean Operations). Languages accepted by deterministic symbolic automata are effectively closed under Boolean operations.
Proof. Closure under complement is immediate by complementing the set of accepting states. For intersection the standard product construction is adapted as follows. Let L_1, L_2 be languages recognized by the symbolic automata [...] It is sufficient to observe that the corresponding implied concrete automata A_1, A_2 and A satisfy δ((q_1, q_2), a) = (δ_1(q_1, a), δ_2(q_2, a)) and the standard proof that L(A) = L(A_1) ∩ L(A_2) follows. Closure under union and set difference is then evident.
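For interval partitions, the outgoing transitions of a product state can be obtained by intersecting the blocks of the two component states. The sketch below is an illustration under assumptions that are not taken from the proof: blocks are left-closed, right-open intervals given by sorted left endpoints, both partitions start at the same minimal letter, and the function name `product_partition` is hypothetical.

```python
from bisect import bisect_right


def product_partition(part1, part2):
    """Partition of the product state (q1, q2) from the partitions of q1 and q2.

    Each argument is (bounds, targets) with bounds a sorted list of left endpoints.
    The blocks of the result are the non-empty pairwise intersections of blocks.
    """
    bounds1, targets1 = part1
    bounds2, targets2 = part2
    bounds = sorted(set(bounds1) | set(bounds2))
    targets = [
        (
            targets1[bisect_right(bounds1, b) - 1],
            targets2[bisect_right(bounds2, b) - 1],
        )
        for b in bounds
    ]
    return bounds, targets
```

Adjacent blocks that lead to the same pair of states are not merged here; doing so would give a coarser partition without changing the recognized language.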
The above product construction is used to implement equivalence queries where both the target language and the current conjecture are represented by symbolic automata. A counter-example is found by looking for a shortest path in the product automaton from the initial state to a state whose two components disagree on acceptance, and selecting a lexicographically minimal concrete word along that path.

Symbolic Observation Tables
In this section we adapt observation tables to the symbolic setting. They are similar to the concrete case with the additional notions of evidences and evidence compatibility.
Definition 3.1 (Balanced Symbolic Σ-Tree). A balanced symbolic Σ-tree is a tuple (Σ, S, R, ψ) where [...] It is required that for every s ∈ S and a ∈ Σ_s, s · a ∈ S ∪ R, and for any r ∈ R and a ∈ Σ, r · a ∉ S ∪ R. Elements of R are called boundary elements of the tree.
We will use observation tables whose rows are symbolic words and hence an entry in the table will constitute a statement about the inclusion or exclusion of a large set of concrete words in the language.We will not ask membership queries concerning all those concrete words, but only for a small representative subset that we call evidence.
As for the concrete case we use $\boldsymbol{f}_{\boldsymbol{s}} : E \to \{-, +\}$ to denote the partial evaluation of $\boldsymbol{f}$ to some symbolic word $\boldsymbol{s} \in S \cup R$, such that $\boldsymbol{f}_{\boldsymbol{s}}(e) = \boldsymbol{f}(\boldsymbol{s} \cdot e)$. Note that the set E consists of concrete words but this poses no problem because elements of E are used only to distinguish between states and do not participate in the derivation of the symbolic automaton from the table. Concatenation of a symbolic word and a concrete one follows concatenation of symbolic words as defined above, where each concrete letter $a$ is considered as a symbolic letter $\boldsymbol{a}$ with $[\boldsymbol{a}] = \{a\}$ and $\mu(\boldsymbol{a}) = a$. The notions of closed, consistent and reduced table are similar to the concrete case.
The set $\boldsymbol{M}_T = (S \cup R) \cdot E$ is called the symbolic sample associated with T. We require that for each word $\boldsymbol{w} \in \boldsymbol{M}_T$ there is at least one concrete $w \in \mu(\boldsymbol{w})$ whose membership in L, denoted by $f(w)$, is known. The set of such words is called the concrete sample and is defined as [...] A table where all evidences of the same symbolic word admit the same classification is called evidence-compatible.
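When a table T is evidence compatible, the symbolic classification function $\boldsymbol{f}$ can be defined for every $\boldsymbol{s} \in (S \cup R)$ and $e \in E$ as $\boldsymbol{f}(\boldsymbol{s} \cdot e) = f(s \cdot e)$ for any $s \in \mu(\boldsymbol{s})$.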
Theorem 3.4 (Automaton from Table). From a closed, reduced and evidence compatible table one can construct a deterministic symbolic automaton compatible with the concrete sample.
Proof. The proof is similar to the concrete case. Let $T = (\Sigma, \boldsymbol{\Sigma}, S, R, \psi, E, \boldsymbol{f}, \mu)$ be such a table, which is reduced and closed, and thus a function $g : R \to S$ such that $g(\boldsymbol{r}) = \boldsymbol{s}$ iff $\boldsymbol{f}_{\boldsymbol{r}} = \boldsymbol{f}_{\boldsymbol{s}}$ is well defined. The automaton derived from the table is then $A_T = (\Sigma, \boldsymbol{\Sigma}, \psi, Q, \boldsymbol{\delta}, q_0, F)$ where: [...] By construction and like the L* algorithm, $A_T$ classifies correctly the symbolic sample and, due to evidence compatibility, this holds also for the concrete sample.

Learning Languages over Ordered Alphabets
In this section we present a symbolic learning algorithm starting with an intuitive verbal description. The algorithmic scheme is similar to the concrete L* algorithm but differs in the treatment of counter-examples and the new concept of evidence compatibility. Whenever the table is not closed, S ∪ R is extended until closure. Then a conjectured automaton A_T is constructed and an equivalence query is posed. If the answer is positive we are done. Otherwise, the teacher provides a counter-example leading to the extension of S ∪ R and/or E. Whenever such an extension occurs, additional membership queries are posed to fill the table. The table is always kept evidence compatible and reduced except temporarily during the processing of counter-examples.
From now on we assume Σ to be a totally ordered alphabet with a minimal element a_0 and restrict ourselves to symbolic automata where the concrete semantics for every symbolic letter is an interval. In the case of a dense order like in R, we assume the intervals to be left-closed and right-open. The order on the alphabet can be extended naturally to a lexicographic order on Σ*. Our algorithm also assumes that the teacher provides a counter-example of minimal length which is minimal with respect to the lexicographic order. This strong assumption improves the performance of the algorithm and its relaxation is discussed in Section 7.
The rows of the observation table consist of symbolic words because we want to group together all concrete letters and words that are assumed to induce the same behavior in the automaton. New symbolic letters are introduced on two occasions: when a new state is discovered or when a partition is modified due to a counter-example. In both cases we set the concrete semantics $[\boldsymbol{a}]$ to the largest possible subset of Σ, given the current evidence (in the first case it will be Σ). As evidence we always select the smallest possible $a \in [\boldsymbol{a}]$ ($a_0$ when $[\boldsymbol{a}] = \Sigma$). The choice of the right evidences is a key point for the performance of the algorithm as we want to keep the concrete sample as small as possible and avoid posing unnecessary queries. For infinite concrete alphabets this choice of evidence guarantees termination.
The initial symbolic table is $T = (\Sigma, \boldsymbol{\Sigma}, S, R, \psi, E, \boldsymbol{f}, \mu)$, where $\boldsymbol{\Sigma} = \{\boldsymbol{a}_0\}$, $[\boldsymbol{a}_0] = \Sigma$, $S = \{\varepsilon\}$, $R = \{\boldsymbol{a}_0\}$, $E = \{\varepsilon\}$, and $\mu(\boldsymbol{a}_0) = \{a_0\}$. The table is filled by membership queries concerning $\varepsilon$ and $a_0$. Whenever T is not closed, there is some $\boldsymbol{r} \in R$ such that $\boldsymbol{f}_{\boldsymbol{r}} \neq \boldsymbol{f}_{\boldsymbol{s}}$ for every $\boldsymbol{s} \in S$. To close the table we move $\boldsymbol{r}$ from R to S, recognizing it as a new state, and check the behavior of its continuation. To this end we add to R the word $\boldsymbol{r}' = \boldsymbol{r} \cdot \boldsymbol{a}$, where $\boldsymbol{a}$ is a new symbolic letter with $[\boldsymbol{a}] = \Sigma$. We extend the evidence function by letting $\mu(\boldsymbol{r}') = \mu(\boldsymbol{r}) \cdot a_0$, assuming that all elements of Σ behave as $a_0$ from $\boldsymbol{r}$. Once T is closed we construct a hypothesis automaton as described in the proof of Theorem 3.4.
Algorithm 1 The symbolic algorithm
1: procedure Symbolic
...
if EQ(A_T) then ⊲ A_T is correct
...
Ask MQ for all words in {µ(r · a_new) · e : e ∈ E}
...
9: end while
11: end procedure
Procedure 3 Process counter-example
...
6: Ask MQ for all words in {µ(u · a_new) · e : e ∈ E}
...
10: else ⊲ u is in the boundary
13: S′ = S ∪ {u} ⊲ and becomes a state
14: if b = a_0 then
...
25: Ask MQ for all words in ...
...
end if
34: end procedure
When a counter-example w is presented, it is of course not part of the concrete sample. A misclassified word in the conjectured automaton means that somewhere a wrong transition is taken. Hence w admits a factorization $w = u \cdot b \cdot v$ where $u \in \Sigma^*$ and $b \in \Sigma$ is where the first wrong transition is taken. Obviously we do not know u and b in advance but know that this happens in the following two cases. Either b leads to an undiscovered state in the automaton of the target language, or letter b does not belong to the interval it was assumed to belong to in the conjectured automaton. The latter case happens only when b is not part of the evidence. Since the counter-example w is minimal, it admits [...] $\notin \mu(\boldsymbol{u}')$ for any word $\boldsymbol{u}'$ in the symbolic sample. We consider two cases, $\boldsymbol{u} \in S$ and $\boldsymbol{u} \in R$.
In the first case, when $\boldsymbol{u} \in S$, $\boldsymbol{u}$ is already a state in the hypothesis but b indicates that the partition boundaries are not correctly defined and need refinement. That is, $u \cdot b$ was wrongly considered to be part of $[\boldsymbol{u} \cdot \boldsymbol{a}]$ for some $\boldsymbol{a} \in \boldsymbol{\Sigma}_{\boldsymbol{u}}$, and thus b was wrongly considered to be part of [...]
In the second case, the symbolic word $\boldsymbol{u}$ is part of the boundary. From the counter-example we deduce that $\boldsymbol{u}$ is not equivalent to any of the existing states in the hypothesis and should form a new state. Specifically, we find the prefix $\boldsymbol{s}$ that was considered to be equivalent to $\boldsymbol{u}$, that is, $g(\boldsymbol{u}) = \boldsymbol{s} \in S$. Since the table is reduced, $\boldsymbol{f}_{\boldsymbol{u}} \neq \boldsymbol{f}_{\boldsymbol{s}'}$ for any other $\boldsymbol{s}' \in S$. Because w is the shortest counter-example, the classification of $s \cdot b \cdot v$ in the automaton is correct (otherwise $s \cdot b \cdot v$, for some $s \in [\boldsymbol{s}]$, would constitute a shorter counter-example) and different from that of $u \cdot b \cdot v$. We conclude that $\boldsymbol{u}$ is a new state, which is added to S. To distinguish between $\boldsymbol{u}$ and $\boldsymbol{s}$ we add to E the word $b \cdot v$, possibly with some of its suffixes (see [BR04] for a more detailed discussion of counter-example processing).
As $\boldsymbol{u}$ is a new state we need to add its continuations to R. We distinguish two subcases depending on b. If $b = a_0$, the smallest element of Σ, then a new symbolic letter $\boldsymbol{a}_{new}$ is added to $\boldsymbol{\Sigma}$, with $[\boldsymbol{a}_{new}] = \Sigma$ and [...] are added to R. A detailed description of the algorithm is given in Algorithm 1 and its major procedures, table closing and counter-example treatment, are described in Procedures 2 and 3 respectively. A statement of the form $\boldsymbol{\Sigma}' = \boldsymbol{\Sigma} \cup \{\boldsymbol{a}\}$ indicates the introduction of a new symbolic letter $\boldsymbol{a} \notin \boldsymbol{\Sigma}$. We use MQ and EQ as shorthands for membership and equivalence queries, respectively. In the following we illustrate the symbolic algorithm as applied to a language over an infinite alphabet.
Example 4.1. Let Σ = [0, 100) ⊂ R with the usual order and let L ⊆ Σ* be a target language. Fig. 5 shows the evolution of the symbolic observation tables and Fig. 6 depicts the corresponding automata and the concrete semantics of the symbolic alphabets.
We initialize the table with S = {ε}, R = {a_0}, µ(a_0) = {0} and E = {ε} and ask membership queries for ε (rejected) and 0 (accepted). The obtained table T_0 is not closed so we move a_0 to S, introduce Σ_{a_0} = {a_1}, where a_1 is a new symbol, and add a_0 · a_1 to R with µ(a_0 · a_1) = 0 · 0. Asking membership queries we obtain the closed table T_1 and its automaton A_1. We pose an equivalence query and obtain (50, −) as a (minimal) counter-example, which implies that all words smaller than 50 are correctly classified. We add a new symbol a_2 to Σ_ε and redefine the concrete semantics to [a_0] = {a < 50} and [a_2] = {a ≥ 50}. As evidence we select the smallest possible letter, µ(a_2) = 50, and ask membership queries to obtain the closed table T_2 and automaton A_2.
For this hypothesis we get a counter-example (0 · 30, −) whose prefix 0 is already in the sample, hence the misclassification occurs in the second transition. We refine the alphabet partition for state a_0 by introducing a new symbol a_3 and letting [a_1] = {a < 30} and [a_3] = {a ≥ 30}. Table T_3 is closed but automaton A_3 is still incorrect and a counter-example (50 · 0, −) is provided. The prefix 50 belongs to the evidence of a_2, which is moved from the boundary to become a new state, and its successor a_2 · a_4, for a new symbol a_4, is added to R. To distinguish a_2 from ε, the suffix 0 of the counter-example is added to E, resulting in T_4, which is not closed. The newly discovered state a_0 · a_1 is added to S, the filled table T_5 is closed and the conjectured automaton A_5 has two additional states. Subsequent equivalence queries result in counter-examples (50 · 20, +), (50 · 80, −) and (50 · 50 · 0, +), which are used to refine the alphabet partition at state a_2 and modify its outgoing transitions progressively as seen in automata A_6, A_7 and A_8, respectively. Automaton A_8 accepts the target language and the algorithm terminates.
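The refinement steps of this example, in which a minimal counter-example letter b splits a block and becomes the evidence of the new symbol, can be sketched as below. This is not Procedure 3, whose listing is not recoverable here: it covers only the introduction of a new symbol, assumes left-closed, right-open blocks stored as sorted left endpoints with one evidence per block, and the name `refine` is hypothetical.

```python
from bisect import bisect_right


def refine(bounds, evidence, b):
    """Split the block containing b at b and return the index of the new block.

    bounds:   sorted left endpoints of the blocks of one state (modified in place)
    evidence: evidence letter of each block, the smallest letter of the block
    b:        letter at which the minimal counter-example takes a wrong transition
    """
    i = bisect_right(bounds, b) - 1
    if bounds[i] == b:
        raise ValueError("b is already the left endpoint and evidence of a block")
    bounds.insert(i + 1, b)    # the old block shrinks to [bounds[i], b)
    evidence.insert(i + 1, b)  # b is the smallest letter of the new block
    return i + 1
```

For the initial state of Example 4.1, calling `refine` on bounds [0] with b = 50 yields the blocks [0, 50) and [50, 100) with evidences 0 and 50; membership queries for the new evidence are then needed to fill the corresponding row.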
Note that for the language in Example 1.3, the symbolic algorithm needs around 30 queries instead of the 80 queries required by L*. If we choose to learn a language such as the one described in Example 4.1, restricting the concrete alphabet to the finite alphabet Σ = {1, ..., 100}, then L* requires around 1000 queries compared to 17 queries required by our symbolic algorithm. As we shall see in Section 6, the complexity of the symbolic algorithm does not depend on the size of the concrete alphabet, only on the number of transitions.

Learning Languages over Partially-ordered Alphabets
In this section we sketch the extension of the results of this paper to partially-ordered alphabets of the form Σ = X^d where X is a totally-ordered set such as an interval [0, k) ⊆ R. Letters of Σ are d-tuples of the form x = (x_1, ..., x_d) and the minimal element is 0 = (0, ..., 0). The usual partial order on this set is defined as x ≤ y if and only if x_i ≤ y_i for all i = 1, ..., d. When x ≤ y and x_i ≠ y_i for some i the inequality is strict, denoted by x < y, and we say then that x dominates y. Two elements are incomparable, denoted by x ∥ y, if x_i < y_i and x_j > y_j for some i and j. For partially-ordered sets, a natural extension of the partition of an ordered set into intervals is a monotone partition, where for each partition block P there are no three points such that x < y < z, x, z ∈ P, and y ∉ P. We define in the following such partitions represented by a finite set of points.
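The componentwise order and the notion of incomparable minimal elements can be made concrete with a short sketch. Letters are assumed to be Python tuples of equal length, and the function names are hypothetical.

```python
def leq(x, y):
    """x <= y iff x_i <= y_i for every coordinate i."""
    return all(xi <= yi for xi, yi in zip(x, y))


def lt(x, y):
    """Strict inequality: x <= y and x differs from y in some coordinate."""
    return leq(x, y) and x != y


def incomparable(x, y):
    """x || y: neither x <= y nor y <= x."""
    return not leq(x, y) and not leq(y, x)


def minimal_elements(points):
    """Points of the collection that have no other point strictly below them."""
    return [p for p in points if not any(lt(q, p) for q in points)]
```

Unlike in the totally ordered case, `minimal_elements` may return several mutually incomparable points, which is why a partition block over X^d is described by a finite set of points rather than by a single endpoint.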
A forward cone B^+(x) ⊂ Σ is the set of all points dominated by a point x ∈ Σ (see Fig. 7a). Let F = {x_1, ..., x_l} be a set of points, then B^+(F) = B^+(x_1) ∪ ... ∪ B^+(x_l) as shown in Fig. 7b. From a family of sets of points ℱ = {F_0, ..., F_{m−1}}, such that F_0 = {0}, satisfying for every i: 1) ∀y ∈ F_i, ∃x ∈ F_{i−1} such that x < y, and 2) ∀y ∈ F_i, ∀x ∈ F_{i−1}, y ≮ x, we can define a monotone partition of the form P = {P_1, ..., P_{m−1}}, where [...]
A subset P of Σ, as defined above, may have several mutually-incomparable minimal elements, none of which is dominated by any other element of P. One can thus apply the symbolic learning algorithm but without the presence of a unique minimal evidence and a unique minimal counter-example. For this reason a symbolic word may have more than one evidence. Evidence compatibility is preserved, though, due to the nature of the partition.
The teacher is assumed to return a counter-example chosen from a set of incomparable minimal counter-examples. As in the algorithm for totally ordered alphabets, every counter-example either discovers a new state or refines a partition. The learning algorithm for partially-ordered alphabets is similar to Algorithm 1 and can be applied with only a minor modification in the treatment of the counter-examples, specifically in the refinement procedure. Lines 6-8 of Procedure 3 should be ignored in the case where there exists a symbolic letter a′, as illustrated in Fig. 8a, such that f(u [...] In such a case, function ψ is updated as in line 9 by replacing a_new by a′, and b should be added to µ(a′). In Fig. 8b, one can see the partition after refinement, where all letters above b have been moved from [a] to [a′].
The learner starts by asking MQs for the empty word. A symbolic letter a_0 is chosen to represent its continuations with the minimal element of Σ as evidence, i.e., µ(a_0) = (0, 0). The symbolic word a_0 is moved to S for the table T_0 to be closed. The symbolic letter a_1 is added to the alphabet of state a_0, and the learner asks an MQ for (0, 0) · (0, 0), the evidence of the symbolic word a_0 · a_1. The first hypothesis automaton is A_0 with Σ-semantics [a_0] = [a_1] = Σ. The counter-example ((45, 50), −) refines the partition for the initial state. The symbolic alphabet is extended to [...] and µ(a_2) = (45, 50). The new observation table and hypothesis are T_1 and A_1. Two more counter-examples, ((60, 0), −) and ((0, 70), −), refine the partition for the initial state, moving all letters greater than (60, 0) and (0, 70) to the Σ-semantics of a_2, as can be seen in ψ_2 and ψ_3 respectively.
After the hypothesis $A_3$, the counter-example ( … Then the counter-example $((45, 50) \cdot (0, 0), +)$ is presented. As we can see, the prefix $(45, 50)$ already exists in $\mu(a_2)$ and $a_2 \in R$, which means that $a_2$ becomes a state; to distinguish it from the state represented by the empty word, the learner adds to $E$ the suffix $(0, 0)$ of the counter-example. The resulting table $T_8$ is not closed, and $a_0 a_1$ is moved to $S$. The new table $T_9$ is closed and evidence compatible. The hypothesis $A_9$ now has four states, and the symbolic alphabet and $\Sigma$-semantics for each state can be seen in $\psi_9$. The counter-examples that follow refine the partition at state $a_2$. The new transitions discovered and all refinements are shown in $A_{10}$ to $A_{18}$ and $\psi_{10}$ to $\psi_{18}$. The language was learned using 20 membership queries and 17 counter-examples.
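The refinements of the initial state in the run above repeatedly add a minimal counter-example letter to the evidence of an existing symbol. A rough sketch of that bookkeeping follows, assuming that the $\Sigma$-semantics of a symbol is represented by the set of points whose forward cones cover it. The helper `add_minimal` is hypothetical; its pruning of dominated points is an addition for robustness and is not described in the text.

```python
from typing import List, Tuple

Point = Tuple[int, ...]


def dominates(x: Point, y: Point) -> bool:
    """True if x <= y componentwise (assumed order on the alphabet)."""
    return len(x) == len(y) and all(xi <= yi for xi, yi in zip(x, y))


def add_minimal(evidence: List[Point], b: Point) -> List[Point]:
    """Add b to a set of mutually incomparable points, keeping it an antichain."""
    if any(dominates(x, b) for x in evidence):
        # b already lies in the cone of an existing point: nothing changes.
        return list(evidence)
    # Drop points that the new, smaller point b now covers, then add b.
    kept = [x for x in evidence if not dominates(b, x)]
    return kept + [b]


# Letters carried by the three counter-examples that refine the initial state.
evidence: List[Point] = []
for letter in [(45, 50), (60, 0), (0, 70)]:
    evidence = add_minimal(evidence, letter)
print(evidence)
```

With a teacher that always returns minimal counter-examples, the pruning branch should rarely trigger; it is kept so that the sketch stays correct if that assumption is relaxed.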

On Complexity
The complexity of the symbolic algorithm is influenced not by the size of the alphabet but by the resolution (partition size) with which we observe it. Let $L \subset \Sigma^*$ be the target language and let $A$ be the minimal symbolic automaton recognizing this language, with state set $Q$ of size $n$ and a symbolic alphabet $\Sigma = \bigcup_{q \in Q} \Sigma_q$ such that $|\Sigma_q| \leq m$ for every $q$.
Each counter-example improves the hypothesis in one of two ways: either a new state is discovered or a partition is refined. Hence, at most $n - 1$ equivalence queries of the first type can be asked and $n(m - 1)$ of the second, resulting in $O(mn)$ equivalence queries.
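The bound follows by adding the two cases; the line below only restates the counts given above as a single sum.

```latex
\[
\underbrace{(n-1)}_{\text{new states}} \;+\; \underbrace{n(m-1)}_{\text{refinements}}
\;=\; mn - 1 \;=\; O(mn).
\]
```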
Concerning the size of the table, the set of prefixes $S$ is monotonically increasing and reaches a size of exactly $n$ elements. Since the table is, by construction, always kept reduced, the elements of $S$ represent exactly the states of the automaton. The size of the boundary is always smaller than the total number of transitions in the automaton, that is, at most $mn - n + 1$. The number of suffixes in $E$, which play a distinguishing role for the states of the automaton, ranges between $\log_2 n$ and $n$. Hence, the size of the table ranges between $(n + m) \log_2 n$ and $n(mn + 1)$. For a totally ordered alphabet, the size of the concrete sample coincides with the size of the symbolic sample associated with the table, and hence the number of membership queries asked is $O(mn^2)$. For a partially ordered alphabet with each $F_i$ defined by at most $l$ points, some additional queries are asked. For every row in $S$, at most $n(m - 1)(l - 1)$ additional words are added to the concrete sample, hence more membership queries may be needed. Furthermore, at most $l - 1$ more counter-examples are given to refine a partition. To conclude, the total number of queries asked to learn the language $L$ is $O(mn^2)$ if $l < n$ and $O(lmn)$ otherwise.
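For a feel of the magnitudes involved, the table-size bounds above can be evaluated for given $n$ and $m$. The sketch below simply plugs numbers into the expressions quoted in the text; the function name `table_size_bounds` and the sample values $n = 4$, $m = 3$ are illustrative assumptions.

```python
import math
from typing import Dict


def table_size_bounds(n: int, m: int) -> Dict[str, float]:
    """Evaluate the table-size expressions for n states and at most m symbols per state."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive")
    boundary_max = m * n - n + 1      # bound on the size of the boundary R
    rows_max = n + boundary_max       # rows of S plus rows of R, i.e. mn + 1
    return {
        "boundary_max": boundary_max,
        "rows_max": rows_max,
        "table_min": (n + m) * math.log2(n),  # lower end quoted in the text
        "table_max": n * rows_max,            # n suffixes times mn + 1 rows
    }


bounds = table_size_bounds(n=4, m=3)
print(bounds)
```

The upper end grows as $mn^2$, which matches the stated $O(mn^2)$ number of membership queries for a totally ordered alphabet.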

Conclusion
We have defined a generic algorithmic scheme for automaton learning, targeting languages over large alphabets that can be recognized by finite symbolic automata having a modest number of states and transitions. Some ideas similar to ours have been proposed for the particular case of parametric languages [BJR06] and recently in a more general setting. The genericity of our algorithm is due to a semantic approach (alphabet partitions), but of course each domain will have its own semantic and syntactic specialization in terms of the size and shape of the alphabet partitions. In this work we have implemented an instantiation of this scheme for alphabets such as $(\mathbb{N}, \leq)$ and $(\mathbb{R}, \leq)$. When dealing with numbers, the partition into a finite number of intervals (and monotone sets in higher dimensions) is very natural and is used in many application domains, ranging from quantization of sensor readings to income tax regulations. It will be interesting to compare the expressive power and succinctness of symbolic automata with other approaches for representing numerical time series, and to compare our algorithm with other inductive inference techniques for sequences of numbers.
As a first excursion into the domain, we have made quite strong assumptions on the nature of the equivalence oracle, which, already for small alphabets, is a bit too strong and pedagogical to be realistic. We assumed that it provides the shortest counter-example and also that it always chooses the minimal available concrete symbol. We can relax the latter assumption (or both) and even omit this oracle altogether, replacing it by random sampling, as already proposed in [Ang87] for concrete learning. Over large alphabets, it might be even more appropriate to employ probabilistic convergence criteria à la PAC learning [Val84] and be content with a correct classification of a large fraction of the words, thus tolerating imprecise tracing of boundaries in the alphabet partitions. This topic is the subject of ongoing work. Another challenging research direction is the adaptation of our framework to languages over Boolean vectors.

2: while $\exists r \in R$ such that $\forall s \in S$, $f_r \neq f_s$ do
3:     $S' = S \cup \{r\}$    ⊲ $r$ becomes a new state
4: … {suffixes of $b \cdot v$}
19: $\mu(u \cdot a_{new}) = \mu(u) \cdot a_0$
20: Ask MQ for all words in $\{\mu(u \cdot a_{new}) \cdot e : e \in E'\}$

Due to minimality, all letters in $[a]$ less than the letter $b$ behave like $\mu(a)$. We assume that all remaining letters in $[a]$ behave like $b$ and map them to a new symbol $a_{new}$ that we add to $\Sigma_u$. We then update $\psi_u$ such that $\psi'_u(a) = a_{new}$ for all $a \in [a]$ with $a \geq b$, and $\psi'_u(a) = \psi_u(a)$ otherwise. The evidence function is updated by letting $\mu(u \cdot a_{new}) = \mu(u) \cdot b$, and $u \cdot a_{new}$ is added to $R$.
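The refinement just described can be sketched for a totally ordered alphabet as follows. This is an illustration under assumptions, not the authors' implementation: the partition at a state is stored as sorted interval lower bounds with one symbol per interval, and the names `psi` and `refine` and the symbol strings are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def psi(starts: List[int], symbols: List[str], x: int) -> str:
    """Map a concrete letter x to the symbol of the interval that contains it."""
    i = bisect_right(starts, x) - 1
    if i < 0:
        raise ValueError("letter below the minimal element of the alphabet")
    return symbols[i]


def refine(starts: List[int], symbols: List[str], b: int,
           a_new: str) -> Tuple[List[int], List[str]]:
    """Split the interval containing b: letters >= b in it now map to a_new."""
    i = bisect_right(starts, b) - 1
    if i < 0:
        raise ValueError("letter below the minimal element of the alphabet")
    if starts[i] == b:
        raise ValueError("b is already the lower bound of an interval")
    # Letters of the old interval below b keep their symbol; the rest move to a_new.
    return (starts[:i + 1] + [b] + starts[i + 1:],
            symbols[:i + 1] + [a_new] + symbols[i + 1:])


# One symbol covering the whole alphabet, then two successive refinements.
starts, symbols = [0], ["a0"]
starts, symbols = refine(starts, symbols, 45, "a2")
starts, symbols = refine(starts, symbols, 20, "a3")
print(starts, symbols)
```

Storing only lower bounds keeps each refinement to a single insertion, and a lookup costs one binary search regardless of the size of the concrete alphabet.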

Figure 9. Observation tables for Example 5.1.

… adds a new symbol $a_3$ and a new transition in the hypothesis automaton. The counter-examples that follow, namely (…, $-$) and $((0, 0) \cdot (30, 30), -)$, refine the $\Sigma$-semantics for symbols in $\Sigma_{a_0}$, as shown in $\psi_4$ to $\psi_7$.