Inferring Symbolic Automata

We study the learnability of symbolic finite state automata (SFA), a model shown useful in many applications in software verification. The state-of-the-art literature on this topic follows the query learning paradigm, and so far all obtained results are positive. We provide a necessary condition for efficient learnability of SFAs in this paradigm, from which we obtain the first negative result. The main focus of our work lies in the learnability of SFAs under the paradigm of identification in the limit using polynomial time and data, and its strengthening efficient identifiability, which are concerned with the existence of a systematic set of characteristic samples from which a learner can correctly infer the target language. We provide a necessary condition for identification of SFAs in the limit using polynomial time and data, and a sufficient condition for efficient learnability of SFAs. From these conditions we derive a positive and a negative result. The performance of a learning algorithm is typically bounded as a function of the size of the representation of the target language. Since SFAs, in general, do not have a canonical form, and there are trade-offs between the complexity of the predicates on the transitions and the number of transitions, we start by defining size measures for SFAs. We revisit the complexity of procedures on SFAs and analyze them according to these measures, paying attention to the special forms of SFAs: normalized SFAs and neat SFAs, as well as to SFAs over a monotonic effective Boolean algebra. This is an extended version of the paper with the same title published in CSL'22.


Introduction
Symbolic finite state automata, SFAs for short, are an automata model in which transitions between states correspond to predicates over a domain of concrete alphabet letters. Their purpose is to cope with situations where the domain of concrete alphabet letters is large or infinite. As an example of automata over large finite alphabets, consider automata over the alphabet 2^AP where AP is a set of atomic propositions; these are used in model checking [CGP01, BK08]. Another example, used in string sanitizer algorithms [HLM+11], are automata over predicates on the Unicode alphabet, which consists of over a million symbols. An infinite alphabet is used, for example, in event recording automata, a determinizable class of timed automata [AFH99] in which an alphabet letter consists of both a symbol from a finite alphabet and a non-negative real number. Formally, the transition predicates in an SFA are defined with respect to an effective Boolean algebra, as defined in section 2.
SFAs have proven useful in many applications [DVLM14, PGLM15, ASJ+16, HD17, SV17, MRA+17] and consequently have been studied as a theoretical model of automata. Many algorithms for natural questions over these automata already exist in the literature, in particular Boolean operations, determinization, and emptiness [VdHT10]; minimization [DV16]; and language inclusion [KT14]. Recently the subject of learning automata in verification has also attracted attention, as it has been shown useful in many applications; see Vaandrager's survey [Vaa17].
There already exists substantial literature on learning restricted forms of SFAs [GJL10, MM14, ASKK16, MM17, CDYS17], as well as general SFAs [DD17, AD18], and even nondeterministic residual SFAs [CHYS19]. For other types of automata over infinite alphabets, [HSM11] suggests learning abstractions, and [She19] presents a learning algorithm for deterministic variable automata. All these works consider the query learning paradigm, and provide extensions to Angluin's L* algorithm for learning DFAs using membership and equivalence queries [Ang87a]. Unique among these works is [AD18], which studies the learnability of SFAs taking as a parameter the learnability of the underlying algebras, providing positive results for specific Boolean algebras.
One of our contributions is to demonstrate that these positive learnability results are far from trivial. In particular, we show that there are limitations to the power of membership and equivalence queries when it comes to learning SFAs. To do so, we provide a necessary condition for efficient learnability of SFAs in the query learning paradigm, from which we obtain a negative result regarding query learning of SFAs over the propositional algebra. This is, to the best of our knowledge, the first negative result on learning SFAs with membership and equivalence queries, and thus gives useful insights into the limitations of the L* framework in this context.
The main focus of our work lies on the learning paradigm of identification in the limit using polynomial time and data. We are interested in providing sufficient or necessary conditions for a class of SFAs to be learnable under this paradigm. To this end, we show that the type of the algebra, in particular whether it is monotonic or not, largely influences the learnability of the class.
Learnability of a class of languages in a certain paradigm greatly depends on the representation chosen for the language. For instance, regular languages are efficiently learnable (both in the paradigm of identification in the limit using polynomial time and data, and in the query learning paradigm using membership and equivalence queries) when represented as DFAs, but not when represented as NFAs. While we are interested in SFAs as the representations, there are various types of SFAs (with the same expressive power), and the learnability results for them may vary.
The literature on SFAs has mainly focused on a special type of SFA, termed normalized, in which there is at most one transition between every pair of states. This minimization of the number of transitions comes at the cost of obtaining more complex predicates. We consider, in addition to normalized SFAs, another special type of SFAs that we term neat SFAs, which, by contrast, allows several transitions between the same pair of states, but restricts the predicates to be basic, as formally defined in subsection 2.1.
To get on the right track, we first take a global look at the complexity of the standard operations on SFAs, and how they vary according to the special form. We revisit the results in the literature and analyze them along the measures we find adequate for the size of an SFA: the number of states (n), the number of transitions (m), and the size of the most complex predicate (l). The results show that most procedures are more efficient on neat SFAs.
We then turn to study identification of SFAs in the limit using polynomial time and data. We provide a necessary condition a class of SFAs M should meet in order to be identified in the limit using polynomial time and data, and a sufficient condition a class of SFAs M should meet in order to be efficiently identifiable. These conditions are expressed in terms of the existence of certain efficiently computable functions, which we call Generalize_M, Concretize_M, and Decontaminate_M. We then provide positive and negative results regarding the learnability of specific classes of SFAs in this paradigm. In particular, we show that the class of SFAs over any monotonic algebra is efficiently identifiable.

Comparison to the conference version. Preliminary results of this work appear in [FFZ22]. This paper extends the results of [FFZ22] by adding a thorough discussion of the different SFA types and their effect on the complexity of different automata procedures, as well as a new theorem regarding efficient learnability, and additional examples for learning SFAs. In particular, sections 3, 4, and 5 are all new, as well as Theorem 7.3 and Examples 10.6 and 10.7.

Outline. The rest of the paper is organized as follows. In section 2 we provide the necessary definitions on effective Boolean algebras and SFAs. Section 3 introduces the special forms of SFAs. In section 4 we discuss transformations between the special forms. Section 5 then reviews the complexity of standard automata procedures along the mentioned parameters.
We then turn to discuss the learnability of symbolic automata. Section 6 provides a short overview and definitions regarding learnability of SFAs. In section 7 we discuss the paradigm of learnability in the limit using polynomial time and data, and provide an overview of learning DFAs in this paradigm. Sections 8 and 9 present a necessary condition and a sufficient condition for the efficient learnability of SFAs, and sections 10 and 11 use these conditions to prove a positive result on the learnability of SFAs over monotonic algebras, and a negative result on the learnability of SFAs over the propositional algebra. Section 12 discusses query learning of SFAs and provides a negative result. We conclude in section 13 with a short discussion.

Effective Boolean Algebra.
A Boolean algebra A can be represented as a tuple (D, P, ⟦·⟧, ⊥, ⊤, ∨, ∧, ¬) where D is a set of domain elements; P is a set of predicates closed under the Boolean connectives, where ⊥, ⊤ ∈ P; and ⟦·⟧ : P → 2^D is the so-called semantics function. It satisfies the following three requirements: (i) ⟦⊥⟧ = ∅, (ii) ⟦⊤⟧ = D, and (iii) for all ϕ, ψ ∈ P, ⟦ϕ ∨ ψ⟧ = ⟦ϕ⟧ ∪ ⟦ψ⟧, ⟦ϕ ∧ ψ⟧ = ⟦ϕ⟧ ∩ ⟦ψ⟧, and ⟦¬ϕ⟧ = D \ ⟦ϕ⟧. A Boolean algebra is effective if all the operations above, as well as satisfiability, are decidable. Henceforth, we implicitly assume Boolean algebras to be effective.
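The requirements above can be made concrete with a toy instance of an effective Boolean algebra. The following minimal sketch (ours, not the paper's) represents predicates extensionally as frozensets over a small finite domain, so all operations and the satisfiability check are trivially decidable; the names and the domain are illustrative assumptions.

```python
# A toy effective Boolean algebra: predicates ARE their semantics,
# i.e., frozensets of domain elements.  All names here are ours.
DOMAIN = frozenset(range(8))          # the domain D (hypothetical, for illustration)

BOT, TOP = frozenset(), DOMAIN        # [[bot]] = empty set, [[top]] = D

def conj(phi, psi):                   # [[phi and psi]] = [[phi]] intersect [[psi]]
    return phi & psi

def disj(phi, psi):                   # [[phi or psi]] = [[phi]] union [[psi]]
    return phi | psi

def neg(phi):                         # [[not phi]] = D minus [[phi]]
    return DOMAIN - phi

def sat(phi):                         # satisfiability: is [[phi]] non-empty?
    return len(phi) > 0
```

In a realistic algebra the predicates are syntactic objects and ⟦·⟧ maps them to (possibly infinite) sets, so satisfiability requires an actual decision procedure rather than an emptiness test on a set.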
One way to define a Boolean algebra is by defining a set P_0 of atomic formulas that includes ⊤ and ⊥, and obtaining P by closing P_0 under conjunction, disjunction and negation. For a predicate ψ ∈ P we say that ψ is atomic if ψ ∈ P_0. We say that ψ is basic if ψ is a conjunction of atomic formulas.
We now introduce two Boolean algebras that are discussed extensively in the paper. The Propositional Algebra is defined with respect to a set AP = {p_1, p_2, ..., p_k} of atomic propositions. The set of atomic predicates P_0 consists of the atomic propositions and their negations, as well as ⊤ and ⊥. The domain D consists of all possible valuations for these propositions; thus it is B^k where B = {0, 1}. The semantics of an atomic predicate p is given by the set of valuations assigning 1 to p, and similarly the semantics of ¬p is the set of valuations assigning 0 to p.

The Interval Algebra is defined over the domain D = Z ∪ {−∞, ∞}. Its atomic predicates are half-open intervals of the form [a, b) for a, b ∈ D, together with ⊤ and ⊥; the semantics of such a predicate is ⟦[a, b)⟧ = {d ∈ D : a ≤ d < b}, and similarly for the general predicates obtained from them via the Boolean connectives.

2.1.1. Predicate Size. In order to reason about the complexity of operations over the Boolean algebra (and later, the efficient learnability of SFAs using such Boolean algebras), we need some measure of the size of predicates. We assume the algebra is associated with a function size_P : P → N returning for each predicate its size. If the algebra is defined via a set of atomic propositions, one can assume the existence of functions size_P0 : P_0 → N, size_P∧ : N × N → N, size_P∨ : N × N → N, and size_P¬ : N → N according to which the size of predicates can be inductively computed. Note that the size is a property of the predicate, not of the set of concrete elements it represents.
Example 2.1. For the interval algebra, we define the size of a single interval to be 1, and the size of a general predicate as the size of its parse tree, whose leaves are single intervals (each of size 1). Thus, for example, size_P(([0, 50) ∨ [100, 200)) ∧ [20, 60)) is 5, whereas the size of the semantically equivalent predicate [20, 50) is 1.
Similarly, for the propositional algebra we define the size of a predicate to be the size of its parse tree. Note that Boolean functions from B^k to B can be represented in other ways as well, e.g., using Binary Decision Diagrams (BDDs) [Bry86]. This would result in a different Boolean algebra (where predicates are BDDs) with a different size measure for predicates.
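The parse-tree size measure of Example 2.1 can be sketched directly. In the following sketch (names ours) a predicate over the interval algebra is a nested tuple, and size counts leaves plus internal connective nodes.

```python
# Predicates as parse trees over the interval algebra; size = number of
# nodes in the tree, with a single interval (a leaf) having size 1.
def interval(a, b):                # atomic predicate [a, b)
    return ("int", a, b)

def size(phi):
    op = phi[0]
    if op == "int":
        return 1                   # a single interval has size 1
    if op == "not":
        return 1 + size(phi[1])
    return 1 + size(phi[1]) + size(phi[2])     # "and" / "or" node

# The predicate of Example 2.1: ([0,50) or [100,200)) and [20,60)
phi = ("and", ("or", interval(0, 50), interval(100, 200)), interval(20, 60))
```

As in the example, `size(phi)` is 5 (three interval leaves plus two connective nodes), while the semantically equivalent `interval(20, 50)` has size 1, illustrating that size is a property of the predicate, not of the set it denotes.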

Symbolic Automata.
A symbolic finite automaton (SFA) is a tuple M = (A, Q, q_ι, F, ∆) where A is a Boolean algebra, Q is a finite set of states, q_ι ∈ Q is the initial state, F ⊆ Q is the set of final states, and ∆ ⊆ Q × P_A × Q is a finite set of transitions, where P_A is the set of predicates of A.
We use the term letters for elements of D, where D is the domain of A, and the term words for elements of D*. A run of M on a word a_1 a_2 ... a_n is a sequence of transitions ⟨q_0, ψ_1, q_1⟩⟨q_1, ψ_2, q_2⟩ ... ⟨q_{n−1}, ψ_n, q_n⟩ satisfying that a_i ∈ ⟦ψ_i⟧, that ⟨q_i, ψ_{i+1}, q_{i+1}⟩ ∈ ∆, and that q_0 = q_ι. Such a run is said to be accepting if q_n ∈ F. A word w = a_1 a_2 ... a_n is said to be accepted by M if there exists an accepting run of M on w. The set of words accepted by an SFA M is denoted L(M). An SFA is said to be deterministic if for every state q ∈ Q and every letter a ∈ D we have that |{⟨q, ψ, q′⟩ ∈ ∆ : a ∈ ⟦ψ⟧}| ≤ 1, namely from every state and every concrete letter there exists at most one transition. It is said to be complete if |{⟨q, ψ, q′⟩ ∈ ∆ : a ∈ ⟦ψ⟧}| ≥ 1 for every q ∈ Q and a ∈ D, namely from every state and every concrete letter there exists at least one transition. It is not hard to see that, as is the case for finite automata over concrete alphabets, non-determinism does not add expressive power, but does add succinctness. When M is deterministic we use ∆(q, w) to denote the state M reaches on reading the word w from state q. If ∆(q_ι, w) = q then w is termed an access word to state q. If w is the smallest access word according to the lexicographic order, we say that w is the lex-access word to state q.

Figure 1. The SFA M over A_N.
Example 2.2. Consider the SFA M given in Figure 1. It is defined over the algebra A_N, which is the interval algebra restricted to the domain D = N ∪ {∞}. The language of M is the set of all words over D of the form w_1 · d · w_2 where w_1 is some word over the domain D, the letter d satisfies 0 ≤ d < 100, and all letters of the word w_2 are numbers smaller than 200.
The lex-access word to state q_0 is ε, and 0 is the lex-access word to state q_1.
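A membership check against M can be sketched concretely. Since Figure 1 is not reproduced here, the transition layout below is our reconstruction of a deterministic, complete SFA with the language of Example 2.2; the state names and layout are assumptions.

```python
# A sketch of the SFA M of Figure 1 (our reconstruction of the figure),
# over A_N with domain N plus infinity.  Transitions carry interval
# predicates [a, b); INF stands for infinity.
INF = float("inf")

# (state, a, b, target): from `state`, a letter d with a <= d < b moves to `target`
DELTA = [
    ("q0",   0, 100, "q1"),   # the letter d with 0 <= d < 100
    ("q0", 100, INF, "q0"),
    ("q1",   0, 200, "q1"),   # letters of w2 stay below 200
    ("q1", 200, INF, "q0"),   # a letter >= 200 forces a restart
]
INITIAL, FINAL = "q0", {"q1"}

def accepts(word):
    """Deterministic membership check: follow the unique satisfied transition."""
    state = INITIAL
    for letter in word:
        state = next(t for (q, a, b, t) in DELTA
                     if q == state and a <= letter < b)
    return state in FINAL
```

For instance, `accepts([300, 50, 150])` holds (w_1 = 300, d = 50, w_2 = 150), while `[50, 250]` is rejected because 250 is not below 200.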

Types of Symbolic Automata
Since the complexity of a learning algorithm for a class of languages L using some representation R is measured with respect to the size of the smallest representation R ∈ R for the unknown language L ∈ L, we first need to agree how to measure the size of an SFA. Subsection 3.1 explains why the number of states is not a sufficient measure, and proposes an alternative using three parameters. Optimizing different parameters leads to different special forms, which are discussed in subsection 3.2.
3.1. Size of an SFA. We note that there is a trade-off between the number of transitions and the complexity of the transition predicates. The size of an automaton (not a symbolic one) is typically measured by its number of states. This is since, for DFAs, the size of the alphabet is assumed to be a given constant, and the rest of the parameters, in particular the transition relation, are at most quadratic in the number of states. In the case of SFAs the situation is different, as the size of the predicates labeling the transitions can vary greatly.
In fact, if we measure the size of a predicate by the number of nodes in its parse DAG, then the size of a formula can grow unboundedly (and the same is true for other reasonable size measures for predicates). The size and structure of the predicates influence the complexity of their satisfiability check, and thus the complexity of the corresponding algorithms. On the other hand, there might be a trade-off between the size of the transition predicates and the number of transitions; e.g., a predicate of the form ψ_1 ∨ ψ_2 ∨ ... ∨ ψ_k can be replaced by k transitions, each labeled by one ψ_i for 1 ≤ i ≤ k. Therefore, we measure the size of an SFA by three parameters: the number of states (n), the maximal out-degree of a state (m), and the largest size of a predicate (l).
In addition, in order to analyze the complexity of the automata algorithms discussed in subsection 5.1 and subsection 5.2, for a class P of predicates over a Boolean algebra A, we use the complexity measure sat_P(l), which is the complexity of a satisfiability check for a predicate of size l in P. We also use sat_P0(l) for the respective complexity when restricted to atomic predicates.
3.2. Special Form SFAs. We turn to define special types of SFAs, which affect the complexity of related procedures.

Neat and Normalized SFAs. The literature defines an SFA as normalized if for every two states q and q′ there exists at most one transition from q to q′. This definition prefers fewer transitions at the cost of potentially complicated predicates. By contrast, preferring simple transitions at the cost of increasing the number of transitions leads to neat SFAs. We define an SFA to be neat if all transition predicates are basic predicates.

Feasibility. The second distinction concerns the fact that an SFA can have transitions with unsatisfiable predicates. A symbolic automaton is said to be feasible if for every ⟨q, ψ, q′⟩ ∈ ∆ we have that ⟦ψ⟧ ≠ ∅. Feasibility is a property orthogonal to being neat or normalized.

Monotonicity. The third distinction we make concerning the nature of a given SFA regards its underlying algebra. A Boolean algebra A over domain D is said to be monotonic if the following conditions hold.
(1) There exists a total order < on the elements of D; and (2) every atomic predicate represents an interval of the domain, i.e., for every ψ ∈ P_0 there exist a, b ∈ D such that ⟦ψ⟧ = {d ∈ D : a ≤ d < b}.
The interval algebra is clearly monotonic, as is the similar algebra obtained using R (the real numbers) instead of Z (the integers). On the other hand, the propositional algebra is clearly non-monotonic.
Example 3.1.The SFA M from Example 2.2 (Figure 1) is defined over a monotonic algebra, and is neat, normalized, deterministic and complete.

Transformations to Special Forms
We now address the task of transforming SFAs into the special forms presented in section 3. We discuss transformations to neat, normalized and feasible automata, measured as suggested using ⟨n, m, l⟩ — the number of states, the maximal out-degree of a state, and the largest size of a predicate.

4.1. Neat Automata. Since each predicate in a neat SFA is a conjunction of atomic predicates, neat automata are intuitive, and the number of transitions in the SFA reflects the complexity of the different operations, as opposed to the situation with normalized SFAs. For the class P_0 of basic formulas, sat_P0(l) is usually more efficient than sat_P(l), and in particular is polynomial for the algebras we consider here. This is since, for a basic predicate ϕ that is a conjunction of l atomic predicates, satisfiability testing can be reduced to checking that no two atomic predicates contradict each other. Since satisfiability checking directly affects the complexity of various algorithms discussed in subsection 5.1, neat SFAs allow for efficient automata operations, as we show in subsection 5.2.

4.1.1. Transforming to Neat. Given a general SFA M of size ⟨n, m, l⟩, we can construct a neat SFA M′ of size ⟨n, m · 2^l, l⟩ by transforming each transition predicate into a DNF formula and turning each disjunct into an individual transition. The number of states, n, remains the same. However, the number of transitions can grow exponentially due to the transformation to DNF. In the worst case, the size of the most complex predicate can remain the same after the transformation, resulting in the same l parameter for both automata.
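The transformation to a neat SFA can be sketched for the propositional algebra. The sketch below (ours; the formula encoding and function names are assumptions) takes predicates in negation normal form, computes a DNF as a list of literal sets, and splits each disjunct into its own transition, exhibiting the 2^l blow-up.

```python
# Sketch of the neat transformation over the propositional algebra.
# A predicate in NNF is a nested tuple: ("lit", "p1"), ("lit", "~p1"),
# ("or", f, g), or ("and", f, g).
from itertools import product

def dnf(phi):
    """Return a list of disjuncts, each a frozenset of literals (a basic predicate)."""
    op = phi[0]
    if op == "lit":
        return [frozenset([phi[1]])]
    if op == "or":
        return dnf(phi[1]) + dnf(phi[2])
    # "and": distribute the conjunction over the disjuncts of both sides
    return [l | r for l, r in product(dnf(phi[1]), dnf(phi[2]))]

def to_neat(transitions):
    """Split each (q, phi, q') into one transition per DNF disjunct.
    Contradictory disjuncts (containing both p and ~p) are left for a
    separate feasibility pass."""
    return [(q, d, q2) for (q, phi, q2) in transitions for d in dnf(phi)]

# (p1 or p2) and (p3 or p4) distributes into 4 basic predicates
phi = ("and", ("or", ("lit", "p1"), ("lit", "p2")),
              ("or", ("lit", "p3"), ("lit", "p4")))
```

A chain of k such conjunctions of disjunctions yields 2^k disjuncts, which is the exponential growth in the m parameter described above.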
Note that there is not necessarily a unique minimal neat SFA. For instance, a predicate ψ over the propositional algebra with AP = {p_1, p_2, p_3} satisfying ⟦ψ⟧ = {[100], [101], [111]} can be represented using the two basic transitions (p_1 ∧ ¬p_2) and (p_1 ∧ p_2 ∧ p_3), or alternatively using the two basic transitions (p_1 ∧ p_3) and (p_1 ∧ ¬p_2 ∧ ¬p_3), though it cannot be represented using one basic transition. Although in the general case the transformation from normalized to neat SFAs is exponential, for monotonic algebras we have the following lemma, which follows directly from the definition of monotonic algebras and basic predicates.
Lemma 4.1. Over a monotonic algebra, the conjunction of two atomic predicates is also an atomic predicate; inductively, any basic formula over a monotonic algebra that does not contain negations is an atomic predicate. In addition, the negation of an atomic predicate is a disjunction of at most 2 atomic predicates.

Lemma 4.2. Let M be a normalized SFA over a monotonic algebra A_mon. Then transforming M into a neat SFA M′ is linear in the size of M.
Since a DNF formula with m disjuncts is a natural representation of m basic transitions, Lemma 4.2 follows from the following property of monotonic algebras.
Lemma 4.3. Let ψ be a general formula over a monotonic algebra A_mon. Then there exists an equivalent DNF formula ψ_d of size linear in |ψ|.
Proof. First, we transform ψ into a Negation Normal Form formula ψ_NNF, pushing negations inside the formula. When transforming to NNF, the number of atomic predicates (possibly under negation) remains the same, as does the number of conjunctions and disjunctions. Since, by Lemma 4.1, the negation of an atomic predicate over a monotonic algebra, namely the negation of an interval, results in at most two intervals, we get that |ψ_NNF| ≤ 2 · |ψ|. Note that ψ_NNF does not contain any negations, as they were applied to the intervals. We now transform ψ_NNF into a DNF formula ψ_d recursively, operating on sub-formulas of ψ_NNF and distributing conjunctions over disjunctions.
We inductively prove that ψ_NNF has an equivalent DNF formula ψ_d with |ψ_d| ≤ |ψ_NNF|. For the base case, a single interval is in DNF and we are done.
For the induction step, consider the following two cases. (1) Assume ψ_NNF = ψ_1 ∨ ψ_2. By the induction hypothesis, there exist DNF formulas ψ_1d and ψ_2d such that ⟦ψ_id⟧ = ⟦ψ_i⟧ and |ψ_id| ≤ |ψ_i| for i = 1, 2. Then ψ_d = ψ_1d ∨ ψ_2d is equivalent to ψ_NNF and at most of the same size. (2) Assume ψ_NNF = ψ_1 ∧ ψ_2. Again, by the induction hypothesis, instead of ψ_1 ∧ ψ_2 we can consider ψ_1d ∧ ψ_2d, where ψ_1d and ψ_2d are in DNF; say ψ_1d is a disjunction of the intervals {[a_i, b_i) : 1 ≤ i ≤ k} and ψ_2d is a disjunction of the intervals {[c_j, d_j) : 1 ≤ j ≤ l}. Distributing the conjunction over the disjunctions, ψ_1d ∧ ψ_2d is equivalent to the disjunction of the intervals [max{a_i, c_j}, min{b_i, d_j}) over all pairs i, j. We can assume the intervals {[a_i, b_i) : 1 ≤ i ≤ k} are disjoint and maximal (otherwise merging overlapping intervals would have resulted in a longer single interval), and the same for {[c_j, d_j) : 1 ≤ j ≤ l}. Thus, every element a_i or c_j can define at most one non-empty interval of the form [max{a_i, c_j}, min{b_i, d_j}), and hence ψ_d contains at most k + l disjuncts. To conclude, since ψ_NNF is linear in the size of ψ, and the size of ψ_d is at most the size of ψ_NNF, the translation of ψ into the DNF formula ψ_d is linear.

4.2. Normalized Automata. Neat automata stand in contrast to normalized ones. In a normalized SFA there is at most one transition between every pair of states, which allows for a succinct formulation of the condition to transit from one state to another. On the other hand, this makes the predicates on the transitions structurally more complicated. Given a general SFA M with parameters ⟨n, m, l⟩, we can easily construct a normalized SFA M′ as follows. For every pair of states q and q′, construct a single edge labeled with the disjunction of all predicates ϕ such that ⟨q, ϕ, q′⟩ ∈ ∆. Then M′ has size ⟨n, min(n², m), size_P∨m(l)⟩, where we use size_P∨m(l) to denote the size of a disjunction of m predicates of size at most l.
Note that there is no unique minimal normalized automaton either, since in general a Boolean formula can have two semantically equivalent yet syntactically different expressions in the underlying representation system; e.g., two distinct BDDs can represent the same formula. However, in subsection 5.2 we show that over monotonic algebras there is a canonical minimal normalized SFA.
The complexity of sat_P(l) for general formulas (corresponding to normalized SFAs) is usually exponentially higher than for basic predicates (and thus for neat SFAs). In addition, as we show above, generating a normalized automaton is an easy operation. This motivates working with neat automata, and generating normalized automata as a last step, if desired (e.g., for presenting a graphical depiction of the automaton).

4.3. Feasible Automata. The motivation for feasible automata is clear; if the automaton contains unsatisfiable transitions, then its size is larger than necessary, and the redundancy of transitions makes it less interpretable. Thus, infeasible SFAs add complexity both algorithmically and for the user, as they are more difficult to understand. In order to generate a feasible SFA from a given SFA M, we need to traverse the transitions of M and test the satisfiability of each transition. The parameters ⟨n, m, l⟩ of the SFA remain the same, since there is no change in the set of states, and there might be no change in the transitions either (if they are all satisfiable).
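The feasibility transformation is a single pass over ∆ filtering by the algebra's satisfiability check. A minimal sketch (ours; the transition encoding is an assumption), instantiated with the interval algebra where sat([a, b)) is simply a < b:

```python
# Sketch of the feasibility transformation: drop every transition whose
# predicate is unsatisfiable.  `sat` is the algebra's decidable
# satisfiability check, passed in as a function.
def make_feasible(delta, sat):
    """delta: list of (q, phi, q') transitions; keeps only satisfiable ones."""
    return [(q, phi, p) for (q, phi, p) in delta if sat(phi)]

# For a single interval [a, b), satisfiability is just "a < b"
sat_interval = lambda ab: ab[0] < ab[1]
```

The cost is |∆| invocations of sat, which is why the complexity of this step is dominated by sat_P(l) (or sat_P0(l) for neat SFAs).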
In the following, we usually assume that the automata are feasible, and when applying algorithms, we require the output to be feasible as well.
Table 1. Analysis of standard automata procedures on SFAs.

Table 2. Analysis of time complexity of decision procedures for SFAs.

Complexity of standard automata procedures on SFAs
In this section we analyze the complexity of automata procedures on SFAs, in terms of their effect on the parameters ⟨n, m, l⟩. We start in subsection 5.1 by examining general SFAs, and then in subsection 5.2 discuss the effects on special SFAs.

Complexity of Automata Procedures for General SFAs
We turn to discuss Boolean operations, determinization and minimization, and decision procedures (such as emptiness and equivalence) for the different types of SFAs. For intersection and union, the product construction of SFAs was studied in [VdHT10, HV11]. There, the authors assume normalized SFAs as input, and do not delve into the effect of the construction on the number of transitions or the complexity of the resulting predicates. Determinization of SFAs was studied in [VdHT10], and [DV14] studies minimization of SFAs, assuming the given SFA is normalized.
Table 1 shows the sizes of the SFAs resulting from the mentioned operations, in terms of ⟨n, m, l⟩. The analysis applies to all types of SFAs, not just normalized ones. The time complexity for each operation is given in terms of the parameters ⟨n, m, l⟩ and the complexity of feasibility tests for the resulting SFA, as discussed in subsection 4.3. Table 2 summarizes the time complexity of decision procedures for SFAs: emptiness, inclusion, and membership. Again, the analysis applies to all types of SFAs. We note that in many applications of learning in verification, the challenging part is implementing the teacher (e.g., in [PGB+08, CKKS20, FGPS20, FGPS22]). In such cases the complexity of membership and equivalence queries, as well as standard automata operations, plays a major role.
In both tables we consider two SFAs M_1 and M_2 with parameters ⟨n_i, m_i, l_i⟩ for i = 1, 2, over an algebra A with predicates P. We use size_P∧m(l) for an upper bound on the size of a conjunction of m predicates of size at most l. All SFAs are assumed to be deterministic, except of course the input for determinization.
We now briefly describe the algorithms we analyze in both tables.

Product Construction [VdHT10, HV11]. The product construction for SFAs is similar to the product of DFAs: the set of states is the product of the states of M_1 and M_2, and a transition is a synchronization of transitions of M_1 and M_2. That is, a transition from ⟨q_1, q_2⟩ to ⟨p_1, p_2⟩ can be made while reading a concrete letter γ iff ⟨q_1, ψ_1, p_1⟩ ∈ ∆_1, ⟨q_2, ψ_2, p_2⟩ ∈ ∆_2, and γ satisfies both ψ_1 and ψ_2. Therefore, the predicates labeling transitions in the product construction are conjunctions of predicates from the two SFAs M_1 and M_2.

Complementation. In order to complement a deterministic SFA M_1, we first need to make M_1 complete. To do so, we add one state which is a non-accepting sink, and from each state we add at most one transition, labeled with the negation of the disjunction of all other transitions from that state. If M_1 is complete, then complementation simply switches accepting and non-accepting states, resulting in the same parameters ⟨n_1, m_1, l_1⟩.

Determinization [VdHT10]. In order to make an SFA deterministic, the algorithm of [VdHT10] uses the subset construction for DFAs, resulting in an exponential blowup in the number of states. However, in the case of SFAs this is not enough, and the predicates require special care. Let P = {q_1, ..., q_t} be a state in the deterministic SFA, where q_1, ..., q_t are states of the original SFA M_1, and let ψ_1, ..., ψ_t be some predicates labeling outgoing transitions from q_1, ..., q_t, correspondingly. Then, in order to determinize transitions, the algorithm of [VdHT10] computes the conjunction ψ_1 ∧ ... ∧ ψ_t, which labels a single transition from the state P.

Minimization [DV14]. Given a deterministic SFA M_1, the output of minimization is an equivalent deterministic SFA with a minimal number of states. When constructing such an SFA, the number of states and transitions cannot grow. However, as in determinization, if two states of M_1 are replaced with one state, then outgoing transitions might overlap, resulting in a non-deterministic SFA. D'Antoni and Veanes [DV14] suggest several algorithms to cope with this difficulty. One of their approaches is to compute minterms, which are the smallest conjunctions of outgoing transitions. Minterms do not intersect, and thus the output is deterministic. Their other approaches avoid computing minterms, but achieve the same goal.

Emptiness. If we assume a feasible SFA M as input, then in order to check for emptiness we need to find an accepting state which is reachable from the initial state (as in DFAs). If we do not assume a feasible input, we need to test the satisfiability of each transition; thus the complexity depends on the complexity measure sat_P(l).

Membership. Similarly to emptiness, in order to check whether a concrete word γ_1 ... γ_n is in L(M), we need not only check whether it reaches an accepting state, but also locally consider the satisfiability of each transition. In the case of membership, we need to check whether the letter γ_i satisfies the predicate on the corresponding transition.

Inclusion. Deciding inclusion amounts to checking emptiness and feasibility of M_1 ∩ ¬M_2. We assume here that both M_1 and M_2 are deterministic and complete.

5.2. Complexity of Automata Procedures for Special SFAs. We now discuss the advantages of neat SFAs and of monotonic algebras, in the context of the algorithms presented in the tables, and show that, in general, they are more efficient to handle
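The synchronization of transitions in the product construction can be sketched with the toy extensional algebra from section 2, where conjunction is set intersection; the transition encoding and names are ours.

```python
# Sketch of the product construction, with predicates represented
# extensionally as frozensets of letters (a finite effective Boolean
# algebra).  Each pair of component transitions is synchronized by
# conjoining (intersecting) its predicates.
def sfa_product(delta1, delta2):
    """delta_i: lists of (q, phi, q') transitions with frozenset predicates."""
    return [((q1, q2), phi1 & phi2, (p1, p2))
            for (q1, phi1, p1) in delta1
            for (q2, phi2, p2) in delta2
            if phi1 & phi2]           # keep only satisfiable (feasible) transitions
```

Note that the result has up to m_1 · m_2 transitions per product state, each labeled by a conjunction; this is exactly where the size_P∧m(l) measure of Table 1 enters.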
compared to other SFA types.

5.2.1. Neat SFAs. As can be observed from Table 2, almost all decision procedures regarding SFAs depend on sat_P(l). For neat SFAs it is more precise to say that they depend on sat_P0(l), namely on the satisfiability of atomic predicates rather than arbitrary predicates. Since sat_P0(l) is usually less costly than sat_P(l), most decision procedures are more efficient on neat automata. Here, we claim that applying automata algorithms to neat SFAs preserves their neatness, thus suggesting that neat SFAs may be preferable in many applications.
Lemma 5.1. Let M_1 and M_2 be neat SFAs. Then the algorithms for their product construction, complementation, determinization and minimization discussed in subsection 5.1 result in a neat SFA.
Proof. Observing the procedures for the product construction [VdHT10, HV11], determinization [VdHT10] and minimization [DV14], one can see that they use only conjunctions in order to construct the predicates of the output SFAs. Thus, if the predicates of the input SFAs are basic, then so are the output predicates.

5.2.2. Monotonic Algebras. We now consider the class M_Amon of SFAs over a monotonic algebra A_mon with predicates P. We first discuss size_P∧(l_1, l_2) and sat_P(l), as they are essential measures in automata operations. Then we show that for M_1 and M_2 in the class M_Amon, the product construction is linear in the number of transitions, adding to the efficiency of SFAs over monotonic algebras.
Lemma 5.2. Let ψ_1 and ψ_2 be formulas over a monotonic algebra A_mon, of sizes l_1 and l_2, respectively. Then size_P∧(l_1, l_2) is linear in l_1 + l_2, and sat_P(l) is linear in l.

Proof. Transforming to DNF is linear, as follows from Lemma 4.3. There, we show that the conjunction of two DNF formulas of sizes k and l has size k + l, which implies that the conjunction of general formulas has linear size. In addition, sat_P(l) is trivial for a single interval, and following Lemma 4.3, is linear for general formulas. For an interval [a, b), satisfiability checking amounts to the question "is a < b?".

Lemma 5.3. Let M_1 and M_2 be deterministic SFAs over a monotonic algebra A_mon. Then the out-degree of their product SFA M is at most m = 2 · (m_1 + m_2).
Proof. From Lemma 4.2 and Lemma 4.3, we can construct neat SFAs M'_1 and M'_2 of sizes ⟨n_i, 2m_i, l_i⟩ for i ∈ {1, 2} that have the same languages as M_1 and M_2, respectively. Similarly to the proof of Lemma 4.3, each transition ⟨⟨q_1, q_2⟩, [a, b) ∧ [c, d), ⟨p_1, p_2⟩⟩ in the product SFA results in a predicate [max{a, c}, min{b, d}). Then, for q_1 ∈ Q_1, every minimal element in the set of q_1's outgoing transitions can define at most one transition in M, and the same holds for a state q_2 ∈ Q_2. Hence the number of transitions from ⟨q_1, q_2⟩ is bounded by the sum of the out-degrees of q_1 and q_2 in the neat SFAs, that is, by 2m_1 + 2m_2 = 2 · (m_1 + m_2), as required.
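The intersection step in this proof amounts to a linear merge of the two sorted outgoing interval partitions. The following is an illustrative sketch (not the paper's implementation), with intervals represented as half-open pairs (a, b):

```python
def product_transitions(part1, part2):
    """Intersect two sorted interval partitions covering the domain; each
    non-empty intersection [max(a,c), min(b,d)) yields one product transition."""
    out = []
    i = j = 0
    while i < len(part1) and j < len(part2):
        (a, b), (c, d) = part1[i], part2[j]
        lo, hi = max(a, c), min(b, d)
        if lo < hi:                 # satisfiability check: "is lo < hi?"
            out.append((lo, hi))
        # advance the partition whose current interval ends first
        if b <= d:
            i += 1
        if d <= b:
            j += 1
    return out

INF = float('inf')
p1 = [(0, 100), (100, INF)]
p2 = [(0, 50), (50, 200), (200, INF)]
print(product_transitions(p1, p2))
# [(0, 50), (50, 100), (100, 200), (200, inf)]
```

Each loop iteration emits at most one interval and consumes at least one input interval, so at most len(p1) + len(p2) transitions are produced, matching the linear bound of the lemma.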
Lemma 5.4. Let M be a neat SFA over a monotonic algebra. Then transforming M into a complete SFA M' is polynomial in the size of M.
Proof. In order to complete M, we add a non-accepting sink r in case it does not already exist, and at most m + 1 transitions from each state q to r, where m is the out-degree of the SFA. Let [a, b) and [c, d) be two predicates labeling outgoing transitions of q, where c is the minimal left end-point of a predicate such that b < c. Then, in order to complete M, we need to add a transition to the sink, labeled by the predicate [b, c). In addition, for the predicate [a, b) where there is no c > b that defines another predicate, if b ≠ d_∞ we add [b, d_∞); and similarly, for the minimal left end-point a, we add [d_{-∞}, a). Then, for each state we add at most m + 1 new transitions, resulting in at most |Q| × (m + 1) new transitions.

Definition 5.5. For predicates over a monotonic algebra, we define a canonical representation of a predicate ψ as the simplified DNF formula which is the disjunction of all maximal disjoint intervals satisfying ψ.
Note that every predicate ψ over a monotonic algebra defines a unique partition of the domain into maximal disjoint intervals. This unique partition corresponds to a simplified DNF formula, which is exactly the canonical representation of ψ.

Lemma 5.7. Let M be an SFA over a monotonic algebra. Then:
(1) There is a unique minimal-state neat SFA M' such that L(M') = L(M).
(2) There is a canonical minimal-state normalized SFA M' such that L(M') = L(M).
Proof. For a language L = L(M) for some SFA M, the minimal number of states of an SFA corresponds, similarly to DFAs, to the number of equivalence classes of the equivalence relation N defined by (u, v) ∈ N ⟺ ∀z ∈ D* : (uz ∈ L ⇔ vz ∈ L) [Myh57, Ner58]. Indeed, if (u, v) ∈ N then there is no reason that reading them (from the initial state) should end up in different states, and if (u, v) ∉ N then reading them (from the initial state) must lead to different states.
As for transitions, we have the following. (1) Let ψ be a general predicate labeling a transition in M. Then ψ defines a unique partition of the domain into maximal disjoint intervals, and these intervals are exactly the predicates labeling transitions of a neat SFA. Hence the minimal-state neat SFA is unique; its transitions correspond exactly to these maximal disjoint intervals. (2) For normalized transitions, we can use Lemma 4.3 to transform a general predicate labeling a transition into a DNF predicate in linear time. A DNF predicate over a monotonic algebra is in fact a disjunction of disjoint intervals, where the construction of Lemma 4.3 obtains the maximal disjoint intervals. Then, to obtain a canonical representation, we order these intervals by their minimal elements.
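Computing the canonical representation of a union of intervals amounts to sorting and merging. A small sketch (the representation of intervals as half-open pairs is an assumption made here):

```python
def canonicalize(intervals):
    """Merge half-open intervals [a, b) into the unique sorted list of
    maximal disjoint intervals -- the canonical representation of the
    disjunction of the given intervals."""
    merged = []
    for a, b in sorted(intervals):
        if merged and a <= merged[-1][1]:   # overlaps or touches the last one
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged

print(canonicalize([(5, 10), (0, 3), (3, 7)]))   # [(0, 10)]
print(canonicalize([(0, 2), (4, 6)]))            # [(0, 2), (4, 6)]
```

Sorting dominates the cost, so the canonical form of a DNF predicate with k intervals is obtained in O(k log k) time.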

Learning SFAs
We turn to discuss the learnability of symbolic automata. In grammatical inference, loosely speaking, we are interested in learning a class of languages L over an alphabet Σ, from examples which are words over Σ. Examples for classes of languages are the set of regular languages, the set of context-free languages, etc. A learning algorithm, aka a learner, is expected to output some concise representation of the language from a class of representations R for the class L. For instance, in learning the class L_reg of regular languages one might consider the class R_dfa of DFAs, or the class R_lin of right linear grammars, since both are capable of expressing all regular languages.7 We often say that a class of representations R is learnable (or not) when we mean that the class of languages L is learnable (or not) via the class of representations R. The complexity of learning an unknown language L ∈ L via R is typically measured with respect to the size of the smallest representation R_L ∈ R for L. For instance, when learning L_reg via R_dfa, a learner is expected to output a DFA for an unknown language in time that is polynomial in the number of states of the minimal DFA for L.
In our setting we are interested in learning regular languages using as a representation a class of SFAs over a certain algebra. To measure complexity we must agree on how to measure the size of an SFA. Thus, as discussed in subsection 3.1, we represent the size of an SFA using the parameters ⟨n, m, l⟩: the number of states of the SFA, the number of transitions, and the size of the largest predicate. Another important factor regarding size and canonical forms of SFAs is the underlying algebra; specifically, whether it is monotonic or not.

Learning Paradigms. The exact definition regarding learnability of a class depends on the learning paradigm. In this work we consider two widely studied paradigms: identification in the limit using polynomial time and data, and learning with membership and equivalence queries. Their definitions are provided in sections 7 and 12, respectively. Note that, in general, a positive or negative result in one paradigm does not imply the same result in another paradigm. We discuss this further in section 13.

Basic SFAs. To provide results regarding the learnability of SFAs, we study classes of SFAs that contain all basic SFAs, defined as follows.
Definition 6.1. An SFA M over a Boolean algebra A with a set of predicates P is termed basic if it is of the form M_ϕ = (A, {q_ι, q_ac, q_rj}, q_ι, {q_ac}, ∆) where ϕ ∈ P and ∆ = {⟨q_ι, ϕ, q_ac⟩, ⟨q_ι, ¬ϕ, q_rj⟩, ⟨q_rj, ⊤, q_rj⟩, ⟨q_ac, ⊤, q_rj⟩}. Note that M_ϕ accepts only words of length one consisting of a concrete letter satisfying ϕ, and it is minimal among all complete deterministic SFAs accepting this language (minimal in both number of states and number of transitions).
In the sequel, our results concern classes of SFAs that contain all basic SFAs M_ϕ for all ϕ ∈ P.
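A minimal sketch of M_ϕ as an acceptor, with the predicate given as a Boolean function over concrete letters; the state names follow Definition 6.1, while everything else (words as Python lists, predicates as lambdas) is an illustrative assumption:

```python
def make_basic_sfa(phi):
    """Return an acceptor for the basic SFA M_phi: q_iota moves to q_ac on a
    letter satisfying phi and to q_rj otherwise; q_ac and q_rj both move to
    the sink q_rj on any further letter (the true predicate)."""
    def delta(state, letter):
        if state == 'q_iota':
            return 'q_ac' if phi(letter) else 'q_rj'
        return 'q_rj'                    # both q_ac and q_rj sink to q_rj

    def accepts(word):
        state = 'q_iota'
        for letter in word:
            state = delta(state, letter)
        return state == 'q_ac'

    return accepts

# phi = [100, 200) over the integers is an assumed example predicate
accepts = make_basic_sfa(lambda x: 100 <= x < 200)
print(accepts([150]))        # True: one letter satisfying phi
print(accepts([150, 150]))   # False: only length-one words are accepted
```

The acceptor makes the remark after Definition 6.1 concrete: exactly the length-one words whose single letter satisfies ϕ are accepted.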

Efficient Identifiability
While in the better-known setting of active learning (namely, query learning with mqs and eqs) the learner can select any word and query about its membership in the unknown language, in passive learning the learner is given a set of words, and for each word w in the set, a label b_w indicating whether w is in the unknown language or not. Formally, a sample for a language L is a finite set S consisting of labeled examples, that is, pairs of the form ⟨w, b_w⟩ where w is a word and b_w ∈ {0, 1} is its label, satisfying b_w = 1 if and only if w ∈ L. The words that are labeled 1 are termed positive words, and those that are labeled 0 are termed negative words. Note that if L is recognized by an automaton M, we have that S ⊆ L(M) (as defined in subsection 2.2). If S is a sample for L we often say that S agrees with L. Given two words w, w', we say that w and w' are not equivalent with respect to S, denoted w ≁_S w', iff there exists z such that ⟨wz, b⟩, ⟨w'z, b'⟩ ∈ S and b ≠ b'. Otherwise we say that w and w' are equivalent with respect to S, and denote this by w ∼_S w'.
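The relation ∼_S can be computed directly from the sample. A sketch, under the assumption that words are tuples of letters and the sample is a map from words to labels:

```python
def equivalent(sample, w, w2):
    """w ~_S w2 iff no suffix z has opposite labels after w and after w2.
    sample: dict mapping words (tuples of letters) to labels 0/1."""
    for word, label in sample.items():
        n = len(w)
        if word[:n] == w:                 # word = w . z for some suffix z
            z = word[n:]
            other = w2 + z
            if other in sample and sample[other] != label:
                return False              # z distinguishes w from w2
    return True

# sample agreeing with L = a* over {a, b}
S = {(): 1, ('a',): 1, ('b',): 0, ('b', 'b'): 0, ('b', 'a'): 0}
print(equivalent(S, ('b',), ('b', 'b')))   # True: no distinguishing suffix in S
print(equivalent(S, (), ('b',)))           # False: z = epsilon distinguishes
```

It suffices to scan extensions of w, since a distinguishing pair ⟨wz, b⟩, ⟨w2·z, b'⟩ requires both words to be present in S.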
Given a sample S for a language L over a concrete domain D, it is possible to construct, in polynomial time, a DFA that agrees with S. Indeed, one can create the prefix-tree automaton, a simple automaton that accepts all and only the positively labeled words in the sample. Clearly, the constructed automaton may not be the minimal automaton that agrees with S. There are several algorithms that infer a smaller such automaton, in particular the popular RPNI [OG92], which merges states of the prefix-tree automaton and results in an automaton that may accept an infinite language. Obviously though, this procedure is not guaranteed to return an automaton for the unknown language, as the sample may not provide sufficient information. For instance, if L = aL_1 ∪ bL_2 and the sample contains only words starting with a, there is no way for the learner to infer L_2, and hence also L, correctly. One may thus ask: given a language L, what should a sample contain in order for a passive learning algorithm to infer L correctly, and can such a sample be of polynomial size with respect to a minimal representation (e.g., a DFA) for the language?
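The prefix-tree construction mentioned above can be sketched as follows (the state-merging step of RPNI is omitted); encoding states by their access words is an assumption made for brevity:

```python
def prefix_tree(sample):
    """Build the prefix-tree acceptor of a sample.
    sample: iterable of (word, label) with word a tuple of letters.
    Returns (transitions, accepting): transitions maps (state, letter) to a
    state id, and accepting is the set of states of positive words."""
    trans, accepting = {}, set()
    states = {(): 0}                       # access word -> state id
    for word, label in sample:
        for i in range(len(word)):
            src, letter, tgt = word[:i], word[i], word[:i + 1]
            if tgt not in states:
                states[tgt] = len(states)  # fresh state for a new prefix
            trans[(states[src], letter)] = states[tgt]
        if label == 1:
            accepting.add(states[word])
    return trans, accepting

trans, acc = prefix_tree([((), 1), (('a',), 1), (('b',), 0)])
print(sorted(acc))    # [0, 1]: the states of epsilon and of 'a' accept
```

By construction the acceptor agrees with every labeled example, which is all that the first condition of the learning definitions below requires of a fallback hypothesis.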
One approach to answer these questions is captured in the paradigm of identification in the limit using polynomial time and data. This model was proposed by Gold [Gol78], who also showed that it admits learning of regular languages represented by DFAs. We follow de la Higuera's more general definition [dlH97].8 This definition requires that for any language L in a class of languages L represented by R, there exists a sample S_L of size polynomial in the size of the smallest representation R ∈ R of L (e.g., the smallest DFA for L), such that a valid learner can infer the unknown language L from the information contained in S_L. The set S_L is then termed a characteristic sample.9 Here, a valid learner is an algorithm that learns the target language exactly and efficiently. In particular, a valid learner produces, in polynomial time, a representation that agrees with the provided sample. The learner also has to correctly learn the unknown language L when given the characteristic sample S_L as input. Moreover, if the input sample S subsumes S_L yet is still consistent with L, the additional information in the sample should not "confuse" the learner; the latter still has to output a correct representation for L. (Intuitively, this requirement precludes situations in which the sample consists of some smart encoding of the representation, which the learner simply deciphers. In particular, the learner will not be confused if an adversary "contaminates" the characteristic sample by adding labeled examples for the target language.) We provide the formal definition after the following informal example.
Example 7.1. For the class of DFAs, let us consider the regular language L = a* over the alphabet {a, b}. Further, consider the sample S = {⟨ε, 1⟩, ⟨a, 1⟩, ⟨b, 0⟩, ⟨bb, 0⟩, ⟨ba, 0⟩} for L. There is a valid learner for the class of all DFAs that uses the sample S as a characteristic sample for L. By definition, such a learner has to output a DFA for L when fed with S, but also has to output equivalent DFAs whenever given any superset of S as input, as long as this superset agrees with L. Naturally, the sample S is also consistent with the regular language L' = {ε, a}. However, this does not pose any problem, since the same learner can use a characteristic sample for L' that disagrees with L, for example, S' = {⟨ε, 1⟩, ⟨a, 1⟩, ⟨aa, 0⟩}. When defining a system of characteristic samples like this, the core requirement is that the size of a sample be bounded from above by a function that is polynomial in the size of the smallest DFA for the respective target language.
8 This paradigm may seem related to conformance testing. The relation between conformance testing for Mealy machines and automata learning of DFAs has been explored in [BGJ+05].
9 De la Higuera's notion of characteristic sample is a core concept in grammatical inference, for various reasons. Firstly, it addresses shortcomings of several other attempts to formulate polynomial-time learning in the limit [Ang87b, Pit89]. Secondly, this notion has inspired the design of popular algorithms for learning formal languages such as, for example, the RPNI algorithm [OG92]. Thirdly, it was shown to bear strong relations to a classical notion of machine teaching [GM96]; models of the latter kind are currently experiencing increased attention in the machine learning community [ZSZR18].
Definition 7.2 (identification in the limit using polynomial time and data [dlH97]). A class of languages L is said to be identified in the limit using polynomial time and data via representations in a class R if there exists a learning algorithm A such that the following two requirements are met.
(1) Given a finite sample S of labeled examples, A returns, in polynomial time, a hypothesis R ∈ R that agrees with S.
(2) For every language L ∈ L, there exists a sample S_L, termed a characteristic sample, of size polynomial in the size of the minimal representation R ∈ R for L, such that the algorithm A returns a correct hypothesis when run on any sample S for L that subsumes S_L.
Note that the first condition ensures polynomial time and the second polynomial data. However, the latter is not a worst-case measure: the algorithm may fail to return a correct hypothesis on arbitrarily large finite samples (if they do not subsume a characteristic sample).
Note also that the definition does not require the existence of an efficient algorithm that constructs a characteristic sample for each language in the underlying class. When such an algorithm is also available, we say that the class is efficiently identifiable. The following result shows that efficient identifiability does not trivially follow from identifiability; in fact, it makes the much stronger statement that not even computability of characteristic samples follows from their existence.
Theorem 7.3. There exists a class of languages that possesses polynomial-size characteristic samples, yet for which such samples cannot be constructed effectively.
We first prove Theorem 7.3 and then provide a definition for efficient identification.
Proof. We present a class of recursive languages that is identifiable in the limit using polynomial time and data, while there is no (polynomial-time or other) algorithm that constructs a characteristic sample for every language in the class, using a specific underlying representation of the languages in the class.
For the purpose of defining such a class, let ϕ be a Gödel numbering of all partial computable functions over the natural numbers, and let Φ be a corresponding Blum complexity measure. Here ϕ_i refers to the i-th partial computable function in the numbering ϕ. Intuitively, Φ_i(j) is undefined if ϕ_i(j) is undefined (i.e., the computation of ϕ_i(j) does not terminate); otherwise Φ_i(j) is the number of computational steps required until the termination of the computation of ϕ_i(j). The set K = {k | ϕ_k(k) is defined} is called the halting set for ϕ; this set is recursively enumerable, but membership in K is not decidable.
We now define two languages for each natural number k:

L_{k,1} = {b^k} ∪ {a^i | i ∈ N},
L_{k,2} = {b^k} ∪ {a^i | ϕ_k(k) does not terminate within fewer than i steps}.

Note that L_{k,1} = L_{k,2} if and only if k ∉ K. Now let L consist of all languages L_{k,q} for k ∈ N and q ∈ {1, 2}.
There is an effective algorithm that decides membership in L_{k,q}, given k ∈ N and q ∈ {1, 2}. To see this, note that, given k, q, and a word w, membership is trivial to decide when q = 1 or when w is not of the form a^i. If w = a^i, then w belongs to L_{k,2} if and only if the computation of ϕ_k(k) does not terminate within fewer than i steps, which can be checked effectively. Moreover, every language in L is regular and has a characteristic sample of size at most 2. In particular, {⟨b^k, 1⟩} serves as a characteristic sample for L_{k,1} (and thus also for L_{k,2} in case k ∉ K), while {⟨b^k, 1⟩, ⟨a^{Φ_k(k)+1}, 0⟩} is a characteristic sample for L_{k,2} in case k ∈ K. Thus, using the above representation, the class L has polynomial-size (even constant-size) characteristic samples. However, there is no algorithm that constructs such characteristic samples effectively, since otherwise this algorithm could be used to decide membership in K. (The latter can be verified by noting that L_{k,2} ⊆ L_{k,1} for all k. Therefore, a system of characteristic samples would need to distinguish L_{k,2} from L_{k,1} (when k ∈ K) by either (i) a negative example of the form ⟨a^i, 0⟩ for L_{k,2}, or (ii) a positive example of the form ⟨a^i, 1⟩ for L_{k,1}, where a^i ∉ L_{k,2}. Thus, the presence or absence of such an example in the characteristic samples for L_{k,1} and L_{k,2} can be used to decide whether or not k ∈ K.)

Since we are concerned with learning classes of automata, we now formulate the definition of efficient identification directly over classes of automata.
Definition 7.4 (efficient identification). A class of automata M over an alphabet Σ is said to be efficiently identifiable if the following two requirements are met.
(1) There exists a polynomial-time algorithm Infer : 2^(Σ* × {0,1}) → M that, given a finite sample S, returns an automaton that agrees with S.
(2) There exists a polynomial-time algorithm Char : M → 2^(Σ* × {0,1}) such that, for every M ∈ M and every sample S satisfying Char(M) ⊆ S ⊆ L(M), the automaton Infer(S) recognizes the same language as M.
When we apply this definition to a class of SFAs over a Boolean algebra A with domain D and predicates P, the characteristic sample is defined over the concrete set of letters D rather than the set of predicates P, as this is the alphabet of the words accepted by an SFA. (Inferring an SFA from a set of words labeled by predicates can be done using the methods for inferring DFAs, by considering the alphabet to be the set of predicates.) Throughout this section we study whether a class of SFAs M is efficiently identifiable. That is, we are interested in the existence of algorithms Infer_M and Char_M satisfying the requirements of Definition 7.4. In section 8 we provide a necessary condition for a class of SFAs to be identified in the limit using polynomial time and data. In section 9 we provide a sufficient condition for a class of SFAs to be efficiently identifiable. On the positive side, we show in section 10 that the class of SFAs over the interval algebra is efficiently identifiable. On the negative side, we show in section 11 that SFAs over the general propositional algebra cannot be identified in the limit using polynomial time and data. All classes of SFAs that we study are assumed to contain all basic SFAs (as per Definition 6.1).

7.1. Efficient Identification of DFAs. Before investigating efficient identification of SFAs, it is worth noting that DFAs are efficiently identifiable. We state a result that provides more details about the nature of these algorithms, since we need it later, in section 10, to obtain our positive result. Intuitively, it says that there exists a valid learner such that if D is a minimal DFA recognizing a certain language L, then the learner can infer L from a characteristic sample consisting of access words to each state of D and their extensions with distinguishing words (words showing that each pair of states cannot be merged), as well as one-letter extensions of the access words, which are required to retrieve the transition relation. For completeness, we give a proof of this theorem in Appendix A.

Necessary Condition
We make use of the following definitions. A sequence Γ_1, ..., Γ_m consisting of finite sets of concrete letters Γ_i ⊆ D is termed a concrete partition of D if the sets are pairwise disjoint (namely, Γ_i ∩ Γ_j = ∅ for every i ≠ j). Note that we do not require, in addition, that ⋃_{1≤i≤m} Γ_i = D. We use Π_conc(D) to denote the set of all concrete partitions over D. A sequence of predicates ψ_1, ..., ψ_k over a Boolean algebra A on a domain D is termed a predicate partition if ψ_i ∩ ψ_j = ∅ for every i ≠ j, and in addition ⋃_{1≤i≤k} ψ_i = D. That is, here we do require that the assignments to the predicates cover the domain. We use Π_pred(P) to denote the set of all predicate partitions over P. For both concrete partitions and predicate partitions, we do not require that the sets Γ_i or the predicates ψ_i be non-empty.
Note that f_g and f_c are defined over partitions of any size. In Theorem 8.2 we use their dyadic restriction, that is, concretizing and generalizing functions that are defined only over partitions of size two.
We say that f_g (resp. f_c) is efficient if it can be computed in polynomial time. Note that if f_c is efficient, then the sets Γ_i in the constructed concrete partition are of polynomial size.
We are now ready to provide a necessary condition for identifiability in the limit using polynomial time and data.
Theorem 8.2. Let M_A be a class of SFAs over a Boolean algebra A that contains all basic SFAs over A. If M_A is identified in the limit using polynomial time and data, then there exist efficient dyadic concretizing and generalizing functions Concretize_A : Π_pred(P) → Π_conc(D) and Generalize_A : Π_conc(D) → Π_pred(P).

Proof. Assume that M_A is identified in the limit using polynomial time and data. That is, there exist two algorithms CharSFA : M_A → 2^(D* × {0,1}) and InferSFA : 2^(D* × {0,1}) → M_A satisfying the requirements of Definition 7.2. We show that efficient dyadic concretizing and generalizing functions do exist.
We start with the definition of Concretize_A. Let ⟨ϕ_1, ϕ_2⟩ be the argument of Concretize_A. Note that ϕ_2 = ¬ϕ_1 by the definition of a predicate partition. The implementation of Concretize_A invokes CharSFA on the SFA M_{ϕ_1} accepting all words of length one consisting of a concrete letter satisfying ϕ_1, as defined in Definition 6.1. Let S be the returned sample. Let Γ_1 be the set of positively labeled words in the sample. Note that all such words are of length one, namely they are letters. Let Γ_2 be the set of letters that are first letters of a negative word in the sample. Then Concretize_A returns ⟨Γ_1, Γ_2⟩.
We turn to the definition of Generalize_A. Given ⟨Γ_1, Γ_2⟩, the implementation of Generalize_A invokes InferSFA on a sample subsuming S = {⟨γ, 1⟩ | γ ∈ Γ_1} ∪ {⟨γ, 0⟩ | γ ∈ Γ_2} ∪ {⟨γγ', 0⟩ | γ, γ' ∈ Γ_1 ∪ Γ_2}. That is, all one-letter words in Γ_1 are positively labeled, all one-letter words in Γ_2 are negatively labeled, and all words of length 2 using some of the given concrete letters are negatively labeled. Let M be the returned SFA when given S', such that S' ⊇ S, as an input. Let Ψ_1 be the set of all predicates labeling some edge from the initial state to an accepting state, and let Ψ_2 be the set of all predicates labeling some edge from the initial state to a rejecting state. Let ϕ = (⋁_{ψ∈Ψ_1} ψ) ∧ (⋀_{ψ∈Ψ_2} ¬ψ). Then Generalize_A returns ⟨ϕ, ¬ϕ⟩.
It is not hard to verify that the constructed methods Concretize A and Generalize A satisfy the requirements of the theorem.
The following example shows the existence of the functions Concretize and Generalize for the interval algebra.

Example 8.3. Consider the interval algebra A_N, and consider the functions Concretize_{A_N}, which maps each interval in a predicate partition to the singleton set containing its minimal element, and Generalize_{A_N}, which maps a concrete partition back to the intervals delimited by the minimal elements of its sets (here Γ'_i = ⋃_{j≠i} Γ_j for every 1 ≤ i < m). Then Concretize_{A_N} and Generalize_{A_N} satisfy the variadic generalization of the conditions of Theorem 8.2.
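A sketch of such interval-algebra functions, following the minimal-element idea used in section 10; the concrete representation (intervals as half-open pairs with ∞, concrete sets as Python sets) is an assumption:

```python
INF = float('inf')

def concretize(partition):
    """Each predicate is a half-open interval [a, b); its minimal element a
    is a concrete witness, so Concretize returns the singletons {a}."""
    return [{a} for a, b in partition]

def generalize(sets):
    """Rebuild a predicate partition: sort the minimal elements of the
    concrete sets and let each interval run up to the next minimum."""
    mins = sorted(min(s) for s in sets)
    nxt = dict(zip(mins, mins[1:] + [INF]))
    return [(min(s), nxt[min(s)]) for s in sets]

print(generalize([{0}, {100, 200}]))    # [(0, 100), (100, inf)]
print(generalize([{0, 100}, {200}]))    # [(0, 200), (200, inf)]
```

Note the round-trip property that Theorem 9.3 below asks for: applying generalize to any supersets of concretize's output that stay within the original intervals, e.g. generalize([{0, 50}, {100, 150}]), returns the original partition [(0, 100), (100, inf)].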
We would like to relate the necessary condition on the learnability of a class of SFAs over a Boolean algebra A to the learnability of the Boolean algebra A itself. For this we need to first define efficient identifiability of a Boolean algebra A. Since to learn an unknown predicate we need to supply two sets, one of negative examples and one of positive examples, it makes sense to say that a Boolean algebra A with predicates P over a domain D is efficiently identifiable if there exist efficient dyadic concretizing and generalizing functions Concretize_A : Π_pred(P) → Π_conc(D) and Generalize_A : Π_conc(D) → Π_pred(P) satisfying the criteria of Theorem 8.2. Using this terminology we can state the following corollary.
Corollary 8.4. Efficient identifiability of the Boolean algebra A is a necessary condition for identification in the limit using polynomial time and data of any class of SFAs over A that contains all basic SFAs over A.

Sufficient Condition
We turn to discuss a sufficient condition for the efficient identifiability of a class of SFAs M_A over a Boolean algebra A. To prove that M_A is efficiently identifiable, we need to supply two algorithms, CharSFA_{M_A} and InferSFA_{M_A}, as required in Definition 7.4. The idea is to reduce the problem to the efficient identifiability of DFAs, namely to use the algorithms CharDFA and InferDFA provided by Theorem 7.5. The implementation of CharSFA, given an SFA M, will transform it into a DFA D_M by applying Concretize_A to the partitions induced by the states of the SFA. The resulting DFA D_M will not be equivalent to the given SFA M, but it may be used to create a sample of words S_M that is a characteristic sample for M; see Figure 2.
To implement InferSFA we would like to use InferDFA to obtain, as a first step, a DFA from the given sample, and then, as a second step, apply Generalize_A to the concrete partitions induced by the DFA states. A subtle issue that we need to cope with is that inference should succeed also on samples subsuming the characteristic sample. The fact that this holds for the inference of the DFA does not suffice, since the inference of the DFA is guaranteed not to be confused only as long as the additional labeled words are over the same alphabet. In our case the alphabet of the sample can be a strict subset of the concrete letters D (and if D is infinite, this surely will be the case). Example 10.7 in section 10 illustrates this problem for the class of SFAs over a monotonic algebra A_m, for which the respective methods Concretize_{A_m} and Generalize_{A_m} exist. So, we need an additional step that removes words from the given sample if they are not over the alphabet of the characteristic sample. We call a method implementing this step Decontaminate_{M_A}. Formally, we first define the extensions of Concretize_A and Generalize_A to automata instead of partitions, which we term Concretize_{M_A} and Generalize_{M_A} (with M_A in the subscript).

• The formal definition of Concretize_{M_A} is as follows. Given an SFA M = (A, Q, q_ι, F, ∆_M), we define Concretize_{M_A}(M) as the DFA D = (Γ_D, Q, q_ι, F, ∆_D), where ∆_D is defined as follows. For each state q ∈ Q, let π_q = ⟨ψ_1, ..., ψ_m⟩ be the predicate partition consisting of all predicates labeling a transition exiting q in M. Intuitively, in D, the outgoing transitions of each state q correspond to Concretize_A(π_q).
• In the converse direction, let D = (Γ, Q, q_ι, F, ∆_D) be a DFA. We define Generalize_{M_A}(D) with respect to an algebra A as follows. Let M = (A, Q, q_ι, F, ∆_M), where ∆_M is defined as follows. For each state q ∈ Q, let ⟨Γ_1, ..., Γ_m⟩ be the concrete partition consisting of the letters labeling outgoing transitions of q. (Note that ⟨Γ_1, ..., Γ_m⟩ is indeed a concrete partition, since D is a DFA.) Let Generalize_A(⟨Γ_1, ..., Γ_m⟩) = ⟨ψ_1, ..., ψ_m⟩. Then ⟨q, ψ_i, q'⟩ ∈ ∆_M if Γ_i is the set of letters labeling transitions from q to q' in D.
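The per-state step of Generalize_{M_A} can be sketched for the interval algebra (the algebra is an assumption here; the paper's definition is algebra-generic):

```python
INF = float('inf')

def generalize_state(out_edges):
    """Generalize the outgoing transitions of one DFA state.
    out_edges: dict mapping a concrete letter to its target state.
    Returns a list of (interval, target) pairs covering the whole domain."""
    # group the letters into a concrete partition, one set per target state
    by_target = {}
    for letter, tgt in out_edges.items():
        by_target.setdefault(tgt, set()).add(letter)
    # Generalize_A for a monotonic algebra: each interval starts at the
    # minimal element of its set and runs up to the next set's minimum
    items = sorted(by_target.items(), key=lambda kv: min(kv[1]))
    mins = [min(s) for _, s in items] + [INF]
    return [((mins[i], mins[i + 1]), tgt) for i, (tgt, _) in enumerate(items)]

# outgoing edges of one state of a DFA over {0, 100, 200} (assumed example)
print(generalize_state({0: 'q1', 100: 'q1', 200: 'qsink'}))
# [((0, 200), 'q1'), ((200, inf), 'qsink')]
```

Grouping by target state first ensures that each produced predicate labels a single SFA transition, exactly as in the definition of ∆_M above.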
We are now ready to define the conditions that the decontaminating function has to satisfy. We recall that the role of the decontaminating function is to identify words in the sample that are not over the alphabet Γ_D of the characteristic sample (note that Γ_D is not known to the decontaminating function). As before, we say that f_d is efficient if it can be computed in polynomial time.
Example 9.2. Intuitively, InferDFA is only promised to be correct if it is applied to a sample S over the alphabet of the original DFA. For DFAs, this is always the case. However, for SFAs, the concrete alphabet D usually contains more letters than appear in the characteristic sample S. If Γ_D is the set of letters appearing in S, then a superset of S might contain letters in D \ Γ_D, i.e., letters that are not from the alphabet of the characteristic sample. Consider, for example, a characteristic sample S over the interval algebra whose words use only the letters 0, 100 and 200, and consider a set S' ⊇ S that contains, in addition to the words in S, also the word 150·100. Since the letter 150 is not part of any word in the original sample S, we cannot apply InferDFA as is, but first need to remove this word from S'. Note that words that are not in S but are over the alphabet {0, 100, 200} do not pose a problem, as InferDFA can handle supersets over the same alphabet as S. See Examples 10.6 and 10.7 in section 10 for more details.
We now provide the sufficient condition for efficient identifiability.
Theorem 9.3. Let M_A be a class of SFAs over a Boolean algebra A. Suppose there exist an efficient decontaminating function Decontaminate_{M_A} and efficient functions Concretize_A and Generalize_A satisfying the following condition: if Concretize_A(⟨ψ_1, ..., ψ_m⟩) = ⟨Γ_1, ..., Γ_m⟩ and Generalize_A(⟨Γ'_1, ..., Γ'_m⟩) = ⟨ϕ_1, ..., ϕ_m⟩ where Γ_i ⊆ Γ'_i for every 1 ≤ i ≤ m, then ϕ_i = ψ_i for every 1 ≤ i ≤ m. Then the class M_A is efficiently identifiable.
Given functions Concretize_A, Generalize_A and Decontaminate_{M_A} for a class M_A of SFAs over a Boolean algebra A, meeting the criteria of Theorem 9.3, we show that M_A can be efficiently identified by providing the two algorithms CharSFA and InferSFA described below. These algorithms make use of the respective algorithms CharDFA and InferDFA guaranteed by Theorem 7.5, as well as the methods provided by the theorem.
We briefly describe these two algorithms, and then turn to prove Theorem 9.3. The algorithm CharSFA receives an SFA M ∈ M_A and returns a characteristic sample for it. It does so by applying Concretize_{M_A}(M) (Algorithm 1) to construct a DFA D_M, and generating the sample S_M by applying the algorithm CharDFA to the DFA D_M.
The algorithm InferSFA, given a sample S that subsumes a characteristic sample of an SFA M, returns an equivalent SFA; otherwise, InferSFA returns an SFA that agrees with the sample S. First, it applies Decontaminate_{M_A} to find a subset S' ⊆ S over the alphabet of the subsumed characteristic sample, if such a subsumed sample exists. Then it uses S' to construct a DFA by applying the inference algorithm InferDFA to S'. From this DFA it constructs an SFA M_{S'} by applying Generalize_{M_A} (Algorithm 2). If the resulting automaton disagrees with the given sample, it resorts to returning the symbolic prefix-tree automaton. In order to construct the symbolic prefix-tree automaton, we first construct the prefix-tree DFA A for the sample S, and then apply Generalize_{M_A}(A) to get an SFA that agrees with S.
In brief, we define:

InferSFA(S) = M_{S'} if M_{S'} agrees with S, and the symbolic prefix-tree automaton of S otherwise.

In section 10 we provide methods Concretize_A, Generalize_A and Decontaminate_{M_A} for SFAs over monotonic algebras, deriving the identification in the limit result for them. We now prove Theorem 9.3.
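The overall shape of InferSFA can be sketched with the component algorithms passed in as parameters; everything below is a placeholder standing for Decontaminate_{M_A}, InferDFA, Generalize_{M_A}, the agreement test, and the prefix-tree DFA construction, not an implementation of any of them:

```python
def infer_sfa(sample, decontaminate, infer_dfa, generalize_m, agrees,
              prefix_tree_dfa):
    """Sketch of InferSFA: decontaminate, infer a DFA, generalize it to an
    SFA; if the result disagrees with the sample, fall back to the symbolic
    prefix-tree automaton, which agrees with the sample by construction."""
    s_clean = decontaminate(sample)           # drop words over foreign letters
    candidate = generalize_m(infer_dfa(s_clean))
    if agrees(candidate, sample):
        return candidate
    return generalize_m(prefix_tree_dfa(sample))

# toy run with stub components, just to exercise the control flow
result = infer_sfa({('a',): 1},
                   decontaminate=lambda s: s,
                   infer_dfa=lambda s: 'dfa',
                   generalize_m=lambda d: 'sfa:' + d,
                   agrees=lambda m, s: m == 'sfa:dfa',
                   prefix_tree_dfa=lambda s: 'ptree')
print(result)   # 'sfa:dfa'
```

The final agreement test is what makes condition (1) of Definition 7.4 hold unconditionally, while the decontamination step is what protects condition (2) against contaminated supersets.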
Proof of Theorem 9.3. Given functions Concretize_A, Generalize_A, and Decontaminate_{M_A}, we show that the algorithms CharSFA and InferSFA satisfy the requirements of Definition 7.4.
For the first condition: given that InferDFA, Decontaminate_{M_A} and Generalize_A run in polynomial time, and that the prefix-tree automaton can be constructed in polynomial time, InferSFA runs in polynomial time as well. In addition, the test performed in the definition of InferSFA ensures that the output agrees with the sample.
For the second condition, note that the sample generated by CharSFA is polynomial in the size of D_M, by the correctness of CharDFA. In addition, since Concretize_A is efficient, D_M is polynomial in the size of M, and thus the sample S_M generated by CharSFA is polynomial in the size of M as well. It is left to show that if S_M is the concrete sample produced by CharSFA when running on an SFA M, then InferSFA, when run on any sample S ⊇ S_M, returns an SFA for L(M). Since Decontaminate_{M_A} is a decontaminating function and S ⊇ S_M, the set S' = Decontaminate_{M_A}(S) satisfies S' ⊇ S_M and is only over the alphabet Γ_M, which is the alphabet of the DFA D_M generated in Algorithm 1.
From the correctness of InferDFA, given S ⊇ S_M, applying InferDFA to the output S' of Decontaminate_{M_A} results in a DFA D that is equivalent to the DFA D_M constructed in Algorithm 1. Since D_M is complete with respect to its alphabet Γ_M, for each state q of D, the concrete partition ⟨Γ'_1, ..., Γ'_n⟩ generated in Algorithm 2, line 4, covers Γ_M and subsumes the output of Concretize_A on π_q (Algorithm 1, line 2). Thus, since Generalize_A and Concretize_A satisfy the criteria of Theorem 9.3, the constructed predicates agree with the original predicates. In addition, since S, and therefore S', agrees with M, the test performed in the definition of InferSFA succeeds and the returned SFA is equivalent to M.

Positive Result
We present the following positive result regarding monotonic algebras.
Theorem 10.1. Let M_{A_m} be the class of SFAs over a monotonic Boolean algebra A_m. Then M_{A_m} is efficiently identifiable.
In order to prove Theorem 10.1, we show that the sufficient condition of Theorem 9.3 holds for monotonic algebras. Example 10.6 demonstrates how CharSFA and InferSFA are applied in order to learn an SFA over the algebra A_N.
Proposition 10.2. There exist functions Concretize_{A_m} and Generalize_{A_m} for a monotonic Boolean algebra A_m, satisfying the criteria of Theorem 9.3.
Proof. Let D be the domain of A_m. We provide the functions Concretize_{A_m} and Generalize_{A_m} and prove that the criteria of Theorem 9.3 hold for them. For ease of presentation, for the function Concretize_{A_m} we consider basic predicates. Note that for monotonic algebras, basic predicates are in fact intervals, as a conjunction of intervals is an interval. We can assume all predicates are basic since, as we show in Lemma 4.3, for monotonic algebras the transformation from a general formula to a DNF formula of basic predicates is linear. Then, each basic predicate in the formula corresponds to a different predicate in the predicate partition. The definitions of Concretize_{A_m} and Generalize_{A_m} are generalizations of the functions Concretize_{A_N} and Generalize_{A_N} given in Example 8.3. We define Concretize_{A_m}(⟨ψ_1, . . ., ψ_m⟩) = ⟨Γ_1, . . ., Γ_m⟩ where we set Γ_i = {a_i} for ψ_i = [a_i, b_i), that is, Γ_i consists of the minimal element satisfying ψ_i. Since A_m is monotonic, Γ_i is well-defined and contains a single element, thus Concretize_{A_m} is an efficient concretizing function.
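For intuition, here is a minimal Python sketch of such a concretizing function for the integer interval case, where each basic predicate [a, b) is represented as a pair (a, b); the representation and the function name are our own, not the paper's:

```python
import math

def concretize(predicates):
    """Map each non-empty interval predicate [a, b) to the singleton
    {a}: in a monotonic algebra the minimal element satisfying a
    predicate is well defined, so each concrete set is a singleton
    and the function runs in time linear in the number of predicates."""
    partition = []
    for a, b in predicates:
        assert a < b, "[a, b) with a >= b is semantically ⊥"
        partition.append({a})
    return partition

# The predicate partition of two transitions guarded by [0, 100)
# and [100, ∞):
print(concretize([(0, 100), (100, math.inf)]))  # [{0}, {100}]
```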
To show that any class of SFAs M_{A_m} over a monotonic algebra A_m is efficiently identifiable, we define in Algorithm 3 an algorithm that implements a decontaminating function for M_{A_m}. To do so, it first finds the set Γ_M of all elements that are a minimal left point of some interval, and then chooses from S the words over Γ_M. It does so as follows. First, note that 100 ∼_S 150, 100 ∼_S 200 and 150 ∼_S 200, while 0 ≁_S 100, 150, 200. Since 0 is the minimal element it has to be in Γ_M; and since 100 is the minimal element that is not equivalent to 0 it has to define a new interval and thus is in Γ_M as well. Next, we consider suffixes of words over {0, 100}. These are 0·0 and 0·100, which are equivalent, and 0·200, which is not equivalent to the former. Since 100 is equivalent to 0 it does not define a new interval now, but 200 does, as it is the minimal (and only) element that is not equivalent to 0 when considering suffixes of 0. Then, we deduce that Γ_M = {0, 100, 200} and thus S′ = {⟨ε, 0⟩, ⟨0, 1⟩, ⟨100, 0⟩, ⟨200, 0⟩, ⟨0·0, 1⟩, ⟨0·100, 1⟩, ⟨0·200, 0⟩}. Now Algorithm InferSFA is applied to the set S′ and the resulting DFA is the DFA D_M of Figure 4. Then it applies Generalize^M_{A_N} described in Example 8.3 and the result is the original SFA of Figure 1. That is, for the outgoing transitions of q_0 it applies Generalize_{A_N}(⟨{0}, {100, 200}⟩) = ⟨[0, 100), [100, ∞)⟩ and for the outgoing transitions of q_1 it applies Generalize_{A_N}(⟨{0, 100}, {200}⟩) = ⟨[0, 200), [200, ∞)⟩, and uses these predicates to annotate the corresponding transitions in the SFA.
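The generalization step applied above can be sketched in Python as follows; representing intervals as pairs and the function name `generalize` are our assumptions for illustration:

```python
import math

def generalize(partition):
    """Generalize a concrete partition over N into interval predicates:
    each class gets the interval from its minimal element up to the
    smallest minimal element of another class above it (or ∞ if no
    class starts above it)."""
    mins = [min(part) for part in partition]
    bounds = sorted(mins) + [math.inf]
    return [(m, bounds[bounds.index(m) + 1]) for m in mins]

# The two applications from the example:
print(generalize([{0}, {100, 200}]))  # [(0, 100), (100, inf)]
print(generalize([{0, 100}, {200}]))  # [(0, 200), (200, inf)]
```

Monotonicity is what makes this well defined: every predicate is an interval, so the minimal element of each class determines its left endpoint, and the next class's minimum determines its right endpoint.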
Example 10.7. Consider the SFA M_1 of Figure 5. Applying CharDFA(Concretize^M_{A_N}(M_1)) results in the following sample S. Now, let S′ = S ∪ {⟨150, 1⟩, ⟨150·250, 1⟩}. Note that S′ is consistent with L(M_1). When trying to learn an SFA from S′ and applying InferDFA(S′), the algorithm cannot distinguish between the words 150 and 100, nor between 0 and 150. The same holds for 0·0 vs. 150·250, and for 100·100 vs. 150·250.
Figure 7 presents two DFAs that are both consistent with the set S′. Since S′ does not contain any characteristic sample for DFAs over the alphabet {0, 100, 150, 250}, InferDFA concludes that S′ does not subsume a characteristic sample, and returns the prefix-tree automaton, given in Figure 6.
This example illustrates that InferSFA cannot correctly classify 150 and 150·250 by only applying InferDFA, without reasoning about the predicates of the algebra. To this end we provide the function Decontaminate^M_A, which is able, in the case of a monotonic algebra, to find which letters are the ones that should be used to define new predicates.

Negative Result
The result of Theorem 10.1 does not extend to the non-monotonic case, as stated in Theorem 11.1 regarding SFAs over the general propositional algebra. Let D_B = {B^k}_{k∈N}. Recall that B = {0, 1} and B^k is the set of all valuations of k atomic propositions. Let P_B = {P_{B^k}}_{k∈N} where P_{B^k} is the set of predicates over at most k atomic propositions. Let A_B be the Boolean algebra defined over the discrete domain D_B and the set of predicates P_B, with the usual operators ∨, ∧ and ¬. Let M_{A_B} be the class of SFAs over the Boolean algebra A_B. We show that unless P = NP, this class of SFAs is not efficiently identifiable.^11

Theorem 11.1. The class M_{A_B} is not efficiently identifiable unless P = NP.
Proof. We show that there is no pair of efficient dyadic concretizing and generalizing functions f_c : Π_pred(P_B) → Π_conc(D_B) and f_g : Π_conc(D_B) → Π_pred(P_B) unless P = NP. From Theorem 8.2 it then follows that M_{A_B} is not efficiently identifiable unless P = NP.
11 This result may be contrasted with [AD18], which provides a positive learnability result regarding SFAs over the OBDD algebra. The result of [AD18] is with respect to query learning, while Theorem 11.1 concerns efficient identifiability in the limit. As we discuss in section 13, one cannot derive efficient identifiability from a positive result in the query learning setting. Moreover, in Theorem 11.1 (as well as in Corollary 12.3 and the discussion in section 12) we refer to efficient learnability with respect to the propositional algebra as defined in subsection 2.1, where the size is measured with respect to the number of atomic propositions, while [AD18] refers to the size of SFAs in which the predicates are OBDDs, whose size is measured by the number of nodes in the OBDD. However, the number of nodes in an OBDD can be exponential in the number of atomic propositions. Therefore, our result does not conflict with the result of [AD18].
Assume towards contradiction that such a pair of functions exists. We provide a polynomial time algorithm A_SAT for SAT. On a predicate ϕ, the algorithm A_SAT invokes f_c(⟨ϕ, ¬ϕ⟩). Suppose the returned concrete partition is ⟨Γ_1, Γ_2⟩. Then A_SAT returns "true" if and only if Γ_1 ≠ ∅. Correctness follows from the fact that if there exists a system of characteristic samples for P_B then the set of positive examples associated with a satisfiable predicate ϕ must be non-empty, as otherwise f_g cannot distinguish ϕ from ⊥.
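The reduction can be sketched as follows; `f_c` stands for the efficient concretizing function assumed toward contradiction, and the brute-force stand-in below (exponential in k) is ours, serving only to exercise the interface:

```python
from itertools import product

def a_sat(phi, f_c):
    """SAT via the assumed concretizing function: phi is satisfiable
    iff the positive side Γ1 of f_c(⟨phi, ¬phi⟩) is non-empty."""
    gamma1, _gamma2 = f_c([phi, lambda v: not phi(v)])
    return len(gamma1) > 0

def brute_force_fc(preds, k=2):
    """Toy concretizing function over B^k (exponential, illustration
    only): collect, for each predicate, all satisfying valuations."""
    valuations = list(product([False, True], repeat=k))
    return [{v for v in valuations if p(v)} for p in preds]

print(a_sat(lambda v: v[0] and not v[1], brute_force_fc))  # True
print(a_sat(lambda v: v[0] and not v[0], brute_force_fc))  # False
```

The proof's point is precisely that no polynomial-time replacement for `brute_force_fc` can exist unless P = NP.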

Query Learning
The paradigm of query learning stipulates that the learner can interact with an oracle (teacher) by asking it several types of allowed queries. In this section we consider these queries to be membership queries (mq) and equivalence queries (eq). We say that a class M of automata is efficiently learnable using mqs / eqs / both mqs and eqs if there is an algorithm that, for every language L with a representation in M, asks a polynomial number of mqs / eqs / both mqs and eqs and outputs an automaton in M that is polynomial in the minimal representation of L in M.
Angluin showed, on the negative side, that regular languages cannot be efficiently learned (in the exact model) from only mqs [Ang81] or only eqs [Ang90]. On the positive side, she showed that regular languages, represented as DFAs, can be efficiently learned using both mqs and eqs [Ang87a]. The celebrated algorithm, termed L*, was extended to learning many other classes of languages and representations, e.g., [Sak90, BV96, AV10, BHKL09, AEF15, MP95, AF16, NFZ21]. See the survey [Fis18] for more references.
In particular, an extension of L*, termed MAT*, to learning SFAs was provided in [AD18], which proved that SFAs over an algebra A can be efficiently learned using MAT* if and only if the underlying algebra is efficiently learnable and the size of disjunctions of k predicates does not grow exponentially in k.^12 From this it was concluded that SFAs over the following underlying algebras are efficiently learnable: Boolean algebras over finite domains, the equality algebra, the tree automata algebra, and the SFAs algebra. Efficient learning of SFAs over a monotonic algebra using mqs and eqs was established in [CDYS17], which improved the results of [MM14, MM17] by using a binary search instead of a helpful teacher.
The result of [AD18] provides means to establish new positive results on learning classes of SFAs using mqs and eqs, but it does not provide means for obtaining negative results on query learning of SFAs using mqs and eqs. We strengthen this result by providing a learnability result that is independent of the use of a specific learning algorithm. In particular, we show that efficient learnability of a Boolean algebra A using mqs and eqs is a necessary condition for the learnability of a class of SFAs over A, as stated in Theorem 12.1.
Theorem 12.1. A class of SFAs M over a Boolean algebra A, that contains all basic SFAs over A, is polynomially learnable using mqs and eqs only if A is polynomially learnable using mqs and eqs.
Proof. Assume that M is polynomially learnable using mqs and eqs, using an algorithm Q_M. We show that there exists a polynomial learning algorithm Q_A for the algebra A using mqs and eqs. The algorithm Q_A uses Q_M as a subroutine, and behaves as a teacher for Q_M. Whenever Q_M asks an M-mq on a word γ_1 · · · γ_k, if k > 1 then Q_A answers "no". If k = 1 then the M-mq is essentially an A-mq, thus Q_A issues this query and passes the answer to Q_M. Whenever Q_M asks an M-eq on an SFA M, if M is not of the form M_ψ for some ψ (as defined in Definition 6.1) then Q_A answers "no" to the M-eq and returns as a counterexample some word w ∈ L(M) such that |w| > 1 and w was not provided before. To this aim it can record the largest counterexample given so far (according to the lexicographic order) and return the next one in this order. Otherwise (if the SFA is of the form M_ψ for some ψ) Q_A asks an A-eq on ψ. If the answer is "yes" then Q_A terminates and returns ψ as the result of the learning algorithm; if the answer to the A-eq on ψ is "no", then the provided counterexample ⟨γ, b_γ⟩ is passed back to Q_M together with the answer "no" to the M-eq. It is easy to verify that Q_A terminates correctly in polynomial time.
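The teacher side of Q_A can be sketched as follows; representing words as lists of letters and the toy "even numbers" algebra are our assumptions for illustration:

```python
from itertools import count, product

def answer_m_mq(word, a_mq):
    """Q_A's answer to an M-mq of Q_M: words of length other than one
    are rejected outright; a length-one query is forwarded to the
    algebra teacher as an A-mq."""
    if len(word) != 1:
        return False
    return a_mq(word[0])

def long_words(alphabet):
    """Counterexamples for hypotheses not of the form M_psi: yield
    words of length > 1 in length-lexicographic order, so each
    counterexample is fresh."""
    for n in count(2):
        for w in product(alphabet, repeat=n):
            yield list(w)

# Toy algebra teacher over integers whose target predicate is "even".
a_mq = lambda c: c % 2 == 0
print(answer_m_mq([4], a_mq))     # True
print(answer_m_mq([4, 2], a_mq))  # False
gen = long_words([0, 1])
print(next(gen))                  # [0, 0]
```

Enumerating counterexamples in a fixed order is what lets Q_A guarantee that each rejected hypothesis receives a word it has not seen before.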
From Theorem 12.1 we derive what we believe to be the first negative result on learning SFAs from mqs and eqs, as we show that SFAs over A_{B^k}, the propositional algebra over k variables, are not polynomially learnable using mqs and eqs. Polynomiality is measured with respect to the parameters n, m, l representing the size of the SFA, and the number k of atomic propositions. Note that the algebra A_{B^k} is a restriction of the algebra A_B considered in section 11, and therefore this implies a negative result also with regard to the algebra A_B considered there.
We achieve this by showing that no learning algorithm A for the propositional algebra using mqs and eqs can do better than asking 2^k mqs/eqs, where k is the number of atomic propositions.^13 We assume the learning algorithm is sound, that is, if S_i^+ and S_i^- are the sets of positive and negative examples observed by the algorithm up to stage i, then at stage i+1 the algorithm will not ask an mq for a word in S_i^+ ∪ S_i^- or an eq for an automaton that rejects a word in S_i^+ or accepts a word in S_i^-.

Proposition 12.2. Let A be a sound learning algorithm for the propositional algebra over B^k. There exists a target predicate ψ of size k for which A will be forced to ask at least 2^k − 1 queries (either mq or eq).
Proof. Since A is sound, at stage i+1 we have S_{i+1}^+ ⊇ S_i^+ and S_{i+1}^- ⊇ S_i^-, and at least one inclusion is strict. Since the size of the concrete alphabet is 2^k, for every round i < 2^k an adversarial teacher can answer both mqs and eqs negatively. In the case of an eq there must be an element in B^k \ (S_i^- ∪ S_i^+) with which the provided automaton disagrees. The adversary will return one such element as a counterexample. This forces A to ask at least 2^k − 1 queries. Note that for any element v in B^k there exists a predicate ϕ_v of size k such that ⟦ϕ_v⟧ = {v}.
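The adversarial teacher in this proof can be sketched as follows; the simplification that every eq counterexample is labeled opposite to the hypothesis is ours:

```python
from itertools import product

def make_adversary(k):
    """Adversary over B^k: answer every mq with "no", and refute every
    eq on a yet-unclassified valuation, labeled opposite to the
    hypothesis. It can keep refuting while unclassified valuations
    remain, forcing a sound learner to ask at least 2^k - 1 queries."""
    unseen = set(product([0, 1], repeat=k))

    def mq(v):
        unseen.discard(tuple(v))
        return False

    def eq(h):
        """h maps a valuation to a bool (the hypothesis' verdict)."""
        if unseen:
            v = unseen.pop()
            return (False, v, not h(v))  # (not equivalent, cex, its label)
        return (True, None, None)        # no room left to refute

    return mq, eq

mq, eq = make_adversary(3)
refuted = sum(1 for _ in range(2 ** 3) if not eq(lambda v: False)[0])
print(refuted)  # 8
```

Once a single valuation v remains, the adversary can still commit to the target ϕ_v with ⟦ϕ_v⟧ = {v}, so all earlier answers stay consistent with some size-k predicate.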
Corollary 12.3. SFAs over the propositional algebra A_{B^k} with k propositions cannot be learned in poly(k) time using mqs and eqs.
The propositional algebra A_{B^k} is a special case of the n-dimensional boxes algebra. Learning n-dimensional boxes was studied using mqs and eqs [GGM94, BGGM98, BK98], as well as in the PAC setting [BK00]. The algorithms presented in [GGM94, BGGM98, BK98, BK00] are mostly exponential in n. Alternatively, [GGM94, BGGM98] suggest algorithms that are exponential in the number of boxes in the union. In [BK98] a linear query learning algorithm for unions of disjoint boxes is presented. Since n-dimensional boxes subsume the propositional algebra, Corollary 12.3 implies the following.
13 In [Nak00], Boolean formulas represented using OBDDs are claimed to be polynomially learnable with mqs and eqs. However, [Nak00] measures the size of an OBDD by its number of nodes, which can be exponential in the number of propositions.
Corollary 12.4.The class of SFAs over the n-dimensional boxes algebra cannot be learned in poly(n) time using mqs and eqs.

Discussion
We examined the question of learnability of a class of SFAs over certain algebras, with the main focus of our study on passive learning. We provided a necessary condition for identification of SFAs in the limit using polynomial time and data, as well as a necessary condition for efficient learning of SFAs using membership and equivalence queries. We note that a positive result on learning deterministic SFAs using mqs and eqs implies a positive result for identification of deterministic SFAs in the limit using polynomial time and data. The latter follows because a systematic set of characteristic samples {S_L}_{L∈L} for a class of languages L may be obtained by collecting the words observed by the query learner when learning L; and since the SFA is deterministic, the words in the sample can be restricted to ones of polynomial size, thus the size of the sample is polynomial in the size of the SFA.^14 However, it does not imply a positive result regarding the stronger notion of efficient identifiability, as the latter requires the set to also be constructed efficiently, and the complexity analysis for query learning does not include the complexity of the teacher in computing queries, e.g., in deciding equivalence and in constructing counterexamples. We thus provided a sufficient condition for efficient identification of a class of SFAs, and showed that the class of SFAs over any monotonic algebra satisfies these conditions.
We hope that these sufficient or necessary conditions will help to obtain more positive and negative results for learning of SFAs, and spark an interest in investigating characteristic samples in other automata models used in verification.
The interval algebra is the Boolean algebra in which the domain D is the set Z ∪ {−∞, ∞} of integers augmented with two special symbols with their standard semantics, and the set of atomic formulas P_0 consists of intervals of the form [a, b) where a, b ∈ D. The semantics associated with intervals is the natural one: ⟦[a, b)⟧ = {z ∈ D | a ≤ z and z < b}. If a ≥ b then ⟦[a, b)⟧ = ∅ and we have that [a, b) is semantically equivalent to ⊥.
(2) There exist two elements d_{−∞} and d_∞ such that d_{−∞} ≤ d and d ≤ d_∞ for all d ∈ D; and (3) an atomic predicate ψ ∈ P_0 can be associated with two concrete values a and b such that ⟦ψ⟧ = {d ∈ D : a ≤ d < b}. Henceforth, we denote an atomic predicate ψ over a monotonic algebra as ψ = [a, b) where ⟦ψ⟧ = {d ∈ D : a ≤ d < b}. If b ≤ a then ⟦ψ⟧ = ∅ and thus the predicate is equivalent to ⊥.

Example 8.3. Consider the class M_{A_N} of SFAs over the algebra A_N of Example 2.
Definition 9.1. A function f_d : 2^(D* × {0,1}) → 2^(D* × {0,1}) is called decontaminating for a class of SFAs M and a respective Concretize^M function if the following holds. Let M ∈ M be an SFA, and let D = Concretize^M(M). Let S_D = CharDFA(D). Then, for every S′ ⊇ S_D such that S′ agrees with M, it holds that S_D ⊆ f_d(S′) ⊆ (S′ ∩ (Γ_D^* × {0,1})), where Γ_D is the alphabet of S_D.

Figure 4. The DFA D_M constructed in CharSFA.

Figure 7. Two DFAs that are consistent with the set S′ of Example 10.7.