Predicate Generation for Learning-Based Quantifier-Free Loop Invariant Inference

We address the predicate generation problem in the context of loop invariant inference. Motivated by the interpolation-based abstraction refinement technique, we apply the interpolation theorem to synthesize predicates implicitly implied by program texts. Our technique improves the effectiveness and efficiency of the learning-based loop invariant inference algorithm in [14]. We report experimental results on examples from Linux, SPEC2000, and the Tar utility.


Introduction
One way to prove that an annotated loop satisfies its pre- and post-conditions is by giving loop invariants. In an annotated loop, pre- and post-conditions specify intended effects of the loop. The actual behavior of the annotated loop however does not necessarily conform to its specification. Through loop invariants, verification tools can check whether the annotated loop fulfills its specification automatically [9].
Finding loop invariants is tedious and sometimes requires intelligence. Recently, an automated technique based on algorithmic learning and predicate abstraction has been proposed [14]. Given a fixed set of atomic predicates and an annotated loop, the learning-based technique can infer a quantifier-free loop invariant over the given atomic predicates. By employing a learning algorithm and a mechanical teacher, the new technique is able to generate loop invariants without constructing abstract models or computing fixed points.
As in other techniques based on predicate abstraction, the selection of atomic predicates is crucial to the effectiveness of the learning-based technique. Oftentimes, users extract atomic predicates from program texts heuristically. If this simple strategy does not yield the atomic predicates necessary to express any loop invariant, the loop invariant inference algorithm will not be able to infer one. Even when the heuristic does give the necessary atomic predicates, it may select too many redundant predicates and impede the efficiency of the loop invariant inference algorithm.
One way to circumvent this problem is to generate atomic predicates on demand. Several techniques have been developed to synthesize atomic predicates by interpolation [8,12,19,20]. Let A and B be logic formulae. An interpolant I of A and B is a formula such that A ⇒ I and I ∧ B is inconsistent. Moreover, the non-logical symbols in I must occur in both A and B. By Craig's interpolation theorem, an interpolant I always exists for any first-order formulae A and B when A ∧ B is inconsistent [6]. The interpolant I can be seen as a concise summary of A with respect to B. Indeed, many abstraction refinement techniques for software model checking [8,11,12,19,20] have used interpolation to synthesize atomic predicates.
Inspired by the refinement technique in software model checking, we develop an interpolation-based technique to synthesize atomic predicates in the context of learning-based loop invariant inference. Our algorithm does not add new atomic predicates by interpolating invalid execution paths in control flow graphs. We instead interpolate the loop body with purported loop invariants from the learning algorithm. We adopt the existing interpolating theorem provers [1,2,3,19] for the interpolation. With our new predicate generation technique, we can improve the effectiveness and efficiency of the existing learning-based loop invariant inference technique [14]. Constructing the set of atomic predicates is fully automatic and on-demand.
1.1. Example. Consider the following annotated loop:

{n ≥ 0 ∧ x = n ∧ y = n} while x > 0 do x = x − 1; y = y − 1 done {x + y = 0}

Assume that variables x and y both have the value n ≥ 0 before entering the loop. The loop body decreases each variable by one until the variable x becomes zero. We want to show that x + y is zero after executing the loop. This requires us to establish the fact that variables x and y have the same value during iterations and eventually become zero after exiting the loop. To express this fact as a loop invariant, we require a predicate x = y. The program text however does not reveal this equality explicitly. Moreover, atomic predicates from the program text cannot express any loop invariant that establishes the given specification. Using atomic predicates in the program text is not sufficient in this case.
However, we can exploit the fact that any loop invariant ι should be weaker than the pre-condition δ and stronger than the disjunction of the loop guard κ and the post-condition ε (δ ⇒ ι ⇒ κ ∨ ε). Then, we can generate an interpolant from the inconsistent formula δ ∧ ¬(κ ∨ ε) and extract the atomic predicates in it. From an interpolant of the pre-condition and ¬(κ ∨ ε) for this loop, we obtain two atomic predicates x = y and 2y ≥ 0. Observe that the interpolation is able to synthesize the necessary predicate x = y. In fact, the loop invariant x = y ∧ x ≥ 0 establishes the specification of the loop.
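The interpolant claim in this example can be sanity-checked by brute force. The sketch below is only an illustration over a small integer grid, not the interpolation procedure used in the paper; the function names (`pre`, `guard`, `post`, `candidate`) and the guard x > 0 are assumptions taken from the description of the loop.

```python
from itertools import product

# Illustrative encoding of the example loop (names are hypothetical).
def pre(n, x, y):        # pre-condition: n >= 0 and x = n and y = n
    return n >= 0 and x == n and y == n

def guard(x, y):         # loop guard, assumed to be x > 0
    return x > 0

def post(x, y):          # post-condition: x + y = 0
    return x + y == 0

def candidate(x, y):     # candidate interpolant from the text: x = y and 2y >= 0
    return x == y and 2 * y >= 0

def check_interpolant(bound=6):
    """Check A => I and the inconsistency of I and B on a bounded grid,
    where A is the pre-condition and B is not(guard or post)."""
    rng = range(-bound, bound + 1)
    a_implies_i = all(candidate(x, y)
                      for n, x, y in product(rng, repeat=3) if pre(n, x, y))
    i_and_b_unsat = not any(candidate(x, y) and not (guard(x, y) or post(x, y))
                            for x, y in product(rng, repeat=2))
    return a_implies_i, i_and_b_unsat

result = check_interpolant()
```

The candidate mentions only x and y, which occur on both sides, so the symbol condition holds by inspection. A bounded check like this is evidence, not a proof; an SMT solver would be needed for the unbounded claim.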
1.2. Related Work. Jung et al. [14] introduce the loop invariant inference technique based on algorithmic learning. Kong et al. [16] extend this technique to quantified loop invariant inference. Both algorithms require users to provide atomic predicates. The present work addresses this problem for the case of quantifier-free loop invariants. Recently, Lee et al. [18] introduce a learning-based technique for termination analysis. The technique infers the transition invariant of a given loop as a proof of termination, by combining algorithmic learning and decision procedures. In the paper, the authors design a heuristic to generate atomic transition predicates. Adapting the technique of the present paper to transition invariant inference is interesting future work.
Many interpolation algorithms and their implementations are available [1,2,3,19]. Interpolation-based techniques for predicate refinement in software model checking are proposed in [8,11,12,13,20]. Abstract models used in these techniques however may require excessive invocations to theorem provers. Another interpolation-based technique for first-order invariants is developed in [21]. Like our approach, the paramodulation-based technique presented in that paper does not construct abstract models. It however only generates invariants in first-order logic with equality. A template-based predicate generation technique for quantified invariants is proposed in [22]. The technique reduces the invariant inference problem to constraint programming and generates predicates in user-provided templates.
1.3. Paper Organization. Section 2 gives preliminaries for the presentation. Section 3 reviews the learning-based loop invariant inference framework [14]. Section 4 presents our interpolation-based predicate generation technique. Section 5 presents the loop invariant inference algorithms with automatic predicate generation. Section 6 presents and discusses our experimental results. Section 7 concludes this work.

Preliminaries
2.1. Quantifier-free Formulae. Let QF denote the quantifier-free logic with equality, linear inequality, and uninterpreted functions. Define the domain D = Q ∪ B where Q is the set of rational numbers and B = {F, T} is the Boolean domain. Fix a set X of variables. A valuation over X is a function from X to D. The class of valuations over X is denoted by Val_X. For any formula θ ∈ QF and valuation ν over free variables in θ, θ is satisfied by ν (written ν |= θ) if θ evaluates to T under ν; θ is inconsistent if θ is not satisfied by any valuation. Given a formula θ ∈ QF, a satisfiability modulo theories (SMT) solver returns a satisfying valuation ν of θ if θ is not inconsistent [3,7].
2.2. Interpolation Theorem. For θ ∈ QF, we denote the set of non-logical symbols occurring in θ by σ(θ). Let Θ = [θ₁, ..., θₘ] be a sequence with θᵢ ∈ QF for 1 ≤ i ≤ m. The sequence Θ is inconsistent if θ₁ ∧ θ₂ ∧ ··· ∧ θₘ is inconsistent. The sequence Λ = [λ₀, λ₁, ..., λₘ] of quantifier-free formulae is an inductive interpolant of Θ if
• λ₀ = T and λₘ = F;
• λᵢ₋₁ ∧ θᵢ ⇒ λᵢ for 1 ≤ i ≤ m; and
• σ(λᵢ) ⊆ σ(θᵢ) ∩ σ(θᵢ₊₁) for 1 ≤ i < m.
The third condition of interpolants makes them attractive to use for predicate generation; since the set of symbols in an interpolant must lie in the intersection of the sets of symbols in two inconsistent formulae, the interpolant sometimes consists of predicates which do not appear in either of the two. The interpolation theorem states that an inductive interpolant exists for any inconsistent sequence [6,19,20]. Some existing theorem provers [1,2,3,19] can compute such interpolants.

The abstraction function α maps any quantifier-free formula to a Boolean formula in Bool[B_P], whereas the concretization function γ maps any Boolean formula in Bool[B_P] to a quantifier-free formula in QF[P]. Moreover, the function α* maps a valuation over X to a valuation over B_P; the function γ* maps a valuation over B_P to a quantifier-free formula in QF[P]. The function Γ(ν) specifies the valuation ν in QF. Observe that the quantifier-free formula γ(β) is a minterm when the Boolean formula β is a canonical monomial. Observe also that the formula γ(α(θ)) is in disjunctive normal form and equivalent to θ ∈ QF[P].
Consider, for instance, P = {n ≥ 0, x = n, y = n} and The following lemmas prove useful properties of these abstraction and concretization functions.
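To make the roles of α* and γ* concrete for P = {n ≥ 0, x = n, y = n}, here is a minimal sketch. It assumes the usual reading of these functions (α* records which atomic predicates a concrete valuation satisfies, γ* returns the corresponding minterm); the string encoding of formulae and the helper names are illustrative only.

```python
# Atomic predicates P = {n >= 0, x = n, y = n}; each entry pairs a printable
# name (standing for the Boolean variable b_p) with an evaluator for p.
P = [
    ("n >= 0", lambda v: v["n"] >= 0),
    ("x = n",  lambda v: v["x"] == v["n"]),
    ("y = n",  lambda v: v["y"] == v["n"]),
]

def alpha_star(valuation):
    """Map a concrete valuation over X to an abstract valuation over B_P."""
    return {name: pred(valuation) for name, pred in P}

def gamma_star(abstract):
    """Map an abstract valuation to the minterm over P it denotes (as text)."""
    literals = [name if abstract[name] else f"not({name})" for name, _ in P]
    return " and ".join(literals)

mu = alpha_star({"n": 2, "x": 2, "y": 1})
minterm = gamma_star(mu)
```

Here `mu` sets the variables for n ≥ 0 and x = n to true and the one for y = n to false, and `minterm` is the conjunction of the matching literals.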
Lemma 2.1.Let P be a set of atomic predicates, θ ∈ QF [P ], and β a canonical monomial in Bool Note that each θ i is a cube over set P .Let Lit(θ) be a set of literals in formula θ.Then, Lit(θ i ) ⊆ P ∪ {¬p : p ∈ P }.
The other direction is trivial.
Lemma 2.5. Let P be a set of atomic propositions, θ ∈ QF[P], and µ a Boolean valuation for

2.4. CDNF Learning Algorithm. The CDNF algorithm [4] is an exact learning algorithm for Boolean formulae based on monotone theory. It infers an unknown target formula by posing queries to a teacher. The teacher is responsible for answering two types of queries. The learning algorithm may ask if a valuation satisfies the target formula by a membership query. Or it may ask if a conjectured formula is equivalent to the target in an equivalence query. Using the answers to the queries, the CDNF algorithm infers a Boolean formula equivalent to the unknown target within a polynomial number of queries in the formula size of the target [4].
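The two query types can be illustrated with a teacher that knows the target exactly. This is only a sketch of the query interface, not of the CDNF algorithm itself, and it differs from the mechanical teacher of Section 3, which does not know any loop invariant; `make_teacher` and its arguments are hypothetical names.

```python
from itertools import product

def make_teacher(variables, target):
    """Build an exact teacher for a known Boolean target.

    `target` is a function from a valuation (dict of variable -> bool) to bool.
    """
    def all_valuations():
        for bits in product([False, True], repeat=len(variables)):
            yield dict(zip(variables, bits))

    def membership(mu):
        # Membership query: does the valuation satisfy the target?
        return target(mu)

    def equivalence(conjecture):
        # Equivalence query: return None for YES, otherwise a counterexample,
        # that is, a valuation on which conjecture and target disagree.
        for mu in all_valuations():
            if conjecture(mu) != target(mu):
                return mu
        return None

    return membership, equivalence

target = lambda m: m["a"] and not m["b"]
membership, equivalence = make_teacher(["a", "b"], target)
counterexample = equivalence(lambda m: m["a"])
```

With the target a ∧ ¬b and the conjecture a, the only disagreement is the valuation where both variables are true, which is what the equivalence query returns.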
2.5. Programs. We consider a simple imperative language in this paper. Two basic types are available: natural numbers and Booleans. A term in Exp is a natural number; a term in BExp is of Boolean type. The keyword nondet denotes an arbitrary value in the type of the assigned variable. An annotated loop is of the form:

{δ} while κ do S₁; S₂; ···; Sₘ done {ε}

The BExp formula κ is the loop guard. The BExp formulae δ and ε are the precondition and postcondition of the annotated loop respectively. Define X_k = {x_k : x ∈ X}. For any term e over X, define e ] for a statement S is a first-order formula over variables X₀ ∪ X₁ defined as follows. [ Let ν and ν′ be valuations, and S a statement. We write ν ] evaluates to true by assigning ν(x) and ν′(x) to x₀ and x₁ for each x ∈ X respectively. Given a sequence of statements S₁; S A precondition Pre(θ : S) for θ ∈ QF with respect to the statement S, which is a first-order formula that entails θ after executing the statement S, is defined as follows.
2.6. Problem Definition. Given an annotated loop {δ} while κ do S₁; S₂; ···; Sₘ done {ε}, the loop invariant inference problem is to compute a formula ι ∈ QF satisfying (1) δ ⇒ ι; (2) ι ∧ ¬κ ⇒ ε; and (3) ι ∧ κ ⇒ Pre(ι : S₁; S₂; ···; Sₘ). Observe that the condition (2) is equivalent to ι ⇒ ε ∨ κ. The first two conditions specify necessary and sufficient conditions of any loop invariants respectively. The formulae δ and ε ∨ κ are called the strongest and weakest approximations to loop invariants respectively. We are particularly interested in the following variant of the loop invariant inference problem: (a) given a set P of atomic predicates, finding an invariant ι ∈ QF[P]; and (b) given an annotated loop, finding a suitable set P of atomic predicates that contains enough predicates to express at least one of the invariants.
Jung et al. propose an algorithmic-learning-based technique [14] that solves part (a) of the problem. The technique combines predicate abstraction and decision procedures to make a mechanical teacher that answers the queries from the learning algorithm. With predicate abstraction, the learning algorithm becomes an efficient engine for exploring possible combinations of predicates to find an invariant.
In this paper, we address part (b) of the problem using interpolation. As already stated in Section 2.2, interpolation provides a systematic method for predicate generation and is widely adopted in software model checking. We explain the application of interpolation in the context of learning-based loop invariant inference.
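As a hedged illustration of the invariant conditions, the sketch below checks a candidate for the example loop of Section 1.1 on a bounded integer grid. It assumes that loop has guard x > 0 and body x := x − 1; y := y − 1, as its description suggests, and it replaces the SMT-based check by exhaustive enumeration; all function names are hypothetical.

```python
from itertools import product

# Assumed encoding of the Section 1.1 loop over states (n, x, y).
def pre(n, x, y):   return n >= 0 and x == n and y == n
def guard(n, x, y): return x > 0
def post(n, x, y):  return x + y == 0
def body(n, x, y):  return (n, x - 1, y - 1)

def is_invariant(inv, bound=6):
    """Check the three conditions on a bounded grid (evidence, not a proof)."""
    states = list(product(range(-bound, bound + 1), repeat=3))
    # (1) the pre-condition implies the candidate
    c1 = all(inv(*s) for s in states if pre(*s))
    # (2) candidate and negated guard imply the post-condition
    c2 = all(post(*s) for s in states if inv(*s) and not guard(*s))
    # (3) the candidate is preserved by one iteration of the body
    c3 = all(inv(*body(*s)) for s in states if inv(*s) and guard(*s))
    return c1 and c2 and c3

good = is_invariant(lambda n, x, y: x == y and x >= 0)
too_weak = is_invariant(lambda n, x, y: x >= 0)
```

The candidate x = y ∧ x ≥ 0 passes all three checks, whereas x ≥ 0 alone fails condition (2), which matches the observation that the predicate x = y is needed.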

Inferring Loop Invariants with Algorithmic Learning
In this section, we review the learning-based framework for inferring quantifier-free loop invariants due to Jung et al. [14]. Given a set P of atomic predicates, the authors show how to apply a learning algorithm for Boolean formulae to infer quantifier-free loop invariants freely generated by P. They first adopt predicate abstraction to relate quantifier-free and Boolean formulae. They then design a mechanical teacher to guide the learning algorithm to a Boolean formula whose concretization is a loop invariant. We first explain the algorithms for resolving queries from the learning algorithm and then the main loop of learning-based loop invariant inference.
3.1. Answering Queries from Algorithmic Learning. Figure 2 shows a high-level view of the learning-based loop invariant inference framework. In the framework, a learning algorithm is used to drive the search for loop invariants. It "learns" an unknown loop invariant by querying a mechanical teacher. The mechanical teacher of course does not know any loop invariant. It nevertheless tries to answer these queries using information derived from program texts. In this case, the teacher uses approximations to loop invariants. By employing a learning algorithm, it suffices to design a mechanical teacher to find loop invariants. With predicate abstraction and a learning algorithm for Boolean formulae at hand, it remains to design a mechanical teacher to guide the learning algorithm to the abstraction of a loop invariant. The key idea in [14] is to exploit approximations to loop invariants. An under-approximation to loop invariants is a quantifier-free formula ι̲ which is stronger than some loop invariants of the given annotated loop; an over-approximation is a quantifier-free formula ι̅ which is weaker than some loop invariants.
In the following, we explain exactly how we can answer queries from the learning algorithm using under- and over-approximations of loop invariants.
3.1.1. Answering Membership Queries. In the membership query MEM(µ), the teacher is required to answer whether µ |= α(ξ). We concretize the Boolean valuation µ and check it against the approximations. If the concretization γ*(µ) is inconsistent (that is, γ*(µ) is unsatisfiable), we simply answer NO for the membership query. Otherwise, there are three cases: (1) γ*(µ) ⇒ ι̲, and the answer is YES; (2) γ*(µ) does not imply ι̅, and the answer is NO; (3) otherwise, the approximations cannot resolve the query and a random answer is given. Algorithm 1 shows our membership query resolution algorithm. Note that instead of giving a random answer when a membership query cannot be resolved by the given invariant approximations, one can give a more accurate answer by exploiting better approximations from static analyzers. This learning-based framework is orthogonal to existing static analysis techniques [14].
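The membership query resolution can be sketched as follows. The source text only preserves the first case explicitly; the remaining case split used here (NO when the concretization escapes the over-approximation, an unresolved answer otherwise) is an assumption, and implication is checked by enumeration over a finite set of states instead of an SMT solver. The function and parameter names are hypothetical.

```python
from itertools import product

def resolve_membership(concretization, under, over, states):
    """Resolve a membership query against invariant approximations.

    Returns "YES", "NO", or "RANDOM" (the framework would then pick a
    random answer). The case split is an assumed reading of the text.
    """
    satisfying = [s for s in states if concretization(*s)]
    if not satisfying:
        return "NO"                       # inconsistent concretization
    if all(under(*s) for s in satisfying):
        return "YES"                      # implies the under-approximation
    if not all(over(*s) for s in satisfying):
        return "NO"                       # escapes the over-approximation
    return "RANDOM"                       # approximations cannot decide

# Approximations for the Section 1.1 example over states (n, x, y).
states = list(product(range(-3, 4), repeat=3))
under = lambda n, x, y: n >= 0 and x == n and y == n
over = lambda n, x, y: x + y == 0 or x > 0

answer_yes = resolve_membership(under, under, over, states)
answer_random = resolve_membership(lambda n, x, y: x > 0, under, over, states)
answer_no = resolve_membership(lambda n, x, y: x != y and x <= 0, under, over, states)
```

Returning a distinct "RANDOM" marker instead of flipping a coin keeps the sketch deterministic; a caller would replace it with a random YES or NO.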

3.1.2. Answering Equivalence Queries. To answer the equivalence query EQ(β), we concretize the Boolean formula β and check if γ(β) is indeed an invariant of the while statement for the given pre- and post-conditions. If it is, we are done. Otherwise, we use an SMT solver to find a witness to α(ξ) ⊕ β. There are three cases: (1) There is a ν such that ν |= ¬(ι̲ ⇒ γ(β)). Then ν |= ι̲ ∧ ¬γ(β). By Lemma 2.
As in the membership query resolution, we give a random answer when an equivalence query is not resolved by the given invariant approximations. We can still refine the approximations using some static analysis to give a more accurate counterexample.
/* {δ} while κ do S₁; S₂; ···; Sₘ done {ε} : an annotated loop */
Output: a loop invariant for the annotated loop
ι̲ := δ ∨ ε;
ι̅ := ε ∨ κ;
repeat
    call a learning algorithm for Boolean formulae where membership and equivalence queries are resolved by Algorithms 1 and 2 respectively;
until a loop invariant is found;
Algorithm 3: Main Loop

3.2. Main Loop of Inference Framework. The main loop of the loop invariant inference algorithm is given in Algorithm 3. We heuristically choose δ ∨ ε and ε ∨ κ as the under- and over-approximations respectively. Note that the under-approximation would be stronger if one used the strongest approximation δ. It is, however, reported that the weaker approximation δ ∨ ε for the under-approximation is more effective in resolving queries [14]. After determining the approximations, a learning algorithm is used to find an invariant. In [14], Jung et al. use the CDNF algorithm with Algorithms 1 and 2 for resolving queries.
Note that the mechanical teacher may give conflicting answers. Random answers to membership queries may contradict abstract counterexamples from equivalence queries. Moreover, different valuations may correspond to the same abstract valuation. The learning algorithm cannot infer any loop invariant in the presence of conflicting answers. When the mechanical teacher gives conflicting answers, we restart the learning algorithm and search for another loop invariant. In practice, there are nevertheless sufficiently many invariants for an annotated loop. The learning-based technique can infer a loop invariant without incurring any conflicts after a small number of restarts. As empirical evidence, observe the number of restarts in Table 1. Even without the new predicate generation technique, the numbers of restarts in all but three examples are less than three. The number of restarts is dramatically reduced with the new technique, since the technique generates predicates incrementally on demand and thereby keeps the abstraction parsimonious.
We remark that learning-based loop invariant inference is a semi-algorithm; Algorithm 3 terminates with a loop invariant only when there exists one for the loop that can be expressed with the given set of predicates. If there are not enough atomic predicates to express any invariant, the algorithm will iterate indefinitely. For example, the tar example in Section 6 timed out because it has no invariant expressible with only the atomic predicates from the program text.

Predicate Generation by Interpolation
One drawback of the learning-based approach to loop invariant inference is that it requires a set of atomic predicates. It is essential that at least one quantifier-free loop invariant is representable by the given set P of atomic predicates. Otherwise, concretizations of formulae in Bool[B_P] cannot be loop invariants, and the mechanical teacher never answers YES to equivalence queries. To address this problem, we synthesize new atomic predicates for the learning-based loop invariant inference framework progressively.
Interpolation is essential to our predicate generation technique. Let Θ = [θ₁, θ₂, ..., θₘ] be an inconsistent sequence of quantifier-free formulae and Λ = [λ₀, λ₁, λ₂, ..., λₘ] its inductive interpolant. It follows from the definition that θ₁ ∧ θ₂ ∧ ··· ∧ θᵢ ⇒ λᵢ and σ(λᵢ) ⊆ σ(θᵢ) ∩ σ(θᵢ₊₁). Hence λᵢ can be seen as a concise summary of θ₁ ∧ θ₂ ∧ ··· ∧ θᵢ with restricted symbols. Since each λᵢ is written in a less expressive vocabulary, new atomic predicates among variables can be synthesized. We therefore apply the interpolation theorem to synthesize new atomic predicates and refine the abstraction.
Our predicate generation technique consists of three components. Before the learning algorithm is invoked, an initial set of atomic predicates is computed (Section 4.1). When the learning algorithm is failing to infer loop invariants, new atomic predicates are generated to refine the abstraction (Section 4.2). Lastly, conflicting answers to queries may arise from predicate abstraction. We further refine the abstraction with these conflicting answers (Section 4.3). Throughout this section, we consider the annotated loop {δ} while κ do S₁; S₂; ···; Sₘ done {ε} with the under-approximation ι̲ and over-approximation ι̅.
4.1. Initial Atomic Predicates. The under- and over-approximations to loop invariants must satisfy ι̲ ⇒ ι̅. Otherwise, there cannot be any loop invariant ι such that ι̲ ⇒ ι and ι ⇒ ι̅. Thus, the sequence [ι̲, ¬ι̅] is inconsistent. For any interpolant [T, λ, F] of [ι̲, ¬ι̅], we have ι̲ ⇒ λ and λ ⇒ ι̅. The quantifier-free formula λ can be a loop invariant if it satisfies λ ∧ κ ⇒ Pre(λ : S₁; S₂; ···; Sₘ). It is however unlikely that λ happens to be a loop invariant. Yet our loop invariant inference algorithm can generalize λ by taking the atomic predicates in λ as the initial atomic predicates. The learning algorithm will try to infer a loop invariant freely generated by these atomic predicates.

4.2. Atomic Predicates from Incorrect Conjectures. Consider an equivalence query EQ(β) where β ∈ Bool[B_P] is an abstract conjecture. If the concretization θ = γ(β) is not a loop invariant, we interpolate the loop body with the incorrect conjecture θ. For any quantifier-free formula θ over variables Let φ and ψ be quantifier-free formulae over X. Define the following sequence:

[φ₀, [[S₁]]₀, [[S₂]]₁, ..., [[Sₘ]]ₘ₋₁, ¬ψₘ]

Observe that
• φ₀ and [[S₁]]₀ share the variables X₀;
• [[Sₘ]]ₘ₋₁ and ¬ψₘ share the variables Xₘ; and
• [[Sᵢ]]ᵢ₋₁ and [[Sᵢ₊₁]]ᵢ share the variables Xᵢ for 1 ≤ i < m.
Starting from the program states satisfying φ₀, the formula characterizes the images of φ₀ during the execution of S₁; S₂; ···; Sᵢ.

Lemma 4.1. Let X denote the set of variables in the statement S₁; S₂; ···; Sᵢ, and φ a quantifier-free formula over X. For any valuation ν over

Proof. By induction on the length of statement S₁; S₂; ···; Sᵢ. Suppose that the lemma is true for statement S₁; S ]]ᵢ and vice versa. By induction hypothesis, the formula ]]ᵢ is satisfied by ν and the statement follows by it.
/* {δ} while κ do S₁; ···; Sₘ done {ε} : an annotated loop */
/* ι̲, ι̅ : under- and over-approximations to loop invariants */
Input:

4.3. Atomic Predicates from Conflicting Abstract Counterexamples. Because of the abstraction, conflicting abstract counterexamples may be given to the learning algorithm. Consider the example in Section 1. Recall that n ≥ 0 ∧ x = n ∧ y = n and x + y = 0 ∨ x > 0 are the under- and over-approximations respectively. Suppose there is only one atomic predicate y = 0. The learning algorithm tries to infer a Boolean formula λ ∈ Bool[b_{y=0}]. Let us resolve the equivalence queries EQ(T) and EQ(F).
On the equivalence query EQ(F), we check if F is weaker than the under-approximation by an SMT solver. It is not, and the SMT solver gives the valuation ν₀(n) = ν₀(x) = ν₀(y) = 1 as a witness. Applying the abstraction function α* to ν₀, the mechanical teacher returns the abstract counterexample b_{y=0} → F. The abstract counterexample is intended to notify that the target formula λ and F have different truth values when b_{y=0} is F. That is, λ is satisfied by the valuation b_{y=0} → F.
On the equivalence query EQ(T), the mechanical teacher checks if T is stronger than the over-approximation. It is not, and the SMT solver now returns the valuation ν₁(x) = 0, ν₁(y) = 1 as a witness. The mechanical teacher in turn computes b_{y=0} → F as the corresponding abstract counterexample. The abstract counterexample notifies that the target formula λ and T have different truth values when b_{y=0} is F. That is, λ is not satisfied by the valuation b_{y=0} → F. Yet the target formula λ cannot be both satisfied and unsatisfied by the valuation b_{y=0} → F. We have conflicting abstract counterexamples.
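The conflict described above can be reproduced in a few lines. The sketch abstracts the two witnesses ν₀ and ν₁ with the single predicate y = 0 and shows that they collapse to the same abstract valuation even though one must satisfy the target and the other must not. The second predicate set, which adds x = y, is only an illustration of how a further predicate separates the two witnesses; it is not claimed to be what the paper's algorithm generates at this step.

```python
def alpha_star(valuation, predicates):
    """Abstract a concrete valuation with respect to named atomic predicates."""
    return {name: pred(valuation) for name, pred in predicates.items()}

under = lambda v: v["n"] >= 0 and v["x"] == v["n"] and v["y"] == v["n"]
over = lambda v: v["x"] + v["y"] == 0 or v["x"] > 0

nu0 = {"n": 1, "x": 1, "y": 1}   # witness for EQ(F): satisfies the under-approximation
nu1 = {"x": 0, "y": 1}           # witness for EQ(T): violates the over-approximation

P = {"y = 0": lambda v: v["y"] == 0}
abs0 = alpha_star(nu0, P)
abs1 = alpha_star(nu1, P)
conflict = abs0 == abs1          # same abstract valuation, opposite verdicts

# Illustrative refinement: with x = y added, the witnesses are separated.
P_refined = dict(P, **{"x = y": lambda v: v["x"] == v["y"]})
separated = alpha_star(nu0, P_refined) != alpha_star(nu1, P_refined)
```

Both witnesses map to b_{y=0} → F under the coarse abstraction, so the learner is told that the same abstract valuation is both inside and outside the target.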

Loop Invariant Inference Algorithms with Predicate Generation
Algorithm 6 is the main loop of inference framework with predicate generation.The algorithm is the same as Algorithm 3 except the gray-boxed parts.
We first compute the initial set of atomic predicates by interpolating ι and ¬ι (Section 4.1).With the initial set, we start the learning process until the algorithm finds a loop invariant or there is an exception raised.Exceptions basically mean that the current set of predicates might not be enough to find a loop invariant.We need in this case to find more predicates using one of the algorithms explained in Section 4.
The learning algorithm finds conflicting abstract counterexamples when the equivalence query resolution algorithm gives a random counterexample that contradicts the previous ones or the current predicate abstraction is too coarse.Since we cannot distinguish the two, we always generate more predicates using Algorithm 5, hoping that we can find a loop invariant in the next iteration.
The ExcessiveRandomAnswers exception is raised when our new equivalence query resolution algorithm, which is detailed later, suspects that it generates too many random counterexamples because of the coarse predicate abstraction.In this case, we generate more predicates using Algorithm 4.
Note that we start the learning algorithm from scratch every time we generate more predicates. The reason is that we use the CDNF algorithm for learning, which handles only a fixed number of Boolean variables. Recently, Chen et al. [5] propose a variant of CDNF and concludes the loop invariant inference algorithm. Otherwise, the mechanical teacher compares the concretization of the abstract conjecture with approximations to loop invariants. If the concretization is stronger than the under-approximation, weaker than the over-approximation, or it does not satisfy the necessary condition given in Proposition 4.3, an abstract counterexample is returned after recording the witness valuation [14,16]. The witnessing valuations are needed to synthesize atomic predicates in Algorithm 6 when conflicts occur.
If the concretization is not a loop invariant and falls between both approximations to loop invariants, there are two possibilities. Either the current set of atomic predicates is sufficient to express a loop invariant, and the learning algorithm just needs a few more iterations to infer a solution; or the current atomic predicates are insufficient to express any loop invariant, and the learning algorithm cannot derive a solution with these predicates. Since we cannot tell which scenario arises, a threshold is deployed heuristically. If the number of random abstract counterexamples is less than the threshold, we give the learning algorithm more time to find a loop invariant. Only when the number of random abstract counterexamples exceeds the threshold do we synthesize more atomic predicates for abstraction refinement. Intuitively, the current atomic predicates are likely to be insufficient if many random abstract counterexamples have been generated. In this case, we raise the ExcessiveRandomAnswers exception to synthesize more atomic predicates from the incorrect conjecture in Algorithm 6.
Observe that in Algorithm 6, the threshold τ is set to 1.3^|P|, the approximate size of the search space, a value we found empirically.
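The threshold heuristic can be sketched as a small counter. This assumes the threshold is 1.3 raised to the power |P| and rounds it up to an integer; the rounding, the class name `RandomAnswerBudget`, and its method are illustrative choices, and only the exception name ExcessiveRandomAnswers comes from the text.

```python
import math

class ExcessiveRandomAnswers(Exception):
    """Signals that the predicate abstraction is suspected to be too coarse."""

class RandomAnswerBudget:
    """Count random abstract counterexamples against a threshold tau."""

    def __init__(self, num_predicates, base=1.3):
        # Assumed reading: tau = ceil(base ** |P|).
        self.threshold = math.ceil(base ** num_predicates)
        self.count = 0

    def record_random_answer(self):
        # Give the learner more time until the count exceeds the threshold.
        self.count += 1
        if self.count > self.threshold:
            raise ExcessiveRandomAnswers(
                f"{self.count} random answers exceed threshold {self.threshold}")

budget = RandomAnswerBudget(num_predicates=3)
for _ in range(budget.threshold):
    budget.record_random_answer()    # still within budget
try:
    budget.record_random_answer()    # one more exceeds the threshold
    raised = False
except ExcessiveRandomAnswers:
    raised = True
```

On catching the exception, the main loop would generate more atomic predicates from the incorrect conjecture and restart learning with a fresh budget for the enlarged predicate set.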

Experimental Results
We have implemented the proposed technique in OCaml. In our implementation, the SMT solver Yices and the interpolating theorem prover CSIsat [1] are used for query resolution and interpolation respectively. In addition to the examples in [14], we add two more examples: riva is the largest loop expressible in our simple language from Linux, and tar is extracted from Tar. In Table 1, the column Previous represents the work in [14] where atomic predicates are chosen heuristically. Specifically, all atomic predicates in pre- and post-conditions, loop guards, and conditions of if statements are selected. The column Current gives the results for our automatic predicate generation technique. Interestingly, heuristically chosen atomic predicates suffice to infer loop invariants for all examples except tar. For the tar example, the learning-based loop invariant inference algorithm fails to find a loop invariant due to ill-chosen atomic predicates. In contrast, our new algorithm is able to infer a loop invariant for the tar example in 0.02s. The number of atomic predicates can be significantly reduced as well. Thanks to a smaller number of atomic predicates, loop invariant inference becomes more economical in these examples. Without predicate generation, four of the six examples take more than one second. Only one of these examples takes more than one second using the new technique. In particular, the parser example is improved by orders of magnitude.
The column BLAST gives the results of the lazy abstraction technique with interpolants implemented in BLAST [20]. In addition to the total elapsed time, we also show the preprocessing time in parentheses. Since the learning-based framework does not construct abstract models, our new technique outperforms BLAST in all cases but one (ide-wait-ireason). If we disregard the time for preprocessing in BLAST, the learning-based technique still wins three cases (ide-ide-tape, tar, vpr) and ties one (usb-message). Also note that the number of atomic predicates generated by the new technique is always smaller except for parser. Given the simplicity of the learning-based framework, our preliminary experimental results suggest a promising outlook for further optimizations.

6.1. tar from Tar. This simple fragment is excerpted from the code for copying two buffers. M items in the source buffer are copied to the target buffer that already has N items. The variable size keeps the number of remaining items in the source buffer and copy denotes the number of items in the target buffer after the last copy. In each iteration, an arbitrary number of items are copied and the values of size and copy are updated accordingly.
Observe that the atomic predicates in the program text cannot express any loop invariant that proves the specification. However, our new algorithm successfully finds the following loop invariant in this example:

M + N ≤ copy + size ∧ copy + size ≤ M + N

equal to 0.06. In contrast, the execution time for [14] ranges from 1.20s to 80.20s with the standard deviation equal to 14.09. By Chebyshev's inequality, the new algorithm infers a loop invariant in one second with probability greater than 0.988. With a compact set of atomic predicates, the loop invariant inference algorithm performs rather predictably.

6.3. ide-wait-ireason from Linux Device Driver. In the ide-wait-ireason example (Figure 5), predicate generation performs better even though it generates the same number of atomic predicates. This is because the technique can synthesize the atomic predicate retries ≤ 100, which does not appear in the program text but is essential to loop invariants. Surely this atomic predicate is expressible by the two atomic predicates retries = 100 and retries < 100 from the program text. However, the search space is significantly reduced with the more succinct atomic predicate retries ≤ 100. Consequently, the learning algorithm needs only a quarter of the queries to infer a loop invariant.

Conclusions
A predicate generation technique for learning-based loop invariant inference was presented. The technique applies the interpolation theorem to synthesize atomic predicates implicitly implied by program texts. To evaluate the efficiency of the new technique, results on examples excerpted from Linux, SPEC2000, and Tar source code were reported. With predicate generation, the learning-based loop invariant inference algorithm is more effective and performs much better on these realistic examples.
More experiments are always needed. In particular, we would like to have more realistic examples which require implicit predicates unavailable in program texts. Additionally, loops manipulating arrays often require quantified loop invariants with linear inequalities. Extension to quantified loop invariants is also important.

Figure 3. A Sample Loop in Tar

Figure 5. A Sample Loop in Linux IDE Driver

Predicate Abstraction. Let QF[P] denote the set of quantifier-free formulae over the set P of atomic predicates. A cube over P is a conjunction p₁ ∧ ··· ∧ p_k ∧ ¬p_{k+1} ∧ ··· ∧ ¬p_{k+k′} where all p_j ∈ P are distinct. We say that k + k′ is the size of the cube. A minterm over P is a cube whose size is |P|. Consider the set Bool[B_P] of Boolean formulae over the set B_P of Boolean variables where B_P = {b_p : p ∈ P}. An abstract valuation is a function from B_P to B. We write Val_{B_P} for the set of abstract valuations. A Boolean formula in Bool[B_P] is a canonical monomial if it is a conjunction of literals, where each Boolean variable in B_P occurs exactly once. The following functions [14, 15] relate formulae in QF[P] and Bool[B_P] (Figure

Moreover, the new framework does not construct abstract models nor compute fixed points. It can be more scalable than traditional techniques.

After formulae in QF and valuations in Val_X are abstracted to those in Bool[B_P] and Val_{B_P} respectively, a learning algorithm is used to infer abstractions of loop invariants. Let ξ be an unknown target Boolean formula in Bool[B_P]. A learning algorithm computes a representation of the target ξ by interacting with a teacher. The teacher should answer the following queries [4]:
• Membership queries. Let µ ∈ Val_{B_P} be an abstract valuation. The membership query MEM(µ) asks if the unknown target ξ is satisfied by µ. If so, the teacher answers YES; otherwise, NO.
• Equivalence queries. Let β ∈ Bool[B_P] be an abstract conjecture. The equivalence query EQ(β) asks if β is equivalent to the unknown target ξ. If so, the teacher answers YES. Otherwise, the teacher gives an abstract valuation µ such that the exclusive disjunction of β and ξ is satisfied by µ. The abstract valuation µ is called an abstract counterexample.

Table 1. Experimental Results. P: # of atomic predicates, MEM: # of membership queries, EQ: # of equivalence queries, RE: # of the learning algorithm restarts, T: total elapsed time (s).
All examples are translated into annotated loops manually. Data are the average of 100 runs and collected on a 2.4GHz Intel Core2 Quad CPU with 8GB memory running Linux 2.6.31 (Table 1).