Predicate Abstraction via Symbolic Decision Procedures

We present a new approach for performing predicate abstraction based on symbolic decision procedures. Intuitively, a symbolic decision procedure for a theory takes a set of predicates in the theory and symbolically executes a decision procedure on all the subsets of the set of predicates. The result of the symbolic decision procedure is a shared expression (represented by a directed acyclic graph) that implicitly represents the answer to a predicate abstraction query. We present symbolic decision procedures for the logic of Equality and Uninterpreted Functions (EUF) and Difference logic (DIF) and show that these procedures run in pseudo-polynomial (rather than exponential) time. We then provide a method to construct symbolic decision procedures for simple mixed theories (including the two theories mentioned above) using an extension of the Nelson-Oppen combination method. We present a preliminary evaluation of our procedure on predicate abstraction benchmarks from device driver verification in SLAM.


Introduction
Predicate abstraction is a technique for automatically creating finite abstract models of finite and infinite state systems [GS97]. The method has been widely used in abstracting finite-state models of programs in SLAM [BMMR01] and numerous other software verification projects [HJMS02, CCG+04]. It has also been used for synthesizing loop invariants [FQ02] and verifying distributed protocols [DDP99, LBC03].
The fundamental operation in predicate abstraction can be summarized as follows: given a set of predicates P describing some set of properties of the system state, and a formula e, compute the weakest Boolean formula F_P(e) over the predicates P that implies e. Most implementations of predicate abstraction [GS97, BMMR01] construct F_P(e) by collecting the set of cubes (conjunctions of the predicates or their negations) over P that imply e. The implication is checked using a first-order theorem prover. This method may require a very large number of calls to a theorem prover (2^|P| in the worst case) and can be expensive.
We propose a new way to perform predicate abstraction based on symbolic decision procedures. A symbolic decision procedure for a theory T (SDP_T) takes sets of predicates G and E and symbolically executes a decision procedure for T on G′ ∪ {¬e | e ∈ E}, for all the subsets G′ of G. The output of SDP_T(G, E) is a shared expression (an expression where common subexpressions can be shared) representing those subsets G′ ⊆ G for which G′ ∪ {¬e | e ∈ E} is unsatisfiable. We show that such a procedure can be used to compute F_P(e) for performing predicate abstraction.
We present symbolic decision procedures for the logic of Equality and Uninterpreted Functions (EUF) and Difference logic (DIF) and show that these procedures run in polynomial and pseudo-polynomial time respectively, and therefore produce compact shared expressions. We provide a method to construct an SDP for a combination of two simple theories T_1 ∪ T_2 (including EUF + DIF), by using an extension of the Nelson-Oppen combination method [NO80]. We use Binary Decision Diagrams (BDDs) [Bry86] to construct F_P(e) from the shared representations efficiently in practice.
We present a preliminary evaluation of our procedure on predicate abstraction benchmarks from device driver verification in SLAM, and show that our method outperforms existing methods for doing predicate abstraction.
The rest of the paper is organized as follows: Section 1.1 describes related work in predicate abstraction techniques. Section 2 describes the background concepts including predicate abstraction. Section 3 describes symbolic decision procedures, and instantiates them for two different theories (EUF and DIF). Section 4 describes a framework for modularly combining the SDPs for two theories that satisfy certain requirements, using an extension of the Nelson-Oppen combination method. Section 5 describes the implementation and the experimental evaluation of our technique. Finally, we present the conclusions and future work in Section 6.
1.1. Related Work. Several techniques have been suggested to improve the performance of predicate abstraction. The techniques can be broadly classified into three categories. In the first category, we classify methods that treat the decision procedures as a "black box", and attempt to minimize the number of decision procedure calls during predicate abstraction. The second category consists of methods that use a quantifier elimination procedure to perform predicate abstraction. Finally, there are techniques that do not compute the most precise abstraction directly; instead, they rely on counterexamples or proofs in the overall verification process to refine the abstraction. In the following paragraphs, we describe these techniques in more detail.
The techniques that aim to reduce the number of calls to the theorem prover or decision procedure are mostly based on enumerating cubes over P in increasing order of their size. Das et al. [DDP99] enumerate cubes over a tree, after fixing the order of predicates that appear in any path to the leaves. If a cube is found unsatisfiable, then all its subcubes (represented by the subtree) are pruned off. This method may require 2^(|P|+1) calls to the theorem prover in the worst case. Saidi and Shankar [SS99] relax the order on the predicates, and enumerate all possible cubes (3^|P| of them) over the predicates. Flanagan and Qadeer [FQ02] provide an algorithm that searches over the 2^|P| clauses (disjunction of cubes over the predicates or their negations) of size |P|, but attempts to greedily grow the clause (by dropping literals) when such a clause is implied by the formula e. Their technique requires |P|·2^|P| theorem prover calls in the worst case. Other techniques sacrifice precision to gain efficiency, by only considering cubes of some fixed length [BMMR01]. All these techniques may require an exponential number of theorem prover calls in the worst case, and demonstrate worst-case behavior in practice. More importantly, since these queries are not incremental, the state of the prover has to be reset across each call, precluding any learning across calls.
Alternately, predicate abstraction can be formulated as a quantifier elimination problem. Lahiri et al. [LBC03] and Clarke et al. [CKSY04] perform predicate abstraction by reducing the problem of computing F_P(e) to Boolean quantifier elimination. The former method first transforms a first-order quantifier elimination problem into Boolean quantifier elimination by encoding first-order formulas into Boolean formulas; the latter assumes all variables are propositional. The method in [LBC03] first converts the quantifier-free first-order formula to a Boolean formula such that the translation preserves the set of satisfying assignments of the Boolean variables in the original formula. Both these techniques use incremental Boolean Satisfiability (SAT) techniques [CKSY04, McM02] to perform the Boolean quantifier elimination. These techniques have the benefit that the large number of calls to the theorem prover is avoided, and learning can be used to prune away the search space in the SAT solver. However, the translation from a first-order formula to a Boolean formula can result in a loss of structure (since the arithmetic operations are encoded as bitwise operations), and make the translation inefficient. Namjoshi and Kurshan [NK00] also proposed using quantifier elimination for first-order logic directly to perform predicate abstraction; however, many theories (such as the theory of Equality with Uninterpreted Functions) do not admit quantifier elimination.
Most of the above approaches use decision procedures or SAT solvers as "black boxes", at best in an incremental fashion, to perform predicate abstraction. We believe that having a customized procedure for predicate abstraction can help improve the efficiency of predicate abstraction on large problems.
Finally, there is a set of techniques that avoid computing the most precise abstraction upfront, and refine it only based on failed proof attempts in the verification tool. Das and Dill [DD01] and subsequently Ball et al. [BCDR04] use counterexamples to refine the predicate abstraction incrementally. Jhala and McMillan [JM05] use interpolants to refine the predicate abstraction. It is not clear if it is always preferable to compute the abstraction incrementally. However, we have observed that the refinement loop can often become the main bottleneck in these techniques (for example in SLAM), and limits the scalability of the overall system [BCDR04].

Setup
A term is either a variable or an application of a function symbol to a list of terms. A formula can be the constants true or false or an atomic formula or a Boolean combination of other formulas. Atomic formulas can be formed by an equality between terms or by an application of a predicate symbol to a list of terms.
The function and predicate symbols can either be uninterpreted or can be defined by a particular theory. For instance, the theory of integer linear arithmetic defines the function symbol "+" to be the addition function over integers and "<" to be the comparison predicate over integers. If an expression involves function or predicate symbols from multiple theories, then it is said to be an expression over mixed theories.
A formula F is said to be satisfiable if it is possible to assign values to the various symbols in the formula from the domains associated with the theories to make the formula true. A formula F is valid if ¬F is unsatisfiable. We say a formula A implies a formula B (A ⇒ B) if and only if (¬A) ∨ B is valid.
We define a shared expression to be a Directed Acyclic Graph (DAG) representation of an expression where common subexpressions can be shared, by using names to refer to common subexpressions. For example, the intermediate variable t refers to the expression e_1 in the shared expression "let t = e_1 in (e_2 ∧ t) ∨ (e_3 ∧ ¬t)".
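As an illustration, a shared expression can be held as a table of named subexpressions plus a root that refers to them by name. The following Python sketch is only a toy encoding: the tuple format, the names defs and root, and the choice e_1 = a ∧ b are assumptions made for the example. It evaluates the shared expression above while computing the named subexpression t only once.

```python
# Toy encoding (an assumption of this sketch): nodes are tuples such as
# ("leaf", name), ("ref", name), ("not", x), ("and", x, y), ("or", x, y).

def evaluate(node, defs, env, memo):
    """Evaluate a shared expression; each named subexpression is computed once."""
    op, *args = node
    if op == "leaf":
        return env[args[0]]
    if op == "ref":
        name = args[0]
        if name not in memo:                      # first use: evaluate and remember
            memo[name] = evaluate(defs[name], defs, env, memo)
        return memo[name]
    if op == "not":
        return not evaluate(args[0], defs, env, memo)
    values = [evaluate(a, defs, env, memo) for a in args]
    return all(values) if op == "and" else any(values)

# let t = e1 in (e2 AND t) OR (e3 AND NOT t), with the hypothetical e1 = a AND b
defs = {"t": ("and", ("leaf", "a"), ("leaf", "b"))}
root = ("or",
        ("and", ("leaf", "e2"), ("ref", "t")),
        ("and", ("leaf", "e3"), ("not", ("ref", "t"))))

env = {"a": True, "b": True, "e2": True, "e3": False}
result = evaluate(root, defs, env, {})
```

The memo dictionary plays the role of the name t: both occurrences of t in the root read the same stored value.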

Predicate Abstraction.
A predicate is an atomic formula or its negation. If G is a set of predicates, then we define Ḡ ≐ {¬g | g ∈ G} to be the set containing the negations of the predicates in G. We use the term "predicate" in a general sense to refer to any atomic formula or its negation; it should not be confused with the narrower set of predicates that are used in predicate abstraction.
Definition 2.1. For a set of predicates P, a literal l_i over P is either a predicate p_i or ¬p_i, where p_i ∈ P. A cube c over P is a conjunction of literals. A clause cl over P is a disjunction of literals. Finally, a minterm over P is a cube with |P| literals, in which exactly one of p_i or ¬p_i is present for each p_i ∈ P.
Given a set of predicates P ≐ {p_1, …, p_n} and a formula e, the main operation in predicate abstraction involves constructing the weakest Boolean formula F_P(e) over P such that F_P(e) ⇒ e. The expression F_P(e) can be expressed as the disjunction of all the minterms over P that imply e:
\[ F_P(e) = \bigvee \{\, c \mid c \text{ is a minterm over } P \text{ and } c \text{ implies } e \,\} \tag{2.1} \]
Figure 2: Inference rules for the theory of equality and uninterpreted functions.
Proposition 2.2. For a set of predicates P and a formula e, the following statements are true:
(1) F_P(¬e) ⇒ ¬F_P(e)
(2) F_P(e_1 ∧ e_2) ⇔ F_P(e_1) ∧ F_P(e_2)
(3) F_P(e_1) ∨ F_P(e_2) ⇒ F_P(e_1 ∨ e_2)
We know that F_P(e) ⇒ e, by the definition of F_P(e). By the contrapositive rule, ¬e ⇒ ¬F_P(e). But F_P(¬e) ⇒ ¬e. Therefore, F_P(¬e) ⇒ ¬F_P(e).
To prove the third equation, note that F_P(e_1) ∨ F_P(e_2) ⇒ e_1 ∨ e_2 and F_P(e_1 ∨ e_2) is the weakest expression that implies e_1 ∨ e_2.
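To make Equation (2.1) concrete, the following Python sketch computes F_P(e) by brute force over a small finite domain, which stands in for the theorem prover. Everything specific here is an assumption of the sketch: the integer domain, the two predicates in P, and the formulas e1 and e2. A minterm is encoded as a tuple of truth values, one per predicate.

```python
from itertools import product

DOMAIN = range(-3, 4)                           # toy state space (assumption)
P = [lambda x: x < 2, lambda x: x > -2]         # hypothetical predicates p1, p2

def minterm_of(x):
    """The unique minterm over P that the state x satisfies."""
    return tuple(p(x) for p in P)

def F(e):
    """Set of minterms over P all of whose states satisfy e (Equation 2.1)."""
    return {m for m in product([False, True], repeat=len(P))
            if all(e(x) for x in DOMAIN if minterm_of(x) == m)}

def holds(minterms, x):
    """Evaluate the disjunction of the given minterms at state x."""
    return minterm_of(x) in minterms

e1 = lambda x: x < 0
e2 = lambda x: x >= 0
e_or = lambda x: e1(x) or e2(x)

# F_P(e) implies e on every state of the domain
sound = all(e1(x) for x in DOMAIN if holds(F(e1), x))
# F_P(e1) or F_P(e2) implies F_P(e1 or e2), and here the implication is strict
weaker = F(e1) | F(e2) <= F(e_or)
strict = holds(F(e_or), 0) and not holds(F(e1) | F(e2), 0)
```

In this toy instance the minterm p1 ∧ p2 (that is, −2 < x < 2) implies e1 ∨ e2 but implies neither disjunct alone, so the two sides differ at x = 0.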
The operation F P (e) does not distribute over disjunctions.Consider the example where P .
The above properties suggest that one can adopt a two-tier approach to compute F_P(e) for any formula e:
(1) Convert e into an equivalent Conjunctive Normal Form (CNF), which consists of a conjunction of clauses, i.e., e ≡ ⋀_i cl_i.
(2) For each clause cl_i ≐ (e^i_1 ∨ e^i_2 ∨ … ∨ e^i_m), compute r_i ≐ F_P(cl_i) and return F_P(e) ≐ ⋀_i r_i.
To obtain an equivalent CNF form, one cannot introduce auxiliary variables (to keep the size of the resulting formula linear in the size of the input formula), as is typically done during an equisatisfiable CNF translation. These auxiliary variables have to be existentially quantified out to obtain an equivalent formula. In our case, the CNF representation of the formula can be exponentially large compared to the original formula. However, we can obtain the CNF form lazily, by a method proposed by McMillan [McM02].
For the rest of the paper, we focus on computing F_P(⋁_{e_i ∈ E} e_i) when each e_i is a predicate. Unless specified otherwise, we always use e to denote ⋁_{e_i ∈ E} e_i, a disjunction of the predicates in the set E, in the sequel.

Symbolic Decision Procedures (SDP)
We now show how to perform predicate abstraction using symbolic decision procedures. We start by describing a saturation-based decision procedure for a theory T and then use it to describe the meaning of a symbolic decision procedure for the theory T. Finally, we show how a symbolic decision procedure can yield a shared expression of F_P(e) for predicate abstraction.
A set of predicates G (over theory T) is unsatisfiable if the formula ⋀_{g ∈ G} g is unsatisfiable. For a given theory T, the decision procedure for T takes a set of predicates G in the theory and checks if G is unsatisfiable. A theory is defined by a set of inference rules. An inference rule R is of the form
\[ \frac{A_1 \quad A_2 \quad \cdots \quad A_n}{A}\;(R) \]
which denotes that the predicate A can be derived from predicates A_1, …, A_n in one step.
Each theory has at least one inference rule for deriving contradiction (⊥). We also use g :- g_1, …, g_k to denote that the predicate g (or ⊥, where g = ⊥) can be derived from the predicates g_1, …, g_k using one of the inference rules in a single step. Figure 2 describes the inference rules for the theory of Equality and Uninterpreted Functions.
3.1. Saturation-based decision procedures. Consider a simple saturation-based procedure DP_T shown in Figure 3, which takes a set of predicates G as input and returns satisfiable or unsatisfiable.
The algorithm maintains two sets: (i) W is the set of predicates derived from G up to (and including) the current iteration of the loop in step (2); (ii) W′ is the set of all predicates derived before the current iteration. These sets are initialized in step (1). During each iteration of step (2), if a new predicate g can be derived from a set of predicates {g_1, …, g_k} ⊆ W′, then g is added to W. The loop terminates after a bound derivDepth_T(G). In step (3), we check if any subset of facts in W can derive contradiction. If such a subset exists, the algorithm returns unsatisfiable; otherwise it returns satisfiable.
The parameter d ≐ derivDepth_T(G) is a bound (determined solely by the set G for the theory T) such that if the loop in step (2) is repeated for at least d steps, then DP_T(G) returns unsatisfiable if and only if G is unsatisfiable. If such a bound exists for any set of predicates G in the theory, then the DP_T procedure implements a decision procedure for T.
Definition 3.1. A theory T is called a bounded saturation theory if the procedure DP_T described in Figure 3 implements a decision procedure for T.
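The three steps described above can be sketched in Python for a deliberately small fragment: equalities and disequalities between variables, with transitivity as the only derivation rule. This is an illustration under stated assumptions, not the procedure of Figure 3 itself: the tuple encoding of predicates, the helper names eq and ne, and the depth argument (which stands in for derivDepth_T(G)) are all choices of the sketch, and trivial equalities of the form a = a are assumed absent.

```python
from itertools import combinations

def eq(a, b):
    """Equality predicate a = b, stored with its two terms in sorted order."""
    return ("eq",) + tuple(sorted((a, b)))

def ne(a, b):
    """Disequality predicate a != b, stored with its two terms in sorted order."""
    return ("ne",) + tuple(sorted((a, b)))

def dp_equality(G, depth):
    W = set(G)                                   # step (1): initialization
    for _ in range(depth):                       # step (2): saturation loop
        W_prev = set(W)                          # W': facts derived before this iteration
        eqs = [g for g in W_prev if g[0] == "eq"]
        for g1, g2 in combinations(eqs, 2):
            if len(set(g1[1:]) & set(g2[1:])) == 1:
                # transitivity: a = b and b = c give a = c
                a, c = sorted(set(g1[1:]) ^ set(g2[1:]))
                W.add(eq(a, c))
    # step (3): contradiction if some a != b is in W together with a = b
    clash = any(g[0] == "ne" and ("eq",) + g[1:] in W for g in W)
    return "unsatisfiable" if clash else "satisfiable"

chain = [eq("x1", "x2"), eq("x2", "x3"), eq("x3", "x4"), eq("x4", "x5"),
         ne("x1", "x5")]
too_shallow = dp_equality(chain, 1)
deep_enough = dp_equality(chain, 2)
```

Running the loop for too few iterations makes the sketch answer satisfiable on an unsatisfiable input, which is why the bound derivDepth_T(G) matters for correctness.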
In the rest of the paper, we only consider bounded saturation theories. Since there is no ambiguity, we will drop the term "bounded" in the rest of the paper and refer to such a theory as a saturation theory. To show that a theory T is a saturation theory, it suffices to consider a decision procedure algorithm for T (say A_T) and show that DP_T implements A_T. This can be shown by deriving a bound on derivDepth_T(G) for any set G in the theory.
Figure 3: The saturation-based decision procedure DP_T.
We use m ∈ [i, j] to denote i ≤ m ≤ j.
3.2. Symbolic Decision Procedure. For a (saturation) theory T, a symbolic decision procedure for T (SDP_T) takes sets of predicates G and E as inputs, and symbolically simulates DP_T on G′ ∪ Ē for every subset G′ ⊆ G. The output of SDP_T(G, E) is a shared expression representing those subsets G′ ⊆ G for which G′ ∪ Ē is unsatisfiable.
Figure 4 presents the symbolic decision procedure for a theory T, which symbolically executes the saturation-based decision procedure DP_T on all possible subsets of the input component G. Just like the DP_T algorithm, this procedure also has three main components: initialization, saturation and contradiction detection. The algorithm also maintains sets W and W′, as the DP_T algorithm does.
Since SDP_T(G, E) has to execute DP_T(G′ ∪ Ē) on all G′ ⊆ G, the number of steps to iterate the saturation loop equals the maximum derivDepth_T(G′ ∪ Ē) for any G′ ⊆ G. For a set of predicates S, we define the bound maxDerivDepth_T(S) as follows:
\[ \mathit{maxDerivDepth}_T(S) \doteq \max \{\, \mathit{derivDepth}_T(S') \mid S' \subseteq S \,\} \]
During the execution, the algorithm constructs a set of shared expressions with the variables in B_G (one Boolean variable b_g for each predicate g ∈ G) as the leaves and temporary variables t[·] to name intermediate expressions. We use t[(g, i)] to denote the expression for the predicate g after iteration i of the loop in step (2) of the algorithm. We use t[(g, ⊤)] to denote the top-most expression for g in the shared expression. Below, we briefly describe each of the phases of SDP_T:
Initialization [Step (1)]. The leaf for each predicate g ∈ G is the Boolean variable b_g. For any e_i ∈ E, since ¬e_i is present in all possible subsets G′ ∪ Ē, we replace the leaf for ¬e_i with true.
Saturation [Step (2)]. For each predicate g, S(g) is the set of derivations of g from predicates in W′ during any iteration. For any predicate g, we first add all the ways to derive g until the previous steps by adding t[(g, i − 1)] to S(g). Every time g can be derived from some set of facts g_1, …, g_k such that each g_j is in W′, we add this derivation to S(g) in Equation 3.1. At the end of iteration i, t[(g, i)] and t[(g, ⊤)] are updated with the set of derivations in S(g). The loop is executed maxDerivDepth_T(G ∪ Ē) times.
Contradiction [Step (3)]. After the loop, the derivations of ⊥ from the predicates in W are collected into the expression t[e], where the leaves of the expression are the variables in B_G.
Figure 4: The symbolic decision procedure SDP_T(G, E) for a theory T.
The only operations in t[e] are conjunction and disjunction; t[e] is thus a Boolean expression (or a Boolean circuit) over B_G. The internal nodes in the expression are shared and can be inputs to multiple nodes in the subsequent level. We now define the evaluation of a (shared) Boolean expression inductively with respect to a subset G′ ⊆ G.
Definition 3.2. For any Boolean expression t[x] whose leaves are in the set B_G, and a set G′ ⊆ G, we define eval(t[x], G′) as the recursive evaluation of t[x], after replacing each leaf b_g of t[x] with true if g ∈ G′ and with false otherwise. The propositional connectives in the expression (∧ and ∨) are interpreted using their standard meaning.
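The following Python sketch illustrates the three phases and the eval operation of Definition 3.2 for the same small fragment used informally in this section: equalities and disequalities between variables, with transitivity as the only derivation rule. It is a sketch under stated assumptions rather than the algorithm of Figure 4: the Dag class, the tuple encoding of predicates, the helper names, and the use of ⌈lg m⌉ as the loop bound for this fragment are all choices made for the illustration, and trivial equalities a = a are assumed absent. The function agrees_on_all_subsets compares eval on every subset G′ against an independent union-find check.

```python
from itertools import combinations
from math import ceil, log2

def eq(a, b): return ("eq",) + tuple(sorted((a, b)))
def ne(a, b): return ("ne",) + tuple(sorted((a, b)))
def negate(g): return ("ne" if g[0] == "eq" else "eq",) + g[1:]

class Dag:
    """Shared Boolean expression: node id -> (op, payload)."""
    def __init__(self):
        self.nodes = []
    def add(self, op, payload):
        self.nodes.append((op, payload))
        return len(self.nodes) - 1
    def eval(self, root, subset):
        memo = {}                                    # each shared node is evaluated once
        def go(n):
            if n not in memo:
                op, p = self.nodes[n]
                if op == "var":
                    memo[n] = p in subset            # leaf b_g is true iff g is in G'
                elif op == "const":
                    memo[n] = p
                elif op == "and":
                    memo[n] = all(go(c) for c in p)
                else:
                    memo[n] = any(go(c) for c in p)
            return memo[n]
        return go(root)

def sdp_equality(G, E):
    dag = Dag()
    # Initialization: leaf b_g for g in G, constant true for each negated e in E
    t = {g: dag.add("var", g) for g in G}
    t.update({negate(e): dag.add("const", True) for e in E})
    terms = {x for g in t for x in g[1:]}
    depth = max(1, ceil(log2(max(len(terms), 2))))   # assumed bound for this fragment
    for _ in range(depth):                           # Saturation
        prev = t
        S = {g: [n] for g, n in prev.items()}        # keep earlier derivations of g
        eqs = [g for g in prev if g[0] == "eq"]
        for g1, g2 in combinations(eqs, 2):
            if len(set(g1[1:]) & set(g2[1:])) == 1:  # a = b, b = c derive a = c
                a, c = sorted(set(g1[1:]) ^ set(g2[1:]))
                S.setdefault(eq(a, c), []).append(dag.add("and", (prev[g1], prev[g2])))
        t = {g: dag.add("or", tuple(ds)) for g, ds in S.items()}
    # Contradiction: a != b together with a = b
    clashes = [dag.add("and", (t[g], t[negate(g)]))
               for g in t if g[0] == "ne" and negate(g) in t]
    return dag, dag.add("or", tuple(clashes))

def unsat_by_union_find(preds):
    """Independent reference check used only to validate the sketch."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            x = parent[x]
        return x
    for kind, a, b in preds:
        if kind == "eq":
            parent[find(a)] = find(b)
    return any(kind == "ne" and find(a) == find(b) for kind, a, b in preds)

def agrees_on_all_subsets(G, E):
    dag, root = sdp_equality(G, E)
    neg_E = [negate(e) for e in E]
    return all(dag.eval(root, set(sub)) == unsat_by_union_find(list(sub) + neg_E)
               for r in range(len(G) + 1) for sub in combinations(G, r))

G = [eq("a", "b"), eq("b", "c"), eq("c", "d"), eq("a", "d"), ne("b", "d")]
E = [eq("a", "c")]
```

The expression is built once and then evaluated per subset, which is the point of the symbolic procedure: the saturation work is shared across all G′ ⊆ G.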
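The following theorem explains the correctness of the symbolic decision procedure.
Theorem 3.3. If t[e] ≐ SDP_T(G, E), then for any set of predicates G′ ⊆ G, eval(t[e], G′) = true if and only if DP_T(G′ ∪ Ē) returns unsatisfiable.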
To prove Theorem 3.3, we first describe an intermediate lemma about SDP_T. To disambiguate between the data structures used in DP_T and SDP_T, we use W_S and W′_S (corresponding to symbolic) to denote W and W′ respectively for the SDP_T algorithm. Moreover, it is also clear that W′ (respectively W′_S) at iteration i (i > 1) is the same as W (respectively W_S) after i − 1 iterations.
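Lemma 3.4. For any set of predicates G′ ⊆ G, at the end of i iterations (i ≥ 0) of the loop in step (2) of the SDP_T(G, E) and DP_T(G′ ∪ Ē) procedures:
(1) W ⊆ W_S and W′ ⊆ W′_S, and
(2) eval(t[(g, i)], G′) = true if and only if g ∈ W for the DP_T algorithm.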
Proof. We use an induction on i to prove this lemma, starting from i = 0.
For the base case (after step (1) of both algorithms), W = G′ ∪ Ē ⊆ G ∪ Ē = W_S. Moreover, for this step, eval(t[(g, 0)], G′) for a predicate g can be true in two ways.
(1) If g ∈ Ē, then step (1) of SDP_T assigns it to true. Therefore eval(t[(g, 0)], G′) = true for any G′, and g ∈ W after step (1) of the DP_T algorithm.
(2) If g ∈ G′, then eval(t[(g, 0)], G′) = eval(b_g, G′), which is true by the definition of eval(·, ·). Again, g ∈ W after step (1) of the DP_T algorithm too.
Let us assume that the inductive hypothesis holds for all values of i less than m. Consider the iteration number m. It is easy to see that if any fact g is added to W in this step, then g is also added to W_S; therefore part (1) of the lemma is easily established.
To prove part (2) of the lemma, we will consider two cases depending on whether a predicate g was present in W before the m-th iteration:
(1) Let us assume that after m − 1 iterations of the DP_T(G′ ∪ Ē) procedure, g ∈ W. Since g is never removed from W during any step of DP_T, g ∈ W after m iterations too. Now, by the inductive hypothesis, eval(t[(g, m − 1)], G′) = true. However, t[(g, m − 1)] ⇒ t[(g, m)] (because t[(g, m)] contains t[(g, m − 1)] as one of its disjuncts in step 2(c) of the SDP_T algorithm). Therefore, eval(t[(g, m)], G′) = true.
(2) Otherwise, g ∉ W after m − 1 iterations. We have to consider two cases depending on whether g can be derived in DP_T(G′ ∪ Ē) in step m.
(a) If g can't be derived in this step in the DP_T algorithm, then there is no set {g_1, …, g_k} ⊆ W′ (of DP_T) such that g :- g_1, …, g_k. Since W′ is the same as W after m − 1 iterations, we can invoke the induction hypothesis to show that every derivation g :- g_1, …, g_k considered by SDP_T contains a predicate g_j with eval(t[(g_j, m − 1)], G′) = false. Together with eval(t[(g, m − 1)], G′) = false, this gives eval(t[(g, m)], G′) = false.
(b) If g can be derived in this step from some set {g_1, …, g_k} ⊆ W′, then t[(g, m)] contains the conjunction of the expressions t[(g_j, m − 1)] as one of its disjuncts. But for each g_j ∈ {g_1, …, g_k}, eval(t[(g_j, m − 1)], G′) = true and thus eval(t[(g, m)], G′) = true.
This completes the induction proof.
We are now ready to complete the proof of Theorem 3.3.
Proof. Consider the situation where both SDP_T(G, E) and DP_T(G′ ∪ Ē) have executed the loop in step (2) for i = maxDerivDepth_T(G ∪ Ē) iterations. We will consider two cases depending on whether ⊥ can be derived in DP_T(G′ ∪ Ē) in step (3).
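Corollary 3.5. For a set of predicates P, if t[e] ≐ SDP_T(P ∪ P̄, E), then for any P′ ⊆ (P ∪ P̄) representing a minterm over P, eval(t[e], P′) = true if and only if the conjunction of the predicates in P′ implies e.
Hence t[e] is a shared expression for F_P(e), where e denotes ⋁_{e_i ∈ E} e_i. An explicit representation of F_P(e) can be obtained by first computing t[e] ≐ SDP_T(P ∪ P̄, E) and then enumerating the cubes over P that make t[e] true.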
In the following sections, we will instantiate T to be the EUF and DIF theories and show that SDP_T exists for such theories. For each theory, we only need to determine the value of maxDerivDepth_T(G) for any set of predicates G.
Example 3.6. Consider an example where the predicates in G and E ≐ {a = c} are limited to equality and disequality predicates. For this theory T, maxDerivDepth_T(G ∪ Ē) equals lg(m), where m is the number of terms in G ∪ Ē. We do not show this result for the equality theory in this paper, but prove it for the more general theory of difference logic in Section 3.4. Therefore, we need to iterate step (2) of the algorithm in Figure 4 for lg(|{a, b, c, d}|) = 2 steps.
First, a Boolean variable b_g is introduced for each predicate g ∈ G. These variables represent t[(g, 0)] for each g ∈ G. For each e_i ∈ E, we use true to represent t[(¬e_i, 0)]. Then step (2) of the algorithm is repeated for 2 steps. At each step, new derivations are produced from the existing set of predicates at the level. The nodes at each level denote the set W for the particular iteration. Each derivation from two predicates in W is represented as the conjunction of the two predicates (using the diamond connective), and multiple derivations for a predicate (e.g. 3 ways to derive a = c for i = 2) are represented with multiple incoming edges to a node.
Finally, the contradiction inference rule is used to derive contradictions (⊥) at the last level. Since the only way to derive contradiction in this example is using a = c and a ≠ c, this is the only derivation of ⊥. The expression t[e] represents the acyclic graph rooted at ⊥, whose leaves are symbols in B_G. The expression t[e] intuitively represents all the derivations of a = c from G. More precisely, it represents all the subsets of G that are inconsistent with a ≠ c.
There are a couple of observations that one can make from the previous example:
(1) The expression t[e] is a Boolean formula with B_G as inputs and an alternation of AND and OR operations. There are no negations (NOT) in the formula.
(2) Even for this simple example, there are several redundant derivations. For example, consider the node a = b in level i = 2. At this level, a = b can either be derived from a = b or from b = c and a = c, in the previous level. However, the derivation of a = c in level i = 1 already uses a = b (at level i = 0) for one of its derivations. This means that the set of derivations of a = b in level i = 2 contains redundant derivations. These derivations do not affect the correctness of the procedure, but simply increase the size of t[e]. However, as we will see in the next two sections, the size of the graph for t[e] is still (pseudo) polynomially bounded for interesting theories.
Remark 3.7. It may be tempting to terminate the loop in step (2) of SDP_T(G, E) once the set of predicates in W does not change across two iterations. However, this would lead to an incomplete procedure, as the following example demonstrates.
Example 3.8. Consider an example where G contains a set of predicates that denotes an "almost" fully connected graph over vertices x_1, …, x_n. G contains an equality predicate between every pair of variables except the edge between x_1 and x_n. Let E ≐ {x_1 = x_n}. After one iteration of the SDP_T algorithm on this example, W will contain an equality between every pair of variables including x_1 and x_n, since x_1 = x_n can be derived from x_1 = x_i, x_i = x_n, for every 1 < i < n. Therefore, if the SDP_T algorithm terminates once the set of predicates in W stabilizes, the procedure will terminate after two steps. Now, consider the subset G′ ≐ {x_1 = x_2, x_2 = x_3, …, x_{n−1} = x_n} of G. For this subset of G, DP_T(G′ ∪ Ē) requires lg(n) > 1 (for n > 2) steps to derive the fact x_1 = x_n, so an SDP_T that stops after two steps would not simulate DP_T on this subset.

(1) Partition the set of terms in terms(G) into equivalence classes using the G_= predicates. At any point in the algorithm, let EC(t) denote the equivalence class for any term t ∈ terms(G).
(a) Initially, each term belongs to its own distinct equivalence class.
(b) We define a procedure merge(t_1, t_2) that takes two terms as inputs. The procedure first merges the equivalence classes of t_1 and t_2. If there are two terms s ≐ f(s_1, …, s_k) and t ≐ f(t_1, …, t_k) such that EC(s_j) = EC(t_j) for every j ∈ [1, k], then it recursively calls merge(s, t).
(c) For each t_1 = t_2 ∈ G_=, call merge(t_1, t_2).
(2) If there exists a predicate t_1 ≠ t_2 in G_≠ such that EC(t_1) = EC(t_2), then return unsatisfiable; else return satisfiable.
Figure 6: The congruence closure algorithm.

3.3. SDP for Equality and Uninterpreted Functions. The terms in this logic can either be variables or applications of an uninterpreted function symbol to a list of terms. A predicate in this theory is t_1 ∼ t_2, where each t_i is a term and ∼ ∈ {=, ≠}. For a set G of EUF predicates, G_= and G_≠ denote the sets of equality and disequality predicates in G, respectively. Figure 2 describes the inference rules for this theory.
Let terms(φ) denote the set of syntactically distinct terms in an expression (a term or a formula) φ.For example, terms(f (h(x))) is {x, h(x), f (h(x))}.For a set of predicates G, terms(G) denotes the union of the set of terms in any g ∈ G.
A decision procedure for EUF can be obtained by the congruence closure algorithm [NO80], described in Figure 6.
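A compact Python sketch of such a congruence closure check is given below for illustration. It is a textbook-style, quadratic version and makes several assumptions of its own: variables are strings, an application f(t_1, …, t_k) is the tuple ("f", t_1, …, t_k), equalities and disequalities are passed as separate lists of term pairs, and the function names are hypothetical.

```python
def subterms(t):
    """Yield t and all of its subterms."""
    yield t
    if isinstance(t, tuple):
        for arg in t[1:]:
            yield from subterms(arg)

def euf_unsat(eqs, diseqs):
    """Return True if the equalities and disequalities are jointly unsatisfiable."""
    terms = set()
    for s, t in list(eqs) + list(diseqs):
        terms.update(subterms(s))
        terms.update(subterms(t))
    parent = {t: t for t in terms}          # each term starts in its own class
    apps = [t for t in terms if isinstance(t, tuple)]

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    def merge(s, t):
        rs, rt = find(s), find(t)
        if rs == rt:
            return
        parent[rs] = rt                     # merge the two equivalence classes
        # congruence: same function symbol and pairwise equivalent arguments
        for u in apps:
            for v in apps:
                if (u[0] == v[0] and len(u) == len(v) and find(u) != find(v)
                        and all(find(a) == find(b) for a, b in zip(u[1:], v[1:]))):
                    merge(u, v)

    for s, t in eqs:
        merge(s, t)
    return any(find(s) == find(t) for s, t in diseqs)

f = lambda t: ("f", t)
# f(f(f(a))) = a and f(f(f(f(f(a))))) = a together entail f(a) = a
classic = euf_unsat([(f(f(f("a"))), "a"), (f(f(f(f(f("a"))))), "a")],
                    [(f("a"), "a")])
```

Production implementations use union by rank and use lists to avoid the quadratic scan; the sketch favors brevity over efficiency.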
For a set of predicates G, let m ≐ |terms(G)|. We can show that if we iterate the loop in step (2) of DP_T(G) (shown in Figure 3) for at least 3m steps, then DP_T can implement the congruence closure algorithm. More precisely, for two terms t_1 and t_2 in terms(G), the predicate t_1 = t_2 will be derived within 3m iterations of the loop in step (2) of DP_T(G) if and only if EC(t_1) = EC(t_2) after step (1) of the congruence closure algorithm (see proof below).
Proposition 3.9. For a set of EUF predicates G, if m ≐ |terms(G)|, then the value of maxDerivDepth_T(G) for the theory is bounded by 3m.
Proof. We first determine derivDepth_T(G) for any set of predicates in this theory.
Given a set of EUF predicates G, and two terms t_1 and t_2 in terms(G), we need to determine the maximum number of iterations in step (2) of DP_T(G) needed to derive t_1 = t_2. Recall that the congruence closure algorithm (described in Figure 6) is a decision procedure for the theory of EUF. At any point in the algorithm, the terms in G are partitioned into a set of equivalence classes. The operation EC(t_1) = EC(t_2) is used to determine if t_1 and t_2 belong to the same equivalence class.
One way to maintain an equivalence class C ≐ {t_1, …, t_n} is to keep an equality t_i = t_j between every pair of terms in C. At any point in the congruence closure algorithm, the set of equivalence classes corresponds to a set of equalities C_= over terms. Then EC(u) = EC(v) can be implemented by checking if u = v ∈ C_=. Although this is certainly not an efficient representation of equivalence classes, this representation allows us to build SDP_T for this theory.
Let us implement the C′_= ≐ merge(C_=, t_1, t_2) operation that takes in the current set of equivalence classes C_= and two terms t_1 and t_2 that are merged, and returns the set of equalities C′_= denoting the new set of equivalence classes. This can be implemented using step (2) of the DP_T algorithm as follows:
(1) C′_= ← C_= ∪ {t_1 = t_2}.
(2) For every term u ∈ EC(t_1), add the predicate u = t_2 to C′_= by the transitive rule u = t_2 :- u = t_1, t_1 = t_2. Similarly, for every v ∈ EC(t_2), add the predicate v = t_1 to C′_=. All these steps can be performed in one iteration of step (2).
(3) For every u ∈ EC(t_1) and every v ∈ EC(t_2), add the edge u = v to C′_= by either of the two transitive rules (u = v :- u = t_2, t_2 = v or u = v :- u = t_1, t_1 = v).
If there are m distinct terms in G, then there can be at most m merge operations, as each merge reduces the number of equivalence classes by one and there were m equivalence classes at the start of the congruence closure algorithm. Each merge requires three iterations of step (2) of the DP_T algorithm to generate the new equivalence classes. Hence, we will need at most 3m iterations of step (2) of DP_T to derive any fact t_1 = t_2 that is implied by G_=.
Observe that this decision procedure DP_T for EUF does not need to derive a predicate t_1 = t_2 from G if both t_1 and t_2 do not belong to terms(G). Otherwise, if one generates t_1 = t_2, then the infinite sequence of predicates f(t_1) = f(t_2), f(f(t_1)) = f(f(t_2)), … can be generated without ever converging.
Again, since maxDerivDepth_T(G) is the maximum derivDepth_T(G′) for any subset G′ ⊆ G, and any G′ can have at most m terms, maxDerivDepth_T(G) is bounded by 3m. We also believe that a more refined counting argument can reduce it to 2m, because two equivalence classes can be merged simultaneously in the DP_T algorithm.

3.3.1. Complexity of SDP_T. The run time and the size of the expression generated by SDP_T depend both on maxDerivDepth_T(G) for the theory and on the maximum number of predicates in W at any point during the algorithm. The maximum number of predicates in W can be at most m(m − 1)/2, considering an equality between every pair of terms. The disequalities are never used except for generating contradictions. It is also easy to verify that the size of S(g) (used in step (2) of SDP_T) is polynomial in the size of the input. Hence the run time of SDP_T for EUF and the size of the shared expression returned by the procedure are polynomial in the size of the input.
3.4. SDP for Difference Logic. Difference logic is a simple yet useful fragment of linear arithmetic, where predicates are of the form x ⋈ y + c, where x, y are variables, ⋈ ∈ {<, ≤} and c is a real constant. Any equality x = y + c is represented as a conjunction of x ≤ y + c and y ≤ x − c. The variables x and y are interpreted over real numbers. The function symbol "+" and the predicate symbols {<, ≤} are the interpreted symbols of this theory. Figure 7 presents the inference rules for this theory (see footnote 4). Given a set G of difference logic predicates, we can construct a graph where the vertices of the graph are the variables in G and there is a directed edge in the graph from x to y, labeled with (⋈, c), if x ⋈ y + c ∈ G. We will use a predicate and an edge interchangeably in this section.
Figure 7: Inference rules for Difference logic.
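Definition 3.10. A simple cycle x_1 ⋈ x_2 + c_1, x_2 ⋈ x_3 + c_2, …, x_n ⋈ x_1 + c_n (where each variable x_i is distinct) is illegal if the sum of the edges is d = Σ_{i∈[1,n]} c_i and either (i) all the edges in the cycle are ≤ edges and d < 0, or (ii) at least one edge is a < edge and d ≤ 0.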
It is well known [CLR90] that a set of difference predicates G is unsatisfiable if and only if the graph constructed from the predicates has a simple illegal cycle. Alternately, if we add an edge (⋈, c) between x and y for every simple path from x to y of weight c (with ⋈ determined by the labels of the edges in the path), then we only need to check for simple cycles of length two in the resultant graph. This corresponds to the rules (C) and (D) in Figure 7.
For a set of predicates G, a predicate corresponding to a simple path in the graph of G can be derived within lg(m) iterations of step (2) of the DP_T procedure, where m is the number of variables in G (see proof below).
Proposition 3.11. For a set of DIF predicates G, if m is the number of variables in G, then maxDerivDepth_T(G) for the DIF theory is bounded by lg(m).
the original graph of G, then after lg(m) iterations of the loop in step (2), there is a predicate x ⊲⊳′ y + c in W, where c = Σ i∈[1,n−1] c i and ⊲⊳′ is < if at least one of the ⊲⊳ i is <, and ≤ otherwise. This is because if there is a simple path between x and y through edges in G with length (number of edges from G) between 2^(i−1) and 2^i, then the algorithm DP T generates a predicate for the path during iteration i.
However, DP T can produce a predicate x ⊲⊳ y + c, even though none of the simple paths between x and y add up to this predicate. These facts are generated by the non-simple paths that go around cycles one or more times. Consider the set G .= {x < y + 1, y < x − 2, x < z − 1}. In this case we can produce the fact y < z − 3 from y < x − 2, x < z − 1 and then x < z − 2 from y < z − 3, x < y + 1.

⁴ Constraints like x ⊲⊳ c are handled by adding a special variable x0 to denote the constant 0, and rewriting the constraint as x ⊲⊳ x0 + c.
To prove the correctness of the DP T algorithm, we will show that these additional facts can be safely generated. Consider two cases:
• Suppose there is an illegal cycle in the graph. In that case, after lg(m) steps, we will have two facts x ⊲⊳ y + c and y ⊲⊳ x + d in W such that they form an illegal cycle. Thus DP T returns unsatisfiable.
• Suppose there are no illegal cycles in the original graph for G. For simplicity, let us assume that there are only < edges in the graph. A similar argument can be made when ≤ edges are present. In this case, every cycle in the graph has a strictly positive weight. A predicate x ⊲⊳ y + d can be generated from non-simple paths only if there is a predicate x ⊲⊳ y + c ∈ G such that c < d. The predicate x ⊲⊳ y + d cannot be part of an illegal cycle, because otherwise x ⊲⊳ y + c would have to be part of an illegal cycle too. Hence DP T returns satisfiable.
Note that we do not need any inference rule to weaken a predicate, X < Y + D : − X < Y + C, with C < D. This is because we use the predicates generated only to detect illegal cycles.If a predicate x < y + c does not form an illegal cycle, then neither does any weaker predicate x < y + d, where d ≥ c.
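The illegal-cycle test described in this section can be illustrated with a short Python sketch. It is not the DP T procedure itself: the tuple encoding (x, strict, y, c) and the names stronger, is_illegal and diff_unsat are hypothetical, and the sketch keeps only the strongest bound derived so far for each variable pair, a simplification that relies on the observation above that weaker predicates never create new illegal cycles.

```python
from itertools import product

def stronger(a, b):
    """True if bound a = (c, strict) is strictly stronger than bound b."""
    (ca, sa), (cb, sb) = a, b
    return ca < cb or (ca == cb and sa and not sb)

def is_illegal(bound):
    """A self-loop x ⊲⊳ x + c is contradictory if c < 0, or if c == 0 with '<'."""
    c, strict = bound
    return c < 0 or (c == 0 and strict)

def diff_unsat(preds):
    """preds: iterable of (x, strict, y, c), read as x < y + c if strict, else x <= y + c.

    Assumption of this sketch: only the strongest bound per (x, y) pair is kept.
    """
    best, variables = {}, set()
    for x, strict, y, c in preds:
        variables.update((x, y))
        if (x, y) not in best or stronger((c, strict), best[(x, y)]):
            best[(x, y)] = (c, strict)
    # After r rounds every path of at most 2**r edges is covered, so
    # ceil(lg m) rounds reach every simple cycle (at most m edges).
    rounds = max(len(variables) - 1, 0).bit_length()
    for _ in range(rounds):
        derived = dict(best)
        for ((x, y1), (c1, s1)), ((y2, z), (c2, s2)) in product(best.items(), repeat=2):
            if y1 != y2:
                continue
            cand = (c1 + c2, s1 or s2)  # result is '<' if either premise is '<'
            if (x, z) not in derived or stronger(cand, derived[(x, z)]):
                derived[(x, z)] = cand
        best = derived
    # A contradictory self-loop corresponds to an illegal cycle in the graph.
    return any(x == y and is_illegal(b) for (x, y), b in best.items())
```

For instance, the set {x < y + 1, y < x − 2, x < z − 1} is encoded as [("x", True, "y", 1), ("y", True, "x", -2), ("x", True, "z", -1)]; the first two predicates already form a cycle of weight −1, so the sketch reports it as unsatisfiable.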

3.4.1. Complexity of SDP T. Let c max be the absolute value of the largest constant in the set G. We can ignore any derived predicate of the form x ⊲⊳ y + C in the set W where the absolute value of C is greater than (m − 1) * c max. This is because the maximum weight of any simple path between x and y can be at most (m − 1) * c max. Let const(g) be the absolute value of the constant in a predicate g. The weight of any simple path has to be a combination of these weights. Thus, the absolute value of the constant is bounded by: C ≤ min{(m − 1) * c max , Σ g∈G const(g)}. The maximum number of derived predicates in W can be 2 * m^2 * (2 * C + 1), where a predicate can be either ≤ or <, with m^2 possible variable pairs, and the absolute value of the constant is bounded by C. This is a pseudo-polynomial bound as it depends on the value of the constants in the input.
However, many program verification queries use a subset of difference logic where each predicate is of the form x ⊲⊳ y or x ⊲⊳ c.For this case, the maximum number of predicates generated can be 2 * m * (m − 1 + k), where k is the number of different constants in the input.
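The two bounds above are easy to evaluate for a concrete input. The following Python sketch is only an arithmetic illustration; the function names are hypothetical, and the factor (2 * C + 1) is read here as the number of integer constants in [−C, C], which assumes integer-valued constants.

```python
def constant_bound(m, constants):
    """C = min{(m - 1) * c_max, sum of |const(g)| over g in G}."""
    c_max = max(abs(c) for c in constants)
    return min((m - 1) * c_max, sum(abs(c) for c in constants))

def max_derived_general(m, constants):
    """2 * m^2 * (2 * C + 1): two relations, m^2 variable pairs, 2C + 1 constants."""
    C = constant_bound(m, constants)
    return 2 * m**2 * (2 * C + 1)

def max_derived_restricted(m, k):
    """2 * m * (m - 1 + k) for predicates of the form x ⊲⊳ y or x ⊲⊳ c,
    where k is the number of different constants in the input."""
    return 2 * m * (m - 1 + k)
```

With m = 3 variables and constants 1, −2, −1, we get c max = 2 and C = min{4, 4} = 4, so the general bound is 2 * 9 * 9 = 162 predicates; for the restricted fragment with m = 3 and k = 2 the bound is 2 * 3 * 4 = 24.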

Combining SDP for saturation theories
In this section, we provide a method to construct a symbolic decision procedure for the combination of saturation theories T 1 and T 2 , given SDP for T 1 and T 2 .The combination is based on an extension of the Nelson-Oppen (N-O) framework [NO79] that constructs a decision procedure for the theory T 1 ∪ T 2 using the decision procedures of T 1 and T 2 .
We assume that the theories T 1 and T 2 have disjoint signatures (i.e., they do not share any function symbol), and each theory T i is convex and stably infinite⁵. Let us briefly explain the N-O method for combining decision procedures before explaining the method for combining SDP.

4.1. Nelson-Oppen method for Combining Decision Procedures. Given two theories T 1 and T 2 , and the decision procedures DP T 1 and DP T 2 , the N-O framework constructs the decision procedure for T 1 ∪ T 2 , denoted as DP T 1 ∪T 2 .
To decide an input set G, the first step in the procedure is to purify G into sets G 1 and G 2 such that G i only contains symbols from theory T i and G is satisfiable if and only if G 1 ∪ G 2 is satisfiable. Consider a predicate g .= p(t 1 , . . ., t n ) in G, where p is a theory T 1 symbol. The predicate g is purified to g ′ by replacing each subterm t j whose top-level symbol does not belong to T 1 with a fresh variable w j . The expression t j is then purified to t ′ j recursively. We add g ′ to G 1 and the binding predicate w j = t ′ j to the set G 2 . We call the latter a binding predicate because it binds the fresh variable w j to a term t ′ j . Let V sh be the set of shared variables that appear in both G 1 and G 2 . A set of equalities ∆ over variables in V sh is maintained; ∆ records the set of equalities implied by the facts from either theory. Initially, ∆ = {}.
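The purification step can be made concrete with a small Python sketch. The encoding is an assumption made for illustration only: terms are nested tuples (symbol, arg, ...), atoms (variables and constants) are treated as shared, theory_of is a hypothetical map from symbols to theory names, and fresh variables are named w0, w1, and so on.

```python
from itertools import count

def purify(term, home, theory_of, fresh, bindings):
    """Rewrite `term` so that it only uses symbols of theory `home`.

    Alien subterms are replaced by fresh variables; the binding predicate
    w = t' is recorded under the theory that owns the alien subterm.
    """
    if not isinstance(term, tuple):      # atoms are shared (sketch assumption)
        return term
    head, *args = term
    owner = theory_of[head]
    if owner != home:
        w = next(fresh)                  # fresh variable naming the alien subterm
        pure = purify(term, owner, theory_of, fresh, bindings)
        bindings[owner].append(("=", w, pure))
        return w
    return (head, *(purify(a, home, theory_of, fresh, bindings) for a in args))

# Hypothetical example: g is f(x + 1) < y, with "<" and "+" in DIF and "f" in EUF.
theory_of = {"<": "DIF", "+": "DIF", "f": "EUF"}
fresh = (f"w{i}" for i in count())
bindings = {"DIF": [], "EUF": []}
g = ("<", ("f", ("+", "x", 1)), "y")
g_pure = purify(g, "DIF", theory_of, fresh, bindings)
```

In this example g ′ is w0 < y, the EUF side receives the binding predicate w0 = f(w1), and the DIF side receives w1 = x + 1, so w0 and w1 become shared variables.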
Each theory T i then alternately decides if DP T i (G i ∪ ∆) is unsatisfiable. If any theory reports unsatisfiable, the algorithm returns unsatisfiable; otherwise, the theory T i generates the new set of equalities over V sh that are implied by G i ∪ ∆⁶. These equalities are added to ∆ and are communicated to the other theory. This process is continued until the set ∆ does not change. In this case, the method returns satisfiable. Let us denote this algorithm as DP T 1 ∪T 2 .
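The equality-propagation loop just described can be sketched in Python as follows. The interface is an assumption made for illustration: each decision procedure is a callable that takes a set of facts and returns a pair (unsat, implied shared equalities). The toy procedure make_equality_dp, which reasons only about equalities and disequalities between variables, stands in for both theories so that the sketch is self-contained; it is not one of the theories discussed in this paper.

```python
from itertools import combinations

def make_equality_dp(shared):
    """Toy decision procedure over facts ("=", a, b) and ("!=", a, b)."""
    def dp(facts):
        parent = {}
        def find(v):
            parent.setdefault(v, v)
            while parent[v] != v:
                parent[v] = parent[parent[v]]   # path halving
                v = parent[v]
            return v
        for op, a, b in facts:
            if op == "=":
                parent[find(a)] = find(b)
        unsat = any(op == "!=" and find(a) == find(b) for op, a, b in facts)
        implied = {("=", a, b) for a, b in combinations(sorted(shared), 2)
                   if find(a) == find(b)}
        return unsat, implied
    return dp

def nelson_oppen(g1, g2, dp1, dp2):
    """Alternate the two procedures, exchanging implied shared equalities."""
    delta = set()                      # equalities over shared variables
    while True:
        changed = False
        for g, dp in ((g1, dp1), (g2, dp2)):
            unsat, implied = dp(g | delta)
            if unsat:
                return "unsat"
            if not implied <= delta:   # new equalities to communicate
                delta |= implied
                changed = True
        if not changed:                # delta is stable
            return "sat"
```

For example, with G 1 = {x = y} and G 2 = {y = z, x ≠ z}, neither set is unsatisfiable on its own, but once x = y is propagated from the first procedure, the second one reports unsatisfiable.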
There can be at most |V sh | irredundant equalities over V sh , therefore the N-O loop terminates after |V sh | iterations for any input.

4.2. Combining SDP using Nelson-Oppen method. We will briefly describe a method to construct the SDP T 1 ∪T 2 by combining SDP T 1 and SDP T 2 . As before, the input to the method is the pair (G, E) and the output is an expression t[e]. The facts in E are also purified into sets E 1 and E 2 and the new binding predicates are added to either G 1 or G 2 .
Our goal is to symbolically encode the runs of the N-O procedure for G ′ ∪ E, for every G ′ ⊆ G. For any equality predicate δ over V sh , we maintain an expression ψ δ that records all the different ways to derive δ (initialized to false). We also maintain an expression ψ e to record all the derivations of e (initialized to false).
The N-O loop operates just like the case for constructing DP T 1 ∪T 2 .The SDP T i for each theory T i now takes (G i ∪ ∆, E i ) as input, where ∆ is the set of equalities over V sh derived so far.In addition to computing the (shared) expression t[e] as before, SDP T i also returns the expression t[(δ, ⊤)], for each equality δ over V sh that can be derived in step (2) of the SDP T algorithm.
The leaves of the expressions t[e] and t[(δ, ⊤)] are G i ∪ ∆ (since leaves for E i are replaced with true). We substitute the leaves for any δ ∈ ∆ with the expression ψ δ , to incorporate the derivations of δ until this point. We also update ψ δ ← (ψ δ ∨ t[(δ, ⊤)]) to add the new derivations of δ. Similarly, we update ψ e ← (ψ e ∨ t[e]) with the new derivations.
The N-O loop iterates |V sh | times to ensure that it has seen every derivation of a shared equality over V sh from any set G ′ ⊆ G. After the N-O iteration terminates, ψ e contains all the derivations of e from G. However, at this point, there are two kinds of predicates in the leaves of ψ e : the purified predicates and the binding predicates. If g ′ was the purified form of a predicate g ∈ G, we replace the leaf for g ′ with b g . The leaves of the binding predicates are replaced with true, as the fresh variables in these predicates are really names for subterms in any predicate, and thus their presence does not affect the satisfiability of a formula. Let t[e] denote the final expression for ψ e that is returned by SDP T 1 ∪T 2 . Observe that the leaves of t[e] are variables in B G .
Theorem 4.2. For two convex, stably-infinite and signature-disjoint theories T 1 and T
Since the theories of EUF and DIF satisfy all the restrictions placed on the theories in this section, we can construct an SDP for the combined theory that still runs in pseudo-polynomial time.

Implementation and Results
We have implemented a prototype of the symbolic decision procedure for the combination of EUF and DIF theories.To construct F P (e), we first build a BDD (using the CUDD [CUD] BDD package) for the expression t[e] (returned by SDP T (P ∪ P , E )) and then enumerate the cubes from the BDD.
Creating the BDD for the shared expression t[e] and enumerating the cubes from the BDD can have exponential complexity in the worst case. This is because the expression for F P (e) can involve an exponential number of cubes (e.g. the example in Figure 8). However, most problems in practice have only a few cubes in F P (e). Secondly, as the number of leaves of t[e] (equivalently, the number of BDD variables) is bounded by |P |, the size of the overall BDD is usually small, and is computed efficiently in practice. Finally, by generating only the prime implicants⁷ of F P (e) from the BDD, we obtain a compact representation of F P (e).
We report preliminary results evaluating our symbolic decision procedure based predicate abstraction method on a set of software verification benchmarks.The benchmarks are generated from the predicate abstraction step for constructing Boolean Programs from C programs of Microsoft Windows device drivers in SLAM [BMMR01].
We compare our method with two other methods for performing predicate abstraction:
• DP-based: This method uses the decision procedure zapato [BCLZ04] to enumerate the set of cubes that imply e. Various optimizations (e.g. considering cubes in increasing order of size) are used to prevent enumerating an exponential number of cubes in practice.
• UCLID-based: This method performs quantifier elimination using incremental SAT-based methods [LBC03]. The procedure works by first converting the problem into an existential quantifier elimination problem in first-order logic and then reducing it to Boolean quantifier elimination by using an encoding to Boolean logic. Finally, it uses SAT-based methods for performing Boolean quantification.

To compare with the DP-based method, we generated 665 predicate abstraction queries from the verification of device-driver programs. Most of these queries had between 5 and 14 predicates in them and are fairly representative of queries in SLAM. The run time of the DP-based method was 27904 seconds on a 3 GHz machine with 1 GB of memory. The run time of the SDP-based method was 273 seconds. This gives a little more than 100X speedup on these examples, demonstrating that our approach can scale much better than decision procedure based methods. We have not been able to run the UCLID-based method on these particular SLAM benchmarks; the UCLID-based tool is no longer actively maintained, and we had trouble translating these SLAM benchmarks to the input format of UCLID. From our earlier experience of using UCLID on similar benchmarks (Fig. 3 in [LBC03]), we believe that most of these benchmarks can be solved within a few seconds, and the total runtime would not differ by more than 2-3X (in favor of the current technique).

⁷ For any Boolean formula φ over variables in V , the prime implicants of φ are a set of cubes C .= {c1, . . ., cm} over V such that φ ⇔ ⋁ c∈C c and two or more cubes from C can't be combined to form a larger cube.
To compare with the UCLID-based approach, we generated different instances of a problem (see Figure 8 for the example) where P is a set of equality predicates representing n diamonds connected in a chain and e is an equality a1 = dn. We generated different problem instances by varying n. For an instance with n diamonds, there are 5n − 1 predicates in P and 2^n cubes in F P (e) to denote all the paths from a1 to dn. Figure 8 shows the result comparing both the methods. We should note that the UCLID method was run on a slightly slower 2 GHz machine. The results illustrate that our method scales much better than the SAT-based enumeration used in UCLID for this example. Intuitively, the UCLID-based approach grows exponentially with the number of predicates (2^|P|), whereas our approach only grows exponentially with the number of diamonds (2^n) in the result.
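The counts 5n − 1 and 2^n can be checked with a small Python sketch. The layout of a diamond is an assumption here (the figure itself is not reproduced): diamond i is taken to consist of ai = bi, ai = ci, bi = di, ci = di, and consecutive diamonds are linked by di = a(i+1). The function names are hypothetical.

```python
from functools import lru_cache

def diamond_chain(n):
    """Equality predicates (as variable pairs) for n chained diamonds."""
    preds = []
    for i in range(1, n + 1):
        a, b, c, d = (f"{v}{i}" for v in "abcd")
        preds += [(a, b), (a, c), (b, d), (c, d)]   # the two sides of diamond i
        if i < n:
            preds.append((d, f"a{i + 1}"))          # link to the next diamond
    return preds

def count_paths(preds, src, dst):
    """Number of left-to-right paths from src to dst through the predicates."""
    succ = {}
    for u, v in preds:
        succ.setdefault(u, []).append(v)

    @lru_cache(maxsize=None)
    def paths(u):
        if u == dst:
            return 1
        return sum(paths(v) for v in succ.get(u, ()))

    return paths(src)
```

Under this layout, each path from a1 to dn picks one side of every diamond, which gives the 2^n minimal sets of equalities that imply a1 = dn; for n = 3 there are 14 predicates and 8 paths.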

Conclusions and future work
In this paper, we have presented the concept of symbolic decision procedures and shown its use for predicate abstraction. We have provided an algorithm for synthesizing an SDP for any bounded saturation theory. We show that such SDPs exist for interesting theories such as EUF and difference logic. These SDPs construct a shared expression and run with polynomial and pseudo-polynomial complexity, respectively. Finally, we have provided a method for constructing the SDP for simple mixed theories using an extension of the Nelson-Oppen combination framework. Preliminary results comparing it with some of the existing approaches are encouraging.
There are several avenues of future work, some of which are outlined below:
• First, it is interesting to find out how to construct an SDP for other theories, including the theory of linear arithmetic (over rationals). For linear arithmetic, one can perform a "symbolic" Fourier-Motzkin [DE73] elimination procedure to construct an SDP: the inference rule would eliminate a variable from all the predicates in a given level. However, it is not clear how to generate implied equalities from such a procedure to combine the SDP with SDPs for other theories.
• Second, as the example in Figure 5 illustrated, there are a lot of redundant derivations present in the resultant expression. The algorithm will benefit from optimizations that can minimize such redundant derivations.
• Third, extend the combination of SDPs to non-convex theories.

Figure 1 defines the syntax of a quantifier-free fragment of first-order logic. An expression in the logic can either be a term or a formula. A term can either be a variable or an

Figure 3: DP T (G): A simple saturation-based procedure for theory T. We use m ∈ [i, j] to denote i ≤ m ≤ j.
Create the derivations for the goal e as t[e] ← ⋁ d∈S(e) d
(4) Return the shared expression for t[e].

Figure 4: Symbolic decision procedure SDP T (G, E) for theory T. The expression e stands for ⋀ e i ∈E e i .
Figure 5 demonstrates the working of the SDP (G, E) for a simple example. The predicates in G .= {a = b, b = c, a = d, d = c} and E .

Figure 5: Example of SDP, where G .= {a = b, b = c, a = d, d = c} and E .= {a = c}. The diamond connective represents conjunction, and multiple incoming edges to a node represent a disjunction. The node corresponding to the predicate g at level i represents t[(g, i)]. The figure omits several nodes and edges at each level to make the diagram readable.

Figure 6: Simple description of the congruence closure algorithm.

Figure 8: Result on diamond examples with increasing number of diamonds. The expression e is (a1 = dn). A "-" denotes a timeout of 1000 seconds.