Extracting verified decision procedures: DPLL and Resolution

This article is concerned with the application of the program extraction technique to a new class of problems: the synthesis of decision procedures for the classical satisfiability problem that are correct by construction. To this end, we formalize a completeness proof for the DPLL proof system and extract a SAT solver from it. When applied to a propositional formula in conjunctive normal form the program produces either a satisfying assignment or a DPLL derivation showing its unsatisfiability. We use non-computational quantifiers to remove redundant computational content from the extracted program and translate it into Haskell to improve performance. We also prove the equivalence between the resolution proof system and the DPLL proof system with a bound on the size of the resulting resolution proof. This demonstrates that it is possible to capture quantitative information about the extracted program on the proof level. The formalization is carried out in the interactive proof assistant Minlog.


Introduction
In order for verification tools to be used in an industrial context, they have to be trusted to a high degree, and in many cases they are required to be certified. We present a new application of program extraction to develop a formally verified decision procedure for the satisfiability problem for propositional formulae in conjunctive normal form. The procedure is based on the DPLL proof system [17,16], which is also the basis of most contemporary SAT solvers used in an industrial context.
The need for verified SAT solvers is obvious; they are part of safety-critical software, and are also used for the verification and certification thereof. SAT solvers are nowadays highly optimized for speed, which makes the introduction of errors (in the process of optimization) more likely, and their verification more difficult. Besides correctness, totality (or universality) of SAT solvers is also an issue. For example, in the 2012 SAT competition (www.smtcomp.org) many systems were not total, in the sense that they returned "Unknown" for certain inputs, signifying that they could not deal with the given problem.
In this paper we report on the extraction of a SAT solver that is both correct and total by construction. In addition, in the unsatisfiable case it produces a formal proof of this fact, which is recognized in the SAT community as a highly desirable feature of SAT solvers. To be more precise, we formalize a correctness and completeness proof of the DPLL proof system in the interactive theorem prover Minlog, and use Minlog's program extraction facilities to obtain a formally verified SAT solving algorithm. When run on a CNF formula, it produces either a model satisfying the formula or a DPLL derivation showing its unsatisfiability. We also prove the equivalence of DPLL and resolution and extract a program translating DPLL proofs into resolution proofs of smaller or equal size.
Minlog [31,2,4] is an interactive proof assistant based on a first-order natural deduction calculus. It implements various methods of program extraction such as realizability [23] (which can be viewed as a technical rendering of the Curry-Howard correspondence [15,20]) and the Dialectica interpretation. It also extends program extraction to classical proofs via the Friedman/Dragalin A-translation. All these techniques are refined and optimized in order to improve usability and to obtain simpler programs. In addition to extracting a program from a proof, Minlog also automatically extracts a proof that the program meets its specification; see for instance [42] for an overview of program extraction and its underlying theory. A number of substantial case studies on program extraction have been carried out, ranging from the extraction of a normalization-by-evaluation algorithm [3] to the extraction of programs in constructive analysis [41]. Recent developments concentrate on program extraction for induction and coinduction, including applications in the context of exact real number computation [5].
An optimization in Minlog that is particularly important for this paper is the use of so-called non-computational quantifiers, which flag certain information in the proof as computationally irrelevant and therefore allow for the removal of computational redundancy in the extracted program. In the case of the extracted SAT solver, this leads to a significant improvement.
We also applied an automatic translation of Minlog terms into Haskell code to the extracted program and observed a further dramatic improvement in performance. We evaluate the performance of our extracted solver by comparing it 1) with another verified SAT solver, Versat [36], using pigeon hole formulae, and 2) with an industrial tool, SCADE [1], on an example from the railway domain.
An earlier version of this article, containing partial results, was reported at the MFPS 2012 conference [25].

Related Work.
There are several other systems supporting program extraction from proofs for the purpose of producing formally verified programs. An early example is the Nuprl system [13]; other mature interactive theorem provers that implement program extraction are Coq [8], which is based on the Calculus of Inductive Constructions, and Isabelle [35], a generic theorem prover with extensions for many logics (see [7] for code generation and [6] for program extraction from proofs in Isabelle). More recently, other interactive theorem provers based on dependent types [30], such as Agda [11] and Idris [12], have emerged which realize the Curry-Howard correspondence and therefore can also be viewed as supporting program extraction.
The Coq system has been used in several approaches to formalize automatic theorem proving. Lescuyer and Conchon [26] program a SAT solver based on the DPLL algorithm as a recursive function in Coq, and verify its soundness and completeness formally in the system. The solver is then instantiated on the propositional fragment of Coq's logic, creating a user-friendly proof tactic. Similarly, Verma et al. [46] formalize Binary Decision Diagrams in Coq, prove their correctness, and extract certified BDD algorithms in OCaml. The main reason for their formalization was to integrate symbolic model checking in Coq. Significant work has also been performed in Isabelle, with several decision procedures verified and integrated into the system. The DPLL algorithm has been formalized by Marić and Janičić [28]. This approach was extended to formalize a SAT solver including optimizations such as clause learning and the lazy two-watched-literal data structure [27]. The authors investigated automatic code generation, but in the end the verified algorithm was manually translated into C code. The automatic theorem prover Metis [37] is used inside Isabelle to reconstruct proofs from faster external procedures such as the ones used in Sledgehammer [10]. A different direction to deal with the correctness of SAT solvers has been to verify a proof checker for resolution proofs [48]. Such a checker guarantees that the output of a solver for a particular SAT problem is correct.
The DPLL solver Versat [36], mentioned earlier, was formalized and verified in the dependently typed programming language Guru [45] and then translated into imperative C code. This translation is possible because Guru contains mutable arrays. Since Guru allows for the verification of low-level optimizations involving such arrays and Versat implements clause learning, the resulting solver is quite efficient. However, this approach differs from ours in that only soundness has been proven for Versat, whereas our solver can also deliver a proof in the case of unsatisfiability. This means that while every satisfying assignment produced by Versat can be trusted, it is not guaranteed that Versat can solve every solvable problem.
A program extraction project related to ours was carried out by Weich [47], who gave two constructive proofs of the decidability of intuitionistic propositional logic and extracted two different programs that, for a given formula, produce either a derivation in intuitionistic sequent calculus or a Kripke counter-model. The second proof and program extraction were formalized in Minlog for the implicational fragment.
The articles [26,28] verifying a DPLL SAT solver (in Coq and Isabelle, respectively) were the main motivation for our work. Their approaches involve a formalization of the algorithm to be verified. In contrast, we work in a system that does not require any formalization of algorithms. It is enough to prove that each CNF formula is either unsatisfiable or has a model. The desired SAT solving algorithm and its correctness proof are then extracted fully automatically.
(1) A literal l is either a positive variable +v or a negative variable −v, i.e. a variable v with a label + or − attached.
(2) For every literal l we define the opposite literal l̄: the opposite of +v is −v, and the opposite of −v is +v.
(5) A formula in conjunctive normal form (CNF) is a finite conjunction of clauses. By a formula ∆ we will always mean a formula in CNF, and we will identify it with a finite set of clauses {C1, . . ., Ck}, representing the conjunction of the Ci.
(6) A valuation Γ is a finite set of literals to be viewed as their conjunction.
(7) A valuation Γ is consistent if ∀l (l ∈ Γ → l̄ ∉ Γ). We let Cons denote the set of all consistent valuations.
(8) A model is a total function M which maps literals to booleans and satisfies the property ∀l (M l̄ ↔ ¬M l).
We shall use the abbreviations M |= l for M l = true, M |= C for ∃l ∈ C (M |= l), M |= ∆ for ∀C ∈ ∆ (M |= C), and M |= Γ for ∀l ∈ Γ (M |= l). We call a valuation Γ and a formula ∆ compatible if there exists a model satisfying both, i.e. ∃M (M |= Γ ∧ M |= ∆); otherwise Γ and ∆ are called incompatible.
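As a concrete reading of these definitions, the following Haskell sketch encodes literals, clauses, CNF formulae and valuations, together with consistency and the satisfaction relation. It is an illustration only, not code from the Minlog development: the names `Lit`, `opp`, `consistent`, `satClause`, `satFormula` and `satValuation` are hypothetical, variables are assumed to be integers, and a model is represented as a function on variables, so that the required property M l̄ ↔ ¬M l holds by construction instead of being a separate side condition.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Variables are assumed to be integers in this sketch.
type Var = Int

-- A literal is a variable v labelled + or -.
data Lit = Pos Var | Neg Var
  deriving (Show, Eq, Ord)

-- The opposite literal: the opposite of +v is -v and vice versa.
opp :: Lit -> Lit
opp (Pos v) = Neg v
opp (Neg v) = Pos v

type Clause    = Set Lit    -- a clause, read as a disjunction
type Formula   = Set Clause -- a CNF formula, read as a conjunction of clauses
type Valuation = Set Lit    -- a valuation, read as a conjunction of literals

-- A valuation is consistent if it never contains a literal and its opposite.
consistent :: Valuation -> Bool
consistent g = all (\l -> opp l `Set.notMember` g) (Set.toList g)

-- A model assigns a truth value to every variable; literal values are
-- derived from it, so M (opp l) == not (M l) holds automatically.
type Model = Var -> Bool

holds :: Model -> Lit -> Bool
holds m (Pos v) = m v
holds m (Neg v) = not (m v)

satClause :: Model -> Clause -> Bool
satClause m = any (holds m) . Set.toList

satFormula :: Model -> Formula -> Bool
satFormula m = all (satClause m) . Set.toList

satValuation :: Model -> Valuation -> Bool
satValuation m = all (holds m) . Set.toList
```

Loaded into GHCi, `consistent (Set.fromList [Pos 1, Neg 1])` evaluates to `False`, and the empty clause is satisfied by no model.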
A sequent Γ ⊢ ∆ is a pair consisting of a valuation and a formula. The intended meaning of a sequent Γ ⊢ ∆ is that Γ and ∆ are incompatible. As a special case, when Γ is empty, ⊢ ∆ means that ∆ is unsatisfiable. In the following we use the notations Γ, l for Γ ∪ {l} and ∆, C for ∆ ∪ {C}.
Definition 2.2 (DPLL Proof System). The DPLL proof system consists of five rules: Conflict, Elim, Red, Unit and Split.
Several variants of the DPLL proof system have featured in the literature. The above definition is closest to the Coq formalisation [26]; other formalisations such as [28] and [19] combine the Unit, Red and Elim rules into a single rule called the "1-literal rule" or "unit propagation".
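The intended meaning of a sequent can be made executable, if inefficiently, by enumerating all assignments to the finitely many variables occurring in Γ and ∆. The sketch below uses hypothetical names (`compatible`, `sequentValid`) and represents literals, clauses and formulae as plain lists. It is only a specification-level check: it is exponential in the number of variables and serves to make the semantics concrete, not to replace a derivation.

```haskell
import Data.List (nub)

type Var = Int
data Lit = Pos Var | Neg Var deriving (Show, Eq)

type Clause    = [Lit]    -- disjunction of literals
type Formula   = [Clause] -- conjunction of clauses
type Valuation = [Lit]    -- conjunction of literals

var :: Lit -> Var
var (Pos v) = v
var (Neg v) = v

-- An assignment to the variables under consideration.
type Assignment = [(Var, Bool)]

holds :: Assignment -> Lit -> Bool
holds m (Pos v) = lookup v m == Just True
holds m (Neg v) = lookup v m == Just False

-- All assignments to the given variables.
assignments :: [Var] -> [Assignment]
assignments []       = [[]]
assignments (v : vs) = [ (v, b) : rest | b <- [False, True], rest <- assignments vs ]

-- compatible(Gamma, Delta): some model satisfies both. Only the variables of
-- Gamma and Delta matter, so it suffices to range over assignments to them.
compatible :: Valuation -> Formula -> Bool
compatible g d = any ok (assignments vs)
  where
    vs   = nub (map var (g ++ concat d))
    ok m = all (holds m) g && all (any (holds m)) d

-- The intended meaning of the sequent Gamma |- Delta: incompatibility.
sequentValid :: Valuation -> Formula -> Bool
sequentValid g d = not (compatible g d)
```

For example, `sequentValid [] [[Pos 1], [Neg 1]]` evaluates to `True`, matching the reading of ⊢ ∆ as unsatisfiability of ∆.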

Soundness and Completeness of DPLL.
In this section we sketch the formal proof of soundness and completeness of the DPLL proof system.We will be very brief with the Soundness Theorem since its proof does not carry computational content and a similar proof is carried out in [26,28].On the other hand, we will describe the proof of the Completeness Theorem in some detail since we extract our SAT solver from it.
We first reformulate the DPLL proof system as an inductive definition that can be immediately formalized in the Minlog system.The definition has a clause for each rule.We notationally identify a sequent Γ ⊢ ∆ with the statement 'Γ ⊢ ∆ is derivable'.
Remark 3.1. The proof system described in Definition 2.2 has been reformulated for our theorem prover. The set of sequents Γ ⊢ ∆ is defined inductively by the following (universally quantified) inductive clauses: The proof proceeds by structural induction on the given derivation of the sequent Γ ⊢ ∆. We omit further details.
We now turn our attention to the Completeness Theorem for the DPLL proof system. The expected statement of completeness is that for every consistent valuation Γ and every formula ∆, if Γ and ∆ are incompatible then Γ ⊢ ∆ is derivable. A constructive proof of this statement would yield a program that computes a DPLL proof for incompatible Γ, ∆. We reformulate the statement by replacing the implication 'incompatible(Γ, ∆) → Γ ⊢ ∆' with the classically equivalent but constructively stronger disjunction 'compatible(Γ, ∆) ∨ Γ ⊢ ∆'. In this way, we obtain an enhanced program that still computes a DPLL proof for incompatible Γ, ∆, but in addition produces a model if Γ and ∆ are compatible.
Proof. We aim to perform the proof in such a way that an efficient program is extracted. Therefore, we adopt the following strategy:
(1) Since applying the Split rule is the only computationally expensive operation (it is the only rule forcing the proof search to branch), we apply it only if absolutely necessary.
(2) We perform an optimization on the proof level by partitioning the clauses into 'clean' and 'unclean' clauses, where a clause is called clean if we cannot apply Elim, Red or Unit to that clause. This increases the efficiency of the algorithm by reducing the number of comparisons needed.
To this end we show that for all valuations Γ, and formulae ∆, Θ, The proof is by main induction on the measure where and a side induction on |∆| (i.e. the number of clauses in ∆).
Similarly, we can apply the induction hypothesis to (Γ, l), Θ, and ∅ yielding The disjunctions (3.1) and (3.2) result in 4 cases: In the case that Γ, l ⊢ Θ and Γ, l̄ ⊢ Θ hold, the Split rule is applied and we obtain Γ ⊢ Θ. In all other cases we use one of the models obtained from the induction hypotheses.
We perform a case distinction on whether the valuation Γ has a literal in common with C.
We perform a further case distinction on the cardinality of the clause C.
In the case that Γ, l ⊢ ∆′ ∪ Θ holds, we apply the Unit rule, resulting in Γ ⊢ ∆ ∪ Θ. In the other case we have a model of Γ, l and ∆′ ∪ Θ, which clearly also models Γ and ∆ ∪ Θ.
We perform a case distinction on ∃l (l ∈ C ∧ l ∈ Γ) ∨ ¬∃l(l ∈ C ∧ l ∈ Γ).This disjunction can be proven constructively, since the sets involved are finite.
In this case we may move C from ∆ to Θ: Since µ(Γ; ∆′; (Θ, C)) ≤ µ(Γ; (∆′, C); Θ), we can apply the side induction hypothesis to Γ, ∆′, (Θ, C). Since for these values the hypotheses of the theorem are satisfied, we obtain We can prove constructively that in this case Γ and C have some literal l in common. We apply the induction hypothesis to Γ, (∆′, (C \ l)), Θ. Since clearly the measure decreases, and the hypotheses of the theorem are satisfied, we obtain Γ In the first case we apply the Elim rule; in the second case we use the model provided.
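To illustrate the shape of the algorithm such a proof yields, here is a small hand-written Haskell sketch; it is not the program extracted by Minlog. It keeps the unprocessed clauses ∆ apart from the clean clauses Θ, drops satisfied clauses, removes falsified literals, treats unit clauses, and splits on a clean clause only once ∆ is exhausted, in line with the strategy of applying Split only when necessary. As in the proof, after a Unit or Split step the clean clauses are moved back into ∆, since the extended valuation may make them unclean. The type `Proof` is a hypothetical, simplified derivation type that records only conflicts, unit literals and splits, and the order of the checks is our own choice.

```haskell
import Data.List (delete)

type Var = Int

data Lit = Pos Var | Neg Var
  deriving (Show, Eq)

type Clause = [Lit]

opp :: Lit -> Lit
opp (Pos v) = Neg v
opp (Neg v) = Pos v

-- Hypothetical simplified derivation type; Elim and Red steps are not recorded.
data Proof
  = Conflict              -- a clause has been reduced to the empty clause
  | Unit Lit Proof        -- a unit literal was added to the valuation
  | Split Lit Proof Proof -- case split on a literal and on its opposite
  deriving Show

-- solve gamma delta theta: gamma is the current valuation, delta holds the
-- clauses still to be inspected, theta the clean clauses (no literal true
-- or false under gamma, at least two literals). Left gives a valuation
-- satisfying every clause; Right gives a refutation.
solve :: [Lit] -> [Clause] -> [Clause] -> Either [Lit] Proof
solve gamma [] [] = Left gamma
solve gamma [] theta@((l : _) : _) =
  -- Split only once delta is exhausted; theta is rechecked in both branches.
  case (solve (l : gamma) theta [], solve (opp l : gamma) theta []) of
    (Right p, Right q) -> Right (Split l p q)
    (Left m, _)        -> Left m
    (_, Left m)        -> Left m
solve _ [] ([] : _) = Right Conflict -- unreachable: clean clauses are non-empty
solve gamma (c : delta) theta
  | any (`elem` gamma) c = solve gamma delta theta -- satisfied clause: drop it
  | (l : _) <- [k | k <- c, opp k `elem` gamma] =
      solve gamma (delete l c : delta) theta       -- remove a false literal
  | null c = Right Conflict                        -- empty clause: conflict
  | [l] <- c =                                     -- unit clause: extend gamma
      case solve (l : gamma) (delta ++ theta) [] of
        Right p -> Right (Unit l p)
        Left m  -> Left m
  | otherwise = solve gamma delta (c : theta)      -- clean clause: move to theta

dpll :: [Clause] -> Either [Lit] Proof
dpll delta = solve [] delta []

-- Read a model off a returned valuation; unmentioned variables default to False.
modelOf :: [Lit] -> Var -> Bool
modelOf gamma v = Pos v `elem` gamma

holdsIn :: (Var -> Bool) -> Lit -> Bool
holdsIn m (Pos v) = m v
holdsIn m (Neg v) = not (m v)
```

For instance, `dpll [[Pos 1], [Neg 1]]` evaluates to `Right (Unit (Pos 1) Conflict)`. Every Unit or Split step adds a literal over a fresh variable, which is why the recursion terminates.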

3.2. Resolution.
The resolution proof system [39] is widely used in practical applications, for instance in tools for proof checking and debugging [44] or interchange between different solvers [22]. State-of-the-art SAT solvers such as MiniSAT [18] and zChaff [33] return (extended) resolution proofs for unsatisfiable problems. By formalizing that every DPLL derivation has an equivalent resolution derivation, and combining this result with the completeness proof from the previous section, we can extract a SAT solver which produces resolution derivations. The equivalence of DPLL and resolution was first shown by Robinson [40], who translated between the two proof systems using semantic trees.
By enriching the systems with size information we are able to show that the size of the resulting resolution proof does not exceed the size of the original DPLL proof.
For every valuation Γ we define a clause Γ̄ representing its negation by
$$\overline{\{l_1, \dots, l_k\}} = \{\bar{l}_1, \dots, \bar{l}_k\}.$$
We also need a version of the DPLL proof system with added bounds in order to speak about the sizes of the proofs.
Theorem 3.7 (DPLL implies Resolution). For all consistent valuations Γ, CNF formulae ∆ and natural numbers n:
Proof. The proof is an easy induction on DPLL derivations. We only sketch the overall idea. The Conflict and Split rules translate into the Sub and Res rules, respectively. Both of these rules have the same cost as the corresponding DPLL rules, so the size of the derivation does not increase. An application of the Unit rule is a special case of the Res rule in which one of the branches is obtained via a subsumption of a unit clause. The size of the resulting proof is less than or equal to that of the original, since the cost of performing the Sub rule and Res rule together is the same as that of the Unit rule. Finally, both the Elim and Red DPLL rules correspond to a form of weakening in the resolution proof, which comes at no cost; hence the resulting resolution proofs are smaller than the DPLL proofs.
Remark 3.8. One can also easily prove that resolution implies DPLL; more precisely, if ∆ ⊢_Res C, then C̄ ⊢_DPLL ∆. However, as long as the sizes of derivations are measured only in terms of the number of applications of rules (as we do above), no size bound can be given. The reason is that the translation of one instance of the subsumption rule into DPLL requires n applications of the Red rule, where n is the number of literals in C.
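The negation Γ̄ of a valuation and the two resolution rules can be sketched in Haskell as follows. Since the precise rule schemes of Definition 3.4 are not reproduced here, the sketch assumes the textbook forms: subsumption derives any clause C that contains some clause of the formula, and resolution on a literal l derives (C \ {l}) ∪ (D \ {l̄}) from a clause C containing l and a clause D containing l̄. The names `negVal`, `subsumes` and `resolve` are hypothetical.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

type Var = Int
data Lit = Pos Var | Neg Var deriving (Show, Eq, Ord)
type Clause = Set Lit

opp :: Lit -> Lit
opp (Pos v) = Neg v
opp (Neg v) = Pos v

-- The clause representing the negation of a valuation {l1, ..., lk}:
-- the clause {opp l1, ..., opp lk}.
negVal :: Set Lit -> Clause
negVal = Set.map opp

-- Subsumption (assumed form): a clause d of the formula entails every
-- clause c that contains it.
subsumes :: [Clause] -> Clause -> Bool
subsumes delta c = any (`Set.isSubsetOf` c) delta

-- Resolution on l: from c containing l and d containing opp l, derive
-- (c without l) union (d without opp l); Nothing if the rule does not apply.
resolve :: Lit -> Clause -> Clause -> Maybe Clause
resolve l c d
  | l `Set.member` c && opp l `Set.member` d =
      Just (Set.delete l c `Set.union` Set.delete (opp l) d)
  | otherwise = Nothing
```

For example, resolving {+1, +2} and {−1, +3} on +1 yields {+2, +3}, and resolving {+1} with {−1} yields the empty clause.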
The Completeness Theorem for DPLL (Theorem 3.3), adapted to the DPLL system with size information, and Theorem 3.7 (a) immediately imply: Theorem 3.9 (Completeness of the Resolution Proof System).

∀∆ ((∃M
The program extracted from Theorem 3.7 translates DPLL derivations into equivalent resolution derivations. This translator and the SAT solver extracted from the Completeness Theorem for DPLL (Theorem 3.3) are combined, in the program extracted from Theorem 3.9, into a SAT solver that yields resolution refutations for unsatisfiable formulae. Since the computationally hard and interesting part of this program is entirely contained in the DPLL-based SAT solver, we will restrict our attention to the latter when we discuss the extracted programs in detail in Sect. 5.
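The idea behind such a translation can be illustrated with Robinson-style semantic trees [40], rather than with the Minlog proof of Theorem 3.7 itself. In the hypothetical sketch below, a refutation tree splits on variables and each leaf names an input clause falsified by the literals on its branch. Translating bottom-up, each split node becomes at most one resolution step, and a node is skipped when one subtree already yields a clause that does not mention the split variable. Hence the number of resolution steps never exceeds the number of split nodes, in the spirit of the size bound of Theorem 3.7.

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

type Var = Int
data Lit = Pos Var | Neg Var deriving (Show, Eq, Ord)
type Clause = Set Lit

-- Hypothetical semantic tree: Node v t f splits on v (t: v true, f: v false);
-- a Leaf names an input clause falsified by the literals on its branch.
data STree = Leaf Clause | Node Var STree STree

-- Resolution derivation: an input clause, or a resolution step on a
-- variable together with the clause it concludes.
data ResProof = Axiom Clause | Res Var ResProof ResProof Clause
  deriving Show

conclusion :: ResProof -> Clause
conclusion (Axiom c)     = c
conclusion (Res _ _ _ c) = c

-- Invariant: if the branch leading to a subtree is the valuation gamma, the
-- derived clause only contains opposites of literals in gamma.
toRes :: STree -> ResProof
toRes (Leaf c) = Axiom c
toRes (Node v t f)
  | Neg v `Set.notMember` cp = p -- left clause does not use v: reuse it
  | Pos v `Set.notMember` cq = q -- right clause does not use v: reuse it
  | otherwise = Res v p q (Set.delete (Neg v) cp `Set.union` Set.delete (Pos v) cq)
  where
    p  = toRes t
    q  = toRes f
    cp = conclusion p
    cq = conclusion q

resSteps :: ResProof -> Int
resSteps (Axiom _)     = 0
resSteps (Res _ p q _) = 1 + resSteps p + resSteps q

splitNodes :: STree -> Int
splitNodes (Leaf _)     = 0
splitNodes (Node _ t f) = 1 + splitNodes t + splitNodes f
```

At the root the branch valuation is empty, so by the invariant the translated proof of a complete refutation tree concludes the empty clause.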

4.1. Theory.
Program extraction in Minlog is based on modified realizability [23]. We highlight a few aspects that are important for understanding the optimizations we achieved. For a complete and precise description of program extraction we refer to [42]. A formula is said to have computational content if it has at least one occurrence of ∃ or ∨ at a strictly positive position. To every such formula A one assigns a type τ(A) of 'potential realizers'. If the formula has no computational content, one sets τ(A) = ε. From a proof of a formula A with computational content one can extract a program M of type τ(A) that realizes A (written M r A), that is, M solves the computational problem expressed by A. In order to fine-tune the computational content, in particular to remove redundant content, Minlog offers, besides the usual quantifiers ∀ and ∃, the non-computational (nc) quantifiers ∀nc and ∃nc (which roughly correspond to quantification in Prop in Coq). These have the same logical meaning as the usual quantifiers, but indicate that the extracted program does not operate on the quantified variable, only on its realizer. The definitions of the type and the realizability relations for the ordinary universal quantifier contrasted with its nc version are:
$$\tau(\forall x^\rho A) = \rho \Rightarrow \tau(A), \qquad a \mathrel{\mathbf{r}} \forall x^\rho A \;=\; \forall x^\rho\,(a\,x \mathrel{\mathbf{r}} A),$$
$$\tau(\forall^{\mathrm{nc}} x^\rho A) = \tau(A), \qquad a \mathrel{\mathbf{r}} \forall^{\mathrm{nc}} x^\rho A \;=\; \forall x^\rho\,(a \mathrel{\mathbf{r}} A).$$
Similarly for the two versions of the existential quantifier:
$$\tau(\exists x^\rho A) = \rho \times \tau(A), \qquad a \mathrel{\mathbf{r}} \exists x^\rho A \;=\; \pi_1(a) \mathrel{\mathbf{r}} A[\pi_0(a)/x],$$
$$\tau(\exists^{\mathrm{nc}} x^\rho A) = \tau(A), \qquad a \mathrel{\mathbf{r}} \exists^{\mathrm{nc}} x^\rho A \;=\; \exists x^\rho\,(a \mathrel{\mathbf{r}} A).$$
One sees that for the nc-quantifiers the realizers do not depend on the quantified variables. The program extraction procedure respects the different kinds of quantifiers by omitting, in the nc case, any information corresponding to the quantified variable. The proof rules for the nc-quantifiers are subject to stricter variable conditions ensuring that the omitted information is indeed not needed in the extracted program. Minlog is able to automatically detect the maximal set of occurrences of quantifiers in a proof that can be made non-computational without compromising the correctness of the proof [38]. This holds for the logical parts of the proof only; in the formalization of inductive definitions, ∀nc quantifiers have to be placed manually.
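The type assignment τ can be pictured as a function on a small formula syntax. The Haskell sketch below covers the quantifier clauses recalled above plus implication and disjunction; the constructor names, the representation of ε as `Eps`, and the simplifications for ε (ρ ⇒ ε = ε, ε ⇒ σ = σ, ρ × ε = ρ, and a unit type for an ε component of a disjunction) are our assumptions in the spirit of modified realizability, not Minlog's actual implementation.

```haskell
-- Types of potential realizers; Eps stands for the "empty" type epsilon.
data Ty = Eps | Base String | Ty :-> Ty | Ty :*: Ty | Ty :+: Ty
  deriving (Show, Eq)

infixr 5 :->

-- A tiny formula language; quantifiers carry the type of the bound variable.
data Formula
  = Atom String             -- atomic formula, no computational content
  | Imp Formula Formula
  | Or Formula Formula
  | All String Ty Formula   -- ordinary universal quantifier
  | AllNc String Ty Formula -- non-computational universal quantifier
  | Ex String Ty Formula    -- ordinary existential quantifier
  | ExNc String Ty Formula  -- non-computational existential quantifier

-- Function and product types, simplified when a component is Eps.
arrow :: Ty -> Ty -> Ty
arrow _ Eps = Eps
arrow Eps s = s
arrow r s   = r :-> s

pair :: Ty -> Ty -> Ty
pair r Eps = r
pair Eps s = s
pair r s   = r :*: s

-- Inside a disjunction an Eps component is replaced by a unit type
-- (an assumption of this sketch), so a disjunction always has content.
orSide :: Ty -> Ty
orSide Eps = Base "Unit"
orSide t   = t

tau :: Formula -> Ty
tau (Atom _)      = Eps
tau (Imp a b)     = arrow (tau a) (tau b)
tau (Or a b)      = orSide (tau a) :+: orSide (tau b)
tau (All _ r a)   = arrow r (tau a) -- realizer takes x as input
tau (AllNc _ _ a) = tau a           -- realizer does not see x
tau (Ex _ r a)    = pair r (tau a)  -- realizer provides a witness for x
tau (ExNc _ _ a)  = tau a           -- no witness for x is provided
```

With `nat = Base "Nat"`, the formula ∀x ∃y (y > x) gets the realizer type `nat :-> nat`, whereas its variant with ∀nc x gets just `nat`: the realizer can no longer depend on x.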

Extraction to Haskell.
The programs extracted by Minlog are terms in Minlog's internal term language. This has the advantage that extracted programs can be reused for further proofs, and properties of the programs can be formally proven, again inside Minlog. Furthermore, the extracted programs are provably correct, and a (soundness) proof of this fact is automatically generated by Minlog. However, there are also inherent disadvantages: the interoperability of the extracted programs with external libraries or devices is limited, and executing the programs is sometimes slow. For both these reasons, it makes sense to translate the extracted programs into more conventional, general-purpose programming languages. Minlog implements a translation to Haskell (and also a limited translation to Scheme). Extracting to a lazy language such as Haskell makes the treatment of coinduction and corecursion (which is not used in our example) particularly simple [32].
There is a close fit between Haskell and the Minlog term language, and the translation is quite straightforward; basic terms such as variables, lambda abstractions, etc. are translated to the corresponding Haskell terms. Standard algebras such as lists, integers, booleans, sum and product types are translated to their implementations in the Haskell Prelude, while user-defined algebras in general are translated to algebraic data types. Natural numbers are translated to (unbounded) integers for efficiency. Program constants and their computation rules in Minlog correspond to functions defined by pattern matching in Haskell. Some care must be taken, e.g. for the natural numbers: in Minlog, pattern matching on natural numbers is possible, but natural numbers are translated to integers, for which no such pattern matching is available in Haskell. Instead, guard conditions have to be used. Recursion operators, realizing structural induction, are automatically generated as Haskell functions by the translation. Minlog also supports general recursion along a decreasing measure, which makes sure that the program terminates. The Minlog implementation of the general recursion operator ensures that recursive calls are only made on arguments that are smaller than the current argument with respect to the measure:
$$\mathrm{gRec}(x, f) = f\bigl(x,\ \lambda y.\ \mathbf{if}\ \mu(y) < \mu(x)\ \mathbf{then}\ \mathrm{gRec}(y, f)\ \mathbf{else}\ \mathrm{inhab}_\tau\bigr)$$
(inhab_τ is a canonical inhabitant of τ, justified by the fact that all domains are inhabited in the intended, standard semantics). Note that the (potentially expensive) test µ(y) < µ(x) is computationally unnecessary, since at runtime we already know that our extracted program will only make recursive calls on smaller arguments. However, this test is needed because of Minlog's eager evaluation strategy. Omitting the test,
$$\mathrm{gRec}(x, f) = f\bigl(x,\ \lambda y.\ \mathrm{gRec}(y, f)\bigr) \tag{4.1}$$
would make Minlog get stuck in an endless loop, forever evaluating the recursive call gRec(y, f) regardless of whether it is going to be used or not. However, since Haskell is a lazy language, we can safely implement general recursion using (4.1). This can give large efficiency gains in certain situations (see Section 6.1). In a lazy setting, soundness of this variant of the program extraction process can still be proven, and the Haskell translation supports this optimization. However, there is now a discrepancy between Minlog programs and their Haskell translations: if called in a way that does not respect the measure, the Minlog implementation of gRec will halt with an arbitrary value, while the Haskell version may diverge. For this reason, the optimization can be turned on and off with a switch, if identical behavior is important. Of course, every extracted term will respect the measure.
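The two variants of the general recursion operator can be compared directly in Haskell. In this sketch the names `gRecGuarded` and `gRec` are ours, the measure is passed explicitly, and an explicit default value stands in for inhab_τ. Both variants compute Euclid's algorithm correctly, since each recursive call decreases the measure. The step function `badStep` deliberately violates the measure and exposes the discrepancy described above: the guarded version returns the default value, while the unguarded one simply follows the recursion, which in general need not terminate.

```haskell
-- Guarded general recursion: a recursive call on y is only made if
-- mu y < mu x; otherwise the default value stands in for inhab.
gRecGuarded :: (a -> Integer) -> r -> a -> (a -> (a -> r) -> r) -> r
gRecGuarded mu inhab x f =
  f x (\y -> if mu y < mu x then gRecGuarded mu inhab y f else inhab)

-- Unguarded variant (4.1); under lazy evaluation the recursive call is only
-- evaluated when the step function actually uses it.
gRec :: a -> (a -> (a -> r) -> r) -> r
gRec x f = f x (\y -> gRec y f)

-- Euclid's algorithm as a step function; the measure is the second
-- component, which strictly decreases while it is positive.
gcdStep :: (Integer, Integer) -> ((Integer, Integer) -> Integer) -> Integer
gcdStep (a, b) recur = if b == 0 then a else recur (b, a `mod` b)

gcdGuarded, gcdLazy :: Integer -> Integer -> Integer
gcdGuarded a b = gRecGuarded snd 0 (a, b) gcdStep
gcdLazy a b = gRec (a, b) gcdStep

-- A step function that calls itself on a larger argument.
badStep :: Integer -> (Integer -> Integer) -> Integer
badStep n recur = if n > 100 then n else recur (n + 1)
```

Both `gcdGuarded 12 18` and `gcdLazy 12 18` evaluate to 6, whereas `gRecGuarded id 0 5 badStep` gives the default 0 and `gRec 5 badStep` gives 101.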

The Extracted Program
The size of the DPLL formalization is approximately 5500 lines of Minlog code. The extracted program comes to 300 lines of code as a Minlog term and 600 lines of Haskell code. In the following we present two versions of our extracted solver: one optimized with ∀nc quantifiers, which we shall refer to as the ∀nc solver, and the other without these optimizations, which we shall refer to as the ∀ solver.
The ∀ solver takes as input a CNF formula ∆, represented as a list of clauses, and produces either a model of ∆ or a derivation of its unsatisfiability. Models are represented as functions from literals to booleans. An algebraic data type for DPLL derivations is automatically generated from its inductive definition in Minlog. It has five constructors, one for each of the DPLL rules in Definition 3.1; for example, the constructor for the Split rule has argument types Valu For Lit Algdpll Algdpll, and the type derives the classes Show, Read, Eq and Ord. Each constructor takes a formula and a valuation as arguments. The formula itself never changes during the proof; it is only part of the algebra for the purpose of proving correctness and does not play a role in any computation. While the valuation changes during the proof search, these changes can be captured by recording which literal was added by the Unit and Split rules, which makes the valuation redundant as well. We added nc-quantifiers to the definition by hand in order to remove this redundant computational content. The control structure of the program closely follows the structure of the case distinctions and proofs by induction performed in the proof. Lemmas invoked during the proof are extracted separately and called as procedures. Since the proof is by general induction along a measure, the main body of the program uses general recursion along the same measure.
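The effect of the hand-placed nc-quantifiers on the derivation type can be pictured with two hypothetical Haskell types. In `Full`, every node carries the valuation and formula of its sequent, as in the generated type described above; in `Slim`, only the literals added by Unit and Split are kept. The function `leafValuations` shows why nothing computationally relevant is lost: for a well-formed derivation, where each Unit or Split child extends its parent's valuation by the recorded literal, the valuation at every node can be recomputed from the root valuation. Constructor names and arities are our own illustration, not the generated `Algdpll` type.

```haskell
type Var = Int
data Lit = Pos Var | Neg Var deriving (Show, Eq)
type Clause    = [Lit]
type Formula   = [Clause]
type Valuation = [Lit]

opp :: Lit -> Lit
opp (Pos v) = Neg v
opp (Neg v) = Pos v

-- Every node stores its valuation and formula (redundant data).
data Full
  = FConflict Valuation Formula
  | FElim     Valuation Formula Full
  | FRed      Valuation Formula Full
  | FUnit     Valuation Formula Lit Full
  | FSplit    Valuation Formula Lit Full Full
  deriving Show

-- Only the literals added by Unit and Split are kept.
data Slim
  = SConflict
  | SElim Slim
  | SRed Slim
  | SUnit Lit Slim
  | SSplit Lit Slim Slim
  deriving (Show, Eq)

erase :: Full -> Slim
erase (FConflict _ _)    = SConflict
erase (FElim _ _ d)      = SElim (erase d)
erase (FRed _ _ d)       = SRed (erase d)
erase (FUnit _ _ l d)    = SUnit l (erase d)
erase (FSplit _ _ l d e) = SSplit l (erase d) (erase e)

-- Valuations at the conflict leaves, as stored in a Full derivation...
fullLeafValuations :: Full -> [Valuation]
fullLeafValuations (FConflict g _)    = [g]
fullLeafValuations (FElim _ _ d)      = fullLeafValuations d
fullLeafValuations (FRed _ _ d)       = fullLeafValuations d
fullLeafValuations (FUnit _ _ _ d)    = fullLeafValuations d
fullLeafValuations (FSplit _ _ _ d e) = fullLeafValuations d ++ fullLeafValuations e

-- ...and recomputed from a Slim derivation and the root valuation.
leafValuations :: Valuation -> Slim -> [Valuation]
leafValuations g SConflict      = [g]
leafValuations g (SElim d)      = leafValuations g d
leafValuations g (SRed d)       = leafValuations g d
leafValuations g (SUnit l d)    = leafValuations (l : g) d
leafValuations g (SSplit l d e) = leafValuations (l : g) d ++ leafValuations (opp l : g) e
```

In this picture, removing the redundant arguments shrinks every node without changing the shape of the derivation, which is the kind of saving the ∀nc solver benefits from.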

Execution of the Extracted Program
In the following we will see how both the ∀ and ∀nc solvers behave when they are applied to a number of SAT problems. The extracted decision procedure was run on several instances of the pigeon hole principle [14], both in Minlog and as Haskell programs. The pigeon hole principle states that there is no injective function from {1, 2, . . ., n} to {1, 2, . . ., n−1}. The unsatisfiable pigeon hole formulae are harder than the satisfiable formulae, as they have a large search space that must be traversed entirely by the solver in order to construct a derivation. This difficulty can be seen when both the ∀ and ∀nc solvers are applied to the unsatisfiable pigeon hole formulae (compare columns 2 and 3 in Table 1). The solver without the optimization takes considerably longer to construct a derivation of unsatisfiability. This is due to computationally irrelevant data being stored in the unoptimized derivations.
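For reference, pigeon hole formulae PHP(m, n), with m pigeons and n holes, can be generated with the usual encoding: one variable per pigeon/hole pair, one clause per pigeon saying that it sits in some hole, and one binary clause per hole and pair of pigeons forbidding them to share that hole. The sketch below assumes this standard encoding (the exact benchmark files behind Table 1 may differ) and includes a brute-force check that is only usable for tiny instances; `php` and `bruteSat` are hypothetical names.

```haskell
-- Literals over variables numbered from 1.
data Lit = Pos Int | Neg Int deriving (Show, Eq)
type Clause = [Lit]

-- PHP(m, n): m pigeons, n holes; variable p i j means "pigeon i sits in hole j".
php :: Int -> Int -> [Clause]
php m n = atLeastOne ++ atMostOne
  where
    p i j = (i - 1) * n + j
    atLeastOne = [ [ Pos (p i j) | j <- [1 .. n] ] | i <- [1 .. m] ]
    atMostOne  = [ [ Neg (p i j), Neg (p k j) ]
                 | j <- [1 .. n], i <- [1 .. m], k <- [i + 1 .. m] ]

-- Exhaustive satisfiability check over nVars variables (tiny inputs only).
bruteSat :: Int -> [Clause] -> Bool
bruteSat nVars cls = any satisfies (sequence (replicate nVars [False, True]))
  where
    satisfies bs = all (any (holds bs)) cls
    holds bs (Pos v) = bs !! (v - 1)
    holds bs (Neg v) = not (bs !! (v - 1))
```

For example, `length (php 3 2)` is 9: three at-least-one clauses plus three at-most-one clauses for each of the two holes. PHP(3, 2) is unsatisfiable and PHP(2, 2) is satisfiable, matching the unsatisfiable PHP(n + 1, n) and satisfiable PHP(n, n) families.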
The next two columns of Table 1 present two versions of the ∀nc solver when extracted to Haskell and compiled by the Glasgow Haskell Compiler (GHC). The first returns a witness of the result, i.e. either a model which satisfies the formula or a derivation of its unsatisfiability.
The second returns only a Yes or No answer as to whether a formula is satisfiable. Due to the inherent laziness of Haskell, the two programs differ quite dramatically in their behavior: the solver that returns a Yes/No answer performs considerably faster than the solver which also produces the witness. By using the Low Level Virtual Machine (LLVM) backend [24] for GHC, a further speedup was achieved, which can be seen in the last two columns of Table 1.
We also compared the performance of our ∀nc solver, compiled using the LLVM backend of GHC, with that of Versat [36]. Our solver was run with the option of not computing a witness, since Versat generally does not compute a proof. The results in Table 2 show that our solver is comparable with Versat: it is slower on the easier formulae and faster on the hardest pigeon hole formulae. This is because the clause learning optimization of Versat has some overhead and does not increase the performance on pigeon hole formulae. The point of learned clauses is to reduce the search space for the solver; in this case, they instead consume additional memory and time to compute.
6.2. Industrial Case Study.
The same version of our solver was also applied to the verification of a real-world railway control system, which was provided by our industrial partner Invensys Rail (now Siemens) via a description in Ladder logic. We adapted [21] to translate Ladder logic programs into Minlog/Haskell and into the industrial tool SCADE [1], and also performed a comparison with Versat. The SAT problem is formulated to perform falsification checking, as described in [43]; that is, a satisfying assignment represents a counterexample, and an unsatisfiable result means that the safety property cannot be violated in the system. Our case study consists of 14726 clauses over 8166 variables. For comparison, we present the run-times for checking five safety conditions which state that two conflicting routes, out of a set of four routes R1, . . ., R4, cannot be active in the railway at the same time. For each of the five conditions our solver produces a proof certifying that the safety property holds in approximately 7s. The SCADE suite can verify that each of the safety properties holds in less than one second (the system provides no more accurate run-times in this case).
While we cannot expect to compete with an industrial tool on speed and functionality, we have been able to solve a large practical problem in a reasonable amount of time. It is important to note that the solver inside the SCADE suite has not been formally verified, whereas ours has. Interestingly, however, Versat also solves these problems in less than one second (see Table 3 for a comparison between our extracted solver and Versat). We may therefore conclude that optimizations such as clause learning, and the use of efficient data structures that make it possible to efficiently parse a formula and identify its (un)satisfiability, indeed improve the performance for this type of problem, and our extracted solver should be extended by these optimizations as well.
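Falsification checking, as used in the case study, can be illustrated on a toy interlocking. In the hypothetical sketch below, the system is a CNF formula over two route variables, the safety property is "routes 1 and 2 are not both set", and its negation is conjoined to the system: a satisfying assignment of the conjunction is a counterexample, and unsatisfiability means that the property holds. A brute-force search stands in for the extracted solver, and the two-route systems `interlocked` and `faulty` are invented for illustration only.

```haskell
data Lit = Pos Int | Neg Int deriving (Show, Eq)
type Clause = [Lit]

-- All assignments to variables 1..n, as lists of booleans.
assignments :: Int -> [[Bool]]
assignments n = sequence (replicate n [False, True])

holds :: [Bool] -> Lit -> Bool
holds bs (Pos v) = bs !! (v - 1)
holds bs (Neg v) = not (bs !! (v - 1))

-- Search for a counterexample: an assignment satisfying the system
-- together with the negated property. Nothing means the property holds.
counterexample :: Int -> [Clause] -> [Clause] -> Maybe [Bool]
counterexample n system negProperty =
  case filter ok (assignments n) of
    (bs : _) -> Just bs
    []       -> Nothing
  where
    ok bs = all (any (holds bs)) (system ++ negProperty)

-- Variables: 1 = route R1 set, 2 = route R2 set (toy model).
-- Interlocked system: the two routes exclude each other.
interlocked :: [Clause]
interlocked = [[Neg 1, Neg 2]]

-- Faulty system: no constraint between the routes.
faulty :: [Clause]
faulty = []

-- Negated safety property "not (R1 and R2)": both routes are set.
bothSet :: [Clause]
bothSet = [[Pos 1], [Pos 2]]
```

Here `counterexample 2 interlocked bothSet` is `Nothing` (the property holds), while `counterexample 2 faulty bothSet` is `Just [True, True]`, a counterexample with both routes set at once.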

Conclusion
We have presented a conceptually new approach to the synthesis and verification of SAT algorithms that, in contrast to similar work in Coq and Isabelle [26,28], does not require the formalization of the SAT programs in the formal system, but obtains SAT algorithms purely by program extraction. To this end, we formalized the DPLL proof system and performed a constructive proof from which a correct SAT solving algorithm was extracted automatically. The extracted program decides the satisfiability of a propositional formula in conjunctive normal form: if the formula is satisfiable, it produces a model of the formula; otherwise it produces a derivation showing its unsatisfiability. We strategically placed ∀nc quantifiers into the proof to reduce the complexity of the extracted program and increase its performance. The solver containing ∀nc quantifiers was extracted into the functional programming language Haskell, and the performance of the solvers with and without ∀nc quantifiers was evaluated using pigeon hole formulae. We have also shown how it is possible to extract a program that translates between DPLL and resolution proofs. This was done in such a way that we obtain information about quantitative aspects of the extracted program, namely a bound on the size of the resolution proofs it produces. Using this translation, it was possible to extract a resolution solver based on the DPLL proof system.
Overall, our paper shows that the approach of developing verified programs via extraction from proofs scales to non-trivial applications. Furthermore, it demonstrates how to include efficiency considerations into this approach. For instance, we have avoided repeated unnecessary look-ups of clauses by splitting the clause set into two sets ∆ and Θ. This counters the often heard argument that with program extraction one 'loses the grip' on the program and its efficiency. It is important to note that these efficiency considerations do not compromise the correctness of the extracted program, since they are applied at the proof level, where correctness is guaranteed by the proof system.
We consider the fact that our approach does not require any formalization of algorithms a major advantage, since it means that program development via extraction can be carried out in a formal system that is much more lightweight than the one needed in the verification approach, where the term language must include a programming language, and the meaning of the programming constructs must be specified by axioms and proof rules. This advantage is particularly striking in applications in analysis [4,5], where corecursive exact real number algorithms (whose formalization and specification is non-trivial and the subject of ongoing research) can be automatically extracted from proofs involving only coinductive definitions in the form of largest fixed points of predicate transformers.
7.1. Future Work.
There are two directions for further work: applying our method to extract a more advanced class of SAT solvers, and applying our approach to a different class of decision problems.
We are in the process of formalizing optimizations such as clause learning and conflict analysis [9,33,29]. This requires a modification of the DPLL proof system such that it captures the additional behavior. We have proven a completeness theorem for the modified calculus and have extracted a prototype clause learning solver from this proof. In order for this solver to be an improvement on the previous one, we need to lower the computational overhead resulting from clause learning. Such a solver would also benefit from lazy data structures such as the two-watched-literal scheme. It is unclear whether the inherent laziness of Haskell will provide the same effect as these data structures, or whether they would have to be formalized as part of the proof.
It is desirable to be able to solve not just propositional formulae but also first-order formulae. This is possible by extending SAT algorithms so that they can apply some background theory for first-order formulae. Such algorithms are called Satisfiability Modulo Theories (SMT) solvers. We would have to formalize a proof system used by SMT solvers, such as abstract DPLL [34], and then perform a completeness proof. A solver extracted from such a proof system would be able to solve a broader range of problems described in a language richer than propositional logic.
7.2. Sources.
The Minlog formalization optimized with ∀nc quantifiers and its extracted program as Haskell code can be found at http://cs.swan.ac.uk/minlog/dpll/.

(3) We set Var(+v) = Var(−v) = v, Var(L) = {Var(l) | l ∈ L} for a set of literals L, and Var(∆) = ⋃{Var(L) | L ∈ ∆} for a set of sets of literals ∆.
(4) A clause C is a finite set of literals to be viewed as their disjunction.

Definition 3.4 (Resolution Proof System). The derivable resolution sequents Γ ⊢ⁿ_Res C with a derivation of size n are conveniently defined by two rules: subsumption (or axiom) and resolution.

Remark 3.6. The resolution proof system from Definition 3.4 has been reformulated as follows for our theorem prover. The derivable resolution sequents Γ ⊢ⁿ_Res C with a derivation of size n are inductively defined by the following clauses:

Table 1: Performance in Minlog versus Haskell

Table 2: Performance compared to Versat

Comparison of Program Performance. The ∀ solver and ∀nc solver were compared using both unsatisfiable PHP(n + 1, n) and satisfiable PHP(n, n) pigeon hole formulae.

Table 3: Industrial case study: extracted solver versus Versat