Efficient Full Higher-Order Unification

We developed a procedure to enumerate complete sets of higher-order unifiers based on work by Jensen and Pietrzykowski. Our procedure removes many redundant unifiers by carefully restricting the search space and tightly integrating decision procedures for fragments that admit a finite complete set of unifiers. We identify a new such fragment and describe a procedure for computing its unifiers. Our unification procedure, together with new higher-order term indexing data structures, is implemented in the Zipperposition theorem prover. Experimental evaluation shows a clear advantage over Jensen and Pietrzykowski's procedure.


Introduction
Unification is concerned with finding a substitution that makes two terms equal, for some notion of syntactic equality. Since the invention of Robinson's first-order unification algorithm [Rob65], it has become an indispensable tool in theorem proving, logic programming, natural language processing, programming language compilation, and other areas of computer science.
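As a point of reference for the discussion that follows, first-order unification can be sketched in a few lines in Robinson's style, including the occurs check. This is a minimal illustrative sketch; the term encoding (uppercase strings for variables, tuples for applications) and the function names are ours, not the paper's.

```python
# Minimal first-order unification sketch (Robinson-style, with occurs check).
# Terms: variables are strings starting with an uppercase letter;
# applications are tuples (fun_symbol, arg1, ..., argn).

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # Dereference a variable through the substitution.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])

def unify(s, t, subst=None):
    # Returns a most general unifier as a dict, or None on failure.
    subst = {} if subst is None else subst
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if is_var(s):
        return None if occurs(s, t, subst) else {**subst, s: t}
    if is_var(t):
        return unify(t, s, subst)
    if s[0] != t[0] or len(s) != len(t):   # clash of rigid heads
        return None
    for a, b in zip(s[1:], t[1:]):         # decompose arguments pairwise
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

# f(X, g(Y)) unified with f(a, g(X)) yields {X -> a, Y -> a}.
print(unify(("f", "X", ("g", "Y")), ("f", ("a",), ("g", "X"))))
```

The occurs check is what makes this procedure fail finitely on problems like X ?= f X, a behavior the higher-order procedure of this paper recovers through its oracles.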
Many of these applications are based on higher-order formalisms and require higher-order unification. Due to its undecidability and explosiveness, the higher-order unification problem is considered one of the main obstacles on the road to efficient higher-order tools.
One of the reasons for higher-order unification's explosiveness lies in flex-flex pairs, which consist of two variable-headed terms, e.g., F X ?= G a, where F, G, and X are variables and a is a constant. Even this seemingly simple problem has infinitely many incomparable unifiers. One of the first methods designed to combat this explosion is Huet's preunification [Hue75]. Huet noticed that some logical calculi would remain complete if flex-flex pairs are not eagerly solved but postponed as constraints. If only flex-flex constraints remain, we know that a unifier must exist and we do not need to solve them. Huet's preunification has been used in many reasoning tools, including Isabelle [NPW02], Leo-III [SB18], and Satallax [Bro12]. However, recent developments in higher-order theorem proving [BBT+19, BR19] require full unification, i.e., enumeration of unifiers even for flex-flex pairs, which is the focus of this article.
Jensen and Pietrzykowski's (JP) procedure [JP76] is the best known procedure for this purpose (Section 2). Given two terms to unify, it first identifies a position where the terms disagree. Then, in parallel branches of the search tree, it applies suitable substitutions, involving a variable either at the position of disagreement or above it, and repeats this process on the resulting terms until they are equal or trivially nonunifiable.
Building on the JP procedure, we designed a new procedure (Section 3) with the same completeness guarantees (Section 4). The new procedure addresses many of the issues that are detrimental to the performance of the JP procedure. First, the JP procedure does not terminate in many cases of obvious nonunifiability, e.g., for X ?= f X, where X is a non-functional variable and f is a function constant. This example also shows that the JP procedure does not generalize Robinson's first-order procedure gracefully. To address this issue, our procedure detects whether a unification problem belongs to a fragment for which unification is decidable and finite complete sets of unifiers (CSUs) exist. We call algorithms that enumerate elements of the CSU for such fragments oracles. Noteworthy fragments with oracles are first-order terms, patterns [Nip93], functions-as-constructors [LM16], and a new fragment we present in Section 5. The unification procedures of Isabelle and Leo-III check whether the unification problem belongs to a decidable fragment, but we take this idea a step further by checking this more efficiently and for every subproblem arising during unification.
Second, the JP procedure computes many redundant unifiers. Consider the example F (G a) ?= F b, where it produces, in addition to the desired unifiers {F → λx.H} and {G → λx.b}, the redundant unifier {F → λx.H, G → λx.x}. The design of our procedure avoids computing many redundant unifiers, including this one. Additionally, as oracles usually return a small CSU, their integration reduces the number of redundant unifiers.
Third, the JP procedure applies more explosive rules than Huet's preunification procedure to flex-rigid pairs. To gracefully generalize Huet's procedure, we show that his rules for flex-rigid pairs suffice to enumerate CSUs if combined with appropriate rules for flex-flex pairs.
Fourth, the JP procedure repeatedly traverses the parts of the unification problem that have already been unified. Consider the problem f^100 (G a) ?= f^100 (H b), where the exponents denote repeated application. It is easy to see that this problem can be reduced to G a ?= H b. However, the JP procedure will wastefully retraverse the common context f^100[ ] after applying each new substitution. Since the JP procedure must apply substitutions to the variables occurring in the common context above the position of disagreement, it cannot be easily adapted to eagerly decompose unification pairs. By contrast, our procedure is designed to decompose the pairs eagerly, never traversing a common context twice.
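The eager decomposition described above can be sketched as follows. The representation (string heads, with an uppercase initial marking a flex head) is our own simplification; equal rigid heads of equal arity are decomposed, while flex-headed pairs are kept as constraints.

```python
# Sketch of eager decomposition: strip a shared rigid context once, instead of
# retraversing it after every substitution (representation is ours).
# Terms: ("app", head, args) with string heads; variables start uppercase.

def is_flex(t):
    return t[1][:1].isupper()

def decompose(constraints):
    # Replace each pair of rigid terms with equal heads and equal arities
    # by the pairwise constraints on their arguments.
    work, out = list(constraints), []
    while work:
        s, t = work.pop()
        if not is_flex(s) and not is_flex(t) and s[1] == t[1] and len(s[2]) == len(t[2]):
            work.extend(zip(s[2], t[2]))
        else:
            out.append((s, t))
    return out

def f_tower(n, leaf):
    # Build f^n applied to 'leaf'.
    t = leaf
    for _ in range(n):
        t = ("app", "f", [t])
    return t

# f^100 (G a) ?= f^100 (H b) reduces to the single pair G a ?= H b.
lhs = f_tower(100, ("app", "G", [("app", "a", [])]))
rhs = f_tower(100, ("app", "H", [("app", "b", [])]))
print(decompose([(lhs, rhs)]))
```

The common context is traversed exactly once here, which is the behavior the Decompose transition of Section 3 achieves.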
Last, the JP procedure does not allow substitutions to be applied and terms to be β-reduced lazily. The rules of simpler procedures (e.g., first-order [HV09] and pattern unification [Nip93]) depend only on the heads of the unification pair. Thus, to determine the next step, implementations of these procedures need to substitute and β-reduce only until the heads of the current unification pair are not mapped by the substitution and are not λ-abstractions. Since the JP procedure is not based on the decomposition of unification pairs, it is unfit for optimizations of this kind. We designed our procedure to allow for this optimization.
To more efficiently find terms (in a large term set) that are unifiable with a given query term, we developed a higher-order extension of fingerprint indexing [Sch12] (Section 6). We implemented our procedure, several oracles, and the fingerprint index in the Zipperposition prover (Section 7). Since a straightforward implementation of the JP procedure already

The Unification Procedure
To unify two terms s and t, our procedure builds a tree as follows. The nodes of the tree have the form (E, σ), where E is a multiset of unification constraints {s1 ?= t1, . . ., sn ?= tn} and σ is the substitution constructed up to that point. The root node of the tree is ({s ?= t}, id), where id is the identity substitution. The tree is then constructed by applying the transitions listed below. The leaves of the tree are either failure nodes ⊥ or substitutions σ. Ignoring failure nodes, the set of all substitutions in the leaves forms a complete set of unifiers for s and t. More generally, our procedure can be used to unify a multiset E of constraints by making the root of the unification tree (E, id).
The procedure requires an infinite supply of fresh free variables. These fresh variables must be disjoint from the variables occurring in the initial multiset E. Whenever a transition (E, σ) −→ (E', σ') is made, all fresh variables used in σ' are removed from the supply and cannot be used again as fresh variables.
The transitions are parametrized by a mapping P that assigns a set of substitutions to a unification pair; this mapping abstracts the concept of unification rules present in other unification procedures. Moreover, the transitions are parametrized by a selection function S mapping a multiset E of unification constraints to one of those constraints S(E) ∈ E, the selected constraint in E. The transitions, defined below, are only applied if the constraint singled out in the rule is the selected one.
Succeed: (∅, σ) −→ σ; the substitution σ becomes a leaf of the tree.
Normalize αη: replaces the selected constraint s ?= t by an α-renamed and η-expanded variant in which both terms carry the same λ-prefix, where the λ-prefixes of s and t differ.
Normalize β: β-reduces s or t at the head, where s or t is not in hnf.
Dereference: replaces the head F of s or t by σF, where none of the previous transitions apply and F is mapped by σ.
Fail: ({s ?= t} ⊎ E, σ) −→ ⊥, where none of the previous transitions apply, and a and b are different rigid heads of s and t.
Delete: ({s ?= s} ⊎ E, σ) −→ (E, σ), where none of the previous transitions apply.
OracleSucc: ({s ?= t} ⊎ E, σ) −→ (E, ϱσ), where none of the previous transitions apply, some oracle found a finite CSU U for σs ?= σt using fresh auxiliary variables, and ϱ ∈ U; if multiple oracles found a CSU, only one of them is considered.
OracleFail: ({s ?= t} ⊎ E, σ) −→ ⊥, where none of the previous transitions apply, and some oracle determined that σs ?= σt has no solutions.
Decompose: ({λx. a s1 . . . sm ?= λx. a t1 . . . tm} ⊎ E, σ) −→ ({λx. s1 ?= λx. t1, . . ., λx. sm ?= λx. tm} ⊎ E, σ), where none of the transitions Succeed to OracleFail apply.
Bind: ({s ?= t} ⊎ E, σ) −→ ({s ?= t} ⊎ E, ϱσ), where none of the transitions Succeed to OracleFail apply, and ϱ ∈ P(s ?= t).
The transitions are designed so that only OracleSucc, Decompose, and Bind can introduce parallel branches in the constructed tree. OracleSucc can introduce branches using different unifiers of the CSU, Bind can introduce branches using different substitutions in P, and Decompose can be applied in parallel with Bind. The form of the rules OracleSucc and Bind is similar: both extend the current substitution. However, they are designed following different principles. OracleSucc solves the selected unification constraint using an efficient algorithm applicable only to certain classes of terms. On the other hand, Bind is applied to explore the whole search space for any given constraint. These rules are kept separate to make Bind applicable only if OracleSucc (or OracleFail) is not, so that possible solutions (or failures) are detected early.
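The tree construction driven by these transitions can be sketched as a loop over rules tried in a fixed order, which is also how Bind is kept inapplicable whenever an earlier rule fires. This is a highly simplified sketch in our own representation; the toy rule below stands in for Delete only and is not the paper's definition.

```python
# Skeleton of the search-tree construction: a node is (E, sigma); rules are
# tried in a fixed order and may return several children (branching).

def expand(node, rules):
    E, sigma = node
    if not E:
        return ("leaf", sigma)        # Succeed: sigma is one unifier in the CSU
    for rule in rules:
        children = rule(E, sigma)
        if children is not None:
            return ("children", children)
    return ("leaf", None)             # no rule applies: failure node

def delete(E, sigma):
    # Toy stand-in for Delete: drop one trivially solved constraint s ?= s.
    for c in E:
        if c[0] == c[1]:
            rest = list(E)
            rest.remove(c)
            return [(rest, sigma)]
    return None

print(expand(([("a", "a")], {}), [delete]))  # ('children', [([], {})])
print(expand(([], {"F": "t"}), [delete]))    # ('leaf', {'F': 't'})
```

The fixed trial order mirrors the side conditions "where none of the previous transitions apply" in the rules above.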
Our approach is to apply substitutions and αβη-normalize terms lazily. In this context, laziness means that the transitions Normalize αη, Normalize β, and Dereference partially normalize terms and partially apply the constructed substitution just enough to ensure that the heads are the ones we would get if the substitution were fully applied and the term fully normalized. Additionally, the transitions that modify the constructed substitution, OracleSucc and Bind, do not apply that substitution to the unification pairs directly, but only extend it with a new binding. To support lazy dereferencing, these rules must maintain the invariant that all substitutions are idempotent. The invariant is easily preserved if the substitution ϱ from the definition of OracleSucc and Bind is itself idempotent and no variable mapped by σ occurs in ϱF, for any variable F mapped by ϱ.
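Lazy dereferencing can be sketched as follows. The pair representation (head, argument list) and the omission of β-reduction make this a simplification of the actual mechanism; it only shows how idempotence lets the head be exposed cheaply.

```python
# Sketch of lazy dereferencing: expose the head just enough to decide the
# next rule, instead of applying the whole substitution eagerly.
# A term is a pair (head, args); a substitution maps variable heads to terms.

def head_form(term, subst):
    head, args = term
    steps = 0
    while head in subst:
        # Idempotence of subst guarantees the substituted term's head is
        # unmapped, so this loop dereferences at most once per head.
        new_head, extra_args = subst[head]
        head, args = new_head, extra_args + args
        steps += 1
    return (head, args), steps

sigma = {"F": ("g", ["c"])}            # F -> g c (idempotent: g, c unmapped)
print(head_form(("F", ["x"]), sigma))  # (('g', ['c', 'x']), 1)
```

Only the head is rewritten; the arguments are left untouched until a rule actually needs to inspect them.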
The OracleSucc and OracleFail transitions invoke oracles, such as pattern unification, to compute a CSU faster, produce fewer redundant unifiers, and discover nonunifiability earlier.
In some cases, addition of oracles lets the procedure terminate more often.
In the literature, oracles are usually stated under the assumption that their input belongs to the appropriate fragment. To check whether a unification constraint is inside the fragment, we need to fully apply the substitution and β-normalize the constraint. To avoid these expensive operations and enable efficient oracle integration, oracles must be redesigned to lazily discover whether the terms belong to their fragment. Most oracles contain a decomposition operation which requires only a partial application of the substitution and only partial β-normalization. If one of the constraints resulting from decomposition is not in the fragment, the original problem is not in the fragment. This allows us to detect that the problem is not in the fragment without fully applying the substitution and β-normalizing.
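As an illustration of such a syntactic fragment check, the following sketch tests the characteristic condition of Miller patterns, namely that a flex head is applied to pairwise distinct bound variables. The real pattern oracle additionally works lazily and modulo the current substitution; this sketch only shows the core membership test.

```python
# Sketch of a pattern-fragment membership test (simplified; ours, not the
# paper's oracle implementation).

def is_pattern_application(args, bound_vars):
    # In a pattern, the arguments of a flex head must be pairwise distinct
    # bound variables of the enclosing lambda-prefix.
    return all(a in bound_vars for a in args) and len(set(args)) == len(args)

print(is_pattern_application(["x", "y"], {"x", "y", "z"}))  # True
print(is_pattern_application(["x", "x"], {"x", "y"}))       # False: repeated
print(is_pattern_application(["c"], {"x"}))                 # False: not bound
```

A failure of this test on any constraint produced by decomposition suffices to conclude that the whole problem lies outside the fragment.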
The core of the procedure lies in the Bind step, parameterized by the mapping P that determines which substitutions (called bindings) to create. The bindings are defined as follows.
JP-style projection for F: Let F be a free variable of type α1 → · · · → αn → β and let αi = β for some i. Then the JP-style projection for F at i is {F → λx1 . . . xn. xi}, where the bound variables x1, . . ., xn are of appropriate types.
Huet-style projection for F: Let F be a free variable of type α1 → · · · → αn → β, where some αi = γ1 → · · · → γm → β. Then the Huet-style projection for F at i is {F → λx1 . . . xn. xi (F1 x1 . . . xn) . . . (Fm x1 . . . xn)}, where the fresh free variables F1, . . ., Fm and bound variables x1, . . ., xn are of appropriate types.
Imitation of a for F: Let F be a free variable of type α1 → · · · → αn → β and a be a free variable or a constant of type γ1 → · · · → γm → β. Then the imitation of a for F is {F → λx1 . . . xn. a (F1 x1 . . . xn) . . . (Fm x1 . . . xn)}, where the fresh free variables F1, . . ., Fm and bound variables x1, . . ., xn are of appropriate types.
Elimination for F: Let F be a free variable of type α1 → · · · → αn → β and let 1 ≤ j1 < · · · < ji ≤ n. Then the elimination for F at (j1, . . ., ji) is {F → λx1 . . . xn. G xj1 . . . xji}, where the fresh free variable G as well as all xjk are of appropriate type. We call fresh variables emerging from this binding in the role of G elimination variables.
Identification for F and G: Let F and G be different free variables. Furthermore, let the type of F be α1 → · · · → αn → β and the type of G be γ1 → · · · → γm → β, where n, m ≥ 0. Then the identification binding binds F and G with {F → λx1 . . . xn. H x1 . . . xn (F1 x1 . . . xn) . . . (Fm x1 . . . xn), G → λy1 . . . ym. H (G1 y1 . . . ym) . . . (Gn y1 . . . ym) y1 . . . ym}, where the fresh free variables H, F1, . . ., Fm, G1, . . ., Gn and bound variables x1, . . ., xn, y1, . . ., ym are of appropriate types. Fresh variables from this binding in the role of H are called identification variables.
Iteration for F: Let F be a free variable of the type α1 → · · · → αn → β1 and let some αi be the type γ1 → · · · → γm → β2, where n > 0 and m ≥ 0. Then the iteration for F at i is {F → λx1 . . . xn. H x1 . . . xn (λy. xi (G1 x1 . . . xn y) . . . (Gm x1 . . . xn y))}. The free variables H and G1, . . ., Gm are fresh, and y is an arbitrary-length sequence of bound variables of arbitrary types. All new variables are of appropriate type. Due to the indeterminacy of y, this step is infinitely branching.
The following mapping Pc(λx. s ?= λx. t) is used as the parameter P of the procedure:
• If the constraint is flex-rigid, let Pc(λx. F s ?= λx. a t) be an imitation of a for F, if a is a constant, and all Huet-style projections for F, if F is not an identification variable.
• If the constraint is flex-flex and the heads are different, let Pc(λx. F s ?= λx. G t) be all identifications and iterations for both F and G, and all JP-style projections for non-identification variables among F and G.
• If the constraint is flex-flex and the heads are identical, we distinguish two cases: if the head is an elimination variable, Pc(λx. F s ?= λx. F t) = ∅; otherwise, let Pc(λx. F s ?= λx. F t) be all iterations for F at arguments of functional type and all eliminations for F.
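To make the shape of these bindings concrete, the following sketch renders an imitation binding as a string. The fresh-name scheme (F1, F2, . . .) and the rendering are ours and purely illustrative.

```python
# Sketch: render the imitation binding F -> λx1...xn. a (F1 x̄) ... (Fm x̄)
# as a string, for a variable F with n arguments and a head a with m arguments.

def imitation_binding(f_var, n, const, m):
    xs = " ".join(f"x{i}" for i in range(1, n + 1))
    fresh = " ".join(f"({f_var}{j} {xs})" for j in range(1, m + 1))
    body = f"{const} {fresh}".strip()
    return f"{f_var} -> λ{xs}. {body}"

# Imitating a binary head a for a unary variable F:
print(imitation_binding("F", 1, "a", 2))  # F -> λx1. a (F1 x1) (F2 x1)
```

Each fresh variable receives all bound variables of the λ-prefix as arguments, so no instantiation possibilities are lost.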
Comparison with the JP Procedure. The JP procedure enumerates unifiers by constructing a search tree with nodes of the form (s ?= t, σ), where s ?= t is the current unification problem and σ is the substitution built so far. The initial node consists of the input problem and the identity substitution. Success nodes are nodes of the form (s ?= s, σ). The set of all substitutions contained in success nodes forms a CSU.
To determine the child nodes of a node (s ?= t, σ), the procedure computes the common context C of s and t, yielding term pairs (s1, t1), . . ., (sn, tn), called disagreement pairs, such that s = C[s1, . . ., sn] and t = C[t1, . . ., tn]. It chooses one of the disagreement pairs (si, ti). Depending on the context C and the chosen disagreement pair (si, ti), it determines a set of bindings PJP(C, si, ti). For each of the bindings ϱ ∈ PJP(C, si, ti), it creates a child node ((ϱs)↓βη ?= (ϱt)↓βη, ϱσ), where u↓βη denotes a βη-normal form of a term u. The set of bindings PJP(C, si, ti) is based on the heads of si and ti, and on the free variables occurring above si and ti in C. The set PJP(C, si, ti) contains
• all JP-style projections for free variables that are heads of si or ti;¹
• an imitation of a for F if a free variable F is the head of si and a free variable or constant a is the head of ti (or vice versa);
• all eliminations for free variables occurring above the chosen disagreement pair, eliminating only the argument containing the disagreement pair;
• an identification for the heads of si and ti if they are both free variables; and
• all iterations for the heads of si and ti if they are free variables, and for all free variables occurring above the disagreement pair.²
Architecturally, the most noticeable difference between the JP procedure and ours is the representation of the problem: the JP procedure works on a single constraint, while our procedure maintains a multiset of constraints. At first glance, this is a merely presentational change. However, it has consequences for the termination, performance, and redundancy of the procedure.
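The computation of disagreement pairs under a common context can be sketched as follows, in our own term representation: equal heads of equal arity are traversed, and differing subterm pairs are collected.

```python
# Sketch of computing the disagreement pairs of two terms under their common
# context (simplified; representation is ours).
# Terms: (head, args_tuple).

def disagreement_pairs(s, t):
    if s == t:
        return []
    s_head, s_args = s
    t_head, t_args = t
    if s_head == t_head and len(s_args) == len(t_args):
        pairs = []
        for a, b in zip(s_args, t_args):
            pairs += disagreement_pairs(a, b)
        return pairs
    return [(s, t)]

# f (G a) c vs. f (H b) c disagree exactly at the pair (G a, H b).
s = ("f", (("G", (("a", ()),)), ("c", ())))
t = ("f", (("H", (("b", ()),)), ("c", ())))
print(disagreement_pairs(s, t))
```

Since the JP procedure recomputes this traversal after every binding, its cost grows with the depth of the already unified context, which is what the Decompose rule of our procedure avoids.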
Since the JP procedure never decomposes the common context of its only constraint, it allows iteration or elimination to be applied at a free variable above the disagreement pair, even if bindings were already applied below that free variable. This can lead to many different paths to the same unifier. In contrast, our procedure makes the decision of which binding to apply to a flex-flex pair with identical heads as soon as the pair is observed. Also, it explores the possibility of not applying a binding and decomposing the pair instead. Either way, the flex-flex pair is never revisited, which improves performance and yields fewer redundant unifiers. We show that this restriction prunes the search space without influencing completeness.
Our procedure makes the choice of child nodes based only on the heads of the chosen unification constraint. In contrast, the JP procedure tracks all the variables occurring in the common context. Thus, lazy normalization and lazy variable substitution cannot be integrated into the JP procedure in a straightforward fashion. Moreover, as it does not feature a rule similar to Decompose, it always retraverses the already unified part of the problem, resulting in poor performance on deep terms.
One of the main drawbacks of the JP procedure is that it features a highly explosive, infinitely branching iteration rule. This rule is a more general version of Huet-style projection.
1 In JP's formulation of projection, they explicitly mention that the projected argument must be of base type. In our presentation, this follows from β being of base type by the convention introduced in Section 2.
2 In JP's formulation of iteration, it is not immediately obvious whether they intend to require iteration of arguments of base type. However, their Definition 2.4 [JP76] shows that they do.
Its universality enables finding elements of a CSU for flex-flex pairs, for which Huet-style projection does not suffice. However, the JP procedure applies iteration indiscriminately to both flex-flex and flex-rigid pairs. We discovered that our procedure remains complete if iteration is applied only to flex-flex pairs, and Huet-style projection only to flex-rigid ones. This helps our procedure terminate more often than the JP procedure. As a side effect, the restriction of our procedure to the preunification problem is a graceful generalization of Huet's procedure, with additional improvements such as oracles, lazy substitution, and lazy β-reduction.
The bindings of our procedure contain further optimizations that are absent in the JP procedure. The JP procedure applies eliminations for only one parameter at a time, yielding multiple paths to the same unifier. It applies imitations to flex-flex pairs, which we found to be unnecessary. Similarly, we found that tracking which rules introduced which variables can avoid computing redundant unifiers: it is not necessary to apply iterations and eliminations to elimination variables, nor projections to identification variables.
Examples. We present some examples that demonstrate the advantages of our procedure. The displayed branches of the constructed trees are not necessarily exhaustive. We abbreviate JP-style projection as JPProj, imitation as Imit, identification as Id, Decompose as Dc, Dereference as Dr, Normalize β as Nβ, and Bind of a binding ϱ as B(ϱ). Transitions of the JP procedure are denoted by =⇒. For the JP transitions we implicitly apply the generated bindings and fully normalize terms, which significantly shortens JP derivations.
Example 3.1. The JP procedure does not terminate on the problem G ?= f G. By including any oracle that supports the first-order occurs check, such as the pattern oracle or the fixpoint oracle described in Section 7, our procedure detects the nonunifiability and thus gracefully generalizes first-order unification.
Example 3.2. The following derivation illustrates the advantage of the Decompose rule. On the problem h^100 (F a) ?= h^100 (G b), where the exponents denote repeated application, our procedure applies Decompose 100 times and then solves the remaining constraint F a ?= G b, constructing intermediate substitutions σ1 to σ3. The JP procedure produces the same intermediate substitutions σ1 to σ3, but since it does not decompose the terms, it retraverses the common context h^100[ ] at every step to identify the contained disagreement pair.
Example 3.3. Even when no oracles are used, our procedure performs better than the JP procedure on small, simple problems. Consider the problem F a ?= a, which has a two-element CSU: {F → λx.x} and {F → λx.a}. Our procedure terminates, finding both unifiers. The JP procedure finds those two unifiers as well, but it does not terminate, as it keeps applying iterations to F.
Example 3.4. Consider the problem F (G a) ?= F b from the introduction, with the desired unifiers {F → λx.H} and {G → λx.b}. The JP procedure additionally produces the redundant unifier {F → λx.H, G → λx.x}. Moreover, the JP procedure does not terminate because an infinite number of iterations is applicable at the root. Our procedure terminates in this case since we apply the iteration binding only for arguments of non-base type, which F does not have.
Pragmatic Variant. We structured our procedure so that most of the unification machinery is contained in the Bind step. Modifying P, we can sacrifice completeness and obtain a pragmatic variant of the procedure that often performs better in practice. Our preliminary experiments showed that using the mapping Pp defined as follows is a reasonable compromise between completeness and performance:
• If the constraint is flex-rigid, let Pp(λx. F s ?= λx. a t) be an imitation of a for F, if a is a constant, and all Huet-style projections for F, if F is not an identification variable.
• If the constraint is flex-flex and the heads are different, let Pp(λx. F s ?= λx. G t) be an identification binding for F and G, and all Huet-style projections for F if F is not an identification variable.
• If the constraint is flex-flex and the heads are identical, we distinguish two cases: if the head is an elimination variable, Pp(λx. F s ?= λx. F t) = ∅; otherwise, let Pp(λx. F s ?= λx. F t) be the set of all elimination bindings for F.
The pragmatic variant of our procedure removes all iteration bindings to enforce finite branching. Moreover, it imposes limits on the number of bindings applied, counting the applications of bindings locally, per constraint. It is useful to distinguish the Huet-style projection cases where αi is a base type (called simple projection), which always reduce the problem size, from the cases where αi is a functional type (called functional projection). We limit the number of applications of the following bindings: functional projections, eliminations, imitations, and identifications. In addition, a limit on the total number of applied bindings can be set. An elimination binding that removes k arguments counts as k elimination steps. Due to these limits, the pragmatic variant terminates.
To fail as soon as any of the limits is reached, the pragmatic variant employs an additional oracle. If this oracle determines that the limits are reached and the constraint is of the form λx. F sm ?= λx. G tn, it returns a trivial unifier: the substitution {F → λxm.H, G → λxn.H}, where H is a fresh variable. If the limits are reached and the constraint is flex-rigid, the oracle fails; if the limits are not reached, it reports that the terms are outside its fragment. The trivial unifier prevents the procedure from failing on easily unifiable flex-flex pairs.
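The trivial unifier can be sketched as follows; the string rendering and the helper name are ours, purely illustrative.

```python
# Sketch of the trivial unifier {F -> λx1...xm. H, G -> λx1...xn. H} for a
# flex-flex pair F s1...sm ?= G t1...tn: both sides collapse to the fresh
# variable H, ignoring all arguments.

def trivial_unifier(f_var, m, g_var, n):
    def lam(k):
        xs = " ".join(f"x{i}" for i in range(1, k + 1))
        return f"λ{xs}. H" if k else "H"
    return {f_var: lam(m), g_var: lam(n)}

print(trivial_unifier("F", 2, "G", 1))  # {'F': 'λx1 x2. H', 'G': 'λx1. H'}
```

Applying either binding makes both sides of the constraint equal to H (applied to nothing), so the pair is unified at the cost of generality.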
Careful tuning of each limit optimizes the procedure for a specific class of problems. For problems originating from proof assistants, a shallow unification depth usually suffices. However, hard hand-crafted problems often need deeper unification.

Proof of Completeness
Like the JP procedure, our procedure misses no unifiers.
Theorem 4.1. The procedure described in Section 3 parametrized by Pc is complete, meaning that the substitutions on the leaves of the constructed tree form a CSU. More precisely, let E be a multiset of constraints and let V be the supply of fresh variables provided to the procedure. Then for any unifier ϱ of E there exist a derivation (E, id) −→* σ and a substitution θ such that for all free variables X ∉ V, we have ϱX = θσX.
Taking a high-level view, this theorem is proved by incrementally defining states (Ej, σj) and remainder substitutions ϱj, starting with (E0, σ0) = (E, id) and ϱ0 = ϱ. The substitution ϱj is what remains to be added to σj to reach ϱ0. States are defined so that the shape of the selected constraint from Ej and the remainder substitution guide the choice of the applicable transition rule. We employ a measure based on the values of Ej and ϱj that decreases with each application of the rules. Therefore, eventually, we reach the target substitution σ.
In the remainder of this section, we view terms as αβη-equivalence classes, with the η-long β-normal form as their canonical representative. Moreover, we consider all substitutions to be fully applied. These assumptions are justified because all bindings depend only on the heads of terms, and hence replacing the lazy transitions Normalize αη, Normalize β, and Dereference by eager counterparts only affects the efficiency but not the overall behavior of our procedure.
We now give the detailed completeness proof of Theorem 4.1. Our proof is an adaptation of the proof given by Jensen and Pietrzykowski [JP76]. Definitions and lemmas are reused, but they are combined differently to suit our procedure. We start by listing all reused definitions and lemmas from the original JP proof. The "JP" labels in their statements refer to the corresponding lemmas and definitions of the original proof.
Definition 4.2 (JP D1.6). Given two terms t and s and their common context C, we can write t as C[t1, . . ., tn] and s as C[s1, . . ., sn] for some subterms t1, . . ., tn and s1, . . ., sn. The pairs (sj, tj) are called disagreement pairs.
Definition 4.3 (JP D3.1). Given two terms t and s, let λx.t' and λy.s' be terms α-equivalent to them, respectively, such that the parameters x and y are disjoint. Then the disagreement pairs of t' and s' are called opponent pairs in t and s.
Lemma 4.4 (JP L3.3 (1)). Let ϱ be a substitution and X, Y be free variables such that ϱ(X s) = ϱ(Y t) for some term tuples s and t. Then for every opponent pair (u, v) in ϱX and ϱY (Definition 4.3), the head of u or v is a parameter of ϱX or ϱY.
In contrast to applied constants, applied variables should not be eagerly decomposed. For a constant f, if f s ?= f t has a unifier, that unifier must clearly also unify si ?= ti for each i. For a free variable X, a unifier of X s ?= X t does not necessarily unify si ?= ti. The concept of ω-simplicity is a criterion on unifiers that captures some of the cases where eager decomposition is possible. Non-ω-simplicity, on the other hand, is the main trigger of iteration, the most explosive binding of our procedure.
Definition 4.5 (JP D3.2). An occurrence of a parameter x of a term t in the body of t is ω-simple if both (1) the arguments of x are distinct and are exactly (the η-long forms of) all of the variables bound in the body of t, and (2) this occurrence of x is not in an argument of any parameter of t.
This definition is slightly too restrictive for our purposes. It is unfortunate that condition 1 requires x to be applied to all instead of just some of the bound variables. The JP proof would probably work with such a relaxation, and the definition would then cover all cases where eager decomposition is possible. However, to reuse the JP lemmas, we stick to the original notion of ω-simplicity and introduce the following relaxation:
Definition 4.6. An occurrence of a parameter x of a term t in the body of t is base-simple if it is ω-simple, or if both (1) x is of base type, and (2) this occurrence of x is not in an argument of any parameter of t.
Lemma 4.7. Let s have parameters x and a subterm xj v, where this occurrence of xj is base-simple. Then for any sequence t of (at least j) terms, the body of tj is a subterm of s t (after normalization) at the position of xj v, up to renaming of the parameters of tj. To compare positions of s and s t, ignore the parameter count mismatch.
Proof. Consider the process of β-normalizing s t. After substituting the terms t into the body of s, a further reduction can only take place when some tk is an abstraction that receives arguments in s. The arguments v of xj are distinct variables bound in the body of s. This follows easily from either case of the definition of base-simplicity. So tj is applied to the unmodified v after substituting the terms t into the body of s. Base-simplicity also implies that tj v does not occur in an argument of another tk. Hence only the reduction of tj v itself affects this subterm. The variables v match the parameter count of tj because we consider the η-long form of tj; so tj v reduces to the body of tj (modulo renaming). The position is obviously that of xj v.
Lemma 4.8 (JP C3.4 strengthened). Let ϱ be a substitution and X a free variable. If ϱ(λx.X s) = ϱ(λx.X t) and some occurrence of the ith parameter of ϱX is base-simple, then ϱ(λx.si) = ϱ(λx.ti).
Proof. By Lemma 4.7, ϱ(λx.si) occurs in ϱ(λx.X s) at a certain position that depends only on ϱX. Similarly, ϱ(λx.ti) occurs in ϱ(λx.X t) = ϱ(λx.X s) at the same position, and hence ϱ(λx.si) = ϱ(λx.ti).
We define more properties to determine which binding to apply to a given constraint. Roughly speaking, the simple comparison form will trigger identification bindings, projectivity will trigger Huet-style projections, and simple projectivity will trigger JP-style projections.
Definition 4.9 (JP D3.4). We say that s and t are in simple comparison form if all ω-simple heads of opponent pairs in s and t are distinct, and each opponent pair has an ω-simple head.
Definition 4.10 (JP D3.5). A term t is called projective if the head of t is a parameter of t. If the whole body is just the parameter, then t is called simply projective.
A central part of the proof is to find a suitable measure of the remaining problem size. Showing that the measure is strictly decreasing and well-founded guarantees that the procedure finds a suitable substitution in finitely many steps. We reuse the measure for remainder substitutions from JP [JP76], but embed it into a lexicographic measure to handle the decomposition steps and oracles of our procedure.
Definition 4.11 (JP D3.7). The free weight of a term t is the total number of occurrences of free variables and constants in t. The bound weight of t is the total number of occurrences (excluding binding occurrences λx) of bound variables in t, but with the particular exemption: if a prefix variable u has one or more ω-simple occurrences in the body, then one such occurrence and its arguments are not counted. It does not matter which occurrence is not counted because in η-long form the bound weight of the arguments of an ω-simple variable is the same for all occurrences of that variable.
Definition 4.12 (JP D3.8). For multisets E of unification constraints and substitutions ϱ, our measure on pairs (E, ϱ) is the lexicographic comparison of
A: the sum of the sizes of the terms in E;
B: the sum of the free weights of ϱF, for all variables F mapped by ϱ;
C: the sum of the bound weights of ϱF, for all variables F mapped by ϱ;
D: the sum of the numbers of parameters of ϱF, for all variables F mapped by ϱ.
We denote the quadruple containing these numbers by ord(E, ϱ). We denote the triple containing only the last three components of ord(E, ϱ) by ord ϱ. We write < for the lexicographic comparison of these tuples.
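The lexicographic comparison of Definition 4.12 behaves as in the following sketch; the weight functions A to D themselves are elided, and only the componentwise, left-to-right comparison is illustrated.

```python
# Sketch of the lexicographic measure comparison: quadruples (A, B, C, D)
# are compared componentwise, left to right.

def ord_less(p, q):
    # Python tuples already compare lexicographically.
    return p < q

# A smaller constraint-size sum (component A) dominates everything else:
print(ord_less((3, 9, 9, 9), (4, 0, 0, 0)))  # True
# With equal A, the free weight B decides, and so on:
print(ord_less((4, 1, 5, 5), (4, 2, 0, 0)))  # True
```

Well-foundedness follows because each component is a natural number, so no infinite strictly decreasing sequence of tuples exists.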
The next six lemmas correspond to the bindings of our procedure and give sufficient conditions for each binding to bring us closer to a given solution. This is expressed as a decrease of the ord measure of the remainder. In each of these lemmas, let u be a term with a variable head a and v a term with an arbitrary head b. Let ϱ be a unifier of u and v. The conclusion, let us call it C(ϱ), is always the same: there exists a binding δ applicable to the problem u ?= v and a substitution ϱ' such that ord ϱ' < ord ϱ, and ϱX = ϱ'δX for all variables X except the fresh variables introduced by the binding. For most of these lemmas, we refer to JP [JP76] for proofs. Although JP only claim ϱX = ϱ'δX for variables X mapped by ϱ, inspection of their proofs shows that the equality holds for all X except the fresh variables introduced by the binding. Moreover, some of our bindings have more preconditions, yielding additional orthogonal hypotheses in our lemmas, which we address below.
Lemma 4.13 (JP L3.9). If a = b, a is not an elimination variable, and ϱa discards any of its parameters, then C(ϱ) by elimination. Moreover, for the elimination variable G introduced by this elimination, ϱ'G discards none of its parameters and has the body of ϱa.
Proof. Let σa = λx̄.t and let (x_{j_k})_{k=1}^{i} be the subsequence of x̄ consisting of those variables that occur in the body t. It is a strict subsequence, since σa is assumed to discard some parameter. Since the equal heads a = b of the constraint u ?= v are not elimination variables, elimination for (j_k)_{k=1}^{i} can be applied. Let δ = {a → λx̄.G x_{j_1} . . . x_{j_i}} be the corresponding binding. Define σ′ to be like σ except that σ′a = a and σ′G = λx_{j_1} . . . x_{j_i}. t.
Obviously σ′G is a closed term and σX = σ′δX holds for all X ≠ G. Moreover, ord σ′ < ord σ, because the free and bound weights stay the same (σa and σ′G have the same body t) whereas the number of parameters strictly decreases. The definition of (j_k)_{k=1}^{i} implies that σ′G discards none of its parameters.
Lemma 4.14 (JP L3.10). Assume that there exists a parameter x of σa such that x has a non-ω-simple (Definition 4.5) occurrence in σa that is not below another parameter, or such that x has at least two ω-simple occurrences in σa. Moreover, if a = b, to make iteration applicable, a must not be an elimination variable and x must be of functional type. Then C is achieved by iteration.
Lemma 4.15 (JP L3.11). Assume that a and b are different free variables. If σa is simply projective (Definition 4.10) and a is not an identification variable, then C by JP-style projection.
Lemma 4.16 (JP L3.12). If σa is not projective and b is rigid, then C by imitation.

Lemma 4.17 (JP L3.13). Let a ≠ b. Assume that σa ≠ a and σb ≠ b, that σa and σb are in simple comparison form (Definition 4.9), and that neither is projective. Then C by identification. Moreover, σ′H is not projective, where H is the identification variable introduced by this application of the identification binding.
Proof. This is JP's Lemma 3.13, plus the claim that σ′H is not projective. Inspecting the proof of that lemma, it is obvious that σ′H cannot be projective because σa and σb are not projective.
Lemma 4.18. Assume that σa is projective (Definition 4.10), a is not an identification variable, and b is rigid. Then C by Huet-style projection.
Proof. Since σa is projective, we have σa = λx̄_n. x_k t̄_m for some k and some terms t̄_m. If σa is also simply projective, then x_k must be non-functional, and since Huet-style projection and JP-style projection coincide in that case, Lemma 4.15 applies. Hence, in the following we may assume that σa is not simply projective, i.e., that m > 0.
Let δ = {a → λx̄_n. x_k (F_1 x̄_n) . . . (F_m x̄_n)} be the Huet-style projection binding, for fresh variables F_1, . . ., F_m. This binding is applicable because b is rigid. Let σ′ be the same as σ except that we set σ′a = a and, for each 1 ≤ j ≤ m, σ′F_j = λx̄_n. t_j. It remains to show that ord σ′ < ord σ. The free weight of σa is the same as the sum of the free weights of σ′F_j for 1 ≤ j ≤ m. Thus, the free weight is the same for σ and σ′. The bound weight of σa, however, is exactly 1 larger than the sum of the bound weights of σ′F_j for 1 ≤ j ≤ m, because of the additional occurrence of x_k in σa. The exemption for ω-simple occurrences in the definition of the bound weight cannot be triggered by this occurrence of x_k because m > 0 and thus x_k is not ω-simple. It follows that ord σ′ < ord σ.
We are now ready to prove the completeness theorem (Theorem 4.1).
Proof. Let E be a multiset of constraints and let V be the supply of fresh variables provided to our procedure. Let ϱ be a unifier of E. We must show that there exists a derivation (E, id) −→* σ and a substitution θ such that for all free variables X ∉ V, we have ϱX = θσX.
Let E_0 = E and σ_0 = id. Let ϱ_0 = τϱ for some renaming τ such that every free variable occurring in ϱ_0 E_0 does not occur in E_0 and is not contained in V. Then ϱ_0 unifies E_0 because ϱ unifies E by assumption. Moreover, ϱ_0 = ϱ_0 σ_0. We proceed to inductively define E_j, σ_j, and ϱ_j until we reach some j such that E_j = ∅. To guarantee well-foundedness, we ensure that the measure ord(E_j, ϱ_j) decreases with each step. We maintain the following invariants for all j:
• ϱ_0 X = ϱ_j σ_j X for all free variables X ∉ V;
• every free variable occurring in ϱ_j E_j does not occur in E_j and is not contained in V;
• for every identification variable X, ϱ_j X is not projective; and
• for every elimination variable X, each parameter of ϱ_j X has occurrences in ϱ_j X, all of which are base-simple.
If E_j ≠ ∅, let u ?= v be the selected constraint S(E_j) in E_j. First assume that an oracle is able to find a CSU for the constraint u ?= v. Since ϱ_j unifies u and v, by the definition of a CSU, the CSU discovered by the oracle contains a unifier δ of u and v such that there exists ϱ_{j+1} with ϱ_j X = ϱ_{j+1} δX for all free variables X except the auxiliary variables of the CSU. Thus, an OracleSucc transition is applicable and yields the node (E_{j+1}, σ_{j+1}) = (δ(E_j \ {u ?= v}), δσ_j). Therefore we have a strict containment ϱ_{j+1} E_{j+1} ⊂ ϱ_{j+1} δE_j = ϱ_j E_j. This implies ord(E_{j+1}, ϱ_{j+1}) < ord(E_j, ϱ_j). It also shows that the constraints ϱ_{j+1} E_{j+1} are unified when ϱ_j E_j are. Since the auxiliary variables introduced by OracleSucc are fresh, they can occur neither in E_j nor in σ_j X for any X ∉ V. Hence, we have ϱ_0 X = ϱ_j σ_j X = ϱ_{j+1} δσ_j X = ϱ_{j+1} σ_{j+1} X for all free variables X ∉ V. Any free variable occurring in ϱ_{j+1} E_{j+1} does not occur in E_{j+1} and is not contained in V, because ϱ_{j+1} E_{j+1} ⊂ ϱ_j E_j and the variables in E_{j+1} = δ(E_j \ {u ?= v}) are either variables already present in E_j or fresh variables introduced by OracleSucc. New identification or elimination variables are not introduced, so their properties are preserved. Hence all invariants are preserved.
Otherwise, we proceed by a case distinction on the form of u ?= v. Typically, one of Lemmas 4.13 to 4.18 applies. Any one of them gives a substitution σ′ and a binding δ with properties that let us define E_{j+1} = δE_j, σ_{j+1} = δσ_j, and ϱ_{j+1} = σ′. The measure always strictly decreases, because these lemmas imply ϱ_{j+1} E_{j+1} = ϱ_{j+1} δE_j = ϱ_j E_j and ord ϱ_{j+1} = ord σ′ < ord ϱ_j. Regarding the other invariants, the former equation guarantees that ϱ_{j+1} unifies E_{j+1}, and ϱ_0 X = ϱ_j σ_j X = ϱ_{j+1} δσ_j X = ϱ_{j+1} σ_{j+1} X for all X ∉ V, because the fresh variables introduced by the binding cannot occur in σ_j X for any X ∉ V. The conditions on identification and elimination variables must be checked separately when new ones are introduced.

Let a be the head of u = λx̄.a ū and b be the head of v = λx̄.b v̄. Consider the following cases:

u and v have the same head symbol a = b:

(1) Suppose that ϱ_j a has a parameter with a non-base-simple occurrence. By one of the induction invariants, a is not an elimination variable. Among all non-base-simple occurrences of parameters in ϱ_j a, choose the leftmost one, which we call x. This occurrence of x cannot be below another parameter, because having x occur in one of its arguments would make that other parameter non-base-simple, contradicting the occurrence of x being leftmost. Thus x is neither base-simple nor below another parameter; so x is of functional type. Moreover, non-base-simplicity implies non-ω-simplicity. Hence, we can apply Lemma 4.14 (iteration).
(2) Otherwise, suppose that ϱ_j a discards some of its parameters. By one of the induction invariants, a cannot be an elimination variable. Hence Lemma 4.13 (elimination) applies.
The newly introduced elimination variable G satisfies the required invariants, because Lemma 4.13 guarantees that ϱ_{j+1} G uses all of its parameters and shares its body with ϱ_j a, which by the assumption of this case contains only base-simple occurrences.
(3) Otherwise, every parameter of ϱ_j a has occurrences, and all of them are base-simple.
We are going to show that Decompose is a valid transition and decreases ϱ_j E_j. By Lemma 4.8 we conclude from ϱ_j u = ϱ_j v that ϱ_j u_i = ϱ_j v_i for every i. Hence the new constraints {u_i ?= v_i for all i} after Decompose are unified by ϱ_j. This allows us to define ϱ_{j+1} = ϱ_j and σ_{j+1} = σ_j. To check that ϱ_{j+1} E_{j+1} = ϱ_j E_{j+1} is smaller than ϱ_j E_j, it suffices to check that the constraints ϱ_j u_i ?= ϱ_j v_i together are smaller than ϱ_j u ?= ϱ_j v. Since all parameters of ϱ_j a have base-simple occurrences, ϱ_j u_i is a subterm of ϱ_j u = λx̄. ϱ_j a (ϱ_j ū) by Lemma 4.7. Similarly for ϱ_j v. It follows that ϱ_{j+1} E_{j+1} is smaller than ϱ_j E_j. Since ϱ_{j+1} = ϱ_j and σ_{j+1} = σ_j, the other invariants are obviously preserved.

u and v form a flex-flex pair with different heads:

(5) First, suppose that ϱ_j a or ϱ_j b is simply projective (Definition 4.10). By the induction hypothesis, the simply projective head cannot be an identification variable. Thus Lemma 4.15 (JP-style projection) applies.

(6) Otherwise, suppose that ϱ_j a is projective but not simply projective. Then the head of ϱ_j a is some parameter x_k. But this occurrence cannot be ω-simple because it has arguments, which cannot be bound above the head x_k. Thus Lemma 4.14 (iteration) applies. If ϱ_j b is projective but not simply projective, the same argument applies.

(7) Otherwise, suppose that ϱ_j a and ϱ_j b are in simple comparison form (Definition 4.9). By one of the induction invariants, the free variables occurring in ϱ_j E_j do not occur in E_j. Thus ϱ_j a ≠ a and ϱ_j b ≠ b. Then Lemma 4.17 (identification) applies.

(8) Otherwise, ϱ_j a and ϱ_j b are not in simple comparison form. By Lemma 4.4 and the definition of simple comparison form, there is some opponent pair x_k r̄, b in ϱ_j a and ϱ_j b (after possibly swapping u and v), where either the occurrence of x_k is not ω-simple (Definition 4.5) or x_k has another ω-simple occurrence in the body of ϱ_j a. Then Lemma 4.14 (iteration) applies.

u and v form a flex-rigid pair: Without loss of generality, assume that a is flex and b is rigid.
(9) Suppose first that ϱ_j a is projective. By one of the induction invariants, a cannot be an identification variable. Thus Lemma 4.18 (Huet-style projection) applies.

(10) Otherwise, ϱ_j a is not projective. The head of ϱ_j a must be b because b is rigid and ϱ_j unifies u and v. Since ϱ_j a is not projective, this means that b is not a bound variable. Therefore, b must be a constant. Then Lemma 4.16 (imitation) applies.

We have now constructed a run (E_0, σ_0) −→ (E_1, σ_1) −→ (E_2, σ_2) −→ · · · of the procedure. This run cannot be infinite because the measure ord(E_j, ϱ_j) strictly decreases as j increases. Hence, at some point we reach a j such that E_j = ∅ and ϱ_0 X = ϱ_j σ_j X for all X ∉ V. Therefore, (E, id) −→* (∅, σ_j) −→ σ_j, and ϱX = τ^{-1} ϱ_j σ_j X for all X ∉ V, completing the proof.

A New Decidable Fragment
We discovered a new fragment that admits a finite CSU and a simple oracle. The oracle is based on work by Prehofer and the PT procedure [Pre95], an adaptation of the preunification procedure by Snyder and Gallier [SG89] (which is itself an adaptation of Huet's procedure). PT transforms an initial multiset of constraints E_0 by applying bindings ϱ. If there is a sequence E_0 =⇒_{ϱ_1} · · · =⇒_{ϱ_n} E_n such that E_n contains only flex-flex constraints, we say that PT produces a preunifier σ = ϱ_n . . . ϱ_1 with constraints E_n. A sequence fails if E_n = ⊥. As in the previous section, we consider all terms to be αβη-equivalence classes, with the η-long β-reduced form as their canonical representative. Unlike previously, in this section we view unification constraints s ?= t as ordered pairs. The following rules, however, are stated modulo orientation. The PT transition rules, adapted for our presentation style, are as follows:

[The PT transition rules are displayed here; all fresh variables are of appropriate types.]
The grayed constraints are required to be selected by a given selection function S. We call S admissible if it selects only flex-rigid constraints and prioritizes the selection of constraints applicable for Failure and Decomposition, and then of descendant constraints of Projection transitions with j = 0 (i.e., for x_i of base type), in that order of priority. In the remainder of this section we consider only admissible selection functions, an assumption that Prehofer also makes implicitly in his thesis. Additionally, whenever we compare multisets, we use the multiset ordering defined by Dershowitz and Manna [DM79]. As above, we assume that the fresh variables are taken from an infinite supply V of fresh variables that are different from the variables in the initial problem and never reused.
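An admissible selection function can be sketched as a priority choice among the flex-rigid constraints. The sketch below is hypothetical: real constraints are terms rather than dictionaries, and the flags would be computed by inspecting the constraint.

```python
# Hypothetical sketch of an admissible selection function S. It selects
# only flex-rigid constraints and prefers, in order: constraints to which
# Failure or Decomposition applies, then descendants of base-type
# Projection steps, then any remaining flex-rigid constraint.

def select(constraints):
    flex_rigid = [c for c in constraints if c["kind"] == "flex-rigid"]
    if not flex_rigid:
        return None  # an admissible S never selects flex-flex constraints

    def priority(c):
        if c["failure_or_decomposition"]:
            return 0
        if c["base_type_projection_descendant"]:
            return 1
        return 2

    return min(flex_rigid, key=priority)

constraints = [
    {"kind": "flex-flex", "failure_or_decomposition": False,
     "base_type_projection_descendant": False},
    {"kind": "flex-rigid", "failure_or_decomposition": False,
     "base_type_projection_descendant": True},
    {"kind": "flex-rigid", "failure_or_decomposition": True,
     "base_type_projection_descendant": False},
]
assert select(constraints) is constraints[2]  # Failure/Decomposition first
```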
The following lemma states that PT is complete for preunification:

Lemma 5.1. Let ϱ be a unifier of a multiset of constraints E_0. Then PT produces a preunifier σ with constraints E_n, and there exists a unifier θ of E_n such that ϱX = θσX for all X that are not contained in the supply V of fresh variables.
Proof. This lemma is a refinement of Lemma 4.1.7 from Prehofer's PhD thesis [Pre95], and this proof closely follows the proof of that lemma. Compared to the lemma from Prehofer's thesis, [...] the constraint λx̄.F v̄_n ?= λx̄.a ū_m was chosen, which, due to solidity restrictions, cannot be such a descendant. Therefore, we know that PT will transform the descendants of the matching problem λx̄.v_j ?= λx̄.a ū_m until either Failure is observed (making PT trivially terminating) or no descendant exists and the grounding matcher is computed (see Lemmas 5.5 and 5.4). This results in the removal of the original constraint λx̄.F v̄_n ?= λx̄.a ū_m and the application of the computed grounding matcher, which will either remove all free variables from the right-hand side of the constraint (not increasing A and reducing B) or, if no free variables occur in the right-hand side, leave A and B unchanged and reduce C.
• λx̄.a v̄_n ?= λx̄.F ū_m: if a is a bound variable, projecting F onto argument u_i will either enable the application of Decomposition as the next step, reducing A, or result in Failure, trivially terminating. If a is a constant, then projecting F onto some u_j will either yield Failure or enable Decomposition, reducing A.
Enumerating a CSU for a solid flex-flex pair may seem as hard as for any other flex-flex pair; however, the following two lemmas show that solid pairs admit an MGU.

Lemma 5.7. The unification problem {λz̄.F s̄_m ?= λz̄.F s̄′_m}, where both terms are solid, has an MGU of the form σ = {F → λx̄_m. G x_{j_1} . . . x_{j_r}}, where G is an auxiliary variable and 1 ≤ j_1 < · · · < j_r ≤ m are exactly those indices j_i for which s_{j_i} = s′_{j_i}.
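The binding of Lemma 5.7 is directly computable: it keeps exactly the argument positions on which the two sides agree. A toy sketch follows; representing the arguments as strings suffices here, since the lemma only needs a syntactic equality test on corresponding positions.

```python
# Toy sketch of the MGU of Lemma 5.7 for a same-head solid flex-flex pair
# F s_1 ... s_m ?= F s'_1 ... s'_m: the binding maps F to
# (lam x_1 ... x_m. G x_{j_1} ... x_{j_r}), keeping the agreeing positions.

def solid_same_head_mgu(args_left, args_right):
    """Return the 1-based indices j with s_j = s'_j."""
    assert len(args_left) == len(args_right)
    return [j for j, (s, t) in enumerate(zip(args_left, args_right), start=1)
            if s == t]

# F a b a ?= F a c a: positions 1 and 3 agree,
# so the binding is F -> (lam x1 x2 x3. G x1 x3)
print(solid_same_head_mgu(["a", "b", "a"], ["a", "c", "a"]))  # [1, 3]
```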
Proof. Let ϱ be a unifier for the given unification problem. Let λx̄.u = ϱF. Take an arbitrary subterm of u whose head is a bound variable x_i. If x_i is of functional type, it corresponds to either s_i or s′_i, which, due to solidity restrictions, has to be a bound variable. Furthermore, since ϱ is a unifier, s_i and s′_i have to be syntactically equal. Similarly, if x_i is of base type, it corresponds to two ground terms s_i and s′_i, which have to be syntactically equal. We conclude that ϱF can use variables from x̄_m only if they correspond to syntactically equal terms. Therefore, there is a substitution θ such that ϱX = θσX for all X ≠ G. Due to the arbitrary choice of ϱ, we conclude that σ is an MGU.

Proof of Lemma 5.8. Let ϱ be a unifier of λx̄.F s̄_m ?= λx̄.F′ s̄′_m. We prove that σ is an MGU by showing that there exists a substitution θ such that ϱX = θσX for all non-auxiliary variables X. We can focus on X ∈ {F, F′}, because all other non-auxiliary variables appear neither in the original problem nor in σF or σF′; so we can simply define θX = ϱX for them.
Let λx̄_m.u = ϱF and λȳ_m.u′ = ϱF′, where the bound variables x̄_m and ȳ_m have been α-renamed apart. We also assume that the names of the variables bound inside u and u′ are α-renamed so that they are different from x̄_m and ȳ_m. Finally, the bound variables from the definition of σ have been α-renamed to match x̄_m and ȳ_m. We define θ via a partial function diff on pairs of terms [definition displayed as equations 5.1-5.5, garbled in the source]. From diff's definition it is clear that there are terms v, v′ for which it is undefined. However, we will show that for all u and u′ that are bodies of bindings from a unifier ϱ, diff is defined and has the desired property. In equations 5.3, 5.4, and 5.5, if there are multiple numbers k or l that fulfill the condition, choose an arbitrary one. We need to show that ϱF = θσF and ϱF′ = θσF′. By the definitions of u, u′, θ, and σ, and by β-reduction, this is equivalent to an equation that we call (†). We will show by induction that (†) holds for any λx̄_m.v and λȳ_m.v′ such that

{x_1 → s_1, . . ., x_m → s_m}v = {y_1 → s′_1, . . ., y_m → s′_m}v′. (∗)

The equation (∗) holds for v = u and v′ = u′ because ϱ is a unifier of λx̄.F s̄_m ?= λx̄.F′ s̄′_m. Therefore, once we have shown that (∗) implies (†), we know that (†) holds for v = u and v′ = u′ and we are done.
We prove that (∗) implies (†) by induction on the size of v and v′. We consider the following cases:

v = λx.v_1: For (∗) to hold, v and v′ must be of the same type. Therefore, the λ-prefixes of their η-long representatives must have the same length and we can apply equation 5.1. By the induction hypothesis, (†) holds.

v = x_i: In this case, {x_1 → s_1, . . ., x_m → s_m}v = s_i. Since (∗) holds, v′ must be an instance of a unifier from the CSU of s_i ?= H_i s̄′_m. However, since s_i and all terms in s̄′_m are ground, λȳ_m.v′ = σ_i^k(H_i) for some k. Then diff(x_i, v′) = z_i^k, and it is easy to check that (†) holds.

v = x_i v̄_n, n > 0: In this case, x_i is mapped to s_i, which, due to solidity restrictions, has to be a functional bound variable. Since (∗) holds, we conclude that the head of {y_1 → s′_1, . . ., y_m → s′_m}v′ must be s′_j such that s′_j = s_i; this also means that v′ = y_j v̄′_n. Therefore, it is easy to check that τ = {H_i → λȳ_m.y_j} is a matcher for the problem s_i ?= H_i s̄′_m. By the induction hypothesis, (†) holds.

[...] traversal. To ensure we do not remain stuck waiting for a unifier from a particular lazy list, the procedure periodically returns an empty set, indicating that the next lazy list should be probed.
The implemented selection function for our procedure prioritizes the selection of rigid-rigid pairs over flex-rigid pairs, and of flex-rigid pairs over flex-flex pairs. However, since the constructed substitution σ is not applied eagerly, heads can appear to be flex even if they become rigid after dereferencing and normalization. To mitigate this issue to some degree, we dereference the heads with σ, but do not normalize, and use the resulting heads for prioritization.
We implemented oracles for the pattern, solid, and fixpoint fragments. Fixpoint unification [Hue75] is concerned with problems of the form {F ?= t}. If F does not occur in t, {F → t} is an MGU for the problem. If there is a position p in t such that t|_p = F ū_m, and for each prefix q ≠ p of p, t|_q has a rigid head, and either m = 0 or t is not a λ-abstraction, then we can conclude that F ?= t has no solutions. Otherwise, the fixpoint oracle is not applicable.
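The fixpoint oracle's three-way case distinction can be sketched on a toy term datatype. This is not Zipperposition's implementation: terms are nested tuples here, and λ-abstractions along the path are treated conservatively (the sketch answers "unknown" rather than inspecting them).

```python
# Hypothetical sketch of the fixpoint oracle on a toy term datatype:
# a term is ("lam", body), ("var", name, args) for a free-variable head,
# or ("const", name, args) for a rigid head.

def fixpoint_oracle(F, t):
    """Return 'mgu' if F does not occur in t, 'fail' if an occurrence of F
    below only rigid heads witnesses nonunifiability, else 'unknown'."""
    def occurs(s):
        tag = s[0]
        if tag == "lam":
            return occurs(s[1])
        return (tag == "var" and s[1] == F) or any(occurs(a) for a in s[2])

    # Look for an occurrence F u_1 ... u_m with only rigid heads above it,
    # where m = 0 or t is not a lambda-abstraction.
    def rigid_path_occurrence(s):
        tag = s[0]
        if tag == "var" and s[1] == F:
            m = len(s[2])
            return m == 0 or t[0] != "lam"
        if tag == "const":
            return any(rigid_path_occurrence(a) for a in s[2])
        return False  # below a flex head or a lambda: be conservative

    if not occurs(t):
        return "mgu"      # {F -> t} is a most general unifier
    if rigid_path_occurrence(t):
        return "fail"     # F ?= t has no solutions
    return "unknown"      # the oracle is not applicable

# F ?= f(F): occurrence of F on a rigid path, so no solution exists
print(fixpoint_oracle("F", ("const", "f", [("var", "F", [])])))   # fail
# F ?= f(a): F is absent, so {F -> f(a)} is an MGU
print(fixpoint_oracle("F", ("const", "f", [("const", "a", [])])))  # mgu
```

The "fail" case generalizes the first-order occurs check.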
For second-order logic with only unary constants, it is decidable whether a unifier for a problem in this class (called monadic second-order) exists [Far88]. As this class of terms admits a possibly infinite CSU, this oracle cannot be used for the OracleSucc step, but it can be used for OracleFail. Similarly, the fragment of second-order terms with no repeated occurrences of free variables has decidable unifier existence but possibly infinite CSUs [Dow01]. Due to their limited applicability and high complexity, we decided not to implement these oracles.

Evaluation
We evaluated the implementation of our unification procedure in Zipperposition, assessing the complete variant and the pragmatic variant, the latter with several different combinations of limits on the number of bindings. As part of the implementation of the complete mode for Boolean-free higher-order logic in Zipperposition [BBT+19], Bentkamp implemented a straightforward version of the JP procedure. This version is faithful to the original description, with a check whether a (sub)problem can be solved using a first-order oracle as the only optimization. Our evaluations were performed on StarExec Miami [SST14] servers with Intel Xeon E5-2620 v4 CPUs clocked at 2.10 GHz with a 60 s CPU limit.
Contrary to first-order unification, there is no widely available corpus of benchmarks dedicated solely to evaluating the performance of higher-order unification algorithms. Thus, we used all 2606 monomorphic higher-order theorems from the TPTP library [Sut17] and 832 monomorphic higher-order Sledgehammer (SH) generated problems [SBP13] as our benchmarks. Many TPTP problems require the synthesis of complicated unifiers, whereas Sledgehammer problems are only mildly higher-order; many of them are solved with first-order unifiers.
We used the naive implementation of the JP procedure (jp) as a baseline to evaluate the performance of our procedure. We compare it with the complete variant of our procedure (cv) and pragmatic variants (pv) with several different configurations of limits on applied bindings. All other Zipperposition parameters were fixed to the values of a variant of a well-performing configuration we used for the CASC-27 theorem proving competition [Sut19]. Figure 1 compares different variants of the procedure with the naive JP implementation. Each pv configuration is denoted by pv_a^bcde, where a is the limit on the total number of applied bindings, and b, c, d, and e are the limits on functional projections, eliminations, imitations, and identifications, respectively. Figure 2 summarizes the effects of using different oracles.
The configuration of our procedure with no oracles outperforms the JP procedure with the first-order oracle. This suggests that the design of the procedure, in particular lazy normalization and lazy application of the substitution, already reduces the effects of the JP procedure's main bottlenecks. Raw evaluation data show that on TPTP benchmarks, complete and pragmatic configurations differ in the sets of problems they solve: cv solves 19 problems not solved by pv_4^2222, whereas pv_4^2222 solves 34 problems that cv does not solve. Similarly, comparing the pragmatic configurations with each other, pv_6^3333 and pv_4^2222 each solve 13 problems that the other does not. The overall higher success rate of pv_2^1020 compared to pv_2^1222 suggests that solving flex-flex pairs by trivial unifiers often suffices for superposition-based theorem proving.
Counterintuitively, in some cases using oracles can hurt the performance of Zipperposition. Using oracles typically results in smaller CSUs, whose elements are more general substitutions than the ones we obtain without oracles. These more general substitutions usually contain more applied variables, which Zipperposition's heuristics avoid due to their explosive nature. This can make Zipperposition postpone necessary inferences for too long. Configuration n benefits from this effect and therefore solves 18 TPTP problems that no other configuration in Figure 2 solves. The same effect also gives configurations with only one oracle an advantage over configurations with multiple oracles on some problems.
The evaluation sheds some light on how often solid unification problems appear in practice. The raw data show that configuration s solves 5 TPTP problems that neither f nor p solves. Configuration f solves 8 TPTP problems that neither s nor p solves, while p solves 9 TPTP problems that the two other configurations do not. This suggests that the solid oracle is slightly less beneficial than the fixpoint or pattern oracles, but it is still a useful addition to the set of available oracles.
A subset of the TPTP benchmarks, concerning operations on Church numerals, is designed to test the efficiency of higher-order unification. Our procedure performs exceptionally well on these problems: it solves all of them, usually faster than other competitive higher-order provers. [...] pattern unification, but is significantly more complex to implement. Prehofer [Pre95] lists many other decidable fragments, not only for unification but also for preunification and unifier existence problems. Most of these algorithms are given for second-order terms with various constraints on their variables. Finally, one of the first decidability results is Farmer's discovery [Far88] that higher-order unification of terms with unary function symbols is decidable.
Our procedure draws inspiration from, and contributes to, all three lines of research. Accordingly, its advantages over previously known procedures can be laid out along those three lines. First, our procedure mitigates many issues of the JP procedure. Second, it can be modified not to solve flex-flex pairs, becoming a version of Huet's procedure with important built-in optimizations. Third, it can integrate any oracle for problems with finite CSUs, including the one we discovered.
The implementation of our procedure in Zipperposition was one of the reasons this prover evolved from a proof-of-concept prover for higher-order logic into a competitive higher-order prover. In the 2020 edition of CASC, Zipperposition won the higher-order division, solving 84% of the problems, 20 percentage points ahead of the runner-up.

Conclusion
We presented a procedure for enumerating a complete set of higher-order unifiers that is designed for efficiency. Due to a design that restricts the search space and a tight integration of oracles, it reduces the number of redundant unifiers returned and gives up early in cases of nonunifiability. In addition, we presented a new fragment of higher-order terms that admits finite CSUs. Our evaluation shows a clear improvement over previously known procedures.
In future work, we will focus on designing intelligent heuristics that automatically adjust the unification parameters according to the type of the problem. For example, we should usually choose shallow unification for mostly first-order problems and deeper unification for hard higher-order problems. We plan to investigate other heuristic choices, such as the order of bindings and the way in which the search space is traversed (breadth- or depth-first). We are also interested in further improving the termination behavior of the procedure without sacrificing completeness. Finally, following the work of Libal [Lib15] and Zaionc [Zai85], we would like to consider the use of regular grammars to finitely present infinite CSUs. For example, the grammar G ::= λx.x | λx.f (G x) represents all elements of the CSU for the problem λx.G (f x) ?= λx.f (G x).
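To illustrate the closing example, the grammar's productions can be enumerated and checked to be unifiers. In the small sketch below, strings stand in for λ-terms, and the binding G ↦ λx. f^n(x) is represented by a function on strings:

```python
# Sketch: the grammar G ::= (lam x. x) | (lam x. f (G x)) enumerates the
# bindings G -> (lam x. f^n(x)), each of which unifies
# lam x. G (f x) ?= lam x. f (G x). Strings stand in for lambda-terms.

def grammar_bindings(limit):
    """Yield the first `limit` productions as functions t -> f^n(t)."""
    def iterate(n):
        return lambda t: t if n == 0 else "f(" + iterate(n - 1)(t) + ")"
    for n in range(limit):
        yield iterate(n)

for g in grammar_bindings(4):
    lhs = g("f(x)")            # G (f x) under the binding
    rhs = "f(" + g("x") + ")"  # f (G x) under the binding
    assert lhs == rhs          # every production of the grammar is a unifier
```

This makes concrete why a finite grammar can stand for an infinite CSU: the check succeeds for every n, even though no finite set of the bindings is complete.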
Example 3.4. The search space restrictions also allow us to prune some redundant unifiers. Consider the problem F (G a) ?= F b, where a and b are of base type. Our procedure produces only one failing branch and the following two successful branches: [derivation trees omitted].

[Displaced fragments of the PT transition rules: Failure, applicable where a and b are different rigid heads; Solution, applicable to a constraint λx̄.F x̄ ?= λx̄.t where F does not occur in t and t does not have a flex head, with ϱ = {F → λx̄.t}; and Imitation, applicable to a constraint λx̄.F s̄_m ?= λx̄.t.]

Lemma 5.8. Let {λx̄.F s̄_m ?= λx̄.F′ s̄′_m} be a solid unification problem where F ≠ F′. By Lemma 5.4, there exists a finite CSU {σ_i^1, . . ., σ_i^{k_i}} of the problem {s_i ?= H_i s̄′_m}, where H_i is a fresh free variable. Let λȳ_m.s_i^j = λȳ_m.σ_i^j(H_i) ȳ_m. Similarly, also by Lemma 5.4, there exists a finite CSU {σ̃_i^1, . . ., σ̃_i^{l_i}} of the problem {s′_i ?= H̃_i s̄_m}, where H̃_i is a fresh free variable. Let λx̄_m.s̃_i^j = λx̄_m.σ̃_i^j(H̃_i) x̄_m. Let Z be a fresh free variable. An MGU σ for the given problem is {F → λx̄_m. Z . . .}.
The cv configuration and all of the pv configurations use only pattern unification as an oracle. To test the effect of oracle choice, we evaluated the complete variant in 8 combinations: with no oracles (n); with only the fixpoint (f), pattern (p), or solid (s) oracle; and with their combinations fp, fs, ps, and fps.