(Leftmost-Outermost) Beta Reduction is Invariant, Indeed

Slot and van Emde Boas' weak invariance thesis states that reasonable machines can simulate each other within a polynomial overhead in time. Is lambda-calculus a reasonable machine? Is there a way to measure the computational complexity of a lambda-term? This paper presents the first complete positive answer to this long-standing problem. Moreover, our answer is completely machine-independent and based on a standard notion in the theory of lambda-calculus: the length of a leftmost-outermost derivation to normal form is an invariant cost model. Such a theorem cannot be proved by directly relating lambda-calculus with Turing machines or random access machines, because of the size explosion problem: there are terms that in a linear number of steps produce an exponentially long output. The first step towards the solution is to shift to a notion of evaluation for which the length and the size of the output are linearly related. This is done by adopting the linear substitution calculus (LSC), a calculus of explicit substitutions modeled after linear logic proof nets and admitting a decomposition of leftmost-outermost derivations with the desired property. Thus, the LSC is invariant with respect to, say, random access machines. The second step is to show that the LSC is invariant with respect to the lambda-calculus. The size explosion problem seems to imply that this is not possible: having the same notions of normal form, evaluation in the LSC is exponentially longer than in the lambda-calculus. We solve this impasse by introducing a new form of shared normal form and shared reduction, deemed useful. Useful evaluation avoids those steps that only unshare the output without contributing to beta-redexes, i.e. the steps that cause the blow-up in size. The main technical contribution of the paper is indeed the definition of useful reductions and the thorough analysis of their properties.


Introduction
Theoretical computer science is built around algorithms, computational models, and machines: an algorithm describes a solution to a problem with respect to a fixed computational model, whose role is to provide a handy abstraction of concrete machines. The choice of the model reflects a tension between different needs. For complexity analysis, one expects a neat relationship between the primitives of the model and the way in which they are effectively implemented. In this respect, random access machines are often taken as the reference model, since their definition closely reflects the von Neumann architecture. The specification of algorithms unfortunately lies at the other end of the spectrum, as one would like them to be as machine-independent as possible. In this case programming languages are the typical model. Functional programming languages, thanks to their higher-order nature, provide very concise and abstract specifications. Their strength is also their weakness: the abstraction from physical machines is pushed to a level where it is no longer clear how to measure the complexity of an algorithm. Is there a way in which such a tension can be resolved?
The tools for stating the question formally are provided by complexity theory and by Slot and van Emde Boas' invariance thesis [SvEB84], which stipulates when any Turing complete computational model can be considered reasonable: Reasonable computational models simulate each other with polynomially bounded overhead in time, and constant factor overhead in space. The weak invariance thesis is the variant where the requirement about space is dropped, and it is the one we will actually work with in this paper (alternatively called extended, efficient, modern, or complexity-theoretic Church(-Turing) thesis). The idea behind the thesis is that for reasonable models the definition of every polynomial or super-polynomial class such as P or EXP does not rely on the chosen model. On the other hand, it is well-known that sub-polynomial classes depend very much on the model. A first refinement of our question then is: are functional languages invariant with respect to standard models like random access machines or Turing machines?
Invariance results have to be proved via an appropriate measure of time complexity for programs, i.e. a cost model. The natural measure for functional languages is the unitary cost model, i.e. the number of evaluation steps. There is, however, a subtlety. The evaluation of functional programs, in fact, depends very much on the evaluation strategy chosen to implement the language, while the reference model for functional languages, the λ-calculus, is so machine-independent that it does not even come with a deterministic evaluation strategy. And which strategy, if any, gives us the most natural, or canonical cost model (whatever that means)? These questions have received some attention in the last decades. The number of optimal parallel β-steps (in the sense of Lévy [Lév78]) to normal form has been shown not to be a reasonable cost model: there exists a family of terms that reduces in a polynomial number of parallel β-steps, but whose intrinsic complexity is non-elementary [LM96, AM98]. If one considers the number of sequential β-steps (in a given strategy, for a given notion of reduction), the literature offers some partial positive results, all relying on the use of sharing (see below for more details).
Sharing is indeed a key ingredient, for one of the issues here is due to the representation of terms. The ordinary way of representing terms indeed suffers from the size-explosion problem: even for the most restrictive notions of reduction (e.g. Plotkin's weak reduction), there is a family of terms {t_n}_{n∈N} such that |t_n| is linear in n, t_n evaluates to its normal form in n steps, but at the i-th step a term of size 2^i is copied, producing a normal form of size exponential in n. Put differently, an evaluation sequence of linear length can possibly produce an output of exponential size. At first sight, then, there is no hope that evaluation lengths may provide an invariant cost model. The idea is that such an impasse can be avoided by sharing common subterms along the evaluation process, in order to keep the representation of the output compact, i.e. polynomially related to the number of evaluation steps. But is appropriately managed sharing enough? The answer is positive, at least for certain restricted forms of reduction: the number of steps is already known to be an invariant cost model for weak reduction [BG95, SGM02, DLM08, DLM12] and for head reduction [ADL12].
If the problem at hand consists in computing the normal form of an arbitrary λ-term, however, no positive answer was known, to the best of our knowledge, before our result. We believe that not knowing whether the λ-calculus in its full generality is a reasonable machine is embarrassing for the λ-calculus community. In addition, this problem is relevant in practice: proof assistants often need to check whether two terms are convertible, itself a problem usually reduced to the computation of normal forms.
In this paper, we give a positive answer to the question above, by showing that leftmost-outermost (LO, for short) reduction to normal form indeed induces an invariant cost model. Such an evaluation strategy is standard, in the sense of the standardization theorem, one of the central theorems in the theory of λ-calculus, first proved by Curry and Feys [CF58]. The relevance of our cost model is given by the fact that LO reduction is an abstract concept from rewriting theory which at first sight is totally unrelated to complexity analysis. Moreover, the underlying computational model is very far from traditional, machine-based models like Turing machines and RAMs.
Another view on this problem comes in fact from rewriting theory itself. It is common practice to specify the operational semantics of a language via a rewriting system, whose rules always employ some form of substitution, or at least of copying, of subterms. Unfortunately, this practice is very far away from the way languages are implemented, as actual interpreters perform copying in a very controlled way (see, e.g., [Wad71, PJ87]). This discrepancy induces serious doubts about the relevance of the computational model. Is there any theoretical justification for copy-based models, or more generally for rewriting theory as a modeling tool? In this paper we give a very precise answer, formulated within rewriting theory itself. A second contribution of the paper, indeed, is a rewriting analysis of the technique used to prove the invariance result.
As in our previous work [ADL12], we prove our result by means of the linear substitution calculus (see also [Acc12, ABKL14]), a simple calculus of explicit substitutions (ES, for short) introduced by Accattoli and Kesner, that arises from linear logic and graphical syntaxes and is similar to calculi studied by de Bruijn [dB87], Nederpelt [Ned92], and Milner [Mil07]. A peculiar feature of the linear substitution calculus (LSC) is the use of rewriting rules at a distance, i.e. rules defined by means of contexts, that are used to closely mimic reduction in linear logic proof nets. Such a framework, whose use does not require any knowledge of these areas, allows an easy management of sharing and, in contrast to previous approaches to ES, admits a theory of standardization and a notion of LO evaluation [ABKL14]. The proof of our result is based on a fine quantitative study of the relationship between LO derivations for the λ-calculus and a variation over LO derivations for the LSC.
Related Work. In the literature invariance results for the weak call-by-value λ-calculus have been proved three times, independently. First, by Blelloch and Greiner [BG95], while studying cost models for parallel evaluation. Then by Sands, Gustavsson and Moran [SGM02], while studying speedups for functional languages, and finally by Martini and the second author [DLM08], who addressed the invariance thesis for the λ-calculus. The latter also proved invariance for the weak call-by-name λ-calculus [DLM12]. Invariance of head reduction has been shown by the present authors, in previous work [ADL12]. The problem of an invariant cost model for the ordinary λ-calculus is discussed by Frandsen and Sturtivant [FS91], and then by Lawall and Mairson [LM96]. Frandsen and Sturtivant's proposal consists in taking the number of parallel β-steps to normal form as the cost of reducing any term. A negative result about the invariance of such a cost model has been proved by Asperti and Mairson [AM98]. When only first-order symbols are considered, Dal Lago and Martini, and independently Avanzini and Moser, proved some quite general results through graph rewriting [DLM12, AM10], itself a form of sharing.
This paper is a revised and extended version of [ADL14a], to which it adds explanations and the proofs that were omitted. It differs considerably with respect to both [ADL14a] and the associated technical report [ADL14b], as proofs and definitions have been improved and simplified, partially building on the recent work by Accattoli and Sacerdoti Coen in [ASC15], where useful sharing is studied in a call-by-value scenario.
After the introduction, in Sect. 1 we explain why the problem is hard by discussing the size-explosion problem. An abstract view of the solution is given in Sect. 7. The sections in between (2-6) provide the background, i.e. definitions and basic results, up to the introduction of useful reduction; at a first reading we suggest skipping them. After the abstract view, in Sect. 8 we explain how the various abstract requirements are actually proved in the remaining sections (9-14), where the proofs are. We put everything together in Sect. 15, and discuss optimizations in Sect. 16.

Why is The Problem Hard?
In principle, one may wonder why sharing is needed at all, or whether a relatively simple form of sharing suffices. In this section, we will show that sharing is unavoidable and that a new subtle notion of sharing is necessary.
If we stick to explicit representations of terms, in which sharing is not allowed, counterexamples to invariance can be designed in a fairly easy way. The problem is size-explosion, or the existence of terms of size n that in O(n) steps produce an output of size O(2^n), and affects the λ-calculus as well as its weak and head variants. The explosion is due to iterated useless duplications of subterms that are normal and whose substitution does not create new redexes. For simple cases such as weak or head reduction, turning to shared representations of λ-terms and micro-step substitutions (i.e. one occurrence at a time) is enough to avoid size-explosion. For micro-steps, in fact, the length of evaluation and the size of the output are linearly related. A key point is that both micro-step weak and head reduction stop on a compact representation of the weak or head normal form.
In the ordinary λ-calculus, a very natural notion of evaluation to normal form is LO reduction. Unfortunately, turning to sharing and micro-step LO evaluation is not enough, because such a micro-step simulation of β-reduction computes ordinary normal forms, i.e. it does not produce a compact representation, but the usual one, whose size is sometimes exponential. In other words, size-explosion reappears disguised as length-explosion: for the size-exploding family, indeed, micro-step evaluation to normal form is necessarily exponential in n, because its length is linear in the size of the output. Thus, the number of β-steps cannot be shown to be invariant using such a simple form of sharing.
The problem is that evaluation should stop on a compact (i.e. not exponential) representation of the normal form, as in the simpler cases, but there is no such notion. Our way out is the definition of a variant of micro-step LO evaluation that stops on a minimal useful normal form, that is, a term with ES t such that unfolding all the substitutions in t produces a normal form, i.e. such that the duplications left to do are useless. In Sect. 6, we will define useful reduction, which stops on minimal useful normal forms and for which we will show invariance with respect to both β-reduction and random access machines.
In the rest of the section we discuss in detail the size-explosion problem, recall the solution for the head case, and explain the problem for the general case. Last, we discuss the role of standard derivations.
1.1. A Size-Exploding Family. The typical example of a term that is useless to duplicate is a free variable^1, as it is normal and its substitution cannot create redexes. Note that the same is true for the application xx of a free variable x to itself, and, iterating, for (xx)(xx), and so on. We can easily build a term u_n of size |u_n| = O(n) that takes a free variable x and puts its self-application xx as argument of a new redex, which does the same, i.e. it puts the self-application (xx)(xx) as argument of a new redex, and so on, n times, normalizing in n steps to a complete binary tree of height n and size O(2^n), whose internal nodes are applications and whose 2^n leaves are all occurrences of x. Let us formalize this notion of variable tree x^{@n} of height n:
x^{@0} := x;  x^{@(n+1)} := x^{@n} x^{@n}.
Clearly, the size of variable trees is exponential in n; a routine induction indeed shows |x^{@n}| = 2^{n+1} − 1 = O(2^n). Now let us define the family of terms {t_n}_{n≥1} that in only n LO steps blows up x into the tree x^{@n}:
t_1 := λy_1.(y_1 y_1);  t_{n+1} := λy_{n+1}.(t_n (y_{n+1} y_{n+1})).
The statement is slightly generalized, in order to express it as a nice property over variable trees.
Proposition 1.1. t_n x^{@m} →^n_LOβ x^{@(n+m)}.
Proof. By induction on n. Cases:
(1) Base Case, i.e. n = 1: t_1 x^{@m} = (λy_1.(y_1 y_1)) x^{@m} →LOβ x^{@m} x^{@m} = x^{@(m+1)}.
(2) Induction Step: t_{n+1} x^{@m} = (λy_{n+1}.(t_n (y_{n+1} y_{n+1}))) x^{@m} →LOβ t_n (x^{@m} x^{@m}) = t_n x^{@(m+1)} →^n_LOβ x^{@(n+m+1)}, where the last sequence is given by the induction hypothesis.
It seems that the unitary cost model, i.e. the number of LO β-steps, is not invariant: in a linear number of β-steps we reach an object which cannot even be written down in polynomial time.
1 On open terms: in the λ-calculus free variables are unavoidable because reduction takes place under abstractions. Even if one considers only globally closed terms, variable occurrences may look free locally, as y in λy.((λx.(xx))y) →β λy.(yy). This is why for studying the strong λ-calculus it is common practice to work with possibly open terms.
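The size-explosion phenomenon can be checked mechanically. The following is a minimal Python sketch, not part of the paper: it assumes the family is given by t_1 := λy_1.(y_1 y_1) and t_{n+1} := λy_{n+1}.(t_n (y_{n+1} y_{n+1})), encodes terms as nested tuples (an encoding chosen here for illustration), and uses a naive substitution that is safe only because all bound names are distinct and the argument is built from the free variable x.

```python
# Terms as nested tuples (illustrative encoding, not from the paper):
#   ("var", x) | ("lam", x, body) | ("app", fun, arg)

def var(x):
    return ("var", x)

def lam(x, t):
    return ("lam", x, t)

def app(t, u):
    return ("app", t, u)

def size(t):
    """Number of constructors of a term."""
    return 1 + sum(size(c) for c in t[1:] if isinstance(c, tuple))

def subst(t, x, u):
    """Naive t{x<-u}; assumes bound names never clash with free names of u."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else lam(t[1], subst(t[2], x, u))
    return app(subst(t[1], x, u), subst(t[2], x, u))

def lo_step(t):
    """One leftmost-outermost beta step, or None if t is beta-normal."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":                 # the root redex comes first
            return subst(f[2], f[1], a)
        r = lo_step(f)                    # then the function part
        if r is not None:
            return app(r, a)
        r = lo_step(a)                    # then the argument
        return None if r is None else app(f, r)
    if t[0] == "lam":
        r = lo_step(t[2])
        return None if r is None else lam(t[1], r)
    return None

def family(n):
    """Assumed family: t_1 = lambda y1.(y1 y1), t_{k+1} = lambda y_{k+1}.(t_k (y_{k+1} y_{k+1}))."""
    t = lam("y1", app(var("y1"), var("y1")))
    for k in range(2, n + 1):
        y = var(f"y{k}")
        t = lam(f"y{k}", app(t, app(y, y)))
    return t

def normalize(t):
    """Iterate LO steps; returns the normal form and the number of steps."""
    steps = 0
    while (r := lo_step(t)) is not None:
        t, steps = r, steps + 1
    return t, steps
```

Under these assumptions, `normalize(app(family(10), var("x")))` takes 10 steps and returns a term of size 2^11 − 1, while `size(family(10))` is only 49.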
1.2. The Head Case. The solution the authors proposed in [ADL12] tames size-explosion in a satisfactory way when head reduction is the evaluation strategy (note that the β-steps in Proposition 1.1 are in particular head steps). It uses sharing in the form of explicit substitutions (ES), which amounts to extending the language with an additional constructor noted t[x←u], that is an avatar of let-expressions, to be thought of as a sharing annotation of u for x in t, or as a term notation for the DAGs used in the graph-rewriting of λ-calculus (see [AG09]). The usual, capture-avoiding, and meta-level notion of substitution is instead noted t{x←u}.
Let us give a sketch of how ES work for the head case. Formal details about ES and the more general LO case will be given in Sect. 4. First of all, a term with sharing, i.e. with ES, can always be unshared, or unfolded, obtaining an ordinary λ-term t→.
Definition 1.2 (Unfolding). The unfolding t→ of a term with ES t is given by:
x→ := x;  (tu)→ := (t→)(u→);  (λx.t)→ := λx.(t→);  (t[x←u])→ := (t→){x←(u→)}.
Head β-reduction is β-reduction in a head context, i.e. [...] In particular, the size-exploding family t_n x is evaluated by the following linear head steps.
For n = 1 we have
(λy_1.(y_1 y_1))x →dB (y_1 y_1)[y_1←x] →lhs (x y_1)[y_1←x].
Note that only the head variable has been replaced, and that evaluation requires one →dB step and one →lhs step. For n = 2,
(λy_2.((λy_1.(y_1 y_1))(y_2 y_2)))x →dB ((λy_1.(y_1 y_1))(y_2 y_2))[y_2←x] →dB (y_1 y_1)[y_1←y_2 y_2][y_2←x] →lhs ((y_2 y_2) y_1)[y_1←y_2 y_2][y_2←x] →lhs ((x y_2) y_1)[y_1←y_2 y_2][y_2←x].
It is easily seen that, in general, t_n x evaluates in this way to
r_n := (...((x y_n) y_{n−1}) ... y_1)[y_1←y_2 y_2][y_2←y_3 y_3] ... [y_n←x].
As one can easily verify, the size of the linear head normal form r_n is linear in n, so that there is no size-explosion (the number of steps is also linear in n). Moreover, the unfolding r_n→ of r_n is exactly x^{@n}, so that the linear head normal form r_n is a compact representation of the head normal form, i.e. the expected result. Morally, in r_n only the left branch of the complete binary tree x^{@n} has been unfolded, while the rest of the tree is kept shared via explicit substitutions. Size-explosion is avoided by not substituting in arguments at all.
2 A more accurate explanation of the terminology: in the literature on ES the rewriting rule (λx.t)u → t[x←u] (that is the explicit variant of β) is often called B to distinguish it from β, and dB, which will be formally defined in Sect. 4, stands for distant B (or B at a distance) rather than distant β.
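The relation between the compact term r_n and its exponential unfolding can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: terms are encoded as nested tuples (our choice), the shape of r_n is assumed to be (...((x y_n) y_{n−1}) ... y_1)[y_1←y_2 y_2] ... [y_n←x], and substitution is naive, which is harmless here because no abstraction occurs in r_n.

```python
# Illustrative encoding (not from the paper):
#   ("var", x) | ("lam", x, body) | ("app", fun, arg) | ("sub", body, x, content)
# where ("sub", t, x, u) stands for the explicit substitution t[x<-u].

def size(t):
    """Number of constructors of a term, explicit substitutions included."""
    return 1 + sum(size(c) for c in t[1:] if isinstance(c, tuple))

def subst(t, x, u):
    """Naive meta-level substitution t{x<-u} on terms without ES."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def unfold(t):
    """Unfolding: turn every explicit substitution into a meta-level one."""
    if t[0] == "var":
        return t
    if t[0] == "lam":
        return ("lam", t[1], unfold(t[2]))
    if t[0] == "app":
        return ("app", unfold(t[1]), unfold(t[2]))
    return subst(unfold(t[1]), t[2], unfold(t[3]))

def tree(n):
    """Variable tree x^{@n}."""
    t = ("var", "x")
    for _ in range(n):
        t = ("app", t, t)
    return t

def r(n):
    """Assumed shape of r_n: (..((x y_n) y_{n-1})..y_1)[y_1<-y_2 y_2]..[y_n<-x]."""
    t = ("var", "x")
    for k in range(n, 0, -1):             # head: x applied to y_n, ..., y_1
        t = ("app", t, ("var", f"y{k}"))
    for k in range(1, n + 1):             # innermost ES first
        if k < n:
            nxt = ("var", f"y{k + 1}")
            content = ("app", nxt, nxt)
        else:
            content = ("var", "x")
        t = ("sub", t, f"y{k}", content)
    return t
```

With this encoding `size(r(10))` is 59, whereas `unfold(r(10))` equals `tree(10)`, whose size is 2047.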
Invariance of head reduction via LHR is obtained in [ADL12] by proving that LHR correctly implements head reduction up to unfolding within, crucially, a quadratic overhead. This is how sharing is exploited to circumvent the size-explosion problem: the length of head derivations is a reasonable cost model even if head reduction suffers from size-explosion, because the actual implementation is meant to be done via LHR and to be only polynomially (actually quadratically) longer. Note that, a posteriori, we are allowed to forget about ES. They are an essential tool for the proof of invariance. But once invariance is established, one can provide reasonable complexity bounds by simply counting β-steps in the λ-calculus, with no need to deal with ES.
Of course, one needs to show that turning to shared representations is a reasonable choice, i.e. that using a term with ES outside the evaluation process does not hide an exponential overhead. Shared terms can in fact be managed efficiently, typically tested for equality of their unfoldings in time polynomial (actually quadratic [ADL12], or quasi-linear [GR14]) in the size of the shared terms. In Sect. 14, we will discuss another kind of test on shared representations.
1.3. Length-Explosion and Usefulness. It is clear that the computation of the full normal form x^{@n} of t_n x requires exponential work, so that the general case seems to be hopeless. In fact, there is a notion of linear LO reduction →LO [ABKL14], obtained by iterating LHR on the arguments, that computes normal forms and is linearly related to the size of the output. However, →LO cannot be polynomially related to the LO strategy →LOβ, because it produces an exponential output, and so it necessarily takes an exponential number of steps. In other words, size-explosion disguises itself as length-explosion. With respect to our example, →LO extends LHR evaluation by unfolding the whole variable tree in a LO way, reaching x^{@n}[y_1←y_2 y_2][y_2←y_3 y_3] ... [y_n←x] and leaving garbage [y_1←y_2 y_2][y_2←y_3 y_3] ... [y_n←x] that may eventually be collected. Note the exponential number of steps.
Getting out of this cul-de-sac requires avoiding useless duplication. Essentially, only substitution steps that contribute to eventually obtaining an unshared β-redex have to be done. The other substitution steps, which only unfold parts of the normal form, have to be avoided. Such a process then produces a minimal shared term whose unfolding is an ordinary normal form. The tricky point is how to define, and then select in reasonable time, those steps that contribute to eventually obtaining an unshared β-redex. The definition of useful reduction relies on tests of certain partial unfoldings that have an inherent global nature, which in a graphical formalism can be thought of as the unfolding of the sub-DAG rooted in a given sharing node. Of course, computing unfoldings takes in general exponential time, so that an efficient way of performing such tests has to be found.
The proper definition of useful reduction is postponed to Sect. 6, but we discuss here how it circumvents size-explosion. With respect to the example, useful reduction evaluates t_n x to the useful normal form (y_1 y_1)[y_1←y_2 y_2][y_2←y_3 y_3] ... [y_n←x], which unfolds to the exponentially bigger result x^{@n}. In particular, our example of a size-exploding family will be evaluated without performing any duplication at all, because the duplications needed to compute the normal form are all useless.
Defining and reasoning about useful reduction requires some care. At first sight, one may think that it is enough to evaluate a term t in a LO way, stopping as soon as a useful normal form is reached. Unfortunately, this simple approach does not work, because size-explosion may be caused by ES lying in between two β-redexes, so that LO evaluation would unfold the exploding substitutions anyway. Moreover, it is not possible to simply define useless terms and avoid their reduction. The reason is that usefulness and uselessness are properties of substitution steps, not of subterms. Said differently, whether a subterm is useful depends crucially on the context in which it occurs. An apparently useless argument may become useful if plugged into the right context. Indeed, consider the term u_n := (λx.(t_n x))I, obtained by plugging the size-exploding family in the context (λx.⟨·⟩)I, which abstracts x and applies to the identity I := λz.z. By delaying β-redexes we obtain:
u_n →dB* (y_1 y_1)[y_1←y_2 y_2][y_2←y_3 y_3] ... [y_n←x][x←I].
Now, in contrast to the size-explosion case, it is useful to unfold the whole variable tree x^{@n}, because the obtained copies of x will be substituted by I, generating exponentially many β-steps that compensate the explosion in size. Our notion of useful step will elaborate on this idea, by computing contextual unfoldings, to check if a substitution step contributes (or will contribute) to some future β-redex. Of course, we will have to show that such tests can themselves be performed in polynomial time.
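The claim that plugging the family into (λx.⟨·⟩)I yields exponentially many β-steps can be observed on small instances. The Python sketch below is only an illustration: it assumes the family t_1 := λy_1.(y_1 y_1), t_{n+1} := λy_{n+1}.(t_n (y_{n+1} y_{n+1})), a tuple encoding of terms chosen here, and a naive substitution that suffices because the only duplicated abstraction is the closed term I.

```python
# Illustrative encoding: ("var", x) | ("lam", x, body) | ("app", fun, arg)

I = ("lam", "z", ("var", "z"))            # the identity, lambda z.z

def subst(t, x, u):
    """Naive t{x<-u}; safe here because bound names are pairwise distinct."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def lo_step(t):
    """One leftmost-outermost beta step, or None if t is beta-normal."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":
            return subst(f[2], f[1], a)
        r = lo_step(f)
        if r is not None:
            return ("app", r, a)
        r = lo_step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = lo_step(t[2])
        return None if r is None else ("lam", t[1], r)
    return None

def family(n):
    """Assumed size-exploding family t_n (see the lead-in)."""
    t = ("lam", "y1", ("app", ("var", "y1"), ("var", "y1")))
    for k in range(2, n + 1):
        y = ("var", f"y{k}")
        t = ("lam", f"y{k}", ("app", t, ("app", y, y)))
    return t

def plugged(n):
    """u_n = (lambda x.(t_n x)) I."""
    return ("app", ("lam", "x", ("app", family(n), ("var", "x"))), I)

def lo_length(t):
    """Normal form and number of LO beta steps needed to reach it."""
    steps = 0
    while (r := lo_step(t)) is not None:
        t, steps = r, steps + 1
    return t, steps
```

In this sketch `lo_length(plugged(n))` returns I after n + 2^n steps (for instance 11 steps for n = 3): the exponential unfolding is now matched by an exponential number of β-steps.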
It is also worth mentioning that the contextual nature of useful substitution implies that, as a rewriting rule, it is inherently global: it cannot be first defined at top level (i.e. locally) and then extended via a closure by evaluation contexts, because the evaluation context has to be taken into account in the definition of the rule itself. Therefore, the study of useful reduction is delicate at the technical level, as proofs by naïve induction on evaluation contexts usually do not work.
1.4. The Role of Standard Derivations. Apart from the main result, we also connect the classic rewriting concept of standard derivation with the problem under study. Let us stress that such a connection is a plus, as it is not needed to prove the invariance theorem. We use it in the proof, but only to shed a new light on a well-established rewriting concept, and not because it is necessary.
The role of standard derivations is in fact twofold. On the one hand, LO β-derivations are standard, and thus our invariant cost model is justified by a classic notion of evaluation, internal to the theory of the λ-calculus and not ad-hoc. On the other hand, the linear useful strategy is shown to be standard for the LSC. Therefore, this notion, at first defined ad-hoc to solve the problem, turns out to fit the theory.
The paper also contains a general result about standard derivations for the LSC. We show they have the subterm property, i.e. every single step of a standard derivation ρ : t →* u is implementable in time linear in the size |t| of the input. It follows that the size of the output is linear in the length of the derivation, and so there is no size-explosion. Such a connection between standardization and complexity analysis is quite surprising, and it is one of the signs that a new complexity-aware rewriting theory of β-reduction is emerging.
At a first reading, we suggest reading Sect. 7, where an abstract view of the solution is provided, right after this section. In between (i.e. sections 2-6), there is the necessary long sequence of preliminary definitions and results. In particular, Sect. 6 will define useful reduction.

Rewriting
For us, an (abstract) reduction system is a pair (T, →_T) consisting of a set T of terms and a binary relation →_T on T called a reduction (relation). When (t, u) ∈ →_T we write t →_T u and we say that t T-reduces to u. The reflexive and transitive closure of →_T is written →*_T. Composition of relations is denoted by juxtaposition. Given k ≥ 0, we write t →^k_T u if t reduces to u in exactly k →_T-steps. Given a deterministic reduction system (T, →_T) and a term t ∈ T, the expression #→_T(t) stands for the number of reduction steps necessary to reach the →_T-normal form of t along →_T, or ∞ if t diverges. Similarly, given a natural number n, the expression →^n_T(t) stands for the term u such that t →^n_T u, if n ≤ #→_T(t), or for the normal form of t otherwise.
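For a deterministic reduction system, the two expressions just introduced can be read as simple iterations of a one-step function. The Python sketch below is only an illustration: the names `steps_to_normal_form`, `reduce_n` and the toy system `countdown` are ours, a step function is assumed to return `None` on normal forms, and divergence is approximated by a hypothetical `fuel` bound since ∞ cannot be detected in general.

```python
import math

def steps_to_normal_form(step, t, fuel=10_000):
    """Number of steps from t to its normal form; math.inf once fuel is exceeded."""
    n = 0
    while n <= fuel:
        u = step(t)
        if u is None:                     # t is a normal form
            return n
        t, n = u, n + 1
    return math.inf                       # treated as divergence (approximation)

def reduce_n(step, t, n):
    """The term reached after n steps, or the normal form if it comes earlier."""
    for _ in range(n):
        u = step(t)
        if u is None:
            break
        t = u
    return t

def countdown(k):
    """Toy deterministic system on naturals: k -> k - 1, with 0 normal."""
    return k - 1 if k > 0 else None
```

For example, `steps_to_normal_form(countdown, 5)` is 5, `reduce_n(countdown, 5, 3)` is 2, and `reduce_n(countdown, 5, 9)` is the normal form 0.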

λ-Calculus
3.1. Statics. The syntax of the λ-calculus is given by the following grammar for terms:
t, u, r, p ::= x | λx.t | tu.
We use t{x←u} for the usual (meta-level) notion of substitution. An abstraction λx.t binds x in t, and we silently work modulo α-equivalence of these bound variables, e.g. (λy.(xy)){x←y} = λz.(yz). We use fv(t) for the set of free variables of t.
Contexts. One-hole contexts are defined by:
C, D ::= ⟨·⟩ | λx.C | Ct | tC,
and the plugging of a term t into a context C is defined by
⟨·⟩⟨t⟩ := t;  (λx.C)⟨t⟩ := λx.C⟨t⟩;  (Cu)⟨t⟩ := C⟨t⟩u;  (uC)⟨t⟩ := uC⟨t⟩.
As usual, plugging in a context can capture variables, e.g. (λy.(⟨·⟩y))⟨y⟩ = λy.(yy). The plugging C⟨D⟩ of a context D into a context C is defined analogously. Plugging will be implicitly extended to all notions of contexts in the paper, always in the expected way.
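That plugging, unlike substitution, may capture variables is easy to see on a concrete encoding. The following Python sketch is only an illustration: the tuple encoding, the `HOLE` constant and the helper `fv` are our assumptions, and contexts are represented as terms containing exactly one hole.

```python
# Illustrative encoding: ("var", x) | ("lam", x, body) | ("app", fun, arg) | ("hole",)

HOLE = ("hole",)

def plug(c, t):
    """Plug t (a term or a context) into the hole of the one-hole context c."""
    if c[0] == "hole":
        return t
    if c[0] == "var":
        return c
    if c[0] == "lam":
        return ("lam", c[1], plug(c[2], t))   # no renaming: capture is allowed
    return ("app", plug(c[1], t), plug(c[2], t))

def fv(t):
    """Free variables of a term (the hole has none)."""
    if t[0] == "var":
        return {t[1]}
    if t[0] == "lam":
        return fv(t[2]) - {t[1]}
    if t[0] == "app":
        return fv(t[1]) | fv(t[2])
    return set()

# The context  lambda y.(<.> y)  and the term  y
C = ("lam", "y", ("app", HOLE, ("var", "y")))
y = ("var", "y")
```

Here `plug(C, y)` is the term λy.(yy): the variable y, free in the plugged term, is bound in the result, so `fv(plug(C, y))` is empty. Plugging a context into a context reuses the same function, e.g. `plug(C, ("app", HOLE, ("var", "z")))` is the context λy.((⟨·⟩z)y).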
3.2. Dynamics. We define β-reduction →β as follows:
(λx.t)u ↦β t{x←u};  C⟨t⟩ →β C⟨u⟩ if t ↦β u.
The position of a β-redex C⟨t⟩ →β C⟨u⟩ is the context C in which it takes place. To ease the language, we will identify a redex with its position. A derivation ρ : t →^k u is a finite, possibly empty, sequence of reduction steps, sometimes given as C_1; ...; C_k, i.e. as the sequence of positions of reduced redexes. We write |t| for the size of t and |ρ| for the length of ρ.
Leftmost-Outermost Derivations. The left-to-right outside-in order on redexes is expressed as an order on positions, i.e. contexts. Let us warn the reader about a possible source of confusion. The left-to-right outside-in order in the next definition is sometimes simply called left-to-right (or simply left) order. The former terminology is used when terms are seen as trees (where the left-to-right and the outside-in orders are disjoint), while the latter terminology is used when terms are seen as strings (where left-to-right is a total order). While the study of standardization for the LSC [ABKL14] uses the string approach (and thus only talks about the left-to-right order and the leftmost redex), here some of the proofs require a delicate analysis of the relative positions of redexes and so we prefer the more informative tree approach and define the order formally.
Definition 3.1.
(1) The outside-in order: (a) Root: [...]
The following are a few examples. For every context C, it holds that [...]

Inductive LOβ Contexts.
It is useful to have an inductive characterization of the contexts in which →LOβ takes place. We use the following terminology: a term is neutral if it is β-normal and it is not of the form λx.u, i.e. it is not an abstraction.
Definition 3.3 (iLOβ Context). Inductive LOβ (or iLOβ) contexts are defined by induction as follows: [...]
Proof. The left-to-right implication is by induction on C. The right-to-left implication is by induction on the definition of "C is iLOβ".

The Shallow Linear Substitution Calculus
4.1. Statics. The language of the linear substitution calculus (LSC for short) is given by the following grammar for terms:
t, u, r, p ::= x | λx.t | tu | t[x←u].
The constructor t[x←u] is called an explicit substitution (of u for x in t). Both λx.t and t[x←u] bind x in t. In general, we assume a strong form of Barendregt's convention: any two bound or free variables have distinct names. We also silently work modulo α-equivalence of bound variables to preserve the convention, e.g. [...]
The operational semantics of the LSC is parametric in a notion of (one-hole) context. General contexts simply extend the contexts for λ-terms with the two cases for explicit substitutions:
C ::= ⟨·⟩ | λx.C | Ct | tC | C[x←t] | t[x←C].
Along most of the paper, however, we will not need such a general notion of context. In fact, our study takes a simpler form if the operational semantics is defined with respect to shallow contexts, defined as (note the absence of the production t[x←S]):
S, S′, S′′ ::= ⟨·⟩ | λx.S | St | tS | S[x←t].
In the following, whenever we refer to a context without further specification, it is implicitly assumed that it is a shallow context. We write S ≺_p t if there is a term u such that S⟨u⟩ = t, and call it the prefix relation.
A special class of contexts is that of substitution contexts:
L ::= ⟨·⟩ | L[x←t].
Remark 4.1 (α-Equivalence for Contexts). While Barendregt's convention can always be achieved for terms, for contexts the question is subtler. Plugging in a context S, indeed, is not a capture-avoiding operation, so it is not stable by α-renaming S, as renaming can change the set of variables captured by S (if the hole of the context appears in the scope of the binder). Nonetheless, taking into account both the context S and the term t to be plugged into S, one can always rename both the bound variable in S and its free occurrences in t and satisfy the convention. Said differently, the contexts we consider are always obtained by splitting a term t as a subterm u and a context S such that S⟨u⟩ = t, so we assume that t has been renamed before splitting it into S and u, guaranteeing that S respects the convention. In particular, we shall freely assume that in t[x←u] and S[x←u] there are no free occurrences of x in u, as this can always be obtained by an appropriate α-conversion.
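4.2. Dynamics. The (shallow) rewriting rules →dB (dB = β at a distance) and →ls (ls = linear substitution) are given by the root rules
L⟨λx.t⟩u ↦dB L⟨t[x←u]⟩
S⟨x⟩[x←u] ↦ls S⟨u⟩[x←u]
closed by shallow contexts, i.e. S⟨t⟩ →dB S⟨u⟩ if t ↦dB u and S⟨t⟩ →ls S⟨u⟩ if t ↦ls u, and the union of →dB and →ls is simply noted →.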
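The two rules can be sketched at top level in Python. This is only an illustration under stated assumptions: the rules are taken to be L⟨λx.t⟩u ↦dB L⟨t[x←u]⟩ and S⟨x⟩[x←u] ↦ls S⟨u⟩[x←u], terms are encoded as nested tuples (our choice), closure by shallow contexts is omitted, the ls rule always picks the leftmost shallow occurrence (the calculus itself allows any), and no α-renaming of the copied term is performed.

```python
# Illustrative encoding:
#   ("var", x) | ("lam", x, body) | ("app", fun, arg) | ("sub", body, x, content)
# where ("sub", t, x, u) stands for t[x<-u].

def db_root(t):
    """Root dB step: L<lambda x.b> u  |->  L<b[x<-u]>, or None if not a dB redex."""
    if t[0] != "app":
        return None
    fun, arg = t[1], t[2]
    subs = []                             # the substitution context L, outermost first
    while fun[0] == "sub":
        subs.append((fun[2], fun[3]))
        fun = fun[1]
    if fun[0] != "lam":
        return None
    res = ("sub", fun[2], fun[1], arg)    # b[x<-u]
    for x, u in reversed(subs):           # rebuild L around it, innermost first
        res = ("sub", res, x, u)
    return res

def replace_first(t, x, u):
    """Replace the leftmost shallow occurrence of x in t by u; returns (term, done)."""
    if t[0] == "var":
        return (u, True) if t[1] == x else (t, False)
    if t[0] == "lam":
        b, done = replace_first(t[2], x, u)
        return ("lam", t[1], b), done
    if t[0] == "app":
        l, done = replace_first(t[1], x, u)
        if done:
            return ("app", l, t[2]), True
        r, done = replace_first(t[2], x, u)
        return ("app", t[1], r), done
    # explicit substitution: shallow contexts never enter the content of an ES
    b, done = replace_first(t[1], x, u)
    return ("sub", b, t[2], t[3]), done

def ls_root(t):
    """Root ls step: S<x>[x<-u]  |->  S<u>[x<-u], or None if x has no shallow occurrence."""
    if t[0] != "sub":
        return None
    body, done = replace_first(t[1], t[2], t[3])
    return ("sub", body, t[2], t[3]) if done else None
```

As a usage example, starting from (λy1.(y1 y1))x, `db_root` yields (y1 y1)[y1←x] and `ls_root` then yields (x y1)[y1←x]: a single occurrence is replaced and the substitution is kept.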
Let us point out a slight formal abuse of our system: rule →ls does not preserve Barendregt's convention (shortened BC), as it duplicates the bound names in u, so BC is not stable by reduction. To preserve BC it would be enough to replace the target term with S⟨u^α⟩[x←u], where u^α is an α-equivalent copy of u such that all bound names in u have been replaced by fresh and distinct names. Such a renaming can be done while copying u and thus does not affect the complexity of implementing →ls. In order to lighten this already technically demanding paper, however, we decided to drop an explicit and detailed treatment of α-equivalence, and so we simply stick to S⟨u⟩[x←u], leaving the renaming implicit.
The implicit use of BC also rules out a few degenerate rewriting sequences. For instance, the following degenerate behavior is not allowed because the initial term does not respect BC: [...] By α-equivalence we rather have the following evaluation sequence, ending on a normal form: [...]
The just defined shallow fragment simply ignores garbage collection (that in the LSC can always be postponed [Acc12]) and lacks some of the nice properties of the LSC (obtained simply by replacing shallow contexts by general contexts). Its relevance lies in the fact that it is the smallest fragment implementing linear LO reduction (see forthcoming Definition 4.5). The following are examples of shallow steps: [...] while the following are not: [...]
With respect to the literature on the LSC we slightly abuse the notation, as →dB and →ls are usually used for the unrestricted versions, while here we adopt them for their shallow variants. Let us also warn the reader of a possible source of confusion: in the literature there exists an alternative notation and terminology in use for the LSC, stressing the linear logic interpretation, for which →dB is noted →m and called multiplicative (cut-elimination rule) and →ls is noted →e and called exponential.
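Taking the external context into account, a substitution step has the following explicit form: $S'\langle S\langle x\rangle[x{\leftarrow}u]\rangle \to_{\mathtt{ls}} S'\langle S\langle u\rangle[x{\leftarrow}u]\rangle$. We shall often use a compact form: $S''\langle x\rangle \to_{\mathtt{ls}} S''\langle u\rangle$, where it is implicitly assumed that $S'' = S'\langle S[x{\leftarrow}u]\rangle$. Since every →ls step has a unique compact form, and a shallow context is the compact form of at most one →ls step, it is natural to use the compact context of a →ls step as its position.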
Definition 4.2. Given a →dB-redex $S\langle t\rangle \to_{\mathtt{dB}} S\langle u\rangle$ with $t \mapsto_{\mathtt{dB}} u$ (a root step) or a compact →ls-redex $S\langle x\rangle \to_{\mathtt{ls}} S\langle t\rangle$, the position of the redex is the context S.

As for λ-calculus, we identify a redex with its position, thus using S, S′, S′′ for redexes, and use $\rho : t \to^{k} u$ for (possibly empty) derivations. We write $|t|_{[\cdot]}$ for the number of substitutions in t and $|\rho|_{\mathtt{dB}}$ for the number of dB-steps in ρ.

Linear LO Reduction.
We redefine the LO order on contexts to accommodate ES.
Definition 4.3. The following definitions are given with respect to general (not necessarily shallow) contexts, even if apart from Sect. 11 we will use them only for shallow contexts.
(1) The outside-in order : (a) Root: Note that ≺ O can be seen as the prefix relation ≺ p on contexts.
(2) The left-to-right order : C ≺ L D is defined by: (a) Application: Note that the outside-in order ≺ O can be seen as the prefix relation ≺ p on contexts.
The next lemma guarantees that we defined a total order.

Proof. By induction on t.
Remember that we identify redexes with their position context and write S ≺ LO S ′ .We can now define LO reduction in the LSC, first considered in [ABKL14].
Definition 4.5 (LO Linear Reduction →LO). Let t be a term and S a redex of t. S is the leftmost-outermost (LO for short) redex of t if S ≺LO S′ for every other redex S′ of t. We write t →LO u if a step reduces the LO redex.
Technical Remark.Note that one cannot define → LO as the union of the two natural rules → LOdB and → LOls , reducing the LO dB and ls redexes, respectively.For example, if , because the LO redex has to be chosen among both dB and ls redexes.Therefore, we will for instance say given a → LO dB-step and not given the LO dB-step.

Unfoldings
In Sect. 1, we defined the unfolding $t{\to}$ of a term t (Definition 1.2, page 7). Here we extend it in various ways. We first define context unfoldings, then we generalize the unfolding (of both terms and contexts) relatively to a context, and finally we unfold shallow derivations.

5.1. Unfolding Contexts. Shallowness is crucial here: the unfolding of a shallow context is still a context, because the hole cannot be duplicated by unfolding, being out of all ES. First of all, we define substitution on (general) contexts:

Note that the definition of $S\{x{\leftarrow}u\}$ assumes that the free variables of u are not captured by S (that means that for instance $y \notin \mathtt{fv}(u)$ in $(\lambda y.C)\{x{\leftarrow}u\}$). This can always be achieved by α-renaming S (according to Remark 4.1).
And then define context unfolding $S{\to}$ as:

We have the following properties.

Lemma 5.1. Let S be a shallow context. Then: (1) $S{\to}$ is a context; (2) $S\langle t\rangle\{x{\leftarrow}u\} = S\{x{\leftarrow}u\}\langle t\{x{\leftarrow}u\}\rangle$;

Proof. By induction on S.
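To make the notion of unfolding concrete, the following is a minimal sketch of the unfolding of a term with ES into an ordinary λ-term. It assumes the usual clauses for Definition 1.2 (which is not restated in this section): unfolding commutes with variables, abstractions and applications, and turns t[x←u] into the implicit substitution of the unfolding of u in the unfolding of t. The tuple encoding and the names `subst` and `unfold` are hypothetical, and Barendregt's convention is assumed, so no capture-avoiding renaming is performed.

```python
# Terms as nested tuples (encoding assumed for this sketch only):
#   ("var", x) | ("lam", x, body) | ("app", fun, arg) | ("es", body, x, content)
# where ("es", t, x, u) stands for t[x<-u].

def subst(t, x, u):
    """Implicit substitution t{x<-u} on lambda-terms (no ES inside t).

    Assumes Barendregt's convention: no bound name of t occurs free in u.
    """
    tag = t[0]
    if tag == "var":
        return u if t[1] == x else t
    if tag == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def unfold(t):
    """Turn every explicit substitution of t into an implicit one."""
    tag = t[0]
    if tag == "var":
        return t
    if tag == "lam":
        return ("lam", t[1], unfold(t[2]))
    if tag == "app":
        return ("app", unfold(t[1]), unfold(t[2]))
    # ("es", body, x, content): unfold both sides, then substitute
    return subst(unfold(t[1]), t[2], unfold(t[3]))
```

For example, the encoding of x[x←yy][y←zz] unfolds to the encoding of (zz)(zz): each ES layer may duplicate what the inner layers produced, which is how a small shared term can stand for a much larger λ-term.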
An important notion of context will be that of applicative context, i.e. of context whose hole is applied to an argument, and that if plugged with an abstraction provides a dB-redex.

Definition 5.2 (Applicative Context). An applicative context is a context $A ::= S\langle L\,t\rangle$, where S and L are a shallow and a substitution context, respectively.

Note that applicative contexts are not made out of applications only: $t(\lambda x.(\langle\cdot\rangle[y{\leftarrow}u]\,r))$ is an applicative context.
5.2. Relative Unfoldings. Useful reduction will require a more general notion of unfolding and context unfolding. The usefulness of a redex, in fact, will depend crucially on the context in which it takes place. More precisely, it will depend on the unfolding of the term extended with the substitutions that the surrounding context can provide; this is the unfolding of a term relative to a context. Moreover, relative unfoldings will also be needed for contexts.
Definition 5.4 (Relative Unfolding).Let S be a (shallow) context (verifying, as usual, Barendregt's convention-see also the remark after this definition).The unfolding t → S of a term t relative to S and the unfolding S ′ → S of a (shallow) context S ′ relative to S are defined by: For instance, (xy) ). Remark 5.5 (Relative Unfoldings and Barendregt's Convention).Let us point out that the definition of relative unfolding t → S relies crucially on the use of Barendregt's convention for contexts (according to Remark 4.1).For contexts not respecting the convention, in fact, the definition does not give the intended result.For instance, We also state some further properties of relative unfolding, to be used in the proofs, and proved by easy inductions.
Lemma 5.6 (Properties of Relative Unfoldings).Let t and u be terms and S and S ′ be shallow contexts.
(1) Well-Definedness: S → S ′ is a context.(2) Commutation: the following equalities hold (and in those on the right S is assumed to not capture x) (tu) As expected, linear substitution steps do not modify the unfolding, as the next lemma shows.Its proof is a nice application of the properties of contexts and (relative) unfoldings, allowed by the restriction to shallow contexts (the property is valid more generally for unrestricted ls steps, but we will need it only for the shallow ones).
Instead, dB-steps project to β-steps. Because of shallowness, we actually obtain a strong form of projection, as every dB-step projects on a single β-step. We are then allowed to identify dB and β-redexes.
Lemma 5.8 (→ dB Strongly Projects on → β ).Let t be a LSC term and Proof.Let t = S r → dB S p = u with r → dB p.We show that S where the first equality in the South-East corner is given by the fact that x does not occur in L and the variables on which L substitutes do not occur in s, as is easily seen by looking at the starting term.Thus the implicit substitutions L and {x s → } commute.
(3) Left of an application S = S ′ q.By Lemma 5.6.2we know that t . Using the i.h.we derive the following diagram: (4) Right of an application S = qS ′ .By Lemma 5.6.2we know that t Using the i.h.we derive the following diagram: The following calculation concludes the proof:

Useful Derivations
In this section we define useful reduction, a constrained, optimized reduction, that will be the key to the invariance theorem. The idea is that an optimized substitution step

We consider the step in (6.1) a case of relative duplication because yy contains a β-redex up to relative unfolding in its context, as we have $(yy){\to}_{\langle\cdot\rangle[y{\leftarrow}\lambda z.z]} = (\lambda z.z)(\lambda z.z)$, and thus duplicating yy duplicates a β-redex, up to unfolding.

Similarly, a case of relatively useful creation is given by:

Again, the step itself does not create a dB-redex, but, up to unfolding, it substitutes an abstraction, because $y{\to}_{\langle\cdot\rangle[y{\leftarrow}\lambda z.z]} = \lambda z.z$, and the context is applicative (note that a context is applicative iff it is applicative up to unfolding, by Lemma 5.3).
The actual definition of useful reduction captures at the same time absolute and relative cases by means of relative unfoldings.

Definition 6.1 (Useful/Useless Steps and Derivations). A useful step is either a dB-step or a ls-step $S\langle x\rangle \to_{\mathtt{ls}} S\langle r\rangle$ (in compact form) such that:
(1) Relative Duplication: either $r{\to}_{S}$ contains a β-redex, or
(2) Relative Creation: $r{\to}_{S}$ is an abstraction and S is applicative.
A useless step is a ls-step that is not useful. A useful derivation (resp. useless derivation) is a derivation whose steps are useful (resp. useless).
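The two clauses of the definition can be tested naively, as in the following sketch. It is only an illustration under stated assumptions: the tuple encoding of terms, the representation of a shallow context as a list of frames, and all function names are hypothetical; the relative unfolding is computed by applying, from the hole outwards, the substitutions provided by the context; Barendregt's convention is assumed. This direct check may take time exponential in the size of the term, since unfoldings can be exponentially large; it is not the polynomial test that the selection property requires.

```python
# Terms: ("var", x) | ("lam", x, t) | ("app", t, u) | ("es", t, x, u) for t[x<-u].
# Shallow context: list of frames, outermost first (encoding assumed here):
#   ("lam", x)    hole under an abstraction
#   ("appl", u)   hole on the left of an application  <.> u
#   ("appr", t)   hole on the right of an application t <.>
#   ("es", x, u)  hole in the body of an ES           <.>[x<-u]

def subst(t, x, u):
    """t{x<-u} on lambda-terms, assuming Barendregt's convention."""
    if t[0] == "var":
        return u if t[1] == x else t
    if t[0] == "lam":
        return t if t[1] == x else ("lam", t[1], subst(t[2], x, u))
    return ("app", subst(t[1], x, u), subst(t[2], x, u))

def unfold(t):
    if t[0] == "var":
        return t
    if t[0] == "lam":
        return ("lam", t[1], unfold(t[2]))
    if t[0] == "app":
        return ("app", unfold(t[1]), unfold(t[2]))
    return subst(unfold(t[1]), t[2], unfold(t[3]))

def has_beta_redex(t):
    if t[0] == "var":
        return False
    if t[0] == "lam":
        return has_beta_redex(t[2])
    return t[1][0] == "lam" or has_beta_redex(t[1]) or has_beta_redex(t[2])

def relative_unfold(r, ctx):
    """Unfold r, then apply the ES of ctx from the innermost to the outermost."""
    r = unfold(r)
    for frame in reversed(ctx):
        if frame[0] == "es":
            r = subst(r, frame[1], unfold(frame[2]))
    return r

def is_applicative(ctx):
    """The hole, possibly wrapped in ES, is on the left of an application."""
    for frame in reversed(ctx):
        if frame[0] == "es":
            continue
        return frame[0] == "appl"
    return False

def is_useful_ls(r, ctx):
    """Naive check that substituting r at the hole of ctx is a useful ls-step."""
    ru = relative_unfold(r, ctx)
    return has_beta_redex(ru) or (ru[0] == "lam" and is_applicative(ctx))
```

With I = λz.z, substituting yy for x in x[x←yy][y←I] is useful by relative duplication, and substituting y for x in (x w)[x←y][y←I] is useful by relative creation, while substituting y for x in x[x←y][y←I] is useless: the relative unfolding is an abstraction but the context is not applicative.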
Note that a useful normal form, i.e. a term that is normal for useful reduction, is not necessarily a normal form. For instance, the reader can now verify that the compact normal form we discussed in Sect. 1, namely

is a useful normal form, but not a normal form.

As a first sanity check for useful reduction, we show that as long as there are useful substitution steps to do, the unfolding is not →β-normal.
Proof. The equality $S\langle x\rangle{\to} = S\langle t\rangle{\to}$ holds in general for →ls-steps (Lemma 5.7). For the existence of a β-redex, note that $S\langle t\rangle{\to} = S{\to}\langle t{\to}_{S}\rangle$ by Lemma 5.6.6a, and that $S{\to}$ is applicative iff S is applicative by Lemma 5.3.2. Then by relative duplication or relative creation there is a →β-redex in $S\langle x\rangle{\to}$.
We can finally define the strategy that will be shown to implement LO β-reduction within a polynomial overhead.

Definition 6.3 (LO Useful Reduction →LOU). Let t be a term and S a redex of t. S is the leftmost-outermost useful (LOU for short) redex of t if S ≺LO S′ for every other useful redex S′ of t. We write t →LOU u if a step reduces the LOU redex.
6.1. On Defining Usefulness via Residuals. Note that useful steps concern future creations of β-redexes and yet circumvent the explicit use of residuals, relying on relative unfoldings only. It would be interesting, however, to have a characterization based on residuals. We actually spent time investigating such a characterization, but we decided to leave it to future work. We think that it is informative to know the reasons, which are the following:
(1) a definition based on residuals is not required for the final result of this paper;
(2) the definition based on relative unfoldings is preferable, as it allows the complexity analysis required for the final result;
(3) we believe that the case studied in this paper, while certainly relevant, is not enough to develop a general, abstract theory of usefulness. We feel that more concrete examples should first be developed, for instance in call-by-value and call-by-need scenarios, and comparing weak and strong variants, extending the language with continuations or pattern matching, and so on. The complementary study in [ASC15], indeed, showed that the weak call-by-value case already provides different insights, and that useful sharing as studied here is only an instance of a more general concept;
(4) we have a candidate characterization of useful reduction using residuals, for which however one needs sophisticated rewriting theory. It probably deserves to be studied in another paper. Our candidate characterization relies on a less rigid order between redexes of the LSC than the total order ≺LO considered here, namely the partial box order ≺box studied in [ABKL14]. Our conjecture is that an →ls redex S is useful iff it is shallow and (a) there is a (not necessarily shallow) →dB redex C such that S ≺box C, or (b) S creates a shallow →dB redex, or (c) there is a (not necessarily shallow) →ls redex C such that S ≺box C and there exists a residual D of C after S that is useful. Coming back to the previous point, we feel that such an abstract characterization, assuming it holds, is not really satisfying, as it relies too much on the concrete notion of shallow redex. It is probably necessary to abstract away from a few cases in order to find the right notion. An obstacle, however, is that the rewriting theory developed in [ABKL14] has yet to be adapted to call-by-value and call-by-need.
To conclude, while having a residual theory of useful sharing is certainly both interesting and challenging, it is also certainly not necessary in order to begin a theory of cost models for the λ-calculus.

The Proof, Made Abstract
Here we describe the architecture of our proof, decomposing it, and proving the implementation theorem from a few abstract properties. The aim is to provide a tentative recipe for a general proof of invariance for functional languages.

We want to show that a certain abstract strategy ⇝ for the λ-calculus provides a unitary and invariant cost model, i.e. that the number of ⇝ steps is a measure polynomially related to the number of transitions on a Turing machine or a RAM.

In our case, ⇝ will be LO β-reduction →LOβ. Such a choice is natural, as →LOβ is normalizing, it produces standard derivations, and it is an iteration of head reduction.

Because of size-explosion in the λ-calculus, we have to add sharing, and our framework for sharing is the (shallow) linear substitution calculus, which plays the role of a very abstract intermediate machine between λ-terms and Turing machines. Our encoding will address an informal notion of algorithm rather than Turing machines. The algorithms will be clearly implementable with polynomial overhead but details of the implementation will not be discussed (see however Sect. 16).

In the LSC, →LOβ is implemented by LO useful reduction →LOU. We say that →LOU is a partial strategy of the LSC, because the useful restriction forces it to stop on compact normal forms, that in general are not normal forms of the LSC. Let us be abstract, and replace →LOU with a general partial strategy ⇝X within the LSC. We want to show that ⇝X is invariant with respect to both ⇝ and RAM. Then we need two theorems, which together (when instantiated to the strategies →LOβ and →LOU) yield the main result of the paper:
(1) High-Level Implementation: ⇝ terminates iff ⇝X terminates. Moreover, ⇝ is implemented by ⇝X with only a polynomial overhead. Namely, $t \rightsquigarrow_X^{k} u$ iff $t \rightsquigarrow^{h} u{\to}$ with k polynomial in h (our actual bound will be quadratic);
(2) Low-Level Implementation: ⇝X is implemented on a RAM with an overhead in time which is polynomial in both k and the size of t.
7.1.High-Level Implementation.The high-level half relies on the following notion.
Definition 7.1 (High-Level Implementation System). Let ⇝ be a deterministic strategy on λ-terms and ⇝X a partial strategy of the shallow LSC. The pair (⇝, ⇝X) is a high-level implementation system if whenever t is a LSC term it holds that:

Moreover, it is locally bounded if whenever t is a λ-term and $\rho : t \rightsquigarrow_X^{*} u$ then the length of a sequence of ls-steps from u is linear in the number $|\rho|_{\mathtt{dB}}$ of (the past) dB-steps in ρ.
The normal form and projection properties address the qualitative part of the high-level implementation theorem, i.e. the part about termination. The normal form property guarantees that ⇝X does not stop prematurely, so that when ⇝X terminates ⇝ cannot keep going. The projection property guarantees that termination of ⇝ implies termination of ⇝X. It also states a stronger fact: ⇝ steps can be identified with the dB-steps of the ⇝X strategy. Note that the fact that one →dB step projects on exactly one →β-step is a general property of the shallow LSC, given by Lemma 5.8. The projection property then requires that the steps selected by the two strategies coincide up to unfolding.

The local boundedness property is instead used for the quantitative part of the theorem, i.e. to provide the polynomial bound. A simple argument indeed bounds the global number of ls-steps in a ⇝X derivation with respect to the number of dB-steps, that (by the identification of β and dB redexes) is exactly the length of the associated ⇝ derivation.
(1) ⇐) Suppose that t is ⇝X-normalizable and let $\rho : t \rightsquigarrow_X^{*} u$ be a derivation to ⇝X-normal form. By the projection property there is a derivation $t \rightsquigarrow^{*} u{\to}$. By the normal form property $u{\to}$ is a ⇝-normal form.
⇒) Suppose that t is ⇝-normalizable and let $\tau : t \rightsquigarrow^{k} u$ be the derivation to ⇝-normal form (unique by determinism of ⇝). Assume, by contradiction, that t is not ⇝X-normalizable. Then there is a family of ⇝X-derivations $\rho_i : t \rightsquigarrow_X^{i} u_i$ with $i \in \mathbb{N}$, each one extending the previous one. By the local boundedness, ⇝X can make only a finite number of consecutive ls-steps (more generally, →ls is strongly normalizing in the LSC). Then the sequence $\{|\rho_i|_{\mathtt{dB}}\}_{i\in\mathbb{N}}$ is non-decreasing and unbounded. By the projection property, the family $\{\rho_i\}_{i\in\mathbb{N}}$ unfolds to a family of ⇝-derivations $\{\rho_i{\to}\}_{i\in\mathbb{N}}$ of unbounded length (in particular greater than k), absurd.
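(2) From Lemma 5.7 (→ls projects on =) and Lemma 5.8 (a single shallow →dB projects on a single →β step) we obtain a derivation $\rho{\to} : t \rightsquigarrow^{h} u{\to}$ with $h = |\rho|_{\mathtt{dB}}$. Now, ρ has the shape:

$$t = r_1 \to_{\mathtt{dB}}^{a_1} p_1 \to_{\mathtt{ls}}^{b_1} r_2 \to_{\mathtt{dB}}^{a_2} p_2 \to_{\mathtt{ls}}^{b_2} \cdots\; r_k \to_{\mathtt{dB}}^{a_k} p_k \to_{\mathtt{ls}}^{b_k} u$$

By the local boundedness, we obtain $b_i \le c\cdot\sum_{j=1}^{i} a_j$ for some constant c. Then:

$$|\rho|_{\mathtt{ls}} = \sum_{i=1}^{k} b_i \le c\cdot\sum_{i=1}^{k}\sum_{j=1}^{i} a_j$$

Note that $\sum_{j=1}^{i} a_j \le \sum_{j=1}^{k} a_j = |\rho|_{\mathtt{dB}}$ and $k \le |\rho|_{\mathtt{dB}}$. So

$$|\rho|_{\mathtt{ls}} \le c\cdot\sum_{i=1}^{k} |\rho|_{\mathtt{dB}} = c\cdot k\cdot|\rho|_{\mathtt{dB}} \le c\cdot|\rho|_{\mathtt{dB}}^{2},$$

and therefore $|\rho| = |\rho|_{\mathtt{dB}} + |\rho|_{\mathtt{ls}} = O(|\rho|_{\mathtt{dB}}^{2})$.

Note that the properties of the implementation hold for all derivations (and not only for those reaching normal forms). In fact, they even hold for derivations in strongly diverging terms. In this sense, our cost model is robust.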

Low-Level Implementation.
For the low-level part we define three basic requirements.
Definition 7.3. A partial strategy ⇝X on LSC terms is efficiently mechanizable if given a derivation $\rho : t \rightsquigarrow_X^{*} u$:
(1) No Size-Explosion: |u| is polynomial in |t| and |ρ|;
(2) Step: every redex of u can be implemented in time polynomial in |u|;
(3) Selection: the search for the next ⇝X redex to reduce in u takes polynomial time in |u|.

The first two properties are natural. At first sight the selection property is always trivially verified: finding a redex in u takes time linear in |u|. However, our strategy for ES will reduce only redexes satisfying a side-condition whose naïve verification takes exponential time in |u|. Then one has to be sure that such a computation can be done in polynomial time.

Theorem 7.4 (Low-Level Implementation). Let ⇝X be an efficiently mechanizable strategy, t a λ-term, and k a number of steps. Then there is an algorithm that outputs $\rightsquigarrow_X^{k}(t)$, and which works in time polynomial in k and |t|.

Proof. The algorithm for computing $\rightsquigarrow_X^{k}(t)$ is obtained by iteratively searching for the next ⇝X redex to reduce and then reducing it, by using the algorithms given by the step and selection properties. The complexity is obtained by summing the polynomials given by the step and selection properties, that are in turn composed with the polynomial of the no size-explosion property. Since polynomials are closed by sum and composition, the algorithm works in polynomial time.
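The algorithm described in the proof is a plain select-then-reduce loop. The sketch below shows its shape only; the names `run_strategy`, `select` and `step` are hypothetical, and the toy rewriting system used to exercise the loop is a stand-in chosen for brevity, not the LSC.

```python
def run_strategy(term, k, select, step):
    """Perform at most k steps of a strategy given by `select` and `step`.

    select(term) returns the next redex to fire, or None on a normal form;
    step(term, redex) returns the reduct. Returns (final_term, steps_done).
    """
    steps_done = 0
    while steps_done < k:
        redex = select(term)
        if redex is None:          # normal form reached before k steps
            break
        term = step(term, redex)
        steps_done += 1
    return term, steps_done


# Toy stand-in for a rewriting system (NOT the LSC): a "term" is a list of
# counters, the "redex" is the index of the leftmost positive counter, and a
# step decrements that counter.
def toy_select(term):
    for i, n in enumerate(term):
        if n > 0:
            return i
    return None

def toy_step(term, i):
    return term[:i] + [term[i] - 1] + term[i + 1:]
```

The total running time is the sum, over the iterations, of the cost of one selection and one step. If both are polynomial in the size of the current term, and that size is polynomial in |t| and in the number of steps performed, then the whole loop is polynomial in k and |t|.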
In [ADL12], we proved that head reduction and linear head reduction form a locally bounded high-level implementation system and that linear head reduction is efficiently mechanizable (but note that [ADL12] does not employ the terminology we use here).
Taking ⇝X as the LO strategy →LO of the LSC almost does the job. Indeed, →LO is efficiently mechanizable and (→LOβ, →LO) is a high-level implementation system. Unfortunately, it is not a locally bounded implementation, because of the length-explosion example given in Sect. 1, and thus invariance does not hold. This is why useful reduction is required.

7.3. Efficient Mechanizability and Subterms. We have been very lax in the definition of efficiently mechanizable strategies. The strategy that we will consider has the following additional property.

Definition 7.5 (Subterm). A partial strategy ⇝X on LSC terms has the subterm property if given a derivation $\rho : t \rightsquigarrow_X^{*} u$ the terms duplicated along ρ are subterms of t.

The subterm property in fact enforces linearity in the no size-explosion and step properties, as the following immediate lemma shows. (2) Linear Step: every redex of u can be implemented in time linear in |t|;

The subterm property is fundamental and common to most implementations of functional languages [Jon96, SGM02, ADL12, ABM14], and for implementations and their complexity analysis it plays the role of the subformula property in sequent calculus. It is sometimes called semi-compositionality [Jon96]. We will show that every standard derivation for the LSC has the subterm property. To the best of our knowledge, instead, no strategy of the λ-calculus has the subterm property. We are not aware of a proof of the nonexistence of strategies for β-reduction with the subterm property, though. For a fixed strategy, however, it is easy to build a counterexample, as β-reduction substitutes everywhere, in particular in the argument of applications that can later become a redex. The reason why the subterm property holds for many micro-step strategies, indeed, is precisely the fact that they do not substitute in such arguments, see also Sect. 12.
The reader may wonder, then, why we did not ask the subterm property of an efficiently mechanizable strategy.The reason is that we want to provide a general abstract theory, and there may very well be strategies that are efficiently mechanizable but that do not satisfy the subterm property, or, rather, that satisfy only some relaxed form of it.
Let us also point out that in the subterm property the word subterm conveys the good intuition but is slightly abused: since evaluation is up to α-equivalence, a subterm of t is actually a subterm up to variable names, both free and bound. More precisely: define $r^{-}$ as r in which all variables (including those appearing in binders) are replaced by a fixed symbol $*$. Then, we will consider u to be a subterm of t whenever $u^{-}$ is a subterm of $t^{-}$ in the usual sense. The key property ensured by this definition is that the size |u| of u is bounded by |t|, which is what is actually relevant for the complexity analysis.
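The notion of subterm up to variable names can be phrased as a small check: erase all names, then test for an ordinary subterm. The following sketch does exactly that; the tuple encoding of terms and the function names are assumptions made for the illustration.

```python
# Terms: ("var", x) | ("lam", x, t) | ("app", t, u) | ("es", t, x, u) for t[x<-u].

def erase(t):
    """Replace every variable name, including binders, by the symbol '*'."""
    tag = t[0]
    if tag == "var":
        return ("var", "*")
    if tag == "lam":
        return ("lam", "*", erase(t[2]))
    if tag == "app":
        return ("app", erase(t[1]), erase(t[2]))
    return ("es", erase(t[1]), "*", erase(t[3]))

def subterms(t):
    """Yield t and all of its subterms (names are strings and are skipped)."""
    yield t
    for child in t[1:]:
        if isinstance(child, tuple):
            yield from subterms(child)

def size(t):
    """Number of constructors in t."""
    return sum(1 for _ in subterms(t))

def is_subterm_up_to_names(u, t):
    """True when the erasure of u is a subterm of the erasure of t."""
    target = erase(u)
    return any(s == target for s in subterms(erase(t)))
```

For instance, λx.x is a subterm up to names of (λy.y)z, while xx is not. Whenever the check succeeds, `size(u) <= size(t)` holds, which is the bound used by the complexity analysis.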

Road Map
We need to ensure that LOU derivations are efficiently mechanizable and form a high-level implementation system when paired with LOβ derivations. These are non-trivial properties, with subtle proofs in the following sections. The following schema is designed to help the reader follow the master plan:
(1) we will show that (→LOβ, →LOU) is a high-level implementation system, by showing
  (a) the normal form property in Sect. 9;
  (b) the projection property in Sect. 10, by introducing LOU contexts;
(2) we will prove the local boundedness property of (→LOβ, →LOU) through a detour via standard derivations. The detour is in three steps:
  (a) the introduction of standard derivations in Sect. 11, that are shown to have the subterm property;
  (b) the proof that LOU derivations are standard in Sect. 12, and thus have the subterm property;
  (c) the proof that LOU derivations have the local boundedness property, that relies on the subterm and normal form properties;
(3) we will prove that LOU derivations are efficiently mechanizable by showing:
  (a) the no size-explosion and step properties, that at this point are actually already known to hold, because they follow from the subterm property (Lemma 7.6);
  (b) the selection property, by exhibiting a polynomial algorithm to test whether a redex is useful or not, in Sect. 14.

In Sect. 15, we will put everything together, obtaining an implementation of the λ-calculus with a polynomial overhead, from which invariance follows.

The Normal Form Property
To show the normal form property we first have to generalize it to the relative unfolding in a context, in the next lemma, and then obtain it as a corollary.The statement of the lemma essentially says that for a useful normal form u in a context S the unfolding u → S either unfolds to a → β -normal form or it has a useful substitution redex provided by S. The second case is stated in a more technical way, spelling out the components of the redex, and will be used twice later on, in Lemma 10.2 in Sect. 10 (page 26) and Proposition 13.3 in Sect.13 (page 35).To have a simpler look at the statement we suggest to ignore cases 2(b)i and 2(b)ii at a first reading.(2) Abstraction u = λy.r.Follows from the i.h.applied to r and λx.• .
(3) Application u = rp.Note that u → S = 5.6.2r → S p → S and that r is a useful normal form, since u is.We can then apply the i.h. to r and S • p .There are two cases: (a) r → S • p = r → S is a normal form.Sub-cases: (i) r → S is an abstraction λy.q.This is the interesting inductive case, because it is the only one where the i.h.provides case 1 for r but the case ends proving case 2, actually 2(b)ii, for u.In the other cases of the proof (3(a)ii, 3(b), and 4), instead, the case of the statement provided by the i.h. is simply propagated mutatis mutandis.Note that r cannot have the form L λy.s , because otherwise u would not be → LOU normal.Then it follows that r = L x (as r cannot be an application).For the same reason, r → cannot have the form λy.s.Then r → = z for some variable z (possibly z = x).Now take S ′ := Lp.Note that S ′ is applicative and that x → S Lp = 5.6.6 L x → S • p = L x → S = r → S = λy.q.So we are in case 2(b)ii and there is a useful → ls -redex of position S S ′ .(ii) r is not an abstraction.Note that r → S is neutral.Then the statement follows from the i.h.applied to p and S r • .Indeed, if p Corollary 9.2 (Normal Form Property).Let t be a useful normal form.Then t → is a β-normal form.
Proof.Let us apply Lemma 9.1 to S := • and u := t.Since S = • , case 2 cannot hold, so that t At the end of the next section we will obtain the converse implication (Corollary 10.7.1), as a corollary of the strong projection property.
Let us close this section with a comment. The auxiliary lemma for the normal form property is of a technical nature. Actually, it is a strong improvement (inspired by [ASC15]) over the sequence of lemmas we provided in the technical report [ADL14b], that followed a different (worse) proof strategy. At a first reading the lemma looks very complex and it is legitimate to suspect that we did not fully understand the property we proved. We doubt, however, the existence of a much simpler proof, and believe that, despite its technical nature, our proof is compact. There seems to be an inherent difficulty given by the fact that useful reduction is a global notion, in the sense that it is not a rewriting rule closed by evaluation contexts, but it is defined by looking at the whole term at once. Its global character seems to prevent a simpler proof.

The Projection Property
We now turn to the proof of the projection property. We already know that a single shallow dB-step projects to a single β-step (Lemma 5.8). Therefore it remains to be shown that →LOU dB-steps project on LOβ steps. We do it contextually, in three steps:
(1) giving a (non-inductive) notion of LOU context, and proving that if a redex S is a →LOU redex then S is a LOU context;
(2) proving that LOU contexts admit an inductive formulation;
(3) proving that inductive LOU contexts unfold to inductive LOβ contexts, that is, where LOβ steps take place.
As for the normal form property, the proof strategy is inspired by [ASC15], and improves the previous proof in the technical report [ADL14b].

10.1. LOU Contexts. Remember that a term is neutral if it is β-normal and is not of the form λx.u (i.e. it is not an abstraction).
The next lemma shows that →LOU redexes take place in LOU contexts. In the last sub-case the proof uses the generalized form of the normal form property (Lemma 9.1).

Lemma 10.2 (The Position of a →LOU Step is a LOU Context). Let S : t → u be a useful step. If S is a →LOU step then S is LOU.

Proof. Properties in the definition of LOU contexts:
(1) Outermost: if $S = S'\langle\lambda x.S''\rangle$ then clearly S′ is not applicative, otherwise there is a useful redex (the →dB redex involving λx.S′′) containing S, i.e. S is not the LOU redex, which is absurd.
(2) Leftmost: suppose that the leftmost property of LOU contexts is violated for S, and let $S = S'\langle r\,S''\rangle$ be such that $r{\to}_{S'}$ is not neutral. We have that r is →LOU-normal. Two cases:
  (a) $r{\to}_{S'}$ is an abstraction. Then S′ is the position of a useful redex and S is not the LOU step, absurd.
  (b) $r{\to}_{S'}$ contains a β-redex. By the contextual normal form property (Lemma 9.1) there is a useful redex in t having its position in r, and so S is not the position of the →LOU-redex, absurd.

10.2. Inductive LOU Contexts. We introduce the alternative characterization of LOU contexts, avoiding relative unfoldings. We call it inductive because it follows the structure of the context S, even if in the last clause the hypothesis might be syntactically bigger than the conclusion. Something, however, always decreases: in the last clause it is the number of ES. The lemma that follows is indeed proved by induction over the number of ES in S and S itself.
Definition 10.3 (Inductive LOU Contexts). A context S is inductively LOU (or iLOU), if a judgment about it can be derived by using the following inductive rules:

Let us show that LOU contexts are inductive LOU contexts.

Lemma 10.4 (LOU Contexts are iLOU). Let S be a context. If S is LOU then S is inductively LOU.
(3) Left Application S = S ′ u.By i.h., S ′ is iLOU.Note that S ′ = L λx.S ′′ otherwise the outermost property of LOU contexts would be violated, since S ′ appears in an applicative context.Then S is iLOU by rule (@l-iLOU).(4) Right Application S = uS ′ .Then S ′ is LOU and so S ′ is iLOU by i.h..By the leftmost property of LOU contexts u → = u → • is neutral.Then S is iLOU by rule (@r-iLOU).(5) Substitution S = S ′ [x u].We prove that S ′ {x u → } is LOU and obtain that S is iLOU by applying the i.h.(first component decreases) and rule (ES-iLOU).There are two conditions to check: (a) Outermost: consider S ′ {x u → } = S ′′ λy.S ′′′ .Note that the abstraction λy comes from an abstraction of S ′ to which {x u → } has been applied, because u → cannot contain a context hole.Then S ′ = S 2 λy.S 3 with S 2 {x u → } = S ′′ and S 3 {x u → } = S ′′′ .By hypothesis S is LOU and so S 2 [x u → ] is not applicative.Then S 2 and thus S 2 {x u → } are not applicative.(b) Leftmost: consider S ′ {x u → } = S ′′ rS ′′′ .Note that the application rS ′′′ comes from an application of S ′ to which {x u → } has been applied, because u → cannot (2) By the normal form property (Corollary 9.2) t has a useful redex, otherwise t → is βnormal, that is absurd.Then the statement follows from the strong projection property (Theorem 10.6) or from Lemma 5.7.
For the sake of completeness, let us also point out that the converse statements of Lemma 10.2 (i.e. useful steps taking place in LOU contexts are →LOU steps) and Lemma 10.5 (inductive LOU contexts are LOU contexts) are provable. We omitted them to lighten the technical development of the paper.

Standard Derivations and the Subterm Property
Here we introduce standard derivations and show that they have the subterm property.

11.1. Standard Derivations. Standard derivations are defined on top of the concept of residual of a redex. For the sake of readability, we use the concept of residual without formally defining it (see [ABKL14] for details).
Definition 11.1 (Standard Derivation).A derivation ρ : S 1 ; . . .; S n is standard if S i is not the residual of a LSC redex S ′ ≺ LO S j for every i ∈ {2, . . ., n} and j < i.
The same definition where terms are ordinary λ-terms and redexes are β-redexes gives the ordinary notion of standard derivation in the theory of λ-calculus.
Note that any single reduction step is standard.Then, notice that standard derivations select redexes in a left-to-right and outside-in way, but they are not necessarily LO.For instance, the derivation is standard even if the LO redex (i.e. the dB-redex on x) is not reduced.The extension of the derivation with ((λx.z)z)[yz] → dB z[x z][y z] is not standard.Last, note that the position of a ls-step (Definition 4.2, page 14) is given by the substituted occurrence and not by the ES, that is (xy We have the following expected result.
The subterm property states that at any point of a derivation ρ : t →* u only subterms of the initial term t are duplicated. Duplicable subterms are identified by boxes, and we need a technical lemma about them.

11.2. Boxes, Invariants, and Subterms. A box is the argument of an application or the content of an explicit substitution. In the graphical representation of λ-terms with ES, our boxes correspond to explicit boxes for promotions.
Definition 11.3 (Box Context, Box Subterm).Let t be a term.Box contexts (that are not necessarily shallow) are defined by the following grammar, where C is a general context (i.e.not necessarily shallow): A box subterm of t is a term u such that t = B u for some box context B.
(2) ≺ LO -ordered shallow derivations are standard: any strategy picking shallow redexes in a left-to-right and outside-in fashion does produce standard derivations (it follows from the easy fact that a shallow redex S cannot turn a non-shallow redex S ′ such that S ′ ≺ LO S into a shallow redex).Moreover, the only redex swaps we will consider (Lemma 12.2) will produce shallow residuals.
11.4. Shallow Terms. Let us conclude the section with a further invariant of standard derivations. It is not needed for the invariance result, but it sheds some light on the shallow subsystem under study. Let a term be shallow if its substitutions do not contain substitutions. The invariant is that if the initial term is a λ-term then standard shallow derivations involve only shallow terms.

Lemma 11.6 (Shallow Invariant). Let t be a λ-term and $\rho : t \to^{k} u$ be a standard derivation. Then u is a shallow term.
Proof. By induction on k. If k = 0, the statement is evidently true. Otherwise, by i.h. every explicit substitution in r, where ρ : t →^{k−1} r, contains a λ-term. We distinguish two cases according to the kind of the next step r → u:
(1) ls-step. By the subterm property and the fact that t has no ES, the step duplicates a term without substitutions and, since reduction is shallow, it does not put the duplicated term in a substitution. Therefore, every substitution of u corresponds uniquely to a substitution of r with the same content. Then u is a shallow term by i.h.
(2) dB-step. It is easily seen that the argument of the dB-step is on the right of the previous step, so that by Lemma 11.4 it contains a (box) subterm of t. Then, the substitution created by the dB-step contains a subterm of t, that is an ordinary λ-term by hypothesis. The step does not affect any other substitution, because reduction is shallow, and so u is a shallow term.
In this paper we state many properties relative to derivations whose initial term is a λ-term. The shallow invariant essentially means that all these properties may be generalized to (standard) derivations whose initial term is shallow. There is, however, a subtlety that justifies our formulation with respect to λ-terms. Lemma 11.6, indeed, does not hold if one simply assumes that t is a shallow term. Consider for instance (λx.x)(y[y←z]), that is shallow and that reduces in one step (thus via a standard derivation) to x[x←y[y←z]], which is not shallow. The subtlety is that the position S of the first step of the standard derivation has to be a ≺LO-majorant of the position of any ES in the term. For the sake of simplicity, we preferred to assume that the initial term has no ES. Note also that Lemma 11.6 is the only point of this section relying on the assumption that reduction is shallow (the hypothesis of the derivation being standard is also necessary, consider (λx.x)((λy.y)z) →dB (λx.x)(y[y←z]) →dB x[x←y[y←z]]).
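The shallowness condition and the counterexample above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical Python encoding of LSC terms (the class names `Var`, `Lam`, `App`, `Sub` and the helpers `has_es` and `is_shallow` are illustrative and not part of the paper).

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Sub:
    """Explicit substitution body[var <- content] (hypothetical encoding)."""
    body: "Term"
    var: str
    content: "Term"


Term = Union[Var, Lam, App, Sub]


def has_es(t: Term) -> bool:
    """True if t contains at least one explicit substitution."""
    if isinstance(t, Var):
        return False
    if isinstance(t, Lam):
        return has_es(t.body)
    if isinstance(t, App):
        return has_es(t.fun) or has_es(t.arg)
    return True  # t is a Sub node


def is_shallow(t: Term) -> bool:
    """True if no substitution content contains a substitution."""
    if isinstance(t, Var):
        return True
    if isinstance(t, Lam):
        return is_shallow(t.body)
    if isinstance(t, App):
        return is_shallow(t.fun) and is_shallow(t.arg)
    return is_shallow(t.body) and not has_es(t.content)


# (λx.x)(y[y←z]) is shallow ...
before = App(Lam("x", Var("x")), Sub(Var("y"), "y", Var("z")))
# ... but its one-step reduct x[x←y[y←z]] is not.
after = Sub(Var("x"), "x", Sub(Var("y"), "y", Var("z")))
```

Here `is_shallow(before)` holds while `is_shallow(after)` fails, matching the counterexample given for shallow initial terms.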

LOU Derivations are Standard
Notation: to avoid ambiguities, in this section we use R, P, Q for redexes, R′, P′, Q′ for their residuals, and S, S′, S′′ for shallow contexts.
LO derivations are standard (Theorem 11.2), and this is expected. A priori, instead, LOU derivations may not be standard, if the reduction of a useful redex R could turn a useless redex P ≺LO R into a useful redex. Luckily, this is not possible, i.e. uselessness is stable by reduction of ≺LO-majorants, as proved by the next lemma.
We first need to recall two properties of the standardization order ≺LO relative to residuals, called linearity and enclave. They are two of the axioms of the axiomatic theory of standardization developed by Melliès in his PhD thesis [Mel96], that in turn is a refinement of a previous axiomatization by Gonthier, Lévy, and Melliès [GLM92] (that did not include the enclave axiom). The axioms of Melliès' axiomatic theory have been proved to hold for the LSC by Accattoli, Bonelli, Kesner, and Lombardi in [ABKL14]. The two properties essentially express that if R ≺LO P then P cannot act on R. Their precise formulation follows.
Lemma 12.1 ([ABKL14]). If R ≺LO P then (1) Linearity: R has a unique residual R′ after P; (2) Enclave: two cases (a) Creation:
Now we can prove the key lemma of the section.
Lemma 12.2 (Useless Persistence). Let R : t →ls u be a useless redex, P : t → p be a useful redex such that R ≺LO P, and R′ be the unique residual of R after P (uniqueness follows from the just recalled property of linearity of ≺LO). Then (1) R′ is shallow and useless; (2) if P is LOU and Q is the LOU redex in p then R′ ≺LO Q.
Proof.
(1) Let R : S′⟨S⟨x⟩[x←r]⟩ →ls S′⟨S⟨r⟩[x←r]⟩. According to Definition 6.1 (page 19), a ls-redex is useless when it is not useful. Then, uselessness of R implies that the relative unfolding r→S′ is a normal λ-term (otherwise the relative duplication clause in the definition of useful redexes would hold) and if r→S′ is an abstraction then S′⟨S[x←r]⟩ is not an applicative context (otherwise relative creation would hold).
Note that ls-steps cannot change the useless nature of R. To change it, in our case, they should be able to change the abstraction/normal nature of r→S′ or to change the applicative nature of S′⟨S[x←r]⟩, but both changes are impossible: unfoldings, and thus r→S′, cannot be affected by ls-steps (formally, an omitted generalization of Lemma 5.6 is required), and ls-steps cannot provide/remove arguments to/from context holes. So, in the following we suppose that P is a dB-redex.
By induction on S′, the external context of R. Cases:
(a) Empty context ⟨·⟩. Consider P, that necessarily takes place in the context S, P : S⟨x⟩[x←r] → S′⟨x⟩[x←r]. The only way in which the residual R′ : S′⟨x⟩[x←r] →ls S′⟨r⟩[x←r] of R can be useful is if P turned the non-applicative context S into an applicative context S′, assuming that the unfolding r→ is an abstraction. It is easily seen that this is possible only if P ≺LO R, against hypothesis. Namely, only if S = S′′⟨L⟨λy.L′⟩p⟩ and S′′ is applicative, so that P is
(i) Abstraction, i.e. S′ = λy.S′′. Both redexes R and P take place under the outermost abstraction, so the statement follows from the i.h.
(ii) Left of an application, i.e. S′ = S′′q. Note that P cannot be the possible root dB-redex (i.e. if S′′ is of the form L⟨λy.S′′′⟩ then P is not the dB-redex involving λy and q), because this would contradict R ≺LO P. If the redex P takes place in S′′⟨S⟨x⟩[x←r]⟩ then we use the i.h. Otherwise P takes place in q, the two redexes are disjoint, and commute. Evidently, the residual R′ of R after P is still shallow and useless.
(iii) Right of an application, i.e. S′ = qS′′. Since R ≺LO P, P necessarily takes place in S′′, and the statement follows from the i.h.
(iv) Substitution, i.e. S′ = S′′[y←q]. Both redexes R and P take place under the outermost explicit substitution [y←q], so the statement follows from the i.h.
(2) Assume that P is LOU. By Point 1, the unique residual R′ of any useless redex R ≺LO P is useless, so that the next LOU redex Q, if any, either has been created by P or it is the residual of a redex Q* such that P ≺LO Q*. The enclave property guarantees that R′ ≺LO Q.
Now an easy iterated application of the previous lemma shows that LOU derivations are standard.
Proof. By induction on the length k of a LOU derivation ρ. If k = 0 then the statement trivially holds. If k > 0 then ρ writes as τ; R where τ by i.h. is standard. Let τ be R_1; …; R_k and R_i : t_i → t_{i+1} with i ∈ {1, …, k}. If τ; R is not standard there is a term t_i and a redex P of t_i such that (1) R is a residual of P after R_i; …; R_k; (2) P ≺LO R_i. Since R_i is LOU, P is useless. Then, iterating the application of Lemma 12.2 to the sequence R_i; …; R_k, we obtain that R is useless, which is absurd. Then ρ = τ; R is standard.
We conclude by applying Corollary 11.5.

The Local Boundedness Property, via Outside-In Derivations
In this section we show that LOU derivations have the local boundedness property. We introduce yet another abstract property, the notion of outside-in derivation, and show that together with the subterm property it implies local boundedness. We conclude by showing that LOU derivations are outside-in.
Definition 13.1 (Outside-In Derivation). Two ls-steps t →ls u →ls r are outside-in if the second one substitutes on the subterm substituted by the first one, i.e. if there exist S and S′ such that the two steps have the compact form S⟨x⟩ →ls S⟨S′⟨y⟩⟩ →ls S⟨S′⟨u⟩⟩. A derivation is outside-in if any two consecutive substitution steps are outside-in.
[Figures 2 and 3 define the algorithms B_g and C_g. The fragments that survive are the clause B_g(x) = (var(x), false, ∅, {x}) and the shape of the results, B_g(t) = (n_t, b_t, V_t, W_t) and C_g(t, S) = (n_{t,S}, b_{t,S}, V_{t,S}, W_{t,S}).]
The algorithm computing g on LSC terms is defined in Figure 2. The algorithm computing g on pairs in the form (t, S) (where t is a LSC term and S is a shallow context) is defined in Figure 3. First of all, we need to convince ourselves about the correctness of the proposed algorithms: do they really compute the function g? Actually, the way the algorithms are defined, namely by primitive recursion on the input terms, helps very much here: a simple induction suffices to prove the following:
Proposition 14.1. The algorithms A_g, B_g, and C_g are all correct: for every λ-term t, for every LSC term u and for every context S, we have A_g(t) = g(t), B_g(u) = g(u→), and C_g(u, S) = g(u→S).
Proof.
(1) The equation A_g(t) = g(t) can be proved by induction on the structure of t. An interesting case:
• If t = ur, then A_g(t) is computed from the results of the recursive calls A_g(u) and A_g(r). Now, first of all observe that redex(t) = true if and only if there is a redex in u or a redex in r or if u is a λ-abstraction. Moreover, the variables occurring in applicative position in t are those occurring in applicative position in either u or in r or x, if u is x itself. Similarly, the variables occurring free in t are simply those occurring free in either u or in r. The thesis can be synthesized easily from the inductive hypothesis.
(2) The equation B_g(u) = g(u→) can be proved by induction on the structure of u, using the correctness of A_g.
(3) The equation C_g(u, S) = g(u→S) can be proved by induction on the structure of S, using the correctness of B_g. This concludes the proof.
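The application case discussed in the proof can be made concrete with a small sketch of A_g on plain λ-terms. It follows the tuple shape (n_t, b_t, V_t, W_t) and the variable clause that survive in the text; the abstraction clause (removing the bound variable from both sets) and the Python names `Var`, `Lam`, `App`, `a_g` are assumptions made for illustration, not the paper's definition.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]
# (nature, has a β-redex, applicative free variables, free variables)
Summary = Tuple[str, bool, FrozenSet[str], FrozenSet[str]]


def a_g(t: Term) -> Summary:
    """One recursive call per constructor, i.e. at most |t| calls overall."""
    if isinstance(t, Var):
        return (f"var({t.name})", False, frozenset(), frozenset({t.name}))
    if isinstance(t, Lam):
        # Assumed clause: the bound variable is no longer free.
        _, b, v, w = a_g(t.body)
        bound = frozenset({t.var})
        return ("lam", b, v - bound, w - bound)
    n_u, b_u, v_u, w_u = a_g(t.fun)
    _, b_r, v_r, w_r = a_g(t.arg)
    # A variable in function position occurs in applicative position.
    head = frozenset({t.fun.name}) if isinstance(t.fun, Var) else frozenset()
    # A redex is in u, in r, or at the root when u is an abstraction.
    return ("app", b_u or b_r or n_u == "lam", v_u | v_r | head, w_u | w_r)


no_redex = a_g(App(Var("x"), Lam("y", App(Var("y"), Var("z")))))
root_redex = a_g(App(Lam("x", Var("x")), Var("y")))
```

For x(λy.yz) the sketch reports no redex, applicative variable x, and free variables x and z; for (λx.x)y it reports a redex at the root.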
The way the algorithms above have been defined also helps while proving that they work in bounded time, e.g., the number of recursive calls triggered by A_g(t) is linear in |t| and each of them takes polynomial time. As a consequence, we can also easily bound the complexity of the three algorithms at hand.
Proposition 14.2 (Selection Property). The algorithms A_g, B_g, and C_g all work in polynomial time. Thus the LOU strategy has the selection property.
Proof. The three algorithms are defined by primitive recursion. More specifically:
• Any call A_g(t) triggers at most |t| calls to A_g;
• Any call B_g(t) triggers at most |t| calls to B_g;
• Any call C_g(t, S) triggers at most |t| + |S| calls to B_g and at most |S| calls to C_g.
Now, the amount of work involved in any single call (not counting the, possibly recursive, calls) is itself polynomial, simply because the tuples produced in output are made of objects whose size is itself bounded by the length of the involved terms and contexts.
What Proposition 14.2 implicitly tells us is that the usefulness of a given redex in an LSC term t can be checked in polynomial time in the size of t. The Selection Property (Definition 7.3) then holds for LOU derivations: the next redex to be fired is the LO useful one (of course, finding the LO useful redex among useful redexes can trivially be done in polynomial time).

Summing Up
The various ingredients from the previous sections can be combined so as to obtain the following result:
Theorem 15.1 (Polynomial Implementation of λ). There is an algorithm which takes as input a λ-term t and a natural number n and which, in time polynomial in m = min{n, #→LOβ(t)} and |t|, outputs an LSC term u such that t →^m u→.
Together with the linear implementation of Turing machines in the λ-calculus given in [ADL12], we obtain our main result.
Theorem 15.2 (Invariance).The λ-calculus is a reasonable model in the sense of the weak invariance thesis.
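To make the quantity m = min{n, #→LOβ(t)} of Theorem 15.1 concrete, here is a naive reference sketch that performs at most n leftmost-outermost β-steps on ordinary λ-terms. It is not the polynomial algorithm of the theorem: it substitutes eagerly and is therefore exposed to size explosion. All names (`Var`, `Lam`, `App`, `lo_step`, `lo_reduce`) are hypothetical, and fresh names are generated under the assumption that user variables contain no underscore.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]
_fresh = count()


def free_vars(t: Term) -> set:
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.var}
    return free_vars(t.fun) | free_vars(t.arg)


def subst(t: Term, x: str, s: Term) -> Term:
    """Capture-avoiding substitution t{x<-s}."""
    if isinstance(t, Var):
        return s if t.name == x else t
    if isinstance(t, App):
        return App(subst(t.fun, x, s), subst(t.arg, x, s))
    if t.var == x:
        return t
    if t.var in free_vars(s):
        # Rename the binder (assumes user names contain no underscore).
        fresh = f"{t.var}_{next(_fresh)}"
        return Lam(fresh, subst(subst(t.body, t.var, Var(fresh)), x, s))
    return Lam(t.var, subst(t.body, x, s))


def lo_step(t: Term) -> Optional[Term]:
    """Fire the leftmost-outermost β-redex, or return None on normal forms."""
    if isinstance(t, Var):
        return None
    if isinstance(t, Lam):
        body = lo_step(t.body)
        return None if body is None else Lam(t.var, body)
    if isinstance(t.fun, Lam):
        return subst(t.fun.body, t.fun.var, t.arg)
    fun = lo_step(t.fun)
    if fun is not None:
        return App(fun, t.arg)
    arg = lo_step(t.arg)
    return None if arg is None else App(t.fun, arg)


def lo_reduce(t: Term, n: int) -> Tuple[int, Term]:
    """Return (m, u) where m = min(n, number of LO steps) and t ->^m u."""
    m = 0
    while m < n:
        nxt = lo_step(t)
        if nxt is None:
            break
        t, m = nxt, m + 1
    return m, t


identity = Lam("y", Var("y"))
dup = Lam("x", App(Var("x"), Var("x")))
omega = App(dup, dup)
```

For instance, `lo_reduce(App(dup, identity), 10)` stops after two steps on the identity, while `lo_reduce(App(Lam("k", Var("z")), omega), 5)` erases the diverging argument in a single step, as expected from a normalizing strategy.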
As we have already mentioned, the algorithm witnessing the invariance of the λ-calculus does not produce a λ-term, but a useful normal form, i.e. a compact representation (with ES) of a λ-term. Theorem 15.1, together with the fact that equality of terms can be checked efficiently in compact form, entails the following formulation of invariance, akin in spirit to, e.g., Statman's Theorem [Sta79]:
Corollary 15.3. There is an algorithm which takes as input two λ-terms t and u and checks whether t and u have the same normal form in time polynomial in #→LOβ(t), #→LOβ(u), |t|, and |u|.
If one instantiates Corollary 15.3 to the case in which u is a (useful) normal form, one obtains that checking whether the normal form of any term t is equal to (the unfolding of) u can be done in time polynomial in #→LOβ(t), |t|, and |u|. This is particularly relevant when the size of u is constant, e.g., when the λ-calculus computes decision problems and the relevant results are truth values.
Please observe that whenever one (or both) of the involved terms is not normalizable, the algorithms above (correctly) diverge.
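The reason for returning a compact representation rather than its unfolding can be illustrated with a small sketch. It assumes a hypothetical encoding of terms with ES (`Var`, `Lam`, `App`, `Sub`) and a naive substitution that relies on all bound names being distinct from free ones; the family `chain(n)` is an illustrative example, not one taken from the paper.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Sub:
    """Explicit substitution body[var <- content] (hypothetical encoding)."""
    body: "Term"
    var: str
    content: "Term"


Term = Union[Var, Lam, App, Sub]


def subst(t: Term, x: str, s: Term) -> Term:
    """Naive t{x<-s} on ES-free terms; assumes no capture can occur."""
    if isinstance(t, Var):
        return s if t.name == x else t
    if isinstance(t, Lam):
        return t if t.var == x else Lam(t.var, subst(t.body, x, s))
    return App(subst(t.fun, x, s), subst(t.arg, x, s))


def unfold(t: Term) -> Term:
    """Turn every explicit substitution into a meta-level substitution."""
    if isinstance(t, Var):
        return t
    if isinstance(t, Lam):
        return Lam(t.var, unfold(t.body))
    if isinstance(t, App):
        return App(unfold(t.fun), unfold(t.arg))
    return subst(unfold(t.body), t.var, unfold(t.content))


def size(t: Term) -> int:
    """Number of constructors of t."""
    if isinstance(t, Var):
        return 1
    if isinstance(t, Lam):
        return 1 + size(t.body)
    if isinstance(t, App):
        return 1 + size(t.fun) + size(t.arg)
    return 1 + size(t.body) + size(t.content)


def chain(n: int) -> Term:
    """(x0 x0)[x0<-x1 x1]...[x(n-1)<-xn xn]: linear size, exponential unfolding."""
    t: Term = App(Var("x0"), Var("x0"))
    for i in range(1, n + 1):
        xi = Var(f"x{i}")
        t = Sub(t, f"x{i - 1}", App(xi, xi))
    return t


compact = size(chain(10))            # 3 + 4 * 10
unfolded = size(unfold(chain(10)))   # 2 ** 12 - 1
```

With n = 10 the compact term has 43 constructors while its unfolding has 4095, which is why results are compared in compact form instead of being unfolded first.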

Discussion
Applications. One might wonder what is the practical relevance of our invariance result, since functional programming languages rely on weak evaluation, for which invariance was already known. The main application of strong evaluation is in the design of proof assistants and higher-order logic programming, typically for type-checking in frameworks with dependent types such as the Edinburgh Logical Framework or the Calculus of Constructions, as well as for unification modulo βη in simply typed frameworks like λ-Prolog. Of course, in these cases the language at work is not as minimalistic as the λ-calculus, it is often typed, and other operations (e.g. unification) impact on the complexity of evaluation. Nonetheless, the strong λ-calculus is always the core language, and so having a reasonable cost model for it is a necessary step for complexity analyses of these frameworks. Let us point out, moreover, that in the study of functional programming languages there is an emerging point of view, according to which the theoretical study of the language should be done with respect to strong evaluation, even if only weak evaluation will be implemented, see [SR15].
We also believe that our work may be used to substantiate the practical relevance of some theoretical works. There exists a line of research attempting to measure the number of steps to evaluate a term by looking at its denotational interpretations (e.g. relational semantics/intersection types in [dC09, dCPdF11, BL13, LMMP13] and game semantics in [Ghi05, DLL08, Cla13]) with the aim of providing abstract formulations of complexity properties. The problem of this literature is that either the measured strong strategies do not provide reliable complexity measures, or they only address head/weak reduction. In particular, the number of LO steps to normal form (i.e. our cost model) has never been measured with denotational tools. This is particularly surprising, because head reduction is the strategy arising from denotational considerations (this is the leading theme of Barendregt's book [Bar84]) and the LO strategy is nothing but iterated head reduction. We expect that our result will be the starting point for revisiting the quantitative analyses of β-reduction based on denotational semantics.
Mechanizability vs Efficiency. Let us stress that the study of invariance is about mechanizability rather than efficiency. One is not looking for the smartest or shortest evaluation strategy, but rather for one that can be reasonably implemented. The case of Lévy's optimal evaluation, for instance, hides the complexity of its implementation in the cleverness of its definition. A Lévy-optimal derivation, indeed, can be even shorter than the shortest sequential strategy, but, as shown by Asperti and Mairson [AM98], its definition hides hyper-exponential computations, so that optimal derivations do not provide an invariant cost model. The leftmost-outermost strategy is a sort of maximally unshared normalizing strategy, where redexes are duplicated whenever possible and unneeded redexes are never reduced, somehow dually with respect to optimal derivations. It is exactly this inefficiency that induces the subterm property, the key point for its mechanizability. It is important not to confuse two different levels of sharing: our LOU derivations share subterms, but not computations, while Lévy's optimal derivations do the opposite. By sharing computations optimally, they collapse the complexity of too many steps into a single one, making the number of steps an unreliable measure.
Inefficiencies. This work is foundational in spirit and only deals with polynomial bounds, and in particular it does not address an efficient implementation of useful sharing. There are three main sources of inefficiency:
(1) Call-by-Name Evaluation Strategy: for a more efficient evaluation one should at least adopt a call-by-need policy, while many would probably prefer to switch to call-by-value altogether. Both evaluations introduce some sharing of computations with respect to call-by-name, as they evaluate the argument before it is substituted (call-by-need) or the β-redex is fired (call-by-value). Our choice of call-by-name, however, comes from the desire to show that even the good old λ-calculus with normal order evaluation is invariant, thus providing a simple cost model for the working theoretician.
(2) High-Level Quadratic Overhead: in the micro-step evaluation presented here the number of substitution steps is at most quadratic in the number of β-steps, as proved in the high-level implementation theorem. Such a bound is tight, as there exist degenerate terms that produce this quadratic substitution overhead (for instance, the micro-step evaluation of the paradigmatic diverging term Ω), and the degeneracy can also be adapted to terminating terms.
(3) Low-Level Separate Useful Tests: for the low-level implementation theorem we provided a separate global test for the usefulness of a substitution step. It is natural to wonder if an abstract machine can implement it locally. The idea we suggested in [ADL14a] is that some additional labels on subterms may carry information about the unfolding in their context, allowing one to decide usefulness in linear time, and removing the need of running a global check.
These inefficiencies have been addressed by Accattoli and Sacerdoti Coen in two studies [ASC14, ASC15], complementary to ours. In [ASC14], they show that (in the much simpler weak case) call-by-value and call-by-need both satisfy a high-level implementation theorem and that the quadratic overhead is induced by potential chains of renaming substitutions, sometimes called space leaks. Moreover, in call-by-value and call-by-need the quadratic overhead can be reduced to linear by simply removing variables from values. The same speed-up can be obtained for call-by-name as well, if one slightly modifies the micro-step rewriting rules (see the long version of [ASC14], that at the time of writing is submitted and can only be found on Accattoli's web page, and that builds on a result of [SGM02]). In [ASC15], instead, the authors address the possibility of local useful tests, but motivated by [ASC14], they rather do it for a weak call-by-value calculus generalized to evaluate open terms, that is the evaluation model used by the abstract machine at work in the Coq proof assistant [GL02]. Despite being a weak setting, open terms force to address useful sharing along the lines of what we did here, but with some simplifications due to the weak setting. The novelty of [ASC15] is an abstract machine implementing useful sharing, and studied via a distillation, i.e. a new methodology for the representation of abstract machines in the LSC [ABM14]. Following the mentioned suggestion, the machine uses simple labels to check usefulness locally and, surprisingly, the check takes constant time. Globally, the machine is proved to have an overhead that is linear both in the number of β-steps and the size of the initial term. Interestingly, that work builds on the schema for usefulness that we provided here, showing that our approach, and in particular useful sharing, is general enough to encompass more efficient scenarios. But there is more. At first sight call-by-value seemed to be crucial in order to obtain a linear overhead, but the tools of [ASC15] seem, a posteriori, to be adaptable to call-by-name, with a slight slowdown: useful tests are checked in linear rather than constant time (linear in the size of the initial term). For call-by-need with open terms, the same tools seem to apply, even if we do not yet know if useful tests are linear or constant. Generally speaking, our result can be improved along two superposing axes. One is to refine the invariant strategy so as to include as much sharing of computations as possible, therefore replacing call-by-name with call-by-value or call-by-need with open terms, or under abstractions. The other axis is to refine the overhead in implementing micro-step useful evaluation (itself splitting into two high-level and low-level axes), which seems to be doable in (bi)linear time more or less independently of the strategy.
On Non-Deterministic β-Reduction. This paper only deals with the cost of reduction induced by the natural, but inefficient, leftmost-outermost strategy. The invariance of full β-reduction, i.e. of the usual non-deterministic relation allowed to reduce β-redexes in any order, would be very hard to obtain, since it would be equivalent to the invariance of the cost model induced by the optimal one-step deterministic reduction strategy, which is well known to be even non-recursive [Bar84]. Note that, a priori, non-recursive does not imply non-invariant, as there may be an algorithm for evaluation that is polynomial in the steps of the optimal strategy and that does not simulate the strategy itself; the existence of such an algorithm, however, is unlikely. The optimal parallel reduction strategy is instead recursive but, as mentioned in the introduction, the number of its steps to normal form is well known not to be an invariant cost model [AM98].

Conclusions
This work can be seen as the last tale in the long quest for an invariant cost model for the λ-calculus. In the last ten years, the authors have been involved in various works in which parsimonious time cost models have been shown to be invariant for more and more general notions of reduction, progressively relaxing the conditions on the use of sharing [DLM08, DLM12, ADL12]. None of the results in the literature, however, concerns reduction to normal form as we do here.
We provided the first full answer to a long-standing open problem: the λ-calculus is indeed a reasonable machine, if the length of the leftmost-outermost derivation to normal form is used as cost model.
To solve the problem we developed a whole new toolbox: an abstract deconstruction of the problem, a theory of useful derivations, a general view of functions efficiently computable in compact form, and a surprising connection between standard and efficiently mechanizable derivations. Theorem after theorem, an abstract notion of machine emerges, hidden deep inside the λ-calculus itself. While such a machine is subtle, the cost model turns out to be the simplest and most natural one, as it is unitary, machine-independent, and justified by the standardization theorem, a classic result apparently unrelated to the complexity of evaluation.
This work also opens the way to new studies. Providing an invariant cost model, i.e. a metric for efficiency, it gives a new tool to compare different implementations, and to guide the development of new, more efficient ones. As discussed in the previous section, Accattoli and Sacerdoti Coen presented a call-by-value abstract machine for useful sharing having only a linear overhead [ASC15], that actually on open λ-terms is asymptotically faster than the abstract machine at work in the Coq proof assistant, studied in [GL02]. Such a result shows that useful sharing is not a mere theoretical tool, and justifies a finer analysis of the invariance of λ-calculus.
Among the consequences of our results, one can of course mention that proving systems to characterize time complexity classes equal to or larger than P can now be done merely by deriving bounds on the number of leftmost-outermost reduction steps to normal form. This could be useful, for instance, in the context of light logics [GR07, CDLRDR08, BT09]. The kind of bounds we obtain here are however more general than those obtained in implicit computational complexity, because we deal with a universal model of computation.
While there is room for finer analyses, we consider the understanding of time invariance essentially achieved. However, the study of cost models for λ-terms is far from being over. Indeed, the study of space complexity for functional programs has only made its very first steps [Sch07, GMR08, DLS10, Maz15], and not much is known about invariant space cost models.
Lemma 4.4 (Totality of ≺LO). If C ≺p t and D ≺p t then either C ≺LO D or D ≺LO C or C = D.
Lemma 9.1 (Contextual Normal Form Property). Let S⟨u⟩ be such that u is a useful normal form and S is a shallow context. Then either
(1) u→S is a β-normal form, or
(2) S ≠ ⟨·⟩ and there exists a shallow context S′ such that
(a) u = S′⟨x⟩, and
(b) S⟨S′⟩ is the position of a useful ls-redex of S⟨u⟩, namely
(i) Relative Duplication: x→S⟨S′⟩ has a β-redex, or
(ii) Relative Creation: S′ is applicative and x→S⟨S′⟩ is an abstraction.
Proof. Note that:
• If there exists S′ as in case 2 then S ≠ ⟨·⟩, otherwise case 2(b)i or case 2(b)ii would imply a useful substitution redex in u, while u is a useful normal form by hypothesis.
• Cases 1 and 2 are mutually exclusive: if 2(b)i or 2(b)ii holds clearly u→S has a β-redex.
We are only left to prove that one of the two always holds. By induction on u. Cases:
(1) Variable u = x. If x→S is a β-normal form nothing remains to be shown. Otherwise, x→S has a β-redex and we are in case 2(b)i, setting S′ := ⟨·⟩ (giving x→S⟨S′⟩ = x→S).
a β-normal form and we are in case 1. While if there exists S′′ such that p = S′′⟨x⟩ verifies case 2 (with respect to S⟨r⟨·⟩⟩) then S′ := rS′′ verifies case 2 with respect to S. (b) There exists S′′ such that r = S′′⟨x⟩ verifying case 2 of the statement with respect to S⟨⟨·⟩p⟩. Then case 2 holds for S′ := S′′p with respect to S.
(4) Substitution u = r[y←p]. Note that u→S = r[y←p]→S = r→S⟨⟨·⟩[y←p]⟩, where the last equality is by Lemma 5.6.6. So we can apply the i.h. to r and S⟨⟨·⟩[y←p]⟩, from which the statement follows.

Figure 2: Computing g in compact form.

Figure 3: Computing g in compact form, relative to a context.
3) Freedom: if S does not capture any free variable of t then t → r → ls p.
′ by Lemma 5.6.4. Then S ′ r → takes place only if it contributes to eventually obtain an unshared (i.e. shallow) β/dB-redex. Absolute usefulness can be of two kinds.
(1) Duplication: a step can duplicate dB-redexes, as in S⟨x⟩[x←(λy.r)p] →ls S⟨(λy.r)p⟩[x←(λy.r)p]
(2) Creation: it can create a new dB-redex with its context, if it substitutes an abstraction in an applicative context, as in S⟨L⟨x⟩u⟩[x←λy.t] →ls S⟨L⟨λy.t⟩u⟩[x←λy.t]
There is also a subtler relative usefulness to dB-redexes. A substitution step may indeed put just a piece of what later, with further substitutions, will become a dB-redex. Accommodating relative usefulness requires generalizing the duplication and the creation cases to contexts and relative unfoldings. Let us give some examples. The following step (tx)[x←yy] →ls (t(yy))[x←yy] is useless. However, in an appropriate context it is relatively useful. For instance, let us put it in ⟨·⟩[y←λz.z], obtaining a case of relatively useful duplication, (tx)[x←yy][y←λz.z] →ls (t(yy))[x←yy][y←λz.z]. Note that the step, as before, does not duplicate a dB-redex. Now, however, evaluation will continue and turn the substituted copy of yy into a dB-redex, as follows: (t(yy))[x←yy][y←λz.z] →ls (t((λz.z)y))[x←yy][y←λz.z]
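The relative duplication example can be replayed with a small sketch. It is a simplification under stated assumptions: only substitution contexts are handled, the enclosing ES are passed as a list (innermost first) with ES-free contents, substitution is naive (no capture can occur in the example), and the names `relative_unfolding` and `has_beta_redex` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def subst(t: Term, x: str, s: Term) -> Term:
    """Naive t{x<-s}; assumes no variable capture can occur."""
    if isinstance(t, Var):
        return s if t.name == x else t
    if isinstance(t, Lam):
        return t if t.var == x else Lam(t.var, subst(t.body, x, s))
    return App(subst(t.fun, x, s), subst(t.arg, x, s))


def has_beta_redex(t: Term) -> bool:
    if isinstance(t, Var):
        return False
    if isinstance(t, Lam):
        return has_beta_redex(t.body)
    return (isinstance(t.fun, Lam)
            or has_beta_redex(t.fun)
            or has_beta_redex(t.arg))


def relative_unfolding(r: Term, env: List[Tuple[str, Term]]) -> Term:
    """Unfold r through the enclosing ES, listed innermost first."""
    for x, content in env:
        r = subst(r, x, content)
    return r


yy = App(Var("y"), Var("y"))
identity = Lam("z", Var("z"))

# Substituting yy for x in (tx)[x←yy]: the unfolding of yy has no β-redex.
alone = relative_unfolding(yy, [("x", yy)])
# Same step inside ⟨·⟩[y←λz.z]: the relative unfolding is (λz.z)(λz.z).
in_context = relative_unfolding(yy, [("x", yy), ("y", identity)])
```

The check `has_beta_redex(alone)` fails while `has_beta_redex(in_context)` succeeds, which is the relative duplication clause applied to the two situations described above.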