Modular coinduction up-to for higher-order languages via first-order transition systems

The bisimulation proof method can be enhanced by employing `bisimulations up-to' techniques. A comprehensive theory of such enhancements has been developed for first-order (i.e., CCS-like) labelled transition systems (LTSs) and bisimilarity, based on abstract fixed-point theory and compatible functions. We transport this theory onto languages whose bisimilarity and LTS go beyond those of first-order models. The approach consists in exhibiting fully abstract translations of the more sophisticated LTSs and bisimilarities onto the first-order ones. This allows us to reuse directly the large corpus of up-to techniques that are available on first-order LTSs. The only ingredient that has to be manually supplied is the compatibility of basic up-to techniques that are specific to the new languages. We investigate the method on the pi-calculus, the lambda-calculus, and a (call-by-value) lambda-calculus with references.


Introduction
One of the keys to the success of bisimulation is its associated proof method, whereby to prove two terms equivalent, one exhibits a relation containing the pair and proves it to be a bisimulation. The bisimulation proof method can be enhanced by employing relations called 'bisimulations up-to' [San98,Len98,PS12,RBR13]; see [PS19] for a historical perspective. These need not be bisimulations; they are simply contained in a bisimulation. Such techniques have been widely used in languages for mobility such as the π-calculus, and in higher-order languages such as the λ-calculus or Ambients (e.g., [Las98a,MN05,SW01]).
Several forms of bisimulation enhancements have been introduced: 'bisimulation up to bisimilarity' [Mil89], where the derivatives obtained when playing bisimulation games can be rewritten using bisimilarity itself; 'bisimulation up to transitivity', where the derivatives may be rewritten using the up-to relation [San98]; 'bisimulation up to context' [San94b], where a common context may be removed from matching derivatives. Further enhancements may exploit the peculiarities of the definition of bisimilarity on certain classes of languages: e.g., the up-to-injective-substitution techniques of the π-calculus [JR99,SW01], techniques for shrinking or enlarging the environment in languages with information hiding mechanisms (e.g., existential types, encryption and decryption constructs [AG98,SP07a,SP07b]), frame equivalence in the psi-calculi [PP14], or higher-order languages [KW06,Las98b].

Notations. We let R, S range over binary relations and we often write x R y for (x, y) ∈ R. Given two relations R, S, we write RS for their relational composition, i.e., RS ≜ {(x, z) | ∃y, x R y ∧ y S z}, R+ for the transitive closure of R, and R* for its reflexive-transitive closure. We use the standard arrow notation → to denote functions when the domain is clear, and ⇒ for logical implication; other arrow notations will be introduced as we go along.
In languages defined from a grammar, a context C of arity n ∈ N is a term with numbered holes [·] 1 , . . . , [·] n , where each hole [·] i can appear any number of times in C. We write C[P 1 , . . . , P n ] for the application of such a context to n terms P 1 , . . . , P n of the language.
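Multi-hole contexts and their application can be sketched concretely in Python; the nested-tuple term representation and the name `plug` are our own illustrative choices, not the paper's.

```python
# A minimal sketch of multi-hole contexts over an abstract term grammar.
# Terms are nested tuples; ("hole", i) marks the numbered hole [.]_i,
# which may occur any number of times in the context.

def plug(context, *terms):
    """Return C[P1, ..., Pn]: replace each hole [.]_i by terms[i-1]."""
    if isinstance(context, tuple) and context and context[0] == "hole":
        return terms[context[1] - 1]
    if isinstance(context, tuple):
        return tuple(plug(sub, *terms) for sub in context)
    return context  # an operator symbol or constant, left unchanged

# C = [.]_1 | ([.]_2 | [.]_1): hole 1 occurs twice, hole 2 once
C = ("par", ("hole", 1), ("par", ("hole", 2), ("hole", 1)))
print(plug(C, "P", "Q"))  # ('par', 'P', ('par', 'Q', 'P'))
```

Note how applying the context duplicates the first argument, since hole 1 occurs twice.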

First-order bisimulation and up-to techniques
As explained in the introduction, the results in this section are not new: we review general-purpose tools that we exploit to prove soundness of up-to techniques. These tools were obtained in several steps: respectfulness is from [San98]; we refined it into the notion of compatibility up-to in the conference version of this paper [MPS14], and this refinement eventually led to the notion of the companion [Pou16] and to the associated tools we exploit here.
A first-order Labelled Transition System, briefly LTS, is a triple (Pr, Act, −→) where Pr is a non-empty set of states (or processes), Act is the set of actions (or labels), and −→ ⊆ Pr × Act × Pr is the transition relation. We use P, Q, R to range over the processes of the LTS, and µ to range over the labels in Act, and, as usual, write P −µ→ Q when (P, µ, Q) ∈ −→. We assume that Act includes a special action τ that represents an internal activity of the processes. We derive bisimulation from the notion of progression between relations.

Definition 2.1. We define the monotone function sp on relations on processes of an LTS:

sp(S) ≜ {(P, Q) | whenever P −µ→ P′ there is Q′ with Q −µ→ Q′ and P′ S Q′, and whenever Q −µ→ Q′ there is P′ with P −µ→ P′ and P′ S Q′}.

We say that R strongly progresses to S, written R ⤳sp S, if R ⊆ sp(S). A relation R is a strong bisimulation if R ⤳sp R; and strong bisimilarity, ∼, is the union of all strong bisimulations.
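On a finite LTS, ∼ can be computed as the greatest fixed point of sp, iterating from the full relation Pr × Pr. The following Python sketch (the encoding and names are ours, not the paper's) illustrates Definition 2.1 on the classic pair a.(b + c) versus a.b + a.c.

```python
# A sketch of sp and strong bisimilarity on a finite first-order LTS,
# given as a set of (source, label, target) triples.

def sp(trans, states, S):
    """sp(S): pairs whose transitions are mutually matched, with the
    derivatives (kept in the same order) related by S."""
    def ok(P, Q):
        fwd = all(any((P2, Q2) in S
                      for (Q1, m, Q2) in trans if Q1 == Q and m == mu)
                  for (P1, mu, P2) in trans if P1 == P)
        bwd = all(any((P2, Q2) in S
                      for (P1b, m, P2) in trans if P1b == P and m == mu)
                  for (Q1, mu, Q2) in trans if Q1 == Q)
        return fwd and bwd
    return {(P, Q) for P in states for Q in states if ok(P, Q)}

def strong_bisimilarity(trans, states):
    """Greatest fixed point of sp, reached by iteration from Pr x Pr."""
    S = {(P, Q) for P in states for Q in states}
    while sp(trans, states, S) != S:
        S = sp(trans, states, S)
    return S

# a.(b + c) versus a.b + a.c
trans = {("P", "a", "P1"), ("P1", "b", "end"), ("P1", "c", "end"),
         ("Q", "a", "Q1"), ("Q1", "b", "end"),
         ("Q", "a", "Q2"), ("Q2", "c", "end")}
states = {"P", "P1", "Q", "Q1", "Q2", "end"}
sim = strong_bisimilarity(trans, states)
print(("P", "Q") in sim)  # False: after 'a', Q has committed to b or to c
```

The two processes are not strongly bisimilar, even though they can perform the same traces.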
Definition 2.2. We define the monotone function wp on relations on processes of an LTS, where =⇒ is the reflexive-transitive closure of −τ→, =µ⇒ stands for =⇒−µ→=⇒, and =µ̂⇒ is =⇒ when µ = τ and =µ⇒ otherwise:

wp(S) ≜ {(P, Q) | whenever P −µ→ P′ there is Q′ with Q =µ̂⇒ Q′ and P′ S Q′, and whenever Q −µ→ Q′ there is P′ with P =µ̂⇒ P′ and P′ S Q′}.

We say that R weakly progresses to S, written R ⤳wp S, if R ⊆ wp(S). A relation R is a weak bisimulation if R ⤳wp R; and weak bisimilarity, ≈, is the union of all weak bisimulations.
Below we summarise the ingredients of the theory of bisimulation enhancements for first-order LTSs from [PS12] that will be needed in the sequel. We use f and g to range over monotone functions on relations over a fixed set of states. Each such function represents a potential up-to technique; only the sound functions, however, qualify as up-to techniques:

Definition 2.3. A monotone function f is sound for ∼ if, for all R, R ⤳sp f(R) implies R ⊆ ∼; soundness for ≈ is defined analogously, using wp.

Unfortunately, the class of sound functions does not enjoy good algebraic properties. In particular, the composition and the pairwise union of two sound functions are not necessarily sound [PS12, Section 6.3.3]. As a remedy to this, the subset of compatible functions has been proposed. The concepts in the remainder of the section can be instantiated with both strong and weak bisimilarities; we thus use p to range over sp and wp.
Definition 2.4. We write f ⤳p g when f • p ⊆ p • g. A monotone function f on relations is p-compatible if f ⤳p f.

In other terms, f ⤳p g when R ⤳p S implies f(R) ⤳p g(S), for all R, S. Simple examples of compatible functions are the identity function id and the function mapping any relation onto bisimilarity (strong or weak, depending on the considered case). This means that (R → ∼) is sp-compatible, and (R → ≈) is wp-compatible. In addition, (R → ∼) is also a useful wp-compatible function. The class of compatible functions is closed under function composition and union (where the union ∪F of a set of functions F is the pointwise union mapping R to ⋃_{f∈F} f(R)), and thus under ω-iteration (where the ω-iteration f^ω of a function f maps R to ⋃_{n∈N} f^n(R)). For example (R → R ∪ ∼) is sp- and wp-compatible.
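On finite relations, the pointwise union ∪F and the ω-iteration f^ω can be computed explicitly, since the iteration chain stabilises. The following Python sketch (the names `union`, `omega`, `step_trans`, `refl_on` are ours) illustrates both constructions.

```python
# A small sketch of pointwise union and omega-iteration for functions
# on finite binary relations, represented as sets of pairs.

def union(*fs):
    """(∪F)(R): the pointwise union of the functions in F."""
    return lambda R: set().union(*(f(R) for f in fs))

def omega(f):
    """f^ω(R) = union of the f^n(R), computed by iterating f to a fixed
    point; on finite relations the chain stabilises when R ⊆ f(R)."""
    def fw(R):
        acc = set(R)
        while f(acc) != acc:
            acc = f(acc)
        return acc
    return fw

def step_trans(R):
    """One chaining step: R ∪ RR, where RR is relational composition."""
    return R | {(x, z) for (x, y) in R for (y2, z) in R if y == y2}

def refl_on(R):
    """Add identity pairs on the elements mentioned in R."""
    return {(x, x) for pair in R for x in pair}

# omega-iterating one chaining step yields the transitive closure R+
print(sorted(omega(step_trans)({(1, 2), (2, 3), (3, 4)})))
# combining functions with union: a reflexive-transitive closure
print(sorted(omega(union(step_trans, refl_on))({(1, 2)})))
```

The first call computes the transitive closure of a three-element chain; the second shows how ∪F lets separately defined functions be iterated together.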
Other examples of compatible functions are typically contextual closure functions, or up-to-context, mapping a relation into its closure w.r.t. a given set of contexts. Not all context closures are compatible: their compatibility must be established separately for each LTS. For such functions, the following lemmas show that compatibility entails soundness, and that the compatibility of up-to-context implies the congruence of (strong or weak) bisimilarity.

Lemma 2.5. If f is p-compatible, then f is sound (for ∼ when p = sp, for ≈ when p = wp).

Lemma 2.6. If f is sp-compatible, then f(∼) ⊆ ∼; if f is wp-compatible, then f(≈) ⊆ ≈.
Certain closure properties for compatible functions however only hold in the strong case. The main example is the chaining operator ⌢, which implements pointwise relational composition: (f ⌢ g)(R) ≜ f(R) g(R), where the juxtaposition R S of two relations R and S denotes their relational composition. Using chaining we can obtain the compatibility of the 'up-to-transitivity' function, mapping a relation R onto its reflexive-transitive closure R*. Another important example of compatible function in the strong case is 'up-to-strong-bisimilarity' (R → ∼R∼), which is also compatible in the weak case.
In contrast, the counterpart of this latter function in the weak case, R → ≈R≈, is unsound. This is a major drawback in up-to techniques for weak bisimilarity, which can be partially overcome by resorting to the expansion relation ≽ [AKH92, SM92] (a refinement of expansion is the contraction relation [San17]). Expansion is an asymmetric refinement of weak bisimilarity whereby P ≽ Q holds if P and Q are bisimilar and, in addition, Q is at least as efficient as P, in the sense that Q is capable of producing the same activity as P without ever performing more internal activities (the τ-actions). More precisely, the associated progression function is the function ep below, and ≽ is the union of all R such that R ⊆ ep(R):

ep(S) ≜ {(P, Q) | whenever P −µ→ P′ there is Q′ with Q −µ̂→ Q′ and P′ S Q′, and whenever Q −µ→ Q′ there is P′ with P =µ̂⇒ P′ and P′ S Q′},

where Q −µ̂→ Q′ means Q −µ→ Q′, or µ = τ and Q′ = Q.
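The unsoundness of R → ≈R≈ can be checked concretely on the classic counterexample: take the process τ.a (written "tau.a" below), which performs τ and then a, against the inactive process 0. The singleton relation R = {(τ.a, 0)} progresses to ≈R≈, although τ.a and 0 are not weakly bisimilar. The following self-contained Python sketch (our own encoding, not the paper's) verifies both facts.

```python
# Executable check of the counterexample to 'weak bisimulation up to ≈'.

STATES = {"tau.a", "a", "0"}
TRANS = {("tau.a", "tau", "a"), ("a", "a", "0")}

def tau_closure(xs):
    seen, todo = set(xs), list(xs)
    while todo:
        q = todo.pop()
        for (q1, m, q2) in TRANS:
            if q1 == q and m == "tau" and q2 not in seen:
                seen.add(q2)
                todo.append(q2)
    return seen

def weak(Q, mu):
    """States reachable via the weak transition =mu-hat=>."""
    pre = tau_closure({Q})
    if mu == "tau":
        return pre
    mid = {q2 for q in pre for (q1, m, q2) in TRANS if q1 == q and m == mu}
    return tau_closure(mid)

def wp(S):
    """Pairs whose strong transitions are matched by weak ones, with the
    derivatives (kept in the same order) related by S."""
    def ok(P, Q):
        fwd = all(any((P2, Q2) in S for Q2 in weak(Q, mu))
                  for (P1, mu, P2) in TRANS if P1 == P)
        bwd = all(any((P2, Q2) in S for P2 in weak(P, mu))
                  for (Q1, mu, Q2) in TRANS if Q1 == Q)
        return fwd and bwd
    return {(P, Q) for P in STATES for Q in STATES if ok(P, Q)}

def compose(A, B):
    return {(x, z) for (x, y) in A for (y2, z) in B if y == y2}

# weak bisimilarity = greatest fixed point of wp, by iteration
WB = {(P, Q) for P in STATES for Q in STATES}
while wp(WB) != WB:
    WB = wp(WB)

R = {("tau.a", "0")}
up_to = compose(compose(WB, R), WB)   # the relation ≈ R ≈

print(("tau.a", "0") in WB)  # False: tau.a and 0 are not weakly bisimilar
print(R <= wp(up_to))        # True: yet R progresses to ≈R≈
```

The τ-step of τ.a is matched by 0 staying idle, with the derivatives a and 0 related through a ≈ τ.a R 0 ≈ 0; this is exactly how the technique goes wrong.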
Up-to-expansion yields a function (R → R ) that is contained in a wp-compatible function, and which can be freely combined with any wp-compatible function, yielding for instance the 'up-to-expansion-and-contexts' technique. More sophisticated up-to techniques can be obtained by carefully adjusting the interplay between visible and internal transitions, and by taking into account termination hypotheses [PS12].
Some further compatible functions are the functions sp and wp themselves (indeed a function f is p-compatible if f • p ⊆ p • f, which trivially holds when f is p itself). Intuitively, the use of sp and wp as up-to techniques means that, in a diagram-chasing argument, the two derivatives need not be related; it is sufficient that the derivatives of such derivatives be related. Accordingly, we sometimes call sp and wp unfolding functions. We will use sp in the example in Section 6 and wp in Sections 4 and 5, when proving the wp-compatibility of the up-to-context techniques. Note that up-to-context functions are the only ones that need to be proved compatible separately for each LTS; in this section all other functions mentioned as compatible are so for every first-order LTS.

Companion.
We say that f is below g, and write f ⊆ g, when f (R) ⊆ g(R) for every relation R. Any function below a sound function is sound as well. Similarly, if f ⊆ g and g(∼) ⊆ ∼ then f (∼) ⊆ ∼.
In general a function below a compatible function need not be itself compatible. However, it turns out that there is a largest compatible function, which is called the companion of p [Pou16], defined as the pointwise union of all p-compatible functions:

t_p ≜ ∪ {f | f is p-compatible}.

In the following, we generally omit the subscript or superscript p when clear from the context. Since t is itself compatible, we can deduce from Lemmas 2.5 and 2.6 that if f ⊆ t_sp, then f is sound for ∼ and f(∼) ⊆ ∼. Similarly in the weak case: if f ⊆ t_wp, then f is sound for ≈ and f(≈) ⊆ ≈.
The identity function id and the function p itself are below t. The fact that function composition preserves compatibility is reflected by the idempotence of t, i.e. t • t = t. Since the companion is idempotent and contains all compatible functions, every bisimulation proof up to a certain combination of compatible functions can be presented as a bisimulation up to the companion. Although this observation does not make such proofs fundamentally easier, it slightly simplifies their presentation: the precise combination of up-to techniques does not have to be made explicit. This is typically extremely convenient in a proof assistant.
2.2. Tools for validating up-to techniques. The companion makes it possible to perform bisimulation proofs up to arbitrary combinations of functions that are known to be below it. In concrete languages, we thus have to prove that the functions associated to up-to techniques, such as up-to-context, are indeed below the companion. By definition of the companion, given a function f, an obvious way to prove f ⊆ t consists in proving that f is compatible, i.e., f ⤳ f. This is however quite restrictive in practice, because many useful functions are not compatible by themselves: they are only contained in a compatible function, which is often hard to express explicitly. (Very much like bisimulations up-to, which can be small and convenient to work with while the concrete bisimulations lying over them can be large or hard to express.) Seeing the companion as a coinductive object, one can in fact relax the requirement f ⤳ f in several ways:

(1) if f ⤳ g • f • g for some function g ⊆ t, then f ⊆ t. This means we can freely exploit a function g already known to be below the companion when establishing a progression about f.
(2) if f ⤳ f², then f^ω is compatible and contains f. This means we can use f twice in a row when establishing a progression about f. By a similar argument, f ⤳ f^ω also entails f ⊆ t, meaning we can actually use f as many times as required.
(3) for all sets F of functions such that f ⤳ (∪F)^ω for all f ∈ F, we have ∪F ⊆ t, so that all functions in F are below the companion. This intuitively makes it possible to reason by 'mutual coinduction' in order to prove that a family of up-to techniques is valid.
Leaving the companion aside, the last item above was in fact named "compatibility up-to" in the previous version of this work [MPS14]. This idea was simplified in [Pou16], by defining the second-order function B(g) ≜ ∪ {f | f • p ⊆ p • g}. Indeed, the notation f ⤳p g, which was an apparent overloading of the progression operator ⤳, can now be seen as the regular progression operator associated to the function B. This function B also has a companion, written T, which is a monotone function satisfying the following properties, for all monotone functions f:

(1) if f ⤳ T(f), then f ⊆ t;
(2) T(f) contains t and f, and is closed under composition and union.

The first point is just the fact that every function compatible up to T lies below the companion (like every bisimulation up to t is contained in bisimilarity). The second point tells us that, given a family F of functions, T(∪F) actually contains all potential combinations of functions in F and functions below t.
The three examples of compatible functions up-to listed above can thus be seen as particular instances of compatible functions up to T. In particular, the last item, which we will use repeatedly to prove that up-to-context techniques are valid in the first-order LTSs we present, can be generalised as follows: for all sets F of functions, if f ⤳ T(∪F) for every f ∈ F, then ∪F ⊆ t.

Remark 2.7 (On respectfulness). In the first modular treatment of up-to techniques for bisimilarity [San98,SW01], the notion of respectful function was used: a monotone function f is respectful if for all R, S such that R ⊆ S and R ⤳ S, we have f(R) ⤳ f(S). Every compatible function is respectful, but the converse is not true. The hypothesis R ⊆ S was actually added in the definition of respectfulness to ease proofs about up-to-context, which typically lead to respectful functions that are not compatible. However, this difference between compatible and respectful functions disappears when considering the companion: the largest compatible function and the largest respectful function coincide [Pou16], so that focusing on the simpler notion of compatibility does not prevent us from using certain up-to techniques, in the end.
In practice, proofs of up-to techniques based on respectfulness can be adapted to the compatibility setting as follows. Suppose we try to prove f ⤳ T(f) for a specific function f, i.e., to prove that R ⤳ S entails f(R) ⤳ T(f)(S). The missing assumption R ⊆ S is in general useful for those cases where we obtain process derivatives related via R rather than S. Respectfulness makes it possible to conclude directly in those cases. With compatibility, we can use the up-to-unfolding technique: we have R ⊆ p(S) ⊆ T(f)(S), where the first inclusion is just the assumption R ⤳ S, and the second holds since p ⊆ t ⊆ T(f).


The π-calculus

We let letters a, b range over a set of names. We recall the syntax for π-calculus processes (P) and transition labels (µ):

P ::= 0 | a(b).P | āb.P | (P | P) | P + P | νb P | !P        µ ::= ab | āb | ā(b) | τ

The name b is bound in P in the constructs a(b).P and νb P. The early operational semantics is described by the rules for −→π in Figure 1. We write fn(Q) for the free names in Q, defined as usual. The names n(µ) of µ are defined as n(ab) = n(āb) = n(ā(b)) = {a, b} and n(τ) = ∅, and the bound names bn(µ) of µ are defined as bn(ab) = bn(āb) = bn(τ) = ∅ and bn(ā(b)) = {b}. In some presentations of the π-calculus, the bound-output rule allows a transition νb P −ā(b)→π P′ directly; those presentations look simpler but rely on α-conversion of b in νb P. We choose here to be more explicit.
We do not want to distinguish processes according to the identity of the bound names they may extrude. This is why we need a specific clause for bound outputs in the standard definition of bisimulation:

Definition 3.1. A relation R is a strong early bisimulation if, whenever P R Q:
(1) if P −µ→π P′ and µ is not a bound output, then Q −µ→π Q′ for some Q′ such that P′ R Q′;
(2) if P −ā(b)→π P′ and b ∉ fn(Q), then Q −ā(b)→π Q′ for some Q′ such that P′ R Q′;
(3) the converse of (1) and (2), on Q.
Early bisimilarity, ∼e, is the union of all early bisimulations. The weak version of early bisimilarity, weak early bisimilarity, written ≈e, is obtained in the standard way, replacing the matching transitions Q −µ→π Q′ with the weak transitions Q =µ̂⇒π Q′.

When translating the π-calculus semantics to a first-order one, the ad-hoc condition b ∉ fn(Q) has to be removed. To this end, one has to force an agreement between two bisimilar processes on the choice of the bound names appearing in transitions. We obtain this by considering named processes (c, P) in which c is bigger or equal to all names in P. For this to make sense we assume an enumeration of the names and use ≤ as the underlying order, and write c + 1 for the name following c in the enumeration; for a set of names N, we also write c ≥ N to mean c ≥ a for all a ∈ N.
The rules below define the translation of the π-calculus transition system to a first-order LTS. In the first-order LTS, the grammar for labels is the same as that of the original LTS; however, for a named process (c, P ) the only name that may be exported in a bound output is c + 1; similarly only names that are below or equal to c + 1 may be imported in an input transition. (Indeed, testing for all fresh names b > c is unnecessary, doing it only for one (b = c + 1) is enough.) This makes it possible to use the ordinary definition of bisimilarity for first-order LTSs, and thus recover the early bisimilarity on the source terms.
We write π1 for the first-order LTS derived from the above translation of the π-calculus. Although the labels of the source and target transitions have a similar shape, the LTS π1 is first-order because labels are taken as purely syntactic, uninterpreted objects. We can also define π1 using two rules only: this characterisation is less explicit but sometimes more convenient in proofs, and it might give an insight into how to derive translations for other name-based calculi, by keeping track of new names and of binding labels. We will show that the standard notions of strong and weak early bisimilarity for the π-calculus (∼e and ≈e from Definition 3.1) correspond to ∼ and ≈ in π1. Proving soundness, i.e., that bisimilarity in π1 entails bisimilarity in the π-calculus, requires us to establish first that bisimilarity in π1 is stable under injective substitutions. Anticipating that we also want to propose various up-to techniques for π1, we show directly that the corresponding up-to-injective-substitutions technique is below the companion. It follows that bisimilarity in π1 is stable under injective substitutions by Lemma 2.6, and the work is done only once. We define four monotone functions on relations on π1 processes. The first one, isub, makes it possible to use injective substitutions; the second one, bsub, is restricted to bijective substitutions; the third one, str, is a form of strengthening, making it possible to readjust the bound c on free names; conversely, the last one, w, is a form of weakening. The last two functions are often useful as up-to techniques by themselves. The point of the function bsub is that it makes it possible to obtain isub as a derived technique: we have isub = bsub • w, and bsub is slightly easier to analyse.
Lemma 3.2. The functions isub, bsub, str, and w are all below the companion t sp and below the companion t wp .
Then we show w ⤳ bsub • w and str ⤳ str • bsub, which is done using a similar diagram-chasing argument. Each newly created name is handled with a transposition using bsub, using the facts that Q −µ→π Q′ implies fn(Q′) ⊆ fn(Q) ∪ n(µ), and that fn(Q′σ) = σ(fn(Q′)). Since bsub ⊆ t, we deduce w ⤳ T(w) and str ⤳ T(str), so that both w and str are also below t. It follows that isub = bsub • w ⊆ t • t = t.
It follows by Lemma 2.6 that bisimilarities in π1 are closed under injective substitutions: isub(∼) ⊆ ∼ and isub(≈) ⊆ ≈. We can now establish full abstraction between π1 and early bisimilarities:

Theorem 3.3. For all P, Q and c ≥ fn(P) ∪ fn(Q): P ∼e Q iff (c, P) ∼ (c, Q), and P ≈e Q iff (c, P) ≈ (c, Q).

Proof. We prove the case of weak bisimilarity, the strong case being easier. For the direct implication, we show that the relation

R1 ≜ {((c, P), (c, Q)) | P ≈e Q and c ≥ fn(P) ∪ fn(Q)}

is a weak bisimulation. The only interesting transitions are those involving the fresh name b = c + 1: repeatedly applying the rules defining −→, the matching weak transition of Q in the π-calculus yields a weak transition of (c, Q) in π1, with derivatives again in R1.

For the converse, proving that

R2 ≜ {(P, Q) | (c, P) ≈ (c, Q) for some c ≥ fn(P) ∪ fn(Q)}

is a weak early bisimulation needs a little more care, since fresh names in labels can be other than c + 1 (they can be less than or greater than c + 1). Suppose P R2 Q, which means there is c ≥ fn(P) ∪ fn(Q) such that (c, P) ≈ (c, Q). We analyse the transitions of the form P −µ→π P′: when µ involves a fresh name b ≠ c + 1, we first play the corresponding transition with c + 1 in π1 and then apply the injective substitution exchanging b and c + 1; since isub(≈) ⊆ ≈ by Lemmas 2.6 and 3.2, we obtain matching derivatives related by ≈, and hence by R2.

The above full abstraction result allows us to import the theory of up-to techniques for first-order LTSs and bisimilarity, in both the strong and the weak cases.
We have already proved the validity of preliminary up-to techniques that are specific to π 1 (Lemma 3.2); we proceed below with up-to-context techniques.
The up-to-context function is decomposed into a set of smaller context functions, called initial [PS12], one for each operator of the π-calculus. The only exception to this is the input prefix, since early bisimilarity in the π-calculus is not preserved by this operator. We write C o , C ν , C ! , C | , and C + for these initial context functions, respectively applying the operators of output prefix, restriction, replication, parallel composition, and sum, to all pairs in the given relation.
Definition 3.4. We define the functions Co, Cν, C!, C| and C+ on relations on π1 by rules that apply the corresponding operator (output prefix, restriction, replication, parallel composition, sum) to both components of every pair in the given relation.

While bisimilarity in the π-calculus is not preserved by input prefix, a weaker rule holds:

if P{b/x} ⋈ Q{b/x} for all names b, then a(x).P ⋈ a(x).Q,    (3.1)

where ⋈ can be ∼e or ≈e. We define accordingly Ci, the function for input prefix:

Definition 3.5. Ci is the function on π1 relations that relates (c, a(x).P) and (c, a(x).Q) whenever (c, P{b/x}) R (c, Q{b/x}) for all names b ≤ c + 1.

Theorem 3.6. Let F ≜ {Co, Cν, C!, C|, C+, Ci}. Then ∪F ⊆ t_sp.

Proof. We show f ⤳ T(∪F) for each function f ∈ F. In each case, we assume R ⤳ S, i.e., R ⊆ sp(S), and we prove f(R) ⤳ T(∪F)(S). For this, it suffices to analyse the transitions emerging from the left-hand side of f(R), as every f is symmetric.

For Cν, the interesting case arises for transitions for which the last rule applied is the extrusion rule, from a pair of processes of the form (c, (νd)P) and (c, (νd)Q). The problem is to relate the resulting derivatives, in which the extruded name is c + 1 on both sides; this is done using the isub function with the injective substitution {b/d}.

For C|, we analyse the transition (c, P1 | P2) −µ→ (c′, P′). First, let us assume that c′ = c. The transition must come from one of the four rules par-l, par-r, comm-l, or comm-r:
- rule par-l takes us to a pair of the form ((c, P′1 | P2), (c, Q′1 | Q2)) with (c, P′1) S (c, Q′1), which C| puts back into the candidate relation;
- symmetrically, rule par-r takes us to the pair ((c, P1 | P′2), (c, Q1 | Q′2));
- working both sides, rules comm-l and comm-r both lead us to a pair of parallel compositions of derivatives related by S.
The second case is when c′ = c + 1. This means that the transition is derived using par-l, par-r, close-l, or close-r. We consider two cases:
- The last rule is a par-l rule (par-r being symmetric), with a label of the form ā(b) or a(b), where b is fresh.
We know (c, P1) R (c, Q1) and (c, P2) R (c, Q2). We have b = c + 1, following the rule for bound output. We also have the corresponding transitions in π1, from (c, P1), and then from (c, Q1) using the progression R ⤳ S; we then need to relate the resulting processes, which we do by combining the derivative pairs with C| and the functions of Lemma 3.2. The same happens for the input transition: we can assume b = c + 1, as b is fresh on both sides. The two hypotheses can then be transformed into transitions in π1, and we have the same transitions for Q1 and Q2, respectively. Using the hypothesis R ⤳ S, we obtain named processes (b, Q′1) and (b, Q′2), related through S, which we can combine using C| and then strengthen b to c since b ∉ fn(P, Q).
with Q′ of the same shape, so we only need to relate (c′, P′) to (c′, Q′) knowing (b, Pi) S (b, Qi). First, we note that (b, P) isub(R) (b, Q). We then have, in S0 ≜ (C! ∪ id)(isub(R ∪ S)), the pairs needed to rebuild the replicated processes and their derivatives. We can then apply C| several times to obtain the three pairs we need (with S1 ≜ C|^ω(S0)). The first two pairs handle cases 1 and 2. For case 3 we need to apply Cν to add (νb)−, and then str so as to go from (b, (νb)−) to (c, (νb)−). We apply C|^ω again to add the missing − | P | . . . | P and we obtain (c, P′) and (c, Q′) in the relation C|^ω(str(Cν(S1))). Concluding, we have obtained a progression from C!(R) into this relation, which was our original goal. Note that the iterated C|^ω was used twice; both times it can be absorbed by T, so as to give us in the end C! ⤳ T(∪F). For each f ∈ F we have established a progression from f(R) to T(∪F)(R ∪ S), and so to T(∪F)(S), as needed: this gives us f ⤳ T(∪F), and in turn ∪F ⤳ T(∪F) and ∪F ⊆ t.
Weak bisimilarity is not preserved by sums, only by guarded sums; we write Cg+ for the corresponding function, defined like C+ but restricted to prefixed summands.

Theorem 3.7. The functions Co, Cν, C!, C|, Cg+ and Ci are below t_wp.

Proof. The progressions are as in the proof of Theorem 3.6, except for Cg+, which is treated as Co and Ci; we need one more up-to technique for the case of the replication. Assuming R ⤳wp S, the same progressions hold. For the replication operator, only case 1 (of the corresponding proof of Theorem 3.6) cannot be transported to the weak case: there, the τ-transition of the left-hand side is matched by the right-hand side performing some number n ≥ 0 of transitions. We use the property that R ⤳ S so that, from (c, P), we obtain the matching weak transitions of (c, Q), and we conclude as before when n > 0.
• if n = 0 then there is no transition from Q or !Q; we know P0 S Q but we cannot reach the desired form (c, !Q | Q | . . . | Q) with a transition. Instead, we remark that (c, !Q) ∼ (c, !Q | Q | . . . | Q) and so we simply progress to the relation S2∼ where S2 ≜ C|^ω(C!(R) ∪ R ∪ S). Compared to the strong case, we only need to compose (on the left) the right-hand side of the progression with the function R → ∼R∼ ('up-to-strong-bisimilarity'), which is indeed wp-compatible.

As a byproduct of the compatibility of these initial context functions, and using Lemma 2.6, we derive the standard congruence properties of strong and weak early bisimilarity, including rule (3.1) for input prefix.
Corollary 3.8. In the π-calculus, relations ∼e and ≈e are preserved by the operators of output prefix, replication, parallel composition, and restriction; ∼e is also preserved by sum, whereas ≈e is only preserved by guarded sums. Moreover, rule (3.1), for input prefix, is valid both for ∼e and ≈e.

We conclude this section with a discussion of late bisimilarity. In the late style, an input transition P −a(b)→π P′ leaves the name b bound, the definition of bisimulation containing a quantification over names. To translate this bisimilarity in a first-order LTS we would need two transitions for the input a(b): one to fire the input a, leaving b uninstantiated (for example, in a new kind of process (b)(c, P), akin to an abstraction), and another to instantiate b with any name, for transitions starting from processes of the new kind. While such a translation does yield full abstraction for both strong and weak late bisimilarities, the decomposition of an input transition into two steps prevents us from obtaining the compatibility of up-to-context. Indeed, compatibility of up-to-context intuitively requires that the immediate transitions of C[P] should depend only on the immediate transitions of P. However, if inputs are decomposed into two steps, a context such as [·]1 | āb may combine two successive steps of the (input) argument to perform a single τ transition.
To conclude, the main take-away message on the π-calculus is that it suffices to count names to make the LTS first-order. Then, once the corresponding up-to techniques for names are set up, we recover the usual progression proofs, in a modular way. While this level of modularity was already present in [Pou08], it now becomes simpler thanks to the companion.

Call-by-name λ-calculus
To study the applicability of our approach to higher-order languages, we investigate the pure call-by-name λ-calculus, referred to as ΛN in the sequel.
We use M, N to range over the set Λ of λ-terms, and x, y, z to range over variables. The set Λ of pure λ-terms is defined by:

M, N ::= x | λx.M | M N

We assume the familiar concepts of free and bound variables and substitutions, and identify α-convertible terms. The only values are the λ-abstractions λx.M. In this section and in the following one, results and definitions are presented on closed terms, and we write Λ0 for the subset of closed terms. Extension to open terms is made using closing abstractions (i.e., abstracting on all free variables). The reduction relation of ΛN is the call-by-name reduction relation −→n, defined as the least relation over Λ0 that is closed under the following rules:

(λx.M) N −→n M{N/x}        M −→n M′ implies M N −→n M′ N

We write =⇒n for its reflexive and transitive closure. In call-by-name, evaluation contexts are described by the following grammar:

E ::= [·] | E M

As reference equivalence for the λ-calculus we consider environmental bisimilarity [SKS11,KLS11], which coincides with contextual equivalence and Abramsky's applicative bisimilarity [Abr89] on pure λ-terms while enabling a richer set of up-to techniques. Environmental bisimilarity makes a clear distinction between the tested terms and the environment. An element of an environmental bisimulation has, in addition to the tested terms M and N, a further component E, the environment, which expresses the observer's current knowledge. When an input from the observer is required, the arguments supplied are terms that the observer can build using the current knowledge; that is, terms obtained by composing the values in E using the operators of the calculus. An environmental relation is a set of elements, each of which can be of two forms: either a relation E on closed values, or a triple (E, M, N) where M, N are closed terms and E is a relation on closed values. We use X, Y to range over environmental relations. In a triple (E, M, N) the relation component E is the environment, and M, N are the tested terms. We write M X_E N for (E, M, N) ∈ X.
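The call-by-name rules above can be sketched as a small-step interpreter; the tuple encoding of terms and the names `step` and `eval_cbn` are ours, not the paper's formalisation.

```python
# A minimal sketch of call-by-name reduction -->_n on closed lambda-terms,
# represented as nested tuples: ("var", x), ("lam", x, M), ("app", M, N).

def subst(M, x, N):
    """M{N/x}; no renaming is needed when N is closed."""
    tag = M[0]
    if tag == "var":
        return N if M[1] == x else M
    if tag == "lam":
        return M if M[1] == x else ("lam", M[1], subst(M[2], x, N))
    return ("app", subst(M[1], x, N), subst(M[2], x, N))

def step(M):
    """One call-by-name step, or None if M is a value (an abstraction)."""
    if M[0] == "app":
        F, A = M[1], M[2]
        if F[0] == "lam":                   # beta-reduction at the head
            return subst(F[2], F[1], A)
        F2 = step(F)                        # reduce the function position
        return None if F2 is None else ("app", F2, A)
    return None

def eval_cbn(M, fuel=1000):
    """Iterate -->_n (a bounded fragment of ==>_n)."""
    while fuel and (M2 := step(M)) is not None:
        M, fuel = M2, fuel - 1
    return M

I = ("lam", "x", ("var", "x"))
K = ("lam", "x", ("lam", "y", ("var", "x")))
W = ("lam", "z", ("app", ("var", "z"), ("var", "z")))
print(eval_cbn(("app", ("app", K, I), W)) == I)  # True
```

Note that (K I) W reduces to I without the argument W ever being evaluated, as call-by-name prescribes: only the function position of an application is reduced.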
We write E⋆ for the closure of E under contexts. We only define the weak version of the bisimilarity; its strong version is obtained in the expected way. For environmental bisimilarity to be expressed via a first-order transition system, a few issues have to be resolved. For instance, an environmental bisimilarity contains both triples (E, M, N) and pure environments E, which shows up in the difference between clauses (1) and (2) of Definition 4.1. Moreover, the input supplied to tested terms may be constructed using arbitrary contexts.
We write ΛN 1 for the first-order LTS resulting from the translation of ΛN . The states of ΛN 1 are sequences of λ-terms in which only the last one need not be a value. We use Γ and ∆ to range over sequences of values only; thus (Γ, M ) indicates a sequence of λ-values followed by M . We write |Γ| for the length of a sequence Γ, and Γ i for the i-th element in Γ, when i ≤ |Γ|.
For a finite environment E, we write E1 for an ordered projection of the pairs in E on the first component, and E2 for the corresponding projection on the second component. In the translation, intuitively, a triple (E, M, N) of an environmental bisimulation is split into the two components (E1, M) and (E2, N). When C is a context of arity |Γ|, we write C[Γ] for the term obtained by replacing each hole [·]i in C with the value Γi. The rules for transitions in ΛN1 are as follows; they are reminiscent of [LM00]. The first rule says that if M reduces to M′ in ΛN then M can also reduce in ΛN1, in any environment. The second rule implements the observations in clause (2) of Definition 4.1: in an environment Γ (only containing values), any component Γi can be tested by supplying, as input, a term obtained by filling a context C with values from Γ itself. The label of the transition records the position i and the context chosen. As the rules show, the labels of ΛN1 include the special label τ, and can also be of the form (i, C) where i is an integer and C a context.
We establish full abstraction from environmental bisimilarity to bisimilarity on ΛN 1 for finite environments. Full abstraction for the empty environment alone is enough for our interests since contextual equivalence corresponds to environmental bisimilarity with the empty environment. One could accommodate ΛN 1 and the corresponding full abstraction result for possibly-infinite environments, however we felt that it was not worth the notational complications, since infinite environments are not reachable from finite ones in environmental bisimulations, and since we do not think that infinite environments increase discriminative power. In the statement below, ≈ denotes standard weak bisimilarity (Definition 2.2) on ΛN 1 .
The following proof shows a precise correspondence between environmental bisimulations and bisimulations in ΛN 1 . The reader familiar with environmental bisimilarities should find the statement illustrative and maybe applicable to other variants of environmental bisimilarities. It is also possible to show a direct, although less precise, correspondence between contextual equivalence and bisimilarity. This second approach is shown for the imperative λ-calculus in Section 5 and exploits the compatibility of up-to-context functions. Since compatibility of up-to-context is proved independently of the correspondence result for ΛN 1 , this approach would also work for ΛN 1 ; however, we found it more interesting here to show the more precise result.
Theorem 4.2. When E is a finite environment, Proof. (⇒) We show that if X is an environmental bisimulation then X 2 is a (first-order) weak bisimulation, where X 2 relates (E 1 , M ) to (E 2 , N ) when (E, M, N ) ∈ X , and E 1 to E 2 when E ∈ X . By symmetry we consider only one direction: we suppose x X 2 y and a transition x −µ→ x′, and we obtain y′ such that y μ̂ =⇒ y′ and x′ X 2 y′.
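The construction of X 2 from X used in this proof is easy to make concrete. The following Python sketch, with our own representation of environments as tuples of value pairs, performs the splitting:

```python
# From a set X of triples (E, M, N), where E is a tuple of value pairs,
# build the first-order relation X_2 relating (E_1, M) to (E_2, N).
# The representation is ours, for illustration only.

def split(X):
    X2 = set()
    for (E, M, N) in X:
        E1 = tuple(v for (v, _) in E)   # first projection of the environment
        E2 = tuple(w for (_, w) in E)   # second projection
        X2.add(((E1, M), (E2, N)))
    return X2

X = {((('V', 'W'),), 'M', 'N')}
print(split(X))
```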
(2) If (λx.P, λx.Q) ∈ Γ · ∆ ∈ X and (M, N ) ∈ (Γ · ∆) , we prove that P {M/x} X Γ·∆ Q{N/x}. The theorem also holds for the strong versions of the bisimilarities. Again, having established full abstraction with respect to a first-order transition system and ordinary bisimilarity, we can inherit the theory of bisimulation enhancements. We have however to check up-to techniques that are specific to environmental bisimilarity.
Structure and reusability of proofs. The first technique is proved compatible in Lemma 4.3, which is an example of the standard way of proving compatibility. The other three techniques are interdependent in that they each progress to a function containing all three (Lemmas 4.4, 4.13, and 4.14). These progressions could be established separately, which would be an improvement of modularity over a monolithic proof of compatibility (itself an improvement of size over two redundant proofs of up-to-context and congruence). Moreover, we achieve here a substantial amount of additional proof refactoring thanks to two general ingredients. The first (Definition 4.5, Lemmas 4.6 and 4.12) may be of general interest to handle calculi whose grammars separate 'values' from 'non-values'. The second (Lemmas 4.8, 4.9, and 4.10) may be of general interest for calculi that are quasi-deterministic, in the sense of Definition 4.7. (These results are used again in Section 5.) The three progressions are finally combined into Theorem 4.15.
A useful technique specific to environmental bisimilarity is 'up-to-environment', which allows us to replace an environment with a larger one. We define w(R) as the smallest relation that includes R and such that, whenever (   Proof. Since silent transitions do not alter the environment, we only consider (i, C)-transitions; writing Γ = V 1 , . . . , V n , Γ and ∆ = W 1 , . . . , W n , ∆, we have: where C +n is C where each hole [·] j has been replaced with [·] j+n . Then Γ N ), and so from R S we obtain w(R) w(S).
Somewhat dual to weakening is the strengthening of the environment, in which a component of an environment can be removed. However this is only possible if the component removed is 'redundant', that is, it can be obtained by gluing other pieces of the environment within a context; strengthening is captured by the following str function: where C v ranges over value contexts (i.e., the outermost operator of C v is an abstraction or C v is a hole). We show that str is below the companion in Theorem 4.15.
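To give a concrete feel for redundancy, here is a hypothetical Python check; environments are tuples and 'contexts' are simply functions over the remaining components, so this only mimics the shape of the condition, not the formal definition of str:

```python
# Simplified illustration of 'redundancy' for str: a component of an
# environment Gamma is redundant when it equals Cv[Gamma'] for some value
# context Cv, where Gamma' is the environment without that component.
# Contexts are modelled here as plain functions (our assumption).

def is_redundant(gamma, i, candidate_contexts):
    rest = gamma[:i] + gamma[i+1:]
    return any(c(rest) == gamma[i] for c in candidate_contexts)

gamma = ('I', 'K', 'K')   # 'K' occurs twice: the last component is redundant
# Candidate contexts: projections onto a component of the remaining environment.
ctxs = [lambda g, j=j: g[j] for j in range(2)]
print(is_redundant(gamma, 2, ctxs))   # the third component can be removed
print(is_redundant(gamma, 0, ctxs))   # the first cannot
```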
For up-to-context, we need to distinguish between arbitrary contexts and evaluation contexts. There are indeed congruence properties, and corresponding up-to techniques, that only hold for the latter contexts. A hole [·] i of a context C is in a redex position if the context obtained by filling all the holes but [·] i with values is an evaluation context. Below, C ranges over arbitrary contexts, whereas E ranges over contexts in which the first hole [·] 1 appears exactly once and in redex position. We will prove that functions C, str, and C e are below both companions with a separate progression result for each function. We start by establishing a progression for C.
(1) Suppose C[Γ] is a value, so C is a value context C v ; the transition to consider is of the (2) If C[Γ] is not a value then C is necessarily of the form C = E[C v1 C 2 , −], for some evaluation context E, value context C v1 , and context C 2 . The transition is of the form Using the label i, C 2 , the progression R S provides us with an answer (∆, N 1 ) such that ∆ i (C 2 [∆]) =⇒ n N 1 , which allows us to conclude up to C e : Before moving on to the techniques str and C e , it is useful to remark that when they are applied to values, they look like special cases of C. This can be used to shorten the proofs substantially, but this needs to be made formal first by defining a restriction function and using it to relate str and C e to C.
The first step is to show that indeed, techniques C e and str are, on value configuration pairs, special cases of C: Proof. Any pair in C e (v(R)) is of the form ((Γ , E[Γ n , Γ ]), (∆ , E[∆ n , ∆ ])) where: n is the arity of E, (Γ, ∆) ∈ R, and Γ (respectively ∆ ) is the sequence Γ (respectively ∆) without its last element. The context C = E[[·] n , [·] 1 , . . . , [·] n−1 ] applied to (Γ, ∆) ∈ R shows that the original pair is of the form ((Γ , C[Γ]), (∆ , C[∆])) and hence is in t(C(R)): we use w ⊆ t to remove the nth values from the environments Γ and ∆. The same argument applies for str as well, except that we use t in t(C(R)) only to swap the last two elements of the sequences.
We handled pairs of value configurations, so now we need to handle the other kinds of pairs. We first handle the case where the left member of the pair is a value configuration. However, we first need to define a notion of determinism of an LTS: Definition 4.7. We say that an LTS (Pr, Act, −→) is quasi-deterministic if there exists an equivalence relation on Act such that for all labels µ, µ′ ∈ Act and processes x, This version of determinism is looser than strict determinism, since it allows derivatives to be strongly bisimilar and not necessarily equal, and labels to be related through some equivalence relation, rather than equal. This equivalence relation must in turn be reflected by the set of labels that can be performed from a given process.
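An approximate executable reading of this definition, for finite LTSs, is sketched below. Two simplifications are ours: derivatives under the same label must be equal (rather than strongly bisimilar), and the equivalence on labels is supplied explicitly as a predicate:

```python
# Rough check of quasi-determinism for a finite LTS, under two flagged
# assumptions: equality approximates strong bisimilarity on derivatives, and
# the equivalence on labels is given by the caller.

def quasi_deterministic(states, act, transitions, equiv):
    """transitions: set of (x, mu, x'); equiv(mu, nu): equivalence on labels."""
    succ = {}
    for (x, mu, x1) in transitions:
        succ.setdefault((x, mu), set()).add(x1)
    # (i) per-label determinism (equality in place of strong bisimilarity)
    if any(len(s) > 1 for s in succ.values()):
        return False
    # (ii) equivalent labels are enabled together from every state
    for x in states:
        for mu in act:
            for nu in act:
                if equiv(mu, nu) and (x, mu) in succ and (x, nu) not in succ:
                    return False
    return True

# Labels (i, C) and (i, C') are equated when they test the same position:
same_class = lambda mu, nu: mu == nu or (
    isinstance(mu, tuple) and isinstance(nu, tuple) and mu[0] == nu[0])

lts = {('p', (1, 'C'), 'q'), ('p', (1, 'D'), 'q'), ('q', 'tau', 'r')}
print(quasi_deterministic({'p', 'q', 'r'}, {(1, 'C'), (1, 'D'), 'tau'},
                          lts, same_class))
```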
A similar notion can be found in the formalisation of a compiler with some nondeterminism [SVN + 13], where such a relation on labels is defined. This relation satisfies (1); an LTS that is said to be 'determinate' satisfies (2) (although our (2) is more relaxed, as it allows for bisimilar processes) and (3); and a 'receptive' LTS satisfies (4).  Proof. We first prove that whenever µ 1 , µ 2 = τ , for all x 2 , x 2 , by induction on x 1 =⇒ x 1 .
Proof. ΛN 1 is quasi-deterministic, equating labels (i, C) and (i, C′) whenever C and C′ are of the same arity, so we use Lemma 4.9 with x = ∆, µ = 1, C 0 (where C 0 is a context of arity |Γ|, for example the context λx.x with no hole), which gives us y =⇒ y′ −µ→ , hence y′ is of the form ∆′, with (x, y′ ) ∈ wp( S ).
Remark 4.11. Lemma 4.10 does not apply to non-deterministic calculi. In fact, those would require a special label signalling that the configuration is only composed of values in order for the proofs of progressions to go through. This would make Lemma 4.10 unnecessary in the proofs of progressions for str and C e (those proofs would however need to include long parts that are redundant with the proof of progression for C since Lemma 4.12 uses Lemma 4.10).
Lemma 4.12 helps to separate a proof of progression f T (f ∪ g) for a function f into a few simpler proofs, namely: (4.4), that is, the progression for pairs of values; (4.5), that is, the progression for pairs of non-values; and (4.6), that is, the fact that f absorbs the reduction function r (R → ⇒R⇐), up to t. In order to carry out the splitting, f is also required to distribute over union (4.7), which holds for str, C, and C e .
Lemma 4.12. If f and g are monotone functions such that: then f T (f ∪ g).
Proof. We first establish the following inclusion: Let S be a relation and (x, y) ∈ (id \ n)(p(S)), i.e., (x, y) ∈ p(S) and at least one of x or y is a value. The case p = sp is trivial: since x is a value if and only if y is a value, they are both values, and (x, y) ∈ v(p(S)) ⊆ r(v(p(t(S)))). The case p = wp is a consequence of Lemma 4.10:
• First suppose that x is a value Γ. By Lemma 4.10, there is a value ∆ such that (Γ, ∆) ∈ wp( S ) ⊆ wp(t(S)). This is a value pair, so (Γ, ∆) ∈ v(wp(t(S))). Finally, (x, y) ∈ r(v(wp(t(S)))).
• Otherwise, suppose y is a value ∆. We know that (∆, y) ∈ wp(S) −1 = wp(S −1 ) by symmetry of wp, so we can apply Lemma 4.10. This shows that there exists Γ such that (∆, Γ) ∈ wp( S −1 ). Following the reasoning for x, we derive (y, x) ∈ r(v(wp(t(S −1 )))), and so (x, y) ∈ r(v(wp(t(S)))) since r, v, wp, and t are symmetric.
We can now conclude: t by (4.4) and monotonicity of t The distinction between values and non-values simplifies the proof of the progression for str.
Proof. This is the conclusion of Lemma 4.12 with f = str and g = C ∪ C e , so it is sufficient to establish the premises of the lemma: (2) str • n T (str ∪ C ∪ C e ) follows from the stronger inclusion str • n str: (3) str • r ⊆ t • str: since r ⊆ f ⊆ t, we only need to prove str • r ⊆ r • str. This can be derived more algebraically: it is trivial to check that str respects relation mirroring, relational composition, and silent transitions (str(R −1 ) ⊆ str(R) −1 , str(RS) ⊆ str(R)str(S), and str( The stronger result str T (str ∪ C) holds as well, but it requires a longer proof (also redundant with the progression for C) and it is not necessary. The progression for C e follows the same pattern.
Proof. Similarly we apply Lemma 4.12 with f = C e and g = C ∪str, and prove the hypotheses: Proof. Combining Lemmas 4.4, 4.13, and 4.14 provides us with the following progression for wp: and therefore str ∪ C ∪ C e is below the companion t wp . The case for sp is similar but easier; in particular the analogue of Lemma 4.10 is not required.
Once more, the fact that up-to-context functions are below t entails the corresponding congruence properties of environmental bisimilarity. In [SKS11] the two aspects (congruence and up-to-context) had to be proved separately, with similar proofs. Moreover the two cases of contexts (arbitrary contexts and evaluation contexts) had to be considered at the same time, within the same proof. Here, in contrast, the machinery of compatible functions allows us to split the effort into simpler proofs.
Remark 4.16. A transition system ensuring full abstraction as in Theorem 4.2 does not guarantee the compatibility of the up-to techniques specific to the language in consideration. For instance, a simpler and maybe more natural alternative to the second transition in (4.1) is the following one: With this rule, full abstraction holds, but up-to-context is unsound: for every Γ and ∆, the singleton relation {(Γ, ∆)} is a bisimulation up to C: indeed, using rule (4.9), the derivatives of the pair Γ, ∆ are of the shape Γ i (C[Γ]), ∆ i (C[∆]), and they can be discarded immediately, up to the context [·] i C. If up-to-context were sound then we would deduce that any two terms are bisimilar. (The rule in (4.1) prevents such a behaviour since it ensures that the tested values are 'consumed' immediately.)

Imperative call-by-value λ-calculus
In this section we study the addition of imperative features (higher-order references, which we call locations) to a call-by-value λ-calculus. It is known that finding powerful reasoning techniques for imperative higher-order languages is a hard problem. The language, ΛR, is a simplified variant of that in [KW06,SKS11]. The syntax of terms, values, and evaluation contexts, as well as the reduction semantics, are given in Figure 2. A λ-term M is run in a store: a partial function from locations to closed values, whose domain includes all free locations of both M and its own co-domain. We use letters r, s, u, v to range over stores. New store locations may be created using the operator ν M ; the content of a store location ℓ may be read using get ℓ V , or rewritten using set ℓ V (the argument of the former instruction is ignored, and the latter instruction returns the identity value I ≜ λx.x). We denote the reflexive and transitive closure of −→ R by =⇒ R .
Figure 2: The imperative λ-calculus
Note that in contrast with the languages in [KW06,SKS11], locations are not directly first-class values; the expressive power is however the same: a first-class location ℓ can always be encoded as the pair (get ℓ , set ℓ ). Having locations as first-class values by themselves is possible but would require two additional labels (for reading and writing), two additional rules, two new cases in the corresponding case analyses, and new ways to build contexts from environments; presentation and proofs would then be substantially more involved. Hence, for the sake of readability, we have preferred to forbid it.
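The encoding of a first-class location as its pair of capabilities can be mimicked in Python; names and representation here are ours, for illustration only:

```python
# A location is represented by the pair of closures (get_l, set_l) over a
# shared store (a dict from location names to values).

def new_location(store, init):
    loc = len(store)               # fresh location name (assumes no deletion)
    store[loc] = init
    def get(_arg=None):            # get_l V ignores its argument
        return store[loc]
    def set_(v):                   # set_l V writes V and returns the identity I
        store[loc] = v
        return lambda x: x
    return get, set_

store = {}
get_l, set_l = new_location(store, 0)
set_l(41)
print(get_l())                     # reads back the written value
```

Passing around the pair `(get_l, set_l)` gives the observer exactly the read and write capabilities that the rules below expose via getset(r).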
We present the first-order LTS for ΛR, and then we relate the resulting strong and weak bisimilarities directly with contextual equivalence (the reference equivalence in λ-calculi). Alternatively, we could have related the first-order bisimilarities to the environmental bisimilarities of ΛR, and then inferred the correspondence with contextual equivalence from known results about environmental bisimilarity, as we did for ΛN .
We write (s; M ) ↓ when M is a value; and (s; For the definition of contextual equivalence, we distinguish the cases of values and of arbitrary terms, because they have different congruence properties: values can be tested in arbitrary contexts, while arbitrary terms must be tested only in evaluation contexts. As in [SKS11], we consider contexts that do not contain free locations (they can contain bound locations). We refer to [SKS11] for more details on these aspects. We now define ΛR 1 , the first-order LTS for ΛR. The states and the transitions for ΛR 1 are similar to those for the pure λ-calculus of Section 4, with the addition of a component for the store. The two transitions (4.1) of call-by-name λ-calculus become: The first rule is the analogue of the first rule in (4.1). The important differences are in the second rule. First, since the calculus is call-by-value, C now ranges over C v , the set of value contexts (i.e., holes or contexts of the form λx.C′ ) without free locations. Moreover, since we are now imperative, in a transition we must permit the creation of new locations, and a term supplied by the environment should be allowed to use them. In the rule, the new store is represented by r (whose domain has to be disjoint from that of s). Correspondingly, to allow manipulation of these locations from the observer, for each new location ℓ we make get ℓ and set ℓ available, as an extension of the environment; in the rule, these are collectively written getset(r), and Γ′ is the extended environment. Finally, we must initialise the new store, using terms that are created out of the extended environment Γ′ ; that is, each new location ℓ is initialised with a term D ℓ [Γ′ ] (for D ℓ ∈ C v ). Moreover, the contexts D ℓ chosen must be made visible in the label of the transition. To take care of these aspects, we view r as a store context, a tuple of assignments ℓ → D ℓ .
Thus the initialisation of the new locations is written r[Γ′ ]; and, denoting by cod(r) the tuple of the contexts D ℓ in r, we add cod(r) to the label of the transition. Note also that, although C and D ℓ are location-free, their holes may be instantiated with terms involving the get and set operators, so that these contexts may still manipulate the store.
Once more, on the (strong and weak) bisimilarities that are derived from this first-order LTS, we can import the theory of compatible functions and bisimulation enhancements. As in Section 3 for the π-calculus, we establish the validity of a few up-to techniques before proving full abstraction: these techniques give us important closure properties of bisimilarities via Lemma 2.6.
Concerning additional up-to functions, specific to ΛR 1 , the functions w, str, C and C e are adapted from Section 4 in the expected manner: contexts C v , C and E must be location-free. A further function for ΛR 1 is store, which manipulates the store by removing locations that do not appear elsewhere (akin to garbage collection); thus, store(R . This may seem unnecessarily restrictive, but since renaming locations on either side using an injective substitution is a strongly bisimilar operation, using (R → ∼R∼) • store allows us to choose r 1 on the left and r 2 on the right, as long as cod(r 1 ) = cod(r 2 ).
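The garbage-collection idea behind store can be illustrated as follows. Modelling a stored value simply as the set of locations it mentions is an assumption of this sketch; the point is only the reachability computation:

```python
# Drop store locations not reachable from the configuration's terms.
# store: dict loc -> set of locations mentioned by the stored value;
# roots: locations mentioned by the environment and the tested term.

def restrict_store(store, roots):
    reachable = set(roots)
    changed = True
    while changed:             # closure: stored values may mention locations
        changed = False
        for loc in list(reachable):
            for dep in store.get(loc, set()):
                if dep not in reachable:
                    reachable.add(dep)
                    changed = True
    return {l: v for l, v in store.items() if l in reachable}

store = {'l1': {'l2'}, 'l2': set(), 'l3': set()}   # 'l3' is garbage
print(restrict_store(store, {'l1'}))
```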
Lemma 5.2. The functions w, str, C e , store, C are below both companions t sp and t wp .
Proof. We apply the same proof schema as in Theorem 4.15 with more technical details to be handled, as the store is to be accounted for. We provide details mainly for the progression starting from store itself, which is the most interesting new aspect. We explain how the other parts are handled, with reference to the proof of Theorem 4.15.
We handle store first. To avoid introducing and remembering many new names such as Γ , Γ , etc., we write Γ V for Γ, V and Γ r for Γ, getset(r). For example, the rule for visible transitions can be rewritten into . It also simplifies writing and reading when taking one index of a composed environment, for example Γ rV i should be read as the ith element of (Γ r ) V , which can be either Γ i (if i ≤ |Γ|), or in getset(r), or V . Now store can be redefined as which do not appear in the label, are fresh. The transition is: There are two cases, depending on whether Γ rV i is in Γ V or in getset(r).
We can derive the corresponding transition labelled i , C , cod(v ) from (u; ∆, N ) which will silently reduce to (u ; ∆ W ), then make a visible weak transition to (u ; ∆ V v , N ) knowing that (s ; Γ V v , M ) S (u ; ∆ W v , N ). We can then replace Γ V v and ∆ W v with Γ rV v and ∆ rW v , to prove that (s ; Γ rV v , M ) S 1 (u ; ∆ rW v , N ), where S 1 is S where we applied the 'up-to-permutation' technique to move V and W in the middle of v . This technique is compatible, so S 1 ⊆ t wp (S) (and S 1 ⊆ t sp (S)).
(2) Suppose i ∈ {|Γ| + 1, . . . , |Γ| + 2|r|}. Then Γ rV i is either get ℓ or set ℓ with ℓ ∈ dom(r). (a) if Γ rV i = get ℓ then s is not modified and M = r[Γ r ] is a context of Γ r and hence is also a context of Γ V v using the same v as above. The result of the transition is: In the weak case, using Lemma 4.10 we get (u; ∆, N ) =⇒ (u ; ∆ W ), and this term is related to (s, Γ V ) through wp( S ) ⊆ t wp (S). Finally we can relate x to (u v [∆ W v ]; ∆ W v , C 1 [∆ W v ]) through C(store(t(S))). (b) if Γ rV i = set ℓ then s is modified at ℓ ∈ dom(r) (so we only have to change r) and M = I = C 1 [Γ V v ] for C 1 = I (a context with no holes) so the pair progresses again, using the same notations as before, to C(store(t(S))). In summary, we have store(R) store(S) ∪ t(S) ∪ C(store(t(S))), and so store T (store ∪ C) (5.2) We now establish the progressions for the remaining functions. First, w w with the same argument as in the proof of Lemma 4.3, so w ⊆ t. The most important proof is for C. We assume (s; Γ) R (u; ∆), and we analyse the transitions from (s; Γ, C[Γ]).
(1) if C[Γ] is a value, then C is a value context C v , and the same structure as for the corresponding case in Lemma 4.4 applies here, with the only significant difference being in the third case. The transition of interest is labelled with i, C 1 , cod(r) such that i ≤ |Γ| + 1.
(a) If i ≤ |Γ|, this means the value that is given an argument is one of the Γ i s. Let i, C 1 , cod(r ) be the label i, C 1 , cod(r) where we composed the contexts with C v , so that C v replaces [·] |Γ|+1 . Using this label on the progression R S we obtain a pair in S. We apply first str and then 'up-to-permutation' to add C v on each side, which puts the desired pair in t(str(S)). (b) If i = |Γ| + 1, and C v = [·] j , we proceed the same way as above, with the label j, C 1 , cod(r ) where C 1 (resp. r ) is C 1 (resp. r) where [·] j replaces all occurrences and with an augmented store, which results in a relation built on R, as follows (we use C and store, and set ∆ = ∆, C v [∆]): .
, −], and similarly for ∆: then some Γ i is run, so we also run it starting from the original configuration, with the label i, C v2 , ∅ using the evaluation context function, and therefore progressing to C e (S). We prove a stronger result, namely that the resulting configurations with the context C 2 are still related if the get and set operators are available. The derivation is as follows; we use weakening w, exploit C and store, and write Λ for get , set : .
To summarise, C(R) t(str(S)) ∪ C(store(C(R))) ∪ C(R) ∪ C e (S) ∪ w(C(store(R))) and the right-hand side is included in T (w ∪ str ∪ C ∪ store ∪ C e )(R ∪ S). Remark that w ⊆ t, R ⊆ p(S), and p ⊆ t, so we obtain: We now move on to str and C e . As in ΛN 1 , v(R) denotes the pairs of value configurations of R, and n(R) the pairs of non-value configurations of R. It is trivial to check that str • v ⊆ t • C and C e • v ⊆ t • C, and so by combining with (5.3), both str • v and C e • v progress to T (str ∪ C ∪ store ∪ C e ). It is also straightforward to derive str • r ⊆ r • str and C e • r ⊆ r • C e , and that str • n str and C e • n C e . Note that ΛR 1 is quasi-deterministic; indeed, new locations, both for ν and in the choice of the domain of r in visible transitions, are chosen non-deterministically, but their choice does not matter up to strong bisimilarity. We can now apply Lemma 4.12 with f = str and g = C ∪ store ∪ C e to obtain: str T (str ∪ C ∪ store ∪ C e ) (5.4) and with f = C e and g = str ∪ C ∪ store to obtain: Combining (5.2), (5.3), (5.4), and (5.5), yields that h ≜ str ∪ C ∪ store ∪ C e progresses to T (h), and hence h ⊆ t.
Having established that C and C e are below the companion has as a consequence that our first-order bisimilarity is a congruence under location-free contexts, from which we can derive the soundness implication in Theorem 5.3 below. Since M is a value, (s; M ) ⇓. By choosing E = [·] 1 in (5.6) we know that (u; N )⇓ and thus (u; N ) =⇒ R (u ; W ) for some value W and store u . We then obtain the weak transition (u; ∆, N ) µ =⇒ (u ; ∆ , N ) through (u ; ∆, W ), for some ∆ , ∆ , N such that: To close the bisimulation diagram, we will now prove that (s ; Γ , M ) R (u ; ∆ , N ). Let E be a location-free evaluation context; we show that We recall that a context of arity n is a context with holes [·] 1 , . . . , [·] n each occurring any number of times and that in an evaluation context the first hole [·] 1 is the one that occurs exactly once and in evaluation position. Let F be an evaluation context of arity |Γ| + 1.
Instantiating the definition of R with F , we have the following equivalence: We choose F carefully so that (5.9) is equivalent to (5.8). Let ℓ i → C i , i = 1, . . . , n, be the collection of location-context pairs of the store context r. By determinism of reductions, since each side of (5.9) reduces to the corresponding side of (5.8), we know that (5.9) is equivalent to (5.8).
Congruence of bisimilarity is restricted either to values (C), or to evaluation contexts (C e ). It does not hold for arbitrary contexts, but Lemma 5.6 provides a sufficient condition for some relations between arbitrary terms to be preserved by arbitrary contexts. First we establish weaker results: for evaluation contexts (Lemma 5.4), then for non-evaluation contexts (Lemma 5.5). Finally Lemma 5.6 combines the two.
In the following, we use to denote any of the relations ∼, ≈, and . (In Lemma 5.4 F may contain free locations, unlike occurrences in earlier definitions of transitions and of up-to-context functions.)  (2) (Case of visible action.) First, since no hole is in evaluation position, C[L] is a value iff C[R] is a value, so they have the same visible actions of the form i, D, cod(r). We end up with the same shape of configurations we had for the τ transition above, and we therefore proceed similarly. We have thus proved that R progresses to R (expansion up to expansion). In the strong case, we prove that R progresses to ∼R, and in the weak case we prove that R weakly progresses to (≈R) ∩ (R≈) (which corresponds to two possible ways of using Lemma 5.4 in the above proof). Such a refinement is necessary because in the weak case, one can use "up to ≈" only when ≈ is not on the same side as the challenge.
Lemma 5.6. Let be any of the relations ∼, ≈, and . Suppose L, R are ΛR terms with (s; Γ, L) (s; Γ, R) for all environments Γ and store s. Then also (s; Γ, C[L]) (s; Γ, C[R]), for every store s, environment Γ and context C.
Proof. Using Lemma 5.4 and transitivity of , we rewrite the occurrence of L that is in evaluation position into R, and repeat this until there is no such L (such a rewriting may have to be performed more than once if L is not a value but R is; for example, if L = II and R = I, then L is in evaluation position in LL on the right and in LR on the left). We finally apply Lemma 5.5.
The separation between evaluation contexts and non-evaluation contexts is critical, as handling all contexts together would yield a much larger bisimulation candidate. Proof. This is a consequence of Lemma 5.7 and Lemma 5.6.
We use Lemma 5.6 at various places in the example we cover in Section 6. For instance we use it to replace a term N 1 ≜ (λx.E[x])M (with E an evaluation context) with N 2 ≜ E[M ], under an arbitrary context. Such a property is delicate to prove, even for closed terms, because the evaluation of M could involve reading from a location of the store that itself could contain occurrences of N 1 and N 2 .

An example
We conclude by discussing an example from [KW06]. It consists in proving a law between terms of ΛR extended with integers, operators for integer addition and subtraction, and a conditional; these constructs are straightforward to accommodate in the presented framework. For readability, we also use the standard notation for store assignment, dereferencing and sequence: (ℓ := M ) ≜ set ℓ M , !ℓ ≜ get ℓ I, and M ; N ≜ (λx.N )M where x does not appear in N . The two terms are the following ones:
• M ≜ λg. ν ℓ := 0; g(incr ℓ ); if !ℓ mod 2 = 0 then I else Ω
• N ≜ λg. g(F ); I,
where incr ℓ ≜ λz. ℓ := !ℓ + 2, and F ≜ λz.I. Intuitively, those two terms are weakly bisimilar because the location ℓ bound in the first term will always contain an even number.
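The invariant underlying this law can be tested concretely. The following Python rendering of M (ours, not the calculus) shows that however the argument g uses the increment capability, the location's content stays even, so the conditional always takes the converging branch:

```python
# Python model of the first term M: allocate a fresh cell at 0, hand the
# caller a capability that adds 2, then check evenness (the 'then I' branch
# models convergence, 'else Omega' would model divergence).

def M(g):
    cell = [0]                    # nu l := 0, fresh for each call to M
    def incr(_):                  # incr_l = \z. l := !l + 2
        cell[0] += 2
        return lambda x: x        # an assignment returns the identity I
    g(incr)
    return cell[0] % 2 == 0       # even content: the conditional converges

print(M(lambda inc: [inc(None) for _ in range(7)]))
```

Whatever number of times g invokes the capability, the cell holds an even number, which is the intuition behind M being weakly bisimilar to N.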
We consider two proofs of the example. In comparison with the proof in [SKS11]: (i) we handle the original example from [KW06], and (ii) the availability of a broader set of up-to techniques and the possibility of freely combining them allows us to work with smaller relations. In the first proof we work up to the store (through the function store) and up to expansion-two techniques that are not available in [SKS11]. In the second proof we exploit