Differentials and distances in probabilistic coherence spaces

In probabilistic coherence spaces, a denotational model of probabilistic functional languages, morphisms are analytic and therefore smooth. We explore two related applications of the corresponding derivatives. First we show how derivatives allow us to compute the expectation of execution time in the weak head reduction of probabilistic PCF (pPCF). Next we apply a general notion of "local" differential of morphisms to the proof of a Lipschitz property of these morphisms, allowing in turn to relate the observational distance on pPCF terms to a distance the model is naturally equipped with. This suggests that extending probabilistic programming languages with derivatives, in the spirit of the differential lambda-calculus, could be quite meaningful.


Introduction
Currently available denotational models of probabilistic functional programming (with full recursion, and thus partial computations) can be divided into three classes.
• Game based models, first proposed in [DH00] and further developed by various authors (see [CCPW18] for an example of this approach). From their deterministic ancestors they typically inherit good definability features.
• Models based on Scott continuous functions on domains endowed with additional probability related structures. Among these models we can mention Kegelspitzen [KP17] (domains equipped with an algebraic convex structure) and ω-quasi Borel spaces [VKS19] (domains equipped with a generalized notion of measurability).
• Models based on (a generalization of) Berry stable functions. The first category of this kind was that of probabilistic coherence spaces (PCSs) and power series with non-negative coefficients (the Kleisli category of the model of Linear Logic developed in [DE11]), for which we could prove adequacy and full abstraction with respect to a probabilistic version of PCF [EPT18, Ehr20]. We extended this idea to "continuous data types" (such as R) by substituting PCSs with positive cones and power series with functions featuring a hereditary monotonicity property that we called stability, and [Cru18] showed that this extension is actually conservative (stable functions on PCSs, which are special positive cones, are exactly power series).
The main feature of this latter semantics is the extreme regularity of its morphisms: being power series, they must be smooth. Nevertheless, the category Pcoh is not a model of differential linear logic in the sense of [Ehr18]. This is due to the fact that general addition of morphisms is not possible (only sub-convex linear combinations are available), thus preventing, e.g., the Leibniz rule from holding in the way it is presented in differential LL. Also, a morphism X → Y in the Kleisli category Pcoh_! can be considered as a function from the closed unit ball of the cone P associated with X to the closed unit ball of the cone Q associated with Y. From a differential point of view such a morphism is well behaved only in the interior of the unit ball: on the border, derivatives can typically take infinite values.
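A toy numeric illustration of this last phenomenon (the series and its coefficients are ours, chosen for the example; they are not taken from the paper): a power series with non-negative coefficients summing to 1 maps [0, 1] into [0, 1], yet its derivative may diverge at the border point 1.

```python
# f(x) = sum_{n>=1} x^n / (n(n+1)): the coefficients are >= 0 and sum to 1,
# so f maps the "unit ball" [0,1] into itself, with f(1) = 1.
# Its derivative f'(x) = sum_{n>=1} x^(n-1) / (n+1) diverges as x -> 1:
# smoothness holds in the interior of the ball, not on its border.

def f(x, terms=100_000):
    return sum(x**n / (n * (n + 1)) for n in range(1, terms))

def f_prime(x, terms=100_000):
    return sum(x**(n - 1) / (n + 1) for n in range(1, terms))

print(f(1.0))          # close to 1: the border point is reached
print(f_prime(0.9))    # finite inside the ball
print(f_prime(0.999))  # much larger: tends to +infinity when x -> 1
```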
Contents. We already used the analyticity of the morphisms of Pcoh_! to prove full abstraction results [EPT18]. We provide here two more corollaries of this property, involving also derivatives. For both results, we consider a paradigmatic probabilistic purely functional programming language which is a probabilistic extension of Scott and Plotkin's PCF. This language pPCF features a single data type ι of integers and a simple probabilistic choice operator coin(r) : ι which flips a coin, yielding 0 with probability r and 1 with probability 1 − r. To make probabilistic programming possible, this language has a let(x, M, N) construct, restricted to M of type ι, which allows to sample an integer according to the sub-probability distribution represented by M. The operational semantics is presented by a deterministic "stack machine" which is an environment-free Krivine machine parameterized by a choice sequence α ∈ C_0 = {0, 1}^{<ω}, presented as a partial evaluation function. We adopt a standard discrete probability approach, considering C_0 as our basic sample space and the evaluation function as defining a (total) probability density function on C_0. We also introduce an extension pPCF_lab of pPCF where terms can be labeled by elements of a set L of labels, making it possible to count the uses of labeled subterms of a term M (closed and of ground type) during a reduction of M. Evaluation for this extended calculus gives rise to a random variable (r.v.) on C_0 ranging in the set M_fin(L) of finite multisets of elements of L. The number of uses of terms labeled by a given l ∈ L (which is a measure of the computation time) is then an N-valued r.v., the expectation of which we want to evaluate. We prove that, for a given labeled closed term M of type ι, this expectation can be computed by taking a derivative of the interpretation of this term in the model Pcoh_!, and we provide a concrete example of computation of such expectations. This result can be considered as a probabilistic version of [dC09, dC18]. The fact that derivatives can become infinite on the border of the unit ball corresponds then to the fact that this expectation of "computation time" can be infinite.
In the second application, we consider the contextual distance on pPCF terms, generalizing Morris equivalence as studied in [CL17] for instance. The probabilistic features of the language make this distance too discriminating, putting e.g. the terms coin(0) and coin(ε) at distance 1 for all ε > 0 (probability amplification). Any cone (and hence any PCS) is equipped with a norm and hence a canonically defined metric. Using a locally defined notion of differential of morphisms in Pcoh_!, we prove that these morphisms enjoy a Lipschitz property on all balls of radius p < 1, with a Lipschitz constant 1/(1 − p) (thus tending towards ∞ when p tends towards 1). Modifying the definition of the operational distance accordingly, we relate the observational distance on pPCF terms to the distance the model is naturally equipped with.

Notations. We use R≥0 for the set of real numbers x such that x ≥ 0, and we set R̄≥0 = R≥0 ∪ {+∞}. Given two sets S and I we use S^I for the set of functions I → S, often considered as I-indexed families s⃗ of elements of S. We use the notation s⃗ (with an arrow) when we want to stress the fact that the considered object is an indexed family, the indexing set I being usually easily derivable from the context. The elements of such a family s⃗ are denoted s_i or s(i) depending on the context (to avoid accumulations of subscripts). Given i ∈ I we use e_i for the function I → R≥0 such that e_i(i) = 1 and e_i(j) = 0 if j ≠ i; in other words e_i(j) = δ_{i,j}, the Kronecker symbol. We use M_fin(I) for the set of finite multisets of elements of I. A multiset is a function µ : I → N such that supp(µ) = {i ∈ I | µ(i) ≠ 0} is finite. We use additive notations for operations on multisets (0 for the empty multiset, µ + ν for their pointwise sum), and [i_1, …, i_k] for the multiset µ such that µ(i) = #{j | i_j = i}. We use I^{<ω} for the set of finite sequences ⟨i_1, …, i_k⟩ of elements of I and α β for the concatenation of such sequences; ⟨⟩ is the empty sequence.

Probabilistic coherence spaces (PCS)
For the general theory of PCSs we refer to [DE11, EPT18], where the reader will find a more detailed presentation, including motivating examples. Here we recall only the basic definitions and provide a characterization of these objects. So this section should not be considered as an introduction to PCSs: for such an introduction the reader is advised to have a look at the articles mentioned above. PCSs are particular positive cones, a notion borrowed from [Sel04] that we used in [EPT18] to extend the probabilistic semantics of PCS to continuous data-types such as the real line.

1.1.
A few words about cones. A (positive) pre-cone is a cancellative commutative R≥0-semi-module P equipped with a norm ‖·‖_P, that is a map P → R≥0 such that ‖r x‖_P = r ‖x‖_P for r ∈ R≥0, ‖x + y‖_P ≤ ‖x‖_P + ‖y‖_P and ‖x‖_P = 0 ⇒ x = 0. It is moreover assumed that ‖x‖_P ≤ ‖x + y‖_P, this condition expressing that the elements of P are positive. Given x, y ∈ P, one says that x is less than y (notation x ≤ y) if there exists z ∈ P such that x + z = y. By the cancellativeness property, if such a z exists it is unique, and we denote it as y − x. This subtraction obeys the usual algebraic laws (when it is defined). Notice that if x, y ∈ P satisfy x + y = 0 then, since ‖x‖_P ≤ ‖x + y‖_P, we have x = 0 (and of course also y = 0). Therefore, if x ≤ y and y ≤ x then x = y, and so ≤ is an order relation.
A (positive) cone is a positive pre-cone P whose unit ball BP = {x ∈ P | ‖x‖_P ≤ 1} is ω-order-complete, in the sense that any increasing sequence of elements of BP has a least upper bound in BP. In [EPT18] we show how a notion of stable function on cones can be defined, which gives rise to a cartesian closed category, and in [Ehr20] we explore the category of cones and linear and Scott-continuous functions.
1.2. Basic definitions on PCSs. Given an at most countable set I and u, u′ ∈ R̄≥0^I, we set ⟨u, u′⟩ = Σ_{i∈I} u_i u′_i ∈ R̄≥0. Given P ⊆ R̄≥0^I, we define P^⊥ ⊆ R̄≥0^I as P^⊥ = {u′ ∈ R̄≥0^I | ∀u ∈ P ⟨u, u′⟩ ≤ 1}. Observe that if P satisfies ∀a ∈ I ∃x ∈ P x_a > 0 and ∀a ∈ I ∃m ∈ R≥0 ∀x ∈ P x_a ≤ m, then P^⊥ ⊆ (R≥0)^I and P^⊥ satisfies the same two properties.
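The polar P^⊥ can be explored numerically on a finite web (a sketch of ours; the helper name and the sampling grids are not from the paper). For P the sub-probability simplex on I = {0, 1}, the polar is the unit cube [0, 1]^2, and the biorthogonal gives back the simplex:

```python
# P⊥ = { u' | <u, u'> <= 1 for all u in P }, tested against a finite
# sample of P (a small tolerance absorbs rounding).

def in_polar(v, P):
    return all(sum(ui * vi for ui, vi in zip(u, v)) <= 1 + 1e-9 for u in P)

# sample of the sub-probability simplex { u >= 0 | u0 + u1 <= 1 }
simplex = [(a / 50, b / 50) for a in range(51) for b in range(51) if a + b <= 50]

# the corner of the unit cube is in P⊥ ...
assert in_polar((1.0, 1.0), simplex)
# ... but nothing beyond the cube is:
assert not in_polar((1.1, 1.0), simplex)

# biorthogonality: a point of the simplex is in (cube)⊥ = P⊥⊥ ...
cube = [(a / 50, b / 50) for a in range(51) for b in range(51)]
assert in_polar((0.3, 0.7), cube)
# ... and a point outside the simplex is not
assert not in_polar((0.6, 0.7), cube)
```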
A probabilistic pre-coherence space (pre-PCS) is a pair X = (|X|, PX) where |X| is an at most countable set and PX ⊆ R̄≥0^{|X|} satisfies the two properties above (∀a ∈ |X| ∃x ∈ PX x_a > 0 and ∀a ∈ |X| ∃m ∈ R≥0 ∀x ∈ PX x_a ≤ m). A probabilistic coherence space (PCS) is a pre-PCS such that moreover PX = (PX)^{⊥⊥}. Given any PCS X we can define a cone P̄X = {x ∈ (R≥0)^{|X|} | ∃ε > 0 εx ∈ PX} that we equip with the following norm: ‖x‖_{P̄X} = inf{r > 0 | x ∈ r PX}, and then it is easy to check that B(P̄X) = PX. We simply denote this norm as ‖·‖_X, so that ‖x‖_X = sup_{x′ ∈ PX^⊥} ⟨x, x′⟩.
Given t ∈ R̄≥0^{I×J} considered as a matrix (where I and J are at most countable sets) and u ∈ R̄≥0^I, we define t · u ∈ R̄≥0^J by (t · u)_j = Σ_{i∈I} t_{i,j} u_i (usual formula for applying a matrix to a vector), and if s ∈ R̄≥0^{J×K} we define the product s t ∈ R̄≥0^{I×K} of the matrices s and t as usual by (s t)_{i,k} = Σ_{j∈J} t_{i,j} s_{j,k}. This is an associative operation. Let X and Y be PCSs; a morphism from X to Y is a matrix t ∈ (R≥0)^{|X|×|Y|} such that ∀x ∈ PX, t · x ∈ PY. It is clear that the identity matrix is a morphism from X to X and that the matrix product of two morphisms is a morphism; therefore PCSs equipped with this notion of morphism form a category Pcoh.
Given PCSs X and Y, we set |X ⊸ Y| = |X| × |Y| and P(X ⊸ Y) = {t ∈ R̄≥0^{|X|×|Y|} | ∀x ∈ PX t · x ∈ PY}; this is a pre-PCS by the observation above, and checking that it is indeed a PCS is easy.
We define then X ⊗ Y = (X ⊸ Y^⊥)^⊥; this is a PCS which satisfies P(X ⊗ Y) = {x ⊗ y | x ∈ PX and y ∈ PY}^{⊥⊥}, where (x ⊗ y)_{(a,b)} = x_a y_b. Then it is easy to see that we have equipped in that way the category Pcoh with a symmetric monoidal structure for which it is *-autonomous wrt. the dualizing object ⊥ = 1 = ({*}, [0, 1]), which is also the unit of ⊗. The *-autonomy follows easily from the observation that (X ⊸ ⊥) ≃ X^⊥.

The category Pcoh is cartesian: if (X_i)_{i∈I} is an at most countable family of PCSs, then (&_{i∈I} X_i, (π_i)_{i∈I}) is the cartesian product of the X_i's, with |&_{i∈I} X_i| = ∪_{i∈I} {i} × |X_i|, (π_i)_{(j,a),a′} = 1 if i = j and a = a′ and (π_i)_{(j,a),a′} = 0 otherwise, and x ∈ P(&_{i∈I} X_i) iff π_i · x ∈ PX_i for each i ∈ I. Dually, Pcoh also has coproducts ⊕_{i∈I} X_i. A particular case is N = ⊕_{n∈N} X_n where X_n = 1 for each n, so that |N| = N and x ∈ (R≥0)^N belongs to PN iff Σ_{n∈N} x_n ≤ 1 (that is, x is a sub-probability distribution on N). For each n ∈ N we have e_n ∈ PN, which is the distribution concentrated on the integer n. There are successor and predecessor morphisms suc, pred ∈ Pcoh(N, N) given by suc_{n,n′} = δ_{n+1,n′} and pred_{n,n′} = 1 if n = n′ = 0 or n = n′ + 1 (and pred_{n,n′} = 0 in all other cases). An element of Pcoh(N, N) is a (sub)stochastic matrix, and the very idea of this model is to represent programs as transformations of this kind, and their generalizations.
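The matrices suc and pred can be sketched concretely (our illustration, with N truncated to a finite window {0, …, K−1} so that the matrices are finite):

```python
# Morphisms N -> N as (sub)stochastic matrices acting on sub-probability
# distributions by (t . x)_j = sum_i t[i][j] * x[i].

K = 10

def apply(t, x):
    return [sum(t[i][j] * x[i] for i in range(K)) for j in range(K)]

# suc_{n,n'} = 1 iff n' = n + 1 ; pred_{n,n'} = 1 iff n = n' = 0 or n = n' + 1
suc  = [[1.0 if j == i + 1 else 0.0 for j in range(K)] for i in range(K)]
pred = [[1.0 if (i == j == 0) or (i == j + 1) else 0.0 for j in range(K)]
        for i in range(K)]

x = [0.5, 0.25, 0.25] + [0.0] * (K - 3)   # a sub-probability distribution

y = apply(suc, x)                 # shifts the distribution up by one
assert y[1] == 0.5 and y[2] == 0.25 and y[3] == 0.25

z = apply(pred, y)                # pred after suc restores x
assert z[:3] == x[:3]

w = apply(pred, x)                # pred sends both e_0 and e_1 to e_0
assert w[0] == 0.75 and w[1] == 0.25
```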
As to the exponentials, one sets |!X| = M_fin(|X|) and P(!X) = {x^! | x ∈ PX}^{⊥⊥}, where x^!_µ = Π_{a∈|X|} x_a^{µ(a)}. Then, given t ∈ Pcoh(X, Y), one defines !t ∈ Pcoh(!X, !Y) in such a way that !t · x^! = (t · x)^! (the precise definition is not relevant here; it is completely determined by this equation). We do not need here to specify the monoidal comonad structure of this exponential. The resulting cartesian closed category Pcoh_! (the Kleisli category of this comonad) can be seen as a category of functions (actually, of stable functions as proved in [Cru18]). Indeed, a morphism t ∈ Pcoh_!(X, Y) = Pcoh(!X, Y) induces the function x ↦ t · x^! = (Σ_{µ∈|!X|} t_{µ,b} x^µ)_{b∈|Y|} where x^µ = Π_a x_a^{µ(a)}, so that we consider morphisms as power series (they are in particular monotonic and Scott continuous functions PX → PY). In this cartesian closed category, the product of a family (X_i)_{i∈I} is &_{i∈I} X_i (written X^I if X_i = X for all i), which is compatible with our viewpoint on morphisms as functions since P(&_{i∈I} X_i) = Π_{i∈I} PX_i up to trivial iso. The object of morphisms from X to Y is !X ⊸ Y, with evaluation mapping (t, x) ∈ P(!X ⊸ Y) × PX to t · x^!, that we simply denote as t(x) from now on. The well defined function P(!X ⊸ X) → PX which maps t to sup_{n∈N} t^n(0) is a morphism of Pcoh_! (and thus can be described as a power series in the vector t⃗ = (t_{m,a})_{m∈M_fin(|X|), a∈|X|}) by standard categorical considerations using cartesian closeness: it provides us with fixed point operators at all types.
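A minimal sketch of this fixpoint operator (ours, not from the paper), on the one-element web where a Kleisli morphism !1 ⊸ 1 is just a power series [0, 1] → [0, 1] with non-negative coefficients:

```python
# sup_n t^n(0): iterate the power series from 0; by monotonicity the
# iterates increase towards the least fixed point.
# Example: t(x) = (1-q) + q*x^2 (coefficients c_0 = 1-q, c_2 = q);
# its least fixed point in [0,1] is min(1, (1-q)/q).

def fixpoint(t, iters=10_000):
    x = 0.0
    for _ in range(iters):
        x = t(x)
    return x

q = 0.75
t = lambda x: (1 - q) + q * x * x
lfp = fixpoint(t)
print(lfp)                                  # close to (1-q)/q = 1/3
assert abs(lfp - (1 - q) / q) < 1e-6
```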

Probabilistic PCF, time expectation and derivatives
We introduce now the probabilistic functional programming language considered in this paper.The operational semantics is presented using elementary probability theoretic tools.
2.1. The core language. The types and terms are given by the grammars σ, τ, … ::= ι | σ ⇒ τ and M, N, … ::= x | n | suc(M) | pred(M) | if(M, N, P) | λx^σ M | (M) N | fix(M) | let(x, M, N) | coin(r); see Fig. 1 for the typing rules, with typing contexts Γ = (x_1 : σ_1, …, x_n : σ_n). Notice that this figure includes the typing rules for the stacks that we introduce below. It is important to keep in mind that it would not make sense to extend the construction let(z, M, N) to terms M which are not of type ι. This construction uses essentially the fact that the type ι is a positive formula of linear logic, see [ET19].
2.1.1. Denotational semantics. We survey briefly the interpretation of pPCF in PCSs, thoroughly described in [EPT18]. Types are interpreted by ⟦ι⟧ = N and ⟦σ ⇒ τ⟧ = !⟦σ⟧ ⊸ ⟦τ⟧, and a term M with (x_1 : σ_1, …, x_k : σ_k) ⊢ M : σ is interpreted as a morphism ⟦M⟧ ∈ Pcoh_!(&_{i=1}^k ⟦σ_i⟧, ⟦σ⟧) (a "Kleisli morphism") that we see as a function Π_{i=1}^k P⟦σ_i⟧ → P⟦σ⟧ as explained in Section 1.2. These functions are given by the standard equations of [EPT18], which we do not repeat here.

2.1.2. Operational semantics. In former papers we have presented the operational semantics of pPCF as a discrete Markov chain on states which are the closed terms of pPCF. This Markov chain implements the standard weak head reduction strategy of PCF, which is deterministic for ordinary PCF but features branching in pPCF because of the coin(r) construct (see [EPT18]). Here we prefer another, though strictly equivalent, presentation of this operational semantics, based on an environment-free Krivine machine (thus handling states which are pairs made of a closed term and a closed stack) further parameterized by an element of {0, 1}^{<ω} to be understood as a "random tape" prescribing the values taken by the coin(r) terms during the execution of states. We present this machine as a partial function taking a state s and a random tape α and returning an element of [0, 1], to be understood as the probability that the sequence α of 0/1 choices occurs during the execution of s. We allow only execution of ground type states and accept 0 as the only terminating value: a completely arbitrary choice, sufficient for our purpose in this paper. Also, we insist that a terminating computation from (s, α) completely consumes the random tape α. These choices allow us to fit within a completely standard discrete probability setting. Given an extension Λ of pPCF (with the same format for typing rules), we define the associated language of stacks (called Λ-stacks):

π ::= ε | arg(M) · π | suc · π | pred · π | if(M, N) · π | let(x, M) · π

where M and N range over Λ. A stack typing judgment is of shape σ ⊢ π (meaning that the stack π takes a term of type σ and returns an integer) and the typing rules are given in Fig. 1.
A state is a pair ⟨M, π⟩ (where we say that M is in head position) such that ⊢ M : σ and σ ⊢ π for some (uniquely determined) type σ; let S be the set of states. Let C_0 = {0, 1}^{<ω} be the set of finite lists of booleans (random tapes). We define a partial function Ev : S × C_0 → [0, 1] in Fig. 2, where we use the functions ν_0(r) = r and ν_1(r) = 1 − r. (Notice that all the equations defining Ev in Fig. 2 are well-typed in the sense that if the state of the LHS of an equation is well-typed, so is its RHS.) Let D(s) be the set of all α ∈ C_0 such that Ev(s, α) is defined. When α ∈ D(s), the number Ev(s, α) ∈ [0, 1] is the probability that the random tape α occurs during the execution.

When all coins are fair (all the values of the parameters r are 1/2), this probability is 2^{−len(α)}. The sum of these (possibly infinitely many) probabilities is ≤ 1. For fitting within a standard probabilistic setting, we define a total probability distribution Ev(s) on C_0 by Ev(s)(0α) = Ev(s, α) for α ∈ D(s), Ev(s)(⟨1⟩) = 1 − Σ_{α∈D(s)} Ev(s, α), and Ev(s)(β) = 0 in all other cases, so that the tape ⟨1⟩ carries the weight of divergence. We prefer this option rather than adding an error element to C_0, which would be more natural from a programming language point of view but less standard from the viewpoint of probability theory. This choice is arbitrary and has no impact on the results we prove, because all the events of interest for us will be subsets of 0 C_0 ⊂ C_0. Let P_s be the associated probability measure; we are in a discrete setting, so simply P_s(A) = Σ_{α∈A} Ev(s)(α). The event s ↓ 0 = {0β | β ∈ D(s)} is the set of all random tapes (up to 0-prefixing) making s reduce to 0; its probability is P_s(s ↓ 0) = Σ_{β∈D(s)} Ev(s, β).
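The evaluation function as a density on tapes can be sketched on a toy state (ours, not a pPCF term): "flip coin(r) until it yields 0". A tape is accepted iff it has shape 1^k 0 and is fully consumed, with probability ν_1(r)^k ν_0(r):

```python
# ev(tape) returns the probability of the tape occurring during the run,
# or None when undefined (non-termination, or leftover tape).

def ev(tape, r=0.5):
    p = 1.0
    for i, bit in enumerate(tape):
        p *= r if bit == 0 else (1 - r)   # nu_0(r) = r, nu_1(r) = 1 - r
        if bit == 0:
            # terminate on 0, but only if the tape is fully consumed
            return p if i == len(tape) - 1 else None
    return None                            # the run never reached 0

assert ev((0,)) == 0.5
assert ev((1, 1, 0)) == 0.125
assert ev((1, 0, 0)) is None       # leftover tape: not fully consumed
assert ev((1, 1)) is None          # non-terminated run

# total mass of accepted tapes: almost-sure termination, sum tends to 1
mass = sum(ev((1,) * k + (0,)) for k in range(40))
assert abs(mass - 1.0) < 1e-9
```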
In the case s = ⟨M, ε⟩ (with ⊢ M : ι), this probability is exactly the same as the probability of M to reduce to 0 in the Markov chain setting of [EPT18] (see e.g. [BLGS16] for more details on the connection between these two kinds of operational semantics). So the Adequacy Theorem of [EPT18] can be expressed as follows.

Theorem 2.1 (Adequacy). For any term M with ⊢ M : ι, one has P_{⟨M,ε⟩}(⟨M, ε⟩ ↓ 0) = ⟦M⟧_0.

We sometimes use P(M ↓ 0) as an abbreviation for P_{⟨M,ε⟩}(⟨M, ε⟩ ↓ 0). We shall introduce several versions of PCF in the sequel, with associated machines.
• In Section 2.2 we introduce pPCF_lab, where terms can be labeled by elements of L, and the machine Ev_lab, which returns a multiset of elements of L counting how many times labeled subterms arrive in head position during the evaluation.
• In Section 2.4 we introduce pPCF_lc, which includes a labeled version of the coin(·) construct, and the associated machine Ev_lc, which returns an element of R≥0 (a probability actually).
• In the same section we introduce as an auxiliary tool the machine Ev^η_lc, which differs from the previous one by the fact that it returns an element of C_0.
• We also introduce a machine Ev^{−η}_lc, which is a kind of inverse of the previous one. In the proofs these machines will often be used with an additional integer parameter indexing the execution steps.
It is important to notice that the pPCF_lc language and the associated machines are only an intermediate step in the proof of Theorem 2.11, the main result of this section, whose statement does not mention them at all.

2.1.3.
Step-indexed Krivine machine. In various proofs we shall need step-indexed versions of the involved machines, equipped with a further parameter in N. For instance the modified Ev(s, α, n) will be a total function S_lc(L) × (C_0 ∪ {↑}) × N → R̄≥0 ∪ {↑}, where ↑ stands for non-terminated computations. A few cases of this modified definition are given in (2.3), the others being similar; multiplication is of course extended by r↑ = ↑r = ↑, and similarly for concatenation.
Remark 2.2. One obvious and important feature of this definition, shared by all the forthcoming definitions based on similar step-indexing, is that if Ev(s, α, n) ≠ ↑, we have Ev(s, α, k) = Ev(s, α, n) for all k ≥ n. This property will be referred to as "monotonicity of step-indexing". Then Ev(s, α) is defined, and has value r, iff Ev(s, α, n) = r for some n (and then the same will hold for any greater n).

2.2.
Probabilistic PCF with labels and the associated random variables. In order to count the number of times a given subterm N of a closed term M of type ι is used (that is, arrives in head position) during the execution of ⟨M, ε⟩ in the Krivine machine of Section 2.1.2, we extend pPCF into pPCF_lab by adding a term labeling construct N^l, for l belonging to a fixed set L of labels. The typing rule for this new construct is simply: from Γ ⊢ N : σ derive Γ ⊢ N^l : σ. Of course pPCF_lab-stacks involve now such labeled terms, but their syntax is not extended otherwise; let S_lab be the corresponding set of states. Then we define a partial function Ev_lab : S_lab × C_0 → M_fin(L) exactly as Ev, apart from the cases of labeled terms (which add their label to the resulting multiset) and Ev_lab(⟨0, ε⟩, ⟨⟩) = 0, the empty multiset.
When applied to M, ε , this function counts how often labeled subterms of M arrive in head position during the reduction; these numbers, represented altogether as a multiset of elements of L, depend of course on the random tape provided as argument together with the state.
Let D_lab(s) be the set of α's such that Ev_lab(s, α) is defined, and define strip(s) ∈ S as s stripped of its labels.

Proof. Simple inspection of the definition of the two involved functions. More precisely, using the step-indexed versions of the machines presented in Section 2.1.3, an easy induction on n shows the corresponding step-indexed statement.

We define a r.v. Ev_lab(s)_l on C_0 by mapping 0α to Ev_lab(s, α)(l) for α ∈ D_lab(s), and every other tape to 0. Its expectation is

E(Ev_lab(s)_l) = Σ_{α∈D_lab(s)} Ev(strip(s), α) Ev_lab(s, α)(l)   (2.4)

This is the expected number of occurrences of l-labeled subterms of s arriving in head position during successful executions of s. It is more meaningful to condition this expectation under convergence of the execution of s (that is, under the event strip(s) ↓ 0). We have E(Ev_lab(s)_l | strip(s) ↓ 0) = E(Ev_lab(s)_l) / P_{strip(s)}(strip(s) ↓ 0), as the r.v. Ev_lab(s)_l vanishes outside the event s ↓ 0, since D_lab(s) = D(strip(s)).
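A Monte Carlo sanity check of this conditional expectation on a toy process (ours, not the Krivine machine itself): at each step an l-labeled subterm reaches head position with probability a, the run terminates on 0 with probability b, and it diverges otherwise; then P(↓0) = b/(1−a) and E(#l | ↓0) = a/(1−a).

```python
# Conditional expectation = (expectation of count * 1_{converges}) / P(converges),
# estimated by simulation; the r.v. vanishes on diverging runs.
import random
random.seed(0)

a, b = 0.5, 0.3
N = 200_000
hits, total_count = 0, 0
for _ in range(N):
    count = 0
    while True:
        u = random.random()
        if u < a:            # an l-labeled subterm arrives in head position
            count += 1
        elif u < a + b:      # terminates on 0: record the multiset size
            hits += 1
            total_count += count
            break
        else:                # divergence: the r.v. contributes 0
            break

p_conv = hits / N
cond_mean = total_count / hits
assert abs(p_conv - b / (1 - a)) < 0.01      # b/(1-a) = 0.6
assert abs(cond_mean - a / (1 - a)) < 0.05   # a/(1-a) = 1.0
```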

2.3.
A bird's eye view of the proof. Our goal now is to extract this expectation from the denotational semantics of a term M such that ⊢ M : ι, which contains labeled subterms, or rather of a term suitably definable from M.
For this purpose we will replace in M each N^l (where N has type σ) with if(x_l, N, Ω_σ), where x⃗ = (x_l)_{l∈L} (for some finite subset L of L containing all the labels occurring in M) is a family of pairwise distinct variables of type ι and Ω_σ = fix(λx^σ x) (an ever-looping term). We will obtain in that way in Section 2.5 a term sp_x⃗(M) such that (x_l : ι)_{l∈L} ⊢ sp_x⃗(M) : ι. We will consider its interpretation as an analytic function (PN)^L → PN, which therefore induces an analytic function f : [0, 1]^L → [0, 1], f(r⃗) = ⟦sp_x⃗(M)⟧(r⃗ e_0)_0 (where r⃗ e_0 = (r_l e_0)_{l∈L} ∈ PN^L for r⃗ ∈ [0, 1]^L). We will prove that the expectation of the number of uses of subterms of M labeled by l is

∂f(r⃗)/∂r_l (1, …, 1).

Notice that in the partial derivative above, the r_l's are bound by the partial derivative itself and by the fact that it is evaluated at (1, …, 1).
In order to reduce this problem to Theorem 2.1, we will introduce the intermediate language pPCF_lc, which will allow us to see each of these parameters r_l as the probability of yielding 0 for a biased coin construct lcoin(l, r_l). This calculus will be executed in a further "Krivine machine" Ev_lc, which has as many random tapes as there are elements in L (plus one for the plain coin(·) constructs already occurring in M).
This intermediate language will be used as follows: given a closed labeled term M whose labels all belong to L ⊆ L and a family of probabilities r⃗ ∈ [0, 1]^L, we will define in Section 2.5 a term lc_r⃗(M) of pPCF_lc which is M where each labeled subterm N^l is replaced with if(lcoin(l, r_l), lc_r⃗(N), Ω_σ), where σ is the type of N. The term lc_r⃗(M) defined in that way is closed, the r_l's being parameters and not variables in the sense of the λ-calculus. The probability p(r⃗) of convergence of lc_r⃗(M) depends on (µ(l))_{l∈L}, where µ(l) ∈ N is the number of times an l-labeled subterm of M arrives in head position during the evaluation of M: this is the meaning of Lemma 2.8. More precisely, µ(l) is the exponent of r_l in this probability. The main feature of sp_x⃗(M), exploited in Section 2.6, is that p(r⃗) is obtained by applying the semantics of sp_x⃗(M) to r⃗ (or more precisely to (r_l e_0)_{l∈L} ∈ PN^L); the proof of this fact uses Theorem 2.1. The last step will consist in observing that, by taking the derivative of this probability wrt. the variables x_l, one obtains the expectation of the number of times an l-labeled term arrives in head position during the evaluation of M; this is due to the fact that the µ(l) are exponents in the expression of p(r⃗) and that these exponents become coefficients by differentiation.
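The arithmetic fact behind this last step can be checked on toy data (the runs and their probabilities below are invented for the illustration): if p(r⃗) = Σ_µ c_µ Π_l r_l^{µ(l)}, where c_µ is the probability of a run whose label usage is the multiset µ, then ∂p/∂r_l evaluated at (1, …, 1) equals Σ_µ c_µ µ(l), the expected number of uses of l.

```python
# Two labels {0, 1}; a run is (probability c_mu, counts (mu(0), mu(1))).
runs = [(0.4, (0, 0)), (0.3, (2, 1)), (0.2, (1, 3)), (0.1, (5, 0))]

def p(r0, r1):
    # p(r) = sum_mu c_mu * r0^mu(0) * r1^mu(1)
    return sum(c * r0**m0 * r1**m1 for c, (m0, m1) in runs)

# expected uses of label 0: exponents become coefficients by differentiation
expected_uses_l0 = sum(c * m0 for c, (m0, _) in runs)   # = 1.3

# numeric partial derivative of p at (1, 1), one-sided to stay in [0, 1]
h = 1e-7
dp_dr0 = (p(1, 1) - p(1 - h, 1)) / h
assert abs(dp_dr0 - expected_uses_l0) < 1e-4
```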
2.4. Probabilistic PCF with labeled coins. Let pPCF_lc be pPCF extended with a construct lcoin(l, r), typed as Γ ⊢ lcoin(l, r) : ι for r ∈ [0, 1] ∩ Q and l ∈ L. This language features the usual coin(r) construct for probabilistic choice as well as a supply of identical constructs labeled by L that we will use to simulate the counting of Section 2.2. Of course pPCF_lc-stacks involve now terms with labeled coins, but their syntax is not extended otherwise; let S_lc be the corresponding set of states. We use lab(M) for the set of labels occurring in M (and similarly lab(s) for s ∈ S_lc). Given a finite subset L of L, we use pPCF_lc(L) for the set of terms M such that lab(M) ⊆ L, and we define similarly S_lc(L). We use the similar notations pPCF_lab(L) and S_lab(L) for the sets of labeled terms and stacks (see Section 2.2) whose labels all belong to L.
The partial function Ev_lc : S_lc(L) × C_0 × C_0^L → R≥0 is defined exactly as Ev (for the unlabeled coin(r), we use only the first parameter in C_0), with the additional parameters β⃗ passed unchanged in the recursive calls, apart from the new rules for lcoin(l, r), where β⃗ = (β(l))_{l∈L} stands for an L-indexed family of elements of C_0 and β⃗[γ/l] is the family δ⃗ such that δ(l′) = β(l′) if l′ ≠ l and δ(l) = γ. We define D_lc(s) ⊆ C_0 × C_0^L as the domain of the partial function Ev_lc(s, ·, ·).
As before, strip(s) ∈ S is obtained by stripping s of its labels, setting strip(lcoin(l, r)) = coin(r), and strip(M) ∈ pPCF is defined in the same way.
We define a version Ev^η_lc(s, ·, ·) of the machine Ev_lc(s, ·, ·) which returns an element of C_0 instead of an element of R≥0; the definition is the same up to a few rules. When defined, Ev^η_lc(⟨lcoin(l, r), π⟩, α, β⃗) is a shuffle of the β(l)'s and of α (in the order in which the corresponding elements of the tapes are read during the execution). Being defined by similar recursions, the functions Ev_lc(s, ·, ·) and Ev^η_lc(s, ·, ·) have the same domain. Then we have

Ev_lc(s, α, β⃗) = Ev(strip(s), Ev^η_lc(s, α, β⃗))   (2.5)

where "=" should be understood as Kleene equality (either both sides are undefined, or both sides are defined and equal). Thanks to the step-indexing of Section 2.1.3, the proof of Equation (2.5) boils down to proving its step-indexed version by induction. Proof. Assume that the property holds for all integers p < n and let us prove it for n. One reasons by cases on the shape of s, considering only a few cases, the others being similar. Notice that the equation is obvious if n = 0, since then both hand sides are = ↑, so we can assume n > 0.
• s = ⟨let(x, M, N), π⟩. By definition of the machine Ev_lc we have Ev_lc(s, α, β⃗, n) = Ev_lc(t, α, β⃗, n − 1) where t = ⟨M, let(x, N) · π⟩, and the inductive hypothesis applies.
• As a last example assume that s = ⟨lcoin(l, r), π⟩ with β(l) = i γ. We have Ev_lc(s, α, β⃗, n) = ν_i(r) Ev_lc(⟨i, π⟩, α, β⃗[γ/l], n − 1), and the inductive hypothesis applies.

Equation (2.5) shows in particular that ∀(α, β⃗) ∈ D_lc(s), Ev^η_lc(s, α, β⃗) ∈ D(strip(s)).

Lemma 2.5. The function (α, β⃗) ↦ Ev^η_lc(s, α, β⃗) is a bijection from D_lc(s) to D(strip(s)).

Proof. We provide explicitly an inverse function, defined as another machine Ev^{−η}_lc(s, α); again we give only a few cases. It is clear from this recursion that the partial function Ev^{−η}_lc(s, ·) has D(strip(s)) as domain. Let us prove that Ev^{−η}_lc(s, Ev^η_lc(s, α, β⃗)) = (α, β⃗). Considering a step-indexed version of Ev^{−η}_lc with an additional parameter n ∈ N, defined along the same lines as (2.3), it suffices to prove the corresponding step-indexed statement (2.6). The proof is by induction on n. So assume that the property holds for all integers p < n and let us prove it for n. Assume that Ev^η_lc(s, α, β⃗, n) ≠ ↑, which implies that n > 0. As usual we consider only a few interesting cases; all other cases are similar to the first one (and similarly trivial).
• Assume now that s = ⟨coin(r), π⟩. Since Ev^η_lc(s, α, β⃗, n) ≠ ↑ we must have α ≠ ⟨⟩. Let us write α = i γ; then we have Ev^η_lc(s, α, β⃗, n) = i Ev^η_lc(t, γ, β⃗, n − 1) where t = ⟨i, π⟩, and Ev^η_lc(t, γ, β⃗, n − 1) ≠ ↑. By inductive hypothesis it follows that Ev^{−η}_lc(t, Ev^η_lc(t, γ, β⃗, n), n) = (γ, β⃗), and then we conclude by definition of Ev^{−η}_lc.
• The case s = ⟨lcoin(l, r), π⟩ is similar to the previous one, dealing with β(l) instead of α.

Now we prove that for all δ ∈ D(strip(s)) one has Ev^{−η}_lc(s, δ) ∈ D_lc(s) and that

∀δ ∈ D(strip(s)), Ev^η_lc(s, Ev^{−η}_lc(s, δ, n), n) = δ

using as usual the step-indexed versions of our machines. The proof is by induction on n, so assume that the property holds for all p < n and let us prove it for n. Assume that Ev^{−η}_lc(s, δ, n) ≠ ↑, which implies n > 0. We reason by cases on s, considering the same cases as above (the other cases, similar to the first one, are similarly trivial).
2.5. Spying labeled terms in pPCF. Given r⃗ = (r_l)_{l∈L} ∈ (Q ∩ [0, 1])^L, we define a (type preserving) translation lc_r⃗ : pPCF_lab(L) → pPCF_lc by induction on terms. For all term constructs but labeled terms, the transformation does nothing (for instance lc_r⃗(x) = x, lc_r⃗(λx^σ M) = λx^σ lc_r⃗(M) etc.), the only non-trivial case being lc_r⃗(M^l) = if(lcoin(l, r_l), lc_r⃗(M), Ω_σ), where σ is the type of M. (A priori this type is known only if we know the types of the free variables of M, so to be more precise this translation should be specified in a given typing context; this can easily be fixed by adding a further parameter to lc at the price of heavier notations.) In Section 2.6, we will turn a closed labeled term M (with labels in the finite set L) into the term sp_x⃗(M), defined in such a way that strip(lc_r⃗(M)) has a simple expression in terms of sp_x⃗(M) (Lemma 2.10), allowing us to interpret the coefficients of the power series interpreting sp_x⃗(M) in terms of probabilities of reduction of the machine Ev_lab with given resulting multisets of labels (Equation (2.10)). This in turn is the key to the proof of Theorem 2.11.
We write 0 k for the sequence consisting of k occurrences of 0.
Lemma 2.7. Let s ∈ S_lab(L). Then (α, β⃗) ∈ D_lc(lc_r⃗(s)) iff α ∈ D_lab(s) and ∀l ∈ L, β(l) = 0^{Ev_lab(s,α)(l)}.

Proof. We show the two implications (2.7) and (2.8) for any n ∈ N and any (α, β⃗), using as before step-indexed versions of the various machines. But in the present situation we shall not have the same indexing on both sides of the implications, because the encoding lc_r⃗(s) requires additional execution steps.
We prove (2.7) by induction on n ∈ N. Assume that the implication holds for all integers p < n and let us prove it for n, so assume that Ev_lc(lc_r⃗(s), α, β⃗, n) ≠ ↑, which implies n > 0. We consider only three cases as to the shape of s, the other cases being completely similar to the first one. We use the following convention: if M is a labeled term we write M′ for lc_r⃗(M), and similarly for stacks and states.
• Assume first that s = ⟨let(x, M, N), π⟩ and let t = ⟨M, let(x, N) · π⟩. We have s′ = ⟨let(x, M′, N′), π′⟩ and t′ = ⟨M′, let(x, N′) · π′⟩. We have Ev_lc(s′, α, β⃗, n) = Ev_lc(t′, α, β⃗, n − 1) ≠ ↑, and hence by inductive hypothesis there is k ∈ N such that Ev_lab(t, α, k) ≠ ↑ and ∀l ∈ L, β(l) = 0^{Ev_lab(t,α,k)(l)}.
• The third case we consider is s = ⟨M^l, π⟩ for some l ∈ L, so that s′ = ⟨if(lcoin(l, r_l), M′, Ω_σ), π′⟩, where σ is the type of M. Since Ev_lc(s′, α, β⃗, n) ≠ ↑ we must have n ≥ 2 and β(l) = i γ for some i ∈ {0, 1} and some γ. But whatever the value of n ≥ 2, we have Ev_lc(⟨Ω_σ, π′⟩, α, β⃗′, n − 2) = ↑ by definition of Ω_σ, where β⃗′ = β⃗[γ/l]. It follows that we must have i = 0 and Ev_lc(⟨M′, π′⟩, α, β⃗′, n − 2) ≠ ↑. By inductive hypothesis there exists k ∈ N such that, setting t = ⟨M, π⟩, Ev_lab(t, α, k) ≠ ↑ and ∀m ∈ L, β′(m) = 0^{Ev_lab(t,α,k)(m)}. We have Ev_lab(s, α, k + 1) = Ev_lab(t, α, k) + [l], and β(l) = 0 β′(l) = 0 0^{Ev_lab(t,α,k)(l)} = 0^{Ev_lab(s,α,k+1)(l)}, proving our contention. This ends the proof of (2.7); we prove now (2.8), also by induction on n. Assume that the implication holds for all p < n and let us prove it for n, so assume that Ev_lab(s, α, n) ≠ ↑. As usual this implies that n > 0. We deal with the same three cases as in the proof of (2.7).
• Assume last that s = ⟨M^l, π⟩ for some l ∈ L, so that s′ = ⟨if(lcoin(l, r_l), M′, Ω_σ), π′⟩, where σ is the type of M. Let t = ⟨M, π⟩. Since Ev_lab(s, α, n) ≠ ↑ we have Ev_lab(t, α, n − 1) ≠ ↑ and Ev_lab(s, α, n) = Ev_lab(t, α, n − 1) + [l]. We also know that for all m ∈ L, β(m) = 0^{Ev_lab(s,α,n)(m)}; in particular β(l) = 0 β′(l). Setting β′(m) = β(m) for m ≠ l, we have therefore ∀m ∈ L, β′(m) = 0^{Ev_lab(t,α,n−1)(m)}. By inductive hypothesis there is k ∈ N such that Ev_lc(lc_r⃗(t), α, β⃗′, k) ≠ ↑, and we conclude by the definition of the machine Ev_lc. This ends the proof of (2.8).
2.6. The spying translation. We consider a last translation, from pPCF_lab(L) to pPCF: let x⃗ be an L-indexed family of pairwise distinct variables (that we identify with the typing context (x_l : ι)_{l∈L}). If M ∈ pPCF_lab(L) with Γ ⊢ M : σ (assuming that no free variable of M occurs in x⃗), we define sp_x⃗(M) with Γ, x⃗ ⊢ sp_x⃗(M) : σ by induction on M. The unique non-trivial case is sp_x⃗(M^l) = if(x_l, sp_x⃗(M), Ω_σ), where σ is the type of M. As another example, we set sp_x⃗(λy^τ M) = λy^τ sp_x⃗(M), assuming of course that y is distinct from all the x_l's.

Example 2.12. The point of this formula is that we can apply it to algebraic expressions of the semantics of the program. Consider the following term M_q (for q ∈ Q ∩ [0, 1]) such that ⊢ M_q : ι ⇒ ι: we study (M_q) 0^l (for a fixed label l ∈ L). So in this example, "execution time" means "number of uses of the parameter 0". For all v ∈ PN we have ⟦M_q⟧(v) = ϕ_q(v_0) e_0, where ϕ_q : [0, 1] → [0, 1] is such that ϕ_q(u) is the least element of [0, 1] satisfying ϕ_q(u) = (1 − q) u² + q ϕ_q(u)², the choice between the two solutions of the quadratic equation being determined by the fact that the resulting function ϕ_q must be monotonic in u. So by Theorem 2.1 (for q ∈ (0, 1])

P((M_q) 0 ↓ 0) = ϕ_q(1) = min(1, (1 − q)/q).   (2.11)

Observe that we have also P((M_0) 0 ↓ 0) = ϕ_0(1) = 1, so that Equation (2.11) holds for all q ∈ [0, 1] (the corresponding curve is the second one in Fig. 3).
Since ϕ_q(u) = (1 − q)u² + q ϕ_q(u)², we have ϕ′_q(u) = 2(1 − q)u + 2q ϕ_q(u) ϕ′_q(u) and hence ϕ′_q(1) = 2(1 − q)/(1 − 2q ϕ_q(1)), so that (using the expression of ϕ_q(1) given by Equation (2.11)) ϕ′_q(1) = 2(1 − q)/|1 − 2q|; see the third curve in Fig. 3. For q > 1/2, notice that the conditional time expectation and the probability of convergence both decrease when q tends to 1. When q is very close to 1, (M_q)0 has a very low probability of terminating, but when it does terminate, it typically uses its argument twice. For q = 1/2 we have almost sure termination with an infinite expected computation time.
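As a sanity check, the numbers of Example 2.12 can be reproduced numerically. The sketch below (plain Python; the function names are ours) computes the least fixed point ϕ_q(1) by iteration from 0, compares it with the closed form min(1, (1 − q)/q), and evaluates ϕ′_q(1) = 2(1 − q)/(1 − 2qϕ_q(1)).

```python
# Numerical check for Example 2.12: phi_q(1) is the least solution in [0, 1]
# of t = (1 - q) + q * t^2, obtained here as the limit of Kleene iteration.

def phi_q1(q, iters=10000):
    """Least fixed point of f(t) = (1 - q) + q * t^2, iterating from 0."""
    t = 0.0
    for _ in range(iters):
        t = (1 - q) + q * t * t
    return t

q = 0.75
p_conv = phi_q1(q)                       # probability that (M_q)0 converges
print(p_conv, min(1.0, (1 - q) / q))     # iteration vs closed form

dphi = 2 * (1 - q) / (1 - 2 * q * p_conv)    # phi'_q(1)
print(dphi / p_conv)                         # conditional expected number of uses
```

For q = 3/4 this gives ϕ_q(1) = 1/3 and a conditional expectation of 3 uses of the argument; as q tends to 1 the ratio tends to 2, matching the discussion above.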
Of course such explicit computations are not always possible. For instance, using more occurrences of (f)x we can modify the definition of M_q in such a way that computing ϕ_q(u) would require solving a quintic. We could even modify M_q so that the solution function ϕ_q satisfies ϕ_q(u) = (1 − q)u² + q ϕ_q(u)ϕ_q(ϕ_q(u)); in such a case we cannot expect to have an explicit expression for ϕ_q(u). Approximating the value of ϕ_q(1) from below is possible by performing a finite number of iterations of the fixpoint; approximating it from above is a more subtle problem.

Figure 3: Plots of ϕ_q(1) and E(Ev_lab(⟨(M_q)0^l, ε⟩)(l) | ⟨(M_q)0, ε⟩ ↓ 0), with q on the x-axis. See Example 2.12.
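The lower approximations by fixpoint iteration mentioned above can be sketched directly for the variant satisfying ϕ_q(u) = (1 − q)u² + q ϕ_q(u)ϕ_q(ϕ_q(u)) (plain Python; the iteration counts and the choice q = 1/2 are ours):

```python
# Lower approximations of phi_q(1) for the variant satisfying
# phi_q(u) = (1-q)*u^2 + q*phi_q(u)*phi_q(phi_q(u)), for which no closed
# form is expected: compute the Kleene iterates of the fixed-point
# equation, starting from the constant 0 function.

def phi(n, u, q):
    """n-th Kleene iterate of the fixed-point equation, evaluated at u."""
    if n == 0:
        return 0.0
    a = phi(n - 1, u, q)                     # phi^(n-1)(u)
    return (1 - q) * u * u + q * a * phi(n - 1, a, q)

q = 0.5
print([phi(n, 1.0, q) for n in range(0, 16, 5)])
# an increasing sequence of lower bounds for phi_q(1)
```

Each iterate is a polynomial below the true solution, so the printed values approximate ϕ_q(1) from below, monotonically in n.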
Remark 2.13 (Connection with relational and coherence semantics). It is possible to interpret terms of pPCF in Rel, the relational model of Linear Logic (see for instance [BE01]).
In this model each type σ is interpreted as a set ⟦σ⟧_Rel. This semantics is "qualitative" in the sense that a point can only belong or not belong to the interpretation of a term, whereas the Pcoh semantics is quantitative in the sense that the interpretation of the same term also provides a coefficient in R≥0 for each such point. We briefly explain the connection between the two models; to this end we first describe the relational model. One of the shortest ways to do so is by means of the "intersection typing system" given in Fig. 4, where we use the following conventions:
• Φ, Φ_0, ... are semantic contexts of shape Φ = (x_1 : µ_1 : σ_1, ..., x_k : µ_k : σ_k) where the x_i's are pairwise distinct variables;
• given semantic contexts Φ_j = (x_1 : µ^j_1 : σ_1, ..., x_k : µ^j_k : σ_k) for j = 1, ..., n which all have the same underlying typing context Γ = (x_1 : σ_1, ..., x_k : σ_k), Σ_{j=1}^n Φ_j stands for the semantic context (x_1 : Σ_{j=1}^n µ^j_1 : σ_1, ..., x_k : Σ_{j=1}^n µ^j_k : σ_k), whose underlying typing context is Γ (0_Γ can be seen as the case n = 0 of this construct, with the slight problem that Γ cannot be recovered from the Φ_j's in that case since there are none, whence the special construct 0_Γ).
Then, assuming that (x_i : σ_i)_{i=1}^k ⊢ M : τ, this typing system is such that, given µ ∈ Π_{i=1}^k M_fin(⟦σ_i⟧_Rel) and a ∈ ⟦τ⟧_Rel, one has (µ, a) ∈ ⟦M⟧^Γ_Rel iff the judgment (x_i : µ_i : σ_i)_{i=1}^k ⊢ M : a : τ is derivable, as soon as all occurrences of coin(r) in M are such that r ∉ {0, 1} (occurrences of coin(0) and coin(1) can be replaced by 1 and 0 respectively without changing the semantics of M). This is easy to prove by a simple induction on M.
Since [BE01, Bou11] we know that Girard's coherence space semantics can be modified as follows: a non-uniform coherence space is a triple X = (|X|, ˝_X, ˇ_X) where |X| is an at most countable set (the web of X) and ˝_X, ˇ_X are disjoint symmetric binary relations on |X|, called strict coherence and strict incoherence; contrary to ordinary coherence spaces, we can have a ˝_X a or a ˇ_X a for some a ∈ |X|. These objects can be organized into a categorical model nCoh of classical linear logic whose associated Kleisli cartesian closed category is a model of PCF (that is, of pPCF without the coin(r) construct). Contrary to what happens with Girard's coherence spaces 10, we have |⟦τ⟧_nCoh| = ⟦τ⟧_Rel for each type τ. Moreover, given a PCF term M such that ⊢ M : τ, the set ⟦M⟧_nCoh, which is a clique of the non-uniform coherence space ⟦τ⟧_nCoh (meaning that ∀a, a′ ∈ ⟦M⟧_nCoh, ¬(a ˇ_{⟦τ⟧_nCoh} a′)), satisfies ⟦M⟧_nCoh = ⟦M⟧_Rel. In other words, the interpretation of a PCF term in nCoh is exactly the same as its interpretation in the basic model Rel. So what is the point of the model nCoh? It teaches us something we could not see in Rel: ⟦M⟧_Rel is a clique in the non-uniform coherence space associated with its type in nCoh.
Let M ∈ PCF_lab(L) (that is, M ∈ pPCF_lab(L) and M contains no instance of coin(r)) be such that ⊢ M : ι, so that sp_x(M) ∈ PCF satisfies (x_l : ι)_{l∈L} ⊢ sp_x(M) : ι where x = (x_l)_{l∈L} is a list of pairwise distinct variables. Then ⟦sp_x(M)⟧^x_nCoh is a clique of the non-uniform coherence space X = !N_nCoh ⊗ ··· ⊗ !N_nCoh ⊸ N_nCoh (one occurrence of !N_nCoh for each element of L). If (µ, n) ∈ ⟦sp_x(M)⟧^x_nCoh then we know that each µ(l) ∈ M_fin(N) satisfies supp(µ(l)) ⊆ {0} (see Lemma 2.9). Since the set ⟦sp_x(M)⟧^x_nCoh is a clique in X, and in view of the above characterization of the coherence relation of !N_nCoh, this set contains at most one element. When it is empty, the execution of M does not terminate. When it is a singleton {(µ, n)}, the execution of M terminates with value n, using µ(l)(0) times the l-labeled subterms of M. This can be understood as a version of the denotational

Footnote 10: Indeed, in Girard's coherence spaces the web of !X is the set of all finite cliques, or of all finite multicliques, of elements of |X| (there are two versions of this exponential), hence this web depends on the coherence relation ˝_X. This is no longer the case with non-uniform coherence spaces, and this is the most important difference between the two models.
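The spying translation sp_x of Section 2.6 is easy to implement on a toy abstract syntax. The sketch below (Python; the tuple-based constructor names are ours, not the paper's formal syntax) shows the only non-trivial case, which replaces a labelled subterm M^l by if(x_l, sp_x(M), Ω):

```python
# Toy implementation of the spying translation sp_x on a small
# hypothetical AST.  Terms are tuples: ("var", y), ("lam", y, body),
# ("app", M, N), ("label", l, M) for M^l, ("if", c, a, b), and
# ("omega",) for the diverging term Omega.

def sp(term, xs):
    """xs maps each label l to its spying variable x_l."""
    tag = term[0]
    if tag == "var":
        return term
    if tag == "lam":                        # lambda y. M
        _, y, body = term
        return ("lam", y, sp(body, xs))
    if tag == "app":                        # (M) N
        _, m, n = term
        return ("app", sp(m, xs), sp(n, xs))
    if tag == "label":                      # M^l  ->  if(x_l, sp(M), Omega)
        _, l, m = term
        return ("if", ("var", xs[l]), sp(m, xs), ("omega",))
    if tag == "if":
        _, c, a, b = term
        return ("if", sp(c, xs), sp(a, xs), sp(b, xs))
    return term                             # constants, omega, ...

m = ("lam", "y", ("label", "l", ("var", "y")))
print(sp(m, {"l": "x_l"}))
```

On the example term λy. y^l this prints the translated term λy. if(x_l, y, Ω), as in the definition of sp_x.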

A Lipschitz property.
Using the differential of Section 3.2, we prove that all morphisms of Pcoh_! satisfy a Lipschitz property, with a coefficient which cannot be upper bounded on the whole domain.
First of all, observe that, if w ∈ P(X ⊸ Y) and x ∈ PX, we have ‖w • x‖_Y ≤ ‖w‖_{X⊸Y} ‖x‖_X. Indeed, if ‖w‖_{X⊸Y} ≠ 0 and ‖x‖_X ≠ 0 we have w/‖w‖_{X⊸Y} ∈ P(X ⊸ Y) and x/‖x‖_X ∈ PX, therefore (w/‖w‖_{X⊸Y}) • (x/‖x‖_X) ∈ PY and our contention follows. And if ‖w‖_{X⊸Y} = 0 or ‖x‖_X = 0 the inequality is obvious since then w • x = 0.
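The inequality ‖w • x‖_Y ≤ ‖w‖_{X⊸Y}‖x‖_X can be checked numerically in the simplest concrete case, where PX and PY are sets of subprobability vectors on finite index sets and w acts as a nonnegative matrix; identifying ‖w‖_{X⊸Y} with the maximal column sum of w is our assumption, specific to this simple choice of X and Y:

```python
# Finite-dimensional sketch of ||w . x||_Y <= ||w||_{X -o Y} * ||x||_X:
# ||x||_X is the total mass of x, and (w . x)_j = sum_i w[j][i] * x[i].
# For this choice of X and Y the operator norm is the max column sum
# (an assumption of this sketch, not a statement from the paper).

w = [[0.2, 0.5, 0.1],
     [0.3, 0.4, 0.0]]
x = [0.3, 0.2, 0.4]            # nonnegative, total mass 0.9 <= 1

norm_x = sum(x)
norm_w = max(sum(w[j][i] for j in range(len(w))) for i in range(len(x)))
wx = [sum(w[j][i] * x[i] for i in range(len(x))) for j in range(len(w))]
norm_wx = sum(wx)

print(norm_wx, norm_w * norm_x)
assert norm_wx <= norm_w * norm_x + 1e-12
```

The assertion holds because ‖w • x‖ = Σ_i (column sum at i)·x_i, which is bounded by the maximal column sum times the mass of x.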
Let p ∈ [0, 1). If x ∈ PX and ‖x‖_X ≤ p, observe that, for any u ∈ PX, one has ‖x + (1 − p)u‖_X ≤ ‖x‖_X + (1 − p)‖u‖_X ≤ 1 and hence (1 − p)u ∈ P(X_x). Therefore, given w ∈ P(X_x ⊸ Y), we have ‖w • (1 − p)u‖_Y ≤ 1 for all u ∈ PX and hence (1 − p)w ∈ P(X ⊸ Y). Let t ∈ P(!X ⊸ 1). We have seen that, for all x ∈ PX, we have t′(x) ∈ P(X_x ⊸ 1_{t(x)}) ⊆ P(X_x ⊸ 1). Therefore, if we assume that ‖x‖_X ≤ p, we have (1 − p)t′(x) ∈ P(X ⊸ 1). Let now x, y ∈ PX be such that ‖x‖_X, ‖y‖_X ≤ p (we no longer assume that x and y are comparable); comparing t(x) and t(y) by means of this bound on the differential yields the announced Lipschitz estimate with coefficient 1/(1 − p).
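In dimension one, where an element of P(!1 ⊸ 1) can be seen as a power series with nonnegative coefficients summing to at most 1, the resulting Lipschitz property with coefficient 1/(1 − p) on the ball of radius p can be tested directly. A sketch under this identification, with randomly chosen coefficients:

```python
# One-dimensional sketch of the Lipschitz property: t in P(!1 -o 1) is
# identified with a power series t(x) = sum_n t_n x^n, t_n >= 0 and
# sum t_n <= 1; on [0, p] it should be Lipschitz with constant 1/(1 - p).
import random

random.seed(1)
raw = [random.random() for _ in range(50)]
total = sum(raw)
coeffs = [c / total for c in raw]       # t_n >= 0 with sum t_n = 1

def t(x):
    return sum(c * x**n for n, c in enumerate(coeffs))

p = 0.9
for _ in range(1000):
    x, y = random.uniform(0, p), random.uniform(0, p)
    assert abs(t(x) - t(y)) <= abs(x - y) / (1 - p) + 1e-12
print("Lipschitz bound with coefficient 1/(1-p) holds on [0, 0.9]")
```

The bound holds because t′(x) ≤ Σ_n t_n n p^{n−1} ≤ max_n n p^{n−1} ≤ 1/(1 − p) on [0, p], and the constant blows up as p tends to 1, matching the remark that the coefficient cannot be bounded on the whole domain.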

Figure 1: Typing rules for pPCF terms and stacks
The interpretation of a term in Rel is simply the support of its interpretation in Pcoh.