Probabilistic Rewriting and Asymptotic Behaviour: on Termination and Unique Normal Forms

While a mature body of work supports the study of rewriting systems, abstract tools for Probabilistic Rewriting are still limited. In this paper we study the question of uniqueness of the result (unique limit distribution), and develop a set of proof techniques to analyze and compare reduction strategies. The goal is to have tools to support the operational analysis of probabilistic calculi (such as probabilistic lambda-calculi) where evaluation allows for different reduction choices (hence different reduction paths).


Introduction
Rewriting Theory [Ter03] is a foundational theory of computing. Its impact extends both to the theoretical side of computer science and to the development of programming languages. A clear example of both aspects is the paradigmatic term rewriting system, the λ-calculus, which is also the foundation of functional programming. Abstract Rewriting Systems (ARS) are the general theory which captures the common substratum of rewriting theory, independently of the particular structure of the objects. It studies properties of term transformations, such as normalization, termination, unique normal form, and the relations among them. Such results are a powerful set of tools which can be used when we study the computational and operational properties of any calculus or programming language. Furthermore, the theory provides tools to study and compare strategies, which become extremely important when a system has some reductions leading to a normal form and others that do not. Here we need to know: is there a strategy which is guaranteed to lead to a normal form, if one exists (normalizing strategies)? Which strategies diverge whenever possible (perpetual strategies)?
Probabilistic Computation models uncertainty. Probabilistic forms of automata [Rab63], Turing machines [San71], and the λ-calculus [Sah78] have existed for a long time. The pervasive role probability is assuming in areas as diverse as robotics, machine learning, and natural language processing has stimulated research on probabilistic programming languages, including functional languages [KMP97, RP02, PPT05], whose development is increasingly active. A typical programming language supports at least discrete distributions, by providing a probabilistic construct which models sampling from a distribution. This is also the most concrete way to endow the λ-calculus with probabilistic choice [DPHW05, DLZ12, EPT11]. Within the vast research on models of probabilistic systems, we wish to mention that probabilistic rewriting is the explicit base of PMaude [AMS06], a language for specifying probabilistic concurrent systems.
Content and contributions. Probability is concerned with asymptotic behaviour: what happens not after a finite number n of steps, but when n tends to infinity. In this paper we focus on the asymptotic behaviour of rewrite sequences with respect to normal forms, normal form being the most standard notion of result in rewriting. We study computational properties such as (1.), (2.), (3.) above. We do so from the point of view of ARSs, aiming for properties which hold independently of the specific nature of the rewritten objects; the purpose is to have tools which apply to any probabilistic rewriting system.
PARS. After motivating and introducing our formalism for PARSs (Sections 2 and 3), in Section 4 we formalize the notion of limit distribution, and of well-defined result. Since in a PARS each term has different possible reduction sequences (with each sequence leading to a possibly different limit distribution), to each term is naturally associated a set of limit distributions. To study when a PARS has a well-defined result is the main focus of the paper.
Recall a property which is crucial to the computational interpretation of a system such as the λ-calculus: if a term has a normal form, it is unique, meaning that the result of the computation is well defined. With this in mind, we investigate in the probabilistic setting an analogue of the ARS notion of Unique Normal Form (UN), and the possibility or necessity of reaching a result: Normalization (WN) and Termination (SN). We provide methods and criteria to establish these properties, and we uncover relations between them. Specific contributions are the following.
• We propose an analogue of UN for PARS. The question was already studied in [DM18] for PARS which are almost surely terminating, but the solution there does not extend to the general case.
• We investigate the classical ARS method to prove UN via confluence; we uncover that subtle aspects appear when dealing with a notion of result as a limit. We do prove an analogue of "confluence implies UN" for PARS; however, the proof is not simply an adaptation of the standard techniques, because the set of limit distributions is, in general, infinite, and is not guaranteed to have maximal elements (think of [0, 1[, which has a sup, but not a max).
Asymptotic rewriting: QARS. To better understand the asymptotic behaviour of computation, in Section 5 we introduce the setting of Quantitative Abstract Rewrite Systems (QARS). While motivated by the analysis of probabilistic rewriting, QARSs abstract from the probabilistic structure. This allows us to capture the essence of the arguments, and to separate the properties which really depend on probability (and its specific properties) from those which are only concerned with the fact that results are limits. QARS are a natural refinement of the notion of Abstract Rewrite Systems with Information content (ARSI), introduced by Ariola and Blom [AB02]. There, a partial order which expresses the information content of the elements is associated to the ARS. We adopt the same view. ARSI however have a notion of limit which is tailored to infinite normal forms in the sense of Böhm trees [Bar84] and Lévy-Longo trees [Lév78]. With QARS, we simply move from partial orders (and a specific definition of limit) to ω-complete partial orders; this is enough to capture also probabilistic computation.
First, we study the properties of limits. Then, we provide a set of proof techniques to support the asymptotic analysis of reduction strategies. To do so, we extend to our setting a method which was introduced for ARSs by Van Oostrom [vO07], and which is based on Newman's property of Random Descent (RD) [New42, vO07, vOT16] (see Section 1.1.2). The Random Descent method turns out to be well suited to asymptotic and probabilistic rewriting, providing a useful family of tools. In analogy to their counterparts in [vO07], we generalize in a quantitative way the notion of Random Descent (which becomes obs-RD) and that of being better (which becomes obs-better); both properties are here parametric with respect to the information content which we wish to observe.
A significant technical feature (inherited from [vO07]) is that both obs-RD and obs-better come with a characterization via a local condition: only single steps from an object, rather than all possible sequences of steps, need to be examined.
Probabilistic rewriting: tools and applications.In Sections 7 and 8 we specialize the Random Descent techniques to PARS.
• obs-RD entails that all rewrite sequences from a term lead to the same result, in the same expected number of steps (the average number of steps, weighted by probability).
• obs-better offers a method to compare strategies ("strategy S is always better than strategy T") w.r.t. the probability of reaching a result and the expected time to reach it. It provides a sufficient criterion to establish that a strategy is normalizing (resp. perpetual), i.e. that the strategy is guaranteed to lead to a result with maximal (resp. minimal) probability.
To illustrate their use, we apply these methods to a probabilistic λ-calculus, Weak Call-by-Value λ-calculus, which is discussed in Section 7.2. A larger example of application to probabilistic λ-calculi is [FR19], whose developments rely also on the abstract results presented here; we illustrate this in Section 9.
Remark 1.1 (On the term Random Descent). Please note that in [New42], the term Random refers to non-determinism (in the choice of the redex), not to randomized choice.
Journal vs conference version. This paper is the journal version of [Fag19]. The content has been considerably extended. In particular, we develop the setting of QARS (Section 5), which formalizes the notion of asymptotic rewriting and does not appear in [Fag19]. This allows us to separate the properties which really depend on probability from those which are concerned with results as limits, freeing the arguments from unnecessary structure. The study of limits in both the probabilistic and the non-probabilistic setting is unified into a more general theory. The results obtained for QARS can be transferred to ARS and PARS alike, but also to other frameworks where reduction is asymptotic.
1.1.1. Probabilistic λ-calculus, non-deterministic evaluation, and (non-)Unique Result. Rewrite theory provides numerous tools to study uniqueness of normal forms, as well as techniques to study and compare strategies. This is not the case in the probabilistic setting. Perhaps a reason is that when extending the λ-calculus with a choice operator, confluence is lost, as was observed early [dP95]; we illustrate this in Examples 1.2 and 1.3, which are adapted from [dP95, DLZ12]. The way to deal with this issue in probabilistic λ-calculi (e.g. [DPHW05, DLZ12, EPT11]) has been to fix a deterministic reduction strategy, typically "leftmost-outermost". Fixing a deterministic strategy is not satisfactory, neither for the theory nor for the practice of computing. To understand why this matters, recall for example that confluence of the λ-calculus is what makes functional programs inherently parallel: every sub-expression can be evaluated in parallel; still, we can reason on a program using a deterministic sequential model, because the result of the computation is independent of the evaluation order (we refer to [Mar13], and to Harper's text "Parallelism is not Concurrency", for discussion on deterministic parallelism, and how it differs from concurrency). Let us see what happens in the probabilistic case.
Example 1.2 (Confluence failure). Let us consider the untyped λ-calculus extended with a binary operator ⊕ which models probabilistic choice. Here ⊕ is just flipping a fair coin: M ⊕ N reduces to either M or N with equal probability 1/2; we write this as M ⊕ N → {M^{1/2}, N^{1/2}}. Consider the term P Q, where P = (λx.x)(λx.x XOR x) and Q = (T ⊕ F); here XOR is the standard construct for the exclusive OR, and T and F are terms which encode the booleans.
• If we evaluate P and Q independently, from P we obtain λx.(x XOR x), while from Q we have either T or F, with equal probability 1/2. By composing the partial results, we obtain {(T XOR T)^{1/2}, (F XOR F)^{1/2}}, and therefore {F^1}.
• If we evaluate P Q sequentially, in a standard leftmost-outermost fashion, P Q reduces to (λx.x XOR x)Q, which reduces to (T ⊕ F) XOR (T ⊕ F), and eventually to {T^{1/2}, F^{1/2}}.
Example 1.3. The situation becomes even more complex if we examine also the possibility of diverging; try the same experiment on the term P R, with P as above, and R = (T ⊕ F) ⊕ ∆∆ (where ∆ = λx.xx). Proceeding as before, we now obtain either {F^{1/2}} or {T^{1/8}, F^{1/8}}.
We do not need to lose the features of the λ-calculus in the probabilistic setting. In fact, while some care is needed, determinism of the evaluation can be relaxed without giving up uniqueness of the result: the calculus we introduce in Section 7.2 is an example (we relax determinism to Random Descent); we fully develop this direction in further work [FR19]. To be able to do so, we need abstract tools and proof techniques to analyze probabilistic rewriting. The same need for theoretical tools holds, more in general, whenever we desire a probabilistic language which allows for deterministic parallel reduction.
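The order-dependence in Example 1.2 can be checked concretely. The following Python fragment is our own illustration (not part of the calculus): it hand-computes the outcome distributions of the two evaluation orders with exact fractions, encoding the booleans T, F directly as Python values.

```python
from fractions import Fraction as F

# Exact outcome distributions for Example 1.2, for the two evaluation
# orders of P Q (a hand-rolled sketch of the calculus, not the calculus
# itself). XOR of encoded booleans:
def xor(a, b):
    return a != b

# Strategy 1: evaluate P and Q independently, then compose.
# Q yields T or F with probability 1/2 each; P yields \x. x XOR x,
# so the final result is (q XOR q) for the single sampled value q.
independent = {}
for q, p in ((True, F(1, 2)), (False, F(1, 2))):
    r = xor(q, q)
    independent[r] = independent.get(r, F(0)) + p

# Strategy 2: leftmost-outermost. The redex P Q is fired first,
# duplicating the un-sampled Q: we get (T + F) XOR (T + F), i.e. two
# independent coin flips.
sequential = {}
for q1, p1 in ((True, F(1, 2)), (False, F(1, 2))):
    for q2, p2 in ((True, F(1, 2)), (False, F(1, 2))):
        r = xor(q1, q2)
        sequential[r] = sequential.get(r, F(0)) + p1 * p2

print(independent)  # F with probability 1
print(sequential)   # T and F with probability 1/2 each
```

The two distributions disagree, which is exactly the failure of uniqueness of the result discussed above.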
In this paper we focus on uniqueness of the result, rather than confluence. While important, confluence is a sufficient but not necessary condition for uniqueness of normal forms.

Other key notions.
Confluence is not enough. Key to non-deterministic evaluation strategies is that, despite the fact that there are many ways of evaluating a term, all choices eventually yield the same result. To this aim, confluence is not enough. The reduction of a term that has a normal form may still produce diverging computations, which yield no result (think of β-reduction in the usual λ-calculus, reducing the term (λx.z)(∆∆)). What we really want for a non-deterministic evaluation strategy is that all reduction sequences from the same t have the same behaviour: if t has a normal form, then all reduction sequences from t eventually reach it (uniform normalization); ideally, all should do so in the same number of steps. This latter property is known as Random Descent [New42, vO07, vOT16].
Random Descent. Newman's Random Descent (RD) [New42] is an ARS property which guarantees that normalization suffices to establish both termination and uniqueness of normal forms. Precisely, if an ARS has random descent, paths to a normal form need not be unique, but they have unique length. In its essence: if a normal form exists, all rewrite sequences lead to it, and all have the same length. While only few systems directly verify it, RD is a powerful ARS tool; a typical use in the literature is to prove that a strategy has RD, to conclude that it is normalizing. A well-known property which implies RD is a form of diamond. Van Oostrom [vO07] has defined a characterization of RD by means of a local property, proposing RD as a uniform method to (locally) compare strategies for normalization and minimality (resp. perpetuality and maximality). Such a method has then been extended in [vOT16], where the notion of length is abstracted into a notion of measure. In Sections 7 and 8 we develop similar methods in a probabilistic setting. The probabilistic analogue of length is the expected number of steps (Section 7.1).
Weak Call-by-Value λ-calculus (and its probabilistic counterpart). A notable example of a system which satisfies Random Descent is the Call-by-Value (CbV) λ-calculus endowed with weak evaluation.
In Plotkin's Call-by-Value λ-calculus, β-redexes are fired only when the argument is a value (i.e., a variable or a λ-abstraction). Since the goal is to compute values, as is natural in functional programming, evaluation is often restricted to be weak [How70, CH98], where weak means no reduction in function bodies (i.e. within the scope of λ-abstractions). Weak CbV is the basis of the ML/CAML family of functional languages, and of most probabilistic functional languages. There are three main weak schemes: reducing from left to right, as originally defined by Plotkin [Plo75]; from right to left, as in Leroy's ZINC abstract machine [Ler90] (resulting in a more efficient implementation); or in an arbitrary order, used for example in [DLM08]. While left and right reduction are deterministic, weak reduction in arbitrary order is non-deterministic and subsumes both.
If we consider programs (closed terms), values are exactly the normal forms of weak reduction. Because it satisfies Random Descent, CbV weak reduction →_w has striking properties (see e.g. [DLM08] for an account). First, if M reduces to a value (M →_w^* V), then any sequence of →_w-steps from M will reach V; second, the number n of steps such that M →_w^n V is always the same.
In Section 7.2, we study a probabilistic extension of weak CbV, Λ^w_⊕. We show that it has properties analogous to those of its classical counterpart: all rewrite sequences converge to the same result, in the same expected number of steps.
Local vs global conditions. An important distinction in rewriting theory is between local and global properties. A property of a term t is global if it is quantified over all rewrite sequences from t; it is local if it is quantified only over one-step reductions from the term. Local properties are easier to test, because the analysis (usually) involves a finite number of cases. To work locally, that is, to reduce a global test problem to local properties, dramatically reduces the search space when testing. A familiar example is Newman's Lemma, which reduces confluence (a global property) to local confluence, for terminating systems.
Locality is also the strength and beauty of the Random Descent method. While Newman's Lemma fails in a probabilistic setting, Random Descent methods adapt well.
1.2. Related work. First, let us observe that there is a vast literature on probabilistic transition systems; however, the objectives, and therefore the questions and tools, are different from those of PARS. A similar distinction exists between abstract rewrite systems and transition systems. Here we discuss related work in the context of PARS [BG06, BK02].
We are not aware of any work which investigates normalizing strategies (or normalization in general, rather than termination). Instead, confluence in probabilistic rewriting has already drawn interesting work. A notion of confluence for a probabilistic rewrite system defined over a λ-calculus is studied in [DAGG11, DLMZ11]; in both cases, the probabilistic behaviour corresponds to measurement in a quantum system. The work most closely related to our goals is [DM18]. It studies confluence of non-deterministic PARS in the case of finitary termination (being finitary is the reason why Newman's Lemma holds), and in the case of AST. As we observe in Section 4.2.2, their notion of unique limit distribution (if α, β are limits, then α = β), while simple, is not an analogue of UN for general PARS. We extend the analysis beyond AST, to the general case, which arises naturally when considering the untyped probabilistic λ-calculus. On confluence, we also mention [KC17], whose results however do not cover non-deterministic PARS; there, the probability of the limit distribution is concentrated in a single element, in the spirit of Las Vegas algorithms. [KC17] revisits results from [BK02], while we are in the non-deterministic framework of [BG06].
The way we define the evolution of a PARS, via the one-step relation ⇒, follows the approach in [LFVY17], which also contains an embryo of the current work (a form of diamond property); the other results and developments are novel. A technical difference with [LFVY17] is that, for the formalism to be general, a refinement is necessary (see Section 2.5); the issue was first pointed out in [DM18]. Our refinement is a variant of the one introduced (for the same reasons) in [ALY20]; there, normal forms are discarded, because the authors are only interested in the probability of termination, while we are interested in a more qualitative analysis of the result. [ALY20] demonstrates the equivalence with the approach in [BG06].
Quantitative Abstract Rewrite Systems (QARS) refine Ariola and Blom's notion of Abstract Rewrite Systems with Information content (ARSI) [AB02]; there, a partial order which expresses a comparison between the "information content" of the elements is associated to the ARS. Here, we simply move from partial orders to ω-complete partial orders (ω-cpo). The difference is in the notion of limit, hence in its properties, and our novel contribution is the study of such properties. ARSI are tailored to infinite normal forms in the sense of Böhm and Lévy-Longo trees: limits (infinite normal forms) are there obtained by completing the partial order via a specific standard construction, ideal completion (see for instance Ch. 1 in [AC98]). So, given an element t in an ARSI, the infinite normal form of t is the downward closure of the set of the information contents of all its reducts. Such an approach would not suit probability distributions, but moving to ω-cpo suffices. Being simply the supremum of an ω-chain, the notion of limit which comes with QARS is more general and flexible, allowing us to model a larger variety of situations. All results we establish for limits in the setting of QARS also hold for the infinite normal forms of ARSI, while the converse is not true. In Appendix A.1 we give a concrete example that shows the difference: a confluent ARSI has unique infinite normal forms (Theorem 5.4 there); the analogous result is (in general) not true for QARS.

Probabilistic Abstract Rewriting System
We assume the reader familiar with the basic notions of rewrite theory (such as Ch. 1 of [Ter03]) and of discrete probability theory. We review the basic language of both. We then recall the definition of probabilistic abstract rewrite system from [BK02, BG06], here denoted pars, and explain on examples how a system described by a pars evolves. This will motivate the formalism which we present in Section 3.
A relation → is deterministic if for each t ∈ C there is at most one s ∈ C such that t → s.
Unique Normal Form. C has the property of unique normal form (with respect to reduction, UN) if no element reduces to two distinct normal forms: if t →* u and t →* v with u, v normal forms, then u = v.
Normalization and Termination. The fact that an ARS has unique normal forms implies neither that all elements have a normal form, nor that, if an element has a normal form, each rewrite sequence converges to it. An element c is terminating (aka strongly normalizing, SN) if it has no infinite sequence c → c_1 → c_2 → ...; it is normalizing (aka weakly normalizing, WN) if it has a normal form. These are all important properties to establish about an ARS, as it is important to have a rewrite strategy which finds a normal form, if one exists.
Vol. 18:2 PROBABILISTIC REWRITING AND ASYMPTOTIC BEHAVIOUR 5:9
Basics on Probabilities. The intuition is that random phenomena are observed by means of experiments (running a probabilistic program is such an experiment); each experiment results in an outcome. The collection of all possible outcomes is represented by a set, called the sample space Ω. When the sample space Ω is countable, the theory is simple. A discrete probability space is given by a pair (Ω, µ), where Ω is a countable set, and µ is a discrete probability distribution on Ω, i.e. a function µ : Ω → [0, 1] such that Σ_{ω∈Ω} µ(ω) = 1. A probability measure is assigned to any subset A ⊆ Ω as µ(A) = Σ_{ω∈A} µ(ω). In the language of probabilists, a subset of Ω is called an event.
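As a minimal illustration of these definitions, a discrete probability space and the induced measure on events can be sketched as follows (a fair coin; `measure` is a hypothetical helper name, not notation from the paper):

```python
from fractions import Fraction as F

# A discrete probability space (Omega, mu) as a dict: mu maps each
# outcome to a probability in [0, 1], and the probabilities sum to 1.
mu = {"heads": F(1, 2), "tails": F(1, 2)}
assert sum(mu.values()) == 1

# The measure of an event A (a subset of Omega) is the sum of the
# probabilities of its outcomes: mu(A) = sum of mu(w) for w in A.
def measure(mu, event):
    return sum(mu[w] for w in event if w in mu)

print(measure(mu, {"heads"}))           # 1/2
print(measure(mu, {"heads", "tails"}))  # 1
```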

Probabilistic Abstract Rewrite Systems (pars).
A probabilistic abstract rewrite system (pars) is a pair (A, →) of a countable set A and a relation → ⊆ A × Dst(A) such that for each (a, β) ∈ →, ‖β‖ = 1. We write a → β for (a, β) ∈ →, and we call it a rewrite step, or a reduction. An element a ∈ A is in normal form if there is no β with a → β. We denote by NF_A the set of the normal forms of A (or simply NF when A is clear). A pars is deterministic if, for all a, there is at most one β with a → β.
Remark 2.3. The intuition behind a → β is that the rewrite step a → b (b ∈ A) has probability β(b). The total probability given by the sum over all steps a → b is 1.
Probabilistic vs Non-deterministic. It is important to understand the distinction between probabilistic choice (which globally happens with certainty) and non-deterministic choice (which leads to different distributions of outcomes). Let us discuss some examples.
Example 2.4 (A deterministic pars). Fig. 2 shows a simple random walk over N, which describes a gambler starting with 2 points and playing a game where every time he either gains 1 point with probability 1/2 or loses 1 point with probability 1/2. This system is encoded by the following pars on N: n + 1 → {n^{1/2}, (n + 2)^{1/2}}. Such a pars is deterministic, because for every element at most one rule applies. Note that 0 is the (only) normal form.
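The evolution of this pars can be sketched by pushing a distribution over states forward one step at a time. The following Python fragment (an illustration with exact fractions, not notation from the paper) computes the first four steps from the initial state 2:

```python
from fractions import Fraction as F

# One-step relation of the deterministic pars of Example 2.4 on N:
# n+1 -> {n : 1/2, n+2 : 1/2}, and 0 is in normal form.
def step(n):
    if n == 0:
        return None                        # normal form: no rule applies
    return {n - 1: F(1, 2), n + 1: F(1, 2)}

# Push a whole distribution over states one step forward.
def evolve(dist):
    out = {}
    for n, p in dist.items():
        beta = step(n)
        if beta is None:
            out[n] = out.get(n, F(0)) + p  # normal forms stay put
        else:
            for m, q in beta.items():
                out[m] = out.get(m, F(0)) + p * q
    return out

d = {2: F(1)}                              # the gambler starts with 2 points
for _ in range(4):
    d = evolve(d)
print(d)
print(d.get(0, F(0)))  # probability of having reached the normal form 0: 3/8
```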
Example 2.5 (A non-deterministic pars). Assume now (Fig. 3) that the gambler of Example 2.4 is also given the possibility to stop at any time. The two choices are here encoded as follows: n + 1 → {n^{1/2}, (n + 2)^{1/2}}, and n + 1 → {stop^1}. The choice between two possible rules makes the system non-deterministic, and therefore the system can evolve in several different ways. Fig. 3 illustrates one possible way.
We now need to explain how a system which is described by a pars evolves. An option is to follow the stochastic evolution of a single run, a sampling at a time, as we have done in Fig. 1, 2, and 3. This is the approach in [BG06], where non-determinism is resolved by the use of policies. Here we follow a different (though equivalent) way. We describe the possible states of the system, at a certain time t, globally, essentially as a distribution on the space of all elements. The evolution of the system is then a sequence of such states. Since all the probabilistic choices are taken together, a global step happens with probability 1; the only source of non-determinism in the evolution of the system is the non-deterministic choice of rules. This global approach allows us to deal with non-determinism by using techniques which have been developed in Rewrite Theory. Before introducing the formal definitions, we informally examine some examples, and point out why some care is needed.
Example 2.8 (Fig. 5). Fig. 5 illustrates the possible evolutions of a system with rules r_0 : a → {a^{1/2}, T^{1/2}} and r_2 : a → {a^1}.
If we look at Fig. 3, we observe that after two steps, there are two distinct occurrences of the element 2, which live in two different runs of the program: the run 2.1.2, and the run 2.3.2. There are two possible transitions from each 2. The next transition only depends on the fact of having 2, not on the run in which 2 occurs: its history is only a way to distinguish the occurrence. For this reason, given a pars (A, →), we keep track of different occurrences of an element a ∈ A, but not necessarily of their history. The next section formalizes these ideas.
Markov Decision Processes. To understand our distinction between occurrences of a ∈ A in different paths, it is helpful to think of how a system is described in the framework of Markov Decision Processes (MDP) [Put94]. Indeed, in the same way as ARS correspond to transition systems, pars correspond to probabilistic transition systems. Let us regard a pars step r : a → β as a probabilistic transition (r is here a name for the rule). Let us assume a_0 ∈ A is an initial state. In the setting of MDP, a typical element (called a sample path) of the sample space Ω is a sequence ω = (a_0, r_0, a_1, r_1, ...), where r_0 : a_0 → β_1 is a rule, a_1 ∈ Supp(β_1) an element, r_1 : a_1 → β_2, and so on. The index t = 0, 1, 2, ..., n, ... is interpreted as time. On Ω various random variables are defined; for example, X_t = a_t, which represents the state at time t. The sequence ⟨X_t⟩ is called a stochastic process.

A Formalism for Probabilistic Rewriting
This section presents a formalism to describe the global evolution of a system described by a pars; it is a variant of the formalism used in [ALY20]. The equivalence with the approach in [BG06] is demonstrated in [ALY20].

PARS.
Let A be a countable set on which a pars A = (A, →) is given. We define a rewrite system (mA, ⇒), where mA is the set of objects to be rewritten, and ⇒ is a relation on mA. We indicate the resulting rewriting system as PARS.
The objects to be rewritten. mA is the set of all multidistributions on A, which are defined as follows. Let m be a multiset of pairs of the form pa, where p ∈ ]0, 1] is a real number, and a ∈ A an element of A; the multiset m = [p_i a_i | i ∈ I] is a multidistribution on A if ‖m‖ = Σ_{i∈I} p_i ≤ 1. We write the multidistribution [1a] simply as [a].
Sum and product are partial operations, similarly to what happens for distributions. The sum of multidistributions is denoted by +, and it is the disjoint union of multisets (think of list concatenation); it is defined only when the resulting total weight does not exceed 1. The product q · m of a scalar q ∈ ]0, 1] and a multidistribution m = [p_i a_i | i ∈ I] is defined pointwise: q · m = [(q p_i) a_i | i ∈ I]. Intuitively, a multidistribution m ∈ mA is a syntactical representation of a discrete probability space where each point in the space (each outcome) is associated to a probability and an element of A. More precisely, each pair in m corresponds to a trace of computation, or, in the language of Markov Decision Processes, to a sample path.
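These operations can be sketched by representing a multidistribution as a list of pairs, which keeps distinct occurrences of the same element distinct; `norm`, `msum` and `scale` are hypothetical helper names, not notation from the paper.

```python
from fractions import Fraction as F

# Multidistributions as lists of (p, a) pairs, so that multiple
# occurrences of the same element a remain distinct (a multiset).
def norm(m):
    """The weight ||m||: the sum of the p_i."""
    return sum(p for p, _ in m)

def msum(m, r):
    """Disjoint union (list concatenation); partial: weights must fit in 1."""
    assert norm(m) + norm(r) <= 1, "sum is a partial operation"
    return m + r

def scale(q, m):
    """Pointwise product q . m of a scalar q in ]0,1] and a multidistribution."""
    assert 0 < q <= 1
    return [(q * p, a) for p, a in m]

m = scale(F(1, 2), [(F(1, 2), "a"), (F(1, 2), "b")])
r = scale(F(1, 2), [(1, "a")])  # a second, distinct occurrence of "a"
print(msum(m, r))               # weights: 1/4 on a, 1/4 on b, 1/2 on a
```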
The rewriting relation.The binary relation ⇒ on mA is obtained by lifting the relation → of the pars A = (A, →), as follows.
Definition 3.1 (Lifting). Given a relation → ⊆ A × Dst(A), its lifting to a relation ⇒ ⊆ mA × mA is defined by the following rules:
(L1) [a] ⇒ [a], if a is in normal form;
(L2) [a] ⇒ [β(b) b | b ∈ Supp(β)], if a → β;
(L3) [p_i a_i | i ∈ I] ⇒ Σ_{i∈I} p_i · m_i, if [a_i] ⇒ m_i for each i ∈ I.
For the lifting, several natural choices are possible. Here we force all non-terminal elements to be reduced. This choice plays an important role in the development of the paper, as it corresponds to the key notion of one-step reduction in classical ARS (see the discussion in Section 10). Let us discuss the lifting rules in some more detail.
• Rule L1. Note that the relation ⇒ is reflexive on normal forms.
• Rule L3. To apply rule L3, we have to choose a reduction step from a_i for each i ∈ I. The (disjoint) sum of all the m_i (i ∈ I) is weighted with the scalar p_i associated to each p_i a_i.
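A minimal sketch of the lifting, for the pars of Example 2.5 (the gambler who may also stop), with non-determinism resolved by an explicit `policy` function choosing a rule for each occurrence (`rules`, `lift_step` and `policy` are hypothetical names; in [BG06] this role is played by policies):

```python
from fractions import Fraction as F

# rules(a) returns the list of distributions beta with a -> beta;
# normal forms return the empty list.
def rules(a):
    if a in (0, "stop"):
        return []                                   # normal forms
    return [{a - 1: F(1, 2), a + 1: F(1, 2)},       # play on
            {"stop": F(1)}]                         # stop now

# One global step m => m'. `policy` chooses, for each occurrence p_i a_i
# in m, which rule to fire (rule L3); normal forms are kept unchanged
# (rule L1), and a -> beta is lifted and weighted by p_i (rule L2).
def lift_step(m, policy):
    out = []
    for i, (p, a) in enumerate(m):
        choices = rules(a)
        if not choices:
            out.append((p, a))                      # L1
        else:
            beta = choices[policy(i, a)]            # non-deterministic choice
            out.extend((p * q, b) for b, q in beta.items())  # L2, weighted
    return out

m = [(F(1), 2)]
m = lift_step(m, lambda i, a: 0)                    # first step: play
m = lift_step(m, lambda i, a: 0 if a == 1 else 1)   # then: 1 plays, 3 stops
print(m)  # occurrences: 1/4 on 0, 1/4 on 2, 1/2 on stop
```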
Example 3.2. Let us derive the reduction in Fig. 3. For readability, elements of N are in bold.
Figure conventions: we depict any rewrite relation simply as →; as is standard, we use ↠ for →*; solid arrows are universally quantified, dashed arrows are existentially quantified.

3.2. Normal forms and observations. Intuitively, a multidistribution m ∈ mA is a syntactical representation of a discrete probability space where to each element of the space are associated a probability and an element of A. This space may contain various information. We analyze this space by defining random variables that observe specific properties of interest.
Here we focus on a specific event of interest: the set NF A of normal forms of A.
Distribution over the elements of A. First of all, to each multidistribution m = [p_i a_i | i ∈ I] we can associate a (sub)distribution m^dst ∈ Dst(A), defined as m^dst(c) = Σ{p_i | i ∈ I, a_i = c}. Informally, for each c ∈ A, we sum the probabilities of all occurrences of c in the multidistribution (observe that, m being a multiset, there may in general be several pairs p_i a_i with a_i = c).
Distribution over the normal forms of A. Given m ∈ mA, the probability that the system is in normal form is described by m^dst(NF_A) (recall Example 2.1); the probability that the system is in a specific normal form u is described by m^dst(u).
It is convenient to spell out a direct definition of both, to which we will refer in the rest of the paper. Given m = [p_i a_i | i ∈ I], let m^NF be the function defined on NF_A by m^NF(u) = Σ{p_i | i ∈ I, a_i = u}; hence ‖m^NF‖ = m^dst(NF_A). Informally, this function extracts from m the subdistribution m^NF over normal forms. The probability of reaching a normal form u can only increase along a rewrite sequence (because of rule (L1) in Def. 3.1). Therefore the following key lemma holds.
Lemma 3.4. If m ⇒ r, then m^NF(u) ≤ r^NF(u) for each u ∈ NF_A.
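The two observations can be sketched as follows, again representing multidistributions as lists of pairs and taking the normal forms of Example 2.5 (`dst` and `nf` are hypothetical helper names for m^dst and m^NF):

```python
from fractions import Fraction as F

# Extracting, from a multidistribution m = [p_i a_i], the subdistribution
# over elements (m^dst) and the subdistribution over normal forms (m^NF).
# Here the normal forms are 0 and "stop", as in Example 2.5.
NF = {0, "stop"}

def dst(m):
    out = {}
    for p, a in m:
        out[a] = out.get(a, F(0)) + p  # sum over all occurrences of a
    return out

def nf(m):
    return {a: p for a, p in dst(m).items() if a in NF}

m = [(F(1, 4), 0), (F(1, 4), 2), (F(1, 4), "stop"), (F(1, 4), "stop")]
print(dst(m))               # 1/4 on 0, 1/4 on 2, 1/2 on stop
print(nf(m))                # 1/4 on 0, 1/2 on stop
print(sum(nf(m).values()))  # probability of being in normal form: 3/4
```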
Equivalences and Order. In this paper m ∈ mA is a multiset, for simplicity and for uniformity with [FR19], but we could have used lists rather than multisets, as we do in [Fag19]. We do not really care about equality of elements in mA; what we are interested in are instead equivalence and order relations w.r.t. the observation of specific events. For example, the following (recall from Section 2.3 that the order on Dst(A) is the pointwise order): Let m, r ∈ mA.

Asymptotic Behaviour of PARS
We examine the asymptotic behaviour of rewrite sequences with respect to normal forms, which are the most common notion of result.
The intuition is that a rewrite sequence describes a computation; an element m_i such that m ⇒^i m_i represents a state (precisely, the state at time i) in the evolution of the system with initial state m. The result of the computation is a distribution over the possible normal forms of the probabilistic program. We are interested in the result when the number of steps tends to infinity, that is, at the limit. This is formalized by the (rather standard) notion of limit distribution (Def. 4.3). What is new here is that, since each element m has different possible rewrite sequences (each sequence leading to a possibly different limit distribution), to m is naturally associated a set of limit distributions.
A fundamental property of a system such as the λ-calculus is that if an element has a normal form, it is unique. This is crucial to the computational interpretation of the calculus, because it means that the result of the computation is well defined. A question we need to address in the setting of PARS is what it means to have a well-defined result. With this in mind, we investigate an analogue of the ARS notions of normalization, termination, and unique normal form.
4.1. Limit Distributions. Before introducing limit distributions, we revisit some facts about sequences of bounded functions.
Monotone Convergence. We recall the following standard result.
Theorem 4.1 (Monotone Convergence for Sums). Let X be a countable set and f_n : X → [0, ∞] a non-decreasing sequence of functions such that f(x) := lim_{n→∞} f_n(x) = sup_n f_n(x) exists for each x ∈ X. Then lim_{n→∞} Σ_{x∈X} f_n(x) = Σ_{x∈X} f(x).

Recall that subdistributions over a countable set X are equipped with the pointwise order: α ≤ α' if α(x) ≤ α'(x) for each x ∈ X. Let ⟨α_n⟩_{n∈N} be a non-decreasing sequence of (sub)distributions over X. For each t ∈ X, the sequence ⟨α_n(t)⟩_{n∈N} of real numbers is non-decreasing and bounded, therefore it has a limit, which is its supremum: lim_{n→∞} α_n(t) = sup_n {α_n(t)}. Observe that if α ≤ α' then ‖α‖ ≤ ‖α'‖, where we recall that ‖α‖ := Σ_{x∈X} α(x).

Lemma 4.2. Given ⟨α_n⟩_{n∈N} as above, define α(t) := sup_n α_n(t). The following properties hold: (1) ‖α‖ = lim_{n→∞} ‖α_n‖; (2) sup_n ‖α_n‖ exists, and ‖α‖ = sup_n ‖α_n‖.

Proof. (1) follows from the fact that ⟨α_n⟩_{n∈N} is a non-decreasing sequence of functions, hence (by Monotone Convergence, Thm. 4.1) we have ‖α‖ = Σ_{x∈X} sup_n α_n(x) = lim_{n→∞} Σ_{x∈X} α_n(x) = lim_{n→∞} ‖α_n‖. (2) is immediate, because the sequence ⟨‖α_n‖⟩_{n∈N} is non-decreasing and bounded.
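The exchange of limit and sum in Theorem 4.1 can be checked numerically. The following sketch (a hypothetical example, not from the paper) uses the non-decreasing subdistributions α_n(k) = (1 − 2^−n) · 2^−(k+1) over X = {0, 1, 2, …}, truncated to a finite support for computation:

```python
def alpha(n, k):
    """n-th subdistribution evaluated at point k; non-decreasing in n."""
    return (1 - 2.0**-n) * 2.0**-(k + 1)

K = 60  # truncation of the countable support X = {0, 1, 2, ...}

# pointwise limit: alpha(k) = sup_n alpha_n(k) = 2^-(k+1)
limit_pointwise = [2.0**-(k + 1) for k in range(K)]

# the norms ||alpha_n|| form a non-decreasing sequence ...
norms = [sum(alpha(n, k) for k in range(K)) for n in range(1, 40)]
assert all(norms[i] <= norms[i + 1] + 1e-15 for i in range(len(norms) - 1))

# ... whose limit equals the norm of the pointwise limit, as Thm. 4.1 states
assert abs(norms[-1] - sum(limit_pointwise)) < 1e-9
```

Here both quantities converge to 1, illustrating Lemma 4.2: ‖α‖ = sup_n ‖α_n‖.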
Limit distributions. Let A = (mA, ⇒) be the rewrite system induced by a PARS (A, →).
Let ⟨m_n⟩_{n∈N} be a rewrite sequence. If t ∈ NF_A, then ⟨m^NF_n(t)⟩_{n∈N} is non-decreasing (by Lemma 3.4); so we can apply Lemma 4.2, with ⟨α_n⟩_{n∈N} now being ⟨m^NF_n⟩_{n∈N}.

Definition 4.3 (Limits). Let ⟨m_n⟩_{n∈N} be a rewrite sequence from m ∈ mA. We say:

Note that in the definition above, p (item 1) is a scalar, while β (item 2) is a subdistribution over normal forms. The former is a quantitative version of a boolean (yes/no) property: reaching a normal form. The latter is a quantitative (more precisely, probabilistic) version of "which normal form is reached." Clearly, ‖β‖ = p (by Lemma 4.2). A computationally natural question is whether the result of computing an element m is well defined. We analyze it in Section 5, putting this question in a more general, but also simpler, context. In fact, most properties of the asymptotic behaviour of PARSs are not specific to probability, and are best understood when focusing only on the essentials, abstracting from the details of the formalism. Before doing so, we build an intuition by informally investigating the notions of normalization, termination, and unique normal form in our concrete setting.

4.2. PARS vs ARS: Subtleties, Questions, and Issues.

4.2.1. On Normalization and Termination. In the setting of ARSs, a rewrite sequence from an element c may or may not reach a normal form. The notion of reaching a normal form comes in two flavours (see Section 2.1): (1) there exists a rewrite sequence from c which leads to a normal form (normalization, WN); (2) each rewrite sequence from c leads to a normal form (termination, SN). If no rewrite sequence leads to a normal form, then c diverges.
It is interesting to analyze a similar ∃/∀ distinction in a quantitative setting. We distinguish two cases.
Convergence with probability 1. If we restrict the notion of convergence to probability 1, then it is natural to say that an element m weakly normalizes if it has a rewrite sequence which converges with probability 1, and strongly normalizes (or is AST) if all rewrite sequences converge with probability 1.
The general case. Many natural examples, in particular when we consider the untyped probabilistic λ-calculus, are not limited to convergence with probability 1, as Example 1.3 shows. In the general case, extra subtleties emerge, due to the fact that each rewrite sequence converges with some probability p ∈ [0, 1] (possibly 0).
A first important observation is that the set {q | m ⇒^∞ q} has a supremum (say p), but not necessarily a greatest element. Think of [0, p[, which has a sup but no greatest element. If Lim(m) has no greatest element, it means that no rewrite sequence converges to the supremum p.
A second remark is that we naturally speak of termination/normalization with probability 0. Not only does it appear awkward to single out the case 0 (as distinct from, say, 0.00001), but divergence also, dually, should be quantitative.
We say that m (weakly) normalizes (with probability p) if {q | m ⇒^∞ q} has a greatest element p. This means that there exists a rewrite sequence whose limit is p. Dually, we can say that m strongly normalizes (or terminates) (with probability p) if all rewrite sequences converge with the same probability p ∈ [0, 1].
Since in this case all rewrite sequences from the same element have the same behaviour, a better term seems to be that m uniformly normalizes. And indeed, "all rewrite sequences from the same element converge with the same probability" is the analogue of the ARS notion of uniform normalization: the property that all rewrite sequences from an element either all terminate, or all diverge (otherwise stated: weak normalization implies strong normalization). Summing up, we use the following terminology.

Definition 4.4 (Normalization and Termination). A PARS is WN^∞, SN^∞, or AST, if each m satisfies the corresponding property, where
• m is WN^∞ (m normalizes) if there exists a sequence from m which converges with greatest probability (say p). To specify, we say that m is p-WN^∞.
• m is SN^∞ (m strongly, or uniformly, normalizes) if all sequences from m converge with the same probability (say p). To specify, we say that m is p-SN^∞.
• m is Almost Sure Terminating (AST) if it strongly normalizes with probability 1 (i.e., it is 1-SN^∞).
Example 4.5. The system in Fig. 5 is 1-WN^∞, but not 1-SN^∞. The top rewrite sequence (in blue) converges with probability 1 = lim_{n→∞} Σ_{k=1}^{n} 1/2^k. The bottom rewrite sequence (in red) converges with probability 0. In between, we have all dyadic possibilities. In contrast, the system in Fig. 4 is AST. Normalization and termination are quantitative yes/no properties: we are only interested in the number ‖β‖, for β a limit distribution; for example, if m ⇒^∞ α and m ⇒^∞ β with ‖α‖ = ‖β‖ = 1 but α ≠ β, then m converges with probability 1, yet we make no distinction between the two (very different) results. Similarly, consider again Fig. 4. The system is AST; however, the limit distributions are not unique: they span an infinity of distributions of shape {T^p, F^{1−p}}. These observations motivate attention to finer-grained properties.
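Since Fig. 5 is not reproduced in this excerpt, the following hypothetical PARS (an assumption chosen to match the behaviour Example 4.5 describes) may help: a state `a` has a probabilistic redex reaching the normal form `T` with probability 1/2, and a non-probabilistic detour producing nothing; a strategy decides at each step which redex to fire.

```python
# Hypothetical PARS (illustration only):
#   flip : a -> {1/2 T, 1/2 a}     (T is a normal form)
#   skip : a -> {1 b}, b -> {1 a}  (a detour that never reaches T)
# A strategy is a predicate telling, at each lockstep reduction, whether to flip.

def limit_probability(flip_at_step, steps=200):
    """Probability mass on normal forms accumulated after `steps` reductions."""
    p_nf, p_live = 0.0, 1.0  # mass on T, mass still on live (non-normal) states
    for n in range(steps):
        if flip_at_step(n):
            p_nf += p_live / 2
            p_live /= 2
        # else: the live mass just cycles a -> b -> a, producing nothing
    return p_nf

always = limit_probability(lambda n: True)   # top sequence: converges to 1
never = limit_probability(lambda n: False)   # bottom sequence: converges to 0
once = limit_probability(lambda n: n == 0)   # a dyadic possibility in between

assert abs(always - 1.0) < 1e-12
assert never == 0.0
assert once == 0.5
```

The system is 1-WN^∞ (the "always flip" strategy attains the sup 1) but not 1-SN^∞, since other strategies converge with any dyadic probability, including 0.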
In the usual theory of rewriting, the fact that the result is well defined is expressed by the unique normal form property (UN). Let us examine an analogue of UN in a probabilistic setting. An intuitive candidate is the following property, first proposed in [DM18], which shows that, in the case of AST, confluence implies ULD. However, ULD is not a good analogue in general, because a PARS need not be AST (or SN^∞); it may well be that m ⇒^∞ α and m ⇒^∞ β, with ‖α‖ ≠ ‖β‖. We have seen rewrite systems which are not AST in Fig. 5 and in Example 1.3. Similar examples are natural in an untyped probabilistic λ-calculus (recall that the λ-calculus is not SN!).
We therefore prefer not to limit the analysis to AST. In that case, ULD is not implied by confluence: the system in Fig. 5 is indeed confluent, but not ULD. Still, we would like to say that it satisfies a form of UN.
We propose as probabilistic analogue of UN the following property:

UN^∞: Lim(m) has a greatest element.

We justify it in Section 5, where we show that PARSs satisfy an analogue of the standard ARS results "Confluence implies UN" (Thm. 5.18) and "the Normal Form Property implies UN" (Prop. 5.11). There are however two important observations to make.
Important observation! While the statements are similar to the classical ones, their content is not. To understand the difference, and what is non-trivial here, observe that in general there is no reason to believe that Lim(m) has maximal elements. Think again of the set [0, 1[, which has no max even though it has a sup. Observe also that Lim(m) is, in general, uncountable.
In Section 5.2 we will see that proving the existence of maximal limits is indeed not immediate. For this reason, while in the case of finitary termination uniqueness of normal forms follows immediately from confluence, it is not so when termination is asymptotic: confluence does not directly guarantee UN^∞, and more work is needed.
Which notion of confluence? Property UN^∞ is guaranteed by a form of confluence weaker than one would expect. Assume s *⇐ m ⇒* r; with the standard notion of confluence in mind, we may require that ∃u such that s ⇒* u and r ⇒* u, or that ∃u, u' such that s ⇒* u, r ⇒* u' and u =_flat u'. Both are fine, but in Section 5.2 we show that a weaker notion of equivalence (which was already discovered in [AB02]) suffices: we only need to compare multidistributions w.r.t. their information content on normal forms.

Remark 4.6. In the case of AST (and SN^∞), all limits are maximal, hence UN^∞ becomes ULD.

4.2.3. Newman's Lemma Failure, and Proof Techniques for PARS. The statement of Thm. 5.18, "Confluence implies UN^∞", has the same flavour as the analogous one for ARSs, but the notions are not the same. The notion of limit (and therefore those of UN^∞, SN^∞, and WN^∞) does not belong to the theory of ARSs. For this reason, the rewrite system (mA, ⇒) which we are studying is not simply an ARS. One should not assume that standard properties of ARSs transfer to their asymptotic analogues. An illustration of this is Newman's Lemma. Given a PARS, let us assume AST, and observe that in this case confluence at the limit can be identified with UN^∞. A wrong attempt: AST + WCR^∞ ⇒ UN^∞, where WCR^∞: if m ⇒ s_1 and m ⇒ s_2, then ∃r with s_1 ⇒^∞ r and s_2 ⇒^∞ r. This does not hold. A counterexample is the PARS in Fig. 4, which does satisfy WCR^∞.

Remark 4.7. Could a different formulation uncover properties similar to Newman's Lemma? Another "candidate" statement we can attempt is: AST + WCR ⇒ UN^∞. Unfortunately, here we did not find an answer. However, this property is an interesting case study. It is not hard to show that it holds when Lim(m) is finite, or uniformly discrete, meaning that, given a definition of distance, there exists a minimal distance between any two elements of Lim(m). This fact also implies that a counterexample (if any) cannot be trivial. On the other hand, if the property holds, the difficulty is which proof technique to use, since well-founded induction is not available to us.
What is at play here is that the notion of termination is not the same for ARSs and for PARSs. A fundamental fact of ARSs (on which all proofs of Newman's Lemma rely) is that termination implies that the rewriting relation is well founded. All terminating ARSs thus allow well-founded induction as a proof technique; this is not the case for probabilistic termination. To transfer properties from ARSs to PARSs there are two issues: we need to find the right formulation and the right proof technique.
Notice that our counterexample above still leaves open the question "Can a different formulation uncover properties similar to Newman's Lemma?" Or, better: "Are there local properties which guarantee UN^∞?"

Quantitative Abstract Rewriting Systems
We observed that the notion of result as a limit does not belong to ARSs. However, in many arguments we do not need all the structure coming from a PARS. To be able to study asymptotic rewriting, in this section we define Quantitative Abstract Rewriting Systems (QARS). As already noted, QARSs are a natural refinement of the ARSI in [AB02]: we simply move from partial orders to ω-cpos. The main contribution of this section is to provide a set of proof techniques, first to study properties of the limits, and then to compare reduction strategies. Working abstractly allows us to study the asymptotic properties, capturing the essence of the arguments.
QARS. We can see computation as a process that produces a result by gradually increasing the amount of available information. A reduction sequence thus gradually computes a result by converging (in a finite or infinite number of steps) to the maximal amount of information it can produce. The standard structure to express a result in terms of partial information is that of an ω-cpo.
Recall that a partially ordered set S = (S, ≤) is an ω-complete partial order (ω-cpo) if every ω-chain b_0 ≤ b_1 ≤ ... has a supremum in S. We assume the partial order to have a least element ⊥. We denote the elements of S with bold letters a, b, c, ... Let (C, →) be an ARS. To each element t ∈ C is associated a notion of (partial) information, modeled by a function from C to an ω-cpo. Def. 5.1 formalizes this intuition.

Proof. Let r_0, s ∈ C, a ∈ Lim_obs(r_0), r_0 →* s. Let ⟨r_n⟩_{n∈N} be a sequence with limit a. As illustrated in Fig. 6, starting from s, we build a sequence s = s_{r_0} →* s_{r_1} →* s_{r_2} ..., where s_{r_i}, i ≥ 1, is given by Skew Confluence: from r_0 →* r_i and r_0 →* s_{r_{i−1}} we obtain s_{r_{i−1}} →* s_{r_i} with obs(r_i) ≤ obs(s_{r_i}). Let b be the limit of the sequence so obtained; observe that b ∈ Lim_obs(r_0). By construction, ∀i, obs(r_i) ≤ obs(s_{r_i}) ≤ b. From a = sup obs(r_n) it follows that a ≤ b.
LIM implies that if a maximal limit exists, it is the greatest limit.
Proposition 5.11 (Greatest limit). Given a QARS ((C, →), obs) and m ∈ C, LIM implies that if Lim_obs(m) has a maximal element, then it is the greatest element.
Proof. Let a ∈ Lim_obs(m) be maximal. For each c ∈ Lim_obs(m), there is a sequence ⟨m_n⟩_{n∈N} from m such that c = sup_n obs(m_n). LIM implies that, for each n, m_n →^∞_obs b_n for some b_n ≥ a. By maximality of a, b_n = a, and therefore obs(m_n) ≤ a. From c = sup_n obs(m_n) we conclude that c ≤ a, that is, a is the greatest element of Lim_obs(m).
Given a confluent QARS, to guarantee that UN^∞ holds, and therefore that [[m]] is defined for each m ∈ C, it suffices to establish that Lim_obs(m) has a maximal element.
In Section 5.4 we prove that, in the case of PARS, confluence implies the existence of a maximal element, and therefore of a greatest element. We now lift the result to P. Precisely, we prove that for P property LIM (Def. 5.9) implies the existence of a maximal element α of Lim(m); then (by Prop. 5.11) α is the greatest element of Lim(m). We rely on the following properties, which we already established in Section 4.1.
Lemma 5.16. If P satisfies LIM, then P also does. Similarly for all variants of confluence in Def. 5.7.
Proof. The claim follows from Fact 5.8, Lemma 5.10, and Propositions 5.11 and 5.17.

Tools for the analysis of QARS
We closed Section 4.2.3 with the question: "Are there local properties which guarantee UN^∞?" This section develops criteria of this kind.

If the result [[m]] of computing m is well defined, the next natural question is how to compute it: does there exist a strategy →_♣ ⊆ → whose limit is guaranteed to be [[m]]? More generally: does there exist a strategy →_♣ ⊆ → whose limit is guaranteed to be a maximal element of Lim_obs(m), if one exists?
We introduce some tools to help in this analysis. Our focus is on properties which can be expressed by local conditions.

6.1. Weighted Random Descent. We present a method to establish, with a local test, that for each element m of a QARS, Lim_obs(m) contains a unique element, by generalizing the ARS property of Random Descent. Random Descent is not only an elegant technique in rewriting, developed in [vO07, vOT16], but it adapts well and naturally to the asymptotic setting.
Random Descent. A reduction → has random descent (RD) [New42] if, whenever an element t has a normal form, all rewrite sequences from t lead to it, and all have the same length. The best-known property which implies RD, as first observed by Newman [New42], is the RD-diamond: if s_1 ← t → s_2 then either s_1 = s_2, or s_1 → u ← s_2 for some u. This is only a sufficient condition. Quite surprisingly, Random Descent can be characterized by a local (one-step) property [vO07].

Weighted Random Descent. We generalize Random Descent to observations. The property obs-RD states that even though an element m may have different reduction sequences, they are all indistinguishable when regarded through the lens of obs. That is, if we consider all reduction sequences ⟨m_n⟩_{n∈N} starting from the same m, they all induce the same ω-chain ⟨obs(m_n)⟩. Obviously, if all ω-chains from m are equal, they all have the same limit sup_n {obs(m_n)}.
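Random Descent can be checked by brute force on a small finite ARS. The sketch below (a hypothetical four-element ARS, not from the paper) satisfies the RD-diamond, and indeed all reduction sequences reach the same normal form in the same number of steps:

```python
# A tiny ARS satisfying the RD-diamond: s1 <- t -> s2 with s1 -> u <- s2.
steps = {  # element -> set of one-step reducts
    "t": {"s1", "s2"},
    "s1": {"u"},
    "s2": {"u"},
    "u": set(),  # normal form (no reducts)
}

def maximal_sequences(c):
    """All reduction sequences from c that end in a normal form."""
    if not steps[c]:
        return [[c]]
    return [[c] + seq for s in sorted(steps[c]) for seq in maximal_sequences(s)]

seqs = maximal_sequences("t")
assert {seq[-1] for seq in seqs} == {"u"}      # unique normal form
assert {len(seq) - 1 for seq in seqs} == {2}   # same length: Random Descent
```

Replacing `u` by two distinct normal forms breaks both assertions, showing how RD fails once the diamond is lost.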
The main technical result of the section is a local characterization of the property (Thm. 6.5), similarly to [vO07].

Definition 6.1 (Weighted Random Descent). The QARS ((C, →), obs) satisfies the following properties (illustrated in Fig. 7) if they hold for each m ∈ C.

sequences using the leftmost strategy fails for both CbV and CbN. The tools in Section 7 allow for an elegant solution.
On the necessity of non-deterministic evaluation and Random Descent in probabilistic λ-calculi. A programming language built on a λ-calculus implements a specific evaluation strategy. Typically, evaluation is given by a strategy →_s of the general reduction →. In this paper, we studied a property of strategies which is more flexible than determinism: Random Descent. Why not simply fix a deterministic strategy? This choice has several motivations. Non-deterministic evaluation is a useful feature, which supports optimization techniques and parallel/distributed implementation, but in some cases it is also a necessity and a key reasoning tool; this appears clearly in the probabilistic case.
We illustrate this with two examples from the literature on probabilistic λ-calculus, [FR19] and [CP20]. Here we discuss the most familiar of all reductions: Call-by-Name λ-calculus with head reduction (similar arguments hold for weak reduction in the CbV λ-calculus). The usual definition of head reduction [Bar84] is deterministic, but it also has a non-deterministic variant (well studied in Linear Logic) whose normal forms are precisely the head normal forms. We write this reduction simply →_h. Exactly like weak reduction for Call-by-Value, →_h is well known to have Random Descent, and the same holds for its probabilistic incarnation (an explicit definition is in [FR19], Ch. X).
• In [FR19], moving from head reduction to its non-deterministic variant →_h allows one to obtain a standardization result, which was known [Alb14, Lev19] not to hold when adopting the usual, left-to-right head reduction (see [FR19], Ex. 45 for a counterexample).
• Similarly, Curzi and Pagani [CP20] move from usual head reduction to head spine reduction, which in turn is included in →_h. The fact that the evaluation order is not left-to-right, and yet there is no difference with respect to head normal forms, is crucial to obtain the result of that paper.

Conclusions
The motivation behind this work is the need for theoretical tools to support the study of operational properties in probabilistic computation, similar to the role that ARSs play for classical computation.
We have investigated several abstract properties of probabilistic rewriting, and how the behaviours of different rewrite sequences starting from the same element compare w.r.t. normal forms. To guarantee that the result of a computation is well defined, we have introduced and studied the property UN^∞, a robust probabilistic analogue of the notion of unique normal form. In particular, we have analyzed its relation with (various notions of) confluence. We have also investigated relations with normalization (WN^∞) and termination (SN^∞), and between these notions. We have developed the notions of obs-RD and obs-better as tools to analyze and compare PARS strategies. obs-RD is an alternative to strict determinism, analogous to Random Descent for ARSs (non-determinism is irrelevant w.r.t. a chosen event of interest). The notion of obs-better provides a sufficient criterion to establish that a strategy is normalizing (resp. perpetual), i.e. that the strategy is guaranteed to lead to a result with maximal (resp. minimal) probability.
We have illustrated our techniques by studying a probabilistic extension of the weak call-by-value λ-calculus; it has properties analogous to its classical counterpart: all rewrite sequences converge to the same result, in the same expected number of steps.
One-Step Reduction and Expectations. In this paper, we focus on normal forms and properties related to the event NF_A. However, we believe that the methods would allow us to compare strategies w.r.t. other properties and random variables of the system. The formalism seems especially well suited to express the expected value of random variables.
A key feature of the binary relation ⇒ is that it exactly captures the ARS notion of one-step reduction (in contrast to one or no step), with a twofold gain.
(1) Probability theory. Because all terms in the distribution are forced to reduce at the same pace, a rewrite sequence faithfully represents the evolution in time of the system (i.e. if m ⇒^i m_i, then m_i captures the state at time i of all possible paths a_0 → ... → a_i). This makes the formalism well suited to express the expected value of stochastic processes. (2) Rewrite theory. The results in Sections 6.1, 6.3, and 7.2 crucially rely on exactly-one-step reduction. The reason why this is crucial is similar to the classical fact that termination follows from normalization given the diamond property [New42], but not given the very similar property b ← a → c ⇒ ∃d (b →^= d ^=← c) (see [Ter03], 1.3.18).
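Point (1) can be made concrete: since ⇒ reduces every term of the multidistribution in lockstep, the mass sitting on non-normal terms at time n is exactly the probability that the process has not terminated by step n, so the expected termination time is Σ_n P(not terminated by step n). A minimal sketch, for the hypothetical one-rule PARS a → {1/2 T, 1/2 a} (our running illustration, not the paper's):

```python
def expected_steps(p_stop, horizon=200):
    """Expected termination time of a process that, at each lockstep
    reduction, moves mass p_stop of the live states to a normal form.
    Uses E[T] = sum_n P(T > n), reading P(T > n) off the live mass."""
    live, expectation = 1.0, 0.0
    for _ in range(horizon):
        expectation += live   # live mass = P(not yet terminated at this step)
        live *= 1 - p_stop
    return expectation

# geometric process with success probability p has expected time 1/p
assert abs(expected_steps(0.5) - 2.0) < 1e-9
assert abs(expected_steps(0.25) - 4.0) < 1e-9
```

The point is that the live mass of m_n directly encodes the tail probability P(T > n), which is what makes ⇒ convenient for expectations.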
Finite Approximants. obs-RD characterizes the case when (not only at the limit, but also at the level of the approximants) the non-deterministic choices are irrelevant. The notion of approximant we have studied here is "stop after a number k of steps" (k ∈ N). We can consider different notions of approximant. For example, we could also wish to stop the evolution of the system when it reaches a normal form with probability p. Our method can easily be adapted to analyze this case. We believe it is also possible to extend to the probabilistic setting the results in [vOT16], which would go further in this direction.
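The two notions of approximant can be contrasted on the same hypothetical PARS a → {1/2 T, 1/2 T is replaced by a} used above (an assumption for illustration): "stop after k steps" returns a probability, while "stop at NF-probability ≥ p" returns a number of steps.

```python
def after_k(k):
    """NF-mass of the approximant 'stop after k lockstep steps'
    for the hypothetical rule a -> {1/2 T, 1/2 a}."""
    return 1 - 0.5**k

def at_mass(p):
    """Steps taken by the approximant 'stop once NF-probability >= p'."""
    k, mass = 0, 0.0
    while mass < p:
        k += 1
        mass = 1 - 0.5**k
    return k

assert after_k(3) == 0.875
assert at_mass(0.9) == 4           # 1 - 2**-4 = 0.9375 >= 0.9
assert at_mass(after_k(5)) == 5    # here the two notions are inverse
```

For this system the two notions determine each other; in general the p-approximant may never be reached by some sequences, which is why adapting the method requires care.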
Further and future work. In this paper, we have studied existence and uniqueness of the result of asymptotic computation. The next goal is to study how to compute such a result, i.e. to study reduction strategies; this is the object of current investigation. [vO07] makes a convincing case for the power of the RD methods for ARSs by using a large range of examples from the literature to elegantly and uniformly revisit normalization results of various λ-calculi. We cannot do the same here, because the rich development of strategies for the λ-calculus does not yet have an analogue in the probabilistic case. Nevertheless, we hope that the availability of tools to analyze PARS strategies will contribute to their development.
A paradigmatic example of a global property is confluence (CR): b *← a →* c ⇒ ∃d s.t. b →* d *← c. Its global nature makes it difficult to establish. A standard way to factorize the problem is: (1) prove termination, and (2) prove local confluence (WCR): b ← a → c ⇒ ∃d s.t. b →* d *← c. This is exactly Newman's lemma: Termination + WCR ⇒ CR. The beauty of Newman's lemma is that a global property (CR) is guaranteed by a local one (WCR).

2.1. Basics on ARS. An abstract rewrite system (ARS) is a pair C = (C, →) consisting of a set C and a binary relation → on C (called reduction) whose pairs are written t → s and called steps; →* (resp. →^=) denotes the transitive reflexive (resp. reflexive) closure of →. We write c ↛ if there is no u such that c → u; in this case, c is a normal form. NF_C denotes the set of the normal forms of C. If c →* u and u ∈ NF_C, we say c has a normal form u.

(1) Flat equivalence: m =_flat r if m^dst = r^dst. Similarly, m ≥_flat r if m^dst ≥ r^dst.
(2) Equivalence in normal form: m =_NF r if m^NF = r^NF. Similarly, m ≥_NF r if m^NF ≥ r^NF.
(3) Equivalence in the NF-norm: m = r if ‖m^NF‖ = ‖r^NF‖, and m ≥ r if ‖m^NF‖ ≥ ‖r^NF‖.
Note that (2) and (3) compare m and r abstracting from any element which is not in normal form.

Example 3.5. Assume T is a normal form and a ≠ c are not.
(1) Let m = [1/2 T, 1/2 T], r = [1 T]. Then m =_flat r, m =_NF r, and m = r all hold.
(2) Let m = [1/2 a, 1/2 T], r = [1/2 c, 1/6 T, 2/6 T]. Then m =_NF r and m = r both hold; m =_flat r does not.
The above example also illustrates the following.

(1) ⟨m_n⟩_{n∈N} converges with probability p = sup_n {‖m^NF_n‖};
(2) ⟨m_n⟩_{n∈N} converges to β ∈ Dst(NF_A), where β(t) = sup_n {m^NF_n(t)}.
We call β a limit distribution (on normal forms) of m, and p a limit probability (to reach a normal form) of m. We write m ⇒^∞ β (resp. m ⇒^∞ p) if m has a sequence which converges to β (resp. converges with probability p). We define Lim(m) := {β | m ⇒^∞ β}, the set of limit distributions, and similarly the set {p | m ⇒^∞ p} of limit probabilities.

4.2.2. On Unique Normal Forms and Confluence. We now focus on two natural questions. First: when is the notion of the result [[m]] well defined? Second: given a probabilistic program M, if [M] ⇒^∞ α and [M] ⇒^∞ β, how do α and β relate?

Figure 7. Random Descent