Approximate reasoning for real-time probabilistic processes

We develop a pseudo-metric analogue of bisimulation for generalized semi-Markov processes. The kernel of this pseudo-metric corresponds to bisimulation; thus we extend bisimulation for continuous-time probabilistic processes to a much broader class of distributions than exponential distributions. This pseudo-metric gives a useful handle on approximate reasoning in the presence of numerical information, such as probabilities and time, in the model. We give a fixed point characterization of the pseudo-metric, which makes coinductive principles available for reasoning about distances. We demonstrate that our approach is insensitive to potentially ad hoc articulations of distance by showing that it is intrinsic to an underlying uniformity. We provide a logical characterization of this uniformity using a real-valued modal logic. We show that several quantitative properties of interest are continuous with respect to the pseudo-metric. Thus, if two processes are metrically close, then observable quantitative properties of interest are indeed close.


Introduction
The starting point and conceptual basis for classical investigations in concurrency are the notions of equivalence and congruence of processes: when can two processes be considered the same, and when can they be substituted for each other? Most investigations into timed [AH92, AD94] and probabilistic concurrent processes are based on equivalences of one kind or another, e.g. [CSZ92, Han94, Hil94, LS91, SL95, PLS00], to name but a few.
As has been argued before [JS90, DGJP99, DGJP04], this style of reasoning is fragile in the sense of being too dependent on the exact numerical values of times and probabilities. This had previously been pointed out for probability, but the same remarks apply, mutatis mutandis, to real time as well. Consider the following two paradigmatic examples:
• Consider the probabilistic choice operator $A_1 +_p A_2$, which starts $A_1$ with probability $p$ and $A_2$ with probability $1 - p$. Consider $A_1 +_{p+\varepsilon} A_2$ and $A_1 +_{p+2\varepsilon} A_2$. In traditional exact reasoning, the best that one can do is to say that all three processes are inequivalent. Clearly, there is a gradation here: $A_1 +_{p+\varepsilon} A_2$ is intuitively closer to $A_1 +_p A_2$ than $A_1 +_{p+2\varepsilon} A_2$ is.
• Consider the operator $\mathrm{delay}_t.A$ that starts $A$ after a delay of $t$ time units. Consider $\mathrm{delay}_{t+\varepsilon}.A$ and $\mathrm{delay}_{t+2\varepsilon}.A$. Again, in exact reasoning, the best that one can do is to say that all three processes are inequivalent. Again, $\mathrm{delay}_{t+\varepsilon}.A$ is intuitively closer to $\mathrm{delay}_t.A$ than $\mathrm{delay}_{t+2\varepsilon}.A$ is.
In both examples, the intuitive reasoning behind relative distances is supported by calculated numerical values of quantitative observables: expectations in the probabilistic case and (cumulative) rewards in the timed case.
The fragility of exact equivalence is particularly unfortunate for two reasons. Firstly, the timings and probabilities appearing in models should be viewed as numbers with some error estimate. Secondly, probability distributions over uncountably many states arise even in superficially discrete paradigms such as generalized semi-Markov processes (see, e.g., [She87] for a textbook survey), and discrete approximations are used for algorithmic purposes [BHK99, HCH+02]. These approximants do not match the continuous-state model exactly and force us to think about approximate reasoning principles: for example, when does it suffice to prove a property about an approximant? Thus, we really want an "approximate" notion of equality of processes. In the probabilistic context, Jou and Smolka [JS90] propose that the correct formulation of the "nearness" notion is via a metric. Similar reasons motivate the study of Lincoln, Mitchell, Mitchell and Scedrov [LMMS98], our previous study of metrics for labelled Markov processes [DGJP99, DGJP04, DGJP02], the study of the fine structure of these metrics by van Breugel and Worrell [vBW01b, vBW01a], and the study of metrics for probabilistic games by de Alfaro, Henzinger and Majumdar [dAHM03].
In contrast to these papers, in the present paper we focus on real-time probabilistic systems that combine continuous time and probability. We consider generalized semi-Markov processes (GSMPs). Semi-Markov processes strictly generalize continuous-time Markov chains (CTMCs) by permitting general (i.e. non-exponential) probability distributions; GSMPs further generalize them by allowing competition between multiple events, each driven by a different clock.
Following the format of the usual definition of bisimulation as a maximum fixed point, we define a metric on configurations of a GSMP as a maximum fixed point. This permits us to use analogues of traditional coinductive methods to reason about metric distances. For example, in exact reasoning, to deduce that two states are equivalent, it suffices to produce a bisimulation that relates the states. In our setting, to show that the distance between two states is less than ε, it suffices to produce a (metric) bisimulation that sets the distance between the states to be less than ε.
Viewing metric distance 0 as bisimilarity, we get a definition of bisimulation for GSMPs, a class that properly includes CTMCs. In contrast to existing work on bisimulation for general probability distributions (e.g. [BG02, Her02]), our definition accounts explicitly for the change of probability densities over time.
Secondly, we demonstrate that our study does not rely on any "ad hoc" construction of metric distances. Uniform spaces capture the essential aspects of metric distances by axiomatizing the structure needed to capture relative distances, e.g. statements of the form "x is closer to y than to z." A metric determines a uniform space, but different metrics can yield the same uniform space. Uniform spaces represent more information than topological spaces but less than metric spaces, so we are identifying, as precisely as we can, the intrinsic meaning of the quantitative information. We present our maximal fixpoint construction as a construction on uniform spaces, showing that the numerical values of different metric representations are not used in an essential way. In particular, in our setting, it shows that the actual numerical values of the discount factors used in the definition of the metric do not play any essential role.
Thirdly, we provide a "logical" characterization of the uniformity using a real-valued modal logic. In analogy to traditional completeness results, we prove that the uniformity induced by the real-valued modal logic coincides with the uniformity induced by the metric defined earlier. Our logic is chosen specifically to obtain this completeness result; it is not intended to be used as an expressive specification formalism for describing properties of interest. Instead, our framework provides an intrinsic characterization of the quantitative observables that can be accommodated: functions that are continuous with respect to the metric.
Finally, we illustrate the use of such studies in reasoning by showing that several quantitative properties of interest are continuous with respect to the metric. Thus, if two processes are close in the metric, then observable quantitative properties of interest are indeed close. For expository purposes, the list considered in this paper includes expected hitting time and expected (cumulative and average) rewards. The tools used to establish these results are "continuous mapping theorems" from stochastic process theory, and they provide a general recipe for tackling other observables of interest.
The rest of this paper is organized as follows. We begin with a review of the model of GSMPs in Section 2. We then review the basic ideas from stochastic process theory that we need: metrics on probability measures on metric spaces in Section 3, and the Skorohod J2 metric on timed traces in Section 4. We discuss timed traces in the context of GSMPs in Section 5. We define metric bisimulation in Section 6. We present our construction in terms of uniform spaces in Section 7. We show that interesting quantitative observables are continuous functions in Section 8. Finally, we show the completeness of the real-valued modal logic in Section 9.

Generalized semi-Markov processes
GSMPs properly include finite-state CTMCs while also permitting general probability distributions. We describe GSMPs informally here, following the formal description of [Whi80]. The key point is that in each state there are possibly several events that can be executed. Each event has its own clock, running down at its own rate, and when the first clock reaches zero the corresponding event is selected for execution. Then a probabilistic transition determines the next state, and any new clocks are set according to given probability distributions, defined by conditional density functions. The probability distribution over the next states depends only on the current state and the event that has occurred: this is the "Markov" in semi-Markov. The clocks are reset according to an arbitrary distribution, not necessarily an exponential (memoryless) one: hence the "semi". We consider finite-state systems throughout this paper.
A finite-state GSMP over a set of atomic propositions AP has the following ingredients:
(1) A finite set S of states. Each state s has an associated finite set of events I(s), each with its own clock (we use the same letter for an event and its clock), and a non-zero rate for each clock in I(s). A clock in I(s) runs down at the constant rate associated with it.
(2) A labelling function $\mathrm{Props} : S \to 2^{AP}$ that assigns truth values to atomic propositions in each state.
(3) A continuous probability density function $f(t; s, i; s', i')$ over time, for each $i \in I(s)$, each target state $s'$ and each $i' \in I(s')$. This is used to define how clocks are reset during a transition.
(4) For each $i \in I(s)$, a probabilistic transition function $\mathrm{next}_i : S \times S \to [0, 1]$.
We use $c, c'$ (resp. $r$) for vectors of clock values (resp. rates). We use the vector operation $c - r\,t$ to indicate the clock vector resulting from evolution of each clock under its rate for time $t$.
Definition 2.1. Let s be a state. A generalized state is of the form $\langle s, c\rangle$, where $c$ is a vector of clock values indexed by $i \in I(s)$ that satisfies a uniqueness condition: there is a unique clock in $I(s)$ that reaches 0 first. We write $T(\langle s, c\rangle)$ for the time required for the first clock (unique by the above definition) to reach 0; since each clock $i$ runs down at its constant rate $r_i$, $T(\langle s, c\rangle) = \min_{i \in I(s)} c_i / r_i$. We use $G$ for the set of generalized states, and $g_s, g'_s, g^1_s, \ldots$ for generalized states.
We describe the evolution starting in a generalized state $\langle s, c\rangle$. Each clock in $c$ decreases at its associated rate. By the uniqueness condition on generalized states, a unique clock reaches 0 first. Let this clock be $i \in I(s)$. The distribution on the next states is determined by the probabilistic transition function $\mathrm{next}_i : S \times S \to [0, 1]$. For each target state $s'$:
• The clocks $i' \in I(s) \setminus I(s')$ are discarded.
• The new clocks $i' \in I(s') \setminus [I(s) \setminus \{i\}]$ get new initial time values assigned according to the continuous probability density function $f(t; s, i; s', i')$.
• The remaining clocks in $(I(s) \setminus \{i\}) \cap I(s')$ carry forward their time values from $s$ to $s'$.
The continuity condition on probability distributions ensures that this informal description yields a legitimate Markov kernel [Whi80]. The semantics of a real-time probabilistic process can be described as a discrete-time Markov process on generalized states. With each generalized state we associate the set of sequences of generalized states that arise by following the evolution described above.
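The evolution just described can be sketched in Python as a simulation of one transition from a generalized state. The encoding is our own assumption: states, events and clocks are strings, $\mathrm{next}_i$ is a dictionary of probabilities, and the class `GSMP`, its fields and the functions `time_to_fire` and `step` are hypothetical names, not part of [Whi80]. The density $f(t; s, i; s', i')$ is abstracted as a sampler `new_clock`; the uniform sampler in the usage lines is only a placeholder.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class GSMP:
    """Hypothetical encoding of a finite-state GSMP (field names are illustrative)."""
    events: Dict[str, List[str]]                         # I(s) for each state s
    rates: Dict[Tuple[str, str], float]                  # rate of clock i in state s
    next_prob: Dict[Tuple[str, str], Dict[str, float]]   # next_i(s, .) as a dict
    new_clock: Callable[[str, str, str, str, random.Random], float]  # sampler for f(t; s, i; s', i')


def time_to_fire(g: GSMP, s: str, clocks: Dict[str, float]) -> Tuple[float, str]:
    """Return T(<s, c>) = min_i c_i / r_i and the unique clock attaining it."""
    remaining = {i: clocks[i] / g.rates[(s, i)] for i in g.events[s]}
    first = min(remaining, key=remaining.get)
    if sum(1 for t in remaining.values() if t == remaining[first]) != 1:
        raise ValueError("not a generalized state: several clocks reach 0 first")
    return remaining[first], first


def step(g: GSMP, s: str, clocks: Dict[str, float], rng: random.Random):
    """One transition of the discrete-time Markov process on generalized states."""
    t, i = time_to_fire(g, s, clocks)
    # Every clock runs down at its own rate for time t: the vector c - r t.
    evolved = {j: clocks[j] - g.rates[(s, j)] * t for j in g.events[s]}
    targets = g.next_prob[(s, i)]
    s2 = rng.choices(list(targets), weights=list(targets.values()))[0]
    new_clocks = {}
    for j in g.events[s2]:
        if j in g.events[s] and j != i:
            new_clocks[j] = evolved[j]                     # carried forward from s to s'
        else:
            new_clocks[j] = g.new_clock(s, i, s2, j, rng)  # freshly sampled new clock
    # Clocks in I(s) \ I(s') are never copied, i.e. they are discarded.
    return t, s2, new_clocks


# Usage: in state "a", clock y (value 4, rate 2) fires at t = 2 before x (value 3, rate 1).
g = GSMP(
    events={"a": ["x", "y"], "b": ["x", "z"]},
    rates={("a", "x"): 1.0, ("a", "y"): 2.0, ("b", "x"): 1.0, ("b", "z"): 1.0},
    next_prob={("a", "x"): {"a": 1.0}, ("a", "y"): {"b": 1.0}},
    new_clock=lambda s, i, s2, j, rng: rng.uniform(0.5, 1.5),  # placeholder density
)
t, s2, c2 = step(g, "a", {"x": 3.0, "y": 4.0}, random.Random(0))
print(t, s2, c2)  # x is carried forward with value 1.0; z is freshly sampled
```

The uniqueness condition of Definition 2.1 is enforced by raising an error on ties, so the sketch only ever evolves legitimate generalized states.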

Pseudometrics
Definition 3.1. A pseudometric $m$ on a state space $S$ is a function $S \times S \to [0, 1]$ such that $m(x, x) = 0$, $m(x, y) = m(y, x)$, and $m(x, z) \le m(x, y) + m(y, z)$. We consider a partial order on pseudometrics on a fixed set of states $S$.
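Definition 3.2. $M$ is the class of pseudometrics on $S$ ordered as: $m_1 \preceq m_2$ iff $(\forall x, y)\; m_1(x, y) \ge m_2(x, y)$. The top element $\top$ is the constant 0 function, and the bottom element is the discrete metric [DGJP02]. Since $(M, \preceq)$ is a complete lattice, any monotone function $F$ on $(M, \preceq)$ has a complete lattice of fixed points (Knaster-Tarski).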
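For a finite state space, Definitions 3.1 and 3.2 can be checked mechanically. The sketch below assumes distances are stored as a matrix and reads the order of Definition 3.2 as reverse pointwise order ($m_1 \preceq m_2$ iff $m_1 \ge m_2$ pointwise), the reading under which the constant 0 function is the top and the discrete metric the bottom. The helper names `is_pseudometric` and `precedes` are hypothetical.

```python
from itertools import product
from typing import List

Matrix = List[List[float]]


def is_pseudometric(d: Matrix, tol: float = 1e-12) -> bool:
    """Check Definition 3.1 for a distance matrix d[x][y] on a finite state space."""
    pts = range(len(d))
    return (
        all(abs(d[x][x]) <= tol for x in pts)
        and all(-tol <= d[x][y] <= 1 + tol for x, y in product(pts, pts))
        and all(abs(d[x][y] - d[y][x]) <= tol for x, y in product(pts, pts))
        and all(d[x][z] <= d[x][y] + d[y][z] + tol
                for x, y, z in product(pts, pts, pts))
    )


def precedes(m1: Matrix, m2: Matrix) -> bool:
    """m1 ⪯ m2: m1 assigns every pair at least the distance that m2 assigns."""
    pts = range(len(m1))
    return all(m1[x][y] >= m2[x][y] for x, y in product(pts, pts))


n = 3
top = [[0.0] * n for _ in range(n)]                                      # constant 0
bottom = [[0.0 if x == y else 1.0 for y in range(n)] for x in range(n)]  # discrete metric
m = [[0.0, 0.0, 0.5],
     [0.0, 0.0, 0.5],
     [0.5, 0.5, 0.0]]   # states 0 and 1 are at distance 0: a pseudometric, not a metric
print(is_pseudometric(m), precedes(bottom, m), precedes(m, top))  # True True True
```

The matrix `m` shows why pseudometrics are the right notion here: distinct states at distance 0 are exactly the ones a bisimulation would identify.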
3.1. Wasserstein metric. The Wasserstein metric is a prescription for lifting a (pseudo)metric on a given space to the space of probability distributions on that space.
Let $(M, m)$ be a pseudometric space, and let $P, Q$ be probability measures on $M$. Then $W(m)(P, Q)$ is defined as the optimal value of the following linear program, where $h : M \to [0, 1]$ ranges over measurable functions:
$$\max_h \int h\,dP - \int h\,dQ \quad \text{subject to} \quad \forall x, y \in M.\; h(x) - h(y) \le m(x, y).$$
An easy calculation using the linear program shows that the resulting distance on distributions satisfies symmetry and the triangle inequality, so we get a pseudometric, written $W(m)$, on distributions.
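By standard results (see Anderson and Nash [AN87]), this is equivalent to defining $W(m)(P, Q)$ as the optimal value of the following dual linear program, where $\rho$ ranges over measures on $M \times M$ and $S, S'$ over measurable subsets of $M$:
$$\min_\rho \int m\,d\rho \quad \text{subject to} \quad \rho(S \times M) = P(S),\; \rho(M \times S') = Q(S'),\; \rho \ge 0.$$
The Wasserstein construction is monotone on the lattice of pseudometrics: if $m \preceq m'$, then $W(m) \preceq W(m')$.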
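Proof. If $m \preceq m'$, then $m'(x, y) \le m(x, y)$ for all $x, y$, so every feasible solution $h$ of the linear program for $W(m')(P, Q)$ is also feasible for the linear program for $W(m)(P, Q)$. Hence $W(m)(P, Q) \ge W(m')(P, Q)$, i.e. $W(m) \preceq W(m')$.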
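As a sanity check on the linear program, consider a special case with a closed-form solution. Take $M = [0, 1]$ with $m(x, y) = |x - y|$ and finitely supported $P, Q$. Every feasible $h$ is 1-Lipschitz, and on $[0, 1]$ such a function can be shifted into $[0, 1]$ without changing $\int h\,dP - \int h\,dQ$, so the range constraint does not bind and the classical one-dimensional formula $W = \int |F_P - F_Q|$ applies. The sketch places the two branches of the choice operator $A_1 +_p A_2$ from the introduction at the points 0 and 1, a modelling assumption made only for illustration, which turns the intuitive gradation into distances $\varepsilon$ and $2\varepsilon$. The names `wasserstein_unit_interval` and `choice` are hypothetical.

```python
from typing import List, Sequence


def wasserstein_unit_interval(points: Sequence[float], p: Sequence[float],
                              q: Sequence[float]) -> float:
    """W(m)(P, Q) for finitely supported P, Q on [0, 1] with m(x, y) = |x - y|.

    On the line, the primal optimum equals the integral of |F_P - F_Q|, where
    F_P, F_Q are the cumulative distribution functions (a finite sum here).
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    xs = [points[i] for i in order]
    cdf_p = cdf_q = total = 0.0
    for k in range(len(xs) - 1):
        cdf_p += p[order[k]]
        cdf_q += q[order[k]]
        total += abs(cdf_p - cdf_q) * (xs[k + 1] - xs[k])
    return total


def choice(p: float) -> List[float]:
    """Hypothetical encoding of A_1 +_p A_2: mass p on A_1 (at 0), 1 - p on A_2 (at 1)."""
    return [p, 1.0 - p]


pts = [0.0, 1.0]
d_near = wasserstein_unit_interval(pts, choice(0.5), choice(0.6))
d_far = wasserstein_unit_interval(pts, choice(0.5), choice(0.7))
print(round(d_near, 6), round(d_far, 6))  # 0.1 0.2
```

For Dirac measures the same function returns the distance between the two support points, in line with the general identity $W(m)(1_x, 1_{x'}) = m(x, x')$.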
We discuss some concrete examples to illustrate the distances yielded by this construction. Let $(M, m)$ be a 1-bounded metric space, i.e. $m(x, y) \le 1$ for all $x, y$. Let $1_x$ be the unit measure concentrated at $x$.
Example 3.5. We calculate $W(m)(1_x, 1_{x'})$. The primal linear program, using the function $h : M \to [0, 1]$ defined by $h(y) = m(x', y)$, yields $W(m)(1_x, 1_{x'}) \ge h(x) - h(x') = m(x, x')$. The dual linear program, using the product measure $1_x \times 1_{x'}$, yields $W(m)(1_x, 1_{x'}) \le m(x, x')$. Hence $W(m)(1_x, 1_{x'}) = m(x, x')$.
Example 3.6. Let $P, Q$ be such that $|P(U) - Q(U)| < \varepsilon$ for all measurable $U$. For any 1-bounded function $h$, $|\int h\,dP - \int h\,dQ| < \varepsilon$, since for any simple function $g$ with finite range $\{v_1, \ldots, v_n\}$ dominated by $h$:
In 1-bounded metric spaces, the Wasserstein metric is closely related to the Prohorov metric $\pi$, which metrizes the topology of weak convergence. We say that $P_n$ weakly converges to $P$ if, for all bounded continuous real-valued functions $f$, $\int f\,dP_n$ converges to $\int f\,dP$. For any Borel set $A$, we write $A^\varepsilon$ for $\{u : \exists v \in A.\; m(u, v) < \varepsilon\}$. The Prohorov metric $\pi(P, Q)$ between two measures is defined by
$$\pi(P, Q) = \inf\{\varepsilon > 0 \mid \forall A.\; P(A) \le Q(A^\varepsilon) + \varepsilon\}.$$
The connection between the Wasserstein metric and the Prohorov metric (see [GS01] for a tutorial presentation of various such relationships) is:
$$\pi(P, Q)^2 \le W(m)(P, Q) \le 2\,\pi(P, Q).$$
The following lemma is the key tool to approximate continuous probability distributions by discrete distributions in a separable metric space. We use it later with the $U_i$, $i \ge 1$, being subsets of $\varepsilon$-neighborhoods of points that together cover, with respect to $P$, all but $\varepsilon$ of the space; the remainder is covered by $U_0$.
Lemma 3.7. Let $P$ be a probability measure on $(M, m)$. Let $U_i$, $i = 0, 1, 2, \ldots, n$, be a finite partition of the points of $M$ into measurable sets such that:
The fourth inequality follows as $0 \le h(x) \le 1$, from our assumption on $U_i$ that $m(x, x_i) \le \varepsilon$, and from the constraint on $h$, h(x
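The bottom element of Definition 3.2, the discrete metric, gives a case where both linear programs can be solved by hand. The indicator $h$ of $\{x : P(x) > Q(x)\}$ is feasible for the primal, a coupling that keeps $\min(P(x), Q(x))$ on the diagonal is feasible for the dual, and both attain $\tfrac12 \sum_x |P(x) - Q(x)| = \sup_U |P(U) - Q(U)|$, the total variation distance that appears in Example 3.6. The sketch assumes finitely supported distributions given as dictionaries; the function names `w_discrete` and `sup_over_events` are hypothetical, and the brute-force supremum over events is only feasible for small supports.

```python
from itertools import chain, combinations
from typing import Dict


def w_discrete(p: Dict[str, float], q: Dict[str, float]) -> float:
    """W(m)(P, Q) when m is the discrete metric (distance 1 between distinct points).

    Primal witness: h = indicator of {x : P(x) > Q(x)}.
    Dual witness: a coupling that keeps min(P(x), Q(x)) on the diagonal.
    Both give (1/2) * sum_x |P(x) - Q(x)|, the total variation distance.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


def sup_over_events(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Brute-force sup_U |P(U) - Q(U)| over all subsets U of the finite support."""
    support = list(set(p) | set(q))
    events = chain.from_iterable(combinations(support, r) for r in range(len(support) + 1))
    return max(abs(sum(p.get(x, 0.0) - q.get(x, 0.0) for x in u)) for u in events)


P = {"a": 0.5, "b": 0.3, "c": 0.2}
Q = {"a": 0.4, "b": 0.4, "d": 0.2}
print(round(w_discrete(P, Q), 6), round(sup_over_events(P, Q), 6))  # 0.3 0.3
```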

Cadlag functions
Usually when one defines bisimulation one requires that "each step" of one process matches "each step" of the other process.What varies from situation to situation is the notion of step and of matching.In the present case there is no notion of atomic step: one has to match sequences instead.In the usual cases matching steps and matching sequences are equivalent so one works with steps as they are simpler.Here we have no choice: we have to work with timed sequences.
The timed sequences that one works with are functions from $[0, \infty)$ to the state space. Since we have discrete transitions, these functions are not continuous. It turns out that the class of functions most often used is that of the cadlag functions (from the French "continue à droite, limites à gauche"): a function $f : [0, \infty) \to M$ is cadlag if for any decreasing sequence $\{t\} \downarrow t_0$, $\lim f(t) = f(t_0)$,
and for any increasing sequence $\{t\} \uparrow t_0$, $\lim f(t)$ exists. These functions have very nice properties: for example, they have at most countably many discontinuities. More to the point, perhaps, if one fixes an $\varepsilon > 0$, then in any bounded interval there are at most finitely many jumps larger than $\varepsilon$, so all but finitely many jumps are small.
The study of metrics on spaces of these functions was initiated by Skorohod [Sko56]; see Whitt's book [Whi02] for an expository presentation. Skorohod defined several metrics; we use the one called the J2 metric. The most naive metric that one can define is the sup metric. This fails to capture the convergence properties that one wants, because it insists on comparing two functions at exactly the same points. Skorohod's first metric (the J1 metric) allows one to perturb the time axis, so that functions which have nearby values at nearby points are close. The J1 metric also fails to satisfy certain convergence properties, so we use the J2 metric defined below, which, like the J1 metric, allows one to compare nearby time points.
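Let $(M, m)$ be a metric space, and let $|\cdot|$ be the usual metric on the positive reals $\mathbb{R}^+$. For cadlag $f, g : [0, \infty) \to M$, the J2 distance is
$$J(m)(f, g) = \max\Big(\sup_{t} \inf_{t'} d\big((t, f(t)), (t', g(t'))\big),\; \sup_{t'} \inf_{t} d\big((t, f(t)), (t', g(t'))\big)\Big),$$
where $d((x, s), (y, t)) = \max(|x - y|, m(s, t))$. Thus the J2 distance between two functions is the Hausdorff distance between their graphs (i.e. the sets of points $(x, f(x))$) in the space $[0, \infty) \times M$ equipped with the metric $d$.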
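The Hausdorff characterization of J2 can be approximated numerically by sampling both graphs on a finite time grid. This is an approximation under explicit assumptions: $M = [0, 1]$ with $m(s, u) = |s - u|$, the time axis is truncated to $[0, 1]$, and graphs are represented only by their values at grid points, so the result is exact only up to the grid spacing. The names `j2_on_grid` and `step_at` are hypothetical. The sketch reproduces two behaviours discussed in this section: a jump is matched by a nearby jump at distance equal to the shift, and a short spike stays at distance 1 from the constant 0 function.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def j2_on_grid(f: Callable[[float], float], g: Callable[[float], float],
               grid: List[float]) -> float:
    """Approximate J2 distance: Hausdorff distance between the sampled graphs
    {(t, f(t))} and {(t, g(t))}, using d((x, s), (y, u)) = max(|x - y|, |s - u|)."""
    graph_f = [(t, f(t)) for t in grid]
    graph_g = [(t, g(t)) for t in grid]

    def d(a: Point, b: Point) -> float:
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

    def directed(xs: List[Point], ys: List[Point]) -> float:
        # sup over xs of the distance to the nearest point of ys
        return max(min(d(a, b) for b in ys) for a in xs)

    return max(directed(graph_f, graph_g), directed(graph_g, graph_f))


def step_at(jump: float) -> Callable[[float], float]:
    """Cadlag indicator of [jump, oo): 0 before the jump, 1 from the jump on."""
    return lambda t: 1.0 if t >= jump else 0.0


grid = [k / 100 for k in range(101)]                           # truncates [0, oo) to [0, 1]
shifted = j2_on_grid(step_at(0.5), step_at(0.6), grid)         # jump matched by a nearby jump
spike = j2_on_grid(lambda t: 1.0 if 0.4 <= t < 0.45 else 0.0,  # a short spike is still detected
                   lambda t: 0.0, grid)
print(round(shifted, 6), spike)  # 0.1 1.0
```

The sup metric would put the two step functions at distance 1; the time perturbation allowed by J2 brings them to distance 0.1, while the spike cannot be matched by any nearby point of the constant graph.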
The following lemma is immediate from definitions.
The next lemma is standard; see, e.g., Billingsley [Bil99].
Lemma 4.4. If $(M, m)$ is separable, then $(D_M[0, \infty), J(m))$ is a separable space with a countable basis given by piecewise constant functions with finitely many discontinuities and finite range contained in a basis of $M$.
We consider a few examples to illustrate the metric; see the book by Whitt [Whi02] for a detailed analysis of Skorohod's metrics. The first example shows that jumps (discontinuities) can be matched by nearby jumps.
The next example shows that a single jump can be matched by multiple nearby jumps.
Example 4.6 [Whi02]. Let $\{a_n\}, \{c_n\}$ be increasing sequences that converge to 1/2 such that $a_n < c_n$. Let: These are depicted in the picture below.
The next two non-examples show that "jumps are detected". In a later section, we develop a real-valued modal logic that captures the reasoning behind these two non-examples. Here, to provide preliminary intuitions, we preview this development in a specialized form. Given a cadlag function $f$ with range $[0, 1]$ and the standard metric, and a Lipschitz function $h$ on $[0, 1]$, let $L(h)(f)$ be defined as In this definition, view $h$ as a test performed on the values taken by the function $f$. Since $h$ is a Lipschitz function on $[0, 1]$, the results of such tests are smoothed out; they include the analogue of (logical) negation via the operation $1 - (\cdot)$ and smoothed conditionals via $h_q(x) = \max(0, x - q)$, which correspond to a "greater than $q$" test. $L(h)(f)$ also performs an extra smoothing operation over time, so that the values of We will use this to establish non-convergence of function sequences in the J2 metric in the next two examples. The first non-example shows that jumps are detected: a sequence of functions with jumps cannot converge to a continuous function.
Example 4.7.Let {b ′ n } be an increasing sequence that converges to 1 2 .Consider: n , 1) has distance 1 from the graph of 0. Alternatively, this can be illustrated by considering the Lipschitz operator h ′ ǫ on [0, 1] defined as: r ≥ e n These are depicted in the picture below.
To analyze in terms of the operator L, consider the Lipschitz operator h ǫ on [0, 1] defined as: )( 1 2 ) = 0. We conclude this section with a discussion of a delay operator on the space of cadlag functions.
Definition 4.9.Let (M, m) be a metric space.Let The distance between a cadlag function and its t-delayed version is no greater than t.
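The bound on the delay operator can be checked numerically, but the formula of Definition 4.9 is not reproduced here, so the sketch assumes one natural reading: the $t$-delayed version of $f$ holds $f(0)$ on $[0, t)$ and then replays $f$ shifted by $t$. Under that assumption (the function `delay` and the helper `hausdorff_graphs` are hypothetical), the sampled Hausdorff distance between the graphs of $f$ and its delay is at most $t$; a dyadic grid keeps the shifted sample points exact in floating point.

```python
from typing import Callable, List

Fn = Callable[[float], float]


def delay(f: Fn, t: float) -> Fn:
    """Hypothetical t-delay: hold f(0) on [0, t), then replay f shifted by t."""
    return lambda x: f(0.0) if x < t else f(x - t)


def hausdorff_graphs(f: Fn, g: Fn, grid: List[float]) -> float:
    """Hausdorff distance between sampled graphs, with d = max(|time diff|, |value diff|)."""
    gf = [(x, f(x)) for x in grid]
    gg = [(x, g(x)) for x in grid]

    def directed(a_pts, b_pts):
        return max(min(max(abs(a[0] - b[0]), abs(a[1] - b[1])) for b in b_pts)
                   for a in a_pts)

    return max(directed(gf, gg), directed(gg, gf))


def step(x: float) -> float:   # cadlag step with a jump at 0.25
    return 1.0 if x >= 0.25 else 0.0


def ramp(x: float) -> float:   # continuous example
    return min(x, 1.0)


grid = [k / 64 for k in range(129)]   # dyadic grid on [0, 2]
t = 0.125
print(hausdorff_graphs(step, delay(step, t), grid),        # 0.125
      hausdorff_graphs(ramp, delay(ramp, t), grid) <= t)   # True
```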

GSMPs and cadlag functions.
We deal with the temporal aspects of GSMPs next by constructing cadlag functions for paths: i.e. sequences of generalized states of a GSMP.
A sequence of generalized states $\langle s_i, c_i\rangle$ is finitely varying if it is non-Zeno, i.e. for any $i$, the sum $\sum_{j > i} T(\langle s_j, c_j\rangle)$ of the times spent at each generalized state diverges. Any finitely varying sequence of generalized states $\langle s_i, c_i\rangle$ generates $f : [0, \infty) \to G$ as follows:
• $f(t) = \langle s_i, c\rangle$, where $\sum_{k=0}^{i-1} T(\langle s_k, c_k\rangle) \le t < \sum_{k=0}^{i} T(\langle s_k, c_k\rangle)$ and $c = c_i - r\,\big(t - \sum_{k=0}^{i-1} T(\langle s_k, c_k\rangle)\big)$ is the vector of clock values obtained by letting each clock of $\langle s_i, c_i\rangle$ run down at its rate for the time elapsed since entering $s_i$.
Such finitely varying traces satisfy the following: for any interval $[t, t']$, there is a finite partition $t = t_0 < t_1 < t_2 < \cdots < t_n = t'$ such that:
We write $\mathrm{Traces}\langle s, c\rangle$ for the set of traces that start with $\langle s, c\rangle$. The probability distributions associated with initial clock values at states ($f_s$) and transitions ($\mathrm{next}$) induce a probability measure on $\mathrm{Traces}\langle s, c\rangle$. The paths that are Zeno have measure zero, so the finitely varying paths generate $\mathrm{Traces}\langle s, c\rangle$ in measure.
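The construction of $f$ from a path can be sketched directly. This sketch assumes a finite prefix of a path, generalized states encoded as a state name paired with a dictionary of clock values, and clock rates given per (state, clock) pair; `make_trace` and `GenState` are hypothetical names. It computes the holding times $T(\langle s_i, c_i\rangle)$ (the smallest ratio of clock value to rate), their partial sums, and evaluates $f(t)$ by locating $t$ among the partial sums. Using `bisect_right` makes $f$ right-continuous at the jump times, as a cadlag function must be.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Dict, List, Tuple

GenState = Tuple[str, Dict[str, float]]   # hypothetical encoding of <s, c>


def make_trace(seq: List[GenState], rates: Dict[Tuple[str, str], float]):
    """Build t -> f(t) on [0, sum_i T(<s_i, c_i>)) from a finite prefix of a path."""
    holding = [min(c[i] / rates[(s, i)] for i in c) for s, c in seq]  # T(<s_i, c_i>)
    starts = [0.0] + list(accumulate(holding))[:-1]                   # sum_{k<i} T_k
    horizon = sum(holding)

    def f(t: float) -> GenState:
        if not 0.0 <= t < horizon:
            raise ValueError("t lies outside the constructed prefix")
        i = bisect_right(starts, t) - 1   # largest i with sum_{k<i} T_k <= t
        s, c = seq[i]
        elapsed = t - starts[i]
        # Every clock of I(s_i) runs down at its own rate for the elapsed time.
        return s, {j: c[j] - rates[(s, j)] * elapsed for j in c}

    return f, starts


rates = {("a", "x"): 1.0, ("a", "y"): 2.0, ("b", "x"): 1.0, ("b", "z"): 1.0}
seq = [("a", {"x": 3.0, "y": 4.0}), ("b", {"x": 1.0, "z": 0.5})]
f, starts = make_trace(seq, rates)
print(starts, f(1.0), f(2.0))
# [0.0, 2.0] ('a', {'x': 2.0, 'y': 2.0}) ('b', {'x': 1.0, 'z': 0.5})
```

At the jump time $t = 2$ the trace already shows the new generalized state, which is exactly the right-continuity required of cadlag functions.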
The following lemma shows that arbitrarily close approximations to the distance between finitely varying functions $f, g \in D_{(G,m)}[0, \infty)$ are forced by the distances between the values of $f, g$ at finitely many points of time. It is used later on to show that our coinductive definition of the metric has closure ordinal $\omega$.
Lemma 5.1. Let $f, g \in D_{(G,m)}[0, \infty)$ be finitely varying with $J(m)(f, g) > \delta$. Then there is a finite subset $G_{fin} \subseteq G$ and $\varepsilon > 0$ such that for any $m'$ with $m'(g_1, g_2) \ge m(g_1, g_2) - \varepsilon$ for all $g_1, g_2 \in G_{fin}$, we have $J(m')(f, g) > \delta$.
Proof. If $J(m)(f, g) > \delta$, then without loss of generality there is a $t$ such that the distance of $(t, f(t))$ from $\mathrm{Graph}(g)$ is greater than $\delta + \gamma$ for some $\gamma > 0$. Consider the bounded interval $[t - \delta, t + \delta]$. By finite variation of $g$, we have a partition $t_0 = t - \delta < t_1 < t_2 < \cdots < t_n = t + \delta$ such that:
For any $m'$ on generalized states such that $(\forall i, j)\; m'(\langle s_i, c_i\rangle, \langle s_j, c_j\rangle) \ge m(\langle s_i, c_i\rangle, \langle s_j, c_j\rangle) - \varepsilon$, the distance of $(t, f(t))$ from $\mathrm{Graph}(g)$ is greater than $\delta$ by construction.

Bisimulation style definition of metric
Let $M$ be the class of pseudometrics $m$ on generalized states that satisfy
$$\mathrm{Props}(s) \ne \mathrm{Props}(s') \implies m(\langle s, c\rangle, \langle s', c'\rangle) = 1,$$
where $\mathrm{Props}(s)$ is the set of atomic propositions true in state $s$. We order these pseudometrics as in Section 3. In the definition of the functional $F_k$ on $M$, view $k$ as a discount factor. In the next section, we show that the choice of $k$ does not affect the essential character of the metric. "Type-checking" this definition provides some intuition: $m$ is a pseudometric on generalized states; $J(m)$, following Skorohod J2, is a pseudometric on finitely varying sequences of generalized states; and $W(J(m))$, following Wasserstein, is a pseudometric on probability distributions on finitely varying sequences of generalized states.
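As an immediate consequence of Lemmas 4.3 and 3.4, $F_k$ is monotone on $(M, \preceq)$. It is well known (Knaster-Tarski) that the greatest fixed point of $F_k$ is given by
$$m_{F_k} = \bigsqcup \{ m \in M \mid m \preceq F_k(m) \}.$$
We call a pseudometric $m$ with $m \preceq F_k(m)$ a metric-bisimulation. Thus, a metric-bisimulation $m$ provides an upper bound on the distances assigned by $m_{F_k}$. Consider the equivalence relation $\simeq \;= \{(\langle s, c\rangle, \langle s', c'\rangle) \mid m_{F_k}(\langle s, c\rangle, \langle s', c'\rangle) = 0\}$. The relation $\simeq$ describes a notion of bisimulation and is explicitly defined as follows.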
Let $M_{\{0,1\}}$ be the sublattice of $M$ consisting of metrics whose range is $\{0, 1\}$, i.e. all distances are either 0 or 1. $M_{\{0,1\}}$ is essentially the class of equivalence relations. A simple proof shows that:
As an example of metric reasoning, we now show that generalized states with the same state, but with clock values reflecting evolution for a time $t$, are $m_{F_k}$-close.
Lemma 6.4. Define a pseudometric $m$ on generalized states as follows:
Proof. It suffices to prove that for all generalized states $\langle s, c\rangle, \langle s', c'\rangle$: The only case to consider is when $s = s'$; without loss of generality, assume $c \le c'$. Define $u : [0, t) \to G$ as follows.
With this definition, it is clear that:
Let $m_0 = \top$ and $m_{i+1} = F_k(m_i)$. The role played by the discount constant $k$ is captured in the following fact: This is the key step in the proof of the following lemma.
Lemma 6.5. $m_{F_k}$ is separable.
Proof. We first note that if $m, m'$ are such that $(\forall s)\,|m(s) - m'(s)| \le \delta$, then: this is immediate from the definition. We prove by induction on $n$ that
• Base. $n = 0$. This follows from the fact that $m_0$ is the constant 0 function and $m_1$ is bounded above by $k$.
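Thus, for any $\langle s, c\rangle, \langle s', c'\rangle$, the values $W(J(m_{n+1}))(\mathrm{Traces}\langle s, c\rangle, \mathrm{Traces}\langle s', c'\rangle)$ and $W(J(m_n))(\mathrm{Traces}\langle s, c\rangle, \mathrm{Traces}\langle s', c'\rangle)$ differ by at most $k^{n+1}$. So, for all generalized states $g_s, g'_s$,
$$\Big|\sup_i m_i(g_s, g'_s) - m_n(g_s, g'_s)\Big| \le \sum_{j \ge n} k^{j+1} = \frac{k^{n+1}}{1-k}.$$
Thus, an $\varepsilon$-ball around $\langle s, c\rangle$ with respect to the metric $m$ can be realized as a countable union of open sets with respect to the metrics $m_n$. The result now follows from the separability of the metrics $m_n$.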
The separability of $m_{F_k}$ enables one to prove the analogue of Lemma 5.1.

Lemma 6.6 (Finite detectability of distances).
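Let $m$ be a pseudometric on $G$ with a countable basis, and let $F_k(m)(\langle s, c\rangle, \langle s', c'\rangle) > \delta$. Then there is a finite subset $G_{fin} \subseteq G$ and $\varepsilon > 0$ such that for any metric $m'$ with $m \preceq m'$ and $m'(g_1, g_2) \ge m(g_1, g_2) - \varepsilon$ for all $g_1, g_2 \in G_{fin}$, we have $F_k(m')(\langle s, c\rangle, \langle s', c'\rangle) > \delta$.
Proof. From separability of $m$, Lemma 4.4 yields separability of $J(m)$. Let $\rho$ be the measure induced on the space of all traces by $\mathrm{Traces}\langle s, c\rangle$. Using separability of $J(m)$, we can get a finite partition $U_0, U_1, \ldots, U_n$ of the space of traces satisfying $\mathrm{diameter}(U_i) < \gamma/16$ for $i \ge 1$, and $\rho(U_0) < \gamma/16$. Using Lemma 3.7 with $\varepsilon = \gamma/16$ gives us a finite set of traces $L_1 = \{f_i \mid i\}$ with probabilities $p_i$ given by $\rho$. Similarly, applying Lemma 3.7 to $\mathrm{Traces}\langle s', c'\rangle$ gives us another finite set $L_2 = \{f'_i \mid i\}$ with probabilities $p'_i$ given by the measure induced by $\mathrm{Traces}\langle s', c'\rangle$. We then have from Lemma 3.7 that $W(J(m'))(\mathrm{Traces}\langle s, c\rangle, L_1) < \gamma/8$, and similarly $W(J(m'))(\mathrm{Traces}\langle s', c'\rangle, L_2) < \gamma/8$. The triangle inequality then gives us that $F_k(m')(\langle s, c\rangle, \langle s', c'\rangle) > \delta$.
Lemma 6.7. $F_k$ has closure ordinal $\omega$.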
Proof. The proof proceeds by showing that the maximum fixed point $m$ is given by the pointwise supremum $m = \sup_i m_i$, where $m_0 = \top$ and $m_{i+1} = F_k(m_i)$.
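Let $m(\langle s, c\rangle, \langle s', c'\rangle) > \delta$. From Lemma 6.6, we deduce finitely many conditions of the form $m_i(g_1, g_2) \ge m(g_1, g_2) - \varepsilon$ for $g_1, g_2 \in G_{fin}$. Each of these finitely many conditions is met at some finite index; therefore all of them are met at some finite index, and the result follows.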

Uniform spaces
A metric captures a quantitative notion of distance or "nearness"; a topology captures a qualitative notion of nearness: a topology is enough to talk about convergence and continuity. A topology is, however, not enough to capture a notion of relative distance: one cannot say "x is closer to y than it is to z" on the basis of a topology alone. A uniform space (see, for example, [Ger85] for a quick survey) captures the essence of the relative distance notion in metric spaces: if there are points x, y, z such that x is closer to y than to z, uniform spaces have enough data to capture this without committing to the actual numerical values of the distances. The aim of this section is to show that our treatment is "up to uniformity"; this is a formal way of showing that there is no ad hoc treatment of the quantitative metric distances. In particular, we show that different discount factors k yield the same uniformity.
Let S be a set.
Definition 7.1. A pseudo-uniformity $U$ is a collection of subsets of $S \times S$, called entourages, that satisfies:
(1) $\Delta = \{(x, x) \mid x \in S\} \subseteq E$ for all $E \in U$;
(2) $E \in U \implies E^{-1} \in U$;
(3) $E \in U \implies \exists E' \in U.\; E' \circ E' \subseteq E$;
(4) $E, E' \in U \implies E \cap E' \in U$;
(5) $E \in U,\; E \subseteq E' \implies E' \in U$.
One can think of the entourages as defining approximations to the identity relation, just as a neighbourhood of a point can be thought of as an approximation of the point. The first axiom says this, the second axiom is symmetry, the third is a truncated version of transitivity, the fourth says that entourages are closed under finite intersections, and the final axiom says that a superset of an approximation to the identity is also an approximation to the identity. The usual presentation of uniformities also includes $\bigcap U = \Delta$, but this condition is not appropriate to our pseudometric setting. In this paper, we work with pseudo-uniformities, often dropping "pseudo" and merely saying "uniformities".
Definition 7.2. A pseudo-uniform space is a pair $(S, U)$ where $U$ is a pseudo-uniformity on $S$.
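There is a natural notion of map between uniform spaces: a morphism between uniform spaces generalizes uniformly continuous functions.
Definition 7.3. A function $f : (S, U) \to (T, V)$ between uniform spaces is uniformly continuous if $(f \times f)^{-1}(E) \in U$ for every $E \in V$.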
To gain intuition into this definition, we describe how a pseudometric generates a pseudo-uniformity. Given a pseudometric $m$ on $S$, let $K^\varepsilon_m = \{(x, y) \mid m(x, y) < \varepsilon\}$ for $\varepsilon > 0$. We get a pseudo-uniformity by considering [Ger85, p. 218]:
$$U_m = \{E \subseteq S \times S \mid \exists \varepsilon > 0.\; K^\varepsilon_m \subseteq E\}.$$
Thus, if $m(x, y) < \varepsilon$ and $m(x, z) > \varepsilon$, there is an entourage that contains $(x, y)$ but not $(x, z)$. Clearly, if one just scales the metric, this construction yields the same uniformity.
Thus, all pseudometrics induce pseudo-uniformities, but the converse is not true: there are pseudo-uniformities that are not induced by metrics.Two metrics m, m ′ on the same set induce the same uniformity if and only if the identity map is uniformly continuous in both directions.
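A finite sample of points can illustrate, though not prove, that rescaling a metric preserves the induced uniformity. The sketch assumes $m(x, y) = |x - y|$ and its rescaling by $k = 0.5$ (a power of two, so the comparisons are exact in floating point); `entourage` and `delta_works` are hypothetical names. For each $\varepsilon$ it exhibits a $\delta$ with $K^\delta_{m'} \subseteq K^\varepsilon_m$ in both directions, i.e. the identity map is uniformly continuous both ways on the sample.

```python
from typing import Callable, List, Set, Tuple

Metric = Callable[[float, float], float]


def entourage(m: Metric, pts: List[float], eps: float) -> Set[Tuple[float, float]]:
    """K^eps_m = {(x, y) | m(x, y) < eps}, restricted to a finite sample of points."""
    return {(x, y) for x in pts for y in pts if m(x, y) < eps}


def delta_works(m_src: Metric, m_tgt: Metric, pts: List[float],
                eps: float, delta: float) -> bool:
    """On the sample, is K^delta_{m_src} contained in K^eps_{m_tgt}?  This is the
    uniform-continuity condition for the identity (pts, m_src) -> (pts, m_tgt) at eps."""
    return entourage(m_src, pts, delta) <= entourage(m_tgt, pts, eps)


def m(x: float, y: float) -> float:
    return abs(x - y)


def m_half(x: float, y: float) -> float:   # m rescaled by the factor k = 0.5
    return 0.5 * abs(x - y)


pts = [i / 10 for i in range(11)]
for eps in (0.05, 0.1, 0.3):
    # delta = eps works in one direction, delta = eps / 2 in the other: same uniformity
    assert delta_works(m, m_half, pts, eps, eps)
    assert delta_works(m_half, m, pts, eps, eps / 2)
print(delta_works(m_half, m, pts, 0.1, 0.1))  # False: delta must shrink with k
```

The numerical value of the witness $\delta$ depends on the scaling factor, but its existence does not; this is precisely the information a uniformity retains and a metric adds on top.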
7.1. The lattice of uniformities. Consider uniformities induced by pseudometrics on a fixed set of states $S$.
Definition 7.4. PMU is the class of pseudometrizable uniformities $\{U_i\}$ on $S$ ordered as follows: $U_1 \le U_2$ iff $U_2 \subseteq U_1$.
This order is closely related to the order on the lattice of pseudometrics.
Proof. Let $U$ be the uniformity induced by $m$. We need to show that $U = U_1$. Since $K^\varepsilon_m \subseteq K^\varepsilon_{m_1}$, $U \supseteq U_1$. We now show that $U \subseteq U_1$. Consider an entourage $E \in U$. Thus, there is an
Proof. The least element is given by the discrete metric: $\bot(s, t) = 0$ if $s = t$, and 1 otherwise. The top element has only one entourage, $S \times S$, and is induced by the pseudometric given by $(\forall s, t)\, \top(s, t) = 0$. The greatest lower bound of $\{U_i\}$ is the uniformity generated by $\bigcup_i U_i$.
Proof. Since the uniformities induced by $m, m'$ are the same, the identity map $(M, m) \to (M, m')$ is uniformly continuous. We need to show that the identity map on distributions with metrics $W(m)$ and $W(m')$ is uniformly continuous.
Since the uniformity induced by $W(m)$ (resp. $W(m')$) is the same as the uniformity induced by the Prohorov metric $\pi(m)$ (resp. $\pi(m')$), it suffices to prove that the identity map on distributions with metrics $\pi(m)$ and $\pi(m')$ is uniformly continuous.
Lemma 7.9. The uniformity induced by the Skorohod J2 metric depends only on the uniformity induced by $m$ on $M$.
These operations are monotone, under the subset ordering, in both X and R.
Let $E \subseteq S \times S$ be an entourage in $U(S)$. Consider the subset $J(E)$ of $D_M[0, \infty) \times D_M[0, \infty)$ induced by $E$ as follows. $J(E)$ is the set of all $(f, g)$ such that: We will show that $U$ is the uniformity generated by $J(m)$.
Rewriting J(m) in this style: and thus considered in the definition of U. Furthermore, any arbitrary entourage E in U (S) is a superset of these basic entourages, and yields a superset by monotonicity of the relational operations.
The result now follows by the upward-closure axiom in the definition of U and the uniformity generated by J(m).
The proof of this theorem also shows that the construction $J(m)$ yields the same uniformity for other definitions of metrics on $[0, \infty) \times M$ that yield the same uniformity, e.g. the metric $d((x, s), (y, t)) = |x - y| + m(s, t)$.
In light of this theorem, we write $J(U)$ for the uniformity generated by a pseudometrizable uniformity $U$. As a direct consequence of Lemma 4.3, we have:
Corollary 7.10. $S_1 \le S_2 \Rightarrow J(S_1) \le J(S_2)$.
7.4. A functional on the lattice of uniformities. Combining the above results, we deduce the existence of a monotone function $F_k$ on the lattice of uniformities. This function is insensitive to the actual numerical value of the discount factor $k$.
Proof. By Lemmas 7.7 and 7.9, the functional $F_k$ is defined up to uniformity. If we change $k$ to $k'$, we are simply rescaling the metric, which clearly gives the same uniformity.
Furthermore, for any discount factor $0 < k < 1$, we get the same maximum fixed point in the lattice of uniformities. In contrast to the above lemma, the following theorem relies on $k < 1$.
Theorem 7.12. The maximum fixpoint in $(PMU, \le)$ is the uniformity induced by $m_{F_k}$, the maximum fixpoint in $(M, \preceq)$.
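Proof. It suffices to show that the greatest lower bound of the $U_i$ (the uniformity generated by $\bigcup_i U_i$) is the uniformity induced by $\sup_i m_i$. Let $V$ be the uniformity induced by $\sup_i m_i$. Since $m_i \le \sup_j m_j$ pointwise for all $i$, we have $K^\varepsilon_{\sup_j m_j} \subseteq K^\varepsilon_{m_i}$, and hence $U_i \subseteq V$ for all $i$. Thus $\bigcup_i U_i \subseteq V$, and hence the identity function from $V$ to each $U_i$ is uniformly continuous.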
For the converse, we show that the identity function from $\bigcup_i U_i$ to $V$ is uniformly continuous. We know from the proof of Lemma 6.5 that, for any generalized states $g_s$ and $g'_s$, $|\sup_i m_i(g_s, g'_s) - m_n(g_s, g'_s)| \le k^{n+1}/(1-k)$. Given $\varepsilon > 0$, we choose $n$ and $\delta$ such that $\delta + \frac{k^{n+1}}{1-k} < \varepsilon$. For any such $\delta$ and $n$, we have $m_n(g_s, g'_s) < \delta \Rightarrow \sup_i m_i(g_s, g'_s) < \varepsilon$. This shows that each $K^\varepsilon_{\sup_i m_i}$ contains $K^\delta_{m_n}$, and hence that the identity function is uniformly continuous. This proof relies on the fact that $k < 1$; otherwise the bound $k^{n+1}/(1-k)$ would not be finite, and no such $n$ and $\delta$ would exist.
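The choice of $n$ and $\delta$ in this proof is constructive. A minimal sketch, assuming we split the budget evenly ($\delta = \varepsilon/2$, which is our choice and not the only one), searches for the smallest $n$ with $k^{n+1}/(1-k) < \varepsilon/2$; the function name `choose_n_delta` is hypothetical. It also makes the role of $k < 1$ visible: for $k \ge 1$ the bound is not finite and the function refuses to answer.

```python
def choose_n_delta(k: float, eps: float):
    """Pick n and delta with delta + k**(n+1) / (1 - k) < eps.

    Splits the budget evenly: delta = eps / 2, and n is the smallest index whose
    tail bound k**(n+1) / (1 - k) falls below eps / 2.
    """
    if not 0.0 < k < 1.0:
        raise ValueError("k**(n+1)/(1-k) is a finite, vanishing bound only for 0 < k < 1")
    if eps <= 0.0:
        raise ValueError("eps must be positive")
    delta = eps / 2
    n = 0
    while k ** (n + 1) / (1 - k) >= eps / 2:
        n += 1
    return n, delta


print(choose_n_delta(0.5, 0.1))   # (5, 0.05)
print(choose_n_delta(0.9, 0.1))   # (50, 0.05): k closer to 1 needs more iterations
```

The uniformity is the same for every $0 < k < 1$, but the number of iterates needed to certify a given $\varepsilon$ grows as $k$ approaches 1.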

Examples
In this section, we discuss several examples of the use of approximate reasoning techniques. The general approach in this section is to identify natural quantitative observables, already explored in the literature, that are amenable to approximation; i.e. to calculate the observable at a state $\langle s, c\rangle$ up to $\varepsilon$, it suffices to calculate it at a close-enough state $\langle s', c'\rangle$. This is clearly implied by continuity of the observable with respect to the metric $m_{F_k}$.
The main technical tool that we use to establish continuity of observables is a continuous mapping theorem; see, e.g., [Whi02, She87] for an introductory exposition.
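As a toy illustration of why closeness of distributions transfers to closeness of expected observables, and why a Lipschitz observable also yields a rate, the sketch below uses the standard bound $|E_P U - E_Q U| \le L \cdot W_1(P, Q)$ for an L-Lipschitz U (Kantorovich–Rubinstein duality). Assumptions: the distributions are empirical samples on the real line rather than trace distributions on path space, and the observable is a truncated time, so it is bounded and 1-Lipschitz. Delaying every sample by a small amount mirrors the delay example from the introduction.

```python
import random
from typing import Callable, List


def wasserstein1_empirical(xs: List[float], ys: List[float]) -> float:
    """W1 between two equal-size empirical distributions on the real line.

    In one dimension with equal weights, matching sorted samples is optimal.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("need equal-size, non-empty samples")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def expectation(u: Callable[[float], float], xs: List[float]) -> float:
    return sum(u(x) for x in xs) / len(xs)


def truncated_time(x: float) -> float:
    # Bounded and 1-Lipschitz: a stand-in for a time-valued observable.
    return min(x, 3.0)


rng = random.Random(0)
p_samples = [rng.expovariate(1.0) for _ in range(1000)]
q_samples = [x + 0.05 for x in p_samples]  # every time delayed by 0.05

gap = abs(expectation(truncated_time, p_samples)
          - expectation(truncated_time, q_samples))
w1 = wasserstein1_empirical(p_samples, q_samples)
print(f"|E_P U - E_Q U| = {gap:.4f} <= W1 = {w1:.4f}")
```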
Theorem 8.1 (Continuous mapping theorem). Let $P_n$ be a sequence of probability distributions on X that weakly converge to P. Let U be a continuous function X → ℝ. Then $\int U\,dP_n$ converges to $\int U\,dP$.

8.1. Expected time to hit a proposition. Let p be a proposition. We consider the expected time required to hit a p-state, i.e. a state where the proposition p is true. Define

So, using the continuous mapping theorem, we deduce that if $\{\langle s_i, c_i\rangle\}$ converges to $\langle s, c\rangle$, then the sequence of expected times to hit a p-state from $\langle s_i, c_i\rangle$ converges to the expected time to hit a p-state from $\langle s, c\rangle$. In fact, since in this case $\mathit{Hit}_p$ is a 1-Lipschitz function, we can also deduce the rate of convergence using [Whi02]: if $m_F(\langle s_i, c_i\rangle, \langle s, c\rangle) < \epsilon$, then the expected times to hit a p-state differ by at most 2ε.

8.2. Expected rewards. Let $r_i$ be an assignment of rewards to states $s_i$ such that if $r_i \ne r_j$, then states $s_i$ and $s_j$ differ in the truth assignment of at least one proposition. This restriction can be viewed purely as a modelling constraint.
Define a function R : G → [0, ∞) by $R(\langle s_i, c\rangle) = r_i$. Under the hypothesis that distinct rewards are distinguished propositionally, R is a continuous function. For any finitely-varying f, consider CumR(f), a continuous function of t defined as follows:
$$\mathit{CumR}(f)(t) = \int_0^t R(f(u))\,du.$$
By standard results (see, e.g., [Whi02]), CumR is a continuous function from $D_{m_F}[0, \infty)$ to (C, unif), where C is the space of continuous functions from [0, ∞) to [0, ∞) with the uniform metric:

Consider the following continuous functions from (C, unif) to [0, ∞):
• For a fixed T, the cumulative reward at time T.
• For a fixed T, the average reward per unit time at T.
• The supremum of the times T at which the cumulative reward is less than a fixed value v.
• The supremum of the times T at which the average reward is less than a fixed value v.

In each of these cases, composing with CumR gives a continuous function from $D_{m_F}[0, \infty)$ to [0, ∞). So the continuous mapping theorem applies, and we deduce that if $\{\langle s_i, c_i\rangle\}$ converges to $\langle s, c\rangle$, then the sequence of expected values from $\langle s_i, c_i\rangle$ converges to the expected value at $\langle s, c\rangle$.
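The reward observables listed above can be computed along a single path. The Python sketch below assumes $\mathit{CumR}(f)(t) = \int_0^t R(f(u))\,du$ with nonnegative rewards, and represents a finitely-varying path only by its piecewise-constant reward signal: a list of (start time, reward) pieces, the last of which lasts forever. The names `cum_reward`, `avg_reward` and `time_to_accumulate` are hypothetical. The text's expected values would average these quantities over the trace distribution, which the sketch does not model, and the fourth observable (a supremum over all times for the average reward) is omitted.

```python
import math
from typing import List, Tuple

# A finitely-varying reward signal: (start_time, reward) pairs with start
# times strictly increasing from 0; the last piece lasts forever.
Path = List[Tuple[float, float]]


def cum_reward(path: Path, t: float) -> float:
    """Integral of the piecewise-constant reward over [0, t]."""
    total = 0.0
    for i, (start, r) in enumerate(path):
        if start >= t:
            break
        end = path[i + 1][0] if i + 1 < len(path) else math.inf
        total += r * (min(end, t) - start)
    return total


def avg_reward(path: Path, t: float) -> float:
    """Average reward per unit time at t (needs t > 0)."""
    if t <= 0:
        raise ValueError("average reward needs t > 0")
    return cum_reward(path, t) / t


def time_to_accumulate(path: Path, v: float) -> float:
    """sup{T : cum_reward(T) < v}.

    With nonnegative rewards the cumulative reward is nondecreasing and
    continuous, so this is the first time it reaches v (math.inf if never).
    """
    if v <= 0:
        raise ValueError("need v > 0")
    total = 0.0
    for i, (start, r) in enumerate(path):
        end = path[i + 1][0] if i + 1 < len(path) else math.inf
        if r == 0:
            continue  # zero-reward pieces add nothing (and avoid 0 * inf)
        if total + r * (end - start) >= v:
            return start + (v - total) / r
        total += r * (end - start)
    return math.inf


path = [(0.0, 1.0), (2.0, 0.0), (3.0, 2.0)]
```

For this path the cumulative reward is 2 at time 2, stays flat until time 3, and then grows at rate 2, so a reward of 3 is first reached at time 3.5.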

Functional characterization of uniformity
In an early treatment of metrics for LMPs [DGJP99, DGJP04], the metric was defined through a class of functions closely related to a modal logic. The idea was that, in a probabilistic setting, random variables play a role analogous to modal formulas. A class of random variables (measurable functions) was defined on the state space, and the metric was obtained by taking the sup over this class of functions. The coinductive definition of the metric came later [vBW01b, DGJP02] and was shown to coincide with the logically defined metric. In the present work we develop the subject along similar lines. We already have the fixed-point version of the metric; we now give the "logical" view. In this section, we provide an explicit construction of the maximum fixed point by considering a class of [0, 1]-valued functions.

9.1. Function expressions.

Definition 9.1. Fix $0 < k \le \frac{1}{2}$. The syntax of function expressions is given by:

where p ranges over atomic propositions, h is any Lipschitz function on [0, 1], and t ∈ [0, ∞).
The subscript k gives the discount factor; we will not usually write it explicitly. Intuitively, the F-function expressions are evaluated at generalized states, and the G-function expressions are evaluated on finitely-varying paths at the times shown. In a temporal logic with state and path formulas, such as CTL*, the path formulas are implicitly evaluated at the first time. This may not seem to be the case with a formula like Gp (□p in LTL notation), but it is clear with a formula like Xp (○p). In our G-formulas we cannot rely on a first time: we provide the time as an explicit parameter. One can imagine a much richer language of path formulas; for example, one might have time averages along a path. However, the present language suffices for the definition of the metric.
As preliminary intuition, 1 corresponds to the formula true, min(·, ·) corresponds to conjunction, and h ∘ f encompasses both testing (via h(x) = max(x − q, 0)) and negation (via h(x) = 1 − x). At a generalized state $\langle s, c\rangle$, the F-function expression G(t) yields the (discounted) expectation of the G-function expression G(t) with respect to the distribution of $\mathrm{Traces}\langle s, c\rangle$. The intuition underlying L(F)(t) was discussed in Section 4: at a finitely-varying function f, L(F)(t) yields the evaluation at time t of a time-smoothed variant of f.
We formalize these intuitions below.The interpretations of F-function expressions and G-function expressions yield maps whose range is the interval [0, 1].
• The domain of F-function expressions is G, the set of generalized states.
• The domain of G-function expressions is the set of finitely-varying functions with range G.

Fix a GSMP. F-function expressions are evaluated as follows at a generalized state $\langle s, c\rangle$:

where µ is the distribution of $\mathrm{Traces}(\langle s, c\rangle)$. Note that in this definition f ranges over the paths of $\mathrm{Traces}(\langle s, c\rangle)$, so G(t) is a measurable function on the space of these paths and µ is a measure on these paths.
G-function expressions are evaluated as follows at a finitely-varying function f:

Thus, for $f \in \mathrm{Traces}(\langle s, c\rangle)$, L(F)(t)(f) is the upper Lipschitz approximation to $F^{s}_{k} \circ f$ evaluated at t.

9.2. A pseudometric from function expressions. We define a pseudometric $d_k$ as follows.
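The time smoothing performed by L(F)(t) can be sketched for a piecewise-constant signal g = F ∘ f. The sketch assumes that the upper Lipschitz approximation is the least Lipschitz function lying above g, namely $\hat g(t) = \sup_{u \ge 0} \big(g(u) - \mathrm{lip}\cdot|t - u|\big)$. The constant actually used is fixed in Section 4, so `lip` is left as a parameter defaulting to 1; the function name is hypothetical. On each constant piece, the supremum equals the piece's value minus `lip` times the distance from t to that piece.

```python
import math
from typing import List, Tuple

# Piecewise-constant signal: (start_time, value) pairs with increasing start
# times from 0 and values in [0, 1]; the last piece lasts forever.
Pieces = List[Tuple[float, float]]


def upper_lipschitz(pieces: Pieces, t: float, lip: float = 1.0) -> float:
    """sup over u >= 0 of g(u) - lip * |t - u| for piecewise-constant g."""
    best = -math.inf
    for i, (a, v) in enumerate(pieces):
        b = pieces[i + 1][0] if i + 1 < len(pieces) else math.inf
        # Distance from t to the closure of the piece [a, b).
        dist = max(a - t, 0.0, t - b)
        best = max(best, v - lip * dist)
    return best


# A short spike of height 1 on [2, 2.5), and 0 elsewhere.
spike = [(0.0, 0.0), (2.0, 1.0), (2.5, 0.0)]
```

The spike becomes a "tent": $\hat g$ equals 1 on the spike, falls off with slope 1 on either side (0.5 at t = 1.5 and t = 3) and is back to 0 by t = 4. This is how a jump in a path is turned into a continuous function of time.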
We proceed to show that the uniformities defined by these two metrics agree. Unlike the case of discrete-time systems, the metrics themselves do not agree: it is the uniformity that is common to both of them.
• Every G-function expression G satisfies

The key case in this proof is that of the F-function expression G(t). For this case, the induction hypothesis for G together with $k \le \frac{1}{2}$ yields that k × G(t) is 1-Lipschitz for the metric $J(m_{F_k})$. So, by the definition of the Wasserstein metric, we get the required inductive result for G(t).
This shows that the identity function from G with metric $m_{F_k}$ to G with metric $d_k$ is uniformly continuous.
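This direction rests on a general fact: if every test function is 1-Lipschitz for a base pseudometric m, then the pseudometric obtained as the sup of |f(x) − f(y)| over the tests is dominated by m. The Python sketch below checks this on a hypothetical three-state example in which each state is summarized by a single delay, and the tests are clipped threshold functions h(x) = max(x − q, 0). The states, the base metric and all names are illustrative assumptions, and the sketch does not construct $d_k$ itself.

```python
from typing import Callable, Dict, List, Tuple


def sup_pseudometric(states: List[str],
                     tests: List[Callable[[str], float]]
                     ) -> Dict[Tuple[str, str], float]:
    """d(x, y) = max over the test functions of |f(x) - f(y)|."""
    return {
        (x, y): max((abs(f(x) - f(y)) for f in tests), default=0.0)
        for x in states
        for y in states
    }


# Hypothetical toy states: each is just a delay before some action fires.
delay = {"a": 1.0, "b": 1.1, "c": 1.2}


def base_metric(x: str, y: str) -> float:
    # Assumed base metric: difference of delays, truncated at 1.
    return min(1.0, abs(delay[x] - delay[y]))


def threshold_test(q: float) -> Callable[[str], float]:
    # max(delay - q, 0) clipped to [0, 1]: 1-Lipschitz in the delay and
    # bounded by 1, hence 1-Lipschitz for base_metric.
    return lambda s: min(1.0, max(delay[s] - q, 0.0))


tests = [threshold_test(q) for q in (0.0, 0.5, 1.0)]
d = sup_pseudometric(list(delay), tests)
```

Here states a and b come out at distance about 0.1 and a and c at about 0.2, echoing the delay example from the introduction, and no distance exceeds the base metric.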
We prove the converse below, showing that $m_{F_k}$ is dominated by $d_k$. We use the fact that the closure ordinal of $F_k$ is ω. Let $m_0 = \top$ and $m_{i+1} = F_k(m_i)$. We show by induction on i that each $m_i$ is dominated by $d_k$, in the following two steps:
• Let f, g be such that $J(m_i)(f, g) > \epsilon$, where ε > 0. We show that there is a G-function expression G such that, for some t, G(t)(g) = 0 and G(t)(f) > ε.
Following the proof of Lemma 6.6, we get finite sets of traces $L_1, L_2$ satisfying $W(J(m_i))(L_1, L_2) > \epsilon + \frac{\gamma}{2}$, and it suffices to prove the result for finite linear combinations.
From the above item, for each pair of traces (one from $L_1$ and the other from $L_2$), there are G-function expressions $G_{ij}$ that are non-zero only on $f_1, \ldots, f_n$, are zero on $f'_1, \ldots, f'_m$, and yield arbitrarily close approximations to the distance between the pairs $f_i, f'_j$. The result now follows by considering $\max(G^{p}_{ij})$.
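The Wasserstein distance between finite sets of traces used in this step can be computed by brute force in the simplest case. The sketch below assumes uniform weights on two lists of equal length. For such measures an optimal coupling can be taken to be a permutation (by the Birkhoff–von Neumann theorem), so minimizing the average cost over perfect matchings gives the distance. The ground metric `dist` is a stand-in for $J(m_i)$, the toy "traces" are plain numbers, and the function name is hypothetical.

```python
from itertools import permutations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def wasserstein_uniform(xs: Sequence[T], ys: Sequence[T],
                        dist: Callable[[T, T], float]) -> float:
    """Kantorovich distance between uniform distributions on xs and ys."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("need equal-size, non-empty lists")
    # Average cost of the best perfect matching (n! candidates).
    return min(
        sum(dist(x, ys[j]) for x, j in zip(xs, perm)) / n
        for perm in permutations(range(n))
    )


def abs_dist(a: float, b: float) -> float:
    return abs(a - b)


w = wasserstein_uniform([0.0, 3.0], [3.1, 0.2], abs_dist)
```

The factorial search is only suitable for tiny inputs. For larger ones, the same matching problem can be handed to an assignment solver such as SciPy's `linear_sum_assignment`, and unequal weights require a general transport linear program.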

Conclusions
We have given a pseudo-metric analogue of bisimulation for GSMPs. We have shown that it depends only on the underlying uniformity, and that quantities of interest are continuous in this metric. We have given a coinduction principle and a logical characterization reminiscent of previous work on weak bisimulation for discrete-time concurrent Markov chains.
The previous approaches to bisimulation work well for CTMCs precisely because the distribution is memoryless: at any given instant, the expected duration in a state and the transition probabilities depend only on the current state of the system, so one can define a bisimulation on the state space. In contrast, describing bisimulation for real-time processes with general, rather than memoryless, distributions has been a vexing problem. In the present work, we have shifted emphasis to the generalized states, which incorporate time, and have not tried to define a bisimulation on the ordinary states. Because the generalized states embody the quantitative temporal information, we have to work metrically; an attempt to define bisimulation directly would have fallen afoul of the approximate nature of the timing information.
If we want to move to continuous state spaces and stochastic hybrid systems, the whole dynamical formalism has to change: one can no longer think of paths as càdlàg functions. We will have to use stochastic differential equations to describe the systems, together with the space of sample paths for the trajectories. That is a subject for future work, and one that we have been heading towards since the inception of our work on LMPs [BDEP97, DEP02].
$\mathrm{Traces}\langle s, c\rangle\}$ with the distribution inherited from $\mathrm{Traces}\langle s, c\rangle$. This induces a matching between the traces in $\mathrm{Traces}\langle s, c\rangle$ and those of $\mathrm{Traces}\langle s', c'\rangle$, which in turn induces a distribution on the product space $\mathrm{Traces}\langle s, c\rangle \times \mathrm{Traces}\langle s', c'\rangle$. Using this distribution as the ρ in the dual form of the definition, we get the result.

Theorem 9.3. The uniformity induced by $d_k$ coincides with the uniformity induced by $m_{F_k}$, the maximum fixed point of $F_k$.

Proof. We demonstrate that the identity function is a uniformly continuous isomorphism between G equipped with the metrics $d_k$ and $m_{F_k}$. Consider the identity function from G with metric $m_{F_k}$ to G with metric $d_k$. A mutual inductive proof shows that:
• Every F-function expression F satisfies $|F(\langle s, c\rangle) - F(\langle s', c'\rangle)| \le m_{F_k}(\langle s, c\rangle, \langle s', c'\rangle)$.

The second non-example shows that continuous functions do not approximate a function with jumps.