Quantitative Approximation of the Probability Distribution of a Markov Process by Formal Abstractions

The goal of this work is to formally abstract a Markov process evolving in discrete time over a general state space as a finite-state Markov chain, with the objective of precisely approximating its state probability distribution in time, which allows for its approximate, faster computation by that of the Markov chain. The approach is based on formal abstractions and employs an arbitrary finite partition of the state space of the Markov process, and the computation of average transition probabilities between partition sets. The abstraction technique is formal, in that it comes with guarantees on the introduced approximation that depend on the diameters of the partitions: as such, they can be tuned at will. Further in the case of Markov processes with unbounded state spaces, a procedure for precisely truncating the state space within a compact set is provided, together with an error bound that depends on the asymptotic properties of the transition kernel of the original process. The overall abstraction algorithm, which practically hinges on piecewise constant approximations of the density functions of the Markov process, is extended to higher-order function approximations: these can lead to improved error bounds and associated lower computational requirements. The approach is practically tested to compute probabilistic invariance of the Markov process under study, and is compared to a known alternative approach from the literature.


Introduction
Verification techniques and tools for deterministic, discrete time, finite-state systems have been available for many years [20].Formal methods in the stochastic context are typically limited to finite-state structures, either in continuous or in discrete time [5,23].Stochastic formulae or to reward-based criteria characterised via value function recursions.Moreover, the constructed Markov chain can be shown to represent an approximate probabilistic bisimulation of the original process [1,29,3].
The article is structured as follows.Section 2 introduces the model under study and discusses some structural assumptions needed for the abstraction procedure.The procedure comprises two separate parts: Section 3 describes the truncation of the dynamics of the model, whereas Section 4 details the abstraction of the dynamics (approximation of the transition kernel) -both parts contribute to the associated approximation error.Section 5 considers higher-order approximation schemes and quantifies the introduced approximation error.Section 6 specialises these higher-order schemes to explicit algorithms for low-dimensional models using known interpolation bases.Section 7 discusses the application of the procedure to the computation of probabilistic invariance, and compares it against an alternative approach in the literature.

Models, Preliminaries, and Goals of this work
We consider a discrete time Markov process M s defined over a general state space, which is characterised by a pair (S, T s ), where S is a continuous state space that we assume endowed with a metric and be separable 1 .We denote by (S, B(S), P) the probability structure on S, with B(S) being the Borel σ-algebra 2 in S and P a probability measure to be characterised shortly.T s is a stochastic kernel that assigns to each point s ∈ S a probability measure T s (•|s), so that for any measurable set A ∈ B(S), P(s(1) ∈ A|s(0) = s) = T s (A|s).We assume that the stochastic kernel T s admits a density function t s , namely T s (A|s) = A t s (s|s)ds.Given the measurable space (S, B(S), P), we set up the product space S t+1 containing elements s(t) = [s(0), s(1), . . ., s(t)], where the bold typeset is used in the sequel to indicate vector quantities.Suppose that the initial state of the Markov process M s is distributed according to the density function π 0 : S → R ≥0 .Then the multi-variate density function π 0 (s 0 )t s (s 1 |s 0 )t s (s 2 |s 1 ) . . .t s (s t |s t−1 ) is a probability measure P on the product space S t+1 .On the other hand the state distribution of M s at time t ∈ N is characterised by a density function π t : S → R ≥0 , which fully describes the statistics of the process at time t and is in particular such that, for all A ∈ B(S), where the symbol P is used to indicate the probability associated to events over the product space S t+1 (note that the event s(t) ∈ A is equivalent to s(t) ∈ S t ×A on their corresponding probability spaces).
The state density functions π t (•) can be characterised recursively, as follows: π t+1 (s) = S t s (s|s)π t (s)ds ∀s ∈ S. (2.1) 1 A metric space S is called separable if it has a countable dense subset. 2 The Borel σ-algebra B(S) is the smallest σ-algebra in S that contains all open subsets of S. For a separable metric space S, B(S) equals the σ-algebra generated by the open (or closed) balls of S.
In practice the forward recursion in (2.1) rarely yields a closed form for π t (•).A special instance where this is the case is represented by a linear dynamical system perturbed by a Gaussian process noise: due to the closure property of Gaussian distributions over addition and multiplication by a constant, it is possible to explicitly write recursive formulae for the mean and the variance of the distribution, and thus express in a closed form the distribution in time of the solution process.In more general cases, it is necessary to numerically (hence, approximately) compute this density function in time.
This article provides a numerical approximation of the density function of M s in time as the probability mass function (pmf) of a finite-state Markov chain M p .The Markov chain M p is obtained as an abstraction of the concrete Markov process M s .The abstraction is associated with a guaranteed and tunable error bound, and algorithmically it leverages a state-space partitioning procedure.The procedure is comprised of two steps: (1) since the state space S is generally unbounded, it is first properly truncated; (2) subsequently, a partition of the truncated dynamics is introduced.Section 3 discusses the error generated by the state-space truncation, whereas Section 4 describes the construction of the Markov chain by state-space partitioning.The discussed Markov chain abstraction is based on a piecewise-constant approximation of the density functions.In order to improve the efficiency and the precision of the approximation, we generalise the abstraction method in Section 5 utilizing higher-order approximations of the density functions.We employ the following example throughout the article as a running case study.
Example 2.1.Consider the one-dimensional stochastic dynamical system where the parameters a, σ > 0, whereas b ∈ R, and w(•) is a process comprised of independent, identically distributed random variables with a standard normal distribution.The initial state of the process is selected uniformly within the bounded interval [β 0 , γ 0 ] ⊂ R. The solution of the model is a Markov process, evolving over the state space S = R, and fully characterised by the conditional density function We raise the following assumptions in order to relate the state density function of M s to the probability mass function of M p .More precisely, these assumptions are employed for the computation of the approximation error, but the abstraction approach proposed in this article can be applied without raising them.Assumption 2.2.For given sets Γ ⊂ S 2 and Λ 0 ⊂ S, there exist positive constants ǫ and ε 0 , such that t s (s|s) and π 0 (s) satisfy the following conditions: Assumption 2.3.The density functions π 0 (s) and t s (s|s) are (globally) Lipschitz continuous, namely there exist finite constants λ 0 , λ f , such that the following Lipschitz continuity conditions hold: (2.4) Moreover, there exists a finite constant M f such that The Lipschitz constants λ 0 , λ f are practically computed by taking partial derivatives of the density functions π 0 (•), t s (•|s) and maximising their norm.The sets Λ 0 and Γ will be used to truncate the support of the density functions π 0 (•) and t s (•|s), respectively.Assumption 2.2 enables the precise study of the behaviour of density functions π t (•) over the truncated state space.Furthermore, the Lipschitz continuity conditions in Assumption 2.3 are essential to derive error bounds related to the abstraction of the Markov process over the truncated state space.In order to compute these error bounds, we assign the infinity norm to the space of bounded measurable functions over the state space S, namely . = {g : S → R, g bounded and measurable}.

State-Space Truncation Procedure
We limit the support of the density functions π 0 , t s to the sets Λ 0 , Γ respectively, and recursively compute support sets Λ t , as in (3.2), that are associated to the density functions π t .Then we employ the quantities ǫ, ε 0 in Assumption 2.2 to compute bounds ε t , as in (3.1), on the error incurring in disregarding the value of the density functions π t outside the sets Λ t .Finally we truncate the original, unbounded state space to the set Υ = ∪ N t=0 Λ t .As intuitive, the error related to the spatial truncation depends on the behaviour of the conditional density function t s over the eliminated regions of the state space.Suppose that sets Γ, Λ 0 are selected such that Assumption 2.2 is satisfied with constants ǫ, ε 0 : then Theorem 3.3 provides an upper bound on the error obtained from manipulating the density functions in time π t (•) exclusively over the truncated regions of the state space.Theorem 3.1.Under Assumption 2.2 the functions π t satisfy the bound where the quantities {ε t , t ∈ Z N } are defined recursively by whereas the support sets {Λ t , t ∈ Z N } are computed as where Π s denotes the projection map along the second set of coordinates 3 .
Remark 3.2.Notice that if the shape of the sets Γ and Λ 0 is computationally manageable (e.g., if these sets are polytopes), then it is possible to precisely implement the computation of the recursion in (3.2) by available software tools, such as the MPT toolbox [22].Further, if for some t 0 , Λ t 0 +1 ⊃ Λ t 0 , then for all t ≥ t 0 , Λ t+1 ⊃ Λ t .Similarly, we have that • if for some t 0 , Λ t 0 +1 ⊂ Λ t 0 , then for all t ≥ t 0 , Λ t+1 ⊂ Λ t .
• if for some t 0 , Λ t 0 +1 = Λ t 0 , then for all t ≥ t 0 , Λ t = Λ t 0 .In order to clarify the role of Γ in the computation of Λ t , we emphasize that Λ t+1 = ∪ s∈Λt Ξ(s), where Ξ depends only on Γ and is defined by the set-valued map Let us introduce the quantity κ(t, M f ), which plays a role in the solution of (3.1) and will be frequently used shortly: The following theorem provides a truncation procedure, valid over a finite time horizon Z N = {0, 1, . . ., N }, which reduces the state space S to the set Υ = N t=0 Λ t .The theorem also formally quantifies the associated truncation error.
Theorem 3.3.Suppose that the state space of the process M s has been truncated to the set Υ = N t=0 Λ t .Let us introduce the following recursion to compute functions µ t : S → R ≥0 as an approximation of the density functions π t : Then the introduced approximation error is Recapitulating, Theorem 3.3 leads to the following procedure to approximate the density functions π t of M s over an unbounded state space S: (1) truncate π 0 in such a way that µ 0 has a bounded support Λ 0 ; (2) truncate the conditional density function t s (•|s) over a bounded set for all s ∈ S, then quantify Γ ⊂ S 2 as the support of the truncated density function; (3) leverage the recursion in (3.2) to compute the support sets Λ t ; (4) use the recursion in (3.4) to compute the approximate density functions µ t over the set Υ = ∪ N t=0 Λ t .Note that the recursion in (3.4) is effectively computed over the set Υ, since µ t (s) = 0 for all s ∈ S\Υ. 3 Recall that both Γ and Λ × S are defined over S 2 = S × S. Note that we could as well handle the support of µ t (•) over the time-varying sets Λ t , by adapting the recursion in (3.4) with 1 Λ t+1 instead of 1 Υ .However, while employing the (larger) set Υ may lead to an increase in memory requirements at each stage, it will considerably simplify the computations of the state-space partitioning and of the Markov chain abstraction: indeed, employing time-varying sets Λ t would render the partitioning procedure also time-dependent, and the obtained Markov chain would be time-inhomogeneous.We therefore opt to work directly with set Υ in order to avoid these difficulties.
Example 2.1 (Continued).We easily obtain a closed form for the sets Λ t = [β t , γ t ], via β t+1 = aβ t + b − ασ, γ t+1 = aγ t + b + ασ.Set Υ is the union of intervals [β t , γ t ].The error of the state-space truncation over Υ is The recursion in (3.4) indicates that the support of functions µ t are always contained in the set Υ (namely, they are equal to zero over the complement of Υ).We are thus only interested in computing these functions over the set Υ, which allows simplifying the recursion in (3.4) as follows: (3.5)

Markov Chain abstraction via State-Space Partitioning
In this section we assume that sets Γ, Λ 0 have been properly selected so that Υ = ∪ N t=0 Λ t is bounded.In order to formally abstract process M s as a finite Markov chain M p and to approximate its state density functions, we select a finite partition of the bounded set Υ as Υ = ∪ n i=1 A i , where sets A i have non-trivial measure.We then complete the partition over the whole state space S = ∪ n+1 i=1 A i by additionally including set A n+1 = S\Υ.This results in a finite Markov chain M p with n+1 discrete abstract states in the set N n+1 = {1, 2, . . ., n, n+ 1}, and characterised by the transition probability matrix P = [P ij ] ∈ R (n+1)×(n+1) , where the probability of jumping from any pair of states i to j (P ij ) is computed as for all j ∈ N n+1 , and where δ (n+1)j is the Kronecker delta function (the abstract state n + 1 of M p is absorbing), and L(•) denotes the Lebesgue measure of a set (i.e., its volume).The quantities in (4.1) are well-defined since the set Υ is bounded and the measures L(A i ), i ∈ N n , are finite and non-trivial.Notice that matrix P for the Markov chain M p is stochastic, namely The initial distribution of M p is the pmf p 0 = [p 0 (1), p 0 (2), . . ., p 0 (n + 1)], and it is obtained from π 0 as p 0 (i) = A i π 0 (s)ds, ∀i ∈ N n+1 .Then the pmf associated to the state distribution of M p at time t can be computed as p t = p 0 P t .It is intuitive that the discrete pmf p t of the Markov chain M p approximates the continuous density function π t of the Markov process M s .In the rest of the section we show how to formalise this relationship: p t is used to construct an approximate function, denoted by ψ t , of the density π t .Theorem 4.2 shows that ψ t is a piecewise constant approximation (with values in its codomain that are the entries of the pmf p t , normalised by the Lebesgue measure of the associated partition set) of the original density function π t .Moreover, under the continuity assumption in (2.4) (ref.Lemma 4.1) we can establish the Lipschitz continuity of π t , which enables the quantification (in Theorem 4.2) of the error related to its piecewise constant approximation ψ t .Lemma 4.1.Suppose that the inequality in (2.4) holds.Then the state density functions π t (•) are globally Lipschitz continuous with constant λ f for all t ∈ N: Theorem 4.2.Under Assumptions 2.2 and 2.3, the functions π t (•) can be approximated by piecewise constant functions ψ t (•), defined as where 1 B (•) is the indicator function of a set B ⊂ S. The approximation error is upperbounded by the quantity ) where E t can be recursively computed as and δ is an upper bound on the diameters of the partition sets Note that the functions ψ t are defined over the whole state space S, but (4.2) implies that they are equal to zero outside the set Υ.
Corollary 4.3.The recursion in (4.4) admits the explicit solution Underlying Theorem 4.2 is the fact that ψ t (•) are in general sub-stochastic density functions: This is clearly due to the fact that we are operating on the dynamics of M s truncated over the set Υ.It is thus intuitive that the approximation procedure and the derived error bounds are also valid for the case of sub-stochastic density functions [3], namely the only difference being that the obtained Markov chain M p is as well sub-stochastic.Further, whenever the Lipschitz continuity requirement on the initial density function, as per (2.3) in Assumption 2.3, does not hold, (for instance, this is the case when the initial state of the process is deterministic) we can relax this continuity assumption on the initial distribution of the process by starting the discrete computation from the time step t = 1.In this case we define the pmf p 1 = [p 1 (1), p 1 (2), . . ., p 1 (n + 1)], where and derive p t = p 1 P t−1 for all t ∈ N. Theorem 4.2 follows along similar lines, except for equation (4.4), where the initial error is set to E 0 = 0 and the time-dependent terms E t can be derived as It is important to emphasise the practical computability of the derived errors, and the fact that they can be tuned by selecting a finer partition of set Υ that relates to a smaller global parameter δ.Further, in order to attain abstractions that are practically useful, it imperative to seek improvements on the derived error bounds: in particular, the approximation errors can be computed locally (under corresponding local Lipschitz continuity assumptions), following the procedures discussed in [13].
Example 2.1 (Continued).The error of the Markov chain abstraction can be expressed as The error can be tuned in two distinct ways: (1) by selecting larger values for α, which on the one hand leads to a less narrow truncation, but on the other requires the partition of a larger interval; (2) by reducing partitions diameter δ, which of course results in a larger cardinality of the partition sets.Let us select values b = 0, β 0 = 0, γ 0 = 1, σ = 0.1, and time horizon N = 5.For a = 1.2 we need to partition the interval Υ = [−0.75α,2.49 + 0.75α], which results in the error π t − ψ t ∞ ≤ 86.8δ+35.9φ 1 (α) for all t ∈ Z N .For a = 0.8 we need to partition the smaller interval Υ = [−0.34α,0.33 + 0.34α], which results in the error π t − ψ t ∞ ≤ 198.6δ + 82.1φ 1 (α) for all t ∈ Z N .Notice that in the case of a = 1.2, we partition a larger interval and obtain a smaller error, while for a = 0.8 we partition a smaller interval with correspondingly a larger error.It is obvious that the parameters δ, α can be chosen properly to ensure that a certain error precision is met.This simple model admits a solution in closed form, and its state density functions can be obtained as the convolution of a uniform distribution (the contribution of initial state) and a zero-mean Gaussian distribution with time-dependent variance (the contributions of the process noise).This leads to the plots in Figure 2, which display the original and the approximated state density functions for the set of parameters α = 2.4, δ = 0.05.

Higher-Order Approximation Schemes
In the previous section we have shown that a Markov chain abstraction can be employed to formally approximate the density function of a Markov process in time.This abstraction is interpreted as a piecewise-constant approximation of the density function of the Markov model.In this section we argue that this procedure can be extended to approximation techniques based on higher-order interpolations.With focus on the truncated region of the state space, let us denote with B(Υ) the space of bounded and measurable functions f : Υ → R, equipped with the infinity norm characterises the solution of the recursion in (3.5) as µ t (s) = R t Υ (µ 0 )(s), for any t ∈ N N .While in Section 4 we have proposed approximations of functions µ t (•) by piecewise-constant functions ψ t (•) with an explicit quantification of the associated error, in this section we are interested in considering approximations via higher-order interpolations.5.1.Quantification of the Error of a Projection Over a Function Space.Consider a set of basis functions Φ = {φ 1 (s), φ 2 (s), . . ., φ h (s)}, h ∈ N, the function space Ψ = span Φ generated by this set as a subset of B(Υ), and a linear operator Π Υ : B(Υ) → Ψ, which projects any function f ∈ B(Υ) onto the function space Ψ.Theorem 5.1 provides a theoretical result for approximating the solution of (3.5): the following section provides details on turning this result into a useful tool for approximations.
Theorem 5.1.Assume that a linear projection operator and that there exists a finite constant M h f , such that Then it holds that where the error E h t satisfies the difference equation Corollary 5.2.Under the assumptions raised in (5.2)-(5.3), the error E h t can be alternatively expressed explicitly as The error E h t formulated in Theorem 5.1 is comparable with the quantity E t computed in Theorem 4.2.Both E h t , E t represent bounds on the approximation error introduced by µ t (•), the density function obtained after state space truncation.The difference is in the initialisation of the corresponding recursions, where we have E h 0 = 0 because ψ h 0 = µ 0 , but E 0 = λ 0 δ since ψ 0 is a piecewise constant approximation of µ 0 in (4.2).As we mentioned before, the quantities in (4.5) can be alternatively employed as the starting values of the computation to relax the continuity assumption on π 0 , which results in an initial error E 0 = 0, thus providing a complete similarity between E t and E h t .

Construction of the Projection Operator.
In the ensuing sections we focus, for the sake of simplicity, on a Euclidean domain, namely Υ ⊂ S = R d , where d denotes a finite dimension.We discuss a general form for an interpolation operator related to the discussed projection operation.Let φ j : D ⊂ R d → R, j ∈ N h , be independent functions defined over a generic set D. The interpolation operator Π D is defined as a projection map onto the function space Ψ = span{φ 1 (s), φ 2 (s), . . ., φ h (s)}, which projects any function f : D → R to a unique function Π D (f ) = h j=1 α j φ j , using a finite set of data {(s j , f (s j ))|s j ∈ D, j ∈ N h } and such that Π D (f )(s j ) = f (s j ).The projection coefficients α j , j ∈ N h , satisfy the linear equation f = Qα, where f = [f (s i )] i∈N h and α = [α j ] j∈N h are h-dimensional column vectors, and Q = [φ j (s i )] i,j is the associated (h × h)-dimensional interpolation matrix.
Let us now shift the focus to the recursion in (3.5) discussed in the previous section and tailor the operators above accordingly.Let us select a partition {A i , i ∈ N n } for the set Υ, with finite cardinality n.Selecting a basis {φ ij , j ∈ N h } for each partition, let us introduce the interpolation operators Π A i for the projection over each partition set A i , which is done as described above by replacing the domain D with A i .Finally, let us introduce the (global) linear operator Π Υ , acting on a function f : Υ → R by where f | A i represents the restriction of the domain of function f to the partition set A i .

Approximation
Algorithm.An advantage of the interpolation operator in (5.6) is that Π Υ (f ) is fully characterised by the interpolation coefficients α ij , since The set of interpolation coefficients α ij is computable by matrix multiplication based on the data set {f These matrices depend solely on the interpolation points s ij and on the basis functions φ ij evaluated at these points and can be computed off-line (see step 4 in Algorithm 1, to be discussed shortly).Moreover, values of the function f only at the interpolation points s ij are sufficient for the computation of α ij .
Let us now focus on the recursion in (5.4), namely ψ h t+1 = Π Υ R Υ (ψ h t ), given the initialisation ψ h 0 = µ 0 , for the approximate computation of the value functions.This recursion indicates that the approximate functions ψ h t , t ∈ N N , belong to the image of the operator Π Υ , and as such can be expressed as where α t ij denote the interpolation coefficients referring to function ψ h t (at step t).This suggests that we need to store and update the coefficients α t ij for each iteration in (5.4).

Writing the recursion in the form
is in the range of the projection Π Υ .Therefore, it is sufficient to evaluate the function R Υ (ψ h t ) over the interpolation points in order to compute the coefficients α t+1 ij .In the following expressions, the pair i, u indicates the indices of related partition sets, namely A i , A u , whereas the pair of indices j, v show the ordering positions within partition sets.For an arbitrary interpolation point s uv we have: Introducing the following quantities we can succinctly express Algorithm 1 provides a general procedure for the discrete computation of the interpolation coefficients and of the approximate value functions. Algorithm Compute values β t+1 uv as β t+1 uv = i j α t ij P uv ij , for all u, v 9: It is possible to simplify Algorithm 1 when the interpolation matrices Q i are nonsingular.Let us transform the basis {φ ij , j ∈ N h } to its equivalent basis using matrix The interpolation matrices corresponding to the new basis will be the identity matrix.In other words, the new basis functions admit Q i = I h and thus step 4 can be skipped, and that the main update (steps 7 and 8) can be simplified as follows: In Algorithm 1, the interpolation points s ij are in general pair-wise distinct.By extending the domain of interpolation A i to its closure Āi , it is legitimate to use boundary points as interpolation points, which can lead to a reduction of the number of integrations required in Algorithm 1.In the ensuing sections, we will exploit this feature by specifically selecting equally spaced interpolation points.

Special Forms of the Projection Operator
In this section we leverage known interpolation theorems for the construction of the projection operator Π Υ : this should both yield useful schemes for a number of standard models, and further help with the understanding of the details discussed in the previous section.6.1.Piecewise Constant Approximations.We focus on the special case of the approximation of a function by a piecewise constant one, which has inspired Section 4. Let us select the basis functions φ ij (s) = 1 for all i ∈ N n , j ∈ N 1 -the cardinality of these sets of basis functions is simply equal to h = 1 (we eliminate the corresponding indices when appropriate).In this case the matrix operators Q i , i ∈ N n , (cf.step 4 in Algorithm 1) correspond to the identity matrix, and the projection operator Π Υ becomes where the quantities P uv ij (cf.step 3 in Algorithm 1) form a square matrix (see step 3 in Algorithm 2).The procedure is detailed in Algorithm 2, while the associated error is formally quantified in Theorem 6.1.
Algorithm 2 Piecewise constant computation of the functions ψ h t Require: Density function t s (s|s), set Υ 1: Select a finite n-dimensional partition of the set Υ = ∪ n i=1 A i (A i are non-overlapping) 2: For each A i , select one representative point s i ∈ A i 3: Compute matrix P = [P (i, j)] i,j with entries P (i, j) = A i t s (s j |s)ds, where i, j ∈ N n 4: Set t = 1 and α 1 (i) = Υ t s (s i |s)µ 0 (s)ds, for all i 5: if t < N then 6: Compute the row vector α t+1 = [α t+1 (i)] i based on α t+1 = α t P 7: Suppose the density function t s (•|s) satisfies the Lipschitz continuity assumption (2.4) with constant λ f .Then the projection operator (6.1) satisfies the inequality where δ = max i δ i is the partition diameter of ∪ n i=1 A i = Υ, with δ i = sup{ s − s ′ : s, s ′ ∈ A i }.Theorem 5.1 ensures that the approximation error of Algorithm 2 is upper bounded by the quantity Notice that the error E h t of Theorem 6.1 reduces to E t in Theorem 4.2 when employing the quantities in (4.5) to relax the continuity assumption on π 0 .
Let us compare Algorithms 1 and 2 in terms of their computational complexity.Algorithm 1 requires nh(nh + 1) integrations in the marginalisation steps (3 and 5), whereas n(n + 1) integrations are required in Algorithm 2. Furthermore, steps 4 and 7 in Algorithm 1 can be skipped by using proper equivalent basis functions, whereas these steps are not needed at all in Algorithm 2. As a bottom line, higher interpolation orders increase the computational complexity of the approximation procedure, however this can as well lead to a lower global approximation error.From a different perspective, since the global approximation error depends on the local partitioning sets (their diameter and the local continuity of the density function), for a given error higher interpolation procedures may require less partitions sets.
As a final note, comparing the transition probabilities of (4.1) with quantities P (i, j) in step 3 of Algorithm 2 reveals that the Markov chain abstraction presented in Section 4 is a special case of Algorithm 2.More precisely, the mean value theorem for integration ensures the existence of representative points s i such that P (i, j) of Algorithm 2 is equal to P ij in (4.1).6.2.Higher-order Approximations for One-Dimensional Systems.We study higherorder interpolations over the real axis, where the partition sets A i are real-valued intervals.We use this simple setting to quantify the error related to the approximate computation of the functions µ t .We select equally spaced points as the interpolation points and employ polynomial basis functions within each interval.
Consider a one dimensional Markov process, S = R, with a partitioning of Υ = ∪ n i=1 A i which is such that Define the interpolation operator Π Υ of (5.6) over the polynomial basis functions φ ij (s) = s j−1 , i ∈ N n , j ∈ N h , h ≥ 2, (or their equivalent Lagrange polynomials [24]) using equally spaced interpolation points The following result can be adapted from [24].
Theorem 6.2.Assume that the density function t s (•|s) is h-times differentiable and define the constant The interpolation operator Π Υ , constructed with polynomial basis functions and equally spaced interpolation points, satisfies the inequality where δ = max i δ i , with δ i = b i − a i , i ∈ N n , and where h is the cardinality of the set of basis functions.Theorem 6.2 provides the necessary ingredients for Theorem 5.1, leading to the quantification of the approximation error: employing Algorithm 1 with equally spaced points and polynomial basis functions of degree less than h, the approximation error is upper bounded by the quantity 3) and computed for this particular choice of basis functions and points.
It is worth highlighting that, unlike the piecewise constant case of Section 4, with higherorder approximation approaches the global error is a nonlinear function of the partition size δ, namely it depends on a power of the partition size contingent on the order of the selected interpolation operator.As such, its convergence speed, as δ is decreased, increases over that of the piecewise constant case.Example 2.1 (Continued).We partition the set Υ = ∪ n i=1 A i for the one dimensional system of Example 2.1 with the intervals A i = [a i , b i ].We select interpolation points {a i , a i+1 } with polynomial basis functions {1, s}, leading to piecewise affine approximations (namely, first-order interpolation with h = 2) of the density functions π t (•).This set of basis functions can be equivalently transformed to The constant M h f has the same value as M f = 1/a and the quantity M 2 in Theorem 6.2 is M 2 = 1/σ 3 √ 2π.The error related to this first-order approximation can be upper bounded as Notice that the first part of the error in (6.2), which specifically relates to the first-order approximation, is proportional to δ 2 -this improves the error bound computed in (4.6).Algorithm 1 is implemented for this linear system with the aforementioned parameters b = 0, β 0 = 0, γ 0 = 1, σ = 0.1, and the time horizon N = 5.The errors corresponding to the values a = 1.2 and a = 0.8 are analytically upper-bounded, for any t ∈ N N , as π t − ψ h t ∞ ≤ 179δ 2 + 35.9φ 1 (α) and as π t − ψ h t ∞ ≤ 409.3δ 2 + 82.1φ 1 (α), respectively.The plots in Figure 3 display the first-and zero-order approximations of the density function π N (•) and compare it with the analytic solution for two different values a = 1.2 (left) and a = 0.8 (right).The partition size n = 25 and parameter α = 2.4 have been selected in order to illustrate the differences of the two approximation methods in Figure 3, but may be increased at will in order to decrease the related error bound to match a desired value.6.3.Bilinear Interpolation for Two-Dimensional Systems.We directly tailor the results of Section 5 to a general two-dimensional Markov process, where s = (s 1 , s 2 ) ∈ S = R 2 .Assume that set Υ is replaced by a superset that is comprised of a finite union of rectangles: this replacement does not violate the bound on the truncation error formulated in Section 3. Consider a uniform partition (using squared partition sets of size δ) for the set Υ. We employ a bilinear interpolation within each partition set or their equivalent Lagrange polynomials [11]).Assume that the density function t s (•|s) is partially differentiable and define the following bounds on its derivatives The operator Π Υ in (5.6), constructed with bilinear interpolation within each partition set, satisfies the inequality We implement Algorithm 1 for two-dimensional processes using bilinear interpolation: the approximation error is upper-bounded by the quantity 3) and computed for this particular choice of basis functions and points.It can be proved that for bilinear interpolation basis functions, the constant M h f of (5.3) is upper bounded by M f of Assumption 2.3, and thus can be replaced by this quantity in the error computation.6.4.Trilinear Interpolation for Three-Dimensional Systems.We now apply the results of Section 5 to a general three-dimensional Markov process, where s = (s 1 , s 2 , s 3 ) ∈ S = R 3 .Again we replace the set Υ by a superset that is comprised of a finite union of boxes, without violating the bound on the truncation error formulated in Section 3. Consider a uniform partition (using cubic sets of size δ) for the set Υ. We employ a trilinear interpolation within each partition set with basis functions Assume that the density function t s (•|s) is partially differentiable and define the following bounds on its derivatives We implement Algorithm 1 for three-dimensional processes using trilinear interpolation in the operator Π Υ (5.6).The approximation error is then upper bounded by the quantity Similar to the bilinear interpolation case, the constant M h f can be replaced by M f of Assumption 2.3 in the error computation.

Application of the Formal Approximation Procedure to the Probabilistic Invariance Problem
The problem of probabilistic invariance (or, equivalently, safety) for general Markov processes has been theoretically characterised in [4] and further investigated computationally in [2,10,11,12].With reference to a discrete-time Markov process M s over a continuous state space S, and to a safe set A ∈ B(S), the goal is to quantify the probability More generally, it is of interest to quantify the probability p N π 0 (A), where the initial condition of the process s(0) is a random variable characterised by the density function π 0 (•).In Section 7.1 we present a forward computation of probabilistic invariance by application of the approximation procedure above, then review results on backward computation [2,10,11,12] in Section 7.2.We conclude in Section 7.3 with a comparison of the two approaches.
7.1.Forward Computation of Probabilistic Invariance.The technique for approximating the density function of a process in time can be easily employed for the approximate computation of probabilistic invariance.Define functions W t : S → R ≥0 , characterised as Then the solution of the problem is obtained as p N π 0 (A) = S W N (s)ds.A comparison of the recursions in (7.1) and in (3.4) reveals how probabilistic invariance can be computed as a special case of the general approximation procedure in this work.In applying the procedure, the only difference consists in replacing set Υ by the safe set A, and in restricting Assumption 2.3 to hold over the safe set -the solution over the complement of this set is trivially known, as such the error related to the truncation of the state space can be disregarded.The procedure consists in partitioning the safe set, in constructing the Markov chain M p as per (4.1), and in computing ψ t (•) as an approximation of W t (•) based on (4.2).The error of this approximation is W t − ψ t ∞ ≤ E t , which results in the following: Note that the sub-density functions satisfy the inequalities 7.2.Backward Computation of Probabilistic Invariance.The contributions in [2,10,11,12] have characterised specifications in PCTL with an alternative formulation based on backward recursions.In particular, the computation of probabilistic invariance can be obtained via the value functions V t : S → [0, 1], which are characterised as The desired probabilistic invariance is expressed as p N π 0 (A) = S V 0 (s)π 0 (s)ds.The value functions always map the state space to the interval [0, 1] and they are non-increasing, namely V t (s) ≤ V t+1 (s) for any fixed s ∈ S. The contributions in [2,10,11,12] discuss efficient algorithms for the approximate computation of the quantity p N π 0 (A), relying on different assumptions on the model under study.The easiest and most straightforward procedure is based on the following assumption [2].
Assumption 7.1.The conditional density function of the process is globally Lipschitz continuous with respect to the conditional state within the safe set.Namely, there exists a finite constant λ b , such that The procedure introduces a partition of the safe set A = ∪ n i=1 A i and extends it to S = ∪ n+1 i=1 A i , with A n+1 = S\A.Then it selects arbitrary representative points s i ∈ A i and constructs a finite-state Markov chain M b over the finite state space {s 1 , s 2 , . . ., s n+1 }, endowed with transition probabilities P (s i , s j ) = A j t s (s|s i )ds, P (s n+1 , s j ) = δ (n+1)j , ( for all i ∈ N n , j ∈ N n+1 .The error of such an approximation is [11]: where δ is the max partitions diameter, and L(A) is the Lebesgue measure of set A.

7.3.
Comparison of the Two Approaches.We first compare the two constructed Markov chains.The Markov chain M p obtained with the abstraction from the forward approach is a special case of the Markov chain M b from the backward approach: in the latter case in fact the representative points can be selected intelligently to determine the average probability of jumping from one partition set to another.More specifically, the quantities (4.1) are a special case of those in (7.3) (based on the mean value theorem for integration).We will show that this leads to a less conservative (smaller) error bound for the approximation.The forward computation is in general more informative than the backward computation since it provides not only the solution of the safety problem in time, but also the state distribution over the safe set.Further the forward approach may provide some insight to the solution of the infinite-horizon safety problem [28,30] for a given initial distribution.As discussed in [30], solution of the infinite-horizon safety problem depends on the existence of absorbing subsets of the safe set.The outcome of the forward approach can provide evidence on the non existence of such subsets.Finally, the forward approach presented in Sections 3-6 for approximating density functions can be used to approximate the value functions in the recursion (7.1) over unbounded safe sets since we do not require the state space (thus also the safe set) to be bounded, while boundedness of the safe set is required in all the results in the literature that are based on backward computations.
Next, we compare errors and related assumptions.The error computations rely on two different assumptions: the Lipschitz continuity of the conditional density function with respect to the current state or to the next state, respectively.Further, the constants M f and M b are generally different and play an important role in the form of the error.M b represents the maximum probability of remaining within a given set, while M f is an indication of the maximum concentration of the process evolution towards one state, over a single time-step.M b is always less than or equal to one, while M f could be any finite positive number.
If 0 < a < 1, the system trajectories converge to an equilibrium point (in expected value).
In this case the model solution has higher chances of ending up in a neighbourhood of the equilibrium in time, and the backward recursion provides a better error bound.If a > 1, the system trajectories tend to diverge with time.In this case the forward recursion provides a much better error bound, compared to the backward recursion.
For the numerical simulation we select a safety set A = [0, 1], a noise level σ = 0.1, and a time horizon N = 10.The solution of the safety problem for the two cases a = 1.2 and a = 0.8 is plotted in Figure 4. We have computed constants λ f = 24.20,M b = 1 (in both cases), while λ b = 29.03,M f = 0.83 for the first case and λ b = 19.36,M f = 1.25 for the second case.We have selected the center of the partition sets (distributed uniformly over the set A) as representative points for the Markov chain M b .In order to compare the two approaches, we have assumed the same computational effort (related to the same partition size of δ = 0.7 × 10 −4 ), and have obtained an error E f = 0.008, E b = 0.020 for a = 1.2 and E f = 0.056, E b = 0.014 for a = 0.8.The simulations show that the forward approach works better for a = 1.2, while the backward approach is better suitable for a = 0.8.Note that the approximate solutions provided by the two approaches are very close: the difference of the transition probabilities computed via the Markov chains M f , M b are in the order of 10 −8 , and the difference in the approximate solutions (black curve in Figure 4) is in the order of 10 −6 .This has been due to the selection of very fine partition sets that have resulted in small abstraction errors.Remark 7.2.Over deterministic models, [25] compares forward and backward reachability analysis and provides insights on their differences: the claim is that for systems with significant contraction, forward reachability is more effective than backward reachability because of numerical stability issues.On the other hand, for the probabilistic models under study, the result indicates that under Lipschitz continuity of the transition kernel the backward approach is more effective in systems with convergence in the state distribution.If we treat deterministic systems as special (limiting) instances of stochastic systems, our result is not contradicting with [25] since the Lipschitz continuity assumption on the transition kernels of probabilistic models does not hold over deterministic ones.Motivated by the previous example, we study how the convergence properties of a Markov process are related to the constant M f .Theorem 7.3.Assume that the initial density function π 0 (s) is bounded and that the constant M f is finite and M f < 1.If the state space is unbounded, the sequence of density functions {π t (s)|t ≥ 0} uniformly exponentially converges to zero.The sequence of probabilities P{s(t) ∈ A} and the corresponding solution of the safety problem for any compact safe set A exponentially converge to zero.Theorem 7.3 indicates that under the invoked assumptions the probability "spreads out" over the unbounded state space as time progresses.Moreover, the theorem ensures the absence of absorbing sets [28,30], which are indeed known to characterise the solution of infinite-horizon properties.Example 7.4 studies the relationship between constant M f and the stability of linear stochastic difference equations.where w(•) are i.i.d. with known distribution t w (•).Suppose that the vector field f : R d × R d → R d is continuously differentiable and that the matrix ∂f ∂w is invertible.Then the implicit function theorem guarantees the existence and uniqueness of a function g : R d × R d → R d such that w(t) = g(s(t + 1), s(t)).The conditional density function of the system in this case is [26]: t s (s|s) = det ∂g ∂s (s, s) t w (g(s, s)).
The Lipschitz constants λ f , λ b are specified by the dependence of function g(s, s) from the variables s, s, respectively.As a special case the invertibility of ∂f ∂w is guaranteed for systems with additive process noise, namely f (s, w) = f a (s) + w.Then g(s, s) = s − f a (s), λ f is the Lipschitz constant of t w (•), while λ b is the multiplication of the Lipschitz constant of t w (•) and of f a (•).

Conclusions
This contribution has put forward new algorithms, based on Markov chain abstractions, for the efficient computation of approximate solutions of the state distribution function in time of Markov processes evolving over continuous state spaces.A higher-order function approximation method has also been presented, with a formal derivation of an upper bound on the associated error.The approach has been applied to the verification of a particular non-nested PCTL formula (expressing probabilistic safety or invariance), and compared with an alternative computational approach from the literature.
Proof of Theorem 5.1.The definition of ψ h t implies that ψ h t+1 = Π Υ R Υ (ψ h t ), with ψ h 0 = µ 0 .Then (5.5) is true for t = 0 with E h 0 = 0. Assume that µ t − ψ h t ∞ ≤ E h t ; then for any s ∈ Υ, On the other hand, the second term is upper bounded as follows:

Figure 1
Figure 1 provides a visual illustration of the recursion in (3.2).

Figure 1 :
Figure 1: Graphical representation of the recursion in (3.2) for sets Λ t .

Figure 4 :
Figure 4: Approximate solution of the probabilistic invariance problem (thin black line), together with error intervals of forward (blue band) and backward (red band) approaches, for a = 1.2 (left) and a = 0.8 (right).

Example 2 . 1 (
Continued).The constants λ f , M f and λ b , M b for the one dimensional dynamical system of Example 2.1 are

V
t : S → [0, 1] value functions for backward computation of the safety probabilityE f = κ(N, M f )λ f δL(A)error of forward computation of the safety probabilityE b = κ(N, M b )λ b δL(A)error of backward computation of the safety probability λ bLipschitz constant of t(s|s) with respect to s M b upper bound for A t s (s|s)ds, for all s ∈ A M b finite-state Markov chain obtained via the backward abstraction approach P (s i , s j ) transition probabilities of the Markov chain M b

1
Approximate computation of the functions ψ h For each A i , select interpolation basis functions φ ij and points s ij ∈ A i , where j ∈ N h 3: Compute P uv ij = A i t s (s uv |s)φ ij (s)ds, where i, u ∈ N n and j, v ∈ N h 4: Compute a matrix representation for the operators Π A i , namely t Require: Density function t s (s|s), set Υ 1: Select a finite n-dimensional partition of the set Υ = ∪ n i=1 A i (A i are non-overlapping) 2: Then {π t } converges to zero uniformly exponentially, with a rate M f .With focus on the safety problem we have P{s(u) ∈ A for all u ∈ N t } ≤ P{s(t) ∈ A}, where ψ h t )(s)| ≤ Υ |Π Υ (t s (s|s))||µ t (s) − ψ h t (s)|ds ≤ E h t Υ |Π Υ (t s (s|s))|ds ≤ E h t M h f .The addition of the two bounds leads to the statement.Proof of Theorem 7.3.Based on the recursion over the density functions, we have thatπ t+1 (s) ≤ sup{π t (s), s ∈ S} S t s (s|s)ds ≤ M f sup{π t (s), s ∈ S} ⇒ π t (s) ≤ sup{π 0 (s), s ∈ S}M t f , ∀t ∈ N.t→∞ P{s(t) ∈ S} = lim t→∞ 1 = 1, then the state-space cannot be bounded under the assumptions.