A framework to measure the robustness of programs in unpredictable environments

Due to the diffusion of IoT, modern software systems are often designed to control and coordinate smart devices in order to manage assets and resources, and to guarantee efficient behaviours. For this class of systems, which interact extensively with humans and with their environment, it is crucial to guarantee their correct behaviour in order to avoid unexpected and possibly dangerous situations. In this paper we present a framework that allows us to measure the robustness of systems, that is, the ability of a program to tolerate changes in the environmental conditions while preserving its original behaviour. In the proposed framework, the interaction of a program with its environment is represented as a sequence of random variables describing how both evolve in time. For this reason, the considered measures are defined on probability distributions of observed data. The proposed framework is then used to define the notions of adaptability and reliability. The former indicates the ability of a program to absorb perturbations of the environmental conditions after a given amount of time. The latter expresses the ability of a program to maintain its intended behaviour (up to some reasonable tolerance) despite the presence of perturbations in the environment. Moreover, an algorithm, based on statistical inference, is proposed to evaluate the proposed metric and the aforementioned properties. We use two case studies to describe and evaluate the proposed approach.


Introduction
With the advent of the IoT era [Kop11] we have witnessed the diffusion of a plethora of smart interconnected devices that permit sensing data from the environment and operating on it. This revolution stimulated the design of increasingly pervasive systems, often named Cyber-Physical Systems (CPSs) [RLSS10], that are designed to control and coordinate these devices in order to manage assets and resources, and to guarantee efficient behaviours. In particular, it is crucial to guarantee that these devices always behave "correctly" in their environment.

A randomised algorithm that permits estimating the evolution sequences of systems, and thus evaluating the evolution metric, is then introduced. Following [TK10], the Wasserstein metric is evaluated in time O(N log N), where N is the (maximum) number of samples. We already adopted this approach in [CLT20a] in the context of finite-state self-organising collective systems, without any notion of environment or data space. Moreover, we will also show how the proposed algorithm can be used to estimate the robustness, adaptability, and reliability of a system.
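To give an intuition of why the Wasserstein metric can be evaluated in O(N log N) time, consider that over the real line the Wasserstein distance between two empirical distributions with the same number of samples reduces to a comparison of sorted samples, so the dominant cost is the sort. The following is a minimal sketch of this idea (not the authors' implementation, which is presented later in the paper):

```python
# Sketch: 1-dimensional Wasserstein distance between two empirical
# distributions with equally many samples. The optimal coupling pairs
# the i-th smallest sample of one distribution with the i-th smallest
# of the other, so the evaluation is dominated by the O(N log N) sort.
def wasserstein_1d(xs, ys):
    assert len(xs) == len(ys), "sketch assumes equally sized samples"
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

print(wasserstein_1d([0.0, 1.0], [1.0, 2.0]))  # 1.0
```

Here the equal-sample-size assumption is only for brevity; the general empirical case is handled by the algorithm discussed in Section 5.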
Organisation of contents. After reviewing some background mathematical notions in Section 2, we devote Section 3 to the presentation of our model. In Section 4 we discuss the evolution metric and all the ingredients necessary to define it. Then, in Section 5 we provide the algorithm, based on statistical inference, for the evaluation of the metric. The notions of adaptability and reliability of programs are formally introduced in Section 6 together with an example of application of our algorithm to their evaluation. In these sections a running scenario is used to clarify the proposed approach. This consists of a stochastic variant of the three-tanks laboratory experiment described in [RKO + 97]. In Section 7 our methodology is used to analyse the engine system proposed in [LMMT21]. This consists of two supervised self-coordinating refrigerated engine systems that are subject to cyber-physical attacks aiming to inflict overstress of equipment [GGI + 15]. Finally, Section 8 concludes the paper with a discussion on related work and future research directions.

Background
In this section we introduce the mathematical background on which we build our contribution. We remark that we present in detail only the notions that are fundamental to allow any reader to understand our work, like, e.g., those of probability measure and metric. Other notions that are needed exclusively to guarantee that the mathematical constructs we use are well-defined (e.g., the notion of Radon measure for Wasserstein hemimetrics) are only mentioned. The interested reader can find all the formal definitions in any Analysis textbook.

Measurable spaces and measurable functions. A σ-algebra over a set Ω is a family Σ of subsets of Ω such that: (1) Ω ∈ Σ, (2) Σ is closed under complementation, and (3) Σ is closed under countable union. The pair (Ω, Σ) is called a measurable space and the sets in Σ are called measurable sets, ranged over by A, B, . . . . For an arbitrary family Φ of subsets of Ω, the σ-algebra generated by Φ is the smallest σ-algebra over Ω containing Φ. In particular, we recall that given a topology T over Ω, the Borel σ-algebra over Ω, denoted B(Ω), is the σ-algebra generated by the open sets in T . For instance, given n ∈ N + , B(R n ) is the σ-algebra generated by the open intervals in R n .
Assume a probability space (Ω, Σ, µ) and a measurable space (Ω ′ , Σ ′ ). A function X : Ω → Ω ′ is called a random variable if it is Σ-measurable. The distribution measure, or cumulative distribution function (cdf), of X is the probability measure µ X on (Ω ′ , Σ ′ ) defined by µ X (A) = µ(X −1 (A)) for all A ∈ Σ ′ . Given random variables X i from (Ω i , Σ i , µ i ) to (Ω ′ i , Σ ′ i ), i = 1, . . . , n, the collection X = [X 1 , . . . , X n ] is called a random vector. The cdf of a random vector X is given by the joint distribution of the random variables in it.
Notation 2.1. With a slight abuse of notation, in the examples and explanations throughout the paper, we will sometimes use directly the cdf of a random variable rather than formally introducing the probability measure defined on the domain space. In fact, since we will consider Borel sets over R n , for some n ≥ 1, it is usually easier to define the cdfs than the probability measures on them. Similarly, when the cdf is absolutely continuous with respect to the Lebesgue measure, then we shall reason directly on the probability density function (pdf) of the random variable, namely the Radon-Nikodym derivative of the cdf with respect to the Lebesgue measure.
Consequently, we shall also henceforth use the more suggestive term distribution in place of the terms probability measure, cdf and pdf.
In this paper we are interested in defining a hemimetric on distributions. To this end we will make use of the Wasserstein lifting [Vas69], which evaluates the infimum, with respect to the joint distributions, of the expected value of the distance over the two distributions, and whose definition is based on the following notions and results. Given a set Ω and a topology T on Ω, the topological space (Ω, T ) is said to be completely metrisable if there exists at least one metric m on Ω such that (Ω, m) is a complete metric space and m induces the topology T . A Polish space is a separable completely metrisable topological space. In particular, we recall that: (i) R is a Polish space; and (ii) every closed subset of a Polish space is in turn a Polish space. Moreover, for any n ∈ N, if Ω 1 , . . . , Ω n are Polish spaces, then the Borel σ-algebra on their product coincides with the product σ-algebra generated by their Borel σ-algebras, namely B(Ω 1 × · · · × Ω n ) = B(Ω 1 ) ⊗ · · · ⊗ B(Ω n ). (This is proved, e.g., in [Bog07] as Lemma 6.4.2, whose hypotheses are satisfied by Polish spaces since they are second countable.) These properties of Polish spaces are interesting for us since they guarantee that all the distributions we consider in this paper are Radon measures and, thus, the Wasserstein lifting is well-defined on them. For this reason, we also directly present the Wasserstein hemimetric by considering only distributions on Borel sets.
Definition 2.2 (Wasserstein hemimetric). Let m be a hemimetric on Ω. The Wasserstein lifting of m is the function W(m) on Π(Ω, B(Ω)) × Π(Ω, B(Ω)) defined, for all µ, ν ∈ Π(Ω, B(Ω)), by

W(m)(µ, ν) = inf_{w ∈ W(µ,ν)} ∫_{Ω×Ω} m(x, y) dw(x, y)

where W(µ, ν) is the set of couplings of µ and ν, namely the distributions w on (Ω × Ω, B(Ω × Ω)) whose marginals are µ and ν.

Despite the original version of the Wasserstein distance being defined on a metric on Ω, the Wasserstein hemimetric given above is well-defined. We refer the interested reader to [FR18] and the references therein for a formal proof of this fact. In particular, the Wasserstein hemimetric is given in [FR18] as Definition 7 (considering the compound risk excess metric defined in Equation (31) of that paper), and Proposition 4 in [FR18] guarantees that it is indeed a well-defined hemimetric on Π(Ω, B(Ω)). Moreover, Proposition 6 in [FR18] guarantees that the same result holds for the hemimetric m(x, y) = max{y − x, 0} which, as we will see, plays an important role in our work (cf. Definition 4.2 below).

Remark 2.3. As elsewhere in the literature, for simplicity and brevity, we shall henceforth use the term metric in place of the term hemimetric.
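The asymmetry introduced by the ground hemimetric m(x, y) = max{y − x, 0} survives the Wasserstein lifting. The following sketch illustrates this on equally sized empirical samples over the reals; since this cost is of the convex form h(y − x), the sorted (comonotone) coupling attains the infimum, so the lifting can again be computed by sorting (the equal-sample-size restriction is only for brevity):

```python
# Sketch: Wasserstein lifting of the ground hemimetric
# m(x, y) = max{y - x, 0} between two equally sized empirical samples.
def wasserstein_hemi(xs, ys):
    assert len(xs) == len(ys), "sketch assumes equally sized samples"
    pairs = zip(sorted(xs), sorted(ys))
    return sum(max(y - x, 0.0) for x, y in pairs) / len(xs)

# Asymmetry: the second sample is "worse" (larger) than the first,
# but not vice versa.
print(wasserstein_hemi([0.0, 0.0], [1.0, 1.0]))  # 1.0
print(wasserstein_hemi([1.0, 1.0], [0.0, 0.0]))  # 0.0
```

This one-sidedness is exactly what later lets the evolution metric express that one system behaves *worse* than another, rather than merely differently.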

The model
We devote this section to introduce the three components of our systems, namely the data space D, the process P describing the behaviour of the program, and the environment evolution E describing the effects of the environment. As already discussed in the Introduction, our choice of modelling program and environment separately will allow us to isolate programs from the environment and favour their analysis. In order to help the reader grasp the details of our approach, we introduce a stochastic variant of the three-tanks laboratory experiment, which embodies the kind of program-environment interactions we are interested in. Several variants (with different numbers of tanks) of the n-tanks experiment have been widely used in control program education (see, among others, [AL92, RKO + 97, Joh00, ALGG + 06]). Moreover, some recently proposed cyber-physical security testbeds, like SWaT [MT16, AGA + 17], can be considered as an evolution of the tanks experiment. Here, we consider a variant of the three-tanks laboratory experiment described in [RKO + 97].
Please notice that this example is meant to serve as a toy example allowing us to showcase the various features of our tool, without carrying out a heavy mathematical formulation. We delay to Section 7 the presentation of a case-study showing how our framework can be applied to the analysis of (more complex) real-world systems.
Notation 3.1. In the upcoming examples we will slightly abuse notation and use a variable name x to denote all of the following: the variable x, the (possible) function describing the evolution in time of the values assumed by x, and the (possible) random variable describing the distribution of the values that can be assumed by x at a given time. The role of the variable name x will always be clear from the context.

Example 3.2. As schematised in Figure 1, there are three identical tanks connected by two pipes. Water enters in the first and in the last tank by means of a pump and an incoming pipe, respectively. The last tank is equipped with an outlet pump. We assume that water flows through the incoming pipe with a rate q 2 that is determined by the environment, whereas the flow rates through the two pumps (respectively, q 1 and q 3 ) are under the control of the program controlling the experiment. The rates q 1 , q 2 , q 3 can assume values in the range [0, q max ], for a given maximal flow rate q max . The task of the program consists in guaranteeing that the levels of water in the three tanks fulfil some given requirements. The level of water in tank i at time τ is denoted by l i (τ ), for i = 1, 2, 3, and is always in the range [l min , l max ], for suitable l min and l max giving, respectively, the minimum and maximum level of water in the tanks.
The principal difference between our version of the three-tanks experiment and its classical formulation is the flow rate q 2 . In fact, we assume that q 2 cannot be controlled by the program, which can only indirectly observe the effects it has on l 3 and react to those. In our setting, this is equivalent to saying that q 2 is under the control of the environment. Moreover, to obtain a more complex scenario, we assume that the value of q 2 (at a given time) can be described only probabilistically. This can be interpreted either as the effect of uncertainties (like, e.g., measurement errors of the sensors having to determine q 2 ), or as the consequence of an unpredictable behaviour of the environment. Consequently, the exact value of q 2 (τ ) is unknown and will become evident only at execution time. Still, we can consider different scenarios that render the assumptions we might have on the environment. For instance, we can assume that the flow rate of the incoming pipe is normally distributed with mean q med ∈ [0, q max ] and variance ∆ q > 0, expressed, assuming sampling time interval ∆τ , by:

q_2(τ + ∆τ) = min{ q_max , max{ 0, n } },  with n ∼ N(q_med , ∆_q)    (3.1)

In a more elaborate scenario, we could assume that q 2 varies at each step by a value v that is normally distributed with mean 0 and variance 1. In this case, we have:

q_2(τ + ∆τ) = min{ q_max , max{ 0, q_2(τ) + v } },  with v ∼ N(0, 1)    (3.2)

Conversely, the two pumps are controlled by the program that, by reading the values of l 1 (τ ) and l 3 (τ ), can select the values of q 1 (τ + ∆τ ) and q 3 (τ + ∆τ ). In detail, the equations describing the behaviour of the program are the following:

q_1(τ + ∆τ) = max{ 0, q_1(τ) − q_step }      if l_1(τ) > l_goal + ∆_l
              min{ q_max , q_1(τ) + q_step }  if l_1(τ) < l_goal − ∆_l
              q_1(τ)                          otherwise    (3.3)

q_3(τ + ∆τ) = min{ q_max , q_3(τ) + q_step }  if l_3(τ) > l_goal + ∆_l
              max{ 0, q_3(τ) − q_step }       if l_3(τ) < l_goal − ∆_l
              q_3(τ)                          otherwise    (3.4)

Above, l goal is the desired level of water in the tanks, while q step ∈ [0, q max ) is the variation of the flow rate that is controllable by the pump in one time step ∆τ . The idea behind Equation (3.3) is that when l 1 is greater than l goal + ∆ l , the flow rate of the pump is decreased by q step . Similarly, when l 1 is less than l goal − ∆ l , that rate is increased by q step . The reasoning for Equation (3.4) is symmetrical.
In both cases, we use the threshold ∆ l > 0 to avoid continuous contrasting updates. From now on, we will say that we are in scenario 1 if q 2 is determined by Equation (3.1), and that we are in scenario 2 in case q 2 is determined by Equation (3.2). In both scenarios, q 1 is determined by Equation (3.3) and q 3 by Equation (3.4). The dynamics of l i (τ ) can be modelled via the following set of stochastic difference equations, with sampling time interval ∆τ :

l_1(τ + ∆τ) = l_1(τ) + (∆τ/A) · (q_1(τ) − q_12(τ))
l_2(τ + ∆τ) = l_2(τ) + (∆τ/A) · (q_12(τ) − q_23(τ))
l_3(τ + ∆τ) = l_3(τ) + (∆τ/A) · (q_23(τ) + q_2(τ) − q_3(τ))    (3.5)

where q 1 , q 2 , q 3 are as above (i.e., the flow rates through, respectively, the pump connected to the first tank, the incoming pipe, and the outlet pump), and q ij denotes the flow rate from tank i to tank j. The evaluation of q ij (τ ) depends on l i (τ ), l j (τ ), and on the physical dimensions of the tanks, and it is obtained as follows. Let A be the cross-sectional area of each tank, and a be the cross-sectional area of the connecting and outlet pipes. The volume balance difference equations of each tank are then the following:

A · (l_1(τ + ∆τ) − l_1(τ)) = ∆τ · (q_1(τ) − q_12(τ))
A · (l_2(τ + ∆τ) − l_2(τ)) = ∆τ · (q_12(τ) − q_23(τ))
A · (l_3(τ + ∆τ) − l_3(τ)) = ∆τ · (q_23(τ) + q_2(τ) − q_3(τ))    (3.6)

We then apply Torricelli's law to the equations in (3.6) to obtain the flow rates q 12 and q 23 :

q_12(τ) = a_12(τ) · a · √(2g (l_1(τ) − l_2(τ)))    if l_1(τ) ≥ l_2(τ)
          −a_12(τ) · a · √(2g (l_2(τ) − l_1(τ)))   otherwise

q_23(τ) = a_23(τ) · a · √(2g (l_2(τ) − l_3(τ)))    if l_2(τ) ≥ l_3(τ)
          −a_23(τ) · a · √(2g (l_3(τ) − l_2(τ)))   otherwise    (3.7)

where g is the gravitational constant, and a 12 , a 23 are the loss coefficients of the respective pipes. These coefficients are represented as time-dependent functions since they depend on the geometry of the pipes and on the water level in the tanks. In our experiments, we use the approximation g = 9.81, and the following values for the aforementioned coefficients: (i) a = 0.5, (ii) a 12 = 0.75, (iii) a 23 = 0.75.
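A single step of these dynamics can be sketched in a few lines of code. The following is an illustrative simulation step based on the volume balance and Torricelli's law described above; the parameter values and the clamping of the levels to [l_min, l_max] are assumptions made for illustration, not the paper's tool:

```python
import math

# Illustrative one-step simulation of the three-tanks dynamics.
# Parameter values are assumptions (a, a12, a23 follow the text).
A, a, A12, A23, G, DT = 1.0, 0.5, 0.75, 0.75, 9.81, 1.0

def torricelli(coef, li, lj):
    # Signed flow between adjacent tanks (Torricelli's law).
    if li >= lj:
        return coef * a * math.sqrt(2 * G * (li - lj))
    return -coef * a * math.sqrt(2 * G * (lj - li))

def step(l1, l2, l3, q1, q2, q3, lmin=0.0, lmax=1.0):
    q12 = torricelli(A12, l1, l2)
    q23 = torricelli(A23, l2, l3)
    clamp = lambda l: min(lmax, max(lmin, l))
    return (clamp(l1 + DT / A * (q1 - q12)),
            clamp(l2 + DT / A * (q12 - q23)),
            clamp(l3 + DT / A * (q23 + q2 - q3)))

# With equal levels there is no flow through the pipes: only the
# pumps act, so tank 1 fills and tank 3 drains.
print(step(0.5, 0.5, 0.5, q1=0.1, q2=0.0, q3=0.1))  # ≈ (0.6, 0.5, 0.4)
```

Iterating `step`, with q 1 , q 3 updated by the control equations and q 2 sampled according to the chosen scenario, yields one run of the system.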
We now proceed to formally introduce the three components of our systems and their interaction.
3.1. Modelling the data space. We define the data space by means of a finite set of variables Var representing: (i) environmental conditions, such as pressure, temperature, humidity, etc., (ii) values perceived by sensors, which depend on the value of environmental conditions and are unavoidably affected by imprecision and approximations introduced by sensors, and (iii) state of actuators, which are usually elements in a discrete domain, like {on, off } (which can be easily mapped to reals as, e.g., {0, 1}). For each x ∈ Var we assume a measurable space (D x , B x ), with D x ⊆ R the domain of x and B x the Borel σ-algebra on D x . Without loss of generality, we can assume that each domain D x is either a finite set or a compact subset of R. In particular, this means that each D x is a Polish space.
As Var is a finite set, we can always assume it to be ordered, namely Var = {x 1 , . . . , x n } for a suitable n ∈ N.

Definition 3.3 (Data space). We define the data space over Var, notation D Var , as the Cartesian product of the variables' domains, namely D Var = ×_{i=1}^n D x i . Then, as a σ-algebra on D Var we consider the product σ-algebra B D Var = ⊗_{i=1}^n B x i . When no confusion arises, we will use D and B D in place of D Var and B D Var , respectively.
Example 3.4. The data space D 3T for the considered three-tanks system is defined on the set of variables Var = {q 1 , q 2 , q 3 , l 1 , l 2 , l 3 }, with domains q i ∈ D q = [0, q max ] and l i ∈ D l = [l min , l max ], for i = 1, 2, 3. We remark that ∆ l is not included in the data space. This is due to the fact that, in our setting, ∆ l is never modified by the environment (it is also constant in time), and it is an inherent property of the program: different programs might have different values of ∆ l , according to how capable they are of controlling and modifying the flow rates, or of reading the levels of water in the tanks.
Elements in D are n-tuples of the form (v 1 , . . . , v n ), with v i ∈ D x i , which can also be identified by means of functions d : Var → R from variables to values, with d(x) ∈ D x for all x ∈ Var. Each function d identifies a particular configuration of the data in the data space, and it is thus called a data state. For simplicity, we shall write d ∈ D in place of (d(x 1 ), . . . , d(x n )) ∈ D. Since program and environment interact on the basis of the current values of data, at each step there is a data state d that identifies the current state of the data space on which the next computation step is built. Given a data state d, we let d[x = v] denote the data state d′ associating v with x, and d(y) with y for any y ̸ = x. For x = (x i 1 , . . . , x i k ) and v = (v i 1 , . . . , v i k ), we write d[x = v] for the componentwise extension of this update.

3.2. Modelling processes. We introduce a simple process calculus allowing us to specify programs that interact with a data state d in a given environment. We assume that the action performed by a process at a given computation step is determined probabilistically, according to the generative probabilistic model [vGSS95].

Definition 3.6 (Syntax of processes). We let P be the set of processes P defined by:

P ::= (e → x).P | if [e] P_1 else P_2 | Σ_{i∈I} p_i · P_i | P_1 ∥_p P_2 | P_1 | P_2 | A
e ::= v | x | op_k(e_1, . . . , e_k)

where p, p 1 , . . . range over probability weights in [0, 1], I is finite, A ranges over process variables, op k indicates a measurable operator R k → R, and ·̃ denotes a finite sequence of elements. We assume that for each process variable A we have a single definition A def = P . Moreover, we require that Σ_{i∈I} p i = 1 for any process Σ_{i∈I} p i · P i , and that var(P 1 ) ∩ var(P 2 ) = ∅ for any process P 1 | P 2 , where var(P ) = {x ∈ Var | x ∈ x for some (e → x) occurring in P }.
As usual, the expression (e → x) in (e → x).P ′ is called a prefix. In a single action a process can read and update a set of state variables. This is done by process (e → x).P , which first evaluates the sequence of expressions e in the current data state and then assigns the results to the sequence of variables x; these are stored in the data state and evolve according to the environment evolution E (Definition 3.9 below) before being available at the next step, when the process will behave as P . We use √ to denote the empty prefix, i.e., the prefix in which the sequences e and x are empty, meaning that, in the current time step, the process does not read or update variables. Process if [e] P 1 else P 2 behaves either as P 1 , if e evaluates to 1, or as P 2 , if e evaluates to 0. Then, Σ_{i∈I} p i · P i is the generative probabilistic choice: it has probability p i to behave as P i . We may write Σ_{i=1}^n p i · P i as p 1 · P 1 + · · · + p n · P n . The generative probabilistic interleaving construct P 1 ∥ p P 2 lets the two argument processes interleave their actions, where at each step the first process moves with probability p and the second with probability (1 − p). We also provide a synchronous parallel composition construct P 1 | P 2 that lets the two arguments perform their actions in a synchronous fashion, provided P 1 and P 2 do not modify the same variables. Finally, process variable A allows us to specify recursive behaviours by equations of the form A def = P . To avoid Zeno behaviours we assume that all occurrences of process variables appear guarded by prefixing constructs in P . We assume the standard notion of free and bound process variables and we only consider closed process terms, that is, terms without free variables.

Table 1. Process Semantics
Actions performed by a process can be abstracted in terms of the effects they have on the data state, i.e., via substitutions of the form θ = [x i 1 ← v i 1 , . . . , x i k ← v i k ], also denoted by x ← v for x = x i 1 , . . . , x i k and v = v i 1 , . . . , v i k . Since in Definition 3.6 process operations on data and expressions are assumed to be measurable, we can model the effects as B D -measurable functions θ : D → D. We denote by Θ the set of effects. The behaviour of a process can then be defined by means of a function pstep : P × D → Π(Θ × P) that, given a process P and a data state d, yields a discrete distribution over Θ × P. The distributions in Π(Θ × P) are ranged over by π, π ′ , . . . . Function pstep is formally defined in Table 1.
In detail, rule (PR1) expresses that in a single action a process can read and update a set of variables: process (e → x).P first evaluates the sequence of expressions e in the data state d and then stores the results to x. These new data will evolve according to E before being available at next step, when the process will behave as P . The evaluation of an expression e in d, denoted by e d , is obtained by replacing each variable x in e with d(x) and by evaluating the resulting ground expression. Then, process (e → x).P evolves to the Dirac's distribution on the pair (x ← e d , P ). Rule (PR2) models the behaviour of if [e] P 1 else P 2 . We assume that boolean operations, like those in the if-guard, evaluate to 1 or 0, standing respectively for ⊤ and ⊥. Hence, if [e] P 1 else P 2 behaves as P 1 if e d = 1, and it behaves as P 2 otherwise. Rule (PR3) follows the generative approach to probabilistic choice: the behaviour of process P i is selected with probability p i . Rule (PR4) states that process P 1 ∥ p P 2 interleaves the moves by P 1 and P 2 , where at each step P 1 moves with probability p and P 2 with probability (1 − p). Given π ∈ Π(Θ × P), we let π ∥ p P (resp. P ∥ p π) denote the probability distribution π ′ over (Θ × P) such that: π ′ (θ, P ′ ) = π(θ, P ′′ ), whenever P ′ = P ′′ ∥ p P (resp. P ′ = P ∥ p P ′′ ), and 0, otherwise. Rule (PR5) gives us the semantics of the synchronous parallel composition: for π 1 , π 2 ∈ Π(Θ × P), we let π 1 | π 2 denote the probability distribution defined, for all θ i ∈ Θ and P i ∈ P, i = 1, 2, by: (π 1 | π 2 )(θ 1 θ 2 , P 1 | P 2 ) = π 1 (θ 1 , P 1 ) · π 2 (θ 2 , P 2 ) where the effect θ 1 θ 2 is given by the concatenation of the effects θ 1 and θ 2 . Notice that the concatenation is well-defined and it does not introduce contrasting effects on variables since we are assuming that var(P 1 ) ∩ var(P 2 ) = ∅. Finally, rule (PR6) states that A behaves like P whenever A def = P . 
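The generative flavour of rules (PR1)–(PR3) can be made concrete with a small interpreter. The following sketch is an illustrative encoding, not the authors' tool: processes are tagged tuples, effects are tuples of variable/value updates, and `pstep` returns a dictionary mapping (effect, continuation) pairs to probabilities:

```python
# Illustrative encoding of pstep for three constructs:
#   ("act", exprs, vars, cont)   -- prefix (e -> x).P        (PR1)
#   ("if", guard, p1, p2)        -- if [e] P1 else P2        (PR2)
#   ("choice", [(p_i, P_i)...])  -- generative prob. choice  (PR3)
def pstep(proc, d):
    tag = proc[0]
    if tag == "act":
        _, exprs, xs, cont = proc
        effect = tuple((x, e(d)) for e, x in zip(exprs, xs))
        return {(effect, cont): 1.0}          # Dirac distribution
    if tag == "if":
        _, guard, p1, p2 = proc
        return pstep(p1 if guard(d) == 1 else p2, d)
    if tag == "choice":
        dist = {}
        for p_i, proc_i in proc[1]:
            for pair, q in pstep(proc_i, d).items():
                dist[pair] = dist.get(pair, 0.0) + p_i * q
        return dist
    raise ValueError("unknown process")

stop = ("act", (), (), None)                  # empty prefix, then halt
inc  = ("act", (lambda d: d["x"] + 1,), ("x",), stop)
dec  = ("act", (lambda d: d["x"] - 1,), ("x",), stop)
coin = ("choice", [(0.5, inc), (0.5, dec)])

dist = pstep(coin, {"x": 0})
print(sum(dist.values()))                     # 1.0
```

The interleaving and synchronous compositions (PR4)–(PR6) would be added analogously, combining the two sub-distributions as in Table 1.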
Proposition 3.7 (Properties of one-step process semantics). For any process P ∈ P and data state d ∈ D, pstep(P, d) is a discrete distribution over Θ × P with finite support.

Proof. The proof follows by a simple induction on the construction of pstep(P, d).
Example 3.8. We proceed to define the program P Tanks responsible for controlling the two pumps in order to achieve (and maintain) the desired level of water in the three tanks. Intuitively, P Tanks consists of a process P in controlling the flow rate through the pump attached to the first tank, and of a process P out controlling the flow rate through the outlet pump. The behaviour of P in and P out can be obtained as a straightforward translation of, respectively, Equation (3.3) and Equation (3.4) into our simple process calculus. Formally:

P_in  def= if [l_1 > l_goal + ∆_l] (max{0, q_1 − q_step} → q_1).P_in
           else if [l_1 < l_goal − ∆_l] (min{q_max, q_1 + q_step} → q_1).P_in
           else √.P_in

P_out def= if [l_3 > l_goal + ∆_l] (min{q_max, q_3 + q_step} → q_3).P_out
           else if [l_3 < l_goal − ∆_l] (max{0, q_3 − q_step} → q_3).P_out
           else √.P_out

However, for P Tanks to work properly, we need to allow P in and P out to execute simultaneously. Since the effects of the read/update operators in P in and P out are applied to distinct variables (respectively, q 1 and q 3 ), we can obtain the desired execution by defining P Tanks as the synchronous parallel composition of P in and P out :

P_Tanks def= P_in | P_out

3.3. Modelling the environment. To model the action of the environment on data we use a mapping E, called environment evolution, taking a data state to a distribution over data states.
Definition 3.9 (Environment evolution). An environment evolution is a function E : D → Π(D, B D ) taking a data state to a distribution over data states. We notice that, due to the interaction with the program, in our model the probability induced by E at the next time step depends only on the current state of the data space. It is then natural to assume that the behaviour of the environment is modelled as a discrete-time Markov process; to define the function E we only need the initial distribution on D (which in our setting will be the Dirac's distribution on a data state in D) and the random vector X = [X 1 , . . . , X n ] in which each random variable X i models the changes induced by the environment on the value of variable x i .
Since the environment is an ensemble of physical phenomena, it can be represented by a system of equations of the form x ′ = f (x), for x ∈ Var, with the obvious syntax. Hence, for simplicity, we do not explicitly present a syntactic definition of the environment, but model it directly as a function over data spaces. The following example shows how the environment evolution is obtained from the system of equations.
Example 3.10. To conclude the encoding of the three-tanks experiment from Example 3.2 into our model, we need to define a suitable environment evolution. This can be derived from Equations (3.5), (3.1) (or (3.2)), and (3.7) in the obvious way. For the sake of completeness, below we present functions E 1 and E 2 modelling the evolution of the environment, respectively, in scenario 1 (with Equation (3.1)) and scenario 2 (with Equation (3.2)). We remark that: • The only stochastic variable in our setting is q 2 , giving thus that the distribution induced by E i is determined by the distribution of q 2 , for i = 1, 2. • The values of q 1 and q 3 are modified by P in and P out , respectively. In particular, according to Definition 3.13, this means that the environment does not affect the values of q 1 and q 3 and, moreover, when we evaluate E i (d), the effects of the two processes on q 1 and q 3 have already been taken into account in d, for i = 1, 2. Define f 3T : [0, q max ] × D 3T → D 3T as the function such that, for any x ∈ [0, q max ] and data state d, gives us the data state f 3T (x, d) = d ′ obtained from the solution of the following system of equations:

d′(q_1) = d(q_1),   d′(q_2) = x,   d′(q_3) = d(q_3)
d′(l_1) = d(l_1) + (∆τ/A) · (d(q_1) − q_12)
d′(l_2) = d(l_2) + (∆τ/A) · (q_12 − q_23)
d′(l_3) = d(l_3) + (∆τ/A) · (q_23 + x − d(q_3))

where q 12 and q 23 are given by Equation (3.7) evaluated in d, and a, a 12 , a 23 , g, ∆τ are constants. Then, consider the random variables X ∼ N (q med , ∆ q ) and Y ∼ N (0, 1). From them, we define the random variables Q^1_2 = min{q_max, max{0, X}} and Q^2_2 = min{q_max, max{0, q_2 + Y }}. Then, for i = 1, 2 and for each data state d, the distribution E i (d) is obtained as the distribution of f 3T (Q^i_2 , d).
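The stochastic part of the two environment evolutions amounts to sampling q 2 . The following sketch illustrates the two scenarios; the clamping to [0, q_max] and all parameter values are assumptions made for illustration:

```python
import random

# Sketch of the stochastic component of E1 and E2: scenario 1
# resamples q2 around q_med, scenario 2 performs a random walk on
# the current value of q2. Parameter values are assumptions.
Q_MAX, Q_MED, DELTA_Q = 1.0, 0.5, 0.1

def clamp(v):
    return min(Q_MAX, max(0.0, v))

def sample_q2_scenario1(rng):
    # Variance DELTA_Q, hence standard deviation sqrt(DELTA_Q).
    return clamp(rng.gauss(Q_MED, DELTA_Q ** 0.5))

def sample_q2_scenario2(q2, rng):
    return clamp(q2 + rng.gauss(0.0, 1.0))

rng = random.Random(42)
samples = [sample_q2_scenario1(rng) for _ in range(1000)]
print(all(0.0 <= q <= Q_MAX for q in samples))  # True
```

Feeding the sampled value through f 3T then produces the next data state, which is how E i (d) is realised operationally.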
3.4. Modelling system's behaviour. We use the notion of configuration to model the state of the system at each time step.
Definition 3.11 (Configuration). A configuration is a triple c = ⟨P, d⟩ E , where P is a process, d is a data state and E is an environment evolution. We denote by C P,D,E the set of configurations defined over P, D and E.
When no confusion shall arise, we shall write C in place of C P,D,E . Let (P, Σ P ) be the measurable space of processes, where Σ P is the power set of P, and (D, B D ) be the measurable space of data states. As E is fixed, we can identify C with P × D and equip it with the product σ-algebra Σ C = Σ P ⊗ B D .

Notation 3.12. For µ P ∈ Π(P, Σ P ) and µ D ∈ Π(D, B D ), we let ⟨µ P , µ D ⟩ E denote the induced product distribution on (C, Σ C ). In particular, for a process P ′ and µ ∈ Π(D, B D ), we write ⟨P ′ , µ⟩ E for ⟨δ P ′ , µ⟩ E .

Our aim is to express the behaviour of a system in terms of the changes on data. We start with the one-step behaviour of a configuration, in which we combine the effects on the data state induced by the activity of the process (given by pstep) and the subsequent action by the environment. Formally, we define a function cstep that, given a configuration c, yields a distribution on (C, Σ C ) (Definition 3.13 below). Then, we use cstep to define the multi-step behaviour of configuration c as a sequence S C c,0 , S C c,1 , . . . of distributions on (C, Σ C ). To this end, we show that cstep is a Markov kernel (Proposition 3.15 below). Finally, to abstract from processes and focus only on data, from the sequence S C c,0 , S C c,1 , . . ., we obtain a sequence of distributions S D c,0 , S D c,1 , . . . on (D, B D ) called the evolution sequence of the system (Definition 3.18 below).

Definition 3.13 (One-step configuration semantics). The function cstep : C → Π(C, Σ C ) is defined, for each configuration ⟨P, d⟩ E ∈ C, by

cstep(⟨P, d⟩ E ) = Σ_{(θ,P′) ∈ supp(pstep(P,d))} pstep(P, d)(θ, P ′ ) · ⟨P ′ , E(θ(d))⟩ E .
Proposition 3.14 (Properties of one-step configuration semantics). Assume any configuration ⟨P, d⟩ E ∈ C. Then cstep(⟨P, d⟩ E ) is a distribution on (C, Σ C ).
Proof. Since d is a data state in D and θ is an effect, θ(d) is a data state. Then, E(θ(d)) is a distribution on (D, B D ), thus implying that ⟨P ′ , E(θ(d))⟩ E is a distribution on (C, Σ C ) (see Notation 3.12). Finally, by Proposition 3.7 we get that cstep(⟨P, d⟩ E ) is a convex combination of distributions on (C, Σ C ) whose weights sum up to 1, which gives the thesis.

Since cstep(c) ∈ Π(C, Σ C ) for each c ∈ C, we can regard cstep as a function cstep : C × Σ C → [0, 1], so that for each configuration c ∈ C and measurable set C ∈ Σ C , cstep(c)(C) denotes the probability of reaching in one step a configuration in C starting from c. We can prove that cstep is the Markov kernel of the Markov process modelling our system. This follows by Proposition 3.14 and by proving that, for each C ∈ Σ C , the mapping c → cstep(c)(C) is Σ C -measurable.
Proposition 3.15. The function cstep is a Markov kernel.
Proof. We need to show that cstep satisfies the two properties of Markov kernels, namely: (1) for each configuration c ∈ C, the mapping C → cstep(c)(C) is a distribution on (C, Σ C ); and (2) for each measurable set C ∈ Σ C , the mapping c → cstep(c)(C) is Σ C -measurable. Item 1 is exactly Proposition 3.14.
Let us focus on item 2. By Definition 3.13, for each ⟨P, d⟩ E ∈ C we have that

cstep(⟨P, d⟩ E )(C) = Σ_{(θ,P′) ∈ supp(pstep(P,d))} pstep(P, d)(θ, P ′ ) · ⟨P ′ , E(θ(d))⟩ E (C).

As, by definition, each θ ∈ Θ and E are B D -measurable functions, we can infer that also their composition E(θ(·)) is a B D -measurable function. Since, moreover, Σ C is the (smallest) σ-algebra generated by Σ P × B D and every subset of P is a measurable set in Σ P , we can also infer that ⟨(·), E(θ(·))⟩ E is a Σ C -measurable function. Finally, we recall that by Proposition 3.7 we have that supp(pstep(P, d)) is finite. Therefore, we can conclude that cstep is a Σ C -measurable function, as it is a linear combination of a finite collection of Σ C -measurable functions (see, e.g., [Roy88, Chapter 3.5, Proposition 19]).
Hence, the multi-step behaviour of configuration c can be defined as a time homogeneous Markov process having cstep as Markov kernel and δ c as initial distribution.
Definition 3.16 (Multi-step semantics of configurations). Let c ∈ C be a configuration. The multi-step behaviour of c is the sequence of distributions S C c,0 , S C c,1 , . . . on (C, Σ C ) defined inductively as follows:

S C c,0 = δ c ,    S C c,t+1 (C) = ∫ cstep(c ′ )(C) dS C c,t (c ′ )   for all C ∈ Σ C .

We can prove that S C c,0 , S C c,1 , . . . are well defined, namely that they are distributions on (C, Σ C ).

Proposition 3.17 (Properties of configuration multi-step semantics). For any c ∈ C, all S C c,0 , S C c,1 , . . . are distributions on (C, Σ C ).

Proof. The proof follows by induction. The base case for S C c,0 is immediate. The inductive step follows by Proposition 3.15.
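Operationally, the multi-step semantics is what one estimates by sampling runs of the Markov kernel. The following generic sketch (an illustration, not the paper's estimation algorithm of Section 5) treats the kernel as a state sampler and records the empirical distribution reached at each step:

```python
import random

# Sketch: empirical estimation of the multi-step semantics.
# `kernel` plays the role of cstep, here as a sampler state -> state;
# we draw n_runs independent runs and record the empirical
# distribution of states reached at each step.
def simulate(kernel, c0, steps, n_runs, rng):
    stages = [[] for _ in range(steps + 1)]
    for _ in range(n_runs):
        c = c0
        stages[0].append(c)
        for t in range(1, steps + 1):
            c = kernel(c, rng)
            stages[t].append(c)
    return stages  # stages[t] approximates the t-th distribution

# Toy kernel (an assumption): a biased random walk on the integers.
walk = lambda c, rng: c + (1 if rng.random() < 0.7 else -1)
stages = simulate(walk, 0, steps=3, n_runs=1000, rng=random.Random(7))
print(len(stages[3]))  # 1000
```

Combining these per-step empirical distributions with the sorted-sample Wasserstein evaluation gives the overall estimation scheme anticipated in the Introduction.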
As the program-environment interplay can be observed only in the changes they induce on the data states, we define the evolution sequence of a configuration as the sequence of distributions over data states that are reached by it, step-by-step.

Definition 3.18 (Evolution sequence). The evolution sequence of a configuration c ∈ C is the sequence S D c = S D c,0 , S D c,1 , . . . of distributions on (D, B D ) such that, for all t ∈ N and all D ∈ B D , S D c,t (D) = S C c,t ({⟨P, d⟩ E | d ∈ D}).

Towards a metric for systems
As outlined in the Introduction, we aim at defining a distance over the systems described in the previous section, called the evolution metric, allowing us to do the following: (1) Measure how well a program is fulfilling its tasks.
(2) Establish whether one program behaves better than another one in an environment.
(3) Compare the interactions of a program with different environments.

As we will see, these three objectives can be naturally achieved thanks to the possibility, typical of our model, of modelling the program in isolation from the environment, and to our purely data-driven system semantics. Intuitively, since the behaviour of a system is entirely described by its evolution sequence, the evolution metric m will indeed be defined as a distance on the evolution sequences of systems. However, in order to obtain the proper technical definition of m, some considerations are due. Firstly, we notice that in most applications, not only the program-environment interplay, but also the tasks of the program can be expressed in a purely data-driven fashion. For instance, if we are monitoring a particular feature of the system, then we can identify a (time-dependent) specific value of interest of a state variable, like an upper bound on the energy consumption at a given time step. Similarly, if we aim at controlling several interacting features, or there is a range of parameters that can be considered acceptable, like in the case of our running example, then we can specify a (time-dependent) set of values of interest. At any time step, any difference between these values and the data actually obtained can be interpreted as a flaw in the system's behaviour. We use a penalty function ρ to quantify these differences. From the penalty function we can obtain a distance on data states, namely a 1-bounded hemimetric m D expressing how much a data state d 2 is worse than a data state d 1 according to the parameters of interest. The idea is that if d 1 is obtained from the interaction of P 1 with E and d 2 from that of P 2 with E, then d 2 being worse than d 1 means that P 2 is performing worse than P 1 in this case.
Similarly, if d 1 and d 2 are obtained from the interaction of P with E 1 and E 2 , respectively, we can express whether in this particular configuration and time step P behaves worse in one environment with respect to the other.
Secondly, we recall that the evolution sequence of a system consists of a sequence of distributions over data states. Hence, we use the Wasserstein metric to lift m^D to a distance W(m^D) over distributions over data states. Informally, the Wasserstein metric gives the expected value of the ground distance between the elements in the support of the distributions. Thus, in our case, the lifting expresses how much worse a configuration is expected to behave with respect to another one at a given time.
Finally, we need to lift W(m D ) to a distance on the entire evolution sequences of systems. For our purposes, a reasonable choice is to take the maximum over time of the pointwise (with respect to time) Wasserstein distances (see Remark 4.7 below for further details on this choice). Actually, to favour computational tractability, our metric will not be evaluated on all the distributions in the evolution sequences, but only on those that are reached at certain time steps, called the observation times (OT).
We devote the remainder of this section to a formalisation of the intuitions above.
4.1. A metric on data states. We start by proposing a metric on data states, seen as static components in isolation from processes and environment. To this end, we introduce a penalty function ρ : D → [0, 1], a continuous function assigning to each data state d a penalty expressing how far the values of the parameters of interest in d are from their desired ones (hence ρ(d) = 0 if d respects all the parameters). Since sometimes the parameters can be time-dependent, we also introduce a time-dependent version of ρ: at any time step τ, the τ-penalty function ρ_τ compares the data states with respect to the values of the parameters expected at time τ. When the value of the penalty function is independent of the time step, we simply omit the subscript τ.
Example 4.1. A requirement on the behaviour of the three-tanks system from Example 3.2 can be that the level of water l_i should be at the level l_goal, for i = 1, 2, 3. This can be easily rendered in terms of the penalty functions ρ_{l_i}, for i = 1, 2, 3, defined as the normalised distance between the current level of water d(l_i) and l_goal, namely:

ρ_{l_i}(d) = |d(l_i) − l_goal| / max{l_max − l_goal, l_goal − l_min}.

Clearly, we can also consider as a requirement that all the l_i are at level l_goal. The penalty function thus becomes

ρ_max(d) = max_{i ∈ {1,2,3}} ρ_{l_i}(d).

The ones proposed above are just a few simple examples of requirements, and related penalty functions, for the three-tanks experiment. Yet, in their simplicity, they will allow us to showcase a number of different applications of our framework while remaining in a setting where the reader can easily foresee the expected results.
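In Python, the penalty functions of this example can be sketched as follows. This is an illustrative sketch, not the API of the spear tool: the dict-based data states and all names are ours, and the normalisation constant max{l_max − l_goal, l_goal − l_min} is chosen so that the empty-tank initial state receives penalty 1, consistently with the values used later (l_min = 0, l_max = 20, l_goal = 10).

```python
# Illustrative sketch of the penalty functions of Example 4.1.
# A data state is modelled as a dict from variable names to values;
# the constants below match the setting of Example 5.3.
L_MIN, L_MAX, L_GOAL = 0.0, 20.0, 10.0

def rho_l(d, i):
    """Normalised distance (in [0, 1]) of the level of tank i from l_goal."""
    norm = max(L_MAX - L_GOAL, L_GOAL - L_MIN)
    return abs(d[f"l{i}"] - L_GOAL) / norm

def rho_max(d):
    """Penalty for the requirement that all three tanks are at l_goal."""
    return max(rho_l(d, i) for i in (1, 2, 3))

d0 = {"l1": 0.0, "l2": 0.0, "l3": 0.0}   # initial data state: tanks empty
print(rho_l(d0, 3))   # 1.0: l3 is as far from l_goal as possible
print(rho_max(d0))    # 1.0
```

Note that ρ_max(d_0) = 1 reflects the observation made in Example 6.5 below that the initial data state d_0 has maximal penalty.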
A formal definition of the penalty function is beyond the purpose of this paper, also due to its context-dependent nature. Besides, notice that we can assume that ρ already accounts for some tolerances with respect to the exact values of the parameters, and thus we do not consider them explicitly. The (timed) metric on data states is then defined as the asymmetric difference between the penalties assigned to the two data states by the penalty function.
Definition 4.2 (Metric on data states). For a time step τ and a τ-penalty function ρ_τ, the τ-metric on data states m^D_{ρ,τ} : D × D → [0, 1] is defined, for all d_1, d_2 ∈ D, by

m^D_{ρ,τ}(d_1, d_2) = max{ρ_τ(d_2) − ρ_τ(d_1), 0}.

Whenever m^D_{ρ,τ}(d_1, d_2) > 0, the penalty assigned to d_2 is higher than that assigned to d_1. For this reason, we say that m^D_{ρ,τ}(d_1, d_2) expresses how much worse d_2 is than d_1 with respect to the objectives of the system. It is not hard to see that m^D_{ρ,τ}(d, d) = 0 and m^D_{ρ,τ}(d_1, d_3) ≤ m^D_{ρ,τ}(d_1, d_2) + m^D_{ρ,τ}(d_2, d_3) for all d, d_1, d_2, d_3 ∈ D, thus ensuring that m^D_{ρ,τ} is a 1-bounded hemimetric. When no confusion shall arise, we shall drop the ρ, τ subscripts. We remark that penalty functions allow us to define the distance between two data states, which are elements of R^n, in terms of a distance on R. As we discuss in Section 5, this feature significantly lowers the complexity of the evaluation of the evolution metric.
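The asymmetric difference underlying this hemimetric is a one-liner. The sketch below, with a toy penalty on a single level value (names are ours, for illustration only), shows the asymmetry at work:

```python
# Minimal sketch of the metric on data states: max{rho(d2) - rho(d1), 0},
# assuming a penalty function rho mapping data states to [0, 1].
def m_D(rho, d1, d2):
    """How much worse d2 is than d1 with respect to rho (a hemimetric)."""
    return max(rho(d2) - rho(d1), 0.0)

rho = lambda d: abs(d - 10.0) / 10.0   # toy penalty: distance from target 10
print(m_D(rho, 10.0, 5.0))  # 0.5: level 5.0 is worse than the target 10.0
print(m_D(rho, 5.0, 10.0))  # 0.0: level 10.0 is not worse than 5.0
```

The second call returning 0 is exactly the asymmetry that distinguishes a hemimetric from a (pseudo)metric, and it is what makes the two directed distances of Figure 9 differ.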

4.2. Lifting m^D to distributions. The second step to obtain the evolution metric consists in lifting m^D to a metric on distributions over data states. In the literature we can find a wealth of notions of distances on distributions (see [RKSF13] for a survey). For our purposes, the most suitable one is the Wasserstein metric [Vas69]. According to Definition 2.2 (given in Section 2), for any two distributions µ, ν on (D, B_D), the Wasserstein lifting of m^D to a distance between µ and ν is defined by

W(m^D)(µ, ν) = inf_{w ∈ W(µ,ν)} ∫_{D×D} m^D(d_1, d_2) dw(d_1, d_2),

where W(µ, ν) is the set of the couplings of µ and ν, namely the set of joint distributions w over the product space (D × D, B_D ⊗ B_D) having µ and ν as, respectively, left and right marginal.
4.3. The evolution metric. We now need to lift W(m^D) to a distance on evolution sequences. To this end, we observe that the evolution sequence of a configuration includes the distributions over data states induced after each computation step. However, it could be the case that the changes induced on data by the environment can be appreciated only along wider time intervals. To deal with this kind of situation, we introduce the notion of observation times, namely a discrete set OT of time steps at which the modifications induced by the program-environment interplay give us useful information on the evolution of the system (with respect to the considered objectives). Hence, a comparison of the evolution sequences based on the differences between the distributions reached at the times in OT can be considered meaningful. Moreover, considering only the differences at the observation times will favour the computational tractability of the evolution metric.
To compare evolution sequences, we propose a sort of weighted infinity norm of the tuple of Wasserstein distances between the distributions occurring in them. As weight we consider a non-increasing function λ : OT → (0, 1] allowing us to express how much the distance at time τ, namely W(m^D_{ρ,τ})(S^D_{c_1,τ}, S^D_{c_2,τ}), affects the overall distance between the configurations c_1 and c_2. The idea is that, in certain application contexts, the differences in the behaviour of systems detected after wide time intervals can be less relevant than the differences in the initial behaviour. Hence, we can use the weight λ(τ) to mitigate the distance of events at time τ. Following the terminology introduced in the context of behavioural metrics [dAHM03, DGJP04], we refer to λ as the discount function, and to λ(τ) as the discount factor at time τ. Clearly, the constant function λ(τ) = 1 for all τ ∈ OT means that no discount is applied.
Definition 4.5 (Evolution metric). Assume a set OT of observation times and a discount function λ. For each τ ∈ OT, let ρ_τ be a τ-penalty function and let m^D_{ρ,τ} be the τ-metric on data states defined on it. Then, the λ-evolution metric over ρ and OT is the mapping m^λ_{ρ,OT} : C × C → [0, 1] defined, for all configurations c_1, c_2 ∈ C, by

m^λ_{ρ,OT}(c_1, c_2) = sup_{τ ∈ OT} λ(τ) · W(m^D_{ρ,τ})(S^D_{c_1,τ}, S^D_{c_2,τ}).

Since W(m^D_{ρ,τ}) is a 1-bounded hemimetric (Proposition 4.4), we can easily derive the same property for m^λ_{ρ,OT}.

Proposition 4.6. Function m^λ_{ρ,OT} is a 1-bounded hemimetric on C.

Notice that if λ is a strictly non-increasing function, then only a finite number of observations is needed to obtain upper bounds on the evolution metric.
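Once the pointwise Wasserstein distances at the observation times are available, the evolution metric is just a discounted maximum. A minimal sketch, assuming those distances have already been computed and stored in a dict W indexed by the times in OT (the numerical values below are hypothetical):

```python
# Sketch of Definition 4.5: the weighted infinity norm of the pointwise
# Wasserstein distances. W maps each observation time tau to the distance
# W(m_D_{rho,tau}) between the two evolution sequences at time tau.
def evolution_metric(W, OT, lam=lambda tau: 1.0):
    """max over tau in OT of lam(tau) * W[tau], with discount function lam."""
    return max(lam(tau) * W[tau] for tau in OT)

W = {0: 0.30, 50: 0.10, 100: 0.05}     # hypothetical pointwise distances
print(evolution_metric(W, W.keys()))   # 0.3 (constant discount lam = 1)
print(evolution_metric(W, W.keys(), lam=lambda t: 0.99 ** t))
```

With the constant discount the metric is determined by the worst time step, as discussed in Remark 4.7; a decreasing λ instead damps the contribution of late observation times.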
Remark 4.7. Usually, due to the presence of uncertainties, the behaviour of a system can be considered acceptable even if it differs from its intended one up to a certain tolerance. Similarly, the properties of adaptability, reliability, and robustness that we aim to study check whether a program is able to perform well in a perturbed environment up to a given tolerance. In this setting, the choice of defining the evolution metric as the pointwise maximal distance in time between the evolution sequences of systems is natural and reasonable: if in the worst case (the maximal distance) the program keeps the parameters of interest within the given tolerance, then its entire behaviour can be considered acceptable. However, with this approach a program is only as good as its worst performance, and one could argue that there are application contexts in which our evolution metric would be less meaningful. For these reasons, we remark that we could have given a parametric version of Definition 4.5, defining the evolution metric in terms of a generic aggregation function f over the tuple of Wasserstein distances. Then, one could choose the best instance of f according to the chosen application context. The use of a parametric definition would not have affected the technical development of our paper. However, in order to keep the notation and presentation as simple as possible, we opted to define m^λ_{ρ,OT} directly in the weighted infinity norm form. A similar reasoning applies to the definition of the penalty function that we gave in Example 4.1, and to the definition of the metric over data states (Definition 4.2).

Estimating the evolution metric
In this section we show how the evolution metric can be estimated via statistical techniques. Firstly, in Section 5.1 we show how to estimate the evolution sequence of a given configuration c. Then, in Section 5.2 we evaluate the distance between two configurations c_1 and c_2 on their estimated evolution sequences.

5.1. Computing empirical evolution sequences. Given a configuration ⟨P, d⟩_E and an integer k, we can use the function Simulate, defined in Figure 2, to sample a sequence of configurations of the form ⟨P_0, d_0⟩_E, ⟨P_1, d_1⟩_E, . . . , ⟨P_k, d_k⟩_E. This sequence represents k steps of a computation starting from ⟨P, d⟩_E = ⟨P_0, d_0⟩_E. Each step of the sequence is computed by using the function SimStep, also defined in Figure 2. There we let rand be a function returning a uniform random number in (0, 1], while sample(E, d) is used to sample a data state from the distribution E(d). We assume that for any measurable set of data states D ∈ B_D the probability of sample(E, d) being in D equals the probability of D induced by E(d), namely:

Pr(sample(E, d) ∈ D) = E(d)(D).   (5.1)

Example 5.1. If we consider the environment evolution from Examples 3.2 and 3.10, the function sample(E, d) would sample a random value x for q_2 from the distribution chosen for it (according to the considered scenario), and then solve the system of equations identified by f_3T(x, d).

We can then extend this property to configurations and the function SimStep:

Lemma 5.2. For any configuration ⟨P, d⟩_E ∈ C and for any measurable set C ∈ Σ_C:

Pr(SimStep(⟨P, d⟩_E) ∈ C) = cstep(⟨P, d⟩_E)(C).

Proof. To prove the thesis it is enough to show that the property holds on the generators of the σ-algebra Σ_C. Hence, assume that C = ⟨P, D⟩_E for some P ∈ Σ_P and D ∈ B_D. The thesis then follows from a chain of equalities in which:
• The first step follows by the definition of cstep (Definition 3.13).
• The second step follows by pstep(P, d) = p_1 · (θ_1, P_1) + · · · + p_n · (θ_n, P_n), for some p_i, θ_i and P_i (see Proposition 3.7).
• The third and fourth steps follow by the definition of the distribution ⟨µ_P, µ_D⟩_E (Notation 3.12), in which 1_P denotes the characteristic function of the set P, i.e., 1_P(P′) = 1 if P′ ∈ P and 1_P(P′) = 0 otherwise.
• The fifth step follows by the assumption on the function sample in Equation (5.1).
• The last step follows by the definition of the function SimStep.

To compute the empirical evolution sequence of a configuration c, the function Estimate in Figure 3 can be used.
The function Estimate(c, k, N) invokes N times the function Simulate in Figure 2 and collects, for each time step i ∈ {0, . . . , k}, the set E_i of the N data states sampled at step i, from which the empirical distributions are obtained. The algorithms in Figures 2 and 3 have been implemented in Python to support the analysis of systems. The tool is available at https://github.com/quasylab/spear.

Example 5.3. A simulation of the three-tanks laboratory experiment is given in Figure 4, with the following setting: (i) l_min = 0, (ii) l_max = 20, (iii) l_goal = 10, (iv) ∆_l = 0.5, (v) q_max = 6, (vi) q_step = q_max/5, (vii) ∆_q = 0.5, (viii) ∆τ = 0.1. Moreover, we assume an initial data state d_0 with l_i = l_min and q_i = 0, for i = 1, 2, 3. The plot on the left-hand side of Figure 4 depicts a simulation run of the system s_1, namely the system in which the environment evolution follows scenario 1 and whose initial configuration is c_1 = ⟨P_Tanks, d_0⟩_{E_1}. On the right-hand side we have the corresponding plot for the system s_2, whose environment evolution follows scenario 2 and whose initial configuration is thus c_2 = ⟨P_Tanks, d_0⟩_{E_2}. In Figure 5 we consider the same simulations, but we compare the variations in time of the level of water in the same tank in the two scenarios. This helps us highlight the potential differences in the evolution of the system with respect to the two environments.
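To make the structure of Simulate and Estimate concrete, here is a much simplified sketch in which a configuration is collapsed to a single data state, the program step and the environment are plain callables, and the toy controller and noise model are purely illustrative (the actual implementation is the spear tool linked above):

```python
import random

# Simplified sketch of SimStep, Simulate and Estimate (Figures 2 and 3).
# Here prog(d) -> d' plays the role of the program step, and env(d) returns
# a *sampled* data state, playing the role of sample(E, d).
def sim_step(prog, env, d):
    return env(prog(d))

def simulate(prog, env, d0, k):
    """One sampled computation of k steps starting from d0."""
    ds = [d0]
    for _ in range(k):
        ds.append(sim_step(prog, env, ds[-1]))
    return ds

def estimate(prog, env, d0, k, N):
    """N independent runs: E[i] collects the N data states sampled at step i."""
    runs = [simulate(prog, env, d0, k) for _ in range(N)]
    return [[run[i] for run in runs] for i in range(k + 1)]

# Toy instance: a controller nudging a level towards 10, under random noise.
prog = lambda d: d + 0.5 * (10.0 - d)
env = lambda d: d + random.gauss(0.0, 0.1)
E = estimate(prog, env, 0.0, k=20, N=100)
print(len(E), len(E[0]))   # 21 100
```

Each E[i] is then an empirical approximation of the distribution over data states at step i, which is exactly what the distance computation of Section 5.2 consumes.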
In Figure 6 we give an estimation of the distributions of l_1, l_2, l_3 obtained from our simulations of s_1 and s_2. In detail, the three plots in the top row report the comparison of the distributions over l_1 obtained in the two scenarios at time steps 50, 100 and 150, respectively, when using N = 1000 samples. The three plots in the central row are the corresponding ones for l_2, and in the bottom row we have the plots related to l_3. In both scenarios we obtain Gaussian-like distributions, as expected given the probabilistic behaviour of the environment defined in Examples 3.2 and 3.10. However, we observe that while in both cases the mean of the distributions is close to l_goal = 10, there is a significant difference in their variance: the variance of the distributions obtained in scenario 1 is generally smaller than that of those in scenario 2 (the curves for scenario 1 are "slimmer" than those for scenario 2). We will see how this disparity is captured by the evolution metric.
As usual in the related literature, we can apply the classic standard error approach to analyse the approximation error of our statistical estimation of the evolution sequences. Briefly, we let x ∈ {l_1, l_2, l_3} and we focus on the distribution of the means of our samples (in particular we consider the cases N = 1000, 5000, 10000) for each variable x. In each case, for each i ∈ {0, . . . , k} we compute the mean E_i(x) = (1/N) Σ_{j=1}^N x_{j,i} of the N values x_{j,i} sampled for x at step i, together with the corresponding standard error SE_i(x) (see Figure 7b for the variation in time of the standard error for the distribution of x = l_3). Finally, we proceed to compute the z-score of our sampled distribution as z_i = (E_i(x) − E(x)) / SE_i(x), where E(x) is the mean (or expected value) of the real distribution over x. In Figure 7c we report the variation in time of the z-score of the distribution over x = l_3: the dashed red lines correspond to z = ±1.96, namely the values of the z-score corresponding to a 95% confidence interval. We can see that our results can already be given with 95% confidence in the case N = 1000. Please notice that the oscillation in time of the values of the z-scores is due to the perturbations introduced by the environment in the simulations and to the natural oscillation in the interval [l_goal − ∆_l, l_goal + ∆_l] of the water levels in the considered experiment (see Figure 4). A similar analysis, with analogous results, can be carried out for the distributions of l_1 and l_2. In Figure 7d we report the variation in time of the z-scores of the distributions of the three variables, in the case N = 1000.
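The standard-error analysis above amounts to the following computation, shown here on a handful of hypothetical samples of l_3 centred on l_goal = 10:

```python
import math

# Sketch of the standard-error / z-score analysis: given N sampled values of
# a variable x at a fixed time step, compare the sample mean against a
# (known or assumed) true mean via the standard error of the mean.
def z_score(samples, true_mean):
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)   # sample variance
    se = math.sqrt(var / n)                                 # standard error
    return (mean - true_mean) / se

# Hypothetical samples of l3, oscillating around l_goal = 10:
samples = [9.8, 10.1, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1, 9.9, 10.0]
z = z_score(samples, 10.0)
print(abs(z) < 1.96)   # True: within the 95% confidence band
```

A |z| below 1.96 corresponds to the dashed bands of Figure 7c: the estimated mean is compatible with the true one at the 95% confidence level.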

5.2. Computing the distance between two configurations. The function Estimate allows us to collect independent samples at each time step i from 0 to a deadline k. These samples can be used to estimate the distance between two configurations c_1 and c_2. Following an approach similar to the one presented in [TK10], to estimate the Wasserstein distance W(m^D_{ρ,i}) between two (unknown) distributions S^D_{c_1,i} and S^D_{c_2,i} we can use N independent samples {c^1_1, . . . , c^N_1} taken from S^C_{c_1,i} and ℓ·N independent samples {c^1_2, . . . , c^{ℓN}_2} taken from S^C_{c_2,i}. After that, we exploit the i-penalty function ρ_i and we consider the two sequences of values ω_1, . . . , ω_N and ν_1, . . . , ν_{ℓN} given by the penalties ρ_i(d^j_1) and ρ_i(d^h_2) of the data states occurring in the sampled configurations. We can assume, without loss of generality, that these sequences are ordered, i.e., ω_j ≤ ω_{j+1} and ν_h ≤ ν_{h+1}. The value W(m^D_{ρ,i})(S^D_{c_1,i}, S^D_{c_2,i}) can then be approximated as

(1/(ℓN)) Σ_{h=1}^{ℓN} max{ν_h − ω_{⌈h/ℓ⌉}, 0}.

The next theorem, based on results in [TK10, Vil08, Val74], ensures that the larger the number of samples, the closer the gap between the estimated value and the exact one.
Theorem 5.4. Let S^C_{c_1,i}, S^C_{c_2,i} ∈ Π(C, Σ_C) be unknown. Let {c^1_1, . . . , c^N_1} be independent samples taken from S^C_{c_1,i}, and {c^1_2, . . . , c^{ℓN}_2} independent samples taken from S^C_{c_2,i}. Let {ω_1, . . . , ω_N} and {ν_1, . . . , ν_{ℓN}} be the ordered sequences obtained from the samples and the i-penalty function. Then, almost surely,

W(m^D_{ρ,i})(S^D_{c_1,i}, S^D_{c_2,i}) = lim_{N→∞} (1/(ℓN)) Σ_{h=1}^{ℓN} max{ν_h − ω_{⌈h/ℓ⌉}, 0}.

Proof. Let c^j_1 = ⟨P^j_1, d^j_1⟩_E, for all j = 1, . . . , N, and c^j_2 = ⟨P^j_2, d^j_2⟩_E, for all j = 1, . . . , ℓ·N. We split the proof into two parts, showing respectively that the Wasserstein distance between the unknown distributions coincides, in the limit, with that between the empirical ones (Equation (5.3)), and that the latter coincides with the sum above (Equation (5.4)).

We recall that the sequence {Ŝ^{D,N}_{c_l,i}} converges weakly to S^D_{c_l,i} for l ∈ {1, 2} (see Equation (5.2)). Moreover, we can prove that these sequences converge weakly in Π(D, B_D) in the sense of [Vil08, Definition 6.8]. In fact, given the i-penalty function ρ_i, the existence of a data state d such that ρ_i(d) = 0 is guaranteed (remember that the constraints used to define ρ_i are on the possible values of state variables, and a data state fulfilling all the requirements is assigned value 0). Since, moreover, by definition ρ_i is continuous and bounded, the weak convergence of the distributions ensures that Definition 6.8.(i) of [Vil08] is satisfied. As D is a Polish space, by [Vil08, Theorem 6.9] we obtain Equation (5.3).

• Proof of Equation (5.4). For this part of the proof we follow [TK10]. Since the penalty function is continuous, it is in particular B_D-measurable, and therefore for any distribution µ on (D, B_D) the map F_µ(t) = µ({d | ρ_i(d) ≤ t}) is a well-defined cumulative distribution function. Since we can always assume that the values ρ_i(d^j_1) are sorted, so that ρ_i(d^j_1) ≤ ρ_i(d^{j+1}_1) for each j = 1, . . . , N − 1, we can express the counter-image of the cumulative distribution function of each empirical distribution in terms of the ordered sequence of sampled penalties. Then, by [FR18, Proposition 6.2], for each N the Wasserstein distance between the empirical distributions can be expressed as an integral of the difference of the two counter-images. Moreover, both functions are constant on each interval determined by consecutive sample values, so that the value of the integral is given by their difference multiplied by the length of the interval. By substituting the resulting equality into Equation (5.3) we obtain the thesis.
The procedure outlined above is realised by the functions Distance and ComputeW in Figure 8. The former takes as input the two configurations to compare, the penalty function (seen as the sequence of the i-penalty functions), the discount function λ, the set OT of observation times, which we assume to be bounded, and the parameters N and ℓ used to obtain the samplings of computations. Function Distance collects the samples E_i of possible computations during the observation period [0, max OT], where max OT denotes the last observation time.
Then, for each observation time i ∈ OT, the distance at time i is computed via the function ComputeW(E 1,i , E 2,i , ρ i ).
Since the penalty function allows us to reduce the evaluation of the Wasserstein distance from R^n to R, and due to the sorting of {ν_h | h = 1, . . . , ℓN}, the complexity of function ComputeW is O(ℓN log(ℓN)) (cf. [TK10]). We refer the interested reader to [SFG+21, Corollary 3.5, Equation (3.10)] for an estimation of the approximation error given by the evaluation of the Wasserstein distance over N samples.
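The core of ComputeW can be sketched directly from the formula of Theorem 5.4: sort the two sequences of penalties and average the positive gaps ν_h − ω_⌈h/ℓ⌉. The function below is a sketch, not the tool's code; for brevity the toy penalty is the identity, i.e., the samples are given directly as penalty values. The two sorts dominate the cost, giving the O(ℓN log(ℓN)) complexity mentioned above.

```python
import math

# Sketch of ComputeW: one-dimensional empirical approximation of the
# Wasserstein lifting of the hemimetric on data states (Theorem 5.4).
# E1 and E2 are lists of sampled data states with len(E2) == ell * len(E1);
# rho is the i-penalty function.
def compute_w(E1, E2, rho):
    N, lN = len(E1), len(E2)
    ell = lN // N
    omega = sorted(rho(d) for d in E1)   # ordered penalties, N samples
    nu = sorted(rho(d) for d in E2)      # ordered penalties, ell*N samples
    # (1/(ell*N)) * sum over h of max{nu_h - omega_ceil(h/ell), 0},
    # shifting to 0-based indices.
    return sum(max(nu[h] - omega[math.ceil((h + 1) / ell) - 1], 0.0)
               for h in range(lN)) / lN

rho = lambda d: d   # toy penalty: samples are given directly as penalties
print(compute_w([0.1, 0.2], [0.2, 0.3, 0.1, 0.4], rho))
```

Note the asymmetry inherited from the hemimetric: only the positive gaps contribute, so exchanging E1 and E2 generally yields a different (possibly zero) value, as observed for Figures 9a and 9b.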

5.3. Analysis of the three-tanks experiment. Our aim is now to use the evolution metric, and the algorithms introduced above, to study various properties of the behaviour of the three-tanks system. In particular, we consider the following two objectives:
(1) Comparison of the behaviour of two systems generated from the interaction of the same program with two different environments, under the same initial conditions.
(2) Comparison of the behaviour of two systems generated from the interaction of two different programs with the same environment, under the same initial conditions.
5.3.1. Same program, different environments. Notice that since s_1 and s_2 are distinguished only by the environment evolution, a comparison of their behaviour by means of the evolution metric allows us to establish in which scenario the program P_Tanks performs better with respect to the aforementioned tasks.
Firstly, we focus on a single tank by considering the target l_3 = l_goal, and thus the penalty function ρ_{l_3}. We do not consider any discount (so λ is the constant function 1, i.e., λ(τ) = 1 for all τ) and we let OT = N ∩ [0, 150]. In Figure 9 we show the evaluation of m^1_{ρ_{l_3},OT} between s_1 and s_2 (Figure 9a) and between s_2 and s_1 (Figure 9b) over a simulation with 5000 samples (N = 1000, ℓ = 5). Notice that the first distance is m^1_{ρ_{l_3},OT}(c_1, c_2) ≈ 0.15, whereas the second one is m^1_{ρ_{l_3},OT}(c_2, c_1) ≈ 0.02. The different scale of the two distances (the latter is less than 1/7 of the former) already gives us an intuition of the advantages that the use of a hemimetric gives us over a (pseudo)metric in terms of expressiveness.
Consider Figure 9a. Recall that, given the definition of our hemimetric on data states, the value of m^1_{ρ_{l_3},OT}(c_1, c_2) expresses how much the probabilistic evolution of the data in the evolution sequence of c_2 is worse than that of c_1, with respect to the task identified by ρ_{l_3}. In other words, the higher the value of m^1_{ρ_{l_3},OT}(c_1, c_2), the worse the performance of c_2 with respect to c_1. Yet, for those who are not familiar with the Wasserstein distance, the plot alone can be a little foggy, so let us shed some light. Notice that, although after the first 40 steps the distance decreases considerably, it never reaches 0. This means that at each time step, the distribution in the evolution sequence of c_2 assigns positive probability to (at least) one measurable set of data states in which l_3 is farther from l_goal than in all the measurable sets of data states in the support of the distribution reached at the same time by c_1. This is perfectly in line with our observations on the differences between the variances of the distributions in Figure 6 (cf. the bottom row, pertaining to l_3).
A natural question is then why the distance m^1_{ρ_{l_3},OT}(c_2, c_1), given in Figure 9b, is not constantly equal to 0. This is due to the combination of the Wasserstein distance with the hemimetric m^{D_3T}_{ρ_{l_3}}. Since m^{D_3T}_{ρ_{l_3}}(d_1, d_2) = 0 whenever ρ_{l_3}(d_2) < ρ_{l_3}(d_1), and the Wasserstein lifting is defined as the infimum distance over the couplings, the (measurable sets of) data states reached by c_2 that are better, with respect to ρ_{l_3}, than those reached by c_1 are "hidden" in the evaluation of m^1_{ρ_{l_3},OT}(c_1, c_2). However, they do contribute to the evaluation of m^1_{ρ_{l_3},OT}(c_2, c_1). Clearly, once we compute both m^1_{ρ_{l_3},OT}(c_1, c_2) and m^1_{ρ_{l_3},OT}(c_2, c_1), we can observe that the latter distance is less than 1/7 of the former, and we can thus conclude that s_1 shows a better behaviour than s_2.
As a final observation on Figure 9b, we notice that, for instance, in the time interval I = [50, 70] the pointwise distance between the evolution sequences is 0. This means that for each (measurable set of) data state(s) d_2 in the support of S^{D_3T}_{c_2,τ}, τ ∈ I, there is one (measurable set of) data state(s) d_1 in the support of S^{D_3T}_{c_1,τ} such that m^{D_3T}_{ρ_{l_3}}(d_2, d_1) = 0. Clearly, one can repeat a similar analysis for the other tasks of the system. To give a general idea of the difference in the behaviour of the system in the two scenarios, in Figure 10 we report the evaluation of the pointwise distance between the evolution sequences of s_1 and s_2 (Figure 10a), and vice versa (Figure 10b).

5.3.2. Different programs, same environment. We introduce two new programs: P^+_Tanks and P^−_Tanks. They are defined exactly as P_Tanks, the only difference being the value of ∆_l used in the if/else guards in P_in and P_out. For P_Tanks we used ∆_l = 0.5; we use ∆^+_l = 0.7 for P^+_Tanks, and ∆^−_l = 0.3 for P^−_Tanks. Our aim is to compare the behaviour of s_1 with that of the systems s^+ and s^− having as initial configurations, respectively, c^+ = ⟨P^+_Tanks, d_0⟩_{E_1} and c^− = ⟨P^−_Tanks, d_0⟩_{E_1}. Notice that the three systems are characterised by the same environment evolution E_1 and the same initial data state d_0. Moreover, as outlined above, we see the value of ∆_l as an inherent property of the program.
So, while the outcome of the experiment will be exactly the one the reader expects, this example shows that with our framework we can compare the ability of different programs to fulfil their objectives when operating on an equal footing, thus offering us the possibility of choosing the most suitable one according to (possibly) external factors. For instance, we can imagine that while ∆^−_l allows for a finer control, it also causes more frequent modifications of the flow rate q_1 than ∆_l does. These may in turn entail a higher risk of a breakdown of the pump attached to the first tank. In this case, we can use the evolution metric to decide whether the difference in the performance of the programs justifies the higher risk, or if P_Tanks is still "good enough" for our purposes.

Figure 11. Evaluation of the distances between P^−_Tanks, P_Tanks and P^+_Tanks with respect to ρ_{l_3}.
In Figure 11 we present the results of the evaluations of m^1_{ρ_{l_3},OT}(c^−, c_1), m^1_{ρ_{l_3},OT}(c_1, c^+), and m^1_{ρ_{l_3},OT}(c^−, c^+), where the target of the programs is l_3 = l_goal (i.e., the penalty function ρ_{l_3} is considered). In detail, in Figure 11a we report the values of the three evolution metrics (see Figure 9 for an explanation of how to read the graph), while in Figure 11b we report the pointwise (with respect to time) Wasserstein distances between the distributions in the evolution sequences of the systems.

Robustness of programs
We devote this section to the analysis of the robustness of programs with respect to a data state and an environment. We also introduce two sensitivity properties of programs, which we call adaptability and reliability, capturing the ability of a program to induce a similar behaviour in systems that start operating from similar initial conditions.

6.1. Robustness. A system is said to be robust if it is able to function correctly even in the presence of uncertainties. In our setting, we formalise the robustness of programs by measuring their capability to tolerate perturbations in the environmental conditions.
First of all, we characterise the perturbations with respect to which we want the program to be robust. Given a program P , a data state d, and an environment E, we consider all data states d ′ such that the behaviour of the original configuration ⟨P, d⟩ E is at distance at most η 1 , for a chosen η 1 ∈ [0, 1), from the behaviour of ⟨P, d ′ ⟩ E . The distance is evaluated as the evolution metric with respect to a chosen penalty function ρ and a time interval I. The idea is that ρ and I can be used to characterise any particular situation we might want to study. For instance, if we think of a drone autonomously setting its trajectory, then I can be the time window within which a gust of wind occurs, and ρ can capture the distance from the intended trajectory.
We can then proceed to measure the robustness. Let ρ′ be the penalty function related to the (principal) target of the system, and let τ̄ be an observable time instant. We evaluate the evolution metric, with respect to ρ′, between the behaviour of ⟨P, d⟩_E shown after time τ̄, and that of ⟨P, d′⟩_E for all the perturbations d′ obtained above. We say that the robustness of P with respect to d and E is η_2 ∈ [0, 1) if η_2 is an upper bound for all those distances. Notice that we use two penalty functions, one to identify the perturbations (ρ), and one to measure the robustness (ρ′), which can potentially be related to different targets of the system. For instance, in the aforementioned case of the drone, ρ′ can be related to the distance from the final location, or to battery consumption. This allows for an analysis of the robustness of programs in a general setting, where the perturbations and the (long-term) system behaviour take into account different sets of variables. Moreover, the time instant τ̄ plays the role of a time bonus given to the program to counter the effects of perturbations: differences detectable within time τ̄ are not considered in the evaluation of the robustness threshold.
Definition 6.1 formalises the intuitions given above.
Definition 6.1 (Robustness). Let ρ, ρ′ be two penalty functions, I a time interval, τ̄ ∈ OT, and η_1, η_2 ∈ [0, 1). We say that P is (ρ, ρ′, I, τ̄, η_1, η_2)-robust with respect to the data state d and the environment evolution E if, for all data states d′,

m^λ_{ρ,OT∩I}(⟨P, d⟩_E, ⟨P, d′⟩_E) ≤ η_1 implies m^λ_{ρ′,{τ∈OT | τ≥τ̄}}(⟨P, d⟩_E, ⟨P, d′⟩_E) ≤ η_2.

We can use our algorithm to estimate the robustness of a given program. Given a configuration ⟨P, d⟩_E, a set OT of observation times, a given threshold η_1 ≥ 0, a penalty function ρ, and a time interval I, we can sample M variations {d_1, . . . , d_M} of d such that, for any i ∈ {1, . . . , M}, m^λ_{ρ,OT∩I}(⟨P, d⟩_E, ⟨P, d_i⟩_E) ≤ η_1. Then, for each sampled data state d_i, we can estimate the distance between c = ⟨P, d⟩_E and c_i = ⟨P, d_i⟩_E, with respect to ρ′, after time τ̄, namely m^λ_{ρ′,{τ∈OT | τ≥τ̄}}(c, c_i), for any τ̄ ∈ OT. Finally, for each τ̄ ∈ OT, we let

ξ_τ̄ = max_{i ∈ {1,...,M}} m^λ_{ρ′,{τ∈OT | τ≥τ̄}}(c, c_i).

For the chosen η_1, ρ, I, ρ′ and τ̄, each ξ_τ̄ gives us a lower bound for η_2, and thus for the robustness of the program.

Example 6.2. We can use our definition of robustness to show that whenever the program is able to (initially) keep the difference with respect to the target l_3 = l_goal bounded, then also the distance with respect to the target l_1 = l_goal = l_2 will be bounded. Formally, we instantiate the parameters of Definition 6.1 as follows: ρ = ρ_{l_3}, ρ′ = (1/2)ρ_{l_1} + (1/2)ρ_{l_2}, I = [0, 50], and η_1 = 0.3. Figure 12 reports the evaluation of ξ_50 with respect to M = 100 variations d′ satisfying m^1_{ρ,OT∩I}(⟨P_Tanks, d_s⟩_{E_1}, ⟨P_Tanks, d′⟩_{E_1}) ≤ 0.3.
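The estimation procedure for ξ_τ̄ can be sketched as follows. Everything here is illustrative: `distance` is a stub standing in for the estimated evolution metric of Section 5.2 (in the tool, the function Distance), and the toy penalty and perturbation model are ours; only the structure of the computation (sample, filter by η_1, maximise after τ̄) reflects the text.

```python
import random

# High-level sketch of the robustness-estimation procedure: sample M
# variations of d that are eta1-close over OT ∩ I, then take the maximal
# distance w.r.t. rho2 over the observation times after tau_bar.
def estimate_xi(d, perturb, distance, rho, rho2, OT, I, tau_bar, eta1, M=100):
    """Estimate xi_tau_bar, a lower bound on the robustness threshold eta2."""
    variations = []
    while len(variations) < M:
        di = perturb(d)                    # sampled variation of the data state d
        if distance(d, di, rho, [t for t in OT if I[0] <= t <= I[1]]) <= eta1:
            variations.append(di)          # keep only eta1-close perturbations
    late_times = [t for t in OT if t >= tau_bar]
    return max(distance(d, di, rho2, late_times) for di in variations)

# Toy stub: penalties on a single level value; distance ignores the times.
rho = lambda d: abs(d - 10.0) / 10.0
dist = lambda d, di, r, ts: max(r(di) - r(d), 0.0)
random.seed(1)
xi = estimate_xi(10.0, lambda d: d + random.uniform(-4, 4),
                 dist, rho, rho, OT=range(151), I=(0, 50), tau_bar=50,
                 eta1=0.3, M=50)
print(0.0 <= xi <= 0.3)   # True by construction of the eta1 filter
```

In the toy stub the filtering and the measured distance use the same penalty, so ξ is trivially bounded by η_1; in the actual analysis ρ and ρ′ may concern different targets, and ξ_τ̄ is genuinely informative.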
6.2. Adaptability and reliability. We define the sensitivity properties of adaptability and reliability as particular instances of the notion of robustness. Briefly, adaptability considers only one penalty function and reduces the time interval I to the sole initial time step 0. Then, reliability also fixes τ̄ = 0. Let us present them in detail. The notion of adaptability imposes some constraints on the long-term behaviour of systems, disregarding their possible initial dissimilarities. Given the thresholds η_1, η_2 ∈ [0, 1) and an observable time τ̄, we say that a program P is adaptable with respect to a data state d and an environment evolution E if, whenever P starts its computation from a data state d′ that differs from d by at most η_1, we are guaranteed that the distance between the evolution sequences of the two systems after time τ̄ is bounded by η_2.

Definition 6.3 (Adaptability). Let ρ be a penalty function, τ̄ ∈ OT and η_1, η_2 ∈ [0, 1). We say that P is (τ̄, η_1, η_2)-adaptable with respect to the data state d and the environment evolution E if, for all data states d′,

m^D_{ρ,0}(d, d′) ≤ η_1 implies m^λ_{ρ,{τ∈OT | τ≥τ̄}}(⟨P, d⟩_E, ⟨P, d′⟩_E) ≤ η_2.

We remark that one can always consider the data state d as the ideal model of the world used for the specification of P, and the data state d′ as the real world in which P has to execute. Hence, the idea behind adaptability is that even if the initial behaviour of the two systems is quite different, P is able to reduce the gap between the real evolution and the desired one within the time threshold τ̄. Notice that being (τ̄, η_1, η_2)-adaptable for τ̄ = min{τ | τ ∈ OT} is equivalent to being (τ, η_1, η_2)-adaptable for all τ ∈ OT.
The notion of reliability strengthens that of adaptability by bounding the distance on the evolution sequences from the beginning. A program is reliable if it guarantees that small variations in the initial conditions cause only bounded variations in its evolution.
Definition 6.4 (Reliability). Let ρ be a penalty function and η_1, η_2 ∈ [0, 1). We say that P is (η_1, η_2)-reliable with respect to the data state d and the environment evolution E if, for all data states d′,

m^D_{ρ,0}(d, d′) ≤ η_1 implies m^λ_{ρ,OT}(⟨P, d⟩_E, ⟨P, d′⟩_E) ≤ η_2.

The same strategy depicted above for the evaluation of robustness can also be applied in the case of adaptability and reliability. Briefly, we simulate the behaviour of M variations satisfying the requirement on the initial distance between data states, m^D_{ρ,0}(d, d_i) ≤ η_1, and use the bounds ξ_τ̄ to estimate the adaptability bound η_2. Similarly, for τ_min = min OT, ξ_{τ_min} gives a lower bound for the reliability of P.
Example 6.5. We provide an example of the evaluation of the adaptability of P_Tanks. We consider the environment evolution E_1, i.e., the one corresponding to scenario 1, and, as initial data state, the data state d_s such that d_s(q_i) = 0 and d_s(l_i) = 5, for i = 1, 2, 3. The rationale for considering d_s in place of d_0 is that ρ(d_0) = 1 and thus m^{D_{3T}}_ρ(d_0, d′) = 0 for all data states d′, for any penalty function ρ ∈ {ρ_{l_1}, ρ_{l_2}, ρ_{l_3}, ρ_max}. Conversely, there are several data states d′ such that m^{D_{3T}}_ρ(d_s, d′) > 0, so that we can take into account the behaviour of the system when starting from "worse" initial conditions. Let us stress that, since the penalty functions ρ are evaluated only on (some of) the values l_i, for i = 1, 2, 3, the data states d′ that are within distance η_1 from d_s can have any value in D_q initially assigned to q_j, j = 1, 2, 3 (and any value in D_l initially assigned to the l_j that are possibly not taken into account in the evaluation of ρ).
To obtain the results reported in Figure 13 we considered, for each ρ ∈ {ρ_{l_1}, ρ_{l_2}, ρ_{l_3}, ρ_max}, M = 100 variations of the data state d_s within the (closed) ball of radius η_1 = 0.3 with respect to m^{D_{3T}}_ρ, and performed each simulation over 5000 samples (N = 1000, ℓ = 5), keeping k = 150 as time horizon. Figure 13a represents the adaptability of P_Tanks with respect to d_s and E_1 for ρ = ρ_{l_3}. We can then infer that P_Tanks is (30, 0.3, 0.2)-adaptable and (50, 0.3, 0.05)-adaptable with respect to d_s and E_1, when considering the target l_3 = l_goal. Please notice that one should always take into consideration the approximation error related to the computation of the Wasserstein distance [SFG+21, Corollary 3.5, Equation (3.10)] for an exact evaluation. Yet, in our presentation we are more interested in showing the method rather than in making exact calculations. Figure 13b offers a comparison of the adaptability with respect to the various penalty functions defined for the three-tanks system.
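The sampling of the M variations of d_s within the ball of radius η_1 can be sketched, for instance, by rejection sampling; here `sample_state` and `distance` are hypothetical helpers standing for a generator of candidate initial data states and for the metric m^{D_{3T}}_ρ, respectively:

```python
import random

def sample_variations(d_s, distance, eta1, m, sample_state, rng=random):
    """Rejection-sample m data states in the closed ball of radius eta1.

    `sample_state(rng)` draws a candidate data state, `distance(d, d2)`
    evaluates the (hemi)metric between data states; candidates farther
    than eta1 from d_s are discarded.
    """
    variations = []
    while len(variations) < m:
        candidate = sample_state(rng)
        if distance(d_s, candidate) <= eta1:
            variations.append(candidate)
    return variations
```

Note that this keeps the penalty-irrelevant variables unconstrained, matching the observation above that only the values l_i affect the distance.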

7. Case study: the engine system
In this section we analyse a case study proposed in [LMMT21], consisting of two supervised, self-coordinating refrigerated engine systems that are subject to cyber-physical attacks aiming to inflict overstress of equipment [GGI+15].
Since the impact of an attack can be viewed as a flaw in system behaviour, in our context it will be quantified by employing suitable penalty functions. In particular, the impact of attacks aiming to overstress the equipment can be quantified in two different ways: by simply measuring the level of the equipment's stress, or by comparing that value with the alarm signals generated by the system when attempting to detect the attack. In the former case one focuses only on the damage inflicted by attacks, and we will use penalty functions quantifying the stress level of the system's equipment; in the latter case one is mainly interested in false positives and false negatives raised by the detection system, and the penalty functions will quantify the false negatives and false positives produced when attempting to detect attacks and to raise alarm events. In both cases, our simulations will allow us to estimate the distance between the genuine system and the system under attack, thus quantifying how much worse the system under attack behaves with respect to the genuine one.
7.1. Modelling of the engine system. We start by describing a single engine. Then, we will describe the system obtained by combining two identical engines with a supervisor. Finally, we will introduce cyber-physical attacks tampering with a single engine or with the whole system. Interestingly, our choice of modelling the program and the environment separately supports well the specification of these attacks: since an attack consists of a piece of malicious code that is somehow injected into the logic of the system, an attack specified by a process A can be easily injected into a genuine system ⟨P, d⟩_E by composing P and A in parallel, thus giving ⟨P | A, d⟩_E. The engine we consider is a CPS whose logic aims to perform three tasks: (i) regulate the speed, (ii) maintain the temperature within a specific range by means of a cooling system, and (iii) detect anomalies. The first two tasks are in charge of a controller, while the last one is taken over by an intrusion detection system, henceforth IDS.
The data space consists of the following variables: i. temp, representing a sensor detecting the temperature of the engine; ii. cool, representing an actuator to turn on/off the cooling system; iii. speed, representing an actuator to regulate the engine speed, with values in the set {slow, half, full}, where half is the normal setting; iv. ch speed, representing a channel used by the IDS to command the controller to set the actuator speed at slow, when an anomaly is detected, or at half, otherwise; v. ch in and ch out, representing two channels used for communication between the two engines: when an IDS commands the controller of its own engine to operate at slow speed, in order to compensate for the consequent lack of performance it asks the controller of the other engine to operate at full speed; vi. stress, denoting the level of stress of the engine, due to its operating conditions; vii. six variables p_k, for 1 ≤ k ≤ 6, recording the temperatures in the last six time instants; viii. temp fake, representing a noise introduced by the malicious activity of attackers tampering with sensor temp: we assume that the IDS can access the genuine value temp, whereas the controller receives the value of temp through an insecure channel ch temp that may be compromised and whose value is obtained by summing up the noise temp fake and temp; ix. ch warning, representing a channel used by the IDS to raise anomalies.

The environment evolution E affects these variables as follows: (1) temp changes by a value determined probabilistically according to a uniform distribution; (2) the variables p_k, for k ∈ 1..6, are updated as expected to record the last six temperatures detected by sensor temp; (3) stress remains unchanged if the temperature was below a constant max, set to 100 in our experiments, for at least 3 of the last 6 time instants; otherwise, the variable is increased (reaching at most the value 1) by a constant stressincr that we set at 0.02; (4) all variables representing channels or actuators are not modified by E, since they are under the control of the program.
Summarising, the dynamics of p_k and stress can be modelled via the following set of stochastic difference equations, with sampling time interval ∆τ = 1:

    p_1(τ + 1) = temp(τ)
    p_{k+1}(τ + 1) = p_k(τ), for 1 ≤ k ≤ 5
    stress(τ + 1) = min{1, stress(τ) + stressincr} if |{k | 1 ≤ k ≤ 6 ∧ p_k(τ) ≥ max}| > 3, and stress(τ + 1) = stress(τ) otherwise   (7.1)

and temp varies by a value that is uniformly distributed in an interval depending on the state of actuators cool and speed, the largest variation occurring if cool(τ) = off and speed(τ) = full. (7.2)

The whole engine, the controller and the IDS are modelled by the following processes Eng, Ctrl and IDS, respectively:

    . . . .Check else (on → cool).Cooling
    ((hot → ch warning), (low → ch speed), (full → ch out)).IDS
    else ((ok → ch warning), (half → ch speed), (half → ch out)).IDS

where we write ((e_1 → x_1), . . . , (e_n → x_n)) to denote the prefix (e_1 . . . e_n → x_1 . . . x_n). At each scan cycle Ctrl checks if the value of the temperature as carried by channel ch temp is too high, namely above threshold, a constant ≤ max that is set to 99.8 in our experiments. If this is the case, then Ctrl activates the coolant for 5 consecutive time instants, otherwise the coolant remains off. In both cases, before re-starting its scan cycle, Ctrl waits for instructions/requests to change the speed: if it receives instructions through channel ch speed from the IDS to slow down the engine, then it commands so through actuator speed. Otherwise, if the IDS of the other engine requests through channel ch in to work at full or half power, then Ctrl sets speed accordingly.
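A single environment step for the variables p_k, stress and temp can be sketched in Python as follows; the temperature-variation intervals are illustrative placeholders, since the concrete bounds of Equation (7.2) depend on the state of the actuators cool and speed:

```python
import random

MAX_TEMP = 100.0      # constant `max` from the experiments
STRESS_INCR = 0.02    # constant `stressincr`

def env_step(state):
    """One environment step E for a single engine (cf. Equation (7.1)).

    `state` maps variable names to values; only the E-controlled
    variables are touched, channels and actuators are left alone.
    """
    # Shift register of the last six detected temperatures.
    for k in range(6, 1, -1):
        state[f"p{k}"] = state[f"p{k-1}"]
    state["p1"] = state["temp"]

    # Stress grows (up to 1) when the temperature exceeded `max`
    # in more than 3 of the last 6 time instants.
    above = sum(1 for k in range(1, 7) if state[f"p{k}"] >= MAX_TEMP)
    if above > 3:
        state["stress"] = min(1.0, state["stress"] + STRESS_INCR)

    # Temperature varies uniformly in an interval depending on the
    # actuators; these bounds are assumptions, not the paper's values.
    if state["cool"] == "off" and state["speed"] == "full":
        lo, hi = -0.2, 1.2
    elif state["cool"] == "off":
        lo, hi = -0.5, 0.8
    else:
        lo, hi = -1.2, 0.2
    state["temp"] += random.uniform(lo, hi)
    return state
```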
The process IDS checks whether the cooling system is active when the temperature is above max. If this safety condition is violated, then: (i) it signals the presence of an anomaly, via channel ch warning; (ii) it commands Ctrl to slow down the engine, via ch speed; (iii) it requests the other engine to run at full power, via channel ch out. Otherwise, IDS asks (both the local and the external) controllers to set their engines at half power.
An engine system can be built by combining two engines, the "left" and the "right" one, interacting with a supervisory component SV that checks their correct functioning. The composite system is obtained by composing in parallel the supervisor SV with the processes Eng L and Eng R, which are obtained from Eng by renaming the variables as follows: (i) both ch in in Eng L and ch out in Eng R are renamed as ch speed R to L; (ii) both ch in in Eng R and ch out in Eng L are renamed as ch speed L to R; (iii) all remaining variables x used in Eng are renamed as x L in Eng L and as x R in Eng R. Then, we introduce a new channel ch alarm, which is used by SV to forward to the external environment the information obtained from the IDSs through ch warning L and ch warning R. In detail, ch alarm carries a value in the set {none, left, right, both}, depending on the anomaly warnings received from the left and right IDS.

Process Att Act X models an integrity attack on actuator cool X. The five-instants cooling cycle started by Ctrl X is maliciously interrupted as soon as temp X goes below max − AC, with AC a constant with value in [1.0, 3.0], in order to let the temperature of the engine rise quickly above max, accumulating stress in the system and requiring continuous activation of the cooling system. We recall that the stress accumulated by engine Eng X is recorded in variable stress X, which is incremented by E when |{k | 1 ≤ k ≤ 6 ∧ p_k ≥ max}| > 3. In [LMMT21] this attack is classified as stealthy, since the condition temp X > max ∧ cool X = off monitored by the IDS never holds. Notice that AC is a parameter of the attack.

Then, Att Sen X is an integrity attack on sensor temp X. The attack adds a negative offset −TF, with TF a positive constant, to the temperature value carried by ch temp X, namely ch temp X becomes temp X − TF. This attack aims to prevent some activations of the cooling system by Ctrl X.
Since the IDS raises a warning on ch warning X each time we have max < temp X ≤ max + TF and cool X = off, this attack is not stealthy. In this case, TF is a parameter of the attack. Finally, Att Saw X performs the same attack as Att Sen X, but only within a precise attack window. We assume that cs is a variable that simply counts the number of computation steps (its initial value is 0, and cs(τ + 1) = cs(τ) + 1 is added to Equation (7.1)). Then, [AW_l, AW_r] is the attack window, where AW_l = 0 = AW_r in the initial data state, and we assume that the variables AW_l and AW_r take random non-negative integer values satisfying the property 0 ≤ AW_r − AW_l ≤ AWML, where AWML is a parameter of the attack representing the maximal length of the attack window.

7.2. Simulation of the engine system. We have applied our simulation strategy to the engine system. Figure 14 reports the results of six simulations of the single engine Eng L, where each simulation is obtained by simulating 100 runs consisting of 10000 steps, with the following initial data state d_0: (i) temp L = p_1 L = · · · = p_6 L = 95; (ii) cool L = off; (iii) speed L = half; (iv) stress L = 0; (v) ch speed L = ch speed L to R = ch speed R to L = half; (vi) temp fake L = 0; and (vii) ch warning L = ok. We considered three different scenarios: (i) no attack, modelled by process Eng L; (ii) attack on actuator cool L, modelled by Eng L | Att Act L, with constant AC set to 1.8; and (iii) attack on sensor temp L, modelled by Eng L | Att Sen L, with constant TF set to 0.4. The six pictures report the average temperature detected by sensor temp L and the average stress level carried by variable stress L in these three scenarios.
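The behaviour of the windowed sensor attack can be sketched as a simple guarded update; the dictionary keys are hypothetical names for the data-state variables:

```python
def att_saw_step(state, TF):
    """One step of the windowed sensor attack Att Saw (sketch).

    Outside the window [AW_l, AW_r] the channel carries the genuine
    temperature; inside it, the attacker subtracts the offset TF.
    """
    if state["AW_l"] <= state["cs"] <= state["AW_r"]:
        state["ch_temp"] = state["temp"] - TF
    else:
        state["ch_temp"] = state["temp"]
    return state
```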
Clearly, by comparing the plots in Figure 14, it turns out that both attacks are able to cause an average increase of the temperature of the engine and to take the engine to the maximal level of stress after 10000 steps.

7.3. Analysis of the engine system. Our aim is to use evolution metrics in order to study the impact of the attacks. This requires comparing the evolution sequences of the genuine system with those of the system under attack. The two approaches to the quantification of the impact of attacks outlined above, consisting in evaluating either only the damage caused by attacks or both the damage and the alert signals, will be realised by employing different penalty functions.

We start by quantifying the impact of cyber-physical attacks by focusing on the runtime values of false negatives and false positives. Intuitively, if we focus on a single engine Eng X, false negatives and false positives represent the average effectiveness and the average precision, respectively, with which the IDS modelled by IDS X signals through channel ch warning X that the engine is under stress. In our setting, we can add two variables fn X and fp X that quantify false negatives and false positives depending on stress X and ch warning X. Both these variables are updated by E. In detail, E acts on fn X and fp X as follows:

    fn X(τ + 1) = (τ · fn X(τ) + max(0, stress X(τ) − ch warning X(τ))) / (τ + 1)
    fp X(τ + 1) = (τ · fp X(τ) + max(0, ch warning X(τ) − stress X(τ))) / (τ + 1)

where ch warning X ∈ {hot, ok} is viewed as a real by setting hot = 1 and ok = 0. In this way, as reported in [LMMT21], fn X and fp X carry the average value, in probability and time, of the difference between the value of the stress of the system and that of the warning raised by the IDS. False negatives and false positives for the same simulations described in Figure 14 are reported in Figure 15.
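The running-average updates of fn X and fp X performed by E can be sketched directly from the equations above:

```python
def update_fn_fp(fn, fp, stress, warning, tau):
    """Running averages of false negatives/positives at step tau.

    `warning` is ch_warning viewed as a real (hot = 1, ok = 0); the
    update realises the two difference equations for fn_X and fp_X.
    """
    fn = (tau * fn + max(0.0, stress - warning)) / (tau + 1)
    fp = (tau * fp + max(0.0, warning - stress)) / (tau + 1)
    return fn, fp
```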
Clearly, if we take the value of variable fn X (resp. fp X) as the value returned by our penalty function, we can quantify how much worse Eng X under attack behaves with respect to the genuine version of Eng X, where the evaluation of these systems is done by taking into account their false negatives (resp. false positives). Notice that other approaches are possible. For instance, by defining a penalty function returning a convex combination of the values of fn X and fp X, systems are evaluated by suitably weighting their false negatives and false positives. In Figure 16 we report the distances between the genuine system Eng L and the compromised systems Eng L | Att Act L and Eng L | Att Sen L in terms of false negatives and false positives. The settings for these simulations are the same as those used for the simulations described in Figure 14.
Here, false negatives and false positives represent the average effectiveness and the average precision, respectively, with which the supervisor modelled by SV signals through channel ch alarm that the system is under stress. We remark that 0.7 is the value chosen in [LMMT21] to quantify the security clearance of the two engines, where, in general, the clearance of a CPS component quantifies the importance of that component from a security point of view. In this case, the two engines have the same criticality and, consequently, the same security clearance.
In Figure 17 we report the distance, in terms of false negatives and false positives, between the system Eng Sys and the compromised system Eng Sys | Att Act L | Att Sen R, which is subject to both an attack on the actuator cool L of the left engine and an attack on the sensor temp R of the right engine. Again, the settings for these simulations are the same as those used for the simulations described in Figure 14.
Let us focus now on the third attack, namely the one implemented by process Att Saw X. Intuitively, we expect a correlation between the length of the attack window and the impact of the attack. In order to study such a relation, we may proceed as follows: (i) we define a penalty function ρ whose value depends on the length of the attack window [AW_l, AW_r]; in particular, we may define ρ to return (AW_r − AW_l)/AWML, so that for the genuine system ρ returns 0, since AW_l = 0 = AW_r; (ii) we consider a penalty function ρ′ expressing the impact of the attack; (iii) we estimate the (ρ, ρ′, [0, 0], 0, η_1, η_2)-robustness of the genuine system. More in detail, we may sample M variations of the system at ρ-distance from the genuine one bounded by η_1. This means sampling M versions of Eng X | Att Saw X characterised by attack windows of length bounded by η_1 · AWML. Then, by applying our simulations, we can estimate how much worse these compromised systems behave with respect to the genuine system Eng X, according to the impact expressed by ρ′.
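The estimation procedure in steps (i)-(iii) can be sketched as follows; `simulate_fn(aw)` is a hypothetical helper returning the value of the impact penalty ρ′ (e.g., the final value of fn L) for a run under an attack window of length aw:

```python
import random

def estimate_window_robustness(simulate_fn, M, eta1, awml, seed=42):
    """Estimate the eta_2 bound for the windowed attack (sketch).

    Windows of length at most eta1 * AWML are sampled, matching the
    rho-ball of radius eta1; the estimate is the worst deviation of the
    impact penalty from that of the genuine system (window of length 0).
    """
    rng = random.Random(seed)
    genuine = simulate_fn(0)                   # no attack window
    worst = 0.0
    for _ in range(M):
        aw = rng.randint(0, int(eta1 * awml))  # window length <= eta1 * AWML
        worst = max(worst, abs(simulate_fn(aw) - genuine))
    return worst
```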
In Figure 18 we report the results of such an estimation for ρ′ coinciding with fn L, [AW_l, AW_r] = [0, aw] an attack window of length aw starting at the first instant, AWML set to 1000, and η_1 = 0.1, which means that aw is chosen probabilistically according to a uniform distribution on the interval [0, 100].
Here we consider three different offset values assigned to TF: 0.4, 0.6, and 0.8. In all cases, we estimate robustness to hold for values of η_2 equal to 0.012, 0.05, and 0.11, respectively.

Now, let us change the approach to attack quantification and assume that only the inflicted overstress is considered, without taking into account the raised alarm signals. In particular, this overstress is quantified at a given computation step n as the ratio between the number of computation steps in which the stress condition |{k | p_k ≥ max}| > 3 holds and n. Therefore, the penalty function ρ′ is now defined as the ratio between the values of two variables, one counting the number of instants in which the stress condition holds and one counting the total number of steps. Robustness can be estimated as in the earlier case.

Figure 18. Estimation of the false negative distance between a genuine engine and an engine subject to tampering with sensor temp L. The attacker adds a negative offset −0.4 (Fig. 18a), −0.6 (Fig. 18b), or −0.8 (Fig. 18c).

Figure 19. Estimation of the average stress distance between a genuine engine and an engine subject to tampering with sensor temp L. The simulation settings are the same as in Figure 18.

In Figure 19 we report the robustness of the left engine Eng L. Here we use the same simulation settings used in Figure 18 and the same offset values. The figure shows that we have robustness for low values of η_2, which, in essence, tells us that attacks in a given window are unable to generate overstress in the long run.
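A minimal sketch of this overstress-based penalty, assuming the per-step truth values of the stress condition have been recorded during the simulation:

```python
def overstress_penalty(stress_flags):
    """Penalty rho' based only on inflicted overstress (sketch).

    `stress_flags` is the per-step history of the stress condition
    |{k : p_k >= max}| > 3; the penalty at step n is the fraction of
    the n steps in which the condition held.
    """
    n = len(stress_flags)
    return sum(1 for flag in stress_flags if flag) / n if n else 0.0
```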

8. Concluding Remarks
The notion of robustness is used in different contexts, from control theory [ZD98] to biology [Kit07]. Commonly, it is meant as the ability of a system to maintain its functionalities against external and internal perturbations. In the context of CPSs, [FKP16] identifies five kinds of robustness: (i) input/output robustness; (ii) robustness with respect to system parameters; (iii) robustness in real-time system implementation; (iv) robustness due to unpredictable environments; (v) robustness to faults. The notions of adaptability and reliability considered in this paper fall into category (iv). In particular, in the example of the engine system the attacks are the source of the environment's unpredictability. We notice that our notions of robustness differ from classical ones like those in [FP09, DM10]. Our notions require comparing all behaviours of two different systems, whereas [FP09, DM10] compare a single behaviour of a single system with the set of behaviours that satisfy a given property, which is specified by means of a formula expressed in a suitable temporal logic.
Moreover, the generality of our notions of robustness allows us to capture classical properties of linear systems. For instance, our notion of adaptability can be used to encode the overshoot [ZNTH19] and settling time [Dod08] of signals. Still, there are substantial differences between them. Specifically: (i) adaptability is a property of CPSs in general, whereas overshoot and settling time refer to signals; (ii) adaptability is defined over the evolution sequence of a system, i.e., over all possible trajectories, whereas the other notions are measures of a single signal; (iii) using the notation of Definition 6.3, the overshoot can be expressed as the parameter η_1, and the settling time as τ̃. This means that our notion of adaptability not only captures both properties, but can also be used to study their correlation.
As a first step for future research, we will provide a simple logic, defined in the vein of Signal Temporal Logic (STL) [MN04], that can be used to specify requirements on the evolution sequences of a system. Our intuition is that we can exploit the evolution metric, and the algorithm we have proposed, to develop a quantitative model checking tool for this type of system. Recently, we have developed the Robustness Temporal Logic (RobTL) [CLT22], together with its model checker, which is integrated in our Software Tool for the Analysis of Robustness in the unKnown environment (Stark) [CLT23]. This logic allows us to specify temporal requirements on the evolution of distances between the nominal behaviour of a system and its perturbed version, thus including robustness properties of programs against uncertainties. However, we plan to propose a temporal logic that deals with the presence of uncertainties from another point of view: our aim is to express requirements on the expected behaviour of the system in the presence of uncertainties and perturbations, by using probability measures over data states as atomic propositions.
Moreover, we would like to enhance the modelling of time in our framework. Firstly, we could relax the timing constraints on the evolution metric by introducing a time tolerance and defining a stretched evolution metric as a Skorokhod-like metric [Sko56], like those used for conformance testing [DMP17]. Then, we could extend our techniques to the case in which the program, too, exhibits a continuous-time behaviour.
The use of metrics for the analysis of systems stems from [GJS90, DGJP04, KN96] where, in a process algebraic setting, it is argued that metrics are indeed more informative than behavioural equivalences when quantitative information on the behaviour is taken into account. The Wasserstein lifting has then found several successful applications (see, e.g., [TBGS18]). Usually, one can use behavioural metrics to quantify how well an implementation (I) meets its specification (S). In [CHR12] the authors do so by setting up a two-player game with weighted choices, where the cost of the game is interpreted as the distance between I and S. Hence, the authors propose three distance functions: correctness, coverage, and robustness. Correctness expresses how often I violates S, coverage is its dual, and robustness measures how often I can make an unexpected error with the resulting behaviour still meeting S. A similar game-based approach is used in [CDDP19] to define a masking fault-tolerant distance. Briefly, a system is masking fault-tolerant if faults do not induce any observable behaviour in the system. Hence, the proposed distance measures how many faults are tolerated by I while being masked by the states of the system. Notice that the notion of robustness from [CHR12] and the masking fault-tolerant distance from [CDDP19] are quite different from our notions of reliability and robustness. In fact, we are not interested in counting how many times an error occurs, but in checking whether the system is able to regain the desired behaviour after the occurrence of an error. One of the main objectives in the control theory research area is the synthesis of controllers satisfying some desired property, such as safety, stability, or robustness. Conversely, our purpose in this paper was to provide some tools for the analysis of the interaction of a given program with an environment.
The idea is that these tools can be used to test a synthesised controller against its deployment in various environments. While a direct application of our framework to the synthesis of programs does not seem feasible, it would be interesting to investigate whether combining it with learning techniques can be used for the design and synthesis of robust controllers. Our confidence in a potential application relies on some recently proposed metric-based approaches to controller synthesis [GBS+19, AP11], in which, however, the environment is only modelled deterministically.
To conclude, we remark that our evolution sequences are not a rewriting of Markov processes as transformers of distributions [KA04, KVAK10]. Roughly, in the latter approach one can interpret state-to-state transition probabilities as a single distribution over the state space, so that the behaviour of the system is given by the sequence of the distributions so obtained. While the two approaches may seem similar, there are some substantial differences. Firstly, the state space in [KA04, KVAK10] is finite and discrete, whereas here we are in a continuous setting (cf. Remark 1.1). Secondly, transformers of distributions consider the behaviour of the system as a whole, i.e., it is not possible to separate the logical component from the environment.