Conformance Relations and Hyperproperties for Doping Detection in Time and Space

We present a novel and generalised notion of doping cleanness for cyber-physical systems that allows for perturbing the inputs and observing the perturbed outputs in both the time and value domains. We instantiate our definition using existing notions of conformance for cyber-physical systems. As a formal basis for monitoring conformance-based cleanness, we develop the temporal logic HyperSTL*, an extension of Signal Temporal Logic with trace quantifiers and a freeze operator. We show that our generalised definitions are essential in a data-driven method for doping detection and apply our definitions to a case study concerning diesel emission tests.


Introduction
System doping, in our terminology, is an intentional intervention causing a change in the system's normal behaviour against the interests of the user or other stakeholders (such as the society at large). Examples of system doping are widespread and range from vendors' enforcing a monopoly on chargers and spare parts (by checking for and refusing third-party chargers and spare parts, respectively) to tampering with exhaust emission in order to detect and pass emission tests. Doping can be the result of embedding a piece of code or smuggling a piece of electronic circuit into the system and it can be caused by the original developers or by hackers. Software and system doping has been studied in the past couple of years and rigorous theories for it have been developed [BDFH16, DBB+17, BDH18]. These theories were subsequently adopted in order to detect doping, or formally, to check system cleanness [HBDK18, BDH19] (corresponding to the absence of doping). In the present paper, we extend the theory of doping to the setting of cyber-physical systems (CPS) by exploiting the notions of conformance testing for CPS [AMF14, DMP17, KM15]. The existing theories of software doping define doping in terms of drastic deviations in output as a result of minor deviations in input, where the term "deviation" refers to differences in validity of propositions or values of variables. However, the current notions fall short of properly dealing with the issues of retiming and delays, which are commonly present in the signals of CPS. We observe that this is an essential aspect of detecting doping for cyber-physical systems: often the traces to be tested for doping have subtly different timing behaviour, e.g., due to measurement and calibration errors or due to the slight deviations of human actors in acting upon the planned scenarios.
The insufficient treatment of retiming and delays can lead both to false negatives, i.e., missing cases of doping, and to false positives, i.e., reporting spurious doping cases.
To address these issues, we exploit the notion of conformance to devise a general theory of being clean from doping and instantiate that theory with some existing notions of conformance for hybrid systems. We show how these notions can account for retiming and lead to more precise notions of cleanness. Furthermore, we show how the retiming can be synchronized between input and output, leading to a refined notion of cleanness with a rigorous account of the relation between the retiming of input and output.
We illustrate the usefulness of our theory by empirical analysis of diesel engine exhaust emissions in the context of one of the official test cycles, the New European Driving Cycle (NEDC) [Uni13]. In particular, we show that catering for retiming is essential in effectively exploiting the actual driving cycles for performing doping analysis. We thus demonstrate that our new theory remedies a major shortcoming in the existing notions from the literature. To facilitate the presentation, we use throughout the remainder of this paper the following simple running example, which is inspired by our case study.
Example 1.1. Fig. 1.(a) shows two test cycles (evolution of speed over time), designed to detect whether the exhaust emission control of a particular vehicle is doped. The test cycle $i_{st}$, depicted with a black solid line, is the standard one prescribed by the (fictitious) official regulation, while test cycle $i_{dev}$, depicted by a red dotted line, is a slight deviation thereof. If the exhaust emissions measured during the test cycle $i_{dev}$ turn out to be significantly higher than the ones measured in test cycle $i_{st}$, then we can conclude that the exhaust emission system is potentially doped, since it appears tailored to the standard test cycle. Fig. 1.(b) addresses a notorious problem of testing cars: a human tester is supposed to drive the car as just described; however, she can do this only up to a certain imprecision. Assume her driving of $i_{dev}$ exhibits a slight time shift $\tau$ relative to the test cycle, as in $i_{ddev}$, while $i_{st}$ is being driven as intended.
The result of a test is the emission footprint measured at the exhaust pipe of the car. Fig. 1.(c) and Fig. 1.(d) show two different possible test results (obtained from different cars) for the scenario in Fig. 1.(b). Intuitively, the footprints in Fig. 1.(c) provide significant evidence for doping: a slightly different test cycle has resulted in a significantly larger footprint. However, due to the time shift on the input side in Fig. 1.(b) the point-wise difference of the two driven test cycles has grown very large. As we show in the remainder of this paper, the existing theory of doping fails to detect such clear evidence, due to the minor delay during the execution of the driving cycle. The emission footprint in Fig. 1.(d) is another (synthetic) example of a significant deviation which cannot be detected for the input in Fig. 1.(b) using existing theories; this latter footprint sheds some light on the intricate design decisions in the theory we develop in this paper.
The contributions of this paper can be summarized as follows:
• We define a general notion of conformance that can express different ways of comparing execution traces by allowing deviations both in value and in time;
• We define a general notion of cleanness for hybrid systems, and show that it subsumes the existing notion of robust cleanness [DBB+17];
• We define the notion of synchronized retiming, which provides a rigorous tool for relating the retiming of input and output, and use it to produce a refined notion of cleanness;
• We provide a logical account of cleanness (based on the notion of hybrid conformance) by developing a temporal logic, called HyperSTL*, that extends Signal Temporal Logic (STL) with a freeze operator and quantifiers over traces; and
• We demonstrate the usefulness of the proposed generic framework by applying it to software doping tests in the automotive domain, where we show that the new cleanness definition is able to flag a case of software doping that goes unnoticed when robust cleanness is used.
This paper substantially extends the theoretical material and the experimental results published in the earlier conference publication [DGM+20]. In particular, the following contributions of the present paper are new with respect to the earlier conference publication:
• the notion of cleanness with synchronised retiming in Section 5 and its application in Section 7 for doping detection in our case study,
• the logical approach to cleanness in Section 6, the introduction of our logical formalism HyperSTL*, its monitoring, and its application to specify hybrid conformance, and
• the redesign of our experimental setup to comply with the NEDC test cycle (with preconditioning and control of ambient temperature), as well as several new experiments to make full use of our theories of retiming. These new experiments led to much more substantial and decisive evidence for our case study.

Related Work
The term "software doping" was coined around 2015 [HHB15] in media uncovering the diesel exhaust emissions scandal. An informal problem formulation [BDFH16] pointed out the general phenomenon of intentionally added hidden software behaviour, which is not in the interest of the consumer. Shortly after, this observation was complemented by a set of formal cleanness definitions [DBB+17] laying the theoretical foundations upon which formal methods to detect such software behaviour can be used. It is possible to detect missing functionality and undesired existing functionality. The definitions support both sequential programs and nondeterministic reactive programs. To check satisfaction of the

Preliminaries
3.1. Semantic domain. In this section, we provide definitions regarding semantic domain, conformance, and robust cleanness. We begin with the definition of our semantic domain, called generalised timed traces [GM20]. This definition subsumes both discrete-time state sequences and continuous-time trajectories. A generalised timed trace (GTT) is a function with a discrete or continuous domain (called time domain) and a co-domain which is a metric space. Intuitively, a generalised timed trace maps each element of its time domain to a state. We require that the set of possible states is a metric space since we study conformance notions that compare traces based on the distance between the states of the traces.
For a GTT $\mu : T \to Y$ and time $t_0 \in T$, by $\mu[\ldots t_0]$ we denote the prefix of $\mu$ up to $t_0$, i.e., the restriction $\mu|_{\{t \in T : t \le t_0\}}$; likewise, by $\mu[t_s \ldots t_e]$, we shall denote the restriction $\mu|_{\{t \in T : t_s \le t \le t_e\}}$.
A hybrid system is a mapping from generalised (input) timed traces to sets of generalised (output) timed traces.
Definition 3.2. A $Y$-valued hybrid system is a function $H : GTT(Y) \to P(GTT(Y))$ such that for all $\mu \in GTT(Y)$ and all $\mu' \in H(\mu)$ it holds that $dom(\mu') = dom(\mu)$. We define $H(Y)$ to be the set of all $Y$-valued hybrid systems.
In addition, we distinguish deterministic hybrid systems whose output values range over singleton sets only. In what follows, we identify deterministic hybrid systems with functions of the type $GTT(Y) \to GTT(Y)$.
For simplicity, we assume that inputs and outputs take values in the same metric space. The generalisation to different spaces is straightforward.
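The notions of prefix, restriction and domain-preserving hybrid system can be illustrated on finitely sampled traces. The following Python sketch is only an illustration under assumptions: a GTT is represented as a dictionary from sample times to real values, and the names `prefix`, `segment` and `running_max` (a toy deterministic system) are hypothetical and do not appear in the paper.

```python
def prefix(mu, t0):
    """mu[...t0]: restriction of the sampled trace mu to time points t <= t0."""
    return {t: y for t, y in mu.items() if t <= t0}


def segment(mu, ts, te):
    """mu[ts...te]: restriction of mu to time points ts <= t <= te."""
    return {t: y for t, y in mu.items() if ts <= t <= te}


def running_max(mu):
    """Toy deterministic hybrid system (an assumption for illustration):
    the output at time t is the largest input value seen up to t."""
    out, best = {}, float("-inf")
    for t in sorted(mu):
        best = max(best, mu[t])
        out[t] = best
    return out


mu = {0: 1.0, 1: 3.0, 2: 2.0, 3: 5.0}
out = running_max(mu)

print(prefix(mu, 1))             # samples at times 0 and 1
print(segment(mu, 1, 2))         # samples at times 1 and 2
print(out.keys() == mu.keys())   # the output trace has the same time domain
```

The last line checks the domain condition of Definition 3.2 for this one input; a nondeterministic system would return a collection of such output traces instead of a single one.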
3.2. Conformance relations. Recently, a number of notions of conformance for cyber-physical systems have been proposed [ARM17, KM15]. It turns out that these notions, two of which are quoted below, can provide a rigorous basis for doping detection.
Note that throughout the paper, the variables $\tau$ and $\varepsilon$ (with possible subscripts) always range over non-negative real numbers.
Example 3.4. Consider again the example shown in Fig. 1.
We can see that in Fig. 1.(a) $i_{st}$ and $i_{dev}$ are trace conformant with value threshold $\varepsilon$, as they only exhibit point-wise deviations by values less than $\varepsilon$. In contrast, $i_{st}$ and $i_{ddev}$ in Fig. 1.(b) are not trace conformant, yet they are hybrid conformant with time and value margins $\tau$ and $\varepsilon$, respectively. The key difference is that the inputs depicted in Fig. 1.(b) are very different if compared point-wise, but if one allows for retiming, they are close enough in value after retiming.
The outputs $o'(i_{st})$ and $o'(i_{ddev})$ in Fig. 1.(d) illustrate the fundamental difference between hybrid and Skorokhod conformance: although the order of rising and falling signals is reversed in the two trajectories, they are still hybrid conformant, because hybrid conformance disregards the order. However, Skorokhod conformance requires an order-preserving retiming, and hence distinguishes these two trajectories. On the other hand, such a retiming exists, e.g., for $i_{st}$ and $i_{ddev}$ in Fig. 1.(b), witnessing their Skorokhod conformance.
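The contrast between point-wise and retimed comparison in Example 3.4 can be made concrete on sampled traces. The following Python sketch is an illustration under assumptions rather than the definitions used in this paper: traces are dictionaries from integer sample times to real values, the distance is the absolute difference, trace conformance is taken to require identical sample times, and hybrid conformance is read as the usual (τ, ε)-closeness from the literature (every sample of one trace has a sample of the other trace within τ in time and ε in value, and vice versa). The function names and the signals are hypothetical.

```python
def trace_conformant(mu1, mu2, eps):
    """Point-wise closeness: same sample times, values within eps."""
    if mu1.keys() != mu2.keys():
        return False
    return all(abs(mu1[t] - mu2[t]) <= eps for t in mu1)


def hybrid_conformant(mu1, mu2, tau, eps):
    """(tau, eps)-closeness: each sample has a partner in the other trace
    that is at most tau away in time and at most eps away in value."""
    def covered(a, b):
        return all(
            any(abs(t - s) <= tau and abs(v - w) <= eps for s, w in b.items())
            for t, v in a.items()
        )
    return covered(mu1, mu2) and covered(mu2, mu1)


def i_st(t):
    """Hypothetical standard test cycle: speed 50 between t = 3 and t = 6."""
    return 50.0 if 3 <= t <= 6 else 0.0


times = range(0, 11)
mu_st = {t: i_st(t) for t in times}
mu_ddev = {t: i_st(t - 1) for t in times}  # same cycle, driven one time unit late

print(trace_conformant(mu_st, mu_ddev, eps=5.0))          # point-wise check fails
print(hybrid_conformant(mu_st, mu_ddev, tau=1, eps=5.0))  # holds once a shift of 1 is allowed
```

Note that this check does not constrain the order in which samples are matched, which is exactly the freedom that Skorokhod conformance removes.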

[Diagram relating the conformance notions $TraceConf_{\varepsilon}$, $SkorConf_{\tau,\varepsilon}$ and $HybridConf_{\tau,\varepsilon}$.]
3.3. Robust cleanness. We shall now state the original definition of robust cleanness from [DBB+17], adapted to our framework of hybrid systems. It is based on Definition 7 and Proposition 19 from [DBB+17]; the phrasing below abstracts from the so-called parameters of interest and standard inputs. Moreover, it is cast in the setting of generalised timed traces rather than discrete-step programs, and stated using trace conformance with different thresholds for inputs and outputs, $\kappa_I$ and $\kappa_O$. Intuitively, a hybrid system is robustly clean if for every pair of input prefixes on which no difference in the inputs exceeding $\kappa_I$ has occurred so far (i.e., all sub-prefixes are trace conformant), the corresponding sets of output prefixes are also conformant with respect to $\kappa_O$. As we consider nondeterministic systems, Hausdorff distance is used to compare sets of outputs (see [DBB+17] for details).
Definition 3.6. A hybrid system $H$ is robustly clean, denoted $RobustClean(\kappa_I, \kappa_O)$, whenever:

Note that in the above definition we do not require that $dom(i_1) = dom(i_2)$. In practice, robust cleanness is typically applied to pairs of traces that are both defined over $\mathbb{N}$. Here, however, for the sake of generality we impose no such restriction. In particular, when the time domains of two traces are different, for example disjoint, the predicate RobustClean will trivially evaluate to true.
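Read together with the informal description preceding Definition 3.6, the condition can be sketched for deterministic systems as follows. This is a reconstruction under assumptions and not the formula of [DBB+17]: it assumes that $t$ ranges over the time points common to both inputs, and it replaces the Hausdorff-based comparison of output sets by a direct comparison of the two output prefixes.

```latex
\forall i_1, i_2 \in GTT(Y).\ \forall t \in dom(i_1) \cap dom(i_2).\;
\Bigl( \forall t' \le t.\ TraceConf_{\kappa_I}\bigl(i_1[\ldots t'],\, i_2[\ldots t']\bigr) \Bigr)
\implies
TraceConf_{\kappa_O}\bigl(H(i_1)[\ldots t],\, H(i_2)[\ldots t]\bigr)
```

Under this reading, two inputs with disjoint time domains leave nothing to check, which matches the remark that the predicate is then trivially true.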
Example 3.7. Consider the traces depicted in Fig. 1. The input prefixes $i_{st}$ and $i_{ddev}$ are given in Fig. 1.(b), and the corresponding pair of outputs is shown in Fig. 1.(c). For $t \le t_0$, we have $i_{ddev}(t) = 0$ and $o(i_{ddev})(t) = 0$. The trace $i_{st}$ results in output $o(i_{st})$ and $i_{ddev}$ results in $o(i_{ddev})$. Suppose that $\varepsilon < |i_{st}(t_0) - i_{ddev}(t_0)|$, and for every $t < t_0$ it holds that for all $t \le t < t_0$, and $\kappa_I = \kappa_O$. For $t \ge t_0$ the implication is true as well: for $t < t_0$ the reasoning is as above, and for all $t \ge t_0$ the left-hand side of the implication is false. Hence, regardless of the difference in the output values at $t_1$, this pair of inputs satisfies the condition of $RobustClean(\varepsilon, \varepsilon)$, and, if these are the only traces in a hybrid system $H$, then we can conclude that $H$ is $RobustClean(\varepsilon, \varepsilon)$.

Conformance-Based Cleanness
We now define a general notion of conformance-based cleanness and provide two instantiations based on the conformance notions defined in the previous section.
4.1. Motivation. The need for considering disturbance in time as well as in value is motivated by our running example from Fig. 1. One of the challenges in performing doping tests for cyber-physical systems is that in such systems timing is rarely perfectly precise, due to imprecision in measurements, or caused by the interaction with the physical world. As illustrated in Example 1.1, for instance, when checking for software doping in a car [BDH19], the input to the system is the value of the car's speed over time, which is under the control of a driver, and can thus vary from one execution to the other, even if the driver is trying to execute the same input sequence. Clearly, those variations can be in value, as well as in time.
Example 4.1. Consider the test setup sketched in Fig. 1. There, $i_{st}$ and $i_{ddev}$, depicted in Fig. 1.(b), define the speed of a car as a function of time. These two input sequences follow a trajectory of values differing by a small margin $\varepsilon$ (the difference in value allowed by the standard defining the doping tests), but also shifted by a small unit of time $\tau$. Observe further that $|i_{st}(t_0) - i_{ddev}(t_0)| > \varepsilon$. Thus, without allowing for deviations in time when comparing these input sequences, they will be considered sufficiently different, and as a result their respective exhaust emission outputs will fall out of the comparison when checking for doping according to Def. 3.6, even if the outputs $H(i_{st}(t))$ and $H(i_{ddev}(t))$ are vastly different, as depicted in Fig. 1.(c). This results in a false negative, i.e., failing to detect a clearly doped system.
In the above example, we demonstrated that not accounting for timing disturbances when relating input trajectories can result in false negatives in doping detection. Dually, using the traditional comparison for output traces can result in false positives by requiring overly strict matching of outputs.
The above example motivates the need to account for timing deviations in trajectories. Intuitively, for input trajectories this relaxation results in considering more traces as conforming, and thus enforcing more comparisons when checking if a system is clean. For output trajectories this means relaxing the conformance requirement by considering two output sequences as conforming even if their values are not perfectly aligned in time. Furthermore, different types of timing deviations need to be considered in different scenarios, for example, depending on whether the order in which values occur is important or not. This motivates considering conformance notions that require retimings to be order-preserving. Indeed, using Skorokhod conformance we can detect that the system is doped.
The above examples show that in order to be useful in a diverse set of applications, a software cleanness theory should allow for using a variety of conformance notions. To this end, we next take a more general view on conformance notions, in order to be able to develop a generic conformance-based cleanness framework.

4.2. Retimings and a more generic view on conformance notions. So far, we have defined three specific notions of conformance which either coincide with, or are closely inspired by, ones that have appeared in the literature. In order to define a general framework for cleanness, we also wish to treat notions of conformance in a more generic manner. To this end, we propose an abstract definition of conformance predicates. As conformance predicates admit variations in time, as well as in value, our definition is based on retimings, a device that will play a key role in the context of this work. In its general form a retiming is a pair of functions between two time domains. Intuitively, given two GTTs, a retiming will define a mapping from points in each of the traces to points in the other trace. Note that in general the mappings are not required to be injective; this way we can cater for notions of conformance allowing for the so-called local disorder phenomenon (in particular hybrid conformance; see Proposition 4.5).
Definition 4.3. A retiming is a pair of functions between two time domains, i.e., a pair of the form $(r_1, r_2)$, where $r_1 : T_1 \to T_2$ and $r_2 : T_2 \to T_1$, with time domains $T_1, T_2 \subseteq \mathbb{R}_{\ge 0}$. Given two time domains $T_1$ and $T_2$, we denote the set of all retimings between $T_1$ and $T_2$ with $RET(T_1, T_2)$.
Retiming is explicitly present in the definition of Skorokhod conformance; there, each Skorokhod retiming is required to be a strictly increasing continuous bijection. We can express a Skorokhod retiming $r$ as an instance of our definition as the pair $(r, r^{-1})$. In fact, one can also define hybrid conformance, as well as a whole class of conformance notions, using a suitable family of retimings.
A family of retimings $Ret$ can be further constrained by $\tau$ to a subset $Ret_{\tau}$ of $Ret$ containing only functions that shift time by at most $\tau$ time units. In order to use a family of retimings for concrete sequences $\mu_1$ and $\mu_2$, it is necessary to consider only functions that match the domains of the sequences. This leads to a generic notion of conformance associated with a given family of retimings $Ret$, a given time threshold $\tau$ and a given value threshold $\varepsilon$.
Definition 4.4. Let $Ret$ be a family of retimings, and let $\tau, \varepsilon \in \mathbb{R}_{\ge 0}$. A conformance notion with time threshold $\tau$ and value threshold $\varepsilon$ induced by $Ret$ is a predicate $Conf^{Ret}_{\tau,\varepsilon}$ on pairs of GTTs such that, for $\mu_1$ :
A conformance notion with unbounded time deviation and value threshold $\varepsilon$ induced by $Ret$ is a predicate $Conf^{Ret}_{\infty,\varepsilon}$ on pairs of GTTs such that, for $\mu_1$ :
Unless we state explicitly otherwise, we consider conformance notions with finite time threshold $\tau \in \mathbb{R}_{\ge 0}$. Using the above definition, we can easily express the specific notions of conformance defined in the previous section by selecting a suitable family of retimings. Definition 4.4 also enables us to define other notions of conformance, such as, for instance, a "shift conformance", which, intuitively, shifts all time points by a given constant $c \in \mathbb{R}$, i.e., $Ret_c = \{(r, r^{-1}) \mid r(t) = t + c\}$.
Next, we define a generic notion of cleanness, parametrised by conformance predicates for the input and for the output traces. Instantiating these predicates with existing or new conformance notions yields different conformance-based notions of cleanness that can capture a variety of cleanness specifications.
4.3. Definition of Conformance-based Cleanness. We now extend the notion of robust cleanness [DBB+17] to allow for "small" variations in time, in addition to the variations in value. To this end, the new notion makes use of two conformance predicates, one that postulates when two input traces should be considered close enough, and another one that specifies when two output traces are close enough.
Our starting point, the notion of robust cleanness in Definition 3.6, is based on comparison of matching prefixes of a pair of input traces and the corresponding prefixes of the associated output traces. As we now want to accommodate distance in time, we (1) compare prefixes using a conformance relation, and (2) allow for variation in the length of the compared prefixes that is within the corresponding time-distance threshold. More precisely, when comparing two prefixes, we allow for discarding start and end segments of length at most $\tau$. This intuition is formalized by the predicate PrefConf for relaxed comparison of GTT prefixes using a notion of conformance Conf with tolerance threshold $\tau$ for time disturbance. We use cascaded notation to define PrefConf as a higher-order function taking Conf as its first argument. The predicate PrefConf compares two prefixes $\mu_1$ and $\mu_2$ by requiring that there exist traces $\mu_1[t_{s_1} \ldots t_{e_1}]$ and $\mu_2[t_{s_2} \ldots t_{e_2}]$ obtained from them that are conformant with respect to Conf. These traces are obtained by possibly removing a sub-prefix of length at most $\tau$, and/or removing or extending with a suffix of length at most $\tau$.
Definition 4.6. Let Conf be a notion of conformance on GTTs with tolerance threshold $\tau \in \mathbb{R}_{\ge 0}$ for time disturbance. For any pair of GTTs $\mu_1 : T_1 \to Y$, $\mu_2 : T_2 \to Y$, and $t \in T = T_1 \cup T_2$, the predicate PrefConf is defined as:

For conformance notions with unbounded timing deviation PrefConf coincides with Conf.
The predicate PrefConf provides a generic notion of prefix-conformance. By instantiating it with conformance relations $Conf_I$ and $Conf_O$ for input and output traces respectively, we define the notion of $(Conf_I, Conf_O)$-cleanness. For deterministic systems, $(Conf_I, Conf_O)$-cleanness requires that for all pairs of input prefixes for which all sub-prefixes are prefix-conformant w.r.t. $Conf_I$, the corresponding pair of output prefixes is prefix-conformant w.r.t. $Conf_O$.
The above definition naturally generalises to nondeterministic hybrid systems, by comparing sets of possible output prefixes using Hausdorff distance as in [DBB+17].
Robust cleanness [DBB+17] can now be formulated as conformance-based cleanness, which establishes that $(Conf_I, Conf_O)$-cleanness is a generalisation. Using hybrid conformance, we define hybrid-conformance cleanness, and similarly, plugging in Skorokhod conformance, we define Skorokhod-conformance cleanness. Formally:

4.4. Properties. We will now establish some key relations between the cleanness notions defined previously. We begin by lifting the implication between conformance relations to implication between cleanness notions defined using those relations.
Proposition 4.9. Suppose that $Conf_1$

The proposition above has two important corollaries. The first one explains the relationships between the original robust cleanness and the notions of cleanness based on Skorokhod conformance and hybrid conformance, in particular stating the conservative generalisation property for the latter notions. The second corollary compares cleanness notions with different conformance thresholds.
Example 4.12. Consider the testing workflow in Fig. 1. The inputs passed to a car are $i_{st}$ and $i_{ddev}$, depicted in Fig. 1.(b). One of the test results is presented in Fig. 1
• As we saw in Example 3.7, for inputs $i_{st}$ and $i_{ddev}$, the car that emits the outputs depicted in Fig. 1.(c) is deemed $RobustClean(\varepsilon, \varepsilon)$. Note that in the presence of other inputs the car used for testing might not be $RobustClean(\varepsilon, \varepsilon)$.

We now discuss testing and falsification of conformance-based cleanness. For systems with discrete time domains the existing methods for verifying [DBB+17] or testing [BDH19] robust cleanness can be readily applied.
In the case of hybrid cleanness, existing methods for testing hybrid conformance, such as [AHF+14] and [ACM+18], can be extended to testing and falsification of hybrid cleanness of hybrid systems consisting of traces with finite time domains. Methods for checking Skorokhod conformance were presented in [DMP17]. Due to the quantification over all time points $t$ in our Definition 4.7 and Definition 4.8, it is not clear how to directly extend them to testing Skorokhod cleanness.

Cleanness with Synchronized Retimings
5.1. Practical motivation. An intuitive and useful notion of doping cleanness should capture precisely what we expect from a clean system subject to disturbances in time and value. In this regard, one can observe that even the more discriminating SkorClean predicate has certain drawbacks. The following example motivates why one may want to resort to the finer definition to be proposed in this section.
Example 5.1. Consider the scenario of particle emission cleanness presented in Example 3.4 and the input (velocity) and output trajectories depicted in Fig. 2. Assume that for some input trajectory $i_1$, the vehicle shows the output (emission) profile $o_1$; for a second input $i_2(t) = i_1(t - \tau)$, consider two possible output trajectories: one output is $o_1(t - \tau)$, i.e., it is shifted in the same manner as the input; this is assumed to be the best response to $i_2$. The other output is of the form $o_1(t - \tau - \delta)$, where $\delta > 0$ can be arbitrarily small, i.e., it is the optimal output with an arbitrarily small shift to the right. Skorokhod-conformance cleanness with $\tau_I = \tau_O$ would accept the first output, but it would reject the second one. A potential solution could be to increase the value of $\tau_O$ so that it is significantly larger than $\tau_I$, but this increases the imprecision by accepting too many trajectories shifted far to the left from
Intuitively, when the input shifts by some $\tau$, we would like to compare the corresponding output trajectory with the one that is shifted accordingly. In the above-mentioned case, one would therefore ideally like to perform the conformance check of the output against $o_1(t - \tau)$, rather than $o_1(t)$.
5.2. Formal theory of synchronized retiming. In order to alleviate this imprecision, we propose a definition of conformance-based cleanness with synchronised retimings, in which we do not check the conformance of the resulting outputs directly, but rather check conformance of each of the outputs against the transformation of the other output with the retiming that is expected, based on the retiming of the corresponding input. Note that the expected retiming of the output is not always precisely the same as that of the input. Instead, we assume that the set of expected output retimings to a given input retiming is available through a synchronisation function. As mentioned earlier, we can include in the set of conforming output trajectories the best expected response $o_1(t - \tau)$ by allowing a sufficiently large $\tau_O$, but this comes at the price of introducing imprecision in the conformance relation. By shifting the reference point of conformance comparison, our cleanness with synchronised retimings avoids this imprecision. More importantly, by performing the synchronisation independently of $\tau_O$, we introduce the opportunity to constrain the set of conforming output traces to those traces that are as close as desired to the ideal expected output behaviour.
We proceed to formalise the enhanced notion of cleanness with synchronisation outlined above. We start with two auxiliary definitions.
Definition 4.4 entails that whenever the conformance predicate Conf holds for a certain pair of timed traces, it is witnessed by at least one relevant retiming $(r_1, r_2)$. The following operator "extracts" all such witness retimings:

Similarly, we define the collection of all retimings that witness prefix conformance (PrefConf predicate, Definition 4.6):

Note that the domains of the retimings in $PrefixWit_{Conf}(\mu_1, \mu_2, t)$ can be smaller than the domains of $\mu_1$ and $\mu_2$.
Synchronisation is realised through a function Sync specifying all allowed pairs of output retimings for a given pair of input retimings. With this, we can extend the definition of $(Conf_I, Conf_O)$-cleanness as follows. Given two inputs that are conformant, i.e., $PrefConf_I(i_1, i_2)$, we may pick any pair $(r_1, r_2)$ from the set $PrefixWit_{Conf_I}(i_1, i_2)$ of pairs of retiming functions for which the input conformance holds. This pair induces another pair $(r'_1, r'_2) \in Sync(r_1, r_2)$ of retiming functions for the output timeline. For those, the prefix conformance predicates $Conf_O(o_1 \circ r'_2, o_2)$ and $Conf_O(o_1, o_2 \circ r'_1)$ must hold. This is formally expressed in the following definition.
) if the following holds:

Through the function Sync, which is a parameter to the above definition, we can specify the allowed retimings for the output, such as, for example, a scaling of the input retiming when the timelines of the input and the output have different scales. It is the responsibility of the cleanness tester or verifier to accurately specify the expected behaviour, as an inappropriately chosen Sync function can result in declaring doped systems to be clean. One important aspect of this definition is that by selecting a suitable synchronisation function Sync we can incorporate in the cleanness check any available knowledge regarding the expected output behaviours for conforming input trajectories. The following proposition states how the conformance-based notions of cleanness can be recovered by choosing appropriate retimings.

.7) is a special instance of $(Conf_I, Conf_O)$-cleanness through Sync.
By setting $Sync(r_1, r_2) = \{(id, id)\}$ we obtain the corresponding notion of $(Conf_I, Conf_O)$-cleanness, which, in particular, means that cleanness with synchronised retimings is also a conservative generalization of robust cleanness.
Example 5.4. Consider the behaviour introduced in Example 5.1. As for the retiming witnessing conformance between $i_1$ and $i_2$, let us take the most obvious one, i.e., $(r_1, r_2) = (t - \tau, t + \tau)$. If $(r_1, r_2)$ covers the whole domain of the output trace, then we can use $Sync(r_1, r_2) = \{(r_1, r_2)\}$, according to which the output should be retimed in the same way as the input is. By reusing the same retiming of input for output, $o_1(t - \tau - \delta)$ conforms to the retimed output $o_1$ with respect to the margin $\delta$.
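The effect described in Examples 5.1 and 5.4 can be reproduced numerically. The Python sketch below is a simplification under assumptions, not the definition of cleanness through Sync: it only considers constant time shifts (in the spirit of the "shift conformance" mentioned after Definition 4.4) instead of general Skorokhod retimings, it evaluates traces on a finite sample grid, and the pulse shape, the constants and all function names are hypothetical.

```python
def pulse(t):
    """Hypothetical emission profile o_1: a triangular pulse centred at t = 3."""
    return max(0.0, 1.0 - abs(t - 3.0))


def shifted(f, s):
    """Return the trace t -> f(t - s), i.e. f delayed by s time units."""
    return lambda t: f(t - s)


def shift_conformant(f, g, samples, tol, eps, step=0.01):
    """True if some constant shift c (a multiple of step, |c| <= tol)
    gives |f(t) - g(t + c)| <= eps on all sample times."""
    n = int(round(tol / step))
    for k in range(-n, n + 1):
        c = k * step
        if all(abs(f(t) - g(t + c)) <= eps for t in samples):
            return True
    return False


tau, delta, eps = 1.0, 0.05, 0.01
samples = [k * 0.01 for k in range(801)]   # sample times in [0, 8]
o1 = pulse
o2 = shifted(o1, tau + delta)              # observed response o_1(t - tau - delta)

# Direct comparison of o_1 and o_2 with output time tolerance tau_O = tau.
direct = shift_conformant(o1, o2, samples, tol=tau, eps=eps)

# Synchronised comparison: first retime o_1 like the input (delay by tau),
# then compare with o_2 using a time tolerance of delta.
synchronised = shift_conformant(shifted(o1, tau), o2, samples, tol=delta, eps=eps)

print(direct, synchronised)
```

In this setting the direct comparison rejects the pair because aligning the outputs needs a shift of τ + δ, whereas after retiming the reference output a tolerance of δ is enough, so the output tolerance no longer has to grow with the input shift.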
We use this theory in our experimental setup in Section 7 and show how it can lead to a more accurate analysis of emission data in practice.

Expressing Cleanness in Timed Hyper Logics
In this section we introduce a logic that is capable of characterizing the notions of robust and hybrid cleanness. Since robust cleanness can be characterized in the logic HyperLTL [CFK + 14], the logic we propose is a temporal logic for hyperproperties. Our semantic domain consists of generalized timed traces, and thus, our logic extends Signal Temporal Logic (STL) [MN04] with quantifiers over traces. In order to be able to express deviations in time, our logic uses freeze quantifiers as the mechanism for comparing values at different time points. More precisely, the proposed logic is obtained by extending STL* [BDSV14] with trace quantifiers. In the remainder of the section we provide the formal definition of the logic and discuss its applicability in the context of specifying and monitoring cleanness of hybrid systems. 6.1. Preliminaries. For the presentation in this section it will be convenient to consider hybrid systems as sets of GTTs, where each GTT represents a pair of input and output GTTs. The reason for this is that we will define a logic whose formulas refer to both the inputs and the outputs of a hybrid system over time, and are therefore interpreted over sets of such combined traces that contain both the input to the system and the system's output.
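Formally, we will represent a Y-valued hybrid system H : GTT(Y) → 2^{GTT(Y)} as the set of GTTs {µ : T → Y × Y | µ_I ∈ GTT(Y), µ_O ∈ H(µ_I), T = dom(µ_I), and µ(t) = (µ_I(t), µ_O(t)) for all t ∈ T}. The definition of the GTTs µ in this set is possible since according to Definition 3.2 we have that for all µ_I ∈ GTT(Y) and all µ_O ∈ H(µ_I) it holds that dom(µ_I) = dom(µ_O).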
For the rest of this subsection, whenever we refer to a GTT µ ∈ H of a Y-valued hybrid system H, we mean a function µ : T → Y' defined as above, with Y' = Y × Y. Given µ ∈ H such that µ(t) = (µ_I(t), µ_O(t)) for t ∈ dom(µ), we denote with µ_I : T → Y the projection of µ on the input component and with µ_O : T → Y its projection on the output component. Let µ : T → Y be a GTT. If T is an interval of the form [0, b] for some b ∈ R≥0, or of the form [0, ∞), and Y = R^n for some n ∈ N, we say that µ is a real-valued signal, and define length(µ) = b, respectively length(µ) = ∞, to be the time length of µ. If T is instead a strictly increasing sequence t_0, t_1, t_2, . . . of rational numbers such that t_0 = 0, we say that µ is a timed word and similarly define its time length as length(µ) = max T if T is finite and length(µ) = ∞ otherwise.
Let X be a finite set of real-valued variables. We denote with R^X the set of possible valuations of X. In the rest of this section we assume that the range of all GTTs that we consider is R^X for a given finite set of real-valued variables. We will assume that the variables in X are indexed, i.e., X = {x_1, x_2, . . . , x_n} for some n ∈ N, and use R^n instead of R^X with the expected interpretation. An atomic predicate over X is a function α that maps each valuation in R^n to a truth value. When Y = R^X for some set of variables X, we assume that the metric d can be expressed as an arithmetic expression d_Y(X, X') over the variables X ∪ X', where X' is a primed copy of X. For a Y-valued hybrid system H we denote with d_I and d_O the arithmetic expressions that define the metrics associated with the underlying metric spaces for the input and output values of H.
6.2. The logic HyperSTL*. We now define the logic HyperSTL*, which extends the logic STL* [BDSV14] with quantifiers over traces, that are used to relate multiple GTTs in a hybrid system. To this end, let V trace be a countably infinite set of trace variables. For a set X of real-valued variables and a given trace variable π ∈ V trace , let X π = {x π | x ∈ X} be the set of variables indexed with π.
Let I = {1, . . . , m} for some m ∈ N be a finite index set. As in the logic STL*, the index set I consists of the indices of the positions in the frozen time vector. Intuitively, at each position of the frozen time vector a time point can be stored. For a trace variable π ∈ V trace and an index i ∈ I, let X * i π = {x * i π | x ∈ X} be the set of variables indexed with π and * i .
6.2.1. Syntax. Let X be a finite set of real-valued variables, and AP be a set of atomic predicates over the set of indexed variables ⋃_{π∈V_trace} X_π ∪ ⋃_{π∈V_trace, i∈I} X^{*i}_π. HyperSTL* formulas are defined according to the following grammar.
where π ∈ V trace is a trace variable, α is an atomic predicate from AP , J ⊆ R ≥0 is an interval with endpoints in Q ≥0 ∪ {∞}, and i ∈ I is an index.
The operators U and S are the temporal operators Until and Since. The * i operator, for i ∈ I is the signal-value freeze operator. Their semantics is formally defined below. When the interval J is of the form [0, ∞) we often omit it for convenience.
The Boolean constant ⊥ (false), additional Boolean operators, as well as additional temporal operators are defined in the usual way. More concretely, we define the future operators ◇_J ϕ = ⊤ U_J ϕ (eventually) and □_J ϕ = ¬◇_J ¬ϕ (globally), and their past counterparts ◆_J ϕ = ⊤ S_J ϕ (once) and ■_J ϕ = ¬◆_J ¬ϕ (historically). A HyperSTL* formula is well-formed if each occurrence of a trace quantifier introduces a unique variable name, and it is closed if every occurrence of a variable in X_π is in the scope of a quantifier for π. We will consider only well-formed HyperSTL* formulas.
Note that in HyperSTL*, unlike [BDSV14], we also allow the past operator S, as well as arbitrary intervals J in the operators U and S. We define HyperSTL* fin to be the fragment of HyperSTL* such that every interval J is of the form [a, b], where a, b ∈ Q ≥0 and a < b.
6.2.2. Semantics. HyperSTL* formulas are interpreted over trace assignments and register valuations. A trace assignment is a partial function with finite domain from V trace to the set of GTTs in a given hybrid system. Formally, given a hybrid system H represented as a set of input-output GTTs, a trace assignment Π is a partial function Π : V trace → H . Register valuations are |I|-dimensional vectors over R ≥0 .
Let Π be a trace assignment with domain U_trace ⊆ V_trace, and let α be an atomic predicate defined over the variables in ⋃_{π∈U_trace} X_π ∪ ⋃_{π∈U_trace, i∈I} X^{*i}_π. Consider a time point t ∈ R≥0 such that for each π ∈ V_trace for which a variable from X_π occurs in α it holds that t ∈ dom(Π(π)), and a register valuation T such that for each pair π ∈ V_trace and i ∈ I such that a variable from X^{*i}_π occurs in α it holds that T(i) ∈ dom(Π(π)). Then, the value of the atomic predicate α at the tuple (Π, T, t) is defined as: α(Π, T, t) = α((Π(π)(t))_{π∈U_trace}, (Π(π)(T(i)))_{π∈U_trace, i∈I}).
Intuitively, the atomic predicate is evaluated using the signal values at time point t and at the time points stored in the frozen time vector T . If for some of the indexed variables that occur in α the corresponding time point is not in the time domain of the corresponding trace, then the value of the atomic predicate is undefined.
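The evaluation of an atomic predicate at a tuple (Π, T, t) can be illustrated with a small sketch. It assumes, purely for illustration, that each trace is a timed word with a single variable x stored as a Python dictionary from time points to values, and it uses `None` for the undefined value; the names `eval_atom`, `current` and `frozen` are hypothetical and not part of the formal development.

```python
from typing import Callable, Dict, List, Optional, Tuple

Trace = Dict[float, float]  # time point -> value of the single variable x (assumption)


def eval_atom(
    pred: Callable[[Dict[str, float], Dict[Tuple[str, int], float]], bool],
    current: List[str],             # trace variables pi whose x_pi occurs in the predicate
    frozen: List[Tuple[str, int]],  # pairs (pi, i) whose frozen variable occurs in the predicate
    assignment: Dict[str, Trace],   # trace assignment Pi
    registers: Dict[int, float],    # register valuation T
    t: float,
) -> Optional[bool]:
    """Return True/False, or None if a referenced time point lies outside a trace domain."""
    cur_vals = {}
    for pi in current:
        if t not in assignment[pi]:
            return None  # current time point not in dom(Pi(pi))
        cur_vals[pi] = assignment[pi][t]
    frz_vals = {}
    for pi, i in frozen:
        ti = registers[i]
        if ti not in assignment[pi]:
            return None  # frozen time point T(i) not in dom(Pi(pi))
        frz_vals[(pi, i)] = assignment[pi][ti]
    return pred(cur_vals, frz_vals)


# Hypothetical data: compare the value of p1 frozen in register 1 with the current value of p2.
ASSIGNMENT = {"p1": {0.0: 1.0, 2.0: 5.0}, "p2": {0.0: 2.0, 3.0: 5.0}}
SAME_VALUE = lambda cur, frz: frz[("p1", 1)] == cur["p2"]
```

With register 1 holding time point 2, the predicate is defined and true at t = 3, and undefined at t = 1, where p2 has no sample.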
To define the semantics of HyperSTL*, we define the function Value that maps a formula Ψ, a trace assignment Π, a register valuation T and a time point t to a value in the set {T, F, U}, which indicates whether Ψ is true (T), false (F) or undefined (U) at (Π, T, t). Formally, for a hybrid system H, a trace assignment Π, a register valuation T, and t ∈ R≥0, the value Value(Ψ, H, Π, T, t) is defined by induction on the structure of HyperSTL* formulas.
• If Ψ = ϕ_1 U_J ϕ_2, then
Note that if a formula is closed, then its value is always either T or F. For a hybrid system H, a trace assignment Π, a register valuation T, t ∈ R≥0, and a HyperSTL* formula Φ we can define the satisfaction relation |= where
We say that a hybrid system H satisfies a closed formula Φ, denoted H |= Φ, if and only if it holds that (H, Π_∅, T_0, 0) |= Φ, where Π_∅ is the empty trace assignment and T_0 is the register valuation in which 0 is stored at every index.
Consider the HyperSTL* formula Φ_1 = ∀π_1.∀π_2. □_[0,4] (x_{π_1} = x_{π_2}) that states that for every pair of timed traces and every time point in the interval [0, 4] the values of the two traces must be equal (i.e., they agree on the value of variable x). We have that H ⊭ Φ_1 since the two traces differ at time point t = 0. If, on the other hand, we consider the formula Φ_2 = ∀π_1.∀π_2. □_[1,4] (x_{π_1} = x_{π_2}) obtained from Φ_1 by replacing the interval [0, 4] by [1, 4], we have that H |= Φ_2. The justification behind this is that there is no time point in the interval [1, 4] where we witness a violation of the atomic predicate x_{π_1} = x_{π_2}. In particular, in the time interval [1, 4] there is no point at which both traces are defined. Now, consider the HyperSTL* formula Φ_3 = ∀π_1.∀π_2. ◇_[0,4] (x_{π_1} = x_{π_2}) that states that for every pair of traces, in the interval [0, 4] there exists a time point where the values of the two traces are the same. We have that H ⊭ Φ_3, as expected, since there is no point in this interval where both traces are defined and have the same value. Note that we also have H ⊭ ∀π_1.∀π_2. ◇_[1,4] (x_{π_1} = x_{π_2}) for the interval where the value of x_{π_1} = x_{π_2} is undefined.
Finally, let Φ_4 = ∀π_1.∀π_2. ◇_[0,3] *_1 ◇_[0,1] (x^{*1}_{π_1} = x_{π_2}). The formula Φ_4 states that for every pair of traces there exists a time point t_1 in [0, 3] such that there is a time point t_2 at most 1 time unit later, such that the value of the first trace at t_1 is equal to the value of the second trace at time t_2. Here t_1 is the frozen time point per the semantics of the freeze operator *. We have that H |= Φ_4. To see this, when π_1 is µ_1 and π_2 is µ_2 let t_1 = 2 and t_2 = 3, and when π_1 is µ_2 and π_2 is µ_1 let t_1 = 3 and t_2 = 4.
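The verdicts in this example can be reproduced on a small concrete instance. The sketch below uses two hypothetical timed words chosen only so that they match the properties mentioned in the text (they differ at t = 0, share no time point in [1, 4], and agree at the time points 2/3 and 3/4); they are not claimed to be the traces of the example. The checks evaluate the formulas at time 0 over sampled time points only, which is a simplification of the general semantics.

```python
from itertools import product

# Hypothetical timed words: time point -> value of x (assumed data).
MU1 = {0.0: 1.0, 2.0: 5.0, 4.0: 5.0}
MU2 = {0.0: 2.0, 3.0: 5.0}
H = [MU1, MU2]


def always_equal(p1, p2, lo, hi):
    """Globally on [lo, hi]: no point where both traces are defined violates equality."""
    common = [t for t in p1 if t in p2 and lo <= t <= hi]
    return all(p1[t] == p2[t] for t in common)


def eventually_equal(p1, p2, lo, hi):
    """Eventually on [lo, hi]: some point where both traces are defined and equal."""
    common = [t for t in p1 if t in p2 and lo <= t <= hi]
    return any(p1[t] == p2[t] for t in common)


def eventually_freeze_match(p1, p2, hi, delay):
    """Eventually on [0, hi], freeze t1, then eventually within `delay`: p1(t1) = p2(t2)."""
    return any(
        p1[t1] == p2[t2]
        for t1 in p1 if 0 <= t1 <= hi
        for t2 in p2 if t1 <= t2 <= t1 + delay
    )


def forall_pairs(check):
    """Universal quantification over both trace variables."""
    return all(check(p1, p2) for p1, p2 in product(H, repeat=2))


PHI1 = forall_pairs(lambda a, b: always_equal(a, b, 0, 4))
PHI2 = forall_pairs(lambda a, b: always_equal(a, b, 1, 4))
PHI3 = forall_pairs(lambda a, b: eventually_equal(a, b, 0, 4))
PHI4 = forall_pairs(lambda a, b: eventually_freeze_match(a, b, 3, 1))
```

On this data only the second and the fourth formula are satisfied, in line with the discussion above.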
Remark 6.2. Due to the generality of our semantic domain, which generalizes both continuous signals and timed words, we have to address the issue of having to define the interpretation of HyperSTL* formulas over all time points in R ≥0 while the considered traces might not be defined at all points. Furthermore, the semantics of the logic has to account for the fact that formulas, even atomic propositions, refer to different traces which are possibly defined over different time domains. To this end, we defined the function Value that assigns values in the set {T, F, U}. For instance, if Value(α, H, Π, T, t) = U, then Value(α ∨ ¬α, H, Π, T, t) = U. For the temporal operators, our semantics is reminiscent of that in [GM20], in the sense that for evaluating ϕ 1 U ϕ 2 the subformula ϕ 1 is evaluated only in time points where its value is defined, and the time point where the obligation ϕ 2 must hold is one where its value is defined. The treatment in S is analogous. Our semantics interprets trace quantifiers over the traces for which the formula has a defined value.
Other temporal logics for timed hyperproperties face similar issues, which we discuss in Remark 6.4. The logic HyperSTL [NKJ+17], on the other hand, is not affected by such difficulties, since its semantics is defined over continuous signals defined over a whole interval.
Remark 6.3. In our definition of the semantics of ϕ_1 U_J ϕ_2, similarly to [DMP17] and [GM20], we account for the fact that in a dense time domain there might not exist a first time point where ϕ_2 is satisfied. Therefore we allow for ϕ_1 to be violated at intermediate time points as long as at those points the value of ϕ_2 is T and the constraint imposed by J is satisfied. More precisely, ϕ_1 U_J ϕ_2 is T at time point t if there exists a time point t' ≥ t such that t' − t ∈ J, ϕ_2 is T at t', and for all intermediate points t'' ∈ [t, t') it holds that if ϕ_1 is F at t'', then t'' must be such that t'' − t ∈ J and ϕ_2 is T at t''. The analogous holds for S.
Remark 6.4. Existing temporal logics for timed hyperproperties have also faced the challenge of dealing with timed traces that are defined over different sets of time points.
In [HZJ19] this leads to the consideration of two different semantics of their logic HyperMTL: an asynchronous semantics that does not require the time stamps in two timed traces to match, and a synchronous semantics in which the range of quantifiers is restricted to the traces that synchronize with the current trace assignment. The logic HyperMTL includes for each trace variable a Boolean constant (true) indexed with that variable, which allows for expressing syntactically in formulas the requirement that the current time point is in the domain of the corresponding trace. In contrast, in our logic HyperSTL* we account for undefined values on the semantic level in the definition of the value function, and values at different points in time on different traces can be related via the freeze operator.
The authors of [BPS20] provide an alternative logic HyperMTL by extending the logic MTL with quantifiers over traces in the point-wise semantics. The semantics of their logic has both a synchronous and an asynchronous layer. At the synchronous layer, traces are compared at the same points in time, and if a trace is undefined at a given point, the value at the closest previous event is used. At the asynchronous layer, an asynchronous version of the U operator allows for a bounded difference in the time points when the obligation of the Until formula is fulfilled in different traces. Our logic, on the other hand, allows for a general and flexible way of relating time points on different traces via the freeze operator.
6.3. Expressing robust and hybrid cleanness. Using the logic HyperSTL* we can express trace and hybrid conformance and robust and hybrid cleanness. We begin by first formalizing the conformance notions, and then provide the characterization of cleanness.
Let π_1 and π_2 be two trace variables, and let τ and ε be non-negative rational constants. We can express hybrid conformance with thresholds τ and ε, i.e., HybridConf_{τ,ε}, as follows: where d_Y(X, X') is an arithmetic expression characterizing the metric d_Y. Note that in the above formula the trace variables π_1 and π_2 are not quantified, and hence it is not closed. Intuitively, the formula states that for every time point on the trace described by π_1 it holds that within τ time units in the past or in the future, there exists a point on the trace described by π_2 where the value is ε-close to the value of π_1 at the current time point, and symmetrically for the other direction with traces π_1 and π_2 swapped.
Proposition 6.5. Let H be a deterministic hybrid system defined over a set of real-valued variables X such that 0 ∈ dom(µ) for each µ ∈ H. Let τ, ε ≥ 0 be rational constants, and µ_1, µ_2 ∈ H. Let π_1 and π_2 be trace variables and Π = {π_1 → µ_1, π_2 → µ_2}. Then
Proof. (=⇒) First, suppose that HybridConf_{τ,ε}(µ_1, µ_2). By Definition 3.3, we have that for all t_1 ∈ dom(µ_1) there exists t_2 ∈ dom(µ_2) such that |t_1 − t_2| ≤ τ and d_Y(µ_2(t_2), µ_1(t_1)) ≤ ε. Hence, when t_1 ∈ dom(µ_1), we have that If t_1 ∉ dom(µ_1), then, from the definition of the semantics of the operator *_1 we have ≤ ε, H, Π, T_0, 0 = T can be shown by applying the same reasoning as above, this time for µ_2.
As a special case, we can express TraceConf as Note that the formula ϕ = d_Y(X_{π_1}, X_{π_2}) ≤ ε does not characterize trace conformance as it does not assert the requirement that the time domains of the two traces must be the same. The formula ϕ_TraceConf, on the other hand, requires that each time point where one of the traces is defined must be matched by a value of the other trace at the same time point.
We use the idea of the above encoding to define a closed HyperSTL* formula that characterizes hybrid cleanness, HybridClean(τ_I, ε_I, τ_O, ε_O), for deterministic hybrid systems.
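The condition used in the proof of Proposition 6.5 (for every time point of one trace there is a point of the other trace within time distance τ whose value is ε-close, and symmetrically) can be checked directly on sampled traces. The following sketch assumes that traces are finite timed words stored as dictionaries from time points to real values and that the metric is the absolute difference; both are illustrative assumptions, and `hybrid_conf` is a hypothetical helper name.

```python
def hybrid_conf(mu1, mu2, tau, eps, dist=lambda a, b: abs(a - b)):
    """Check hybrid conformance with thresholds tau (time) and eps (value) on timed words."""

    def one_way(src, dst):
        # Every sample of src needs a partner in dst that is tau-close in time
        # and eps-close in value.
        return all(
            any(abs(t1 - t2) <= tau and dist(v1, v2) <= eps for t2, v2 in dst.items())
            for t1, v1 in src.items()
        )

    return one_way(mu1, mu2) and one_way(mu2, mu1)


def trace_conf(mu1, mu2, eps):
    """Special case tau = 0: the time domains must coincide and values are eps-close."""
    return hybrid_conf(mu1, mu2, 0, eps)


# Hypothetical data: DELAYED repeats RAMP one time unit later.
RAMP = {0.0: 0.0, 1.0: 10.0, 2.0: 20.0}
DELAYED = {0.0: 0.0, 1.0: 0.0, 2.0: 10.0, 3.0: 20.0}
```

Here `hybrid_conf(RAMP, DELAYED, 1, 0)` holds, whereas with τ = 0 the two traces do not conform for ε = 5, both because the values at time 1 differ by 10 and because time point 3 has no counterpart in `RAMP`.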
For the rest of the section we consider hybrid systems H such that for every µ ∈ H it holds that 0 ∈ dom(µ), that is, we assume that all traces are defined at time point 0.
Furthermore, we assume that the set of variables X defining the states of the hybrid system H contains an explicit clock variable c representing the current time, that is never reset. That is, c simply captures the time-stamps of the values of the GTTs in H. Formally, for every GTT µ ∈ H, and every t ∈ dom(µ), it holds that µ(t)(c) = t. With that, the atomic propositions in AP can refer to the current time point, and the freeze operator captures the current time-stamp together with the current values of the other variables. Let α ∈ AP be an atomic proposition and r, r_i ∈ R≥0 for i ∈ I be non-negative real constants. We denote by α[r, r_1, . . . , r_|I|] the atomic predicate obtained from α by replacing each variable c_π by r, and each variable c^{*i}_π by r_i, for all π ∈ V_trace and i ∈ I. By the definition of the clock variable c, for every trace assignment Π, register valuation T, and t ∈ R≥0 we have that Value(α, H, Π, T, t) = Value(α[t, T(1), . . . , T(|I|)], H, Π, T, t), when for every π ∈ V_trace for which c_π occurs in α it holds that t ∈ dom(Π(π)) and for every π ∈ V_trace and i ∈ I for which c^{*i}_π occurs in α it holds that T(i) ∈ dom(Π(π)). Let τ_I and ε_I be non-negative rational constants defining the threshold values for the input conformance relation, and τ_O and ε_O be the ones for the output conformance.
Let π and π' be trace variables and i, s, e ∈ I. First, we define the formulas where d_I(X, X') and d_O(X, X') are the arithmetic expressions characterizing the metrics on the sets of input and output values of the considered hybrid system. Intuitively, the formula ϕ^matchI_{τ_I,ε_I}(π, i, π', s, e) evaluated at time point t and register valuation T is true if and only if there exists a time point t' ∈ [t − τ_I, t + τ_I] ∩ [T(s), T(e)] such that the input value at time t on the trace represented by π and the input value at time t' on the trace represented by π' are ε_I-close. The formula ϕ^matchO_{τ_O,ε_O}(π, i, π', s, e) states the same for the output values. The need to constrain the time t' where the match of the values at t must be found comes from the fact that in the definition of cleanness in Section 4, prefixes are compared using the predicate PrefConf. Recall that PrefConf compares two prefixes µ_1 and µ_2 by requiring that there exist segments µ_1[t^s_1 . . . t^e_1] and µ_2[t^s_2 . . . t^e_2] obtained from them that are conformant. In the above formulas, the frozen values of the clock variable c represent the end points of the interval for the trace assigned to π'.
Using the formula ϕ^matchI_{τ_I,ε_I}(π, i, π', s, e) we define the formula ϕ^PrefConfI_{τ_I,ε_I}(π_1, π_2), which is true if and only if the current time point defines a pair of prefixes of the traces represented by π_1 and π_2 for which there exist hybrid conforming segments obtained from the prefixes by possibly removing a prefix/suffix of length at most τ_I or adding a suffix of length at most τ_I.
The conjunct *_7 ϕ_{π_1} handles the case when the current time point t (i.e., the last time point for the considered prefixes) is in the domain of π_1. The second conjunct handles the case when t is in the domain of π_2. If neither is the case, then the value of the conjunction is U. Since the formulas *_7 ϕ_{π_1} and *_7 ϕ_{π_2} differ only in the trace on which the time point t is frozen, if both of them have a defined value, then these values are necessarily the same. Each of ϕ_{π_1} and ϕ_{π_2} asserts the existence of a sequence of time points defining the compared segments of the two traces and their input conformance (formula ϕ^ConfI). The formula ϕ^ConfI captures the requirement that the two input prefixes ending at the current time point have segments (defined by the pairs of time points c^{*s_1}_{π_1} and c^{*e_1}_{π_1}, and c^{*s_2}_{π_2} and c^{*e_2}_{π_2}, respectively) that are hybrid conformant, as in Definition 4.6.
Proposition 6.7. Let H be a deterministic hybrid system defined over a set of real-valued variables X that includes an explicit clock variable c, and such that 0 ∈ dom(µ) for each µ ∈ H. Let τ_I, τ_O, ε_I, ε_O ≥ 0 be rational constants, and the predicates PrefConf_I and PrefConf_O be instantiated using HybridConf_{τ_I,ε_I} and HybridConf_{τ_O,ε_O} respectively. That is, let Conf_I = HybridConf_{τ_I,ε_I} and Conf_O = HybridConf_{τ_O,ε_O}. Let µ_1, µ_2 ∈ H, let π_1 and π_2 be trace variables and Π = {π_1 → µ_1, π_2 → µ_2} a trace assignment.

t) is true if and only if Value ϕ PrefConfI
Proof. We show (1), the proof for (2) is analogous.
In the special case when τ I = τ O = 0, i.e., when we consider robust cleanness, the characterization of cleanness is simpler, since in the definition of PrefConf we only need to consider the actual compared prefixes (and not their truncated or extended versions) because the timing deviation is 0. As per the definition of robust cleanness, we consider PrefConf instantiated with TraceConf.
6.4. Monitoring HyperSTL* fin over finite-length real-valued signals. We now consider the fragment HyperSTL* fin and describe a method for offline monitoring of HyperSTL* fin properties on finite sets of finite-length signals.
If the given traces are finite timed words, we can obtain from them piecewise linear signals by linear interpolation, or piecewise constant signals by fixing the value for each half-closed interval between time points to be the value at the starting point of this interval.
If the signals in the set are of different time length, we take the minimum length across the set, and consider the traces up to that length. Thus, we ensure that for some B, all traces are defined over the interval [0, B]. Furthermore, we only consider formulas in HyperSTL* fin for which the bounds of the temporal operators are such that every subformula has a defined value when the formula is evaluated over traces with time domain [0, B].
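The two preprocessing steps described above (turning finite timed words into signals and restricting all traces to a common length B) can be sketched as follows. The representation of a timed word as a dictionary from time points to values and the helper names are assumptions made for this illustration.

```python
from bisect import bisect_right


def linear_signal(word):
    """Piecewise linear signal on [0, length] obtained by linear interpolation."""
    times = sorted(word)

    def signal(t):
        if not times[0] <= t <= times[-1]:
            raise ValueError("time point outside the signal domain")
        for a, b in zip(times, times[1:]):
            if a <= t <= b:
                return word[a] + (word[b] - word[a]) * (t - a) / (b - a)
        return word[times[0]]  # timed word with a single sample

    return signal, times[-1]


def constant_signal(word):
    """Piecewise constant signal: each half-closed interval takes the value at its start."""
    times = sorted(word)

    def signal(t):
        if not times[0] <= t <= times[-1]:
            raise ValueError("time point outside the signal domain")
        return word[times[bisect_right(times, t) - 1]]

    return signal, times[-1]


def common_length(words):
    """Bound B: the minimum time length across a set of finite timed words."""
    return min(max(word) for word in words)


# Hypothetical timed words of different length.
SHORT = {0.0: 0.0, 2.0: 4.0}
LONG = {0.0: 1.0, 5.0: 1.0}
```

For these two words the common bound is B = 2, so monitoring would only consider both signals on [0, 2].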
Our method handles the trace quantifiers similarly to the algorithm for offline monitoring of HyperLTL formulas on finite traces given in [FHST17,FHST19]. The method iterates over tuples of generalized timed traces. The arity of the tuples is determined by the quantifier prefix of the formula. For instance, for monitoring a formula of the form Φ = ∀π_1 . . . ∀π_n ∃π'_1 . . . ∃π'_m. ϕ we will evaluate ϕ on tuples of GTTs of arity n + m, to either determine that Φ is satisfied over the given set of traces, or return an n-tuple witnessing a violation. In order to evaluate the trace-quantifier-free formula ϕ on an (n + m)-tuple of traces we compute a satisfaction evidence by using the method proposed in [BDSV14]. Note that unlike [FHST17,FHST19] we consider finite traces and formulas with bounded temporal operators. Since we assume that the length of the traces is sufficient for all subformulas of a given HyperSTL*_fin formula of interest to have a defined value, the truth value of the quantifier-free part of the formula is defined. As we consider a fixed set of recorded traces, we can check offline the satisfaction of HyperSTL*_fin formulas with arbitrary quantifier alternations similarly to [FHST17,FHST19], as outlined above.
Let ϕ be a trace-quantifier-free formula. Let H be a finite set of R^l-valued finite-length signals, and let K ∈ H^k be a tuple of traces of arity k. Then, we can interpret K as a real-valued signal κ of order k × l. The satisfaction set of the formula ϕ over the signal κ is defined analogously to [BDSV14]. Similarly to [BDSV14], we assume that the traces in H are piecewise linear and that the atomic predicates are linear, in order to make the computation of the satisfaction set tractable. Note that the atomic predicates used in the characterization of hybrid cleanness in Section 6.3 are linear when the expressions d_I(X, X') and d_O(X, X') are linear. The explicit clock variable defines a linear signal. Clearly, if the traces in H are piecewise linear, then so is the signal κ. Thus, the satisfaction set for ϕ can be calculated effectively, represented as convex polytopes.
The satisfaction set for the formula ϕ given the signal κ is a subset of R≥0 × (R≥0)^I, defined inductively with respect to the structure of ϕ. Here, a signal κ of order k × l is interpreted as a trace assignment for k trace variables, in which each trace variable is assigned a GTT in the form of a real-valued signal of order l. Once the satisfaction sets for the atomic predicates appearing in the given formula have been computed, the satisfaction sets for the composite formulas can be constructed by following the inductive definition above. Under the assumptions we made above about the signals and the atomic predicates, the satisfaction sets for the atomic predicates can be computed directly using the method described in [BDSV14]. For further details, we refer to [BDSV14]. In order to perform monitoring for the formula Φ^HybridClean_{τ_I,ε_I,τ_O,ε_O} defined in Section 6.3, we bound the temporal operators based on the signal length B, and consider the case when τ_I > 0 and τ_O > 0, which ensures that the intervals in the operators are non-singular.
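The handling of a quantifier prefix of the form ∀...∀∃...∃ by iterating over tuples of traces can be sketched as follows. The sketch treats the evaluation of the quantifier-free part as a black box `phi` (in the setting above it would be computed from the satisfaction evidence); the function name and the toy predicates in the usage data are hypothetical.

```python
from itertools import product


def monitor_forall_exists(traces, n, m, phi):
    """Check a formula with n universal followed by m existential trace quantifiers.

    Returns (True, None) if every n-tuple of traces has an m-tuple making phi true,
    and (False, witness) with a violating n-tuple otherwise.
    """
    for universal in product(traces, repeat=n):
        satisfied = any(
            phi(*universal, *existential)
            for existential in product(traces, repeat=m)
        )
        if not satisfied:
            return False, universal
    return True, None


# Hypothetical "traces" (pairs of numbers) and quantifier-free checks.
TRACES = [(0, 1), (1, 0)]
IS_REVERSE = lambda a, b: a == tuple(reversed(b))
STARTS_LOWER = lambda a, b: a[0] < b[0]
```

Every tuple in `TRACES` has a reversed counterpart in the set, so the first check succeeds, while for the second one the tuple `(1, 0)` is returned as a witness of the violation. Note that the number of evaluations of `phi` grows as |traces|^(n+m).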

Case Study
In this section we evaluate the proposed notion of conformance-based cleanness in a real application context, known as the Diesel Emissions Scandal [BDH19, HBDK18, KHB18, BDH18, DBB+17, CLP+17, BDFH16]: Starting in fall 2015, millions of diesel cars were found to be equipped with defeat devices reducing the effectiveness of emission cleaning systems during real-world usage, in contrast to the regulator-defined driving scenarios on a chassis dynamometer, where the amount of emitted pollutants stays well below the applicable limits.
It was soon suspected that the singularities of the testing procedure were straightforward to identify and hence made cheating easy; in particular, there was only a single test cycle for testing emission cleaning systems, to be executed under very particular conditions. In the European Union, this was the New European Driving Cycle (NEDC) [Uni13], the speed profile of which is shown in Fig. 3. The NEDC consists of four repetitions of an elementary urban driving cycle (UDC) followed by one extra urban driving cycle (EUDC). Each test run is preceded by a preconditioning phase (PreCon), in which three EUDCs are driven consecutively. Between PreCon and the test, the vehicle has to cool down for 6 to 36 hours at an ambient temperature between 20 and 30 degrees Celsius.
Robust cleanness gives us a way of deriving additional test cycles that are reasonable w.r.t. the official NEDC. For a concrete context, "reasonable" is defined by an accompanying formally defined contract. The contributions of this paper enable us to go beyond previous work [BDH19] where a contract allowed inputs and outputs to deviate only in the value domains, but not in the time domain.
7.1. Experimental Setup. Our empirical studies apply the theory developed in the previous sections in a very specific setting. The system under test is a Nissan NV200 Evalia equipped with a Renault 1.5 dci (110hp) diesel engine and approved w.r.t. regulation Euro 6b. All tests were conducted in November 2020. As shown in Fig. 3, the car is fixed on a chassis dynamometer and attached to a portable emissions measurement system (PEMS) in preparation for the test. The PEMS is connected to the onboard diagnostics (OBD) interface of the vehicle. During a test, the PEMS measures the amount of several gases at the end of the car's exhaust pipe and logs the data received from OBD. The PEMS is able to internally synchronise the times of gas measurements and OBD data. We will not consider the internal PEMS retiming and instead analyse the final data set. As input, we consider the OBD speed data; as output, the sum of emitted NO and NO_2 (abbreviated as NO_x). The input and output are sampled at 1 Hz. The amount of NO_x emitted along different runs is comparable only to a limited extent. This is because the emission cleaning system used can have internal regeneration phases, which, from an external observer's perspective, are described by a retiming function r_p. An explicit definition of r_p is space consuming, hence we omit it. Along with the new cycle, we propose two suitable variants of contract C with different input conformances. Neither input conformance is constrained by a time threshold; in other words, τ = ∞, so we omit τ in the index.
• Contract C_a is as C, but entails input conformance Conf^{Ret_a}_I, where Ret_a = {(r, r^{-1}) | r ∈ T → T and r is total and bijective} is the family of retimings that allows any reordering of the NEDC inputs. Notably, no inputs can be added or removed.
• Contract C_p adjusts C by enforcing input conformance Conf^{Ret_p}_I, where Ret_p = {(r_p, r_p^{-1})} is the family of retimings that only allows the particular retiming r_p used to design the test cycle as discussed above. This input conformance is stricter than Conf^{Ret_a}_I above; it enforces that PermNEDC is not permuted any further by the driver.
NEDC Lengthening: Conformance-based doping tests can run longer than the NEDC; this is not possible with robust cleanness. We propose the test cycle DoubleNEDC, which consists of two consecutive NEDCs. In contrast to all other test cycles in this paper, DoubleNEDC produces two outputs: the first after 1180 seconds, and the second after 2360 seconds. The first half of this cycle is a classical "cold" NEDC (i.e., the engine cooled down before the test execution). The second half is a "hot" NEDC, since the cool-down phase was implicitly skipped. Also, the PreCon phase is skipped implicitly; there is only a single EUDC (instead of three) prior to the second NEDC. , which allows only id and r_d as retiming functions, i.e., Ret_d = {(id, r_d)}. Similarly, C_d must include the synchronisation retiming function Sync_d(r_1, r_2) = (id, r_d), which enforces that both DoubleNEDC outputs are compared to the single NEDC output (independent of the input retimings r_1 and r_2).
Human Time Imprecision Tolerance: Diesel doping tests are executed by humans driving a car. Humans tend to make mistakes when driving. Mistakes can be the over- or undershooting of the targeted speed (the error is on the value axis), or accelerations or decelerations happening too early or too late (the error is a shift on the time axis), or superpositions thereof. To compensate for both value and time errors, we use hybrid conformance. As a formal contract, this would be expressed by a variant of C, in which the input conformance is replaced by HybridConf_{τ_I,ε_I} for some τ_I > 0.
For the purpose of demonstration, we will later analyse several such variants of C, each variant with a unique value for τ_I and ε_I, i.e., we consider the contract C(τ_I, ε_I) parametrised in τ_I and ε_I. Concrete values for τ_I and ε_I must be specified when using the contract.
A test cycle that reflects driving rich in acceleration and deceleration phases, and that is hence particularly prone to human driving errors, is SineNEDC [BDH19]. SineNEDC is defined as the NEDC superimposed by a sine curve, formally SineNEDC(t) = max{0, NEDC(t) + 5 sin(0.5t)}, with a maximum input deviation from NEDC of 5 km/h, compare Fig. 6. We will evaluate SineNEDC under several variants of C(τ_I, ε_I).
Human time imprecision is as yet not considered in test cycles PermNEDC and DoubleNEDC; both cycles require a cycle-specific conformance predicate. However, tolerance for human imprecision can be added to these predicates by means of conformance and retiming composition. Let Ret^(1) and Ret^(2) be two families of retimings. Then 2 ) ∈ Ret^(2) and (r (1) 2 ) ∈ Ret^(1) } is the component-wise function composition. The definition for conformance composition τ 1 composes the individual retimings. The τ_1- and τ_2-constraints on Ret^(1) and Ret^(2) are applied before the composition. It is not necessary to apply further timing constraints to the resulting retiming, hence we allow infinite τ. To overcome the human imprecisions for PermNEDC and DoubleNEDC, we use the parametrised contracts C_p(τ_I, ε_I) and C_d(τ_I, ε_I), adaptations of C_p and C_d, with input conformances HybridPermConf_{τ_I,ε_I} = HybridConf_{τ_I,ε_I} • Conf^{Ret_p}_I and HybridDoubleConf_{τ_I,ε_I} = HybridConf_{τ_I,ε_I} • Conf^{Ret_d}_I, respectively. As for hybrid conformance in C(τ_I, ε_I), we will specify concrete τ_I and ε_I upon usage of the contracts. Notably, for DoubleNEDC, this does not have effects on the output conformance, because Sync_d does not consider the input retiming. This is important, because outputs are available only at time points 1180 and 2360 and must not be moved to time points different from these. We do not compose Conf^{Ret_a}_I and hybrid conformance, because Ret_a allows any possible NEDC permutation, which naturally reduces the effect of timing imprecisions.
for two test executions π_1 and π_2. The algorithm is sketched below.
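The definition SineNEDC(t) = max{0, NEDC(t) + 5 sin(0.5t)} translates directly into code. In the sketch below the reference profile `toy_profile` is a hypothetical stand-in for the NEDC speed profile (the real profile is given by the regulation and not reproduced here); only `sine_nedc` mirrors the formula from the text.

```python
import math


def sine_nedc(nedc, t):
    """Speed (km/h) of SineNEDC at time t (s) for a reference speed profile `nedc`."""
    return max(0.0, nedc(t) + 5.0 * math.sin(0.5 * t))


def toy_profile(t):
    """Hypothetical non-negative speed profile: ramp up to 50 km/h, hold, then stop."""
    if t < 25:
        return 2.0 * t
    if t < 100:
        return 50.0
    return 0.0


def max_deviation(nedc, horizon):
    """Largest pointwise deviation from the reference at 1 Hz sampling."""
    return max(abs(sine_nedc(nedc, t) - nedc(t)) for t in range(horizon + 1))
```

Since the reference speed is non-negative, clamping at 0 never increases the deviation, so the pointwise difference to the reference stays within the nominal 5 km/h.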

Test Results & Verdicts.
We executed each of NEDC, PermNEDC, DoubleNEDC and SineNEDC two times. We identify a concrete test execution by a suffix -1 or -2 to test cycle identifier (e.g., NEDC-1 is the first and NEDC-2 the second execution of NEDC). Raw data and the implementation of the analysis is available online [BFH20]. For NEDC, we combined the result of both executions to an average value of 182 mg/km of NO x . Notably, the Euro 6b regulation (to which our car is supposed to conform to) allows at most 80 mg/km, and the car under test is certified with 60.8 mg/km according to its documentation. The car is 3 years old. For doping detection, a test verdict is only meaningful if its input trace is conformant to that of the average NEDC execution; otherwise, the test is trivially passed. We will first evaluate PermNEDC w.r.t. C a and C p , DoubleNEDC w.r.t. C d , and SineNEDC w.r.t. C. To demonstrate the effects of hybrid conformance, we then analyse the experiments w.r.t. the parametrised variants of the contracts C, C p and C d , respectively. By definition of the test cycles, the nominal value difference for PermNEDC and DoubleNEDC after retiming is zero, and for SineNEDC it is 5 km/h. Though, due to human imprecisions, the actual differences are significantly higher.
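The way a verdict is derived from a single test execution (trivially passed if the input is not conformant, otherwise passed or failed depending on the output difference) can be summarised in a few lines. The sketch below is a simplification for tests with one scalar output; the function name is hypothetical. The values in the usage data are those reported in this section for the two SineNEDC executions (NEDC average 182 mg/km, ε_O = 180 mg/km), and ε_I = 15 km/h is an assumption taken from the discussion of Table 2.

```python
def doping_verdict(input_deviation, eps_in, output_difference, eps_out):
    """Verdict of a single test run against a contract with thresholds eps_in and eps_out."""
    if input_deviation > eps_in:
        # The driven input is not conformant to the reference cycle,
        # so the contract imposes no obligation on the output.
        return "trivially passed"
    if output_difference > eps_out:
        return "failed"  # conformant input but excessive output deviation: doping detected
    return "passed"


NEDC_AVG = 182.0  # mg/km, average of the two NEDC executions
EPS_IN = 15.0     # km/h, assumed input threshold of contract C
EPS_OUT = 180.0   # mg/km, output threshold

SINE_1 = doping_verdict(18.0, EPS_IN, abs(483.0 - NEDC_AVG), EPS_OUT)
SINE_2 = doping_verdict(13.0, EPS_IN, abs(632.0 - NEDC_AVG), EPS_OUT)
```

Under these assumptions the first execution is trivially passed and the second one fails, since its output differs from the NEDC average by 450 mg/km.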
• The executions of PermNEDC are shown in Fig. 7 […]

Figure 9: DoubleNEDC-1 speed (black) and NEDC speed (blue) in km/h, and accumulated NOx for DoubleNEDC-1 (red) and NEDC (orange) in mg/km.

• […] compared to the NEDC output 182. The output conformance is violated for the second output, with a difference of 200 mg/km, exceeding the allowed $\varepsilon_O$ = 180 mg/km threshold. Hence, DoubleNEDC-1 fails: doping is detected.
• During the test executions of SineNEDC, we measured 483 mg/km and 632 mg/km. The test progression is shown in Figs. 11 and 12. In SineNEDC-1, speed values deviate by up to 18 km/h, which exceeds the $\varepsilon_I$ threshold in $C$, so this test run is trivially passed. SineNEDC-2 respects the $\varepsilon_I$ threshold because inputs never deviate by more than 13 km/h. Consequently, SineNEDC-2 convicts our test car of doping, as the output difference of 450 mg/km is 2.5 times the allowed threshold $\varepsilon_O$.
• As discussed, we use hybrid conformance to compensate for human driving imprecisions.
In this context, Table 2 details the effect of a choice of $\tau$ on the maximal value error. We fix a maximum value that we allow for the time offset $\tau_I$. For this $\tau_I$, we analyse our dataset to find the minimal $\varepsilon_I$ such that, for the combination of $\tau_I$ and $\varepsilon_I$, the input […] values for $\tau_I$ = 0, 1, 2, 5, 10, 15 and 20 seconds. As expected, increasing $\tau_I$ causes the minimal $\varepsilon_I$ to decrease. At $\tau_I$ = 5, the reduction in the value error levels off notably. This happens because the error is only partially caused by the incorrect timing of the driver.

Figure 12: SineNEDC-2 speed (black) and NEDC speed (blue) in km/h, and accumulated NOx for SineNEDC-2 (red) and NEDC (orange) in mg/km.
From the values reported in Table 2, we see that if we allow an input time deviation of $\tau_I$ = 2 and keep $\varepsilon_I$ = 15, then $HybridDoubleConf_{\tau_I,\varepsilon_I}$(NEDC, DoubleNEDC-2) and $HybridConf_{\tau_I,\varepsilon_I}$(NEDC, SineNEDC-1) hold. For the time threshold $\tau_I$ = 3 seconds, $HybridPermConf_{\tau_I,\varepsilon_I}$(NEDC, PermNEDC-1) also holds. Thus, under hybrid conformance, these pairs of traces are considered in the cleanness test for the contracts $C_d(2, 15)$, $C(2, 15)$ and $C_p(3, 15)$, respectively, whereas under their original contracts and input conformances they are dismissed.
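The search for the minimal value threshold under a fixed time threshold can be illustrated with a short sketch. It is not the published analysis code [BFH20]; it assumes two scalar traces of equal length sampled once per second, and the names `min_eps` and `min_eps_one_sided` as well as the sample data are hypothetical.

```python
from typing import Sequence


def min_eps_one_sided(xs: Sequence[float], ys: Sequence[float],
                      window: int) -> float:
    """Smallest eps such that every sample of xs has a partner in ys
    at most `window` samples away whose value differs by at most eps."""
    worst = 0.0
    for i, x in enumerate(xs):
        lo = max(0, i - window)
        hi = min(len(ys), i + window + 1)
        # Best partner for this sample within the allowed time window.
        best = min(abs(x - y) for y in ys[lo:hi])
        worst = max(worst, best)
    return worst


def min_eps(xs: Sequence[float], ys: Sequence[float],
            tau: float, dt: float = 1.0) -> float:
    """Minimal eps for which the traces are (tau, eps)-close (symmetric)."""
    window = int(tau / dt + 1e-9)
    return max(min_eps_one_sided(xs, ys, window),
               min_eps_one_sided(ys, xs, window))


if __name__ == "__main__":
    # Hypothetical speed samples in km/h: the driven trace is delayed by
    # one second and additionally contains small value errors.
    reference = [0, 10, 20, 30, 30, 30]
    driven = [0, 0, 12, 20, 33, 30]
    for tau in (0, 1, 2):
        print(tau, min_eps(reference, driven, tau))
```

On this toy data the minimal value threshold drops from 10 to 3 when one second of time shift is allowed and then stays at 3, mirroring the observation that only part of the error is caused by timing.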
7.4. Evaluation and Discussion. The amounts of emitted NOx observed during our experiments provide clear indications of software doping in the car's emission cleaning system. The conformance-based contracts provide the formal basis for this verdict, as discussed above. Here, we complement this verdict with a more intuitive explanation of the behaviour observed.
• PermNEDC slightly reorders NEDC segments in the UDC part of the test cycle. During this part, the measured NOx does not differ significantly from the NEDC reference. However, during the (unmodified) EUDC part, the amount of emissions grows significantly. A physical explanation for the NOx increase is very unlikely; it is far more likely that the cleaning system is optimised specifically for the NEDC.
• The DoubleNEDC executions appear to reveal that the emission cleaning system optimisation can also rely on engine temperature or execution time instead of speed data. Physically, many of the common emission cleaning techniques require a hot engine to work properly (and none of them requires a cold engine). Therefore, a lower NOx value can be expected if the NEDC is run with a hot engine. In our experiments, however, the NOx emissions in the hot half are almost two times higher than in the initial cold part. In other words, the emission cleaning performance is reduced after the first NEDC execution. There is no physical explanation for this behaviour. Inside the software, detecting the end of an NEDC trip can be implemented very easily, for instance with a timer counting from 1180 (the length of NEDC) to zero.
• With SineNEDC, we test the cleaning system during driving behaviour that is rich in accelerations and decelerations. An increased amount of NOx can possibly be explained by physical phenomena. However, we measured an increase by factors of 2.7 and 3.5; these numbers can safely be considered too high for a trustworthy emission cleaning system.
Software doping theory provides the basis for detecting software behaviour that violates a formal contract. In doing so, physical aspects of the emission cleaning system should be considered when constructing test cases: test cycles for which drastically higher emissions can be explained physically should not be considered. The test cycles we used for our experiments were picked with automotive expertise to avoid physically stressful cycles. If test cases are generated automatically from a contract, the physical constraints could be captured by the contract. The contracts we use for our experiments can be interpreted as very generous in favour of the manufacturers. Input thresholds such as 15 km/h and 2 seconds appear to be reasonable values, keeping all tests close enough to the original NEDC. For the output threshold, we use a very large deviation value of 180 mg/km, which allows NOx emissions to almost double compared to the original NEDC value. Despite the generosity of the contracts, our experiments have been able to reveal doping for all experiments except PermNEDC-2.
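The way a verdict follows from these thresholds can be sketched in a few lines. The helper `verdict` and its string results are hypothetical; the thresholds (15 km/h, 180 mg/km) and the SineNEDC measurements (483 and 632 mg/km against the NEDC average of 182 mg/km, with maximal input deviations of 18 and 13 km/h) are the figures reported in this section. The sketch reduces each test execution to its maximal input and output deviation, which is a simplification of the trace-based conformance predicates.

```python
def verdict(input_dev: float, output_dev: float,
            eps_in: float = 15.0, eps_out: float = 180.0) -> str:
    """Classify one test execution against the reference execution.

    input_dev  -- maximal input (speed) deviation in km/h
    output_dev -- output (NOx) deviation in mg/km
    """
    if input_dev > eps_in:
        # Inputs are not conformant: the contract imposes no obligation.
        return "trivially passed"
    if output_dev > eps_out:
        # Conformant inputs but non-conformant outputs: doping detected.
        return "failed"
    return "passed"


if __name__ == "__main__":
    nedc_nox = 182  # average NEDC result in mg/km
    print("SineNEDC-1:", verdict(18, 483 - nedc_nox))
    print("SineNEDC-2:", verdict(13, 632 - nedc_nox))
```

Under the original contract, SineNEDC-1 is dismissed because its inputs exceed the input threshold, while SineNEDC-2 fails because its output deviation of 450 mg/km exceeds 180 mg/km.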
The analysis of the data shows that it is indeed necessary to consider not only deviations in value, but also deviations in timing. Considering both value and timing deviations offers a rich set of potential test cycles for doping tests and makes it possible to realistically verify the conformance of a test cycle to a reference cycle, especially when the quality of the studied driving tests suffers from human-caused input distortions. In this regard, cleanness notions entailing hybrid conformance are more adequate than conformance notions demanding punctual test executions, such as robust cleanness. Without hybrid conformance, more of the doping cases we have detected would slip through.
Finally, while hybrid conformance is central to the case study considered here, our generic theory of conformance-based cleanness allows for using other conformance notions as appropriate for the CPS under test.