Foundations of probability-raising causality in Markov decision processes

This work introduces a novel cause-effect relation in Markov decision processes using the probability-raising principle. Initially, sets of states as causes and effects are considered, which is subsequently extended to regular path properties as effects and then as causes. The paper lays the mathematical foundations and analyzes the algorithmic properties of these cause-effect relations. This includes algorithms for checking cause conditions given an effect and deciding the existence of probability-raising causes. As the definition allows for sub-optimal coverage properties, quality measures for causes inspired by concepts of statistical analysis are studied. These include recall, coverage ratio and f-score. The computational complexity for finding optimal causes with respect to these measures is analyzed.


INTRODUCTION
In recent years, scientific and technological advances in computer science and engineering have led to an ever-increasing influence of computer systems on our everyday lives. Many decisions that were historically made by humans are now in the hands of intelligent systems. At the same time, these systems grow more and more complex and thus harder to understand. This poses a huge challenge in the development of reliable and trustworthy systems. Therefore, an important task of computer science today is the broad development of comprehensive and versatile ways to understand modern software and cyber-physical systems.
The area of formal verification aims to prove the correctness of a system with respect to a specification. While the formal verification process can provide guarantees on the behavior of a system, such a result alone does not tell much about the inner workings of the system. To give some additional insight, counterexamples, invariants or related certificates as a form of verifiable justification that a system does or does not behave according to a specification have been studied extensively (see, e.g., [MP95, CGP99, Nam01]). These kinds of certificates, however, do not allow us to understand the system behavior to the full extent. In epistemic terms, the outcome of model checking applied to a system and a specification provides knowledge that a system satisfies a specification (or not) in terms of an assertion (whether the system satisfies the specification) and a justification (certificate or counterexample) to increase the belief in the result. However, model checking usually does not provide an understanding of why a system behaves in a certain way. This establishes a desideratum for a more comprehensive understanding of why a system satisfies or violates a specification. Explications of the system are needed to assess how different components influence its behavior and performance. Causal relations between events occurring during the execution of a system can constitute a strong tool to form such an understanding. Moreover, causality is fundamental for determining moral responsibility [CH04, BvH12] or legal accountability [FHJ+11], and ultimately fosters user acceptance through an increased level of transparency [Mil17].
The majority of prior work in this direction relies on causality notions based on Lewis' counterfactual principle [Lew73], which states that the effect would not have occurred if the cause had not happened. A prominent formalization of the counterfactual principle is given by Halpern and Pearl [HP01] via structural equation models. This inspired formal definitions of causality and related notions of blameworthiness and responsibility in Kripke and game structures as well as reactive systems (see, e.g., [CHK08, BBC+12, Cho16, YD16, FH19, YDJ+19, BFM21, CFF+22]).
Many systems are to a certain extent influenced by probabilistic events. Thus, a branch of formal methods studies probabilistic models such as Markov chains (MCs), which are purely probabilistic, or Markov decision processes (MDPs), which combine non-determinism and probabilistic choice. This gives rise to another approach to the concept of causality, since the statement of counterfactual reasoning can be interpreted more gently in a probabilistic setting: Instead of saying "the effect would not have occurred if the cause had not happened", we can say "the probability of the effect would have been lower if the cause had not occurred". This interpretation leads to the widely accepted probability-raising principle, which also has its roots in philosophy [Rei56, Sup70, Eel91, Hit16] and has been refined by Pearl [Pea09] for causal and probabilistic reasoning in intelligent systems. The different notions of probability-raising cause-effect relations discussed in the literature share the following two main principles:
(C1): Causes raise the probabilities for their effects, informally expressed by the requirement "Pr( effect | cause ) > Pr( effect )".
(C2): Causes must happen before their effects.
Despite the huge amount of work on probabilistic causation in other disciplines, research on probability-raising causality in the context of formal methods is comparatively rare and has concentrated on the purely probabilistic setting in Markov chains (see, e.g., [KM09, Kle12, ZPF+22] and the discussion of related work below). To the best of our knowledge, probabilistic causation for probabilistic operational models with nondeterminism has not been studied before.
In this work, we formalize principles (C1) and (C2) for Markov decision processes. We start in a basic setting by focusing on reachability properties where both effect and cause are sets of states. Later, we naturally extend this framework by considering the effect to be an ω-regular path property while causes can either still be state-based or ω-regular co-safety path properties.
As we would like probability-raising to be inherent in the MDP, we require (C1) under every scheduler. Thus, the cause-effect relation holds for every resolution of the non-deterministic choices. We consider two natural ways to interpret condition (C1): On the one hand, the probability-raising property can be required locally for each element of the cause. This results in a strict property which requires that after each execution leading to the cause the probability of the effect has been raised. Such causes are called strict probability-raising (SPR) causes in our framework. This interpretation is especially suited when the task is to identify system states that have to be avoided for lowering the effect probability. On the other hand, one might want to treat the cause globally as a unit in (C1), leading to the notion of a global probability-raising (GPR) cause. This way, the causal relation can also be formulated between properties instead of considering individual elements or executions to be causal. Considering the cause as a whole is also better suited when further constraints are imposed on the candidates for cause sets. For example, if the set of non-terminal states of the given MDP is partitioned into sets of states $S_i$ under control of different agents $i$, $1 \leq i \leq k$, and the task is to identify which agent's decisions might cause the effect, only the subsets of $S_1, \ldots, S_k$ are candidates for causes. Furthermore, global causes are more appropriate when causes are used for monitoring purposes under partial observability constraints, as then the cause candidates are sets of indistinguishable states.
Even with the distinction between strict and global probability-raising causality, different causes might still vary substantially regarding how well they predict the effect. Within Markov decision processes this intuitively coincides with how well the executions exhibiting the cause cover executions showing the effect. However, solely focusing on broader coverage might also lead to more trivial causal relations. In order to take this trade-off into account, we take inspiration from measures for binary classifiers used in statistical analysis (see, e.g., [Pow11]) to introduce quality measures for causes. These allow us to compare different causes and to look for optimal causes: The recall captures the probability that the effect is indeed preceded by the cause. The coverage ratio is the ratio between the probability that both cause and effect are observed and the probability that the effect but not the cause is observed. The f-score, a widely used quality measure for binary classifiers, is the harmonic mean of recall and precision, where the precision is the probability that the cause is followed by the effect. Finally, we address the computation of arbitrary quality measures as long as they can be represented as algebraic functions.
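The three measures described above can be illustrated with a minimal sketch. It assumes that the three path probabilities passed to the (hypothetical) function `quality_measures` have already been computed for a fixed scheduler; the numbers used are illustrative and not taken from the paper.

```python
from fractions import Fraction

def quality_measures(p_both, p_effect_only, p_cause_only):
    """Quality measures of a cause from three path probabilities.

    p_both        = Pr(cause and effect are both observed)
    p_effect_only = Pr(effect is observed without the cause)
    p_cause_only  = Pr(cause is observed without the effect)
    Assumes p_both > 0 and p_effect_only > 0 (illustration only).
    """
    recall = p_both / (p_both + p_effect_only)      # Pr(cause | effect)
    precision = p_both / (p_both + p_cause_only)    # Pr(effect | cause)
    coverage_ratio = p_both / p_effect_only         # covered vs. uncovered effects
    f_score = 2 * precision * recall / (precision + recall)  # harmonic mean
    return recall, coverage_ratio, precision, f_score

# Hypothetical probabilities, chosen only for illustration.
recall, coverage_ratio, precision, f_score = quality_measures(
    Fraction(5, 12), Fraction(1, 12), Fraction(1, 4)
)
print(recall, coverage_ratio, precision, f_score)
```

Exact rational arithmetic is used so that the harmonic mean can be compared against hand calculations without rounding effects.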
Contributions. In this work we build mathematical and algorithmic foundations for probabilistic causation in Markov decision processes based on the principles (C1) and (C2). In the setting where the effect is represented as a set of terminal states, we introduce strict and global probability-raising cause sets in MDPs (Section 3). Algorithms are provided to check whether given cause and effect sets satisfy (one of) the probability-raising conditions (Sections 4.1 and 4.2) and to check the existence of a cause set for a given effect set (Section 4.1). In order to evaluate the coverage properties of a cause, we subsequently introduce the above-mentioned quality measures (Section 5.1). We give algorithms for computing these values for given cause-effect sets (Section 5.2) and characterize the computational complexity of finding optimal cause sets with respect to the different measures (Section 5.3). We then extend the setting to ω-regular effects (Section 6.1) and evaluate how established properties transfer to this setting. We observe that in this extension SPR causes can be viewed as a collection of GPR causes. Finally, we discuss the case where causes are also path properties, namely ω-regular co-safety properties, and investigate how this more general perspective affects cause-effect relations (Section 6.2). Here, the class of potential cause candidates is greatly increased. Table 1 summarizes our complexity results.
Table 1. Summary of the complexity results. ω-regular effects are given as deterministic Rabin automata; ω-regular co-safety properties as causes are given as deterministic finite automata accepting good prefixes.
Sets of states as causes and effects:
• SPR. For fixed set Cause: check PR condition ∈ P; compute quality values (recall, covratio, f-score) in poly-time. Find optimal cause: covratio-optimal = recall-optimal in poly-time; f-score-optimal in poly-space, poly-time for MC, threshold problem ∈ NP ∩ coNP.
• GPR. For fixed set Cause: check PR condition ∈ coNP and ∈ P for MC; compute quality values in poly-time. Find optimal cause (covratio-optimal = recall-optimal, f-score-optimal): poly-space, threshold problems ∈ $\Sigma^P_2$ and NP-hard, and NP-complete for MC.
Sets of states as causes and ω-regular effects:
• SPR. For fixed set Cause: check PR condition ∈ coNP and ∈ P for MC; compute quality values in poly-time. Find optimal cause: covratio-optimal = recall-optimal in $PF^{NP}$ (as def. in [Sel94]), threshold problems ∈ coNP; f-score-optimal in poly-space, threshold problem ∈ $\Sigma^P_2$.
• GPR. For fixed set Cause: check PR condition ∈ coNP and ∈ P for MC; compute quality values in poly-time. Find optimal cause (covratio-optimal = recall-optimal, f-score-optimal): poly-space, threshold problems ∈ $\Sigma^P_2$, NP-hard.
ω-regular co-safety properties as causes and ω-regular effects:
• SPR. For fixed cause property rCause: check PR condition: difficulty illustrated in Example 6.14; compute quality values in poly-time. Find optimal cause: covratio-optimal = recall-optimal: optimal cause known, computation unclear; f-score-optimal: open.
• GPR. For fixed cause property rCause: check PR condition ∈ coNP and ∈ P for MC; compute quality values in poly-time. Find optimal cause: covratio-optimal = recall-optimal: in general, no optimal causes; f-score-optimal: open.

Related work. Previous work in the direction of probabilistic causation in stochastic operational models has mainly concentrated on Markov chains. Kleinberg [KM09, Kle12] introduced prima facie causes in finite Markov chains where both causes and effects are formalized as PCTL state formulae, and thus they can be seen as sets of states as in our approach. The correspondence of Kleinberg's PCTL constraints for prima facie causes and the strict probability-raising condition formalized using conditional probabilities has been worked out in the survey article [BDF+21]. Our notion of strict probability-raising causes interpreted in Markov chains corresponds to Kleinberg's prima facie causes with the exception of a minimality condition forbidding redundant elements in our definition. Ábrahám et al. [ÁB18] introduce a hyperlogic for Markov chains and give a formalization of probabilistic causation in Markov chains as a hyperproperty, which is consistent with Kleinberg's prima facie causes, and with strict probability-raising causes up to minimality. Cause-effect relations in Markov chains where effects are ω-regular properties and the causes are sets of paths have been introduced in [ZPF+22]. These relations rely on a strict probability-raising condition, but use a probability threshold p instead of directly requiring probability-raising. Therefore, [ZPF+22] permits a non-strict inequality in the PR condition with the consequence that causes always exist, which is not the case for our notions.
However, a minimal good prefix of a co-safety strict probability-raising cause in a Markov chain corresponds to a probability-raising path in [ZPF+22]. The survey article [BDF+21] introduces notions of global probability-raising causes for Markov chains, where causes and effects can be path properties. The notion of reachability causes in Markov chains from [BDF+21] directly corresponds to our notion of GPR causes, the only difference being that [BDF+21] deals with a relaxed minimality condition and requires that the cause set is reachable without visiting an effect state before. The latter is inherent in our approach as we suppose that all states are reachable and the effect states are terminal. On the other hand, if we restrict the notion of global PR-cause from [BDF+21] to ω-regular effects and co-safety causes, this corresponds to our notion of a co-safety GPR cause with the exception of the minimality condition. The same can be said about the correspondence of local PR-causes from [BDF+21] and co-safety SPR causes.
To the best of our knowledge, probabilistic causation in MDPs has not been studied before. The only work in this direction we are aware of is the recent paper by Dimitrova et al. [DFT20] on a hyperlogic, called PHL, for MDPs. While the paper focuses on the foundation of PHL, it contains an example illustrating how action causality can be formalized as a PHL formula. Roughly, the presented formula expresses that taking a specific action α increases the probability of reaching effect states. Thus, it also relies on the probability-raising principle, but compares the "effect probabilities" under different schedulers (which either schedule α or not) rather than comparing probabilities under the same scheduler as in our PR condition. However, to some extent our notions of PR causes can reason about action causality as well.
There has also been work on causality-based explanations of counterexamples in probabilistic models [KLL11,Lei15]. The underlying causality notion of this work, however, relies on the non-probabilistic counterfactual principle rather than the probability-raising condition. The same applies to the notions of forward and backward responsibility in stochastic games in extensive form introduced in the recent work [BFM21].

PRELIMINARIES
Throughout the paper, we will assume familiarity with basic concepts of Markov decision processes. Here, we present a brief summary of the notations used in the paper. For more details, we refer to [Put94, BK08, Kal20].
2.1. Markov decision processes. A Markov decision process (MDP) is a tuple $M = (S, Act, P, init)$ where $S$ is a finite set of states, $Act$ a finite set of actions, $init \in S$ a unique initial state and $P : S \times Act \times S \to [0,1]$ the transition probability function such that $\sum_{t \in S} P(s,\alpha,t) \in \{0,1\}$ for all states $s \in S$ and actions $\alpha \in Act$.
For $\alpha \in Act$ and $T \subseteq S$, $P(s,\alpha,T)$ is a shorthand notation for $\sum_{t \in T} P(s,\alpha,t)$. An action $\alpha$ is enabled in state $s \in S$ if $\sum_{t \in S} P(s,\alpha,t) = 1$. We define $Act(s) = \{\alpha \in Act \mid \alpha \text{ is enabled in } s\}$. A state $t$ is terminal if $Act(t) = \emptyset$. A Markov chain (MC) is a special case of an MDP where $Act$ is a singleton (we then write $P(s,t)$ rather than $P(s,\alpha,t)$). A path in an MDP $M$ is a (finite or infinite) alternating sequence $\pi = s_0\,\alpha_0\,s_1\,\alpha_1\,s_2 \cdots \in (S \times Act)^* \times S \,\cup\, (S \times Act)^\omega$ such that $P(s_i,\alpha_i,s_{i+1}) > 0$ for all indices $i$. A path is called maximal if it is infinite, or finite and ends in a terminal state. If $\pi$ is a finite path in $M$ then $last(\pi)$ denotes the last state of $\pi$. That is, if $\pi = s_0\,\alpha_0 \ldots \alpha_{n-1}\,s_n$ then $last(\pi) = s_n$.
A (randomized) scheduler $S$ is a function that maps each finite non-maximal path $s_0\,\alpha_0 \ldots s_n$ to a distribution over $Act(s_n)$. $S$ is called deterministic if $S(\pi)$ is a Dirac distribution for all finite non-maximal paths $\pi$. If the chosen action only depends on the last state of the path, $S$ is called memoryless. We write MR for the class of memoryless (randomized) and MD for the class of memoryless deterministic schedulers. Finite-memory schedulers are those that are representable by a finite-state automaton. A path $\pi = s_0\,\alpha_0 \ldots \alpha_{n-1}\,s_n$ is said to be an $S$-path if $S(s_0\,\alpha_0 \ldots \alpha_{i-1}\,s_i)(\alpha_i) > 0$ for each $i \in \{0, \ldots, n-1\}$. Given a path $\pi = s_0\,\alpha_0 \ldots \alpha_{n-1}\,s_n$, the residual scheduler $res(S,\pi)$ of $S$ after $\pi$ is defined by $res(S,\pi)(\zeta) = S(\pi \circ \zeta)$ for all finite paths $\zeta$ starting in $s_n$. Here, $\pi \circ \zeta$ denotes the concatenation of the paths $\pi$ and $\zeta$. Intuitively speaking, $res(S,\pi)$ behaves like $S$ after $\pi$ has already been seen.
We use LTL-like temporal modalities such as $\Diamond$ (eventually) and $U$ (until) to denote path properties. For $X, T \subseteq S$ the formula $X\,U\,T$ is satisfied by paths $\pi = s_0\,s_1 \ldots$ such that there exists $j \geq 0$ with $s_j \in T$ and $s_i \in X$ for all $i < j$. Moreover, $\Diamond T = S\,U\,T$. It is well-known that $\Pr^{\min}_M(X\,U\,T)$ and $\Pr^{\max}_M(X\,U\,T)$ and corresponding optimal MD-schedulers are computable in polynomial time. An end component (EC) of an MDP $M$ is a strongly connected sub-MDP containing at least one state-action pair. ECs will often be identified with the set of their state-action pairs. An EC $E$ is called maximal (abbreviated MEC) if there is no proper superset $E'$ of (the set of state-action pairs of) $E$ which is an EC.
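As a rough illustration of the extremal reachability probabilities mentioned above, the following sketch approximates $\Pr^{\min}(\Diamond T)$ and $\Pr^{\max}(\Diamond T)$ by value iteration. This is a swapped-in approximation technique chosen for brevity, not the polynomial-time method the text refers to (which is typically based on linear programming). The dictionary encoding of the MDP, the function name `reach_values` and the example MDP are assumptions made for this illustration only.

```python
def reach_values(mdp, targets, optimize=max, sweeps=200):
    """Approximate Pr^max (optimize=max) or Pr^min (optimize=min) of reaching
    `targets` from every state by value iteration from below.

    mdp: dict state -> dict action -> dict successor -> probability;
         terminal states map to an empty dict (hypothetical encoding).
    """
    values = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(sweeps):
        for s, actions in mdp.items():
            if s in targets or not actions:
                continue  # target and terminal states keep their value
            values[s] = optimize(
                sum(p * values[t] for t, p in succ.items())
                for succ in actions.values()
            )
    return values

# Hypothetical MDP: only state "init" has a nondeterministic choice.
example = {
    "init": {"a": {"s": 1.0}, "b": {"eff": 0.25, "noeff": 0.75}},
    "s": {"g": {"s": 0.5, "eff": 0.25, "noeff": 0.25}},
    "eff": {},
    "noeff": {},
}
p_max = reach_values(example, {"eff"}, max)["init"]
p_min = reach_values(example, {"eff"}, min)["init"]
print(p_max, p_min)
```

In this example, state s reaches eff with probability 1/2, so choosing a in init is optimal for maximization and b for minimization.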
2.2. MR-scheduler in MDPs without ECs. The following preliminary lemma is folklore (see, e.g., [Kal20, Theorem 9.16]) and used in the paper in the following form.
Proof. Thanks to Lemma 2.1 we may suppose that $S$ and $T$ are MR-schedulers. Let $f_* = \lambda \cdot freq_S(*) + (1-\lambda) \cdot freq_T(*)$ where $*$ stands for a state or a state-action pair in $M$. Let $U$ be an MR-scheduler defined by $U(s)(\alpha) = f_{s,\alpha} / f_s$ for each non-terminal state $s$ where $f_s > 0$ and each action $\alpha \in Act(s)$. If $f_s = 0$ then $U$ selects an arbitrary distribution over $Act(s)$.
Using Lemma 2.1 we then obtain $f_* = freq_U(*)$ where $*$ ranges over all states and state-action pairs in $M$, which yields the claim.
Let $M$, $S$, $T$, $\lambda$ be as in Lemma 2.2. Then, the notation $\lambda S \oplus (1-\lambda) T$ will be used to denote any MR-scheduler $U$ as in Lemma 2.2.
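The construction of $\lambda S \oplus (1-\lambda) T$ can be sketched as follows, under two loudly stated assumptions: the values $f_*$ are taken to be the $\lambda$-weighted combination of the expected visit frequencies under the two MR-schedulers, and the MDP is acyclic with its states listed in topological order, so that frequencies can be computed by forward propagation. The function names and the example MDP are hypothetical.

```python
from fractions import Fraction

def frequencies(mdp, order, init, sched):
    """Expected visit counts of states and state-action pairs under an
    MR-scheduler in an acyclic MDP; `order` lists the states topologically."""
    state_freq = {s: Fraction(0) for s in mdp}
    pair_freq = {}
    state_freq[init] = Fraction(1)
    for s in order:
        for a, succ in mdp[s].items():
            f = state_freq[s] * sched.get(s, {}).get(a, Fraction(0))
            pair_freq[(s, a)] = f
            for t, p in succ.items():
                state_freq[t] += f * p
    return state_freq, pair_freq

def convex_combination(mdp, order, init, sched_s, sched_t, lam):
    """MR-scheduler U with U(s)(a) = f_{s,a} / f_s (assumed definition of f)."""
    fs_state, fs_pair = frequencies(mdp, order, init, sched_s)
    ft_state, ft_pair = frequencies(mdp, order, init, sched_t)
    combined = {}
    for s in order:
        if not mdp[s]:
            continue  # terminal state, nothing to schedule
        f_s = lam * fs_state[s] + (1 - lam) * ft_state[s]
        if f_s > 0:
            combined[s] = {
                a: (lam * fs_pair[(s, a)] + (1 - lam) * ft_pair[(s, a)]) / f_s
                for a in mdp[s]
            }
        else:  # state never visited: any distribution will do
            combined[s] = {a: Fraction(1, len(mdp[s])) for a in mdp[s]}
    return combined

half, one = Fraction(1, 2), Fraction(1)
mdp = {
    "init": {"a": {"t1": one}, "b": {"s": half, "t2": half}},
    "s": {"c": {"t1": one}, "d": {"t2": one}},
    "t1": {},
    "t2": {},
}
order = ["init", "s", "t1", "t2"]
sched_s = {"init": {"a": one}}
sched_t = {"init": {"b": one}, "s": {"c": one}}
sched_u = convex_combination(mdp, order, "init", sched_s, sched_t, half)
freq_u, _ = frequencies(mdp, order, "init", sched_u)
```

In this example the terminal state t1 is reached with probability 1 under the first scheduler and 1/2 under the second, and with probability 3/4 under the combined scheduler for $\lambda = 1/2$.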
2.3. MEC-quotient. We recall the definition of the MEC-quotient, which is a standard concept for the analysis of MDPs [dA97]. Intuitively, the MEC-quotient of an MDP collapses all maximal end components, ignoring the actions in the end component while keeping outgoing transitions. More concretely, we use a modified version with an additional trap state as in [BBD+18] that serves to mimic behaviors inside an end component of the original MDP.
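Definition 2.3 (MEC-quotient of an MDP). Let $M = (S, Act, P, init)$ be an MDP with end components. Let $E_1, \ldots, E_k$ be the maximal end components (MECs) of $M$. We may suppose without loss of generality that the sets of enabled actions of distinct states are pairwise disjoint, i.e., whenever $s_1, s_2$ are states in $M$ with $s_1 \neq s_2$ then $Act_M(s_1) \cap Act_M(s_2) = \emptyset$. This permits us to consider $E_i$ as a subset of $Act$. Let $U_i$ denote the set of states that belong to $E_i$ and let $U = U_1 \cup \ldots \cup U_k$. The MEC-quotient of $M$ is the MDP $N = (S', Act', P', init')$, which together with the function $\iota : S \to S'$ is defined as follows.
• The state space $S'$ is $(S \setminus U) \cup \{s_{E_1}, \ldots, s_{E_k}, \bot\}$ where $s_{E_1}, \ldots, s_{E_k}, \bot$ are pairwise distinct fresh states. The function $\iota$ is given by $\iota(s) = s$ for $s \in S \setminus U$ and $\iota(u) = s_{E_i}$ for $u \in U_i$, and $init' = \iota(init)$.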
• The action set Act ′ is Act ∪ {τ} where τ is a fresh action symbol.
• The set of actions enabled in state $s \in S'$ of $N$ and the transition probabilities are as follows:
  - If $s$ is a state of $M$ that does not belong to an MEC of $M$ (i.e., $s \in S \cap S'$) then $Act_N(s) = Act_M(s)$ and $P'(s,\alpha,s') = P(s,\alpha,\iota^{-1}(s'))$ for all $s' \in S'$ and $\alpha \in Act_M(s)$.
  - If $s = s_{E_i}$ is a state representing MEC $E_i$ of $M$ then (as we view $E_i$ as a set of actions) $Act_N(s_{E_i}) = \bigcup_{u \in U_i} (Act_M(u) \setminus E_i) \cup \{\tau\}$, and for $u \in U_i$ and $\beta \in Act_M(u) \setminus E_i$ we have $P'(s_{E_i},\beta,s') = P(u,\beta,\iota^{-1}(s'))$ for all $s' \in S'$. The $\tau$-action stands for the deterministic transition to the fresh state $\bot$, i.e., $P'(s_{E_i},\tau,\bot) = 1$.
Thus, each terminal state of $M$ is terminal in its MEC-quotient $N$, too. Vice versa, every terminal state of $N$ is either a terminal state of $M$ or $\bot$. Moreover, $N$ has no end components, which implies that under every scheduler $T$ for $N$, a terminal state will be reached with probability 1. In Section 4.2, we use the notation $noeff_{tn}$ rather than $\bot$.
The original MDP and its MEC-quotient are connected by the following lemma (see also [dA97, dA99]).
Lemma 2.4 (Correspondence of an MDP and its MEC-quotient). Let $M$ be an MDP and $N$ its MEC-quotient. Then, for each scheduler $S$ for $M$ there is a scheduler $T$ for $N$ such that
$\Pr^S_M(\Diamond t) = \Pr^T_N(\Diamond t)$ for each terminal state $t$ of $M$ (2.1)
and vice versa. Moreover, if (2.1) holds then $\Pr^T_N(\Diamond \bot)$ equals the probability for $S$ to generate an infinite path in $M$ that eventually enters and stays forever in an end component.
Proof. Given a scheduler $T$ for $N$, we pick an MD-scheduler $U$ such that $U(u) \in E_i$ for each $u \in U_i$. Then, the corresponding scheduler $S$ for $M$ behaves as $T$ as long as $T$ does not choose the $\tau$-transition to $\bot$. As soon as $T$ schedules $\tau$, $S$ behaves as $U$ from this moment on.
Vice versa, given a scheduler $S$ for $M$, a corresponding scheduler $T$ for $N$ mimics $S$ as long as $S$ has not visited a state belonging to an end component $E_i$ of $M$. Scheduler $T$ ignores $S$'s transitions inside an MEC $E_i$ and takes $\beta \in \bigcup_{u \in U_i} (Act_M(u) \setminus E_i)$ with the same probability as $S$ leaves $E_i$ via $\beta$. With the remaining probability mass, $S$ stays forever inside $E_i$, which is mimicked by $T$ by taking the $\tau$-transition to $\bot$.
For the formal definition of $T$, we use the following notation. For simplicity, let us assume that $init \notin U_1 \cup \ldots \cup U_k$. This yields $init = init'$. Given a finite path $\pi = s_0\,\alpha_0\,s_1\,\alpha_1 \ldots \alpha_{m-1}\,s_m$ in $M$ with $s_0 = init$, let $\pi_N$ be the path in $N$ resulting from $\pi$ by replacing each maximal path fragment $s_h\,\alpha_h \ldots \alpha_{j-1}\,s_j$ consisting of actions inside an $E_i$ with state $s_{E_i}$. (Here, maximality means if $h > 0$ then $\alpha_{h-1} \notin E_i$ and if $j < m$ then $\alpha_{j+1} \notin E_i$.) Furthermore, let $p^S_\pi$ denote the probability for $S$ to generate the path $\pi$ when starting in the first state of $\pi$.
Let $\rho$ be a finite path in $N$ with first state $init$ (recall that we suppose that $M$'s initial state does not belong to an MEC, which yields $init = init'$) and $last(\rho) \neq \bot$. Then, $\Pi_\rho$ denotes the set of finite paths $\pi = s_0\,\alpha_0\,s_1\,\alpha_1 \ldots \alpha_{m-1}\,s_m$ in $M$ such that (i) $\pi_N = \rho$ and (ii) if $s_m \in U_i$ then $\alpha_{m-1} \notin E_i$.
The formal definition of scheduler $T$ is as follows. Let $\rho$ be a finite path in $N$ where the last state $s$ of $\rho$ is non-terminal. If $s$ is a state of $M$ that does not belong to an MEC of $M$ and $\beta \in Act_M(s)$, then $T(\rho)(\beta)$ is obtained from the probabilities $S(\pi)(\beta)$ of the paths $\pi \in \Pi_\rho$, weighted by $p^S_\pi$. If $s = s_{E_i}$ and $\beta \in Act_N(s_{E_i})$ with $\beta \neq \tau$, then $T(\rho)(\beta)$ is obtained in the same way from the probabilities, from $last(\pi)$ in $M$, of the event "leave $E_i$ via action $\beta$", which means the existence of a prefix whose action sequence consists of actions inside $E_i$ followed by action $\beta$. The last state of this prefix, however, could be a state of $U_i$. (Note that $\beta \in Act_N(s_{E_i})$ means that $\beta$ could have reached a state outside $U_i$, but there might be states inside $U_i$ that are accessible via $\beta$.) Similarly, $T(\rho)(\tau)$ is obtained from the probabilities, from $last(\pi)$ in $M$, of the event "stay forever in $E_i$", which means that only actions inside $E_i$ are performed.
By induction on the length of $\rho$, this yields $\Pr^S_M(\Diamond t) = \Pr^T_N(\Diamond t)$ for each terminal state $t$ of $M$. Moreover, the probability under $S$ to eventually enter and stay forever in $E_i$ equals the probability for $T$ to reach the terminal state $\bot$ via a path of the form $\rho\,\tau\,\bot$ where $last(\rho) = s_{E_i}$.
2.4. Automata and ω-regular languages. In order to represent an ω-regular language, we use deterministic Rabin automata (DRA). A DRA is a tuple $A = (Q, \Sigma, q_0, \delta, Acc)$ where $Q$ is a finite set of states, $\Sigma$ an alphabet, $q_0$ the initial state, $\delta : Q \times \Sigma \to Q$ the transition function and $Acc \subseteq 2^Q \times 2^Q$ the acceptance set. The run of $A$ on a word $w = w_0 w_1 \cdots \in \Sigma^\omega$ is the sequence $\tau = q_0 q_1 \ldots$ of states such that $\delta(q_i, w_i) = q_{i+1}$ for all $i$. It is accepting if there exists a pair $(L, K) \in Acc$ such that $L$ is only visited finitely often and $K$ is visited infinitely often by $\tau$. The language $L(A)$ is the set of all words $w \in \Sigma^\omega$ on which the run of $A$ is accepting.
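The Rabin acceptance condition can be made concrete on ultimately periodic words. The following is a minimal sketch, assuming the word is given as a finite prefix followed by a nonempty cycle that is repeated forever; the function name `rabin_accepts`, the dictionary encoding of $\delta$ and the example automaton are assumptions of this illustration.

```python
def rabin_accepts(delta, q0, acc, prefix, cycle):
    """Decide whether a DRA accepts the word prefix . cycle^omega.

    delta: dict (state, letter) -> state; acc: list of pairs (L, K) of state sets.
    `cycle` must be nonempty.
    """
    q = q0
    for letter in prefix:
        q = delta[(q, letter)]
    # Repeat the cycle until the state at a cycle boundary is seen again.
    boundary_index = {}
    visited_per_round = []
    while q not in boundary_index:
        boundary_index[q] = len(visited_per_round)
        visited = set()
        for letter in cycle:
            q = delta[(q, letter)]
            visited.add(q)
        visited_per_round.append(visited)
    # States visited infinitely often: those seen in the rounds that repeat.
    inf = set().union(*visited_per_round[boundary_index[q]:])
    return any(not (inf & L) and bool(inf & K) for L, K in acc)

# Hypothetical DRA over {a, b} for "infinitely often b": the state records
# the last letter read, and the single Rabin pair is (empty set, {qb}).
delta = {(q, x): "q" + x for q in ("qa", "qb") for x in "ab"}
acc = [(set(), {"qb"})]
print(rabin_accepts(delta, "qa", acc, "a", "ab"))
print(rabin_accepts(delta, "qa", acc, "b", "a"))
```

The first word contains infinitely many occurrences of b, the second only one.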
A good prefix π for an ω-regular language L is a finite word such that all infinite extensions of π belong to L. An ω-regular language L is called a co-safety language if all words in L have a prefix that is a good prefix for L. A co-safety language L is uniquely determined by the regular set of minimal good prefixes of words in L.
The regular language of minimal good prefixes of a co-safety language $L$, which uniquely determines $L$, can be represented by a deterministic finite automaton (DFA). A DFA is a tuple $A = (Q, \Sigma, q_0, \delta, Acc)$ where $Q$ is a finite set of states, $\Sigma$ an alphabet, $q_0$ the initial state, $\delta : Q \times \Sigma \to Q$ the transition function and $Acc \subseteq Q$ the acceptance set. The run of $A$ on a finite word $w = w_0 w_1 \ldots w_n$ is the sequence $\tau = q_0 q_1 \ldots q_{n+1}$ of states such that $\delta(q_i, w_i) = q_{i+1}$ for all $i \leq n$. It is accepting if its last state belongs to $Acc$. The language $L(A)$ is the set of all words $w \in \Sigma^*$ on which the run of $A$ is accepting.
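To illustrate how a DFA for minimal good prefixes is used, the following sketch runs a DFA on a finite word and extracts the shortest accepted prefix. The helper names and the example automaton (for the co-safety property "eventually c" over the letters a and c) are assumptions made for this illustration.

```python
def dfa_accepts(delta, q0, acc, word):
    """Run the DFA on a finite word and check whether the run ends in Acc."""
    q = q0
    for letter in word:
        q = delta[(q, letter)]
    return q in acc

def minimal_good_prefix(delta, q0, acc, word):
    """Shortest prefix of `word` accepted by the DFA, or None if there is none."""
    q = q0
    if q in acc:
        return ()
    for i, letter in enumerate(word):
        q = delta[(q, letter)]
        if q in acc:
            return tuple(word[: i + 1])
    return None

# Hypothetical DFA accepting exactly the minimal good prefixes of
# "eventually c": words a...ac that end with their first c.
delta = {
    ("wait", "a"): "wait", ("wait", "c"): "hit",
    ("hit", "a"): "sink", ("hit", "c"): "sink",
    ("sink", "a"): "sink", ("sink", "c"): "sink",
}
print(dfa_accepts(delta, "wait", {"hit"}, "aac"))
print(minimal_good_prefix(delta, "wait", {"hit"}, "aacac"))
```

Words extending a minimal good prefix are rejected by this DFA (they end in the sink state), which matches the requirement that only minimal good prefixes are accepted.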
Given an MDP $M = (S, Act, P, init)$ and a DFA $A = (Q, \Sigma, q_0, \delta, Acc)$ with $\Sigma \subseteq S^*$ we define the product MDP $M \otimes A = (S \times Q, Act, P', init')$ with $P'(\langle s,q \rangle, \alpha, \langle t,r \rangle) = P(s,\alpha,t)$ if $r = \delta(q,t)$ and $0$ otherwise, and $init' = \langle init, \delta(q_0, init) \rangle$. The same construction works for the product of an MDP with a DRA. The difference comes from the acceptance condition encoded in the second components of states of the product MDP.
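The product construction can be sketched as follows. The sketch assumes the usual convention that the automaton reads the target state of each transition and that the initial product state is $\langle init, \delta(q_0, init) \rangle$; only the reachable part of the product is built. The encoding and all names are hypothetical.

```python
def product_mdp(mdp, init, delta, q0):
    """Reachable part of the product of an MDP with a deterministic automaton
    whose letters are MDP states.

    mdp: dict state -> dict action -> dict successor -> probability
    delta: dict (automaton state, MDP state) -> automaton state
    """
    init_prod = (init, delta[(q0, init)])
    product, todo = {}, [init_prod]
    while todo:
        s, q = todo.pop()
        if (s, q) in product:
            continue
        product[(s, q)] = {}
        for action, succ in mdp[s].items():
            dist = {}
            for t, p in succ.items():
                target = (t, delta[(q, t)])  # automaton reads the target state
                dist[target] = p
                todo.append(target)
            product[(s, q)][action] = dist
    return product, init_prod

# Hypothetical MDP and automaton remembering whether state "c" has been seen.
mdp = {
    "init": {"a": {"c": 0.5, "noeff": 0.5}},
    "c": {"g": {"eff": 1.0}},
    "eff": {},
    "noeff": {},
}
delta = {("no", s): ("yes" if s == "c" else "no") for s in mdp}
delta.update({("yes", s): "yes" for s in mdp})
product, init_prod = product_mdp(mdp, "init", delta, "no")
```

In the resulting product, the second component of a state records whether the path leading to it has visited c, so reaching eff after c corresponds to reaching the single product state ("eff", "yes").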

STRICT AND GLOBAL PROBABILITY-RAISING CAUSES
Our contribution starts by providing novel formal definitions for cause-effect relations in MDPs which rely on the probability-raising (PR) principle P(E | C) > P(E) stated by (C1) and temporal priority of causes as stated by (C2) in the introduction. Here, we focus on the case where both causes and effects are state properties, i.e., sets of states.
In the sequel, let M = (S, Act, P, init) be an MDP and Eff ⊆ S \ {init} a nonempty set of terminal states. As the effect set is fixed, the assumption that all effect states are terminal contributes to the temporal priority (C2). We may also assume that every state s ∈ S is reachable from init.
We consider two variants of the probability-raising condition: the global setting treats the set Cause as a unit, while the strict view requires (C1) for all states in Cause individually.
Definition 3.1 (Global and strict probability-raising cause (GPR/SPR cause)). Let $M$ and $Eff$ be as above and $Cause$ a nonempty subset of $S \setminus Eff$ such that for each $c \in Cause$ we have $\Pr^{\max}_M((\neg Cause)\,U\,c) > 0$. Then, $Cause$ is said to be a GPR cause for $Eff$ iff the following condition (G) holds:
(G): For each scheduler $S$ where $\Pr^S_M(\Diamond Cause) > 0$: $\Pr^S_M(\Diamond Eff \mid \Diamond Cause) > \Pr^S_M(\Diamond Eff)$. (GPR)
$Cause$ is called an SPR cause for $Eff$ iff the following condition (S) holds:
(S): For each state $c \in Cause$ and each scheduler $S$ where $\Pr^S_M((\neg Cause)\,U\,c) > 0$: $\Pr^S_M(\Diamond Eff \mid (\neg Cause)\,U\,c) > \Pr^S_M(\Diamond Eff)$. (SPR)
Note that we only consider sets $Cause$ as PR causes when each state $c \in Cause$ is accessible from $init$ without traversing other states in $Cause$. This can be seen as a minimality condition ensuring that a cause does not contain redundant elements. However, we could omit this requirement without affecting the covered effects (events where an effect state is reached after visiting a cause state) or uncovered effects (events where an effect state is reached without visiting a cause state before). More concretely, whenever a set $C \subseteq S \setminus Eff$ satisfies condition (G) or (S), then the set of states $c \in C$ where $M$ has a path from $init$ satisfying $(\neg C)\,U\,c$ is a GPR resp. an SPR cause. On the other hand, the set $Cause$ is disjoint from the effect set $Eff$ to ensure temporal priority (C2).
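For Markov chains, where there is only one scheduler, the two conditions can be checked by computing the probabilities involved. The following is a minimal sketch for acyclic Markov chains with exact rational arithmetic; it uses the fact that, in a Markov chain, the probability of the effect conditioned on first hitting cause state c equals the probability of reaching the effect from c. The function names and the example chain are hypothetical (the chain is not one of the paper's figures), and the minimality condition is assumed to hold.

```python
from fractions import Fraction

def reach(chain, start, targets, avoid=frozenset()):
    """Exact probability of reaching `targets` from `start` in an acyclic
    Markov chain without entering a state of `avoid` first."""
    def go(s):
        if s in targets:
            return Fraction(1)
        if s in avoid:
            return Fraction(0)
        return sum((p * go(t) for t, p in chain[s].items()), Fraction(0))
    return go(start)

def check_pr_conditions(chain, init, cause, eff):
    """Return (SPR holds, GPR holds) for the candidate set `cause`."""
    pr_eff = reach(chain, init, eff)
    # Probability of (not Cause) U c: c is the first cause state visited.
    first_hit = {c: reach(chain, init, {c}, avoid=cause - {c}) for c in cause}
    from_cause = {c: reach(chain, c, eff) for c in cause}
    spr = all(from_cause[c] > pr_eff for c in cause)
    pr_cause = sum(first_hit.values())
    pr_both = sum(first_hit[c] * from_cause[c] for c in cause)
    gpr = pr_both / pr_cause > pr_eff
    return spr, gpr

# Hypothetical acyclic Markov chain with effect state "eff".
F = Fraction
chain = {
    "init": {"c1": F(1, 3), "c2": F(1, 3), "eff": F(1, 12), "noeff": F(1, 4)},
    "c1": {"eff": F(1)},
    "c2": {"eff": F(1, 4), "noeff": F(3, 4)},
    "eff": {},
    "noeff": {},
}
for cause in ({"c1"}, {"c2"}, {"c1", "c2"}):
    print(sorted(cause), check_pr_conditions(chain, "init", cause, {"eff"}))
```

In this chain the effect probability is 1/2. The set {c1, c2} satisfies the global condition (5/8 > 1/2) but not the strict one, since c2 reaches the effect only with probability 1/4.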
3.1. Examples and simple properties of probability-raising causes. We first observe that SPR and GPR causes cannot contain the initial state $init$, since otherwise an equality instead of an inequality would hold in (GPR) and (SPR). Furthermore, as a direct consequence of the definitions and using the equivalence of the LTL formulas $\Diamond Cause$ and $(\neg Cause)\,U\,Cause$, we obtain:
Lemma 3.2 (Singleton PR causes). If $Cause$ is a singleton then $Cause$ is an SPR cause for $Eff$ if and only if $Cause$ is a GPR cause for $Eff$.
The direction from SPR cause to GPR cause even holds in general, as the event $\Diamond Cause$ can be expressed as a disjoint union of all events $(\neg Cause)\,U\,c$ where $c \in Cause$. Therefore, the probability for covered effects $\Pr^S_M(\Diamond Eff \mid \Diamond Cause)$ is a weighted average of the probabilities $\Pr^S_M(\Diamond Eff \mid (\neg Cause)\,U\,c)$ for $c \in Cause$, which yields:
Lemma 3.3. Every SPR cause for $Eff$ is a GPR cause for $Eff$.
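Proof. Assume that $Cause$ is an SPR cause for $Eff$ in $M$ and let $S$ be a scheduler that reaches $Cause$ with positive probability. Further, let $C_S = \{c \in Cause \mid \Pr^S_M((\neg Cause)\,U\,c) > 0\}$ and $m = \min_{c \in C_S} \Pr^S_M(\Diamond Eff \mid (\neg Cause)\,U\,c)$. As $Cause$ is an SPR cause, $m > \Pr^S_M(\Diamond Eff)$. The set of $S$-paths satisfying $\Diamond Cause$ is the disjoint union of the sets of $S$-paths satisfying $(\neg Cause)\,U\,c$ with $c \in C_S$. Hence, $\Pr^S_M(\Diamond Eff \mid \Diamond Cause) = \sum_{c \in C_S} \Pr^S_M(\Diamond Eff \mid (\neg Cause)\,U\,c) \cdot \Pr^S_M((\neg Cause)\,U\,c) \,/\, \Pr^S_M(\Diamond Cause) \geq m$. As $m > \Pr^S_M(\Diamond Eff)$, the GPR condition (GPR) is satisfied under $S$.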
Example 3.4 (Non-strict GPR cause). Consider the Markov chain $M$ depicted in Figure 1 where the nodes represent states and the directed edges represent transitions labeled with their respective probabilities. Let $Eff = \{eff\}$. Then, $\Pr_M(\Diamond Eff) = \frac{1}{3} + \frac{1}{3} \cdot \frac{1}{4} + \frac{1}{12} = \frac{1}{2}$, $\Pr_M(\Diamond Eff \mid \Diamond c_1) = \Pr_{M,c_1}(\Diamond eff) = 1$ and $\Pr_M(\Diamond Eff \mid \Diamond c_2) = \Pr_{M,c_2}(\Diamond eff) = \frac{1}{4}$. Thus, $\{c_1\}$ is both an SPR and a GPR cause for $Eff$, while $\{c_2\}$ is not. The set $Cause = \{c_1, c_2\}$ is a non-strict GPR cause for $Eff$ as: $\Pr_M(\Diamond Eff \mid \Diamond Cause) = \frac{1/3 + 1/3 \cdot 1/4}{1/3 + 1/3} = \frac{5}{8} > \frac{1}{2} = \Pr_M(\Diamond Eff)$. Non-strictness follows from the fact that the SPR condition does not hold for state $c_2$. ◁
Example 3.5 (Probability-raising causes might not exist). PR causes might not exist, even if $M$ is a Markov chain. This applies, e.g., to the MC in Figure 2 and the effect set $Eff = \{eff\}$. The only cause candidate is the singleton $\{init\}$. However, the strict inequality in (GPR) or (SPR) forbids $\{init\}$ to be a PR cause. The same phenomenon occurs if all non-terminal states of an MC reach the effect states with the same probability. In such cases, however, the non-existence of PR causes is well justified as the events $\Diamond Eff$ and $\Diamond Cause$ are stochastically independent for every set $Cause \subseteq S \setminus Eff$. ◁
Remark 3.6 (Memory needed for refuting PR condition). Let $M$ be the MDP in Figure 3, where the notation is similar to Example 3.4 with the addition of actions $\alpha$, $\beta$ and $\gamma$. Let $Cause = \{c\}$ and $Eff = \{eff\}$. Only state $s$ has a nondeterministic choice. $Cause$ is not a PR cause. To see this, regard the finite-memory deterministic scheduler $T$ that schedules $\beta$ only for the first visit of $s$ and $\alpha$ for the second visit of $s$. Then: $\Pr^T_M(\Diamond eff) = \frac{1}{2} \cdot \frac{1}{2} + \frac{1}{2} \cdot \frac{1}{2} \cdot 1 \cdot \frac{1}{4} = \frac{5}{16} > \frac{1}{4} = \Pr^T_M(\Diamond eff \mid \Diamond c)$. Denote the MR schedulers reaching $c$ with positive probability as $S_\lambda$ with $S_\lambda(s)(\alpha) = \lambda$ and $S_\lambda(s)(\beta) = 1-\lambda$ for some $\lambda \in$ …
Here the state $eff_{unc}$ is not covered by the cause whereas $eff_{cov}$ is, hence their names. The two MD-schedulers $S_\alpha$ and $S_\beta$ which select $\alpha$ resp. $\beta$ for the initial state $init$ are the only deterministic schedulers. As $S_\alpha$ does not reach $c$, it is irrelevant for the PR conditions. On the other hand, $S_\beta$ satisfies (SPR) and (GPR) since $\Pr^{S_\beta}_M(\Diamond Eff \mid \Diamond c) = \frac{1}{2} > \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = \Pr^{S_\beta}_M(\Diamond Eff)$. The MR scheduler $T$ which selects $\alpha$ and $\beta$ with probability $\frac{1}{2}$ in $init$ also reaches $c$ with positive probability but violates (SPR) and (GPR) as $\Pr^T_M(\Diamond Eff \mid \Diamond c) = \frac{1}{2} < \frac{5}{8} = \frac{1}{2} + \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = \Pr^T_M(\Diamond Eff)$. ◁
Remark 3.8 (Cause-effect relations for regular classes of schedulers). The definitions of PR causes in MDPs impose constraints for all schedulers reaching a cause state. This condition is fairly strong and can often lead to the phenomenon that no PR cause exists. Replacing $M$ with an MDP resulting from the synchronous parallel composition of $M$ with a deterministic finite automaton representing a regular constraint on the scheduled state-action sequences (e.g., "alternate between actions $\alpha$ and $\beta$ in state $s$" or "take $\alpha$ on every third visit to state $s$ and actions $\beta$ or $\gamma$ otherwise") leads to a weaker notion of PR causality. This can be useful to obtain more detailed information on cause-effect relationships in special scenarios, be it at design time where multiple scenarios (regular classes of schedulers) are considered, or for a post-hoc analysis where one seeks the causes of an occurred effect and where information about the scheduled actions is extractable from log files or the information gathered by a monitor.
Remark 3.9 (Action PR causality). Our notions of PR causes are purely state-based with PR conditions that compare probabilities under the same scheduler. However, in combination with model transformations, the proposed notions of PR causes are also applicable for reasoning about other forms of PR causality. Suppose the task is to check whether taking action $\alpha$ in state $s$ raises the effect probabilities compared to never scheduling $\alpha$ in state $s$. This form of action causality was discussed in an example in [DFT20].
We argue that we can deal with this kind of causality, too. For this, we assume there are no cycles in M containing s. Let M 0 and M 1 be copies of M with the following modifications: In M 0 , the only enabled action of state s is α, while in M 1 the enabled actions of state s are the elements of The action α raises the effect probability in M if and only if for all schedulers S of N the copy of s in M 0 satisfies (SPR) in N. This idea can be generalized to check whether scheduler classes satisfying a regular constraint have higher effect probability compared to all other schedulers. In this case, we can deal with an MDP N as above where M 0 and M 1 are defined as the synchronous product of deterministic finite automata and M.
To demonstrate this, consider the MDP M from Figure 5. We are interested in whether taking α in s raises the probability to reach the effect state eff. The constructed MDP N with two adapted copies of M is depicted in Figure 6. For all schedulers S of N the state s 0 satisfies (SPR) by

CHECKING THE EXISTENCE OF PR CAUSES AND THE PR CONDITIONS
We now turn to algorithms for checking whether a given set Cause is an SPR or GPR cause for Eff in M. Since the minimality condition (for all c ∈ Cause : Pr max M (¬Cause U c) > 0) of PR causes is verifiable by standard model checking techniques in polynomial time, we concentrate on checking the probability-raising conditions (S) and (G). For Markov chains, both (SPR) and (GPR) can be checked in polynomial time by computing the corresponding probabilities. So, the interesting case is checking the PR conditions in MDPs. In case of SPR causality this is closely related to the existence of PR causes and solvable in polynomial time (Section 4.1), while checking the GPR condition is more complex and polynomially reducible to (the non-solvability of) a quadratic constraint system (Section 4.2).
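For the Markov chain case, "computing the corresponding probabilities" can be illustrated with a minimal sketch. The chain below is hypothetical: Figure 1 is not reproduced here, so the transition structure is an assumption chosen only to match the numbers of Example 3.4. The helper names `reach`, `spr_holds` and `gpr_holds` are likewise illustrative, and the recursion assumes an acyclic chain with terminal effect states.

```python
from fractions import Fraction as F

# Hypothetical acyclic Markov chain (an assumption, chosen to reproduce the
# probabilities stated in Example 3.4; it is not taken from Figure 1).
P = {
    "init": {"c1": F(1, 3), "c2": F(1, 3), "eff": F(1, 12), "noeff": F(1, 4)},
    "c1": {"eff": F(1)},
    "c2": {"eff": F(1, 4), "noeff": F(3, 4)},
    "eff": {},
    "noeff": {},
}
EFF = {"eff"}


def reach(state, targets, avoid=frozenset()):
    """Probability of reaching `targets` from `state` without entering `avoid` first."""
    if state in targets:
        return F(1)
    if state in avoid:
        return F(0)
    return sum((p * reach(t, targets, avoid) for t, p in P[state].items()), F(0))


def spr_holds(cause, init="init"):
    """(SPR) in a Markov chain: each cause state that can be hit first raises Pr(eff)."""
    pr_eff = reach(init, EFF)
    return all(
        reach(c, EFF) > pr_eff
        for c in cause
        if reach(init, {c}, avoid=cause - {c}) > 0
    )


def gpr_holds(cause, init="init"):
    """(GPR) in a Markov chain: Pr(eff | a cause state is reached) > Pr(eff)."""
    # Probability that c is the first cause state on the path, for each c.
    first_hit = {c: reach(init, {c}, avoid=cause - {c}) for c in cause}
    pr_cause = sum(first_hit.values(), F(0))
    if pr_cause == 0:
        return False  # cause never reached, so the minimality condition fails
    pr_cause_and_eff = sum((first_hit[c] * reach(c, EFF) for c in cause), F(0))
    return pr_cause_and_eff / pr_cause > reach(init, EFF)
```

On this chain, `{c1}` passes both checks, `{c2}` fails both, and `{c1, c2}` passes `gpr_holds` but not `spr_holds`, in line with Example 3.4.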
We start by stating that for (S) and (G), it suffices to consider a kind of worst-case schedulers, namely those that minimize the probability to reach an effect state from every cause state. For this, we transform the MDP in question according to the cause candidate in question. Obviously, for all c ∈ Cause : Pr max M (¬Cause U c) > 0 holds for Cause in M if and only if it holds for Cause in M [Cause] . Furthermore, it is clear that all SPR resp. GPR causes of M are also SPR resp. GPR causes in M [Cause] . So, it remains to prove the converse direction. This will be done in Lemma 4.4 for SPR causes and in Lemma 4.5 for GPR causes.

Proof. We show that Cause is an SPR cause in M by showing (S) for all states in Cause. Thus, we fix a state c ∈ Cause. Recall also that we assume the states in Eff to be terminal. Let ψ c = (¬Cause) U c, w c = Pr min M,c (♢Eff) and let Υ c denote the set of all schedulers U for M such that
• Pr U M (ψ c ) > 0 and
• Pr for each finite S-path from the initial state init to a state c ∈ Cause.

To prove that (GPR) holds for all schedulers S that satisfy Pr S M (♢Cause) > 0, we introduce the following notation: We write
• Σ >0 for the set of all schedulers S such that Pr S M (♢Cause) > 0,
• Σ >0,min for the set of all schedulers with Pr S M (♢Cause) > 0 such that Pr res(S,π) M,c (♢Eff) = Pr min M,c (♢Eff) for each finite S-path from the initial state init to a state c ∈ Cause.

It now suffices to show that for each scheduler S ∈ Σ >0 there exists a scheduler S ′ ∈ Σ >0,min such that if (GPR) holds for S ′ then (GPR) holds for S. So, let S ∈ Σ >0 .
For c ∈ Cause, let Π c denote the set of finite paths π = s 0 α 0 s 1 α 1 . . . α n−1 s n with s 0 = init, s n = c and {s 0 , . . . , Furthermore, let p S π denote the probability for (the cylinder set of) π under scheduler S. Then Moreover: Thus, the condition (GPR) holds for the scheduler S ∈ Σ >0 if and only if The latter is equivalent to: which again is equivalent to: Pick an MD-scheduler T that minimizes the probability to reach Eff from every state. In particular, w c = w T π ⩽ w S π for every state c ∈ Cause and every path π ∈ Π c (recall that w c = Pr min M,c (♢Eff)). Moreover, the scheduler S can be transformed into a scheduler S T ∈ Σ >0,min that is "equivalent" to S with respect to the global probability-raising condition. More concretely, let S T denote the scheduler that behaves as S as long as S has not yet visited a state in Cause and behaves as T as soon as a state in Cause has been reached. Thus, p S π = p S T π and res(S T , π) = T for each π ∈ Π c . This yields that the probability to reach c ∈ Cause from init is the same under S and S T , i.e., Pr S M (♢c) = Pr S T M (♢c). Therefore Pr S M (♢Cause) = Pr S T M (♢Cause). The latter implies that S T ∈ Σ >0 , and hence S T ∈ Σ >0,min . Moreover, S and S T reach Eff without visiting Cause with the same probability, i.e., Pr S M (¬Cause U Eff) = Pr S T M (¬Cause U Eff). But this yields: if (4.4) holds for S T then (4.4) holds for S. As (4.4) holds for S T by assumption, this completes the proof. . 1. If q init < w c , then return "yes, (SPR) holds for c". 2. If q init > w c , then return "no, (SPR) does not hold for c".
3. If q init = w c , then:
3.1 If c is reachable from init in M max [c] , then return "no, (SPR) does not hold for c".
3.2 If c is not reachable from init in M max [c] , then return "yes, (SPR) holds for c".

As the construction of the MDP M [c] suggests, the two values compared by the algorithm are instances of worst-case schedulers. On the one hand, the probability to reach Eff starting in c is minimized, while on the other hand it is maximized as long as c has not been seen yet. If in such a scenario we have case 1. q init < w c , then c obviously satisfies (SPR). In case 2. q init > w c , we can build a scheduler which refutes (SPR) for c. Lastly, the corner case 3. q init = w c requires a reachability analysis, as seen in the following Example 4.7.

Proof. First, we show the soundness of Algorithm 4.6. By virtue of Lemma 4.3 stating the soundness of the transformation of M to M [c] , it suffices to show that Algorithm 4.6 returns the correct answers "yes" or "no" when the task is to check whether the singleton Cause = {c} is an SPR cause in Algorithm 4.6 correctly answers "no" (case 2 or 3.1) if w c = 0. Suppose that w c > 0. Thus, the SPR condition for c reduces to Pr S N (♢Eff) < w c for all schedulers S of N with Pr S N (♢c) > 0.

Case 1 of Algorithm 4.6 (i.e., if q < w c ): The answer "yes" is sound, as then Pr max N (♢Eff) = q < w c .

Case 2 (i.e., if q > w c ): Let T be an MD-scheduler with Pr T N,s (♢Eff) = q s for each state s and pick an MD-scheduler S with Pr S N (♢c) > 0. It is no restriction to suppose that T and S realize the same end components of N. (Note that if state s belongs to an end component that is realized by T then s is contained in a bottom strongly connected component of the Markov chain induced by T. But then q s = 0, i.e., no effect state is reachable from s in N. Recall that all effect states are terminal and thus not contained in end components. But then we can safely assume that T and S schedule the same action for state s.) Let λ be any real number with 1 > λ > w c /q and let K denote the sub-MDP of N with state space S where the enabled actions of state s are the actions scheduled for s under one of the schedulers T or S.
Let now U be the MR-scheduler λT ⊕ (1−λ)S defined as in Lemma 2.2 for the EC-free MDP resulting from K when collapsing K's end components into a single terminal state. For the states belonging to an end component of K, U schedules the same action as T and S. Then, Pr U N (♢t) = λPr T N (♢t) + (1−λ)Pr S N (♢t) for all terminal states t of N and for t = c. Hence: $\Pr^U_N(\lozenge c) \geq (1-\lambda) \cdot \Pr^S_N(\lozenge c) > 0$ and $\Pr^U_N(\lozenge \mathit{Eff}) \geq \lambda \cdot \Pr^T_N(\lozenge \mathit{Eff}) = \lambda \cdot q > w_c$. Thus, scheduler U is a witness why (SPR) does not hold for c.

Case 3.1: Pick an MD-scheduler S of M max [c] such that c is reachable from init via S and Pr S N,s (♢Eff) = q s for all states s. Then Pr S N (♢c) > 0 and Pr S N (♢Eff) = q = w c . Hence, (SPR) does not hold for c and the scheduler S.

Case 3.2: We have the property that Pr S N (♢c) = 0 for all schedulers S for N with Pr S N (♢Eff) = q = w c . But then Pr S N (♢c) > 0 implies Pr S N (♢Eff) < w c as required in (SPR).

The polynomial runtime of Algorithm 4.6 follows from the fact that minimal and maximal reachability probabilities and hence also the MDPs N = M [c] and its sub-MDP M max [c] can be computed in polynomial time.
By applying Algorithm 4.6 to all states c ∈ Cause and standard algorithms to check the existence of a path satisfying (¬Cause) U c for every state c ∈ Cause, we obtain: Theorem 4.9 (Checking SPR causes). The problem "given M, Cause and Eff, check whether Cause is an SPR cause for Eff in M" is solvable in polynomial time.
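The three-case comparison of Algorithm 4.6 can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the MDP is assumed to be acyclic so that optimal reachability probabilities can be computed by plain recursion, M max [c] is read as the sub-MDP of M [c] that keeps only maximizing actions, and the names `optimal_reach`, `check_spr_state`, `gamma`, `eff_c` and `noeff_c` are hypothetical. The example MDP is likewise hypothetical.

```python
from fractions import Fraction as F


def optimal_reach(mdp, eff, opt):
    """Min (opt=min) or max (opt=max) probability of reaching `eff`, per state.

    Assumes an acyclic MDP given as state -> action -> {successor: probability}.
    """
    memo = {}

    def val(s):
        if s in eff:
            return F(1)
        if s not in memo:
            acts = mdp.get(s, {})
            memo[s] = opt(
                (sum((p * val(t) for t, p in dist.items()), F(0))
                 for dist in acts.values()),
                default=F(0),  # terminal non-effect state
            )
        return memo[s]

    return val


def check_spr_state(mdp, init, c, eff):
    """Compare q_init with w_c for a single cause candidate c (three cases)."""
    w_c = optimal_reach(mdp, eff, min)(c)
    # N = M_[c]: c keeps one action to a fresh effect / non-effect state.
    n = dict(mdp)
    n[c] = {"gamma": {"eff_c": w_c, "noeff_c": 1 - w_c}}
    q = optimal_reach(n, set(eff) | {"eff_c"}, max)
    if q(init) < w_c:
        return True   # case 1
    if q(init) > w_c:
        return False  # case 2
    # Case 3: explore only maximizing actions and test whether c is reachable.
    stack, seen = [init], {init}
    while stack:
        s = stack.pop()
        if s == c:
            return False  # case 3.1
        for dist in n.get(s, {}).values():
            if sum((p * q(t) for t, p in dist.items()), F(0)) == q(s):
                for t, p in dist.items():
                    if p > 0 and t not in seen:
                        seen.add(t)
                        stack.append(t)
    return True  # case 3.2


# Hypothetical MDP: alpha reaches an uncovered effect, beta may reach c.
example = {
    "init": {"alpha": {"eff_unc": F(1)},
             "beta": {"c": F(1, 2), "noeff": F(1, 2)}},
    "c": {"gamma": {"eff_cov": F(1, 2), "noeff": F(1, 2)}},
}
print(check_spr_state(example, "init", "c", {"eff_unc", "eff_cov"}))
```

Here w c = 1/2 and q init = 1, so the sketch falls into case 2 and reports that (SPR) fails for c.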
Remark 4.10 (Memory requirements for (S)). As the soundness proof for Algorithm 4.6 shows: If Cause does not satisfy (S), then there is an MR-scheduler S for M [Cause] witnessing the violation of (SPR). Scheduler S corresponds to a finite-memory (randomized) scheduler T with two memory cells for M: "before Cause" (where T behaves as S) and "after Cause" (where T behaves as an MD-scheduler minimizing the effect probability). ◁

Lemma 4.11 (Criterion for the existence of PR causes). Let M be an MDP and Eff a nonempty set of states. The following statements are equivalent: (a) Eff has an SPR cause in M, (b) Eff has a GPR cause in M, (c) there is a state c 0 ∈ S \ Eff such that the singleton {c 0 } is an SPR cause (and therefore a GPR cause) for Eff in M.
Thus, the existence of SPR and GPR causes can be checked with Algorithm 4.6 in polynomial time.
Proof. Obviously, statement (c) implies statements (a) and (b). The implication "(a) =⇒ (b)" follows from Lemma 3.3. We now turn to the proof of "(b) =⇒ (c)". For this, we assume that we are given a GPR cause Cause for Eff in M. For c ∈ Cause, let w c = Pr min M,c (♢Eff). Pick a state c 0 ∈ Cause such that w c 0 = max{w c : c ∈ Cause}. For every scheduler S for M that minimizes the effect probability whenever it visits a state in Cause, and visits Cause with positive probability, the conditional probability Pr S M (♢Eff|♢Cause) is a weighted average of the values w c , c ∈ Cause, and thus bounded by w c 0 . Using Lemma 4.3 we see that it is sufficient to only consider the minimal probabilities w c = Pr min M,c (♢Eff). Thus, we conclude that {c 0 } is both an SPR and a GPR cause for Eff.
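The weighted-average step in this proof can be spelled out as follows. The shorthand $p_c$ is introduced here only for illustration and denotes the probability that c is the first cause state visited under a scheduler S of the kind considered in the proof:

```latex
\Pr^{S}_{M}(\lozenge \mathit{Eff} \mid \lozenge \mathit{Cause})
  \;=\; \frac{\sum_{c \in \mathit{Cause}} p_c \cdot w_c}{\sum_{c \in \mathit{Cause}} p_c}
  \;\le\; \max_{c \in \mathit{Cause}} w_c \;=\; w_{c_0},
\qquad
p_c \;=\; \Pr^{S}_{M}\bigl((\neg \mathit{Cause}) \,\mathrm{U}\, c\bigr).
```

Together with (GPR) for S, this yields $\Pr^{S}_{M}(\lozenge \mathit{Eff}) < w_{c_0}$ for such schedulers, which is the inequality the SPR condition for the singleton {c 0 } asks for.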

4.2. Checking the global probability-raising condition.
Throughout this section, we suppose that both the effect set Eff and the cause candidate Cause are fixed disjoint subsets of the state space of the MDP M = (S, Act, P, init), and address the task to check whether Cause is a global probability-raising cause for Eff in M. As the minimality condition (for all c ∈ Cause : Pr max M (¬Cause U c) > 0) can be checked in polynomial time using a standard graph algorithm, we will concentrate on an algorithm to check the probability-raising condition (GPR). We start by stating the main results of this section.
Theorem 4.12. Given M, Cause and Eff, deciding whether Cause is a GPR cause for Eff in M can be done in coNP.
In order to provide an algorithm, we perform a model transformation after which the violation of (GPR) by a scheduler S can be expressed solely in terms of the expected frequencies of the state-action pairs of the transformed MDP under S. This allows us to express the existence of a scheduler witnessing the non-causality of Cause in terms of the satisfiability of a quadratic constraint system. Thus, we can restrict the quantification in (G) to MR-schedulers in the transformed model. We trace back the memory requirements to M [Cause] and to the original MDP M yielding the second main result. The remainder of this section is concerned with the proofs of both Theorem 4.12 and Theorem 4.13. For this, we suppose that Cause satisfies for all c ∈ Cause : Pr max M (¬Cause U c) > 0 which can be checked preemptively in polynomial time as argued before.
Checking the GPR condition (Proof of Theorem 4.12). We will start with a polynomial-time model transformation into a kind of "canonical form" after which we can make the following assumptions when checking the GPR condition of Cause for Eff:
(A1): Eff = {eff unc , eff cov } consists of two terminal states.
(A2): For every c ∈ Cause, there is a single enabled action Act(c) = {γ}, and there is w c ∈ [0, 1] ∩ Q such that P(c, γ, eff cov ) = w c and P(c, γ, noeff fp ) = 1−w c , where noeff fp is a terminal non-effect state and noeff fp and eff cov are only accessible via the γ-transitions from the states c ∈ Cause.
(A3): M has no end components and there is a further terminal state noeff tn and an action τ such that τ ∈ Act(s) implies P(s, τ, noeff tn ) = 1.
The terminal states eff unc , eff cov , noeff fp and noeff tn are pairwise distinct. M can have further terminal states representing true negatives. These can be identified with noeff tn .
Intuitively, eff cov stands for covered effects ("Eff after Cause") and can be seen as a true positive, while eff unc represents the uncovered effects ("Eff without preceding Cause") and corresponds to a false negative. Let S be a scheduler in M.
As the cause states can not reach each other we also have Pr S The intuitive meaning of noeff fp is a false positive ("no effect after Cause"), while noeff tn stands for true negatives where neither the effect nor the cause is observed.
Establishing assumptions (A1)-(A3): We justify the assumptions as we can transform M into a new MDP of the same asymptotic size satisfying the above assumptions. Thanks to Lemma 4.3, we may suppose that M = M [Cause] without changing the satisfaction of (G). Thus, from cause states c ∈ Cause there are only two outgoing transitions, either to a terminal effect state eff with probability Pr min c (♢Eff) or to a terminal non-effect state noeff with the remaining probability (see Notation 4.1). We then may rename the effect state eff and the non-effect state noeff reachable from Cause into eff cov and noeff fp , respectively. Furthermore, we collapse all other effect states into a single state eff unc and all true negative states into noeff tn . Similarly, by renaming and possibly duplicating terminal states we also suppose that noeff fp has no other incoming transitions than the γ-transitions from the states in Cause. This ensures (A1) and (A2). For (A3) consider the set T of terminal states in the MDP obtained so far. We remove all non-trivial end components by switching to the MEC-quotient [dA97], i.e., we collapse all states that belong to the same MEC E into a single state s E while ignoring the actions inside E. Additionally, we add a fresh τ-transition from the states s E to noeff tn (i.e., P(s E , τ, noeff tn ) = 1). The τ-transitions from states s E to noeff tn mimic cases where the scheduler of the original MDP enters the end component E and stays there forever.
In particular, consider the MEC-quotient N of M [Cause] (see Definition 2.3). Let noeff tn be the state to which we add a τ-transition with probability 1 from each MEC that we collapse in the MEC-quotient. That is, noeff tn = ⊥ with the notations of Definition 2.3.
Example 4.14. For a demonstration of the above described transformations consider the abstract MDP M from Figure 11, where the dotted circles correspond to sets of states in the MDP. The MDP already has the form M = M [c] . Now rename the effect state eff reachable from Cause to eff cov and noeff to noeff fp . All effect states which are not reachable from Cause are collapsed into the state eff unc . In this example there are no terminal non-effect states which are not reachable from c. These would be collapsed to noeff tn . Taking the MEC quotient collapses the MECs to states s E i while keeping outgoing transitions. Additionally, there is a fresh outgoing action τ from the states s E i to the fresh state noeff tn . Thus we get N from Figure 12.  , there is a scheduler T for N, and vice versa, such that Proof. By  Note, however, that the transformation changes the memory-requirements of schedulers witnessing that Cause is not a GPR cause for Eff. We will address the memory requirements in the original MDP later. With assumptions (A1)-(A3), condition (G) can be reformulated as follows: With assumptions (A1)-(A3), a terminal state of M is reached almost surely under any scheduler after finitely many steps in expectation. Given a scheduler S for M recall the definition of expected frequencies of state action-pairs (s, α), states s ∈ S and state-sets T ⊆ S under S: Let T be one of the sets {eff cov }, {eff unc }, Cause, or a singleton {c} with c ∈ Cause. As T is visited at most once during each run of M (assumptions (A1) and (A2)), we have Pr S N (♢T ) = freq S (T ) for each scheduler S. This allows us to express the violation of (G) in terms of a quadratic constraint system over variables for the expected frequencies of state-action pairs. Let StAct denote the set of state-action pairs in M. 
We consider the following constraint system over the variables x s,α for each (s, α) ∈ StAct where we use the short form notation $x_s = \sum_{\alpha \in Act(s)} x_{s,\alpha}$: Using well-known results for MDPs without ECs (see, e.g., [Kal20, Theorem 9.16]), given a vector $x \in \mathbb{R}^{StAct}$, then x is a solution to (S1) and the balance equations (S2) and (S3) if and only if there is a (possibly history-dependent) scheduler S for M with x s,α = freq S (s, α) for all (s, α) ∈ StAct if and only if there is an MR-scheduler S for M with x s,α = freq S (s, α) for all (s, α) ∈ StAct.
The violation of (GPR-1) in Lemma 4.17 can be reformulated in terms of the frequency-variables as follows where x Cause is an abbreviation for $\sum_{c \in Cause} x_c$: Finally, we can reformulate the condition Pr S M (♢Cause) > 0 by frequency variables: Lemma 4.18. Under assumptions (A1)-(A3), the set Cause is not a GPR cause for Eff in M iff the constructed quadratic system of inequalities (S1)-(S5) has a solution.
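To illustrate how a violation of (GPR) becomes a quadratic condition on expected frequencies, the following sketch computes the frequencies induced by an MR-scheduler and tests the violation directly from the definition of (GPR). It does not reproduce the exact form of (S1)-(S5); the inequality used is obtained here by multiplying out Pr(♢eff cov ) / Pr(♢Cause) ⩽ Pr(♢Eff). The MDP is hypothetical and assumed to be acyclic and in the canonical form (A1)-(A3); the names `expected_frequencies` and `violates_gpr` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F


def expected_frequencies(mdp, scheduler, init):
    """Expected number of visits to each state under an MR-scheduler.

    Assumes an acyclic MDP (state -> action -> {successor: probability});
    `scheduler` maps each non-terminal state to {action: probability}.
    """
    freq = defaultdict(F)

    def push(state, mass):
        # Propagate the probability mass of all paths through `state`.
        freq[state] += mass
        for action, dist in mdp.get(state, {}).items():
            weight = mass * scheduler[state].get(action, F(0))
            if weight > 0:
                for succ, p in dist.items():
                    push(succ, weight * p)

    push(init, F(1))
    return freq


def violates_gpr(freq, cause):
    """True iff Pr(Cause) > 0 and Pr(Eff | Cause) <= Pr(Eff), in frequency terms."""
    x_cause = sum((freq[c] for c in cause), F(0))
    x_eff = freq["eff_cov"] + freq["eff_unc"]
    # Quadratic in the frequencies: x_effcov <= x_Cause * (x_effcov + x_effunc).
    return x_cause > 0 and freq["eff_cov"] <= x_cause * x_eff


# Hypothetical MDP in canonical form with a single cause state c.
mdp = {
    "init": {"alpha": {"eff_unc": F(1)},
             "beta": {"c": F(1, 2), "noeff_tn": F(1, 2)}},
    "c": {"gamma": {"eff_cov": F(1, 2), "noeff_fp": F(1, 2)}},
}
mixed = {"init": {"alpha": F(1, 2), "beta": F(1, 2)}, "c": {"gamma": F(1)}}
pure_beta = {"init": {"beta": F(1)}, "c": {"gamma": F(1)}}

print(violates_gpr(expected_frequencies(mdp, mixed, "init"), {"c"}))
print(violates_gpr(expected_frequencies(mdp, pure_beta, "init"), {"c"}))
```

Under the mixed scheduler the frequencies give Pr(♢Eff | ♢c) = 1/2 and Pr(♢Eff) = 5/8, so the violation test succeeds, whereas the scheduler that always picks beta satisfies (GPR).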
We can now prove that deciding the GPR condition can be done in coNP, Theorem 4.12.
Proof of Theorem 4.12. The quadratic system of inequalities can be constructed from M, Cause, and Eff in polynomial time. Except for the strict inequality constraint in (S5), it has the form of a quadratic program, for which the threshold problem can be decided in NP by [Vav90]. We will prove that also with this strict inequality, it can be checked in NP whether the system (S1)-(S5) has a solution. As the system of inequalities is expressing the violation of (GPR), deciding whether a set Cause is a GPR cause can then be done in coNP.
To show that satisfiability of the system (S1)-(S5) is in NP, we will provide a non-deterministic algorithm that runs in polynomial time and finds a solution if one exists. Some of the arguments are similar to the arguments used in [Vav90]. Additionally, we will rely on the implicit function theorem.
We begin by proving what a solution to (S1)-(S5) can be assumed to look like. Thus assume that a solution to (S1)-(S5) exists. There are two possible cases: Case 1: All solutions to (S1)-(S3) and (S5) satisfy (S4). Then, in particular, the frequency values of an MD-scheduler maximizing the probability to reach Cause are a solution to (S1)-(S3) and (S5) and hence to (S4) in this case. Case 2: There are solutions to (S1)-(S3) and (S5) that violate (S4). The space of feasible points for conditions (S1)-(S3) and (S5) is connected. Furthermore, the right hand side of (S4) is continuous. Hence, as there are also solutions to (S1)-(S3) and (S5) that satisfy (S4) by assumption, there is a solution to (S1)-(S3) and (S5) that satisfies Now, let us take a closer look at Case 2: First of all, we add the equation to our system. Thus, the variables are x Cause and x s,α for each (s, α) ∈ StAct. Obviously, this does not influence the satisfiability. Equation (S4') now contains the new variable x Cause , which is not an abbreviation anymore. We write x for the vector of variables x s,α with (s, α) ∈ StAct.
In Case 2, there is a solution (x * , x * Cause ) such that the maximal possible number of variables is 0 and such that x * Cause is maximal among all such solutions. Let X ′ be the set of variables that are 0 in (x * , x * Cause ). We remove all variables from X ′ from all constraints by setting them to 0 and call the resulting system (T1)-(T6) where (T4) is obtained from (S4'), while all other equations (Ti) are obtained from (Si), by removing the chosen variables. We then collect the remaining variables in the vector v = (y, y Cause ). Let (y * , y * Cause ) be the solution (x * , x * Cause ) after the variables in X ′ have been removed. Thus, all values in this vector are positive.
Define the function f as the right hand side of (T4): where the variables y are as the original variables x after the variables in X ′ have been removed. Now, we apply the implicit function theorem: Observe that Evaluated at (y * , y * Cause ), this value is non-zero as all summands are negative and there are at least some of the variables in the abbreviation y c with c ∈ Cause left, i.e., not removed because they were not 0 due to the original constraint (S5). So, we can apply the implicit function theorem, which guarantees us the existence of a function g(y), such that g(y * ) = y * Cause and, for all y ′ in an open ball B 1 around y * , we have f(y ′ , g(y ′ )) = 0.
By the implicit function theorem, we can explicitly compute the gradient of the derivatives on B 1 for the appropriate k from the derivatives of f. Note that on B 1 , the gradient ∇g is 0 iff ∂y k is 0. Furthermore, all entries of H(y, y Cause ) are linear in the variables v as the function f is quadratic. As the function g has a local maximum in y * , we know that ∇g evaluated at y * is 0.
Equations (T2), (T3), and (T6) are linear equations in the remaining variables v. We can rewrite these three equations with a matrix M and a vector b whose entries can easily be expressed in terms of the coefficients of the original system (again, after the set of variables X ′ has been removed) as The solutions to this equation form an r-dimensional affine space W. It can be written as for some vectors c 0 , c 1 , . . . , c r which can be computed from M and b in polynomial time.
Let B 2 be an open ball in R r such that h(B 2 ) ⊆ B 1 and such that h(B 2 ) contains (y * , y * Cause ). We claim that g • h : B 2 → R has an isolated local maximum at z * def = h −1 (y * , y * Cause ). It is clear that g • h has a local maximum since g has a local maximum at (y * , y * Cause ). Suppose now, that g • h does not have an isolated local maximum at h −1 (y * , y * Cause ). As h is an affine map and the graph of g is the solution to a quadratic equation, this is only possible if there is a direction d ∈ R r \ {0} such that for all t ∈ R. Due to the boundedness of the polyhedron described by conditions (T1)-(T3), (T5) and (T6) and since z * lies in the interior of this polyhedron, this means that there must be a value Since H(h(z)) is a vector of linear expressions in z, this implies that z * is the only solution on R r to H(h(z)) = 0. This is the key result that we need to provide a non-deterministic polynomial-time algorithm to check the satisfiability of the original constraint system.
Let us now describe the algorithm: The algorithm begins by computing the frequency values of an MD-scheduler as in Case 1 in polynomial time and checks whether the resulting vector of frequency values satisfies (S1)-(S5). If this is the case, the algorithm returns that the system is satisfiable.
If this is not the case, the algorithm tries to compute a solution to (S1)-(S3), (S5), and (S4') as in Case 2. The algorithm non-deterministically guesses a subset of the variables and removes them from all constraints by replacing them with 0.
Suppose we guess the set X ′ as above. We show that we then compute a solution. After the variables from X ′ have been removed, H(y, y Cause ) can be computed in polynomial time as all the derivatives of f are linear expressions in the variables which require basic arithmetic and can be computed in polynomial time. Likewise, M and b can be computed in polynomial time from the original constraints after the guessed variables have been removed. The vectors c 0 , c 1 , . . . , c r describing the solution space to Mv = b can then also be computed in polynomial time.
Thus, also the vector H(h(z)) of linear expressions in the variables z can be computed in polynomial time. The equation system H(h(z)) = 0 has a unique solution if the guessed variables were indeed X ′ . In this case, the solution z * can be computed in polynomial time as well. If the guess of variables was not X ′ , then either there is no unique solution to this equation system which can be detected in polynomial time, or the solution, which is computed in the sequel in polynomial time, might not satisfy the original constraints, which is checked in the end.
From z * , we can compute y * = h(z * ) using the vectors c 0 , c 1 , . . . , c r . The solution x * is then obtained by plugging in 0s for the removed variables. Checking whether the resulting vector satisfies all constraints can also be done in polynomial time in the end. If X ′ was guessed correctly, this vector x * indeed forms a solution to the original constraints as we have seen.
In summary, the algorithm needs to guess the set X ′ of variables which are 0 in a solution to the original constraints with the maximal number of zeroes. All other steps are deterministic polynomial-time computations. Thus, satisfiability of (S1)-(S5) can be checked in NP.
Memory requirements of schedulers in the original MDP (Proof of Theorem 4.13). Every solution to the linear system of inequalities (S1), (S2), and (S3) corresponds to the expected frequencies of state-action pairs of an MR-scheduler in the transformed model satisfying (A1)-(A3). Hence:

Corollary 4.19. Under assumptions (A1)-(A3), Cause is not a GPR cause for Eff iff there exists an MR-scheduler T with Pr T M (♢Cause) > 0 violating (GPR).

The model transformation we used for assumptions (A1)-(A3), however, does affect the memory requirements of schedulers. We may further restrict the MR-schedulers necessary to witness noncausality under assumptions (A1)-(A3). For the following lemma, recall that τ is the action of the MEC quotient used for the extra transition from states representing MECs to a new trap state (see also assumption (A3)).
We will show how to transform U into an MR-scheduler T that schedules the τ-transitions to noeff tn with probability 0 or 1. We regard the set U of states u which have a τ-transition to noeff tn (recall that then P(u, τ, noeff tn ) = 1) and where 0 < U(u)(τ) < 1. We now process the U-states in an arbitrary order, say u 1 , . . . , u k , and generate a sequence T 0 = U, T 1 , . . . , T k of MR-schedulers such that for i ∈ {1, . . . , k}: • T i refutes (GPR) (or equivalently condition (GPR-1) from Lemma 4.17) • T i agrees with T i−1 for all states but u i , Thus, the final scheduler T k satisfies the desired properties.
To explain how to derive T i from T i−1 , let i ∈ {1, . . . , k}, V = T i−1 , u = u i and y = 1−V(u)(τ). Then, 0 < y < 1 (as u ∈ U and by definition of U) and $y = \sum_{\alpha \in Act(u) \setminus \{\tau\}} V(u)(\alpha)$. For x ∈ [0, 1], let V x denote the MR-scheduler that agrees with V for all states but u, for which V x 's decision is: Obviously, V y = V. We now show that at least one of the two MR-schedulers V 0 or V 1 also refutes (GPR). For this, we suppose by contradiction that this is not the case, which means that (GPR) holds for both. Let f : [0, 1] → Q be defined by As V = V y violates (GPR-1), while V 0 and V 1 satisfy (GPR-1) we obtain: We now split Cause into the set C of states c ∈ Cause such that there is a V-path from init to c that traverses u and D = Cause \ C. Thus, As y is fixed, the values p y , p y,c , q y , v y can be seen as constants. Moreover, p x , p x,c , q x , v x differ from p y , p y,c , q y , v y only by the factor x/y. That is: $p_x = p_y \tfrac{x}{y}$, $p_{x,c} = p_{y,c} \tfrac{x}{y}$, $q_x = q_y \tfrac{x}{y}$ and $v_x = v_y \tfrac{x}{y}$. Thus, f(x) has the following form: For the value a, we have $a x^2 = p_x q_x + p_x v_x$ and hence $a = \tfrac{1}{y^2}(p_y q_y + p_y v_y) > 0$. But then the second derivative f ′′ (x) = 2a of f is positive, which yields that f has a global minimum at some point x 0 and is strictly decreasing for x < x 0 and strictly increasing for x > x 0 . Consequently, the maximum of f on the interval [0, 1] is attained at one of the endpoints. As f(0) and f(1) are both negative, we obtain f(x) < 0 for all x in the interval [0, 1]. But this contradicts f(y) ⩾ 0.
This yields that at least one of the schedulers V 0 or V 1 witnesses the violation of (GPR). Thus, we can define T i ∈ {V 0 , V 1 } accordingly.
The number of states k in U is bounded by the number of states in S. In each iteration of the above construction, the function value f(0) is sufficient to determine one of the schedulers V 0 and V 1 witnessing the violation of (GPR). So, the procedure has to compute the values in condition (GPR-1) for k-many MR-schedulers and update the scheduler afterwards. As the update can easily be carried out in polynomial time, the run-time of all k iterations is polynomial as well.
The condition that τ only has to be scheduled with probability 0 or 1 in each state is the key to transfer the sufficiency of MR-schedulers to the MDP M [Cause] . This fact is of general interest as well and stated in the following theorem where τ again is the action added to move from a state s E to the new trap state in the MEC-quotient.  (s 1 , α 1 ), . . . , (s k , α k ). Further, let p i def = S(s E )(α i ) > 0 for all 1 ⩽ i ⩽ k. By assumption 1⩽i⩽k p i = 1. When entering E, the scheduler works in k memory modes 1, . . . , k until an action α that does not belong to E is scheduled starting in memory mode 1. In each memory mode i, T follows an MD-scheduler for E that reaches s i with probability 1 from all states of E. Once, s i is reached, T chooses action α i with probability Now T leaves E via (s k , α k ) with probability 1 if it reaches the last memory mode k. As T behaves MD in each mode, it leaves the end component E after finitely many steps in expectation. Furthermore,  Together with Lemma 4.20, this means that T and hence the scheduler with two memory modes whose existence is stated in Theorem 4.13 can be computed from a solution to the constraint system (S1)-(S5) from Section 4. Remark 4.23 (On lower bounds on GPR checking). Solving systems of quadratic inequalities with linear side constraints is NP-hard in general (see, e.g., [GJ79]). For convex problems, in which the associated symmetric matrix occurring in the quadratic inequality has only non-negative eigenvalues, the problem is, however, solvable in polynomial time [KTK80]. Unfortunately, the quadratic constraint system describing a scheduler refuting (GPR) given by (S1)-(S5) is not of this form. We observe that even if Cause is a singleton {c} and the variable x eff unc is forced to take a constant value y by (S1)-(S3), i.e., by the structure of the MDP, the inequality (S4) takes the form: Here, the 1 × 1-matrix (−w c −y) has a negative eigenvalue. 
Although it is not ruled out that (S1)-(S5) belongs to another class of efficiently solvable constraint systems, the NP-hardness result in [PV91] for the solvability of quadratic inequalities of the form (4.6) with linear side constraints might be an indication for the computational difficulty. ◁

QUALITY AND OPTIMALITY OF CAUSES
The goal of this section is to identify notions that measure how "good" causes are and to present algorithms to determine good causes according to the proposed quality measures. We have seen so far that small (singleton) causes are easy to determine (see Section 4.1). Moreover, it is easy to see that the proposed existence-checking algorithm can be formulated in such a way that the algorithm returns a singleton (strict or global) probability-raising cause {c 0 } with maximal precision, i.e., a state c 0 where inf S Pr S M (♢Eff|♢c 0 ) = Pr min M,c 0 (♢Eff) is maximal. On the other hand, singleton or small cause sets might have poor coverage in the sense that the probability for paths that reach an effect state without visiting a cause state before ("uncovered effects") can be large. This motivates the consideration of quality notions for causes that incorporate how well effect scenarios are covered. We take inspiration from quality measures that are considered in statistical analysis (see e.g. [Pow11]). This includes the recall as a measure for the relative coverage (proportion of covered effects among all effect scenarios), the coverage ratio (quotient of covered and uncovered effects) as well as the f-score. The f-score is a standard measure for classifiers defined by the harmonic mean of precision and recall. It can be seen as a compromise to achieve both good precision and good recall.

Path hits | Eff | ¬Eff
Cause | True positive (tp): Cause correctly predicted Eff | False positive (fp): Cause falsely predicted Eff
¬Cause | False negative (fn): Cause falsely not predicted Eff | True negative (tn): Cause correctly not predicted Eff

Figure 13: Confusion matrix for Cause as a binary classifier for Eff
In this section, we assume as before an MDP M = (S, Act, P, init) and Eff ⊆ S are given where all effect states are terminal. Furthermore, we suppose all states s ∈ S are reachable from init.

5.1. Quality measures for causes.
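In statistical analysis, the precision of a classifier with binary outcomes ("positive" or "negative") is defined as the ratio of true positives among all positively classified elements, while its recall is defined as the ratio of true positives among all actually positive elements. Translated to our setting, we consider classifiers induced by a given cause set Cause that return "positive" for a sample path if a cause state is visited and "negative" otherwise. The intuitive meaning of true positives, false positives, true negatives and false negatives is as described in the confusion matrix in Figure 13. Formally, for a scheduler S:
$$\mathit{tp}^{S} = \Pr^{S}_{M}(\Diamond \mathit{Cause} \wedge \Diamond \mathit{Eff}), \quad \mathit{fp}^{S} = \Pr^{S}_{M}(\Diamond \mathit{Cause} \wedge \neg\Diamond \mathit{Eff}), \quad \mathit{fn}^{S} = \Pr^{S}_{M}(\neg\Diamond \mathit{Cause} \wedge \Diamond \mathit{Eff}), \quad \mathit{tn}^{S} = \Pr^{S}_{M}(\neg\Diamond \mathit{Cause} \wedge \neg\Diamond \mathit{Eff}).$$
With this interpretation of causes as binary classifiers in mind, the precision, recall and coverage ratio of a cause set Cause under a scheduler S are defined as follows:
$$\mathrm{precision}^{S}(\mathit{Cause}) = \Pr^{S}_{M}(\Diamond \mathit{Eff} \mid \Diamond \mathit{Cause}) = \frac{\mathit{tp}^{S}}{\mathit{tp}^{S} + \mathit{fp}^{S}}, \quad \mathrm{recall}^{S}(\mathit{Cause}) = \Pr^{S}_{M}(\Diamond \mathit{Cause} \mid \Diamond \mathit{Eff}) = \frac{\mathit{tp}^{S}}{\mathit{tp}^{S} + \mathit{fn}^{S}}, \quad \mathrm{covrat}^{S}(\mathit{Cause}) = \frac{\mathit{tp}^{S}}{\mathit{fn}^{S}}.$$
Note that these definitions require assumptions on the scheduler that make the respective quotients well defined; in particular, we assume $\Pr^{S}_{M}(\Diamond \mathit{Cause}) > 0$ for the precision and $\Pr^{S}_{M}(\Diamond \mathit{Eff}) > 0$ for the recall.
Finally, the f-score of Cause under a scheduler S is defined as the harmonic mean of the precision and recall. Here we assume $\Pr^{S}_{M}(\Diamond \mathit{Cause}) > 0$, which implies $\Pr^{S}_{M}(\Diamond \mathit{Eff}) > 0$:
$$\mathrm{fscore}^{S}(\mathit{Cause}) = 2 \cdot \frac{\mathrm{precision}^{S}(\mathit{Cause}) \cdot \mathrm{recall}^{S}(\mathit{Cause})}{\mathrm{precision}^{S}(\mathit{Cause}) + \mathrm{recall}^{S}(\mathit{Cause})}.$$
If, however, $\Pr^{S}_{M}(\Diamond \mathit{Eff}) > 0$ and $\Pr^{S}_{M}(\Diamond \mathit{Cause}) = 0$ for some S, we define $\mathrm{fscore}^{S}(\mathit{Cause}) = 0$. This again makes sense as for a sequence of schedulers converging to S the f-score also converges to 0 (see also Lemma 5.6).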
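The per-scheduler measures can be illustrated with a minimal sketch that takes the three relevant confusion-matrix entries as given. The function name and the input values are hypothetical, and computing the entries for an actual MDP and scheduler is outside the scope of this snippet.

```python
from fractions import Fraction


def quality_measures(tp, fp, fn):
    """Quality measures of a cause under one fixed scheduler.

    tp, fp, fn are the probabilities of the confusion-matrix entries
    (hypothetical inputs). None marks a measure whose quotient is not
    defined for these inputs.
    """
    precision = tp / (tp + fp) if tp + fp > 0 else None
    recall = tp / (tp + fn) if tp + fn > 0 else None
    covrat = tp / fn if fn > 0 else None
    if tp > 0:
        # Harmonic mean of precision and recall.
        fscore = 2 * precision * recall / (precision + recall)
    elif tp + fp == 0 and fn > 0:
        # Convention from the text: the effect is reachable, the cause is not.
        fscore = 0
    else:
        fscore = None
    return {"precision": precision, "recall": recall,
            "covrat": covrat, "fscore": fscore}


# Hypothetical entries of a confusion matrix.
example = quality_measures(Fraction(3, 8), Fraction(1, 8), Fraction(1, 4))
```

Exact rationals are used so that the quotients can be compared without rounding issues.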
To lift the definitions of the quality measures under a scheduler to quality measures of a cause, we consider the worst-case scheduler:
Definition 5.1 (Quality measures for causes). Let Cause be a PR cause. We define
$$\mathrm{recall}(\mathit{Cause}) = \inf_{S} \mathrm{recall}^{S}(\mathit{Cause}) = \Pr^{\min}_{M}(\Diamond \mathit{Cause} \mid \Diamond \mathit{Eff})$$
when ranging over all schedulers S with $\Pr^{S}_{M}(\Diamond \mathit{Eff}) > 0$. Likewise, the coverage ratio and f-score of Cause are defined by the worst-case coverage ratio resp. f-score, ranging over schedulers for which $\mathrm{covrat}^{S}(\mathit{Cause})$ resp. $\mathrm{fscore}^{S}(\mathit{Cause})$ is defined:
$$\mathrm{covrat}(\mathit{Cause}) = \inf_{S} \mathrm{covrat}^{S}(\mathit{Cause}), \qquad \mathrm{fscore}(\mathit{Cause}) = \inf_{S} \mathrm{fscore}^{S}(\mathit{Cause}).$$
Besides the quality measures defined so far, which we will address in detail, there is a vast landscape of further quality measures for binary classifiers in the literature (for an overview, see, e.g., [Pow11]). One prominent example, which has recently been claimed to be superior to the f-score [CJ20], is Matthews correlation coefficient (MCC). In terms of the entries of a confusion matrix (as in Figure 13), it is defined as
$$\mathrm{MCC} = \frac{\mathit{tp} \cdot \mathit{tn} - \mathit{fp} \cdot \mathit{fn}}{\sqrt{(\mathit{tp} + \mathit{fp}) \cdot (\mathit{tp} + \mathit{fn}) \cdot (\mathit{tn} + \mathit{fp}) \cdot (\mathit{tn} + \mathit{fn})}}.$$
In contrast to the f-score (as well as recall and coverage ratio), it makes use of all four entries of the confusion matrix. In our setting, we could assign the MCC to a cause set Cause by again taking the infimum of the value over all sensible schedulers.
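As a small illustration of a measure that uses all four entries, the MCC of a single confusion matrix can be evaluated as below. This is only a sketch: the function name is hypothetical, and returning None for a vanishing denominator is an assumption made here, not a convention taken from the text.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient of one confusion matrix.

    Returns None when the denominator vanishes (assumed convention).
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return None
    return (tp * tn - fp * fn) / denominator


# A perfect classifier and a perfectly inverted one.
perfect = mcc(tp=1, tn=1, fp=0, fn=0)
inverted = mcc(tp=0, tn=0, fp=1, fn=1)
```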
Like the MCC, almost all (cf. [Pow11]) of the quality measures studied in the literature are algebraic functions (intuitively speaking, built from polynomials, fractions and root functions) in the entries of the confusion matrix. At the end of this section, we will comment on the computational properties of finding good causes when quality is measured by the infimum over all sensible schedulers of an algebraic function in the entries of the confusion matrix.

5.2. Computation schemes for the quality measures for fixed cause set. For this section, we assume a fixed PR cause Cause is given and address the problem of computing its quality values. The first observation is that all quality measures are preserved by the switch from $M$ to $M_{[\mathit{Cause}]}$ as well as by the transformation of $M_{[\mathit{Cause}]}$ to an MDP that satisfies conditions (A1)-(A3) of Section 4.2. More precisely, in Lemmata 5.2 and 5.3 we show that the quality measures recall, covrat and fscore of a fixed Cause are compatible with the model transformations from Section 4. These are, on the one hand, the transformation to $M_{[\mathit{Cause}]}$, which only considers the minimal probability to reach Eff starting from Cause, and, on the other hand, the transformation to an MDP N satisfying (A1)-(A3), which has no end components and has exactly four terminal states $\mathit{eff}_{\mathit{cov}}$, $\mathit{eff}_{\mathit{unc}}$, $\mathit{noeff}_{\mathit{fp}}$, $\mathit{noeff}_{\mathit{tn}}$.
Lemma 5.2. If Cause is an SPR or a GPR cause then:
$$\mathrm{recall}_{M}(\mathit{Cause}) = \mathrm{recall}_{M_{[\mathit{Cause}]}}(\mathit{Cause}), \quad \mathrm{covrat}_{M}(\mathit{Cause}) = \mathrm{covrat}_{M_{[\mathit{Cause}]}}(\mathit{Cause}), \quad \mathrm{fscore}_{M}(\mathit{Cause}) = \mathrm{fscore}_{M_{[\mathit{Cause}]}}(\mathit{Cause}).$$
Proof. "⩽": A scheduler for $M_{[\mathit{Cause}]}$ can be seen as a scheduler S for M behaving as an MD-scheduler minimizing the reachability probability of Eff from every state in Cause and we have: and therefore: We obtain $\mathrm{recall}_{M}(\mathit{Cause}) \leqslant \mathrm{recall}_{M_{[\mathit{Cause}]}}(\mathit{Cause})$ and the analogous statements for the coverage ratio and the f-score.
"⩾": Let S be a scheduler of M. Let T be the scheduler of M that behaves as S until the first visit to a state in Cause. As soon as T has reached Cause, it behaves as an MD-scheduler minimizing the probability to reach Eff. Recall and coverage under T and S have the form: With similar arguments we get: As the harmonic mean viewed as a function $f : \mathbb{R}^{2}_{>0} \to \mathbb{R}$, $f(x, y) = 2\,\frac{xy}{x+y}$, is monotonically increasing in both arguments (note that $\frac{\partial f}{\partial x} = \frac{2y^{2}}{(x+y)^{2}} > 0$ and $\frac{\partial f}{\partial y} = \frac{2x^{2}}{(x+y)^{2}} > 0$), we obtain: This now allows us to work under assumptions (A1)-(A3) when addressing problems concerning the quality measures for a fixed cause set.
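As efficient computation methods for recall(Cause) are known from the literature (see [BKKM14,Mär20] for poly-time algorithms to compute conditional reachability probabilities), we can use the same methods to compute the coverage ratio. Indeed, for every scheduler S for which both values are defined, $\mathrm{covrat}^{S}(\mathit{Cause}) = \frac{\mathit{tp}^{S}}{\mathit{fn}^{S}} = \frac{\mathrm{recall}^{S}(\mathit{Cause})}{1 - \mathrm{recall}^{S}(\mathit{Cause})}$, which is monotonically increasing in $\mathrm{recall}^{S}(\mathit{Cause})$. Hence, the worst-case coverage ratio is determined by the worst-case recall, which can be computed in polynomial time by [BKKM14,Mär20].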
In contrast to these results, we are not aware of known concepts that are directly applicable for computing the f-score. Nevertheless, this quality measure is efficiently computable:
Theorem 5.5. The value fscore(Cause) and corresponding worst-case schedulers are computable in polynomial time.
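The remainder of this subsection is devoted to the proof of Theorem 5.5. We can express fscore(Cause) in terms of the supremum of a quotient of reachability probabilities for disjoint sets of terminal states. More precisely, under assumptions (A1)-(A3) and assuming fscore(Cause) > 0, we have:
$$\frac{2}{\mathrm{fscore}(\mathit{Cause})} - 2 = \sup_{S} \frac{\Pr^{S}_{M}(\Diamond \mathit{noeff}_{\mathit{fp}}) + \Pr^{S}_{M}(\Diamond \mathit{eff}_{\mathit{unc}})}{\Pr^{S}_{M}(\Diamond \mathit{eff}_{\mathit{cov}})}$$
where S ranges over all schedulers with $\Pr^{S}_{M}(\Diamond \mathit{eff}_{\mathit{cov}}) > 0$. Moreover, we can show that we can handle the corner case of fscore(Cause) = 0.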
Lemma 5.6. Let Cause be an SPR or a GPR cause. Then, the following three statements are equivalent:
To compute these, we rely on a polynomial reduction to the classical stochastic shortest path problem [BT91]. For this, consider the MDP N arising from M by adding reset transitions from all terminal states t ∈ S \ V to init. Thus, exactly the V-states are terminal in N. N might contain ECs, which, however, do not intersect with V. We equip N with the weight function that assigns 1 to all states in U and 0 to all other states. Let T be the scheduler that behaves as S in the first round and after each reset. Then: where (5.2) relies on some basic calculations (see Lemma 5.9). This yields:
For the other direction $E^{\min}_{N}(⊞V) \geqslant \mathrm{ratio}^{\min}_{M}(U, V)$, we use the fact that there is an MD-scheduler T for N such that $E^{T}_{N}(⊞V) = E^{\min}_{N}(⊞V)$. T can be viewed as an MD-scheduler for the original MDP M. Again we can rely on (5.1) to obtain that: we use similar arguments. We can now rely on known results [BT91, dA99, BBD+18] to compute $E^{\min}_{N}(⊞V)$ and $E^{\max}_{N}(⊞V)$ in polynomial time.
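The effect of the reset transitions can be illustrated in the simplest possible case, which is an assumption of this sketch and not the general MDP construction: under a fixed scheduler, one round ends in U with probability x, in V with probability p > 0, and in some other terminal state with probability q, where x + q + p = 1. The expected accumulated weight E until reaching V then satisfies E = x(1 + E) + qE, so E = x/p, i.e., the quotient of the two reachability probabilities. The function name is hypothetical.

```python
def expected_weight_with_resets(x, q, p, tol=1e-14, max_iter=1_000_000):
    """Expected number of visits to U before V is reached, with resets.

    x, q, p: per-round probabilities of ending in U, in another
    non-V terminal state, and in V (assumed to sum to 1, with p > 0).
    Solves E = x * (1 + E) + q * E by fixed-point iteration.
    """
    if p <= 0:
        raise ValueError("V must be reached with positive probability")
    e = 0.0
    for _ in range(max_iter):
        new_e = x * (1.0 + e) + q * e
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e


# Hypothetical per-round probabilities.
ratio = expected_weight_with_resets(x=0.25, q=0.375, p=0.375)
```

The iteration is a contraction with factor x + q = 1 - p, so it converges whenever p > 0; the closed form x/p makes the iteration unnecessary in practice and is only used here to mirror the round-by-round reading of the reset construction.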
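Lemma 5.9. Let x, p, q ∈ [0, 1] such that x+q+p = 1. Then:
Proof. We first show that for 0 < q < 1, n ∈ ℕ and $a_{n} \stackrel{\text{def}}{=} \sum_{k=0}^{\infty} \binom{n+k}{k} q^{k}$, we have
$$a_{n} = \frac{1}{(1-q)^{n+1}}.$$
This is done by induction on n. The claim is clear for n = 0, where $a_{0}$ is a geometric series. For the induction step we use Pascal's rule:
$$\binom{n+1+k}{k} = \binom{n+k}{k} + \binom{n+k}{k-1} \quad \text{for } k \geqslant 1.$$
But this yields $a_{n+1} = a_{n} + q \cdot a_{n+1}$. Hence:
$$a_{n+1} = \frac{a_{n}}{1-q}.$$
The claim then follows directly from the induction hypothesis.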
The statement of Lemma 5.9 now follows by some calculations and the preliminary induction.
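The preliminary identity used in the proof, namely that the series with coefficients C(n+k, k) sums to 1/(1-q)^(n+1), can be checked numerically with a truncated sum. This is only a sanity check under the assumption that the truncation error is negligible for the chosen q; the function name and the number of terms are arbitrary choices.

```python
from math import comb


def a_n(n, q, terms=500):
    """Truncated series sum_{k < terms} C(n+k, k) * q**k."""
    return sum(comb(n + k, k) * q ** k for k in range(terms))


# Compare the truncated series with the closed form for n = 3, q = 0.3.
series_value = a_n(3, 0.3)
closed_form = 1.0 / (1.0 - 0.3) ** 4
```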
Applying this framework for ratio max M (U, V) to the f-score we now prove Theorem 5.5.
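Proof of Theorem 5.5. We use the simplifying assumptions (A1)-(A3) that can be made due to Lemmas 5.2 and 5.3. After some straightforward transformations, we have
$$\mathrm{fscore}^{S}(\mathit{Cause}) = \frac{2\,\mathit{tp}^{S}}{2\,\mathit{tp}^{S} + \mathit{fn}^{S} + \mathit{fp}^{S}}.$$
Using this we get
$$\frac{2}{\mathrm{fscore}^{S}(\mathit{Cause})} - 2 = \frac{\mathit{fp}^{S} + \mathit{fn}^{S}}{\mathit{tp}^{S}} = \frac{\Pr^{S}_{M}(\Diamond \mathit{noeff}_{\mathit{fp}}) + \Pr^{S}_{M}(\Diamond \mathit{eff}_{\mathit{unc}})}{\Pr^{S}_{M}(\Diamond \mathit{eff}_{\mathit{cov}})}.$$
Thus, the task is to compute
$$X = \sup_{S} \frac{\Pr^{S}_{M}(\Diamond \mathit{noeff}_{\mathit{fp}}) + \Pr^{S}_{M}(\Diamond \mathit{eff}_{\mathit{unc}})}{\Pr^{S}_{M}(\Diamond \mathit{eff}_{\mathit{cov}})}$$
where S ranges over all schedulers with $\Pr^{S}_{M}(\Diamond \mathit{eff}_{\mathit{cov}}) > 0$. We have
$$\mathrm{fscore}(\mathit{Cause}) = \frac{2}{X + 2}.$$
But X can be expressed as a supremum in the form of Theorem 5.7. This yields the claim that the optimal value is computable in polynomial time.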
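The transformations behind the closed form of the f-score can be spelled out as follows, for a fixed scheduler with tp > 0. The abbreviations prec and rec for precision and recall are introduced here only for readability.

```latex
\begin{align*}
\mathrm{fscore}
  &= \frac{2 \cdot \mathrm{prec} \cdot \mathrm{rec}}{\mathrm{prec} + \mathrm{rec}}
   = \frac{2 \cdot \frac{tp}{tp+fp} \cdot \frac{tp}{tp+fn}}
          {\frac{tp}{tp+fp} + \frac{tp}{tp+fn}} \\
  &= \frac{2\,tp^{2}}{tp\,(tp+fn) + tp\,(tp+fp)}
   = \frac{2\,tp}{2\,tp + fn + fp}, \\[4pt]
\frac{2}{\mathrm{fscore}} - 2
  &= \frac{2\,tp + fn + fp}{tp} - 2
   = \frac{fp + fn}{tp}.
\end{align*}
```

The second line is obtained by multiplying numerator and denominator by (tp+fp)(tp+fn). Since x ↦ 2/(x+2) is decreasing, minimizing the f-score over schedulers corresponds to maximizing the quotient (fp+fn)/tp.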
In case fscore(Cause) = 0, we do not obtain an optimal scheduler via Theorem 5.7. Lemma 5.6, however, shows that there is a scheduler S with $\Pr^{S}_{M}(\Diamond \mathit{Eff}) > 0$ and $\Pr^{S}_{M}(\Diamond \mathit{Cause}) = 0$. Such a scheduler can be computed in polynomial time as any (memoryless) scheduler in the largest sub-MDP of M that does not contain states in Cause. (This sub-MDP can be constructed by successively removing states and state-action pairs.)
5.3. Quality-optimal probability-raising causes. For the computation of the quality measures of a fixed cause set, there is no difference between GPR and SPR causes as only the quality properties of the set are in question. However, when finding optimal causes the distinction makes a difference. Here, we say an SPR cause Cause is recall-optimal if recall(Cause) = max_C recall(C) where C ranges over all SPR causes. Likewise, ratio-optimality resp. f-score-optimality of Cause means maximality of covrat(Cause) resp. fscore(Cause) among all SPR causes. Recall-, ratio- and f-score-optimality for GPR causes are defined accordingly.
Lemma 5.10. Let Cause be an SPR or a GPR cause. Then, Cause is recall-optimal if and only if Cause is ratio-optimal.
Proof. Essentially the proof uses the same connection between recall and covrat as Corollary 5.4. Here we do not assume (A1)-(A3). However, for each scheduler S and each set C of states we have: For all non-negative reals p, q, p ′ , q ′ where q, q ′ > 0 we have: Hence, if C is fixed and S ranges over all schedulers with tp S C > 0: Thus, if C is fixed and S = S C is a scheduler achieving the worst-case (i.e., minimal) coverage ratio for C then S achieves the minimal recall for C, and vice versa.
Let now fn C = fn S C C , tp C = tp S C C where S C is a scheduler that minimizes the coverage ratio and minimizes the recall for cause set C. Then: where the extrema range over all SPR resp. GPR causes C. This yields the claim.
Recall- and ratio-optimal SPR causes. The techniques of Section 4.1 yield an algorithm for generating a canonical SPR cause with optimal recall and coverage ratio. To see this, let 𝒞 denote the set of all states which constitute a singleton SPR cause. The canonical cause CanCause is defined as the set of states c ∈ 𝒞 such that there is a scheduler S with $\Pr^{S}_{M}((\neg 𝒞)\,\mathrm{U}\,c) > 0$. So to speak, CanCause is the "front" of 𝒞. Obviously, 𝒞 and CanCause are computable in polynomial time.
Theorem 5.11. If 𝒞 ≠ ∅ then CanCause is a ratio- and recall-optimal SPR cause.
Proof. By definition of SPR causes, any subset C ⊆ 𝒞 satisfying $\Pr^{\max}_{M}((\neg C)\,\mathrm{U}\,c) > 0$ for each c ∈ C constitutes an SPR cause, and thus CanCause is also an SPR cause. Optimality is a consequence as CanCause even yields path-wise optimal coverage in the following sense. If C is any SPR cause then C ⊆ 𝒞 by definition and for each path π in M: if π ⊨ ♢C then π ⊨ ♢CanCause. But then for every scheduler S, $\Pr^{S}_{M}(\Diamond C \wedge \Diamond \mathit{Eff}) \leqslant \Pr^{S}_{M}(\Diamond \mathit{CanCause} \wedge \Diamond \mathit{Eff})$, which yields the claim.
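The "front" construction can be sketched as a graph search that stops at the first candidate state on each path. The sketch assumes that the set of singleton SPR causes has already been computed (for instance with the techniques of Section 4.1) and that the MDP is given only through its underlying graph, where an edge stands for a positive transition probability under some action. All names are hypothetical.

```python
from collections import deque


def canonical_front(successors, init, candidates):
    """Candidate states reachable from init without visiting a candidate before.

    successors: dict mapping a state to an iterable of states reachable
    in one step with positive probability under some action.
    candidates: states that form singleton SPR causes (assumed given).
    """
    front, seen, queue = set(), {init}, deque([init])
    while queue:
        state = queue.popleft()
        if state in candidates:
            front.add(state)  # first candidate on this path: do not search past it
            continue
        for target in successors.get(state, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return front


# Hypothetical graph in which s2 is only reachable through s1.
graph = {"init": ["s1", "eff_unc", "noeff"], "s1": ["s2", "noeff"], "s2": ["eff"]}
can_cause = canonical_front(graph, "init", {"s1", "s2"})
```

The search is linear in the size of the graph, in line with the polynomial-time claim.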
Remark 5.12. It is not true that the canonical SPR cause CanCause is f-score-optimal. To see this, consider the Markov chain from Figure 16. There we have CanCause = {s_1}, which has precision(CanCause) = 3/4 and recall(CanCause) = (3/8)/(1/4 + 3/8) = 3/5. But the SPR cause {s_2} has a better f-score as its precision is 1 and it has the same recall as CanCause: fscore(CanCause) = 2/3, whereas fscore({s_2}) = 3/4. ◁
F-score-optimal SPR cause. From Section 5.2, we see that f-score-optimal SPR causes in MDPs can be computed in polynomial space by computing the f-score for all potential SPR causes one by one in polynomial time (Theorem 5.5). As the space can be reused after each computation, this results in polynomial space. For Markov chains, we can do better and compute an f-score-optimal SPR cause in polynomial time via a polynomial reduction to the stochastic shortest path problem:
Theorem 5.13. In Markov chains that have SPR causes, an f-score-optimal SPR cause can be computed in polynomial time.
Proof. We regard the given Markov chain M as an MDP with a singleton action set Act = {α}. As M has SPR causes, the set 𝒞 of states that constitute a singleton SPR cause is nonempty. We may assume that M has no non-trivial (i.e., cyclic) bottom strongly connected components as we may collapse them. Let $w_{c} = \Pr_{M,c}(\Diamond \mathit{Eff})$. We switch from M to a new MDP K with state space $S_{K} = S \cup \{\mathit{eff}_{\mathit{cov}}, \mathit{noeff}_{\mathit{fp}}\}$ with fresh states $\mathit{noeff}_{\mathit{fp}}$ and $\mathit{eff}_{\mathit{cov}}$ and the action set $\mathit{Act}_{K} = \{α, γ\}$. The MDP K arises from M by adding (i) for each state c ∈ 𝒞 a fresh state-action pair (c, γ) such that $P_{K}(c, γ, \mathit{eff}_{\mathit{cov}}) = w_{c}$ and $P_{K}(c, γ, \mathit{noeff}_{\mathit{fp}}) = 1 - w_{c}$, and (ii) reset transitions to init with action label α from the new state $\mathit{noeff}_{\mathit{fp}}$ and all terminal states of M, i.e., $P_{K}(\mathit{noeff}_{\mathit{fp}}, α, \mathit{init}) = 1$ and $P_{K}(s, α, \mathit{init}) = 1$ for s ∈ Eff or if s is a terminal non-effect state of M. So, exactly $\mathit{eff}_{\mathit{cov}}$ is terminal in K, and $\mathit{Act}_{K}(c) = \{α, γ\}$ for c ∈ 𝒞, while $\mathit{Act}_{K}(s) = \{α\}$ for all other states s. Intuitively, taking action γ in state c ∈ 𝒞 selects c to be a cause state. The states in Eff represent uncovered effects in K, while $\mathit{eff}_{\mathit{cov}}$ stands for covered effects.
We assign weight 1 to all states in $U = \mathit{Eff} \cup \{\mathit{noeff}_{\mathit{fp}}\}$ and weight 0 to all other states of K. Let $V = \{\mathit{eff}_{\mathit{cov}}\}$. Then, $f = E^{\min}_{K}(⊞V)$ and an MD-scheduler S for K such that $E^{S}_{K}(⊞V) = f$ are computable in polynomial time. Let $C_{γ}$ denote the set of states c ∈ 𝒞 where S(c) = γ and let Cause be the set of states $c \in C_{γ}$ where M has a path satisfying $(\neg C_{γ})\,\mathrm{U}\,c$. Then, Cause is an SPR cause of M. With arguments as in Section 5.2 we obtain fscore(Cause) = 2/(f+2). It remains to show that Cause is f-score-optimal. Let C be an arbitrary SPR cause. Then, C ⊆ 𝒞. Let T be the MD-scheduler for K that schedules γ in C and α for all other states of K. Then, $\mathrm{fscore}(C) = 2/(f^{T}+2)$ where $f^{T} = E^{T}_{K}(⊞V)$. Hence, $f \leqslant f^{T}$, which yields fscore(Cause) ⩾ fscore(C).
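The construction of K and the resulting shortest-path computation can be sketched for a small Markov chain as follows. Several things here are assumptions of the sketch rather than part of the proof: the candidate set and the values w_c are taken as precomputed inputs, plain value iteration (started from 0) replaces a polynomial-time method such as linear programming, and the example chain is a hypothetical one chosen so that its numbers agree with those quoted in Remark 5.12; it is not claimed to be the chain of Figure 16.

```python
from collections import deque


def optimal_fscore_cause(chain, init, eff, candidates, w, tol=1e-12, max_iter=100_000):
    """Sketch of the reduction to a stochastic shortest path problem.

    chain: dict state -> dict successor -> probability ({} for terminal states).
    candidates: states forming singleton SPR causes; w[c]: Pr_c(reach Eff).
    Returns (f, cause, fscore) with fscore = 2 / (f + 2).
    """
    cov, fp = "eff_cov", "noeff_fp"  # fresh states of K (names assumed unused)
    # Action alpha: follow the chain; terminal states and noeff_fp reset to init.
    alpha = {s: (dict(succ) if succ else {init: 1.0}) for s, succ in chain.items()}
    alpha[fp] = {init: 1.0}
    # Action gamma: declare c a cause state.
    gamma = {c: {cov: w[c], fp: 1.0 - w[c]} for c in candidates}
    weight = {s: 1.0 if (s in eff or s == fp) else 0.0 for s in alpha}
    v = {s: 0.0 for s in alpha}
    v[cov] = 0.0

    def expect(dist, values):
        return sum(p * values[t] for t, p in dist.items())

    for _ in range(max_iter):
        new_v = {cov: 0.0}
        for s in alpha:
            best = expect(alpha[s], v)
            if s in gamma:
                best = min(best, expect(gamma[s], v))
            new_v[s] = weight[s] + best
        delta = max(abs(new_v[s] - v[s]) for s in new_v)
        v = new_v
        if delta < tol:
            break

    # States where gamma is (weakly) preferred, then their front in the chain.
    c_gamma = {c for c in candidates if expect(gamma[c], v) <= expect(alpha[c], v)}
    cause, seen, queue = set(), {init}, deque([init])
    while queue:
        s = queue.popleft()
        if s in c_gamma:
            cause.add(s)
            continue
        for t in chain[s]:
            if t not in seen:
                seen.add(t)
                queue.append(t)
    f = v[init]
    return f, cause, 2.0 / (f + 2.0)


# Hypothetical chain: s1 has precision 3/4, s2 has precision 1.
chain = {
    "init": {"s1": 0.5, "eff_unc": 0.25, "noeff": 0.25},
    "s1": {"s2": 0.75, "noeff": 0.25},
    "s2": {"eff": 1.0},
    "eff_unc": {}, "eff": {}, "noeff": {},
}
f, cause, fscore = optimal_fscore_cause(
    chain, "init", eff={"eff", "eff_unc"}, candidates={"s1", "s2"},
    w={"s1": 0.75, "s2": 1.0},
)
```

On this chain, solving the optimality equations by hand gives f = 2/3 with γ chosen only in s2, so the sketch should select {s2} with f-score 3/4, whereas the front {s1} of the candidate set only achieves 2/3.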
The naïve adaptation of the construction presented in the proof of Theorem 5.13 to MDPs would yield a stochastic game structure where the objective of one player is to minimize the expected accumulated weight until reaching a target state. Although algorithms for stochastic shortest path (SSP) games are known [PB99], they rely on assumptions on the game structure which would not be satisfied here. However, for the threshold problem SPR-f-score, where the inputs are an MDP M, Eff and ϑ ∈ ℚ_{⩾0} and the task is to decide the existence of an SPR cause whose f-score exceeds ϑ, we can establish a polynomial reduction to SSP games, which yields an NP ∩ coNP upper bound:
Theorem 5.14. The decision problem SPR-f-score is in NP ∩ coNP.
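Recall that for a given SPR cause C and scheduler S we have $\mathrm{fscore}^{S}(C) = \frac{2\,\mathit{tp}^{S}}{2\,\mathit{tp}^{S} + \mathit{fn}^{S} + \mathit{fp}^{S}}$. In order to prove the upper bound for SPR-f-score, we reformulate the threshold condition of SPR-f-score.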
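The linear form of the threshold condition that the later steps of the proof work with can be derived directly from the closed form of the f-score. The derivation below is a sketch for a fixed scheduler with 2tp + fn + fp > 0; it is offered as a reading aid and is not a reproduction of the lemma referred to in the proof.

```latex
\begin{align*}
\mathrm{fscore}^{S}(C) > \vartheta
  &\iff \frac{2\,tp^{S}}{2\,tp^{S} + fn^{S} + fp^{S}} > \vartheta \\
  &\iff 2\,tp^{S} > \vartheta \cdot \bigl(2\,tp^{S} + fn^{S} + fp^{S}\bigr) \\
  &\iff 2(1-\vartheta)\,tp^{S} - \vartheta\,fp^{S} - \vartheta\,fn^{S} > 0 .
\end{align*}
```

The last expression is linear in the three probabilities, which is what allows it to be encoded as an expected accumulated weight with weights 2(1-ϑ) and -ϑ on suitable terminal states.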
Proof of Theorem 5.14. Let M = (S, Act, P, init) be an MDP, Eff ⊆ S a set of terminal states, and ϑ a rational. Consider 𝒞, the set of states c ∈ S \ Eff where {c} is an SPR cause. If 𝒞 is empty then the threshold problem is trivial as there is no SPR cause at all. Thus, we suppose that 𝒞 is nonempty.
Note Claim 1: There is an SPR cause C for Eff in M with fscore(C) > ϑ if and only if there is an SPR cause C ′ for Eff in N with fscore(C ′ ) > ϑ. Proof of Claim 1. We first observe that all reachability probabilities involved in the claim do not depend on the behavior during the traversal of MECs. Furthermore, staying inside a MEC in M can be mimicked in N by moving to ⊥, and vice versa. More precisely, let C ⊆ C. Then, analogously to Lemma 4.15, for each scheduler S for M, there is a scheduler T for N, and vice versa, such that

Model transformation for ensuring positive effect probabilities.
Recall that the f-score is only defined for schedulers reaching Eff with positive probability. Now, we will provide a further model transformation that will ensure that Eff is reached with positive probability under all schedulers. If this is already the case, there is nothing to do. So, we assume now that $\Pr^{\min}_{N,\mathit{init}_{N}}(\Diamond \mathit{Eff}) = 0$. We define the subset $D \subseteq S_{N}$ of states from which Eff can be avoided by
$$D = \{\, s \in S_{N} \mid \Pr^{\min}_{N,s}(\Diamond \mathit{Eff}) = 0 \,\}.$$
Note that $\mathit{init}_{N} \in D$. For each s ∈ D, we further define the set of actions minimizing the reachability probability of Eff from s by $\mathit{Act}^{\min}(s) = \{α \in \mathit{Act}_{N}(s) \mid P_{N}(s, α, D) = 1\}$.
Finally, let E ⊆ D be the set of states that are reachable from $\mathit{init}_{N}$ when only choosing actions from $\mathit{Act}^{\min}(\cdot)$. Note that E does not contain any states from 𝒞, meaning no state in E constitutes a singleton SPR cause. All schedulers that reach Eff with positive probability in N have to leave the sub-MDP consisting of E and the actions in $\mathit{Act}^{\min}(\cdot)$ at some point. Let us call this sub-MDP $N^{\min}_{E}$. We define the set of state-action pairs Π that leave the sub-MDP $N^{\min}_{E}$:
$$Π = \{\, (s, α) \mid s \in E \text{ and } α \in \mathit{Act}_{N}(s) \setminus \mathit{Act}^{\min}(s) \,\}.$$
We now construct a further MDP K. The idea is that K behaves like the end-component-free MDP N after a scheduler is initially forced to choose a probability distribution over state-action pairs from Π. In this way, Eff is reached with positive probability under all schedulers. Given an SPR cause, we will observe that for the f-score of this cause under a scheduler, only the proportions between the probabilities with which the state-action pairs from Π are chosen matter, not their absolute values. Due to this observation, for each SPR cause C and for each scheduler S for N that reaches Eff with positive probability, we can then construct a scheduler for K that leads to the same recall and precision of C.
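The sets D, Act^min, E and Π can be computed by simple graph algorithms. The following sketch assumes an explicit dictionary encoding of the MDP in which only transitions with positive probability are listed; the function name and the encoding are hypothetical. D is obtained as the greatest set of non-effect states in which every non-terminal state has an action whose successors all stay in the set.

```python
def leaving_pairs(mdp, init, eff):
    """Compute D, Act_min, E and the set Pi of leaving state-action pairs.

    mdp: dict state -> dict action -> dict successor -> positive probability;
    terminal states map to an empty dict.
    """
    # D: states from which Eff can be avoided with probability 1.
    d = {s for s in mdp if s not in eff}
    changed = True
    while changed:
        changed = False
        for s in list(d):
            actions = mdp[s]
            stays = any(all(t in d for t in dist) for dist in actions.values())
            if actions and not stays:
                d.remove(s)
                changed = True
    # Act_min(s): actions that keep all probability mass inside D.
    act_min = {s: {a for a, dist in mdp[s].items() if all(t in d for t in dist)}
               for s in d}
    # E: states reachable from init using only Act_min actions.
    e = set()
    stack = [init] if init in d else []
    while stack:
        s = stack.pop()
        if s in e:
            continue
        e.add(s)
        for a in act_min[s]:
            stack.extend(t for t in mdp[s][a] if t not in e)
    pi = {(s, a) for s in e for a in mdp[s] if a not in act_min[s]}
    return d, act_min, e, pi


# Hypothetical MDP: from init one can stay safe, gamble, or move to u.
example_mdp = {
    "init": {"stay": {"safe": 1.0},
             "go": {"eff": 0.5, "safe": 0.5},
             "risky": {"u": 1.0}},
    "u": {"a": {"eff": 1.0}},
    "safe": {}, "eff": {},
}
d_set, act_min, e_set, pi_set = leaving_pairs(example_mdp, "init", {"eff"})
```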
Formally, K is defined as follows: The state space is S N ∪ {init K } where init K is a fresh initial state. For all states in S N , the same actions as in N are available with the same transition probabilities. I.e., for all s, t ∈ S N , Act K (s) def = Act N (s) and P K (s, α, t) def = P N (s, α, t) for all α ∈ Act K (s).
For each state-action pair (s, α) from Π, we now add a new action $β_{(s,α)}$ that is enabled only in $\mathit{init}_{K}$. These are all actions enabled in $\mathit{init}_{K}$, i.e.,
$$\mathit{Act}_{K}(\mathit{init}_{K}) = \{\, β_{(s,α)} \mid (s, α) \in Π \,\}.$$
For each state $t \in S_{N}$, we define the transition probabilities under $β_{(s,α)}$ by
$$P_{K}(\mathit{init}_{K}, β_{(s,α)}, t) = P_{N}(s, α, t).$$
Claim 2: A subset C ⊆ 𝒞 such that for all c ∈ C: $\Pr^{\max}_{N}((\neg C)\,\mathrm{U}\,c) > 0$ is an SPR cause for Eff in N with fscore(C) > ϑ if and only if for all schedulers T for K, we have
$$2(1-ϑ)\,\mathit{tp}^{T}_{K} - ϑ\,\mathit{fp}^{T}_{K} - ϑ\,\mathit{fn}^{T}_{K} > 0. \qquad (5.4)$$
Proof of Claim 2. We first prove the direction "⇒". So, let C be an SPR cause for Eff in the end-component-free MDP N with fscore(C) > ϑ. As a first observation, in order to prove (5.4) for all schedulers T for K, it suffices to consider schedulers T that start with a deterministic choice for state $\mathit{init}_{K}$ and then behave in an arbitrary way.
To see this, we consider the MDP K C which consists of two copies of K: "before C" and "after C". When K C enters a C-state in the first copy ("before C"), it switches to the second copy ("after C") and stays there forever. Let us write (s, 1) for state s in the first copy and (s, 2) for the copy of state s in the second copy. Thus, in K C the event ♢C ∧ ♢Eff is equivalent to reaching a state (eff, 2) where eff ∈ Eff, while ♢C ∧ ¬♢Eff is equivalent to reaching a non-terminal state in the second copy, while ¬♢C ∧ ♢Eff corresponds to the event reaching an effect state in the first copy.
Obviously, there is a one-to-one-correspondence of the schedulers of K and K C . As K has no end components so does K C . Therefore, a terminal state will be reached almost surely under every scheduler. Furthermore, we equip K C with a weight function on states which assigns weight 2(1−ϑ) to the states (eff, 2) where eff ∈ Eff, weight −ϑ to the states (eff, 1) where eff ∈ Eff and to the states (s, 2) where s is a terminal non-effect state in K (and K C ), and weight 0 to all other states. Let V denote the set of all terminal states in K C . Then, the expression on the left hand side of (5.4) equals E T K C (⊞V), the expected accumulated weight until reaching a terminal state under scheduler T. Hence, (5.4) holds for all schedulers T in K if and only if E min K C (⊞V) > 0. It is well-known that the minimal expected accumulated weight in EC-free MDPs is achieved by an MD-scheduler [BK08]. Thus, there is an MD-scheduler T of K C with E min K C (⊞V) = E T K C (⊞V). When viewed as a scheduler of K, T behaves memoryless deterministic before reaching C. In particular, T's initial choice in init K is deterministic. Recall the set Act min (s) of actions minimizing the reachability probability of Eff from s. Consider a scheduler T for K with a deterministic choice T(init K )(β (s,α) ) = 1 where (s, α) ∈ Π. To construct an analogous scheduler S of N, we pick an MD-scheduler U of the sub-MDP N min E of N induced by the state-action pairs (u, β) where u ∈ E and β ∈ Act min (u) such that there is a U-path from init N to state s.
Scheduler S of N operates with the mode m 1 and the modes m 2,t for t ∈ S N . In its initial mode m 1 , scheduler S behaves as U as long as state s has not been visited. When having reached state s in mode m 1 , then S schedules the action α with probability 1. Let t ∈ S N be the state that S reaches via the α-transition from s. Then, S switches to mode m 2,t and behaves from then on as the residual scheduler res(T, ϖ) of T for the T-path ϖ = init K β (s,α) t in K. That is, after having scheduled the action β (s,α) , scheduler S behaves exactly as T.
Let λ denote S's probability to leave mode $m_{1}$, which equals U's probability to reach s from $\mathit{init}_{N}$. Thus, $λ = \Pr^{U}_{N}(\Diamond s)$ when U is viewed as a scheduler of N. As E is disjoint from 𝒞 and Eff, scheduler S stays forever in mode $m_{1}$ and never reaches a state in C ∪ Eff with probability 1−λ.
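S and T behave identically after choosing the state-action pair (s, α) ∈ Π or the corresponding action $β_{(s,α)}$, respectively, which implies that
$$\mathit{tp}^{S}_{N} = λ \cdot \mathit{tp}^{T}_{K}, \quad \mathit{fp}^{S}_{N} = λ \cdot \mathit{fp}^{T}_{K}, \quad \mathit{fn}^{S}_{N} = λ \cdot \mathit{fn}^{T}_{K}.$$
Recall the sub-MDP $N^{\min}_{E}$ consisting of E and the actions in $\mathit{Act}^{\min}(\cdot)$. As S leaves the sub-MDP $N^{\min}_{E}$ with probability λ > 0, we have $\Pr^{S}_{N}(\Diamond \mathit{Eff}) > 0$. By Lemma 5.15, we can conclude that $2(1-ϑ)\,\mathit{tp}^{S}_{N} - ϑ\,\mathit{fp}^{S}_{N} - ϑ\,\mathit{fn}^{S}_{N} > 0$. By the equations above, this in turn implies that
$$2(1-ϑ)\,\mathit{tp}^{T}_{K} - ϑ\,\mathit{fp}^{T}_{K} - ϑ\,\mathit{fn}^{T}_{K} > 0.$$
For the direction "⇐", first recall that any subset of 𝒞 satisfying (M) is an SPR cause for Eff in N by definition of 𝒞. Now, let S be a scheduler for N with $\Pr^{S}_{N}(\Diamond \mathit{Eff}) > 0$. Let Γ be the set of finite S-paths γ in the sub-MDP $N^{\min}_{E}$ such that S chooses an action in $\mathit{Act}_{N}(\mathit{last}(γ)) \setminus \mathit{Act}^{\min}(\mathit{last}(γ))$ with positive probability after γ, where last(γ) denotes the last state of γ. Let
$$q \stackrel{\text{def}}{=} \sum_{γ \in Γ} \; \sum_{α \in \mathit{Act}_{N}(\mathit{last}(γ)) \setminus \mathit{Act}^{\min}(\mathit{last}(γ))} P_{N}(γ) \cdot S(γ)(α).$$
So, q is the overall probability that a state-action pair from Π is chosen under S. We now define a scheduler T for K: For each γ ∈ Γ ending in a state s and each $α \in \mathit{Act}_{N}(s) \setminus \mathit{Act}^{\min}(s)$, the scheduler T chooses action $β_{(s,α)}$ in $\mathit{init}_{K}$ with probability $P_{N}(γ) \cdot S(γ)(α)/q$. When reaching a state t afterwards, T behaves like res(S, γ α t). Note that by definition this indeed defines a probability distribution over the actions in the initial state $\mathit{init}_{K}$.
By assumption, we know that now $2(1-ϑ)\,\mathit{tp}^{T}_{K} - ϑ\,\mathit{fp}^{T}_{K} - ϑ\,\mathit{fn}^{T}_{K} > 0$. As the probability with which an action $β_{(s,α)}$ is chosen by T for a pair (s, α) ∈ Π is 1/q times the probability that α is chosen in s to leave the sub-MDP $N^{\min}_{E}$ under S in N and as the residual behavior is identical, the values $\mathit{tp}^{S}_{N}$, $\mathit{fp}^{S}_{N}$ and $\mathit{fn}^{S}_{N}$ are q times the corresponding values under T in K, and we conclude that
$$2(1-ϑ)\,\mathit{tp}^{S}_{N} - ϑ\,\mathit{fp}^{S}_{N} - ϑ\,\mathit{fn}^{S}_{N} > 0.$$
By Lemma 5.15, this shows that fscore(C) > ϑ in N and finishes the proof of Claim 2.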
Construction of a game structure. Recall the set 𝒞 of singleton SPR causes. We now construct a stochastic shortest path game (see [PB99]) to check whether there is a subset C ⊆ 𝒞 such that (5.4) holds in the EC-free MDP K in which visiting effect states always has a non-zero probability. Such a game is played on an MDP-like structure with the only difference that the set of states is partitioned into two sets indicating which player controls which states.
The game G has states $(S_{K} \times \{\mathit{yes}, \mathit{no}\}) \cup (𝒞 \times \{\mathit{choice}\})$. On the subset $S_{K} \times \{\mathit{yes}\}$, all actions and transition probabilities are just as in K and this copy of K cannot be left. Formally, for all $s, t \in S_{K}$ and $α \in \mathit{Act}_{K}(s)$, we have $\mathit{Act}_{G}((s, \mathit{yes})) = \mathit{Act}_{K}(s)$ and $P_{G}((s, \mathit{yes}), α, (t, \mathit{yes})) = P_{K}(s, α, t)$.
In the "no"-copy, the game also behaves like K but when a state in 𝒞 would be entered, the game moves to a state in 𝒞 × {choice} instead. In a state of the form (c, choice) with c ∈ 𝒞, two actions α and β are available. Choosing α leads to the state (c, yes) while choosing β leads to (c, no), in both cases with probability 1.
Intuitively speaking, whether a state c ∈ 𝒞 should belong to the cause set can be decided in the state (c, choice). The "yes"-copy encodes that a cause state has been selected. More concretely, the "yes"-copy is entered as soon as α has been chosen in some state (c, choice) and will never be left from then on. The "no"-copy of K then encodes that no state c ∈ 𝒞 which has been selected to become a cause state has been visited so far. That is, if the current state of a play in G belongs to the "no"-copy then in all previous decisions in the states (c, choice), action β has been chosen.
Finally, we equip the game with a weight structure. All states in Eff × {yes} get weight 2(1 − ϑ). All remaining terminal states in $S_{K} \times \{\mathit{yes}\}$ get weight −ϑ. All states in Eff × {no} get weight −ϑ. All remaining states have weight 0.
The game is played between two players 0 and 1. Player 0 controls all states in 𝒞 × {choice} while player 1 controls the remaining states. The goal of player 0 is to ensure that the expected accumulated weight is > 0.
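The game structure described above can be assembled mechanically from K. The sketch below assumes the same dictionary encoding of an MDP as a mapping from states to actions to successor distributions; the function name, the action names "alpha" and "beta" for the choice states, and the example are hypothetical. It only builds the arena, weights and ownership; solving the resulting stochastic shortest path game is not attempted here.

```python
def build_game(k_trans, eff, candidates, theta):
    """Build the arena of the game G from the MDP K.

    k_trans: dict state -> dict action -> dict successor -> probability
    ({} for terminal states). Returns (transitions, weight, owner).
    """
    trans, weight, owner = {}, {}, {}

    def enter_no(target):
        # In the no-copy, entering a candidate state is redirected
        # to the corresponding choice state.
        return (target, "choice") if target in candidates else (target, "no")

    for s, actions in k_trans.items():
        trans[(s, "yes")] = {a: {(t, "yes"): p for t, p in dist.items()}
                             for a, dist in actions.items()}
        trans[(s, "no")] = {a: {enter_no(t): p for t, p in dist.items()}
                            for a, dist in actions.items()}
        owner[(s, "yes")] = owner[(s, "no")] = 1
        terminal = not actions
        if s in eff:
            weight[(s, "yes")] = 2 * (1 - theta)   # covered effect
            weight[(s, "no")] = -theta             # uncovered effect
        else:
            weight[(s, "yes")] = -theta if terminal else 0.0  # false positive
            weight[(s, "no")] = 0.0

    for c in candidates:
        trans[(c, "choice")] = {"alpha": {(c, "yes"): 1.0},
                                "beta": {(c, "no"): 1.0}}
        owner[(c, "choice")] = 0
        weight[(c, "choice")] = 0.0
    return trans, weight, owner


# Hypothetical MDP K with one candidate state c.
k_example = {
    "i": {"b": {"c": 0.5, "e": 0.25, "n": 0.25}},
    "c": {"a": {"e": 0.75, "n": 0.25}},
    "e": {}, "n": {},
}
g_trans, g_weight, g_owner = build_game(k_example, {"e"}, {"c"}, theta=0.5)
```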
Claim 3: Player 0 has a winning strategy ensuring that the expected accumulated weight is > 0 in the game G if and only if there is a subset C ⊆ 𝒞 in K which satisfies $\Pr^{\max}_{K}((\neg C)\,\mathrm{U}\,c) > 0$ for all c ∈ C and for all schedulers T for K we have
$$2(1-ϑ)\,\mathit{tp}^{T}_{K} - ϑ\,\mathit{fp}^{T}_{K} - ϑ\,\mathit{fn}^{T}_{K} > 0. \qquad (5.5)$$
Proof of Claim 3. As K has no end components, also in the game G a terminal state is reached almost surely under any pair of strategies. Hence, we can rely on the results of [PB99] that state that both players have an optimal memoryless deterministic strategy. We start by proving direction "⇒" of Claim 3. Let ζ be a memoryless deterministic winning strategy for player 0. I.e., ζ assigns to each state in 𝒞 × {choice} an action from {α, β}. We define
$$C_{α} = \{\, c \in 𝒞 \mid ζ((c, \mathit{choice})) = α \,\}.$$
Note that $C_{α}$ is not empty as otherwise a positive expected accumulated weight in the game is not possible. (Here we use the fact that only the effect states in the yes-copy have positive weight and that the yes-copy can only be entered by taking α in one of the states (c, choice).) To ensure $\Pr^{\max}_{K}((\neg C)\,\mathrm{U}\,c) > 0$ for all c ∈ C, we remove states that cannot be visited as the first state of this set:
$$C = \{\, c \in C_{α} \mid \Pr^{\max}_{K}((\neg C_{α})\,\mathrm{U}\,c) > 0 \,\}.$$
Note that the strategies for player 0 in G which correspond to the sets $C_{α}$ and C lead to exactly the same plays. Let T be a scheduler for K. This scheduler can be used as a strategy for player 1 in G. Let us denote the expected accumulated weight when player 0 plays according to ζ and player 1 plays according to T by w(ζ, T). As ζ is winning for player 0 we have w(ζ, T) > 0. By the construction of the game, it follows directly that
$$w(ζ, T) = 2(1-ϑ)\,\mathit{tp}^{T}_{K} - ϑ\,\mathit{fp}^{T}_{K} - ϑ\,\mathit{fn}^{T}_{K}, \qquad (5.6)$$
where the values on the right-hand side refer to the set C. Putting things together yields (5.5) for C and all schedulers T for K.
For the other direction, suppose there is a set C ⊆ 𝒞 that satisfies $\Pr^{\max}_{K}((\neg C)\,\mathrm{U}\,c) > 0$ for all c ∈ C and (5.5) for all schedulers T for K. We define the MD-strategy ζ from C by letting ζ((c, choice)) = α if and only if c ∈ C. For any strategy T for player 1, we can again view T also as a scheduler for K.
Equation (5.6) holds again and shows that the expected accumulated weight in G is positive if player 0 plays according to ζ against any strategy for player 1. This finishes the proof of Claim 3.
Putting together Claims 1-3. We conclude that there is an SPR cause C in the original MDP M with fscore(C) > ϑ if and only if player 0 has a winning strategy in the constructed game G. As both players have optimal MD-strategies in G [PB99], the decision problem is in NP ∩ coNP: We can guess the MD-strategy for player 0 and solve the resulting stochastic shortest path problem in polynomial time [BT91] to obtain an NP upper bound. Likewise, we can guess the MD-strategy for player 1 and solve the resulting stochastic shortest path problem to obtain the coNP upper bound.
Optimality and threshold constraints for GPR causes. Computing optimal GPR causes for either quality measure can be done in polynomial space by considering all cause candidates, checking (G) in coNP and computing the corresponding quality measure in polynomial time (Section 5.2). As the space can be reused after each computation, this results in polynomial space. However, we show that no polynomial-time algorithms can be expected as the corresponding threshold problems are NP-hard. Let GPR-covratio (resp. GPR-recall, GPR-f-score) denote the decision problems: Given M, Eff and ϑ ∈ Q, decide whether there exists a GPR cause with coverage ratio (resp. recall, f-score) at least ϑ.
Theorem 5.16. The problems GPR-covratio, GPR-recall and GPR-f-score are NP-hard and belong to $Σ^{P}_{2}$. For Markov chains, all three problems are NP-complete. NP-hardness even holds for tree-like Markov chains.
Proof. $Σ^{P}_{2}$-membership. The algorithms for GPR-covratio, GPR-recall and GPR-f-score rely on the guess-and-check principle: they start by non-deterministically guessing a set Cause ⊆ S, then check in coNP whether Cause constitutes a GPR cause (see Section 4) and finally check recall(Cause) ⩾ ϑ (with standard techniques), resp. covrat(Cause) ⩾ ϑ, resp. fscore(Cause) ⩾ ϑ (Theorem 5.5) in polynomial time. The alternation between the existential quantification for guessing Cause and the universal quantification for the coNP check of the GPR condition results in the complexity class $Σ^{P}_{2}$ of the polynomial hierarchy.
NP-membership for Markov chains. NP-membership for all three problems within Markov chains is straightforward as we may non-deterministically guess a cause and check in polynomial time whether it constitutes a GPR cause and satisfies the threshold condition for the recall, coverage ratio or f-score.
Note that $p_{0} + p_{1} + \ldots + p_{n} < \frac{1}{2}$ as all $p_{i}$'s are strictly smaller than $\frac{1}{2(n+1)}$. As the $w_{i}$'s are bounded by 1, this yields $0 < P(\mathit{init}, \mathit{eff}_{\mathit{unc}}) < \frac{1}{2}$ and $0 < P(\mathit{init}, \mathit{noeff}) < 1$. The graph structure of M is indeed a tree and M can be constructed from the values $A, A_{1}, \ldots, A_{n}$ and $B, B_{1}, \ldots, B_{n}$ in polynomial time. Moreover, for $\mathit{Eff} = \{\mathit{eff}_{\mathit{unc}}\} \cup \{\mathit{eff}_{i} : i = 0, 1, \ldots, n\}$ we have: As the values $w_{1}, \ldots, w_{n}$ are strictly smaller than $\frac{1}{2}$, we have $\Pr_{M}(\Diamond \mathit{Eff} \mid \Diamond C) < \frac{1}{2}$ for each nonempty subset C of $\{s_{1}, \ldots, s_{n}\}$. Thus, the only candidates for GPR causes are the sets $C_{I} = \{s_{i} : i \in I_{0}\}$ where $I \subseteq \{1, \ldots, n\}$ and where, as before, $I_{0} = I \cup \{0\}$. Note that for all states $s \in C_{I}$ there is a path satisfying $(\neg C_{I})\,\mathrm{U}\,s$. Thus, $C_{I}$ is a GPR cause iff $C_{I}$ satisfies (G). We have: Thus, $C_{I}$ is a GPR cause with recall at least $2(p_{0} + b)$ if and only if the two conditions in (5.9) hold, which again is equivalent to the satisfaction of the conditions in (5.7). But this yields that M has a GPR cause with recall at least $2(p_{0} + b)$ if and only if the knapsack problem is solvable for the input $A, A_{1}, \ldots, A_{n}, B, B_{1}, \ldots, B_{n}$.
NP-hardness of GPR-f-score. Using similar ideas, we also provide a polynomial reduction from the knapsack problem. Let $A, A_{1}, \ldots, A_{n}, B, B_{1}, \ldots, B_{n}$ be an input for the knapsack problem. We replace the A-sequence with $a, a_{1}, \ldots, a_{n}$ where $a = \frac{A}{N}$ and $a_{i} = \frac{A_{i}}{N}$ where N is as before. The topological structure of the Markov chain that we are going to construct is the same as in the NP-hardness proof for GPR-recall.
We will define polynomial-time computable values $p_{0}, p_{1}, \ldots, p_{n} \in\, ]0, 1[$ (where $p_{i} = P(\mathit{init}, s_{i})$), $w_{1}, \ldots, w_{n} \in\, ]0, 1[$ (where $w_{i} = P(s_{i}, \mathit{eff}_{i})$) and auxiliary variables $δ \in\, ]0, 1[$ and λ > 1 such that: Assuming such values have been defined, we obtain: For all positive real numbers x, y, u, v with $\frac{x}{y} = \frac{1}{λ}$ we have: By the constraints for λ (see (2)), we have $\frac{p_{0}}{p_{0} + \frac{1}{2} - δ} = \frac{1}{λ}$. Therefore: As before let $w_{0} = 1$ and $I_{0} = I \cup \{0\}$. Then, the above yields: As in the NP-hardness proof for GPR-recall and using (3.1): Thus, each GPR cause must have the form $C_{I} = \{s_{i} : i \in I_{0}\}$ for some subset I of $\{1, \ldots, n\}$. Moreover: So, the f-score of $C_{I}$ is: With $p_{0} = 2a$ and using (3.1) and arguments as in the NP-hardness proof for GPR-recall, we obtain: Thus, the constructed Markov chain has a GPR cause with f-score at least $\frac{2}{λ}$ if and only if the knapsack problem is solvable for the input $A, A_{1}, \ldots, A_{n}, B, B_{1}, \ldots, B_{n}$.
Arbitrary quality measures. Consider any algebraic function $f(\mathsf{tp}, \mathsf{tn}, \mathsf{fp}, \mathsf{fn})$. That is, $f$ satisfies some polynomial equation where the coefficients are polynomials in $\mathsf{tp}$, $\mathsf{tn}$, $\mathsf{fp}$ and $\mathsf{fn}$. Almost every quality measure for binary classifiers (see [Pow11]) is such a function. Taking the worst-case scheduler for such a function, we define $f(\mathsf{Cause})$ as the worst-case value of $f^{\mathfrak{S}}$, where $\mathfrak{S}$ ranges over all schedulers such that $f^{\mathfrak{S}}$ is well defined. Given a PR cause $\mathsf{Cause}$ and a rational $\vartheta \in \mathbb{Q}$, deciding whether $f(\mathsf{Cause}) \leqslant \vartheta$ can be done in PSPACE as a satisfiability problem in the existential theory of the reals [Can88]. As we can decide for a given cause candidate $\mathsf{Cause}$ whether it is an SPR cause in P or a GPR cause in coNP, this also yields an algorithm for finding optimal causes for $f$. Given an MDP $\mathcal{M}$ with terminal effect set $\mathsf{Eff}$ and a quality measure $f$ given as an algebraic function, we consider each cause candidate $\mathsf{Cause}$, check whether it is a PR cause (SPR or GPR) and consider the decision problem $f(\mathsf{Cause}) \leqslant \vartheta$. As all of these steps have a complexity upper bound of PSPACE and we only need to store the best cause candidate found so far together with its value $f(\mathsf{Cause})$, the overall procedure runs in polynomial space as well.
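The enumeration described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the two oracles `is_pr_cause` and `quality` are hypothetical placeholders for the PR-cause check and for the evaluation of $f(\mathsf{Cause})$, and the sketch assumes that larger values of $f$ are better. The point is merely that only the best candidate seen so far is stored, so the memory footprint stays small even though exponentially many candidates are visited.

```python
from itertools import combinations


def best_cause(states, is_pr_cause, quality):
    """Exhaustively search for a quality-optimal cause, keeping only the best so far.

    states       -- iterable of state names (hypothetical representation)
    is_pr_cause  -- hypothetical oracle: frozenset of states -> bool (SPR or GPR check)
    quality      -- hypothetical oracle: frozenset of states -> comparable value f(Cause)
    """
    ordered = sorted(states)
    best, best_value = None, None
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            candidate = frozenset(combo)
            if not is_pr_cause(candidate):
                continue  # not a PR cause, nothing to store
            value = quality(candidate)
            # Assumption: larger values of f are better.
            if best_value is None or value > best_value:
                best, best_value = candidate, value
    return best, best_value


# Toy oracles for illustration only (not derived from any MDP).
toy_states = {"a", "b", "c"}
toy_is_cause = lambda cand: "c" not in cand
toy_quality = lambda cand: len(cand)
result = best_cause(toy_states, toy_is_cause, toy_quality)
```

In a faithful implementation the two oracles would themselves be the expensive part (a coNP check for GPR causes and a query in the existential theory of the reals for $f$); the loop only organizes the search.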

ω-REGULAR EFFECT SCENARIOS
In this section, we turn to an extension of the previous definition of PR causes. So far, we considered both the cause and the effect as sets of states in an MDP M with state space S. We will refer to this setting as the state-based setting from now on. In a more general approach, we now consider the effect to be an ω-regular language rEff ⊆ S ω over the state space S. Note that we denote regular events as effects mainly by rEff to avoid confusion with effects as sets of states.
In a first step, we still consider sets of states $\mathsf{Cause} \subseteq S$ as causes, which we call reachability causes (Section 6.1). For reachability GPR causes, the techniques from the previous section are mostly still applicable. For reachability SPR causes, on the other hand, we observe that they take on the flavor of state-based GPR causes as well. Afterwards, we generalize the definition further to allow ω-regular co-safety properties over the state space $S$ as causes, which we call co-safety causes (Section 6.2). While this allows us to express much more involved cause-effect relationships, we will see that attempts at checking co-safety SPR causality or at finding good causes for a given effect encounter major new difficulties.

6.1. Sets of states as causes. Throughout this section, let $\mathcal{M} = (S, \mathit{Act}, P, \mathsf{init})$ be an MDP. As long as we use sets of states as causes, the definition of GPR and SPR causes can easily be adapted to ω-regular effects:

Definition 6.1 (Reachability GPR/SPR causes). Let $\mathcal{M}$ be as above. Let $\mathsf{rEff} \subseteq S^{\omega}$ be an ω-regular language over $S$ and $\mathsf{Cause}$ a nonempty subset of $S$ such that for each $c \in \mathsf{Cause}$, there is a scheduler $\mathfrak{S}$ with $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}((\neg \mathsf{Cause})\, \mathrm{U}\, c) > 0$. Then, $\mathsf{Cause}$ is said to be a reachability GPR cause for $\mathsf{rEff}$ iff the following condition (rG) holds:
(rG): For each scheduler $\mathfrak{S}$ where $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\Diamond \mathsf{Cause}) > 0$: $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\mathsf{rEff} \mid \Diamond \mathsf{Cause}) > \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\mathsf{rEff})$.
$\mathsf{Cause}$ is called a reachability SPR cause for $\mathsf{rEff}$ iff the following condition (rS) holds:
(rS): For each state $c \in \mathsf{Cause}$ and each scheduler $\mathfrak{S}$ where $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}((\neg \mathsf{Cause})\, \mathrm{U}\, c) > 0$: $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\mathsf{rEff} \mid (\neg \mathsf{Cause})\, \mathrm{U}\, c) > \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\mathsf{rEff})$.

There is one small caveat that we want to mention here: If the effect $\mathsf{rEff}$ is a reachability property $\Diamond \mathsf{Eff}$ for a set of states $\mathsf{Eff} \subseteq S$, then this new definition allows for GPR/SPR causes $\mathsf{Cause}$ not disjoint from the set of states $\mathsf{Eff}$. If two sets $\mathsf{Cause}, \mathsf{Eff} \subseteq S$ are disjoint, however, then $\mathsf{Cause}$ is a GPR/SPR cause for $\mathsf{Eff}$ according to Definition 3.1 iff $\mathsf{Cause}$ is a reachability GPR/SPR cause for the ω-regular event $\Diamond \mathsf{Eff}$ according to the new definition.
As we now view the effect as the ω-regular property on infinite executions, one can, nevertheless, argue that the temporal priority (C2) is captured by the new definition since the cause will be reached after finitely many steps if it is reached. We will address problems with this interpretation and a stronger notion of temporal priority in Section 6.1.3.
A first simple observation that follows as in the state-based setting is that a reachability SPR cause for $\mathsf{rEff}$ is also a reachability GPR cause for $\mathsf{rEff}$.

6.1.1. Checking causality and existence of reachability PR causes. To explore how this change of definition influences the previously established results for GPR and SPR causes, we have to clarify how effects will be represented. We use deterministic Rabin automata (DRAs) as they are expressive enough to capture all ω-regular languages and they are deterministic, which will allow us to form well-behaved products of the automata with MDPs. Let $\mathcal{M}$ be an MDP, $\mathsf{rEff}$ an effect given by the DRA $\mathcal{A}_{\mathsf{rEff}}$ and $\mathsf{Cause} \subseteq S$ a cause candidate. As a special case we again have Markov chains with no non-deterministic choices. Then, the conditions (rG) and (rS) can easily be checked by computing the corresponding probabilities in polynomial time (see [BKKM14] for algorithms to compute conditional probabilities in MCs for path properties). We now consider the case where non-deterministic choices exist. We will provide a model transformation of $\mathcal{M}$ using the DRA such that the resulting MDP has no end components and the effect is a reachability property again.
Notation 6.2 (Removing end components). Let M and A rEff be as above. Consider the product MDP N def = M ⊗ A rEff . This product is an MDP equipped with a Rabin acceptance condition found in the second component of each state in the product.
We now take two copies of N representing a mode before Cause has been reached and one mode after Cause has been reached. So, each state s is equipped with one extra bit 0 or 1. Initially, the MDP starts in the copy labeled with 0 and behaves like N until a state with its first component in Cause is reached. From there, the process moves to the corresponding successor states in the second copy labeled with 1. We call the resulting MDP N ′ and denote the set of states with their first component in Cause in the first copy that are reachable in N ′ by Cause A rEff , in particular, to express that these states are enriched with states of the automaton A rEff .
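The two-copy construction can be sketched as follows. This is a minimal illustration under simplifying assumptions: the states of $\mathcal{N}$ are treated as opaque names, the dictionary layout `state -> action -> successor -> probability` is hypothetical, and `cause` stands for the set of product states whose first component lies in $\mathsf{Cause}$.

```python
from collections import deque


def two_copy(transitions, cause, init):
    """Build the reachable part of the MDP with a mode bit (0 = before Cause, 1 = after).

    transitions -- dict: state -> action -> {successor: probability} (hypothetical layout)
    cause       -- set of states that count as cause states
    init        -- initial state; the construction starts in copy 0
    """
    result = {}
    queue = deque([(init, 0)])
    while queue:
        state, bit = queue.popleft()
        if (state, bit) in result:
            continue
        # Successors live in copy 1 once a cause state has been visited.
        next_bit = 1 if (bit == 1 or state in cause) else 0
        result[(state, bit)] = {}
        for action, dist in transitions.get(state, {}).items():
            result[(state, bit)][action] = {
                (succ, next_bit): prob for succ, prob in dist.items()
            }
            for succ in dist:
                queue.append((succ, next_bit))
    return result


# Toy example: init moves to the cause state c or to x; c moves on to x.
toy = {
    "init": {"a": {"c": 0.5, "x": 0.5}},
    "c": {"a": {"x": 1.0}},
    "x": {"a": {"x": 1.0}},
}
n_prime = two_copy(toy, {"c"}, "init")
# Cause states in the first copy that are reachable.
cause_copy0 = {st for st in n_prime if st[1] == 0 and st[0] in {"c"}}
```

In the toy example, the state `x` occurs in both copies, which is exactly what later allows covered and uncovered occurrences of the effect to be told apart.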
Next, we consider the MECs $E_1, \ldots, E_k$ of $\mathcal{N}'$. Note that the states in $\mathsf{Cause}_{\mathcal{A}_{\mathsf{rEff}}}$ are not contained in any MEC. Furthermore, all MECs consist either only of states from the first copy labeled with 0, or only of states from the second copy labeled with 1. For each MEC $E_i$, we determine whether there is a scheduler for $E_i$ that ensures the event $\mathrm{Acc}(\mathcal{A}_{\mathsf{rEff}})$ that the acceptance condition of $\mathcal{A}_{\mathsf{rEff}}$ is met with probability 1 and whether there is a scheduler that ensures this event with probability 0. With the techniques of [dA97] and [BGC09] this can be done in polynomial time. We then add four new terminal states $\mathsf{eff}_{\mathsf{cov}}$, $\mathsf{noeff}_{\mathsf{fp}}$, $\mathsf{eff}_{\mathsf{unc}}$, and $\mathsf{noeff}_{\mathsf{tn}}$ and construct the MEC-quotient of $\mathcal{N}'$ while, for each $i \leqslant k$, enabling in the state $s_{E_i}$ obtained from $E_i$
• a new action leading to $\mathsf{eff}_{\mathsf{unc}}$ if $\mathrm{Acc}(\mathcal{A}_{\mathsf{rEff}})$ can be ensured with probability 1 in $E_i$ and $E_i$ is contained in copy 0,
• another new action leading to $\mathsf{noeff}_{\mathsf{tn}}$ if $\mathrm{Acc}(\mathcal{A}_{\mathsf{rEff}})$ can be ensured with probability 0 in $E_i$ and $E_i$ is contained in copy 0,
• a new action leading to $\mathsf{eff}_{\mathsf{cov}}$ if $\mathrm{Acc}(\mathcal{A}_{\mathsf{rEff}})$ can be ensured with probability 1 in $E_i$ and $E_i$ is contained in copy 1, and
• another new action leading to $\mathsf{noeff}_{\mathsf{fp}}$ if $\mathrm{Acc}(\mathcal{A}_{\mathsf{rEff}})$ can be ensured with probability 0 in $E_i$ and $E_i$ is contained in copy 1.
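The case distinction for the new actions of a collapsed MEC can be written down compactly. The following sketch assumes that the two qualitative checks (acceptance ensured with probability 1, acceptance ensured with probability 0) have already been carried out for the MEC; the function name and the string names of the terminal states are illustrative choices, not notation from the construction itself.

```python
def new_terminal_targets(copy_bit, can_accept_prob1, can_accept_prob0):
    """Return the terminal states reachable by new actions from a collapsed MEC.

    copy_bit         -- 0 if the MEC lies in the copy before Cause, 1 if after
    can_accept_prob1 -- True if some scheduler in the MEC ensures acceptance with probability 1
    can_accept_prob0 -- True if some scheduler in the MEC ensures acceptance with probability 0
    """
    targets = []
    if can_accept_prob1:
        # Effect occurs: uncovered before the cause, covered after it.
        targets.append("eff_unc" if copy_bit == 0 else "eff_cov")
    if can_accept_prob0:
        # Effect does not occur: true negative before the cause, false positive after it.
        targets.append("noeff_tn" if copy_bit == 0 else "noeff_fp")
    return targets
```

For instance, a MEC in copy 1 in which both qualitative checks succeed obtains two new actions, one to `eff_cov` and one to `noeff_fp`.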
Finally, we remove all states which are not reachable (from the initial state . The scheduler $\mathfrak{T}$ can be mimicked by a scheduler $\mathfrak{S}$ for $\mathcal{M}$: As long as $\mathfrak{T}$ moves through the MEC-quotient of $\mathcal{N}'$, the scheduler $\mathfrak{S}$ follows this behavior by leaving MECs through the corresponding actions in $\mathcal{M}$. Whenever $\mathfrak{T}$ moves to one of the states $\mathsf{eff}_{\mathsf{cov}}$ and $\mathsf{eff}_{\mathsf{unc}}$, the last step begins in a state $s_E$ obtained from a MEC $E$ of $\mathcal{M}$. In this case, $\mathfrak{S}$ will stay in $E$ while ensuring with probability 1 that the resulting run is accepted by $\mathcal{A}_{\mathsf{rEff}}$. Similarly, if $\mathfrak{T}$ moves to $\mathsf{noeff}_{\mathsf{tn}}$ or $\mathsf{noeff}_{\mathsf{fp}}$, the scheduler $\mathfrak{S}$ for $\mathcal{M}$ stays in the corresponding MEC and realizes the acceptance condition of $\mathcal{A}_{\mathsf{rEff}}$ with probability 0. Vice versa, a scheduler $\mathfrak{S}$ for $\mathcal{M}$ can be mimicked by a scheduler $\mathfrak{T}$ for $\mathcal{M}_{[\mathsf{rEff},\mathsf{Cause}]}$ analogously. So in an end component $E$, if $\mathfrak{S}$ stays in $E$ and ensures that the resulting run is accepted by $\mathcal{A}_{\mathsf{rEff}}$ with probability $p_1$, stays in $E$ and ensures that the resulting run is not accepted by $\mathcal{A}_{\mathsf{rEff}}$ with probability $p_2$, and leaves $E$ with probability $p_3$, then $\mathfrak{T}$ will move to the corresponding state $\mathsf{eff}_{\mathsf{cov}}$ or $\mathsf{eff}_{\mathsf{unc}}$ with probability $p_1$, to $\mathsf{noeff}_{\mathsf{tn}}$ or $\mathsf{noeff}_{\mathsf{fp}}$ with probability $p_2$, and take the actions leaving $s_E$ to other states with the same probability distribution with which $\mathfrak{S}$ takes the leaving actions of $E$. For such a pair of schedulers $\mathfrak{S}$ and $\mathfrak{T}$, we observe that

This Lemma 6.3 shows that reachability SPR causality shares similarities with GPR causality. Our algorithmic results for reachability PR causes stem from the reduction provided by Notation 6.2 and Lemma 6.3. As an immediate consequence we can check conditions (rG) and (rS) in $\mathcal{M}_{[\mathsf{rEff},\mathsf{Cause}]}$ by using the already established algorithms for GPR causes. This results in the following complexity upper bounds.
Corollary 6.4. Let M be an MDP and rEff ⊆ S ω an ω-regular language given as DRA A rEff . Given a set Cause ⊆ S we can decide whether Cause is a reachability SPR/GPR cause for rEff in coNP.
Proof. By Lemma 6.3, we can use the construction of $\mathcal{M}_{[\mathsf{rEff},\mathsf{Cause}]}$, which takes polynomial time. Then, Theorem 4.12 allows us to check whether $\mathsf{Cause}$ is a reachability GPR cause in coNP directly, while we can apply the GPR check to each set $c_{\mathcal{A}_{\mathsf{rEff}}}$ and the effect $\{\mathsf{eff}_{\mathsf{cov}}, \mathsf{eff}_{\mathsf{unc}}\}$ in $\mathcal{M}_{[\mathsf{rEff},\mathsf{Cause}]}$ for $c \in \mathsf{Cause}$ in order to determine whether $\mathsf{Cause}$ is a reachability SPR cause for $\mathsf{rEff}$.
As in the state-based setting, we can argue that there is a reachability GPR cause iff there is a reachability SPR cause iff there is a singleton reachability SPR cause. Consequently, the existence of a reachability GPR/SPR cause can be checked by checking for each state $c$ of the MDP whether $\{c\}$ constitutes a reachability SPR cause. We conclude:

Corollary 6.5. The existence of a reachability GPR/SPR cause can be decided in coNP.

6.1.2. Computing quality measures of reachability PR causes. As in Section 5.1, we can view reachability PR causes as binary classifiers. This leads to the confusion matrix as before (Figure 13) with the difference that this time the path does not "hit" the effect set, but rather $\mathsf{rEff}$ holds on the path. Hence, we define the following entries of the confusion matrix: Given $\mathsf{rEff}$, $\mathsf{Cause}$ and a scheduler $\mathfrak{S}$, we let where the values for $\mathfrak{T}$ are defined in $\mathcal{M}_{[\mathsf{rEff},\mathsf{Cause}]}$ as for the state-based setting (cf. Section 5.1).
Proof. In the proof of Lemma 6.3, we have seen that for each scheduler S for M, we can find a scheduler T for M [rEff,Cause] and vice versa such that Equations (6.1) -(6.3) hold. This implies the equalities claimed here.
Analogously to Section 5.1, we can now define recall, covrat, and fscore of a reachability PR cause as the infimum of these values in terms of tp S , fp S , tn S , and fn S over all schedulers S for which the respective quality measures are defined. The computation of these values can then be done with the methods from the state-based setting: Corollary 6.7. Let M be an MDP and rEff ⊆ S ω an ω-regular language given as DRA A rEff . Given a reachability SPR/GPR cause Cause ⊆ S we can compute recall(Cause), covrat(Cause) and fscore(Cause) in polynomial time.
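For a single fixed scheduler, the quality measures can be read off the four confusion-matrix entries. The sketch below assumes the usual binary-classifier definitions (recall as tp/(tp+fn), precision as tp/(tp+fp), the f-score as their harmonic mean) and reads the coverage ratio as tp/fn; these formulas, the function name, and the convention of returning `None` for undefined values and 0 for the f-score when precision and recall are both 0 are assumptions of this illustration. The infimum over all schedulers, which defines the measures of a cause, is not computed here.

```python
from fractions import Fraction


def quality_measures(tp, fp, fn, tn):
    """Recall, precision, coverage ratio and f-score for one fixed scheduler.

    tp, fp, fn, tn -- probabilities of reaching eff_cov, noeff_fp, eff_unc, noeff_tn
    Undefined measures (division by zero) are reported as None.
    tn is accepted for completeness; none of the four measures below uses it.
    """
    recall = tp / (tp + fn) if tp + fn > 0 else None
    precision = tp / (tp + fp) if tp + fp > 0 else None
    covrat = tp / fn if fn > 0 else None
    if recall is None or precision is None:
        fscore = None
    elif recall + precision == 0:
        fscore = Fraction(0)  # assumed convention when nothing is covered
    else:
        fscore = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision,
            "covrat": covrat, "fscore": fscore}


# Illustrative values only; the four entries sum to 1.
example = quality_measures(Fraction(1, 4), Fraction(1, 12),
                           Fraction(5, 12), Fraction(1, 4))
```

Using exact fractions avoids rounding issues when such values are later compared against a rational threshold.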
Proof. Use Lemma 6.6, Corollary 5.4 and Theorem 5.5. 6.1.3. Finding quality optimal reachability PR causes. When trying to find good causes for an ω-regular effect rEff, we cannot say that effects and causes should be disjoint as in the setting where effects and causes were sets of states. This leads to the possibility that causes might exist that do not capture the intuition behind temporal priority: E.g., if the effect rEff is a reachability property ♢E for a set of states E, the set of states E itself will be a reachability PR cause if for each state c ∈ E, Pr max M (¬E U c) > 0 holds. Furthermore, there might be causes C that can only be reached after E has already been reached.
In order to account for the temporal priority of causes, we will include the following condition when trying to find good causes: We require for a cause $\mathsf{Cause} \subseteq S$ that
(TempPrio): $\mathrm{Pr}^{\min}_{\mathcal{M}}(\mathsf{rEff} \mid \neg \mathsf{Cause}\, \mathrm{U}\, c) < 1$ for all $c \in \mathsf{Cause}$.
Intuitively, this states that it is never already certain that the execution will belong to $\mathsf{rEff}$ when the cause is reached. A variation of this criterion has also been proposed in [BDF+21].
Remark 6.8. The condition (TempPrio) could be added to the definition of reachability PR causes. After the product construction in Notation 6.2, the condition can easily be checked for a given cause candidate $\mathsf{Cause} \subseteq S$: For each $c \in \mathsf{Cause}$, there must be at least one state with $c$ in the first component in $\mathcal{M}_{[\mathsf{rEff},\mathsf{Cause}]}$ such that $\mathsf{noeff}_{\mathsf{fp}}$ is reachable from this state. Furthermore, this condition is stronger than the requirement that causes and effects are disjoint in the state-based setting. In the state-based setting, however, the analogue of condition (TempPrio) could also be included easily. Instead of having to be disjoint from a set of effect states $\mathsf{Eff}$, a cause $\mathsf{Cause}$ would then simply not be allowed to contain any state $s$ with $\mathrm{Pr}^{\min}_{\mathcal{M},s}(\Diamond \mathsf{Eff}) = 1$. ◁

Now, we want to find recall- and coverage-ratio-optimal reachability SPR causes. As in the state-based setting, we define the set $\mathcal{C}$ of all possible singleton reachability SPR causes for $\mathsf{rEff}$ that in addition satisfy (TempPrio) as explained in Remark 6.8. By Corollary 6.4, we can check whether a state $c \in S$ is a singleton reachability SPR cause in coNP; whether there is at least one state with $c$ in the first component in $\mathcal{M}_{[\mathsf{rEff},\mathsf{Cause}]}$ such that $\mathsf{noeff}_{\mathsf{fp}}$ is reachable from this state is checkable in polynomial time. Thus, we can again define the set of singleton reachability SPR causes $\mathcal{C} = \{c \in S \mid \{c\}$ is a reachability SPR cause satisfying $\mathrm{Pr}^{\min}_{\mathcal{M}}(\mathsf{rEff} \mid \neg \mathsf{Cause}\, \mathrm{U}\, c) < 1$ for $\mathsf{Cause} = \{c\}\}$. As before, the canonical cause $\mathsf{CanCause}$ is now the set of states $c \in \mathcal{C}$ for which there is a scheduler $\mathfrak{S}$ with $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\neg \mathcal{C}\, \mathrm{U}\, c) > 0$.
For the complexity of the computation of the recall- and coverage-ratio-optimal canonical cause and its values, the observations above lead us to the complexity class $\mathrm{PF}^{\mathrm{NP}}$ as defined in [Sel94]. It consists of all functions that can be computed in polynomial time with access to an NP-oracle, or equivalently a coNP-oracle.
Proposition 6.9. If $\mathcal{C} \neq \emptyset$ then $\mathsf{CanCause}$ is a ratio- and recall-optimal reachability SPR cause for $\mathsf{rEff}$. The threshold problem for the coverage ratio and the recall can be decided in coNP. The optimal values $\mathrm{recall}(\mathsf{CanCause})$ and $\mathrm{covrat}(\mathsf{CanCause})$ can be computed in $\mathrm{PF}^{\mathrm{NP}}$.
Proof. The optimality of $\mathsf{CanCause}$ follows by the arguments used for Theorem 5.11. In order to compute $\mathsf{CanCause}$, we check for each state $s$ whether (rS) does not hold in NP and then take the remaining states $c \in S$ to define $\mathcal{C}$ by checking (TempPrio) for each $c$. If (TempPrio) holds then $c \in \mathcal{C}$, and $\mathsf{CanCause} = \{c \in \mathcal{C} \mid \mathrm{Pr}^{\max}_{\mathcal{M}}(\neg \mathcal{C}\, \mathrm{U}\, c) > 0\}$. This allows us to compute the recall- and coverage-ratio-optimal cause $\mathsf{CanCause}$ in $\mathrm{PF}^{\mathrm{NP}}$.
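Once the set of singleton causes is known, the last step of the computation is a plain graph search: the maximal probability of reaching $c$ while avoiding all other candidate states is positive iff some finite path from the initial state reaches $c$ without visiting a candidate state earlier. The sketch below illustrates this under the assumption that the MDP is given by its underlying graph (an edge for every transition that has positive probability under some action); the function name and the input layout are hypothetical.

```python
from collections import deque


def canonical_cause(successors, init, singleton_causes):
    """States of singleton_causes reachable from init without passing through that set before.

    successors       -- dict: state -> iterable of successor states (underlying graph)
    init             -- initial state
    singleton_causes -- set of states that are singleton SPR causes satisfying (TempPrio)
    """
    canonical = set()
    seen = {init}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if state in singleton_causes:
            canonical.add(state)
            continue  # do not search behind a cause state
        for succ in successors.get(state, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return canonical


# Toy graph: c2 is only reachable via c1, so it is not part of the canonical cause.
toy_graph = {"init": ["c1", "x"], "c1": ["c2"], "x": ["c3"], "c2": [], "c3": []}
can_cause = canonical_cause(toy_graph, "init", {"c1", "c2", "c3"})
```

The search runs in time linear in the size of the graph, so the oracle calls for determining the singleton causes dominate the overall cost.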
For the threshold problem whether there is a reachability SPR cause with recall at least a given $\vartheta \in \mathbb{Q}$, the coNP upper bound can be shown as follows: For each state $c$ that does not belong to $\mathcal{C}$, i.e., that is not a singleton reachability SPR cause, there is a polynomial-size certificate for this, as it can be checked in coNP. The collection of all states that do not belong to $\mathcal{C}$ together with these certificates (that they do not belong to $\mathcal{C}$) can now serve as a certificate that the optimal recall is less than $\vartheta$ in this case. Given such a collection of certificates, we can check in polynomial time that indeed all provided states do not belong to $\mathcal{C}$. The complement of the provided states forms a superset $D$ of $\mathsf{CanCause}$. Computing the recall of the set $D$ can then be done in polynomial time as in Corollary 6.7. This value is an upper bound for the recall of $\mathsf{CanCause}$. Note that if all states that do not belong to $\mathcal{C}$ are given in the certificate, the value even equals the recall of $\mathsf{CanCause}$.
So, if the optimal value is less than ϑ, this is witnessed by the described certificate containing all states not in C. Vice versa, if a certificate is given resulting in a super set D of CanCause such that the recall of D is less than ϑ, then there is no reachability SPR cause with a recall of at least ϑ. So, the threshold problem lies in coNP. For the coverage ratio, the analogous argument works.
For the threshold problems for f-score-optimal reachability SPR causes and reachability GPR causes optimal with respect to recall, coverage ratio or f-score that satisfy (TempPrio), we rely on the guess-and-check approach used for optimal GPR causes in the state-based setting. We guess a subset $\mathsf{Cause}$ of states of the MDP, check whether we found a reachability SPR/GPR cause in coNP, and compute the quality measure under consideration in polynomial time as explained in the previous section. For the computation of an optimal cause, we obtain a polynomial-space algorithm.
Corollary 6.10. Let $\mathcal{M}$ be an MDP and $\mathsf{rEff}$ be an ω-regular language given by $\mathcal{A}_{\mathsf{rEff}}$. Given $\vartheta \in \mathbb{Q}$, deciding whether there exists a reachability GPR cause $\mathsf{Cause}$ with $\mathrm{recall}(\mathsf{Cause}) \geqslant \vartheta$ (resp. $\mathrm{covrat}(\mathsf{Cause}) \geqslant \vartheta$, $\mathrm{fscore}(\mathsf{Cause}) \geqslant \vartheta$) can be done in $\Sigma^{P}_{2}$ and is NP-hard. NP-hardness even holds for Markov chains.
Deciding whether there is a reachability SPR cause $\mathsf{Cause}$ with $\mathrm{fscore}(\mathsf{Cause}) \geqslant \vartheta$ can be done in $\Sigma^{P}_{2}$. A recall-, covratio-, or f-score-optimal reachability GPR cause as well as an f-score-optimal reachability SPR cause can be computed in polynomial space.
Proof. The lower bounds extend to this setting as we can interpret GPR causes for $\mathsf{Eff}$ as reachability GPR causes for $\Diamond \mathsf{Eff}$. The upper bounds extend to this setting by using the construction from Notation 6.2. Since for Theorem 5.16 we relied on guess-and-check algorithms to establish the upper bounds for the threshold problems, we can use analogous algorithms in the setting of ω-regular effects. We guess a set $\mathsf{Cause} \subseteq S$, check the reachability GPR causality in coNP (Corollary 6.4) and compute the value of the quality measure in polynomial time (Corollary 6.7). Again, the alternation between the existential quantification for guessing $\mathsf{Cause}$ and the universal quantification for the coNP check results in the complexity $\Sigma^{P}_{2}$ of the polynomial-time hierarchy. In order to show that the decision problem for $\mathrm{fscore}(\mathsf{Cause}) \geqslant \vartheta$ for SPR causes $\mathsf{Cause}$ is in $\Sigma^{P}_{2}$, we resort to the constructed MDP $\mathcal{M}_{[\mathsf{rEff},\mathsf{Cause}]}$. By Lemma 6.3 a reachability SPR cause $\mathsf{Cause}$ in $\mathcal{M}$ corresponds to a set of GPR causes $\{c_{\mathcal{A}_{\mathsf{rEff}}} \mid c \in \mathsf{Cause}\}$ in $\mathcal{M}_{[\mathsf{rEff},\mathsf{Cause}]}$, which can be interpreted as the GPR cause $C = \bigcup_{c \in \mathsf{Cause}} c_{\mathcal{A}_{\mathsf{rEff}}}$. This way we can encode the property $\mathrm{fscore}(\mathsf{Cause}) \geqslant \vartheta$ in $\mathcal{M}$ by $\mathrm{fscore}(C) \geqslant \vartheta$ in $\mathcal{M}_{[\mathsf{rEff},\mathsf{Cause}]}$. This results in an instance of the decision problem GPR-f-score for GPR causes, which is in $\Sigma^{P}_{2}$ by Theorem 5.16. For computing optimal GPR causes as well as f-score-optimal SPR causes we can try all cause candidates by computing the related value (recall, covrat or fscore) and always store the best one so far. As the space for the cause can be reused, this results in a polynomial-space algorithm.
6.2. ω-regular co-safety properties as causes. We now want to discuss an extension of the previous framework when we also consider causes to be regular sets of executions. However, in order to account for the temporal priority of causes, i.e., the fact that causes should occur before their effects, it makes sense to restrict causes to ω-regular co-safety properties. The reason is that an ω-regular co-safety property L is uniquely determined by the regular set of minimal good prefixes of words in L. Recall that a good prefix π for L is a finite word such that all infinite extensions of π belong to L and that all infinite words in the co-safety language L have a good prefix. Hence, we can say that a cause rCause occurred as soon as a good prefix for rCause has been produced. For this subsection we will denote regular effects and causes mainly by rEff and rCause to avoid confusion with effects and causes as sets of states. In the following formal definition, we use finite words σ ∈ S * to denote the event σS ω .
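Definition 6.11 (co-safety GPR/SPR causes). Let $\mathcal{M}$ be an MDP with state space $S$ and let $\mathsf{rEff} \subseteq S^{\omega}$ be an ω-regular language. An ω-regular co-safety language $\mathsf{rCause} \subseteq S^{\omega}$ is a co-safety GPR cause for $\mathsf{rEff}$ if the following condition (coG) holds:
(coG): For each scheduler $\mathfrak{S}$ where $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\mathsf{rCause}) > 0$: $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\mathsf{rEff} \mid \mathsf{rCause}) > \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\mathsf{rEff})$.
The event $\mathsf{rCause}$ is called a co-safety SPR cause for $\mathsf{rEff}$ if the following condition (coS) holds:
(coS): For each minimal good prefix $\sigma$ for $\mathsf{rCause}$ and each scheduler $\mathfrak{S}$ where $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\sigma) > 0$: $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\mathsf{rEff} \mid \sigma) > \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\mathsf{rEff})$.
As in the state-based setting, it follows that co-safety SPR causes are also co-safety GPR causes.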
6.2.1. Checking co-safety causality. We will represent co-safety PR causes as DFAs which accept good prefixes of the represented ω-regular event. Note that, for any ω-regular co-safety property, there is a DFA accepting exactly the minimal good prefixes. So, we will restrict to such DFAs that accept the minimal good prefixes of an ω-regular co-safety property. Such a DFA can never accept a word $w$ as well as a proper prefix $v$ of $w$. Let now $\mathcal{M}$ be an MDP, $\mathsf{rEff}$ an effect given by the DRA $\mathcal{A}_{\mathsf{rEff}}$ and $\mathsf{rCause}$ a cause candidate given by a DFA $\mathcal{A}_{\mathsf{rCause}}$ as above. So, in particular, $\mathcal{A}_{\mathsf{rCause}}$ accepts exactly the minimal good prefixes for $\mathsf{rCause}$. We now want to check whether $\mathsf{rCause}$ is a co-safety SPR cause (resp. co-safety GPR cause). For the special case of Markov chains the check can be done in polynomial time analogously to reachability PR causes by computing the corresponding conditional probabilities. We can provide a model transformation of $\mathcal{M}$ using both automata such that the resulting MDP has no end components and the effect is a reachability property again, similar to Notation 6.2.
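The role of the DFA for minimal good prefixes can be illustrated with a small sketch: the cause is said to have occurred at the first position at which the DFA enters an accepting state. The transition-table layout, the function name and the example automaton (for the property "eventually b" over the letters a and b) are assumptions made for this illustration.

```python
def first_good_prefix(delta, start, accepting, word):
    """Return the minimal good prefix of `word`, or None if no prefix is accepted.

    delta     -- dict: (dfa_state, letter) -> dfa_state (total transition function)
    start     -- initial DFA state
    accepting -- set of accepting DFA states
    word      -- finite sequence of letters (a finite prefix of an execution)
    """
    state = start
    for position, letter in enumerate(word):
        state = delta[(state, letter)]
        if state in accepting:
            return list(word[: position + 1])
    return None


# DFA for the minimal good prefixes of "eventually b": words of the form a...ab.
# After acceptance the DFA moves to a rejecting sink, so no extension is accepted again.
delta = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "sink", ("q1", "b"): "sink",
    ("sink", "a"): "sink", ("sink", "b"): "sink",
}
prefix = first_good_prefix(delta, "q0", {"q1"}, ["a", "a", "b", "a"])
```

The rejecting sink after the accepting state reflects the requirement that the DFA never accepts both a word and one of its proper prefixes.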
For this consider the product N def = M ⊗ A rEff ⊗ A rCause . This product is an MDP equipped with two kinds of acceptance conditions. The Rabin acceptance of A rEff in the second component of each state and the acceptance condition of A rCause in the third component. Now let rCause A rEff be the set of all states of N whose third component is accepting in A rCause and which are reachable from the initial state.
As in Notation 6.2, we construct an MDP N ′ by introducing a mode before rCause A rEff and a mode after rCause A rEff . We then take the MEC-quotient with the four terminal states eff cov , noeff fp , eff unc , and noeff tn , which are reachable from states s E that result from collapsing the MEC E depending on whether E is contained in the before-or after-rCause A rEff mode and whether the acceptance condition of A rEff can be realized with probability 0 and 1, respectively in E, analogously to Notation 6.2. We call the resulting MDP M [rEff,rCause] and emphasize that this MDP still contains all states in rCause A rEff as they are not contained in any end component.
We start with the observation, that for co-safety GPR causes this reduction characterizes the condition (coG) completely.
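Lemma 6.12. Let $\mathcal{M}$ be an MDP, $\mathcal{A}_{\mathsf{rEff}}$ a DRA, and $\mathcal{A}_{\mathsf{rCause}}$ a DFA as above, and let $\mathcal{M}_{[\mathsf{rEff},\mathsf{rCause}]}$ be the constructed MDP that contains the set $\mathsf{rCause}_{\mathcal{A}_{\mathsf{rEff}}}$ of reachable states that have an accepting $\mathcal{A}_{\mathsf{rCause}}$-component. Then, $\mathsf{rCause}$ is a co-safety GPR cause for $\mathsf{rEff}$ in $\mathcal{M}$ if and only if the set of states $\mathsf{rCause}_{\mathcal{A}_{\mathsf{rEff}}}$ is a GPR cause for $\{\mathsf{eff}_{\mathsf{cov}}, \mathsf{eff}_{\mathsf{unc}}\}$ in $\mathcal{M}_{[\mathsf{rEff},\mathsf{rCause}]}$.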
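Proof. The set $\mathsf{rCause}_{\mathcal{A}_{\mathsf{rEff}}}$ in $\mathcal{M}_{[\mathsf{rEff},\mathsf{rCause}]}$ satisfies $\mathrm{Pr}^{\max}_{\mathcal{M}_{[\mathsf{rEff},\mathsf{rCause}]}}(\neg \mathsf{rCause}_{\mathcal{A}_{\mathsf{rEff}}}\, \mathrm{U}\, c) > 0$ for each $c \in \mathsf{rCause}_{\mathcal{A}_{\mathsf{rEff}}}$ by construction, since all states in $\mathsf{rCause}_{\mathcal{A}_{\mathsf{rEff}}}$ are reachable and a run cannot reach two different states in $\mathsf{rCause}_{\mathcal{A}_{\mathsf{rEff}}}$. Thus, the minimality condition is satisfied in any case. Now, let $\mathfrak{S}$ be a scheduler for $\mathcal{M}_{[\mathsf{rEff},\mathsf{rCause}]}$. The scheduler $\mathfrak{S}$ can be mimicked by a scheduler $\mathfrak{T}$ for $\mathcal{M}$: As long as $\mathfrak{S}$ moves through the MEC-quotient of $\mathcal{N}'$, the scheduler $\mathfrak{T}$ follows this behavior by leaving MECs through the corresponding actions in $\mathcal{M}$. Whenever $\mathfrak{S}$ moves to one of the states $\mathsf{eff}_{\mathsf{cov}}$ and $\mathsf{eff}_{\mathsf{unc}}$, the last step begins in a state $s_E$ obtained from a MEC $E$ of $\mathcal{M}$. In this case, $\mathfrak{T}$ will stay in $E$ while ensuring with probability 1 that the resulting run is accepted by $\mathcal{A}_{\mathsf{rEff}}$. Similarly, if $\mathfrak{S}$ moves to $\mathsf{noeff}_{\mathsf{tn}}$ or $\mathsf{noeff}_{\mathsf{fp}}$, the scheduler $\mathfrak{T}$ for $\mathcal{M}$ stays in the corresponding MEC and realizes the acceptance condition of $\mathcal{A}_{\mathsf{rEff}}$ with probability 0. Vice versa, a scheduler $\mathfrak{T}$ for $\mathcal{M}_{[\mathsf{rEff},\mathsf{rCause}]}$ can be mimicked by a scheduler $\mathfrak{S}$ for $\mathcal{M}$ analogously (see also the proof of Lemma 6.3). For such a pair of schedulers $\mathfrak{S}$ and $\mathfrak{T}$, we observe that

Example 6.14. Consider the MDP $\mathcal{M}$ from Figure 17 with $\mathsf{rEff} = \Diamond \mathsf{eff}$. For every scheduler $\mathfrak{S}$ with $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\Diamond c) > 0$ we have $\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\Diamond \mathsf{eff} \mid \Diamond c) > \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\Diamond \mathsf{eff})$ and thus $c$ is a state-based SPR cause for $\mathsf{eff}$. On the other hand, for the scheduler $\tau$, which chooses $\alpha$ after the path $\pi = a\, b\, c$ and $\beta$ otherwise, we have $\mathrm{Pr}^{\tau}_{\mathcal{M}}(\Diamond \mathsf{eff} \mid \pi) = \frac{1}{2} = \mathrm{Pr}^{\tau}_{\mathcal{M}}(\Diamond \mathsf{eff})$.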
Therefore, the desired reduction as in Lemma 6.12 does not work for $\mathcal{M}$. Note that the violation of the condition (coS) is only possible in this example if the scheduler behaves differently depending on how state $c$ is reached. This different behavior, however, has nothing to do with the effect or with potentially different residual properties that have to be satisfied to achieve the effect; in the example, the effect is just a reachability property. Furthermore, we want to emphasize that the concrete probabilities of the individual paths leading to $c$ are important for the violation. In general, this imposes a major challenge for checking the condition (coS), for which we do not know a solution. Similar problems arise when trying to check the existence of a co-safety SPR cause. A witness might be just one individual path, potentially only together with a scheduler that realizes this path with very low probability. ◁

6.2.2. Computation of quality measures of co-safety causes. Analogously to Section 6.1.2, we can define recall, coverage ratio, and f-score of co-safety PR causes. With the construction of $\mathcal{M}_{[\mathsf{rEff},\mathsf{rCause}]}$ and the correspondence between schedulers $\mathfrak{S}$ for $\mathcal{M}$ and $\mathfrak{T}$ for $\mathcal{M}_{[\mathsf{rEff},\mathsf{rCause}]}$ satisfying Equations (6.4)-(6.6) established in the proof of Lemma 6.12, we obtain analogously to the setting of reachability PR causes:

Corollary 6.15. Let $\mathcal{M}$ be an MDP and $\mathsf{rEff} \subseteq S^{\omega}$ an ω-regular language given as DRA $\mathcal{A}_{\mathsf{rEff}}$. Given a co-safety SPR/GPR cause $\mathsf{rCause} \subseteq S^{\omega}$ by a DFA $\mathcal{A}_{\mathsf{rCause}}$, we can compute $\mathrm{recall}(\mathsf{rCause})$, $\mathrm{covrat}(\mathsf{rCause})$ and $\mathrm{fscore}(\mathsf{rCause})$ in polynomial time.
6.2.3. Finding optimal co-safety PR causes. Already for reachability PR causes, we have seen that without further restrictions on the causes we allow, causes might be trivial and intuitively violate the idea of temporal priority (cf. Section 6.1.3). Hence, also here, we impose an additional condition, a variation of the condition (TempPrio) used above: In line with the difference between the definitions of reachability PR causes and co-safety PR causes, we require that after any good prefix $\sigma$ of a co-safety cause $\mathsf{rCause}$, the probability that the effect $\mathsf{rEff}$ will occur is not guaranteed to be 1, i.e., we require that
(TempPrio2): $\mathrm{Pr}^{\min}_{\mathcal{M}}(\mathsf{rEff} \mid \sigma) < 1$ for all good prefixes $\sigma$ of $\mathsf{rCause}$.
Unfortunately, we will observe that there are some obstacles in the way when trying to find optimal co-safety PR causes. Following the observations from Theorem 5.11 we can define a canonical co-safety PR cause which is an optimal co-safety SPR cause for both recall and coverage ratio. In this fully path-based setting this canonical cause consists of all minimal paths which are singleton co-safety SPR causes. However, as we are not aware of a feasible way to check (coS), the computation of this cause is unclear. For co-safety GPR causes, the following example illustrates that there might be no recall-optimal causes that respect (TempPrio2). Intuitively, the reason is that causes can be pushed arbitrarily close towards a violation of the probability raising condition while increasing the recall: Example 6.16. This example will show that there is a Markov chain M with a state e such that the effect rEff = ♢e has regular GPR causes that respect the condition (TempPrio2), but no recall-optimal co-safety GPR cause that respects (TempPrio2).
Consider the Markov chain $\mathcal{M}$ depicted in Figure 18 with states $S$ and the effect $\mathsf{rEff} = \Diamond e$. First of all, we have that $\mathrm{Pr}_{\mathcal{M}}(\mathsf{rEff}) = 2/3$. Furthermore, the cause $\mathsf{init}\, b\, S^{\omega}$ with the unique minimal good prefix $\mathsf{init}\, b$ is clearly a regular GPR cause, as $\mathrm{Pr}_{\mathcal{M}}(\mathsf{rEff} \mid \mathsf{init}\, b) = 3/4$. Next, note that there cannot be a regular GPR cause $\mathsf{rCause}'$ that does not have $\mathsf{init}\, b$ as a minimal good prefix. By (TempPrio2) the minimal good prefixes are not allowed to end in $e$. Furthermore, $\mathsf{init}$ is clearly also no candidate for a minimal good prefix of $\mathsf{rCause}'$, as that would imply that $\mathsf{rCause}'$ consists of all paths of $\mathcal{M}$ and cannot satisfy the condition (rGPR). If $\mathsf{init}\, b$ is not a minimal good prefix, all minimal good prefixes have to end in $a$, $c$, or $f$. Afterwards, the probability to reach $e$ is at most $1/4$ and hence also $\mathrm{Pr}_{\mathcal{M}}(\mathsf{rEff} \mid \mathsf{rCause}') \leqslant 1/4$, because this probability is a weighted average of the probabilities $\mathrm{Pr}_{\mathcal{M}}(\mathsf{rEff} \mid \sigma) \leqslant 1/4$ of the minimal good prefixes $\sigma$ of $\mathsf{rCause}'$.
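So, all regular GPR causes have the minimal good prefix $\mathsf{init}\, b$ together with potentially further minimal good prefixes. Let now $\mathsf{rCause}_p = \mathsf{init}\, b\, S^{\omega} \cup A_p\, S^{\omega}$, where $A_p$ is a regular subset of the paths $\{\mathsf{init}\, a^k\, c \mid k \geqslant 1\}$ such that all the paths in $A_p$ together have probability mass $\frac{1}{3} p$. Note that we can find such a set $A_p$ for a dense set of values $p \in [0, 1]$. We compute $\mathrm{Pr}_{\mathcal{M}}(\mathsf{rCause}_p) = \frac{1}{3}(1 + p)$ and $\mathrm{Pr}_{\mathcal{M}}(\mathsf{rCause}_p \wedge \mathsf{rEff}) = \frac{1}{4} + \frac{1}{12} p$. Hence, $\mathrm{Pr}_{\mathcal{M}}(\mathsf{rEff} \mid \mathsf{rCause}_p) = \frac{3 + p}{4(1 + p)}$, which exceeds $\mathrm{Pr}_{\mathcal{M}}(\mathsf{rEff}) = \frac{2}{3}$ if and only if $p < \frac{1}{5}$.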
So, among the co-safety GPR causes of the form $\mathsf{rCause}_p$, there is no recall-optimal one. For $p$ tending to $1/5$ from below, the recall always increases. Note also that an ε-recall-optimal co-safety GPR cause for $\varepsilon > 0$ must take a very complicated form. It has to select paths of the form $\mathsf{init}\, a^k\, c$ that have probability $\frac{1}{3 \cdot 2^k}$ such that their probabilities add up to a value less than, but close to, $\frac{1}{15}$. ◁

We have seen that for recall-optimal (and hence coverage-ratio-optimal) SPR causes for an effect given by $\mathcal{A}_{\mathsf{rEff}}$, we can provide a characterization of the canonical cause. How to compute this cause, however, is unclear as we do not know how to check the co-safety SPR condition and as there might be some paths ending in a given state in the product of $\mathcal{M}$ and $\mathcal{A}_{\mathsf{rEff}}$ that belong to the canonical cause while other paths ending in that state do not. For co-safety GPR causes, we have even seen that there might be no (non-trivial) optimal causes and that causes close to the optimum can be required to take a very complicated shape. As the f-score is a more involved quality measure than the recall, we cannot expect that the search for f-score-optimal causes is simpler. It seems likely that the situation is at least as bad as for recall-optimal causes, if not worse.
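The numbers in Example 6.16 can be double-checked with exact rational arithmetic. The sketch below takes the three probabilities stated in the example as given and assumes that the recall of a cause is the conditional probability of the cause given the effect, as for the state-based setting; the function names are illustrative.

```python
from fractions import Fraction as F

PR_EFF = F(2, 3)  # Pr(rEff) as stated in Example 6.16


def pr_cause(p):
    """Pr(rCause_p) = (1/3)(1 + p)."""
    return F(1, 3) * (1 + p)


def pr_cause_and_eff(p):
    """Pr(rCause_p and rEff) = 1/4 + p/12."""
    return F(1, 4) + F(1, 12) * p


def raises_probability(p):
    """Strict probability-raising condition Pr(rEff | rCause_p) > Pr(rEff)."""
    return pr_cause_and_eff(p) / pr_cause(p) > PR_EFF


def recall(p):
    """Assumed definition: recall = Pr(rCause_p | rEff)."""
    return pr_cause_and_eff(p) / PR_EFF


# The condition holds strictly below 1/5 and fails at 1/5 itself.
holds_below = raises_probability(F(19, 100))
holds_at_bound = raises_probability(F(1, 5))
# Total mass of all paths init a^k c for k = 1..60, which approaches 1/3 from below.
partial_mass = sum(F(1, 3 * 2**k) for k in range(1, 61))
```

Since the recall grows linearly in $p$ while the admissible values of $p$ form an open interval below $1/5$, the supremum of the recall (the value at $p = 1/5$) is approached but never attained.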

CONCLUSION
In this work we formalized the probability-raising principle in MDPs and studied several quality notions for probability-raising causes. We covered fundamental algorithmic problems for both the strict (local) and global view, where we considered a basic state-based setting in which cause and effect are given as sets of states. We extended this setting to ω-regular path properties as effects in two ways. In a simpler setting we kept causes as sets of states, and in a more general approach we considered co-safety path properties as causes.
Strict vs. Global probability raising. In our basic setting of state-based cause-effect relations, our results indicate that GPR causes are more general overall by leaving more flexibility to achieve better quality measures, while algorithmic reasoning on SPR causes is simpler. This changed when extending the framework by considering ω-regular effects given by a deterministic Rabin automaton. Our results mainly stem from a polynomial reduction from ω-regular effects to reachability effects (Lemma 6.3). The caveat here is that the strict PR condition translates to a global PR condition after this transformation, which increases the algorithmic complexity of reachability SPR causes to the level of reachability GPR causes. Thus, the strict probability-raising loses its advantage over the global perspective. Furthermore, when considering causes as co-safety path properties we observe increasing difficulties to handle strict probability-raising. This stems from an underlying problem in the approach of strict probability-raising applied to path properties. As we consider cause-effect relations between these properties, it is somewhat unnatural to require each individual path to raise the probability of the effect property. Rather, it is more natural to say a path property as a whole causes another one, instead of saying all possible realizations of a path property cause another one. This means that co-safety GPR causes also seem more natural than co-safety SPR causes from a philosophical standpoint.
Non-strict inequality in the PR conditions. The approach to probability-raising taken in this work is in line with the classical notion in the literature, which uses a strict inequality in the PR condition. As a consequence, causes might not exist (see Example 3.5). Relaxing the PR condition by only requiring a non-strict inequality would apparently be a minor change that broadens the choice of causes. Indeed, the proposed algorithms for checking the SPR and GPR conditions for reachability effects (Section 4) can easily be modified for the relaxed definition. As the algorithms of both extended settings discussed in Section 6 stem mainly from a reduction to reachability effects, this also holds for reachability and co-safety causes of regular effects. However, a non-strict inequality in the PR condition would lead to a questionable notion of causality, as, e.g., {init} would always be a recall- and ratio-optimal cause. Thus, other side constraints are needed in order to make use of the relaxed PR condition. For example, requiring the non-strict inequality for all schedulers that reach a cause with positive probability, and additionally requiring the existence of a witnessing scheduler for the PR condition with strict inequality, might be a useful alternative definition that agrees with Def. 3.1 for Markov chains.
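The degenerate role of {init} under a relaxed PR condition can be spelled out in a short calculation. The following is a sketch under assumptions not restated in this section: init is not an effect state, ◇ denotes "eventually", 𝔖 ranges over schedulers with positive effect probability, and recall is taken to be the conditional probability of having seen the cause given that the effect occurs.

\[
\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\lozenge \mathsf{Eff} \mid \lozenge \mathsf{init})
= \frac{\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\lozenge \mathsf{Eff} \wedge \lozenge \mathsf{init})}{\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\lozenge \mathsf{init})}
= \frac{\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\lozenge \mathsf{Eff})}{1}
= \mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\lozenge \mathsf{Eff}),
\qquad
\mathrm{Pr}^{\mathfrak{S}}_{\mathcal{M}}(\lozenge \mathsf{init} \mid \lozenge \mathsf{Eff}) = 1 .
\]

Since every path starts in init, the first identity shows that {init} satisfies the non-strict inequality with equality under every scheduler (and never the strict one), while the second shows that no occurrence of the effect is left uncovered, so no other cause can achieve a better recall.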
Relaxing the minimality condition. As many causality notions in the literature include some minimality constraint, we included the condition $\mathrm{Pr}^{\max}_{\mathcal{M}}(\neg \mathsf{Cause} \,\mathrm{U}\, c) > 0$ for all states $c \in \mathsf{Cause}$ in the state-based setting and for reachability PR causes of regular effects. However, this requirement could be dropped without affecting the algorithmic results presented here. This can be useful when the task is to identify components or agents that are responsible for the occurrences of undesired effects. In these cases, the cause candidates are fixed (e.g., for each agent i, the set of states controlled by agent i), but some of them might violate the minimality condition.
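For a fixed candidate set, the minimality condition can be checked by plain graph reachability, since the maximal probability of reaching c before any other cause state is positive exactly when some finite path from the initial state reaches c without visiting a cause state earlier. The following Python sketch illustrates this; the function name, the successor-map encoding of the MDP, and the toy model are hypothetical and not taken from the paper.

```python
from collections import deque


def satisfies_minimality(successors, init, cause):
    """Return the cause states c for which Pr^max(not Cause U c) > 0.

    `successors` maps each state to the set of states that can be reached
    in one step with positive probability under some action (a hypothetical
    encoding of the underlying graph of the MDP).
    """
    cause = set(cause)
    reached_causes = set()
    seen = {init}
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if state in cause:
            # A cause state ends the "not Cause" prefix: record it and
            # do not explore beyond it.
            reached_causes.add(state)
            continue
        for nxt in successors.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return reached_causes


# Toy model, invented for illustration: c2 is only reachable through c1.
toy = {
    "init": {"a", "c1"},
    "a": {"init"},
    "c1": {"c2"},
    "c2": {"eff"},
    "eff": set(),
}
ok = satisfies_minimality(toy, "init", {"c1", "c2"})
violating = {"c1", "c2"} - ok
```

In the toy model, only c1 satisfies the condition, and c2 is the kind of state that a fixed, agent-based candidate set might contain once the minimality requirement is dropped.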
Future directions. In this work we considered type-like causality where cause-effect relations are defined within the model without needing an actual execution that shows the effect. Hence, causes are considered in a forward-looking manner. Notions of probabilistic backward causality that take a concrete execution of the system into account and considerations on PR causality with external interventions as in Pearl's do-calculus [Pea09] are left for future work.