Distribution Bisimilarity via the Power of Convex Algebras

Probabilistic automata (PA), also known as probabilistic nondeterministic labelled transition systems, combine probability and nondeterminism. They can be given different semantics, like strong bisimilarity, convex bisimilarity, or (more recently) distribution bisimilarity. The latter is based on the view of PA as transformers of probability distributions, also called belief states, and promotes distributions to first-class citizens. We give a coalgebraic account of distribution bisimilarity and explain the genesis of the belief-state transformer from a PA. To do so, we make explicit the convex algebraic structure present in PA and identify belief-state transformers as transition systems whose state space carries a convex algebra. As a consequence of our abstract approach, we obtain a sound proof technique that we call bisimulation up-to convex hull.

An example is shown on the left of Figure 1. Probabilistic automata can be given different semantics, e.g., (strong probabilistic) bisimilarity [LS91], convex (probabilistic) bisimilarity [SL94], and semantics as transformers of belief states [CPP09, FZ14, DH13, DvGHM09, DvGHM08, HKK14], whose definitions we present next. For the rest of the section, we fix a PA M = (S, L, →).

An equivalence relation R ⊆ S × S is a (strong probabilistic) bisimulation if (s, t) ∈ R and s a → ξ imply that there is ζ ∈ D(S) with t a → ζ and ξ ≡_R ζ. Here, ≡_R ⊆ D(S) × D(S) is the lifting of R to distributions, defined by ξ ≡_R ξ′ if and only if there exists a distribution ν ∈ D(S × S) such that
1. Σ_{t∈S} ν(s, t) = ξ(s) for all s ∈ S,
2. Σ_{s∈S} ν(s, t) = ξ′(t) for all t ∈ S, and
3. ν(s, t) > 0 implies (s, t) ∈ R.
Two states s and t are (strongly probabilistically) bisimilar, notation s ∼ t, if there exists a (strong probabilistic) bisimulation R with (s, t) ∈ R.

A convex bisimulation is defined like a strong probabilistic bisimulation, with → replaced by the convex transition relation →_c, defined as follows: s a →_c ξ if and only if ξ = Σ_{i=1}^n p_i ξ_i for some ξ_i ∈ D(S) and p_i ∈ [0, 1] satisfying Σ_{i=1}^n p_i = 1 and s a → ξ_i for i = 1, …, n. In particular, a state ζ with two distinct transitions ζ a → ξ_1 and ζ a → ξ_2 can, under →_c, perform infinitely many transitions to convex combinations of ξ_1 and ξ_2. For instance, y_0 a →_c (1/4) y_1 + (1/4) y_2 + (1/2) y_3. Two states s and t are convex bisimilar, notation s ∼_c t, if there exists a convex bisimulation R with (s, t) ∈ R.
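For an equivalence relation R, the coupling-based lifting ≡_R above has a well-known equivalent characterisation: two distributions are related iff they assign the same total probability to every R-equivalence class. A minimal sketch of this check (the function name and dict-based encoding of distributions are our own, illustrative choices):

```python
def lift_equivalence(classes, xi1, xi2, tol=1e-9):
    """Lifting of an equivalence relation R (given as its list of
    equivalence classes) to distributions: xi1 and xi2 are related
    iff they assign the same total mass to every R-class."""
    def mass(xi, cls):
        return sum(xi.get(s, 0.0) for s in cls)
    return all(abs(mass(xi1, c) - mass(xi2, c)) < tol for c in classes)

# R identifies s1 and s2; distributions are dicts state -> probability
classes = [{"s1", "s2"}, {"s3"}]
assert lift_equivalence(classes, {"s1": 0.5, "s3": 0.5}, {"s2": 0.5, "s3": 0.5})
assert not lift_equivalence(classes, {"s1": 1.0}, {"s3": 1.0})
```

The class-mass characterisation avoids searching for a coupling ν explicitly, which in general amounts to a flow problem.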
Convex bisimilarity is (strong probabilistic) bisimilarity on the "convex closure" of the given PA. More precisely, consider the PA M_c = (S, L, →_c) in which s a →_c ξ whenever s ∈ S and ξ is in the convex hull (see Section 3 for a definition) of the set {ζ ∈ D(S) | s a → ζ}. Then convex bisimilarity of M is bisimilarity of M_c. Hence, if bisimilarity is the behavioural equivalence of interest, we see that convex semantics arises from a different perspective on the representation of a PA: instead of seeing the given transitions as independent, we look at them as generators of infinitely many transitions in the convex closure. For readers familiar with schedulers, convex bisimilarity means that schedulers in PA can randomise.
There is yet another way to understand PA, as belief-state transformers, present but sometimes implicit in [CPP09, Hen12, FZ14, DH13, DvGHM09, DvGHM08, HKK14, CR11] to name a few, with behavioural equivalences on distributions. We were particularly inspired by the original work of Deng et al. [DH13, DvGHM09, DvGHM08] as well as [HKK14]. Given a PA M = (S, L, →), consider the labelled transition system M_bs = (DS, L, →) whose states are distributions over the original states of M, and whose transitions → ⊆ DS × L × DS are defined by: ξ a → Σ_{s∈supp(ξ)} ξ(s) · ξ_s whenever s a →_c ξ_s for every s ∈ supp(ξ). We call M_bs the belief-state transformer of M. Figure 1, right, displays a part of the belief-state transformer induced by the PA of Figure 1, left. According to this definition, a distribution makes an action step only if all its support states can make the step. This, and hence the corresponding notion of bisimulation, can vary. For example, in [HKK14] a distribution makes a transition a → if some of its support states can perform an a-step. There are several proposed notions of equivalences on distributions [Hen12, DHR08, DMS14, FZ14, DH13, CPP09, HKK14] that mainly differ in the treatment of termination. See [HKK14] for a detailed comparison.
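For a finite PA, one step of the belief-state transformer can be computed concretely: a belief state ξ makes an a-step iff every support state has an a-transition, and each successor is the ξ-weighted mixture of one chosen successor distribution per support state. A sketch under our own naming conventions (for brevity we choose among the finitely many given transitions rather than the full convex closure →_c):

```python
from itertools import product
from collections import defaultdict

def belief_successors(delta, xi, a):
    """delta: dict (state, action) -> list of successor distributions
    (dicts state -> probability).  Returns the a-successors of the
    belief state xi obtained by choosing one successor distribution per
    support state; empty if some support state cannot do an a-step."""
    support = [s for s, p in xi.items() if p > 0]
    if any(not delta.get((s, a)) for s in support):
        return []  # xi cannot make an a-step
    results = []
    for choice in product(*(delta[(s, a)] for s in support)):
        mix = defaultdict(float)
        for s, phi in zip(support, choice):
            for t, q in phi.items():
                mix[t] += xi[s] * q  # xi(s)-weighted mixture
        results.append(dict(mix))
    return results

# the two a-transitions of y0 from Figure 1
delta = {("y0", "a"): [{"y3": 1.0}, {"y1": 0.5, "y2": 0.5}]}
succ = belief_successors(delta, {"y0": 1.0}, "a")
assert {"y3": 1.0} in succ and {"y1": 0.5, "y2": 0.5} in succ
```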
In this paper we focus on full probability distributions for reasons of clarity and canonical presentation. This has some effect on the obtained semantics (and can lead to undesired corner cases). However, all our results can be recast in the setting of subprobability distributions too, yielding a more natural semantics at the expense of a more complicated presentation. While the foundations of strong probabilistic bisimilarity are well studied [Sok11, BSdV04, VR99] and convex probabilistic bisimilarity was also recently captured coalgebraically [Mio14], the foundations of the semantics of PA as transformers of belief states are not yet explained. One of the goals of the present paper is to show that this semantics (naturally on distributions [HKK14]) is also an instance of generic behavioural equivalence. Note that a (somewhat concrete) proof is given for the bisimilarity of [HKK14]: the authors have proven that their bisimilarity is coalgebraic bisimilarity of a certain coalgebra corresponding to the belief-state transformer. What is missing there, and in all related work, is an explanation of the relationship of the belief-state transformer to the original PA. Clarifying the foundations of the belief-state transformer and of distribution bisimilarity is our initial motivation.
A useful side effect of such clarification is the proof technique that we name up-to convex hull: the state y_0 of the belief-state transformer on the right of Figure 1 can reach infinitely many states (e.g., the distributions (1/2^n) y_1 + (1 − 1/2^n) y_2 for all natural numbers n); therefore checking whether y_0 is distribution bisimilar to another state, e.g., x_0, would require a distribution bisimulation which relates infinitely many states. However, as we shall see in Example 7.5, a finite relation which is a bisimulation up-to convex hull will be enough to prove that y_0 is distribution bisimilar to x_0. Intuitively, bisimulations up-to convex hull allow us to exploit the algebraic structure underlying probability distributions. We illustrate this structure in the next section.

Convex Algebras
In order to model belief-state transformers as coalgebras, we first need to uncover the algebraic theory underlying probability distributions. By algebraic theory here we mean a set of operations (signature) and a set of equations that characterises, in a sense that we will make precise in Proposition 4.9, probability distributions. It is well known that this theory is that of convex algebras, which we quickly recall in this section.
By C we denote the signature of convex algebras: for every n ∈ ℕ and every tuple (p_i)_{i=0}^n ∈ [0, 1]^{n+1} with Σ_{i=0}^n p_i = 1, it contains an operation symbol (p_i)_{i=0}^n of arity n + 1, to be interpreted by a convex combination with coefficients p_i for i = 0, …, n. For p ∈ [0, 1] we write p̄ = 1 − p.

Definition 3.1. A convex algebra X is an algebra with signature C, i.e., a set X together with an operation Σ_{i=0}^n p_i (−)_i for each operation symbol (p_i)_{i=0}^n ∈ C, such that the following two axioms hold:
(1) projection: Σ_{i=0}^n p_i x_i = x_j whenever p_j = 1;
(2) barycentre: Σ_{i=0}^n p_i (Σ_{j=0}^m q_{i,j} x_j) = Σ_{j=0}^m (Σ_{i=0}^n p_i q_{i,j}) x_j.

Let X be a convex algebra. Then, for p_n ≠ 1,
Σ_{i=0}^n p_i x_i = p̄_n (Σ_{i=0}^{n−1} (p_i / p̄_n) x_i) + p_n x_n.
Hence, an (n + 1)-ary convex combination can be written as a binary convex combination applied to an n-ary one. As a consequence, if X is a set that carries two convex algebras X_1 and X_2 with binary convex combinations + and ⊕, respectively, such that px + p̄y = px ⊕ p̄y for all p, x, y, then X_1 = X_2.
Hence, it suffices to consider binary convex combinations only, whenever more convenient.

Definition 3.4. Let X be a convex algebra with carrier X, and let C ⊆ X. We call C convex if it is the carrier of a subalgebra of X, i.e., if px + p̄y ∈ C for all x, y ∈ C and p ∈ (0, 1). The convex hull of a set S ⊆ X, denoted conv(S), is the smallest convex set that contains S.
Clearly, a subset C of the carrier X of a convex algebra X is convex if and only if C = conv(C). Convexity plays an important role in the semantics of probabilistic automata, for example in the definition of convex bisimulation, Definition 2.3.
Example 3.5. In order to illustrate the concept of convexity, we now consider some subsets of D(X), for X = {x, y, z}. Take C_1 = {δ_x, δ_y} where δ_x and δ_y are the Dirac distributions for x and y, respectively. This set is not convex since (1/2) δ_x + (1/2) δ_y, the distribution mapping both x and y to 1/2, does not belong to C_1. The smallest convex set that contains C_1 is its convex hull conv(C_1): this is the set of all probability distributions over {x, y}.
For a similar reason, the set C_2 = {δ_x, (1/2) δ_x + (1/2) δ_y} is not convex. In contrast, the set C_3 = {δ_x} is convex. Indeed, for all p ∈ (0, 1), pδ_x + p̄δ_x is, by idempotence, equal to δ_x. Finally, note that by definition the empty set is also convex.
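These convexity checks can be replayed concretely, representing distributions as dicts with exact rational weights (the helper `mix` is our own illustrative name for the binary convex combination):

```python
from fractions import Fraction as F

half = F(1, 2)

def mix(p, phi, psi):
    """Binary convex combination p*phi + (1-p)*psi of two distributions."""
    keys = set(phi) | set(psi)
    return {k: p * phi.get(k, F(0)) + (1 - p) * psi.get(k, F(0)) for k in keys}

dx, dy = {"x": F(1)}, {"y": F(1)}
C1 = [dx, dy]
m = mix(half, dx, dy)            # maps both x and y to 1/2
assert m == {"x": half, "y": half}
assert m not in C1               # so C1 is not convex
assert mix(half, dx, dx) == dx   # idempotence: {delta_x} is convex
```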

Coalgebras and Generalised Determinisation
In this section we give a gentle introduction to (co)algebra that enables us to highlight the generic principles behind the semantics of probabilistic automata. The interested reader is referred to [Jac16, Rut00, JR97] for more details. We start by recalling the basic notions of category, functor and natural transformation, so that all of the results in the paper are accessible also to non-experts.
A category C is a collection of objects and a collection of arrows (or morphisms) from one object to another. For every object X ∈ C, there is an identity arrow id_X : X → X. For any three objects X, Y, Z ∈ C, given two arrows f : X → Y and g : Y → Z, there exists an arrow g ∘ f : X → Z. Arrow composition is associative and id_X is neutral w.r.t. composition. The standard example is Sets, the category of sets and functions. We will sometimes use the epi-mono factorisation of arrows in Sets (every function is the composition of an epi followed by a mono), where an epi is a surjective function and a mono is an injective function in Sets.
A functor F from a category C to a category D, notation F : C → D, assigns to every object X ∈ C an object F X ∈ D, and to every arrow f : X → Y in C an arrow F f : F X → F Y in D, such that identity arrows and composition are preserved. We recall the functors on Sets used in this paper:
(1) The constant exponent functor (−)^L for a set L, mapping a set X to the set X^L of all functions from L to X, and a function f : X → Y to the function f^L : X^L → Y^L given by post-composition, f^L(g) = f ∘ g.
(2) The powerset functor P, mapping a set X to its powerset PX = {S | S ⊆ X}, and on functions f : X → Y given by direct image: Pf : PX → PY, Pf(S) = {f(s) | s ∈ S}.
(3) The finitely supported probability distribution functor D, defined for a set X as the set of finitely supported probability distributions over X, and for a function f : X → Y by Df(ϕ)(y) = Σ_{x ∈ f^{-1}(y)} ϕ(x). The support set of a distribution ϕ ∈ DX is defined as supp(ϕ) = {x ∈ X | ϕ(x) > 0}.
(4) The functor C [Mio14, Jac08, Var03] maps a set X to the set of all nonempty convex subsets of distributions over X, and a function f : X → Y to the restriction of PDf.
We will often decompose P as P_ne + 1, where P_ne is the nonempty powerset functor and (−) + 1 is the termination functor mapping every set X to X + 1 = X ∪ {∗}. A category C is concrete if it admits a canonical forgetful functor U : C → Sets. By a forgetful functor we mean a functor that is identity on arrows. Intuitively, a concrete category has objects that are sets with some additional structure, e.g. particular algebras, and morphisms that are a particular kind of functions between the sets, e.g. the corresponding algebra homomorphisms. The forgetful functor forgets the additional structure and maps the objects to the underlying sets. It is identity on arrows in the sense that every such particular kind of function is also just a function between the underlying sets and hence mapped to itself. For example, the category of monoids with monoid homomorphisms as arrows is concrete. The forgetful functor maps every monoid to its carrier set, and every monoid homomorphism to itself (being a function from the one carrier set to the other).
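The action of the distribution functor D on functions, item (3) above, is the familiar pushforward of a distribution. A small sketch (our own encoding of distributions as dicts):

```python
from collections import defaultdict

def D(f, phi):
    """Action of the distribution functor on a function f:
    (Df)(phi)(y) = sum of phi(x) over all x with f(x) = y."""
    out = defaultdict(float)
    for x, p in phi.items():
        out[f(x)] += p
    return dict(out)

def support(phi):
    return {x for x, p in phi.items() if p > 0}

phi = {1: 0.25, 2: 0.25, 3: 0.5}
psi = D(lambda n: n % 2, phi)      # push forward along parity
assert psi == {1: 0.75, 0: 0.25}
assert support(psi) == {0, 1}
```

Functoriality is easy to check on examples: pushing forward along an identity function leaves the distribution unchanged, and pushing forward along a composite equals the composite of the pushforwards.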
Coalgebras provide an abstract framework for state-based systems. Let C be a base category. A coalgebra is a pair (S, c) of a state space S (an object in C) and an arrow c : S → F S in C, where F : C → C is a functor that specifies the type of transitions.
We will sometimes just say the coalgebra c : S → F S, meaning the coalgebra (S, c). A coalgebra homomorphism from a coalgebra (S, c) to a coalgebra (T, d) is an arrow h : S → T in C satisfying F h ∘ c = d ∘ h.
Coalgebras of a functor F and their coalgebra homomorphisms form a category that we denote by Coalg C (F ).
(3) Labelled transition systems (LTSs) with labels in L are coalgebras for the functor P^L, mapping X to (PX)^L. (4) Markov chains are coalgebras for the distribution functor D.
Coalgebras over a concrete category C, with U : C → Sets being the forgetful functor, are equipped with a generic behavioural equivalence, which we define next. Let (S, c) be an F-coalgebra. A relation R ⊆ US × US is a kernel bisimulation if it is the kernel of Uh for some coalgebra homomorphism h : (S, c) → (T, d), i.e., R = {(s, t) | Uh(s) = Uh(t)}. Two states s, t of a coalgebra are behaviourally equivalent, notation s ≈ t, if and only if there is a kernel bisimulation R with (s, t) ∈ R. It is not difficult to show that behavioural equivalence is itself a kernel bisimulation.
Moreover, coalgebras over a concrete category have a set of states: the set of states of (S, c) is US. The following property is simple but important: if there is a functor from one category of coalgebras (over a concrete category) to another that keeps the set of states the same and is identity on morphisms, then this functor preserves behavioural equivalence, i.e., if two states are equivalent in a coalgebra of the first category, then they are also equivalent in its image under the functor in the second category. More precisely, let T : Coalg_C(F) → Coalg_D(G) be a functor with U_D ∘ T = U_C, where U_X, for X ∈ {C, D}, is the composition of the obvious forgetful functors, Ū_X(S, c) = S followed by the forgetful functor making X concrete. Then T preserves behavioural equivalence. Note that U_D ∘ T = U_C on objects means that T keeps the state set the same, and U_D ∘ T = U_C on morphisms means that T is identity on morphisms.
We are now in position to connect probabilistic automata to coalgebras.
Example 4.4. The PA on the left of Figure 1 corresponds to a coalgebra c_M : S → (PDS)^L. The set of states is S = {x_0, x_1, x_2, x_3, y_0, y_1, y_2, y_3}. The transition function c_M is defined as expected. For instance, c_M(y_0)(a) = {δ_{y_3}, (1/2) δ_{y_1} + (1/2) δ_{y_2}}. It is also possible to provide convex bisimilarity semantics to probabilistic automata via coalgebraic behavioural equivalence, as the next proposition shows.
Proposition 4.5 [Mio14]. Let M = (S, L, →) be a probabilistic automaton, and let (S, ĉ_M) be the (C + 1)^L-coalgebra on Sets defined by ĉ_M(s)(a) = {ξ ∈ D(S) | s a →_c ξ} if this set is nonempty, and ĉ_M(s)(a) = ∗ otherwise. Then s ∼_c t if and only if s ≈ t in (S, ĉ_M).

The connection between (S, c_M) and (S, ĉ_M) in Proposition 4.5 is the same as the connection between M and M_c in Section 2. Abstractly, it can be explained using the following well-known generic property.
Lemma 4.6. Let σ : F ⇒ G be a natural transformation between Sets-functors. Then T_σ : Coalg_Sets(F) → Coalg_Sets(G), defined by T_σ(S, c) = (S, σ_S ∘ c) on objects and identity on morphisms, is a functor that preserves behavioural equivalence. If σ is injective, meaning that all components of σ are injective, then T_σ also reflects behavioural equivalence.
Example 4.7. We have that conv : PD ⇒ C + 1, given by conv(∅) = ∗ and, for nonempty X ⊆ DS, conv(X) the already-introduced convex hull, is a natural transformation. Therefore conv^L : (PD)^L ⇒ (C + 1)^L, defined pointwise, is one as well. As a consequence of Lemma 4.6, we get a functor T_conv : Coalg_Sets((PD)^L) → Coalg_Sets((C + 1)^L) and hence bisimilarity implies convex bisimilarity in probabilistic automata.
On the other hand, we have the injective natural transformation ι : C + 1 ⇒ PD given by ι(X) = X and ι(∗) = ∅, yielding an injective natural transformation χ : (C + 1)^L ⇒ (PD)^L. As a consequence, convex bisimilarity coincides with strong bisimilarity on the "convex-closed" probabilistic automaton M_c, i.e., the coalgebra (S, ĉ_M) whose transitions are all convex combinations of M-transitions.

4.1. Algebras for a Monad. The behaviour functor F often is, or involves, a monad M, providing certain computational effects, such as partial, nondeterministic, or probabilistic computation. More precisely, a monad is a functor M : C → C together with two natural transformations, a unit η : id_C ⇒ M and a multiplication µ : M² ⇒ M, required to satisfy the unit and associativity laws: µ ∘ ηM = id = µ ∘ Mη and µ ∘ µM = µ ∘ Mµ.
We briefly describe two examples of monads on Sets:
• The powerset monad P, with unit given by singleton, η(x) = {x}, and multiplication given by union, µ(S) = ∪S.
• The distribution monad D, with unit given by the Dirac distribution, η(x) = δ_x = (x ↦ 1) for x ∈ X, and multiplication given by flattening, µ(Φ)(x) = Σ_{ϕ ∈ supp(Φ)} Φ(ϕ) · ϕ(x).
With a monad M on a category C one associates the Eilenberg-Moore category EM(M) of Eilenberg-Moore algebras. Objects of EM(M) are pairs A = (A, a) of an object A ∈ C and an arrow a : MA → A satisfying a ∘ η_A = id_A and a ∘ Ma = a ∘ µ_A.
A homomorphism of algebras from A = (A, a) to B = (B, b) is an arrow h : A → B in C between the underlying objects satisfying h ∘ a = b ∘ Mh. Note that the law a ∘ Ma = a ∘ µ_A says precisely that the map a is a homomorphism from (MA, µ_A) to A.
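The unit and multiplication of the distribution monad D can be sketched concretely; since Python dicts are not hashable, we represent a distribution over distributions as a list of (distribution, weight) pairs (an encoding choice of ours):

```python
from fractions import Fraction as F

def unit(x):
    """Unit of the distribution monad: the Dirac distribution on x."""
    return {x: F(1)}

def mult(Phi):
    """Multiplication of the distribution monad: flatten a distribution
    over distributions, mu(Phi)(x) = sum over phi of Phi(phi) * phi(x)."""
    out = {}
    for phi, p in Phi:             # Phi as a list of (distribution, weight)
        for x, q in phi.items():
            out[x] = out.get(x, F(0)) + p * q
    return out

phi1, phi2 = {"a": F(1, 2), "b": F(1, 2)}, {"a": F(1)}
Phi = [(phi1, F(1, 2)), (phi2, F(1, 2))]
assert mult(Phi) == {"a": F(3, 4), "b": F(1, 4)}
# one instance of the unit law mu . (eta M) = id:
assert mult([(phi1, F(1))]) == phi1
```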
The forgetful functor U : EM(M) → C mapping an algebra to its carrier has a left adjoint F, mapping an object X ∈ C to the (free) algebra (MX, µ_X) generated by X. We have M = U ∘ F. Recall that an algebra A = (A, a) is a free algebra generated by a set X ⊆ A if any map f from X to the carrier of another algebra B = (B, b), i.e. f : X → B, extends to a unique homomorphism f# from A to B. Free algebras are unique up to isomorphism.

A category of Eilenberg-Moore algebras which is particularly relevant for our exposition is described in the following proposition. See [Ś74] and [Sem73] for the original result, but also [Dob06, Dob08] or [Jac10, Theorem 4] where a concrete and simple proof is given.

Proposition 4.9. The category EM(D) of Eilenberg-Moore algebras of the distribution monad is isomorphic to the category of convex algebras with convex homomorphisms.

As a consequence, we will interchangeably use the abstract definition (Eilenberg-Moore algebra) and the concrete one (convex algebra), whichever is more convenient. For the latter, we also just use binary convex operations, by Proposition 3.3, whenever more convenient.

4.2. The Generalised Determinisation. We now, briefly and without all details, recall the generalised determinisation construction from [SBBR10], which serves as inspiration for our work. Our goal is to understand the belief-state transformer as a particular determinisation of the original PA. Just like determinisation transforms a nondeterministic automaton S → 2 × (PS)^L into a deterministic automaton PS → 2 × (PS)^L on the powerset of the original states, we aim at transforming a PA S → (PDS)^L into its belief-state transformer DS → (PDS)^L.
A functor F̄ : EM(M) → EM(M) is said to be a lifting of a functor F : C → C if and only if U ∘ F̄ = F ∘ U, as shown in the diagram below. Concretely, this implies that F̄ maps an algebra with carrier X to an algebra with carrier F X.

Here, U is the forgetful functor U : EM(M) → C mapping an algebra to its carrier. It has a left adjoint F, mapping an object X ∈ C to the (free) algebra (MX, µ_X). We have M = U ∘ F.

Whenever F : C → C has a lifting F̄ : EM(M) → EM(M), one has the following functors between categories of coalgebras.
The functor F̄ transforms every coalgebra c : S → F MS over the base category into a coalgebra c# : FS → F̄FS. Note that the latter is a coalgebra on EM(M): the state space carries an algebra, actually the freely generated one, and c# is a homomorphism of M-algebras. Intuitively, this amounts to compositionality: like in GSOS specifications, the transitions of a compound state are determined by the transitions of its components.

The functor U simply forgets the algebraic structure: c# is mapped to Uc# : MS → F MS. An important property of U is that it preserves and reflects behavioural equivalence. On the one hand, this fact usually allows one to give a concrete characterisation of ≈ for F̄-coalgebras. On the other, it allows one, by means of so-called up-to techniques, to exploit the M-algebraic structure of FS when checking ≈ on Uc#.
By taking F = 2 × (−)^L and M = P, one transforms c : S → 2 × (PS)^L into Uc# : PS → 2 × (PS)^L. The former is a nondeterministic automaton (every c of this type is a pairing ⟨o, t⟩ of o : S → 2, defining the final states, and t : S → P(S)^L, defining the transition relation) and the latter is a deterministic automaton with state space PS. In [SBBR10], see also [JSS15], it is shown that, for a certain choice of the lifting F̄, this amounts exactly to the standard determinisation from automata theory. This explains why this construction is called the generalised determinisation.
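The standard determinisation instance mentioned above can be sketched as the classical subset construction, restricted to reachable subsets (function and variable names are our own):

```python
def determinise(delta, finals, start, alphabet):
    """Subset construction: the generalised determinisation instance for
    M = P and F = 2 x (-)^L.  delta: dict (state, letter) -> set of
    successor states.  Returns the reachable subset-states, the
    deterministic transition map, and the accepting subset-states."""
    start = frozenset(start)
    states, trans, accept = {start}, {}, set()
    todo = [start]
    while todo:
        X = todo.pop()
        if X & finals:             # a subset is final iff it meets finals
            accept.add(X)
        for a in alphabet:
            Y = frozenset(t for s in X for t in delta.get((s, a), ()))
            trans[(X, a)] = Y
            if Y not in states:
                states.add(Y)
                todo.append(Y)
    return states, trans, accept

# NFA over {a} accepting words of length at least 1
delta = {("q0", "a"): {"q0", "q1"}, ("q1", "a"): set()}
states, trans, accept = determinise(delta, {"q1"}, {"q0"}, ["a"])
assert trans[(frozenset({"q0"}), "a")] == frozenset({"q0", "q1"})
assert frozenset({"q0", "q1"}) in accept
```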
In a sense, this is similar to the translation of probabilistic automata into belief-state transformers that we have seen in Section 2. Indeed, probabilistic automata are coalgebras c : S → (PDS)^L and belief-state transformers are coalgebras of type DS → (PDS)^L. One would like to take F = P^L and M = D and reuse the above construction but, unfortunately, P^L does not have a suitable lifting to EM(D). This is a consequence of two well-known facts: the lack of a suitable distributive law ρ : DP ⇒ PD [VW06] and the one-to-one correspondence between distributive laws and liftings, see e.g. [JSS15]. In the next section, we will nevertheless provide a "powerset-like" functor on EM(D) that we will exploit in Section 6 to properly model PA as belief-state transformers.

Coalgebras on Convex Algebras
In this section we provide several functors on EM(D) that will be used in the modelling of probabilistic automata as coalgebras over EM(D). This will make explicit the implicit algebraic structure (convexity) in probabilistic automata and lead to distribution bisimilarity as a natural semantics for probabilistic automata in Section 6. Note that we slightly overload the notation here and conveniently use C for denoting convex subsets of a convex algebra, hoping that this does not cause confusion with the nonempty convex subsets of distributions functor, which is also denoted by C.

5.1. Convex Powerset on Convex Algebras. We now define a functor, the (nonempty) convex powerset functor, on EM(D). Let A be a convex algebra with carrier A. We define P_c A to be the convex algebra (A_c, a_c), where A_c = {C ⊆ A | C nonempty and convex} and a_c is the convex algebra structure given by the following pointwise binary convex combinations: pC_1 + p̄C_2 = {pc_1 + p̄c_2 | c_1 ∈ C_1, c_2 ∈ C_2}. It is important that we only allow nonempty convex subsets in the carrier A_c of P_c A, as otherwise the projection axiom fails.
Example 5.1. In Example 3.5, we have discussed the convex algebra D(X) for X = {x, y, z} and we have given several examples of convex subsets of D(X). All those convex subsets belong to P_c D(X) apart from ∅. Observe that, according to the definition above, pC + p̄∅ would be equal to ∅, violating the projection axiom (as noted above). Consider again the sets C_1 and C_2 from Example 3.5 and the combination (1/2) conv(C_1) + (1/2) conv(C_2), which, as we will demonstrate, equals the set {ϕ ∈ D(X) | ϕ(x) ≥ 1/4, ϕ(y) = 1 − ϕ(x)}. To start with, it is important to notice that (1/2) conv(C_1) + (1/2) conv(C_2) = conv((1/2) C_1 + (1/2) C_2), which can be shown for the convex algebra on D(X). Hence, we focus on (1/2) ϕ_1 + (1/2) ϕ_2 for ϕ_1 and ϕ_2 in the generating sets C_1 = {δ_x, δ_y} and C_2 = {δ_x, (1/2) δ_x + (1/2) δ_y}, respectively. We obtain the following four distributions: δ_x, (3/4) δ_x + (1/4) δ_y, (1/2) δ_x + (1/2) δ_y, and (1/4) δ_x + (3/4) δ_y. Finally, it is easy to show that the convex hull of these four distributions is the set {ϕ ∈ D(X) | ϕ(x) ≥ 1/4, ϕ(y) = 1 − ϕ(x)}.

Remark 5.2. For convex subsets of a finite-dimensional vector space, the pointwise operations are known as Minkowski addition and are a basic construction in convex geometry, see e.g. [Sch93]. The pointwise way of defining algebras over subsets (carriers of subalgebras) has also been studied in universal algebra, see e.g. [BM07, BM03, Bri93].
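The computation in Example 5.1 can be replayed on the generating sets, since the pointwise (Minkowski) combination of convex hulls is the convex hull of the pointwise combination of generators. A sketch with our own helper names:

```python
from fractions import Fraction as F
from itertools import product

half = F(1, 2)

def mix(p, phi, psi):
    """Binary convex combination p*phi + (1-p)*psi of two distributions."""
    keys = set(phi) | set(psi)
    return {k: p * phi.get(k, F(0)) + (1 - p) * psi.get(k, F(0)) for k in keys}

def minkowski(p, C, D):
    """Pointwise convex combination p*C + (1-p)*D on generating sets."""
    return [mix(p, phi, psi) for phi, psi in product(C, D)]

dx, dy = {"x": F(1)}, {"y": F(1)}
C1 = [dx, dy]
C2 = [dx, mix(half, dx, dy)]
gens = minkowski(half, C1, C2)                    # the four distributions
assert {"x": F(3, 4), "y": F(1, 4)} in gens
assert {"x": F(1, 4), "y": F(3, 4)} in gens
assert all(phi["x"] >= F(1, 4) for phi in gens)   # conv(gens): phi(x) >= 1/4
```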
Next, we define P_c on arrows of EM(D). For a convex homomorphism h : A → B, we set P_c h = Ph. The following proposition ensures that we are on the right track, namely that P_c : EM(D) → EM(D) is a functor.

Proposition 5.3. P_c A is a convex algebra. If h : A → B is a convex homomorphism, then so is P_c h : P_c A → P_c B. Hence P_c is a functor on EM(D).
Proof.Due to Proposition 3.3, to prove that P c A is a convex algebra, all we need is to check (1) idempotence, (2) parametric commutativity, and (3) parametric associativity.
(1) For idempotence, C ⊆ pC + p̄C since c = pc + p̄c ∈ pC + p̄C for every c ∈ C. For the opposite inclusion, consider pc_1 + p̄c_2 ∈ pC + p̄C; since C is convex, pc_1 + p̄c_2 ∈ C. Furthermore, parametric commutativity and parametric associativity follow straightforwardly from the pointwise definition of the operations. Proposition 4.9 now implies the property.
Remark 5.4.P c is not a lifting of the nonempty convex subsets of distributions functor C to EM(D), yet it satisfies an equation close to a lifting, namely C = U • P c • F as illustrated below on the left.Moreover, P c is also not a lifting of P ne , the nonempty powerset functor, but there exists a natural embedding e : U • P c ⇒ P ne • U given by e(C) = C, i.e., we are in the situation:

The right diagram in Remark 5.4 simply states that every convex subset is a subset, but this fact and the natural transformation e are useful in the sequel. In particular, using e we can show the next result.

Proposition 5.5. P_c is a monad on EM(D), with η and µ as for the powerset monad.
Proof. Let X be a convex algebra and consider P_c X. We have that η(x) = {x} is a convex subset, as every singleton is. Moreover, η is a convex homomorphism since p{x} + p̄{y} = {px + p̄y}. Now, η (of P_c) is natural if and only if the upper square of the left diagram below commutes.

However, the outer square of the diagram does commute, due to naturality of η (of P); the lower square commutes, due to naturality of e; the outside triangles commute, by the definitions of both η's and of e; and e is injective. As a consequence, the upper square commutes as well.
For µ, notice that µ_X is also a convex homomorphism from P_c P_c X to P_c X, and all the arguments used for the naturality of η (of P_c) apply to the naturality of µ (of P_c) as well, looking at the right diagram above. So µ is natural too.

Clearly, η and µ (of P_c) satisfy the compatibility conditions of the definition of a monad, since η and µ (of P) do.

5.2. Termination on Convex Algebras. The functor P_c defined in the previous section allows only for nonempty convex subsets. We still miss a way to express termination. The question of termination amounts to the question of extending a convex algebra A with a single element ∗. This question turns out to be rather involved, beyond the scope of this paper. The answer from [SW18] is: there are many ways to extend any convex algebra A with a single element, but there is only one natural, functorial way. Mathematics is thus forcing on us the choice of a specific computational behaviour for termination! Given a convex algebra A, let A + 1 have carrier A + {∗} for ∗ ∉ A and convex operations, for p ∈ (0, 1), given by

px ⊕ p̄y = px + p̄y if x, y ∈ A, and px ⊕ p̄y = ∗ if x = ∗ or y = ∗.  (5.1)

Here, the newly added ∗ behaves as a black hole that attracts every other element of the algebra in a convex combination. It is worth remarking that this extension is folklore [Fri15].
Proposition 5.6 [SW18, Fri15]. A + 1 as defined above is a convex algebra that extends A by a single element. The map h + 1 obtained with the termination functor in Sets is a convex homomorphism if h : A → B is. The assignment (−) + 1 gives a functor on EM(D).
We call the functor (−) + 1 on EM(D) the termination functor, due to the following property that follows directly from the definitions.
Remark 5.8. Note that we are abusing notation here: our termination functor (−) + 1 on EM(D) is not the coproduct (−) + 1 in EM(D), which admits a different concrete description.

5.3. Constant Exponent on Convex Algebras.

Lemma 5.10. The constant exponent (−)^L on EM(D) is a lifting of the constant exponent functor (−)^L on Sets.
Example 5.11. Consider a free algebra FS = (DS, µ) of distributions over the set S. By applying first the functor P_c, then (−) + 1, and then (−)^L, one obtains an algebra with carrier (CS + 1)^L, where CS is the set of nonempty convex subsets of distributions over S, and whose convex operations Σ_i p_i f_i are defined by

(Σ_i p_i f_i)(l) = Σ_i p_i f_i(l) if f_i(l) ∈ CS for all i ∈ {1, …, n}, and (Σ_i p_i f_i)(l) = ∗ if f_i(l) = ∗ for some i ∈ {1, …, n}.

5.4. Transition Systems on Convex Algebras. We now compose the three functors introduced above to properly model transition systems as coalgebras on EM(D). The functor that we are interested in is (P_c + 1)^L : EM(D) → EM(D). A coalgebra (S, c) for this functor can be thought of as a transition system with labels in L where the state space carries a convex algebra and the transition function c : S → (P_c S + 1)^L is a homomorphism of convex algebras. This property entails compositionality: the transitions of a composite state px_1 + p̄x_2 are fully determined by the transitions of its components x_1 and x_2, as shown in the next proposition. We write x a → y, for x, y ∈ S the carrier of S, if y ∈ c(x)(a), and x a ↛ if c(x)(a) = ∗.

Proposition 5.12. Let (S, c) be a (P_c + 1)^L-coalgebra, and let x_1, x_2, y_1, y_2, z be elements of S, the carrier of S. Then, for all p ∈ (0, 1) and a ∈ L: px_1 + p̄x_2 a → z if and only if z = py_1 + p̄y_2 for some y_1, y_2 with x_1 a → y_1 and x_2 a → y_2; and px_1 + p̄x_2 a ↛ if and only if x_1 a ↛ or x_2 a ↛.

Proof. Since c is a convex algebra homomorphism, we have that for all p ∈ (0, 1) and a ∈ L, c(px_1 + p̄x_2)(a) = (pc(x_1) + p̄c(x_2))(a). The latter is equal, by definition of (−)^L (see Section 5.3), to pc(x_1)(a) + p̄c(x_2)(a). If there is i ∈ {1, 2} such that c(x_i)(a) = ∗, then pc(x_1)(a) + p̄c(x_2)(a) = ∗ (see Section 5.2). If not, then both c(x_1)(a) and c(x_2)(a) are in P_c S, and pc(x_1)(a) + p̄c(x_2)(a) is by definition (see Section 5.1) the set {py_1 + p̄y_2 | y_1 ∈ c(x_1)(a) and y_2 ∈ c(x_2)(a)}.
Example 5.13. Consider the PA in Figure 1 (left) and the belief-state transformer generated by it (right). By Proposition 5.12, every transition of (1/2) y_1 + (1/2) y_2 arises as a convex combination of a transition of y_1 and a transition of y_2. Transition systems on convex algebras are the bridge between PA and LTSs. In the next section we will show that one can transform an arbitrary PA into a (P_c + 1)^L-coalgebra and that, in the latter, behavioural equivalence coincides with the standard notion of bisimilarity for LTSs (Proposition 6.6).
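The compositionality of Proposition 5.12 can be sketched concretely: the a-successors of a compound state are computed from the a-successor sets of its components, with ∗ acting as a black hole (encoding choices, such as `STAR = None`, are our own):

```python
from fractions import Fraction as F
from itertools import product

STAR = None  # c(x)(a) = STAR encodes "x has no a-transition"

def mix(p, phi, psi):
    """Binary convex combination of two distributions (dicts)."""
    keys = set(phi) | set(psi)
    return {k: p * phi.get(k, F(0)) + (1 - p) * psi.get(k, F(0)) for k in keys}

def combined_successors(p, S1, S2):
    """a-successors of the compound state p*x1 + (1-p)*x2, given the
    a-successor sets S1 of x1 and S2 of x2: STAR is absorbing, and
    otherwise we take the pointwise combination p*S1 + (1-p)*S2."""
    if S1 is STAR or S2 is STAR:
        return STAR
    return [mix(p, y1, y2) for y1, y2 in product(S1, S2)]

half = F(1, 2)
dx, dy = {"x": F(1)}, {"y": F(1)}
assert combined_successors(half, [dx], STAR) is STAR    # black hole
succ = combined_successors(half, [dx], [dx, dy])
assert dx in succ and mix(half, dx, dy) in succ
```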

From PA to Belief-State Transformers
We will now focus on explaining the genesis of the belief-state transformer, as announced in Section 4.2. Recall from Remark 5.4 how P_c is related to C and P_ne, namely C = U ∘ P_c ∘ F and there is an embedding natural transformation e : U ∘ P_c ⇒ P_ne ∘ U. The following definition is the obvious generalisation. It is convenient for us to make this further step of abstraction in order to understand the big picture and avoid repetition and ad-hoc solutions. Our main observation is that in a situation like in the following definition, one can perform generalised determinisation without having a lifting.

Definition 6.1. Let M : Sets → Sets be a monad with M = U ∘ F, and let L_1, L_2 : Sets → Sets be two functors. A functor H : EM(M) → EM(M) is
• a quasi lifting of L_1 if the diagram on the left commutes, i.e., L_1 = U ∘ H ∘ F;
• a lax lifting of L 2 if there exists an injective natural transformation e : U • H ⇒ L 2 • U as depicted on the right.
So, for instance, P_c is a (C, P_ne) quasi-lax lifting by Remark 5.4. From this fact, it follows that (P_c + 1)^L is a ((C + 1)^L, (P_ne + 1)^L) quasi-lax lifting. Another interesting example is the generalised determinisation (Section 4.2): it is easy to see that a lifting F̄ is an (F M, F) quasi-lax lifting. Like in the generalised powerset construction, in the presence of a quasi-lax lifting one can construct the following functors.
We first define F. Take an L_1-coalgebra (S, c : S → L_1 S) and recall that FS is the free algebra µ : MMS → MS. The left diagram in Definition 6.1 entails that HFS has carrier set L_1 S, so HFS is an algebra α : ML_1 S → L_1 S. We define c# : MS → L_1 S as the composition α ∘ Mc. The next lemma shows that c# : FS → HFS is a map in EM(M).

Lemma 6.2. There is a 1-1 correspondence between L_1-coalgebras on Sets and H-coalgebras on EM(M) with free algebras as carriers, if H is a quasi lifting of L_1.

Proof. First, given c, consider the map α ∘ Mc. We need to show that α ∘ Mc is an algebra homomorphism from the free algebra FS to HFS in EM(M). This will show that c# : FS → HFS is indeed an arrow in EM(M). The needed homomorphism property holds since the following diagram commutes: the left square commutes by the naturality of µ and the right one by the Eilenberg-Moore law for α.
Next, we show that the assignments c ↦ c# and d ↦ Ud ∘ η_S are inverse to each other. In one direction, Uc# ∘ η_S = α ∘ Mc ∘ η_S = α ∘ η_{L_1 S} ∘ c = c, by naturality of η and the unit law of the algebra α. In the other direction, for d : FS → HFS,

(Ud ∘ η_S)# = α ∘ M(Ud) ∘ Mη_S =(∗) Ud ∘ µ_S ∘ Mη_S =(∗∗) Ud,

where the equality marked (∗) holds since Ud is an algebra homomorphism, proven above, and the equality marked (∗∗) holds by the monad law.
By the above, F is well defined on objects. It remains to prove that for two L_1-coalgebras (S, c_S) and (T, c_T) on Sets, and a coalgebra homomorphism h : (S, c_S) → (T, c_T) with c_T ∘ h = L_1 h ∘ c_S, the map Mh is a coalgebra homomorphism in EM(M) from (FS, c#_S) to (FT, c#_T). This holds since, in the corresponding diagram, the outer triangles commute by definition; the upper square commutes by assumption, i.e., since h is a homomorphism and U and F are functors; and the lower square simply states that HFh is an arrow in EM(M), which of course holds as H and F are functors.

It is immediate to see that U : Coalg_EM(M)(H) → Coalg_Sets(L_2) preserves behavioural equivalence: if two states of a coalgebra (S, c) in Coalg_EM(M)(H) are behaviourally equivalent, then they are also equivalent in U(S, c). Indeed, since U is a functor that keeps the state set constant and is identity on morphisms, every kernel bisimulation on (S, c) is also a kernel bisimulation on U(S, c). The converse is not true in general, as illustrated below.
Example 6.4. For the sake of simplicity we now consider a different PA. Take the set of labels L to be {a} and the set of states S to be {x, y}; as transitions we only have x a→ δx and y a→ δy. Moreover, it is easy to see that any other map with kernel R is not an algebra homomorphism. Therefore, R cannot be a kernel bisimulation on EM(D).
We can, however, state a precise correspondence: a kernel bisimulation R on U(S, c) is a kernel bisimulation on (S, c) only if it is a congruence with respect to the algebraic structure of S. Formally, R is a congruence if and only if the set US/R of equivalence classes of R carries an Eilenberg-Moore algebra and the function U[−]R : US → US/R mapping every element of US to its R-equivalence class is an algebra homomorphism.
Table 1: The three PA models, their corresponding Sets-coalgebras, and relations to M .
M preserves epis, as does any Sets-endofunctor, so that MU[−]R is an epi. From this fact and the corresponding derivation, we conclude that cR is well defined. In particular, Proposition 6.5, the fact that behavioural equivalence ≈ is a kernel bisimulation, and the following result ensure that U : Coalg_EM(D)((Pc + 1)^L) → Coalg_Sets(P^L) preserves and reflects ≈.
Proof. U(S, c) is a coalgebra for the functor P^L : Sets → Sets, namely a labelled transition system. It is well known that for this kind of coalgebras, behavioural equivalence (≈) coincides with the standard notion of bisimilarity [Rut00]; this means that ≈ for (Pc + 1)^L-coalgebras, called transition systems on convex algebras in Section 5.4, coincides with the standard notion of bisimilarity for LTSs. We can thus proceed by coinduction and prove that the following relation is a bisimulation (in the standard sense).
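For a finite LTS, the standard notion of bisimilarity invoked here can be computed as a greatest fixpoint by iteratively removing pairs that violate the transfer condition. A small illustrative sketch; the example LTS and all names are ours:

```python
def bisimilarity(states, labels, step):
    """Greatest fixpoint of the usual transfer condition.
    step(s, a) returns the set of a-successors of s."""
    R = {(s, t) for s in states for t in states}
    changed = True
    while changed:
        changed = False
        for (s, t) in set(R):
            ok = all(
                all(any((s2, t2) in R for t2 in step(t, a)) for s2 in step(s, a)) and
                all(any((s2, t2) in R for s2 in step(s, a)) for t2 in step(t, a))
                for a in labels
            )
            if not ok:
                R.discard((s, t))
                changed = True
    return R

# Tiny example: u -a-> u, v -a-> w, w -a-> w; all three states are bisimilar.
succ = {("u", "a"): {"u"}, ("v", "a"): {"w"}, ("w", "a"): {"w"}}
R = bisimilarity({"u", "v", "w"}, {"a"}, lambda s, a: succ.get((s, a), set()))
assert ("u", "v") in R and ("v", "w") in R
```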
Table 1 summarises all models of PA: from the classical model M being a (PD)^L-coalgebra (S, cM) on Sets, via the convex model Mc obtained as Tconv(S, cM), to the belief-state transformer Mbs that coincides with U ∘ F ∘ Tconv(S, cM). The first line recalls the concrete notation for each model, the second the corresponding Sets-coalgebra type, the third the way it is related to the original model M, and the last spells out the definition of the transition function in relation to the transition function cM of M.
Theorem 6.7. Let (S, cM) be a probabilistic automaton. For all ξ, ζ ∈ DS,
Observe that these are just regular bisimulations for labelled transition systems and that the greatest fixpoint of b coincides exactly with ∼d. The coinduction principle informs us that, to prove ξ ∼d ζ, it suffices to exhibit a bisimulation R with (ξ, ζ) ∈ R. Proving that x0 ∼d y0 is more complicated. We will show this in Example 7.5 but, for the time being, observe that one would need an infinite bisimulation containing the following pairs of states.
Indeed, all the distributions depicted above have infinitely many possible choices for a-transitions but, whenever one of them executes a depicted transition, the corresponding distribution is forced, because of (7.1), to choose the depicted transition as well.
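The belief-state transitions used in such arguments arise from the PA as recalled in Table 1: ζ a→ ξ when every state s in the support of ζ chooses some a-transition s a→c ξs and ξ = Σs ζ(s)·ξs. The sketch below (all names are ours) enumerates only the finitely many choices given by the base relation →; the full convex relation additionally closes each state's options under convex combinations, so in general there are uncountably many successors.

```python
from itertools import product

def weighted_sum(choice, zeta):
    """Combine sum_s zeta(s) * xi_s into one distribution."""
    out = {}
    for s, xi in choice:
        for x, q in xi.items():
            out[x] = out.get(x, 0.0) + zeta[s] * q
    return out

def bs_successors(zeta, a, trans):
    """a-successors of belief state zeta: every state in the support picks
    one of its a-transitions; the results are averaged by zeta.
    trans maps (s, a) to a list of distributions (dicts)."""
    support = [s for s, p in zeta.items() if p > 0]
    options = [[(s, xi) for xi in trans.get((s, a), [])] for s in support]
    if any(not opts for opts in options):
        return []  # some state in the support cannot do an a-step
    return [weighted_sum(choice, zeta) for choice in product(*options)]

# The PA of Example 6.4: x -a-> delta_x, y -a-> delta_y.
# The uniform belief state loops to itself.
trans = {("x", "a"): [{"x": 1.0}], ("y", "a"): [{"y": 1.0}]}
succs = bs_successors({"x": 0.5, "y": 0.5}, "a", trans)
assert succs == [{"x": 0.5, "y": 0.5}]
```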
An up-to technique is a monotone map f : Rel_D(S) → Rel_D(S), while a bisimulation up-to f is a relation R such that R ⊆ b(f(R)). A technique f is sound if every bisimulation up-to f is contained in ∼d; it is compatible if f(b(R)) ⊆ b(f(R)) for all relations R. In [PS11], it is shown that every compatible up-to technique is also sound.
Hereafter we consider the convex hull technique conv : Rel_D(S) → Rel_D(S) mapping every relation R ∈ Rel_D(S) into its convex hull which, for the sake of clarity, is given explicitly. To prove that (ζ1, ζ2) ∈ b(conv(R)), assume that ζ1 a→ ζ1' for some a ∈ L and ζ1' ∈ D(S). Then, by (a) and Proposition 5.
One can proceed symmetrically for ζ2 a→ ζ2'. Therefore, by definition of b, (ζ1, ζ2) ∈ b(conv(R)). This result has two consequences: first, conv is sound and thus one can prove ∼d by means of bisimulation up-to conv; second, conv can be effectively combined with other compatible up-to techniques [PS11]. In particular, by combining conv with up-to equivalence, which is well known to be compatible, one obtains up-to congruence cgr : Rel_D(S) → Rel_D(S). This technique maps a relation R into its congruence closure: the smallest relation containing R which is a congruence. Proposition 7.3. cgr is compatible.
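The pairs added by conv can be generated explicitly: from pairs (ξi, ζi) ∈ R and weights pi summing to 1, conv(R) contains the pair (Σ pi ξi, Σ pi ζi). A small sketch, with representation and names ours:

```python
def combine(dists, weights):
    """Convex combination sum_i p_i * dists_i of finite distributions."""
    out = {}
    for d, p in zip(dists, weights):
        for x, q in d.items():
            out[x] = out.get(x, 0.0) + p * q
    return out

def conv_pair(pairs, weights):
    """Given pairs (xi_i, zeta_i) in R and weights p_i summing to 1,
    return the pair of conv(R) they generate."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return (combine([xi for xi, _ in pairs], weights),
            combine([zeta for _, zeta in pairs], weights))

# Two pairs of R combined with weights 1/2, 1/2:
R = [({"s": 1.0}, {"t": 1.0}), ({"u": 1.0}, {"v": 1.0})]
left, right = conv_pair(R, [0.5, 0.5])
assert left == {"s": 0.5, "u": 0.5} and right == {"t": 0.5, "v": 0.5}
```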
The proof of Proposition 7.3 can be elegantly presented using the modular approach developed in [PS11] that we recall hereafter.
Up-to techniques can be combined in a number of interesting ways. For a map f : Rel_D(S) → Rel_D(S), the n-iteration of f is defined by f^0 = id, the identity function, and f^(n+1) = f ∘ f^n. The omega iteration is defined as f^ω(R) = ⋃_{i=0}^∞ f^i(R). The following result from [PS11] informs us that compatible up-to techniques can be composed, resulting in other compatible techniques. It is easy to check that all these functions are compatible. Lemma 7.4 allows us to combine them so as to obtain novel compatible up-to techniques. For instance, the equivalence closure e : Rel_D(S) → Rel_D(S) can be decomposed as (id ∪ r ∪ t ∪ s)^ω. The fact that e is compatible follows immediately from Lemma 7.4.
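Over a finite carrier, these combinators and the decomposition e = (id ∪ r ∪ t ∪ s)^ω can be executed directly; the iteration stabilises because only finitely many pairs exist. A sketch, with representation ours:

```python
def omega(f, R):
    """The omega iteration: union of all f^i(R), computed as a fixpoint.
    Terminates here because the carrier is finite."""
    cur = frozenset(R)
    while True:
        nxt = cur | f(cur)
        if nxt == cur:
            return cur
        cur = nxt

# The functions of Lemma 7.4, over a fixed finite carrier D:
D = {"a", "b", "c"}
r = lambda R: frozenset((x, x) for x in D)                # constant-to-identity
t = lambda R: frozenset((x, z) for (x, y1) in R           # the square R^2
                        for (y2, z) in R if y1 == y2)
s = lambda R: frozenset((y, x) for (x, y) in R)           # the opposite R^-1
ident = lambda R: frozenset(R)                            # the identity function

# Equivalence closure as (id u r u t u s)^omega:
e = lambda R: omega(lambda X: ident(X) | r(X) | t(X) | s(X), R)

# The closure of {(a, b), (b, c)} relates all three elements.
E = e({("a", "b"), ("b", "c")})
assert ("c", "a") in E and ("a", "a") in E
```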
In a similar way, we get that cgr is compatible.
Since cgr is compatible and thus sound, we can use bisimulation up-to cgr to check ∼ d .
Recall that in Example 7.1 we showed that, to prove x0 ∼d y0 without up-to techniques, one would need an infinite bisimulation. Instead, the relation R in Example 7.5 is a finite bisimulation up-to cgr. It turns out that one can always check ∼d by means of only finite bisimulations up-to. The key to this result is the following theorem, proved in [SW15]. This last observation has another pleasant consequence: distribution bisimilarity ∼d is decidable, provided we restrict to finite PA with rational probabilities. Indeed, given a finite PA (S, L, →) with rational probabilities, by Corollary 7.7, checking that two distributions are bisimilar is semi-decidable: it suffices to enumerate all finite relations and check whether one of them is a bisimulation up-to cgr containing the given pair. At the same time, checking that two distributions are not distribution bisimilar is also semi-decidable, as non-bisimilarity admits a finite proof. Together, these two facts imply decidability of distribution bisimilarity.

Conclusions and Future Work
Belief-state transformers and distribution bisimilarity have a strong coalgebraic foundation which leads to a new proof method: bisimulation up-to convex hull. More interestingly, and somewhat surprisingly, proving distribution bisimilarity can be achieved using only a finite bisimulation up-to witness. This opens exciting new avenues: Corollary 7.7 gives us hope that bisimulations up-to may play an important role in designing algorithms for automatically checking distribution bisimilarity.

1. Introduction

Probabilistic automata (PA), closely related to Markov decision processes (MDPs), have been used over the years in various areas of verification [LMOW08, KNP11, KNSS02, BK08], machine learning [GMR+12, MCJ+16], and semantics [WJV+15, SK12]. Recent interest in

Definition 2.1. A probabilistic automaton (PA) is a triple M = (S, L, →) where S is a set of states, L is a set of actions or action labels, and → ⊆ S × L × D(S) is the transition relation. As usual, s a→ ζ stands for (s, a, ζ) ∈ →.

Figure 1: On the left: a PA with set of actions L = {a} and set of states S = {x0, x1, x2, x3, y0, y1, y2, y3}. We depict each transition s a→ ζ in two stages: a straight action-labelled arrow from s to • and then several dotted arrows from • to states in S specifying the distribution ζ. On the right: part of the corresponding belief-state transformer. The dots between two arrows ζ a→ ξ1 and ζ a→ ξ2 denote that ζ can perform infinitely many transitions to states obtained as convex combinations of ξ1 and ξ2.

Definition 2.3 (Convex Bisimilarity). A relation R ⊆ S × S is a convex (probabilistic) bisimulation if (s, t) ∈ R implies, for all actions a ∈ L and all ξ, ζ ∈ D(S), that s a→ ξ ⇒ ∃ξ' ∈ D(S). t a→c ξ' ∧ ξ ≡R ξ', and t a→ ζ ⇒ ∃ζ' ∈ D(S). s a→c ζ' ∧ ζ ≡R ζ'.
Definition 2.4 (Distribution Bisimilarity).An equivalence R ⊆ DS × DS is a distribution bisimulation of M if and only if it is a bisimulation of the belief-state transformer M bs .Two distributions ξ and ζ are distribution bisimilar, notation ξ ∼ d ζ, if there exists a bisimulation R with (ξ, ζ) ∈ R. Two states s and t are distribution bisimilar, notation s ∼ d t, if δ s ∼ d δ t , where δ x denotes the Dirac distribution with δ x (x) = 1.
Example 4.1. Examples of functors on Sets which are of interest to us are:

Proposition 4.9 [Ś74, Dob06, Dob08, Jac10]. Eilenberg-Moore algebras of the finitely supported distribution monad D are exactly convex algebras as defined in Section 3. The arrows in the Eilenberg-Moore category EM(D) are convex algebra homomorphisms.

Recall the transitions x a→ δx and y a→ δy. By applying the construction that we have seen so far, we obtain a P^L-coalgebra on Sets, call it c, where the state space is D(S) and transitions are given as ζ a→ ζ for all ζ ∈ D(S). Consider now the function h : D(S) → D(S) behaving like the identity except on δy, which is mapped to δx. It is immediate to see that h : (D(S), c) → (D(S), c) is a coalgebra homomorphism and thus the kernel of h, namely the relation R = {(ζ, ζ) | ζ ∈ D(S)} ∪ {(δx, δy), (δy, δx)}, is a kernel bisimulation on Sets. However, the function h is not an algebra homomorphism from (D(S), µ) to (D(S), µ) since h(½ δx + ½ δy) = ½ δx + ½ δy, whereas ½ h(δx) + ½ h(δy) = δx.
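The failure of the homomorphism property for h can be verified numerically; the sketch below (representation ours) evaluates both sides of the equation on the convex combination ½ δx + ½ δy.

```python
def h(dist):
    """Identity on distributions over {x, y}, except delta_y -> delta_x."""
    return {"x": 1.0} if dist == {"y": 1.0} else dist

def convex(p, d1, d2):
    """The convex combination p*d1 + (1-p)*d2."""
    keys = set(d1) | set(d2)
    return {k: p * d1.get(k, 0.0) + (1 - p) * d2.get(k, 0.0) for k in keys}

dx, dy = {"x": 1.0}, {"y": 1.0}
lhs = h(convex(0.5, dx, dy))     # h(1/2 dx + 1/2 dy) = 1/2 dx + 1/2 dy
rhs = convex(0.5, h(dx), h(dy))  # 1/2 dx + 1/2 dx = delta_x
assert lhs == {"x": 0.5, "y": 0.5} and rhs == {"x": 1.0}
```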

Hereafter we fix a PA M = (S, L, →) and the corresponding belief-state transformer Mbs = (D(S), L, →). We denote by Rel_D(S) the lattice of relations over D(S) and define the monotone function b : Rel_D(S) → Rel_D(S) mapping every relation R ∈ Rel_D(S) into b(R) = {(ζ1, ζ2) | for all a ∈ L, ζ1 a→ ζ1' implies ζ2 a→ ζ2' for some ζ2' with (ζ1', ζ2') ∈ R, and ζ2 a→ ζ2' implies ζ1 a→ ζ1' for some ζ1' with (ζ1', ζ2') ∈ R}.
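When only finitely many belief states and transitions are relevant, membership in b(R) is directly checkable, and a relation R is a bisimulation up-to a technique f precisely when R ⊆ b(f(R)). A generic sketch; the states are stand-in strings and all names are ours:

```python
def in_b(pair, labels, step, S):
    """Transfer condition of b: every move of one side is matched
    by a move of the other into the relation S."""
    z1, z2 = pair
    return all(
        all(any((z1p, z2p) in S for z2p in step(z2, a)) for z1p in step(z1, a)) and
        all(any((z1p, z2p) in S for z1p in step(z1, a)) for z2p in step(z2, a))
        for a in labels
    )

def is_bisim_upto(R, f, labels, step):
    """R is a bisimulation up-to f iff R is included in b(f(R))."""
    return all(in_b(p, labels, step, f(R)) for p in R)

# A plain bisimulation is a bisimulation up-to the identity technique:
succ = {("u", "a"): {"u"}, ("v", "a"): {"v"}}
R = {("u", "v")}
assert is_bisim_upto(R, lambda X: X, {"a"},
                     lambda z, a: succ.get((z, a), set()))
```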

Lemma 7.4. The following functions are compatible:
• id: the identity function;
• f ∘ g: the composition of compatible functions f and g;
• ⋃F: the pointwise union of an arbitrary family F of compatible functions, (⋃F)(R) = ⋃_{f∈F} f(R);
• f^ω: the (omega) iteration of a compatible function f, defined as f^ω(R) = ⋃_{i=0}^∞ f^i(R).
Apart from conv, we are interested in the following up-to techniques:
• the constant function r mapping every R to the identity relation Id ⊆ D(S) × D(S);
• the square function t mapping every R to t(R) = R² = {(ζ1, ζ3) | ∃ζ2. ζ1 R ζ2 R ζ3};
• the opposite function s mapping every R to its opposite relation R⁻¹.

Theorem 7.6. Congruences of finitely generated convex algebras are finitely generated.

This result informs us that, for a PA with a finite state space S, ∼d ⊆ D(S) × D(S) is finitely generated (since ∼d is a congruence, see Proposition 6.6). In other words, there exists a finite relation R such that cgr(R) = ∼d. Such an R is a finite bisimulation up-to cgr: R ⊆ cgr(R) = ∼d = b(∼d) = b(cgr(R)).

Corollary 7.7. Let (S, L, →) be a finite PA and let ζ1, ζ2 ∈ D(S) be two distributions such that ζ1 ∼d ζ2. Then there exists a finite bisimulation up-to cgr R such that (ζ1, ζ2) ∈ R.