The Big-O Problem

Given two weighted automata, we consider the problem of whether one is big-O of the other, i.e., if the weight of every finite word in the first is not greater than some constant multiple of the weight in the second. We show that the problem is undecidable, even for the instantiation of weighted automata as labelled Markov chains. Moreover, even when it is known that one weighted automaton is big-O of another, the problem of finding or approximating the associated constant is also undecidable. Our positive results show that the big-O problem is polynomial-time solvable for unambiguous automata, coNP-complete for unlabelled weighted automata (i.e., when the alphabet is a single character) and decidable, subject to Schanuel's conjecture, when the language is bounded (i.e., a subset of $w_1^*\dots w_m^*$ for some finite words $w_1,\dots,w_m$) or when the automaton has finite ambiguity. On labelled Markov chains, the problem can be restated as a ratio total variation distance, which, instead of finding the maximum difference between the probabilities of any two events, finds the maximum ratio between the probabilities of any two events. The problem is related to $\varepsilon$-differential privacy, for which the optimal constant of the big-O notation is exactly $\exp(\varepsilon)$.


Introduction
Weighted automata over finite words are a well-known and powerful model of computation, a quantitative analogue of finite-state automata. Special cases of weighted automata include nondeterministic finite automata and labelled Markov chains, two standard formalisms for modelling systems and processes. Algorithms for the analysis of weighted automata have been studied both in the early theory of computing and, more recently, by the infinite-state systems and algorithmic verification communities.
Given two weighted automata A, B over an algebraic structure (S, +, ×), the equivalence problem asks whether the two associated functions f_A, f_B : Σ* → S are equal: f_A(w) = f_B(w) for all finite words w over the alphabet Σ. Over the ring (Q, +, ×), equivalence is decidable in polynomial time by the results of Schützenberger [Sch61] and Tzeng [Tze92]; subsequently, fast parallel (NC and RNC) algorithms have been found for this problem [Tze96, KMO+13]. In contrast, for semirings the equivalence problem is hard: undecidable [Kro94, ABK11] for the semiring (Q, max, +) and PSPACE-hard [MS72] for the Boolean semiring (for which weighted automata are the usual nondeterministic finite automata and equivalence is equality of recognized languages). For the ring (Q, +, ×), replacing = with ≤ makes the problem harder: the question of whether f_A(w) ≤ f_B(w) for all w ∈ Σ* is undecidable, even if f_A is constant [Paz14]. This problem subsumes the universality problem for (Rabin) probabilistic automata, yet another subclass of weighted automata (see, e.g., [Fij17]).
The problem of whether f_A(w) ≤ f_B(w) holds for all w ∈ Σ* has often been considered, under different semirings, but also for other forms of weighted automata, in which, e.g., lim sup or (discounted) limit averages are used to combine the weights along a run. For example, most decidability and complexity results in [CDH10, BCV18] are on such weighted automata. Alternatively, to regain decidability in view of the undecidability result mentioned in the previous paragraph, one can consider semantic restrictions on the weighted automaton, e.g., the assumption that, for every w, all runs on w have the same weight. This route has been taken, e.g., in [FGR15].
In this paper, we introduce and study another natural problem, in which the condition f_A(w) ≤ f_B(w) is relaxed. Given A and B as above, is it true that there exists a constant c > 0 such that f_A(w) ≤ c · f_B(w) for all w ∈ Σ*? Using standard mathematical notation, this condition asserts that f_A(w) = O(f_B(w)), and we refer to this problem as the big-O problem accordingly. The big-Θ problem (which turns out to be computationally equivalent to the big-O problem), in line with the Θ(·) notation in the analysis of algorithms, asks whether f_A = O(f_B) and f_B = O(f_A).
We restrict our attention to the ring (Q, +, ×) and only consider non-negative weighted automata, i.e., those in which all transitions have non-negative weights. We remark that, even under this restriction, weighted automata still form a superclass of (Rabin) probabilistic automata, a non-trivial and rich model of computation. Our initial motivation to study the big-O problem came from yet another formalism, labelled Markov chains (LMCs). One can think of the semantics of LMCs as giving a probability distribution or subdistribution on the set of all finite words. LMCs, often under the name Hidden Markov Models, are widely employed in a diverse range of applications; in computer-aided verification, they are perhaps the most fundamental model for probabilistic systems, with model-checking tools such as Prism [KNP11] or Storm [DJKV17] based on analyzing LMCs efficiently. All the results in our paper (including hardness results) hold for LMCs too. Our main findings are as follows.
• The big-O problem for non-negative WA and LMCs turns out to be undecidable in general, by a reduction from nonemptiness for probabilistic automata.
• In the unary case, i.e., if the input alphabet Σ is a singleton, the big-O problem becomes decidable and is, in fact, complete for the complexity class coNP. Unary LMCs are a simple and pure probabilistic model of computation: they run in discrete time and can terminate at any step; the big-O problem refers to this termination probability in two LMCs (or two WA). Our upper bound argument refines an analysis of growth of entries in powers of non-negative matrices by Friedland and Schneider [Sch86], and the lower bound is obtained by a reduction from unary NFA universality [SM73].
• In a more general bounded case, i.e., if the languages of all words w associated with non-zero weight are included in w_1* w_2* … w_m* for some finite words w_1, …, w_m ∈ Σ* (that is, are bounded in the sense of Ginsburg and Spanier; see [Gin66, Chapter 5] and [GS64]), the big-O problem is decidable subject to Schanuel's conjecture. This is a well-known conjecture in transcendental number theory [Lan66], which implies that the first-order theory of the real numbers with the exponential function is decidable [MW96]. Intuitively, our reliance on this conjecture is linked to the expressions for the growth rate in powers of non-negative matrices. These expressions are sums of terms of the form ρ^n · n^k, where n is the length of a word, k ∈ N, and ρ is an algebraic number. Our algorithms (however implicitly) need to compare for equality pairs of real numbers of the form log ρ_1 / log ρ_2, where the ρ_i are algebraic, and it is an open problem in number theory whether there is an effective procedure for this task (the four exponentials conjecture asks whether two such ratios can ever be equal; see, e.g., Waldschmidt [Wal00, Sections 1.3 and 1.4]).
Bounded languages form a well-known subclass of regular languages. In fact, a regular (or even context-free) language L is bounded if and only if the number of words of length n in L is at most polynomial in n. All other regular languages have, in contrast, exponential growth rate (a fact rediscovered multiple times; see, e.g., references in Gawrychowski et al. [GKRS10]). Bounded languages have been studied from combinatorial and algorithmic points of view since the 1960s [GS64, GKRS10], and have recently been used, e.g., in the analysis of quantitative information flow problems in computer security [Mes19b, Mes19a]. In the context of labelled Markov chains, languages that are subsets of a_1* a_2* … a_m* (for individual letters a_1, …, a_m ∈ Σ) model consecutive arrival of m events in a discrete-time system. It is curious that natural decision problems for such simple systems can lead to intricate algorithmic questions in number theory at the border of decidability.
• For unambiguous automata, i.e., where every word has at most one accepting path, the big-O problem is also decidable and can be solved in polynomial time.
• For finitely ambiguous automata, i.e., where there exists k such that every word has at most k accepting paths, the big-O problem is decidable subject to Schanuel's conjecture, similarly to the case of bounded languages.
Further motivation and related work. In the labelled Markov chain setting, the big-O problem can be reformulated as a boundedness problem for the following function. For two LMCs A and B, define the (asymmetric) ratio variation function by r(A, B) = sup_{E ⊆ Σ*} f_A(E)/f_B(E), where f_A(E) and f_B(E) denote the total probability mass associated with an arbitrary set of finite words E ⊆ Σ* in A and B, respectively. Here we assume 0/0 = 0 and x/0 = ∞ for x > 0. Observe that, because max(a/b, c/d) ≥ (a+c)/(b+d) for a, b, c, d ≥ 0, the supremum over E ⊆ Σ* can be replaced with a supremum over w ∈ Σ*. Consequently, the big-O problem for LMCs is equivalent to deciding whether r(A, B) < ∞; we give a formal proof of this in Section 3.
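The mediant inequality max(a/b, c/d) ≥ (a+c)/(b+d) invoked above is easy to sanity-check with exact rational arithmetic. The sketch below (the helper name `mediant_holds` and the grid of test values are our own, not from the paper) verifies it exhaustively on small positive integers:

```python
from fractions import Fraction
from itertools import product

def mediant_holds(a, b, c, d):
    """Check max(a/b, c/d) >= (a+c)/(b+d) using exact rationals."""
    return max(Fraction(a, b), Fraction(c, d)) >= Fraction(a + c, b + d)

# Exhaustive check over a small grid of positive numerators/denominators.
assert all(mediant_holds(a, b, c, d)
           for a, b, c, d in product(range(1, 8), repeat=4))
print("mediant inequality holds on the grid")
```

Exact `Fraction` arithmetic avoids any floating-point rounding in the comparison.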
Finding the value of r amounts to asking for the optimal (minimal) constant in the big-O notation. Further, one can consider a symmetric variant, the ratio distance rd(A, B) = max{r(A, B), r(B, A)}, by analogy with big-Θ (Proposition 3.2 also applies when r is replaced with rd). Now, rd is a ratio-oriented variant of the classic total variation distance tv, defined by tv(A, B) = sup_{E ⊆ Σ*} |f_A(E) − f_B(E)|, which is a well-established way of comparing two labelled Markov chains [CK14, Kie18]. We also consider the problem of approximating r (as well as rd) to a given precision and the problem of comparing it with a given constant (threshold problem), showing that both are undecidable.
The ratio distance rd is also equivalent to the exponential of the multiplicative total variation distance defined in [CGPX14, Smi08] in the context of differential privacy. Consider a system M, modelled by a single labelled Markov chain, where output words are observable to the environment but we want to protect the privacy of the starting configuration. Let R ⊆ Q × Q be a symmetric relation, which relates the starting configurations intended to remain indistinguishable. Given ε ≥ 0, we say that M is ε-differentially private (with respect to R) if, for all (s, s′) ∈ R and all events E ⊆ Σ*, we have f_s(E) ≤ e^ε · f_s′(E). Here in the subscript of f and elsewhere, references to states s and s′ replace references to LMCs/automata: M stays implicit, and we specify which state it is executed from. Note that there exists such an ε if and only if r(s, s′) < ∞ for all (s, s′) ∈ R or, equivalently, (the LMC M executed from) s is big-O of (the LMC M executed from) s′ for all (s, s′) ∈ R. In fact, the minimal such ε satisfies e^ε = max_{(s,s′)∈R} r(s, s′); thus r captures the level of differential privacy between s and s′.
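Concretely, the relation e^ε = max r(s, s′) means a lower bound on ε can be read off from per-word ratios. The sketch below is our own illustration (the function name and the finite word sample are assumptions, not the paper's construction); restricting the supremum to a finite sample yields only a lower bound on the true level of privacy:

```python
import math

def eps_lower_bound(f_s, f_s2, words):
    """Lower bound on the privacy level eps via e^eps = sup_w f_s(w)/f_s'(w),
    with the sup restricted to a finite sample of words."""
    best = 0.0
    for w in words:
        p, q = f_s(w), f_s2(w)
        if q == 0 and p > 0:
            return math.inf  # s is not big-O of s': no finite eps exists
        if q > 0:
            best = max(best, p / q)
    return math.log(best) if best > 0 else 0.0

# Hypothetical one-letter word weights from two start states:
f1 = {"a": 0.6, "b": 0.4}
f2 = {"a": 0.5, "b": 0.5}
print(eps_lower_bound(f1.get, f2.get, ["a", "b"]))  # log(0.6/0.5) ≈ 0.182
```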
Our results show that even deciding whether the multiplicative total variation distance is finite or +∞ is, in general, impossible. Likewise, it is undecidable whether a system modelled by a labelled Markov chain provides any degree of differential privacy, however low.

Preliminaries
Let N, Z, Q and R be the natural, integer, rational and real numbers respectively. When accompanied by a constraint in the subscript, we restrict to the subset of numbers satisfying the constraint. For example, N_{≥k} are the natural numbers greater than or equal to k, and in particular, N_{>0} are the positive natural numbers.

Definition 2.1. A weighted automaton W over the (Q, +, ×) semiring is defined as a 4-tuple ⟨Q, Σ, M, F⟩, where Q is a finite set of states, Σ is a finite alphabet, M : Σ → Q^{Q×Q} is a transition weighting function, and F ⊆ Q is a set of final states. We consider only non-negative weighted automata, i.e., M(a)(q, q′) ≥ 0 for all a ∈ Σ and q, q′ ∈ Q.
In complexity-theoretic arguments, we assume that each weight is given as a pair of integers (numerator and denominator) in binary. The description size is then the number of bits required to represent ⟨Q, Σ, M, F⟩, including the bit size of the weights. Each weighted automaton defines functions f_s : Σ* → R for all s ∈ Q, where f_s(a_1 … a_n) = Σ_{t∈F} (M(a_1) × ⋯ × M(a_n))(s, t) and A × B is standard matrix multiplication. We refer to f_s(w) as the weight of w from state s. Without loss of generality, a weighted automaton can have a single final state: if not, introduce a new unique final state t such that M(a)(q, t) = Σ_{q′∈F} M(a)(q, q′) for all q ∈ Q, a ∈ Σ.
We will typically define weighted automata by listing transitions q −a,p→ q′ (to mean M(a)(q, q′) = p), with the assumption that any unspecified transition has weight 0.
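To make the matrix semantics concrete, here is a minimal sketch in Python with NumPy (the function name `weight` and the toy automaton are our own, for illustration) computing f_s(w) as a product of transition matrices summed over final states:

```python
import numpy as np

def weight(M, s, final, word):
    """f_s(word): left-multiply the unit vector of state s by the
    transition matrix of each letter, then sum over the final states."""
    dim = next(iter(M.values())).shape[0]
    v = np.zeros(dim)
    v[s] = 1.0
    for a in word:
        v = v @ M[a]
    return sum(v[f] for f in final)

# Toy automaton: q0 -a,1/2-> q0 and q0 -a,1/2-> q1, with q1 final.
M = {"a": np.array([[0.5, 0.5],
                    [0.0, 0.0]])}
print(weight(M, 0, {1}, "aa"))  # (1/2) * (1/2) = 0.25
```

The vector–matrix formulation is equivalent to multiplying the matrices out and reading entry (s, t), but runs in time linear in |w|.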
Definition 2.2. We denote by L_s(W) the set of w ∈ Σ* with f_s(w) > 0, that is, with positive weight from s. Equivalently, this is the language of N_s(W), the non-deterministic finite automaton (NFA) formed from the same set of states (and final states) as W, start state s, and transitions q −a→ q′ whenever M(a)(q, q′) > 0.
Given s, s′ ∈ Q, we say that s is big-O of s′ if there exists C > 0 such that f_s(w) ≤ C · f_s′(w) for all w ∈ Σ*. The paper studies the following problem.
In the paper we also work with labelled Markov chains. In particular, they will appear in examples and hardness (including undecidability) arguments. As they are a special class of weighted automata, this will imply hardness (resp. undecidability) for weighted automata in general. On the other hand, our decidability results will be phrased using weighted automata, which makes them applicable to labelled Markov chains.
Definition 2.4. A labelled Markov chain (LMC) is a (non-negative) weighted automaton ⟨Q, Σ, M, F⟩ such that Σ_{q′∈Q} Σ_{a∈Σ} M(a)(q, q′) = 1 for all q ∈ Q \ F, and M(a)(q, q′) = 0 for all a ∈ Σ, q ∈ F and q′ ∈ Q.
Since final states have no outgoing transitions, w.l.o.g. one can assume a unique final state. For LMCs, the function f_s can be extended to a measure on the powerset of Σ* by f_s(E) = Σ_{w∈E} f_s(w), where E ⊆ Σ*. The measure is a subdistribution: Σ_{w∈Σ*} f_s(w) ≤ 1.
Probabilistic automata are similar to LMCs, except that M(a) is stochastic for every a, rather than Σ_{a∈Σ} M(a) being stochastic.
Definition 2.5. A probabilistic automaton A is a non-negative weighted automaton with a distinguished start state q_s such that Σ_{q′∈Q} M(a)(q, q′) = 1 for all q ∈ Q and a ∈ Σ. We use the notation P_A(w) = f_{q_s}(w), where q_s is the start state of the probabilistic automaton A.
We will also consider unary weighted automata, and similarly LMCs, where |Σ| = 1. Then we will often omit Σ on the understanding that Σ = {a}, and describe transitions with a single transition matrix A = M(a), so that f_s(a^n) = A^n_{s,t}, where t is the unique final state. Note that A^n_{s,t} stands for (A^n)(s, t), and not (A(s, t))^n. Using the notation of regular expressions, we can write L_s(W) ⊆ a*. It will turn out fruitful to consider several larger classes of languages; in each case, if the language of an NFA is suitably bounded, one can extract a corresponding bounding regular expression [GKRS10].
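In the unary case the semantics reduces to matrix powers, which is easy to experiment with. The sketch below uses a toy matrix of our own choosing whose (0, 1) entry equals n/2^n, an instance of the ρ^n · n^k growth pattern mentioned in the introduction:

```python
import numpy as np

def unary_weight(A, s, t, n):
    """f_s(a^n) = (A^n)(s, t) for a unary weighted automaton."""
    return np.linalg.matrix_power(A, n)[s, t]

# Toy matrix: A = (1/2) * (I + N) with N nilpotent, so
# (A^n)(0, 1) = n / 2^n -- growth of the form rho^n * n^k.
A = np.array([[0.5, 0.5],
              [0.0, 0.5]])
print(unary_weight(A, 0, 1, 3))  # 3/8 = 0.375
```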

Related problems
In this section we discuss three problems related to the big-O problem: the ratio variation function, the big-Θ problem and the eventually big-O problem. In particular, Observation 3.1, relating the ratio variation function, will be useful when we show undecidability.
3.1. Ratio variation functions and distances. We introduced in Section 1 the ratio variation function r(s, s′) = sup_{E⊆Σ*} (f_s(E)/f_s′(E)) (now specified between two different starting states) and the ratio distance rd(s, s′) = max{r(s, s′), r(s′, s)} on labelled Markov chains. The supremum is taken over sets of words, reflecting its origin as a generalisation of the total variation distance. However, over labelled Markov chains this is not necessary, as the supremum can be taken over individual words. The big-O problem is also stated over individual words, and in the formulation r(s, s′) = sup_{w∈Σ*} (f_s(w)/f_s′(w)) the connection between the ratio variation function and the big-O problem becomes clear. We now formally show this reformulation of the ratio variation function.
Proposition 3.2. For r on labelled Markov chains, it is sufficient to consider the supremum over w ∈ Σ* rather than E ⊆ Σ*.
Proof of Proposition 3.2. We will show that any event can be approximated by a finite subset, and that an event with more than one word can always be simplified without decreasing the ratio.
Suppose (a+c)/(b+d) > a/b and (a+c)/(b+d) > c/d. By the first inequality, we have ab + bc > ab + ad, so bc > ad. But by the second inequality, we have ad + cd > bc + cd, so ad > bc. This is a contradiction.
By repeated application of this technique, a finite set can always be simplified until it contains just a single element, resulting in a ratio which is no smaller. That is, for every finite set E there exists a word w ∈ E with f_s(w)/f_s′(w) ≥ f_s(E)/f_s′(E). Now, we show that the consideration of finite sets is sufficient: consider an event E ⊆ Σ*; then for every λ > 0 there is a finite subset of E capturing all but λ of its mass [Lemma 12]. For any ε, by choice of sufficiently small λ there is a finite set whose ratio is within ε of that of E, and by Eq. (3.1) this is equivalent to lim_{k→∞} sup_{w ∈ Σ* ∩ Σ^{≤k}} f_s(w)/f_s′(w).

3.2. The big-Θ problem. One could consider whether s is big-Θ of s′, defined as: s is big-O of s′ and s′ is big-O of s; or equivalently for LMCs, whether rd(s, s′) < ∞. We note that these two notions reduce to each other, justifying our consideration of only the big-O problem. There is an obvious reduction from big-Θ to big-O making two oracle calls (a Cook reduction), but, as we show below, this can be strengthened to a single call preserving the answer (a Karp reduction). In the other direction, one can ask if s is big-O of s′ using big-Θ by asking if a linear combination of s and s′ is big-Θ of s′: this ensures that one direction of the big-Θ condition always holds, essentially leaving only whether the combination is big-O of s′ to be checked, which depends only on whether s is big-O of s′.
Lemma 3.3. The big-O problem is interreducible with the big-Θ problem.

Proof.
Direction 1 (big-O problem reduces to the big-Θ problem). To ask if s is big-O of s′, add states q, q′ using the construction of Fig. 1a, then ask if q is big-Θ of q′.
The construction ensures that f_q(aw) = f_s(w) and f_q′(aw) = 0.5 f_s(w) + 0.5 f_s′(w).

Direction 2 (big-Θ problem reduces to the big-O problem). Given an automaton with weighting function f, let us construct a new automaton f̄ by replacing every transition with a pair of transitions through a fresh intermediate state. The effect of this procedure is that any word of odd length has weight zero in f̄. Given a word w = a_1 … a_n, denote w̄ = a_1 a_1 … a_n a_n. Let us choose any character a ∈ Σ. We now add states q, q′, •_1, •_2 as follows: (1) q −a,0.5→ s̄ and q −a,0.5→ … (the remaining transitions are as in Fig. 1b). We claim that asking if s is big-Θ of s′ is equivalent to asking if q is big-O of q′. The idea of the construction is provided in Fig. 1b. Then we obtain the requisite bounds by computing f_q(a w̄) and f_q′(a w̄). Each of the reductions operates in logarithmic space.
3.3. The eventually big-O problem. Readers familiar with the big-O notation may recall a definition on f, g : N → N of the form: there exist C > 0 and N such that f(n) ≤ C · g(n) for all n > N. Despite excluding finitely many points, when g(n) ≥ 1 this is equivalent to the definition we use, ∃C > 0 ∀n > 0 f(n) ≤ C · g(n), by taking C large enough to deal with the finite prefix.
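The "absorb the finite prefix into the constant" step can be spelled out: given f(n) ≤ C·g(n) for n > N and g(n) ≥ 1, the constant C′ = max(C, max_{n≤N} f(n)/g(n)) works globally. A small sketch (the helper name and example functions are our own):

```python
def bigo_constant(f, g, C, N):
    """Turn an 'eventually big-O' witness (f(n) <= C*g(n) for n > N,
    with g(n) >= 1) into a global constant by absorbing the prefix."""
    prefix = max((f(n) / g(n) for n in range(N + 1)), default=0.0)
    return max(C, prefix)

f = lambda n: n + 10   # f(n) <= 2*g(n) only once n >= 8
g = lambda n: n + 1.0
C2 = bigo_constant(f, g, 2.0, 10)
assert all(f(n) <= C2 * g(n) for n in range(10_000))
```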
In the paper, though, we formally consider s to not be big-O of s′ if there exists even a single word w such that f_s(w) > 0 and f_s′(w) = 0. However, for weighted automata, we could amend our definition to "eventually big-O" as follows: there exist C > 0 and N such that f_s(w) ≤ C · f_s′(w) for all w with |w| > N. The eventually big-O problem and the big-O problem are also interreducible. We delay showing this equivalence to Section 6.1, as it depends on some further technical development.
Allowing additive error (in addition to the multiplicative error) in the inequality can also capture a similar relaxation of the big-O problem. Colcombet and Daviaud [CD13] study the problem of affine domination, which asks whether there exists c > 0 such that f_s(w) ≤ c · f_s′(w) + c for all w ∈ Σ*. The problem is shown decidable for weighted automata over the tropical (N ∪ {∞}, min, +) semiring, in contrast to the containment problem (f_s(w) ≤ f_s′(w) for all w ∈ Σ*?), which is undecidable in that semiring. We observe that both containment and the big-O problem are undecidable in the (Q_{≥0}, +, ×) semiring that we are considering.

Big-O, Threshold and Approximation problems are undecidable
We show that the big-O problem is undecidable. We also establish undecidability for several other problems related to computing and approximating the ratio variation distance. Recall that this corresponds to identifying the optimal constant for positive instances of the big-O problem or the level of differential privacy between two states in a labelled Markov chain. Results in this section are presented for ratio total variation distances on labelled Markov chains, and thus apply to the big-O problem in the more general weighted automata.
In the following definition, a promise problem (see, e.g., [ESY84]) restricts the inputs to the problem. An algorithm professing to decide the problem need only give a correct answer on the inputs conforming to the promise. If the input does not meet the promise, then the answer is not specified and can be arbitrary (including non-termination). In particular, it need not be decidable whether the promise holds.
Definition 4.1. The asymmetric threshold problem takes an LMC along with two states s, s′ and a constant θ, and asks if r(s, s′) ≤ θ. The variant under the promise of boundedness promises that r(s, s′) < ∞; i.e., the input to the problem is restricted to the instances where it is known that r(s, s′) < ∞.
The strict variant of each problem replaces ≤ with <.
The asymmetric additive approximation task takes an LMC, two states s, s′ and a constant γ, and asks for x such that |r(s, s′) − x| ≤ γ. The asymmetric multiplicative approximation task takes an LMC, two states s, s′ and a constant γ, and asks for x such that 1 − γ ≤ x/r(s, s′) ≤ 1 + γ. In each case, the symmetric variant is obtained by replacing r with rd.

(Our usage of the big-O notation follows de Bruijn's classic monograph [dB81, Section 1.2]. For two nonnegative functions f, g : R → R, f = O(g) on a set S if there is a constant C > 0 such that f(x) ≤ C · g(x) for all x ∈ S; f = O(g) as x → a (as x → ∞, respectively) if there is a neighbourhood of a (of ∞, respectively) such that f = O(g) in that neighbourhood, in the sense defined above. It is now clear that our definition of big-O from Section 2 asserts f_s(w) = O(f_s′(w)) on the set Σ*. In comparison, "eventually big-O" asserts that f_s(w) = O(f_s′(w)) as |w| → ∞.)
Theorem 4.2.
• The big-O problem is undecidable, even for LMCs.
• Each variant of the threshold problem (asymmetric/symmetric, non-strict/strict) is undecidable, even under the promise of boundedness.
• All variants of the approximation tasks (asymmetric/symmetric, additive/multiplicative) are recursively unsolvable, even under the promise of boundedness.
The proof of Theorem 4.2 will reduce from the emptiness problem for probabilistic automata (recall Definition 2.5). The problem Empty asks, given a probabilistic automaton A, if P_A(w) ≤ 1/2 for all words w ∈ Σ*. (Is the language {w ∈ Σ* | P_A(w) > 1/2} empty?) It is known to be undecidable [Paz14, Fij17]. Recall, we use the notation P_A(w) to refer to the probability mass in the probabilistic automaton, and the notation f_s(w) for the weight (from state s) in the labelled Markov chain.
Proof idea for Theorem 4.2. We reduce from Empty. Assume we are given a probabilistic automaton A, for which we would like to decide the emptiness problem. We first give a construction that builds a specific labelled Markov chain from the probabilistic automaton. The resulting labelled Markov chain will have the property that answering any of the questions of Definition 4.1 would allow us to answer the emptiness problem for the probabilistic automaton A. Therefore, an algorithm to decide any of the questions in Definition 4.1 would give an algorithm to decide the emptiness problem. Since the emptiness problem is undecidable, there can be no algorithm to decide any of these questions.
The construction creates two branches of a labelled Markov chain. The first simulates the probabilistic automaton using the original weights multiplied by a scalar (1/4 in the case |Σ| = 2). The other branch processes each letter from Σ with equal weight (also 1/4), in an infinite loop. Consequently, if there is a word accepted with probability greater than 1/2, the ratio between the two branches will be greater than 1. The construction enables words to be processed repeatedly, so that the ratio can then be pumped unboundedly. Certain linear combinations of the branches entail a gap promise, entailing undecidability of the threshold and approximation tasks.
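The pumping step can be illustrated numerically. Up to constants elided here, each repetition of (w acc) multiplies the ratio between the two branches by roughly 2·P(w), so any word with P(w) > 1/2 drives the ratio above every bound. The sketch below is our own simplified model, not the exact weights of the construction:

```python
def pumped_ratio(p, i):
    """Simplified model of the branch ratio on (w acc)^i rej:
    each round contributes a factor of 2*p (constants elided)."""
    return (2.0 * p) ** i

for i in (1, 10, 100):
    print(i, pumped_ratio(0.6, i))  # unbounded growth, since 2*0.6 > 1
print(pumped_ratio(0.4, 100))      # shrinks towards 0 when P(w) < 1/2
```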
We now implement the proof idea: first we show the properties required of the construction and (in Section 4.1) how these properties imply Theorem 4.2.Then we show the construction (in Section 4.2) and prove that the desired properties hold (in Section 4.3).
Given any probabilistic automaton A, we construct a labelled Markov chain with three distinguished starting states s, s′ and s″. These states will exhibit the properties listed below (Properties 4.3).

4.1. Properties 4.3 imply Theorem 4.2. The following lemma plays a key role in proving the result. In its statement, "undecidable to distinguish" means that the corresponding promise problem is undecidable. In other words, if the input is not in one of the two cases which should be distinguished between, the answer is not specified and can be arbitrary (including non-termination).
Due to the undecidability of Empty, a construction satisfying Properties 4.3 immediately establishes the following undecidability results.

Lemma 4.4.
• Given an LMC along with two states s, s′ and a constant c, it is undecidable to distinguish between r(s, s′) ≤ c and r(s, s′) = ∞.
• Given an LMC along with two states s, s′ and two numbers c and C such that c < C, it is undecidable to distinguish between r(s, s′) ≤ c and C ≤ r(s, s′) < ∞.
Both statements remain true if r is replaced with rd. This is because the bounds r(s′, s) ≤ 2 and r(s″, s) ≤ 2 ensure that r(s, s′) and r(s, s″) dominate in the computation of rd, so the preceding statements hold when r is replaced by rd. Before we give the construction satisfying Properties 4.3, we show how Lemma 4.4 implies our theorem.
Proof of Theorem 4.2. We reason by contradiction using Lemma 4.4. For the big-O problem, it suffices to observe that, if it were decidable, one could use it to solve the first promise problem from the Lemma (recall that in a promise problem the input is guaranteed to fall into one of the two cases). This would contradict Lemma 4.4.
Similarly, the decidability of the (asymmetric) threshold problem would allow us to distinguish between r(s, s′) ≤ c and C ≤ r(s, s′) < ∞ (second promise problem from the Lemma) by considering the instance r(s, s′) ≤ (c+C)/2 (non-strict variant) or r(s, s′) < (c+C)/2 (strict variant). A positive answer (regardless of the variant) implies r(s, s′) ≤ c, while a negative one yields r(s, s′) ≥ C, which suffices to distinguish the cases. Note that in both cases r(s, s′) is bounded, so the reasoning remains valid if it is known in advance that r(s, s′) is bounded.
For additive (asymmetric) approximation, we observe that finding x such that |r(s, s′) − x| ≤ (C−c)/4 and comparing it with (c+C)/2 makes it possible to distinguish between r(s, s′) ≤ c and C ≤ r(s, s′) < ∞. For multiplicative approximation, taking γ = (C−c)/(4C) and comparing x with (c+C)/2 yields an analogous argument. Since Lemma 4.4 also applies to rd, all of our results hold when r is replaced by rd.

4.2. The construction. We now show how to build, given a probabilistic automaton, a labelled Markov chain that satisfies Properties 4.3. (We prove that the construction indeed satisfies these properties in the next section.)

Construction 4.5. For both cases, we reduce from Empty. We show our construction for Σ = {a, b}, but the procedure can be generalised to arbitrary alphabets.
The construction creates two branches of a labelled Markov chain. The first, from state q_s, simulates the given probabilistic automaton using the original weights multiplied by the same scalar (in this case 1/4). The other branch, from state s_0, processes each letter from Σ with equal weight (also 1/4), in an infinite loop. Consequently, if there is a word accepted with probability greater than 1/2, the ratio between the two branches will be greater than 1. The construction makes it possible to process words repeatedly, so that the ratio can then be pumped unboundedly.
Formally, suppose we are given a probabilistic automaton A = ⟨Q, Σ, M, F⟩ with start state q_s. First observe that w.l.o.g. q_s is not accepting: otherwise the empty word is accepted with probability 1, so there is a word with probability greater than 1/2 and a trivial positive instance of the big-O problem can be returned.
We construct the LMC ⟨Q′, Σ′, δ, F′⟩, taking Q′ = Q ⊎ {s, s′, s″, s_0, t} (where ⊎ denotes disjoint union), Σ′ = {a, b, acc, rej}, F′ = {t} and δ as specified below. First we simulate the probabilistic automaton with a scaling factor of 1/4 for all q, q′ ∈ Q. Originally accepting runs trigger a restart, while rejecting ones are redirected to t. We then add a part of the chain which behaves uniformly, rather than according to the probabilistic automaton. The construction is illustrated in Fig. 2. To complete the reduction, we add transitions from s, s′, s″ as shown there.

4.3. Proof that the construction satisfies Properties 4.3. It remains to show that Construction 4.5 satisfies Properties 4.3, which we do in the following two lemmas.
If there is a word w that is accepted by the automaton with probability > 1/2, then let w_i = (w acc)^i rej and consider the ratio of the two branches on w_i. Since P(w) > 1/2, we have 2P(w) > 1, and the ratio grows unboundedly with i.

Figure 2: Reduction; q_a represents accepting states of the probabilistic automaton, q_r represents rejecting states and q_s represents the start state (assumed to be rejecting).
If there is no such word, then P(w) ≤ 1/2 for all w ∈ Σ*, and the probability ratio of all words is bounded. All words starting from states s and s′ are terminated by rej, so in general all words take the form w = (w_1 acc) … (w_n acc)(w_{n+1} rej). Let us consider the probability of such words from s_0 and q_s. Then, using Eq. (4.1), for every word w we have 1/2 ≤ f_s(w)/f_s′(w) ≤ 3/2, so r(s, s′) ≤ 3/2 and rd(s, s′) ≤ 2.

Lemma 4.7. If A ∉ Empty then 49 < r(s, s″) ≤ 51. If A ∈ Empty then r(s, s″) ≤ 2. In either case, r(s″, s) ≤ 2.
Proof. We first observe that f_s″(w)/f_s(w) is always ≤ 2, so the only interesting direction is f_s(w)/f_s″(w). If there is a word w that is accepted by the automaton with probability > 1/2, then we consider the sequence of words w_i = (w acc)^i rej.
By the previous proof (Eq. (4.2)), for all ε there exists an i such that f_s(w_i)/f_s″(w_i) is within ε of the claimed bound. If there is no such word, then P(w) ≤ 1/2 for all w ∈ Σ*. We show that the distance will then be small. All words starting from states s and s″ are terminated by rej, so in general all words take the form w = (w_1 acc) … (w_n acc)(w_{n+1} rej). Considering the probability of such words from s and s″ creates a significant gap between the case where there is a word with probability greater than one half and the case where there is not: if there exists w with P(w) > 1/2 then 49 < r(s, s″) ≤ 51 and 49 < rd(s, s″) ≤ 51, but if no such word exists then r(s, s″) ≤ 2 and rd(s, s″) ≤ 2.
Remark 4.8. The classic non-strict threshold problem for the total variation distance (i.e. whether tv(s, s′) ≤ θ) is known to be undecidable [Kie18], like our distances. However, it is not known if its strict variant (i.e. whether tv(s, s′) < θ) is also undecidable. In contrast, in our case, both variants are undecidable. Further note that (additive) approximation of tv is possible [Kie18, CK14], but this is not the case for our distances r and rd.

The relation to the Value-1 Problem
In the previous section we have shown the undecidability of the big-O problem using the undecidability of the emptiness problem for probabilistic automata. Another proof of undecidability can be obtained using the Value-1 problem, shown to be undecidable in [GO10]: indeed, we will show in this section that the big-O problem and the Value-1 problem are interreducible.
The Value-1 problem asks whether there exists a sequence of words with probability tending to 1:

Input: A probabilistic automaton A.
Question: For all δ > 0, does there exist a word w such that P_A(w) > 1 − δ?
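To make the objects concrete, here is a small sketch (the matrix encoding is hypothetical, not taken from the paper) of how P_A(w) is computed in a probabilistic automaton: one stochastic matrix per letter, an initial state, and the accept probability being the mass on accepting states after reading w.

```python
# Hypothetical encoding of a probabilistic automaton: one stochastic matrix
# per letter, an initial state index, and a set of accepting state indices.

def word_probability(matrices, initial, accepting, word):
    """Return P_A(word): the probability mass on accepting states after
    reading word from the initial state."""
    n = len(matrices[next(iter(matrices))])
    dist = [0.0] * n
    dist[initial] = 1.0
    for letter in word:
        M = matrices[letter]
        dist = [sum(dist[i] * M[i][j] for i in range(n)) for j in range(n)]
    return sum(dist[q] for q in accepting)

# A two-state automaton where 'a' moves half of the remaining mass into the
# absorbing accepting state 1, so P_A(a^i) = 1 - (1/2)^i tends to 1: a
# positive instance of the Value-1 problem.
M = {'a': [[0.5, 0.5], [0.0, 1.0]]}
print(word_probability(M, 0, {1}, 'aaa'))  # 0.875
```

No single word reaches probability 1 here, yet the supremum over all words is 1, which is exactly the distinction the Value-1 problem captures.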
We will exhibit a close, but not complete, connection between the Value-1 problem and the big-O problem by reducing in both directions between the two; see Section 5.1. In this section we are concerned only with matters of decidability, and thus do not consider whether the reductions are efficient.
Our main decidability results (in Sections 7, 8 and 9) are independent of the findings of this section, so it can be skipped on a first reading. However, Section 5.2 below justifies these further developments, showing why the connection exhibited here does not entail decidability results for the big-O problem by a transfer from decidable cases of Value-1.

Proof. Assume we are given a probabilistic automaton A = ⟨Q, Σ, M, F_A⟩ and a dedicated starting state q_0 ∈ Q, which accepts words w with probability P_A(w). First construct A′, in which words w are accepted with probability P_{A′}(w) = 1 − P_A(w), by inverting the accepting states.

Interreduction between the Value-1 problem and the big-O problem
Let us first consider the idea of the proof. Consider the probabilistic automaton B such that B(w) = 1 for all w ∈ Σ*. Then B is not big-O of A′ if and only if there is a sequence of words (w_i)_i for which A′(w_i) → 0 as i → ∞. Since we wish to show the undecidability even for labelled Markov chains, we adjust the weight at each step by the size of the alphabet in order to encode a probabilistic automaton inside the labelled Markov chain. The remainder of the proof formally builds this construction.
Construct a labelled Markov chain M_{A′} with final states F = {acc}. The probabilistic automaton A′ will be simulated by M_{A′}; the map M(a)(q, q′) = p is described using the notation q −a,p→ q′ and is defined for all q ∈ Q. Note that the only words with positive probability are words of the form $Σ*$ ⊆ Σ′*. Then, if there is a sequence of words (w_i)_i for which P_A(w_i) tends to 1, the ratio f_s($w_i$)/f_{s′}($w_i$) is unbounded. However, if there exists some γ > 0 such that P_A(w) ≤ 1 − γ for all w ∈ Σ*, then 1 − P_A(w) ≥ γ, and so f_s($w$)/f_{s′}($w$) ≤ 1/γ.
Proposition 5.3. The big-O problem reduces to the Value-1 problem.
Proof. We take an instance of the big-O problem, assumed to be a labelled Markov chain with states s, s′, and construct a probabilistic automaton A as an instance of the Value-1 problem. This new probabilistic automaton will have the property that s is not big-O of s′ if and only if A is a positive instance of the Value-1 problem. We start with the construction, followed by both directions of the implication.
Construction 5.4. Given a labelled Markov chain M = ⟨Q, Σ, M, F⟩ and s, s′ ∈ Q, construct a probabilistic automaton A = ⟨Q′, Σ′, M′, F′⟩. Each state of Q will be duplicated, once for s and once for s′. The reduction can be seen in Fig. 3: only the effect of transitions on the $ symbol is shown in black, with the possibility to transition to the recoverable sink state sink_1 depicted in grey (on symbols in Σ); all remaining transitions and sink_2 are omitted. Each transition of M will be simulated in each of the copies according to the probability in M: for every q, q′ ∈ Q and a ∈ Σ, let M′(a)(q_s, q′_s) = M(a)(q, q′) and M′(a)(q_{s′}, q′_{s′}) = M(a)(q, q′). A probabilistic automaton should be stochastic for every a ∈ Σ, so there is unused probability for each character, which will divert to a (recoverable) sink: for every q ∈ Q, a ∈ Σ and x ∈ {s, s′}, let M′(a)(q_x, sink_1) = 1 − Σ_{q′∈Q} M(a)(q, q′). There is an additional control character $: from q_0 the machine will pick either of the two copies with equal probability, M′($)(q_0, s_s) = M′($)(q_0, s′_{s′}) = 1/2. If in the accepting, rejecting, or second (unrecoverable) sink state, the system will stay there forever. The behaviour on $ will differ in the two copies of M: if in an accepting state reached from s, the system will accept; if in an accepting state reached from s′, the system will reject; otherwise the system will restart back to q_0. Formally, M′($)(q_s, acc) = 1 when q ∈ F, M′($)(q_s, q_0) = 1 when q ∉ F, M′($)(q_{s′}, rej) = 1 when q ∈ F, and M′($)(q_{s′}, q_0) = 1 when q ∉ F.
It is intended that all a ∈ Σ are unreadable from q_0: on any a ∈ Σ the system moves to the unrecoverable sink state sink_2, i.e. M′(a)(q_0, sink_2) = 1.
The idea is that if f_s(w) is much larger than f_{s′}(w) then, by repeatedly reading the word w, all of the probability mass will eventually move to acc; otherwise a sufficiently large amount of mass will be lost to rej.
Denote by P_A(w) the probability of a word w in the probabilistic automaton A, from state q_0. Here we use the notation f to refer to the probabilities in the original labelled Markov chain M. Further, the notation P[q −w→ q′] stands for (M′(w_1) × ⋯ × M′(w_{|w|}))_{q,q′}, i.e., the probability of transitioning from state q to q′ while reading w in A.
Direction 1 (Not big-O implies Value-1). The proof shows that for every δ > 0 there exists C > 0 such that, for all w ∈ Σ*, the inequality f_s(w) > C f_{s′}(w) implies P_A(($w$)^i) > 1 − δ for some appropriately chosen i ∈ N.
Hence, given δ, choose C large enough that (1 − δ/2) · C/(C+1) ≥ 1 − δ. Then, by the not-big-O property, choose a word w such that f_s(w) > C f_{s′}(w). The input word $w$ induces a (unary) Markov chain, represented by a matrix A over the states q_0, acc and rej. Then, for the word ($w$)^i, i ∈ N, starting from state q_0, observe: for each i, we have A^i(q_0, acc) + A^i(q_0, rej) + A^i(q_0, q_0) = 1. Choose i such that A^i(q_0, q_0) ≤ δ/2. Then A^i(q_0, acc) + A^i(q_0, rej) ≥ 1 − δ/2 and, using the fact that A^i(q_0, acc) ≥ C · A^i(q_0, rej), we obtain A^i(q_0, acc) ≥ (1 − δ/2) · C/(C+1) ≥ 1 − δ.

Direction 2 (big-O implies Not Value-1). We know that there exists C > 0 such that f_s(w) ≤ C f_{s′}(w) for all w ∈ Σ*, and should show that there exists δ > 0 such that every w ∈ (Σ ∪ {$})* has P_A(w) ≤ 1 − δ.
To move probability from q_0 to acc it is necessary to use words of the form ($Σ*$)*, where Σ is the alphabet of M. Hence any word w can be decomposed into $w_m$$w_{m−1}$…$w_1$, and we observe x_m = P_A(w). We compute x_i, y_i inductively and show x_i ≤ C y_i for all 1 ≤ i ≤ m.

Let us define x_i (resp. y_i) as the probability of having reached acc (resp. rej) after reading $w_i$…$w_1$.
Consider reading w_1: since there exists C > 0 such that f_s(v) ≤ C f_{s′}(v) for all v ∈ Σ*, we have x_1 ≤ C y_1. Repeating this argument inductively, we show that x_i ≤ C y_i for all i: assuming f_s(w_i) ≤ C f_{s′}(w_i) and x_{i−1} ≤ C y_{i−1}, we obtain x_i ≤ C y_i. We have x_m ≤ C y_m and x_m + y_m ≤ 1, so the probability of reaching acc is bounded away from 1 for every word.
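The bookkeeping in both directions boils down to iterating the three-state chain on (q_0, acc, rej) induced by a block $w$. A numerical sketch (the per-round probabilities p of moving to acc and r of moving to rej are hypothetical values, not from the paper) confirms the two facts used above: the mass on q_0 vanishes, and acc, rej split the remainder in the fixed ratio p : r.

```python
# States ordered (q0, acc, rej); in each round, q0 sends mass p to acc,
# r to rej, and keeps 1 - p - r. The values p = 0.3, r = 0.1 are
# illustrative placeholders.

def step(dist, p, r):
    q0, acc, rej = dist
    return [q0 * (1.0 - p - r), acc + q0 * p, rej + q0 * r]

dist = [1.0, 0.0, 0.0]
for _ in range(200):
    dist = step(dist, p=0.3, r=0.1)

# Almost no mass remains on q0, and acc:rej equals the per-round ratio 3:1.
print(round(dist[1], 6), round(dist[2], 6))  # 0.75 0.25
```

With p/r > C the acc share exceeds C/(C+1), which is exactly how Direction 1 pushes the acceptance probability above 1 − δ.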

What do decidable cases of the Value-1 problem tell us about the big-O problem?
The Value-1 problem is undecidable in general; however, it is decidable in coNP in the unary case [CKV14] and for leaktight automata [FGO12]. Note, however, that the construction of Proposition 5.3 combined with these decidability results does not entail decidability results for the big-O problem. Firstly, the construction adds an additional character, so a unary instance of the big-O problem always has at least two characters when translated to the Value-1 problem. Further, as we show in the remainder of this section, the construction does not result in a leaktight automaton. This argument does not, of course, preclude the existence of an alternative construction which could, hypothetically, maintain these properties.
Let us recall the definition of leaktight automata [FGO12].

Definition 5.5. A finite word u is idempotent if reading the word u once or twice does not change the transition probabilities qualitatively, that is, P_A(u)(q, q′) > 0 if and only if P_A(uu)(q, q′) > 0. Let (u_n)_n be a sequence of idempotent words. Assume that the sequence of matrices P_A(u_n) converges to a limit M̄, that this limit is idempotent, and denote by M̄ the associated Markov chain. The sequence (u_n)_n is a leak if there exist r, q ∈ Q such that the following three conditions hold: (1) r and q are recurrent in M̄; (2) P_A(u_n)(r, q) > 0 for all n; (3) M̄(r, q) = 0. An automaton is leaktight if there is no leak.
If there were no leak in the probabilistic automata obtained from the reduction of Proposition 5.3, then the decidability of the big-O problem for labelled Markov chains would follow. However, this is not the case: Proposition 5.3 does not solve any cases of the big-O problem by reducing to known decidable fragments of Value-1.
Proposition 5.6. The automaton resulting from the reduction of the big-O problem to the Value-1 problem has a leak.
Proof. Consider some infinite sequence of words w_i, growing in length, such that f_s(w_i) > 0 for every i. Let u_i = $w_i$.
Observe that such a word is idempotent: for each starting state, consider the states reachable with non-zero probability; in all cases the set reachable after one application of u_i equals the set reachable after two. Assume that the original labelled Markov chain has a sink, that is, the decision to terminate the word must be made probabilistically. Then for all λ > 0 there exists n such that f_s(Σ^{>n}) < λ and f_{s′}(Σ^{>n}) < λ [Kie18, Lemma 12].
Suppose the sequence of matrices P_A(u_n) converges to a limit M̄ (this is with no loss of generality, because each matrix in the sequence belongs to a compact set, so a convergent subsequence must exist) and let r = q_0 and q = acc. By the observation above, for longer and longer words the probability of reaching acc from q_0 is diminishing. Thus lim P_A[r −u_n→ q] = 0, and in M̄ the states r and q are in different SCCs. The state acc is recurrent, as it loops deterministically on every character. Since the probability of reaching a final state is diminishing for longer and longer words, whenever $ is read the state returns to r; hence all words return to r with probability 1 in the limit, so r is recurrent as well. By the choice of words in the sequence, f_s(w_n) > 0 for every n, so P_A[r −u_n→ q] > 0 for all n.
Hence a leak has been defined, even in the case where M is unary.

The Language Containment condition
Towards decidability results, we identify a simple necessary (but insufficient) condition for s being big-O of s′.
Definition 6.1 (LC condition). A weighted automaton W = ⟨Q, Σ, M, F⟩ and s, s′ ∈ Q satisfy the language containment condition (LC) if for all words w with f_s(w) > 0 we also have f_{s′}(w) > 0. Equivalently, L_s(W) ⊆ L_{s′}(W).
The condition can be verified by constructing NFAs N_s(W), N_{s′}(W) that accept L_s(W) and L_{s′}(W) respectively and verifying L(N_s(W)) ⊆ L(N_{s′}(W)).

Remark 6.2. Recall that NFA language containment is NL-complete if the automata are in fact deterministic, in P if they are unambiguous [Col15, Theorem 3], coNP-complete if they are unary [SM73], and PSPACE-complete in general [MS72]. In all cases this complexity level will match, or be lower than, that of our respective algorithm for the big-O problem.
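For small instances the containment check can be sketched directly, pairing states of N_s with subsets of states of N_{s′} (an on-the-fly determinisation, exponential in the worst case, in line with the PSPACE bound). The encoding below is illustrative, not the paper's algorithm.

```python
def nfa_contained(delta1, init1, fin1, delta2, init2, fin2, alphabet):
    """Check L(N1) <= L(N2). delta maps (state, letter) -> set of states.
    Explores pairs (N1-state, subset of N2-states); returns False on a
    reachable pair where N1 accepts but N2 cannot."""
    start = (init1, frozenset([init2]))
    seen, stack = {start}, [start]
    while stack:
        q1, s2 = stack.pop()
        if q1 in fin1 and not (s2 & fin2):
            return False  # some word is accepted by N1 but not by N2
        for a in alphabet:
            for q1n in delta1.get((q1, a), set()):
                s2n = frozenset(q for p in s2 for q in delta2.get((p, a), set()))
                if (q1n, s2n) not in seen:
                    seen.add((q1n, s2n))
                    stack.append((q1n, s2n))
    return True

d1 = {(0, 'a'): {0}}                 # accepts a*
d2 = {(0, 'a'): {1}, (1, 'a'): {0}}  # accepts (aa)*
print(nfa_contained(d1, 0, {0}, d2, 0, {0}, 'a'))  # False: 'a' separates them
print(nfa_contained(d2, 0, {0}, d1, 0, {0}, 'a'))  # True
```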
We observe that, if s is big-O of s′, the LC condition must hold, and so checking the LC condition is the first step in each of our verification routines. Example 6.3 shows that the condition alone is not sufficient to solve the big-O problem, because two states can admit the same set of words with non-zero weight and yet have unbounded weight ratios.
Example 6.3. Consider the unary automaton W in Fig. 4.

6.1. The Eventually Big-O Problem. Recall from Section 3.3 that the eventually big-O problem asks whether there exist C > 0 and k > 0 such that for all w ∈ Σ^{≥k} we have f_s(w) ≤ C · f_{s′}(w). We now justify our focus on the big-O problem by showing the close relationship between the two problems. The big-O problem reduces to its eventual variant by checking both the LC condition and the eventually big-O condition; thus our undecidability (and hardness) results transfer to the eventually big-O problem. The eventually big-O problem can be solved via the big-O problem by "fixing" the LC condition through the addition of a branch from s′ that accepts all appropriate words with very low probability.
Let W = ⟨Q, Σ, M, {t}⟩ be a weighted automaton, s, s′ ∈ Q, and s ≠ s′. Below, whenever we write f_s (resp. f_{s′}), this will refer to word weights from s (resp. s′) in W. We assume without loss of generality that the state s′ has no incoming transitions (if this is not the case, a fresh copy of s′ can be made).
Choose δ to be a real number such that 0 < δ < 1 and δ is smaller than any positive weight in W. Construct W′ by adding, for each x ∈ Σ, δ-weighted transitions from s′ through a new state •. Consequently, for any w ∈ Σ+, the weight of w from s′ in W′ is at least δ^{|w|}, while weights from s are unchanged.

Proof.
Direction 1 (⇒). Suppose s is eventually big-O of s′ in W, i.e. there exist C, k such that f_s(w) ≤ C f_{s′}(w) for all w ∈ Σ^{≥k}. Note that, for w ∈ Σ^{≥k}, this implies that, whenever f_s(w) > 0, we must also have f_{s′}(w) > 0. For the remaining words w ∈ Σ^{<k}, of which there are finitely many, choose C′ such that f_s(w) ≤ C′ δ^k; then f_s(w) ≤ C′ δ^k ≤ C′ δ^{|w|}. Note that δ^k ≤ δ^{|w|} follows from w ∈ Σ^{<k} and 0 < δ < 1.
Taking max(C, C′) as the relevant constant, we can conclude that s is big-O of s′ in W′.
Consequently, for any w ∈ Σ^{≥k}, f_s(w) ≤ 2C f_{s′}(w), i.e. s is eventually big-O of s′ in W.
The above argument relied on completing the automaton so that any word is accepted with some weight. To transfer our decidability results for bounded languages, it will be necessary to complete the automaton with respect to a bound, i.e. the extra weights are added only for words from the given bounded language. This can be done easily by introducing the extra transitions according to a DFA for the bounding language.

The big-O problem for unary weighted automata is coNP-complete
In this section we show coNP-completeness in the unary case.
Theorem 7.1. The big-O problem for unary weighted automata is coNP-complete. It is coNP-hard even for unary labelled Markov chains.
For the upper bound, our analysis will refine the analysis of the growth of powers of non-negative matrices by Friedland and Schneider [FS80, Sch86], which gives the asymptotic order of growth of A^n_{s,t} + A^{n+1}_{s,t} + ⋯ + A^{n+q}_{s,t} ≈ ρ^n n^k for some ρ, k and q, smoothing over the periodic behaviour (see Theorem 7.5). Our results require a non-smoothed analysis, valid for each n. This is not provided in [FS80, Sch86], where the smoothing forces the existence of a single limit, which we do not require. Our big-Θ lemma (Lemma 7.9) will accurately characterise the asymptotic behaviour of A^n_{s,t} by exhibiting the correct values of ρ and k for every word.
We show that these asymptotic behaviours can be captured using suitably defined finite automata. We then reduce the big-O problem to language containment problems on these automata. We start by introducing several definitions, and the result of Friedland and Schneider, in Section 7.1, before implementing the proof of the upper bound in Section 7.2. We show coNP-hardness in Section 7.3 to complete the proof of Theorem 7.1.

7.1. Preliminaries. Let W be a unary non-negative weighted automaton with states Q, transition matrix A and a unique final state t. When we refer to a path in W, we mean a path in the NFA of W, i.e. paths only use transitions with non-zero weights, and states on a path may repeat.

Definition 7.2.
• A state q can reach q′ if there is a path from q to q′. In particular, any state q can always reach itself.
• A strongly connected component (SCC) ϕ ⊆ Q is a maximal set of states such that for each q, q′ ∈ ϕ, q can reach q′. We denote by SCC(q) the SCC of state q and by A_ϕ the |ϕ| × |ϕ| transition matrix of ϕ. Note every state is in an SCC, even if it is a singleton.
• The DAG of W is the directed acyclic graph of strongly connected components. Components ϕ, ϕ′ are connected by an edge if there exist q ∈ ϕ and q′ ∈ ϕ′ with A(q, q′) > 0.
• The spectral radius of an m × m matrix A is the largest absolute value of its eigenvalues. Recall the eigenvalues of A are {λ ∈ C | there exists a vector x ∈ C^m, x ≠ 0, with Ax = λx}. The spectral radius of ϕ, denoted by ρ_ϕ, is the spectral radius of A_ϕ. By ρ(q) we denote the spectral radius of the SCC of which q is a member.
• We denote by T_ϕ the period of the SCC ϕ: the greatest common divisor of return times for some state s ∈ ϕ, i.e. gcd{t ∈ N | A^t_ϕ(s, s) > 0}. It is known that any choice of state in the SCC gives the same value (see, e.g., [Ser13, Theorem 1.20]). If A_ϕ = [0] then T_ϕ = 0.
• Let P(s, s′) be the set of paths from the SCC of s to the SCC of s′ in the DAG of W.
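The period T_ϕ is easy to compute. A small sketch (assuming the SCC is given as adjacency lists over states 0..n−1): run a BFS from any state and take the gcd of level[u] + 1 − level[v] over all edges (u, v), a standard characterisation of the period of a strongly connected digraph.

```python
from math import gcd
from collections import deque

def period(adj):
    """Period of a strongly connected digraph given as adjacency lists:
    gcd of level[u] + 1 - level[v] over all edges (u, v), computed during
    a BFS from state 0. Returns 0 for a single state with no transitions,
    matching the convention T = 0 for the zero matrix."""
    level, g = {0: 0}, 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
            else:
                g = gcd(g, level[u] + 1 - level[v])
    return g

print(period([[1], [0]]))          # a 2-cycle has period 2
print(period([[1], [2], [0, 1]]))  # cycles of lengths 3 and 2: period 1
```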
Lemma 7.3. Given A_ϕ, a representation of the value ρ_ϕ can be found in polynomial time. This representation will admit polynomial-time testing of ρ_ϕ > ρ_{ϕ′} and ρ_ϕ = ρ_{ϕ′}, and can be embedded into the first-order theory of the reals.
Proof. An algebraic number z can be represented as a tuple (p_z, a, b, r), where p_z is a polynomial over x and a, b, r form an approximation such that z is the only root of p_z within distance r of the point a + bi. Given a polynomial, one can find the representation of each of its roots in polynomial time using the root separation bound due to Mignotte [Mig82]. Operations such as addition and multiplication of two algebraic numbers, finding |x|, and testing whether x > 0 can be done in polynomial time in the size of the representation (p, a, b, r), yielding the same representation [BPR05, Coh13, Pan96]; see, e.g., [OW14, Section 3] for a summary.
Any coefficient of the characteristic polynomial of an integer matrix can be found in GapL [HT03]; GapL is the class of differences of two #L functions, each of which can be computed in NC² ⊆ P. Here the matrix will be rational, but it can be normalised to an integer matrix by a scalar: the least common multiple of the denominators of its entries. Whilst this number could be exponential, it is representable in polynomial space. The final eigenvalues can be renormalised by this constant.
The characteristic polynomial of an n × n matrix has degree at most n; since each coefficient can be found in polynomial time, the whole characteristic polynomial can be found in polynomial time. Thus, by enumerating its roots (at most n), taking the modulus of each, and sorting them (a > b ⟺ a + (−1)·b > 0), we can find the spectral radius in the form (p_z, a, b, r).
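The exact representation above is what the algorithm manipulates; purely as a numerical sanity check of the quantity itself, a sketch of power iteration on A + I (primitive when A is irreducible, with spectral radius ρ(A) + 1, so periodicity does not prevent convergence):

```python
def spectral_radius(A, iters=2000):
    """Numerical estimate of rho(A) for a non-negative matrix A via power
    iteration on B = A + I (which has spectral radius rho(A) + 1). Only an
    illustration; the paper works with exact algebraic representations."""
    n = len(A)
    B = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = max(y)  # B and x are non-negative, so this is the sup norm
        if lam == 0.0:
            return 0.0
        x = [v / lam for v in y]
    return lam - 1.0

# The period-2 matrix [[0,1],[1,0]] has eigenvalues +1 and -1; plain power
# iteration on it would oscillate, but on A + I it converges.
print(spectral_radius([[0.0, 1.0], [1.0, 0.0]]))  # 1.0
```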
Note that the spectral radius is a real number, so given the spectral radius in the form (p_z, a, b, r) we actually have b = 0. The number can then be encoded exactly in the first-order theory of the reals as the unique root of p_z within distance r of a.

The asymptotic behaviours of weighted automata will be characterised using (ρ, k)-pairs.

Definition 7.4. (ρ, k)-pairs are ordered lexicographically: (ρ, k) ≤ (ρ′, k′) if and only if ρ < ρ′, or ρ = ρ′ and k ≤ k′.

Friedland and Schneider [FS80, Sch86] essentially use (ρ, k)-pairs to show the asymptotic behaviour of the powers of non-negative matrices. In particular, they find the asymptotic behaviour of the sum of several consecutive A^n_{s,s′}, smoothing the periodic behaviour of the matrix.
Theorem 7.5 (Friedland and Schneider [FS80, Sch86]). Let A be an m × m non-negative matrix, inducing a unary weighted automaton W with states Q; then, for B^n_{s,t} = A^n_{s,t} + A^{n+1}_{s,t} + ⋯ + A^{n+T(s,t)−1}_{s,t}, where T(s, t) is the local period, B^n_{s,t} = Θ(ρ(s,t)^n n^{k(s,t)}).

In the case where the local period is 1 (T(s, t) = T(s′, t) = 1), Theorem 7.5 can already be used to solve the big-O problem (in particular if the matrix A is aperiodic). In this case A^n_{s,t} = B^n_{s,t} = Θ(ρ(s,t)^n n^{k(s,t)}). Then, to establish that s is big-O of s′, we check that the language containment condition holds and that (ρ(s,t), k(s,t)) ≤ (ρ(s′,t), k(s′,t)). However, this is not sufficient if the local period is not 1.

Example 7.6. Consider the chains shown in Fig. 5, with local period 2. The behaviour for n ≥ 3 is A^n_{s,t} = Θ(0.5^n n), while A^n_{s′,t} = Θ(0.25^n) when n is odd and A^n_{s′,t} = Θ(0.5^n n) when n is even. However, Theorem 7.5 tells us B^n_{s,t} = Θ(0.5^n n) and B^n_{s′,t} = Θ(0.5^n n).

7.2. Upper bound: the unary big-O problem is in coNP. Let W be a unary weighted automaton and suppose we are asked whether s is big-O of s′. We assume w.l.o.g. (a) that there is a unique final state t with no outgoing transitions, and (b) that s, s′ do not appear on any cycle (if this is not the case, copies of s, s′ and their transitions can be taken).
Next we define a "degree function", which captures the asymptotic behaviour of each word a n by a (ρ, k)-pair, capturing the exponential and polynomial behaviours respectively.Definition 7.7.Given a unary weighted automaton W, let d s,t : N → R × N be defined by d s,t (n) = (ρ, k), where: • ρ is the largest spectral radius of any vertex visited on any path of length n from s to t; • the path from s to t that visits the most SCCs of spectral radius ρ visits k + 1 such SCCs; • if there is no length-n path from s to t, then (ρ, k)=(0, 0).
The set of admissible (ρ, k)-pairs is the image of d_{s,t}. Observe that this set is finite and of size at most |Q|²: there can be no more than |Q| values of ρ (at worst each state is its own SCC), and the value of k is also bounded by the number of SCCs and thus by |Q|.
To prepare for the proof of our key big-Θ lemma (stated below), we next define the (ρ, k)-annotated version W† of the weighted automaton W: in each state we record the relevant value of (ρ, k) corresponding to the current run to the state.
Observe that the automaton W† is constructable in polynomial time given W: indeed, the spectral radii of all SCCs can be computed and compared to each other in time polynomial in the size of W (see Lemma 7.3).
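A toy sketch of the annotation: given the SCC of each state and each SCC's spectral radius (both precomputed; supplied by hand below), track along a run the pair (largest radius seen, number of SCCs of that radius entered − 1), which is the data recorded in W†. Names and the tiny example are illustrative.

```python
def degree(n, s, t, edges, scc_of, rho_of):
    """d_{s,t}(n): lexicographic maximum of the (rho, k) annotations over
    all length-n runs from s ending in t; (0, 0) if there is none.
    edges is a set of positive-weight transitions (q, q')."""
    frontier = {(s, rho_of[scc_of[s]], 0)}
    for _ in range(n):
        nxt = set()
        for q, rho, k in frontier:
            for a, b in edges:
                if a != q:
                    continue
                if scc_of[b] == scc_of[q]:
                    nxt.add((b, rho, k))    # still inside the same SCC
                else:
                    r = rho_of[scc_of[b]]
                    if r > rho:
                        nxt.add((b, r, 0))  # new, larger radius: reset k
                    elif r == rho:
                        nxt.add((b, rho, k + 1))
                    else:
                        nxt.add((b, rho, k))
        frontier = nxt
    pairs = [(rho, k) for q, rho, k in frontier if q == t]
    return max(pairs) if pairs else (0.0, 0)

# Two looping states of radius 1/2 in distinct SCCs, then a final state:
# every run through both SCCs gets the pair (0.5, 1).
edges = {(0, 0), (0, 1), (1, 1), (1, 2)}
scc_of = {0: 0, 1: 1, 2: 2}
rho_of = {0: 0.5, 1: 0.5, 2: 0.0}
print(degree(2, 0, 2, edges, scc_of, rho_of))  # (0.5, 1)
```

The frontier of triples (state, ρ, k) is exactly the reachable-state set of W†, which is why the construction stays polynomial: there are at most |Q|² distinct (ρ, k) annotations per state.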
Let s, t ∈ Q be fixed. We are now ready to state and prove the key technical lemma of this subsection (cf. Theorem 7.5 of Friedland and Schneider [FS80, Sch86]), where the functions ρ(n), k(n) are defined by d_{s,t}(n) = (ρ(n), k(n)).

Lemma 7.9 (The big-Θ lemma). There exist c, C > 0 such that, for every n > |Q|, c · ρ(n)^n n^{k(n)} ≤ A^n_{s,t} ≤ C · ρ(n)^n n^{k(n)}.

Proof. Direction 1 (lower bound). Let us fix (ρ′, k′) and show the bound for the elements of N_{(ρ′,k′)} = {n ∈ N | d_{s,t}(n) = (ρ′, k′)}. A witnessing path in W is a length-n path from s to t that visits k′ + 1 SCCs of spectral radius ρ′ and no SCC with a larger spectral radius. Let π = ϕ_1 … ϕ_k ∈ P(s, t) be the corresponding sequence of SCCs visited by some witnessing path and let s_i, e_i (1 ≤ i ≤ k) be the entry and exit points (respectively into and out of ϕ_i) on that path, i.e. s = s_1, SCC(s_i) = SCC(e_i) = ϕ_i (1 ≤ i ≤ k), there is a transition (of positive weight) from e_i to s_{i+1}, and e_k = t. We write ⟨s_i, e_i⟩ to represent the particular sequence of entry/exit points.
Let us define a new unary weighted automaton W_{⟨s_i,e_i⟩} as a restriction of W in which the only transitions between its SCCs are from e_i to s_{i+1}, for 1 ≤ i < k, i.e., the weight is reduced to zero on any violating transition. There are finitely many such ⟨s_i, e_i⟩, so if n ∈ {n ∈ N | d_{s,t}(n) = (ρ′, k′)} then there exists some W_{⟨s_i,e_i⟩} with transition matrix D such that D^n_{s,t} > 0. Let us restrict to a single choice of ⟨s_i, e_i⟩ and fix D to be the transition matrix of W_{⟨s_i,e_i⟩}. Clearly A^n_{s,t} ≥ D^n_{s,t} for all n, since W_{⟨s_i,e_i⟩} is a restriction of W. We now show that D^n_{s,t} ≥ c_{⟨s_i,e_i⟩}(ρ′)^n n^{k′} for some c_{⟨s_i,e_i⟩} > 0 whenever D^n_{s,t} > 0; this will imply the lower bound. Note that, in W_{⟨s_i,e_i⟩}, ρ(s, t) = ρ′ and k(s, t) = k′, because all paths from s to t must visit k′ + 1 SCCs of spectral radius ρ′. Therefore, by Theorem 7.5, there exists c′ > 0 such that the ratio (D^m_{s,t} + D^{m+1}_{s,t} + ⋯ + D^{m+T−1}_{s,t}) / ((ρ′)^m m^{k′}) tends to c′, where T is the local period from s to t in W_{⟨s_i,e_i⟩}. By the definition of limit, for all ε > 0 the ratio is at least c′ − ε if m is big enough. Pick ε = c′/2; then D^m_{s,t} + D^{m+1}_{s,t} + ⋯ + D^{m+T−1}_{s,t} ≥ (c′/2)(ρ′)^m m^{k′} for all big enough m. Next we shall show that, whenever D^n_{s,t} > 0 (that is, the length-n word has a positive path from s to t in W_{⟨s_i,e_i⟩}), we have D^{n+1}_{s,t} = ⋯ = D^{n+T−1}_{s,t} = 0; this will imply D^n_{s,t} ≥ (c′/2)(ρ′)^n n^{k′} for big enough n. Let L be the length of the shortest path from s to t in W_{⟨s_i,e_i⟩}. Observe that paths from s to t in W_{⟨s_i,e_i⟩} can only have lengths in {L + ℓ · gcd{T_{SCC(s_1)}, …, T_{SCC(s_k)}} | ℓ ∈ N}. As P(s, t) = {π} in W_{⟨s_i,e_i⟩}, T = gcd{T_{SCC(s_1)}, …, T_{SCC(s_k)}}. Consequently, all paths from s to t in W_{⟨s_i,e_i⟩} have length in {L + ℓT | ℓ ∈ N}. Hence, whenever D^n_{s,t} is positive, there are no paths which can contribute a positive value to D^{n+1}_{s,t}, …, D^{n+T−1}_{s,t}. For small n ≤ N, we can always take c_{⟨s_i,e_i⟩} small enough so that D^n_{s,t} ≥ c_{⟨s_i,e_i⟩}(ρ′)^n n^{k′} when D^n_{s,t} > 0: take c″ = min D^m_{s,t}/((ρ′)^m m^{k′}), where the minimum is over all m ≤ N for which D^m_{s,t} > 0.
Then D^m_{s,t}/((ρ′)^m m^{k′}) ≥ c″ for all such m. This means we can choose c_{⟨s_i,e_i⟩} = min(c′/2, c″) > 0, regardless of whether n is big or small: this constant depends on ⟨s_i, e_i⟩ but not on n.
As c_{⟨s_i,e_i⟩} depends only on ⟨s_i, e_i⟩, to finish the proof it suffices to take c to be the smallest among the finitely many c_{⟨s_i,e_i⟩}.
Direction 2 (upper bound). To show that A^n_{s,t} ≤ C · ρ(n)^n n^{k(n)} for all n ∈ N, it will suffice to take C to be the maximum over all C_{(ρ′,k′)}.
Let us fix (ρ′, k′). Consider W•, obtained from W† by merging, for every (ρ, k) ≤ (ρ′, k′), the states (t, ρ, k) into a single final state t′; let us rename the state (s, 0, 0) to s′. (This merger and renaming are justified by assumptions (a) and (b) made at the beginning of this subsection.) Let E be the corresponding transition matrix of W•. Note that all paths from s′ to t′ in W• go through at most k′ + 1 SCCs with spectral radius ρ′.
Claim 7.10. For all n ∈ N_{(ρ′,k′)}, we have A^n_{s,t} = E^n_{s′,t′}.
Proof. Consider any path s → q_1 → ⋯ → q_{n−1} → t in W. There is a corresponding path in W•, where the states q_i are annotated as (q_i, ρ, k), with ρ the largest spectral radius seen so far and k + 1 the number of SCCs of that radius seen so far. The only paths removed are those terminating at (t, ρ, k) with (ρ, k) > (ρ′, k′). Since d_{s,t}(n) = (ρ′, k′), we know that no path visits more than k′ + 1 SCCs of spectral radius ρ′, nor an SCC of spectral radius greater than ρ′. Consequently, no such path is disallowed in W•. No paths were added either. Because every SCC in W remains a strongly connected component in W• (duplicated with various (ρ, k)) and its transition probability matrix (and hence its spectral radius) remains the same, we can conclude that A^n_{s,t} = E^n_{s′,t′}. This completes the proof of Claim 7.10.

Claim 7.11. There exists C_{(ρ′,k′)} > 0 such that A^n_{s,t} ≤ C_{(ρ′,k′)}(ρ′)^n n^{k′} for all n ∈ N_{(ρ′,k′)}.

Proof. Consider the sum E^n_{s′,t′} + E^{n+1}_{s′,t′} + ⋯ + E^{n+T(s′,t′)−1}_{s′,t′}, where T(s′, t′) is the local period between states s′ and t′ in W•. By Theorem 7.5, there exists C_{(ρ′,k′)} such that this quantity is bounded by C_{(ρ′,k′)}(ρ′)^n n^{k′}. Thus, for n ∈ N_{(ρ′,k′)}, we have A^n_{s,t} = E^n_{s′,t′} ≤ C_{(ρ′,k′)}(ρ′)^n n^{k′}. This concludes the proof of Claim 7.11.

By the argument above, the upper bound of Claim 7.11 completes the proof of Lemma 7.9.

Remark 7.12. When n ≤ |Q| it is possible that A^n_{s,t} > 0 even though no path takes any loop, in which case our degree function allocates d_{s,t}(n) = (0, 0), and Lemma 7.9 would not hold, as it would require A^n_{s,t} ≤ 0^n n^0. The behaviour on short words will be handled by the language containment condition.
We will now see how the big-Θ lemma (Lemma 7.9) enables us to characterise the big-O relation on states of weighted automata. For the following lemma, recall the language containment (LC) condition from Definition 6.1 and the ordering on (ρ, k)-pairs from Definition 7.4.

Lemma 7.13. A state s is big-O of s′ if and only if the LC condition holds and, for all but finitely many n ∈ N, we have d_{s,t}(n) ≤ d_{s′,t}(n).
Proof. Let us start off with a short summary of the proof. The idea is that, whenever d_{s,t}(n) ≤ d_{s′,t}(n), Lemma 7.9 bounds the ratio f_s(a^n)/f_{s′}(a^n) by a constant multiple of (ρ/ρ′)^n n^{k−k′}, and either this factor equals 1 or lim_{m→∞}(ρ/ρ′)^m m^{k−k′} = 0, so (ρ/ρ′)^n n^{k−k′} ≤ 1 for all but finitely many n. At the same time, whenever d_{s,t}(n) > d_{s′,t}(n), Lemma 7.9 yields an unbounded ratio. We now fill in the details of this summary. First we note some consequences of d_{s,t}(n) ≤ d_{s′,t}(n). Suppose d_{s,t}(n) = (ρ, k) and d_{s′,t}(n) = (ρ′, k′). Thanks to Lemma 7.9, we have f_s(a^n)/f_{s′}(a^n) ≤ (C/c)(ρ/ρ′)^n n^{k−k′}. In the former case ((ρ, k) = (ρ′, k′)), (ρ/ρ′)^n n^{k−k′} = 1 and, thus, f_s(a^n) ≤ (C/c) · f_{s′}(a^n). In the latter case ((ρ, k) < (ρ′, k′)), we have lim_{m→∞}(ρ/ρ′)^m m^{k−k′} = 0 and, thus, (ρ/ρ′)^m m^{k−k′} < 1 for all but finitely many m; consequently, for all but finitely many n, we can conclude f_s(a^n) ≤ (C/c) · f_{s′}(a^n). Thanks to the above analysis, if d_{s,t}(n) ≤ d_{s′,t}(n) holds for all but finitely many n, it follows that f_s(a^n) ≤ (C/c) · f_{s′}(a^n) for all but finitely many n. Moreover, the language containment condition implies that f_s(a^n) ≤ C′ · f_{s′}(a^n) for some C′ in the remaining (finitely many) cases. Hence, s is big-O of s′, which shows the right-to-left implication.
For the converse, recall that we have already established that "s is big-O of s′" implies the language containment condition. For the remaining part, we reason by contraposition and suppose that there are infinitely many n with d_{s,t}(n) > d_{s′,t}(n). As there are finitely many values in the ranges of d_{s,t} and d_{s′,t}, there exist (ρ, k) and (ρ′, k′) such that (ρ, k) > (ρ′, k′) and, for infinitely many n, d_{s,t}(n) = (ρ, k) and d_{s′,t}(n) = (ρ′, k′). Note that (ρ′, k′) ≠ (0, 0), as otherwise f_{s′}(a^n) = 0 and, by the language containment condition, f_s(a^n) = 0 and (ρ, k) = (0, 0), a contradiction. Therefore f_{s′}(a^n) > 0 for all such n and, moreover, Lemma 7.9 yields f_s(a^n)/f_{s′}(a^n) ≥ (c/C)(ρ/ρ′)^n n^{k−k′}, i.e. the ratio (ρ/ρ′)^n n^{k−k′} is unbounded along these n. Thus, s cannot be big-O of s′.
We are going to use the characterisation from Lemma 7.13 to prove Theorem 7.1. As already discussed, the LC condition can be checked via NFA inclusion testing. To tackle the "for all but finitely many n" condition, we introduce the concept of eventual inclusion.
Remark 7.19. An alternative approach to obtaining an upper bound could be to compute the Jordan normal form of the transition matrix and consider its powers. Instead of the interplay of strongly connected components in the transition graph, we would need to consider linear combinations of the n-th powers of complex numbers (such as roots of unity). It is not clear that this algebraic approach leads to a representation more convenient for our purposes.

7.3. Lower bound. Let us first consider a particular form of unary NFAs.
• the remaining transitions connect the end of the path to each cycle.

Any unary NFA can be translated into this representation with at most a quadratic blow-up in the size of the machine [Chr86], and such a representation can be found in polynomial time [To09, Mar02]. In addition, to simplify our arguments, we introduce a restricted Chrobak normal form, which requires that there be exactly one accepting state in each cycle. This restricted form can be obtained with at most a further quadratic blow-up over Chrobak normal form, by creating copies of cycles, one for each accepting state in the cycle.
Observe that S ⊆ F is a necessary condition for the universality of a unary NFA in Chrobak normal form. Consequently, the universality problem for unary NFAs in restricted Chrobak normal form with k = 1 is already coNP-hard. This is the problem we are going to reduce from in the following.
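For intuition, universality in this normal form can be decided by brute force up to the lcm of the cycle lengths, which is exactly where the hardness (a succinctly represented lcm) comes from. A small sketch with a hypothetical encoding, not the paper's:

```python
from math import lcm

def unary_universal(tail_final, cycles, tail_len):
    """Universality of a unary NFA in a Chrobak-like normal form
    (hypothetical encoding): tail_final holds the accepted lengths
    n < tail_len along the initial path; a^n with n >= tail_len is
    accepted iff (n - tail_len) % c is in offs for some cycle (c, offs).
    Testing all n < tail_len + lcm of the cycle lengths suffices, since
    acceptance is periodic beyond that point."""
    L = lcm(*(c for c, _ in cycles)) if cycles else 1
    for n in range(tail_len + L):
        if n < tail_len:
            ok = n in tail_final
        else:
            ok = any((n - tail_len) % c in offs for c, offs in cycles)
        if not ok:
            return False
    return True

print(unary_universal(set(), [(2, {0})], 0))     # False: odd lengths rejected
print(unary_universal(set(), [(2, {0, 1})], 0))  # True
```

The bound tail_len + lcm(|C_1|, …, |C_m|) can be exponential in the automaton size, matching the coNP upper bound (guess a rejected length) rather than a polynomial procedure.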
Proof of Theorem 7.20. Let N = ⟨Q, →, q_s, F⟩ be a unary NFA in restricted Chrobak normal form with k = 1. We will construct a unary Markov chain M, depicted in Fig. 6, with states Q′ = Q ∪ {s, u, v, t}, where t is final. The branch starting from s, defined below, guarantees paths of weight Θ((1/2)^n) when reading a^n. We take s′ = q_s and create a similar branch from s′, albeit with a smaller weight, to create paths of weight Θ((1/4)^n) when reading a^n. Moreover, we add weights to the original NFA transitions, where j′ = (|C_i| + j − 1) mod |C_i|. Note that the weights have been selected as if each letter were read with weight 1/2, except for a bounded number of transitions, where the bound is max_i |C_i|. Consequently, whenever there are accepting paths for a^n in N, their overall weight in M will be Θ((1/2)^n). It is easy to check that the reduction produces an LMC and can be carried out in polynomial time. It remains to argue that the reduction is correct.
If N is not universal, there exists n such that a^n ∉ L(N). Because of the cyclic structure of Chrobak normal form, a^{n_k} ∉ L(N) for n_k = n + kq, where q = lcm{|C_1|, …, |C_m|} and k ∈ N. Then, by the earlier observations about growth, the ratio f_s(a^{n_k})/f_{s′}(a^{n_k}) is unbounded in k, so s is not big-O of s′. If N is universal then, starting from s′ in M, every word a^n will have a path weighted Θ((1/4)^n) as well as paths weighted Θ((1/2)^n). Hence, there exists C > 0 such that f_s(a^n) ≤ C f_{s′}(a^n) for all n, i.e. s is big-O of s′.

Remark 7.22. We note that the 1/4 branch via state v is not strictly necessary, but it demonstrates that the problem is hard even if the LC condition is satisfied (i.e., "it can be the numbers that make the hardness").

Decidability for weighted automata with bounded languages
In this section we consider the big-O problem for a weighted automaton W and states s, s' such that L_s(W), L_{s'}(W) are bounded. Throughout the section, we assume that the LC condition has already been checked, i.e. L_s(W) ⊆ L_{s'}(W). We will show that the problem is conditionally decidable, subject to Schanuel's conjecture. We give a quick introduction to this conjecture in Section 8.1.
Theorem 8.1. Given a weighted automaton W = ⟨Q, Σ, M, F⟩ and s, s' ∈ Q, with L_s(W) and L_{s'}(W) bounded, it is decidable, subject to Schanuel's conjecture, whether s is big-O of s'.
Before delving into proof details, we illustrate the challenges that arise using a simple representative example in Section 8.2. After that, the proof of Theorem 8.1 is spread over Sections 8.3 to 8.5. First, a version of Theorem 8.1 restricted to plus-letter-bounded languages is shown (Lemma 8.14 in Section 8.3). Subsequently, letter-bounded languages are also captured, by reduction to this case (Lemma 8.15 in Section 8.4). Finally, the case of bounded languages is reduced to the case of letter-bounded languages (Lemma 8.16 in Section 8.5).

8.1. Logical theories of arithmetic and Schanuel's conjecture. In first-order logical theories of arithmetic, variables denote numbers (from Z or R, as appropriate), and atomic predicates are equalities and inequalities between terms built from variables and function symbols. Nullary function symbols are constants, always from Z. If binary addition and multiplication are available, then:
• for R, we obtain the first-order theory of the reals, whose sentences have decidable truth values thanks to the celebrated Tarski–Seidenberg theorem [BPR06, Chapter 11 and Theorem 2.77];
• for Z, the first-order theory of the integers is, in contrast, undecidable (see, e.g., [Poo03]).
In the case of R, adding the unary symbol for the exponential function x ↦ e^x leads to the first-order theory of the real numbers with exponential function, Th(R_exp). Logarithms base 2, for example, are easily expressible in Th(R_exp). The decidability of Th(R_exp) is an open problem and hinges upon Schanuel's conjecture [MW96].
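For instance, x = log_2(y) can be expressed in Th(R_exp) by the following formula (our phrasing of the standard encoding):

```latex
\varphi(x,y) \;\equiv\; \exists z\, \big( \exp(z) = 2 \;\wedge\; \exp(x \cdot z) = y \big),
```

since z is forced to equal ln 2, and then exp(x · ln 2) = 2^x = y.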
Schanuel's conjecture [Lan66] is a unifying conjecture of transcendental number theory. It states that, for all z_1, …, z_n ∈ C linearly independent over Q, the field extension Q(z_1, …, z_n, e^{z_1}, …, e^{z_n}) has transcendence degree at least n over Q, meaning that for some S ⊆ {z_1, …, z_n, e^{z_1}, …, e^{z_n}} of cardinality n, say S = {s_1, …, s_n}, the only polynomial p over Q satisfying p(s_1, …, s_n) = 0 is p ≡ 0. See, e.g., Waldschmidt's book [Wal00, Section 1.4] for further context. If true, this conjecture would generalise several known results, including the Lindemann–Weierstrass theorem and Baker's theorem, and would entail the decidability of Th(R_exp). A recent exciting line of research reduces problems from verification [DJL+21, MSS20], linear dynamical systems [ACOW18, COW16], and symbolic computation [HLXL18] to the decision problem for Th(R_exp).

8.2. Difference to the unary case. In the unary case, it was sufficient to consider the relative order between spectral radii, with careful handling of the periodic behaviour. This approach is insufficient in the bounded case. Example 8.2 highlights that the actual values of the spectral radii have to be examined.
Example 8.2 (Relative orderings are insufficient). Consider the LMC in Fig. 7, with 0.61 ≤ p ≤ 0.62. We have f_s(a^m b^n) = Θ(0.6^m 0.4^n) and f_{s'}(a^m b^n) = Θ(p^m 0.39^n + 0.59^m 0.41^n). Note that neither 0.59^m 0.41^n nor p^m 0.39^n dominates, nor is dominated by, 0.6^m 0.4^n for any value of 0.61 ≤ p ≤ 0.62. That is, there are values of m, n where 0.59^m 0.41^n ≫ 0.6^m 0.4^n (in particular, large n) and values of m, n where 0.59^m 0.41^n ≪ 0.6^m 0.4^n (in particular, large m); similarly for p^m 0.39^n vs 0.6^m 0.4^n (but the cases in which n or m needs to be large are swapped). However, the big-O status can be different for different values of p ∈ [0.61, 0.62], despite the same relative ordering between spectral radii. When p = 0.62, the ratio turns out to be bounded for all m, n (the maximum, f_s(aab)/f_{s'}(aab) = 1600/1579, is attained at the word aab; for all larger m, n the ratio is smaller).
In contrast, when p = 0.61, we have f_s(a^m b^{0.66m})/f_{s'}(a^m b^{0.66m}) → ∞ as m → ∞. To see that, observe that there is an x ∈ Q such that 0.61 · 0.39^x < 0.6 · 0.4^x and 0.59 · 0.41^x < 0.6 · 0.4^x, e.g., x = 0.66. We can then take n = xm such that m, n ∈ N and m, n → ∞. Whilst useful for illustration in this example, this effect is not limited to a linear relation between the characters, and so heavier machinery is required.
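The two inequalities at x = 0.66 hold only by a thin margin, and the resulting divergence is extremely slow, so it is instructive to check them numerically. A small sketch (our own; since 0.6^m 0.4^n underflows in floating point for large m, we compare logarithms):

```python
# Numeric illustration of Example 8.2 with p = 0.61: verify the two
# inequalities at x = 0.66 and observe the (slow) divergence of
# f_s(a^m b^n) / f_s'(a^m b^n) along n = 0.66 * m.
import math

x = 0.66
assert 0.61 * 0.39**x < 0.6 * 0.4**x   # holds by a thin margin
assert 0.59 * 0.41**x < 0.6 * 0.4**x

def log_ratio(m, p):
    """log of Theta(0.6^m 0.4^n) / Theta(p^m 0.39^n + 0.59^m 0.41^n), n = 0.66 m."""
    n = round(x * m)                   # take m a multiple of 50 so n/m = 0.66 exactly
    num = m * math.log(0.6) + n * math.log(0.4)
    t1 = m * math.log(p) + n * math.log(0.39)
    t2 = m * math.log(0.59) + n * math.log(0.41)
    den = max(t1, t2) + math.log1p(math.exp(-abs(t1 - t2)))  # log(e**t1 + e**t2)
    return num - den

# log_ratio(m, 0.61) tends to infinity as m grows, but only very slowly.
```

At m = 100 the ratio is still below 1; it overtakes every bound only for m in the tens of thousands, which illustrates why qualitative spectral-radius orderings cannot detect this.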
We first prove Theorem 8.1 for the plus-letter-bounded case, which is the most technically involved; the other bounded cases will be reduced to it. In the plus-letter-bounded case, we will characterise the behaviour of such automata, generalising the (ρ, k)-pairs of the unary case. We will need to rely upon the first-order theory of the reals with exponential function to compare these behaviours. A complication arises for paths that visit only singleton SCCs with no loops while reading some character. For this case, we introduce a new number δ = (1/2) min_{ϕ: ρ_ϕ > 0} ρ_ϕ, which is strictly smaller than the spectral radius of every non-zero SCC (so it will not dominate in the partial order), but non-zero. This allows us to distinguish between no path, using the (0, 0)-pair, and a short non-looping path, using the (δ, 0)-pair.
An a_1^{n_1} · · · a_m^{n_m}-labelled path from s (to the final state) is compatible with ρ if, for each i = 1, …, m, it visits k_i + 1 SCCs with spectral radius ρ_i while reading a_i, unless the path visits only singletons with no loops, in which case (ρ_i, k_i) = (δ, 0). The notation (ρ, k) ∈ ρ is used for "(ρ, k) is an element of ρ". Observe that ρ may range over at most |Q|^{2m} possible values. Let D be the set of all of them, so that d_s : N^m → P(D). In this extended setting, the big-Θ lemma (Lemma 7.9) may be generalised as follows.
Let s = q_0 and t = q_m; then a sequence q_0, …, q_m describes a possible path through the automaton. By Lemma 7.9 applied to the unary case, for each M(a_i)^{n_i}_{q_{i-1},q_i} > 0 with n_i > |Q|, there is a pair (ρ_i, k_i) = d_{q_{i-1},q_i}(n_i) satisfying Eq. (8.1). Otherwise, if n_i ≤ |Q|, there exist c, C satisfying Eq. (8.1) regardless of the value of d_{q_{i-1},q_i}(n_i) = (ρ, k), since n_i then ranges over at most |Q| values, thus bounding M(a_i)^{n_i}_{q_{i-1},q_i}. In particular, Eq. (8.1) holds for (ρ, k) = (δ, 0), that is, if no path of length n_i between q_{i-1} and q_i goes through a loop.
When M(a_i)^{n_i}_{q_{i-1},q_i} is zero, we let d_{q_{i-1},q_i}(n_i) correspond to the (0, 0)-pair, and Eq. (8.1) holds trivially (M(a_i)^{n_i}_{q_{i-1},q_i} = 0^{n_i} · n_i^0 = 0).
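The unary big-Θ behaviour invoked here can be checked on a toy example of our own (a two-state chain, not from the paper): chaining k + 1 = 2 SCCs of spectral radius 1/2 yields matrix-power entries of order Θ((1/2)^n · n^1), i.e., the entry realises the pair (ρ, k) = (1/2, 1).

```python
# Toy check (our own example): with two chained loops of weight 1/2, the
# top-right entry of M(a)^n equals n * 0.5**(n-1) = Theta(0.5**n * n**1),
# matching the (rho, k) = (1/2, 1) pair: the path visits k + 1 = 2 SCCs
# of spectral radius 1/2.

def mat_mul(A, B):
    return [[sum(A[i][j] * B[j][l] for j in range(2)) for l in range(2)]
            for i in range(2)]

def mat_pow(M, n):
    R = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        R = mat_mul(R, M)
    return R

M = [[0.5, 1.0],    # q0 --1/2--> q0 (first SCC), q0 --1--> q1
     [0.0, 0.5]]    # q1 --1/2--> q1 (second SCC)
# mat_pow(M, n)[0][1] == n * 0.5**(n-1), so its ratio to 0.5**n * n is the constant 2.
```

This is exactly the polynomial correction n^k that the compatible (ρ, k)-pairs record per character in the plus-letter-bounded case.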
We observe that z'(n_1, …, n_m) is nearly z(n_1, …, n_m). First, we omit paths which contain a (0, 0) element; then each summand ρ^{n_1}_{s,q_1} n_1^{k_{s,q_1}} · … · ρ^{n_m}_{q_{m-1},t} n_m^{k_{q_{m-1},t}} of the sum corresponds to a candidate element ρ of d_s(n_1, …, n_m). Given any two elements where ρ = σ, only one representative need be kept in d_s(n_1, …, n_m), by replacing c and C with c/2 and 2C.
The following lemma provides the key characterisation of negative instances of the big-O problem, in the plus-letter-bounded case and assuming the LC condition. Here and below, we write n(t) to refer to the tth vector in a sequence n : N → N^m. In the lemma statement, h_Y ⊆ {1, …, |Y|}, α_{j,i} ∈ R and p_{j,i} ∈ Z (1 ≤ i ≤ m) are uniquely determined by X and Y (in a way detailed below); h_Y and the p_{j,i} are effectively computable, and the α_{j,i} are first-order expressible (with exponential function).
Proof. Observe that s is not big-O of s' iff there exists an infinite sequence of words such that, for all C > 0, the sequence contains a word w with f_s(w)/f_{s'}(w) > C. Thanks to Lemma 8.6, this is equivalent to the existence of a sequence n : N → N^m along which the corresponding ratio of sums, indexed by X ∈ d_s(n(t)_1, …, n(t)_m), is unbounded, where n(t)_i denotes the ith component of n(t). Since there are finitely many possible values of d_s and d_{s'}, it suffices to look for sequences n such that d_s(n(t)) and d_{s'}(n(t)) are fixed. Further, because of the sum in the numerator, only one X ∈ D is required such that X ∈ d_s(n_1, …, n_m). Thus, we need to determine whether there exist X ∈ D, Y ⊆ D and n : N → N^m such that X ∈ d_s(n(t)) and d_{s'}(n(t)) = Y for all t, and the ratio is unbounded along n.

The number α_{j,i} is the logarithm of the ratio of two algebraic numbers, which are not given explicitly. However, they admit an unambiguous, first-order expressible characterisation (recall Lemma 7.3). The logarithm is encoded using the exponential function: log(z) is the x ∈ R such that exp(x) = z. Lemma 8.7 identifies violation of the big-O property using two conditions. In the remainder of this subsection we will handle Condition (a) using automata-theoretic tools (the Parikh theorem and semi-linear sets) and Condition (b) using logics. In summary, the characterisation of Lemma 8.7 will be expressed in the first-order theory of the reals with exponential function, which is decidable subject to Schanuel's conjecture.

One can construct the automaton N^s_{≥ρ} with L(N^s_{≥ρ}) = {a_1^{n_1} … a_m^{n_m} | ∃σ ∈ d_s(n_1, …, n_m) such that σ ≥ ρ} by tracking the current maximum spectral radius seen and the number of different SCCs with this spectral radius. If the only states seen so far have been singletons with no loops (formally, having spectral radius 0), the value should be tracked as (δ, 0) regardless of how many have been seen.
Passage from states reading a_j to states reading a_{j+1} is allowed only if the tracked value is at least (ρ_j, k_j), and states should be final if the tracked value for a_m is at least (ρ_m, k_m). As previously, comparisons are with respect to the partial order of Definition 7.4.
Similarly, one can construct N^s_{>ρ} with L(N^s_{>ρ}) = {a_1^{n_1} … a_m^{n_m} | ∃σ ∈ d_s(n_1, …, n_m) such that σ > ρ}. The construction is the same as for N^s_{≥ρ} except that, in order to accept, we need to be sure that at least one of the 'at least' comparisons was strict. This can be achieved by maintaining an extra bit at run time.
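The 'extra bit' trick can be sketched generically (our own formulation; the function name and transition encoding are illustrative, not from the paper): pair each state with a boolean recording whether a strict comparison has occurred, and accept only in states whose bit is set.

```python
# Generic sketch (ours) of the extra-bit construction: given an automaton whose
# transitions are marked strict/non-strict, build one that accepts exactly the
# runs containing at least one strict step.

def add_strict_bit(states, trans, init, final):
    """trans: set of (q, a, q2, strict) tuples; returns an NFA over (q, bit) states."""
    new_trans = set()
    for (q, a, q2, strict) in trans:
        for bit in (False, True):
            # the bit is monotone: once a strict step is seen it stays True
            new_trans.add(((q, bit), a, (q2, bit or strict)))
    return ({(q, b) for q in states for b in (False, True)},
            new_trans,
            (init, False),
            {(q, True) for q in final})   # accept only if a strict step occurred
```

The construction doubles the state space, which is harmless for the polynomial-size arguments in this section.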
Since we can iterate through all tuples (p, q_1, …, q_k, r, s_1, …, s_ℓ) ∈ ∆, it remains to show that it is possible to check the logical characterisation. As in Claim 8.13, we show that this logical characterisation over the naturals is equivalent to a statement that can be encoded in Th(R_exp), the theory of the reals with exponential function.
Lemma 9.5. A is not big-O of B if and only if there exists (p, q_1, …, q_k, r, s_1, …, s_ℓ) ∈ ∆ such that, for all C > 0, there exists x ∈ R^m_{≥0} such that
$$\sum_{i=1}^{k} p_i\, (q^i_1)^{x_1} \cdots (q^i_m)^{x_m} \;\geq\; C \sum_{i=1}^{\ell} r_i\, (s^i_1)^{x_1} \cdots (s^i_m)^{x_m}.$$
Proof. Clearly, if the real formulation is unsatisfied, then the formulation with naturals is unsatisfied. It remains to show that if the real formulation is satisfied, then so too is the formulation with naturals. We assume that the condition of Lemma 9.5 is satisfied for a fixed (p, q_1, …, q_k, r, s_1, …, s_ℓ) ∈ ∆ and show that the natural-number formulation holds as well. Let C be given; we show the existence of a relevant vector n. Let us choose C' = TC, where the exact value of T ∈ R_+ will be fixed later, so that by assumption there exists x ∈ R^m_{≥0} such that
$$\sum_{i=1}^{k} p_i\, (q^i_1)^{x_1} \cdots (q^i_m)^{x_m} \;\geq\; C' \sum_{i=1}^{\ell} r_i\, (s^i_1)^{x_1} \cdots (s^i_m)^{x_m}. \tag{9.2}$$
We decompose x into its integer and fractional parts: let n, y be such that x = n + y, 0 ≤ y < 1 and n ∈ N^m.
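The rounding step can be sanity-checked numerically (toy numbers of our own): replacing a real exponent x by its integer part changes each factor q^x only by a bounded multiplicative amount, which is plausibly the kind of constant absorbed into T.

```python
# Sanity check (our toy numbers): replacing x by n = floor(x) changes q**x by
# the factor q**(x - n) with 0 <= x - n < 1, hence a value between
# min(q, 1) and max(q, 1) -- a bounded multiplicative distortion per factor.
import math

def factor_change(q, x):
    n = math.floor(x)
    return q**x / q**n          # equals q**(x - n)

# Over a product of m factors the total distortion is at most
# prod_j max(q_j, 1/q_j), a constant independent of x.
```

This is why a real solution x of (9.2) yields an integer vector n satisfying the analogous inequality with the constant degraded only by a fixed factor.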

Figure 2. Reduction; q_a represents the accepting states of the probabilistic automaton, q_r the rejecting states, and q_s the start state (assumed to be rejecting).
Value-1 and the big-O problem. Proposition 5.2. The Value-1 problem reduces to the big-O problem.

Figure 5. Different rates for different phases.

7.3. coNP-hardness for unary LMC. Given a unary NFA N, the NFA universality problem asks whether L(N) = {a^n | n ∈ N}. This problem is coNP-complete [SM73]. We exhibit a polynomial-time reduction from (a variant of) the unary universality problem to the big-O problem on unary Markov chains.

Theorem 7.20. The big-O problem is coNP-hard on unary Markov chains.

Figure 7. Relative orderings are the same, but the big-O relations are different.
Condition (a) via automata. It turns out that the sequences n satisfying Condition (a) in Lemma 8.7 can be captured by a finite automaton. In more detail, for any X ∈ D, there exists an automaton N^s_X such that L(N^s_X) = {a_1^{n_1} … a_m^{n_m} | X ∈ d_s(n_1, …, n_m)}. For any Y ⊆ D, there exists an automaton N^{s'}_Y such that L(N^{s'}_Y) = {a_1^{n_1} … a_m^{n_m} | d_{s'}(n_1, …, n_m) = Y}. The relevant automaton capturing X and Y is then found by taking the intersection of L(N^s_X) and L(N^{s'}_Y).

Lemma 8.8. For any X ∈ D and Y ⊆ D, there exists an automaton N_{X,Y} such that L(N_{X,Y}) = {a_1^{n_1} … a_m^{n_m} | X ∈ d_s(n_1, …, n_m), Y = d_{s'}(n_1, …, n_m)}.

Proof. Let ρ = (ρ_1, k_1) … (ρ_m, k_m). Building on Definition 7.8, one can construct an automaton N^s_{≥ρ} with L(N^s_{≥ρ}) = {a_1^{n_1} … a_m^{n_m} | ∃σ ∈ d_s(n_1, …, n_m) such that σ ≥ ρ}.

Definition 8.5. Let d_s : N^m → P((R × N)^m) be such that ρ ∈ d_s(n_1, …, n_m) if and only if (1) there exists an a_1^{n_1} a_2^{n_2} … a_m^{n_m}-labelled path from s to the final state compatible with ρ, and (2) for every a_1^{n_1} a_2^{n_2} … a_m^{n_m}-labelled path from s compatible with σ such that ρ ≤ σ, we have ρ = σ.