Tight Polynomial Bounds for Loop Programs in Polynomial Space

We consider the following problem: given a program, find tight asymptotic bounds on the values of some variables at the end of the computation (or at any given program point) in terms of its input values. We focus on the case of polynomially-bounded variables, and on a weak programming language for which we have recently shown that tight bounds for polynomially-bounded variables are computable. These bounds are sets of multivariate polynomials. While their computability has been settled, the complexity of this program-analysis problem remained open. In this paper, we show the problem to be PSPACE-complete. The main contribution is a new, space-efficient analysis algorithm. This algorithm is obtained in a few steps. First, we develop an algorithm for univariate bounds, a sub-problem which is already PSPACE-hard. Then, a decision procedure for multivariate bounds is achieved by reducing this problem to the univariate case; this reduction is orthogonal to the solution of the univariate problem and uses observations on the geometry of a set of vectors that represent multivariate bounds. Finally, we transform the univariate-bound algorithm to produce multivariate bounds.


Introduction
A static analysis algorithm takes, as input, program code and answers a question about its possible behaviours. A standard example is to find the set of values that a variable can assume (at a given program point). The algorithm seeks a set of a particular kind (for example, intervals [a, b]) and is complete if it can always find the tightest such result (e.g., the smallest interval that contains all reachable values). A more complex analysis may establish a relation among the values of several variables. While such results cannot be fully computable for programs in general, there are algorithms that guarantee a sound and complete solution for a class of weak programs, defined by removing or weakening some features of full programming languages. For example, [HOP+18] shows an algorithm that, for a particular weak language, finds any algebraic relation that exists among variable values. Such an algorithm can establish facts like: two variables x and y always satisfy, at the end of the computation, a certain polynomial relation. This allows us to answer a question like: does the program square its input, or in general, can an output value be expressed as a polynomial in the input values, and what is this polynomial?
In this paper we are interested in final values that are not necessarily expressible by polynomials, but are polynomially bounded, and we are seeking a tight asymptotic upper bound. A prototypical context where such a question is asked is complexity analysis: the quantity to be analyzed may represent a counter of steps performed by the program and we wish to establish a result such as "the program's worst case is Θ(n^2) steps" (where n is the size of an input parameter).
In 2008, Ben-Amram et al. [BJK08] considered a simple imperative programming language with explicitly-bounded loops, and proved that it is decidable whether computed values are polynomially bounded. This work left open the problem of finding an explicit tight bound for such values. Recently, in [BAH20], we solved this bound analysis problem (the class of programs remained as in [BJK08]). However, the solution was very inefficient (doubly-exponential space) and did not settle the complexity of this problem. In this paper we answer this question and prove that tight polynomial bounds (for the same language) can be computed in polynomial space. Our approach is to first develop an analysis for univariate bounds only. This simplifies the task and makes the problem of solving it efficiently more tractable. However, the full problem is to compute multivariate bounds (such as O(xy + yz^2), equivalently O(max(xy, yz^2))). In Sections 11 and 12 we show how a solution to the multivariate problem can be evolved from our solution to the univariate problem.
We now discuss, somewhat informally, some key features of the approach. We have already noted that bounds may be multivariate, i.e., depend on multiple variables. In [BAH20] we argued that it is also necessary to compute simultaneous bounds on multiple variables; moreover, we find sets of simultaneous bounds. The capability of computing sets of simultaneous bounds is important even if we are ultimately interested in bounding the value of a single variable. Consider, for example, the following piece of code, where choose represents non-deterministic choice: choose Y := X*X or { Y := X; X := X*X }. For it we have two incomparable simultaneous bounds on X, Y: ⟨x, x^2⟩ and ⟨x^2, x⟩ (where x stands for the initial value of X). This allows us to deduce a tight bound for whatever command follows.

As an example of computation with abstract polynomials, the sum of the abstract polynomial x + y and the abstract polynomial y + z is x + y + z, not x + 2y + z, since in the ring of coefficients, 1 + 1 = 1 (also 1 · 1 = 1). The intention is that an abstract polynomial represents all the polynomials which are formed by varying its non-zero coefficients; thus x + y represents ax + by for all a, b > 0. We can also view an abstract polynomial as a set of monomials (whose coefficients are, implicitly, 1). We use AMP as an abbreviation for abstract multi-polynomial. We use α(p) for the abstraction of p (obtained by changing all non-zero coefficients to 1). Conversely, if p is abstract we can "coerce" it to a concrete polynomial by considering the 1-valued coefficients as being literally 1; this is how one should read any statement that uses an abstract polynomial where a concrete function is needed. Abstract polynomials are used to ensure that the analysis is finite, since there is a finite number of different abstract polynomials of any given degree. They correspond to the use of big-O (or Θ) notation in typical algorithm analysis.
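To make the semiring of abstract polynomials concrete, here is a minimal sketch (an illustration only, not the paper's implementation) representing an abstract polynomial as a set of monomials, each monomial a tuple of exponents, one per variable:

```python
# Abstract polynomials as sets of monomials; coefficients are implicitly 1,
# so abstract addition is set union (1 + 1 = 1) and abstract multiplication
# combines monomials pairwise by adding exponents (1 * 1 = 1).

def a_add(p, q):
    """Abstract sum: union of the monomial sets."""
    return p | q

def a_mul(p, q):
    """Abstract product: pairwise products of monomials (exponents add)."""
    return {tuple(d + e for d, e in zip(m1, m2)) for m1 in p for m2 in q}

# Three variables: x = (1,0,0), y = (0,1,0), z = (0,0,1).
x = {(1, 0, 0)}
y = {(0, 1, 0)}
z = {(0, 0, 1)}

# (x + y) + (y + z) = x + y + z, not x + 2y + z: the duplicate y collapses.
assert a_add(a_add(x, y), a_add(y, z)) == {(1, 0, 0), (0, 1, 0), (0, 0, 1)}
```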
The set of abstract multi-polynomials is denoted by AMPol; this has a composition operation p • q, which relates to the standard composition p ◦ q by α(p) • α(q) = α(p ◦ q); the different operator symbol ("•" versus "◦") helps in disambiguating the meaning of an expression (referring to abstract polynomials versus concrete ones).
In the special case of abstract univariate polynomials, it is useful to restrict them to be monomials x^d; this can be done since in "big Θ" bounds, a univariate polynomial reduces to its leading monomial. We allow the monomial x^0, as well as 0, and thus we obtain a semiring isomorphic to ⟨N ∪ {−∞}, max, +⟩.
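Under this restriction, abstract univariate monomials can be manipulated purely as degrees; a small sketch (illustrative, not the paper's code):

```python
# Abstract univariate monomials x^d reduced to their degree d; the zero
# polynomial maps to -infinity. Abstract addition becomes max (the higher
# degree dominates asymptotically) and multiplication becomes + (degrees
# add), giving the semiring <N ∪ {-∞}, max, +>.

NEG_INF = float('-inf')  # represents the polynomial 0

def u_add(d, e):
    return max(d, e)

def u_mul(d, e):
    return d + e  # NEG_INF absorbs, as 0 * p = 0

assert u_add(2, 3) == 3            # x^2 + x^3  ~  x^3
assert u_mul(2, 3) == 5            # x^2 * x^3  =  x^5
assert u_mul(NEG_INF, 3) == NEG_INF
```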
Definition 2.4. A univariate state is an n-tuple of univariate monomials. A univariate state is called positive if it does not include a zero.
Based on this definition, a univariate state can be written as ⟨x^{d_1}, ..., x^{d_n}⟩ and internally represented by either a vector of integers ⟨d_1, ..., d_n⟩ (for a non-zero monomial) or the constant 0.

Formal Statement of the Problem and our Results
Our goal is to prove polynomial-space complexity bounds for the tight-bound problem. In [BAH20], our algorithm was encumbered by creating and calculating with exponentially-large objects. In particular, we argued that there are commands for which a set of multi-polynomials that provides tight bounds must include exponentially many elements. In order to reduce the size of objects internal to the algorithm, we have to rewrite the algorithm; but first we circumvent the issue of exponential output size by adjusting the specification of the algorithm. This can be done in two ways. First, as one commonly does in Complexity Theory, one can reduce the problem to a decision problem: Problem 3.1. Multivariate decision problem: Given a core-language program P and a monomial m(x_1, ..., x_n) = x_1^{d_1} ··· x_n^{d_n}, report whether m constitutes a "big-Omega" lower bound on the highest final value of X_n for initial state ⟨x_1, ..., x_n⟩.
There is, of course, no loss of generality in fixing the queried variable to be the highest-numbered; neither do we lose generality by restricting the bounds to monomials (a polynomial is an asymptotic lower bound if and only if all its monomials are). Note that we have chosen to focus on worst-case lower bounds, unlike most publications in this area, which focus on upper bounds for worst-case results. Our reader should keep in mind, though, that [BAH20] shows that for our core language, whenever a variable is polynomially bounded, there is a set of polynomials which provides tight bounds for all executions. This means that if we form the function "max of (all the lower bounds)," we have a tight worst-case upper bound. Thus solving for lower bounds also solves the upper-bound problem. Furthermore, we note that if we are looking at a polynomially-bounded variable, the set of abstract monomials (or even polynomials) that lower-bound the final value is finite. This is convenient (for instance, the expression "max of (all the lower bounds)" can be explicitly written out, if desired). We refer to a polynomial in this set as attainable (since the program can compute values that reach, or surpass, this polynomial, asymptotically).
We call a function attainable (by a program) if its values can be reached or surpassed.
Definition 3.2 (Attainable). Consider a command C, and a function f : N^n → N^n. We say that f is attainable by C if there are constants d > 0 and x_0 ∈ N^n such that for all x ≥ x_0 there is a y such that x C y and y ≥ d·f(x). Given a function f : N^n → N, we extend the definition to f by applying the above condition to f̄(x) = ⟨0, ..., 0, f(x)⟩; in other words, we use f as a lower bound on a single variable (X_n).
We remark that in the above inequality, d ranges over the real numbers, and may be smaller than one.
Thus, for a program that starts with data x_1 and x_2 in its two variables, and non-deterministically chooses whether to exchange them, the bounds ⟨x_1, x_2⟩ and ⟨x_2, x_1⟩ are both attainable, while ⟨x_1, x_1⟩ is not.
In the current work, we are attempting to find tight bounds only for variables that are polynomially bounded in terms of the input (the initial contents of the variables). The problem of identifying which variables are polynomially bounded is completely solved in [BJK08], by a polynomial-time algorithm. We will tacitly rely on that algorithm. More precisely, we assume that the algorithm is invoked as a preprocessing step for the analysis of each loop in the program, and allows us to exclude variables that may grow super-polynomially in the given loop. This reduces the problem of handling any loop in our language to that of handling loops in which all variables are polynomially bounded; the key observation here is that, due to the restricted arithmetic in our language, which includes only addition and multiplication, any value derived from a super-polynomial value is also super-polynomial. Hence, ignoring variables which are not polynomially bounded does not lose any information necessary for analyzing the polynomially bounded ones.
Note that in the formulation as a decision problem, the user queries about a given polynomial; we think this is a natural use-case, but one can also ask the analyzer to furnish a bound: Problem 3.3. Multivariate bound generation problem: Given a core-language program P and an integer d, find a polynomial p(x_1, ..., x_n), with the degree in every variable bounded by d, such that p constitutes a "big-Omega" lower bound on the highest final value of X_n as a function of x_1, ..., x_n.
The function of the input parameter d is to (possibly) reduce the complexity of the search: we do not look further than the user wishes to find. As before, there is no loss of generality in fixing the queried variable to be the highest-numbered.
For this problem we claim a solution whose space complexity is polynomial in the input size, namely the program size plus the bit-size of d. Recall that if we have a polynomial-space non-deterministic algorithm, we can also determinize it in polynomial space (this is the essence of the famous equality NPSPACE = PSPACE, which follows from Savitch's Theorem [Jon97]). This means that it is possible to transform the non-deterministic solution to Problem 3.3 (as long as it is complete) into a deterministic generator that writes out one monomial at a time. Moreover, by exhaustive search it is possible to generate all the monomials needed to bound all program executions, even when their number is exponential.
It should thus be clear that in a similar manner we can also search for a bound of highest degree d_max. The complexity of our solution to the multivariate-bound problem (Sections 11 and 12) is output-sensitive in the sense that the space bound is polynomial in the program size and the bit-size of d_max.
A main idea in this paper is to reduce Problem 3.1 to a problem concerning univariate bounds. In this problem, we assume that we are given a univariate initial state in terms of a single input variable x. The initial state assigns to every variable X_i an initial symbolic value of the form x^{d_i} for some integer d_i ≥ 0, e.g., i(x) = ⟨x, x^3, x^2⟩. The case d_i = 0 intuitively represents a variable whose initial value should be treated as an unknown constant (not dependent on x, but possibly large). The univariate form seems to be rather common in textbook examples of algorithm analysis (where the single parameter in terms of which we express the bounds is usually called n), and the facility of stating initial polynomial values for multiple variables may be useful to express questions like: an algorithm processes a graph; its running time depends on the number of vertices n and the number of edges m; bound its worst-case time, in terms of n, given that m ≤ n^2. In this case we would use an initial state like ⟨n, n^2⟩. Our goal is now to find an attainable univariate state, providing a lower bound on the worst-case results of a computation from the given initial state. Here is the formal definition; note that we are providing simultaneous bounds for all variables.
Definition 3.4. Given a command C, a function g : N → N^n and an initial univariate state i, we say that g is attainable (by C, given i) if there are constants d > 0 and x_0 such that for all x ≥ x_0 there is a y such that i(x) C y and y ≥ d·g(x).
As in Definition 3.2, a function with codomain N may be interpreted as a lower bound for just X_n.
We thus focus on the following problem.
Problem 3.5. Univariate decision problem: Given a core-language program P, a positive initial state i (see footnote 2), the index of a chosen variable X_j and an integer d, decide whether there is an attainable univariate state whose degree in the jth component is at least d.
We also solve the corresponding bound generation problem, even for simultaneous bounds: Problem 3.6. Univariate bound generation problem: Given a core-language program P, a positive initial state i, a variable index j and an integer d, non-deterministically find an attainable univariate state whose degree in the jth component is at least d.
We have included the parameter d in this problem formulation as well, since we will be able to bound the complexity of our algorithm in terms of d (instead of d max ).
Here is a summary of the results we are going to prove regarding the above problems:
• In Section 7, we give a polynomial-space algorithm to solve Problem 3.6.
• The algorithm in Section 7, along with the complexity analysis in Section 8, proves that Problem 3.5 and Problem 3.6 can be solved in polynomial space. Note that this means that the space complexity is polynomial in the input size, consisting of the size of program P and the bit-size of the numbers in i and d.
• In Section 10 we show that this result is essentially optimal, since the problem is PSPACE-hard. This clearly means that the more involved problems (for multivariate bounds) are also PSPACE-hard.
• In Section 11 we show a reduction of the multivariate-bound decision problem (Problem 3.1) to the univariate problem (Problem 3.5), and obtain in this way a solution with space complexity polynomial in terms of the size of the program and the maximum degree.
• In Section 12 we solve the multivariate-bound generation problem (Problem 3.3).
We would like to mention that solving the above problems also solves some variants that can be reduced to tight-bound computation:
• We can find tight bounds on the number of visits to a given set of program locations (counting all locations gives the program's time complexity, in the ordinary unit-cost model). The reduction is to instrument the program with a counter that counts the visits to the locations of interest [BAH20, Section 2].
• Sometimes we want to verify a bound for the values of a variable throughout the program, not just at its end. This can also be easily reduced to our problem by instrumentation.
• Similarly, one can find a tight bound on the highest value assumed by a variable at a given program location.
Footnote 2: A zero in the initial state can be accommodated in an extension discussed in Section 13, where an assignment X := 0 is also allowed.

Properties of the Core Language
We next present some important properties of the core language that follow from [BAH20] and are required for our result. First, let us cite the main result of [BAH20]:

Theorem 4.1 ([BAH20]). There is an algorithm which, for a command C over variables X_1 through X_n, outputs a set B of multi-polynomials, such that the following hold, where PB is the set of indices i of variables X_i which are polynomially bounded under C.
(1) (Bounding) There is a constant c such that ∀x, y . x C y ⟹ ∃p ∈ B . ∀i ∈ PB . y_i ≤ c·p[i](x).
(2) (Tightness) There are constants d > 0 and x_0 such that for every p ∈ B, for all x ≥ x_0 there is a y such that x C y and ∀i ∈ PB . y_i ≥ d·p[i](x).
We rely on this result because our new algorithm will be proven correct by showing that it matches the bounds of the previous algorithm. But the implied existence result is crucial in itself: the fact that a set B as above exists, which provides both upper bounds (clause "Bounding") and matching worst-case lower bounds (clause "Tightness"). This property is tied to the limitations of our language: in a Turing-complete programming language it clearly does not hold; sometimes no polynomial tight bound exists, as for a program that inputs x and computes log x, √x, etc. In other cases the bound is only "tight" in the sense that it is matched by a lower bound for infinitely many inputs, but not for all inputs (this is sometimes expressed by the notation Ω_∞); e.g., a program that squares a number only if it is odd. These cases cannot happen with our language.
Since a polynomial is a sum of a constant number of monomials, by simple arithmetic we obtain:

Corollary 4.2. For a command C over variables X_1 through X_n, assume that X_n is polynomially bounded in terms of the variables' initial values. Then there is a set S of monomials such that max S is a tight asymptotic bound on the highest value obtained by X_n at the conclusion of C; or in more detail,
(1) (Bounding) There is a constant c such that ∀x, y . x C y ⟹ ∃m ∈ S . y_n ≤ c·m(x).
(2) (Tightness) There are constants d > 0 and x_0 such that for every m ∈ S, for all x ≥ x_0 there is a y with x C y and y_n ≥ d·m(x).
We refer to monomials satisfying the Tightness condition as attainable; they constitute valid answers to Problem 3.3 (subject to the degree bound); and the fact that they also provide an upper bound is, again, an important property that depends on the limitations of the programming language. This will be used in Sections 11 and 12 where we check or compute multivariate bounds in the form of monomials.

Algorithmic ideas
In this section we give a brief overview and motivation of key elements in our solution. In particular, we relate these elements to prior work. We discuss symbolic evaluation, the motivation for working with univariate bounds, and data-flow matrices. This is followed by an overview of the way we prove correctness of this algorithm, and the way we finally solve the multivariate-bound problem.
5.1. Symbolic evaluation. This is an old concept. We interpret the program, with states mapping variables not to integers, but to polynomials in x_1, ..., x_n, where x_i represents the initial value of the corresponding variable X_i. Due to the restricted arithmetic in our language, we can carry out symbolic evaluation precisely, staying within the domain of multivariate polynomials. Thus a program state is an MP. However, when there are different program paths that reach the same point, we can get different polynomials, and therefore we construct sets of polynomials. The abstraction to abstract polynomials may reduce the size of this set; one can thus annotate a piece of program code with a symbolic state, as a set of AMPs, before and after each command. This is all quite straightforward until one comes to a loop. Since we are analysing the program statically, we have to take into account any number of iterations. Consequently, had we worked with concrete polynomials, they would grow indefinitely.
Example 5.1. Consider this loop:

loop X2 { X1 := X1 + X2 }

Evaluating with concrete polynomials, the value for X1 would take the values x_1, x_1 + x_2, x_1 + 2x_2, x_1 + 3x_2, etc. Working with abstract polynomials, the coefficients are ignored, and we have a finite set, which is reached after a finite number of iterations. In our example, the value for X1 does not rise beyond x_1 + x_2. This is good, but unfortunately, we also lose completeness: x_1 + x_2 is not a valid asymptotic upper bound on the final value of X1; a correct upper bound is x_1 + x_2^2, taking into account the bound x_2 on the number of loop iterations. The passage from x_1 + x_2 to x_1 + x_2^2 is based on correctly generalizing the behaviour that we see in one iteration to the long-term behaviour, and deducing that the increment +x_2 may be applied to this variable at most x_2 times. Making such generalizations precise is the main challenge addressed in the algorithm of [BAH20].
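The fixpoint behaviour described in the example can be sketched as follows (an illustrative toy, with monomials represented as plain strings):

```python
# Abstract evaluation of the loop body X1 := X1 + X2, with a state mapping
# each variable to a set of monomials (coefficients dropped).

def body_abstract(state):
    x1, x2 = state
    return (x1 | x2, x2)   # abstract sum = union of monomial sets

s = ({'x1'}, {'x2'})       # initial symbolic state
seen = []
while s not in seen:       # iterate until the abstract state stabilizes
    seen.append(s)
    s = body_abstract(s)

# The abstract value of X1 stabilizes at {x1, x2}, i.e. x1 + x2 -- whereas
# concretely X1 = x1 + t*x2 after t iterations, so with at most x2
# iterations the tight bound is x1 + x2^2. Recovering that bound requires
# the generalization step discussed in the text.
assert s == ({'x1', 'x2'}, {'x2'})
```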
Our algorithm in the cited work gave a precise set of bounds but suffered from efficiency problems. This is inherent in the approach of generating symbolic polynomials. In fact, it is easy to write pieces of code that generate exponentially big sets of expressions; try the following:

choose { X3 := X1 } or { X3 := X2 };
choose { X4 := X1 } or { X4 := X2 };
...
choose { Xn := X1 } or { Xn := X2 }

5.3. Data-flow matrices. The discussion in the last paragraph hints that it may be useful to record data-flow: does x′_1 (the value of X1 after an iteration) depend on x_1 (its value prior to the iteration)? On x_2? On both? Such information may be represented by a bipartite graph with arcs from x_1, x_2, ..., x_n leading to x′_1, x′_2, ..., x′_n and showing how values propagate, or by a data-flow matrix, which is the same information in matrix form. This is a very concise representation which does not record the actual expressions computed, only the Boolean property that some x′_j depends, or does not depend, on x_i. Some of the previous results in complexity analysis [KA80, KN04] showed that the existence of polynomial bounds on computed values may sometimes be deduced by examining the dependence graph. Later works [NW06, JK09] showed that by enriching the information in data-flow matrices (allowing for a finite number of "dependence types") one has sufficient information to soundly conclude that a result is polynomially bounded in a larger set of programs. As a basic example, the technique allows us to distinguish a loop body that sets x′_1 = 2x_1 (doubling the value of X1, leading to exponential growth upon iteration) from one that sets x′_1 = x_1 + x_2 (which entails polynomial growth). Ben-Amram et al. [BJK08] derived a complete decision procedure for polynomial growth-rates in the Core Language, still by tracking data-flow properties (with an abstraction similar to data-flow matrices) and without explicitly computing any bounds.
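To illustrate, here is a sketch (not the paper's exact formulation) of Boolean data-flow matrices and their composition:

```python
# Boolean data-flow matrices: entry M[i][j] = True means that x'_{j+1}, the
# value of variable j+1 after the command, depends on x_{i+1}, the value of
# variable i+1 before it. Composing two commands is Boolean matrix product;
# iterating a loop body corresponds to taking matrix powers.

def bool_matmul(A, B):
    n = len(A)
    return [[any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Two variables; loop body X1 := X1 + X2. Then x'_1 depends on both x_1 and
# x_2, while x'_2 depends only on x_2.
M = [[True, False],
     [True, True]]

# Composing the body with itself leaves the dependencies unchanged: the
# matrix is idempotent, so no new data-flow arcs arise upon iteration.
assert bool_matmul(M, M) == M
```

Note that, as the text says, the purely Boolean matrix cannot by itself distinguish x′_1 = 2x_1 from x′_1 = x_1 + x_2; that distinction requires the enriched "dependence types" of the cited works.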
In the current work, we use data-flow matrices in addition to a symbolic state, to aid the analysis of loops, specifically the precise generalization of iterations that seem to increase certain variables. Surprisingly, it turns out that all we need are Boolean matrices which record only linear data-dependence.

5.4. Correctness.
We have motivated our work by wishing to solve the efficiency issue with the algorithm of [BAH20]. However, that algorithm (the so-called Closure Algorithm) remains the foundation of our current contribution. We both motivate the polynomial-space algorithm, and formally prove its correctness, based on the idea that it should match the results of the Closure Algorithm, in the following sense: the Closure Algorithm computes AMPs; for instance, for the loop in Example 5.1, it outputs the AMP p = ⟨x_1 + x_2^2, x_2⟩. Our algorithm works with an initial univariate state, e.g., x = ⟨x^3, x^2⟩. An abstract multi-polynomial can be applied to a univariate state, e.g., p • x = ⟨x^4, x^2⟩ (as already mentioned, we reduce abstract univariate polynomials to their leading monomial). The correctness assertion which we will prove for the polynomial-space algorithm is that it can match the result p • x (though it never maintains p). Our algorithm will be non-deterministic, and in this example it will also generate additional final abstract states, including ⟨x^3, x^2⟩. In the soundness part of the proof, we argue that every such result is attainable, by showing that it is bounded from above by p • x for some p computed by the Closure Algorithm; in the completeness part, we show that for every p computed by the Closure Algorithm, we can reach (or surpass) p • x (clearly, if p • x is a maximal element in the set of attainable bounds, then based on soundness, we will reach but not exceed it). Thus, our proof does not validate the new analysis by relating its results to the concrete semantics of the analysed program, but to the results of the Closure Algorithm, which [BAH20] already related to concrete results.

5.5. Multivariate bounds. As the restriction to univariate bounds was a key element in the development and proof of the polynomial-space algorithm, it came as a surprise that the same algorithm (practically the same code), with a change of domain from univariate AMPs to multivariate ones, can compute tight multivariate bounds. But our correctness proof, as outlined above, does not work for the multivariate algorithm (otherwise we could have skipped the univariate stage altogether). We arrive at the result for multivariate bounds by first establishing a reduction among decision problems: the attainability of a multivariate bound and that of univariate bounds. We study the degree vectors (d_1, d_2, ..., d_n) of monomial bounds x_1^{d_1} ··· x_n^{d_n}, and show that these vectors define a polyhedron in n-space, and that a linear-programming view of the problem allows us to decide membership in this polyhedron using the univariate analysis. This linear-programming approach also shows that in order to obtain a tight upper bound it is not necessary to take a max of all the attainable bounds: it suffices to use those that figure as the vertices of the polyhedron. We proceed to use this insight, along with the linear-programming connection to univariate bounds, to prove that we can compute a sound and complete set of multivariate bounds, where by sound we mean that the bounds are attainable, i.e., constitute lower bounds on some possible results, and by complete we mean that their maximum is an upper bound on all possible results.

The Closure Algorithm
In this section, we present a version of the algorithm of [BAH20], computing in exponential space and time. We call it "the Closure Algorithm" since an important component of the algorithm (and a cause of combinatorial explosion) is the computation of transitive closure to find the effect of any number of loop iterations. We need the Closure Algorithm because our polynomial-space solution evolved from this one, and moreover our proof for the efficient algorithm relies on the correctness of its predecessor. In fact, the algorithm below is already a step beyond [BAH20]: even if it is still exponential, it is somewhat simplified. Since this is not the main contribution here, we have decided to present it, in this section, concisely and without proofs, to allow the reader to understand and proceed quickly to our new polynomial-space algorithm. For completeness, Appendix A includes proofs that show that the Closure Algorithm below is equivalent to the old one [BAH20].
6.1. Symbolic semantics. We present our analysis as a symbolic semantics that assigns to every command C a symbolic abstract value ⟦C⟧S ∈ ℘(AMPol), that is, a set of AMPs. For atomic commands, this set is a singleton, built from the AMPs set_ij, add_ijk and mul_ijk. For composite commands, except the loop, the definition is also straightforward. To handle a loop command, loop X {C}, we first compute S = ⟦C⟧S, obtaining a representation of the possible effects of any single iteration. Then we have to apply certain operations to calculate, from S, the effect of the entire loop. The rest of this section builds up to the definition of the function LC and the explanation of this computation.
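The displayed definitions of set_ij, add_ijk and mul_ijk are elided above; the following sketch is a plausible reading (an assumption based on the commands' semantics, not the paper's exact formulas): each atomic command yields the identity multi-polynomial with one component changed.

```python
# Hypothetical sketch of the atomic-command abstractions. A symbolic state
# maps each variable to a set of monomials; a monomial is a tuple of
# variable names, so products concatenate tuples.

def identity_state(n):
    return [{('x%d' % i,)} for i in range(1, n + 1)]

def set_cmd(state, i, j):            # X_i := X_j
    s = list(state); s[i - 1] = state[j - 1]; return s

def add_cmd(state, i, j, k):         # X_i := X_j + X_k (abstract sum = union)
    s = list(state); s[i - 1] = state[j - 1] | state[k - 1]; return s

def mul_cmd(state, i, j, k):         # X_i := X_j * X_k (pairwise products)
    s = list(state)
    s[i - 1] = {a + b for a in state[j - 1] for b in state[k - 1]}
    return s

st = add_cmd(identity_state(3), 3, 1, 2)   # X3 := X1 + X2
assert st[2] == {('x1',), ('x2',)}         # component 3 is x1 + x2
```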
Example 6.1. Consider the following loop (also considered in Section 5): loop X2 { X1 := X1 + X2 }. The abstraction of the loop body is the singleton set of AMPs {⟨x_1 + x_2, x_2⟩}. We will later show how this is used to compute the effect of the whole loop.

6.2. Simple Disjunctive Loops, Closure and Generalization. We cite some definitions from [BAH20] that we use in presenting the inference of the effect of a loop from an abstraction of its body.
Definition 6.2. A polynomial transition (PT) represents the passage from a state x = ⟨x_1, ..., x_n⟩ to a new state x′ given by a multi-polynomial. A simple disjunctive loop (SDL) is specified by a set S of such transitions; the loop is "disjunctive" because the meaning is that, in every iteration, any of the given transitions may be chosen. The sense in which abstract MPs represent concrete transitions calls for a detailed definition, but we omit it here, since in our context the intent should be clear: we expect S to be the result of analyzing the loop body with an analysis that generates asymptotically tight bounds. Importantly, an SDL does not specify the number of iterations; our analysis of an SDL generates results parameterized by the number of iterations as well as the initial state. For this purpose, we now introduce τ-polynomials, where τ (for "time") is a parameter to represent the number of iterations.
As τ represents the number of iterations, it is not a component of the state vector, which still has dimension n. If p is a τ-polynomial, then p(v_1, ..., v_n) is the result of substituting each v_i for the respective x_i; and we also write p(v_1, ..., v_n, t) for the result of substituting t for τ as well.
We form multi-polynomials from τ-polynomials to represent the effect of a variable number of iterations. For example, the τ-polynomial transition ⟨x′_1, x′_2⟩ = ⟨x_1, x_2 + τx_1⟩ represents the effect of repeating (τ times) the assignment X2 := X2 + X1. Iterating the composite command X2 := X2 + X1; X3 := X3 + X2 has an effect described by x′ = ⟨x_1, x_2 + τx_1, x_3 + τx_2 + τ^2 x_1⟩ (note that this is an upper bound which is not reached precisely, but is correct up to a constant factor). We denote the set of τ-multi-polynomials by τMPol. We should note that composition q ◦ p over τMPol is performed by substituting p[i] for each occurrence of x_i in q. Occurrences of τ are unaffected (since τ is not part of the state).
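To see where the τ^2 term comes from, here is a short derivation (constants elided; X1 is not modified by the loop body):

```latex
% After t iterations of (X2 := X2 + X1; X3 := X3 + X2):
\begin{align*}
  x_2^{(t)} &= x_2 + t\,x_1,\\
  x_3^{(t)} &= x_3 + \sum_{s=1}^{t}\left(x_2 + s\,x_1\right)
             = x_3 + t\,x_2 + \frac{t(t+1)}{2}\,x_1,
\end{align*}
% which is of order x_3 + \tau x_2 + \tau^2 x_1 with \tau = t, matching the
% tau-multi-polynomial in the text up to constant factors.
```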
The notion of abstract (multi-) polynomials is extended naturally to abstract τ -(multi-) polynomials. We denote the set of abstract τ -polynomials (respectively, multi-polynomials) by τ APol (respectively, τ AMPol). The SDL Problem is to compute, from S, a set of abstract τ -multi-polynomials that represent the effect of any number of loop transitions, where τ is used to express dependence on the number of transitions. As for general programs, we focus on loops which are polynomially bounded, and search for attainable bounds; these notions are defined as follows.
Definition 6.5. A SDL S is said to be polynomially bounded when there exists a τ -MP, b, such that for all x ∈ N n , if we start with x and consecutively apply t arbitrary transitions from S, the final state y satisfies y ≤ b(x, t).
Note that we require a polynomial bound dependent on the number of iterations t, considering t as an independent variable. This makes the SDL analysis independent of the program context from which the SDL is extracted. Clearly, when we plug the results of SDL analysis back into a context where the number of iterations is also bounded by a polynomial, we obtain a polynomial bound on the final state in terms of the program variables (eliminating t).
Definition 6.6. Given a SDL S and a function f : N^{n+1} → N^n (in the current context, f will be a τ-MP), we say that f is attainable over S if there are constants d > 0, x_0 such that for all x ≥ x_0, for infinitely many values t > 0, there exist y_0, . . . , y_t such that y_0 = x, each y_{i+1} results from y_i by a transition of S, and y_t ≥ d · f(x, t). We remark that SDLs enjoy the properties we pointed out in Section 4. In particular, a complete set of attainable lower bounds provides a tight asymptotic upper bound as well.
In [BAH20], we studied the SDL problem, and what we present next is an improved version of our solution from that article; the difference is explained at the end of this section. The solution consists of applying two operators to a set of (τ )-AMPs, which we shall now define. The first, abstract closure, naturally represents the limit-set of accumulated iterations.
Definition 6.7 (abstract closure). For finite P ⊂ τ AMPol, we define Cl(P ) to be the smallest set including Id and the elements of P , and closed under AMP composition.
In [BAH20] we proved that when loop S is polynomially bounded, the abstract closure of S is finite, and can be computed in a finite number of steps. Briefly, the reason that the closure must be finite, when the loop is polynomially bounded, is that the polynomials in the abstract closure are of bounded degree (since they are valid lower bounds). Moreover, there is a finite number of different abstract polynomials of any given degree.
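For illustration, this closure computation can be realized as a small worklist procedure over a set representation of AMPs: a monomial is an exponent tuple, an abstract polynomial is a set of monomials (abstract addition is set union), composition substitutes entries for variables, and compositions are added until nothing new appears. The encoding and all names are ours, and termination is guaranteed only in the polynomially-bounded case discussed above.

```python
from itertools import product as cartesian

N = 3                 # number of variables (our running assumption)
UNIT = (0,) * N       # the constant monomial 1

def m_mul(a, b):      # product of monomials: add exponents
    return tuple(x + y for x, y in zip(a, b))

def s_mul(A, B):      # abstract product of polynomials (sets of monomials)
    return frozenset(m_mul(a, b) for a, b in cartesian(A, B))

def s_pow(A, e):
    r = frozenset([UNIT])
    for _ in range(e):
        r = s_mul(r, A)
    return r

def compose(q, p):    # q o p: substitute p[j] for x_j in q, entrywise
    out = []
    for entry in q:
        acc = frozenset()
        for mono in entry:
            term = frozenset([UNIT])
            for j, e in enumerate(mono):
                if e:
                    term = s_mul(term, s_pow(p[j], e))
            acc |= term           # abstract addition = union
        out.append(acc)
    return tuple(out)

def x(j):             # the polynomial consisting of the monomial x_j
    return frozenset([tuple(int(i == j) for i in range(N))])

IDENT = tuple(x(j) for j in range(N))

def closure(P):       # smallest set with Id and P, closed under composition
    C = {IDENT} | set(P)
    changed = True
    while changed:
        changed = False
        for q, p in list(cartesian(C, C)):
            c = compose(q, p)
            if c not in C:
                C.add(c)
                changed = True
    return C
```

For instance, the iterative AMP ⟨x_2 + x_3, x_2, x_3⟩ is its own square, so its closure has only two elements, while ⟨x_1 + x_3, x_2, x_2⟩ generates one further element before stabilizing.
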
Example 6.8. In Example 6.1 we saw the abstraction S of the loop body. Computing Cl(S) adds Id as well as the composition of the two AMPs of S. The reader is invited to verify that this gives a composition-closed set.
The second operation is called generalization and its role is to capture the behaviour of variables that grow by accumulating increments in the loop, and make explicit the dependence on the number of iterations. The identification of which additive terms in a MP should be considered as increments that accumulate is at the heart of our problem, and its solution led to the definition of iterative MPs and iterative kernel below.
Definition 6.9. The support sup p of a polynomial p is the set of variables on which it depends, identified by their indices.

Definition 6.10. For p an (abstract) multi-polynomial, we say that x_i is self-dependent in p if i ∈ sup p[i]. We also say that the entry p[i] is self-dependent; the choice of term depends on context and the meaning should be clear either way. We denote by SD(p) the set {i : i ∈ sup p[i]}, i.e., the set of self-dependent variables of p.
It is easy to see that, given that variable X_i is polynomially bounded in a loop, if x_i is self-dependent, then p[i] must be of the form x_i + q, where q is a polynomial that does not depend on x_i (we say that x_i is linearly self-dependent in p). Otherwise, p represents a transition that multiplies X_i by a non-constant quantity, and iterating p will cause exponential growth.

Definition 6.11. We call an (abstract) MP p doubling-free if, for any i, whenever entry p[i] has a monomial m · x_i, then m = 1.
Assuming that we have made sure that a loop under analysis is polynomially bounded (as briefly discussed in Section 3 and more fully in [BAH20]), then all the loop transitions must be doubling-free. Hence if p[i] depends on x i , it must include the monomial x i .
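Self-dependence and the doubling-free condition are directly checkable on a set representation of AMPs (an entry as a set of monomials, a monomial as an exponent tuple). A possible sketch, with encoding and names of our choosing:

```python
def support(entry):
    # indices of variables occurring in a polynomial, given as a set of
    # monomials, each monomial an exponent tuple
    return {j for mono in entry for j, e in enumerate(mono) if e > 0}

def SD(p):
    # self-dependent variables of an AMP: i such that p[i] depends on x_i
    return {i for i, entry in enumerate(p) if i in support(entry)}

def doubling_free(p):
    # any monomial of p[i] that mentions x_i must be exactly x_i
    n = len(p)
    for i, entry in enumerate(p):
        unit = tuple(int(j == i) for j in range(n))
        for mono in entry:
            if mono[i] > 0 and mono != unit:
                return False
    return True
```

For the AMP ⟨x_1x_2, x_2, x_2 + x_3⟩ (from Example 7.2 below), all three variables are self-dependent, but the first entry violates doubling-freedom, since it multiplies x_1 by x_2.
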
Definition 6.12. We call a monomial m of an entry p[i] iterative (with respect to p) if all the variables in its support are self-dependent in p. We call p iterative if all its monomials are iterative with respect to p.
Iterative AMPs are crucial in the analysis of loops as, intuitively, they represent a transformation that can happen multiple times. To see what we mean, compare the following assignment statements:
• X1 := X2 + X3, represented by the AMP p = ⟨x_2 + x_3, x_2, x_3⟩. Both x_2 and x_3 are self-dependent and therefore p is iterative. In fact, if iterated any number of times, the resulting AMP is the same (in this case even the concrete MP remains the same, since there is no growth in the loop).
• X1 := X1 + X3, represented by the AMP p = ⟨x_1 + x_3, x_2, x_3⟩. All three variables are self-dependent and therefore p is iterative. If iterated any number of times, the resulting AMP is the same. Importantly, the concrete MP will not be the same: increments of x_3 accumulate in variable X1. The algorithm will have to correctly express this growth.
• X1 := X1 + X3; X3 := X2, represented by the AMP q = ⟨x_1 + x_3, x_2, x_2⟩. Here x_3 is not self-dependent, therefore q is not iterative. In fact, if iterated twice, we get x_1 = x_1 + x_3 + x_2; informally, the action of the first application is to add x_3 to X1 while the action of the second is to add x_2. It would have been incorrect to generalize from the first step and assume that increments of x_3 will be accumulated on every iteration.
Definition 6.13 (ordering of multi-polynomials). For p, q ∈ AMPol we define p ⊑ q to hold if every monomial of p[i] also appears in q[i], for i = 1, . . . , n. We then say that p is a fragment of q.
Lemma 6.14. Let p ∈ AMPol. Then p has a unique maximal iterative fragment, which we denote by p̊ ("the iterative kernel of p").
Proof. Every monomial which is not iterative in p will not be iterative in any fragment of p, so form q by deleting all these monomials. Then we must have q′ ⊑ q for any iterative fragment q′ ⊑ p. On the other hand, q is clearly iterative, so q is the maximal iterative fragment.
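Under the assumption that Definition 6.12 declares a monomial iterative exactly when all its variables are self-dependent, the proof above translates into a one-pass computation of the kernel. The encoding (exponent tuples, sets of monomials) and the names are ours:

```python
def mono_support(mono):
    # variables occurring in a monomial (an exponent tuple)
    return {j for j, e in enumerate(mono) if e > 0}

def SD(p):
    # self-dependent variables of an AMP (entries are sets of monomials)
    return {i for i, entry in enumerate(p)
            if any(i in mono_support(m) for m in entry)}

def kernel(p):
    # delete every monomial that mentions a non-self-dependent variable;
    # one pass suffices: deletion never removes the monomial x_i from a
    # self-dependent entry p[i], so SD(p) is unchanged
    sd = SD(p)
    return tuple(frozenset(m for m in entry if mono_support(m) <= sd)
                 for entry in p)
```

On q = ⟨x_1 + x_3, x_2, x_2⟩ from the example above, the kernel drops the non-iterative monomial x_3 from the first entry; on a swap ⟨x_2, x_1⟩, where no variable is self-dependent, the kernel is all zeros (here, empty monomial sets).
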
Note that even if all entries of p are non-zero, p̊ may include zero entries. Consider a loop that has a transition p with p[3] = x_3 + x_1x_2, where x_1 and x_2 are also self-dependent in p. Then the initial contents of variables X1 and X2 are preserved (or may even grow) when we iterate p; and X3 accumulates an increment of (at least) x_1x_2 on each iteration. This motivates the following definition.
Definition 6.15 (generalization). Let p ∈ AMPol be iterative; we define the generalization of p, denoted p^τ, by p^τ[i] = x_i + τ · q whenever p[i] = x_i + q with i ∈ SD(p), and p^τ[i] = p[i] otherwise. The operator Gen applies generalization to the iterative kernels of the elements of a set of AMPs.

We now define the operators used in analysing a loop command. Recall that we have defined loop X_ℓ {C} S = LC( C S )[x_ℓ/τ], where LC(S) = Cl(Gen(Cl(S))) (6.2). This means that one takes the set S representing the analysis of the loop body, applies the function LC, which generates τ-AMPs, and concludes by substituting x_ℓ, the maximum number of iterations, for τ.
Example 6.16. In Example 6.1 we have seen the abstraction of the loop body, and in Example 6.8 we have seen its closure. Next, we apply generalization. We take a shortcut and apply it only to p, as it subsumes the two previous AMPs. Then, in the final closure computation, we construct the composition, and this is as high as we get. So we finish by substituting x_3 (the loop bound) for τ, obtaining the result.

We conclude this section with some comments about the improvement of the above algorithm over [BAH20]. First, in [BAH20], iterative kernels were not used. Generalization was applied to idempotent elements of the closure set (those satisfying p • p = p); an idempotent AMP is not necessarily iterative, and the definition of the generalization operator was slightly more complex than the definition for iterative AMPs. The choice to focus on idempotent elements was guided by an algebraic theorem that we used to prove the algorithm correct (Simon's Forest Factorization Theorem). Another difference is that in the original algorithm, the computation of the closure and subsequent generalization had to be performed in several rounds (until the set stabilized); in contrast, when generalization is applied to iterative kernels as done here, it suffices to compute closure, generalize, and close once more, as expressed by (6.2).

The Polynomial-Space Algorithm
In this section we provide the polynomial-space algorithm for the univariate bound problem. We will try to present the algorithm in a way that motivates it and clarifies why it works. First, let us remind the reader that we are trying to compute attainable functions: such a function describes a bound that can be reached (for almost all inputs, on at least one possible computation for every such input), and therefore provides a lower bound on the worst-case result; moreover, we know by Theorem 4.1 that a complete set of attainable bounds exists, in the sense that it also implies a tight upper bound (specifically, the max of this finite set of functions is a worst-case upper bound). Intuitively, since an attainable function is "something that can happen," to find such functions we just have to evaluate the program symbolically and see what it computes: expressed as a function of the input, this will be an attainable function. Our evaluator will be non-deterministic and so generate a set of attainable functions. It will then be necessary to prove that this is a complete set.
7.1. The Simple Case: Without Additions in Loops. As long as we do not need generalization, which is only necessary when additions are performed within a loop, our analysis (Section 6) is nothing more than a symbolic evaluation, where instead of maintaining a concrete state as a vector of numbers and applying the operations numerically, we maintain AMPs that represent the computation so far, and apply symbolic arithmetic to them. While C S represents all the possible effects of C, as a set, we develop in this section a non-deterministic evaluator, which only follows one possible path. Thus a choose command will be implemented verbatim, by non-deterministically choosing one branch or the other. Similarly, in a loop, the non-deterministic evaluator literally iterates its body a non-deterministic number of times. The information that it has to maintain, besides the current command, is just the current symbolic state. Importantly, since we are provided with an initial univariate state, all our computation is with such states, which are much more compact than AMPs. Given a degree bound d, a univariate state can be represented by a vector in [d]^n. This representation requires polynomial space. Next, we give a set of definitions for a simple non-deterministic interpreter Ev S, which does not handle generalization (we give this simple version first for the sake of presentation). Throughout this presentation, s refers to a univariate state.
The interpretation of skip, assignment and multiplication commands is the obvious one, where set, mul are as in the last section, and ⇒ is the evaluation relation. Since the addition of abstract univariate monomials x^a and x^b is x^{max(a,b)}, we could simulate addition deterministically; however, we prefer to use non-determinism and define two alternatives for evaluating this command. This non-determinism means that we can get the precise result as well as another one which is lower and seems redundant. However, this redundancy turns out to be advantageous later, when we get to multivariate bounds. The interpretation of the choice and sequencing commands is natural, using non-determinism to implement choose.
For the loop, since we are assuming that there is no addition (hence generalization is not necessary), we just compute the composition-closure, i.e., iterate the loop body any finite number of times, using the auxiliary function Ev S * C. For any x, the set C S • x and the set of possible results of evaluating Ev S C x have the same maximal elements; thus they define the same worst-case bound.
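To make this concrete, here is a determinized sketch of Ev^S for this fragment: instead of following one non-deterministic path, it collects the whole set of outcomes (the actual algorithm keeps a single univariate state, in polynomial space). The command encoding, the degree cap D, and all names are ours.

```python
D = 6  # degree cap: truncating higher degrees is safe for a degree-D query

def ev(cmd, states):
    # 'states' is a set of univariate states (tuples of degrees); we
    # determinize the evaluator by collecting all possible outcomes
    op = cmd[0]
    if op == 'skip':
        return set(states)
    if op == 'set':                      # X_i := X_j
        _, i, j = cmd
        return {s[:i] + (s[j],) + s[i+1:] for s in states}
    if op == 'mul':                      # X_i := X_j * X_k (degrees add)
        _, i, j, k = cmd
        return {s[:i] + (min(s[j] + s[k], D),) + s[i+1:] for s in states}
    if op == 'add':                      # X_i := X_j + X_k (either side wins)
        _, i, j, k = cmd
        return {s[:i] + (s[j],) + s[i+1:] for s in states} | \
               {s[:i] + (s[k],) + s[i+1:] for s in states}
    if op == 'seq':
        return ev(cmd[2], ev(cmd[1], states))
    if op == 'choose':
        return ev(cmd[1], states) | ev(cmd[2], states)
    if op == 'loop':
        # composition-closure: iterate the body until no new state appears;
        # finite because degrees are capped (the loop bound plays no role
        # here, since without additions there is no generalization)
        reach = set(states)
        while True:
            new = ev(cmd[1], reach) - reach
            if not new:
                return reach
            reach |= new
    raise ValueError(op)
```

For instance, iterating the body X1 := X2; X2 := X3 from degrees (0, 2, 3) reaches exactly the states corresponding to zero, one, and at least two iterations.
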
The difference between computing with full AMPs and evaluating with univariate states as above can be explained as follows: the effect of any computation path of the program is given by an expression p_1 p_2 · · · p_j, where p_i represents the ith atomic command (i.e., an assignment) in the path, and juxtaposition pq means applying q after p (left-to-right composition). In order to evaluate the result given an initial state i, we compute i(p_1 p_2 · · · p_j). The point is that, thanks to associativity, we can parenthesize such an expression in different ways. The expression i(p_1 p_2 · · · p_j) represents transforming the path to an AMP (our symbolic semantics from the last section) and then applying the AMP to the given initial state. On the other hand, Ev S implements the computation ((ip_1)p_2) · · · p_j (our default reading of a product, i.e., left-associative), which gains efficiency, since we only maintain a univariate state.

7.2. Data-flow matrices. When we consider loops that include addition, we have to introduce generalization, in order to account for repeated increments. From the definition of the function LC (Eq. 6.2), we can represent each of the values of LC( C S ) by an expression of this form:

p_{11} p_{12} · · · p_{1j_1} (p_{21} p_{22} · · · p_{2j_2})^τ p_{31} p_{32} · · · p_{3j_3} (p_{41} p_{42} · · · p_{4j_4})^τ · · ·   (7.1)

where each p_{ij} is an AMP from C S, representing a single iteration. Our interpreter will start with a given univariate state and repeatedly apply the body of the loop; due to non-determinism, each application is equivalent to the application of one possible AMP from C S, so we are effectively applying p_{11} p_{12} · · ·. However, the parenthesized expressions need a different handling. In order to compute p^τ, even if we only want to evaluate it on a univariate state, i.e., i p^τ, we need to know more about p: namely, we have to identify self-dependent variables, and to ensure that generalization is sound (i.e., only applied to iterative fragments).
In order to track variable dependences and verify self-dependence, we now introduce data-flow matrices and add their maintenance to our interpreter.

Definition 7.1. The data-flow matrix of an AMP p, DFM(p), is the n × n Boolean matrix M in which M_{ij} = 1 if p[j] depends on x_i, and 0 otherwise. The linear data-flow matrix, LDFM(p), records only linear dependences: M_{ij} = 1 if p[j] includes the monomial x_i, and 0 otherwise.
Example 7.2. A data-flow matrix can be associated with any sequence of assignments. Consider the command X1 := X1 * X2; X3 := X2 + X3, in a program with n = 3 variables; it is represented by the AMP p = ⟨x_1x_2, x_2, x_2 + x_3⟩.

In previous works [NW06, JK09, BJK08], matrices were not Boolean, and the values of their components were used to differentiate types of value dependence (the precise meaning of "type" varies). For our algorithm, we find that we only need to track one type of dependence, namely linear dependence. Non-linear dependence will not be tracked at all (it could be, but this would complicate the algorithm; we go by the rule of maintaining the minimal information necessary for a purpose).

In order to compute data-flow matrices for composite commands we use ordinary Boolean matrix product. The reader is invited to verify that, if A represents a command C and B represents a command D, then A·B represents the data-flow in C;D. This works both for DFM and for LDFM.

Let us define some notations. I is the identity matrix (we always assume that the dimension is fixed by context; it is the number of variables in the program under analysis). The "assignment" matrix A(j, i) represents the data-flow in the command X_i := X_j. It differs from I in a single column, column i, which is the jth unit vector. We similarly define E(i) ("erase") to differ from I in having the ith column zero; this is the LDFM of X_i := X_j * X_k (the dependence on the operands is non-linear, so no linear flow into X_i is recorded). The following observations will be useful.
• Let M be an LDFM representing some transformation p. Then I ∧ M (the ∧ is applied component-wise) gives the self-dependent variables of p (as the non-zero diagonal elements).
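The matrix toolkit is small enough to sketch directly; the helper names (assign, erase, add_cmd) are ours, and composition follows the rule that A·B represents the data-flow of C;D. For the command X1 := X1 * X2; X3 := X2 + X3 from Example 7.2, the LDFM comes out with a zero first column, and the diagonal of I ∧ M marks exactly X2 and X3 as self-dependent:

```python
def bool_mul(A, B):
    # Boolean matrix product: the data-flow of C;D from those of C and D
    n = len(A)
    return [[int(any(A[i][k] and B[k][j] for k in range(n)))
             for j in range(n)] for i in range(n)]

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def assign(n, j, i):
    # A(j, i): data-flow of X_i := X_j; column i is the jth unit vector
    M = identity(n)
    for r in range(n):
        M[r][i] = int(r == j)
    return M

def erase(n, i):
    # E(i): column i zeroed; the LDFM of X_i := X_j * X_k
    M = identity(n)
    for r in range(n):
        M[r][i] = 0
    return M

def add_cmd(n, i, j, k):
    # LDFM of X_i := X_j + X_k: column i has 1's in rows j and k
    M = identity(n)
    for r in range(n):
        M[r][i] = int(r in (j, k))
    return M
```
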
7.3. Maintaining Data-Flow Matrices in Non-Looping Commands. Next, we give to each non-loop command a (non-deterministic) semantics in terms of both a univariate state and a data-flow matrix. The evaluator Ev M computes this semantics. It is invoked as Ev M C s, to evaluate command C over initial univariate state s. This returns a pair: the final state and a data-flow matrix, abstracting the computation that took the given state to the final one. See Figure 2. Note that in the case of addition, even if we use the fact that the sum of two abstract monomials equals one of them, we still record the data-flow from both summands.

7.4. Handling loops. When we handle a loop, we will be (in essence) evaluating expressions of the type (7.1). We now outline the ideas used for simulating a parenthesized sub-expression (p_1 p_2 · · · p_j)^τ. We introduce the following notation for changing a set of components of x:

Definition 7.6. For Z ⊂ [n], and an n-tuple x, x[Z ← 0] is obtained by changing the components indexed by Z to 0. We can use this notation with univariate states as well as with (abstract) MPs.

Let us recall how, in the Closure Algorithm, we arrive at p̊^τ for some p ∈ Cl( C S ).
(1) We form p ∈ Cl( C S ) as p = p 1 p 2 . . . p j where every p i ∈ C S represents the effect of an iteration.
(2) We reduce p to its iterative kernel p̊. This involves deleting the monomials which are not iterative.
(3) We apply generalization to form p̊^τ.

Our PSPACE algorithm never forms p. But it goes through p_1, p_2, . . . , p_j while simulating their action on an initial univariate state, say x. If we always do this faithfully, we end up with a result that corresponds to an application of p to the univariate state, p • x. Now we have two challenges: (1) how to obtain p̊ • x, which may differ from p • x;
(2) how to simulate generalization. The solution to both challenges involves two tools: first, the data-flow matrix; second, we modify the initial state for the simulation of this sequence of loop iterations by non-deterministically setting a set of components to zero. Let us explain how this handles the challenges.
(1) p̊ is obtained from p by deleting all non-iterative monomials. It is easy to see that p̊ • x = p • (x[S̄ ← 0]), where S = SD(p) and by S̄ we mean the complement of S. Our interpreter will guess a set Z and replace x with x[Z ← 0]. After simulating a number of iterations of the loop, corresponding to the application of some p ∈ Cl( C S ), we verify that we made a valid guess, meaning that all variables not in Z are self-dependent in p, using the associated data-flow matrix. Note that a set Z larger than the complement of SD(p) is also considered valid; this is important for the next point.
(2) Consider p[i] of the form x_i + q, where q(x_1, . . . , x_n) depends only on self-dependent variables: such an entry is subject to generalization, multiplying q by τ, which is later replaced by the loop bound x_ℓ. In the algorithm GEN below, we shall multiply by x_ℓ directly, without the intermediate use of the τ symbol. So the expected result is x_i + x_ℓ · q which, when evaluated with abstract univariate states, is max(x_i, x_ℓ · q). We see that we can get a correct result in the following way: we non-deterministically either set x_i to 0 and at the end multiply the ith component of the result by x_ℓ, or we modify neither x_i nor the final state. In the first case we compute x_ℓ · q, and in the second max(x_i, q). The latter result is always sound (it skips generalization), while the former is sound as long as generalization can be rightfully applied, a condition which can be checked using the data-flow matrix (namely, that we have evaluated an iterative MP and that i ∈ SD(p)).

The code for interpreting a loop is presented in Figure 3. The non-deterministic alternatives for Ev M * and Ev M correspond to zero or more loop iterations, respectively. Note how the definition of Ev M corresponds to the expression Cl(Gen(Cl( C S ))) in Equation 6.2: an element of the inner closure, Cl( C S ), corresponds to a simulation of a finite number of iterations, as done by function Ev M *; an element of the outer closure is a composition of a finite number of elements, some of which are generalized (we actually generalize whenever possible).
Function GEN checks, using the data-flow matrix, whether we have indeed simulated an iterative MP, and if we did, it applies generalization by multiplying by x_ℓ the outcome of self-dependent variables that have been initialized to zero in this sub-computation. As explained above, the effect is that we multiply the increments (the term q where p[i] = x_i + q). Note that the multiplication is performed by applying a mul step for each such i; there may be several such indices, hence the iterated-composition notation in s_gen (it expresses iteration of postfix composition). The formula for M_gen deserves some explanation. Note that it does not depend on Z at all. This is important for the correctness proof. Informally, assuming we have reached the current state by simulating a sequence of assignments equivalent to an AMP p, the expression (I ∧ M)M is LDFM(p̊); multiplying it by the complement of I ∧ M removes from this matrix the columns that correspond to self-dependent entries; and then we put them back, as unit columns, by adding I ∧ M. The point of this exercise can be seen in the next example.
Example 7.7. We illustrate an application of function GEN. We refer back to Example 6.1; recall that the analysis of the loop body produced two AMPs. We focus on the path corresponding to the first AMP, and consider its evaluation with initial state s = ⟨x_1, x_2, x_3, x_4⟩. Applying Ev M *, we may obtain the final result s′ = ⟨x_3, x_4, x_3, x_3⟩ along with a matrix M. Applying GEN then effectively replaces the first column with a unit column. Why? In M, we had two 1's in this column, corresponding to the expression x_1 + x_3. Generalization turns this into x_1 + x_3 · x_3. But we are only interested in linear terms, so we replace this column with a unit column, erasing the dependence on x_3.

Complexity
We now consider the space complexity of the algorithm. The size of a univariate state, implemented as a degree vector, is O(n(1 + log d)) bits, where d ≥ 1 is the highest degree reached. In the problem formulation where a degree is specified as input, the algorithm can safely truncate any higher degree down to d, as this will still give a correct answer to the user's question.

It remains to consider the space occupied by the recursion stack. This will be proportional to the depth of the program's syntax tree once we avoid the calls of functions Ev M * and Ev M to themselves, or rather change them into tail calls. This can be done by routine rewriting of the functions, using an auxiliary "accumulator" parameter.

Note that the non-deterministic nature of the algorithm places the decision problem in NPSPACE; by Savitch's theorem, this implies membership in PSPACE as well. The corresponding generation problem can also be determinized, by essentially turning it into an exhaustive search, still in polynomial space. Finally, we recall that a program in our language can have exponentially-growing variables as well, and emphasize that handling them, as described in [BAH20], does not increase the complexity of our problem (the output in such a case will simply indicate that the variable has an exponential lower bound; no tight bound will be given).
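The rewriting alluded to can be illustrated on a toy self-recursive iterator: the recursive version uses stack depth proportional to the number of iterations, while the accumulator version keeps only the current univariate state. A schematic sketch (the body function is a stand-in of our own, not the paper's Ev M):

```python
def body(s):
    # stand-in for one evaluation of the loop body (deterministic here)
    return tuple(min(v + 1, 9) for v in s)

def ev_star_rec(s, steps):
    # direct self-recursion: stack depth grows with the iteration count
    if steps == 0:
        return s
    return ev_star_rec(body(s), steps - 1)

def ev_star_iter(s, steps):
    # after tail-call elimination: the current state is the accumulator,
    # so only one univariate state is kept, regardless of 'steps'
    acc = s
    for _ in range(steps):
        acc = body(acc)
    return acc
```

Both versions compute the same function; only the second keeps the stack usage bounded by the syntactic nesting depth.
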

Proof of the PSPACE Algorithm
The purpose of this section is to show that our PSPACE algorithm, Ev M, obtains correct results. How do we define correctness? The original goal of the algorithm is to obtain symbolic expressions that tightly bound the concrete numeric results obtained by a core-language program, applied to an initial integer-valued state. However, thanks to our previous work, we know that such a set of symbolic expressions, namely AMPs, is obtained by the abstract semantics of Section 6. So now we are able to take C S as our reference and define our goal as matching the results it provides, specialized to a univariate initial state. The purpose of this section is thus to prove the following:

Theorem 9.1. The interpreter Ev M satisfies these two correctness claims: (1) Soundness: Given any univariate state x, if Ev M C x ⇒ * (y, M) (for any M) then there exists p ∈ C S such that y ≤ p • x.
(2) Completeness: Given any p ∈ C S , and univariate state x, there is y ≥ p • x such that Ev M C x ⇒ * (y, M ) for some M .
Note that we can also state this as follows: for any x, the set C S • x and the set of possible results of evaluating Ev M C x have the same maximal elements; thus they define the same worst-case bound. Obviously, the proof is inductive, and to carry it out completely we also have to assert something about the matrices. We define a notation for brevity:

Definition 9.2. Let f be a program function that, given a univariate state, evaluates to a pair: a univariate state and a matrix. Let S be a set of AMPs. We write f → S and f ← S for the corresponding soundness and completeness relations. The full correctness claim will be the following:

Theorem 9.3. The interpreter Ev M satisfies these two correctness claims: (1) Soundness: Given any command C, Ev M C → C S.
(2) Completeness: Given any command C, Ev M C ← C S .
Correctness is straightforward for straight-line code (this basically amounts to the associativity argument proposed in Section 7.1, with Lemma 7.5 justifying the matrices), and it trivially extends to commands with branching, and even to loops without additions (since they are analyzed by unrolling finitely many times). It is where generalization is used that correctness is subtle, and so this is the focus of this section. Since the case of straight-line (or, more generally, loop-free) code is simple, we will state the properties without a detailed proof.

9.1. Correctness without loops.
Lemma 9.4. Let C be a loop-free command. Then Ev M C → C S and Ev M C ← C S .
Ev M * just iterates Ev M an undetermined number of times, which matches the definition of composition-closure precisely, so the following is also straightforward.
Lemma 9.5. Let C be a loop-free command. Then Ev M * C → Cl( C S ) and Ev M * C ← Cl( C S ).

A.M. Ben-Amram and G.W. Hamilton
Vol. 17:4

9.2. Soundness of the analysis of loops. We now wish to extend the soundness claim to loops. The proof is by structural induction, where all loop-free commands are covered by the above lemmas; similarly, given correctness for commands C and D, correctness for their sequential composition, or non-deterministic choice, follows straight from the definitions. So the main task is to prove correctness for loops, namely the correctness of Ev M. Specifically, we have to relate the possible results of GEN(Z, ℓ, Ev M * C (x[Z ← 0])), where Z ranges over subsets of [n], with members of the AMP set Gen(Cl( C S ))[x_ℓ/τ]. Note that in the expression Gen(Cl( C S ))[x_ℓ/τ], unlike in Section 6, we substitute x_ℓ for τ immediately after generalization; this is done for the sake of comparing the results, since Ev M does not use τ. This modification of the Closure Algorithm is harmless, as for AMPs p, q from this set, we can use the rule (p[x_ℓ/τ]) • (q[x_ℓ/τ]) = (p • q)[x_ℓ/τ], due to the assumption that X_ℓ is not modified inside the loop.

Note that this lemma constitutes an induction step; it is used under the inductive assumption that for the loop body C, we have soundness. Next, suppose that matrix M satisfies the condition for generalization (∀i ∉ Z : M_ii = 1). By the inductive assumption, we can choose p ∈ Cl( C S ) such that s ≤ p • x[Z ← 0] and M = LDFM(p). We now wish to show that y = s_gen ≤ p̊^τ[x_ℓ/τ] • x. We focus on the non-trivial case, which is that of an index k to which generalization applies, i.e., k ∈ Z and M_kk = 1. Thus x_k is self-dependent in p.
We conclude that the lemma is fulfilled by r = p̊^τ.
Regarding the computation of the matrix M_gen, we refer to the text in Section 7.4. It explains why, given M = LDFM(p), we get M_gen = LDFM(p̊^τ[x_ℓ/τ]).

9.3. Completeness for an unnested loop. Again, the crux of this proof will be the application of generalization. Here we want to show that when the Closure Algorithm generalizes, Ev M can match its result. Based on the presentation in Section 7.4, we might expect that for p ∈ Cl( C S ), given that Ev M * can match p • x, we will capture the results of generalization by making a correct guess of the self-dependent variables and the accumulators. Let us first formalize the latter term.
Thus, given a univariate state x, let Acc(ℓ, p, x) be the set of indices i where the first term under the max (that is, the generalized term x_ℓ · q, in an entry p[i] = x_i + q) is the larger one; intuitively, these are the indices where generalization increases the result. We refer to Acc(ℓ, p, x) as the accumulators in the computation under consideration (i.e., iterating p starting at state x).
Recall that function GEN is applied to a state-matrix pair (s′, M) obtained from Ev M * C s. By correctness of this analysis (i.e., the inductive assumption), we may assume s′ ≥ p • s, where p ∈ Cl( C S ), and M = LDFM(p). We shall say that this application of Ev M * simulates p (in the completeness statement, we quantify over p first, so it may be considered fixed throughout the discussion). When Ev M * is applied to a state in which some entries are set to 0, namely s[Z ← 0], this is equivalent to simulating p • Id[Z ← 0] on s. Then, GEN possibly multiplies some entries of s′ by s′[ℓ] = s[ℓ]. This can be seen as modifying p • Id[Z ← 0] into a new AMP q by multiplying some entries by x_ℓ, before applying the AMP to s. We express this by writing q = GEN(Z, p, ℓ), noting that the action turning p into q depends on p, M (which is determined by p) and Z, but not on s. We also note that M_gen depends only on M.
Based on the presentation of the algorithm, the reader may infer that we intend that, for an iterative p, letting A = Acc(ℓ, p, x), we should have GEN(A, p, ℓ) • x[A ← 0] = p^τ[x_ℓ/τ] • x. However, this is not always the case, as shown by the following example.

Example 9.9. Let p = ⟨x_1, x_2 + x_1, x_3 + x_2, x_4⟩, and ℓ = 4. Note that p is iterative. Let x = ⟨x^2, x^2, x, x⟩. Then Acc(ℓ, p, x) = {2, 3}. Letting A = {2, 3}, it is easy to check that q = GEN(A, p, ℓ) = ⟨x_1, x_1x_4, 0, x_4⟩. Since we have begun our computation with x[A ← 0] = ⟨x^2, 0, 0, x⟩, the final state is ⟨x^2, x^3, 0, x⟩. Note that the result falls short of p^τ[x_ℓ/τ] • x in the third component. The solution to this mismatch is given by Lemma 9.12 below, which shows that by calling GEN multiple times in succession, in other words, by going through a number of recursive calls of Ev M, completeness is recovered.
We shall now make some preparations for this lemma.
The Dependence Graph. The matrix DFM(p) may be seen as the adjacency matrix of a graph, which we call G(p). Arcs in this graph represent data-flow in p (we use DFM(p) and not LDFM(p), so this includes non-linear dependence). Paths in the graph correspond to data-flow effected by iterating p. For instance, if we have a path i → j → k in G(p), then x i will appear in the expression (p • p)[k].
Lemma 9.10. If p is iterative, and causes no exponential growth when iterated on concrete data, then G(p) has no cycles of length greater than 1. Note that the assumption refers to concrete computation (p • p • . . .), and that we are building on the assumption that the loop under analysis is polynomially bounded (this was presumably verified beforehand, see Section 3).
Proof. Assume, to the contrary, that G(p) does have a cycle i → · · · → k → i, where k ≠ i. Let r be the length of the cycle. Since p is iterative, all these variables must be self-dependent (as some other entry depends on them). Thus we have p[i] = x_i + q(x_1, . . . , x_n) where q involves x_k; and each variable on the cycle depends on its predecessor; hence p^{(r−1)}[k] ≥ x_i, and p^{(r)}[i] ≥ 2x_i. Thus iteration of p generates exponential growth, contradicting the assumption.
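Lemma 9.10 suggests a mechanical check: view DFM(p) as the adjacency matrix of G(p) (an arc i → j when p[j] depends on x_i) and search for a cycle that is not a self-loop. A sketch of such a check, with names of our choosing:

```python
def has_long_cycle(M):
    # True iff the graph with adjacency matrix M, ignoring self-loops,
    # contains a cycle, i.e., G has a cycle of length greater than 1
    n = len(M)
    color = [0] * n  # 0 = unvisited, 1 = on the DFS stack, 2 = done

    def dfs(u):
        color[u] = 1
        for v in range(n):
            if v != u and M[u][v]:      # skip self-loops
                if color[v] == 1:
                    return True          # back edge: a long cycle
                if color[v] == 0 and dfs(v):
                    return True
        color[u] = 2
        return False

    return any(color[u] == 0 and dfs(u) for u in range(n))
```

For the neat AMP ⟨x_1, x_2 + x_1, x_3 + x_2, x_4⟩ of Example 9.9, the check passes; adding a back arc 2 → 1 (so that x_1 would also depend on x_2) creates the forbidden cycle.
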
When focusing on a particular AMP we may assume, w.l.o.g., that the variables are indexed in an order consistent with G(p), so that if x_i depends on x_j then j ≤ i. We shall refer to an (abstract) MP satisfying this property as neat³.
We also state a rather evident property of an iterative MP.

Lemma 9.12. Let p ∈ AMPol be neat and iterative, and let x be a univariate state. There are sets Z_1, . . . , Z_{n−1} such that

Proof. Let A = Acc(ℓ, p, x). We define sets Z_i for i = 1, 2, . . . , n − 1 as follows: We now state and prove an inductive claim (9.1): for 1 ≤ i ≤ n, for all j ≤ i, Note that (9.1) specializes to the lemma's statement for i = n.
To start the induction, consider i = 1. Then we only have to prove the claim for j = 1. By neatness, p[1] can only depend on x_1; in fact p[1] = x_1, and also p^τ[1] = x_1, so equality holds. Now let k > 1, and assume that (9.1) holds for i = k − 1. To prove it for i = k, we consider some sub-cases.
(1) j ≤ k and j ∉ SD(p): in this case, p[j] only depends on x_t with t < k, to which the induction hypothesis applies.

³Readers who like linear algebra may draw some intuition about neat MPs from thinking about triangular matrices whose diagonal elements are in {0, 1}. Interestingly, this structure has also popped up in other works in the area [HFG20].
We conclude, using Lemma 9.5, that starting with x, and making at most n recursive calls, Ev_M can reach y ≥ r[x_ℓ/τ] • x as desired. The correctness of the matrix is again the equation M_gen = LDFM(p^τ[x_ℓ/τ]), which has been argued in Section 7.4.

Straightforward induction, and the definition of ⟦loop X_ℓ {C}⟧_S, give
Lemma 9.14 (Completeness for loops). Let C be a command such that Ev M C ← C S . Then Hence,

PSPACE-Completeness of Bound Analysis
In this section we complement our PSPACE upper bound with a hardness result, to show that our classification of the problem's complexity as PSPACE is tight. The hardness proof is a reduction from termination of Boolean Programs, a known PSPACE-complete problem. First, we state the definition of the decision problem to which we reduce. This is a special case of the univariate-bound decision problem, with a fixed initial state.
Definition 10.1. The decision problem Deg (for "degree") is defined as the set of triples (P, j, d) such that P is a core-language program, and the maximal value of X_j at the completion of the program, in terms of the univariate input x = ⟨x, x, . . . , x⟩, has a lower bound of Ω(x^d).
The complexity of the problem is classified in relation to the "input size" defined as: |P | + d, where |P | is the size of the syntax tree representing P . This means that we allow d to be represented in unary notation (and we will find that this does not affect the complexity class).
Instructions have two forms: B_i := not B_i, and if B_i then goto ℓ′ else ℓ″. Here 1 ≤ i ≤ k and ℓ, ℓ′, ℓ″ ∈ {0, 1, 2, . . . , m}, where ℓ, ℓ′, ℓ″ are always three different locations. Semantics: the computation by b is a finite or infinite state sequence (ℓ_1, σ_1) → (ℓ_2, σ_2) → . . ., where each store σ_t assigns a truth value in {true, false} to each of b's variables, and ℓ_t is the program location at time t.
We are considering input-free programs. These programs have a fixed initial state: ℓ_1 = 1, and σ_1 assigns false to every variable. Given state (ℓ_t, σ_t), if ℓ_t = 0 then the computation has terminated; otherwise the following rules apply. If instruction I_{ℓ_t} is B_i := not B_i, then σ_{t+1} is identical to σ_t except that σ_{t+1}(B_i) = ¬σ_t(B_i); further, ℓ_{t+1} = (ℓ_t + 1) mod (m + 1).
If instruction I_{ℓ_t} is if B_i then goto ℓ′ else ℓ″, then σ_{t+1} is identical to σ_t; further, ℓ_{t+1} = ℓ′ if σ_t(B_i) = true, and ℓ_{t+1} = ℓ″ if σ_t(B_i) = false.
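The semantics above is easy to make executable. The following is a sketch with our own instruction encoding (location 0 means "terminated"; a step budget stands in for non-termination):

```python
# ("not", i)          -- negate variable i, then fall through (mod m+1)
# ("if", i, l1, l2)   -- goto l1 if variable i is true, else goto l2

def run(prog, k, max_steps=10_000):
    """Run an input-free Boolean program; return the final store, or None."""
    m = len(prog)
    loc = 1
    store = {i: False for i in range(1, k + 1)}  # fixed initial state: all false
    for _ in range(max_steps):
        if loc == 0:
            return store                         # terminated
        ins = prog[loc - 1]
        if ins[0] == "not":
            store[ins[1]] = not store[ins[1]]
            loc = (loc + 1) % (m + 1)
        else:
            _, i, l1, l2 = ins
            loc = l1 if store[i] else l2
    return None                                  # presumed non-terminating

# Flip variable 1; loop back once while it is true; exit with it false.
prog = [("not", 1), ("if", 1, 1, 0)]
assert run(prog, k=1) == {1: False}
```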
The following lemma is proved in [Jon97, Chapter 28] (with a trivial difference, concerning acceptance instead of termination).

Theorem 10.4. The problem Deg is PSPACE-hard, even for programs with a single loop.
Proof. The reduction is from problem B above. Suppose program b = 1:I_1 2:I_2 ... m:I_m has k variables B_1, . . . , B_k. Without loss of generality, every variable has the value false when b terminates (if necessary, add at the end of b one test and one assignment per variable). Program p will have 2 + 2k(m + 1) variables, named X_1, X_2, and X_{ℓ,i,v} for all 0 ≤ ℓ ≤ m, 1 ≤ i ≤ k and v ∈ {0, 1}. Informally, the program simulates the Boolean program, such that the pair X_{ℓ,i,0}, X_{ℓ,i,1} represents the value of variable B_i when program location ℓ was last visited. This is not a deterministic simulation, due to the absence of deterministic branching; instead, the program may take many paths, most of which are wrong simulations, and the contents of the variables X_{ℓ,i,v} will reflect the correctness of the simulation. In fact, the value of each such variable will be either x or x², where the value x denotes error. That is, if the program reaches location ℓ with B_i = v, only the path that simulates it correctly will have X_{ℓ,i,v} = x².
We first define the initialization command INIT: for all i, it sets X_{1,i,0} to X_1² (the initial value of B_i is false). All other simulation variables are set to X_1.
For every program location ℓ we define a command C_ℓ "simulating" the corresponding instruction, as follows:
• For instruction ℓ: B_i := not B_i, C_ℓ modifies only the variables X_{ℓ′,i,v}, where ℓ′ = (ℓ + 1) mod (m + 1). For each v = 0, 1, it sets X_{ℓ′,i,v} := X_{ℓ,i,¬v}, while for all j ≠ i, and v = 0, 1, it sets X_{ℓ′,j,v} := X_{ℓ,j,v}.
• For instruction ℓ: if B_i then goto ℓ′ else ℓ″, C_ℓ modifies the variables X_{ℓ′,i,v} and X_{ℓ″,i,v} as follows:
X_{ℓ′,i,0} := X_1 (0 is definitely an error here)
X_{ℓ′,i,1} := X_{ℓ,i,1} (1 is an error if it was so before)
X_{ℓ″,i,1} := X_1 (1 is definitely an error here)
X_{ℓ″,i,0} := X_{ℓ,i,0} (0 is an error if it was so before)
For all j ≠ i we simply have, for both values of v, X_{ℓ′,j,v} := X_{ℓ,j,v} and X_{ℓ″,j,v} := X_{ℓ,j,v}.
Finally, these commands are put together to make the program p:
INIT;
loop X_1 { choose C_1 or C_2 or ... or C_m };
X_2 := X_{0,1,0} * X_{0,2,0} * ... * X_{0,k,0}
The outcome of the reduction is (p, 2, 2k). Thus, we are asking whether the final value of X_2 depends on x with a degree of 2k or more, which will only happen if the loop can be completed with x² in each of the variables X_{0,i,0}. This means that program b could be successfully simulated up to a point where the location was 0 and all variables false (that is, the program really terminated).
A comment is due on the subject of the loop bound (which we chose, somewhat arbitrarily, to be x). Obviously we do not know in advance how long b runs. But since it is input-free, if b terminates, its running time is a constant, and the desired output (x^{2k}) will be achieved for all values of x large enough to permit the complete simulation of b. If b does not terminate, the output will always be bounded by x^{2k−1}.
What is surprising in the above proof is the simplicity of the programs; curiously, they do not contain implicit data-flow in loops, which was the main challenge in the problem's solution.

Checking Multivariate Bounds
In this section we move to multivariate bounds. We show that checking a multivariate bound can be reduced to the univariate problem. We focus on the following decision problem (first defined in Section 3).

Problem 3.1. The multivariate bound decision problem is: given a core-language program P and a monomial m(x_1, . . . , x_n) = x_1^{d_1} · · · x_n^{d_n}, report whether m constitutes an attainable bound on the final value of X_n; namely, whether constants x_0, c > 0 exist such that, whenever all inputs are at least x_0, some execution of P achieves a final value of X_n of at least c · m(x_1, . . . , x_n).

We later discuss how to solve the related search problem, of finding a complete set of attainable bounds. If there is no polynomial upper bound on x_n, then all monomials are attainable. This case can be detected in polynomial time using the algorithm of [BJK08], so we assume henceforth that x_n is polynomially bounded. In fact, as already discussed, we may assume that all super-polynomially growing variables have been excluded.
Given P, we consider the set of attainable monomials (forming positive instances of the problem). We represent a monomial by a column vector of degrees: d = (d_1, . . . , d_n)^T. The main idea in this section is to make use of the geometry of this set of vectors by viewing the problem as a linear-programming problem. Before proceeding, we recall some background knowledge.
11.1. Polyhedra. We recall some useful definitions and properties; all can be found in [Sch86]. A point x ∈ Q^n is a convex combination of points x_1, . . . , x_m if x = Σ_{i=1}^m a_i · x_i, where all a_i ≥ 0 and Σ_i a_i = 1. The convex hull of a set of points is the set of all their convex combinations.
A rational convex polyhedron P ⊆ Q^n (polyhedron for short) can be defined in two equivalent ways: (1) As the set of solutions of a system of inequalities Ax ≤ b, namely P = {x ∈ Q^n | Ax ≤ b}, where A ∈ Q^{m×n} is a rational matrix of n columns and m rows, and x ∈ Q^n and b ∈ Q^m are column vectors of n and m rational values respectively. Each linear inequality (specified by a row of A and the corresponding element of b) is known as a constraint.
(2) As the convex hull of a finite set of points x_i and rays y_j:

P = convhull{x_1, . . . , x_m} + cone{y_1, . . . , y_t}, (11.1)

or, more explicitly: x ∈ P if and only if x = Σ_{i=1}^m a_i · x_i + Σ_{j=1}^t b_j · y_j for some rationals a_i, b_j ≥ 0, where Σ_{i=1}^m a_i = 1. The vectors y_1, . . . , y_t are recession directions of P, i.e., directions in which the polyhedron is unbounded; in terms of the constraint representation, they satisfy Ay_j ≤ 0. The points x_i of a minimal set of generators are the vertices of P. For any set S ⊆ Q^n we let I(S) be S ∩ Z^n, i.e., the set of integer points of S. The integer hull of S, commonly denoted S_I, is defined as the convex hull of I(S). A polyhedron P is integral if P = P_I. This is equivalent to stating that it is generated (Eq. 11.1) by a set of integer points and rays. In particular, its vertices are integral.
We will be using some results from [Sch86] regarding the computational complexity of algorithms on polyhedra. Thus we adhere to the definitions used by that author, which we now recite for completeness. Schrijver denotes the bit-size of an integer x by ⟨x⟩ = 1 + ⌈log₂(|x| + 1)⌉; the bit-size of an n-dimensional vector a is defined as ⟨a⟩ = n + Σ_{i=1}^n ⟨a_i⟩; and the bit-size of an inequality a · x ≤ c as 1 + ⟨c⟩ + ⟨a⟩. For a polyhedron P ⊆ Q^n defined by Ax ≤ b, the facet size, denoted ⟨P⟩_φ, is the smallest number φ ≥ n such that P may be described by some system Ax ≤ b in which each inequality fits in φ bits. The vertex size, denoted ⟨P⟩_ψ, is the smallest number ψ ≥ n such that P has a generator representation in which each of the x_i and y_j fits in ψ bits (the size of a vector being calculated as above). The following theorem [Sch86, part of Theorem 10.2] relates the two measures:

Theorem 11.1. Let P be a rational polyhedron in Q^n; then ⟨P⟩_φ ≤ 4n² ⟨P⟩_ψ.
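The size measures recited above are simple to compute; the following transcribes them directly (our own helper names):

```python
import math

# Bit-size measures as recited from [Sch86].

def size_int(x):
    """<x> = 1 + ceil(log2(|x| + 1))."""
    return 1 + math.ceil(math.log2(abs(x) + 1))

def size_vec(a):
    """<a> = n + sum of the component sizes."""
    return len(a) + sum(size_int(ai) for ai in a)

def size_ineq(a, c):
    """Size of the inequality a . x <= c: 1 + <c> + <a>."""
    return 1 + size_int(c) + size_vec(a)

assert size_int(0) == 1       # 1 + ceil(log2(1))
assert size_int(7) == 4       # 1 + ceil(log2(8)) = 1 + 3
assert size_vec([3, -2]) == 2 + 3 + 3
```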
11.2. Monomial bounds and linear programming. From this point on, we fix S to mean the set of attainable monomials of a given program (as vectors in N n ).
Let u = ⟨u_1, . . . , u_n⟩ ∈ N^n be a (row) vector representing a univariate input, as in Section 7. Let m(x_1, . . . , x_n) = x_1^{d_1} · · · x_n^{d_n}; instead of x_1^{d_1} · · · x_n^{d_n} we write, concisely, x^d. Then note that m(x^{u_1}, . . . , x^{u_n}) = x^{d_1 u_1 + ··· + d_n u_n} = x^{u·d}. We now state:

Lemma 11.2. Finding a tight upper bound on the final value of X_n, given an initial univariate state u, is equivalent to the following optimization problem: given u, maximize u · d subject to d ∈ S. (11.2)

Note that the multivariate problem is simply to decide S. The statement is by no means trivial; luckily it comes easily out of the results of [BAH20].
Proof. According to [BAH20], given that the values computed by the program are polynomially bounded, the set S of all the attainable multivariate monomials provides tight upper bounds as well (more precisely, taking the maximum of these monomials gives a well-defined function, since the set is finite, and this function is an asymptotic upper bound). If we plug u in as the initial state, the set of bounds becomes x^{u·d} where d ranges over S. These are univariate monomials, so they are fully ordered. The highest bound, namely the maximum value of u · d, is the tight worst-case upper bound for the program.
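The identity m(x^{u_1}, . . . , x^{u_n}) = x^{u·d} underlying Lemma 11.2 can be sanity-checked numerically; the values below are arbitrary:

```python
# Numeric check of m(x^{u_1}, ..., x^{u_n}) = x^{u . d}.

def monomial(xs, d):
    """Evaluate x_1^{d_1} ... x_n^{d_n}."""
    out = 1
    for xi, di in zip(xs, d):
        out *= xi ** di
    return out

u, d, x = [2, 1, 3], [1, 4, 2], 5
lhs = monomial([x ** ui for ui in u], d)               # m on the univariate input
rhs = x ** sum(ui * di for ui, di in zip(u, d))        # x^(u . d)
assert lhs == rhs == 5 ** 12
```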
So far, we have only considered bounds which are polynomials, where the exponents are integer. However, a function m(x_1, . . . , x_n) = x_1^{d_1} · · · x_n^{d_n} where the d_i are non-negative rational numbers is a perfectly valid candidate for comparison with the results of a computation. Let T be the set of rational-valued vectors d such that the corresponding function m is attainable. Clearly S ⊆ T; in fact, S = I(T), and the inclusion is clearly strict. However, when we consider the optimization problem (11.2), we have:

Lemma 11.3. For every univariate state u, max_{d∈T} u · d = max_{d∈S} u · d.

Proof. Since S ⊆ T, clearly max_{d∈T} u · d ≥ max_{d∈S} u · d. Suppose that the inequality is strict; then there is an attainable function m = x^d (with d ∈ T) whose value on the univariate input has a greater exponent than that of every m′ = x^{d′} with d′ ∈ S. This contradicts Corollary 4.2.
We now make a couple of observations on the shape of T.

Lemma 11.4. T is downward closed: if b ∈ T and 0 ≤ a ≤ b (componentwise), then a ∈ T.

Proof. Immediate, since if (a_1, . . . , a_n) ≤ (b_1, . . . , b_n) then x_1^{a_1} · · · x_n^{a_n} ≤ x_1^{b_1} · · · x_n^{b_n}.
We conclude that T is a union of boxes. We can say more.
Lemma 11.5. If functions f, g : N n → R are attainable, so is max(f, g).
This can be easily checked against the definition (Definition 3.2).
Lemma 11.6. T is convex.

Proof. Consider a monomial m = x^d where d is a convex combination Σ_j c_j · d_j of some d_j ∈ T, with Σ_j c_j = 1. We refer to the ith component of d_j as d_{ji}. Thus m = Π_j (x^{d_j})^{c_j}. By the rational-weight form of the classic inequality of means [Ste04, Eq. (2.7)], a weighted geometric mean is bounded by the corresponding weighted arithmetic mean: Π_j (x^{d_j})^{c_j} ≤ Σ_j c_j · x^{d_j}. The latter is an attainable function, therefore so is m. Hence d ∈ T.

So T is convex, includes S, and the maximum of u · d for any u > 0 is obtained at a point of S. We conclude that T is an integral polyhedron [Sch86, §16.3]. It equals the convex hull of the boxes whose upper right corners are points of S (Figure 11.2, left). It will be convenient in the next proof to extend T by allowing negative numbers as well. This gives a polyhedron T̄ which is unbounded in the negative direction (Figure 11.2, right); technically, T̄ = T + cone(⟨−1, 0, . . . , 0⟩, ⟨0, −1, . . . , 0⟩, . . . , ⟨0, . . . , 0, −1⟩). Now we can use known results on the complexity of linear programming to obtain our result.
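The AM-GM step can be illustrated numerically; below we take the convex combination of the degree vectors (2, 0) and (0, 2) with equal weights (toy values of our own choosing):

```python
from fractions import Fraction
from itertools import product

d1, d2 = (2, 0), (0, 2)
c = (Fraction(1, 2), Fraction(1, 2))
d = tuple(int(c[0] * a + c[1] * b) for a, b in zip(d1, d2))   # (1, 1)

def mono(x, e):
    """Evaluate x_1^{e_1} ... x_n^{e_n}."""
    p = 1
    for xi, ei in zip(x, e):
        p *= xi ** ei
    return p

# Weighted AM-GM: x^d <= c_1 * x^{d_1} + c_2 * x^{d_2} at every point.
for x in product(range(1, 6), repeat=2):
    gm = mono(x, d)                                # x1 * x2
    am = c[0] * mono(x, d1) + c[1] * mono(x, d2)   # (x1^2 + x2^2) / 2
    assert gm <= am
```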
Lemma 11.7. The facet size of T is bounded by 4n²(n + d_max), where d_max is the maximal attainable degree.
This follows directly from Theorem 11.1, using the fact that our polyhedron is integral, i.e., its vertices are members of S.
Theorem 11.8. The multivariate-bound decision problem can be solved in space polynomial in the size of the given program and the maximal attainable degree d max .
Proof. As explained above, our approach is to show that we can decide whether d ∈ T . As PSPACE is closed under complement, we can consider the converse question: whether d / ∈ T . This is equivalent to asking: is there a constraint a · x ≤ b, satisfied by T , such that a · d > b. Our decision procedure is non-deterministic and guesses a. Then we need to find b, which amounts to maximizing a · x over x ∈ T . Note that a will contain no negative numbers, since T does not satisfy any constraint with negative coefficients. In case a has non-integer rational numbers, we can scale by their common denominator. Thus, w.l.o.g. we may assume a to be integer-valued, and by Lemmas 11.2 and 11.3, finding whether b exists is equivalent to solving the univariate decision problem (Problem 3.5 on p. 8) with initial state a, querying the degree a · d.
So, we have reduced the multivariate-bound decision problem to the univariate problem. What is the complexity of the resulting algorithm? Assuming that we know d max , we can impose the bound given by Lemma 11.7. Since a consists of polynomially-big numbers (in terms of their bit size), our univariate algorithm (Section 7) solves the problem in polynomial space as stated.
What if we do not know d max ? In this case we can search for it; we give the procedure below.
In the case that d max is not known, our complexity result is less satisfying when the monomial presented to the decision problem is of a degree much smaller than d max . However, if what we really seek is not the decision problem for its own sake, but the discovery of tight upper bounds, then we do want to reach d max , and we can then describe our algorithm's complexity as polynomial space in the output-sensitive sense (that is, depending on a quantity which is not the size of the input but of the output). First, let us state the complexity of the search procedure.
Theorem 11.9. Given a core-language program P and the index of a chosen variable X_j, we can compute d_max, the maximum of Σ_i d_i over all monomials x^d that constitute an attainable lower bound on the final value of X_j. The algorithm's space complexity is polynomial in the size of P plus d_max.
Proof. Recall that a polynomial-space non-deterministic algorithm can be determinized in polynomial space (Savitch's Theorem). We search for d max using a deterministic implementation of our non-deterministic algorithm (Section 7). This deterministic algorithm can be used to give a definite answer to the query: is a given degree d attainable for X n . We start with d = 1 and repeatedly increase d (reusing the memory) until we hit a negative answer (as usual we assume that the possibility of super-polynomial growth has been eliminated first using [BJK08]). We can actually be quicker and only loop through powers of two, since this would give an approximation to d max which is close enough to be used in Theorem 11.8.
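The search just described can be sketched as follows; the oracle is a stub standing in for the (PSPACE) degree-decision procedure, and the doubling phase is followed by a binary search to pin down d_max exactly:

```python
def find_dmax(attainable):
    """Find the largest attainable degree, assuming downward closure
    (if d is attainable, so is every smaller degree)."""
    hi = 1
    while attainable(hi):         # loop through powers of two
        hi *= 2
    lo = hi // 2                  # now d_max lies in [lo, hi)
    while lo + 1 < hi:            # binary search, O(log d_max) oracle calls
        mid = (lo + hi) // 2
        lo, hi = (mid, hi) if attainable(mid) else (lo, mid)
    return lo

# Stub oracle: "degree d is attainable iff d <= 13".
assert find_dmax(lambda d: d <= 13) == 13
```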

Generating Multivariate Bounds Directly
Our analysis algorithm from Section 7 can actually be modified to yield multivariate bounds directly, as we show next. Given this statement, the reader may wonder why we bothered with the version for univariate bounds, and with the reduction of the multivariate problem to the univariate one. Let us reveal right away, then, that despite the simplicity of the following algorithm, we do not have a direct proof of its correctness; instead, we derive it from the previous algorithms.
The new algorithm is obtained thanks to a rather simple observation about the analyzer Ev_M, namely, that the only operator applied to our (univariate) monomials is product. Since the product of multivariate monomials is again a (multivariate) monomial, we can change the data type of the symbolic state s from an n-tuple of univariate monomials to an n-tuple of multivariate ones. Such an n-tuple is a special case of AMP, and we also refer to it as a monomial state, because it is used in the evaluator to abstract a state, just as the univariate state was in the previous algorithm. Other than this change of datatype, no change to the code is necessary. In terms of implementation, a multivariate monomial is represented as either an n-tuple of degrees or a special constant representing 0; the space complexity of this representation remains polynomial in the number of variables and the maximum degree d_max. We call the evaluator, thus modified, Ev_MM. To illustrate the idea, consider a 3-variable program; its initial state is represented by Id = ⟨x_1, x_2, x_3⟩. Here are two examples of simple straight-line computations, where the data-flow matrices A(3, 2) and E(3) are as in Section 7.2. Proving the correctness of the algorithm falls, as usual, into soundness and completeness.
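In terms of the representation just described, the single operation needed is easily coded. This is a sketch of the datatype (our own encoding, not the paper's implementation):

```python
# Multivariate monomials as degree tuples; the monomial 0 as a special constant.

ZERO = None  # the special constant representing the monomial 0

def mprod(a, b):
    """Product of two abstract monomials: degrees add; 0 is absorbing."""
    if a is ZERO or b is ZERO:
        return ZERO
    return tuple(da + db for da, db in zip(a, b))

# Components of the initial state Id = <x1, x2, x3>.
x1 = (1, 0, 0)
x2 = (0, 1, 0)

assert mprod(x1, x2) == (1, 1, 0)               # x1 * x2
assert mprod(mprod(x1, x1), x2) == (2, 1, 0)    # x1^2 * x2
assert mprod(x1, ZERO) is ZERO
```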
12.1. Soundness. The soundness for straight-line code is straightforward to prove, so we content ourselves with stating it:

Lemma 12.1. Let C be a loop-free command. Given any monomial state s, if Ev_MM ⟦C⟧ s ⇒* (g, M) (for any M) then ∃p ∈ ⟦C⟧_S such that g ⊑ p • s, and M = LDFM(p).
This means that if we set the initial state to Id, the monomials we generate are attainable in the sense of Corollary 4.2. We now wish to show the same property for loops. As in Section 9, the extension to Ev_MM* is immediate:

Lemma 12.2. Let C be a loop-free command. Given any monomial state s, if Ev_MM* ⟦C⟧ s ⇒* (g, M) then ∃p ∈ Cl(⟦C⟧_S) such that g ⊑ p • s, and M = LDFM(p).
As in Section 9, we proceed to cover any command by structural induction, where all loop-free commands are covered by the above lemmas; similarly, given correctness for commands C and D, correctness follows for their sequential composition and non-deterministic choice. Next, suppose that matrix M satisfies the condition for generalization (∀i ∉ Z : M_ii = 1), and let p be an element of Cl(⟦C⟧_S) such that g ⊑ p • s[Z ← 0] and M = LDFM(p). Consider an index k to which generalization applies, i.e., k ∈ Z and M_kk = 1. We take r to be p^τ. Correctness of M is as in the previous proofs.
Once we can do the induction step for loops, we easily get:

Lemma 12.4 (Soundness of Ev_MM). Let P be a core-language program. If Ev_MM ⟦P⟧ Id ⇒* (g, M) (for any M) then ∃p ∈ ⟦P⟧_S such that g ⊑ p.
12.2. Completeness. We formulate our completeness claim based on the geometric view taken in the previous section. We maintain the convention that S denotes the set of attainable monomials (asymptotic lower bounds on the final value of X_n) for a program under discussion (arbitrary, but fixed throughout), and T, T̄ its extensions to rational-number polyhedra. In the previous subsection we argued for soundness of Ev_MM in the same way that we did for Ev_M, that is, by comparison with the Closure Algorithm, establishing the relation g ⊑ p • s between Ev_MM's result and some p ∈ ⟦C⟧_S. A corresponding completeness claim might be that we can reach any monomial state g ⊑ p • s. This is, however, not the case. Consider the two assignments:

X_2 := X_1 + X_2; X_2 := X_2 * X_2

In Ev_MM (starting with Id), after the first assignment we have two monomial states, ⟨x_1, x_1⟩ and ⟨x_1, x_2⟩; after the second, we have ⟨x_1, x_1²⟩ and ⟨x_1, x_2²⟩. But the Closure Algorithm generates ⟨x_1, x_1² + x_1x_2 + x_2²⟩. Note that there is a monomial, x_1x_2, which is not obtained by Ev_MM. But this is acceptable, because the degree vector (1, 1) lies on the line between (2, 0) and (0, 2); i.e., it is not a vertex of the attainable-monomial polyhedron. In order to guarantee that we find the vertices, it suffices to compare the results on univariate states; this is the main insight of Section 11.

Lemma 12.5. Let C be any command. Given any monomial state s, any p ∈ ⟦C⟧_S, and any univariate state x, there are a monomial state g and matrix M such that

Proof. Fix s, p and x. By Theorem 9.1 there is y ≥ p • s • x such that and this justifies the lemma.
Corollary 12.6 (Completeness of Ev_MM). Let C be a core-language program. Let m = x_1^{d_1} · · · x_n^{d_n} be a monomial representing an attainable bound on the final value of X_n after executing C, and such that its degree vector d is a vertex of T. Then Ev_MM ⟦C⟧ Id ⇒* (g, M) (for some M) such that g[n] = m.
Proof. Since d is a vertex, there is a univariate state x for which d is the unique solution to the optimization problem (11.2), meaning that m • x maximizes the nth component in p • x when p ranges over C S . By the last lemma, this can be matched by Ev MM , and since Ev MM is also sound (i.e., it only produces attainable monomials), it must produce m.
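The geometric point of this argument can be checked on the small example above, where the attainable degree vectors are (2, 0), (0, 2) and the non-vertex (1, 1):

```python
from itertools import product

S = [(2, 0), (0, 2), (1, 1)]      # degree vectors from the example above
vertices = {(2, 0), (0, 2)}

# For every positive univariate state u, the optimum of u . d over S is
# achieved by a vertex; the interior point (1, 1) is never needed.
for u in product(range(1, 8), repeat=2):
    opt = max(u[0] * d[0] + u[1] * d[1] for d in S)
    assert any(u[0] * v[0] + u[1] * v[1] == opt for v in vertices)
```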
Since we can find any vertex of the polyhedron, we have a complete solution in the sense that the max of all these monomials is a tight multivariate upper bound on the queried variable, as in Corollary 4.2.
12.3. Discussion. We have given a polynomial-space solution to the problem of generating a bound. Since the algorithm can generate all the vertices of the attainable-monomial polyhedron, it could be used to implement an alternative decision procedure as well, replacing the proof of Theorem 11.8. Note that, ultimately, both proofs rely on the insight expressed by Lemma 11.2, and even if this is a redundancy, we thought that presenting both approaches makes the picture more complete.
Looking back at Ev_MM, we have already noticed that the initial state does not really influence any decision made by the algorithm; it is thus easy to see that we could rewrite the algorithm to remove this parameter. The algorithm then abstracts a command to a set in AMPol × M_n(B), rather than a function that maps initial states to such sets. Next, we give this version in equational form, in the style of the Closure Algorithm. The result set may be of exponential size, so the non-deterministic interpreter form of the algorithm is the useful one for establishing the PSPACE complexity bound, but we feel that the following presentation has a certain elegance. The abstraction of command C is denoted ⟦C⟧_M (M for Monomials). We define ⟦C⟧_M first for non-looping commands, with function GEN defined just as in Ev_MM (namely, just as in Ev_M, but calculating with multivariate monomials), as illustrated in the following example.
Example 12.7. Consider the following loop:

loop X_5 { choose { X_3 := X_1; X_4 := X_2 } or X_1 := X_3 + X_4 }

The abstraction of the loop body clearly yields the following pairs: In the closure of the loop body's abstraction we will find compositions of these results; e.g., composing the first with the second (respectively, the third) corresponds to simulating the instructions X_3 := X_1; X_4 := X_2; X_1 := X_3 + X_4, and yields the results: To illustrate the effect of GEN, let Z = {1, 3, 4}. Inspecting the above matrix, we note that 3 and 4 must be in the set to permit generalization, since they denote non-self-dependent entries; the presence of 1 in Z implies that we treat X_1 as an accumulator. The above two matrices produce, respectively, the following results: We state the correctness of ⟦·⟧_M, which is based on the results established for Ev_MM, via the simple observation that p ∈ ⟦C⟧_M ⇐⇒ (Ev_MM ⟦C⟧ Id ⇒* (p, LDFM(p))).
Theorem 12.8. Let P be a core-language program. The set ⟦P⟧_M satisfies the following properties: (1) (soundness) If m ∈ ⟦P⟧_M then ∃p ∈ ⟦P⟧_S such that m ⊑ p.
(2) (completeness) Let m = x_1^{d_1} · · · x_n^{d_n} be a monomial representing an attainable bound on the final value of variable X_n after executing P, and such that its degree vector d is a vertex of T. Then there is m′ ∈ ⟦P⟧_M such that m′[n] = m.
Note that the soundness clause indicates that m is an attainable AMP for P; and that in the completeness clause, the choice of X n is arbitrary. The same property holds for any chosen variable (and the corresponding set of attainable monomial bounds).

A Few Simple Extensions
A natural follow-up to the above results is to extend them to richer programming languages.
In this section we briefly discuss four such extensions. First, two trivial ones:
• We can allow any polynomial (with positive coefficients) as the right-hand side of an assignment; this is just syntactic sugar.
• We can include the weak (non-deterministic) assignment form which says "let X be a natural number bounded by the following expression." This construct may be useful if we want to use our language to approximately simulate the behavior of programs that have arithmetic expressions which we do not support, as long as a preprocessor can statically determine polynomial bounds on such expressions. No change to the algorithm is required.
Next, two extensions which are not trivial, but very easy.
Resets. [BA10] shows that it is possible to enrich the core language with a reset instruction, X := 0, and still solve the problem of distinguishing polynomially-bounded variables from potentially super-polynomial ones. Conceptually this is perhaps not a big change, but technically it caused a jump in the complexity of the solution, from PTIME to PSPACE-complete. Our tight-bound problem is already PSPACE-complete, and we can extend our solution to handle resets without a further increase in complexity. In fact, with the abstract-interpreter algorithm, adding resets is very smooth. We add a special constant 0̄, which means definitely zero: the symbolic value of a variable that has been reset. This value is treated using the natural laws, i.e., 0̄ + x = x, 0̄ · x = 0̄. Resetting a variable cuts data-flow, so the abstraction of X := 0 reflects this in the LDFM:

Ev_M ⟦X_i := 0⟧ s ⇒ (s[i ← 0̄], E(i))

Additional changes are necessary in the analysis of loops. If the loop bound is definitely zero, we know for certain that the loop body is not executed (unlike the general case, where skipping the loop is just a possibility), and we rewrite the corresponding section of our interpreter accordingly (similarly for Ev_MM). Inside Ev_M (respectively Ev_MM), we have to deal with the case that a definitely-zero variable is selected for the set Z of variables to be set to zero. Let us recall that this set serves two purposes:
(1) To mask out non-self-dependent variables. If the variable is initially definitely zero, there is of course no need to mask it out; moreover, replacing 0̄ with an ordinary zero may cause imprecision, in case an internal loop depends on that variable.
(2) To isolate the increment q in computations that set x_i = x_i + q, by putting 0 in x_i. This has to work even in the case that x_i is initially 0̄; while, as above, replacing 0̄ with an ordinary zero would be wrong.

Unbounded "value". In [BAP16] we proposed to extend our language with an instruction X := *, sometimes called "havoc X." The intention is to indicate that X is set to a value which is not expressible in the core language. This may be useful if we want to use our core-language programs as an abstraction of real programs that possibly perform computations we cannot express or bound by a polynomial. It may also be used to model a while loop (as done, for example, in [JK09]): we do not attempt to analyze the loop for termination or even for bounds (this is outside the scope of our algorithm; such loops depend on concrete conditionals, which we do not model at all); all we suggest is to analyze what happens in the loop under the assumption that we cannot bound the number of iterations. This can be simulated by setting the loop bound to *. The implementation is very similar to that of reset, and moreover, the two can be combined. The rules for calculating with * are: * · 0̄ = 0̄, and * · m = * for any other monomial m. And, of course, if an output variable comes out as * it is reported to have no polynomial upper bound.
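The combined value algebra of the two extensions can be sketched as follows (the encodings of 0̄ and * are our own; monomials are degree tuples as before):

```python
ZERO, STAR = "0bar", "star"   # our own encoding of the two special constants

def times(a, b):
    """Product of abstract values, per the rules stated in the text:
    0bar * x = 0bar (covering * . 0bar = 0bar), * . m = * otherwise."""
    if a is ZERO or b is ZERO:
        return ZERO
    if a is STAR or b is STAR:
        return STAR
    return tuple(x + y for x, y in zip(a, b))  # ordinary monomials: add degrees

assert times(STAR, ZERO) is ZERO      # * . 0bar = 0bar
assert times(STAR, (1, 0)) is STAR    # * . m = *
assert times(ZERO, (1, 0)) is ZERO    # 0bar is absorbing
assert times((1, 0), (0, 2)) == (1, 2)
```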

Related Work
Bound analysis, in the sense of finding symbolic bounds for data values, iteration bounds and related quantities, is a classic field of program analysis [Weg75, Ros89, LM88]. It is also an area of active research, with tools being currently (or recently) developed including COSTA [AAG+12], AProVE [GAB+17], CiaoPP [LDK+18], C4B [CHS15], Loopus [SZV17] and CoFloCo [FH14]; and this is just for imperative programs. There is also work on functional and logic programs, term rewriting systems, recurrence relations, etc., which we cannot attempt to survey here.
Our programming language, which is based on bounded loops, is similar to Meyer and Ritchie's language of Loop Programs [MR67], which differed by being deterministic and including constants; it is capable of computing all the primitive recursive functions, and is too strong for obtaining decidability results. Similar languages have been used as objects for analysis of growth rate (in particular, identifying polynomial complexity) in [JK09]. These works, too, considered deterministic languages for which precise analysis is impossible, but one can see clear limits to the aspects of the language that they analyze (such as using a loop counter only as a bound, without relying on the assumption that the loop always runs to completion). Such considerations led to the definition of the weak, non-deterministic language in [BJK08]. Recent work in static analysis of fully-capable programs [BEF+16] combines a subsystem for loop-bound analysis (via ranking functions) with a subsystem for growth-rate analysis, which establishes symbolic bounds on data that grow along a loop, based on loop bounds provided by the loop analysis. This may be seen, loosely speaking, as a process that reduces general programs to bounded-loop programs, which are then analyzed. Our previous paper [BAH20] was, however, the first to give a solution for computing tight polynomial bounds that is complete for the class of programs we consider. The problem of deciding termination of a weak programming language has been considered in several works; one well-studied example concerns single-path while loops with integer variables, affine updates, and affine guard conditions [Tiw04, Bra06, HOW19]. Decidability was first proved in [HOW19], but the complexity class of this problem was not determined; it would be interesting to see whether it, too, is PSPACE-complete.
Dealing with a somewhat different problem, [MOS04, HOP+18] both check, or find, invariants in the form of polynomial equations, so they can identify cases where a final result is precisely polynomial in the input values. Remarkably, they give complete solutions for weak languages, where the weakness lies in the non-deterministic control flow, as in our language. However, their languages have no mechanism to represent bounded loops, whereas our work centers on precisely such loops.

Conclusion
We have complemented previous results on the computability of polynomial bounds in the weak imperative language defined in [BJK08]. While [BJK08] established that the existence of polynomial bounds is decidable, [BAH20] showed that tight bounds are computable. Here we have shown that they can be computed in polynomial space, and that this is optimal in the sense of PSPACE-hardness. We have thus settled the question of the complexity class of this program-analysis problem. Interestingly, this improvement required new ideas on top of [BAH20]: the reduction of multivariate bounds to univariate bounds, based on the geometry of the set of monomial bounds, and the computation of bounds using a space-economical abstraction that retains only degree vectors and data-flow matrices.
Some challenging open problems regarding bound computation remain: (1) whether the problem is solvable at all for various extensions of the language (see [BAK12, BAH20] for further discussion); (2) how to compute more precise polynomial bounds (with explicit constant factors), and whether these can be made tight; (3) the computation of tight super-polynomial bounds (currently, if a variable cannot be bounded by a polynomial, we give no upper bound).
Another problem we are interested in is: if one of our programs computes a polynomially-bounded result, how high can the degree of the bounding polynomial be, in terms of the size of the program? We conjecture that the degree is at most exponential in the size of the program.

Figure 5: Dependence graph for the example MP. Nodes are labeled x_j rather than just j for readability.
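As an illustration of why the degree can grow exponentially with program size (this construction is our own hypothetical example, not one from the analysis above): chaining k copies of a constant-size block of the form "loop X { Y := Y + X }; X := Y" yields a program of size O(k) whose final value of X is a polynomial of degree 2^k in the input x. A minimal simulation, taking each bounded loop to its maximal iteration count:

```python
def run(x, k):
    """Simulate k rounds of:  loop X { Y := Y + X } ; X := Y
    starting from Y = 0, with every bounded loop running its
    maximal number of iterations (the names run, X, Y are
    illustrative, not from the paper)."""
    X, Y = x, 0
    for _ in range(k):
        for _ in range(X):   # loop X { Y := Y + X }
            Y += X
        X = Y                # X := Y
    return X

# With x = 2: one round gives 2 + 2*2 - 2 = 4 = 2^2,
# two rounds give 4 + 4*4 = 20 = 2^2 + 2^4.
```

Each round maps X to X + X^2, so after k rounds the value of X is a polynomial of degree 2^k in x; if the conjecture holds, this exponential upper bound would therefore be tight.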
Example A.6. Next, we give an example of a complete computation of Solve. We consider the loop:

loop X3 { X1 := X1 + X2; X2 := X2 + X3; X4 := X3 }

The body of the loop is evaluated symbolically and yields the abstract multi-polynomial:

p = ⟨x_1 + x_2, x_2 + x_3, x_3, x_3⟩
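Since the loop body here is linear, its symbolic evaluation can be sketched in a few lines of Python; the coefficient-vector representation below is our own illustration, not the paper's actual abstract domain:

```python
# Represent each variable's (linear) value as a coefficient vector
# over the initial values (x1, x2, x3, x4).
def unit(i):
    v = [0, 0, 0, 0]
    v[i] = 1
    return v

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Initial symbolic state: Xi holds xi.
state = [unit(i) for i in range(4)]

# Symbolic evaluation of the body, assignment by assignment:
state[0] = add(state[0], state[1])   # X1 := X1 + X2
state[1] = add(state[1], state[2])   # X2 := X2 + X3
state[3] = state[2][:]               # X4 := X3

# state now encodes p = <x1 + x2, x2 + x3, x3, x3>:
# [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0]]
```

Each row reads off one component of the abstract multi-polynomial p given above.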