Invariant Generation through Strategy Iteration in Succinctly Represented Control Flow Graphs

We consider the problem of computing numerical invariants of programs, for instance bounds on the values of numerical program variables. More specifically, we study the problem of performing static analysis by abstract interpretation using template linear constraint domains. Such invariants can be obtained by Kleene iterations that are, in order to guarantee termination, accelerated by widening operators. In many cases, however, applying this form of extrapolation leads to invariants that are weaker than the strongest inductive invariant that can be expressed within the abstract domain in use. Another well-known source of imprecision of traditional abstract interpretation techniques stems from their use of join operators at merge nodes in the control flow graph. The mentioned weaknesses may prevent these methods from proving safety properties. The technique we develop in this article addresses both of these issues: contrary to Kleene iterations accelerated by widening operators, it is guaranteed to yield the strongest inductive invariant that can be expressed within the template linear constraint domain in use. It also eschews join operators by distinguishing all paths of loop-free code segments. Formally speaking, our technique computes the least fixpoint within a given template linear constraint domain of a transition relation that is succinctly expressed as an existentially quantified linear real arithmetic formula. In contrast to previously published techniques that rely on quantifier elimination, our algorithm is proved to have optimal complexity: we prove that the decision problem associated with our fixpoint problem is in the second level of the polynomial-time hierarchy.


Introduction
Static program analysis aims at deriving properties that are valid for all possible executions of a program, through an algorithmic processing of its source or object code. Examples of interesting properties include: "the program always terminates"; "the program never executes a division by zero"; "the program never dereferences a null pointer"; "the value of variable x always lies between 1 and 3"; "the output of the program is well-formed XHTML". There is considerable practical interest in being able to prove such properties automatically, in particular for software used in safety-critical applications, e.g., in fly-by-wire flight control systems in aircraft [60].
1.1. Abstract interpretation. It is well known that fully automatic, sound and complete program analysis is impossible for any nontrivial property regarding the final output of a program.¹ All analysis methods therefore suffer from at least one of the following limitations: they may be limited to programs with finite (and not too large) memory, or to bounded execution times; they may be unsound (they may infer untrue properties); or they may be incomplete (they fail to prove certain true properties). In this article, we use the abstract interpretation framework of Cousot and Cousot [18] to construct a static analysis technique that is sound, but incomplete.
Static analysis by abstract interpretation replaces computations over concrete reachable states by computations over symbolically represented sets of concrete states. The sets are taken from an abstract domain. For instance, one may aim at computing, for each program point p and each program variable x, an interval in which the value of x is guaranteed to lie whenever the program reaches program point p. An analysis based solely on such intervals is known as interval analysis [17]. More refined numerical analyses include, for instance, finding for each program point an enclosing polyhedron for the vector of program variables [19]. By restricting the analysis to sets from a particular abstract domain (e.g., Cartesian products of intervals or convex polyhedra), one can make the problem tractable, at the expense of over-approximation. For instance, if the domain in use consists only of convex shapes, non-convex invariants will necessarily be over-approximated.
In addition to the abstract domain's possible inability to represent the required properties, a major source of imprecision is the use of widening operators to enforce the convergence of Kleene iterations within finitely many iteration steps [18]. These operators extrapolate the first iterates of the Kleene sequence, say, the intervals [0, 1], [0, 2], [0, 3], … to a plausible limit, say [0, +∞), ensuring termination of the accelerated iteration. However, such an accelerated iteration may overshoot the target, leading to further over-approximation of the desired result. In order to regain precision lost by widening, one can then apply narrowing. In its simplest form, narrowing is a descending iteration towards a fixpoint that strengthens the invariant step by step. For more detailed information on Kleene iteration techniques in the context of abstract interpretation, we refer the reader to Cousot and Cousot [18]. Many variants of this basic iteration scheme have been proposed to alleviate the over-approximations introduced by widening [34,35,38]. However, none of these techniques is guaranteed to find the strongest inductive invariant that can be expressed in the abstract domain in use.
Let us illustrate the above-mentioned weaknesses on the following simple example:

    i = 0;
    while (true) {
      if (i < 10) i = i + 2;
      else goto end;
    }
    end: printf("i = %d\n", i);

The strongest invariant, that is, the set of reachable states, is given by the proposition i ∈ {0, 2, 4, 6, 8, 10}, which, together with the exit condition i ≥ 10, yields i = 10 as the only possible final value of i at program point end. Interval analysis by Kleene iterations with widening computes the intervals [0, 0], [0, 2], [0, 4] and may then widen to [0, +∞). The narrowing phase yields the inductive invariant i ∈ [0, 11]. From this we can conclude that the final value of i is in the interval [10, 11]. The obtained interval [0, 11] represents the strongest inductive invariant that can be expressed as an interval.² It is, however, not the strongest invariant expressible as an interval, which is i ∈ [0, 10]. The invariant i ∈ [0, 10] is not inductive, because a state with i = 9 is mapped to a state with i = 11 by one iteration of the loop.
Unfortunately, small changes to the above program can make the widening/narrowing approach fail to produce a good invariant. Consider, for instance, the introduction of an additional non-deterministic choice, represented by the function choice():

    i = 0;
    while (true) {
      if (choice()) {
        if (i < 10) i = i + 2;
        else goto end;
      }
    }
    end: printf("i = %d\n", i);

The program still outputs the value 10 whenever it terminates. The only difference from the first version of the program is that there is, in each iteration, a non-deterministic choice whether or not the original loop body is to be executed. If we perform the widening/narrowing technique on the modified version, the widening phase will produce the same result [0, +∞). However, the narrowing phase is now unable to regain any of the precision lost due to widening. The loop body represents the relation τ = {(i, i) | i ∈ Z} ∪ {(i, i + 2) | i ∈ Z and i < 10}. This relation is reflexive, that is, (i, i) ∈ τ for all i ∈ Z. The problem is of a general nature: whenever the transition relation τ of a loop is reflexive, the abstract loop transformer F satisfies F(X) ⊒ X for every X, so the post-fixpoint reached by widening is already a fixpoint, and descending iterations fail to improve the inductive invariant obtained by widening.
Of course, on such a simple example, one could use simple tricks to get rid of the imprecision and recover the interval [0, 11]: remove the identity from the transition relation (this does not change the set of all (inductive) invariants), or try a form of widening with thresholds, also known as widening "up to" [39]. However, such approaches are brittle and may fail for more complex programs.

² Some presentations of Hoare logic or static analysis call "invariant" what we refer to in this article as "inductive invariant": a set (or a logical formula defining such a set) containing all initial states and stable under the transition relation. In our terminology, an invariant is merely a property true at all times. With these definitions, an inductive invariant is an invariant by induction on the length of the execution trace, hence the terminology; however, an invariant is not necessarily inductive. Consider the initial state (x, y) = (1, 0) and a transition consisting of a 45° clockwise rotation around (0, 0): (x, y) ∈ [−1, 1] × [−1, 1] is an invariant (it is always true), but it is not inductive because [−1, 1] × [−1, 1] is not stable under this rotation.
1.2. Alternatives to the widening/narrowing approach. Because of the known weaknesses of the widening/narrowing approach, alternative methods have been proposed. Finding an inductive invariant in an abstract domain can be recast as solving a constraint system. Finding the strongest inductive invariant is then the problem of finding a minimal solution to the constraint system. The technique described in this article is related to two recently proposed approaches, which we shall now briefly describe.
Quantifier elimination. Monniaux [48] considers abstract domains where elements are defined by a logical formula I (more specifically, a conjunction of linear inequalities) that links the program variables to some parameters. For instance, intervals on two variables x, y are defined by $I := -l_x \le x \le u_x \wedge -l_y \le y \le u_y$, where $l_x, u_x, l_y, u_y$ are the parameters. An element from the abstract domain defined by the template I is specified by an assignment of values to the parameters.
Consider a set of initial states given by a formula ι (in the above example, with free variables σ = (x, y)) and a transition relation given by τ (in the above example, with free variables (σ, σ') = (x, y, x', y')). I defines an inductive invariant for ι and τ if and only if

$$\forall \sigma, \sigma'.\ \big(\iota(\sigma) \Rightarrow I(\sigma)\big) \wedge \big(I(\sigma) \wedge \tau(\sigma, \sigma') \Rightarrow I(\sigma')\big) \qquad (1.1)$$

Here, I(σ) is the formula I as above and I(σ') is the formula I with σ replaced by σ'. The free variables of formula (1.1) are the parameters in I. In the above example, they are $l_x, u_x, l_y, u_y$. Any satisfying assignment to these variables defines an inductive invariant from the abstract domain. A least inductive invariant in the abstract domain is then defined by constructing, using formula (1.1) as a building block, a formula whose solution is the minimal solution of (1.1), using that, for any formula F,

$$x_0 = \min\{x \mid F(x)\} \iff F(x_0) \wedge \forall x.\ \big(F(x) \Rightarrow x_0 \le x\big) \qquad (1.2)$$

The static analyzer then proceeds as follows: transform the loop into a set of initial states ι and a transition relation τ. From these formulas, construct the formula obtained by instantiating (1.2) with (1.1). Then, call a solver capable of dealing with quantified formulas, e.g., a quantifier elimination procedure or a lazy version thereof such as the one developed by Monniaux [47].
As an extension to this framework, ι and τ may have additional variables, e.g., precondition or system parameters. The formula defining the least inductive invariant then expresses the invariant parameters as a partial function (in the mathematical sense, that is, as a binary relation that relates each input to at most one output) of these precondition or system parameters. By quantifier elimination and further processing of the formula, it is possible to turn this formula into a closed-form function, and even into executable code computing that function (a tree of if-then-else statements with assignments at the leaves).
This approach makes it possible to synthesize best abstract transformers ($\alpha \circ \tau \circ \gamma$ in the notation of Cousot and Cousot [18]). Unfortunately, quantifier elimination over linear real arithmetic is still very costly, despite various recent works on this problem, and quantifier elimination over linear integer arithmetic and over polynomial real arithmetic is even costlier.
The technique described in this article considers the same problem as the quantifier elimination approach, but without preconditions or system parameters.Our technique also uses a different algorithmic approach, called max-strategy iteration.
Strategy Iteration. In this article, we introduce a refinement of the max-strategy iteration technique of Gawlitza and Seidl [26] for template linear constraint domains. The phrase "strategy iteration", also known as "policy iteration", comes from game theory. Let us consider two-player zero-sum games: the outcome of such a game is a real number, and the two players (the maximizer and the minimizer) aim at maximizing (respectively, minimizing) the outcome. Strategy iteration is a method for computing an optimal strategy for one of the players. It successively improves a strategy through the following two steps until an optimal strategy is found: (Evaluation) evaluate the currently selected strategy; and (Improvement) try to improve the currently selected strategy w.r.t. the result of the evaluation.
The max-strategy iteration technique of Gawlitza and Seidl [26] for finding invariants is inspired by this game-theoretic approach. Instantiated on template linear constraint domains, it computes the strongest inductive invariant that can be represented by polyhedra of the form $P(b) = \{x \in \mathbb{R}^n \mid Tx \le b\}$, where $T \in \mathbb{R}^{m \times n}$ is a template constraint matrix, which is fixed before the analysis is run (heuristics for finding a suitable matrix are out of scope for this article). The variable x is the vector of program variables. The template constraint matrix T is the counterpart of the template I from the quantifier elimination technique of Monniaux [48]. Given T, every vector $b \in \mathbb{R}^m$ uniquely determines a polyhedron P(b). The vector b contains the bounds on the linear functions that are represented by the rows of T. With the appropriate choice of T we can, among others, express the popular interval [17] and octagon [45,46] abstract domains.
Similarly to Kleene iterations, the max-strategy improvement algorithm produces an ascending sequence of pre-fixpoints that are less than or equal to the least inductive invariant we are aiming for.The pre-fixpoints are obtained through convex optimization techniques, e.g., linear programming.In contrast to Kleene iterations, though, the algorithm converges to the least inductive invariant after at most exponentially many steps.Our conjecture is that it usually converges fast in practice, though one can concoct artificial examples that exhibit exponential behavior.
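To make the ascending behavior concrete, here is a hand-worked sketch on the first example of Sec. 1.1, restricted to the upper bound u of i at the loop head. It is our own simplification, not the general algorithm: we read the integer guard i < 10 as i ≤ 9, ignore the lower bound, and write the abstract semantics as one equation whose max ranges over −∞, the initial value and the loop body. A strategy selects one argument of the max; evaluation computes the least solution of the selected equation above the current value (here a one-variable LP); improvement switches to an argument that strictly increases the value.

```latex
u = \max\{-\infty,\ 0,\ \min(u, 9) + 2\}

\begin{array}{lll}
\text{strategy} & \text{evaluation} & \text{improvement check} \\
\pi_0 : u = -\infty & u = -\infty & \max\{-\infty, 0, \min(-\infty, 9) + 2\} = 0 > -\infty \\
\pi_1 : u = 0 & u = 0 & \max\{-\infty, 0, \min(0, 9) + 2\} = 2 > 0 \\
\pi_2 : u = \min(u, 9) + 2 & u = \sup\{u \mid u \le u + 2,\ u \le 11\} = 11 & \max\{-\infty, 0, \min(11, 9) + 2\} = 11 \not> 11
\end{array}
```

The values −∞, 0, 11 ascend and stop at u = 11, i.e., the interval [0, 11]. Adding the identity argument u to the max (the reflexive loop of the choice() variant) does not change this trace, since selecting u never strictly improves the current value; the result is again 11, whereas narrowing stayed at +∞ on that example.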
Trace partitioning. Max-strategy iteration rids us of imprecisions introduced by widening, but, per se, does not remove imprecisions introduced by another operation: the merging of information from different program paths at join nodes in the control flow graph. In this article, we introduce a refinement of max-strategy iteration where we distinguish the various execution paths, in a manner similar to the work of Monniaux [48], and Monniaux and Gonnord [49].
In most systems for static analysis by abstract interpretation, joins in the control-flow graph result in computations of least upper bounds in the abstract domain. For instance, consider abstract interpretation over general convex polyhedra on the following program:

    if (x >= 0) y = x;
    else y = -x;
    if (y >= 1) z = 3.5 / x;

The program divides 3.5 by the value of x provided that the absolute value of x is at least 1. A static analyzer that uses convex polyhedra as abstract domain may work as follows.
Figure 1: On the left: the graph of y = |x| is the union of two half-lines, but computing their convex hull yields the grayed shape.By intersection with y ≥ 1, we obtain the shape on the right, which contains points with x = 0 even though y = |x| ∧ y ≥ 1 has no solution with x = 0.
Figure 2: Instead of considering two transitions (corresponding to a first if-then-else) followed by convex hull followed by two transitions (corresponding to a second if-then-else), as on the left, we get better precision by considering the four product transitions, as on the right.
After the first if-then-else statement, a convex hull is computed between the half-lines x ≥ 0 ∧ y = x and x < 0 ∧ y = −x, resulting in a much larger polyhedron (see Fig. 1). The imprecision introduced by this operation prevents the analyzer from proving that a division by zero at line 3 is impossible. One solution is to get rid of all convex hulls corresponding to control-flow joins by removing all control-flow joins, except those corresponding to loop headers, by combining control-flow edges. For instance, n successive if-then-else constructs can be turned into an expanded system of $2^n$ transitions (Figure 2 shows this construction for n = 2). This is close to the trace partitioning approach of Rival and Mauborgne [53].³ One could therefore run this exponential transformation first, and then run max-strategy iteration or min-strategy iteration (Sec. 1.4). However, this transformation causes an exponential blowup and is therefore clearly not scalable.
In this article, we describe an algorithm that yields the same result as max-strategy iteration on this exponentially larger system.Our algorithm uses only polynomial space.It achieves this by keeping the exponentially large system implicit.
Path focusing. Henry et al. [40] and Monniaux and Gonnord [49] propose to run the classical Kleene iterations with widening and narrowing not on the original control-flow graph, but on this exponentially larger system. In this approach, iterations are run on a distinguished subset of the original control nodes, such that every cycle in the original control-flow graph crosses at least one of these distinguished nodes, using transitions corresponding to the simple paths between these distinguished nodes in the original control-flow graph. The expanded control multigraph is kept implicit: the transitions, corresponding to simple paths in the original graph, are obtained on demand as solutions to SMT problems. This approach has the following advantages: (1) It fully does away with imprecisions introduced by "join" operations, except those corresponding to loops. (2) The transition relations on the simple paths may be accelerable, that is, they can be dealt with through acceleration techniques (cf. Sec. 1.4, [32,33,43]).
(3) While it uses widening operators, it does away with some of the imprecisions they introduce by focusing on one path at a time, which allows the use of narrowing iterations even on programs where they fail to yield better precision with the classical iteration scheme.The technique we present in this article combines the idea of implicit representation with max-strategy iteration.
1.3. Contributions. The main contribution of this article is an algorithm that computes the strongest inductive invariant of the expanded transition system (which allows higher precision for abstract interpretation) without actually constructing it. We shall see the exact definition later, but here is an interesting particular case (the general result allows more complex control flow): given an m × n matrix A, an initial value $\iota \in \mathbb{Q}^n$ and a transition relation τ over $\mathbb{Q}^n$, defined by a formula over the variables $x_1, \dots, x_n, x'_1, \dots, x'_n$, built with non-strict linear (in)equalities, ∧, ∨ and prenex ∃, compute the least set of the form $P(b) = \{x \in \mathbb{R}^n \mid Ax \le b\}$ (that is, compute b) containing ι and stable under the transition relation τ; equivalently, find the least loop invariant of the form Ax ≤ b for the loop with initial state ι and loop body expressed by τ.
Our algorithm can be performed in polynomial space and exponential time.It works in a demand-driven fashion: elements from the exponentially-sized sets of strategies and loop-free paths are enumerated only as needed, and one can thus hope that they will not all be enumerated, which seems to be confirmed by our preliminary experiments.
We also consider the following associated decision problem, which we shall later make more formal: "Given a control-flow graph (with N vertices) and transition relations written as existentially quantified first-order linear real arithmetic formulas, a family $A_1, \dots, A_N$ of matrices, an initial control state and a "bad" control state b, do there exist vectors $b_1, \dots, b_N$ such that …?" We show this problem to be $\Sigma^p_2$-complete (at the second level of the polynomial-time hierarchy [50, ch. 17]), even if N = 1 and the matrix is 1 × 1. Equivalently, the negated problem (abstract reachability of a statement) is shown to be $\Pi^p_2$-complete. Assuming the polynomial hierarchy does not collapse, this means that the problem can be solved in polynomial space, but is harder than NP-complete and coNP-complete problems. This justifies the use of an exponential-time algorithm.
1.4. Other related work. Many approaches have been proposed to address the imprecisions caused by widening operators. We now briefly describe approaches related to ours, in addition to those that we directly build upon (Sec. 1.2). Halbwachs et al. [39] proposed widening "up to" (an idea resurrected in the Astrée system as widening with thresholds [8,9]), which extracts syntactic hints for limiting widening. Bagnara et al. [4,5] proposed improvements over the "classical" widenings on linear constraint domains [37]. Gopan and Reps [34] introduced "look-ahead widening" [34] and "guided iterations" [35]: standard widening-based analysis is applied to a sequence of syntactic restrictions of the original program, which ultimately converges to the whole program; the idea is to distinguish phases or modes of operation in order to make the widening more precise. Some other techniques fully do away with widenings [13,15,55], for instance by expressing the invariants as solutions of a mathematical programming problem [36], and thus the least invariant in the domain as an optimal solution to this problem.
In some cases, it is possible to compute exactly the transitive closure of the transition relation, or the application of the transitive closure to given initial states, or at least to compute a good over-approximation thereof.Such acceleration techniques [32,33,43] tend to have difficulties dealing with programs where the control flow is not flat (multiple paths within the loop body).
In Section 1.2, we sketched max-strategy iteration by an analogy to solving games where "max" operations correspond to control-flow joins and "min" operations to guards.If instead of choosing arguments to "max" operators, the strategy chooses them for "min" operators, we obtain min-strategy iteration [14,24].Min-strategy iteration solves a sequence of fixpoint problems with decreasing values always weaker or equivalent to the strongest inductive invariant in the domain.In general, this sequence does not necessarily converge to this least inductive invariant, but it does so under certain conditions (e.g., when all abstract transformers are non-expansive [1]).We investigated applying our "implicit representation" idea to the min-strategy approach, but encountered a stumbling block: while it is possible to decide whether a max-strategy is improvable using SMT solving on quantifier-free formulas, the equivalent for min-strategies necessitated quantified formulas, which defeats the purpose of doing away with quantifier elimination techniques.

Basics
2.1. Notations. $\mathbb{B} = \{0, 1\}$ denotes the set of Boolean values. The set of real numbers (resp. the set of rational numbers) is denoted by $\mathbb{R}$ (resp. $\mathbb{Q}$). The complete linearly ordered set $\mathbb{R} \cup \{-\infty, \infty\}$ is denoted by $\bar{\mathbb{R}}$; similarly, $\mathbb{Q} \cup \{-\infty, \infty\}$ is denoted by $\bar{\mathbb{Q}}$. For any expression (resp. term) e, we write $e[e_1/x_1, \dots, e_k/x_k]$ to denote the expression (resp. term) that is obtained from e by simultaneously replacing all occurrences of the variables $x_1, \dots, x_k$ by $e_1, \dots, e_k$.
A partially ordered set D is called a lattice if and only if any two elements x, y ∈ D have a greatest lower bound and a least upper bound, denoted respectively by x ∧ y and x ∨ y. It is a complete lattice if and only if any subset X ⊆ D has a greatest lower bound and a least upper bound, denoted respectively by $\bigwedge X$ and $\bigvee X$. The least element $\bigvee \emptyset$ of a complete lattice is denoted by ⊥. The greatest element $\bigwedge \emptyset$ is denoted by ⊤.
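Assume that $D_1$ and $D_2$ are partially ordered by $\le_1$ and $\le_2$, respectively. A function $f : D_1 \to D_2$ is called monotone if and only if $x \le_1 y$ implies $f(x) \le_2 f(y)$ for all $x, y \in D_1$. We shall often use the following fundamental result:

Theorem 2.1 (Knaster/Tarski [62]). Let D be a complete lattice and f : D → D monotone. The operator f has a least fixpoint and a greatest fixpoint, respectively denoted by µf and νf. Moreover, we have $\mu f = \bigwedge\{x \in D \mid f(x) \le x\}$ and $\nu f = \bigvee\{x \in D \mid x \le f(x)\}$.

We denote the transpose of a matrix A by $A^\top$. For $x \in \bar{\mathbb{R}}$, we denote the column vector $(x, \dots, x)^\top$ by $\underline{x}$. We denote the i-th row (resp. the j-th column) of a matrix A by $A_{i\cdot}$ (resp. $A_{\cdot j}$). Accordingly, $A_{i\cdot j}$ denotes the entry in the i-th row and the j-th column. We also use this notation for vectors and vector-valued mappings. The set $\bar{\mathbb{R}}^n$ is partially ordered by the component-wise extension of ≤, which we again denote by ≤. That is, for all $x, y \in \bar{\mathbb{R}}^n$, x ≤ y if and only if $x_{i\cdot} \le y_{i\cdot}$ for all $i \in \{1, \dots, n\}$. … A mapping $f : \bar{\mathbb{R}}^n \to \bar{\mathbb{R}}^m$ is called weak-affine if and only if there exist weak-affine mappings $f_1, \dots, f_m : \bar{\mathbb{R}}^n \to \bar{\mathbb{R}}$ such that $f = (f_1, \dots, f_m)^\top$.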
Every affine mapping is weak-affine, but not vice-versa.In this article, we are concerned with mappings that are point-wise minimums of finitely many monotone and weak-affine mappings.Note that these mappings are in particular concave, i.e., the set of points below the graph of the function is convex.
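2.2. Linear Programming. Linear programming aims at optimizing a linear objective function with respect to linear constraints. In this article, we consider linear programming problems (LP problems for short) of the form

$$\sup\{c^\top x \mid x \in \mathbb{R}^n,\ Ax \le b\},$$

where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$. The set $\{x \in \mathbb{R}^n \mid Ax \le b\}$ is called the feasible space. The LP problem is called infeasible if and only if the feasible space is empty. An element of the feasible space is called a feasible solution. A feasible solution x that maximizes $c^\top x$ is called an optimal solution.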
If A and b consist only of rational entries, then the feasible space is nonempty if and only if it contains a rational point, and an optimal solution exists if and only if there exists a rational one. In this article, we always assume that all numbers in the input are rational.
LP problems can be solved in polynomial time through the ellipsoid method [41] and interior point methods [57]. However, the running time of these algorithms crucially depends on the sizes of the occurring numbers. At the risk of exponential running time in contrived cases, we can instead rely on the simplex algorithm: its worst-case running time does not depend on the sizes of the occurring numbers (given that arithmetic operations, comparison, storage and retrieval of numbers are counted as O(1)). See, for example, Dantzig [20] and Schrijver [57] for more information on linear programming.

SAT modulo linear real arithmetic.
The set of SAT modulo linear real arithmetic formulas Φ is defined through the following grammar: … Here, c ∈ ℚ is a constant, x is a real-valued variable, e, e', e₁, e₂ are real-valued linear expressions, a is a Boolean variable and Φ, Φ', Φ₁, Φ₂ are formulas. An interpretation I for a formula Φ is a mapping that assigns a real value to every real-valued variable and a Boolean value to every Boolean variable. We write I ⊨ Φ for "I is a model of Φ". That is, we first inductively define a function $[\![e]\!]_I$ that evaluates a linear expression e as follows: … Secondly, we inductively define the relation ⊨ as follows: … The problem of deciding the satisfiability of SAT modulo linear real arithmetic formulas is NP-complete. There nevertheless exist efficient solver implementations for this decision problem, generally based on the DPLL(T) approach, an extension of the DPLL algorithm for SAT to richer logics. For more information, see for example Biere et al. [7], Dutertre and de Moura [22], and Kroening and Strichman [42]. On satisfiable instances, such implementations can provide a model over Booleans and rational numbers.
In order to simplify notation, we also allow matrices, vectors, the relations ≥, >, =, ≠, and the Boolean constants 0 and 1 to occur in SAT modulo linear real arithmetic formulas.

The Framework

A program uses n real-valued variables $x_1, \dots, x_n$. A state is described by a vector $x \in \mathbb{R}^n$. We assign a collecting semantics $[\![s]\!] : 2^{\mathbb{R}^n} \to 2^{\mathbb{R}^n}$ to each statement s ∈ Stmt. The collecting semantics $[\![s]\!]$ is an operator that assigns a set $[\![s]\!](X)$ of possible states after the execution of s to a set X of possible states before the execution of s. The set Stmt of statements is specified subsequently. The collecting semantics V of a program G = (N, E, st) is finally defined as the least solution of the following constraint system:

$$\mathbf{V}[st] \supseteq \mathbb{R}^n, \qquad \mathbf{V}[v] \supseteq [\![s]\!](\mathbf{V}[u]) \quad \text{for all } (u, s, v) \in E$$

Here, for any v ∈ N, the variable $\mathbf{V}[v]$ takes values in $2^{\mathbb{R}^n}$. The components of the collecting semantics V are denoted by V[v] for all v ∈ N. Throughout this article, we will usually denote variables in bold face, and values in normal face.

3.2. Statements. The set Stmt of all statements is the set of all SAT modulo linear real arithmetic formulas without Boolean variables and without negation. Note that non-strict and strict inequality constraints are permitted. The formula $e_1 \ne e_2$ is also permitted, since it is an abbreviation for $e_1 < e_2 \vee e_2 < e_1$. We can (in linear time) transform any SAT modulo linear real arithmetic formula without Boolean variables into this form by pushing negations to the leaves. The real-valued variables $x_1, \dots, x_n$ and $x'_1, \dots, x'_n$ that may occur in the formula play a particular role. The values of the variables $x_1, \dots, x_n$ represent the values of the program variables before executing the statement, and the values of the variables $x'_1, \dots, x'_n$ represent the values of the program variables after executing the statement. For convenience, we denote the vectors $(x_1, \dots, x_n)^\top$ and $(x'_1, \dots, x'_n)^\top$ also by x and x', respectively. In addition to $x_1, \dots, x_n$ and $x'_1, \dots, x'_n$, the statement may also include other variables, which may stand for intermediate values computed (or non-deterministically chosen) during the execution of a program statement. Conceptually, these variables are existentially quantified.
We could also add Boolean variables, at the expense of some additional complexity in definitions, theorems and proofs. Note that this would not increase the expressiveness, since a Boolean variable b can be simulated by a real variable $y_b$ by replacing all occurrences of b by $y_b = 1$, all occurrences of ¬b by $y_b = 0$, and conjoining $(y_b = 0 \vee y_b = 1)$ to the formula. In practice, direct support for Boolean variables may benefit efficiency. More generally, we can accommodate any formula feature that just expresses disjunctions in a compact way; the only requirement is not to generate negations.
The collecting semantics $[\![s]\!] : 2^{\mathbb{R}^n} \to 2^{\mathbb{R}^n}$ of a statement s maps a set X of states to the set of all $x' \in \mathbb{R}^n$ for which there is some x ∈ X such that s, with these values substituted for the variables x and x', is satisfiable (the remaining variables of s are thus existentially quantified). Consider the following C-code snippet:

    … x2 = -x1;

Assume that x1 and x2 are of type int and that they are the only numerical variables. The effect of the C code snippet can be abstracted by the statement … Note that a conjunct $x'_i = x_i$ is needed for all variables that do not change their values. A statement s is called merge-simple if and only if it is in disjunctive normal form, i.e., s is of the form $s_1 \vee \dots \vee s_k$, where the statements $s_1, \dots, s_k$ do not use the Boolean connector ∨. Any statement can be rewritten into an equivalent merge-simple statement in exponential time and space using distributivity. The crux of our main result is that our algorithm never needs to compute such an exponentially sized disjunctive normal form.
If we convert Statement (3.3) into an equivalent merge-simple statement using distributivity, we obtain the statement (3.4) … A merge-simple statement s that does not use the Boolean connector $\vee$ at all is called sequential. Intuitively, sequential statements correspond to straight-line sequences of basic blocks. The merge-simple statement (3.4) non-deterministically chooses to execute one of the following sequential statements: …

The abstract semantics $V^\sharp$ of a program $G = (N, E, st)$ is the least solution of the following constraint system:

$$V^\sharp[st] \sqsupseteq \top, \qquad V^\sharp[v] \sqsupseteq [\![s]\!]^\sharp(V^\sharp[u]) \quad \text{for all } (u, s, v) \in E. \tag{3.7}$$

Here, for any $v \in N$, the variable $V^\sharp[v]$ takes values in D, and the components of the abstract semantics are denoted by $V^\sharp[v]$ for all $v \in N$. The abstraction is sound, i.e., the abstract semantics safely over-approximates the collecting semantics: $\gamma(V^\sharp[v]) \supseteq V[v]$ for all $v \in N$.

3.4. Template Linear Constraints. In this article we restrict our considerations to template linear constraint domains as introduced by Sankaranarayanan et al. [56]. We assume that a template constraint matrix $T \in \mathbb{R}^{m \times n}$ is given. For technical convenience, we always assume w.l.o.g. that $m \geq 1$ and that each row of T contains at least one non-zero entry. The template linear constraint domain can be identified with the set $\overline{\mathbb{R}}^m$. As shown by Sankaranarayanan et al. [56], the abstraction $\alpha : 2^{\mathbb{R}^n} \to \overline{\mathbb{R}}^m$ and the concretization $\gamma : \overline{\mathbb{R}}^m \to 2^{\mathbb{R}^n}$, which are defined by

$$\alpha(X) := \bigl(\sup\{T_{i\cdot}\, x \mid x \in X\}\bigr)_{i = 1, \ldots, m}, \qquad \gamma(d) := \{x \in \mathbb{R}^n \mid T x \leq d\},$$

form a Galois connection. The template linear constraint domains contain intervals, zones, and octagons [45,46], with appropriate choices of the template constraint matrix T [56]. For instance, if we have two variables x and y, and we abstract each variable by an interval as $x \in [-l_x, u_x]$ and $y \in [-l_y, u_y]$, the vector d is formed of $(l_x, l_y, u_x, u_y)$. Here, the matrix T is given by

$$T = \begin{pmatrix} -1 & 0 \\ 0 & -1 \\ 1 & 0 \\ 0 & 1 \end{pmatrix},$$

and thus the concretization expresses $-x \leq l_x \wedge -y \leq l_y \wedge x \leq u_x \wedge y \leq u_y$. While intervals, zones, and octagons are somewhat "obvious" choices, a common question with respect to template domains is how to find the templates, as opposed to the domain of convex polyhedra, where the convex hull and widening operations somewhat "discover" interesting directions in space. In this article, we shall assume that template matrices are given and refrain from discussing how they were obtained.
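For the interval template above, the standard template definitions $\alpha(X)_i = \sup\{T_{i\cdot}\, x \mid x \in X\}$ and $\gamma(d) = \{x \mid T x \leq d\}$ can be evaluated directly. The following Python sketch is a minimal illustration that only handles finite sets of points (for arbitrary sets the supremum may be ∞); the names dot, alpha, in_gamma and T_INTERVAL are hypothetical.

```python
from math import inf


def dot(row, x):
    """Inner product T_i . x of a template row with a point."""
    return sum(t * xj for t, xj in zip(row, x))


def alpha(T, points):
    """Template abstraction of a finite point set: d_i = sup { T_i . x | x in points }."""
    return tuple(max((dot(row, x) for x in points), default=-inf) for row in T)


def in_gamma(T, d, x):
    """Membership test for the concretization gamma(d) = { x | T x <= d }."""
    return all(dot(row, x) <= di for row, di in zip(T, d))


# Interval template for (x, y): rows -x, -y, x, y, so that d = (l_x, l_y, u_x, u_y)
T_INTERVAL = [(-1, 0), (0, -1), (1, 0), (0, 1)]

d = alpha(T_INTERVAL, [(1, 2), (3, -1)])
print(d)                                              # (-1, 1, 3, 2): x in [1, 3], y in [-1, 2]
print(in_gamma(T_INTERVAL, d, (2, 0)), in_gamma(T_INTERVAL, d, (0, 0)))   # True False
```

The empty set is mapped to $(-\infty, \ldots, -\infty)$, which is why the default of max is -inf.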

Improving the Precision of the Abstraction
Most abstract interpretation techniques consider a control-flow graph with transitions expressed as sequential statements only (see the formal definition in Sec. 3.2), that is, composed of atomic guards and assignments. An if-then-else construct with simple constructs (e.g., assignments) in both branches is thus expressed as two sequential statements, and a sequence of two such if-then-else constructs (one from point A to point B and one from B to C) is expressed as on the left of Figure 2: two sequential statements between A and B, and two sequential statements between B and C. As noted in the introduction (Sec. 1.2), abstract interpretation techniques usually abstract the set of reachable states at point B. This may introduce spurious states into the abstraction, which in turn may prevent the analysis tool from proving desirable properties.
In this article, we apply an idea that is very similar to the path focusing technique of Monniaux and Gonnord [49]. Given a program expressed as a control-flow graph with sequential statements on the edges, we first compute a feedback vertex set (a.k.a. cut-set) S, that is, a set of control nodes (the feedback vertices) such that removing them cuts all cycles in the graph. Our original program is equivalent to a program whose only control nodes are those in the feedback vertex set, but whose edges carry arbitrary statements instead of sequential statements only (cf. Sec. 3.2). The results of program analyses on this new graph, at nodes from the feedback vertex set S, are sound invariants for the original program. If information is needed at other nodes, we can compute it from the information we have for the nodes from S.
Since methods for obtaining compact formulas expressing these statements from the original program have already been described in other publications [49], we do not explain them in detail. Instead, we provide an example. The programs $G_1$ and G are equivalent w.r.t. their collecting semantics, i.e., $V_1[v] = V[v]$ for all program points v of G. Here, $V_1$ denotes the collecting semantics of $G_1$ and V denotes the collecting semantics of G. With respect to the abstract semantics, G is usually more precise than $G_1$, because we reduced the number of merge points. In general, we only have $V^\sharp[v] \sqsubseteq V_1^\sharp[v]$ for all program points v of G, where $V_1^\sharp$ denotes the abstract semantics of $G_1$ and $V^\sharp$ denotes the abstract semantics of G. This is independent of the abstract domain.⁴

Let us make a few final remarks regarding the feedback vertex set. Abstract interpretation techniques usually use such a set to select widening points [16, §4.1.2]. In contrast, our method uses this set to select the nodes at which it over-approximates the set of reachable states; it does not over-approximate the set of reachable states at other nodes, and widening is not involved at all. Finding a feedback vertex set of minimal cardinality is an NP-complete problem if the control-flow graph is arbitrary; such a set can, however, be found in linear time if the control-flow graph is reducible (in short, if loops have a single entry point) [59], which is the case for control-flow graphs directly obtained from structured programs (the method extends to certain irreducible graphs). The control-flow graph may, however, become irreducible if certain optimizations or partitioning techniques are used. A common heuristic is to use loop headers for structured programs, and the targets of back edges from a depth-first traversal for unstructured programs [10,11]; this heuristic does not guarantee that the feedback vertex set is minimal with respect to inclusion, let alone cardinality.
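The back-edge heuristic can be sketched in a few lines of Python. This is an illustration under simple assumptions (a control-flow graph given as a successor dictionary, only cycles reachable from the entry matter); the function names are hypothetical. Every cycle reachable from the entry contains at least one DFS back edge, and that edge's target lies on the cycle, so removing all back-edge targets breaks every such cycle.

```python
def back_edge_targets(succ, entry):
    """Targets of the back edges found by a depth-first traversal from entry."""
    visited, on_stack, cut = set(), set(), set()

    def dfs(u):
        visited.add(u)
        on_stack.add(u)
        for v in succ.get(u, ()):
            if v in on_stack:          # u -> v returns to an ancestor: a cycle through v
                cut.add(v)
            elif v not in visited:
                dfs(v)
        on_stack.discard(u)

    dfs(entry)
    return cut


def has_cycle(succ, removed=frozenset()):
    """Does the graph still contain a cycle once the nodes in `removed` are deleted?"""
    nodes = (set(succ) | {v for vs in succ.values() for v in vs}) - set(removed)
    state = {}                         # "active" while on the DFS stack, "done" afterwards

    def visit(u):
        state[u] = "active"
        for v in succ.get(u, ()):
            if v in removed:
                continue
            if state.get(v) == "active" or (v not in state and visit(v)):
                return True
        state[u] = "done"
        return False

    return any(visit(u) for u in nodes if u not in state)


# Two nested loops: h1 is the outer loop header, h2 the inner one
cfg = {"st": ["h1"], "h1": ["h2", "exit"], "h2": ["body", "h1"], "body": ["h2"]}
cut = back_edge_targets(cfg, "st")
print(sorted(cut), has_cycle(cfg), has_cycle(cfg, cut))   # ['h1', 'h2'] True False
```

On this structured example the heuristic returns exactly the two loop headers; on irreducible graphs the result can be larger than necessary, as noted above.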

Basic Observations
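We now note down basic properties of the abstract semantics.

Proof. Let $i \in \{1, \ldots, m\}$. We get:

$$[\![s]\!]^\sharp_{i\cdot}(d) = \sup\{T_{i\cdot}\, x' \mid x \in \gamma(d),\ s[x/\mathbf{x}, x'/\mathbf{x}'] \text{ is satisfiable}\}. \tag{5.2}$$

It remains to show that $[\![s]\!]^\sharp_{i\cdot}$ is a point-wise minimum of finitely many monotone and weak-affine operators. Since $s[x/\mathbf{x}, x'/\mathbf{x}'][</\leq]$ (the statement obtained from s by substituting x and x′ for the pre- and post-state variables and replacing every strict inequality with the corresponding non-strict one) is a conjunction of non-strict linear inequalities, there exist matrices $A$, $A'$, and $A''$ and a vector b such that, for all x and x′, $s[x/\mathbf{x}, x'/\mathbf{x}'][</\leq]$ is satisfiable if and only if there exists an $x''$ such that $A x + A' x' + A'' x'' \leq b$ (the vector $x''$ stands for the other variables in s, which are implicitly existentially quantified). Thus, the optimization problem (5.2) can be rewritten as follows:

$$[\![s]\!]^\sharp_{i\cdot}(d) = \sup\{T_{i\cdot}\, x' \mid T x \leq d,\ A x + A' x' + A'' x'' \leq b\}. \tag{5.3}$$

Strong duality for linear programming [12], a consequence of Farkas' lemma, thus gives us, provided that $[\![s]\!]^\sharp_{i\cdot}(d) > -\infty$, i.e., the optimization problem is feasible, the following equation:

$$[\![s]\!]^\sharp_{i\cdot}(d) = \inf\{d^\top y_1 + b^\top y_2 \mid y_1, y_2 \geq 0,\ T^\top y_1 + A^\top y_2 = 0,\ A'^\top y_2 = T_{i\cdot}^\top,\ A''^\top y_2 = 0\}. \tag{5.4}$$

Since $y_1 \geq 0$ for all feasible solutions of the linear programming problem in (5.4), $[\![s]\!]^\sharp_{i\cdot}$ coincides with a point-wise infimum of monotone and affine operators on the set $\{d \in \overline{\mathbb{R}}^m \mid [\![s]\!]^\sharp_{i\cdot}(d) > -\infty\}$. That is, $[\![s]\!]^\sharp_{i\cdot}$ is a point-wise infimum of monotone and weak-affine operators. Since the optimal value, provided that it exists, is attained at one of the (finitely many) vertices of the feasible space, the point-wise infimum is a point-wise minimum of finitely many monotone and weak-affine operators.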
The max-strategy improvement algorithm we adapt in this article relies heavily on the fact that, for every sequential statement s and every $i \in \{1, \ldots, m\}$, $[\![s]\!]^\sharp_{i\cdot}$ is a point-wise minimum of finitely many monotone and weak-affine operators. In particular, this implies that $[\![s]\!]^\sharp_{i\cdot}$ is concave (see Gawlitza and Seidl [29] for precise definitions).
The number of vertices of the feasible space of the linear programming problem in (5.4) may be exponential in the size of the original problem; thus the representation as a point-wise minimum of finitely many monotone and weak-affine operators might consist of an exponential number of such operators. This is not a problem, since our algorithm never computes this decomposition explicitly.
Any polynomial-time method for evaluating the abstract semantics of sequential statements can be used to derive a polynomial-time method for evaluating merge-simple statements, because the abstract semantics of $s_1 \vee \cdots \vee s_k$ is the component-wise maximum of the abstract semantics of $s_1, \ldots, s_k$. The problem for arbitrary statements is more difficult. Since it is clearly equivalent to satisfiability modulo the theory of linear real arithmetic, we obtain: Lemma 5.3. The problem of deciding whether or not, for a given template constraint matrix T and a given statement s, $[\![s]\!]^\sharp(\infty) > -\infty$ holds is NP-complete.

A Trivial Method for Computing Abstract Semantics.
Using the results we have obtained so far, the abstract semantics of a program G w.r.t. some template constraint matrix T can be computed using the following two-step procedure: (1) Replace each statement s of the program G with an equivalent merge-simple statement. This corresponds to an explicit enumeration of all paths between cut-points, which potentially causes an exponential blowup. (2) Apply the methods of Gawlitza and Seidl [26] to the obtained program to compute the abstract semantics $V^\sharp$ of G. Because of the possible exponential blowup, the method described above is impractical in most cases.⁵ Our method eschews this blowup as follows: instead of enumerating all program paths, we visit them only as needed. Guided by a SAT modulo linear real arithmetic solver, our method selects a path through a statement s only when it is locally profitable, that is, when following it improves the current max-strategy (Subsection 6.5). In the worst case, an exponential number of paths may be visited (Section 7.2), but one can hope that this rarely happens in practice. In cases in which our algorithm needs exponential time, it at least avoids the explicit exponential expansions, and it uses only polynomial space.
abstract reachability is NP-hard.Even if all statements are merge-simple, we cannot expect a polynomialtime algorithm, since the problem of computing the winning regions of parity games is polynomial-time reducible to abstract reachability [27].

Max-Strategy Iteration
This section presents our main contribution. We adapt the max-strategy improvement schema of Gawlitza and Seidl [28] to obtain an algorithm that computes abstract semantics in the framework of this article.

6.1. Notations. Before we go in medias res, we have to introduce some notation. A system E of (fixpoint) equations over $\overline{\mathbb{R}}$ is a finite set $\{x_1 = e_1, \ldots, x_n = e_n\}$ of equations. Here, $x_1, \ldots, x_n$ are pairwise distinct $\overline{\mathbb{R}}$-valued variables and $e_1, \ldots, e_n$ are expressions over $\overline{\mathbb{R}}$. We denote the set $\{x_1, \ldots, x_n\}$ of variables of E by $X_E$; we omit the subscript whenever it is clear from the context. A function $\rho : X \to \overline{\mathbb{R}}$ is called a variable assignment. It assigns the value $\rho(x)$ to each variable $x \in X$. Variable assignments are ordered by the point-wise extension of $\leq$ on $\overline{\mathbb{R}}$, i.e., $\rho \leq \rho'$ if and only if $\rho(x) \leq \rho'(x)$ for all $x \in X$. Since $\overline{\mathbb{R}}$ is a complete linearly ordered set, the set $X \to \overline{\mathbb{R}}$ of all variable assignments is a complete lattice. The semantics $[\![e]\!] : (X \to \overline{\mathbb{R}}) \to \overline{\mathbb{R}}$ of an expression e is defined by $[\![x]\!](\rho) := \rho(x)$ and $[\![f(e_1, \ldots, e_k)]\!](\rho) := f([\![e_1]\!](\rho), \ldots, [\![e_k]\!](\rho))$, where $x \in X$, f is a k-ary operator on $\overline{\mathbb{R}}$, $e_1, \ldots, e_k$ are expressions, and $\rho : X \to \overline{\mathbb{R}}$ is a variable assignment. We define the operator $[\![E]\!] : (X \to \overline{\mathbb{R}}) \to X \to \overline{\mathbb{R}}$ by $[\![E]\!](\rho)(x) := [\![e]\!](\rho)$ for all equations $x = e$ of E and all $\rho : X \to \overline{\mathbb{R}}$. A fixpoint equation $x = e$ is called monotone if and only if all operators used in e are monotone. Then the evaluation function $[\![e]\!]$ of e is monotone, too. Consequently, the operator $[\![E]\!]$ is monotone for all systems E of monotone (fixpoint) equations. A variable assignment ρ is called a solution (resp. pre-solution, resp. post-solution) of E if and only if $\rho = [\![E]\!](\rho)$ (resp. $\rho \leq [\![E]\!](\rho)$, resp. $\rho \geq [\![E]\!](\rho)$). The least solution of E is denoted by $\mu[\![E]\!]$. If the operator $[\![E]\!]$ is monotone, then the fixpoint theorem of Knaster/Tarski (Theorem 2.1) ensures the existence of a uniquely determined least solution $\mu[\![E]\!]$. For a system E of equations and a pre-solution ρ, $\mu_{\geq\rho}[\![E]\!]$ denotes the least solution of E among the solutions of E that are greater than or equal to ρ, i.e., $\mu_{\geq\rho}[\![E]\!] = \min\{\rho' \mid \rho' = [\![E]\!](\rho') \text{ and } \rho' \geq \rho\}$. Again, if the operator $[\![E]\!]$ is monotone, then the fixpoint theorem of Knaster/Tarski ensures the existence of $\mu_{\geq\rho}[\![E]\!]$, since the set $\{\rho' \mid \rho' \geq \rho\}$ is a complete lattice that, because ρ is a pre-solution, is mapped into itself by $[\![E]\!]$.

6.2. Rewriting the Abstract Semantic Equations. The first step of our method consists of rewriting our static analysis problem into a system of monotone fixpoint equations over $\overline{\mathbb{R}}$. Assume that $G = (N, E, st)$ is a program that has n variables, and $T \in \mathbb{R}^{m \times n}$ is a template constraint matrix. Recall that (w.r.t. T) the abstract semantics of G is the least solution of the following constraint system (cf. (3.7) in Subsection 3.3):

$$V^\sharp[st] \geq (\infty, \ldots, \infty), \qquad V^\sharp[v] \geq [\![s]\!]^\sharp(V^\sharp[u]) \quad \text{for all } (u, s, v) \in E.$$

The constraint system has exactly one $\overline{\mathbb{R}}^m$-valued variable $V^\sharp[v]$ for each program point $v \in N$. We replace each such variable with m $\overline{\mathbb{R}}$-valued variables $d_{v,1}, \ldots, d_{v,m}$ and obtain the following constraint system:

$$d_{st,i} \geq \infty \quad \text{for all } i \in \{1, \ldots, m\} \tag{6.2}$$
$$d_{v,i} \geq [\![s]\!]^\sharp_{i\cdot}(d_{u,1}, \ldots, d_{u,m}) \quad \text{for all } (u, s, v) \in E \text{ and all } i \in \{1, \ldots, m\} \tag{6.3}$$

Figure 4: The running example (including the template constraint matrix T; only $x_1$ is taken into account in the template, hence the zero right column).

The fixpoint theorem of Knaster/Tarski (Theorem 2.1) ensures that the least solution of the above system of inequalities is the least solution of the following equation system:

$$d_{st,i} = \infty \quad \text{for all } i \in \{1, \ldots, m\} \tag{6.4}$$
$$d_{v,i} = \max\bigl(\{-\infty\} \cup \{[\![s]\!]^\sharp_{i\cdot}(d_{u,1}, \ldots, d_{u,m}) \mid (u, s, v) \in E\}\bigr) \quad \text{for all } v \in N \setminus \{st\} \text{ and all } i \in \{1, \ldots, m\} \tag{6.5}$$

We denote the above system of fixpoint equations by E(G, T). From Section 5, we know that the right-hand sides of E(G, T) are point-wise maxima of finitely many point-wise minima of finitely many weak-affine operators. We summarize the properties of E(G, T):

Lemma 6.1. Let G be a program and $V^\sharp$ its abstract semantics (w.r.t. the template constraint matrix T). Then $V^\sharp_{i\cdot}[v] = \mu[\![E(G, T)]\!](d_{v,i})$ for all program points $v \in N$ and all $i \in \{1, \ldots, m\}$. The right-hand sides of E(G, T) are point-wise maxima of finitely many point-wise minima of finitely many weak-affine operators. Thus, they are in particular point-wise maxima of finitely many monotone and concave functions.
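The shape of E(G, T) (one variable $d_{v,i}$ per program point and template row, $d_{st,i} = \infty$, and every other $d_{v,i}$ defined as a maximum over the incoming edges) can be sketched symbolically in Python. The sketch is an assumption-laden illustration: program points, statements and the tuple encoding of operands are hypothetical, and statements are kept as opaque names, exactly as the succinct representation keeps them unexpanded.

```python
INF = "inf"


def build_equations(nodes, edges, st, m):
    """Right-hand sides of E(G, T): maps (v, i) to the list of operands of a maximum.

    An operand ("apply", s, i, u) stands for [[s]]#_{i.}(d_{u,1}, ..., d_{u,m});
    an empty operand list stands for the maximum of the empty set, i.e. -inf.
    """
    eqs = {}
    for v in nodes:
        for i in range(1, m + 1):
            if v == st:
                eqs[(v, i)] = [("const", INF)]                   # d_{st,i} = inf, cf. (6.4)
            else:
                eqs[(v, i)] = [("apply", s, i, u)                # one operand per incoming edge, cf. (6.5)
                               for (u, s, w) in edges if w == v]
    return eqs


# Hypothetical program: st -> 1, a self-loop at the cut-point 1, and an exit edge 1 -> 2
edges = [("st", "s0", "1"), ("1", "s1", "1"), ("1", "s2", "2")]
eqs = build_equations(["st", "1", "2"], edges, "st", m=2)
print(eqs[("1", 1)])   # [('apply', 's0', 1, 'st'), ('apply', 's1', 1, '1')]
```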
Examples 6.2.We again consider our running example specified in Figure 4(a).We want to perform the analysis w.r.t. the template constraint matrix T specified in Figure 4

6.3. Adapting the Max-Strategy Improvement Algorithm. Following the lines of Gawlitza and Seidl [29], our starting point is a system E of monotone fixpoint equations of the form $x = \max \Sigma_x$, where x is an $\overline{\mathbb{R}}$-valued variable and $\Sigma_x$ is a finite set of monotone and concave expressions over $\overline{\mathbb{R}}$. An expression e is called monotone (resp. concave) if and only if $[\![e]\!]$ is monotone (resp. concave).⁶ We treat a function from the finite set X of variables to $\overline{\mathbb{R}}$ as a vector of $|X|$ elements of $\overline{\mathbb{R}}$. In our application (recall that we aim at solving the equation system E(G, T)), the sets $\Sigma_x$ are implicitly and succinctly given by the right-hand sides of the equations of the forms (6.4) and (6.5). Indeed, every expression of the form $[\![s]\!]^\sharp_{i\cdot}(d_{u,1}, \ldots, d_{u,m})$ found on the right-hand side of such an equation can be equivalently rewritten into $\max\{[\![s_1]\!]^\sharp_{i\cdot}(d_{u,1}, \ldots, d_{u,m}), \ldots, [\![s_k]\!]^\sharp_{i\cdot}(d_{u,1}, \ldots, d_{u,m})\}$, where $s_1, \ldots, s_k$ are the (potentially exponentially many) sequential statements of the merge-simple form of s. Since $s_1, \ldots, s_k$ are sequential, the operators $[\![s_1]\!]^\sharp_{i\cdot}, \ldots, [\![s_k]\!]^\sharp_{i\cdot}$ are point-wise minima of finitely many monotone and weak-affine operators; hence they are monotone and concave.
One obvious way to solve the system E of equations is to perform the above-mentioned rewriting explicitly and then to apply the max-strategy improvement algorithm. This, however, incurs an exponential blowup. To avoid it, we modify the algorithm in what follows so that it works directly on the succinct representation.
Assume that E denotes a system of fixpoint equations of the form $x = \max \Sigma_x$, where $\Sigma_x$ is a finite set of monotone and concave expressions over $\overline{\mathbb{R}}$. A max-strategy σ for E is a system of equations over the same variables as E such that, for each equation $x = e$ of σ, one of the following statements holds: (1) e is −∞.
(2) $e \in \Sigma_x$, where $x = \max \Sigma_x$ is an equation of E. Intuitively, a max-strategy picks for each maximum operator one of its operands (or −∞). For a system E of equations, we denote the set of all max-strategies by $\Sigma_E$. In our application, the cardinality of $\Sigma_E$ is exponential in the size of E; more precisely, it is in $O(2^{n^2})$, where n denotes the size of E. Enumerating all max-strategies is therefore impractical.

Example 6.3. We continue our running example (Figure 4). Consider the system E(G, T) and note that $[\![s]\!]^\sharp$ can be equivalently rewritten into … Recall that this expansion is solely for the purpose of proving properties: it is not performed by the algorithm. The equation system σ consisting of the equations … is thus a max-strategy for this system.
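By the definition above, each equation $x = \max \Sigma_x$ contributes $|\Sigma_x| + 1$ independent choices (one per operand, plus −∞), so $|\Sigma_E|$ is a product. The short Python sketch below makes this explicit; operand_counts is a hypothetical input listing $|\Sigma_x|$ for each variable. In the succinct setting a single statement can hide exponentially many operands once expanded, which is what makes enumeration hopeless.

```python
from math import prod


def count_max_strategies(operand_counts):
    """|Sigma_E| when the equation of the k-th variable has operand_counts[k] operands."""
    # every equation independently picks -inf or one of its operands
    return prod(k + 1 for k in operand_counts)


# One equation whose statement expands into 2**10 sequential paths, one with a single operand
print(count_max_strategies([2 ** 10, 1]))   # 2050
```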
⁶ For a precise definition of concavity for functions $\overline{\mathbb{R}}^n \to \overline{\mathbb{R}}^m$, we refer to Gawlitza and Seidl [31]. For this article, a precise treatment of these issues is not required; we mention concavity only to give a better intuition.

The above lemma implies that the algorithm returns the least solution whenever it terminates. Whether or not it terminates depends on the properties of the class of fixpoint equation systems under consideration. In our application, we aim at computing the least solution of the equation system E(G, T) (see Subsection 6.2). By Lemma 6.1, the right-hand sides of E(G, T) are point-wise maxima of finitely many monotone and concave functions. More specifically, the right-hand sides are point-wise maxima of finitely many point-wise minima of finitely many weak-affine operators. This property guarantees the termination of the max-strategy improvement algorithm [31], [28, §6.1]: at the latest, it terminates after considering each max-strategy at most linearly often (see Lemma 6.8). Before we explain the remaining building blocks, i.e., how to execute lines 4 and 5 of Algorithm 1, we consider an example.
Example 6.6.We consider our running example.That is, we aim at computing the least solution of the equation system E(G, T ) shown in Figure 4. Running the algorithm can, for instance, give us the following trace: ) Here, for all i, ρ i+1 = µ ≥ρ i σ i+1 and σ i+1 is an improvement of σ i w.r.t.ρ i .The variable ρ 4 is a solution of E(G, T ).The max-strategy improvement algorithm terminates with the correct least solution, which is ρ 4 .
We now present methods to evaluate max-strategies (Line 5 of Algorithm 1) and to improve max-strategies (Line 4 of Algorithm 1).

6.4. Evaluating Max-Strategies. We restrict our consideration to our application. That is, we assume that the equation system E is given by E = E(G, T) for some program G and some template constraint matrix T. For all $i \in \mathbb{N}$, this allows us to compute $\rho_i$ as follows:

Lemma 6.7 ([31], [28]). Let $i \in \mathbb{N}$. Recall that, by construction, $\rho_{i+1} = \mu_{\geq\rho_i}[\![\sigma_{i+1}]\!]$. The variable assignment $\rho_{i+1}$ can be computed as follows: Let E′ denote the system of equations that is obtained from the equation system $\sigma_{i+1}$ by performing the following steps: (1) Remove every equation $x = e$ where $[\![e]\!](\rho_i) = -\infty$, and then replace the remaining occurrences of x with the constant −∞.
(2) Remove every equation $x = e$ where $[\![e]\!](\rho_i) = \infty$, and then replace the remaining occurrences of x with the constant ∞. For all equations $x = e$ of the equation system $\sigma_{i+1}$ with $-\infty < [\![e]\!](\rho_i) < \infty$, we can compute $\rho_{i+1}(x)$ as follows: … The value $\rho_{i+1}$ depends only on the equation system $\sigma_{i+1}$ and on the set of variables already identified to be ∞, namely $\{x \mid x = e \text{ is an equation of } \sigma_{i+1} \text{ with } [\![e]\!](\rho_i) = \infty\}$.
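Since the variable assignments $\rho_i$ increase, the set of variables identified to be ∞ can only grow during the iteration. In consequence, the max-strategy improvement algorithm has to consider each max-strategy at most |X| times. Hence, we have:

Lemma 6.8 ([31], [28]). The max-strategy improvement algorithm terminates after at most $|X| \cdot |\Sigma_E|$ max-strategy improvement steps.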
Lemma 6.7 gives us a method for computing $\rho_i$. For each variable $x \in X$, we have to compute … Here, $y_1, \ldots, y_n, y'_1, \ldots, y'_n$ are fresh variables. Φ is a set of linear inequalities that is obtained from the sequential statement s by (1) replacing the variables $x_1, \ldots, x_n, x'_1, \ldots, x'_n$ with the fresh variables $y_1, \ldots, y_n, y'_1, \ldots, y'_n$, (2) replacing all other variables of s with fresh variables, and (3) replacing every strict inequality < with the non-strict inequality ≤. We denote the resulting constraint system by C. By construction, we have: … The construction can be carried out in polynomial time. Since C is a set of linear constraints, we can use linear programming to compute the optimal value. We have:

Lemma 6.9 (Evaluating Max-Strategies). Whenever our max-strategy improvement algorithm has to compute $\mu_{\geq\rho}[\![\sigma]\!]$, this can be performed by solving |X| linear programming problems of polynomial size. The linear programming problems depend only on σ and the set $\{x \mid x = e \text{ is an equation of } \sigma \text{ with } [\![e]\!](\rho) = \infty\}$.

Example 6.10. We now discuss how to compute $\rho_3 := \mu_{\geq\rho_2}[\![\sigma_3]\!]$ from Example 6.6. Note that the values of the variables $d_{st,1}$ and $d_{st,2}$ are already known to be ∞. It remains to determine the values of the variables $d_{1,1}$ and $d_{1,2}$. According to Lemma 6.7, we have … Observe that $\Phi \wedge \Phi_2$ can be equivalently rewritten into … Thus, according to the above observations, $\rho_3(d_{1,1})$ is the optimal value of the following linear programming problem: … Since the optimal value is 1, we get $\rho_3(d_{1,1}) = 1$. Similarly, to compute $\rho_3(d_{1,2})$, we compute the optimal value of the following linear programming problem: … This gives us $\rho_3(d_{1,2}) = 0$.
Both linear programming problems have the same feasible space, which can be exploited in an implementation to improve performance. Furthermore, $\rho_3(d_{1,1}) = d^*_{1,1}$ and $\rho_3(d_{1,2}) = d^*_{1,2}$ for any optimal solution $(d^*_{1,1}, d^*_{1,2}, y^*_1)$ of the following linear programming problem: … Hence, for this example, it is sufficient to solve one linear programming problem to determine the variable assignment $\rho_3$.
The technique for evaluating max-strategies can thus be further optimized. It is not necessary to solve one linear program for each variable. Instead, it is possible to evaluate a max-strategy entirely by solving only two linear programming problems of linear size. The solution of the first linear programming problem tells us which variables are to be set to ∞. The solution of the second linear programming problem provides us with the values of the variables that receive finite values. In this article, we do not elaborate on these techniques.

6.5. Improving Max-Strategies. We now discuss how we can compute an improvement of a max-strategy σ w.r.t. a variable assignment ρ. Since, by Lemma 5.3, this problem is NP-hard, we cannot expect to come up with a polynomial-time algorithm (unless P = NP). We propose a solution that utilizes SMT solving techniques.
Let us first explain the intuition of our method, which is very similar to how the "path focusing" technique of Monniaux and Gonnord [49] selects the next iteration path. A strategy needs improvement if and only if its value does not define an inductive invariant. In other words, there is an outgoing transition from the "invariant candidate" into its complement, meaning that there is an execution trace through a statement that starts in the invariant candidate and ends with a violation of the current bounds. Whether this holds is a satisfiability modulo theories (SMT) problem for the theory of linear real arithmetic; it can therefore be solved by SMT solvers. Furthermore, a model returned by the SMT solver picks one of the sequential statements from the merge-simple expansion of the statement as "offending", explaining why the invariant candidate is not an invariant; in other words, it points to a possible improvement of the strategy. More generally, the set of models of the SMT formula maps onto the set of possible improvements.
Let us now see this process more formally. Assume that we have to improve a given max-strategy σ for the equation system $\{x_1 = e_1, \ldots, x_n = e_n\}$ w.r.t. a variable assignment ρ that is a solution of σ, i.e., $\rho = [\![\sigma]\!](\rho)$. This is exactly the situation we face when we execute our max-strategy improvement algorithm.
For each $i \in \{1, \ldots, n\}$, we now want to check whether or not $\rho(x_i) < [\![e_i]\!]\rho$. If this is the case, we moreover want to compute a max-strategy $\sigma_i$ for $e_i$ such that $\rho(x_i) < [\![\sigma_i]\!]\rho$. Note that, since $\rho(x_i) < [\![e_i]\!]\rho$, we could also compute a max-strategy $\sigma_i$ such that $[\![\sigma_i]\!]\rho = [\![e_i]\!]\rho$.
Given an equation $x = e$ and a variable assignment ρ, we must decide whether or not $\rho(x) < [\![e]\!](\rho)$ holds, and compute a max-strategy σ of e such that $\rho(x) < [\![\sigma]\!](\rho)$ holds. Recall that the semantic equations we are concerned with in this article are of the form

$$x = \max\{e_1, \ldots, e_k\} \tag{6.36}$$

where, for all $i \in \{1, \ldots, k\}$, each expression $e_i$ is either a constant or an expression of the form $[\![s]\!]^\sharp_{j\cdot}(x_1, \ldots, x_m)$. Hence, we can answer the above question by answering it for each argument $e_1, \ldots, e_k$ of the maximum separately. It thus remains to find a method to check whether or not, for a given statement s, a given $j \in \{1, \ldots, m\}$, a given $c \in \mathbb{R} \cup \{-\infty\}$, and a given $d \in \overline{\mathbb{R}}^m$, $[\![s]\!]^\sharp_{j\cdot}(d) > c$ holds, which is, by Lemma 5.3, an NP-hard problem.
Our approach is to construct the following SAT modulo linear real arithmetic formula (we use existential quantifiers to improve readability): … Here, Ψ(s) is a formula that relates every $x \in \mathbb{R}^n$ with all elements of the set $[\![s]\!]\{x\}$. It is defined inductively over the structure of the statement s as follows: … By again applying Lemma 6.11, we get $[\![\sigma_M]\!]^\sharp_{j\cdot}(d) > c$ and thus the following lemma:

Lemma 6.12. By solving the SAT modulo linear real arithmetic formula Ψ(s, d, j, c), which can be obtained from s in linear time, we can decide whether or not $[\![s]\!]^\sharp_{j\cdot}(d) > c$ holds.
From a model M of this formula, we can, in linear time, obtain a ∨-strategy $\sigma_M$ for s such that $[\![\sigma_M]\!]^\sharp_{j\cdot}(d) > c$.
Example 6.13.We again continue our running example, which is summarized in Figure 4. We want to know, whether or not s 1• (0, 0) > 0 holds.For that we compute a model M of the formula Ψ(s, (0, 0), 1, 0) which is given as follows: Ψ(s, (0, 0), 1, 0) ≡ ∃v ∈ R .Ψ(s, (0, 0) , 1) ∧ v > 0 (6.46)Ψ(s, (0, 0), 1) The formulas Φ, Φ 1 , and Φ 2 are defined in Figure 4. M = {a Φ 1 ∨Φ 2 → 1} is a model, which gives us the max-strategy σ M ≡ Φ ∧ Φ 2 for s.Thus, by Lemma 6.12, we have We must still provide a method for computing the values for the Boolean variables of a model of the formula Ψ(s, d, j, c).Most of the state-of-the-art SMT solvers, such as Yices [21,22], support the computation of models directly; the SMTLIB2 standard [6] has a get-assignment command that can be used to extract the Boolean part of a model.If this feature is not supported, one can compute the model, or only the values for the Boolean variables, using standard self-reduction techniques.
Recall that the semantic equations we are concerned with in this article are of the form $x = \max\{e_1, \ldots, e_k\}$, where each expression $e_i$, for all $i \in \{1, \ldots, k\}$, is either a constant or an expression of the form $[\![s]\!]^\sharp_{j\cdot}(x_1, \ldots, x_m)$, where s is a statement. As discussed above, we can check whether or not $\rho(x) < [\![\max\{e_1, \ldots, e_k\}]\!](\rho)$ holds, and if this is the case compute a max-strategy σ such that $\rho(x) < [\![\sigma]\!](\rho)$ holds, by solving at most k SAT modulo linear real arithmetic formulas, each of which can be constructed in linear time. Equivalently, instead of running k SMT queries, one per operand, we can rename the Boolean variables of these SMT formulas so that they are distinct and query the disjunction of the resulting formulas, since it suffices that one operand exceeds $\rho(x)$.

Lemma 6.14. Let $x = e$ be an abstract semantic equation, ρ a variable assignment, and $c \in \mathbb{R}$. By solving a single SAT modulo linear real arithmetic formula that can be obtained from e, ρ, and c in linear time, we can decide whether or not $[\![e]\!]\rho > c$ holds. From a model M of this formula, provided that $[\![e]\!]\rho > c$ holds, we can in linear time obtain a max-strategy $\sigma_M$ for e such that $[\![\sigma_M]\!]\rho > c$.
Note that we did not discuss how to choose the next max-strategy σ′, except that it should satisfy $\rho(x) < [\![\sigma']\!](\rho)$ (which is ensured by the SMT-solving step). Indeed, there could be many different suitable strategies σ′, and the SMT solver may return any of them. There is, however, at least one that is locally optimal, that is, for which $[\![\sigma']\!](\rho)$ is maximal; in other words, $[\![\sigma']\!](\rho) = [\![e]\!](\rho)$. Future work should include experiments on the performance impact of using locally optimal strategies instead of arbitrary ones.
It is possible to obtain a locally optimal strategy by repeated calls to the SMT solver. A naive method would be to query the SMT solver for a σ′ such that $[\![\sigma]\!](\rho) < [\![\sigma']\!](\rho)$, then for a σ″ such that $[\![\sigma']\!](\rho) < [\![\sigma'']\!](\rho)$, and so on until there is no locally better strategy; the last strategy obtained is then locally optimal. A less naive method would be to take a rough bound $M \geq [\![e]\!](\rho)$ and perform binary search in the interval $[[\![\sigma]\!](\rho), M]$: at each step, maintain an interval $[a, b]$ and query whether there exists σ′ such that $[\![\sigma']\!](\rho) \geq \frac{a+b}{2}$; if so, replace a by $\frac{a+b}{2}$, otherwise replace b by $\frac{a+b}{2}$, and repeat. The SMT-solving community is now considering the problem of optimization modulo theories [58], and we can hope for progress in this respect.
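The binary search can be sketched in Python as follows. The SMT query is abstracted as a hypothetical callback exists_at_least(t), standing for "is there a strategy σ′ with ⟦σ′⟧(ρ) ≥ t?"; in the usage example it is simulated over a fixed list of strategy values. Over the reals the search only converges to the locally optimal value up to a tolerance eps, an assumption of this sketch; in practice one would take the strategy returned by the last successful query.

```python
def locally_optimal_value(exists_at_least, current, upper, eps=1e-6):
    """Bisection on [current, upper] for the largest value [[sigma']](rho) of any strategy.

    Invariant: exists_at_least(lo) holds (lo is attained or exceeded by some strategy)
    and no strategy value exceeds hi.
    """
    lo, hi = current, upper
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if exists_at_least(mid):       # one SMT query
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical strategy values that an SMT solver could produce for the current rho
values = [1.0, 3.5, 7.0]


def oracle(t):
    return any(v >= t for v in values)


print(locally_optimal_value(oracle, current=1.0, upper=100.0))   # just below 7.0
```

The number of SMT queries grows only logarithmically with (upper − current)/eps, whereas the naive method may need one query per strategy value.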

Complexity
In this section, we prove that the decision problem associated with our computation is $\Pi^p_2$-complete, i.e., complete for the second level of the polynomial-time hierarchy; hardness holds even if there is a single feedback vertex, a single real variable, and a single constraint in the template. It is therefore unsurprising that our algorithm exhibits exponential complexity in the worst case, by enumerating an exponential number of strategies; we then provide an artificial example on which this happens.

7.1. A Lower Bound on the Complexity. In this section we show that the problem of computing abstract semantics of programs w.r.t. the interval domain is $\Pi^p_2$-hard. $\Pi^p_2$-hard problems are conjectured to be harder than both NP-complete and coNP-complete problems. For further information regarding the polynomial-time hierarchy see, for instance, Papadimitriou [50] or Stockmeyer [61]. Proof. We reduce the $\Pi^p_2$-complete problem of deciding the truth of a $\forall^*\exists^*$ propositional formula [63] to our static analysis problem. Let

$$\Phi \equiv \forall x_1, \ldots, x_n.\ \exists y_1, \ldots, y_m.\ \Phi' \tag{7.1}$$

be a formula without free variables, where Φ′ is a propositional formula. We consider the analysis of the following pseudo-C program, where n is a constant: … In intuitive terms: this program initializes the program variable x to 0. Then it enters a loop: it computes into $x_1, \ldots, x_n$ the binary decomposition of x and non-deterministically chooses $y_1, \ldots, y_m$. If Φ′ is true, it increments x by one and loops, unless x reaches $2^n$, in which case it terminates; otherwise, it just loops. Thus, there exists a terminating computation if and only if Φ holds.
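The intuition behind the reduction can be sanity-checked on tiny formulas by brute force. The Python sketch below simulates the concrete loop (not the abstract analysis), resolving the non-deterministic choice of $y_1, \ldots, y_m$ by existential search, and compares the outcome with a direct evaluation of $\forall x\, \exists y.\ \Phi'$; the formulas are given as hypothetical Python predicates on bit tuples, and all names are illustrative.

```python
from itertools import product


def forall_exists(phi, n, m):
    """Truth of  forall x in {0,1}^n . exists y in {0,1}^m . phi(x, y)  by brute force."""
    return all(any(phi(x, y) for y in product((0, 1), repeat=m))
               for x in product((0, 1), repeat=n))


def counter_reaches_exit(phi, n, m):
    """Does some execution of the loop leave it with x >= 2**n (program point 2)?"""
    x = 0
    while x < 2 ** n:
        bits = tuple((x >> (n - 1 - k)) & 1 for k in range(n))   # x_1 is the most significant bit
        if not any(phi(bits, y) for y in product((0, 1), repeat=m)):
            return False    # no choice of y increments x: the loop can never be left
        x += 1
    return True


def xor_witness(x, y):      # for every x some y works: the formula is true
    return (x[0] ^ x[1]) == y[0]


def needs_x1(x, y):         # fails for x_1 = 0: the formula is false
    return x[0] == 1


for phi in (xor_witness, needs_x1):
    print(phi.__name__, forall_exists(phi, 2, 1), counter_reaches_exit(phi, 2, 1))
```

In both cases the two answers coincide, matching the claim that a terminating computation exists exactly when Φ holds.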
We reformulate the above pseudo-C program into the program $G = (N, E, st)$ that uses only one program variable x, where (1) $N = \{st, 1, 2\}$ is the set of program points, and (2) $E = \{(st, x = 0, 1), (1, s, 1), (1, x \geq 2^n, 2)\}$ is the set of control-flow edges, where … The statement $s(\Phi')$ is obtained by taking the formula Φ′ in negation normal form (all negations pushed to the leaves), leaving the Boolean structure in place, and replacing each positive literal z by a test $z = 1$ and each negative literal $\neg z$ by a test $z = 0$. With this formalization, Φ holds if and only if $V[2] \neq \emptyset$. For the abstraction, we consider the interval domain, or even simply the domain of upper bounds on x (i.e., we have constraints of the form $x \leq b$). By considering the Kleene iteration, it is easy to see that $V^\sharp[2]$ …

7.2. An Example with Exponential Running Time Behavior. Recall that the number of strategy improvement steps is exponentially bounded by the size of the input. Each step consists of one phase of SMT solving for linear real arithmetic followed by solving a linear program of polynomial size. Thus, each step can be performed in exponential time, and therefore the whole algorithm can be executed in exponential time.
We shall now see that our algorithm takes exponential time on instances that are similar to the instances generated by the reduction in the proof of Theorem 7.1. The instances generated by the reduction require $\Theta(2^n)$ steps. However, the input is of size $O(n^2)$, because the numbers $2^{n-1}, 2^{n-2}, \ldots, 2^0$ require space $\Theta(n^2)$. We modify the instances generated by the reduction in such a way that the sizes of the programs are in $O(n)$. We achieve this by introducing auxiliary variables for the numbers $2^{n-1}, 2^{n-2}, \ldots, 2^0$. For all $n \in \mathbb{N}$, we define the program $G_n = (N, E, st)$, where … with … Here, $x_1$ is the only program variable. It is sufficient to use the template constraint matrix $T = (1)$, which corresponds to the template $x_1$. That is, we are only interested in the upper bound on the value of the variable $x_1$. Note that the strategy iteration does not depend on the strategy improvement operator in use: at any time there is exactly one possible improvement, until the least solution is reached. All strategies for the statement s will be encountered. Thus, the strategy improvement algorithm performs $2^n$ strategy improvement steps. Since the size of $G_n$ is $\Theta(n)$, exponentially many strategy improvement steps are performed.
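The size argument can be checked numerically. Writing $2^{n-1}, \ldots, 2^0$ as literal binary constants costs $1 + 2 + \cdots + n = n(n+1)/2$ bits, whereas a chain of doubling constraints over auxiliary variables has n constraints with constant-size coefficients. In the Python sketch below, the variable names p0, p1, … and the constraint strings are a hypothetical illustration of that idea, not the definition of $G_n$.

```python
def literal_size(n):
    """Bits needed to write 2**(n-1), ..., 2**0 as literal binary constants."""
    return sum((2 ** k).bit_length() for k in range(n))      # = 1 + 2 + ... + n


def doubling_constraints(n):
    """Linear-size alternative: auxiliary variables p0, ..., p(n-1) with p(k) = 2**k."""
    return ["p0 = 1"] + [f"p{k + 1} = 2 * p{k}" for k in range(n - 1)]


print(literal_size(8), len(doubling_constraints(8)))   # 36 8
```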

7.3. An Upper Bound on the Complexity. In Section 7.1, we have provided a lower bound on the complexity of computing abstract semantics w.r.t. the template linear constraint domains. The associated decision problem is not only $\Pi^p_2$-hard, but in fact $\Pi^p_2$-complete:

Theorem 7.2. The problem of deciding whether, for a given program G, a given template constraint matrix T, and a given program point v, $V^\sharp[v] > -\infty$ holds, is in $\Pi^p_2$.

Proof. We consider the negation of the above problem: for a given program G, a given template constraint matrix T, a given program point v, and a given $i \in \{1, \ldots, m\}$, decide whether $V^\sharp_{i\cdot}[v] = -\infty$; we shall now show that this problem is in $\Sigma^p_2$. In non-deterministic polynomial time we can guess a max-strategy σ for $E := E(G, T)$ and a set $X_\infty$ of variables that have the value ∞; these form the witness for the initial existential quantifier. We can evaluate the max-strategy σ w.r.t. the set $X_\infty$ of variables assigned to +∞ in polynomial time using linear programming (cf. Subsection 6.4). Let $\rho_{\sigma,X_\infty}$ denote the resulting variable assignment.
We shall now show that checking whether this strategy (and set of infinite variables) is stable is in co-NP. By Lemma 5.3, we can use an NP oracle to check whether there exists an improvement of the strategy $\sigma$ w.r.t. $\rho_{\sigma,X^\infty}$, which is exactly the negation of stability.
If the strategy is stable, we know that $\rho_{\sigma,X^\infty} \ge \mu E$ holds. Therefore, by Lemma 6.1, we have $\rho_{\sigma,X^\infty}(x_{v,i}) \ge V_{i\cdot}[v]$ for all program points $v \in N$ and all $i \in \{1, \dots, m\}$. Since we …

… one could use the information obtained from the previously solved (similar) linear programs. One can also use the information obtained from the SMT solver to obtain a feasible basis with which to start the simplex method.

Conclusion and further research directions
We have proposed a method for computing least fixpoints in template linear constraint domains (e.g., Cartesian products of intervals) of transition systems specified using linear real arithmetic formulas. This allows finding the strongest invariant, within such a domain, of a loop consisting only of linear assignments and non-strict linear inequalities over the real numbers.
Because it distinguishes individual paths in the program, our method does not suffer from the imprecision induced by convex hull operations. These paths are obtained on demand, as results of satisfiability tests, which avoids memory blowup. Our technique, however, has exponential worst-case complexity, which is hardly surprising since the decision problem associated with our computation is $\Pi^p_2$-complete. Due to limited resources, we have so far not been able to implement it in a tool capable of running on real examples.
Clearly, due to the use of SMT queries, the size of the problems given as input, and their branching structure, must be limited. One method for limiting the size of the SMT formulas is to decompose the program into statements, thus adding more points at which states are abstracted, as proposed by [49]: this simplifies the problem but may reduce precision. Another method is to restrict the analysis to a subset of the variables, determined by some form of dependency analysis.
The restriction to linear templates and linear statements may seem onerous. It might be possible to apply the same ideas to non-linear templates [30]. With respect to non-linear statements, one possibility is to linearize them [44,46]: in short, assuming $A \le x \le B$ where $A$ and $B$ are constants, the nonlinear constraint $z = xy$ may be abstracted by the linear constraint $(Ay \le z \le By \wedge y \ge 0) \vee (By \le z \le Ay \wedge y < 0)$. If the assumptions made by the linearization are found not to hold for the fixpoint computed by the max-strategy iteration technique, one has to relax these assumptions and restart the solving process.
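The soundness of this linearization can be spot-checked numerically. The Python sketch below (function names hypothetical) encodes the disjunction $(Ay \le z \le By \wedge y \ge 0) \vee (By \le z \le Ay \wedge y < 0)$ for constants $A \le B$ and checks on random samples with $A \le x \le B$ that $z = xy$ always satisfies it. Random testing is no proof; the underlying argument is simply that multiplying $A \le x \le B$ by $y$ preserves the inequalities when $y \ge 0$ and reverses them when $y < 0$.

```python
import random


def linearized_product(A, B, y, z):
    """Linear relaxation of z = x*y under the assumption A <= x <= B.

    Both disjuncts are linear in (y, z) because A and B are constants:
    (A*y <= z <= B*y and y >= 0) or (B*y <= z <= A*y and y < 0).
    """
    if y >= 0:
        return A * y <= z <= B * y
    return B * y <= z <= A * y


def sample_soundness(A, B, trials=10_000, seed=0):
    """Check on random samples that every true product x*y satisfies the relaxation."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Clamp, since random.uniform may round onto (or past) an end point.
        x = min(max(rng.uniform(A, B), A), B)
        y = rng.uniform(-100.0, 100.0)
        if not linearized_product(A, B, y, x * y):
            return False
    return True


print(sample_soundness(-3.0, 5.0))  # True
```

Floating-point rounding to nearest is monotone, so the exact inequalities between the products survive rounding and the check does not produce spurious failures.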
More generally, one may envision a nesting of two iteration schemes: the inner scheme solving exactly, using max-strategy iteration, a simplification of the concrete program, and the outer scheme iterating over possible simplifications. The outer scheme would deal with all program features not supported by our max-strategy iteration algorithm. Consider pointers, for instance: the outer scheme could temporarily assume that x and y may be aliased while z is not aliased with anything, and then rewrite the program according to these assumptions in order to obtain a pointer-free program (may-alias information becomes non-deterministic choice, while must-aliased variables are merged). This outer iteration may be ascending and optimistic, starting with strong assumptions on the program and relaxing them progressively as the results of the inner scheme invalidate them, or descending and pessimistic, starting with weak assumptions and strengthening them progressively as the results of the inner scheme show them to be too conservative. Such mixed approaches would cope with program features not directly supported by our max-strategy iteration solver. Further work is needed in this direction to ascertain which techniques are usable.
Another problem is finding suitable templates: while there are obvious choices in some cases (intervals for obtaining rough invariants of control applications, difference bounds for scheduling applications, etc.), there is no generic method for obtaining good templates. Amato et al. [2] proposed finding templates using principal component analysis, but it is as yet unclear whether this approach is suited to practical problems. A simple solution may be to run a conventional polyhedral analysis and to keep the directions of the polyhedra obtained before widening.
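To make the notion of a template concrete, the Python sketch below builds template constraint matrices row by row, one row per linear form whose upper bound is tracked. It covers the interval and difference-bound templates mentioned above, plus a hypothetical helper `directions_of` that keeps the normalized directions of a polyhedron given as integer constraints $a \cdot x \le b$. The polyhedral analysis that would produce those constraints is assumed, not shown.

```python
from functools import reduce
from math import gcd


def interval_template(n):
    """Rows +e_i and -e_i: template constraints x_i <= c and -x_i <= c."""
    rows = []
    for i in range(n):
        for sign in (1, -1):
            row = [0] * n
            row[i] = sign
            rows.append(row)
    return rows


def difference_template(n):
    """Rows e_i - e_j for i != j: difference bounds x_i - x_j <= c."""
    rows = []
    for i in range(n):
        for j in range(n):
            if i != j:
                row = [0] * n
                row[i], row[j] = 1, -1
                rows.append(row)
    return rows


def directions_of(constraints):
    """Template rows from integer constraints (a, b) meaning a . x <= b.

    The bound b is dropped, each direction is divided by the gcd of its
    coefficients, and duplicate directions are removed (order preserved).
    """
    seen, rows = set(), []
    for a, _b in constraints:
        g = reduce(gcd, (abs(c) for c in a), 0)
        if g == 0:
            continue  # 0 <= b carries no direction
        row = tuple(c // g for c in a)
        if row not in seen:
            seen.add(row)
            rows.append(list(row))
    return rows
```

Keeping only directions, not bounds, is what makes the result a template: the max-strategy iteration then recomputes the best bounds for these directions itself.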
Our max-strategy iteration algorithms deal only with real numerical values. We can cope with integers by relaxing them to reals, with the usual precautions (e.g., $x < y$ is converted to $x \le y - 1$). Another possible extension is to integrate Boolean types, or more generally finitely enumerated types, into the invariant, or equivalently, to insert them implicitly into the control flow.
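The integer precaution generalizes slightly: if all coefficients are integers, then $a \cdot x$ is an integer for integral $x$, so $a \cdot x < b$ is equivalent to $a \cdot x \le \lceil b \rceil - 1$. A small Python sketch of this rewriting (the function name `tighten_strict` is hypothetical), followed by a brute-force check of the case $x < y$:

```python
import math
from fractions import Fraction


def tighten_strict(coeffs, bound):
    """Rewrite a . x < bound as a . x <= b' for integer-valued variables x.

    With integer coefficients, a . x is an integer whenever x is integral,
    so a . x < bound holds exactly when a . x <= ceil(bound) - 1.
    """
    if not all(isinstance(c, int) for c in coeffs):
        raise ValueError("integer coefficients are required")
    return list(coeffs), math.ceil(Fraction(bound)) - 1


# x < y, written as x - y < 0, becomes x - y <= -1, i.e. x <= y - 1.
coeffs, b = tighten_strict([1, -1], 0)
assert all((x < y) == (x - y <= b) for x in range(-5, 6) for y in range(-5, 6))
```

After this rewriting only non-strict inequalities remain, which is the form the real-valued analysis handles.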
An intriguing extension of our framework is the case where the control flow is specified implicitly. The problem considered in this article is expressed as a control flow graph given by a list of nodes and statements on the transitions. Now consider adding $n$ Boolean variables to the system: a common method of encoding such variables in a transition system is to distinguish every combination of Boolean values at every control node, which multiplies the number of control nodes by $2^n$. Clearly, we would prefer to work directly on the transition relation of the original program, which would include free Boolean variables encoding the departure and arrival control states, and to consider our abstract reachability problem on programs expressed using this succinct representation. Since this problem includes Boolean reachability (also known as the reachability problem for succinctly represented graphs), which is PSPACE-complete [51], it is PSPACE-hard. Our strategy iteration approach can be extended to show that it is in coNEXPTIME. We conjecture that it is coNEXPTIME-complete, but we have so far not been able to prove it. It is also unknown whether practically useful algorithms, perhaps based on binary decision diagrams (BDDs), could be devised for this problem.
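The blowup of the explicit encoding is easy to see by enumerating the product nodes. In the Python sketch below, the control node names and Boolean variable names are hypothetical; each control node is paired with every one of the $2^n$ valuations of the Boolean variables.

```python
from itertools import product


def explicit_nodes(control_nodes, bool_vars):
    """Explicit encoding: one node per (control node, Boolean valuation) pair."""
    valuations = [
        dict(zip(bool_vars, bits))
        for bits in product((False, True), repeat=len(bool_vars))
    ]
    return [(v, val) for v in control_nodes for val in valuations]


nodes = explicit_nodes(["st", "loop", "exit"], ["b1", "b2", "b3", "b4"])
print(len(nodes))  # 3 * 2**4 = 48
```

A succinct representation would instead keep the three control nodes and let the four Boolean variables appear as free variables of the transition formula.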

3.1. Control Flow Graphs and Collecting Semantics. In this article, we model programs as control flow graphs, i.e., a program $G$ is a triple $(N, E, st)$, where (1) $N$ is a finite set of program points, (2) $E \subseteq N \times \mathit{Stmt} \times N$ is a finite set of control flow edges, and (3) $st \in N$ is the start program point.
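As an illustration of the triple $(N, E, st)$, here is a minimal Python sketch. Statements are represented as opaque strings, an assumption made only for this example (the article's statements are formulas over program variables); the class name `Program`, the method `outgoing`, and the example loop are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    """A control flow graph G = (N, E, st)."""

    nodes: frozenset  # N: finite set of program points
    edges: frozenset  # E: triples (u, s, v) with s a statement
    start: str        # st: the start program point

    def __post_init__(self):
        # Enforce st in N and E a subset of N x Stmt x N.
        if self.start not in self.nodes:
            raise ValueError("st must belong to N")
        for u, s, v in self.edges:
            if u not in self.nodes or v not in self.nodes:
                raise ValueError(f"edge ({u!r}, {s!r}, {v!r}) leaves N")

    def outgoing(self, u):
        """Pairs (s, v) for all edges (u, s, v) in E."""
        return {(s, v) for (w, s, v) in self.edges if w == u}


# Hypothetical loop: x1 := 0, then repeat x1 := x1 + 1 while x1 <= 9.
G = Program(
    nodes=frozenset({"st", "1"}),
    edges=frozenset({("st", "x1 := 0", "1"), ("1", "x1 <= 9; x1 := x1 + 1", "1")}),
    start="st",
)
```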

Figure 3: The program $G_1$ of the running example

5.1. Abstract Semantics of Statements. Our first observation is that, for all sequential statements $s$ and all $d \in \mathbb{R}^m$, $\llbracket s \rrbracket^\sharp(d)$ can be computed efficiently. Lemma 5.1 (Sequential Statements). Let $s$ be a sequential statement and $d \in \mathbb{R}^m$. The operator $\llbracket s \rrbracket^\sharp$ is a point-wise minimum of finitely many monotone and weak-affine operators. For all $d \in \mathbb{R}^m$, $\llbracket s \rrbracket^\sharp(d)$ can be computed in polynomial time through linear programming.
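For one special case the linear program of Lemma 5.1 has a closed-form solution, which the following Python sketch uses instead of an LP solver. If the template is the interval template (upper and lower bounds on each variable) and the statement is an unguarded affine assignment $x' := Ax + b$, then the maximum of each linear form over a box decomposes coordinate by coordinate, so the best interval transformer is ordinary interval arithmetic. Guards, which make the operator a point-wise minimum, are not handled; the function name `affine_box_image` is hypothetical.

```python
import math


def affine_box_image(A, b, lo, hi):
    """Best interval transformer for the unguarded assignment x' := A x + b.

    lo and hi are the corners of a nonempty input box (entries may be
    -inf / +inf). Each output bound is the sum of per-coordinate extremes,
    which is exact because the maximum of a linear form over a product of
    intervals decomposes coordinate by coordinate.
    """
    new_lo, new_hi = [], []
    for row, bi in zip(A, b):
        l = h = bi
        for a, lj, hj in zip(row, lo, hi):
            if a == 0:
                continue  # skip to avoid 0 * inf = nan
            p, q = a * lj, a * hj
            l += min(p, q)
            h += max(p, q)
        new_lo.append(l)
        new_hi.append(h)
    return new_lo, new_hi


# x' := x + y, y' := 2x, starting from x in [0, 1], y in [-1, 3]:
print(affine_box_image([[1, 1], [2, 0]], [0, 0], [0, -1], [1, 3]))
# ([-1, 0], [4, 2])

# Unbounded inputs propagate: x' := x + 1 with x in [0, +inf).
print(affine_box_image([[1]], [1], [0], [math.inf]))  # ([1], [inf])
```

With guards or non-interval templates the feasible region is no longer a box, and the general linear-programming formulation is needed.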

Lemma 5.2 (Merge-Simple Statements). Let $s$ be a merge-simple statement. The operator $\llbracket s \rrbracket^\sharp$ is a point-wise maximum of finitely many point-wise minima of finitely many monotone and weak-affine mappings. For all $d \in \mathbb{R}^m$, $\llbracket s \rrbracket^\sharp(d)$ can be computed in polynomial time through linear programming. Proof. Let $s \equiv s_1 \vee \dots \vee s_k$, where $s_1, \dots, s_k$ are sequential statements. Since $\llbracket s \rrbracket^\sharp(d) = \llbracket s_1 \rrbracket^\sharp(d) \vee \dots \vee \llbracket s_k \rrbracket^\sharp(d)$, Lemma 5.1 can be applied to provide us with the desired result.
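The proof of Lemma 5.2 amounts to taking a point-wise maximum of the bound vectors computed for the disjuncts. In the Python sketch below, the template is $T = (1; -1)$ over a single variable $x$, so a bound vector $(u, u')$ means $x \le u$ and $-x \le u'$. The helper `translate_bounds` (hypothetical) is the exact abstract semantics of the sequential statement $x := x + c$ for this template, and $-\infty$ marks a disjunct with an empty result.

```python
import math


def translate_bounds(d, c):
    """Hypothetical helper: bounds after x := x + c for the template T = (1; -1).

    From x <= u and -x <= u' we get x + c <= u + c and -(x + c) <= u' - c.
    """
    u, u_neg = d
    return [u + c, u_neg - c]


def join_bounds(*branches):
    """Point-wise maximum of the bound vectors of s_1, ..., s_k."""
    if not branches:
        raise ValueError("at least one branch is required")
    return [max(column) for column in zip(*branches)]


# s = (x := x + 1) or (x := x - 1), starting from 0 <= x <= 5.
d = [5, 0]
after = join_bounds(translate_bounds(d, 1), translate_bounds(d, -1))
print(after)  # [6, 1], i.e. -1 <= x <= 6
```

Because $-\infty$ is neutral for the maximum, an infeasible disjunct simply drops out of the join.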
equations of $E$ are of the form $b = \llbracket s \rrbracket^\sharp_{k\cdot}(b_1, \dots, b_m)$, where $b, b_1, \dots, b_m$ are $\mathbb{R}$-valued variables and $s$ is a sequential statement. Thus, by Lemma 5.1, the right-hand sides are point-wise minima of finitely many monotone and weak-affine functions. Hence, they are monotone and concave. Therefore, (6.25) represents a convex optimization problem. This convex optimization problem is of a very special form: the right-hand sides are parameterized linear programs. Consequently, it can be rewritten into an equivalent linear programming problem as follows. In accordance with (5.1) and (5.2), in $E$, we replace each equation $b = \llbracket s \rrbracket^\sharp_{k\cdot}(b_1, \dots, b_m)$ with the following linear constraints: $b \le T_{k\cdot}(y_1, \dots, y_n)$

Theorem 7.1. The problem of deciding whether, for a given program $G$, a given template constraint matrix $T$, and a given program point $v$, $V[v] > -\infty$ holds, is $\Pi^p_2$-hard. The problem remains $\Pi^p_2$-hard even if the program variables are abstracted at a single program point and the template constraint matrix $T$ is restricted to a single variable $x$ and a single constraint of the form $x \le B$.
3.3. Abstract Semantics. Let $D$ be a complete lattice (for instance, the complete lattice of all $n$-dimensional closed real intervals). Assume that $\alpha : 2^{\mathbb{R}^n} \to D$ and $\gamma : D \to 2^{\mathbb{R}^n}$ form a Galois connection, i.e., for all $X \subseteq \mathbb{R}^n$ and all $d \in D$, $\alpha(X) \le d$ if and only if $X \subseteq \gamma(d)$. The abstract semantics $\llbracket s \rrbracket^\sharp : D \to D$ of a statement $s$ is then defined by $\llbracket s \rrbracket^\sharp := \alpha \circ \llbracket s \rrbracket \circ \gamma$. (3.6) Note that we have chosen to use the best abstract transformer, i.e., the most precise abstract semantics. All that is needed for soundness is that $\llbracket s \rrbracket(\gamma(d)) \subseteq \gamma(\llbracket s \rrbracket^\sharp(d))$ for all $d \in D$. Our choice of $\llbracket s \rrbracket^\sharp(d)$, however, is the most precise sound value.
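For the interval domain mentioned above, the Galois condition can be tested on finite sets of points. The Python sketch below (all names hypothetical) represents a box by its lower and upper corners, uses `None` for the empty box, and checks $\alpha(X) \le d \iff X \subseteq \gamma(d)$ on random finite sets $X$. Since $\gamma(d)$ is infinite, this does not compute the best transformer $\alpha \circ \llbracket s \rrbracket \circ \gamma$ itself; that requires reasoning about the whole box, e.g. by linear programming.

```python
import random


def alpha(points):
    """Abstraction: smallest box containing a finite point set; None is the empty box."""
    pts = list(points)
    if not pts:
        return None
    n = len(pts[0])
    lo = tuple(min(p[i] for p in pts) for i in range(n))
    hi = tuple(max(p[i] for p in pts) for i in range(n))
    return lo, hi


def in_gamma(point, box):
    """Concretization as a membership test: is point an element of gamma(box)?"""
    if box is None:
        return False
    lo, hi = box
    return all(l <= x <= h for x, l, h in zip(point, lo, hi))


def box_leq(b1, b2):
    """Box order (non-None boxes assumed to satisfy lo <= hi)."""
    if b1 is None:
        return True
    if b2 is None:
        return False
    (lo1, hi1), (lo2, hi2) = b1, b2
    return all(a >= c for a, c in zip(lo1, lo2)) and all(a <= c for a, c in zip(hi1, hi2))


def galois_holds(points, box):
    """alpha(X) <= d  iff  X is a subset of gamma(d), for a finite list X."""
    return box_leq(alpha(points), box) == all(in_gamma(p, box) for p in points)


rng = random.Random(0)
for _ in range(1000):
    X = [(rng.randint(-3, 3), rng.randint(-3, 3)) for _ in range(rng.randint(0, 4))]
    lo = (rng.randint(-3, 3), rng.randint(-3, 3))
    hi = tuple(l + rng.randint(0, 3) for l in lo)
    assert galois_holds(X, (lo, hi))
```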
$\Psi(s_1 \vee s_2) :\equiv (\neg a_{s_1 \vee s_2} \wedge \Psi(s_1)) \vee (a_{s_1 \vee s_2} \wedge \Psi(s_2))$ (6.41) Here, for every sub-formula $s_1 \vee s_2$ of $s$, $a_{s_1 \vee s_2}$ is a fresh Boolean variable. The set of free variables of the formula $\Psi(s)$ is $\{x, x'\} \cup \{a_{s_1 \vee s_2} \mid s_1 \vee s_2 \text{ is a sub-formula of } s\}$. (6.42) The variables $x$ and $x'$ are $\mathbb{R}^n$-valued variables. By construction, $s[\bar{x}/x, \bar{x}'/x']$ is satisfiable if and only if $\Psi(s)[\bar{x}/x, \bar{x}'/x']$ is satisfiable, for all $\bar{x}, \bar{x}' \in \mathbb{R}^n$. That is, $s$ and $\Psi(s)$ describe the same relation. We therefore obtain the following lemma: Lemma 6.11. $\llbracket s \rrbracket^\sharp_{j\cdot}(d) > c$ if and only if $\Psi(s, d, j, c)$ is satisfiable. The difference between the formula $s$ and the formula $\Psi(s)$ is that the Boolean variables of $\Psi(s)$ additionally describe a path through the formula. More precisely, a valuation of the variables from the set $\{a_{s_1 \vee s_2} \mid s_1 \vee s_2 \text{ is a sub-formula of } s\}$ describes a path through $s$. Let $s$ be a statement, $d \in \mathbb{R}^m$, $j \in \{1, \dots, m\}$, and $c \in \mathbb{R} \cup \{-\infty\}$. Assume now that $\llbracket s \rrbracket^\sharp_{j\cdot}(d) > c$. Our next goal is to compute a max-strategy $\sigma$ for the statement $s$ such that $\sigma_{j\cdot}(d) > c$. By Lemma 6.11, there exists a model $M$ of $\Psi(s, d, j, c)$. We define the max-strategy $\sigma_M$ for the statement $s$ recursively by