Linear Programs with Conjunctive Database Queries

In this paper, we study the problem of optimizing a linear program whose variables are the answers to a conjunctive query. To this end, we propose the language LP(CQ) for specifying linear programs whose constraints and objective functions depend on the answer sets of conjunctive queries. We contribute an efficient algorithm for solving programs in a fragment of LP(CQ). The natural approach constructs a linear program with as many variables as there are elements in the answer sets of the queries. Our approach constructs a linear program having the same optimal value but fewer variables. This is done by exploiting the structure of the conjunctive queries, using generalized hypertree decompositions of small width to factorize elements of the answer set together. We illustrate the various applications of LP(CQ) programs on three examples: optimizing deliveries of resources, minimizing noise for differential privacy, and computing the s-measure of patterns in graphs as needed for data mining.


Introduction
When modeling optimization problems it often seems natural to separate the logical constraints from the relational data. This holds for linear programming with AMPL [FGK90] and for constraint programming in MiniZinc [NSB+07]. It was also noticed in the context of database research, when using integer linear programming for finding optimal database repairs as proposed by Kolaitis, Pema and Tan [KPT13], or when using linear optimization to explain the result of a database query to the user as proposed by Meliou and Suciu [MS12]. Moreover, tools like SolveDB [ŠP16] have been developed to better integrate mixed integer programming, and thus linear programming, into relational databases.
We also find it natural to define the relational data of linear optimization problems by database queries. For this reason, we propose the language of linear programs with conjunctive queries LP(CQ_Σ) in the present paper. An LP(CQ_Σ) program is a linear program with constructs for expressing linear constraints and linear sums over the weightings of the answer sets of database queries. It hence allows us to express an optimization problem with a linear objective function subject to linear constraints that are parameterized by conjunctive queries. To do so, we define the natural interpretation ⟨L⟩^D of an LP(CQ_Σ) program L over a database D, which is a linear program whose variables are in correspondence with the answer sets of the queries of L; hence, ⟨L⟩^D expresses an optimization problem whose solutions are weightings of the answer sets of conjunctive queries. The optimal weightings of LP(CQ_Σ) programs can be computed in a natural manner, by first answering the database queries, then generating the interpretation of L over D, and finally solving it by calling a linear solver. We then approach the question, to our knowledge for the first time, of whether this can be done with lower complexity for subclasses of conjunctive queries such as the class of acyclic conjunctive queries.
As our main contribution, we present a more efficient algorithm for computing the optimal value of LP(CQ_Σ) programs that is able to exploit hypertree decompositions of the queries to speed up the computation. Our algorithm operates in two phases: first, it unfolds the universal quantifiers present in LP(CQ_Σ) programs to generate a program in a more restrictive language that we call LP_clos(CQ_Σ). Then, the algorithm exploits a hypertree decomposition to construct an alternative interpretation of an LP_clos(CQ_Σ) program over a database that we call the factorized interpretation. The factorized interpretation is a linear program having the same optimal value as the linear program resulting from the natural interpretation, while being more succinct. It uses different linear program variables that intuitively represent sums of the linear program variables of the natural interpretation. The number of linear program variables in the factorized interpretation depends only on the fractional hypertree width of the hypertree decompositions of the queries provided in the input, rather than on the number of query variables. In this manner, our more efficient algorithm can decrease the data complexity, i.e., the degree of the polynomial in the upper bound on the run time of the naive algorithm based on computing the natural interpretation and solving it with a linear program solver. With respect to combined complexity, even solving LP_clos(CQ_Σ) programs is NP-hard and coNP-hard in general, but our approach shows that some cases are tractable.
We prove the correctness of the factorized interpretation with respect to the natural interpretation, that is, the fact that the factorized and natural interpretations generate linear programs with the same optimal value, by exhibiting a correspondence between weightings of answer sets in the natural interpretation and weightings of answer sets in the factorized interpretation. This correspondence can be seen as an independent contribution, as it shows that one can reconstruct a relevant weighting of the answer set of a quantifier free conjunctive query by knowing only the values of the projected weighting on the bags of a tree decomposition. Conjunctive queries with existential quantifiers are dealt with by showing that one can find an equivalent LP(CQ_Σ) program with quantifier free conjunctive queries only.
1.1. A Concrete Example. We start by illustrating the language LP(CQ_Σ) on an example.
Resource Delivery Optimization. We consider a situation in logistics where a company received orders for specific quantities of resource objects. The objects must be produced at a factory, then transported to a warehouse before being delivered to the buyer. The objective is to fulfill every order while minimizing the overall delivery costs and respecting the production capacities of the factories as well as the storing capacities of the warehouses.
Let F be the set of factories, O the set of objects, W the set of warehouses and B the set of buyers. We consider a database D with elements in the domain F ∪ O ∪ W ∪ B ∪ R_+. The database D has four tables. The first table prod^D ⊆ F × O × R_+ contains triples (f, o, q) stating that the factory f can produce up to q units of object o. The second table order^D ⊆ B × O × R_+ contains triples (b, o, q) stating that the buyer b orders q units of object o. The third table store^D ⊆ W × R_+ contains pairs (w, l) stating that the warehouse w has a storing limit of l. The fourth table route^D ⊆ (F × W × R_+) ∪ (W × B × R_+) contains triples (f, w, c) stating that the transport from factory f to warehouse w costs c, and triples (w, b, c) stating that the transport from warehouse w to buyer b costs c. The query

dlr(f, w, b, o) = ∃q. ∃q_2. ∃c. ∃c_2. prod(f, o, q) ∧ order(b, o, q_2) ∧ route(f, w, c) ∧ route(w, b, c_2)

selects from the database D all tuples (f, w, b, o) such that the factory f can produce some object o to be delivered to buyer b through the warehouse w. Let Q = dlr(f′, w′, b′, o′) and let sol^D(Q) be the answers of Q on database D. The goal is to determine for each of these possible deliveries the quantity of the object that should actually be sent. These quantities are modeled by the unknown weights θ^α_Q of the query answers α ∈ sol^D(Q). For any factory f and warehouse w, the sum Σ_{α ∈ sol^D(Q ∧ w′≐w ∧ f′≐f)} θ^α_Q is described by the expression weight_{f′≐f ∧ w′≐w}(Q) when interpreted over D. We use the LP(CQ_Σ) program in Figure 1 to describe the optimal weights that minimize the overall delivery costs. The weights depend on the interpretation of the program over the database, since D specifies the production capacities of the factories, the stocking limits of the warehouses, etc. The program has the following constraints:
- for each (f, o, q) ∈ prod^D, the overall quantity of object o produced by f is at most q;
- for each (b, o, q) ∈ order^D, the overall quantity of object o delivered to b is at least q;
- for each (w, l) ∈ store^D, the overall quantity of objects stored in w is at most l.
By answering the query Q on the database D and introducing a linear program variable θ^α_Q for each query answer α, we can interpret the LP(CQ_Σ) program in Figure 1 as a linear program. A solution to this linear program associates a real weight to each α, that is, to each tuple (f, w, b, o) that is a solution of Q over D. Intuitively, this weight is the quantity of object o that factory f has to produce and store in warehouse w before it is sent to buyer b. Moreover, these quantities are compatible with the constraints imposed on the capacity of each factory and warehouse and on the orders of each buyer. Hence an optimal solution of this linear program yields an optimal way of producing what is necessary while minimizing the transportation costs.
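To make the natural interpretation concrete, the following sketch materializes the answer set of dlr by a nested-loop join over a hypothetical toy instance of D (all table contents are invented for illustration) and allocates one LP variable θ_α per answer:

```python
# Hypothetical toy instance of the delivery database D (contents are invented).
prod  = {("f1", "widget", 10.0), ("f2", "widget", 5.0)}
order = {("b1", "widget", 8.0)}
route = {("f1", "w1", 2.0), ("f2", "w1", 3.0), ("f1", "w2", 1.0),
         ("w1", "b1", 4.0), ("w2", "b1", 2.0)}

def answers_dlr():
    """Answer set of dlr(f, w, b, o): join of prod, order and route,
    projecting away the existentially quantified quantities and costs."""
    sols = set()
    for (f, o, q) in prod:
        for (b, o2, q2) in order:
            if o2 != o:
                continue
            for (f2, w, c) in route:
                if f2 != f:
                    continue
                for (w2, b2, c2) in route:
                    if w2 == w and b2 == b:
                        sols.add((f, w, b, o))
    return sols

# The natural interpretation has one LP variable theta_alpha per query answer.
theta = {alpha: 0.0 for alpha in answers_dlr()}
```

On this toy instance, there are three possible delivery routes, hence three variables; on realistic data the number of variables is the size of the answer set, which is what the factorized interpretation later avoids.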
Vol. 20:1 LINEAR PROGRAMS WITH CONJUNCTIVE QUERIES 9:5

1.2. Related Work. Our result builds on well-known techniques using dynamic programming on tree decompositions of the hypergraphs of conjunctive queries. These techniques were first introduced by Yannakakis [Yan81], who observed that so-called acyclic conjunctive queries can be answered in linear time using dynamic programming on a tree whose nodes are in correspondence with the atoms of the query. Generalizations have followed in two directions: on the one hand, generalizations of acyclicity such as notions of hypertree width [GLS02, GLS99, Gro06] have been introduced; on the other hand, enumeration and aggregation problems have been shown to be tractable on these families of queries, such as computing the size of the answer set [PS13] or enumerating it with small delay [BDG07]. These tractability results can be obtained in a unified and generalized way by using factorized databases, introduced by Olteanu and Závodný [OZ12, OZ15], from which our work is inspired. Factorized databases provide succinct representations for answer sets of queries on databases. The representation enjoys interesting syntactic properties that allow numerous aggregation problems on answer sets to be solved efficiently, in polynomial time in the size of the representation. Olteanu and Závodný [OZ15] have shown that when the fractional hypertree width of a query Q is bounded, one can construct, given a hypertree decomposition of Q and a database D, a factorized database of polynomial size representing the answers of Q on D. They also give an O(1)-delay enumeration algorithm on factorized databases. Combining both results yields a generalization of the result of Bagan, Durand and Grandjean [BDG07] on the complexity of enumerating the answers of conjunctive queries.
Our result draws heavily on this approach, as we use bottom-up dynamic programming on a hypertree decomposition of the input query Q to construct a partial representation of the answer set of Q on database D, which we later use to construct a factorized interpretation of the linear program to solve. While our approach could be made to work directly on factorized representations of query answer sets as defined by Olteanu and Závodný [OZ15], we choose to work directly on tree decompositions. One reason is that our factorized interpretation uses hypertree decompositions that are slightly more constrained than the ones usually used, in order to efficiently handle complex linear programs. Namely, our tree decompositions need extra bags for dependencies between variables that are not present in the query but only in the linear program. These constraints are not straightforward to translate into factorized databases, while they are natural on tree decompositions.
Comparison with the conference version. This paper is a longer version of [CCNR22]. We have improved the presentation of some results from the conference version, added new ones, and added the full proofs that were left to the appendix in the earlier version. Some clarifications were made through slight changes in the theoretical framework, which are described in the next paragraph. Our new contributions include:
• a precise complexity analysis of how one can solve linear programs in LP(CQ_Σ) depending on their structure, stated explicitly using the AGM bound and the fractional hypertree width of the queries involved in the linear program,
• new hardness results for the general case,
• a cleaner logical framework to describe linear programs over database queries.
The main change in the presentation comes from the introduction of the core language LP_clos(CQ_Σ), on which the factorized interpretation is described. It allows for a cleaner analysis of the complexity of our approach, where we can separate the interpretation of LP_clos(CQ_Σ) programs, now called the natural interpretation (the naive interpretation in the conference paper), from the unfolding of quantifiers in LP(CQ_Σ). Indeed, in the conference version, we started by unfolding quantifiers before performing the analysis. This unfolding is now made explicit by the closure operation on LP(CQ_Σ) programs, which produces an LP_clos(CQ_Σ) program that can then be solved using our techniques. It allows us to properly separate the unfolding phase from the interpretation phase and to describe their complexities independently. In particular, in the conference version of the paper, the tractability result holds only for a fragment of LP(CQ_Σ) where we are able to bound the size of the unfolding. Thanks to the introduction of LP_clos(CQ_Σ) and to a notion of normal form for LP(CQ_Σ) programs, we are now able to state precise complexity bounds for every program in LP(CQ_Σ) depending on its structure, without assuming anything
more. We also removed one constructor from the definition of LP(CQ_Σ). Indeed, in the conference version of the paper, LP(CQ_Σ) programs could use an expression of the form weight_{x:Q′}(Q) to generate a linear sum depending on both Q and Q′. However, our tractability results worked only when Q′ has a very particular form, namely, Q′ needed to be of the form x ≐ y. We hence removed this constructor from the definition of the LP_clos(CQ_Σ) language, which now has only constructors of the form weight_{x≐c}(Q) for a vector of database constants c. Consequently, we no longer need to introduce a fragment of the general language to recover tractability, since the tractable case is now the only one possible in LP_clos(CQ_Σ). It may appear that we lost some expressivity along the way, but it turns out that we can recover the same behavior using constructors of LP(CQ_Σ): namely, weight_{x:Q′}(Q) can now be expressed as the sum Σ_{y:Q′} weight_{x≐y}(Q).
1.3. Organization of the paper. Section 2 contains the necessary definitions to understand the paper. Section 3 presents the language LP_clos(CQ_Σ) of linear programs parameterized by conjunctive queries and gives its semantics by interpreting programs in LP_clos(CQ_Σ) as linear programs, which we call the natural interpretation. This language is very simple and does not allow the universal quantification used in Section 1.1. We show in Section 4 that one can exploit hypertree decompositions to compute the optimal value of LP_clos(CQ_Σ) programs efficiently by interpreting them as more succinct linear programs, via an interpretation that we call the factorized interpretation. The proof of soundness of this approach is delayed to Section 6, as it contains results on weightings of conjunctive queries that are of independent interest. We then proceed to define the language LP(CQ_Σ) in Section 5.1. This language is more expressive than LP_clos(CQ_Σ), as it allows universal quantification over the database, as hinted in the previous example. We give its semantics via a closure operation that transforms an LP(CQ_Σ) program into an LP_clos(CQ_Σ) program. We analyze the complexity of solving LP(CQ_Σ) programs and show how one can leverage the results on LP_clos(CQ_Σ) in Section 5.2. We present some preliminary experimental results in Section 5.3. Finally, Section 7 presents some applications of LP(CQ_Σ).

Preliminaries
Sets, Functions and Relations. Let B = {0, 1} be the set of Booleans, N the set of natural numbers including 0, R_+ the set of non-negative reals (which includes 0 and subsumes N), and R the set of all reals. Given any set S and n ∈ N, we denote by S^n the set of all n-tuples over S and by S^* = ∪_{n∈N} S^n the set of all words over S. A weighting on S is a (total) function f : S → R_+.
Given a set of (total) functions A ⊆ D^S = {f | f : S → D} and a subset S′ ⊆ S, we define the set of restrictions A|_{S′} = {f|_{S′} | f ∈ A}. Given a binary relation R ⊆ S × S, we denote its transitive closure by R^+ ⊆ S × S and its reflexive transitive closure by R^* ⊆ S × S.

Variable assignments. We fix a countably infinite set of (query) variables X. For any set D of database elements, an assignment of (query) variables to database elements is a function α : X → D that maps the elements of a finite subset of variables X ⊆ X to values of D. For any two sets of variable assignments A_1 ⊆ D^{X_1} and A_2 ⊆ D^{X_2}, we define their join A_1 ⋈ A_2 = {α ∈ D^{X_1 ∪ X_2} | α|_{X_1} ∈ A_1 and α|_{X_2} ∈ A_2}. We also use a few vector notations. Given a vector of variables x = (x_1, ..., x_n) ∈ X^n, we denote by set(x) = {x_1, ..., x_n} the set of the elements of x. For any variable assignment α : X → D with set(x) ⊆ X, we denote the application of the assignment α on x by α(x) = (α(x_1), ..., α(x_n)). We recall the formal semantics of linear programs in Figure 3.

Linear expressions S, S′ ∈ Le, linear constraints C ∈ Lc and linear programs L ∈ LP are built over a set Ξ of linear program variables.
Figure 3: Semantics of linear expressions, constraints and programs.
For any weighting w : Ξ → R_+, the value of a sum S ∈ Le is the real number eval_w(S) ∈ R. We denote the solution set of a constraint C ∈ Lc by ⟦C⟧ ⊆ {w | w : Ξ → R_+}.
The optimal value opt(L) ∈ R of a linear program L with objective function S and constraint C is opt(L) = sup{eval_w(S) | w ∈ ⟦C⟧} (for a maximization objective). The size |L| of a linear program L is defined to be the number of symbols needed to write it down. It is well-known that an optimal solution of a linear program L can be computed in polynomial time in |L| [Kar84].
Observe that we are only interested in non-negative weightings, without explicitly imposing positivity constraints. This is a usual assumption in linear programming, since it is well known that one can transform any linear program L into a program L′ of size at most 2|L| so that the feasible points of L′ over R_+ are exactly the feasible points of L over R, by simply replacing every occurrence of a variable x in L by x⁺ − x⁻.

2.2. Conjunctive Queries. A relational signature is a pair Σ = (R, C) where C is a finite set of constants ranged over by c and R = ∪_{n∈N} R^(n) is a finite set of relation symbols. The elements R ∈ R^(n) are called relation symbols of arity n ∈ N. Below we let x ∈ X, c ∈ C, and R ∈ R^(n).
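The x⁺ − x⁻ transformation described above can be sketched mechanically. The following minimal Python sketch (matrix encoding of the program is our own) duplicates, with opposite signs, the objective and constraint coefficients of each free variable, so the program at most doubles in size:

```python
def split_free_variables(c, A, free):
    """Replace each free variable x_j (j in `free`) by a pair of nonnegative
    variables x_j+ and x_j-, so that x_j = x_j+ - x_j-.  Each coefficient a of
    x_j is duplicated as (a, -a), the second entry being the coefficient of
    x_j-.  The resulting program has at most twice as many columns."""
    def split_row(row):
        out = []
        for j, a in enumerate(row):
            out.append(a)
            if j in free:
                out.append(-a)
        return out
    return split_row(c), [split_row(row) for row in A]

# minimize x subject to -x <= 3 (i.e. x >= -3), with x free:
c2, A2 = split_free_variables([1.0], [[-1.0]], {0})
```

Over non-negative variables, the split program minimizes x⁺ − x⁻ subject to −x⁺ + x⁻ ≤ 3, which has the same optimal value −3 as the original.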
In Figure 4 we recall the notion of conjunctive queries. An expression E is either a (query) variable x ∈ X or a constant c ∈ C. The set of conjunctive queries Q ∈ CQ_Σ is built from equations E_1 ≐ E_2, atoms R(E_1, ..., E_n), the logical operator of conjunction Q ∧ Q′, and existential quantification ∃x.Q. Given a vector x = (x_1, ..., x_n) ∈ X^n and a query Q, we write ∃x.Q instead of ∃x_1. ... ∃x_n.Q. For any sequence of constants c = (c_1, ..., c_n) ∈ C^n, we write x ≐ c for the conjunction x_1 ≐ c_1 ∧ ... ∧ x_n ≐ c_n; the empty conjunction is equal to true. The set of free variables fv(Q) ⊆ X contains those variables that occur in Q outside the scope of an existential quantifier. A conjunctive query Q is said to be quantifier free if it does not contain any existential quantifier. In the literature, such queries are sometimes also called full queries.
We can define operations that extend queries with additional free variables x without otherwise changing their answers. For any n ≥ 0, vector of constants c ∈ C^n and vector of variables x ∈ X^n, we define an operator subs_[x/c] on conjunctive queries that substitutes any variable in the vector x by the constant at the same position in the vector c, so that for all queries Q ∈ CQ_Σ, subs_[x/c](Q) is the query where all occurrences in Q of variables in x are replaced by the corresponding elements of c.

2.3. Databases. A relational Σ-structure D consists of a domain D, a relation R^D ⊆ D^n for each R ∈ R^(n) and n ≥ 0, and an element c^D ∈ D for each constant c ∈ C. We also define the structure's domain dom(D) = D. A (relational) database D is a finite relational Σ-structure, i.e., all its components are finite. We denote the set of all databases by db_Σ.
For any conjunctive query Q ∈ CQ_Σ, set X ⊇ fv(Q) and relational database D ∈ db_Σ, we define the answer set ⟦Q⟧^D_X in Figure 5. It contains all those assignments α : X → D for which Q becomes true on D. We define the semantics of a query by ⟦Q⟧^D = ⟦Q⟧^D_{fv(Q)}. In particular, observe that the semantics of an existential quantifier is the projection of the answer set: ⟦∃x.Q⟧^D is the restriction of ⟦Q⟧^D to fv(∃x.Q).

2.4. Hypertree Decompositions. Hypertree decompositions of conjunctive queries are a way of laying out the structure of a conjunctive query in a tree. They allow us to solve many aggregation problems (such as checking the existence of a solution, or counting or enumerating the solutions) on quantifier free conjunctive queries in polynomial time, where the degree of the polynomial is given by the width of the decomposition.
A digraph is a pair (V, E) with node set V and edge set E ⊆ V × V. A digraph is acyclic if there is no v ∈ V for which (v, v) ∈ E^+. For any node u ∈ V, we denote by ↓u = {v ∈ V | (u, v) ∈ E^*} the set of nodes in V reachable over some downwards path from u, and we define the context of u, denoted ↑u, by ↑u = (V \ ↓u) ∪ {u}. The digraph (V, E) is a forest if it is acyclic and for all u, u′, v ∈ V, (u, v), (u′, v) ∈ E implies u = u′. Moreover, (V, E) is a tree if there exists a node r ∈ V such that V = ↓r. In this case, r is unique and called the root of the tree. If ↓v = {v} for some v ∈ V, then v is called a leaf. Observe that in such a tree, the paths are oriented from the root to the leaves.

Definition 2.1. Let X ⊆ X be a finite set of variables. A decomposition tree T of X is a tuple (V, E, B) such that:
- (V, E) is a finite directed rooted tree with edges oriented from the root to the leaves,
- the bag function B : V → 2^X maps nodes to subsets of variables in X,
- for all x ∈ X, the subset of nodes {u ∈ V | x ∈ B(u)} is connected in the tree (V, E),
- each variable of X appears in some bag, that is, ∪_{u∈V} B(u) = X.
Now a hypertree decomposition of a quantifier free conjunctive query is a decomposition tree where, for each atom of the query, there is at least one bag that covers its variables.
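The conditions of Definition 2.1 are easy to check mechanically. The sketch below (pure Python; it assumes the edge set E contains no directed cycles) verifies the rooted-tree, subset, coverage and connectedness conditions for a candidate decomposition tree:

```python
def is_decomposition_tree(V, E, B, X):
    """Check the conditions of Definition 2.1 for a candidate (V, E, B) of X.
    E is a set of (parent, child) edges, assumed acyclic; B maps nodes to sets."""
    parent = {v: u for (u, v) in E}
    roots = [u for u in V if u not in parent]
    if len(roots) != 1 or len(parent) != len(E):   # unique root, unique parents
        return False
    if any(not B[u] <= X for u in V):              # bags are subsets of X
        return False
    if set().union(*(B[u] for u in V)) != X:       # every variable in some bag
        return False
    for x in X:                                    # connectedness condition:
        holders = {u for u in V if x in B[u]}      # nodes holding x must form a
        tops = [u for u in holders                 # subtree, i.e. have a unique
                if parent.get(u) not in holders]   # topmost element
        if len(tops) != 1:
            return False
    return True

# A valid decomposition tree of {x, y, z} with one bag per atom of R(x, y) /\ S(y, z):
ok = is_decomposition_tree({1, 2}, {(1, 2)},
                           {1: {"x", "y"}, 2: {"y", "z"}}, {"x", "y", "z"})
```

A variable appearing in two bags that are not connected through bags containing it (e.g. in the root and a grandchild but not in between) makes the check fail, as required by the connectedness condition.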
Definition 2.2 (Hypertree width of quantifier free conjunctive queries). Let Q ∈ CQ_Σ be a quantifier free conjunctive query. A generalized hypertree decomposition of Q is a decomposition tree T = (V, E, B) of fv(Q) such that for each atom R(x) of Q there is a node u ∈ V such that set(x) ⊆ B(u). The width of T with respect to Q is the minimal number k such that every bag of T can be covered by the variables of k atoms of Q. The generalized hypertree width of a query Q is the minimal width of a generalized hypertree decomposition of Q.
While hypertree width allows us to obtain efficient algorithms on conjunctive queries, our results also hold for the more general notion of fractional hypertree width, which is a fractional relaxation of hypertree width. Let Q ∈ CQ_Σ be a quantifier free conjunctive query, A be the set of atoms of Q, and X ⊆ fv(Q). A fractional cover of X is a function c : A → R_+ assigning non-negative weights to the atoms of Q such that for every x ∈ X, Σ_{R∈A, x∈fv(R)} c(R) ≥ 1. The value of a fractional cover c is defined as Σ_{R∈A} c(R).
For example, consider the query Triangle = R(x, y) ∧ S(y, z) ∧ T(z, x) and X = {x, y, z}. The function c such that c(R) = c(S) = c(T) = 1/2 is a fractional cover of X of value 3/2.

Definition 2.3. Let Q be a conjunctive query and T = (V, E, B) be a generalized hypertree decomposition of Q. The fractional hypertree width of T is the smallest k such that for every u ∈ V there exists a fractional cover of B(u) of value at most k. The fractional hypertree width of Q, denoted by fhtw(Q), is the smallest k such that Q has a generalized hypertree decomposition of fractional hypertree width k.
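The triangle example above can be checked mechanically. The sketch below (pure Python, with the atoms of the Triangle query hard-coded) verifies that the uniform assignment of 1/2 is a fractional cover of value 3/2:

```python
# Atoms of the Triangle query and the variables each of them contains.
atoms = {"R": {"x", "y"}, "S": {"y", "z"}, "T": {"z", "x"}}

def is_fractional_cover(c, X):
    """Every variable of X must receive total weight >= 1 from the atoms
    whose variable sets contain it."""
    return all(sum(c[a] for a, vs in atoms.items() if x in vs) >= 1 for x in X)

c = {"R": 0.5, "S": 0.5, "T": 0.5}
cover_ok = is_fractional_cover(c, {"x", "y", "z"})
value = sum(c.values())        # value of the cover: 3/2
```

Each variable occurs in exactly two atoms, so it receives weight 1/2 + 1/2 = 1, which is why the uniform 1/2 assignment covers X; a uniform 0.4 assignment would not.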
From now on, we will only write the width of T in place of the fractional hypertree width.
The key observation making fractional hypertree width suitable for algorithmic purposes is due to Grohe and Marx [GM14], who proved that if a quantifier free conjunctive query Q has a generalized hypertree decomposition of fractional hypertree width k, then ⟦Q⟧^D can be computed in time polynomial in |D|^k. Lemma 2.4 is folklore: it states that, given such a decomposition T = (V, E, B), the projections ⟦Q⟧^D|_{B(u)} for all nodes u ∈ V can be computed within a similar time bound. It can be proven by computing the semi-join of every bag in a subtree in a bottom-up fashion, as is done in [Lib13, Theorem 6.25], using a worst-case optimal join algorithm such as Triejoin [Vel14] for computing the relation at each bag. This yields a superset S_u of ⟦Q⟧^D|_{B(u)} for every u. Then, with a second top-down phase, one can remove the tuples from S_u that cannot be extended to a solution of ⟦Q⟧^D.
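The two-phase semi-join procedure sketched above can be illustrated on a toy acyclic query R(x, y) ∧ S(y, z) with a two-node decomposition, one bag per atom (data invented for illustration):

```python
def semijoin(R, S, shared):
    """Keep the tuples of R that agree with some tuple of S on the shared variables."""
    keys = {tuple(s[v] for v in shared) for s in S}
    return [r for r in R if tuple(r[v] for v in shared) in keys]

# Bags of a two-node decomposition of R(x, y) /\ S(y, z), with tuples stored
# as variable-to-value dicts.
bag_root  = [{"x": 1, "y": 2}, {"x": 3, "y": 4}]   # tuples of R
bag_child = [{"y": 2, "z": 5}]                     # tuples of S

# Bottom-up phase: prune the root bag by its child.
bag_root  = semijoin(bag_root, bag_child, ["y"])
# Top-down phase: prune the child bag by the (already pruned) root.
bag_child = semijoin(bag_child, bag_root, ["y"])
```

After both phases, every remaining tuple of every bag extends to a full solution of the query: here (3, 4) is removed from the root bag because no S-tuple matches y = 4.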
Following the previously mentioned upper bound of [GM14], Atserias, Grohe and Marx proved in [AGM13] that the bound given by an optimal fractional cover of Q is tight (up to polynomial factors). This bound is now usually referred to as the AGM bound. More precisely, it says that if AGM(Q) denotes the smallest value over all fractional covers of Q, then for every D, ⟦Q⟧^D is of size at most |D|^{AGM(Q)}, and there exists a database D for which ⟦Q⟧^D is of size at least |D|^{AGM(Q)}/poly(|Q|). Hence, even if Q is of width k, the size of ⟦Q⟧^D could be orders of magnitude bigger than |D|^k when k < AGM(Q). Hence, Lemma 2.4 gives a succinct way of describing the set of solutions of Q that we exploit in this paper.
Parts of our results will be easier to describe on so-called normalized decomposition trees.

Definition 2.5. Let T = (V, E, B) be a decomposition tree. We call a node u ∈ V of T:
- an extend node if it has a single child u′ and B(u′) ⊆ B(u),
- a project node if it has a single child u′ and B(u) ⊆ B(u′),
- a join node if it has exactly two children u′ and u′′ with B(u) = B(u′) = B(u′′).
We call T normalized if all its nodes in V are either extend nodes, project nodes, join nodes, or leaves.
It is well-known that tree decompositions can always be normalized without changing the width.Thus normalization does not change the asymptotic complexity of the algorithms.
Lemma 2.6 (Lemma 13.1.2 of [Klo94]). For every tree decomposition T = (V, E, B) of Q of width k, there exists a normalized tree decomposition T′ = (V′, E′, B′) of width k. Moreover, one can compute T′ from T in polynomial time.

Linear Programs with Closed Weight Expressions
In this section, we introduce the language LP_clos(CQ_Σ) to express linear programs parameterized by conjunctive queries. This language is deliberately kept simple, which allows us to design efficient algorithms for it. An element of LP_clos(CQ_Σ) is called a closed LP(CQ_Σ) program. We refer to such programs as "closed" because they do not contain quantification in the linear program part, in contrast with the more general definition of LP(CQ_Σ) given in Section 5, which allows us to express more interesting linear programs. The case of closed LP(CQ_Σ) programs is however central in this work, as this is the class of optimization problems for which we propose an efficient algorithm. The more general case of LP(CQ_Σ) programs is dealt with using a "closure" procedure which transforms any LP(CQ_Σ) program into a closed one.

Let Σ be a relational signature. A closed weight expression is an expression of the form weight_{x≐c}(Q) where Q is a conjunctive query, set(x) ⊆ fv(Q), the variables in x are pairwise distinct and c are database constants. An LP_clos(CQ_Σ) program is intuitively a linear program whose variables are closed weight expressions. A formal definition is given in Figure 6. Such linear programs can be interpreted as standard linear programs for any database D with numerical values. In order to do so, we fix for any query Q and database D a set of linear program variables Θ^D_Q = {θ^α_Q | α ∈ ⟦Q⟧^D}, one per query answer. We can then map each closed weight expression to a linear sum with variables in Θ^D_Q: weight_{x≐c}(Q) is mapped to the sum of all θ^α_Q over the answers α ∈ ⟦Q⟧^D with α(x) = c. The size |L| of a program L ∈ LP_clos(CQ_Σ) is defined to be the number of symbols needed to write it down.
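As a sketch of this mapping (with answers represented as Python dicts, an encoding of our own), the linear sum interpreting weight_{x≐c}(Q) collects exactly the variables θ_α whose answer agrees with c on x:

```python
def weight_sum_indices(answers, x, c):
    """Indices of the LP variables theta_alpha contributing to the
    interpretation of weight_{x = c}(Q): answers alpha with alpha(x) = c
    componentwise.  An empty selector selects every answer."""
    return [i for i, alpha in enumerate(answers)
            if all(alpha[xi] == ci for xi, ci in zip(x, c))]

# Three answers over free variables x and y.
answers = [{"x": 0, "y": 0}, {"x": 0, "y": 1}, {"x": 1, "y": 0}]
idx = weight_sum_indices(answers, ["x"], [0])   # selects theta_(0,0), theta_(0,1)
```

The coefficient vector of weight_{x≐c}(Q) in the natural interpretation is then 1 at the returned indices and 0 elsewhere.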

3.1. Example. As an example, we consider a conjunctive query Q with free variables x and y over a database D with domain {0, 1}.
The interpretation ⟨L⟩^D is the following linear program, where we denote any query answer α ∈ ⟦Q⟧^D by the pair (α(x), α(y)) in the Cartesian product {0, 1}² for simplicity: the first constraint is obtained by interpreting weight_{x≐0}(Q) ≤ 1 and the second constraint by interpreting weight_{x≐1}(Q) ≤ 1. Note that the objective function is the sum of the left-hand sides of the two constraints, so the three weight expressions of L are semantically related.
3.2. Complexity of solving LP_clos(CQ_Σ). In this section, we are interested in the complexity of computing opt(⟨L⟩^D) given L ∈ LP_clos(CQ_Σ) and a database D. From a combined complexity point of view, that is, when both L and D are part of the input, it is not hard to see that the problem is NP-hard, since it implicitly requires finding the answer set of every conjunctive query appearing in L. We formalize this intuition in the following theorem.

Theorem 3.2. The problem of deciding whether opt(⟨L⟩^D) ≠ 0, given a relational signature Σ, L ∈ LP_clos(CQ_Σ) and a database D, is both NP-hard and coNP-hard.
Proof. It is well-known that the problem of deciding whether ⟦Q⟧^D ≠ ∅, given a conjunctive query Q and a database D in the input, is NP-complete [CM77]. We show that this problem can be reduced to the problem of deciding whether the optimal value opt(⟨L⟩^D) is non-zero, given a relational signature Σ, a linear program L ∈ LP_clos(CQ_Σ) and a database D with schema Σ. The NP-hardness of computing opt(⟨L⟩^D) is thus a direct corollary.
For any conjunctive query Q, we consider the following LP_clos(CQ_Σ) program: L = maximize weight(Q) subject to weight(Q) ≤ 1, where weight(Q) denotes the closed weight expression with an empty selection. We first note that opt(⟨L⟩^D) ≠ 0 if and only if ⟦Q⟧^D ≠ ∅. To show the coNP-hardness, it is sufficient to observe that the same trick can be applied to reduce the problem of deciding whether ⟦Q⟧^D = ∅.

Data complexity. The hardness of Theorem 3.2 mainly stems from the hardness of answering conjunctive queries, which is only relevant in the context of combined complexity. It is often assumed however that the size of the query is small with respect to the size of the data, hence one can study the data complexity of the problem, that is, the complexity of the problem when the linear program L is fixed. In this case, computing opt(⟨L⟩^D) can be done in polynomial time in |D| using the following procedure: first compute the answer set ⟦Q⟧^D of every query Q appearing in L, then generate the natural interpretation ⟨L⟩^D, and finally solve it with a linear program solver. The exact complexity of this procedure is however dependent on the size of ⟨L⟩^D, whose number of variables is the sum of |⟦Q⟧^D| over every Q appearing in L. We lift the AGM bound presented in Section 2.4 to linear programs in LP_clos(CQ_Σ) by defining AGM(L) to be the maximum of AGM(Q) over every query Q appearing in L. The size of ⟨L⟩^D can now be upper bounded by |L| × |D|^{AGM(L)}. Using a worst-case optimal join algorithm such as Triejoin [Vel14] to compute ⟦Q⟧^D in time O(|D|^{AGM(Q)}), we conclude that one can compute opt(⟨L⟩^D) in time O((|L||D|^{AGM(L)})^ℓ), where ℓ is the best known exponent for computing the optimal value of a linear program. Currently, the best known value for ℓ is smaller than 2.37286, obtained by combining a result relating the complexity of solving linear programs with the complexity of multiplying matrices [CLS21] with the best known algorithm for multiplying matrices [AW21].
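The reduction in the proof above can be sketched numerically. Assuming SciPy is available, the natural interpretation of L = maximize weight(Q) subject to weight(Q) ≤ 1 has one non-negative variable per query answer, and its optimum is non-zero exactly when the answer set is non-empty:

```python
from scipy.optimize import linprog

def opt_nonzero(num_answers):
    """Solve the natural interpretation of
           maximize weight(Q) subject to weight(Q) <= 1
    with one nonnegative variable theta_alpha per query answer, and decide
    whether the optimal value is non-zero."""
    if num_answers == 0:
        return False                     # no variables: the optimum is 0
    res = linprog(c=[-1.0] * num_answers,            # linprog minimizes, so negate
                  A_ub=[[1.0] * num_answers],        # sum of all theta <= 1
                  b_ub=[1.0])                        # (variables are >= 0 by default)
    return bool(abs(res.fun) > 1e-9)     # optimum is 1 iff some answer exists
```

Whenever there is at least one answer, the whole unit of weight can be placed on it, so the optimum jumps from 0 to 1; this is exactly the gap the NP-hardness reduction exploits.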
3.3. Replacement Rewriting. In this paper, we often introduce alternative ways of interpreting linear programs of LP_clos(CQ_Σ) over a database. We say that such an alternative interpretation is sound if, for any database D, interpreting the linear program L with it over D yields a linear program having the same optimal value as ⟨L⟩^D, the natural interpretation of L over D. This allows us, for example, to construct smaller linear programs and hence speed up the computation of opt(⟨L⟩^D). Formally proving the soundness of an alternative interpretation is usually tedious, as it involves transforming solutions from one interpretation to the other while proving by induction that the constraints in L are all satisfied. However, every interpretation that we consider in this paper is based on interpreting weight expressions differently with some extra equality constraints. Hence, most of the time, the reasoning can be reduced to very simple linear programs involving only equality constraints. Our goal in this section is to provide formal tools to simplify soundness proofs.
One can always construct a function ν mapping every possible weight expression W to a fresh linear program variable ν(W) such that if W ≠ W′ then ν(W) ≠ ν(W′). We will often denote ν(weight_{x≐c}(Q)) by ν^Q_{x≐c}. From now on, we assume such a function ν has been fixed. For a set of weight expressions W, we define the weight constraints of W, denoted by wc_ν(W), as the set of linear constraints {ν(W) = W | W ∈ W}. For any linear sum S ∈ LS_clos(CQ_Σ), let ⟨S⟩_ν ∈ LS be defined by replacing any weight expression W in S by ν(W), and for C ∈ LC_clos(CQ_Σ), let ⟨C⟩_ν be the linear constraint obtained by applying the same substitution to every linear sum appearing in C.
Consider a linear program L = maximize S subject to C, with a linear sum S ∈ LS clos (CQ Σ ) and a linear constraint C ∈ LC clos (CQ Σ ). We denote by W(L) the set of weight expressions that appear in L. The replacement rewriting of L is the following linear program repl ν (L) ∈ LP (CQ Σ ): Observe that in repl ν (L), the only place where weight x .=c (Q) constructors appear is the wc ν (W(L)) part. Hence, we can naturally lift the interpretation of L over a database D to repl ν (L) as follows: The main feature of repl ν (L) is that it formally separates the linear programming part from the part that is interpreted over a database, which will be helpful whenever we need to reason only about weight expressions.

Example.
In the example of Section 3.1, the rewriting repl ν (L) of L is: which is then interpreted as ⟨repl ν (L)⟩ D as: Note that the linear program variables θ (i,j) Q do not occur in the objective function, so they are implicitly existentially quantified. Up to existential quantification, the above constraint is equivalent to: We thus obtain a linear program with far fewer variables: The optimal value of this new linear program is achieved by the solution ν 0 = ν 1 = 1 and ν 2 = 2. We can see that it directly corresponds to the optimal solution of the original linear program where θ How to derive such factorized rewritings of constraints, and how to systematically reconstruct an optimal solution of a given LP (CQ Σ ) program from an optimal solution of the rewritten program, is studied in the remainder of this article.
It is easy to see that for every database D over signature Σ, the optimal value of ⟨repl ν (L)⟩ D is the same as the optimal value of ⟨L⟩ D . This is formalized as follows: Proposition 3.4 (Soundness of replacement rewriting). Given a database D with signature Σ and a linear program L ∈ LP clos (CQ Σ ), we have:

Proof. For every weight expression weight x .=c (Q) of L and w : Clearly, by definition, this extension of w satisfies the weight constraint Moreover, for any sum expression S of L, eval w (⟨S⟩ D ) gives the same value as eval w (⟨repl ν (S)⟩ D ), since every weight expression W of S has been replaced in repl ν (S) by ν(W ) and, by the above, w(ν(W )) has the same value as eval w (⟨W ⟩ D ). Hence, any solution of ⟨L⟩ D can be extended to a solution of ⟨repl ν (L)⟩ D with the same objective value.
Conversely, by the same reasoning, any solution w of ⟨repl ν (L)⟩ D directly gives a solution of ⟨L⟩ D with the same objective value, since the weight constraints ensure that eval w (⟨W ⟩ D ) = w(ν(W )).
3.4. Interpretations of linear programs. In this section, we formalize the notion of an interpretation of linear programs in LP clos (CQ Σ ) and give conditions ensuring that an alternative interpretation is sound. In the following, we denote by Queries w (L) the set of conjunctive queries that appear in L, that is, the set of queries Q such that L contains an expression of the form weight x .
Given an interpretation I, a conjunctive query Q and a set of weight constraints W over Q, we denote by: where ν is a fixed function as constructed in Section 3.3. Observe that when For example, the replacement rewriting of a linear program L can be defined by the interpretation N = (N W , N C ) such that N D C (Q) := true and However, the assumption from the statement applies to every since every I D Q (W Q (L)) contains disjoint variables. It thus means that for any solution w of the constraints ⟨wc ν (W(L))⟩ D , one can construct a solution w ′ of the constraints Q∈Queriesw(L) I D Q (W Q (L)) that assigns the variables ν(W) the same values. Hence, we have eval w ′ (⟨S⟩ ν ) = eval w (⟨S⟩ ν ). Moreover, since eval w (⟨C⟩ ν ) is true by definition, and since w and w ′ coincide on the ν variables and ⟨C⟩ ν only contains ν(W) variables, eval w ′ (⟨C⟩ ν ) is also true. Hence, w ′ is a solution of I D (L) and the values of the two linear programs on w and w ′ respectively coincide. Taking w optimal for ⟨repl ν (L)⟩ D yields opt(⟨repl ν (L)⟩ D ) ≤ opt(I D (L)).
On the other hand, given a solution w ′ of the constraints Q∈Queriesw(L) one can construct a solution w of the constraints ⟨wc ν (W(L))⟩ D that assigns the variables ν(W) the same values. By the same reasoning, this implies that opt(⟨repl ν (L)⟩ D ) ≥ opt(I D (L)), and the equality follows.
Example. To illustrate the notion of interpretation, we consider a toy alternative interpretation I = (I W , I C ) defined as follows: the linear program variables X D I,Q are defined as {x α Q | α ∈ Q D } and I W is defined as: The interpretation I D (L) of the example given in Section 3.1 is thus:
The last constraint corresponds to I D C (Q). It is clear that this program has the same optimal value as the original one, because it is obtained by substituting 3x α Q for θ α Q and by adding a constraint that is always true since x α Q ≥ 0. Actually, Proposition 3.5 tells us that for every LP (CQ Σ ) program L and database D, I D (L) has the same optimal value as ⟨L⟩ D . Indeed, given a solution w of ⟨wc ν (W)⟩ D , we can transform it into a solution w ′ of and w ′ (ν(W )) = w(ν(W )). It is readily verified that w ′ respects every constraint of I D C (W) and that w ′ has the same values as w on the variables ν(W). Similarly, a solution w ′ of I D Q (W) can be transformed into a solution w of ⟨wc ν (W)⟩ D by defining w as w( and Proposition 3.5 can be applied.

Solving LP clos (CQ Σ ) linear programs efficiently
In this section, we propose an algorithm for solving linear programs in LP clos (CQ Σ ) that improves on the one given in Theorem 3.3. The proof of Theorem 3.2 suggests that the main source of intractability stems from the complexity of answering conjunctive queries, which is reflected in the upper bound of Theorem 3.3, where the complexity depends on the worst-case size of the answer sets of the queries. However, for many problems on conjunctive queries, such as computing the number of answers, one can obtain better upper bounds by exploiting the fact that the queries have small fractional hypertree width. By lifting the notion of fractional hypertree width from conjunctive queries to linear programs of LP clos (CQ Σ ), we lower the complexity from the O(|L| ℓ |D| ℓAGM(L) ) of Theorem 3.3 to O(|L| ℓ |D| ℓfhtw(L) ), where fhtw(L) denotes the (lifted notion of) fractional hypertree width of L, when an optimal tree decomposition of L is provided in the input.
To achieve this, we avoid the expensive step of computing ⟨L⟩ D : we exploit tree decompositions T of the queries of L to generate a smaller linear program ρ T,D (L) having only O(|D| fhtw(T) ) variables, and show that the optimal value of ρ T,D (L) is the same as the optimal value of ⟨L⟩ D .

4.1. Tree decompositions of LP clos (CQ Σ ). We start by lifting the concept of hypertree decompositions from conjunctive queries to linear programs in LP clos (CQ Σ ). Given a linear program L ∈ LP clos (CQ Σ ), recall that we denote by Queries w (L) the set of conjunctive queries that appear in L.
Intuitively, our notion of hypertree decomposition for LP clos (CQ Σ ) consists of a collection of hypertree decompositions, one for every ∃y.Q ∈ Queries w (L). However, we will need a stronger condition on the decompositions than the usual one: a tree decomposition compatible with weight x .=c (∃y.Q) is a tree decomposition of Q such that there exists u ∈ V with set(x) = B(u).
A tree decomposition T of Q is said to be compatible with L if it is compatible with every weight x .=c (∃y.Q) of L.
Observe that the requirement that a tree decomposition be compatible with the linear program may increase the optimal width of the decomposition. For example, consider the conjunctive query Q = R(x, y) ∧ S(y, z). If a linear program L contains an expression of the form weight x .=0,z .=0 (Q), then every tree decomposition of Q compatible with L has width at least 3/2, whereas an optimal tree decomposition of Q has width 1 (a decomposition of Q is given in Section 2.4).

Definition 4.2. Let L be an LP clos (CQ Σ ) program. A tree decomposition of L is defined to be a collection The width of T L is defined to be the maximal width of the decomposition trees in T L . The size of T L is defined to be

4.2. Factorized interpretation of quantifier-free LP clos (CQ Σ ). In this section, we present a more succinct way of interpreting weight constraints in linear programs in LP clos (CQ Σ ), called the factorized interpretation, which exploits a tree decomposition of the queries. Throughout this section, we assume that the linear program only contains quantifier-free queries. We explain in Section 4.3 how the case of existentially quantified queries reduces to the quantifier-free case.
From now on, we fix a linear program L ∈ LP clos (CQ Σ ) and a tree decomposition T of L. We describe an interpretation ρ T = (ρ T W , ρ T C ) of L that exploits the tree decompositions.
Interpreting weight constraints. Given a conjunctive query Q ∈ Queries w (L) and T = T Q the tree decomposition of Q compatible with L given by T, we define an interpretation of weight expressions on the following set of variables: =c (Q) as follows: where u ∈ V is the vertex with set(x) = B(u) that is closest to the root of T .³ If no such u exists, then ρ T,D W (weight x .=c (Q)) is undefined; however, for every weight expression of L, the existence of u is implied by the compatibility of T with L, see Definition 4.1.
For Q ∈ Queries w (L), we define ρ T,D W (weight . This yields a function interpreting weight expressions in the sense of Section 3.4, since if Q and Q ′ are distinct conjunctive queries, then ρ T,D W (weight x .=c (Q)) and ρ T,D W (weight x .=c (Q ′ )) contain disjoint linear program variables.
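As an illustration of the variable set used here, the following sketch (toy answer set and a hypothetical encoding of our own) indexes one ξ variable per element of the projection Q D |B(u) of the answer set onto a bag:

```python
def project(answer_set, bag):
    """Q^D|B(u): the distinct restrictions of the answers of Q to the
    variables of the bag B(u); each element indexes one ξ variable."""
    return {tuple((v, a[v]) for v in sorted(bag)) for a in answer_set}

# Toy answer set of Q = R(x, y) ∧ S(y, z) (not the paper's database):
ans = [{"x": 0, "y": 1, "z": 5}, {"x": 0, "y": 2, "z": 5}]
assert project(ans, {"x"}) == {(("x", 0),)}  # one ξ variable for bag {x}
assert len(project(ans, {"y", "z"})) == 2    # two ξ variables for bag {y, z}
```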

Local soundness constraints. If one simply defines the factorized interpretation as (ρ T W , true), it does not give a sound interpretation. We illustrate this phenomenon on the example from Section 3.1, using the tree decomposition with three nodes r, u, v, rooted at r, with r connected to u and v, and with B(r) = ∅, B(u) = {x} and B(v) = {y}. Interpreting only the weights without additional constraints would yield the following: One strange aspect of this program is that it does not depend on the variables ξ, and hence on the value of y, because the program contains no weight expression on the variable y. It can easily be checked that the optimal value of this program is not the same as the one from Section 3.1. Indeed, in the above linear program, the values of ν 0 , ν 1 and ν 2 are now completely independent. Hence, the optimal value of the above linear program is actually unbounded.

³ Actually, any vertex u such that set(x) = B(u) would work, but we choose the one closest to the root to make the definition of the factorized interpretation deterministic. It is well defined by the connectedness of tree decompositions: if two bags B, B ′ both contain a set S, then their least common ancestor also contains S.
To make the factorized interpretation equivalent to the natural interpretation, one has to somehow restore the forgotten dependencies. One way of doing so in the above program would be to add a new constraint ξ To achieve this in general, we add so-called local soundness constraints. For every edge e = (u, v) ∈ E of T and every γ ∈ Q D |B(u)∩B(v) , we define the equality constraint E e,D γ (Q) as follows: Intuitively, this constraint encodes the following: for expressions weight x .=c (Q) and weight y .=c ′ (Q), if the assignments β = [x/c] and β ′ = [y/c ′ ] are compatible with one another, in the sense that they agree on the common variables they assign, then ⟨weight x .=c (Q)⟩ D and ⟨weight y .=c ′ (Q)⟩ D contain common variables from Θ D Q and hence do not produce independent linear sums. In the factorized interpretation without additional constraints, on the other hand, they are interpreted on independent variables ξ β Q,u and ξ β ′ Q,v for some u, v ∈ V. If e = (u, v) is an edge of T , then E e,D γ (Q) accounts for the dependency missed by the factorized interpretation. It turns out that these constraints are enough to make the factorized interpretation equivalent to the natural interpretation (in the sense that both have the same optimal value).
Hence, we define ρ T,D C (Q), the local soundness constraints of Q w.r.t. T , as follows: Local soundness constraints can be computed efficiently with respect to the width of the decomposition: Lemma 4.4. Let k be the width of T . The size of ρ Proof. There is one constraint S γ u = S γ v for each γ ∈ Q D |B(u)∩B(v) that has been found.
To construct it, observe that each Q D |B(u) is listed at most once for each edge of T , and that the projection γ can be constructed in time O(|Q|). Hence, the total time required to construct ρ Factorized interpretation. Taking local soundness constraints into account, we can now define the T-factorized interpretation ρ T of a linear program L as the pair (ρ T W , ρ T C ) where When T is clear from the context, we simply say "the factorized interpretation".
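The construction of local soundness constraints can be sketched as follows (toy data and a hypothetical constraint encoding of our own; in the paper, the bags and edges come from T ): for an edge (u, v), one groups the bag projections by their restriction to the separator B(u) ∩ B(v) and emits one equality per group.

```python
def local_constraints(answer_set, bag_u, bag_v):
    """Sketch of E^{e,D}_γ(Q) for an edge e = (u, v): one equality per
    γ ∈ Q^D|B(u)∩B(v), equating the sums of the ξ variables of u and of v
    whose bag projections extend γ."""
    sep = bag_u & bag_v
    restrict = lambda a, bag: tuple((v, a[v]) for v in sorted(bag))
    groups = {}
    for a in answer_set:
        gamma = restrict(a, sep)
        left, right = groups.setdefault(gamma, (set(), set()))
        left.add(("xi", "u", restrict(a, bag_u)))
        right.add(("xi", "v", restrict(a, bag_v)))
    # One symbolic equality "sum(left) = sum(right)" per separator value γ:
    return groups

ans = [{"x": 0, "y": 1, "z": 5}, {"x": 0, "y": 2, "z": 5}]
cons = local_constraints(ans, {"x", "y"}, {"y", "z"})
assert len(cons) == 2  # the separator {y} takes the two values 1 and 2
```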
The soundness of the factorized interpretation follows from Proposition 3.5 and from results on weightings of answer sets of conjunctive queries, which may be of independent interest and which we give in Section 6.1: Theorem 4.5. Let L be an LP clos (CQ Σ ) program such that every conjunctive query in Queries w (L) is quantifier free, T a decomposition of L, and D a database. The factorized interpretation ρ T,D (L) then has the same optimal value as ⟨L⟩ D : Moreover, given an optimal solution W of ρ T,D (L), there exists a canonical optimal solution ω of ⟨L⟩ D such that, given W and a variable θ of ⟨L⟩ D , one can compute ω(θ) in polynomial time.
The first part of Theorem 4.5 is proven by providing two explicit transformations: one to go from a solution of ρ T,D (L) to a solution of ⟨L⟩ D having the same value and another one to go from a solution of ⟨L⟩ D to a solution of ρ T,D (L) having the same value.The preservation of the values by these transformations is enough to establish that both linear programs have the same optimal value.More interestingly, it also allows us to prove the second part of the theorem, that is, that one can recover an optimal solution of ⟨L⟩ D from an optimal solution of ρ T,D (L).This transformation is described in the proof of Theorem 6.13.
Example. Going back to the example from Section 3.1 and the tree decomposition previously mentioned, ρ T,D (L) is the following program: The last two lines contain the local soundness constraints ρ T,D C (Q). The last constraint mentions variables ξ Q,v that are not used elsewhere in the program and can safely be ignored when looking for the optimal value. The other soundness constraint directly implies that ν 2 = ν 0 + ν 1 , and hence the optimal value of this program is 2.
Computing the factorized interpretation. The factorized interpretation ρ T,D (L) is interesting because it is smaller than the natural interpretation ⟨L⟩ D . Indeed, while ⟨L⟩ D has O(|D| AGM(L) ) variables and O(|L|) constraints (see Section 3.2), one can show that if k is the width of T, then the size of ρ T,D (L) is O(|D| k ) in the data complexity model (where L is considered constant). This follows from the following, more precise, combined complexity analysis: Theorem 4.6. Given a relational signature Σ, a program L ∈ LP clos (CQ Σ ) such that every query in Queries w (L) is quantifier free, a tree decomposition T of L and a database D, let k be the width of T, t be the sum of the sizes of the tree decompositions in T and q be the sum of the sizes of the queries in Queries w (L). Then ρ T,D (L) has at most O(t Proof. This is a direct consequence of applying Lemma 4.3 and Lemma 4.4 to each query in Queries w (L). Concerning the size of ρ T,D (L), it comes from the fact that each weight construct is replaced by at most one variable, resulting in a program of size at most |L|, to which we add O(t|D| k ) = O(|L||D| k ) soundness constraints.
Theorem 4.6 together with Theorem 4.5 implies that the data complexity of computing the optimal value of a linear program L ∈ LP clos (CQ Σ ) having only quantifier-free queries is in O(|D| ℓ•fhtw(L) ) with ℓ < 2.37286, which improves on the complexity stated in Theorem 3.3.
We observe, however, that in practice the factorized interpretation may be smaller than the worst-case theoretical bound given by the fractional hypertree width of L. Indeed, the number of variables in the factorized interpretation is the sum of the sizes of Q D |B(u) over all vertices u. In particular, each of these projections contains at most as many elements as Q D . Hence, even when Q D is small with respect to the worst case, the factorized interpretation is also smaller than the worst case.
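A small back-of-the-envelope illustration of this gap, on the cross-product query Q = R 1 (x) ∧ R 2 (y) with bags {x} and {y} (our own toy instance): the natural interpretation needs quadratically many variables while the factorized one stays linear.

```python
# Toy instance: R1 and R2 are unary relations with n tuples each, and
# Q = R1(x) ∧ R2(y), so Q^D is their full cross product.
n = 20
R1 = [(i,) for i in range(n)]
R2 = [(i,) for i in range(n)]
natural_vars = len(R1) * len(R2)     # one θ_α per answer α ∈ Q^D: n²
factorized_vars = len(R1) + len(R2)  # one ξ per projection onto bags {x}, {y}
assert natural_vars == 400
assert factorized_vars == 40
```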
We summarize this discussion in the following theorem, which mirrors Theorem 3.3: containing only quantifier-free conjunctive queries. This restriction can actually be lifted, since replacing the existentially quantified queries of L with their quantifier-free parts yields a linear program that has the same optimal value under the natural interpretation for any database D. We formalize this approach and prove its correctness in this section. Given a quantified conjunctive query Q ′ = ∃y.Q with Q a quantifier-free conjunctive query, we denote by qf (Q ′ ) the conjunctive query Q ∧ y .=y. Clearly, for any database D, the answer set Q D is the same as qf (Q ′ ) D , and qf (Q ′ ) and Q have the same hypertree decompositions and hence the same width. However, in the following, we will need to syntactically distinguish two queries having the same quantifier-free part but different quantifier prefixes, and qf (Q ′ ) allows us to do so. Indeed, qf (Q ′ ) is syntactically different from qf (Q ′′ ) for Q ′′ = ∃y ′ .Q with different y and y ′ .
We lift the definition of qf (Q) to linear programs L ∈ LP clos (CQ Σ ): we denote by qf (L) the LP clos (CQ Σ ) program obtained by replacing every expression weight x .=c (Q ′ ) in L by weight x .=c (qf (Q ′ )). It is clear that qf (L) now only contains quantifier-free conjunctive queries.
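The transformation qf can be sketched on a toy query encoding of our own (tuples of quantified variables and atoms; the trivial equality atoms stand for y .= y):

```python
def qf(query):
    """qf(∃y.Q): drop the quantifier prefix and conjoin one trivial atom
    y .= y per quantified variable (encoded here as ("eq", (y, y))), so that
    queries with the same body but different prefixes stay syntactically
    distinct while keeping the same answer set and tree decompositions."""
    prefix, body = query
    return ((), body + tuple(("eq", (y, y)) for y in prefix))

body = (("R", ("x", "y", "z")),)
q1 = (("y",), body)       # ∃y.   R(x, y, z)
q2 = (("y", "z"), body)   # ∃y,z. R(x, y, z)
assert qf(q1) != qf(q2)         # distinguishable after quantifier elimination
assert qf(q1)[1][0] == body[0]  # the quantifier-free part is unchanged
```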
Example. We adapt the example from Section 3.1 by considering the conjunctive query =1 (Q ′ ) ≤ 1 Over the database given in Section 3.1, we have ⟨L ′ ⟩ D : On the other hand, we have qf =y and qf (L ′ ) is: =1 (qf (Q ′ )) ≤ 1 which is clearly equivalent to the linear program L from Section 3.1, which itself has optimal value 2 when interpreted over the database given in the same section.
Soundness of quantifier elimination. It turns out that qf (L) is equivalent to L in the following sense: Proposition 4.8. For every L ∈ LP clos (CQ Σ ) and every database D on signature Σ, we have We start by observing that W Q (L) is in one-to-one correspondence with W qf (Q) (qf (L)), by definition, since we replaced every occurrence of Q in L by qf (Q) in qf (L). Moreover, observe that if Q and Q ′ are distinct queries of Queries w (L), then qf (Q) and qf (Q ′ ) are also distinct. Indeed, either Q and Q ′ have distinct quantifier-free parts, in which case it is obvious, or they have distinct quantifier prefixes y and y ′ respectively, in which case qf (Q) and qf (Q ′ ) are distinct since qf (Q) contains y  =c (Q ′ ) where by the following sum expression: By Proposition 3.5, it is thus sufficient to show that, for a set W of weight expressions over a conjunctive query Q ′ = ∃y.Q with Q a quantifier-free conjunctive query, we have For a weight constraint W , we set w ′ (ν(W )) = w(ν(W )). Clearly, w and w ′ coincide on the ν variables. It remains to show that w ′ ∈ ⟨wc ν (W)⟩ D , that is, for a weight expression W ∈ W of the form weight x .=c (Q ′ ), w ′ (ν(W )) = w ′ (⟨W ⟩ D ), that is: By definition of w ′ , the right-hand side of this equation rewrites to: For the other direction, let , where N β is the number of γ ∈ Q D such that γ |V = β. For every weight expression W ∈ W, we let w(ν(W )) := w ′ (ν(W )). Clearly, w and w ′ coincide on the ν variables. It now remains to show that w ∈ I D Q ′ (W), that is, for a weight expression .
By definition, we have: Recall that, by definition, a hypertree decomposition T of width k of a linear program L in LP clos (CQ Σ ) consists of a collection of tree decompositions of width at most k, one for the quantifier-free part of each query in Queries w (L). For Q ∈ Queries w (L), T Q is then also a decomposition of qf (Q) of width at most k. Hence, T is a hypertree decomposition of qf (L) of width k, and the fractional hypertree width of qf (L) is thus the same as that of L. One can therefore compute the optimal value of L in O(|D| ℓ•fhtw(L) ) with ℓ < 2.37286 in data complexity, by computing the optimal value of the factorized interpretation of qf (L) using Theorem 4.5 and Theorem 4.6, even when the program contains existentially quantified conjunctive queries. This is wrapped up in the following theorem, which improves over Theorem 3.3: Theorem 4.9. Given a relational signature Σ, a program L ∈ LP clos (CQ Σ ), a tree decomposition T of L of width k and a database D, there exists some ℓ < 2.37286 such that one can compute opt(⟨L⟩ D ) in time O((|L| + tq|D| k ) ℓ ), where t is the sum of the sizes of the tree decompositions in T and q is the sum of the sizes of the conjunctive queries in Queries w (L).

Linear Programs with Open Weight Expressions
While we have shown that LP clos (CQ Σ ) programs can be solved efficiently by exploiting tree decompositions of the input conjunctive queries, the language is not yet powerful enough to express interesting linear programs such as the one presented in Section 1.1. The missing feature of LP clos (CQ Σ ) is the inability to quantify over values in the database to create new constraints. This is especially useful in the example of Section 1.1: the last quantified constraint states that the storage limit of every warehouse in the database must not be exceeded in the solution of the linear program. This expressivity is enabled by the possibility to universally quantify over the warehouses given in the table store, which is interpreted as generating one constraint for each of them. Moreover, observe that this constraint also draws a numerical value num(l) from the database. In this section, we introduce the language LP (CQ Σ ), which allows universal quantification and summation over the answer sets of database queries. The syntax of the language is presented in Figure 7. It includes definitions for linear sums, linear constraints and linear programs. The semantics of programs in LP (CQ Σ ) will be given in Section 5.1 via a closure operation that transforms an LP (CQ Σ ) program into a program in LP clos (CQ Σ ).
y) contains variables and constants and set(y Apart from the addition of an operator num(E), which intuitively allows one to fetch numerical constants from the database, and of universal quantifiers and sums ranging over a conjunctive query, the main difference with LP clos (CQ Σ ) programs is that non-constant values are allowed in weight z:x .=y (Q) expressions. We call such weight expressions open weight expressions, as opposed to closed weight expressions where y contains only constants. Intuitively, the variables in set(y) will be replaced by database constants in the closure of an LP (CQ Σ ) program.
A valid LP (CQ Σ ) program L does not have free variables, that is, every variable has to be bound by one of the new operators: either a linear sum x:Q S or a universal quantifier ∀x:Q.C. To formalize this notion, we give in Figure 8 the definition of the free variables of an LP (CQ Σ ) program. Observe in particular that for weight z:x .=y (Q), only the variables in set(y) are considered free, since fv (Q) ⊆ set(z). The free variables of the conjunctive queries (and hence those in set(x), since set(x) ⊆ fv (Q)) are not considered free variables. In other words, variables bound through universal quantifiers and sums over conjunctive queries in LP (CQ Σ ) do not introduce constants into the conjunctive queries appearing in weight expressions. If one follows Barendregt's variable convention, this means that the variables appearing in the conjunctive queries of L ∈ LP (CQ Σ ) may be considered disjoint from the variables used in the linear program part.

5.1. Closure and semantics. We define the semantics of linear programs with open weight expressions over a database D by mapping them to linear programs with closed weight expressions. Intuitively, the queries in x:Q S and in universal quantifiers ∀x:Q.C are interpreted over D, and x is mapped to the possible values passed into a context. The details are given in Figure 9. Recall that subs is the substitution operator and ext x is the extension operator as defined in Section 2.2. The closure is defined in a rather classical way: one inductively evaluates a linear program L over a database D and an environment γ mapping the free variables of L to values in the domain. We illustrate this by detailing the closure of (∀x:Q.C) over a database D in an environment γ; the closures of x:Q S and of weight x:z .=y (Q) are very similar.
Vol. 20:1 LINEAR PROGRAMS WITH CONJUNCTIVE QUERIES

The closure of (∀x:Q.C) generates one set of constraints for each answer of Q over D (over the variables x). Intuitively, for every answer of Q, we want the constraints of the closure of C to be satisfied. However, when evaluating the closure inductively, Q may contain free variables, whose values must be set by γ. We start by mapping the free variables of Q to their values in the environment using subs γ̄ (Q), where γ̄ is the restriction of γ to the free variables of Q. Indeed, γ may assign a variable x that appears in x; in this case, we consider that x is not bound by γ, since it is not free in (x : Q). For example, ∀x : Q 1 .∀x: Q 2 .C has the same closure as ∀x : Q 1 .∀y: Q 2 .C, since the value of γ on x is not used when computing the closure of ∀x : Q 2 .C. To summarize, the closure of (∀x:Q.C) over a database D in an environment γ generates a set of constraints close(C) D,γ∪γ ′ for every γ ′ that is a solution over the variables x of the query Q in which each variable not in x has been replaced by its value under γ, which is formally written as Observe that the closure is always well defined thanks to the syntactic restrictions on linear programs. Also observe that one needs to interpret some database constants as numerical values because of the num operation; hence, one needs databases that may contain real numbers. This is formalized in the following definition: a (relational) database with real numbers is a tuple The natural interpretation ⟨L⟩ D of an LP (CQ Σ ) program L over a database D is defined to be the natural interpretation of its closure, that is, ⟨L⟩ D is defined as ⟨close(L) D ⟩ D . Example. As an example, we reconsider the conjunctive query Q = R 1 (x) ∧ R 2 (y) from Section 3.1. Assume we also have a unary relation S and a binary relation T in D with S D = {0, 1} and T D = {(0, 0.4), (0, 0.6), (1, 0.3)}. We then consider the following LP (CQ Σ )
program L: To compute the closure of this program over D, we start by unfolding the universal quantifier. Hence close(L) D,[] is equal to: By evaluating the closure of the weights and sums, we then have: =y (Q) ≤ 1. We claim that, given a database D, the optimal value of ⟨L⟩ D is the size of Q yields a solution of ⟨L⟩ D whose value is the size of Q D . Moreover, since every constraint in ⟨L⟩ D is saturated, it is also optimal.
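The unfolding of a universal quantifier during closure can be sketched as follows (hypothetical encodings of our own; the store relation is merely in the spirit of Section 1.1, not the paper's actual program): one copy of the closed constraint is generated per answer of the quantifying query.

```python
def close_forall(q_answers, close_c):
    """close(∀x:Q.C): generate the closed constraints of C once per
    answer γ' of Q, in the environment extended by γ'."""
    constraints = []
    for gamma in q_answers:
        constraints.extend(close_c(gamma))
    return constraints

# Toy relation: each warehouse w has a numerical capacity c (as if drawn
# via num(·) from the database).
store = [{"w": "a", "c": 10.0}, {"w": "b", "c": 7.5}]
constraints = close_forall(store, lambda g: [("stored", g["w"], "<=", g["c"])])
assert constraints == [("stored", "a", "<=", 10.0), ("stored", "b", "<=", 7.5)]
```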
The problem of computing the size of Q D when both Q and D are given as input is #P-complete [PS13]; hence, computing opt(⟨L⟩ D ) is #P-hard.
Data complexity. When the size of the input linear program is considered constant, that is, in the data complexity model, it turns out that one can compute the optimal value of the natural interpretation of an LP (CQ Σ ) program L over D in time polynomial in the size of the database D. To analyze the precise complexity of solving programs in LP (CQ Σ ), we start by defining a helpful normal form for LP (CQ Σ ) programs.
An atomic linear constraint is a linear constraint of the form ∀x:Q. S ≤ S ′ , ∀x:Q. S = S ′ , S ≤ S ′ or S = S ′ , where S and S ′ are linear sums. We insist on the fact that an atomic linear constraint has at most one conjunctive query, on which the universal quantifier applies. An atomic linear sum is a linear sum of the form N , weight z:x .=y (Q) or N weight z:x .=y (Q), with N a constant number (of the form r ∈ R or num(E)). A linear sum is said to be in normal form if it is written as a sum of atomic linear sums. A linear constraint is in normal form if it is written as a conjunction of atomic linear constraints over linear sums in normal form. Finally, an LP (CQ Σ ) program is in normal form if its objective is a linear sum in normal form and its constraint is in normal form.
It turns out that every LP (CQ Σ ) program can be written as an equivalent linear program in normal form of polynomial size.
Theorem 5.3. Let L be an LP (CQ Σ ) program. There exists a normal-form LP (CQ Σ ) program L ′ of size at most |L| 3 such that, for every database D, close(L) D and close(L ′ ) D have the same constraints (up to permutation).
Proof. We first assume that L is written following Barendregt's variable convention [Bar12] to avoid variable capture. First, we observe that a linear constraint can always be written as a conjunction of constraints of the form where B is of the form S = S ′ or S ≤ S ′ (we consider that k = 0 corresponds to the case without quantifiers). This comes from the fact that the closure of ∀x : Q.(C 1 ∧ C 2 ) generates the same constraints as (∀x : Q.C 1 ) ∧ (∀x : Q.C 2 ). One can hence apply this transformation until all constraints are of the desired form. The transformation applied to L results in an LP (CQ Σ ) program L 1 of size at most d ∀ |L|, where d ∀ is the universal quantifier depth of L, that is, the maximal number of universal quantifiers enclosed in one another. Indeed, one can see that L 1 will contain a conjunction of constraints of the form ∀x 1 : Q where B is of the form S = S ′ or S ≤ S ′ and appears in L enclosed in several quantifiers Similarly, one can rewrite each linear sum S as a sum of expressions of the form where B is of the form N , weight z:x .=y (Q) or N weight z:x .=y (Q) (again, the case k = 0 corresponds to the case without sums). Indeed, we proceed similarly by observing that the closure of ( x:Q (S 1 + S 2 )) produces the same terms as ( x:Q S 1 ) + ( x:Q S 2 ). As above, the transformation applied to L 1 results in an LP (CQ Σ ) program L 2 of size at most where d Σ is the maximal number of enclosed sum expressions of L. It remains to observe that a constraint of the form ∀x 1 : Q 1 . . . ∀x k : Q k .B, where B is of the form S = S ′ or S ≤ S ′ , can be rewritten as This is only true when set(x 1 ), . . . , set(x k ) are pairwise disjoint, which is ensured by the fact that we adopted Barendregt's variable convention: we can always rename bound variables so that they have different names. We show it for the case k = 2, the general case following by a straightforward induction.
Let D be a database and let By definition, for α mapping every free variable of E and F (they have the same free variables by definition), we have: where are the same as the set of γ such that:

9:32 F. Capelli, N. Crosetti, J. Niehren, and J. Ramon Vol. 20:1

which is clear from the definition of conjunctive queries and the fact that set(x 1 ) ∩ set(x 2 ) = ∅. Similarly, the closure of linear sums of the form will generate the same terms as Linear programs in LP (CQ Σ ) in normal form allow us to bound the size of the closure more precisely. For an LP (CQ Σ ) program L in normal form, we denote by Queries ∀ (L) the set of conjunctive queries Q that appear in L in an expression of the form ∀x : Q, and by Queries Σ (L) the set of conjunctive queries Q that appear in L in an expression of the form x:Q . As for LP clos (CQ Σ ), we let Queries w (L) be the set of conjunctive queries Q that appear in L in an expression of the form weight z:x .=y (Q). We let AGM ∀ (L) be max Q∈Queries ∀ (L) AGM(Q) and, similarly, AGM Σ (L) be max Q∈Queries Σ (L) AGM(Q) and AGM w (L) be max Q∈Queriesw(L) AGM(Q).
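The first rewriting step of the proof, distributing universal quantifiers over conjunctions, can be sketched on a small constraint AST (our own encoding; the later flattening of nested quantifiers into a single one is not shown):

```python
def normalize(c):
    """Distribute universal quantifiers over conjunctions:
    ∀x:Q.(C1 ∧ C2) has the same closure as (∀x:Q.C1) ∧ (∀x:Q.C2).
    Returns the resulting conjunction (a list) of quantified atomic
    constraints."""
    if c[0] == "and":
        return normalize(c[1]) + normalize(c[2])
    if c[0] == "forall":
        _, q, body = c
        return [("forall", q, a) for a in normalize(body)]
    return [c]  # atomic: S ≤ S' or S = S'

c = ("forall", "Q1", ("and", ("atom", "A"), ("forall", "Q2", ("atom", "B"))))
assert normalize(c) == [
    ("forall", "Q1", ("atom", "A")),
    ("forall", "Q1", ("forall", "Q2", ("atom", "B"))),
]
```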
Observe that, given a database D, an atomic linear constraint of the form ∀x:Q. B will generate at most |D|^AGM(Q) constraints in the closure close(L)^D. Similarly, an atomic linear sum of the form Σx:Q B will generate at most |D|^AGM(Q) terms in close(L)^D. Hence, we have the following: While it is not clear to us how one could avoid unfolding every universal quantifier when constructing the factorized interpretation of an LP(CQΣ), we explain how one could possibly get a complexity that is sometimes smaller than |D|^AGMΣ(L) when computing the closure of a linear expression. Observe that the closure over a database D of an expression of the form Σx:Q 1 evaluates to the size of Q^D. Now, if Q has fractional hypertree width k, one can reduce the data complexity of this task from O(|D|^AGM(Q)) to O(|D|^k) using [PS13]. Similarly, when evaluating such sum expressions, one first computes the list of possible values α for set(y), which will be of size at most |D|^k′. Again, if Q has bounded fractional hypertree width k, one can compute Kα efficiently, since the condition γ(z) = α(y) corresponds to a condition on the values that some variables in x have to take. Hence, since computing the list of possible α is doable in time O(|D|^k′) and computing Kα for each α can be done in time O(|D|^k), we can compute the closure in time O(|D|^(k+k′)) if we have the relevant tree decomposition.

5.3. Case study. The practical performance of our approach heavily depends on how linear solvers perform on the factorized interpretation. We compared the performance of GLPK on both the natural interpretation and the factorized interpretation of the resource delivery problem from Section 1.1 using synthetic data.
For each run we fixed an input size m as well as a domain D of size n = f(m). We then generated each input table of arity k by uniformly sampling m tuples from the n^k possible tuples over D. The value of k was chosen so that the ratio m/n^k of selected tuples was constant throughout the runs. We used Python and the PuLP library to build the linear programs, as well as a hard-coded tree decomposition of the dlr query (see Section 1.1). The tests were run on an office laptop by progressively increasing the size of the generated input tables. A summary of our experiments is displayed in Figure 10. The previous formulation allowed us to bound the size of the closure, which is now implicitly done by observing that in this case AGM∀(L) + AGMΣ(L) ≤ 2. This new formulation allows us to be more precise and provides better bounds. Interestingly, in this example, the theoretical guarantees given by the factorized interpretation should not be much better than the theoretical guarantees given by the natural interpretation. Indeed, recall that the query considered in this example is the dlr query from Section 1.1. Observe that the existentially quantified variables functionally depend on the free variables of dlr. For example, in the table prod, q represents the quantity of objects o that factory f can produce; q is hence functionally dependent on f and o. Hence, the AGM bound for dlr implies that for every database D, dlr^D has size at most |D|^2. Now, observe that the fractional hypertree width of dlr is also 2.
Hence, both the factorized interpretation and the natural interpretation may have up to O(|D|^2) variables. Observe, however, that in the light of Lemma 2.4, the number of variables in the factorized interpretation will never exceed the number of variables in the natural interpretation multiplied by a factor depending only on the size of the tree decomposition. But even if the decomposition has width 2, the factorized interpretation may still be smaller in practice than the natural interpretation, and that is what our experiments show. The decomposition we used for the experiments consists of a normalized version of the tree decomposition having two connected vertices u1, u2 with B(u1) = {f, o, q, w, b, c2} and B(u2) = {b, o, q2, f, w, c}. Hence, the variables in the factorized interpretation will be the projections of dlr^D onto B(u1) and B(u2). It turns out that on the synthetic data we experimented with, these projections are much smaller than the total size of dlr^D, leading to a factorized interpretation that is more succinct than the natural one.
As expected, when comparing both linear programs we observed a larger number of constraints (due to the soundness constraints) in the factorized interpretation. We also observed that the factorized interpretation has fewer variables, as explained in the previous paragraph. While building the natural interpretation quickly became slower than building the factorized interpretation, we did not analyze this aspect further, since we are not using a database engine to build the natural interpretation but build it directly from the tree decomposition, which may not be the fastest method without further optimizations. Most interestingly, solving the factorized interpretation was faster than solving the natural interpretation: despite the increased number of constraints, the decrease in the number of variables paid off. In particular, for an instance with an input size of 2000 lines per table, the natural interpretation had roughly 1.5 million variables while the factorized interpretation had only roughly 150,000. The solving time was also noticeably improved, at 22s for the factorized case against 106s for the natural one.
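The synthetic-data generation described in the case study can be sketched as follows. This is a minimal illustration, not our actual test harness; the function name and the fixed seed are illustrative assumptions.

```python
import random

def sample_table(m, n, k, seed=0):
    """Uniformly sample m distinct k-tuples over the domain {0, ..., n-1}.

    Mirrors the synthetic-data generation described above: each input
    table of arity k is a uniform sample of m tuples out of the n**k
    possible tuples over a domain of size n.
    """
    assert m <= n ** k, "cannot sample more distinct tuples than exist"
    rng = random.Random(seed)
    table = set()
    # Rejection sampling: draw fresh random tuples until m distinct ones
    # have been collected.
    while len(table) < m:
        table.add(tuple(rng.randrange(n) for _ in range(k)))
    return sorted(table)

# Keeping the ratio m / n**k constant across runs amounts to picking
# the arity k (or the domain size n) so the tables stay equally dense.
table = sample_table(m=5, n=10, k=3)
```

With rejection sampling, the expected number of draws stays close to m as long as the density m/n^k is small, which is the regime of the experiments.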

Weightings on Tree Decompositions
This section is dedicated to the proof of Theorem 4.5 stating the soundness of the factorized interpretation. The soundness of the factorized interpretation boils down to a purely algebraic result concerning conjunctive queries that we now explain. Let Q be a conjunctive query, D a database, and T = (V, E, B) a tree decomposition of Q. We are interested in weightings of Q^D, that is, mappings w : Q^D → R+. Such a mapping naturally defines a mapping π_T(w) from {α|B(u) | α ∈ Q^D, u ∈ V} to R+ as follows: for β = α|B(u) for some u ∈ V, we define the projection of w on T by π_T(w)(β) = Σ over γ ∈ Q^D with γ|B(u) = β of w(γ). That is, the weight of β is obtained by summing the weights of all answers of Q compatible with β.
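The projection π_T(w) can be illustrated with a small self-contained sketch. The representation of answers as dictionaries and the helper name are illustrative assumptions, not notation from the paper.

```python
from collections import defaultdict

def project_weighting(answers, weights, bag):
    """Project a weighting w of an answer set onto a bag B(u).

    `answers` is a list of assignments (dicts from variable to value),
    `weights` maps each answer index to a non-negative real, and `bag`
    is the set of variables B(u).  The result maps each restriction
    beta = alpha|bag to pi_T(w)(beta), i.e. the total weight of the
    answers gamma with gamma|bag = beta.
    """
    proj = defaultdict(float)
    for i, alpha in enumerate(answers):
        # Encode the restriction as a canonical (sorted) tuple of pairs.
        beta = tuple(sorted((x, alpha[x]) for x in bag))
        proj[beta] += weights[i]
    return dict(proj)

answers = [{"x": 1, "y": 1}, {"x": 1, "y": 2}, {"x": 2, "y": 1}]
weights = {0: 0.5, 1: 1.5, 2: 1.0}
projected = project_weighting(answers, weights, {"x"})
# x = 1 collects the weights 0.5 + 1.5, while x = 2 keeps 1.0
```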
We show in Section 6.1 that the soundness of the factorized interpretation boils down to inverting this projection. Namely, if we are given a weighting W of {α|B(u) | α ∈ Q^D, u ∈ V}, can we construct a weighting ⨿(W) of Q^D such that π_T(⨿(W)) = W? While this is not always possible, we show in Section 6.2 that it is possible as long as W is sound, a property that roughly says that W(β) and W(γ) cannot be independent if β and γ are compatible tuples (that is, they assign the same values to their common variables). The proof structure is as follows: Proposition 6.6 establishes that π_T(w) is sound. Then Theorem 6.13 explains the construction of ⨿(W) from a sound W. The proof of this theorem relies on a good understanding of how projections of the form Q^D|B(u) relate to one another, which is developed via several intermediate lemmas presented in Section 6.2.

6.1. Factorized interpretation and weightings. By Proposition 3.5, to prove Theorem 4.5 it is sufficient to show that for every quantifier-free conjunctive query Q with tree decomposition T = (V, E, B) and for any set W of weight expressions over Q, we have ρ_T,D(W)|ν(W) = ⟨wc_ν(W)⟩^D|ν(W). In other words, we have to prove the following. Recall that ⟨wc_ν(W)⟩^D is a set of equality constraints relating the variables ν(weight x .=c (Q)) to sums of the variables θ^α_Q. In other words, w2 ∈ ⟨wc_ν(W)⟩^D is a function that maps every θ^α_Q to a value in R+ and maps each ν(weight x .=c (Q)) to the corresponding sum. We thus have a one-to-one correspondence between ⟨wc_ν(W)⟩^D and weightings of Q^D, obtained by associating with w2 ∈ ⟨wc_ν(W)⟩^D the weighting ω of Q^D such that for every α ∈ Q^D, ω(α) := w2(θ^α_Q). Similarly, recall that ρ_T,D(W) is a set of equality constraints together with a conjunction of local soundness constraints: for every edge e = (u, v) ∈ E of T and γ ∈ Q^D|B(u)∩B(v), we define the equality constraint E^e,D_γ(Q). Hence, one can naturally associate with ρ_T,D(W) a family of weightings (W_u) for u ∈ V, where W_u is a weighting of Q^D|B(u). We do so by associating
w 1 ∈ ρ T,D (W) to the family of weightings W u such that for every β ∈ Q D |B(u) , W u (β) = w 1 (ξ β Q,u ).
Observe, however, that not every family (W_u) for u ∈ V can be mapped back to ρ_T,D(W), since it may not satisfy the local soundness constraints. One also needs (W_u) to be sound, that is, for every edge (u, v) ∈ E and γ ∈ Q^D|B(u)∩B(v), we have: Σ over β ∈ Q^D|B(u) with β|B(u)∩B(v) = γ of W_u(β) equals Σ over β ∈ Q^D|B(v) with β|B(u)∩B(v) = γ of W_v(β). Observe that, when interchanging ξ^β_Q,u and W_u(β) in this equality, it exactly corresponds to the local soundness constraints of Q over D. Hence, we have a one-to-one correspondence between ρ_T,D(W) and sound families of weightings (W_u) for u ∈ V.
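A sound family of weightings can be checked mechanically. The following sketch (with an assumed encoding of assignments as sorted tuples of variable-value pairs) tests, for each edge, that both bags project to the same weighting on their shared variables.

```python
def is_sound(bags, edges, family, tol=1e-9):
    """Check the local soundness constraints of a family (W_u).

    `bags` maps a node u to its bag B(u) (a set of variables); `family`
    maps u to a weighting W_u, itself a dict from assignments (tuples
    of (variable, value) pairs, sorted by variable) to reals.  The
    family is sound when, for every edge (u, v), projecting W_u and
    W_v onto B(u) & B(v) yields the same weighting.
    """
    def project(weighting, shared):
        out = {}
        for beta, val in weighting.items():
            gamma = tuple(p for p in beta if p[0] in shared)
            out[gamma] = out.get(gamma, 0.0) + val
        return out

    for u, v in edges:
        shared = bags[u] & bags[v]
        pu = project(family[u], shared)
        pv = project(family[v], shared)
        if set(pu) != set(pv):
            return False
        if any(abs(pu[g] - pv[g]) > tol for g in pu):
            return False
    return True

bags = {"u": {"x", "y"}, "v": {"y", "z"}}
edges = [("u", "v")]
Wu = {(("x", 1), ("y", 1)): 1.0, (("x", 2), ("y", 1)): 1.0,
      (("x", 1), ("y", 2)): 0.5}
Wv = {(("y", 1), ("z", 1)): 2.0, (("y", 2), ("z", 1)): 0.5}
# Both sides give weight 2.0 to y = 1 and 0.5 to y = 2, so this family
# satisfies the local soundness constraint on the edge (u, v).
```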
Hence, proving Lemma 6.1 boils down to the following. For a query Q, a set of weight expressions W, and a tree decomposition T = (V, E, B) of Q compatible with W, we have: • Given a weighting w of Q^D, there exists a sound family of weightings (W_u) for u ∈ V such that for every weight x .=c (Q) ∈ W, we have W_u([x/c]) = Σ over α ∈ Q^D with α(x) = c of w(α), where u is the vertex of T closest to the root such that B(u) = set(x). • Given a sound family of weightings (W_u) for u ∈ V, there exists a weighting w of Q^D such that for every weight x .=c (Q) ∈ W, the same equality holds, where u is the vertex of T closest to the root such that B(u) = set(x).
The existence of such weightings will be proven in Theorem 6.13. Proving the first item is relatively straightforward: it is sufficient to define W_u([x/c]) as Σ over α ∈ Q^D with α(x) = c of w(α) and prove that this yields a sound family. The second item requires a bottom-up inductive construction along T. Section 6.2 is dedicated to proving this correspondence between weightings.
6.2. Constructing Weightings. To make notation lighter, we fix in this section a relation A ⊆ D^X = {α | α : X → D} over a finite set of variables X. In this work, A can be thought of as the answer set of a conjunctive query Q with fv(Q) = X on a database with domain D, but the results presented in this section apply to any relation that is conjunctively decomposed with respect to a tree decomposition (see Definition 6.7 for more details).
We also fix a decomposition tree T = (V, E, B) for X and define a few useful notations. Given two nodes u, v ∈ V we denote the intersection of their bags by Buv = B(u) ∩ B(v). We denote by ↓u the set of vertices v such that v is in the subtree rooted in u (u included) and by ↑u the set of vertices containing u and every vertex v not in ↓u. We extend the notation: B(↓u) (resp. B(↑u)) is the union of B(v) for v in ↓u (resp. ↑u).

6.2.1. Projections and Extensions. We start by introducing a few notations to formally restrict relations and manipulate weightings on relations, which will be necessary to write down the proofs. Let X′ ⊆ X ⊆ 𝒳. For any α′ : X′ → D we define the set of its extensions into A by ext_A(α′) = {α ∈ A | α|X′ = α′}. For a weighting ω of A and a subset of variables X′ ⊆ X, the projection of ω on X′, denoted π_X′(ω) : A|X′ → R+, is defined for all α′ ∈ A|X′ by π_X′(ω)(α′) = Σ over α ∈ ext_A(α′) of ω(α). We make a few useful observations on how extensions and projections interact with one another. Formal proofs of these statements may be found in the appendix. Lemma 6.2. For any two Lemma 6.4. For A ⊆ D^X, ω : A → R+ and X′′ ⊆ X′ ⊆ X: π_X′′(ω) = π_X′′(π_X′(ω)).

6.2.2. Weighting Collections. Definition 6.5. A family Ω = (Ω_v) for v ∈ V is a weighting collection on T for A if it satisfies the following conditions for any two nodes u, v ∈ V: - Ω_u is a weighting of A|B(u), i.e., Ω_u : A|B(u) → R+; - soundness: π_Buv(Ω_u) = π_Buv(Ω_v).
Intuitively, the soundness of a weighting collection on T is a minimal requirement for the existence of a weighting ω of A such that Ω_u is the projection of ω on the bag B(u) of T, that is, Ω_u = π_B(u)(ω), since we have the following: Proposition 6.6. For any weighting ω : A → R+, the family (π_B(v)(ω)) for v ∈ V is a weighting collection on T for A.
Proof. For any u ∈ V let W_u = π_B(u)(ω). The first condition on weighting collections holds trivially, so we only have to show that the soundness constraint holds. By definition of W_u, π_Buv(W_u) = π_Buv(π_B(u)(ω)). Observe that Buv ⊆ B(u), so by Lemma 6.4, π_Buv(W_u) = π_Buv(ω). Similarly, π_Buv(W_v) = π_Buv(ω).
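The composition step in this proof, which is exactly Lemma 6.4, can be replayed numerically on a toy weighting. The encoding of assignments as tuples of variable-value pairs is an illustrative assumption.

```python
from collections import defaultdict

def project(weighting, subset):
    """Project a weighting, given as a dict from assignments (tuples of
    (variable, value) pairs) to reals, onto a subset of variables."""
    out = defaultdict(float)
    for alpha, val in weighting.items():
        out[tuple(p for p in alpha if p[0] in subset)] += val
    return dict(out)

# Lemma 6.4 on a toy weighting: projecting onto X'' directly agrees
# with first projecting onto X' and then onto X'' (X'' subset of X'
# subset of X), which is exactly the step used in the proof above.
omega = {
    (("x", 1), ("y", 1), ("z", 1)): 1.0,
    (("x", 1), ("y", 2), ("z", 1)): 2.0,
    (("x", 2), ("y", 1), ("z", 2)): 3.0,
}
assert project(omega, {"x"}) == project(project(omega, {"x", "y"}), {"x"})
```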
What is more interesting is the converse direction. For any weighting ω : A → R+ with A ⊆ D^X and decomposition tree T = (V, E, B) of X, let Π_T(ω) = (π_B(v)(ω)) for v ∈ V. The question is then, given a weighting collection Ω = (Ω_u) on T, whether we can find a weighting ω of A such that Ω = Π_T(ω). It turns out that soundness alone is not enough to ensure the existence of such a weighting.

6.2.3. Conjunctive Decompositions. However, it becomes possible when A is conjunctively decomposed, as we define next. For this, given a decomposition tree T = (V, E, B) and a subset V′ ⊆ V, we define B(V′) as the union of B(v) for v ∈ V′. Definition 6.7. Let T = (V, E, B) be a decomposition tree of X ⊆ 𝒳. We call a subset of variable assignments A ⊆ D^X conjunctively decomposed by T if for all u ∈ V and β ∈ A|B(u), any α1 ∈ A|B(↓u) and α2 ∈ A|B(↑u) that both extend β recombine into an element of A. Note that the reverse inclusion does hold in general in any case. Proposition 6.8. For any tree decomposition T of a quantifier-free conjunctive query Q ∈ CQΣ and database D ∈ dbΣ, the answer set Q^D is conjunctively decomposed by T.
Proof. Let u be a node of T. Let R(x) be an atom of Q such that x ⊄ B(u). Then we either have x ⊆ B(↓u) or x ⊆ B(↑u). Indeed, by definition, there exists v in T such that x ⊆ B(v); since v is not u, v is either in ↓u or in ↑u, and the result follows. Hence Q can be written as a conjunction Q1 ∧ Q2 where the variables of Q1 are included in B(↓u) and the variables of Q2 are included in B(↑u), by defining Q1 as the set of atoms R(x) of Q such that x ⊆ B(↓u) and Q2 as the remaining atoms (observe that if x ⊆ B(u), then R(x) appears in Q1 by definition).
Moreover, any α1 ∈ Q^D|B(↓u) and α2 ∈ Q^D|B(↑u) extending the same β ∈ Q^D|B(u) recombine into an answer of Q. To see this, it is sufficient to observe that α1 ∈ Q1^D and α2 ∈ Q2^D, which is true since α1 (and symmetrically α2) is the projection of some α′ ∈ Q^D to B(↓u) and the variables of Q1 are included in B(↓u). This shows that Q^D is conjunctively decomposed by T. Proposition 6.8 does not hold when Q is not quantifier free; this is the reason why this technique only works when every query in the linear program is quantifier free.
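The defining property of conjunctive decomposition, checked at a single node u, can be sketched as follows. The encoding of assignments as frozensets of pairs and the function name are illustrative assumptions.

```python
from itertools import product

def conjunctively_decomposed_at(A, down_vars, up_vars):
    """Check the defining property at one node u of the decomposition.

    A is a set of assignments (frozensets of (variable, value) pairs)
    over down_vars | up_vars, with bag B(u) = down_vars & up_vars.
    The property asks that recombining any projection of A onto B(down)
    with any compatible projection of A onto B(up) stays inside A.
    """
    bag = down_vars & up_vars

    def proj(alpha, vs):
        return frozenset(p for p in alpha if p[0] in vs)

    downs = {proj(a, down_vars) for a in A}
    ups = {proj(a, up_vars) for a in A}
    for a1, a2 in product(downs, ups):
        # Compatible on the bag but the recombination falls outside A:
        # the relation is not conjunctively decomposed at this node.
        if proj(a1, bag) == proj(a2, bag) and a1 | a2 not in A:
            return False
    return True
```

A product of two relations joined on the bag passes the check, while a relation with a hidden correlation across the bag (the analogue of a projected-away existential variable) fails it, matching the remark that Proposition 6.8 breaks for quantified queries.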
Conjunctive decomposition is necessary to obtain clean relations between the projections of the form A|B(u) for a vertex u and the projections A|B(v) for v a child of u. We express these relations depending on the type of u in Lemmas 6.9, 6.10, 6.11 and 6.12. These relations will be necessary to prove the correctness of our construction. Lemma 6.9 (Extend nodes). Let T be a decomposition tree of X, u an extend node of T with child v, and A ⊆ D^X a subset of variable assignments. If A is conjunctively decomposed by T then any assignment β ∈ A|B(u) satisfies:

Figure 2: The set of linear programs Lp with variables ξ ∈ Ξ and constants r ∈ R.

2.1. Linear Programs. Let Ξ be a set of linear program variables. In Figure 2, we recall the definition of the sets of linear expressions Le, linear constraints Lc, and linear programs Lp with variables in Ξ. We consider the usual linear equations S .= S′ as syntactic sugar for the constraints S ≤ S′ ∧ S′ ≤ S. For any linear program L = maximize S subject to C, we call S the objective function of L and C the constraint of L. Note that the linear program minimize S subject to C can be expressed as maximize −1 · S subject to C.

2.3. Relational Databases. A relational Σ-structure is a tuple D = (Σ, D, ·^D), where Σ = ((R^(n)) for n ≥ 0, C) is a relational signature, D a finite set, c^D ∈ D an element for each constant c ∈ C, and R^D ⊆ D^n a relation for each relation symbol R ∈ R^(n).

Figure 5: The answer set of a conjunctive query Q ∈ CQ Σ on a database D ∈ db Σ for a set of variables X ⊇ fv (Q).

Figure 6: Linear sums, constraints, and programs with closed weight expressions containing conjunctive queries Q ∈ CQ Σ where r ∈ R.

Definition 3.1. For any linear program L ∈ LPclos(CQΣ) we define the natural interpretation ⟨L⟩^D ∈ LP by replacing any weight expression S in L by ⟨S⟩^D. By applying the same substitution we define the interpretation ⟨S⟩^D ∈ LS of any linear sum S ∈ LSclos(CQΣ) and the interpretation ⟨C⟩^D ∈ LC of any linear constraint C ∈ LCclos(CQΣ) analogously.
=c (Q) in L. An interpretation I = (I_W, I_C) of LPclos(CQΣ) is a pair of functions such that, given a database D over signature Σ: • I_W maps every weight expression W := weight x .=c (Q) and database D to a linear sum I^D_W(W) over linear program variables X^D_I,Q, • I_C maps every conjunctive query Q over signature Σ and database D to a set of constraints over these variables. Given a linear program L = maximize S subject to C with some linear sum S ∈ LSclos(CQΣ) and some linear constraint C ∈ LCclos(CQΣ), we denote by W_Q(L) the set of weight expressions of L over Q. The I-interpretation of L is the following linear program I^D(L): since the variables of I^D_Q(L) and of I^D_Q′(L) are disjoint for distinct queries Q and Q′, this allows us to prove the following sufficient condition for an interpretation to be sound, which only depends on the value of the interpretation on weight expressions over one conjunctive query Q: Proposition 3.5. Let I = (I_W, I_C) be an interpretation of LPclos(CQΣ) such that for every conjunctive query Q and set W of weight expressions over Q we have that I^D_Q(W)|ν(W) = ⟨wc_ν(W)⟩^D|ν(W). Then I is sound, that is, for every L ∈ LPclos(CQΣ) and database D, opt(⟨L⟩^D) = opt(I^D(L)). Proof. Let L = maximize S subject to C. By Proposition 3.4, opt(⟨L⟩^D) = opt(⟨repl_ν(L)⟩^D). Hence it is sufficient to show that opt(⟨repl_ν(L)⟩^D) = opt(I^D(L)). Now recall that ⟨repl_ν(L)⟩^D and I^D(L) have the same objective function ⟨S⟩ν and the same constraint ⟨C⟩ν on variables ν(W). Only the last part of the program differs: I^D(L) contains the additional constraints I^D_Q(W_Q(L)) for Q ∈ Queriesw(L), while ⟨repl_ν(L)⟩^D contains ⟨wc_ν(W(L))⟩^D.
Definition 4.1. Let L ∈ LPclos(CQΣ), ∃y.Q ∈ Queriesw(L) with Q the quantifier-free part of ∃y.Q, and weight x .=c (∃y.Q) a weight expression of L. A tree decomposition T = (V, E, B) of ∃y.Q compatible with weight x .
be done efficiently with respect to the width of the decomposition: Lemma 4.3. Let k be the width of T. The size of Ξ^D_Q,T is at most |V| · |D|^k and one can compute Ξ^D_Q,T in time O(|T| · |D|^k log(|D|)). Proof. This follows directly from Lemma 2.4 in Section 2.4. We define the factorized interpretation of the weight expressions weight x .

=y and qf(Q′) contains y′ .= y′. Hence, exploiting this one-to-one correspondence, for L = maximize S subject to C with some linear sum S ∈ LSclos(CQΣ) and some linear constraint C ∈ LCclos(CQΣ), we can see ⟨repl_ν(qf(L))⟩^D as: maximize ⟨S⟩ν subject to ⟨C⟩ν ∧ the conjunction over W ∈ W(L) of ν(W) = M^D(W), where M^D maps weight expressions of L to a sum expression. In other words, one can see ⟨repl_ν(qf(L))⟩^D as I^D(L) where I = (M, true) is the alternate interpretation, in the sense given in Section 3.4, defined as follows: for a weight expression W = weight x .

Figure 7: Linear sum, constraints, and programs with open weight expressions over conjunctive queries Q, Q ′ ∈ CQ Σ , with variables x ∈ X, sequences of variables x ∈ X * , constants c ∈ C, and reals r ∈ R.

Figure 8: Free variables of expressions, constraints, and programs, where var(y) denotes the elements of set(y) that are not constants.

Figure 9: Closure close(F)_D,γ of linear programs F with relational descriptors L ∈ LP(CQΣ) to linear programs with set descriptors close(L)^D ∈ LPclos(CQΣ). Here γ : Y → D is a variable assignment with fv(F) ⊆ Y ⊆ 𝒳, and γ′ = γ|Y\set(x). Recall that subs is the substitution operator and ext_x is the extension operator as defined in Section 2.2, and num^D a partial function from dom(D) to R. Since a linear program L ∈ LP(CQΣ) does not have free variables, the closure of L indeed produces a linear program in LPclos(CQΣ), as stated below: Proposition 5.1. For any linear program L ∈ LP(CQΣ) with open weight expressions and database with numerical values D = (Σ, D, ·^D, num^D) such that D ⊆ C, the closure close(L)^D is a linear program in LPclos(CQΣ).

Proposition 5.4. Let L be an LP(CQΣ) in normal form and D be a database. Then close(L)^D has at most |L| · |D|^AGM∀(L) constraints, each of size at most |L| · |D|^AGMΣ(L). In particular, ⟨L⟩^D has at most |L| · |D|^AGM∀(L) constraints and |L| · |D|^AGMw(L) variables, and can be computed in time O(|L| · |D|^(AGMΣ(L)+AGM∀(L)+AGMw(L))). Hence, there exists ℓ < 2.37286 such that one can compute opt(⟨L⟩^D) in time O(|L|^ℓ · |D|^(ℓ(AGMΣ(L)+AGM∀(L)+AGMw(L)))). Proposition 5.4 directly implies that LP(CQΣ) can be solved in polynomial time in data complexity.

Factorized interpretation of LP(CQΣ). One can get a better time complexity than the one stated in Proposition 5.4 by exploiting the factorized interpretation presented in Section 4. Indeed, one can directly lift Definition 4.2 of hypertree decomposition and fractional hypertree width from LPclos(CQΣ) to LP(CQΣ). Now, observe that for every LP(CQΣ) program L and every database D, we have Queriesw(L) = Queriesw(close(L)^D). Hence we have the following: Lemma 5.5. Let L be an LP(CQΣ) program and let T be a tree decomposition of L. Then for every D, T is a tree decomposition of close(L)^D. Hence, let L be an LP(CQΣ) in normal form and T a tree decomposition of L of width k. Given a database D, one can compute L′ = close(L)^D in time O(|L| · |D|^(AGM∀(L)+AGMΣ(L))). Then, using Theorem 4.9, one can compute opt(L′) in time O((|L′| + qt|D|^k)^ℓ), where q and t respectively denote the sum of the sizes of the queries in Queriesw(L) and of the tree decompositions in T. Hence, in the data complexity model, there exists ℓ < 2.37286 such that one can compute opt(⟨L⟩^D) in time O(|D|^(ℓp)) where p = max(k, AGM∀(L) + AGMΣ(L)).

Figure 10: Number of variables and performances of GLPK for natural (blue) and factorized (red) interpretation of the resource delivery problem with respect to table size.
) which is itself smaller than |D|^k by Lemma 2.4. Hence there are at most |T| · |D|^k constraints. Let β′ ∈ Q^D|B(v) and let γ = β′|B(u). If it is not yet constructed, we create an empty linear sum S^γ_v and add ξ^β′_Q,v to it. If S^γ_v already exists, we simply append +ξ^β′_Q,v to it. Finally, we let E^e,D_γ(Q) be S^γ_u .
Linear Programs with Existentially Quantified Conjunctive Queries. One drawback of Theorem 4.5 is that it only works for a linear program L in LPclos(CQΣ) whose queries are quantifier free. Theorem 4.7. Given a relational signature Σ, L ∈ LPclos(CQΣ), a database D, and optimal tree decompositions T of L of width fhtw(L), one can compute opt(⟨L⟩^D) in time O(|L|^ℓ |D|^(ℓ·fhtw(L))) with ℓ < 2.37286.
Complexity of solving LP(CQΣ) programs. In this section, we are interested in the complexity of solving LP(CQΣ) programs. Hardness. Universal quantifiers make the complexity of solving programs in LP(CQΣ) much harder than for LPclos(CQΣ): Theorem 5.2. The problem of computing ⟨L⟩^D for an LP(CQΣ) L and a database D with real values given in the input is #P-hard. Proof. Given a conjunctive query Q on variables x, we define L_Q to be the following LP(CQΣ) program: maximize weight true (Q) subject to ∀y:Q(y). weight x:x .