Answer Counting under Guarded TGDs

We study the complexity of answer counting for ontology-mediated queries and for querying under constraints, considering conjunctive queries and unions thereof (UCQs) as the query language and guarded TGDs as the ontology and constraint language, respectively. Our main result is a classification according to whether answer counting is fixed-parameter tractable (FPT), W[1]-equivalent, #W[1]-equivalent, #W[2]-hard, or #A[2]-equivalent, lifting a recent classification for UCQs without ontologies and constraints due to Dell et al. The classification pertains to various structural measures, namely treewidth, contract treewidth, starsize, and linked matching number. Our results rest on the assumption that the arity of relation symbols is bounded by a constant and, in the case of ontology-mediated querying, that all symbols from the ontology and query can occur in the data (so-called full data schema). We also study the meta-problems for the mentioned structural measures, that is, to decide whether a given ontology-mediated query or constraint-query specification is equivalent to one for which the structural measure is bounded.


Introduction
Tuple-generating dependencies (TGDs) are a prominent formalism for formulating database constraints. A TGD states that if certain facts are true, then certain other facts must be true as well. This can be interpreted in different ways. In ontology-mediated querying, TGDs give rise to ontology languages and are used to derive new facts in addition to those that are present in the database. This makes it possible to obtain additional answers if the data is incomplete and also enriches the vocabulary that is available for querying. In a more classical setup that we refer to as querying under constraints, TGDs are used as integrity constraints on the database, that is, a TGD expresses the promise that if certain facts are present in the database, then certain other facts are present as well. Integrity constraints are relevant to query optimization as their presence might enable the reformulation of a query into a 'simpler' one. TGDs generalize a wide range of other integrity constraints such as referential integrity constraints (also known as inclusion dependencies), which was the original motivation for introducing them [AHV95].
[...] considered in [KR15,KK18,BMT20,CCLR20,BMT21a,BMT21b]. There, conjunctive queries are equipped with dedicated counting variables and the focus is to decide, given an OMQ Q = (O, S, q), an S-database D, and a k ≥ 0, whether there is a model of D and O such that the homomorphisms from q to that model yield at least/at most k bindings of the counting variables. The ontology languages studied are versions of the description logic DL-Lite, with the exception of [BMT21b], which studies the description logic ELI. These can all be viewed as guarded TGDs, up to a certain syntactic normalization in the case of ELI.
Query evaluation under constraints that are guarded TGDs has been considered in [BGP16,BFGP20]. A main result is an FPT upper bound for CQs that have bounded generalized hypertree width modulo equivalence. These papers also study the meta-problem for querying under constraints that are guarded TGDs and for the measure of generalized hypertree width. A topic closely related to the evaluation of queries under constraints is query containment under constraints, see for example [CGL98,JK84,Fig16]. We are not aware that answer counting under integrity constraints has been studied before.

Preliminaries
For an integer n ≥ 1, we use [n] to denote the set {1, . . . , n}. To indicate the cardinality of a set S, we may write #S or |S|.

Relational Databases.
A schema S is a set of relation symbols R with associated arity ar(R) ≥ 0. We write ar(S) for max_{R∈S} {ar(R)}. An S-fact is an expression of the form R(c̄), where R ∈ S and c̄ is an ar(R)-tuple of constants. An S-instance is a (possibly infinite) set of S-facts and an S-database is a finite S-instance. We write adom(I) for the set of constants in an instance I. For a set S ⊆ adom(I), we denote by I|_S the restriction of I to facts that mention only constants from S. A homomorphism from I to an instance J is a function h : adom(I) → adom(J) such that R(h(c̄)) ∈ J for every R(c̄) ∈ I, where h(c̄) means the component-wise application of h. A guarded set in a database D is a set S ⊆ adom(D) such that all constants in S jointly occur in a fact in D, possibly together with other constants. By a maximal guarded set, we mean a guarded set that is maximal with respect to set inclusion.
We next introduce some operations on instances that are used in the paper. An induced subinstance of an S-instance I is any S-instance I′ obtained from I by choosing a ∆ ⊆ adom(I) and putting I′ = {R(c̄) ∈ I | c̄ ∈ ∆^{ar(R)}}. If I and I′ are finite, then we speak of an induced subdatabase. The disjoint union of two S-instances I_1 and I_2 with adom(I_1) ∩ adom(I_2) = ∅ is simply I_1 ∪ I_2. The direct product of two S-instances I_1 and I_2 is the S-instance I with domain adom(I) = adom(I_1) × adom(I_2) defined as I = {R((a_1, b_1), . . . , (a_n, b_n)) | R(a_1, . . . , a_n) ∈ I_1 and R(b_1, . . . , b_n) ∈ I_2}.
An instance I′ is obtained from an instance I by cloning constants if I′ ⊇ I can be constructed by choosing c_1, . . . , c_n ∈ adom(I) and positive integers m_1, . . . , m_n, reserving fresh constants c_1^{i_1}, . . . , c_n^{i_n} with 1 ≤ i_ℓ ≤ m_ℓ for 1 ≤ ℓ ≤ n, and adding to I each atom R(c̄′) that can be obtained from some R(c̄) ∈ I by replacing each occurrence of c_i, 1 ≤ i ≤ n, with c_i^j for some j with 1 ≤ j ≤ m_i.
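The notions above can be made concrete with a small sketch. It is only an illustration under assumed encodings that are not part of the paper: a fact is a pair of a relation name and a tuple of constants, an instance is a Python set of such pairs, and the helper names `adom`, `restrict`, `is_homomorphism`, and `direct_product` are hypothetical.

```python
from itertools import product

# Assumed encoding: a fact is (relation_name, tuple_of_constants),
# and an instance is a set of facts.

def adom(instance):
    """Active domain: all constants mentioned in some fact."""
    return {c for _, args in instance for c in args}

def restrict(instance, constants):
    """Restriction to facts that mention only the given constants."""
    allowed = set(constants)
    return {(rel, args) for rel, args in instance if set(args) <= allowed}

def is_homomorphism(h, source, target):
    """Check that R(h(c)) is in target for every fact R(c) in source."""
    return all((rel, tuple(h[c] for c in args)) in target
               for rel, args in source)

def direct_product(i1, i2):
    """Pair up facts with the same relation symbol, component-wise."""
    return {(r1, tuple(zip(a1, a2)))
            for (r1, a1), (r2, a2) in product(i1, i2)
            if r1 == r2}
```

For instance, mapping every constant of a two-edge path to the single constant of a loop is a homomorphism, while no map from the loop into the path is one. The two projections from a direct product to its factors are always homomorphisms.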
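As a hedged illustration of cloning, the following sketch follows one literal reading of the definition: the original facts are kept, and every occurrence of a chosen constant is replaced by one of its clones, independently per position. The encoding (facts as pairs, clones of `c` represented as pairs `(c, j)`) and the function name `clone_constants` are assumptions made for this example only.

```python
from itertools import product

def clone_constants(instance, multiplicities):
    """Return an instance I' that contains I and the cloned facts.

    `multiplicities` maps each chosen constant c to a positive number m
    of fresh clones, represented here as the pairs (c, 1), ..., (c, m).
    It is assumed that no original constant already has this pair shape.
    """
    def replacements(c):
        m = multiplicities.get(c, 0)
        # Chosen constants are replaced by a clone; all others stay as they are.
        return [(c, j) for j in range(1, m + 1)] if m > 0 else [c]

    result = set(instance)  # I' is a superset of I
    for rel, args in instance:
        for new_args in product(*(replacements(c) for c in args)):
            result.add((rel, new_args))
    return result
```

Under this reading, mapping every clone back to its original constant is a homomorphism from the cloned instance to the original one.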

CQs and UCQs.
A conjunctive query (CQ) q(x̄) over a schema S is a first-order formula of the form ∃ȳ φ(x̄,ȳ) where x̄ and ȳ are disjoint tuples of variables and φ is a conjunction that may contain relational atoms R_i(x̄_i) with R_i ∈ S and x̄_i a tuple of variables of length ar(R_i) as well as equality atoms x_1 = x_2. The variables used in φ must be exactly those in x̄ and ȳ, and only variables from x̄ may appear in equality atoms. We assume that x̄ contains no repeated variables, which is w.l.o.g. due to the presence of equality atoms. With var(q), we denote the set of variables that occur in x̄ or in ȳ. Whenever convenient, we identify a conjunction of atoms with a set of atoms. When we are not interested in order and multiplicity, we treat x̄ as a set of variables. A CQ is equality-free if it contains no equality atoms. Note that we do not admit constants in CQs (see Footnote 2). We write CQ for the class of all CQs. Every CQ q(x̄) can be seen as a database D_q in a natural way, namely by dropping the existential quantifier prefix and the equality atoms, and viewing variables as constants. A homomorphism h from a CQ q to an instance I is a homomorphism from D_q to I such that x = y ∈ q implies h(x) = h(y). A tuple c̄ ∈ adom(I)^{|x̄|} is an answer to q on I if there is a homomorphism h from q to I with h(x̄) = c̄.
C. Feier, C. Lutz, and M. Przybyłko, Vol. 19:3
A union of conjunctive queries (UCQ) over a schema S is a first-order formula of the form q(x̄) := q_1(x̄) ∨ · · · ∨ q_n(x̄), where n ≥ 1, and q_1(x̄), . . . , q_n(x̄) are CQs over S. We refer to the variables in x̄ as the answer variables of q and the arity of q is defined as the number of its answer variables. An example of a UCQ with two answer variables x_1, x_2 is x_1 = x_2 ∨ ∃y (R(x_1, y) ∧ R(x_2, y)). A tuple c̄ ∈ adom(I)^{|x̄|} is an answer to q on an instance I if it is an answer to q_i on I, for some i with 1 ≤ i ≤ n. The evaluation of q on an instance I, denoted q(I), is the set of all answers to q on I. A (U)CQ of arity zero is called Boolean.
The only possible answer to a Boolean query is the empty tuple. For a Boolean (U)CQ q, we may write I |= q if q(I) = {()} and I ⊭ q otherwise. Note that all notions defined for UCQs also apply to CQs, which are simply UCQs with a single disjunct. We write UCQ for the class of all UCQs.
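To make the semantics of answers concrete, here is a brute-force sketch that enumerates all assignments of variables to constants. It is meant as an illustration only and is exponential in the number of variables. The encoding of a CQ as a triple (answer variables, relational atoms, equality atoms) and the names `cq_answers` and `ucq_answers` are assumptions of this example.

```python
from itertools import product

def cq_answers(answer_vars, atoms, equalities, instance):
    """All answers to a CQ on an instance, by exhaustive search.

    answer_vars: tuple of variable names
    atoms:       list of (relation_name, tuple_of_variables)
    equalities:  list of pairs (x, y) of answer variables
    instance:    set of (relation_name, tuple_of_constants)
    """
    dom = list({c for _, args in instance for c in args})
    # Answer variables first, then the remaining variables, without repeats.
    variables = list(dict.fromkeys(
        list(answer_vars) + [v for _, args in atoms for v in args]))
    answers = set()
    for values in product(dom, repeat=len(variables)):
        h = dict(zip(variables, values))
        atoms_ok = all((rel, tuple(h[v] for v in args)) in instance
                       for rel, args in atoms)
        eqs_ok = all(h[x] == h[y] for x, y in equalities)
        if atoms_ok and eqs_ok:
            answers.add(tuple(h[x] for x in answer_vars))
    return answers

def ucq_answers(disjuncts, instance):
    """Union of the answers to the disjuncts (each a CQ triple)."""
    result = set()
    for answer_vars, atoms, equalities in disjuncts:
        result |= cq_answers(answer_vars, atoms, equalities, instance)
    return result
```

On the instance {R(a, c), R(b, c)}, the example UCQ x_1 = x_2 ∨ ∃y (R(x_1, y) ∧ R(x_2, y)) has three answers from the first disjunct and four from the second, which overlap in (a, a) and (b, b).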
Let q 1 (x) and q 2 (x) be two UCQs over the same schema S. We say that q 1 is contained in q 2 , written q 1 ⊆ S q 2 , if q 1 (D) ⊆ q 2 (D) for every S-database D. Moreover, q 1 and q 2 are equivalent, written q 1 ≡ S q 2 , if q 1 ⊆ S q 2 and q 2 ⊆ S q 1 .
We next define the important notion of a homomorphism core of a CQ q(x̄). The potential presence of equality atoms in q brings some subtleties. In particular, it is not guaranteed that there is a homomorphism from q to D_q that is the identity on x̄. To address this issue, we resort to the database D^∼_q obtained from D_q by identifying any constants/variables x_1, x_2 such that x_1 = x_2 ∈ q. For V the set of all variables from x̄ that occur as constants in D^∼_q, it is easy to see that there is a homomorphism from q to D^∼_q that is the identity on all variables in V. We say that q is a core if every homomorphism h from q to D^∼_q that is the identity on V is surjective. Every CQ q(x̄) is equivalent to a CQ p(x̄) that is a core and can be obtained from q by dropping atoms. In fact, p is unique up to isomorphism and we call it the core of q. For a UCQ q, we use core(q) to denote the disjunction whose disjuncts are the cores of the CQs in q.
For a UCQ q, but also for any other syntactic object q, we use ||q|| to denote the number of symbols needed to write q as a word over a suitable alphabet.
Our main interest is in the complexity of counting the number of answers. Every choice of a query language Q, such as CQ and UCQ, and a class of databases D gives rise to the following answer counting problem:
PROBLEM : AnswerCount(Q, D)
INPUT : A query q ∈ Q over some schema S and an S-database D ∈ D
OUTPUT : #q(D)
Our main interest is in the parameterized version of the above problem where we generally assume that the parameter is the size of the input query, see below for more details. When D is the class of all databases, we simply write AnswerCount(Q).
Footnote 2: We believe that, in principle, our results can be adapted to the case with constants. This requires a suitable revision of the structural measures defined in Section 3 as, for example, constants should not contribute to the treewidth of a CQ. Also, the results for CQs without ontologies that we build upon would first have to be extended to include constants.
For a TGD T of the form ϕ(x̄,ȳ) → ∃z̄ ψ(x̄,z̄), we call ϕ and ψ the body and head of T, denoted body(T) and head(T), respectively. An instance I over S satisfies T, denoted I |= T, if q_ϕ(I) ⊆ q_ψ(I), where q_ϕ(x̄) = ∃ȳ ϕ(x̄,ȳ) and q_ψ(x̄) = ∃z̄ ψ(x̄,z̄). It satisfies a set of TGDs S, denoted I |= S, if I |= T for each T ∈ S. We then also say that I is a model of S. We write TGD to denote the class of all TGDs.
A TGD T is guarded if body(T) is true or there exists an atom α in its body that contains all variables that occur in body(T) [CGK13]. Such an atom α is a guard of T. While there may be multiple guard atoms in the body of a TGD, we generally assume that one of them is chosen as the actual guard and may thus speak of 'the' guard atom. We write G for the class of guarded TGDs. A TGD T is full if the tuple z̄ of variables is empty, that is, it uses no existential quantification in the head. We use FULL to denote the class of full TGDs and shall often refer to G ∩ FULL, the class of TGDs that are both guarded and full. Note that this class is essentially the class of Datalog programs with guarded rule bodies.
Ontology-Mediated Queries. An ontology O is a finite set of TGDs. An ontology-mediated query (OMQ) takes the form Q = (O, S, q) where O is an ontology, S is a finite schema called the data schema, and q is a UCQ. Both O and q can use symbols from S, but also additional symbols, and in particular O can 'introduce' additional symbols to enrich the vocabulary available for querying. We assume w.l.o.g. that all relation symbols in q that are not from S occur also in O. In fact, any OMQ violating this condition is trivial in that it never returns any answers. When O and q only use symbols from S, then we say that the data schema of Q is full. The arity of Q is defined as the arity of q. We write Q(x̄) to emphasize that the answer variables of q are x̄ and for brevity often refer to the data schema simply as the schema.
A tuple c̄ ∈ adom(D)^{|x̄|} is an answer to Q on an S-database D if c̄ ∈ q(I) for each model I of O with D ⊆ I. The evaluation of Q on D, denoted Q(D), is the set of all answers to Q on D.
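The two syntactic conditions just defined are easy to check mechanically. The sketch below assumes, for illustration only, that a TGD body and head are given as lists of atoms (relation name, tuple of variables), with the empty list standing for the body true. The function names `is_guarded` and `is_full` are hypothetical.

```python
def variables_of(atoms):
    """All variables that occur in a list of atoms."""
    return {v for _, args in atoms for v in args}

def is_guarded(body):
    """True if the body is empty (true) or some atom contains all body variables."""
    if not body:
        return True
    all_vars = variables_of(body)
    return any(all_vars <= set(args) for _, args in body)

def is_full(body, head):
    """True if every head variable occurs in the body (no existential variables)."""
    return variables_of(head) <= variables_of(body)
```

For example, the body R(x, y) ∧ A(x) is guarded by R(x, y), whereas R(x, y) ∧ R(y, z) has no guard.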
The conjunctive query q asks to return all authors that have self-published and the ontology O adds knowledge about the domain of publications. Now consider the S-database D that consists of the following facts: hasAuthor(alice, carroll), hasPublisher(alice, macmillan), Book(finn), hasAuthor(finn, twain), hasPublisher(finn, twain), Book(beowulf), SelfPublication(beowulf).
A straightforward semantic analysis shows that twain ∈ Q(D), despite the fact that the database D does not explicitly state the fact that finn is a self-publication. While beowulf is a self-publication and we know from the ontology that it has an author, this author is not returned as an answer because their identity is unknown. In fact, Q(D) = {twain}.
An OMQ language is a class of OMQs. For a class of TGDs C and a class of UCQs Q, we write (C, Q) to denote the OMQ language that consists of all OMQs (O, S, q) where O is a set of TGDs from C and q ∈ Q. For example, we may write (G ∩ FULL, UCQ). We say that an OMQ language (C, Q) has full data schema if every OMQ in it has full data schema.
The Chase. We next introduce the well-known chase procedure for making explicit the consequences of a set of TGDs [MMS79,JK84,FKMP05,CGK13]. We first define a single chase step. Let I be an instance over a schema S and T = ϕ(x̄,ȳ) → ∃z̄ ψ(x̄,z̄) a TGD over S. We say that T is applicable to a tuple (c̄,c̄′) of constants in I if ϕ(c̄,c̄′) ⊆ I. In this case, the result of applying T in I at (c̄,c̄′) is the instance J = I ∪ ψ(c̄,c̄″), where c̄″ is the tuple obtained from z̄ by simultaneously replacing each variable z with a fresh distinct constant that does not occur in I. We describe such a single chase step by writing I →_{T,(c̄,c̄′)} J.
Let I be an instance and S a finite set of TGDs. A chase sequence for I with S is a sequence of chase steps
I_0 →_{T_0,(c̄_0,c̄′_0)} I_1 →_{T_1,(c̄_1,c̄′_1)} I_2 · · ·
such that (1) I_0 = I, (2) T_i ∈ S for each i ≥ 0, and (3) J |= S with J = ⋃_{i≥0} I_i. The instance J is the (potentially infinite) result of this chase sequence, which always exists. The chase sequence is fair if whenever a TGD T ∈ S is applicable to a tuple (c̄,c̄′) in some I_i, then a chase step I_j →_{T,(c̄,c̄′)} I_{j+1} is part of the sequence for some j ≥ i. Note that our chase is oblivious, that is, a TGD is triggered whenever its body is satisfied, even if also its head is already satisfied. As a consequence, every fair chase sequence for I with S leads to the same result, up to isomorphism. Thus, we can refer to the result of chasing I with S, denoted ch_S(I).
The following lemma gives the well-known main properties of the chase.

Lemma 2.2.
(1) Let S be a finite set of TGDs and I an instance. Then for every model J of S with I ⊆ J, there is a homomorphism h from ch S (I) to J that is the identity on adom(I).
Point 1 can be proved by constructing h step by step, starting from the identity on adom(I) and following chase rules. Point 2 is an easy consequence of Point 1 and the semantics of OMQs.
We shall often chase with sets S of guarded full TGDs, that is, TGDs from G ∩ FULL. In contrast to the case of guarded TGDs, the chase is then clearly finite. Moreover, it can be constructed within the following time bounds.
The time bound stated in Lemma 2.3 can be achieved in a straightforward way. To find a homomorphism from a TGD ϕ(x̄,ȳ) → ψ(x̄) in S with guard R(x̄,ȳ) to D, we can scan D linearly to find all facts that R(x̄,ȳ) can be mapped to and then verify by additional scans that the remaining atoms in ϕ are also satisfied. This takes time ||D||^2 · n, where n is the number of atoms in ϕ. Because all TGDs are guarded, it is easy to prove by induction on the number of chase rule applications that for every added fact R(b̄), all constants in b̄ must co-occur in some fact T(c̄) in D where T occurs in S. Consequently, the chase can add at most ||D|| · k^k · ℓ fresh facts where k is the maximum arity of relation symbols in S and ℓ is the number of relation symbols that occur on the right-hand side of a TGD in O. Note that k^k is the maximum number of ways to choose a k-tuple of constants from a fact T(c̄) in D where T occurs in S.
For sets S of TGDs from G ∩ FULL, we may also chase a CQ q(x̄) with S, denoting the result with ch_S(q). What we mean is the (finite) result of chasing database D_q with S, viewing the result as a CQ with answer variables x̄, and adding back the equality atoms of q (that are dropped in the construction of D_q). We then have the following.
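The finiteness of the chase with full TGDs can be seen in a naive fixpoint computation, sketched below. This is not the procedure behind the stated time bound and makes no attempt to exploit guards. The encoding of a full TGD as a pair (body atoms, head atoms) and the names `homomorphisms` and `chase_full` are assumptions of this example.

```python
def homomorphisms(atoms, instance):
    """Yield every assignment h of variables with h(atoms) contained in instance."""
    def extend(h, remaining):
        if not remaining:
            yield dict(h)
            return
        (rel, args), rest = remaining[0], remaining[1:]
        for frel, fargs in instance:
            if frel != rel or len(fargs) != len(args):
                continue
            h2 = dict(h)
            # Bind each variable, or check consistency with an earlier binding.
            if all(h2.setdefault(v, c) == c for v, c in zip(args, fargs)):
                yield from extend(h2, rest)
    yield from extend({}, list(atoms))

def chase_full(instance, tgds):
    """Apply full TGDs (body, head) until no new fact is produced."""
    result = set(instance)
    while True:
        new_facts = set()
        for body, head in tgds:
            for h in homomorphisms(body, result):
                for rel, args in head:
                    new_facts.add((rel, tuple(h[v] for v in args)))
        if new_facts <= result:
            return result
        result |= new_facts
```

Since full TGDs never introduce fresh constants, only finitely many facts over adom(D) can be added, so the loop terminates. With the guarded full TGDs R(x, y) ∧ A(x) → B(y) and B(x) → A(x) and the database {R(a, b), R(b, c), A(a)}, the chase adds B(b), A(b), B(c), and A(c).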
Lemma 2.4. q(ch S (D)) = ch S (q)(ch S (D)) for all databases D, CQs q, and finite sets of TGDs S from G ∩ FULL.
The converse containment also holds as every homomorphism from q i to ch S (D) is also a homomorphism from ch O (q i ) to ch S (D). This can be shown by induction, considering all CQs q i = p 1 , . . . , p ℓ = ch O (q i ) that arise when chasing q i with O.
Treewidth. Treewidth is a widely used notion that measures the degree of tree-likeness of a graph. Let G = (V, E) be an undirected graph. A tree decomposition of G is a pair δ = (T_δ, χ), where T_δ = (V_δ, E_δ) is a tree, and χ is a labeling function V_δ → 2^V, i.e., χ assigns a subset of V to each node of T_δ, such that: (1) ⋃_{t∈V_δ} χ(t) = V, (2) for every edge {u, v} ∈ E, there is a node t ∈ V_δ with {u, v} ⊆ χ(t), and (3) for every v ∈ V, the set of nodes {t ∈ V_δ | v ∈ χ(t)} induces a connected subtree of T_δ.
The width of δ is the number max_{t∈V_δ} {|χ(t)|} − 1. If the edge set E of G is non-empty, then the treewidth of G is the minimum width over all its tree decompositions; otherwise, it is defined to be one. Note that trees have treewidth 1. Each instance I is associated with an undirected graph (without self loops) G_I = (V, E), called the Gaifman graph of I, defined as follows: V = adom(I), and {a, b} ∈ E iff there is a fact R(c̄) ∈ I that mentions both a and b. The treewidth of I is the treewidth of G_I.
Parameterized Complexity. A counting problem over a finite alphabet Λ is a function P : Λ* → N and a parameterized counting problem over Λ is a pair (P, κ), with P a counting problem over Λ and κ the parameterization of P, a function κ : Λ* → N that is computable in PTime. An example of a parameterized counting problem is #pClique in which P maps (a suitable encoding of) each pair (G, k) with G an undirected graph and k ≥ 0 a clique size to the number of k-cliques in G, and where κ(G, k) = k. Another example is #pDomSet where P maps each pair (G, k) to the number of dominating sets of size k, and where again κ(G, k) = k.
[Figure 1. An example for Gaifman graphs and their contracts. Panel (b): Gaifman graph G_q of q. Panel (c): contract of G_q.]
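As an illustration, the sketch below builds the Gaifman graph of an instance and checks the standard conditions of a tree decomposition: every vertex is covered by some bag, every edge is contained in some bag, and the nodes whose bags contain a given vertex are connected in the tree. The encoding (bags as a dict from tree nodes to sets, tree edges as pairs) and all function names are assumptions of this example, and the check that the given tree edges really form a tree is omitted.

```python
from itertools import combinations

def gaifman_graph(instance):
    """Vertices are the constants; two distinct constants are adjacent
    iff they occur together in some fact."""
    vertices = {c for _, args in instance for c in args}
    edges = {frozenset(p) for _, args in instance
             for p in combinations(set(args), 2)}
    return vertices, edges

def is_tree_decomposition(vertices, edges, tree_edges, bags):
    """Check the three conditions; tree_edges is assumed to form a tree."""
    # (1) every vertex occurs in some bag
    if set().union(*bags.values()) != set(vertices):
        return False
    # (2) every edge is contained in some bag
    if not all(any(e <= bag for bag in bags.values()) for e in edges):
        return False
    # (3) the nodes whose bags contain v are connected in the tree
    for v in vertices:
        nodes = {t for t, bag in bags.items() if v in bag}
        start = next(iter(nodes))
        seen, stack = {start}, [start]
        while stack:
            t = stack.pop()
            for a, b in tree_edges:
                for s, u in ((a, b), (b, a)):
                    if s == t and u in nodes and u not in seen:
                        seen.add(u)
                        stack.append(u)
        if seen != nodes:
            return False
    return True

def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1
```

For the path instance {R(a, b), R(b, c)}, the bags {a, b} and {b, c} joined by a single tree edge form a decomposition of width 1.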
A counting problem P is a decision problem if the range of P is {0, 1}, and a parameterized decision problem is defined accordingly. An example of a parameterized decision problem is pClique in which P maps each pair (G, k) to 1 if the undirected graph G contains a k-clique and to 0 otherwise, and where κ(G, k) = k.
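To fix intuitions about #pClique and pClique, here is a brute-force sketch. It runs in time roughly n^k and is therefore not an fpt algorithm. It only illustrates what the two problems compute. The function names and the graph encoding are assumptions of this example.

```python
from itertools import combinations

def count_k_cliques(vertices, edges, k):
    """Number of k-element vertex sets that are pairwise adjacent."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        all(frozenset(pair) in edge_set for pair in combinations(candidate, 2))
        for candidate in combinations(list(vertices), k)
    )

def has_k_clique(vertices, edges, k):
    """Decision version: 1 if some k-clique exists, and 0 otherwise."""
    return 1 if count_k_cliques(vertices, edges, k) > 0 else 0
```

On the complete graph with four vertices, the counting version yields 4 for k = 3 and 1 for k = 4, while the decision version yields 1 in both cases.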
A parameterized problem (P, κ) is fixed-parameter tractable (fpt) if there is a computable function f : N → N such that P(x) can be computed in time |x|^{O(1)} · f(κ(x)) for all inputs x. We use FPT to denote the class of all parameterized counting problems that are fixed-parameter tractable.
A Turing fpt-reduction from a parameterized counting problem (P 1 , κ 1 ) to a parameterized counting problem (P 2 , κ 2 ) is an algorithm that computes P 1 with oracle access to P 2 , runs within the time bounds of fixed parameter tractability for (P 1 , κ 1 ), and when started on input x only makes oracle calls with argument y such that κ 2 (y) ≤ f (κ 1 (x)), for some computable function f . The reduction is called a parsimonious fpt-reduction if only a single oracle call is made at the end of the computation and its output is then returned as the output of the algorithm without any further modification.
A parameterized counting problem (P, κ) is C-equivalent if it is C-easy and C-hard. Note that we follow [CM15,DRW19] in defining both easiness and hardness in terms of Turing fpt-reductions; stronger notions would rely on parsimonious fpt-reductions [FG04].

The Classification Without TGDs
In the series of papers [DM14, DM15, CM15, CM16, DRW19], the parameterized complexity of answer counting is studied for classes of CQs and UCQs, resulting in a rather detailed classification. We present it in this section as a reference point and as a basis for establishing our own classifications later on. We start with introducing the various structural measures that play a role in the classification.
Let q(x̄) = ∃ȳ φ(x̄,ȳ) be a CQ. The Gaifman graph of q, denoted G_q, is defined as the Gaifman graph of the database D^∼_q. An x̄-component of G_q is any undirected graph that can be obtained as follows: (1) take the subgraph of G_q induced by vertex set ȳ, (2) choose a maximal connected component (V_c, E_c), and (3) re-add all edges from G_q that contain at least one vertex from V_c. Note that the last step may re-add answer variables as vertices, but no quantified variables. The contract of G_q, denoted contract(G_q), is the restriction of G_q to the answer variables, extended with every edge {x_1, x_2} ⊆ x̄ such that x_1, x_2 co-occur in some x̄-component of G_q. We shall often be interested in the treewidth of the contract of a CQ q, which we refer to as the contract treewidth (CTW) of q.
An example is given in Figure 1. Part (a) shows the CQ q(x_1, x_2, x_3, x_4, x_5, x_6) = ∃y_1 ∃y_2 ∃y_3 ∃y_4 T(y_1, y_2, [...], where filled nodes indicate answer variables and hollow nodes quantified variables, the triangles represent the ternary relation T, and the edges the binary relation R. Part (b) shows the Gaifman graph G_q of q, where x_3 and x_6 have been identified. The dashed blue boxes show the x̄-components and the contract of G_q is shown in Part (c) with edges that have been added due to the x̄-components shown in red. Both the treewidth and contract treewidth of q are two. The starsize (SS) of q is the maximum number of answer variables in any x̄-component of G_q. Note that the same notion is called strict starsize in [CM15] and dominating starsize in [DRW19]. It is different from the original notion of starsize from [DM14,DM15]. The starsize of the CQ in Figure 1 is three.
A set of quantified variables S in q is node-well-linked if for every two disjoint sets S_1, S_2 ⊆ S of the same cardinality, there are |S_1| vertex disjoint paths in G_q that connect the vertices in S_1 with the vertices in S_2. For example, S is node-well-linked if G_q|_S takes the form of a grid or of a clique. A matching M from the answer variables x̄ to the quantified variables ȳ in the graph G_q (in the standard sense of graph theory) is linked if the set S of quantified variables that occur in M is node-well-linked. The linked matching number (LMN) of q is the size of the largest linked matching from x̄ to ȳ in G_q. One should think of the linked matching number as a strengthening of starsize. We do not only demand that many answer variables are interlinked by the same x̄-component, but additionally require that this component is sufficiently large and highly connected ('linked'). In Part (b) of Figure 1, the purple edges indicate the maximal matching. The LMN of the CQ in that figure is two. Figure 2 contains some example CQs with associated measures.
For a class of CQs C, the contract treewidths of CQs in C being bounded by a constant implies that the same is true for starsizes, and bounded starsizes in turn imply bounded linked matching numbers. In fact, the starsize of a CQ q is bounded by the contract treewidth of q plus one and its linked matching number is bounded by its starsize. There are no implications between treewidth and contract treewidth. In Figure 2, Example (a) generalizes to any treewidth while always having contract treewidth 1 and Example (c), which has contract treewidth 3, generalizes to any contract treewidth (and starsize) while always having treewidth 1. We refer to [CM15,DRW19] for additional examples.
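The x̄-components, the contract, and the starsize can be computed directly from the Gaifman graph of a CQ. The following is a minimal sketch under assumed encodings: edges are two-element frozensets, and a component is represented by its quantified variables together with the answer variables attached to it. All function names are hypothetical, and the starsize of a CQ without quantified variables is set to 0 here as a convention of this example.

```python
from itertools import combinations

def x_components(edges, answer_vars, quantified_vars):
    """Return a list of pairs (V_c, answer variables adjacent to V_c)."""
    adj = {v: set() for v in quantified_vars}
    for e in edges:
        a, b = tuple(e)
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)
    components, seen = [], set()
    for v in quantified_vars:
        if v in seen:
            continue
        # Collect the connected component of v among the quantified variables.
        comp, stack = {v}, [v]
        while stack:
            t = stack.pop()
            for u in adj[t] - comp:
                comp.add(u)
                stack.append(u)
        seen |= comp
        # Re-add edges touching the component: record the answer variables on them.
        attached = {x for e in edges if set(e) & comp
                    for x in e if x in answer_vars}
        components.append((comp, attached))
    return components

def starsize(edges, answer_vars, quantified_vars):
    comps = x_components(edges, answer_vars, quantified_vars)
    return max((len(attached) for _, attached in comps), default=0)

def contract(edges, answer_vars, quantified_vars):
    """Edges of the contract: edges among answer variables, plus an edge
    between any two answer variables attached to the same component."""
    ans = set(answer_vars)
    result = {e for e in edges if set(e) <= ans}
    for _, attached in x_components(edges, answer_vars, quantified_vars):
        result |= {frozenset(p) for p in combinations(attached, 2)}
    return result
```

For the star-shaped CQ q(x_1, x_2, x_3) = ∃y R(x_1, y) ∧ R(x_2, y) ∧ R(x_3, y), the single component attaches all three answer variables, so the starsize is 3 and the contract is a triangle, in line with the bound of starsize by contract treewidth plus one.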
It is a fundamental observation that cores of CQs are guaranteed to have minimum measures among all equivalent CQs, as stated by the following lemma [CM15,DRW19]. Lemma 3.1. If a CQ q is equivalent to a CQ of treewidth k, then core(q) has treewidth at most k. The same is true for contract treewidth, starsize, and linked matching number.
An additional ingredient needed to formulate the classification for UCQs emerges from [CM16]. There, Chen and Mengel associate with every UCQ q a set of CQs cl CM (q) such that counting the number of answers to q is closely tied to counting the number of answers to the CQs in cl CM (q). We now introduce this set, which we refer to as the Chen-Mengel closure, in detail.
Two CQs q_1(x̄_1) and q_2(x̄_2) over the same schema S are counting equivalent if #q_1(D) = #q_2(D) for all S-databases D. Let q(x̄) = p_1 ∨ · · · ∨ p_n. The starting point for defining cl_CM(q) is the observation that, by the inclusion-exclusion principle, every database D satisfies
#q(D) = Σ_{∅ ≠ I ⊆ [n]} (−1)^{|I|+1} · #(⋀_{i∈I} p_i)(D).
We can manipulate this sum as follows: if there are two summands c_1 · #(⋀_{i∈I_1} p_i)(D) and c_2 · #(⋀_{i∈I_2} p_i)(D) such that ⋀_{i∈I_1} p_i and ⋀_{i∈I_2} p_i are counting equivalent, then delete both summands and add (c_1 + c_2) · #(⋀_{i∈I_1} p_i)(D) to the sum. After doing this exhaustively, delete all summands with coefficient zero. The elements of cl_CM(q) are all CQs ⋀_{i∈I} p_i in the original sum that are counting equivalent to some CQ ⋀_{i∈J} p_i which remains in the sum. Note that the number of CQs in cl_CM(q) might be exponentially larger than the number of CQs in q and that cl_CM(q) does not need to contain all CQs from the original UCQ q. For a class Q of UCQs, we use cl_CM(Q) to denote ⋃_{q∈Q} cl_CM(q).
It is not hard to see that p_{1}, p_{2}, and p_{3} are pairwise counting equivalent, and so are p_{1,3} and p_{2,3}. Moreover, p_{1,2} and p_{1,2,3} are equivalent and thus counting equivalent. Applying the manipulation described above, we thus arrive at the sum
#q(D) = 3 · #p_{1}(D) − 2 · #p_{1,3}(D).
Note that cl_CM(q) is defined so that for every S-database D, #q(D) can be computed in polynomial time from the counts #q′(D), q′ ∈ cl_CM(q). This, in fact, is the raison d'être of the Chen-Mengel closure.
We are now ready to state the characterization.
(1) If the treewidths and the contract treewidths of CQs in Q⋆ are bounded, then AnswerCount(Q) is in FPT; it is even in PTime when Q ⊆ CQ.
We remark that cl_CM(q) = {q} when q is a CQ, and thus Q⋆ = {core(q) | q ∈ Q} when Q ⊆ CQ in Theorem 3.3. The assumption that relation symbols have bounded arity is needed only for the lower bounds, but not for the upper bounds.
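The inclusion-exclusion step can be checked on answer sets directly. The sketch below relies on the observation that the answers to a conjunction of CQs with the same answer variables, and with quantified variables renamed apart, are exactly the common answers of the conjuncts. The function name and the representation of each p_i(D) as a Python set are assumptions of this example.

```python
from itertools import combinations

def count_union_by_inclusion_exclusion(answer_sets):
    """Compute the size of the union of the given sets as the signed sum
    over all non-empty index sets I of the size of the intersection."""
    n = len(answer_sets)
    total = 0
    for size in range(1, n + 1):
        sign = 1 if size % 2 == 1 else -1  # (-1)^(|I|+1)
        for subset in combinations(range(n), size):
            common = set.intersection(*(answer_sets[i] for i in subset))
            total += sign * len(common)
    return total
```

With the answer sets {1, 2, 3}, {2, 3, 4}, and {3, 5}, the signed sum is 8 − 4 + 1 = 5, which is the size of the union. The sum has 2^n − 1 summands, which matches the remark that the closure may be exponentially larger than the UCQ.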
Note that the classification given by Theorem 3.3 is not complete. It leaves open the possibility that there is a class of (U)CQs Q such that AnswerCount(Q) is #W[2]-hard, but neither #W[2]-equivalent nor #A[2]-equivalent. It is conjectured in [DRW19] that such a class Q indeed exists and in particular that there might be classes Q such that AnswerCount(Q) is #W func [2]-equivalent. The classification also leaves open whether having unbounded linked matching numbers is a necessary condition for #A[2]-hardness. While a complete classification is certainly desirable we note that, from our perspective, the most relevant aspect is the delineation of the FPT cases from the hard cases, achieved by Points (1)-(3) of the theorem.

Problems Studied and Main Results
We introduce the problems studied and state the main results of this paper. We start with ontology-mediated querying and then proceed to querying under constraints. Every OMQ language Q gives rise to an answer counting problem, defined exactly as in Section 2:
PROBLEM : AnswerCount(Q)
INPUT : A query q ∈ Q over schema S and an S-database D
OUTPUT : #q(D)
Our first main result is a counterpart of Theorem 3.3 for the OMQ language (G, UCQ), restricted to OMQs based on the full schema. To illustrate the effect on the complexity of counting of adding an ontology, we first show that the ontology interacts with all of the measures in Theorem 3.3.
Then q_n is a core of treewidth ⌊n/2⌋, contract treewidth n, starsize n, and linked matching number n. But the OMQ (O, S, q_n) is equivalent to (O, S, p_n) with p_n obtained from q_n by dropping all S-atoms. Since p_n is tree-shaped and has no quantified variables, all measures are at most 1. Figure 3 depicts query q_4.
Before we state our characterization, we observe as a preliminary that OMQs from (G, UCQ) can be rewritten into equivalent ones from (G ∩ FULL, UCQ), that is, existential quantifiers can be removed from rule heads when the actual query is adjusted in a suitable way. This has already been observed in the literature.
The proof of Theorem 4.2 is constructive, that is, it provides an explicit way of computing, given an OMQ Q = (O, S, q) ∈ (G, UCQ), an equivalent OMQ from (G ∩ FULL, UCQ). We denote this OMQ with Q^∃ = (O^∃, S, q^∃) and call it the ∃-rewriting of Q. It is worth noting that even if q contains no equality atoms, such atoms might be introduced during the construction of q^∃. What is more, different CQs in the produced UCQ can comprise different equalities on answer variables, and thus repeated answer variables cannot be used. This is actually the main reason for admitting equality atoms in (U)CQs in this paper.
For OMQs Q ∈ (G, UCQ), we define a set cl_CM(Q) of OMQs from (G, CQ) in exact analogy with the definition of cl_CM(q) for UCQs q, that is, for Q = (O, S, p_1 ∨ · · · ∨ p_n), we use the OMQs (O, S, p_i) in place of the CQs p_i from the UCQ q in the definition of cl_CM(Q). This requires the use of counting equivalence for OMQs, which is defined in the expected way. For a class Q of OMQs, we use cl_CM(Q) to denote ⋃_{Q∈Q} cl_CM(Q).
For a class Q ⊆ (G, UCQ), we now identify a class Q⋆ of CQs by setting
Q⋆ = {core(ch_{O^∃}(p)) | Q ∈ Q and (O^∃, S, p) ∈ cl_CM(Q^∃)}.
In other words, the CQs in Q⋆ are obtained by choosing an OMQ Q from Q, replacing it with Q^∃, then choosing an OMQ (O^∃, S, p) from cl_CM(Q^∃), chasing p with O^∃, and finally taking the core. Our first main result is as follows.
The upper bounds also hold when the arity of relation symbols is unbounded.
Points (1) to (5) of Theorem 4.3 parallel exactly those of Theorem 3.3, but of course the definition of Q ⋆ is a different one. It is through this definition that we capture the potential interaction between the ontology and the structural measures. Note, for example, that the class of OMQs (O, S, q n ), n ≥ 1, from Example 4.1 would be classified as #A[2]-equivalent if core(ch O ∃ (p)) was replaced with p in the definition of Q ⋆ while it is in fact in FPT. Also note that the PTime statement in Point (1) of Theorem 3.3 is absent in Theorem 4.3. In fact, evaluating Boolean OMQs from (G, UCQ) is 2ExpTime-complete (ExpTime-hard when the arity of relation symbols is bounded by a constant) [CGK13] and since for Boolean OMQs evaluation coincides with answer counting, PTime cannot be attained.
Our second main result concerns querying under integrity constraints that take the form of guarded TGDs. In contrast to OMQs, the constraints are thus not used for deductive reasoning, but instead give rise to a promise regarding the shape of the input database. Following [BDF+20], we define a constraint-query specification (CQS) to be a triple S = (T, S, q) where T is a finite set of TGDs over finite schema S and q a UCQ over S. We call T the set of integrity constraints. Overloading notation, we write (C, Q) for the class of CQSs in which the set of integrity constraints is formulated in the class of TGDs C, and the query is coming from the class of queries Q. It will be clear from the context whether (C, Q) is an OMQ language or a class of CQSs. Every class C of CQSs gives rise to the following answer counting problem.
Note that the delineation of the considered complexities is identical for ontology-mediated querying and for querying under constraints. In particular, Theorem 4.4 (implicitly) uses exactly the same class of CQs Q⋆ and the same associated measures.
It would be interesting to know whether AnswerCount(Q) being in FPT coincides with AnswerCount(Q) being in PTime for classes of CQSs Q ⊆ (G, CQ). Note that this is the case for evaluation in the presence of constraints that are guarded TGDs [BGP16,BFGP20] and also for answer counting without constraints [CM15]. The proofs of these results, however, break in our setting.

Querying Under Integrity Constraints
We derive Theorem 4.4 from Theorem 4.3 by means of reduction, so that in the rest of the paper we may concentrate on the case of ontology-mediated querying. In fact, Theorem 4.4 is a consequence of Theorem 4.3 and the following result.
Theorem 5.1. Let C ⊆ (G, UCQ) be a recursively enumerable class of CQSs and let C ′ be C viewed as a class of OMQs based on the full schema.⁴ Then there is a Turing fpt-reduction from AnswerCount(C ′ ) to AnswerCount(C) and there is a parsimonious polynomial time reduction from AnswerCount(C) to AnswerCount(C ′ ).
The reduction from AnswerCount(C) to AnswerCount(C ′ ) is immediate: given a set of guarded TGDs T , a CQ q, and an S-database D that satisfies T , we can view (T , S, q) as an OMQ Q based on the full schema and return #Q(D) as #q (D). It is easy to see that this is correct.
For the converse reduction, we are given a Q = (O, S, q) that is a CQS from C viewed as an OMQ and an S-database D. It seems a natural idea to simply view Q as a CQS, which it originally was, and replace D with ch O (D) so that the promise is satisfied, and to then return #q(ch O (D)) as #Q(D). However, there are two obstacles. First, ch O (D) need not be finite; and second, chasing adds fresh constants, which changes the answer count. We solve the first problem by replacing the infinite chase with a (finite!) database D ⋆ that extends D and satisfies O. This is based on the following result from [BDF + 20], which is essentially a consequence of G being finitely controllable [BGO10].

Theorem 5.2 [BDF + 20]. Given an ontology O ⊆ G, an S-database D, and an n ≥ 1, one can construct a finite database D ⋆ ⊇ D such that (1) D ⋆ |= O and (2) ā ∈ q(D ⋆ ) iff ā ∈ Q(D) for all OMQs (O, S, q) where q has at most n variables and for all tuples ā that use only constants in adom(D).
To address the second problem, we correct the count. Note that this cannot be done by introducing fresh unary relation symbols as markers to distinguish the original constants from those introduced by the chase as this would require us to change the query, potentially leaving the class of queries that we are working with. We instead use an approach inspired by [CM15]. The idea is to compute #q(D ′ ) on a set of databases D ′ obtained from D ⋆ by cloning constants in adom(D) ⊆ adom(D ⋆ ). The results can be arranged in a system of equations whose coefficients form a Vandermonde matrix. Finally, the system can be solved to obtain #q (D). This is formalized by the following lemma where we use clones (D) to denote the class of all S-databases that can be obtained from S-database D by cloning constants.

Proof.
We first give a brief overview of the algorithm. Assume that the input is a UCQ $q(x)$, a database $D$, and a set $F$. The algorithm first constructs databases $D_1, \dots, D_{|x|+1}$ by starting with $D$ and cloning constants from $F$. Then, it computes $\#q(D_j)$ for $1 \le j \le |x|+1$ and, finally, constructs and solves a system of linear equations for which one of the unknowns is the desired value $\#(q(D) \cap F^{|x|})$. We now make this precise.
For $1 \le j \le |x|+1$, database $D_j$ is constructed from $D$ by cloning each element from $F$ exactly $j-1$ times. In particular, $D_1 = D$ and each $D_j$ can be constructed in time $\|q\|^{O(\mathrm{ar}(S))} \cdot \|D\|$. Now, for $0 \le i \le |x|$ and $1 \le j \le |x|+1$, let $q_i(D_j)$ denote the subset of answers $\bar a \in q(D_j)$ such that exactly $i$ positions in $\bar a$ have constants that are in $F$ or have been obtained from such constants by cloning. We claim that $\#q_i(D_j) = j^i \cdot \#q_i(D)$, that is, having $j$ copies of each constant from $F$ (the constant itself and its $j-1$ clones) multiplies each answer $\bar a \in q(D)$ having $i$ positions of the described kind exactly $j^i$ times. By the semantics, this is immediate if $q$ is a CQ. So assume that [...]

Let $1 \le j \le |x|+1$. Since the sets $q_i(D_j)$ partition the set $q(D_j)$, we have that
\[ \#q(D_j) = \sum_{i=0}^{|x|} \#q_i(D_j) = \sum_{i=0}^{|x|} j^i \cdot \#q_i(D). \]
In the above equation, there are $|x|+1$ unknown values $\#q_0(D), \dots, \#q_{|x|}(D)$ and one value, namely $\#q(D_j)$, that can be computed by the oracle. Taking this equation for $j = 1, \dots, |x|+1$ generates a system of $|x|+1$ linear equations with $|x|+1$ variables. The coefficients of the system form a Vandermonde matrix, which implies that the equations are independent and that the system has a unique solution. Thus, we can solve the system in polynomial time, e.g. by Gaussian elimination, to compute the values $\#q_0(D), \dots, \#q_{|x|}(D)$. Clearly, $\#(q(D) \cap F^{|x|}) = \#q_{|x|}(D)$. It can be verified that, overall, the algorithm runs in the time stated in Lemma 5.3.

Now for the reduction from AnswerCount(C ′ ) to AnswerCount(C) claimed in Theorem 5.1. Let $Q(x) = (O, S, q)$ be a CQS from C viewed as an OMQ, and let $D$ be an S-database. We first construct the database $D^\star$ as per Theorem 5.2 with $n$ being the number of variables in $q$. We then apply the algorithm asserted by Lemma 5.3 with $D^\star$ in place of $D$ and with $F := \mathrm{adom}(D)$. Cloning preserves guarded TGDs and thus we can use the oracle (which can compute $\#q(D')$ for any S-database $D'$ that satisfies $O$) for computing AnswerCount({q}, clones(D)) as required by Lemma 5.3.
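The counting-by-cloning argument above can be illustrated with a small Python sketch. It is not the paper's algorithm verbatim: it handles a single CQ only, evaluates the query by brute force in place of the oracle, and uses a hypothetical encoding (facts as `(relation, tuple)` pairs, clones of a constant `c` as pairs `(c, k)`). All function names are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

def answers(cq, db):
    """Brute-force CQ evaluation; cq = (answer_vars, atoms), db = set of (rel, tuple)."""
    answer_vars, atoms = cq
    variables = sorted({v for _, args in atoms for v in args})
    adom = {c for _, args in db for c in args}
    result = set()
    for values in product(adom, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all((rel, tuple(h[v] for v in args)) in db for rel, args in atoms):
            result.add(tuple(h[x] for x in answer_vars))
    return result

def clone(db, F, j):
    """Build D_j: every constant in F gets j-1 clones, so j copies in total."""
    def copies(c):
        return [c] + [(c, k) for k in range(1, j)] if c in F else [c]
    return {(rel, combo) for rel, args in db
            for combo in product(*(copies(c) for c in args))}

def solve(matrix, rhs):
    """Gauss-Jordan elimination over exact rationals (matrix assumed invertible)."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def count_answers_within(cq, db, F, oracle=lambda q, d: len(answers(q, d))):
    """Recover #(q(D) ∩ F^{|x|}) from the counts #q(D_1), ..., #q(D_{|x|+1})."""
    m = len(cq[0])                                      # m = |x|
    js = range(1, m + 2)
    counts = [oracle(cq, clone(db, F, j)) for j in js]
    vandermonde = [[j ** i for i in range(m + 1)] for j in js]
    unknowns = solve(vandermonde, counts)               # unknowns[i] = #q_i(D)
    return int(unknowns[m])
```

In the reduction described above, the role of `oracle` would be played by the oracle for AnswerCount(C), and `db` would be the finite database from Theorem 5.2 with `F` the active domain of the original database.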

Counting Equivalence
For the proofs of both the upper and lower bounds stated in Theorem 4.3, we need a good grasp of counting equivalence. For the lower bounds, the same is true for the related notion of semi-counting equivalence. In this section, we make some fundamental observations regarding these notions.
In the lower bound proofs, we shall often be concerned with classes of databases [...]

For closure under direct products, it suffices to observe that there is a homomorphism from the direct product I of instances I 1 and I 2 to each of the components I 1 and I 2 . Thus, applicability of a TGD in the product implies applicability in both components. Moreover, the result of the applications in the components is then clearly also found in the product, see e.g. [Fag80] for more details. The arguments for the other closure properties are similar, but simpler.
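The observation that the direct product maps homomorphically to each component can be checked on small instances. The following Python sketch assumes a hypothetical encoding of instances as sets of `(relation, tuple)` facts; the function names are chosen for this illustration only.

```python
def direct_product(i1, i2):
    """Facts R((a1,b1),...,(an,bn)) for R(a1..an) in i1 and R(b1..bn) in i2."""
    return {(r1, tuple(zip(a1, a2)))
            for r1, a1 in i1 for r2, a2 in i2 if r1 == r2}

def is_homomorphism(h, source, target):
    """Check that h maps every fact of source to a fact of target."""
    return all((rel, tuple(h[c] for c in args)) in target for rel, args in source)
```

The two projections, which send a pair to its first or second component, are the homomorphisms used in the closure argument.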
We now make a central observation regarding the relationship between (semi-)counting equivalence over classes of databases D S O and (semi-)counting equivalence over the class of all databases. But let us first introduce the notion of semi-counting equivalence. Two CQs q 1 (x 1 ) and q 2 (x 2 ) over the same schema S are semi-counting equivalent if they are counting equivalent over all S-databases D such that #q 1 (D) > 0 and #q 2 (D) > 0. For a CQ q, we use q̂ to denote the CQ obtained from q by dropping all maximal connected subqueries that contain no answer variable.

Lemma 6.2. Let q 1 (x 1 ) and q 2 (x 2 ) be equality-free CQs over schema S and let D be a class of S-databases that contains D q i and D q̂ i for i ∈ {1, 2} and is closed under cloning. Then
(1) q 1 and q 2 are counting equivalent over D iff q 1 and q 2 are counting equivalent over the class of all S-databases;
(2) if D is closed under disjoint union and contains D ⊤ S , then q 1 and q 2 are semi-counting equivalent over D iff q 1 and q 2 are semi-counting equivalent over the class of all S-databases.
The 'if' directions of Points (1) and (2) of Lemma 6.2 are trivial. The 'only if' directions are a consequence of results on counting equivalence and semi-counting equivalence obtained in [CM16]. We give more details in the appendix.
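The operation used in the definition of semi-counting equivalence, dropping all maximal connected subqueries that contain no answer variable, amounts to a reachability computation from the answer variables. A minimal Python sketch, assuming a CQ is given as a tuple of answer variables plus a list of `(relation, variables)` atoms (an encoding chosen here for illustration):

```python
def drop_non_answer_components(answer_vars, atoms):
    """Keep exactly the atoms whose connected component contains an answer variable."""
    reachable = set(answer_vars)
    changed = True
    while changed:
        changed = False
        for _, args in atoms:
            # An atom that touches a reachable variable makes all its variables reachable.
            if reachable & set(args) and not set(args) <= reachable:
                reachable |= set(args)
                changed = True
    kept = [atom for atom in atoms if reachable & set(atom[1])]
    return answer_vars, kept
```

For a Boolean CQ, which has no answer variables, every component is dropped and the result has no atoms.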
We next observe that counting equivalence and semi-counting equivalence are decidable over classes of databases D S O . For the class of all databases, this has been shown in [CM16]. In fact, it is shown there that CQs q 1 and q 2 are counting equivalent iff there is a way to rename their answer variables to make them equivalent in the standard sense, and that they are semi-counting equivalent iff q̂ 1 and q̂ 2 are counting equivalent. Consequently, both problems are in NP. For a CQ q, let q̃ denote the CQ obtained from q by removing all equality atoms and identifying any two variables x 1 , x 2 with x 1 = x 2 ∈ q.

Proposition 6.3. Let O ⊆ G ∩ FULL and let S be a schema that contains all symbols from O. Given CQs q 1 (x 1 ) and q 2 (x 2 ) over S, it is decidable whether q 1 and q 2 are counting equivalent over D S O . The same holds for semi-counting equivalence.
Proof. Let q 1 (x 1 ) and q 2 (x 2 ) be given as the input. Let i ∈ {1, 2}. It is easy to see that q i (x i ) is (semi-)counting equivalent to q̃ i (ȳ i ) over the class of all databases, and consequently also over D S O . We may thus assume that q 1 and q 2 are equality-free as otherwise we can replace them with q̃ 1 and q̃ 2 . We then construct ch O (q 1 ) and ch O (q 2 ), check whether they are (semi-)counting equivalent over the class of all databases using the decision procedure from [CM16], and return the result.
We have to argue that this is correct. By Lemma 2.4, it suffices to decide whether ch O (q 1 ) and ch O (q 2 ) are (semi-)counting equivalent over D S O , which by Lemma 6.2 is identical to their (semi-)counting equivalence over the class of all databases. Note that the preconditions of Lemma 6.2 are satisfied. In particular, [...]

In the upper bound, it shall be necessary to compute the Chen-Mengel closure of an OMQ Q ∈ (G ∩ FULL, UCQ). This is possible by simply following the definition of cl CM (Q), but requires us to decide counting equivalence of OMQs. We show that this is possible.

[...], it is decidable whether Q 1 and Q 2 are counting equivalent.

We prove the upper bounds in Theorem 4.3 by Turing fpt-reductions to the corresponding upper bounds in Theorem 3.3, and the lower bounds by Turing fpt-reductions from the corresponding lower bounds in Theorem 3.3. In both cases, the assumption that the arity of relation symbols is bounded is only required for Theorem 3.3, but not for the Turing fpt-reductions that we give. Consequently, any future classifications of AnswerCount(C) for classes of CQs C that do not rely on this assumption also lift to classes of OMQs through our reductions.

Upper Bounds.
We first establish the upper bounds presented in Theorem 4.3. All these bounds are proved in a uniform way, by providing a Turing fpt-reduction from AnswerCount(Q), for any class Q ⊆ (G, UCQ) of OMQs, to AnswerCount(Q ⋆ ). It then remains to use the corresponding upper bounds for classes of CQs from Theorem 3.3. For the reduction, it is not necessary to assume that the arity of relation symbols is bounded by a constant.

Let Q ⊆ (G, UCQ) be a class of OMQs with the full schema. We need to exhibit an fpt algorithm for AnswerCount(Q) that has access to an oracle for AnswerCount(Q ⋆ ). Let an OMQ Q ∈ Q and an S-database D be given. The algorithm first replaces Q by its ∃-rewriting [...], and thus it suffices to compute the latter count.
To compute #Q ∃ (D) within the time requirements of FPT, we first compute the set cl CM (Q ∃ ), then for every Q ′ ∈ cl CM (Q ∃ ) we determine #Q ′ (D) within the time requirements of FPT, and finally we combine the results to #Q(D) as per the following lemma, which is an immediate consequence of the definition of cl CM (Q). Note that we need to effectively compute cl CM (Q ∃ ), which is possible by Corollary 6.4 in the case that the schema is full.
[...] can be computed within the time requirements of FPT by Lemma 2.3. To compute #Q ′ (D), we may thus construct ch O ∃ (D) and then compute #p(ch O ∃ (D)). Equivalently, we can compute and use core(ch O ∃ (p)) in place of p.
It remains to note that the CQs core(ch O ∃ (p)), for (O ∃ , S, p) ∈ cl CM (Q ∃ ), are exactly the CQs from Q ⋆ .

7.2. Lower Bounds: Getting Started. We next turn towards the lower bounds in Theorem 4.3, all of which we consider in parallel. Let Q ⊆ (G, UCQ) be a class of OMQs. We provide a Turing fpt-reduction from AnswerCount(C) to AnswerCount(Q) for a class of CQs C such that if Q satisfies the preconditions in one of the four lower bounds stated in Theorem 4.3 (in Points (2) to (5), respectively), then C satisfies the preconditions from the corresponding point of Theorem 3.3. While the constructed class of CQs C is closely related to Q ⋆ , it is not identical. We in fact obtain the desired Turing fpt-reduction by composing three Turing fpt-reductions. The first reduction consists in transitioning to the ∃-rewritings of the OMQs in the original class. The second reduction enables us to consider OMQs that use CQs rather than UCQs.⁶ And in the third reduction, we remove ontologies altogether, that is, we reduce classes of CQs to classes of OMQs. We start with the first reduction, which is essentially an immediate consequence of Theorem 4.2.
Theorem 7.2. Let Q ⊆ (G, UCQ) be recursively enumerable and let Q ′ ⊆ (G ∩ FULL, UCQ) be the class of ∃-rewritings of OMQs from Q. There is a parsimonious fpt-reduction from AnswerCount(Q ′ ) to AnswerCount(Q).

In [CM16], Chen and Mengel establish Theorem 7.3 in the special case where ontologies are empty. A careful analysis of their proof reveals that it actually establishes something stronger, namely a Turing fpt-reduction from AnswerCount(cl CM (Q), D) to AnswerCount(Q, D) for all classes of UCQs Q and all classes of databases D that satisfy certain natural properties. This is important for us because it turns out that the class of databases obtained by chasing with an ontology from G ∩ FULL satisfies all the relevant properties, and thus Theorem 7.3 is a consequence of Chen and Mengel's constructions. We now make this more precise.
For a class of databases D and a CQ q, we use cl D CM (q) to denote the version of the Chen-Mengel closure that is defined exactly as cl CM (q), except that all tests of counting equivalence are over the class of databases D rather than over the class of all databases.
Theorem 7.4 [CM16]. Let D be a class of databases over some schema S such that D is closed under disjoint union and direct product, and contains D ⊤ S . Then there is an algorithm that
(1) takes as input a UCQ q, a CQ p ∈ cl D CM (q), and a database D ∈ D, subject to the promise that for all p ′ ∈ cl D CM (q), there is an equality-free CQ p ′′ such that D p ′′ ∈ D, D p̂ ′′ ∈ D, and p ′ and p ′′ are counting equivalent over D,
(2) has access to an oracle for AnswerCount({q}, D), to a procedure for enumerating D, and to procedures for deciding counting equivalence and semi-counting equivalence between CQs over D,
(3) runs in time f (||q||) · p(||D||) with f a computable function and p a polynomial,
(4) outputs #p(D).
The difference between access to an oracle and access to procedures in Point (2) of Theorem 7.4 is that the running time of the oracle does not contribute to the running time of the overall algorithm while the running time of the procedures does. When used with [...] In the appendix, we summarize the proof of Theorem 7.4 given in [CM16], showing that it works not only for the class of all databases as considered in [CM16], but also for all stated classes of databases D.

⁶ It is interesting to note in this context that the construction of Q ∃ may produce a UCQ even if the original OMQ Q uses a CQ.
Before we prove that Theorem 7.4 implies Theorem 7.3, we make the following observation on Chen-Mengel closures. [...]

Theorem 7.6. Let Q ⊆ (G ∩ FULL, CQ) be a recursively enumerable class of OMQs with full schema. There is a class C ⊆ CQ that only contains cores and such that:
(1) there is a Turing fpt-reduction from AnswerCount(C) to AnswerCount(Q);
(2) for every OMQ Q = (O, S, q) ∈ Q, we find a CQ p ∈ C such that p and core(ch O (q)) have the same Gaifman graph.
Before we prove Theorem 7.6, we first show how we can make use of the three Turing fpt-reductions stated as Theorems 7.2, 7.3, and 7.6, to obtain the lower bounds in Theorem 4.3 from those in Theorem 3.3. Let us consider, for example, the W[1] lower bound from Point (2) of Theorem 4.3. Take a class Q 0 ⊆ (G, UCQ) of OMQs such that the treewidths of CQs in [...] Theorems 7.2 and 7.3 give a Turing fpt-reduction from AnswerCount(Q) to AnswerCount(Q 0 ) [...]. By assumption, the treewidths of the CQs core(ch O (q)), (O, S, q) ∈ Q, are unbounded. Let C be the class of CQs whose existence is asserted by Theorem 7.6. By Point (2) of that theorem, the treewidths of the CQs in C are unbounded and thus AnswerCount(C) is W[1]-hard by Point (2) of Theorem 3.3. Composing the Turing fpt-reduction from AnswerCount(C) to AnswerCount(Q) given by Point (1) of Theorem 7.6 with the reduction from AnswerCount(Q) to AnswerCount(Q 0 ), we obtain a Turing fpt-reduction from AnswerCount(C) to AnswerCount(Q 0 ) and thus the latter is W[1]-hard. The other lower bounds can be proved analogously.
We now turn to the proof of Theorem 7.6, which in turn uses three consecutive fpt-reductions. The first reduction is easy and ensures that all involved CQs (inside OMQs) are equality-free. The second reduction allows us, informally speaking, to mark every variable in a CQ (inside an OMQ) by a unary relation symbol that uniquely identifies it. In the third reduction, we make use of these markings to remove the ontology.

For the first reduction, recall that CQ q̃ is obtained from CQ q by removing all equality atoms and identifying any two variables x 1 , x 2 with x 1 = x 2 ∈ q. [...]

We next give the second reduction. The marking of a CQ q over schema S is the CQ q m obtained from q by adding an atom R x (x) for each x ∈ var(q) where R x is a fresh unary relation symbol. Note that q m is over schema S m obtained from S by adding all the fresh unary symbols. [...]

To prove Lemma 7.8, we again adapt a reduction by Chen and Mengel that addresses the case of CQs without ontologies, but that can be lifted to relevant classes of databases similarly to Theorem 7.4.
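The two syntactic transformations recalled above, removing equality atoms and adding marking atoms, are straightforward to state as code. The Python sketch below assumes a CQ is given as a tuple of answer variables plus a list of `(relation, variables)` atoms, with equality atoms using the relation name `"="`. The marker names `R_x` and the choice to drop duplicate answer variables after identification are assumptions of this illustration, not taken from the paper.

```python
def eliminate_equalities(answer_vars, atoms):
    """Remove equality atoms and identify the variables they equate (union-find)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for rel, args in atoms:
        if rel == "=":
            a, b = find(args[0]), find(args[1])
            parent[max(a, b)] = min(a, b)   # smaller name becomes the representative

    new_atoms = []
    for rel, args in atoms:
        if rel != "=":
            atom = (rel, tuple(find(v) for v in args))
            if atom not in new_atoms:
                new_atoms.append(atom)
    # Assumption: identified answer variables are listed once, in first-occurrence order.
    new_answer_vars = tuple(dict.fromkeys(find(v) for v in answer_vars))
    return new_answer_vars, new_atoms

def marking(answer_vars, atoms):
    """Add one fresh unary atom R_x(x) per variable x of the query."""
    variables = sorted({v for _, args in atoms for v in args} | set(answer_vars))
    return answer_vars, list(atoms) + [("R_" + v, (v,)) for v in variables]
```

Since each marker symbol occurs in exactly one atom, any homomorphism from a marked CQ into a database is forced to send each variable to a constant carrying that variable's marker.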
Lemma 7.9 [CM15]. Let D be a class of databases over schema S m that is closed under direct products, cloning, and induced subdatabases. Then there is an algorithm that
• takes as input an equality-free CQ q such that q m is over schema S m and a database D ∈ D, subject to the promise that q is a core and D q m ∈ D,
• has access to an oracle for AnswerCount({q}, D),⁷
• runs in time f (||q||) · p(||D||), f a computable function and p a polynomial, and
• outputs #q m (D).
When used with the class D of all databases, Lemma 7.9 is simply the special case of Lemma 7.8 where ontologies are empty. In the appendix, we give an overview of the proof of Lemma 7.9 in [CM15], also showing that it extends to classes of databases D that satisfy the stated properties.
We now use Lemma 7.9 to prove Lemma 7.8. Let Q ⊆ (G ∩ FULL, CQ) be a recursively enumerable class of equality-free OMQs with full schema. We give an fpt algorithm that uses AnswerCount(Q) as an oracle and, given an OMQ Q m (x) = (O, S m , core(ch O (q)) m ) ∈ Q m and an S m -database D, computes #Q m (D).
First, the algorithm enumerates Q to find an OMQ Q(x) such that Q m is the core-chased marking of Q, that is, Q = (O, S, q). It then starts the algorithm from Lemma 7.9 for the class of databases [...]

We should argue that the preconditions of Lemma 7.9 are satisfied. By Lemma 6.1, D S m O is closed under direct products and cloning. Since the ontologies in Q are from FULL, D S m O is also closed under induced subdatabases. Moreover, class D S m O contains D core(ch O (q)) m since core(ch O (q)) m = ch O (core(ch O (q)) m ) and the schema is full. As the oracle for AnswerCount({core(ch O (q))}, D S m O ) needed by the algorithm, we can use the oracle for AnswerCount(Q) that we have available, as follows.
Given a database D ∈ D S m O , we first construct database D| S by dropping all atoms that use a symbol from S m \ S and then ask the oracle for AnswerCount(Q) to return #Q(D| S ). We argue that this is the same as the required #core(ch O (q))(D). In fact,
\[ \#Q(D|_S) = \#q(\mathrm{ch}_O(D|_S)) = \#q(D|_S) = \#q(D) = \#\mathrm{core}(\mathrm{ch}_O(q))(D). \]
The first equality is due to the universality of the chase. For the second equality, recall that D ∈ D S m O and is thus of the form D = ch O (D ′ ) with D ′ an S m -database. Since O does not use the symbols from S m \ S, this implies the second equality. The third equality holds because q does not use the symbols from S m \ S. And the final equality holds because D = ch O (D ′ ) and thus any homomorphism from q to D is also a homomorphism from ch O (q) to D. Moreover, taking the core produces an equivalent CQ.

Now for the third fpt-reduction that we use in the proof of Theorem 7.6. It shows that, in the presence of markings, it is possible to remove ontologies, in the following sense.

[...] (1) there is a Turing fpt-reduction from AnswerCount(C) to AnswerCount(Q m ); (2) C is based on the same Gaifman graphs as Q m : [...]

We provide a proof of Lemma 7.10 below. Before that, however, we show how Theorem 7.6 follows from Lemmas 7.8 and 7.10.
Proof of Theorem 7.6. Let Q ⊆ (G ∩ FULL, CQ) be a recursively enumerable class of OMQs with full schema. From Lemma 7.10, we obtain a class C of CQs that are cores and are based on the same Gaifman graphs as Q m . This is the class whose existence is postulated by Theorem 7.6. We argue that Points (1) and (2) of that theorem are satisfied. The Turing fpt-reduction required by Point (1) is the composition of the reductions asserted by Lemmas 7.10, 7.8, and 7.7. Point (2) is a consequence of the facts that C is based on the same Gaifman graphs as Q m and neither does marking a CQ affect its Gaifman graph nor does the transition from a CQ q to q̃. To see the latter, recall that the same variable identifications that take place when constructing q̃ from q are also part of the definition of the Gaifman graph D q of q.

Now for the announced proof of Lemma 7.10, a key ingredient to the proof of Theorem 4.3.
Proof of Lemma 7.10. To prove the lemma, we define the required class of CQs C and describe an fpt algorithm that takes as an input a query q ∈ C over schema S and an S-database D, has access to an oracle for AnswerCount(Q m ), and outputs #q(D). Every Q = (O, S, q) ∈ Q gives rise to a CQ q s in C that is formulated in a schema different from S (whence the superscript 's'). To define q s , fix a total order on var(q). For every guarded set S in D q , let S̄ be the tuple that contains the variables in S in the fixed order. Now q s contains, for every maximal guarded set S in D q , the atom R S (S̄) where R S is a fresh relation symbol of arity |S|. Note that q s is self-join free, that is, it contains no two distinct atoms that use the same relation symbol. It is thus a core. Moreover, the Gaifman graph of q s is identical to that of q m since the maximal guarded sets of D q m are exactly those of D q s . An example of a transformation from q to q s can be found in Figure 4. This defines the class of CQs C.
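The construction of the self-join free CQ from the maximal guarded sets can be sketched in a few lines of Python. The encoding (a CQ as answer variables plus a list of `(relation, variables)` atoms), the alphabetical variable order, and the generated relation names such as `R_x_y` are assumptions made for this illustration; the sketch also assumes variable names without underscores so that the generated names are distinct.

```python
from itertools import combinations

def self_join_free_version(answer_vars, atoms):
    """One fresh atom R_S(S) per maximal guarded set S of the query."""
    var_sets = {frozenset(args) for _, args in atoms}
    # A guarded set is maximal if no atom covers a strict superset of it.
    maximal = [s for s in var_sets if not any(s < t for t in var_sets)]
    new_atoms = []
    for s in sorted(maximal, key=sorted):
        ordered = tuple(sorted(s))          # the fixed total order on variables
        new_atoms.append(("R_" + "_".join(ordered), ordered))
    return answer_vars, new_atoms

def gaifman_edges(atoms):
    """Edges of the Gaifman graph: pairs of variables that co-occur in an atom."""
    return {frozenset(pair) for _, args in atoms
            for pair in combinations(set(args), 2)}
```

On a triangle query with atoms E(x, y), E(y, z), E(z, x), this yields three atoms over three distinct binary symbols with the same Gaifman graph, which matches the two properties noted in the proof.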
We now describe the algorithm. Let a CQ q s ∈ C over schema S s and an S s -database D s be given as input. To compute #q s (D s ), we first enumerate Q to find an OMQ Q = (O, S, q) such that q s can be obtained from q as described above.  The above equalities, as well as the construction of the involved databases and queries, are illustrated in Figure 5. The figure also shows some homomorphisms used in the remaining proof.
The first equality is immediate since D m = ch O (D m ). For the third equality, let x̄ = x 1 · · · x n be the answer variables in q s and for any ā = a 1 · · · a n ∈ adom(D s ) n , let x̄ × ā denote the tuple (x 1 , a 1 ) · · · (x n , a n ) ∈ adom(P ) n . Then q s,m (P m ) = {x̄ × ā | ā ∈ q s (D s )}. In fact, if h is a homomorphism from q s,m to P m and x ∈ var(q s,m ), then h(x) ∈ {x} × adom(P m ) due to the use of the marking relation R x in q s,m and in P m . Moreover, every such homomorphism h gives rise to a homomorphism h ′ from q s to D s by setting h ′ (x) = c if h(x) = (x, c), for all x ∈ var(q s ). Conversely, every homomorphism h from q s to D s gives rise to a homomorphism h ′ from q s,m to P m by setting h ′ (x) = (x, h(x)) for all x ∈ var(q s,m ).
It thus remains to deal with the second equality by showing that q m (D m ) = q s,m (P m ). It is enough to observe that any function h : var(q m ) → adom(D m ) is a homomorphism from q m to D m if and only if it is a homomorphism from q s,m to P m .
For the "if" direction, let h be a homomorphism from q s,m to P m . First let R(ȳ) be an atom in q m with R ∈ S. There is a maximal guarded set S of D q m that contains all variables in ȳ. Then R S (S̄) is an atom in q s and thus R S (h(S̄)) ∈ P . By construction of D m and since ȳ is a tuple over S, this yields R(h(ȳ)) ∈ D m , as required. Now let R x (x) be an atom in q m . Then R x (x) is also an atom in q s,m and thus h(x) ∈ {x} × adom(D s ) due to the definition of P m . But then R x (h(x)) ∈ D m by definition of D m .

For the "only if" direction, let h be a homomorphism from q m to D m . First consider atoms R S (S̄) in q s,m . Then q m contains an atom R(ȳ) where ȳ contains exactly the variables in S and thus R(h(ȳ)) ∈ D m . By construction of D m , h(ȳ) is thus a tuple over some guarded set in P , that is, P contains an atom Q(ā) where ā contains all constants from h(ȳ). In the following, we show that Q(ā) must in fact be R S (h(S̄)), as required.
Let ā = (z 1 , c 1 ), . . . , (z n , c n ) and z̄ = z 1 , . . . , z n . By construction of P as D q s × D s , Q(ā) ∈ P implies that q s contains an atom Q(z̄). It suffices to show that z̄ contains all variables from S: since the construction of q s uses as S only maximal guarded sets, the only such atom in q s is R S (S̄). By construction of P , we must thus have Q(ā) = R S (h(S̄)).
Let V be the variables in z̄. Since Q(z̄) ∈ q s , V is a guarded set in q s . Now note that we must have h(y) ∈ {y} × adom(D s ) for every variable y in ȳ due to the use of the relation symbol R y in q m and D m . Since ā contains all constants from h(ȳ), every variable from ȳ occurs in V . Moreover, these are exactly the variables in S and thus S ⊆ V .

Approximation and FPTRASes
In many applications of answer counting, it suffices to produce a good approximation of the exact count. For CQs without ontologies, significant progress on approximate answer counting has recently been made by Arenas et al. [ACJR21], see also [FGRZ21] for follow-up work. We observe some important consequences for approximately counting the number of answers to ontology-mediated queries.
A randomized approximation scheme for a counting problem P : Λ * → N is a randomized algorithm that takes as input a word w ∈ Λ * and an approximation factor ϵ ∈ (0, 1) and outputs a value v ∈ N such that [...]

A fixed-parameter tractable randomized approximation scheme (FPTRAS) for a parameterized counting problem (P, κ) over alphabet Λ is a randomized approximation scheme for the counting problem P with running time at most f (κ(w)) · p(|w|, 1/ϵ) for some computable function f and polynomial p. The results proved in [ACJR21] imply the following. [...]

An OMQ Q ∈ (G, UCQ) has semantic treewidth at most k ≥ 1 if there is an OMQ Q ′ ∈ (G, UCQ) such that Q ≡ Q ′ and Q ′ has treewidth at most k. [...] Let UCQ k be the class of UCQs that have treewidth at most k. By Proposition 3.5 of [ACJR21], there is an FPRAS for AnswerCount(UCQ k ), where an FPRAS is defined like an FPTRAS except that the running time may be at most p(|w|, 1/ϵ). We use this FPRAS to compute an approximation of #q ′ (D * ) and return the result. Overall, this yields the desired FPTRAS for Q.
It is interesting to note the contrast between Theorem 8.1 and Point 1 of Theorem 4.3: the latter refers to the treewidth and contract treewidth of the class of CQs Q * , which is defined in a non-trivial way, while Theorem 8.1 simply speaks about the semantic treewidth of the OMQs in Q and is thus in line with the characterizations of efficient OMQ evaluation given in [BFLP19, BDF + 20]. In fact, the classes of OMQs covered by Theorem 8.1 are precisely those subclasses of (G, UCQ) for which evaluation is in FPT [BDF + 20], paralleling the situation for CQs without ontologies. Informally, exact counting and approximate counting differ in how the CQs inside a UCQ interact (and we cannot avoid UCQs when we eliminate existential quantifiers from ontologies). In exact counting, the Chen-Mengel closure captures this interaction, demonstrating that answer counting for a UCQ may enable answer counting for CQs whose structural measures are higher than that of any CQ in the UCQ. In approximate counting, such effects do not seem to play a role. Also note that Theorem 8.1 does not rely on the data schema to be full, unlike the upper bounds in Theorem 4.3.
It may well be the case that a matching lower bound can be proved for Theorem 8.1 under the assumptions that W[1] ≠ FPT and P = BPP, that is, if Q ⊆ (G, UCQ) does not have bounded semantic treewidth, then there is no FPTRAS for AnswerCount(Q) unless one of the mentioned assumptions fails. This was proved in [ACJR21] for classes Q of CQs (without ontologies) under the additional assumption that for every q ∈ Q, there is a self-join free q ′ ∈ Q that has the same hypergraph as q. It is currently only known that this assumption can be dropped when all OMQs in Q are Boolean and when none of the OMQs in Q contains quantified variables. We conjecture that it is possible to lift these restricted cases from pure CQs to OMQs from (G, UCQ), building on results from [BDF + 20]. The general case, however, remains open.

This suggests the importance of the meta problem to decide whether a given query is equivalent to one in which some selected structural measures are small, and to construct the latter query if it exists. We present some results on this topic both for ontology-mediated querying and for querying under constraints. These results and their proofs also shed some more light on the interplay between the ontology and the structural measures.

9.1. Querying Under Constraints. We start with querying under constraints, considering all measures in parallel. In fact, we even consider sets of measures since some of the statements in Theorems 4.3 and 4.4 refer to multiple measures and it is not a priori clear whether the fact that each measure from a certain set of measures can be made small in an equivalent query implies that the same is true for all measures from the set simultaneously.

The Meta Problems: Equivalent Queries with Small Measures
Our approach is as follows. For a given CQS (T , S, q), we construct a certain CQ q ′ that approximates q from below under the constraints in T and that has small measures. Similar approximations have been considered for instance in [BLR14], without constraints. We then show that if there is any CQ q ′′ that has small measures and is equivalent to q under the constraints in T , then q ′ is equivalent to q. In this way, we are able to simultaneously solve the decision and computation version of the meta problem at hand. With 'approximation from below', we mean that the answers to q ′ are contained in those to q on all S-databases. This should not be confused with computing an approximation of the number of answers to a given query as considered in Section 8. A set of measures is a subset M ⊆ {TW, CTW, SS, LMN} with the obvious meaning. For a set of measures M and k ≥ 1, we say that a UCQ q is an M k -query if for every CQ in q, every measure from M is at most k. If T is a finite set of TGDs from (G, UCQ) over schema S and q 1 (x), q 2 (x) are UCQs over S, then we say that q 1 is contained in q 2 under T , written q 1 ⊆ T q 2 , if q 1 (D) ⊆ q 2 (D) for every S-database D that is a model of T , and likewise for equivalence and q 1 ≡ T q 2 .
Definition 9.1. Let (T , S, q) ∈ (G, UCQ) be a CQS, M a set of measures, and k ≥ 1. An M k -approximation of q under T is a UCQ q ′ such that (1) q ′ ⊆ T q, (2) q ′ is an M k -query, and (3) for each UCQ q ′′ that satisfies Conditions 1 and 2, q ′′ ⊆ T q ′ .
It might be useful for the reader to reconsider Example 4.1, which for every n ≥ 0 gives an OMQ (O, S, q n ) with full schema such that q n has high measures, but is equivalent to an OMQ (O, S, p n ) with low measures. The equivalence also holds true if the OMQs are viewed as CQSs, that is, q n ≡ O p n . If we choose for example M = {TW, CTW} and k = 1, then it can be seen that every M k -approximation of q n must contain a CQ that is equivalent to p n .
We next identify a simple way to construct M k -approximations. Let (T , S, q) ∈ (G, UCQ) be a CQS, M a set of measures, and k ≥ 1. Moreover, let ℓ be the maximum number of variables in any CQ in q and fix a set V of exactly ℓ · ar(S) variables. Assuming that T is understood from the context, we define q M k to be the UCQ that contains as a disjunct any CQ p such that p ⊆ T q, p is an M k -query, and p uses only variables from V. As containment between UCQs under constraints from G is decidable [BBP18], given (T , S, q) we can effectively compute q M k . We show next that q M k is an M k -approximation of q under T .

Proof. By construction, q M k satisfies Points 1 and 2 from Definition 9.1. We show that it also satisfies Point 3. Let q ′′ (x̄) be a UCQ such that q ′′ ⊆ T q and q ′′ is an M k -query. Further, let p be a CQ in q ′′ . We have to show that q M k contains a CQ p ′ with p ⊆ T p ′ . We apply Theorem 5.2 to the ontology O = T , the database D = D p , and the integer n = ℓ, the maximum number of variables of CQs in q. This yields a database D ⋆ p which has the properties that D ⋆ p |= T , D p ⊆ D ⋆ p , and thus x̄ ∈ p(D ⋆ p ). From q ′′ ⊆ T q, it follows that x̄ ∈ q(D ⋆ p ), and thus there must be a CQ q i in q such that x̄ ∈ q i (D ⋆ p ). From Point 2 of Theorem 5.2 and |var(q i )| ≤ ℓ, it follows that x̄ ∈ Q(D p ) for the OMQ Q = (T , S, q i ). Consequently, q i maps into ch T (D p ) via some homomorphism h that is the identity on x̄. We intend to use h for identifying the desired CQ p ′ in q M k such that p ⊆ T p ′ .

We need some preliminaries that we keep on an intuitive level here and flesh out in the appendix. Since T is a set of guarded TGDs, ch T (D p ) is of a certain regular shape. Informally, it looks like D p with a tree-like structure attached to every guarded set X in D p .⁸ Note that the constants in D p are exactly the variables in p. We refer to all other constants in ch T (D p ) as nulls.
Formally, we first identify with every fact R(c̄) ∈ ch T (D p ) such that c̄ contains at least one null a unique 'source' fact src(R(c̄)) ∈ D p that played the role of the guard when the tree-like structure that R(c̄) is in was generated by the chase, and then use src to identify the tree-like structures in ch T (D p ).
Start with setting src(R(c̄)) = R(c̄) for all R(c̄) ∈ D p . Next assume that R(c̄) ∈ ch T (D p ) was introduced by a chase step that applies a TGD T ∈ T at a tuple (d̄, d̄ ′ ), and let R ′ be the relation symbol in the guard atom in body(T ). Then we set src(R(c̄)) = R ′ (d̄, d̄ ′ ) if d̄ ∪ d̄ ′ ⊆ adom(D p ) and src(R(c̄)) = src(R ′ (d̄, d̄ ′ )) otherwise. For any guarded set X of D p , define ch T (D p )| ↓ X to contain those facts R(c̄) ∈ ch T (D p ) such that the constants in src(R(c̄)) are exactly those in X.
In the appendix, we show the following: (A) for every guarded set X in D p , there is a homomorphism from ch T (D p )| ↓ X to ch T (ch T (D p )| X ) that is the identity on all constants in X; (B) if c ∈ adom(ch T (D p )) is a null and R 1 (c̄ 1 ), R 2 (c̄ 2 ) ∈ ch T (D p ) such that c occurs in both c̄ 1 and c̄ 2 , then src(R 1 (c̄ 1 )) = src(R 2 (c̄ 2 )). Informally, Condition (A) may be viewed as a locality property of the chase and Condition (B) says that, as expected, attached tree-like structures do not share any variables.
As announced, we now construct the CQ p ′ in q M k . All atoms in p ′ are facts from ch T (p)| var(p) , viewed as atoms. To control the number of variables in p ′ , however, we do not include all such atoms, but only a selection of them. Consider each atom R(ȳ) in q i and distinguish the following cases:
• if h(ȳ) contains only variables, then add R(h(ȳ)) to p ′ ;
• if h(ȳ) contains a null, then consider the atom S(z̄) = src(R(h(ȳ))) ∈ D p and let X be the set of variables in z̄; add all facts in ch T (p)| X as atoms to p ′ .
The answer variables of p ′ are exactly those of p. All these variables must be present since h is the identity on x̄. It follows from the construction of p ′ that the identity is a homomorphism from p ′ to ch T (p). Thus p ⊆ T p ′ and it remains to show that, up to renaming the variables so that they are from the set V fixed for the construction of q M k , p ′ is a CQ in q M k . This is a consequence of the following properties:
(1) p ′ is an M k -query. By definition of p ′ , all guarded sets in p ′ are also guarded sets in ch T (D p ). Moreover, those guarded sets contain no nulls and are thus also guarded sets in p. Consequently, the Gaifman graph of p ′ is a subgraph of the Gaifman graph of p, and all measures are monotone regarding subgraphs.
(2) p ′ ⊆ T q i . It suffices to construct a homomorphism h ′ from q i to ch T (D p ′ ). This can be done as follows.
For each x ∈ var(q i ) with h(x) a variable, put h ′ (x) = h(x). It remains to deal with all x ∈ var(q i ) with h(x) a null.
With any such x, we associate a unique atom Γ(x) ∈ D p that identifies the tree-like structure in ch T (D p ) that h(x) is in. Take any atom R(ȳ) ∈ q i such that ȳ contains x. It follows from (B) above that src(R(h(ȳ))) is the same, no matter which such atom R(ȳ) ∈ q i we take. We may thus associate with x the unique atom Γ(x) = src(R(h(ȳ))) ∈ D p . Now consider any maximal set X of variables x ∈ var(q i ) such that h(x) is a null and x 1 , x 2 ∈ X implies Γ(x 1 ) = Γ(x 2 ). Then h is a homomorphism from q i | X to ch T (D p )| ↓ X . By (A) above, there is a homomorphism h X from ch T (D p )| ↓ X to ch T (ch T (D p )| X ) that is the identity on all constants in X. Moreover, the construction of p ′ yields ch T (D p )| X ⊆ p ′ . We may thus view h X as a homomorphism from ch T (D p )| ↓ X to ch T (D p ′ ), and set h ′ (x) = h X (h(x)) for all x ∈ X. It can be verified that the constructed h ′ is indeed a homomorphism from q i to ch T (D p ′ ).
Footnote 8: More precisely, a structure of treewidth ℓ, where ℓ is the maximum number of variables in the head of a TGD from T .
A straightforward analysis of the construction of p ′ shows that it introduces into p ′ the following variables, implying the statement: • for every x ∈ q i with h(x) a variable, the variable h(x), • for every x ∈ q i with h(x) a null, the variables that occur in Γ(x), where Γ is as above.
In fact, assume that an atom R(ȳ) is treated in the construction of p ′ and let ȳ = y 1 , . . . , y k . If Case 1 of the construction applies, then the variables h(y 1 ), . . . , h(y k ) are introduced. For Case 2 of the construction, we reuse the function Γ defined in the proof of the previous property. If this case applies, then by definition Γ(x) is identical for every variable x in ȳ with h(x) a null, subsequently just referred to as Γ. This Γ contains h(x) for all variables x in ȳ with h(x) a variable, and Γ is precisely the set of variables introduced in this step.
Let (T , S, q) ∈ (G, UCQ) be a CQS. By definition of M k -approximations, it is clear that if there exists a UCQ q ′ such that q ′ ≡ T q and q ′ is an M k -query, then any M k -approximation q ⋆ of q under T also satisfies q ⋆ ≡ T q. The following is thus an immediate consequence of Lemma 9.2 and the fact that containment between UCQs under constraints from G is decidable.
Theorem 9.3. Let M be a set of measures. Given a CQS (T , S, q) ∈ (G, UCQ) and k ≥ 1, it is decidable whether q is equivalent under T to a UCQ q ′ that is an M k -query. Moreover, if this is the case, then such a q ′ can be effectively computed.
A particularly relevant case is M = {TW, CTW}, as it is linked to fixed-parameter tractability. From the above results, we obtain that answer counting in FPT is possible for CQSs that are semantically of bounded treewidth and contract treewidth, provided that only CQs are admitted as the actual query. Let us make this more precise. Fix k ≥ 1 and let C k be the class of CQSs (T , S, q) ∈ (G, CQ) such that q ≡ T q ′ for some UCQ q ′ of treewidth and contract treewidth at most k. Then AnswerCount(C k ) is in FPT: given a CQS (T , S, q) ∈ C k and an S-database D, we may compute, as per Theorem 9.3, a UCQ q ′ that is an M k -query and satisfies q ≡ T q ′ . Since q is a CQ, it is easy to see that there must be a single disjunct q ⋆ of q ′ such that q ≡ T q ⋆ . We can effectively identify q ⋆ and use Point 1 of Theorem 3.3 as a black box to count answers to q ⋆ on D. The same is probably not true when we define C k as a subclass of (G, UCQ) rather than (G, CQ). Then, we have to count the answers to q ′ on D rather than to a single CQ q ⋆ in q ′ , but for UCQs of bounded treewidth and contract treewidth, Point 1 of Theorem 3.3 does not always guarantee answer counting in FPT because of the use of the Chen-Mengel closure in that theorem.
for every S-database D. Q 1 and Q 2 are equivalent, written Q 1 ≡ Q 2 , if Q 1 ⊆ Q 2 and Q 2 ⊆ Q 1 . We say that an OMQ Q = (O, S, q) is an M k -query if q is.
The containment in the center holds since q ′ is an M k -approximation of q under O.
(3) P ⊆ Q ′ for all P (x) = (O, S, p) ∈ (G, UCQ) such that P ⊆ Q and p is an M k -query. We first observe that p ⊆ O q. Thus let D be an S-database that satisfies all TGDs from O. Then P (D) = p(D) and Q(D) = q(D), thus P ⊆ Q implies p(D) ⊆ q (D).
We now show that P ⊆ Q ′ , as required. Let D ⋆ be the database from Theorem 5.2 invoked with O, D, and n = max{|var(q ′ )|, |var(p)|}. Then
P (D) = p(D ⋆ ) ∩ adom(D)^|x̄| ⊆ q ′ (D ⋆ ) ∩ adom(D)^|x̄| = Q ′ (D).
The containment in the center holds since p ⊆ O q and q ′ is an M k -approximation of q under O.
"only if". Assume that Q ′ (x) = (O, S, q ′ ) is an M k -approximation of Q while preserving the ontology. To show that q ′ is an M k -approximation of q under O, we have to show that Points (1) to (3) from Definition 9.1 are satisfied. Again, Point 2 is obvious.
(1) q ′ ⊆ O q. Follows from the fact that q ′ (D) = Q ′ (D) ⊆ Q(D) = q(D) for all S-databases D that satisfy O. The containment holds since Q ′ is an M k -approximation of Q.
(2) p ⊆ O q ′ for all UCQs p such that p ⊆ O q and p is an M k -query. We first observe that P ⊆ Q ′ where P = (O, S, p). In fact, let D be an S-database. Now take the database D ⋆ from Theorem 5.2 invoked with O, D, and n = max{|var(q ′ )|, |var(p)|}. Then Now, p ⊆ O q ′ is a consequence of P ⊆ Q ′ and the fact that P (D) = p(D) and Q(D) = q(D) for all S-databases D that satisfy O.
Lemma 9.5 allows us to compute approximations of OMQs using the construction given in Section 9.1. As in the CQS case, it is easy to see that a given OMQ Q = (O, S, q) is equivalent to an OMQ Q ′ = (O, S, q ′ ) that is an M k -query if and only if the M k -approximation of Q is an M k -query. We thus obtain the following.
Theorem 9.6. Let M be a set of measures. Given an OMQ Q = (O, S, q) ∈ (G, UCQ) based on the full schema and k ≥ 1, it is decidable whether Q is equivalent to an OMQ Q ′ = (O, S, q ′ ) ∈ (G, UCQ) that is an M k -query. Moreover, if this is the case, then such a Q ′ can be effectively computed.
While Theorem 9.6 requires the schema to be full and the ontology to be preserved, we now turn to approximations of OMQs that need neither preserve the ontology nor assume the full schema. We focus on contract treewidth and starsize and leave treewidth and dominating starsize as open problems. To simplify notation, instead of {CTW} k -approximations we speak of CTW k -approximations, and likewise for SS k -approximations.
A collapsing of a CQ q(x̄) is a CQ p(x̄) that can be obtained from q by identifying variables and adding equality atoms (on answer variables). When an answer variable x is identified with a non-answer variable y, the resulting variable is x; the identification of two answer variables is not allowed. The CTW k -approximation of an OMQ Q = (O, S, q) ∈ (G, UCQ) is the OMQ Q CTW k = (O, S, q CTW k ) where q CTW k is the UCQ that contains as CQs all collapsings of CQs in q that have contract treewidth at most k. The SS k -approximation Q SS k = (O, S, q SS k ) of Q is defined accordingly.
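The definition of collapsings can be made concrete with a small enumeration. The following is only an illustrative sketch under assumptions that are not part of the paper: a CQ is encoded as a list of atoms `(relation, variables)`, a collapsing is encoded as a pair of an atom set and a set of equality atoms, and the helper names `partitions` and `collapsings` are hypothetical. The filter by contract treewidth or starsize is deliberately omitted.

```python
def partitions(items):
    """Yield all set partitions of the list `items` as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]


def collapsings(atoms, answer_vars):
    """Enumerate collapsings of a CQ (hypothetical encoding, see lead-in).

    Each result is (atom set, equality set). Two answer variables are never
    identified; they can only be related by an added equality atom.
    """
    variables = sorted({v for _, args in atoms for v in args} | set(answer_vars))
    result = set()
    for part in partitions(variables):
        if any(sum(v in answer_vars for v in block) > 1 for block in part):
            continue  # identification of two answer variables is not allowed
        rep = {}
        for block in part:
            answers_in_block = [v for v in block if v in answer_vars]
            # an answer variable, if present, names the merged variable
            r = answers_in_block[0] if answers_in_block else min(block)
            for v in block:
                rep[v] = r
        merged = frozenset((rel, tuple(rep[v] for v in args)) for rel, args in atoms)
        for eq_part in partitions(list(answer_vars)):
            equalities = frozenset(
                (min(block), x) for block in eq_part for x in block if x != min(block)
            )
            result.add((merged, equalities))
    return result
```

For instance, under this encoding the CQ q(x) = R(x, y), R(y, z) has one collapsing per set partition of {x, y, z}, since only one answer variable is present.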
Theorem 9.7. Let Q = (O, S, q) ∈ (G, UCQ) be an OMQ and k ≥ 1. Then Q CTW k is a CTW k -approximation of Q. Moreover, if k ≥ ar(S), then Q SS k is an SS k -approximation of Q.
The proof of Theorem 9.7 is non-trivial and relies on careful manipulations of databases that are tailored towards the structural measure under consideration. Details are given below. The theorem gives rise to decidability results that, in contrast to Theorem 9.6, neither require the ontology to be preserved nor the schema to be full.
Corollary 9.8. Given an OMQ Q = (O, S, q) ∈ (G, UCQ) and k ≥ 1, it is decidable whether Q is equivalent to an OMQ Q ′ ∈ (G, UCQ) of contract treewidth at most k. Moreover, if this is the case, then such a Q ′ can be effectively computed. The same is true for starsize in place of contract treewidth.
Note that, although we are concerned here with approximations that are not required to preserve the ontology, Theorem 9.7 implies that for CTW k -approximations and SS k -approximations, it is never necessary to use an ontology different from the one in the original OMQ. Before proving Theorem 9.7, we observe that treewidth behaves differently in this respect, and thus a counterpart of Theorem 9.7 for treewidth cannot be expected. This is even true when the schema is full.
Example 9.9. For n ≥ 3, let Q n () = (∅, S n , q n ∨ p n ) where S n = {W, R 1 , . . . , R n } with W of arity n and each R i binary and where q n = ∃x 1 · · · ∃x n W (x 1 , . . . , x n ) and p n = ∃x 1 · · · ∃x n ∃y R 1 (x 1 , y), . . . , R n (x n , y).
In fact, it is equivalent to Q n . However, Q n has no TW k -approximation Q ⋆ based on the same (empty) ontology for any k < n−1 since Q n ⊈ Q ⋆ for any Q ⋆ = (∅, S n , q ⋆ ) such that q ⋆ is of treewidth k < n−1. In fact, any Q ⋆ of treewidth k < n−1 does not return any answers on the database {W (a 1 , . . . , a n )}.
One might criticize that in Example 9.9, the arity of relation symbols grows unboundedly. The next example shows that this is not necessary. It does, however, use a data schema that is not full.
Example 9.10. Let S = {W, R} with W of arity 3 and R of arity 2. For n ≥ 0, let Q n () = (∅, S, q n ) where q n = ∃z 1 ∃z 2 ∃z 3 ∃x 1 · · · ∃x n 1≤i,j<n; i≠j Then, G qn is the (n + 3)-clique and thus the treewidth of q n is n + 2. Since q n is a core, there is no OMQ based on the empty ontology that is equivalent to Q n () and in which the actual query has treewidth less than n + 2.
For n ≥ 0, let P n () = (O, S, p n ) where p n = ∃y∃z 1 ∃z 2 ∃z 3 ∃x 1 · · · ∃x n 1≤i,j<n; i≠j and Then p n has treewidth n+1 and P n is equivalent to Q n . Consequently, P n is a TW n+1 -approximation of Q n . For the case n = 2, the involved CQs are displayed in Figure 6.
We now turn to the proof of Theorem 9.7. Here, we present only the statement about starsize made in Theorem 9.7, restated as Lemma 9.11 below. The statement about contract treewidth is proved in the appendix. The proof follows a similar strategy as for starsize, but is a bit more involved.
A pointed S-database is a pair (D, c̄) with D an S-database and c̄ a tuple of constants from adom(D). The contract treewidth and starsize of (D, c̄) are those of D viewed as a conjunctive query with constants from c̄ playing the role of answer variables.
Lemma 9.11. Let (O, S, q) ∈ (G, UCQ) be an OMQ and k ≥ ar(S). Then Q SS k is an SS k -approximation of Q.
Proof. By definition of Q SS k , it is clear that Points 1 and 2 of the definition of SS k -approximations are satisfied. It remains to establish Point 3.
Let P (x̄) = (O ′ , S, p) ∈ (G, UCQ) such that P ⊆ Q with p of starsize at most k. We have to show that P ⊆ Q SS k , i.e., c̄ ∈ P (D) implies c̄ ∈ Q SS k (D) for all S-databases D. Thus let D be an S-database and let c̄ ∈ P (D). Since P ⊆ Q, we have c̄ ∈ Q(D). We construct a pointed S-database (D ′ , c̄) such that (1) c̄ ∈ P (D ′ ), (2) the starsize of (D ′ , c̄) is at most k, and (3) there is a homomorphism from D ′ to D that is the identity on c̄. In the following, we consider sets S of constants that occur in c̄, with |S| ≤ k. Let 𝒮 denote the set of all such sets S. For every S ∈ 𝒮, let D S denote the database obtained from D by renaming every constant c ∉ S to c S . We then define D ′ = ⋃ S∈𝒮 D S . By definition, (D ′ , c̄) has no c̄-component with more than k constants from c̄, and thus Point 2 is satisfied. Point 3 is clear by construction of (D ′ , c̄). We need to show that Point 1 also holds. Since c̄ ∈ P (D), there is a homomorphism h from some CQ p ′ (x̄) in p to ch O ′ (D). We construct a homomorphism h ′ from p ′ to ch O ′ (D ′ ), which shows c̄ ∈ P (D ′ ) as desired.
For every S ∈ 𝒮, there is a homomorphism (even isomorphism) h S from D to D S that is the identity on S. This homomorphism can be extended to a homomorphism from ch O ′ (D) to ch O ′ (D ′ ), which we also denote by h S . Now for the construction of h ′ . For all answer variables x in p ′ , we set h ′ (x) = h(x). Note that this yields h ′ (x̄) = c̄. For every quantified variable y, let S be the set of answer variables that are part of the unique x̄-component that contains y. Then set h ′ (y) = h S • h(y). This is well-defined since p has starsize at most k, and thus |S| ≤ k, implying S ∈ 𝒮.
We argue that h ′ is indeed a homomorphism. For every atom R(z̄) ∈ p ′ , we have R(h(z̄)) ∈ ch O ′ (D). First assume that the variables in z̄ are all answer variables. Let S be the set of all constants in h(z̄). We have |S| ≤ k since k ≥ ar(S). Since h ′ (z̄) = h(z̄) and h S is the identity on S, R(h(z̄)) ∈ ch O ′ (D) implies R(h ′ (z̄)) ∈ ch O ′ (D ′ ), as required. Now assume that z̄ contains at least one quantified variable. Then all variables in z̄ belong to the same x̄-component of p ′ . Let S be the set of constants h(x) such that x is an answer variable in this x̄-component. Then h ′ (z̄) = h S • h(z̄) and we are done. We have thus established Point 1 above.
From P ⊆ Q and c̄ ∈ P (D ′ ), we obtain c̄ ∈ Q(D ′ ). Thus, for some CQ q ′ in q, there is a homomorphism g from q ′ to ch O (D ′ ) such that g(x̄) = c̄. Let q̃ denote the collapsing of q ′ that is obtained by identifying y 1 and y 2 whenever g(y 1 ) = g(y 2 ) with at least one of y 1 , y 2 a quantified variable and adding x 1 = x 2 whenever g(x 1 ) = g(x 2 ) and x 1 , x 2 are both answer variables. Then g is also a homomorphism from q̃ to ch O (D ′ ). By Point 3, there is a homomorphism h D from D ′ to D, which can be extended to a homomorphism from ch O (D ′ ) to ch O (D). To finish the proof, it thus remains to show that q̃ is a CQ in q SS k .
Assume to the contrary of what is to be shown that the starsize of q̃(x̄) is at least ℓ = max{k, ar(S)} + 1. Then, there is an x̄-component S of q̃ with at least ℓ distinct answer variables, say x 1 , x 2 , . . . , x ℓ , such that q̃ does not contain atoms x i = x j for 1 ≤ i < j ≤ ℓ. Let y be a quantified variable in S. By definition of x̄-components, G q̃ contains (simple) paths P i between y and x i , for 1 ≤ i ≤ ℓ. Together with the homomorphism g, each path P i gives rise to a path P ′ i in G ch O (D ′ ) between a = g(y) and c i = g(x i ), for 1 ≤ i ≤ ℓ. By definition of q̃, g(z 1 ) = g(z 2 ) implies that z 1 = z 2 or z 1 , z 2 are both answer variables and q̃ contains an equality atom z 1 = z 2 . It follows: (a) the constants c 1 , . . . , c ℓ and a are all different; (b) a is different from all constants in c̄; (c) path P ′ i contains no constants from c̄ as inner nodes.
First assume that a ∈ adom(D ′ ). An easy analysis of the chase shows that, due to the existence of the path P ′ i and since all TGDs in O are guarded, for every 1 ≤ i ≤ ℓ there is a path P ′′ i in G D ′ between c i and a such that P ′′ i uses no constants introduced by the chase. In fact, we can obtain P ′′ i from P ′ i by dropping all constants that have been introduced by the chase. It then follows from (a) to (c) that the starsize of (D ′ , c̄) is at least ℓ, a contradiction.
Now assume that a ∉ adom(D ′ ). Let b i be the last constant on the path P ′ i that is in adom(D ′ ) when traveling the path from c i to a. Thus, the subpath of P ′ i that connects (the last occurrence of) b i with a uses only constants introduced by the chase as inner nodes. Another easy analysis of the chase reveals that since all paths P ′ 1 , . . . , P ′ ℓ end at the same constant a, there must be a fact in D ′ that contains all of b 1 , . . . , b ℓ . Note that {c 1 , . . . , c ℓ } ⊆ {b 1 , . . . , b ℓ } is impossible since ar(S) < ℓ. It thus follows from (c) that some b i is not in c̄. Consequently, there is a path P ′′ i in G D ′ that connects c i and b i and uses no constants from c̄ as inner nodes, for 1 ≤ i ≤ ℓ. We may again obtain P ′′ i by dropping constants introduced by the chase. This implies that the starsize of (D ′ , c̄) is at least ℓ, a contradiction.
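The database D ′ used in the proof of Lemma 9.11 can be sketched in a few lines. This is a hedged illustration that assumes D ′ is the union of the renamed copies D S for all sets S of at most k constants from c̄; the encoding of facts as `(relation, tuple)` pairs, the representation of the renamed constant c S as the pair `(c, S)`, and the names `build_d_prime` and `project_back` are hypothetical choices made only for this example.

```python
from itertools import combinations


def build_d_prime(db, c_bar, k):
    """Union of the copies D_S for all S with |S| <= k drawn from c_bar.

    In the copy D_S, every constant c outside S is renamed to the pair (c, S),
    which plays the role of the fresh constant c^S.
    """
    consts = sorted(set(c_bar))
    d_prime = set()
    for size in range(min(k, len(consts)) + 1):
        for S in combinations(consts, size):
            for rel, args in db:
                d_prime.add((rel, tuple(a if a in S else (a, S) for a in args)))
    return d_prime


def project_back(d_prime):
    """Homomorphism from D' to D that undoes the renaming (identity on c_bar)."""
    return {
        (rel, tuple(a[0] if isinstance(a, tuple) else a for a in args))
        for rel, args in d_prime
    }


# Example with k = 2 >= ar(S) for a binary relation R (constants are strings).
D = {("R", ("a", "b")), ("R", ("b", "c")), ("R", ("c", "d"))}
D_prime = build_d_prime(D, ("a", "b", "c"), 2)
```

In this example, `project_back(D_prime)` returns D again, which mirrors Point 3 of the proof, and no fact of `D_prime` mentions more than k unrenamed constants from c̄.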

Conclusions
We have provided a complexity classification for counting the number of answers to UCQs in the presence of TGDs that applies both to ontology-mediated querying and to querying under constraints. The classification also applies to ontology-mediated querying with the OMQ language (ELIH, UCQ) where ELIH is a well-known description logic [BHLS17]. In fact, this is immediate if the ontologies in OMQs are in a certain well-known normal form that avoids nesting of concepts [BHLS17]. In the general case, it suffices to observe that all our proofs extend from guarded TGDs to frontier-guarded TGDs [BLMS11] with bodies of bounded treewidth, a strict generalization of ELIH. In contrast, a complexity classification for OMQs based on frontier-guarded TGDs with unrestricted bodies is an interesting problem for future work.
There are several other interesting questions that remain open. In querying under constraints that are guarded TGDs, does answer counting in FPT coincide with answer counting in PTime? Do our results extend to ontology-mediated querying when the data schema is not required to be full? What happens when we drop the restriction that relation symbols are of bounded arity? What about OMQs and CQSs based on other decidable classes of TGDs? And how can we decide the meta problems for the important structural measure of treewidth when the ontology need not be preserved, with full data schema or even with unrestricted data schema?
Lemma 6.2. Let q 1 (x̄ 1 ) and q 2 (x̄ 2 ) be equality-free CQs over schema S and let D be a class of S-databases that contains D q i and D q̂ i for i ∈ {1, 2} and is closed under cloning. Then (1) q 1 and q 2 are counting equivalent over D iff q 1 and q 2 are counting equivalent over the class of all S-databases; (2) if D is closed under disjoint union and contains D ⊤ S , then q 1 and q 2 are semi-counting equivalent over D iff q 1 and q 2 are semi-counting equivalent over the class of all S-databases.
We start with two simple, yet crucial, observations about counting: one regarding products and one regarding cloning. These observations will often be used implicitly in the following three sections of the appendix. The proof is folklore.
We need one more definition before we formulate the statement regarding cloning. Let D be an S-database and q(x̄) be a CQ over schema S. For a number i ≥ 0 and a set T ⊆ adom(D), by hom i,T (q, D, x̄) we denote the set of all functions h : x̄ → adom(D) that extend to a homomorphism from q to D such that h maps exactly i variables from x̄ to T .
Lemma B.2. Let D be an S-database, let i ≥ 0, j > 0 be natural numbers, and T ⊆ adom(D) be a subset of the active domain of D. Let q(x) be an equality-free CQ over schema S.
If D j is a database obtained from D by cloning every element from T exactly j−1 times and T j ⊆ adom(D j ) is the set of all those clones, then |hom i,T j (q, D j , x̄)| = j^i · |hom i,T (q, D, x̄)|.
As before, the proof is straightforward. Nevertheless, observe that in the above statement it is crucial that the answer variables are independent and do not repeat in the tuple x̄.
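The identity of Lemma B.2 can be checked by brute force on a small instance. The sketch below is illustrative only: facts and atoms are encoded as `(relation, tuple)` pairs, the j copies of a constant a are represented as pairs `(a, 0), ..., (a, j-1)`, and the helper names `answers`, `clone`, and `count_hits` are hypothetical.

```python
from itertools import product


def answers(atoms, answer_vars, db):
    """Set of answer tuples of an equality-free CQ on db (brute force)."""
    variables = sorted({v for _, args in atoms for v in args} | set(answer_vars))
    adom = list({c for _, args in db for c in args})
    result = set()
    for values in product(adom, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all((rel, tuple(h[v] for v in args)) in db for rel, args in atoms):
            result.add(tuple(h[x] for x in answer_vars))
    return result


def clone(db, T, j):
    """Give every constant in T exactly j copies; other constants keep one copy."""
    cloned = set()
    for rel, args in db:
        choices = [[(a, t) for t in range(j)] if a in T else [(a, 0)] for a in args]
        for new_args in product(*choices):
            cloned.add((rel, new_args))
    return cloned


def count_hits(ans, T, i):
    """Number of answers that map exactly i answer positions into T."""
    return sum(1 for tup in ans if sum(c in T for c in tup) == i)


# q(x, y) = E(x, y), E(y, z) on a directed triangle; clone a and b (j = 3).
q, x_bar = [("E", ("x", "y")), ("E", ("y", "z"))], ("x", "y")
D = {("E", ("a", "b")), ("E", ("b", "c")), ("E", ("c", "a"))}
T, j = {"a", "b"}, 3
D_j = clone(D, T, j)
T_j = {(a, t) for a in T for t in range(j)}
```

Comparing `count_hits(answers(q, x_bar, D_j), T_j, i)` with `j**i * count_hits(answers(q, x_bar, D), T, i)` for each i then reproduces the statement of the lemma on this instance.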
For the notion of counting equivalence, we inspect the strongly related notion of being renaming equivalent. Let q 1 (x̄ 1 ), q 2 (x̄ 2 ) be CQs over some schema S. We say that q 1 and q 2 are renaming equivalent if there are two surjections h 1 : x̄ 1 → x̄ 2 and h 2 : x̄ 2 → x̄ 1 that can be extended to homomorphisms h 1 : q 1 → D q 2 and h 2 : q 2 → D q 1 .
Lemma B.3 (counting equivalence). Let q 1 (x̄ 1 ), q 2 (x̄ 2 ) be equality-free CQs over schema S. Let D be a class of databases such that
• D q 1 , D q 2 ∈ D and
• D is closed under cloning.
Then q 1 and q 2 are renaming equivalent or there is a D ∈ D such that #q 1 (D) ≠ #q 2 (D).
Now, we show that if #q 1 (D) = #q 2 (D) for all databases D ∈ D then q 1 and q 2 are renaming equivalent. Since #q 1 (D) = #q 2 (D) for all databases D ∈ D, the above observation yields |x̄ 1 | = |x̄ 2 |. Hence, possibly after some renaming, we can assume that x̄ 1 = x̄ 2 , drop the subscript, and simply write x̄.
Let q(x̄) be a CQ over schema S, let D be an S-database such that x̄ ⊆ adom(D). By hom(q, D, x̄) we denote all mappings from x̄ to adom(D) that can be extended to homomorphisms from q to D. Similarly, by surj(q, D, x̄) we denote all surjections from x̄ to x̄ that lie in hom(q, D, x̄). Notice that if |surj(q 1 , D q 2 , x̄)| > 0 and |surj(q 2 , D q 1 , x̄)| > 0 then, by definition, q 1 and q 2 are renaming equivalent.
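Renaming equivalence, as defined above, can be tested naively on small queries. The following sketch is not an algorithm from the paper; it assumes that the canonical database D q is simply the set of atoms of q with variables read as constants, and the names `answers` and `renaming_equivalent` are hypothetical.

```python
from itertools import product


def answers(atoms, answer_vars, db):
    """Set of answer tuples of an equality-free CQ on db (brute force)."""
    variables = sorted({v for _, args in atoms for v in args} | set(answer_vars))
    adom = list({c for _, args in db for c in args})
    result = set()
    for values in product(adom, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all((rel, tuple(h[v] for v in args)) in db for rel, args in atoms):
            result.add(tuple(h[x] for x in answer_vars))
    return result


def renaming_equivalent(q1, x1, q2, x2):
    """True iff surjections x1 -> x2 and x2 -> x1 extend to homomorphisms."""

    def has_surjection(qa, xa, qb, xb):
        # Canonical database of qb: its atoms, with variables as constants.
        return any(set(t) == set(xb) for t in answers(qa, xa, set(qb)))

    return has_surjection(q1, x1, q2, x2) and has_surjection(q2, x2, q1, x1)


q1 = [("E", ("x", "y"))]                      # q1(x, y) = E(x, y)
q2 = [("E", ("v", "u"))]                      # q2(u, v) = E(v, u)
q3 = [("E", ("x", "y")), ("E", ("y", "x"))]   # q3(x, y) = E(x, y), E(y, x)
```

Here q1 and q2 differ only in the order of their answer variables and are renaming equivalent, whereas q3 admits no homomorphism into the canonical database of q1.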
For T ⊆ x̄ let hom T (q, D, x̄) denote the set of mappings h ∈ hom(q, D, x̄) such that h(x̄) ⊆ T . By an inclusion-exclusion argument we get
|surj(q, D, x̄)| = Σ T ⊆x̄ (−1)^(|x̄|−|T |) · |hom T (q, D, x̄)|.
We now show how to compute |hom T (q, D, x̄)| for all T ⊆ x̄. For i ≥ 0, let hom i,T (q, D, x̄) be the set of mappings h ∈ hom(q, D, x̄) such that h maps exactly i variables from x̄ into T . In particular, hom T (q, D, x̄) = hom |x̄|,T (q, D, x̄). For j ≥ 1 and T ⊆ adom(D), let D j,T be a database obtained from D by cloning all elements from T exactly j−1 times, i.e. for every a ∈ T the database D j,T has exactly j clones of a. In particular, D 1,T = D.
By Lemma B.2, for every T ⊆ x̄ and every j > 0, we have
#q(D j,T ) = Σ 0≤i≤|x̄| j^i · |hom i,T (q, D, x̄)|.
Since the above equation holds for every j ≥ 1, by taking the first |x̄|+1 equations we construct a system of linear equations where |hom i,T (q, D, x̄)| are the unknowns, the coefficients j^i form a Vandermonde matrix, and #q(D j,T ) are the constant terms. Notice that the matrix depends neither on q nor on D. Since the matrix has full rank, the values |hom i,T (q, D, x̄)| are uniquely determined by, and can be effectively computed from, the constant terms #q(D j,T ). In particular, the value |hom T (q, D, x̄)| = |hom |x̄|,T (q, D, x̄)| is uniquely determined by the constant terms, and, in consequence, so is the value |surj(q, D, x̄)|.
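The interpolation step can be illustrated with exact rational arithmetic. The sketch below assumes that the linear system has the form Σ i j^i · u i = n j , where u i stands for |hom i,T (q, D, x̄)| and n j for #q(D j,T ), as suggested by Lemma B.2; the function name `solve_vandermonde` is hypothetical and plain Gauss-Jordan elimination is used instead of any specialised Vandermonde solver.

```python
from fractions import Fraction


def solve_vandermonde(counts):
    """Solve sum_i j^i * u_i = counts[j-1] for j = 1..n, returning u_0..u_{n-1}.

    counts[j-1] plays the role of #q(D_{j,T}); the unknowns u_i play the role
    of |hom_{i,T}(q, D, x)|.
    """
    n = len(counts)
    # Augmented matrix: row j holds j^0, ..., j^(n-1) and the right-hand side.
    rows = [
        [Fraction(j) ** i for i in range(n)] + [Fraction(counts[j - 1])]
        for j in range(1, n + 1)
    ]
    for col in range(n):
        # The nodes 1..n are distinct, so a non-zero pivot always exists.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


# Counts 3, 8, 15 for j = 1, 2, 3 correspond to u = (0, 2, 1): 2j + j^2.
recovered = solve_vandermonde([3, 8, 15])
```

Only the counts enter the computation, which reflects the remark that the coefficient matrix depends neither on the query nor on the database.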
If we apply the above system of equations to CQ q 1 and database D q 2 we can conclude that |surj(q 1 , D q 2 , x̄)| is uniquely determined by the values #q 1 (D) for databases D from a certain set S ⊆ D. Similarly, |surj(q 2 , D q 2 , x̄)| is uniquely determined by the same equations with the values #q 1 (D) replaced by #q 2 (D). Now, since the equations' coefficients depend neither on the query nor on the database and since we have #q 1 (D) = #q 2 (D) for all D ∈ D, we can infer that |surj(q 1 , D q 2 , x̄)| = |surj(q 2 , D q 2 , x̄)|. Hence, |surj(q 1 , D q 2 , x̄)| > 0 as the identity function clearly belongs to surj(q 2 , D q 2 , x̄).
A similar reasoning shows that |surj(q 2 , D q 1 , x̄)| > 0 and ends the proof.
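The inclusion-exclusion step used in the proof above can be verified on a toy instance. The sketch assumes the identity in its standard form, namely that the number of surjections onto x̄ equals the alternating sum of |hom T | over all T ⊆ x̄; the encoding and the names `answers`, `hom_T`, and `surj_by_inclusion_exclusion` are hypothetical.

```python
from itertools import combinations, product


def answers(atoms, answer_vars, db):
    """Set of answer tuples of an equality-free CQ on db (brute force)."""
    variables = sorted({v for _, args in atoms for v in args} | set(answer_vars))
    adom = list({c for _, args in db for c in args})
    result = set()
    for values in product(adom, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all((rel, tuple(h[v] for v in args)) in db for rel, args in atoms):
            result.add(tuple(h[x] for x in answer_vars))
    return result


def hom_T(ans, T):
    """Number of answers whose image is contained in T."""
    return sum(1 for t in ans if set(t) <= set(T))


def surj_by_inclusion_exclusion(ans, x_bar):
    """Alternating sum over all subsets T of x_bar."""
    n = len(x_bar)
    return sum(
        (-1) ** (n - size) * hom_T(ans, T)
        for size in range(n + 1)
        for T in combinations(x_bar, size)
    )


# The database uses the answer variables x, y themselves as constants.
x_bar = ("x", "y")
D = {("E", ("x", "y")), ("E", ("y", "x")), ("E", ("x", "x"))}
ans = answers([("E", ("x", "y"))], x_bar, D)
direct = sum(1 for t in ans if set(t) == set(x_bar))
```

On this instance both `direct` and `surj_by_inclusion_exclusion(ans, x_bar)` count the two answers (x, y) and (y, x), while the non-surjective answer (x, x) cancels out.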
The notion of renaming equivalence is also connected to semi-counting equivalence.
Lemma B.4 (semi-counting equivalence). Let q 1 (x 1 ), q 2 (x 2 ) be equality-free CQs over schema S. Let D be a class of databases such that • and D is closed under cloning and disjoint union.
Either q̂ 1 , q̂ 2 are renaming equivalent or there is, and can be computed, a database D ∈ D such that |q 1 (D)| ≠ |q 2 (D)| and for every CQ q over schema S we have that |q(D)| > 0.
Proof. We will show that if q̂ 1 , q̂ 2 are not renaming equivalent then there is a database D ∈ D such that #q 1 (D) ≠ #q 2 (D) and such that for every CQ q over schema S we have that #q(D) > 0. Since q̂ 1 , q̂ 2 are not renaming equivalent, by the previous lemma we can find a database D ′ ∈ D such that #q 1 (D ′ ) ≠ #q 2 (D ′ ).
Consider the function f 1 : k → #q 1 (D ′ + kD ⊤ S ) defined for k ≥ 1, where D ′ + kD ⊤ S is the disjoint union of D ′ and k copies of D ⊤ S . After some elementary transformations on #q 1 (D ′ + kD ⊤ S ) we can infer that f 1 is a polynomial in k whose constant term, i.e. term of degree 0, is #q 1 (D ′ ). Similarly, the function f 2 : k → #q 2 (D ′ + kD ⊤ S ) is a polynomial whose constant term is #q 2 (D ′ ). For the details please refer to the proof of Theorem 5.9 in [CM16].
If the counts of q 1 and q 2 agreed on all databases from D, then for all k ≥ 1 we would have that f 1 (k) = f 2 (k). Moreover, since f 1 , f 2 are polynomials, this would imply that f 1 and f 2 are equal, i.e. they have the same degree and their corresponding coefficients coincide. In particular, this would imply that #q 1 (D ′ ) = #q 2 (D ′ ). But this is impossible, as D ′ was chosen so that #q 1 (D ′ ) ≠ #q 2 (D ′ ). Therefore, there is k ′ ≥ 1 such that f 1 (k ′ ) ≠ f 2 (k ′ ).
What remains is to check that D = D ′ + k ′ D ⊤ S is as required. First, by definition D ∈ D. Moreover, since k ′ > 0, for every CQ q over schema S we have that #q(D) > 0. Finally, we observe that #q 1 (D) = f 1 (k ′ ) ̸ = f 2 (k ′ ) = #q 2 (D).
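The polynomial behaviour of k ↦ #q(D ′ + kD ⊤ S ) used in the proof above can be observed on a small example. The sketch assumes that D ⊤ S is the database with a single constant that contains every possible fact over S; this reading, the encoding, and the names `answers`, `top_db`, and `add_top_copies` are assumptions made for illustration.

```python
from itertools import product


def answers(atoms, answer_vars, db):
    """Set of answer tuples of an equality-free CQ on db (brute force)."""
    variables = sorted({v for _, args in atoms for v in args} | set(answer_vars))
    adom = list({c for _, args in db for c in args})
    result = set()
    for values in product(adom, repeat=len(variables)):
        h = dict(zip(variables, values))
        if all((rel, tuple(h[v] for v in args)) in db for rel, args in atoms):
            result.add(tuple(h[x] for x in answer_vars))
    return result


def top_db(schema):
    """Assumed form of D_top: one constant carrying all facts over the schema."""
    return {(rel, ("top",) * arity) for rel, arity in schema.items()}


def add_top_copies(db, schema, k):
    """Disjoint union of db with k fresh copies of top_db(schema)."""
    result = set(db)
    for t in range(k):
        result |= {(rel, tuple((c, t) for c in args)) for rel, args in top_db(schema)}
    return result


schema = {"A": 1, "E": 2}
D0 = {("A", ("a",)), ("A", ("b",)), ("E", ("a", "b"))}
q, x_bar = [("A", ("x1",)), ("A", ("x2",))], ("x1", "x2")   # two components
f = [len(answers(q, x_bar, add_top_copies(D0, schema, k))) for k in range(4)]
```

For this query the counts follow the polynomial (2 + k)^2, whose constant term 4 is the count on the original database, in line with the argument of the proof.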
From Lemma B.3 and Lemma B.4 we obtain the following.
Lemma B.5. Let q 1 , q 2 be CQs over some schema S. The following holds:
• q 1 and q 2 are counting equivalent if and only if they are renaming equivalent;
• q 1 and q 2 are semi-counting equivalent if and only if q̂ 1 and q̂ 2 are renaming equivalent.
Proof. For the first bullet we argue as follows. If q 1 and q 2 are not renaming equivalent then they are not counting equivalent. Indeed, by Lemma B.3 there is a database D such that #q 1 (D) ≠ #q 2 (D). On the other hand, if q 1 and q 2 are renaming equivalent then the surjections promised by the definition of renaming equivalence provide for every S-database D surjections g D 1 : q 1 (D) → q 2 (D) and g D 2 : q 2 (D) → q 1 (D). Clearly, these are then even bijections. This shows that #q 1 (D) = #q 2 (D) for every S-database D and, thus, shows that q 1 and q 2 are counting equivalent.
For the second bullet we observe that if q̂ 1 and q̂ 2 are not renaming equivalent then by Lemma B.4 there is a database D such that #q 1 (D) > 0, #q 2 (D) > 0, and #q 1 (D) ≠ #q 2 (D). Thus, q 1 and q 2 are not semi-counting equivalent. On the other hand, if q̂ 1 and q̂ 2 are renaming equivalent then they are counting equivalent and for every database D such that #q 1 (D) > 0 and #q 2 (D) > 0 we have that #q 1 (D) = #q̂ 1 (D) = #q̂ 2 (D) = #q 2 (D). Thus, q 1 and q 2 are semi-counting equivalent. The middle equality follows from counting equivalence, the others follow from the fact that maximal Boolean connected components either force the answer set to be empty or do not change the size of the answer set.
As a consequence, we get the following.
Lemma B.6 (equivalence relations). Counting equivalence and semi-counting equivalence are equivalence relations.
We can finally prove the first missing lemma.
Proof of Lemma 6.2. For Point 1, if q 1 and q 2 are counting equivalent, then they are counting equivalent over class D. On the other hand, if they are not counting equivalent then, by Lemma B.5, they are not renaming equivalent. Thus, by Lemma B.3 there is a database D ∈ D such that #q 1 (D) ≠ #q 2 (D). This implies that q 1 and q 2 are not counting equivalent over D. Therefore, q 1 and q 2 are counting equivalent if and only if they are counting equivalent over D.
For Point 2, if q 1 and q 2 are semi-counting equivalent, then they are clearly semi-counting equivalent over class D. On the other hand, if they are not semi-counting equivalent, then by Lemma B.5, q̂ 1 and q̂ 2 are not renaming equivalent. Thus, by Lemma B.4 there is a database D ∈ D such that #q 1 (D) > 0, #q 2 (D) > 0, and #q 1 (D) ≠ #q 2 (D). Hence, q 1 and q 2 are not semi-counting equivalent over D. Therefore, q 1 and q 2 are semi-counting equivalent if and only if they are semi-counting equivalent over D.

Appendix C. Additional Details for Section 7.3
This section is dedicated to the results from [CM16] that culminate in the algorithm promised in the statement below.
Theorem 7.4 [CM16]. Let D be a class of databases over some schema S such that D is closed under disjoint union, direct product, and contains D ⊤ S . Then there is an algorithm that
(1) takes as input a UCQ q, a CQ p ∈ cl D CM (q), and a database D ∈ D, subject to the promise that for all p ′ ∈ cl D CM (q), there is an equality-free CQ p ′′ such that D p ′′ ∈ D, D p̂ ′′ ∈ D, and p ′ and p ′′ are counting equivalent over D,
(2) has access to an oracle for AnswerCount({q}, D), to a procedure for enumerating D, and to procedures for deciding counting equivalence and semi-counting equivalence between CQs over D,
(3) runs in time f (||q||) · p(||D||) with f a computable function and p a polynomial,
(4) outputs #p(D).
As mentioned before, the statements are provided in our notation and are enough for our purposes. For the original statements please refer to [CM16].
We start by proving a stronger version of Lemma B.4. We show that given a set of pairwise not semi-counting equivalent CQs, we can always find a database that distinguishes those queries.
Lemma C.1 (inequivalence witness; Lemma 5.12 from [CM16]). Let q 1 (x̄ 1 ), q 2 (x̄ 2 ), . . . , q n (x̄ n ), n > 0, be equality-free CQs over schema S such that for all 1 ≤ i ≤ n we have that |x̄ i | > 0. Let D be a class of databases, such that
• D ⊤ S ∈ D,
• D q i , D q̂ i ∈ D, for 1 ≤ i ≤ n,
• and D is closed under direct product, disjoint union, and cloning.
Then there is, and can be computed, a database D ∈ D such that
• for all 1 ≤ i ≤ n, #q i (D) > 0,
• and for all 1 ≤ i, j ≤ n, if q i , q j are not semi-counting equivalent then #q i (D) ≠ #q j (D).
Proof. We construct the database inductively. By requirement, the constructed database D will satisfy that #q(D) > 0 for every CQ q(x) over schema S. Therefore, for every pair of semi-counting equivalent CQs q, q ′ we will necessarily have that #q(D) = #q ′ (D). This allows us to assume without loss of generality that the CQs q i used in the construction are pairwise not semi-counting equivalent.
For the base case, i.e. n=1, take database D ⊤ S . Now, let us assume that n > 0 and we have already created the database D n for the queries q 1 , . . . , q n . For the inductive step, we show how to construct database D n+1 for the queries q 1 , . . . , q n , q n+1 .
Without loss of generality, we can assume that $0 < \#q_1(D_n) < \#q_2(D_n) < \cdots < \#q_n(D_n)$. If $\#q_{n+1}(D_n) \neq \#q_i(D_n)$ for all $1 \le i \le n$, then we are done and take $D_{n+1} = D_n$. Otherwise, there is an $i$ with $1 \le i \le n$ such that $\#q_{n+1}(D_n) = \#q_i(D_n)$.
By Lemma B.4, there is a database $D' \in \mathbf{D}$ such that $\#q_{n+1}(D') \neq \#q_i(D')$ and $\#q(D') > 0$ for every CQ $q$ over schema $\mathbf{S}$. We can assume that $\#q_{n+1}(D') > \#q_i(D')$; if the inequality is reversed, we simply swap $q_{n+1}$ with $q_i$ before proceeding. Now, we can show that there is an $l > 0$ such that for the database $D = D' \times (D_n)^l$, where $(D_n)^l$ is the direct product of $l$ copies of $D_n$, we have $0 < \#q_1(D) < \cdots < \#q_i(D) < \#q_{n+1}(D) < \#q_{i+1}(D) < \cdots < \#q_n(D)$.
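The existence of a suitable $l$ can be checked numerically via the product rule $\#q(D' \times (D_n)^l) = \#q(D') \cdot \#q(D_n)^l$: the strict order of the counts on $D_n$ eventually dominates the fixed positive factors $\#q(D')$, while the tie between $q_i$ and $q_{n+1}$ is broken by $D'$. The following is a minimal sketch in Python, not part of [CM16]; the function name `smallest_exponent`, the dictionary-based inputs, and the search bound `limit` are assumptions made for illustration.

```python
def smallest_exponent(counts_n, counts_prime, order, limit=64):
    """Search for the smallest l >= 1 that realises the target order.

    counts_n[q]     -- the count #q(D_n), assumed positive
    counts_prime[q] -- the count #q(D'), assumed positive
    order           -- query names listed in the desired strictly increasing order
    limit           -- hypothetical search bound, chosen only for this sketch

    By the product rule, the count of q on D' x (D_n)^l equals
    counts_prime[q] * counts_n[q] ** l.
    """
    for l in range(1, limit + 1):
        values = [counts_prime[q] * counts_n[q] ** l for q in order]
        if all(x < y for x, y in zip(values, values[1:])):
            return l
    return None  # no exponent found within the search bound


# Toy instance: q3 plays the role of q_{n+1} and ties with q1 on D_n.
COUNTS_N = {"q1": 2, "q2": 5, "q3": 2}
COUNTS_PRIME = {"q1": 3, "q2": 1, "q3": 4}
TARGET_ORDER = ["q1", "q3", "q2"]
```

In the toy instance, $l = 1$ fails because the small count of `q2` on $D'$ still outweighs its larger count on $D_n$, whereas a larger exponent restores the order.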
Clearly, $\#q(D) > 0$ for every CQ $q(\bar{x})$ over schema $\mathbf{S}$. Moreover, since $D$ is a product of databases from $\mathbf{D}$, we have $D \in \mathbf{D}$. Thus, taking $D_{n+1} = D$ completes the inductive step and the whole construction.
The witness produced by Lemma C.1 will not distinguish two CQs in $\mathrm{cl}^{\mathbf{D}}_{\mathrm{CM}}(q)$ that are semi-counting equivalent but not counting equivalent. The following lemma shows that this is not necessarily a problem.
Lemma C.2 (Lemma 5.18 in [CM16]: extracting counts from semi-counting equivalence classes). Let $p_1, \dots, p_n$ be semi-counting equivalent equality-free CQs that are pairwise not counting equivalent and let $c_1, \dots, c_n$ be non-zero integers. Let $\mathbf{D}$ be a class of databases such that
• $D_{p_i} \in \mathbf{D}$ for $1 \le i \le n$, and
• $\mathbf{D}$ is closed under direct product.
There is an fpt algorithm that, given a database $D \in \mathbf{D}$ and a CQ $q \in \{p_1, \dots, p_n\}$, computes $\#q(D)$; the algorithm may make calls to an oracle $A$ that returns $\sum_{i=1}^{n} c_i \cdot \#p_i(D')$ upon being given a database $D' \in \mathbf{D}$.
Proof. Let $S = \{p_1, \dots, p_n\}$. We start with the observation that for all non-empty subsets $S' \subseteq S$ there are, and can be computed, a CQ $q_{S'} \in S'$ and a database $D_{S'}$ such that
$(\diamond)$ $\#q_{S'}(D_{S'}) > 0$ and $\#q(D_{S'}) = 0$ for all $q \in S' \setminus \{q_{S'}\}$.
Indeed, let $q_1(\bar{x}_1), q_2(\bar{x}_2) \in S'$ be two different CQs. Since $q_1, q_2$ are semi-counting equivalent, the CQs $\hat{q}_1, \hat{q}_2$ are renaming equivalent by Lemma B.5. Hence, there are two surjections $h_1 : \bar{x}_1 \to \bar{x}_2$ and $h_2 : \bar{x}_2 \to \bar{x}_1$ that can be extended to homomorphisms $h_1 : \hat{q}_1 \to D_{\hat{q}_2}$ and $h_2 : \hat{q}_2 \to D_{\hat{q}_1}$. Therefore, if there were homomorphisms $g_1 : q_1 \to D_{q_2}$ and $g_2 : q_2 \to D_{q_1}$, then we could extend the surjections $h_1$ and $h_2$ to homomorphisms $h_1 : q_1 \to D_{q_2}$ and $h_2 : q_2 \to D_{q_1}$, respectively. This would imply that $q_1$ and $q_2$ are renaming equivalent and, by Lemma B.5, counting equivalent. Since they are not counting equivalent, one of the homomorphisms $g_1$ or $g_2$ does not exist.
Let $q_{S'}$ be a minimal element in $S'$ with respect to the partial order defined by $q \le q'$ if there is a homomorphism from $q$ to $D_{q'}$, so that no other CQ in $S'$ has a homomorphism to $D_{q_{S'}}$. Let $D_{S'} = D_{q_{S'}}$. It is easy to check that the pair $(q_{S'}, D_{S'})$ satisfies the requirements in $(\diamond)$. Let get-min$(S')$ be the algorithm that, given a set $S' \subseteq \{p_1, \dots, p_n\}$, returns the pair $(q_{S'}, D_{S'})$.
Finally, we can describe the desired algorithm. For $T \subseteq \{p_1, \dots, p_n\}$, let $A_T$ be an oracle that takes a database $D \in \mathbf{D}$ and returns the value $\sum_{p_i \in T} c_i \cdot \#p_i(D)$. The algorithm promised by the lemma, which we call compute-count$(T, q, A_T, D)$, takes a set of CQs $T$, a CQ $q \in T$, an oracle $A_T(\cdot)$, and a database $D \in \mathbf{D}$, and outputs the value $\#q(D)$. The algorithm works as follows.
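First, the algorithm finds the pair $(p_i, D_i) = \text{get-min}(T)$. If $p_i = q$, then it returns
$$\frac{A_T(D \times D_i)}{c_i \cdot \#p_i(D_i)}.$$
Otherwise, it returns compute-count$(T', q, A_{T'}, D)$, where $T' = T \setminus \{p_i\}$ and $A_{T'}$ is the oracle that, given a database $D' \in \mathbf{D}$, returns
$$A_T(D') - c_i \cdot \text{compute-count}(T, p_i, A_T, D') = \sum_{p_j \in T'} c_j \cdot \#p_j(D').$$
The algorithm clearly belongs to FPT. To infer that it returns the desired value, we observe that
$$\frac{A_T(D \times D_i)}{c_i \cdot \#p_i(D_i)} = \frac{\sum_{p_j \in T} c_j \cdot \#p_j(D \times D_i)}{c_i \cdot \#p_i(D_i)} = \frac{\sum_{p_j \in T} c_j \cdot \#p_j(D) \cdot \#p_j(D_i)}{c_i \cdot \#p_i(D_i)} = \frac{c_i \cdot \#p_i(D) \cdot \#p_i(D_i)}{c_i \cdot \#p_i(D_i)} = \#p_i(D).$$
The first equality follows from the definition of $A_T$, the second is the product rule, and the third follows from the fact that, by construction of $D_i$, for all $p_j \in T$ we have $\#p_j(D_i) = 0$ if and only if $p_j \neq p_i$. The last equality is trivial.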
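The recursion can be illustrated on a toy instance. The following is a minimal sketch in Python under stated assumptions, not the implementation from [CM16]: CQs are encoded as pairs (answer variables, atoms), databases as sets of facts, answer counting is brute force, and the oracle is simulated by direct counting. All names (`count_answers`, `direct_product`, `get_min`, `compute_count`, `toy_oracle`) and the example queries are hypothetical. Here `get_min` selects a CQ whose canonical database admits no homomorphism from any other CQ in the set, which is the property required by $(\diamond)$.

```python
from itertools import product as tuples


def count_answers(cq, db):
    """Brute-force #q(D): number of distinct answer tuples of cq on db."""
    head, atoms = cq
    dom = {c for _, args in db for c in args}
    vs = sorted({v for _, args in atoms for v in args} | set(head))
    answers = set()
    for values in tuples(dom, repeat=len(vs)):
        h = dict(zip(vs, values))
        if all((rel, tuple(h[v] for v in args)) in db for rel, args in atoms):
            answers.add(tuple(h[v] for v in head))
    return len(answers)


def canonical_db(cq):
    """Canonical database D_q: the atoms of q with variables read as constants."""
    return frozenset(cq[1])


def direct_product(d1, d2):
    """Direct product: pair up facts over the same relation, componentwise."""
    return frozenset((r, tuple(zip(a, b)))
                     for r, a in d1 for s, b in d2 if r == s)


def get_min(T):
    """Return (p, D_p) such that every other CQ in T has count 0 on D_p."""
    for p in T:
        d_p = canonical_db(p)
        if all(count_answers(r, d_p) == 0 for r in T if r != p):
            return p, d_p
    raise ValueError("no suitable CQ; the lemma's preconditions are violated")


def compute_count(T, coeff, q, oracle, db):
    """Recover #q(db) from an oracle for the sum of coeff[p] * #p(.) over p in T."""
    p, d_p = get_min(T)
    if p == q:
        total = oracle(direct_product(db, d_p))
        value, rest = divmod(total, coeff[p] * count_answers(p, d_p))
        assert rest == 0, "oracle value is not divisible as expected"
        return value
    remaining = [r for r in T if r != p]

    def reduced_oracle(d):
        # Subtract the contribution of p, which is recovered recursively.
        return oracle(d) - coeff[p] * compute_count(T, coeff, p, oracle, d)

    return compute_count(remaining, coeff, q, reduced_oracle, db)


# Toy instance: p1(x) = A(x) and p2(x) = A(x) AND EXISTS z. B(z) agree
# whenever both counts are non-zero, but differ on databases without B-facts.
P1 = (("x",), (("A", ("x",)),))
P2 = (("x",), (("A", ("x",)), ("B", ("z",))))
COEFF = {P1: 2, P2: -3}


def toy_oracle(d):
    """Stand-in for the oracle A_T, simulated by direct counting."""
    return sum(c * count_answers(p, d) for p, c in COEFF.items())
```

The sketch only ever queries the oracle on products of the input database with canonical databases of the CQs, mirroring the closure assumptions of the lemma; the brute-force counting on $D_i$ is harmless because the size of $D_i$ depends on the queries alone.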