Integrity Constraints Revisited: From Exact to Approximate Implication

Integrity constraints such as functional dependencies (FD) and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been investigated in both the database and the AI literature, under the assumption that all constraints hold exactly. However, many applications today consider constraints that hold only approximately. In this paper we define an approximate implication as a linear inequality between the degree of satisfaction of the antecedents and consequent, and we study the relaxation problem: when does an exact implication relax to an approximate implication? We use information theory to define the degree of satisfaction, and prove several results. First, we show that any implication from a set of data dependencies (MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most quadratic in the number of variables; when the consequent is an FD, the factor can be reduced to 1. Second, we prove that there exists an implication between CIs that does not admit any relaxation; however, we prove that every implication between CIs relaxes "in the limit". Then, we show that the implication problem for differential constraints in market basket analysis also admits a relaxation with a factor equal to 1. Finally, we show how some of the results in the paper can be derived using the I-measure theory, which relates information-theoretic measures to set theory. Our results recover, and sometimes extend, previously known results about the implication problem: the implication of MVDs and FDs can be checked by considering only 2-tuple relations.


Introduction
Traditionally, integrity constraints are assertions about a database that are stated by the database administrator and enforced by the system during updates. However, in several applications of Big Data, integrity constraints are discovered, or mined, in a database instance, as opposed to being asserted by the administrator [GR04, SBHR06, CIPY14, BBF + 16, KN18]. For example, data cleaning can be done by first learning conditional functional dependencies in some reference data, then using them to identify inconsistencies in the test data [IC15, CIPY14]. Causal reasoning [SGS00, PE08, SGS18] and learning sum-of-product networks [PD11, FD16, MVM + 18] repeatedly discover conditional independencies in the data. In all these settings the discovered constraints hold only approximately. For example, suppose we prove that a given set of FDs implies another FD, but the input data satisfies the antecedent FDs only to some degree: to what degree does the consequent FD hold in the database? The relaxation problem asks whether we can convert an exact implication into an approximate implication. In other words, we ask how the error in the antecedents propagates to the error in the consequent. When relaxation holds, the error of the consequent can be bounded, and any inference system for proving exact implication, e.g. using a set of axioms or some algorithm, can be used to infer an approximate implication.
In order to study the relaxation problem we need to measure the degree of satisfaction of a constraint. In this paper we use Information Theory. This is the natural semantics for modeling CIs of multivariate distributions, because X ⊥ Y | Z iff I(X; Y |Z) = 0 where I is the conditional mutual information. FDs and MVDs are special cases of CIs [Lee87, DR00, WBW00] (reviewed in Section 2.1), and thus they are naturally modeled using the information theoretic measure I(X; Y |Z) or H(Y |X); in contrast, EMVDs do not appear to have a natural interpretation using information theory, and we will not discuss them further in this work. An approximate implication (ApxI) is an inequality that (numerically) bounds the information-theoretic measure of the consequent (e.g., conditional entropy H(·|·) if it is an FD, or the conditional mutual information I(·; ·|·) if it is an MVD) by a linear combination of the information-theoretic measures of the antecedents. The link between integrity constraints and information theoretic measures is, in itself, not new. Several papers have argued that information theory is a suitable tool to express integrity constraints [Lee87, DR00, WBW00, Mal86,GR04].
An exact implication (EI) becomes an assertion of the form (σ_1 = 0 ∧ σ_2 = 0 ∧ . . .) ⇒ (τ = 0), while an approximate implication (ApxI) is a linear inequality τ ≤ λ · ∑_i σ_i, where λ ≥ 0, and τ, σ_1, σ_2, . . . are information theoretic measures. We say that a class of constraints can be relaxed if every EI whose antecedents are from this class implies the corresponding ApxI; we also say that such an EI admits a λ-relaxation, when we want to specify the factor λ in the ApxI. By the non-negativity of the Shannon information measures (described in Section 2), an ApxI always implies the corresponding EI.
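To make the EI/ApxI distinction concrete, here is a small numerical illustration of our own (not from the paper): the chain rule I(X; YZ) = I(X; Y) + I(X; Z|Y) is itself a 1-relaxation of the exact implication {X ⊥ Y, X ⊥ Z|Y} ⇒ X ⊥ YZ. The helpers `H` and `I` below are our own names, computing entropies in bits from an explicit probability mass function:

```python
import itertools
from math import log2

def H(p, vars_):
    """Entropy (bits) of the marginal of p on the given coordinates."""
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in vars_)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

def I(p, X, Y, Z=()):
    """Conditional mutual information I(X; Y | Z) = H(XZ) + H(YZ) - H(XYZ) - H(Z)."""
    return H(p, X + Z) + H(p, Y + Z) - H(p, X + Y + Z) - H(p, Z)

# a slightly perturbed uniform distribution over three binary variables,
# so the antecedents hold only approximately (numbers are arbitrary)
p = {t: 1 / 8 for t in itertools.product([0, 1], repeat=3)}
p[(0, 0, 0)] += 0.05
p[(1, 1, 1)] -= 0.05

s1 = I(p, (0,), (1,))          # antecedent sigma_1 = I(X0; X1) > 0
s2 = I(p, (0,), (2,), (1,))    # antecedent sigma_2 = I(X0; X2 | X1) > 0
t = I(p, (0,), (1, 2))         # consequent  tau    = I(X0; X1X2)
# the chain rule gives t = s1 + s2, hence the 1-relaxation t <= 1 * (s1 + s2)
assert abs(t - (s1 + s2)) < 1e-12
```

The error of the antecedents bounds the error of the consequent with factor λ = 1 here; in general the factor depends on the class of constraints, which is the subject of this paper.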
Results. We make several contributions, summarized in Table 1. We start by showing in Section 4 that MVDs+FDs admit an n^2/4-relaxation, where n is the number of variables. When the consequent is an FD, we show that the implication admits a 1-relaxation. Thus, whenever an exact implication holds between MVDs+FDs, a simple linear inequality also holds between their associated information theoretic terms. In fact, we prove a stronger result that holds for CIs in general, which implies the result for MVDs+FDs. In Section 8, we further show that under some additional syntactic restrictions on the antecedents, the bound can be tightened: from an n^2/4-relaxation to a 1-relaxation, even when the consequent is not an FD. We leave open the question of whether a 1-relaxation exists in general.
So far, we have restricted ourselves to saturated or conditional CIs (which correspond to MVDs or FDs). In Section 5 we remove all restrictions, and prove a negative result: there exists an EI that does not relax (Eq. (5.4), based on an example in [KR13]). Nevertheless, we show that every EI can be relaxed to its corresponding ApxI plus an error term, which can be made arbitrarily small, at the cost of increasing the factor λ. This result implies that every EI is a consequence of its corresponding inequality ApxI, plus an error term. In fact, the EI in Eq. (5.4) follows from an inequality by Matúš [Mat07], which is precisely the associated ApxI plus an error term; our result shows that every EI can be proven in this style.
In Section 6 we consider approximate (and exact) implications that can be proved using Shannon's inequalities (monotonicity and submodularity, reviewed in Section 2.2). In general, Shannon's inequalities are sound but incomplete for proving exact and approximate implications that hold for all probability distributions [ZY97, ZY98], but they are complete for deriving inequalities that hold for all polymatroids (defined in Section 2.2) [Yeu08]. We also prove that every exact implication that holds for all polymatroids relaxes to an approximate implication with a finite upper bound λ ≤ (2^n)!, and a lower bound λ ≥ 3; the tightness of these bounds remains open.
In Section 7 we show that the Shannon inequalities are sound and complete for implication from measure-based constraints [SGG08] that arise in market-basket analysis. More generally, in Section 7 we restrict the class of models used to check an implication to probability distributions with exactly 2 outcomes (tuples), each with probability 1/2; we justify this shortly. We prove that, under this restriction, the implication problem has a 1-relaxation. Restricting the models leads to a complete but unsound method for checking general implication; however, this method is sound for saturated+conditional CIs (as we show in Section 4), and is also sound for deriving implications from Frequent Itemset constraints (as we show in Section 7).
In Section 8 we extend some of the results in this paper and provide alternative proofs using the I-measure theory, which relates Shannon's information measures to set theory. Specifically, we extend the result of Section 4, and provide an alternative proof of the result of Section 7.
Two Consequences. While our paper is focused on relaxation, our results have two consequences for the exact implication problem. The first is a 2-tuple model property: an exact implication, where the antecedents are saturated or conditional CIs, holds iff it holds on all uniform probability distributions with 2 tuples. A similar result is known for MVDs+FDs [SDPF81]. Geiger and Pearl [GP93], building on an earlier result by Fagin [Fag82], prove that every set of CIs has an Armstrong model: a discrete probability distribution that satisfies only the CIs and their consequences, and no other CI. The Armstrong model is also called a global witness, and, in general, can be arbitrarily large. Our result concerns a local witness: for any EI, if it fails on some probability distribution, then it fails on a 2-tuple uniform distribution.
The second consequence concerns the equivalence between the implication problem of saturated+conditional CIs with that of MVD+FDs. It is easy to check that the former implies the latter (Section 2). Wong et al. [WBW00] prove the other direction (i.e., if an implication from MVDs+FDs holds then the same implication holds for saturated+conditional CIs), relying on the sound and complete axiomatization of MVDs [BFH77]. Our 2-tuple model property implies the other direction almost immediately, leading to a much simpler proof in Section 4.
This article extends the conference publication by the authors [KS20]. We have added in this article all the proofs and intermediate results that were excluded from the conference paper, as well as examples illustrating the ideas and methods introduced.

We say that the instance R satisfies the functional dependency (FD) X → Y, and write R |= X → Y, if every two tuples of R that agree on X also agree on Y. When XYZ = Ω, we call X ↠ Y | Z a multivalued dependency, MVD; notice that X, Y, Z are not necessarily disjoint [BFH77].
A set of constraints Σ implies a constraint τ, in notation Σ ⇒ τ, if for every instance R, if R |= Σ then R |= τ. The implication problem has been extensively studied in the literature; Beeri et al. [BFH77] gave a complete axiomatization of FDs and MVDs along with a polynomial time procedure for deciding implication. Herrmann [Her06] showed that the implication problem for EMVDs is undecidable. In the following we refer to this type of implication as Exact Implication, abbreviated EI.
Recall that two discrete random variables X, Y are called independent if p(X = x, Y = y) = p(X = x) · p(Y = y) for all outcomes x, y. We say that X, Y are conditionally independent given another discrete random variable Z if p(X = x, Y = y|Z = z) = p(X = x|Z = z) · p(Y = y|Z = z) for all outcomes x, y, and z. Fix Ω = {X 1 , . . . , X n } a set of n jointly distributed discrete random variables with finite domains D 1 , . . . , D n , respectively; let p : D 1 × · · · × D n → [0, 1] be the probability mass.
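As an illustration of these definitions (our own sketch, not part of the paper), conditional independence of discrete random variables can be checked directly from the probability mass function; `conditionally_independent` is a hypothetical helper name:

```python
import itertools

def conditionally_independent(p, X, Y, Z, tol=1e-12):
    """Check p(X=x, Y=y | Z=z) == p(X=x | Z=z) * p(Y=y | Z=z) for all outcomes.
    p maps tuples of outcomes to probabilities; X, Y, Z are tuples of coordinate indices."""
    def marg(vars_):
        m = {}
        for outcome, prob in p.items():
            key = tuple(outcome[i] for i in vars_)
            m[key] = m.get(key, 0.0) + prob
        return m
    pxyz, pxz, pyz, pz = marg(X + Y + Z), marg(X + Z), marg(Y + Z), marg(Z)
    for outcome in p:
        x = tuple(outcome[i] for i in X)
        y = tuple(outcome[i] for i in Y)
        z = tuple(outcome[i] for i in Z)
        if pz[z] == 0:
            continue
        lhs = pxyz[x + y + z] / pz[z]
        rhs = (pxz[x + z] / pz[z]) * (pyz[y + z] / pz[z])
        if abs(lhs - rhs) > tol:
            return False
    return True

# X0 and X1 are independent fair coins, and X2 = X0 XOR X1:
p = {(x, y, x ^ y): 0.25 for x, y in itertools.product([0, 1], repeat=2)}
assert conditionally_independent(p, (0,), (1,), ())       # X0 and X1 are independent
assert not conditionally_independent(p, (0,), (1,), (2,)) # but not given X2
```

The XOR example also previews the parity function of Example 2.7 below.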
An assertion Y ⊥ Z|X is called a Conditional Independence statement, or a CI; this includes X → Y as a special case. When XY Z = Ω we call it saturated. When XY ⊆ Ω and Z = ∅, we call it marginal. A set of CIs Σ implies a CI τ , in notation Σ ⇒ τ , if every probability distribution that satisfies Σ also satisfies τ . This implication problem has also been extensively studied: Pearl and Paz [PP86] gave a sound but incomplete set of graphoid axioms, Studeny [Stu90] proved that no finite axiomatization exists, while Geiger and Pearl [GP93] gave a complete axiomatization for saturated, and marginal CIs.
Lee [Lee87] observed the following connection between database constraints and CIs. The empirical distribution of a relation R is the uniform distribution over its tuples, in other words, ∀t ∈ R, p(t) = 1/|R|. Then: Lemma 2.2 [Lee87]. Let h be the entropy of the empirical distribution of R. For all X, Y, Z ⊆ Ω such that XYZ = Ω: R |= X ↠ Y | Z iff I_h(Y; Z|X) = 0, and R |= X → Y iff h(Y|X) = 0. As can be seen in the example of Table 2, the lemma no longer holds for EMVDs, and for that reason we no longer consider EMVDs in this paper. The lemma immediately implies that if Σ is a set of saturated+conditional CIs and the implication Σ ⇒ τ holds for all probability distributions, then the corresponding implication holds in databases, where the CIs are interpreted as MVDs or FDs respectively. Wong et al. [WBW00] gave a non-trivial proof of the other direction; we will give a much shorter proof in Corollary 4.3. Table 2. The relation R[X_1, X_2, X_3] satisfies the EMVD ∅ ↠ X_1 | X_2, yet for the empirical distribution, I_h(X_1; X_2) ≠ 0 because X_1, X_2 are dependent: p(X_1 = a) = 2/5 ≠ p(X_1 = a | X_2 = c) = 1/2.

2.2. Background on Information Theory. We adopt required notation from the literature on information theory [Yeu08, Cha11]. For n > 0, we identify vectors in R^{2^n} with functions 2^{[n]} → R.

Polymatroids.
A function h ∈ (R^+)^{2^n} is called a polymatroid if h(∅) = 0 and it satisfies the following inequalities, called Shannon inequalities: h(A) ≤ h(B) for all A ⊆ B (monotonicity), and h(A) + h(B) ≥ h(A ∪ B) + h(A ∩ B) (submodularity). (2.2) The set of polymatroids is denoted Γ_n ⊆ (R^+)^{2^n}, and forms a polyhedral cone (reviewed in Section 5). For any polymatroid h and subsets A, B, C ⊆ [n], we define h(B|A) def= h(AB) − h(A) and I_h(B; C|A) def= h(AB) + h(AC) − h(ABC) − h(A). Then, for all h ∈ Γ_n, we have I_h(B; C|A) ≥ 0 and h(B|A) ≥ 0. The chain rules for entropies and mutual information, respectively, are the identities h(BC|A) = h(B|A) + h(C|AB) and I_h(B; CD|A) = I_h(B; C|A) + I_h(B; D|AC); when the rule being applied is clear from the context, we will only say that we apply the chain rule. Entropic Functions. If X is a random variable with a finite domain D and probability mass p, then H(X) denotes its entropy, H(X) = −∑_{x∈D} p(x) log p(x). For a set of jointly distributed random variables Ω = {X_1, . . . , X_n} we define the function h : 2^{[n]} → R^+ as h(α) def= H(X_α) (see Notation 2.1); h is called an entropic function, or, with some abuse, an entropy. The set of entropic functions is denoted Γ*_n. The quantities h(B|A) and I_h(B; C|A) are called the conditional entropy and conditional mutual information respectively. The conditional independence p |= B ⊥ C | A holds iff I_h(B; C|A) = 0, and similarly p |= A → B iff h(B|A) = 0; thus, entropy provides an alternative characterization of CIs.
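The definitions above can be made concrete with a small script of our own (helper names are ours): compute the entropic function h of a joint distribution and verify that it satisfies the Shannon inequalities, i.e., that every entropic function is a polymatroid.

```python
import itertools
from math import log2

def entropic_function(p, n):
    """Return h as a dict frozenset(alpha) -> H(X_alpha), for the joint
    distribution p over n variables (p maps n-tuples of outcomes to probabilities)."""
    h = {}
    for r in range(n + 1):
        for alpha in itertools.combinations(range(n), r):
            marg = {}
            for outcome, prob in p.items():
                key = tuple(outcome[i] for i in alpha)
                marg[key] = marg.get(key, 0.0) + prob
            h[frozenset(alpha)] = -sum(q * log2(q) for q in marg.values() if q > 0)
    return h

def is_polymatroid(h, n, tol=1e-9):
    """Check h(emptyset) = 0, monotonicity, and submodularity (the Shannon inequalities)."""
    sets = [frozenset(a) for r in range(n + 1)
            for a in itertools.combinations(range(n), r)]
    if abs(h[frozenset()]) > tol:
        return False
    for A, B in itertools.product(sets, repeat=2):
        if A <= B and h[A] > h[B] + tol:                 # monotonicity
            return False
        if h[A] + h[B] + tol < h[A | B] + h[A & B]:      # submodularity
            return False
    return True

# try it on an arbitrary 3-variable distribution (numbers are ours)
p = {(0, 0, 0): 0.3, (0, 1, 1): 0.2, (1, 0, 1): 0.4, (1, 1, 0): 0.1}
h = entropic_function(p, 3)
assert is_polymatroid(h, 3)
```

In this particular distribution X_2 = X_0 XOR X_1, so h({0,1,2}) = h({0,1}); the converse inclusion Γ*_n ⊊ Γ_n is strict for n ≥ 3, as discussed below.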
Notation 2.3. We summarize the various ways a conditional independence statement (CI) and related notions are represented in the paper, and the relationships between them. In what follows, we let p be an n-variable joint probability distribution over variable set Ω, and we let H p denote the entropy function corresponding to p (see (2.8)). Let X, Y, Z ⊆ Ω where X and Y are non-empty.
(1) The notation X ⊥ Y | Z means that for any values x, y, z in the domains of X, Y and Z respectively, it holds that p(X = x, Y = y | Z = z) = p(X = x | Z = z) · p(Y = y | Z = z). (2) The statement X ⊥ Y | Z is equivalent to saying that I_{H_p}(X; Y | Z) = 0, where I_{H_p} is defined in (2.5). Note, however, that the notation I_h(X; Y | Z) = 0 is more general, since we only assume that h is a polymatroid (not necessarily the entropy function associated with a probability distribution).
(3) For a polymatroid h (which may or may not be an entropic function), we denote by the triple σ def= (X; Y | Z) a CI statement that may hold either exactly (i.e., I_h(X; Y | Z) = 0), or approximately (i.e., I_h(X; Y | Z) ≤ ε for some threshold ε > 0); we then abbreviate I_h(X; Y | Z) as h(σ). (4) When XYZ = Ω, and p is a uniform distribution (i.e., for every r ∈ D_Ω either p(r) = 1/M for some M > 0 or p(r) = 0), then by Lemma 2.2, an MVD Z ↠ X | Y holds in p iff I_{H_p}(X; Y | Z) = 0.
Next, we prove two simple, technical lemmas concerning CIs (X; Y |Z) where the intersection between the variable-sets (e.g., X ∩ Y , X ∩ Z, or Y ∩ Z) may be non-empty. These lemmas will be used later on.
Lemma 2.4. Let h ∈ Γ_n be an n-variable polymatroid, and let X, Y and Z denote subsets of variables. Then for any CI (X; Therefore, by (2.5): Lemma 2.5. Let h ∈ Γ_n be an n-variable polymatroid, and let X, Y and Z denote subsets of variables.
Hence, we can write I h (X; Y |Z) as: By definition, X , Y , Z and B XY are pairwise disjoint, thus proving the claim.

2-Tuple Relations and Step Functions.
2-tuple relations play a key role for the implication problem of MVDs+FDs: if an implication fails, then there exists a witness consisting of only two tuples [SDPF81]. We define a step function as the entropy of the empirical distribution of a 2-tuple relation R = {t_1, t_2}, t_1 ≠ t_2, with p(t_1) = p(t_2) = 1/2. We denote the step function by h_U, where U ⊊ Ω is the set of attributes on which t_1, t_2 agree. One can check that h_U(W) = 0 when W ⊆ U, and h_U(W) = 1 otherwise (2.9): when W ⊄ U, the marginal distribution on W has two equally likely outcomes, and computing the entropy gives us h_U(W) = 1/2 log 2 + 1/2 log 2 = 1. If we set U = Ω in (2.9) then h_Ω ≡ 0. That is, h_Ω(W) = 0 for all W ⊆ Ω, and hence h_Ω is the function that always returns 0. Unless otherwise stated, in this paper we do not consider h_Ω to be a step function. Thus, there are 2^n − 1 step functions and their set is denoted S_n. We will use the following fact extensively in this paper: I_{h_U}(Y; Z | X) = 1 if X ⊆ U and Y, Z ⊄ U, and I_{h_U}(Y; Z | X) = 0 otherwise. We now present two simple polymatroids that correspond to the entropic functions of the distributions in Figure 1.
Example 2.6. Consider the relational instance in Fig. 1 (a). Its entropy is the step function h_{U_1 U_2}(W), which is 0 for W ⊆ U_1 U_2 and 1 otherwise.
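Both the closed form of h_U and the fact about I_{h_U} stated above can be verified mechanically. The following sketch (our own helper names) represents a step function as a dictionary indexed by subsets of [n]:

```python
import itertools

def step_function(U, n):
    """h_U(W) = 0 if W is a subset of U, else 1: the entropy (in bits) of the
    empirical distribution of a 2-tuple relation whose tuples agree exactly on U."""
    return {frozenset(W): (0 if set(W) <= set(U) else 1)
            for r in range(n + 1) for W in itertools.combinations(range(n), r)}

def I(h, A, B, C):
    """I_h(A; B | C) = h(AC) + h(BC) - h(ABC) - h(C), per definition (2.5)."""
    f = frozenset
    return h[f(A) | f(C)] + h[f(B) | f(C)] - h[f(A) | f(B) | f(C)] - h[f(C)]

n = 4
U = (0, 1)                       # the two tuples agree on attributes X0, X1
h = step_function(U, n)
# I_{h_U}(Y; Z | X) = 1 iff X is inside U and neither Y nor Z is inside U:
assert I(h, (2,), (3,), (0,)) == 1    # X={0} in U;  Y={2}, Z={3} outside U
assert I(h, (2,), (3,), (0, 2)) == 0  # X={0,2} not inside U
assert I(h, (1,), (3,), (0,)) == 0    # Y={1} inside U
```

Since all step-function values are 0 or 1, every conditional mutual information over a step function is 0 or 1 as well, which is what makes the brute-force checks in later sections possible.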
Example 2.7. The relational instance R = {(x, y, z) | x + y + z mod 2 = 0} in Fig. 1 (b) is called the parity function. Its entropy is h(α) = min(|α|, 2). To see this, observe that each variable is a uniform random bit, every two variables are independent, and any two of them determine the third.

Discussion. This paper studies exact and approximate implications, expressed as equalities or inequalities of entropic functions h. For example, the augmentation axiom for MVDs [BFH77], A ↠ B | CD ⇒ AC ↠ B | D, is expressed as I_h(B; CD|A) = 0 ⇒ I_h(B; D|AC) = 0, which holds by the chain rule (2.7). Thus, our golden standard is to prove that (in)equalities hold for all entropic functions h ∈ Γ*_n. Fix a set A ⊆ R^N in an N-dimensional Euclidean space with distance metric d. The topological closure of A is defined as cl(A) def= {x | ∀ε > 0, ∃y ∈ A, d(x, y) < ε}; equivalently, cl(A) consists of all limits of convergent sequences in A. The set A is called topologically closed if A = cl(A). The topological closure enjoys the following basic property: if f : R^N → R is any continuous function, and f(x) ≥ 0 for all x ∈ A, then f(x) ≥ 0 for all x ∈ cl(A). Yeung [Yeu08] has proven that, when n ≥ 3, Γ*_n is not topologically closed, in other words, Γ*_n ⊊ cl(Γ*_n). The elements of the set cl(Γ*_n) are called almost entropic functions. Equivalently, a function g is almost entropic if it is the limit of a sequence of entropic functions. It follows immediately from our discussion that, if an inequality holds for all entropic functions h ∈ Γ*_n, then, by continuity, it also holds for all almost entropic functions h ∈ cl(Γ*_n). However, this observation does not extend to implications of (in)equalities; Kaced and Romashchenko [KR13] gave an example of an exact implication that holds for all entropic functions but fails for almost entropic functions. Thus, when discussing an EI, it matters whether we assume that it holds for Γ*_n or for cl(Γ*_n).
The only results in this paper where this distinction matters are the two main theorems in Section 5: the negative result Theorem 5.1 holds for both Γ*_n and cl(Γ*_n), while the positive result Theorem 5.2 holds only for cl(Γ*_n). The results in Section 4 apply to any set of polymatroids K that contains all step functions, i.e. S_n ⊆ K ⊆ Γ_n; thus they apply to both Γ*_n and cl(Γ*_n), while those in Section 6 and Section 7 are stated only for Γ_n and only for (the conic closure of) S_n, respectively.
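As a quick sanity check of Example 2.7 (our own sketch, with our own helper `H`), the entropies of the empirical distribution of the parity relation can be computed directly, in bits:

```python
import itertools
from math import log2

def H(p, alpha):
    """Entropy (bits) of the marginal of p on the coordinates in alpha."""
    marg = {}
    for outcome, prob in p.items():
        key = tuple(outcome[i] for i in alpha)
        marg[key] = marg.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marg.values() if q > 0)

# empirical distribution of the parity relation R = {(x, y, z) : x + y + z mod 2 = 0}
R = [t for t in itertools.product([0, 1], repeat=3) if sum(t) % 2 == 0]
p = {t: 1 / len(R) for t in R}

# h(alpha) = min(|alpha|, 2): each variable is a fair coin,
# any two are independent, and any two determine the third
assert all(H(p, (i,)) == 1 for i in range(3))
assert all(H(p, pair) == 2 for pair in itertools.combinations(range(3), 2))
assert H(p, (0, 1, 2)) == 2
```

This entropic function is a polymatroid but is not a step function, which is why the parity relation is a useful second example.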

Definition of the Relaxation Problem
We now formally define the relaxation problem. We fix a set of variables Ω = {X_1, . . . , X_n}, and consider formulas of the form σ = (Y; Z | X), where X, Y, Z ⊆ Ω, which we call a conditional independence, CI; when Y = Z then we write it as X → Y and call it a conditional. An implication is a formula Σ ⇒ τ, where Σ is a set of CIs called antecedents and τ is a CI called the consequent. For a CI σ = (B; C|A), we define h(σ) def= I_h(B; C|A) (see (2.5)); for a set of CIs Σ, we define h(Σ) def= ∑_{σ∈Σ} h(σ). We recall that S_n is the set of step functions, and that Γ_n is the set of polymatroids. For K ⊆ Γ_n, we write K |= EI Σ ⇒ τ if every h ∈ K with h(Σ) = 0 also satisfies h(τ) = 0, and we write K |= ApxI Σ ⇒ τ (a λ-ApxI, when we want to specify the factor) if there exists λ ≥ 0 such that h(τ) ≤ λ · h(Σ) for all h ∈ K.
We will sometimes consider an equivalent definition for ApxI, namely ∑_{σ∈Σ} λ_σ h(σ) ≥ h(τ), where λ_σ ≥ 0 are coefficients, one for each σ ∈ Σ; the two definitions are equivalent, by taking λ = max_σ λ_σ. Notice that both EI and ApxI are preserved under subsets of K: if K' ⊆ K and K |= EI Σ ⇒ τ (respectively ApxI), then K' |= EI Σ ⇒ τ (respectively ApxI). Moreover, an ApxI always implies the corresponding EI, since h(τ) ≥ 0 for every CI τ and every polymatroid h. In this paper we study the reverse.
Definition 3.2. Let I be a syntactically-defined class of implication statements (Σ ⇒ τ ), and let K ⊆ Γ n . We say that I admits a relaxation in K if every implication statement (Σ ⇒ τ ) in I that holds exactly also holds approximately: K |= EI Σ ⇒ τ implies K |= ApxI Σ ⇒ τ . We say that I admits a λ-relaxation in K, if every EI admits a λ-ApxI.
If K 1 ⊆ K 2 and I admits a λ-relaxation in K 1 then, in general, it does not necessarily admit a λ-relaxation in K 2 , nor vice versa. However, the following more limited fact holds: Fact 3.3. If I admits a λ-relaxation in K, then it also admits a λ-relaxation in cl (K).
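When K = S_n, both EI and λ-ApxI are decidable by brute force, since S_n is finite. The following sketch (our own helper names, and a hypothetical Σ, τ chosen purely for illustration) enumerates all 2^n − 1 step functions:

```python
import itertools

def step_functions(n):
    """Yield all 2^n - 1 step functions h_U, for U a proper subset of [n]."""
    subsets = [frozenset(a) for r in range(n + 1)
               for a in itertools.combinations(range(n), r)]
    for U in subsets:
        if len(U) == n:
            continue  # h_Omega is identically 0 and is excluded by convention
        yield {W: (0 if W <= U else 1) for W in subsets}

def I(h, A, B, C):
    """I_h(A; B | C) over a step function h, per definition (2.5)."""
    f = frozenset
    return h[f(A) | f(C)] + h[f(B) | f(C)] - h[f(A) | f(B) | f(C)] - h[f(C)]

def exact_implication(Sigma, tau, n):
    """S_n |= EI: every step function zeroing all of Sigma also zeroes tau."""
    return all(I(h, *tau) == 0 for h in step_functions(n)
               if all(I(h, *s) == 0 for s in Sigma))

def lambda_relaxation(Sigma, tau, n, lam):
    """S_n |= ApxI with factor lam: h(tau) <= lam * h(Sigma) on every step function."""
    return all(I(h, *tau) <= lam * sum(I(h, *s) for s in Sigma)
               for h in step_functions(n))

# a hypothetical instance over n = 4 variables, with saturated antecedents:
# Sigma = {(X0; X1X2 | X3), (X0; X3 | X1X2)},  tau = (X0; X1 | X3)
Sigma = [((0,), (1, 2), (3,)), ((0,), (3,), (1, 2))]
tau = ((0,), (1,), (3,))
assert exact_implication(Sigma, tau, 4)
assert lambda_relaxation(Sigma, tau, 4, 1)
```

Of course, a λ-relaxation over S_n does not by itself give one over Γ_n; relating the two is exactly the content of Section 4.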

Relaxation for FDs and MVDs: Always Possible
In this section we consider the implication problem where the antecedents are either saturated CIs, or conditionals. This is a case of special interest in databases, because these constraints correspond to MVDs, or FDs. Recall that a CI (B; C|A) is saturated if ABC = Ω (i.e., the set of all attributes). Our main result in this section is: Theorem 4.1. Assume that each formula in Σ is either saturated or a conditional (e.g., Z → X), and let τ be an arbitrary CI. Assume S_n |= EI Σ ⇒ τ. Then: (1) h(τ) ≤ (n^2/4) · h(Σ) for every h ∈ Γ_n; and (2) if τ is a conditional, then h(τ) ≤ h(Σ) for every h ∈ Γ_n. Before we prove the theorem, we list two important consequences.
Corollary 4.2. Let Σ consist of saturated CIs and/or conditionals, and let τ be any CI. Then, for every K with S_n ⊆ K ⊆ Γ_n: S_n |= EI Σ ⇒ τ iff K |= EI Σ ⇒ τ.
The corollary implies that if Σ, τ are restricted to saturated CIs and/or conditionals, then the Exact Implication problem is the same for S_n as it is for any other set K where S_n ⊆ K ⊆ Γ_n. The corollary has an immediate application to the inference problem in graphical models [GP93]. There, the problem is to check if every probability distribution that satisfies all CIs in Σ also satisfies the CI τ; we have seen that this is equivalent to Γ*_n |= EI Σ ⇒ τ. The corollary states that it is enough that this implication holds on all of the uniform 2-tuple distributions, i.e., S_n |= EI Σ ⇒ τ, because this implies the (even stronger!) statement Γ_n |= EI Σ ⇒ τ. Decidability (i.e., for exact implication) was already known: Geiger and Pearl [GP93] proved that the set of graphoid axioms is sound and complete for the case when both Σ and τ are saturated. Specifically, the equivalence between saturated CIs and MVDs [GP93] enables the application of the polynomial implication algorithm devised for MVDs [Bee80] which, in this setting where both Σ and τ are saturated, has a runtime complexity of O(|Σ|n^2). Gyssens et al. [GNG14] improve this result by dropping any restrictions on τ.
The second consequence is the following: Corollary 4.3. Let Σ, τ consist of saturated CIs and/or conditionals. Then the following two statements are equivalent: (1) The implication Σ ⇒ τ holds, where we interpret Σ, τ as MVDs and/or FDs. (2) The implication Σ ⇒ τ holds for all probability distributions, i.e., Γ*_n |= EI Σ ⇒ τ, where we interpret Σ, τ as CIs.
Proof. We have shown right after Lemma 2.2 that (2) implies (1). For the opposite direction, by Th. 4.1, we need only check S n |= EI Σ ⇒ τ , which holds because on every uniform probability distribution a saturated CI holds iff the corresponding MVD holds, and similarly for conditionals and FDs. Since the 2-tuple relation satisfies the implication for MVDs+FDs, it also satisfies the implication for CIs, proving the claim.
Wong et al. [WBW00] proved that the implication for MVDs is equivalent to that of the corresponding saturated CIs (called there BMVD); they did not consider FDs. For the proof in the hard direction, they use the sound and complete axiomatization of MVDs in [BFH77]. In contrast, our proof is independent of any axiomatic system, and is also much shorter. Finally, we notice that the corollary also implies that, in order to check an implication between MVDs and/or FDs, it suffices to check it on all 2-tuple databases: indeed, this is equivalent to checking S_n |= EI Σ ⇒ τ, because this implies Item (2), which in turn implies Item (1). This rather surprising fact was first proven in [SDPF81]. The proof of Theorem 4.1 follows from a series of lemmas, and a theorem of independent interest, which we prove next. Before proceeding, we note that we can assume w.l.o.g. that Σ consists only of saturated CIs. Indeed, if Σ contains a non-saturated term, then by assumption it is a conditional, X → Y, and we will replace it with two saturated terms: Thus, we will assume w.l.o.g. that all formulas in Σ are saturated.
We say that a CI (X; By the chain rule, h(B_XY | Z) and I_h(X′; Y′ | Z) can be written as the sum of m = |B_XY| and |X′| · |Y′| = n_X n_Y elemental terms, respectively. Since B_XY, X′, and Y′ are pairwise disjoint, then m + n_X + n_Y ≤ n. Therefore: Where the last transition follows by observing that (1/4)m(4 − m) ≤ 1 for all integral m (e.g., for m ∈ {1, 3}, (1/4)m(4 − m) = 3/4, and for m = 2, (1/4)m(4 − m) = 1). Since the number of elemental terms is integral, m + n_X · n_Y ≤ n^2/4 as required.
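The counting bound used in the proof of Lemma 4.4 — if m + n_X + n_Y ≤ n with n_X, n_Y ≥ 1 and m ≥ 0, then m + n_X · n_Y ≤ n^2/4 — can be checked by brute force; a small sketch of our own:

```python
# brute-force check of the counting bound behind Lemma 4.4:
# whenever m + nx + ny <= n (with nx, ny >= 1, m >= 0),
# the number of elemental terms m + nx * ny is at most n^2 / 4
for n in range(2, 30):
    for nx in range(1, n):
        for ny in range(1, n - nx + 1):
            for m in range(0, n - nx - ny + 1):
                assert m + nx * ny <= n * n / 4, (n, nx, ny, m)
```

The bound is tight: for even n, taking m = 0 and n_X = n_Y = n/2 achieves exactly n^2/4 elemental terms.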
Theorem 4.1 follows from the next result, which is also of independent interest. We say that σ covers τ if all variables in τ are contained in σ; for example σ = (abc; d|e) covers τ = (c; d|be). Then: Theorem 4.5. Let τ be an elemental CI, and suppose each formula in Σ covers τ. If S_n |= EI Σ ⇒ τ, then h(τ) ≤ h(Σ) for every h ∈ Γ_n. Notice that Theorem 4.5 immediately implies Item (1) of Theorem 4.1, because by Lemma 4.4 every τ = (Y; Z|X) can be written as a sum of at most n^2/4 elemental terms.
In what follows, we prove Theorem 4.5, then use it to prove item (2) of Theorem 4.1.
Finally, we consider whether Item (1) of Theorem 4.1 can be strengthened to a 1-relaxation; we give in Th. 4.11 below a sufficient condition, whose proof uses the notion of I-measure (Section 8), and leave open the question whether 1-relaxation holds in general for implications where the antecedents are saturated CIs and conditionals. Definition 4.6. We say that two CIs (X; Y |Z) and (A; B|C) are disjoint if at least one of the following four conditions holds: If τ = (X; Y |Z) and σ = (A; B|C) are disjoint, then for any step function h_W, it cannot be the case that both h_W(τ) ≠ 0 and h_W(σ) ≠ 0. Indeed, if such a W exists, then Z, C ⊆ W and, assuming (1) X ⊆ C (the other three cases are similar), we have ZX ⊆ W, thus h_W(τ) = 0.
4.1. Proof of Theorem 4.5. The following holds by the chain rule, and will be used later on.
By the chain rule, we have that: Noting that Z = C Z_A Z_B, we get that I(X; Y | Z) ≤ I(A; B|C), as required.
We now prove Theorem 4.5. We use lower case for single variables, thus τ = (x; y|Z) because it is elemental. We may assume w.l.o.g. that neither x nor y is in Z: x, y ∉ Z (otherwise I_h(x; y|Z) = 0 and the lemma holds trivially). The deficit of an elemental CI τ = (x; y|Z) is the quantity |Ω − Z|. We prove by induction on the deficit of τ that h(τ) ≤ h(Σ) for every h ∈ Γ_n. Assume S_n |= EI (Σ ⇒ τ), and consider the step function h_Z at Z. Since h_Z(τ) = 1, there exists σ ∈ Σ, written σ = (A; B|C), such that h_Z(σ) = 1; this means that C ⊆ Z, and A, B ⊄ Z. In particular x, y ∉ C, therefore x, y ∈ AB, because σ covers τ. If x ∈ A and y ∈ B (or vice versa), then Γ_n |= h(τ) ≤ h(σ) by Lemma 4.8, proving the theorem. Therefore, we assume w.l.o.g. that x, y ∈ A and neither is in B. Furthermore, since B ⊄ Z, there exists u ∈ B − Z.
Base case: τ is saturated. Then u ∉ xyZ, contradicting the assumption that τ is saturated; in other words, in the base case it must be that x ∈ A and y ∈ B. Step: Let Z_A = Z ∩ A, and Z_B = Z ∩ B. Since C ⊆ Z, and σ = (A; B|C) covers τ, then , and we use the chain rule to define σ_1, σ_2: Since σ_2 is a saturated CI (i.e., contains all variables in σ, which is saturated), Σ_2 is saturated as well (i.e., contains only saturated CIs).

4.3.
A special case. In Theorem 4.11 we show that when the CIs in Σ are pairwise disjoint (Definition 4.6), we also obtain a 1-relaxation. The rather technical proof, which relies on the I-measure, is deferred to Section 8, where we present the I-measure.

Relaxation for General CIs
We now extend our discussion from saturated CIs and conditionals (or, equivalently, FDs and MVDs), to arbitrary Conditional Independence statements. We prove two results in this section. First, we prove that relaxation fails in general, and, second, we prove that a weaker form of relaxation holds.
In both results the relaxation problem is considered in cl(Γ*_n). Recall that our golden standard is to check whether the relaxation problem holds in Γ*_n. As we saw, when the constraints are restricted to FDs and MVDs, the relaxation problem is the same in S_n, in Γ*_n, in cl(Γ*_n), and in Γ_n. But for general constraints, these relaxation problems differ. Our first result in this section (the impossibility result) also holds in Γ*_n, by Fact 3.3, but we leave open the question whether the second result (weak relaxation) holds in Γ*_n. We state formally the two results of this section: Theorem 5.1. There exist Σ, τ with four variables, such that cl(Γ*_4) |= EI (Σ ⇒ τ) and cl(Γ*_4) ⊭ ApxI (Σ ⇒ τ). Theorem 5.2. Let Σ, τ be arbitrary CIs, and suppose cl(Γ*_n) |= EI Σ ⇒ τ. Then, for every ε > 0 there exists λ > 0 such that, for all h ∈ cl(Γ*_n): h(τ) ≤ λ · h(Σ) + ε · h(Ω). We will prove both theorems shortly. Before we do this, however, we provide some background and context for these theorems, which requires us to review the concept of cones.
5.1. Cones. Both Theorems 5.1 and 5.2 are best understood when viewed through the lens of convex analysis, in particular cones. We briefly review cones here, and refer to [Sch03, Stu93, BV04] for more details. Fix some number N > 0. A set K ⊆ R^N is called a cone if for every x ∈ K and θ ≥ 0 we have θx ∈ K. A set C ⊆ R^N is called convex if, for any two points x_1, x_2 ∈ C and any θ ∈ [0, 1], θx_1 + (1 − θ)x_2 ∈ C. Unless otherwise stated, in this paper every cone will be assumed to be convex. The intersection of a, not necessarily finite, set of convex cones is also a convex cone. The conic hull of a set C ⊆ R^N, in notation conhull(C), is the smallest convex cone that contains C; equivalently, it is the set of vectors of the form θ_1 x_1 + · · · + θ_k x_k, where x_1, . . . , x_k ∈ C and θ_1, . . . , θ_k ≥ 0. Fix a vector u ∈ R^N. The set K = {x | x·u ≤ 0} is a convex cone called a linear half-space. A polyhedral cone is the intersection of a finite number of linear half-spaces. Equivalently, K is polyhedral if K = {x | x·u_1 ≤ 0, . . . , x·u_r ≤ 0}, where u_1, . . . , u_r ∈ R^N are fixed vectors, or, also equivalently, K = {x | x^T A ≤ 0} where A ∈ R^{N×r} is a matrix. A cone K is called finitely generated if K = conhull(C) for some finite set C ⊆ R^N; equivalently, K is finitely generated if K = {Az | z ∈ R^m, z ≥ 0} for some matrix A ∈ R^{N×m}. Results by Farkas, Minkowski, and Weyl imply that a cone is finitely generated iff it is polyhedral [Sch03, pp. 61].
As we discussed, an entropic function, or a polymatroid h : 2^{[n]} → R^+, can be seen as a vector h ∈ R^N, where N def= 2^n; in other words, Γ*_n, Γ_n ⊆ R^N. By definition, Γ_n is a polyhedral cone, hence it is a finitely generated, convex cone. Yeung [Yeu08] has proven that, when n ≥ 3, Γ*_n is neither convex nor a cone, but its topological closure cl(Γ*_n) is always a convex cone. When n ≤ 3, cl(Γ*_n) = Γ_n and thus cl(Γ*_n) is finitely generated. For n ≥ 4, cl(Γ*_n) is not finitely generated [Mat07]. A conditional entropy h(B|A) = h(AB) − h(A) is equal to u·h, where u ∈ R^N is the vector having +1 on the dimension AB, −1 on the dimension A, and 0 everywhere else. Similarly, the mutual information I_h(B; C|A) is equal to v·h, where v is a vector with two +1's corresponding to dimensions AB and AC, two −1's corresponding to dimensions ABC and A, and the rest 0.
This discussion justifies phrasing the relaxation problem as follows. Fix a convex cone K ⊆ R^N, and let y_0, y_1, . . . , y_m be m + 1 vectors in R^N. Relaxation asks whether statement (5.2) below implies statement (5.3):

∀x ∈ K : x·y_1 ≤ 0 ∧ · · · ∧ x·y_m ≤ 0 ⇒ x·y_0 ≤ 0 (5.2)
∃θ_1, . . . , θ_m ≥ 0, ∀x ∈ K : x·y_0 ≤ θ_1 x·y_1 + · · · + θ_m x·y_m (5.3)

When each y_i has the property that x·y_i ≥ 0 for all x ∈ K (as is the case for the vectors defining h(B|A) and I_h(B; C|A)), the condition x·y_i ≤ 0 is equivalent to x·y_i = 0, and statement (5.2) is an Exact Implication. Furthermore, we can set all θ_i's in statement (5.3) equal to λ := max_i θ_i, because x·y_i ≥ 0 implies Σ_i θ_i x·y_i ≤ λ Σ_i x·y_i, and statement (5.3) is an Approximate Implication. If we view relaxation at this level of generality, then it is easy to find cases where relaxation holds, and cases where it fails:

Theorem 5.3.
(1) If K is finitely generated, then statement (5.2) implies statement (5.3).
(2) There exists a closed, convex cone K ⊆ R^3 and vectors y_0, y_1 for which statement (5.2) holds but statement (5.3) fails.
(1) We give here a quick and simple proof based on Farkas' lemma; in the next section, in Theorem 6.1, we give a slightly more elaborate proof, based on the strong duality property of linear programming (itself a consequence of Farkas' lemma), in order to obtain an upper bound on the relaxation coefficient.
More precisely, we use the following version of Farkas' lemma [Sch03, p. 61, Corollary 5.3a]. For any matrix A ∈ R^{N×M} and vector y ∈ R^N, the following two statements are equivalent: (a) every x ∈ R^N satisfying x^T A ≤ 0 also satisfies x·y ≤ 0; (b) there exists z ∈ R^M with z ≥ 0 and Az = y. Let A be the N × (r + m) matrix whose columns are the vectors u_1, . . . , u_r, y_1, . . . , y_m. Then statement (5.2) can be written equivalently as: every x ∈ R^N with x^T A ≤ 0 satisfies x·y_0 ≤ 0. Farkas' lemma implies the existence of a vector z ∈ R^{r+m} such that z ≥ 0 and Az = y_0. Denoting its components z = (γ_1, . . . , γ_r, θ_1, . . . , θ_m), we have y_0 = (Σ_j γ_j u_j) + (Σ_i θ_i y_i), and therefore, for every x ∈ K:

x·y_0 = Σ_j γ_j (x·u_j) + Σ_i θ_i (x·y_i) ≤ Σ_i θ_i (x·y_i)

since x·u_j ≤ 0, proving the claim.
(2) Let K ⊆ R^3 be the cone of positive semidefinite 2 × 2 matrices [BV04], that is:

K = {(x_1, x_2, x_3) | x_1 ≥ 0, x_3 ≥ 0, x_1 x_3 ≥ x_2^2}

where (x_1, x_2, x_3) encodes the symmetric matrix with diagonal entries x_1, x_3 and off-diagonal entry x_2. It is immediate to check that K is a closed, convex cone. K satisfies the following Exact Implication: ∀x ∈ K : x_1 ≤ 0 ⇒ x_2 ≤ 0, because x_1 ≤ 0 is equivalent to x_1 = 0, which implies x_2^2 ≤ 0, thus x_2 = 0. However, K does not satisfy the corresponding Approximate Implication; more precisely, the following is false: ∃λ > 0, ∀x ∈ K : x_2 ≤ λx_1. Indeed, for every choice of λ > 0, choose 0 < x_1 < 1/λ, and let x_2 = 1, x_3 = 1/x_1. Then (x_1, x_2, x_3) ∈ K, yet x_2 > λx_1. (The first statement of Farkas' lemma above is normally given with ≥ 0 instead of ≤ 0; it is easy to revert the inequality by replacing A, y with −A, −y.)
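The counterexample can be checked numerically. The sketch below (ours, for illustration) verifies both halves of the argument: every point of K with x_1 = 0 also has x_2 = 0, while for any λ the point (x_1, 1, 1/x_1) with 0 < x_1 < 1/λ lies in K and violates x_2 ≤ λx_1.

```python
# Numeric check (ours) of Theorem 5.3 (2): K is the 2x2 PSD cone in R^3.
def in_K(x1, x2, x3):
    return x1 >= 0 and x3 >= 0 and x1 * x3 >= x2 * x2

# Exact implication holds: on K, x1 = 0 forces x2 = 0.
assert not in_K(0.0, 0.1, 5.0)       # x1 = 0 and x2 != 0 lies outside K
assert in_K(0.0, 0.0, 5.0)

# Approximate implication fails: no lambda works.
witnesses = []
for lam in [1.0, 10.0, 1e6]:
    x1 = 0.5 / lam                   # any 0 < x1 < 1/lam
    x2, x3 = 1.0, 1.0 / x1
    assert in_K(x1, x2, x3)          # x1 * x3 = 1 >= x2**2 = 1
    witnesses.append(x2 > lam * x1)  # x2 = 1 while lam * x1 = 0.5
```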

5.2. Proof of Theorem 5.1. For n ≤ 3, the set cl(Γ*_n) is a polyhedral cone, and relaxation holds by Theorem 5.3 (1). Thus, we need a counterexample with n = 4 jointly distributed random variables. The cone cl(Γ*_4) is a subset of R^16, hence the counterexample needed to prove Theorem 5.1 is more involved than the one used to prove Theorem 5.3 (2). For that purpose, we adapt an example by Kaced and Romashchenko [KR13, Inequality (I5) and Claim 5], built upon an earlier example by Matúš [Mat07].

B. Kenig and D. Suciu
Vol. 18:1

where: Finally, we use the fact that −ln(1 − x) = O(x) and obtain: which proves equation (5.8).

Next, we prove cl(Γ*_n) |= EI(Σ ⇒ τ). This follows from an inequality initially proven by Matúš [Mat07], then adapted by [KR13]. For completeness, we review that inequality here, starting with the statement of Theorem 2 in [Mat07], which asserts that for every entropic vector h ∈ Γ*_5 over 5 variables and every natural number k ≥ 1: where, using our notation: Substituting A = 1, B = 3, C = 4, D = 2, E = 5 in Matúš's inequality and dividing by k, we obtain the following (which is Eq. (ii) in Theorem 2 of [KR13]): Finally, we set E = D to obtain the following inequality, for all h ∈ Γ*_n and k ≥ 1: By continuity, the inequality also holds for cl(Γ*_n). We can now prove that the Exact Implication Σ ⇒ τ holds in cl(Γ*_n).

It is interesting to observe that inequality (5.9) is almost a relaxation of the implication (5.4): the only extra term is the last one, which can be made arbitrarily small by increasing k. Our second result generalizes this observation.

5.3. Proof and Discussion of Theorem 5.2. The proof of Theorem 5.2 follows from a more general statement about cones:

Theorem 5.4. Let K ⊆ R^N be a topologically closed, convex cone, and let y_0, y_1, . . . , y_m be m + 1 vectors in R^N. The following are equivalent:

∀x ∈ K : x·y_1 ≤ 0, . . . , x·y_m ≤ 0 ⇒ x·y_0 ≤ 0 (5.10)
∀ε > 0, ∃θ_1, . . . , θ_m ≥ 0, ∀x ∈ K : x·y_0 ≤ θ_1 x·y_1 + · · · + θ_m x·y_m + ε||x||_∞ (5.11)

We first show that Theorem 5.4 implies Theorem 5.2. For this purpose we take K := cl(Γ*_n), which is a closed, convex cone [Yeu08]. Let Σ = {(B_1; C_1|A_1), . . . , (B_m; C_m|A_m)} and τ = (B_0; C_0|A_0). We define the vectors y_i such that y_i·h = I_h(B_i; C_i|A_i), and notice that the conditions y_i·h ≤ 0 and y_i·h = 0 are equivalent, because I_h(B_i; C_i|A_i) ≥ 0 always holds. If the Exact Implication Σ ⇒ τ holds, then condition (5.10) holds.
This implies condition (5.11), and inequality (5.1) in Theorem 5.2 follows by setting λ := max_i θ_i and observing that ||h||_∞ = h(Ω); the latter holds because h is monotone, hence its maximal coordinate is the one for the full set Ω.
In the rest of this section we prove Theorem 5.4. While we only need the implication (5.10) ⇒ (5.11), it helps to observe that the reverse direction holds too. Indeed, assuming x·y_i ≤ 0 for i = 1, . . . , m, condition (5.11) implies that, for every ε > 0: x·y_0 ≤ ε||x||_∞. Taking ε → 0, we obtain x·y_0 ≤ 0.
To prove the implication (5.10) ⇒ (5.11) we need to review some properties of cones. For any set C ⊆ R^N, its dual C* ⊆ R^N is the following set: C* := {y ∈ R^N | ∀x ∈ C : x·y ≤ 0}. It is immediate to check that C* is a topologically closed, convex cone (because C* is the intersection of, possibly infinitely many, linear half-spaces, each of which is a topologically closed, convex cone); formally, C* = cl(C*). We warn that the * in Γ*_n does not represent the dual; the notation Γ*_n for entropic functions is by now well established, and we adopt it here too, despite its clash with the standard notation for the dual cone.
We need the following basic properties of cones: (A) If L is a finite set, then conhull(L) is topologically closed. (B) For any set K, cl(conhull(K)) = K**.

6. Restricted Axioms: Shannon Inequalities
The results in the previous section are mostly negative: for general constraints, relaxation fails, with the sole exception of MVDs and FDs, for which relaxation holds. In this section we prove that relaxation holds in general for constraints that can be inferred using only the Shannon inequalities (monotonicity (2.2) and submodularity (2.3)). Equivalently, this means interpreting the constraints over the set of all polymatroids, Γ_n. While the gold standard is the implication problem in cl(Γ*_n), a study of the implication problem in Γ_n is important for several reasons. First, by restricting to Shannon inequalities we obtain a sound, but in general incomplete (w.r.t. cl(Γ*_n)), method for deciding implications. The incompleteness stems from the fact that the set of entropic functions and their limit points, cl(Γ*_n), obeys additional inequalities (and hence, implications) beyond those that follow from the Shannon inequalities. These are called non-Shannon-type inequalities [ZY97, MMRV02], and, as their name suggests, they are not implied by the Shannon inequalities. For example, Matúš's inequality (5.9) is a non-Shannon inequality: it holds in cl(Γ*_n) but fails in Γ_n. Second, while generally incomplete, Shannon's inequalities are complete for characterizing the implications that hold under certain syntactic restrictions. In particular, it follows from our results in Section 4 that they are complete for FDs and MVDs, and it follows from results in [Ken21] that they are complete for marginal CIs; in both cases, relaxation holds, with a factor λ = n²/4. We have already seen in Theorem 5.3 (1) that exact implications of CIs relax over Γ_n, since Γ_n is finitely generated. We start by proving an upper bound on the coefficient of the relaxation.
Proof. We start with a lemma bounding the vertices of a polytope; this is Lemma 6.2, which we use in the following form: if all entries of A ∈ R^{M×N} and b ∈ R^M are in {−1, 0, +1} and r = rank(A), then every vertex x of the polytope {x ∈ R^N | x ≥ 0, Ax ≤ b} satisfies |x_i| ≤ r! for every coordinate i. Next, we briefly review two facts from linear programming, see e.g. [Sch03]. Let A ∈ R^{M×N}, b ∈ R^M, c ∈ R^N. Then: • The strong duality theorem states that the primal LP and the dual LP have the same optimal values:

max{c^T x | x ∈ R^N, x ≥ 0, Ax ≤ b} = min{y^T b | y ∈ R^M, y ≥ 0, y^T A ≥ c^T}

• If the primal linear program has a finite optimal solution, then it has an optimal solution x that is a vertex of the polytope {x ∈ R^N | x ≥ 0, Ax ≤ b}. In particular, if r := rank(A), then there exists an optimal solution x of the primal LP that satisfies A_0 x = b_0, where A_0 is some r × N sub-matrix consisting of r independent rows (and b_0 the corresponding sub-vector of b). As a consequence, if all entries in A, b are −1, 0, or +1, then, by Lemma 6.2, there exists an optimal solution x satisfying |x_i| ≤ r! for every coordinate i = 1, . . . , N.
We now prove Theorem 6.1. Fix a set of n variables Ω, and let Γ_n be the set of all polymatroids over the variables Ω. Γ_n is defined by Shannon's inequalities, monotonicity (2.2) and submodularity (2.3), and it is known that it suffices to take only the elemental inequalities (see [Yeu08, (14.12)]): h(Ω) − h(Ω − {X_i}) ≥ 0 for each variable X_i, and I_h(X_i; X_j|W) ≥ 0 for each pair X_i ≠ X_j and each set W ⊆ Ω − {X_i, X_j}. There are n elemental monotonicity constraints and (n(n−1)/2)·2^{n−2} = n(n−1)2^{n−3} elemental submodularity constraints; the total number of Shannon inequalities is n + n(n−1)2^{n−3}. Equivalently, we can write:

Γ_n = {h ∈ R^{2^n} | A_S h ≤ 0} (6.1)

where A_S is the (n + n(n−1)2^{n−3}) × 2^n matrix corresponding to all Shannon inequalities (each row being the negation of one elemental inequality). Similarly, the constraints Σ on h can be defined by some m × 2^n matrix A_Σ, with one row for each constraint in Σ. For example, if Σ contains the CI (Z_1; Z_2|V), then one row of A_Σ corresponds to the assertion I_h(Z_1; Z_2|V) ≤ 0, or, equivalently, −h(V) + h(Z_1 V) + h(Z_2 V) − h(Z_1 Z_2 V) ≤ 0. Let M := n + n(n−1)2^{n−3} + m and N := 2^n, and let A be the M × N matrix obtained by stacking A_S on top of A_Σ. Finally, define b ∈ R^M to be the 0-vector, b^T = (0, 0, . . . , 0), and let c_τ ∈ R^N be the vector corresponding to the constraint τ; more precisely, if τ = (X; Y|W), then c_τ^T h is the value I_h(X; Y|W) of the consequent τ. With these notations, we claim that the Exact Implication Γ_n |= EI(Σ ⇒ τ) holds iff the optimal solution of the linear program below has value zero:

max{c_τ^T h | h ∈ R^{2^n}, h ≥ 0, Ah ≤ b}

We always have max{c_τ^T h | h ∈ R^{2^n}, h ≥ 0, Ah ≤ b} ≥ 0, because the zero polymatroid h := 0 is a feasible solution to the LP. Moreover, if the Exact Implication holds, then every feasible solution is a polymatroid that satisfies Σ, hence it satisfies I_h(X; Y|W) ≤ 0, in other words c_τ^T h ≤ 0, proving that the optimal solution of the LP is ≤ 0. We conclude that, if the Exact Implication holds, then the optimum is = 0, which proves the claim. Next, assume that the exact implication holds, so that the primal LP has optimal value 0.
By the strong duality theorem, the dual LP also has optimal value 0:

min{y^T b | y ∈ R^M, y ≥ 0, y^T A ≥ c_τ^T} = 0

Since b = 0, we have y^T b = 0 for every y, hence, when the exact implication holds, min{0 | y ∈ R^M, y ≥ 0, y^T A ≥ c_τ^T} = 0. Equivalently, this asserts that the constraints y ≥ 0, y^T A ≥ c_τ^T have a feasible solution y: otherwise the optimum of the dual LP would be min ∅ = ∞.
We now observe that every feasible solution y of the dual LP represents a relaxation of the implication problem. Indeed, let us write y as y = (y_S, y_Σ), where y_S consists of the first n + n(n−1)2^{n−3} coordinates and y_Σ of the last m coordinates of y. Since y is feasible, y^T A ≥ c_τ^T. Let h ∈ Γ_n be any polymatroid; since h ≥ 0, we have:

c_τ^T h ≤ y^T A h = y_S^T (A_S h) + y_Σ^T (A_Σ h) ≤ y_Σ^T (A_Σ h)

In the last inequality we used the fact that A_S h ≤ 0, by the definition of Γ_n in Eq. (6.1). The m coordinates of the vector A_Σ h ∈ R^m are the values I_h(Z_1; Z_2|V), for all (Z_1; Z_2|V) ∈ Σ. Thus, y_Σ^T (A_Σ h) can be written as a positive linear combination of the constraints in Σ, and the inequality c_τ^T h ≤ y_Σ^T (A_Σ h) is precisely an Approximate Implication. If r := rank(A), then, by our discussion above, we can find a feasible solution y whose coordinates are ≤ r! ≤ (2^n)!, since r ≤ min(M, N) = N = 2^n. In other words, y ≤ (2^n)!·(1 1 · · · 1)^T, where (1 1 · · · 1)^T is the all-1 vector. Thus, we have:

I_h(X; Y|W) = c_τ^T h ≤ (2^n)! · Σ_{(Z_1;Z_2|V)∈Σ} I_h(Z_1; Z_2|V)

This completes the proof of the theorem.
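As a sanity check on the counting of elemental inequalities used in the proof, the following sketch (ours; all names are our own) enumerates them for small n and compares against the formula n + n(n−1)·2^{n−3}.

```python
# Enumerate Shannon's elemental inequalities over n variables (our sketch).
# Each inequality is stored as a coefficient map over subsets of {0,...,n-1};
# a polymatroid h satisfies it when the corresponding linear form is >= 0.
from itertools import combinations

def elemental_inequalities(n):
    om = frozenset(range(n))
    ineqs = []
    for i in range(n):
        # elemental monotonicity: h(Omega) - h(Omega - {X_i}) >= 0
        ineqs.append({om: 1, om - {i}: -1})
    for i, j in combinations(range(n), 2):
        # elemental submodularity: I_h(X_i; X_j | W) >= 0
        rest = sorted(om - {i, j})
        for r in range(len(rest) + 1):
            for W in combinations(rest, r):
                W = frozenset(W)
                ineqs.append({W | {i}: 1, W | {j}: 1,
                              W | {i, j}: -1, W: -1})
    return ineqs

counts = {n: len(elemental_inequalities(n)) for n in (2, 3, 4, 5)}
```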
Theorem 6.1 gives us a very crude upper bound on the relaxation factor for Γ_n. Next, we show a lower bound on the factor λ; we prove a lower bound of 3: the inequality fails if either of the coefficients 3, 2 is replaced by a smaller value. In particular, denoting by τ and Σ the terms on the two sides of Eq. (6.2), the exact implication Γ_n |= EI(Σ ⇒ τ) holds, but does not have a 1-relaxation.
Proof. We make use of the following inequality, proved in Lemma 1 of [DFZ09]:
We apply (6.3) three times: Plugging back into the formula, we get: Plugging back into (6.4), we get: We remark that inequality (6.2) can be verified using known tools for testing whether an inequality holds for all polymatroids (e.g., ITIP and XITIP). It is still open whether the coefficients 3 and 2 (for h(Z|A) and h(Z|B), respectively) are tight: to show that, we would need to present a polymatroid for which (6.2) holds with equality. Using XITIP, we verified that the coefficients 3 and 2 cannot be reduced even by 0.0001. While this does not rule out the possibility that the inequality holds for some coefficient 3 − ε for a small enough ε, it does allow us to conclude that the exact implication corresponding to (6.2) does not have a 1-relaxation.

7. Restricted Models: Positive I-Measure
While relaxation fails in its most general setting, we have seen that it holds if we either restrict the type of constraints to FDs and MVDs, or restrict the implications to those inferable from Shannon's inequalities. In this section we consider a different restriction: we restrict the types of models, or databases, over which the constraints are interpreted. More precisely, we restrict the set of entropic functions to the step functions S_n or, equivalently, to their conic hull, which we denote by P_n := conhull(S_n). In Section 8 we show that these entropic functions are precisely those with a positive I-measure, a notion introduced by Yeung [Yeu91, Yeu08]. In this section, we prove that all EIs admit a 1-relaxation over entropic functions with positive I-measure (i.e., over P_n):

Theorem 7.1. Every implication Σ ⇒ τ admits a 1-relaxation over P_n, where Σ, τ are arbitrary CIs. In other words, if P_n |= EI(Σ ⇒ τ), then ∀h ∈ P_n, h(τ) ≤ h(Σ).
The gold standard for the semantics of constraints is to interpret them over Γ*_n, hence the reader may wonder what we gain by restricting them to (the conic hull of) the step functions or, equivalently, what we can learn by checking implications only on the uniform 2-tuple distributions. We have two motivations. First, checking an implication only on the uniform 2-tuple distributions leads to a complete, but unsound, procedure for checking implication over Γ*_n. In other words, by testing an implication on all uniform 2-tuple distributions we can detect whether the implication fails; thus, the procedure is complete. Of course, the procedure is not sound, because an implication may hold on all step functions yet fail in general. For a simple example, the inequality I_h(X; Y|Z) ≤ I_h(X; Y) holds for all step functions, but fails on the "parity function" in Fig. 1 (b). The second motivation is more interesting. It turns out that restricting the models to uniform 2-tuple distributions leads to a sound and complete procedure in some important special cases. We saw one such case in Section 4: when the constraints are restricted to FDs and MVDs, then, in order to check an implication in Γ*_n, it suffices to check that it holds on all uniform 2-tuple distributions. In this section we present a second case: checking differential constraints in market basket analysis [SVG05].
In Section 8 we use the I-measure theory to characterize the conic hull P_n of the step functions, and to provide alternative proofs of the main results of this section, Theorems 7.1 and 7.4.

Definition 7.2. For any function h : 2^Ω → R and sets W, Y ⊆ Ω, the quantity I_h(Y|W) is defined as:

I_h(Y|W) := Σ_{T : W ⊆ T ⊆ W∪Y} (−1)^{|T−W|+1} h(T)

The quantity I_h(Y|W) is usually written as I_h(y_1; · · · ; y_m|W), where Y = {y_1, . . . , y_m}. In the literature, I_h(Y|W) is defined only for entropic functions h, but we remove this restriction here. When m = 2, this is precisely the conditional mutual information I_h(y_1; y_2|W) = −h(W) + h(y_1 W) + h(y_2 W) − h(y_1 y_2 W), and when m = 1 it becomes the conditional entropy h(y_1|W) = h(y_1 W) − h(W).

Definition 7.3. Fix a set Ω of n variables. An I-measure constraint statement is a formula of the form Y|W, where W, Y ⊆ Ω. We call the constraint saturated if Y ∪ W = Ω. We say that a function h : 2^Ω → R satisfies the constraint statement if I_h(Y|W) = 0. An implication is a formula Σ ⇒ (Y|W), where Σ is a set of I-measure constraints and Y|W is an I-measure constraint. Fix a set K s.t. S_n ⊆ K ⊆ Γ_n. The exact implication Σ ⇒ (Y|W) holds in K, in notation K |= EI(Σ ⇒ (Y|W)), if every h ∈ K that satisfies all constraints in Σ also satisfies Y|W. We say that the exact implication problem admits a λ-relaxation in K, for some λ > 0, if K |= EI(Σ ⇒ (Y|W)) implies that, for all h ∈ K, I_h(Y|W) ≤ λ · Σ_{X|V∈Σ} I_h(X|V).

We now prove our main result in this section:

Theorem 7.4. Exact implications of I-measure constraints admit a 1-relaxation in P_n. More precisely, if Σ ⇒ (Y|W) is an implication of I-measure constraints and P_n |= EI(Σ ⇒ (Y|W)), then ∀h ∈ P_n, I_h(Y|W) ≤ Σ_{X|V∈Σ} I_h(X|V).
Notice that Theorem 7.1 follows immediately as a special case, since every CI is, in particular, an I-measure constraint.
To prove Theorem 7.4, we need two lemmas. In both lemmas below we use a simple property. For any two sets A ⊆ B:

Σ_{C : A ⊆ C ⊆ B} (−1)^{|C−A|} = 1 if A = B, and 0 if A ⊊ B (7.2)

To prove identity (7.2), it suffices to show that Σ_{C:A⊆C⊆B} (−1)^{|C−A|} = 0 when A ⊊ B: fix any element b ∈ B − A, and notice that the sets C that don't contain b are in 1-1 correspondence with those that do contain b, via the mapping C → C ∪ {b}. Since the two sets C and C ∪ {b} have different parities, their terms cancel out, (−1)^{|C−A|} + (−1)^{|C∪{b}−A|} = 0, and the entire sum is zero.
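Identity (7.2) is easy to confirm by brute force; the following sketch (ours) evaluates the alternating sum directly.

```python
# Brute-force check (ours) of the alternating-sum identity (7.2).
from itertools import combinations

def alt_sum(A, B):
    """Sum of (-1)^{|C - A|} over all C with A <= C <= B."""
    A, B = frozenset(A), frozenset(B)
    assert A <= B
    extra = sorted(B - A)
    # group the subsets C by r = |C - A|; there are C(|extra|, r) of them
    return sum((-1) ** r * len(list(combinations(extra, r)))
               for r in range(len(extra) + 1))

checks_equal = [alt_sum(s, s) for s in ({1}, {1, 2, 3}, set())]   # A = B
checks_proper = [alt_sum(set(), {1}), alt_sum({1}, {1, 2}),       # A proper subset
                 alt_sum({2}, {1, 2, 3, 4})]
```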
Recall from Section 2 that h_Z denotes the "step function at Z", where Z ⊊ Ω (see the definition in Eq. (2.9)). By definition, the conic hull P_n := conhull(S_n) consists of all functions of the form h = Σ_{Z⊊Ω} d_Z h_Z, where the d_Z ≥ 0, for Z ⊊ Ω, are arbitrary coefficients. These coefficients are precisely the saturated conditional multivariate mutual informations. This follows from a more general lemma:

Lemma 7.5. The step functions h_Z, for Z ⊊ Ω, form a basis of the vector space V := {h | h : 2^Ω → R, h(∅) = 0}, and every h ∈ V can be written as h = Σ_{Z⊊Ω} I_h(Ω − Z|Z) · h_Z. In other words, the projections of h on the basis consisting of the step functions are precisely the saturated conditional multivariate mutual informations.
Proof. We prove the following claim: for every h ∈ V and every W ⊆ Ω, it holds that h(W) = Σ_{Z⊊Ω} I_h(Ω − Z|Z) · h_Z(W). The claim implies that the step functions span the entire vector space V = {h | h : 2^Ω → R, h(∅) = 0}. Since the dimension of V is 2^n − 1, which coincides with the number of step functions, the step functions must also be linearly independent, hence they form a basis of V. It remains to prove the claim.
To prove the claim, for any W ⊆ Ω, define f(W) := h(Ω) − h(W). We want to express f(W) as f(W) = Σ_{Z:W⊆Z⊆Ω} g(Z), for some function g : 2^Ω → R. Such a function g is the unique Möbius inverse of f. Recall that the Möbius inversion formula states that the two expressions below are equivalent:

f(W) = Σ_{Z:W⊆Z⊆Ω} g(Z)    g(Z) = Σ_{T:Z⊆T⊆Ω} (−1)^{|T−Z|} f(T) (7.4)

For Z ⊊ Ω we compute, using identity (7.2) (the terms (−1)^{|T−Z|} h(Ω) cancel out):

g(Z) = Σ_{T:Z⊆T⊆Ω} (−1)^{|T−Z|} (h(Ω) − h(T)) = Σ_{T:Z⊆T⊆Ω} (−1)^{|T−Z|+1} h(T) = I_h(Ω − Z|Z)

where the last equality is Definition 7.2; furthermore, g(Ω) = f(Ω) = 0. We now compute h(W) from (7.4):

h(W) = h(Ω) − f(W) = Σ_{Z⊊Ω} g(Z) − Σ_{Z:W⊆Z⊊Ω} g(Z) = Σ_{Z⊊Ω : W⊈Z} g(Z) = Σ_{Z⊊Ω} g(Z) h_Z(W)

where the last equality holds because h_Z(W) = 1 when W ⊈ Z and h_Z(W) = 0 otherwise. This proves the claim, as required.
The lemma implies that, if h ∈ P_n, then I_h(Ω − Z|Z) ≥ 0 for all Z ⊊ Ω. Indeed, by the definition of the conic hull P_n = conhull(S_n), the function h can be written as h = Σ_{Z⊊Ω} d_Z h_Z with d_Z ≥ 0, and, by the lemma, d_Z = I_h(Ω − Z|Z), proving the claim.
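Lemma 7.5 can also be validated numerically. The sketch below (ours; it assumes the step-function convention h_Z(W) = 0 iff W ⊆ Z) builds an arbitrary non-negative combination h = Σ d_Z h_Z and recovers each coefficient d_Z as the saturated term I_h(Ω − Z|Z).

```python
# Numeric check (ours) of Lemma 7.5 over n = 3 variables.
from itertools import chain, combinations

def powerset(s):
    s = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def step(Z, W):
    """Step function at Z (assumed convention): h_Z(W) = 0 if W <= Z, else 1."""
    return 0 if W <= Z else 1

def I(h, Y, W):
    """I_h(Y|W) per Definition 7.2 (W and Y disjoint):
    sum over W <= T <= W|Y of (-1)^{|T - W| + 1} h(T)."""
    return sum((-1) ** (len(S) + 1) * h[W | S] for S in powerset(Y))

Omega = frozenset({1, 2, 3})
d = {Z: 1 + 3 * len(Z) for Z in powerset(Omega) if Z != Omega}  # arbitrary d_Z >= 0
h = {W: sum(dz * step(Z, W) for Z, dz in d.items()) for W in powerset(Omega)}

# The coefficient of h on the basis vector h_Z is the saturated term I_h(Omega-Z | Z).
recovered = {Z: I(h, Omega - Z, Z) for Z in d}
```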
The second lemma is:

Lemma 7.6. For any two disjoint sets W, Y ⊆ Ω s.t. Y ≠ ∅, the following identity holds:

I_h(Y|W) = Σ_{V : W ⊆ V ⊆ Ω−Y} I_h(Ω − V|V)

Proof. We expand I_h in the RHS according to its definition:

Σ_{V:W⊆V⊆Ω−Y} I_h(Ω − V|V) = Σ_{V:W⊆V⊆Ω−Y} Σ_{T:V⊆T⊆Ω} (−1)^{|T−V|+1} h(T) = Σ_{T:W⊆T⊆Ω} (−1)^{|T−W|+1} h(T) · (Σ_{V:W⊆V⊆T∩(Ω−Y)} (−1)^{|V−W|})

By Eq. (7.2), the inner sum is = 1 when W = T ∩ (Ω − Y) and = 0 otherwise. The condition W = T ∩ (Ω − Y) is equivalent to W ⊆ T ⊆ W ∪ Y, and thus we obtain:

Σ_{V:W⊆V⊆Ω−Y} I_h(Ω − V|V) = Σ_{T:W⊆T⊆W∪Y} (−1)^{|T−W|+1} h(T)

The latter expression is equal to I_h(Y|W) by Def. 7.2.
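Lemma 7.6 is a purely algebraic identity about the function h, so it can be checked on arbitrary values; the sketch below (ours) does so for n = 4.

```python
# Brute-force check (ours) of Lemma 7.6 for an arbitrary function h.
from itertools import chain, combinations

def powerset(s):
    s = sorted(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

Omega = frozenset({1, 2, 3, 4})

def I(h, Y, W):
    """I_h(Y|W) per Definition 7.2 (W and Y disjoint)."""
    return sum((-1) ** (len(S) + 1) * h[W | S] for S in powerset(Y))

def rhs(h, Y, W):
    """Sum of saturated terms I_h(Omega - V | V) over W <= V <= Omega - Y."""
    return sum(I(h, Omega - V, V) for V in powerset(Omega - Y) if W <= V)

# Arbitrary integer values for h, with h(emptyset) = 0; the identity is algebraic.
h = {S: (7 * len(S) ** 2 + sum(S)) % 11 if S else 0 for S in powerset(Omega)}

pairs = [({1}, set()), ({1, 2}, {3}), ({2, 3, 4}, {1}), ({1, 2, 3, 4}, set())]
results = [I(h, frozenset(Y), frozenset(W)) == rhs(h, frozenset(Y), frozenset(W))
           for Y, W in pairs]
```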
We will now prove Theorem 7.4. Consider an exact implication P_n |= EI(Σ ⇒ (Y|W)). By Lemma 7.6, I_h(Y|W) = Σ_{V:W⊆V⊆Ω−Y} I_h(Ω − V|V). We claim that, for every set V s.t. W ⊆ V ⊆ Ω − Y, there exists a constraint X|U ∈ Σ such that U ⊆ V ⊆ Ω − X. In other words, if we expand I_h(X|U) according to Lemma 7.6, then one of the terms will be I_h(Ω − V|V). To prove the claim, we consider the step function at V, h_V, and use the fact that the exact implication must hold for h_V. We notice that I_{h_V}(Ω − V|V) = 1 and I_{h_V}(Ω − U|U) = 0 for U ≠ V: this follows by considering the expansion of h_V given by Lemma 7.5, h_V = Σ_{U⊊Ω} d_U h_U, where d_U = I_{h_V}(Ω − U|U), and noting that, since h_V is part of the basis, we have I_{h_V}(Ω − V|V) = 1 and I_{h_V}(Ω − U|U) = 0 for U ≠ V. In particular, I_{h_V}(Y|W) = 1; in other words, the constraint Y|W does not hold in h_V. Since the exact implication holds, it must be the case that Σ does not hold for h_V either, hence there exists an I-measure constraint X|U ∈ Σ such that I_{h_V}(X|U) > 0. Expanding it according to Lemma 7.6, I_{h_V}(X|U) = Σ_{T:U⊆T⊆Ω−X} I_{h_V}(Ω − T|T). Since I_{h_V}(Ω − T|T) = 0 for all T ≠ V, one of the terms must be I_{h_V}(Ω − V|V), proving the claim.
We use the claim to prove the theorem. Let h be any function in P_n. To prove the inequality I_h(Y|W) ≤ Σ_{X|U∈Σ} I_h(X|U), we expand both sides using Lemma 7.6:

I_h(Y|W) = Σ_{V:W⊆V⊆Ω−Y} I_h(Ω − V|V)    Σ_{X|U∈Σ} I_h(X|U) = Σ_{X|U∈Σ} Σ_{T:U⊆T⊆Ω−X} I_h(Ω − T|T)

We have shown that every term I_h(Ω − V|V) in the first sum occurs at least once as a term I_h(Ω − T|T) in the second sum. Since all terms are ≥ 0 (because h ∈ P_n), it follows that I_h(Y|W) ≤ Σ_{X|U∈Σ} I_h(X|U), as required.

7.2. Differential Constraints in Market Basket Analysis. We end this section by describing the tight connection between I-measure constraints and differential constraints for Market Basket Analysis, introduced by Sayrafi and Van Gucht [SVG05].
Consider a set of items Ω = {X_1, . . . , X_n} and a set of baskets B = {b_1, . . . , b_N}, where every basket is a subset b_i ⊆ Ω. The support function f_B : 2^Ω → N assigns to every subset W ⊆ Ω the number of baskets in B that contain the set W: f_B(W) := |{i | W ⊆ b_i}|. The function f_B is anti-monotone: W_1 ⊆ W_2 implies f_B(W_2) ≤ f_B(W_1). Similarly, define d_B(W) := |{i | b_i = W}|, the number of baskets in B that are equal to the set W. Then the following two identities are easily verified:

f_B(W) = Σ_{V:W⊆V⊆Ω} d_B(V)    d_B(W) = Σ_{V:W⊆V⊆Ω} (−1)^{|V−W|} f_B(V) (7.6)

The first identity follows immediately from the definitions of f_B and d_B; the second is its Möbius inversion. Building on the identity (7.6), Sayrafi and Van Gucht define the density of any function f as follows:

Definition 7.7. Let f : 2^Ω → R be any function. Its density is the unique function d_f : 2^Ω → R defined as: d_f(W) := Σ_{V:W⊆V⊆Ω} (−1)^{|V−W|} f(V). When f is the support function f_B associated to a set of baskets B, its density d_f is precisely the function d_B, because it satisfies the left equation in (7.6).
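The support/density identities (7.6) are easy to check on a toy basket set. In the sketch below (ours; the item names are invented), `density` is the Möbius inverse of the support function and agrees with d_B.

```python
# Toy check (ours) of the support / density identities (7.6).
from itertools import chain, combinations

def powerset(items):
    items = sorted(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

Omega = frozenset({'bread', 'milk', 'eggs'})
B = [frozenset({'bread', 'milk'}), frozenset({'bread'}),
     frozenset({'bread', 'milk'}), frozenset({'milk', 'eggs'}), frozenset()]

def f_B(W):   # support: number of baskets that contain W
    return sum(1 for b in B if W <= b)

def d_B(W):   # number of baskets equal to W
    return sum(1 for b in B if b == W)

def density(f, W):   # Moebius inverse of f, as in Definition 7.7
    return sum((-1) ** len(V - W) * f(V) for V in powerset(Omega) if W <= V)

support_ok = all(f_B(W) == sum(d_B(V) for V in powerset(Omega) if W <= V)
                 for W in powerset(Omega))
density_ok = all(density(f_B, W) == d_B(W) for W in powerset(Omega))
```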
For any function f : 2^Ω → R and any two sets of items W, Y ⊆ Ω, we denote by d_f(Y|W) := Σ_{V:W⊆V⊆W∪Y} d_f(V). A differential constraint is an expression Y|W. A set of baskets B satisfies the differential constraint if d_B(Y|W) = 0. Sayrafi and Van Gucht [SVG05] define an implication to be an assertion Σ ⇒ (Y|W), where Σ is a set of differential constraints. In addition, we also define here an approximate implication:

Definition 7.8. Fix a set of items Ω. An implication is a formula Σ ⇒ (Y|W), where Σ is a set of differential constraints and Y|W is one differential constraint. The exact implication Σ ⇒ (Y|W) holds, in notation |= EI(Σ ⇒ (Y|W)), if every set of baskets B that satisfies all constraints in Σ also satisfies Y|W.

8. The I-Measure

We recall that X_I is the joint random variable X_I := (X_i : i ∈ I) (see Notation 2.1), and we denote m(X_I) := ∪_{i∈I} m(X_i), where m(X_i) is the set associated with the variable X_i.
Definition 8.1. The field F n generated by sets m(X 1 ), . . . , m(X n ) is the collection of sets which can be obtained by any sequence of usual set operations (union, intersection, complement, and difference) on m(X 1 ), . . . , m(X n ).
The atoms of F_n are sets of the form ∩_{i=1}^n Y_i, where each Y_i is either m(X_i) or its complement m^c(X_i). We denote by A the set of atoms of F_n. Note that, by our choice of the universal set Λ := ∪_{i=1}^n m(X_i), the atom ∩_{i=1}^n m^c(X_i) degenerates to the empty set. This is because ∩_{i=1}^n m^c(X_i) = (∪_{i=1}^n m(X_i))^c = Λ^c = ∅. Hence, there are 2^n − 1 non-empty atoms in F_n. We call a function µ : F_n → R set additive if, for every pair of disjoint sets A and B, it holds that µ(A ∪ B) = µ(A) + µ(B). A real function µ defined on F_n is called a signed measure if it is set additive and µ(∅) = 0.
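A concrete sketch (ours) of the atoms: if we model each m(X_i) by the set of nonzero n-bit masks whose i-th bit is set, then Λ = ∪_i m(X_i) is the whole ground set, every atom other than the all-complements one is a singleton, and exactly 2^n − 1 atoms are non-empty.

```python
# Illustration (ours): atoms of the field generated by m(X_1),...,m(X_n),
# with each m(X_i) modeled as a concrete set of bitmasks.
def atoms(n):
    ground = set(range(1, 2 ** n))              # Lambda = union of all m(X_i)
    m = [{s for s in ground if s >> i & 1} for i in range(n)]
    out = []
    for choice in range(2 ** n):                # bit i: take m(X_i) or its complement
        atom = set(ground)
        for i in range(n):
            atom &= m[i] if choice >> i & 1 else ground - m[i]
        out.append(atom)
    return out

a = atoms(4)
nonempty = [x for x in a if x]   # all atoms except the all-complements one
```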
Theorem 8.2 [Yeu91, Yeu08]. There exists a unique signed measure µ* on F_n which is consistent with all of Shannon's information measures (i.e., entropies, conditional entropies, mutual informations, and conditional mutual informations).
Let X, Y, Z ⊆ Ω denote three subsets of variables; Z may be empty. We refer to objects of the form (X) and (X|Z) as entropy and conditional entropy terms, respectively. Likewise, we refer to triples of the form (X; Y|Z) as conditional mutual information terms. Collectively, we refer to these as information-theoretic terms. Let σ = (X; Y|Z) be a mutual information term. We denote by m(σ) = m(X) ∩ m(Y) ∩ m^c(Z) the (I-measure) set associated with σ (see [Yeu08] for intuition and examples). Table 3 summarizes the extension of the I-measure µ* to the rest of the Shannon measures (i.e., conditional entropy, mutual information, and conditional mutual information). For a set Σ of mutual information and entropy terms, we define m(Σ) := ∪_{σ∈Σ} m(σ).

Theorem 8.3 [Yeu08]. Let Ω = {X_1, . . . , X_n}, and let F_n be the field generated by the sets m(X_1), . . . , m(X_n). Let µ+ : F_n → R_+ be any set-additive function that assigns non-negative values to the atoms of F_n. Then the function H : 2^Ω → R_+ defined as H(X_α) = µ+(m(X_α)) is a polymatroid (i.e., it satisfies the polymatroid inequalities (2.2) and (2.3)).
In the following lemma, we apply Theorem 8.3 to characterize exact implication in the polymatroid cone Γ_n.

Lemma 8.4. Let Σ be a set of information-theoretic terms and τ an information-theoretic term. Then Γ_n |= EI(Σ ⇒ τ) if and only if m(τ) ⊆ m(Σ).
8.1. Proof from Section 4. In this section we formally prove Theorem 4.11 from Section 4, which relies on the I-measure. For brevity, we denote the intersection of the sets corresponding to variables A and B (e.g., m(A) ∩ m(B)) by m(A)m(B).
Example 8.5. Consider the parity function on three binary random variables: Z = X ⊕ Y, where X, Y are uniformly distributed, independent binary variables. We first observe that the variables are pairwise independent. Clearly, X and Y are independent. It is also easy to see that P(Z = z) = 1/2 for z ∈ {0, 1}, and that X and Z are also independent (and, symmetrically, Y and Z). Further, every variable is functionally determined by the other two (e.g., X = 0, Z = 0 ⇒ Y = 0). Let h_P denote the entropic function associated with P. Since X and Y are independent, I_{h_P}(X; Y) = 0. On the other hand, given Z = z, the variables X, Y are not independent. We can see this formally:

I_{h_P}(X; Y|Z) = h_P(XZ) + h_P(YZ) − h_P(Z) − h_P(XYZ)
= h_P(X) + h_P(Y) + h_P(Z) − (h_P(Z|XY) + h_P(XY))
= h_P(X) + h_P(Y) + h_P(Z) − h_P(XY)
= h_P(Z) = 2 · (1/2) log 2 = log 2 > 0 (8.15)

where we used pairwise independence (h_P(XZ) = h_P(X) + h_P(Z), h_P(YZ) = h_P(Y) + h_P(Z), h_P(XY) = h_P(X) + h_P(Y)) and the fact that Z is determined by XY (h_P(Z|XY) = 0). We now see how this is related to the I-measure µ*; by the definition of µ* (see Table 3):

If µ*(a) ≥ 0 for all atoms a ∈ A, then µ* is called a positive measure. A polymatroid is said to be positive if its I-measure is positive, and we denote by ∆_n the cone of positive polymatroids.
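The computation in Example 8.5 can be replayed numerically (a sketch of ours, working in bits, so log 2 = 1):

```python
# Numeric replay (ours) of Example 8.5: X, Y independent uniform bits, Z = X xor Y.
from collections import Counter
from math import log2

outcomes = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)]  # each has probability 1/4
X, Y, Z = 0, 1, 2

def H(coords):
    """Entropy (in bits) of the marginal on the given coordinates."""
    c = Counter(tuple(o[i] for i in coords) for o in outcomes)
    n = len(outcomes)
    return -sum((k / n) * log2(k / n) for k in c.values())

I_XY = H([X]) + H([Y]) - H([X, Y])                            # X, Y independent
I_XZ = H([X]) + H([Z]) - H([X, Z])                            # X, Z independent
I_XY_given_Z = H([X, Z]) + H([Y, Z]) - H([Z]) - H([X, Y, Z])  # 1 bit = log 2
```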
Theorem 8.3 states that any I-measure assigning non-negative values to its atoms induces a polymatroid. As a corollary, we get, in Lemma 8.4, a set-theoretic characterization of exact implication in the polymatroid cone. The result of Lemma 8.4, combined with the assumption of a non-negative I-measure, gives us a 1-relaxation in the cone of positive polymatroids ∆_n (Theorem 8.6).
Theorem 8.6. Exact implication in the cone of positive polymatroids admits a 1-relaxation.
Proof. By Lemma 8.4, we have m(τ) ⊆ m(Σ). Hence, m(τ) = ∪_{σ∈Σ} (m(τ) ∩ m(σ)). By the assumption that the I-measure µ* is non-negative, we get: µ*(m(τ)) ≤ Σ_{σ∈Σ} µ*(m(σ)). Since, by Theorem 8.2, µ* is the unique signed measure consistent with all Shannon information measures, the result follows.
8.2. P_n coincides with the cone of positive polymatroids. We describe the structure of the I-measure of a step function, and prove that the conic hull of the step functions, P_n = conhull(S_n), and the cone of positive polymatroids, ∆_n ⊆ Γ_n, coincide; that is, ∆_n = P_n. Let U ⊆ [n], and let s_U denote the step function at U. In the rest of this section we prove Theorem 8.7.
Theorem 8.7. It holds that ∆ n = P n .
To prove this theorem, we require Lemma 8.8, which characterizes the I-measure of step functions. In particular, the lemma shows that step functions are positive polymatroids (i.e., have positive I-measures).