Expressiveness of SHACL Features and Extensions for Full Equality and Disjointness Tests

SHACL is a W3C-proposed schema language for expressing structural constraints on RDF graphs. Recent work on formalizing this language has revealed a striking relationship to description logics. SHACL expressions can use three fundamental features that are not so common in description logics. These features are equality tests; disjointness tests; and closure constraints. Moreover, SHACL is peculiar in allowing only a restricted form of expressions (so-called targets) on the left-hand side of inclusion constraints. The goal of this paper is to obtain a clear picture of the impact and expressiveness of these features and restrictions. We show that each of the three features is primitive: using the feature, one can express boolean queries that are not expressible without using the feature. We also show that the restriction that SHACL imposes on allowed targets is inessential, as long as closure constraints are not used. In addition, we show that enriching SHACL with "full" versions of equality tests, or disjointness tests, results in a strictly more powerful language.


Introduction
On the Web, the Resource Description Framework (RDF [RDF14]) is a standard format for representing knowledge and publishing data. RDF represents information in the form of directed graphs, where labeled edges indicate properties of nodes. To facilitate more effective access and exchange, it is important for a consumer of an RDF graph to know what properties to expect, or, more generally, to be able to rely on certain structural constraints that the graph is guaranteed to satisfy. We therefore need a declarative language in which such constraints can be expressed formally. In database terms, we need a schema language.
Two prominent proposals in this vein have been ShEx [BGP17] and SHACL [SHA17]. Both approaches use formulas that express the presence or absence of certain properties of a node or its neighbors in the graph. Such formulas are called "shapes." When we evaluate a shape on a node, that node is called the "focus node." Some examples of shapes, expressed for now in English, could be the following:
(1) "The focus node has a phone property, but no email."
(2) "The focus node has at least five incoming managed-by edges."
(3) "Through a path of friend edges, the focus node can reach a node with a CEO-of edge to the node Apple."
(4) "The focus node has at least one colleague who is also a friend."
(5) "The focus node has no other properties than name, address, or birthdate."
In this paper, we look deeper into SHACL, the language recommended by the World Wide Web Consortium. We do not use the actual SHACL syntax, but work with the elegant formalization proposed by Corman, Reutter and Savkovic [CRS18], and used in subsequent works by several authors [ACO+20, LS+20, PK+20]. That formalization reveals a striking similarity between shapes on the one hand, and concepts, familiar from description logics [BHLS17], on the other hand. The similarity between SHACL and description logics runs even deeper when we account for targeting, which is the actual mechanism to express constraints on an RDF graph using shapes.
Specifically, a non-recursive shape schema is essentially a finite list of shapes, where each shape ϕ is additionally associated with a target query q. An RDF graph G is said to conform to such a schema if for every target-shape combination (q, ϕ), and every node v returned by q on G, we have that v satisfies ϕ in G. Let us see some examples of target-shape pairs, still expressed in English:
(6) "Every node of type Person has an email or phone property." Here, the target query returns all nodes with an edge labeled type to node Person; the shape checks that the focus node has an email or phone property.
(7) "Different nodes never have the same email." Here the target query returns all nodes with an incoming email edge, and the shape checks that the focus node does not have two or more incoming email edges.
(8) "Every mathematician has a finite Erdős number." Here the target query returns all nodes of type Mathematician, and the shape checks that the focus node can reach the node Erdős by a path that matches the regular expression (author⁻/author)*. Here, the minus superscript denotes an inverse edge.
Interestingly, and apparent in the examples (6)-(8), the target queries considered for this purpose in SHACL, as well as in ShEx, actually correspond to simple cases of shapes. It is then only a small step to consider generalized shape schemas as finite sets of inclusion statements of the form ϕ₁ ⊆ ϕ₂, where ϕ₁ and ϕ₂ are shapes. Since, as noted above, shapes correspond to concepts, we thus see that shape schemas correspond to TBoxes in description logics.
We stress that the task we are focusing on in this paper is checking conformance of RDF graphs against shape schemas. Every shape schema S defines a decision problem: given an RDF graph G, check whether G conforms to S. In database terms, we are processing a boolean query on a graph database. In description logic terms, this amounts to model checking of a TBox: given an interpretation, check whether it satisfies the TBox. Thus our focus is a bit different from that of typical applications of description logics. There, facts

Remark 2.2. Universal quantification ∀E.ϕ could be introduced as an abbreviation for ¬∃E.¬ϕ. Likewise, ≤n E.ϕ may be used as an abbreviation for ¬(≥n+1 E.ϕ).
Remark 2.3. In our formalization, a path expression can be "id". We show in Lemma 3.3 that every path expression is equivalent to id, E′ ∪ id, or E′, where E′ does not use id. In real SHACL, it is possible to write E′ ∪ id using "zero-or-one" path expressions. Explicitly writing id is not possible, but this poses no problem. Path expressions can only appear in counting quantifiers, equality shapes, and disjointness shapes. The shape ≥n id.ϕ is clearly equivalent to ϕ if n = 1; otherwise, it is equivalent to ¬⊤. The shapes eq(E, p) or disj(E, p) where E is id are implicitly expressible in SHACL by writing the equality or disjointness constraint in node shapes, rather than property shapes.
A vocabulary Σ is a subset of N ∪ P. A path expression is said to be over Σ if it only uses property names from Σ. Likewise, a shape is over Σ if it only uses constants from Σ and path expressions over Σ.
Following common practice in logic, shapes are evaluated in interpretations. We recall the familiar definition of an interpretation. Let Σ be a vocabulary. An interpretation I over Σ consists of
• a set ∆^I, called the domain of I;
• for each constant c ∈ Σ, an element c^I ∈ ∆^I; and
• for each property name p ∈ Σ, a binary relation p^I on ∆^I.
(Footnote 3: In practice, node names and property names are IRIs [RDF14], so the disjointness assumption would not hold. However, this assumption is only made for simplicity of notation; it can be avoided if we make our notation for vocabularies and interpretations (see below) more complicated.)
The semantics of shapes is now defined as follows.
• On any interpretation I as above, every path expression E over Σ evaluates to a binary relation ⟦E⟧^I on ∆^I, defined in Table 1.
• Now for any shape ϕ over Σ and any element a ∈ ∆^I, we define when a conforms to ϕ in I, denoted by I, a ⊨ ϕ. For the boolean operators ⊤ (true), ∧ (conjunction), ∨ (disjunction), ¬ (negation), the definition is obvious. For the other constructs, the definition is given in Table 2, taking note of the following:
  - We use the notation R(x), for a binary relation R, to denote the set {y | (x, y) ∈ R}. We apply this notation to the case where R is of the form ⟦E⟧^I.
  - We also use the notation ♯X for the cardinality of a set X.
• For a shape ϕ and interpretation I, the notation ⟦ϕ⟧^I denotes the set of all elements a ∈ ∆^I such that I, a ⊨ ϕ.

Example 2.4. In the Introduction we already gave three examples (1), (2), and (3) of shapes expressed in English and the formal syntax. The target query of example (7) from the Introduction can be expressed as the shape ∃email⁻.⊤. The shape of example (7) can be written as ≤1 email⁻.⊤.

Formally, we define a graph as a finite set of triples of the form (a, p, b), where p is a property name and a and b are (not necessarily distinct) node names. We refer to the node names appearing in a graph G simply as the nodes of G; the set of nodes of G is denoted by N_G. A pair (a, b) with (a, p, b) ∈ G is referred to as an edge, or a p-edge, in G.

Graphs and their interpretation. Remember from
We now canonically view any graph G as an interpretation over the full vocabulary N ∪ P as follows:
• ∆^G equals N (the universe of all node names).
• c^G equals c itself, for every node name c.
• p^G equals the set of p-edges in G, for every property name p.
Note that since graphs are finite, p^G will be empty for all but a finite number of p's.
Given this canonical interpretation, path expressions and shapes obtain a semantics on all graphs G. Thus for any path expression E, the binary relation ⟦E⟧^G on N is well-defined; for any shape ϕ and a ∈ N, it is well-defined whether or not G, a ⊨ ϕ; and we can also use the notation ⟦ϕ⟧^G.
Remark 2.5. Since a graph is considered to be an interpretation with the infinite domain N, it may not be immediately clear that shapes can be effectively evaluated over graphs. Adapting well-known methods, however, we can reduce to a finite domain over a finite vocabulary [AHV95, Theorem 5.6.1], [AGSS86, HS94]. Formally, let ϕ be a shape and let G be a graph. Recall that N_G denotes the set of nodes of G; similarly, let P_G be the set of property names appearing in G. Let C be the set of constants mentioned in ϕ. We can then form the finite vocabulary Σ = N_G ∪ C ∪ P_G. Now define the interpretation I over Σ as follows: the domain ∆^I is N_G ∪ C ∪ {⋆}, where ⋆ is a fresh element; c^I = c for every node name c ∈ Σ; and p^I = p^G for every property name p ∈ Σ. Note that no constant symbol names ⋆ in I. Then for every x ∈ N_G ∪ C, one can show that x ∈ ⟦ϕ⟧^G if and only if x ∈ ⟦ϕ⟧^I. For all other node names x, one can show that x ∈ ⟦ϕ⟧^G if and only if ⋆ ∈ ⟦ϕ⟧^I.
Example 2.6 (Example 2.4 continued). Consider the graph G_ex depicted in Figure 1. This graph can be seen as the interpretation I_ex with an infinite domain containing the elements a, b, m₁, and m₂. It interprets the predicate name email as {(a, m₁), (b, m₁), (b, m₂)} and all other predicate names as the empty set. If we look at the interpretation of example (7) from the Introduction in I_ex, we have ⟦≤1 email⁻.⊤⟧^{I_ex} = N − {m₁} for the shape, since m₁ is the only element with two incoming email edges, and ⟦∃email⁻.⊤⟧^{I_ex} = {m₁, m₂} for the target.

2.3. Targets and shape schemas. SHACL identifies four special forms of shapes and calls them targets:
Node targets: {c} for any constant c.
Class-based targets: ∃type/subclass*.{c} for any constant c. Here, type and subclass represent distinguished IRIs from the RDF Schema vocabulary [RDF14].
Subjects-of targets: ∃p.⊤ for any property name p.
Objects-of targets: ∃p⁻.⊤ for any property name p.
We now define a generalized shape schema (or shape schema for short) as a finite set of inclusion statements, where an inclusion statement is of the form ϕ 1 ⊆ ϕ 2 , with ϕ 1 and ϕ 2 shapes.A target-based shape schema is a shape schema that only uses targets, as defined above, on the left-hand sides of its inclusion statements.This restriction corresponds to the shape schemas considered in real SHACL.
As already explained in the Introduction, a graph G conforms to a shape schema S, denoted by G ⊨ S, if ⟦ϕ₁⟧^G is a subset of ⟦ϕ₂⟧^G, for every statement ϕ₁ ⊆ ϕ₂ in S. Thus, any shape schema S defines the class of graphs that conform to it. We denote this class of graphs by ⟦S⟧ := {graph G | G ⊨ S}. Accordingly, two shape schemas S₁ and S₂ are said to be equivalent if ⟦S₁⟧ = ⟦S₂⟧.
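As an illustration of the conformance definition, the following minimal Python sketch checks the single inclusion ∃email⁻.⊤ ⊆ ≤1 email⁻.⊤ of example (7) on the graph G_ex of Example 2.6. The function names are hypothetical and the encoding of a graph as a Python set of triples is an assumption made for this sketch; since only the finitely many target nodes need to be inspected, the infinite domain N plays no role here.

```python
# Graph G_ex from Example 2.6, as a finite set of (subject, property, object) triples.
G_EX = {("a", "email", "m1"), ("b", "email", "m1"), ("b", "email", "m2")}


def incoming(graph, prop, node):
    """Subjects x such that (x, prop, node) is a triple of the graph."""
    return {s for s, p, o in graph if p == prop and o == node}


def objects_of_target(graph, prop):
    """Objects-of target (exists prop^-.T): nodes with an incoming prop-edge."""
    return {o for _, p, o in graph if p == prop}


def at_most_one_incoming(graph, prop, node):
    """Shape (<=1 prop^-.T) evaluated at a single focus node."""
    return len(incoming(graph, prop, node)) <= 1


def violations(graph, prop):
    """Target nodes that do not satisfy the shape."""
    return {v for v in objects_of_target(graph, prop)
            if not at_most_one_incoming(graph, prop, v)}


def conforms(graph, prop):
    """The graph conforms iff the target set is included in the shape."""
    return not violations(graph, prop)


print(violations(G_EX, "email"))  # {'m1'}
print(conforms(G_EX, "email"))    # False
```

The node m1 is returned by the target but has two incoming email edges, so G_ex does not conform to this one-statement schema.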

Expressiveness of SHACL features
When a complicated but influential new tool is proposed in the community, in our case SHACL, we feel it is important to have a solid understanding of its design. Concretely, as motivated in the Introduction, our goal is to obtain a clear picture of the relative expressiveness of the features eq, disj, and closed. Our methodology is as follows.
A feature set F is a subset of {eq, disj, closed}. The set of all shape schemas using only features from F, besides the standard constructs, is denoted by L(F). In particular, shape schemas in L(∅) use only the standard constructs and none of the three features. Specifically, they only involve shapes built from boolean connectives, constants, and qualified number restrictions, with path expressions built from property names, id, and the standard operators union, composition, and Kleene star.
We say that feature set F₁ is subsumed by feature set F₂, denoted by F₁ ⪯ F₂, if every shape schema in L(F₁) is equivalent to some shape schema in L(F₂). As it will turn out,
F₁ ⪯ F₂ ⇔ F₁ ⊆ F₂ (∗)
or intuitively, "every feature counts." Note that the implication from right to left is trivial, but the other direction is by no means clear from the outset. More specifically, for every feature, we introduce a class of graphs, as follows. In what follows we fix some property name r.
Equality: Q_eq is the class of graphs where all r-edges are symmetric. Note that Q_eq is definable in L(eq) by the single, target-based, inclusion statement ∃r.⊤ ⊆ eq(r⁻, r).
Disjointness: Q_disj is the class of graphs where all nodes with an outgoing r-edge have at least one symmetric r-edge. This time, Q_disj is definable in L(disj), by the single, target-based, inclusion statement ∃r.⊤ ⊆ ¬disj(r⁻, r).
Closure: Q_closed is the class of graphs where for all nodes with an outgoing r-edge, all outgoing edges have label r. Again Q_closed is definable in L(closed) by the single, target-based, inclusion statement ∃r.⊤ ⊆ closed(r).
We establish the following theorem, from which the above equivalence (∗) immediately follows:
Theorem 3.1. Let X ∈ {eq, disj, closed} and let F be a feature set with X ∉ F. Then Q_X is not definable in L(F).
For X = closed, Theorem 3.1 is proven differently than for the other two features. First, we deal with the remaining features through the following concrete result, illustrated in Figure 2. The formal definition of the graphs illustrated in Figure 2 for X = disj will be provided in Definition 3.8.
Proposition 3.2. Let X = disj or eq. Let Σ be a finite vocabulary including r, and let m be a nonzero natural number. There exist two graphs G and G′ with the following properties:
(1) G′ belongs to Q_X, but G does not.
(2) For every shape ϕ over Σ such that ϕ does not use X, and ϕ counts to at most m, we have ⟦ϕ⟧^G = ⟦ϕ⟧^{G′}.
Here, "counting to at most m" means that all quantifiers ≥n used in ϕ satisfy n ≤ m. For X = disj, this proposition is reformulated as Proposition 3.12, and for X = eq, this proposition is reformulated as Proposition 3.15.
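The three graph classes Q_eq, Q_disj, and Q_closed introduced above can be tested directly on a finite set of triples. The sketch below is only an illustration of their definitions: the function names are hypothetical, and a graph is assumed to be encoded as a Python set of (subject, property, object) triples.

```python
def r_edges(graph, r="r"):
    """All pairs (x, y) such that (x, r, y) is a triple of the graph."""
    return {(s, o) for s, p, o in graph if p == r}


def in_q_eq(graph, r="r"):
    """Q_eq: every r-edge is symmetric."""
    edges = r_edges(graph, r)
    return all((y, x) in edges for x, y in edges)


def in_q_disj(graph, r="r"):
    """Q_disj: every node with an outgoing r-edge has a symmetric r-edge."""
    edges = r_edges(graph, r)
    sources = {x for x, _ in edges}
    return all(any(a == x and (y, x) in edges for a, y in edges)
               for x in sources)


def in_q_closed(graph, r="r"):
    """Q_closed: nodes with an outgoing r-edge have only r-labeled outgoing edges."""
    sources = {s for s, p, _ in graph if p == r}
    return all(p == r for s, p, _ in graph if s in sources)


example = {("a", "r", "b"), ("b", "r", "a"), ("a", "r", "c")}
print(in_q_eq(example))      # False: the edge (a, c) has no converse
print(in_q_disj(example))    # True: a and b each have a symmetric r-edge
print(in_q_closed(example))  # True: only r-edges occur
```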
To see that Proposition 3.2 indeed establishes Theorem 3.1 for the two features under consideration (eq and disj), we use the notion of validation shape of a shape schema. This shape evaluates to the set of all nodes that violate the schema. Thus, the validation shape is an abstraction of the "validation report" in SHACL [SHA17]: a graph conforms to a schema if and only if the validation shape evaluates to the empty set. The validation shape can be formally constructed as the disjunction of ϕ₁ ∧ ¬ϕ₂ for all statements ϕ₁ ⊆ ϕ₂ in the schema. Now consider a shape schema S not using feature X. Let m be the maximum count used in shapes in S, and let Σ′ be the set of constants and property names mentioned in S. Now given Σ = Σ′ ∪ {r} and m, let G and G′ be the two graphs exhibited by the Proposition, and let ϕ be the validation shape for S. Then ϕ will evaluate to the same result on G and G′. However, for S to define Q_X, validation would have to return the empty set on G′ but a nonempty set on G. We conclude that S does not define Q_X.
We will prove Proposition 3.2 for X = disj in Section 3.2, and X = eq in Section 3.3.We will show Theorem 3.1 for X = closed in Section 3.4.However, we first need to establish some preliminaries on path expressions.
3.1. Preliminaries on path expressions. We call a path expression E equivalent to a path expression E′ when for every graph G, ⟦E⟧^G = ⟦E′⟧^G. We call a path expression E id-free whenever id is not present in the expression.
Lemma 3.3. Every path expression E is equivalent to: id, or E′ ∪ id, or E′, where E′ is an id-free path expression.
Proof.The proof is by induction on the structure of E. When E is id-free or id , the claim directly follows.We consider the following inductive cases: By induction, we consider nine cases.When both Consider the two cases where 1 an id -free path expression, then E is equivalent to E ′ * 1 and clearly E ′ * 1 is id -free.Lastly, when E 1 is id-free, clearly E is as well.
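The normal form of Lemma 3.3 can also be computed mechanically. The following Python sketch is one possible way to do so, not the paper's own construction: path expressions are encoded as nested tuples (an encoding assumed here), normalize returns an id-free expression together with a flag saying whether id must be added, and evaluate interprets expressions over a finite stand-in domain (also an assumption; the paper's domain is all of N) so that the equivalence can be checked on small examples.

```python
ID = ("id",)

def prop(p): return ("prop", p)
def inv(p): return ("inv", p)
def union(e1, e2): return ("union", e1, e2)
def comp(e1, e2): return ("comp", e1, e2)
def star(e): return ("star", e)


def normalize(e):
    """Return (free, has_id): e is equivalent to id (free is None),
    to free ∪ id (has_id is True), or to free (has_id is False)."""
    kind = e[0]
    if kind == "id":
        return None, True
    if kind in ("prop", "inv"):
        return e, False
    if kind == "star":
        free, _ = normalize(e[1])
        # id* = id, and (E' ∪ id)* = E'* because the star is already reflexive.
        return (None, True) if free is None else (star(free), False)
    (f1, h1), (f2, h2) = normalize(e[1]), normalize(e[2])
    if kind == "union":
        parts, has_id = [f for f in (f1, f2) if f is not None], h1 or h2
    else:  # composition: distribute (f1 ∪ id)/(f2 ∪ id)
        parts, has_id = [], h1 and h2
        if f1 is not None and f2 is not None:
            parts.append(comp(f1, f2))
        if f1 is not None and h2:
            parts.append(f1)
        if f2 is not None and h1:
            parts.append(f2)
    free = None
    for part in parts:
        free = part if free is None else union(free, part)
    return free, has_id


def rebuild(free, has_id):
    """Turn the pair returned by normalize back into a path expression."""
    if free is None:
        return ID
    return union(free, ID) if has_id else free


def uses_id(e):
    """True if id occurs syntactically in the expression."""
    return e[0] == "id" or any(uses_id(s) for s in e[1:] if isinstance(s, tuple))


def evaluate(e, graph, domain):
    """Binary relation denoted by e on a finite domain."""
    kind = e[0]
    if kind == "id":
        return {(x, x) for x in domain}
    if kind == "prop":
        return {(s, o) for s, p, o in graph if p == e[1]}
    if kind == "inv":
        return {(o, s) for s, p, o in graph if p == e[1]}
    if kind == "union":
        return evaluate(e[1], graph, domain) | evaluate(e[2], graph, domain)
    if kind == "comp":
        left, right = evaluate(e[1], graph, domain), evaluate(e[2], graph, domain)
        return {(x, z) for x, y in left for y2, z in right if y == y2}
    step = evaluate(e[1], graph, domain)      # Kleene star
    closure = {(x, x) for x in domain}
    while True:
        bigger = closure | {(x, z) for x, y in closure for y2, z in step if y == y2}
        if bigger == closure:
            return closure
        closure = bigger


graph = {("a", "p", "b"), ("b", "q", "a")}
domain = {"a", "b", "z"}                      # "z" stands for a node outside the graph
e = comp(union(prop("p"), ID), star(union(ID, inv("q"))))
free, has_id = normalize(e)
print(has_id, uses_id(free))                  # False False
```

On this example the result is an id-free expression, which matches the case of the lemma where a starred subexpression absorbs id.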
We also need the notion of "safe" path expressions together with the following Lemma, detailing how path expressions can behave on the nodes outside a graph.One can divide all path expressions into the "safe" and the "unsafe" ones.Definition 3.4 (Safety).A path expression is safe if one of the following conditions holds: Lemma 3.5.Let E be an id -free path expression and let G be a graph.
Proof.By induction.If E is a property name or its inverse, then the claim clearly holds.Now assume E is of the form E 1 ∪ E 2 .The cases where both E 1 and E 2 are safe, or both are unsafe, are clear by induction.
The same reasoning can be used when E 2 is safe but E 1 is not.
Next, assume E is of the form E 1 /E 2 .Furthermore assume E 1 is safe, so that E is safe.Let (x, y) ∈ E G .Then there exists z such that (x, z) ∈ E 1 G and (z, y) ∈ E 2 G .Since E 1 is safe, x and z are in N G .Now regardless of whether E 2 is safe or not, since (z, y) ∈ E 2 G and z ∈ N G , we get y ∈ N G as desired.The same reasoning can be used when E 2 is safe.
If E is not safe, we verify that For the inclusion from left to right, take (x, y) ∈ E G .Then there exists z such that (x, z) ∈ E 1 G and (z, y) ∈ E 2 G .By induction, there are four cases.If both (x, z) and (z, y) are in Lastly, the two cases where one of (x, z) and (z, y) is in N G × N G and the other in {(a, a) | a ∈ N − N G }, are not possible.
For the inclusion from right to left, take (x, y) G and (a, a) ∈ E 2 G since E 1 and E 2 are not safe.We conclude (a, a) ∈ E 1 /E 2 G as desired.Next, assume E is of the form E * 1 .Note that E is unsafe.By definition of Kleene star, we only need to verify that y, the claim clearly holds.Otherwise, we consider two cases: Lastly, we define the notion of a string, together with the following Lemma, detailing a convenient property of path expressions.Definition 3.6.A string s is a path expression of the form: id , or s ′ /p or s ′ /p − where s ′ is a string and p is a property name.
Lemma 3.7. For every path expression E and every natural number n, there exists a finite non-empty set of strings U such that for every graph G with at most n nodes we have ⟦E⟧^G = ⋃_{s∈U} ⟦s⟧^G.
The proof of Lemma 3.7 can be found in the appendix.
3.2. Disjointness. We present here the proof for X = disj. The general strategy is to first characterize the behavior of path expressions on G and G′. Then the Proposition is proven with a stronger induction hypothesis, to allow the induction to carry through. A similar strategy is followed in the proof for X = eq.
We begin by defining the graphs G and G ′ more formally.
Definition 3.8 (G_disj(Σ, m)). Let Σ be a finite vocabulary including r, and let m be a natural number. We define the graph G_disj(Σ, m) over the set of property names in Σ as follows. Let M = max(m, 3). There are 4M nodes in the graph, which are chosen outside of Σ. We denote these nodes by x_i^j for i = 1, 2, 3, 4 and j = 1, . . ., M. (In the description that follows, subscripts range from 1 to 4 and superscripts range from 1 to M.) For each property name p in Σ, the graph has the same set of p-edges. We describe these edges next. There is an edge from x_i^j to x_{(i mod 4)+1}^{j′} for every i, j and j′. Moreover, if i is 2 or 4, there is an edge from x_i^j to x_i^{j′} for all j ≠ j′. So, formally, we have:
G_disj(Σ, m) := {(x_i^j, p, x_{(i mod 4)+1}^{j′}) | i ∈ {1, . . ., 4} and j, j′ ∈ {1, . . ., M} and p ∈ Σ ∩ P}
∪ {(x_i^j, p, x_i^{j′}) | i ∈ {2, 4} and j, j′ ∈ {1, . . ., M} and j ≠ j′ and p ∈ Σ ∩ P}.
Thus, in Figure 2, bottom left, one can think of the left oval as the set of nodes x_1^j; the top cloud as the set of nodes x_2^j; and so on. We call the nodes x_i^j with i = 2, 4 the even nodes, and the nodes x_i^j with i = 1, 3 the odd nodes.
Definition 3.9 (G′_disj(Σ, m)). We define the graph G′_disj(Σ, m) in the same way as G_disj(Σ, m) except that there is an edge from x_i^j to x_i^{j′} for all i and j ≠ j′ (not only for even i values).
We characterize the behavior of path expressions on the graph G_disj(Σ, m) as follows.
Lemma 3.10.Let G be G disj (Σ, m).Call a path expression simple if it is a union of expressions of the form s 1 / . . ./s n , where n ≥ 1 and one of the s i is a property name while the other s i are "id ".Let E be a non-simple, id -free path expression over Σ.The following three statements hold: (1) (A) for all even nodes v of G, we have E G (v) ⊇ r G (v); or (B) for all even nodes v of G, we have Proof.For i = 1, 2, 3, 4, define the i-th blob of nodes to be the set X i = {x 1 i , . . ., x M i } (see Figure 2).We also use the notations next(1) = 2; next(2) = 3; next(3) = 4; next(4) = 1; prev (4) = 3; prev (3) = 2; prev (2) = 1; prev (1) = 4. Thus next(i) indicates the next blob in the cycle, and prev (i) the previous.
The proof is by induction on the structure of E. If E is a property name, E is simple so the claim is trivial. If E is of the form p⁻, cases B and D are clear and we only need to verify the third statement. That holds because for any i, if v ∈ X_i, then ⟦p⁻⟧^G(v) ⊇ X_prev(i) and clearly X_prev(i) − ⟦r⟧^G(v) ≠ ∅. We next consider the inductive cases.
First, assume E is of the form E 1 ∪E 2 .When at least one of E 1 and E 2 is not simple, the three statements immediately follow by induction, since If E 1 and E 2 are simple, then E is simple and the claim is trivial.
Next, assume E is of the form E₁*. If E₁ is not simple, the three statements follow immediately by induction, since ⟦E⟧^G ⊇ ⟦E₁⟧^G. If E₁ is simple, cases A and C clearly hold for E, so we only need to verify the third statement. That holds because, by the form of E, every node v is in ⟦E⟧^G(v), but not in ⟦r⟧^G(v), as G does not have any self-loops.
Finally, assume E is of the form E₁/E₂. Note that if E₁ or E₂ is simple, clearly cases A and C apply to them. The argument that follows will therefore also apply when E₁ or E₂ is simple. We will be careful not to apply the induction hypothesis for the third statement to E₁ and E₂.
We first focus on the even nodes, and show the first and the third statement.We distinguish two cases.
• If case A applies to E 2 , then we show that case A also applies to E. Let v ∈ X i be an even node.We verify the following two inclusions: G regardless of whether case A or B applies to E 1 .By case A for E 2 , we also have (w, u) • If case B applies to E 2 , then we show that case B also applies to E. This is analogous to the previous case, now verifying that In both cases, the third statement now follows for even nodes v.
We next focus on the odd nodes, and show the second and the third statement. We again consider two cases.
• If case C applies to E 1 , then we show that case C also applies to E. Let v ∈ X i be an odd node.Note that r G (v) = X next(i) .To verify that whether case A or B applies to E 2 .We obtain (v, u) ∈ E G as desired.
We also verify the third statement for odd nodes in this case.We distinguish two further cases.
-If case A applies to E 2 , any node u ∈ X next(next(i)) belongs to E G (v), and clearly these u are not in and again these u are not in X next(i) .• If case D applies to E 1 , then we show that case D also applies to E. This is analogous to the previous case, now verifying that E G (v) ⊇ X prev (i) .In this case the third statement for odd nodes is clear, as clearly X prev (i) − X next(i) ̸ = ∅.
We similarly characterize the behavior of path expressions on the other graph.
Lemma 3.11.Let G ′ be G ′ disj (Σ, m) and let E be a non-simple, id -free path expression over Σ.The following statements hold: Proof.The proof is similar to the proof of Lemma 3.10, but simpler due to the homogeneous nature of the graph G ′ .We omit the proof.
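To make Definitions 3.8 and 3.9 concrete, the following Python sketch builds both graphs for a given set of property names and a given m, and checks their membership in Q_disj. It is an illustration only: the function names are hypothetical, the node x_i^j is encoded as the pair (i, j), and graphs are encoded as Python sets of triples.

```python
def build_disj_graphs(properties, m):
    """Return (G, G') in the style of Definitions 3.8 and 3.9.

    The pair (i, j) stands for the node x_i^j, with i in 1..4 and j in 1..M.
    """
    big_m = max(m, 3)
    indices = range(1, big_m + 1)
    g, g_prime = set(), set()
    for p in properties:
        for i in range(1, 5):
            nxt = i % 4 + 1                      # next blob in the cycle
            for j in indices:
                for j2 in indices:
                    cycle_edge = ((i, j), p, (nxt, j2))
                    g.add(cycle_edge)
                    g_prime.add(cycle_edge)
                    if j != j2:
                        inner_edge = ((i, j), p, (i, j2))
                        g_prime.add(inner_edge)  # G': inner edges in every blob
                        if i in (2, 4):
                            g.add(inner_edge)    # G: inner edges in even blobs only
    return g, g_prime


def in_q_disj(graph, r="r"):
    """Every node with an outgoing r-edge has at least one symmetric r-edge."""
    edges = {(s, o) for s, p, o in graph if p == r}
    sources = {x for x, _ in edges}
    return all(any(a == x and (y, x) in edges for a, y in edges)
               for x in sources)


g, g_prime = build_disj_graphs({"r", "p"}, m=2)
print(in_q_disj(g_prime))  # True
print(in_q_disj(g))        # False: odd nodes have no symmetric r-edge
```

This only confirms property (1) of Proposition 3.2 on one instance; the indistinguishability in property (2) is what the lemmas above are used for.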
We are now ready to prove the non-obvious part of Proposition 3.2 where X = disj .We use the following version of the proposition.Proposition 3.12.Let V be the common set of nodes of the graphs G = G disj (Σ, m) and Let ϕ be a shape over Σ that does not use disj , and that counts to at most m.Then either Proof.This is proven by induction on the structure of ϕ.
Next assume ϕ is of the form eq(E, p).Using Lemma 3.3, we distinguish four different cases for E.
• E is id .According to Lemma 3.10 and Lemma 3.11 E H will always contain either p H or p − H .In both cases, E H (v) clearly never equals id H (v) = {v}.Therefore, As the final base case, assume ϕ is of the form closed (R).If Σ contains a property name p not in R, then ϕ H ∩ V = ∅, since every node in H has an outgoing p-edge.Otherwise, i.e., if Σ ⊆ R, we have ϕ H ⊇ V , since every node in H has only outgoing edges labeled by property names in Σ.To see that, moreover, ϕ G = ϕ G ′ , it suffices to observe that trivially H, v ⊨ ϕ for all node names v / ∈ V .We next consider the inductive cases.The cases for the boolean connectives follow readily by induction.Finally, assume ϕ is of the form ≥ n E.ϕ 1 .By induction, there are two possibilities for ϕ 1 : path expressions can only reach nodes in some graph from nodes in that graph.
Next, when E is id -free or E ′ ∪ id with E ′ an id -free path expression, it suffices to show that ♯ E ′ H (v) ≥ n for all v ∈ V .By Lemmas 3.10 and 3.11 we know that E 1 H (v) contains r H (v) or r − H (v). Inspecting H, we see that each of these sets has at least max(3, m) ≥ n elements, as desired.Finally, when E is equivalent to an id -free path expression or whenever E simply does not use id , the argument is analogous to the previous case.
In both cases we still need to show that ϕ G = ϕ G ′ .We already showed that ϕ G ⊇ V and ϕ G ′ ⊇ V , or ϕ G ∩ V = ∅ and ϕ G ′ ∩ V = ∅.Therefore, towards a proof of the equality, we only need to consider the node names not in V .
For the inclusion from left to right, take x ∈ ϕ G − V .Since G, x ⊨ ϕ, there exist y 1 , . . ., y n such that (x, y i ) ∈ E G and G, y i ⊨ ϕ 1 for i = 1, . . ., n.However, since x / ∈ V , by Lemma 3.5, all y i must equal x.Hence, n = 1 and (x, x) ∈ E G and G, x ⊨ ϕ 1 .Then again by the same Lemma, (x, x) ∈ E G ′ , since G and G ′ have the same set of nodes V .Moreover, by induction, G ′ , x ⊨ ϕ 1 .We conclude that G ′ , x ⊨ ϕ as desired.The inclusion from right to left is argued symmetrically.
3.3. Equality. Next, we turn our attention to Proposition 3.2 for X = eq. We define the graphs from Figure 2 formally.
Definition 3.13. Let Σ be a finite vocabulary including r, and let m be a natural number. Choose a set V of node names outside Σ, of cardinality M := max(3, m + 1). Fix two arbitrary distinct nodes a and b from V. We define the graph G_eq(Σ, m) over the set of property names from Σ as follows. For each property name p in Σ, the set of p-edges in G_eq(Σ, m) equals (V × V) − {(b, a)}. We define the graph G′_eq(Σ, m) similarly, but with V × V as the set of p-edges. So, G′_eq(Σ, m) is a complete graph, and G_eq(Σ, m) is a complete graph with one edge (b, a) removed.
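The two graphs of Definition 3.13 are easy to build explicitly. The sketch below does so in Python and checks that the complete graph lies in Q_eq while the graph with the edge (b, a) removed does not. The function names and the node names v0, v1, . . . are assumptions made for illustration.

```python
def build_eq_graphs(properties, m):
    """Return (G, G') in the style of Definition 3.13, as sets of triples."""
    big_m = max(3, m + 1)
    nodes = [f"v{k}" for k in range(big_m)]
    a, b = nodes[0], nodes[1]
    # G': complete graph (self-loops included) for every property name.
    g_prime = {(x, p, y) for p in properties for x in nodes for y in nodes}
    # G: the same graph with the single edge (b, a) removed for every property.
    g = {(x, p, y) for (x, p, y) in g_prime if (x, y) != (b, a)}
    return g, g_prime


def in_q_eq(graph, r="r"):
    """Every r-edge is symmetric."""
    edges = {(s, o) for s, p, o in graph if p == r}
    return all((y, x) in edges for x, y in edges)


g, g_prime = build_eq_graphs({"r", "p"}, m=4)
print(in_q_eq(g_prime))  # True
print(in_q_eq(g))        # False: (a, b) is present but (b, a) is not
```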
Lemma 3.14.Let E be an id -free path expression over Σ and let Proof.The claim is obvious for G ′ eq (Σ, m), being a complete graph.So we focus on the graph G eq (Σ, m).The proof is by induction.If E is a property name or its inverse, the claim is clear.If E is of the form E 1 ∪ E 2 , the claim is immediate by induction.
Assume E is of the form E₁/E₂. We show that A applies. If A applies to E₁, this is clear, since we can follow any edge by E₁ and then stay at the head of the edge by E₂ using the self-loop. If B applies to E₁, the same can still be done for all edges except for (a, b), which is the only nonsymmetrical edge. To go from a to b by E, we go by E₁ from a to a node c distinct from a and b, then go by E₂ from c to b.
If E is of the form E * 1 , again A applies, since E * 1 contains E 1 /E 1 .We are now ready to prove the non-obvious part of Proposition 3.2 where X = eq.We use the following version of the proposition.Proposition 3.15.Let G be G eq (Σ, m) and let G ′ be G ′ eq (Σ, m).Let ϕ be a shape over Σ that does not use eq and that counts to at most m.Then either Proof.This is proven by induction on the structure of ϕ.Let H be G or G ′ .We focus directly on the relevant cases.Assume ϕ is of the form disj (E 1 , E 2 ).Lemma 3.14 clearly yields that ϕ H ∩ V = ∅.It again remains to verify that G, v ⊨ ϕ iff G ′ , v ⊨ ϕ for all node names v / ∈ V .By Lemma 3.5, for such v and H = G or G ′ , we indeed have H, v ⊨ ϕ if exactly one of E 1 and E 2 is safe.If both are safe or both are unsafe, we have H, v ⊭ ϕ.
The last base case of interest is the case where ϕ is of the form closed (R).This goes again exactly as in the proof for X = disj .
We next consider the inductive cases.The cases for the boolean connectives follow readily by induction.Finally, assume ϕ is of the form ≥ n E.ϕ 1 .By induction, there are two possibilities for ϕ 1 : since path expressions can only reach nodes in some graph from nodes in that graph.
• If ϕ 1 H ⊇ V , we distinguish three cases using Lemma 3.3.First, when E is id , then if n = 1, ϕ H ⊇ V .Otherwise, if n ̸ = 1, then ϕ H = ∅.Next, when E is id -free or E ′ ∪ id with E ′ an id -free path expression, it suffices to show that ♯ E ′ H (v) ≥ n for all v ∈ V .By Lemma 3.14, we know that E H (v) contains r H (v) or r − H (v).These sets contain at least M − 1 ≥ m ≥ n elements as desired.(The number M − 1 is reached only when H is G and v = b or v = a; otherwise the sets contain M elements.) The equality ϕ G = ϕ G ′ is shown in the same way as in the proof for X = disj (Section 3.2).
3.4. Closure. Without using closed, shapes cannot say anything about properties that they do not explicitly mention. We formalize this intuitive observation as follows. The proof is straightforward.
Lemma 3.16. Let Σ be a vocabulary, let E be a path expression over Σ, and let ϕ be a shape over Σ that does not use closed. Let G₁ and G₂ be graphs such that ⟦p⟧^{G₁} = ⟦p⟧^{G₂} for every property name p in Σ. Then ⟦E⟧^{G₁} = ⟦E⟧^{G₂} and ⟦ϕ⟧^{G₁} = ⟦ϕ⟧^{G₂}.
Theorem 3.1 now follows readily for X = closed. Let F be a feature set without closed, let S be a shape schema in L(F), and let ϕ be the validation shape of S. Let p be a property name not mentioned in S, and different from r. Consider the graphs G = {(a, r, a), (a, p, a)} and G′ = {(a, r, a)}, so that G′ belongs to Q_closed but G does not. By Lemma 3.16 we have ⟦ϕ⟧^G = ⟦ϕ⟧^{G′}, showing that S does not define Q_closed.
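The two graphs used in this argument are small enough to check by hand, but a short Python sketch may still help. It verifies that G′ is in Q_closed while G is not, and that the two graphs agree on every property name other than p, which is the hypothesis of Lemma 3.16 for a vocabulary that omits p. The helper names are hypothetical and graphs are encoded as sets of triples.

```python
def in_q_closed(graph, r="r"):
    """Nodes with an outgoing r-edge have only r-labeled outgoing edges."""
    sources = {s for s, p, _ in graph if p == r}
    return all(p == r for s, p, _ in graph if s in sources)


def restrict(graph, vocabulary):
    """Keep only the triples whose property name belongs to the vocabulary."""
    return {(s, p, o) for s, p, o in graph if p in vocabulary}


G = {("a", "r", "a"), ("a", "p", "a")}
G_PRIME = {("a", "r", "a")}

print(in_q_closed(G_PRIME))                          # True
print(in_q_closed(G))                                # False
print(restrict(G, {"r"}) == restrict(G_PRIME, {"r"}))  # True
```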
Remark 3.17. Lemma 3.16 fails completely in the presence of closure constraints. The simplest counterexample is to consider Σ = ∅ and the shape closed(∅). Trivially, any two graphs agree on the property names from Σ. However, ⟦closed(∅)⟧^G, which equals the set of node names that do not have an outgoing edge in G (they may still have an incoming edge), obviously depends on the graph G.
The reader may wonder if this statement still holds under active domain semantics. In such semantics, which we denote by ⟦ϕ⟧^G_adom, we would view G as an interpretation with domain not the whole of N; rather we would take as domain the set N_G ∪ C, with C the set of constants mentioned in ϕ. When assuming active domain semantics, a modified lemma is required. To see this, consider the graphs G = {(a, p, b)} and G′ = {(a, p, b), (a, q, c)}. Let ϕ simply be ⊤. We have ⟦ϕ⟧^G_adom = {a, b} and ⟦ϕ⟧^{G′}_adom = {a, b, c}, so Lemma 3.16 no longer holds. We can, however, give the following more refined variant of Lemma 3.16:
Lemma 3.18. Let Σ be a vocabulary, let E be a path expression over Σ, and let ϕ be a shape over Σ that does not use closed. Let I₁ and I₂ be interpretations such that ⟦p⟧^{I₁} = ⟦p⟧^{I₂} for every property name p in Σ.
The same reasoning as given after Lemma 3.16, now using the new Lemma, then shows that closed is still primitive under active domain semantics.
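The counterexample for ⊤ under active domain semantics can be replayed in a few lines of Python. This is only a sketch under the assumption that the active domain is computed as the nodes of the graph together with the constants of the shape; the function names are hypothetical.

```python
def active_domain(graph, constants=()):
    """Nodes occurring in the graph, plus the constants mentioned in the shape."""
    return ({s for s, _, _ in graph} | {o for _, _, o in graph}
            | set(constants))


def top_adom(graph, constants=()):
    """Under active domain semantics, the shape T holds at every domain element."""
    return active_domain(graph, constants)


def p_edges(graph, p):
    """All pairs (x, y) such that (x, p, y) is a triple of the graph."""
    return {(s, o) for s, prop, o in graph if prop == p}


G = {("a", "p", "b")}
G_PRIME = {("a", "p", "b"), ("a", "q", "c")}

print(p_edges(G, "p") == p_edges(G_PRIME, "p"))  # True: the graphs agree on p
print(top_adom(G) == top_adom(G_PRIME))          # False: c is only in G'
```

The two graphs agree on the vocabulary {p}, yet ⊤ evaluates differently, which is why the refined lemma is phrased for interpretations rather than for graphs.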
Let G be any graph, and let G′ be the graph obtained from G by removing all triples involving property names not mentioned in ϕ. We reason as follows:

Remark 4.3. Note that we do not need class-based targets in the proof, so such targets are redundant on the left-hand sides of inclusions. This can also be seen directly: any inclusion ∃type/subclass*.{c} ⊆ ϕ with a class-based target is equivalent to the following inclusion with a subjects-of target:
∃type.⊤ ⊆ ¬∃type/subclass*.{c} ∨ ϕ

Remark 4.4. Theorem 4.1 fails in the presence of closure constraints. For example, the inclusion ¬closed(∅) ⊆ ∃r.⊤ defines the class of graphs where every node with an outgoing edge has an outgoing r-edge. Suppose this inclusion were equivalent to a target-based shape schema S, and let R be the set of all property names mentioned in the targets of S.
Let p be a property name not in R and distinct from r; let a be a node name not used as a constant in S; and consider the graph G = {(a, p, a)}.This graph trivially satisfies S, but violates the inclusion.
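The violation claimed for G = {(a, p, a)} is easy to verify directly. The sketch below only checks the inclusion ¬closed(∅) ⊆ ∃r.⊤ on small graphs; it does not model the schema S. Graphs are sets of triples, and the function names `has_outgoing` and `satisfies_inclusion` are hypothetical helpers introduced for illustration.

```python
def has_outgoing(graph, node, prop=None):
    """True if node has an outgoing edge (optionally restricted to prop)."""
    return any(s == node and (prop is None or p == prop) for s, p, _ in graph)


def satisfies_inclusion(graph, r="r"):
    """Check ¬closed(∅) ⊆ ∃r.⊤ on a finite graph.

    The nodes satisfying ¬closed(∅) are exactly the subjects of the graph,
    so it suffices to test those; all other nodes satisfy the inclusion
    vacuously.
    """
    subjects = {s for s, _, _ in graph}
    return all(has_outgoing(graph, n, r) for n in subjects)


# The graph from the remark: a single p-loop on a, with p distinct from r.
G = {("a", "p", "a")}
print(satisfies_inclusion(G))                       # False
print(satisfies_inclusion(G | {("a", "r", "b")}))   # True
print(satisfies_inclusion(set()))                   # True
```

The first call reports a violation because a has an outgoing edge but no outgoing r-edge, matching the argument in the remark.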

Extensions for full equality and disjointness tests
A quirk in the design of SHACL is that it only allows equality and disjointness tests eq(E₁, E₂) and disj(E₁, E₂) where E₁ can be a general path expression, but E₂ needs to be a property name. The next question we can ask is whether allowing "full" equality or disjointness tests, i.e., allowing a general path expression for E₂, strictly increases the expressive power. Within the community there are indeed plans to extend SHACL in this direction [Knu21, Jak22]. When we allow for such "full" equality and disjointness tests, it gives rise to two new features: full-eq and full-disj. Formally, we extend the grammar of shapes with two new constructs: eq(E₁, E₂) and disj(E₁, E₂).
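By analogy with the conditions listed for eq(E, p) and disj(E, p) in Table 2, the semantics of the full tests can be sketched as follows. This is an assumed reading obtained by replacing the property name p with a second path expression; the notation E^I(a) for the set of nodes reachable from a via E follows Table 2.

```latex
\begin{align*}
I, a \models \mathit{eq}(E_1, E_2)
  &\iff E_1^{I}(a) = E_2^{I}(a), \\
I, a \models \mathit{disj}(E_1, E_2)
  &\iff E_1^{I}(a) \cap E_2^{I}(a) = \emptyset.
\end{align*}
```

Under this reading, the tests of standard SHACL are the special case in which E₂ is a property name.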
Remark 5.1. We extend Remark 2.3 by noting that in real SHACL, we cannot explicitly write the shapes eq(id, id) and disj(id, id). However, these shapes are equivalent to ⊤ and ¬⊤ respectively. Indeed, id relates every focus node a only to itself, so the two sets being compared both equal {a}: they are always equal and never disjoint.
We are going to show that each of these new features strictly adds expressive power. Concretely, we introduce the following classes of graphs.
Full equality: Q_full-eq is the class of graphs in which every object of a p-edge has a set of p-subjects that differs from its set of q-subjects. Note that Q_full-eq is definable in L(full-eq) by the single, target-based, inclusion statement ∃p⁻.⊤ ⊆ ¬eq(p⁻, q⁻).
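Membership in the class defined by ∃p⁻.⊤ ⊆ ¬eq(p⁻, q⁻) can be tested directly on a finite graph. The following is a minimal sketch under the assumption that graphs are sets of triples; `inverse_image` and `in_q_full_eq` are illustrative names, not notation from the paper.

```python
def inverse_image(graph, prop, node):
    """The set prop⁻(node): all subjects s with an edge (s, prop, node)."""
    return {s for s, p, o in graph if p == prop and o == node}


def in_q_full_eq(graph, p="p", q="q"):
    """Check the inclusion ∃p⁻.⊤ ⊆ ¬eq(p⁻, q⁻) on a finite graph.

    Every node x that is the object of some p-edge must have
    p⁻(x) different from q⁻(x).
    """
    objects_of_p = {o for _, prop, o in graph if prop == p}
    return all(
        inverse_image(graph, p, x) != inverse_image(graph, q, x)
        for x in objects_of_p
    )


# b has p-subject a but no q-subject: the two sets differ.
print(in_q_full_eq({("a", "p", "b")}))                   # True
# b has exactly the same subjects for p and q: the graph is rejected.
print(in_q_full_eq({("a", "p", "b"), ("a", "q", "b")}))  # False
# No p-edges at all: the inclusion holds vacuously.
print(in_q_full_eq(set()))                               # True
```

Note that the test compares two inverse paths at the same focus node, which is precisely what the restricted equality test of standard SHACL, with a property name as second argument, cannot write down syntactically.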
Consider the case where ϕ is disj(E, r). We verify (i), (ii), (iii), and (iv). First, to see that (i) and (ii) hold, we observe that r^G … Next, to show (iii) where r = p, assume G, c ⊨ disj(E, p). By Lemma 3.7 we know there is a set of strings U equivalent to E in both G and G′. By Lemma 5.9 there are only 10 types of strings. We observe from Lemma 5.9 that for every U, ⋃_{s∈U} s^G(c) is disjoint from p^G(c) whenever U does not contain strings of type 1, 2, or 7. These are also exactly the U such that ⋃_{s∈U} s^{G′}(c) is disjoint from p^{G′}(c).
ϕ uses the same constants, quantifiers, and path expressions as P. For semipositive programs, this is shown using a fixpoint characterization of the minimal model; for stratified programs, this argument can then be applied repeatedly. The crux, however, is that graphs G and G′ of Proposition 3.2 will have the same ϕ. Indeed, by that Proposition, the fixpoints of the different strata will be reached on G and on G′ in the same stage. We effectively obtain an extension of Proposition 3.2, which establishes the theorem for features X other than closed.
Also for X = closed, the reasoning given after Lemma 3.16 extends in the same way to stratified shape schemas, since the graphs G and G′ used there again yield exactly the same evaluation for all shapes that do not use closed.
Extending Theorem 4.1. Theorem 4.1 also extends to stratified shape schemas. To this end, Lemma 3.16 needs to be reproven in the presence of a stratified program P defining the intensional shape names. The extended Lemma 3.16 then states that ϕ^{P(G)} = ϕ^{P(G′)}. The proof of Theorem 4.1 then goes through unchanged.
Extending Theorem 5.2. Theorem 5.2 also extends to stratified shape schemas, for the same reasons given above for Theorem 3.1.

Concluding remarks
An obvious open question is whether our results extend further to nonstratified programs, depending on various semantics that have been proposed for Datalog with negation, notably well-founded or stable models [AHV95, Tru18]. One must then deal with 3-valued models and, for stable models, choose whether the TBox should hold in every stable model (skeptical), or in at least one (credulous). For example, Andreşel et al. [ACO+20] adopt a credulous approach. In the same vein, even for stratified programs, one may consider maximal models instead of minimal ones, as suggested for ShEx [BGP17]. Unified approaches developed for logic programming semantics can be naturally applied to SHACL [BJ21].
Notably, Corman et al. [CRS18] have already suggested that disjointness is redundant in a setting of recursive shape schemas with nonstratified negation. Their expression is not correct, however [Reu21].⁵
A general question surrounding SHACL, even standard nonrecursive SHACL, is to understand better in which sense (if at all) this language is actually better suited for expressing constraints on RDF graphs than, say, SPARQL ASK queries [CFRS19, T+10, DMH+21]. Certainly, the affinity with description logics makes it easy to carve out cases where higher reasoning tasks become decidable [LS+20, PK+20]. It is also possible to show that nonrecursive SHACL is strictly weaker in expressive power than SPARQL. But does SHACL conformance checking really have a lower computational complexity? Can we think of novel query processing strategies that apply to SHACL but not easily to SPARQL? Are SHACL expressions typically shorter, or perhaps longer, than the equivalent SPARQL ASK expression? How do the expression complexities [Var82] compare?
⁵ Their approach is to postulate two shape names s1 and s2 that can be assigned arbitrary sets of nodes, as long as the two sets form a partition of the domain. Then for one node x to satisfy the shape disj(E, p), it is sufficient that E(x) is a subset of s1 and p(x) of s2. This condition is not necessary, however, as other nodes may require different partitions.
eq(E, p)    the sets E^I(a) and p^I(a) are equal
disj(E, p)  the sets E^I(a) and p^I(a) are disjoint
closed(R)   p^I(a) is empty for each p ∈ Σ − R
Table 2: Conditions for conformance of a node to a shape.
• a set ∆^I, called the domain of I;
• for each constant c ∈ Σ, an element c^I ∈ ∆^I; and
• for each property name p ∈ Σ, a binary relation p^I on ∆^I.

Figure 2: Graphs used to prove Proposition 3.2. The nodes are taken outside Σ. For X = eq, the cloud shown for G′ represents a complete directed graph on m + 1 nodes, with self-loops, and G is the same graph with one directed edge removed. For X = disj, in the picture for G, each cloud again stands for a complete graph, but this time on M = max(m, 3) nodes, and without the self-loops. Each oval stands for a set of M separate nodes. An arrow from one blob to the next means that every node of the first blob has a directed edge to every node of the next blob. So, G is a directed 4-cycle of alternating clouds and ovals, and G′ is a directed 4-cycle of clouds.

Table 1: Semantics of path expressions.
ϕ    I, a ⊨ ϕ if:
Table 2 that a shape closed(R) states that the focus node may only have outgoing properties that are mentioned in R. It may appear that such a shape is simply expressible as the conjunction of ¬∃p.⊤ for p ∈ Σ − R. However, since shapes must be finite formulas, this only works if Σ is finite. In practice,
is id-free and non-simple. Lemmas 3.10 and 3.11 tell us that E H