Axiomatizing Hybrid XPath with Data

In this paper we introduce sound and strongly complete axiomatizations for XPath with data constraints extended with hybrid operators. First, we present HXPath=, a multi-modal version of XPath with data, extended with nominals and the hybrid operator @. Then, we introduce an axiomatic system for HXPath=, and we prove it is strongly complete with respect to the class of abstract data models, i.e., data models in which data values are abstracted as equivalence relations. We prove a general completeness result similar to the one presented in, e.g., [BtC06], that ensures that certain extensions of the axiomatic system we introduce are also complete. The axiomatic systems that can be obtained in this way cover a large family of hybrid XPath languages over different classes of frames, for which we present concrete examples. In addition, we investigate axiomatizations over the class of tree models, structures widely used in practice. We show that a strongly complete, finitary, first-order axiomatization of hybrid XPath over trees does not exist, and we propose two alternatives to deal with this issue. We finally introduce filtrations to investigate the status of decidability of the satisfiability problem for these languages.


XPath as a Modal Logic with Data Tests
XPath is, arguably, the most widely used query language for the eXtensible Markup Language (XML). Indeed, XPath is implemented in XSLT [Wad00] and XQuery [Wor02] and it is used in many specification and update languages (e.g., Saxon). It is, fundamentally, a general purpose language for addressing, searching, and matching pieces of an XML document. It is an open standard and constitutes a World Wide Web Consortium (W3C) Recommendation [CD99]. In [KRV15] XPath is adapted to be used as a powerful query language over knowledge bases. Core-XPath [GKP05] is the fragment of XPath 1.0 containing the navigational behaviour of XPath. It can express properties of the underlying tree structure of the XML document, such as the label (tag name) of a node, but it cannot express conditions on the actual data contained in the attributes. In other words, it is essentially a classical modal logic [BdRV01,BvBW06]. For instance, the path expressions child, parent and descendant-or-self are basically the modal operators ♦, ♦⁻ and ♦*, respectively.
Core-XPath has been well studied from a modal logic point of view. For instance, its satisfiability problem is known to be decidable even in the presence of Document Type Definitions (DTDs) [Mar04,BFG08]. Moreover, it is known that it is equivalent, in terms of expressive power, to first-order logic with two variables (FO2) over an appropriate signature on trees [MdR05], and that it is strictly less expressive than Propositional Dynamic Logic (PDL) with converse over trees [BK08]. Sound and complete axiomatizations for Core-XPath have been introduced in [tCM09,tCLM10]. It has been argued that the study of XPath from a modal logic point of view is not merely a theoretical challenge but also has concrete applications. For instance, by investigating the expressive power of XPath fragments it is possible to determine if an integrity constraint can be simplified. Also, query containment and query equivalence (two fundamental tasks in query optimization) can be reduced to the satisfiability problem. In particular, these results have direct impact on, e.g., security [FCG04], type checking [MN07] and consistency of XML specifications [AFL02].
However, from a database perspective, Core-XPath falls short of defining the most important construct in a query language: the join. Without the ability to relate nodes based on the actual data values of the attributes, the logic's expressive power is inappropriate for many applications. The extension of Core-XPath with (in)equality tests between attributes of elements in an XML document is named Core-Data-XPath in [BMSS09]. Here, we will call this logic XPath=. Models of XPath= are usually data trees, which can be seen as XML documents. A data tree is a tree whose nodes contain a label from a finite alphabet and a data value from an infinite domain. From a modal logic perspective, these data trees are a particular class of relational models. In recent years other data structures have been considered, and XPath= has been used to query these structures. In particular, in this article we consider arbitrary data graphs as models for XPath=. Data graphs are the underlying mathematical structure in graph databases (see, e.g., [LV12,RWE13,AG08]) and it is important to study the metalogical properties of languages to query this particular kind of models (see, e.g., [LMV16,ABFF18]). In this respect, we focus on a variant of XPath= which provides us with several interesting expressivity features. Figure 1 shows an example of a data graph, where we can see that edges carry labels from a finite alphabet (such as "friends"), and whose nodes carry data values, usually presented as a "key: value" pair, with key being a label from a finite alphabet (e.g., "Name") and value being a data value from an infinite domain (e.g., "Alice").
One of the characteristic advantages of graph databases is that they directly exhibit the topology of the data. Most of the approaches for querying these databases have focused on exploring the topology of the underlying graph exclusively. However, little attention has been paid to queries that check how the actual data contained in the graph nodes relates to the topology. XPath= allows comparison of data values by equality or inequality, even though it does not grant access to the concrete data values themselves.
The main characteristic of XPath= is to allow formulas of the form ⟨α = β⟩ and ⟨α ≠ β⟩, where α, β are path expressions that navigate the graph using accessibility relations. So-called unorthodox inference rules (i.e., inference rules involving side conditions) will help us define an axiom system whose completeness can be automatically extended to more expressive logics.
Summing up, our contributions in this article are the following:
• We define a sound and strongly complete axiomatic system for multi-modal HXPath=, refining the axioms and rules first presented in [AF16].
• We give a Henkin-style completeness proof, proving that the system is strongly complete for any extension given by pure axioms (formulas involving only nominals) and existential rules (instantiations of first-order formulas with quantification pattern ∀∃ followed by an HXPath=-formula). This gives us a strong completeness result with respect to a wide variety of frame classes.
• Even though the main focus of our work is axiomatizing node expressions, we also provide as a by-product strong completeness results for path expressions.
• We discuss concrete extensions of HXPath= in which completeness is automatically obtained. In particular we explore backward and sibling navigation, and equality inclusion.
• We show that a strongly complete, finitary, first-order axiomatization of hybrid XPath over trees does not exist. Then we introduce an infinite pure axiomatic system that is strongly complete for a slightly larger class of tree models. We also prove that this system is weakly complete over tree models. As a corollary of the method we use, we conclude that the satisfiability problem for HXPath= on trees is decidable in ExpSpace.
• Finally, we prove that the satisfiability problem for the multi-modal logic HXPath= and some of its pure extensions is decidable in NExpTime, using filtrations to establish a bounded finite model property. To our knowledge, the best known lower bound can be obtained by PSpace-hardness of mono-modal hybrid XPath with forward navigation [AFS17].
Related Work. It is argued in [BtC06] that hybrid logic is a natural setting for deductive systems involving Kripke semantics. By having nominals in the language, it is possible to imitate the role of first-order constants to obtain a Henkin-style completeness proof. Moreover, hybrid operators provide all the necessary machinery to get general completeness results for a wide variety of systems. The claim is that allowing the use of unorthodox rules, we can define a basic axiomatic system that can be extended with pure axioms and existential rules, which lets us obtain complete extensions with respect to diverse classes of frames, automatically. One of the first proof systems for data-aware fragments of XPath was introduced in [BLS16]. It is a Gentzen-style system for a very restricted language named DataGL, interpreted on finite data trees. In DataGL formulas of the form ♦ = ϕ are read as: "the current node has the same data as a descendant where ϕ holds". They introduced a sound and weakly complete sequent calculus for DataGL and established PSpace-completeness for its validity problem.
An equational system which extends the one presented for navigational fragments of XPath in [tCLM10] was introduced in [ADFF17]. The authors consider first the fragment of XPath = without inequality tests, over data trees. Therein, the proof of weak completeness relies on a normal form theorem for both node and path expressions, and on a canonical model construction for consistent formulas in normal form inspired by [Fin75]. When inequality is also taken into account, similar ideas can be used but the canonical model construction and the corresponding completeness proof become quite complex. A Hilbert-style axiomatization for XPath = extended with hybrid operators and backward navigation was first introduced in [AF16], using ideas from hybrid logic: nominals play the role of first-order constants in a Henkin-style completeness proof. The system we present herein is a vast extension of the work from [AF16], and it automatically encompasses a large family of logics.
In [AFS17] nominals are used as labels in a tableaux system for a restricted variant of HXPath = with only one accessibility relation and one equality relation. The tableaux calculus is shown to be sound, complete and terminating, and it is used to prove that the satisfiability problem for such logic is PSpace-complete.
Organization. The article is organized as follows. In Section 2 we introduce the syntax, semantics and a notion of bisimulation for HXPath = . We define the axiomatic system HXP in Section 3 and we prove its completeness. We finish that section by introducing extensions of HXP and their complete axiomatizations with pure axioms and existential rules. In Section 4 we introduce an axiomatic system for XPath over the class of data trees. Then, we use filtrations to show that the satisfiability problem of HXPath = is decidable in Section 5. To conclude, in Section 6 we discuss the results and introduce future lines of research.

Hybrid XPath with Data
In this section we introduce the syntax and semantics for the logic we call Hybrid XPath with Data (HXPath = for short). Then, we present a notion of bisimulation for HXPath = .
2.1. Syntax and Semantics of HXPath=. Throughout the text, let Prop be a countable set of propositional symbols; let Nom be a countable set of nominals such that Prop ∩ Nom = ∅; and let Mod and Eq be sets of modal and equality symbols, respectively.
Definition 2.1 (Syntax). The sets PExp of path expressions (which we will note as α, β, γ, ...) and NExp of node expressions (which we will note as ϕ, ψ, θ, ...) of HXPath= are defined by mutual recursion as follows:
PExp ::= a | @_i | [ϕ] | αβ
NExp ::= p | i | ¬ϕ | ϕ ∧ ψ | ⟨α =_e β⟩ | ⟨α ≠_e β⟩
where p ∈ Prop, i ∈ Nom, a ∈ Mod, e ∈ Eq, ϕ, ψ ∈ NExp, and α, β ∈ PExp. We will refer to members of PExp ∪ NExp as expressions. In what follows, when referring to expressions of HXPath= we will reserve the term formula for members of NExp. We say that an expression is pure if it does not contain propositional symbols.
Notice that path expressions occur in node expressions in data comparisons of the form ⟨α =_e β⟩ and ⟨α ≠_e β⟩, while node expressions occur in path expressions in tests like [ϕ].
In what follows we will always use * for =_e and ≠_e, when the particular operator used is not relevant. Other Boolean operators are defined as usual: ϕ ∨ ψ := ¬(¬ϕ ∧ ¬ψ), and ϕ → ψ := ¬ϕ ∨ ψ. Below we define other operators as abbreviations.
Definition 2.2 (Abbreviations). Let α, β, δ be path expressions, γ_1, γ_2 path expressions or the empty string, ϕ a node expression, i a nominal, and p an arbitrary symbol in Prop. We define the following expressions:

As a corollary of the definition of the semantic relation below, the diamond and box expressions ⟨α⟩ϕ and [α]ϕ will have their classical meaning, and the same will be true for hybrid formulas of the form @_i ϕ. Notice that we use @_i both as a path expression and as a modality; the intended meaning will always be clear in context. Notice also that, following the standard notation in XPath logics and in modal logics, the [ ] operation is overloaded: for ϕ a node expression and α a path expression, both [α]ϕ and α[ϕ] are well-formed expressions; the former is a node expression where [α] is a box modality, the latter is a path expression where [ϕ] is a test.
We now define the structures that will be used to evaluate formulas in the language. An abstract data frame (or just frame) is a tuple F = ⟨M, {∼_e}_{e∈Eq}, {R_a}_{a∈Mod}⟩. Let F be a class of frames; its class of models is denoted Mod(F).

A concrete hybrid data model is a tuple M = ⟨M, D, {R_a}_{a∈Mod}, V, nom, data⟩, where M is a non-empty set of elements; D is a non-empty set of data; for each a ∈ Mod, R_a ⊆ M × M is the associated accessibility relation; V : M → 2^Prop is the valuation function; nom : Nom → M is a function which names some nodes; and data : Eq × M → D is a function which assigns a data value to each node of the model (for each type of equality considered in the model).
Concrete data models are most commonly used in applications, where we encounter data from an infinite alphabet (e.g., alphabetic strings) associated to the nodes in a semistructured database, and different ways of comparing this data. It is easy to see that to each concrete data model we can associate an abstract data model where data is replaced by an equivalence relation that links all nodes with the same data, for each equality symbol from Eq. Vice versa, each abstract data model can be "concretized" by assigning to each pair of a node and a symbol from Eq its equivalence class as data. We will prove soundness and completeness over the class of abstract data models and, as a corollary, obtain completeness over concrete data models.

As mentioned, it is a straightforward exercise to show that modal and hybrid operators have their intended meaning.

Example 2.6. We list below some HXPath= expressions together with their intuitive meaning:
There exists an α path between the current points of evaluation; the second node is named i.
There exists an α path between the node named i and some other node.
The node named i has the same data as the node named j, w.r.t. the data field e.
⟨α =_e @_i β⟩: There exists a node accessible from the current point of evaluation by an α path that has the same data as a node accessible from the point named i by a β path, w.r.t. the data field e.
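The passage between concrete and abstract data models described above can be illustrated with a minimal Python sketch. The function names `abstract_relation` and `concretize`, the dictionary encoding of the function data, and the three-node example (including the node A3 and the value "Bob") are our own illustrative assumptions and do not come from the formal development.

```python
from collections import defaultdict

def abstract_relation(nodes, data, e):
    """Partition `nodes` into the classes of the relation ~_e induced by
    equal data values for the equality symbol `e` (hypothetical encoding:
    `data` maps pairs (e, node) to a data value)."""
    classes = defaultdict(set)
    for m in nodes:
        classes[data[(e, m)]].add(m)
    return [frozenset(block) for block in classes.values()]

def concretize(partition):
    """Use the equivalence class of each node as its data value."""
    return {m: block for block in partition for m in block}

# Hypothetical concrete model with a single equality symbol "name".
nodes = ["A1", "A2", "A3"]
data = {("name", "A1"): "Alice", ("name", "A2"): "Alice", ("name", "A3"): "Bob"}

partition = abstract_relation(nodes, data, "name")
concrete = concretize(partition)
```

Two nodes receive the same value under `concretize` exactly when they lie in the same block, so comparisons by equality and inequality are preserved in both directions, even though the concrete values themselves are forgotten.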
The next example illustrates the expressivity gained by adding hybrid operators into the language, in a concrete example.
Example 2.7. Consider the following queries in the data graph from Figure 1 (we omit the subindex in the symbols = and ≠, since we deal with only one equality relation).
The person with the id A1 is a friend of someone born on the same day.
[@_A2 born[Value] = @_A2 friends born[Value]] All friends of the person with id A2 were born on a day different from hers.
the same value (both are named Alice) and they were born on different days.
In the last item, the conjunct ⟨@_A1[Name] = @_A2[Name]⟩ states that the nodes named by the nominals A1 and A2 contain a data value corresponding to the key Name, expressed by the test [Name]; also, these data values coincide, without referring to the actual value.
It is worth noting that with classical XPath= navigation, the first two properties can be expressed only if we are evaluating the formulas in the nodes with id A1 and A2, respectively. This is due to the fact that XPath= only allows local navigation. Moreover, the third formula is not expressible in XPath=, even in the presence of transitive closure operators, as it involves unconnected components of the graph.
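To make the interplay between navigation, nominals and data comparisons concrete, the following Python sketch evaluates expressions over a small finite concrete model. It assumes the usual semantic clauses for XPath with data: a path expression denotes a binary relation on nodes, @_i jumps to the node named i, and a comparison ⟨α =_e β⟩ (resp. ⟨α ≠_e β⟩) holds at a node when some α-successor and some β-successor carry equal (resp. different) data for e. The tuple encoding of expressions, the function names `path` and `holds`, and the three-node model are hypothetical and chosen only for illustration; the model is not the graph of Figure 1.

```python
def path(M, alpha):
    """Binary relation (set of node pairs) denoted by path expression alpha."""
    kind = alpha[0]
    if kind == "step":   # a, for a in Mod
        return set(M["rel"].get(alpha[1], set()))
    if kind == "at":     # @_i: from any node, jump to the node named i
        return {(m, M["nom"][alpha[1]]) for m in M["nodes"]}
    if kind == "test":   # [phi]: stay in place if phi holds
        return {(m, m) for m in M["nodes"] if holds(M, m, alpha[1])}
    if kind == "seq":    # composition of two path expressions
        left, right = path(M, alpha[1]), path(M, alpha[2])
        return {(m, k) for (m, n) in left for (n2, k) in right if n == n2}
    raise ValueError(f"unknown path expression: {kind}")

def holds(M, m, phi):
    """Truth of node expression phi at node m."""
    kind = phi[0]
    if kind == "prop":
        return phi[1] in M["val"].get(m, set())
    if kind == "nom":
        return M["nom"][phi[1]] == m
    if kind == "not":
        return not holds(M, m, phi[1])
    if kind == "and":
        return holds(M, m, phi[1]) and holds(M, m, phi[2])
    if kind in ("eq", "neq"):  # <alpha =_e beta> and <alpha !=_e beta>
        _, e, alpha, beta = phi
        ends_a = {n for (s, n) in path(M, alpha) if s == m}
        ends_b = {n for (s, n) in path(M, beta) if s == m}
        same = lambda n, z: M["data"][(e, n)] == M["data"][(e, z)]
        return any(same(n, z) == (kind == "eq") for n in ends_a for z in ends_b)
    raise ValueError(f"unknown node expression: {kind}")

# Hypothetical model: 1 -friends-> 2 -friends-> 3; i names 1, j names 3.
M = {
    "nodes": {1, 2, 3},
    "rel": {"friends": {(1, 2), (2, 3)}},
    "val": {2: {"p"}},
    "nom": {"i": 1, "j": 3},
    "data": {("e", 1): "Alice", ("e", 2): "Bob", ("e", 3): "Alice"},
}

same_name = ("eq", "e", ("at", "i"), ("at", "j"))            # <@_i =_e @_j>
friend_differs = ("neq", "e", ("step", "friends"), ("at", "i"))
to_p = ("seq", ("at", "i"), ("seq", ("step", "friends"), ("test", ("prop", "p"))))
i_has_p_friend = ("eq", "e", to_p, to_p)                     # encodes @_i <friends> p
```

Note that `same_name` and `i_has_p_friend` start with @_i, so their truth value does not depend on the evaluation node; this is the global behaviour that plain XPath= navigation lacks.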
Before concluding this section, we will introduce two classical notions that will be useful in the rest of the paper: the modal depth and the set of subformulas of a formula. These definitions are given on node and path expressions seen as strings, in particular they consider the empty string λ which is not in the language. However, these definitions work as intended.
Definition 2.8. We define the modal depth of an expression, by mutual recursion as follows.
Definition 2.9. We define the set of subformulas of an expression, by mutual recursion as follows.

A Note on Bisimulations.
In the rest of the paper we will sometimes need the notion of bisimulation for HXPath = . Bisimulation for XPath on data trees was investigated in [FFA14,FFA15]. In [ABFF16,ABFF18] it is shown that the same notion can be used in arbitrary models. It is simple to extend these definitions to take into account hybrid operators (see, e.g., [ABM01,BdRV01]).
We say that M, m and M′, m′ are HXPath=-bisimilar if there is an HXPath=-bisimulation Z such that m Z m′.
The notion of HXPath = -bisimulation extends the one for basic modal logic [BdRV01]. As usual, (Harmony) takes care of atomic information (both propositional symbols and nominals). The standard (Zig) and (Zag) should be strengthened to take into account that XPath modalities deal with two paths at the same time, and moreover equality of data values can be checked at their ending points. Finally (Nom) takes care of the @ operator.
Again, it is straightforward to prove the next proposition.

Axiomatization
In this section we introduce the axiomatic system HXP for HXPath=. It is an extension of an axiomatic system for the hybrid logic HL(@) which adds nominals and the @ operator to the basic modal language (see [BdRV01]). In particular, we include axioms to handle data equality and inequality. We will prove that HXP is sound and strongly complete with respect to the class of abstract hybrid data models. Moreover, we will show that the system is strongly complete for any extension with some particular kind of axioms and inference rules. This result will be helpful in order to automatically obtain complete axiomatic systems for several natural extensions of the language HXPath=. The system is geared for inference over node expressions, but we will discuss path expressions in Section 3.4. We remind the reader that when referring to HXPath= we use the term formula for node expressions. In what follows, we use * in axioms which hold for both =_e and ≠_e.
3.1. The Basic Axiomatic System HXP. We present axioms and rules step by step, providing brief comments to help the reader understand their role. In all axioms and rules ϕ, ψ and θ are node expressions; α, β, γ and η are path expressions; and i, j and k are nominals. More precisely, we provide axiom and rule schemes, i.e., they can be instantiated with arbitrary path and node expressions, respecting typing. Hence, ϕ, ψ and θ can be instantiated with node expressions; α, β, γ and η with path expressions; and i, j and k with nominals.
Definition 3.1 (Theorems, Syntactic Consequence, Consistency). For A an axiomatic system, ϕ is a theorem of A (notation: ⊢_A ϕ) if it is either an instantiation of an axiom of A, or it can be derived from an axiom of A in a finite number of steps by application of the rules of A. Let Γ be a set of node expressions. We write Γ ⊢_A ϕ and say that ϕ is a syntactic consequence of Γ in A.

In addition to a complete set of axioms and rules for propositional logic, HXP includes generalizations of the K axiom and the Necessitation rule for the basic modal logic to handle modalities with arbitrary path expressions.
Axiom and rule for classical modal logic

Then we introduce generalizations of the inference rules for the hybrid logic HL(@).

Hybrid rules

Now we introduce axioms to handle @. Notice that @_i is a path expression and as a result, some of the standard hybrid axioms for @ have been generalized. In particular, the K axiom and Nec rule above also apply to @_i. In addition, we provide axioms to ensure that the relation induced by @ is a congruence.
Axioms involving the classical XPath operators can be found below. First we introduce axioms to handle complex path expressions in data comparisons. Then we introduce axioms to handle data tests.

Axioms for paths
It is a straightforward exercise to see that the axiomatic system HXP is sound. However, we will provide a more general statement of soundness later on.
Below we prove that some useful theorems and rules can be derived within HXP.
Proposition 3.2. The following are theorems and derived rules of HXP, and will be used (explicitly or implicitly) in the rest of the paper.
Proof. (test-dist and test-⊥). Let * be =_e or ≠_e. Then: Replacing * by =_e we get ϕ ∧ ψ by equal. Replacing it by ≠_e we get ⟨ε ≠_e ε⟩, and given distinct we can derive ⊥. (@-swap).
3.2. Extended Axiomatic Systems. In the next section we will prove not only that HXP is complete with respect to the class of all models, but also that extensions of HXP with pure axioms and existential saturation rules preserve completeness with respect to the corresponding class of models. In this section we present such extensions. The use of nominals, and in particular pure axioms and existential saturation rules, allows us to characterize classes of models that are not definable without them. For instance, the hybrid axiom @_i ¬⟨a⟩i forces the accessibility relation related to a in the model to be irreflexive, which cannot be expressed in the basic modal logic. In this way, we can express properties about the underlying topology of a data model and also impose restrictions on the data fields. For instance, the axiom ⟨@_i =_e @_j⟩ → ⟨@_i =_d @_j⟩ for data inclusion, which will be discussed in Section 3.5, expresses that if two nodes coincide in the data field e, then they must also coincide in the data field d.
Standard Translation. In order to establish frame conditions associated to pure axioms and existential saturation rules we define the standard translation of HXPath = into first-order logic, by mutual recursion between NExp and PExp.
Definition 3.3. The correspondence language for expressions of HXPath= is a relational language with a unary relation symbol P_i for each p_i ∈ Prop, a binary relation symbol R_a for each a ∈ Mod and a binary relation symbol D_e for each e ∈ Eq. Moreover, for each nominal i ∈ Nom we will associate an indexed variable x_i. The function ST_x from HXPath=-formulas into its correspondence language is defined by

where y, z are first-order variables which have not been used yet in the translation, and where ST_{x,y} is defined as

with z not used yet in the translation. Finally, let Eq(e) be the first-order formula stating that the relation D_e is an equivalence relation (i.e., reflexive, symmetric and transitive); we define the standard translation ST_x of a formula ϕ as

Since the standard translation mimics the semantic clauses for HXPath=-formulas, the following proposition holds.

Notice, in the proposition above, that M = ⟨M, {∼_e}_{e∈Eq}, {R_a}_{a∈Mod}, V, nom⟩ is used to interpret ST_x(ϕ) with V(p) interpreting the unary relation symbol P, each accessibility relation R_a interpreting the binary relation symbol R_a and each relation ∼_e interpreting the binary relation D_e.

C. Areces and R. Fervari, Vol. 17:3
Pure Axioms and Existential Saturation Rules. We have now all the ingredients needed to define pure axioms and existential saturation rules, together with their associated frame conditions.

Definition 3.5 (Pure Axioms). We say that a formula (or in particular, an axiom) is pure if it does not contain any occurrences of propositional symbols. Let Π be a set of pure axioms; we define FC(Π), the frame condition associated to Π, as the universal closure of the standard translation of the axioms in Π, i.e., FC(Π) = {∀x ∀x_{i_1} ... ∀x_{i_n} ST_x(ϕ) | ϕ ∈ Π, with i_1, ..., i_n all the nominals in ϕ}.
Definition 3.6 (Existential Saturation Rules). Let ϕ(i_1, ..., i_n, j_1, ..., j_m) be an HXPath= formula with no propositional symbols such that i_1, ..., i_n, j_1, ..., j_m is an enumeration of all nominals appearing in ϕ. An existential saturation rule is a rule of the form: from ϕ(i_1, ..., i_n, j_1, ..., j_m) → ψ infer ψ, provided that j_1, ..., j_m do not occur in ψ. We will call ϕ the head of the rule, i_1, ..., i_n its universally quantified nominals (notation: ϕ^∀), and j_1, ..., j_m its existentially quantified nominals (notation: ϕ^∃).¹ Let P be a set of existential saturation rules; we define FC(P), the frame condition associated to P, as follows:

Let Π be a set of pure formulas and let P be a set of existential saturation rules. In what follows we write HXP + Π + P for the axiomatic system HXP extended with Π as additional axioms and P as additional inference rules. In the coming sections we will show that this system is strongly complete with respect to the class of models based on frames satisfying the frame condition FC(Π) ∧ FC(P).
Consider the formula @_i⟨a⟩i. It can be used to define two different existential saturation rules, depending on whether i is treated as a universally or as an existentially quantified nominal. Consider now the rule corresponding to the Church-Rosser property discussed in [BtC06]:

where l does not occur in ψ.
The frame condition corresponding to this rule is:

It is not difficult to see that any pure axiom π is equivalent to the rule that uses π as head without side conditions (i.e., π^∃ is empty). In that sense, any axiomatic system HXP + Π + P is equivalent to some HXP + P′, as pure axioms do not introduce additional expressive power. On the other hand, properties that mix both universal and existential quantification, like the Church-Rosser property mentioned above, cannot be captured using only pure axioms. However, axioms are simpler than rules with side conditions, and pure axioms are expressive enough to characterize many interesting properties.
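The difference between a purely universal frame condition and one with a ∀∃ quantifier pattern can be checked directly on finite frames. The Python sketch below tests irreflexivity (the condition forced by the pure axiom @_i ¬⟨a⟩i) and the Church-Rosser property in its standard first-order formulation, ∀x∀y∀z (R(x, y) ∧ R(x, z) → ∃w (R(y, w) ∧ R(z, w))). That formulation is the textbook one and is stated here as an assumption; the function names and the two example relations are hypothetical.

```python
def is_irreflexive(R):
    """Universal condition: no node is related to itself."""
    return all(x != y for (x, y) in R)

def is_church_rosser(R):
    """Forall-exists condition: any two successors of a node
    share a common successor."""
    succ = {}
    for x, y in R:
        succ.setdefault(x, set()).add(y)
    return all(
        succ.get(y, set()) & succ.get(z, set())  # non-empty set of witnesses w
        for x in succ
        for y in succ[x]
        for z in succ[x]
    )

# Hypothetical finite accessibility relations.
fork = {(0, 1), (0, 2)}                               # 1 and 2 never rejoin
diamond = {(0, 1), (0, 2), (1, 3), (2, 3), (3, 3)}    # both branches meet at 3
```

The irreflexivity check only inspects given pairs, whereas the Church-Rosser check must search for a witness w; this mirrors the role of the existentially quantified nominal l in the corresponding rule.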
1 Notice that given a particular ϕ, different existential saturation rules can be defined depending on which nominals are listed in its side condition.

3.3. Soundness and Completeness. It is a fairly straightforward exercise to prove that the axioms and rules of HXP are sound with respect to the class of all models. Similarly, any set Π of pure axioms is sound with respect to the class of models obtained from frames satisfying FC(Π), and any set P of existential saturation rules is sound with respect to the class of frames satisfying FC(P) (the proof is similar to the one provided in [BtC06]).
Theorem 3.8 (Soundness). Let Π be a set of pure axioms and let P be a set of existential saturation rules. All the axioms and rules from HXP are valid over the class of abstract hybrid data models satisfying the frame condition FC(Π) ∧ FC(P).
Now we will devote ourselves to show that the axiomatic system is also strongly complete. The completeness argument follows the lines of the completeness proof for HL(@) and similar approaches (see, e.g., [Gol84,BtC06,SP10]), which is a Henkin-style proof with nominals playing the role of first-order constants.
We will prove in this section that the system HXP + Π + P is strongly complete with respect to the class of abstract hybrid data models obtained from frames satisfying FC(Π) ∧ FC(P). More precisely, given a particular extension HXP + Π + P, we will show that if C = Mod(F) for F the class of frames satisfying FC(Π) ∧ FC(P), then Γ ⊨_C ϕ implies Γ ⊢_{HXP+Π+P} ϕ, where Γ ∪ {ϕ} is a set of HXPath=-formulas. Or, equivalently, we need to show that every consistent set of formulas (for HXP + Π + P) is satisfiable in some abstract hybrid data model (in C). Recall that subscripts in the relations ⊢ and ⊨ are omitted when they are clear from the context.

Definition 3.9. Let Γ be a set of formulas. We say that Γ is an HXP + Π + P maximal consistent set (MCS for short) if and only if Γ ⊬ ⊥ and for all ϕ ∉ Γ we have Γ ∪ {ϕ} ⊢ ⊥.
Proof. Item 1 is a consequence of @-intro', 2 follows from @=-dist and 3 can be proved using agree and =-comm.
The next fact follows from the definition of MCS, as expected: Fact 3.11. Let Γ be an MCS. Then for all ϕ, either ϕ ∈ Γ or ¬ϕ ∈ Γ.
So far, we presented an axiom system together with the standard tools for proving its completeness. We also introduced non-orthodox rules (i.e., rules with side conditions), which will play a crucial role in the Henkin-style model we will build for proving completeness. The paste rule expresses that path expressions can control what happens in accessible states from a named state. The name rule says that if ϕ is provable to hold in an arbitrary state named by j, then ϕ is also provable. Now we introduce some properties that will be required in the construction of the Henkin-style model.
Now we are going to prove a crucial property in our completeness proof: the Extended Lindenbaum Lemma. Intuitively, it says that the rules of HXP + Π + P allow us to extend MCSs to named and pasted MCSs, provided we enrich the language with new nominals. This lemma will be useful to obtain the models we need from an MCS.
Lemma 3.13 (Extended Lindenbaum Lemma). Let Nom′ be a (countably) infinite set of nominals disjoint from Nom, and let HXPath=′ be the language obtained by adding these new nominals to HXPath=. Then, every consistent set of formulas in HXPath= can be extended to a named and pasted MCS in HXPath=′.
Let Σ^ω = ⋃_{n≥0} Σ^n. This set is named (by k), maximal and pasted. Furthermore, it is consistent as a direct consequence of the paste rule.
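The stepwise shape of a Lindenbaum-style construction (enumerate formulas, add each one that keeps the set consistent, take the union) can be sketched in a purely propositional toy setting. The Python code below is only an analogy under strong simplifying assumptions: consistency is decided by brute-force satisfiability over finitely many atoms, which stands in for derivability, and the naming and pasting steps that introduce fresh nominals as witnesses are omitted altogether. The names `evaluate`, `consistent` and `lindenbaum` are hypothetical.

```python
from itertools import product

def evaluate(phi, v):
    """Truth value of a propositional formula under valuation v."""
    kind = phi[0]
    if kind == "atom":
        return v[phi[1]]
    if kind == "not":
        return not evaluate(phi[1], v)
    if kind == "and":
        return evaluate(phi[1], v) and evaluate(phi[2], v)
    raise ValueError(f"unknown formula: {kind}")

def consistent(gamma, atoms):
    """Brute-force check: some valuation satisfies every formula in gamma."""
    return any(
        all(evaluate(phi, dict(zip(atoms, bits))) for phi in gamma)
        for bits in product([False, True], repeat=len(atoms))
    )

def lindenbaum(gamma, enumeration, atoms):
    """Extend gamma along the enumeration, keeping consistency at each step."""
    sigma = list(gamma)
    for phi in enumeration:
        if consistent(sigma + [phi], atoms):
            sigma.append(phi)
    return sigma

p, q = ("atom", "p"), ("atom", "q")
atoms = ["p", "q"]
gamma = [("not", ("and", p, q))]                  # not (p and q)
enumeration = [p, ("not", p), q, ("not", q)]
mcs = lindenbaum(gamma, enumeration, atoms)
```

In this toy run the result decides every enumerated formula: p is added first, which later blocks q and forces ¬q. The construction in the lemma additionally adds, at each stage, witnessing formulas built from new nominals.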
Lemma 3.14 (Rule Saturation Lemma). Let Nom′ be a (countably) infinite set of nominals disjoint from Nom, and let HXPath=′ be the language obtained by adding these new nominals to HXPath=. Let Π be a set of pure axioms, and P be a set of existential saturation rules. Then, every consistent set of formulas in HXPath= can be extended to a named, pasted and P-saturated MCS in HXPath=′.
Let Nom′ be a (countably) infinite set of nominals disjoint from Nom. Given Σ and P, say that a pair (ρ, ī) is well-formed if ρ ∈ P with head ϕ, and ī is a sequence of nominals in Σ with length equal to the length of ϕ^∀. For Σ and P countable, the set of well-formed pairs is countable and can be enumerated. Define Σ^0 = Σ. Let (ρ_{n+1}, i_1 ... i_k) be the (n+1)-th pair in the enumeration. Define Σ^{n+1} as follows.

We define Σ^+ = ⋃_{n≥0} Σ^n, which is consistent and extends Σ as described above. Now consider any consistent set of formulas Γ. Let Γ^0 = Γ, and for all n ≥ 0 let Γ^{n+1} be a named and pasted MCS extending (Γ^n)^+ (which exists by Lemma 3.13). Then we have the following chain of inclusions:

Let Γ^ω = ⋃_{n≥0} Γ^n. Then Γ^ω is a named, pasted and P-saturated MCS in HXPath=′. As we used only countably many new nominals, this set is also countable.
From a named and pasted MCS we can extract a model:

We need to prove that M_Γ is well defined, and that it is actually an abstract hybrid data model.
(1) Δ_i = Δ_j implies @_i ϕ ∈ Γ iff @_j ϕ ∈ Γ.
(2) For all e ∈ Eq, ∼_e is an equivalence relation.
Proof. Item 1 ensures that the definition of M_Γ does not depend on the particular nominal taken as representative of Δ_i. The property follows directly from bridge.
For item 2, we prove: -Reflexivity: We need to show that ∆ i ∼ e ∆ i , which by definition is equivalent to @ i = e @ i ∈ Γ. By equal we have = e ∈ Γ, and applying Nec we get @ i = e ∈ Γ. Also, by comp=-dist, @ i = @ i ∈ Γ, and by agree @ i = e @ i @ i ∈ Γ. Then, by @ =-dist, @ i = e @ i ∈ Γ, as wanted.
Corollary 3.19. Let Γ be an MCS and let M_Γ = ⟨M, {∼_e}_{e∈Eq}, {R_a}_{a∈Mod}, V, nom⟩ be the extracted model, and Δ_i ∈ M. If @_i⟨aα⟩ϕ ∈ Γ, then there exists Δ_j ∈ M such that Δ_i R_a Δ_j and @_j⟨α⟩ϕ ∈ Γ.
Proof. Let ∆ i ∈ M . By hypothesis, @ i aα[ϕ] = e aα[ϕ] ∈ Γ, then by Lemma 3.18 there exists ∆ j ∈ M such that ∆ i R a ∆ j and @ j α[ϕ] = e @ i α[ϕ] ∈ Γ. By comp=-dist, @ j α[ϕ] = e @ j @ i α[ϕ] ∈ Γ, and by comp-neutral and subpath we get @ j α[ϕ] ∈ Γ. Then, using comp-dist, comp-assoc and =-test, we have @ j α ϕ ∈ Γ.

Now we are ready to prove the Truth Lemma, which states that membership in an MCS generating an extracted model is equivalent to being true at a state in the extracted model. First let us introduce a notion of size for node and path expressions, which we will use in the inductive cases of the proof.
Definition 3.20. We define inductively the size of a path and node expression (notation | · |) as follows: where α, β are path expressions and ϕ, ψ are node expressions.
Proof. In fact we will prove a stronger result. Let ∆ i , ∆ j ∈ M , ϕ be a node expression and α be a path expression.

The proof is a double induction argument, proceeding first on the structural complexity of ϕ and α, and then on the size of path-formulas as per Definition 3.20.

First, we prove the base cases: But by definition of nom, ∆ j = ∆ k , and because we know that j ∈ ∆ j we have j ∈ ∆ k . Then, we have @ k j ∈ Γ, and by agree,

Now we prove the inductive cases: By (IH1), we have @ i ψ ∈ Γ and j ∈ ∆ i , then @ i j ∈ Γ. As Γ is an MCS, we have @ i (ψ ∧ j) ∈ Γ, and by idempotence of the conjunction we have @ i (ψ ∧ ψ ∧ j ∧ j) ∈ Γ. Also, = e is a theorem, by Nec we have @ i = e ∈ Γ, then @ i (ψ ∧ ψ ∧ j ∧ j ∧ = e ) ∈ Γ. Using =-test and =-comm we obtain [j] ∈ Γ (which is the same as @ i [ψ] j) as we wanted.
For node expressions of the form α * β we need to do induction on the size of α and β. Notice that by * -comm, @ i α * β ∈ Γ iff @ i β * α ∈ Γ. And by the semantic definition, So we need only carry out the inductive steps for α. Moreover, by comp-neutral, α * β ↔ α * β which is also a validity. So we can assume that every path ends in a test. The base case then is when |α| + |β| = 2, and both α and β are tests.
For the other direction, suppose M Γ , Then we have, ∆ i R a ∆ t , and by definition of M Γ we get @ i a t ∈ Γ. On the other hand, by (IH2) on M Γ , ∆ t , ∆ j |= β we obtain @ t β j ∈ Γ. Since {@ i a t, @ t β j} ⊆ Γ, by comp-dist we have: (1) @ i aβ j ∈ Γ.
For the other direction suppose @ i @ j β = e γ ∈ Γ. First notice that, in all the cases we considered so far, the induction is on the path expression appearing on the left side of the =. However, an analogous argument can be applied if we do induction on the path expression on the right side of the =, by =-comm. Suppose we proceed as above for the node expression @ j β = e γ . If we apply the exact same steps, we will find that this time we need to do induction on γ; but, as we mentioned, the cases for a and [ϕ] are symmetric on both sides of the =. As a consequence, it all boils down to considering only the case γ = @ k η.
Suppose @ i @ j β = e @ k η ∈ Γ. By the Existence Lemma we have @ j β = e @ k η ∈ Γ, hence by (IH1) M Γ , ∆ j |= β = e @ k η . By semantics of @, and the fact that M Γ is named,
- The cases involving = e are analogous, using item 1 from Proposition 3.17 to obtain @ j = e @ k ∉ Γ in item 3 above.

Proof. Since M Γ is a named model and Γ contains all instances of elements of Π, it follows that the underlying frame of M Γ satisfies FC(Π). Since M Γ is a named model and Γ is P-saturated, it follows that the underlying frame of M Γ satisfies FC(P).
As a result we obtain the completeness result.
Theorem 3.23 (Strong Completeness). Let Π be a set of pure axioms and P a set of existential saturation rules. Let C = Mod(F) for F the class of all frames satisfying FC(Π) ∧ FC(P). Then, the axiomatic system HXP + Π + P is strongly complete for C.
Proof. We need to prove that every set of HXPath=-formulas Σ is consistent if and only if Σ is satisfiable in an abstract hybrid data model satisfying the frame properties defined by Π and P. For any consistent Σ, we can use the Rule Saturation Lemma to obtain $\Sigma^\omega$, which is a named, pasted and P-saturated MCS in HXPath= extended by a set $\mathsf{Nom}'$ of additional nominals. Let $\mathcal{M}_{\Sigma^\omega} = \langle M, \{\sim_e\}_{e \in \mathsf{Eq}}, \{R_a\}_{a \in \mathsf{Mod}}, V, nom \rangle$ be the model extracted from $\Sigma^\omega$. Let $i \in \Sigma^\omega$; for all $\varphi \in \Sigma$, by @-intro we have $@_i\varphi \in \Sigma^\omega$, since $\Sigma^\omega$ is an MCS. Then by the Truth Lemma, $\mathcal{M}_{\Sigma^\omega}, \Delta_i \models \varphi$. By the Frame Lemma, $\mathcal{M}_{\Sigma^\omega}$ satisfies all required frame properties.
Because the class of abstract data models is a conservative abstraction of concrete data models, we can conclude:

Corollary 3.24. The axiomatic system HXP + Π + P is strongly complete for the class of concrete hybrid data models satisfying the frame conditions FC(Π) ∧ FC(P).

3.4. Completeness for path formulas. The main contribution of this article is a characterization of (local) semantic consequence between node formulas by means of an axiomatic system. Given that soundness of an axiomatic system is usually granted, this is the main outcome of the strong completeness result we just proved. In the setting of XPath it also makes sense to discuss the issue of inference between path formulas. Path inference can be traced back to [Pra91], in the context of dynamic algebras. We will now show how Theorem 3.23 can also be used to characterize path inference. First, recall that previous work like [tCLM10,ADFF17] provided equational axiomatizations that characterize theorems of the form ϕ ≡ ψ (for ϕ, ψ node expressions) and α ≡ β (for α, β path expressions). The first are obviously covered by our results that characterize theoremhood for arbitrary node expressions (in this case ≡ is nothing more than ↔). For equivalence between path formulas the following proposition suffices:

Proposition 3.25. Let α, β be path expressions in HXPath=, then $\models \alpha \equiv \beta$ iff $\models @_i\langle\alpha\rangle j \leftrightarrow @_i\langle\beta\rangle j$ for i, j not appearing in α, β.
Hence, completeness for equivalence theorems between path formulas follows from Theorem 3.23. More interesting is to consider whether a strong completeness result for path consequence is also possible. We need first to introduce this notion. There exist at least two natural possible definitions:

Definition 3.26. Let Λ ∪ {α} be a set of path formulas in HXPath=, and let C be a class of models. Define
• $\Lambda \models^1_{\mathsf{C}} \alpha$ iff for any model $\mathcal{M} \in \mathsf{C}$, $\forall m, n\,(\mathcal{M}, m, n \models \Lambda \text{ implies } \mathcal{M}, m, n \models \alpha)$.
• $\Lambda \models^2_{\mathsf{C}} \alpha$ iff for any model $\mathcal{M} \in \mathsf{C}$, $(\forall m, n\; \mathcal{M}, m, n \models \Lambda) \text{ implies } (\forall m, n\; \mathcal{M}, m, n \models \alpha)$.

As before we write $\models$ instead of $\models_{\mathsf{C}}$ if the intended class is clear from context. From the above definition it is easy to show that $\models^1_{\mathsf{C}}$ implies $\models^2_{\mathsf{C}}$ but not vice versa. Now, both $\models^1$ and $\models^2$ can be captured using Theorem 3.23.
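The difference between the two notions of path consequence can be seen on a single finite model. The following is a minimal sketch, assuming that the denotation of each path expression is already given as a set of pairs of states; the function names and the example model are hypothetical and only illustrate the two conditions of Definition 3.26 on one model, not over a whole class.

```python
from itertools import product


def holds_locally(domain, premises, conclusion):
    """Condition of the first notion on one model:
    every pair satisfying all premises also satisfies the conclusion."""
    return all(
        pair in conclusion
        for pair in product(domain, repeat=2)
        if all(pair in premise for premise in premises)
    )


def holds_globally(domain, premises, conclusion):
    """Condition of the second notion on one model:
    if the premises hold at every pair, then so does the conclusion."""
    pairs = list(product(domain, repeat=2))
    premises_everywhere = all(
        pair in premise for premise in premises for pair in pairs
    )
    return (not premises_everywhere) or all(pair in conclusion for pair in pairs)


# Hypothetical two-state model: the premise holds only at the pair (0, 1),
# and the conclusion holds nowhere.
domain = {0, 1}
premise = {(0, 1)}
conclusion = set()

local_ok = holds_locally(domain, [premise], conclusion)
global_ok = holds_globally(domain, [premise], conclusion)
```

On this model the global condition holds vacuously (the premise fails at some pair) while the local condition fails at (0, 1), which is the kind of situation that separates the two notions.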
Theorem 3.27. Let Π be a set of pure axioms and P be a set of existential saturation rules. Let C = Mod(F) for F the class of frames satisfying FC(Π) ∧ FC(P). Let Λ ∪ {α} be a set of path expressions in HXPath= then

The result is a corollary of Theorem 3.23 because the following hold:

3.5. Some Concrete Examples of Pure Extensions. In this section we introduce some extensions of HXPath=, and their corresponding extended axiomatic systems. In all cases, we can apply the result from the previous section and automatically obtain completeness, since we only use pure axioms and/or existential saturation rules.

Backwards Navigation. We start with the language HXPath=(−), i.e., HXPath= extended with the path expression $a^-$ (backwards navigation). The intuitive semantics for $a^-$ is $\mathcal{M}, x, y \models a^-$ iff $x R_a^{-1} y$, i.e., $a^-$ is interpreted on the inverse of the accessibility relation associated to a. Equivalently, $\mathcal{M}, x, y \models a^-$ iff $y R_a x$. In fact, following our presentation, $a^-$ is an additional modal operator from Mod (to which all rules and axioms of HXP apply), and in addition we will insist that in models of HXPath=(−), this accessibility relation is always interpreted as the inverse of the one for a.
Formally, consider two modal symbols a and $a^-$ in Mod (and assume that their respective accessibility relations in a model are $R_a$ and $R_{a^-}$). Define the set $\Pi_1$ of pure axioms that characterizes the interaction between a and $a^-$ as

Axioms for a, $a^-$-interaction

Since the axioms presented above are pure, the axiom system we obtain is complete for models obtained from frames satisfying FC($\Pi_1$). It is a simple exercise to verify that FC($\Pi_1$) is equivalent to $\forall x.\forall y.(R_a(x, y) \leftrightarrow R_{a^-}(y, x))$.
Sibling Navigation. We consider now the extension of HXPath=(−) with sibling navigation s, denoted HXPath=(−, s). Intuitively, $\mathcal{M}, x, y \models s$ iff $x \neq y$ and there is some z s.t. $z R_a x$ and $z R_a y$, where a ∈ Mod is fixed.
Consider now three modal symbols a, $a^-$ and s in Mod (and their respective accessibility relations $R_a$, $R_{a^-}$ and $R_s$). Define the set $\Pi_2$ of pure axioms that characterizes their interaction as the axioms in $\Pi_1$ together with

Axioms for siblings

As $\Pi_2$ extends $\Pi_1$, FC($\Pi_2$) ensures that $R_a$ and $R_{a^-}$ are inverses, and in addition that $\forall x.\forall y.(R_s(x, y) \leftrightarrow ((x \neq y) \wedge (\exists z.(R_{a^-}(x, z) \wedge R_a(z, y)))))$.
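The two frame conditions for backward and sibling navigation are easy to test on a finite frame. The sketch below assumes that relations are given as sets of pairs; the helper names `is_inverse` and `sibling_relation` and the three-node example are hypothetical, chosen only to mirror the first-order conditions stated for the two sets of pure axioms.

```python
def is_inverse(r_a, r_a_inv):
    """Inverse condition: R_a(x, y) holds iff R_{a^-}(y, x) holds."""
    return {(y, x) for (x, y) in r_a} == set(r_a_inv)


def sibling_relation(r_a, r_a_inv):
    """Pairs (x, y) with x != y such that R_{a^-}(x, z) and R_a(z, y) for some z."""
    return {
        (x, y)
        for (x, z) in r_a_inv
        for (z2, y) in r_a
        if z == z2 and x != y
    }


# Hypothetical frame: node 0 has two children, 1 and 2.
r_a = {(0, 1), (0, 2)}
r_a_inv = {(1, 0), (2, 0)}
r_s = {(1, 2), (2, 1)}

inverse_ok = is_inverse(r_a, r_a_inv)
sibling_ok = r_s == sibling_relation(r_a, r_a_inv)
```

Here both checks succeed: the second relation is the converse of the first, and the sibling relation contains exactly the pairs of distinct nodes sharing a parent.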
Data Equality Properties. As a final, very simple, example let us consider pure axioms defining the behaviour of equality tests. Consider a language with two equality test operators $=_e$ and $=_d$ such that one defines finer equivalence classes than the other (i.e., if $\sim_e$ and $\sim_d$ are their respective accessibility relations we want to ensure that $\sim_e \subseteq \sim_d$). Let us call this logic HXPath=(⊆). Let $\Pi_3$ be the singleton set containing the axiom

Axioms for equality inclusion

Proposition 3.30. HXP + $\Pi_3$ is sound and strongly complete for HXPath=(⊆).
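As a small illustration of the intended inclusion between the two data relations, the following sketch builds two equivalence relations from partitions and checks that one refines the other. The helper `equivalence_from_partition` and the concrete partitions are assumptions made for the example only.

```python
def equivalence_from_partition(blocks):
    """Equivalence relation (as a set of pairs) induced by a partition."""
    return {(x, y) for block in blocks for x in block for y in block}


# Hypothetical data relations on four nodes: the first has finer classes.
sim_e = equivalence_from_partition([{0}, {1}, {2, 3}])
sim_d = equivalence_from_partition([{0, 1}, {2, 3}])

e_included_in_d = sim_e.issubset(sim_d)
d_included_in_e = sim_d.issubset(sim_e)
```

In this example the inclusion holds in one direction only, since nodes 0 and 1 agree on the coarser relation but not on the finer one.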

Hybrid XPath on Trees
The Henkin-style model construction from Section 3.3 provides us with the tools to obtain complete axiomatizations for a wide variety of extensions of HXPath = . However there are interesting cases in which completeness is not a direct corollary of the results we already presented. One such case is the class of tree models which we call C tree . We will devote this section to investigate hybrid XPath over the class of tree models. First, in Section 4.1 we will show that it is not possible to get a strongly complete finitary first-order axiomatization over trees. In Section 4.2 we show that it is possible to obtain a strongly complete infinite first-order axiomatization over a slightly larger class of models, while in Section 4.3 we prove that this axiomatization is weakly complete for C tree .
Tree models are interesting in this context since XPath is commonly used as a query language over XML documents; and mathematically, the relational structure of an XML document is a tree. In our setting, this consists of asking that the union of all relations in a model has a tree-shape.
Without loss of generality, in the rest of this section we will work in a mono-modal setting, with a unique basic path expression a, and with single equality/inequality comparisons denoted = and ≠. The path expression a will be interpreted over an accessibility relation $R_a$, representing the union of all the accessibility relations in the model. To achieve this, we need to work in a signature where the set Mod is finite. Therefore, we can include the following additional axiom:

a-definition   $a \leftrightarrow a_1 \cup \ldots \cup a_k$, with $\mathsf{Mod} = \{a_1, \ldots, a_k\}$

It is a straightforward exercise to show that this axiom characterizes $R_a$ as the union of all the relations of the model, in a signature where Mod is finite. As a consequence, in this section we will focus on axiomatizing $R_a$ as the accessibility relation of a tree. In this way, we obtain the intended meaning for XPath as a query language for XML documents.
Hence, a model in this signature is a tuple $\mathcal{M} = \langle M, \sim, R_a, V, nom \rangle$, in which we have a unique data equality relation ∼ and the accessibility relation $R_a$.

4.1. Failure of Strong Completeness over Trees. Let us start by formally defining the structures we will consider.
Definition 4.1. $\mathsf{C}_{\mathit{tree}}$ is defined as the class of abstract hybrid data models $\langle M, \sim, R_a, V, nom \rangle$ satisfying:
• there is a unique point in M without predecessors by $R_a$, which we call the root;
• every node n ∈ M is reachable from the root by $R_a$ in zero or more steps;
• for all $n, l, l' \in M$, if $l R_a n$ and $l' R_a n$, then $l = l'$; and
• there is no n ∈ M such that n is reachable by $R_a$ from itself in one or more steps.

Intuitively, a model $\langle M, \sim, R_a, V, nom \rangle$ is in $\mathsf{C}_{\mathit{tree}}$ if $\langle M, R_a \rangle$ is a tree.
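For finite structures, the four conditions of Definition 4.1 can be checked directly. The sketch below is only an illustration under the assumption that the frame is given as a finite set of nodes and a set of edges; the names `is_tree` and `reachable_in_one_or_more_steps` are hypothetical.

```python
def reachable_in_one_or_more_steps(start, successors):
    """Nodes reachable from `start` using at least one edge."""
    seen, stack = set(), list(successors[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(successors[node])
    return seen


def is_tree(nodes, r_a):
    """Check the four conditions of the tree class on a finite frame."""
    nodes = set(nodes)
    successors = {n: set() for n in nodes}
    predecessors = {n: set() for n in nodes}
    for (u, v) in r_a:
        successors[u].add(v)
        predecessors[v].add(u)

    # Unique point without predecessors (the root).
    roots = [n for n in nodes if not predecessors[n]]
    if len(roots) != 1:
        return False
    root = roots[0]
    # Every node is reachable from the root in zero or more steps.
    if reachable_in_one_or_more_steps(root, successors) | {root} != nodes:
        return False
    # At most one predecessor per node.
    if any(len(predecessors[n]) not in (0, 1) for n in nodes):
        return False
    # No node is reachable from itself in one or more steps.
    return all(
        n not in reachable_in_one_or_more_steps(n, successors) for n in nodes
    )


chain_is_tree = is_tree({0, 1, 2, 3}, {(3, 2), (2, 1), (1, 0)})
cycle_is_tree = is_tree({0, 1}, {(0, 1), (1, 0)})
join_is_tree = is_tree({0, 1, 2}, {(0, 2), (1, 2)})
```

A finite chain passes all four checks, whereas a two-node cycle has no root and a node with two parents yields two roots.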
In order to show that there is no strongly complete finitary first-order axiomatization for HXPath= over $\mathsf{C}_{\mathit{tree}}$, we will use a standard tool from first-order model theory: the compactness theorem. In particular, we will show that HXPath= over $\mathsf{C}_{\mathit{tree}}$ is not compact, and consequently, a strongly complete finitary axiomatization cannot exist (see, e.g., [End01] for details).
Proposition 4.2. HXPath = is not compact over the class C tree .
Proof. We need to show that there exists an infinite set Γ of HXPath=-formulas such that every finite subset $\Gamma^{\mathit{fin}} \subset \Gamma$ is satisfiable in the class $\mathsf{C}_{\mathit{tree}}$, but Γ is not.
Without loss of generality, we will use natural numbers as names for nominals, i.e., Nom = N. We start by defining the following formula indicating that, at the given point, the nominal k is the only one satisfied from the set {0, . . . , n}: We can define for n ≥ 0: Only n i .
The formula Lin(n) states that there exists a linear chain of named states of length n. Now let us define the set $\Gamma_{Lin} = \{Lin(n) \mid n \in \mathbb{N}\}$. Notice that $\Gamma_{Lin}$ enforces structures with at least one infinite branch of the shape

$\cdots \to 3 \to 2 \to 1 \to 0$

$\Gamma_{Lin}$ satisfies the following properties:
(1) $\Gamma_{Lin}$ does not have a model in $\mathsf{C}_{\mathit{tree}}$, since it enforces an infinite chain without a root (since every node has a predecessor), and
(2) for all finite sets $\Gamma^{\mathit{fin}}_{Lin} \subset \Gamma_{Lin}$, $\Gamma^{\mathit{fin}}_{Lin}$ has a model in $\mathsf{C}_{\mathit{tree}}$.
Hence, the proposition follows.
Theorem 4.3. There is no finitary first-order axiomatization which is strongly complete for HXPath = over C tree .
Proof. Suppose there is some finitary first-order axiomatization H which is sound and strongly complete for HXPath= over $\mathsf{C}_{\mathit{tree}}$, and let $\vdash$ be the derivability relation obtained from H. For any set of formulas Γ, we know:
(†) if for every finite set $\Gamma^{\mathit{fin}} \subseteq \Gamma$, $\Gamma^{\mathit{fin}}$ is consistent (i.e., $\Gamma^{\mathit{fin}} \nvdash \bot$), then Γ is consistent.
This follows from the fact that any proof is finite, so in order to make Γ inconsistent, only a finite set of its formulas is needed.
Since H is sound, by item 2 in Proposition 4.2, each set $\Gamma^{\mathit{fin}}_{Lin}$ defined therein is consistent. Then, by item (†), $\Gamma_{Lin}$ is also consistent, and by strong completeness of H we can conclude that $\Gamma_{Lin}$ has a model in $\mathsf{C}_{\mathit{tree}}$, contradicting item 1 in Proposition 4.2. Therefore, there is no finitary first-order axiomatic system H strongly complete for $\mathsf{C}_{\mathit{tree}}$.
Corollary 4.4. There are no Π and P such that HXP + Π + P is strongly complete for $\mathsf{C}_{\mathit{tree}}$.

4.2. Axiomatizing the Class $\mathsf{C}_{\mathit{forest}^-}$ with Pure Axioms. Define the class $\mathsf{C}_{\mathit{forest}^-}$ as follows:

Definition 4.5. Let $\mathsf{C}_{\mathit{forest}^-}$ be the class of models $\langle M, \sim, R_a, V, nom \rangle$ such that
• there is no n ∈ M such that n is reachable by $R_a$ from itself in one or more steps;
• for all n ∈ M, n has at most one predecessor by $R_a$.
Notice that the class $\mathsf{C}_{\mathit{forest}^-}$ admits forests made of a collection of models that consist of a tree, possibly extended with an infinite chain attached to the root. Models from the class $\mathsf{C}_{\mathit{forest}^-}$ differ from those in $\mathsf{C}_{\mathit{tree}}$, given that we relaxed the first two conditions from Definition 4.1. We will introduce a strongly complete axiomatization for $\mathsf{C}_{\mathit{forest}^-}$. Let $\Pi_{\mathit{forest}^-}$ be the set of axioms below. Define $a^1$ as $a$, and $a^{n+1}$ as $a a^n$.

Axioms for forests

no-loops   $@_i \neg\langle a^n \rangle i$, for all $n > 0$
no-join    $@_j\langle a \rangle i \wedge @_k\langle a \rangle i \to @_j k$

In the table above, no-loops is an infinite set of axioms preventing loops of any size, whereas no-join prevents the existence of more than one predecessor. Since our axioms are pure and enforce the appropriate structures, from Theorem 3.23 we get:

Theorem 4.6. The axiomatic system HXP + $\Pi_{\mathit{forest}^-}$ is strongly complete for $\mathsf{C}_{\mathit{forest}^-}$.
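On a finite frame, the frame conditions behind no-loops and no-join can be checked by composing the accessibility relation with itself. The sketch below is a hypothetical illustration: relations are sets of pairs, and it relies on the observation that in a frame with k nodes any loop contains a loop of length at most k, so only the powers up to k need to be inspected.

```python
def compose(r, s):
    """Relational composition: (x, z) whenever (x, y) in r and (y, z) in s."""
    return {(x, z) for (x, y) in r for (y2, z) in s if y == y2}


def satisfies_no_loops(nodes, r_a):
    """No node reaches itself through a^n, for n from 1 up to the number of nodes."""
    power = set(r_a)  # denotation of a^1
    for _ in range(len(nodes)):
        if any(x == y for (x, y) in power):
            return False
        power = compose(r_a, power)  # a^{n+1} is a followed by a^n
    return True


def satisfies_no_join(r_a):
    """Every node has at most one predecessor."""
    predecessor = {}
    for (u, v) in r_a:
        if predecessor.setdefault(v, u) != u:
            return False
    return True


# Hypothetical frame made of two disjoint edges: a forest that is not a tree.
forest_nodes = {0, 1, 2, 3}
forest_edges = {(0, 1), (2, 3)}
forest_ok = satisfies_no_loops(forest_nodes, forest_edges) and satisfies_no_join(forest_edges)

cycle_ok = satisfies_no_loops({0, 1}, {(0, 1), (1, 0)})
join_ok = satisfies_no_join({(0, 2), (1, 2)})
```

The first frame satisfies both conditions even though it has two roots, which matches the relaxation from trees to forest-like models.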

4.3. A Weakly Complete Axiomatization for Trees. In this section we will show that the axiom system HXP + $\Pi_{\mathit{forest}^-}$ introduced before is weakly complete for $\mathsf{C}_{\mathit{tree}}$. Recall that for weak completeness we need to show that for any HXPath=-formula ϕ, $\models \varphi$ implies $\vdash \varphi$. In order to achieve this result we need to work on the extracted model. In what follows we use $m R_a^h n$ to denote $m R_a u_1 R_a \ldots R_a u_{h-1} R_a n$, for some sequence of states $u_1, \ldots, u_{h-1}$.
Definition 4.7. Let $\mathcal{M} = \langle M, \sim, R_a, V, nom \rangle$ be a hybrid data model, $N \subseteq \mathsf{Nom}$, $N \neq \emptyset$, and $n \geq 0$. We define $\mathcal{M}^{(n,N)}$ as

Intuitively, $\mathcal{M}^{(n,N)}$ is the restriction of $\mathcal{M}$ to the states reached from a nominal in N, in at most n steps. It is obvious that $\mathcal{M}^{(n,N)}$ has finite depth if N is finite.
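The intuitive reading of this restriction can be sketched as a bounded breadth-first traversal. The code below follows only the informal description (states reached from a named state in at most n steps) and is not the formal definition; the function names, the dictionary representation of nom, and the example model are all assumptions.

```python
def reachable_within(r_a, nom, names, depth):
    """States reached from the states named by `names` in at most `depth` steps."""
    frontier = {nom[i] for i in names}
    reached = set(frontier)
    for _ in range(depth):
        # Move one step along r_a, keeping only states not seen before.
        frontier = {v for (u, v) in r_a if u in frontier} - reached
        reached |= frontier
    return reached


def restrict_relation(r_a, domain):
    """Keep only the edges whose endpoints both lie in `domain`."""
    return {(u, v) for (u, v) in r_a if u in domain and v in domain}


# Hypothetical model: a chain 0, 1, 2, 3 and a separate edge from 4 to 5.
r_a = {(0, 1), (1, 2), (2, 3), (4, 5)}
nom = {"i": 0, "j": 4}

domain_2 = reachable_within(r_a, nom, {"i"}, 2)
r_a_2 = restrict_relation(r_a, domain_2)
```

With the single nominal i and depth 2, the restricted domain contains the first three states of the chain and the restricted relation keeps the two edges between them.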
For item 2, we will prove a stronger result.
With this construction at hand, we obtain almost the structure of the model we are looking for.

Discussion
We introduced a sound and strongly complete axiomatization for HXPath=, i.e., the language XPath with forward navigation for multiple accessibility relations; multiple equality/inequality data comparisons; and where node expressions are extended with nominals, and path expressions are extended with the hybrid operator @. The hybridization of XPath allowed us to apply a completeness argument similar to the one used for the hybrid logic HL(@) shown in, e.g., [BtC06]. This ensures that certain extensions of the axiomatic system we introduce are also strongly complete. The axiomatic systems that can be obtained in this way cover a large family of hybrid XPath languages over different classes of frames.
Our system extends the calculus introduced previously in [AF16]. The most important improvement is that we provide a minimal system which can be extended with axioms and rules of a certain kind, such that strong completeness immediately follows. The kind of axioms and rules we allow in the extensions ensures completeness with respect to a large family of frame classes. We showed interesting examples of such extensions. In particular, we obtain a strongly complete axiomatization for the logic extended with backward and sibling navigation, and for data equality inclusion.
One particularly interesting class of structures used in practice is the class of tree models, since the main applications of XPath in, e.g., web semantics are related to XML documents. From a mathematical point of view, XML documents are trees. In this respect, we investigated axiomatizations for HXPath= on tree models. First, we showed that there is no finitary first-order axiomatization which is strongly complete for HXPath= on trees. Then we discussed two alternatives for dealing with tree-like structures. On the one hand, it is possible to relax some conditions on the models. We considered the class $\mathsf{C}_{\mathit{forest}^-}$, which consists of forest-like models, possibly with infinite chains with respect to the predecessor relation. More precisely, we extended the basic axiom system with an infinite set of pure axioms to strongly axiomatize the class $\mathsf{C}_{\mathit{forest}^-}$. On the other hand, one can give up strong completeness: we showed weak completeness for the class $\mathsf{C}_{\mathit{tree}}$, using pure axioms.
We also investigated the status of decidability for the satisfiability problem of HXPath= and some extensions. We used a standard technique in modal logics named filtrations [BvB06]. The filtration method is a way to build finite models by taking a large model and collapsing as many states as possible. We replicated this technique for HXPath=(−), and obtained a NExpTime upper bound for its satisfiability problem. We showed that filtrations do not work on some extensions of HXPath=. In particular, for the case of tree models we proved that the satisfiability problem is decidable in ExpSpace using a more specialized approach.
As future work, it would be interesting to investigate XPath= fragments with the reflexive-transitive path $a^*$. One of the main limitations of the framework we introduced in this paper is that it can only axiomatize first-order languages. For that reason, transitive closure operators cannot be accounted for, and a different proof strategy is needed. We conjecture that it is possible to adapt the results from [HKT00] for Propositional Dynamic Logic (PDL) to obtain a weakly complete axiom system for HXPath=(∗) (i.e., HXPath= extended with $a^*$) over the class of all models. However, for axiomatizing HXPath=(∗) over trees, the filtration technique used in [HKT00] does not seem to work, and new developments are needed, as for other logics with fixpoint operators over tree-like structures (see, e.g., the case of CTL∗ in [Rey01]).
The exact computational complexity of the logics we considered has not been investigated yet. It has been proved that XPath= with single accessibility and data relations extended with hybrid operators is PSpace-complete [AFS17]. Moreover, we provided upper bounds for multi-modal HXPath=(−) over arbitrary models and for HXPath= on trees, but without giving a tight lower bound. It would also be interesting to investigate the complexity of the multi-modal languages we studied in this article, enriched with backward and sibling navigation, reflexive-transitive closures, and over the class of trees. We conjecture that we can adapt the automata proof given in [Fig10], together with the method used to account for hybrid operators presented in [SV01], to get exact bounds.