Fine-Grained Complexity of Regular Path Queries

A regular path query (RPQ) is a regular expression q that returns all node pairs (u, v) from a graph database that are connected by an arbitrary path labelled with a word from L(q). The obvious algorithmic approach to RPQ-evaluation (called PG-approach), i.e., constructing the product graph between an NFA for q and the graph database, is appealing due to its simplicity and also leads to efficient algorithms. However, it is unclear whether the PG-approach is optimal. We address this question by thoroughly investigating which upper complexity bounds can be achieved by the PG-approach, and we complement these with conditional lower bounds (in the sense of the fine-grained complexity framework). A special focus is put on enumeration and delay bounds, as well as the data complexity perspective. A main insight is that we can achieve optimal (or near optimal) algorithms with the PG-approach, but the delay for enumeration is rather high (linear in the database). We explore three successful approaches towards enumeration with sub-linear delay: super-linear preprocessing, approximations of the solution sets, and restricted classes of RPQs.


Introduction
An essential component of graph query languages (to be found both in academic prototypes as well as in industrial solutions) are regular path queries (RPQs). Abstractly speaking, a regular expression q over some alphabet Σ is interpreted as a query that returns from a Σ-edge-labelled, directed graph D (i.e., a graph database) the set q(D) of all node pairs (u, v) that are connected by a q-path, i.e., a path labelled with a word from q's language (and possibly also a witness path per node pair, or even all such paths). This simple, yet relevant concept has been studied heavily in database theory.

Table 1: All upper bounds can be achieved as running times of some algorithm, while the lower bounds cannot be achieved as running time by any algorithm, unless the displayed hypothesis fails. The exponent ω denotes the best known matrix multiplication exponent.
for this intermediate task. Moreover, if we want to benefit from the existing CQ evaluation techniques (e.g., exploiting acyclicity etc.), we are more or less forced into this two-step approach. With respect to enumeration of CQs, it is known that linear preprocessing and constant-delay enumeration is possible provided that the CQs satisfy certain acyclicity properties (see [BDG07, BKS17], or the surveys [BGS20, Seg15]). Unfortunately, these techniques do not carry over to CRPQs since, as we show, linear preprocessing and constant-delay enumeration is not possible even for single RPQs (conditional on some complexity assumptions).
Since the problem we investigate can be solved in polynomial time (also in combined complexity), we cannot show lower bounds in terms of hardness results for complexity classes like NP or PSPACE. Instead, we make use of the framework of fine-grained complexity, which allows us to prove lower bounds that are conditional on some algorithmic assumptions (see the surveys [Wil15, Bri19, Wil18]). In particular, fine-grained complexity is a rather successful toolbox for giving evidence that the obvious algorithmic approach to some basic problem is also the optimal one. This is exactly our setting here, with respect to RPQ-evaluation and the PG-approach. To the knowledge of the authors, such conditional lower bounds are not yet a well-established technique in database theory (however, see [BGS20, Section 6] for a survey of conditional lower bounds in the context of CQ enumeration).
A main challenge is that fine-grained complexity is not exactly tailored to either the data-complexity perspective or to enumeration problems. We will next outline our results.
1.1. Our Contribution. All investigated RPQ-evaluation problems are summarised on page 8 (see especially Table 3). In the following, D = (V_D, E_D) is the graph database, q is the RPQ, and ϵ > 0. With the notation O_dc(·), we hide factors f(|q|) for some function f (i.e., it is used for stating data complexities). All lower bounds mentioned in the following are conditional on some of the algorithmic assumptions summarised in Section 4 (we encourage the reader less familiar with fine-grained complexity hypotheses to have a look at this section first, which can be read independently). For presentational reasons, we do not always explicitly mention this in the rest of the introduction, and when we say that a certain running time is "not possible", this statement is always conditional in this sense (see Tables 1 and 2 for the actual hypotheses). As is common in fine-grained complexity, we rule out truly sub-linear (O(n^{1−ϵ})), sub-quadratic (O(n^{2−ϵ})), or sub-cubic (O(n^{3−ϵ})) running times, but not possible running-time improvements by logarithmic factors, e.g., O(n^3 / log(n)).
1.1.1. Non-Enumeration Variants. The following results are summarised in Table 1. For the simple problems Boole (checking whether q(D) ≠ ∅), Test (checking whether (u, v) ∈ q(D)) and Witness (computing some element from q(D)), the PG-approach yields an upper bound of O(|D||q|), which is optimal (since linear) in data complexity, and we can show lower bounds demonstrating its optimality also in combined complexity. For Eval (computing the set q(D)) the PG-approach yields a data complexity upper bound of O_dc(|V_D||D|), which cannot be improved by combinatorial algorithms, although O_dc(|V_D|^{2.37}) is possible by fast matrix multiplication (see Section 4 for a discussion of the meaning of the term "combinatorial").
In addition, we can show that linear time data complexity, i.e., O_dc(|D|), is not achievable.

1.1.2. Enumeration. Our results for RPQ-enumeration are summarised in Table 2. An entry "O(delay)" in column "preprocessing" means that the preprocessing is bounded by the delay (which means that no preprocessing is required). The column "sorted" indicates whether the enumeration is produced in lexicographically sorted order.
In comparison to the non-enumeration problem variants, the picture is less clear and deserves more explanation. The PG-approach yields a simple enumeration algorithm with delay O(|D||q|) that also trivially supports updates in constant time, since the preprocessing fits into the delay bound. Our lower bounds for Boole also mean that this delay cannot be improved in terms of combined complexity. While this lower bound was interesting for problems like Boole etc., it now gives a correct answer to the wrong question. The main goal now should be to find out whether we can remedy the linear dependency of the delay on |D|, at the expense of spending more time in terms of |q|, of losing the ability to handle updates, or even of allowing a slightly super-linear preprocessing.
In this regard, the strongest result would be linear preprocessing O(|D| f(|q|)) and constant delay O(f(|q|)). However, we can rule this out even for algorithms not capable of handling updates. Then, the next question is which non-constant delays can be achieved that are strictly better than linear. For example, none of our lower bounds for the non-enumeration variants suggest that linear preprocessing and a delay bounded by, e.g., |V_D| or the degree of D, should not be possible. We are not able to answer this question in its general form (and believe it to be very challenging), but we are able to provide several noteworthy insights.
For linear preprocessing, a delay of O(|V_D|) (if possible at all) cannot be beaten by combinatorial algorithms (even without updates). This can be strengthened considerably if we also require updates in some reasonable time: for general algorithms (i.e., not necessarily combinatorial), delay and update time strictly better than O_dc(|V_D|) is not possible even with arbitrary preprocessing, and for combinatorial algorithms with linear preprocessing, even delay and update time of O(|D|) cannot be beaten. This last result nicely complements the upper bound, at least for combinatorial algorithms and in the dynamic case.
In summary, for linear preprocessing, O(|V_D|) is a lower bound for the delay, and if we can beat O(|D|), we should not be able to also support updates. (2) with linear preprocessing and constant delay, we can enumerate a representative subset of q(D) instead of the whole set q(D);

(3) for a subclass of RPQs, we can solve RPQ-Enum with linear preprocessing and delay O(∆(D)) (where ∆(D) is the maximum degree of D).

Main Definitions
Let N = {1, 2, 3, ...} and [n] = {1, 2, ..., n} for n ∈ N. For a finite alphabet A, A^+ denotes the set of non-empty words over A and A^* = A^+ ∪ {ε} (where ε is the empty word). For a word w ∈ A^*, |w| denotes its length.

2.1. Σ-Graphs. We now define the central graph model that is used to represent graph databases as well as finite automata. Let Σ be a finite alphabet of constant size. A Σ-graph is a directed, edge-labelled multigraph G = (V, E), where V is the set of vertices (or nodes) and E ⊆ V × (Σ ∪ {ε}) × V is the set of arcs; for every u ∈ V and x ∈ Σ ∪ {ε}, we write E_x(u) = {v | (u, x, v) ∈ E} for the set of x-successors of u. A path from u to v of length k is a sequence p = (v_0, a_1, v_1, a_2, ..., a_k, v_k) with v_0 = u and v_k = v such that (v_{i−1}, a_i, v_i) ∈ E for every i ∈ [k]. We say that p is labelled with the word a_1 a_2 ... a_k ∈ Σ^*. According to this definition, for every v ∈ V, (v) is a path from v to v of length 0 that is labelled by ε. Hence, every node v of a Σ-graph has an ε-labelled path to itself, even though there might not be an ε-arc from v to v. Moreover, due to ε as a possible edge label, paths of length k may be labelled with words of length less than k. By the underlying graph of G we mean the graph on V obtained from G by dropping all labels, parallel arcs and loops (note that the underlying graph is simple, non-labelled and has no loops). In particular, by a slight abuse of notation, we denote by E^* the reflexive-transitive closure of the underlying graph of G.
Since we always assume |Σ| to be a constant, we have that |G| = max{|V|, |E|} is asymptotically equal to the size of the underlying graph of G. For every u ∈ V, the degree of u is ∆(u) = |⋃_{x ∈ Σ∪{ε}} E_x(u)| (so ∆(u) is actually the out-degree), and the maximum degree of G is ∆(G) = max{∆(u) | u ∈ V}. Since Σ-graphs are the central data structures for our algorithms, we have to discuss implementational aspects of Σ-graphs in more detail. The set V of a Σ-graph G = (V, E) is represented as a list, and, for every u ∈ V and for every x ∈ Σ ∪ {ε}, we store a list of all x-successors of u, which is called the x-adjacency list for u. We assume that we can check in constant time whether a list is empty and that we can insert elements in constant time. However, finding and deleting an element from a list requires linear time. Furthermore, we assume that we always store together with a node a pointer to its adjacency lists (thus, we can always retrieve the x-adjacency list for a given node in constant time).
Remark 2.1. All lower bounds presented in this paper hold for any graph representation that can be constructed in time linear in |G| = max{|V|, |E|}. For the upper bounds, we chose the simple representation with adjacency lists as it emerged as the natural structure for our enumeration approach; let us point out here that since we always store pointers to the adjacency lists along with the nodes, we can perform a breadth-first search (BFS) from any given start node u in time O(|G|). It is a plausible assumption that most specific graph representations can be transformed into our list-based representation without much effort. This ensures a certain generality of our upper complexity bounds in the sense that the corresponding algorithms are, to a large extent, independent of implementational details. Note also that the list-based structure only requires space linear in |G|.
In the adjacency list representation, we do not have random access to specific nodes in the graph database, or to specific neighbours of a given node. Thus, we have to account for a non-constant running time when performing such operations. However, the algorithms for our upper bounds are independent of this aspect, i.e., the total running times would not change if we assumed random access to nodes in constant time.
An exception to this is Theorem 7.1, for which we can obtain some small improvement by applying the technique of lazy array initialization (see Remark 7.2).
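To make the list-based representation concrete, the following sketch performs the O(|G|)-time BFS mentioned in Remark 2.1; Python dicts of lists stand in for the adjacency lists, and the alphabet and example graph are hypothetical.

```python
from collections import deque

SIGMA = ['a', 'b', 'eps']  # hypothetical alphabet (plus the eps-label)

def bfs(adj, start):
    """Reachability BFS on a list-based Sigma-graph in time O(|G|).

    adj[u][x] plays the role of the x-adjacency list of u; looking up
    adj[u] corresponds to following the pointer stored with node u.
    Labels are ignored, i.e., we traverse the underlying graph.
    """
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for x in SIGMA:                      # constantly many lists per node
            for v in adj[u].get(x, []):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return seen

G = {'u': {'a': ['v'], 'eps': ['w']}, 'v': {'b': ['w']}, 'w': {}}
```

Here bfs(G, 'u') returns the set of nodes reachable from u in the underlying graph.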
For a Σ-graph G = (V, E), we denote by G^R = (V, E^R) the Σ-graph obtained from G by reversing all arcs, i.e., E^R = {(v, x, u) | (u, x, v) ∈ E}. The Σ-graph G^R can be computed in time O(|G|).

Proof. Since G^R = (V, E^R), it is sufficient to show how the adjacency lists that represent E^R can be computed. To avoid confusion with respect to whether we talk about the Σ-graph G or the Σ-graph G^R to be constructed, we denote the x-adjacency lists by x-G-adjacency lists or x-G^R-adjacency lists, respectively.
We first move through the list for V and, for every u ∈ V, we store this node in an array along with an empty x-G^R-adjacency list for every x ∈ Σ ∪ {ε}. This requires time O(|V|). Then, for every u ∈ V and x ∈ Σ ∪ {ε}, we move through the x-G-adjacency list for u, and for every element v that we encounter, we add u to the x-G^R-adjacency list for v. Since we can access all x-G^R-adjacency lists in constant time, we only have to add as many elements to the x-G^R-adjacency lists as there are edges in E, i.e., the second step can be done in time O(|E|). Consequently, we can construct all x-G^R-adjacency lists in total time O(|V| + |E|) = O(|G|).

2.2. Graph Databases and Regular Path Queries. A nondeterministic finite automaton (NFA for short) is a tuple M = (G, S, T), where G = (V, E) is a Σ-graph (the nodes q ∈ V are also called states), S ⊆ V with S ≠ ∅ is the set of start states and T ⊆ V with T ≠ ∅ is the set of final states. The language L(M) of an NFA M is the set of all labels of paths from some start state to some final state. For a Σ-graph G = (V, E), any subsets S, T ⊆ V with S ≠ ∅ ≠ T induce the NFA (G, S, T). If S = {s} and T = {t} are singletons, then we also write (G, s, t) instead of (G, {s}, {t}).
The set RE_Σ of regular expressions (over Σ) is recursively defined as follows: a ∈ RE_Σ for every a ∈ Σ ∪ {ε}; and (α • β) ∈ RE_Σ, (α ∨ β) ∈ RE_Σ, and (α)^+ ∈ RE_Σ, for every α, β ∈ RE_Σ. For any α ∈ RE_Σ, let L(α) be the regular language described by the regular expression α, defined as usual: for every a ∈ Σ ∪ {ε}, L(a) = {a}, and for every α, β ∈ RE_Σ, L(α • β) = L(α) · L(β), L(α ∨ β) = L(α) ∪ L(β), and L(α^+) = L(α)^+. We also use α^* as a shorthand for α^+ ∨ ε. By |α|, we denote the length of α represented as a string.

Proposition 2.3. Given α ∈ RE_Σ, we can construct in time O(|α|) an NFA M_α with O(|α|) states, O(|α|) arcs, a single start state p_0 and a single final state p_f such that L(M_α) = L(α).

Proof. Let α ∈ RE_Σ. We first construct the syntax tree T_α of α with node set V_{T_α}. Obviously, T_α has size O(|α|) and we can obtain T_α from α in time O(|α|) (for example, we can transform the expression α to prefix notation and then construct T_α while moving through the prefix notation of α from left to right). We now construct an NFA M_α = (G = (V, E), {p_0}, {p_f}) from T_α as follows (recall that G is a Σ-graph and therefore it should adhere to our representation of Σ-graphs). We first construct an array of size 2|V_{T_α}| that contains the nodes of V = {t_1, t_2 | t ∈ V_{T_α}}, and we initialise empty x-adjacency lists for all these nodes and for every x ∈ Σ ∪ {ε}. Then we move through T_α top-down and process the current node t as follows:
• If t is an inner node with two children r and s that corresponds to a concatenation •, then we add r_1 to the ε-adjacency list of t_1, we add s_1 to the ε-adjacency list of r_2, and we add t_2 to the ε-adjacency list of s_2.
• If t is an inner node with two children r and s that corresponds to an alternation ∨, then we add r_1 to the ε-adjacency list of t_1, we add s_1 to the ε-adjacency list of t_1, we add t_2 to the ε-adjacency list of r_2, and we add t_2 to the ε-adjacency list of s_2.
• If t is an inner node with one child r (which means it necessarily corresponds to a +), then we add r_1 to the ε-adjacency list of t_1, we add t_2 to the ε-adjacency list of r_2, and we add t_1 to the ε-adjacency list of t_2.
• If t is a leaf labelled with x ∈ Σ ∪ {ε}, then we add t_2 to the x-adjacency list of t_1.
Finally, if t is the root of T_α, we relabel t_1 by p_0 and we relabel t_2 by p_f.
It can be easily verified that L(M α ) = L(α).
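The top-down wiring in the proof of Proposition 2.3 can be sketched as follows; the tuple encoding of the syntax tree ('.' for •, '|' for ∨, '+' for +) and the integer state names are illustrative stand-ins for T_α and the node pairs t_1, t_2.

```python
import itertools

def thompson_nfa(tree):
    """epsilon-NFA (adjacency dict, start p0, final pf) from a syntax tree.

    A tree node is a symbol of Sigma + {'eps'} (a leaf) or a tuple
    ('.', l, r), ('|', l, r), ('+', c).  Every tree node t gets two fresh
    states t1, t2, wired exactly as in the four cases of the proof.
    """
    fresh, adj = itertools.count(), {}

    def add(p, x, q):
        adj.setdefault((p, x), []).append(q)

    def build(t):
        t1, t2 = next(fresh), next(fresh)
        if isinstance(t, str):                     # leaf labelled x
            add(t1, t, t2)
        elif t[0] == '.':                          # concatenation
            r1, r2 = build(t[1])
            s1, s2 = build(t[2])
            add(t1, 'eps', r1); add(r2, 'eps', s1); add(s2, 'eps', t2)
        elif t[0] == '|':                          # alternation
            r1, r2 = build(t[1])
            s1, s2 = build(t[2])
            add(t1, 'eps', r1); add(t1, 'eps', s1)
            add(r2, 'eps', t2); add(s2, 'eps', t2)
        else:                                      # ('+', c): one or more
            r1, r2 = build(t[1])
            add(t1, 'eps', r1); add(r2, 'eps', t2); add(t2, 'eps', t1)
        return t1, t2

    p0, pf = build(tree)
    return adj, p0, pf

# syntax tree of the expression a . b+ (illustrative)
A, p0, pf = thompson_nfa(('.', 'a', ('+', 'b')))
```

Two states and at most four arcs are created per tree node, which matches the O(|α|) size bound of the proposition.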

Name         Input        Task
RPQ-Boole    D, q         Decide whether q_B(D) = true.
RPQ-Test     D, q, u, v   Decide whether (u, v) ∈ q(D).
RPQ-Witness  D, q         Compute a witness (u, v) ∈ q(D) or report that none exists.
RPQ-Eval     D, q         Compute the whole set q(D).
RPQ-Count    D, q         Compute |q(D)|.
RPQ-Enum     D, q         Enumerate the whole set q(D).
Table 3: The investigated RPQ-evaluation problems (D is a graph database, q an RPQ and u, v two nodes from D).
In the following, when we speak about an automaton (or an NFA) for a regular expression α, we always mean an NFA equivalent to α with the properties asserted by Proposition 2.3.
A Σ-graph without ε-arcs is also called a graph database (over Σ); in the following, we denote graph databases by D = (V_D, E_D). Since V_D is represented as a list, any graph database implicitly represents a linear order on V_D (i.e., the order induced by the list that represents V_D), which we denote by ⪯_D, or simply ⪯. Slightly abusing notation, we shall also call single graph databases sparse to denote that we are dealing with a graph database from a sparse class of graph databases.
Regular expressions q (over alphabet Σ) are interpreted as regular path queries (RPQs) for graph databases (over Σ). The result q(D) of an RPQ q on a graph database D is the set of all pairs (u, v) ∈ V_D × V_D such that there is a path from u to v in D that is labelled with a word from L(q). If we interpret q as a Boolean RPQ, then the result is q_B(D) = true if q(D) ≠ ∅ and q_B(D) = false otherwise. We consider the RPQ-evaluation problems summarised in Table 3. By sorted RPQ-Enum (or semi-sorted RPQ-Enum), we denote the variant of RPQ-Enum where the pairs of q(D) are to be enumerated in lexicographical order with respect to ⪯_D (or ordered only with respect to their left elements, while successive pairs with the same left element can be ordered arbitrarily, respectively).
Remark 2.4. If an order ⪯′ on V_D is explicitly given as a bijection π : V_D → {1, ..., n}, then we can modify D (in time O(|V_D|)) such that ⪯_D = ⪯′. In this regard, sorted RPQ-Enum just models the case where we wish the enumeration to be sorted according to some order. In particular, by assuming the order ⪯_D to be implicitly represented by D, we do not hide the complexity of sorting n elements.

Lemma 2.5. Given a graph database D with |V_D| = n, we can compute in time O(|D|) a graph database D′ with V_{D′} = [n] that is isomorphic to D, together with an isomorphism π between D′ and D.

Proof. Let D be a graph database with |V_D| = n. We initialise an array A of size n that can be addressed with the elements of V_D and whose entries can store numbers of [n], and an array B of size n that can be addressed with the elements of [n] and whose entries can store elements of V_D. Furthermore, we initialise a counter c = 1. Then we move through the list for V_D from left to right, and for each element u that we encounter, we set A[u] = c, B[c] = u, and increment c. Next, we make a copy D′ of D but replace each u ∈ V_D by the number A[u] (note that D is just a collection of lists that store elements of V_D along with pointers to lists). This can be done in time O(|D|). Moreover, D′ is obviously isomorphic to D, and B describes an isomorphism between D′ and D.

Lemma 2.5 means that with an overhead of time O(|D|), we can always assume that our input graph databases are well-formed; in particular, note that with the isomorphism π ensured by the lemma, we can always translate elements from q(D′), where D′ is the well-formed graph database isomorphic to D, back to the corresponding elements of q(D). Thus, Lemma 2.5 justifies that whenever we spend at least O(|D|) in some preprocessing, we can always assume that the input graph database is well-formed.
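The two passes in the proof of Lemma 2.5 can be sketched as follows; Python dicts stand in for the arrays A and B, and arc-triple lists for the list-based representation of D (all names are illustrative).

```python
def make_well_formed(V_list, arcs):
    """Rename the nodes of D by 1..n, as in the proof of Lemma 2.5.

    Returns the renamed arcs of D' together with A (node -> number) and
    B (number -> node); B is the isomorphism used to translate answers
    of q(D') back to elements of q(D).
    """
    A, B, c = {}, {}, 1
    for u in V_list:                 # one pass through the list for V_D
        A[u], B[c] = c, u
        c += 1
    D_prime = [(A[u], x, A[v]) for (u, x, v) in arcs]  # the copy, O(|D|)
    return D_prime, A, B

Dp, A, B = make_well_formed(['x', 'y', 'z'], [('x', 'a', 'y'), ('y', 'b', 'z')])
```

A pair (1, 3) in q(D′) would translate back to (B[1], B[3]) = ('x', 'z') in q(D).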

General Algorithmic Framework for RPQ-Evaluation

We assume the RAM model with logarithmic word-size as our computational model. Let us next discuss our algorithmic framework for RPQ-evaluation. The input to our algorithms is a graph database D = (V_D, E_D) and an RPQ q (and, for solving the problem RPQ-Test, also a pair (u, v) ∈ V_D × V_D).
In the case of RPQ-Enum, the algorithms have routines preprocess and enum. Initially, preprocess performs some preliminary computations on the input or constructs some auxiliary data structures; the performance of preprocess is measured in its running time depending on the input size as usual (i.e., we treat preprocess as an individual algorithm). Then enum will produce an enumeration (u_1, v_1), (u_2, v_2), ..., (u_ℓ, v_ℓ) such that q(D) = {(u_i, v_i) | 1 ≤ i ≤ ℓ}, no element occurs twice, and the algorithm reports when the enumeration is done. We measure the performance of enum in terms of its delay, which describes the time that (in the worst case) elapses between enumerating two consecutive elements, between the start of the enumeration and the first element, and between the last element and the end of the enumeration (or between start and end in case q(D) = ∅). We say that (variants of) RPQ-Enum can be solved with preprocessing p and delay d, where p and d are functions bounding the preprocessing running time and the delay. In the case that p = O(d), the preprocessing complexity is absorbed by the delay; in this case, we say that (variants of) RPQ-Enum can be solved with delay d and do not mention any bound on the preprocessing.
We also consider RPQ-Enum in the dynamic setting, i.e., there is the possibility to perform update operations on the input graph database D, which trigger a routine update. After an update and termination of the update routine, invoking enum is supposed to enumerate q(D′), where D′ is the updated graph database. The performance of an algorithm for RPQ-Enum is then measured in the running times of the routines preprocess (to be initially carried out only once) and update, as well as the delay. We only consider the following types of individual updates: inserting a new arc between existing nodes, deleting an arc, adding a new (isolated) node, and deleting an (isolated) node. In particular, deleting or adding a single non-isolated node u may require a non-constant number of updates.

The Product Graph Approach
The PG-approach has already been informally described in the introduction; for our fine-grained perspective, we need to define it in detail. Let D = (V_D, E_D) be a graph database over some alphabet Σ and let q be an RPQ over Σ. Furthermore, let (G_q, p_0, p_f) with G_q = (V_q, E_q) be an automaton for q. Recall that, according to Proposition 2.3, G_q can be obtained in time O(|q|) and it has O(|q|) states and O(|q|) arcs. The product graph of D and G_q is the Σ-graph G_⊠(D, q) = (V_⊠(D, q), E_⊠(D, q)) with V_⊠(D, q) = V_D × V_q that has, for every x ∈ Σ, an arc ((u, p), x, (v, p′)) whenever (u, x, v) ∈ E_D and (p, x, p′) ∈ E_q, as well as an arc ((u, p), ε, (u, p′)) whenever (p, ε, p′) ∈ E_q.

Remark 3.1. The arc labels in G_⊠(D, q) are superfluous in the sense that we only need the underlying graph of G_⊠(D, q) (see Lemma 3.3). We define it nevertheless as a Σ-graph, since then all our definitions and terminology for Σ-graphs introduced above apply as well.
Lemma 3.2. The product graph G_⊠(D, q) has at most |V_D||q| nodes and O(|D||q|) arcs, and it can be constructed in time O(|D||q|).

Proof. We first note that |V_⊠(D, q)| ≤ |V_D||q| directly follows from the definition. Moreover, the following is also immediate by definition: |E_⊠(D, q)| ≤ |E_D||E_q| + |V_D||E_q| = O(|D||q|). For the question of how G_⊠(D, q) can be computed, we have to keep in mind our list-based implementation of Σ-graphs (see Section 2). For every u ∈ V_D and every p ∈ V_q, we add (u, p) to the list that stores V_⊠(D, q). This can be done by moving |V_D| times through the list for V_q; thus, time O(|V_D||q|) suffices. In order to construct the adjacency lists, we proceed as follows. Let u ∈ V_D, p ∈ V_q and x ∈ Σ. Then we add all (v, p′) to the x-adjacency list of (u, p), where v is an element of the x-adjacency list of u and p′ is an element of the x-adjacency list of p. Moreover, we add all (u, p′) to the ε-adjacency list of (u, p), where p′ is an element of the ε-adjacency list of p. This can be done by moving once through the lists of adjacency lists for E_D and, for each encountered element, moving through the lists of adjacency lists for E_q. Since each insertion into a list can be done in constant time, the whole procedure can be carried out in time O(|D||q|).

The following lemma, which is an immediate consequence of the construction, shows how G_⊠(D, q) can be used for solving RPQ-evaluation tasks (recall that E^* is the reflexive-transitive closure of the underlying unlabelled graph).

Lemma 3.3. For all u, v ∈ V_D, we have (u, v) ∈ q(D) if and only if ((u, p_0), (v, p_f)) ∈ (E_⊠(D, q))^*.
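The construction from the proof of Lemma 3.2 can be sketched as follows; arc-triple lists and a dict of successor lists are illustrative stand-ins for the list-based representation, and (as Remark 3.1 permits) the labels of the product arcs are dropped.

```python
def product_graph(db_arcs, nfa_arcs, db_nodes):
    """Successor lists of the (unlabelled) product graph of D and G_q.

    db_arcs / nfa_arcs are lists of labelled arcs (u, x, v) / (p, x, p2),
    with 'eps' as the label of epsilon-arcs of the automaton.
    """
    prod = {}

    def add(s, t):
        prod.setdefault(s, []).append(t)

    for (u, x, v) in db_arcs:          # move once through E_D ...
        for (p, y, p2) in nfa_arcs:    # ... and for each arc through E_q
            if x == y:
                add((u, p), (v, p2))   # synchronised step on symbol x
    for (p, y, p2) in nfa_arcs:        # eps-arcs of G_q: D stays put
        if y == 'eps':
            for u in db_nodes:
                add((u, p), (u, p2))
    return prod

P = product_graph([('u', 'a', 'v')], [(0, 'a', 1), (1, 'eps', 2)], ['u', 'v'])
```

Each product arc is produced by exactly one pair of arcs (or one ε-arc and one node), matching the O(|D||q|) bound.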

Fine-grained Complexity and Conditional Lower Bounds
We now state several computational problems along with hypotheses regarding their complexity, which are commonly used in the framework of fine-grained complexity to obtain conditional lower bounds.We discuss some details and give background information later on.
• Orthogonal Vectors (OV): Given sets A, B, each containing n Boolean vectors of dimension d, check whether there are vectors ⃗a ∈ A and ⃗b ∈ B that are orthogonal. OV-Hypothesis: For every ϵ > 0 there is no algorithm solving OV in time O(n^{2−ϵ} poly(d)).
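For concreteness, the brute-force OV algorithm that the hypothesis conjectures to be essentially optimal is just a double loop over all pairs (a sketch; vectors are modelled as 0/1 tuples):

```python
def has_orthogonal_pair(A, B):
    """Brute-force OV in time O(n^2 * d): try every pair a in A, b in B."""
    return any(all(x * y == 0 for x, y in zip(a, b))
               for a in A for b in B)

A = [(1, 0, 1), (0, 1, 1)]
B = [(1, 1, 0), (0, 1, 0)]  # (1, 0, 1) and (0, 1, 0) are orthogonal
```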
• Boolean Matrix Multiplication (BMM): Given Boolean n×n matrices A, B, compute A×B.
com-BMM-Hypothesis: For every ϵ > 0 there is no combinatorial algorithm that solves BMM in time O(n^{3−ϵ}).
• Sparse Boolean Matrix Multiplication (SBMM): Like BMM, but all matrices are represented as sets {(i, j) | A[i, j] = 1} of 1-entries. SBMM-Hypothesis: There is no algorithm that solves SBMM in time O(m), where m is the total number of 1-entries, i.e., m = |{(i, j) | A[i, j] = 1}| + |{(i, j) | B[i, j] = 1}| + |{(i, j) | (A × B)[i, j] = 1}|.
• Online Boolean Matrix-Vector Multiplication (OMv): Given a Boolean n × n matrix M and a sequence of Boolean n-dimensional vectors ⃗v_1, ⃗v_2, ..., ⃗v_n, compute M⃗v_1, M⃗v_2, ..., M⃗v_n, where M⃗v_i is produced as output before ⃗v_{i+1} is received as input. OMv-Hypothesis: For every ϵ > 0 there is no algorithm that solves OMv in time O(n^{3−ϵ}).

We will reduce these problems to variants of RPQ evaluation problems in such a way that algorithms with certain running times for RPQ evaluation would break the corresponding hypotheses mentioned above. Thus, we obtain lower bounds for RPQ evaluation that are conditional on these hypotheses. In the following, we give a very brief overview of the relevance of these problems and the corresponding hypotheses in fine-grained complexity.
The problem OV can be solved by brute force in time O(n^2 d), and the hypothesis that there is no subquadratic algorithm is well established. It exists in slightly different variants and has been formulated in several different places in the literature (e.g., [Bri14, Bri19, Wil15]). The variant used here is sometimes referred to as the moderate-dimension OV-hypothesis, in contrast to low-dimension variants, where d can be assumed to be rather small in comparison to n. The relevance of the OV-hypothesis is due to the fact that it is implied by the Strong Exponential Time Hypothesis (SETH) [Wil04, Wil05], and therefore it is a convenient tool for proving SETH lower bounds that has been applied in various contexts.
One of the most famous computational problems is BMM, which, unfortunately, is a much less suitable basis for conditional lower bounds. The straightforward algorithm solves it in time O(n^3), but there are fast matrix multiplication algorithms that run in time O(n^{2.373}) [Wil12b, Gal14]. It is unclear how much further this exponent can be decreased and there is even belief that BMM can be solved in time n^{2+o(1)} (see [Wil12a] and [BGS20, Section 6]). However, these theoretically fast algorithms cannot be considered efficient in a practical sense, which motivates the mathematically informal notion of "combinatorial" algorithms (see, e.g., [WW18]). So far, no truly subcubic combinatorial BMM-algorithm is known, and it has been shown in [WW18] that BMM is contained in a class of problems (including other prominent examples like Triangle Finding (also mentioned below) and Context-Free Grammar Parsing) which are all equivalent in the sense that if one such problem is solvable in truly subcubic time by a combinatorial algorithm, then all of them are. Consequently, it is often possible to argue that the existence of a certain combinatorial algorithm for some problem would imply a major (and unlikely) algorithmic breakthrough with respect to BMM, Parsing, Triangle Finding, etc. Despite the defect of relying on the vague notion of combinatorial algorithms, this lower bound technique is a common approach in fine-grained complexity (see, e.g., [WW18, HKNS15, AW14, AWY18, ABW18, HLNW17]). Whenever we use the com-BMM-hypothesis, our reductions will always be combinatorial, which is necessary; moreover, whenever we say that a certain running time cannot be achieved unless the com-BMM-hypothesis fails, we mean, of course, that it cannot be achieved by a combinatorial algorithm.
In order to make BMM suitable as a base problem for conditional lower bounds (that do not rely on combinatorial algorithms), one can formulate the weaker (i.e., more plausible) hypothesis that BMM cannot be solved in time linear in the number of 1-entries of the matrices (therefore called sparse BMM, since matrices are represented in a sparse way); see [AP09, YZ05]. Another approach is to require the output matrix A × B to be computed column by column, i.e., formulating it as the online version OMv. For OMv, subcubic algorithms are not known and would yield several major algorithmic breakthroughs (see [HKNS15]).
A convenient tool for dealing with BMM is the problem Triangle: check whether a given undirected graph G has a triangle. This is due to the fact that these two problems are subcubic-equivalent with respect to combinatorial algorithms (see [WW18]), i.e., the com-BMM-hypothesis fails if and only if Triangle can be solved by a combinatorial algorithm in time O(n^{3−ϵ}) for some ϵ > 0. Thus, for lower bounds conditional on the com-BMM-hypothesis, we can make use of both these problems. There is also a (non-combinatorial) Triangle-hypothesis that states that Triangle cannot be solved in time linear in the number of edges, but we were not able to apply it in the context of RPQ-evaluation (see [AW14] for different variants of Triangle).

Bounds for the Non-Enumeration Problem Variants
We now investigate how well the PG-approach performs with respect to the non-enumeration variants of RPQ-evaluation, and we give some evidence that, in most cases, it can be considered optimal or almost optimal (subject to the algorithmic hypotheses of Section 4).

5.1. Boolean Evaluation, Testing and Computing a Witness. It is relatively straightforward to see that the problems RPQ-Test and RPQ-Boole are equivalent and that both can be reduced to RPQ-Witness. Hence, upper bounds for RPQ-Witness and lower bounds for RPQ-Test or RPQ-Boole automatically apply to all three problem variants, which simplifies the proofs for such bounds. We shall now formally prove this.
Lemma 5.1. Let (D, q) be an RPQ-Boole-instance. Then we can construct in time O(|D| + |q|) an equivalent RPQ-Test-instance (D′, q′, u, v).

Proof. Let D and q be defined over Σ. We transform D into a graph database D′ over Σ ∪ {#} (where # ∉ Σ) by adding new nodes u and v, and new arcs (u, #, x) and (x, #, v) for every x ∈ V_D. Moreover, we set q′ = #q#. This construction can be carried out in time O(|D| + |q|). It remains to show that q_B(D) = true if and only if (u, v) ∈ q′(D′). If q_B(D) = true, then there is some (u′, v′) ∈ q(D), which means that in D there is a path u′, ..., v′ that is labelled with a word w ∈ L(q). Since there are arcs (u, #, u′) and (v′, #, v) in D′, there is a path u, u′, ..., v′, v in D′ that is labelled with #w# ∈ L(q′). Therefore, (u, v) ∈ q′(D′). On the other hand, if (u, v) ∈ q′(D′), then there is some path u, u′, ..., v′, v in D′ that is labelled with #w# ∈ L(q′), so therefore also (u′, v′) ∈ q(D) and q_B(D) = true (note that u′ = v′ and therefore w = ε is also possible).
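The transformation of Lemma 5.1 is simple enough to sketch directly; arc-triple lists stand in for the list-based representation, and the fresh-node names are illustrative.

```python
def boole_to_test(db_nodes, db_arcs, q):
    """Lemma 5.1: map (D, q) to (D', #q#, u, v) in time O(|D| + |q|).

    Adds fresh nodes u, v and, for every x in V_D, the arcs (u, '#', x)
    and (x, '#', v); then q_B(D) = true iff (u, v) is in (#q#)(D').
    """
    u, v = 'u_new', 'v_new'            # fresh nodes (illustrative names)
    arcs = (db_arcs
            + [(u, '#', x) for x in db_nodes]
            + [(x, '#', v) for x in db_nodes])
    return db_nodes + [u, v], arcs, '#' + q + '#', u, v

nodes, arcs, qq, u, v = boole_to_test(['a', 'b'], [('a', '0', 'b')], '0')
```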
Lemma 5.2. Let (D, q, u, v) be an RPQ-Test-instance. Then we can construct in time O(|D| + |q|) an equivalent RPQ-Boole-instance (D′, q′).

Proof. Let D and q be defined over Σ. We transform D into a graph database D′ over Σ ∪ {#} (where # ∉ Σ) by adding new nodes s and t with arcs (s, #, u) and (v, #, t), and we define q′ = #q#. This construction can be carried out in time O(|D| + |q|). It remains to show that (u, v) ∈ q(D) if and only if q′_B(D′) = true. If (u, v) ∈ q(D), then there is a path from u to v in D that is labelled with a word from L(q). Since there are arcs (s, #, u) and (v, #, t) in D′, there is a path in D′ from s to t labelled with a word from L(#q#); thus (s, t) ∈ q′(D′) and therefore q′_B(D′) = true. On the other hand, if q′_B(D′) = true, then we can conclude that (s, t) ∈ q′(D′), which is due to the fact that q′ = #q#, q does not contain any occurrence of #, and the only arcs labelled with # have source s or target t. This implies that there is a path s, u, ..., v, t labelled with a word #w# with w ∈ L(q), which implies that there is a path from u to v labelled with a word from L(q), which means that (u, v) ∈ q(D).

We are now ready to prove that the PG-approach yields the following upper bound.

Theorem 5.4. RPQ-Boole, RPQ-Test and RPQ-Witness can be solved in time O(|D||q|).

Proof. We only show the upper bound for RPQ-Witness (due to Theorem 5.3 it applies to RPQ-Test and RPQ-Boole as well). To this end, let D = (V_D, E_D) be a graph database over Σ and let q be an RPQ over Σ. We construct G_⊠(D, q), which, according to Lemma 3.2, can be done in time O(|D||q|). In the following considerations, we interpret G_⊠(D, q) as its underlying non-labelled graph.
We add to G⊠(D, q) a node v_source with an arc to each (u, p_0) with u ∈ V_D. This can be done in time O(|V_D||q|) as follows. First, we add v_source to the list that represents V⊠(D, q), which requires constant time. Then, we move through the list that represents V⊠(D, q) and every node (u, p_0) that we encounter is added to the adjacency list for v_source.
Next, we perform a special kind of BFS from v_source. First, we construct an array S of size |V⊠(D, q)| whose entries can store values from V⊠(D, q) ∪ {0} and which can be addressed by the nodes from V⊠(D, q). Moreover, S is initialised with every entry storing 0. Then, for every u ∈ V_D, we set S[(u, p_0)] = u. This can be done in time O(|V⊠(D, q)|) = O(|D||q|). We perform a BFS from v_source and whenever we traverse an arc ((u, p), (u′, p′)), we set S[(u′, p′)] = S[(u, p)]. This can be done in time O(|G⊠(D, q)|) = O(|D||q|). Since S[(u, p_0)] = u for every u ∈ V_D, we can conclude by induction that, for every (v, p) ∈ V⊠(D, q), if (v, p) is reachable from some node (u, p_0), then S[(v, p)] = u′ for some u′ ∈ V_D such that (v, p) is reachable from (u′, p_0), and if (v, p) is not reachable from any node (u, p_0), then S[(v, p)] = 0. Consequently, if there is some v ∈ V_D with S[(v, p_f)] = u ≠ 0, then there is a path from (u, p_0) to (v, p_f) in G⊠(D, q), which, according to Lemma 3.3, means that (u, v) ∈ q(D) and therefore, we can produce the output (u, v). On the other hand, if there is no v ∈ V_D with S[(v, p_f)] ≠ 0, then there are no u, v ∈ V_D with a path from (u, p_0) to (v, p_f); thus, q(D) = ∅. Checking whether there is some v ∈ V_D with S[(v, p_f)] ≠ 0 can be done in time O(|V_D|).
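The PG-approach with the origin-tracking array S can be sketched as follows. This is a minimal illustration, not the paper's implementation: the query is assumed to be given directly as an NFA transition relation (`nfa_delta`, with initial state `p0` and a single final state `pf`), the product graph is materialised explicitly, and the array S of the proof appears as the dict `origin`.

```python
from collections import deque

def rpq_witness(arcs, nfa_delta, p0, pf):
    """Sketch of the PG-approach for RPQ-Witness: arcs is a list of labelled
    arcs (u, label, v); nfa_delta maps (state, label) to a set of states.
    Returns some (u, v) in q(D), or None if q(D) is empty."""
    nodes = {u for (u, _, v) in arcs} | {v for (u, _, v) in arcs}
    states = {p0, pf} | {p for (p, _) in nfa_delta}
    for succ in nfa_delta.values():
        states |= succ
    # materialise the product graph G_x(D, q)
    prod = {(u, p): [] for u in nodes for p in states}
    for (u, a, v) in arcs:
        for p in states:
            for p2 in nfa_delta.get((p, a), ()):
                prod[(u, p)].append((v, p2))
    # BFS from a virtual source connected to every (u, p0),
    # propagating the originating database node (the array S of the proof)
    origin = {}
    queue = deque()
    for u in nodes:
        origin[(u, p0)] = u
        queue.append((u, p0))
    while queue:
        x = queue.popleft()
        for y in prod[x]:
            if y not in origin:
                origin[y] = origin[x]
                queue.append(y)
    for u in nodes:
        if (u, pf) in origin:
            return (origin[(u, pf)], u)  # a witness pair of q(D)
    return None
```

For a fixed alphabet, building `prod` and running the BFS both stay within O(|D||q|), matching the bound of the proof.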
More interestingly, we can complement this upper bound with lower bounds as follows.

Proof. We prove the lower bound for RPQ-Test only, since by Theorem 5.3 it also applies to RPQ-Witness and RPQ-Boole. We first devise a general reduction from the OV-problem to RPQ-Test (which is similar to the reduction from [BI16] used for proving conditional lower bounds of regular expression matching): Let A = {a⃗_1, ..., a⃗_n} and B = {b⃗_1, ..., b⃗_n} be an instance for the OV-problem. We define an RPQ q and a graph database D over the alphabet Σ = {0, 1, #} as follows. For every i ∈ [n], let w_i = b⃗_i[1] b⃗_i[2] ··· b⃗_i[d] be the word that spells out b⃗_i, and set q = #(w_1 ∨ w_2 ∨ ... ∨ w_n)#. Moreover, D = (V_D, E_D), where V_D contains nodes s and t and, for every i ∈ [n], nodes v_{i,0}, v_{i,1}, ..., v_{i,d}. For every i ∈ [n] and j ∈ {0} ∪ [d − 1], there is an arc from v_{i,j} to v_{i,j+1} labelled with 0 and, if a⃗_i[j + 1] = 0, also an arc from v_{i,j} to v_{i,j+1} labelled with 1. Finally, for every i ∈ [n], there are arcs labelled with # from s to v_{i,0} and from v_{i,d} to t. It can be easily verified that there are orthogonal a⃗ ∈ A and b⃗ ∈ B if and only if (s, t) ∈ q(D).

We now assume that RPQ-Test can be solved in time O(|D|^2 + |q|^{2−ϵ}) for some ϵ > 0. Let again A = {a⃗_1, ..., a⃗_n} and B = {b⃗_1, ..., b⃗_n} be an instance for the OV-problem, and let ϵ′ be arbitrarily chosen with 0 < ϵ′ < ϵ. We divide A into A_1, A_2, ..., A_{⌈n^{ϵ′}⌉} with |A_i| = ⌈n^{1−ϵ′}⌉ for every i ∈ [⌈n^{ϵ′}⌉] and such that A_1 ∪ A_2 ∪ ... ∪ A_{⌈n^{ϵ′}⌉} = A. Obviously, (A, B) is a positive OV-instance if and only if at least one of (A_1, B), (A_2, B), ..., (A_{⌈n^{ϵ′}⌉}, B) is a positive OV-instance. We can now separately reduce each (A_i, B) to an RPQ-Test-instance (D_i, q) as described above, and we note that |D_i| = O(n^{1−ϵ′} d) and |q| = O(nd). Solving all these instances in the assumed time therefore requires time O(n^{ϵ′}((n^{1−ϵ′} d)^2 + (nd)^{2−ϵ})) = O((n^{2−ϵ′} + n^{2−(ϵ−ϵ′)}) d^2), where ϵ′ > 0 and (ϵ − ϵ′) > 0. This contradicts the OV-hypothesis.
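The OV reduction can be illustrated concretely. A sketch under the following assumptions: vectors are tuples of 0/1 of a common dimension d, the (finite-language) query #(w_1 ∨ ... ∨ w_n)# is represented as the list of its words, and RPQ-Test is simulated by a brute-force label-constrained search (`has_labelled_path` is our hypothetical helper, not an optimal evaluator).

```python
def ov_to_rpq_instance(A, B):
    """Build the graph database D over {0,1,#} encoding A, and the list of
    words of q = #(w_1 | ... | w_n)#, where w_j spells out vector b_j."""
    d = len(A[0])
    arcs = []
    for i, a in enumerate(A):
        arcs.append(('s', '#', (i, 0)))
        for j in range(d):
            arcs.append(((i, j), '0', (i, j + 1)))
            if a[j] == 0:               # label 1 allowed only where a_i is 0
                arcs.append(((i, j), '1', (i, j + 1)))
        arcs.append(((i, d), '#', 't'))
    words = [('#',) + tuple(str(x) for x in b) + ('#',) for b in B]
    return arcs, words

def has_labelled_path(arcs, word, source, target):
    """Brute-force stand-in for RPQ-Test on a single word."""
    frontier = {source}
    for a in word:
        frontier = {v for (u, lab, v) in arcs if u in frontier and lab == a}
    return target in frontier

def solve_ov_via_rpq(A, B):
    """(A, B) is a positive OV-instance iff (s, t) is in q(D)."""
    arcs, words = ov_to_rpq_instance(A, B)
    return any(has_labelled_path(arcs, w, 's', 't') for w in words)
```

A word w_j can be read along the path of a⃗_i exactly if every 1 of b⃗_j meets a 0 of a⃗_i, i.e., exactly if the two vectors are orthogonal.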
We assume that RPQ-Boole can be solved by a combinatorial algorithm in time O(|V_D|^{3−ϵ} + |q|^{3−ϵ}) for some ϵ > 0. We can then solve Triangle on some instance G = (V, E) with V = {v_1, v_2, ..., v_n} as follows (the result will then follow from the combinatorial subcubic equivalence of BMM and Triangle (see Section 4)). We construct a graph database D over {a, #} as follows. The set V_D contains nodes s′ and t′ and, for every i ∈ [n], nodes v_{i,0}, v_{i,1}, v_{i,2}, v_{i,3}. The arcs are chosen such that, for every i ∈ [n], there is a path from s′ to v_{i,0} labelled with #a^i and a path from v_{i,3} to t′ labelled with a^{n−i+1}# (this can be realised with O(n) auxiliary nodes by two shared chains of a-arcs attached to s′ and t′), and such that, for every k ∈ {0, 1, 2} and i, j ∈ [n], there is an arc (v_{i,k}, a, v_{j,k+1}) if and only if {v_i, v_j} ∈ E. For i, j ∈ [n], we call a path from s′ to t′ that visits v_{i,0} and v_{j,3} an i-j-path. Furthermore, we define the RPQ q = #a^{n+4}#.
It can be easily seen that any path from s′ to t′ is an i-j-path for some i, j ∈ [n], and that any i-j-path is labelled with #a^i a^3 a^{n−j+1}# = #a^{i+n−j+4}#. Hence, q(D) ≠ ∅ if and only if there is an i-i-path for some i ∈ [n]. Since, for every i ∈ [n], there is an i-i-path if and only if there is a path (v_{i,0}, v_{ℓ_1,1}, v_{ℓ_2,2}, v_{i,3}) for some ℓ_1, ℓ_2 ∈ [n], we see that, for every i ∈ [n], there is an i-i-path if and only if G has a triangle that contains v_i. Consequently, q(D) ≠ ∅ if and only if G has a triangle.
This means that first constructing D and q in time O(|D| + |q|) = O(|G| + |V|) and then solving RPQ-Boole for the instance (D, q) in the assumed time O(|V_D|^{3−ϵ} + |q|^{3−ϵ}) for some ϵ > 0 yields a combinatorial algorithm that solves Triangle in time O(|V|^{3−ϵ}). With Theorem 5.3, we conclude that the assumption that RPQ-Test or RPQ-Witness can be solved in time O(|V_D|^{3−ϵ} + |q|^{3−ϵ}) for some ϵ > 0 leads to a combinatorial O(|V|^{3−ϵ}) algorithm for Triangle as well. Thus, due to the combinatorial subcubic equivalence of BMM and Triangle (see Section 4), there is a combinatorial O(n^{3−ϵ}) algorithm for BMM, contradicting the com-BMM-hypothesis.
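The Triangle reduction can be sketched as follows. This is our own reconstruction for illustration: the #a^i and a^{n−j+1}# chains are built explicitly, q = #a^{n+4}# is evaluated by a brute-force label-constrained frontier search rather than an RPQ-Boole solver, and all node names are ours.

```python
def triangle_via_rpq(n, edges):
    """Decide whether the undirected graph ([n], edges) has a triangle by
    building D and q = #a^{n+4}# from the reduction and testing whether
    some path from s' to t' is labelled with a word of L(q)."""
    E = {frozenset(e) for e in edges}
    arcs = [('s', '#', ('in', 0)), (('out', n + 1), '#', 't')]
    for k in range(1, n + 1):
        arcs.append((('in', k - 1), 'a', ('in', k)))    # realises #a^i into v_{i,0}
        arcs.append((('out', k), 'a', ('out', k + 1)))  # realises a^{n-j+1}# after v_{j,3}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if frozenset((i, j)) in E:
                arcs.append((('in', i), 'a', ('l1', j)))   # v_{i,0} -> v_{j,1}
                arcs.append((('l1', i), 'a', ('l2', j)))   # v_{i,1} -> v_{j,2}
                arcs.append((('l2', i), 'a', ('out', j)))  # v_{i,2} -> v_{j,3}
    word = ['#'] + ['a'] * (n + 4) + ['#']
    frontier = {'s'}
    for lab in word:
        frontier = {v for (u, l, v) in arcs if u in frontier and l == lab}
    return 't' in frontier
```

Any s′-t′ path reads #a^{i+n−j+4}# for some i, j, so the word #a^{n+4}# is read exactly when i = j, i.e., when some triangle passes through vertex i.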
Moreover, such running-times are also ruled out under the com-BMM-hypothesis. Especially, a combinatorial algorithm with running-time O((|D||q|)^{1−ϵ}) refutes both the OV- and the com-BMM-hypothesis; thus, such an algorithm does not exist provided that at least one of these hypotheses is true (basing lower bounds on several hypotheses is common in fine-grained complexity, see, e. g., [AWY18]).
The lower bounds discussed above are only meaningful for combined complexity. However, the upper bound of Theorem 5.4 already yields the optimum of linear data complexity.

Full Evaluation and Counting.
The following upper bound is again a straightforward application of the PG-approach.

Theorem 5.7. RPQ-Eval can be solved in time O(|V_D||D||q|).

Proof. Let D = (V_D, E_D) be a graph database over Σ and let q be an RPQ over Σ. We construct G⊠(D, q), which, according to Lemma 3.2, can be done in time O(|D||q|). We interpret G⊠(D, q) as its underlying non-labelled graph. According to Lemma 3.3, we can now compute q(D) by performing a BFS from each node (u, p_0) with u ∈ V_D and collecting, for each such BFS, all reached nodes (v, p_f); this requires time O(|V_D||G⊠(D, q)|) = O(|V_D||D||q|).

Instead of using graph-searching techniques on G⊠(D, q), we could also compute the complete transitive closure of G⊠(D, q) with fast matrix multiplication.

Theorem 5.8. If BMM can be solved in time O(n^ω), then RPQ-Eval can be solved in time O((|V_D||q|)^ω).

Proof. We assume that BMM can be solved in time O(n^ω). Let D = (V_D, E_D) be a graph database over Σ and let q be an RPQ over Σ. We construct G⊠(D, q), which, according to Lemma 3.2, can be done in time O(|D||q|), and we interpret it as its underlying non-labelled graph. Obviously, we can turn the adjacency list-based representation of G⊠(D, q) into an adjacency matrix-based representation in time O(|V⊠(D, q)|^2) = O((|V_D||q|)^2) = O((|V_D||q|)^ω). Then, we compute the transitive closure (E⊠(D, q))* of G⊠(D, q), which can be done in time O(|V⊠(D, q)|^ω) = O((|V_D||q|)^ω) (see [Mun71]). In order to obtain q(D), it is sufficient to go through all elements ((u, p), (v, p′)) ∈ (E⊠(D, q))* and add (u, v) to a new set if and only if p = p_0 and p′ = p_f.

We mention this theoretical upper bound for completeness, but stress the fact that our main interest lies in combinatorial algorithms. In addition to the limitation that algorithms for fast matrix multiplication are not practical, we also observe that the approach of Theorem 5.8 is only better if the graph database is not too sparse, i. e., only if (|V_D||q|)^{ω−1} = O(|D|).

Next, we investigate the question whether O(|V_D||D||q|) is optimal for RPQ-Eval, at least with respect to combinatorial algorithms. Since for RPQ-Eval the PG-approach does not yield an algorithm that is linear in data complexity (like it was the case with respect to the problems of Section 5.1), the question arises whether the |V_D||D| part can be improved at the cost of spending more time in |q|. It seems necessary that respective data complexity lower bounds need reductions that do not use q to represent a non-constant part of the instance, as was the case for both the OV and the Triangle reduction from Section 5.1.
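The transitive-closure route can be illustrated with a naive Boolean product standing in for fast matrix multiplication. This is only a sketch: a real instantiation of Theorem 5.8 would replace the inner cubic product by an O(n^ω) multiplication.

```python
def transitive_closure_bmm(adj):
    """Transitive closure E^+ of a graph given as a 0/1 adjacency matrix,
    computed by repeated 'squaring' with a Boolean matrix product.
    The naive cubic product below is a stand-in for an O(n^omega) BMM."""
    n = len(adj)
    reach = [row[:] for row in adj]
    while True:
        # one Boolean 'squaring' step: reach := reach OR reach * reach
        new = [[1 if (reach[i][j] or any(reach[i][k] and reach[k][j]
                                         for k in range(n))) else 0
                for j in range(n)] for i in range(n)]
        if new == reach:
            return reach  # fixpoint: all pairs connected by a path
        reach = new
```

Applied to the (unlabelled) product graph G⊠(D, q), reading off the closure entries from (u, p_0) to (v, p_f) yields q(D), as in the proof of Theorem 5.8.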
It is not difficult to see that multiplying two n×n Boolean matrices reduces to RPQ-Eval, as formally stated by the next lemma.

Proof. We assume that RPQ-Eval can be solved in time O(|V_D|^ω f(|q|)) for some function f. Let A and B be n × n Boolean matrices. Then we construct the graph database D_{A,B} over {a} with V_{D_{A,B}} = {(i, 0), (i, 1), (i, 2) | i ∈ [n]} and E_{D_{A,B}} = {((i, 0), a, (j, 1)) | A[i, j] = 1} ∪ {((i, 1), a, (j, 2)) | B[i, j] = 1}, and the RPQ q = aa. Obviously, q(D_{A,B}) ⊆ {((i, 0), (j, 2)) | i, j ∈ [n]} and ((i, 0), (j, 2)) ∈ q(D_{A,B}) if and only if (A × B)[i, j] = 1. Hence, we can construct A × B from q(D_{A,B}) in time O(|q(D_{A,B})|) = O(n^2). By assumption, the set q(D_{A,B}) can be computed in time O(|V_{D_{A,B}}|^ω f(|q|)) = O(n^ω f(2)) = O(n^ω); thus, BMM can be solved in time O(n^ω).

From this, we can conclude the following lower bound.

Proof. We assume that RPQ-Eval can be solved in time O((|q(D)| + |D|)^ω f(|q|)) for some function f. Let A and B be n × n Boolean matrices given as sets of their 1-entries and let m be the total number of 1-entries in A, B and A × B. Then we can construct a graph database D_{A,B} over {a} as follows. For every i ∈ [n], the set V_{D_{A,B}} contains a node (i, 0) if the i-th row of A contains at least one 1-entry, a node (i, 1) if the i-th column of A contains at least one 1-entry or the i-th row of B contains at least one 1-entry, and a node (i, 2) if the i-th column of B contains at least one 1-entry. The set of arcs is defined, as before, by the 1-entries of A and B, and D_{A,B} can be constructed in time O(m). Further, we define q = aa.

We can observe that q(D_{A,B}) ⊆ {((i, 0), (j, 2)) | i, j ∈ [n]} and that |D_{A,B}| = O(m). By assumption, we can compute q(D_{A,B}) in time O((|q(D_{A,B})| + |D_{A,B}|)^ω f(|q|)) = O(m^ω f(2)) = O(m^ω) for some function f and ω ≥ 1. This implies that we can obtain a set of exactly the 1-entries of A × B in time O(m^ω); thus, SBMM can be solved in time O(m^ω).
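The construction of Lemma 5.9 is easy to make concrete. A sketch: matrices are 0/1 lists of lists, and q = aa is evaluated by one join over the middle layer (which is exactly what any RPQ-Eval algorithm must compute here).

```python
def bmm_via_rpq(A, B):
    """Multiply Boolean n x n matrices A and B through the RPQ q = aa on the
    three-layer database D_{A,B}: the answers ((i,0), (j,2)) of q are
    exactly the 1-entries of A x B."""
    n = len(A)
    arcs = [((i, 0), 'a', (j, 1)) for i in range(n) for j in range(n) if A[i][j]]
    arcs += [((i, 1), 'a', (j, 2)) for i in range(n) for j in range(n) if B[i][j]]
    # evaluate q = aa: two hops, joined over the middle layer
    out1 = {}
    for (u, _, v) in arcs:
        out1.setdefault(u, set()).add(v)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for mid in out1.get((i, 0), ()):
            for (j, layer) in out1.get(mid, ()):
                if layer == 2:
                    C[i][j] = 1   # ((i,0),(j,2)) in q(D_{A,B})
    return C
```

Any RPQ-Eval algorithm running in O(|V_D|^ω f(|q|)) would, plugged in here, multiply Boolean matrices in time O(n^ω), which is the content of Lemma 5.9.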

Bounds for the Enumeration of RPQs
By using the PG-approach for enumeration, we can obtain the following upper bound.

Theorem 6.1. Enum(RPQ) can be solved with preprocessing time O(|D||q|) and delay O(|D||q|); moreover, the enumeration is sorted and updates can be handled in constant time.

Proof. Let D be a graph database over Σ, let q be an RPQ over Σ and let ⪯ be the order on V_D. We assume that V_D = [n] with 1 ⪯ 2 ⪯ ... ⪯ n (see Lemma 2.5).

Preprocessing:
(1) We compute G⊠(D, q), which, according to Lemma 3.2, can be done in time O(|D||q|), and we interpret G⊠(D, q) as its underlying non-labelled graph. (2) We construct two arrays S and T of size |V_D| such that, for every i ∈ [n], S[i] = (i, p_0) and T[i] = (i, p_f). Note that this also means that pointers to the corresponding adjacency lists are stored along with the nodes in S and T. Computing S and T can be done in time O(|V_D||q|) as follows. We move through the list that represents V⊠(D, q) and, for every node (i, p_0), we set S[i] = (i, p_0), and, for every node (i, p_f), we set T[i] = (i, p_f). (3) We modify G⊠(D, q) by adding a new node v_sink with an arc from each (i, p_f) with i ∈ [n]. This can be done in time O(|V_D|) as follows. We first add v_sink to the list that represents V⊠(D, q), which requires constant time. Then, for every node (i, p_f) of T, we add v_sink to the adjacency list for (i, p_f). Again, this can be done in time O(|V_D|).
(4) We obtain (G⊠(D, q))^R, which, by Lemma 2.2, can be done in time O(|G⊠(D, q)|) = O(|D||q|). Then we initialise a Boolean array S′ of size |V_D| with entries 0 everywhere. We then perform a BFS in (G⊠(D, q))^R starting in v_sink and, for every visited node (i, p_0), we set S′[i] = 1. For every i ∈ V_D, S′[i] = 1 if and only if v_sink, and therefore some node from T, can be reached from (i, p_0). This step can be done in time O(|D||q|). (5) We initialise a Boolean array T′ of size |V_D| with entries 0 everywhere, which can be done in time O(|V_D|).

Enumeration: In the enumeration phase, we carry out the following procedure.
• For every i = 1, 2, ..., n:
 – If S′[i] = 1, then
  ∗ perform a BFS from (i, p_0) and, for every (j, p_f) that we visit, set T′[j] = 1,
  ∗ for every j = 1, 2, ..., n: produce (i, j) as output if T′[j] = 1,
  ∗ set all entries of T′ to 0.

Correctness: In the enumeration phase we perform a BFS from each (i, p_0) that can reach at least one node (j, p_f) (i. e., from each (i, p_0) with S′[i] = 1) and, after termination of this BFS, we output exactly the pairs (i, j) for which (j, p_f) is visited in this BFS. This directly shows that we correctly enumerate q(D) without repetitions. Furthermore, since we consider the start vertices for the BFSs in increasing order with respect to ⪯, and since after termination of a BFS started in (i, p_0) we output the pairs (i, j) in increasing order by the second element, the enumeration is sorted by lexicographic order.
Next, we estimate the delay of the enumeration. Each iteration of the main loop with S′[i] = 1 requires time O(|D||q|); if S′[i] = 0, then the iteration terminates after constant time. Moreover, since S′[i] = 1 implies that (i, p_0) can reach v_sink, and since v_sink can only be reached via some (j, p_f), we produce at least one output in such an iteration. Consequently, the delay between two outputs is bounded by the total running time for one iteration of the main loop, which is O(|D||q|).
Updates: If D is changed to D′ by an update, then we can again perform the whole preprocessing (with respect to D′) followed by the enumeration procedure. Technically, the preprocessing is done as a first step of the enumeration procedure (since our algorithmic framework does not allow to re-run the preprocessing after an update), which is possible since its running time of O(|D′||q|) is completely subsumed by the time available for the first delay.

This enumeration algorithm is easy to implement and has some nice features like linear preprocessing (in data complexity), sorted enumeration and constant updates. Unfortunately, these features come more or less for free with the disappointing delay bound. Is the PG-approach therefore the wrong tool for RPQ-enumeration? Or can we give evidence that linear delay is a barrier we cannot break? The rest of this work is devoted to this question.
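The enumeration algorithm just analysed can be sketched as a Python generator. Again the query is assumed to be given as an NFA transition relation, and database nodes are assumed to be 1, ..., n (cf. Lemma 2.5); the reverse BFS computes the array S′ of useful sources, and each forward BFS produces one sorted batch of outputs.

```python
from collections import deque

def enumerate_rpq(arcs, nfa_delta, p0, pf, n):
    """Sorted enumeration of q(D) with delay O(|D||q|) (a sketch)."""
    states = {p0, pf} | {p for (p, _) in nfa_delta}
    for succ in nfa_delta.values():
        states |= succ
    prod, rev = {}, {}
    for (u, a, v) in arcs:
        for p in states:
            for p2 in nfa_delta.get((p, a), ()):
                prod.setdefault((u, p), []).append((v, p2))
                rev.setdefault((v, p2), []).append((u, p))
    # preprocessing: reverse BFS from all (j, p_f) marks the useful sources S'
    useful = set()
    seen = {(j, pf) for j in range(1, n + 1)}
    queue = deque(seen)
    while queue:
        (u, p) = queue.popleft()
        if p == p0:
            useful.add(u)
        for x in rev.get((u, p), ()):
            if x not in seen:
                seen.add(x)
                queue.append(x)
    # enumeration: one forward BFS per useful source, in increasing order
    for i in sorted(useful):
        visited = {(i, p0)}
        queue = deque(visited)
        targets = set()
        while queue:
            x = queue.popleft()
            if x[1] == pf:
                targets.add(x[0])
            for y in prod.get(x, ()):
                if y not in visited:
                    visited.add(y)
                    queue.append(y)
        for j in sorted(targets):
            yield (i, j)
```

Every forward BFS is started only from a source that reaches some (j, p_f), so each iteration of the outer loop emits at least one pair, mirroring the delay argument of the proof.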
Since running an algorithm for RPQ-Enum until we get the first element yields an algorithm for RPQ-Boole (with preprocessing plus delay as running time), and since running such an algorithm completely solves RPQ-Eval in time preprocessing plus |q(D)| times delay, we can inherit several lower bounds directly from Section 5. Since Question 6.4 is about enumeration, a positive answer also means that after O(|G|) preprocessing, we can enumerate the transitive closure of a graph with delay O(|V|), and that after O(n^2) preprocessing, all 1-entries of a Boolean matrix multiplication can be enumerated with delay O(n). Are such enumeration algorithms unlikely, so that we should rather expect a negative answer to Question 6.4? In fact not, since for the simple RPQs q = a* or q = aa, which are sufficient to encode transitive closures and Boolean matrix multiplications, linear preprocessing and delay O(|V_D|) is indeed possible, as we shall obtain as byproducts of the results in the next section.
We close this section with the following remark that points out some similarities (and differences) of the reductions used in Sections 5 and 6.

Remark 6.5. As already mentioned in Section 5.1, the reduction used for the combined complexity lower bound of Theorem 5.5 (and therefore Point 1 of Theorem 6.2) is similar to the reduction from [BI16] used for proving conditional lower bounds of regular expression matching. Also note that [EMT21] improves the bound of [BI16] by showing that subquadratic algorithms for finding a path in a graph labelled by a given pattern string are most likely impossible, even if we are allowed to build indexes in polynomial time. The paper [EGMT19] strengthens the bound of [BI16] by restricting the structure of the graphs. Moreover, the OV-lower bound for RPQ-Count of Theorem 5.13 is similar to a lower bound on counting the results of certain conjunctive queries from [BKS17], and the OMv-lower bound from Point 1 of Theorem 6.3 is similar to a lower bound on enumerating certain conjunctive queries from [BKS17].
The quite simple observation that Boolean matrix multiplication can be expressed as querying a bipartite graph (with conjunctive queries) has also been used in [BDG07] (see also [BGS20, Section 6]) and is also the base for the OMv-lower bound of [BKS17]. In the context of this paper, this connection has been used in the bounds of Theorems 5.10 and 5.12, and in Points 3 and 4 of Theorem 6.2.
The obvious connection between evaluating (non-acyclic) conjunctive queries and finding triangles (or larger cliques) has already been observed in [Bra13] (see also [BGS20, Section 6]). However, the Triangle-lower bounds of this paper are quite different, since RPQs cannot explicitly express the structure of a triangle (or a larger clique, for that matter) by using conjunction. Therefore, our respective lower bounds (Theorem 5.6, Point 2 of Theorem 6.2, and Point 2 of Theorem 6.3) need to encode this aspect in a different way. With respect to Theorem 5.6 and Point 2 of Theorem 6.2, this is done by using non-constant queries (which explains why the lower bounds are not for data complexity, in contrast to the case of conjunctive queries), and with respect to Point 2 of Theorem 6.3, which does work for data complexity, it is done by using updates. Moreover, our Triangle-lower bounds do not seem to extend to larger cliques, as is the case for conjunctive queries (see [BGS20, Section 6]).
Finally, we wish to point out that, although some of the reductions used in this paper are similar to reductions used in the context of conjunctive queries, due to the differences between RPQs and CQs, none of the lower bounds directly carry over. Furthermore, note that the lower bound reductions in [BDG07, BKS17] have been used for obtaining dichotomies and have therefore been stated in a much more general way.
(4) Initialise for each node j in H^R an empty AVL tree A_j (or any search tree that allows the operations insert, delete, lookup, delete-min and delete-max in worst-case time that is logarithmic in its size), and store pointers to the roots of these trees and a counter for their size in an array A of size ℓ, such that finding the tree associated to a specific node can be done in constant time. For every i ∈ [n], insert i into the tree A_{L[(i,p_f)]}, and at each point where the size of A_{L[(i,p_f)]} exceeds ∆(D)|q| during these insertions, delete the node of largest index from A_{L[(i,p_f)]}. (5) For j = 1, ..., ℓ, go through the adjacency list of node j in H^R. For each arc (j, j′) in this list, insert one by one all entries from A_j into A_{j′} (first lookup to avoid duplicates). Update also the size counter of the tree A_{j′} accordingly, and at each point where this size exceeds ∆(D)|q| during these insertions, delete the node of largest index from A_{j′}. This ensures that A_{j′} is kept at a size of at most ∆(D)|q|. (6) Initialise a Boolean array S of size ℓ with entries 0 everywhere. Create for each j ∈ [ℓ] a sorted array Z_j of all entries in A_j by successively extracting and deleting the smallest entry. If Z_j is not empty, set S[j] = 1. (7) Initialise a Boolean array T of size |V_D| with entries 0 everywhere.

Enumeration: In the enumeration phase, we carry out the following procedure.
• For every i = 1, 2, ..., n:
 – If Z_{L[(i,p_0)]} has size less than ∆(D)|q|, output (i, j) for all j stored in Z_{L[(i,p_0)]} (in the sorted order).
 – If Z_{L[(i,p_0)]} has size ∆(D)|q|, perform a BFS in G⊠(D, q) starting at node (i, p_0) and, for every visited (i′, p_f), set T[i′] = 1. During this BFS, output the pairs (i, i′) for all i′ stored in Z_{L[(i,p_0)]} (in the sorted order) to keep the desired delay, i. e., start with k = 1 and, whenever the delay of |V_D||q| has expired, output the pair formed with the k-th element of Z_{L[(i,p_0)]} and increment k.

In Step 6, we have to copy, for every j ∈ [ℓ], the elements from A_j to Z_j, which requires a total of ℓ ∆(D)|q| search tree operations, so time O(|G⊠(D, q)| ∆(D)|q| log(∆(D)|q|)) is sufficient (note that the time needed for initialising and filling S is clearly subsumed by this).

These considerations show that the procedure from above will enumerate q(D). In particular, note that if we enter Step 5 with X being empty, then we do not produce an output at this point, but, due to Step 2, we will nevertheless completely enumerate q(D). Since we synchronise the enumerations of the A_j with respect to their phases, the enumeration of q(D) produces all pairs from q(D) with left element i, then all pairs with left element i′ > i and so on; thus, the enumeration of q(D) is semi-ordered. Furthermore, the bookkeeping done in the sets X and Y guarantees that we do not produce duplicates. It only remains to analyse the delay of this enumeration procedure.
However, in Steps 2a and 5, we also have to retrieve some element from X. In order to do this efficiently (and not by moving through the array from left to right to find some element, which requires time O(n)), we also store the elements of X as an unsorted list (which is initialised in the preprocessing). This means that we can always obtain some element of X in constant time (by just retrieving the first list element). Keeping the array and the list for X synchronised is no problem: Whenever we add some i to X (Step 4), we set X[i] = 1 and add i at the end of the list for X; whenever we want to retrieve some element from X (Steps 2a and 5), we retrieve and remove the first list element, say i, and then we set X[i] = 0. Consequently, all operations with respect to the lists X and Y can be performed in constant time.
We estimate the running-time for each of the separate steps of an iteration. Obviously, if we can never reach the situation that X = ∅ in Step 5, then in each iteration at least one pair is produced. Therefore, it is sufficient to prove this property.
Let us assume that the enumeration procedure has just finished Step 3 and we are in some iteration of phase i, i. e., c = i. Moreover, we assume that, so far, we have not encountered the situation that X = ∅ in Step 5. For every j ∈ [m], let a_j = R[j] if L[j] = c, and a_j = ⊥ otherwise, where a_j = ⊥ indicates the situation that element a_j does not exist. This means that all existing elements a_j with j ∈ [m] are exactly those elements that have most recently been produced in phase i of the enumeration procedures from the algorithms A_j (this can have happened in Step 1 of the same iteration or, if this is the first iteration of phase i, also in applications of Step 1 in previous iterations). In particular, the existing elements a_j with j ∈ [m] have not yet been handled in the sense of Step 4, i. e., we have not yet checked whether they have already been produced as output and, if not, have added them to X. We can also note that there must be at least one j ∈ [m] with a_j ≠ ⊥, since otherwise c < min{L[i] | i ∈ [m]}.
In addition to these elements a_j, for every j ∈ [m], let b_{j,1}, b_{j,2}, ..., b_{j,ℓ_j} (note that ℓ_j = 0 is possible) be exactly the elements already produced in phase i of the enumeration procedure of A_j in some previous iterations. In other words, for every j ∈ [m], we have requested in applications of Step 1 exactly the elements b_{j,1}, b_{j,2}, ..., b_{j,ℓ_j}, a_j from phase i of the enumeration procedure of A_j (note that a_j = ⊥ is possible, which means that a_j does not exist and therefore has not been requested). Moreover, this has happened in ℓ = max{ℓ_j | j ∈ [m]} previous (i. e., not counting the current one) iterations of the main loop of the enumeration procedure of algorithm A. In particular, this means that we have in phase i so far only produced ℓ pairs with i as left element.
Let K = {b_{j,p} | j ∈ [m], p ∈ [ℓ_j]} and let M = {a_j | j ∈ [m]}. Since ℓ = max{ℓ_j | j ∈ [m]}, there is at least one j′ ∈ [m] with ℓ_{j′} = ℓ and therefore |{b_{j′,1}, b_{j′,2}, ..., b_{j′,ℓ_{j′}}}| = ℓ. This is true since the elements b_{j′,1}, b_{j′,2}, ..., b_{j′,ℓ_{j′}} are part of the i-th phase of the enumeration of A_{j′} and therefore must be distinct. Thus, |K| ≥ ℓ. Furthermore, we can choose j′ such that a_{j′} ≠ ⊥. This is the case since we have a_j = ⊥ if and only if ℓ_j < ℓ (i. e., phase i of the enumeration of A_j has already terminated) and, as observed above, there must be at least one j ∈ [m] with a_j ≠ ⊥.
For every b ∈ K, either (i, b) has been produced as output, or b ∈ X. Since so far we have only produced ℓ pairs as output, this directly implies that if |K| > ℓ, then X ≠ ∅, which means that we reach Step 5 with X ≠ ∅. If, on the other hand, |K| = ℓ, then K = {b_{j′,1}, b_{j′,2}, ..., b_{j′,ℓ_{j′}}}, which also means that a_{j′} ∉ K, since b_{j′,1}, b_{j′,2}, ..., b_{j′,ℓ_{j′}}, a_{j′} is an enumeration of distinct elements provided by algorithm A_{j′} (recall that a_{j′} ≠ ⊥, as observed above). Consequently, we also have a_{j′} ∉ Y and therefore a_{j′} is added to X in Step 4. Hence, we reach Step 5 with X ≠ ∅.

Now, we give upper bounds for Enum(BT-RPQ) and Enum(S-RPQ) separately.
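The union procedure analysed above can be sketched in a strongly simplified form. This is our own reconstruction for illustration only: each sub-enumerator A_j is assumed to yield pairs (i, v) grouped by phases (non-decreasing left element), the pending/emitted bookkeeping of the proof is compressed into a per-phase list X and set Y, and the heap-based scheduling is our choice; the sketch preserves duplicate-freeness and semi-ordering but does not reproduce the exact delay accounting.

```python
import heapq

def union_enum(enumerators):
    """Combine m phase-grouped enumerators into one duplicate-free,
    semi-ordered enumeration (simplified sketch of the union procedure)."""
    streams = []
    for j, it in enumerate(enumerators):
        it = iter(it)
        first = next(it, None)
        if first is not None:
            # heap entries compare by (phase, stream id); j is unique, so
            # the payload v and the iterator are never compared
            heapq.heappush(streams, (first[0], j, first[1], it))
    while streams:
        c = streams[0][0]          # current phase = minimal left element
        X, Y = [], set()           # pending outputs / already-seen right elements
        while streams and streams[0][0] == c:
            _, j, v, it = heapq.heappop(streams)
            if v not in Y:         # duplicate check (the role of Y)
                Y.add(v)
                X.append(v)
            nxt = next(it, None)   # request the next element from A_j (Step 1)
            if nxt is not None:
                heapq.heappush(streams, (nxt[0], j, nxt[1], it))
            if X:                  # emit one pending pair per round (Step 5)
                yield (c, X.pop())
        for v in X:                # flush leftovers of phase c
            yield (c, v)
```

All pairs with left element c are emitted before any pair of a later phase, so the combined enumeration is semi-ordered, and Y prevents duplicates within a phase.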
Proof. Let D be a graph database, let q = (x_1 ∨ x_2 ∨ ... ∨ x_k)* and q′ = (x_1 ∨ x_2 ∨ ... ∨ x_k)+, where x_1, x_2, ..., x_k ∈ Σ. It can be easily seen that enumerating q(D) or q′(D) is the same as enumerating the reflexive-transitive closure (E_{D′})* or the transitive closure (E_{D′})+, where D′ is obtained from D by deleting all x-adjacency lists with x ∉ {x_1, x_2, ..., x_k}. In [CFNS20], it is shown for general directed graphs G = (V, E) how to enumerate E* and E+ sorted by first coordinate (denoted by "row-wise") with delay O(∆(G)). This approach translates to a semi-sorted enumeration and can be used on D in such a way that all x-adjacency lists with x ∉ {x_1, x_2, ..., x_k} are ignored (so without preprocessing). Thus, by using this procedure on D, we can enumerate q(D) and q′(D) semi-sorted with delay O(∆(D)).

Finally, in our third approach, we identify a class of RPQs that can be enumerated with linear preprocessing and delay O(∆(D)) (see Theorem 7.10). This result points out that the simplicity of the regular expression of the query might be exploited to achieve a better delay. This is particularly interesting given the fact that very simple RPQs are already sufficient for conditional lower bound reductions with respect to data complexity. Moreover, empirical work has shown that, in practical scenarios where regular expressions are used as means of querying graph databases, it is often the case that the regular expressions are rather simple. In case that Question 6.4 can be answered in the negative, and if enumeration algorithms with sublinear delay are of high relevance, we should concentrate on algorithms that only work for a special and simple class of RPQs.
A possible future research task with respect to the topic of this paper is to answer Question 6.4. We conjecture that an algorithm that answers the question in the affirmative will be non-trivial and likely to yield more general algorithmic insights with respect to querying graphs with regular expressions. If, by a conditional lower bound, it can be shown that the answer to the question is negative, then the question arises for which RPQs a sublinear delay is possible and for which it is (conditionally) not. Our Theorem 7.10 constitutes a partial result in this regard.
By construction, |V| = O(|T_α|) = O(|α|) and, since G has constant degree, we also have |G| = O(|V|) = O(|α|). Moreover, in the construction we spend constant time per arc that is added, so the whole construction of M_α can be done in time O(|α|).

Lemma 2.5. Let D be a graph database with |V_D| = n. Then we can construct in time O(|D|) a well-formed graph database D′ and an isomorphism π : [n] → V_D between D′ and D.
Moreover, |D| = O(|A|d) = O(nd) and |q| = O(|B|d) = O(nd); furthermore, D and q can also be constructed in time O(nd) and, since |E_D| = O(|V_D|), D is a sparse graph database.
Lemma 5.9. If RPQ-Eval can be solved in time O(|V_D|^ω f(|q|)) for some function f and ω ≥ 2, then BMM can be solved in time O(n^ω).
Theorem 5.10. If RPQ-Eval can be solved by a combinatorial algorithm in time O((|V_D||D|)^{1−ϵ} f(|q|)) for some function f and some ϵ > 0, then the com-BMM-hypothesis fails.

Proof. If RPQ-Eval can be solved by a combinatorial algorithm with a running time in O((|V_D||D|)^{1−ϵ} f(|q|)) for some function f and some ϵ > 0, then, since O((|V_D||D|)^{1−ϵ} f(|q|)) ⊆ O(|V_D|^{3−ϵ} f(|q|)), Lemma 5.9 implies that BMM can be solved by a combinatorial algorithm in time O(n^{3−ϵ}). Thus, the com-BMM-hypothesis fails.

If we drop the restriction to combinatorial algorithms, we can nevertheless show (with more or less the same construction) that linear time in data complexity is impossible, unless the SBMM-hypothesis fails. However, since the size of the output q(D) might be super-linear in |D|, we should interpret linear as linear in |D| + |q(D)|.

Lemma 5.11. If RPQ-Eval can be solved in time O((|q(D)| + |D|)^ω f(|q|)) for some function f and ω ≥ 1, then SBMM can be solved in time O(m^ω).

Lemma 5.11 directly implies the following lower bound.

Theorem 5.12. If RPQ-Eval can be solved in time O((|q(D)| + |D|) f(|q|)) for some function f, then the SBMM-hypothesis fails.

Surprisingly, we can obtain a more complete picture for the problem RPQ-Count. First, we observe that obviously all upper bounds carry over from RPQ-Eval to RPQ-Count. On the other hand, a combinatorial O((|V_D||D|)^{1−ϵ} f(|q|)) algorithm or a general O((|q(D)| + |D|) f(|q|)) algorithm for RPQ-Count does not seem to help for solving Boolean matrix multiplication (and therefore, the lower bounds do not carry over). Fortunately, it turns out that OV is a suitable problem to reduce to RPQ-Count, although by a rather different reduction compared to the one used for Theorem 5.5.

Theorem 5.13. If RPQ-Count can be solved in time O(|D|^{2−ϵ} f(|q|)) for some function f and ϵ > 0, then the OV-hypothesis fails.

Proof. We assume that RPQ-Count can be solved in time O(|D|^{2−ϵ} f(|q|)) for some function f and ϵ > 0. Let A = {a⃗_1, a⃗_2, ..., a⃗_n} and B = {b⃗_1, b⃗_2, ..., b⃗_n} be an OV-instance, i. e., for every i ∈ [n], a⃗_i and b⃗_i are d-dimensional Boolean vectors. Now let A′ be the Boolean matrix having rows a⃗_1, a⃗_2, ..., a⃗_n and let B′ be the Boolean matrix having columns b⃗_1, b⃗_2, ..., b⃗_n. It can be easily seen that, for every i, j ∈ [n], (A′ × B′)[i, j] = 0 if and only if a⃗_i and b⃗_j are orthogonal. Moreover, A′ and B′ can be constructed in time O(nd). Next, we construct the graph database D_{A′,B′} as in the proof of Lemma 5.9 and note that |q(D_{A′,B′})| equals the number of 1-entries of A′ × B′; thus, |q(D_{A′,B′})| = n^2 if and only if there are no a⃗ ∈ A and b⃗ ∈ B that are orthogonal. Consequently, we can check whether there are no a⃗ ∈ A and b⃗ ∈ B that are orthogonal by computing |q(D_{A′,B′})|, which, by assumption, can be done in time O(|D_{A′,B′}|^{2−ϵ} f(|q|)) = O((nd)^{2−ϵ}); this contradicts the OV-hypothesis.
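The counting reduction can be sketched as follows, assuming the three-layer, q = aa style database of Lemma 5.9 (here A′ is n×d and B′ is d×n); the count |q(D_{A′,B′})| is computed directly via the middle layer.

```python
def ov_via_rpq_count(A, B):
    """Decide OV by counting: build A' (rows a_i) and B' (columns b_j),
    count the answers of q = aa on the three-layer database D_{A',B'};
    an orthogonal pair exists iff |q(D)| < n^2."""
    n, d = len(A), len(A[0])
    # arcs of D_{A',B'}: (i,0) -a-> (k,1) iff A'[i][k] = a_i[k] = 1,
    #                    (k,1) -a-> (j,2) iff B'[k][j] = b_j[k] = 1
    left = {i: {k for k in range(d) if A[i][k]} for i in range(n)}
    right = {k: {j for j in range(n) if B[j][k]} for k in range(d)}
    count = 0
    for i in range(n):
        reachable = set()
        for k in left[i]:
            reachable |= right[k]     # all (j,2) reachable in two a-steps
        count += len(reachable)       # pairs ((i,0), (j,2)) in q(D)
    return count < n * n              # True iff an orthogonal pair exists
```

A missing pair ((i, 0), (j, 2)) means (A′ × B′)[i, j] = 0, i.e., a⃗_i ⊥ b⃗_j, so comparing the count against n^2 decides OV.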

Since Theorem 5.13 also excludes running time O((|V_D||D|)^{1−ϵ} f(|q|)) for any function f and ϵ > 0 (without restriction to combinatorial algorithms), it also shows that, subject to the OV-hypothesis, O(|V_D||D|) is a tight bound for the data complexity of RPQ-Count.
Running time: We first note that Steps 1 to 3 can be done in time O(|G⊠(D, q)|) = O(|D||q|): Lemma 3.2 ensures that G⊠(D, q) can be computed in time O(|D||q|), Lemma 2.2 ensures that H^R can be computed in time O(|H|) = O(|D||q|), and it can be easily seen that, in order to construct H, we only have to move through G⊠(D, q) a constant number of times; thus, time O(|G⊠(D, q)|) = O(|D||q|) is sufficient. Since the search trees never exceed a size of ∆(D)|q|, each operation supported by the search trees can be carried out in time O(log(∆(D)|q|)). In Step 4, we first initialise the search trees, which can be done in time O(|V_H|) = O(|V⊠(D, q)|). Then, adding i to the search tree A_{L[(i,p_f)]} for every i ∈ [n] can be done in time O(|V_D| log(∆(D)|q|)). In Step 5, we have to perform O(∆(D)|q|) search tree operations per arc of H, i. e., we need time O(|E_H| ∆(D)|q| log(∆(D)|q|)) = O(|G⊠(D, q)| ∆(D)|q| log(∆(D)|q|)).
Step 1 requires time O(Σ_{j=1}^m d(|D|, |q_j|)) = O(m d(|D|, |q|)). The total running time of Steps 2 to 2c is O(m + k), where k is the number of pairs (c, v) produced in Step 2a and O(m) is needed to compute min{L[i] | i ∈ [m]}. Note that the pairs that are produced in Step 2 are output with constant delay in Step 2a and pay for the running time dependence on k; hence, in the worst case, k = 0 and Step 2 requires time O(m). Step 4 requires time O(m). All other steps can be carried out in constant time. This means that if in each iteration at least one pair is produced by Step 6, then the delay of the whole enumeration procedure of algorithm A is O(m d(|D|, |q|)).

Table 2: All upper bounds can be achieved as running times of some algorithm, while the lower bounds cannot be achieved as running time by any algorithm, unless the displayed hypothesis fails. The exponent ω denotes the best known matrix multiplication exponent.
1.1.3. Enumeration of Restricted Variants. Finally, we obtain restricted problem variants that can be solved with delay strictly better than O(|D|) (in data complexity). We explore three different approaches: (1) by allowing super-linear preprocessing.

We first observe that Step 2b, i. e., setting Y = ∅, is problematic since it requires time O(n) (we have to set Y[i] = 0 for every i ∈ [n]). Therefore, we implement the array Y as follows. Instead of letting it be Boolean, we assume that it can store elements from [n] ∪ {0}. The idea is that Y[i] = 0 means that i ∉ Y (just as for the Boolean case), while Y[i] = j with j ∈ [n] means i ∈ Y in the case that we are currently in phase j, i. e., c = j, and i ∉ Y otherwise. With this interpretation, Step 2b is not necessary anymore, since setting c = min{L[i] | i ∈ [m]} in Step 2c has the same effect as erasing all elements from Y. Consequently, we can ignore Step 2b altogether (or rather interpret it as a mere comment in the pseudo code above to indicate what is happening at Step 2b). In particular, we note that with this implementation of Y, we can still check for both X and Y whether they contain a specific element, and we can add or erase specific elements in constant time (adding i to Y in phase c just means setting Y[i] = c).