Robustness against Read Committed for Transaction Templates with Functional Constraints

The popular isolation level Multiversion Read Committed (RC) trades some of the strong guarantees of serializability for increased transaction throughput. Sometimes, transaction workloads can be safely executed under RC obtaining serializability at the lower cost of RC. Such workloads are said to be robust against RC. Previous work has yielded a tractable procedure for deciding robustness against RC for workloads generated by transaction programs modeled as transaction templates. An important insight of that work is that, by more accurately modeling transaction programs, we are able to recognize larger sets of workloads as robust. In this work, we increase the modeling power of transaction templates by extending them with functional constraints, which are useful for capturing data dependencies like foreign keys. We show that the incorporation of functional constraints can identify more workloads as robust that otherwise would not be. Even though we establish that the robustness problem becomes undecidable in its most general form, we show that various restrictions on functional constraints lead to decidable and even tractable fragments that can be used to model and test for robustness against RC for realistic scenarios.


Introduction
Many database systems implement several isolation levels, allowing users to trade isolation guarantees for improved performance. The highest, serializability, projects the appearance of a complete absence of concurrency, and thus perfect isolation. Executing transactions concurrently under weaker isolation levels can introduce certain anomalies. Sometimes, a transactional workload can be executed at an isolation level lower than serializability without introducing any anomalies. This is a desirable scenario: a lower isolation level, usually implementable with a cheaper concurrency control algorithm, yields the stronger isolation guarantees of serializability for free. This formal property is called robustness [Fek05, BG16]: a set of transactions T is called robust against a given isolation level if every possible interleaving of the transactions in T that is allowed under the specified isolation level is serializable.
Robustness has received considerable attention in the literature. Most existing work focuses on Snapshot Isolation (SI) [ACFR08, BBE19a, Fek05, FLO+05] or higher isolation levels [BBE19b, BG16, CBG15, CGY17]. It is particularly interesting to consider robustness against lower isolation levels like multiversion Read Committed (referred to as RC from now on). Indeed, RC is widely available, often the default in database systems (see, e.g., [BDF+13]), and is generally expected to have better throughput than stronger isolation levels.
In previous work [VKKN21], we provided a tractable decision procedure for robustness against RC for workloads generated by transaction programs modeled as transaction templates. The approach is centered on a novel characterization of robustness against RC in the spirit of [Fek05, KKNV20] that improves over the sufficient condition presented in [AF15], and on a formalization of transaction programs, called transaction templates, facilitating fine-grained reasoning for robustness against RC. Conceptually, transaction templates as introduced in [VKKN21] are functions with parameters, and can, for instance, be derived from stored procedures inside a database system (cf. Figure 1 for an example). The abstraction generalizes transactions as usually studied in concurrency control research (sequences of read and write operations) by making the objects worked on variable, determined by input parameters. Such parameters are typed to add additional power to the analysis. Transaction templates support atomic updates (that is, a read followed by a write of the same database object, to make a relative change to its value). Furthermore, database objects read and written are considered at the granularity of fields, rather than just entire tuples, decoupling conflicts further and making it possible to recognize additional cases as robust that would not be recognizable on the tuple level.
An important insight obtained from [VKKN21] is that more accurate modeling of the workload makes it possible to recognize larger sets of transaction programs as robust. Processing workloads under RC increases the throughput of the transactional database system compared to executing the workload under SI or serializable SI, so larger robust sets mean better performance of the database system. In this work, we increase the modeling power of transaction templates by extending them with functional constraints, which are useful for capturing data dependencies like foreign keys (inclusion dependencies). This appears to be a sweet spot for strengthening modeling power: as we show in this paper, it allows us to remain with abstractions that have been well established within database theory, without having to move to general program analysis, and it pushes the robustness frontier on popular transaction processing benchmarks. Generally speaking, workloads can profit more from richer modeling the larger and more complex they get, so the fact that adding functional constraints yields larger robust sets already on these simple benchmarks suggests that these techniques are practically useful. Our contributions can be summarized as follows:
• We argue in Section 2 through the SmallBank and TPC-C benchmarks that the incorporation of functional constraints can identify more workloads as robust that otherwise would not be, and that they reduce the extent to which changes need to be made to workloads to make them robust against RC.
• In Section 4, we establish that robustness in its most general form becomes undecidable. The proof is a reduction from PCP and relies on cyclic dependencies between functions, which make it possible to connect data values through an unbounded application of functions.
• We consider a fragment in Section 5 that only allows a very limited form of cyclic dependencies between functions and assumes additional constraints on templates that, together, imply that functions behave as bijections. Robustness against RC can be decided in NLOGSPACE for this fragment, which is general enough to model the SmallBank benchmark.
• In Section 6, we obtain an EXPSPACE decision procedure when the schema graph is acyclic (so, no cyclic dependencies between functions). Even for small input sizes, such a result is not practical. We provide various restrictions that lower the complexity to PSPACE and EXPTIME, and which make it possible to model the TPC-C benchmark as discussed. Notice that, for robustness testing, an exponential time decision procedure is considered to be practical as the size of the input is small and robustness is a static property that can be tested offline.
These contributions should be contrasted with our earlier work [VKKN21], where we focused on a characterization for robustness against RC for basic transaction templates without functional constraints and performed an experimental study to show how the robustness property can improve transaction throughput.

Benchmarks
We present a small extension of the SmallBank benchmark [ACFR08] to exemplify the modeling power of transaction templates and discuss how the addition of functional constraints can detect larger sets of transaction templates to be robust. Finally, we discuss in the context of the TPC-C benchmark how the incorporation of functional constraints requires fewer changes to templates to make them robust. The SmallBank schema consists of three tables: Account(Name, CustomerID, IsPremium), Savings(CustomerID, Balance, InterestRate), and Checking(CustomerID, Balance). Underlined attributes are primary keys. The Account table associates customer names with IDs and keeps track of the premium status (Boolean); CustomerID is a UNIQUE attribute. The other tables contain the balance (numeric value) of the savings and checking accounts of customers identified by their ID. Account(CustomerID) is a foreign key referencing both the columns Savings(CustomerID) and Checking(CustomerID). The interest rate on a savings account is based on a number of parameters, including the account status (premium or not). The application code can interact with the database through a fixed number of transaction programs:
• Balance(N): returns the total balance (savings and checking) for a customer with name N.
• DepositChecking(N, V): makes a deposit of amount V on the checking account of the customer with name N.
• TransactSavings(N, V): makes a deposit or withdrawal V on the savings account of the customer with name N.
• Amalgamate(N1, N2): transfers all the funds from N1 to N2.
• WriteCheck(N, V): writes a check V against the account of the customer with name N, penalizing if overdrawing.
• GoPremium(N): converts the account of the customer with name N to a premium account and updates the interest rate of the corresponding savings account. This transaction program is an extension w.r.t. [ACFR08].
The transaction templates for these programs are presented in Figure 1. The corresponding SQL code is given in Appendix A.
Based on this benchmark, we give an informal description of transaction templates and functional constraints to illustrate their modeling power. More formal definitions can be found in Section 3. In short, a transaction template is a sequence of read (R), write (W) and update (U) statements over typed variables (X, Y, ...) with additional equality and disequality constraints. For instance, R[Y : Savings{C, I}] in GoPremium indicates that a read operation is performed to a tuple in relation Savings on the attributes CustomerID and InterestRate. We abbreviate the names of attributes by their first letter to save space. The set {C, I} is the read set. Write operations have an associated write set while update operations contain a read set followed by a write set: e.g., U[X : Account{N, C}{I}] in GoPremium first reads the Name and CustomerID of tuple X and then writes to the attribute IsPremium. To capture the dependencies between tuples induced by the foreign keys, we use two unary functions: $f_{A\to S}$ maps a tuple of type Account to a tuple of type Savings, while $f_{A\to C}$ maps a tuple of type Account to a tuple of type Checking. As Account(CustomerID) is UNIQUE, every savings and checking account is associated to a unique Account tuple. This is modeled through the functions $f_{C\to A}$ and $f_{S\to A}$ with an analogous interpretation. Notice that the equality constraints for each template in Figure 1 imply that these functions are bijections and each other's inverses.
A transaction T over a database D is an instantiation of a transaction template τ if there is a variable mapping µ from the variables in τ to tuples in D that satisfies all the constraints in τ such that µ(τ) = T. For instance, consider a database D with tuples $a_1, a_2, \ldots$ of type Account, $s_1, s_2, \ldots$ of type Savings, and $c_1, c_2, \ldots$ of type Checking with [...] is an instantiation of GoPremium whereas $\mu_2(\text{GoPremium})$ with $\mu_2 = \{X \to a_1, Y \to s_2\}$ is not, as the functional constraint $Y = f_{A\to S}(X)$ is not satisfied. Indeed, $\mu_2(Y) = s_2 \neq s_1 = f^D_{A\to S}(a_1) = f^D_{A\to S}(\mu_2(X))$. We then say that a set of transactions is consistent with a set of templates if every transaction is an instantiation of a transaction template. More formal definitions are given in Section 3.
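The admissibility check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary encoding of the function, the helper name `satisfies`, the value chosen for $a_2$, and the mapping `mu1` are hypothetical and not part of the formalism; only $f^D_{A\to S}(a_1) = s_1$ and $\mu_2$ are taken from the text.

```python
# Hypothetical toy interpretation of f_{A->S}: Account tuples map to Savings tuples.
# f(a1) = s1 follows the running example; f(a2) = s2 is an assumption.
FUNCTIONS = {"f_A_to_S": {"a1": "s1", "a2": "s2"}}

# Equality constraints of GoPremium, encoded as (lhs, function, rhs) for lhs = f(rhs).
GO_PREMIUM_CONSTRAINTS = [("Y", "f_A_to_S", "X")]


def satisfies(mu, constraints, functions):
    """Return True if the variable mapping mu satisfies every equality constraint."""
    return all(mu[lhs] == functions[f][mu[rhs]] for lhs, f, rhs in constraints)


mu1 = {"X": "a1", "Y": "s1"}  # assumed admissible mapping (illustrative)
mu2 = {"X": "a1", "Y": "s2"}  # the mapping mu_2 from the text

print(satisfies(mu1, GO_PREMIUM_CONSTRAINTS, FUNCTIONS))
print(satisfies(mu2, GO_PREMIUM_CONSTRAINTS, FUNCTIONS))
```

Under these assumptions the first mapping is admissible and the second is rejected, because $s_2$ differs from the Savings tuple that the function assigns to $a_1$.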
Functional constraints are different from the more usual data consistency constraints like key constraints, functional dependencies or denial constraints. The latter are intended to verify data consistency, whereas the former are intended to verify whether a set of transactions instantiated from templates are indeed consistent with these templates. The abstraction of functional constraints provides a straightforward mechanism to capture dependencies between tuples implied by, e.g., foreign key constraints. Consider for example variables X and Y in GoPremium. Rather than specifying that the value of the attribute CustomerID in the tuple assigned to X should agree with the value of the attribute CustomerID in the tuple assigned to Y and combining this information with the defined foreign key from Account to Savings to conclude that two instantiations of GoPremium that agree on the tuple assigned to X should also agree on the tuple assigned to Y, the functional constraint $Y = f_{A\to S}(X)$ expresses this dependency more directly. An additional benefit of our abstraction is that this approach is not limited to dependencies implied by foreign keys. For the SmallBank benchmark, for example, we can infer from the fact that Account(CustomerID) is UNIQUE that each checking and savings account is associated to exactly one Account tuple, even though no foreign key from respectively Checking and Savings to Account is defined in the schema. Since functional constraints are based on unary functions, they are limited to expressing tuples being implied by a single other tuple (e.g., the Savings tuple being implied by the Account tuple in GoPremium). More complex relationships where a tuple is implied by the co-occurrence of two or more other tuples cannot be captured by our formalism.
Our previous work [VKKN21], which did not consider functional constraints, has shown that {Am, DC, TS}, {Bal, DC}, and {Bal, TS} are maximal robust sets of transaction templates. This means that for any database, for any set of transactions T that is consistent with one of the three mentioned sets, any possible interleaving of the transactions in T that is allowed under RC is always serializable! Using the results from Section 5, it follows that when functional constraints are taken into account GoPremium can be added to each of these sets as well: {Am, DC, GP, TS}, {Bal, DC, GP}, and {Bal, TS, GP} are maximal robust sets.
We argue that incorporating functional constraints is crucial. Indeed, without functional constraints it is easy to show that even the set {GoPremium} is not robust. Consider the schedule over two instantiations $T_1$ and $T_2$ of GoPremium, where we use the mappings $\mu_1$ and $\mu_2$ as defined above for respectively $T_1$ and $T_2$ (we show the read and write sets to facilitate the discussion): The above schedule is allowed under RC as there is no dirty write, but it is not conflict serializable. Indeed, there is a rw-conflict between $R_1[s_1\{C, I\}]$ and $U_2[s_1\{C\}\{I\}]$ as the former reads the attribute I that is written to by the latter, which implies that $T_1$ should occur before $T_2$ in an equivalent serial schedule. But there is also a ww-conflict between $U_2[s_1\{C\}\{I\}]$ and $U_1[s_1\{C\}\{I\}]$ as both write to the common attribute I, implying that $T_2$ should occur before $T_1$ in an equivalent serial schedule. Consequently, the schedule is not serializable. However, taking functional constraints into account, [...] implying that the above schedule is not a counterexample for robustness.
The second benchmark is based on the TPC-C benchmark [TC]. We modified the schema and templates to turn all predicate reads into key-based accesses. The schema consists of six relations:
Table 1. Function names for the TPC-C benchmark schema.
The function names belonging to this schema are given in Table 1.
We focus on five different transaction templates: A detailed abstraction of each transaction template is given in Figure 2. To shorten the presentation, we only show two orderlines per order.
Incorporating functional constraints for TPC-C cannot identify larger sets of templates to be robust. However, when a set of transaction templates P is not robust against RC, an equivalent set of templates P′ can be constructed from P by promoting certain R-operations to U-operations [VKKN21]. By incorporating functional constraints, it can be shown that fewer R-operations need to be promoted, leading to an increase in throughput as R-operations do not take locks whereas U-operations do. Consider for example the subset P = {Delivery, OrderStatus} of the TPC-C benchmark, given in Figure 2, where functional constraints are added to express the fact that a tuple of type OrderLine implies the tuple of type Order (function $f_{L\to O}$), which in turn implies the tuple of type Customer (function $f_{O\to C}$). This set P is not robust against RC, but robustness can be achieved by promoting the R-operation over Customer in OrderStatus to a U-operation. However, without functional constraints, this single promoted operation no longer guarantees robustness, as witnessed by the following schedule: Notice in particular how this schedule implicitly assumes in $T_2$ that Order a belongs to Customer c′ instead of Customer c to avoid a dirty write on c. Without functional constraints, P is only robust against RC if all R-operations in OrderStatus are promoted to U-operations.

Definitions
We recall the necessary definitions from [VKKN21] and extend them with functional constraints.
3.1. Databases. A relational schema is a pair (Rels, Funcs) where Rels is a set of relation names and Funcs is a set of function names. A finite set of attribute names Attr(R) is associated to every relation R ∈ Rels. Relations will be instantiated by abstract objects that serve as an abstraction of relational tuples. To this end, for every relation R ∈ Rels, we fix an infinite set of tuples $\text{Tuples}_R$. Furthermore, we assume that $\text{Tuples}_R \cap \text{Tuples}_S = \emptyset$ for all R, S ∈ Rels with R ≠ S. We then denote by Tuples the set $\bigcup_{R \in \text{Rels}} \text{Tuples}_R$ of all possible tuples. Notice that, by definition, for every t ∈ Tuples there is a unique relation R ∈ Rels such that $t \in \text{Tuples}_R$. In that case, we say that t is of type R and denote the latter by type(t) = R. Each function name f ∈ Funcs has a domain dom(f) ∈ Rels and a range range(f) ∈ Rels. Functions are used to encode relationships between tuples, such as those implied by foreign-key constraints. For instance, in the SmallBank example
3.2. Transactions and Schedules. For a tuple t ∈ Tuples, we distinguish three operations R[t], W[t], and U[t] on t, denoting that tuple t is read, written, or updated, respectively. We say that the operation is on the tuple t. The operation U[t] is an atomic update and should be viewed as an atomic sequence of a read of t followed by a write to t. We will use the following terminology: a read operation is an R[t] or a U[t], and a write operation is a W[t] or a U[t]. A transaction T is a sequence of read and write operations followed by a commit. We assume that a transaction starts when its first operation is executed, but no earlier. Formally, we model a transaction as a linear order $(T, \le_T)$, where T is the set of (read, write and commit) operations occurring in the transaction and $\le_T$ encodes the ordering of the operations. As usual, we use $<_T$ to denote the strict ordering.
When considering a set $\mathcal{T}$ of transactions, we assume that every transaction in the set has a unique id i and write $T_i$ to make this id explicit. Similarly, to distinguish the operations of different transactions, we add this id as a subscript to the operation. That is, we write $R_i[t]$, $W_i[t]$, and $U_i[t]$ to denote an R[t], W[t], and U[t] occurring in transaction $T_i$; similarly $C_i$ denotes the commit operation in transaction $T_i$. This convention is consistent with the literature (see, e.g., [BBG+95, Fek05]). To avoid ambiguity of notation, we assume that a transaction performs at most one write, one read, and one update per tuple. The latter is a common assumption (see, e.g., [Fek05]). All our results carry over to the more general setting in which multiple writes and reads per tuple are allowed.
A (multiversion) schedule s over a set $\mathcal{T}$ of transactions is a tuple $(O_s, \le_s, \ll_s, v_s)$ where $O_s$ is the set containing all operations of transactions in $\mathcal{T}$ as well as a special operation $op_0$ conceptually writing the initial versions of all existing tuples, $\le_s$ encodes the ordering of these operations, $\ll_s$ is a version order providing for each tuple t a total order over all write operations on t occurring in s, and $v_s$ is a version function mapping each read operation a in s to either $op_0$ or to a write operation different from a in s. We require that $op_0 \le_s a$ for every operation $a \in O_s$, $op_0 \ll_s a$ for every write operation $a \in O_s$, and that $a <_T b$ implies $a <_s b$ for every $T \in \mathcal{T}$ and every $a, b \in T$. We furthermore require that for every read operation a, $v_s(a) <_s a$ and, if $v_s(a) \neq op_0$, then the operation $v_s(a)$ is on the same tuple as a. Intuitively, $op_0$ indicates the start of the schedule, the order of operations in s is consistent with the order of operations in every transaction $T \in \mathcal{T}$, and the version function maps each read operation a to the operation that wrote the version observed by a. If $v_s(a)$ is $op_0$, then a observes the initial version of this tuple. The version order $\ll_s$ represents the order in which different versions of a tuple are installed in the database. For a pair of write operations on the same tuple, this version order does not necessarily coincide with $\le_s$. For example, under RC the version order is based on the commit order instead. We say that a schedule s is a single version schedule if $\ll_s$ coincides with $\le_s$ and every read operation always reads the last written version of the tuple. Formally, for each pair of write operations a and b on the same tuple, $a \ll_s b$ iff $a <_s b$, and for every read operation a there is no write operation c on the same tuple as a with $v_s(a) <_s c <_s a$. A single version schedule over a set of transactions $\mathcal{T}$ is single version serial if its transactions are not interleaved with operations from other transactions. That is, for every $a, b, c \in O_s$ with $a <_s b <_s c$, $a, c \in T$ implies $b \in T$ for every $T \in \mathcal{T}$.
The absence of aborts in our definition of schedule is consistent with the common assumption [Fek05, BG16] that an underlying recovery mechanism will roll back aborted transactions. We only consider isolation levels that only read committed versions. Therefore, there will never be cascading aborts.
3.3. Conflict Serializability. Let $a_j$ and $b_i$ be two operations on the same tuple from different transactions $T_j$ and $T_i$ in a set of transactions $\mathcal{T}$. We then say that $a_j$ is conflicting with $b_i$ if: In this case, we also say that $a_j$ and $b_i$ are conflicting operations. Furthermore, commit operations and the special operation $op_0$ never conflict with any other operation. When $a_j$ and $b_i$ are conflicting operations in $\mathcal{T}$, we say that $a_j$ depends on $b_i$ in a schedule s over $\mathcal{T}$, denoted $b_i \to_s a_j$, if:
• (ww-dependency) $b_i$ is ww-conflicting with $a_j$ and $b_i \ll_s a_j$; or,
• (wr-dependency) $b_i$ is wr-conflicting with $a_j$ and $b_i = v_s(a_j)$ or $b_i \ll_s v_s(a_j)$; or,
• (rw-antidependency) $b_i$ is rw-conflicting with $a_j$ and $v_s(b_i) \ll_s a_j$.
Intuitively, a ww-dependency from b i to a j implies that a j writes a version of a tuple that is installed after the version written by b i .A wr-dependency from b i to a j implies that b i either writes the version observed by a j , or it writes a version that is installed before the version observed by a j .A rw-antidependency from b i to a j implies that b i observes a version installed before the version written by a j .
Two schedules s and s′ are conflict equivalent if they are over the same set $\mathcal{T}$ of transactions and for every pair of conflicting operations $a_j$ and $b_i$, $b_i \to_s a_j$ iff $b_i \to_{s'} a_j$.
Definition 3.1. A schedule s is conflict serializable if it is conflict equivalent to a single version serial schedule.
A conflict graph CG(s) for schedule s over a set of transactions $\mathcal{T}$ is the graph whose nodes are the transactions in $\mathcal{T}$ and where there is an edge from $T_i$ to $T_j$ if $T_i$ has an operation $b_i$ that conflicts with an operation $a_j$ in $T_j$ and $b_i \to_s a_j$.
Theorem 3.2 [Pap86]. A schedule s is conflict serializable iff the conflict graph for s is acyclic.
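The acyclicity test of Theorem 3.2 can be illustrated with a small Python sketch. It assumes that the dependency edges between transactions have already been derived from a schedule; the edge-list encoding and the function name `is_conflict_serializable` are illustrative choices, not notation from the paper.

```python
def is_conflict_serializable(transactions, edges):
    """Check acyclicity of a conflict graph given as (T_i, T_j) dependency edges."""
    successors = {t: [] for t in transactions}
    for src, dst in edges:
        successors[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {t: WHITE for t in transactions}

    def has_cycle_from(node):
        color[node] = GREY
        for nxt in successors[node]:
            if color[nxt] == GREY:  # back edge: a cycle exists
                return True
            if color[nxt] == WHITE and has_cycle_from(nxt):
                return True
        color[node] = BLACK
        return False

    return not any(color[t] == WHITE and has_cycle_from(t) for t in transactions)


# The GoPremium discussion in Section 2 yields T1 -> T2 (rw) and T2 -> T1 (ww).
print(is_conflict_serializable(["T1", "T2"], [("T1", "T2"), ("T2", "T1")]))
print(is_conflict_serializable(["T1", "T2"], [("T1", "T2")]))
```

The first call reports a cycle, matching the informal argument that the GoPremium schedule from Section 2 is not serializable; the second graph is acyclic.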

3.4. Multiversion Read Committed.
Let s be a schedule for a set $\mathcal{T}$ of transactions. Then, s exhibits a dirty write iff there are two ww-conflicting operations $a_j$ and $b_i$ in s on the same tuple t with $a_j \in T_j$, $b_i \in T_i$ and $T_j \neq T_i$ such that $b_i <_s a_j <_s C_i$. That is, transaction $T_j$ writes to an attribute of a tuple that has been modified earlier by $T_i$, but $T_i$ has not yet issued a commit. For a schedule s, the version order $\ll_s$ corresponds to the commit order in s if for every pair of write operations $a_j \in T_j$ and $b_i \in T_i$, $b_i \ll_s a_j$ iff $C_i <_s C_j$. We say that a schedule s is read-last-committed (RLC) if $\ll_s$ corresponds to the commit order and for every read operation $a_j$ in s on some tuple t the following holds:
• there is no write operation $c_k \in T_k$ on t with $C_k <_s a_j$ and $v_s(a_j) \ll_s c_k$.
So, $a_j$ observes the most recent version of t (according to the order of commits) that is committed before $a_j$. Note in particular that a schedule cannot exhibit dirty reads, defined in the traditional way [BBG+95], if it is read-last-committed.
Definition 3.3. A schedule is allowed under isolation level read committed (RC) if it is read-last-committed and does not exhibit dirty writes.
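As a rough illustration of the dirty-write condition $b_i <_s a_j <_s C_i$, the following Python sketch scans a schedule given as a list of operations. The tuple encoding `(txn, kind, tuple_name, write_set)` and the function name are assumptions made for this example; in line with the SmallBank discussion, two writes on the same tuple are treated as ww-conflicting when their write sets share an attribute.

```python
def exhibits_dirty_write(schedule):
    """Return True if some write a_j falls between a conflicting write b_i and C_i."""
    # Position of each transaction's commit in the schedule.
    commit_pos = {txn: pos for pos, (txn, kind, _, _) in enumerate(schedule) if kind == "C"}
    # W and U operations both write.
    writes = [(pos, txn, tup, ws)
              for pos, (txn, kind, tup, ws) in enumerate(schedule)
              if kind in ("W", "U")]
    for pos_b, txn_i, tup_b, ws_b in writes:
        for pos_a, txn_j, tup_a, ws_a in writes:
            conflicting = txn_i != txn_j and tup_a == tup_b and bool(ws_a & ws_b)
            # A missing commit is treated as "not yet committed".
            if conflicting and pos_b < pos_a < commit_pos.get(txn_i, float("inf")):
                return True
    return False


dirty = [(1, "U", "s1", {"I"}), (2, "U", "s1", {"I"}), (1, "C", None, set()), (2, "C", None, set())]
clean = [(1, "U", "s1", {"I"}), (1, "C", None, set()), (2, "U", "s1", {"I"}), (2, "C", None, set())]
print(exhibits_dirty_write(dirty), exhibits_dirty_write(clean))
```

The sketch covers only the dirty-write half of Definition 3.3; checking the read-last-committed condition would additionally require the version order and version function of the schedule.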
Since a read operation in a schedule allowed under RC can access the most recently committed version immediately instead of waiting for an uncommitted version to be committed, our definition of read committed allows more schedules than the more restrictive lock-based implementation of read committed [BBG+95]. Furthermore, our definition of RC should be contrasted with more abstract specifications of Read Committed [ALO00] where read operations are only required to read a committed version, rather than the most recent one. We emphasize that our definition of RC is in line with practical implementations of read committed found in, e.g., PostgreSQL.
3.5. Transaction Templates. Transaction templates are transactions where operations are defined over typed variables together with functional constraints on these variables. Types of variables are relation names in Rels and indicate that variables can only be instantiated by tuples from the respective type. We fix an infinite set of variables Var that is disjoint from Tuples. Every variable X ∈ Var has an associated relation name in Rels as type that we denote by type(X). For an operation $o_i$ in a template, $\text{var}(o_i)$ denotes the variable in $o_i$. An equality constraint is an expression of the form X = f(Y) where X, Y ∈ Var, dom(f) = type(Y) and range(f) = type(X). A disequality constraint is an expression of the form X ≠ Y where type(X) = type(Y). Recall that we denote variables by capital letters X, Y, Z and tuples by small letters t, v.
The transaction templates derived from the SmallBank and TPC-C benchmarks are shown in Figure 1 and Figure 2, respectively. A variable assignment µ is a mapping from Var to Tuples such that $\mu(X) \in \text{Tuples}_{\text{type}(X)}$. Furthermore, µ satisfies a constraint A variable assignment µ for a transaction template τ is admissible for D if it satisfies all constraints in Γ(τ) over D. By µ(τ), we denote the transaction obtained by replacing each variable X in τ with µ(X).
A set of transactions $\mathcal{T}$ is consistent with a set of transaction templates P and database D, if for every transaction T in $\mathcal{T}$ there is a transaction template τ ∈ P and a variable mapping $\mu_T$ that is admissible for D such that $\mu_T(\tau) = T$. We refer to Section 2 for concrete examples based on transaction templates derived from the SmallBank and TPC-C benchmarks.
3.6. Robustness. We define the robustness property [BG16] (also called acceptability in [Fek05, FLO+05]), which guarantees serializability for all schedules of a given set of transactions for a given isolation level.
Definition 3.5 (Transaction Robustness). A set $\mathcal{T}$ of transactions is robust against RC if every schedule for $\mathcal{T}$ that is allowed under RC is conflict serializable.
In the next definition, we represent conflicting operations from transactions in a set $\mathcal{T}$ as quadruples $(T_i, b_i, a_j, T_j)$ with $b_i$ and $a_j$ conflicting operations, and $T_i$ and $T_j$ their respective transactions in $\mathcal{T}$. We call these quadruples conflicting quadruples for $\mathcal{T}$. Further, for an operation $b \in T$, we denote by $\text{prefix}_b(T)$ the restriction of T to all operations that are before or equal to b according to $\le_T$. Similarly, we denote by $\text{postfix}_b(T)$ the restriction of T to all operations that are strictly after b according to $\le_T$. Throughout the paper, we interchangeably consider transactions both as linear orders as well as sequences. Therefore, T is equal to the sequence $\text{prefix}_b(T)$ followed by $\text{postfix}_b(T)$, which we denote by $\text{prefix}_b(T) \cdot \text{postfix}_b(T)$, for every $b \in T$.
Definition 3.6 (Multiversion split schedule). Let $\mathcal{T}$ be a set of transactions and $C = (T_1, b_1, a_2, T_2), (T_2, b_2, a_3, T_3), \ldots, (T_m, b_m, a_1, T_1)$ a sequence of conflicting quadruples for $\mathcal{T}$ such that each transaction in $\mathcal{T}$ occurs in at most two different quadruples. A multiversion split schedule for $\mathcal{T}$ based on C is a multiversion schedule that has the following form:
$$\text{prefix}_{b_1}(T_1) \cdot T_2 \cdot \ldots \cdot T_m \cdot \text{postfix}_{b_1}(T_1) \cdot T_{m+1} \cdot \ldots \cdot T_n,$$
where (1) there is no write operation in $\text{prefix}_{b_1}(T_1)$ ww-conflicting with a write operation in any of the transactions $T_2, \ldots, T_m$; (2) $b_1 <_{T_1} a_1$ or $b_m$ is rw-conflicting with $a_1$; and, (3) $b_1$ is rw-conflicting with $a_2$. Furthermore, $T_{m+1}, \ldots, T_n$ are the remaining transactions in $\mathcal{T}$ (those not mentioned in C) in an arbitrary order.
Figure 3 depicts a schematic multiversion split schedule. The name stems from the fact that the schedule is obtained by splitting one transaction in two ($T_1$ at operation $b_1$ in Figure 3) and placing all other transactions in C in between. The figure does not display the trailing transactions $T_{m+1}, T_{m+2}, \ldots$ and assumes $b_1 <_{T_1} a_1$.
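The shape of a split schedule can be made concrete with a short Python sketch that only assembles the operation order: the prefix of $T_1$ up to and including $b_1$, then the other transactions of C, then the rest of $T_1$, then the remaining transactions. The list-of-labels encoding and the function name `split_schedule_order` are assumptions for illustration, and the sketch deliberately does not check conditions (1) to (3) of Definition 3.6 nor construct the version order and version function.

```python
def split_schedule_order(t1, b1, cycle_rest, remaining):
    """Assemble prefix_b1(T1), T2..Tm, postfix_b1(T1), T(m+1)..Tn as one operation list."""
    cut = t1.index(b1) + 1          # prefix includes b1 itself
    prefix, postfix = t1[:cut], t1[cut:]
    order = list(prefix)
    for txn in cycle_rest:          # T2, ..., Tm in the order given by C
        order.extend(txn)
    order.extend(postfix)
    for txn in remaining:           # transactions not mentioned in C
        order.extend(txn)
    return order


# Hypothetical operation labels, used only to show the resulting order.
T1 = ["R1[t]", "W1[v]", "C1"]
T2 = ["W2[t]", "R2[v]", "C2"]
T3 = ["R3[u]", "C3"]
print(split_schedule_order(T1, "R1[t]", [T2], [T3]))
```

Here $T_1$ is split after its first operation, $T_2$ runs in between, and $T_3$ trails at the end, mirroring the schematic picture of Figure 3.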
The following theorem characterizes non-robustness in terms of the existence of a multiversion split schedule.
Theorem 3.7 [VKKN21]. For a set of transactions $\mathcal{T}$, the following are equivalent: (1) $\mathcal{T}$ is not robust against RC; (2) there is a multiversion split schedule s for $\mathcal{T}$ based on some C.
Let P be a set of transaction templates and D be a database. Then, P is robust against RC over D if for every set of transactions $\mathcal{T}$ that is consistent with P and D, it holds that $\mathcal{T}$ is robust against RC.
Definition 3.8 (Template Robustness). A set of transaction templates P is robust against RC if P is robust against RC over every database D.
We say that a transaction template (τ, Γ) is a variable transaction template when Γ = ∅ and an equality transaction template when all constraints in Γ are equalities.We denote these sets by VarTemp and EqTemp, respectively.For an isolation level I and a class of transaction templates C, t-robustness(C,I) is the problem to decide if a given set of transaction templates P ∈ C is robust against I.When C is the class of all transaction templates, we simply write t-robustness(I).
In Section 4 we start out with a negative result and show that robustness becomes undecidable when functional constraints are added in their most general form, by proving undecidability for t-robustness(EqTemp, RC). Notice in particular that the undecidability result does not even require disequalities. To obtain decidable fragments, we introduce restrictions on the structure of functional constraints. The schema graph SG(Rels, Funcs) of a schema (Rels, Funcs) is a directed multigraph having the relations in Rels as nodes, and in which there are as many edges from a node R ∈ Rels to a node S ∈ Rels as there are functions f ∈ Funcs with dom(f) = R and range(f) = S. We say that a schema (Rels, Funcs) is acyclic if the multigraph SG(Rels, Funcs) is acyclic and that it is a multi-tree if there is at most one directed path between any two nodes in SG(Rels, Funcs).
Example 3.10. Consider the schema $(\{P, Q, R, S\}, \{f_{P,R}, f_{Q,R}, f_{R,S}\})$ with $\mathrm{dom}(f_{i,j}) = i$ and $\mathrm{range}(f_{i,j}) = j$ for each function $f_{i,j}$. The corresponding schema graph with solid lines is given in Figure 4. This schema is a multi-tree, as there is at most one path between any pair of nodes. Notice that the definition of a multi-tree is more general than a forest, as a node can still have multiple parents (e.g., node R in our example). Adding the function name $f_{Q,S}$ with $\mathrm{dom}(f_{Q,S}) = Q$ and $\mathrm{range}(f_{Q,S}) = S$ results in the schema graph given in Figure 4 that is still acyclic, but no longer a multi-tree as there are now two paths from Q to S.

Figure 4. If we remove function name $f_{Q,S}$ (dashed edge), the resulting schema graph is a multi-tree.
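The two schema properties can be checked mechanically. The following Python sketch is purely illustrative and not part of the formal development: the helper names (`schema_graph`, `is_acyclic`, `is_multi_tree`) and the encoding of each function name as a `(name, dom, range)` triple are assumptions made here. It replays Example 3.10, treating a cycle as a violation of the multi-tree property because a cycle yields more than one directed path between nodes on it.

```python
from collections import defaultdict


def schema_graph(funcs):
    """Adjacency lists of SG(Rels, Funcs): one edge dom(f) -> range(f) per function name."""
    adj = defaultdict(list)
    for _name, dom, rng in funcs:
        adj[dom].append(rng)
    return adj


def is_acyclic(rels, funcs):
    """Depth-first search; a node reached while still 'open' closes a directed cycle."""
    adj = schema_graph(funcs)
    state = {}

    def visit(node):
        if state.get(node) == "open":
            return False
        if state.get(node) == "done":
            return True
        state[node] = "open"
        ok = all(visit(nxt) for nxt in adj[node])
        state[node] = "done"
        return ok

    return all(visit(r) for r in rels)


def is_multi_tree(rels, funcs):
    """At most one directed path between any two nodes (parallel edges count twice)."""
    if not is_acyclic(rels, funcs):
        return False
    adj = schema_graph(funcs)

    def count_paths(src, dst):
        # Number of directed paths with at least one edge from src to dst.
        return sum((1 if nxt == dst else 0) + count_paths(nxt, dst) for nxt in adj[src])

    return all(count_paths(r, s) <= 1 for r in rels for s in rels)


# Example 3.10: solid edges only, then with the dashed edge f_{Q,S} added.
rels = ["P", "Q", "R", "S"]
solid = [("f_PR", "P", "R"), ("f_QR", "Q", "R"), ("f_RS", "R", "S")]
dashed = solid + [("f_QS", "Q", "S")]
```

With these inputs, `is_multi_tree(rels, solid)` holds, whereas `dashed` is still acyclic but fails the multi-tree test because of the two paths from Q to S. The path counting is exponential in the worst case, which is acceptable only for small illustrative schemas.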
The schema graph constructed in the proof of Theorem 4.1 contains several cycles (cf. Figure 6). In Section 5 we consider robustness for a fragment where a restricted form of cycles in the schema graph is allowed but where additional constraints on the templates are assumed. We consider robustness for acyclic schema graphs in Section 6.

Robustness for Templates
We start out with a negative result and show that the robustness problem in its most general form is undecidable (even when disequalities are not allowed). The proof is a reduction from Post's Correspondence Problem (PCP) [Pos46] and relies on cyclic dependencies between functional constraints. The proof can be found in the remainder of this section and is quite elaborate, but the basic intuition is simple: the counterexample split schedule builds up the two strings that need to be generated by the PCP instance by repeated application of functional constraints. It might be tempting to relate the above result to the undecidability of the implication problem for functional and inclusion dependencies [CV85]. Functional constraints indeed make it possible to define inclusion dependencies (as in the SmallBank example), but they always relate complete tuples and are not suited to define functional dependencies. Furthermore, the proof of Theorem 4.1 makes use of only unary relations, for which the implication problem for functional dependencies and inclusion dependencies is known to be decidable.
The remainder of this section is devoted to proving Theorem 4.1. We first present the reduction from PCP in Section 4.1. Afterwards, we show that this reduction is correct by proving both directions in Section 4.2 and Section 4.4, respectively.

4.1. Reduction. The proof is based on a reduction from Post's Correspondence Problem (PCP), which is known to be undecidable [Pos46]. A domino is a pair (a, b) of two non-empty strings over Σ. Henceforth we call a its top value and b its bottom value. Given a set of dominoes D, the PCP asks if a non-empty sequence $d_1, d_2, \ldots, d_r$ of dominoes in D exists such that, with $d_i = (a_i, b_i)$, the strings $a_1 a_2 \cdots a_r$ and $b_1 b_2 \cdots b_r$ are identical. For the reduction to non-robustness against RC, we construct a set P of transaction templates consisting of the transaction templates in Figure 5 for D. There are the transaction templates Split, First and Last (whose meaning will be explained next) and for every domino in D there is a template in Figure 5 representing that domino and the action of appending that domino to a sequence of dominoes. The schema consists of the relations {Boolean, InitialConflict, String, PCPSolution, DominoSequence} whose meaning will be explained below together with a discussion of all the functions. The schema graph is presented in Figure 6.

Figure 6. Schema graph for the transaction templates in Figure 5 (for any set of dominoes).
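To make the PCP definition concrete, the following Python sketch checks whether a given sequence of dominoes is a solution, and performs a bounded search. The function names, the representation of a domino as a `(top, bottom)` pair of strings, and the sample instance are assumptions chosen for illustration; they do not appear in the reduction. Since PCP is undecidable, the search is necessarily bounded by a maximum sequence length and cannot decide the problem in general.

```python
from itertools import product


def is_pcp_solution(dominoes, sequence):
    """True iff the non-empty index sequence reads the same string on top and bottom."""
    if not sequence:
        return False
    top = "".join(dominoes[i][0] for i in sequence)
    bottom = "".join(dominoes[i][1] for i in sequence)
    return top == bottom


def bounded_search(dominoes, max_len):
    """Try all index sequences of length 1..max_len; return a solution or None."""
    for r in range(1, max_len + 1):
        for seq in product(range(len(dominoes)), repeat=r):
            if is_pcp_solution(dominoes, list(seq)):
                return list(seq)
    return None


# Hypothetical instance: three dominoes given as (top value, bottom value).
dominoes = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
candidate = [2, 1, 2, 0]  # top and bottom both read "bbaabbbaa"
```

Here `is_pcp_solution(dominoes, candidate)` holds, because the top values `bba`, `ab`, `bba`, `a` and the bottom values `bb`, `aa`, `bb`, `baa` concatenate to the same string.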

To prove Theorem 4.1, we will show that there is a solution for PCP if and only if P is not robust against RC. For the only-if direction, we show that, if there is a solution $d = d_1, d_2, \ldots, d_r$ for the PCP problem over D, then there is a multiversion split schedule that encodes this solution in a particular way: in this schedule the split transaction is an instantiation of transaction template Split, the next transaction is an instantiation of First, followed by instantiations of transaction templates $\text{Domino}_{d_1}, \ldots, \text{Domino}_{d_r}$ representing the sequence of dominoes in solution d, and finally an instantiation of transaction template Last. Henceforth, we call a schedule that encodes a sequence of dominoes d in this way a schedule-encoding of d. For the if-direction, we first show that every multiversion split schedule consistent with the transaction templates in Figure 5 for some set D of dominoes is a schedule-encoding for some sequence d of dominoes from D, and then that for every schedule-encoding of a sequence d of dominoes, d is always a solution for the PCP problem over a set of dominoes containing those in d.

4.2. Only-if direction.
We first prove the only-if direction of Theorem 4.1. Relation PCPSolution contains a tuple that we interpret as the PCP solution $d = d_1, d_2, \ldots, d_r$. Relation DominoSequence contains r + 1 tuples, one for every prefix of d, including the empty sequence () and the PCP solution d itself. For convenience of notation, we will henceforth often represent tuples by their interpretation, which is justified by the fact that every tuple in a particular relation will have a different interpretation, and the relation itself can always be derived from the context (e.g., the function signature).
Since the PCP solution has an interpretation in both the relations PCPSolution and DominoSequence, we assume two functions, $f_{\text{PCP}\to\text{DS}}$ : PCPSolution → DominoSequence and $f_{\text{DS}\to\text{PCP}}$ : DominoSequence → PCPSolution that relate these interpretations to each other. That is,

Further, we have functions $f_{\text{next-sequence}}$ : DominoSequence → DominoSequence and $f_{\text{previous-sequence}}$ : DominoSequence → DominoSequence with the following interpretation:

Intuitively, these functions relate each tuple in DominoSequence representing a prefix of d to the prefixes obtained by adding or removing one domino in the sequence. That is, given a tuple in DominoSequence representing a strict prefix $d_1, d_2, \ldots, d_i$ of d, $f_{\text{next-sequence}}$ returns the tuple representing $d_1, d_2, \ldots, d_{i+1}$ (i.e., the prefix of d obtained by adding one domino), and $f_{\text{previous-sequence}}$ returns the tuple representing $d_1, d_2, \ldots, d_{i-1}$ (i.e., the prefix of d obtained by removing the last domino). We furthermore distinguish two special cases to guarantee that both functions are defined for all tuples in DominoSequence: if the tuple represents the solution d itself, then $f_{\text{next-sequence}}$ returns d, and if the tuple represents the empty sequence (), then $f_{\text{previous-sequence}}$ returns ().

Relation $\text{String}^{D}$ contains a tuple representing the read c of PCP-solution sequence d, a tuple representing an error ⟨error⟩, and a tuple for every substring of c, including the empty string ⟨⟩. We assume that all these tuples are different. We use the notation ⟨⟩ to denote the empty string to distinguish it from (), which denotes the empty sequence of dominoes. All other relations and functions serve to pass tuples from one transaction to another in a schedule and to enforce that certain tuples do not collide, which is useful for the if-part of the proof.
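The intended behaviour of the two sequence functions on prefixes of d, including the two special cases, can be sketched in Python as follows. This is only an illustration of the interpretation described above: prefixes are modelled as tuples, and the names `make_sequence_functions`, `f_next`, and `f_previous` are assumptions introduced here.

```python
def make_sequence_functions(d):
    """Build next/previous functions over the prefixes of the domino sequence d."""
    d = tuple(d)

    def check(prefix):
        prefix = tuple(prefix)
        if d[:len(prefix)] != prefix:
            raise ValueError("argument is not a prefix of d")
        return prefix

    def f_next(prefix):
        # Adds one domino; the full solution d is mapped to itself.
        prefix = check(prefix)
        return d[:min(len(prefix) + 1, len(d))]

    def f_previous(prefix):
        # Removes the last domino; the empty sequence () is mapped to itself.
        prefix = check(prefix)
        return d[:max(len(prefix) - 1, 0)]

    return f_next, f_previous


f_next, f_previous = make_sequence_functions(["d1", "d2", "d3"])
```

On every strict prefix, `f_previous(f_next(p))` returns `p` again, which mirrors the way the two functions step back and forth along the prefixes of d.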
Relation $\text{Boolean}^{D}$ contains two tuples, which we interpret as Boolean values 0 and 1. Functions $f_{\text{is-non-empty}}$ : String → Boolean and $f_{\text{is-error}}$ : String → Boolean are interpreted as follows:

Now the schedule prefix
) and $b_1$ = ⟨init⟩ has the conditions of Definition 3.6. Indeed, it is based on a sequence of conflict quadruples (T

Condition (1) is true because there is no ww-conflict between a write operation in $\text{prefix}_{b_1}(T_1)$ and a write operation in any of the transactions $T_2, \ldots, T_m$, since the first write operation, respectively second write operation, in Split(⟨init⟩) has a type that only occurs before the conflict with First(⟨init⟩), and is the conflict with Last($(d_1, \ldots, d_r)$), respectively. Furthermore, Condition (2) is true because $b_1 <_{T_1} a_1$, and Condition (3) is true because $b_1$ and $a_2$ are rw-conflicting.

4.3. Helpful lemma. Before proving the opposite direction of Theorem 4.1, we first establish the following lemma.
Lemma 4.3. If a set P of transaction templates is not robust against RC, then there is a multiversion split schedule s for a set $T = \{T_1, \ldots, T_m\}$ of transactions consistent with P in which an operation from a transaction $T_j$ depends on an operation from transaction $T_i$ only if $j = i + 1$, or $i = m$ and $j = 1$.
Proof. If P is not robust against RC, then there is a database D and a multiversion split schedule s based on a sequence of conflict quadruples C for a set of transactions T that is consistent with P and D, having the properties of Definition 3.6.
We can assume that n = m. Otherwise, we remove the transactions $T_{m+1}, \ldots, T_n$ from T, s, and C. We can also assume that s is read-last-committed. Otherwise, we choose an appropriate version order $\ll_s$ and version function $v_s$. Now suppose that there is a transaction $T_j$ with an operation $a'_j$ that depends on an operation $b'_i$ from transaction $T_i$ and with $j \neq i + 1$ or $i = m$ and $j \neq 1$. Clearly, by definition of dependency and the structure of a multiversion split schedule, $i < j$ or $j = 1$.
We continue the proof with a construction showing that under these assumptions there is an alternative schedule $s'$ that is also a multiversion split schedule, but for a strict subset of the transactions in T (thus also still consistent with P and D). The result of the lemma then follows from the observation that repeated application of this construction must lead to a schedule with the properties of the lemma, in which no such dependency exists.
For the construction, we proceed by case distinction.
If $i \neq 1$ and $j \neq 1$, we construct a schedule $s'$ from s by removing all operations from transactions $T_h$ with $i < h < j$. Notice that we remove at least one transaction, since $i < i + 1 < j$. We can derive a sequence of conflict quadruples $C'$ from C by removing all occurrences of these transactions $T_h$ and adding the conflict quadruple $(T_i, b'_i, a'_j, T_j)$ instead. By construction, $s'$ is a multiversion split schedule based on $C'$ over a set of transactions consistent with P and D. It remains to show that the newly constructed schedule $s'$ has the properties of Definition 3.6. The latter is straightforward since C and $C'$ agree on their first and last quadruple, due to the assumption $i \neq 1$ and $j \neq 1$.
If $i = 1$, it follows that $i < j$ and thus $j \neq 1$. Then, we construct a schedule $s'$ from s by removing all operations from transactions $T_h$ with $i < h < j$ and updating the prefix and postfix of $T_1$, now based on $b'_i$. Notice that we again remove at least one transaction, since $i < i + 1 < j$, and that we can derive a sequence of conflict quadruples $C'$ from C in the same way as before, by removing all occurrences of these transactions $T_h$ and adding the conflict quadruple $(T_i, b'_i, a'_j, T_j)$ instead. By construction, $s'$ is a multiversion split schedule based on $C'$ over a set of transactions consistent with P and D. It remains to show that the newly constructed schedule $s'$ has the properties of Definition 3.6.
First, we observe that $b'_1$ and $a'_j$ are rw-conflicting, which immediately implies that Condition (3) is true for $s'$. The argument is by exclusion. Indeed, if $b'_1$ and $a'_j$ were ww-conflicting, then $b'_1 \ll_s a'_j$, implying $b'_1 <_s a'_j$ (due to the assumed read-last-committed) and thus $b'_1 \leq_s b_1$, which is not allowed by Condition (1) on s. It follows from a similar argument that $b'_1$ and $a'_j$ are not wr-conflicting (due to read-last-committed and the structure of a multiversion split schedule), thus $b'_1 \leq_s b_1$. Therefore, Condition (1) again transfers from s to $s'$. For similar reasons Condition (2) applies on s

Otherwise, if $j = 1$, it follows that $1 < i$. Then, we construct a schedule $s'$ from s by removing all operations from transactions $T_h$ with $i < h$. Notice that we remove at least one transaction, since $i < m$. We can derive a sequence of conflict quadruples $C'$ from C by removing all occurrences of these transactions $T_h$ and adding the conflict quadruple $(T_i, b'_i, a'_j, T_j)$ instead. In this schedule $s'$, Conditions (1) and (3) transfer from s by its construction. To see that Condition (2) is true on $s'$, simply notice that if $b'_i$ and $a'_1$ are ww- or wr-conflicting, then either b

4.4. If direction. It remains to argue that the if direction of Theorem 4.1 is indeed correct. Next, we show that, if there exists a multiversion split schedule for the set of transaction templates in Figure 5 for some set D of dominoes, then this schedule is always a schedule-encoding of a sequence of dominoes in D.
Proposition 4.4. Let D be a set of dominoes. If there is a multiversion split schedule s for a set of transactions consistent with the transaction templates in Figure 5 for D and some database D, then this schedule s is a schedule-encoding of some sequence d of dominoes in D.
For the proof, let D be a database and s a multiversion split schedule for a set of transactions T consistent with P and D, with the conditions of Lemma 4.3 and based on some sequence of conflict quadruples $C = (T_1, b_1, a_2, T_2), (T_2, b_2, a_3, T_3), \ldots, (T_m, b_m, a_1, T_1)$. We show through a sequence of properties (Lemmas 4.6, 4.7, 4.8, and 4.9) that s is a schedule-encoding of a sequence d of dominoes in D.
As a first property (Lemma 4.5), we observe that transaction templates in P heavily constrain the possible variable instantiations. For transaction template Split, for example, a variable mapping depends entirely on the choice of the value for variable I. Since Lemma 4.3 forbids the presence of duplicate transactions in T, two transactions $T_i$ and $T_j$ (with $i \neq j$) based on transaction template Split cannot agree on their choice for variable I in s. By applying this argument to other transaction templates, we obtain the following corollary of Lemma 4.3. Here, for each transaction $T_i$ in s, we write $\tau_i$ to denote the transaction template in P that it is based on, and $\mu_i$ to denote the associated variable mapping for $\tau_i$, with $\mu_i(\tau_i) = T_i$.

Lemma 4.5. For two transactions $T_i$ and $T_j$ in s, with $i \neq j$:
• if $T_i$ and $T_j$ are based on Split, then $\mu_i(I) \neq \mu_j(I)$;
• if $T_i$ and $T_j$ are based on First, then $\mu_i(S_1) \neq \mu_j(S_1)$;
• if $T_i$ and $T_j$ are based on Last, then $\mu_i(B) \neq \mu_j(B)$ and $\mu_i(C) \neq \mu_j(C)$;
• if $T_i$ and $T_j$ are based on domino transaction templates, then $\mu_i(B) \neq \mu_j(B)$ and

We conclude the proof of Proposition 4.4 with the necessary arguments (Lemmas 4.6, 4.7, 4.8 and 4.9) that s is indeed a schedule-encoding for some sequence of dominoes.
Proof. Since $b_1$ and $a_2$ are rw-conflicting (cf. Definition 3.6), and there are no updates in the considered transaction templates, operation $b_1$ must be a read. Since InitialConflict is the only type allowing for conflicts involving a read, it is immediate that $T_1$ must be based on Split and $T_2$ on First, with $\mu_1(I) = \mu_2(I)$. From this equality and function $f_{\text{final-dominoes-string}}$ it follows that $\mu_1(S_1) = \mu_2(S_1)$. From Definition 3.6, particularly that there is no ww-conflict between a write operation in $\text{prefix}_{b_1}(T_1)$ and a write operation in any of the transactions $T_2, \ldots, T_m$, it follows that $\mu_1(X_1) \neq \mu_2(X_2)$. Finally, function $f_{\text{is-non-empty}}$, which maps $S_1$ onto $X_1$ in transaction template Split and $S_0$ onto $X_2$ in transaction template First, implies $\mu_1(S_1) \neq \mu_2(S_0)$.

Lemma 4.7. There is a transaction $T_3$ in s and it is based on a domino transaction template, with $\mu_3(S_1) = \mu_2(S_1)$.
Proof. First, suppose towards a contradiction that $m = 2$. We already know from Lemma 4.6 that $\mu_1(I) = \mu_2(I)$ and indicating $b_1 \to_s a_2$, particularly, $v_s(b_1) \ll_s a_2$, thus implying that $a_1$ cannot depend on $b_2$, which is the desired contradiction.
Lemma 4.8. For a transaction $T_i$, with $i \geq 4$, for which all $T_j$'s, with $j \in \{3, \ldots, i - 1\}$, are based on domino transaction templates, transaction $T_i$ is based on a domino transaction template or on transaction template Last. Furthermore, $\mu_2(S_1) = \mu_i(S_1)$.
Proof. Since domino transaction templates do not mention variables of type InitialConflict and write only to variables of type DominoSequence, it remains to show that $T_{i+1}$ is not based on transaction template First.
For this, observe that $\mu_2(S_1) = \mu_{i-1}(S_1)$. Indeed, every conflict quadruple $(T_i, b_i, a_{i+1}, T_{i+1})$, with $i \in \{3, \ldots, i - 1\}$, admits ww-conflicting operations with variables of type DominoSequence. No matter if the conflict is via a variable B or $B_{\text{next}}$, the constraints $S_1 = f_{\text{future-solution-string}}(B)$ and

Now, assume towards a contradiction that $T_{i+1}$ is based on First, thus admitting a conflict quadruple ) and $a_{i+1} = \mu_{i+1}(B)$. Both of these equalities imply $\mu_i(S_1) = \mu_{i+1}(S_1)$ due to constraints $S_1 = f_{\text{future-solution-string}}(B)$ and $S_1 = f_{\text{future-solution-string}}(B_{\text{next}})$, thus implying $\mu_i(S_1) = \mu_2(S_1)$, which contradicts Lemma 4.5. We conclude that $T_i$ is indeed based on a domino transaction template or on transaction template Last. That $\mu_{i-1}(S_1) = \mu_i(S_1)$ follows again from the constraints using function $f_{\text{future-solution-string}}$.

Lemma 4.9. If $T_i$ is based on transaction template Last, then $i = m$.
Proof. Let $T_j$ be the transaction following $T_i$. We already know about $T_j$ that either $j = 1$ or $T_j$ must be a transaction that is different from all preceding transactions $T_1, \ldots, T_i$ (due to Lemma 4.3).
We first show, by exclusion, that transaction $T_j$ is based on Split. Transaction $T_j$ cannot be based on Last, as then either $\mu_i(B) = \mu_j(B)$ or $\mu_i(C) = \mu_j(C)$, which directly contradicts Lemma 4.5. Similarly, transaction $T_j$ cannot be based on First, as then $\mu_i(B) = \mu_j(B)$, implying $\mu_2(S_1) = \mu_i(S_1) = \mu_j(S_1)$ due to the constraints involving function $f_{\text{future-solution-string}}$. Finally, transaction $T_j$ cannot be based on a domino transaction template, because then , thus with $T_i$ and $T_j$ contradicting Lemma 4.5. We can thus indeed conclude that transaction $T_j$ is based on Split.
To see that $j = 1$, recall that $\mu_1(S_1) = \mu_{i-1}(S_1)$ and the only possible conflict between $T_i$ and $T_j$ implies $\mu_i(C) = \mu_j(C)$. From the latter we obtain $\mu_{i-1}(S_1) = \mu_i(S_1)$, due to $\mu_{i-1}(B_{\text{next}}) = \mu_i(B)$ and function $f_{\text{future-solution-string}}$. From this it follows that $\mu_1(I) = \mu_j(I)$ through $I = (f_{\text{defines}} \circ f_{\text{is-non-empty}})(S_1)$ in transaction template Split. That $j = 1$ then follows from Lemma 4.5.
Finally, we show that if there is a multiversion split schedule with the properties of Lemma 4.3 that is a schedule-encoding for a sequence of dominoes d, then this sequence d is also a solution to the respective PCP problem. The next proposition thus finalizes the proof for the if-direction of Theorem 4.1.
Proposition 4.10. Let D be a set of dominoes. Let s be a multiversion split schedule with the properties of Lemma 4.3 that is consistent with the transaction templates in Figure 5

For convenience of notation, we introduce for every $i \in \{0, \ldots, h\}$ and $j \in \{1, \ldots, k\}$ the following notation:

First, we show that

This result follows from the assumed structure of schedule s. More precisely, since an instantiation of First with an instantiation of $\text{Domino}_{d_1}$ can only have conflicts on instantiations of W[B : DominoSequence], we have $\mu_2(B) = \mu_3(B)$, from which it follows that $\mu_2(S_1) = \mu_3(S_1)$.
For every individual instantiation of $\text{Domino}_{d_i}$ in s, we have that f append-a ℓa i

For transactions $T_i$, with $i \in \{3, \ldots, m + 1\}$ (thus representing an instantiation of $\text{Domino}_{d_{i-2}}$ which is followed in s by an instantiation of $\text{Domino}_{d_{i-1}}$), the only possible conflict is between the instantiation of W[$B_{\text{next}}$ : DominoSequence] in $T_i$ and the instantiation of W[B : DominoSequence] in $T_{i+1}$ (notice that this is indeed the only option due to Lemma 4.5), thus with $\mu_i(B_{\text{next}}) = \mu_{i+1}(B)$, implying $\mu_i(S_{ta_i}) = \mu_{i+1}(S_t)$, $\mu_i(S_{bb_i}) = \mu_{i+1}(S_b)$, and $\mu_i(S_1) = \mu_{i+1}(S_1)$.
Finally, transaction $T_{m-1}$ (an instantiation of $\text{Domino}_{d_m}$) can only conflict with transaction $T_m$ (an instantiation of Last) on instantiations of $B_{\text{next}}$ (in $\text{Domino}_{d_m}$) and B (in Last), thus with

Combining the above equalities indeed proves Condition (4.1).
From Condition (4.1) we can now derive that, by following an analogous approach. Indeed, in every instantiation of Domino i, there is a functional constraint for every application of the append function that requires its input to be the result of the detach function applied to its output, which indeed implies Condition (4.2).
To see that $k = h$, we observe that $k \neq h$ implies an application of the detach function to the instantiation of $S_1$ (for which we already argued that it has the same tuple assigned for every domino instantiation) for the shortest string. This contradicts Condition (4.1), because such an application results in the same instantiation as $S_e$, which can never equal the instantiation for $S_1$.
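The desired result that the individual symbols in the top and bottom reads of dominoes in sequence d are the same now follows from the functional constraint that every interpretation of

Figure 7. Schema graph for SmallBank restricted to $f_{A \to C}$ and $f_{A \to S}$, over the relations Account, Savings and Checking.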

Robustness for Templates admitting Multi-Tree Bijectivity
We say that a set of transaction templates P over a schema (Rels, Funcs) admits multi-tree bijectivity if a disjoint partitioning of Funcs in pairs $(f_1, g_1), (f_2, g_2), \ldots, (f_n, g_n)$ exists such that $\mathrm{dom}(f_i) = \mathrm{range}(g_i)$ and $\mathrm{dom}(g_i) = \mathrm{range}(f_i)$ for every pair of function names $(f_i, g_i)$; every schema graph $SG(\text{Rels}, \{h_1, h_2, \ldots, h_n\})$ over the schema restricted to function names $\{h_1, h_2, \ldots, h_n\}$ (with $h_i = f_i$ or $h_i = g_i$) is a multi-tree; and, for every pair of function names $(f_i, g_i)$ and for every pair of variables X, Y occurring in a template $\tau_j \in P$, we have

Intuitively, we can think of $f_i$ as a bijective function, with $g_i$ its inverse. We denote the class of all sets of templates admitting multi-tree bijectivity by MTBTemp. The SmallBank benchmark given in Figure 1 is in MTBTemp, witnessed by the partitioning $\{(f_{A \to C}, f_{C \to A}), (f_{A \to S}, f_{S \to A})\}$. For example, the schema graph restricted to $f_{A \to C}$ and $f_{A \to S}$ is a tree and therefore also a multi-tree, as illustrated in Figure 7.
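The schema-level part of this definition (the pairing of function names and the requirement that every choice of one function per pair yields a multi-tree) can be tested by brute force. The Python sketch below is illustrative only: the function names and the dictionary encoding are assumptions made here, the SmallBank signatures are read off the function names (for instance, $f_{A \to C}$ is assumed to map Account to Checking), and the condition on the constraints of the templates themselves is not checked.

```python
from itertools import product


def is_multi_tree(rels, edges):
    """edges: list of (dom, range). At most one directed path between any two nodes."""
    adj = {r: [] for r in rels}
    indeg = {r: 0 for r in rels}
    for dom, rng in edges:
        adj[dom].append(rng)
        indeg[rng] += 1
    # Kahn's algorithm: the graph has a cycle iff some node is never removed.
    ready = [r for r in rels if indeg[r] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for nxt in adj[node]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    if removed != len(rels):
        return False

    def count_paths(src, dst):
        return sum((1 if nxt == dst else 0) + count_paths(nxt, dst) for nxt in adj[src])

    return all(count_paths(r, s) <= 1 for r in rels for s in rels)


def schema_admits_pairing(rels, funcs, pairs):
    """funcs: name -> (dom, range); pairs: candidate partitioning of the function names."""
    names = [name for pair in pairs for name in pair]
    if sorted(names) != sorted(funcs):  # the pairs must partition Funcs
        return False
    for f, g in pairs:
        if funcs[f][0] != funcs[g][1] or funcs[g][0] != funcs[f][1]:
            return False
    # Every choice h_i in {f_i, g_i} must give a multi-tree.
    return all(is_multi_tree(rels, [funcs[h] for h in choice]) for choice in product(*pairs))


rels = ["Account", "Savings", "Checking"]
funcs = {
    "f_AC": ("Account", "Checking"), "f_CA": ("Checking", "Account"),
    "f_AS": ("Account", "Savings"), "f_SA": ("Savings", "Account"),
}
pairs = [("f_AC", "f_CA"), ("f_AS", "f_SA")]
```

For these inputs `schema_admits_pairing(rels, funcs, pairs)` holds, since each of the four choices yields a graph with at most one directed path between any two relations. The check enumerates $2^n$ choices for n pairs, so it is meant for small schemas only.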
The next theorem allows disequalities whereas Theorem 4.1 does not require them.
The approach followed in the proof of Theorem 5.1 is to repeatedly pick a transaction template while maintaining an overall consistent variable mapping, in search of a counterexample multiversion split schedule that by Theorem 3.7 suffices to show that robustness does not hold. The main challenge is to show that a variable mapping consistent with all functional constraints can be maintained in logarithmic space and that all requirements for a multiversion split schedule can be verified in nlogspace.
Central to our approach is a generalization of conflicting operations. Let P be a set of transaction templates. For $\tau_i$ and $\tau_j$ in P, we say that an operation $o_i \in \tau_i$ is potentially conflicting with an operation $o_j \in \tau_j$ if $o_i$ and $o_j$ are operations over a variable of the same type, and at least one of the following holds:

Intuitively, potentially conflicting operations lead to conflicting operations when the variables of these operations are mapped to the same tuple by a variable assignment. In analogy to conflicting quadruples over a set of transactions as in Definition 3.6, we consider potentially conflicting quadruples $(\tau_i, o_i, p_j, \tau_j)$ over P with $\tau_i, \tau_j \in P$, and $o_i \in \tau_i$ an operation that is potentially conflicting with an operation $p_j \in \tau_j$. For a sequence of potentially conflicting quadruples $D = (\tau_1, o_1, p_2, \tau_2), \ldots, (\tau_m, o_m, p_1, \tau_1)$ over P, we write Trans(D) to denote the set $\{\tau_1, \ldots, \tau_m\}$ of transaction templates mentioned in D. For ease of exposition, we assume a variable renaming such that any pair of templates in Trans(D) uses a disjoint set of variables. The sequence D induces a sequence of conflicting quadruples $C = (T_1, b_1, a_2, T_2), \ldots, (T_m, b_m, a_1, T_1)$ by applying a variable assignment $\mu_i$ to each $\tau_i$ in Trans(D). We call such a set of variable assignments simply a variable mapping for D, denoted μ, and write μ(D) = C. For a variable X occurring in a template $\tau_i$, we write μ(X) as a shorthand notation for $\mu_i(X)$, with $\mu_i$ the variable assignment over $\tau_i$ in μ. This is well-defined as all templates in Trans(D) are variable-disjoint. Furthermore, μ(var($o_i$)) = μ(var($p_j$)) for each potentially conflicting quadruple $(\tau_i, o_i, p_j, \tau_j)$ in D, as otherwise the induced quadruple $(T_i, b_i, a_j, T_j)$ is not a valid conflicting quadruple in C. We say that a variable mapping μ is admissible for a database D if every variable assignment $\mu_i$ in μ is admissible for D.
A basic insight is that if there is a multiversion split schedule s for some C over a set of transactions T consistent with P and a database D, then there is a sequence of potentially conflicting quadruples D such that μ(D) = C for some μ. We will verify the existence of such a C, satisfying the properties of Definition 3.6, by nondeterministically constructing D on-the-fly together with a mapping μ. We show in Lemma 5.5 that when P ∈ MTBTemp, μ is a collection of disjoint type mappings (that map variables of the same type to the same tuple) such that variables that are "connected" in D (in a way that we will make precise next) are mapped using the same type mapping. Lemma 5.6 then shows that already a constant number of those type mappings suffice.
We introduce the necessary notions to capture when two variables are connected in D. We can think of equality constraints Y = f(X) in a template τ as constraints on the possible variable assignments µ for τ when a database D is given. Indeed, if we fix µ(X) to a tuple in D, then $\mu(Y) = f^{D}(\mu(X))$ is immediately implied. These constraints can cause a chain reaction of implications. If for example Z = g(Y) is a constraint in τ as well, then fixing µ(X) immediately implies $\mu(Z) = g^{D}(f^{D}(\mu(X)))$. We formalize this notion of implication next. We use sequences of function names $F = f_1 \cdots f_n$, denoting the empty sequence as ε and the concatenation of two sequences F and G by $F \cdot G$. For two variables X, Y occurring in a template τ and a (possibly empty) sequence of function names F, we say that ⇝ τ Z and $F = F' \cdot f$.

We next extend the notions of implication to sequences of potentially conflicting quadruples. Let $D = (\tau_1, o_1, p_2, \tau_2), \ldots, (\tau_m, o_m, p_1, \tau_1)$ be a sequence of potentially conflicting quadruples, and let X and Y be two variables occurring in templates $\tau_i$ and $\tau_j$ in Trans(D), respectively. Then X implies Y by a sequence of function names ⇝ τ i Y (implication within the same template);
• F = ε and $(\tau_i, o_i, p_j, \tau_j)$ or $(\tau_j, o_j, p_i, \tau_i)$ is a potentially conflicting quadruple in D with $o_i$ (respectively $p_i$) an operation over X and $p_j$ (respectively $o_j$) an operation over Y (implication between templates, notice that ⇝ τ Y or Y F ⇝ τ Z for some sequence F.

These definitions of connectedness can be trivially extended to operations over variables: two operations in D (respectively τ) are connected in D (respectively τ) if they are over variables that are connected in D (respectively τ). When F is not important we drop it from the notation. For instance, we denote by $X \rightsquigarrow_D Y$ that there is an F with $X \overset{F}{\rightsquigarrow}_D Y$.
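The chain reaction of implications described at the start of this discussion can be illustrated with a small fixpoint computation. The Python sketch below is a hypothetical illustration, not the procedure used in the proofs: constraints Y = f(X) are encoded as triples `(Y, f, X)`, the interpretation of each function name in the database is a dictionary, and all names and sample tuples are assumptions introduced here.

```python
def propagate(constraints, functions, fixed):
    """Extend a partial variable assignment with everything implied by constraints Y = f(X).

    constraints: list of (y, f, x) triples, read as Y = f(X)
    functions:   function name -> dict mapping tuples to tuples (interpretation in the database)
    fixed:       dict with the variables whose tuples are chosen up front
    """
    mu = dict(fixed)
    changed = True
    while changed:
        changed = False
        for y, f, x in constraints:
            if x not in mu:
                continue
            value = functions[f][mu[x]]
            if y not in mu:
                mu[y] = value  # mu(Y) = f(mu(X)) is implied
                changed = True
            elif mu[y] != value:
                raise ValueError(f"assignment violates {y} = {f}({x})")
    return mu


# Constraints Y = f(X) and Z = g(Y): fixing X determines Y and then Z.
constraints = [("Y", "f", "X"), ("Z", "g", "Y")]
functions = {"f": {"x0": "y0"}, "g": {"y0": "z0"}}
mu = propagate(constraints, functions, {"X": "x0"})
```

After the call, `mu` assigns `y0` to Y and `z0` to Z, matching $\mu(Z) = g^{D}(f^{D}(\mu(X)))$. An assignment that contradicts an implied value is rejected with an error, which corresponds to a variable assignment that is not admissible for the database.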
Lemma 5.2. Let D be a sequence of potentially conflicting quadruples over P ∈ MTBTemp.
for every variable mapping μ for D that is admissible for some database D.
Before proving the correctness of Lemma 5.2, we first present two additional lemmas that will be used in the correctness proof.
Lemma 5.3. Let (Rels, Funcs) be a schema for which a disjoint partitioning of Funcs in pairs $P = (f_1, g_1), (f_2, g_2), \ldots, (f_n, g_n)$ exists such that $\mathrm{dom}(f_i) = \mathrm{range}(g_i)$ and $\mathrm{dom}(g_i) = \mathrm{range}(f_i)$ for every $(f_i, g_i) \in P$ and every schema graph $SG(\text{Rels}, \{h_1, h_2, \ldots, h_n\})$ over the schema restricted to function names $\{h_1, h_2, \ldots, h_n\}$ with $h_i \in (f_i, g_i)$ is a multi-tree. Then: (1) there is no function name f ∈ Funcs with dom(f) = range(f); and (2) for every path in SG(Rels, Funcs) visiting a sequence of nodes

Proof. Towards a contradiction, assume (1) does not hold. That is, there is a function name $f_i$ with $\mathrm{dom}(f_i) = \mathrm{range}(f_i) = R$ for some type R. Let $g_i$ be the function name such that $(f_i, g_i)$ is a pair in P. By definition, $\mathrm{dom}(g_i) = \mathrm{range}(g_i) = R$. But then we cannot pick an $h_i \in (f_i, g_i)$ such that the resulting schema graph is a multi-tree. Indeed, in both cases, there is a self-loop on R, leading to the desired contradiction.
For (2), assume towards a contradiction that $R_2 \neq R_{m-1}$. Without loss of generality, we can assume that each node is visited only once in $R_2, \ldots, R_{m-1}$. Otherwise, $R_2, \ldots, R_{m-1}$ contains a loop that can be removed from this sequence without altering $R_2$ and $R_{m-1}$. Since $R_1, R_2, \ldots, R_{m-1}, R_m$ is a path in SG(Rels, Funcs), there is a sequence of function names $F = e_1 \cdots e_{m-1}$ such that each $e_i$ is an edge from $R_i$ to $R_{i+1}$ in SG(Rels, Funcs), implying $\mathrm{dom}(e_i) = R_i$ and $\mathrm{range}(e_i) = R_{i+1}$. By the assumption that each type $R_i$ occurs only once in $R_2, \ldots, R_{m-1}$ (notice that, for $i = 1$, this follows from Condition (2) of the lemma) and type $R_1 = R_m$ does not appear in $R_2, \ldots, R_{m-1}$, there is no pair of function names $e_i$ and $e_j$ in F with $i \neq j$, $\mathrm{dom}(e_i) = \mathrm{range}(e_j)$ and $\mathrm{range}(e_i) = \mathrm{dom}(e_j)$. Therefore, at most one function name of each pair in P appears in F. But then we can choose $h_i = e_i$ for each such pair in P, with $e_i$ the function name appearing in F. Since F describes a cycle in SG(Rels, Funcs), the resulting schema graph restricted to these $h_i$ cannot be a multi-tree, as it contains a cycle.
It remains to argue that if f is the edge from $R_1$ to $R_2$ and g is the edge from $R_{m-1}$ to $R_m$ on this path, then (f, g) is a pair in P. To this end, note that dom(f) = range(g) and range(f) = dom(g), as $R_1 = R_m$ and $R_2 = R_{m-1}$. If (f, g) is not a pair in P, then there are two pairs $(f, f')$ and $(g, g')$ in P with $\mathrm{dom}(f) = \mathrm{dom}(g') = \mathrm{range}(f') = \mathrm{range}(g) = R_1$ and $\mathrm{range}(f) = \mathrm{range}(g') = \mathrm{dom}(f') = \mathrm{dom}(g) = R_2$. Then we can choose f in $(f, f')$ and g in $(g, g')$. Since the resulting schema graph cannot be a multi-tree, as there is a cycle between $R_1$ and $R_2$, this choice leads to a contradiction.

Lemma 5.4. Let D be a sequence of potentially conflicting quadruples over P ∈ MTBTemp. Then (1) $X \rightsquigarrow_\tau Y$ iff $Y \rightsquigarrow_\tau X$ for every pair of variables X and Y occurring in a template τ; and (2) $X \rightsquigarrow_D Y$ iff $Y \rightsquigarrow_D X$ for every pair of variables X and Y occurring in D.

Proof. (1) We argue by induction on the definition of X

The other direction is analogous. The base case is immediate, as X = Y implies $Y \rightsquigarrow_\tau X$ by definition. For the inductive case, assume a variable Z such that Y = f(Z) is a constraint in τ and $X \rightsquigarrow_\tau Z$. By the induction hypothesis, $Z \overset{F}{\rightsquigarrow}_\tau X$ for some sequence of function names F. Since P ∈ MTBTemp, there is a constraint

(2) We argue by induction on the definition of

The other direction is again analogous. The first base case is now immediate, as we already argued that $X \rightsquigarrow_\tau Y$ implies $Y \rightsquigarrow_\tau X$. For the second base case, assume $X \overset{\varepsilon}{\rightsquigarrow}_D Y$ and $(\tau_i, o_i, p_j, \tau_j)$ is a potentially conflicting quadruple in D with var($o_i$) = X and var($p_j$) = Y (the case for $(\tau_j, o_j, p_i, \tau_i)$ is analogous). Then $Y \overset{\varepsilon}{\rightsquigarrow}_D X$ follows by definition. For the inductive case, let Z be a variable such that $X \overset{F_1}{\rightsquigarrow}_D Z$ and $Z \overset{F_2}{\rightsquigarrow}_D Y$. Then by induction hypothesis

We are now ready to prove the correctness of Lemma 5.2.
Proof of Lemma 5.2.Assuming X D Y, we first show by induction on the definition of connectedness that X ⇝ D Y.By Lemma 5.4, Y ⇝ D X then follows.For the base case, both X ⇝ D Y and Y ⇝ D X imply X ⇝ D Y, where the former is immediate and the latter is by Lemma 5.4.For the inductive case, let Z be a variable with X D Z and either ⇝ D Y for some sequence of function names F 2 is implied in both cases.By induction hypothesis, X F1 ⇝ D Z for some sequence of function names F 1 .As a result, Next, let X and Y be two variables occurring in Trans(D) with X D Y and type(X) = type(Y) and let μ be a variable mapping for D that is admissible for a database D. We prove that μ(X) = μ(Y).
We already argued that X D Y implies X ⇝ D Y.By definition of X ⇝ D Y, there is a sequence of variables X 1 , X 2 . . ., X n with X 1 = X and X n = Y such that for each pair of adjacent variables X i and X i+1 : ( †) X i and X i+1 both occur in the same template τ ∈ Trans(D) and X i+1 = f (X i ) ∈ Γ(τ ) for some function name f ; or ( ‡) type(X i ) = type(X i+1 ) and there is a potentially conflicting quadruple (τ j , o j , p k , τ k ) in D with either var(o j ) = X i and var(p k ) = X i+1 or var(p k ) = X i and var(o j ) = X i+1 .In the remainder of this proof, we show that for each pair of variables X i and X j in this sequence with type(X i ) = type(X j ) that μ(X i ) = μ(X j ).The desired μ(X) = μ(Y) then follows immediately as X = X 1 and Y = X n .Note that it suffices to show this property only for pairs of variables X i and X j for which no variable X k exists with i < k < j and type(X i ) = type(X j ) = type(X k ).Indeed, if such an X k exists, we can recursively argue that μ(X i ) = μ(X k ) and μ(X k ) = μ(X j ).The argument is by induction on the number of variables between X i and X j .
If j = i + 1 (base case), then (‡) applies to X i and X j . Indeed, if (†) applied instead, then there would be a function name f with dom(f ) = type(X i ) = type(X j ) = range(f ), contradicting Condition (1) of Lemma 5.3. By definition of μ, we have μ(X i ) = μ(X j ).
Next, let i + 1 < j (inductive case), and assume that μ(X k ) = μ(X ℓ ) for all X k and X ℓ with i < k ≤ ℓ < j and type(X k ) = type(X ℓ ) (induction hypothesis). From this sequence X i , . . . , X j , we derive a sequence of function names where each function name f i is based on an application of (†) on adjacent variables (notice that applications of (‡) do not result in a function name being added to F ). By assumption on the types of variables X k with i < k < j, we have in particular type(X i+1 ) ≠ type(X i ) and type(X j−1 ) ≠ type(X j ). This implies that (†) is applicable for X i and X i+1 (respectively X j−1 and X j ). Furthermore, X i and X i+1 appear in the same template, say τ i (respectively τ j for X j−1 and X j ), and

Since this path satisfies Condition 2 in Lemma 5.3, it follows that type(X i+1 ) = type(X j−1 ) and (f 1 , f m−1 ) is a pair in the pairwise partitioning of Funcs witnessing P ∈ MTBTemp. By definition of MTBTemp,

It follows from Lemma 5.2 that, if we group connected variables, then the same tuple is assigned to all variables of the same type in this group. We encode this choice of tuples for variables through (total) functions c : Rels → Tuples that we call type mappings and which map a relation onto a particular tuple of that relation's type. For instance, in SmallBank, a type mapping c is determined by an Account tuple a, a Savings tuple s, and a Checking tuple c. The following lemma makes explicit how μ can be decomposed into type mappings such that connected variables use the same type mapping and disequalities enforce the use of different type mappings.
Lemma 5.5. For a multiversion split schedule s based on a sequence of conflicting quadruples C over a set of transactions T consistent with a P ∈ MTBTemp and a database D, let μ be the variable mapping for a sequence of potentially conflicting quadruples D over P with μ(D) = C. Then, a set S of type mappings over disjoint ranges and a function φ S : Var → S exist with:

Proof. To aid the construction of S and φ S , we first define a coloring function λ that assigns a color to each tuple occurring in the schedule s such that the following holds for every pair of tuples t and v occurring in s:
• connected tuples are mapped to the same color: if μ(X) = t, μ(Y) = v and X D Y for some variables X, Y occurring in Trans(D), then λ(t) = λ(v); and
• different tuples of the same type are mapped to different colors: if type(t) = type(v) and t ≠ v, then λ(t) ≠ λ(v).
Note that we can always construct such a function λ since, by Lemma 5.2, it cannot be the case that type(t) = type(v), t ≠ v and there is a pair of variables X, Y with μ(X) = t, μ(Y) = v, and X D Y.
and τ m are restricted to satisfy respectively Condition (3) and (2) in Definition 3.6 in the resulting multiversion split schedule. We first present an additional lemma derived from Lemma 5.2 that will be used in the proof of Lemma 5.6.

Proof of Lemma 5.6. (if) Let D = (τ 1 , o 1 , p 2 , τ 2 ), . . ., (τ m , o m , p 1 , τ 1 ) be the sequence of potentially conflicting quadruples derived from E. Notice in particular that D is indeed a sequence of potentially conflicting quadruples by (1) and (6). We construct a variable mapping μ for D admissible for a database D such that the sequence of conflicting quadruples C = μ(D) satisfies the conditions in Definition 3.6, thereby proving that P is not robust against RC.
Let φ S : Var → S be the (partial) function assigning a type mapping in S to each variable occurring in an operation in E:

This function φ S is well defined: if there is a then o i τ i p i and hence c o i = c p i by (3). Recall that we assume that templates in E are variable-disjoint. We argue that φ S (X) = φ S (Y) if X D Y for each pair of variables X and Y for which φ S is defined. From Lemma 5.2, it follows that X ⇝ D Y whenever X D Y. Let τ i and τ j be the templates in which X and Y occur, respectively. The argument is now by induction on the definition of X ⇝ D Y: φ S (Y).

By Lemma 5.7, D is an equivalence relation. For X occurring in Trans(D), denote by [X] the equivalence class of X. Let S ′ be obtained by extending S with a type mapping c [X] for each equivalence class where no variable Y ∈ [X] is defined in φ S . Furthermore, each of the c [X] is picked such that all type mappings in S ′ have disjoint ranges.
Next, we extend φ S to a function φ S ′ : Var → S ′ assigning a type mapping to each variable X occurring in Trans(D) as follows: otherwise.
Notice, furthermore, that in the second case X might be connected in D to multiple variables for which φ S is defined, say Y 1 and Y 2 . Then, by Lemma 5.7, Y 1 D Y 2 and hence φ S (Y 1 ) = φ S (Y 2 ). We therefore conclude that φ S ′ (X) is well defined. We argue that φ

At this point, we verify that Conditions (7) and (8) are true, that Conditions (1-5) are true for all chosen transaction templates and operations, and that Condition (6) is true for τ 1 and τ 2 , and τ 2 and τ m . We reject the guessed quintuples if any of the conditions is false.
If all previous checks succeed, we proceed by inserting another step. Let i = 2. We guess a new quintuple E i+1 and verify that Condition (5) is true for τ i and τ i+1 and that Conditions (1-6) are true for τ i+1 , and reject the entire construction if one of these conditions fails. Notice that all conditions, including Condition (5), can be checked easily, particularly because quintuple E 1 is stored. To proceed, we discard quintuple E i and store E i+1 instead, so that the amount of space we use does not increase.
If τ i+1 and τ m (from quintuple E m ) satisfy Condition (6), the algorithm accepts. Indeed, the sequence E 1 , . . ., E i , E i+1 , E m of guessed quintuples then has all the properties of Lemma 5.6. Otherwise, the algorithm proceeds with another insertion step, for i = i + 1.

Robustness for Templates over Acyclic Schemas
We denote by AcycTemp the class of all sets of transaction templates over acyclic schemas. As a concrete example, the schema graph for the TPC-C benchmark is given in Figure 8. Since this schema graph does not contain any cycles, the TPC-C benchmark is situated within AcycTemp. Notice in particular how this acyclic schema graph corresponds to the hierarchical structure of many-to-one relationships inherent to the schema for this benchmark. For example, every orderline belongs to exactly one order, and every order is related to exactly one customer, but the opposite is never true (i.e., a customer can be related to multiple orders, each of which can be related to multiple orderlines). In general, the results presented in this section can be applied to all workloads over schemas with such a hierarchical structure.

Theorem 6.1. t-robustness(AcycTemp,RC) is decidable in EXPSPACE.
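Membership in AcycTemp depends only on the schema graph, so it can be tested mechanically. The following is a minimal sketch, assuming the schema is given as a map from function names to (domain, range) pairs; the relation and function names are hypothetical and are not taken from Figure 8.

```python
from collections import defaultdict, deque

def schema_graph(funcs):
    """Build SG from {function name: (dom, range)}: one edge dom -> range per function."""
    graph = defaultdict(list)
    for name, (dom, rng) in funcs.items():
        graph[dom].append((name, rng))
        graph[rng]  # touch the key so that sink relations become nodes too
    return dict(graph)

def is_acyclic(graph):
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    indegree = {node: 0 for node in graph}
    for edges in graph.values():
        for _, target in edges:
            indegree[target] += 1
    queue = deque(n for n, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for _, target in graph[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    return removed == len(graph)

# Hypothetical hierarchical schema (names are illustrative only).
hierarchy = {
    "order_of": ("OrderLine", "Order"),
    "customer_of": ("Order", "Customer"),
}
# Adding a function back from Customer to OrderLine closes a cycle.
cyclic = dict(hierarchy, last_line=("Customer", "OrderLine"))

print(is_acyclic(schema_graph(hierarchy)))  # True
print(is_acyclic(schema_graph(cyclic)))     # False
```

Parallel edges between the same two relations are kept as separate labeled edges, since they do not create cycles but do matter for the path counts used later in this section.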
We first provide some intuition for the proof. For a given acyclic schema graph SG, R F ⇝ SG S denotes a directed path from node R to node S in SG with F the sequence of edge labels on the path. The next lemma relates implication between variables to paths in SG.
Lemma 6.2.Let D be a sequence of potentially conflicting quadruples over a set of transaction templates P ∈ AcycTemp.For every pair of variables X, Y occurring in Trans(D), if X F ⇝ D Y, then type(X) F ⇝ SG type(Y), with SG the corresponding schema graph.
Proof. Let τ i and τ j be the templates in D in which X and Y occur, respectively. The proof is by induction on the definition of X F ⇝ D Y.

(Implication within the same template) If i = j and X F ⇝ τ i Y, then either F = ε and X = Y, or there is a variable Z such that Y = f (Z) is a constraint in Γ(τ i ), X F ′ ⇝ τ i Z and F = F ′ • f . In the former case, type(X) = type(Y), so type(X) ε ⇝ SG type(Y) is immediate. In the latter case, it follows by induction that type(X) F ′ ⇝ SG type(Z). Since dom(f ) = type(Z) and range(f ) = type(Y), it follows by definition that type(Z) f ⇝ SG type(Y) and furthermore type(X) F ⇝ SG type(Y) holds.

(Implication between templates) If F = ε and (τ i , o i , p j , τ j ) (respectively (τ j , o j , p i , τ i )) is a potentially conflicting quadruple in D, with var(o i ) = X and var(p j ) = Y (respectively var(p i ) = X and var(o j ) = Y), then type(X) = type(Y) by definition of potentially conflicting operations. So, type(X) ε ⇝ SG type(Y). Otherwise, there is a variable Z with X F1 ⇝ D Z, Z F2 ⇝ D Y and F = F1 • F2 , and type(X) F1 ⇝ SG type(Z) and type(Z) F2 ⇝ SG type(Y) follow by induction. We conclude that type(X) F ⇝ SG type(Y).
Notice that an assignment of a tuple to a variable X determines the tuples assigned to all variables Y with X F ⇝ D Y for some sequence of function names F . From Lemma 6.2 it follows that each such implied tuple is witnessed by a path in the corresponding schema graph SG. Therefore, the maximal number of different tuples implied by X corresponds to the number of paths in SG starting in type(X), which is finite when SG is acyclic. Because there can be multiple paths between nodes in the schema graph, it is no longer the case, as in the previous section, that variables of the same type connected in D must be assigned the same value. So, instead of using type mappings, we introduce tuple-contexts to represent the sets of all tuples implied by the assignment of a given variable. Formally, a tuple-context for a type R ∈ Rels is a function from paths with source R in SG(Rels, Funcs) to tuples in Tuples of the appropriate type. That is, for each tuple-context c for type R and for each path R F ⇝ SG S in SG, type(c(R F ⇝ SG S)) = S.

Similar to Lemma 5.5, we show that we can represent a counterexample schedule based on D by assigning a tuple-context to each variable in Trans(D), taking special care when assigning contexts to variables connected in D to make sure that they are properly related to each other. For this, we introduce a (partial) function φ A : Var → A mapping (a subset of) variables in Trans(D) to tuple-contexts in A (for A a set of tuple-contexts) and refer to it as a (partial) context assignment for D over A. In a sequence of lemmas, we show that φ A can always be expanded into a total function and that an approach based on enumeration of quintuples analogous to Lemma 5.6 suffices to decide robustness. A major difference with the previous section is that there is no longer a constant bound on the number of tuple-contexts that are needed, and consistency between tuple-contexts in connected variables needs to be maintained.
We call node S a descendant of node R, and R an ancestor of S, in an acyclic schema graph SG if there is a path from R to S in SG. We write R ε ⇝ SG S, with ε denoting the empty labeling, for the case R = S. This means that a node is a descendant and ancestor of itself. When F is not relevant, we simply write R ⇝ SG S.
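To make the domain of a tuple-context concrete, the sketch below enumerates all labeled paths that start in a given type of an acyclic schema graph, including the empty path ε. It is only an illustration under assumed names: the graph, the relation names and the function names (`order_of`, `payer_of`, `customer_of`) are hypothetical.

```python
def paths_from(graph, source):
    """Yield (labels, end) for every path starting in `source`.

    `graph` maps a relation to a list of (function name, target relation)
    edges. The empty tuple of labels stands for the empty path epsilon.
    Termination relies on the graph being acyclic.
    """
    yield (), source
    for label, target in graph.get(source, []):
        for labels, end in paths_from(graph, target):
            yield (label,) + labels, end

# Hypothetical schema graph with two distinct paths from OrderLine to Customer.
graph = {
    "OrderLine": [("order_of", "Order"), ("payer_of", "Customer")],
    "Order": [("customer_of", "Customer")],
    "Customer": [],
}

domain = list(paths_from(graph, "OrderLine"))
for labels, end in domain:
    print(labels or "epsilon", "->", end)
```

A tuple-context for OrderLine in this toy graph would assign one tuple to each of the four enumerated paths. The two paths ending in Customer may receive different tuples, which is why a single tuple per type no longer suffices.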
Let c R and c S be two tuple-contexts for types R and S, respectively, such that S is a descendant of R in SG, witnessed by the path R F ⇝ SG S in SG. We then say that c S is a tuple-subcontext of c R witnessed by ⇝ SG S and S F ′ ⇝ SG S ′ . For a given tuple-context c for a type R in the schema graph SG, we will often write c(F ) as a shorthand notation for c(R F ⇝ SG S).

Similar to Lemma 5.5 for sets of transaction templates admitting multi-tree bijectivity, Lemma 6.6 shows that we can represent a counterexample schedule based on a sequence of potentially conflicting quadruples D over an acyclic schema by assigning a tuple-context to each variable in Trans(D), taking special care when assigning contexts to variables connected in D to make sure that they are properly related to each other. For a set of tuple-contexts A, we refer to a (partial) function φ A : Var → A mapping (a subset of) variables in Trans(D) to tuple-contexts in A as a (partial) context assignment for D over A. We furthermore say that φ A is a total context assignment for D over A if φ A is defined for every variable in Trans(D).
Two variables X and Y occurring in Trans(D) are equivalent in D, denoted
• there exists a pair of variables Z and W and a sequence of function names
• there exists a pair of variables Z and W in τ and a sequence of function names F with

Intuitively, every variable mapping admissible for a given database will assign the same tuple to equivalent variables (see Lemma 6.4). Due to these equivalent variables, the assignment of a tuple to a variable X for a given database might imply the tuple assigned to a variable Y, even if X ⇝ D Y does not hold. We capture this observation by introducing variable determination, which is stronger than the previously defined variable implication. Formally, a variable X determines a variable Y in D witnessed by a sequence of function names F , denoted

For two variables X and Y in a template τ ∈ Trans(D) we furthermore say that X determines Y in τ witnessed by a sequence of function names F , denoted

For a multiversion split schedule s based on a sequence of conflicting quadruples C over a set of transactions T consistent with a set of transaction templates P and a database D, let μ be the variable mapping for a sequence of potentially conflicting quadruples D over P with μ(D) = C. Then, for every combination of variables W, X, Y, Z occurring in

there is a sequence of variables X 1 , X 2 , . . . , X n with X 1 = Z and X n = X such that for each pair of adjacent variables X i and X i+1 : (†) X i and X i+1 both occur in the same template τ ∈ Trans(D) and X i+1 = f (X i ) ∈ Γ(τ ) for some function name f ; or (‡) type(X i ) = type(X i+1 ) and there is a potentially conflicting quadruple (τ j , o j , p k , τ k ) in D with either var(o j ) = X i and var(p k ) = X i+1 or var(p k ) = X i and var(o j ) = X i+1 . Furthermore, the sequence F corresponds to the function names used in applications of (†). Analogously, with the same properties. Notice that the lengths of these two sequences of variables, namely n and m, are not necessarily equal to each other and to the length of F due to possible applications of (‡). For a variable X i in the sequence X 1 , X 2 , . . ., X n , we denote the sequence of function names derived from applications of (†) in the subsequence X i , . . ., X n by suffix F (X i ). Notice that suffix F (X i ) is indeed always a suffix of F , and that suffix

We argue by induction that for every ). Then, we distinguish the following cases:

This means that (‡) applies to X i and X i+1 , and there is a potentially conflicting quadruple (τ k , o k , p ℓ , τ ℓ ) in D with either var(o k ) = X i and var(p ℓ ) = X i+1 or var(p ℓ ) = X i and var(o k ) = X i+1 . By definition of μ, we have μ(X i+1 ) = μ(X i ) and by induction that μ(

Then, (†) applies to both X i and X i+1 , and Y j and

The next lemma shows that every variable mapping admissible for a given database will assign the same tuple to equivalent variables.

Lemma 6.4. For a multiversion split schedule s based on a sequence of conflicting quadruples C over a set of transactions T consistent with a set of transaction templates P and a database D, let μ be the variable mapping for a sequence of potentially conflicting quadruples D over P with μ(D) = C. Then, for every pair of variables X and Y occurring in templates τ i and τ j in Trans(D) respectively, if X ≡ D Y, then μ(X) = μ(Y).
Proof. The proof is by induction on the definition of X ≡ D Y. (base case) If X = Y, then the result is immediate. (inductive cases) If there are two variables Z and W and a sequence of function names F such that Z ≡ D W, Z F ⇝ D X and W F ⇝ D Y, then by induction we have μ(Z) = μ(W). The proof that μ(X) = μ(Y) is now immediate by application of Lemma 6.3. If instead there is a variable Z with X ≡ D Z and Y ≡ D Z, then we can argue by induction that μ(X) = μ(Z) and μ(Y) = μ(Z), and hence μ(X) = μ(Y).

Definition 6.5. Let D be a sequence of potentially conflicting quadruples, A a set of tuple-contexts and φ A a partial context assignment for D over A. We say that φ A respects the constraints of D if, for every two (not necessarily different) variables X and Y occurring in D that φ A is defined for, the following conditions are true, where c X = φ A (X) and c Y = φ A (Y):
(1) c X is a tuple-context for type(X);
(2) for every
(6) for every pair of tuple-subcontexts c ′ X and c ′ Y of c X and c Y witnessed by respectively

The next lemma shows that we can represent a counterexample schedule based on a sequence of potentially conflicting quadruples D over an acyclic schema by assigning a tuple-context to each variable in Trans(D).

Lemma 6.6. For a multiversion split schedule s based on a sequence of conflicting quadruples C over a set of transactions T consistent with a set of transaction templates P ∈ AcycTemp and a database D, let μ be the variable mapping for a sequence of potentially conflicting quadruples D over P with μ(D) = C. Then a set A of tuple-contexts and a total context assignment φ A for D over A exist with:
• φ A respects the constraints of D; and
• μ(X) = c X (ε) for every variable X, with c X = φ A (X).
Proof. We first assign a tuple-context to each tuple in database D, based on the functions in D. Let (Rels, Funcs) be the schema over which P is defined. Since the schema graph SG(Rels, Funcs) is acyclic, a total order < SG over Rels exists such that there is no path from type R to type S in SG if R < SG S. We now assign tuple-contexts to tuples based on the order implied by < SG . That is, we first consider all tuples of the type that is ordered first by < SG , then all tuples of the type that is ordered second, etc. If there are multiple tuples of the same type, the relative order in which we handle them is not important. For each tuple t, we construct a tuple-context c t with c t (ε) = t, and for each path F = f • F ′ in SG starting in type(t), set c t (F ) = c v (F ′ ), with v = f D (t). Notice that c v is already defined for v, as there is a path from type(t) to type(v) in SG and, hence, type(v) < SG type(t). By construction, c v is a tuple-subcontext of c t witnessed by f . Next, we construct φ A as follows: φ A (X) = c t with μ(X) = t for every variable X occurring in Trans(D). We argue by induction on the definition of F ⇒ D that

It remains to verify that φ A indeed satisfies all required properties. By construction, μ(X) = c t (ε) with c t = φ A (X), so we only need to show that φ A respects the constraints of D by verifying all properties in Definition 6.5. To this end, let X and Y be two variables occurring in Trans(D), and let c t = φ A (X) and ⇒ D Y, then by construction of the tuple-contexts c t and c v it follows that c v is a tuple-subcontext of c t witnessed by F .

(6) Let c t ′ and c v ′ be tuple-subcontexts of c t and c v witnessed by respectively F X and F Y . If c t (F X ) = c v (F Y ) = q for some tuple q, then by construction of c t and c v we have c t ′ = c v ′ = c q , with c q the tuple-context assigned to this tuple q.
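The construction of the tuple-contexts c t in this proof can be sketched as follows. This is an illustration only, under several assumptions: tuples are represented by strings, the database functions f D are given as dictionaries, all relation, function and tuple names are hypothetical, and memoized recursion replaces the explicit processing order < SG (both visit a tuple's function images before finishing the tuple itself).

```python
def build_context(t, type_of, graph, db_funcs, cache):
    """Return c_t as a dict from label paths (tuples of function names) to tuples.

    c_t(epsilon) = t and c_t(f . F') = c_v(F') with v = f_D(t), as in the proof.
    """
    if t in cache:
        return cache[t]
    ctx = {(): t}
    for fname, _target in graph.get(type_of[t], []):
        v = db_funcs[fname][t]              # v = f_D(t)
        sub = build_context(v, type_of, graph, db_funcs, cache)
        for path, w in sub.items():         # copy c_v below the edge f
            ctx[(fname,) + path] = w
    cache[t] = ctx
    return ctx

def is_subcontext(sub, ctx, witness):
    """Check that `sub` is the tuple-subcontext of `ctx` witnessed by `witness`."""
    return all(ctx.get(witness + path) == w for path, w in sub.items())

# Hypothetical acyclic schema graph and database instance.
graph = {
    "OrderLine": [("order_of", "Order"), ("payer_of", "Customer")],
    "Order": [("customer_of", "Customer")],
    "Customer": [],
}
type_of = {"ol1": "OrderLine", "o1": "Order", "c1": "Customer", "c2": "Customer"}
db_funcs = {
    "order_of": {"ol1": "o1"},
    "payer_of": {"ol1": "c2"},
    "customer_of": {"o1": "c1"},
}

cache = {}
c_ol1 = build_context("ol1", type_of, graph, db_funcs, cache)
print(c_ol1)
print(is_subcontext(cache["o1"], c_ol1, ("order_of",)))
```

In this toy instance the two paths from OrderLine to Customer lead to different tuples (`c1` and `c2`), and the context built for `o1` reappears inside the context of `ol1` below the edge `order_of`, matching the tuple-subcontext relation used in the proof.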
From D = (τ 1 , o 1 , p 2 , τ 2 ), . . ., (τ m , o m , p 1 , τ 1 ) and φ A as in Lemma 6.6 we can derive a sequence of quintuples E = (τ 1 , o 1 , c o 1 , p 1 , c p 1 ), . . ., (τ m , o m , c om , p m , c pm ) such that c o i = φ A (X i ) (respectively c p i = φ A (Y i )) for i ∈ [1, m] with o i (respectively p i ) an operation over variable X i (respectively Y i ). Intuitively, this sequence of quintuples can be used to reconstruct the original multiversion split schedule s. To this end, notice that we can derive the original sequence of potentially conflicting quadruples D and a partial context assignment φ ′ A from E that is defined for each variable X i occurring in either an operation o i or p i in τ i . We first show that we can extend this partial context assignment φ ′ A to a total context assignment respecting the constraints in D (Lemma 6.7), and then prove that such a total context assignment respecting the constraints in D implies a variable assignment μ such that C = μ(D) is a valid sequence of conflicting quadruples (Lemma 6.8).

Lemma 6.7. Let D = (τ 1 , o 1 , p 2 , τ 2 ), . . ., (τ m , o m , p 1 , τ 1 ) be a sequence of potentially conflicting quadruples over a set of transaction templates P ∈ AcycTemp and φ A a partial context assignment defined for every variable X i of o i and Y i of p i in every τ i . If
• φ A respects the constraints of D; and
• for every pair of variables X and Y in a template τ i with X ≡ τ i Y, there is no constraint X ≠ Y in τ i ;
then we can extend φ A to a total context assignment φ ′ A for D respecting the constraints of D.

⇒ D Y to denote that X F ⇒ D Y ′ for every Y ′ ∈ [Y] and X ′ F ⇒ D Y for every X ′ ∈ [X], respectively. Let (Rels, Funcs) be the schema over which P is defined. Since the schema graph SG(Rels, Funcs) is acyclic, a total order < SG over Rels exists such that there is no path from type R to type S in SG if R < SG S.
We now define φ ′ A for variables in Trans(D) according to the order implied by < SG .If there are multiple variables of the same type, the relative order in which we handle them is not important.
The proof is as follows. Assume φ A respects the constraints of D and is at least defined for every variable X i of o i and Y i of p i in every τ i . We extend φ A towards φ ′ A by defining φ ′ A for the whole equivalence class [X] of the first (according to < SG ) variable X for which φ A is not defined. The precise construction is by case. In the first case, the tuple-context that should be assigned to variables in [X] is already implied, as it is the tuple-subcontext of an existing tuple-context. In the second case, we construct a fresh tuple-context, including existing tuple-contexts as tuple-subcontexts where we need to make sure that φ ′ A respects the constraints in D. In each case, we then argue that φ ′ A still respects the constraints in D. By repeating this argument, we can extend the context assignment to a total context assignment defined for all variables occurring in Trans(D).
(Case 1) If a variable Y exists with φ A defined for Y and Y F ⇒ D [X], then φ ′ A (X ′ ) = c X ′ for every variable X ′ ∈ [X], with c X ′ the tuple-subcontext of c Y = φ A (Y) witnessed by F . Notice that this is well defined, even if there are multiple such Y, as they all agree on c X ′ by Definition 6.5 (2, 6). Also note that the special case where φ A is already defined for at least one variable X ′ ∈ [X] is covered by this case as well, as

In this special case, the tuple-subcontext of φ A (X ′ ) witnessed by ε (i.e., φ A (X ′ ) itself) will be assigned to each variable in [X]. We show that φ ′ A indeed respects the constraints in D according to the properties stated in Definition 6.5. To this end, let X ′ and Y ′ be two variables, with c then we can apply this substitution and use the fact that φ A respects the constraints in τ to conclude that the desired properties hold for φ

Since φ A respects the constraints in D, we apply Definition 6.5 (2, 5, 6) to conclude that c Y ′ is a tuple-subcontext of c X ′ witnessed by and let c X ′′ and c Y ′′ be the tuple-subcontexts of respectively c X ′ witnessed by F X ′ and c Y ′ witnessed by F Y ′ . We argue that c

(Case 2) Otherwise, we construct a fresh tuple-context c X and define φ ′ A (X ′ ) = c X for every variable X ′ ∈ [X]. This tuple-context c X is constructed as follows: c X (ε) = t X , with t X a fresh tuple of the appropriate type. For every path

In other words, c Y is the tuple-subcontext of c X witnessed by f . Note that due to the order < SG , φ A (Y) has to be defined already. Also note that this is well defined, even if multiple such Y exist. In that case, all these Y are equivalent to each other by definition of ≡ D , and by construction of φ A they are assigned the same tuple-context. If instead no such variable Y exists, we define c X (F ) = t F , with t F a fresh tuple of the appropriate type.
We show that φ ′ A indeed respects the constraints in D according to the properties stated in Definition 6.5. To this end, let X ′ and Y ′ be two variables occurring in Trans(D), with ⇒ D Z and Y ′ F2 ⇒ D Z for some variable Z. We argue that there exists a pair of variables X ′′ and Y ′′ and two sequences of function names

In the former case there is a variable , where c X ′′ = φ A (X ′′ ). In the latter case, Z ∈ [X], and we simply take follows by the fact that φ A respects the constraints of D. In the latter case, both X ′′ ∈ [X] and Y ′′ ∈ [X], as otherwise (Case 1) would apply to [X] instead. Then, c

(3, 4) The reasoning is analogous to the previous property. Note in particular that by construction of the new c X we have c X ′ (F 1 ) = c Y ′ (F 2 ) if W ≡ D Z. Since W ≡ D Z implies that there is no constraint W ≠ Z by the assumptions on φ A and on the disequality constraints in each template τ ∈ Trans(D), this does not lead to contradictions.

(5) only if X ′ ∈ [X], as otherwise (Case 1) would apply to [X] instead. We argue by case that c Y ′ is a tuple-subcontext of c X ′ witnessed by F ′ . If X ′ ∉ [X] and Y ′ ∉ [X], the result is immediate by the fact that φ A respects the constraints of D. If X ′ ∈ [X] and Y ′ ∉ [X], then c X ′ = c X and a variable Z exists such that and, by construction of c X , c Y ′ is a tuple-subcontext of φ A (Z) witnessed by F ′′ . It now follows that c Y ′ is a tuple-subcontext of c X witnessed by F ′ . Lastly, if both X ′ ∈ [X] and Y ′ ∈ [X], then F ′ = ε, as otherwise the schema graph is not acyclic. The result is immediate, as c Y ′ = c X ′ = c X is by definition a tuple-subcontext of itself witnessed by ε.

(6) Assume c X ′′ (F 1 ) = c Y ′′ (F 1 ) for some pair of tuple-contexts c X ′′ and c Y ′′ that are tuple-subcontexts of respectively c X ′ witnessed by F 1 and c Y ′ witnessed by F 2 . We argue that c X ′′ = c Y ′′ . If both c X ′ and c Y ′ are different from c X , the result is immediate as φ A respects the constraints of D.
Otherwise, since the construction of c X either copies existing tuple-contexts as tuple-subcontexts or introduces fresh tuples, the result also holds if c X ′ and/or c Y ′ are equal to c X .
Given a total context assignment respecting the constraints in D as in Lemma 6.7, we show in the next lemma that such a total context assignment implies a variable assignment μ such that C = μ(D) is a valid sequence of conflicting quadruples.

Lemma 6.8. Let D = (τ 1 , o 1 , p 2 , τ 2 ), . . ., (τ m , o m , p 1 , τ 1 ) be a sequence of potentially conflicting quadruples over a set of transaction templates and φ A a total context assignment for D respecting the constraints of D. The variable mapping μ obtained by defining μ(X) = c X (ε) for every variable X in Trans(D) with c X = φ A (X) then is a valid variable mapping admissible for some database D.
Proof. We first argue that μ is valid by showing for each conflicting quadruple (τ i , o i , p j , τ j ) in D that μ(X) = μ(Y) with X = var(o i ) and Y = var(p j ). By definition, X ε ⇝ D Y, and hence

Next, we construct a database D and show that μ is admissible for D. To this end, we add the tuple μ(X) to D for each variable X occurring in Trans(D). For each functional constraint Y = f (X) in a transaction template in Trans(D), we define f D (μ(X)) = μ(Y) for the corresponding function f D in D. Note that this is well defined. Towards a contradiction, assume that we have both μ

To conclude the proof, we show that μ is indeed admissible for D. By construction of D based on μ, μ(Y) = f D (μ(X)) is immediate for each constraint Y = f (X) in a template τ ∈ Trans(D). We still need to argue that μ(X) ≠ μ(Y) for each constraint X ≠ Y in a template τ ∈ Trans(D). Let c X = φ A (X) and c Y = φ A (Y). By construction of μ we have μ(X) = c X (ε) and μ(Y) = c Y (ε). Note that X ε ⇒ D X and Y ε ⇒ D Y. Therefore, we can apply Definition 6.5 (3) to conclude that c X (ε) ≠ c Y (ε), and hence μ(X) ≠ μ(Y).
In order to decide robustness against RC, one can now construct a sequence of quintuples E and derive the sequence of potentially conflicting quadruples D and partial context assignment φ A from it. If φ A respects the constraints in D, then it follows from Lemma 6.7 and Lemma 6.8 that we can construct a variable assignment μ such that C = μ(D) is a valid sequence of conflicting quadruples. However, in this construction of E, care should be taken to guarantee that φ A indeed respects the constraints in D, and that the resulting multiversion split schedule based on C indeed satisfies all properties in Definition 3.6.
In the algorithm that we are about to propose, we search for such a sequence of quintuples E, but without fixing all the tuples in each context. For this, we generalize our definition of tuple-contexts to allow variables: a context for a type R is a function from paths with source R in SG(Rels, Funcs) to variables in Var and tuples in Tuples of the appropriate type. The purpose of variables is to encode equalities and disequalities within each context, without being explicit about the precise tuples. That is, if two paths ending in the same node in SG are mapped on the same variable, then they represent the same tuple; if they are mapped on different variables, then they represent different tuples. We remark that the same variable occurring in different contexts can still represent different tuples. Analogous to tuple-subcontexts, for two types R and S with R F ⇝ SG S, we say that a context c S for type S is a subcontext of a context c R for type R witnessed by F if:

We call a context a variable-context if all paths are mapped on variables.
For a transaction template τ , tuple-context c p for p and c o for o in τ , we consider the set Contexts(SG, τ, p, c p , o, c o ) of all different (not necessarily tuple-) contexts c (up to isomorphisms over the variables in c) that can be obtained, starting from a variable-context c ′ , by performing substitutions of subcontexts of c ′ with subcontexts of c p and/or c o . More formally, these substitutions are of the form: For a path R c F ⇝ SG S (here R c is the type that c is for) and R p The substitution rule can be applied for c p as well as for c o .)

Lemma 6.9. Let P be a set of transaction templates over an acyclic schema. Then, P is not robust against RC if, and only if, there is a sequence of quintuples with q i and r i two (not necessarily different) operations in {o i , p i },
(1) if i = 1, then c o 1 and c p 1 are tuple-contexts for type(var(o 1 )) and type(var(p 1 )). Furthermore, for every pair of tuple-subcontexts c ′ o 1 and c ′ p 1 of c o 1 and c p 1 witnessed by respectively F and F ) for every var(q i ) F1 ⇒ τ i Z i and var(r i ) F2 ⇒ τ i Z i , the subcontext of c q i witnessed by F 1 is equal (up to isomorphisms over variables) to the subcontext of c r i witnessed by F 2 .
(5) for every var(q i ) F1 ⇒ τ i W i and var(r i ) F2 ⇒ τ i Z i with W i ̸ = Z i a constraint in τ i , c q i (F 1 ) ̸ = c r i (F 2 ) or c q i (F 1 ) and c r i (F 2 ) are both variables; • each pair of (not necessarily different) variables X k i , Y k i occur in the same template τ k i in Trans(D) and • in the implied sequence of templates τ k 1 , . . ., τ km , these τ k i , . . ., τ k i+1 are neighbouring in E (where we assume that τ 1 is neighbouring to τ n in E); and • for each pair of variables Y k i , X k i+1 , there is a sequence of function names a sequence of neighbouring templates, where equivalence between each Y k i and X k i+1 is implied by the variables in the potentially conflicting operations o k i and p k i+1 .For ease of exposition, we implicitly assumed that τ k 1 , . . ., τ km agrees with the order in E. If the order is opposite to the order in E instead, the above still holds, but the occurrences of o k i and p k i+1 should be replaced with p k i and o k i+1 .
We argue by construction of φ A that for every pair of variables X and Y for which φ A is defined with c where c ′ X and c ′ Y are the tuple-subcontexts of c X witnessed by F and c Y witnessed by F ′ , respectively (‡). If X = var(o 1 ) and Y = var(p 1 ) (or the other way around), the result is immediate by Lemma 6.9 (1). Otherwise, let X be the variable in a template τ i and Y the variable in a template τ j such that j ≤ i (i.e., τ i does not occur before τ j in E). W.l.o.g., we assume that X = var(o i ) with i ≠ 1 (the case where , then the whole tuple-subcontext of c ′ o i witnessed by F i is copied over from the tuple-subcontext of c ′ p i witnessed by F ′ i . Indeed, λ i introduces fresh tuples whenever the tuple for c ′ o i (F i ) is not implied by c ′ p i . The desired properties now follow from (†) and (‡) as well as the conditions in Lemma 6.9. In particular, φ A respecting the constraints of D can now be derived from Conditions (1, 2, 4-7) in Lemma 6.9, and the last condition of Lemma 6.10 follows from Condition (8) in Lemma 6.9.
Proof of Theorem 6.1. A nexpspace algorithm proving the correctness of Theorem 6.1 is now immediate by Lemma 6.9, as we can iteratively guess and verify quintuples in E while only keeping track of the very first quintuple and the previous quintuple. Since in an acyclic schema graph the number of paths starting in a given type is at most exponential in the total number of types, each context is defined over at most an exponential number of paths. However, to formally argue that these contexts can be encoded in exponential space, we still need to show that each tuple or variable used in a context can be encoded in at most exponential space. Since the only tuples used are those mentioned in c o 1 and c p 1 , and since we can reuse the same variables over all contexts, both the maximal number of tuples and the maximal number of variables needed are exponential in the total number of types.

6.1. Lowering complexity.
Next, we consider restrictions that lower the complexity. To this end, we say that two variables X and Y occurring in a transaction template τ are equivalent in τ , denoted X ≡ τ Y, if
• there exists a pair of variables Z and W in τ and a sequence of function names F with Z ≡ τ W, Z F ⇝ τ X and W F ⇝ τ Y; or
• there exists a variable Z with X ≡ τ Z and Y ≡ τ Z.
Then, a transaction template τ is restricted if for every combination of variables X, Y, W, Z in τ with X ⇝ τ W and Y ⇝ τ Z, either W ≡ τ Z, W ⇝ τ Z or Z ⇝ τ W. We denote by AcycResTemp the class of all sets of restricted transaction templates over acyclic schemas.
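The restrictedness condition can be checked mechanically once the relations ⇝ τ and ≡ τ of a template are known. The following Python sketch is only an illustration of the definition: the function name `is_restricted` and the encoding of both relations as sets of variable pairs are assumptions made for this example, and equivalence is assumed to be reflexive and symmetric.

```python
from itertools import product


def is_restricted(reach, equiv):
    """Check the restrictedness condition for a single template.

    reach: set of pairs (X, W), meaning X ~> W in the template
           (hypothetical encoding, assumed to be precomputed).
    equiv: set of pairs (W, Z), meaning W is equivalent to Z
           (assumed precomputed; reflexivity and symmetry are added here).
    """
    def equivalent(w, z):
        return w == z or (w, z) in equiv or (z, w) in equiv

    # Every variable that is the target of some X ~> W.
    targets = {w for (_, w) in reach}
    for w, z in product(targets, repeat=2):
        if not (equivalent(w, z) or (w, z) in reach or (z, w) in reach):
            return False
    return True
```

For instance, a template in which X reaches two unrelated variables Y and Z is not restricted under this check, whereas it is restricted as soon as Y reaches Z or Y and Z are equivalent.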
Theorem 6.11. (1) t-robustness(AcycResTemp,RC) is decidable in exptime.
(2) t-robustness(AcycTemp,RC) is decidable in pspace when the number of paths between any two nodes in the schema graph is bounded by a constant k.
Regarding (1), all templates in TPC-C with the exception of NewOrder are restricted. Regarding (2), when the schema graph is a multi-tree then k = 1 and for TPC-C k = 2 (recall that in general there can be an exponential number of paths), leading to a more practical algorithm for robustness in those cases. The pspace result in Theorem 6.11 for workloads over a schema (Rels, Funcs) where the number of paths between any two nodes in the schema graph is bounded by a constant k is immediate by the nondeterministic algorithm based on Lemma 6.9 presented for Theorem 6.1. Indeed, in this case, the total number of paths starting in a given type is at most k · |Rels| and therefore each context is defined over at most a polynomial number of paths, instead of an exponential number of paths for the general case in Theorem 6.1.
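To make the role of the constant k concrete, the sketch below counts paths in an acyclic schema graph and returns the largest number of paths between any two nodes. It is an illustration under assumptions: the function name `max_paths_between_nodes` and the encoding of the schema graph as (source, function name, target) triples are hypothetical, and the example graphs are toy inputs rather than the actual benchmark schemas.

```python
from collections import defaultdict
from functools import lru_cache


def max_paths_between_nodes(edges):
    """Largest number of distinct paths between any two nodes.

    edges: list of (source, function_name, target) triples describing an
           acyclic schema graph (hypothetical encoding). Parallel edges
           with different function names count as different paths.
    """
    successors = defaultdict(list)
    nodes = set()
    for source, _function_name, target in edges:
        successors[source].append(target)
        nodes.update((source, target))

    @lru_cache(maxsize=None)
    def count_paths(start, end):
        if start == end:
            return 1  # only the empty path, since the graph is acyclic
        return sum(count_paths(nxt, end) for nxt in successors[start])

    return max((count_paths(u, v) for u in nodes for v in nodes), default=0)
```

On a multi-tree such as `[("Account", "f_A_S", "Savings"), ("Account", "f_A_C", "Checking")]` the function returns 1, while a diamond-shaped graph with two routes from one type to another returns 2.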
The exptime result in Theorem 6.11 for workloads in AcycResTemp follows from a deterministic algorithm based on Lemma 6.9. In the remainder of this section, we first present the algorithm, and then discuss its complexity.
A deterministic algorithm. Towards a deterministic algorithm, assume the first quintuple (τ 1 , o 1 , c o 1 , p 1 , c p 1 ) of E is fixed. We now translate the problem of deciding whether we can extend E such that it satisfies all properties to a graph problem over a graph G(τ 1 , o 1 , c o 1 , p 1 , c p 1 ). This graph is constructed as follows:
• each quintuple (τ i , o i , c o i , p i , c p i ) satisfying Conditions (2-8) of Lemma 6.9 is added as a node to G(τ 1 , o 1 , c o 1 , p 1 , c p 1 ); and
• there is an edge from a node (τ i , o i , c o i , p i , c p i ) to a node (τ j , o j , c o j , p j , c p j ) if o i is potentially conflicting with p j and c o i = c p j (cf. Condition (9) of Lemma 6.9).
By construction, it is now easy to see that there is a sequence E satisfying Lemma 6.9 if there is a path from a quintuple (τ 2 , o 2 , c o 2 , p 2 , c p 2 ) to a quintuple (τ m , o m , c o m , p m , c p m ) in G(τ 1 , o 1 , c o 1 , p 1 , c p 1 ) (where we allow a zero-length path with 2 = m), such that (†)
• c o 1 = c p 2 and c o m = c p 1 (cf. Condition (9) of Lemma 6.9);
• o 1 is potentially rw-conflicting with p 2 (cf. Condition (10) of Lemma 6.9); and
• o 1 < τ 1 p 1 or o m is potentially rw-conflicting with p 1 (cf. Condition (11) of Lemma 6.9).
Given a set of transaction templates P over a schema (Rels, Funcs), the algorithm iterates over all possible quintuples (τ 1 , o 1 , c o 1 , p 1 , c p 1 ) satisfying Conditions (1, 3-7) of Lemma 6.9, where we consider all possible tuple-contexts c o 1 and c p 1 up to isomorphism. For each such quintuple, the graph G(τ 1 , o 1 , c o 1 , p 1 , c p 1 ) is constructed. Let T C be the reflexive-transitive closure of G. If there is a pair of quintuples (τ 2 , o 2 , c o 2 , p 2 , c p 2 ) and (τ m , o m , c o m , p m , c p m ) in T C satisfying (†), the algorithm rejects, indicating that P is not robust against RC. Otherwise, it proceeds with a new choice for (τ 1 , o 1 , c o 1 , p 1 , c p 1 ). If the algorithm has not rejected after considering all such quintuples, it accepts, indicating that P is indeed robust against RC. The correctness of this algorithm is immediate by Lemma 6.9.
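The control flow of this algorithm can be sketched in Python as follows. This is a schematic illustration rather than an implementation of Lemma 6.9: the names `is_robust`, `nodes_for`, `has_edge` and `closes_cycle` are hypothetical, and the checks of the individual conditions are assumed to be supplied by the caller as callbacks. Quintuples are treated as opaque hashable values.

```python
def reflexive_transitive_closure(nodes, has_edge):
    """Map every node to the set of nodes reachable from it (itself included)."""
    closure = {n: {n} for n in nodes}
    for n in nodes:
        stack = [n]
        while stack:
            current = stack.pop()
            for succ in nodes:
                if succ not in closure[n] and has_edge(current, succ):
                    closure[n].add(succ)
                    stack.append(succ)
    return closure


def is_robust(first_quintuples, nodes_for, has_edge, closes_cycle):
    """Schematic version of the deterministic procedure.

    first_quintuples: candidate first quintuples (assumed to already satisfy
                      the conditions required of the first quintuple).
    nodes_for(first): quintuples forming the nodes of the graph for `first`.
    has_edge(a, b):   edge test between two node quintuples.
    closes_cycle(first, second, last): the three closing conditions.
    All three callbacks are assumptions supplied by the caller.
    """
    for first in first_quintuples:
        nodes = list(nodes_for(first))
        closure = reflexive_transitive_closure(nodes, has_edge)
        for second in nodes:
            for last in closure[second]:  # includes the zero-length path
                if closes_cycle(first, second, last):
                    return False  # reject: not robust
    return True  # accept: robust
```

Computing the closure once per first quintuple keeps the search for a suitable pair of nodes a simple double loop, at the cost of storing the reachability sets.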
Complexity analysis. We now analyze the complexity of the presented algorithm. For this, first notice that we have defined contexts c based on a type S (with S not necessarily a root of SG). For encoding purposes it makes sense to encode these as contexts for a root type R in combination with the intended type S. The context as defined in the previous section can then be derived by taking the left-most subtree with root S. Notice that this is purely an encoding choice that will simplify the analysis.
For a schema graph SG(Rels, Funcs) the total number of non-isomorphic tuple-contexts can be expressed using Bell numbers, where TupleContexts(SG) denotes the set containing all different tuple-contexts (up to isomorphism). Now let c 1 and c 2 be two fixed contexts for types that are descendants of roots R 1 and R 2 , respectively, in SG, and let c be a context for a type descending from root R. To express a bound on the number of substitutions in c from (parts of) c 1 and c 2 , we need some additional terminology. Let Paths SG (R, ∗) = ⋃ S∈Rels Paths SG (R, S). We say that a path R F1 ⇝ SG S is a prefix of a path R F ⇝ SG S ′ in SG if there is a (possibly empty) sequence of function names F 2 with F = F 1 · F 2 . The number of substitutions in c from (parts of) c 1 and c 2 is now bounded by
In the above expression, 1 PFP is an indicator variable that equals 1 if no path in Part is a prefix of another path in Part and that equals 0 otherwise. Further, P denotes the maximum number of different paths between a particular root and a particular node in SG, T denotes the maximum number of different paths from a particular root to nodes in SG, and ℓ denotes the maximal size of a set in which no path is a prefix of another path in the set. The latter is trivially bounded by T .
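Two ingredients of this analysis are easy to experiment with: Bell numbers, which count the partitions of a finite set, and the prefix-freeness test behind the indicator variable. The Python sketch below illustrates both. It does not reproduce the bound itself, and the names `bell_numbers` and `is_prefix_free` as well as the encoding of a path as a tuple of function names are assumptions made for this example.

```python
def bell_numbers(n):
    """Return [B_0, ..., B_n], computed with the Bell triangle."""
    row = [1]
    bells = [1]
    for _ in range(n):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


def is_prefix_free(paths):
    """True if no path is a prefix of another, distinct path in the set.

    paths: iterable of tuples of function names, all starting in the
           same root (hypothetical encoding).
    """
    distinct = list(set(paths))
    for p in distinct:
        for q in distinct:
            if p != q and q[:len(p)] == p:
                return False
    return True
```

The rapid growth of the Bell numbers (1, 1, 2, 5, 15, 52 for sets of size 0 to 5) gives a feeling for why the number of tuple-contexts can become large.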
A special case exists if all templates τ in P are restricted. In that case, the size of sets Part is bounded by 2, hence ℓ ≤ 2.
With the above bounds, the complexity analysis of the presented algorithm is rather straightforward. The iteration over all possible quintuples (τ 1 , o 1 , c o 1 , p 1 , c p 1 ) requires at most |P|. Since ℓ is bounded by a constant if all templates are restricted, and since B * , T and P can be exponential in the size of the input, the presented algorithm indeed decides t-robustness(AcycResTemp,RC) in exptime.

Related Work
Transaction Programs. Previous work on static robustness testing [FLO+05, AF15] for transaction programs is based on the following key insight: when a schedule is not serializable, then the dependency graph constructed from that schedule contains a cycle satisfying a condition specific to the isolation level at hand (a dangerous structure for snapshot isolation and the presence of a counterflow edge for RC). That insight is extended to a workload of transaction programs through the construction of a so-called static dependency graph where each program is represented by a node, and there is a conflict edge from one program to another if there can be a schedule that gives rise to that conflict. The absence of a cycle satisfying the condition specific to that isolation level then guarantees robustness, while the presence of a cycle does not necessarily imply non-robustness.
Other work studies robustness within a framework for uniformly specifying different isolation levels in a declarative way [CBG15, BG16, CG18]. A key assumption here is atomic visibility, requiring that either all or none of the updates of each transaction are visible to other transactions. These approaches aim at higher isolation levels and cannot be used for RC, as RC does not admit atomic visibility.
Transaction Templates. The static robustness approach based on transaction templates [VKKN21] differs in two ways. First, it makes more underlying assumptions explicit within the formalism of transaction templates (whereas previous work departs from the static dependency graph that should be constructed in some way by the dba). Second, it allows for a decision procedure that is sound and complete for robustness testing against RC, making it possible to identify larger subsets of transactions as robust [VKKN21].
The formalization of transactions and conflict serializability in [VKKN21] and this paper is based on [Fek05], generalized to operations over attributes of tuples and extended with U-operations that combine R- and W-operations into one atomic operation. These definitions are closely related to the formalization presented by Adya et al. [ALO00], but we assume a total rather than a partial order over the operations in a schedule. There are also a few restrictions to the model: there needs to be a fixed set of read-only attributes that cannot be updated and which are used to select tuples for update. The most typical example of this are primary key values passed to transaction templates as parameters. The inability to update primary keys is not an important restriction in many workloads, where keys, once assigned, never get changed, for regulatory or data integrity reasons.
In [VKKN21], a ptime decision procedure is obtained for robustness against RC for templates without functional constraints, and the present paper improves that result to nlogspace. In addition, an experimental study was performed showing how an approach based on robustness and making transactions robust through promotion can improve transaction throughput.
Transactions. The work by Fekete [Fek05] is the first work that provides a necessary and sufficient condition for deciding robustness against snapshot isolation for a workload of concrete transactions (not transaction programs). That work provides a characterization for acceptable allocations when every transaction runs under either snapshot isolation or strict two-phase locking (S2PL). The allocation then is acceptable when every possible execution respecting the allocated isolation levels is serializable. As a side result, this work indirectly provides a necessary and sufficient condition for robustness against snapshot isolation, since robustness against snapshot isolation holds iff the allocation where each transaction is allocated to snapshot isolation is acceptable. Ketsman et al. [KKNV20] provide full characterizations for robustness against read committed and read uncommitted under lock-based semantics. In addition, it is shown that the corresponding decision problems are complete for conp and logspace, respectively, which should be contrasted with the polynomial time characterization obtained in [VKKN21] for robustness against multiversion read committed.

Conclusion
This paper falls within a more general research line investigating how transaction throughput can be improved through an approach based on robustness testing that can be readily applied without making any changes to the underlying database system. As argued in Section 2, incorporating functional constraints makes it possible to identify larger sets of templates as robust and requires fewer R-operations to be promoted to U-operations. In future work, we plan to look at lower bounds, study further restrictions that lower complexity, and consider other referential integrity constraints to further enlarge the modelling power of transaction templates. For example, the current formalism can express dependencies between tuples, but it is not suited to express the fact that a transaction accesses all tuples depending on a specific tuple, such as all Order tuples depending on a specific Customer tuple.
: Account{N, C}] R[Y : Savings{C, B}] R[Z : Checking{C, B}]
Y = f A→S (X), X = f S→A (Y), Z = f A→C (X), X = f C→A (Z)
DepositChecking: R[X : Account{N, C}] U[Z : Checking{C, B}{B}]
Z = f A→C (X), X = f C→A (Z)
TransactSavings: R[X : Account{N, C}] U[Y : Savings{C, B}{B}]

. . .): creates a new order for the customer identified by (W, D, C). The id for this order is obtained by increasing the NextOrderID attribute of the District tuple identified by (W, D) by one. Each order consists of a number of items I 1 , I 2 , . . . with respective quantities Q 1 , Q 2 , . . . For each of these items, a new OrderLine tuple is created and the related stock quantity is decreased.
• Payment(W , D, C, A): represents a customer identified by (W, D, C) paying an amount A. This payment is reflected in the database by increasing the balance of this customer by A. This amount is furthermore added to the YearToDate (YTD) income of both the related Warehouse and District tuples.
• OrderStatus(W , D, C, O): requests information about the current status of the order identified by (W, D, O). This transaction template collects information of the customer identified by (W, D, C) who created the order, the Order tuple itself, and the different OrderLine tuples related to this order.
• Delivery(W , D, C, O): delivers the order represented by (W, D, O). The status of the order is updated, as well as the DeliveryInfo attribute of each OrderLine tuple related to this order. The total price of the order is deducted from the balance of the customer who made this order, identified by (W, D, C).
• StockLevel(W , I): returns the current stock level of item I in warehouse W .

Figure 2. Abstraction for the TPC-C transaction templates. Attribute names are abbreviated.
and range(f A→C ) = C. A database D over schema (Rels, Funcs) assigns to every relation name R ∈ Rels a finite set R D ⊂ Tuples R and to every function name f ∈ Funcs a function f D from dom(f ) D to range(f ) D .
and a U-operation is a U[t]. We also assume a special commit operation denoted C. To every operation o on a tuple of type R, we associate the sets of attributes ReadSet(o) ⊆ Attr(R) and WriteSet(o) ⊆ Attr(R) containing, respectively, the attributes that o reads from and writes to. When o is an R-operation then WriteSet(o) = ∅. Similarly, when o is a W-operation then ReadSet(o) = ∅.

Figure 5. Transaction templates for the proof of Theorem 4.1.
and contains various cycles.

Proposition 4.2 (Only-if part of Theorem 4.1). Let D be a set of dominoes with a solution d for the PCP problem for D. Then there exists a schedule-encoding of d that is consistent with the transaction templates in Figure 5 and some database D.

Proof. Let d = d 1 , d 2 , . . ., d r be a solution to the PCP problem for D. Let a 1 a 2 . . . a r be the read of top values and b 1 b 2 . . . b r be the read of bottom values, which thus represent an identical string c = c 1 · · · c n , with c i ∈ Σ. We now construct a schedule s and database D as in Definition 3.6 with transactions based on the transaction templates P in Figure 5.
Functions f append-0 : String → String, f append-1 : String → String, f detach : String → String, and f top : String → String simulate standard string operations for the interpretations of tuples in relation String. Thus, for tuples representing a (possibly empty) string e:
f D append-c (⟨e⟩) = ⟨ec⟩ with e a (possibly empty) string over Σ, c ∈ Σ, and ⟨ec⟩ a substring of c, and f D append-c (⟨e⟩) = ⟨error⟩ otherwise,
f D detach (⟨ec⟩) = ⟨e⟩ with e a (possibly empty) string over Σ, and c ∈ Σ,
f D detach (⟨⟩) = f D detach (⟨error⟩) = ⟨error⟩,
f D top (⟨ec⟩) = ⟨c⟩ with e a (possibly empty) string over Σ, and c ∈ Σ,
f D top (⟨⟩) = f D top (⟨error⟩) = ⟨error⟩.
Notice that these function interpretations are closed under D, that is, every tuple from relation String D maps onto a tuple that is in relation String D .

Every tuple in DominoSequence is associated with three tuples in String representing, respectively, the read of top values, the read of bottom values, and the empty string. The association is made via functions f top-string : DominoSequence → String, f bottom-string : DominoSequence → String, and f empty-string : DominoSequence → String with the following interpretations in D:
f D top-string (d ′ ) = e, with e the read of top values on dominoes in d ′ ,
f D bottom-string (d ′ ) = e, with e the read of bottom values on dominoes in d ′ , and
f D empty-string (d ′ ) = ⟨⟩.
We emphasize that in the expressions above the read e of top and bottom values on dominoes in d ′ might be empty. Finally, for function f future-solution-string : DominoSequence → String we consider the interpretation that associates every domino sequence d ′ represented by a tuple in relation DominoSequence in D to the final read f D future-solution-string (d ′ ) = c. Function f solution-string : PCPSolution → String does the same for the single tuple representing d in PCPSolution, thus with f D solution-string (d) = c. Function f empty-domino-sequence : String → DominoSequence is interpreted to map every tuple in String onto the tuple from DominoSequence representing the empty sequence ().
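The interpretations of the string functions can be mimicked with ordinary Python strings. The sketch below is an illustration only: the names `f_append`, `f_detach`, `f_top` and the sentinel `ERROR` are hypothetical, the solution string is passed explicitly as the parameter `solution`, and the "substring" requirement is implemented literally with Python's `in` operator.

```python
# Sentinel standing in for the special error tuple (hypothetical encoding).
ERROR = object()


def f_append(e, symbol, solution):
    """Append `symbol` to `e` if the result is a substring of `solution`."""
    if e is ERROR:
        return ERROR
    candidate = e + symbol
    return candidate if candidate in solution else ERROR


def f_detach(e):
    """Remove the last symbol; the empty string and ERROR map to ERROR."""
    if e is ERROR or e == "":
        return ERROR
    return e[:-1]


def f_top(e):
    """Return the last symbol; the empty string and ERROR map to ERROR."""
    if e is ERROR or e == "":
        return ERROR
    return e[-1]
```

All three functions map strings and the error value back into the same domain, which mirrors the closure property stated for the interpretations.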
otherwise, , and. Finally, relation InitialConflict D contains a single tuple, which we refer to by ⟨init⟩. The interpretation of f defines : Boolean → InitialConflict maps 1 and 0 onto ⟨init⟩. Function f error-string : InitialConflict → String maps ⟨init⟩ onto ⟨error⟩. Functions f final-domino-string : InitialConflict → DominoSequence and f final-domino-sequence : InitialConflict → PCPSolution map ⟨init⟩ onto the solution domino sequence d, respectively on the final read c of d.
for D and with some database D. If s is a schedule-encoding of a sequence d of dominoes in D, then d is a solution for the PCP problem on input D.

Proof. Let a 1 a 2 . . . a h and b 1 b 2 . . . b k be the two strings (with a i , b i ∈ Σ) obtained by reading from left to right, symbol by symbol, the values on the top, respectively, the bottom of dominoes d 1 , . . ., d r . Let us say that a 1 a 2 . . . a h = a 1 a 2 . . . a r and b 1 b 2 . . . b k = b 1 b 2 . . . b r . Notice that h and k are not necessarily equal to r as the top and bottom value of an individual domino can be of different length.

Figure 7. Schema graph for the SmallBank benchmark. The dashed edges correspond to the multi-tree schema graph for the schema restricted to f A→S and f A→C .

Lemma 5.7. Let D be a sequence of potentially conflicting quadruples. If X D Y and Y D Z then X D Z for every triple of variables X, Y, Z occurring in Trans(D).

Proof. According to Lemma 5.2, X D Y and Y D Z imply respectively X ⇝ D Y and Y ⇝ D Z. By definition, X ⇝ D Z and hence X D Z.

Figure 8. Acyclic schema graph for the TPC-C benchmark.
then by construction of φ A and since μ is admissible for D, we have c t (F ) = μ(Y). If F = ε and X ≡ D Y, then c t (ε) = μ(X) = μ(Y) by Lemma 6.4. Otherwise, if there exists a variable Z with X F1 ⇒ D Z, Z F2 ⇒ D Y and F = F 1 · F 2 , then by induction c t (F 1 ) = μ(Z) = v and c v (F 2 ) = μ(Y), with φ A (Z) = c v . By construction of c t and c v , the desired c t (F ) = μ(Y) now follows.

Proof.
By definition of equivalence, ≡ D partitions all variables occurring in Trans(D) into equivalence classes. That is, two variables X and Y are in the same equivalence class iff X ≡ D Y. For a given variable X, we denote the equivalence class X belongs to by [X]. Note that for any pair of variables X and Y occurring in Trans(D), if X F ⇒ D Y, then X ′ F ⇒ D Y ′ for any pair of variables X ′ ∈ [X] and Y ′ ∈ [Y]. By slight abuse of notation, we use X F ⇒ D [Y] and [X] F X) and c Y = φ A (Y). Since φ A respects the constraints of D, c Y is a tuple-subcontext of c X witnessed by ε. By definition of tuple-subcontexts, c X (ε) = c Y (ε), and, as a result, μ( some template in Trans(D), and hence X f ⇒ D Y. Analogously, Z f ⇒ D W. By Definition 6.5 (5), c Y and c W are tuple-subcontexts of respectively c X and c Z witnessed by f . As c X = c Z , it immediately follows that c Y = c W , and in particular c Y (ε) = c W (ε), leading to the desired contradiction.
F . Furthermore, two variables X and Y occurring in a template τ are connected in τ , denoted X τ Y, if X F ⇝ τ Y or Y F ⇝ τ X, or if there is a variable Z with X τ Z and either Z F or if there is a variable Z with X D Z and either Z F ⇝ D Y or Y F ⇝ D Z for some

6 To be formally correct, the latter would require adding every such variable-renamed template to P, creating a larger set P ′ . This does not influence the complexity of Theorem 5.1 as neither Trans(D) nor P ′ is used in the algorithm. Their only purpose is to reason about properties of μ.

sequence are contexts for type(var(o i )) and type(var(p i )); (3) for every pair of variables W i and Z