On linear rewriting systems for Boolean logic and some applications to proof theory

Linear rules have played an increasing role in structural proof theory in recent years. It has been observed that the set of all sound linear inference rules in Boolean logic is already coNP-complete, since every Boolean tautology can be written as a (left- and right-)linear rewrite rule. In this paper we study properties of systems consisting only of linear inferences. Our main result is that the length of any 'nontrivial' derivation in such a system is bounded by a polynomial. As a consequence, there is no polynomial-time decidable sound and complete system of linear inferences, unless coNP = NP. We draw tools and concepts from term rewriting, Boolean function theory and graph theory in order to obtain some required intermediate results. At the same time we make several connections between these areas that, to our knowledge, have not been presented before and that together constitute a rich theoretical framework for reasoning about linear TRSs for Boolean logic.


Introduction
Consider the following conjunction rule from a Gentzen-style sequent calculus:
$$ \frac{\vdash \Gamma, A \qquad \vdash B, \Delta}{\vdash \Gamma, A \wedge B, \Delta} \qquad (1.1) $$
where Γ and ∆ are finite sequences of formulae. In this rule all the formulae in the premisses occur in the conclusion with the same multiplicity. In proof theory this is referred to as a multiplicative rule. This phenomenon can also be described as a linear rule in term rewriting. For instance, the proof rule above has logical behaviour induced by the following linear term rewriting rule,
$$ (A \vee C) \wedge (B \vee D) \;\to\; (A \wedge B) \vee C \vee D \qquad (1.2) $$
where C and D here represent the disjunction of the formulae in Γ and ∆, respectively, from (1.1). This rule has been particularly important in structural proof theory, serving as the basis of Girard's multiplicative linear logic [Gir87]. A variant of (1.2) that will play some role in this paper is the following,
$$ s : x \wedge (y \vee z) \;\to\; (x \wedge y) \vee z \qquad (1.3) $$
which we call switch, following [Gug07, GS01, BT01], but which is also known as weak distributivity [BCST96].
However, the concept of linearity, or multiplicativity, itself is far more general. For instance, the advent of deep inference has introduced the following linear rule, known as medial [BT01]:
$$ m : (w \wedge x) \vee (y \wedge z) \;\to\; (w \vee y) \wedge (x \vee z) \qquad (1.4) $$
This rule allows contraction,
$$ c{\downarrow} : x \vee x \;\to\; x \qquad (1.5) $$
to be reduced to atomic form. For example, consider the following transformation, which reduces the logical complexity of a contraction step on A ∧ B to contraction steps on A and on B:
$$ (A \wedge B) \vee (A \wedge B) \;\to_{m}\; (A \vee A) \wedge (B \vee B) \;\to_{c\downarrow}\; A \wedge (B \vee B) \;\to_{c\downarrow}\; A \wedge B $$
Until now the nature of linearity in Boolean logic has not been well understood, despite its being a concept of continuing interest in proof theory, cf. [Gug11], and category theory, cf. [Str07b, Lam07]. While switch and medial form the basis of the usual deep inference systems, it has been known for some time that other options are available: there are linear rules that cannot be derived from just these two rules (even modulo logical equivalences and constants), as first shown explicitly in [Str12]. The minimal known example, from [Das13], is the following:
This example can be generalised to an infinite set of rules, where each rule is independent from all smaller rules.
In fact, the situation is rather more intricate than this: the set of linear inferences, denoted L henceforth, is itself coNP-complete [Str12]. This can be proved by showing that every Boolean tautology can be written as a linear rule (which we demonstrate in Proposition 6.1). This leads us to a natural question:
The structure of the paper is as follows. In Sections 2, 3 and 4 we present preliminaries on each of our three settings and some basic results connecting various concepts between them. In Sections 5 and 6 we specialise to the setting of linear rewrite rules for Boolean logic and present our main results, Theorem 5.9 and Corollary 6.9. In Sections 7 and 8 we present some applications to deep inference proof theory, showing a form of canonicity for medial and some general consequences for the normalisation of deep inference proofs. In Section 9 we discuss a direction for future work in a graph-theoretic setting, and in Section 10 we present some concluding remarks, including relationships to models of linear logic and axiomatisations of Boolean algebras.
1 These have been studied in various forms and under different names. The first appearance we are aware of is in [Che67]; see also the seminal paper [Gur77], which characterises these functions. The book we reference presents an excellent and comprehensive introduction to the area.

Preliminaries on rewriting theory
We work in the setting of first-order term rewriting as defined in the Terese textbook, Term Rewriting Systems [Ter03]. We will use the same notation for all symbols except the connectives, for which we use more standard notation from proof theory. In particular we will use ⊥ and ⊤ for the truth constants, reserving 0 and 1 for the inputs and outputs of Boolean functions, introduced later.
We adopt one particular convention that differs from what is usual in the literature. A term rewriting system (TRS) is usually defined as an arbitrary set of rewrite rules. Here we insist that the set of instances of these rules, or reduction steps, is polynomial-time decidable. The motivation is that we wish to be as general as possible without admitting trivial results: if arbitrary sets were allowed, a complete system could be specified trivially, for instance by taking all sound rules. Furthermore, it is a usual requirement in proof theory that an inference rule be easily, or feasibly, checkable, and in proof complexity this is formalised by the same condition on inference rules, cf. [CR74].
Let us now consider Boolean logic in the term rewriting setting. Our language consists of the connectives ⊥, ⊤, ∧, ∨ and a set Var of propositional variables, typically denoted x, y, z etc. The set Var is equipped with an involution x ↦ x̄ (i.e. a self-inverse function Var → Var) such that x̄ ≠ x for all x ∈ Var. We call x̄ the dual of x and, for each pair of dual variables, we arbitrarily choose one to be positive and the other to be negative.
The set Ter of formulae, or terms, is built freely from this signature in the usual way. Terms are typically denoted by s, t, u etc., and term and variable symbols may occur with superscripts and subscripts if required.
In this setting ⊤ and ⊥ are considered the constant symbols of our language. We say that a term t is constant-free if ⊤ and ⊥ do not occur in t.
We do not include a symbol for negation in our language. This is due to the fact that soundness of a rewrite step is only preserved under positive contexts. Instead we simply consider terms in negation normal form (NNF), which can be generated for arbitrary terms from positive and negative variables by the De Morgan laws:
$$ \bar{\bot} = \top, \qquad \bar{\top} = \bot, \qquad \bar{\bar{x}} = x, \qquad \overline{s \wedge t} = \bar{s} \vee \bar{t}, \qquad \overline{s \vee t} = \bar{s} \wedge \bar{t} $$
We say that a term is negation-free if it does not contain any negative variables. We write Var(t) to denote the set of variables occurring in t. We say that a term t is linear if, for each x ∈ Var(t), there is exactly one occurrence of x in t. The size of a term t, denoted |t|, is the total number of variable and function symbols occurring in t. A substitution is a mapping σ : Var → Ter from the set of variables to the set of terms such that σ(x) ≠ x for only finitely many x. The notion of substitution is extended to all terms, i.e. to a map Ter → Ter, in the usual way.
A (one-hole) context is a term with a single 'hole' occurring in place of a subterm. Below are three examples: We may write C_i[t] to denote the term obtained by replacing the hole in C_i[ ] with t. We may also replace holes with other contexts to derive new contexts. For example,
Definition 2.1 (Rewrite rules). A rewrite rule is an expression l → r, where l and r are terms, such that l ≠ r. We write ρ : l → r to express that the rule l → r is called ρ. In this rule we call l the left-hand side (LHS) of ρ, and r its right-hand side (RHS). We say that ρ is left-linear (resp. right-linear) if l (resp. r) is a linear term. We say that ρ is linear if it is both left- and right-linear. We write s →_ρ t to express that s → t is a reduction step of ρ, i.e.
For instance, the rules s from (1.3) and m from (1.4) are examples of linear rules. The rule w↑ : x ∧ y → x (which we consider later in Section 8) is also linear, while the rule c↓ from (1.5) is not linear.
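The linearity conditions of Definition 2.1 are purely syntactic and easy to check mechanically. The following is a minimal sketch, assuming a hypothetical encoding of constant-free terms as nested Python tuples `(op, left, right)` with `op` in `{"and", "or"}` and variables as strings; the dictionary keys `w_up` and `c_down` are illustrative stand-ins for w↑ and c↓.

```python
from collections import Counter

# Hypothetical encoding: a variable is a string, a compound term is
# (op, left, right) with op in {"and", "or"}; constants are omitted.

def occurrences(term):
    """Multiset of variable occurrences in a term."""
    if isinstance(term, str):
        return Counter([term])
    _, left, right = term
    return occurrences(left) + occurrences(right)

def is_linear_term(term):
    """A term is linear if every variable occurs exactly once."""
    return all(count == 1 for count in occurrences(term).values())

def size(term):
    """|t|: the number of variable and function symbols in t."""
    if isinstance(term, str):
        return 1
    _, left, right = term
    return 1 + size(left) + size(right)

def linearity(lhs, rhs):
    """(left-linear?, right-linear?) for the rule lhs -> rhs."""
    return is_linear_term(lhs), is_linear_term(rhs)

RULES = {
    "s": (("and", "x", ("or", "y", "z")), ("or", ("and", "x", "y"), "z")),
    "m": (("or", ("and", "w", "x"), ("and", "y", "z")),
          ("and", ("or", "w", "y"), ("or", "x", "z"))),
    "w_up": (("and", "x", "y"), "x"),
    "c_down": (("or", "x", "x"), "x"),
}

for name, (lhs, rhs) in RULES.items():
    print(name, linearity(lhs, rhs))
```

Running it reports both sides linear for s, m and w↑, and only right-linearity for c↓, in line with the classification above.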
Definition 2.2 (Term rewriting systems). The one-step reduction relation of a set R of rewrite rules is the union of the one-step reduction relations of the rules in R. A term rewriting system (TRS) is a set of rewrite rules whose one-step reduction relation is decidable in polynomial time. A linear (term rewriting) system is a TRS whose rules are all linear.

Definition 2.3 (Derivations). A derivation under a binary relation → on Ter is a finite sequence t_0 → t_1 → ⋯ → t_l, and l is its length.
For an equivalence relation ∼ on Ter and a TRS R, we define an R-derivation modulo ∼ as a sequence π : s_0 ∼ t_0 →_R s_1 ∼ t_1 →_R ⋯ →_R s_l ∼ t_l. In this case we say that the length of π is l, i.e. we do not count the ∼ steps.
We write AC to denote the smallest equivalence relation closed under contexts generated by the following equations for associativity and commutativity of ∧ and ∨:
$$ x \wedge (y \wedge z) = (x \wedge y) \wedge z, \qquad x \wedge y = y \wedge x, \qquad x \vee (y \vee z) = (x \vee y) \vee z, \qquad x \vee y = y \vee x $$
Note that AC contains only linear equations. The following equations for the constants are also linear and similarly generate a context-closed equivalence relation called U:
$$ x \wedge \top = x, \qquad x \vee \bot = x $$
We denote by ACU the combined system of AC and U. We will also need the system U′ that extends U in the natural way by the following equations:
$$ x \wedge \bot = \bot, \qquad x \vee \top = \top $$
Notice that these are not linear in the sense of [Das13], but are considered linear in our more general setting. We denote by ACU′ the combined system of AC and U′. It turns out that this equivalence relation relates precisely those linear terms that compute the same Boolean function, as we will see later.

Preliminaries on relation webs
In this section we restrict our attention to negation-free constant-free linear terms and study their syntactic structure, in the form of relation webs [Gug07,Str07a].
We will consider graphs that are undirected, simple, and with labelled edges; we will make use of standard graph-theoretic terminology. For a graph G we denote its vertex set, or set of nodes, as V(G), and the set of its labelled edges as E(G). For a label ⋆, we say "x ⋆ y in G" to express that the edge {x, y} is labelled ⋆ in the graph G. A set X ⊆ V(G) is a ⋆-clique if every pair x, y ∈ X has a ⋆-labelled edge between them. A maximal ⋆-clique is a ⋆-clique that is not contained in any larger ⋆-clique.
Analysing the term tree of a negation-free constant-free linear term t, notice that for each pair of variables x, y occurring in t, there is a unique connective ⋆ ∈ {∧, ∨} at the root of the smallest subtree containing the (unique) occurrences of x and y. Let us call this the least common connective of x and y in t.
Definition 3.1 (Relation webs). The (relation) web W(t) of a constant-free negation-free linear term t is the complete graph whose vertex set is Var(t), such that the edge between two variables x and y is labelled by their least common connective in t. We write e_∧(t) (resp. e_∨(t)) for the number of ∧- (resp. ∨-) labelled edges in W(t).
As a convention, we call an edge labelled by ∧ a ∧-edge, and an edge labelled by ∨ a ∨-edge.
Proposition 3.3. Let t be a constant-free negation-free linear term with n variables, and let e := n(n − 1)/2. Then e_∧(t), e_∨(t) ≤ e, and e_∧(t) + e_∨(t) = e. Proof. This follows from the fact that a web on n vertices has exactly e edges, each of which is labelled either ∧ or ∨.
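Definition 3.1 and Proposition 3.3 can be illustrated by computing webs directly: at each binary node, every variable of the left subterm meets every variable of the right subterm for the first time, so that node's connective labels all of those edges. A sketch under a hypothetical tuple encoding (variables as strings, compound terms as `(op, left, right)`), valid only for negation-free constant-free linear terms:

```python
def variables(term):
    """Variables of a term, in left-to-right order."""
    if isinstance(term, str):
        return [term]
    return variables(term[1]) + variables(term[2])

def web(term):
    """Relation web W(t), as a dict mapping each unordered pair {x, y}
    (a frozenset) to its least common connective, "and" or "or"."""
    if isinstance(term, str):
        return {}
    op, left, right = term
    edges = {**web(left), **web(right)}
    # Any x in the left subterm and y in the right subterm meet first here.
    for x in variables(left):
        for y in variables(right):
            edges[frozenset((x, y))] = op
    return edges

def edge_counts(term):
    """(e_and(t), e_or(t))."""
    labels = list(web(term).values())
    return labels.count("and"), labels.count("or")

medial_lhs = ("or", ("and", "w", "x"), ("and", "y", "z"))
print(edge_counts(medial_lhs))   # (2, 4)
```

For the left-hand side of medial the two ∧-edges are {w, x} and {y, z}, and the remaining four of the 4·3/2 = 6 edges are labelled ∨.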
Remark 3.4 (Labels). We point out that, instead of using labelled complete graphs, we could have also used unlabelled arbitrary graphs, since we have only two connectives (∧ and ∨) and so one could be specified by the lack of an edge. This is indeed done in some settings, e.g. the cooccurrence graphs of [CH11]. However, we use the current formulation in order to maintain consistency with the previous literature, e.g. [Gug07] and [Str07a], and since it makes certain arguments easier to present, e.g. in Section 7, where we need to draw graphs with incomplete information.
One of the reasons for considering relation webs is the following proposition, which allows us to reason about equivalence classes modulo AC easily. Proof. This follows immediately from the definition and that AC preserves least common connectives.

An important property of webs is that, for either label, they contain no induced paths of length greater than 2. More precisely, we have the following:
Proposition 3.6. A complete {∧, ∨}-labelled graph on X is the web of some negation-free constant-free linear term on X if and only if it contains no four vertices w, x, y, z such that {w, x}, {x, y} and {y, z} are labelled ∧ while {w, y}, {x, z} and {w, z} are labelled ∨. (3.1)
Since the ∨-edges of such a configuration also form an induced path, namely y, w, z, x, the condition is symmetric in ∧ and ∨. A proof of this property can be found, for example, in [Möh89], [Ret93], [BdGR97], or [Gug07]. It is called P4-freeness, Z-freeness or N-freeness, depending on the viewpoint. This property can be useful when we reason with webs, for instance in Section 7.
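The forbidden configuration (3.1) can be searched for by brute force over ordered quadruples of vertices. This naive sketch is only meant for small graphs; labelled complete graphs are encoded, hypothetically, as dicts from `frozenset` pairs to `"and"` or `"or"`, and `web` rebuilds W(t) from a tuple-encoded linear term.

```python
from itertools import combinations, permutations

def variables(term):
    return [term] if isinstance(term, str) else variables(term[1]) + variables(term[2])

def web(term):
    """W(t) for a negation-free constant-free linear term."""
    if isinstance(term, str):
        return {}
    op, left, right = term
    edges = {**web(left), **web(right)}
    for x in variables(left):
        for y in variables(right):
            edges[frozenset((x, y))] = op
    return edges

def has_forbidden_pattern(vertices, label):
    """Look for w, x, y, z with {w,x}, {x,y}, {y,z} labelled "and" and
    {w,y}, {x,z}, {w,z} labelled "or", i.e. configuration (3.1)."""
    def lab(a, b):
        return label[frozenset((a, b))]
    for w, x, y, z in permutations(vertices, 4):
        if (lab(w, x) == lab(x, y) == lab(y, z) == "and"
                and lab(w, y) == lab(x, z) == lab(w, z) == "or"):
            return True
    return False

# A labelled complete graph on {w, x, y, z} that contains configuration (3.1).
p4 = {frozenset(p): "or" for p in combinations("wxyz", 2)}
for pair in [("w", "x"), ("x", "y"), ("y", "z")]:
    p4[frozenset(pair)] = "and"
```

By Proposition 3.6 the search should fail on every web, while the graph `p4` is not the web of any linear term.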

Preliminaries on Boolean functions
In this section we introduce the usual Boolean function models for terms of Boolean logic. At the end of the section we give some examples of the various notions introduced.
A Boolean function on a (finite) set of variables X ⊆ Var is a map f : {0, 1}^X → {0, 1}. We identify {0, 1}^X with P(X), the powerset of X, i.e. we may specify an argument of a Boolean function by the subset of its variables assigned to 1. A little more formally, a function ν : X → {0, 1} is specified by the set X_ν it indicates, i.e. x ∈ X_ν just if ν(x) = 1. For this reason we may quantify over the arguments of a Boolean function by writing Y ⊆ X rather than ν ∈ {0, 1}^X, i.e. we write f(Y) to denote the value of f if the input is 1 for the variables in Y and 0 for the variables in X \ Y. Similarly, we write f(Ȳ) for the value of f when the variables in Y are 0 and the variables in X \ Y are 1.
For Boolean functions f, g : {0, 1}^X → {0, 1}, we write f ≤ g if f(Y) ≤ g(Y) for every Y ⊆ X. Notice that the following can easily be shown to be equivalent: (1) f ≤ g. We also have that, if f(X) = 1, then there is some S ∈ MIN(f) such that S ⊆ X; dually, if f(X̄) = 0, then there is some T ∈ MAX(f) such that T ⊆ X.
Minterms and maxterms correspond to minimal DNF and CNF representations, respectively, of a monotone Boolean function. We refer the reader to [CH11] for an introduction to their theory. In this work we use them in a somewhat different way from Boolean function theory, in that we devise definitions of logical concepts such as entailment and, in the next section, what we call "triviality". The reason for this is to take advantage of the purely function-theoretic results stated in this section (e.g. Gurvich's Theorem 4.10 below) to derive our main results in Sections 5 and 6.
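Proposition 4.4. For monotone Boolean functions f, g on the same variable set, the following are equivalent:
(1) f ≤ g.
(2) For every S ∈ MIN(f) there is some S′ ∈ MIN(g) such that S′ ⊆ S.
(3) For every T ∈ MAX(g) there is some T′ ∈ MAX(f) such that T′ ⊆ T.
Proof. 1 ⟹ 2. Let S ∈ MIN(f). We have that f(S) = 1, so also g(S) = 1, by 1, whence there must be some S′ ∈ MIN(g) such that S′ ⊆ S, by Observation 4.3.
2 ⟹ 1. If f(X) = 1 then there is some S ∈ MIN(f) such that S ⊆ X, by Observation 4.3. By 2, there is some S′ ∈ MIN(g) such that S′ ⊆ S, and so S′ ⊆ X. Therefore g(X) = 1, by monotonicity, and so f ≤ g.
The equivalence of 1 and 3 is proved dually.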
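Proposition 4.4 can be checked by brute force on small examples. Following the convention of this section, an argument is the set of variables assigned 1, so `f(Y)` is evaluated on a `frozenset`; maxterms are computed as the minimal sets whose variables, when set to 0, force `f` to 0. The helper names are hypothetical and the enumeration is exponential, so this is only a sanity check, not an algorithm from the text.

```python
from itertools import combinations

def subsets(X):
    """All subsets of X as frozensets (a subset = the variables set to 1)."""
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def minimal(sets):
    return {S for S in sets if not any(S2 < S for S2 in sets)}

def minterms(f, X):
    """MIN(f): minimal Y with f(Y) = 1."""
    return minimal([Y for Y in subsets(X) if f(Y)])

def maxterms(f, X):
    """MAX(f): minimal T such that f is 0 when the variables in T are 0."""
    return minimal([T for T in subsets(X) if not f(frozenset(X) - T)])

def leq(f, g, X):
    return all(f(Y) <= g(Y) for Y in subsets(X))

def leq_by_minterms(f, g, X):
    """Item (2): every minterm of f contains a minterm of g."""
    min_g = minterms(g, X)
    return all(any(S2 <= S for S2 in min_g) for S in minterms(f, X))

def leq_by_maxterms(f, g, X):
    """Item (3): every maxterm of g contains a maxterm of f."""
    max_f = maxterms(f, X)
    return all(any(T2 <= T for T2 in max_f) for T in maxterms(g, X))

X = {"x", "y", "z"}

def f(Y):  # x ∧ (y ∨ z)
    return "x" in Y and ("y" in Y or "z" in Y)

def g(Y):  # (x ∧ y) ∨ z
    return ("x" in Y and "y" in Y) or "z" in Y
```

Here f and g are the two sides of switch, so all three tests agree that f ≤ g holds and g ≤ f fails.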
A term t computes a Boolean function {0, 1}^Var(t) → {0, 1} in the usual way, and negation-free terms compute monotone Boolean functions. Thus, we can speak of minterms and maxterms of a negation-free term t, referring to the minterms and maxterms of the function computed by t. For linear terms, this will allow us to give a graph-theoretic formulation of minterms and maxterms using concepts from the previous section. We give the following inductive construction of minterms and maxterms:
Proposition 4.5. Let t be a term. A set S ⊆ Var(t) is a minterm of t if and only if:
• t = ⊤ and S is empty, or
• t = x and S = {x}, or
• t = t_1 ∨ t_2 and S is a minterm of t_1 or of t_2, or
• t = t_1 ∧ t_2 and S = S_1 ∪ S_2 where each S_i is a minterm of t_i.
Dually, a set T ⊆ Var(t) is a maxterm of t if and only if:
• t = ⊥ and T is empty, or
• t = x and T = {x}, or
• t = t_1 ∧ t_2 and T is a maxterm of t_1 or of t_2, or
• t = t_1 ∨ t_2 and T = T_1 ∪ T_2 where each T_i is a maxterm of t_i.
Proof. This follows straightforwardly from Definition 4.2 and structural induction on t.
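For constant-free negation-free linear terms, the inductive clauses of Proposition 4.5 translate directly into code. The sketch below, under a hypothetical tuple encoding, compares them with a brute-force computation of minimal true sets and minimal zero-forcing sets; it deliberately omits the constant cases.

```python
from itertools import combinations

def minterms_ind(t):
    """Minterm clauses of Proposition 4.5 (constant-free case)."""
    if isinstance(t, str):
        return {frozenset([t])}
    op, a, b = t
    ma, mb = minterms_ind(a), minterms_ind(b)
    if op == "or":
        return ma | mb
    return {S1 | S2 for S1 in ma for S2 in mb}

def maxterms_ind(t):
    """Dual clauses: the roles of "and" and "or" are exchanged."""
    if isinstance(t, str):
        return {frozenset([t])}
    op, a, b = t
    ma, mb = maxterms_ind(a), maxterms_ind(b)
    if op == "and":
        return ma | mb
    return {T1 | T2 for T1 in ma for T2 in mb}

def evaluate(t, ones):
    """Value of t when exactly the variables in `ones` are 1."""
    if isinstance(t, str):
        return t in ones
    a, b = evaluate(t[1], ones), evaluate(t[2], ones)
    return (a and b) if t[0] == "and" else (a or b)

def variables(t):
    return {t} if isinstance(t, str) else variables(t[1]) | variables(t[2])

def _minimal(sets):
    return {S for S in sets if not any(S2 < S for S2 in sets)}

def minterms_brute(t):
    X = sorted(variables(t))
    subs = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
    return _minimal([S for S in subs if evaluate(t, S)])

def maxterms_brute(t):
    X = sorted(variables(t))
    subs = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]
    return _minimal([T for T in subs if not evaluate(t, set(X) - T)])
```

For example, for (w ∨ (x ∧ y)) ∧ (z ∨ v) both methods give the minterms {w, z}, {w, v}, {x, y, z} and {x, y, v}.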
Notice that, in particular, ⊥ has no minterms and ⊤ has no maxterms. We can now present one of the important correspondences of this work, characterising minterms and maxterms of linear terms as maximal cliques in their relation webs:
Theorem 4.6. A set of variables is a minterm (resp. maxterm) of a negation-free constant-free linear term t if and only if it is a maximal ∧-clique (resp. maximal ∨-clique) in W(t).
Proof. This follows from structural induction on t and Proposition 4.5.
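Theorem 4.6 can likewise be spot-checked on small terms by enumerating the maximal cliques of the web and comparing them with the inductively computed minterms and maxterms. This is a brute-force sketch with hypothetical helper names and a hypothetical tuple encoding of terms:

```python
from itertools import combinations

def variables(t):
    return [t] if isinstance(t, str) else variables(t[1]) + variables(t[2])

def web(t):
    """W(t): maps each pair {x, y} to the least common connective of x, y."""
    if isinstance(t, str):
        return {}
    op, left, right = t
    edges = {**web(left), **web(right)}
    for x in variables(left):
        for y in variables(right):
            edges[frozenset((x, y))] = op
    return edges

def minterms(t):
    if isinstance(t, str):
        return {frozenset([t])}
    op, a, b = t
    if op == "or":
        return minterms(a) | minterms(b)
    return {S1 | S2 for S1 in minterms(a) for S2 in minterms(b)}

def maxterms(t):
    if isinstance(t, str):
        return {frozenset([t])}
    op, a, b = t
    if op == "and":
        return maxterms(a) | maxterms(b)
    return {T1 | T2 for T1 in maxterms(a) for T2 in maxterms(b)}

def maximal_cliques(t, label):
    """Maximal label-cliques of W(t), by brute force over vertex subsets."""
    W, X = web(t), sorted(variables(t))
    cliques = [frozenset(C) for r in range(1, len(X) + 1)
               for C in combinations(X, r)
               if all(W[frozenset(p)] == label for p in combinations(C, 2))]
    return {C for C in cliques if not any(C < D for D in cliques)}

EXAMPLES = [
    ("or", ("and", "w", "x"), ("and", "y", "z")),
    ("and", ("or", "w", ("and", "x", "y")), ("or", "z", "v")),
]
```

On both examples the maximal ∧-cliques coincide with the minterms and the maximal ∨-cliques with the maxterms, as the theorem predicts.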
Definition 4.7 (Read-once functions). A Boolean function is called read-once if it is computed by some linear term.
It is not exactly clear when the following result first appeared, although we refer to a discussion in [CH11] where it is stated that results directly implying this were first mentioned in [Kuz58]. The result also occurs in [Gur77], and is generalised to certain other bases in [HNW94] and [HK90].
Theorem 4.8 (Folklore). Constant-free negation-free linear terms compute the same (read-once) Boolean function if and only if they are equivalent modulo AC.
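The following consequence of Theorem 4.8 appears in [Das11], where a detailed proof may be found.
Corollary 4.9. Negation-free linear terms compute the same Boolean function if and only if they are equivalent modulo ACU′.
Proof idea. The result essentially follows from the observation that every negation-free linear term is ACU′-equivalent to ⊥, to ⊤, or to a constant-free linear term that is unique up to AC.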
Let us conclude this section by stating the following classical result, due to Gurvich [Gur77], characterising the read-once functions over ∧ and ∨. It has appeared in various presentations; in particular, the proof appearing in [CH11] uses 'cooccurrence' graphs, which correspond to our relation webs.
Theorem 4.10 (Gurvich). A monotone Boolean function is read-once if and only if each of its minterms intersects each of its maxterms in exactly one variable.
In this paper we will actually only need one direction of this theorem: that for monotone read-once functions, minterms and maxterms have singleton intersection. Using the different settings we have introduced, we arrive at a remarkably simple proof of this direction:
Proof of left-right direction of Theorem 4.10. The constant cases are vacuous, so let t be a negation-free constant-free linear term computing f. A minterm and a maxterm of f must intersect since, otherwise, we could simultaneously force f to evaluate to 0 and 1. On the other hand, by Theorem 4.6, a minterm is a maximal ∧-clique of W(t) and a maxterm is a maximal ∨-clique of W(t), and cliques with different labels can share at most one vertex, since two shared vertices would be joined by an edge carrying both labels.
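The direction of Theorem 4.10 proved above can be tested on examples: for linear terms, every minterm meets every maxterm in exactly one variable, whereas for the majority function on three variables, which is not read-once and whose minterms and maxterms are all the 2-element subsets, the condition fails. A small sketch with hypothetical names and a hypothetical tuple encoding:

```python
from itertools import combinations

def minterms(t):
    """Minterms of a constant-free negation-free linear term (Prop. 4.5)."""
    if isinstance(t, str):
        return {frozenset([t])}
    op, a, b = t
    if op == "or":
        return minterms(a) | minterms(b)
    return {S1 | S2 for S1 in minterms(a) for S2 in minterms(b)}

def maxterms(t):
    if isinstance(t, str):
        return {frozenset([t])}
    op, a, b = t
    if op == "and":
        return maxterms(a) | maxterms(b)
    return {T1 | T2 for T1 in maxterms(a) for T2 in maxterms(b)}

def singleton_intersections(mins, maxs):
    """Gurvich's condition: every minterm meets every maxterm exactly once."""
    return all(len(S & T) == 1 for S in mins for T in maxs)

LINEAR_TERMS = [
    ("and", "x", ("or", "y", "z")),
    ("or", ("and", "w", "x"), ("and", "y", "z")),
    ("and", ("or", "w", ("and", "x", "y")), ("or", "z", "v")),
]

# Majority of x, y, z: its minterms and maxterms are all 2-element subsets.
majority = {frozenset(p) for p in combinations("xyz", 2)}
```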
This simple proof exemplifies the usefulness of considering both the graph theoretic viewpoint and the Boolean function viewpoint. Such interplays will prove to be very useful in the remainder of this work.
Example 4.11. Consider the function computed by the term Now consider the Boolean 'threshold' functions TH_k^X : {0, 1}^X → {0, 1}, which return 1 on just those Y ⊆ X such that |Y| ≥ k. Writing n := |X|, by definition TH_k^X has as minterms the sets S ⊆ X such that |S| = k, and as maxterms the sets T ⊆ X such that |T| = n − k + 1. Hence, whenever 2 ≤ k ≤ n − 1, some minterm and some maxterm share at least two variables: if k ≤ n − k + 1 then each minterm is contained in some maxterm, and otherwise each maxterm is contained in some minterm. Therefore, by Gurvich's result, Theorem 4.10, TH_k^X is read-once just when k = 1, where it is computed by the disjunction of X, or when k = n, where it is computed by the conjunction of X.
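The conclusion of Example 4.11 can be confirmed for |X| = 5 by computing the minterms and maxterms of each TH_k^X by brute force and testing the condition of Theorem 4.10. The sketch is illustrative only; `threshold_terms` and `gurvich_condition` are hypothetical names.

```python
from itertools import combinations

def subsets(X):
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def minimal(sets):
    return {S for S in sets if not any(S2 < S for S2 in sets)}

def threshold_terms(X, k):
    """Brute-force (MIN, MAX) of TH_k^X, where TH_k^X(Y) = 1 iff |Y| >= k."""
    X = frozenset(X)
    mins = minimal([Y for Y in subsets(X) if len(Y) >= k])
    # T is zero-forcing iff the variables left at 1 are fewer than k.
    maxs = minimal([T for T in subsets(X) if len(X - T) < k])
    return mins, maxs

def gurvich_condition(mins, maxs):
    return all(len(S & T) == 1 for S in mins for T in maxs)

X = {"v", "w", "x", "y", "z"}
read_once_ks = [k for k in range(1, len(X) + 1)
                if gurvich_condition(*threshold_terms(X, k))]
print(read_once_ks)  # [1, 5]
```

Only k = 1 (the disjunction) and k = |X| (the conjunction) pass the test.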
Now let X = {v, w, x, y, z}. Appealing to Proposition 4.4, we have that t ≤ TH_2^X, since all minterms of t have size 2 and so are also minterms of TH_2^X. Dually, the maxterms of TH

Linear inferences, triviality and a polynomial bound on length
In the previous section we considered the semantics of linear terms via Boolean functions. In this section we study sound rewriting steps between linear terms, with respect to this semantics, and prove our main result, Theorem 5.9, about the length of such rewriting paths, corresponding to point (A) in the Introduction, Section 1.
Definition 5.1 (Soundness). We say that a rewrite rule s → t is sound if s and t compute Boolean functions f and g, respectively, such that f ≤ g. We say that a TRS is sound if all its rules are sound. A linear inference is a sound linear rewrite rule.
Notation 5.2. To switch conveniently between the settings of terms and Boolean functions, we freely interchange notations, e.g. writing s ≤ t to denote that s → t is sound, and saying f → g is sound when f ≤ g.
We immediately have the following, which can also be found in [Das13].
Proposition 5.3. Any sound negation-free linear TRS, modulo ACU, is terminating in exponential time.² Proof. The result follows by Boolean semantics and Corollary 4.9: each consecutive term must compute a distinct Boolean function that is strictly bigger under ≤, and every strictly increasing chain of Boolean functions on n variables has length at most 2^n, where n is the number of variables in the input term.
The purpose of this section is now to put a polynomial bound on the length of certain linear derivations. For this, the fundamental concept we use is that of "triviality", first introduced in [Das13] as "semantic triviality".
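Definition 5.4 (Triviality). Let f and g be Boolean functions on a set of variables X, and let x ∈ X. We say that f → g is trivial at x if f(Y ∪ {x}) ≤ g(Y) for every Y ⊆ X \ {x}, i.e. if the inference remains sound when x is set to 1 on the left and to 0 on the right. We say simply that f → g is trivial if it is trivial at one of its variables.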
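Definition 5.4 quantifies over all assignments to the remaining variables, so for small examples triviality can be decided by exhaustive search. A minimal sketch, representing Boolean functions as Python predicates on the set of variables assigned 1 (a hypothetical encoding; all helper names are illustrative):

```python
from itertools import combinations

def subsets(xs):
    xs = sorted(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def is_sound(f, g, X):
    return all(f(Y) <= g(Y) for Y in subsets(X))

def is_trivial_at(f, g, X, x):
    """f(Y ∪ {x}) <= g(Y) for every Y ⊆ X that does not contain x."""
    return all(f(Y | {x}) <= g(Y) for Y in subsets(set(X) - {x}))

def trivial_variables(f, g, X):
    return {x for x in X if is_trivial_at(f, g, X, x)}

def threshold(k):
    """TH_k: true on Y iff at least k variables are set to 1."""
    def th(Y):
        return len(Y) >= k
    return th

def conj(Y):      # x ∧ y
    return {"x", "y"} <= Y

def disj(Y):      # x ∨ y
    return bool({"x", "y"} & Y)

def switch_l(Y):  # x ∧ (y ∨ z)
    return "x" in Y and ("y" in Y or "z" in Y)

def switch_r(Y):  # (x ∧ y) ∨ z
    return ("x" in Y and "y" in Y) or "z" in Y
```

With it, x ∧ y → x ∨ y is found trivial at both x and y, TH_3^X → TH_2^X is trivial at every variable of a five-element X, and switch is trivial at none of its variables.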
The idea behind triviality of a variable in an inference is that the validity of the inference is "independent" of the behaviour of that variable.
Example 5.5. Recalling the Boolean threshold functions TH_k^X from Example 4.11, notice that TH_{k+1}^X → TH_k^X is trivial at each variable of X, though never at two variables simultaneously. More concretely, the linear inference x ∧ y → x ∨ y is trivial at x and at y (but not at both at once), whereas the linear inference (5.1) is trivial at all y_i simultaneously.
As observed in [Das13], the inference (5.1) above can be used to create exponential-length (constant-free) linear derivations. The idea is to construct a derivation from the conjunction of a variable set X to its disjunction, by induction on |X|, as follows, where redexes are underlined and the two intermediate derivations are obtained from the inductive hypothesis. We will show in the remainder of this section that such exponential-length rewrite paths only occur when deriving a triviality.
Remark 5.6 (Hereditariness of triviality). Notice that triviality is, in a sense, hereditary: if some step f_i → f_{i+1} (0 ≤ i < l) of a sound sequence f_0 → f_1 → ⋯ → f_l of Boolean functions is trivial at a variable x, then f_0 → f_l is trivial at x. However, the converse does not hold: if the first and last functions of a sound sequence constitute a trivial pair, it may be that there is no local triviality in the sequence. For example the endpoints of the derivation, form a pair that is trivial at w (or trivial at x), but no local step witnesses this. In these cases we call the sequence globally trivial. This phenomenon is what we will need to address later in Lemma 5.8, on which our main result crucially relies.
In a similar way to how we expressed soundness via minterms or maxterms in Proposition 4.4, we can also define triviality via minterms or maxterms.
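Proposition 5.7. Let f → g be sound and let x be one of its variables. The following are equivalent:
(1) f → g is trivial at x.
(2) For every minterm S of f with x ∈ S, there is a minterm S′ of g such that S′ ⊆ S \ {x}.
(3) For every maxterm T of g with x ∈ T, there is a maxterm T′ of f such that T′ ⊆ T \ {x}.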
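Proposition 5.7 can be spot-checked by comparing the definition of triviality with its minterm and maxterm formulations on a few sound pairs. The functions below are hypothetical helpers using brute-force enumeration, with Boolean functions given as predicates on the set of variables assigned 1:

```python
from itertools import combinations

def subsets(X):
    xs = sorted(X)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def minimal(sets):
    return {S for S in sets if not any(S2 < S for S2 in sets)}

def minterms(f, X):
    return minimal([Y for Y in subsets(X) if f(Y)])

def maxterms(f, X):
    return minimal([T for T in subsets(X) if not f(frozenset(X) - T)])

def is_trivial_at(f, g, X, x):
    """Definition 5.4: f(Y ∪ {x}) <= g(Y) for every Y not containing x."""
    return all(f(Y | {x}) <= g(Y) for Y in subsets(set(X) - {x}))

def trivial_by_minterms(f, g, X, x):
    """Item (2) of Proposition 5.7."""
    min_g = minterms(g, X)
    return all(any(S2 <= S - {x} for S2 in min_g)
               for S in minterms(f, X) if x in S)

def trivial_by_maxterms(f, g, X, x):
    """Item (3) of Proposition 5.7."""
    max_f = maxterms(f, X)
    return all(any(T2 <= T - {x} for T2 in max_f)
               for T in maxterms(g, X) if x in T)

def conj(Y):
    return {"x", "y"} <= Y

def disj(Y):
    return bool({"x", "y"} & Y)

def medial_l(Y):
    return ("w" in Y and "x" in Y) or ("y" in Y and "z" in Y)

def medial_r(Y):
    return ("w" in Y or "y" in Y) and ("x" in Y or "z" in Y)

def th3(Y):
    return len(Y) >= 3

def th2(Y):
    return len(Y) >= 2

# Sound pairs (f, g, X) on which the three formulations are compared.
PAIRS = [(conj, disj, {"x", "y"}),
         (medial_l, medial_r, {"w", "x", "y", "z"}),
         (th3, th2, {"w", "x", "y", "z"})]
```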
Let us now fix a sequence f = f 0 < f 1 < · · · < f l = g of strictly increasing read-once Boolean functions on a variable set X. Intuitively, we would like to build a decreasing chain of minterms, whence we could extract an appropriate bound for l. The problem, however, is that new minterms can appear too, for example in the case of medial (1.4), so this process does not clearly terminate in reasonable time.
To address this issue, we will show that there must exist particular chains of minterms, for each variable, which will strictly decrease sufficiently often. Unless f → g is trivial, for each variable x ∈ X we must be able to associate a minterm S_x of f such that, for any S ⊆ S_x that is a minterm of some f_i, it must be that x ∈ S. This is visualised in Figure 1 together with the dual property for maxterms.
Lemma 5.8 (Subset and intersection lemma). Suppose f → g is not trivial. For every variable x ∈ X, there is a minterm S_x of f and a maxterm T_x of g such that:
1. for every i, every minterm S of f_i with S ⊆ S_x contains x;
2. for every i, every maxterm T of f_i with T ⊆ T_x contains x;
3. for every i, if S is a minterm of f_i with S ⊆ S_x and T is a maxterm of f_i with T ⊆ T_x, then S ∩ T = {x}.
Proof. Suppose that, for some variable x, no minterm of f has property 1. In other words, for every minterm S_x of f containing x there is some minterm S_i of some f_i that is a subset of S_x yet does not contain x. Since f_i → f_l is sound for every i we have that, by Proposition 4.4, for every minterm S_x of f containing x there is some minterm S_l of f_l = g that is a subset of S_x not containing x. That is, f → g is trivial, by Proposition 5.7, which is a contradiction. Property 2 is proved analogously. Finally, Property 3 is proved by appealing to read-onceness: any such S and T must contain x by properties 1 and 2, yet their intersection must be a singleton by Theorem 4.10, since all f_i are read-once.
Notice that, since some such S i and T i must exist for all i, by soundness, we can build a chain of such minterms and maxterms preserving the intersection point. For a given derivation, let us call a choice of such minterms and maxterms critical (see Figure 1).
We now state the main result of this section, also the main technical contribution of this work, for which Lemma 5.8 will play a crucial role and from which we can obtain our further results. While we state this result for terms, in order to access simultaneously the notions of relation webs and Boolean semantics, this could equally be stated in the setting of read-once Boolean functions due to Gurvich's result, Theorem 4.10.
Theorem 5.9. Let s = t_0 < t_1 < ⋯ < t_l = t be a (strictly increasing under ≤) sequence of negation-free constant-free linear terms on a variable set X of size n, such that l > 0 and such that s → t is not trivial. Then l = O(n^4).
The remainder of this section is devoted to the proof of Theorem 5.9. For this let us fix π to denote the sequence s = t_0 < t_1 < ⋯ < t_l = t. Recall that, since t_i < t_{i+1}, t_i and t_{i+1} have distinct minterms and maxterms, by Observation 4.3, and so must have distinct relation webs, by Theorem 4.6. We now fix, for each x ∈ X and 0 ≤ i ≤ l, some choice of S_i^x and T_i^x as critical minterms and maxterms, respectively, of t_i, under Lemma 5.8. I.e. we have that, for each x ∈ X:
S_0^x ⊇ S_1^x ⊇ ⋯ ⊇ S_l^x,   T_0^x ⊆ T_1^x ⊆ ⋯ ⊆ T_l^x,   and S_i^x ∩ T_i^x = {x} for each i.
We denote the size of the critical minterms and maxterms of t_i by |S_i^x| and |T_i^x|, respectively. Now we define:
ν(t_i) := Σ_{x ∈ X} |S_i^x|   and   µ(t_i) := Σ_{x ∈ X} |T_i^x|.
Observation 5.10. Note that we always have |S_i^x|, |T_i^x| ≤ n because a minterm or maxterm is a subset of X, and therefore we have ν(t_i), µ(t_i) ≤ n² for all t_i in π.
Figure 2: In the proof of Proposition 5.11, S′ cannot contain both x and y, so we can assume without loss of generality that it does not contain x (although it need not necessarily contain y either).
The following two propositions now form the core of the argument. The first says that whenever a ∧-edge changes to a ∨-edge, some minterm strictly decreases in size, and the second one says that if a minterm strictly decreases in size then some critical maxterm must strictly increase in size. Thus the proof of Theorem 5.9 that follows again relies crucially on the interplay between the Boolean function setting and the graph-theoretic setting.
Proposition 5.11. Suppose, for some i < l, that the edge {x, y} is labelled ∧ in W(t_i) and ∨ in W(t_{i+1}). Then there is a minterm S of t_i and a minterm S′ of t_{i+1} such that S′ ⊊ S.
Proof. Take any maximal ∧-clique S in W(t_i) containing x and y, of which there must be at least one. By Proposition 4.4 and Theorem 4.6, S must have a ∧-subclique S′ which is maximal in W(t_{i+1}). Since {x, y} is labelled ∨ in W(t_{i+1}), S′ cannot contain both x and y, so the inclusion S′ ⊆ S must be strict (see Figure 2).
Proposition 5.12. Suppose that, for some j > i, there is a minterm S_i of t_i and a minterm S_j of t_j such that S_j ⊊ S_i. Then, for some variable x ∈ X, we have that T_i^x ⊊ T_j^x.
Proof. Let x be some variable in S_i \ S_j, which must be nonempty by hypothesis. By Theorem 4.10 we have that |T_i^x ∩ S_i| = 1, so it must be that T_i^x ∩ S_i = {x}, since x ∈ T_i^x by construction. On the other hand we also have that |T_j^x ∩ S_j| = 1, and so there is some (unique) y ∈ T_j^x ∩ S_j. Now, since S_j ⊊ S_i, we must have y ∈ S_i. However, we cannot have y ∈ T_i^x, since that would imply that {x, y} ⊆ T_i^x ∩ S_i (note that y ≠ x, as x ∉ S_j), contradicting the above. Since T_i^x ⊆ T_j^x, we can now conclude that T_i^x ⊊ T_j^x, as required, because y ∈ T_j^x and y ∉ T_i^x (see Figure 3).
Notice that both of the two propositions above rely crucially on the notion of linearity. Proposition 5.11 assumes the existence of relation webs for a term, a property peculiar to linear terms, whereas Proposition 5.12 does not remain true for terms that do not compute read-once Boolean functions: there is no requirement for minterms and maxterms of arbitrary Boolean functions to intersect at most once, cf. Example 4.11.
Lemma 5.13 (Increasing measure). The lexicographical product µ × e_∧ is strictly increasing at each step of π.
Proof. Notice that, by Lemma 5.8.2, we have that T_0^x ⊆ T_1^x ⊆ ⋯ ⊆ T_l^x, which means that µ is non-decreasing. Now consider a step t_i < t_{i+1}. If no edge is labelled ∧ in W(t_i) and ∨ in W(t_{i+1}) then, since W(t_i) ≠ W(t_{i+1}), some edge changes from ∨ to ∧, so e_∧ strictly increases while µ does not decrease. Otherwise, some edge is labelled ∧ in W(t_i) and ∨ in W(t_{i+1}). Hence, by Proposition 5.11, some minterm has strictly decreased in size and so, by Proposition 5.12, some critical maxterm must have strictly increased in size, i.e. µ strictly increases.
From here we can finally prove our main result.
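Proof of Theorem 5.9. By Observation 5.10 and Proposition 3.3 we have that µ(t_i) = O(n²) and e_∧(t_i) = O(n²) for every i. Since s → t is nontrivial, the critical minterms and maxterms of Lemma 5.8 exist, so by Lemma 5.13 the pair (µ, e_∧) strictly increases in the lexicographical order at each step of π. As this pair can take only O(n²) · O(n²) = O(n⁴) values, the length l of π is O(n⁴), as required.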
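The counting step can be spelled out explicitly. The following worked bound is a sketch that uses only the crude range 0 ≤ µ(t_i) ≤ n² from Observation 5.10 and the edge count from Proposition 3.3; it is meant to show where the exponent 4 comes from, not to give the sharpest constant.

```latex
\begin{align*}
  0 \le \mu(t_i) \le n^2, \qquad 0 \le e_\wedge(t_i) \le \tfrac{1}{2}n(n-1)
    &\qquad \text{(Observation 5.10, Proposition 3.3)} \\
  (\mu(t_0), e_\wedge(t_0)) <_{\mathrm{lex}} (\mu(t_1), e_\wedge(t_1))
    <_{\mathrm{lex}} \cdots <_{\mathrm{lex}} (\mu(t_l), e_\wedge(t_l))
    &\qquad \text{(Lemma 5.13)} \\
  l + 1 \le (n^2 + 1)\Bigl(\tfrac{1}{2}n(n-1) + 1\Bigr) = O(n^4)
    &\qquad \text{(the } l+1 \text{ pairs are pairwise distinct)}
\end{align*}
```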
Notice that, while the various settings exhibit a symmetry between ∧ and ∨, it is the property of soundness that induces the necessary asymmetry required to achieve this result.
Remark 5.14. Let us take a moment to reflect on what might happen if the inference that is derived were trivial. Consider the following: This derivation is trivial at x, in fact witnessed by the second inference.³ Notice that there is no 'critical' minterm for y in this derivation: the only minterm containing y on the left is {w, x, y}, but this contains a minterm {w, x} on the right. This is similarly true for z, although here the situation is rather worse: while the minterm {w, x, z} on the left indeed contains {w, x} on the right, there is no intermediate minterm. This prevents us from proving termination via a step-by-step analysis of the subsets of {w, x, z} that occur as minterms in the derivation, which we are able to do in the presence of critical minterms and maxterms.

No complete linear term rewriting system for propositional logic
Recall that a linear inference is a sound linear rewrite rule. We denote the set of all linear inferences by L. We will now show that there is no sound linear term rewriting system that is complete for L unless coNP = NP. The work in this section corresponds to point (B) in the Introduction, culminating in Theorem 6.8, and ultimately point (C) by way of Corollary 6.9.
We start with the following observation made in [Str12]:
Proposition 6.1. L is coNP-complete.
This result is the reason, from the point of view of proof theory, why one might restrict attention to only linear inferences at all: every Boolean tautology can be written as a linear inference. As we can see from the proof that follows, the translation is not very complicated, and it induces an at most quadratic blowup in size from an input tautology to a linear inference.
We include the proof here for completeness, and also since the statement here differs slightly from that in [Str12].
Proof of Proposition 6.1. That L is in coNP is due to the fact that checking soundness of a rewrite rule s → t can be reduced to checking validity of the formula s̄ ∨ t. To prove coNP-hardness, we reduce validity of general tautologies to soundness of linear rewrite rules. Let t′ be the term obtained from t (which is assumed to be in NNF) by doing the following for each positive variable x: let n be the number of occurrences of x in t, and let m be the number of occurrences of x̄ in t. If n = 0, replace every occurrence of x̄ by ⊥, and if m = 0, replace every occurrence of x by ⊥. Otherwise, introduce 2mn fresh (positive) variables x_{i,j}, x′_{i,j} for 1 ≤ i ≤ n and 1 ≤ j ≤ m. Now, for 1 ≤ i ≤ n, replace the i-th occurrence of x by x_{i,1} ∨ ⋯ ∨ x_{i,m} and, for 1 ≤ j ≤ m, replace the j-th occurrence of x̄ by x′_{1,j} ∨ ⋯ ∨ x′_{n,j}. Now t′ is a linear term (without negation), and its size is quadratic in the size of t. Let s′ be the conjunction of all the disjunctions x_{i,j} ∨ x′_{i,j} of pairs of variables introduced in the construction of t′. Clearly Var(s′) = Var(t′), and s′ is also a linear term, of size linear in that of t′. Furthermore, t is a tautology if and only if s′ → t′ is sound. To see this, let s″ and t″ be obtained from s′ and t′, respectively, by replacing each x′_{i,j} by x̄_{i,j}. Then s″ always evaluates to 1, and t″ is a tautology if and only if t is a tautology. Moreover, s′ → t′ is sound if and only if t″ is a tautology: the forward direction holds because s″ → t″ is a substitution instance of s′ → t′, and for the converse, given an assignment satisfying s′, setting each x′_{i,j} to the negation of x_{i,j} only lowers variables that were 1 (since x_{i,j} ∨ x′_{i,j} holds), so, t′ being monotone, its value can only decrease.
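The construction in this proof is easy to run on small inputs. The sketch below assumes a hypothetical NNF encoding (literals `"x"` and `"~x"`, constants `True`/`False` for ⊤/⊥, compound terms `(op, left, right)`) and names the fresh variables x_{i,j} and x′_{i,j} as `"x[i,j]"` and `"x'[i,j]"`; soundness and validity are checked by truth tables, so it only scales to tiny examples.

```python
from collections import Counter
from functools import reduce
from itertools import product

def literals(t):
    """Literal occurrences of t, from left to right."""
    if isinstance(t, bool):
        return []
    if isinstance(t, str):
        return [t]
    return literals(t[1]) + literals(t[2])

def disjunction(terms):
    return reduce(lambda a, b: ("or", a, b), terms)

def linearise(t):
    """Return (s', t') as in the proof of Proposition 6.1."""
    occ = Counter(literals(t))
    seen = Counter()

    def walk(u):
        if isinstance(u, bool):
            return u
        if isinstance(u, str):
            x = u.lstrip("~")
            n, m = occ[x], occ["~" + x]
            if n == 0 or m == 0:
                return False                 # only one polarity occurs: use ⊥
            seen[u] += 1
            k = seen[u]                      # u is seen for the k-th time
            if u.startswith("~"):
                return disjunction([f"{x}'[{i},{k}]" for i in range(1, n + 1)])
            return disjunction([f"{x}[{k},{j}]" for j in range(1, m + 1)])
        return (u[0], walk(u[1]), walk(u[2]))

    t_lin = walk(t)
    pairs = [("or", f"{x}[{i},{j}]", f"{x}'[{i},{j}]")
             for x in sorted({lit.lstrip("~") for lit in occ})
             for i in range(1, occ[x] + 1)
             for j in range(1, occ["~" + x] + 1)]
    s_lin = reduce(lambda a, b: ("and", a, b), pairs) if pairs else True
    return s_lin, t_lin

def evaluate(t, val):
    if isinstance(t, bool):
        return t
    if isinstance(t, str):
        return not val[t[1:]] if t.startswith("~") else val[t]
    a, b = evaluate(t[1], val), evaluate(t[2], val)
    return (a and b) if t[0] == "and" else (a or b)

def atoms(*terms):
    return sorted({lit.lstrip("~") for t in terms for lit in literals(t)})

def is_tautology(t):
    xs = atoms(t)
    return all(evaluate(t, dict(zip(xs, bits)))
               for bits in product([False, True], repeat=len(xs)))

def is_sound(s, t):
    xs = atoms(s, t)
    for bits in product([False, True], repeat=len(xs)):
        val = dict(zip(xs, bits))
        if evaluate(s, val) and not evaluate(t, val):
            return False
    return True

EXAMPLES = [
    ("or", "x", "~x"),                                # tautology
    ("and", "x", "~x"),                               # not a tautology
    ("or", ("or", ("and", "x", "y"), "~x"), "~y"),    # tautology
    ("or", ("and", "x", "x"), "~x"),                  # tautology, x twice
    ("or", ("and", "x", "~y"), "~x"),                 # not a tautology
]
```

On each example, `is_sound(*linearise(t))` agrees with `is_tautology(t)`, and both components returned by `linearise` are linear.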
In the next step we extend the result of the previous section to all linear inferences, i.e., we have to deal with constants, negation, erasure, and trivialities. Some of the following results appeared already in [Das13], so we present only brief arguments here.
Definition 6.2. We define the following rules:
s : x ∧ (y ∨ z) → (x ∧ y) ∨ z
m : (w ∧ x) ∨ (y ∧ z) → (w ∨ y) ∧ (x ∨ z)
We call the former switch and the latter medial [BT01].
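As a quick sanity check (a brute-force sketch of our own, not part of the original development), both rules s : x ∧ (y ∨ z) → (x ∧ y) ∨ z and m : (w ∧ x) ∨ (y ∧ z) → (w ∨ y) ∧ (x ∨ z) can be confirmed sound by enumerating all assignments, whereas their converses are not:

```python
from itertools import product

# Switch and medial written as Boolean functions of their variables.
def switch_lhs(x, y, z): return x and (y or z)
def switch_rhs(x, y, z): return (x and y) or z
def medial_lhs(w, x, y, z): return (w and x) or (y and z)
def medial_rhs(w, x, y, z): return (w or y) and (x or z)

def sound(lhs, rhs, arity):
    """lhs -> rhs is sound iff every assignment satisfying lhs satisfies rhs."""
    return all(not lhs(*v) or rhs(*v)
               for v in product([False, True], repeat=arity))

print(sound(switch_lhs, switch_rhs, 3), sound(medial_lhs, medial_rhs, 4))  # True True
print(sound(switch_rhs, switch_lhs, 3), sound(medial_rhs, medial_lhs, 4))  # False False
```

For switch the converse fails at x = 0, z = 1; for medial it fails at w = z = 1, x = y = 0.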
In what follows we implicitly assume that rewriting is conducted modulo ACU.
(2) s → t is sound and nontrivial.
Proof. See [Das13]. Briefly, the idea is that u is obtained by repeatedly 'moving aside' trivial variables, using s, m and ACU, until there are no trivialities remaining in s → t. The bound of O(n²) is not explicitly mentioned in [Das13], but it is clear from direct inspection of that construction.
Remark 6.4. Notice that, while the derivations from Lemma 6.3.(1) above are small in size, they are in general difficult to compute, due to the inherent complexity of detecting triviality. This problem is in fact already coNP-complete, since soundness of an arbitrary linear inference s → t can be reduced to detecting triviality at x in s ∧ x → t ∨ x, where x is fresh. This is not an issue in what follows, since we are only concerned with the existence of small derivations for various inferences, and hence with the existence of an NP algorithm.
A left- and right-linear rewrite rule may still erase or introduce variables, i.e. there may be variables on one side that do not occur on the other.⁴ However, notice that any such situation must constitute a triviality at such a variable, since the soundness of the step does not depend on the value of that variable.
Proposition 6.5. Suppose ρ : l → r is linear, and there is some variable x occurring in only one of l and r. Then ρ is trivial at x.
If a (positive) variable x occurs negatively on both sides of a linear rule, then x̄ can be replaced soundly by x on both sides. Otherwise, if x occurs positively on one side and negatively on the other, there must be a triviality at x.

Proposition 6.6. For each linear rule ρ, either there is a negation-free linear rule that is equivalent to ρ (i.e. with the same reduction steps), or ρ is trivial.
Recall that ACU preserves the Boolean function computed by a term, and that every linear term is ACU-equivalent to ⊥, ⊤, or a unique constant-free linear term. Let us write R · S for the composition of relations R and S, and =_ACU for equivalence under ACU.
Proposition 6.7. If R is a complete linear system, then any constant-free nontrivial linear inference has a constant-free derivation in =_ACU · →_R · =_ACU.
Proof. Let s → t be a constant-free nontrivial linear inference. By completeness there is an R-derivation of s → t, in which we may simply reduce every line by ACU to a constant-free term, ⊥ or ⊤. However, if some line were to reduce to ⊥ or ⊤, then either s or t would contain a constant, by soundness and Corollary 4.9, so the resulting sequence is a derivation of the appropriate format.

Now, combining our results from Section 5 with the normal forms obtained above, we arrive at the main result of this work:

Theorem 6.8. If there is a sound and complete linear system for L, then there is one that has an O(n⁴)-length derivation for each linear inference on n variables.
Proof. Assume we have a sound and complete linear system R for L, and let s → t be a linear inference on n variables. By Lemma 6.3 we have linear terms s′, t′ such that |s′| ≤ |s| and s′ → t′ is sound, linear, and nontrivial. By Propositions 6.5 and 6.6 and reduction under ACU, we can assume that s′, t′ have the same size and are free of negation and constants.⁵ By Proposition 6.7 there is thus a derivation of s′ → t′ in =_ACU · →_R · =_ACU that is constant-free and negation-free. We can assume that each term in this derivation computes a distinct Boolean function, by Corollary 4.9, and so, by Theorem 5.9, the length of this derivation is O(n⁴). Finally, by Lemma 6.3.(1), this means that we can construct a derivation of s → t with overall length O(n⁴) in R ∪ {s, m} ∪ ACU.
Corollary 6.9. There is no sound linear system complete for L unless coNP = NP.
Proof. By Proposition 6.1, L is coNP-complete, and the existence of such a system would lead to an NP decision procedure for L by Theorem 6.8: for any linear inference on n variables we could simply guess a correct O(n⁴)-length derivation in an appropriate system.

On the canonicity of switch and medial
In this section we investigate to what extent the two rules switch and medial from Definition 6.2, which play a crucial role in the proof theory of classical propositional logic, are "canonical". Let us restrict our attention to constant-free terms and rules for this section.
Recall that the switch and medial rules are as follows:
s : x ∧ (y ∨ z) → (x ∧ y) ∨ z
m : (w ∧ x) ∨ (y ∧ z) → (w ∨ y) ∧ (x ∨ z)
First we observe that both rules are minimal in the following sense:

Definition 7.1. A sound linear rewrite rule ρ : l → r is minimal if there is no linear term t on the same variables as l and r such that l < t < r.
Proposition 7.2. Switch and medial are minimal.
Proof. By exhaustive search on all terms of size 3 (for switch) and 4 (for medial).
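The exhaustive search mentioned in the proof can be reproduced along the following lines. This is a sketch under our own assumptions: terms are represented only by their truth tables, every constant-free, negation-free linear term on a given set of variables is generated by recursively splitting the set into two blocks joined by ∧ or ∨, and l < t < r is read as strict pointwise order on truth tables. The helper names are hypothetical.

```python
from itertools import combinations, product

def read_once_tables(n):
    """Truth tables of all constant-free, negation-free linear terms using each
    of the variables 0..n-1 exactly once (one table per term up to AC)."""
    rows = list(product([False, True], repeat=n))
    memo = {}

    def tables(vs):
        if vs not in memo:
            if len(vs) == 1:
                (i,) = vs
                memo[vs] = {tuple(row[i] for row in rows)}
            else:
                first, *rest = sorted(vs)
                result = set()
                # split vs into non-empty blocks a and vs - a; `first` stays in a
                for k in range(len(rest)):
                    for extra in combinations(rest, k):
                        a = frozenset((first, *extra))
                        for fa in tables(a):
                            for fb in tables(vs - a):
                                result.add(tuple(p and q for p, q in zip(fa, fb)))
                                result.add(tuple(p or q for p, q in zip(fa, fb)))
                memo[vs] = result
        return memo[vs]

    return rows, tables(frozenset(range(n)))

def leq(f, g):
    return all(not p or q for p, q in zip(f, g))

def is_minimal(lhs, rhs, n):
    """True if no linear term t on the same n variables has lhs < t < rhs."""
    rows, terms = read_once_tables(n)
    l = tuple(lhs(*row) for row in rows)
    r = tuple(rhs(*row) for row in rows)
    assert leq(l, r), "the rule must be sound"
    return not any(leq(l, t) and leq(t, r) and t not in (l, r) for t in terms)

switch = (lambda x, y, z: x and (y or z), lambda x, y, z: (x and y) or z)
medial = (lambda w, x, y, z: (w and x) or (y and z),
          lambda w, x, y, z: (w or y) and (x or z))
print(is_minimal(*switch, 3), is_minimal(*medial, 4))  # True True
```

For switch, the only monotone function strictly between the two sides is the majority function xy ∨ xz ∨ yz, which is not computed by any linear term, so the search finds nothing.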
Observe that, seen as an action on relation webs, switch and medial preserve ∨-edges and ∧-edges, respectively. Formally, let us consider the following two properties of a linear inference ρ:
(*) If s →_ρ t then, whenever xy is a ∨-edge in W(s), xy is also a ∨-edge in W(t).
(**) If s →_ρ t then, whenever xy is a ∧-edge in W(s), xy is also a ∧-edge in W(t).
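The edge behaviour of the two rules can be checked directly on their left- and right-hand sides. In the sketch below (our own encoding: a linear term is a variable name or a triple (op, a, b); helper names are hypothetical), W(t) labels each pair of variables with the connective at their least common ancestor, and both sides are assumed to use the same variables. Since a rewrite step in context only relabels edges between variables of the redex, and does so exactly as the rule relabels edges between its own variables, checking the rule itself is enough for (*) and (**).

```python
def variables(t):
    return {t} if isinstance(t, str) else variables(t[1]) | variables(t[2])

def web(t):
    """Relation web W(t): label each pair of variables with the connective
    found at their least common ancestor in the syntax tree of t."""
    if isinstance(t, str):
        return {}
    op, a, b = t
    edges = {**web(a), **web(b)}
    for x in variables(a):
        for y in variables(b):
            edges[frozenset((x, y))] = op
    return edges

def preserves(lhs, rhs, label):
    """True if every `label`-edge of W(lhs) is still a `label`-edge of W(rhs)."""
    w_lhs, w_rhs = web(lhs), web(rhs)
    return all(w_rhs[e] == label for e, lab in w_lhs.items() if lab == label)

switch = (("and", "x", ("or", "y", "z")), ("or", ("and", "x", "y"), "z"))
medial = (("or", ("and", "w", "x"), ("and", "y", "z")),
          ("and", ("or", "w", "y"), ("or", "x", "z")))
print(preserves(*switch, "or"), preserves(*switch, "and"))  # True False
print(preserves(*medial, "and"), preserves(*medial, "or"))  # True False
```

Switch turns the ∧-edge xz into a ∨-edge, and medial turns the ∨-edges wz and xy into ∧-edges, so each rule has exactly one of the two properties.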
Our first canonicity result is that medial is the only sound linear inference that is minimal and satisfies (**). In fact, we will show the stronger property that any sound linear rule satisfying (**) is already derivable by medial. First, we will require a certain relation between the webs of terms, which was defined in [Str07a]. This relation allows us to relate structural properties of graphs to derivability by medial, via the characterisation result below. The proof from [Str07a] relies on a careful analysis of subterms, which is beyond the scope of this paper. Using this result we obtain:

Theorem 7.5. Let s and t be linear terms on a variable set X. The following are equivalent: (1) s ≤ t and, for all x, y ∈ X, if xy is a ∧-edge in W(s) then xy is a ∧-edge in W(t).
For the proof, let us say, if t is a linear term with x, y, z ∈ Var(t), that y separates x from z in W(t) if xy is a ∧-edge and yz is a ∨-edge in W(t).
Proof of Theorem 7.5. We have that 2 ⟹ 3 by Proposition 7.4 and 3 ⟹ 1 by inspection of medial, so it suffices to show 1 ⟹ 2. For this, assume 1, suppose that xy is a ∨-edge in W(s) and a ∧-edge in W(t), and let S be a minterm of s containing x. We must have S ≠ {x}, since xy is a ∧-edge in W(t) and s → t is sound.⁶ Similarly, there must be a maxterm T of t containing y such that T ≠ {y}. Now, by 1, it must be that S (resp. T) is also a minterm (resp. maxterm) of t (resp. s),⁷ and so, by Theorem 4.10, there is some (unique) z ∈ S ∩ T which, by definition, separates x from y in both W(s) and W(t). By a symmetric argument we obtain a w separating y from x in both W(s) and W(t). By construction, w and z must be distinct, so we have the following situation: in both W(s) and W(t), xz and yw are ∧-edges and zy and wx are ∨-edges, while xy is a ∨-edge in W(s) and a ∧-edge in W(t).
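The clique-based reasoning used in this proof can be tried out on small examples. The sketch below (our own encoding and hypothetical helper names) computes minterms (maximal ∧-cliques) and maxterms (maximal ∨-cliques) of a constant-free, negation-free linear term, compares the criterion "every minterm of s meets every maxterm of t" (the one recalled in footnote 13) with a truth-table check of soundness, and checks on these examples that each minterm and maxterm of a term share exactly one variable, which is the property Theorem 4.10 is used for above.

```python
from itertools import product

# Constant-free, negation-free linear terms: a variable name (str) or (op, a, b).
def variables(t):
    return {t} if isinstance(t, str) else variables(t[1]) | variables(t[2])

def evaluate(t, env):
    if isinstance(t, str):
        return env[t]
    op, a, b = t
    if op == "and":
        return evaluate(a, env) and evaluate(b, env)
    return evaluate(a, env) or evaluate(b, env)

def minterms(t):
    """Maximal ∧-cliques of W(t), i.e. the minterms of the linear term t."""
    if isinstance(t, str):
        return [frozenset([t])]
    op, a, b = t
    if op == "or":  # a minterm of a ∨ b is a minterm of a or of b
        return minterms(a) + minterms(b)
    return [p | q for p in minterms(a) for q in minterms(b)]

def maxterms(t):
    """Maximal ∨-cliques of W(t), dual to minterms."""
    if isinstance(t, str):
        return [frozenset([t])]
    op, a, b = t
    if op == "and":
        return maxterms(a) + maxterms(b)
    return [p | q for p in maxterms(a) for q in maxterms(b)]

def sound_by_cliques(s, t):
    # every minterm of s must share a variable with every maxterm of t
    return all(S & T for S in minterms(s) for T in maxterms(t))

def sound_by_truth_table(s, t):
    xs = sorted(variables(s) | variables(t))
    return all(not evaluate(s, env) or evaluate(t, env)
               for env in (dict(zip(xs, bits))
                           for bits in product([False, True], repeat=len(xs))))

switch = (("and", "x", ("or", "y", "z")), ("or", ("and", "x", "y"), "z"))
medial = (("or", ("and", "w", "x"), ("and", "y", "z")),
          ("and", ("or", "w", "y"), ("or", "x", "z")))
for l, r in (switch, medial):
    for s, t in ((l, r), (r, l)):
        assert sound_by_cliques(s, t) == sound_by_truth_table(s, t)
    for u in (l, r):  # each minterm meets each maxterm in exactly one variable
        assert all(len(S & T) == 1 for S in minterms(u) for T in maxterms(u))
```

The recursive definitions rely on linearity: the two arguments of each connective have disjoint variables, so no absorption step is needed to obtain exactly the minterms and maxterms.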
Corollary 7.6 (Canonicity of medial). Medial is the only sound linear inference that is minimal and has property (**).
Proof. By Theorem 7.5, any linear inference satisfying (**) can be derived by medial. The result then follows by minimality of medial.
Using these results, we are actually able to improve the length bound on nontrivial linear derivations that we proved earlier:

Corollary 7.7. The bound in Theorem 5.9 can be improved to O(n³).
For the proof, let us first define #∧(t) (resp. #∨(t)) to be the number of ∧ (resp. ∨) symbols occurring in t.
Proof of Corollary 7.7. Instead of using e_∧ in Lemma 5.13, use #∨, which is linear in the size of the term. If no ∧-edge changes to a ∨-edge in some step, it follows by Theorem 7.5 that the step is derivable using medial, and so #∨ must have strictly increased.
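The counting functions #∧ and #∨ are straightforward to compute. The sketch below (hypothetical helper `count`, using our own tuple encoding of linear terms) shows that a medial step raises #∨ by one, while switch leaves both counts unchanged:

```python
def count(t, op):
    """Number of occurrences of the connective `op` in a linear term."""
    if isinstance(t, str):
        return 0
    return int(t[0] == op) + count(t[1], op) + count(t[2], op)

switch = (("and", "x", ("or", "y", "z")), ("or", ("and", "x", "y"), "z"))
medial = (("or", ("and", "w", "x"), ("and", "y", "z")),
          ("and", ("or", "w", "y"), ("or", "x", "z")))
for name, (l, r) in (("switch", switch), ("medial", medial)):
    print(name, "#and:", count(l, "and"), "->", count(r, "and"),
          "#or:", count(l, "or"), "->", count(r, "or"))
# switch #and: 1 -> 1 #or: 1 -> 1
# medial #and: 2 -> 1 #or: 1 -> 2
```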
While we have just shown a fairly succinct form of canonicity for medial, it turns out that we cannot obtain an analogous result for switch: switch is not the only sound linear inference that is minimal and satisfies (*). To see this, simply recall the example (1.7) from the Introduction. Notice, however, that this inference does not preserve the number #∧ of conjunction symbols in a term. In fact, switch is the only nontrivial linear inference we know of that preserves #∧, although there are known trivial examples that even increase #∧, for instance the "supermix" rules from [Das13] that we considered earlier in Example 5.5, (5.1). This leads us to the following conjecture:

Conjecture 7.8. If s → t is sound, nontrivial, satisfies (*) and #∧(s) ≤ #∧(t), then s →*_s t.
Notice that this conjecture would already imply our main result, Theorem 5.9, since #∧ × e_∧ would be a strictly decreasing measure. This measure can also be used for the usual proof of termination of {s, m} (constant-free and modulo AC), and also yields a cubic bound on termination.⁸ We point out that, in this work, we have matched that bound for all linear derivations that are not trivial.
The supermix rules are also examples of linear inferences that satisfy neither (*) nor (**). However, again, we have not been able to identify any nontrivial examples of this, and we further conjecture the following:

Conjecture 7.9. There is no nontrivial minimal sound linear inference that satisfies neither (*) nor (**).
An interesting observation is that Conjecture 7.9 and Corollary 7.6 together entail that medial is the only linear inference that allows contraction to be reduced to atomic form. To see what this means, consider again (1.6) from the Introduction. The steps marked c↓ are instances of the contraction rule x ∨ x → x. If the contractum of such a step is simply a variable, then we call that instance of contraction atomic, denoted by ac↓ as in [BT01]. Dually, the atomic instances of 'cocontraction' x → x ∧ x, when the redex is simply a variable, are denoted by ac↑. We say that a linear inference ρ : l → r reduces contraction to atomic form if, for every term t, we have t ∨ t →*_{ρ,ac↓} t and t →*_{ρ,ac↑} t ∧ t modulo ACU.

Conjecture 7.10. Medial is the only minimal linear inference that reduces contraction to atomic form. More precisely, for every linear inference ρ : l → r that reduces contraction to atomic form we have l →*_m r.
Proof using Conjecture 7.9. Assume t ∨ t →*_{ρ,ac↓} t and t →*_{ρ,ac↑} t ∧ t modulo ACU, for every term t. Since t can contain both ∨ and ∧, it must be the case that ρ replaces ∨-edges in W(l) by ∧-edges in W(r). By Conjecture 7.9, ρ does not replace ∧-edges in W(l) by ∨-edges in W(r). By Theorem 7.5 we must have l →*_m r.
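For concreteness, here is a small worked instance (our own choice of t, not the derivation (1.6) itself) of how medial, together with AC and the atomic rules, reduces contraction and cocontraction on t = a ∧ b and t = a ∨ b to atomic steps:

```latex
\begin{aligned}
(a \wedge b) \vee (a \wedge b) &\to_{m} (a \vee a) \wedge (b \vee b) \to_{ac\downarrow} a \wedge (b \vee b) \to_{ac\downarrow} a \wedge b \\
(a \vee b) \vee (a \vee b) &=_{AC} (a \vee a) \vee (b \vee b) \to_{ac\downarrow} a \vee (b \vee b) \to_{ac\downarrow} a \vee b \\
a \vee b &\to_{ac\uparrow} (a \wedge a) \vee b \to_{ac\uparrow} (a \wedge a) \vee (b \wedge b) \to_{m} (a \vee b) \wedge (a \vee b) \\
a \wedge b &\to_{ac\uparrow} (a \wedge a) \wedge b \to_{ac\uparrow} (a \wedge a) \wedge (b \wedge b) =_{AC} (a \wedge b) \wedge (a \wedge b)
\end{aligned}
```

Only the cases where the outer connective of t differs from that of the (co)contraction need medial; the other two are handled by AC alone.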

On the normalisation of deep inference proofs
Another application of our results is to the normalisation of deep inference proofs. This is typically done via rewriting on certain graphs extracted from derivations, known as atomic flows [GG08,GGS10]. The main sources of complexity here are 'contraction loops', and so a lot of effort has gone into the question of whether such features can be eliminated. A consequence of our main result is that this is impossible for a large class of deep inference systems.
We will now only consider rewriting systems on positive terms, and then make some remarks about negative rules at the end of this section. We consider systems with the standard structural rules of deep inference, extended by an arbitrary (polynomial-time decidable) set of linear rules.
A formal definition of atomic flows can be found in [GG08], where they were first presented, and an alternative presentation can be found in [GGS10]. We give an informal definition below which is sufficient for our purposes.
Definition 8.1 (Structural rules and atomic flows). We define the system cw as follows:
If S is the extension of cw by a set of linear rules and π is an S-derivation (written as a vertical list), then the atomic flow of π, denoted fl(π), is the (downwards directed) graph obtained by tracing the paths of each variable through the derivation, designating a node at each w↓, w↑, c↑ and c↓ step.

Example 8.2. Consider the system MSKS obtained by extending cw by the rules switch and medial from Definition 6.2, as well as the rules ACU from Section 2 for associativity, commutativity and constants. This is equivalent to the monotone fragment of the common deep inference system SKS [BT01].
Here is an example of an MSKS rewrite derivation, with redexes underlined, and its atomic flow. The colours are used to help the reader associate edges with variable occurrences in the derivation.
Definition 8.3 (Flow rewriting systems). A flow rewriting system (FRS) is a set of graph rewriting rules on atomic flows. We say that an FRS R lifts to a TRS S if, for every S-derivation π : s →*_S t and reduction step fl(π) →_R φ, there is an S-derivation π′ : s →*_S t with fl(π′) = φ.
Example 8.4. Consider the following FRS, which is a subset of rules occurring in [GG08,GGS10] and which is called norm in [Das14]. We have essentially the following result from [GG08]:

Proposition 8.5. norm lifts to any extension of MSKS by linear rules.
The proof of this is beyond the scope of this work, but it crucially relies on the presence of switch, medial and ACU to make the w and c rules atomic, cf. (1.6), and thereby allow these steps to permute more freely in a derivation.
For example, here is a norm-derivation that normalises the flow from (8.1), with its redexes marked.
norm is strongly normalising, as implied by results in [GG08]. As identified in [Das12] and [Das15], the main source of complexity of (weak) normalisation under norm is the presence of contraction loops; in their absence, the time complexity of normalisation is polynomially bounded.
It turns out that our previous results imply that no deep inference system that extends MSKS by linear rules can admit a flow-rewriting normalisation procedure that eliminates contraction loops:

Theorem 8.7. Let R be an FRS such that, for any flow φ, there is some flow ψ free of contraction loops such that φ →*_R ψ. Then R lifts to no sound system extending MSKS by linear rules unless coNP = NP.
Before giving the proof, let us first make the following observation:

Proposition 8.8. If a flow φ is free of contraction loops and φ →*_norm ψ, then ψ is also free of contraction loops.
intersects every ∨-maxclique of W(t).¹³ However, when generalised to arbitrary graphs, this relation is not even reflexive, again because of the case of a P4 configuration (3.1).
In further work we would like to study the logics induced by the relations →_∧ and →_∨, and even systems where one may alternate between them any time a graph is, say, P4-free. Such systems would be sound for Boolean logic when the source and target are P4-free, under the association of a term to its web. They would also leave the world of Boolean functions altogether, as we previously mentioned, which bears resemblance to algebraic proof systems for propositional logic such as Cutting Planes and Nullstellensatz (studied in, for example, [BPR97] and [BIK+97]).
Furthermore, notice that our crucial Lemma 5.8 cannot immediately be generalised to the setting of arbitrary graphs, due to the fact that ∧-maxcliques no longer necessarily intersect ∨-maxcliques. It would be particularly interesting to examine the extent to which 'linear reasoning' can be recovered in this setting, sidestepping the shortcomings of P4-free graphs (i.e. terms) we have studied in this work.

Final remarks
To some extent, this work can be seen as a justification for the approach of 'structural' proof theory: for any deductive system that can be embedded into a rewriting framework on Boolean terms, as we have considered here, completeness requires the inclusion of structural rules that introduce, destroy and duplicate formulae, unless coNP = NP. It is not difficult to see that this covers a large class of proof systems, including essentially all the well-known systems based on formulae or related structures, e.g. Gentzen sequent calculi, Hilbert-Frege systems, Resolution, deep inference systems etc. On the other hand, as we mentioned in Section 9, proof systems based on other objects such as algebraic equations or graphs are not covered by our result. While the observation that structural behaviour is somewhat necessary for proof theory is perhaps not surprising, it is of natural theoretical interest.
There are clear thematic relationships between this line of work and linear logic. In some ways, we can see this work as contributing to the study of the 'multiplicative' fragment of Boolean logic. One particular connection we would like to point out is with Blass' model of linear logic in [Bla92], the first game semantics model of linear logic. The multiplicative fragment of this model in fact validates precisely the sound linear inferences of Boolean logic,¹⁴ which he calls 'binary tautologies'. Following from the paragraph above, it would seem that one drawback of this model is that it can admit no sound and complete proof system, unless coNP = NP, by virtue of our results.
Finally, this work contributes to the study of term rewriting systems for Boolean algebras. While complete axiomatisations have been known since the early 20th century, due to Whitehead, Huntington, Tarski and others, these are typically sets of equations rather than 'directed' rewrite rules, which are more closely related to proof theory. It has been known for some time, for example, that there is no convergent TRS for Boolean algebras [Soc91]; our result, in the same vein, shows that there is no sound and complete linear TRS for the linear fragment of Boolean algebras, unless coNP = NP.
13 If s evaluates to 1, then one of its minterms must entirely be assigned to 1, and if this intersects every maxterm of t, then no maxterm of t is entirely assigned to 0, so t must also evaluate to 1. Conversely, if some minterm of s and some maxterm of t do not intersect, then we can simultaneously force s to evaluate to 1 and t to evaluate to 0.
14 Under the association of ⊗ with ∧ and ⅋ with ∨.