Optimizing tree decompositions in MSO

The classic algorithm of Bodlaender and Kloks [J. Algorithms, 1996] solves the following problem in linear fixed-parameter time: given a tree decomposition of a graph of (possibly suboptimal) width k, compute an optimum-width tree decomposition of the graph. In this work, we prove that this problem can also be solved in mso in the following sense: for every positive integer k, there is an mso transduction from tree decompositions of width k to tree decompositions of optimum width. Together with our recent results [LICS 2016], this implies that for every k there exists an mso transduction which inputs a graph of treewidth k, and nondeterministically outputs its tree decomposition of optimum width. We also show that mso transductions can be implemented in linear fixed-parameter time, which enables us to derive the algorithmic result of Bodlaender and Kloks as a corollary of our main result.


Introduction
Consider the following problem: given a tree decomposition of a graph of some width k, possibly suboptimal, we would like to compute an optimum-width tree decomposition of the graph. A classic algorithm of Bodlaender and Kloks [BK96] solves this problem in linear fixed-parameter time complexity, where the input width k is the parameter.
Theorem 1.1 (Bodlaender and Kloks, [BK96]). There exists an algorithm that, given a graph G on n vertices and its tree decomposition of width k, runs in time $2^{O(k^3)} \cdot n$ and returns a tree decomposition of G of optimum width.
The algorithm of Bodlaender and Kloks applies a dynamic programming procedure that processes the input decomposition in a bottom-up manner. For every subtree, a set of partial optimum-width decompositions is computed. The crucial ingredient is a combinatorial analysis of partial decompositions which shows that only some small subset of them, of size bounded only by a function of k, needs to be remembered for future computation. complicated, and we hope that it will find applications for computing other width measures. In fact, a similar approach has very recently been used by Giannopoulou et al. [GPR+19] in a much simpler setting of cutwidth to give a new fixed-parameter algorithm for this graph parameter.
Next, we derive a corollary of the Dealternation Lemma called the Conflict Lemma, which directly prepares us to construct the mso transduction for the Bodlaender-Kloks problem. The Conflict Lemma is stated in purely combinatorial terms, but intuitively it shows that some optimum-width tree decomposition of the graph can be interpreted in the given suboptimum-width tree decomposition using subtrees that cross each other in a restricted fashion, guessable in mso. Finally, we formalize the intuition given by the Conflict Lemma in mso, thus constructing the mso transduction promised in our main result.

Preliminaries and statement of the main result
Trees, forests and tree decompositions. Throughout this paper all graphs are undirected, unless explicitly stated otherwise. A forest (which is sometimes called a rooted forest in other contexts) is defined to be an acyclic graph, where every connected component has one designated node called the root. This naturally imposes parent-child and ancestor-descendant relations in a (rooted) forest. We use the usual tree terminology: root, leaf, child, parent, descendant and ancestor. We assume that every node is its own descendant; to exclude the node itself, we use the name strict descendant, and likewise for ancestors. For forests we often use the name node instead of vertex. A tree is the special case of a forest that is connected and thus has one root. Two nodes in a forest are called siblings if they have a common parent, or if they are both roots. Note that there is no order on siblings, unlike some models of unranked forests where siblings are ordered from left to right.
A tree decomposition of a graph G is a pair t = (F, bag), where F is a rooted forest and bag(·) is a function that associates bags to the nodes of F. A bag is a nonempty subset of vertices of G. We require the following two properties: (T1) whenever uv is an edge of G, then there exists a node of F whose bag contains both u and v; and (T2) for every vertex u of G, the set of nodes of F whose bags contain u is nonempty and induces a connected subtree in F. The width of a tree decomposition is its maximum bag size minus 1, and the treewidth of a graph is the minimum width over all its tree decompositions. An optimum-width tree decomposition is one whose width is equal to the treewidth of the underlying graph. Note that throughout this paper all tree decompositions will be rooted forests. This slightly diverges from the literature, where the shape of a tree decomposition is usually an unrooted tree.
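As an illustration of the definition, the following minimal Python sketch checks properties (T1) and (T2) and computes the width. The encoding is an assumption made only for this example: the forest is a `parent` map (roots map to `None`), `bag` maps nodes to vertex sets, and the function names are hypothetical.

```python
def is_tree_decomposition(vertices, edges, parent, bag):
    """Check (T1) and (T2) for a rooted forest given as a parent map."""
    # Every bag must be a nonempty subset of the vertex set.
    if any(not bag[x] or not bag[x] <= vertices for x in parent):
        return False
    # (T1): every edge of the graph is contained in some bag.
    for u, v in edges:
        if not any(u in bag[x] and v in bag[x] for x in parent):
            return False
    # (T2): the nodes whose bags contain u are nonempty and connected.
    for u in vertices:
        nodes = {x for x in parent if u in bag[x]}
        if not nodes:
            return False
        # In a rooted forest, a nonempty node set is connected if and only
        # if exactly one of its nodes has no parent inside the set.
        tops = [x for x in nodes if parent[x] not in nodes]
        if len(tops) != 1:
            return False
    return True


def width(bag):
    """Maximum bag size minus 1."""
    return max(len(b) for b in bag.values()) - 1


# The path a - b - c - d, decomposed along the chain of nodes 0, 1, 2.
vertices = {"a", "b", "c", "d"}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
parent = {0: None, 1: 0, 2: 1}
bag = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}}
print(is_tree_decomposition(vertices, edges, parent, bag), width(bag))
```

The connectivity test relies on the fact that a set of nodes of a forest with exactly one topmost node induces a subtree, which avoids an explicit graph search.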
For a tree decomposition t = (F, bag) of a graph G, and each node x of F, we define the following vertex sets:
• The adhesion of x, denoted adh(x), is equal to bag(x) ∩ bag(x′), where x′ is the parent of x in F. If x is a root of F, we define its adhesion to be empty.
• The margin of x, denoted mrg(x), is equal to bag(x) \ adh(x).
• The component of x, denoted cmp(x), is the union of the margins of all the descendants of x (including x itself). Equivalently, it is the union of the bags of all the descendants of x, minus the adhesion of x.
Whenever the tree decomposition t is not clear from the context, we specify it in the subscript, i.e., we use operators $\mathrm{bag}_t(\cdot)$, $\mathrm{adh}_t(\cdot)$, $\mathrm{mrg}_t(\cdot)$, and $\mathrm{cmp}_t(\cdot)$.
Observe that, by property (T2) of a tree decomposition, for every vertex u of G there is a unique node whose bag contains u, but the bag of its parent (if it exists) does not contain u. In other words, there is a unique node whose margin contains u. Consequently, the margins of the nodes of a tree decomposition form a partition of the vertex set of the underlying graph.
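The three vertex sets and the partition property can be sketched in Python as follows. This is only an illustration under an assumed encoding (a `parent` map with `None` for roots and a `bag` map); the helper names are hypothetical.

```python
def adhesion(parent, bag, x):
    """Intersection of bag(x) with the parent's bag; empty for a root."""
    p = parent[x]
    return set() if p is None else bag[x] & bag[p]


def margin(parent, bag, x):
    """Vertices of bag(x) that do not appear in the parent's bag."""
    return bag[x] - adhesion(parent, bag, x)


def is_descendant(parent, y, x):
    """True if y is a descendant of x; every node is its own descendant."""
    while y is not None:
        if y == x:
            return True
        y = parent[y]
    return False


def component(parent, bag, x):
    """Union of the margins of all descendants of x, including x itself."""
    result = set()
    for y in parent:
        if is_descendant(parent, y, x):
            result |= margin(parent, bag, y)
    return result


# The path a - b - c - d, decomposed along the chain of nodes 0, 1, 2.
parent = {0: None, 1: 0, 2: 1}
bag = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}}
margins = {x: margin(parent, bag, x) for x in parent}
print(margins, component(parent, bag, 1))
```

In this example the margins are {a, b}, {c} and {d}, so each vertex lies in exactly one margin, as the observation above predicts.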
Relational structures and MSO. Define a vocabulary to be a finite set of relation names, each with associated arity that is a nonnegative integer. A relational structure over the vocabulary Σ consists of a set called the universe, and for each relation name in the vocabulary, an associated relation of the same arity over the universe. To describe properties of relational structures, we use logics, mainly monadic second-order logic (mso for short). This logic allows quantification both over single elements of the universe and also over subsets of the universe. For a precise definition of mso, see [CE12].
We use mso to describe properties of graphs and tree decompositions. To do this, we need to model graphs and tree decompositions as relational structures. A graph is viewed as a relational structure, where the universe is a disjoint union of the vertex set and the edge set of a graph. There is a single binary incidence relation, which selects a pair (v, e) whenever v is a vertex and e is an incident edge. The edges can be recovered as those elements of the universe which appear on the second coordinate of the incidence relation; the vertices can be recovered as the rest of the universe. For a tree decomposition of a graph G, the universe of the corresponding structure consists of the disjoint union of: the vertex set of G, the edge set of G, and the node set of the tree decomposition. There is the incidence relation between vertices and edges, as for graphs, a binary descendant relation over the nodes of the tree decomposition, and a binary bag relation which selects pairs (v, x) such that x is a node of the tree decomposition whose bag contains vertex v of the graph. The nodes of the decomposition can be recovered as those which are their own descendants, since we assume that the descendant relation is reflexive. Note that thus, the representation of a tree decomposition as a relational structure contains the underlying graph as a substructure.
MSO transductions. Suppose that Σ and Γ are vocabularies. Define a transduction with input vocabulary Σ and output vocabulary Γ to be a set of pairs (input structure over Σ, output structure over Γ) that is invariant under isomorphism of relational structures. When talking about transductions on graphs or tree decompositions, we use the representations described in the previous paragraph. Note that a transduction is a relation and not necessarily a function, thus it can have many possible outputs for the same input. A transduction is called deterministic if it is a partial function (up to isomorphism). For example, the subgraph relation is a transduction from graphs to graphs, but it is not deterministic since a graph can have many subgraphs. On the other hand, the transformation that inputs a tree decomposition and outputs its underlying graph is a deterministic transduction.
We use mso transductions, which are a special case of transductions that can be defined using the logic mso. The precise definition is in Section 5, but the main idea is that an mso transduction is a finite composition of transductions of the following types: copy the input a fixed number of times, nondeterministically color the universe of the input, and add new predicates to the vocabulary with interpretations given by mso formulas over the input vocabulary. The notion of transductions we use is borrowed from our previous work [BP16] and differs syntactically from the common definition that can be found, for instance, in the book of Courcelle and Engelfriet [CE12]. However, both definitions can be easily seen to be equivalent. We refer the reader to [CE12] for a broader discussion of the role of mso transductions in the theory of formal languages for graphs.
The main result. We now state the main contribution of this paper, which is an mso version of the algorithm of Bodlaender and Kloks.
Theorem 2.1. For every k ∈ {0, 1, 2, ...} there is an mso transduction from tree decompositions to tree decompositions such that for every input tree decomposition t:
• if t has width at most k, then there is at least one output; and
• every output is an optimum-width tree decomposition of the underlying graph of t.
Let us stress that the transduction of Theorem 2.1 is not deterministic, that is, it might have several outputs on the same input. Using Theorem 2.1, we prove that an mso transduction can compute an optimum-width tree decomposition given only the graph.
Corollary 2.2. For every k ∈ {0, 1, 2, ...} there is an mso transduction from graphs to tree decompositions such that for every input graph G:
• if G has treewidth at most k, then there is at least one output; and
• every output is a tree decomposition of G of optimum width.
Proof. Theorem 2.4 of [BP16] says that for every k ∈ {0, 1, 2, . . .} there is an mso transduction with exactly the properties stated in the statement, except that when the input has treewidth k, then the output tree decompositions have width at most f (k), for some function f : N → N. By composing this transduction with the transduction given by Theorem 2.1, applied to f (k), we obtain the claim.
We remark that all the arguments that we will use in the proof of Theorem 2.1 are constructive, hence the mso transduction whose existence is asserted in Theorem 2.1 can be computed given k as the input. The same holds also for the mso transduction given by Theorem 2.4 of [BP16], even though this is not explicitly stated in that work. As a result, the mso transduction of Corollary 2.2 can also be computed given k. In order not to obfuscate the presentation with computability issues of secondary relevance and straightforward nature, we rely on the reader to verify these claims.
Structure of the paper. Sections 3-5 are devoted to the proof of Theorem 2.1. First, in Section 3 we formulate the Dealternation Lemma. Its proof is deferred to Section 7 in order not to disturb the flow of the reasoning. Next, in Section 4 we prove the Conflict Lemma, which is a corollary of the Dealternation Lemma. Finally, in Section 5 we formally introduce mso transductions and use the combinatorial insight given by the Conflict Lemma to prove Theorem 2.1. In Section 6 we show how mso transductions can be implemented in linear fixed-parameter time on structures of bounded treewidth, and we discuss the corollaries of combining this result with our mso transduction for the Bodlaender-Kloks problem. This result relies on a normalization theorem for mso transductions, whose proof is deferred to Section 8 due to its technicality. Finally, in Section 9 we give some concluding remarks.
A context factor is the difference X − Y for a tree factor X and a forest factor Y, where the root of X is a strict ancestor of every root of Y. For a context factor X − Y, its root is defined to be the root of X, while the roots of Y are called the appendices. Note that a context factor always contains a unique node that is the parent of all its appendices.
Forest factors and context factors will be jointly called factors. The following lemma can be proved by a straightforward case study, and hence we leave its proof to the reader.
Lemma 3.1. The union of two intersecting factors in the same forest is also a factor.
For a subset U of nodes of a forest, a U -factor is a factor that is entirely contained in U . A factorization of U is a partition of U into U -factors. A U -factor is maximal if no other U -factor contains it as a strict subset.
Lemma 3.2. Suppose U is a subset of nodes of a forest. Then the maximal U -factors form a factorization of U .
Proof. Every node of U is contained in some factor, e.g., a singleton factor (which has forest or context type depending on whether the node is a leaf or not). Thus, every node of U is also contained in some maximal U -factor. On the other hand, two different maximal U -factors must be disjoint, since otherwise by Lemma 3.1, their union would also be a U -factor, contradicting maximality.
The set of all maximal U -factors will be called the maximal factorization of U , and will be denoted by fact(U ). We specify the forest in the subscript whenever it is not clear from the context. Lemma 3.2 asserts that fact(U ) is indeed a factorization of U . Note that the maximal factorization of U is the coarsest in the following sense: in every factorization of U , each of its factors is contained in some factor of fact(U ). In particular, the maximal factorization has the smallest number of factors among all factorizations of U . In the sequel, we will need the following simple result about relation between the maximal factorizations of a set and of its complement. Its proof is a part of the proof of the Dealternation Lemma, and can be found in Section 7.2 (see Lemma 7.4 there).
Lemma 3.3. Suppose (U, W ) is a partition of the node set of a rooted forest F , and let k be the number of factors in the maximal factorization of W . Then the maximal factorization of U has at most k + 1 forest factors and at most 2k − 1 context factors.
Elimination forests. The general definition of a tree decomposition is flexible and allows for multiple combinatorial adjustments. Here, we will rely on a normalized form that we call elimination forests, which are essentially tree decompositions where all the margins have size exactly 1. The definition of treewidth via elimination forests resembles the definition of pathwidth via the so-called vertex separation number [Kin92].
Definition 3.4. Suppose G is a graph. An elimination forest of G is a rooted forest F on the same vertex set as G such that G is contained in the ancestor-descendant closure of F ; that is, whenever uv is an edge of G, then u is an ancestor of v in F or vice versa.
Elimination forests are used to define the graph parameter treedepth, which is equal to the minimum depth of an elimination forest of a graph. To define treewidth, we need to take a different measure than just the depth, as explained next.
Suppose F is an elimination forest of G. Endow F with the following bag function bag(·). For any vertex u of G, assign to u the bag bag(u) consisting of u and all the ancestors of u in F that have a neighbor among the descendants of u in F . The following claim follows by verifying the definition of a tree decomposition; we leave the easy proof to the reader.
Claim 3.5. If F is an elimination forest of G and bag(·) is defined as above, then (F, bag) is a tree decomposition of G. Further, for every vertex u of G, the margin of u in (F, bag) is {u}.
The tree decomposition (F, bag) defined above is said to be induced by the elimination forest F . Observe that if t = (F, bag) is induced by F , then for any vertex u, the component of u in t consists of all the descendants of u in F . One can reformulate the construction given above as follows. First, put every vertex u into its bag bag(u). Then, examine every neighbor v of u, and if v is a descendant of u in F , then add u to every bag on the path from v to u in F . Thus, every vertex u is "smeared" onto a subtree of F , where u is the root of this subtree and its leaves correspond to those neighbors of u that are also its descendants in F . This construction is depicted in Figure 1.
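The "smearing" reformulation translates directly into code. The following Python sketch is an illustration only: it assumes the elimination forest is given as a `parent` map over the vertices of G (roots map to `None`) and the graph as a list of edges, and the function names are hypothetical.

```python
def ancestors(parent, u):
    """List u followed by its strict ancestors, walking up to the root."""
    path = []
    while u is not None:
        path.append(u)
        u = parent[u]
    return path


def induced_bags(parent, edges):
    """Bags of the tree decomposition induced by an elimination forest."""
    anc = {u: ancestors(parent, u) for u in parent}
    bag = {u: {u} for u in parent}          # first, put u into bag(u)
    for a, b in edges:
        # Orient the edge so that `top` is an ancestor of `low`.
        if a in anc[b]:
            top, low = a, b
        elif b in anc[a]:
            top, low = b, a
        else:
            raise ValueError("edge joins incomparable nodes of the forest")
        # Smear `top` over every bag on the path from `low` up to `top`.
        for w in anc[low]:
            bag[w].add(top)
            if w == top:
                break
    return bag


def forest_width(parent, edges):
    """Width of the elimination forest: maximum induced bag size minus 1."""
    return max(len(b) for b in induced_bags(parent, edges).values()) - 1


# The 4-cycle a - b - c - d - a with the chain a, b, c, d as elimination forest.
cycle_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
chain = {"a": None, "b": "a", "c": "b", "d": "c"}
print(induced_bags(chain, cycle_edges), forest_width(chain, cycle_edges))
```

For the 4-cycle, the edge da smears a over the whole chain, so the bags of c and d have three vertices and the width is 2, which matches the treewidth of a cycle.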
The width of an elimination forest is simply the width of the tree decomposition induced by it. Consequently, the width of an elimination forest is never smaller than the treewidth of a graph. The next result shows that in fact there is always an elimination forest of optimum width. The proof follows by a simple surgery on an optimum-width tree decomposition, and can be found in Section 7.3 (see Lemma 7.6 there).
Lemma 3.6. For every graph G there exists an elimination forest of G whose width is equal to the treewidth of G.
Dealternation Lemma. We are finally ready to state the Dealternation Lemma.
Lemma 3.7 (Dealternation Lemma). There exist functions $f(k) \in O(k^3)$ and $g(k) \in O(k^4)$ such that the following holds. Suppose that t is a tree decomposition of a graph G of width k. Then there exists an optimum-width elimination forest F of G such that:
(D1) for every node x of t, the maximal factorization $\mathrm{fact}_F(\mathrm{cmp}_t(x))$ has at most f(k) factors;
(D2) for every node x of t, there are at most g(k) children of x in the set { y : y is a node of t with at least one context factor in $\mathrm{fact}_F(\mathrm{cmp}_t(y))$ }.
Note that in the statement of the Dealternation Lemma, the vertex set of G is at the same time the node set of the forest F. Thus, $\mathrm{fact}_F(\mathrm{cmp}_t(x))$ denotes the maximal factorization of $\mathrm{cmp}_t(x)$, treated as a subset of nodes of F.
The proof of the Dealternation Lemma uses essentially the same core ideas as the correctness proof of the algorithm of Bodlaender and Kloks [BK96]. We include our proof for several reasons. First, unlike in [BK96], in our setting we cannot assume that t has binary branching. In fact, condition (D2) is superfluous when t has binary branching. Second, our formulation of the Dealternation Lemma highlights the key combinatorial property, which is expressed as the existence of a single elimination forest F that behaves nicely with respect to the input decomposition t. This property is somewhat implicit in [BK96], where the existence of nicely-behaved optimum-width tree decompositions is argued in the course of performing dynamic programming. For this reason, we find the new formulation more explanatory and potentially interesting on its own.
For now we take the Dealternation Lemma for granted and we proceed with the proof of Theorem 2.1. The proof of the Dealternation Lemma can be found in Section 7.

Using the Dealternation Lemma
In this section we use the Dealternation Lemma to show that an optimum-width elimination forest of a graph can be interpreted in a suboptimum-width tree decomposition. For this, we need to develop a better understanding of the combinatorial insight provided by the Dealternation Lemma, which is expressed via an auxiliary graph, called the conflict graph.
Suppose G is a graph, t is a tree decomposition of G of width k, and F is an elimination forest of G. Let φ be the mapping that sends each vertex u of G to the unique node of t that contains u in its margin. For a vertex u of G, we define the stain of u, denoted $S_u$, which is a subgraph of the underlying forest of t, as follows. For every child v of u in F, find the unique path in t between φ(u) and φ(v). Then the stain $S_u$ consists of the node φ(u) and the union of these paths. Note that if u is a leaf of F, then the stain $S_u$ consists only of the node φ(u).
Define the conflict graph H(t, F) as follows. The vertices of H(t, F) are the vertices of G, and vertices u and v are adjacent in H(t, F) if and only if their stains $S_u$ and $S_v$ have a node in common. The main result of this section can be formulated as follows. Recall here that a proper coloring of a graph is a coloring of its vertex set such that no two adjacent vertices receive the same color.
Lemma 4.1 (Conflict Lemma). There exists a function $h(k) \in O(k^7)$ such that the following holds. Suppose that t is a tree decomposition of a graph G of width k, and F is the optimum-width elimination forest of G given by the Dealternation Lemma. Then the conflict graph H(t, F) admits a proper coloring with h(k) colors.
The rest of this section is devoted to the proof of the Conflict Lemma. From now on, we assume that G, t, F are as in the Dealternation Lemma, and we denote H = H(t, F).
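To make the definitions of stains and of the conflict graph concrete, here is a small Python sketch. It is an illustration under assumptions that are not part of the paper: both forests are encoded as `parent` maps, `phi` is given explicitly as a dictionary, a stain is represented by its node set only, and all function names are hypothetical.

```python
from itertools import combinations


def path_to_root(parent, x):
    """List x followed by its strict ancestors."""
    path = []
    while x is not None:
        path.append(x)
        x = parent[x]
    return path


def tree_path(parent, x, y):
    """Node set of the unique path between x and y in a rooted forest."""
    up_x, up_y = path_to_root(parent, x), path_to_root(parent, y)
    above_y = set(up_y)
    # The lowest common ancestor is the first node above x that is above y.
    lca = next((z for z in up_x if z in above_y), None)
    if lca is None:
        raise ValueError("nodes lie in different trees of the forest")
    return set(up_x[:up_x.index(lca) + 1]) | set(up_y[:up_y.index(lca) + 1])


def stains(t_parent, phi, f_parent):
    """Stain of u: phi(u) plus the paths to phi(v) for children v of u in F."""
    stain = {u: {phi[u]} for u in f_parent}
    for v, u in f_parent.items():
        if u is not None:                    # u is the parent of v in F
            stain[u] |= tree_path(t_parent, phi[u], phi[v])
    return stain


def conflict_edges(stain):
    """Pairs of vertices whose stains share a node of t."""
    return {frozenset((u, v)) for u, v in combinations(stain, 2)
            if stain[u] & stain[v]}


def max_load(t_parent, stain):
    """Largest number of stains passing through a single node of t."""
    return max(sum(x in s for s in stain.values()) for x in t_parent)


# Path a - b - c - d; t is the chain 0, 1, 2 with margins {a, b}, {c}, {d}.
t_parent = {0: None, 1: 0, 2: 1}
phi = {"a": 0, "b": 0, "c": 1, "d": 2}
f_parent = {"b": None, "a": "b", "c": "b", "d": "c"}   # elimination forest F
S = stains(t_parent, phi, f_parent)
print(S, conflict_edges(S), max_load(t_parent, S))
```

In this toy instance the conflict graph is itself a path on a, b, c, d, and every node of t lies in exactly two stains.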
Observe that the conflict graph H is an intersection graph of a family of subtrees of a forest (here, a subtree of a forest F is simply a connected subgraph of F). It is well-known (see, e.g., [Gol04]) that this property precisely characterizes the class of chordal graphs (graphs with no induced cycle of length larger than 3), so H is chordal. Chordal graphs are known to be perfect (again see, e.g., [Gol04]), hence the chromatic number of a chordal graph (the minimum number of colors needed in a proper coloring) is equal to the size of the largest clique in it. On the other hand, subtrees of a forest are known to satisfy the so-called Helly property: whenever $\mathcal{F}$ is some family of subtrees such that the subtrees in $\mathcal{F}$ pairwise intersect, then in fact there is a node of the forest that belongs to all the subtrees in $\mathcal{F}$. This means that the largest clique in an intersection graph of a family of subtrees of a forest can be obtained by taking all the subtrees that contain some fixed node. Therefore, to prove the Conflict Lemma it is sufficient to prove the following claim.
Claim 4.2. There exists a function $h(k) \in O(k^7)$ such that every node of t belongs to at most h(k) of the stains $\{S_u : u \in V(G)\}$.
In the remainder of this section we prove Claim 4.2. Fix any node x of t, and let $y_1, y_2, \ldots, y_p$ be its children in t. Consider the following partition of the vertex set of G:
$$\Pi = \{\, \mathrm{cmp}_t(y_1),\ \ldots,\ \mathrm{cmp}_t(y_p),\ \mathrm{mrg}_t(x),\ V(G) \setminus \mathrm{cmp}_t(x) \,\}.$$
Define a factorization Φ of the whole node set of F as follows: for each set X from the partition Π, take its maximal factorization $\mathrm{fact}_F(X)$, and define Φ to be the union of these maximal factorizations. Thus, Φ is a factorization that refines the partition Π. Since the number of children $y_i$ is unbounded, we cannot expect that Φ has a small number of factors, but at least it has a small number of context factors.
Claim 4.3. The factorization Φ has at most g(k) · f(k) + 2f(k) + k context factors.
Proof. By the Dealternation Lemma, each of the sets $\mathrm{cmp}_t(y_1), \ldots, \mathrm{cmp}_t(y_p), \mathrm{cmp}_t(x)$ has at most f(k) factors in its maximal factorization in F. Moreover, at most g(k) of the sets $\mathrm{cmp}_t(y_1), \ldots, \mathrm{cmp}_t(y_p)$ can have a context factor in their maximal factorizations. Hence, the maximal factorizations of the sets $\mathrm{cmp}_t(y_1), \ldots, \mathrm{cmp}_t(y_p)$ introduce at most g(k) · f(k) context factors to the factorization Φ. Since the maximal factorization of $\mathrm{cmp}_t(x)$ has at most f(k) factors as well, by Lemma 3.3 we deduce that the maximal factorization of $V(G) \setminus \mathrm{cmp}_t(x)$ has at most 2f(k) − 1 context factors. Finally, the cardinality of $\mathrm{mrg}_t(x)$ is at most k + 1, so in particular its maximal factorization has at most k + 1 factors in total. Summing up all these upper bounds, we conclude that Φ has at most g(k) · f(k) + 2f(k) + k context factors.
With Claim 4.3 in hand, we now complete the proof of Claim 4.2. Take any vertex u such that x belongs to the stain $S_u$. This means that either (i) u belongs to the margin of x, or (ii) u does not belong to the margin of x, but u has a child v in F such that the unique path in t between φ(u) and φ(v) passes through x. The number of vertices u satisfying (i) is bounded by the size of the margin of x, which is at most k + 1, hence we focus on vertices u that satisfy (ii). Observe that condition (ii) in particular means that u and v belong to different parts of the partition Π, so also to different factors of the factorization Φ. Since u is the parent of v in F, this means that the unique factor of Φ that contains u must be a context factor, and u must be the parent of its appendices. Consequently, the number of vertices u satisfying (ii) is upper bounded by the number of context factors in the factorization Φ, which is at most g(k) · f(k) + 2f(k) + k by Claim 4.3. We conclude that the number of stains $S_u$ containing x is at most $(k + 1) + g(k) \cdot f(k) + 2f(k) + k = g(k) \cdot f(k) + 2f(k) + 2k + 1$, which is in $O(k^7)$ because $f(k) \in O(k^3)$ and $g(k) \in O(k^4)$. This concludes the proof of Claim 4.2, so also the proof of the Conflict Lemma is complete.

Constructing the transduction
We now use the understanding gathered in the previous sections to give an mso transduction that takes a tree decomposition of a graph of suboptimum width, and produces an optimum-width tree decomposition. First, we need to precisely define mso transductions.
MSO transductions. Formally, an mso transduction is any transduction that can be obtained by composing a finite number of transductions of the following kinds. Note that kind 1 is a partial function, kinds 2, 3, 4 are functions, and kind 5 is a relation.
(1) Filtering. For every mso sentence ϕ over the input vocabulary there is a transduction that discards the structures where ϕ is not satisfied. Formally, the transduction is the partial identity whose domain consists of the structures that satisfy the sentence. The input and output vocabularies are the same.
(2) Universe restriction. For every mso formula ϕ(x) over the input vocabulary with one free first-order variable there is a transduction which restricts the universe to those elements that satisfy ϕ. The input and output vocabularies are the same; the interpretation of each relation in the output structure is defined as the restriction of its interpretation in the input structure to tuples of elements that remain in the universe.
(3) MSO interpretation. This kind of transduction changes the vocabulary of the structure while keeping the universe intact. For every relation name R of the output vocabulary, there is an mso formula $\varphi_R(x_1, \ldots, x_k)$ over the input vocabulary which has as many free first-order variables as the arity of R. The output structure is obtained from the input structure by keeping the same universe, and interpreting each relation R of the output vocabulary as the set of those tuples $(x_1, \ldots, x_k)$ that satisfy $\varphi_R$.
(4) Copying. For k ∈ {1, 2, ...}, define k-copying to be the transduction which inputs a structure and outputs a structure consisting of k disjoint copies of the input. Precisely, the output universe consists of k copies of the input universe. The output vocabulary is the input vocabulary enriched with a binary predicate copy that selects copies of the same element, and unary predicates $\mathrm{layer}_1, \mathrm{layer}_2, \ldots, \mathrm{layer}_k$ which select elements belonging to the first, second, etc. copies of the universe. In the output structure, a relation name R of the input vocabulary is interpreted as the set of all those tuples over the output structure, where the original elements of the copies were in relation R in the input structure.
(5) Coloring. We add a new unary predicate to the input structure. Precisely, the universe as well as the interpretations of all relation names of the input vocabulary stay intact, but the output vocabulary has one more unary predicate. For every possible interpretation of this unary predicate, there is a different output with this interpretation implemented.
We remark that the above definition is easily seen to be equivalent to the one used in [BP16], where filtering, universe restriction, and mso interpretation are merged into one kind of a transduction.
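The five kinds of atomic transductions can be prototyped on small finite structures. The sketch below is an illustration only and makes several assumptions that are not part of the paper: mso formulas are replaced by arbitrary Python predicates, a transduction is a function returning the list of all its outputs, the example graph uses an adjacency relation E instead of the incidence encoding, the predicate copy is taken to include the reflexive pairs, and all names are hypothetical.

```python
from dataclasses import dataclass
from itertools import chain, combinations, product


@dataclass
class Structure:
    universe: frozenset
    relations: dict  # relation name -> set of tuples over the universe


def filtering(sentence):
    """Kind 1: partial identity, defined only where `sentence` holds."""
    return lambda s: [s] if sentence(s) else []


def restriction(formula):
    """Kind 2: keep the elements satisfying `formula`, restrict relations."""
    def run(s):
        keep = frozenset(a for a in s.universe if formula(s, a))
        rels = {name: {t for t in tuples if set(t) <= keep}
                for name, tuples in s.relations.items()}
        return [Structure(keep, rels)]
    return run


def interpretation(formulas):
    """Kind 3: `formulas` maps each output relation to (arity, predicate)."""
    def run(s):
        rels = {name: {t for t in product(s.universe, repeat=arity)
                       if phi(s, *t)}
                for name, (arity, phi) in formulas.items()}
        return [Structure(s.universe, rels)]
    return run


def copying(k):
    """Kind 4: k disjoint copies; element a in copy i is the pair (a, i)."""
    def run(s):
        layers = range(1, k + 1)
        rels = {name: {tuple(zip(t, ls)) for t in tuples
                       for ls in product(layers, repeat=len(t))}
                for name, tuples in s.relations.items()}
        rels["copy"] = {((a, i), (a, j)) for a in s.universe
                        for i in layers for j in layers}
        for i in layers:
            rels[f"layer_{i}"] = {((a, i),) for a in s.universe}
        universe = frozenset((a, i) for a in s.universe for i in layers)
        return [Structure(universe, rels)]
    return run


def coloring(name):
    """Kind 5: one output per interpretation of the new unary predicate."""
    def run(s):
        elems = list(s.universe)
        subsets = chain.from_iterable(
            combinations(elems, r) for r in range(len(elems) + 1))
        return [Structure(s.universe,
                          {**s.relations, name: {(a,) for a in sub}})
                for sub in subsets]
    return run


def compose(*steps):
    """Run the steps left to right, collecting all possible outputs."""
    def run(s):
        current = [s]
        for step in steps:
            current = [out for c in current for out in step(c)]
        return current
    return run


# Induced subgraphs of the path 1 - 2 - 3: guess a set X, then restrict to it.
path = Structure(frozenset({1, 2, 3}), {"E": {(1, 2), (2, 1), (2, 3), (3, 2)}})
induced_subgraph = compose(
    coloring("X"),
    restriction(lambda s, a: (a,) in s.relations["X"]),
)
outputs = induced_subgraph(path)
```

Enumerating all outputs of a coloring step is exponential in the size of the universe, so this representation is suitable only for experimenting with tiny inputs.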
Proving the main result. We are finally ready to prove our main result, Theorem 2.1. The proof is broken down into several steps. The first, main step shows that an mso transduction can output optimum-width elimination forests. Here, an elimination forest of a graph G is encoded by enriching the relational structure encoding G with a single binary relation interpreted as the child relation of F . Note that the definition of an elimination forest is mso-expressible: there is an mso sentence that checks whether the additional relation indeed encodes an elimination forest of the graph.
Lemma 5.1. For every k ∈ {0, 1, 2, ...}, there is an mso transduction from tree decompositions to elimination forests such that for every input tree decomposition t:
• every output is an elimination forest of the underlying graph of t; and
• if t has width at most k, then there is at least one output that is an elimination forest of optimum width.
Proof. Observe that whether the width of t is at most k can be expressed by an mso sentence, so we can first use filtering to discard any input tree decomposition t whose width is larger than k; for such decompositions, the transduction produces no output. Let G be the underlying graph of t, and let φ be the mapping that sends each vertex u of G to the unique node of t whose margin contains u. By the Conflict Lemma, there exists some elimination forest F of G of optimum width such that the conflict graph H(t, F) admits some proper coloring λ with h(k) colors. The constructed mso transduction attempts to guess and interpret F as follows. First, using coloring and filtering, we guess the coloring λ, represented as a partition of the vertex set of G. Then, again using coloring and filtering, for every vertex u of G we guess whether u is a root of F, and if not, then we guess the color under λ of the parent of u in F.
Next, for every color c used in λ, we guess the forest
$$M_c = \bigcup_{u \in \lambda^{-1}(c)} S_u,$$
where $S_u$ is the stain of u in t, defined as in Section 4 for the elimination forest F. Note that the stains $\{S_u : u \in \lambda^{-1}(c)\}$ are pairwise disjoint, because λ is a proper coloring of the conflict graph H(t, F). Observe also that $M_c$ is a subgraph of the decomposition t, so we can emulate guessing $M_c$ in an mso transduction working over t by guessing the subset of those nodes of t for which the edge of t connecting the node and its parent belongs to $M_c$. Having done all these guesses, we can interpret the child relation of F using an mso predicate as follows. Fix a pair of vertices u and v, and let c be the guessed color of u under λ. Then one can readily check that u is the parent of v in F if and only if the following conditions are satisfied:
• we have guessed that v is not a root of F,
• we have guessed that the color of the parent of v in F is c, and
• u is the unique vertex of color c such that φ(u) belongs to the same connected component of $M_c$ as φ(v).
It can be easily seen that these conditions can be expressed by an mso formula with two free variables u and v.
Finally, we filter out all the wrong guesses by verifying, using an mso sentence, whether the interpreted child relation on the vertices of G indeed forms a rooted forest, and whether this forest is an elimination forest of G. Obviously, the elimination forest F was obtained for at least one of the guesses, and survives this filtering. At the end, we remove the nodes of decomposition t from the structure using universe restriction.
Next, we need to construct the induced tree decomposition out of an elimination forest.
Lemma 5.2. There is an mso transduction from elimination forests to tree decompositions that on each input elimination forest has exactly one output, which is the tree decomposition induced by the input.
Proof. We copy the vertex set of the graph two times, and declare the second copies to be the nodes of the constructed tree decomposition. Using the child relation of the input elimination forest, we can interpret in mso the descendant relation in the forest of the decomposition. Finally, the bag relation in the induced tree decomposition, as defined in Section 3, can be easily interpreted using an mso formula.
Finally, so far the transduction can output tree decompositions of suboptimal width, which should be filtered out. For this, we need the following mso-expressible predicate.
Lemma 5.3. For every k ∈ {0, 1, 2, . . .}, there is an mso-sentence over tree decompositions that holds if and only if the given tree decomposition has width at most k and its width is optimum for the underlying graph.
Proof. Let t be the given tree decomposition of a graph G. Obviously, we can verify using an mso sentence whether the width of t is at most k. To check that the width of t is optimum, we could use the fact that graphs of treewidth k are characterized by a finite list of forbidden minors, but we choose to apply the following different strategy. Let R k be the mso transduction that is the composition of the transductions of Lemmas 5.1 (for parameter k) and 5.2. Provided the input tree decomposition t has width at most k, transduction R k outputs some set of tree decompositions of G among which one has optimum width. Hence, t has optimum width if and only if the output R k (t) does not contain any tree decomposition of width smaller than that of t.
The Backwards Translation Theorem for mso transductions [CE12] (see also [BP16]) states that whenever I is an mso transduction and ψ is an mso sentence over the output vocabulary, then the set of structures on which I outputs at least one structure satisfying ψ is mso-definable over the input vocabulary. Hence, for every p < k, there exists an mso sentence ϕ p that verifies whether R k (t) outputs at least one tree decomposition of width at most p. Therefore, we can check whether t has optimum width by making a disjunction over all ℓ with 0 ≤ ℓ ≤ k of the sentences stating that t has width exactly ℓ and R k (t) does not output any tree decomposition of width less than ℓ.
Theorem 2.1 now follows by composing the mso transductions given by Lemmas 5.1 and 5.2, and at the end applying filtering using the predicate given by Lemma 5.3.

Implementing mso transductions in FPT time
In this section we prove that mso transductions on relational structures of bounded treewidth can be implemented in linear fixed-parameter time. To state this result formally, we first need to introduce some definitions regarding measuring the input and output size of the algorithm.
In the following, by the size of an mso transduction I, denoted ‖I‖, we mean the sum of sizes of its atomic transductions. Here, the size of a copying step is the number of copies it produces, the size of a coloring step is 1, and the size of a transduction of any other type is the total size of mso formulas involved in its description.
By the treewidth of a relational structure we mean the treewidth of its Gaifman graph; that is, a graph whose vertices are elements of the structure, and two elements are adjacent if and only if they appear together in some tuple of some relation. The size of a relational structure A = (U, R 1 , R 2 , . . . , R c ), where U is the universe and R i is a relation of arity r i , for i = 1, . . . , c, is defined as We say that an algorithm that receives a structure A on input implements I on A if it either correctly concludes that I(A) is empty, or outputs an arbitrary structure belonging to I(A).
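As an illustration of the definition of the Gaifman graph, the following Python sketch builds it from a relational structure. The representation used here (a set `universe` and a dictionary `relations` mapping relation names to sets of tuples) and the function name `gaifman_graph` are assumptions made only for this example.

```python
from itertools import combinations


def gaifman_graph(universe, relations):
    """Return (vertices, edges) of the Gaifman graph of a structure.

    `universe` is a set of elements; `relations` maps a relation name to a
    set of tuples over the universe (a hypothetical encoding for this sketch).
    Two distinct elements are adjacent iff they occur together in some tuple.
    """
    edges = set()
    for tuples in relations.values():
        for tup in tuples:
            # Every pair of distinct elements of one tuple becomes an edge.
            for a, b in combinations(set(tup), 2):
                edges.add(frozenset((a, b)))
    return set(universe), edges


# Example: one binary and one ternary relation; element 5 stays isolated.
vertices, edges = gaifman_graph(
    {1, 2, 3, 4, 5},
    {"E": {(1, 2)}, "T": {(2, 3, 4)}},
)
```

In this example the ternary tuple (2, 3, 4) contributes a triangle, so a single tuple of arity r always forces a clique on its (distinct) elements in the Gaifman graph.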
We may now formally state the algorithmic result for mso transductions.
Theorem 6.1. There is an algorithm that, given an mso transduction I and a relational structure A over the input vocabulary of I, implements I on A in time f (‖I‖, w) · (n + m), where n and w are the size and the treewidth of the input structure, respectively, m is the size of the output structure (or 0 if I(A) is empty), and f is a computable function.
The cornerstone of the proof of Theorem 6.1 is a normalization theorem for mso transductions: every mso transduction can be written in a simple normal form that allows for algorithmic treatment. To describe this form, it will be useful to introduce another type of an mso transduction, which is a special case of interpretation. By a renaming we mean an interpretation step that only renames symbols from the signature, possibly dropping some of them. Precisely, if the input vocabulary is Σ and the output vocabulary is Γ, then there is an injective function ρ : Γ → Σ such that each symbol R ∈ Γ, say of arity r, is interpreted by the formula φ R (x 1 , . . . , x r ) = ρ(R)(x 1 , . . . , x r ). We can now state the normalization theorem.
Theorem 6.2. Every mso transduction I can be written in the normal form
I = I rename ∘ I restrict ∘ I interprete ∘ I copy ∘ I filter ∘ I color ,
where the above are mso transductions as follows: • I color is a finite sequence of coloring steps; • I filter is a single filtering step; • I copy is a single copying step; • I interprete is a single interpretation step; • I restrict is a single universe restriction step; • I rename is a single renaming step. Moreover, there is an algorithm that, given I, computes the normal form as above.
The proof of Theorem 6.2 roughly proceeds as follows. We write the given mso transduction as a sequence of atomic transductions, each being a coloring, filtering, copying, interpretation, or universe restriction step. Then, we give a number of swapping and merging rules that enable us to swap these transductions while modifying them slightly. It is shown that by applying the rules exhaustively, we eventually arrive at the claimed normal form. While basically all the rules are straightforward, their full verification takes some effort. We give the proof of Theorem 6.2 in Section 8 for completeness, while for now let us take it for granted and proceed with the proof of Theorem 6.1.
Proof of Theorem 6.1. By Theorem 6.2, we can assume that I is in the normal form I = I rename ∘ I restrict ∘ I interprete ∘ I copy ∘ I filter ∘ I color . Suppose further that I color is a sequence of coloring steps that introduce new unary predicates X 1 , X 2 , . . . , X c , for some constant c, while I copy copies the universe ℓ times, for some constant ℓ. The proof will follow from the following two claims. In the following we use f for an arbitrary computable function, possibly different in each context.
Claim 6.3. One can in time f (‖I‖, w) · n determine a sequence of subsets X 1 , . . . , X c of elements of A such that filtering I filter preserves A enriched with X 1 , X 2 , . . . , X c as unary predicates, or correctly conclude that such a sequence does not exist.
Claim 6.4. Given A enriched with unary predicates X 1 , X 2 , . . . , X c , one can in time f (‖I‖, w) · (n + m) compute the output of I rename ∘ I restrict ∘ I interprete ∘ I copy on this structure, where m is the size of the output.
Note here that in Claim 6.4, the transduction I rename ∘ I restrict ∘ I interprete ∘ I copy uses neither coloring nor filtering, hence every input structure is mapped to exactly one output structure.
Observe that the proof follows trivially from combining Claims 6.3 and 6.4 as follows. First, using the algorithm of Claim 6.3 one tries to compute any sequence of element subsets X 1 , X 2 , . . . , X c for which the filtering step I filter passes. If this cannot be done, then I(A) is empty, and this conclusion can be reported. Otherwise, we plug the obtained sequence into the algorithm of Claim 6.4, thus computing an arbitrary structure from I(A).
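The two-phase structure of this argument can be illustrated by a toy Python sketch. It is emphatically not the linear-time algorithm: the search for a coloring that passes the filter is done by exhaustive enumeration (exponential in the universe size), standing in for the algorithm of Claim 6.3, and the filter and the deterministic part are arbitrary Python callables standing in for mso formulas. All names (`implement`, `passes_filter`, `deterministic_part`) are hypothetical.

```python
from itertools import combinations, product


def all_subsets(universe):
    """Yield all subsets of the universe, smallest first."""
    items = sorted(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def implement(universe, num_colors, passes_filter, deterministic_part):
    """Toy two-phase implementation of a transduction in normal form.

    Phase 1 (stand-in for Claim 6.3): find any tuple of subsets
    (X_1, ..., X_c) accepted by the filter, here by brute force.
    Phase 2 (stand-in for Claim 6.4): run the deterministic part
    (copy, interpret, restrict, rename) on the enriched structure.
    Returns None when no coloring passes, i.e. the output set is empty.
    """
    candidates = list(all_subsets(universe))
    for coloring in product(candidates, repeat=num_colors):
        if passes_filter(coloring):
            return deterministic_part(coloring)
    return None


# Example on the path 1 - 2 - 3: guess one set X_1, keep it only if it is an
# independent set of size 2, and output the universe restricted to X_1.
path_edges = {(1, 2), (2, 3)}


def independent_pair(coloring):
    (x1,) = coloring
    return len(x1) == 2 and not any(a in x1 and b in x1 for a, b in path_edges)


result = implement({1, 2, 3}, 1, independent_pair, lambda coloring: set(coloring[0]))
```

The point of the sketch is only the control flow: the nondeterminism is resolved once, before any copying or interpretation takes place, so no intermediate output that would later be filtered out is ever materialized.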
We now prove Claims 6.3 and 6.4 in order. For this, we use the following results on answering mso queries on structures of bounded treewidth. Suppose we are given a relational structure A with tw(A) = w. Suppose further that ϕ(X 1 , . . . , X c , x 1 , . . . , x d ) is an mso formula over the vocabulary of A, where X i are monadic variables and x i are first-order variables. A tuple ȳ = (A 1 , . . . , A c , a 1 , . . . , a d ) is an answer to the mso query ϕ if A |= ϕ(A 1 , . . . , A c , a 1 , . . . , a d ). Flum et al. [FFG02] gave an algorithm that in time f (w, ‖ϕ‖) · (n + m) outputs all the answers to ϕ on A, where n is the size of the universe of A, m is the total size of the output, and f is a computable function. Later, Bagan [Bag06] gave an enumeration algorithm for solving mso queries on structures of bounded treewidth: this algorithm uses f (w, ‖ϕ‖) · n preprocessing time, and then reports answers to the query with delay between two consecutive reports bounded by f (w, ‖ϕ‖) · |ȳ|, where |ȳ| is the size of the next answer. A different proof of this result, but for queries using only first-order variables, was later given by Kazana and Segoufin [KS13].
In the sequel, f always denotes some computable function, possibly different in each context.
Proof of Claim 6.3. Let ψ be the mso formula used in the filtering step I filter , which works over the input structure A enriched with sets X 1 , . . . , X c . That is, the filtering step passes only if the sets X 1 , . . . , X c guessed by I color satisfy A, X 1 , . . . , X c |= ψ. Interpret ψ as an mso query on A with free monadic variables X 1 , . . . , X c . Run the algorithm of Bagan [Bag06] on it to enumerate only the first answer, or to conclude that there are no answers; either of these outcomes may then be reported. The preprocessing step takes time f (‖ψ‖, w) · n, whereas the construction of the first answer also takes time f (‖ψ‖, w) · n, since the size of the answer is trivially bounded by cn. Since ‖ψ‖ ≤ ‖I‖, the claimed running time follows.
Proof of Claim 6.4. First, the step I copy can be just performed in time f (‖I‖, w) · n, since ℓ is a constant bounded in terms of ‖I‖. Observe here that since tw(A) ≤ w, the treewidth of the structure output by I copy is bounded by ℓ(w + 1). This follows by replacing, in every bag of an optimum-width tree decomposition of the Gaifman graph of A, each element of the original structure with its ℓ copies in A′, the structure output by I copy . Next, we implement I rename ∘ I restrict ∘ I interprete . Take any relation R of the output vocabulary, say of arity r, and let R′ be the relation from which R originates in the renaming step I rename . Let ϕ R′ (x 1 , . . . , x r ) be the formula used in I interprete to interpret R′, and let ϕ(u) be the formula used in I restrict to restrict the universe. Moreover, let ϕ′(u) be a formula constructed from ϕ(u) by replacing every relation atom Q(x 1 , . . . , x q ) by its interpretation ϕ Q (x 1 , . . . , x q ) under I interprete . Consider the formula
α R (x 1 , . . . , x r ) = ϕ R′ (x 1 , . . . , x r ) ∧ ϕ′(x 1 ) ∧ . . . ∧ ϕ′(x r ).
Observe that α R (x 1 , . . . , x r ) in the structure A′ selects exactly those tuples (x 1 , . . . , x r ) that satisfy R(x 1 , . . . , x r ) in the output structure.
Hence, given the structure A′, we implement I rename ∘ I restrict ∘ I interprete as follows. First, ϕ′(u) can be regarded as an mso query with one free first-order variable over A′; obviously, the number of answers to this query is bounded by the size of the universe of A′, which is ℓn. Hence the algorithm of Flum et al. [FFG02] can output all the answers to this query, which are exactly the elements that are preserved in the universe by I restrict , in time f (‖ϕ′‖, ℓ(w + 1)) · h(‖I‖) · n, which is bounded by g(‖I‖, w) · n for some computable g. Thus, we have computed the universe of the output structure.
To compute the relations in the output structure, for every relation R of the output vocabulary, say of arity r, apply the algorithm of Flum et al. [FFG02] for the query α R (x 1 , . . . , x r ) on A′. Thus we compute the set of tuples selected by R in the output structure in time f (‖I‖, ℓ(w + 1)) · (h(‖I‖) · n + m R ), where m R is the size of relation R in the output. By summing this bound through all relations of the output vocabulary, we obtain a running time of the form g(‖I‖, w) · (n + m) for some computable g, where m is the output size.
As argued before, the proof of Theorem 6.1 follows from Claims 6.3 and 6.4.
An observant reader might wonder why in the proof of Theorem 6.1 we actually needed the normal form provided by Theorem 6.2, as it would be natural to just implement consecutive atomic transductions comprising the input transduction I one by one, each in linear time. There are two reasons for this. First, every coloring step introduces a large number of possible intermediate outputs, and it can happen that only few of them eventually lead to producing an output of the whole transduction I, due to later filtering steps. At the moment of applying a coloring step it is difficult to determine which intermediate outputs will eventually get filtered out; the normal form facilitates this verification through Claim 6.3. Second, applying atomic transductions comprising I one by one and computing each intermediate output means that the running time is linear in the maximum size among the intermediate outputs. This can be much larger than the maximum among the sizes of the input and the final output, which is the measure promised in Theorem 6.1. The normal form helps here in compressing a possibly long sequence of atomic transductions into a sequence manageable in its entirety.
We now show how the result of Bodlaender and Kloks [BK96] may be obtained as a direct corollary of our meta-results.
Corollary 6.5. For every k ∈ {0, 1, 2, . . .} there exists a linear-time algorithm that, given a graph G and its tree decomposition of width k, returns a tree decomposition of G of optimum width.
Proof. Let A be the relational structure representing G together with the input tree decomposition t of G of width at most k. It can be easily seen that the Gaifman graph of A has treewidth at most 2k + 3, and its tree decomposition of such width can be constructed from t in linear time. To obtain a tree decomposition of G of optimum width, it suffices to apply the algorithm of Theorem 6.1 to A and the transduction given by Theorem 2.1. For the running time bound, observe that the size of the output is bounded linearly in the size of the input.
As we argued in Section 2, our proof of Theorem 2.1 is actually constructive: given k, one can compute the transduction given by Theorem 2.1 for this value. Thus, we can infer a slightly stronger uniform variant of Corollary 6.5, where k is also given in the input and the algorithm works in linear fixed-parameter time, that is, in time f (k) · n for some computable f , where n is the size of the input. While the uniformity of the algorithm follows from our arguments in this way, we unfortunately do not see an easy way to recover upper bounds on the running time similar to those in Theorem 1.1 using our approach.
A careful reader may have observed that our claim of recovering the algorithm of Bodlaender and Kloks [BK96] via meta-tools might seem like cheating. Namely, the algorithms of Flum et al. [FFG02] and of Bagan [Bag06], which are invoked in the algorithm of Theorem 6.1, actually use the linear-time algorithm of Bodlaender [Bod96] to compute a tree decomposition of the given structure. This algorithm, on the other hand, uses the algorithm of Bodlaender and Kloks [BK96] as a subroutine, thus creating a cycle of dependencies. This issue is, however, not really problematic. Namely, the algorithms of [FFG02,Bag06] use the linear-time algorithm of Bodlaender [Bod96] only as an opening step, to compute a tree decomposition that will be used in further computations. In our setting, we have a tree decomposition of the input structure at hand, so there is no need to perform this step. Thus, we indeed obtain a new implementation of the algorithm of Bodlaender and Kloks [BK96]. Note, however, that this vicious cycle of dependencies would persist if we tried to combine Theorem 6.1 with the transduction of Corollary 2.2 in order to obtain a new implementation of the linear-time algorithm of Bodlaender [Bod96]. This is because in the setting of Corollary 2.2, there is no tree decomposition given on the input. Therefore, we do not obtain a new implementation of the algorithm of Bodlaender [Bod96] via our meta-techniques. We see, however, a potential for our tools to be useful in computing other types of tree-like decompositions of graphs; we discuss this matter in more detail in Section 9.

Proof of the Dealternation Lemma
In this section we prove the Dealternation Lemma (Lemma 3.7), as well as some auxiliary simple facts whose proofs were omitted in Section 3. We begin by introducing some auxiliary tools on dealternation in words, and we give a few useful properties of maximal factorizations; in particular we prove Lemma 3.3. Then, we move to elimination forests: we prove Lemma 3.6, and we investigate a normalized form of elimination forests that we call reduced. Finally, we complete the proof of the Dealternation Lemma using the gathered tools. We first show how the Dealternation Lemma follows from an auxiliary result, called the Local Dealternation Lemma, which can be thought of as one "fixing step". Then we conclude by proving the Local Dealternation Lemma.
7.1. Words and alternation. We now give some auxiliary combinatorial tools for reshuffling a word over the alphabet {−, +} in order to reduce its "alternation", while preserving some extremal properties. These results hold the essence of the technique of typical sequences, used by Bodlaender and Kloks in [BK96].
Fix the alphabet Σ = {−, +}. For a word w ∈ Σ*, we define: • the sum of w, denoted sum(w), is the number of + in w, minus the number of − in w; • the prefix maximum of w, denoted pmax(w), is the maximum of sum(u) for u ranging over the prefixes of w; • the prefix minimum of w, denoted pmin(w), is the minimum of sum(u) for u ranging over the prefixes of w. Suppose a word w ∈ Σ* has every position colored with some color drawn from some set of colors; in such a case, we will talk about a colored word. A block in a colored word w is a maximal set of consecutive letters colored with the same color.
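The quantities just defined are easy to compute; the following Python sketch does so for words given as strings over the characters "+" and "-". Two choices here are assumptions of the sketch rather than statements of the text: the empty prefix is counted among the prefixes (so pmax is always at least 0 and pmin at most 0), and a coloring is given as a separate sequence of color labels, one per position.

```python
from itertools import groupby


def prefix_sums(word):
    """Sums of all prefixes of `word`, starting with the empty prefix."""
    sums = [0]
    for letter in word:
        if letter not in "+-":
            raise ValueError(f"unexpected letter: {letter!r}")
        sums.append(sums[-1] + (1 if letter == "+" else -1))
    return sums


def word_sum(word):
    """Number of '+' minus number of '-' in the word."""
    return prefix_sums(word)[-1]


def pmax(word):
    """Prefix maximum of the word (empty prefix included by assumption)."""
    return max(prefix_sums(word))


def pmin(word):
    """Prefix minimum of the word (empty prefix included by assumption)."""
    return min(prefix_sums(word))


def blocks(colors):
    """Blocks of a coloring, as (color, length) pairs in left-to-right order."""
    return [(color, len(list(run))) for color, run in groupby(colors)]


example_word = "++-+--"
example_blocks = blocks("rrbbr")
```

For `example_word` the prefix sums are 0, 1, 2, 1, 2, 1, 0, so its sum is 0, its prefix maximum is 2 and its prefix minimum is 0.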
We say that a colored word w′ is a block-shuffle of w if w′ can be obtained from w by permuting its letters (and keeping their colors) in such a manner that (i) within each color, the order of the letters remains the same as in w, and (ii) every block of w remains contiguous in w′. Note that (ii) is equivalent to saying that every block of w, after applying the permutation, is contained in a block of w′. It is clear that if w 1 is a block-shuffle of w 2 , which in turn is a block-shuffle of w 3 , then w 1 is a block-shuffle of w 3 .
Informally, the main result of this section can be stated as follows: provided a colored word w has bounded prefix maximum, and prefix minima within colors are also not too small, then there exists a block-shuffle of w that achieves a small number of blocks of each color. The formal statement follows.
Lemma 7.1. Suppose w ∈ Σ* is colored with two colors. Suppose further that pmax(w) ≤ a for some nonnegative integer a, and if u is a word derived from w by restricting it to all the letters of one of the colors, then pmin(u) ≥ −b, for some nonnegative integer b. Then there exists a block-shuffle w′ of w such that also pmax(w′) ≤ a, but w′ has at most a/2 + 2b + 1 blocks in each of the colors.
Proof. Let us factorize w as w = w 1 w 2 . . . w n , where w i , for i = 1, 2, . . . , n, are the blocks of w. Thus, odd-numbered blocks are colored with one color, while the even-numbered blocks are colored with the second color. By considering swapping two consecutive blocks, we observe the following fact; the proof is a straightforward check.
Claim 7.2. Suppose that for some i, 1 ≤ i < n, we have that sum(w i ) ≥ 0 and sum(w i+1 ) ≤ 0. If w′ is obtained from w by swapping blocks w i and w i+1 , then w′ is a block-shuffle of w with pmax(w′) ≤ pmax(w).
Starting with the original word w, we apply the operation of Claim 7.2 exhaustively, up to the point when it cannot be applied anymore, or we obtain a word with exactly two blocks. Note that this procedure ends after a finite number of steps, as each swap strictly reduces the total number of blocks in the word (here we use the fact that we stop the procedure once a word with two blocks is obtained). Let w′ be the word obtained at the end of this procedure. If w′ has one block of each color, then we are done. Otherwise, suppose w′ is such that the operation of Claim 7.2 cannot be applied to w′. Then we have that w′ is a block-shuffle of w and pmax(w′) ≤ pmax(w).
Let us factorize w′ as w′ = w′ 1 w′ 2 . . . w′ n′ , where w′ i are the blocks of w′. Let I ⊆ {1, 2, . . . , n′} be the set of those positions i for which sum(w′ i ) ≥ 0. Since the operation of Claim 7.2 is not applicable to w′, we infer that I is a suffix of {1, 2, . . . , n′}, that is, there is a position j such that I = {j, . . . , n′}. By the definition of I, we have sum(w′ i ) < 0 for all i < j. Therefore, among blocks w′ i for i < j there can be at most b blocks of each color; otherwise, restricting w′ (equivalently w) to letters of this color would yield a word with prefix minimum lower than −b. Hence there can be at most 2b blocks before block w′ j , and moreover we must have sum(w′ 1 w′ 2 . . . w′ j−1 ) ≥ −2b. On the other hand, for all i > j we have that sum(w′ i ) > 0, because otherwise the operation of Claim 7.2 would be applicable to blocks w′ i−1 and w′ i (recall that j < i implies that i − 1 ∈ I, which means that sum(w′ i−1 ) ≥ 0). Hence there cannot be more than a + 2b blocks after block w′ j , because then we would have that pmax(w′) > a, contradicting pmax(w′) ≤ pmax(w) ≤ a. We conclude that w′ has at most a + 4b + 1 blocks in total, so at most a/2 + 2b + 1 blocks in each of the colors, as requested.
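The swapping procedure from the proof of Lemma 7.1 can be sketched in Python as follows. The encoding is an assumption of this sketch: a colored word is a list of (letter, color) pairs with letters "+" and "-", and the empty prefix is counted when computing the prefix maximum. The function `dealternate` applies the operation of Claim 7.2 until it is no longer applicable or at most two blocks remain.

```python
from itertools import groupby


def value(letter):
    return 1 if letter == "+" else -1


def pmax(word):
    """Prefix maximum of a colored word (empty prefix included)."""
    best = total = 0
    for letter, _color in word:
        total += value(letter)
        best = max(best, total)
    return best


def split_blocks(word):
    """Split a colored word into its blocks (maximal monochromatic runs)."""
    return [list(run) for _color, run in groupby(word, key=lambda pos: pos[1])]


def block_sum(block):
    return sum(value(letter) for letter, _color in block)


def dealternate(word):
    """Exhaustively swap adjacent blocks as in Claim 7.2 (two colors only)."""
    if len({color for _letter, color in word}) > 2:
        raise ValueError("this sketch assumes at most two colors")
    current = split_blocks(word)
    while len(current) > 2:
        for i in range(len(current) - 1):
            if block_sum(current[i]) >= 0 and block_sum(current[i + 1]) <= 0:
                current[i], current[i + 1] = current[i + 1], current[i]
                break
        else:
            break  # no swap is applicable, so the procedure stops
        # After a swap, neighbouring blocks of equal color merge.
        current = split_blocks([pos for block in current for pos in block])
    return [pos for block in current for pos in block]


word = [("+", "r"), ("-", "b"), ("+", "r"), ("-", "b")]
shuffled = dealternate(word)
```

On this input the word starts with four blocks and prefix maximum 1; two swaps later it consists of the two blocks "--" (color b) and "++" (color r), with prefix maximum 0, and the order of letters within each color is unchanged.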
7.2. Factorizations: additional properties. In the following, we will use the fact that for any subset of nodes U in a rooted forest, the numbers of factors in the maximal factorizations of U and of the complement of U are related to each other. We begin with the case when the complement of U is small. For a context factor B in factorization fact(U ), let r(B) be the parent of the appendices of B. Clearly, since r(B) ∈ B, function r is injective. We prove that for a context factor B, either r(B) has a child that belongs to X, or there are at least two different children of r(B) that have nodes of X as descendants. Suppose the contrary. This implies that either all the descendants of r(B) belong to U , or all the descendants of r(B) that are contained in X actually belong to the tree factor at the same child u of r(B), which moreover belongs to U . In the first case we observe that by adding all the descendants of r(B) to B we obtain a tree factor that is a U -factor, which contradicts the maximality of B. In the second case we observe that by adding to B all the descendants of r(B) apart from strict descendants of u (in particular we add u) we obtain a context factor that is a U -factor, which again contradicts the maximality of B.
Therefore, r injectively maps the context factors of fact(U ) to the set consisting of parents of vertices of X and lowest common ancestors of pairs of vertices of X. It is well known that in a rooted forest, for any node subset X, the set of the lowest common ancestors of pairs of vertices from X has size at most |X| − 1. Hence, r injectively maps the context factors of fact(U ) into a set of cardinality at most 2|X| − 1, thereby proving that the number of context factors in fact(U ) is at most 2|X| − 1.
Lemma 7.3 can be conveniently lifted to the setting where X can be large, but its maximal factorization has a small number of factors.
Proof. Define a rooted forest F′ by identifying every maximal W -factor into a single vertex. More precisely, for every maximal W -factor A that is a forest factor, replace it with a single node x A . Make x A a child of the parent of the roots of A, or a root node if the roots of A were root nodes. Similarly, for every maximal W -factor B that is a context factor, replace it with a single node x B . Make x B a child of the parent of the root of B, or a root node if the root of B was a root node. Also, make every appendix of B a child of x B . Let X = {x A : A ∈ fact(W )}, then |X| = k. It can be easily seen that every maximal U -factor in F remains a maximal U -factor in F′. Then the claim follows from Lemma 7.3 applied to forest F′ and the partition (X, U ) of its node set.
Finally, we observe that removing a small number of vertices from a set does not change the number of factors in its maximal factorization by much.
Lemma 7.5. Suppose F is a rooted forest and U′ ⊆ U are two node subsets such that |U \ U′| ≤ ℓ, for some nonnegative integer ℓ. Then |fact(U′)| ≤ 9 · |fact(U )| + 3ℓ.
Proof. Let W = V (F ) \ U and W′ = V (F ) \ U′ be the complements of U and U′, respectively. By Lemma 7.4, we have that |fact(W )| ≤ 3|fact(U )|. Observe now that W′ = W ∪ (U \ U′), so there is a partition of W′ into |fact(W )| + |U \ U′| many W′-factors: one can take the maximal factorization of W and add every vertex of U \ U′ as a singleton factor. Consequently, the maximal factorization of W′ has at most this many factors, hence |fact(W′)| ≤ |fact(W )| + ℓ. Finally, U′ is the complement of W′, so using Lemma 7.4 again we obtain that |fact(U′)| ≤ 3|fact(W′)|.
By combining the three inequalities above we are done.
7.3. Elimination forests. We begin with proving Lemma 3.6, then we introduce reduced elimination forests and investigate their properties.
Lemma 7.6 (Lemma 3.6, restated). For every graph G there exists an elimination forest of G whose width is equal to the treewidth of G.
Proof. Let t be an optimum-width tree decomposition of G. Fix any linear order ⪯ on the vertices of G. For every vertex u of G, let x u be the node of t whose margin contains u; since margins form a partition of the vertex set, such a node exists and is unique. Define now a structure of a rooted forest F on the vertex set of G as follows. For two distinct vertices u and v, if x u is a strict ancestor of x v in t, then make u an ancestor of v in F . If x u = x v , then make u an ancestor of v in F if u ⪯ v, and make v an ancestor of u otherwise. First observe that the forest F defined above is an elimination forest of G. Indeed, for every vertex u, all the nodes of t whose bags contain u are descendants of x u ; this follows from the definition of the margin. Hence, if uv is an edge of G, then the node whose bag contains u and v must be both a descendant of x u and a descendant of x v . Consequently, x u and x v must be bound by the ancestor-descendant relation, and hence so are u and v in F .
Finally, we verify that the width of the tree decomposition t′ induced by F is no larger than the width of t. To this end, we show that for every vertex u of G, the bag of u in t′ is a subset of the bag of x u in t. Recall that the bag of u in t′ consists of u and all the ancestors of u in F that have a neighbor among the descendants of u in F . Clearly, u itself belongs to the bag of x u in t. Take then any vertex v that is an ancestor of u in F that is adjacent to some w that is a descendant of u in F . Since v is an ancestor of u in F and w is a descendant of u in F , it follows that x v is an ancestor of x u in t and x w is a descendant of x u in t. The latter conclusion implies that all the nodes of t whose bags contain w are in fact descendants of x u . Observe that one of these bags must contain v as well, as vw is an edge of G. Consequently, v is contained both in the bag of some descendant of x u , and in the bag of some ancestor of x u , namely x v . This implies that v is contained in the bag of x u , as claimed.
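The bags of the tree decomposition induced by an elimination forest, as recalled in the proof above, can be computed by the following Python sketch. The encoding is an assumption of this example: the forest is a dictionary `parent` mapping every vertex to its parent (or `None` for roots), and the graph is a list of edges. The sketch also assumes that every vertex counts as a descendant of itself, so that a vertex adjacent to one of its ancestors has that ancestor in its bag.

```python
def strict_ancestors(parent, u):
    """Strict ancestors of u in the forest, nearest first."""
    chain = []
    while parent[u] is not None:
        u = parent[u]
        chain.append(u)
    return chain


def induced_bags(parent, edges):
    """Bag of u = {u} plus every strict ancestor of u that has a neighbor
    among the descendants of u (u itself included, by assumption)."""
    adjacency = {v: set() for v in parent}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    bags = {v: {v} for v in parent}
    for w in parent:
        chain = [w] + strict_ancestors(parent, w)
        # w is a descendant of every vertex u on the chain.
        for idx, u in enumerate(chain):
            for v in chain[idx + 1:]:  # strict ancestors of u
                if v in adjacency[w]:
                    bags[u].add(v)
    return bags


def width(bags):
    """Width of the decomposition: maximum bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# A triangle a, b, c with the elimination forest a -> b -> c (a is the root).
triangle_parent = {"a": None, "b": "a", "c": "b"}
triangle_bags = induced_bags(triangle_parent, [("a", "b"), ("b", "c"), ("a", "c")])
```

For the triangle, the bag of c contains all three vertices, so the induced decomposition has width 2, which matches the treewidth of the triangle.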
Reduced elimination forests. Intuitively, a reduced elimination forest is one that is minimal in terms of the depth of the nodes.
Definition 7.7. An elimination forest F of a graph G is reduced if for every vertex u and every child v of u in F , u has a neighbor among the descendants of v.
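Definition 7.7 can be checked mechanically. The Python sketch below tests whether an elimination forest is reduced and, if not, repeatedly re-attaches an offending child one level higher until the condition holds. The encoding (a `parent` dictionary with `None` for roots, and a list of edges), the function names, and the convention that every vertex is a descendant of itself are assumptions of this example.

```python
def build_adjacency(vertices, edges):
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def subtree(parent, v):
    """All descendants of v in the forest, v itself included."""
    children = {x: [] for x in parent}
    for x, p in parent.items():
        if p is not None:
            children[p].append(x)
    stack, found = [v], []
    while stack:
        w = stack.pop()
        found.append(w)
        stack.extend(children[w])
    return found


def find_violation(parent, adjacency):
    """Return a pair (u, v) with v a child of u violating Definition 7.7,
    or None if the forest is reduced."""
    for v, u in parent.items():
        if u is not None and not any(w in adjacency[u] for w in subtree(parent, v)):
            return u, v
    return None


def is_reduced(parent, edges):
    return find_violation(parent, build_adjacency(parent, edges)) is None


def make_reduced(parent, edges):
    """Re-attach offending children to the grandparent until reduced."""
    parent = dict(parent)
    adjacency = build_adjacency(parent, edges)
    while (violation := find_violation(parent, adjacency)) is not None:
        u, v = violation
        parent[v] = parent[u]  # v becomes a sibling of u (or a root)
    return parent


# Star with center a and leaves b, c, arranged as the chain a -> b -> c.
chain = {"a": None, "b": "a", "c": "b"}
star_edges = [("a", "b"), ("a", "c")]
reduced = make_reduced(chain, star_edges)
```

The loop terminates because every re-attachment strictly decreases the sum of depths of all vertices. In the example, c is moved from below b to directly below a, giving a forest in which both b and c are children of a.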
The condition above was already considered in the context of treedepth [FGP15] and trivially perfect graphs [DFPV15]. We now show that in Lemma 7.6 one can require that the elimination forest is reduced.
Lemma 7.8. For every graph G there exists a reduced elimination forest of G whose width is equal to the treewidth of G.
Proof. Lemma 7.6 asserts that there are some elimination forests of G that have width equal to the treewidth of G. Among these elimination forests, pick one that minimizes the sum of depth of all the vertices, and call it F . We claim that F is reduced.
Suppose, for the sake of contradiction, that some vertex u has a child v such that no descendant of v is adjacent to u. Modify F by re-attaching v: make v a child of the parent of u, instead of u, or make it into a root if u has no parent (is a root). Since we assumed that the descendants of v are non-adjacent to u, it follows that the obtained forest F′ is still an elimination forest. Moreover, during the modification only some vertices ceased to be the descendants of u, and otherwise the sets of ancestors and descendants of all the vertices stayed the same. Consequently, in the construction of the induced tree decomposition from F′, every vertex will be assigned a bag that is a subset of the bag that was assigned to it when F was considered. This implies that the width of F′ is not larger than the width of F . However, F′ has strictly smaller sum of depths of all the nodes than F . This contradicts the choice of F .
We now derive a simple, yet useful property of a reduced elimination forest.
Lemma 7.9. Suppose F is a reduced elimination forest of a graph G. Then for every tree factor A in F , the subgraph G[A] is connected.
Proof. For the sake of contradiction, suppose A can be partitioned into nonempty subsets X and Y such that there is no edge between X and Y in G. Since A is a tree factor in F , there is at least one pair of vertices (u, v) such that u is the parent of v, while u and v belong to the opposite sides of the partition (X, Y ). Choose (u, v) so that v is the deepest among pairs with this property, and assume w.l.o.g. that u ∈ X and v ∈ Y . By the choice of (u, v), the tree factor at v is entirely contained in Y . Hence no descendant of v is adjacent to u, due to u ∈ X. This is a contradiction with F being reduced.
Finally, we derive two additional technical lemmas about reduced elimination forests, which will be needed to achieve property (D2) of the Dealternation Lemma.
Lemma 7.10. Suppose F is a reduced elimination forest of a graph G; let ℓ be the width of F . Suppose further that X, A 1 , A 2 , . . . , A p is a partition of the vertex set of G such that there is no edge between A i and A j for i ≠ j. Then any maximal (V (G) \ X)-factor that is a context factor intersects at most ℓ + 1 among sets A 1 , A 2 , . . . , A p .
Proof. Let t be the tree decomposition induced by F . Fix any maximal (V (G) \ X)-factor B that is a context factor, and assume B ∩ A i is nonempty for some i.
Let B′ ⊇ B be the tree factor whose root is the root of B. Since B is a maximal (V (G) \ X)-factor that is a context factor, we infer that B′ must contain at least one vertex of X, because otherwise B′ would be a (V (G) \ X)-factor that would be a strict superset of B. By Lemma 7.9, G[B′] is connected. Observe that X ∩ B′ ⊆ B′ \ B, and hence sets X ∩ B′ and A i ∩ B are disjoint. Let P be a shortest path between X ∩ B′ and A i ∩ B in G[B′]. Denote the endpoints of P by u and v, where u ∈ A i ∩ B and v ∈ X ∩ B′. As P was chosen to be the shortest, no vertex of P apart from v belongs to X. Since all the neighbors of vertices of A i lie in A i ∪ X, and u belongs to A i , we infer that all the vertices on P apart from v belong to A i .
Since P is connected, the set of those vertices whose bags in t contain any vertex of P induces a connected subtree of F . This subtree contains both a vertex in B, namely u, and a vertex in B′ \ B, namely v, and hence it contains the whole path in F between these vertices. In particular the parent of the appendices of B is included in this subtree; denote it by w. Summarizing, there is a vertex a on P that is included in the bag of w. Observe that a cannot be equal to v. This is because v belongs to the forest factor B′ \ B, so all the nodes whose bags contain v also belong to this forest factor. Consequently, a is a vertex on P that is different from v, so a ∈ A i .
Since this reasoning can be performed for each i such that B ∩ A i is nonempty, for each such index i we obtain a different vertex a that needs to be included in the bag of w. The size of the bag of w is, however, bounded by ℓ + 1, so the same bound holds also for the number of indices i as above.
Lemma 7.11. Suppose t is a tree decomposition of width k of a graph G and F is a reduced elimination forest of G of width at most k, such that t and F satisfy condition (D1) of the Dealternation Lemma (Lemma 3.7) for some function f (k) ∈ O(k 3 ). Then for every node x of t, there are at most g(k) children of x in the set {y : y is a node of t with at least one context factor in fact F (cmp t (y))}, for some function g(k) ∈ O(k 4 ).
Proof. Fix any node x of t, and let y 1 , y 2 , . . . , y p be its children in t. Denote A i = cmp t (y i ) for i = 1, . . . , p, and X = V (G) \ (A 1 ∪ . . . ∪ A p ). Since t is a tree decomposition of G, it follows that there is no edge between A i and A j for any i ≠ j, and hence the tuple (X, A 1 , A 2 , . . . , A p ) satisfies the prerequisites of Lemma 7.10. Recall that F is reduced and has width at most k, so by Lemma 7.10 we conclude that for any context factor B from the maximal factorization fact F (V (G) \ X), at most k + 1 among sets A 1 , A 2 , . . . , A p intersect B. Observe that V (G) \ X can be obtained from cmp t (x) by removing at most k + 1 vertices. The maximal factorization of cmp t (x) in F has at most f (k) factors, so by Lemma 7.5 we have that the maximal factorization of V (G) \ X in F has at most 9 · f (k) + 3(k + 1) factors. Consequently, if we take g(k) = (9 · f (k) + 3(k + 1)) · (k + 1), then at most g(k) among sets A 1 , A 2 , . . . , A p can intersect any context factor in the maximal factorization of V (G) \ X. We claim that all the other sets A i have only forest factors in their maximal factorizations, which will conclude the proof.
Take any such A i , that is, A i intersects only forest factors of the maximal factorization of V (G) \ X. Let B be any tree factor in F that is contained in V (G) \ X. Since F is reduced, by Lemma 7.9 we have that G[B] is connected. There are no edges between A i and A j for any j = i, so we conclude that B is either entirely contained or entirely disjoint with A i . Since A i is disjoint with all the context factors of fact F (V (G) \ X), it follows that the set A i is closed under taking descendants in F . In particular, this implies that the maximal factorization of A i contains no context factors, as promised.
7.4. From Local to Global Dealternation Lemma. In this section we give a proof of the Dealternation Lemma assuming its local counterpart, which will be formulated in a moment. First, for convenience we introduce the appropriate notion of alternation for tree decompositions.
Definition 7.12. Suppose t is a tree decomposition of a graph G, and F is an elimination forest of G. The t-alternation of F is defined as the maximum, among the nodes x of t, of the number of maximal cmp t (x)-factors in F . In other words, the t-alternation of F is equal to:
\[ \max_{x \in V(t)} \left| \mathrm{fact}_F(\mathrm{cmp}_t(x)) \right|. \]
Thus, to prove the Dealternation Lemma it suffices to show that there always exists an optimum-width elimination forest F of G, such that the t-alternation of F is bounded by a cubic function of the width of t (that is, condition (D1) holds), and such that F also satisfies condition (D2).
The idea for the proof is as follows. We take any reduced elimination forest F of G of optimum width, and iteratively "correct" F so that its t-alternation becomes bounded. To achieve this, we examine each node x of t and correct F so that the number of maximal cmp t (x)-factors in F is bounded by f (k), for some cubic function f . For this, we devise a local correction procedure, which we call the Local Dealternation Lemma; this procedure is applied iteratively to all the nodes of t.
Lemma 7.13 (Local Dealternation Lemma). There exists a function f (k) ∈ O(k^3) such that the following holds. Suppose G is a graph of treewidth at most k, and F is a reduced elimination forest of G of optimum width. Suppose further that (U, X, W ) is a partition of the vertex set of G such that |X| ≤ k + 1 and there is no edge between U and W . Then there is a reduced elimination forest F′ of G of optimum width with the following conditions satisfied:
(LD1) There are at most f (k) maximal U -factors in F′ .
(LD2) For every U′ ⊆ U , every U′ -factor in F is also a U′ -factor in F′ .
(LD3) For every W′ ⊆ W , every W′ -factor in F is also a W′ -factor in F′ .
We remark that a statement formulating the essence of the Local Dealternation Lemma can be found in the work of Courcelle and Lagergren [CL96, Theorem 6.3].
When applying the Local Dealternation Lemma to each component of t, we need to be careful: we have to make sure that an application that corrects cmp t (x)-factors for some x does not increase the number of cmp t (y)-factors for nodes y that were corrected before. To achieve this, we shall apply the Local Dealternation Lemma in a bottom-up order on the nodes of t. At the end, this ensures that property (D1) is satisfied. For property (D2), we guarantee that all the intermediate elimination forests, as well as the final one, are reduced, and we make use of Lemma 7.10. We now proceed to a formal reasoning, supposing that the Local Dealternation Lemma holds.
Proof of Dealternation Lemma, using Local Dealternation Lemma. Since we know that t has width at most k, we have that all adhesions in t have sizes not larger than k + 1. Let F 0 be any reduced elimination forest of G of optimum width, which exists by Lemma 7.8. Clearly, as t has width k, the width of F 0 is at most k.
Let ≺ be an arbitrary linear order on the node set of t such that whenever a node x is a strict descendant of a node y, then x comes before y in ≺. Let x 1 ≺ x 2 ≺ . . . ≺ x m be the nodes of t listed in this order, where m = |V (t)|. We process the nodes of t in the order ≺, inductively computing reduced elimination forests F 1 , . . . , F m , starting with F 0 . We keep the following invariant for every i = 0, 1, . . . , m: in decomposition F i , the number of maximal cmp t (x j )-factors is at most f (k) for every j ≤ i, where f is the function given by the Local Dealternation Lemma. Thus, the invariant is satisfied vacuously for i = 0. Observe that the reduced elimination forest F m obtained at the end of the construction has t-alternation bounded by f (k).
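As an illustration only, one way to obtain a processing order in which every strict descendant precedes its ancestors is a post-order traversal of t. The sketch below assumes that t is given as a rooted tree through a hypothetical `children` dictionary; the function name and the data layout are not taken from the paper.

```python
from typing import Dict, Hashable, List


def bottom_up_order(children: Dict[Hashable, List[Hashable]],
                    root: Hashable) -> List[Hashable]:
    """Return the nodes so that every strict descendant precedes its ancestors."""
    order: List[Hashable] = []
    # Each stack entry is (node, expanded); a node is emitted only after
    # all of its children have been emitted.
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
        else:
            stack.append((node, True))
            for child in reversed(children.get(node, [])):
                stack.append((child, False))
    return order


# Hypothetical decomposition tree: r has children a and b, a has child c.
example_children = {"r": ["a", "b"], "a": ["c"]}
example_order = bottom_up_order(example_children, "r")
```

Any order with this property works for the argument; post-order is merely a convenient choice.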
For i ≥ 1, construct decomposition F i by applying the Local Dealternation Lemma to the elimination forest F i−1 and the partition
\[ (U, X, W) = \bigl(\mathrm{cmp}_t(x_i),\ \mathrm{adh}_t(x_i),\ V(G) \setminus (\mathrm{cmp}_t(x_i) \cup \mathrm{adh}_t(x_i))\bigr) \]
of the vertex set of G; the fact that this partition satisfies the prerequisites of the Local Dealternation Lemma follows from the properties of the tree decomposition t. Clearly, by condition (LD1), the number of maximal cmp t (x i )-factors in F i is at most f (k). It remains to prove the same conclusion for maximal cmp t (x j )-factors, for every j < i. Since x j ≺ x i , we have that x j is not an ancestor of x i . If x j is a descendant of x i , then cmp t (x j ) ⊆ cmp t (x i ) = U . By condition (LD2), every cmp t (x j )-factor in F i−1 is also a cmp t (x j )-factor in F i , hence the number of maximal cmp t (x j )-factors in F i cannot be larger than the number of maximal cmp t (x j )-factors in F i−1 , which is at most f (k) by induction. If x j is not a descendant of x i , then since it is not an ancestor either, we obtain that cmp t (x j ) ⊆ V (G) \ (cmp t (x i ) ∪ adh t (x i )) = W . By condition (LD3), every cmp t (x j )-factor in F i−1 is also a cmp t (x j )-factor in F i , hence again the number of maximal cmp t (x j )-factors in F i cannot be larger than f (k) by induction.
Thus, we have found an elimination forest F = F m of G such that: (i) F is reduced and has optimum width, and (ii) the t-alternation of F is bounded by f (k). Hence, F satisfies property (D1). Finally, note that property (D2) is also satisfied for F due to Lemma 7.11, because F is reduced.
7.5. Proof of the Local Dealternation Lemma. We are left with proving the Local Dealternation Lemma. Let F be the given reduced elimination forest of G of optimum width. Also, let s be the tree decomposition induced by F . Recall that the forest underlying s is equal to F , while the bags are constructed as described in Section 3. Finally, let ℓ ≤ k be the width of s, which is equal to the treewidth of G.
To ease the description, we color the vertices of G as follows: vertices of U are red, and the vertices of W are blue. The vertices of X do not receive any color. When we say that some set is monochromatic, we mean that all its members are red or all its members are blue. In particular, a monochromatic set has no elements of X. Similarly, when we say that two vertices have the same color, or are of different colors, we implicitly state that both of them are assigned some color, so they belong to U ∪ W .
The idea is to modify the forest F by performing local "surgery" on its shape, so that at the end it satisfies condition (LD1). During the modification we will make sure that the final decomposition will be reduced and will satisfy conditions (LD2) and (LD3). In order not to obfuscate the description, we do not verify conditions (LD2) and (LD3) directly, as their satisfaction follows immediately from the nature of the modification performed. More precisely, the modification will satisfy the following invariants:
(I1) Whenever u is the parent of v, and u and v are of the same color, then u remains the parent of v after the modification.
(I2) Whenever u and v are siblings, and the tree factors at u and v are monochromatic and of the same color, then u and v remain siblings after the modification.
(I3) Whenever u ∈ U ∪ W is a leaf, it remains a leaf after the modification.
It is easy to see that the satisfaction of these invariants ensures that conditions (LD2) and (LD3) are preserved. We leave the verification of the invariants throughout the description to the reader. Finally, the fact that the output elimination forest is reduced will be checked explicitly.
The main idea is to examine each maximal (U ∪ W )-factor in F , and reorganize it so that it can be partitioned into a bounded number of U - and W -factors. The fact that F is reduced implies that no reorganization is needed for forest factors: from Lemma 7.9 it follows that every forest factor of fact F (U ∪ W ) can be partitioned into one U -factor and one W -factor. For context factors of fact F (U ∪ W ), some rearrangement is, however, necessary. For this, we will use the tools developed in Section 7.1.
We start by observing the following property that is implied by the fact that F is reduced.
Claim 7.14. Suppose u is a vertex such that the tree factor at u in F is entirely contained in U ∪ W . Then this tree factor is monochromatic. Moreover, if u has a parent v, then it cannot happen that u and v have different colors.
Proof. Let A be the tree factor at u. Since there is no edge between U and W , there is also no edge between U ∩ A and W ∩ A. However, G[A] is connected by Lemma 7.9. Hence either U ∩ A or W ∩ A is empty, which establishes the first claim. For the second claim, observe that otherwise the pair (v, u) would contradict the fact that F is reduced.
We now examine fact F (U ∪ W ), the maximal factorization of U ∪ W in F . Recall that this partition of U ∪ W consists of all maximal (U ∪ W )-factors in F . Since |X| ≤ k + 1, by Lemma 7.3 we obtain the following.
Claim 7.15. Factorization fact F (U ∪ W ) has at most k + 2 forest factors and at most 2k + 1 context factors.
The context factors of fact F (U ∪ W ) may need reorganization. Fix some context factor B from fact F (U ∪ W ). We now analyze the structure of B. The path from the root of B to the parent of the appendices of B shall be called the spine of the context factor B; we denote the spine by S. For a vertex v ∈ S, let R v denote the set of those strict descendants of v which belong to B, and for which v is their lowest ancestor on the spine. Note that R v may be empty, if no such descendant exists, and otherwise it is a forest factor with roots being those children of v that are in B but are not on S. Let us observe the following.
Claim 7.17. For each vertex v ∈ S, every vertex of R v has the same color as v.
Proof. Follows immediately from Claim 7.14.
For each v ∈ S, let C v = {v} ∪ R v . Observe that each such set C v , for v ∈ S, is a context factor in F , which is moreover monochromatic by Claim 7.17. Thus, each C v is a U - or W -factor, depending on the color of v.
A vertex v ∈ S shall be called important if either v is the deepest vertex on S (i.e., the parent of the appendices of B), or adh s (v) \ adh s (v′ ) contains a vertex of X, where v′ is the child of v on S. We note that there are not many important vertices.
Claim 7.18. There are at most ℓ + 1 important vertices on S.
Proof. For each important vertex v ∈ S that is not the deepest vertex on S, select any vertex x v that belongs both to X and to adh s (v) \ adh s (v′ ), where v′ is the child of v on S. Observe that since B ∩ X = ∅, it follows that x v is an ancestor of all the vertices of S. Hence x v ∈ adh s (r), where r is the root of B. As |adh s (r)| ≤ ℓ and vertices x v are pairwise different for different vertices v, it follows that the number of important vertices on S is at most ℓ + 1 (where the +1 summand is contributed by the deepest vertex of S).
We now consider a factorization ℱ B of B into context factors defined as follows:
• For each important vertex v on S, put C v into ℱ B as a separate context factor. These context factors shall be called important.
• For each maximal subpath S′ of S that does not contain any important vertices, put the context factor $\bigcup_{v \in S'} C_v$ into ℱ B . These context factors shall be called regular.
By Claim 7.18, ℱ B consists of at most ℓ + 1 important factors and at most ℓ + 1 regular factors. The important factors of ℱ B are monochromatic by Claim 7.17, but the same cannot be said about the regular ones. Therefore, let us fix a regular factor B′ ∈ ℱ B . That is, $B' = \bigcup_{v \in S'} C_v$ for some maximal subpath S′ of S that does not contain any important vertices. The path S′ shall be called the spine of B′ .
Let us enumerate the vertices of S′ as v 1 , . . . , v m , where v i is an ancestor of v j for i ≤ j. Further, let v m+1 be the child of v m on S (which exists because the deepest vertex of S is considered important). For brevity, we will write R i = R v i and C i = C v i . For each i ∈ {1, 2, . . . , m}, define
\[ Q_i = \mathrm{adh}_s(v_i) \setminus \mathrm{adh}_s(v_{i+1}). \]
Note that by the way s is constructed from F (see Section 3), Q i comprises all strict ancestors of v i that do have a neighbor in C i , but do not have any neighbors among the descendants of v i+1 . This implies the following.
Claim 7.19. For each i ∈ {1, 2, . . . , m}, every vertex of Q i has the same color as v i .
Proof. By symmetry, suppose v i ∈ U . Consider any x ∈ Q i and let w be any neighbor of x in C i . By Claim 7.17, we have w ∈ U . Since v i is not important, x ∉ X. Therefore we must have x ∈ U , for otherwise wx would be an edge with one endpoint in U and the second in W .
For i = 1, 2, . . . , m, let x i be the word over the alphabet Σ = {+, −} defined as follows:
\[ x_i = {+}\,\underbrace{{-}\,{-}\cdots{-}}_{|Q_i|\ \text{times}}. \]
That is, we first put one +, and then repeat − exactly |Q i | times. Color x i with the same color as v i , and define a bichromatic word h as follows:
\[ h = x_1 x_2 \cdots x_m. \]
The idea is to apply the block-shuffle given by Lemma 7.1 to h; this block-shuffle will naturally induce a reorganization of B′ within F , as depicted in Figure 3. Thus, the number of monochromatic blocks will be reduced, while the additional properties asserted by Lemma 7.1 will ensure that the width of the decomposition does not increase.
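The following sketch illustrates the encoding on a toy instance. It assumes that h is the concatenation of the words x 1 , . . . , x m , that sum(·) counts + as +1 and − as −1, and that pmax(·) is the largest sum over all prefixes (including the empty one). These conventions belong to Section 7.1, which is not reproduced here, so they should be read as assumptions; the sizes |Q i | and the colors below are made-up example data.

```python
from typing import List


def spine_word(q_size: int) -> str:
    """Word x_i: one '+' followed by |Q_i| copies of '-' (ASCII minus)."""
    return "+" + "-" * q_size


def word_sum(word: str) -> int:
    """Assumed meaning of sum(.): number of '+' minus number of '-'."""
    return word.count("+") - word.count("-")


def prefix_max(word: str) -> int:
    """Assumed meaning of pmax(.): largest sum over all prefixes."""
    best = running = 0
    for symbol in word:
        running += 1 if symbol == "+" else -1
        best = max(best, running)
    return best


def count_blocks(colors: List[str]) -> int:
    """Number of maximal runs of consecutive words sharing a color."""
    return sum(1 for i, c in enumerate(colors) if i == 0 or colors[i - 1] != c)


# Hypothetical data: |Q_1| = 0, |Q_2| = 2, |Q_3| = 1.
q_sizes = [0, 2, 1]
colors = ["red", "blue", "red"]
words = [spine_word(q) for q in q_sizes]
h = "".join(words)
```

In this toy instance the colors alternate, so h has three monochromatic blocks; a block-shuffle that moves the two red words next to each other would leave two.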
We proceed to the details, but first we need to examine the parameters of h needed to apply Lemma 7.1.
Claim 7.20. For each i ∈ {1, 2, . . . , m}, we have sum( as claimed. On the other hand, by Claim 7.20, for every i = 0, 1, . . . , m we have that The first claimed inequality follows. Consider now the word h red , which is defined as On the other hand, since word x i is nonempty exactly when C i is red, similarly as in Claim 7.20 we obtain that Consequently, we have which implies the second claimed inequality. The proof for the third one is analogous.
Thus, we can apply Lemma 7.1 to the word h, obtaining a word h′ with the following properties.
• Word h′ is a block-shuffle of h, in particular every subword x i remains contiguous in h′ .
• The numbers of red and blue blocks in h′ are not larger than (5ℓ + 3)/2.
Now, based on h′ , we construct the modified context factor in a natural manner. Let π : {1, . . . , m} → {1, . . . , m} be a permutation such that
\[ h' = x_{\pi(1)} x_{\pi(2)} \cdots x_{\pi(m)}. \]
Then permute the context factors {C i : i ∈ {1, 2, . . . , m}} according to π; see Figure 3 for an illustration.
• Make v π(1) into a child of the node that was the parent of v 1 in s; in case v 1 was a root node, v π(1) becomes a root node.
• For each i = 1, 2, . . . , m − 1, make v π(i+1) a child of v π(i) .
• Make v m+1 into a child of v π(m) .
Since there is no edge between red and blue vertices in G, and h′ is a block-shuffle of h, this reorganization seems not to spoil the basic assumption that we are working with an elimination forest. We now verify this formally.
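The re-attachment steps above can be sketched as follows. This is an illustrative reading in which v π(1) , . . . , v π(m) are chained from top to bottom, with v m+1 hung below the last of them; the forest is represented by a hypothetical `parent` dictionary (roots map to `None`), and all names are assumptions made for the example. Vertices outside the spine keep their parents, so each R i travels together with v i .

```python
from typing import Dict, Hashable, List, Optional


def reorganize_spine(parent: Dict[Hashable, Optional[Hashable]],
                     spine: List[Hashable],
                     tail: Hashable,
                     pi: List[int]) -> Dict[Hashable, Optional[Hashable]]:
    """Re-attach the spine vertices according to a permutation.

    spine = [v_1, ..., v_m], tail = v_{m+1}, and pi[i - 1] = pi(i) (1-based values).
    """
    new_parent = dict(parent)
    top = parent.get(spine[0])  # None when v_1 was a root
    permuted = [spine[p - 1] for p in pi]
    new_parent[permuted[0]] = top
    for above, below in zip(permuted, permuted[1:]):
        new_parent[below] = above
    new_parent[tail] = permuted[-1]
    return new_parent


# Example with the permutation from the caption of Figure 3.
spine = ["v1", "v2", "v3", "v4", "v5", "v6"]
old_parent = {"v1": "p", "v2": "v1", "v3": "v2", "v4": "v3",
              "v5": "v4", "v6": "v5", "v7": "v6", "r2": "v2"}
new_parent = reorganize_spine(old_parent, spine, "v7", [1, 4, 5, 2, 3, 6])
```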
Apply the reorganization defined above to every regular factor B′ belonging to the factorization ℱ B , for every context factor B ∈ fact F (U ∪ W ). Let F′ be the obtained rooted forest. We now verify the properties of F′ . For convenience, denote by ℱ the family of all regular factors to which the reorganization was applied.
Claim 7.22. F′ is an elimination forest of G.
Proof. Take any edge uv ∈ E(G). Since F is an elimination forest of G, we have that u and v are bound by the ancestor-descendant relation in F ; say u is an ancestor of v. If u ∉ ⋃ℱ or v ∉ ⋃ℱ, then u remains an ancestor of v in F′ , because the modification yielding F′ is performed in each factor of ℱ separately, while the vertices outside of ⋃ℱ stay intact. Similarly, if u ∈ B u ∈ ℱ and v ∈ B v ∈ ℱ where B u ≠ B v , then the relative positions of B u and B v do not change during the reordering, and u remains an ancestor of v in F′ .
We are left with the case when u and v belong to the same factor B′ ∈ ℱ. Note that in particular u, v ∉ X. Since uv is an edge, it cannot be that u ∈ U and v ∈ W or vice versa. Assume then, without loss of generality, that u, v ∈ U , that is, both u and v are red. Let u′ and v′ be vertices on the spine of B′ such that u ∈ C u′ and v ∈ C v′ . Note that u′ and v′ are both red. Further, since u is an ancestor of v in F , we either have u′ = v′ , or u′ is a strict ancestor of v′ in F and u = u′ . In the former case, the ancestor-descendant relation within C u′ = C v′ is left intact by the reorganization, so u remains an ancestor of v in F′ . In the latter case, since the reorganization within B′ is performed by a block-shuffle, the relative order of u = u′ and v′ on the spine does not change, so again u remains an ancestor of v in F′ .

Figure 3: Reorganization of an example regular factor B′ with 7 vertices on the spine. The context before the reorganization is on the left panel, after is on the right. The applied permutation is π = (1, 4, 5, 2, 3, 6), and it leaves only two monochromatic blocks in h′ . Note that the last context C 7 does not participate in the reorganization and stays in place.
Claim 7.23. F′ is reduced.
Proof. Take any vertex u. Observe that if u does not lie on the spine of any factor of ℱ, then u has exactly the same descendants in F and in F′ , which are moreover partitioned in the same manner among the tree factors at the children of u. Hence, it remains to check what happens if u lies on the spine of a factor B′ ∈ ℱ. Adopt the notation from the description of the reorganization of the factor B′ , and suppose u = v i for some i ∈ {1, 2, . . . , m}. W.l.o.g. suppose u is red. Every tree factor contained in R i stays intact in F′ and remains attached below u, so u still has a neighbor in each of these tree factors. Therefore, the only tree factor at a child of u in F′ that remains to be checked is the tree factor at the child of u on the spine of B′ . Since the reorganization was obtained by a block-shuffle, the relative positions of red vertices along the spine remain the same in F′ as they were in F . Hence, this tree factor is obtained from the tree factor at v i+1 in F by adding and/or removing some blue vertices. Since F was reduced, u has a neighbor w in the tree factor rooted at v i+1 in F . The neighbor w in particular cannot be blue, because u is red. We infer that w remains in the tree factor at the child of v i on the spine in F′ , which concludes the proof.
Claim 7.24. The width of F′ is not larger than the width of F .
Proof. Let s′ be the tree decomposition induced by F′ . Take any vertex v of G. If v ∉ ⋃ℱ, then v has exactly the same ancestors and descendants in F′ as in F , hence it is assigned exactly the same bag in the induced decompositions s and s′ . Therefore, from now on assume that v ∈ B′ for some B′ ∈ ℱ. In particular v ∉ X, hence assume without loss of generality that v ∈ U , i.e., v is red.
Adopt the notation from the description of the reorganization of the factor B′ . Suppose first that v does not lie on the spine of B′ . In this case, the set of descendants of v does not change during the reorganization, however v can get new ancestors on the spine of B′ . Observe, nevertheless, that all these new ancestors will be blue, because the reorganization applied to the context factor B′ does not change the relative order of red vertices on the spine. As all descendants of v are red (by Claim 7.17), we infer that none of the new ancestors of v is included in the bag of v in s′ . Consequently, the bag of v in s′ is a subset of the bag of v in s.
Finally, we are left with the case when v belongs to the spine of B′ , say v = v i for some i ∈ {1, 2, . . . , m}. First, we observe that for every j ∈ {1, 2, . . . , m} it holds that This is because adh s (v j ) \ adh s (v j+1 ) consists of strict ancestors of v j with the same color as v j , which in particular do not belong to X (Claim 7.19), and B′ is reorganized through a block shuffle. Since F and F′ are reduced (Claim 7.23), we also have By Claim 7.20, this implies that Therefore, we have
\[ |\mathrm{bag}_{s'}(v_i)| = 1 + |\mathrm{adh}_{s'}(v_i)| = 1 + |\mathrm{adh}_{s'}(v_{\pi(1)})| + \mathrm{sum}\bigl(x_{\pi(1)} x_{\pi(2)} \cdots x_{\pi(\pi^{-1}(i)-1)}\bigr). \tag{7.1} \]
Observe that by construction we have
\[ \mathrm{adh}_{s}(v_1) = \mathrm{adh}_{s'}(v_{\pi(1)}). \tag{7.2} \]
Moreover, by the definition of pmax(·), Lemma 7.1, and Claim 7.21, we have Here, the additional +1 summand on the left hand side is obtained by including also the first + symbol at the front of x i = x π(π −1 (i)) . By combining (7.1) with (7.2) and (7.3), we infer that |bag s′ (v i )| ≤ ℓ + 1, as requested.
The whole construction was set up in order to make sure that after the reorganization, the red vertices within each factor of ℱ can be grouped into a small number of U -factors. We now check this formally.
Claim 7.25. Let B′ ∈ ℱ. Then the set B′ ∩ U can be partitioned into at most (5ℓ + 3)/2 sets that are U -factors in F′ .
Proof. Let us adopt the notation from the description of the reorganization of the factor B′ . Take any monochromatic block x π(i) x π(i+1) . . . x π(j) in h′ . Observe that the corresponding vertex set C π(i) ∪ C π(i+1) ∪ . . . ∪ C π(j) is a monochromatic context factor in F′ of the same color as the block. Consequently, since there are at most (5ℓ + 3)/2 maximal red blocks in h′ , the set B′ ∩ U can be partitioned into at most (5ℓ + 3)/2 U -factors in F′ .
We now argue that the forest F′ has all the required properties. By Claim 7.22, F′ is indeed an elimination forest of G, and by Claim 7.23 it is reduced. By Claim 7.24, the width of F′ is not larger than the width of F . We now bound the number of maximal U -factors in F′ . Observe that for every forest factor A ∈ fact F (U ∪ W ), A ∩ U is either empty or a forest factor in F , which stays intact in F′ (Claim 7.16). On the other hand, if B ∈ fact F (U ∪ W ) is a context factor, then ℱ B contains at most ℓ + 1 important factors and at most ℓ + 1 regular factors (Claim 7.18). Each important factor of ℱ B is monochromatic, while for each regular factor B′ ∈ ℱ B , the set B′ ∩ U can be partitioned into at most (5ℓ + 3)/2 U -factors in F′ (Claim 7.25). By Claim 7.15 and since ℓ ≤ k, we conclude that U can be partitioned into at most f (k) := (k + 2) + (2k + 1) · (k + 1) · (5k + 3)/2 U -factors in F′ . Since each U -factor is contained in some maximal U -factor, and maximal U -factors in F′ form a partition of U by Lemma 3.2, we infer that there are at most f (k) maximal U -factors in F′ . This establishes condition (LD1). Finally, as we said before, the satisfaction of conditions (LD2) and (LD3) follows easily from preserving invariants (I1)-(I3), and we leave this verification to the reader. This concludes the proof of the Local Dealternation Lemma.
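As a quick sanity check of the growth rate, the bound f (k) stated above can be evaluated exactly for small k. The sketch below only evaluates the displayed expression; the function name `f_bound` is a hypothetical choice, and exact rationals are used because (5k + 3)/2 need not be an integer.

```python
from fractions import Fraction


def f_bound(k: int) -> Fraction:
    """Evaluate (k + 2) + (2k + 1)(k + 1)(5k + 3)/2 exactly."""
    return (k + 2) + Fraction((2 * k + 1) * (k + 1) * (5 * k + 3), 2)


# Print a few values; the leading term of the product is 5k^3.
for k in (1, 2, 3, 10):
    print(k, f_bound(k))
```

The leading term 5k^3 confirms that this expression is cubic in k, in line with the requirement f (k) ∈ O(k^3) in the statement of the lemma.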

Normal form for mso transductions
In this section we prove Theorem 6.2. Let us first discuss the proof strategy. Recall that an mso transduction is a finite sequence of atomic steps, each being filtering, universe restriction, interpretation, copying, or coloring. Hence, the idea is to show that one can appropriately swap and merge these steps while modifying them slightly, so that the final normal form is achieved. It will be trivial to implement the rules algorithmically, hence we focus only on their description. As most of the rules are very simple, we keep the argumentation concise.
We start with merging rules: whenever two steps of the same type, apart from coloring, appear consecutively in the sequence, then they can be merged into one step.
Claim 8.1. If I 1 and I 2 are two atomic transductions of the same kind, being either renaming, copying, filtering, universe restriction, or interpretation, then I 2 ∘ I 1 can be expressed as a single step of the same kind.
Proof. For copying and renaming the claim is trivial. For filtering, it suffices to take filtering with mso sentence ψ 1 ∧ ψ 2 , where ψ 1 and ψ 2 are sentences used in I 1 and I 2 , respectively. For universe restriction, suppose ϕ 1 (u) and ϕ 2 (u) are two mso formulas used in I 1 and I 2 , respectively. Then it suffices to take a universe restriction step using the formula ϕ(u) = ϕ 1 (u) ∧ ϕ′ 2 (u), where ϕ′ 2 (·) is constructed from ϕ 2 (·) by restricting the universe to the elements satisfying ϕ 1 (·), that is, adding a guard to each quantifier that restricts its range to (sets of) elements satisfying ϕ 1 (·). Finally, for interpretation, it suffices to replace each relation atom R(x 1 , . . . , x r ) appearing in each mso formula used in I 2 , by the mso formula ϕ R (x 1 , . . . , x r ) used in I 1 to define the interpretation of R. The formulas obtained in this way define an interpretation that is equivalent to I 2 ∘ I 1 .
Next, we give swapping rules that enable us to exchange pairs of consecutive transductions. We first check that renaming steps can be swapped with any other step, thus they can always be pushed to the left.
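The universe-restriction case in the proof of Claim 8.1 relies on relativizing a formula to a guard. A minimal sketch of this syntactic operation is given below, assuming formulas are encoded as nested Python tuples; the encoding, the tag names, and the naming of the fresh element variable are hypothetical choices made for illustration, and the sketch assumes that the fresh name does not clash with existing variables.

```python
from typing import Callable, Tuple

Formula = Tuple  # e.g. ("exists", "y", ("atom", "E", ("u", "y")))
Guard = Callable[[str], Formula]  # maps a variable name to the guard on it


def relativize(formula: Formula, guard: Guard) -> Formula:
    """Restrict every quantifier to (sets of) elements satisfying the guard."""
    tag = formula[0]
    if tag in ("atom", "in"):
        return formula
    if tag == "not":
        return ("not", relativize(formula[1], guard))
    if tag in ("and", "or"):
        return (tag, relativize(formula[1], guard), relativize(formula[2], guard))
    if tag == "exists":
        _, var, body = formula
        return ("exists", var, ("and", guard(var), relativize(body, guard)))
    if tag == "forall":
        _, var, body = formula
        return ("forall", var, ("or", ("not", guard(var)), relativize(body, guard)))
    if tag in ("exists_set", "forall_set"):
        _, set_var, body = formula
        elem = "elem_of_" + set_var  # assumed fresh
        # "every member of the set satisfies the guard"
        inside = ("forall", elem,
                  ("or", ("not", ("in", elem, set_var)), guard(elem)))
        inner = relativize(body, guard)
        if tag == "exists_set":
            return ("exists_set", set_var, ("and", inside, inner))
        return ("forall_set", set_var, ("or", ("not", inside), inner))
    raise ValueError("unknown formula tag: %r" % (tag,))


def merge_restrictions(phi1: Guard, phi2: Guard) -> Guard:
    """Single guard for 'restrict by phi1, then restrict by phi2'."""
    return lambda u: ("and", phi1(u), relativize(phi2(u), phi1))


# Example: phi1(u) = P(u), phi2(u) = exists y. E(u, y).
phi1 = lambda u: ("atom", "P", (u,))
phi2 = lambda u: ("exists", "y", ("atom", "E", (u, "y")))
merged = merge_restrictions(phi1, phi2)("u")
```

The same guarding of quantifiers is what the later swapping rules use when a filtering or interpretation step is moved across a universe restriction.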
Claim 8.2. Suppose I 1 is a renaming step and I 2 is an atomic transduction that is not renaming. Then I 2 ∘ I 1 = I′ 1 ∘ I′ 2 , where I′ 1 is a renaming step and I′ 2 is an atomic step of the same kind as I 2 .
Proof. If I 2 is an interpretation step, then we can just apply Claim 8.1 to merge I 1 and I 2 into a single interpretation step I′ 2 , and take I′ 1 to be identity. For other kinds of transductions, it is trivial to rewrite I 2 in the vocabulary before renaming, thus obtaining I′ 2 , and we put I′ 1 = I 1 .
Next, we show that the universe restriction steps can be pushed to the left by swapping.
Claim 8.3. Suppose I 1 is a universe restriction step and I 2 is an atomic transduction that is not a universe restriction. Then I 2 ∘ I 1 = J ∘ I′ 1 ∘ I′ 2 for some J , I′ 1 and I′ 2 , such that J is a renaming step, I′ 1 is a universe restriction step, and I′ 2 is an atomic transduction of the same kind as I 2 .
Proof. Let ϕ(·) be the formula used by I 1 to restrict the universe. We proceed by case study, depending on the kind of I 2 .
If I 2 is a coloring step, then we can take I′ 1 = I 1 , I′ 2 = I 2 , and J to be identity, since introducing the new color has no effect on the application of universe restriction.
If I 2 is a filtering step, say using an mso sentence ψ, then we can take I′ 1 = I 1 and I′ 2 to be filtering using ψ restricted to the elements satisfying ϕ(·). That is, we modify ψ by adding a guard to each quantifier that restricts its range to (sets of) elements satisfying ϕ(·). For J we can take the identity.
If I 2 is a copying step, then we can take I′ 2 = I 2 and J to be identity, and we define I′ 1 as follows. First, let ϕ′(u) be the formula over the vocabulary after copying, obtained from ϕ(u) by additionally requiring that u belongs to the first layer of copies and restricting the range of each quantifier to the first layer. Then, I′ 1 is universe restriction with the mso predicate ϕ″(u) saying that the unique element u′ that is a copy of u from the first layer satisfies ϕ′(u′). Thus, ϕ′(·) works on the first layer in exactly the same manner as ϕ(·) worked on the original universe, while ϕ″(·) removes all copies of all elements that would be removed by ϕ(·).
Finally, if I 2 is an interpretation step, then we proceed as follows. As I′ 2 we take I 2 restricted to the elements that satisfy ϕ(·); that is, in every formula used in I 2 we restrict both the free variables and all quantifiers to elements satisfying ϕ(·). Moreover, I′ 2 only adds relations to the structure, interpreted via formulas modified as in the previous sentence, while all relations of the original vocabulary are kept intact via identity interpretations. Next, we take I′ 1 = I 1 ; note that thus I′ 1 works on the original relations that were kept in the structure. Finally, we add a renaming step J that removes the relations of the original vocabulary and renames the other relations added by I′ 2 to their final names.
Next, we push the interpretation steps to the left by swapping.
Claim 8.4. Suppose I 1 is an interpretation step and I 2 is an atomic transduction, being either coloring, filtering, or copying. Then I 2 ∘ I 1 = I′ 1 ∘ I′ 2 , where I′ 1 is an interpretation step and I′ 2 is an atomic transduction of the same kind as I 2 .
Proof. We proceed by case study, depending on the kind of I 2 .
If I 2 is a coloring step, then we can simply put I′ 2 = I 2 and I′ 1 to be I 1 enriched by keeping the unary predicate introduced by I 2 intact.
If I 2 is a filtering step, say using an mso sentence ψ, then we can put I′ 1 = I 1 and I′ 2 to be a filtering using the sentence ψ′ obtained from ψ by replacing each relation atom R(x 1 , . . . , x r ) by its interpretation ϕ R (x 1 , . . . , x r ) under I 1 .
Finally, if I 2 is a copying step, then we take I′ 2 = I 2 and I′ 1 defined as follows. First, from each formula ϕ R (x 1 , . . . , x r ) used by I 1 construct a formula ϕ′ R (x 1 , . . . , x r ) by restricting all free variables and ranges of all quantifiers to the first layer of copies. Then, in I′ 1 to interpret relation R use the formula ϕ″ R (x 1 , . . . , x r ) which expresses the following: the unique elements x′ 1 , . . . , x′ r that are the first-layer copies of x 1 , . . . , x r , respectively, satisfy ϕ′ R (x′ 1 , . . . , x′ r ).
The next type to tackle is copying.
Claim 8.5. Suppose I 1 is a copying step and I 2 is an atomic transduction, being either filtering or coloring. Then I 2 ∘ I 1 = J ∘ I 1 ∘ I′ 2 , where J is a single interpretation step, while I′ 2 is a single filtering step if I 2 was filtering, and I′ 2 is a finite sequence of coloring steps if I 2 was coloring.
Proof. Let I 1 copy the universe ℓ times.
First, suppose I 2 is a filtering step, say using an mso sentence ψ. Then we can take J to be the identity, while I′ 2 is filtering using a sentence ψ′ obtained from ψ by restricting the ranges of all quantifiers to the first layer of copies.
Second, suppose I 2 is a coloring step, say introducing a unary predicate X. Then we take I′ 2 to be a sequence of ℓ coloring steps as follows. The ith coloring step introduces a unary predicate X i . After performing the copying (transduction I 1 ), we add an additional interpretation step J that introduces the unary predicate X interpreted as follows: if u is from the ith layer of copies, then u is declared to belong to X if and only if it belongs to X i ; this can be easily expressed in mso. The auxiliary predicates X 1 , . . . , X ℓ are dropped by interpretation J .
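The coloring case can be illustrated at the level of sets. The sketch below assumes that a copy of an element u in the ith layer is represented as the pair (u, i), and that a coloring is simply a set of elements; both representations, and the function names, are hypothetical. It shows that any coloring X of the copied universe corresponds to one coloring X i of the original universe per layer, from which X is recovered as in the interpretation step J.

```python
from typing import Hashable, List, Set, Tuple

Copy = Tuple[Hashable, int]  # (original element, layer index in 1..copies)


def split_by_layer(target: Set[Copy], copies: int) -> List[Set[Hashable]]:
    """Colorings X_1, ..., X_copies of the original universe encoding target."""
    return [{u for (u, layer) in target if layer == i}
            for i in range(1, copies + 1)]


def interpret_color(layers: List[Set[Hashable]]) -> Set[Copy]:
    """Predicate X after copying: (u, i) is in X iff u is in X_i."""
    return {(u, i) for i, x_i in enumerate(layers, start=1) for u in x_i}


# Example: two layers over the universe {a, b}.
target = {("a", 1), ("a", 2), ("b", 2)}
layers = split_by_layer(target, 2)
recovered = interpret_color(layers)
```

Since the two maps are mutually inverse, guessing the ℓ colorings before copying is as expressive as guessing a single coloring afterwards.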
Finally, we are left with swapping coloring and filtering.
Claim 8.6. Suppose I 1 is a filtering step and I 2 is a coloring step. Then I 2 ∘ I 1 = I 1 ∘ I 2 .
Proof. The filtering may just ignore the new predicate introduced by the coloring.
We now show that using the merging and swapping rules described in the above claims, we can reduce any sequence of atomic transductions to the normal form described in the theorem statement.
First, observe that by iteratively using Claim 8.1 (for renaming) and Claim 8.2 we can always move any renaming steps to the left of the current sequence of transductions, and merge it there into a single renaming step. Next, consider the universe restriction steps. Using Claim 8.1 (for universe restriction) and Claim 8.3 we can iteratively move any universe restriction steps to the left and merge them into one universe restriction step, placed immediately to the right of the final renaming step. Any additional renaming steps obtained during this procedure can be again pushed to the left as in the previous paragraph.
Thus, the remaining sequence has no universe restriction steps. Observe that now all interpretation steps can be moved to the left using Claim 8.4, and merged into one