Positive First-order Logic on Words and Graphs

We study FO+, a fragment of first-order logic on finite words, where monadic predicates can only appear positively. We show that there is an FO-definable language that is monotone in monadic predicates but not definable in FO+. This provides a simple proof that Lyndon's preservation theorem fails on finite structures. We lift this example language to finite graphs, thereby providing a new result of independent interest for FO-definable graph classes: negation might be needed even when the class is closed under addition of edges. We finally show that the problem of whether a given regular language of finite words is definable in FO+ is undecidable.


Introduction
Preservation theorems in first-order logic (FO) establish a link between semantic and syntactic properties [AG97, Ros08]. We will be particularly interested here in Lyndon's theorem [Lyn59], which states that if a first-order formula is monotone in a predicate P (a semantic property), then it is equivalent to a formula that is positive in P (a syntactic property). Recall that "monotone in P" means that the formula stays true when tuples are added to P, while "positive in P" means that P does not appear under a negation in the formula.
As is often the case with preservation theorems, Lyndon's theorem may not hold when restricting the class of structures considered. Whether Lyndon's theorem is true when restricted to finite structures was an open problem for 28 years. It was finally shown to fail on finite structures in [AG87] with a very difficult proof, using a large array of techniques from different fields of mathematics such as probability theory, topology, lattice theory, and analytic number theory. A simpler but still quite intricate proof of this fact was later given by [Sto95], using Ehrenfeucht-Fraïssé games on grid-like structures equipped with two binary predicates. This construction was slightly modified in [Ros95] to treat a signature monotone in every relation symbol.
The goal of this paper is to further restrict the class of structures under consideration, starting by allowing only finite words. This will allow us to obtain in turn a better understanding of the problem for finite graphs and general finite structures. We will therefore work in most of this paper with the particular signature associated with finite words: one binary predicate (the total order on positions in the word), and a finite set of monadic predicates (encoding the alphabet). We will call FO+ the fragment of first-order logic where these monadic predicates can only appear positively, i.e. not under negations. Throughout, we use standard notions of finite automata (DFA for deterministic and NFA for non-deterministic), finite monoids, and first-order logic. See e.g. [DG08] for an introduction to all the needed material.

Monotonicity on words
2.1. Ordered alphabet. In this paper we will consider that the finite alphabet A is equipped with a partial order ≤_A. This partial order is naturally extended to words componentwise: a_1 a_2 ... a_n ≤_A b_1 b_2 ... b_m if n = m and for all i ∈ [1, n] we have a_i ≤_A b_i.
A special case that will be of interest here is when the alphabet is built as the powerset of a set P of predicates, i.e. A = P(P), and the order ≤_A is inclusion. We will call this a powerset alphabet.
Taking A = P(P) is standard in settings such as verification and model theory, where several predicates can be considered independently of each other in some position.
Powerset alphabets constitute a particular case of ordered alphabets. The results obtained in this paper are valid for both the powerset case and the general case. Due to the nature of the results (existence of a counter-example and an undecidability result), it is enough to show them in the particular case of powerset alphabets to cover both cases. Moreover, the powerset alphabet case allows us to directly establish a link with Lyndon's theorem, which is stated in the framework of model theory. For these reasons, we will keep the more general notion of ordered alphabet for generic definitions, but we will prove our main results on powerset alphabets in order to directly obtain the stronger version of these results.
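As a concrete illustration, the componentwise order on a powerset alphabet can be sketched in a few lines of Python; the frozenset encoding is ours, not from the paper.

```python
# Componentwise order u <=_A v on words over the powerset alphabet
# A = P({a, b, c}); letters are encoded as frozensets of predicates.
def leq_word(u, v):
    """u <=_A v: same length, and each letter of u is a subset
    of the corresponding letter of v."""
    return len(u) == len(v) and all(a <= b for a, b in zip(u, v))

a, b = frozenset("a"), frozenset("b")
ab = a | b                                # the letter {a, b}
print(leq_word([a, b], [ab, ab]))         # True: {a} ⊆ {a,b} and {b} ⊆ {a,b}
print(leq_word([a], [a, b]))              # False: different lengths
```

Here `<=` on frozensets is Python's subset test, which is exactly the inclusion order on a powerset alphabet.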

Monotone languages.
We fix a finite ordered alphabet A.
Definition 2.1. We say that a language L ⊆ A* is monotone if for all u ≤_A v, if u ∈ L then v ∈ L.
Example 2.2. Let A = {a, b} with a ≤_A b. Then A*bA* is monotone, but its complement a* is not monotone.
Definition 2.3. Let L ⊆ A*; the monotone closure of L is the language L↑ = {v ∈ A* | ∃u ∈ L, u ≤_A v}. It is the smallest monotone language containing L. In particular, if a ∈ A, we will note a↑ the set {b ∈ A | a ≤_A b}.
Lemma 2.4. If L ⊆ A* is recognized by an NFA B, then its monotone closure L↑ is recognized by an NFA B↑ with the same states as B.
Proof. We build an NFA B↑ from B, by replacing every transition p −a→ q of B by p −a↑→ q. We use here the standard convention where a transition p −X→ q with X ⊆ A stands for the set of transitions {p −b→ q | b ∈ X}. It is straightforward to verify that B↑ is an NFA for L↑: any run of B↑ on some word v can be mapped to a run of B on some u ≤_A v.
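The construction of B↑ from the proof above can be sketched as follows; the NFA representation (a dict mapping (state, letter) pairs to sets of successor states) and the tiny two-letter instance are ours.

```python
def closure_nfa(trans, order):
    """trans: NFA transitions {(state, letter): set_of_states}.
    order: set of pairs (x, y) meaning x <=_A y (reflexive).
    Replace each transition p --a--> q by p --b--> q for every
    b >= a, yielding an NFA for the monotone closure L^."""
    letters = {a for (_, a) in trans}
    up = {a: {b for (x, b) in order if x == a} for a in letters}
    new = {}
    for (p, a), qs in trans.items():
        for b in up[a]:
            new.setdefault((p, b), set()).update(qs)
    return new

order = {("a", "a"), ("a", "b"), ("b", "b")}   # a <=_A b
trans = {(0, "a"): {1}}                        # NFA for the single word "a"
print(closure_nfa(trans, order))               # the closure now also reads "b"
```

On this instance the closure of the language {a} is {a, b}, matching Example 2.2's order.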
Theorem 2.5. Given a regular language L ⊆ A*, it is decidable whether L is monotone. The problem is in P if L is given by a DFA, and Pspace-complete if L is given by an NFA, on any alphabet with a non-trivial order.
Proof. Notice that if B is an NFA, L(B) is monotone if and only if L(B↑) ⊆ L(B). This shows that the problem is in Pspace in general, and that it is in P when B is a DFA, since it then reduces to checking emptiness of the intersection of B↑ with the complement of B. We show that the general problem is Pspace-hard by reducing from NFA universality. Let B be an NFA on a two-letter alphabet A = {a, b}. We build an NFA C of size polynomial in the size of B recognizing aA* + bL(B), using standard NFA constructions. We now consider the monotonicity of C according to the alphabet order a ≤_A b. If L(C) is monotone, then since for all u ∈ A* we have au ∈ L(C), we obtain bu ∈ L(C) as well, so u ∈ L(B). Conversely, if L(B) = A*, then L(C) is the set of all non-empty words, which is monotone.
We have that L(B) = A* if and only if L(C) is monotone, thereby completing the Pspace-hardness reduction. This means that the monotonicity problem is Pspace-complete as soon as there are two comparable letters a ≤_A b in the alphabet. Otherwise the problem is trivial, as any language is monotone on a trivially ordered alphabet.
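The polynomial-time decision procedure for a DFA can be sketched as follows: build B↑ on the fly and explore the product of the subset construction of B↑ with the DFA itself, searching for a word accepted by B↑ but rejected by B. The encoding and the two instances from Example 2.2 are ours.

```python
from collections import deque

def is_monotone(delta, init, accepting, alphabet, order):
    """delta: complete DFA transitions (state, letter) -> state for L.
    order: set of pairs (x, y) meaning x <=_A y (reflexive).
    L is monotone iff L(B^) ⊆ L, where B^ reads letter a as any b <= a.
    Explore reachable pairs (S, q): S tracks B^ (subset construction),
    q tracks the DFA; a pair with S accepting and q rejecting
    witnesses a word in L(B^) \ L, i.e. non-monotonicity."""
    start = (frozenset([init]), init)
    seen, todo = {start}, deque([start])
    while todo:
        S, q = todo.popleft()
        if (S & accepting) and q not in accepting:
            return False
        for a in alphabet:
            S2 = frozenset(delta[(s, b)] for s in S
                           for b in alphabet if (b, a) in order)
            pair = (S2, delta[(q, a)])
            if pair not in seen:
                seen.add(pair)
                todo.append(pair)
    return True

order = {("a", "a"), ("a", "b"), ("b", "b")}   # a <=_A b
# One transition structure, two accepting sets (cf. Example 2.2):
# accepting {1} gives A*bA*, accepting {0} gives a*.
delta = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 1}
print(is_monotone(delta, 0, {1}, "ab", order))  # A*bA* is monotone
print(is_monotone(delta, 0, {0}, "ab", order))  # a* is not
```

For an NFA input the same inclusion L(B↑) ⊆ L(B) is checked, which is the source of the Pspace upper bound.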

Positive first-order logic
3.1. Syntax and semantics. The main idea of positive FO, which we will note FO+, is to guarantee via a syntactic restriction that it only defines monotone languages.
Notice that since monotone languages are not closed under complement (see Example 2.2), we cannot allow negation in the syntax of FO+. This means we have to add dual versions of the classical operators of first-order logic.
This naturally yields the following syntax for FO+:

ϕ, ψ ::= a↑(x) | x ≤ y | x < y | ϕ ∨ ψ | ϕ ∧ ψ | ∃x.ϕ | ∀x.ϕ

As usual, variables x, y, ... range over the positions of the input word, and we write (u, α) ⊨ ϕ when ϕ holds in the word u under the valuation α of its free variables. The semantics is the same as for classical FO on words, with the notable exception that a↑(x) is true if and only if x is labelled by some b ∈ a↑. Unlike classical FO, it is not possible to require that a position is labelled by a specific letter a, except when a↑ = {a}. This is necessary to guarantee that only monotone languages can be defined.
Remark 3.2. In the powerset alphabet framework where A = P(P), we can naturally view FO+ as the negation-free fragment of first-order logic, by having atomic predicates range directly over P instead of A = P(P). We can then drop the a↑ notation, as predicates from P are considered independently of each other. This way, p(x) will be true if and only if the letter S ∈ A labelling x contains p. A letter predicate S↑(x) in the former syntax can then be expressed by the conjunction ⋀_{p∈S} p(x), so FO+ based on predicates from P is indeed equivalent to FO+ based on A. We will take this convention when working on powerset alphabets.
Lemma 3.3. If the order ≤_A is trivial, i.e. a ≤_A b implies a = b, then all languages are monotone, and any FO-definable language is FO+-definable.
Proof. The fact that all languages are monotone in this case follows from the fact that for two words u, v we have u ≤_A v if and only if u = v. If L is definable by an FO formula ϕ, we can build an FO+ formula ψ from ϕ by pushing negations to the leaves using the usual rewritings such as ¬(ϕ ∧ ψ) = ¬ϕ ∨ ¬ψ and ¬(∃x.ϕ) = ∀x.¬ϕ. For every letter a ∈ A and variable x, we then replace all occurrences of ¬a(x) by ⋁_{b≠a} b(x). Finally, the negation of x ≤ y (resp. x < y) can be written y < x (resp. y ≤ x).
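The negation-pushing rewriting used in this proof can be sketched on a small formula AST; the tuple encoding of formulas is ours, not from the paper.

```python
def positivize(f, alphabet):
    """Push negations to the leaves and eliminate them, for FO over a
    trivially ordered alphabet (at least two letters). Formulas are tuples:
    ('letter', a, x), ('le', x, y), ('lt', x, y),
    ('and'/'or', f, g), ('exists'/'forall', x, f), ('not', f)."""
    op = f[0]
    if op == 'not':
        g = f[1]
        if g[0] == 'letter':             # ¬a(x) -> disjunction of b(x), b != a
            _, a, x = g
            others = [('letter', b, x) for b in alphabet if b != a]
            out = others[0]
            for o in others[1:]:
                out = ('or', out, o)
            return out
        if g[0] == 'le':                 # ¬(x <= y) -> y < x
            return ('lt', g[2], g[1])
        if g[0] == 'lt':                 # ¬(x < y) -> y <= x
            return ('le', g[2], g[1])
        if g[0] == 'not':                # double negation
            return positivize(g[1], alphabet)
        if g[0] in ('and', 'or'):        # De Morgan
            dual = 'or' if g[0] == 'and' else 'and'
            return (dual, positivize(('not', g[1]), alphabet),
                          positivize(('not', g[2]), alphabet))
        if g[0] in ('exists', 'forall'): # quantifier duality
            dual = 'forall' if g[0] == 'exists' else 'exists'
            return (dual, g[1], positivize(('not', g[2]), alphabet))
    if op in ('and', 'or'):
        return (op, positivize(f[1], alphabet), positivize(f[2], alphabet))
    if op in ('exists', 'forall'):
        return (op, f[1], positivize(f[2], alphabet))
    return f                             # positive atom: unchanged

# ¬(∃x. a(x)) becomes ∀x. (b(x) ∨ c(x)) over the alphabet {a, b, c}.
print(positivize(('not', ('exists', 'x', ('letter', 'a', 'x'))), 'abc'))
```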
Lemma 3.5. The logic FO+ can only define monotone languages.
Proof. This is done by induction on the FO+ formula ϕ, where the induction property is strengthened to include possible free variables: for all (u, α) ∈ ⟦ϕ⟧ and v ≥_A u, we have (v, α) ∈ ⟦ϕ⟧. For the base cases: if ϕ = a↑(x), then the letter of u at position α(x) is in a↑, and the letter of v at the same position is ≥_A-larger, hence also in a↑; the atoms x ≤ y and x < y do not depend on the labels. The inductive cases for ∧, ∨, ∃ and ∀ are immediate.
It is natural to ask whether the converse of Lemma 3.5 holds: if a language is FO-definable and monotone, then is it necessarily FO+-definable? This will be the purpose of Section 4.

3.3. Ordered Ehrenfeucht-Fraïssé games. We will explain here how FO+-definability can be captured by an ordered variant of Ehrenfeucht-Fraïssé games, which we will call EF+-games.
This notion was defined in [Sto95] for general structures; we will instantiate it here on finite words.
We define the n-round EF+-game on two words u, v ∈ A*, noted EF+_n(u, v). This game is played between two players, Spoiler and Duplicator.
If k ∈ N, a k-position of the game is of the form (u, α, v, β), where α : [1, k] → dom(u) and β : [1, k] → dom(v) are valuations for k variables in u and v respectively. We can think of α and β as giving the position of k previously placed tokens in u and v. Such a k-position is valid if α and β are order-compatible, i.e. for all i, j ∈ [1, k] we have α(i) ≤ α(j) if and only if β(i) ≤ β(j), and moreover u(α(i)) ≤_A v(β(i)) for all i ∈ [1, k].
Notice the difference with usual EF-games: here we do not ask that tokens placed in the same round have the same label, but that the label in u is ≤_A-smaller than the label in v. This feature is intended to capture FO+ instead of FO.
At each round, starting from a valid k-position (u, α, v, β), the game is played as follows. If k = n, then Duplicator wins. Otherwise, Spoiler chooses a position in one of the two words, and places token number k + 1 on it. Duplicator answers by placing token number k + 1 on a position of the other word. Let us call α′ and β′ the extensions of α and β with these new tokens. If (u, α′, v, β′) is not a valid (k + 1)-position, then Spoiler immediately wins the game; otherwise, the game moves to the next round with (k + 1)-position (u, α′, v, β′).
The following theorem shows the link between the n-round EF+-game and formulas of quantifier rank at most n. We write u ⪯_n v when Duplicator has a winning strategy in EF+_n(u, v).
Theorem 3.7 [Sto95, Thm 2.4]. We have u ⪯_n v if and only if for all formulas ϕ of FO+ with qr(ϕ) ≤ n, we have u ⊨ ϕ ⇒ v ⊨ ϕ. Since the proof of Theorem 3.7 does not appear in [Sto95], we will prove it in a general setting in Section 5.2.
Let us now see how we can use EF+-games to characterize FO+-definability.
Corollary 3.8. A language L is not FO+-definable if and only if for all n ∈ N, there exists a pair (u, v) with u ∈ L and v ∉ L such that u ⪯_n v.
Proof. ⇐: Let n ∈ N; there exists (u, v) with u ∈ L and v ∉ L such that u ⪯_n v. By Theorem 3.7, any formula of quantifier rank n accepting u must accept v, so no formula of quantifier rank n recognizes L. This is true for all n ∈ N, so L is not FO+-definable.
⇒ (contrapositive): Assume there exists n ∈ N such that for all (u, v) with u ∈ L and v ∉ L, we have u ⋠_n v. By Theorem 3.7, this means that for each such pair there exists a formula ϕ_{u,v} of quantifier rank n accepting u but not v. Since there are finitely many FO+ formulas of rank n up to logical equivalence [Lib04, Lem 3.13], the set of formulas F = {ϕ_{u,v} | u ∈ L, v ∉ L} can be chosen finite. We define ψ = ⋁_{u∈L} ⋀_{v∉L} ϕ_{u,v}, where the conjunctions and the disjunction are finite since F is finite. For all u ∈ L, u ⊨ ⋀_{v∉L} ϕ_{u,v}, hence u ⊨ ψ. Conversely, a word satisfying ψ must satisfy some ⋀_{v∉L} ϕ_{u,v}, so it cannot be outside of L, since each ϕ_{u,v} rejects every v ∉ L. Hence ψ is an FO+ formula of quantifier rank n recognizing L.

A counter-example language
4.1. The language K. We will now answer the natural question posed in Section 3.2: is any FO-definable monotone language (on any ordered alphabet) also FO+-definable? This section is dedicated to the proof of the following theorem:
Theorem 4.1. There is an FO-definable monotone language K on a powerset alphabet that is not FO+-definable.
Let P = {a, b, c} and A = P(P), ordered by inclusion. We will note a_b, b_c, c_a for the letters {a, b}, {b, c}, {a, c} respectively, and ⊤ for {a, b, c}. If x ∈ P we will often note x instead of {x} to lighten notations.
Definition 4.2. We now define the desired language by K = (a↑ b↑ c↑)* + A*⊤A*. We claim that K satisfies the requirements of Theorem 4.1.
Notice that the second disjunct A*⊤A* could be omitted if we were to consider only the alphabet A \ {⊤}. When sticking with a powerset alphabet, this disjunct is necessary to obtain an FO-definable language. Indeed, if we just define K₀ = (a↑ b↑ c↑)* on alphabet A, we have K₀ ∩ ⊤* = (⊤⊤⊤)*. Since ⊤* is FO-definable but (⊤⊤⊤)* is not (see [DG08]), and FO-definable languages are closed under intersection, we have that K₀ is not FO-definable.
Lemma 4.3. The language K is monotone and FO-definable.
Proof. The fact that K is monotone is straightforward from its definition, as the union of two monotone languages.
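Membership in K can be tested directly from Definition 4.2; a minimal sketch, with letters encoded as frozensets (our encoding, not from the paper):

```python
def in_K(w):
    """w: list of frozensets over {'a','b','c'}.
    K = (a^ b^ c^)* + A* T A*, where x^ is the set of letters
    containing x, and T (written ⊤ in the text) is {a, b, c}."""
    top = frozenset("abc")
    if any(letter == top for letter in w):
        return True                       # the A* T A* disjunct
    pattern = "abc"                       # position i must contain a, b or c
    return len(w) % 3 == 0 and all(       # the (a^ b^ c^)* disjunct
        pattern[i % 3] in letter for i, letter in enumerate(w))

f = frozenset
print(in_K([f("a"), f("b"), f("c")]))      # one block of a^ b^ c^
print(in_K([f("ab"), f("bc"), f("ac")]))   # a ∈ {a,b}, b ∈ {b,c}, c ∈ {a,c}
print(in_K([f("a"), f("abc")]))            # contains the top letter
print(in_K([f("a"), f("b")]))              # length 2, no top letter
```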
We will show that K is FO-definable in three different ways: using its minimal DFA, its syntactic monoid, and finally by describing how an FO formula recognizing it can be defined. This gives several complementary points of view, which can all be helpful for a deep understanding of this counter-example language.
Let us start with the automaton approach. We use the classical characterization of first-order definable languages [DG08], by verifying that the minimal DFA A of K is counter-free, that is, no word induces a non-trivial cycle in A.
The minimal DFA A recognizing K is depicted in Figure 1. We note ¬a = {∅, {b}, {c}, {b, c}} the sub-alphabet of letters of A not containing a, and similarly for ¬b and ¬c. The edges going to the rejecting state ⊥ are grayed and dashed, and the ones going to the accepting sink state q_⊤ are grayed, for readability. We also note a⁻ = a↑ \ {⊤} = {{a}, a_b, c_a}, and similarly for b⁻, c⁻.
To show that K is FO-definable, it suffices to show that A is counter-free, i.e. that there is no word u ∈ A*, distinct states p, q of A, and integer k, such that p −u→ q and q −u^k→ p. Assume for contradiction that such u, p, q, k exist. Since the only non-trivial strongly connected component in A is {q_a, q_b, q_c}, these states are the only candidates for p and q. Since p, q are distinct, |u| is not a multiple of 3, and u induces a 3-cycle, either q_a → q_b → q_c → q_a if |u| ≡ 1 mod 3, or in the reverse order if |u| ≡ 2 mod 3. Thus, the first letter of u can be read from all states of {q_a, q_b, q_c} while staying in this component. Such a letter does not exist, so we reach a contradiction. The DFA A is counter-free, so K is FO-definable [DG08].
4.3. Syntactic monoid for the language K. It is instructive to see what the syntactic monoid of K looks like, in particular to get a first intuition on how an FO formula can be defined for K.
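The counter-free property can also be checked mechanically: compute the transition monoid of a DFA and test aperiodicity, i.e. that every element m satisfies m^k = m^(k+1) for some k, the same criterion applied to the syntactic monoid below. The code and the two toy DFAs are ours, not from the paper.

```python
def transition_monoid(delta, n, alphabet):
    """All transformations of the state set {0, ..., n-1} induced by
    words, for a complete DFA delta: (state, letter) -> state.
    A transformation is a tuple t with t[s] = image of state s."""
    gens = [tuple(delta[(s, a)] for s in range(n)) for a in alphabet]
    ident = tuple(range(n))
    monoid, frontier = {ident}, [ident]
    while frontier:
        m = frontier.pop()
        for g in gens:
            mg = tuple(g[m[s]] for s in range(n))   # read m's word, then g
            if mg not in monoid:
                monoid.add(mg)
                frontier.append(mg)
    return monoid

def is_aperiodic(monoid, n):
    """Check that every element m satisfies m^k = m^(k+1) for some k."""
    for m in monoid:
        power, stable = m, False
        for _ in range(len(monoid) + 1):
            nxt = tuple(m[power[s]] for s in range(n))
            if nxt == power:
                stable = True
                break
            power = nxt
        if not stable:
            return False
    return True

# A mod-2 counter (transition structure of (aa)*) is not aperiodic...
counter = {(0, "a"): 1, (1, "a"): 0}
# ...while the DFA structure of A*bA* from Example 2.2 is.
cf = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 1}
print(is_aperiodic(transition_monoid(counter, 2, "a"), 2))   # False
print(is_aperiodic(transition_monoid(cf, 2, "ab"), 2))       # True
```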
We depict this monoid M in Figure 2, using the eggbox representation based on Green's relations: boxes are J-classes, lines are R-classes, columns are L-classes, and cells are H-classes. See [Col11] for an introduction to Green's relations and the eggbox representation.
The syntactic morphism h : A* → M is easily inferred, as the elements of h(A) are directly named after the letters mapping to them. The accepting part of M is h(K), the image of K. To show that K is FO-definable, it suffices to verify that M is aperiodic, which is directly visible on Figure 2, as all H-classes are singletons (see [Col11]).
4.4. Defining an FO formula for the language K. We now give some intuition on how an FO formula can recognize K.
Recall that K = (a↑ b↑ c↑)* + A*⊤A*. We describe here the behaviour of a formula witnessing that K is FO-definable.
The A*⊤A* part of K is just there to rule out words containing ⊤ by accepting them, which can be done by the formula ∃x.⊤↑(x). So we just need to design a formula ϕ for K′ = (a↑ b↑ c↑)* \ (A*⊤A*), assuming the letter ⊤ does not appear; the final formula will then be ϕ ∨ ∃x.⊤↑(x).
We will call forbidden pattern any word that is not an infix of a word in K′. Let us call anchor a position x such that either x is labelled by a singleton, or x is labelled by one of the two-element letters a_b, b_c, c_a; each anchor is assigned directions (left and right, each being up or down), as illustrated in Figure 3.

Figure 3: A visualization of anchors
The anchor x goes left-up and right-up, while the anchor y goes left-up and right-down. If d ∈ {up, down} is a direction, we say that two successive anchors x < y agree on d if x goes right-d and y goes left-d. We say that x and y agree if they agree on some d. Now, the formula ϕ will express the following properties:
• for all consecutive anchors x, x + 1, the letters at positions x, x + 1, x + 2 do not form a forbidden pattern (omit x + 2 if x + 1 is the last position);
• all non-consecutive successive anchors agree.
For instance the formula will accept the word u above, as the anchors 0 and x agree on up, x and y agree on up, and y and last agree on down.
It is routine to verify that these properties can be expressed in FO, and that they indeed characterize the language K′.
4.5. Undefinability of K in FO+. To prove that K is the wanted counter-example, it remains to show:
Lemma 4.4. The language K is not FO+-definable.
Proof. We establish this using Corollary 3.8. Let n ∈ N, and N = 2^n. We define u = (abc)^N ∈ K, and v = (a_b b_c c_a)^{N−1} a_b b_c; we have v ∉ K because |v| ≡ 2 mod 3, and v does not contain ⊤. By Corollary 3.8, it suffices to prove that u ⪯_n v to conclude. We give a strategy for Duplicator in EF+_n(u, v). The strategy is an adaptation of the classical strategy showing that (aa)* is not FO-definable [Lib04]. To simplify the description of the strategy, let us consider that prior to the game, tokens first, last are placed on the first and last positions of u, and first′, last′ on the first and last positions of v. The strategy of Duplicator during the game is then as follows: every time Spoiler places a token in one of the words, Duplicator answers in the other by replicating the distance (and direction) to the closest existing token. This strategy is illustrated in Figure 4, where move i of Spoiler (resp. Duplicator) is represented by i (resp. i′). Intuitively, the strategy of Duplicator is to match u with the top row of v if Spoiler plays close to the beginning of the words, and with the bottom row of v if Spoiler plays close to the end.
We have to show that this strategy allows Duplicator to play n rounds without losing the game. This proof is similar to the classical one for (aa)*, see e.g. [Lib04]; in fact the strategy is exactly the same if we forget the letter labels. The main intuition is that the length of the non-matching intervals between u and v is at worst divided by 2 at each round, and it starts at 2^n, so Duplicator can survive n rounds.
Let us show that this strategy is indeed winning for Duplicator in EF+_n(u, v). We will generally write p, p′ for related tokens, p being the position in u and p′ the position in v.
The proof works by showing that the following invariant holds: after i rounds where Duplicator did not lose, if consecutive tokens at positions p < q in u are related to tokens at p′ < q′ in v with u[p..q] ≰_A v[p′..q′], then, noting d = q − p and d′ = q′ − p′, we have d = d′ + 1 and d ≥ 2^{n−i}. In other words, if we call wrong interval a pair of factors u[p..q] and v[p′..q′] such that u[p..q] ≰_A v[p′..q′], the invariant states that after i rounds, the length of the smallest wrong interval in u is at least 2^{n−i}, and corresponding wrong intervals differ by 1, the one in u being longer. Before the first round, this invariant is true, as the only tokens are at the endpoints of u and v, and we have |u| = |v| + 1 and |u| ≥ 2^n. Now, assume the invariant true at round i, and consider round i + 1. When Spoiler plays a token in one of the words, two cases can happen. If it is played between previous tokens p, q (resp. p′, q′) such that u[p..q] ≤_A v[p′..q′], then Duplicator will simply answer the corresponding position in the other word, and the smallest wrong interval is not affected. If on the contrary the new token is played in a minimal wrong interval, say on position r in u[p..q], then Duplicator will answer by preserving the smaller of the two distances r − p and q − r. For instance if r − p < q − r, Duplicator will answer r′ = p′ + (r − p). We can notice that by definition of the words u and v, and since u[p] ≤_A v[p′] by the rules of the game, we have u[p..r] ≤_A v[p′..r′], and in particular u[r] ≤_A v[r′], so the move of Duplicator is legal. Moreover, since q − r > r − p, we have q − r ≥ (q − p)/2, so using the induction hypothesis, q − r ≥ 2^{n−(i+1)}. Moreover, since we had (q − p) = (q′ − p′) + 1, we now have (q − r) = (q − p) − (r − p) = (q′ − p′) + 1 − (r′ − p′) = (q′ − r′) + 1, so the invariant is preserved. The case where r − p ≥ q − r is symmetrical. If on the other hand Spoiler plays in v a position r′ in a wrong interval v[p′..q′], then min(r′ − p′, q′ − r′) will be replicated by the answer r of Duplicator in u[p..q]. Since max(r′ − p′, q′ − r′) ≥ (q′ − p′)/2, and wrong intervals in u are one longer than their counterparts in v, the new smallest wrong interval created in u will have length at least 2^{n−(i+1)}, thereby guaranteeing that the invariant is also preserved in this case.
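The two alignments underlying Duplicator's strategy can be sanity-checked mechanically; the script below (ours) assumes words of the shape u = (abc)^N and v = (a_b b_c c_a)^(N−1) a_b b_c, with letters encoded as frozensets.

```python
# Check that both the shift-0 matching (from the left) and the
# shift-1 matching (from the right) are letter-compatible for <=_A
# (set inclusion), which is what makes Duplicator's answers legal.
f = frozenset
N = 8
u = [f("a"), f("b"), f("c")] * N                 # u = (abc)^N, |u| = 3N
v = ([f("ab"), f("bc"), f("ac")] * (N - 1)
     + [f("ab"), f("bc")])                       # |v| = 3N - 1

print(len(u) == len(v) + 1)                      # the lengths differ by 1
# Shift 0: u[i] <=_A v[i], matching the words from the left.
print(all(u[i] <= v[i] for i in range(len(v))))
# Shift 1: u[i+1] <=_A v[i], matching the words from the right.
print(all(u[i + 1] <= v[i] for i in range(len(v))))
```

Both checks succeed: every singleton letter of u is included in the two-element letter of v under either alignment, so only the interval lengths matter, exactly as in the classical (aa)* argument.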

Lyndon's Theorem
In this section we will see what the existence of this counter-example language K means for Lyndon's theorem on other structures. We start by showing in Section 5.1 that it can be adapted to show the failure of Lyndon's theorem on finite structures. We then show in Section 5.3 that the counter-example can also be encoded in finite directed graphs, and finally on undirected graphs.

5.1. General structures. We will consider here first-order logic on arbitrary signatures and unconstrained structures. All definitions of Section 3 can be naturally extended to this general setting, and all results from Section 3.2 extend straightforwardly. We will later extend the EF+-game result as well.
Our goal is to see how Theorem 4.1 can be lifted to this general framework.
Definition 5.1. A formula ϕ is monotone in a predicate P if whenever a structure S is a model of ϕ, any structure S′ obtained from S by adding tuples to P is also a model of ϕ.
Example 5.2. On graphs, where the only predicate is the edge predicate, the formula asking for the existence of a triangle is monotone, but the formula stating that the graph is not a clique is not monotone.
Definition 5.3. A formula ϕ is positive in P if it never uses P under a negation.
Let us recall the statement of Lyndon's Theorem, which holds on general (possibly infinite) structures:
Theorem 5.4 [Lyn59]. If ψ is an FO formula monotone in predicates P_1, ..., P_n, then it is equivalent to a formula positive in predicates P_1, ..., P_n.
We will now see explicitly how the language K from Section 4 can be used to show that Lyndon's Theorem fails on finite structures.
The failure of this theorem on finite structures was first shown in [AG87] with a very difficult proof, then reproved in [Sto95] with a simpler one, using the Ehrenfeucht-Fraïssé technique. Still, the proof from [Sto95] is quite involved compared to the one we present here.
Since Lyndon's theorem can be found in the literature under different formulations, and since it is not clear at first sight that they are equivalent, we make it clear here how the construction of this paper applies to all of them. This also serves the purpose of making explicit the exact signature needed in each formulation to show the failure on finite structures with our method.
We will describe a signature by its sequence of arities, and add a symbol ↑ to specify monotone predicates. For instance (2, 1↑) describes a signature consisting of a binary predicate and a monotone unary predicate. Notice that the order is not important here: we are only interested in the multiset of arity/monotonicity pairs.

(i) Arbitrarily many monotone predicates
This is the most general formulation of Lyndon's theorem, as made explicit in Theorem 5.4. Let us show that our language K witnesses its failure on finite structures.
We will use here the fact that if P = {a, b, c} is a set of monadic predicates, then a finite model over the signature (≤, a, b, c) where the order ≤ is total is simply a finite word on the powerset alphabet A = P(P). Therefore, in order to view our words as general finite structures, it suffices to axiomatize the fact that ≤ is a total order. This can be done with the formula ψ_tot = (∀x, y. x ≤ y ∨ y ≤ x) ∧ (∀x, y, z. x ≤ y ∧ y ≤ z ⇒ x ≤ z) ∧ (∀x, y. x ≤ y ∧ y ≤ x ⇒ x = y) ∧ (∀x. x ≤ x). Notice that ψ_tot is not monotone in the predicate ≤.
Let ϕ be the FO formula defining K, obtained in Lemma 4.3, and let ψ = ϕ ∧ ψ_tot. Then ψ is monotone in predicates a, b, c, and the finite structures on signature (≤, a, b, c) satisfying ψ are exactly the words of K. However, as we proved in Theorem 4.1, no first-order formula that is positive in predicates a, b, c can define the same class of structures, since such a formula interpreted on words would be an FO+ formula for K.
(ii) One monotone binary predicate
We can encode the language K in this framework, by using one binary predicate A to represent all letter predicates. Let K₃ be K restricted to words of length at least 3. By Theorem 4.1 it is clear that K₃ is FO-definable but not FO+-definable.
Let ψ₃ be an FO formula stating that there are at least 3 elements 0, 1, 2, and that for all y ∉ {0, 1, 2} and for all x, A(x, y) holds. We build the FO formula ϕ′ from the FO formula ϕ recognizing the language K₃ by replacing every occurrence of a(x) (resp. b(x), c(x)) by A(x, 0) (resp. A(x, 1), A(x, 2)). Finally, we define the FO formula ψ′ = ψ_tot ∧ ψ₃ ∧ ϕ′. Finite structures on signature (≤, A) accepted by ψ′ are exactly those which encode words of K₃. No formula positive in A can recognize this class of structures: otherwise we could obtain from it an FO+ formula for K₃, by replacing every occurrence of A(x, y) by a positive formula expressing that either y is the i-th position of the word and the letter at x contains the i-th predicate among a, b, c (for i ∈ {1, 2, 3}), or y lies beyond the third position; all of this is expressible in FO+ using <. We thus give a counter-example on a signature (2, 2↑), as was done in [Sto95].
(iii) Closure under surjective homomorphisms
Lyndon's theorem is also often stated in the following way: if an FO formula defines a class of structures closed under surjective homomorphisms, then it is equivalent to a positive formula. This formulation is equivalent to saying that the formula is monotone in all predicates. This case has been treated in [Ros95], using a slight modification of the construction from [Sto95], and building a counter-example on signature (0, 0, 1↑, 1↑, 2↑, 2↑, 2↑). Notice that arity 0 corresponds to constants, which are always trivially monotone.
We can deal with this framework as well, by incorporating a second order predicate ≤′, intended to be the complement of ≤, into the signature. Let ψ_≤ be the formula obtained from ψ defined in case (i) by pushing negations to the leaves and replacing all subformulas of the shape ¬(x ≤ y) with x ≤′ y. Let ψ⁺ = ∃x∃y.(x ≤ y ∧ x ≤′ y), ψ⁻ = ∀x∀y.(x ≤ y ∨ x ≤′ y), and ψ′ = ψ⁺ ∨ (ψ⁻ ∧ ψ_≤). Finite structures on the signature (≤, ≤′, a, b, c) can be classified into three categories: (1) if there are x, y such that x ≤ y ∧ x ≤′ y, then the structure satisfies ψ⁺ hence ψ′; (2) otherwise, if there are x, y such that ¬(x ≤ y ∨ x ≤′ y), then the structure does not satisfy ψ⁻, so it does not satisfy ψ′ either; (3) otherwise, ≤′ is the complement of ≤, and the structure satisfies ψ′ if and only if it satisfies ψ_≤. Therefore, in ψ_≤ we can use ≤ and ≤′ freely, assuming that ≤′ is actually the complement of ≤. In particular the ψ_tot subformula of ψ_≤ axiomatizes the fact that ≤ is a total order, provided that ≤′ is its complement. So the structures of item (3) are exactly the words of K, with an additional predicate ≤′ which is the complement of ≤. Items (1) and (2) guarantee that the class of finite structures accepted by ψ′ is monotone with respect to ≤ and ≤′ as well. As before, it is impossible to have a formula positive in all predicates accepting the same class of finite structures as ψ′, since replacing x ≤′ y with y < x in this formula would directly yield an FO+ formula for K.
Thus the syntax of FO+ in this setting is:

ϕ ::= P_i(x̄) | R_i(x̄) | ¬R_i(x̄) | ϕ ∨ ϕ | ϕ ∧ ϕ | ∃x.ϕ | ∀x.ϕ

where the P_i are the monotone predicates and the R_i the other predicates of the signature. Notice that we allow the negation of predicates from R_i. We can additionally assume that these formulas will only be evaluated on structures verifying certain axioms, for instance on structures where a predicate ≤ evaluates to a linear order, or where the predicate = corresponds to equality. These will be called σ-structures in the following, where the signature σ can be enriched by such axioms.
The EF+-game on two σ-structures u, v is defined as before, with the following generalization: at a given stage (u, α, v, β) of the game, where α (resp. β) is a valuation in u (resp. v) for already played tokens, Duplicator must ensure:
• for any monotone predicate P_i and tuple x̄ of r_i played tokens, we must have u, α ⊨ P_i(x̄) ⇒ v, β ⊨ P_i(x̄);
• for any non-monotone predicate R_i and tuple x̄ of r_i played tokens, we must have u, α ⊨ R_i(x̄) ⟺ v, β ⊨ R_i(x̄).
As before, we note u ⪯_n v if Duplicator has a strategy to win the n-round EF+-game between u and v. We want to show the following theorem, generalizing Theorem 3.7, and formulated in [Sto95]:
Theorem 5.5 [Sto95]. For any σ-structures u, v, we have u ⪯_n v if and only if for all formulas ϕ of FO+ with qr(ϕ) ≤ n, we have u ⊨ ϕ ⇒ v ⊨ ϕ.
The proof is an adaptation of the classical proof of correctness of EF-games, see e.g. [Lib04].
Since FO+ is a fragment of FO, we can directly use the following lemma:
Lemma 5.6 [Lib04, Lem 3.13]. Let n, k ∈ N. Up to logical equivalence, there are finitely many formulas of quantifier rank at most n using k free variables.
We will now show a strengthening of Theorem 5.5, where free variables are incorporated in the statement:
Theorem 5.7. Let u, v be σ-structures and let α, β be valuations for k variables x_1, ..., x_k in u, v respectively. Then Duplicator wins EF+_n(u, α, v, β) if and only if for any FO+ formula ϕ with qr(ϕ) ≤ n using the free variables x_1, ..., x_k, we have u, α ⊨ ϕ ⇒ v, β ⊨ ϕ.
Proof. We prove this by induction on n. Base case n = 0: notice that quantifier-free formulas of FO+ are just positive boolean combinations of atomic formulas of the form P_i(x̄), R_i(x̄) or ¬R_i(x̄). We will note Q(x̄) for such an arbitrary atomic formula. Let ϕ be such a formula with k free variables accepting u, α but rejecting v, β. This happens if and only if there is an atomic formula Q(x̄) such that u, α ⊨ Q(x̄) and v, β ⊭ Q(x̄). This is equivalent to saying that (u, α, v, β) is not a valid k-position, i.e. Spoiler wins the 0-round game EF+_0(u, α, v, β).
Induction case: assume there is an FO+ formula ϕ with qr(ϕ) ≤ n accepting u, α but not v, β. The formula ϕ is a positive combination of atomic formulas, formulas of the form ∃x.ψ, and formulas of the form ∀x.ψ. Therefore, one of these formulas accepts u, α but not v, β. If it is an atomic formula, then Spoiler immediately wins EF+_n(u, α, v, β) as in the base case.
If it is a formula of the form ∃x.ψ, then Spoiler can use the following strategy: pick a position p witnessing that the formula is true for u, α, and play the position p in u. Duplicator will answer a position p′ in v, and the game will move to (u, α′, v, β′), where α′ = α[x → p] and β′ = β[x → p′]. Since the formula ψ has quantifier rank at most n − 1, and accepts u, α′ but not v, β′, by induction hypothesis Spoiler can win the remaining n − 1 rounds of the game. If it is a formula of the form ∀x.ψ, then Spoiler can do the following: pick a position p′ witnessing that the formula is false for v, β, and play the position p′ in v. Duplicator will answer a position p in u, and the game will move to (u, α′, v, β′), where α′ = α[x → p] and β′ = β[x → p′]. Since the formula ψ has quantifier rank at most n − 1, and accepts u, α′ but not v, β′, by induction hypothesis Spoiler can win the remaining n − 1 rounds of the game.
Let us now show the converse implication. We assume that any formula of quantifier rank at most n accepting u, α must accept v, β, and we give a strategy for Duplicator in EF+_n(u, α, v, β). Suppose Spoiler places token x at position p in u. Let α′ = α[x → p]. By Lemma 5.6, up to logical equivalence, there is only a finite set F of FO+ formulas of rank at most n − 1 with k + 1 free variables accepting u, α′. Let ψ = ⋀_{ϕ∈F} ϕ. Then u, α satisfies the formula ∃x.ψ of rank n (as witnessed by p), so by assumption we also have v, β ⊨ ∃x.ψ. This means there is a position p′ of v such that v, β′ ⊨ ψ, where β′ = β[x → p′]. Duplicator can answer position p′ in v, and by induction hypothesis he will win the remainder of the game, since every formula of F accepts v, β′.
Suppose now that Spoiler places token x at position p′ in v. Let β′ = β[x → p′].
Let F be the finite set of formulas (up to equivalence) of quantifier rank at most n − 1 with k + 1 free variables that reject v, β′. Let ψ = ⋁_{ϕ∈F} ϕ, and ψ′ = ∀x.ψ. By construction, x = p′ witnesses that ψ′ does not accept v, β. Our assumption implies that it does not accept u, α either. So there is p ∈ dom(u) such that u, α′ ⊭ ψ, where α′ = α[x → p]. Duplicator can answer position p in u. If a formula of rank at most n − 1 is true in u, α′, then by construction it cannot appear in F, therefore it is also true in v, β′. By induction hypothesis, Duplicator wins the remaining (n − 1)-round game starting from (u, α′, v, β′). This completes the proof of Theorem 5.5, and of its instantiation Theorem 3.7 on finite words.
The proof of Corollary 3.8 is exactly identical in this general setting, so EF+-games can be used to prove that some properties of general structures are not expressible in FO+:
Corollary 5.8. A class of σ-structures C is not FO+-definable if and only if for all n ∈ N, there exist structures u ∈ C and v ∉ C such that u ⪯_n v.
5.3. Finite directed graphs. Our goal is now to show that Lyndon's theorem fails on finite directed graphs, i.e. on finite structures where the signature consists of one (monotone) binary predicate, in addition to (non-monotone) equality. To our knowledge this is a new result.
The positive FO formulas on graphs, which we will again call FO+, are defined via the following syntax:

ϕ ::= E(x, y) | x = y | x ≠ y | ϕ ∨ ϕ | ϕ ∧ ϕ | ∃x.ϕ | ∀x.ϕ,

while general FO formulas can additionally use predicates of the form ¬E(x, y). A class C of graphs is monotone if whenever G ∈ C, and G′ is obtained from G by adding edges, then G′ ∈ C. It is straightforward to adapt the proof of Lemma 3.5 to show that FO+ can only define monotone classes of graphs.
Notice that equality and inequality predicates were not needed in the case of words, since they were expressible with the order predicates ≤ and <.
The goal of this section is to prove the following result:

Theorem 5.9 (Failure of Lyndon's Theorem on finite directed graphs). There exists an FO-definable monotone class of directed graphs that is not FO+-definable.
Let us start by giving an informal proof sketch. Our goal will simply be to encode the language K from Section 4 as a set of graphs. Thus, the graphs of interest will have a very specific shape, allowing us to encode words on the alphabet A = P({a, b, c}). In order to ensure monotonicity, instead of forbidding all patterns that break the encoding, we will instead accept any graph having "too many edges". This is the same idea as in the "Closure under surjective homomorphisms" paragraph of Section 5.1. Thus we will have two kinds of constraints:
• A formula ψ− asking for some edges to be present, i.e. rejecting graphs that cannot encode a word because of a lack of edges.
• A formula ψ+ that will accept any graph falling outside of the required shape of an encoding, because of an excess of edges.
Both ψ− and ψ+ will be FO+ formulas, thus describing monotone classes of graphs. The graphs encoding words of A* will be the models of ψ− that are not models of ψ+. Let us call G_w the set of such graphs, encoding words of A*. We call G_K the subset of G_w consisting of graphs encoding words of K. Our monotone language of graphs witnessing the failure of Lyndon's theorem will be G_K together with all models of ψ+. Let (ϕ_K)_G be an FO formula accepting G_K among graphs from G_w, the behaviour of (ϕ_K)_G being irrelevant outside of G_w. This formula (ϕ_K)_G will be obtained from the FO formula ϕ_K for the language K, by interpreting the predicates a(x), b(x), c(x), and x ≤ y in our encoding. For technical reasons, our encoding will also use some distinguished vertices x̄, shared by all formulas, and existentially quantified. Thus our final formula will be of the form φ := ∃x̄. ψ− ∧ ((ϕ_K)_G ∨ ψ+). It accepts a monotone language of graphs: G_K together with the models of ψ+. In order to show that there is no positive formula equivalent to φ, we will replicate the EF+ game of Lemma 4.4 using graphs from G_w.
Let us now move to the detailed construction. Let E(x, y) be the edge predicate, which we will often write x → y for simplicity. Similarly, we will write x → y → z as a shortcut for E(x, y) ∧ E(y, z). This arrow is not at risk of being confused with an implication symbol, since we will avoid the use of implication to obtain negation-free formulas. In the following, we assume three vertices x_a, x_b, x_c are pointed in the graph. We will call them sources, represented by circles in figures. We will call "squares" the vertices other than x_a, x_b, x_c. We will impose that the subgraph induced by the sources is a particular one; the only purpose of this is to be able to uniquely identify these three vertices. We define G_w to be the set of graphs satisfying the following properties:
(sources) x_a, x_b, x_c are distinct, and they induce the subgraph with edges x_a → x_b, x_b → x_c, and x_c → x_b.
(in-edge) x_a is the only vertex with no in-edge.
(cycle) There is no cycle of length at most 3 other than the 2-cycle on x_b, x_c.
(order) Any two squares are related by an edge.
(direction) There is no edge from a square to a source.
The rule (direction) is actually optional for the correctness of the construction, but it simplifies the exposition. The next lemma justifies the choice of name for the rule (order):

Lemma 5.10. If G is a graph in G_w, the edge relation defines a strict total order on the squares.
Proof. Since 3-cycles are forbidden, and any two squares are related, the edge relation is transitive on squares: for any x → y → z, there is an edge x → z. Since self-loops and 2-cycles are forbidden, the relation is irreflexive and antisymmetric. Therefore, it defines a strict total order on squares.
Figure 5 shows the shape of such a graph with four squares. Notice that the edges from sources to squares can be arbitrary, except that there must be an edge from a source to the first square in the order, because of rule (in-edge). Let us give at this stage more details about sources: their role will be to encode the unary predicates a, b, c on words, and the constraints of G_w allow us to identify them without ambiguity in an unlabeled graph: x_a is the only vertex with no in-edge, and x_b, x_c form the only 2-cycle, x_b being the vertex connected to x_a.
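To make this identification concrete, here is a small sketch (a hypothetical helper, not taken from the paper) that recovers the three sources from the edge set of a graph satisfying the constraints of G_w:

```python
def find_sources(vertices, edges):
    """Recover (x_a, x_b, x_c) in a graph satisfying the G_w constraints."""
    edges = set(edges)
    # Rule (in-edge): x_a is the unique vertex with no incoming edge.
    (xa,) = [v for v in vertices if all((u, v) not in edges for u in vertices)]
    # Rule (cycle): x_b, x_c form the unique 2-cycle;
    # x_b is the endpoint of the edge coming from x_a.
    two_cycles = {frozenset((u, v)) for (u, v) in edges
                  if u != v and (v, u) in edges}
    (cycle,) = two_cycles
    (xb,) = [v for v in cycle if (xa, v) in edges]
    (xc,) = [v for v in cycle if v != xb]
    return xa, xb, xc
```

On any graph of the shape of Figure 5, this returns the three circled vertices regardless of how the remaining source-to-square edges are chosen.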
We now need to express all these constraints via FO+ formulas ψ− and ψ+. Recall that the formulas in ψ+ are meant to express when a graph is not in G_w because of extra edges. Let E• = {(x_a, x_b), (x_b, x_c), (x_c, x_b)} be the set of edges in the subgraph induced by rule (sources), and Ē• its complement. We will also use some auxiliary formulas. We finally take for ψ− (resp. ψ+) the conjunction (resp. disjunction) of the formulas in its column. We obtain that, by definition, a graph with marked vertices x_a, x_b, x_c is in G_w if and only if it satisfies ψ− but not ψ+.
It remains to describe how words on the alphabet A = P({a, b, c}) are encoded into these graphs. Let G ∈ G_w; we associate to it a word u using the following rules:
• The positions dom(u) of u correspond to the square vertices of G.
• The order < on dom(u) corresponds to edges between square vertices.
• If x ∈ dom(u), we say that a(x) (resp. b(x), c(x)) is true if there is in G an edge x_a → x (resp. an edge from x_b, x_c).
With this, we can see that the graph of Figure 5 encodes the word ab a b c.
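As a sanity check, the decoding just described can be sketched as follows (an illustrative helper, assuming the graph is given by its vertex set, its edge set, and the three sources):

```python
def decode_word(vertices, edges, xa, xb, xc):
    """Decode a graph of G_w into a word over P({a, b, c})."""
    edges = set(edges)
    sources = {'a': xa, 'b': xb, 'c': xc}
    squares = [v for v in vertices if v not in (xa, xb, xc)]
    # Rule (order): the edge relation is a strict total order on squares
    # (Lemma 5.10), so the first position has the largest out-degree
    # among squares.
    squares.sort(key=lambda v: sum((v, w) in edges for w in squares),
                 reverse=True)
    # A square carries letter s exactly when the s-source points to it.
    return [frozenset(s for s, x in sources.items() if (x, v) in edges)
            for v in squares]
```

For instance, on a graph with sources 0, 1, 2 and squares 3 < 4 < 5, where the a-source points to square 3 and the b- and c-sources point to square 4, the decoded word is {a} {b,c} ∅.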
Proof. We use EF+-games on graphs of G_w. By Corollary 5.8, we can use EF+-games on the signature (E, =), with E monotone and = non-monotone and interpreted as equality, to show that there is no positive formula defining φ. We will replicate the game from Lemma 4.4, using graphs instead of words. More precisely, let n ∈ N and N = 2^n, and let u and v be the words from Lemma 4.4 for this N; recall that u ∈ K and v ∉ K. We associate to u and v graphs G_u and G_v in G_w according to the above encoding. By Lemma 4.4, Duplicator wins EF+_n(u, v) with some strategy σ_D, and his strategy in EF+_n(G_u, G_v) can mimic σ_D exactly: it suffices to play as σ_D on squares, and if Spoiler plays a source in one of the graphs, Duplicator answers with the corresponding source in the other. It is straightforward to verify that this is a winning strategy for Duplicator in EF+_n(G_u, G_v), as both order and letter predicates on words directly translate to edge predicates on graphs of G_w. This shows that there is no FO+ formula equivalent to φ.
This concludes the proof of Theorem 5.9.

5.4. Finite undirected graphs. We describe in this section how to lift the counter-example from directed graphs to undirected ones. As the proof follows the same pattern as in the directed case, we will give fewer details and just describe the modifications needed to lift the previous proof to undirected graphs.
We will again use distinguished "source vertices" that will encode letter predicates, except that now there will be more than one source per letter. As we can no longer use the orientation of edges to isolate these sources without ambiguity, we will instead use cycles, in the same spirit as the x_b–x_c cycle in the directed case: the sources will form the only cycles of length at most 5. More precisely, the a-sources will form a 3-cycle, the b-sources a 4-cycle, and the c-sources a 5-cycle, for a total of 12 sources. Moreover, there are no other edges between sources than the ones forming these cycles. This means that once again we completely impose the subgraph induced by the 12 sources: it has to consist of three disjoint cycles of sizes 3, 4, 5. We can therefore assume that we have a formula ◯(x) stating that x is a source, and formulas ◯_a(x), ◯_b(x), ◯_c(x) specifying the letter of this source. Since sources will again be explicitly quantified in a formula, it is still possible to state in FO+ that a vertex x is not a source, via a formula ¬◯(x) simply asserting that x is different from all sources.
As before, we will use vertices called "squares" to encode positions of the word. This time, the squares will not be all non-source vertices, but will be defined as follows: a square is any non-source vertex that is connected to a source by an edge. This can be expressed by a formula □(x) = ¬◯(x) ∧ ∃y. ◯(y) ∧ E(x, y). We still need to encode the total order on squares, but since edges are not oriented anymore, we will make use of oriented "meta-edges" as described in Figure 6. The fact that there is a meta-edge from square x to square y can be described by a formula M(x, y) asserting the existence of this pattern. Notice that such a meta-edge can create a cycle of length 6, if x and y are connected to the same source, but will not introduce a cycle of length at most 5.
We call "diamonds" the auxiliary vertices that are not sources and that are connected to a square by a path of length 1 or 2. This can be defined by a positive formula ♦(x).
For a graph to be in G_w, we require the following additional constraints:
(partition) Any vertex is either a source, a square or a diamond, with no overlap.
(cycle) The only cycles of length at most 5 are those composed of sources.
(order) The meta-edge relation forms a strict total order on squares.
(diamonds) A diamond can be part of at most one meta-edge.
As before, we can express all these constraints with positive formulas ψ− and ψ+. Let us make these formulas explicit for rules (partition) and (order). We can use ¬□(x) := ◯(x) ∨ ♦(x) to assert that x is not a square, thanks to rule (partition). This allows us to quantify over squares only, which we abbreviate ∃□ and ∀□.

The formulas for (cycle) and (diamonds) pose no additional difficulty: the one for (cycle) is similar to the directed case, and the one for (diamonds) only has a ψ+ component, stating the existence of two meta-edges sharing a diamond. The formula ψ− (resp. ψ+) will again be the conjunction (resp. disjunction) of its components coming from the various rules.
A graph G ∈ G_w will be an encoding of a word u ∈ (A \ {∅})*, obtained by interpreting the squares with the meta-edge order as (dom(u), <), and with a predicate a(x) true if the square x is connected to an a-source (resp. with b, c).
For instance, the graph of Figure 7 encodes the word ab a b c, where meta-edges are represented by dashed arrows. Similarly to the directed case, we will transform the formula ϕ_K into a formula (ϕ_K)_G by interpreting its predicates in this encoding. As before, we define φ = ∃x̄. (ψ− ∧ ((ϕ_K)_G ∨ ψ+)), where x̄ is the list of the 12 source vertices. The fact that φ defines a monotone class whose intersection with G_w is G_K, the set of graphs encoding words in K, is proved in the same way.
We again show that there is no positive formula for φ using the EF+-game technique, according to Corollary 5.8. Given words u and v as before, we choose canonical encodings G_u and G_v in G_w where all diamonds are used in some meta-edge. We choose G_u and G_v of the simplest possible form, for instance without parallel meta-edges. We will show that Duplicator can use his winning strategy from EF+_n(u, v) to win in the (G_u, G_v) arena as well. As before, sources and squares pose no problem, and the strategy can be directly copied from words to graphs. However, there is a new subtlety to take care of: Spoiler can now play on diamond vertices in G_u or G_v. By rule (diamonds) and our choice of canonical encodings, any diamond is part of exactly one meta-edge, say from a square x to a square y. In order to answer this move in the graph game, Duplicator will look at what happens in the word game if Spoiler plays positions x and y. Duplicator's winning strategy gives answers x′ and y′ to these moves. He can now answer with the corresponding diamond in the other graph, in the meta-edge between squares x′ and y′, at the same relative position as the diamond originally played by Spoiler. Since playing 2 moves on words can be necessary to answer one move on graphs, Duplicator will only be able to play n/2 rounds in the game on (G_u, G_v). This is enough to show that for any n there are G_u satisfying φ and G_v not satisfying φ such that Duplicator wins EF+_n(G_u, G_v), thereby proving that φ is not definable in FO+. This concludes the proof scheme of the following theorem:

Theorem 5.14. Lyndon's Theorem fails on finite undirected graphs: there is an FO-definable monotone class of undirected graphs that is not definable with a negation-free formula.

6. Undecidability of FO+-definability for regular languages
In this section, we are back to considering only finite words. Our goal will be to prove the following theorem:

Theorem 6.1. The following problem is undecidable: given a regular language L on a powerset alphabet, is L FO+-definable?
Notice that since FO-definability is decidable for regular languages, this theorem could equivalently be stated with the input language given by an FO formula.
We will start with an informal proof sketch to convey the main ideas of the proof, before going into the technical details.
6.1. Proof sketch. The proof proceeds by reduction from the Turing Machine Mortality problem, known to be undecidable [Hoo66]. A deterministic Turing Machine (TM) is mortal if there is a uniform bound n ∈ N on the length of its runs, starting from any arbitrary configuration.
Given a machine M, we want to build a regular language L_M such that L_M is FO+-definable if and only if M is mortal.

Configuration words
The intuitive idea is that L_M will mimic the language (a↑ b↑ c↑)* from Definition 4.2, but the letters will be replaced by words encoding configurations of M. We therefore design an ordered alphabet A and a language C of configuration words, such that words from C encode configurations of M. These words will be of three possible types 1, 2, 3, playing the role of the letters a, b, c of the language K. This partitions C into C_1 ∪ C_2 ∪ C_3. We guarantee that the transitions of M will always change the types in the following way: 1 → 2, 2 → 3 or 3 → 1.
Moreover, we design the order ≤_A of the alphabet A so that given two words u_1, u_2 from C, there is a word v that is bigger (for the order ≤_A) than both u_1 and u_2 if and only if u_1 and u_2 are consecutive configurations of M. Such a word v will be written u_1 ⊔ u_2 in this proof sketch.

Language L_M
Finally, the language L_M will be roughly defined as the upward closure of (C_1 # C_2 # C_3 #)*, where # is a separator symbol. The only requirement on the configuration words appearing in a word of L_M is on their types. Apart from this, the configuration words can be arbitrary; they do not have to form a run of M.
We will then use the EF + -game technique to show that L M is FO + -definable if and only if M is mortal.

If M is not mortal
The easier direction is proving that if M is not mortal, then L_M is not FO+-definable. Indeed, if M is not mortal, we can choose an arbitrarily long run u = u_1 # u_2 # … # u_N of M, where u_1 ∈ C_1 and N is a multiple of 3 (implying u_N ∈ C_3). We build from u a suitable word v, and we verify that u ∈ L_M and v ∉ L_M (because of the number of # modulo 3). Then, using the same technique as in the proof of Lemma 4.4, with C_1, C_2, C_3 playing the role of a, b, c, we show that Duplicator wins the EF+ game on (u, v) with log(N) rounds. Therefore L_M is not FO+-definable, by Corollary 3.8.

If M is mortal
The converse direction is more difficult: we have to show that if there is a bound n on the length of runs of M from any configuration, then L_M is FO+-definable. We will again use the EF+-game and Corollary 3.8: we give an integer m (depending only on n) such that for any u ∈ L_M and v ∉ L_M, Spoiler wins EF+_m(u, v). Without loss of generality, consider u = u_0 # u_1 # … # u_N a word of L_M, where each u_i is in C, and v a word not in L_M. To describe the winning strategy of Spoiler, we first rule out problems in "local behaviours": if v contains a factor with at most 2 symbols # preventing it from belonging to L_M, then Spoiler wins easily in a bounded number of rounds by pointing out this local inconsistency in v, which cannot be mirrored in u. The only remaining problem is the "long-term inconsistency" occurring in the previous EF+ games: a long factor with two conflicting possible interpretations, each being forced by one of the endpoints. For instance, when dealing with the language K, such long-term inconsistencies were exhibited by words of the form a…c, with the first letter being constrained to a and the last one to c. We have to show that, contrary to what happens with the language K, or in the case where M is not mortal, Spoiler can now point out such long-term inconsistencies in a bounded number of rounds.
To do that, let us abstract a configuration word w ∈ C by its height h(w): the length of the run of M starting in the configuration encoded by w. Our mortality hypothesis can be rewritten as: the height of any configuration word is at most n. A word w ∈ C will be abstracted by a single letter h(w) ∈ [0, n]. We saw that if a word w is of the form w_1 ⊔ w_2, then w_1 and w_2 encode consecutive configurations of M, so their heights must be consecutive integers i + 1 and i. We will abstract such a word w by the pair (i + 1, i). This allows us to design an abstracted version of the EF-game, called the integer game, where letters are integers or pairs of integers, and with special rules designed to reflect the constraints of the original EF+ game on (u, v). The integer game makes explicit the core combinatorial argument making use of the mortality hypothesis. We show that Spoiler wins this integer game in 2n rounds. We finally conclude by lifting this strategy to the original EF+-game. This ends the proof sketch; we now move to the more detailed proof.

6.2. The Turing Machine Mortality problem. We will start by describing the problem we will reduce from, called Turing Machine (TM) Mortality.
The TM Mortality problem asks, given a deterministic TM M, whether there exists a bound n ∈ N such that from any finite configuration (state of the machine, position on the tape, and content of the tape), the machine halts in at most n steps. We say that M is mortal if such an n exists.

Theorem 6.2 [Hoo66]. The TM Mortality problem is undecidable.

Remark 6.3. The standard mortality problem as formulated in [Hoo66] does not ask for a uniform bound on the halting time, and allows for infinite configurations, but it is well-known that the two formulations are equivalent using a compactness argument. Indeed, if for all n ∈ N, the TM has a run of length at least n from some configuration C_n, then we can find a configuration C that is a limit of a subsequence of (C_n)_{n∈N}, so that M has an infinite run from C.
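For concreteness, the length of a run from a finite configuration can be measured by direct simulation. The sketch below assumes a hypothetical encoding of a deterministic TM as a partial map delta: (state, letter) → (state, letter, move); it is only an illustration, not the paper's formalism:

```python
BLANK = '_'

def run_length(delta, state, tape, head, fuel=10_000):
    """Number of steps before the machine halts from the given configuration.

    The machine halts when no transition applies; `fuel` guards the
    simulation against immortal configurations.
    """
    tape = dict(enumerate(tape))  # sparse tape, blank elsewhere
    steps = 0
    while (state, tape.get(head, BLANK)) in delta and steps < fuel:
        state, letter, move = delta[(state, tape.get(head, BLANK))]
        tape[head] = letter
        head += 1 if move == 'R' else -1
        steps += 1
    return steps
```

Mortality of M amounts to asking whether run_length is uniformly bounded over all configurations, which is exactly the quantity abstracted by the height h(w) in the proof sketch above.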
Notice that the initial and final states of M play no role here, so we will omit them in the description of M. Indeed, we can assume that M halts whenever there is no transition from the current configuration.
We will also assume, without loss of generality, that the state space Q is partitioned into Q_1, Q_2, Q_3, and that all possible successors of a state in Q_1 (resp. Q_2, Q_3) are in Q_2 (resp. Q_3, Q_1). Note that if M is not of this shape, it suffices to make three copies Q_1, Q_2, Q_3 of its state space, and have each transition change copy according to the 1-2-3 order given above. This transformation does not change the mortality of M. We will say that p has type i if p ∈ Q_i. The successor type of 1 (resp. 2, 3) is 2 (resp. 3, 1).
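This copying construction can be sketched mechanically, assuming as before that transitions are given as a map (state, letter) → (state, letter, move):

```python
NEXT_TYPE = {1: 2, 2: 3, 3: 1}

def triple_states(delta):
    """Make three copies of the state space so that every transition
    moves from type i to the successor type; since every run of the
    tripled machine projects onto a run of the original machine (and
    conversely), mortality is preserved."""
    tripled = {}
    for (p, a), (q, b, move) in delta.items():
        for i in (1, 2, 3):
            tripled[((p, i), a)] = ((q, NEXT_TYPE[i]), b, move)
    return tripled
```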
Our goal is now to start from an instance M of TM Mortality, and define a regular language L_M such that L_M is FO+-definable if and only if M is mortal.

6.3. The base language L_base.

The base alphabet
We first define a base alphabet A_base. Words over this alphabet will be used to encode configurations of the TM M.
The letter [q.a] is used to encode the position of the reading head, q ∈ Q being the current state of the machine, and a ∈ Γ the letter it is reading. A letter a_δ will be used to encode a position of the tape that the reading head just left, via a transition δ writing an a on this position. A letter a^{δ′} will be used for a position of the tape containing a, and that the reading head is about to enter via a transition δ′. We use a_δ^{δ′} if both are simultaneously true, i.e. the reading head is coming back to the position it just visited. Finally, the letter # is used as a separator between different configurations.

Configuration words
The encoding of a configuration of M is therefore a word of the form (for example):

a_1 a_2 … (a_{i−1})^{δ′} [q.a_i] (a_{i+1})_δ … a_n.

The letter (a_{i+1})_δ indicates that the reading head came from the right via a transition δ = (⋆, ⋆, q, a_{i+1}, ←) (where ⋆ is a placeholder for an unknown element). The letter (a_{i−1})^{δ′} indicates that it will go in the next step to the left via a transition δ′ = (q, a_i, ⋆, ⋆, ←).
A word u ∈ (A_base)* is a configuration word if it encodes a configuration of M with no inconsistency. More formally, u is a configuration word if u contains no #, exactly one letter from Q × Γ (the reading head), and either one letter of the form a_δ and one of the form b^{δ′} located on either side of the head, or just one letter of the form a_δ^{δ′} adjacent to the head. Moreover, the labels δ and δ′ both have to be coherent with the current content of the tape.

Remark 6.4. Because we ask these δ and δ′ labellings to be present, configuration words only encode TM configurations that have both a predecessor and a successor configuration.
The type of a configuration word is simply the type in {1, 2, 3} of the unique state it contains.
Let us call C ⊆ (A_base)* the language of configuration words. This language C is partitioned into C_1, C_2, C_3 according to the type of the configuration word. It is straightforward to verify that each C_i is an FO-definable language.

Lemma 2.4. Given an NFA B, we can compute in time O(|B| · |A|) an NFA B↑ for the monotone closure of L(B).
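On a powerset alphabet, the construction behind Lemma 2.4 simply lets every transition fire on any larger letter. Here is a sketch under an assumed NFA representation (transitions as triples (p, a, q) with each letter a frozenset); it is an illustration of the idea, not the paper's exact algorithm:

```python
from itertools import combinations

def powerset(base):
    """All subsets of `base`, as frozensets."""
    s = list(base)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def monotone_closure(transitions, base):
    """Add (p, b, q) for every transition (p, a, q) and every letter b ⊇ a."""
    return {(p, b, q)
            for (p, a, q) in transitions
            for b in powerset(base) if a <= b}

def accepts(transitions, initial, accepting, word):
    """Standard NFA membership test by subset simulation."""
    current = {initial}
    for letter in word:
        current = {q for (p, a, q) in transitions
                   if p in current and a == letter}
    return bool(current & set(accepting))
```

For instance, an NFA accepting only the one-letter word {a} has a monotone closure accepting every one-letter word whose letter contains a.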

Figure 2: The syntactic monoid M of K

Figure 5: A graph of G_w

Figure 6: A meta-edge from square x to square y
Figure 7: A graph of G_w encoding ab a b c