Existential Definability over the Subword Ordering

We study first-order logic (FO) over the structure consisting of finite words over some alphabet $A$, together with the (non-contiguous) subword ordering. In terms of decidability of quantifier alternation fragments, this logic is well-understood: If every word is available as a constant, then even the $\Sigma_1$ (i.e., existential) fragment is undecidable, already for binary alphabets $A$. However, up to now, little is known about the expressiveness of the quantifier alternation fragments: For example, the undecidability proof for the existential fragment relies on Diophantine equations and only shows that recursively enumerable languages over a singleton alphabet (and some auxiliary predicates) are definable. We show that if $|A|\ge 3$, then a relation is definable in the existential fragment over $A$ with constants if and only if it is recursively enumerable. This implies characterizations for all fragments $\Sigma_i$: If $|A|\ge 3$, then a relation is definable in $\Sigma_i$ if and only if it belongs to the $i$-th level of the arithmetical hierarchy. In addition, our result yields an analogous complete description of the $\Sigma_i$-fragments for $i\ge 2$ of the pure logic, where the words of $A^*$ are not available as constants.


Introduction
The subword ordering. A word $u$ is a subword of another word $v$ if $u$ can be obtained from $v$ by deleting letters at an arbitrary set of positions. The subword ordering has been studied intensively over the last few decades. On the one hand, it appears in many classical results of theoretical computer science. For example, subwords have been a central topic in string algorithms [BY91, ERW08, Mai78]. Moreover, their combinatorial properties are the basis for verifying lossy channel systems [AJ96]. Particularly in recent years, subwords have received a considerable amount of attention. Notable examples include lower bounds in fine-grained complexity [BK15, BK18], algorithms to compute the set of all subwords of formal languages [ACH+16, AMMS17, BCCP20, CPSW16, HMW10, HKO16, Zet15a, Zet15b, Zet16, Zet18, GLHK+20, AZ23], and applications thereof to infinite-state verification [ABQ11, TMW15, BMTZ22, MTZ22, BMTZ20, BGM+23b, BGM+23a]. Subwords are also the basis of Simon's congruence [SS97], which has recently been studied from algorithmic [FK18, GKK+21, FKK+23] and combinatorial [BFH+20, DFK+21, KKS15, KS19, SV23] viewpoints.
First-order logic over subwords. The importance of subwords has motivated the study of first-order logics (FO) over the subword ordering. This has been considered in two variants: In the pure logic, one has FO over the structure $(A^*, \preccurlyeq)$, where $A$ is an alphabet and $\preccurlyeq$ is the subword ordering. In the version with constants, we have the structure $(A^*, \preccurlyeq, (w)_{w \in A^*})$, which has a constant for each word from $A^*$. Traditionally for FO, the primary questions are decidability and definability, particularly regarding the quantifier alternation fragments $\Sigma_i$. Here, decidability refers to the truth problem: Given a formula $\varphi$ in a particular fragment over $(A^*, \preccurlyeq)$ or $(A^*, \preccurlyeq, (w)_{w \in A^*})$, respectively, does $\varphi$ hold? By definability, we mean understanding which relations can be defined by formulas in a particular fragment. The $\Sigma_i$-fragment consists of formulas in prenex form that begin with existential quantifiers and then alternate at most $i - 1$ times between blocks of universal and existential quantifiers. For example, the formula
$$\exists x\colon\ (a \not\preccurlyeq x \lor b \not\preccurlyeq x) \land x \not\preccurlyeq u \land x \preccurlyeq v$$
belongs to the $\Sigma_1$-fragment, also called the existential fragment, over $(A^*, \preccurlyeq, (w)_{w \in A^*})$ with $A = \{a, b\}$. The formula has free variables $u, v$ and refers to the constants $a$ and $b$. It holds if and only if $v$ has more $b$'s or more $a$'s than $u$.
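To make the semantics of such an existential formula concrete, the following minimal sketch evaluates one formula that matches the description above by brute force. The concrete formula, and the helper names `is_subword`, `words`, and `holds`, are assumptions made for illustration; the search is exhaustive only because every witness $x$ must be a subword of $v$.

```python
from itertools import product


def is_subword(u: str, v: str) -> bool:
    """True if u can be obtained from v by deleting letters."""
    it = iter(v)
    return all(ch in it for ch in u)  # greedy left-to-right embedding


def words(alphabet: str, max_len: int):
    """All words over `alphabet` of length at most `max_len`."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def holds(u: str, v: str) -> bool:
    """Evaluate: exists x with (a not<= x or b not<= x), x not<= u, x <= v.

    Any witness x is a subword of v, so searching |x| <= |v| is exhaustive.
    """
    return any(
        (not is_subword("a", x) or not is_subword("b", x))
        and not is_subword(x, u)
        and is_subword(x, v)
        for x in words("ab", len(v))
    )
```

On all pairs of words of length at most 4 over $\{a, b\}$, `holds(u, v)` agrees with the direct letter-count comparison.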
Nevertheless, little is known about definability. Kudinov, Selivanov, and Yartseva have shown that using arbitrary first-order formulas over $(A^*, \preccurlyeq)$, one can define exactly the relations from the arithmetical hierarchy that are invariant under automorphisms of $(A^*, \preccurlyeq)$ [KSY10, Theorem 5], if $|A| \ge 2$. However, this does not explain definability of the $\Sigma_i$-fragments. For example, in order to define all recursively enumerable languages, as far as we can see, their proof requires several quantifier alternations. An undecidability proof by Karandikar and Schnoebelen [KS15, Theorem 4.6] for the $\Sigma_2$-fragment can easily be adapted to show that for each alphabet $A$, there exists a larger alphabet $B$ such that every recursively enumerable language $L \subseteq A^*$ is definable in the $\Sigma_2$-fragment over $(B^*, \preccurlyeq, (w)_{w \in B^*})$. However, a full description of the expressiveness of the $\Sigma_2$-fragment is missing.
Existential formulas. The expressiveness of existential formulas is even further from being understood. The undecidability proof in [HSZ17] reduces from solvability of Diophantine equations, i.e., polynomial equations over integers, which is a well-known undecidable problem [Mat93]. To this end, it is shown in [HSZ17] that the relations $\mathrm{ADD} = \{(a^m, a^n, a^{m+n}) \mid m, n \in \mathbb{N}\}$ and $\mathrm{MULT} = \{(a^m, a^n, a^{m \cdot n}) \mid m, n \in \mathbb{N}\}$ are definable existentially using the subword ordering, if one has at least two letters. Since Diophantine equations can be used to define all recursively enumerable relations over natural numbers, this implies that all recursively enumerable relations involving a single letter are definable existentially. However, this says little about which languages (let alone relations) over more than one letter are definable. For example, it is not clear whether the language of all $w \in \{a, b\}^*$ that do not contain $aba$ as an infix, or the reversal relation $\mathrm{REV}_A = \{(u, v) \in A^* \times A^* \mid v \text{ is the reversal of } u\}$, are definable; it seems particularly difficult to define them over the subword ordering using the methods from [HSZ17].
Contribution. We show that for any alphabet $A$ with $|A| \ge 3$, every recursively enumerable relation $R \subseteq (A^*)^k$, $k \in \mathbb{N}$, is existentially definable in $(A^*, \preccurlyeq, (w)_{w \in A^*})$. In fact, similarly to an observation made in [HSZ17], we even show that there is a single sufficiently complex word $W \in A^*$ such that the structure $(A^*, \preccurlyeq, W)$ with just this single constant symbol suffices to existentially define all recursively enumerable relations. Since every existentially definable relation is clearly recursively enumerable (via a simple enumerative algorithm), this completely describes the expressiveness of existential formulas for $|A| \ge 3$. Despite the undecidability of the existential fragment [HSZ17], we find it surprising that all recursively enumerable relations, including relations like $\mathrm{REV}_A$, are existentially definable.
Our result yields characterizations of the $\Sigma_i$-fragments for every $i \ge 2$: It implies that for each $i \ge 2$, the $\Sigma_i$-fragment over $(A^*, \preccurlyeq, (w)_{w \in A^*})$ can define exactly the relations in $\Sigma^0_i$, the $i$-th level of the arithmetical hierarchy, assuming $|A| \ge 3$. This also provides a description of $\Sigma_i$ in the pure logic: It follows that in the $\Sigma_i$-fragment over $(A^*, \preccurlyeq)$, one can define exactly the relations in $\Sigma^0_i$ that are invariant under automorphisms of $(A^*, \preccurlyeq)$, if $|A| \ge 3$.

Since [HSZ17] shows that all recursively enumerable languages over one letter are definable in $(A^*, \preccurlyeq, (w)_{w \in A^*})$ if $|A| \ge 2$, it would suffice to define a bijection between $a^*$ and $A^*$ using subwords. However, since this seems hard to do directly, our proof follows a different route. We first show how to define rational transductions and then a special language from which one can build every recursively enumerable relation via rational transductions and intersections. In particular, a byproduct is a direct proof of undecidability of the existential fragment in the case of $|A| \ge 3$ that avoids using undecidability of Diophantine equations.

Key ingredients. The undecidability proof for the existential fragment from [HSZ17] shows that the relations ADD and MULT are definable, in addition to auxiliary predicates that are needed for this, such as concatenation and letter counting predicates of the form "$|u|_a = |v|_b$". With these methods, it is difficult to express that a certain property holds locally, by which we mean: at every position in a word. Using concatenation, we can define languages like $(a^n b)^*$ for each $n \in \mathbb{N}$ (see Section 3), which "locally look like $a^n b$". But if we want to express that, e.g., $aba$ does not occur as an infix, this is of little help, because words avoiding an infix need not be periodic. The ability to disallow infixes would aid us in defining rational transductions via runs of transducers, as these are little more than configuration sequences where pairs of configurations that are not connected by a transition do not occur as infixes. Such local properties are often easy to state with universal quantification, but this is not available in existential formulas.
An important theme in our proof is to express such local properties by carefully constructing long words in which $w$ has to embed in order for $w$ to have the local property. For example, our first lemma says: Each set $X \subseteq A^{=\ell}$ can be characterized as the set of words (of length $\ge \ell$) that embed into each word in a finite set $P$. This allows us to define sets $X^*$.
Steps I-III of our proof use techniques of this type to express rational transductions. In Step IV, we then define the special language $G = \{a^n b^n \mid n \ge 0\}^*$, which has the property that all recursively enumerable languages can be obtained from $G$ using rational transductions and intersection. This yields all recursively enumerable relations over two letters in Step V.
In sum, Steps I-V let us define all recursively enumerable relations over $\{a, b\}$, provided that the alphabet $A$ contains an additional auxiliary letter. It then remains to define recursively enumerable relations that can also involve all other letters in $A$. We do this in Step VI by observing that each word $w \in A^*$ is determined by its projections to binary alphabets $B \subseteq A$. This allows us to compare words by looking at two letters at a time and use the other (currently unused) letters for auxiliary means.
A conference version of this paper appeared in [BGTZ22].

Main results
We say that $u$ is a subword of $v$, written $u \preccurlyeq v$, if there exist words $u_1, \ldots, u_n$ and $v_0, \ldots, v_n$ such that $u = u_1 \cdots u_n$ and $v = v_0 u_1 v_1 \cdots u_n v_n$.
Subword logic. We consider first-order logic over the structure $(A^*, \preccurlyeq)$, first-order logic over the structure $(A^*, \preccurlyeq, w_1, \ldots, w_n)$ enriched with finitely many constant symbols $w_1, \ldots, w_n \in A^*$, and first-order logic over the structure $(A^*, \preccurlyeq, (w)_{w \in A^*})$ enriched with constant symbols $w$ for every word $w \in A^*$. A first-order formula $\varphi$ with free variables $x_1, \ldots, x_k$ defines a relation $R \subseteq (A^*)^k$ if $R$ contains exactly those tuples of words $(v_1, \ldots, v_k)$ that satisfy$^3$ the formula $\varphi$.
Let us define the quantifier alternation fragments of first-order logic. A formula without quantifiers is called a $\Sigma_0$-formula or $\Pi_0$-formula. For $i \ge 1$, a $\Sigma_i$-formula (resp. $\Pi_i$-formula) is one of the form $\exists x_1 \cdots \exists x_n\, \varphi$ (resp. $\forall x_1 \cdots \forall x_n\, \varphi$), where $\varphi$ is a $\Pi_{i-1}$-formula (resp. $\Sigma_{i-1}$-formula), $x_1, \ldots, x_n$ are variables, and $n \ge 0$. In other words, a $\Sigma_i$-formula is in prenex form and its quantifiers begin with a block of existential quantifiers and alternate at most $i - 1$ times between universal and existential quantifiers. The $\Sigma_i$-fragment ($\Pi_i$-fragment) consists of the $\Sigma_i$-formulas ($\Pi_i$-formulas). In particular, the $\Sigma_1$-fragment (called the existential fragment) consists of the formulas in prenex form that only contain existential quantifiers.
Expressiveness with constants. Our main technical contribution is the following.
Theorem 2.1. Let $A$ be an alphabet with $|A| \ge 3$. A relation is definable in the $\Sigma_1$-fragment over $(A^*, \preccurlyeq, (w)_{w \in A^*})$ if and only if it is recursively enumerable.
We prove Theorem 2.1 in Section 3. Theorem 2.1 in particular yields a description of what is expressible using $\Sigma_i$-formulas for each $i \ge 1$. Recall that the arithmetical hierarchy consists of classes $\Sigma^0_1, \Sigma^0_2, \ldots$, where $\Sigma^0_1 = \mathrm{RE}$ is the class of recursively enumerable relations, and for $i \ge 2$, we have $\Sigma^0_i = \mathrm{RE}^{\Sigma^0_{i-1}}$. Here, for a class of relations $\mathcal{C}$, $\mathrm{RE}^{\mathcal{C}}$ denotes the class of relations recognized by oracle Turing machines with access to oracles over the class $\mathcal{C}$.

$^3$ The correspondence between the entries in the tuple and the free variables of $\varphi$ will always be clear, because the variables will have an obvious linear order by sorting them alphabetically and by their index. For example, if $\varphi$ has free variables $x_i$ for $1 \le i \le k$ and $y_j$ for $1 \le j \le \ell$, then we order them as $x_1, \ldots, x_k, y_1, \ldots, y_\ell$.
Corollary 2.2. Let $A$ be an alphabet with $|A| \ge 3$ and let $i \ge 1$. A relation is definable in the $\Sigma_i$-fragment over $(A^*, \preccurlyeq, (w)_{w \in A^*})$ if and only if it belongs to $\Sigma^0_i$.
By [HSZ21, Theorem 3.5], the undecidability of the $\Sigma_1$-fragment already holds for $(A^*, \preccurlyeq, W)$, where $W \in A^*$ is a sufficiently complex constant. Using the same ideas, we show that the characterizations from Theorem 2.1 and Corollary 2.2 also already hold for a single constant; we prove this in Section 4.
Expressiveness of the pure logic. Corollary 2.2 completely describes the relations definable in the structure $(A^*, \preccurlyeq, (w)_{w \in A^*})$ if $|A| \ge 3$. We can use this to derive a description of the relations definable without constants, i.e., in the structure $(A^*, \preccurlyeq)$. The lack of constants slightly reduces the expressiveness; to make this precise, we need some terminology.
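An automorphism (of $(A^*, \preccurlyeq)$) is a bijection $\alpha\colon A^* \to A^*$ such that $u \preccurlyeq v$ if and only if $\alpha(u) \preccurlyeq \alpha(v)$ for all $u, v \in A^*$. A relation $R \subseteq (A^*)^k$ is automorphism-invariant if for every automorphism $\alpha$ and all $v_1, \ldots, v_k \in A^*$, we have $(v_1, \ldots, v_k) \in R$ if and only if $(\alpha(v_1), \ldots, \alpha(v_k)) \in R$. It is straightforward to check that every formula over $(A^*, \preccurlyeq)$ defines an automorphism-invariant relation. Thus, in the $\Sigma_i$-fragment over $(A^*, \preccurlyeq)$, we can only define automorphism-invariant relations inside $\Sigma^0_i$.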
Corollary 2.4. Let $A$ be an alphabet with $|A| \ge 3$ and let $i \ge 2$. A relation is definable in the $\Sigma_i$-fragment over $(A^*, \preccurlyeq)$ if and only if it is automorphism-invariant and belongs to $\Sigma^0_i$.
To give some intuition on automorphism-invariant sets, let us recall the classification of automorphisms of $(A^*, \preccurlyeq)$, shown implicitly by Kudinov, Selivanov, and Yartseva in [KSY10] (for a short and explicit proof, see [HSZ21, Lemma 3.8]): A map $\alpha\colon A^* \to A^*$ is an automorphism of $(A^*, \preccurlyeq)$ if and only if (i) the restriction of $\alpha$ to $A$ is a permutation of $A$, and (ii) $\alpha$ is either a word morphism, i.e., $\alpha(a_1 \cdots a_k) = \alpha(a_1) \cdots \alpha(a_k)$ for all $a_1, \ldots, a_k \in A$, or a word anti-morphism, i.e., $\alpha(a_1 \cdots a_k) = \alpha(a_k) \cdots \alpha(a_1)$ for all $a_1, \ldots, a_k \in A$.
Finally, Corollary 2.4 raises the question of whether the $\Sigma_1$-fragment over $(A^*, \preccurlyeq)$ also expresses exactly the automorphism-invariant recursively enumerable relations. It does not:
Observation 2.5. Let $|A| \ge 2$. There are undecidable binary relations definable in the $\Sigma_1$-fragment over $(A^*, \preccurlyeq)$. However, not every automorphism-invariant regular language is definable in it.
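As a small sanity check of the automorphism notion used in this section, the sketch below tests on short words over $\{a, b\}$ that a letter permutation and reversal preserve and reflect the subword ordering, while an ad hoc bijection that only swaps the two words $ab$ and $ba$ does not. All function names are hypothetical, and a finite check of this kind is only an illustration, not a proof of the classification.

```python
from itertools import product


def is_subword(u: str, v: str) -> bool:
    it = iter(v)
    return all(ch in it for ch in u)


def words(alphabet: str, max_len: int):
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def swap_letters(w: str) -> str:
    """Word morphism induced by the permutation a <-> b."""
    return w.translate(str.maketrans("ab", "ba"))


def reverse(w: str) -> str:
    """Word anti-morphism induced by the identity permutation."""
    return w[::-1]


def swap_two_words(w: str) -> str:
    """A bijection on words that is neither a morphism nor an anti-morphism."""
    return {"ab": "ba", "ba": "ab"}.get(w, w)


def preserves_order(alpha, sample) -> bool:
    """Check u <= v  iff  alpha(u) <= alpha(v) on all pairs from `sample`."""
    return all(
        is_subword(u, v) == is_subword(alpha(u), alpha(v))
        for u in sample for v in sample
    )
```

The third map fails already on the pair $ab \preccurlyeq aab$, since $ba \not\preccurlyeq aab$.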

Existentially defining recursively enumerable relations
In this section, we prove Theorem 2.1. Therefore, we now concentrate on definability in the $\Sigma_1$-fragment. Moreover, for an alphabet $A$, we will sometimes use the phrase $\Sigma_1$-definable over $A$ as a shorthand for definability in the $\Sigma_1$-fragment over the structure $(A^*, \preccurlyeq, (w)_{w \in A^*})$.
Notation. For an alphabet $A$, we write $A^{=k}$, $A^{\ge k}$, and $A^{\le k}$ for the set of words over $A$ that have length exactly $k$, at least $k$, and at most $k$, respectively. We write $|w|$ for the length of a word $w$. If $B \subseteq A$ is a subalphabet of $A$, then $|w|_B$ denotes the number of occurrences of letters $a \in B$ in $w$, or simply $|w|_a$ if $B = \{a\}$ is a singleton. Furthermore, we write $\pi_B\colon A^* \to B^*$ for the projection morphism which keeps only the letters from $B$. If $B = \{a, b\}$, we also write $\pi_{a,b}$ for $\pi_{\{a,b\}}$. The downward closure of a word $v \in A^*$ is defined as $v{\downarrow} := \{u \in A^* \mid u \preccurlyeq v\}$.
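Basic relations. We will use two kinds of relations, concatenation and counting letters, which are shown to be $\Sigma_1$-definable in $(A^*, \preccurlyeq, (w)_{w \in A^*})$ as part of the undecidability proof of the truth problem in [HSZ17, Theorem III.3]. The following relations are $\Sigma_1$-definable if $|A| \ge 2$.
Concatenation: $\{(u, v, w) \in (A^*)^3 \mid w = uv\}$.
Counting letters: $\{(u, v) \in (A^*)^2 \mid |u|_a = |v|_b\}$ for $a, b \in A$.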
Moreover, we will make use of a classical fact from word combinatorics: For $u, v \in A^*$, we have $uv = vu$ if and only if there is a word $r \in A^*$ with $u \in r^*$ and $v \in r^*$ [Ber79]. In particular, if $p$ is primitive, meaning that $p \in A^+$ and there is no $r \in A^*$ with $|r| < |p|$ and $p \in r^*$, then $up = pu$ is equivalent to $u \in p^*$. Furthermore, note that by counting letters as above, and using concatenation, we can also say $|u|_a = |vw|_a$, i.e., $|u|_a = |v|_a + |w|_a$ for $a \in A$. With these building blocks, we can state arbitrary linear equations over terms $|u|_a$ with $u \in A^*$ and $a \in A$. For any subalphabet $B \subseteq A$, one can clearly define $B^*$ over $A$. Hence, definability of a relation over $B$ also implies definability over the larger alphabet $A$.
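The commutation fact for primitive words can be checked exhaustively on short words. The following sketch (with hypothetical helper names `is_primitive` and `in_star`) compares $up = pu$ with $u \in p^*$ for all primitive $p$ of length at most 3 and all $u$ of length at most 6 over $\{a, b\}$.

```python
from itertools import product


def words(alphabet: str, max_len: int):
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def is_primitive(p: str) -> bool:
    """p is nonempty and not a power of a strictly shorter word."""
    n = len(p)
    return n > 0 and all(
        p != p[:d] * (n // d) for d in range(1, n) if n % d == 0
    )


def in_star(u: str, p: str) -> bool:
    """u belongs to p* (p is assumed nonempty)."""
    q, r = divmod(len(u), len(p))
    return r == 0 and u == p * q
```

Primitivity matters: for $p = aa$ and $u = a$ we have $up = pu$, yet $u \notin p^*$.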
Finite state transducers. An important ingredient of our proof is to define regular languages in the subword order, and, more generally, rational transductions, i.e., relations recognized by finite state transducers.
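For $k \in \mathbb{N}$, a $k$-ary finite state transducer $\mathcal{T}$ over $A$ consists of a finite set of states, an initial state $q_0$, a set of final states, and finitely many transitions $q \xrightarrow{(a_1, \ldots, a_k)} q'$ between states, each labeled with a tuple $(a_1, \ldots, a_k) \in (A \cup \{\varepsilon\})^k$. The transducer $\mathcal{T}$ recognizes the $k$-ary relation $R(\mathcal{T}) \subseteq (A^*)^k$ containing precisely those $k$-tuples $(w_1, \ldots, w_k)$ for which there is a transition sequence $q_0 \xrightarrow{(a_{1,1}, \ldots, a_{k,1})} q_1 \xrightarrow{(a_{1,2}, \ldots, a_{k,2})} \cdots \xrightarrow{(a_{1,m}, \ldots, a_{k,m})} q_m$ such that $q_m$ is a final state and $w_i = a_{i,1} \cdots a_{i,m}$ for all $i \in \{1, \ldots, k\}$. Such a transition sequence is called an accepting run of $\mathcal{T}$. We sometimes prefer to think of the $w_i$ as produced output rather than consumed input and thus occasionally use terminology accordingly. A relation $T$ is called a rational transduction if it is recognized by some finite state transducer $\mathcal{T}$. Unary transducers (i.e., $k = 1$) recognize the regular languages.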
Overview. As outlined in the introduction, our proof consists of six steps. In Steps I-III, we show that we can define all rational transductions $T \subseteq (A^*)^k$ over the alphabet $B$, if $|B| \ge |A| + 1$. In Step IV, we define the special language $G = \{a^n b^n \mid n \ge 0\}^*$. From $G$, all recursively enumerable languages can be obtained using rational transductions and intersection, which in Step V allows us to define over $B$ all recursively enumerable relations over $A$, provided that $|B| \ge |A| + 1$. Finally, in Step VI, we use projections to binary alphabets to define arbitrary recursively enumerable relations over $A$, if $|A| \ge 3$.
Step I: Defining Kleene stars. We first define the languages $X^*$, where $X$ consists of words of equal length. To this end, we establish an alternative representation for such sets.
Example 3.1. Before proving the general statement, let us illustrate how to define the language $\{ab, ba\}^*$ using an auxiliary symbol $\#$. It suffices to define the language $\{ab\#, ba\#\}^*$ and then project to $\{a, b\}$. The simple but key observation is that
$$u \in \{ab, ba\} \iff u \preccurlyeq bab \text{ and } u \preccurlyeq aba \text{ and } |u| \ge 2. \tag{3.1}$$
We claim that a word $w$ belongs to $\{ab\#, ba\#\}^*$ if and only if
$$\exists n \in \mathbb{N}\colon\ w \preccurlyeq (aba\#)^n \land w \preccurlyeq (bab\#)^n \land |w|_\# = n \land |w| = 3n. \tag{3.2}$$
The "only if"-direction is immediate. For the "if"-direction, consider a word $w$ satisfying (3.2), i.e., $w = w_1 \# \cdots w_n \#$ where each $w_i$ belongs to $\{a, b\}^*$. Then each word $w_i$ is a subword of $aba$ and $bab$. Since $|w| = 3n$, either all words $w_i$ have length 2 and therefore $w_i \in \{ab, ba\}$ by (3.1); or there exists some $w_i$ with $|w_i| > 2$. But then again $w_i \in \{ab, ba\}$ by (3.1), contradicting $|w_i| > 2$.

Lemma 3.2. Every nonempty set $X \subseteq A^{=\ell}$ can be written as $X = A^{\ge \ell} \cap \bigcap_{p \in P} p{\downarrow}$ for some finite set $P \subseteq A^*$.
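Proof. We can assume $\ell \ge 1$ since otherwise $X = \{\varepsilon\} = A^{\ge 0} \cap \varepsilon{\downarrow}$. Let $w \in A^*$ be any permutation of $A$ (i.e., each letter of $A$ appears exactly once in $w$). If $a \in A$, then $(w \setminus a)$ denotes the word obtained from $w$ by deleting $a$. For any nonempty word $u = a_1 \cdots a_k \in A^+$, $a_1, \ldots, a_k \in A$, define the word
$$p_u = (w \setminus a_1)(w \setminus a_1)\, a_1\, (w \setminus a_2)(w \setminus a_2)\, a_2 \cdots (w \setminus a_{k-1})(w \setminus a_{k-1})\, a_{k-1}\, (w \setminus a_k)(w \setminus a_k).$$
Note that $p_u$ does not contain $u$ as a subword: In trying to embed each letter $a_i$ of $u$ into $p_u$, the first possible choice for $a_1$ comes after the initial sequence $(w \setminus a_1)(w \setminus a_1)$. Similarly, the next possible choice for each subsequent $a_i$ is right after $(w \setminus a_i)(w \setminus a_i)$. However, this only works until $a_{k-1}$, since there is no $a_k$ at the end of $p_u$. On the other hand, observe that $p_u$ contains every word $v \in A^{\le k} \setminus \{u\}$ as a subword: If $v$ is a proper prefix of $u$, it embeds into the letters $a_1, \ldots, a_{k-1}$. Otherwise, let $i$ be the first position at which $v$ differs from $u$. The first $i - 1$ letters of $v$ embed into $a_1, \ldots, a_{i-1}$, the $i$-th letter embeds into the first copy of $(w \setminus a_i)$, and each of the at most $k - i$ remaining letters embeds into one of the subsequent factors $(w \setminus a_i)\, a_i$ and $(w \setminus a_j)(w \setminus a_j)\, a_j$ for $i < j < k$, each of which contains every letter of $A$. Hence,
$$X = A^{\ge \ell} \cap \bigcap_{u \in (A^{=\ell} \setminus X) \cup A^{=\ell+1}} p_u{\downarrow}.$$
Here $u \in A^{=\ell+1}$ was added to also exclude all words of length greater than $\ell$.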
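The representation in Lemma 3.2 can be tested by brute force on small instances. The sketch below assumes that $p_u$ has the shape suggested by the proof (two copies of $(w \setminus a_i)$ before each letter $a_i$, and no final $a_k$), and that $P$ ranges over the words of length $\ell$ outside $X$ together with all words of length $\ell + 1$. Both choices are assumptions of this illustration, as are the function names.

```python
from itertools import product


def is_subword(u: str, v: str) -> bool:
    it = iter(v)
    return all(ch in it for ch in u)


def words(alphabet: str, max_len: int):
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def p_word(u: str, w: str) -> str:
    """Assumed shape of p_u; w lists every letter of the alphabet once."""
    parts = []
    for i, a in enumerate(u):
        gap = w.replace(a, "")        # the word (w \ a)
        parts.append(gap + gap)
        if i < len(u) - 1:            # no copy of the last letter a_k
            parts.append(a)
    return "".join(parts)


def representation(X: set, alphabet: str, ell: int) -> list:
    """Finite set P: one word p_u per excluded word u."""
    excluded = [
        u for u in words(alphabet, ell + 1)
        if (len(u) == ell and u not in X) or len(u) == ell + 1
    ]
    return [p_word(u, alphabet) for u in excluded]


def defined_set(P: list, alphabet: str, ell: int, max_len: int) -> set:
    """Words of length >= ell (up to max_len) embedding into every p in P."""
    return {
        v for v in words(alphabet, max_len)
        if len(v) >= ell and all(is_subword(v, p) for p in P)
    }
```

For $X = \{ab, ba\}$ over $\{a, b\}$, the construction yields, among others, the words $p_{aa} = bbabb$ and $p_{bb} = aabaa$, which play the same role as $bab$ and $aba$ in Example 3.1.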
Lemma 3.3. Let $A \subseteq B$ be finite alphabets and $\# \in B \setminus A$. Let $X \subseteq A^{=k}$ and $Y \subseteq A^{=\ell}$ be sets. Then $(X \# Y \#)^*$ and $X^*$ are $\Sigma_1$-definable over $B$.
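Proof. We can clearly assume that $X, Y$ are nonempty. By Lemma 3.2, we can write $X = A^{\ge k} \cap \bigcap_{p \in P} p{\downarrow}$ and $Y = A^{\ge \ell} \cap \bigcap_{q \in Q} q{\downarrow}$ for some finite sets $P, Q \subseteq A^*$. We claim that $w \in (A \cup \{\#\})^*$ belongs to $(X \# Y \#)^*$ if and only if
$$\exists n \in \mathbb{N}\colon\ \bigwedge_{p \in P,\, q \in Q} w \preccurlyeq (p \# q \#)^n \ \land\ |w|_\# = 2n \ \land\ |w| = (k + \ell + 2) \cdot n.$$
Observe that the number $n$ is uniquely determined by $|w|_\#$. The "only if"-direction is clear. Conversely, suppose that $w \in (A \cup \{\#\})^*$ satisfies the formula. We can factorize $w = x_1 \# y_1 \# \cdots x_n \# y_n \#$ where each $x_i$ is a subword of each word $p \in P$, and each $y_i$ is a subword of each word $q \in Q$. If some word $x_i$ were strictly longer than $k$, then it would belong to $X$ by the representation of $X$, and in particular would have length $k$, a contradiction. Therefore, each word $x_i$ has length at most $k$, and similarly each word $y_i$ has length at most $\ell$. However, since the total length of $x_1 y_1 \cdots x_n y_n$ is $(k + \ell) \cdot n$, each $x_i$ has length exactly $k$ and each $y_i$ has length exactly $\ell$, and hence $x_i \in X$ and $y_i \in Y$.
To state $w \preccurlyeq (p \# q \#)^n$, we existentially quantify a word $u \in (p \# q \#)^*$ with $w \preccurlyeq u$ and $|u|_\# = |w|_\#$. Here, we express $u \in (p \# q \#)^*$ as follows. If $p \ne q$, then $p \# q \#$ is primitive and $u \in (p \# q \#)^*$ is equivalent to $u (p \# q \#) = (p \# q \#) u$. If $p = q$, then $u \in (p \# q \#)^*$ is equivalent to $u p \# = p \# u$ and $|u|_\#$ being even. Finally, to define $X^*$ we set $Y = \{\varepsilon\}$ and obtain $X^* = \pi_A((X \# \#)^*)$.

Step II: Blockwise transductions. On our way towards rational transductions, we work with a subclass of transductions. If $T \subseteq A^* \times A^*$ is any subset, then we define the relation
$$T^* = \{(x_1 \cdots x_n,\ y_1 \cdots y_n) \mid n \ge 0,\ (x_i, y_i) \in T \text{ for all } i \in \{1, \ldots, n\}\}.$$
We call a transduction blockwise if it is of the form $T^*$ for some $T \subseteq A^{=k} \times A^{=\ell}$ and $k, \ell \in \mathbb{N}$.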
and hence $L$ is $\Sigma_1$-definable over $B$ by Lemma 3.3. The languages $X = (A^{=k} \# \#)^*$ and $Y = (\# A^{=\ell} \#)^*$ are also definable over $B$ by Lemma 3.3. Then $(x, y) \in R$ if and only if

Step III: Rational transductions. We are ready to define arbitrary rational transductions.
Proof. Let $a, b \in B$. Let us first give an overview. Suppose the transducer for $T$ has $n$ transitions. Of course, we may assume that every run contains at least one transition. The idea is that a sequence of transitions is encoded by a word, where transition $j \in \{1, \ldots, n\}$ is represented by $a^j b^{n+1-j}$. We will define predicates $\mathrm{run}$ and $\mathrm{input}_i$ for $i \in \{1, \ldots, k\}$ with
$$(w_1, \ldots, w_k) \in T \iff \exists w \in \{a, b\}^*\colon\ \mathrm{run}(w) \land \bigwedge_{i=1}^{k} \mathrm{input}_i(w, w_i).$$
Here, $\mathrm{run}(w)$ states that $w$ encodes a sequence of transitions that is a run of the transducer. Moreover, $\mathrm{input}_i(w, w_i)$ states that $w_i \in A^*$ is the input of this run in the $i$-th coordinate.
We begin with the predicate $\mathrm{run}$. Let us call the words in $X = \{a^j b^{n+1-j} \mid j \in \{1, \ldots, n\}\}$ the transition codes. Let $\Delta$ be the set of all words $a^i b^{n+1-i} a^j b^{n+1-j}$ for which the target state of transition $i$ and the source state of transition $j$ are the same. Note that a word $w \in X^*$ represents a run if (1) $w$ begins with a transition that can be applied in an initial state, (2) $w$ ends with a transition that leads to a final state, and (3) either $w \in \Delta^* \cap X \Delta^* X$ or $w \in X \Delta^* \cap \Delta^* X$, depending on whether the run has an even or an odd number of transitions. Thus, we can define $\mathrm{run}(w)$ using prefix and suffix relations and membership to sets $\Delta^*$. The prefix and suffix relation can be defined over $\{a, b\}$ using concatenation, see also [HSZ17, Theorem III.3, step 14]. Finally, we can express $w \in X^*$, $w \in \Delta^*$, and similar conditions with Lemma 3.3.
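Condition (3) can be illustrated on a toy instance. The sketch below uses a hypothetical list of three transitions, given only by their source and target states, encodes transition sequences via the transition codes, and compares condition (3) with the direct requirement that consecutive transitions fit together. Regular expressions stand in for the sets $X$, $\Delta^*$, and their concatenations; this is merely a check of the combinatorics, not of the $\Sigma_1$-definition itself.

```python
import re
from itertools import product

# Hypothetical transducer skeleton: transition j (1-based) is (source, target).
TRANSITIONS = [(0, 1), (1, 0), (1, 1)]
N = len(TRANSITIONS)


def code(j: int) -> str:
    """Transition code a^j b^(N+1-j); all codes have length N + 1."""
    return "a" * j + "b" * (N + 1 - j)


X = [code(j) for j in range(1, N + 1)]
DELTA = [
    code(i) + code(j)
    for i in range(1, N + 1)
    for j in range(1, N + 1)
    if TRANSITIONS[i - 1][1] == TRANSITIONS[j - 1][0]
]


def alt(ws) -> str:
    """Regex alternative matching exactly one word from ws."""
    return "(?:" + "|".join(ws) + ")"


x, d = alt(X), alt(DELTA)


def condition3(w: str) -> bool:
    """w in (Delta* ∩ X Delta* X) or w in (X Delta* ∩ Delta* X)."""
    even = re.fullmatch(d + "*", w) and re.fullmatch(x + d + "*" + x, w)
    odd = re.fullmatch(x + d + "*", w) and re.fullmatch(d + "*" + x, w)
    return bool(even or odd)


def consecutive_ok(seq) -> bool:
    """Target of each transition equals the source of the next one."""
    return all(
        TRANSITIONS[i - 1][1] == TRANSITIONS[j - 1][0]
        for i, j in zip(seq, seq[1:])
    )
```

The two intersected languages check the pairs at odd and at even offsets, respectively, which together cover every pair of neighbouring transitions.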
It remains to define the $\mathrm{input}_i$ predicate. In the case that every transition reads a single letter on each input (i.e., no $\varepsilon$ input), we can simply replace each transition code in $w$ by its $i$-th input letter using a blockwise transduction. To handle $\varepsilon$ inputs, we define $\mathrm{input}_i$ in two steps. Fix $i$ and let $A = \{a_1, \ldots, a_m\}$. We first obtain an encoded version $u_i$ of the $i$-th input from $w$: For every transition that reads $a_j$, we replace its transition code with $a b^j a b^{m-j} a$. Moreover, for each transition that reads $\varepsilon$, we replace the transition code by $b^{m+3}$. Using Lemma 3.4, this replacement is easily achieved using a blockwise transduction. Hence, each possible input in $A \cup \{\varepsilon\}$ is encoded using a block from $Y \cup \{b^{m+3}\}$, where $Y = \{a b^j a b^{m-j} a \mid j \in \{1, \ldots, m\}\}$.
Suppose we have produced the encoded input $u_i \in (Y \cup \{b^{m+3}\})^*$. In the next step, we want to define the word $v_i \in Y^*$, which is obtained from $u_i$ by removing each block $b^{m+3}$ from $u_i$. We do this as follows:
$$v_i \in Y^* \ \land\ v_i \preccurlyeq u_i \ \land\ |v_i|_a = |u_i|_a.$$
Note that here, we can express $v_i \in Y^*$ because of Lemma 3.3. In the final step, we turn $v_i$ into the input $w_i \in A^*$ by replacing each block $a b^j a b^{m-j} a$ with $a_j$ for $j \in \{1, \ldots, m\}$. This is just a blockwise transduction and can be defined by Lemma 3.4 because $|B| \ge |A| + 1$.
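The removal of the padding blocks $b^{m+3}$ can be explored on a small instance. The following sketch assumes $m = 2$ and assumes that $v_i$ is characterized by the three conditions $v_i \in Y^*$, $v_i \preccurlyeq u_i$, and $|v_i|_a = |u_i|_a$; it then checks that these conditions single out exactly the word obtained by deleting the padding blocks. The names `M`, `PAD`, and `candidates` are hypothetical.

```python
from itertools import product

M = 2  # hypothetical size m of the input alphabet
Y = ["a" + "b" * j + "a" + "b" * (M - j) + "a" for j in range(1, M + 1)]
PAD = "b" * (M + 3)  # block that encodes an epsilon input


def is_subword(u: str, v: str) -> bool:
    it = iter(v)
    return all(ch in it for ch in u)


def candidates(u: str) -> list:
    """All v in Y* with v <= u and |v|_a = |u|_a.

    Every block of Y contains exactly three a's, so the number of blocks
    of v is forced by the number of a's in u.
    """
    blocks = u.count("a") // 3
    return [
        "".join(bs)
        for bs in product(Y, repeat=blocks)
        if is_subword("".join(bs), u)
    ]
```

Intuitively, matching all occurrences of $a$ pins each block of $v_i$ to a block of $u_i$, and the two runs of $b$'s inside a block, whose lengths sum to $m$, then force the same index $j$.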
Remark 3.6. We do not use this here, but Lemma 3.5 also holds without the assumption $|B| \ge 3$. Indeed, if $|B| = 2$, then this would imply $|A_i| = 1$ for every $i$. Then we can write $A_i = \{a_i\}$ for (not necessarily distinct) letters $a_1, \ldots, a_k$. Since $T$ is rational, the set of all $(x_1, \ldots, x_k) \in \mathbb{N}^k$ with $(a_1^{x_1}, \ldots, a_k^{x_k}) \in T$ is semilinear, and thus $\Sigma_1$-definable in $(\mathbb{N}, +, 0)$. It follows from the known predicates that $T$ is $\Sigma_1$-definable using subwords over $\{a_1, \ldots, a_k\}$.
Step IV: Generator language. Our next ingredient is to express a particular non-regular language, namely $G = \{a^n b^n \mid n \ge 0\}^*$, and its variant $G_\#$ over $\{a, b, \#\}$. This will be useful because from $G$, one can produce all recursively enumerable sets by way of rational transductions and intersection.
Proof. Note that

Here, the language $\#^* a b \#^*$ can be defined using concatenation. Moreover, since every word in $\#^* a b \#^*$ is primitive, we express $w \in v^*$ by saying $vw = wv$.
Proof. Suppose $\# \in A \setminus \{a, b\}$. Since $G = \pi_{a,b}(G_\#)$, it suffices to define $G_\#$. We can define the language $a^* b^* \#$ as a concatenation of $a^*$, $b^*$, and $\#$. The next step is to define the language $K = (a^* b^* \#)^*$. To this end, notice that
$$w \in K \iff \exists u \in a^* b^* \#\ \exists v \in u^*\colon\ w \preccurlyeq v \land |w|_\# = |v|_\#.$$
Here, since the words in $a^* b^* \#$ are primitive, we can express $v \in u^*$ by saying $vu = uv$. Thus, we can define $K$. Using $K$ and Lemma 3.7, we can define $G_\#$, since

Step V: Recursively enumerable relations over two letters. We are now ready to define all recursively enumerable relations over two letters in $(A^*, \preccurlyeq, (w)_{w \in A^*})$, provided that $|A| \ge 3$. For two rational transductions $T \subseteq A^* \times B^*$ and $S \subseteq B^* \times C^*$, and a language $L \subseteq A^*$, we denote application of $T$ to $L$ as $TL = \{v \in B^* \mid \exists u \in L\colon (u, v) \in T\} \subseteq B^*$, and we denote composition of $S$ and $T$ as
$$S \circ T = \{(u, w) \in A^* \times C^* \mid \exists v \in B^*\colon (u, v) \in T \text{ and } (v, w) \in S\}.$$
The latter is again a rational transduction (see e.g. [Ber79]).
Let us briefly sketch the proof of Lemma 3.9. It essentially states that every recursively enumerable language can be accepted by a machine with access to two counters that work in a restricted way. The two counters have instructions to increment, decrement, and zero test (which correspond to the letters $a$, $b$, and $\#$ in $G_\#$). The restriction, which we call "locally one-reversal" (L1R), is that in between two zero tests of some counter, the instructions of that counter must be one-reversal: There is a phase of increments and then a phase of decrements (in other words: after a decrement, no increments are allowed until the next zero test).
To show this, Hartmanis and Hopcroft use the classical fact that every recursively enumerable language can be accepted by a four counter machine (without the L1R property). Then, the four counter values $p, q, r, s$ can be encoded as $2^p 3^q 5^r 7^s$ in a single integer register that can (i) multiply with, (ii) divide by, and (iii) test non-divisibility by the constants $2, 3, 5, 7$. Such a register, in turn, is easily simulated using two L1R-counters: For example, to multiply by $f \in \{2, 3, 5, 7\}$, one uses a loop that decrements the first counter and increments the second by $f$, until the first counter is zero. The other instructions are similar.
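The register simulation sketched above can be mimicked in a few lines. In the following illustration, `encode` computes the single-register encoding of four counter values, and `multiply` transfers the register from one counter to the other as described. The function `divide` is an assumed analogue for exact division, since the text only states that the other instructions are similar.

```python
def encode(p: int, q: int, r: int, s: int) -> int:
    """Encode four counter values in one integer register."""
    return 2**p * 3**q * 5**r * 7**s


def multiply(value: int, f: int) -> int:
    """Multiply the register by f using two counters.

    The first counter holds the register and is only decremented,
    the second counter starts at zero and is only incremented.
    """
    first, second = value, 0
    while first != 0:      # zero test on the first counter
        first -= 1         # one decrement ...
        second += f        # ... and f increments
    return second          # the register now lives in the second counter


def divide(value: int, f: int) -> int:
    """Assumed analogue for exact division; requires that f divides value."""
    first, second = value, 0
    while first != 0:
        first -= f         # f decrements
        second += 1        # one increment
    return second
```

For instance, incrementing the second of the four counters corresponds to `multiply(x, 3)`, and decrementing the first one to `divide(x, 2)`.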
Lemma 3.10. For every recursively enumerable relation $R \subseteq (\{a, b\}^*)^k$, there is a rational transduction $T \subseteq (\{a, b\}^*)^{k+2}$ such that
$$(w_1, \ldots, w_k) \in R \iff \exists v_1, v_2 \in G\colon\ (w_1, \ldots, w_k, v_1, v_2) \in T. \tag{3.4}$$
Proof. We shall build $T$ out of several other transductions. These will be over larger alphabets, but since we merely compose them to obtain $T$, this is not an issue.
A standard fact from computability theory states that a relation is recursively enumerable if and only if it is the homomorphic image of some recursively enumerable language. In particular, there is a recursively enumerable language $L \subseteq B^*$ and morphisms $\beta_1, \ldots, \beta_k$ such that $R = \{(\beta_1(w), \ldots, \beta_k(w)) \mid w \in L\}$. By Lemma 3.9, we may write
$$L = \alpha(T_1 G_\# \cap T_2 G_\#)$$
for a morphism $\alpha\colon C^* \to B^*$ and rational transductions $T_1, T_2 \subseteq \{a, b, \#\}^* \times C^*$. Notice that if $\gamma\colon \{a, b, \#\}^* \to \{a, b\}^*$ is the morphism with $\gamma(a) = a$, $\gamma(b) = b$, and

Taking the pre-image under a morphism (here $\gamma$) and then intersecting with a regular language (here $(a^* b^* \#)^*$) can be performed by a single rational transduction [Ber79, Theorem 3.2]. This means that there is a rational transduction $S \subseteq \{a, b\}^* \times \{a, b, \#\}^*$ with $G_\# = SG$. Therefore, we can replace $G_\#$ in the above expression for $L$ and arrive at
$$L = \alpha\big((T_1 \circ S) G \cap (T_2 \circ S) G\big).$$
In sum, we observe that $(w_1, \ldots, w_k) \in R$ if and only if there exists a $w \in C^*$ with $w \in (T_1 \circ S) G$ and $w \in (T_2 \circ S) G$ such that $w_i = \beta_i(\alpha(w))$ for $i \in \{1, \ldots, k\}$. Consider the relation
$$T = \{(\beta_1(\alpha(w)), \ldots, \beta_k(\alpha(w)), v_1, v_2) \mid w \in C^*,\ (v_1, w) \in T_1 \circ S,\ (v_2, w) \in T_2 \circ S\}.$$
Note that $T$ is rational: A transducer can guess $w$, letter by letter, and on track $i \in \{1, \ldots, k\}$, it outputs the image under $\beta_i(\alpha(\cdot))$ of each letter. To compute the output on tracks $k + 1$ and $k + 2$, it simulates transducers for $T_1 \circ S$ and $T_2 \circ S$. Moreover, we have $T \subseteq (\{a, b\}^*)^{k+2}$ and our observation implies that (3.4) holds.
Step VI: Arbitrary recursively enumerable relations. We have seen that if $|A| \ge 3$, then we can define over $A$ every recursively enumerable relation over two letters. In the proof, we use a third letter as an auxiliary letter. Our last step is to define all recursively enumerable relations that can use all letters of $A$ freely. This clearly implies Theorem 2.1. To this end, we observe that every word is determined by its binary projections.
Lemma 3.12. Let $A$ be an alphabet with $|A| \ge 2$ and let $u, v \in A^*$ such that for every binary alphabet $B \subseteq A$, we have $\pi_B(u) = \pi_B(v)$. Then $u = v$.
Proof. Towards a contradiction, suppose $u \ne v$. We clearly have $|u| = |v|$. Thus, if $w \in A^*$ is the longest common prefix of $u$ and $v$, then $u = w a u'$ and $v = w b v'$ for some letters $a \ne b$ and words $u', v' \in A^*$. But then the words $\pi_{a,b}(u)$ and $\pi_{a,b}(v)$ differ: After the common prefix $\pi_{a,b}(w)$, the word $\pi_{a,b}(u)$ continues with $a$ and the word $\pi_{a,b}(v)$ continues with $b$.
We now fix $a, b \in A$ with $a \ne b$. For any binary alphabet $B \subseteq A$, let $\rho_B\colon A^* \to \{a, b\}^*$ be any morphism with $\rho_B(B) = \{a, b\}$ and $\rho_B(c) = \varepsilon$ for all $c \in A \setminus B$, i.e., $\rho_B$ first projects a word over $A$ to $B$ and then renames the letters from $B$ to $\{a, b\}$. Recall that $\binom{|A|}{2}$ is the number of binary alphabets $B \subseteq A$. We define the encoding function $e\colon A^* \to (\{a, b\}^*)^{\binom{|A|}{2}}$, which maps a word $u \in A^*$ to the tuple consisting of all words $\rho_B(u)$ for all binary alphabets $B \subseteq A$ (in some arbitrary order). Note that $e$ is injective by Lemma 3.12.
Lemma 3.13.
Proof. For binary alphabets $B, C \subseteq A$, a map $\sigma\colon B^* \to C^*$ is called a binary renaming if (i) $\sigma$ is a word morphism and (ii) $\sigma$ restricted to $B$ is a bijection of $B$ and $C$. If, in addition, there is a letter $\# \in B \cap C$ such that $\sigma(\#) = \#$, then we say that $\sigma$ fixes a letter.
Observe that if we can $\Sigma_1$-define all binary renamings, then the encoding function $e$ can be $\Sigma_1$-defined using projections and binary renamings. Thus, it remains to define all binary renamings. For this, note that every binary renaming can be written as a composition of (at most three) binary renamings that each fix some letter. Hence, it suffices to define any binary renaming that fixes a letter. Suppose $\sigma\colon \{c, \#\}^* \to \{d, \#\}^*$ with $\sigma(c) = d$ and $\sigma(\#) = \#$. Without loss of generality, we assume $c$ and $\{cd, \#\}^*$ is definable by Lemma 3.7.
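The injectivity of the encoding function $e$ from Step VI, which rests on Lemma 3.12, can be confirmed exhaustively for short words. The sketch below fixes one particular choice of the renamings $\rho_B$ (first letter of $B$ to $a$, second to $b$); this choice and the function names are assumptions of the illustration.

```python
from itertools import combinations, product


def words(alphabet: str, max_len: int):
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def rho(u: str, B: tuple) -> str:
    """Project u to the binary alphabet B and rename its letters to a, b."""
    x, _y = B
    return "".join("a" if ch == x else "b" for ch in u if ch in B)


def encode(u: str, alphabet: str) -> tuple:
    """Tuple of all renamed binary projections of u."""
    return tuple(rho(u, B) for B in combinations(alphabet, 2))
```

For example, over $\{a, b, c\}$ the word $acb$ is mapped to the triple $(ab, ab, ba)$ for the binary alphabets $\{a, b\}$, $\{a, c\}$, $\{b, c\}$.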

Restricting the signature to a single constant
In [HSZ17, Remark 3.4], the authors observe that their undecidability result for the existential fragment of subword logic with constants still holds, even if only a finite set of constant symbols is allowed. More precisely, they show that for any alphabet $A$ with $|A| \ge 2$, there are finitely many words $w_1, \ldots, w_n \in A^*$ such that the truth problem for the $\Sigma_1$-fragment over the structure $(A^*, \preccurlyeq, w_1, \ldots, w_n)$ is undecidable. Furthermore, they remark that one can strengthen this result even more to only requiring a single constant $W \in A^*$. The proof of the latter can be found in the extended version [HSZ21, Theorem 3.5]. Using similar techniques, we can likewise strengthen our $\Sigma_1$-definability result for alphabets of size at least 3:
Theorem 4.1. Let $A$ be an alphabet with $|A| \ge 3$. Then there is a fixed word $W \in A^*$ such that every recursively enumerable relation $R \subseteq (A^*)^k$, $k \in \mathbb{N}$, is $\Sigma_1$-definable in $(A^*, \preccurlyeq, W)$.
Like in [HSZ21], we make use of the fact that any word of length at least 3 is uniquely defined by its set of strict subwords:
Lemma 4.2. Let $u, v \in A^{\ge 3}$ such that $u{\downarrow} \setminus \{u\} = v{\downarrow} \setminus \{v\}$. Then $u = v$.
Note that Lemma 4.2 does not hold for words of length 2 since $ab{\downarrow} \setminus \{ab\} = \{\varepsilon, a, b\} = ba{\downarrow} \setminus \{ba\}$.
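The fact that words of length at least 3 are determined by their strict subwords can be observed experimentally. The following sketch computes the set of strict subwords of a word and checks, for all words of length 3 to 5 over a three-letter alphabet, that these sets are pairwise distinct; `strict_subwords` is a hypothetical helper name.

```python
from itertools import combinations, product


def strict_subwords(w: str) -> frozenset:
    """All subwords of w other than w itself.

    A subword of the same length as w equals w, so it suffices to keep
    the subsequences that use fewer than len(w) positions.
    """
    return frozenset(
        "".join(w[i] for i in idx)
        for r in range(len(w))
        for idx in combinations(range(len(w)), r)
    )


def words_of_lengths(alphabet: str, lo: int, hi: int):
    """All words over `alphabet` whose length lies between lo and hi."""
    for n in range(lo, hi + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)
```

The same function exhibits the collision for length 2 noted above: $ab$ and $ba$ have the same strict subwords.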
Proof of Theorem 4.1. We begin with the case $A = \{a, b, c\}$, i.e., $|A| = 3$. Recall that concatenation is $\Sigma_1$-definable in the structure $(A^*, \preceq, (w)_{w \in A^*})$. Let $w_1, \dots, w_n$ be the constant symbols that appear in the formula defining the concatenation relation. We choose $W = a^{m+1} b^{m+2} c^{m+3}$, where $m = \max_{1 \le i \le n} |w_i|$ is the maximal length among these constants. Since every word is a concatenation of letters, it now suffices to show that the letters $a, b, c$ and the words $w_1, \dots, w_n$ are all $\Sigma_1$-definable over $(A^*, \preceq, W)$.
In the following we use $u \prec v$ as a shorthand for $u \preceq v \wedge u \neq v$. The formula defines a sequence of subwords $v_0, \dots, v_{3m+5}$ of $W$ with $|v_i| = i$. In particular, we have $v_1 \in A$.
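The displayed formula is not reproduced in this text. One formula with the stated property is the strict chain $v_0 \prec v_1 \prec \dots \prec v_{3m+5} \prec W$; this is an assumption for illustration, since $|W| = 3m + 6$ and each strict subword is strictly shorter. The Python sketch below enumerates all such chains for a hypothetical small value $m = 1$ and confirms that the lengths are forced.

```python
from functools import lru_cache
from itertools import combinations

@lru_cache(maxsize=None)
def strict_subwords(w):
    """All scattered subwords of w other than w itself."""
    return frozenset("".join(w[i] for i in idx)
                     for r in range(len(w))          # sizes 0 .. len(w) - 1
                     for idx in combinations(range(len(w)), r))

def strict_chains(top, k):
    """Yield all tuples (v_0, ..., v_k) with v_0 < v_1 < ... < v_k < top,
    where < denotes the strict subword order."""
    if k < 0:
        yield ()
        return
    for v in strict_subwords(top):
        # Sound pruning: strict subwords are strictly shorter, so k further
        # words can only fit below v if len(v) >= k.
        if len(v) >= k:
            for rest in strict_chains(v, k - 1):
                yield rest + (v,)

m = 1  # hypothetical small value; in the proof m depends on the constants w_i
W = "a" * (m + 1) + "b" * (m + 2) + "c" * (m + 3)
chains = list(strict_chains(W, 3 * m + 5))
lengths_forced = all(len(v) == i for chain in chains for i, v in enumerate(chain))
first_letters = {chain[1] for chain in chains}  # possible values of v_1
```

In this sample every chain satisfies $|v_i| = i$, and $v_1$ ranges over the three letters, which matches the remark that the letters are so far only determined up to renaming.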
If we repeat the same formula for two more sets of variables $u_0, \dots, u_{3m+5}$ and $t_0, \dots, t_{3m+5}$, and additionally require that $v_1$, $u_1$, and $t_1$ are pairwise distinct, then we have defined $a$, $b$, and $c$, but only up to renaming of letters. To ensure $v_1 = c$ we use the following formula:
It defines a sequence $v'_0, \dots, v'_{m+3}$ of subwords of $W$, among which the letters $u_1$ and $t_1$ do not occur. Therefore this sequence consists of words in $v_1^*$, and by the choice of $W$ a sequence of this length cannot exist for $v_1 = a$ or $v_1 = b$. Observe that also $|v'_i| = i$, which means that we have now additionally defined the words $c^0 = \varepsilon$ to $c^{m+3}$. Using two similar formulas with $m + 3$ and $m + 2$ variables, respectively, we can now define the words $b^0$ to $b^{m+2}$ and $a^0$ to $a^{m+1}$ as well.
Observe that for any word $w$ and any letter $a'$, $|w|_{a'} = \ell$ is equivalent to $(a')^{\ell} \preceq w \wedge (a')^{\ell+1} \not\preceq w$. Using this fact we can fix the number of occurrences of each letter $a'$, and therefore also fix the total length of a word. We continue by defining the remaining words of length $\le 2$, which are $ab, ba, ac, ca, bc, cb$. The formula defines the word $s_{01} = ab$. By copying this formula for a new free variable $s_{10}$ and replacing the last conjunct by $s_{10} \not\preceq W$, we can likewise define $s_{10} = ba$. Similarly, we can also distinguish $ac$ from $ca$, as well as $bc$ from $cb$, since in both cases one of them is a subword of $W$ while the other is not.
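The counting trick and the separation of $ab$ from $ba$ can be illustrated with a small Python sketch. It assumes a hypothetical value $m = 2$ and uses only subword tests against powers of letters and against $W$; the helper names are illustrative and do not come from the paper.

```python
from itertools import product

def is_subword(u, v):
    """True if u is a scattered subword of v."""
    it = iter(v)
    return all(ch in it for ch in u)

def letter_count_via_subwords(w, letter):
    """The unique l with letter^l a subword of w and letter^(l+1) not."""
    l = 0
    while is_subword(letter * (l + 1), w):
        l += 1
    return l

def words_with_counts(counts):
    """All words whose letter counts match `counts`, tested via subwords only."""
    n = sum(counts.values())
    return ["".join(p) for p in product(sorted(counts), repeat=n)
            if all(letter_count_via_subwords("".join(p), x) == c
                   for x, c in counts.items())]

m = 2  # hypothetical value of m
W = "a" * (m + 1) + "b" * (m + 2) + "c" * (m + 3)

# Words with exactly one a, one b and no c: these are ab and ba.
candidates = words_with_counts({"a": 1, "b": 1, "c": 0})
s01 = [s for s in candidates if is_subword(s, W)]      # the subword of W
s10 = [s for s in candidates if not is_subword(s, W)]  # the non-subword
```

Fixing the letter counts leaves exactly the two candidates, and membership in the downward closure of $W$ separates them.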
Finally we inductively define all constants up to length $m$. Let $w$ be a word of length $3 \le |w| \le m$. Then by the induction hypothesis we have defined all constants up to length $|w| - 1$. Furthermore, for every $a' \in A$ we have defined the constants $(a')^{|w|_{a'}}$ and $(a')^{|w|_{a'}+1}$, since $|w|_{a'} \le |w| \le m$. By Lemma 4.2 the word $w$ is uniquely defined by its set of strict subwords. Therefore the following formula defines $s = w$:
Since $a, b, c$ and $w_1, \dots, w_n$ have length at most $m$, we have successfully defined all of the required constants. To conclude the case $|A| = 3$, we add existential quantifiers for all the variables representing constants that we do not use outside of these auxiliary formulas. Since we only ever used existential quantification, we have shown definability in the $\Sigma_1$-fragment of $(A^*, \preceq, W)$.
The case $A = \{a_1, \dots, a_k\}$ for $k \ge 4$ is very similar. We define the number $m$ in the same way as before and choose $W = a_1^{m+1} \cdots a_k^{m+k}$. Then we can again define all constants $a_i^0$ to $a_i^{m+1}$ for every letter $a_i$. We also proceed to define all remaining words of length 2, in the same way as in the previous case. Finally, Lemma 4.2 and the fact that $|w|_{a_i} = \ell$ is equivalent to $a_i^{\ell} \preceq w \wedge a_i^{\ell+1} \not\preceq w$ allow us to inductively define all constants $w$ up to length $m$, like before.

Further consequences
In this section, we prove Corollary 2.2, Corollary 2.4 and Observation 2.5. When working with higher levels ($\Sigma^0_i$ for $i \ge 2$) of the arithmetic hierarchy, it will be convenient to use a slightly different definition than the one using oracle Turing machines: [Koz10, Theorem 35.1] implies that for $i \ge 1$, a relation $R \subseteq (A^*)^k$ belongs to $\Sigma^0_{i+1}$ if and only if it can be written as $R = \pi((A^*)^{k+\ell} \setminus S)$, where $S \subseteq (A^*)^{k+\ell}$ is a relation in $\Sigma^0_i$ and $\pi \colon (A^*)^{k+\ell} \to (A^*)^k$ is the projection to the first $k$ coordinates.
Proof of Corollary 2.2. It is immediate that every predicate definable in the $\Sigma_i$-fragment of $(A^*, \preceq, (w)_{w \in A^*})$ belongs to $\Sigma^0_i$, because the subword relation is recursively enumerable. We show the converse by induction on $i$, with Theorem 2.1 as the base case. Now suppose that every relation in $\Sigma^0_i$ is definable in the $\Sigma_i$-fragment of $(A^*, \preceq, (w)_{w \in A^*})$ and consider a relation $R \subseteq (A^*)^k$ in $\Sigma^0_{i+1}$. Then we can write $R = \pi((A^*)^{k+\ell} \setminus S)$ for some $\ell \ge 0$, where $\pi \colon (A^*)^{k+\ell} \to (A^*)^k$ is the projection to the first $k$ coordinates, and $S \subseteq (A^*)^{k+\ell}$ is a relation in $\Sigma^0_i$. By induction, $S$ is definable by a $\Sigma_i$-formula $\varphi$ over $(A^*, \preceq, (w)_{w \in A^*})$. By negating $\varphi$ and moving all negations inwards, we obtain a $\Pi_i$-formula $\psi$ that defines $(A^*)^{k+\ell} \setminus S$. Finally, adding existential quantifiers for the variables corresponding to the last $\ell$ coordinates yields a $\Sigma_{i+1}$-formula for $R = \pi((A^*)^{k+\ell} \setminus S)$.
Note that if instead of Theorem 2.1 we use Theorem 4.1 as the base case in the induction above, it follows that there exists a word $W \in A^*$ such that Corollary 2.2 also holds for the structure $(A^*, \preceq, W)$ instead of $(A^*, \preceq, (w)_{w \in A^*})$. This together with Theorem 4.1 yields Remark 2.3.
Finally, we look at the expressive power of the pure logic $(A^*, \preceq)$. We start by proving Corollary 2.4, which characterizes relations definable in the $\Sigma_i$-fragment of $(A^*, \preceq)$ for $i \ge 2$.
Proof of Corollary 2.4. Clearly, every relation definable with a $\Sigma_i$-formula over $(A^*, \preceq)$ must be automorphism-invariant and must define a relation in $\Sigma^0_i$. Conversely, consider an automorphism-invariant relation $R \subseteq (A^*)^k$ in $\Sigma^0_i$. Then $R$ is definable using a $\Sigma_i$-formula $\varphi$ with free variables $x_1, \dots, x_k$ over $(A^*, \preceq, (w)_{w \in A^*})$ by Corollary 2.2. Let $w_1, \dots, w_\ell$ be the constants occurring in $\varphi$. From $\varphi$, we construct the $\Sigma_i$-formula $\varphi'$ over $(A^*, \preceq)$ by replacing each occurrence of $w_j$ by a fresh variable $y_j$.
The expressiveness of the existential fragment of $(A^*, \preceq)$ is not well understood. Partial results are summarized in Observation 2.5.
Proof of Observation 2.5. Take a recursively enumerable, but undecidable subset $S \subseteq \mathbb{N}$. Fix a letter $a \in A$ and define the unary language $L = \{a^n \mid n \in S\}$. By [HSZ21, Theorem 3.5] there exists a word $W \in A^*$ and a $\Sigma_1$-formula $\varphi(x)$ over $(A^*, \preceq, W)$ which defines $L$. Consider the formula $\varphi'$ in the $\Sigma_1$-fragment of $(A^*, \preceq)$ obtained by replacing each occurrence of $W$ by a fresh variable $y$. Then $(v, W)$ satisfies $\varphi'$ if and only if $v \in L$. Thus, $\varphi'$ defines an undecidable relation.
For the second statement, we claim that every language $L \subseteq A^*$ that is $\Sigma_1$-definable in $(A^*, \preceq)$ satisfies $A^* L A^* \subseteq L$. Hence, many automorphism-invariant regular languages such as $\bigcup_{a \in A} a^*$ are not definable. Note that for $a \in A$ and $u, v \in A^*$, we have $u \preceq v$ if and only if $au \preceq av$. Thus, every $\Sigma_0$-definable relation $R \subseteq (A^*)^k$ satisfies $(w_1, \dots, w_k) \in R$ if and only if $(aw_1, \dots, aw_k) \in R$. Symmetrically, $(w_1, \dots, w_k) \in R$ is equivalent to $(w_1 a, \dots, w_k a) \in R$. As a projection of a $\Sigma_0$-definable relation, $L$ thus satisfies $A^* L A^* \subseteq L$.
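The invariance of the subword relation under adding a common first or last letter, which drives the argument above, can be sanity-checked by brute force. The sketch below assumes a binary sample alphabet and words of length at most 4; it is an illustration, not part of the proof.

```python
from itertools import product

def is_subword(u, v):
    """True if u is a scattered subword of v."""
    it = iter(v)
    return all(ch in it for ch in u)

A = "ab"  # assumed sample alphabet
words = ["".join(p) for n in range(5) for p in product(A, repeat=n)]

# u is a subword of v  iff  xu is a subword of xv  iff  ux is a subword of vx.
invariant = all(
    is_subword(u, v) == is_subword(x + u, x + v) == is_subword(u + x, v + x)
    for u in words for v in words for x in A
)
```

Since quantifier-free formulas without constants only compare variables via $\preceq$ and equality, this invariance carries over to every $\Sigma_0$-definable relation.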
Observation 2.5 raises the question whether there are undecidable $\Sigma_1$-definable languages in $(A^*, \preceq)$, which we leave as an open problem. In fact, all examples of $\Sigma_1$-definable languages that we have constructed are regular. While each individual $\Sigma_1$-definable language could be decidable, the following observation implies that the membership problem for $\Sigma_1$-definable languages is not decidable if the formula is part of the input.
Proof. By [HSZ21, Theorem 3.5] there exists a word $W \in A^*$ such that the truth problem for $\Sigma_1$-formulas over $(A^*, \preceq, W)$ is undecidable; this problem asks whether a given sentence $\varphi$ (a formula without free variables) is true. In other words, testing whether $W$ satisfies the formula obtained from $\varphi$ by replacing each occurrence of $W$ by a fresh free variable is undecidable.

Conclusion
We have shown how to define all recursively enumerable relations in the existential fragment of the subword order with constants for each alphabet $A$ with $|A| \ge 3$. If $|A| = 1$, then the relations definable in $(A^*, \preceq, (w)_{w \in A^*})$ correspond to relations over $\mathbb{N}$ definable in $(\mathbb{N}, \le)$ with constants. Hence, this case is very well understood: This structure admits quantifier elimination [Pél92, Theorem 2.2(b)], which implies that the $\Sigma_1$-fragment is expressively complete and also that a subset of $A^*$ is only definable if it is finite or co-finite. In particular, Theorem 2.1 does not hold for $|A| = 1$.
We leave open whether Theorem 2.1 still holds over a binary alphabet. If this is the case, then we expect that substantially new techniques are required. In order to express non-trivial relations over two letters, our proof often uses a third letter as a separator and marker for "synchronization points" in subword embeddings.
For example, we can say $|u| = 3 \cdot |v|_a + 2 \cdot |w|_b$ for $u, v, w \in A^*$ and $a, b \in A$. This also allows us to state modulo constraints, such as $\exists v \colon |u|_a = 2 \cdot |v|_a$, i.e., "$|u|_a$ is even". Finally, counting letters lets us define projections: Note that for $B \subseteq A$ and $u, v \in A^*$, we have $v = \pi_B(u)$ if and only if $v \preceq u$ and $|v|_b = |u|_b$ for each $b \in B$ as well as $\neg(a \preceq v)$ for every $a \in A \setminus B$.
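The characterization of projections via letter counting can be checked exhaustively on small instances. The following Python sketch assumes the sample alphabets $A = \{a, b, c\}$ and $B = \{a, b\}$ and words of length at most 4; `project` and `satisfies_definition` are illustrative names.

```python
from itertools import product

def is_subword(u, v):
    """True if u is a scattered subword of v."""
    it = iter(v)
    return all(ch in it for ch in u)

def project(u, B):
    """pi_B(u): delete all letters of u that are not in B."""
    return "".join(ch for ch in u if ch in B)

def satisfies_definition(u, v, A, B):
    """The three conditions from the text, using subword tests and counts."""
    return (is_subword(v, u)
            and all(v.count(b) == u.count(b) for b in B)
            and not any(is_subword(a, v) for a in A - B))

A, B = set("abc"), set("ab")  # assumed sample alphabets
words = ["".join(p) for n in range(5) for p in product(sorted(A), repeat=n)]
agrees = all((v == project(u, B)) == satisfies_definition(u, v, A, B)
             for u in words for v in words)
```

The check relies on the observation that a subword of $u$ over $B$ with the same $B$-counts as $u$ must have the same length as $\pi_B(u)$ and embed into it, hence equal it.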

Remark 4.3. In the proof of Theorem 4.1 we show that there is a number $m \in \mathbb{N}$ such that for the alphabet $A = \{a_1, \dots, a_k\}$ with $k \ge 4$ the word $a_1^{m+1} \cdots a_k^{m+k}$ is a valid choice for the constant symbol $W$. Moreover, the proof still works for any number $m' > m$ and $W = a_1^{m'+1} \cdots a_k^{m'+k}$. Therefore the theorem does not just hold for one fixed word $W$, but for an infinite family of words $W_{m'}$.