Universal codes of the natural numbers

A code of the natural numbers is a uniquely-decodable binary code of the natural numbers with non-decreasing codeword lengths, which satisfies Kraft's inequality tightly. We define a natural partial order on the set of codes, and show how to construct effectively a code better than a given sequence of codes, in a certain precise sense. As an application, we prove that the existence of a scale of codes (a well-ordered set of codes which contains a code better than any given code) is independent of ZFC.


Introduction
Lossless coding theory concerns the problem of encoding a sequence of symbols in some alphabet, usually binary. We demand two properties from our codes: they need to be (uniquely) decodable, and they should be concise, that is, the codewords should be short. In this paper we address the following problem: how concise can a coding system for the natural numbers be?
In 1975, Elias [Eli75] considered this problem and constructed a sequence of efficient codes, culminating in the so-called ω-code (almost the same code had been discovered by Levenshtein [Lev68] in 1968). The third member in Elias's sequence of codes, called the δ-code, is already asymptotically optimal in the sense that given a non-decreasing high-entropy distribution on the natural numbers, the expected codeword length is almost optimal; consult Elias [Eli75] for a formal definition.
A natural question to ask is whether there exists an optimal code. We formulate this question in Section 3 and show that not only is there no single optimal code, but there is also no optimal sequence of codes. Since the proofs of these results are constructive, they can be used to construct a fast-growing hierarchy of codes. Elias's construction cannot be used to obtain this result, as we show in Section 3.1.
Care must be taken when considering the practical implications of these results: while all codes we consider are effective, they are not necessarily efficient, in the sense that encoding and decoding could be slow. Furthermore, in practice one is not interested in the asymptotic performance of a code, but in its performance on integers up to a certain application-specific bound, or even on a certain class of distributions.
We go on further and consider the existence of a scale of codes, which is an uncountable sequence of codes, ordered so that later codes are better (in the sense of Definition 2.4 below), and containing a code better than any given code. We show in Section 4 that the existence of a scale is independent of ZFC, imitating classical results on functions on the natural numbers ordered by dominance.

Definitions
We start with some basic notation. The set of all finite binary strings is denoted $\{0,1\}^*$. The set of natural numbers (including zero) is denoted $\mathbb{N}$. The set of finite sequences of natural numbers is denoted $\mathbb{N}^*$. The length of a binary string $x$ is denoted $|x|$.
Next, some terminology from recursion theory. A sequence $a(n)$ is called effective if the mapping $n \mapsto a(n)$ is recursive (computable by an algorithm). A sequence $a_n(m)$ of sequences is effective if the mapping $(n, m) \mapsto a_n(m)$ is recursive. A real number $x$ is effective if there is a recursive function mapping $n$ to a closed rational interval of width at most $1/n$ containing $x$ (all rational intervals appearing in this paper are closed).
A sequence $a(n)$ is effective relative to another sequence $b(n)$ if the mapping $n \mapsto a(n)$ is recursive given an oracle for the mapping $n \mapsto b(n)$. The concept of being effective relative to a sequence of sequences or to a real number is defined analogously. Similarly we can extend the definition to cover sequences of sequences and real numbers which are effective relative to other data.
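The definition of an effective real number can be illustrated on a familiar example. The following Python sketch is only an illustration under assumptions: the function name `sqrt2_interval` is hypothetical, and bisection is just one convenient way to produce, for each $n \ge 1$, a closed rational interval of width at most $1/n$ containing $\sqrt{2}$.

```python
from fractions import Fraction


def sqrt2_interval(n):
    """Return rationals (lo, hi) with lo <= sqrt(2) <= hi and hi - lo <= 1/n.

    Hypothetical helper: exact rational bisection on [1, 2].
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    lo, hi = Fraction(1), Fraction(2)
    while hi - lo > Fraction(1, n):
        mid = (lo + hi) / 2
        # Compare squares so that only rational arithmetic is used.
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo, hi


lo, hi = sqrt2_interval(1000)
print(lo, hi, float(hi - lo))
```

Since every step uses exact rational arithmetic, the mapping from $n$ to the interval is computable in the sense used above.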
We proceed to define binary codes, which are our main focus of study.
Definition 2.1. A (uniquely-decodable) binary code of the natural numbers is a mapping $C : \mathbb{N} \to \{0,1\}^*$ with the property that the function $C^* : \mathbb{N}^* \to \{0,1\}^*$ defined by
$$C^*(n_1, \ldots, n_k) = C(n_1) C(n_2) \cdots C(n_k)$$
is injective. A prefix code has the additional property that $C(n)$ is not a prefix of $C(m)$ for any $n \ne m$.
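Lemma 2.2 (Kraft's inequality). Let $C$ be a binary code. Then
$$\sum_{n \in \mathbb{N}} 2^{-|C(n)|} \le 1.$$
Conversely, given a sequence $c : \mathbb{N} \to \mathbb{N}$ satisfying the inequality $\sum_{n \in \mathbb{N}} 2^{-c(n)} \le 1$, there is a prefix code $C$ such that $|C(n)| = c(n)$ for all $n \in \mathbb{N}$.
Due to this inequality and its converse, our study will concentrate only on the lengths of codewords rather than the codewords themselves. This prompts the following definition.
Definition 2.3. A precode is a non-decreasing sequence $c : \mathbb{N} \to \mathbb{N}$ satisfying Kraft's inequality $\sigma(c) \le 1$, where $\sigma(c) = \sum_{n \in \mathbb{N}} 2^{-c(n)}$. A code is a precode in which Kraft's inequality is tight, $\sigma(c) = 1$. A proper precode is a precode in which Kraft's inequality is strict, $\sigma(c) < 1$.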
The theory can also be developed with respect to non-monotone codes, but we feel that this is less natural. We require that Kraft's inequality be tight for technical reasons (to make our constructions effective). We feel that this is not a large concession since (as we show in Section 3) any binary code can be improved to a binary code in which Kraft's inequality is tight.
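The converse direction of Kraft's inequality can be made concrete for a finite list of codeword lengths. The following Python sketch is illustrative only: the names `kraft_sum` and `prefix_code` are hypothetical, and it assumes the lengths are supplied in non-decreasing order, assigning codewords in increasing binary order.

```python
from fractions import Fraction


def kraft_sum(lengths):
    """Exact value of the sum of 2^(-l) over the given lengths."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


def prefix_code(lengths):
    """Build prefix-free codewords with the given non-decreasing lengths.

    Hypothetical helper: raises ValueError if the lengths are not sorted
    or violate Kraft's inequality.
    """
    lengths = list(lengths)
    if lengths != sorted(lengths):
        raise ValueError("lengths must be non-decreasing")
    if kraft_sum(lengths) > 1:
        raise ValueError("Kraft's inequality is violated")
    words = []
    value = 0  # next free codeword, read as an integer of the current length
    prev = lengths[0] if lengths else 0
    for l in lengths:
        value <<= l - prev  # pad with zeros when the length grows
        words.append(format(value, f"0{l}b") if l > 0 else "")
        value += 1
        prev = l
    return words


print(prefix_code([1, 2, 3, 3]))  # ['0', '10', '110', '111']
print(kraft_sum([1, 2, 3, 3]))    # 1
```

In the second call the sum equals 1, so these four lengths already use up the whole budget allowed by Kraft's inequality.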
Following properties of the sequence of codes constructed by Elias [Eli75], we define a partial order on precodes.
Definition 2.4. Let $c, d$ be precodes. We say that $c \prec d$ ($c$ is better than $d$) if $d(n) - c(n) \to \infty$.
This definition corresponds to the ratio test for convergent series: indeed, with any precode $c$ we can associate a convergent series $c'(n) = 2^{-c(n)}$, and then $c \prec d$ if and only if $c'(n)/d'(n) \to \infty$. This differs from the definition used by Cholshchevnikova [Cho83] and Vojtáš [Voj87], who apply the ratio test to the remainder term.
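As a numerical illustration of the order in Definition 2.4, the sketch below compares the trivial code $\alpha(n) = n + 1$ with the length sequence $2\lfloor \log_2(n+1) \rfloor + 1$, which is the codeword length of Elias's γ-code applied to $n + 1$. The function names are hypothetical, and a finite computation can only suggest, not prove, that the gap tends to infinity.

```python
def alpha(n):
    """Lengths of the trivial (unary) code."""
    return n + 1


def gamma_len(n):
    """Lengths 2*floor(log2(n+1)) + 1 (Elias gamma applied to n + 1)."""
    k = (n + 1).bit_length() - 1  # floor(log2(n + 1))
    return 2 * k + 1


def tail_min_gap(c, d, start, stop):
    """Minimum of d(n) - c(n) over start <= n < stop."""
    return min(d(n) - c(n) for n in range(start, stop))


# The minimum gap over later and later tails keeps growing,
# which is consistent with gamma_len being better than alpha.
for start in (15, 63, 255):
    print(start, tail_min_gap(gamma_len, alpha, start, 1000))
```

Here the gap equals $n - 2\lfloor \log_2(n+1) \rfloor$, so the printed tail minima are attained at the left endpoints 15, 63 and 255.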
Armed with this definition, we can give some evidence for our claim that non-monotone codes are less natural.
Lemma 2.5. There is a function $d : \mathbb{N} \to \mathbb{N}$, satisfying Kraft's inequality tightly, such that $c \not\prec d$ for any code $c$.
Proof. Define $d$ as follows: [...] The critical values, $n = 4^k + k - 1$, are 0, 4, 17, 66, and so on. Let us check that $d$ satisfies Kraft's equality: [...] If $c$ is any code then for any $n$ we have [...]

Existence of optimal codes
Our goal in this section is to show that there is no optimal code, or even optimal sequence of codes. This is the statement of the following theorem.
Theorem 3.1. For every sequence of codes $(c_n)_{n \in \mathbb{N}}$ there is a code $d$, effective relative to the sequence, such that $d \prec c_n$ for every $n \in \mathbb{N}$.
Similar results in the related context of fast-growing functions were proved by du Bois-Reymond [dBR75] and Hadamard [Had94]. Compared to these results, the main challenges in proving Theorem 3.1 are constructing $d$ in an effective way, and ensuring that $d$ is monotone.
The first step in proving Theorem 3.1 is constructing effectively a precode $e$ satisfying $e \prec c_n$ for every $n \in \mathbb{N}$.
Lemma 3.2. For every sequence of codes $c = (c_k)_{k \in \mathbb{N}}$ there is a proper precode $e$, effective relative to $c$, such that $e \prec c_k$ for every $k \in \mathbb{N}$. Furthermore, $\sigma(e) \le 1/2$ and $\sigma(e)$ is also effective relative to $c$.
This shows that $d$ is effective relative to $c$. Moreover, since the codes $c_k$ are monotone, so is $d$. We will construct a precode $e \prec d$, and it will follow (as we show below) that $e \prec c_k$ for all $k \in \mathbb{N}$.
We start by computing a sequence [...] The existence of $p_0$ implies that $\sigma(d)$ is convergent, and so [...] The idea now is to construct the sequence $e$ as follows. Choose an appropriate increasing sequence $0 = r_0 < r_1 < \cdots$, and let $e(n) = d(n) - m + C$ for $r_m \le n < r_{m+1}$. We will choose the points $r_m$ for $m \ge 1$ from the set $J$, and this will ensure that $e$ is monotone. An appropriate choice of the points $r_m$ will ensure that $\sigma(e) < \infty$ is computable (as a function of $C$), and will enable us to choose a value of $C$ guaranteeing $\sigma(e) \le 1/2$.
The sequence $(r_m)_{m \in \mathbb{N}}$ is defined as follows. Let $r_0 = 0$, and for $m \ge 1$, let $r_m$ be the minimal element of $J$ which is larger than both $r_{m-1}$ and $p_{2m}$. The sequence $r$ is clearly effective relative to $c$. Define a sequence $e'$ by [...] The sequence $e'$ is also effective relative to $c$. We claim that $e'$ is monotone. Indeed, if [...] We proceed to show that $\sigma(e')$ is computable. For all $m \in \mathbb{N}$ we have [...] This shows that $\sigma(e')$ is computable. In particular, we can find an integer [...] Repeating this operation, we obtain a code $d$.
The main difficulty is computing an integer $k$ such that $2^{-k} \le 1 - \sigma(e) \le 2^{-k+2}$. This is accomplished by computing an approximation to $\log_2(1 - \sigma(e))$, a function which is the subject of the following routine technical lemma.
Lemma 3.3. Let $\delta > 0$ be a rational number and let $x$ be a real number satisfying $x \le 1 - \delta$. Then $\log_2(1 - x)$ is effective relative to $x$ and $\delta$.
Proof. Let $\Delta$ be an integer satisfying $\delta \ge 1/\Delta$. The function [...] Given non-zero $n \in \mathbb{N}$, we show how to compute an interval of length at most $1/n$ containing $\log_2(1 - x)$, given $\delta$ and an oracle for $x$. We start by computing $\Delta = \lceil 1/\delta \rceil$ and $N = \lceil 2C\Delta n \rceil \ge 2\Delta$. We ask the oracle for a rational interval $[a, b]$ [...] is an interval of width at most $C\Delta/N \le 1/(2n)$ containing $f(x)$. Finally, using a Taylor series expansion we compute rationals $g(a), g(b)$ approximating $f(a), f(b)$ up to $1/(4n)$. The interval $[g(b), g(a)]$ is a rational interval of width at most $1/n$ containing $f(x)$.
Given this technical lemma, we are able to implement the program described above for the second step of the proof of Theorem 3.1.
Lemma 3.4. For any proper precode $e$ there is a code $d$, effective relative to $e$ and $\log_2(1 - \sigma(e))$, such that $d(n) \le e(n)$ for all $n \in \mathbb{N}$.
Proof. In this proof, whenever we use the term effective, we mean effective relative to $e$ and $\log_2(1 - \sigma(e))$.

Y. FILMUS
We construct a sequence $d_t$ of precodes converging to $d$ (we make this notion precise below). We will ensure that $\sigma(d_t) < 1$ and that the sequences $d_t$ and $\log_2(1 - \sigma(d_t))$ are effective, and furthermore $\sigma(d_t)$ is strictly increasing.
The starting point is the sequence $d_0(n) = e(n)$. Next suppose that $d_t$ has been defined. We will find effectively an integer $k_t$ satisfying $2^{-k_t} \le 1 - \sigma(d_t) \le 2^{-k_t+2}$. Since $\log_2(1 - \sigma(d_t))$ is effective, we can effectively find an interval $I_t$ of width at most 1 containing it, and an integer $k'_t$ such that [...], and otherwise we put [...], applying Lemma 3.3, we see that $\log_2(1 - \sigma(d_{t+1}))$ is effective.
We define $d(n) = \min_t d_t(n)$. Since $k_t$ is non-decreasing and $k_t \to \infty$, $d$ is effective. Since each $d_t$ is monotone, so is $d$. Clearly $\sigma(d) \ge \sigma(d_t)$, hence $1 - \sigma(d_t) \to 0$ implies that $\sigma(d) \ge 1$. On the other hand, each prefix of $d$ is a prefix of $d_t$ for all sufficiently large $t$. Since each $d_t$ is a precode, we deduce that for all $m \in \mathbb{N}$, $\sum_{n=0}^{m} 2^{-d(n)} < 1$, and so $\sigma(d) \le 1$. Put together, $\sigma(d) = 1$, and so $d$ is a code.
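The iteration in the proof of Lemma 3.4 can be sketched in a few lines of Python. This is a simplified illustration rather than the construction itself: it assumes that $\sigma(e)$ is known exactly as a rational number, whereas the proof only has access to approximations of $\log_2(1 - \sigma(d_t))$, and the name `tighten` is hypothetical. At each step it picks the smallest $k$ with $2^{-k} \le 1 - \sigma$, which lies in the range required by the proof.

```python
from fractions import Fraction


def tighten(e, sigma_e, steps):
    """Run `steps` rounds of the lowering step on a proper precode e.

    e        -- function n -> e(n), non-decreasing and unbounded
    sigma_e  -- exact value of the Kraft sum of e (assumed known)
    Returns (d, sigma) where d is the lowered sequence after the rounds.
    """
    overrides = {}  # positions whose value has been lowered so far
    sigma = Fraction(sigma_e)

    def d(n):
        return overrides.get(n, e(n))

    for _ in range(steps):
        gap = 1 - sigma
        if gap == 0:
            break
        k = 0
        while Fraction(1, 2 ** k) > gap:  # smallest k with 2^-k <= gap
            k += 1
        m = 0
        while d(m) <= k:  # first position whose value exceeds k
            m += 1
        sigma += Fraction(1, 2 ** k) - Fraction(1, 2 ** d(m))
        overrides[m] = k
    return d, sigma


d, sigma = tighten(lambda n: n + 2, Fraction(1, 2), 5)
print([d(n) for n in range(6)], sigma)  # [1, 2, 3, 4, 5, 7] 63/64
```

Starting from $e(n) = n + 2$ with $\sigma(e) = 1/2$, each round lowers one more position, and the limit of the process is the trivial code $n + 1$.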
We are now ready to prove the main theorem.
Proof of Theorem 3.1. Lemma 3.2 shows that there is a proper precode $e$ satisfying $e \prec c_n$ for all $n \in \mathbb{N}$ which is effective relative to $c$, and furthermore $\sigma(e) \le 1/2$ is also effective relative to $c$. Lemma 3.3 implies that $\log_2(1 - \sigma(e))$ is effective relative to $c$, and so we can apply Lemma 3.4 to obtain a code $d$ satisfying $d(m) \le e(m)$ for all $m \in \mathbb{N}$ which is effective relative to $c$. This clearly implies that $d \prec c_n$ for all $n \in \mathbb{N}$.

3.1. Elias's construction
The proof of Theorem 3.1 is somewhat complicated, and one wonders whether there is any simpler construction. In this section we explain Elias's construction, and show that it doesn't always produce a better code.
Elias [Eli75] defines a sequence of codes, starting with the trivial code $\alpha(n) = n + 1$. Successive codes in the sequence are defined by applying the following operation.
Definition 3.5. Let $c$ be a code. The successor code $S(c)$ is defined by [...]
Lemma 3.6. For any code $c$, $S(c)$ is a code which is effective relative to $c$.
Proof. Clearly $S(c)$ is monotone and effective relative to $c$. It also satisfies Kraft's equality: [...]
If we start with $\alpha$ and apply the operation $S$ successively, then we obtain progressively better codes. However, this phenomenon isn't universal.
Lemma 3.7. There exists an effective code $c$ such that $S(c) \not\prec c$.
The sequence is clearly monotone, and the contribution of stage $n$ to the sum in Kraft's inequality is [...] This lemma shows that Elias's construction cannot be used in place of Lemma 3.2. In the same paper, Elias also defines the ω-code, which is obtained through a diagonalization-like construction from the sequence of codes $S^{(t)}(\alpha)$. We do not know how to generalize this construction.
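To make the successor operation tangible, the following Python sketch works at the level of codeword lengths. The displayed formula for $S(c)$ did not survive in the text above, so the sketch rests on an explicit assumption: it takes $S(c)(n) = c(k) + k$ with $k = \lfloor \log_2(n+1) \rfloor$, which reproduces the known codeword lengths of Elias's γ- and δ-codes when started from $\alpha$. The names `successor` and `kraft_partial` are hypothetical.

```python
from fractions import Fraction


def alpha(n):
    """Lengths of the trivial code."""
    return n + 1


def successor(c):
    """Assumed length-level successor: S(c)(n) = c(k) + k, k = floor(log2(n+1))."""
    def s(n):
        k = (n + 1).bit_length() - 1
        return c(k) + k
    return s


def kraft_partial(c, blocks):
    """Exact Kraft sum of c over all n with floor(log2(n+1)) < blocks."""
    return sum(Fraction(1, 2 ** c(n)) for n in range(2 ** blocks - 1))


gamma = successor(alpha)   # lengths 1, 3, 3, 5, 5, 5, 5, 7, ...
delta = successor(gamma)   # lengths 1, 4, 4, 5, 5, 5, 5, 8, ...

print([gamma(n) for n in range(8)])
print([delta(n) for n in range(8)])
print(kraft_partial(gamma, 6))  # 63/64, matching the partial sum of alpha
```

Under this assumed formula, block $k$ contains $2^k$ integers and contributes $2^{-c(k)}$ to the sum, so the Kraft sum of $S(c)$ equals that of $c$.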

Existence of scale
In the preceding section, we have shown that there is no optimal sequence of codes.However, if we widen our scope by allowing uncountable sequences, such an object could perhaps be found.
Definition 4.1.A scale of codes S is a set which is well-ordered with respect to ≺ (every non-empty subset of S has a maximal element) and is cofinal in the poset of codes (for every code c there is a code d ≺ c in S).
Instead of insisting that the scale be well-ordered, we could instead ask for it to be a chain (any two elements are comparable).Standard arguments show that if such an object exists then so does a scale.
Mimicking a result of Hausdorff [Hau07], we show that a scale exists given that the continuum hypothesis (CH) holds.This follows from Theorem 3.1 using a standard argument.

Theorem 4.2.If CH holds then there exists a scale of codes.
Proof. We construct a scale $S = \{s_\alpha : \alpha \in \omega_1\}$ by transfinite induction on $\omega_1$, using the fact that the cardinality of the set of codes is $\mathfrak{c} = \aleph_1$. Fix an enumeration $(c_\alpha)_{\alpha < \omega_1}$ of all codes. At step $\alpha$, use Theorem 3.1 to construct a code $s_\alpha \prec \{s_\beta : \beta < \alpha\} \cup \{c_\alpha\}$, using the fact that $\alpha$ is countable. By construction, $S$ is well-ordered. Since $s_\alpha \prec c_\alpha$ for any code $c_\alpha$, $S$ is a scale.
We can also construct a model in which no scale exists. To that end, following a suggestion by Stefan Geschke [(ma], we add $\omega_2$ codes using Cohen's forcing. Theorem 3.1 then implies, using standard arguments, that the poset of codes has no scale. Similar arguments appear in Frankiewicz and Zbierski [FZ94, II.5], Jech [Jec06, §24] and Scheepers [Sch93]. The construction uses the concept of code prefix, which represents partial information regarding a code.
Definition 4.3. A code prefix is a finite non-decreasing sequence $c(0), \ldots, c(n)$ of natural numbers satisfying Kraft's inequality strictly, $\sigma(c) < 1$.
We say that a code (or code prefix) d extends a code prefix c if, as a sequence, c is a prefix of d.
The following lemma encapsulates all the information we need to know about codes, gleaned mainly from Theorem 3.1.
Proof. For the first item, let $c = c(0), \ldots, c(n)$ be a code prefix. We can extend $c$ to a code prefix $c(0), \ldots, c(n+1)$ in infinitely many ways. Any such extension can be extended to a code prefix $c'$ such that $\sigma(c') = 1 - 2^{-c(n+1)}$. Finally, extend $c'$ to a code by affixing $c(n+1) + 1, c(n+1) + 2, \ldots$ at its end.
For the second item, let $c = c(0), \ldots, c(r)$ be a code prefix, and let $d$ be a code. Since $c$ is monotone, $\sigma(c) = A/2^{c(r)}$ for some integer $A$, and so $\sigma(c) \le 1 - 2^{-c(r)}$. Use Theorem 3.1 (with $c_n = d$ for all $n \in \mathbb{N}$) to construct a code $e \prec d$.
We are now in a position to describe the forcing construction. The entire construction takes place inside a countable transitive model $M$ of ZFC.
Definition 4.5. A code prefix bundle is an $\omega_2$-sequence of code prefixes, only finitely many of which have non-zero length. The forcing $P$ consists of the set of code prefix bundles, ordered by $c < d$ whenever for each $\alpha < \omega_2$, $c_\alpha$ extends $d_\alpha$.
The support of a code prefix bundle $c$, denoted $\operatorname{supp} c$, is the set of $\alpha < \omega_2$ such that $c_\alpha$ has non-zero length. The support is always finite.
Lemma 4.6.The forcing P satisfies the countable chain condition: every antichain in P (a subset C ⊆ P in which any two c, d ∈ C are incompatible: there is no e ∈ P satisfying e < c and e < d) is at most countable.
Proof. Suppose that $C$ is an uncountable antichain in $P$. Since the support of any code prefix bundle is finite, the Δ-system lemma shows that there is an uncountable subset $D \subseteq C$ and a finite subset $S \subseteq \omega_2$ such that $\operatorname{supp} c \cap \operatorname{supp} d = S$ for all $c, d \in D$. For each $\alpha \in S$ there are only countably many possible code prefixes, and so since $S$ is finite, there is an uncountable subset $E \subseteq D$ such that $c_\alpha = d_\alpha$ for all $\alpha \in S$ and $c, d \in E$. However, since $\operatorname{supp} c \cap \operatorname{supp} d = S$ and $c, d$ agree on $S$ for all $c, d \in E$, all code prefix bundles in $E$ are compatible, contradicting the assumption that $C$ is an antichain.
Let $G$ be a generic filter over $P$, and construct the model $M[G]$, which contains $G$. Since $P$ satisfies the countable chain condition, the forcing preserves cardinals. In the remainder of the section, we show that $M[G]$ contains no scale of codes.
We start with some consequences of Lemma 4.4.
Proof. The first item follows directly from Lemma 4.4(a).
The second item follows from the countable chain condition. Indeed, every code $c \in M[G]$ (represented as a set of pairs $(n, c(n))$) has a nice name of the form $\{((n, m), a) : a \in A_{n,m}\}$, where each $A_{n,m} \subseteq P$ is an antichain. Lemma 4.6 shows that each $A_{n,m}$ is countable, and so $C = \bigcup_{n,m \in \mathbb{N}} A_{n,m}$ is countable. Each $a \in C$ has finite support, and so altogether the name depends on countably many coordinates of $c$.
To prove the third item, we show that given $n \in \mathbb{N}$, any code prefix bundle $f$ can be extended to a code prefix bundle $g$ that forces $c_\alpha(m) \le d(m)$ for some $m \ge n$. Let $D = \operatorname{val}(d, G)$. Using Lemma 4.4(b), we can extend $f_\alpha$ to $h_\alpha$ which satisfies $h_\alpha(m) \le D(m)$ for some $m \ge n$. The value of the prefix $D(0), \ldots, D(m)$ is forced by some code prefix bundle $k$ extending $f$. Since $d$ doesn't depend on the coordinate $\alpha$, we can assume that $k_\alpha = f_\alpha$. The code prefix bundle $g$ extends $k$ by $g_\alpha = h_\alpha$, and by construction it forces $c_\alpha(m) \le d(m)$.
Lemma 4.7 allows us to show that the bounding number of the poset of codes is $\omega_1$ while its dominating number is $\omega_2$, implying that there is no scale of codes.
Theorem 4.8. In $M[G]$ there is no scale of codes.
Proof. Let $c$ be the $\omega_2$-sequence defined by $c_\alpha = \bigcup_{f \in G} f_\alpha$. Suppose $S$ is a scale of codes. For $\alpha < \omega_1$, let $s_\alpha \in S$ satisfy $s_\alpha \prec c_\alpha$. We claim that $S' = \{s_\alpha : \alpha < \omega_1\}$ is cofinal in the poset of codes. Otherwise, there exists a code $s \in S$ such that $s \prec s_\alpha \prec c_\alpha$ for all $\alpha < \omega_1$. Yet according to Lemma 4.7(b), such a code has a name which depends only on countably many coordinates of $c$. Considering any other coordinate $\alpha < \omega_1$, Lemma 4.7(c) shows that $s \not\prec c_\alpha$.
The fact that $S'$ is cofinal contradicts Lemma 4.7(c) in a different way: according to Lemma 4.7(b), all codes in $S'$ have names depending (together) on at most $\omega_1$ coordinates of $c$. Considering any other coordinate $\alpha < \omega_2$, Lemma 4.7(c) shows that $s \not\prec c_\alpha$ for all $s \in S'$, contradicting the fact that $S'$ is cofinal. We conclude that $S$ cannot have been a scale.

Discussion
Fast-growing hierarchies. Theorem 3.1 can be used to construct a fast-growing hierarchy of effective codes. Let $\mu$ be a countable ordinal, and assign a computable fundamental sequence $(\alpha(i))_{i \in \mathbb{N}}$ to every limit ordinal $\alpha < \mu$. The fast-growing hierarchy $(c_\alpha)_{\alpha < \mu}$ is defined according to the following rules. The base case is $c_0(n) = n + 1$. For a successor ordinal $\alpha + 1$, use Theorem 3.1 to construct a code $c_{\alpha+1} \prec c_\alpha$. For a limit ordinal $\alpha$, use Theorem 3.1 to construct a code $c_\alpha$ such that $c_\alpha \prec c_{\alpha(i)}$ for all $i \in \mathbb{N}$.
Cardinal characteristics of the continuum. Section 4 shows that the existence of a scale of codes is independent of ZFC. However, a more satisfying answer will explain how this phenomenon is related to other cardinal characteristics of the continuum. Specifically, it is known that if we do not require our codes to be monotone, then the resulting poset of codes is Tukey-equivalent to the ideal of measure-zero sets [Bar10, Lemma 4.12]. Todorčević [Tod] conjectures that our poset is also Tukey-equivalent to the same ideal.
and so $e \prec c_k$. The second step of the proof of Theorem 3.1 completes the precode constructed in Lemma 3.2 to a code. Given a proper precode $e$, we construct a code $d \le e$ by pointwise decreasing $e$. The idea is as follows. Suppose that $2^{-k} \le 1 - \sigma(e) \le 2^{-k+2}$. Find the first $m$ such that $e(m) > k$, and create a new precode $e'$ by setting $e'(m) = k$ and $e'(n) = e(n)$ for $n \ne m$. Since $e(m) \ge k + 1$, the new precode satisfies $\sigma(e') \ge \sigma(e) + 2^{-k-1}$, and so $1 - \sigma(e') \le (7/8)(1 - \sigma(e))$.

Lemma 4.4. Let $c$ be a code prefix. (a) The code prefix $c$ can be extended to a code in infinitely many ways. (b) Given any code $d$ and $n \in \mathbb{N}$, the code prefix $c$ can be extended to a code prefix $b$ such that $b(m) \le d(m)$ for some $m \ge n$.
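The extension described in the proof of Lemma 4.4(a) can be sketched in Python as follows. The names `sigma`, `is_code_prefix` and `extend_to_code` are hypothetical, and the parameter `L` plays the role of the chosen value $c(n+1)$: the sketch pads the prefix with copies of `L` until the Kraft sum reaches $1 - 2^{-L}$, then appends $L+1, L+2, \ldots$, whose terms sum to the remaining $2^{-L}$.

```python
from fractions import Fraction
from itertools import count, islice


def sigma(seq):
    """Exact Kraft sum of a finite sequence of lengths."""
    return sum(Fraction(1, 2 ** l) for l in seq)


def is_code_prefix(seq):
    """Finite, non-decreasing, and Kraft's inequality holds strictly."""
    seq = list(seq)
    return all(a <= b for a, b in zip(seq, seq[1:])) and sigma(seq) < 1


def extend_to_code(prefix, L):
    """Lazily yield a code extending `prefix`, built from the length L."""
    prefix = list(prefix)
    if not is_code_prefix(prefix):
        raise ValueError("not a code prefix")
    if prefix and L < prefix[-1]:
        raise ValueError("L must be at least the last length of the prefix")
    # Number of copies of L needed to bring the Kraft sum up to 1 - 2^-L;
    # this is a non-negative integer because sigma(prefix) is a multiple of 2^-L.
    pad = int((1 - Fraction(1, 2 ** L) - sigma(prefix)) * 2 ** L)

    def lengths():
        yield from prefix
        for _ in range(pad):
            yield L
        yield from count(L + 1)  # tail L+1, L+2, ... contributes 2^-L

    return lengths()


print(list(islice(extend_to_code([1], 3), 6)))  # [1, 3, 3, 3, 4, 5]
```

Larger values of `L` give longer padded blocks, which is one way to see that a code prefix has infinitely many extensions to a code.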

Lemma 4.7. Let $c$ be the $\omega_2$-sequence defined by $c_\alpha = \bigcup_{f \in G} f_\alpha$. (a) For each $\alpha < \omega_2$, $c_\alpha$ is a code. Moreover, for $\alpha \ne \beta$, $c_\alpha \ne c_\beta$. (b) Every code in $M[G]$ has a name in $M^P$ which depends on countably many coordinates of $c$. (c) Let $d \in M^P$ be a name of a code which does not depend on $c_\alpha$. Then $\operatorname{val}(d, G) \not\prec c_\alpha$.
It remains to show that for all $k \in \mathbb{N}$, $e \prec c_k$. Given $k, t \in \mathbb{N}$, for all $n \ge r_{k+t+C}$ we have [...] Given $d_t$ and $k_t$, define $d_{t+1}$ as follows. Let $m_t$ be the minimal position for which $d_t(m_t) > k_t$. The new sequence $d_{t+1}$ is obtained from $d_t$ by setting $d_{t+1}(m_t) = k_t$ and $d_{t+1}$ [...]