Addressing Machines as models of lambda-calculus

Turing machines and register machines have been used for decades in theoretical computer science as abstract models of computation. The $\lambda$-calculus has also played a central role in this domain, as it allows one to focus on the notion of functional computation, based on the substitution mechanism, while abstracting away from implementation details. The present article starts from the observation that the equivalence between these formalisms rests on the Church-Turing Thesis rather than on an actual encoding of $\lambda$-terms into Turing (or register) machines. The reason is that these machines are not well-suited for modelling $\lambda$-calculus programs. We study a class of abstract machines that we call "addressing machines", since they are only able to manipulate memory addresses of other machines. The operations performed by these machines are very elementary: load an address in a register, apply a machine to another one via their addresses, and call the address of another machine. We endow addressing machines with an operational semantics based on leftmost reduction and study their behaviour. The set of addresses of these machines can easily be turned into a combinatory algebra. In order to obtain a model of the full untyped $\lambda$-calculus, we need to introduce a rule that bears similarities with the $\omega$-rule and the rule $\zeta_\beta$ from combinatory logic.


Introduction
In theoretical computer science several models of computation have been considered over the years, since the pioneering work of Turing [Tur36]. Turing Machines (TMs) certainly played a crucial role in the understanding of the notion of computation, while Register Machines (RMs) are better suited to represent programs executed in a von Neumann architecture [Rog87]. From a recursion-theoretic perspective, the class of partial recursive functions provides a natural description of those numeric functions that can be calculated by a mechanical device [Kle36]. In mathematical logic, the λ-calculus [Bar84] and the related formalism of combinatory logic [CF58] proved to be an inexhaustible source of inspiration for the development of formal systems, proof assistants and functional programming languages. As is well known, the basic computational mechanism of the λ-calculus is symbolic substitution.

Contents. The aim of the paper is twofold. On the one hand, we want to present the class of addressing machines and analyze their fundamental properties. This is done in Section 2, where we describe their operational semantics in two different styles: as a term rewriting system (small-step semantics) and as a set of inference rules (big-step semantics). The two approaches are shown to be equivalent in the case of addressing machines executing a terminating program (Proposition 2.16). On the other hand, we wish to construct a model of the untyped λ-calculus based on addressing machines, and study the interpretations of λ-terms. For this reason, we recall in the preliminary Section 1 the main facts about the λ-calculus, its equational theories and denotational models. It turns out that the set A of addresses, together with the operation of application previously described, is not a combinatory algebra (nor, a fortiori, a λ-model).
In Section 3 we show that it can be turned into a combinatory algebra by quotienting under an equivalence relation arising naturally from our small-step operational semantics. Two addresses are equivalent if the corresponding machines are interconvertible using a more liberal rewriting relation. From the confluence property enjoyed by this relation, we infer the consistency of the algebra (Proposition 3.12). Unfortunately, the combinatory algebra thus obtained is not yet a model of the λ-calculus: there are still β-convertible λ-terms having different interpretations. Section 5 is devoted to showing that a λ-model actually arises when adding to the system a mild form of extensionality, sharing similarities both with the ω-rule in λ-calculus [Bar71] and with the rule ζ_β from combinatory logic [HS86].

Preliminaries
We present some notions that will be useful in the rest of the article.
The λ-calculus is given by the set Λ endowed with reduction relations that turn it into a higher-order term rewriting system.
We say that a relation R ⊆ Λ² is compatible if it is compatible w.r.t. application and λ-abstraction. This means that, for all M, N, P ∈ Λ, if M R N holds then M P R N P, P M R P N and λx.M R λx.N hold as well. Moreover, we define →_βη = →_β ∪ →_η. The relations →_β, →_η and →_βη respectively generate the notions of multi-step reduction ↠_β, ↠_η, ↠_βη (resp. conversion =_β, =_η, =_βη) by taking the reflexive and transitive (and symmetric) closure.
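Although the paper keeps these notions abstract, the compatible closure of →_β can be made concrete. The sketch below is our own illustration; the de Bruijn representation and the leftmost-outermost strategy are choices we make here, not fixed by the text.

```python
# One-step beta-reduction on de Bruijn-indexed lambda-terms.
# Terms: ("var", n) | ("lam", body) | ("app", f, a)

def shift(t, d, cutoff=0):
    """Shift free indices of t by d (indices >= cutoff are free)."""
    tag = t[0]
    if tag == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if tag == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, n, s):
    """Substitution t[s/n] on de Bruijn terms."""
    tag = t[0]
    if tag == "var":
        return s if t[1] == n else t
    if tag == "lam":
        return ("lam", subst(t[1], n + 1, shift(s, 1)))
    return ("app", subst(t[1], n, s), subst(t[2], n, s))

def step(t):
    """One step of ->beta, leftmost-outermost; None if no redex exists."""
    if t[0] == "app" and t[1][0] == "lam":          # contract a beta-redex
        return shift(subst(t[1][1], 0, shift(t[2], 1)), -1)
    if t[0] == "lam":                               # compatibility with abstraction
        r = step(t[1])
        return ("lam", r) if r is not None else None
    if t[0] == "app":                               # compatibility with application
        r = step(t[1])
        if r is not None:
            return ("app", r, t[2])
        r = step(t[2])
        return ("app", t[1], r) if r is not None else None
    return None
```

For instance, `step` maps the redex (λx.x)y to y, where y is a free variable.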
Theorem 1.5 (Church-Rosser). The reduction relation →_β (resp. →_βη) is confluent.

The λ-terms are classified into solvable and unsolvable, depending on their capability of interaction with the environment.

Definition 1.6. A λ-term M is called solvable if (λx_1 … x_n.M)P_1 ⋯ P_k =_β I for some variables x_1, …, x_n and λ-terms P_1, …, P_k ∈ Λ. Otherwise, M is called unsolvable.
We say that a λ-term M has a head normal form (hnf) if it reduces to a λ-term of the shape λx_1 … x_n.yM_1 ⋯ M_k for some n, k ≥ 0. As shown by Wadsworth, a λ-term is solvable if and only if it has a hnf. (i) A λ-theory T is any congruence on Λ including β-conversion =_β.
(ii) A λ-theory T is called: • extensional, if T contains the η-conversion = η as well; • sensible, if T is consistent and equates all unsolvable λ-terms; • semi-sensible, if T does not equate a solvable and an unsolvable.
The set of all λ-theories, ordered by inclusion, forms a quite rich complete lattice. We denote by λ (resp. λη) the smallest (resp. extensional) λ-theory. Both λ and λη are consistent, semi-sensible but not sensible. A λ-theory can be introduced syntactically, or semantically as the theory of a model. The model theory of λ-calculus is largely based on the notion of combinatory algebras, and its variations (see, e.g., [Koy82,Sel02,Mey82,HLS72] and [Bar84, Ch. 5]).
(i) An applicative structure is given by A = (A, ·) where A is a set and (·) is a binary operation on A called application. We represent application as juxtaposition and assume it is left-associative, e.g., abc = (a · b) · c. An equivalence ≡ on A is a congruence if it is compatible w.r.t. application: a ≡ a′ and b ≡ b′ entail a · b ≡ a′ · b′. (ii) A combinatory algebra C = (C, ·, k, s) is an applicative structure for a signature with two constants k, s, such that k ≠ s and (∀x, y, z ∈ C): kxy = x and sxyz = xz(yz).
We say that C is extensional if the following holds: for all x, y ∈ C, (∀z ∈ C . xz = yz) implies x = y. (iii) Given a combinatory algebra C and a congruence ≡ on (C, ·), define the quotient C/≡ = (C/≡, ·, [k]_≡, [s]_≡). It is easy to check that if k ≢ s then C/≡ is a combinatory algebra.
We call k and s the basic combinators; the derived combinators i and ε are defined by i = skk and ε = s(ki). It is not difficult to verify that every combinatory algebra satisfies the identities ix = x and εxy = xy.
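The identities ix = x and εxy = xy can also be checked mechanically. The reducer below is our own illustrative sketch (term representation and function names are assumptions, not part of the formal development): it performs leftmost weak reduction of combinatory-logic terms built from K, S and variables.

```python
# Combinatory-logic terms: "K", "S", variables like "x", and
# nested 2-tuples (f, a) for application.

def rebuild(h, args):
    """Rebuild the application spine h a1 ... an."""
    for a in args:
        h = (h, a)
    return h

def step(t):
    """One leftmost weak reduction step; None if t has no head redex."""
    spine, h = [], t
    while isinstance(h, tuple):          # unwind t = h t1 ... tn
        spine.append(h[1])
        h = h[0]
    spine.reverse()
    if h == "K" and len(spine) >= 2:     # K x y -> x
        return rebuild(spine[0], spine[2:])
    if h == "S" and len(spine) >= 3:     # S x y z -> x z (y z)
        x, y, z = spine[0], spine[1], spine[2]
        return rebuild(((x, z), (y, z)), spine[3:])
    return None

def nf(t, fuel=100):
    """Reduce to normal form, with a fuel bound against divergence."""
    while fuel > 0:
        r = step(t)
        if r is None:
            return t
        t, fuel = r, fuel - 1
    raise RuntimeError("no normal form within fuel")

i = (("S", "K"), "K")                    # i = skk
eps = ("S", ("K", i))                    # eps = s(ki)
assert nf((i, "x")) == "x"               # ix = x
assert nf(((eps, "x"), "y")) == ("x", "y")   # eps x y = x y
```

The two assertions replay exactly the derived identities stated above.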
It is well known that combinatory algebras are models of combinatory logic. A λ-term M can be interpreted in any combinatory algebra C by first translating M into a term X of combinatory logic, written (M)_CL = X, and then interpreting the latter in C. However, there might be β-convertible λ-terms M, N that are interpreted as distinct elements of C. For this reason, not all combinatory algebras are actually models of λ-calculus.
The axioms of an elementary subclass of combinatory algebras, called λ-models, were expressly chosen to make coherent the definition of the interpretation of λ-terms (see [Bar84, Def. 5.2.1]). The Meyer-Scott axiom is the most important axiom in the definition of a λ-model. In the first-order language of combinatory algebras it becomes: ∀x∀y . (∀z . xz = yz) → εx = εy. The combinator ε becomes an inner choice operator that makes coherent the interpretation of an abstraction λ-term.
1.3. Syntactic λ-models. The definition of a λ-model is difficult to handle in practice because Curry's five axioms [Bar84, Thm. 5.2.5] are complicated to verify by hand. To prove that a certain combinatory algebra is actually a λ-model, it is preferable to exploit Hindley's (equivalent) notion of a syntactic λ-model. See, e.g., [Koy82].
The definition of syntactic λ-model in [Koy82] is general enough to interpret λ-terms possibly containing constants â representing elements a of a set A. We follow that tradition: we denote by Λ(A) the set of all λ-terms possibly containing constants from A, and we call them λA-terms. For instance, given a, b ∈ A, we have M = I(λx.xâ)b̂ ∈ Λ(A). All notions, notations and results from Subsection 1.1 extend to λA-terms without any problem. In particular, substitution is extended by setting â[N/x] = â, for all a ∈ A and N ∈ Λ(A). As an example, the λA-term M above reduces as follows: M →_β (λx.xâ)b̂ →_β b̂â. Observe that substitutions of variables by constants always permute. Given a set A, a valuation in A is any map ρ : Var → A. We write Val_A for the set of all valuations in A. Given ρ ∈ Val_A, x ∈ Var and a ∈ A, define ρ[x := a] as the valuation mapping x to a and every other variable y to ρ(y). The precise correspondence between λ-models and syntactic λ-models is described in [Bar84], Theorem 5.3.6. For our purposes, it is enough to know that if S is a syntactic λ-model then C_S is a λ-model. We say that S is extensional whenever C_S is extensional as a combinatory algebra. This holds iff Th(S) is extensional iff S |= I = 1.
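Concretely, the update ρ[x := a] can be rendered as follows; this is an illustrative sketch of ours, with valuations as finite maps.

```python
# Valuations rho : Var -> A, represented as Python dicts.

def update(rho, x, a):
    """rho[x := a]: map x to a and agree with rho on every other variable."""
    new = dict(rho)     # shallow copy: the original valuation is untouched
    new[x] = a
    return new

rho = {"x": 1, "y": 2}
rho2 = update(rho, "x", 7)
assert rho2["x"] == 7 and rho2["y"] == 2   # only x changed
assert rho["x"] == 1                       # rho itself is unmodified
```

Returning a fresh map mirrors the mathematical convention that ρ[x := a] is a new valuation rather than a destructive update of ρ.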

Addressing Machines
In this section we introduce the notion of an addressing machine. We first provide some intuitions, then we proceed with the formal description of such machines. The general structure of an addressing machine is composed of two substructures: • the internal components, namely a finite number of internal registers and an internal program; • the input-tape. As the name suggests, the addressing mechanism is central in this formalism. Each addressing machine is associated with an address, receives a list of addresses on its input-tape, and is able to transfer the computation to another machine by calling its address, possibly extending its input-tape.
2.1. Tapes, Registers and Programs. We consider fixed a countable set A of addresses, together with a constant ∅ ∉ A that we call "null" and that corresponds to the content of an uninitialized register.
(i) An A-valued tape T is a finite (possibly empty) ordered list of addresses T = [a_1, …, a_n] with a_i ∈ A for all i ≤ n. We write T_A for the set of all A-valued tapes. (ii) Let a ∈ A and T, T′ ∈ T_A. We denote by a :: T the tape having a as first element and T as tail. We write T @ T′ for the concatenation of T and T′, which is an A-valued tape itself. (iii) Given an index i ∈ N, an A_∅-valued register R_i is a memory cell capable of storing either ∅ or an address a ∈ A. (iv) Given A_∅-valued registers R = R_0, …, R_n for n ≥ 0, an address a ∈ A and an index i ∈ N, we write R[R_i := a] for the registers R where the value of R_i has been updated, namely R_0, …, R_{i−1}, a, R_{i+1}, …, R_n. Notice that, whenever i > n, we assume that R[R_i := a] = R.
Addressing machines can be seen as having a RISC architecture, since their internal program is composed of only three instructions. We describe the effects of these basic operations on a machine having r internal registers R_0, …, R_{r−1}. Accordingly, when we say "if an internal register R_i exists" we mean that the condition 0 ≤ i < r is satisfied. In the following, i, j, k ∈ N correspond to indices of internal registers: • Load i: corresponds to the action of reading the first element a from the input-tape T and writing a in the internal register R_i. If the input-tape is empty then the machine remains stuck waiting for an input (however, this is not considered an error state). The precondition for executing the operation is that the input-tape is non-empty, namely T = a :: T′; the postconditions are that R_i, if it exists, contains the address a and the input-tape of the machine becomes T′. If R_i does not exist, i.e. when i ≥ r, the contents of the registers remain unchanged (the input element a is read and subsequently thrown away). • k App(i, j): corresponds to the action of reading the contents of R_i and R_j, calling an external application map on the corresponding addresses a_1, a_2, and writing the result in the internal register R_k, if it exists. The precondition is that R_i, R_j exist and are initialized, i.e. R_i, R_j ≠ ∅. The postcondition is that R_k, if it exists, contains the address of the machine with address a_1 whose input-tape has been extended with a_2; otherwise the contents of the registers remain unchanged. • Call i: transfers the computation to the machine whose address is stored in R_i, extending its input-tape with the addresses that are left in T. The precondition is that R_i exists and is initialized. The postcondition is that the machine having the address stored in R_i is executed on the extended input-tape.
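As an illustration (ours, not part of the formal development), the effect of the three instructions can be rendered as a one-step transition function. To keep the sketch self-contained we identify an address with the machine it denotes, leaving the external application map implicit; a machine is a triple (registers, program, tape), with None playing the role of ∅.

```python
def write(R, i, a):
    """R[R_i := a]: writes to a nonexistent register are discarded."""
    if i >= len(R):
        return list(R)
    new = list(R)
    new[i] = a
    return new

def head_step(m):
    """One head-step, or None when m is in a final state (possibly stuck)."""
    R, P, T = m
    if not P:
        return None                           # empty program: final state
    op = P[0]
    if op[0] == "load":                       # Load i
        if not T:
            return None                       # empty tape: stuck, not an error
        return (write(R, op[1], T[0]), P[1:], T[1:])
    if op[0] == "app":                        # k App(i, j)
        k, i, j = op[1], op[2], op[3]
        Ra, Pa, Ta = R[i]                     # validity guarantees R_i, R_j != None
        return (write(R, k, (Ra, Pa, Ta + [R[j]])), P[1:], T)
    if op[0] == "call":                       # Call i
        Ra, Pa, Ta = R[op[1]]
        return (Ra, Pa, Ta + list(T))         # control moves to the called machine

# Load 0 moves the first tape element into register R_0:
m = ([None], [("load", 0)], ["a"])
assert head_step(m) == (["a"], [], [])
```

The instruction encoding ("load", "app", "call" tags) is our own assumption; only the pre/postconditions mirror the description above.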
We now define what a syntactically valid program of this language is, and introduce a decision procedure verifying that the preconditions of each instruction are satisfied when it is executed. As we will see in Lemma 2.5, these properties are decidable and statically verifiable. As a consequence, addressing machines never give rise to run-time errors.
(i) A program P is a finite list of instructions generated by the following grammar (where ε represents the empty string, and i, j, k ∈ N): P ::= Load i; P | A, with A ::= k App(i, j); A | C and C ::= Call i | ε. In other words, a program starts with a list of Load's, continues with a list of App's and possibly ends with a Call. Each of these lists may be empty; in particular the empty program ε can be generated. (ii) Given a program P, an r ∈ N, and a set I ⊆ {0, …, r − 1} of indices (representing initialized registers), define I |=_r P as the least relation closed under the following rules: the empty program is valid for every I; Load i; P is valid for I whenever P is valid for I ∪ ({i} ∩ {0, …, r − 1}); k App(i, j); P is valid for I whenever i, j ∈ I and P is valid for I ∪ ({k} ∩ {0, …, r − 1}); Call i is valid for I whenever i ∈ I. We say that a program P is valid with respect to R whenever I |=_r P holds for I = {i < r | R_i ≠ ∅}. Notice that the notion of a valid program is independent of the tape of a machine.
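In the spirit of the rules above and of Lemma 2.5, the judgement I |=_r P can be decided by a single scan of P that tracks the set I of initialized registers. The encoding of instructions below is our own assumption.

```python
def valid(P, r, I):
    """Decide I |=_r P, where I contains the indices of initialized registers."""
    I = set(I)
    for n, instr in enumerate(P):
        op = instr[0]
        if op == "load":
            if instr[1] < r:
                I = I | {instr[1]}            # R_i becomes initialized, if it exists
        elif op == "app":
            k, i, j = instr[1], instr[2], instr[3]
            if i not in I or j not in I:      # reading a null/nonexistent register
                return False
            if k < r:
                I = I | {k}
        elif op == "call":
            return instr[1] in I and n == len(P) - 1   # Call must end the program
        else:
            return False
    return True

# r = 4 and registers R_1, R_2 initialized (cf. Examples 2.3):
assert valid([("load", 0), ("call", 0)], 4, {1, 2})
assert valid([("load", 5), ("call", 1)], 4, {1, 2})            # write to R_5 discarded
assert not valid([("app", 0, 0, 3), ("call", 0)], 4, {1, 2})   # R_0, R_3 uninitialized
```

The checker enforces the preconditions of each instruction and the "Call comes last" discipline; the Load*/App* ordering of the grammar itself is assumed of its input.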
Examples 2.3. Consider addresses a_1, a_2 ∈ A, as well as A_∅-valued registers R_0 = ∅, R_1 = a_1, R_2 = a_2, R_3 = ∅ (so r = 4). In this example, the set of initialized registers is I = {1, 2}. Above we use "5" as the index of a nonexistent register. Notice that a program trying to update a nonexistent register remains valid (see P_2, P_3): the new value is simply discarded. On the contrary, an attempt at reading the contents of an uninitialized (P_4, P_5) or nonexistent (P_6) register invalidates the whole program.
Lemma 2.5. For all A ∅ -valued registers R and program P it is decidable whether P is valid with respect to R.
Proof. Decidability follows from the fact that the grammar in Definition 2.2(i) is right-linear, the list of registers R is finite, the rules in Definition 2.2(ii) are syntax-directed and their side conditions are decidable.
2.2. Addressing machines and their operational semantics. Everything is in place to introduce the definition of an addressing machine. Thanks to Lemma 2.5 it is reasonable to require that an addressing machine has a valid internal program.
Definition 2.6. (i) An addressing machine M (with r registers) over A is given by a tuple M = ⟨R_0, …, R_{r−1}, P, T⟩, where each R_i is an A_∅-valued register, P is a program valid with respect to R_0, …, R_{r−1}, and T ∈ T_A is its input-tape. (ii) We write M.R, M.P, M.T and M.r for the registers, program, input-tape and number of registers of M, respectively, and M_A for the set of all addressing machines over A.

Examples 2.7. The following are addressing machines.
(i) For every n ∈ N, define an addressing machine with n + 1 registers as x_n = ⟨∅, …, ∅, ε, []⟩, with all registers uninitialized and an empty program. We call x_0, x_1, x_2, … indeterminate machines because they share some analogies with variables (they can be used as place holders). (ii) The addressing machine K with 1 register R_0 is defined by K = ⟨∅, K.P, []⟩, where K.P = Load (0, −); Call 0; here Load (i_1, …, i_n) abbreviates Load i_1; …; Load i_n, and "−" stands for the index of a nonexistent register, so that the second input is read and discarded. (iii) The addressing machine S with 3 registers is defined by S = ⟨∅, ∅, ∅, S.P, []⟩, where: S.P = Load (0, 1, 2); 0 App(0, 2); 1 App(1, 2); 2 App(0, 1); Call 2 (iv) Assume that k ∈ A represents the address associated with the addressing machine K.
Using k, one then defines the addressing machine I, as well as the addressing machine D with 1 register. We now enter into the details of the addressing mechanism, which constitutes the core of this formalism. In an implementation of addressing machines, it would be reasonable to pick a fresh address from A whenever a new machine is constructed and save the correspondence in some address table. See Section 6 for more implementation details. To construct a λ-model, we need a uniform way of associating machines with their addresses.
Definition 2.8. Fix a bijective map # : M A → A from the set of all addressing machines over A to the set A of addresses. We call the map #(·) an Address Table Map (ATM).
(i) Given M ∈ M A , we say that #M is the address of M.
(ii) Given an address a ∈ A, we write # −1 (a) for the unique machine having address a.
In other words, we have #^{-1}(a) = M ⟺ #M = a. (iii) Given M ∈ M_A and T ∈ T_A, we write M @ T for the machine M where T has been appended at the end of its input-tape. (iv) The application of two addresses a, b ∈ A is defined by a · b = #(#^{-1}(a) @ [b]). That is, the application of a to b is the unique address c of the addressing machine obtained by adding b at the end of the input-tape of the addressing machine #^{-1}(a).
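A toy rendering of an ATM and of the induced application may clarify the mechanism (this is our own sketch; a real implementation is discussed in Section 6). Integer addresses are allocated on demand and memoized, so the association between machines and addresses stays bijective.

```python
# address -> machine and machine-key -> address, in one table.
table = {}
next_addr = 0

def addr(m):
    """#M: allocate (or look up) the address of machine m."""
    global next_addr
    key = repr(m)                      # canonical key for a machine triple
    if key not in table:
        table[key] = next_addr
        table[next_addr] = m
        next_addr += 1
    return table[key]

def deref(a):
    """#^{-1}(a): the unique machine with address a."""
    return table[a]

def app(a, b):
    """a . b = #( #^{-1}(a) @ [b] ): append b to the tape, take the address."""
    R, P, T = deref(a)
    return addr((R, P, T + (b,)))

m = ((), (), ())                       # a machine with no registers, program, tape
a = addr(m)
c = app(a, a)                          # the machine at c has tape (a,)
assert deref(c) == ((), (), (a,))
assert app(a, a) == c                  # same machine, same address
```

Memoizing on the machine itself is what makes `addr` injective here; no computation happens when two addresses are applied, exactly as in the definition above.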
Since both M_A and A are countable sets, there exist 2^{ℵ_0} possible choices for an ATM. Therefore, in general, the process of recursively dereferencing the addresses stored in the registers (or tape) of a machine might not terminate. This kind of behaviour is not pathological, but rather intrinsic to the notions of addresses and dereference operators.
In practice, one may desire to work with an ATM performing the association between addressing machines and their addresses in a computable way. However, we do not require our ATMs to satisfy any effectiveness conditions since it would be peculiar to propose a model of computation depending on a pre-existing notion of "computable". The results presented in this paper are independent from the ATM under consideration.
Definition 2.10 (Small-step operational semantics). Define a reduction strategy →_h on addressing machines, representing one head-step of computation, as the least relation closed under the following rules: As usual, we write ↠_h for the transitive-reflexive closure of →_h. We say that an addressing machine M reduces to a stuck machine N, written M ↠_h stuck(N), whenever M ↠_h N and N is stuck. Similarly, M ̸↠_h stuck() means that M never reduces to a stuck addressing machine.
Remark 2.11. (i) Definition 2.10 is well defined since the validity of a program is preserved by h-reduction: this follows immediately from Definition 2.2(ii). In particular, when executing Call i, R_i must be initialized, and when executing k App(i, j) we must have R_i, R_j ≠ ∅. (ii) Addressing machines in a final state are either of the form ⟨R, ε, T⟩ or ⟨R, Load i; P, []⟩; in the latter case they are stuck.
Lemma 2.12. The reduction strategy →_h enjoys the following properties: Proof. (i) The applicable rule from Definition 2.10, if any, is uniquely determined by the first instruction of M.P and by the input-tape M.T, so →_h is deterministic.
(ii) Easy. By cases on the rule applied for deriving M → h N.
Examples 2.13. For brevity, we sometimes display only the first instruction of the internal program. Take a, b, c ∈ A.
(i) We show that K behaves as the first projection. (ii) We verify that S behaves as the combinator S from combinatory logic. (iv) Finally, we check that O gives rise to an infinite reduction sequence.

Similarly, we can define a big-step operational semantics relating an addressing machine M with its final result (if any).

Example 2.15. Recall that K.P = Load (0, −); Call 0. Notice that we cannot prove K @ [a, b] ⇓ #^{-1}(a) for an arbitrary a ∈ A, as we need to ensure that the resulting machine is in a final state. For this reason, we use the indeterminate machines x_1, x_2 from Example 2.7(i).
We now show that the two operational semantics are equivalent on terminating computations.
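The calculations of Examples 2.13 can be replayed mechanically. The evaluator below is our own compact sketch: as before, an address is identified with the machine it denotes, and the encodings of K, S and the indeterminate machines follow Example 2.7 (with Load − rendered as a Load on a nonexistent register index).

```python
def write(R, i, a):
    """R[R_i := a] on tuple-registers; out-of-range writes are discarded."""
    return R if i >= len(R) else R[:i] + (a,) + R[i + 1:]

def step(m):
    """One ->h step; None when m is in a final state (possibly stuck)."""
    R, P, T = m
    if not P:
        return None
    op = P[0]
    if op[0] == "load":
        if not T:
            return None                              # stuck
        return (write(R, op[1], T[0]), P[1:], T[1:])
    if op[0] == "app":
        k, i, j = op[1:]
        Ra, Pa, Ta = R[i]
        return (write(R, k, (Ra, Pa, Ta + (R[j],))), P[1:], T)
    if op[0] == "call":
        Ra, Pa, Ta = R[op[1]]
        return (Ra, Pa, Ta + T)

def run(m):
    while (n := step(m)) is not None:
        m = n
    return m

def x(n):
    """Indeterminate machine: n+1 uninitialized registers, empty program."""
    return ((None,) * (n + 1), (), ())

def at(m, args):                                     # M @ T
    R, P, T = m
    return (R, P, T + tuple(args))

K = ((None,), (("load", 0), ("load", 1), ("call", 0)), ())   # Load 1 discards arg 2
S = ((None, None, None),
     (("load", 0), ("load", 1), ("load", 2),
      ("app", 0, 0, 2), ("app", 1, 1, 2), ("app", 2, 0, 1), ("call", 2)),
     ())

assert run(at(K, [x(0), x(1)])) == x(0)              # K a b ->>h a
# S a b c ->>h (a c)(b c), i.e. the machine x0 @ [x2, x1 @ [x2]]:
assert run(at(S, [x(0), x(1), x(2)])) == at(x(0), [x(2), at(x(1), [x(2)])])
```

Running K on two indeterminates indeed yields the first one in a final state, matching the big-step derivation of Example 2.15.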
Proposition 2.16. For M, N ∈ M A , the following are equivalent: Proof.
(1 ⇒ 2) By induction on the length n of the reduction M ↠_h N.

In this section we show how to construct a combinatory algebra based on the addressing-machines formalism. Recall that the addressing machines K and S have been defined in Example 2.7. Consider the algebraic structure A = (A, ·, #K, #S). Since the application (·) is total, A is an applicative structure. However, it is not a combinatory algebra. For instance, the λA-term Kâb̂ is interpreted as the address of the machine K @ [a, b], which is a priori different from the address a because no computation is involved. Therefore, we need to quotient the algebra A by an equivalence relation equating at least all addresses corresponding to the same machine at different stages of the execution.
In the following, we denote by ≡_R an arbitrary binary relation on M_A. The symbol R has no formal meaning; it is simply evocative of a relation. In the next definition, we associate with every ≡_R two relations, respectively denoted ∼_R ⊆ A² and =_R ⊆ M_A². Definition 3.1. Every binary relation ≡_R ⊆ M_A² on addressing machines induces a relation ∼_R on addresses, given by a ∼_R b ⟺ #^{-1}(a) ≡_R #^{-1}(b), which is then extended to: (i) A_∅-valued registers: R ∼_R R′ if either R = R′ = ∅ or R, R′ store ∼_R-related addresses; (ii) tuples: (a_1, …, a_n) ∼_R (b_1, …, b_m) ⟺ (n = m) ∧ (∀i ∈ {1, …, n} . a_i ∼_R b_i) (this also applies to tuples of A_∅-valued registers); (iii) A-valued tapes: [a_1, …, a_n] ∼_R [b_1, …, b_m] ⟺ (a_1, …, a_n) ∼_R (b_1, …, b_m) (seen as tuples). Finally, set M =_R N whenever M.P = N.P, M.R ∼_R N.R and M.T ∼_R N.T. In particular, M =_R N entails that M and N share the same internal program, the same number of internal registers, and the same length of their input-tapes.
Lemma 3.2. If the relation ≡_R is an equivalence then so are ∼_R and =_R.
Proof. Assume that ≡_R is an equivalence. Then, the fact that ∼_R is an equivalence follows from its definition, since #^{-1}(·) is a bijection. Concerning the relation =_R, reflexivity, symmetry and transitivity follow immediately from the same properties of ∼_R and =.
Definition 3.3. Define ≡_A ⊆ M_A² as the least equivalence such that M ↠_h Z =_A N entails M ≡_A N. We say that M, N are evaluation equivalent whenever M ≡_A N.

Examples 3.5. From the calculations in Examples 2.13, it follows that #K · a · b ∼_A a and #S · a · b · c ∼_A (a · c) · (b · c), for all a, b, c ∈ A.

Lemma 3.6. The relation ∼_A is a congruence on A = (A, ·, #K, #S).
Proof. By definition ≡_A is an equivalence, whence so is ∼_A by Lemma 3.2. Let us check that ∼_A is compatible w.r.t. (·). Consider a ∼_A a′ and b ∼_A b′. Call M = #^{-1}(a) and N = #^{-1}(a′) and proceed by induction on a derivation of M ≡_A N, splitting into cases depending on the last applied rule.
(Eval) By definition, there exists Z ∈ M_A such that M ↠_h Z =_A N; we then conclude using Lemma 2.12(ii). The cases (Transitivity) and (Symmetry) follow from the induction hypothesis.
In order to prove that the congruence ∼_A is non-trivial, we are going to characterize the equivalence M ≡_A N in terms of confluent reductions. For this purpose, we extend →_h in such a way that reductions are also possible within registers and elements of the input-tape of an addressing machine.
Definition 3.7. Define the reduction relation →_c ⊆ M_A² as the least relation containing →_h and closed under the following rules: if #^{-1}(a_i) →_c M then ⟨R, P, [a_0, …, a_n]⟩ →_c ⟨R, P, [a_0, …, a_{i−1}, #M, a_{i+1}, …, a_n]⟩ (→_{T_i}), and analogously for the contents of a register (→_{R_i}). We write M →_i N if N is obtained from M by directly applying one of the above rules; this is called an inner step of computation. The transitive and reflexive closures of →_c and →_i are denoted by ↠_c and ↠_i, respectively.
Morally, the term rewriting system (M_A, →_c) is orthogonal because (i) the reduction rules defining →_c are non-overlapping, since →_h is deterministic, (→_{R_i}) reduces a register and (→_{T_i}) reduces one element of the tape; and (ii) the left-hand sides of the rules are linear, as no equality among subterms is required. Now, it is well known that orthogonal TRSs are confluent, but one cannot apply [Ter03, Thm. 4.3.4] directly, since we are not exactly dealing with first-order terms (because of the presence of the address encoding).

Proposition 3.12. A/∼_A is a non-extensional combinatory algebra.

Proof. From the calculations in Example 3.5, it follows that #K · a · b ∼_A a and #S · a · b · c ∼_A (a · c) · (b · c) hold, for all a, b, c ∈ A. Notice that both addressing machines K and S are stuck, and K ≠_A S since, e.g., K.r ≠ S.r. By Theorem 3.11, we get #K ≁_A #S, whence A/∼_A is a combinatory algebra.
To check that A/∼_A is not extensional, it is sufficient to exhibit two elements of A that are extensionally equal but distinguished modulo ∼_A. For instance, take #K · a and #K′ · a, where a ∈ A is arbitrary and K′ is a different implementation of the combinator K, with 2 registers. For all a, b ∈ A, easy calculations give #K′ · a · b ∼_A a. Thus, for all b ∈ A, we have #K · a · b ∼_A a ∼_A #K′ · a · b, whence the two addresses #K · a and #K′ · a are extensionally equal elements of A/∼_A. However, the corresponding addressing machines are both stuck and K @ [a] ≠_A K′ @ [a], because 1 = (K @ [a]).r ≠ (K′ @ [a]).r = 2. Since they cannot have a common reduct, we derive K @ [a] ≢_A K′ @ [a] by Theorem 3.11. We conclude that #K · a ≁_A #K′ · a. Similarly, consider I and S @ [#K · #I, #I] ↠_h stuck(⟨#K · #I, #I, ∅, Load 2; ⋯, []⟩). These two machines are both stuck and different modulo =_A since, e.g., the contents of their register R_1 are #I and #K respectively, and it is easy to check that #I ≁_A #K. By Theorem 3.11, we conclude that #S · (#K · #I) · #I ≁_A #I.

Lambda Models via Applicative Equivalences
In the previous section we have seen that the equivalence ∼_A, and thus ≡_A, is too weak to give rise to a model of λ-calculus (Lemma 3.13). The main problem is that a λ-term λx.M is represented as an addressing machine performing a "Load" (to read x from the tape) before evaluating the addressing machine corresponding to M. Since nothing is applied, the tape is empty and the machine gets stuck, thus preventing the evaluation of the subterm M. In order to construct a λ-model we introduce the equivalence ∼^ae_A below.

Definition 4.1. Define the relation ≡^ae_A as the least equivalence satisfying the rule (ae). We say that M and N are applicatively equivalent whenever M ≡^ae_A N. Recall that ∼^ae_A and =^ae_A are defined in terms of ≡^ae_A as described in Definition 3.1. Also in this case, it is easy to check that =^ae_A ⊆ ≡^ae_A holds.

Remark 4.2. The rule (ae) shares similarities with the (ω)-rule in λ-calculus [Bar84, Def. 4.1.10], although it is more restricted, being only applicable to addressing machines that eventually become stuck. In particular, both rules have countably many premises, therefore a derivation of M ≡^ae_A N is a well-founded ω-branching tree (in particular, the tree is countable and there are no infinite paths). Techniques for performing induction "on the length of a derivation" in this kind of system are well established, see e.g. [Bar71, IS06]. More details about the underlying ordinals will be given in Section 5.
Examples 4.3. Convince yourself of the following facts.
(i) As seen in the proof of Lemma 3.13, I and S @ [#K · #I, #I] both reduce to stuck machines. For all a ∈ A, applying both machines to a yields evaluation-equivalent results. By (ae), they are applicatively equivalent.
(ii) The following rule (cong) is derivable. (iii) Therefore, ∼^ae_A is a congruence on A = (A, ·, #K, #S). Proof. (i) By induction on a proof of M ≡^ae_A N, proceeding by cases on the last applied rule. (ii) By (i), concluding by transitivity.
(iii) By Lemma 3.2, ∼^ae_A is an equivalence and, by (ii), a congruence.
We need to show that the congruence ∼^ae_A is non-trivial, and that the addresses #K and #S remain distinguished modulo ∼^ae_A; this is the content of Lemma 4.5. (ii) The proof of this item is the topic of Section 5. (iii) By (i), the relation is non-empty. By (ii), x_i ≡^ae_A x_j if and only if i = j, whence there are infinitely many distinct equivalence classes.
(iv) From Example 2.13, we get: For these machines to be ≡^ae_A-equivalent, the former machine should reduce to x_1, by (ii), which is impossible since ⟨#x_1, Load −; Call 0, []⟩ is stuck.
Remark 4.7. Let n ∈ N and T = [a_1, …, a_n] ∈ T_A.

From now on, whenever writing |M|_x⃗, we assume that FV(M) ⊆ x⃗. The following are basic properties of the interpretation map defined above.
(iii) We proceed by structural induction on M. By (ii), if x⃗ ≠ ∅ then both addressing machines reduce to stuck ones, so we can test the applicative equivalence by applying an arbitrary a and concluding using (ae) n times.
Case M = ĉ. Then ĉ[b̂/y] = ĉ. Case M = y. Then y[b̂/y] = b̂. Case M = λz.P, wlog z ∉ y⃗, x⃗, so (λz.P)[b̂/y] = λz.P[b̂/y]. By (ii) both machines reduce to stuck ones, so we have to apply an extra a_{n+1} ∈ A.

Consistency Proof via Ordinal Analysis
In this section we adapt Barendregt's proof of consistency of λω (the least λ-theory closed under the (ω)-rule) to prove Lemma 4.5(ii), which entails the consistency of our system. First, we need to introduce in our setting the notions of context and underlined reduction, which are ubiquitous techniques in the area of term rewriting systems.

Contexts and Underlined Head Reductions.
In λ-calculus a context is a λ-term possibly containing occurrences of an algebraic variable, called the hole, which can be substituted by any λ-term, possibly with capture of free variables. We define context-machines similarly, namely as addressing machines possibly having a "hole" denoted by ξ. Formally, we introduce ξ as a new machine having no registers or program, only an empty tape (therefore distinct from all machines populating M_A). We then extend our formalism to include machines working either directly or indirectly with one or more occurrences of ξ. We wish to ensure the invariant that a machine M with no occurrences of ξ keeps #M as its address; for this reason we extend the range of addresses in a conservative way.
Consider a countable set B of addresses such that A ∩ B = ∅, and write X = A ∪ B for the set of extended addresses. As usual, we set X_∅ = X ∪ {∅}. Definition 5.1.
The number of occurrences of ξ in an extended machine X, written occ_ξ(X), has been defined to handle the fact that recursively dereferencing all the addresses contained in an extended addressing machine might result in a non-terminating process (see Remark 2.9). As previously mentioned, a key property of contexts in λ-calculus is that one can plug a λ-term into the hole and obtain a regular λ-term. Similarly, given M ∈ M_A and X ∈ M^ξ_X, we can define the addressing machine X[M] obtained from X by recursively substituting (even in the registers/tapes) each occurrence of ξ by M. However, this operation is well defined only when occ_ξ(X) is finite, so we focus on extended machines enjoying this property.
(i) A context-machine is any C ∈ M^ξ_X satisfying occ_ξ(C) ∈ N. (ii) Given a context-machine C and M ∈ M_A, define the addressing machine C[M] componentwise: assuming a ∈ X and T = [a_1, …, a_n] ∈ T_X with occ_ξ(a :: T) ∈ N, we set ξ[M] = M and T[M] = [a_1[M], …, a_n[M]], and similarly for registers. In the following, when writing C[M] (resp. a[M], R_i[M], T[M]) we silently assume that the number of occurrences of ξ in C (resp. a, R_i, T) is finite. Let us introduce a notion of reduction for context-machines that allows us to mimic the underlined reduction from [Bar71]. The idea is to decompose a machine N as N = C[M], where C is a context-machine and M the underlined sub-machine. It is then possible to reduce C independently from M until either the machine reaches a final state or ξ reaches the head position. In the latter case, we substitute the head occurrence of ξ by M and continue the computation.
(i) The head reduction →_h is generalized to extended machines in the obvious way, using the extended address map to compute the addresses. In particular, a machine of the form ξ @ T is in a final state, but it is not stuck. Lemma 5.6. For C, C′ ∈ M^ξ_X and M, N ∈ M_A, the following are equivalent: Proof. (1 ⇒ 2) Subcase C = ⟨R, P, T⟩: by case analysis on P; all cases follow easily from the induction hypothesis. (2 ⇒ 1) By induction on the length n of the reduction C[M] ↠_h C′. Subcase C = ⟨R, P, T⟩: by case analysis on P; all cases follow easily from the induction hypothesis.
Consider now a scenario where C[M] ↠_h C′[M]. Assuming M ≈_α N, one might expect that C[N] ↠_h C′[N] also holds. In general, this is not the case, because M and N might reach the head position and get control of the computation. Using the underlined (head-)reduction from Definition 5.4(ii) we can substitute N for M along the reduction (when it comes into head position) and construct a proof of C[N] ≡_γ C′[N] having a lower ordinal γ < α.
Lemma 5.9. Let α > 0, C ∈ M^ξ_X and M, N ∈ M_A such that M ≈_α N. If C →^M_h C′ and C[M] ̸↠_h stuck(), then there exists γ < α such that C[N] ≡_γ C′[N]. Proof. By cases on the shape of C.
Case C = ξ @ T for some T ∈ T_X, with C′ = M @ T. From M ≈_α N and Lemma 5.8(v), we get that M ↠_h M′ with stuck(M′), for some M′ ∈ M_A. Since C[M] = M @ (T[M]) cannot reduce to a stuck addressing machine, we must have T[M] ≠ []. In other words, T = [a_0, . . . , a_n] for some n ≥ 0. Notice that, for every element a_i of T, we have a_i[N] ∈ A (by construction). By Lemma 5.8(v), there exists γ < α such that N @ [a_0[N]] ≡_γ M @ [a_0[N]]. By definition, C[N] = N @ (T[N]), so from this we construct a proof of C[N] ≡_γ C′[N]. In all the other cases, C[N] →_h C′[N], therefore C[N] ≡_0 C′[N].
Corollary 5.10. Let n ∈ N, α > 0, C ∈ M_ξ^X and M, N ∈ M_A. If C[M] ↠_h x_n and M ≈_α N, then there exists γ < α such that C[N] ≡_γ x_n.
Proof. Assume C[M] ↠_h x_n. Equivalently, by Lemma 5.6, we have C ↠_h x_n. By definition, there exist C_1, . . . , C_k ∈ M_ξ^X through which this reduction factors. Notice that C_i[M] ↠_h x_n and, since ¬stuck(x_n), C_i[M] does not reduce to a stuck machine. By Lemma 5.9, there exist γ_1, . . . , γ_k < α such that C_i[N] ≡_{γ_i} C_{i+1}[N]. By transitivity (Tr_α) and (≤_α) we obtain C[N] ≡_γ x_n for γ = sup_i γ_i.
Proposition 5.11. Let M, N ∈ M_A, α ∈ ω_1 and n ∈ N. If M ≡_α N and N ↠_h x_n, then M ↠_h x_n. Proof. We proceed by induction on α. Since we perform a double induction, the induction hypothesis with respect to this induction is called the α-IH (α-inductive hypothesis).
Case α = 0. By Lemma 5.8(ii), we get M ≡_ae N with N ↠_h x_n, so we conclude M ↠_h x_n by confluence (Theorem 3.11) and →_i-postponement (Lemma 3.8).
Case α > 0. By Lemma 5.8(iii), there exist Z_1, . . . , Z_k such that M ∼_α Z_1 ∼_α · · · ∼_α Z_k = N. (5.1)

By induction on k, we prove that (5.1) implies M ↠_h x_n. We call this the k-IH. Subcase k = 0. Then M = N ↠_h x_n and we are done. Subcase k > 0. From the k-IH we derive Z_1 ↠_h x_n. From M ∼_α Z_1 and Lemma 5.8(iv), there is a context-machine C such that M = C[M′] and Z_1 = C[N′] with M′ ≈_α N′ and C[N′] ↠_h x_n. By applying Corollary 5.10 we obtain C[M′] ≡_γ x_n for some γ < α. We conclude by applying the α-IH.

Conclusions and Further Works
In this paper, we have shown that it is possible to obtain a model of the untyped λ-calculus based on a class of computational machines that operate exclusively on "addresses", without any reference to basic data-types. The result only depends on the assumption that every machine has a unique address (and, vice versa, every address identifies a machine) and is completely independent of the specific nature of the addresses themselves.
A natural question that can be raised is whether addressing machines can be seen as a representation of Combinatory Logic's operational semantics in disguise, since their instructions essentially incorporate the contents of the rewriting rules of the basic combinators. To correct this simplistic point of view, observe that the address table map is an arbitrary bijection, whence there are uncountably many possible choices. In particular, address table maps may have arbitrary computational complexity. By contrast, the operational semantics of Combinatory Logic is constrained to work with the subterms of the current term, i.e. it uses a very "narrow" address table map. In future work, we plan to investigate what possibilities arise from the extra degree of freedom given by the arbitrary nature of this map.
We would like to explore whether the theory of the λ-model S defined in Section 5 depends on the specific nature of the bijection #(·) : A → M_A. As discussed in Remark 2.9, certain ATMs display some peculiarities, since they may create infinite chains of references morally representing infinitary objects. In fact, given an ATM #(−) and an injection f : N → A, a simple application of Hilbert's Hotel allows one to define a new ATM #′(−) in which there exist machines (M_n^f)_{n∈N} satisfying M_n^f = ⟨f(n), ε, [#′(M_{n+1}^f)]⟩. However, these machines are not λ-definable, whence they should simply constitute non-definable "junk" from the model-theoretic perspective. Therefore, we conjecture that Th(S) is actually independent of the choice of the lookup function #(·). In case of a positive answer, it would be interesting to provide a complete characterization of the associated λ-theory.
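The Hilbert's Hotel construction above can be made concrete. The sketch below is only an illustration under assumptions of ours: we take A = N, relocate the original table to the even addresses, and reserve the odd addresses f(n) = 2n + 1 for the infinite chain; the names `old_table` and `new_table`, and the stand-in machine returned by `old_table`, are hypothetical, and the exact triple layout mirrors the ⟨f(n), ε, [#′(M_{n+1}^f)]⟩ shape from the text without mattering for the point being made.

```python
def f(n):
    """An injection f : N -> A picking fresh (odd) addresses for the chain."""
    return 2 * n + 1

def old_table(a):
    """Stand-in for the original ATM #(-): some fixed machine at every address."""
    return ((), (), [])

def new_table(a):
    """The modified ATM #'(-): even addresses hold a relocated copy of the
    original table; odd address f(n) holds the chain machine M_n, whose tape
    references the address of M_{n+1} -- an infinite chain of references."""
    if a % 2 == 0:
        return old_table(a // 2)
    n = (a - 1) // 2
    return (f(n), (), [f(n + 1)])

# Each chain machine points one step further down the chain:
print(new_table(f(0)))  # (1, (), [3])
print(new_table(f(1)))  # (3, (), [5])
```

Every machine of the original table is still reachable (at a relocated address), yet the new table also houses the chain (M_n^f)_{n∈N}, which no finite computation can fully unfold.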
In Section 5 we have shown that Th(S) is neither extensional nor sensible. This is due to the fact that we kept our construction tight: at each step (from applicative structure, to combinatory algebra, and finally to λ-model) we added the minimal quotient resolving the issue. In order to obtain an extensional model, it would be sufficient to replace the rule (ae) with a form of extensionality not restricted to machines that become stuck once executed. Similarly, a sensible model can be obtained by collapsing all the addresses of those machines exhibiting a non-terminating behaviour when executed on a sufficiently large number of indeterminates. These quotients are not difficult to define, but the non-trivial problem is to prove that the resulting λ-model is non-trivial. This is left for future work.
A different line of research, more in the direction of functional programming, is to expand the computational capabilities of addressing machines by adding simple data-types and the associated basic operations. In fact, although data-types are unnecessary to achieve Turing-completeness, they are desirable to perform arithmetical operations and conditionals. Preliminary investigations [IMM21] show that extending addressing machines with numerals, conditional branching, and basic arithmetic instructions on natural numbers opens the way for representing Plotkin's PCF [Plo77]. These investigations also exhibit a precise simulation between the head reduction of addressing machines and the corresponding evaluation strategy defined on PCF extended with explicit substitutions [LM99]. We will investigate whether results of this kind extend to the call-by-value untyped setting. To begin with, we plan to study whether addressing machines can be used to represent the crumbling abstract machines from [ACGC19].
To perform some tests on addressing machines, we have implemented the formalism both in functional and imperative style. Although the sources remain for internal use only, some technical choices deserve a discussion. While not explicitly required by the definition, any implementation must rely on a computable association between addressing machines and the corresponding addresses. To implement such a bijection, one could try to use as addresses the actual pointers to the structures representing the machines, but the referenced data might change without the address changing accordingly. A naive solution consists in maintaining an association list ℓ of type (A × M_A) list and proceeding incrementally. The list ℓ is initialized as the empty list. When a new machine M is created, one checks whether M belongs to π_2(ℓ): in the affirmative case there is nothing to do, as the machine is already known; otherwise, a new address a is generated and the pair (a, M) is added to the list ℓ. This guarantees that an address uniquely identifies a machine and that, when an address is used, the corresponding machine has already been introduced. For a more optimized solution one should employ the hash-consing technique, which implements the same concept more efficiently.
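As a minimal sketch of this incremental scheme (under assumptions of ours: machines are hashable triples, addresses are consecutive integers, and the class name `AddressTable` is illustrative), a dictionary keyed by the machine's structure replaces the linear membership test on π_2(ℓ) and, at the same time, approximates the effect of hash-consing, since structurally equal machines share one address:

```python
class AddressTable:
    """Incremental bijection between machines and addresses, as described above."""

    def __init__(self):
        self._by_machine = {}   # M -> a: constant-time replacement for "M in pi_2(l)"
        self._by_address = []   # a -> M: the association list, with a as list index

    def address_of(self, machine):
        """Return the unique address of `machine`, allocating a fresh
        address the first time the machine is seen."""
        if machine in self._by_machine:
            return self._by_machine[machine]       # machine already known
        a = len(self._by_address)                  # generate a new address
        self._by_machine[machine] = a
        self._by_address.append(machine)
        return a

    def machine_at(self, a):
        """Look up the machine associated with address `a`."""
        return self._by_address[a]

table = AddressTable()
a1 = table.address_of(((), (), ()))      # some machine, as a (regs, prog, tape) triple
a2 = table.address_of(((), (), ()))      # structurally equal machine: same address
print(a1 == a2)  # True
```

As in the text, an address is allocated only when a machine is first created, so every address in use already has an associated machine.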