Rast: A Language for Resource-Aware Session Types

Traditional session types prescribe bidirectional communication protocols for concurrent computations, where well-typed programs are guaranteed to adhere to the protocols. However, simple session types cannot capture properties beyond the basic type of the exchanged messages. In response, recent work has extended session types with refinements from linear arithmetic, capturing intrinsic attributes of processes and data. These refinements then play a central role in describing sequential and parallel complexity bounds on session-typed programs. The Rast language provides an open-source implementation of session-typed concurrent programs extended with arithmetic refinements as well as ergometric and temporal types to capture work and span of program execution. To further support generic programming, Rast also enhances arithmetically refined session types with recently developed nested parametric polymorphism. Type checking relies on Cooper's algorithm for quantifier elimination in Presburger arithmetic with a few significant optimizations, and a heuristic extension to nonlinear constraints. Rast furthermore includes a reconstruction engine so that most program constructs pertaining to the layers of refinements and resources are inserted automatically. We provide a variety of examples to demonstrate the expressivity of the language.

Owing to this correspondence, the cut reduction properties of linear logic entail type safety of session-typed processes and guarantee freedom from deadlocks (global progress) and session fidelity (type preservation), ensuring adherence to the communication protocols at runtime.
The Rast programming language is based on session types derived from intuitionistic linear logic, extended with equirecursive types and recursive process definitions. Rast also supports full parametric polymorphism, enabling the definition of polymorphic data structures. It furthermore supports arithmetic type refinements as well as ergometric and temporal types to measure the total work and span of Rast programs. The theory underlying Rast has been developed in several papers, including the Curry-Howard interpretation of linear logic as session-typed processes [CP10,CPT16], the treatment of general equirecursive types and type equality [GH05], asynchronous communication [GV10,DCPT12], ergometric types [DHP18b], temporal types [DHP18a], indexed types [GG13,DP20c], indexed type equality [DP20b], and nested polymorphism [DDMP21].
We focus on key aspects of language design and implementation, not the underlying theory which can be found in the cited papers. A notable exception is subtyping for the full language, including nested polymorphic types, whose properties are the subject of ongoing research. We present Rast in layers, using typing rules, an operational semantics, and examples to explain and illustrate increasingly advanced features via their type structure. All language layers satisfy the properties of preservation (session fidelity) and progress (deadlock-freedom), with slightly different statements depending on the semantic properties under consideration. For example, in the presence of ergometric types the sum of potential and total work expended remains constant. This paper is a significantly revised and extended version of a system description [DP20a], including the formal definition of typing and computation, nested parametric polymorphism, subtyping, and additional examples.
We begin with motivation and a brief overview of the main features of the language using a concurrent queue data structure as a running example. The following type specifies the interface to a polymorphic queue server in the system of basic recursive session types storing elements of type A and supporting the operations of insert and delete. The external choice operator & dictates that the process providing this data structure accepts either one of two messages: the labels ins or del. In the case of ins, it receives an element of type A, denoted by the ⊸ operator, and then the type recurses back to queue[A]. On receiving a del request, the process can respond with one of two labels (none or some), indicated by the internal choice operator ⊕. If the queue is empty, it responds with none and then terminates (indicated by 1). If the queue is nonempty, it responds with some followed by the element of type A (expressed with the ⊗ operator) and recurses. The type queue[A] denotes the type name queue instantiated with the session type A.
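In Rast's concrete syntax, where & renders external choice, + internal choice ⊕, -o the ⊸ operator, and * the ⊗ operator, this interface can be sketched as follows (a reconstruction of the type described in the text, not verbatim from the repository):

```
type queue[A] = &{ ins : A -o queue[A],
                   del : +{ none : 1,
                            some : A * queue[A] } }
```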
However, the simple session type does not express the conditions under which the none and some branches must be chosen, which requires tracking the length of the queue. Rast extends session types with arithmetic refinements [DP20b,DP20c]; the refined queue type uses the index refinement n to indicate the number of elements in the queue. In addition, the type constraint ?{φ}. A, read as "there exists a proof of φ", is analogous to the assertion of φ in imperative languages. Conceptually, the process providing the queue must provide a proof of n = 0 after sending none, and a proof of n > 0 after sending some, respectively. It is therefore constrained in its choice between the two branches based on the value of the index n. Since the constraint domain is decidable and the actual form of a proof is irrelevant to the outcome of a computation, in the implementation no proof is actually sent. As is standard in session types, the dual constraint to ?{φ}. A is !{φ}. A (for all proofs of φ, analogous to the assumption of φ). We also add explicit quantifier type constructors ∃n. A and ∀n. A that send and receive natural numbers, respectively.
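The indexed refinement of the queue type can be sketched as follows, with ?{...}. as the concrete form of the constraint ?{φ} (again a reconstruction consistent with the description above):

```
type queue[A]{n} = &{ ins : A -o queue[A]{n+1},
                      del : +{ none : ?{n = 0}. 1,
                               some : ?{n > 0}. A * queue[A]{n-1} } }
```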
Arithmetic refinements are instrumental in expressing sequential and parallel complexity bounds. These are captured with ergometric [DHP18b, DBH+21] and temporal session types [DHP18a]. They rely on index refinements to express, for example, the size of list, stack, and queue data structures, or the height of trees, and to express work and time bounds as functions of these refinements.
Ergometric session types [DHP18b] capture the sequential complexity of programs, often called the work. Revisiting the queue example, consider an implementation where each element in the queue corresponds to a process. Then insertion acts like a bucket brigade, passing the new element one by one to the end of the queue. Among the multiple cost models provided by Rast is one where each send operation requires 1 unit of work (erg). In this cost model, such a bucket brigade requires 2n ergs because each process has to send ins and then the new element. On the other hand, responding to the del request requires only 2 ergs: the provider responds with none and closes the channel, or with some followed by the element. This gives us a type which expresses that the client has to send 2n ergs of potential to insert an element (◁^{2n}) and 2 ergs to delete an element (◁^{2}). The ergometric type system (described in Section 6) verifies this work bound using the potential operators as described in the type.

Temporal session types [DHP18a] capture the time complexity of programs assuming maximal parallelism on unboundedly many processors, often called the span. How does this work out in our example? We adopt a cost model where each send and receive action takes one unit of time (tick). First, we note that the use of a queue is at the client's discretion, so it should be available at any point in the future, expressed by the type constructor □. Secondly, the queue does not interact at all with the elements it contains, so they have to be of type □A for an arbitrary A. Since each interaction takes 1 tick, the next interaction requires at least 1 tick to elapse, captured by the next-time operator ◯. During insertion, we need more time than this: a process needs 2 ticks to pass the element down the queue, so it takes 3 ticks overall until it can receive the next insert or delete request after an insertion.
This reasoning yields a corresponding temporal type. We see that even though the bucket brigade requires much work for every insertion (linear in the length of the queue), it has a lot of parallelism, because only a constant number of delays is required between consecutive insertions or deletions.
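In abstract syntax, the ergometric and temporal queue types discussed above can be sketched as follows; the potential and delay annotations follow the counts derived in the text, though the precise placement of the ◯ operators depends on the chosen cost model, so this is a plausible rendering rather than the paper's exact listing:

```
queue[A]{n} = &{ ins : ◁^{2n} (A ⊸ queue[A]{n+1}),
                 del : ◁^{2} ⊕{ none : ?{n = 0}. 1,
                                some : ?{n > 0}. A ⊗ queue[A]{n-1} } }

queue[A] = □ &{ ins : ◯(□A ⊸ ◯^3 queue[A]),
                del : ◯ ⊕{ none : ◯1,
                           some : ◯(□A ⊗ ◯ queue[A]) } }
```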
Rast follows the design principle of basing an explicit language directly on the correspondence with the sequent calculus for the underlying logic (such as linear logic, or temporal or ergometric linear logic), extended with recursively defined types and processes. Programming in this fully explicit form tends to be unnecessarily verbose, so Rast also provides an implicit language in which most constructs related to index refinements and amortized work analysis are omitted. Explicit programs are then recovered by a proof-theoretically motivated reconstruction algorithm [DP20c] which is sound and complete on valid implicit programs.
Rast is implemented in SML and available as an open-source repository [DDP19]. It allows the user to choose explicit or implicit syntax and the exact cost models for work and time analysis. The implementation consists of a lexer, parser, reconstruction engine, arithmetic solver, type checker, and an interpreter, with particular attention to providing precise error messages. The repository also contains a number of illustrative examples that highlight various language features, some of which we briefly sketch in this paper.
To summarize, Rast makes the following contributions:
(1) A session-typed programming language with arithmetic refinements applied to ergometric and temporal types for parallel complexity analysis.
(2) An extension with full parametric polymorphism to enable generic programming.
(3) A subtyping algorithm that works well in practice despite its theoretical undecidability [DP20b] and uses Cooper's algorithm [Coo72] with some small improvements to decide constraints in Presburger arithmetic (and heuristics for nonlinear constraints).
(4) A type-checking algorithm that is sound and complete relative to subtyping.
(5) A sound and complete reconstruction algorithm for a process language where most index and ergometric constructs remain implicit.
(6) An interpreter for executing session-typed programs using the recently proposed shared-memory semantics [PP20].

Example: An Implementation of Queues
We use the implementation of queues as sketched in the introduction as a first example program, starting with the indexed version. The concrete syntax of types is a straightforward rendering of their abstract syntax. Each channel has exactly two endpoints: a provider and a client. Session fidelity ensures that provider and client always agree on the type of the channel and carry out complementary actions. The type of the channel evolves during communication, since it has to track where the processes are in the protocol as they exchange messages. In our example, we need two kinds of processes: an empty process at the end of the queue, and an elem process that holds an element x. The empty process provides an empty queue, that is, a service of type queue[A]{0} along a channel named q. It does not use any channels, indicated by an empty context '.' in its declaration. The turnstile '|-' separates the channels used from the channel that is provided (which is always exactly one, analogous to a value returned by a function). The notation elem[A]{n} indicates that the type A and the natural number n are parameters of this process.
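The two process declarations described above can be sketched as follows (a reconstruction; elem holds an element x and uses the remainder of the queue along a channel r):

```
decl empty[A] : . |- (q : queue[A]{0})
decl elem[A]{n} : (x : A) (r : queue[A]{n}) |- (q : queue[A]{n+1})
```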
Listing 1 shows the implementation of the two forms of processes in Rast. Comments, starting with a % character and extending to the end of the line, provide a brief explanation of the actions of each line of code. This code is in explicit form, both in its refinements and its polymorphism. Thus, the type and natural number parameters to a process need to be provided explicitly. Rast also provides the programmer with an option to write code in implicit form, where the two asserts would be omitted, since they can be read off the type at the corresponding place in the protocol. Of course, the type checker verifies that the assertion is justified and fails with an error message if it is not, whether the construct is explicit or implicit. However, even in the implicit form, parameters (both types and natural numbers) to a process need to be provided to spawn a new instance of it. The full implementation can be found in tests/queues-quant.rast in the repository.
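As Listing 1 is not reproduced here, the following sketch shows what the two process definitions look like in explicit form; it is a reconstruction of the bucket-brigade behavior described in the text, and the authoritative code is in tests/queues-quant.rast:

```
proc q <- empty[A] =
  case q ( ins => x <- recv q ;           % receive new element x
                  e <- empty[A] ;         % spawn a new empty end of queue
                  q <- elem[A]{0} x e     % hold x, with 0 elements behind
         | del => q.none ;                % empty: respond with 'none'
                  assert q {0 = 0} ;      % explicit proof of n = 0
                  close q )               % terminate

proc q <- elem[A]{n} x r =
  case q ( ins => y <- recv q ;           % receive new element y
                  r.ins ;                 % pass 'ins' down the queue
                  send r y ;              % pass y down the queue
                  q <- elem[A]{n+1} x r   % one more element behind
         | del => q.some ;                % nonempty: respond with 'some'
                  assert q {n+1 > 0} ;    % explicit proof of n+1 > 0
                  send q x ;              % send stored element x
                  q <-> r )               % identify q with remainder r
```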
Concealing Queue Size. As a final pair of illustrations, we describe how to wrap a sized queue in an unsized one, and vice versa. First, we define the type of an unsized queue (introduced in Section 1). The branch of the conceal process handling the response to a del request reads:

    | none => u.none ;        % if 'none', send 'none' on u
              u <-> q         % identify and terminate
    | some => x <- recv q ;   % receive channel x from q
              u.some ;        % send 'some' on u
              send u x ;      % send channel x on u
              u <- conceal[A]{n-1} q ) )

The conceal process case analyzes the label received on u. If it receives an ins request on u, it forwards the same on q and recurses at index n+1. Similarly, it forwards a del request to q and forwards the response from q on to u. In the none branch, it identifies the channels u and q and terminates. In the some branch, it forwards the element x received from q on to u and recurses at index n-1.
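For reference, the unsized queue type and the interface of conceal can be sketched as follows (a reconstruction; in the actual file the sized and unsized queue types presumably carry distinct names):

```
type queue[A] = &{ ins : A -o queue[A],
                   del : +{ none : 1,
                            some : A * queue[A] } }

decl conceal[A]{n} : (q : queue[A]{n}) |- (u : queue[A])
```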
We can also define a dual display process that takes an unsized queue as input and provides a sized queue. In this case, we use an existential quantifier to express that there exists an n such that the unsized queue has size n. The display process will first send a natural number n and then behave as a queue of size n. We omit the process definition, which can also be found in the file tests/queues-quant.rast in the repository.
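The interface of display can be sketched as follows, using the concrete syntax ?n. for the existential quantifier ∃n (a reconstruction):

```
decl display[A] : (u : queue[A]) |- (q : ?n. queue[A]{n})
```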

Basic System of Session Types
The underlying system of session types is derived from a Curry-Howard interpretation [CP10,CPT16] of intuitionistic linear logic [GL87]. The key idea is that an intuitionistic linear sequent A1, A2, ..., An ⊢ A is interpreted as the interface to a process P. We label each of the antecedents with a channel name xi and the succedent with a channel name z. The xi's are the channels used by P, and z is the channel provided by P.
(x1 : A1), (x2 : A2), ..., (xn : An) ⊢ P :: (z : C)

The resulting judgment formally states that process P provides a service of session type C along channel z, while using the services of session types A1, ..., An provided along channels x1, ..., xn, respectively. All these channels must be distinct. We often abbreviate the linear antecedents of the sequent by ∆. Thus, the formal typing judgment is written as ∆ ⊢_Σ P :: (x : A), where ∆ represents the linear antecedents xi : Ai, P is the process expression, and x : A is the linear succedent. Additionally, Σ is a fixed valid signature containing type and process definitions (explained later). Because it is fixed, we elide it from the presentation of the typing rules. We will extend the typing judgment in subsequent sections as we introduce polymorphism (Section 4), refinements (Section 5), and ergometric (Section 6) and temporal session types (Section 7). At runtime, a program is represented using a multiset of semantic objects denoting processes and messages, called a configuration.
We formalize the operational semantics as a system of multiset rewriting rules [CS09]. We introduce semantic objects proc(c, P) and msg(c, M), which mean that process P or message M provides along channel c. A process configuration is a multiset of such objects, where any two provided channels are distinct:

S ::= · | S, S | proc(c, P) | msg(c, M)
In this section, we briefly review the structural type formers that constitute the base fragment of Rast. The type grammar is defined as

A, B, C ::= ⊕{ℓ : A_ℓ}_{ℓ∈L} | &{ℓ : A_ℓ}_{ℓ∈L} | A ⊗ B | A ⊸ B | 1 | V

Table 1 overviews the types, their associated process terms, their continuation (both in types and terms), and their operational description. For each type, the first row describes the provider's viewpoint, while the second row describes the client's matching but dual viewpoint.
Choice Operators. The internal choice type constructor ⊕{ℓ : A_ℓ}_{ℓ∈L} is an n-ary labeled generalization of the additive disjunction A ⊕ B. Operationally, it requires the provider of x : ⊕{ℓ : A_ℓ}_{ℓ∈L} to send a label k ∈ L on channel x and continue to provide type A_k. The corresponding process term is written as (x.k ; P), where the continuation P provides type x : A_k. Dually, the client must branch based on the label received on x using the process term case x (ℓ ⇒ Q_ℓ)_{ℓ∈L}, where Q_ℓ is the continuation in the ℓ-th branch.
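The typing rules for internal choice take the standard sequent-calculus form (a reconstruction following [DP20c]):

```
  (k ∈ L)    ∆ ⊢ P :: (x : A_k)
  ---------------------------------------- ⊕R
  ∆ ⊢ (x.k ; P) :: (x : ⊕{ℓ : A_ℓ}_{ℓ∈L})

  (∀ℓ ∈ L)    ∆, (x : A_ℓ) ⊢ Q_ℓ :: (z : C)
  --------------------------------------------------------------- ⊕L
  ∆, (x : ⊕{ℓ : A_ℓ}_{ℓ∈L}) ⊢ case x (ℓ ⇒ Q_ℓ)_{ℓ∈L} :: (z : C)
```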
Communication is asynchronous, so that the provider (c.k ; P) sends a message k along c and continues as P without waiting for it to be received. As a technical device to ensure that consecutive messages on a channel arrive in order, the sender also creates a fresh continuation channel c' so that the message k is actually represented as (c.k ; c ↔ c') (read: send k along c and continue along c'). This formulation has the advantage that a message is just a special form of process, only with a different semantic symbol. The reason we distinguish processes and messages in the semantics is that messages, unlike processes, are only allowed to interact with other processes, not to spontaneously create messages. For instance, in the ⊕S rule below, if we used proc to represent the message, we could apply the same rule recursively to create new messages. When the message k is received along c, we select branch k and also substitute the continuation channel c' for c. Rules ⊕S and ⊕C below describe the operational behavior of the provider and client, respectively.
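The multiset rewriting rules for ⊕ can be sketched as follows (a reconstruction following [DP20c]; c' is fresh in ⊕S):

```
  (⊕S)  proc(c, c.k ; P)  ↦  proc(c', P[c'/c]), msg(c, c.k ; c ↔ c')

  (⊕C)  msg(c, c.k ; c ↔ c'), proc(d, case c (ℓ ⇒ Q_ℓ)_{ℓ∈L})  ↦  proc(d, Q_k[c'/c])
```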
The external choice constructor &{ℓ : A_ℓ}_{ℓ∈L} generalizes additive conjunction and is the dual of internal choice, reversing the roles of the provider and client. Thus, the provider branches on the label k ∈ L sent by the client. The typing rules mirror those for internal choice with the roles reversed, and semantics rules &S and &C express the operational behavior at runtime.
Channel Passing. The tensor operator A ⊗ B prescribes that the provider of x : A ⊗ B sends a channel, say w of type A, and continues to provide type B. The corresponding process term is (send x w ; P), where P is the continuation. Correspondingly, its client must receive a channel on x using the term (y ← recv x ; Q), binding it to variable y and continuing to execute Q.
The dual operator A ⊸ B allows the provider (y ← recv x ; P) to receive a channel of type A and continue to provide type B with process term P. The client of A ⊸ B, on the other hand, sends a channel w of type A and continues to use the channel at type B using the term (send x w ; Q). The semantics rules are the exact dual of those for ⊗.
Termination. The type 1 indicates termination, requiring that the provider of x : 1 send a close message, formally written as (close x), and then terminate the communication. Correspondingly, the client of x : 1 uses the term (wait x ; Q) to wait for the close message before continuing with Q. Linearity enforces that the provider does not use any channels, as indicated by the empty context in rule 1R.
Operationally, the client waits for the closing message, which has no continuation channel since the provider terminates. Forwarding. A forwarding process (x ↔ y) identifies the channels x and y so that any further communication along either x or y will be along the unified channel. Its typing rule corresponds to the logical rule of identity.
Operationally, a process c ↔ d forwards any message M that arrives on d to c and vice-versa.
Since channels are used linearly, the forwarding process can then terminate, ensuring proper renaming, as exemplified in the rules below.
Process Definitions. Process definitions have the form ∆ ⊢ f = P :: (x : A), where f is the name of the process and P its definition, with ∆ being the channels used by f and x : A the provided channel. All definitions are collected in a fixed global signature Σ. For a valid signature, we require that ∆ ⊢ P :: (x : A) for every definition, thereby allowing definitions to be mutually recursive. A new instance of a defined process f can be spawned with the expression (x ← f y ; Q), where y is a sequence of channels matching the antecedents ∆. The newly spawned process will use all variables in y and provide x to the continuation Q. The def rule describes the typing of a spawn: the declaration of f is looked up in the signature Σ (first premise), the channel types in ∆ are matched against those in the signature (second premise), and the freshly created channel x is given its type B from the signature. The corresponding semantics rule defC performs a similar substitution.
Sometimes a process invocation is a tail call, written without a continuation as (x ← f y). This is shorthand for (x' ← f y ; x ↔ x') for a fresh variable x'; that is, we create a fresh channel and immediately identify it with x.
Type Definitions. Session types can be defined recursively, departing from a strict Curry-Howard interpretation of linear logic, analogous to the way pure ML or Haskell depart from a pure interpretation of intuitionistic logic. A type definition for a type name V is of the form V = A, where A is a type expression. The signature Σ contains all type definitions, that are possibly mutually recursive. For a well-formed signature, we require A to be contractive, i.e., A itself must not be a type name. Our type definitions are equirecursive so we can silently replace type names V by A during type checking, and no explicit rules for recursive types are needed. Because both process definitions and type definitions may be recursive, processes in our language may not be terminating.

Polymorphic Session Types
In this section, we describe the modifications to the Rast language required to realize nested polymorphism [DDMP21] to support general-purpose programming. First, due to the presence of type variables, the formal typing judgment is extended with a context v and written as v ; ∆ ⊢_Σ P :: (x : A), where v stores the type variables (which we denote by α). We presuppose and maintain that all free type variables in ∆, P, and A are contained in v. The signature Σ is still fixed, but the process and type definitions are now (possibly) parameterized by type variables.
To support polymorphism, we need two primary additions. First, we update the form of process and type definitions. Secondly, we add explicit quantifiers to allow the exchange of types at runtime. Thus, the extended type grammar is

A, B, C ::= ... | α | ∃α. A | ∀α. A | V[B]

A type definition V[α] = A is parameterized by a sequence of distinct type variables α that its definition A can refer to. We can use type names in an expression using V[B]. Type expressions can also refer to a parameter α available in scope. The free variables of a type A are the set of type variables that occur freely in A. Since types are equirecursive, the type V[B] is considered equivalent to its unfolding A[B/α]. All type names V occurring in a valid signature must be defined, and all type variables defined in a valid definition must be distinct. Furthermore, for a valid definition V[α] = A, the free variables occurring in A must be contained in α.
Process Definitions. Process definitions now have the form ∆ f [α] = P :: (x : A) where f is the name of the process and P its definition, with ∆ being the channels used by f and x : A being the offered channel. In addition, α is a sequence of type variables that ∆, P and A can refer to. These type variables are implicitly universally quantified at the outermost level. The spawn expression (x ← f [A] y ; Q) now takes a sequence of types A matching the type variables α, making the polymorphism explicit. The def rule is updated to reflect this modification.
Note that A is substituted for α while matching the types in ∆ and y (second premise). Similarly, the provided channel x has type B from the signature with A substituted for α.
Explicit Quantifiers. To support full parametric polymorphism, Rast also provides two dual type constructors, ∃α. A and ∀α. A to exchange types between processes.
For ∃α. A, the provider checks that the type B it sends is valid and continues to provide type A[B/α]. The client receives a type, binding it to α, which is ensured by adding α to v.
Operationally, the provider (send x [B] ; P) sends the type B and a fresh continuation channel c' along c and continues executing P. The client receives the type B and the continuation channel c' and substitutes B for α and c' for c in Q.
Dually, a provider with a universally typed session ∀α. A receives an arbitrary type, binds it to α, and proceeds to provide the session prescribed by type A, possibly referring to α. On the other hand, the client sends a valid type B and continues with session A[B/α]. The rules for ∀α. A are dual to those for ∃α. A. Since polymorphism is parametric, it is possible to avoid explicitly sending types at runtime if this optimization is desired and does not interfere with other lower-level aspects of an implementation, such as dynamic monitoring.
Example: Context-Free Languages. Recursive session types capture the class of regular languages [TV16]. However, in practice, many useful languages are beyond regular. As an illustration, suppose we would like to express a balanced-parentheses language, also known as the Dyck language [Dyc82], with the end-marker $. We use L to denote an opening symbol and R to denote a closing symbol (in a session-typed mindset, L can represent a client request and R a server response). We need to enforce that each L has a corresponding closing R and that they are properly nested. To express this, we need to track the number of L's in the output with the session type. However, this notion of memory is beyond the expressive power of regular languages, so mere recursive session types will not suffice. We utilize the expressive power of nested types to express this behavior. Whenever the type T[x] outputs L, it recurses with T[T[x]], incrementing the number of T's in the type by 1. Dually, whenever the type outputs R, it recurses with x, decrementing the number of T's in the type by 1. The type D denotes a balanced word with no unmatched L's. Moreover, since we can only output $ (or L) at the type D and not R, we obtain the invariant that any word of type D must be balanced. If we imagine the parameter x as the symbol stack, outputting an L pushes T on the stack, while outputting R pops T from the stack. The definition of D ensures that once an L is output, the symbol stack is initialized with T[D], indicating one unmatched L. The file polytests/dyck.rast in the repository contains the complete code.
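The nested type definitions realizing this idea can be sketched as follows (following the Dyck-language example of [DDMP21]):

```
type T[x] = +{ L : T[T[x]],    % push: one more unmatched L on the stack
               R : x }         % pop: match the most recent L
type D    = +{ L : T[D],       % an L initializes the stack with T[D]
               $ : 1 }         % a balanced word terminates with $
```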

Refinement Session Types
In this section, we index types with arithmetic refinements that describe intrinsic attributes of the corresponding channels, in addition to the type constructors arising from the connectives of intuitionistic linear logic (⊕, &, ⊗, 1, ⊸) and the type variables arising from polymorphism. To account for refinements, the typing judgment is further extended and has the form V ; C ; ∆ ⊢_Σ P :: (x : A), where V are arithmetic variables n, C are constraints over these variables expressed as a single proposition, ∆ are the linear antecedents xi : Ai, P is a process expression, and x : A is the linear succedent. We presuppose and maintain that all free index variables in C, ∆, P, and A are contained in V. As described in Section 4, the process and type definitions in the signature Σ are now also parameterized by arithmetic variables. In addition, we write V ; C ⊨ φ for semantic entailment (proving φ assuming C) in the constraint domain, where V contains all arithmetic variables in C and φ. We now describe quantifiers (∃n. A, ∀n. A) and constraints (?{φ}. A, !{φ}. A). An overview of the types, process expressions, their continuation, and operational meaning can be found in Table 3. Since the form of a proof of φ is irrelevant and future computation cannot depend on it (what is known in type theory as proof irrelevance), such messages are not actually exchanged. Instead, it is the provider's responsibility to ensure that φ holds, while the client is permitted to assume that φ is true. Therefore, and in analogy with imperative languages, we write (assert c {φ} ; P) for a process that asserts φ for channel c and continues with P, while (assume c {φ} ; Q) assumes φ and continues with Q. Notice how the provider must verify the truth of φ given the currently known constraints C (the premise V ; C ⊨ φ), while the client assumes φ by adding it to C. Operationally, the provider creates a message containing the constraint (which simply evaluates to ⊤) that is received by the client.
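The typing rules for the constraint type ?{φ}. A can be sketched as follows (a reconstruction following [DP20c]; the rules for !{φ}. A are dual):

```
  V ; C ⊨ φ     V ; C ; ∆ ⊢ P :: (x : A)
  ----------------------------------------------- ?R
  V ; C ; ∆ ⊢ (assert x {φ} ; P) :: (x : ?{φ}. A)

  V ; C ∧ φ ; ∆, (x : A) ⊢ Q :: (z : B)
  --------------------------------------------------------- ?L
  V ; C ; ∆, (x : ?{φ}. A) ⊢ (assume x {φ} ; Q) :: (z : B)
```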
Since the constraints exchanged at runtime are always trivial, we could skip the communication entirely for constraints; however, in the formal semantics we still require communication for uniformity with the other type constructors. As an example, consider a client of an empty queue (so with index n = 0) that has just sent a del request. In the none branch the client can proceed with some continuation P1, but what could we write for P2 in the some branch? Intuitively, computation should never get there, because the provider cannot assert 0 > 0. Formally, we use the process expression 'impossible' to indicate that computation can never reach this spot:

  case c ( none ⇒ assume c {0 = 0} ; P1
         | some ⇒ assume c {0 > 0} ; impossible )

In implicit syntax, we can omit the some branch altogether, and it would be reconstructed in the form shown above. Abstracting away from this example, the typing rule for impossibility simply checks that the constraints are indeed unsatisfiable:

  V ; C ⊨ ⊥
  --------------------------------------- unsat
  V ; C ; ∆ ⊢ impossible :: (x : A)

There is no operational rule for this scenario, since in well-typed configurations the process expression 'impossible' is dead code and can never be reached.
Example: Binary Numbers. As another example, consider natural numbers in binary representation. The idea is that, for example, the number 13 in binary (1101) 2 form is represented as a sequence of messages (b1, b0, b1, b1, e, close) sent or received on a given channel with the least significant bit first. Here e represents 0 (the empty sequence of bits), while b0 and b1 represent bits 0 and 1, respectively.
type bin = +{ b0 : bin, b1 : bin, e : 1 }

We can then index binary numbers with their value. Because (linear) arithmetic contains no division operator, we express the type bin{n} of binary numbers with value n using existential quantification, with the concrete syntax ?k. A for ∃k. A.
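The indexed type can be sketched as follows, where k stands for the value of the remaining higher-order bits and the constraint on b0 rules out leading zeros (a reconstruction):

```
type bin{n} = +{ b0 : ?{n > 0}. ?k. ?{n = 2*k}. bin{k},
                 b1 : ?k. ?{n = 2*k+1}. bin{k},
                 e  : ?{n = 0}. 1 }
```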

Ergometric Session Types
An important application of refinement types is complexity analysis. Prior work on resource-aware session types [DHP18b, DHP18a, DBH+21] crucially relies on arithmetic refinements to express work and time bounds. The design principle we followed is that these extensions should be conservative over the basic and indexed session types, so that previously defined programs and type-checking rules do not change. In this section, we review the ergometric type system that computes work, intuitively defined as the total number of operations executed by the system.
The key idea is that processes store potential and messages carry potential. This potential can either be consumed to perform work or exchanged using special messages. The type system provides the programmer with the flexibility to specify what constitutes work. Thus, programmers can choose to count the resource they are interested in, and the type system provides the corresponding upper bound. Our current examples assign unit cost to message sending operations (exempting those for index objects or potentials themselves) effectively counting the total number of "real" messages exchanged during a computation.
Two dual type constructors ▷^r A and ◁^r A are used to exchange potential. Table 4 contains a description of these types and their operational behavior. The provider of x : ▷^r A must pay r units of potential along x using the process term (pay x {r} ; P), and continues to provide A by executing P. These r units are deducted from the potential stored inside the sender. Dually, the client must receive the r units of potential using the term (get x {r} ; Q) and adds them to its internal stored potential. Finally, since processes are allowed to store potential, the typing judgment records the potential available to a process above the turnstile: V ; C ; v ; ∆ ⊢^q_Σ P :: (x : A). We allow the potential q to refer to index variables in V to capture variable potential. The typing rules for ▷^r A check in both cases that the exchanged potential in the expression and the type matches (r1 = r2), and, while paying, ensure that the sender has sufficient potential to pay. We use distinct variables r1 and r2 to illustrate that the process and type expressions can use syntactically different but semantically equal annotations (e.g., r1 = n + n, r2 = 2 * n). On the other hand, the receiver adds the r1 units to its process potential. We extend the semantic objects with work counters: proc(c, w, P) (resp. msg(c, w, M)) denotes a process (resp. message) providing channel c, executing P (resp. M), and having performed work w so far. The freshly created message in rule ▷S has not performed any work so far and therefore has w = 0. In the ▷C rule, the work performed by the message is absorbed by the receiving process. Thus, the total work done is conserved, and no work is dropped by the system. Note that even though a message is created with w = 0, it can interact with forwarding processes and absorb the work performed by them. Thus, when it interacts with the receiver process, it may have a different work annotation w'.
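The semantics rules for ▷ can be sketched as follows (a reconstruction consistent with the description above; c' is fresh in ▷S, and the message created in ▷S starts with work counter 0):

```
  (▷S)  proc(c, w, pay c {r} ; P)  ↦  proc(c', w, P[c'/c]), msg(c, 0, pay c {r} ; c ↔ c')

  (▷C)  msg(c, w', pay c {r} ; c ↔ c'), proc(d, w, get c {r} ; Q)  ↦  proc(d, w + w', Q[c'/c])
```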
We follow the same approach for all the rules of operational semantics so far.
The dual type ◁^r A enables the provider to receive potential sent by its client. Its rules are the exact inverse of those for ▷^r A. Work analysis is precise, that is, before terminating a process must have 0 potential, which can be achieved by explicitly consuming any remaining potential.
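To make the ergometric layer concrete, the following hypothetical sized stack type sketches how potential annotations appear in Rast's concrete syntax, where <{p}| receives p units of potential from the client. The type elem, the constant 2 ergs, and the interface itself are illustrative assumptions, not taken verbatim from the Rast repository:

```rast
% hypothetical sized stack: each push deposits 2 ergs with the provider,
% paying (under the send cost model) for the 'some' label and the element
% sent back during a later pop
type stack{n} = &{ push : <{2}| elem -o stack{n+1},
                   pop  : +{ none : ?{n = 0}. 1,
                             some : ?{n > 0}. elem * stack{n-1} } }
```

Under the send cost model, the two sends performed while serving pop are exactly covered by the potential received at push time, so the interface advertises constant amortized cost per operation.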
Example: Ergometric Queue. We have already seen the ergometric types of queues as a bucket brigade in the introduction. We show it now in concrete syntax, where <{p}| receives potential p. Interestingly, the exact code of Listing 1 will check against this more informative type (see file examples/list-work.rast). The cost model will insert the appropriate work {r} actions, and reconstruction will insert the actions to pay and get potential. For a queue implemented internally as two stacks we can perform an amortized analysis. Briefly, the queue process maintains two lists: one (in) to store messages when they are enqueued, and a reversed list (out) from which they are dequeued. When the client wishes to dequeue an element and the out list is empty, the provider reverses the in list to serve as the new out list. A careful analysis shows that if this data structure is used linearly, both insert and delete have constant amortized cost, and the corresponding ergometric type makes this amortized cost explicit.

Temporal types capture the span of a computation. The cost model for time is specified by taking an ordinary, non-temporal program and adding delays capturing the intended cost. For example, if only the blocking operations should cost one unit of time, a delay is added before the continuation of every receiving construct. For type checking, the delay construct subtracts one ◯ operator from every channel it refers to. We denote consuming t time units on the left of the context by [A]^{-t}_L, and on the right by [A]^{-t}_R. As we will explain soon, consuming time units on the left and on the right differ due to the □ and ♦ modalities.
To express the semantics, we now use proc(c, w, t, P ) to denote a process P at local clock t. This local clock advances by r as the process executes a delay (r).
(delay) : proc(c, w, t, delay{r} ; P) → proc(c, w, t + r, P)

Always □A. A process providing x : □A promises to be available at any time in the future, including now. When the client would like to use this provider, it (conceptually) sends a message now! along x and then continues to interact according to type A.
A process P providing x : □A must be able to wait indefinitely. But this is only possible if all the channels that P uses can also wait indefinitely. This is enforced in the □R rule by the condition Δ delayed□, which requires each antecedent to have the form y_i : ◯^{n_i} □B_i.

Eventually ♦A. The dual of □A is ♦A. A process providing ♦A promises to provide A eventually. When a process offering x : ♦A is ready, it will send a now! message along x and then continue at type A. Conversely, the client of x : ♦A will have to be ready and waiting for the now! message to arrive along x and then continue at type A. We use (when? (c) ; Q) for the corresponding client. The typing rules for now! and when? are somewhat subtle. The predicate C delayed♦ means that C must have the form ◯^n ♦C′ (for some n), requiring that C may be delayed a fixed finite number of time steps and then must be allowed to communicate at an arbitrary time in the future. The semantic rules for ♦ are the exact inverse of those for □.

Table 5 provides a formal description of the temporal types. Since all temporal operators ultimately model time, they interact with each other, and the temporal displacement operator used in the ◯LR rules needs to be generalized, where S denotes a non-temporal type. When the displacement is undefined, the ◯LR rule cannot be applied. More details can be found in prior work [DHP18a].

Because Rast currently does not have reconstruction for time, we have to update the program with the five temporal actions presented in this section (two instances of delay, two of when?, and one of now!). A key observation here is that in the case of elem the process r does not need to be ready instantaneously, but can be ready after a delay of 2 ticks, because that is how long it takes to receive the ins label and the element along q. This slack is also reflected in the type of empty, because it becomes the back of the queue for a new element when the end of the queue is reached.

Subtyping
A late addition to Rast was the introduction of subtyping as a generalization of type equality. Declaratively, in the system of basic session types (Section 3), we have just two rules, plus the rules for subtyping A ≤ B. The latter follow Gay and Hole [GH05] by defining subtyping coinductively as the largest type simulation. This includes the standard notion of depth and width subtyping for internal and external choice, except that our relation is exactly reversed from theirs due to our intuitionistic framework. We introduce fresh internal definitions for all intermediate type subexpressions, which enables a practically efficient implementation of their algorithm that incrementally constructs this simulation. For type-checking purposes, we can restrict the uses of subtyping to forwarding (rule id), spawn (rule def), and channel passing (rules ⊗R and ⊸L). Our implementation of the linear λ-calculus in Section 10 exploits subtyping by observing that all values (type val) are also expressions (type exp), that is, val ≤ exp. This means we can pass a channel of type val to a process expecting a channel of type exp, which is used in the implementation of β-reduction.
The algorithms for subtyping become increasingly more complicated with the addition of indexed types [DP20b] (for which subtyping is in fact undecidable) and nested polymorphic types (the subject of ongoing research, generalizing type equality [DDMP21]). Nevertheless, the basic structure of incrementally constructing a simulation remains intact; only recognizing that a new pair is already in the partial simulation becomes more difficult. Our example suite shows that even in the undecidable cases, type checking (including subtyping) does not become a significant bottleneck.

Implementation
We have implemented a prototype for Rast in Standard ML (8100 lines of code). This implementation contains a lexer and parser (1200 lines), a reconstruction engine (900 lines), an arithmetic solver (1200 lines), a type checker (2500 lines), a pretty printer (400 lines), and an interpreter (200 lines). The source code is well-documented and available open-source [DDP19].
Syntax. Table 6 describes the syntax of Rast programs. Each row presents the abstract and concrete representation of a session type and its corresponding providing expression. A program contains a series of mutually recursive type and process declarations and definitions. The first line is a type definition, where v is the name with type parameters a and index variables n, and A is its definition. The second line is a process declaration, where f is the process name, (x1 : A1) . . . (xn : An) are the used channels with their corresponding types, and the provided channel is x of type A. Finally, the last line is a process definition for the same process f, defined using the process expression P. In addition, f can be parameterized by type variables a and index variables n. We use a hand-written lexer and shift-reduce parser to read an input file and generate the corresponding abstract syntax tree of the program.

Table 6: Abstract and Corresponding Concrete Syntax for Types and Expressions

The reason to use a hand-written parser instead of a parser generator is to anticipate the most common syntax errors that programmers make and respond with the best possible error messages.
Validity Checking. Once the program is parsed and its abstract syntax tree is extracted, we perform a validity check on it. We check that all index refinements, potentials, and delay operators are non-negative. We also check that all index expressions are closed with respect to the index variables in scope, and similarly for type expressions. To simplify and improve the efficiency of the subtyping algorithm, we also assign internal names to type subexpressions [DP20c,DDMP21], parameterized over their free type and index variables. These internal names are not visible to the programmer.
Cost Model. The cost model defines the execution cost of each construct. Since our type system is parametric in the cost model, we allow programmers to specify the cost model they want to use. Although programmers can create their own cost model (by inserting work or delay expressions in the process expressions), we provide three predefined cost models: send, recv, and recvsend. If we are analyzing work (resp. time), the send cost model inserts a work{1} (resp. delay{1}) before (resp. after) each send operation. Similarly, the recv cost model assigns a cost of 1 to each receive operation. The recvsend cost model assigns a cost of 1 to each send and receive operation.

Reconstruction and Type Checking. The programmer can use a flag in the program file to indicate whether they are using explicit or implicit syntax. If the syntax is explicit, the reconstruction engine performs no program transformation. However, if the syntax is implicit, we use the implicit type system to approximately type-check the program. Once completed, we use the forcing calculus, introduced in prior work [DP20c], to insert assert, assume, pay, get, and work constructs. The core idea here is simple: insert assume and get constructs eagerly, i.e., as soon as they are available on a channel, and insert assert and pay constructs lazily, i.e., just before communicating on that channel. The forcing calculus proves that this reconstruction technique is sound and complete in the absence of certain forms of quantifier alternations (which are checked before reconstruction is performed). We only perform reconstruction for proof constraints and ergometric types; reconstruction of type and index quantifiers and temporal constructs is left to future work. The implementation takes some care to provide constructive and precise error messages, in particular as session types (not to mention arithmetic refinements, ergometric types, and temporal types) are likely to be unfamiliar.
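To make the cost-model pass concrete, the following sketch shows a hypothetical unary successor process before and after instrumentation under the send cost model for work. The process succ and the type nat are illustrative assumptions, not taken from the example suite:

```rast
type nat = +{ z : 1, s : nat }

% implicit source, as written by the programmer
proc c <- succ n = c.s ; c <- n

% after cost instrumentation, as conceptually seen by the type checker:
% a work{1} action is inserted before the send of the label s
proc c <- succ n = work {1} ; c.s ; c <- n
```

Reconstruction would then insert any pay and get actions needed to make the instrumented process check against its ergometric type.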
We designed the abstract syntax tree to also contain the relevant source code location information, which is utilized when generating error messages.
Subtyping. At the core of type checking lies subtyping, defined coinductively [GH05]. In the presence of arithmetic refinements, subtyping and type equality are undecidable, but we have found what seems to be a practical approximation [DP20b,DDMP21], incrementally constructing a simulation closed under reflexivity. The data structures are rather straightforward, emphasizing simplicity over efficiency, since subtyping tends to be fast in practice. There are several places where the translation from the underlying theory to the implementation is not straightforward. After the file has been read and the validity of types has been verified, we compute the variance of all type constructors in all type arguments (covariant, contravariant, nonvariant, or bivariant) by a greatest fixed point computation, starting from the assumption that all arguments are nonvariant. The variance information is then used when determining whether a new subtyping goal is implied by the partial simulation constructed so far. The algorithm employs syntactic matching (allowing the type variables in the simulation to be instantiated) without consulting the partial simulation again (which could lead to nontermination). It also calls upon the constraint solver to determine whether the constrained pairs in the partial simulation contain enough information to entail the constraints in the goal. The theory establishing soundness of this subtyping algorithm in the presence of nested polymorphic types under their coinductive interpretation is currently under development.

Arithmetic Solver. To determine the validity of the arithmetic propositions used by our refinement layer, we use a straightforward implementation of Cooper's decision procedure [Coo72] for Presburger arithmetic. We found a small number of optimizations to be necessary, but the resulting algorithm has been quite efficient and robust.
(1) We eliminate constraints of the form x = e (where x does not occur in e) by substituting e for x in all other constraints to reduce the total number of variables.
(2) We exploit that we are working over natural numbers so all solutions have a natural lower bound, i.e., 0.
We also extend our solver to handle non-linear constraints. Since non-linear arithmetic is undecidable, in general, we use a normalizer which collects coefficients of each term in the multinomial expression.
(1) To check e 1 = e 2 , we normalize e 1 − e 2 and check that each coefficient of the normal form is 0.
(2) To check e 1 ≥ e 2 , we normalize e 1 − e 2 and check that each coefficient is non-negative.
(3) If we know that x ≥ c, we substitute y + c for x in the constraint we are checking, with the knowledge that the fresh variable y ≥ 0.
(4) We try to find a quick counterexample to validity by plugging in 0 and 1 for the index variables.
If a constraint does not fall into the above categories, we print the constraint and trust that it holds. A user can then inspect these constraints manually and confirm their validity. At present, with this set of heuristics beyond Presburger arithmetic, all of our examples pass without having to trust any unsolved constraints.
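As a worked example of the normalization heuristic, consider the theorem n(k + 1) = nk + n from the example suite. The check proceeds roughly as follows:

```latex
% To check e_1 = e_2, normalize e_1 - e_2 as a multinomial:
n(k+1) - (nk + n) \;=\; nk + n - nk - n \;=\; 0\cdot nk + 0\cdot n
% All coefficients are 0, so the equality is valid.

% To check e_1 \geq e_2, e.g. n^2 + n \geq n, normalize the difference:
(n^2 + n) - n \;=\; 1\cdot n^2
% All coefficients are non-negative, so the inequality holds
% over the natural numbers.
```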
Interpreter. The current version of the interpreter pursues a sequential schedule following a prior proposal [PP20]. We only execute programs that have no free type or index variables and only one externally visible channel, namely the one provided. When the computation finishes, the messages that were asynchronously sent along this distinguished channel are shown, while running processes waiting for input are displayed simply as a dash '-'. The interpreter is surprisingly fast. For example, using a linear prime sieve to compute the status (prime or composite) of all numbers in the range [2, 257] takes 27.172 milliseconds using MLton during our experiments (see machine specifications below).

Examples. The example suite in the repository includes:
(1) arithmetic: natural numbers in unary and binary representation, indexed by their value, and processes implementing standard arithmetic operations.
(2) integers: an integer counter represented using two indices x and y with value x − y.
(3) linlam: expressions in the linear λ-calculus, indexed by their size.
(4) list: lists indexed by their size, and some standard operations such as append, reverse, map, fold, etc. Also provides an implementation of stacks and queues using lists.
(5) primes: the sieve of Eratosthenes to classify numbers as prime or composite.
(6) segments: type seg[n] = ∀k. list[k] ⊸ list[n+k] representing partial lists with a constant-work append operation.
(7) ternary: natural numbers and integers represented in balanced ternary form with digits 0, 1, −1, indexed by their value, and a few standard operations on them. This example is noteworthy since it is the only one stressing the arithmetic decision procedure.
(8) theorems: processes representing valid circular [DP19] proofs of simple theorems such as n(k + 1) = nk + n, n + 0 = n, n * 0 = 0, etc.
(9) tries: a trie data structure to store multisets of binary numbers, with constant amortized work for insertion and deletion, verified with ergometric types.

We highlight interesting examples from some case studies, showcasing the invariants that can be proved using arithmetic refinements and nested polymorphism.
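For instance, a constant-work append on segments can be sketched as follows. The concrete syntax (!k for ∀k, and receiving an index with {k} <- recv s) follows Table 6, but the declarations here are our reconstruction; the exact code in the repository's segments example may differ:

```rast
type seg{n} = !k. list{k} -o list{n+k}

% appending two segments composes them: no list traversal is needed
decl append{n1}{n2} : (s1 : seg{n1}) (s2 : seg{n2}) |- (s : seg{n1+n2})
proc s <- append{n1}{n2} s1 s2 =
  {k} <- recv s ;            % receive the size of the tail list
  t <- recv s ;              % receive the tail list t : list{k}
  send s2 {k} ; send s2 t ;  % s2 extends t to a list of size n2+k
  send s1 {n2+k} ;           % s1 extends that result to size n1+n2+k
  send s1 s2 ;
  s <- s1
```

Since append only forwards channels and sends two indices, its work is constant regardless of the segment sizes.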
Linear λ-Calculus. We implemented the linear λ-calculus with evaluation (weak head normalization) of terms. We use higher-order abstract syntax, representing linear abstraction in the object language by a process receiving a message corresponding to its argument. This is inspired by Milner's call-by-name encoding of the (nonlinear) λ-calculus in the π-calculus [Mil92]. We expand on it by considering typing in the metalanguage, and also provide a static analysis of the size of normal forms and the number of reductions.
type exp = +{ lam : exp -o exp, app : exp * exp }

We would like evaluation to return a value (a λ-abstraction), so we take advantage of the structural nature of types (allowing us to reuse the label lam) to define the value type.
type val = +{ lam : exp -o exp }

Rast can infer that val is a subtype of exp. We can derive constructors apply for expressions and lambda for values (we do not need the corresponding constructor for expressions).
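The constructor processes can be written as follows. This is a sketch consistent with the types above; the actual definitions live in examples/linlam.rast and may differ in detail:

```rast
% lambda wraps a function body f into a value by sending the lam label
decl lambda : (f : exp -o exp) |- (v : val)
proc v <- lambda f = v.lam ; v <- f

% apply builds an application node from e1 and e2
decl apply : (e1 : exp) (e2 : exp) |- (e : exp)
proc e <- apply e1 e2 = e.app ; send e e1 ; e <- e2
```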
decl eval : (e : exp) |- (v : val)
proc v <- eval e =
  case e ( lam => v <- lambda e
         | app => e1 <- recv e ;   % e = e2
                  v1 <- eval e1 ;
                  case v1 ( lam => send v1 e ;
                            v <- eval v1 ) )

If e sends a lam label, we just rebuild the expression as a value. If e sends an app label, then e represents a linear application e1 e2 and the continuation has type exp ⊗ exp. This means we receive a channel representing e1, and the continuation (still called e) behaves like e2. We note this with a comment in the source. We then evaluate e1, which exposes a λ-expression along the channel v1. We send e along v1, carrying out the reduction via communication. The result of this (still called v1) is evaluated to yield the final value v. This program is available in the repository at examples/linlam.rast.
We would now like to prove that the value of a linear λ-expression is smaller than or equal to the original expression. At the same time we would like to rule out a class of so-called exotic terms in the representation, which are possible due to the presence of recursion in the metalanguage. We achieve this by indexing the types exp and val with their size. For an application, this is easy: the size is one more than the sum of the sizes of the subterms.
type exp{n} = +{ lam : ...,
                 app : ?n1. ?n2. ?{n = n1+n2+1}. exp{n1} * exp{n2} }

The size n2 + 1 of a λ-expression is one more than the size n2 of its body, but what is that in our higher-order representation? The body of a linear function takes an expression of size n1 and then behaves like an expression of size n1 + n2. Solving for n2 then determines the type definitions and the types of the constructor processes. The universal quantification over n1 in the type of lam is important, because a linear λ-expression may be applied to an argument of any size. We also cannot predict the size of the result of evaluation, so we have to use existential quantification: the value of an expression of size n will have size k for some k ≤ n. Type checking now verifies that if evaluation terminates, the resulting value is smaller than the expression (or of equal size). The repository contains the implementation in the file examples/linlam-size.rast.
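Following the reasoning above (the body consumes an argument of size n1 and behaves like an expression of size n1 + n2, with n = n2 + 1), the lam branch and the type of eval can plausibly be rendered as below. This is our reconstruction from the prose; the authoritative version is in examples/linlam-size.rast:

```rast
type exp{n} = +{ lam : ?{n > 0}. !n1. exp{n1} -o exp{n1+n-1},
                 app : ?n1. ?n2. ?{n = n1+n2+1}. exp{n1} * exp{n2} }

type val{n} = +{ lam : ?{n > 0}. !n1. exp{n1} -o exp{n1+n-1} }

% the value of an expression of size n has some size k <= n
decl eval{n} : (e : exp{n}) |- (v : ?k. ?{k <= n}. val{k})
```

The constraint n > 0 in the lam branch rules out λ-abstractions of size 0 and thereby also excludes a class of exotic terms.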
Remarkably, ergometric session types can bound the number of reductions using an amortized analysis of work! For this, we assign 1 erg (a unit of potential) to each λ-expression. Our cost model is that all operations are free, except the equivalent of a β-reduction, which costs one erg.

Languages. Designing complete languages like Rast frees the researcher from the limitations and idiosyncrasies of the host language as they explore the design space. A relatively early effort was the object-oriented language MOOL [Vas11], which distinguishes linear and nonlinear channels.
A different style of language is SePi [BMV12,FV13], based on the π-calculus. It supports linear refinements in terms of uninterpreted propositions (which may reference integers) in addition to assert and assume primitives on them. They are not intended to capture intrinsic properties of data structures or processes; instead, they allow the programmer to express some security properties.
The CO 2 middleware language [BCPP15, BCM + 15] supports binary timed session types. The notion of time here is external. As such, it does not measure work or span based on a cost model like Rast, but specifies interaction time windows for processes that can be enforced dynamically via monitors.
Concurrent C0 [WPP16] is an implementation of linear and shared session types as an extension of C0, a small type-safe and memory-safe subset of C. It integrates the basic session types from Section 3 with shared session types [BP17] in the context of an imperative language. Relatedly, the Nomos language [DBH + 21] integrates linear and shared ergometric session types with a functional language to aid smart contract programming. Although Nomos does not support temporal types and polymorphism, it embeds a linear programming solver to automatically infer the exact potential annotations.
Links [LM16b, LM17, FLMD19] is a language aimed at developing web applications. While based on different foundations, it is related to SILL [TCP13,Gri16] in that both integrate traditional functional types with linear session types. As such, they can express many (nonlinear) programs that Rast cannot, but they support neither arithmetic refinements nor ergometric or temporal types.
Context-free session types [TV16, AMV20] generalize ordinary session types with sequential composition and also permit some polymorphism. The linear sublanguage of context-free session types can be modeled in Rast with nested polymorphism [DDMP21]. Several dependent extensions of session types have been proposed in prior work, including proof exchanges [TCP11], subtyping and constraint relations based on ATS [WX17], and equality based on βη-congruences [TY18]. However, none of them formally investigate type equality or provide a type-checking algorithm realized in an implementation.

Conclusion
This paper describes the Rast programming language. In particular, we focused on the concrete syntax, type checking and subtyping, parametric polymorphism [DDMP21], the refinement layer [DP20b,DP20c], and its applicability to work [DHP18b] and span analysis [DHP18a]. The refinements rely on an arithmetic solver based on Cooper's algorithm [Coo72]. The interpreter uses the shared memory semantics introduced in recent work [PP20]. We concluded with several examples demonstrating the efficacy of the refined type system in expressing and verifying properties about data structure sizes and values. We also illustrated the work and span bounds for several examples, all of which have been verified with our system, and are available in an open-source repository [DDP19].
In the future, we plan to address some limitations of the Rast language. One goal of Rast was to explore the boundaries of purely linear programming with general recursion. Often, this imposes a certain programming discipline and can be inconvenient if we need to drop or duplicate channels. Recent work on adjoint logic [PP19] uniformly integrates different logical layers into a unified language by assigning modes to communication. We plan to utilize this adjoint formulation to support shared [BP17] and unrestricted channels. Prior work on SILL [Gri16] and Nomos [DBH + 21] has demonstrated that such an integration is helpful in general-purpose programming.
In the direction of parametric polymorphism, we plan to develop the theory of subtyping. Our initial investigation suggests that subtyping is undecidable, and we would thus like to explore the boundaries of our current subtyping algorithm. Relatedly, we also plan to explore whether we can extend the ideas of reconstruction to explicit quantifiers.
With respect to refinements, we intend to pursue richer constraint domains such as nonlinear arithmetic, possibly via SMT solvers. We would also like to support reconstruction for the temporal fragment of Rast. Unfortunately, the ◯ operator affects all connected channels at once, and its proof-theoretic properties are not as uniform as those of polymorphism, proof constraints, or ergometric types, posing a significant challenge. We also plan to explore dependent session type systems to express data-dependent distributed protocols.