Adventures in time and space

This paper investigates what is essentially a call-by-value version of PCF under a complexity-theoretically motivated type system. The programming formalism, ATR, has its first-order programs characterize the polynomial-time computable functions, and its second-order programs characterize the type-2 basic feasible functionals of Mehlhorn and of Cook and Urquhart. (The ATR-types are confined to levels 0, 1, and 2.) The type system comes in two parts, one that primarily restricts the sizes of values of expressions and a second that primarily restricts the time required to evaluate expressions. The size-restricted part is motivated by Bellantoni and Cook's and Leivant's implicit characterizations of polynomial-time. The time-restricting part is an affine version of Barber and Plotkin's DILL. Two semantics are constructed for ATR. The first is a pruning of the naive denotational semantics for ATR. This pruning removes certain functions that cause otherwise feasible forms of recursion to go wrong. The second semantics is a model for ATR's time complexity relative to a certain abstract machine. This model provides a setting for complexity recurrences arising from ATR recursions, the solutions of which yield second-order polynomial time bounds. The time-complexity semantics is also shown to be sound relative to the costs of interpretation on the abstract machine.


Introduction
A Lisp programmer knows the value of everything, but the cost of nothing.
-Alan Perlis

Perlis' quip is an overstatement, but not by much. Programmers in functional (and object-oriented) languages have few tools for reasoning about the efficiency of their programs. Almost all tools from traditional analysis of algorithms are targeted toward roughly the first-order fragment of C. What tools there are from formal methods are interesting, but piecemeal and preliminary.
It is well-known that prn is not a BFF: starting with polynomial-time primitives, prn can be used to define any primitive recursive function. However, as Cobham noted, if one modifies (1.1) by adding a side-condition that polynomially bounds the size of the values produced, the modified prn produces definitions of just polynomial-time computable functions from polynomial-time computable primitives. Bellantoni and Cook [BC92] showed how to get rid of the explicit use of such a side-condition through what amounts to a typing discipline. However, their approach (which has been in large part adopted by the implicit computational complexity community; see Hofmann's survey [Hof00]) requires that prn be a "special form" and that the f in (1.1) be ultimately given by a purely syntactic definition. We, on the other hand, want to be able to define prn within ATR (see Figure 13) and have the definition's meaning given by a conventional, higher-type denotational semantics. We thus use Bellantoni and Cook's [BC92] (and Leivant's [Lei95]) ideas in both syntactic and semantic contexts. That is, we extract the growth-rate bounds implicit in the aforementioned systems, extend these bounds to higher types, and create a type system, programming language, and semantic models that work to enforce these bounds. As a consequence, we can define prn (with a particular typing) and be assured that, whether the f corresponds to a purely syntactic term or to the interpretation of a free variable, prn will not go wrong by producing something of huge complexity. The language and its model thus implicitly incorporate side-conditions on growth via types.4 Handling constructs like prn as first-class functions is important because programmers care more about such combinators than about almost any standard BFF.
Outline. Our ATR formalism is based on Bellantoni and Cook's [BC92] and Leivant's [Lei95] ideas on using "data ramification" to rein in computational complexity. §3 puts these ideas in the concrete form of BCL, a simple type-level 1 programming formalism, and sketches the proofs of three basic results on BCL: (i) each BCL expression is polynomial size-bounded, (ii) computing the value of a BCL expression is polynomial time-bounded, and (iii) each polynomial-time computable function is denoted by some BCL-expression. Most of this paper is devoted to showing the analogous results for ATR. §4 discusses how one might change BCL into a type-2 programming formalism, some of the problems one encounters, and our strategies for dealing with these problems. ATR, our type-2 system, is introduced in §5 along with its type system, typing rules, and basic syntactic properties. The goal of §§6-10 is to show (type-2) polynomial size-boundedness for ATR. This is complicated by the fact (described in §6) that the naïve semantics for ATR permits exponential blow-ups. §§7-9 show how to prune back the naïve semantics to obtain a setting in which we can prove polynomial size-boundedness, which is shown in §10. The goal of §§11-15 is to show (type-2) polynomial time-boundedness for ATR. Our notion of the cost of evaluating ATR expressions is based on a particular abstract machine (described in §11.1) that implements an ATR-interpreter and on the costs we assign to this machine's steps (described in §11.2). §12 and §13 set up a time-complexity semantics for ATR− expressions (where ATR− consists of ATR without its recursion construct) and establish that this time-complexity semantics is: (i) sound for the abstract machine's cost model (i.e., the semantics provides upper bounds on these costs), and (ii) polynomial time-bounded, that is, the time-complexity of each ATR expression e has a second-order polynomial bound over the time-complexities of e's free variables. §16 shows that ATR can compute each type-2 basic feasible functional. §17 considers possible extensions of our work. We begin in §2, which sets out some basic background definitions, with §§2.8-2.14 covering the more exotic topics.

Acknowledgements. Thanks to those who listened to the second author describe this work along its evolution. Thanks to Neil Jones and Luke Ong for inviting the second author to Oxford for a visit and for some extremely helpful comments on an early draft of this paper. Thanks to Syracuse University for hosting the first author during September 2005. Thanks also to the anonymous referees of both the POPL version of this paper [DR06] and the present paper for many extremely helpful comments. Finally, many thanks to Peter O'Hearn, Josh Berdine, and the Queen Mary theory group for hosting the second author's visit in the Autumn of 2005 and for repeatedly raking his poor type-systems over the coals until something reasonably simple and civilized survived the ordeals. This work was partially supported by EPSRC grant GR/T25156/01 and NSF grant CCR-0098198.

4 Incorporating side-conditions in models is nothing new. A fixed-point combinator has the implicit side-condition that its argument is continuous (or at least monotone) so that, by Tarski's fixed-point theorem [Win93], we know the result is meaningful. Models of languages with fixed-point combinators typically have continuity built in so the side-condition is always implicit.

Notation and conventions
2.1. Numbers and strings. We use two representations of the natural numbers: dyadic and unary. Each element of N is identified with its dyadic representation over { 0, 1 }, i.e., 0 ≡ ε, 1 ≡ 0, 2 ≡ 1, 3 ≡ 00, etc. We freely pun between x ∈ N as a number and as a 0-1-string. Each element of ω is identified with its unary representation over { 0 }, i.e., 0 ≡ ε, 1 ≡ 0, 2 ≡ 00, 3 ≡ 000, etc. The elements of N are used as numeric/string values to be computed over. The elements of ω are used as tallies to represent lengths, run times, and generally anything that corresponds to a size measurement. Notation: For each natural number k, k = 0^k (the tally of length k). Also, x ⊕ y = the concatenation of strings x and y.
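The dyadic and unary codings can be made concrete with a short Python sketch (the function names are ours): a dyadic string d_k . . . d_0 over { 0, 1 } denotes Σ_i (d_i + 1)·2^i, which gives exactly the correspondence 0 ≡ ε, 1 ≡ 0, 2 ≡ 1, 3 ≡ 00 above.

```python
def dyadic(n):
    """Dyadic representation of n over {0,1}: 0 <-> "", 1 <-> "0", 2 <-> "1", 3 <-> "00", ..."""
    s = ""
    while n > 0:
        d = (n - 1) % 2          # low-order dyadic digit
        s = str(d) + s
        n = (n - 1 - d) // 2     # n = 2*(value of the remaining prefix) + d + 1
    return s

def undyadic(s):
    """Inverse: the string d_k...d_0 denotes the sum of (d_i + 1) * 2**i."""
    n = 0
    for c in s:
        n = 2 * n + int(c) + 1
    return n

def tally(k):
    """The unary representation of k in omega: a string of k 0's."""
    return "0" * k
```

Note that the dyadic coding is bijective: every string over { 0, 1 } denotes exactly one natural number, so lengths behave like logarithms (|x| is about log2 of x).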

2.12. Second-order polynomials. We define the second-order polynomials [KC96] as a type-level 2 fragment of the simply typed λ-calculus over base type T with arithmetic operations ∨, +, and *. Figure 4 gives the syntax, where the syntactic categories are: constants (K), raw expressions (P), and type expressions (T). We often write ∨-, +-, and *-expressions in infix form. The typing rules are Id-I, →-I, and →-E from Figure 2 together with the rules in Figure 5. Moreover, the only variables allowed are those of type levels 0 and 1. Our semantics L (for length) for second-order polynomials is: L[[σ]] = MC σ for each σ, a simple type over T, and L[[Σ ⊢ p: σ]] = the standard definition. The depth of a second-order polynomial q is the maximal depth of nesting of applications in q's β-normal form.
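To illustrate (the example is ours, not the paper's), a second-order polynomial can be read as a higher-order function on lengths, with ∨ read as max; its depth is the nesting of applications of the type-1 variable:

```python
# A depth-2 second-order polynomial q(L, n) = L(L(n) + n) + 3*n:
# the type-1 length variable L is applied with nesting depth 2.
def q(L, n):
    return L(L(n) + n) + 3 * n

# A depth-0 (ordinary, first-order) polynomial in the same style,
# reading the operation "v" as max:
def p(n):
    return max(n * n, n + 5)
```

For instance, with L interpreted as squaring, q(L, 2) = L(L(2) + 2) + 6 = L(6) + 6 = 42.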

The BCL formalism
The programming formalisms of this paper are built on work of Bellantoni and Cook [BC92] and Leivant [Lei95]. Bellantoni and Cook's paper takes a programming formalism for the primitive recursive functions, imposes certain intensionally-motivated constraints, and obtains a formalism for the polynomial-time computable functions. To explain these constraints and how they rein in computational strength, we sketch both BCL, a simple type-1 programming formalism based on Bellantoni and Cook's and Leivant's ideas, and BCL's properties. 9 This sketch provides an initial framework for this paper's formalisms.
BCL has the same syntax as PCF (§2.7) with three changes: (i) fix is replaced with prn (for primitive recursion on notation [Cob65]), which has the reduction rule given by (1.1), (ii) the only variables allowed are those of base type, and (iii) the type system is altered as described below. If we were to stay with the simple types over N and the PCF-typing rules (Figure 2 and with prn: (N → N → N) → N → N), the resulting formalism would compute exactly the primitive recursive functions. Instead we modify the types and typing as follows. N is replaced with two base types, N norm (normal values) and N safe (safe values), subtype-ordered N norm ≤: N safe . The BCL types are just the type-level 0 and 1 simple types over N norm and N safe . Both base types have intended interpretation N. The point of the two base types is to separate the roles of N-values: a N norm -value can be used to drive a recursion, but cannot be the result of a recursion, whereas a N safe -value can be the result of a recursion, but cannot be used to drive a recursion. These intentions are enforced by the BCL typing rules, consisting of: Id-I, →-I, and →-E from Figure 2; Const-I, c 0 -I, c 1 -I, d-I, t 0 -I, t 1 -I, down-I, and If-I also from Figure 2 where each N is changed to N safe ; and the rules in Figure 7. (Zero-I and d-I ′ are needed to make the prn reduction rules type-correct.)

[Figure 8: Two sample BCL programs.]

Figure 8 contains two sample BCL programs. For the sake of readability, we use the let construct as syntactic sugar. 10 Propositions 1, 2, and 3 state the key computational limitations and capabilities of BCL. In the following, x: N norm abbreviates x 1 : N norm , . . . , x m : N norm and y: N safe abbreviates y 1 : N safe , . . . , y n : N safe . Recall from §2.13 that our standard notion of time complexity is the time cost model of the CEK-machine (Definition 48(a)).

8 Since, for example, for f : T → T, f ≡ η λx f (x) and depth(λx f (x)) = 1.

9 BCL is much closer to Leivant's formalism [Lei95], which uses a ramified type system, than to Bellantoni and Cook's, which does not use a conventional type system.
Proposition 2's proof rests on three observations: (i) evaluating (prn e e ′ ) takes |e ′ |-many (top-level) recursions, (ii) by the first observation and the details of CEK costs, the time-cost of a CEK evaluation of a BCL expression can be bounded by a polynomial over the lengths of the base-type values involved, and (iii) Proposition 1 provides polynomial bounds on all these lengths. Proposition 2 thus follows through a straightforward induction on the syntactic structure of e. Proposition 3's proof is mostly an exercise in programming.
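Observation (i) can be made concrete with a Python sketch of one standard reading of primitive recursion on notation; since rule (1.1) is not reproduced above, the exact shape of the rule here is our assumption:

```python
def prn(f, x):
    # one recursion step per symbol of the dyadic string x, so
    # evaluating prn(f, x) takes |x|-many top-level recursions
    if x == "":
        return f("", "")
    return f(x, prn(f, x[:-1]))

# example: the tally (unary length) of x, defined by recursion on notation;
# the step function sees the current x and the result r for its predecessor
length = lambda x, r: "" if x == "" else "0" + r
```

Here prn(length, x) makes exactly |x| recursive calls and returns a tally of |x| zeros, a typical use of recursion on notation.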
BCL is ≤:-predicative in the sense that no information about a N safe -value can ever make its way into a N norm -value. BCL's ≤:-predicativity plays a key role in proving the polynomial size-bounds of Proposition 1, but plays no direct (helpful) role in the other proofs.

Building a better BCL
Our definition of ATR in the next section can be thought of as building an extension of BCL that: (i) computes the type-2 BFFs, (ii) replaces prn with something closer to fix, and (iii) admits reasonably direct complexity theoretic analyses. This section motivates some of the differences between BCL and ATR.
Types and depth. We want to extend BCL's type system to allow definitions of functions such as F 0 = λf ∈ N → N, x ∈ N. f (f (x)), a basic feasible functional. A key question then is how to assign types to functional parameters such as f above. Under f : N norm → N safe , F 0 fails to have a well-typed definition. Under any of f : N norm → N norm , f : N safe → N safe , and f : N safe → N norm , F 0 has a well-typed definition, but then so does F 1 = λf ∈ N → N, x ∈ N. f^(|x|)(x), which is not basic feasible. Thus some nontrivial modification of the BCL types seems necessary for any extension to type-level 2. We sketch a naïve extension that uses an informal notion of the depth of an expression (based on second-order polynomial depth, see §2.12). Let the naïve depth of an expression (in normal form) be the depth of nesting of applications of type-level 1 variables. For example, given f : N → N → N, then f (c 0 (f (c 0 (x), y)), c 1 (d(y))) has naïve depth 2. We can regard the values of x and y as (depth-0) inputs and the values of c 0 (x) and d(y) as the results of polynomial-time computations over those inputs. Taking the type-level 1 variables as representing oracles, the value of f (c 0 (x), y) can then be regarded as a depth-1 input (that is, an input given in response to a depth-0 query); hence, c 0 (f (c 0 (x), y)) is the result of a polynomial-time computation over a depth-1 input. Similarly, the value of f (c 0 (f (c 0 (x), y)), c 1 (d(y))) can be regarded as a depth-2 input. Thus, our naïve extension amounts to having, for each d ∈ ω, depth-d versions of both N norm and N safe and treating all arrow types as "depth polymorphic" so, for instance, the type of f as above indicates that f takes depth-d safe values to depth-(d + 1) normal values, for each d ∈ ω. This permits a well-typed definition for F 0 , but not for F 1 .
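The naïve-depth computation can be sketched directly (the tuple representation and names are ours): only applications of the designated type-1 variables increase the depth.

```python
ORACLES = {"f"}   # the type-level 1 variables, treated as oracles

def naive_depth(e):
    """Depth of nesting of applications of type-1 variables in expression e.

    Expressions are nested tuples ("h", e1, ..., ek) applying h to e1..ek;
    a bare string is a base-type variable (depth 0)."""
    if isinstance(e, str):
        return 0
    head, *args = e
    d = max(naive_depth(a) for a in args)
    return d + 1 if head in ORACLES else d

# f(c0(f(c0(x), y)), c1(d(y))) from the text has naive depth 2:
example = ("f", ("c0", ("f", ("c0", "x"), "y")), ("c1", ("d", "y")))
```

The basic operations c0, c1, d are polynomial-time and so contribute nothing to the depth; only the oracle applications do.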
The naïveté of the above is shown by another example. Let F 2 be the functional defined in (4.1); F 2 is basic feasible and |F 2 (f, y)| ≤ |y|, but it is reasonable to think of F 2 (f, y) as having unbounded naïve depth. Our solution to this problem is to use a more relaxed version of ≤:-predicativity than that of BCL. To explain this, let us consider BCL ′ , which is the result of adding the rules of Figure 9 to BCL. (The rewrite rule for down is given in Figure 3.) These typing rules allow information about N safe values to flow into N norm values, but only in very controlled ways. In down-I ′ , the controlling condition is that the length of this N safe information is bounded by the length of some prior N norm value. In If-I ′ , essentially only one bit of information about a N safe value is allowed to influence the N norm value of the expression. Because of these controlling conditions, the proofs of Propositions 1, 2, and 3 go through for BCL ′ with only minor changes, but in place of Proposition 4 we have: Each BCL ′ type γ has a quantitative meaning in the sense that every element of { V[[e]] | Γ ⊢ BCL ′ e: γ } has a polynomial size-bound of a particular form. ATR has rules analogous to If-I ′ and down-I ′ and, consequently, functions such as F 2 have well-typed definitions. Moreover, each ATR type γ has a quantitative meaning in the sense that { V[[e]] | ⊢ ATR e: γ } = the set of all ATR-computable functions having second-order polynomial size-bounds of a form dictated by γ. In particular, for each γ, a d γ ∈ ω can be read off such that all the bounding polynomials for type-γ objects can be of depth ≤ d γ . This is the (non-naïve) connection of ATR's type-system to the notion of depth. The above glosses over the issue of the "depth polymorphic" higher types, which are discussed in §5.
Truncated fixed points. For PCF, fix is thought of as expressing general recursion. It would be ever so convenient if one could replace fix with some higher-type polynomial-time construct and obtain "the" feasible version of PCF in which all (and only) the polynomial-time recursion schemes are expressible. However, because of some basic limitations of subrecursive programming formalisms [Mar72, Roy87], it is unlikely that there is any finite collection of constructs through which one can express all and only such recursion schemes.
Our goals are thus more modest. We make use of the programming construct crec, for clocked recursion. The crec construct is a descendant of Cobham's [Cob65] bounded recursion on notation and not a true fixed-point constructor. The reduction rule for crec is as follows, where a is a constant and x = x 1 , . . . , x k is a sequence of variables. Roughly, |a| acts as the tally of the number of recursions thus far and 0 ⊕ a is the result of a tick of the clock. The value of x 1 is the program's estimate of the total number of recursions it needs to do its job. Typing constraints will make sure that each crec-recursion terminates after polynomially-many steps. Without these constraints, crec is essentially equivalent to fix. Clocking the fixed-point process is a strong restriction. However, results on clocked programming systems ([RC94, Chapter 4]) suggest that clocking, whether explicit or implicit, is needed to produce programs for which one can determine explicit run-time bounds. Along with clocking, we impose two other restrictions on recursions.
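Since the reduction rule itself is not displayed above, the following Python sketch is only a hedged reading of clocked recursion: recursive calls go through a "self" that ticks the clock, and the clock cuts the recursion off once its tally exceeds the estimate passed as the first argument. The details (the default answer, the exact cutoff test) are our assumptions, not ATR's actual rule.

```python
def crec(a, f, *x):
    # a: the clock (|a| = the number of recursions so far);
    # x[0]: the program's estimate of the total recursions needed.
    if len(a) > len(x[0]):
        return ""                       # clock expired: a default base value
    return f(lambda *y: crec("0" + a, f, *y), *x)

# example: build a tally as long as the estimate w, by tail recursion
def body(self, w, acc):
    return acc if len(acc) == len(w) else self(w, "0" + acc)
```

Even a step function that never reaches its own base case is forced to terminate: the clock expires after |x[0]| + 1 calls.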
One use. In any expression of the form (crec a (λ r f e)), we require that f has at most one use in e. Operationally this means that, in any possible evaluation of e, at most one application of f takes place. One consequence of this restriction is that no free occurrence of f is allowed within any inner crec expression. (Even if f occurs but once in an inner crec, the presumption is that f may be used many times.) Affine typing constraints enforce this one-use restriction. Note that prn is a one-use form of recursion.
The motivation for the one-use restriction stems from the recurrence equations that come out of time-complexity analyses of recursions. Under the one-use restriction, bounds on the cost of m steps of a crec recursion are provided by recurrences of the form T (m, n) ≤ T (m − 1, n) + q( n), where n represents the other parameters and q is a (second-order) polynomial. Such T 's grow polynomially in m. Thus, a polynomial bound on the depth of a crec recursion implies a polynomial bound on the recursion's total cost. If, say, two uses were allowed, the recurrences would be of the form T (m, n) ≤ 2 · T (m − 1, n) + q( n), and such T 's can grow exponentially in m.

Tail recursions. The other restriction is to tail recursion, in which every recursive call is a tail call. Formally, we say that (crec a (λ r f e)) expresses a tail recursion when each occurrence of f in e is as the head of a tail call in e. 11 Simplicity is the foremost motivation for the restriction to tail recursions, as they are easy to work with from both programming and complexity-theoretic standpoints. Additionally, tail recursion is a well-studied and widely-used universal form of recursion: there are continuation-passing-style translations of many program constructs into pure tail-recursive programs. (Reynolds [Rey93] provides a nice historical introduction.) Understanding the complexity-theoretic properties of tail-recursive programs should lead to an understanding of a much more general set of programs.
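The contrast between the one-use and two-use recurrences can be checked numerically (a toy calculation, not from the paper):

```python
def cost_one_use(m, q):
    """T(m) = T(m-1) + q, T(0) = 0: the solution m*q is linear in m."""
    return 0 if m == 0 else cost_one_use(m - 1, q) + q

def cost_two_uses(m, q):
    """T(m) = 2*T(m-1) + q, T(0) = 0: the solution q*(2**m - 1) is exponential in m."""
    return 0 if m == 0 else 2 * cost_two_uses(m - 1, q) + q
```

With one use, m recursion steps cost m·q; with two uses, the same depth of recursion already costs q·(2^m − 1).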

Affine tiered recursion
Syntax. ATR (for affine tiered recursion) has the same syntax as PCF with three changes: (i) fix is replaced with crec as discussed in the previous section, (ii) the only variables allowed are those of type-levels 0 and 1, and (iii) the type system is altered as described below.
Types. The ATR types consist of labeled base types (T 0 from Figure 10) and the level 1 and 2 simple types over these base types. We first consider labels (L from Figure 10).

Labels.
Labels are strings of alternating ⋄'s and ✷'s in which the rightmost symbol of a nonempty label is always ⋄. A label a k . . . a 0 can be thought of as describing program-oracle conversations: each symbol a i represents an action (✷ = an oracle action, ⋄ = a program action) with the ordering in time being a 0 through a k . Terminology: ε = the empty label, ℓ ≤ ℓ ′ means label ℓ is a suffix of label ℓ ′ , and ℓ ∨ ℓ ′ is the ≤-maximum of ℓ and ℓ ′ . Also let succ(ℓ) = the successor of ℓ in the ≤-ordering and depth(ℓ) = the number of ✷'s in ℓ.

Labeled base types. The ATR base types are all of the form N ℓ , where ℓ is a label. These base types are subtype-ordered by: N ℓ ≤: N ℓ ′ ⇐⇒ ℓ ≤ ℓ ′ . We thus have the linear ordering: N ε ≤: N ⋄ ≤: N ✷⋄ ≤: N ⋄✷⋄ ≤: · · · , or equivalently, N ✷ 0 ≤: N ⋄ 0 ≤: N ✷ 1 ≤: N ⋄ 1 ≤: · · · . Define depth(N ℓ ) = depth(ℓ). N ✷ d and N ⋄ d are the depth-d analogues of the BCL ′ -types N norm and N safe , respectively. These types can be interpreted as follows.
• A N ε -value is an ordinary base-type input or else is bounded by some prior (i.e., previously computed) N ε -value. • A N ⋄ d -value is the result of a (type-2) polynomial-time computation over N ✷ d -values or else is bounded by some prior N ⋄ d -value.
• A N ✷ d+1 -value is the answer to a query made to a type-1 input on N ⋄ d -values or else is bounded by some prior N ✷ d+1 -value. The N ✷ d types are called oracular and the N ⋄ d 's are called computational.
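The label operations above are simple enough to prototype directly (a sketch; the function names are ours):

```python
def is_label(s):
    """Alternating symbols from {⋄, ✷}, rightmost symbol ⋄ when nonempty."""
    return (set(s) <= {"⋄", "✷"}
            and all(a != b for a, b in zip(s, s[1:]))   # alternation
            and (s == "" or s[-1] == "⋄"))

def leq(l1, l2):          # ℓ ≤ ℓ' iff ℓ is a suffix of ℓ'
    return l2.endswith(l1)

def join(l1, l2):         # ℓ v ℓ': the ≤-maximum of the two labels
    return l2 if leq(l1, l2) else l1

def succ(l):              # next label in the chain ε, ⋄, ✷⋄, ⋄✷⋄, ...
    return "⋄" if l == "" else ("✷" if l[0] == "⋄" else "⋄") + l

def depth(l):             # number of oracle actions ✷ in the label
    return l.count("✷")
```

Note that succ simply prepends the symbol that keeps the string alternating, which reproduces the linear ordering of the labeled base types above.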
The ATR arrow types. These are just the level 1 and 2 simple types over the N ℓ 's. The subtype relation ≤: is extended to these arrow types as in (2.1). Terminology: Let shape(σ) = the simple type over N resulting from erasing all the labels from σ. The tail of a type is given by: tail(N ℓ ) = N ℓ and tail((σ 1 , . . . , σ k ) → N ℓ ) = N ℓ . Let depth(σ) = depth(tail(σ)). When tail(σ) is oracular, we also call σ oracular and let side(σ) = ✷. When tail(σ) is computational, we call σ computational and let side(σ) = ⋄.
An ATR type (σ 1 , . . . , σ k ) → N ℓ is predicative when tail(σ i ) ≤: N ℓ for each i; a type is impredicative when it fails to be predicative. An ATR type (σ 1 , . . . , σ k ) → N ℓ is flat when tail(σ i ) = N ℓ for some i. A type is strict when it fails to be flat.
Examples: N ε → N ⋄ is predicative whereas N ⋄ → N ε is impredicative, and both are strict. Both N ⋄ → N ⋄ and N ⋄ → N ✷⋄ → N ⋄ are flat, but the first is predicative and the second impredicative. Recursive definitions tend to involve flat types.
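These classifications can be prototyped as follows; the reading of predicativity used here (each argument's tail is ≤: the result type, i.e., its label is a suffix of the result's label) is our inference from the examples just given, with the formal definitions in §5:

```python
# A type is a label string, or ("arrow", [argument types], result label).
def tail(t):
    """The tail of a type: an arrow type's result label, or the label itself."""
    return t[2] if isinstance(t, tuple) else t

def is_flat(t):
    _, args, res = t
    return any(tail(a) == res for a in args)

def is_predicative(t):
    # assumed reading: each argument tail <=: the result,
    # where <=: on labels is the suffix ordering
    _, args, res = t
    return all(res.endswith(tail(a)) for a in args)
```

Checking the four examples from the text reproduces their classification.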
Example 23 below illustrates that values of both impredicative and flat types require special restrictions in any sensible semantics of ATR. Our semantic restrictions for these types are made precise in §7 and §9 below. Here we give a quick sketch of these restrictions as they figure in the definition of ∝, the shifts-to relation, used in the typing rules.

Typing rules. The ATR-typing rules are given in Figure 11. The rules Zero-I, Const-I, Int-Id-I, Subsumption, op-I, →-I, and →-E are essentially lifts from BCL (with one subtlety regarding →-E discussed below). The if-I and down-I rules were motivated in §4. The remaining three rules Aff-Id-I and crec-I (which relate to recursions and the split type contexts) and Shift (which coerces types) require some discussion.
Affinely restricted variables and crec. Each ATR type judgment is of the form Γ; ∆ ⊢ e: γ where each type context is separated into two parts: an intuitionistic zone (Γ) and an affine zone (∆). Γ and ∆ are simply finite maps (with disjoint preimages) from variables to ATR-types. By convention, " " denotes an empty zone. Also by convention, we shall restrict our attention to ATR type judgments in which each affine zone consists of at most one type assignment. (See Scholium 7(a).) In reading the rules of Figure 11, think of a variable in an affine zone as destined to be the recursor variable in some crec expression. An intuitionistic zone can be thought of as assigning types to each of the mundane variables.
Terminology: A variable f is said to be affinely restricted in Γ; ∆ ⊢ e: σ if and only if f is assigned a type by ∆ or is λ r -abstracted over in e. The use of split type contexts is adapted from Barber and Plotkin's DILL [Bar96, BP97], 12 a linear typing scheme that permits a direct description of →, the intuitionistic arrow of the conventional simple types. The key rule borrowed from DILL is →-E, which forbids free occurrences of affinely restricted variables in the operand position of any intuitionistic application. This precludes the typing of crec-expressions containing subterms such as λ r f (λg (g (g ε)) f ) ≡ β λ r f (f (f ε)) where f is used multiple times.
The crec-I rule forbids any free occurrence of an affinely restricted variable; if such a free occurrence were allowed, it could be used any number of times through the crec-recursion. The crec-I rule requires that the recursor variable have a type γ ∈ R which in turn becomes the type of the crec-expression. The restrictions in R's definition (in Figure 11) are a more elaborate version of the typing restrictions for prn-expressions in BCL. When γ = (N ✷ d , b 2 , . . . , b k ) → b ∈ R, it turns out that R's restrictions limit a type-γ crec-expression to at most p-many recursions, where p is some fixed, depth-d second-order polynomial (Theorem 43). Excluding N ⋄ 0 , . . . , N ⋄ d−1 in γ forbids depth 0, . . . , d − 1 analogues of N safe -parameters from figuring in the recursion, and consequently, the recursion cannot accumulate information that could change the value of p unboundedly.

Scholium 7.
(a) Judgments with multiple type assignments in their affine zone are derivable. However, such a judgment is a dead end in the sense that crec-I, the only means to eliminate an affine-zone variable, requires a singleton affine zone.
(b) ATR has no explicit ⊸-types. Implicitly, a (λ r f e) subexpression is of type γ ⊸ γ and crec-I plays roles of both ⊸-I and ⊸-E. ATR's very restricted use of affinity permits this ⊸-bypass.
(c) As mentioned in §4, the restriction to tail recursions in crec-I is in the interest of simplicity. In a follow-up to the present paper, we show how to relax this restriction to allow a broader range of affine recursions in ATR programs [DR07]. Dealing with this broader range of recursions turns out to require nontrivial extensions of the techniques of § §12-15 below.
Shift. The Shift rule covariantly coerces the type of a term to be deeper. Before stating the definition of the shifts-to relation (∝), we first consider the simple case of shifting types of shape N → N. The core idea is simply: The motivation for this is that if p and q are second-order polynomials of depths d p and d q , respectively, and x is a base-type variable appearing in p that is treated as representing a depth-d . The full story for shifting level-1 types has to account for arbitrary arities, the sides of the component types, and impredicative and flat types, but even so it is still not too involved. Shifting level-2 types involves a new set of issues that we discuss after dealing with the level-1 case. Recall that max(∅) = 0.
(a) We inductively define ∝ by: It follows from this and condition (i) that no type (or component of a type) can change sides as a result of a shift.
For level-1 types: Condition (i) says that the component types on the right are either the same as or else deeper versions of the corresponding types on the left. Condition (ii) preserves flatness (which is critical in level-2 shifting). Condition (iii) is just the core idea stated above. Note that the max in Definition 8(b) includes only types ≤: N ℓ 0 . This is because, as remarked above, if σ i : N ℓ 0 , then the i-th argument has essentially no effect on the size of the N ℓ ′ 0 -result. Example: Consider the problem: what should the value of d be? Suppose f : N ✷ 0 → N ✷ 1 . Without using subsumption, building a term of type N ✷ 3 from f requires nesting applications of f (using type-1 shifts). The longest chain of such depth-increasing applications is 3. 13 When the argument type N ✷ 0 → N ✷ 1 is shifted to N ✷ 0 → N ✷ 2 , each application of this argument now ups the depth by an additional +1. So, the largest depth that can result from the change is d = 3 + 3 · 1 = 6. When shifting ( σ) → N ℓ to some ( σ ′ ) → N ℓ ′ with each σ i and σ ′ i a level-1 type, to determine ℓ ′ we must: (a) determine all the ways a N ℓ value could be built by a chain of depth-increasing applications of arguments of the types σ, (b) for each of these ways, figure the increase in the depth of the N ℓ -value when each σ i -argument is replaced by its σ ′ i version, and (c) compute the maximum of these increases. To help in this, we introduce undo, defined in Figure 12.

[Figure 12: The definition of undo. In the "otherwise" clause, ℓ ′ = max{ ℓ i | ℓ i < ℓ 0 } and ℓ ′′ is the suffix of ℓ following the leftmost occurrence of ℓ 0 in ℓ.]
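The arithmetic of the example can be phrased generically (a toy calculation of ours): a chain of k depth-increasing applications, each raising the depth by s, reaches depth k·s; re-typing f so that each application raises the depth by 2 instead of 1 turns the chain of 3 applications from depth 3 into depth 6 = 3 + 3·1.

```python
def chain_depth(chain_length, per_application_increase):
    # depth reached by nesting `chain_length` applications of an argument,
    # each of which raises the depth by `per_application_increase`
    return chain_length * per_application_increase
```

So the extra depth contributed by a shift is the chain length times the per-application increase in the argument's depth step.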
To compute undo(τ, N ℓ ), one determines if a type-τ argument could be used in a chain of depth-increasing applications that build a N ℓ value, and if so, one figures (in terms of ℓ) where a leftmost application of such an argument could occur, and returns the ≤:-largest type of the arguments of this application. (It is straightforward to prove that undo behaves as claimed.) N.B. If undo(γ, N ℓ ) is defined, then undo(γ, N ℓ ) : N ℓ . We now define ∝ for the general case, when the σ i 's contain both level-0 and level-1 types, where ( γ) i denotes the subsequence of level-i types of γ.
The recursion of Definition 9(a) determines the maximum increase in depth as outlined above. Since applications amount to simultaneous substitutions, the contributions of the level-0 and level-1 argument shifts are independent. Thus Definition 9(b)'s formula suffices for the general case. Example: See the discussion below of fcat from Figure 13. Now let us consider the reason behind condition (ii) in Definition 8(a). A term of a flat type can be used an arbitrary number of times in constructing a value. Consequently, if Definition 8(a) had allowed flat level-1 types (which increase the depth by 0) to be shifted to strict level-1 types (which increase the depth by a positive amount), then it would have been impossible to bound the depth increase of shifts involving arguments of flat types.
Some examples. Figure 13 contains five sample programs. These examples use the syntactic sugar of the let and letrec constructs. 14 The first three programs and their typings are

13 Note that the outer two of these three applications must involve shifting the type of the argument. Also, informally, in f (f (f (down(f (f (f (f (ε)))), ε)))) only the outer three applications of f count as a chain of depth-increasing applications because of the drop in depth caused by the down. Formally, no shadowed (Definition 29) application can be in a depth-increasing chain.
where len(z) = the dyadic representation of the length of z. This is a surprising and subtle example of a BFF due to Kapron [Kap91] and was a key example that led to the Kapron-Cook Theorem [KC96]. In findk, we assume we have a handful of auxiliary definitions. Filling in these missing definitions is a straightforward exercise. A more challenging exercise is to define (5.1) via prn's.

Semantics.
The CEK machine of §11.1 provides an operational semantics for ATR. For a denotational semantics we provisionally take the obvious modification of PCF's V-semantics.
(V was introduced in §2.7.) Example 23 illustrates some inherent difficulties with V as a semantics for ATR. We shall circumvent these difficulties by some selective pruning of V in §7 and §9.
Some syntactic properties.
Lemma 13 (Unique typing of subterms). If Γ; ∆ ⊢ e: σ, then each occurrence of a subterm in e has a uniquely assignable type that is consistent with Γ; ∆ ⊢ e: σ.
Lemma 11 follows from a straightforward structural induction on judgment derivations. The proof of Lemma 12 is an adaptation of the argument for [Pie02, Theorem 15.3.4]. The proof of Lemma 13 is also an adaptation of standard arguments. We make frequent, implicit use of Lemma 13 below. Lemma 14 is a reality check on the definition of ∝. Its proof is a completely standard induction on derivations except in the case where the last rule used in deriving Γ; ∆ ⊢ λ x e: ( σ) → N ℓ is Shift. The argument for this case is an induction on the structure of e, where application is the key subcase. There one simply checks that our definition of ∝ correctly calculates upper bounds on the increase in depth.
ATR's computational limitations and capabilities. The major goals of the rest of the paper are to establish type-level 2 analogues of Propositions 1, 2, and 3 for ATR. We shall first prove Theorem 43, a polynomial size-boundedness result for ATR. The groundwork for this result will be the investigation of second-order size-bounds in the next few sections.
Remark 15 (Related work). As noted in §1, ramified types based on Bellantoni and Cook's ideas, higher types, and linear types are common features of work on implicit complexity (see Hofmann's survey [Hof00]), but most of that work has focused on guaranteeing complexity of type-level 1 programs. The ATR type system is roughly a refinement of the type systems of [IKR01, IKR02], which were constructed to help study higher-type complexity classes. Also, the type systems of this paper and of [IKR01, IKR02] were greatly influenced by Leivant's elegant ramified type systems [Lei95, Lei94]. We note that in [Lei03] Leivant proposes a formalism that uses intersection types to address the same problems dealt with by our Shift rule (e.g., how to type f (f (x))). Bellantoni and Cook's ideas are predicative in the sense of Proposition 4: no information about "safe values" can influence "normal values." Two principles followed in this paper are: (i) the ramification of data (e.g., the normal/safe distinction) and the complexity it adds to the type system is something we will put up with to control the size of values; (ii) however, if there is a good reason to cut through the ramification while still controlling sizes, then we will happily do so. As a consequence of (i), our type system for second-order polynomial size-bounds is strictly predicative. As a consequence of (ii), ATR's type system includes the if-I and down-I rules and impredicative types to handle examples like F 2 of (4.1).
There is a price for the down construct: its use tends to complicate correctness arguments for algorithms. For example, consider the subexpression (down (k + 1) x) in the ATR-program for findk in Figure 13. The purpose of the down is to guarantee to the type system that the subexpression's value is small (e.g., ≤ |x|). The correctness of the algorithm depends critically on the easy observation that, in any run of the program, the value of the subexpression will always be k + 1. This is common in expressing algorithms in ATR: one knows that a value is small, but an application of down is needed to convince the type system of this. As a result, the correctness proof needs a lemma showing that the original value is indeed small and that the down expression does not change the value. Thus our use of down and (mild) impredicativity is a compromise between the simplicity, but restrictiveness, of predicative systems and the richer, but more complex, type systems that permit finer reasoning about size.

Size bounds

The second-order polynomials under the size types. To work with size bounds, we introduce the size types and a typing of second-order polynomials under these types. The size types parallel the intuitionistic part of ATR's type system.
These |σ|'s are the size types. All the ATR-types terminology and operations (e.g., shape, tail, ≤:, ∝, etc.) are defined analogously for size types. Later, a pruned version of the L-semantics will end up as our intended semantics for the second-order polynomials to parallel our pruning of the V-semantics for ATR.
The following definition formalizes what it means for an ATR expression to be polynomially size-bounded. N.B. This definition heavily overloads the "length of" notation, |·|. In particular, if x is an ATR variable, we treat |x| as a size-expression variable. Definition 18(c) is based on a similar notion from [IKR02].

Lemma 21 (Label Soundness). Suppose Σ ⊢ p: σ has a derivation in which the only types assigned by contexts are from …

Example 23. Let e 1 and e 2 be as given in Figure 15, let prn be as in Figure 13, and let dup be an ATR-version of the definition in Figure 8.
The problem of Example 23(a) is addressed in §7 by pruning the L- and V-semantics to restrict impredicative-type values. The problem of Example 23(b) is addressed in §9 by further pruning to restrict flat-type values.

Impredicative types and nearly well-foundedness
Failing to restrict impredicative-type values leads to problems like the one of Example 23(a). These problems can be avoided by requiring that each impredicative-type value have a length that is nearly well-founded.
A t ∈ L[[γ]] is γ-well-founded when γ = T ℓ or else γ = (σ 1 , . . . , σ k ) → T ℓ and, for each i with tail(σ i ) : T ℓ , the function t has no dependence on its i-th argument. A t is nearly γ-well-founded when there is a γ-well-founded t ′ such that t ≤ t ′ .
Remark 25. Why nearly well-founded? The natural sources of ATR-terms with impredicative types are the if-then-else and down constructs. Let c = λx, y, z (if x then y else z) and d = λx, y (down x y), where ⊢ c: … Neither |c| nor |d| is well-founded since |c| = λk, m, n (m, if k = 0; n, otherwise) and |d| = λk, m min(k, m). However, both |c| and |d| are nearly well-founded as |c| ≤ λk, m, n (m ∨ n) and |d| ≤ λk, m m.
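The length functions of Remark 25 can be checked concretely. The sketch below takes |c| and |d| directly from the remark; the model of down (returning the empty string when |x| > |y|) is our own assumption, chosen only so that |down x y| ≤ min(|x|, |y|) holds:

```python
def down(x, y):
    # Hypothetical model of ATR's down: pass x through when it is already
    # small (|x| <= |y|); otherwise return a default small value.  Any such
    # choice satisfies |down x y| <= min(|x|, |y|) = |d|(|x|, |y|).
    return x if len(x) <= len(y) else ""

def len_c(k, m, n):
    # |c| from Remark 25: the length of (if x then y else z).
    return m if k == 0 else n

def len_d(k, m):
    # |d| from Remark 25: min(|x|, |y|).
    return min(k, m)

# Neither |c| nor |d| is well-founded (each depends on its first argument),
# but both sit below well-founded bounds, hence are nearly well-founded.
for k in range(4):
    for m in range(4):
        assert len_d(k, m) <= m                  # |d| <= λk, m (m)
        for n in range(4):
            assert len_c(k, m, n) <= max(m, n)   # |c| <= λk, m, n (m ∨ n)

assert len(down("111", "1")) <= min(3, 1)
```

The loops make the pointwise inequalities of the remark explicit on a small sample grid.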
Lemma 26 follows by a straightforward induction and indicates that a semantics for the second-order polynomials based on nearly well-foundedness will be well defined. There is still a problem with impredicative-type values. In deriving closed-form upper bounds on recursions, we often need a well-founded upper bound on the value of a variable of an impredicative type. There is no effective way to obtain such a bound. We thus do the next best thing: give a canonical such upper bound a name and work with that name.
Definition 28. We add a new combinator, p, to the second-order polynomials such that … (See Figure 17 for p's typing rule.) For each variable x, we abbreviate (p x) by p x .
The choice p makes is analogous to the choice of a in the situation where one knows f ∈ O(n) and picks the least a ∈ ω such that f (n) ≤ a · (n + 1) for all n ∈ ω. In most uses, p x 's are destined to be substituted for by concrete, well-founded terms.
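The analogy can be made concrete. In the sketch below (the function name is ours), sampling over a finite range stands in for genuine knowledge that f ∈ O(n), so this is an illustration of the choice being made, not an effective procedure:

```python
def least_linear_bound(f, trials=1000):
    """For f known to be O(n): the least a with f(n) <= a * (n + 1)
    on the sampled range 0..trials-1."""
    return max(-(-f(n) // (n + 1)) for n in range(trials))  # ceiling division

f = lambda n: 3 * n + 2
a = least_linear_bound(f)
assert all(f(n) <= a * (n + 1) for n in range(1000))   # a = 3 works here
```

As with p x above, the point is that some such a exists and can be named, even though nothing canonical singles it out.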
To help work with terms involving impredicative types we introduce: Definition 29 (Shadowing). Suppose Σ ⊢ p: σ. An occurrence of a subterm r of p is shadowed when the occurrence properly appears within another shadowed occurrence or else the occurrence has an enclosing subexpression (t r) where the occurrence of t is of an impredicative type σ → τ with tail(σ) : tail(τ ). A variable x is a shadowed free variable for p when all of x's free occurrences in p are shadowed; otherwise x is an unshadowed free variable for p.

Safe upper bounds
The restriction to the V nwf -semantics solves the problem with impredicative types, but not the problem with flat types. To work towards a solution of this latter problem, in this section we introduce the notion of a safe second-order polynomial (Definition 30) and show that any expression (in a simplification of ATR) that does not involve flat-type variables has a safe upper bound. The next section proposes a solution to the flat-type problem: that each flat-type length must have a safe upper bound. Theorem 43, in §10, shows that this proposed solution does indeed work. Convention: In this section b, γ, σ, and τ range over size types. In writing p = (x p 1 . . . p k ), we mean x is a variable and, when k = 0, p = x.
Definition 30. (a) We say that p is b-strict with respect to Σ when tail(γ) ≤: b and every unshadowed free-variable occurrence in p has a type with tail : b.
(b) We say that p is b-chary with respect to Σ when γ = b and either (i) p = (x q 1 · · · q k ) with each q i b-strict or (ii) p = p 1 ∨ · · · ∨ p m , where each p i satisfies (i). (Note that 0 sneaks in as b-chary; take m = 0 in (ii).) (c) We say that p is γ-safe with respect to Σ if and only if (i) when γ = T ✷ d , then p = nwf q ∨ r where q is γ-strict and r is γ-chary, (ii) when γ = T ⋄ d , then p = nwf q + r where q is γ-strict and r is γ-chary, and (iii) when γ = σ → τ , then (p x) is τ -safe with respect to Σ, x: σ.
With the above notions, we drop the "with respect to Σ" when Σ is clear from context. Examples: Recall the bound p + Σ_{j=1}^{n} |y j | of Proposition 1(b). In terms of the size types, the subterm p is T ⋄ -strict, the subterm Σ_{j=1}^{n} |y j | is T ⋄ -chary, and hence, p + Σ_{j=1}^{n} |y j | is T ⋄ -safe. Roughly, Proposition 1 implies that each BCL expression has a safe size-bound.

24 N. DANNER AND J. S. ROYER

Figure 16: The additional typing rules for GR.
Strictness and chariness are syntactic notions, whereas safety is a semantic notion because of the use of = nwf in Definition 30(c). Thus:

Proof. Since 0 is both b-strict and b-chary and since p = nwf p ∨ 0 = nwf 0 ∨ p = nwf p + 0 = nwf 0 + p, the lemma follows.
The next lemma notes a key property of safe second-order polynomials.
Lemma 32 (Safe substitution). Fix Σ. Given a γ-safe p 0 , a σ-safe p 1 , and a variable x with Σ(x) = σ, we can effectively find a γ-safe p ′ 0 such that p 0 [x : = p 1 ] ≤ nwf p ′ 0 .

Proof. Except for the case when p 1 is a λ-expression, the argument is a straightforward induction. When p 1 is a λ-expression, the substitution can trigger a cascade of other substitutions to deal with. However, as we are working with an applied simply-typed λ-calculus, strong normalization holds [Win93], and hence these cascades are finite. Consequently, to deal with this case we simply use a stronger induction than before, say on the syntactic structure of p 0 and p 1 and on the length of the longest path of β-reductions to normal form of p 0 [x : = p 1 ]. This is fairly conventional and left to the reader.

Remark 25 informally argued that if e, an ATR expression, does not involve impredicative-type variables, then |e| has a well-founded upper bound. The analogous argument here would be that if e does not involve flat-type variables, then |e| has a safe upper bound. This assertion is true, but not so interesting, because most natural crec-expressions have their recursor variable of flat type. To get around this problem we introduce a little formalism, GR (for growth rate), which includes a simple iteration construct that does not depend so heavily on flat-type variables and which captures ATR's growth-rate properties, including ATR's difficulty with flat-type values. We show in Theorem 34 that GR expressions that do not involve flat-type variables have safe upper bounds. We straightforwardly extend the L nwf -semantics for second-order polynomials to GR. Note: L nwf [[λm, n (R f m n)]] ρ = λm, n ∈ ω (n · 2^m) when ρ(f ) = λk ∈ ω (2k). So GR has familiar problems with flat-type values. We note that the GR analogues of Lemmas 19, 20, and 22 all hold. Terminology: Σ ⊢ s: σ is flat-type-variable free when no variable is explicitly or implicitly assigned a flat type by the judgment.
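The Note above about R can be checked concretely. The sketch below rests on our reading of GR's iteration construct, namely that (R f m n) applies f to n, m times; under that reading the exponential growth from the flat-type f = λk (2k) is immediate:

```python
def R(f, m, n):
    """Our guess at GR's iterator: apply f, m times, starting from n."""
    for _ in range(m):
        n = f(n)
    return n

# With f = λk (2k), iterating m times multiplies n by 2^m,
# matching L_nwf[[λm, n (R f m n)]] = λm, n (n · 2^m).
assert R(lambda k: 2 * k, 5, 3) == 3 * 2 ** 5
```

The point of the example is exactly the flat-type danger: a size bound on f (here linear) does not yield a polynomial size bound on the iteration.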
Theorem 34. Given a flat-type-variable free Σ ⊢ s: γ, we can effectively find a γ-safe p s with respect to Σ such that s ≤ nwf p s . Moreover, we can choose p s so that all free variable occurrences are unshadowed.

Proof. Without loss of generality we assume that s is in β-normal form. The argument is a structural induction on the derivation of Σ ⊢ s: γ. We consider the cases of the last rule used in the derivation. Let d range over ω.

Case: →-I. This case follows by the induction hypothesis and clause (iii) in Definition 30(c).
Case: →-E. This case follows by the induction hypothesis and Lemma 32.

Case: Subsumption. Then by Subsumption we know that Σ ⊢ s: γ ′ where γ ′ ≤: γ. Without loss of generality, we assume γ ′ : γ. By the induction hypothesis there exists p, a γ ′ -safe size-bound for s with respect to Σ. It follows from Definition 30 that p is γ-strict with respect to Σ. Hence, p s = p suffices.

Case: … Thus this case follows from Lemma 20 (in both its second-order polynomial and GR versions) and Definition 30.
Case: s-I. Then s = (s s 1 ) and γ = T ⋄ d . So by s-I, we know that Σ ⊢ s 1 : T ⋄ d and by the induction hypothesis we have that there is a T ⋄ d -strict q and a T ⋄ d -chary r with s 1 ≤ nwf q + r. Thus p s = (q + 1) + r suffices since q + 1 is T ⋄ d -strict.
Lemma 36's proof is an induction on the derivation of Σ ⊢ p: σ. Everything is fairly straightforward except that the →-E case depends critically on Lemma 32. Lemma 36 indicates that a semantics for the second-order polynomials based on well-temperedness will be well defined. There is still a problem with flat-type values. To give closed-form upper bounds on recursions, we sometimes need to decompose a safe flat-type polynomial into strict and chary parts. (Recall that safety is a semantic, not syntactic, notion.) For flat-type-variable free safe polynomials this is easy. A way of breaking flat-type variables into strict and chary parts would allow us to extend this decomposition to all safe polynomials. We introduce two new combinators to effect such a decomposition. Since there is no canonical way to do this decomposition, we take a different (and trickier) approach from that of Definition 28.
Definition 38. We add two new combinators, q and r, to the second-order polynomials, with typing rules given in Figure 17, where x ′ = x i 1 , . . . , x i m , x ′′ = x j 1 , . . . , x j n , and { z 1 , . . . } …, where (i) q is b-strict with respect to Σ, x: σ; (ii) r is b-chary with respect to Σ, x: σ and r has no occurrence of any z i ; and (iii) …, for oracular γ.
For each variable x, we abbreviate (q x) by q x and (r x) by r x . Also, we take (q x x ′ ) as being b-strict and (r x x ′′ ) as being b-chary.
By Definitions 35 and 37, q and r as in Definition 38 must exist. By the axiom of choice, there are functions that pick out particular q and r. N.B. The choices of q and r are arbitrary, subject to satisfying conditions (i), (ii), and (iii) of Definition 38. The semantics for the second-order polynomials is thus parameterized by the functions that pick out the required q's and r's. The choices q and r make are analogous to the choices of a and b ∈ ω in the situation where one knows that f ∈ O(n) and picks some arbitrary a and b such that f (n) ≤ a · n + b for all n.

Lemma 41 (Manifestly safe substitution). Fix Σ. Given a manifestly γ-safe p 0 , a manifestly σ-safe p 1 , and a variable x with Σ(x) = σ, we can effectively find a manifestly γ-safe p ′ 0 such that p 0 [x : = p 1 ] ≤ nwf p ′ 0 .

Proof. This is a straightforward adaptation of the proof of Lemma 32.
We now have a reasonable semantics for ATR and the tools to work with this semantics to establish (in Theorem 43) a safe polynomial boundedness result for ATR, where: Definition 42. Suppose Γ; ∆ ⊢ e: σ. We say that p is a |σ|-safe polynomial size-bound for e with respect to Γ; ∆ when p is a |σ|-safe second-order polynomial with respect to |Γ; ∆| and |V wt [[e]] ρ| ≤ L wt [[p]] |ρ| for all ρ ∈ V wt [[Γ; ∆]]; if in addition p is manifestly |σ|-safe with respect to Γ; ∆, we say that p is a manifestly |σ|-safe polynomial size-bound for e with respect to Γ; ∆. (The "with respect to" clause is dropped when it is clear from context.)
Proof. The argument is a structural induction on the derivation of Γ; ∆ ⊢ e: γ. We consider the cases of the last rule used in the derivation. Excluding the crec case, everything is fairly straightforward. Fix ρ ∈ V wt [[Γ; ∆]]. Note that |V wt [[ · ]] ρ| is invariant under β-and η-equivalence. So without loss of generality, we assume that e is in β-normal form.
Cases: Int-Id-I and Aff-Id-I. Then e = x, a variable. Subcase: γ is strict. Then p e = |x| clearly suffices. Subcase: γ is flat. Hence, level(γ) = 1. Let … Then by Definitions 37 and 38, …

Case: c a -I. Then e = (c a e ′ ) for some e ′ and γ = N ⋄ d for some d. By the induction hypothesis, there is p e ′ , a manifestly T ⋄ d -safe polynomial size-bound for e ′ with respect to Γ; ∆. Clearly p e = 1 + p e ′ suffices.
Cases: Subsumption and Shift. These follow as in the proof of Theorem 34. Aside: For the arguments for the →-I and →-E cases below, recall from §2.10 that (2.2) and (2.3) provide the definition of length for elements of TC of type-level 1 and type-level 2, respectively, and that higher-type lengths are pointwise monotone nondecreasing.
Case: →-I. Then e = λx e ′ and γ = σ → τ . By our induction hypothesis, there is a p e ′ , a manifestly |τ |-safe polynomial size-bound for e ′ with respect to Γ, x: σ; ∆. Let …

Case: →-E. Then e = (e 0 e 1 ) and for some σ we have that Γ; ∆ ⊢ e 0 : σ → γ and Γ; ⊢ e 1 : σ. By the induction hypothesis, there are p e 0 and p e 1 such that p e 0 is a manifestly (|σ| → |γ|)-safe polynomial size-bound for e 0 and p e 1 is a manifestly |σ|-safe polynomial size-bound for e 1 . By Lemma 41 we can effectively find a manifestly γ-safe p e such that (p e 0 p e 1 ) ≤ wt p e . Then we have the chain of bounds of Figure 19. Clearly this p e suffices.
Case: If-I. Then e = (if e 0 then e 1 else e 2 ). By the induction hypothesis, there are p e 1 and p e 2 , manifestly |γ|-safe polynomial size-bounds for e 1 and e 2 respectively. Clearly p e = p e 1 ∨ p e 2 suffices.
We have just one case left, but now the real work starts.

Case: crec. Then e = … (Figure 11.) For simplicity we assume { b 1 , …
Without loss of generality we suppose: … where Γ; f : γ ⊢ B: b 0 for Γ = Γ, x 1 : b 1 , . . . , x k : b k , B is in β-normal form, and TailPos(f, B). Aside: To find p e for this case, we analyze e's tail recursion and determine size bounds on how large the tail-recursion's arguments can grow. In particular, we show that there is a polynomial bound beyond which the first argument cannot grow; hence, by (4.2), this polynomial bounds the depth of e's tail recursion. From this bound on recursion depth and from the size bounds on the tail-recursion arguments, constructing p e is straightforward. To derive these bounds, we proceed a little informally and work with unfolded versions of e.
Consider the occurrences of f in B. Since we have TailPos(f, B) and Γ; f : γ ⊢ B: b 0 , these occurrences must have enclosing expressions of the form (f e 1 . . . e k ), where Γ; ⊢ e 1 : b 1 , . . . , Γ; ⊢ e k : b k . For a given such subexpression of B, we know by the induction hypothesis that, for each i = 1, . . . , k, there is a p i , a manifestly b i -safe polynomial size-bound for e i with respect to Γ; . Since f occurs but finitely many times in B, we may choose p 1 , . . . , p k so that they bound the size of the corresponding argument expressions for every f -application in B. Without loss of generality, we assume that if b i = b j , then p i = p j .
By the induction hypothesis, we may take p i to be q ∨ r ∨ t ∨ t̂, where q is T ✷ 0 -strict, r is strictly T ✷ 0 -chary, t = Σ_{a=0}^{b 0} |u 0 a |, and t̂ = Σ_{a=0}^{c 0} |u 0 a |. It follows from the size typing rules that the only T ✷ 0 -strict terms are = wt 0. So, it suffices to take p i = r ∨ t ∨ t̂. Note that r = r ξ and t = t ξ since neither r nor t has any occurrences of any u 0 a . Also recall that we are assuming that if b i = b j , then p i = p j . Thus, for each a, |u 0 a | … Hence, our choice of p i suffices for this case.
Case: b i = N ✷ 1 . By the induction hypothesis, we can take p i to be of the form q ∨ r ∨ t ∨ t̂, where q is T ✷ 1 -strict, r is strictly T ✷ 1 -chary, t = Σ_{a=0}^{b 1} |u 1 a |, and t̂ = Σ_{a=0}^{c 1} |u 1 a |. We first consider q. Since Γ does not assign any of x 1 , . . . , x k the type N ⋄ 0 , the only variables from x 1 , . . . , x k whose lengths can occur in q are those assigned type N ε . Let q̂ = q ξ, where for each i ′ with b i ′ = N ε , we take p i ′ to satisfy part (a). Hence, it follows that q̂ ξ = wt q̂. Also, by the monotonicity of everything in sight, we have that q ≤ wt q̂. By the same argument, for r̂ = r ξ we have that r̂ ξ = wt r̂ and r ≤ wt r̂. So, it suffices to take p i = q̂ ∨ r̂ ∨ t ∨ t̂. Note that t = t ξ since t has no occurrence of any u d a . Also recall that we are assuming that if b i = b j , then p i = p j . Thus, for each a, |u 1 a | … Hence, our choice of p i suffices for this case.

Cases: … These cases follow from essentially the same argument as given for the b i = N ✷ 1 case.
Therefore, part (a) follows. We henceforth assume that p i satisfies part (a) for each i with b i ≤: N ✷ d 1 .

For parts (b) and (c), consider the cases of …
By the induction hypothesis, we may take p i to be of the form … Note that, as in the previous cases, t = t ξ. Also recall that we are assuming that if b i = b j , then p i = p j . … Hence, taking q i = q and r i = r suffices for this case.
Case: b i = N ✷ d 1 +1 . By the induction hypothesis, we may take p i to be of the form … By an argument similar to the one for the previous case, it follows that taking q i = q and r i = r suffices for this case too.
Cases: b i = N ⋄ d 1 +1 , . . . , b max . These cases follow from essentially the same arguments as given for the previous two cases.

Lemma 44
Henceforth we assume that each p i is as in Lemma 44 and, in the cases where N ✷ d 1 : b i , that q i and r i are as in that lemma too. For each n ∈ ω, define e (n) = the β-normal form of the n-level unfolding of e's crec-recursion, where β- and η-reductions are used to neaten things up as in the definition of e (1) . So, e (0) = e and e (1) = our prior definition of e (1) . Let ξ (0) = the empty substitution and ξ (n+1) = ξ • ξ (n) = the (n + 1)-fold composition of ξ. It follows that, with respect to Γ; , for each i and n, (p i ξ (n) ) is a size bound for the i-th argument expression of every f -application in e (n) .
Lemma 45 (The n step lemma). For each i and n: …

Proof. Part (a) follows directly from Lemma 44(a). For parts (b) and (c) we first note that, by monotonicity, for all k and i, (q i ∨ r i )ξ (k) ≤ wt (q i ∨ r i ) ξ (k+1) . Now, for part (b), it follows immediately from Lemma 44(b) that, for each n and i, we have … Hence, by the noted monotonicity of (q i ∨ r i )ξ (·) , part (b) follows. For part (c), first fix i such that b i = N ⋄ d with d ≥ d 1 . It follows from an easy induction that, for all n, p i ξ (n) ≤ wt ((Σ_{j=1}^{n} q i ξ (j) ) + (Σ_{j=1}^{n} r i ξ (j) )) ∨ p i ; note the parallel to the argument for the prn-case of Proposition 1. Hence, by monotonicity of q i ξ (·) and r i ξ (·) , part (c) follows.

Taking σ = …, b max in turn, we inductively define θ σ to be the substitution [x j : = p ′ j : b j = σ] and also define, for each i with b i = σ: …

Lemma 47 …

By the induction hypothesis, there exists p B , a manifestly |b 0 |-safe polynomial size-bound for B (as in (10.1)) with respect to Γ; f : γ. By Lemma 41, we can effectively find a manifestly b 0 -safe p such that p B [|f | : = 0 γ , |x 1 | : = p ′ 1 , . . . , |x k | : = p ′ k ] ≤ wt p. The effect of the substitution on p B is to trivialize |f | and to replace each |x i | with the final size bound from Lemma 47. It follows that p is a manifestly b 0 -safe size bound for the value returned by the final step of the crec-recursion. Since TailPos(f, B), p is also a size bound on the value returned by the entire (tail) recursion. Thus, p e = λ|x 1 |, . . . , |x k | p suffices for the crec case.

Theorem 43
An abstract machine

Our next major goal is to show that every ATR expression is computable within a second-order polynomial time-bound (Theorem 79). Before formalizing time bounds, we first need to make precise what is being bounded. Below we set out the abstract machine that provides the operational semantics of PCF, BCL, and ATR and, based on this, §11.2 introduces and justifies our notion of the time cost of an expression evaluation.

We note that the standard proof that storage modification machines and Turing machines are polynomially-related models of computation [Sch80] straightforwardly extends to show that, at type-levels 1 and 2, our CEK model of computation and cost is (second-order) polynomially related to Kapron and Cook's oracle Turing machines under their answer-length cost model [KC96].
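To fix intuitions before the formal development, here is a minimal CEK machine for the call-by-value λ-calculus, with a step counter standing in for the CEK cost model. This is a sketch under our own simplifications (all names are ours): the machine for ATR also handles constants, conditionals, oracles, and crec.

```python
class Var:
    def __init__(self, x): self.x = x

class Lam:
    def __init__(self, x, body): self.x, self.body = x, body

class App:
    def __init__(self, rator, rand): self.rator, self.rand = rator, rand

class Clo:  # a machine value: a λ-abstraction paired with its environment
    def __init__(self, lam, env): self.lam, self.env = lam, env

def apply_k(k, v):
    """Feed the value v to the continuation k; return (control, env, kont, done).
    Continuations: ('halt',), ('arg', rand, env, k), ('fun', closure, k)."""
    if k[0] == 'halt':
        return None, None, None, True
    if k[0] == 'arg':                     # operator done; evaluate the operand
        _, rand, env, k1 = k
        return rand, env, ('fun', v, k1), False
    _, clo, k1 = k                        # 'fun': perform the β-step
    env = dict(clo.env)
    env[clo.lam.x] = v
    return clo.lam.body, env, k1, False

def run(term):
    """Run from (term, {}, halt); return the final value and the step count."""
    c, e, k, steps = term, {}, ('halt',), 0
    while True:
        steps += 1
        if isinstance(c, App):            # push: evaluate the operator first
            c, e, k = c.rator, e, ('arg', c.rand, e, k)
        else:                             # Var or Lam: produce a value
            v = e[c.x] if isinstance(c, Var) else Clo(c, e)
            c, e, k, done = apply_k(k, v)
            if done:
                return v, steps

# ((λx. x) (λy. y)) evaluates to the closure of λy. y in 4 machine steps
value, steps = run(App(Lam('x', Var('x')), Lam('y', Var('y'))))
assert value.lam.x == 'y' and steps == 4
```

Counting machine transitions, as done here, is the sense in which "the time cost of an expression evaluation" is made precise below; the step counts of this sketch should not be read as ATR's exact cost constants.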

Time bounds
As the next step towards showing polynomial time-boundedness for ATR, the present section sets up a formal framework for working with time bounds. We start by noting the obvious: Run time is not an extensional property of programs. That is, V wt -equivalent expressions can have quite distinct run time properties. Because of this we introduce T , a new semantics for ATR that provides upper bounds on the time complexity of expressions.
The setting. Our framework for time complexities uses the following simple setting.
CEK costs. Time costs are assigned to ATR-computations via the CEK cost model.

Worst-case bounds. T [[e]] will provide a worst-case upper bound on the CEK cost of evaluating e, but not necessarily a tight upper bound.
No free lunch. All evaluations have positive costs. This even applies to "immediately evaluating" expressions (e.g., λ-expressions), since checking whether something "immediately evaluates" counts as a computation with costs.
Inputs as oracles. We treat each type-level 1 input f as an oracle. In a time-complexity context this means that f is thought of as answering any query in one time step, or equivalently, any computation involved in determining the reply to a query happens unobserved off-stage. Thus the cost of a query to f involves only (i) the time to write down a query v, and (ii) the time to read the reply f (v). The times (i) and (ii) are bounded by roughly |v| and |f |(|v|), respectively. Thus our time bounds will ultimately be expressed in terms of the lengths of the values of free and input variables.
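The charging scheme just described can be sketched directly (the function name is ours, and the "roughly" of the text is taken literally, with no machine-overhead constants):

```python
def query_cost(len_f, v):
    """Charged cost of one oracle query, following the text: roughly |v|
    to write down the query plus |f|(|v|) to read off the reply."""
    return len(v) + len_f(len(v))

# an oracle f together with its length |f|; here f appends a digit, so
# |f(v)| <= |v| + 1 and we may take |f|(n) = n + 1
f = lambda v: v + "1"
len_f = lambda n: n + 1

assert len(f("101")) <= len_f(len("101"))   # |f| really bounds f's output
assert query_cost(len_f, "101") == 3 + 4    # write "101", read a 4-digit reply
```

Note that only |f|, not f itself, enters the cost: any computation f performs happens off-stage, which is why the time bounds are second-order polynomials in the lengths of the inputs.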
Currying and time complexity. In common usage, "the time complexity of e" can mean one of two things. When e is of base type, the phrase usually refers to the time required to compute the value of e. We might think of this as time past: the time it took to arrive at e's value. When e is of an arrow type and thus describes a procedure, the phrase usually refers to the function that, given the sizes of arguments, returns the maximum time the procedure will take when run on arguments of the specified sizes. We might think of this as time in possible futures in which e's value is applied. An expression can have both a past and futures of interest. Consider (e 0 e 1 ), where e 0 is of type N ε → N ε → N ⋄ and e 1 is of type N ε . Then (e 0 e 1 ) has a time complexity in the first sense, as it took time to evaluate the expression, and, since (e 0 e 1 ) is of type N ε → N ⋄ , it also has a time complexity in the second sense. Now consider just e 0 itself. It too can have a nontrivial time complexity in the first sense, and the potential/futures part of e 0 's time complexity must account for the multiple senses of time complexity just attributed to (e 0 e 1 ). Type-level-2 expressions add further twists to the story. Our treatment of time complexity takes into account these extended senses.
Costs and potentials. In the following, the time complexity of an expression e always has two components: a cost and a potential. A cost is always a positive (tally) integer and is intended to be an upper bound on the time it takes to evaluate e. The form of a potential depends on the type of e. Suppose e is of a base (i.e., string) type. Then e's potential is intended to be an upper bound on the length of its value, an element of ω. The length of e's value describes the potential of e in the sense that when e's value is used, its length is the only facet of the value that plays a role in determining time complexities. Now suppose e is of type, say, N ε → N ⋄ . Then e's potential will be an f e ∈ (ω → ω × ω) that maps a p ∈ ω (the length/potential of the value of an argument of e) to a (c r , p r ) ∈ ω × ω, where c r is the cost of applying the value of e to something of length p and p r is the length/potential of the result. Note that (c r , p r ) is a time complexity for something of base type. Generalizing from this, our motto will be: The potential of a type-(σ → τ ) thing is a map from potentials of type-σ things to time complexities of type-τ things. Our first task in making good on this motto is to situate time complexities in a suitable semantic model.

A model for time complexities. The time types are the result of the following translations ( · and · ) of ATR types: … Condition (i) above restricts L wt [[ σ → τ ]] so that the projection Pot acts as advertised. Condition (ii) restricts each f ∈ L wt [[ σ → τ ]] so that the size information in f (p) depends only on the size information in p.
We can now define the T (time-complexity) and P (potential) interpretations of the ATR types. (The P-interpretation is a notational convenience.)

The T -interpretation of constants and oracles. The following two definitions introduce a translation from the V wt model into the T model. We use this translation to assign time complexities to program inputs: string constants and oracles. (In a more general setting (e.g., call-by-name), a (σ → τ ) potential is a map from σ-time-complexities to τ -time-complexities, as an operator may be applied to an unevaluated operand.)
We view f as the time complexity of f as an oracle: the only time costs associated with applying f are those involved in setting up applications of f and reading off the results. Recall that under call-by-value, a λ-expression immediately evaluates to itself. The function-symbol f will be treated analogously to a λ-term. Hence, the cost component of f is 1. The definition of f parallels both our informal discussion of the notion of the potential of a type-level 1 function and the definition of the length of functions of type-levels 1 and 2 in §2.10. One can show that when f is of type-level 2, f is total. (The argument is similar to the proof of the totality of the type-level 2 notion of length defined by (2.3) in §2.10.) Definition 51 and the type-level 1 part of Definition 52 describe the time complexities of possible ATR inputs. The following lemma unpacks the definition of f for f of type-level 1. The proof is a straightforward induction and hence omitted. Suppose that t 0 (respectively, t 1 ) is the time complexity of a type-(σ → τ ) expression e 0 (respectively, type-σ expression e 1 ). Then t 0 ⋆ t 1 is intended to be the time complexity of (e 0 e 1 ). The cost component of t 0 ⋆ t 1 is: (the cost of evaluating e 0 ) + (the cost of evaluating e 1 ) + (the cost of applying e 0 's value to e 1 's value) + 3, where the 3 is the CEK-overhead of an application. The potential component is simply the potential of the result of the application. The next lemma works out the effect of the ⋆ operation for type-level 1 oracles. For k = 1, the right-hand side of (12.1) simplifies to: ((1 ∨ |v 1 |) + 1 ∨ |f |(|v 1 |) + 4, |f |(|v 1 |)). (See Figure 21.) For k = 2, the right-hand side of (12.1) simplifies to: ((1 ∨ |v 1 |) + (1 ∨ |v 2 |) + 1 ∨ |f |(|v 1 |, |v 2 |) + 9, |f |(|v 1 |, |v 2 |)). We leave it to the reader to break down its cost component.
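The cost accounting for ⋆ just described can be sketched as follows, representing a time complexity as a (cost, potential) pair with arrow-type potentials as Python functions (the toy potential in the example is our invention):

```python
def star(t0, t1):
    """t0 ⋆ t1: the time complexity of an application (e0 e1), given the
    time complexity t0 = (c0, p0) of e0 (p0 maps an argument potential to a
    (cost, potential) pair) and t1 = (c1, p1) of e1.  The +3 is the CEK
    overhead of an application, as in the text."""
    (c0, p0), (c1, p1) = t0, t1
    c_app, p_result = p0(p1)
    return c0 + c1 + c_app + 3, p_result

# a toy potential for something of type N -> N: applied to an argument of
# length p, it costs p + 1 steps and yields a result of length 2 * p
pot = lambda p: (p + 1, 2 * p)

cost, p_res = star((5, pot), (2, 7))
assert (cost, p_res) == (5 + 2 + 8 + 3, 14)
```

Note how ⋆ moves the application cost p0(p1) out of the future (the potential) and into the past (the cost), leaving only the result's potential behind.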

T -Environments.
As a companion to T -application we shall define an analogue of currying in T . First, we introduce T -environments. Recall that in a call-by-value language, variables name values [Plo75], i.e., the end result of a (terminating) evaluation. Thus, a value does not need to be evaluated again, at least no more than an input value does. Hence, if a T -environment maps a variable to a type-γ time complexity (c, p), then c should be: 1 ∨ p, when γ is a base type, and 1, when γ is an arrow type.
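The cost convention for T -environments can be sketched directly (the function name is ours; potentials are here taken as plain values for illustration):

```python
def env_entry(gamma_is_base, p):
    """Time complexity a T-environment assigns to a variable of potential p:
    variables name already-evaluated values, so the cost is 1 ∨ p at base
    type (reading off a value of length p) and 1 at arrow type."""
    return (max(1, p), p) if gamma_is_base else (1, p)

assert env_entry(True, 0) == (1, 0)   # even a "free" lookup costs a step
assert env_entry(True, 7) == (7, 7)
assert env_entry(False, 9)[0] == 1    # arrow-type values cost 1 to name
```

The base-type cost 1 ∨ p reflects the no-free-lunch principle above: looking up a value is not free, but it costs no more than reading the value off.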
Note the complementary roles of Λ ⋆ and ⋆: Λ ⋆ shifts the past (the cost) into the future (the potential) and ⋆ shifts part of the future (the potential) into the past (the cost). This being complexity theory, there are carrying charges on all this shifting. This is illustrated in the next lemma, which shows how Λ ⋆ and ⋆ interact. First, we introduce: …

Lemma 59 (Almost the η-law). Suppose Γ, ∆, X, x, σ, and τ are as in Definition 57. Let Γ ′ ; ∆ ′ be the result of removing x 1 : σ 1 , . . . , x k : σ k from Γ; ∆. Let ̺ ∈ T [[Γ; ∆]] and let ̺ ′ be the restriction of ̺ to preimage(Γ ′ ; ∆ ′ ). Then …

The lemma's proof is another straightforward calculation.
Projections. The next definition introduces a way of recovering more conventional bounds from time complexities. Note that, by Definitions 51 and 52 and Lemmas 53 and 55, when v is a string constant or a type-1 oracle, the value of v is a function of the value of |v|. So, by an abuse of notation, we treat v as a function of |v| for such v.
We call Cost(t) and Pot(t), respectively, the base cost and base potential of t. With Pot's definition in hand, we make good on the promise to check that the notions defined between Definitions 49 and 60 make sense.
All three parts follow straightforwardly from the definitions.
Time-complexity polynomials. To complete the basic time-complexity framework, we define an extension of the second-order polynomials for the simple product types over T, T_ε, T_⋄, . . . under the L-semantics. The restriction of these to the time types under the L_wt-semantics gives the time-complexity polynomials. First we extend the grammar for raw expressions to include:

P ::= (P, P) | π_1(P) | π_2(P).

Then we add the following new typing rules for second-order polynomials, where σ, σ_1, and σ_2 are simple product types over T, T_ε, T_⋄, . . . and ⊙ stands for any of ∗, +, or ∨. Next we extend the arithmetic operations to all types by recursively defining u ⊙ v for each γ and each u, v ∈ L[[γ]]. Finally, the L-interpretation of the polynomials is just the standard definition.
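A minimal evaluator for the extended grammar might look as follows. This is our own encoding; in particular, extending ⊙ to product types componentwise is the natural recursive definition, which we assume here.

```python
import operator

# Binary operations of the polynomial language: *, +, and v (maximum).
OPS = {'+': operator.add, '*': operator.mul, 'v': max}

def apply_op(sym, u, v):
    """Extend +, *, v to product types componentwise (assumed recursive
    definition); on base values they are the usual arithmetic operations."""
    if isinstance(u, tuple):
        return tuple(apply_op(sym, a, b) for a, b in zip(u, v))
    return OPS[sym](u, v)

def ev(P, env):
    """Evaluate a polynomial term P under env (variable name -> value).
    Terms are tagged tuples: ('const', n), ('var', x), ('op', sym, P1, P2),
    ('pair', P1, P2), ('proj', i, P) with i in {1, 2}."""
    tag = P[0]
    if tag == 'const':
        return P[1]
    if tag == 'var':
        return env[P[1]]
    if tag == 'op':
        _, sym, a, b = P
        return apply_op(sym, ev(a, env), ev(b, env))
    if tag == 'pair':
        return (ev(P[1], env), ev(P[2], env))
    if tag == 'proj':
        return ev(P[2], env)[P[1] - 1]
    raise ValueError(P)
```

For example, adding the pairs (2, 3) and (4, 5) yields (6, 8), and π_2 of that sum is 8.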

The time-complexity interpretation of ATR⁻
Here we establish a polynomial time-boundedness result for ATR⁻, the subsystem of ATR obtained by dropping the crec construct. Definition 63 introduces the T-interpretation of ATR⁻, and the proof of Theorem 67 shows that ATR⁻-expressions have time complexities that are polynomially bounded and well-behaved in other ways. All of this turns out to be pleasantly straightforward. The hard work comes in the following two sections: §14 establishes a key time-complexity decomposition property concerning the affine types, and §15 uses this decomposition to define the T-interpretation of crec expressions and to prove a polynomial boundedness theorem for ATR time complexities.
Convention: Throughout this section suppose that γ, σ, and τ are ATR types and Γ; ∆ is an ATR type context.

Definition 63. Figure 22 provides the T-interpretation for each ATR⁻ construct.

For the proof of soundness (Theorem 67(b)), we shall first define a logical relation ⊑tc_γ between CEK-closures and time complexities. Roughly, e ρ ⊑tc_γ (c, p) says that the time complexity (c, p) bounds the cost of evaluating the closure e ρ.

Conventions on CEK-closures: CEK-closures are written e ρ. (We always assume FV(e) ⊆ preimage(ρ).) A CEK-closure e ρ is called a value when e is a CEK-value. e ρ ↓ v ρ′ means that, starting from (e, ρ, halt), the CEK machine eventually ends up with (v, ρ′, halt), where v ρ′ is a value. Below, v ranges over CEK-values and p and q range over potentials.

Definition 69.
(c_r, p_r) = p_0(p_1). By the induction hypothesis on e_0 and e_1: … . There are two subcases to consider, based on the form of v_0.

Subcase: v_0 = λx. e′_0 for some Γ, x: σ; ∆ ⊢ e′_0 : γ. Then (13.1b) means that, for all type-τ values v ρ′′ and all q with … . Note that e′_0 ρ′ … . Hence, in this subcase e is as required.

Subcase: v_0 is an oracle. The argument here is a repeat, mutatis mutandis, of the proof of the previous subcase.
Part (c). The argument follows along the lines of the proof of (b).

Scholium 71.
The T-interpretation of ATR⁻ (and later, ATR) sits in between the actual costs of evaluating expressions on our CEK machine and the sought-after polynomial time-bounds on these costs. Why is working with T-interpretations preferable to working directly with executions of CEK machines and their costs? Part of the reason is that T-interpretations have built into them the cost-potential aspects of expressions; one would somehow have to replicate these when working directly with CEK-computations. Another part of the reason is that T-interpretations collapse the many possible paths of a CEK-computation into a single time complexity. The T-interpretation of if-then-else is chiefly responsible for these collapses. Scholium 80 notes that these collapses are a source of some trouble in dealing with crec-expressions.
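The collapsing role of if-then-else can be pictured with a toy combinator. This is only a sketch of the idea, not the paper's Figure 22 definition, and the dispatch overhead of 1 is an arbitrary illustrative constant: both branches' time complexities are merged by taking componentwise maxima, so every computation path maps to the same single time complexity.

```python
def cond_tc(t_test, t_then, t_else):
    """Collapse both branches of an if-then-else (base-type result) into
    one time complexity: pay for the test, then for the worse branch,
    plus an illustrative dispatch overhead of 1. Potentials here are
    numbers, as befits a base-type result."""
    c0, _p0 = t_test
    cost = c0 + max(t_then[0], t_else[0]) + 1
    potential = max(t_then[1], t_else[1])
    return (cost, potential)
```

This makes the collapse visible: whichever branch a run actually takes, the interpretation reports a single bound covering both.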

An affine decomposition of time complexities
When analyzing the time complexity of a program, one often needs to decompose its time complexity into pieces that may have little to do with the program's apparent syntactic structure. Theorem 74 below is a general time-complexity decomposition result for ATR expressions. The ATR typing rules for affinely restricted variables are critical in ensuring this decomposition. The decomposition is used in the next section to obtain the recurrences for the analysis of the time complexity of crec expressions. Note that the theorem presupposes that T[[·]] is defined on crec expressions. However, since no affinely restricted variable can occur free in a well-typed crec expression and since the application of the theorem will be within a structural induction, this presupposition does not add any difficulties.
Remark 72. In fact, the time complexity of a crec expression e will be defined in terms of the time complexities of expressions built up from subexpressions of e using term constructors other than crec. Thus a completely standard structural induction for establishing soundness does not quite work. A fully formal proof would first establish results such as "if e_0 ⊑tc_{σ→τ} X_0 and e_1 ⊑tc_σ X_1, then (e_0 e_1) ⊑tc_τ X_0 ⋆ X_1," where the X_i's are general mappings from T-environments to time complexities. These lemmas would then be used to carry out the induction steps of a structural induction which, in all but the crec case, would just quote the relevant lemma. Rather than impose this additional level of detail on the reader, we have opted for a less formal approach here and will assume that if we inductively have soundness for a subterm e, then we also have it for terms built up from e without crec.
To help in the statement and proof of the Affine Decomposition Theorem, we introduce the following definitions and conventions.
where the t's are as in the lemma's statement. It follows from TailPos(f, e) that the following three cases are the only ones to consider.

Case 1: f fails to occur in A. Then (14.2) follows immediately.

Scholium 75. As demonstrated in [DR07], handling forms of recursion beyond tail recursion requires notions of decomposition more sophisticated than (14.1). Moreover, if explicit ⊸-types were added to ATR, then the decomposition would also become more involved than (14.1).
For the analysis of crec expressions we need the following corollary to Theorem 74. We leave its proof to the reader, who should be mindful of Remark 72 above.

The time-complexity interpretation of ATR
We are now in a position to consider the time-complexity properties of crec expressions. Remark 77 below motivates the T-interpretation of crec expressions given in Definition 78. The remark's analysis will be reused in establishing soundness and polynomial time-boundedness for ATR. … (c + 1, 0), otherwise; where p_1 = pot(T[[x_1]] ̺′), c = 2 · p_1 + 2 · |a| + 5, and t is as before.
By the analysis for the →-I case in Theorem 67's proof, (λ x⃗. a) ⊑tc_γ Λ⋆(⟨x⃗, X_a⟩). As cost_CEK(λ x⃗. a) = 1, we have that e_a ⊑tc_γ Y_a, where Y_a =def dally(1, Λ⋆(⟨x⃗, X_a⟩)).

Definition 83. We say that each base type is unhindered, and that (γ_1, . . . , γ_k) → N_ℓ is unhindered when (γ_1, . . . , γ_k) → N_ℓ is strict and predicative and each γ_i is unhindered.

Claim 1: U_σ ⊆ BFF_σ. Proof: It is straightforward to express a crec-recursion with PCF's fix-construct with only polynomially-much overhead on the cost of the simulation. Hence, the claim follows from Theorem 79.
Claim 2: BFF_σ ⊆ U_σ. Proof: Kapron and Cook [KC96] showed that the type-2 basic feasible functionals are exactly the functions computable by second-order polynomial-time-bounded oracle Turing machines (OTMs). Proposition 18 of [IKR01] shows how to simulate any second-order polynomial-time-bounded OTM in that paper's ITLP_2 programming formalism. That simulation is easily adapted to ATR. Hence, the claim follows.
Note: The proof's two claims are constructive in that: (i) given a closed ATR-expression e of unhindered type, one can construct an equivalent PCF expression e ′ and a second-order polynomial p e that bounds the run time of e ′ , and (ii) given an OTM M and a second-order polynomial p that bounds the run time of M, one can construct an ATR-expression that computes the same function as M.
Claim 2 can be extended beyond unhindered types as follows. For each ATR arrow type γ = (γ_1, . . . , γ_k) → N_ℓ and each type-shape(γ) OTM M, we say that M computes a BFF_γ-function when there is a type-|γ| polynomial p such that the run time of M on (v_1, . . . , v_k) is bounded by p(|v_1|, . . . , |v_k|). The proof of Claim 2 lifts to show that, for all ATR arrow types γ, each BFF_γ-function is ATR-computable.

Conclusions
ATR is a small functional language, based on PCF, with the property that each ATR program has a second-order polynomial time-bound. The ATR-computable functions include the basic feasible functionals at type-levels 1 and 2. However, the ATR-computable functions also contain functions, such as prn, that are not basic feasible in the original sense of Cook and Urquhart [CU93]. ATR is able to express such functions thanks to its type system and supporting semantics, which work together to control growth rates and time complexities. Without some such controls, feasible recursion schemes such as prn cannot be first-class objects of a programming language.
The ATR type-system and semantics were crafted so that ATR's complexity properties could be established through adaptations of standard tools for the analysis of conventional programming languages (e.g., intuitionistic and affine types, denotational semantics for ATR and its time complexity, and an abstract machine that provides both an operational semantics for ATR and a basis for the time-complexity semantics). As ATR is based on PCF (a theoretical first-cousin of both ML and Haskell), our results suggest that one might be able to craft "feasible sublanguages" of ML and Haskell that are both theoretically well-supported and tolerable for programmers.
ATR and its semantic and analytic frameworks are certainly not the final word on any issue. Here we discuss several possible extensions of our work.
More general recursions. In [DR07] we consider an expansion of ATR that allows a fairly wide range of affine (one-use) recursions. In particular, the expanded ATR can fairly naturally express the classic insertion- and selection-sort algorithms. Handling this larger set of recursions requires some nontrivial extensions of our framework for analyzing time complexities.
Nonlinear recursions (e.g., the standard quicksort algorithm) are trickier to handle because there must be independent clocks on each branch of the recursion that together guarantee certain global upper bounds.
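One way to picture such clocks is the fuel-threaded sketch below. The fuel policy is purely illustrative and not a proposal from this paper: a single budget is passed through both recursive branches in turn, so that however the recursion tree unfolds, the total work stays within one global bound, and exhausting the budget aborts the recursion.

```python
def clocked_qsort(xs, fuel):
    """Quicksort with an explicit clock: each call checks and decrements
    a shared fuel budget, and the budget left over from the first branch
    is what the second branch gets to spend. Returns (sorted_list,
    remaining_fuel); raises RuntimeError when the clock runs out."""
    if fuel <= 0:
        raise RuntimeError("clock exhausted")
    if len(xs) <= 1:
        return xs, fuel - 1
    pivot, rest = xs[0], xs[1:]
    lo = [x for x in rest if x < pivot]
    hi = [x for x in rest if x >= pivot]
    fuel -= len(xs)                      # charge for the partitioning pass
    lo_sorted, fuel = clocked_qsort(lo, fuel)
    hi_sorted, fuel = clocked_qsort(hi, fuel)
    return lo_sorted + [pivot] + hi_sorted, fuel
```

An alternative design, closer to "independent clocks," would split the remaining fuel between the two branches up front; either way the branches' combined spending is capped by the initial budget.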
Recursions with type-level 1 parameters. Another possible extension of ATR would be to allow type-level 1 parameters in crec-recursions so that, for example, one could give a continuation-passing-style definition of prn. Because type-1 parameters in recursions act to recursively define functions, these parameters must be affinely restricted just like principle recursor variables of crec-expressions. Consequently, such an extension must also include explicit ⊸-types to restrict these parameters. However, along with the ⊸-types come (explicitly or implicitly) tensor-products and these cause problems in analyzing crecrecursions (e.g., one is forced account for all the possible interactions of the affine parameters in the course of a recursion and so the naïve "polynomial" time-bounds are exponential in size).
Lazy evaluation. For a lazy (e.g., call-by-need) version of ATR, one would need to: (i) construct an abstract machine for this lazy-ATR; (ii) modify the T-semantics a bit to accommodate the lazy constructs; and (iii) rework the T-interpretation of ATR, which would then have to be shown monotone, sound, and constructively polynomial time-bounded. (Since the well-tempered semantics is extensional, it requires very few changes for a lazy-ATR.) If our lazy-ATR allowed infinite strings, then the V_wt-semantics would also have to be modified. Note that Sands [San90] and Van Stone [VS03] both consider lazy evaluation in their work.
Lists and streams. There are multiple senses of the "size" of a list. For example, the run time of reverse should depend on just a list's length, whereas the run time of a search depends on both the list's length and the sizes of the list's elements. Any useful extension of ATR that includes lists needs to account for these multiple senses of size in the type system and in the well-tempered and time-complexity semantics. If lists are combined with laziness, then we also have the problem of handling infinite lists. However, ATR and its semantics already handle one flavor of infinite object, i.e., type-level 1 inputs, so handling a second flavor of infinite object may not be too hard.
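The two senses of size can be captured by a simple measure. This is a sketch with our own names, using strings as stand-in list elements: reverse's cost would be bounded in terms of the first component only, while a search's cost would use both.

```python
def list_size(xs):
    """Return the pair (length, max element size) for a list of strings:
    the two senses of 'size' that a list-extended type system would need
    to track separately."""
    length = len(xs)
    elem = max((len(x) for x in xs), default=0)
    return (length, elem)
```

A size type for lists would then carry both components, so that, e.g., reverse can be typed with a bound independent of the element sizes.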
Type checking, type inference, time-bound inference. We have not studied the problem of ATR type checking. But since ATR is just an applied simply typed lambda calculus with subtyping, standard type-checking tools should suffice. Type inference is a much more interesting problem. We suspect that a useful type inference algorithm could be based on Frederiksen and Jones' [FJ04] work on applying size-change analysis to detect whether programs run in polynomial time. Another interesting problem would be to start with a well-typed ATR program and then extract reasonably tight size and time bounds (as opposed to the not-so-tight bounds given by Theorem 79).
Beyond type-level 2. There are semantic and complexity-theoretic issues to be resolved in order to extend the semantics of ATR to type-levels 3 and above. The key problem is that our definition of the length of a type-2 function (2.3) does not generalize to type-level 3. This is because for Ψ ∈ MC_{((N→N)→N)→N} and G ∈ MC_{(N→N)→N}, we can have sup{ |Ψ(F)| : |F| ≤ |G| } = ∞, even when G is 0-1 valued. To fix this problem one can introduce a different notion of length that incorporates information about a function's modulus of continuity. It appears that ATR and the V_wt- and T-semantics extend to this new setting. However, it also appears that this new notion of length gives a new notion of higher-type feasibility that goes beyond the BFFs. Sorting out what is going on here should be the source of other adventures.