Unification modulo a 2-Sorted Equational Theory for Cipher-Decipher Block Chaining

Abstract. We investigate unification problems related to the Cipher Block Chaining (CBC) mode of encryption. We first model chaining in terms of a simple, convergent, rewrite system over a signature with two disjoint sorts: list and element. By interpreting a particular symbol of this signature suitably, the rewrite system can model several practical situations of interest. An inference procedure is presented for deciding the unification problem modulo this rewrite system. The procedure is modular in the following sense: any given problem is handled by a system of 'list-inferences', and the set of equations thus derived between the element-terms of the problem is then handed over to any ('black-box') procedure which is complete for solving these element-equations. As an example of application of this unification procedure, we show how to detect an attack on a Needham-Schroeder-like protocol that employs the CBC encryption mode based on the associative-commutative (AC) operator XOR. The 2-sorted convergent rewrite system is then extended into one that fully captures a block chaining encryption-decryption mode at an abstract level, using no AC-symbols; unification modulo this extended system is also shown to be decidable.


Introduction
The technique of chaining is applicable in many situations. A simple case is, e.g., when we want to calculate the partial sums (resp. products) of a (not necessarily bounded) list of integers, with a given 'base' integer; such a list of partial sums (resp. products) can be calculated, incrementally, with the help of the following two equations:

bc(nil, z) = nil,
bc(cons(x, Y), z) = cons(h(x, z), bc(Y, h(x, z)))

Note: The first part of this paper, devoted to unification modulo BC, is a more detailed version of the work we presented at LATA 2012 ([2]).
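As a sanity check, the two chaining equations can be read directly as a recursive program. The following Python sketch (our own illustrative encoding: lists as Python lists, nil as the empty list, h passed as a function) instantiates h with addition and with multiplication to obtain partial sums and partial products:

```python
def bc(lst, z, h):
    """Rewrite-system semantics of bc:
       bc(nil, z) = nil
       bc(cons(x, Y), z) = cons(h(x, z), bc(Y, h(x, z)))"""
    if not lst:                 # bc(nil, z) = nil
        return []
    x, rest = lst[0], lst[1:]
    head = h(x, z)              # h(x, z) becomes the new chaining value
    return [head] + bc(rest, head, h)

# Partial sums with base 0: h is addition
print(bc([1, 2, 3, 4], 0, lambda x, z: x + z))   # [1, 3, 6, 10]
# Partial products with base 1: h is multiplication
print(bc([1, 2, 3, 4], 1, lambda x, z: x * z))   # [1, 2, 6, 24]
```

Note how the second equation makes the freshly computed head h(x, z) the chaining value for the rest of the list; this is exactly what makes the same two axioms model CBC encryption later on.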

Notation and Preliminaries
We consider a ranked signature Σ, with two disjoint sorts τ_e and τ_l, consisting of binary functions bc, cons, h, and a constant nil, typed as follows:

bc : τ_l × τ_e → τ_l,  cons : τ_e × τ_l → τ_l,  h : τ_e × τ_e → τ_e,  nil : τ_l.
We also assume given a set X of countably many variables; the objects of our study are the (well-typed) terms of the algebra T(Σ, X); terms of type τ_e will be referred to as elements, and those of type τ_l as lists. It is assumed that the only constant of type list is nil; the other constants, if any, will all be of type element. For better readability, the set of variables X is divided into two subsets: those to which 'lists' can get assigned are denoted with upper-case letters, as X, Y, Z, U, V, W, ..., with possible suffixes or primes; these are said to be variables of type τ_l. Variables to which 'elements' can get assigned are denoted with lower-case letters, as x, y, z, u, v, w, ..., with possible suffixes or primes; these are said to be variables of type τ_e. The theory we shall study first in this paper is defined by the two axioms (equations) already mentioned in the Introduction:

bc(nil, z) = nil,
bc(cons(x, Y), z) = cons(h(x, z), bc(Y, h(x, z)))

It is easy to see that these axioms can both be oriented left-to-right under a suitable lexicographic path ordering (lpo) (cf., e.g., [10]), and that they then form a convergent (i.e., confluent and terminating) 2-sorted rewrite system.
As mentioned in the previous section, we consider two theories that contain the above two axioms. The first is where these are the only axioms; we call that theory BC_0. The other theory is where h is interpreted as for CBC, i.e., where h(x, y) = e_k(x ⊕ y), with ⊕ exclusive-or and e_k encryption under some (fixed) given key k. This theory will be referred to as BC_1. We use the phrases "BC-unification" and "unification modulo BC" to refer to unification problems modulo both theories, collectively.
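Under the BC_1 interpretation of h, the chaining axioms compute exactly the sequence of CBC ciphertext blocks. The following Python sketch makes this concrete; the block cipher e_k is a toy stand-in (XOR with a fixed key, an assumption made purely for illustration), and the comparison against a direct CBC loop holds for any choice of e_k:

```python
def e_k(x, k=0b1011):          # toy stand-in for a block cipher e_k
    return x ^ k

def h(x, y):                   # the BC_1 interpretation: h(x, y) = e_k(x XOR y)
    return e_k(x ^ y)

def bc(lst, z):                # the two chaining axioms, with h fixed as above
    if not lst:
        return []
    c = h(lst[0], z)
    return [c] + bc(lst[1:], c)

def cbc(blocks, iv):           # direct CBC: c_i = e_k(m_i XOR c_{i-1}), c_0 = IV
    out, prev = [], iv
    for m in blocks:
        prev = e_k(m ^ prev)
        out.append(prev)
    return out

msg, iv = [3, 9, 14, 7], 5
assert bc(msg, iv) == cbc(msg, iv)
```

The two functions are structurally the same recursion, which is the point: bc with this h is the CBC mode, applied block by block.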
Note that in the case where h is a free uninterpreted symbol (i.e., BC_0), h is fully cancellative, in the sense that for any terms s_1, t_1, s_2, t_2, h(s_1, t_1) ≈_BC h(s_2, t_2) if and only if s_1 ≈_BC s_2 and t_1 ≈_BC t_2. But when h is interpreted for CBC, this is no longer true; in that case, h is only semi-cancellative, in the sense that for all terms s_1, s_2, t the following holds: h is right-cancellative, i.e., h(s_1, t) ≈_BC h(s_2, t) if and only if s_1 ≈_BC s_2; and h is also left-cancellative, i.e., h(t, s_1) ≈_BC h(t, s_2) if and only if s_1 ≈_BC s_2. Thus, in the sequel, when we look for the unifiability of any set of element-equations modulo BC_0 (resp. modulo BC_1), the cancellativity of h (resp. the semi-cancellativity of h) will be used as needed, in general without explicit mention.
Our concern in this section, and the one following, is the equational unification problems modulo BC_0 and BC_1. We assume without loss of generality (wlog) that any given BC-unification problem P is in standard form, i.e., P is given as a set of equations EQ, each having one of the following forms:

U =? V,  U =? bc(V, y),  U =? cons(v, W),  U =? nil,
u =? v,  v =? h(w, x),  u =? const

where const stands for any ground constant of sort τ_e. The first four kinds of equations (the ones with a list-variable on the left-hand side) are called list-equations, and the rest (those with an element-variable on the left-hand side) are called element-equations.
For any problem P in standard form, L(P) will denote the subset formed of its list-equations, and E(P) the subset of element-equations. A set of element-equations is said to be in dag-solved form (or d-solved form) ([14]) if and only if they can be arranged as a list x_1 =? t_1, ..., x_n =? t_n such that: for all 1 ≤ i < j ≤ n, x_i and x_j are distinct variables, and x_i occurs neither in t_i nor in any t_j.
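The d-solved form condition is easy to check mechanically. The following Python sketch uses an encoding of our own (terms as nested tuples whose first component is the function symbol, every bare string treated as a variable) and tests the ordering condition directly:

```python
def is_dag_solved(eqs):
    """eqs: ordered list of pairs (x_i, t_i); terms are nested tuples,
    e.g. ('bc', 'V', 'y'); for simplicity every bare string is a variable."""
    def variables(t):
        if isinstance(t, str):
            return {t}
        return set().union(*(variables(a) for a in t[1:]))
    xs = [x for x, _ in eqs]
    if len(set(xs)) != len(xs):          # the x_i must be pairwise distinct
        return False
    for i, (x, _) in enumerate(eqs):     # x_i occurs in no t_j for j >= i
        for _, t in eqs[i:]:
            if x in variables(t):
                return False
    return True

assert is_dag_solved([('U', ('bc', 'V', 'y')), ('V', ('cons', 'v', 'W'))])
assert not is_dag_solved([('U', ('cons', 'z', 'U'))])   # occur-check fails
```

Note that a later left-hand-side variable (V above) may still occur in an earlier right-hand side; the condition only forbids forward and self occurrences, which is what makes a d-solved form directly readable as a substitution.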
Such a notion extends naturally to sets of list-equations as well. In the next section we give an inference system for solving any BC-unification problem in standard form. For any given problem P, its rules transform L(P) into a set in d-solved form. The element-equations at that point can be passed on to an algorithm for solving them; thus, in the case of BC_1, what we need is an algorithm for solving the general unification problem modulo the theory of exclusive-or. Any development presented below, without further precision on h, is meant as one valid for both BC_0 and BC_1.

Inference System for BC-Unification
The inference rules have to consider two kinds of equations: the rules for the list-equations in P, i.e., equations whose left-hand sides (lhs) are variables of type τ_l, and the rules for the element-equations, i.e., equations whose lhs are variables of type τ_e. Our method of solving any given unification problem will be 'modular' on these two sets of equations: the list-inference rules will be shown to terminate under suitable conditions, and then all we will need to do is solve the resulting set of element-equations for h.
A few technical points need to be mentioned before we formulate our inference rules. Note first that it is not hard to see that cons is cancellative; by this we mean that cons(s_1, T_1) ≈_BC cons(s_2, T_2), for terms s_1, s_2, T_1, T_2, if and only if s_1 ≈_BC s_2 and T_1 ≈_BC T_2. On the other hand, it can be shown by structural induction (and the semi-cancellativity of h) that bc is conditionally semi-cancellative, depending on whether its first argument is nil or not; for details, see Appendix-1. This property of bc will be assumed in the sequel.
The inference rules given below will have to account for cases where an 'occur-check' succeeds on some list-variable, and the problem will be unsolvable. The simplest among such cases is when we have an equation of the form U =? cons(z, U) in the problem. But one could have more complex unsolvable cases, where the equations involve both cons and bc; e.g., when P contains equations of the form U =? cons(x, V), U =? bc(V, y): the problem is unsolvable in such a case; indeed, from the axioms of BC one deduces that V must be of the form V =? cons(v, V'), for some v and V', then x must be of the form x =? h(v, y), and subsequently V =? bc(V', x), and we are back to a set of equations of the same format. We need to infer failure in all such cases. With that purpose, we define the following relations on the list-variables of the equations in P:

U >_cons V  iff P contains an equation of the form U =? cons(x, V), for some x;
U >_bc V  iff P contains an equation of the form U =? bc(V, x), for some x.

Note that ∼_bc is the symmetric closure of the relation >_bc; its reflexive, symmetric and transitive closure is denoted as ∼*_bc. The transitive closure of >_bc is denoted as >+_bc, and its reflexive-transitive closure as >*_bc. Note, on the other hand, that U =? bc(U, x) is solvable by the substitution {U := nil}; in fact this equation forces U to be nil, as would also a set of equations of the form U =? bc(V, y), V =? bc(U, x). Such cycles (as well as some others) have to be checked to determine whether a list-variable is forced to be nil. This can be effectively done with the help of the relations defined above on the type τ_l variables. We define, recursively, a set nonnil of the list-variables of P that cannot be nil under any unifying substitution, as follows: if U >_cons W for some W, then U ∈ nonnil; and if U ∼_bc V for some V ∈ nonnil, then U ∈ nonnil. We have then the following obvious result:

Lemma 3.1. A variable U ∈ nonnil if and only if there are variables V and W such that U ∼*_bc V and V >_cons W.

Some of the inference rules below will refer to a graph whose nodes are the list-variables of the given problem P, 'considered equivalent up to equality'; more formally: for any list-variable U of P, we denote by [U] the equivalence class of list-variables that get equated to U in P, i.e., V ∈ [U] if and only if the equality U = V follows, by reflexivity, symmetry and transitivity, from the variable-variable equations of P. Any relation R defined over the list-variables of P is then extended naturally to these equivalence classes, by setting R([U], [V]) if and only if R(U', V') holds for some U' ∈ [U] and V' ∈ [V].

Definition 3.2. Let G_l = G_l(P) be the graph whose nodes are the equivalence classes on the list-variables of P, with arcs defined as follows. From a node [U] on G_l there is a directed arc to a (not necessarily different) node [V] on G_l if and only if:
- either U >_cons V, in which case the arc is labeled with >_cons;
- or U >_bc V, in which case the arc is labeled with >_bc.
In the latter case, G_l will also have a two-sided (undirected) edge between [U] and [V], labeled with ∼_bc. The graph G_l is called the propagation graph for P.
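Lemma 3.1 gives an effective test for membership in nonnil. The following Python sketch (the tuple encoding of equations is our own, and variable-variable equalities are assumed to have been eliminated first) computes, for each list-variable, its ∼*_bc class and looks there for a variable with a >_cons successor:

```python
from collections import defaultdict

def nonnil_vars(eqs):
    """eqs: list of ('cons', U, V) for U =? cons(x, V)
             or  ('bc',   U, V) for U =? bc(V, x).
    Per Lemma 3.1: U is in nonnil iff U ~*_bc V for some V with V >_cons W."""
    sim = defaultdict(set)     # ~_bc : symmetric closure of >_bc
    has_cons = set()           # variables V such that V >_cons W for some W
    for f, u, v in eqs:
        if f == 'bc':
            sim[u].add(v)
            sim[v].add(u)
        else:
            has_cons.add(u)
    result = set()
    for u in {x for _, a, b in eqs for x in (a, b)}:
        seen, stack = {u}, [u]          # explore the ~*_bc class of u
        while stack:
            w = stack.pop()
            if w in has_cons:
                result.add(u)
            stack.extend(sim[w] - seen)
            seen |= sim[w]
    return result

# U =? bc(V, y), V =? bc(U, x): a pure bc-cycle, nothing forced non-nil
assert nonnil_vars([('bc', 'U', 'V'), ('bc', 'V', 'U')]) == set()
# W =? cons(x, W2), U =? bc(W, y): U ~_bc W and W >_cons W2, so U is non-nil
assert 'U' in nonnil_vars([('cons', 'W', 'W2'), ('bc', 'U', 'W')])
```

The first test is exactly the bc-cycle discussed above: it forces U and V to nil, so neither is in nonnil; in the second, U inherits non-nil-ness from W through the ∼_bc edge.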
A node [U] on G_l is said to be a bc/bc-peak if P contains two different equations of the form U =? bc(V, x), U =? bc(W, y); the node [U] is said to be a cons/bc-peak if P contains two different equations of the form U =? cons(x, V_1), U =? bc(V, z).
On the set of nodes of G_l, we define a partial relation ≻_l by setting: [U] ≻_l [V] iff there is a path on G_l from [U] to [V], at least one arc of which has label >_cons. A variable U is said to violate occur-check when [U] ≻_l [U]. For instance, the variable U violates occur-check in the problem U =? bc(W, z), W =? cons(x, U); it can be checked that such problems are unsatisfiable. The list-inference rules are the following:

(L1) Variable Elimination:
EQ ⊎ {U =? V}
--------------------------------
{U =? V} ∪ [V/U](EQ)
if U occurs in EQ; every occurrence of U in EQ is replaced by V.

(L2) Cancellation on cons:
EQ ⊎ {U =? cons(v, V), U =? cons(w, W)}
--------------------------------
EQ ∪ {U =? cons(v, V), v =? w, V =? W}

(L4.a) Semi-Cancellation on bc, at a bc/bc-peak:
EQ ⊎ {U =? bc(V, x), U =? bc(W, x)}
--------------------------------
EQ ∪ {U =? bc(V, x), W =? V}

(L4.b) Push bc below cons, at a nonnil bc/bc-peak:
EQ ⊎ {U =? bc(V, x), U =? bc(W, y)}
--------------------------------
EQ ∪ {V =? cons(v, Z), W =? cons(w, Z), U =? cons(u, U'), U' =? bc(Z, u), u =? h(v, x), u =? h(w, y)}
if U ∈ nonnil

(L5) Splitting, at a cons/bc-peak:
EQ ⊎ {U =? cons(x, U_1), U =? bc(V, z)}
--------------------------------
EQ ∪ {U =? cons(x, U_1), V =? cons(y, V_2), x =? h(y, z), U_1 =? bc(V_2, x)}

(L6) Occur-Check Violation:
FAIL, if [U] ≻_l [U] for some list-variable U of P.

(L7) Size-Conflict:
EQ ⊎ {U =? cons(v, W), U =? nil}
--------------------------------
FAIL

The symbol '⊎' in the premises of the above inference rules stands for disjoint set union (and '∪' for usual set union). The role of the Variable Elimination rule (L1) is to keep the propagation graph of P irredundant: each variable has a unique representative node on G_l(P), up to variable equality. This rule is applied most eagerly. Rules (L2), (L3.a)-(L3.c) and (L4.a) come next in priority, and then (L4.b). The Splitting rule (L5) is applied in the "laziest" fashion, i.e., (L5) is applied only when no other rule is applicable. The above inference rules are all "don't-care" nondeterministic. (The priority notions just mentioned serve essentially for optimizing the inference procedure.)
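To illustrate how a single list-inference fires, here is a Python sketch of rule (L4.a) (Semi-Cancellation on bc) as a transformation on an equation set; the tuple encoding of equations is our own:

```python
def apply_L4a(eqs):
    """One application of rule (L4.a): from U =? bc(V, x) and U =? bc(W, x)
    infer W =? V, dropping the second bc-equation.  Encoding:
    ('bc', U, V, x) for U =? bc(V, x), ('eq', W, V) for W =? V."""
    for i, e1 in enumerate(eqs):
        for j, e2 in enumerate(eqs):
            if (i < j and e1[0] == e2[0] == 'bc'
                    and e1[1] == e2[1]          # same lhs variable U
                    and e1[3] == e2[3]          # same chaining variable x
                    and e1[2] != e2[2]):        # two different equations
                rest = [e for k, e in enumerate(eqs) if k != j]
                return rest + [('eq', e2[2], e1[2])]
    return eqs    # rule not applicable

P = [('bc', 'U', 'V', 'x'), ('bc', 'U', 'W', 'x')]
assert apply_L4a(P) == [('bc', 'U', 'V', 'x'), ('eq', 'W', 'V')]
```

The soundness of the step rests on the (conditional) semi-cancellativity of bc discussed earlier: with the same chaining argument x on both sides, the two first arguments must be equal.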
The validity of rule (L4.b) ('Push bc below cons') results from the cancellativity of cons and the semi-cancellativity of bc (Appendix-1). Note that the variables Z, U', and u in the 'inferred part' of rule (L4.b) might need to be fresh; the same is true for the variables y and V_2 in the inferred part of the Splitting rule; but, in either case, this is not obligatory if equations already present can be used for applying these rules. Type-inference failure is assumed to be checked implicitly; no explicit rule is given for it.
The following point should be kept in mind: any given problem P naturally 'evolves' under the inference rules, and new variables might get added in the process if rule (L5) or rule (L4.b) is applied; but none of the variables initially present in P can disappear in the process, not even under the Variable Elimination rule (L1). Thus, although the graph G_l referred to in the Occur-Check Violation rule (L6) is the graph of the 'current problem', the node it refers to might still be one corresponding to an initial variable.
We show now that such an introduction of fresh variables cannot go on forever, and that the above "don't-care" nondeterministic rules suffice, essentially, for deciding unifiability modulo the axioms of BC.

Proposition 3.3. Let P be any BC-unification problem, given in standard form. The system INF_l of list-inference rules, given above, terminates on P in polynomially many steps.
Proof. Assume given a problem P in standard form, for which the inference process does not lead to failure on Occur-Check (L6) or Size-Conflict (L7). If INF_l is non-terminating on such a P, at least one of the rules of INF_l must have been applied infinitely often along some inference chain; we show that this cannot be true for any of the rules in INF_l.
Note first that an equation of the form U =? V in P is never handled in 'both directions' by the Variable Elimination rule (L1); an application of this rule means: every occurrence of the variable U in the problem is replaced by the variable V. It is easy to check that, for this reason, (L1) cannot give rise to non-termination. On the other hand, the list-inference rules (L2) through (L4.a) each eliminate a (directed) outgoing arc from some node of G_l, so their termination is easy to check; it should be clear that, for these rules, termination is polynomial (even linear). Thus, to show the termination of the entire inference process in polynomially many steps, we have to look at how the problem evolves under rule (L5) (Splitting) and rule (L4.b) (Push bc below cons). We show that if occur-check violation (L6) does not occur, then the applications of rule (L5) or of rule (L4.b) cannot go on forever.
For proving this, we shall use an equivalence relation, denoted ∼_β, on the list-variables of the given problem; it is defined as the smallest equivalence relation on the list-variables of P that contains ∼*_bc and is closed under cons-decomposition. (The relation ∼_β can be viewed as a combination of the unification closure, a notion defined by Kanellakis and Revesz [15], and the congruence closure of ∼*_bc; the difference is that here we are working with a typed system.) Observe now that the number of bc-equations, i.e., list-equations of the form U =? bc(V, z), never increases; this number decreases in most cases, the exceptions being (L1), (L2) and (L5). The splitting rule (L5) does not decrease the number of bc-equations and may introduce new variables, but the number of ∼_β-equivalence classes of nodes (on the current graph) does not increase: indeed, applying the splitting rule (L5) on a list-equation U =? bc(V, z) removes that equation, and creates a bc-equation between variables lying 'below a cons'. Suppose now that applying the splitting rule does not terminate. Then, at some stage, the derived problem will have a sequence of variables U_0, U_1, ..., U_n, derived successively under splitting, such that the length n of the sequence strictly exceeds the initial number of ∼_β-equivalence classes (which cannot increase under splitting, as we just observed). So there must exist indices 0 ≤ i < j ≤ n such that U_i ∼_β U_j. Let j ≤ n be the smallest integer for which there exists an i, 0 ≤ i < j, such that U_i ∼_β U_j. Then, by the definition of ∼_β, we must have U_i ∼*_bc U_j. Consequently, we would then also have [U_i] ≻_l [U_i]; and that would have caused the inference process to terminate with FAIL, as soon as both the variables U_i and U_j appeared in the problem derived under the inferences.
Termination of (L4.b) can now be proved as follows. The number of ∼*_bc-equivalence classes may increase by 1 with each application of (L4.b), but the number of ∼_β-equivalence classes remains the same, for the same reason as above. Let m be the number of bc-equations in the input problem, and let n be the number of variables in the input problem. We then show that the total number of applications of (L4.b) and (L5) cannot exceed mn: indeed, whenever one of (L4.b) or (L5) is applied, some number of bc-equations are removed and an equal or lesser number are added, whose variables belong to ∼_β-equivalence classes at a 'lower level' as explained above, i.e., below some cons steps. Since the number of ∼_β-equivalence classes does not increase, and there cannot be more than n such classes to start with, a bc-equation cannot be "pushed down" more than n times. Since there are initially m bc-equations, the total number of applications of (L4.b) and (L5) cannot exceed mn.
A set of equations will be said to be L-reduced if none of the above inference rules (L1) through (L7) is applicable. (Note: such a problem may not be in d-solved form; an easy example is given a couple of paragraphs below.)

Unification modulo BC: The rules (L1) through (L7) are not enough to show the existence of a unifier modulo BC. The subset of element-equations, E(P), may not be solvable; for example, the presence of an element-equation of the form {x =? h(x, z)} should lead to failure. However, we have the following:

Proposition 3.4. Let P be a BC-unification problem in standard form such that L(P) is L-reduced. Then P is unifiable modulo BC if and only if the set E(P) of its element-equations is solvable.
Proof. If L(P) is L-reduced, then setting to nil every list-variable that is not in nonnil leads to a unifier for L(P) modulo BC, provided E(P) is solvable.
Recall that BC_0 is the theory defined by BC when h is uninterpreted.

Proposition 3.5. Let P be any BC_0-unification problem, given in standard form. Unifiability of P modulo BC_0 is decidable in polynomial time (wrt the size of P).
Proof. If the inferences of INF_l applied to P lead to failure, then P is not unifiable modulo BC; so assume that this is not the case, and replace P by an equivalent problem which is L-reduced, deduced in polynomially many steps by Proposition 3.3. By Proposition 3.4, the unifiability modulo BC of such a P amounts to checking whether the set E(P) of its element-equations is solvable. We are in the case where h is uninterpreted, so to solve E(P) we apply the rules for standard unification, and check for their termination without failure; this can be done in polynomial time [5]. (In this case, h is fully cancellative.)

It can be seen that while termination of the above inference rules guarantees the existence of a unifier (provided the element-equations are syntactically solvable), the resulting L-reduced system may not lead directly to a unifier. For instance, the L-reduced system of list-equations {U =? bc(V, x), U =? bc(V, y)} is unifiable, with the following two incomparable unifiers:

{x := y, U := bc(V, y)}  and  {U := nil, V := nil}

To get a complete set of unifiers we need three more inference rules, which are "don't-know" nondeterministic, to be applied only to L-reduced systems:

(L8) Nil-solution Branch for bc, at a bc/bc-peak:
EQ ⊎ {U =? bc(V, x), U =? bc(W, y)}
--------------------------------
EQ ∪ {U =? nil, V =? nil, W =? nil}

(L9) Guess a non-Nil branch for bc, at a bc/bc-peak:
EQ ⊎ {U =? bc(V, x), U =? bc(W, y)}
--------------------------------
EQ ∪ {V =? cons(v, Z), W =? cons(w, Z), U =? cons(u, U'), U' =? bc(Z, u), u =? h(v, x), u =? h(w, y)}

(L10) Standard Unification on bc:
EQ ⊎ {U =? bc(V, x), U =? bc(W, y)}
--------------------------------
EQ ∪ {U =? bc(W, y), V =? W, x =? y}

Rule (L9) nondeterministically 'guesses' U to be in nonnil; in other words, it applies rule (L4.b) 'unconditionally'. The inference system thus extended will be referred to as INF'_l. By the same reasoning as developed above, INF'_l also terminates, in polynomially many steps, on any problem given in standard form. We establish now a technical result, valid whether or not h is
interpreted.

Proposition 3.6. Let P be any BC-unification problem in standard form, to which none of the inferences of INF'_l is applicable. Then its set of list-equations is in d-solved form.

Proof. If none of the equations in P involves bc or cons (i.e., all equations are equalities between list-variables), then the claim follows from rule (L1) (Variable Elimination).
Observe first that if INF'_l is inapplicable to P, then, on the propagation graph G_l for P, there is at most one outgoing directed arc at any node [U]. Otherwise, suppose there are two distinct outgoing arcs at some node [U] on G_l: if both directed arcs bear the label >_cons, then rule (L2) would apply; if both bear the label >_bc, then one of (L4.a), (L4.b), (L9), (L10) would apply; the only remaining case is where one of the outgoing arcs is labeled with >_cons and the other with >_bc, but then the Splitting rule (L5) would apply.
Consider now any given connected component Γ of G_l. There can be no directed cycle from any node on Γ to itself: otherwise the Occur-Check Violation rule (L6) would have applied. It follows, from this observation and the preceding one, that there is a unique end-node U_0 on Γ, i.e., a node from which there is no directed outgoing arc; and also that, for any given node U on Γ, there is a unique well-defined directed path leading from U to that end-node U_0.
It follows easily from these observations that the list-variables on the left-hand sides of the equations in P (on the different connected components of G_l) can be ordered suitably, so as to satisfy the condition for P to be in d-solved form.
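Returning to the example mentioned earlier, the L-reduced system {U =? bc(V, x), U =? bc(V, y)} has the two incomparable unifiers {x := y, U := bc(V, y)} and {U := nil, V := nil}; both can be checked by ground evaluation of bc. In the following Python sketch (encoding ours), h is kept as a free term constructor, i.e., the BC_0 case:

```python
def bc(lst, z):
    """bc under BC_0: h stays symbolic, as the tuple ('h', x, z)."""
    if not lst:
        return []
    c = ('h', lst[0], z)
    return [c] + bc(lst[1:], c)

V = ['a', 'b']
# With V non-nil, bc(V, x) = bc(V, y) forces x = y ...
assert bc(V, 'x0') != bc(V, 'y0')
assert bc(V, 'y0') == bc(V, 'y0')          # unifier {x := y, U := bc(V, y)}
# ... but with V = nil, both sides collapse to nil for ANY x, y:
assert bc([], 'x0') == bc([], 'y0') == []  # unifier {U := nil, V := nil}
```

The two assertions show why neither unifier subsumes the other: the first keeps V general but must equate x and y; the second leaves x and y general but must set V to nil.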
Example 3.7. (i) The following BC_0-unification problem is in standard form: U =? cons(x, W), U =? bc(V, y), W =? bc(V_2, y). We apply (L5) (Splitting) and write V =? cons(v_1, V_1), with v_1, V_1 fresh; this, followed by an application of rule (L2) (Cancellation on cons), the cancellativity of h (valid for BC_0), and an element-variable elimination, yields a derived problem to which no rule of INF_l is applicable: in particular, (L4.b) does not apply, since W is not in nonnil; but rule (L8) (Nil-solution Branch for bc) can be nondeterministically applied. The equations thus obtained, in d-solved form, give a solution to the original problem.
(ii) For the sake of completeness, we could also try rule (L9) (Guess a non-Nil branch) nondeterministically, successively on the two equations for W in the problem derived above; so we write V_1 =? cons(v_2, V'_2) and V_2 =? cons(v_3, V'_3). These applications of (L9), followed by applications of Variable Elimination, Cancellation on cons, and the cancellativity of h (valid for the theory BC_0), lead us to a problem whose list-equations are in d-solved form; but its element-equations being unsatisfiable, we are led to failure.
(iii) For the following problem (almost the same as (i) above, but for an element-equation): U =? cons(x, W), U =? bc(V, y), W =? bc(V_2, y), y =? a, the reasoning developed in (ii) above would have led us to a non-nil solution for W, in which V'_3 is an arbitrary list and v_1, v_2 are arbitrary elements.
We turn our attention in the following section to solving BC-unification problems. When h is uninterpreted, we saw that this unification is decidable in polynomial time. But when h is interpreted so that BC models CBC, we shall see that unification modulo BC_1 is NP-complete.

Solving a BC-Unification problem
Let P be a BC-unification problem, given in standard form. We assume that INF'_l has terminated without failure on P; we saw in the preceding section (Proposition 3.6) that the set of list-equations of P is then in d-solved form. We also assume that we have a sound and complete procedure for solving the element-equations of P, which we shall denote INF_e. For the theory BC_0, where h is uninterpreted, we know (Proposition 3.5) that INF_e is standard unification, with cancellation rules for h, and failure in case of 'symbol clash'. For the theory BC_1, where h(x, y) is interpreted as e_k(x ⊕ y) for some fixed key k, INF_e will have rules for semi-cancellation on h, besides the rules for unification modulo XOR in some fixed procedure; such a procedure is assumed given once and for all.
In all cases, we shall consider INF_e as a black box that either returns most general unifiers (mgus) for the element-equations of P, or a failure message when these are not satisfiable. Note that INF_e is unitary for BC_0 and finitary for BC_1. For any problem P in d-solved form, satisfiable under the theory BC_0, there is a unique mgu, as expressed by the equations of P themselves (cf. also [14]), which we shall denote θ_P. Under BC_1 there could be more than one (but finitely many) mgus; we shall agree to denote by θ_P any one among them. The entire procedure for solving any BC-unification problem P, given in standard form, can now be synthesized as a nondeterministic algorithm.

The Algorithm A: Given a BC-unification problem P, in standard form.
G_l = the propagation graph for P; INF'_l = the inference procedure given above for L(P); INF_e = any given (complete) procedure for solving the equations of E(P).
(1) Compute a standard form for P, to which the "don't-care" inferences of INF_l are no longer applicable. If this leads to failure, exit with FAIL. Otherwise, replace P by this standard form.
(2) Apply the "don't-know" nondeterministic rules (L8)-(L10), followed by the rules of INF_l as needed, until the equations are no longer modified by the inference rules (L1)-(L10). If this leads to failure, exit with FAIL.
(3) Apply the procedure INF_e for solving the residual set E(P) of element-equations; if this leads to failure, exit with FAIL.
(4) Otherwise, let σ be the substitution on the variables of P as expressed by the resulting equations. Return σ as a solution to P.
Proposition 4.1. The algorithm A is sound and complete.
Proof. The soundness of A follows from the soundness (assumed) of INF_e and that of INF'_l, which is easy to check: obviously, if P' is any problem derived from P by applying any of these inference rules, then any solution for P' corresponds to a solution for P. The completeness of A follows from the completeness (assumed) of INF_e, and the completeness of INF'_l, which we prove below.
Lemma 4.2. If σ is a solution for a given BC-unification problem P in standard form, then there is a sequence of INF'_l-inference steps that transforms P into a problem P' in d-solved form such that σ is an instance of θ_{P'} (modulo BC).
Proof. We know that the inference rules of INF'_l terminate on P; let N be the maximum number of steps needed for this termination, including along all possible "don't-know" branches of the process. We prove the lemma by induction on N, with a case analysis over the possible branches.
Observe first that if P' is a problem derived from P under any inference rule of INF'_l, then the given substitution σ on the variables of P extends naturally to a substitution on the variables of P', satisfying the equations of P'. (This needs to be checked only if P' might involve new variables, such as when P' is derived from P under rule (L5) or rule (L4.b); the reasoning is straightforward in either of these cases.) If P' is derived from P by applying one of the "don't-care" rules of INF_l, then the assertion of the lemma follows from the above observation and the induction hypothesis. So we may assume wlog that the given problem P is already L-reduced (i.e., none of the inferences of INF_l is applicable). If such a P is already in d-solved form, then we are done, since σ is an instance of θ_P modulo BC, for some mgu θ_P. (If the theory is BC_1, this means: there exists one among the finitely many mgus for which this holds.) If P is not in d-solved form, then several cases are possible, depending on the possible inference branches. It suffices to consider one such case, the reasoning being quite similar for all the others. Suppose there are two equations U =? bc(Z, v) and U =? bc(Y, w) in P. If σ(v) =_BC σ(w), then we must have σ(Z) =_BC σ(Y), and σ is extendable as a solution for the problem obtained by applying rule (L10). If σ(v) ≠_BC σ(w), then σ must be extendable as a solution to the problem derived under rule (L8) or rule (L9). The induction hypothesis (on the maximum number of inference steps needed for termination) then completes the argument, in all cases.

Proposition 4.3. Unification modulo BC is finitary.
Proof. Let P be a satisfiable BC-unification problem. We can assume without loss of generality that P is in standard form, because any unification problem can be converted to a finite problem in standard form. Let S be a complete set of mgus for P. By Lemma 4.2, for each σ ∈ S there is a sequence of INF'_l-inference steps that leads to a problem P' in d-solved form, and an mgu θ_{P'}, such that σ is an instance of θ_{P'}. Let D be the set of all such derived problems. Because all the inference rules in INF'_l terminate, and because there are finitely many inference rules, D contains finitely many problems.
In the uninterpreted case BC_0, each σ ∈ S is θ_{P'} for some P' ∈ D, so there are finitely many unifiers in S. For BC_1, note that unification modulo XOR is finitary [16]; therefore there are finitely many XOR-mgus for the element problem derived from each P', and hence finitely many unifiers in S that are instances of θ_{P'}. Since there are finitely many problems in D, there are finitely many unifiers in S.
4.1. BC_1-Unification is NP-Complete. Recall that BC_0 is the theory defined by BC when h is uninterpreted, and BC_1 is the theory when h is interpreted so that BC models the (XOR-based) cipher-block-chaining mode CBC.

Theorem 4.4. Unifiability modulo BC_1 is NP-complete.

Proof. NP-hardness follows from the fact that general unification modulo XOR is NP-complete [12]. We deduce the NP upper bound from the following facts:
a) For any given BC-unification problem, computing a standard form takes polynomial time, wrt the size of the problem.
b) Given a standard form, the propagation graph can be constructed in polynomial time (wrt its number of variables).
c) Applying (L1)-(L10) till termination takes only polynomially many steps.
d) Extracting the set of element-equations from the resulting set of equations is in P.
e) Solving the element-equations, with the procedure INF_e, using unification modulo XOR, is in NP.

An Illustrative Example.
The following public key protocol is a slight variant of one that was studied in [11]; the modification is that the namestamp of the sender of a message appears as the first block of the encrypted message body, and not the second as was specified in [11]. Here A, B are the participants of the protocol session, m is a message that they intend to keep secret from others, and k_b (resp. k_a) is the public key of B (resp. A).
If the CBC encryption mode is assumed and the message blocks are all of the same size, then this protocol becomes insecure; here is why. Let e_Z(x) stand for the encryption e(x, k_z) with the public key k_z of any principal Z.

Example 4.5. The above attack (which exploits the properties of XOR: x ⊕ x = 0, x ⊕ 0 = x) can be modeled as solving a certain BC_1-unification problem. We assume that the names A, B, I, as well as the initialization vector w, are constants accessible to I. The message m and the initialization vector v, that A and B have agreed upon, are constants intended to be secret from I. We shall interpret the function symbol h of BC in terms of encryption with the public key of B: i.e., h(x, y) is e_B(x ⊕ y).
The protocol above can then be modeled as follows: B needs to solve the unification problem bc([I, z], w) =? cons(h(I, w), [h(m, h(A, v))]) for the element-variable z, i.e., B needs to solve the element-equation h(z, h(I, w)) =? h(m, h(A, v)); since h is interpreted here so that BC models CBC, (s)he can do so by setting z := m ⊕ h(A, v) ⊕ h(I, w); and that precisely leads to the attack.

Remark 4.6. (i) The above analysis does not go through if the namestamp forms the second block of the encrypted part of the messages sent. In such a case, the protocol is 'leak-proof' even under CBC, provided we assume that an IV for a message is a secret to be shared only by the sender and the intended recipient of the message, and that it is not transmitted (as clear text or encrypted) as an initial 'block number zero' of the message body. Actually, by reasoning as above, one checks that the intruder I in such a case can only get hold of m ⊕ v, where v is the (secret) IV that only A and B share. This is in a sense in accordance with [11], where the protocol was 'proved secure' under such a specification.
(ii) The considerations above lead us to conclude, implicitly, that in cryptographic protocols employing the CBC encryption mode, it is necessary to forbid free access to the IVs of the 'records' of the 'messages' sent, if information leak is to be avoided. This fact was pointed out in the 90's by Bellare et al. ([6]), and again, in some detail, by K. G. Paterson et al. in [19]; both point out that TLS 1.0, with its predictable IVs, is inherently insecure. For more on this point, and on the relative advantages of TLS 1.1 and TLS 1.2 over TLS 1.0, the reader can also consult, e.g., http://www.educatedguesswork.org/2011/09/ (Note: keeping IVs as shared secrets alone may not always be sufficient in general, as is shown by Example 2 above.)
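The substitution found in Example 4.5 can be checked mechanically. The sketch below is our own toy model, not part of the paper's formalism: e_B is replaced by an arbitrary deterministic stand-in, since only the self-inverse property of XOR is needed for the check.

```python
# Check that z := m XOR h(A,v) XOR h(I,w) solves the element-equation
# h(z, h(I,w)) = h(m, h(A,v)), where h(x, y) = e_B(x XOR y).
import hashlib

def e_B(x: int) -> int:
    # Toy stand-in for encryption with B's public key; any fixed
    # deterministic function of the block suffices for this symbolic check.
    return int.from_bytes(hashlib.sha256(x.to_bytes(16, "big")).digest()[:16], "big")

def h(x: int, y: int) -> int:
    return e_B(x ^ y)          # CBC 'coupling': encrypt the XOR of block and IV

A, I_name, m, v, w = 0xA1, 0x1C, 0xBEEF, 0x1234, 0x5678   # arbitrary test values

z = m ^ h(A, v) ^ h(I_name, w)               # the intruder's substitution
assert h(z, h(I_name, w)) == h(m, h(A, v))   # the element-equation is satisfied
print("attack substitution verified")
```

The check relies only on x ⊕ x = 0 and x ⊕ 0 = x: the two occurrences of e_B(I ⊕ w) cancel, leaving exactly m ⊕ e_B(A ⊕ v) under the outer encryption.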

A generic Block Chained Cipher-Decipher Scheme
In this section we extend the 2-sorted equational theory BC_0 studied above into one that fully models, in a simple manner and without using any AC-symbols, a 'generic' block chaining encryption-decryption scheme. This theory, that we shall refer to as DBC, is defined by the following set of (2-sorted) equations:

bc(nil, z) = nil
bc(cons(x, Y), z) = cons(h(x, z), bc(Y, h(x, z)))
g(h(x, y), y) = x
db(nil, z) = nil
db(cons(x, Y), z) = cons(g(x, z), db(Y, x))
db(bc(X, y), y) = X

where g is typed as g : τ_e × τ_e → τ_e and db is typed as db : τ_l × τ_e → τ_l.
All these equations can be oriented from left to right under a suitable reduction ordering, to form a convergent (2-sorted) rewrite system. The 6th equation says that db is a left-inverse for bc; it is actually an inductive consequence of the first five: i.e., for any list-term X and element-term y, both in ground normal form, db(bc(X, y), y) reduces to X under the first five, a fact that can be easily checked by structural induction, cf. Appendix-2. (Its insertion as an equational axiom is for technical reasons, as will be explained in Remark 5.8(ii) below.) A few words, by way of intended semantics in the context of cryptographic protocols, seem appropriate: h(x, y) would in such a context stand for the encryption with the public key of an intended recipient B, of message x, 'coupled', in a sense to be defined, with y as initialization vector (IV); and g(h(x, y), y) would be the decryption of h(x, y) with the private key of B, to be then 'decoupled', again in a sense to be defined, with y. If an agent A wants to send a list of terms cons(x, Y) to recipient B, (s)he would send out bc(cons(x, Y), z), where z is the IV they have mutually agreed upon; and B would see it as the list of terms cons(h(x, z), bc(Y, h(x, z))), from which (s)he can retrieve the individual message terms by applying the last equation for db in the system DBC.
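As a sanity check of the intended semantics, the equations of DBC can be executed directly on list terms. In the sketch below (the encoding is ours, not the paper's), h is kept uninterpreted as a formal pair, and g is its left-inverse:

```python
# A minimal executable reading of the DBC equations, h uninterpreted.
def h(x, y):
    return ("h", x, y)                 # uninterpreted 'coupling'

def g(z, y):
    tag, x, y2 = z
    assert tag == "h" and y2 == y      # g(h(x, y), y) = x
    return x

def bc(X, z):
    if not X:                          # bc(nil, z) = nil
        return []
    head = h(X[0], z)                  # bc(cons(x,Y), z) = cons(h(x,z), bc(Y, h(x,z)))
    return [head] + bc(X[1:], head)

def db(X, z):
    if not X:                          # db(nil, z) = nil
        return []
    head = g(X[0], z)                  # db(cons(x,Y), z) = cons(g(x,z), db(Y, x))
    return [head] + db(X[1:], X[0])

msg = ["A", "m1", "m2"]
assert db(bc(msg, "v"), "v") == msg    # db is a left-inverse for bc
print("db(bc(X, v), v) == X holds on this example")
```

Note how db chains on the ciphertext block X[0] (not on the recovered plaintext), exactly as in the fifth equation; this is what makes the round trip work for lists of any length.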
This generic block chained encryption-decryption scheme is a natural abstraction of the usual (XOR-based) CBC: it suffices to interpret the roles of h and g suitably, and to define properly the meanings of 'coupling' and 'decoupling', to get the usual CBC mode; for that, one would define the 'coupling' as well as the 'decoupling' of x with y as x ⊕ y; h(x, y) would then stand for e_B(x ⊕ y), and g(z, y) would stand for d_B(z) ⊕ y, where d_B is decryption with the private key of B. If we go back to Example 4.5, based on the usual CBC, the encrypted part of what A sends out to B (with the notation employed there) is the list of terms [ h(A, v), h(m, h(A, v)) ], which corresponds to the term bc([A, m], v). By applying the fifth equation in DBC to this list of terms, with z instantiated to v, B retrieves the original list, i.e., the list [A, m]. In other words, the usual XOR-based CBC is indeed an 'instance' of the theory DBC.
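This instantiation can be made concrete. The following sketch uses a toy invertible 16-bit 'cipher' standing in for B's key pair (purely our illustration); it realizes h and g as in the usual XOR-based CBC and checks the round trip on [A, m]:

```python
# XOR-based CBC as an instance of DBC: h(x,y) = e_B(x ^ y), g(z,y) = d_B(z) ^ y.
M = 1 << 16
E_KEY = 9                          # any odd multiplier is invertible modulo 2^16
D_KEY = pow(E_KEY, -1, M)          # modular inverse: d_B undoes e_B

def e_B(x): return (x * E_KEY) % M
def d_B(x): return (x * D_KEY) % M

def h(x, y): return e_B(x ^ y)     # CBC 'coupling'
def g(z, y): return d_B(z) ^ y     # CBC 'decoupling': g(h(x,y), y) = x

def bc(X, z):
    return [] if not X else [h(X[0], z)] + bc(X[1:], h(X[0], z))

def db(X, z):
    return [] if not X else [g(X[0], z)] + db(X[1:], X[0])

A, m, v = 0x00A1, 0xBEEF, 0x1234
cipher = bc([A, m], v)             # the encrypted part of what A sends
assert db(cipher, v) == [A, m]     # B recovers the plaintext list [A, m]
print("XOR-CBC instance of DBC: round trip OK")
```

Here g(h(x, y), y) = d_B(e_B(x ⊕ y)) ⊕ y = x, so the collapse axiom of DBC holds by construction.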
Remark 5.1. Other 'concrete' cipher-decipher block chaining modes can also be seen as instances of DBC; one among them is the Cipher FeedBack encryption mode (CFB), which is defined as follows. Let M = p_1 ... p_n be a message given as a list of n 'plaintext' message subblocks. Then the encryption of M with any given key k and initialization vector v is defined as the list c_1 ... c_n of ciphertext message subblocks, where c_i = p_i ⊕ e_k(c_{i-1}) for 1 ≤ i ≤ n, with c_0 = v. This encryption mode (also using XOR) is very similar to CBC, but works in the reverse direction (cf. e.g., http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation).
It is an instance of DBC, if the 'coupling' and the 'decoupling' operations of DBC, namely h(x, y) and g(x, y), are both defined as x ⊕ e_k(y).
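Under these definitions, g(h(x, y), y) = (x ⊕ e_k(y)) ⊕ e_k(y) = x, so the DBC axioms hold. A minimal sketch, with a toy keyed function of our own in place of e_k:

```python
# CFB as an instance of DBC: coupling and decoupling coincide,
# h(x, y) = g(x, y) = x ^ e_k(y), where e_k is a toy keyed function.
import hashlib

def e_k(y: int) -> int:
    return int.from_bytes(hashlib.sha256(b"k" + y.to_bytes(16, "big")).digest()[:16], "big")

def h(x, y): return x ^ e_k(y)
g = h                              # decoupling coincides with coupling in CFB

def bc(X, z):                      # yields c_i = p_i ^ e_k(c_{i-1}), with c_0 = v
    return [] if not X else [h(X[0], z)] + bc(X[1:], h(X[0], z))

def db(X, z):
    return [] if not X else [g(X[0], z)] + db(X[1:], X[0])

plain, v = [0x11, 0x22, 0x33], 0x99
assert db(bc(plain, v), v) == plain    # (x ^ e_k(y)) ^ e_k(y) = x, blockwise
print("CFB instance: decryption recovers the plaintext blocks")
```

Since bc chains on the previous ciphertext block, the definition above literally produces the CFB recurrence c_i = p_i ⊕ e_k(c_{i-1}).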
The theory DBC thus appears, indeed, as a high-level equational abstraction of the block chained encryption-decryption mode; it employs no AC-symbols for this abstraction. It is easy to see, on the other hand, that the equations of DBC can all be oriented left-to-right under a suitable reduction ordering, to give a convergent rewrite system. We shall show below that unification modulo DBC is NP-decidable; it turns out to be actually NP-complete, due to the presence of a left-inverse for h (namely g).
Remark 5.2. It is important to note that the function g is not semi-cancellative: g(h(g(t, u), u), u) =_DBC g(t, u), but h(g(t, u), u) and t need not be equivalent modulo DBC. However, it is easy to show that g is left-cancellative; see Appendix-1 for the details.

Unification modulo DBC.
We assume without loss of generality that any DBC-unification problem P is given in a standard form, i.e., as a set of equations EQ, each having one of the following forms:

U =? V, U =? bc(V, y), U =? db(V, y), U =? cons(v, W), U =? nil,
u =? v, u =? g(w, y), v =? h(w, x), u =? const

We have to extend some of the notions and notation of Section 3.1, in order to take db into account. These extensions concern the propagation graph G_l of the problem and nonnil, the set of variables which cannot be nil.
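The reduction to standard form can be sketched as a flattening pass that introduces fresh variables for nested subterms, so that every resulting equation matches one of the shapes listed above (the term encoding and names below are our own):

```python
# Flatten nested term equations into 'standard form' equations.
# Terms are tuples like ("bc", X, y); variables/constants are strings.
from itertools import count

fresh = (f"_V{i}" for i in count())

def flatten(lhs_var, term, out):
    """Emit lhs_var =? f(v1, ..., vn) with variable arguments, recursing on subterms."""
    if isinstance(term, str):                  # already a variable or constant
        out.append((lhs_var, term))
        return
    op, *args = term
    new_args = []
    for a in args:
        if isinstance(a, str):
            new_args.append(a)
        else:
            v = next(fresh)                    # fresh variable names a nested subterm
            flatten(v, a, out)
            new_args.append(v)
    out.append((lhs_var, (op, *new_args)))

eqs = []
flatten("U", ("bc", ("cons", "x", "Y"), ("h", "a", "b")), eqs)
for lhs, rhs in eqs:
    print(lhs, "=?", rhs)
```

On the example, U =? bc(cons(x, Y), h(a, b)) is split into _V0 =? cons(x, Y), _V1 =? h(a, b) and U =? bc(_V0, _V1), and each piece has one of the allowed forms; the number of fresh variables is bounded by the size of the input, which is why this step is polynomial.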
(i) If U =? db(V, y) is in P, then write U >_db V; in which case, insert a directed arc on G_l from [U] to [V] and label it with >_db. The graph G_l will then also have a two-sided (undirected) edge between [U] and [V], labeled with ∼_db. (ii) The set of variables nonnil, defined earlier, is extended as follows: if U =? db(V, y) is in P, then U is in nonnil if and only if V is in nonnil.
We define a new relation >_c = >_bc ∪ >_db. Its symmetric closure is ∼_c, and its transitive, reflexive, and symmetric closure is ∼*_c. The relations >+_c, >+_db, >*_db are then defined in the usual manner. If U ∼_c V, then U and V are related by 'chaining', i.e., by some number of bc and db operations. We then refine the partial relation ≻_l on the nodes of G_l accordingly: this relation can still continue to be read as [U] ≻_l [V] iff there is a directed path on G_l from [U] to [V], at least one arc of which has label >_cons.
We now extend the inference system INF′_l of Section 3.1 by adding the following list-inferences; these additional rules are essentially the db-counterparts of the list-inferences of INF′_l, which only needed to consider bc. (There are several reasons why we have not worked with DBC right from the start, although the inference system would possibly have been more concise if we had done so. A first reason is that this would have been at the expense of readability; a second reason is that BC-unification is of interest on its own, especially for BC_1, as is shown by Example 4.5 above; a third and conclusive reason is that the inference system we present below for DBC-unification actually reduces the problem to a problem of BC-unification.) We first formulate the "don't-care" nondeterministic inference rules.

(DB1.c) Nil solution-3 for db:
EQ ⊎ { U =? db(V, x), V =? nil }
EQ ∪ { U =? nil, V =? nil }

(DB3.a) Push db below cons at a nonnil db/db-peak:
EQ ⊎ { U =? db(V, x), U =? db(W, y) }
EQ ∪ { V =? cons(v, V′), W =? cons(w, W′), U =? cons(u, U′), U′ =? db(V′, v), U′ =? db(W′, w), u =? g(v, x), u =? g(w, y) }
if U ∈ nonnil

(DB3.b) Push bc and db below cons at a nonnil bc/db-peak (the bc/db analog of (DB3.a))

(DB4) Splitting for db at a cons/db-peak:
EQ ⊎ { U =? cons(x, U_1), U =? db(V, z) }
EQ ∪ { U =? cons(x, U_1), x =? g(y, z), U_1 =? db(V_1, y), V =? cons(y, V_1) }

(DB5) Flip db to bc conditionally

Rules (DB3.a), (DB3.b), (DB4) and (DB5) have the lowest priority: they are to be applied in the "laziest" fashion. The rule (DB3.b) ("Push bc and db below cons ... if nonnil") is justified by the conditional left-cancellativity of db (cf. Lemma F, Appendix-2). Rule (DB5) is actually a 'narrowing' step, justified by the fact that db 'is a left-inverse' for bc.
For the completeness of the procedure, we shall also need a few more list-inference rules which are "don't-know" nondeterministic; namely, the rules (DB6.a)-(DB8) below:

(DB6.a) Guess a Nil-solution-Branch for db at a db/db-peak:
EQ ⊎ { U =? db(V, x), U =? db(W, y) }
EQ ∪ { U =? nil, V =? nil, W =? nil }

(DB6.b) Guess a Nil-solution-Branch for bc and db at a bc/db-peak:
EQ ⊎ { U =? bc(V, x), U =? db(W, y) }
EQ ∪ { U =? nil, V =? nil, W =? nil }

(DB7.a) Guess a Narrowing step for db at a db/db-peak:
EQ ⊎ { U =? db(V, x), U =? db(W, y) }
EQ ∪ { V =? bc(U, x), U =? db(W, y) }
if V ≯*_db U

(DB7.b) Guess a Narrowing step for db at a bc/db-peak:
EQ ⊎ { U =? bc(V, x), U =? db(W, y) }

(DB8)
EQ ⊎ { U =? db(V, x), U =? db(W, y) }
EQ ∪ { U =? db(W, y), V =? W, x =? y }

We denote by INF′′_l the inference system that extends INF′_l with the list-inference rules (DB1)-(DB8), given above. It is important to note that the Occur-Check Violation rule (L6) is henceforth to be applied to DBC-unification problems in standard form, under the partial relation ≻_l as refined above.

Proposition 5.3. Let P be any DBC-unification problem, given in standard form. The inference system INF′′_l terminates on P in polynomially many steps.

Proof. This is an extension of Proposition 3.3 to the inference system INF′′_l. The proof of that earlier proposition can be carried over practically verbatim: we only have to show that the new inferences that might introduce fresh variables, namely the three rules (DB3.a), (DB3.b) and (DB4), cannot lead to a non-terminating chain of inferences. To ensure this, a first observation is that the relation ∼_β, which was used in the proof of Proposition 3.3, has to be refined so as to take into account also the relation ∼_db, the symmetric closure of >_db. A second observation is that these three rules which might introduce fresh variables remove a ∼_db-edge at some node U, and introduce a new ∼_db-edge at a node U′ such that U >_cons U′; but the number
of ∼_β-equivalence classes remains the same, by the same argument as developed in the proof of Proposition 3.3. The other details of that earlier proof carry over verbatim.
Given any DBC-unification problem P in standard form, let A′′ denote the inference procedure based on the rules of INF′′_l, given above, for its list-equations; we augment the procedure A′′ with any given complete procedure for solving the residual set of element-equations of the problem, once the list-inference rules of INF′′_l are no longer applicable. We then have the following result:

Proposition 5.4. The procedure A′′ is sound and complete for solving DBC-unification problems given in standard form.
Proof. The proof follows the same lines of reasoning as for Proposition 4.1. The procedure A′′ is sound, because to any solution of a problem derived under any of its inferences corresponds a solution for the initial problem. The completeness of A′′ is again proved, for any given problem, by induction on the maximum number of inference steps needed for the termination of the procedure A′′ on the problem, using case analysis based on the "don't-know" inference rules (DB6.a)-(DB8) above, when necessary. We leave out the details, which are straightforward.

Proposition 5.5. Let P be a DBC-unification problem in standard form, to which none of the inferences of INF′′_l is applicable. Then its subset of list-equations with non-nil variables on the left-hand side is in d-solved form.
Proof. This extends Proposition 3.6 to the inference system INF′′_l. Note that we just need to show the following: from any given node [U] on any given connected component Γ of the propagation graph G_l, there is an unambiguous, cycle-free, directed path to a well-determined end-node on Γ. Now, given that any directed arc on G_l is labeled with either >_cons, >_bc, or >_db, there can be at most one outgoing arc from [U]: otherwise one of the inferences (DB2)-(DB8) would have been applicable; there can be no directed ≻_l-cycle at [U] either, otherwise the Occur-Check violation rule would have been applicable. Thus, the proof of that earlier proposition carries over, essentially verbatim.

Proposition 5.6. Unification modulo the theory DBC is NP-complete.
Proof. Given any DBC-unification problem P, computing a standard form can be done in polynomial time (with respect to the number of variables of P); the same holds for constructing the propagation graph of the standard form. Applying the inference rules of INF′′_l till termination on this standard form takes only polynomially many steps, by Proposition 5.3. In case of non-failure, extracting the set of element-equations from the resulting problem can obviously be done in polynomial time.
To show that solving P is in NP, it suffices therefore to show that the set of its element-equations can be solved, modulo the theory defined by the single equation g(h(x, y), y) = x, in nondeterministic polynomial time. But this is a collapsing convergent system, and the unification problem for such theories is known to be decidable and finitary [13, 18]. In particular, a decision procedure can be built by using basic normalized narrowing, e.g., as given in [5]; cf. also [17]. We outline, briefly, such a procedure:

Procedure for Solving E(P): Note that every equation in E(P) is either a g-equation, i.e., an equation of the form u =? g(x, v), or an h-equation, of the form u =? h(x, y).

The system DBC can be enlarged with the two projection operators car and cdr, via the following equations:

car(cons(x, Y)) = x (6.1)
cdr(cons(x, Y)) = Y (6.2)
bc(nil, z) = nil (6.3)
bc(cons(x, Y), z) = cons(h(x, z), bc(Y, h(x, z))) (6.4)
g(h(x, y), y) = x (6.5)
db(nil, z) = nil (6.6)
db(cons(x, Y), z) = cons(g(x, z), db(Y, x)) (6.7)
db(bc(X, y), y) = X (6.8)

with car typed as τ_l → τ_e, and cdr as τ_l → τ_l. All these equations can be oriented left-to-right under a suitable simplification ordering, and the resulting rewrite system remains convergent. It is not difficult to check that, even after the addition of these two projection rules, unification problems (with some very minor restrictions on the form of equations involving car and cdr) can still be assumed to be in standard form, and solved by the inference procedure INF′′_l given above. In other words, the results of Section 5 remain valid for this enlarged 2-sorted convergent rewrite system, which we shall again refer to as DBC, since no confusion seems likely.
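A naive innermost normalizer for the oriented equations (6.1)-(6.8), with terms encoded as nested tuples (the encoding is our own), illustrates both the projection rules and the fact that db undoes bc:

```python
def norm(t):
    """Innermost normalization under the oriented equations (6.1)-(6.8)."""
    if not isinstance(t, tuple):
        return t                                        # variable or constant
    op, *args = t
    a = [norm(x) for x in args]
    if op == "car" and a[0][0:1] == ("cons",):          # (6.1)
        return a[0][1]
    if op == "cdr" and a[0][0:1] == ("cons",):          # (6.2)
        return a[0][2]
    if op == "bc":
        X, z = a
        if X == "nil":                                  # (6.3)
            return "nil"
        if X[0:1] == ("cons",):                         # (6.4)
            head = norm(("h", X[1], z))
            return ("cons", head, norm(("bc", X[2], head)))
    if op == "g" and a[0][0:1] == ("h",) and a[0][2] == a[1]:
        return a[0][1]                                  # (6.5)
    if op == "db":
        X, z = a
        if X == "nil":                                  # (6.6)
            return "nil"
        if X[0:1] == ("bc",) and X[2] == z:             # (6.8) on an unexpanded bc
            return X[1]
        if X[0:1] == ("cons",):                         # (6.7)
            return ("cons", norm(("g", X[1], z)), norm(("db", X[2], X[1])))
    return (op, *a)

lst = ("cons", "x1", ("cons", "x2", "nil"))
assert norm(("car", lst)) == "x1"                       # projection rule (6.1)
assert norm(("db", ("bc", lst, "y"), "y")) == lst       # via (6.3)-(6.7) on ground lists
assert norm(("db", ("bc", "X", "y"), "y")) == "X"       # (6.8) fires on a variable list
print("normalization checks passed")
```

Notice that on the ground example the innermost strategy never needs rule (6.8): the result already follows from (6.3)-(6.7), in line with the remark that (6.8) is an inductive consequence of the other equations; (6.8) is still needed when the list argument of bc is a variable.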
The rewrite system DBC thus enlarged can actually be shown to be ∆-strong in the sense of [3], under a suitable precedence-based (lpo- or rpo-like) simplification ordering, by taking ∆ to be the subsystem formed of the two rules (6.1) and (6.2). It would then follow from Proposition 11 of [3] that the so-called 'passive deduction' problem for an intruder is decidable, if the intruder capabilities are modeled by this theory DBC. This would yield, to our knowledge, the first purely rewrite/unification-based approach for analyzing cryptographic protocols employing the CBC encryption mode. The details will be given elsewhere, where we also hope to present decision procedures for a couple of other security problems, in which an intruder eavesdrops on or guesses some low-entropy data in the context of block ciphers.
Finally, observe that unification modulo equational theories often serves as an auxiliary procedure in several formal protocol analysis tools, such as Maude-NPA, CL-Atse, etc., for handling the algebraic properties of cryptoprimitives. The work we have presented in this paper could be of use in these tools, as a first step towards the automation of attack detection in cryptographic protocols employing CBC.
Appendix-1: On the Cancellativity properties of bc, g and db

Lemma A. For all terms T_1, T_2, t, we have: bc(T_1, t) ≈_BC bc(T_2, t) if and only if T_1 ≈_BC T_2.
Proof. The proof is by structural induction on the terms, based on the semi-cancellativity of h and the cancellativity of cons. If either T_1 or T_2 is nil, then the other has to be nil too, and the assertion of the Lemma is trivial. So suppose that T_1 and T_2 are not nil. Then T_1 = cons(u_1, T′_1) and T_2 = cons(u_2, T′_2), for some terms u_1, u_2, T′_1, T′_2. From the semi-cancellativity of h, we then deduce that:

Proof. The proof is by exactly the same reasoning as for the previous lemma.
We shall paraphrase these two lemmas together by saying that bc is "conditionally" semi-cancellative.
As for the analogs of the above results for the operator db of DBC, we first observe that the function g is not semi-cancellative; more precisely, it is not right-cancellative: indeed, we have g(h(g(t, u), u), u) =_DBC g(t, u), although h(g(t, u), u) ≠_DBC t, in general. But left-cancellativity holds for g.
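This asymmetry can be observed concretely with a small symbolic normalizer for the collapse rule g(h(x, y), y) = x (the term encoding is ours):

```python
# g is not right-cancellative: g(h(g(t,u), u), u) and g(t,u) collapse to the
# same normal form, yet h(g(t,u), u) and t are distinct normal forms.
def norm(term):
    if not isinstance(term, tuple):
        return term
    op, *args = term
    a = [norm(x) for x in args]
    if op == "g" and isinstance(a[0], tuple) and a[0][0] == "h" and a[0][2] == a[1]:
        return a[0][1]                    # collapse rule g(h(x, y), y) = x
    return (op, *a)

t, u = "t", "u"
lhs = ("g", ("h", ("g", t, u), u), u)     # g(h(g(t,u), u), u)
assert norm(lhs) == norm(("g", t, u))     # both normalize to g(t, u)
assert norm(("h", ("g", t, u), u)) != t   # but h(g(t,u), u) is not t
print("g is not right-cancellative")
```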

3.1. Inference System INF_l for List-Equations.
Under the CBC encryption mode, what A sends to B is the following list, in ML-notation:
A → B : [ A, [ e_B(A ⊕ v), e_B(m ⊕ e_B(A ⊕ v)) ] ].
Here ⊕ stands for XOR, and v is the initialization vector (IV) agreed upon between A and B. But then, some other agent I, entitled to open a session with B with initialization vector w, can get hold of the first encrypted block (namely e_B(A ⊕ v)) as well as the second encrypted block of what A sent to B, namely e_B(m ⊕ e_B(A ⊕ v)); (s)he can then send the following as a 'bona fide' message to B:
I → B : [ I, [ e_B(I ⊕ w), e_B(m ⊕ e_B(A ⊕ v)) ] ];
upon which B will send back to I the following:
B → I : [ B, [ e_I(B ⊕ w), e_I( m ⊕ e_B(A ⊕ v) ⊕ e_B(I ⊕ w) ⊕ e_I(B ⊕ w) ) ] ].
It is clear now that the intruder I can get hold of the message m intended to remain secret for him/her: by decrypting the second block of the (encrypted part of the) message received from B, (s)he first deduces m ⊕ e_B(A ⊕ v) ⊕ e_B(I ⊕ w) ⊕ e_I(B ⊕ w); by XOR-ing this with the first block of the message, (s)he obtains m ⊕ e_B(A ⊕ v) ⊕ e_B(I ⊕ w); from which (s)he can deduce m by XOR-ing with e_B(I ⊕ w) and e_B(A ⊕ v), both of which are known to him/her (the latter of these two terms is the first block of the message from A to B, which (s)he has intercepted).
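The attack trace can be replayed concretely. The sketch below is our own illustration: a toy invertible cipher per principal stands in for public-key encryption, which the symbolic analysis abstracts away anyway.

```python
# Replaying the CBC replay attack with 16-bit blocks and toy ciphers.
M = 1 << 16
KEYS = {"B": 9, "I": 11}                       # odd multipliers => invertible mod 2^16

def e(z, x): return (x * KEYS[z]) % M          # 'encrypt with Z's public key'
def d(z, x): return (x * pow(KEYS[z], -1, M)) % M

A, B, I, m, v, w = 0x00A1, 0x00B2, 0x001C, 0xBEEF, 0x1234, 0x5678

# What A sends to B under CBC:
c1 = e("B", A ^ v)
c2 = e("B", m ^ c1)

# I replays c2 in its own session with B (IV w):
i1 = e("B", I ^ w)
# B decrypts [i1, c2], then answers I under CBC with I's key (the B -> I message):
reply = e("I", m ^ c1 ^ i1 ^ e("I", B ^ w))

# I now recovers m using only public or intercepted values:
step = d("I", reply) ^ e("I", B ^ w)           # yields m ^ c1 ^ i1
recovered = step ^ i1 ^ c1                     # XOR away the known blocks
assert recovered == m
print("intruder recovers m")
```

Note that i1 is computable by I (it is the first block of I's own message), and c1, c2 are intercepted from A's message, so every value XOR-ed in the last two lines is indeed available to the intruder.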
What A sends to B is seen by the latter as the list of terms [ A, bc([A, m], v) ]; (s)he first recovers the namestamp A of the sender, then checks that the second argument under bc in what (s)he received is the IV agreed upon with A; subsequently (s)he sends back the appropriate list of terms to A, acknowledging receipt of the message. Now, due to our CBC-assumption, the ground terms h(A, v), h(m, h(A, v)) are both accessible to the intruder I. So the attack by I, mentioned above, corresponds to the fact that I can send to B the following list of terms: [ I, [ h(I, w), h(m, h(A, v)) ] ]. That the attack materializes follows from the fact that B can solve the BC_1-unification problem: bc([I, z], w) =? cons(h(I, w), [h(m, h(A, v))]).