Probabilistic Algorithmic Knowledge

The framework of algorithmic knowledge assumes that agents use deterministic knowledge algorithms to compute the facts they explicitly know. We extend the framework to allow for randomized knowledge algorithms. We then characterize the information provided by a randomized knowledge algorithm when its answers have some probability of being incorrect. We formalize this information in terms of evidence; a randomized knowledge algorithm returning "Yes" to a query about a fact ϕ provides evidence for ϕ being true. Finally, we discuss the extent to which this evidence can be used as a basis for decisions.


Introduction
Under the standard possible-worlds interpretation of knowledge, which goes back to Hintikka [1962], an agent knows ϕ if ϕ is true at all the worlds the agent considers possible. This interpretation of knowledge has been found useful in capturing some important intuitions in the analysis of distributed protocols [Fagin, Halpern, Moses, and Vardi 1995]. However, its usefulness is somewhat limited by what Hintikka [1962] called the logical omniscience problem: agents know all tautologies and know all logical consequences of their knowledge. Many approaches have been developed to deal with the logical omniscience problem (see [Fagin, Halpern, Moses, and Vardi 1995, Chapters 10 and 11] for a discussion and survey). We focus here on one approach, which has been called algorithmic knowledge [Halpern, Moses, and Vardi 1994]. The idea is simply to assume that agents are equipped with "knowledge algorithms" that they use to compute what they know. An agent algorithmically knows ϕ if his knowledge algorithm says "Yes" when asked ϕ.¹

Algorithmic knowledge is a very general approach. For example, Berman, Garay, and Perry [1989] implicitly use a particular form of algorithmic knowledge in their analysis of Byzantine agreement. Roughly speaking, they allow agents to perform limited tests based on the information they have; agents know only what follows from these limited tests.

Evidence has been widely studied in the literature on inductive logic [Kyburg 1983]. We focus on the evidence contributed specifically by a randomized knowledge algorithm. In a companion paper [Halpern and Pucella 2003], we consider a formal logic for reasoning about evidence.
The rest of this paper is organized as follows. In Section 2, we review algorithmic knowledge (under the assumption that knowledge algorithms are deterministic). In Section 3, we give semantics to algorithmic knowledge in the presence of randomized knowledge algorithms. In Section 4, we show how the definition works in the context of an example from the security domain. In Section 5, we characterize the information provided by a randomized knowledge algorithm in terms of evidence. We conclude in Section 6. All proofs are deferred to the appendix.

Reasoning about Knowledge and Algorithmic Knowledge
The aim is to be able to reason about properties of systems involving the knowledge of agents in the system. To formalize this type of reasoning, we first need a language. The syntax for a multiagent logic of knowledge is straightforward. Starting with a set Φ of primitive propositions, which we can think of as describing basic facts about the system, such as "the door is closed" or "agent A sent the message m to B", we form more complicated formulas by closing off under negation, conjunction, and the modal operators K_1, …, K_n and X_1, …, X_n. Thus, if ϕ and ψ are formulas, then so are ¬ϕ, ϕ ∧ ψ, K_i ϕ (read "agent i knows ϕ"), and X_i ϕ (read "agent i can compute ϕ"). As usual, we take ϕ ∨ ψ to be an abbreviation for ¬(¬ϕ ∧ ¬ψ) and ϕ ⇒ ψ to be an abbreviation for ¬ϕ ∨ ψ.
The standard possible-worlds semantics for knowledge uses Kripke structures [Kripke 1963]. Formally, a Kripke structure is composed of a set S of states or possible worlds, an interpretation π that associates with each state in S a truth assignment to the primitive propositions (i.e., π(s)(p) ∈ {true, false} for each state s ∈ S and each primitive proposition p), and equivalence relations ∼_i on S (recall that an equivalence relation is a binary relation that is reflexive, symmetric, and transitive). The relation ∼_i is agent i's possibility relation. Intuitively, s ∼_i t if agent i cannot distinguish state s from state t (so that if s is the actual state of the world, agent i would consider t a possible state of the world). For our purposes, the equivalence relations are obtained by taking a set L of local states and giving each agent a view of the state, that is, a function L_i : S → L. We define s ∼_i t if and only if L_i(s) = L_i(t). In other words, agent i considers the states s and t indistinguishable if he has the same local state at both states.
To interpret explicit knowledge of the form X_i ϕ, we assign to each agent a knowledge algorithm that the agent can use to determine whether he knows a particular formula. A knowledge algorithm A takes as inputs a formula of the logic, a local state ℓ in L, and the state as a whole. This is a generalization of the original presentation of algorithmic knowledge [Halpern, Moses, and Vardi 1994], in which the knowledge algorithms did not take the state as input. The added generality is necessary to model knowledge algorithms that query the state; for example, a knowledge algorithm might use a sensor to determine the distance between a robot and a wall (see Section 5). Knowledge algorithms are required to be deterministic and to terminate on all inputs, with result "Yes", "No", or "?". A knowledge algorithm says "Yes" to a formula ϕ (in a given state) if the algorithm determines that the agent knows ϕ at the state, "No" if the algorithm determines that the agent does not know ϕ at the state, and "?" if the algorithm cannot determine whether the agent knows ϕ.
An algorithmic knowledge structure M is a tuple (S, π, L_1, …, L_n, A_1, …, A_n), where L_1, …, L_n are the view functions on the states, and A_1, …, A_n are knowledge algorithms.² We define what it means for a formula ϕ to be true (or satisfied) at a state s in an algorithmic knowledge structure M, written (M, s) |= ϕ, inductively as follows:

$$
\begin{array}{lcl}
(M,s)\models p & \text{if} & \pi(s)(p)=\mathbf{true}\\
(M,s)\models \neg\varphi & \text{if} & (M,s)\not\models\varphi\\
(M,s)\models \varphi\wedge\psi & \text{if} & (M,s)\models\varphi \text{ and } (M,s)\models\psi\\
(M,s)\models K_i\varphi & \text{if} & (M,t)\models\varphi \text{ for all } t \text{ with } s\sim_i t\\
(M,s)\models X_i\varphi & \text{if} & A_i(\varphi, L_i(s), s)=\text{``Yes''}.
\end{array}
$$

The first clause shows how we use π to define the semantics of the primitive propositions. The next two clauses, which define the semantics of ¬ and ∧, are the standard clauses from propositional logic. The fourth clause is designed to capture the intuition that agent i knows ϕ exactly if ϕ is true in all the states that i considers possible. The final clause interprets X_i ϕ via agent i's knowledge algorithm. Thus, agent i has algorithmic knowledge of ϕ at a given state if the agent's algorithm outputs "Yes" when presented with ϕ, the agent's local state, and the state. (Both the outputs "No" and "?" result in lack of algorithmic knowledge.) As usual, we say that a formula ϕ is valid in structure M and write M |= ϕ if (M, s) |= ϕ for all states s ∈ S; ϕ is valid if it is valid in all algorithmic knowledge structures.

We can think of K_i as representing implicit knowledge, facts that the agent implicitly knows, given its information. One can check that implicit knowledge is closed under implication, that is, (K_i ϕ ∧ K_i(ϕ ⇒ ψ)) ⇒ K_i ψ is valid, and that an agent implicitly knows all valid formulas, so that if ϕ is valid, then K_i ϕ is valid. These properties say that agents are very powerful reasoners. What is worse, while it is possible to change some properties of knowledge by changing the properties of the relation ∼_i, no matter how we change it, we still get closure under implication and knowledge of valid formulas as properties. They seem to be inescapable features of the possible-worlds approach. This suggests that the possible-worlds approach is appropriate only for "ideal knowers", ones that know all valid formulas as well as all logical consequences of their knowledge, and is thus inappropriate for reasoning about agents that are computationally limited. In contrast, X_i represents explicit knowledge, facts whose truth the agent can compute explicitly. Since we put no a priori restrictions on the knowledge algorithms, an agent can explicitly know both ϕ and ϕ ⇒ ψ without explicitly knowing ψ, for example.
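The satisfaction clauses above can be prototyped directly over a finite structure. The following is a minimal sketch in Python, assuming a nested-tuple encoding of formulas and a deliberately naive, hypothetical knowledge algorithm `lookup_alg` that answers "Yes" only to formulas on a fixed list. It illustrates the closing point of the paragraph: K_1 q holds, yet X_1 q fails even though X_1 p and X_1 (p ⇒ q) both hold.

```python
# Formulas are nested tuples (an encoding chosen for this sketch):
#   ("p", name) | ("not", f) | ("and", f, g) | ("K", i, f) | ("X", i, f)

def imp(f, g):
    """ϕ ⇒ ψ, as the abbreviation ¬(ϕ ∧ ¬ψ)."""
    return ("not", ("and", f, ("not", g)))


class AlgorithmicKnowledgeStructure:
    def __init__(self, states, pi, views, algs):
        self.states = states  # the set S
        self.pi = pi          # pi[s][p] is the truth value of p at s
        self.views = views    # views[i](s) is agent i's local state L_i(s)
        self.algs = algs      # algs[i](phi, local, s) in {"Yes", "No", "?"}

    def sat(self, s, f):
        """(M, s) |= f, by induction on the structure of f."""
        tag = f[0]
        if tag == "p":
            return self.pi[s][f[1]]
        if tag == "not":
            return not self.sat(s, f[1])
        if tag == "and":
            return self.sat(s, f[1]) and self.sat(s, f[2])
        if tag == "K":  # g holds at every state i cannot distinguish from s
            i, g = f[1], f[2]
            return all(self.sat(t, g) for t in self.states
                       if self.views[i](t) == self.views[i](s))
        if tag == "X":  # i's knowledge algorithm answers "Yes"
            i, g = f[1], f[2]
            return self.algs[i](g, self.views[i](s), s) == "Yes"
        raise ValueError(f"unknown formula: {f!r}")


# Hypothetical example: p and q hold everywhere, but agent 1's algorithm only
# recognizes formulas on a fixed list and never chains implications.
p, q = ("p", "p"), ("p", "q")
recognized = [p, imp(p, q)]


def lookup_alg(phi, local, s):
    return "Yes" if phi in recognized else "?"


def constant_view(s):
    return "l"  # agent 1 cannot tell s0 from s1


M = AlgorithmicKnowledgeStructure(
    states=["s0", "s1"],
    pi={"s0": {"p": True, "q": True}, "s1": {"p": True, "q": True}},
    views={1: constant_view},
    algs={1: lookup_alg},
)
print(M.sat("s0", ("X", 1, p)), M.sat("s0", ("X", 1, imp(p, q))),
      M.sat("s0", ("X", 1, q)), M.sat("s0", ("K", 1, q)))
# prints: True True False True
```

Treating X_i as an opaque call to the algorithm, rather than as a closure operator, is what frees explicit knowledge from logical omniscience in this sketch.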
As defined, there is no necessary connection between X_i ϕ and K_i ϕ. An algorithm could very well claim that agent i knows ϕ (i.e., output "Yes") whenever it chooses to, including at states where K_i ϕ does not hold. Although algorithms that make mistakes are common, we are often interested in knowledge algorithms that are correct. We say that a knowledge algorithm is sound for agent i in the structure M if for all states s of M and formulas ϕ, A_i(ϕ, L_i(s), s) = "Yes" implies (M, s) |= K_i ϕ, and A_i(ϕ, L_i(s), s) = "No" implies (M, s) |= ¬K_i ϕ. Thus, a knowledge algorithm is sound if its definite answers are correct. If we restrict attention to sound algorithms, then algorithmic knowledge can be viewed as an instance of awareness, as defined by Fagin and Halpern [1988].

² Halpern, Moses, and Vardi [1994] introduced algorithmic knowledge in the context of dynamic systems, that is, systems evolving in time. The knowledge algorithm is allowed to change at every state of the system. Since the issues that interest us do not involve time, we do not consider dynamic systems in this paper. We remark that what we are calling "algorithmic knowledge structures" here are called "algorithmic structures" in [Fagin, Halpern, Moses, and Vardi 1995; Halpern, Moses, and Vardi 1994]. The term "algorithmic knowledge structures" is used in the paperback edition of [Fagin, Halpern, Moses, and Vardi 1995].
There is a subtlety here, due to the asymmetry in the handling of the answers returned by knowledge algorithms. The logic does not let us distinguish between a knowledge algorithm returning "No" and a knowledge algorithm returning "?"; they both result in lack of algorithmic knowledge.³ In Section 5.3, where we define the notion of a reliable knowledge algorithm, reliability will be characterized in terms of algorithmic knowledge, and thus the definition will not distinguish between a knowledge algorithm returning "No" or "?". Thus, in that section, for simplicity, we consider algorithms that are complete, in the sense that they always return either "Yes" or "No", and not "?". More precisely, for a formula ϕ, define a knowledge algorithm A_i to be ϕ-complete for agent i in the structure M if for all states s of M, A_i(ϕ, L_i(s), s) ∈ {"Yes", "No"}.
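Soundness and ϕ-completeness are both finite checks once the states are finite. The sketch below is a hypothetical illustration: it only checks soundness over a given list of primitive-proposition queries (the definition itself quantifies over all formulas), and the door example, `peek`, and the two view functions are invented for the purpose. The same algorithm turns out sound or unsound depending on what the agent's view lets it distinguish.

```python
def knows(states, pi, view, s, prop):
    """(M, s) |= K_i prop for a primitive proposition prop."""
    return all(pi[t][prop] for t in states if view(t) == view(s))


def is_sound(states, pi, view, alg, queries):
    """Soundness of alg for one agent, checked only on the given queries."""
    for s in states:
        for prop in queries:
            answer = alg(prop, view(s), s)
            if answer == "Yes" and not knows(states, pi, view, s, prop):
                return False
            if answer == "No" and knows(states, pi, view, s, prop):
                return False
    return True


def is_complete_for(states, view, alg, prop):
    """prop-completeness: alg answers "Yes" or "No" (never "?") everywhere."""
    return all(alg(prop, view(s), s) in ("Yes", "No") for s in states)


states = ["s0", "s1"]
pi = {"s0": {"closed": True}, "s1": {"closed": False}}


def peek(prop, local, s):
    # Reads the global state directly, which knowledge algorithms may do.
    return "Yes" if pi[s][prop] else "No"


def full_view(s):
    return s    # the agent can tell the two states apart


def blind_view(s):
    return "l"  # the agent cannot


print(is_sound(states, pi, full_view, peek, ["closed"]))   # True
print(is_sound(states, pi, blind_view, peek, ["closed"]))  # False
print(is_complete_for(states, full_view, peek, "closed"))  # True
```

With `blind_view`, `peek` says "Yes" at s0 although K_i closed fails there, which is exactly the kind of mistake soundness rules out.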

Randomized Knowledge Algorithms
Randomized knowledge algorithms arise frequently in the literature (although they have typically not been viewed as knowledge algorithms). In order to deal with randomized algorithms in our framework, we need to address a technical question. Randomized algorithms are possibly nondeterministic; they may not yield the same result on every invocation with the same arguments. Since X_i ϕ holds at a state s if the knowledge algorithm answers "Yes" at that state, this means that, with the semantics of the previous section, X_i ϕ would not be well defined. Whether it holds at a given state would depend on the outcome of random choices made by the algorithm. However, we expect the semantics to unambiguously declare a formula either true or false.
Before we describe our solution to the problem, we discuss another potential solution, which is to define the satisfaction relation probabilistically. That is, rather than associating a truth value with each formula at each state, we associate a probability Pr_s(ϕ) with each formula ϕ at each state s. The standard semantics can be viewed as a special case of this semantics, where the probabilities are always either 0 or 1. Under this approach, it seems reasonable to take Pr_s(p) to be either 0 or 1, depending on whether primitive proposition p is true at state s, and to take Pr_s(X_i ϕ) to be the probability that i's knowledge algorithm returns "Yes" given inputs ϕ, L_i(s), and s. However, it is not then clear how to define Pr_s(ϕ ∧ ψ). Taking it to be Pr_s(ϕ)Pr_s(ψ) implicitly treats ϕ and ψ as independent, which is clearly inappropriate if ψ is ¬ϕ.⁴ Even ignoring this problem, it is not clear how to define Pr_s(X_i ϕ ∧ X_i ψ), since again there might be correlations between the output of the knowledge algorithm on input (ϕ, L_i(s), s) and on input (ψ, L_i(s), s).
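To see concretely why the product rule fails, suppose that i's knowledge algorithm answers "Yes" to some query ψ with probability 1/2 at s, and assume (for this illustration only) that negation is interpreted as Pr_s(¬ϕ) = 1 − Pr_s(ϕ). The product rule then assigns positive probability to a contradiction that is true under no outcome of the coin tosses:

$$\Pr_s(X_i\psi \wedge \neg X_i\psi) = \Pr_s(X_i\psi)\,\Pr_s(\neg X_i\psi) = \tfrac{1}{2}\cdot\left(1-\tfrac{1}{2}\right) = \tfrac{1}{4} \neq 0.$$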
We do not use probabilistic truth values in this paper. Instead, we deal with the problem by adding information to the semantic model to resolve the uncertainty about the truth value of formulas of the form X_i ϕ. Observe that if the knowledge algorithm A is randomized, then the answer that A gives on input (ϕ, ℓ, s) will depend on the outcome of coin tosses (or whatever other randomizing device is used by A). We thus turn the randomized algorithm into a deterministic algorithm by supplying it with an appropriate extra argument. For example, we supply an algorithm that makes random choices by tossing coins with a sequence of coin-toss outcomes. We can now interpret a knowledge algorithm answering "Yes" with probability α at a state by considering the probability of those sequences of coin tosses at the state that make the algorithm answer "Yes".
Formally, we start with (possibly randomized) knowledge algorithms A_1, …, A_n. For simplicity, assume that the randomness in the knowledge algorithms comes from tossing coins. A derandomizer is a tuple v = (v_1, …, v_n) such that for every agent i, v_i is a sequence of outcomes of coin tosses (heads and tails). There is a separate sequence of coin tosses for each agent rather than just a single sequence of coin tosses, since we do not want to assume that all agents use the same coin. Let V be the set of all such derandomizers. To every randomized algorithm A we associate a derandomized algorithm A^d, which takes as input not just the query ϕ, local state ℓ, and state s, but also the sequence v_i of i's coin tosses, taken from a derandomizer (v_1, …, v_n). A probabilistic algorithmic knowledge structure is a tuple N = (S, π, L_1, …, L_n, A_1^d, …, A_n^d, ν), where ν is a probability distribution on V and A_i^d is the derandomized version of A_i. (Note that in a probabilistic algorithmic knowledge structure the knowledge algorithms are in fact deterministic.)

The only assumption we make about the distribution ν is that it does not assign zero probability to the nonempty sets of sequences of coin tosses that determine the result of the knowledge algorithm. More precisely, we assume that for all agents i, formulas ϕ, and states s,

$$\{v \mid A_i^d(\varphi, L_i(s), s, v_i)=\text{``Yes''}\}\neq\emptyset \quad\text{implies}\quad \nu(\{v \mid A_i^d(\varphi, L_i(s), s, v_i)=\text{``Yes''}\})>0,$$

and similarly for "No" and "?" answers. Note that this property is satisfied, for instance, if ν assigns nonzero probability to every sequence of coin tosses. We do not impose any other restrictions on ν. In particular, we do not require that the coin be fair or that the tosses be independent. Of course, we can capture correlation between the agents' coins by using an appropriate distribution ν.
The truth of a formula is now determined relative to a pair (s, v) consisting of a state s and a derandomizer v. We abuse notation and continue to call these pairs states. The semantics of formulas in a probabilistic algorithmic knowledge structure is a straightforward extension of their semantics in algorithmic knowledge structures. The semantics of primitive propositions is given by π; conjunctions and negations are interpreted as usual; for knowledge and algorithmic knowledge, we have

$$
\begin{array}{lcl}
(N,s,v)\models K_i\varphi & \text{if} & (N,t,v')\models\varphi \text{ for all } v'\in V \text{ and all } t\in S \text{ such that } s\sim_i t\\
(N,s,v)\models X_i\varphi & \text{if} & A_i^d(\varphi, L_i(s), s, v_i)=\text{``Yes''}.
\end{array}
$$

Given that v_i describes the outcomes of the coin tosses, it is perhaps best to interpret (N, s, v) |= X_i ϕ as saying that agent i's knowledge algorithm would say "Yes" if it were run in state s with derandomizer v_i. The semantics for knowledge then enforces the intuition that the agent knows neither the state nor the derandomizer used.⁵ Having the sequence of coin tosses as part of the input allows us to talk about the probability that i's algorithm answers "Yes" to the query ϕ at a state s.
To capture this in the language, we extend the language to allow formulas of the form Pr(ϕ) ≥ α, read "the probability of ϕ is at least α".⁶ The semantics of such formulas is straightforward:

$$(N,s,v)\models \Pr(\varphi)\ge\alpha \quad\text{if}\quad \nu(\{v' \mid (N,s,v')\models\varphi\})\ge\alpha.$$

⁵ … t ∈ S such that s ∼_i t. This would be appropriate if the agent knew the derandomizer being used.

⁶ We allow α to be an arbitrary real number here. If we were concerned with complexity results and having a finitary language, it would make sense to restrict α to being rational, as is done, for example, in [Fagin, Halpern, and Megiddo 1990]. None of our results would be affected if we restricted α in this way.
Note that the truth of Pr(ϕ) ≥ α at a state (s, v) is independent of v. Thus, we can abuse notation and write (N, s) |= Pr(ϕ) ≥ α. In particular, (N, s) |= Pr(X_i ϕ) < α (or, equivalently, (N, s) |= Pr(¬X_i ϕ) > 1 − α) if the probability of the knowledge algorithm returning "Yes" on a query ϕ is less than α, given state s.
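A minimal sketch of the construction, assuming a single agent whose (hypothetical) algorithm uses at most two coin tosses, so that a derandomizer is just a pair of outcomes. The derandomized algorithm `a_d` is deterministic in its extra argument, and Pr(X_i ϕ) ≥ α is evaluated by summing ν over the derandomizers that produce "Yes"; no particular derandomizer is fixed, which is why the answer does not depend on v. The distribution `nu` is deliberately biased to reflect that ν need not be fair.

```python
from fractions import Fraction
from itertools import product

# Assumption for this sketch: one agent whose algorithm uses at most two coin
# tosses, so a derandomizer is a pair of outcomes in {"H", "T"}.
V = list(product("HT", repeat=2))
# nu need not be uniform; here the first toss is biased towards heads.
nu = {("H", "H"): Fraction(3, 8), ("H", "T"): Fraction(3, 8),
      ("T", "H"): Fraction(1, 8), ("T", "T"): Fraction(1, 8)}


def a_d(phi, local, s, v_i):
    """Derandomized version of a hypothetical algorithm that answers "Yes"
    unless both of its coin tosses land tails."""
    return "No" if v_i == ("T", "T") else "Yes"


def pr_x(phi, local, s):
    """nu({v : a_d(phi, local, s, v) = "Yes"}), the probability of X_i phi."""
    return sum(nu[v] for v in V if a_d(phi, local, s, v) == "Yes")


def sat_pr_at_least(phi, local, s, alpha):
    """(N, s) |= Pr(X_i phi) >= alpha."""
    return pr_x(phi, local, s) >= alpha


# The assumption on nu: every answer that can occur gets positive mass.
for ans in {a_d("phi", "l", "s", v) for v in V}:
    assert sum(nu[v] for v in V if a_d("phi", "l", "s", v) == ans) > 0

print(pr_x("phi", "l", "s"))  # 7/8
```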
If all the knowledge algorithms used are deterministic, then this semantics agrees with the semantics given in Section 2. To make this precise, note that if A is deterministic, then A^d(ϕ, ℓ, s, v_i) = A(ϕ, ℓ, s) for every sequence v_i of coin tosses.

Propositions 3.1 and 3.2 justify the decision to "factor out" the randomization of the knowledge algorithms into semantic objects that are distinct from the states; the semantics of formulas that do not depend on the randomized choices does not in fact depend on those additional semantic objects.

An Example from Security
As we mentioned in the introduction, an important area of application for algorithmic knowledge is the analysis of cryptographic protocols. In previous work [Halpern and Pucella 2002], we showed how algorithmic knowledge can be used to model the resource limitations of an adversary. We briefly review the framework of that paper here.
Participants in a security protocol are viewed as exchanging messages in the free algebra generated by a set P of plaintexts and a set K of keys, over abstract operations · (concatenation) and {| |} (encryption). The set M of messages is the smallest set that contains K and P and is closed under encryption and concatenation, so that if m_1 and m_2 are in M and k ∈ K, then m_1 · m_2 and {|m_1|}_k are in M. We make the assumption, standard in the security literature, that concatenation and encryption have enough redundancy to recognize that a term is in fact a concatenation m_1 · m_2 or an encryption {|m|}_k.
In an algorithmic knowledge security structure, some of the agents are participants in the security protocol being modeled, while other agents are adversaries that do not participate in the protocol but attempt to subvert it. The adversary is viewed as just another agent, whose local state contains all the messages it has intercepted, as well as the keys initially known to the adversary, such as the public keys of all the agents. We use initkeys(ℓ) to denote the set of initial keys known by an agent with local state ℓ and write recv(m) ∈ ℓ if m is one of the messages received (or intercepted, in the case of the adversary) by an agent with local state ℓ. We assume that the language includes a primitive proposition has_i(m) for every message m, essentially saying that message m is contained within a message that agent i has received. Define the containment relation ⊑ on M as the smallest relation satisfying the following constraints: (1) m ⊑ m; (2) if m ⊑ m_1, then m ⊑ m_1 · m_2; (3) if m ⊑ m_2, then m ⊑ m_1 · m_2; (4) if m ⊑ m_1, then m ⊑ {|m_1|}_k. We then take (M, s) |= has_i(m) if m ⊑ m′ for some message m′ such that recv(m′) ∈ L_i(s).

Clearly, the adversary may not explicitly know that he has a given message if that message is encrypted using a key that the adversary does not know. To capture these restrictions, Dolev and Yao [1983] gave a now-standard description of the capabilities of adversaries. Succinctly, a Dolev-Yao adversary can compose messages, replay them, or decipher them if he knows the right keys, but cannot otherwise "crack" encrypted messages. The Dolev-Yao model can be formalized by a relation H ⊢_DY m between a set H of messages and a message m. (Our formalization is equivalent to many other formalizations of Dolev-Yao in the literature, and is similar in spirit to that of Paulson [1998].) Intuitively, H ⊢_DY m means that an adversary can "extract" message m from a set of received messages and keys H, using the allowable operations. The derivation is defined using the following inference rules:

$$
\frac{m\in H}{H\vdash_{DY} m}
\qquad
\frac{H\vdash_{DY}\{|m|\}_k \quad H\vdash_{DY} k^{-1}}{H\vdash_{DY} m}
\qquad
\frac{H\vdash_{DY} m_1\cdot m_2}{H\vdash_{DY} m_1}
\qquad
\frac{H\vdash_{DY} m_1\cdot m_2}{H\vdash_{DY} m_2}
$$

where k^{-1} represents the key used to decrypt messages encrypted with k.
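The containment relation and the extraction closure behind ⊢_DY are easy to prototype. This is a sketch under stated assumptions: messages are encoded as tuples, keys are assumed symmetric (so the hypothetical `inverse` returns k itself), and only the extraction steps described in the text (splitting concatenations, decrypting with a key already extracted) are implemented, not message composition. The contrast is the point: ⊑ looks inside encryptions regardless of keys, while ⊢_DY does not.

```python
# Messages as tuples (an encoding assumed for this sketch):
#   ("txt", name) | ("key", name) | ("cat", m1, m2) | ("enc", m, k)

def contained(m, m2):
    """m ⊑ m2: the smallest relation with m ⊑ m that is closed under
    concatenation (either side) and encryption."""
    if m == m2:
        return True
    if m2[0] == "cat":
        return contained(m, m2[1]) or contained(m, m2[2])
    if m2[0] == "enc":
        return contained(m, m2[1])
    return False


def inverse(k):
    # Assumption: symmetric keys, so k^{-1} = k.
    return k


def dy_closure(H):
    """All messages extractable from H by splitting concatenations and by
    decrypting {|m|}_k once k^{-1} has itself been extracted."""
    known = set(H)
    changed = True
    while changed:
        changed = False
        for m in list(known):
            parts = []
            if m[0] == "cat":
                parts = [m[1], m[2]]
            elif m[0] == "enc" and inverse(m[2]) in known:
                parts = [m[1]]
            for part in parts:
                if part not in known:
                    known.add(part)
                    changed = True
    return known


def dy_derives(H, m):
    return m in dy_closure(H)


secret, k, k2 = ("txt", "s"), ("key", "k"), ("key", "k2")
bundle = ("cat", ("enc", secret, k), k)  # ciphertext shipped with its own key
print(dy_derives({bundle}, secret))               # True
print(dy_derives({("enc", secret, k2)}, secret))  # False: k2 is unknown
print(contained(secret, ("enc", secret, k2)))     # True: ⊑ ignores keys
```

The closure terminates because every step only adds subterms of messages already present.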
We can encode these capabilities via a knowledge algorithm A^DY for the adversary as agent i. Intuitively, the knowledge algorithm A^DY simply implements a search for the derivation of a message m from the messages that the agent has received and the initial set of keys, using the rules given above. The most interesting case in the definition of A^DY is when the formula is has_i(m). To compute A_i^DY(has_i(m), ℓ, s), the algorithm simply checks, for every message m′ received by the adversary, whether m is a submessage of m′, according to the keys that are known to the adversary (given by the function keysof). Checking whether m is a submessage of m′ is performed by a function submsg, which can take apart messages created by concatenation, or decrypt messages as long as the adversary knows the decryption key. (The function submsg basically implements the inference rules for ⊢_DY.) Note that the algorithm does not use the input s. Further details can be found in [Halpern and Pucella 2002], where it is also shown that A_i^DY is a sound knowledge algorithm that captures the Dolev-Yao adversary in the following sense:

Proposition 4.1. [Halpern and Pucella 2002] If M = (S, π, L_1, …, L_n, A_1, …, A_n) is an algorithmic knowledge security structure with an adversary as agent i and A_i = A_i^DY, then …

The Dolev-Yao algorithm is deterministic. It does not capture, for example, an adversary who guesses keys in an effort to crack an encryption. Assume that the key space consists of finitely many keys, and let guesskeys(r) return r of these, chosen uniformly at random. Let A_i^{DY+rg(r)} be the result of modifying A_i^DY to take random guessing into account (the rg stands for random guess), so that A_i^{DY+rg(r)}(has_i(m), ℓ, s) is defined by the following algorithm:

    if m ∈ initkeys(ℓ) then return "Yes"
    K = keysof(ℓ) ∪ guesskeys(r)
    for recv(m′) ∈ ℓ do
        if submsg(m, m′, K) then return "Yes"
    return "?"
(As before, the algorithm does not use the input s.) Using A_i^{DY+rg(r)}, the adversary gets to work with whatever keys he already had available, all the keys he can obtain using the standard Dolev-Yao algorithm, and the additional r randomly chosen keys returned by guesskeys(r).
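The guessing algorithm can be sketched as follows. Everything here is an illustrative assumption rather than the construction of [Halpern and Pucella 2002]: the tuple encoding of messages, symmetric keys, the hypothetical helpers `subterms` and `keysof` (computing extractable keys to a fixpoint), and `rng.sample` standing in for guesskeys(r). The Monte Carlo estimate at the end shows that five guesses out of a hundred keys reveal a singly encrypted secret only about five percent of the time.

```python
import random
from dataclasses import dataclass, field

# Messages: ("txt", name) | ("key", name) | ("cat", m1, m2) | ("enc", m, k);
# keys are assumed symmetric (k^{-1} = k) throughout this sketch.


@dataclass
class LocalState:
    received: list                              # m' with recv(m') in the state
    initkeys: set = field(default_factory=set)


def submsg(m, m2, keys):
    """Can m be taken out of m2 using only the decryption keys in `keys`?"""
    if m == m2:
        return True
    if m2[0] == "cat":
        return submsg(m, m2[1], keys) or submsg(m, m2[2], keys)
    if m2[0] == "enc" and m2[2] in keys:
        return submsg(m, m2[1], keys)
    return False


def subterms(m):
    """Hypothetical helper: m and every message nested inside it."""
    yield m
    if m[0] == "cat":
        yield from subterms(m[1])
        yield from subterms(m[2])
    elif m[0] == "enc":
        yield from subterms(m[1])


def keysof(local):
    """Hypothetical helper: initial keys plus every key extractable from the
    received messages, iterated to a fixpoint."""
    keys = set(local.initkeys)
    while True:
        new = {t for m in local.received for t in subterms(m)
               if t[0] == "key" and submsg(t, m, keys)} - keys
        if not new:
            return keys
        keys |= new


def a_dy_rg(m, local, keyspace, r, rng):
    """Sketch of A^{DY+rg(r)} on the query has_i(m)."""
    if m in local.initkeys:
        return "Yes"
    keys = keysof(local) | set(rng.sample(keyspace, r))  # plays guesskeys(r)
    for m2 in local.received:
        if submsg(m, m2, keys):
            return "Yes"
    return "?"


keyspace = [("key", f"k{j}") for j in range(100)]
secret = ("txt", "s")
local = LocalState(received=[("enc", secret, ("key", "k7"))])
rng = random.Random(0)
print(a_dy_rg(secret, local, keyspace, 0, rng))    # ? (no key, no guesses)
print(a_dy_rg(secret, local, keyspace, 100, rng))  # Yes (every key guessed)
hits = sum(a_dy_rg(secret, local, keyspace, 5, rng) == "Yes"
           for _ in range(2000))
print(hits / 2000)  # an estimate near 5/100
```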
Of course, if the total number of keys is large relative to r, making r random guesses should not help much. Our framework lets us make this precise.

Proposition 4.2. Suppose that N = (S, π, L_1, …, L_n, A_1^d, …, A_n^d, ν) is a probabilistic algorithmic knowledge security structure with an adversary as agent i and that A_i = A_i^{DY+rg(r)}. Let K be the number of distinct keys used in the messages in the adversary's local state ℓ (i.e., the number of keys used in the messages that the adversary has intercepted at a state s with L_i(s) = ℓ). Suppose that K/|K| < 1/2, where |K| is the size of the key set K, and that ν is the uniform distribution on sequences of coin tosses.
Proposition 4.2 says that what we expect to be true is in fact true: random guessing of keys is sound, but it does not help much (at least, if the number of keys guessed is a small fraction of the total number of keys). If it is possible that the adversary does not have algorithmic knowledge of m, then the probability that he has algorithmic knowledge is low. While this result just formalizes our intuitions, it does show that the probabilistic algorithmic knowledge framework has the resources to formalize these intuitions naturally.

Probabilistic Algorithmic Knowledge
While the "guessing" extension of the Dolev-Yao algorithm considered in the previous section is sound, we are often interested in randomized knowledge algorithms that may sometimes make mistakes. We consider a number of examples in this section to motivate our approach.
First, suppose that Bob knows (or believes) that a coin is either fair or double-headed, and wants to determine which. He cannot examine the coin, but he can "test" it by having it tossed and observing the outcome. Let dh be a proposition that is true if and only if the coin is double-headed. Bob uses the following dh-complete randomized knowledge algorithm A_Bob: when queried about dh, the algorithm "tosses" the coin, returning "Yes" if the coin lands heads and "No" if the coin lands tails. It is not hard to check that if the coin is double-headed, then A_Bob answers "Yes" with probability 1 (and hence "No" with probability 0); if the coin is fair, then A_Bob answers "Yes" with probability 0.5 (and hence "No" with probability 0.5 as well). Thus, if the coin is fair, there is a chance that A_Bob will make a mistake, although we can make the probability of error arbitrarily small by applying the algorithm repeatedly (alternatively, by increasing the number of coin tosses performed by the algorithm).
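The effect of repetition can be quantified with a short sketch. It assumes one natural way of combining k tosses, which the text does not fix: answer "Yes" only if every toss lands heads. A double-headed coin still always yields "Yes", while a fair coin yields a wrong "Yes" with probability (1/2)^k. The names `a_bob` and `wrong_yes_probability` are hypothetical.

```python
import random
from fractions import Fraction


def a_bob(double_headed, rng, tosses=1):
    """Answer "Yes" to dh iff every toss lands heads. tosses=1 is the algorithm
    in the text; larger values model repeating it (an assumed combination)."""
    for _ in range(tosses):
        heads = double_headed or rng.random() < 0.5
        if not heads:
            return "No"
    return "Yes"


def wrong_yes_probability(tosses):
    """A fair coin shows heads on every toss with probability (1/2)^tosses."""
    return Fraction(1, 2) ** tosses


rng = random.Random(42)
trials = 20_000
estimate = sum(a_bob(False, rng, tosses=3) == "Yes"
               for _ in range(trials)) / trials
print(wrong_yes_probability(3), estimate)  # 1/8 and an estimate near 0.125
```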
Second, consider a robot navigating using a probabilistic sensor. This sensor returns the distance to the wall in front of the robot, within some tolerance. For simplicity, suppose that if the wall is at distance m, then the sensor will return a reading of m − 1 with probability 1/4, a reading of m with probability 1/2, and a reading of m + 1 with probability 1/4. Let wall(m) be a proposition true at a state if and only if the wall is at distance at most m in front of the robot. Suppose that the robot uses the following knowledge algorithm A_Robot to answer queries. Given query wall(m), A_Robot observes the sensor. Suppose that it reads r. If r ≤ m, the algorithm returns "Yes"; otherwise, it returns "No". It is not hard to check that if the wall is actually at distance less than or equal to m, then A_Robot answers "Yes" to a query wall(m) with probability ≥ 3/4 (and hence "No" with probability ≤ 1/4). If the wall is actually at distance greater than m, then A_Robot answers "Yes" with probability ≤ 1/4 (and hence "No" with probability ≥ 3/4).
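The bounds for A_Robot can be checked by enumerating the three possible sensor readings. This sketch assumes integer distances, so that the readings d − 1, d, and d + 1 are the only outcomes; `SENSOR` and `prob_yes` are hypothetical names. The worst cases are the wall exactly at m (probability 3/4 of "Yes") and exactly at m + 1 (probability 1/4 of "Yes").

```python
from fractions import Fraction

# Sensor model from the text: offset of the reading from the true distance.
SENSOR = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}


def prob_yes(d, m):
    """Probability that A_Robot answers "Yes" to wall(m) when the wall is at
    distance d, i.e., that the reading d + offset is at most m."""
    return sum((p for off, p in SENSOR.items() if d + off <= m), Fraction(0))


for d in range(3, 8):
    print(d, prob_yes(d, 5))
# 3 1
# 4 1
# 5 3/4
# 6 1/4
# 7 0
```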
There are two ways of modeling this situation. The first (which is what we are implicitly doing) is to make the reading of the sensor part of the knowledge algorithm. This means that the actual reading is not part of the agent's local state, and that the output of the knowledge algorithm depends on the global state. The alternative would have been to model the process of reading the sensor in the agent's local state. In that case, the output of the knowledge algorithm would depend only on the agent's local state. There is a tradeoff here. On the one hand, it is useful to have the flexibility of allowing the knowledge algorithm to depend on the global state; on the other hand, we ultimately do not want the knowledge algorithm to use information in the global state that is not available to the agent. For example, we would not want the knowledge algorithm's answer to depend on the actual distance to the wall (beyond the extent to which the sensor reading depends on the actual distance). It is up to the modeler to ensure that the knowledge algorithm is appropriate. A poor model will lead to poor results.
Finally, suppose that Alice has in her local state a number n > 2. Let prime be a proposition true at state s if and only if the number n in Alice's local state is prime. Clearly, Alice either (implicitly) knows prime or knows ¬prime. However, this is implicit knowledge. Suppose that Alice uses Rabin's [1980] primality-testing algorithm to test if n is prime. That algorithm uses a (polynomial-time computable) predicate P(n, a) with the following properties, for a natural number n and 1 ≤ a ≤ n − 1: (1) P(n, a) ∈ {0, 1}; (2) if n is composite, P(n, a) = 1 for at least n/2 choices of a; (3) if n is prime, P(n, a) = 0 for all a. Thus, Alice uses the following randomized knowledge algorithm A_Alice: when queried about prime, the algorithm picks a number a at random between 1 and n − 1, where n is the number in Alice's local state; if P(n, a) = 1, it says "No", and if P(n, a) = 0, it says "Yes". (It is irrelevant for our purposes what the algorithm does on other queries.) It is not hard to check that A_Alice has the following properties. If the number n in Alice's local state is prime, then A_Alice answers "Yes" to a query prime with probability 1 (and hence "No" to the same query with probability 0). If n is composite, A_Alice answers "Yes" to a query prime with probability ≤ 1/2 and "No" with probability ≥ 1/2. Thus, if n is composite, there is a chance that A_Alice will make a mistake, although we can make the probability of error arbitrarily small by applying the algorithm repeatedly. While this problem seems similar to the double-headed coin example above, note that we have only bounds on the probabilities here. The actual probabilities corresponding to a particular number n depend on various number-theoretic properties of that number. We return to this issue in Section 5.2.
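A sketch of A_Alice, assuming the Miller–Rabin strong-pseudoprime test as the predicate P (the text only states P's properties) and treating every a as a witness when n is even. The helper `prob_yes` counts the values of a with P(n, a) = 0 to obtain the exact probability that A_Alice says "Yes"; this is 1 for primes, and for composites it is a number-dependent value below the 1/2 bound, illustrating why only bounds are available in general.

```python
import random
from fractions import Fraction


def P(n, a):
    """Witness predicate: 1 if a shows that n is composite, else 0.
    Assumption: the Miller-Rabin strong-pseudoprime test, with every a
    counted as a witness when n > 2 is even."""
    if n % 2 == 0:
        return 1
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x in (1, n - 1):
        return 0
    for _ in range(s - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return 0
    return 1


def a_alice(n, rng):
    """Answer the query prime: pick a in [1, n-1] at random, then test it."""
    a = rng.randint(1, n - 1)
    return "No" if P(n, a) == 1 else "Yes"


def prob_yes(n):
    """Exact probability that a_alice answers "Yes" on input n."""
    liars = sum(1 for a in range(1, n) if P(n, a) == 0)
    return Fraction(liars, n - 1)


print(prob_yes(97))  # 1, since 97 is prime
print(prob_yes(91))  # 91 = 7 * 13 is composite, so this is at most 1/2
```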
Randomized knowledge algorithms like those in the examples above are quite common in the literature. They are not sound, but are "almost sound". The question is what we can learn from such an "almost sound" algorithm. Returning to the first example, we know the probability that A_Bob says "Yes" (to the query dh) given that the coin is double-headed; what we are interested in is the probability that the coin is double-headed given that A_Bob says "Yes". (Of course, the coin is either double-headed or not. However, if Bob has to make decisions based on whether the coin is double-headed, it seems reasonable for him to ascribe a subjective probability to the coin being double-headed. It is this subjective probability that we are referring to here.) Taking "dh" to represent the event "the coin is double-headed" (thus, the proposition dh is true at exactly the states in dh), by Bayes' rule,

$$\Pr(dh \mid A_{Bob}\text{ says ``Yes''}) = \frac{\Pr(A_{Bob}\text{ says ``Yes''} \mid dh)\,\Pr(dh)}{\Pr(A_{Bob}\text{ says ``Yes''})}.$$
The only piece of information in this equation that we have is Pr(A_Bob says "Yes" | dh). If we had Pr(dh), we could derive Pr(A_Bob says "Yes"). However, we do not have that information, since we did not assume a probability distribution on the choice of coin. Although we do not have the information needed to compute Pr(dh | A_Bob says "Yes"), there is still a strong intuition that if X_i dh holds, this tells us something about whether the coin is double-headed. How can this be formalized?
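The dependence on the missing prior can be made concrete. The sketch below takes Pr(dh) as an explicit input, which is precisely the information the model does not supply, and applies Bayes' rule with Pr(Yes | dh) = 1 and Pr(Yes | ¬dh) = 1/2. The same "Yes" answer leads to very different posteriors, which is why a prior-free notion such as evidence is attractive here. The function name is hypothetical.

```python
from fractions import Fraction


def posterior_dh_given_yes(prior_dh):
    """Pr(dh | A_Bob says "Yes") by Bayes' rule, with Pr(Yes | dh) = 1 and
    Pr(Yes | fair) = 1/2. The prior Pr(dh) is an assumed input."""
    prior_dh = Fraction(prior_dh)
    pr_yes = prior_dh * 1 + (1 - prior_dh) * Fraction(1, 2)  # total probability
    return prior_dh * 1 / pr_yes


for prior in (Fraction(1, 2), Fraction(1, 100)):
    print(prior, posterior_dh_given_yes(prior))
# 1/2 2/3
# 1/100 2/101
```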
5.1. Evidence. Intuitively, the fact that X_i ϕ holds provides "evidence" that ϕ holds. But what is evidence? There are a number of definitions in the literature. They all essentially give a way to assign a "weight" to different hypotheses based on an observation; they differ in exactly how they assign the weight (see [Kyburg 1983] for a survey). Some of these approaches make sense only if there is a probability distribution on the hypotheses. Since this is typically not the case in the applications of interest to us (for example, in the primality example, we do not want to assume a probability on the input n), we use a definition of evidence given by Shafer [1982] and Walley [1987], which does not presume a probability on hypotheses. We start with a set H of hypotheses, which we take to be mutually exclusive and exhaustive; thus, exactly one hypothesis holds at any given time. For the examples of this paper, the hypotheses of interest have the form H = {h_0, ¬h_0}, where the hypothesis ¬h_0 is the negation of hypothesis h_0. Intuitively, this is because we want to reason about the evidence associated with a formula or its negation (see Section 5.3). For example, if h_0 is "the coin is double-headed", then ¬h_0 is "the coin is not double-headed" (and thus, if there are only two kinds of coins, double-headed and fair, then ¬h_0 is "the coin is fair"). We are given a set O of observations, which can be understood as outcomes of experiments that we can make. Assume that for each hypothesis h ∈ H there is a probability space (O, 2^O, µ_h). Intuitively, µ_h(ob) is the probability of ob given that hypothesis h holds. While this looks like a conditional probability, notice that it does not require a probability on H.
Taking ∆(O) to denote the set of probability measures on O, define an evidence space to be a tuple E = (H, O, F), where H is a set of hypotheses, O is a set of observations, and F : H → ∆(O). Thus, F associates with each hypothesis a probability on observations (intuitively, the probability that various observations are true given that the hypothesis holds). We often denote F(h) as µ_h. For an evidence space E, the weight that the observation ob lends to hypothesis h ∈ H, written w_E(ob, h), is

$$w_E(ob, h) = \frac{\mu_h(ob)}{\sum_{h'\in H}\mu_{h'}(ob)}. \qquad (5.1)$$

Equation (5.1) does not define a weight w_E for an observation ob such that Σ_{h∈H} µ_h(ob) = 0.
Intuitively, this means that the observation $ob$ is impossible. In the literature on confirmation theory, it is typically assumed that this case never arises. More precisely, it is assumed that all observations are possible, so that for every observation $ob$, there is a hypothesis $h$ such that $\mu_h(ob) > 0$. In our case, making this assumption is unnatural. We want to view the answers given by knowledge algorithms as observations, and it seems perfectly reasonable to have a knowledge algorithm that never returns "No", for instance. As we shall see (Proposition 5.1), the fact that the weight of evidence is undefined when $\sum_{h \in H} \mu_h(ob) = 0$ is not a problem in our intended application, thanks to our assumption that $\nu$ does not assign zero probability to the nonempty sets of sequences of coin tosses that determine the result of the knowledge algorithm.
Observe that the measure $w_E$ always lies between 0 and 1, with 1 indicating that the full weight of the evidence of observation $ob$ is provided to the hypothesis. While the weight of evidence $w_E$ looks like a probability measure (for instance, for each fixed observation $ob$ for which $\sum_{h \in H} \mu_h(ob) > 0$, the sum $\sum_{h \in H} w_E(ob, h)$ is 1), one should not interpret it as a probability measure. It is simply a way to assign a weight to hypotheses given observations. It is possible to interpret the weight function $w_E$ as a prescription for how to update a prior probability on the hypotheses into a posterior probability on those hypotheses, after having considered the observations made. We do not focus on these aspects here; see [Halpern and Pucella 2003] for more details.
For the double-headed coin example, the set $H$ of hypotheses is $\{\mathit{dh}, \neg\mathit{dh}\}$. The observations $O$ are simply the possible outputs of the knowledge algorithm $A_{\mathit{Bob}}$ on the formula $\mathit{dh}$, namely, {"Yes", "No"}. From the discussion following the description of the example, it follows that $\mu_{\mathit{dh}}(\text{"Yes"}) = 1$ and $\mu_{\mathit{dh}}(\text{"No"}) = 0$, since the algorithm always says "Yes" when the coin is double-headed. Similarly, $\mu_{\neg\mathit{dh}}(\text{"Yes"})$ is the probability that the algorithm says "Yes" if the coin is not double-headed. By assumption, the coin is fair if it is not double-headed, so $\mu_{\neg\mathit{dh}}(\text{"Yes"}) = 1/2$ and $\mu_{\neg\mathit{dh}}(\text{"No"}) = 1/2$. Define $F(\mathit{dh}) = \mu_{\mathit{dh}}$ and $F(\neg\mathit{dh}) = \mu_{\neg\mathit{dh}}$, and let $E = (\{\mathit{dh}, \neg\mathit{dh}\}, \{\text{"Yes"}, \text{"No"}\}, F)$.
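It is easy to check that $w_E(\text{"Yes"}, \mathit{dh}) = 1/(1 + 1/2) = 2/3$ and $w_E(\text{"Yes"}, \neg\mathit{dh}) = (1/2)/(1 + 1/2) = 1/3$. Intuitively, a "Yes" answer to the query $\mathit{dh}$ provides more evidence for the hypothesis $\mathit{dh}$ than for the hypothesis $\neg\mathit{dh}$. Similarly, $w_E(\text{"No"}, \mathit{dh}) = 0$ and $w_E(\text{"No"}, \neg\mathit{dh}) = 1$. Thus, an output of "No" to the query $\mathit{dh}$ indicates that the hypothesis $\neg\mathit{dh}$ must hold.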
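As a sanity check on these numbers, Equation (5.1) can be evaluated directly. The sketch below is illustrative only: representing each $\mu_h$ as a dictionary from observations to exact fractions, and using the label `not_dh` for $\neg\mathit{dh}$, are our own choices; returning `None` mirrors the fact that (5.1) leaves the weight undefined when the denominator is 0.

```python
from fractions import Fraction

def weight_of_evidence(mu, ob, h):
    """w_E(ob, h) as in Equation (5.1).

    mu maps each hypothesis to a dict from observations to probabilities.
    Returns None when sum_h mu_h(ob) = 0, the case (5.1) leaves undefined.
    """
    total = sum(dist.get(ob, 0) for dist in mu.values())
    if total == 0:
        return None
    return Fraction(mu[h].get(ob, 0)) / total

# Double-headed coin example; "not_dh" stands for the hypothesis not-dh.
coin = {
    "dh": {"Yes": Fraction(1), "No": Fraction(0)},
    "not_dh": {"Yes": Fraction(1, 2), "No": Fraction(1, 2)},
}

for ob in ("Yes", "No"):
    print(ob, {h: weight_of_evidence(coin, ob, h) for h in coin})
```

Exact fractions are used so that the weights 2/3, 1/3, 0, and 1 come out exactly rather than as rounded floats.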
This approach, however, is not quite sufficient to deal with the sensor example because, in that example, the probability of an observation does not depend solely on whether the hypothesis is true or false. The probability of the algorithm answering "Yes" to a query $\mathit{wall}(10)$ when $\mathit{wall}(10)$ is true depends on the actual distance $m$ to the wall:
• if $m \le 9$, then $\mu_{\mathit{wall}(10)}(\text{"Yes"}) = 1$ (and thus $\mu_{\mathit{wall}(10)}(\text{"No"}) = 0$);
• if $m = 10$, then $\mu_{\mathit{wall}(10)}(\text{"Yes"}) = 3/4$ (and thus $\mu_{\mathit{wall}(10)}(\text{"No"}) = 1/4$).
Similarly, the probability of the algorithm answering "Yes" to a query $\mathit{wall}(10)$ in a state where $\neg\mathit{wall}(10)$ holds depends on $m$ in the following way:
• if $m = 11$, then $\mu_{\neg\mathit{wall}(10)}(\text{"Yes"}) = 1/4$;
• if $m \ge 12$, then $\mu_{\neg\mathit{wall}(10)}(\text{"Yes"}) = 0$.
It does not seem possible to capture this information using the type of evidence space defined above. In particular, we do not have a single probability measure over the observations given a particular hypothesis. One reasonable way of capturing the information is to associate a set of probability measures on observations with each hypothesis; intuitively, these represent the possible probabilities on the observations, depending on the actual state.
To make this precise, define a generalized evidence space to be a tuple $E = (H, O, F)$, where now $F : H \to 2^{\Delta(O)}$. We require $F(h) \neq \emptyset$ for at least one $h \in H$. What is the most appropriate way to define weight of evidence given sets of probability measures? As a first step, consider the set of all possible weights of evidence that are obtained by taking any combination of probability measures, one from each set $F(h)$ (provided that $F(h) \neq \emptyset$). This gives us a range of possible weights of evidence. We can then define upper and lower weights of evidence, determined by the maximum and minimum values in the range, somewhat analogous to the notions of upper and lower probability [Halpern 2003]. (Given a set $\mathcal{P}$ of probability measures, the lower probability of a set $U$ is $\inf_{\mu \in \mathcal{P}} \mu(U)$; its upper probability is $\sup_{\mu \in \mathcal{P}} \mu(U)$.) Let

$$W_E(ob, h) = \left\{ \frac{\mu_h(ob)}{\sum_{h' \in H} \mu_{h'}(ob)} \;\middle|\; \mu_{h'} \in F(h') \text{ for each } h' \in H \text{ with } F(h') \neq \emptyset,\ \sum_{h' \in H} \mu_{h'}(ob) \neq 0 \right\},$$

where the sums range over the hypotheses $h'$ with $F(h') \neq \emptyset$. Thus, $W_E(ob, h)$ is the set of possible weights of evidence for the hypothesis $h$ given by $ob$. Define the lower weight of evidence function $\underline{w}_E$ by taking $\underline{w}_E(ob, h) = \inf W_E(ob, h)$; similarly, define the upper weight of evidence function $\overline{w}_E$ by taking $\overline{w}_E(ob, h) = \sup W_E(ob, h)$. We show in Proposition 5.1 that, in the special case where $F(h)$ is a singleton for all $h$ (which has been the focus of all previous work in the literature), $W_E(ob, h)$ is a singleton under our assumptions. In particular, the denominator is not 0 in this case. Of course, if …

Lower and upper evidence can be used to model the examples at the beginning of this section. In the sensor example, with $H = \{\mathit{wall}(10), \neg\mathit{wall}(10)\}$, there are two probability measures associated with the hypothesis $\mathit{wall}(10)$, namely,

$$\mu_{\mathit{wall}(10),\le 9}(\text{"Yes"}) = 1, \qquad \mu_{\mathit{wall}(10),=10}(\text{"Yes"}) = 3/4;$$

similarly, there are two probability measures associated with the hypothesis $\neg\mathit{wall}(10)$, namely

$$\mu_{\neg\mathit{wall}(10),=11}(\text{"Yes"}) = 1/4, \qquad \mu_{\neg\mathit{wall}(10),\ge 12}(\text{"Yes"}) = 0.$$
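The lower and upper weights for the sensor example can be computed by brute force over all combinations of measures. This is a minimal sketch under our own encoding: each $F(h)$ is a finite list of dictionaries, hypotheses with an empty $F(h)$ are skipped, and combinations with a zero denominator are discarded. Because every $W_E(ob, h)$ here is finite, the infimum and supremum are simply a minimum and a maximum. The names `weight_range`, `lower_upper`, and `measure` are hypothetical helpers, not notation from the paper.

```python
from fractions import Fraction
from itertools import product

def weight_range(F, ob, h):
    """W_E(ob, h) for a generalized evidence space.

    F maps each hypothesis to a list of measures (dicts: observation -> prob).
    One measure is picked from each nonempty F(h'); combinations whose
    denominator is 0 are discarded.
    """
    hyps = [g for g in F if F[g]]
    weights = set()
    for choice in product(*(F[g] for g in hyps)):
        picked = dict(zip(hyps, choice))
        total = sum(m.get(ob, 0) for m in choice)
        if h in picked and total != 0:
            weights.add(Fraction(picked[h].get(ob, 0)) / total)
    return weights

def lower_upper(F, ob, h):
    """(lower, upper) weight of evidence; W_E is finite here, so inf/sup = min/max."""
    W = weight_range(F, ob, h)
    return min(W), max(W)

def measure(p_yes):
    """Measure on {"Yes", "No"} with the given probability of "Yes"."""
    p = Fraction(p_yes)
    return {"Yes": p, "No": 1 - p}

# Sensor example: one measure per distinct behaviour of the sensor.
sensor_F = {
    "wall10": [measure(1), measure(Fraction(3, 4))],      # m <= 9, m = 10
    "not_wall10": [measure(Fraction(1, 4)), measure(0)],  # m = 11, m >= 12
}

print(lower_upper(sensor_F, "Yes", "wall10"))
print(lower_upper(sensor_F, "No", "wall10"))
```

The two calls give the pairs (3/4, 1) and (0, 1/4): a "Yes" lends $\mathit{wall}(10)$ a weight of at least 3/4, while a "No" lends it at most 1/4.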
The primality example can be dealt with in the same way. Take $H = \{\mathit{prime}, \neg\mathit{prime}\}$. There is a single probability $\mu_{\mathit{prime}}$ associated with the hypothesis $\mathit{prime}$, namely $\mu_{\mathit{prime}}(\text{"Yes"}) = 1$; intuitively, if the number is prime, the knowledge algorithm always returns the right answer. In contrast, there are a number of different probability measures $\mu_{\neg\mathit{prime},n}$ associated with the hypothesis $\neg\mathit{prime}$, one per composite number $n$, where we take $\mu_{\neg\mathit{prime},n}(\text{"Yes"})$ to be the probability that the algorithm says "Yes" when the composite number $n$ is in Alice's local state. Note that this probability is 1 minus the fraction of "witnesses" $a < n$ such that $P(n, a) = 1$. The fraction of witnesses depends on number-theoretic properties of $n$, and thus may be different for different choices of composite numbers $n$. Moreover, Alice is unlikely to know the actual probability $\mu_{\neg\mathit{prime},n}$. As we mentioned above, it has been shown that $\mu_{\neg\mathit{prime},n}(\text{"Yes"}) \le 1/2$ for all composite $n$, but Alice may not know any more than this. Nevertheless, for now, we assume that Alice is an "ideal" agent who knows the set $\{\mu_{\neg\mathit{prime},n} \mid n \text{ is composite}\}$. (Indeed, in the standard Kripke structure framework for knowledge, it is impossible to assume anything else!) We consider how to model the set of probabilities used by a "less-than-ideal" agent in Section 5.2. Let $E$ be the corresponding generalized evidence space. Then

$$W_E(\text{"Yes"}, \mathit{prime}) = \left\{ \frac{1}{1 + \mu_{\neg\mathit{prime},n}(\text{"Yes"})} \;\middle|\; n \text{ composite} \right\}.$$

Since $\mu_{\neg\mathit{prime},n}(\text{"Yes"}) \le 1/2$ for all composite $n$, it follows that $\underline{w}_E(\text{"Yes"}, \mathit{prime}) \ge 2/3$. Similarly, $W_E(\text{"Yes"}, \neg\mathit{prime}) = \{\mu_{\neg\mathit{prime},n}(\text{"Yes"})/(\mu_{\neg\mathit{prime},n}(\text{"Yes"}) + 1) \mid n \text{ composite}\}$.
Since $\mu_{\neg\mathit{prime},n}(\text{"Yes"}) \le 1/2$ for all composite $n$, we have that $\overline{w}_E(\text{"Yes"}, \neg\mathit{prime}) \le 1/3$. Therefore, if the algorithm answers "Yes" to a query $\mathit{prime}$, the evidence supports the hypothesis that the number is indeed prime.
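To see these bounds on concrete numbers, we need a concrete predicate $P$, which this section does not fix. The sketch below assumes, purely for illustration, that $P$ is the Miller-Rabin strong-witness test, that the base $a$ ranges over $1, \dots, n-1$, and that the odd composites below 500 stand in for all composites (even $n$ are omitted because the Miller-Rabin round as written needs odd $n$). For this test it is known that at most a quarter of the bases are non-witnesses, which is consistent with the bound of 1/2 used in the text. The helper names are hypothetical.

```python
from fractions import Fraction

def is_prime(n):
    """Trial division; adequate for the small n used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_strong_liar(n, a):
    """True if base a fails to expose odd n > 2 as composite (one Miller-Rabin round)."""
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x == 1 or x == n - 1:
        return True
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return True
    return False

def mu_not_prime_yes(n):
    """mu_{not prime, n}("Yes"): fraction of bases a in [1, n-1] that are not witnesses."""
    liars = sum(is_strong_liar(n, a) for a in range(1, n))
    return Fraction(liars, n - 1)

# Hypothetical finite slice of the composites: odd n in [9, 500).
mus = [mu_not_prime_yes(n) for n in range(9, 500, 2) if not is_prime(n)]

# mu_prime("Yes") = 1, so the weights have the closed forms given in the text.
W_yes_prime = {1 / (1 + m) for m in mus}
W_yes_not_prime = {m / (m + 1) for m in mus}
lower_prime = min(W_yes_prime)          # lower weight for prime after "Yes"
upper_not_prime = max(W_yes_not_prime)  # upper weight for not-prime after "Yes"
```

Both extremes are attained at the composite with the largest fraction of non-witnesses, so `lower_prime` and `upper_not_prime` always sum to 1.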
Note that, in modeling this example, we have assumed that the number $n$ is not in Alice's local state and that Alice knows the fraction of witnesses $a$ for each composite number $n$. This means that Alice uses the same set of probabilities for all choices of $n$ (since the set of probabilities used depends only on Alice's local state), and that this set is determined by the possible fractions of elements $a < n$ that are witnesses, for each composite number $n$.
Assuming that $n$ is in Alice's local state (which is actually quite a reasonable assumption!) and that Alice does not know the fraction of numbers less than $n$ that are witnesses adds new subtleties; we consider them in Section 5.2.

5.2. Evidence for Randomized Knowledge Algorithms. We are now ready to discuss randomized knowledge algorithms. What does a "Yes" answer to a query $\phi$ given by an "almost sound" knowledge algorithm tell us about $\phi$? As the discussion in Section 5.1 indicates, a "Yes" answer to a query $\phi$ provides evidence for the hypotheses $\phi$ and $\neg\phi$. This can be made precise by associating an evidence space with every state of the model to capture the evidence provided by the knowledge algorithm. To simplify the presentation, we restrict our attention to knowledge algorithms that are $\phi$-complete. (While it is possible to deal with general knowledge algorithms that can also return "?" using these techniques, we already saw that the logic does not let us distinguish between a knowledge algorithm returning "No" and one returning "?"; both result in a lack of algorithmic knowledge. In the next section, where we define the notion of a reliable knowledge algorithm, reliability will be characterized in terms of algorithmic knowledge, and thus the definition will not distinguish between a knowledge algorithm returning "No" or "?". In order to establish a link between the notion of reliability and evidence, it is convenient either to consider $\phi$-complete algorithms or to somehow identify the answers "No" and "?". We choose the former.) Note that the knowledge algorithms described in the examples at the beginning of this section are all complete for their respective hypotheses. We further assume that the truth of $\phi$ depends only on the state, and not on coin tosses; that is, $\phi$ does not contain occurrences of the $X_i$ operator.
(We omit "?" from the set of possible observations if the knowledge algorithm is $\phi$-complete, as is the case in the three examples given at the beginning of this section.) Since the agent does not know which state $s \in S_\ell$ is the true state, he must consider all the probabilities in $F_{\ell,\phi}(\phi)$ and $F_{\ell,\phi}(\neg\phi)$ in his evidence space.
We can now make precise the claim at which we have been hinting throughout the paper. Under our assumptions, for all evidence spaces of the form $E_{A_i,\phi,\ell}$ that arise in this construction, and all observations $ob$ that can be made in local state $\ell$, there must be some expression in $W_{E_{A_i,\phi,\ell}}(ob, h)$ with a nonzero denominator. Intuitively, this is because if $ob$ is observed at some state $s$ such that $L_i(s) = \ell$, our assumptions ensure that $\mu_{s,\phi}(ob) > 0$. In other words, observing $ob$ means that the probability of observing $ob$ must be greater than 0.

Proposition 5.1. For all probabilistic algorithmic knowledge structures $N$, agents $i$, formulas $\phi$, and local states $\ell$ of agent $i$ that arise in $N$, if $ob$ is a possible output of $i$'s knowledge algorithm $A^d_i$ in local state $\ell$ on input $\phi$, then there exists a probability measure $\mu \in F_{\ell,\phi}(\phi) \cup F_{\ell,\phi}(\neg\phi)$ such that $\mu(ob) > 0$.
In particular, it follows from Proposition 5.1 that, under our assumptions, the evidence function is always defined in the special case where $F_{\ell,\phi}(h)$ is a singleton for all hypotheses $h$.
To be able to talk about evidence within the logic, we introduce operators to capture the lower and upper evidence provided by the knowledge algorithm of agent $i$, $\underline{Ev}_i(\phi)$ and $\overline{Ev}_i(\phi)$, read "$i$'s lower (resp., upper) weight of evidence for $\phi$", with semantics defined as follows, where $\ell = L_i(s)$, $ob = A^d_i(\phi, \ell, s, v_i)$, and $\bowtie$ is one of $\le$, $=$, $\ge$:

$$(N, s, v) \models \underline{Ev}_i(\phi) \bowtie \alpha \ \text{ iff } \ \underline{w}_{E_{A_i,\phi,\ell}}(ob, \phi) \bowtie \alpha,$$
$$(N, s, v) \models \overline{Ev}_i(\phi) \bowtie \alpha \ \text{ iff } \ \overline{w}_{E_{A_i,\phi,\ell}}(ob, \phi) \bowtie \alpha.$$

By Proposition 5.1, these formulas are all well defined. This definition of evidence has a number of interesting properties. For instance, obtaining full evidence in support of a formula $\phi$ essentially corresponds to establishing the truth of $\phi$.

Proposition 5.2. For all probabilistic algorithmic knowledge structures $N$, we have …

Suppose that we now apply the recipe above to derive the evidence spaces for the three examples at the beginning of this section. For the double-headed coin example, consider a structure $N$ with two states $s_1$ and $s_2$, where the coin is double-headed at state $s_1$ and fair at state $s_2$, so that $(N, s_1, v) \models \mathit{dh}$ and $(N, s_2, v) \models \neg\mathit{dh}$. Since Bob does not know whether the coin is fair or double-headed, it seems reasonable to assume that Bob has the same local state $\ell_0$ at both states. Thus, $S_{\ell_0} = \{s_1, s_2\}$, $S_{\ell_0,\mathit{dh}} = \{s_1\}$, and $S_{\ell_0,\neg\mathit{dh}} = \{s_2\}$. Since we are interested only in the query $\mathit{dh}$ and there is only one local state, we can consider the single evidence space $E = (\{\mathit{dh}, \neg\mathit{dh}\}, \{\text{"Yes"}, \text{"No"}\}, F_{\mathit{dh}})$, where $F_{\mathit{dh}}(\mathit{dh}) = \{\mu_{\mathit{dh}}\}$ and $F_{\mathit{dh}}(\neg\mathit{dh}) = \{\mu_{\neg\mathit{dh}}\}$, with $\mu_{\mathit{dh}}$ and $\mu_{\neg\mathit{dh}}$ as in Section 5.1. We can check that, for all states $(s, v)$ where $A_{\mathit{Bob}}(\mathit{dh}, \ell_0, s, v_{\mathit{Bob}}) = \text{"Yes"}$, $(N, s, v) \models \underline{Ev}(\mathit{dh}) = 2/3$ and $(N, s, v) \models \overline{Ev}(\mathit{dh}) = 2/3$, while at all states $(s, v)$ where $A_{\mathit{Bob}}(\mathit{dh}, \ell_0, s, v_{\mathit{Bob}}) = \text{"No"}$, $(N, s, v) \models \underline{Ev}(\mathit{dh}) = 0$ and $(N, s, v) \models \overline{Ev}(\mathit{dh}) = 0$. In other words, the algorithm answering "Yes" provides evidence for the coin being double-headed, while the algorithm answering "No" essentially says that the coin is fair.

For the probabilistic sensor example, consider a structure $N$ with states $s_m$ ($m \ge 1$), where the wall at state $s_m$ is at distance $m$ from the robot. Suppose that we are interested in the hypotheses
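The recipe for the coin structure can be mimicked mechanically: compute, for each state, the distribution of the algorithm's answer over the derandomizer, collect these distributions by hypothesis, and evaluate lower and upper weights at every state and derandomizer. The sketch rests on labeled assumptions: Bob's algorithm is modeled hypothetically as "toss the coin once and answer Yes iff it lands heads" (which reproduces $\mu_{\mathit{dh}}$ and $\mu_{\neg\mathit{dh}}$ but is not the paper's own description), the derandomizer supplies that single toss uniformly, and $F(h)$ collects the output distributions of the states in $S_{\ell_0,h}$.

```python
from fractions import Fraction
from itertools import product

def bob_answer(double_headed, toss):
    """Hypothetical model of A_Bob on query dh: one toss, "Yes" iff heads.

    toss is the single bit supplied by the derandomizer (1 = heads).
    """
    return "Yes" if double_headed or toss == 1 else "No"

def output_measure(double_headed):
    """mu_s: distribution of Bob's answer at a state, with a uniform toss."""
    answers = [bob_answer(double_headed, t) for t in (0, 1)]
    return {ob: Fraction(answers.count(ob), len(answers)) for ob in ("Yes", "No")}

# Both states share Bob's local state l0; s1 satisfies dh, s2 satisfies not-dh.
states = {"s1": True, "s2": False}
F_dh = {
    "dh": [output_measure(dh) for dh in states.values() if dh],
    "not_dh": [output_measure(dh) for dh in states.values() if not dh],
}

def lower_upper(F, ob, h):
    """(inf, sup) of W_E(ob, h): one measure per nonempty F(h'), nonzero denominator."""
    hyps = [g for g in F if F[g]]
    weights = []
    for choice in product(*(F[g] for g in hyps)):
        picked = dict(zip(hyps, choice))
        total = sum(m[ob] for m in choice)
        if h in picked and total != 0:
            weights.append(picked[h][ob] / total)
    return min(weights), max(weights)

# Evidence for dh at every (state, derandomizer) pair.
evidence = {
    (s, t): lower_upper(F_dh, bob_answer(dh, t), "dh")
    for s, dh in states.items() for t in (0, 1)
}
```

Every pair where Bob answers "Yes" gets lower and upper evidence 2/3 for $\mathit{dh}$, and the one pair where he answers "No" gets 0, matching the values stated above.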
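$\mathit{wall}(10)$ and $\neg\mathit{wall}(10)$, so that $(N, s_m, v) \models \mathit{wall}(10)$ if and only if $m \le 10$. The local state of the robot is the same at every state, say $\ell_0$. Thus, $S_{\ell_0} = \{s_m \mid m \ge 1\}$, $S_{\ell_0,\mathit{wall}(10)} = \{s_m \mid 1 \le m \le 10\}$, and $S_{\ell_0,\neg\mathit{wall}(10)} = \{s_m \mid m \ge 11\}$. Again, since there is only one local state and we are interested in only one query ($\mathit{wall}(10)$), we can consider the single evidence space $E = (\{\mathit{wall}(10), \neg\mathit{wall}(10)\}, \{\text{"Yes"}, \text{"No"}\}, F_{\mathit{wall}(10)})$, where $F_{\mathit{wall}(10)}(\mathit{wall}(10)) = \{\mu_{\mathit{wall}(10),\le 9}, \mu_{\mathit{wall}(10),=10}\}$ and $F_{\mathit{wall}(10)}(\neg\mathit{wall}(10)) = \{\mu_{\neg\mathit{wall}(10),=11}, \mu_{\neg\mathit{wall}(10),\ge 12}\}$, the measures given in Section 5.1. It is straightforward to compute that, for all states $(s, v)$ where $A_{\mathit{Robot}}(\mathit{wall}(10), \ell_0, s, v_{\mathit{Robot}}) = \text{"Yes"}$, $(N, s, v) \models \underline{Ev}(\mathit{wall}(10)) \ge 3/4$ and $(N, s, v) \models \overline{Ev}(\mathit{wall}(10)) \le 1$, while at all states $(s, v)$ where $A_{\mathit{Robot}}(\mathit{wall}(10), \ell_0, s, v_{\mathit{Robot}}) = \text{"No"}$, $(N, s, v) \models \overline{Ev}(\mathit{wall}(10)) \le 1/4$ and $(N, s, v) \models \underline{Ev}(\mathit{wall}(10)) \ge 0$. In other words, the algorithm answering "Yes" provides evidence for the wall being at distance at most 10, while the algorithm answering "No" provides evidence for the wall being further away.

Finally, we consider the primality example. Earlier we discussed this example under the assumption that the number $n$ was not part of Alice's local state. Under this assumption, it seems reasonable to assume that there is only one local state, call it $\ell$, and that we can identify the global state with the number $n$. Thus, $S_{\ell,\mathit{prime}} = \{n \mid n \text{ is prime}\}$ and $S_{\ell,\neg\mathit{prime}} = \{n \mid n \text{ is not prime}\}$. Define $F_{\mathit{prime}}(\mathit{prime}) = \{\mu_{\mathit{prime}}\}$, where $\mu_{\mathit{prime}}(\text{"Yes"}) = 1$, while $F_{\mathit{prime}}(\neg\mathit{prime}) = \{\mu_n \mid n \text{ is not prime}\}$, where $\mu_n(\text{"Yes"})$ is the fraction of numbers $a < n$ such that $P(n, a) = 0$.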
What should we do if Alice knows the input (so that $n$ is part of the local state)? In that case, it seems that the obvious thing to do is again to have one state, denoted $n$, for every number $n$; but since $n$ is now part of the local state, we can take $S_n = \{n\}$. But modeling things this way also points out a problem. With this state space, since the agent considers only one state possible in each local state, it is easy to check that $(N, s, v) \models \underline{Ev}(\mathit{prime}) = 1$ if $s \in S_n$ with $n$ prime, and $(N, s, v) \models \underline{Ev}(\neg\mathit{prime}) = 1$ if $s \in S_n$ with $n$ not prime. The knowledge algorithm is not needed here: since the basic framework implicitly assumes that agents are logically omniscient, Alice knows whether or not $n$ is prime.
To deal with this, we need to model agents that are not logically omniscient. Intuitively, we would like to model Alice's subjective view of the number. If she does not know whether the number $n$ is prime, she must consider possible a world where $n$ is prime and a world where $n$ is not prime. Of course, if $n$ is in fact prime, then the world where $n$ is not prime is what Hintikka [1975] has called an impossible possible world, one where the usual laws of arithmetic do not hold. Similarly, since Alice does not know how likely the knowledge algorithm is to return "Yes" if $n$ is composite (i.e., how many witnesses $a$ there are such that $P(n, a) = 1$), we should allow her to consider possible the impossible worlds where the number of witnesses is $k$, for each $k > n/2$. (We restrict to $k > n/2$ to model the fact that Alice does know that there are at least $n/2$ witnesses if $n$ is composite.) Thus, consider the structure $N$ with states $s_{n,\mathit{prime}}$ and $s_{n,\neg\mathit{prime},k}$ (for $n \ge 2$ and $n/2 < k \le n$). Intuitively, $s_{n,\neg\mathit{prime},k}$ is the state where there are $k$ witnesses. (Clearly, if there is more information about the number of witnesses, then the set of states should be modified appropriately.) At states $s_{n,\mathit{prime}}$ and $s_{n,\neg\mathit{prime},k}$, Alice has the same local state, which we call $\ell_n$ (since we assume that $n$ is stored in her local state); however, $(N, s_{n,\mathit{prime}}, v) \models \mathit{prime}$, while $(N, s_{n,\neg\mathit{prime},k}, v) \models \neg\mathit{prime}$. For a local state $\ell_n$, define $S_{\ell_n,\mathit{prime}} = \{s_{n,\mathit{prime}}\}$ and $S_{\ell_n,\neg\mathit{prime}} = \{s_{n,\neg\mathit{prime},k} \mid n/2 < k \le n\}$, and let $S_{\ell_n} = S_{\ell_n,\mathit{prime}} \cup S_{\ell_n,\neg\mathit{prime}}$. In this model, the evidence space at local state $\ell_n$ is therefore $E_n = (\{\mathit{prime}, \neg\mathit{prime}\}, \{\text{"Yes"}, \text{"No"}\}, F_{\ell_n,\mathit{prime}})$, where $F_{\ell_n,\mathit{prime}}(\mathit{prime}) = \{\mu_{s_{n,\mathit{prime}}}\}$ and $F_{\ell_n,\mathit{prime}}(\neg\mathit{prime}) = \{\mu_{s_{n,\neg\mathit{prime},k}} \mid n/2 < k \le n\}$. Using impossible possible worlds in this way gives us just the answers we expect. We can check that, for all states $(s, v)$ where $A_{\mathit{Alice}}(\mathit{prime}, s_{\mathit{Alice}}, s, v) = \text{"Yes"}$, $(N, s, v) \models \underline{Ev}(\mathit{prime}) \ge 2/3$, while at all states $(s, v)$
where $A_{\mathit{Alice}}(\mathit{prime}, s_{\mathit{Alice}}, s, v) = \text{"No"}$, $(N, s, v) \models \overline{Ev}(\mathit{prime}) = 0$. In other words, the algorithm returning "Yes" to the query whether the number in Alice's local state is prime provides evidence for the number being prime, while the algorithm returning "No" essentially says that the number is composite.

5.3. Reliable Randomized Knowledge Algorithms. As we saw in the previous section, a "Yes" answer to a query $\phi$ given by an "almost sound" knowledge algorithm provides evidence for $\phi$. We now examine the extent to which we can characterize the evidence provided by a randomized knowledge algorithm. To make this precise, we first need to characterize how reliable the knowledge algorithm is. (In this section, for simplicity, we assume that we are dealing with complete algorithms, which always answer either "Yes" or "No". Intuitively, this is because reliability, as we will soon see, talks about the probability of a knowledge algorithm answering "Yes" or anything but "Yes". Completeness ensures that there is a single observation that can be interpreted as not-"Yes"; this lets us relate reliability to our notion of evidence in Propositions 5.3 and 5.5. Allowing knowledge algorithms to return both "No" and "?" would require us to talk about the evidence provided by the disjunction "No"-or-"?" of the observations, a topic beyond the scope of this paper.)

A randomized knowledge algorithm $A_i$ is $(\alpha, \beta)$-reliable for $\phi$ in $N$ (for agent $i$) if $\alpha, \beta \in [0, 1]$ and for all states $s$ and derandomizers $v$,
• $(N, s, v) \models \phi$ implies $\mu_s(\text{"Yes"}) \ge \alpha$;
• $(N, s, v) \models \neg\phi$ implies $\mu_s(\text{"Yes"}) \le \beta$.
These conditions are equivalent to saying … In other words, if $\phi$ is true at state $s$, then an $(\alpha, \beta)$-reliable algorithm says "Yes" to $\phi$ at $s$ with probability at least $\alpha$ (and hence is right when it answers "Yes" to query $\phi$ with probability at least $\alpha$); on the other hand, if $\phi$ is false, it says "Yes" with probability at most $\beta$ (and hence is wrong when it answers "Yes" to query $\phi$ with probability at most $\beta$). The primality testing knowledge algorithm is $(1, 1/2)$-reliable for $\mathit{prime}$.
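The two reliability conditions are easy to check mechanically for the sensor example, and doing so also suggests where a bound of the form $\alpha/(\alpha+\beta)$ on the evidence comes from: the weight $x/(x+y)$ that a "Yes" lends $\phi$ grows with $x = \mu_{\phi}(\text{"Yes"}) \ge \alpha$ and shrinks with $y = \mu_{\neg\phi}(\text{"Yes"}) \le \beta$. This is a numerical illustration, not a proof; truncating the distances to $1, \dots, 20$ is our assumption (every $m \ge 12$ behaves alike), and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def is_reliable(states, alpha, beta):
    """(alpha, beta)-reliability for phi.

    states: (phi_holds, p_yes) pairs, p_yes being the probability that the
    algorithm answers "Yes" to phi at that state.
    """
    return all(p >= alpha if holds else p <= beta for holds, p in states)

def p_yes_sensor(m):
    """Probability that the sensor algorithm says "Yes" to wall(10) at distance m."""
    if m <= 9:
        return Fraction(1)
    if m == 10:
        return Fraction(3, 4)
    if m == 11:
        return Fraction(1, 4)
    return Fraction(0)

# Distances 1..20 stand in for all m >= 1 (every m >= 12 behaves alike).
sensor = [(m <= 10, p_yes_sensor(m)) for m in range(1, 21)]
alpha, beta = Fraction(3, 4), Fraction(1, 4)

# Smallest weight that "Yes" can lend phi over all pairs of a phi-state and a
# not-phi-state; x / (x + y) grows in x and shrinks in y.
yes_phi = [p for holds, p in sensor if holds]
yes_not_phi = [p for holds, p in sensor if not holds]
min_weight = min(x / (x + y) for x, y in product(yes_phi, yes_not_phi) if x + y)
bound = alpha / (alpha + beta)  # requires (alpha, beta) != (0, 0)
```

For the sensor, the smallest attainable weight coincides with $\alpha/(\alpha+\beta) = 3/4$, which is also the lower evidence computed for a "Yes" in Section 5.2.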
The intuition here is that $(\alpha, \beta)$-reliability is a way to bound the probability that the knowledge algorithm is wrong. The knowledge algorithm can be wrong in two ways: it can answer "No" or "?" to a query $\phi$ when $\phi$ is true, and it can answer "Yes" to a query $\phi$ when $\phi$ is not true. If a knowledge algorithm is $(\alpha, \beta)$-reliable, then the probability that it answers "No" or "?" when the answer should be "Yes" is at most $1 - \alpha$, and the probability that it answers "Yes" when it should not is at most $\beta$.
We can now capture the relationship between reliable knowledge algorithms and evidence. The relationship depends in part on what the agent considers possible.

Proposition 5.3.
Notice that Proposition 5.3 talks about the evidence that the knowledge algorithm provides for $\phi$. Intuitively, we might expect some kind of relationship between the evidence for $\phi$ and the evidence for $\neg\phi$. A plausible relationship would be that high evidence for $\phi$ implies low evidence for $\neg\phi$, and low evidence for $\phi$ implies high evidence for $\neg\phi$. Unfortunately, given the definitions in this section, this is not the case: evidence for $\phi$ is completely unrelated to evidence for $\neg\phi$. Roughly speaking, this is because evidence for $\phi$ is measured by looking at the results of the knowledge algorithm when queried for $\phi$, and evidence for $\neg\phi$ is measured by looking at the results of the knowledge algorithm when queried for $\neg\phi$. There is nothing in the definition of a knowledge algorithm that says that the answers of the knowledge algorithm to queries $\phi$ and $\neg\phi$ need to be related in any way.
A relationship between evidence for $\phi$ and evidence for $\neg\phi$ can be established by considering knowledge algorithms that are "well-behaved" with respect to negation. There is a natural way to define the behavior of a knowledge algorithm on negated formulas. Intuitively, a strategy to evaluate $A_i(\neg\phi, \ell, s, v_i)$ is to evaluate $A_i(\phi, \ell, s, v_i)$ and return the negation of the result. There is a choice to be made when $A_i$ returns "?" to the query for $\phi$. One possibility is to return "?" to the query for $\neg\phi$ when the query for $\phi$ returns "?"; another possibility is to return "Yes" if the query for $\phi$ returns "?". A randomized knowledge algorithm $A$ weakly respects negation if, for all local states $\ell$ and derandomizers $v$, … Similarly, a randomized knowledge algorithm $A$ strongly respects negation if, for all local states $\ell$ and derandomizers $v$, … Note that if $A_i$ is $\phi$-complete, then the output of $A_i$ on input $\neg\phi$ is the same whether $A_i$ weakly or strongly respects negation. Say $A_i$ respects negation if it weakly or strongly respects negation. Note that if $A_i$ is $\phi$-complete and respects negation, then $A_i$ is $\neg\phi$-complete.
Our first result shows that for knowledge algorithms that respect negation, reliability for $\phi$ is related to reliability for $\neg\phi$.

Proposition 5.4. If $A_i$ respects negation, is $\phi$-complete, and is $(\alpha, \beta)$-reliable for $\phi$ in $N$, then $A_i$ is $(1 - \beta, 1 - \alpha)$-reliable for $\neg\phi$ in $N$.

It is easy to check that if $A_i$ is $\phi$-complete and respects negation, then $X_i\phi \Leftrightarrow \neg X_i\neg\phi$ is a valid formula. Combined with Proposition 5.3, this yields the following results.

Proposition 5.5. If $A_i$ respects negation, is $\phi$-complete, and is …
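A short derivation shows why the reliability parameters swap under negation, in line with the use made in the appendix of the fact that an $(\alpha, \beta)$-reliable algorithm for $\phi$ is $(1 - \beta, 1 - \alpha)$-reliable for $\neg\phi$. It assumes that $A_i$ is $\phi$-complete and respects negation, so that at every state and derandomizer $A_i$ answers "Yes" to $\neg\phi$ exactly when it answers "No" to $\phi$; here $\mu_{s,\psi}$ denotes the distribution of $A_i$'s answer to query $\psi$ at state $s$:

```latex
\begin{align*}
\mu_{s,\neg\phi}(\text{``Yes''}) &= \mu_{s,\phi}(\text{``No''})
  = 1 - \mu_{s,\phi}(\text{``Yes''})
  && \text{(negation, $\phi$-completeness)}\\
(N,s,v) \models \neg\phi &\;\Rightarrow\; \mu_{s,\phi}(\text{``Yes''}) \le \beta
  \;\Rightarrow\; \mu_{s,\neg\phi}(\text{``Yes''}) \ge 1 - \beta,\\
(N,s,v) \models \phi &\;\Rightarrow\; \mu_{s,\phi}(\text{``Yes''}) \ge \alpha
  \;\Rightarrow\; \mu_{s,\neg\phi}(\text{``Yes''}) \le 1 - \alpha.
\end{align*}
```

The second line is the "true" condition of reliability for $\neg\phi$ with parameter $1 - \beta$, and the third is the "false" condition (where $\neg\phi$ fails) with parameter $1 - \alpha$.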

Conclusion
The goal of this paper is to understand what the evidence provided by a knowledge algorithm tells us. To take an example from security, consider an enforcement mechanism used to detect and react to intrusions in a system. Such an enforcement mechanism uses algorithms that analyze the behavior of users and attempt to recognize intruders. While the algorithms may sometimes be wrong, they are typically reliable, in our sense, with some associated probabilities. Clearly, the mechanism should make sensible decisions based on this information. How should it do this? What actions should the system take based on a report that a user is an intruder?
If we have a probability on the hypotheses, evidence can be used to update this probability. More precisely, as shown in [Halpern and Fagin 1992], evidence can be viewed as a function from priors to posteriors. For example, if the (cumulative) evidence for $n$ being prime is $\alpha$ and the prior probability that $n$ is prime is $\beta$, then a straightforward application of Bayes' rule tells us that the posterior probability of $n$ being prime (that is, the probability of $n$ being prime in light of the evidence) is $\alpha\beta/(\alpha\beta + (1 - \alpha)(1 - \beta))$. Therefore, if we have a prior probability on the hypotheses, including the formula $\phi$, then we can decide to perform an action when the posterior probability of $\phi$ is high enough. (A similar interpretation holds for the evidence expressed by $\underline{w}_E$ and $\overline{w}_E$; we hope to report on this topic in future work.) However, what can we do when there is no probability distribution on the hypotheses, as in the primality example? The probabilistic interpretation of evidence still gives us a guide for decisions. As before, we assume that if the posterior probability of $\phi$ is high enough, we will act as if $\phi$ holds. The problem, of course, is that we do not have a prior probability. However, the evidence tells us what prior probabilities we must be willing to assume for the posterior probability to be high enough.
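The posterior formula can be recovered from Bayes' rule in the case of a single measure per hypothesis, which is an assumption of this sketch. Suppose there is a prior $\beta$ on $\phi$, read $\mu_\phi(ob)$ and $\mu_{\neg\phi}(ob)$ as the probabilities of $ob$ under each hypothesis, and write $\alpha = w_E(ob, \phi) = \mu_\phi(ob)/(\mu_\phi(ob) + \mu_{\neg\phi}(ob))$, so that $1 - \alpha = \mu_{\neg\phi}(ob)/(\mu_\phi(ob) + \mu_{\neg\phi}(ob))$. Dividing numerator and denominator by $\mu_\phi(ob) + \mu_{\neg\phi}(ob)$ gives:

```latex
\[
\Pr(\phi \mid ob)
  = \frac{\mu_\phi(ob)\,\beta}{\mu_\phi(ob)\,\beta + \mu_{\neg\phi}(ob)\,(1-\beta)}
  = \frac{\alpha\beta}{\alpha\beta + (1-\alpha)(1-\beta)}.
\]
```

For a fixed prior $\beta \in (0, 1)$ the right-hand side is increasing in $\alpha$, so a lower bound on the evidence yields a lower bound on the posterior.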
For example, a "Yes" from a $(.999, .001)$-reliable algorithm for $\phi$ says that as long as the prior probability of $\phi$ is at least .01, the posterior is at least .9. This may be sufficient assurance for an agent to act.
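The figures in this example can be checked numerically. The sketch takes the evidence for $\phi$ after a "Yes" to be the lower bound $\alpha/(\alpha+\beta)$ (the bound that the appendix proof attributes to Proposition 5.3, under its hypotheses), plugs it into the posterior formula $\alpha\beta/(\alpha\beta + (1-\alpha)(1-\beta))$, and also solves for the smallest prior that reaches a target posterior. The function names are hypothetical.

```python
def posterior(evidence, prior):
    """Posterior probability of phi: e*p / (e*p + (1 - e)*(1 - p))."""
    num = evidence * prior
    return num / (num + (1 - evidence) * (1 - prior))

def min_prior(evidence, target):
    """Smallest prior whose posterior reaches target (for 0 < evidence, target < 1)."""
    # Solve e*p / (e*p + (1 - e)*(1 - p)) = t for p.
    a = target * (1 - evidence)
    return a / (a + evidence * (1 - target))

alpha, beta = 0.999, 0.001          # reliability parameters from the text
evidence = alpha / (alpha + beta)   # lower bound on the evidence for phi after "Yes"
print(posterior(evidence, 0.01))    # about 0.91
print(min_prior(evidence, 0.9))     # about 0.0089
```

Because the posterior increases with the evidence, using the lower bound gives a conservative answer: any prior of at least about .0089 already yields a posterior of .9, so a prior of .01 suffices.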
Of course, it is also possible to treat evidence as primitive, and simply decide to act as if the hypothesis with the most evidence is true, or as if a hypothesis whose evidence is above a certain threshold is true. It would in fact be of independent interest to study the properties of a theory of decisions based on a primitive notion of evidence. We leave this to future work.

Proposition 4.2. Suppose that $N = (S, \pi, L_1, \ldots, L_n, A^d_1, \ldots, A^d_n, \nu)$ is a probabilistic algorithmic knowledge security structure with an adversary as agent $i$ and that $A_i = A_i^{DY+rg(r)}$. Let $K$ be the number of distinct keys used in the messages in the adversary's local state $\ell$ (i.e., the number of keys used in the messages that the adversary has intercepted at a state $s$ with $L_i(s) = \ell$). Suppose that $K/|\mathcal{K}| < 1/2$ and that $\nu$ is the uniform distribution on sequences of coin tosses.
Proof. It is not hard to show that the $r$ keys that the adversary guesses do no good at all if none of them matches a key used in a message intercepted by the adversary. By assumption, $K$ keys are used in messages intercepted by the adversary. The probability that a key chosen at random is one of these $K$ is $K/|\mathcal{K}|$, since there are $|\mathcal{K}|$ keys altogether. Thus, the probability that a key chosen at random is not one of these $K$ is $1 - K/|\mathcal{K}|$. The probability that none of the $r$ keys chosen at random is one of these $K$ is therefore $(1 - K/|\mathcal{K}|)^r$. We now use some standard approximations. Note that $(1 - K/|\mathcal{K}|)^r = e^{r \ln(1 - K/|\mathcal{K}|)}$, and, for $0 \le x < 1$,

$$\ln(1 - x) = -\sum_{k \ge 1} \frac{x^k}{k} \;\ge\; -\sum_{k \ge 1} x^k = -\frac{x}{1 - x}.$$

Thus, if $0 < x < 1/2$, then $\ln(1 - x) > -2x$, since $1 - x > 1/2$ gives $x/(1 - x) < 2x$. It follows that if $K/|\mathcal{K}| < 1/2$, then $e^{r \ln(1 - K/|\mathcal{K}|)} > e^{-2rK/|\mathcal{K}|}$. Since the probability that the $r$ keys chosen at random do not help to compute algorithmic knowledge is greater than $e^{-2rK/|\mathcal{K}|}$, the probability that they help is less than $1 - e^{-2rK/|\mathcal{K}|}$. Soundness of $A_i$ with respect to $\mathit{has}_i(m)$ follows from Proposition 4.1 (since soundness follows for arbitrary $\mathit{initkeys}(\ell) \subseteq \mathcal{K}$).
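The chain of inequalities in this proof can be spot-checked numerically. The sketch assumes, as the factor $(1 - K/|\mathcal{K}|)^r$ implicitly does, that the $r$ guesses are independent and uniform over the key space, and writes $p$ for $K/|\mathcal{K}|$. The grid of values is arbitrary, the function names are hypothetical, and a finite check is of course no substitute for the argument above.

```python
import math

def miss_probability(p, r):
    """Chance that none of r independent uniform key guesses is among the
    intercepted keys, where p = K / |keys| (the (1 - K/|K|)^r of the proof)."""
    return (1 - p) ** r

def miss_lower_bound(p, r):
    """e^{-2 r p}, claimed to be a strict lower bound when 0 < p < 1/2."""
    return math.exp(-2 * r * p)

grid = [(p, r) for p in (0.001, 0.01, 0.1, 0.25, 0.49) for r in (1, 5, 50)]
for p, r in grid:
    # ln(1 - x) >= -x / (1 - x) > -2x on 0 < x < 1/2.
    assert math.log(1 - p) >= -p / (1 - p) > -2 * p
    assert miss_probability(p, r) > miss_lower_bound(p, r)
    # Hence the chance that some guess helps stays below 1 - e^{-2 r p}.
    assert 1 - miss_probability(p, r) < 1 - miss_lower_bound(p, r)
```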
Proposition 5.1. For all probabilistic algorithmic knowledge structures $N$, agents $i$, formulas $\phi$, and local states $\ell$ of agent $i$ that arise in $N$, if $ob$ is a possible output of $i$'s knowledge algorithm $A^d_i$ in local state $\ell$ on input $\phi$, then there exists a probability measure $\mu \in F_{\ell,\phi}(\phi) \cup F_{\ell,\phi}(\neg\phi)$ such that $\mu(ob) > 0$.
Proof. Suppose that $\ell$ is a local state of agent $i$ that arises in $N$ and $ob$ is a possible output of $A^d_i$ in local state $\ell$ on input $\phi$. Thus, there exist a state $s$ and a derandomizer $v$ such that $L_i(s) = \ell$ and $A^d_i(\phi, \ell, s, v_i) = ob$. By assumption, $\mu_s(ob) > 0$.

Proposition 5.2. For all probabilistic algorithmic knowledge structures $N$, we have …

Proof. … By our assumption about derandomizers, this means that there is no state $s'$ and derandomizer … . Thus, we cannot have $(N, s, v) \models \neg\phi$. Hence, $(N, s, v) \models \phi$, as required.
The following lemma gives an algebraic relationship that is useful in the proofs of Propositions 5.3 and 5.5.
The proof of part (d) is similar to that of (b), and is left to the reader.
To prove Proposition 5.5, we need a preliminary lemma, alluded to in the text.
Lemma A.2. If $N$ is a probabilistic algorithmic knowledge structure where agent $i$ uses a knowledge algorithm $A_i$ that is $\phi$-complete and that respects negation, then $N \models X_i\phi \Leftrightarrow \neg X_i\neg\phi$.
Proof. Let $s$ be a state of $N$ and let $v$ be a derandomizer. If $(N, s, v) \models X_i\phi$, then $A^d_i(\phi, L_i(s), s, v_i) = \text{"Yes"}$. Since $A_i$ respects negation and is $\phi$-complete, this implies that $A^d_i(\neg\phi, L_i(s), s, v_i) = \text{"No"}$ ($A^d_i$ cannot return "?" since it is $\phi$-complete) and hence that $(N, s, v) \not\models X_i\neg\phi$, so $(N, s, v) \models \neg X_i\neg\phi$. Thus, $(N, s, v) \models X_i\phi \Rightarrow \neg X_i\neg\phi$. Since $s$ and $v$ were arbitrary, we have that $N \models X_i\phi \Rightarrow \neg X_i\neg\phi$. Conversely, let $s$ be a state of $N$ and let $v$ be a derandomizer. If $(N, s, v) \models \neg X_i\neg\phi$, then $(N, s, v) \not\models X_i\neg\phi$, that is, $A^d_i(\neg\phi, L_i(s), s, v_i) \neq \text{"Yes"}$. Since $A_i$ is $\phi$-complete and respects negation, $A_i$ is $\neg\phi$-complete, so it must be the case that $A^d_i(\neg\phi, L_i(s), s, v_i) = \text{"No"}$. Therefore, $A^d_i(\phi, L_i(s), s, v_i) = \text{"Yes"}$, and $(N, s, v) \models X_i\phi$. Since $s$ and $v$ were arbitrary, we have that $N \models \neg X_i\neg\phi \Rightarrow X_i\phi$.

… $\wedge\ Ev_i(\phi) \le \frac{1}{2}$ if $(\alpha, \beta) = (1, 1)$.

Proof. Suppose that $A_i$ is $(\alpha, \beta)$-reliable for $\phi$ in $N$. Since $A_i$ is $\phi$-complete and respects negation, by Proposition 5.4, $A_i$ is $(1 - \beta, 1 - \alpha)$-reliable for $\neg\phi$ in $N$. For part (a), suppose that $(\alpha, \beta) \neq (0, 0)$. Let $s$ be a state of $N$ and let $v$ be a derandomizer. By Proposition 5.3 applied to $\phi$,

$$(N, s, v) \models X_i\phi \wedge \neg K_i\neg\phi \Rightarrow \underline{Ev}_i(\phi) \ge \frac{\alpha}{\alpha + \beta}.$$

Since $s$ and $v$ are arbitrary,
Putting this together, we get … Since $s$ and $v$ are arbitrary, … We leave the proof of parts (b) and (d) to the reader.