Bayesian networks and decision graphs are formal graphical languages for representation and communication of decision scenarios requiring reasoning under uncertainty. Their strengths are two-sided. It is easy for humans to construct and to understand them, and when communicated to a computer, they can easily be compiled. Furthermore, handy algorithms are developed for analyses of the models and for providing responses to a wide range of requests such as belief updating, determining optimal strategies, conflict analyses of evidence, and most probable explanation.
The book emphasizes both the human and the computer side. Upcoming SlideShare. Like this presentation? Why not share! Embed Size px. Start on. Show related SlideShares at end. WordPress Shortcode. Published in: Education. Full Name Comment goes here. Are you sure you want to Yes No. Be the first to like this. Evidence may be transmitted through a diverging connection unless it is instantiated.
E Fig. Diverging connection.
- Clusters, Orders, and Trees: Methods and Applications: In Honor of Boris Mirkins 70th Birthday.
- EconPapers: Bayesian Networks and Decision Graphs, 2nd Edition by Finn V. Jensen, Thomas D. Nielsen?
- Software for Bayesian Networks and Decision Graphs.
- Investing in a Sustainable World: Why GREEN Is the New Color of Money on Wall Street.
- MySQL High Availability: Tools for Building Robust Data Centers (2nd Edition).
If A is instantiated, it blocks communication between its children. Sex has an impact on length of hair as well as stature. On the other hand, if we know that the person is a man, then the length of his hair gives us no extra clue on his stature. Converging Connections A description of the situation in Figure 2. If nothing is known about A except what may be inferred from knowledge of its parents B,. Knowledge of one possible cause of an event does not tell us anything about the other possible causes. However, if anything is known about the consequences, then information on one possible cause may tell us something about the other causes.
If we now get the information that there is fuel in the tank, then our certainty in the spark plugs being dirty will increase since this will explain why the car cannot start. Conversely, if we get the information that there is no fuel on the car, then our certainty in the spark plugs being dirty will decrease since the lack of fuel explains why the car cannot start. In Figure 2. Observe that in the second example we observe only A indirectly through information about F ; knowing the state of F tells us something about the state of E, which in turn tells us something about A.
Converging connection. If A changes certainty, it opens communication between its parents. Examples in which the parents of A are dependent. The dotted lines indicate insertion of evidence. The conclusion is that evidence may be transmitted through a converging connection only if either the variable in the connection or one of its descendants has received evidence.
Remark: Evidence about a variable is a statement of the certainties of its states. If the variable is instantiated, we call it hard evidence; otherwise, it is called soft. In the example above, we can say that hard evidence about the variable F provides soft evidence about the variable A. Blocking in the case of serial and diverging connections requires hard evidence, whereas opening in the case of converging connections holds for all kinds of evidence.
Salmonella Flu Nausea Pallor Fig. If A and B are not d-separated, we call them d-connected. The evidence entered at B and M represents instantiations. If evidence is entered at A, it may be transmitted to D. The variable B is blocked, so the evidence cannot pass through B to E. However, it may be passed to H and K. Note that although A and B are d-connected, changes in the belief in A will not necessarily change the belief in B. The Markov blanket of a variable A is the set consisting of the parents of A, the children of A, and the variables sharing a child with A.
The Markov blanket has the property that when instantiated, A is dseparated from the rest of the network see Figure 2. A theorem should be as follows. Claim: If A and B are d-separated, then changes in the certainty of A have no impact on the certainty of B. A causal network with M and B instantiated. The node A is d-separated from G only. Causal networks with hard evidence entered the variables are instantiated. An easier way of performing this test, without having to consider the various types of connections, is as follows: First you construct the so-called ancestral graph consisting of A, B, and C together with all nodes from which there is a directed path to either A, B, or C see Figure 2.
Next, you insert an undirected link between each pair of nodes with a common child and then you make all links undirected. The resulting graph see Figure 2. The above procedure generalizes straightforwardly to the case in which we work with sets of variables rather than single variables: you just construct the ancestral graph using these sets of variables and perform the same steps as above: A and B are then d-separated given C if all paths connecting a variable in A with a variable in B intersect a variable in C.
This can be expressed by attaching numbers to the links. Let A be a parent of B. Using probability calculus, it would be natural to let P B A be the strength of the link. It may happen that the domain to be modeled contains causal feedback cycles see Figure 2. For causal networks, no calculus has been developed that can cope with feedback cycles, but certain noncausal models have been proposed to deal with this issue.
For Bayesian networks we require that the network does not contain cycles. A directed graph with a feedback cycle. This is not allowed in Bayesian networks. Note that if A has no parents, then the table reduces to the unconditional probability table P A. For the DAG in Figure 2. It has been claimed that prior probabilities are an unwanted introduction of bias to the model, and calculi have been invented in order to avoid it. However, as discussed in Section 2. A directed acyclic graph DAG. That is, when building the structure of a Bayesian network model, we need not insist on having the 2.
The model should not include conditional independences that do not hold in the real world. The causal network for the reduced car start problem. We have used the abbreviations Fu Fuel? The remaining tables are given in Table 2. However, P U grows exponentially with the number of variables, and U need not be very large before the table becomes intractably large. Therefore, we look for a more compact representation of P U , i. Conditional probabilities for the model in Figure 2.
Based on these two properties, what other properties can be deduced about P U? We shall show that this holds in general. For probability distributions over sets of variables, we have an equation called the chain rule. For Bayesian networks this equation has a special form. First we state the general chain rule: Proposition 2. Theorem 2. First we should show that P U is indeed a probability distribution. That is, we need to show that Axioms 1—3 hold. This is left as an exercise see Exercise 2. This is done through induction in the number of variables in BN.
The chain rule yields that a Bayesian network is a compact representation of a joint probability distribution. The following example illustrates how to exploit that for reasoning under uncertainty. In this example, we apply the rules of probability calculus to the Car Start Problem. This is done to illustrate that probability calculus can be used to perform the reasoning in the example, in particular, explaining away. In Chapter 4, we give general algorithms for probability updating in Bayesian networks. We will use the Bayesian network from Example 2.
We will use the joint probability table for the reasoning. The result is given in Tables 2. By marginalizing FM and Fu out of Table 2. If they do not, normalize by dividing by the sum. The numbers are given in Table 2. Note that P e , the prior probability of e, is obtained by marginalizing A out of P A, e.
Note also that P A, e is the result of multiplying P A by 0,. Let A be a variable with n states. The joint probability table P U, e is the table obtained from P U by replacing all entries with A not in state i or j by the value zero and leaving the other entries unchanged. Using the chain rule for Bayesian networks, we have the following theorem. Let BN be a Bayesian network over the universe U, and let e1 ,. You may, for example, receive a statement from someone that the chance of A being in state a1 is twice as high as for a2. This type of evidence is called likelihood evidence.
It is possible to treat this kind of evidence in Bayesian networks. The preceding statement is then represented by the distribution 0. However, because it is unclear what it means that a likelihood statement is true, P e cannot be interpreted as the probability of the evidence, and P U, e therefore has an unclear semantics.
Bayesian network - Wikipedia
We will not deal further with likelihood evidence. However, U need not be large before P U becomes intractably large. In this section, we illustrate how the calculations can be performed without having to deal with the full joint table. In Chapter 4, we give a detailed treatment of algorithms for probability updating. Consider the Bayesian network in Figure 2.
A Bayesian network. Notice that we need not calculate the full table P U with entries. If we wait until evidence is entered, we will in this case need to work with a table with only entries. Later, we see that we need not work with tables larger than entries. Next, we marginalize H out. Finally, we calculate this product and marginalize B and C out of it.
The method we just used is called variable elimination and can be described in the following way: we start with a set T of tables, and whenever we wish to marginalize a variable X, we take from T all tables with X in their domains, calculate the product of them, marginalize X out of it, and place the resulting table in T.
In this respect, a Bayesian network is one type of compact representation among many others. However, there is more to it than this: From a knowledge engineering point of view, a Bayesian network is a type of graphical model. The structure of the network is formulated in a graphical communication language for which the language features have a very simple semantics, namely causality.
However, most often humans can communicate sensibly about causal relations in a knowledge domain. In Chapter 3, we extend the modeling language, and in Part II we present other types of graphical models. As mentioned, graphical models are communication languages. They consist of a qualitative part, where features from graph theory are used, and a quantitative part consisting of potentials, which are real-valued functions over sets of nodes from the graph; in Bayesian networks the potentials are conditional probability tables.
The next step in the use of graphical models has to do with communication to a computer. You wish to communicate a graphical model to a computer, and the computer should be able to process the model and give answers to various queries. The next concern is the scope of the language: what is the range of domains and tasks that you will be able to model with this language?
Actually, the language for Bayesian networks is a context-free language with a single context-sensitive aspect no directed cycles. As described in Chapter 4, there are algorithms for probability updating in Bayesian networks, but basically probability updating is NP-hard. This means that some models have an updating time exponential in the number of nodes. On the other hand, the running times of the algorithms can be easily calculated without actually running them.
In Chapter 4 and Part II, we treat complexity issues for the various graphical languages presented. The variables together with the directed edges form an acyclic directed graph DAG. To each variable A with parents B1 ,. Exercises 45 Inserting Evidence Let e1 ,. A proof that Bayesian networks admit d-separation can be found in Pearl, or in Lauritzen, Geiger and Pearl proved that d-separation is the correct criterion for directed graphical models, in the sense that for any DAG, a probability distribution can be found for which the d-separation criterion is sound and complete.
Meek furthermore proved that for a given DAG, the set of discrete probability distributions for which the d-separation criterion is not complete has measure zero. Bayesian networks have a long history in statistics, and can be traced back at least to the work in Minsky, The basis for the inference method presented in Section 2. The fact that inference is NP-hard was proved in Cooper, To illustrate that simple rules cannot cope with uncertainty reasoning, consider the following two cases: 46 2 Causal and Bayesian Networks i I have an urn with a red ball and a white ball in it.
If I add a red ball and shake it, what is the certainty of drawing a red ball in one draw? If I add a white ball instead, what is the certainty of drawing a red ball? If I combine the two actions, what is the certainty of drawing a red ball? I am also more certain to hit the target if I close the right eye. What is the combined certainty if I do both? Exercise 2. Construct a causal network and follow the reasoning in the following story. Convinced that a burglar has broken into his house, Holmes rushes to his car and heads for home.
On his way, he listens to the radio, and in the news it is reported that there has been a small earthquake in the area. Knowing that earthquakes have a tendency to turn on burglar alarms, he returns to work. Consider the Car Start Problem in Section 2. Since this is not the case, either there must be a leak in the tank, someone has stolen gasoline during the night, or the fuel meter is malfunctioning.
In the graphs in Figures 2. Figure for Exercise 2. For each pair of variables in the causal network in Figure 2. Consider the network in Figure 2. What are the minimal set s of variables required to d-separate C and E that is, sets of variables for which no proper subset d-separates C and E? What are the minimal set s of variables required to d-separate A and B? What are the maximal set s of variables that d-separate C and E that is, sets of variables for which no proper superset d-separates C and E?
What are the maximal set s of variables that d-separate A and B? A causal network for Exercise 2. What is the Markov blanket of each variable? Let A be a variable in a DAG. Show that A is d-separated from the remaining uninstantiated variables. Apply the procedure using the ancestral graph given in Section 2. If D2 is also an I-submap of D1 , they are said to be I-equivalent. Which of the four DAGs in Figure 2. The accompanying conditional probability tables are shown in Tables 2. Are A and C d-separated given B? Are A and C conditionally independent given B? A simple Bayesian network for Exercise 2.
Use this expression to show that B and D are conditionally independent given A and C. Consider the Bayesian network from Exercise 2. What is P B, C, e?
A professional Java software tool for teaching and applications
Assuming that each variable has ten states, what is the maximum size of a table during the procedure? Consider the DAG a in Exercise 2. Calculate P A, B, C. Conditional probability tables for Exercise 2. E Install an editor for Bayesian networks a reference to a list of systems can be found in the preface. E Construct a Bayesian network for Exercise 1. E Construct a Bayesian network to follow the reasoning from Exercise 2.
Use your own estimates of probabilities for the network. E Consider the Bayesian network in Figure 2. Use your system to investigate whether A and C are independent. Tables for Exercise 2. E Use your system and Section 2. However, as can be seen from the calculations in Section 2. Fortunately, software tools that can do the calculational job for us are available. In the rest of this book, we assume that the reader has access to such a system some URLs are given in the preface.
Therefore, we can start by concentrating on how to use Bayesian networks in model building and defer a presentation of methods for probability updating to Chapter 4. In Section 3. Section 3. The examples cover theoretically well-founded probabilities as well as probabilities taken from databases and purely subjective estimates.
Finally, Section 3. We call them hypothesis events. The hypothesis events detected are then grouped into sets of mutually exclusive and exhaustive events to form hypothesis variable. The next thing to have in mind is that in order to come up with a certainty estimate, we should provide some information channels, and the task is to identify the types of achievable information that may reveal something about the hypothesis variables. These types of information are grouped into 52 3 Building Models information variablesinformation variable, and a typical piece of information is a statement that a certain variable is in a particular state, but softer statements are also allowed.
To detect whether the milk is infected, you have a test, which may give either a positive or a negative test result.
Bayesian networks and decision graphs
The test is not perfect. It may give a positive result on clean milk as well as a negative result on infected milk. We have two hypothesis events: milk infected and milk not infected, and because they are mutually exclusive and exhaustive, they are grouped into the variable Infected? A possible information source is the test results, which can be either positive or negative. For this, we establish the variable Test with states pos and neg. The causal direction between the two variables is from Infected? Test Fig. The Bayesian network for the milk test. Warning: Certainly, no sensible person will claim that a positive test result may infect the milk.
Cows with infected milk will heal over time, and a clean cow has a risk of having infected milk the next day. Now, imagine that the farmer performs the test each day. After a week, he has not only the current test result but also the six previous test results. For each day, we have a model like the one in Figure 3.
These seven models should be connected such that past knowledge can be used for the current conclusion. A natural way would be to let the state of the milk yesterday have an impact on the state today. This yields the model in Figure 3. The model in Figure 3. A seven-day model for the milk test. This forecast will not be improved by knowing that the milk was not infected on day three. For various diseases, such an assumption will not be valid.
Some diseases have a natural span of time. If the Markov property of Figure 3. For example, it may be argued that you also need to go an extra day back, and the model will be as in Figure 3. A seven-day model with a two-day memory of infection. In the reasoning, we cannot exploit knowledge of the exact state of the previous infection node, but we may use a probability distribution based on a test result. The second hidden assumption has to do with the test.
Any two test nodes are d-separated given any infection node on the path. This means that the fault probability of the test is independent of whether it was previously correct. This is done in Figure 3. A minor digression on modeling of tests: It is good to have as a rule that no test is perfect. A seven-day model with two-day memory for infection and a one-day memory of correctness of test.
This is not all, though. You should also take the mechanism for false test results into account. Now, you may have the option of repeating the test, but will this be of any help? It will depend on the mechanisms that cause the test to give a wrong result. Models for these two types of failure mechanisms are shown in Figure 3. Test1 HIV? Test2 a Test1 Test2 b Fig. Model a illustrates the scenario in which a repeated test may provide new information, and model b shows the situation in which repeating a test always produces the same result.
I wake up in the morning with a sore throat. If it is severe angina, I will not go to work.
- Bayesian networks and decision graphs - PDF Free Download.
- Bayesian Networks and Decision Graphs (second edition);
- Rigid Analytic Geometry and Its Applications.
- Bayesian Networks and Decision Graphs / Edition 2.
- Treatise on Solid State Chemistry: Volume 6A Surfaces I.
- Pro Apache Tomcat 5/5.5.
To gain more insight, I can take my temperature, and I can look down my throat for yellow spots. The hypothesis events must be organized into a set of variables with mutually exclusive and exhaustive states. We may use the variables indicated previously, but we may also use only one variable Sick? We choose to use the two variables Cold? The information variables are Sore Throat? The variable Fever? Now it is time to consider the causal structure between the variables. We need not worry about how information is transmitted through the network.
The only thing to worry about is which variables have a direct causal impact on other variables. In this example, we have that Cold? The model is given in Figure 3. Sore Throat? See Spots? A model for Cold? The next thing to check is whether the conditional independences laid down in the model correspond to reality. For example, the model in Figure 3. If we do not agree, we may introduce a link from See Spots? For now, we accept the conditional independences given by the model. Following the method from Section 3. Next, we will analyze the conditional independences stated by the model.
A model for pregnancy. From this, we infer some expectations for the test results. Now, if we get a negative test result from the blood test, will this change our expectation for the urine test? There are several ways to change the model. You might, for example, introduce a link between the two test nodes, but there is no natural direction. A more correct model for pregnancy. For the model in Figure 3. If the model in Figure 3. In the model in Figure 3. Such variables are called mediating variables.
Mediating variables are often introduced when two 3. Some standard situations are illustrated in Figure 3. In examples a and b , A and B are not independent, whereas A and B are not independent given D in examples c and d. In the second round, you may discard at most two cards. The hypothesis events are the various types of hands in the game. Ambiguities are resolved according to rank.
The only information to acquire is the number of cards the player discards in the two rounds. By saying this, we are making an approximation again. If, for example, you have seen three aces, then he cannot have two aces. A causal structure for the information variables and the hypothesis variable could be as in Figure 3. However, this structure will leave us with no clue as to how to specify the probabilities. The causal structure will then be as in Figure 3. A structure for the poker game. The two mediating variables OH0 and OH1 are introduced.
In a naive Bayes model the information variables are assumed to be independent given the hypothesis variable see Figure 3. A naive Bayes model. For any set of observations f1 ,. This product is also called the likelihood for H given f1 ,. The posterior probability for H is then calculated as P H f1 ,.
What is particularly attractive with the calculation in equation 3. On the other hand, as we also saw from the insemination example, the independence assumption need not hold, and if the model is used anyway, the conclusions may be misleading. However, in certain application areas such as diagnosis the naive Bayes model has been shown to provide very good performance, even when the independence assumption is violated.
This is partly due to the fact that for many diagnostic problems we are interested only in identifying the most probable disease. We shall return to these models in Section 8. However, you cannot expect this part of the modeling always to go smoothly. Furthermore, causality is not a well-understood concept. Is a causal relation a property of the real world, or rather, is it a concept in our minds helping us to organize our perception of the world? For now, we make only one point about this issue, namely that in some situations you may be able to infer information about causality based on actions that change the state of the world.
For example, assume that you are confronted with two correlated variables A and B, but you cannot determine a direction.
Bayesian Networks Decision Graphs by Finn Jensen
If you observe the state of A, you will change your belief of B and vice versa. If this does not make you change your belief of B, then A is not a cause of B. On the other hand, if this imagined test indicates no causal arrow in any direction, then you should look for an event that has a causal impact on both A and B. The basis for the conditional probabilities can have an epistemological status ranging from well-founded theory over frequencies in a database to subjective estimates. We will give examples of each type.
The retailer of the test should provide P Test Infected? Let both numbers be 0. In the case of a positive test result, the milk may still be clean, and to come up with a probability we need the prior probabilities P Infected? Assume that this particular farm has 50 cows, and that the milk from all cows is poured into a container and transported to the dairy, which tests the milk with a very precise test.
Now we must make various assumptions.
The next assumption could be that outbreaks of infected milk for the cows in the farm are independent. This yields a coin-tossing model with P Infected? This completes the model, and next you can use a computer system to calculate posterior probabilities. The interesting question for this situation is, if we get a positive test result, what is the probability that the milk is infected?
This is left as an exercise see Exercise 3. For the seven-day model in Figure 3. There are two numbers to estimate: the risk of becoming infected and the chance of being cured. These numbers must be based on experience. For the sake of the example, let the risk of becoming infected be 0. This gives the numbers in Table 3. For the seven-day model with a two-day memory of infection Figure 3.
If we assume that the risk of being infected is the same as before, that the infection always lasts at least two days, and that after this the chance of being cured is 0. For the seven-day model with two-day memory of infection as well as correctness of test Figure 3. If we assume that a correct test has a Figure 3. A seven-day model with a two-day memory for infection and a one-day memory of correctness of test. With the preceding assumptions, the required tables are as in Table 3.
Ann is the mother of both Fred and Gwenn, but their fathers are in no way related. The colt John with 3. The disease is so serious that John is displaced instantly, and since the stud farm wants the gene out of production, Henry and Irene are taken out of breeding. What are the probabilities for the remaining horses to be carriers of the unwanted gene? The genealogical structure for the horses is given in Figure 3. Genealogical structure for the horses in the stud farm. The only information variable is John.
Before the information on John is acquired, he may have three genotypes: he may be sick aa , a carrier aA , or he may be pure AA. The hypothesis events are the genotypes of all other horses in the stud farm. The conditional probabilities for inheritance are both empirically and theoretically wellstudied, and the probabilities are as shown in Table 3. The inheritance tables could be as in Table 3. However, for all horses except John, we have additional knowledge. Since they are in production, they cannot be of type aa. P Child Father, Mother for genetic inheritance.
It is also possible to calculate the conditional probabilities directly. P Child Father, Mother when the parents are not sick.