3.3 Bayesian Networks Illustrated
Given a situation where it might rain today, and might rain tomorrow, what is the probability that it will rain on both days? Rain on two consecutive days are not independent events with isolated probabilities. If it rains on one day, it is more likely to rain the next. Solving such a problem involves determining the chances that it will rain today, and then determining the chance that it will rain tomorrow conditional on the probability that it will rain today. These are known as "joint probabilities." Suppose that P(rain today) = 0.20 and P(rain tomorrow given that it rains today) = 0.70. The probability of such joint events is determined by:
which can also be expressed as:
Working out the joint probabilities for all eventualities, the results can be expressed in a table format:
|Rain Tomorrow||No Rain Tomorrow||Marginal Probability of Rain Today|
|No Rain Today||0.16||0.64||0.80|
|Marginal Probability of Rain Tomorrow||0.30||0.70|
From the table, it is evident that the joint probability of rain over both days is 0.14, but there is a great deal of other information that had to be brought into the calculations before such a determination was possible. With only two discrete, binary variables, four calculations were required.
This same scenario can be expressed using a Bayesian Network Diagram as shown ("!" is used to denote "not").
Figure 1: A Bayesian Network showing the probability of rain
One attraction of Bayesian Networks is the efficiency that only one branch of the tree needs to be traversed. We are really only concerned with P(E1), P(E2|E1) and P(E2,E1).
We can also utilize the graph both visually and algorithmically to determine which parameters are independent of each other. Instead of calculating four joint probabilities, we can use the independence of the parameters to limit our calculations to two. It is self-evident that the probabilities of rain on the second day having rained on the first are completely autonomous from the probabilities of rain on the second day having not rained on the first.
At the same time as emphasizing parametric indifference, Bayesian Networks also provide a parsimonious representation of conditionality among parametric relationships. While the probability of rain today and the probability of rain tomorrow are two discrete events (it cannot rain both today and tomorrow at the same time), there is a conditional relationship between them (if it rains today, the lingering weather systems and residual moisture are more likely to result in rain tomorrow). For this reason, the directed edges of the graph are connected to show this dependency.
Friedman and Goldszmidt suggest looking at Bayesian Networks as a "story". They offer the example of a story containing five random variables: "Burglary", "Earthquake", "Alarm", "Neighbour Call", and "Radio Announcement". In such a story, "Burglary" and "Earthquake" are independent, and "Burglary" and "Radio Announcement" are independent given "Earthquake." This is to say that there is no event which effects both burglaries and earthquakes. As well, "Burglary" and "Radio Announcements" are independent given "Earthquake"--meaning that while a radio announcement might result from an earthquake, it will not result as a repercussion from a burglary.
Because of the independence among these variables, the probability of P(A,R,E,B) (The joint probability of an alarm, radio announcement, earthquake and burglary) can be reduced from:
involving 15 parameters to 8:
P(A,R,E,B) = P(A|E,B)*P(R|E)*P(E)*P(B)
This significantly reduced the number of joint probabilities involved. This can be represented as a Bayesian Network:
Figure 2: The conditional probabilities of an alarm given the independent events of a burglary and earthquake.
Using a Bayesian Network offers many advantages over traditional methods of determining causal relationships. Independence among variables is easy to recognize and isolate while conditional relationships are clearly delimited by a directed graph edge: two variables are independent if all the paths between them are blocked (given the edges are directional). Not all the joint probabilities need to be calculated to make a decision; extraneous branches and relationships can be ignored (One can make a prediction of a radio announcement regardless of whether an alarm sounds). By optimizing the graph, every node can be shown to have at most k parents. The algorithmic routines required can then be run in O(2kn) instead of O(2n) time. In essence, the algorithm can run in linear time (based on the number of edges) instead of exponential time (based on the number of parameters).
Associated with each node is a set of conditional probability distributions. For example, the "Alarm" node might have the following probability distribution:
|E||B||P(A | E,B)||P(!A|E,B)|
For example, should there be both an earthquake and a burglary, the alarm has a 90% chance of sounding. With only an earthquake and no burglary, it would only sound in 20% of the cases. A burglary unaccompanied by an earthquake would set off the alarm 90% of the time, and the chance of a false alarm given no antecedent event should only have a probability of 0.1% of the time. Obviously, these values would have to be determined a posteriori.