3. Bayesian Networks

3.1 Introduction

The concept of conditional probability is a useful one. There are countless real-world examples where the probability of one event is conditional on the probability of a previous one. While the sum and product rules of probability theory can express this conditionality, in many cases the resulting calculations are NP-hard. The prospect of managing a scenario with 5 discrete random variables ($2^5 - 1 = 31$ discrete parameters) might be manageable. An expert system for monitoring patients with 37 variables, resulting in a joint distribution of over $2^{37}$ parameters, would not be manageable [6].
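As a rough illustration of this growth, and assuming binary variables, the sketch below simply counts the free parameters of an unfactored joint distribution for the two cases mentioned above. The function name is hypothetical and the snippet is not tied to any particular system.

```python
# Parameter counts for a full joint distribution over n binary variables.
# A table over n binary variables has 2**n cells; because the probabilities
# must sum to 1, only 2**n - 1 of them are free parameters.

def full_joint_parameters(n_variables: int) -> int:
    """Number of free parameters in an unfactored joint over binary variables."""
    return 2 ** n_variables - 1

print(full_joint_parameters(5))   # 31 -- manageable
print(full_joint_parameters(37))  # 137438953471 -- on the order of 2**37, not manageable
```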

3.2 Definition

Consider a domain U of n variables, $x_1, \ldots, x_n$. Each variable may be discrete, having a finite or countable number of states, or continuous. Given a subset X of variables $x_i$, where $x_i \in U$, if one can observe the state of every variable in X, then this observation is called an instance of X and is denoted $X = \vec{k}_X$ for the observations $x_i = k_i$, $x_i \in X$. The "joint space" of U is the set of all instances of U. $p(X = \vec{k}_X \mid Y = \vec{k}_Y, \xi)$ denotes the "generalized probability density" that $X = \vec{k}_X$ given $Y = \vec{k}_Y$ for a person with current state of information $\xi$. $p(X \mid Y, \xi)$ then denotes the "Generalized Probability Density Function" (gpdf) for X, given all possible observations of Y. The joint gpdf over U is the gpdf for U.
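For the discrete case, these notions can be made concrete with a small sketch: an instance of X is just an assignment of one state to each variable in X, and the joint space of U is the set of all such assignments over every variable in U. The variable names and states below are hypothetical.

```python
from itertools import product

# Hypothetical discrete domain U: each variable maps to its finite set of states.
U = {
    "x1": ("true", "false"),
    "x2": ("low", "medium", "high"),
    "x3": ("true", "false"),
}

# An instance of the subset X = {x1, x2}: one observed state per variable in X.
instance_of_X = {"x1": "true", "x2": "high"}

# The joint space of U: every possible instance of all variables in U.
joint_space = [dict(zip(U.keys(), states)) for states in product(*U.values())]
print(len(joint_space))  # 2 * 3 * 2 = 12 instances
```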

A Bayesian network for domain U represents a joint gpdf over U. This representation consists of a set of local conditional gpdfs combined with a set of conditional independence assertions that allow the construction of a global gpdf from the local gpdfs. As shown previously, the chain rule of probability gives this decomposition:

$p(x_1, \ldots, x_n \mid \xi) = \prod_{i=1}^{n} p(x_i \mid x_1, \ldots, x_{i-1}, \xi)$ (eq. 10)
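As a minimal numeric check of eq. 10, multiplying the chained conditionals for three hypothetical binary variables reproduces the joint probability of one instance. The numbers below are invented purely for illustration.

```python
# Chain-rule factorization (eq. 10) for one instance of three binary variables:
# p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x1, x2)

p_x1 = 0.30              # p(x1 = true)
p_x2_given_x1 = 0.60     # p(x2 = true | x1 = true)
p_x3_given_x1_x2 = 0.90  # p(x3 = true | x1 = true, x2 = true)

joint = p_x1 * p_x2_given_x1 * p_x3_given_x1_x2
print(round(joint, 3))   # 0.162 = p(x1 = true, x2 = true, x3 = true)
```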

One assumption imposed by Bayesian Network theory (and indirectly by the Product Rule of probability theory) is that, for each variable $x_i$, there is a subset $\Pi_i \subseteq \{x_1, \ldots, x_{i-1}\}$ that renders $x_i$ and $\{x_1, \ldots, x_{i-1}\}$ conditionally independent given $\Pi_i$. In this way:

$p(x_i \mid x_1, \ldots, x_{i-1}, \xi) = p(x_i \mid \Pi_i, \xi)$ (eq. 11) [7]
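A brief sketch, with invented numbers, of what eq. 11 asserts: if $\Pi_3 = \{x_2\}$ renders $x_3$ independent of $x_1$ given $x_2$, then the full conditional $p(x_3 \mid x_1, x_2, \xi)$ collapses to $p(x_3 \mid x_2, \xi)$, so only the smaller parent-conditioned table need be stored.

```python
# Conditional independence (eq. 11) for a hypothetical chain x1 -> x2 -> x3.
# Because x3 depends on x1 only through x2, the full conditional table
# p(x3 = true | x1, x2) has identical entries for x1 = True and x1 = False ...
p_x3_full = {
    # (x1, x2): p(x3 = true | x1, x2)
    (True, True): 0.9, (False, True): 0.9,
    (True, False): 0.2, (False, False): 0.2,
}

# ... so it can be replaced by the parent-conditioned table p(x3 | PI_3),
# where PI_3 = {x2}.
p_x3_given_parents = {True: 0.9, False: 0.2}

assert all(
    p_x3_full[(x1, x2)] == p_x3_given_parents[x2]
    for (x1, x2) in p_x3_full
)
```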

A Bayesian Network Structure then encodes the assertions of conditional independence in equation 11 above. Essentially then, a Bayesian Network Structure $B_S$ "is a directed acyclic graph such that (1) each variable in U corresponds to a node in $B_S$, and (2) the parents of the node corresponding to $x_i$ are the nodes corresponding to the variables in $\Pi_i$."[8]
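As a sketch of this definition, a structure $B_S$ can be stored simply as a map from each variable to its parent set $\Pi_i$; the example below uses hypothetical variables and adds a depth-first check that the graph is in fact acyclic, as the definition requires.

```python
# A hypothetical Bayesian-network structure B_S as a parent map:
# each variable in U maps to the set PI_i of its parents.
B_S = {
    "x1": set(),
    "x2": {"x1"},
    "x3": {"x1", "x2"},
    "x4": {"x2"},
}

def is_acyclic(parents: dict[str, set[str]]) -> bool:
    """Depth-first check that no variable is (indirectly) its own ancestor."""
    visiting, done = set(), set()

    def visit(node: str) -> bool:
        if node in done:
            return True
        if node in visiting:  # back edge => cycle
            return False
        visiting.add(node)
        ok = all(visit(p) for p in parents.get(node, set()))
        visiting.discard(node)
        done.add(node)
        return ok

    return all(visit(node) for node in parents)

print(is_acyclic(B_S))  # True: B_S is a directed acyclic graph
```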

"A Bayesian-network gpdf set Bp is the collection of local gpdfs p(x_sub_i | PI_sub_i, xi) for each node in the domain."[9]