3. Bayesian Networks
3.1 Introduction
The concept of conditional probability is a useful one. There are countless real world examples where the probability of one event is conditional on the probability of a previous one. While the sum and product rules of probability theory can anticipate this factor of conditionality, in many cases such calculations are NP-hard. The prospect of managing a scenario with 5 discrete random variables (25-1=31 discrete parameters) might be manageable. An expert system for monitoring patients with 37 variables resulting in a joint distribution of over 237 parameters would not be manageable[6].
3.2 Definition
Consider a domain U of n variables, x1,...xn. Each variable may be discrete having a finite or countable number of states, or continuous. Given a subset X of variables xi where xi U, if one can observe the state of every variable in X, then this observation is called an instance of X and is denoted as X=
for the observations
. The "joint space" of U is the set of all instances of U.
denotes the "generalized probability density" that X=
given Y=
for a person with current state information
. p(X|Y,
) then denotes the "Generalized Probability Density Function" (gpdf) for X, given all possible observations of Y. The joint gpdf over U is the gpdf for U.
A Bayesian network for domain U represents a joint gpdf over U. This representation consists of a set of local conditional gpdfs combined with a set of conditional independence assertions that allow the construction of a global gpdf from the local gpdfs. As shown previously, the chain rule of probability can be used to ascertain these values:

One assumption imposed by Bayesian Network theory (and indirectly by the Product Rule of probability theory) is that each variable xi, must be a set of variables that renders xi and {x1,...xi-1} conditionally independent. In this way:

A Bayesian Network Structure then encodes the assertions of conditional independence in equation 10 above. Essentially then, a Bayesian Network Structure Bs "is a directed acyclic graph such that (1) each variable in U corresponds to a node in Bs, and (2) the parents of the node corresponding to xi are the nodes corresponding to the variables in [Pi]i."[8]
"A Bayesian-network gpdf set Bp is the collection of local gpdfs for each node in the domain."[9]