Learning pseudo-independent models
A type of problem domain known as a pseudo-independent (PI) model poses difficulty for common algorithms that learn probabilistic networks using a single-link lookahead search. Learning PI models requires a special search method called multi-link lookahead, which was implemented in an algorithm called RML by Xiang et al. in 2000. The algorithm uses a cross-entropy scoring metric to measure the goodness-of-fit of a model, together with a user-provided acceptance threshold on the score that defines the stopping condition of the search. Recently, an overfitting problem has been identified in using a multi-link search with a single threshold. To fix the problem, this thesis presents a new scoring metric that explicitly incorporates model complexity based on the principle of minimum description length (MDL), so that goodness-of-fit can be explicitly traded for complexity and vice versa. This scoring metric requires accurate estimation of the complexity of PI models. This thesis proposes a novel methodology for measuring the complexity of PI models based on a new perspective called a hypercube. New complexity formulae for different types of PI models are presented. These formulae have been discovered through a deeper understanding of the characteristics of each PI model type and by the hypercube method. The formulae are simple in form and provide insight into the structural dependency relationships among the probability parameters of a PI model. In addition to the new scoring metric and PI model complexity, this thesis presents a local score computation method that uses only a small subset.
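The MDL trade-off described above can be illustrated with a minimal sketch. The function below is a hypothetical, simplified MDL-style score: the data-encoding cost (negative log-likelihood in bits) plus a standard (k/2) log2 N parameter-encoding penalty. It is not the thesis's metric; in particular, the thesis's hypercube-based complexity formulae for PI models would replace the plain parameter count assumed here.

```python
import math

def mdl_score(log_likelihood, num_params, num_samples):
    """Simplified MDL-style score (lower is better).

    data_cost:  bits needed to encode the data given the model
                (negative log-likelihood converted from nats to bits).
    model_cost: bits needed to encode the model parameters, using the
                common (k/2) * log2(N) approximation. For PI models,
                num_params would be given by the thesis's complexity
                formulae rather than a naive count.
    """
    data_cost = -log_likelihood / math.log(2)
    model_cost = 0.5 * num_params * math.log2(num_samples)
    return data_cost + model_cost

# A more complex model fits the data better (higher log-likelihood)
# but pays a larger complexity penalty; MDL balances the two, which is
# what prevents the overfitting seen with a pure goodness-of-fit score.
simple_model = mdl_score(log_likelihood=-120.0, num_params=4, num_samples=100)
complex_model = mdl_score(log_likelihood=-115.0, num_params=20, num_samples=100)
```

In this illustrative setting the simpler model wins: its slightly worse fit costs fewer bits than the extra parameters of the complex model, so a search guided by this score would stop before overfitting.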