Petri Kontkanen
Complex Systems Computation Group (CoSCo)
P.O.Box 26, Department of Computer Science
FIN-00014 University of Helsinki, Finland
E-mail: Petri.Kontkanen@cs.Helsinki.FI
Phone: +358-9-70844226
Fax: +358-9-70844441
Petri Myllymäki
Complex Systems Computation Group (CoSCo)
P.O.Box 26, Department of Computer Science
FIN-00014 University of Helsinki, Finland
E-mail: Petri.Myllymaki@cs.Helsinki.FI
Phone: +358-9-70844212
Fax: +358-9-70844441
Tomi Silander
Complex Systems Computation Group (CoSCo)
P.O.Box 26, Department of Computer Science
FIN-00014 University of Helsinki, Finland
E-mail: Tomi.Silander@cs.Helsinki.FI
Phone: +358-9-70844214
Fax: +358-9-70844441
Henry Tirri
Complex Systems Computation Group (CoSCo)
P.O.Box 26, Department of Computer Science
FIN-00014 University of Helsinki, Finland
E-mail: Henry.Tirri@cs.Helsinki.FI
Phone: +358-9-70844173
Fax: +358-9-70844441
Kimmo Valtonen
Complex Systems Computation Group (CoSCo)
P.O.Box 26, Department of Computer Science
FIN-00014 University of Helsinki, Finland
E-mail: Kimmo.Valtonen@cs.Helsinki.FI
Phone: +358-9-70844178
Fax: +358-9-70844441
Given a set of sample data, we study three alternative methods for determining the predictive distribution of an unseen data vector. In particular, we are interested in how the predictive accuracy of these three methods behaves as a function of the degree to which the domain assumptions are violated. We explore this question empirically using artificially generated data sets, in which the assumptions can be violated in various ways. Our empirical results suggest that if the model assumptions are only mildly violated, marginalization over the model parameters may not be necessary in practice: in this case the computationally much simpler predictive distribution based on a single maximum posterior probability model performs similarly to the computationally more demanding marginal likelihood approach. The results also support Rissanen's theoretical results on the usefulness of Jeffreys' prior distribution for the model parameters.
Predictive inference, Jeffreys' prior, MDL, Bayesian networks