We address statistical classifier design given a mixed training set consisting of a small labelled feature set and a (generally larger) set of unlabel led feat u res. This situation arises, e.g., for medical images, where although training features may be plentiful, expensive expertise is required to extract their class labels. We propose a classifier structure and learning algorithm that make effective use of unlabelled data to improve performance. The learning is based on maximization of the total data likelihood, i.e. over both the labelled and unlabelled data subsets. Two distinct EM learning algorithms are proposed, differing in the EM formalism applied for unlabelled data. The classifier, based on a joint probability model for features and labels, is a "mixture of experts" structure that is equivalent to the radial basis function (RBP) classifier, but unlike RBFs, is amenable to likelihood-based training. The scope of application for the new method is greatly extended by the observation that test data, or any new data to classify, is in fact additional, unlabelled data-thus, a combined learning/classification operation-much akin to what is done in image segmentation-can be invoked whenever there is new data to classify. Experiments with data sets from the UC Irvine database demonstrate that the new learning algorithms and structure achieve substantial performance gains over alternative approaches.