Incomplete-data classification using logistic regression

David Williams, Xuejun Liao, Ya Xue, Lawrence Carin

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    55 Scopus citations

    Abstract

    A logistic regression classification algorithm is developed for problems in which the feature vectors may be missing data (features). Single or multiple imputation for the missing data is avoided by performing analytic integration with an estimated conditional density function (conditioned on the non-missing data). Conditional density functions are estimated using a Gaussian mixture model (GMM), with parameter estimation performed using both expectation maximization (EM) and Variational Bayesian EM (VB-EM). Using widely available real data, we demonstrate the general advantage of the VB-EM GMM estimation for handling incomplete data, vis-à-vis the EM algorithm. Moreover, it is demonstrated that the approach proposed here is generally superior to standard imputation procedures.
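    The conditioning step described in the abstract can be illustrated with a minimal sketch. It assumes a GMM has already been fitted (by EM or VB-EM) and computes the density of the missing features given the observed ones, which is again a Gaussian mixture. The function name `conditional_gmm` and the use of NaN to mark missing features are assumptions made for this illustration. This is not the authors' implementation, and it does not cover the integration of the logistic function against the conditional density.

```python
import numpy as np

def conditional_gmm(x, weights, means, covs):
    """Return (weights, means, covs) of p(x_missing | x_observed) under a GMM.

    x uses np.nan to mark missing features (a convention assumed here).
    weights, means, covs are the parameters of an already-fitted mixture.
    """
    x = np.asarray(x, dtype=float)
    m = np.isnan(x)   # mask of missing features
    o = ~m            # mask of observed features
    xo = x[o]
    log_r, cond_means, cond_covs = [], [], []
    for pi_k, mu_k, S_k in zip(weights, means, covs):
        mu_k = np.asarray(mu_k, dtype=float)
        S_k = np.asarray(S_k, dtype=float)
        S_oo = S_k[np.ix_(o, o)]
        S_mo = S_k[np.ix_(m, o)]
        S_mm = S_k[np.ix_(m, m)]
        diff = xo - mu_k[o]
        sol = np.linalg.solve(S_oo, diff)
        _, logdet = np.linalg.slogdet(S_oo)
        # Log of pi_k * N(x_o | mu_k[o], S_oo): unnormalized component weight.
        log_r.append(np.log(pi_k)
                     - 0.5 * (diff @ sol + logdet + o.sum() * np.log(2 * np.pi)))
        # Standard Gaussian conditioning formulas for each component.
        cond_means.append(mu_k[m] + S_mo @ sol)
        cond_covs.append(S_mm - S_mo @ np.linalg.solve(S_oo, S_mo.T))
    log_r = np.array(log_r)
    r = np.exp(log_r - log_r.max())   # subtract the max for numerical stability
    r /= r.sum()
    return r, np.array(cond_means), np.array(cond_covs)
```

    For a feature vector such as `[1.0, np.nan]`, the function returns the mixture weights, means, and covariances of the second feature conditioned on the first. A classifier in the spirit of the abstract would then average the logistic output over this conditional density instead of imputing a single value.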

    Original language: English (US)
    Title of host publication: ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
    Editors: L. Raedt, S. Wrobel
    Pages: 977-984
    Number of pages: 8
    State: Published - 2005
    Event: ICML 2005: 22nd International Conference on Machine Learning - Bonn, Germany
    Duration: Aug 7 2005 → Aug 11 2005

    Publication series

    Name: ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning

    Other

    Other: ICML 2005: 22nd International Conference on Machine Learning
    Country/Territory: Germany
    City: Bonn
    Period: 8/7/05 → 8/11/05

    All Science Journal Classification (ASJC) codes

    • General Engineering
