TY - GEN
T1 - Learning classifiers from distributional data
AU - Lin, Harris T.
AU - Lee, Sanghack
AU - Bui, Ngot
AU - Honavar, Vasant
PY - 2013
Y1 - 2013
N2 - Many big data applications give rise to distributional data, wherein objects or individuals are naturally represented as K-tuples of bags of feature values, with the feature values in each bag sampled from a feature- and object-specific distribution. We formulate and solve the problem of learning classifiers from distributional data. We consider three classes of methods for learning distributional classifiers: (i) those that rely on aggregation to encode distributional data into tuples of attribute values, i.e., instances that can be handled by traditional supervised machine learning algorithms; (ii) those that are based on generative models of distributional data; and (iii) the discriminative counterparts of the generative models considered in (ii). We compare the performance of the different algorithms on real-world as well as synthetic distributional data sets. The results of our experiments demonstrate that classifiers that take advantage of the information available in the distributional instance representation outperform or match those that fail to fully exploit such information.
AB - Many big data applications give rise to distributional data, wherein objects or individuals are naturally represented as K-tuples of bags of feature values, with the feature values in each bag sampled from a feature- and object-specific distribution. We formulate and solve the problem of learning classifiers from distributional data. We consider three classes of methods for learning distributional classifiers: (i) those that rely on aggregation to encode distributional data into tuples of attribute values, i.e., instances that can be handled by traditional supervised machine learning algorithms; (ii) those that are based on generative models of distributional data; and (iii) the discriminative counterparts of the generative models considered in (ii). We compare the performance of the different algorithms on real-world as well as synthetic distributional data sets. The results of our experiments demonstrate that classifiers that take advantage of the information available in the distributional instance representation outperform or match those that fail to fully exploit such information.
UR - http://www.scopus.com/inward/record.url?scp=84886077520&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84886077520&partnerID=8YFLogxK
U2 - 10.1109/BigData.Congress.2013.47
DO - 10.1109/BigData.Congress.2013.47
M3 - Conference contribution
AN - SCOPUS:84886077520
SN - 9780769550060
T3 - Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013
SP - 302
EP - 309
BT - Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013
T2 - 2013 IEEE International Congress on Big Data, BigData 2013
Y2 - 27 June 2013 through 2 July 2013
ER -