In this paper, we investigate the performance of statistical, mathematical programming and heuristic linear models for cost-sensitive classification. In particular, we compare five cost-sensitive techniques: Fisher's discriminant analysis (DA), asymmetric misclassification cost mixed integer programming (AMC-MIP), cost-sensitive support vector machine (CS-SVM), a hybrid support vector machine and mixed integer programming model (SVMIP), and a heuristic cost-sensitive genetic algorithm (CGA). Using simulated datasets with varying group overlaps, data distributions and class biases, and real-world datasets from the financial and medical domains, we compare the performance of the five techniques on the basis of overall holdout sample misclassification cost. The results of our experiments on simulated datasets indicate that when group overlap is low and the data distribution is exponential, DA appears to provide superior performance. In all other situations with simulated datasets, CS-SVM provides superior performance. For real-world datasets from the financial domain, CGA and AMC-MIP hold a slight edge over the two SVM-based classifiers (CS-SVM and SVMIP). However, for medical domains with mixed continuous and discrete attributes, the SVM-based classifiers perform better than the heuristic (CGA) and AMC-MIP classifiers. The SVMIP model is the least computationally efficient model and also performs poorly.
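The evaluation criterion named above, overall holdout sample misclassification cost, can be illustrated as a cost-weighted count of errors. The following is only a minimal sketch under stated assumptions, not the procedure used in the paper: the function name `misclassification_cost`, the 0/1 label encoding, and the cost values `cost_fp` and `cost_fn` are all hypothetical and chosen for illustration.

```python
def misclassification_cost(y_true, y_pred, cost_fp=1.0, cost_fn=5.0):
    """Total asymmetric misclassification cost on a holdout sample.

    Assumptions (not from the paper): labels are 0 (negative) and
    1 (positive), and the two cost values are illustrative placeholders.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 0 and pred == 1:
            total += cost_fp  # false positive
        elif truth == 1 and pred == 0:
            total += cost_fn  # false negative
    return total


# Hypothetical holdout labels and predictions: one false positive
# and two false negatives.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 0, 0]
holdout_cost = misclassification_cost(y_true, y_pred)
```

With unequal costs, two classifiers with the same error count can differ in total cost, which is why the comparison is stated in terms of cost rather than accuracy.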
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Theoretical Computer Science
- Computational Theory and Mathematics
- Artificial Intelligence