Quantifying Data Difficulty with Polarized K-Entropy for Assessing Machine Learning Models

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Scopus citations

Abstract

Measuring data difficulty is a critical aspect of machine learning performance evaluation. Several measures have been used to assess the difficulty of classifying data points in binary classification. However, these measures typically require building a machine learning model first, which is then used to assess the data difficulty level. In this paper, we propose a novel model-agnostic measure, named polarized K-entropy, to evaluate the difficulty of classifying a data instance. Our measure computes entropy from the nearest neighbors of a data point. We conducted experiments to evaluate the effectiveness of the proposed method by analyzing how the accuracy of machine learning models changes with respect to data difficulty. We used Spearman's rank correlation coefficient to analyze this relationship for a neural network, a support vector machine, and a random forest. Our results show that our measure outperformed the non-conformity measure in all experiments conducted on six datasets using the selected machine learning models.
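The abstract does not give the formula for polarized K-entropy, so the following is only a minimal sketch of the general idea it builds on: the binary entropy of the class labels among a point's k nearest neighbors. The function name `knn_label_entropy`, the Euclidean distance, and the toy data are assumptions made for illustration, and the "polarized" step of the authors' measure is not reproduced here.

```python
import math

def knn_label_entropy(points, labels, k):
    """Entropy (in bits) of the binary labels among each point's k nearest neighbors.

    Illustrative only: plain neighborhood entropy, not the paper's polarized K-entropy.
    """
    scores = []
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to p (assumed metric).
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors = others[:k]
        # Fraction of neighbors that belong to the positive class.
        q = sum(labels[j] for j in neighbors) / len(neighbors)
        if q in (0.0, 1.0):
            scores.append(0.0)  # pure neighborhood: zero entropy
        else:
            scores.append(-q * math.log2(q) - (1 - q) * math.log2(1 - q))
    return scores

# Toy data: two clusters, with the last point labeled 1 inside the class-0 cluster.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (1, 1)]
labels = [0, 0, 0, 1, 1, 1, 1]
scores = knn_label_entropy(points, labels, k=3)
```

In this toy example, the three class-0 points near the oddly labeled point get a nonzero score because their neighborhoods are mixed, while the odd point itself scores zero because plain neighborhood entropy ignores the point's own label.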

Original language: English (US)
Title of host publication: Proceedings - 2024 IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 7-12
Number of pages: 6
ISBN (Electronic): 9798350351187
DOIs
State: Published - 2024
Event: 25th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2024 - San Jose, United States
Duration: Aug 7 2024 → Aug 9 2024

Publication series

Name: Proceedings - 2024 IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2024

Conference

Conference: 25th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2024
Country/Territory: United States
City: San Jose
Period: 8/7/24 → 8/9/24

All Science Journal Classification (ASJC) codes

  • Computer Vision and Pattern Recognition
  • Information Systems
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
