Abstract
This paper revisits the problem of partitioning noisy data when the number of clusters c is not known a priori. The proposed methodology is a population-based search of the partition space using a genetic algorithm. Potential solutions are encoded with a two-part representation scheme: the first part of the chromosome classifies the data into true (retained) and outlier (trimmed) sets, and the second part is the result of a partition of the true set for a particular value of c, which is optimized simultaneously by the process. A two-tier fitness function is also proposed, which first assesses potential solutions with a test of clustering tendency on the retained set, and then on the efficacy of the partition for a given value of c. A mating pool is created from the individuals that score highly on the test of clustering tendency; these are allowed to cross over and produce offspring solutions, each of which inherits the better partition from either of its parents. The proposed methodology is an improvement over a multi-objective genetic algorithm-based clustering technique, which was previously shown to be superior (or at least comparable) to robust clustering methods that assume a known value of c.
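The two-part chromosome and the "inherit the better partition" crossover can be illustrated with a minimal sketch. This is not the authors' implementation, and every name in it (`Chromosome`, `partition_cost`, `crossover`) is hypothetical. It assumes that the partition is stored as a set of centroids (so c is the number of rows), that the outlier mask is recombined by uniform crossover, and that "better" means a lower mean squared distance to the nearest centroid. The abstract specifies none of these choices, and the first-tier test of clustering tendency is left out entirely.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Chromosome:
    # Part 1: boolean mask over the data, True = retained, False = trimmed (outlier).
    retained: np.ndarray
    # Part 2 (assumed encoding): (c, d) array of centroids; c is the row count.
    centroids: np.ndarray


def partition_cost(ch: Chromosome, X: np.ndarray) -> float:
    """Assumed second-tier score: mean squared distance from each
    retained point to its nearest centroid (lower is better)."""
    kept = X[ch.retained]
    d2 = ((kept[:, None, :] - ch.centroids[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).mean())


def crossover(a: Chromosome, b: Chromosome, X: np.ndarray,
              rng: np.random.Generator) -> Chromosome:
    """Uniform crossover on the retained/trimmed mask (an assumption);
    the child then inherits whichever parent's partition scores better
    on the child's own retained set."""
    pick = rng.random(a.retained.size) < 0.5
    mask = np.where(pick, a.retained, b.retained)
    candidates = (
        Chromosome(mask, a.centroids.copy()),
        Chromosome(mask, b.centroids.copy()),
    )
    return min(candidates, key=lambda ch: partition_cost(ch, X))
```

The cost used here never increases when a centroid is added, so on its own it would drive c upward. A search that optimizes c would need a criterion that does not trivially favor large c, which this sketch does not attempt.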
Original language | English (US) |
---|---|
Title of host publication | 2010 Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS'2010 |
State | Published - 2010 |
Event | 2010 Annual North American Fuzzy Information Processing Society Conference, NAFIPS'2010, Toronto, ON, Canada. Duration: Jul 12 2010 → Jul 14 2010 |
Other
Other | 2010 Annual North American Fuzzy Information Processing Society Conference, NAFIPS'2010 |
---|---|
Country/Territory | Canada |
City | Toronto, ON |
Period | 7/12/10 → 7/14/10 |
All Science Journal Classification (ASJC) codes
- General Computer Science
- General Mathematics