TY - GEN
T1 - Deep Clustering for Mixed-type Data with Frequency Encoding and Doubly Weighted Cross Entropy Loss
AU - Choi, Deogho
AU - Chae, Daniel
AU - Kim, Woo Yeon
AU - Kim, Jihong
AU - Yang, Janghoon
AU - Shin, Jitae
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Clustering is an unsupervised learning task that groups data samples into distinct classes according to their similarity. Most previous research has focused on improving K-prototypes or on learning suitable numerical representations of categorical features with an autoencoder. In this work, we investigate whether applying frequency encoding to categorical features can be sufficiently effective. Furthermore, we propose a doubly weighted cross entropy (DW-CE) loss to find optimal cluster centroids by training a fully connected layer. Experiments on two mixed-type datasets from the UCI repository, credit approval and heart disease, show that the proposed clustering with frequency encoding and DW-CE loss outperforms existing state-of-the-art methods in most cases.
AB - Clustering is an unsupervised learning task that groups data samples into distinct classes according to their similarity. Most previous research has focused on improving K-prototypes or on learning suitable numerical representations of categorical features with an autoencoder. In this work, we investigate whether applying frequency encoding to categorical features can be sufficiently effective. Furthermore, we propose a doubly weighted cross entropy (DW-CE) loss to find optimal cluster centroids by training a fully connected layer. Experiments on two mixed-type datasets from the UCI repository, credit approval and heart disease, show that the proposed clustering with frequency encoding and DW-CE loss outperforms existing state-of-the-art methods in most cases.
UR - https://www.scopus.com/pages/publications/85140608585
UR - https://www.scopus.com/pages/publications/85140608585#tab=citedBy
U2 - 10.1109/ITC-CSCC55581.2022.9894964
DO - 10.1109/ITC-CSCC55581.2022.9894964
M3 - Conference contribution
AN - SCOPUS:85140608585
T3 - ITC-CSCC 2022 - 37th International Technical Conference on Circuits/Systems, Computers and Communications
SP - 141
EP - 144
BT - ITC-CSCC 2022 - 37th International Technical Conference on Circuits/Systems, Computers and Communications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 37th International Technical Conference on Circuits/Systems, Computers and Communications, ITC-CSCC 2022
Y2 - 5 July 2022 through 8 July 2022
ER -