TY - GEN
T1 - Geographical feature classification from text using (active) convolutional neural networks
AU - Yang, Liping
AU - MacEachren, Alan M.
AU - Mitra, Prasenjit
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12
Y1 - 2020/12
AB - Deep learning can discover intricate patterns hidden in big data and scales much better than traditional machine learning when the volume of data increases dramatically. As a result, deep learning has achieved many successes in domains and applications such as image classification, text classification, and machine translation. In this paper, we use deep learning to classify geographical features (e.g., mountains, rivers, landmarks, and cities) from text, using geolocated Wikipedia entries as the case study application. We employ one of the most commonly used deep learning architectures, convolutional neural networks (CNNs), and their integration with active learning (creating what we call active CNNs) to train the geographical feature classifiers on the Wikipedia text data set obtained from GeoNames (which provides the feature type for each geolocated entity). We evaluate the performance of CNNs and active CNNs with multiple metrics (i.e., accuracy, F1 score, and confusion matrix). Our experimental results demonstrate that CNNs and active CNNs can effectively classify geo-referenced text entities into predefined geographical features. In addition, the results show that active CNNs outperform CNNs for hard-to-distinguish classes. We also compared hierarchical multi-class classification with flat multi-class classification; the results show that hierarchical multi-class classification significantly outperforms flat multi-class classification for the data set we used.
UR - http://www.scopus.com/inward/record.url?scp=85102490361&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102490361&partnerID=8YFLogxK
U2 - 10.1109/ICMLA51294.2020.00188
DO - 10.1109/ICMLA51294.2020.00188
M3 - Conference contribution
AN - SCOPUS:85102490361
T3 - Proceedings - 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020
SP - 1182
EP - 1198
BT - Proceedings - 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020
A2 - Wani, M. Arif
A2 - Luo, Feng
A2 - Li, Xiaolin
A2 - Dou, Dejing
A2 - Bonchi, Francesco
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 19th IEEE International Conference on Machine Learning and Applications, ICMLA 2020
Y2 - 14 December 2020 through 17 December 2020
ER -