TY - JOUR
T1 - Design and implementation of a parallel geographically weighted k-nearest neighbor classifier
AU - Pu, Yingxia
AU - Zhao, Xinyi
AU - Chi, Guangqing
AU - Zhao, Shuhe
AU - Wang, Jiechen
AU - Jin, Zhibin
AU - Yin, Junjun
N1 - Publisher Copyright: © 2019 Elsevier Ltd
PY - 2019/6
Y1 - 2019/6
N2 - The development of high-performance classifiers represents an important step in improving the timeliness of remote sensing classification in the era of high spatial resolution. The geographically weighted k-nearest neighbors (gwk-NN) classifier, which incorporates spatial information into the traditional k-NN classifier, has demonstrated better performance in mitigating salt-and-pepper noise and misclassification. However, the integration of spatial dependence into spectral information is computationally intensive. To improve the computing performance of the gwk-NN classifier, this study first considered two commonly used parallel strategies—data parallelism and task parallelism—in the model training and image classification stages. Then, our implementation of the corresponding parallel algorithms was carried out by calling message passing interface (MPI) and the geospatial data abstraction library (GDAL) in the C++ development environment on a standalone eight-core computer. Based on the performance of these two strategies, the potentiality of dual parallelism (the simultaneous exploitation of data and task parallelism) in image classification was further investigated. Our experimental results indicate that the parallel gwk-NN classifier can improve the classification efficiency of high-resolution remote sensing images with multiple land cover types. Specifically, the data parallelism method is more effective than the task parallelism method in both the model training and classification stages because of the minor effect of parallel overhead on the total execution time. In addition, dual parallelism can take advantage of data and task parallel strategies, as evidenced by the two largest speedups being attained under dual parallelism I (5.28×), which is based on the premise of task parallelism, and dual parallelism II (5.73×), in which the priority is given to data decomposition. Comparatively, dual parallelism II provides the best performance by overlapping computation and data transmission, which is compatible with the current trend toward multicore architectures.
AB - The development of high-performance classifiers represents an important step in improving the timeliness of remote sensing classification in the era of high spatial resolution. The geographically weighted k-nearest neighbors (gwk-NN) classifier, which incorporates spatial information into the traditional k-NN classifier, has demonstrated better performance in mitigating salt-and-pepper noise and misclassification. However, the integration of spatial dependence into spectral information is computationally intensive. To improve the computing performance of the gwk-NN classifier, this study first considered two commonly used parallel strategies—data parallelism and task parallelism—in the model training and image classification stages. Then, our implementation of the corresponding parallel algorithms was carried out by calling message passing interface (MPI) and the geospatial data abstraction library (GDAL) in the C++ development environment on a standalone eight-core computer. Based on the performance of these two strategies, the potentiality of dual parallelism (the simultaneous exploitation of data and task parallelism) in image classification was further investigated. Our experimental results indicate that the parallel gwk-NN classifier can improve the classification efficiency of high-resolution remote sensing images with multiple land cover types. Specifically, the data parallelism method is more effective than the task parallelism method in both the model training and classification stages because of the minor effect of parallel overhead on the total execution time. In addition, dual parallelism can take advantage of data and task parallel strategies, as evidenced by the two largest speedups being attained under dual parallelism I (5.28×), which is based on the premise of task parallelism, and dual parallelism II (5.73×), in which the priority is given to data decomposition. Comparatively, dual parallelism II provides the best performance by overlapping computation and data transmission, which is compatible with the current trend toward multicore architectures.
UR - http://www.scopus.com/inward/record.url?scp=85063545491&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063545491&partnerID=8YFLogxK
U2 - 10.1016/j.cageo.2019.02.009
DO - 10.1016/j.cageo.2019.02.009
M3 - Article
C2 - 32581418
AN - SCOPUS:85063545491
SN - 0098-3004
VL - 127
SP - 111
EP - 122
JO - Computers and Geosciences
JF - Computers and Geosciences
ER -