TY - JOUR
T1 - Swin-T-NFC CRFs
T2 - An encoder–decoder neural model for high-precision UAV positioning via point cloud super resolution and image semantic segmentation
AU - Wang, Suhong
AU - Wang, Hongqing
AU - She, Shufeng
AU - Zhang, Yanping
AU - Qiu, Qingju
AU - Xiao, Zhifeng
N1 - Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2023/1/1
Y1 - 2023/1/1
AB - 3D point clouds and remote sensing images have been the primary data types for the development of high-precision positioning systems. Equipped with a LiDAR and a camera, an Unmanned Aerial Vehicle (UAV) can explore uncharted territory and gather both 3D scans and aerial images in real time to dynamically inspect the surroundings. However, the high cost of a high-resolution LiDAR hinders the development of the perception module of a UAV. It is also essential to adopt accurate image semantic segmentation (SemSeg) algorithms to better understand the sensing environment. As hardware advancement is ongoing, support from the software side is crucial. A promising strategy for cost control in building a LiDAR-based positioning system is point cloud super-resolution (SupRes), a technique that improves point cloud resolution via algorithms. This study investigates a deep learning-based framework that adopts a classic encoder–decoder structure for both point cloud SupRes and image SemSeg. Unlike prior studies that mainly use convolutional neural networks (CNNs) for feature extraction, our model, named Swin-T-NFC CRFs, consists of a Vision Transformer (ViT)-based encoder and a fully connected conditional random fields (FC-CRFs)-based decoder, connected via a pyramid pooling module and multiple skip connections. Moreover, both the encoder and decoder are coupled with a shifted-window strategy that allows cross-window connection. As such, patches from different windows of the feature map can participate in self-attention computation, leading to more powerful modeling ability. Experimental results demonstrate that our method can effectively boost prediction accuracy, reduce error, and consistently outperform state-of-the-art methods on simulated and real-world point cloud datasets and the Urban Drone Dataset version 6.
UR - http://www.scopus.com/inward/record.url?scp=85141319485&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141319485&partnerID=8YFLogxK
DO - 10.1016/j.comcom.2022.10.011
M3 - Article
AN - SCOPUS:85141319485
SN - 0140-3664
VL - 197
SP - 52
EP - 60
JO - Computer Communications
JF - Computer Communications
ER -