ZTE Communications, 2024, Vol. 22, Issue 4: 67-77. DOI: 10.12142/ZTECOM.202404010
CHEN Hao1,2, ZHANG Kaijiong1,2, CHEN Jun1,2, ZHANG Ziwen1,2, JIA Xia1,2
Received: 2024-02-29
Online: 2024-12-20
Published: 2024-12-03
About the author:
CHEN Hao (chen.hao16@zte.com.cn) received his BS and MS degrees in control theory and control engineering from Harbin Engineering University, China, in 2018 and 2020, respectively. He has been engaged in deep learning technologies at ZTE Corporation since his graduation. His research interests include digital humans, SLAM, and image recognition.
CHEN Hao, ZHANG Kaijiong, CHEN Jun, ZHANG Ziwen, JIA Xia. Unsupervised Motion Removal for Dynamic SLAM[J]. ZTE Communications, 2024, 22(4): 67-77.
Table 1 Ablation experiments of UMR-SLAM, where the best results are displayed in bold
| Experiment | Configuration | K09 | K10 |
|---|---|---|---|
| Depth estimate | - | 11.527 | 4.775 |
| | PENet | 4.413 | 3.366 |
| | CompletionFormer | 3.569 | 2.748 |
| | GA-Net | 2.689 | 1.414 |
| Dynamic region removal | No | 5.131 | 2.351 |
| | Yes | 2.689 | 1.414 |
| Loop detection | - | 4.058 | 1.412 |
| | Flow | 3.835 | 1.414 |
| | Flow and feature (τ=0.5) | 3.665 | 1.414 |
| | Flow and feature (τ=1.2) | 2.689 | 1.420 |
| | Flow and feature (τ=2) | 3.872 | 1.427 |
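The "Flow and feature" rows in Table 1 sweep a threshold τ in the loop-detection module, which combines an optical-flow cue with a feature-similarity cue. The sketch below illustrates one way such a gated test could look; the function name, the choice to apply τ to the mean flow magnitude, and the similarity threshold are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def is_loop_candidate(flow, feat_query, feat_candidate, tau=1.2, sim_min=0.9):
    """Toy loop-closure test combining a flow cue with a feature-similarity cue.

    flow           : (H, W, 2) dense optical flow between the current frame
                     and a candidate keyframe
    feat_query     : (D,) global descriptor of the current frame
    feat_candidate : (D,) global descriptor of the candidate keyframe
    tau            : threshold on the mean flow magnitude (stands in for the
                     tau swept in Table 1; the paper's exact use may differ)
    sim_min        : minimum cosine similarity between the two descriptors

    All names and thresholds here are illustrative assumptions.
    """
    # A small average flow magnitude suggests the two views see the same place.
    mean_flow = float(np.linalg.norm(flow, axis=-1).mean())

    # Cosine similarity of the two global descriptors.
    sim = float(np.dot(feat_query, feat_candidate)
                / (np.linalg.norm(feat_query) * np.linalg.norm(feat_candidate) + 1e-8))

    return mean_flow < tau and sim > sim_min


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    flow = rng.normal(scale=0.3, size=(48, 64, 2))   # near-zero flow, i.e. a likely revisit
    f1 = rng.normal(size=128)
    print(is_loop_candidate(flow, f1, f1 + 0.01 * rng.normal(size=128)))
```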
Table 2 ATE [m] for dynamic SLAM on the KITTI (K) and Virtual KITTI 2 (VK) datasets, where we achieve the best results. All test results are based on RGB-D input
| Method | K09 | K10 | VK01 | VK02 | VK06 | VK18 | VK20 |
|---|---|---|---|---|---|---|---|
| DROID-SLAM | 5.453 | 2.514 | 0.197 | 0.192 | 0.007 | 1.030 | 3.041 |
| UMR-SLAM (ours) | 2.689 | 1.414 | 0.128 | 0.030 | 0.007 | 0.812 | 1.189 |
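Tables 2 and 3 report the absolute trajectory error (ATE) in meters. For reference, the following is a minimal sketch of the standard ATE-RMSE computation (rigid alignment of the estimated trajectory to the ground truth, then the root-mean-square of the residual translations); it follows the common benchmark definition and is not code from UMR-SLAM.

```python
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """ATE-RMSE [m] between two time-associated trajectories of shape (N, 3).

    Rigidly aligns the estimate to the ground truth (Horn/Umeyama without
    scale) and returns the root-mean-square of the residual translations.
    Generic metric definition, not UMR-SLAM-specific code.
    """
    est = np.asarray(est_xyz, dtype=float)
    gt = np.asarray(gt_xyz, dtype=float)
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)

    # Closed-form rotation from the SVD of the cross-covariance (Kabsch).
    H = (est - mu_e).T @ (gt - mu_g)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = mu_g - R @ mu_e

    aligned = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    gt = np.cumsum(rng.normal(size=(200, 3)), axis=0)            # synthetic ground-truth path
    theta = 0.3
    Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
    # Rotated, translated, and noisy copy of the ground truth as the "estimate".
    est = gt @ Rz.T + np.array([2.0, -1.0, 0.5]) + 0.05 * rng.normal(size=gt.shape)
    print(f"ATE-RMSE: {ate_rmse(est, gt):.3f} m")
```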
Table 3 Dynamic SLAM results on TUM dynamic sequences, measured as ATE [m]. The best results are displayed in bold
| Sequence type | Sequence | DVO-SLAM | ORB-SLAM2 | PointCorr | DROID-SLAM | Ours |
|---|---|---|---|---|---|---|
| Slightly dynamic | fr2/desk-person | 0.104 | 0.006 | 0.008 | 0.019 | 0.014 |
| | fr3/sitting-static | 0.012 | 0.008 | 0.010 | 0.006 | 0.007 |
| | fr3/sitting-xyz | 0.242 | 0.010 | 0.009 | 0.011 | 0.009 |
| | fr3/sitting-rpy | 0.176 | 0.025 | 0.023 | 0.022 | 0.020 |
| | fr3/sitting-halfsphere | 0.220 | 0.025 | 0.024 | 0.023 | 0.022 |
| Highly dynamic | fr3/walking-static | 0.752 | 0.408 | 0.011 | 0.007 | 0.004 |
| | fr3/walking-xyz | 1.383 | 0.722 | 0.087 | 0.015 | 0.013 |
| | fr3/walking-rpy | 1.292 | 0.805 | 0.161 | 0.050 | 0.045 |
| | fr3/walking-halfsphere | 1.014 | 0.723 | 0.035 | 0.029 | 0.032 |
[1] ZHONG F W, WANG S, ZHANG Z Q, et al. Detect-SLAM: making object detection and SLAM mutually beneficial [C]//Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2018: 1001–1010. DOI: 10.1109/WACV.2018.00115
[2] WU W X, GUO L, GAO H L, et al. YOLO-SLAM: a semantic SLAM system towards dynamic environment with geometric constraint [J]. Neural computing and applications, 2022, 34(8): 6011–6026. DOI: 10.1007/s00521-021-06764-3
[3] YU C, LIU Z X, LIU X J, et al. DS-SLAM: a semantic visual SLAM towards dynamic environments [C]//International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018: 1168–1174. DOI: 10.1109/IROS.2018.8593691
[4] LIU Y B, MIURA J. RDS-SLAM: real-time dynamic SLAM using semantic segmentation methods [J]. IEEE access, 2021, 9: 23772–23785. DOI: 10.1109/ACCESS.2021.3050617
[5] BESCOS B, FÁCIL J M, CIVERA J, et al. DynaSLAM: tracking, mapping, and inpainting in dynamic scenes [J]. IEEE robotics and automation letters, 2018, 3(4): 4076–4083. DOI: 10.1109/LRA.2018.2860039
[6] WANG C J, LUO B, ZHANG Y, et al. DymSLAM: 4D dynamic scene reconstruction based on geometrical motion segmentation [J]. IEEE robotics and automation letters, 2021, 6(2): 550–557. DOI: 10.1109/LRA.2020.3045647
[7] SUN Y X, LIU M, MENG Q H. Motion removal for reliable RGB-D SLAM in dynamic environments [J]. Robotics and autonomous systems, 2018, 108: 115–128. DOI: 10.1016/j.robot.2018.07.002
[8] DAI W C, ZHANG Y, LI P, et al. RGB-D SLAM in dynamic environments using point correlations [J]. IEEE transactions on pattern analysis and machine intelligence, 2022, 44(1): 373–389. DOI: 10.1109/TPAMI.2020.3010942
[9] YUAN C F, XU Y L, ZHOU Q. PLDS-SLAM: point and line features SLAM in dynamic environment [J]. Remote sensing, 2023, 15(7): 1893. DOI: 10.3390/rs15071893
[10] ZHANG J, HENEIN M, MAHONY R, et al. VDO-SLAM: a visual dynamic object-aware SLAM system [EB/OL]. (2020-05-22) [2021-12-14].
[11] CHO H M, KIM E. Dynamic object-aware visual odometry (VO) estimation based on optical flow matching [J]. IEEE access, 2023, 11: 11642–11651. DOI: 10.1109/ACCESS.2023.3241961
[12] YE W C, LAN X Y, CHEN S, et al. PVO: panoptic visual odometry [C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023: 9579–9589. DOI: 10.1109/CVPR52729.2023.00924
[13] SHEN S H, CAI Y L, WANG W S, et al. DytanVO: joint refinement of visual odometry and motion segmentation in dynamic environments [C]//International Conference on Robotics and Automation (ICRA). IEEE, 2023: 4048–4055. DOI: 10.1109/ICRA48891.2023.10161306
[14] DOSOVITSKIY A, FISCHER P, ILG E, et al. FlowNet: learning optical flow with convolutional networks [C]//International Conference on Computer Vision (ICCV). IEEE, 2015: 2758–2766. DOI: 10.1109/ICCV.2015.316
[15] ILG E, MAYER N, SAIKIA T, et al. FlowNet 2.0: evolution of optical flow estimation with deep networks [C]//Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 1647–1655. DOI: 10.1109/CVPR.2017.179
[16] JONSCHKOWSKI R, STONE A, BARRON J T, et al. What matters in unsupervised optical flow [M]//Lecture notes in computer science. Cham: Springer International Publishing, 2020: 557–572. DOI: 10.1007/978-3-030-58536-5_33
[17] RANJAN A, BLACK M J. Optical flow estimation using a spatial pyramid network [C]//Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2720–2729. DOI: 10.1109/CVPR.2017.291
[18] SUN D Q, YANG X D, LIU M Y, et al. PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume [C]//Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2018: 8934–8943. DOI: 10.1109/CVPR.2018.00931
[19] TEED Z, DENG J. RAFT: recurrent all-pairs field transforms for optical flow [M]//Lecture notes in computer science. Cham: Springer International Publishing, 2020: 402–419. DOI: 10.1007/978-3-030-58536-5_24
[20] TEED Z, DENG J. RAFT-3D: scene flow using rigid-motion embeddings [C]//Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021: 8371–8380. DOI: 10.1109/CVPR46437.2021.00827
[21] REN Z L, GALLO O, SUN D Q, et al. A fusion approach for multi-frame optical flow estimation [C]//IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2019: 2077–2086. DOI: 10.1109/WACV.2019.00225
[22] SHI H, ZHOU Y F, YANG K L, et al. CSFlow: learning optical flow via cross strip correlation for autonomous driving [C]//Intelligent Vehicles Symposium (IV). IEEE, 2022: 1851–1858. DOI: 10.1109/IV51971.2022.9827341
[23] GARREPALLI R, JEONG J, RAVINDRAN R C, et al. DIFT: dynamic iterative field transforms for memory efficient optical flow [C]//Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2023: 2220–2229. DOI: 10.1109/CVPRW59228.2023.00216
[24] HUI T W, TANG X O, LOY C C. A lightweight optical flow CNN: revisiting data fidelity and regularization [J]. IEEE transactions on pattern analysis and machine intelligence, 2021, 43(8): 2555–2569. DOI: 10.1109/TPAMI.2020.2976928
[25] GALVEZ-LÓPEZ D, TARDOS J D. Bags of binary words for fast place recognition in image sequences [J]. IEEE transactions on robotics, 2012, 28(5): 1188–1197. DOI: 10.1109/TRO.2012.2197158
[26] SUENDERHAUF N, SHIRAZI S, JACOBSON A, et al. Place recognition with ConvNet landmarks: viewpoint-robust, condition-robust, training-free [C]//Proceedings of Robotics: Science and Systems XI. Robotics: Science and Systems Foundation, 2015: 1–10. DOI: 10.15607/rss.2015.xi.022
[27] GAO X, ZHANG T. Unsupervised learning to detect loops using deep neural networks for visual SLAM system [J]. Autonomous robots, 2017, 41(1): 1–18. DOI: 10.1007/s10514-015-9516-2
[28] MERRILL N, HUANG G Q. Lightweight unsupervised deep loop closure [EB/OL]. (2018-05-24) [2023-10-10].
[29] MEMON A R, WANG H S, HUSSAIN A. Loop closure detection using supervised and unsupervised deep neural networks for monocular SLAM systems [J]. Robotics and autonomous systems, 2020, 126: 103470. DOI: 10.1016/j.robot.2020.103470
[30] TEED Z, DENG J. DROID-SLAM: deep visual SLAM for monocular, stereo, and RGB-D cameras [J]. Advances in neural information processing systems, 2021, 34: 16558–16569
[31] ZHANG Y M, GUO X D, POGGI M, et al. CompletionFormer: depth completion with convolutions and vision transformers [C]//Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023: 18527–18536. DOI: 10.1109/CVPR52729.2023.01777
[32] ZHANG F H, PRISACARIU V, YANG R G, et al. GA-Net: guided aggregation net for end-to-end stereo matching [C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019: 185–194. DOI: 10.1109/CVPR.2019.00027
[33] CABON Y, MURRAY N, HUMENBERGER M. Virtual KITTI 2 [EB/OL]. (2020-01-29) [2023-10-10].
[34] GEIGER A, LENZ P, STILLER C, et al. Vision meets robotics: the KITTI dataset [J]. The international journal of robotics research, 2013, 32(11): 1231–1237. DOI: 10.1177/0278364913491297
[35] SCHUBERT D, GOLL T, DEMMEL N, et al. The TUM VI benchmark for evaluating visual-inertial odometry [C]//International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018: 1680–1687. DOI: 10.1109/IROS.2018.8593419
[36] MUR-ARTAL R, TARDÓS J D. ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras [J]. IEEE transactions on robotics, 2017, 33(5): 1255–1262. DOI: 10.1109/TRO.2017.2705103
[37] BESCOS B, FÁCIL J M, CIVERA J, et al. DynaSLAM: tracking, mapping, and inpainting in dynamic scenes [J]. IEEE robotics and automation letters, 2018, 3(4): 4076–4083. DOI: 10.1109/LRA.2018.2860039
[38] KERL C, STURM J, CREMERS D. Robust odometry estimation for RGB-D cameras [C]//IEEE International Conference on Robotics and Automation. IEEE, 2013: 3748–3754. DOI: 10.1109/ICRA.2013.6631104
[39] DAI W C, ZHANG Y, LI P, et al. RGB-D SLAM in dynamic environments using point correlations [J]. IEEE transactions on pattern analysis and machine intelligence, 2022, 44(1): 373–389. DOI: 10.1109/TPAMI.2020.3010942
[40] YE W C, YU X Y, LAN X Y, et al. DeFlowSLAM: self-supervised scene motion decomposition for dynamic dense SLAM [EB/OL]. [2023-10-10].