Unsupervised Motion Removal for Dynamic SLAM

doi:10.12142/ZTECOM.202404010

ZTE Communications ›› 2024, Vol. 22 ›› Issue (4): 67-77.DOI: 10.12142/ZTECOM.202404010

• Research Papers •

CHEN Hao^1,2, ZHANG Kaijiong^1,2, CHEN Jun^1,2, ZHANG Ziwen^1,2, JIA Xia^1,2

^1. ZTE Corporation, Shenzhen 518057, China
^2. State Key Laboratory of Mobile Network and Mobile Multimedia Technology, Shenzhen 518055, China

Received: 2024-02-29 Online: 2024-12-20 Published: 2024-12-03
About the authors: CHEN Hao (chen.hao16@zte.com.cn) received his BS and MS degrees in control theory and control engineering from Harbin Engineering University, China in 2018 and 2020, respectively. He has worked on deep learning technologies at ZTE Corporation since graduating. His research interests include digital humans, SLAM, and image recognition.
ZHANG Kaijiong received his MS degree from Shanghai Jiao Tong University, China in 2020. He is currently an algorithm engineer with ZTE Corporation. His research interests include computer vision, image/video processing and artificial intelligence.
CHEN Jun received his master’s degree in aerospace science and technology from Nanjing University of Aeronautics and Astronautics, China. He has been engaged in the R&D of computer graphics, computer vision, and cloud computing for more than 10 years in ZTE Corporation, and has accumulated rich experience in solution and engineering.
ZHANG Ziwen received his bachelor’s degree in instrument science and technology and his master’s degree in instrument engineering from Harbin Institute of Technology, China in 2018 and 2020, respectively. After graduation, he joined ZTE Corporation as a computer vision algorithm engineer. He has long been engaged in algorithm research, design, improvement, and end-to-end deployment optimization in face detection and recognition, image matching, SLAM, digital human generation, and portrait stylization migration, and has accumulated rich experience in these fields.
JIA Xia received her BS and MS degrees in control theory and control engineering from Taiyuan University of Technology, China, and Dalian University of Technology, China in 1995 and 2001, respectively. She joined ZTE Corporation in 2001 and worked in the State Key Laboratory of Mobile Network and Mobile Multimedia Technology. Her main research interests include deep learning techniques, face detection and recognition, Re-ID, and activity detection and recognition.

Abstract

We propose UMR-SLAM, a deep learning-based dynamic RGBD simultaneous localization and mapping (SLAM) system with unsupervised motion removal. It is the first scheme to combine scene flow with deep learning-based SLAM to improve accuracy in dynamic scenes, where moving objects disturb camera pose estimation. The entire process requires no explicit object segmentation as supervisory information. We also propose a loop detection scheme for the backend optimization stage of the SLAM system that combines optical flow and feature similarity to improve loop detection accuracy. UMR-SLAM is built on the DROID-SLAM code architecture. Experiments on different datasets show that our scheme achieves higher pose accuracy in dynamic scenarios than current advanced SLAM algorithms.

Key words: dynamic RGBD SLAM, update module, motion estimation, scene flow
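The abstract describes removing dynamic regions without segmentation supervision. As a rough illustration of the general idea only, and not the authors' learned scene-flow module, the following sketch flags pixels whose observed optical flow disagrees with the flow that camera motion alone would induce. The function names, the pinhole intrinsics `K`, the known pose `(R, t)`, and the pixel threshold `thresh_px` are all assumptions made for this example.

```python
import numpy as np


def rigid_flow(depth, K, R, t):
    """Flow (in pixels) that a static scene would show under camera motion (R, t).

    Hypothetical helper: assumes a pinhole camera with intrinsics K and a
    dense depth map for the first frame.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # (h, w, 3) homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                    # back-project pixels to rays
    pts1 = rays * depth[..., None]                     # 3D points in frame 1
    pts2 = pts1 @ R.T + t                              # same points seen from frame 2
    proj = pts2 @ K.T                                  # project into frame 2
    uv2 = proj[..., :2] / proj[..., 2:3]
    return uv2 - pix[..., :2]


def dynamic_mask(observed_flow, depth, K, R, t, thresh_px=1.0):
    """Mark pixels whose flow cannot be explained by camera motion alone.

    thresh_px is an illustrative value, not a setting from the paper.
    """
    residual = observed_flow - rigid_flow(depth, K, R, t)
    return np.linalg.norm(residual, axis=-1) > thresh_px
```

Pixels where the mask is true would be excluded from pose estimation in this simplified picture. A learned system can replace the hard threshold with predicted per-pixel weights.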

CHEN Hao, ZHANG Kaijiong, CHEN Jun, ZHANG Ziwen, JIA Xia. Unsupervised Motion Removal for Dynamic SLAM[J]. ZTE Communications, 2024, 22(4): 67-77.

Figures/Tables 9

References 40

1	ZHONG F W, WANG S, ZHANG Z Q, et al. Detect-SLAM: making object detection and SLAM mutually beneficial [C]//Proceedings of IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2018: 1001–1010. DOI: 10.1109/WACV.2018.00115
2	WU W X, GUO L, GAO H L, et al. YOLO-SLAM: a semantic SLAM system towards dynamic environment with geometric constraint [J]. Neural computing and applications, 2022, 34(8): 6011–6026. DOI: 10.1007/s00521-021-06764-3
3	YU C, LIU Z X, LIU X J, et al. DS-SLAM: a semantic visual SLAM towards dynamic environments [C]//International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018: 1168–1174. DOI: 10.1109/IROS.2018.8593691
4	LIU Y B, MIURA J. RDS-SLAM: real-time dynamic SLAM using semantic segmentation methods [J]. IEEE access, 2021, 9: 23772–23785. DOI: 10.1109/ACCESS.2021.3050617
5	BESCOS B, FÁCIL J M, CIVERA J, et al. DynaSLAM: tracking, mapping, and inpainting in dynamic scenes [J]. IEEE robotics and automation letters, 2018, 3(4): 4076–4083. DOI: 10.1109/LRA.2018.2860039
6	WANG C J, LUO B, ZHANG Y, et al. DymSLAM: 4D dynamic scene reconstruction based on geometrical motion segmentation [J]. IEEE robotics and automation letters, 2021, 6(2): 550–557. DOI: 10.1109/LRA.2020.3045647
7	SUN Y X, LIU M, MENG Q H. Motion removal for reliable RGB-D SLAM in dynamic environments [J]. Robotics and autonomous systems, 2018, 108: 115–128. DOI: 10.1016/j.robot.2018.07.002
8	DAI W C, ZHANG Y, LI P, et al. RGB-D SLAM in dynamic environments using point correlations [J]. IEEE transactions on pattern analysis and machine intelligence, 2022, 44(1): 373–389. DOI: 10.1109/TPAMI.2020.3010942
9	YUAN C F, XU Y L, ZHOU Q. PLDS-SLAM: point and line features SLAM in dynamic environment [J]. Remote sensing, 2023, 15(7): 1893. DOI: 10.3390/rs15071893
10	ZHANG J, HENEIN M, MAHONY R, et al. VDO-SLAM: a visual dynamic object-aware SLAM system [EB/OL]. (2020-05-22) [2021-12-14].
11	CHO H M, KIM E. Dynamic object-aware visual odometry (VO) estimation based on optical flow matching [J]. IEEE access, 2023, 11: 11642–11651. DOI: 10.1109/ACCESS.2023.3241961
12	YE W C, LAN X Y, CHEN S, et al. PVO: panoptic visual odometry [C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023: 9579–9589. DOI: 10.1109/CVPR52729.2023.00924
13	SHEN S H, CAI Y L, WANG W S, et al. DytanVO: joint refinement of visual odometry and motion segmentation in dynamic environments [C]//International Conference on Robotics and Automation (ICRA). IEEE, 2023: 4048–4055. DOI: 10.1109/ICRA48891.2023.10161306
14	DOSOVITSKIY A, FISCHER P, ILG E, et al. FlowNet: learning optical flow with convolutional networks [C]//International Conference on Computer Vision (ICCV). IEEE, 2015: 2758–2766. DOI: 10.1109/ICCV.2015.316
15	ILG E, MAYER N, SAIKIA T, et al. FlowNet 2.0: evolution of optical flow estimation with deep networks [C]//Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 1647–1655. DOI: 10.1109/CVPR.2017.179
16	JONSCHKOWSKI R, STONE A, BARRON J T, et al. What matters in unsupervised optical flow [M]//Lecture notes in computer science. Cham: Springer International Publishing, 2020: 557–572. DOI: 10.1007/978-3-030-58536-5_33
17	RANJAN A, BLACK M J. Optical flow estimation using a spatial pyramid network [C]//Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017: 2720–2729. DOI: 10.1109/CVPR.2017.291
18	SUN D Q, YANG X D, LIU M Y, et al. PWC-net: CNNs for optical flow using pyramid, warping, and cost volume [C]//Conference on Computer Vision and Pattern Recognition. IEEE, 2018: 8934–8943. DOI: 10.1109/CVPR.2018.00931
19	TEED Z, DENG J. RAFT: recurrent all-pairs field transforms for optical flow [M]//Lecture notes in computer science. Cham: Springer International Publishing, 2020: 402–419. DOI: 10.1007/978-3-030-58536-5_24
20	TEED Z, DENG J. RAFT-3D: scene flow using rigid-motion embeddings [C]//Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021: 8371–8380. DOI: 10.1109/CVPR46437.2021.00827
21	REN Z L, GALLO O, SUN D Q, et al. A fusion approach for multi-frame optical flow estimation [C]//IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2019: 2077–2086. DOI: 10.1109/WACV.2019.00225
22	SHI H, ZHOU Y F, YANG K L, et al. CSFlow: learning optical flow via cross strip correlation for autonomous driving [C]//Intelligent Vehicles Symposium (IV). IEEE, 2022: 1851–1858. DOI: 10.1109/IV51971.2022.9827341
23	GARREPALLI R, JEONG J, RAVINDRAN R C, et al. DIFT: dynamic iterative field transforms for memory efficient optical flow [C]//Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). IEEE, 2023: 2220–2229. DOI: 10.1109/CVPRW59228.2023.00216
24	HUI T W, TANG X O, LOY C C. A lightweight optical flow CNN: revisiting data fidelity and regularization [J]. IEEE transactions on pattern analysis and machine intelligence, 2021, 43(8): 2555–2569. DOI: 10.1109/TPAMI.2020.2976928
25	GÁLVEZ-LÓPEZ D, TARDÓS J D. Bags of binary words for fast place recognition in image sequences [J]. IEEE transactions on robotics, 2012, 28(5): 1188–1197. DOI: 10.1109/TRO.2012.2197158
26	SUENDERHAUF N, SHIRAZI S, JACOBSON A, et al. Place recognition with ConvNet landmarks: viewpoint-robust, condition-robust, training-free [C]//Proceedings of Robotics: Science and Systems XI. Robotics: Science and Systems Foundation, 2015: 1–10. DOI: 10.15607/rss.2015.xi.022
27	GAO X, ZHANG T. Unsupervised learning to detect loops using deep neural networks for visual SLAM system [J]. Autonomous robots, 2017, 41(1): 1–18. DOI: 10.1007/s10514-015-9516-2
28	MERRILL N, HUANG G Q. Lightweight unsupervised deep loop closure [EB/OL]. (2018-05-24) [2023-10-10].
29	MEMON A R, WANG H S, HUSSAIN A. Loop closure detection using supervised and unsupervised deep neural networks for monocular SLAM systems [J]. Robotics and autonomous systems, 2020, 126: 103470. DOI: 10.1016/j.robot.2020.103470
30	TEED Z, DENG J. DROID-SLAM: deep visual SLAM for monocular, stereo, and RGB-D cameras [J]. Advances in neural information processing systems, 2021, 34: 16558–16569
31	ZHANG Y M, GUO X D, POGGI M, et al. CompletionFormer: depth completion with convolutions and vision transformers [C]//Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023: 18527–18536. DOI: 10.1109/CVPR52729.2023.01777
32	ZHANG F H, PRISACARIU V, YANG R G, et al. GA-net: guided aggregation net for end-to-end stereo matching [C]//Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019: 185–194. DOI: 10.1109/CVPR.2019.00027
33	CABON Y, MURRAY N, HUMENBERGER M. Virtual KITTI 2 [EB/OL]. (2020-01-29) [2023-10-10].
34	GEIGER A, LENZ P, STILLER C, et al. Vision meets robotics: the KITTI dataset [J]. The international journal of robotics research, 2013, 32(11): 1231–1237. DOI: 10.1177/0278364913491297
35	SCHUBERT D, GOLL T, DEMMEL N, et al. The TUM VI benchmark for evaluating visual-inertial odometry [C]//International Conference on Intelligent Robots and Systems (IROS). IEEE, 2018: 1680–1687. DOI: 10.1109/IROS.2018.8593419
36	MUR-ARTAL R, TARDÓS J D. ORB-SLAM2: an open-source SLAM system for monocular, stereo, and RGB-D cameras [J]. IEEE transactions on robotics, 2017, 33(5): 1255–1262. DOI: 10.1109/TRO.2017.2705103
37	BESCOS B, FÁCIL J M, CIVERA J, et al. DynaSLAM: tracking, mapping, and inpainting in dynamic scenes [J]. IEEE robotics and automation letters, 2018, 3(4): 4076–4083. DOI: 10.1109/LRA.2018.2860039
38	KERL C, STURM J, CREMERS D. Robust odometry estimation for RGB-D cameras [C]//IEEE International Conference on Robotics and Automation. IEEE, 2013: 3748–3754. DOI: 10.1109/ICRA.2013.6631104
39	DAI W C, ZHANG Y, LI P, et al. RGB-D SLAM in dynamic environments using point correlations [J]. IEEE transactions on pattern analysis and machine intelligence, 2022, 44(1): 373–389. DOI: 10.1109/TPAMI.2020.3010942
40	YE W C, YU X Y, LAN X Y, et al. DeFlowSLAM: self-supervised scene motion decomposition for dynamic dense SLAM [EB/OL]. [2023-10-10].

Experiment	Configuration	τ	K09	K10
Depth estimate	-	-	11.527	4.775
Depth estimate	PENet	-	4.413	3.366
Depth estimate	CompletionFormer	-	3.569	2.748
Depth estimate	GA-net	-	2.689	1.414
Dynamic region removal	No	-	5.131	2.351
Dynamic region removal	Yes	-	2.689	1.414
Loop detection	-	-	4.058	1.412
Loop detection	Flow	-	3.835	1.414
Loop detection	Flow and feature	0.5	3.665	1.414
Loop detection	Flow and feature	1.2	2.689	1.420
Loop detection	Flow and feature	2	3.872	1.427
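The loop detection rows above compare a flow-only check with one that also uses feature similarity. A minimal sketch of how two such cues might be combined is given below. It is not the authors' implementation: the function names, the mean-flow gate `flow_gate_px`, and the cosine-similarity floor `sim_min` are hypothetical, and because this page does not define τ, the sketch does not attempt to model it.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two global feature descriptors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    # Small epsilon guards against division by zero for all-zero descriptors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def is_loop_candidate(mean_flow_px, feat_i, feat_j,
                      flow_gate_px=16.0, sim_min=0.8):
    """Accept a frame pair as a loop candidate only if both cues agree.

    mean_flow_px: mean optical-flow magnitude between the two frames.
    flow_gate_px, sim_min: illustrative thresholds, not values from the paper.
    """
    flow_ok = mean_flow_px < flow_gate_px                   # views overlap enough
    feat_ok = cosine_similarity(feat_i, feat_j) > sim_min   # appearance matches
    return bool(flow_ok and feat_ok)
```

Requiring both cues means a pair with small apparent motion but different appearance is rejected, as is a pair that looks similar but lies far apart in flow terms.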

Method	K09	K10	VK01	VK02	VK06	VK18	VK20
DROID-SLAM	5.453	2.514	0.197	0.192	0.007	1.030	3.041
Our UMR-SLAM	2.689	1.414	0.128	0.030	0.007	0.812	1.189
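To make the comparison above easier to read, the short script below computes the relative reduction of each value against DROID-SLAM. It assumes the table entries are trajectory errors where lower is better; this page does not name the metric or its unit. The dictionary and function names are chosen for this example only.

```python
# Values copied from the comparison table above.
droid_slam = {"K09": 5.453, "K10": 2.514, "VK01": 0.197, "VK02": 0.192,
              "VK06": 0.007, "VK18": 1.030, "VK20": 3.041}
umr_slam = {"K09": 2.689, "K10": 1.414, "VK01": 0.128, "VK02": 0.030,
            "VK06": 0.007, "VK18": 0.812, "VK20": 1.189}


def relative_reduction(baseline, ours):
    """Fractional reduction (baseline - ours) / baseline for each sequence."""
    return {seq: (baseline[seq] - ours[seq]) / baseline[seq] for seq in baseline}


reductions = relative_reduction(droid_slam, umr_slam)

for seq, frac in reductions.items():
    print(f"{seq}: {100.0 * frac:.1f}% lower than DROID-SLAM")
```

Under that assumption, the reduction is about 51% on K09 and 44% on K10, while VK06 is unchanged.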

Method	Input Modes	DVO-SLAM^[38]	ORB-SLAM2	PointCorr^[39]	DROID-SLAM	Ours
Slightly dynamic	fr2/desk-person	0.104	0.006	0.008	0.019	0.014
	fr3/sitting-static	0.012	0.008	0.010	0.006	0.007
	fr3/sitting-xyz	0.242	0.010	0.009	0.011	0.009
	fr3/sitting-rpy	0.176	0.025	0.023	0.022	0.020
	fr3/sitting-halfsphere	0.220	0.025	0.024	0.023	0.022
Highly dynamic	fr3/walking-static	0.752	0.408	0.011	0.007	0.004
	fr3/walking-xyz	1.383	0.722	0.087	0.015	0.013
	fr3/walking-rpy	1.292	0.805	0.161	0.050	0.045
	fr3/walking-halfsphere	1.014	0.723	0.035	0.029	0.032
