Multi-View Image-Based 3D Reconstruction in Indoor Scenes: A Survey

doi:10.12142/ZTECOM.202403011

ZTE Communications, 2024, Vol. 22, Issue 3: 91-98. DOI: 10.12142/ZTECOM.202403011

Received: 2023-09-14; Online: 2024-09-25; Published: 2024-09-29


LU Ping¹,², SHI Wenzhe¹,², QIAO Xiuquan³

¹ State Key Laboratory of Mobile Network and Mobile Multimedia Technology, Shenzhen 518055, China
² ZTE Corporation, Shenzhen 518057, China
³ State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China

About the authors: LU Ping (lu.ping@zte.com.cn) is the deputy president of ZTE Corporation, where he is also the general manager of the Industrial Digitalization Solution Dept. and the executive deputy director of the State Key Laboratory of Mobile Network and Mobile Multimedia Technology. His research interests include cloud computing, big data, augmented reality, and multimedia service-based technologies. He has supported and participated in multiple major national science and technology projects and national science and technology support projects. He has published multiple papers and authored two books.
SHI Wenzhe (shi.wenzhe@zte.com.cn) is a strategy planner at ZTE Corporation, where he is also an engineer for XRExplore Platform Product Planning and a member of the National Key Laboratory for Mobile Network and Mobile Multimedia Technology. His research interests include indoor visual AR navigation, SFM 3D reconstruction, visual SLAM, real-time cloud rendering, VR, and spatial perception.
QIAO Xiuquan is currently a full professor at Beijing University of Posts and Telecommunications, China, where he is also the deputy director of the Network Service Foundation Research Center, State Key Laboratory of Networking and Switching Technology. He has authored or co-authored over 60 technical papers in international journals and at conferences, including IEEE Communications Magazine, Proceedings of the IEEE, Computer Networks, IEEE Internet Computing, IEEE Transactions on Automation Science and Engineering, and ACM SIGCOMM Computer Communication Review. His current research interests include the future Internet, services computing, computer vision, distributed deep learning, augmented reality, virtual reality, and 5G networks. Dr. QIAO was a recipient of the Beijing Nova Program in 2008 and the First Prize of the 13th Beijing Youth Outstanding Science and Technology Paper Award in 2016. He has served as an associate editor for Computing (Springer) and on the editorial board of China Communications.
Supported by:
ZTE Industry-University-Institute Cooperation Funds (HC-CN-20221102002)


Abstract:

Three-dimensional reconstruction technology plays an important role in indoor scenes by converting the objects and structures of indoor environments into accurate 3D models from multi-view RGB images. It has a wide range of applications in fields such as virtual reality, augmented reality, indoor navigation, and game development. Existing methods based on multi-view RGB images have made significant progress in 3D reconstruction. These image-based reconstruction methods not only offer good expressive power and generalization performance, but also handle complex geometric shapes and textures effectively. Although indoor scenes pose challenges such as lighting variations, occlusion, and texture loss, these can be addressed effectively through deep neural networks, neural implicit surface representations, and other techniques. Indoor 3D reconstruction based on multi-view RGB images has a promising future. It not only provides immersive and interactive virtual experiences but also brings convenience and innovation to indoor navigation, interior design, and virtual tours. As the technology evolves, these image-based reconstruction methods will be further improved to provide higher-quality and more accurate solutions for indoor scene reconstruction.

Key words: 3D reconstruction, MVS, NeRF, neural implicit surface


LU Ping, SHI Wenzhe, QIAO Xiuquan. Multi-View Image-Based 3D Reconstruction in Indoor Scenes: A Survey[J]. ZTE Communications, 2024, 22(3): 91-98.


