Key Technologies in Mobile Visual Search and MPEG Standardization Activities

ZTE Communications ›› 2012, Vol. 10 ›› Issue (2): 57-66.

• Special Topic •

Ling-Yu Duan, Jie Chen, Chunyu Wang, Rongrong Ji, Tiejun Huang, and Wen Gao

Institute of Digital Media, Peking University, Beijing 100871, China

Received: 2012-03-08; Online: 2012-06-25; Published: 2012-06-25
About the authors: Ling-Yu Duan (lingyu@pku.edu.cn) received his MSc degree in automation from The University of Science and Technology, China, in 1999. He received his MSc degree in computer science from the National University of Singapore in 2002 and his PhD degree in information technology from The University of Newcastle, Australia, in 2007. From 2003 to 2008, he was a research scientist at the Institute for Infocomm Research, Singapore. Since 2008, he has been an associate professor at the School of Electrical Engineering and Computer Science at Peking University. His research interests include visual search and reality augmentation, multimedia content analysis, and mobile media computing. He has authored more than 70 papers in these areas.

Jie Chen (cjie@pku.edu.cn) is a PhD candidate at the School of Electrical Engineering and Computer Science, Peking University. His research interests include mobile visual search, low bit-rate visual descriptors, and vector quantization. He has published more than 10 journal and conference papers.

Chunyu Wang (wangchunyu@pku.edu.cn) is a PhD candidate at the School of Electrical Engineering and Computer Science, Peking University. His research interests include visual search, object recognition, and activity recognition.

Rongrong Ji (rrji@pku.edu.cn) received his PhD degree in computer science from Harbin Institute of Technology, China. He is currently a postdoctoral research fellow at Columbia University. His research interests include image and video search, content analysis and understanding, mobile visual search and recognition, and interactive human-computer interfaces. Dr. Ji received a Microsoft Fellowship in 2007 and the Best Paper Award at ACM Multimedia 2011.

Tiejun Huang (tjhuang@pku.edu.cn) received his BSc and MSc degrees in computer science from Wuhan University of Technology in 1992 and 1995, respectively. He received his PhD degree in pattern recognition and image analysis from Huazhong University of Science and Technology, China, in 1998. He is currently a professor in the School of Electrical Engineering and Computer Science, Peking University. He is also vice director of the National Engineering Laboratory for Video Technology of China. His research interests include video coding, image understanding, digital rights management (DRM), and digital libraries. He has published more than sixty peer-reviewed papers and has authored or co-authored three books. He is a member of the board of directors for the Digital Media Project, the advisory board for IEEE Computing Now, the editorial board of the Journal on 3D Research, and the board of the Chinese Institute of Electronics.

Wen Gao (wgao@pku.edu.cn) received his MSc degree in computer science from Harbin Institute of Technology, China, in 1985. He received his PhD degree in electronics engineering from the University of Tokyo in 1991. He is a professor in the School of Electronics Engineering and Computer Science, Peking University. He has led research efforts in video coding, face recognition, sign language recognition and synthesis, and multimedia retrieval. Professor Gao became an IEEE Fellow in 2010 for his contributions to video coding technology and was admitted as an Academician of the Chinese Academy of Engineering in 2011. He has been on the editorial boards of IEEE Trans. on Multimedia, IEEE Trans. Circuits Syst. for Video Tech., and several other top international academic journals. He chaired the IEEE Int. Conf. Multimedia & Expo (ICME) 2007 and the ACM Int. Conf. Multimedia (ACM-MM) 2009. He has authored four books and published more than 500 research papers on video coding, signal processing, computer vision, and pattern recognition.

Abstract

Visual search has been a long-standing problem in applications such as location recognition and product search. Much research has been done on image representation, matching, indexing, and retrieval. Key component technologies for visual search have been developed, and numerous real-world applications are emerging. To ensure application interoperability, the Moving Picture Experts Group (MPEG) has begun standardizing visual search technologies and is developing the compact descriptors for visual search (CDVS) standard. MPEG seeks to develop a collaborative platform for evaluating existing visual search technologies. Peking University has participated in this standardization since the 94th MPEG meeting, and significant progress has been made with the various proposals. A test model (TM) has been selected to determine the basic pipeline and key components of visual search. However, the first version of the TM has high computational complexity and imperfect retrieval and matching performance. Core experiments have therefore been set up to improve the TM. In this article, we summarize key technologies for visual search and report the progress of MPEG CDVS. We also discuss Peking University’s efforts in CDVS and the issues that remain unresolved.

Key words: visual search, mobile, visual descriptors, low bit rate, compression

Ling-Yu Duan, Jie Chen, Chunyu Wang, Rongrong Ji, Tiejun Huang, and Wen Gao. Key Technologies in Mobile Visual Search and MPEG Standardization Activities[J]. ZTE Communications, 2012, 10(2): 57-66.
