ZTE Communications ›› 2012, Vol. 10 ›› Issue (2): 57-66.

• Special Topic •

Key Technologies in Mobile Visual Search and MPEG Standardization Activities

Ling-Yu Duan, Jie Chen, Chunyu Wang, Rongrong Ji, Tiejun Huang, and Wen Gao   

  1. Institute of Digital Media, Peking University, Beijing 100871, China
  • Received: 2012-03-08 Online: 2012-06-25 Published: 2012-06-25
  • About author: Ling-Yu Duan (lingyu@pku.edu.cn) received his MSc degree in automation from The University of Science and Technology, China, in 1999. He received his MSc degree in computer science from the National University of Singapore in 2002 and his PhD degree in information technology from The University of Newcastle, Australia, in 2007. From 2003 to 2008, he was a research scientist at the Institute for Infocomm Research, Singapore. Since 2008, he has been an associate professor at the School of Electrical Engineering and Computer Science at Peking University. His research interests include visual search and reality augmentation, multimedia content analysis, and mobile media computing. He has authored more than 70 papers in these areas.

    Jie Chen (cjie@pku.edu.cn) is a PhD candidate at the School of Electrical Engineering and Computer Science, Peking University. His research interests include mobile visual search, low bit-rate visual descriptors, and vector quantization. He has published more than 10 journal and conference papers.

    Chunyu Wang (wangchunyu@pku.edu.cn) is a PhD candidate at the School of Electrical Engineering and Computer Science, Peking University. His research interests include visual search, object recognition, and activity recognition.

    Rongrong Ji (rrji@pku.edu.cn) received his PhD degree in computer science from Harbin Institute of Technology, China. He is currently a postdoctoral research fellow at Columbia University. His research interests include image and video search, content analysis and understanding, mobile visual search and recognition, and interactive human-computer interfaces. Dr. Ji received a Microsoft Fellowship in 2007 and the Best Paper Award at ACM Multimedia 2011.

    Tiejun Huang (tjhuang@pku.edu.cn) received his BSc and MSc degrees in computer science from Wuhan University of Technology in 1992 and 1995, respectively. He received his PhD degree in pattern recognition and image analysis from Huazhong University of Science and Technology, China, in 1998. He is currently a professor in the School of Electrical Engineering and Computer Science, Peking University. He is also vice director of the National Engineering Laboratory for Video Technology of China. His research interests include video coding, image understanding, digital rights management (DRM), and digital libraries. He has published more than sixty peer-reviewed papers and has authored or co-authored three books. He is a member of the board of directors for the Digital Media Project; he is on the advisory board for IEEE Computing Now; he is on the editorial board of the Journal on 3D Research; and he is on the board of the Chinese Institute of Electronics.

    Wen Gao (wgao@pku.edu.cn) received his MSc degree in computer science from Harbin Institute of Technology, China, in 1985. He received his PhD degree in electronics engineering from the University of Tokyo in 1991. He is a professor in the School of Electronics Engineering and Computer Science, Peking University. He has led research efforts in video coding, face recognition, sign language recognition and synthesis, and multimedia retrieval. Professor Gao was admitted as an Academician of the Chinese Academy of Engineering in 2011 and became an IEEE Fellow in 2010 for his contribution to video coding technology. He has been on the editorial boards of IEEE Trans. on Multimedia, IEEE Trans. Circuits Syst. for Video Tech., and several other top international academic journals. He was the chair of IEEE Int. Conf. Multimedia & Expo (ICME) 2007 and ACM Int. Conf. Multimedia (ACM-MM) 2009. He has authored four books and published more than 500 research papers on video coding, signal processing, computer vision, and pattern recognition.

Abstract: Visual search has been a long-standing problem in applications such as location recognition and product search. Much research has been done on image representation, matching, indexing, and retrieval. Key component technologies for visual search have been developed, and numerous real-world applications are emerging. To ensure application interoperability, the Moving Picture Experts Group (MPEG) has begun standardizing visual search technologies and is developing the compact descriptors for visual search (CDVS) standard. MPEG seeks to develop a collaborative platform for evaluating existing visual search technologies. Peking University has participated in this standardization since the 94th MPEG meeting, and significant progress has been made with the various proposals. A test model (TM) has been selected to determine the basic pipeline and key components of visual search. However, the first version of the TM has high computational complexity and imperfect retrieval and matching performance. Core experiments have therefore been set up to improve the TM. In this article, we summarize key technologies for visual search, report the progress of MPEG CDVS, and discuss Peking University’s efforts in CDVS as well as unresolved issues.

Key words: visual search, mobile, visual descriptors, low bit rate, compression
