ZTE Communications ›› 2016, Vol. 14 ›› Issue (S1): 54-60. DOI: 10.3969/j.issn.1673-5188.2016.S1.008

• Research Paper •

Action Recognition in Surveillance Videos with Combined Deep Network Models

ZHANG Diankai1, ZHAO Rui-Wei2, SHEN Lin1, CHEN Shaoxiang2, SUN Zhenfeng2, and JIANG Yu-Gang2   

  1. ZTE Corporation, Nanjing 210012, China;
  2. Fudan University, Shanghai 201203, China
  • Online: 2016-12-01 Published: 2019-11-29
  • About author: ZHANG Diankai (zhang.diankai@zte.com.cn) received his BE degree in electronic information engineering and MS degree in signal and information processing from Nanjing University of Posts and Telecommunications (NUPT), China in 2006 and 2009, respectively. He is a senior video and image algorithm engineer at ZTE Corporation. His research interests include video and image processing, pattern recognition, and computer vision.
    ZHAO Rui-Wei (rw.du.zhao@gmail.com) received his BS degree in 2005 and MS degree in 2009 from Tongji University, China. He is currently a PhD candidate in the School of Computer Science at Fudan University, China. His research interests include deep learning methods for image and video recognition.
    SHEN Lin (shen.lin2@zte.com.cn) received her BE degree in communication engineering and MS degree in computer application from Nanjing University of Science and Technology (NUST), China in 2007 and 2009, respectively. She is a senior video and image algorithm engineer at ZTE Corporation. Her research interests include video and image processing, pattern recognition, and computer vision.
    CHEN Shaoxiang (forwchen@gmail.com) is currently a research student in the School of Computer Science at Fudan University. His research interests include object detection and tracking in video data.
    SUN Zhenfeng (zf_sun@foxmail.com) is currently pursuing the Doctor of Engineering degree in the School of Computer Science at Fudan University. His research interests include multimedia software systems and computer vision.
    JIANG Yu-Gang (yugang.jiang@gmail.com) received his PhD degree in computer science from the City University of Hong Kong, China in 2009. From 2008 to 2011, he was with the Department of Electrical Engineering, Columbia University, USA. He is currently a full professor of computer science at Fudan University. His research interests include multimedia retrieval and computer vision. He is one of the organizers of the annual THUMOS Challenge on Large Scale Action Recognition, and served as a program chair of ACM ICMR 2015. He is the recipient of many awards, including the ACM China Rising Star Award (2014).
  • Supported by:
    This work is supported by ZTE Industry-Academia-Research Cooperation Funds.

Abstract: Action recognition is an important topic in computer vision. Recently, deep learning technologies have been successfully applied to many recognition problems, including those involving video data. However, most existing deep learning based recognition frameworks are not optimized for actions in surveillance videos. In this paper, we propose a novel method for recognizing different types of actions in outdoor surveillance videos. The proposed method first introduces motion compensation to improve the detection of human targets. Then, it uses three different types of deep models, with single and sequenced images as inputs, to recognize different types of actions. Finally, predictions from the different models are fused with a linear model. Experimental results show that the proposed method works well on real surveillance videos.

Key words: action recognition, deep network models, model fusion, surveillance video
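
The final step named in the abstract, fusing predictions from several models with a linear model, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_scores`, the three example score vectors, and the weights are all hypothetical, and the sketch assumes each model outputs one score per action class.

```python
import numpy as np

def fuse_scores(score_list, weights, bias=0.0):
    """Linearly combine per-class score vectors from several models.

    score_list: one score vector per model, all of the same length.
    weights:    one scalar weight per model (hypothetical values here;
                in practice they would be learned or tuned on held-out data).
    """
    # Stack into an array of shape (n_models, n_classes).
    scores = np.stack([np.asarray(s, dtype=float) for s in score_list])
    w = np.asarray(weights, dtype=float)
    if w.shape[0] != scores.shape[0]:
        raise ValueError("one weight per model is required")
    # Weighted sum over models gives one fused score per class.
    return w @ scores + bias

# Hypothetical class scores from three models for a single detection.
model_a = [0.7, 0.2, 0.1]  # e.g. a model fed a single image
model_b = [0.5, 0.4, 0.1]  # e.g. a model fed an image sequence
model_c = [0.6, 0.1, 0.3]  # e.g. a second sequence-based model

fused = fuse_scores([model_a, model_b, model_c], weights=[0.5, 0.25, 0.25])
predicted_class = int(np.argmax(fused))
```

With these illustrative inputs the fused scores are 0.625, 0.225, and 0.15, so the first class is selected. A weighted sum keeps the fusion step cheap and easy to inspect, since each weight directly reflects how much a given model is trusted.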
