ZTE Communications, 2019, Vol. 17, Issue (2): 59-66. DOI: 10.12142/ZTECOM.201902009

• Research Paper

SRSC: Improving Restore Performance for Deduplication-Based Storage Systems

ZUO Chunxue¹, WANG Fang¹, TANG Xiaolan², ZHANG Yucheng¹, FENG Dan¹

  1. Key Laboratory of Information Storage Systems, Engineering Research Center of Data Storage Systems and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
    2. 5G Application Product Line, ZTE Corporation, Shenzhen, Guangdong 518057, China
  • Received: 2018-06-09; Online: 2019-06-11; Published: 2019-11-14
  • About the authors:
    ZUO Chunxue (cxzuo@hust.edu.cn) is currently working toward the Ph.D. degree in computer architecture at Huazhong University of Science and Technology, China. Her research interests are data deduplication and restore performance.
    WANG Fang received the B.E., M.E., and Ph.D. degrees in computer science and technology from Huazhong University of Science and Technology (HUST), China in 1994, 1997, and 2001, respectively. She is a professor in the School of Computer Science and Technology, HUST. Her research interests include computer architecture, massive storage systems, and parallel file systems. She has more than 40 publications in journals and international conferences, including ACM TACO, SC, MSST, ICPP, ICA3PP, HPDC, and ICDCS.
    TANG Xiaolan received the M.E. degree in physical electronics from Huazhong University of Science and Technology, China. She is currently a project manager with ZTE Corporation. Her research interests include core network, cloud storage, and 5G technologies.
    ZHANG Yucheng is currently a Ph.D. student majoring in computer architecture at Huazhong University of Science and Technology, China. His research interests include data deduplication and storage systems. He has published several papers in refereed journals and conferences, including IEEE-TC and INFOCOM.
    FENG Dan received the B.E., M.E., and Ph.D. degrees in computer science and technology from Huazhong University of Science and Technology (HUST), China in 1991, 1994, and 1997, respectively. She is a professor and the dean of the School of Computer, HUST. Her research interests include computer architecture and massive storage systems. She has many publications in major journals and international conferences, including IEEE-TC, IEEE-TPDS, FAST, USENIX ATC, and MSST.
  • Supported by:
    This work was supported in part by ZTE Industry-Academia-Research Cooperation Funds; the National Natural Science Foundation of China under Grant Nos. 61502191, 61502190, 61602197, and 61772222; the Fundamental Research Funds for the Central Universities under Grant Nos. 2017KFYXJJ065 and 2016YXMS085; the Hubei Provincial Natural Science Foundation of China under Grant Nos. 2016CFB226 and 2016CFB192; and the Key Laboratory of Information Storage System, Ministry of Education of China.

Abstract:

Modern backup systems use data deduplication to save storage space, but deduplication causes fragmentation. Fragmentation degrades restore performance because the chunks to be restored are scattered across many different containers. To improve restore performance, the state-of-the-art History-Aware Rewriting algorithm (HAR) collects fragmented chunks in the last backup and rewrites them in the next backup. However, because HAR defers rewriting to the next backup, it cannot eliminate internal fragmentation in the current backup caused by self-referenced chunks (chunks that appear multiple times within the same backup), and this internal fragmentation degrades restore performance. In this paper, we propose Selectively Rewriting Self-Referenced Chunks (SRSC), a scheme that uses a buffer to simulate the restore cache, identifies internal fragmentation in the cache, and selectively rewrites the self-referenced chunks that cause it. Our experimental results on two real-world datasets show that SRSC improves restore performance by 45% with an acceptable sacrifice in deduplication ratio.
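The core idea in the abstract, replaying restore-cache behavior during backup and rewriting only those self-referenced chunks that would miss in that cache, can be illustrated with a small Python sketch. It is a simplified model under stated assumptions and not the paper's SRSC implementation: containers hold a fixed number of chunks (`container_size`), the restore cache is an LRU cache of whole containers (`cache_size`), only a single backup is modeled (so HAR-style rewriting across backups is out of scope), and every self-referenced chunk predicted to miss is rewritten, with no rewrite limit. All names (`LRUContainerCache`, `backup_with_self_ref_rewriting`, `count_container_reads`) are hypothetical.

```python
from collections import OrderedDict


class LRUContainerCache:
    """Hypothetical restore cache: holds up to `capacity` container IDs in LRU order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def contains(self, cid):
        # Read-only check; does not change the LRU order.
        return cid in self._entries

    def access(self, cid):
        """Touch container `cid`. Returns True on a hit, False on a miss (one container read)."""
        if cid in self._entries:
            self._entries.move_to_end(cid)
            return True
        self._entries[cid] = None
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used container
        return False


def backup_with_self_ref_rewriting(stream, container_size, cache_size, rewrite=True):
    """Deduplicate ONE backup (a list of chunk fingerprints). Simplified model, not SRSC itself.

    Returns (recipe, rewritten): the container ID each chunk is restored from,
    in stream order, and how many self-referenced chunks were rewritten.
    """
    location = {}  # fingerprint -> container holding its most recent stored copy
    cache = LRUContainerCache(cache_size)  # replays what the restore cache will see
    current_cid, fill = 0, 0
    recipe, rewritten = [], 0

    def store(fp):
        nonlocal current_cid, fill
        if fill == container_size:  # current container is full: seal it, open a new one
            current_cid += 1
            fill = 0
        location[fp] = current_cid
        fill += 1

    for fp in stream:
        if fp not in location:
            store(fp)  # first occurrence: a unique chunk is written
        elif rewrite and not cache.contains(location[fp]):
            # Self-referenced chunk whose container would already be evicted at
            # restore time: write a fresh copy instead of referencing the old one.
            store(fp)
            rewritten += 1
        cid = location[fp]
        cache.access(cid)
        recipe.append(cid)
    return recipe, rewritten


def count_container_reads(recipe, cache_size):
    """Restore cost: number of container reads when replaying `recipe` through an LRU cache."""
    cache = LRUContainerCache(cache_size)
    return sum(0 if cache.access(cid) else 1 for cid in recipe)


if __name__ == "__main__":
    stream = list("ABCDEFGHIJKLAMNOA")  # chunk "A" recurs within the same backup
    for flag in (False, True):
        recipe, n = backup_with_self_ref_rewriting(stream, container_size=4, cache_size=2, rewrite=flag)
        print(flag, count_container_reads(recipe, cache_size=2), n)
    # prints "False 5 0" then "True 4 1"
```

Because the recipe is produced in stream order, the backup-time cache sees the same access sequence that a later restore will replay, which is what lets the sketch predict misses before writing. The price is the extra stored copies counted in `rewritten`, which corresponds to the deduplication-ratio sacrifice the abstract mentions; a practical scheme would likely bound it.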

Key words: data deduplication, fragmentation, restore performance