ZTE Communications ›› 2014, Vol. 12 ›› Issue (4): 8-15. DOI: 10.3969/j.issn.1673-5188.2014.04.002

• Special Topic •

HMIBase: An Hierarchical Indexing System for Storing and Querying Big Data

Shengmei Luo1, Di Zhao2, Wei Ge2, Rong Gu2, Chunfeng Yuan2, and Yihua Huang2   

  1. ZTE Corporation, Nanjing 210012, China;
  2. Nanjing University, Nanjing 210046, China
  • Received: 2014-05-23 Online: 2014-12-25 Published: 2014-12-25
  • About the authors: Shengmei Luo (luo.shengmei@zte.com.cn) received his MS degree in telecommunication and electronics from Harbin Institute of Technology in 1996. He is a chief architect at ZTE Corporation. His research interests include cloud computing, cloud storage, and big data.

    Di Zhao (zd08135@126.com) received his BS degree in computer science and technology from Nanjing University in 2012. He is currently a master’s degree candidate at the Department of Computer Science and Technology, Nanjing University. His research interests include parallel computing and the analysis and processing of big data.

    Wei Ge (gloria.w.ge@qq.com) received her MS degree from Northeastern University in 2003. She is currently a PhD candidate in computer science at Nanjing University. Her research interests include data management, database query optimization, and big data query optimization. She has published 10 papers in journals and conference proceedings, including in Science in China (series F), APWeb 2003, and Journal of Electronics.

    Rong Gu (gurongwalker@gmail.com) received his BS degree in computer science from Nanjing University of Aeronautics and Astronautics in 2011. He is currently a PhD candidate in computer science at Nanjing University. His research interests include parallel and distributed computing, cloud computing, and big-data parallel processing.

    Chunfeng Yuan (cfyuan@nju.edu.cn) is a professor at the Department of Computer Science, Nanjing University. She received her BS and MS degrees in computer science from Nanjing University. Her main research interests include computer system architecture, big data parallel processing, and Web information mining.

    Yihua Huang (yhuang@nju.edu.cn) is a professor at the Department of Computer Science, Nanjing University. He received his BS, MS, and PhD degrees in computer science from Nanjing University. His research interests include parallel and distributed computing, big-data parallel processing, and Web information mining.
  • Supported by:
    This work is supported by China National Science Foundation (Grant 61223003) and ZTE Industry-Academia-Research Cooperation Funds.

Abstract: Relational database management systems are usually deployed on single-node machines and impose strict constraints on data structure. As a result, they do not work well with big data, and NoSQL databases have been proposed as a solution. To make data querying more efficient, NoSQL databases use indexes and memory cache techniques. In this paper, we propose a hierarchical indexing mechanism and a prototype distributed data-storage system, called HMIBase, which builds hierarchical indexes on non-primary keys in tables to make data querying more efficient. HMIBase uses HBase as the underlying data store and creates a memory cache for more efficient data transmission. HMIBase uses coprocessors to process update requests. It also provides a client with query and update APIs and a server that handles RPCs from the client and carries out the requested jobs. To improve the cache hit ratio, we propose a memory cache replacement strategy for HMIBase, called the Hot Score algorithm. The experimental results show that the Hot Score algorithm performs better than other cache-replacement strategies.

Key words: NoSQL, In-Memory Index, HMIBase, Hot Score
