ZTE Communications, 2013, Vol. 11, Issue 2: 1-2.

• Special Topic •

Big Data: Where Dreams Take Flight

Chengzhong Xu¹ and Zhibin Yu²

  1. Wayne State University, Detroit, USA
  2. Shenzhen Institutes of Advanced Technology, China
  • Online:2013-06-25 Published:2013-06-25
  • About the authors: Chengzhong Xu received his BSc and MSc degrees in computer science and engineering from Nanjing University in 1986 and 1989, respectively. He received his PhD degree in computer engineering from the University of Hong Kong in 1993. His research interests include computer architecture, distributed systems, virtualization, and cloud computing. Dr. Xu is a professor of electrical and computer engineering at Wayne State University, Detroit, USA. He is also the director of the Cloud and Internet Computer Laboratory at Wayne State University. He is a senior member of the IEEE and a member of the ACM.

    Zhibin Yu received his PhD degree in computer science from Huazhong University of Science and Technology (HUST) in 2008. He spent one year as a visiting scholar at the Laboratory of Computer Architecture, Department of Electrical and Computer Engineering, University of Texas at Austin. He is currently an associate professor at the Shenzhen Institutes of Advanced Technology, China. His research interests include micro-architecture simulation, computer architecture, workload characterization and generation, performance evaluation, multicore architecture, and virtualization technologies. In 2005, he won first prize in the HUST Young Lecturers Teaching Contest. In 2003, he won second prize in the HUST Teaching Quality Assessment. He is a member of the IEEE and ACM.

Abstract: From academia to industry, big data has become a buzzword in information technology. The US Federal Government is paying close attention to the big-data revolution. In 2012, fourteen US government departments allocated funds to 87 big-data projects [1]. Europe has the second-largest amount of data [2], and most universities and research institutes have already established big-data research programs. In Asia, especially in China, central and local governments have been setting aside funds for their own big-data programs. The big-data-related 973 Projects in China are good examples of this. Industry players have been following in the footsteps of big-data pioneers such as Google, Facebook, Twitter, and Baidu, and more and more companies are rushing into the big-data business. Companies have been analyzing the purchasing behavior of huge numbers of customers and devising more attractive plans and policies. Big data is already an important part of the $64 billion database and data analytics market [3]. Indeed, big data will open up commercial opportunities comparable in scale to those created by enterprise software in the late 1980s, the internet in the 1990s, and the social media explosion of today.
But what is big data? It has been defined in many different ways. We prefer to define big data as data sets that are too big for current information technologies to capture, transmit, store, process, or visualize. Although this definition is simple, it encompasses computational complexity theory, computer architecture, operating systems, programming models, database technologies, algorithms, and applications. People from different fields have dramatically different understandings of big data, which is why there is so much excitement and conjecture surrounding it.
In this special issue, we present papers that discuss big-data technology from different perspectives. These are not only high-level surveys but also reports on initial results from big-data projects. Communication infrastructure is one of the most important aspects of big data. Yi Zhu and Zhengkun Mi from Nanjing University of Posts and Telecommunications discuss content-centric networking, which is seen as a promising approach to big-data distribution. They propose a networking architecture for processing big data, and this architecture is fundamentally different from TCP/IP. Shengmei Luo et al. from the Cloud Computing & IT Institute of ZTE Corporation present a survey of big-data analytics. They analyze challenges related to storage, data-mining algorithms, and programming models for big data. They also predict opportunities in the big-data era. Although there are many potential business opportunities in big data, security is of the utmost importance for users and cannot be overlooked. Ruixuan Li et al. from Huazhong University of Science and Technology provide an overview of data security and privacy preservation for cloud storage. They carefully investigate confidentiality, data integrity, and data availability. They also propose a feasible solution to current security problems. Shigang Chen et al. from the University of Florida delve more deeply into data integrity. They propose a novel authenticated data structure called the Cloud Merkle B+ tree (CMBT), which supports dynamic operations such as insertion, deletion, and modification. CMBT lowers overhead from O(n) to O(log n).
Moving to big-data applications, algorithms oriented towards a single machine are not necessarily efficient on big-data platforms because many machines need to run concurrently for the same task. Weisong Shi et al. from Wayne State University design a mechanism called SPBD that reduces the response time of big-data systems. This mechanism is feasible in practice. Zhendong Bei et al. report their experiences with big-data applications that use MapReduce/Hadoop. They confirm that manually tuning up to 190 Hadoop configuration parameters is extremely time-consuming, if possible at all. They then propose an automatic performance prediction scheme based on random forests to determine the best configuration parameter combinations. Their experimental results show that their scheme can predict the performance of Hadoop systems very accurately.
Challenges and opportunities exist together in the big-data era. We believe most of these challenges will be overcome and opportunities will be realized. Big data is a field where dreams will take flight.

Key words: Big Data
