Persistent Data Layout in File Systems

LUO Shengmei, LU Youyou, YANG Hongzhang, SHU Jiwu, ZHANG Jiacheng

ZTE Communications 2018, 16 (3): 59-66. DOI: 10.19729/j.cnki.1673-5188.2018.03.010

Data layout in a file system is the organization of data stored on external storage. The data layout strongly affects the performance of storage systems. We survey the three main kinds of data layout in traditional file systems: in-place update file systems, log-structured file systems, and copy-on-write file systems. Each has its own strengths and weaknesses under different circumstances. We also describe a recent use of persistent layout in a file system that combines flash memory and byte-addressable non-volatile memory. From this survey, we conclude that persistent data layout in file systems may evolve dramatically in the era of emerging non-volatile memory.
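
The three layouts named in this abstract differ mainly in where a write lands. The following is a minimal in-memory sketch of that difference, not code from the paper: the class names `InPlaceStore`, `LogStructuredStore`, and `CopyOnWriteStore` are hypothetical, and real file systems add metadata, garbage collection, and crash consistency that are omitted here.

```python
class InPlaceStore:
    """In-place update: a block is overwritten at its fixed location."""

    def __init__(self, nblocks):
        self.blocks = [None] * nblocks

    def write(self, block_no, data):
        self.blocks[block_no] = data  # old contents are lost

    def read(self, block_no):
        return self.blocks[block_no]


class LogStructuredStore:
    """Log-structured: every write is appended to the tail of a log."""

    def __init__(self):
        self.log = []    # append-only sequence of (block_no, data)
        self.index = {}  # block_no -> position of the latest version

    def write(self, block_no, data):
        self.index[block_no] = len(self.log)
        self.log.append((block_no, data))  # stale versions stay in the log

    def read(self, block_no):
        pos = self.index.get(block_no)
        return None if pos is None else self.log[pos][1]


class CopyOnWriteStore:
    """Copy-on-write: a write builds a new root; the old root is untouched."""

    def __init__(self):
        self.root = {}  # block_no -> data for the current version

    def write(self, block_no, data):
        old_root = self.root
        new_root = dict(old_root)  # copy, then modify the copy
        new_root[block_no] = data
        self.root = new_root       # switch to the new version
        return old_root            # the previous version works as a snapshot

    def read(self, block_no):
        return self.root.get(block_no)
```

In this sketch, overwriting a block twice leaves one copy in `InPlaceStore`, two log entries in `LogStructuredStore`, and a readable earlier snapshot in `CopyOnWriteStore`.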

Random Forest Based Very Fast Decision Tree Algorithm for Data Stream

DONG Zhenjiang, LUO Shengmei, WEN Tao, ZHANG Fayang, LI Lingjuan

ZTE Communications 2017, 15 (S2): 52-57. DOI: 10.3969/j.issn.1673-5188.2017.S2.009

The Very Fast Decision Tree (VFDT) algorithm is a classification algorithm for data streams. When processing large amounts of data, VFDT requires less time than traditional decision tree algorithms. However, when fewer training samples are available, the label values of VFDT leaf nodes contain more errors, and the classification ability of a single VFDT decision tree is limited. The Random Forest algorithm is an ensemble classifier with high prediction accuracy and good noise tolerance. It consists of multiple decision trees and can compensate for the shortcomings of a single decision tree. In this paper, to improve classification accuracy on data streams, the Random Forest algorithm is integrated into the tree-building process of the VFDT algorithm, and a new Random Forest Based Very Fast Decision Tree algorithm named RFVFDT is designed. The RFVFDT algorithm adopts the decision tree building criterion of a Random Forest classifier, and extends the Random Forest algorithm with a sliding window to handle the unbounded nature of data streams and to avoid processing delay and data loss. Experimental results on the classification of KDD CUP data sets show that the classification accuracy of RFVFDT is higher than that of VFDT. The fewer the samples, the more pronounced the advantage. RFVFDT also runs fast in multi-threaded mode.
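
Two ingredients mentioned in this abstract can be illustrated briefly. The sketch below is not the RFVFDT implementation: it assumes the standard Hoeffding bound that VFDT is commonly described with (the abstract itself does not state the split test), and the names `hoeffding_bound`, `should_split`, and `SlidingWindow` are hypothetical. It shows why few samples make a leaf's decision less reliable, and how a fixed-size window keeps memory bounded on an unbounded stream.

```python
import math
from collections import deque


def hoeffding_bound(value_range, delta, n):
    """Error margin epsilon after n samples, with confidence 1 - delta.

    Assumes the usual form epsilon = sqrt(R^2 * ln(1/delta) / (2n)),
    where R is the range of the split metric.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split only when the best attribute leads by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


class SlidingWindow:
    """Keeps only the most recent `size` samples of a stream."""

    def __init__(self, size):
        self.items = deque(maxlen=size)  # oldest sample is dropped when full

    def add(self, sample):
        self.items.append(sample)

    def samples(self):
        return list(self.items)
```

With the same gain difference, `should_split` accepts a split after many samples but rejects it after only a few, because the bound shrinks as `n` grows.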

A Distributed In-Memory Database Solution for Mass Data Applications

Dong Hao, Luo Shengmei, Zhang Hengsheng

ZTE Communications 2010, 8 (4): 45-48.

In this paper, a Distributed In-Memory Database (DIMDB) system is proposed to improve processing efficiency in mass data applications. The system uses an enhanced language similar to Structured Query Language (SQL) with a key-value storage schema. The design goals of the DIMDB system are described and its system architecture is discussed. The operation flow and the enhanced SQL-like language are also discussed, and experimental results are used to validate the system.

Cloud Computing Technology and Its Applications

Zhao Pei, Lu Ping, Luo Shengmei

ZTE Communications 2010, 8 (4): 34-38.

Virtualization and distributed parallel architecture are typical cloud computing technologies. In the area of virtualization technology, this article discusses physical resource pooling, resource pool management and use, cluster fault location and maintenance, resource pool grouping, and the construction and application of heterogeneous virtualization platforms. In the area of distributed technology, the distributed file system and the Key/Value storage engine are discussed. A solution is proposed for the host bottleneck problem, and a standard storage interface is proposed for the distributed file system. A directory-based storage scheme for the Key/Value storage engine is also proposed.