ZTE Communications ›› 2022, Vol. 20 ›› Issue (3): 27-34. DOI: 10.12142/ZTECOM.202203004

• Special Topic •

Federated Learning Based on Extremely Sparse Series Clinic Monitoring Data

LU Feng1, GU Lin1, TIAN Xuehua1, SONG Cheng1, ZHOU Lun2

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
  2. Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430074, China
  • Received:2022-08-18 Online:2022-09-13 Published:2022-09-14
  • About author:
    LU Feng received her MS and PhD degrees in computer science from Huazhong University of Science and Technology, China in 1997 and 2006, respectively. She is currently an associate professor in the School of Computer Science and Technology, Huazhong University of Science and Technology. Her current research interests include big data, artificial intelligence and distributed computing. She has authored two books and over 20 papers in refereed journals and conferences in these areas. She is a member of CCF and a senior member of the first Session of Chinese Hospital Association, Health Data Application and Management Committee. She was the recipient of three teaching achievement and curriculum development awards.
    GU Lin (lingu@hust.edu.cn) received her MS and PhD degrees in computer science from University of Aizu, Fukushima, Japan in 2011 and 2015, respectively. She is currently an associate professor in the School of Computer Science and Technology, Huazhong University of Science and Technology, China. Her current research interests include serverless computing, network function virtualization, cloud computing, software-defined networking, and data center networking. She has authored two books and over 40 papers in refereed journals and conferences in these areas. She is a member of IEEE and a senior member of CCF.
    TIAN Xuehua received her MS degree from the School of Computer Science and Technology, Huazhong University of Science and Technology, China in 2022. She works on medical data mining and machine learning, with a main focus on sparse time series data.
    SONG Cheng received his bachelor’s degree in software engineering from the School of Computer Science and Artificial Intelligence, Wuhan University of Technology, China in 2021. He is working on data mining and machine learning for an MS degree at Huazhong University of Science and Technology, China.
    ZHOU Lun received his MS degree from Tongji Medical College, Huazhong University of Science and Technology, China in 2003. He is an associate professor of geriatrics at Tongji Hospital. His research interests include the pathogenesis of congenital heart disease and early warning of severe diseases in the elderly. He has published more than 20 papers in refereed journals such as PNAS and JMCC.
  • Supported by:
    Hubei Provincial Development and Reform Commission Program “Hubei Big Data Analysis Platform and Intelligent Service Project for Medical and Health”

Abstract:

Decentralized machine learning frameworks, e.g., federated learning, are emerging to facilitate learning with medical data under privacy protection. It is widely agreed that establishing an accurate and robust medical learning model requires a large amount of continuous, synchronous patient monitoring data from various types of monitoring facilities. However, clinical monitoring data are usually sparse and imbalanced, with errors and irregular time intervals, which leads to inaccurate risk prediction results. To address this issue, this paper designs a medical data resampling and balancing scheme for federated learning that eliminates model biases caused by sample imbalance and provides accurate disease risk prediction on multi-center medical data. Experimental results on the real-world clinical database MIMIC-IV demonstrate that, compared with a vanilla federated learning artificial neural network (ANN), the proposed method improves the AUC (area under the receiver operating characteristic curve) from 50.1% to 62.8% and the accuracy from 76.8% to 82.2%. Moreover, we increase the model’s tolerance for missing data from 20% to 50% compared with a stand-alone baseline model.

Key words: federated learning, time-series electronic health records (EHRs), feature engineering, imbalanced samples