ZTE Communications, 2024, Vol. 22, Issue 2: 49-54. DOI: 10.12142/ZTECOM.202402007

• Review

Learned Distributed Query Optimizer: Architecture and Challenges

GAO Jun1, HAN Yinjun2, LIN Yang2, MIAO Hao1, XU Mo2

  1. Peking University, Beijing 100871, China
  2. ZTE Corporation, Shenzhen 518057, China
  • Received: 2023-07-28 Online: 2024-06-28 Published: 2024-06-25
  • About author: GAO Jun (gaojun@pku.edu.cn) received his BE and ME degrees in computer science from Shandong University, China in 1997 and 2000, respectively, and his PhD degree in computer science from Peking University, China in 2003. He is currently a professor with the School of Computer Science, Peking University. His major research interests include web data management, graph data management and AI+DB.
    HAN Yinjun is a senior engineer with ZTE Corporation. He has published multiple papers, obtained more than ten authorized patents, won multiple provincial and ministerial awards, and is a senior member of CCF. His main research interests include database systems and storage systems.
    LIN Yang is a research and development engineer with ZTE Corporation. She received her master's degree from Nanjing University of Science and Technology, China in 2017. Her research interests include query optimization, AI4DB and DB4AI.
    MIAO Hao is a postgraduate student in the School of Computer Science, Peking University, China. His major research interests include graph neural networks and AI+DB.
    XU Mo is a research and development engineer with ZTE Corporation. He received his master's degree from Monash University, Australia. His research interests include query optimization, AI4DB and database kernel development.
  • Supported by:
    NSFC (61832001); ZTE Industry-University-Institute Fund Project

Abstract:

Query processing in distributed database management systems (DBMSs) faces more challenges than in a single-node DBMS, where query optimization is already an NP-hard problem: there are more operators, and more factors to account for in cost models and metadata. Learned query optimizers (mainly for single-node DBMSs) have received attention for their capability to capture data distributions and for offering flexible ways to avoid hand-crafted rules in refinement and adaptation to new hardware. In this paper, we focus on extending learned query optimizers to distributed DBMSs. Specifically, we propose one possible but general architecture for a learned query optimizer in the distributed context and highlight its differences from learned optimizers in single-node DBMSs. In addition, we discuss the challenges and possible solutions.

Key words: distributed query processing, query optimization, learned query optimizer