Learned Distributed Query Optimizer: Architecture and Challenges

ZTE Communications, 2024, Vol. 22, Issue (2): 49-54. DOI: 10.12142/ZTECOM.202402007

Review

GAO Jun¹, HAN Yinjun², LIN Yang², MIAO Hao¹, XU Mo²

¹ Peking University, Beijing 100871, China
² ZTE Corporation, Shenzhen 518057, China

Received: 2023-07-28   Online: 2024-06-28   Published: 2024-06-25
About authors: GAO Jun (gaojun@pku.edu.cn) received his BE and ME degrees in computer science from Shandong University, China in 1997 and 2000, respectively, and his PhD degree in computer science from Peking University, China in 2003. He is currently a professor with the School of Computer Science, Peking University. His major research interests include web data management, graph data management, and AI+DB.
HAN Yinjun is a senior engineer with ZTE Corporation and a senior member of CCF. He has published multiple papers, holds more than ten granted patents, and has won multiple provincial and ministerial awards. His main research interests include database systems and storage systems.
LIN Yang is a research and development engineer at ZTE Corporation. She received her master's degree from Nanjing University of Science and Technology, China in 2017. Her research interests include query optimization, AI4DB, and DB4AI.
MIAO Hao is a postgraduate student in the School of Computer Science, Peking University, China. His major research interests include graph neural networks and AI+DB.
XU Mo is a research and development engineer at ZTE Corporation. He received his master's degree from Monash University, Australia. His research interests include query optimization, AI4DB, and database kernel development.
Supported by: NSFC (61832001); ZTE Industry-University-Institute Fund Project

Abstract

Query processing in distributed database management systems (DBMSs) faces more challenges than in a single-node DBMS, where query optimization is already an NP-hard problem: a distributed DBMS involves more operators, and more factors in its cost models and metadata. Learned query optimizers, developed mainly for single-node DBMSs, have attracted attention because they can capture data distributions and offer a flexible way to avoid hand-crafted rules when refining the optimizer and adapting it to new hardware. In this paper, we focus on extending learned query optimizers to distributed DBMSs. Specifically, we propose one possible yet general architecture for a learned query optimizer in the distributed context and highlight how it differs from learned optimizers in single-node DBMSs. In addition, we discuss the challenges and possible solutions.
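
As a rough illustration of why a distributed setting enlarges an already NP-hard search space, the following sketch counts candidate join trees. It is a minimal sketch under stated assumptions, not the authors' method: it uses the standard count of bushy join trees over n relations, (2n-2)!/(n-1)!, and a hypothetical parameter k for the number of data-exchange strategies (for example, broadcast or shuffle) that each of the n-1 joins may independently choose. The function names are illustrative only.

```python
from math import factorial


def bushy_join_trees(n: int) -> int:
    """Number of bushy join trees over n distinct relations with ordered
    children: n! * Catalan(n - 1) = (2n - 2)! / (n - 1)!."""
    if n < 1:
        raise ValueError("need at least one relation")
    return factorial(2 * n - 2) // factorial(n - 1)


def distributed_plans(n: int, k: int) -> int:
    """Hypothetical count: assumes each of the n - 1 joins independently picks
    one of k data-exchange strategies (e.g. broadcast vs. shuffle)."""
    if k < 1:
        raise ValueError("need at least one exchange strategy")
    return bushy_join_trees(n) * k ** (n - 1)


if __name__ == "__main__":
    # Compare the single-node tree count with the distributed variant (k = 2).
    for n in (2, 4, 6, 8):
        print(n, bushy_join_trees(n), distributed_plans(n, k=2))
```

With k = 2, eight relations already grow from 17,297,280 bushy trees to 2,214,051,840 distributed variants, before cost-model factors such as network bandwidth or data placement are considered.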

Key words: distributed query processing, query optimization, learned query optimizer

GAO Jun, HAN Yinjun, LIN Yang, MIAO Hao, XU Mo. Learned Distributed Query Optimizer: Architecture and Challenges[J]. ZTE Communications, 2024, 22(2): 49-54.
