ZTE Communications ›› 2023, Vol. 21 ›› Issue (3): 11-21. DOI: 10.12142/ZTECOM.202303003
• Special Topic •
Multi-Agent Hierarchical Graph Attention Reinforcement Learning for Grid-Aware Energy Management
FENG Bingyi, FENG Mingxiao, WANG Minrui, ZHOU Wengang, LI Houqiang
Received: 2023-06-10
Online: 2023-09-21
Published: 2023-09-21
About the authors:
FENG Bingyi received his BE degree in computer science from Anhui University, China in 2021. He is working towards his MS degree at the University of Science and Technology of China. His research interests focus on deep reinforcement learning, multi-agent reinforcement learning, and machine learning systems.
FENG Mingxiao received his BE degree in computer science from the University of Science and Technology of China in 2017. He is now working towards his PhD degree with the School of Information Science and Technology, University of Science and Technology of China. His research interests mainly include deep reinforcement learning, multi-agent reinforcement learning, and large language models.
WANG Minrui received his BE degree in computer science from Anhui University, China in 2020, and his MS degree from the University of Science and Technology of China in 2023. His research interests mainly include deep reinforcement learning, multi-agent reinforcement learning, and machine learning for recommendation systems.
ZHOU Wengang and LI Houqiang are the corresponding authors.
FENG Bingyi, FENG Mingxiao, WANG Minrui, ZHOU Wengang, LI Houqiang. Multi-Agent Hierarchical Graph Attention Reinforcement Learning for Grid-Aware Energy Management[J]. ZTE Communications, 2023, 21(3): 11-21.
URL: https://zte.magtechjournal.com/EN/10.12142/ZTECOM.202303003
|  | Climate Zone 2A |  |  |  | Climate Zone 3A |  |  |  | Climate Zone 4A |  |  |  | Climate Zone 5A |  |  |  |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | NSVV↓ | SRR↑ | NHVV↓ | HRR↑ | NSVV↓ | SRR↑ | NHVV↓ | HRR↑ | NSVV↓ | SRR↑ | NHVV↓ | HRR↑ | NSVV↓ | SRR↑ | NHVV↓ | HRR↑ |
| RBC | 86 181 | 0.0% | 158 736 | 0.0% | 110 902 | 0.0% | 193 751 | 0.0% | 83 648 | 0.0% | 162 076 | 0.0% | 106 823 | 0.0% | 195 277 | 0.0% |
| A2C | 79 905 | 7.3% | 154 662 | 2.6% | 101 102 | 8.8% | 185 201 | 4.4% | 81 648 | 2.4% | 158 902 | 2.0% | 93 365 | 12.6% | 174 671 | 10.6% |
| PPO | 79 601 | 7.6% | 153 849 | 3.1% | 100 954 | 9.0% | 184 365 | 4.8% | 81 224 | 2.9% | 155 645 | 4.0% | 92 920 | 13.0% | 173 997 | 10.9% |
| MAA2C | 73 264 | 15.0% | 139 654 | 12.0% | 89 423 | 19.4% | 162 249 | 16.3% | 74 569 | 10.9% | 144 274 | 11.0% | 79 369 | 25.7% | 154 786 | 20.7% |
| MAPPO | 73 919 | 14.2% | 139 210 | 12.3% | 88 236 | 20.4% | 160 345 | 17.2% | 74 126 | 11.4% | 144 316 | 11.0% | 78 314 | 26.7% | 150 322 | 23.0% |
| HMAA2C | 64 516 | 25.1% | 125 497 | 20.9% | 78 158 | 29.5% | 146 392 | 24.4% | 63 105 | 24.6% | 122 568 | 24.4% | 60 766 | 43.1% | 127 494 | 34.7% |
| HMAPPO | 63 320 | 26.5% | 123 116 | 22.4% | 77 724 | 29.9% | 145 946 | 24.7% | 62 865 | 24.8% | 121 829 | 24.8% | 59 887 | 43.9% | 125 386 | 35.8% |
Table 1 Overall performance on four scenarios, where HMAA2C and HMAPPO refer to MAA2C and MAPPO applied with multi-agent hierarchical graph attention (MAHGA) (↓ denotes the lower the better; ↑ denotes the higher the better)
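The reduction-rate columns follow directly from the raw violation values: each SRR/HRR entry appears to be the relative reduction of a method's NSVV/NHVV against the RBC baseline, e.g., A2C in Climate Zone 2A gives (86 181 − 79 905) / 86 181 ≈ 7.3%. Below is a minimal sketch of that calculation; the formula is inferred from the table values (it reproduces the reported percentages), and the helper name `reduction_rate` is ours, not from the paper:

```python
# Reduction rate of a violation metric relative to the rule-based
# controller (RBC) baseline. Inferred from Table 1: this formula
# reproduces the reported SRR and HRR percentages.
def reduction_rate(baseline: float, method: float) -> float:
    """Relative reduction vs. the baseline (SRR for NSVV, HRR for NHVV)."""
    return (baseline - method) / baseline

# Climate Zone 2A, NSVV column (values taken from Table 1)
rbc_nsvv = 86_181
a2c_nsvv = 79_905
hmappo_nsvv = 63_320

print(f"A2C SRR:    {reduction_rate(rbc_nsvv, a2c_nsvv):.1%}")     # -> 7.3%
print(f"HMAPPO SRR: {reduction_rate(rbc_nsvv, hmappo_nsvv):.1%}")  # -> 26.5%
```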