Multi-Agent Hierarchical Graph Attention Reinforcement Learning for Grid-Aware Energy Management

doi:10.12142/ZTECOM.202303003

ZTE Communications ›› 2023, Vol. 21 ›› Issue (3): 11-21.DOI: 10.12142/ZTECOM.202303003

• Special Topic • Previous Articles Next Articles

Multi-Agent Hierarchical Graph Attention Reinforcement Learning for Grid-Aware Energy Management

FENG Bingyi, FENG Mingxiao, WANG Minrui, ZHOU Wengang, LI Houqiang()

University of Science and Technology of China, Hefei 230026, China

Received:2023-06-10 Online:2023-09-21 Published:2023-09-21
About author:FENG Bingyi received his BE degree in computer science from Anhui University, China in 2021. He is working towards his MS degree at University of Science and Technology of China. His research interest focuses on deep reinforcement learning, multi-agent reinforcement learning, and machine learning systems.|FENG Mingxiao received his BE degree in computer science from University of Science and Technology of China in 2017. Now he is working towards his PhD degree with the School of Information Science and Technology, University of Science and Technology of China. His research interests mainly include deep reinforcement learning, multi-agent reinforcement learning, and large language model.|WANG Minrui received his BE degree in computer science from Anhui University, China in 2020, and his MS degree from the University of Science and Technology of China, in 2023. His research interests mainly include deep reinforcement learning, multi-agent reinforcement learning, and machine learning for recommendation systems.|ZHOU Wengang and LI Houqiang are the corresponding authors.|LI Houqiang (lihq@ustc.edu.cn) received his BS, ME, and PhD degrees in electronic engineering from University of Science and Technology of China (USTC) in 1992, 1997, and 2000, respectively. He is a professor and the Vice Dean of the School of Information Science and Technology, USTC, and the Director of MOE-Microsoft Key Laboratory of Multimedia Computing and Communication. He is a fellow of IEEE. His research interests include deep learning, reinforcement learning, image/video coding, image/video analysis, and computer vision, etc. He has authored and co-authored over 300 papers in journals and conferences, and holds over 60 granted patents.
Supported by:
National Key R&D Program of China(2022ZD0119802);National Natural Science Foundation of China(61836011)

Abstract

Abstract:

The increasing adoption of renewable energy has posed challenges for voltage regulation in power distribution networks. Grid-aware energy management, which includes the control of smart inverters and energy management systems, is a trending way to mitigate this problem. However, existing multi-agent reinforcement learning methods for grid-aware energy management have not sufficiently considered the importance of agent cooperation and the unique characteristics of the grid, which leads to limited performance. In this study, we propose a new approach named multi-agent hierarchical graph attention reinforcement learning framework (MAHGA) to stabilize the voltage. Specifically, under the paradigm of centralized training and decentralized execution, we model the power distribution network as a novel hierarchical graph containing the agent-level topology and the bus-level topology. Then a hierarchical graph attention model is devised to capture the complex correlation between agents. Moreover, we incorporate graph contrastive learning as an auxiliary task in the reinforcement learning process to improve representation learning from graphs. Experiments on several real-world scenarios reveal that our approach achieves the best performance and can reduce the number of voltage violations remarkably.

Key words: demand-side management, graph neural networks, multi-agent reinforcement learning, voltage regulation

FENG Bingyi, FENG Mingxiao, WANG Minrui, ZHOU Wengang, LI Houqiang. Multi-Agent Hierarchical Graph Attention Reinforcement Learning for Grid-Aware Energy Management[J]. ZTE Communications, 2023, 21(3): 11-21.

Figures/Tables 6

References 37

1	LIU H T, WU W C. Online multi-agent reinforcement learning for decentralized inverter-based volt-VAR control [J]. IEEE transactions on smart grid, 2021, 12(4): 2980–2990. DOI: 10.1109/TSG.2021.3060027 DOI URL
2	WANG J H, XU W K, GU Y J, et al. Multi-agent reinforcement learning for active voltage control on power distribution networks [EB/OL]. (2021-10-27) [2023-06-10].
3	WANG D X, MENG K, GAO X D, et al. Coordinated dispatch of virtual energy storage systems in LV grids for voltage regulation [J]. IEEE transactions on industrial informatics, 2018, 14(6): 2452–2462. DOI: 10.1109/TII.2017.2769452 DOI URL
4	YI J, WANG P, TAYLOR P C, et al. Distribution network voltage control using energy storage and demand side response [C]//The 3rd IEEE PES Innovative Smart Grid Technologies Europe (ISGT Europe). IEEE, 2013: 1–8. DOI: 10.1109/ISGTEurope.2012.6465666 DOI URL
5	PIGOTT A, CROZIER C, BAKER K, et al. GridLearn: multiagent reinforcement learning for grid-aware building energy management [J]. Electric power systems research, 2022, 213: 108521. DOI: 10.1016/j.epsr.2022.108521 DOI URL
6	LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments [C]//The 31st International Conference on Neural Information Processing Systems. ACM, 2017: 6382–6393. DOI: 10.5555/3295222.3295385 DOI URL
7	RASHID T, DE WITT C S, FARQUHAR G, et al. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning [C]//35th International Conference on Machine Learning. ICML, 2018: 6846–6859. DOI: 10.48550/arXiv.1803.11485 DOI URL
8	YU C, VELU A, VINITSKY E, et al. The surprising effectiveness of PPO in cooperative multi-agent games [J]. Advances in neural information processing systems, 2022, 35: 24611–24624. DOI: 10.48550/arXiv.2103.01955 DOI URL
9	MA Y, HAO X, HAO J, et al. A hierarchical reinforcement learning based optimization framework for large-scale dynamic pickup and delivery problems [J]. Advances in neural information processing systems, 2021, 34: 23609–23620
10	ZHANG H C, FENG S Y, LIU C, et al. CityFlow: a multi-agent reinforcement learning environment for large scale city traffic scenario [C]//The World Wide Web Conference. ACM, 2019: 3620–3624. DOI: 10.1145/3308558.3314139 DOI URL
11	CHEN X, QU G N, TANG Y J, et al. Reinforcement learning for selective key applications in power systems: recent advances and future challenges [J]. IEEE transactions on smart grid, 2022, 13(4): 2935–2958. DOI: 10.1109/TSG.2022.3154718 DOI URL
12	VAZQUEZ-CANTELI J R, DEY S, HENZE G, et al. Citylearn: standardizing research in multi-agent reinforcement learning for demand response and urban energy management [EB/OL]. (2020-12-18) [2023-06-10].
13	PAPOUDAKIS G, CHRISTIANOS F, RAHMAN A, et al. Dealing with non-stationarity in multi-agent deep reinforcement learning [EB/OL]. (2019-6-11) [2023-06-10].
14	BAKER K. Learning warm-start points for Ac optimal power flow [C]//The 29th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2019: 1–6. DOI: 10.1109/MLSP.2019.8918690 DOI URL
15	OLIEHOEK F A, AMATO C. Finite-horizon dec-POMDPs [M]. A concise introduction to decentralized POMDPS. Cham: Springer, 2016: 33–40. DOI: 10.1007/978-3-319-28929-8_3 DOI URL
16	FOERSTER J, FARQUHAR G, AFOURAS T, et al. Counterfactual multi-agent policy gradients [C]//The AAAI Conference on Artificial Intelligence. ACM, 2018: 2974–2982. DOI: 10.1609/aaai.v32i1.11794 DOI URL
17	CHEN T Y, BU S R, LIU X, et al. Peer-to-peer energy trading and energy conversion in interconnected multi-energy microgrids using multi-agent deep reinforcement learning [J]. IEEE transactions on smart grid, 2022, 13(1): 715–727. DOI: 10.1109/TSG.2021.3124465 DOI URL
18	GLATT R, SILVA F L DA, SOPER B, et al. Collaborative energy demand response with decentralized actor and centralized critic [C]//The 8th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation. ACM, 2021: 333–337. DOI: 10.1145/3486611.3488732 DOI URL
19	VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks [EB/OL]. (2017-10-30) [2023-06-10].
20	XU K, LI C, TIAN Y, et al. Representation learning on graphs with jumping knowledge networks [C]//International conference on machine learning. PMLR, 2018: 5453–5462. DOI: 10.48550/arXiv.1806.03536 DOI URL
21	YOU Y N, CHEN T L, SUI Y D, et al. Graph contrastive learning with augmentations [C]//The 34th International Conference on Neural Information Processing Systems. ACM, 2020: 5812–5823. DOI: 10.5555/3495724.3496212 DOI URL
22	SRINIVAS A, LASKIN M, ABBEEL P. CURL: contrastive unsupervised representations for reinforcement learning [EB/OL]. (2020-04-08) [2023-06-10].
23	SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms [EB/OL]. (2017-08-28) [2023-06-10].
24	BARAN M E, WU F F. Network reconfiguration in distribution systems for loss reduction and load balancing [J]. IEEE power engineering review, 1989, 9(4): 101–102. DOI: 10.1109/MPER.1989.4310642 DOI URL
25	MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning [C]//The 33rd International Conference on International Conference on Machine Learning. ACM, 2016: 1928–1937. DOI: 10.48550/arXiv.1602.01783 DOI URL
26	CAO D, HU W H, ZHAO J B, et al. A multi-agent deep reinforcement learning based voltage regulation using coordinated PV inverters [J]. IEEE transactions on power systems, 2020, 35(5): 4120–4123. DOI: 10.1109/TPWRS.2020.3000652 DOI URL
27	ZHANG Y, WANG X N, WANG J H, et al. Deep reinforcement learning based volt-VAR optimization in smart distribution systems [J]. IEEE transactions on smart grid, 2021, 12(1): 361–371. DOI: 10.1109/TSG.2020.3010130 DOI URL
28	SUKHBAATAR S, SZLAM A, FERGUS R. Learning multiagent communication with backpropagation [C]//The 30th International Conference on Neural Information Processing Systems. ACM, 2016: 2252–2260. DOI: 10.5555/3157096.3157348 DOI URL
29	FOERSTER J N, ASSAEL Y M, DE FREITAS N, et al. Learning to communicate with Deep multi-agent reinforcement learning [C]//The 30th International Conference on Neural Information Processing Systems. ACM, 2016: 2145–2153. DOI: 10.5555/3157096.3157336 DOI URL
30	JIANG J C, LU Z Q. Learning attentional communication for multi-agent cooperation [C]//The 32nd International Conference on Neural Information Processing Systems. ACM, 2018: 7265–7275. DOI: 10.5555/3327757.3327828 DOI URL
31	GILMER J, SCHOENHOLZ S S, RILEY P F, et al. Neural message passing for Quantum chemistry [C]//The 34th International Conference on Machine Learning. ACM, 2017: 1263–1272. DOI: 10.5555/3305381.3305512 DOI URL
32	LEE J, LEE I, KANG J. Self-attention graph pooling [C]//International conference on machine learning. PMLR, 2019: 3734-3743. DOI: 10.48550/arXiv.1904.08082 DOI URL
33	IQBAL S, SHA F. Actor-attention-critic for multi-agent reinforcement learning [C]//International Conference on Machine Learning. PMLR, 2019: 2961–2970. DOI: 10.48550/arXiv.1810.02912 DOI URL
34	JIANG J C, DUN C, HUANG T J, et al. Graph convolutional reinforcement learning [EB/OL]. (2018-10-22) [2023-06-10].
35	RYU H, SHIN H, PARK J. Multi-agent actor-critic with hierarchical graph attention network [C]//The AAAI Conference on Artificial Intelligence. AAAI, 2020: 7236–7243. DOI: 10.1609/aaai.v34i05.6214 DOI URL
36	YOON D, HONG S, LEE B J, et al. Winning the L2RPN challenge: power grid management via semi-markov afterstate actor-critic [EB/OL]. (2021-01-13) [2023-06-10].
37	HOSSAIN R R, HUANG Q H, HUANG R K. Graph convolutional network-based topology embedded deep reinforcement learning for voltage stability control [J]. IEEE transactions on power systems, 2021, 36(5): 4848–4851. DOI: 10.1109/TPWRS.2021.3084469 DOI URL

	Climate Zone 2A				Climate Zone 3A				Climate Zone 4A				Climate Zone 5A
	NSVV↓	SRR↑	NHVV↓	HRR↑	NSVV↓	SRR↑	NHVV↓	HRR↑	NSVV↓	SRR↑	NHVV↓	HRR↑	NSVV↓	SRR↑	NHVV↓	HRR↑
RBC	86 181	0.0%	158 736	0.0%	110 902	0.0%	193 751	0.0%	83 648	0.0%	162 076	0.0%	106 823	0.0%	195 277	0.0%
A2C	79 905	7.3%	154 662	2.6%	101 102	8.8%	185 201	4.4%	81 648	2.4%	158 902	2.0%	93 365	12.6%	174 671	10.6%
PPO	79 601	7.6%	153 849	3.1%	100 954	9.0%	184 365	4.8%	81 224	2.9%	155 645	4.0%	92 920	13.0%	173 997	10.9%
MAA2C	73 264	15.0%	139 654	12.0%	89 423	19.4%	162 249	16.3%	74 569	10.9%	144 274	11.0%	79 369	25.7%	154 786	20.7%
MAPPO	73 919	14.2%	139 210	12.3%	88 236	20.4%	160 345	17.2%	74 126	11.4%	144 316	11.0%	78 314	26.7%	150 322	23.0%
HMAA2C	64 516	25.1%	125 497	20.9%	78 158	29.5%	146 392	24.4%	63 105	24.6%	122 568	24.4%	60 766	43.1%	127 494	34.7%
HMAPPO	63 320	26.5%	123 116	22.4%	77 724	29.9%	145 946	24.7%	62 865	24.8%	121 829	24.8%	59 887	43.9%	125 386	35.8%

	Climate Zone 2A				Climate Zone 3A				Climate Zone 4A				Climate Zone 5A
	NSVV↓	SRR↑	NHVV↓	HRR↑	NSVV↓	SRR↑	NHVV↓	HRR↑	NSVV↓	SRR↑	NHVV↓	HRR↑	NSVV↓	SRR↑	NHVV↓	HRR↑
RBC	86 181	0.0%	158 736	0.0%	110 902	0.0%	193 751	0.0%	83 648	0.0%	162 076	0.0%	106 823	0.0%	195 277	0.0%
A2C	79 905	7.3%	154 662	2.6%	101 102	8.8%	185 201	4.4%	81 648	2.4%	158 902	2.0%	93 365	12.6%	174 671	10.6%
PPO	79 601	7.6%	153 849	3.1%	100 954	9.0%	184 365	4.8%	81 224	2.9%	155 645	4.0%	92 920	13.0%	173 997	10.9%
MAA2C	73 264	15.0%	139 654	12.0%	89 423	19.4%	162 249	16.3%	74 569	10.9%	144 274	11.0%	79 369	25.7%	154 786	20.7%
MAPPO	73 919	14.2%	139 210	12.3%	88 236	20.4%	160 345	17.2%	74 126	11.4%	144 316	11.0%	78 314	26.7%	150 322	23.0%
HMAA2C	64 516	25.1%	125 497	20.9%	78 158	29.5%	146 392	24.4%	63 105	24.6%	122 568	24.4%	60 766	43.1%	127 494	34.7%
HMAPPO	63 320	26.5%	123 116	22.4%	77 724	29.9%	145 946	24.7%	62 865	24.8%	121 829	24.8%	59 887	43.9%	125 386	35.8%

Multi-Agent Hierarchical Graph Attention Reinforcement Learning for Grid-Aware Energy Management

RichHTML

PDF

Knowledge

Abstract

Cite this article

share this article

Figures/Tables 6

References 37

Related Articles 0

Recommended Articles 0

Metrics