ZTE Communications ›› 2023, Vol. 21 ›› Issue (3): 22-28. DOI: 10.12142/ZTECOM.202303004
• Special Topic •
Received: 2023-06-16
Online: 2023-09-21
Published: 2023-09-21
About author:
YU Junpeng received his master's degree in communication and information systems. He is a senior engineer with the Nanjing Research Institute of Electronics Technology and the deputy secretary-general of the Intelligent Perception Special Committee of the Jiangsu Association of Artificial Intelligence. His research interests include radar systems and intelligent processing technologies based on artificial intelligence. He has participated in many key artificial intelligence projects sponsored by the Ministry of Science and Technology of the People's Republic of China. | CHEN Yiyu (
Supported by:
YU Junpeng, CHEN Yiyu. A Practical Reinforcement Learning Framework for Automatic Radar Detection[J]. ZTE Communications, 2023, 21(3): 22-28.
URL: https://zte.magtechjournal.com/EN/10.12142/ZTECOM.202303004
Table 1 Online testing results

| Test | Random Policy | Proposed | Experts |
|---|---|---|---|
| Trial 1 | -9.24 | 24.32 | 26.14 |
| Trial 2 | 12.78 | 28.77 | 29.33 |
| Trial 3 | 6.34 | 25.38 | 30.85 |
| Average | 3.29 | 26.16 | 28.77 |
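Table 1 reports per-trial test scores for a random policy, the proposed framework, and expert operation, with the Average row being the arithmetic mean over the three trials. As a minimal illustration (the values are copied directly from the table; treating each number as a single per-trial score is the only assumption), the Average row can be recomputed as follows:

```python
# Minimal sketch: recompute the "Average" row of Table 1 from the per-trial scores.
scores = {
    "Random Policy": [-9.24, 12.78, 6.34],
    "Proposed":      [24.32, 28.77, 25.38],
    "Experts":       [26.14, 29.33, 30.85],
}

for policy, trials in scores.items():
    avg = sum(trials) / len(trials)
    print(f"{policy}: average = {avg:.2f}")
# Rounded to two decimals this reproduces Table 1: 3.29, 26.16, 28.77.
```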