ZTE Communications ›› 2023, Vol. 21 ›› Issue (3): 22-28. DOI: 10.12142/ZTECOM.202303004
• Special Topic •
Received: 2023-06-16
Online: 2023-09-21
Published: 2023-09-21
About author:
YU Junpeng received his master's degree in communication and information systems. He is a senior engineer with the Nanjing Research Institute of Electronics Technology and the deputy secretary-general of the Intelligent Perception Special Committee of the Jiangsu Association of Artificial Intelligence. His research interests include radar systems and intelligent processing technologies based on artificial intelligence. He has participated in many key artificial intelligence projects sponsored by the Ministry of Science and Technology of the People's Republic of China. | CHEN Yiyu (
Supported by:
YU Junpeng, CHEN Yiyu. A Practical Reinforcement Learning Framework for Automatic Radar Detection[J]. ZTE Communications, 2023, 21(3): 22-28.
URL: https://zte.magtechjournal.com/EN/10.12142/ZTECOM.202303004
Table 1 Online testing results

| Test | Random Policy | Proposed | Experts |
|---|---|---|---|
| Trial 1 | -9.24 | 24.32 | 26.14 |
| Trial 2 | 12.78 | 28.77 | 29.33 |
| Trial 3 | 6.34 | 25.38 | 30.85 |
| Average | 3.29 | 26.16 | 28.77 |
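Table 1 reports per-trial test scores for a random policy, the proposed framework, and expert operation, with the Average row being the arithmetic mean over the three trials. As a minimal illustration (the values are copied directly from the table; treating each number as a single per-trial score is the only assumption), the Average row can be recomputed as follows:

```python
# Minimal sketch: recompute the "Average" row of Table 1 from the per-trial scores.
scores = {
    "Random Policy": [-9.24, 12.78, 6.34],
    "Proposed":      [24.32, 28.77, 25.38],
    "Experts":       [26.14, 29.33, 30.85],
}

for policy, trials in scores.items():
    avg = sum(trials) / len(trials)
    print(f"{policy}: average = {avg:.2f}")
# Rounded to two decimals this reproduces Table 1: 3.29, 26.16, 28.77.
```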