End-to-End Chinese Entity Recognition Based on BERT-BiLSTM-ATT-CRF

doi:10.12142/ZTECOM.2022S1005

Abstract

Abstract:

Traditional named entity recognition methods need professional domain knowledge and a large amount of human participation to extract features, as well as the Chinese named entity recognition method based on a neural network model, which brings the problem that vector representation is too singular in the process of character vector representation. To solve the above problem, we propose a Chinese named entity recognition method based on the BERT-BiLSTM-ATT-CRF model. Firstly, we use the bidirectional encoder representations from transformers (BERT) pre-training language model to obtain the semantic vector of the word according to the context information of the word; Secondly, the word vectors trained by BERT are input into the bidirectional long-term and short-term memory network embedded with attention mechanism (BiLSTM-ATT) to capture the most important semantic information in the sentence; Finally, the conditional random field (CRF) is used to learn the dependence between adjacent tags to obtain the global optimal sentence level tag sequence. The experimental results show that the proposed model achieves state-of-the-art performance on both Microsoft Research Asia (MSRA) corpus and people’s daily corpus, with F1 values of 94.77% and 95.97% respectively.

Key words: named entity recognition (NER), feature extraction, BERT model, BiLSTM, attention mechanism, CRF

LI Daiyi, TU Yaofeng, ZHOU Xiangsheng, ZHANG Yangming, MA Zongmin. End-to-End Chinese Entity Recognition Based on BERT-BiLSTM-ATT-CRF[J]. ZTE Communications, 2022, 20(S1): 27-35.

Figures/Tables 9

References 38

1	GRIDACH M. Character-level neural network for biomedical named entity recognition [J]. Journal of biomedical informatics, 2017, 70: 85–91. DOI: 10.1016/j.jbi.2017.05.002 DOI URL
2	LAMPLE G, BALLESTEROS M, SUBRAMANIAN S, et al. Neural architectures for named entity recognition [C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2016. DOI: 10.18653/v1/n16-1030 DOI URL
3	MA X Z, HOVY E. End-to-end sequence labeling via Bi-directional LSTM-CNNs-CRF [C]//54th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2016: 1064–1074
4	SHIN Y, LEE S G. Learning context using segment-level LSTM for neural sequence labeling [J]. IEEE/ACM transactions on audio, speech, and language processing, 2020, 28: 105–115. DOI: 10.1109/TASLP.2019.2948773 DOI URL
5	DONG D Z, OUYANG S. Optimization Techniques of Network Communication in Distributed Deep Learning Systems [J]. ZTE technology journal, 2020, 26(5): 2-8. DOI：10.12142/ZTETJ.202005002 DOI
6	HAMMERTON J. Named entity recognition with long short-term memory [C]//Proceedings of the seventh conference on natural language learning at HLT-NAACL. Association for Computational Linguistics, 2003: 172–175. DOI: 10.3115/1119176.1119202 DOI URL
7	CHIU J P C, NICHOLS E. Named entity recognition with bidirectional LSTM-CNNs [J]. Transactions of the association for computational linguistics, 2016, 4: 357–370. DOI: 10.1162/tacl_a_00104 DOI URL
8	LI L S, GUO Y K. Biomedical named entity recognition based on CNN-BLSTM-CRF model [J]. Journal of Chinese information processing, 2018, 32(1): 116–122. DOI: 10.3969/j.issn.1003-0077.2018.01.015 DOI URL
9	LUO L, YANG Z H, YANG P, et al. An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition [J]. Bioinformatics, 2017, 34(8): 1381–1388. DOI: 10.1093/bioinformatics/btx761 DOI URL
10	WU F Z, LIU J X, WU C H, et al. Neural Chinese named entity recognition via CNN-LSTM-CRF and joint training with word segmentation [C]//The World Wide Web Conference. ACM, 2019: 3342–3348. DOI: 10.1145/3308558.3313743 DOI URL
11	QIN Y, SHEN G W, ZHAO W B, et al. Network security entity recognition method based on deep neural network [J]. Journal of Nanjing university (natural science), 2019, 55 (1): 29–40
12	ZHANG Y, YANG J. Chinese NER using lattice LSTM [EB/OL]. (2018-07-05)[2020-05-01].
13	WANG L, XIE Y, ZHOU J S, et al. Fragment level Chinese named entity recognition based on neural network [J]. Journal of Chinese information processing, 2018, 32 (3): 84 – 90, 100. DOI: 10. 3969/j.issn.1003-0077.2018.03.012 DOI URL
14	LIU X J, GU L C, SHI X Z. Named entity recognition based on BiLSTM and attention mechanism [J]. Journal of luoyang institute of technology, 2019, 29 (1): 65 – 70
15	LIU W, XU T G, XU Q H, et al. An encoding strategy based word-character LSTM for Chinese NER [C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2019: 2379–2389
16	DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [EB/OL]. (2018-10-11)[2020-05-01].
17	MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [EB/OL]. (2013-09-07)[2021-05-01].
18	PENNINGTON J, SOCHER R, MANNING C. Glove: global vectors for word representation [C]//Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2014. DOI: 10.3115/v1/d14-1162 DOI URL
19	PETERS M E, NEUMANN M, IYYER M, et al. Deep Contextualized Word Representations [C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2018: 2227–2237. DOI: 10.18653/v1/N18-1202 DOI URL
20	VASWANI A, SHAZEER N, PARMAR N. et al. Attention is all you need [C]//Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems. NIPS, 2017: 5998–6008
21	JOZEFOWICZ R, ZAREMBA W, SUTSKEVER I. An empirical exploration of recurrent network architectures [C]//32nd International Conference on Machine Learning. JMLR, 2015: 2342–2350
22	GUO D, ZHENG Q F, PENG X J, et al. Face detection detection, alignment alignment, quality assessment and attribute analysis with multi-task hybrid convolutional neural networks [J]. ZTE Communications, 2019, 17(3): 15–22. DOI: 10.12142/ZTECOM.201903004 DOI URL
23	GRAVES A, SCHMIDHUBER J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures [J]. Neural networks, 2005, 18(5/6): 602–610. DOI: 10.1016/j.neunet.2005.06.042 DOI URL
24	TAN Z X, WANG M X, XIE J, et al. Deep semantic role labeling with self-attention [EB/OL]. (2017-12-05)[2020-05-01].
25	SHEN T, ZHOU T Y, LONG G D, et al. DiSAN: directional self-attention network for RNN/CNN-free language understanding [EB/OL]. (2017-11-20)[2020-05-01].
26	LAFFERTY J, MCCALLUM A, PEREIRA F. Conditional random fields: probabilistic models for segmenting and labeling sequence data [C]//18th International Conference on Machine Learning 2001 (ICML2001. ACM, 2001: 282–289
27	ZHU Y Y, WANG G X, KARLSSON B F. CAN-NER: Convolutional attention network for Chinese named entity recognition [EB/OL]. (2019-04-30)[2020-05-01].
28	VITERBI A. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm [J]. IEEE transactions on information theory, 1967, 13(2): 260–269. DOI: 10.1109/TIT.1967.1054010 DOI URL
29	SI N W, WANG H J, LI W, et al. Chinese part of speech tagging model based on attentional long-term memory network [J]. Computer science, 2018, 45 (4): 66–70
30	LEVOW G A. The third international Chinese language processing bakeoff: word segmentation and named entity recognition [C]//Fifth SIGHAN Workshop on Chinese Language Processing. Association for Computational Linguistics, 2006: 108–117
31	CHEN A T, PENG F C, SHAN R, et al. Chinese named entity recognition with conditional probabilistic models [C]//Fifth SIGHAN Workshop on Chinese Language Processing. Association for Computational Linguistics, 2006: 173–176
32	ZHANG S X, QIN Y, WEN J, et al. Word segmentation and named entity recognition for sighan bakeoff3 [C]//Fifth SIGHAN Workshop on Chinese Language Processing. Association for Computational Linguistics, 2013: 158–161
33	ZHOU J S, QU W G, ZHANG F. Chinese named entity recognition via joint identification and categorization [J]. Chinese journal of electronics, 2013, 22(2): 225–230
34	LU Y N, ZHANG Y, JI D H. Multiprototype Chinese character embedding [C]//Tenth International Conference on Language Resources and Evaluation. Association for Computational Linguistics, 2016: 855-859
35	DONG C H, ZHANG J J, ZONG C Q, et al. Character-based LSTM-CRF with radical-level features for Chinese named entity recognition [M]//Natural Language Understanding and Intelligent Applications. Cham, witzerland: Springer International Publishing, 2016: 239–250. DOI: 10.1007/978-3-319-50496-4_20 DOI URL
36	LI N, GUAN H M, YANG P, et al. Chinese named entity recognition method based on BERT-IDCNN-CRF [J]. Journal of shandong university (science edition), 2020, 55 (1): 102–109
37	XIE T, YANG J N, LIU H. Chinese entity recognition based on BERT-BiLSTM-CRF model [J]. Computer systems & applications, 2020(7): 48–55
38	HE H. HanLP: Han language processing [EB/OL]. (2020-04-30)[2020-07-01].

Dataset	Type	Train	Dev	Test
People’s Daily	Sentence	17.6k	0.9k	1.7k
MSRA	Sentence	46.4k	Null	4.4k

Dataset	Type	Train	Dev	Test
People’s Daily	Sentence	17.6k	0.9k	1.7k
MSRA	Sentence	46.4k	Null	4.4k

Layer	Parameter	Value
BERT	Transformer layer number	12
	Hidden layer dimension	768
	Head number	12
BiLSTM	Optimizer	Adam
	Batch size	32
	Dropout rate	0.5
	Learning rate	0.001 5
	Hidden layer number	200

Layer	Parameter	Value
BERT	Transformer layer number	12
	Hidden layer dimension	768
	Head number	12
BiLSTM	Optimizer	Adam
	Batch size	32
	Dropout rate	0.5
	Learning rate	0.001 5
	Hidden layer number	200

Model	P/%	R/%	F1/%
LSTM-CRF	84.20	80.20	82.00
BiLSTM	81.08	79.21	80.05
BiLSTM-CRF	87.21	83.21	85.09
BERT-BiLSTM-CRF	96.04	95.30	95.67
BERT-BiLSTM-ATT-CRF	96.28	95.67	95.97