ZTE Communications ›› 2022, Vol. 20 ›› Issue (S1): 27–35. DOI: 10.12142/ZTECOM.2022S1005
• Research Paper •
LI Daiyi1, TU Yaofeng2, ZHOU Xiangsheng2, ZHANG Yangming2, MA Zongmin1
Online: 2022-01-25
Published: 2022-03-01
LI Daiyi, TU Yaofeng, ZHOU Xiangsheng, ZHANG Yangming, MA Zongmin. End-to-End Chinese Entity Recognition Based on BERT-BiLSTM-ATT-CRF [J]. ZTE Communications, 2022, 20(S1): 27–35.
URL: https://zte.magtechjournal.com/EN/10.12142/ZTECOM.2022S1005
Table 1 Statistics of datasets

Dataset | Type | Train | Dev | Test
---|---|---|---|---
People’s Daily | Sentence | 17.6k | 0.9k | 1.7k
MSRA | Sentence | 46.4k | N/A | 4.4k
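The “Sentence” counts in Table 1 are numbers of whole sentences, not tokens. As a minimal sketch, assuming the corpora are stored in the usual CoNLL-style format (one character–tag pair per line, sentences separated by blank lines; the file name below is hypothetical), the counts can be reproduced as follows:

```python
# Count sentences in a CoNLL-style NER file: one "character tag" pair
# per line, with a blank line closing each sentence. The format and the
# file name are assumptions; the paper does not describe its loaders.
def count_sentences(path):
    sentences, tokens_in_current = 0, 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():               # token line, e.g. "中 B-LOC"
                tokens_in_current += 1
            elif tokens_in_current:        # blank line ends a sentence
                sentences += 1
                tokens_in_current = 0
    return sentences + (1 if tokens_in_current else 0)  # no trailing blank line

print(count_sentences("msra_train.txt"))   # expected ≈ 46.4k per Table 1
```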
Table 2 Optimal hyper-parameter values of BERT-BiLSTM-ATT-CRF model

Layer | Parameter | Value
---|---|---
BERT | Transformer layer number | 12
 | Hidden layer dimension | 768
 | Head number | 12
BiLSTM | Optimizer | Adam
 | Batch size | 32
 | Dropout rate | 0.5
 | Learning rate | 0.0015
 | Hidden layer number | 200
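For concreteness, the following is a minimal PyTorch sketch of how the layers in Table 2 can be wired together. The authors do not release code, so the exact wiring, the bert-base-chinese checkpoint, the scaled dot-product self-attention variant, and the pytorch-crf dependency are assumptions for illustration rather than the paper's implementation.

```python
# Sketch of a BERT-BiLSTM-ATT-CRF tagger using the Table 2 values:
# 12-layer / 768-d / 12-head BERT, a BiLSTM with hidden size 200,
# dropout 0.5, and a CRF output layer. All wiring is assumed.
import torch
import torch.nn as nn
from transformers import BertModel   # Hugging Face Transformers
from torchcrf import CRF             # assumed: the pytorch-crf package

class BertBiLSTMAttCRF(nn.Module):
    def __init__(self, num_tags, lstm_hidden=200, dropout=0.5):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.lstm = nn.LSTM(768, lstm_hidden, batch_first=True,
                            bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.emit = nn.Linear(2 * lstm_hidden, num_tags)  # emission scores
        self.crf = CRF(num_tags, batch_first=True)

    def _features(self, input_ids, attention_mask):
        x = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        h, _ = self.lstm(x)
        # scaled dot-product self-attention over the BiLSTM outputs
        scores = h @ h.transpose(1, 2) / (h.size(-1) ** 0.5)
        scores = scores.masked_fill(attention_mask[:, None, :] == 0, -1e9)
        h = torch.softmax(scores, dim=-1) @ h
        return self.emit(self.dropout(h))

    def loss(self, input_ids, attention_mask, tags):
        # negative log-likelihood of the gold tag sequence under the CRF
        feats = self._features(input_ids, attention_mask)
        return -self.crf(feats, tags, mask=attention_mask.bool())

    def decode(self, input_ids, attention_mask):
        # Viterbi decoding of the best-scoring tag sequence
        feats = self._features(input_ids, attention_mask)
        return self.crf.decode(feats, mask=attention_mask.bool())
```

Under these assumptions, training would use torch.optim.Adam with the Table 2 learning rate of 0.0015 and batch size 32; the table does not say whether that rate applies to the BERT parameters as well or only to the layers above them.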
Table 3 Test results on People’s Daily corpus

Model | P/% | R/% | F1/%
---|---|---|---
LSTM-CRF | 84.20 | 80.20 | 82.00
BiLSTM | 81.08 | 79.21 | 80.05
BiLSTM-CRF | 87.21 | 83.21 | 85.09
BERT-BiLSTM-CRF | 96.04 | 95.30 | 95.67
BERT-BiLSTM-ATT-CRF | 96.28 | 95.67 | 95.97
Table 4 Test results on MSRA corpus

Model | P/% | R/% | F1/%
---|---|---|---
LSTM-CRF | 83.45 | 80.20 | 82.00
BiLSTM | 78.72 | 79.21 | 80.05
BiLSTM-CRF | 86.79 | 83.21 | 85.09
BERT-BiLSTM-CRF | 94.38 | 94.92 | 94.65
BERT-BiLSTM-ATT-CRF | 94.52 | 95.02 | 94.77
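The P/%, R/% and F1/% columns in Tables 3 and 4 follow the standard entity-level definition: a predicted entity counts as correct only if both its span and its type match the gold annotation, with P = TP/(TP+FP), R = TP/(TP+FN), and F1 their harmonic mean. A minimal sketch of this computation (the authors' evaluation script is not published):

```python
# Entity-level precision/recall/F1. Each argument is a set of
# (sentence_id, start, end, entity_type) tuples; exact span-and-type
# match is required, per the standard NER evaluation convention.
def prf1(gold, pred):
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = {(0, 0, 2, "PER"), (0, 5, 8, "LOC"), (1, 1, 3, "ORG"), (1, 6, 7, "PER")}
pred = {(0, 0, 2, "PER"), (0, 5, 8, "LOC"), (1, 2, 3, "ORG")}
print(prf1(gold, pred))   # ≈ (0.667, 0.500, 0.571)
```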
Table 5 Different models compared on MSRA corpus

Model | P/% | R/% | F1/%
---|---|---|---
CHEN et al. (2006)[31] | 91.22 | 81.71 | 86.20
ZHANG et al. (2006)[32] | 92.20 | 90.18 | 91.18
ZHOU et al. (2013)[33] | 91.86 | 88.75 | 90.28
LU et al. (2016)[34] | N/A | N/A | 87.94
Radical-BiLSTM-CRF (2016)[35] | 91.28 | 90.62 | 90.95
IDCNN-CRF (2017) | 89.39 | 84.64 | 86.95
Lattice-LSTM-CRF (2018)[12] | 93.57 | 92.79 | 93.18
CNN-BiLSTM-CRF (2019) | 91.63 | 90.56 | 91.09
WC-LSTM-pretrain (2019)[15] | N/A | N/A | 93.74
BERT-IDCNN-CRF (2020)[36] | 94.86 | 93.97 | 94.41
BERT-BiLSTM-CRF (2020)[37] | 94.38 | 94.92 | 94.65
HanLP (BERT)[38] | 94.79 | 95.65 | 95.22
BERT-BiLSTM-ATT-CRF (ours) | 94.52 | 95.02 | 94.77
References

[1] GRIDACH M. Character-level neural network for biomedical named entity recognition [J]. Journal of biomedical informatics, 2017, 70: 85–91. DOI: 10.1016/j.jbi.2017.05.002
[2] LAMPLE G, BALLESTEROS M, SUBRAMANIAN S, et al. Neural architectures for named entity recognition [C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2016. DOI: 10.18653/v1/n16-1030
[3] MA X Z, HOVY E. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF [C]//54th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2016: 1064–1074
[4] SHIN Y, LEE S G. Learning context using segment-level LSTM for neural sequence labeling [J]. IEEE/ACM transactions on audio, speech, and language processing, 2020, 28: 105–115. DOI: 10.1109/TASLP.2019.2948773
[5] DONG D Z, OUYANG S. Optimization techniques of network communication in distributed deep learning systems [J]. ZTE technology journal, 2020, 26(5): 2–8. DOI: 10.12142/ZTETJ.202005002
[6] HAMMERTON J. Named entity recognition with long short-term memory [C]//Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL. Association for Computational Linguistics, 2003: 172–175. DOI: 10.3115/1119176.1119202
[7] CHIU J P C, NICHOLS E. Named entity recognition with bidirectional LSTM-CNNs [J]. Transactions of the association for computational linguistics, 2016, 4: 357–370. DOI: 10.1162/tacl_a_00104
[8] LI L S, GUO Y K. Biomedical named entity recognition based on CNN-BLSTM-CRF model [J]. Journal of Chinese information processing, 2018, 32(1): 116–122. DOI: 10.3969/j.issn.1003-0077.2018.01.015
[9] LUO L, YANG Z H, YANG P, et al. An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition [J]. Bioinformatics, 2018, 34(8): 1381–1388. DOI: 10.1093/bioinformatics/btx761
[10] WU F Z, LIU J X, WU C H, et al. Neural Chinese named entity recognition via CNN-LSTM-CRF and joint training with word segmentation [C]//The World Wide Web Conference. ACM, 2019: 3342–3348. DOI: 10.1145/3308558.3313743
[11] QIN Y, SHEN G W, ZHAO W B, et al. Network security entity recognition method based on deep neural network [J]. Journal of Nanjing university (natural science), 2019, 55(1): 29–40
[12] ZHANG Y, YANG J. Chinese NER using lattice LSTM [EB/OL]. (2018-07-05)[2020-05-01].
[13] WANG L, XIE Y, ZHOU J S, et al. Fragment level Chinese named entity recognition based on neural network [J]. Journal of Chinese information processing, 2018, 32(3): 84–90, 100. DOI: 10.3969/j.issn.1003-0077.2018.03.012
[14] LIU X J, GU L C, SHI X Z. Named entity recognition based on BiLSTM and attention mechanism [J]. Journal of Luoyang institute of technology, 2019, 29(1): 65–70
[15] LIU W, XU T G, XU Q H, et al. An encoding strategy based word-character LSTM for Chinese NER [C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2019: 2379–2389
[16] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [EB/OL]. (2018-10-11)[2020-05-01].
[17] MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [EB/OL]. (2013-09-07)[2021-05-01].
[18] PENNINGTON J, SOCHER R, MANNING C. GloVe: global vectors for word representation [C]//Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2014. DOI: 10.3115/v1/d14-1162
[19] PETERS M E, NEUMANN M, IYYER M, et al. Deep contextualized word representations [C]//Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2018: 2227–2237. DOI: 10.18653/v1/N18-1202
[20] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]//Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems. NIPS, 2017: 5998–6008
[21] JOZEFOWICZ R, ZAREMBA W, SUTSKEVER I. An empirical exploration of recurrent network architectures [C]//32nd International Conference on Machine Learning. JMLR, 2015: 2342–2350
[22] GUO D, ZHENG Q F, PENG X J, et al. Face detection, alignment, quality assessment and attribute analysis with multi-task hybrid convolutional neural networks [J]. ZTE Communications, 2019, 17(3): 15–22. DOI: 10.12142/ZTECOM.201903004
[23] GRAVES A, SCHMIDHUBER J. Framewise phoneme classification with bidirectional LSTM and other neural network architectures [J]. Neural networks, 2005, 18(5/6): 602–610. DOI: 10.1016/j.neunet.2005.06.042
[24] TAN Z X, WANG M X, XIE J, et al. Deep semantic role labeling with self-attention [EB/OL]. (2017-12-05)[2020-05-01].
[25] SHEN T, ZHOU T Y, LONG G D, et al. DiSAN: directional self-attention network for RNN/CNN-free language understanding [EB/OL]. (2017-11-20)[2020-05-01].
[26] LAFFERTY J, MCCALLUM A, PEREIRA F. Conditional random fields: probabilistic models for segmenting and labeling sequence data [C]//18th International Conference on Machine Learning (ICML 2001). ACM, 2001: 282–289
[27] ZHU Y Y, WANG G X, KARLSSON B F. CAN-NER: convolutional attention network for Chinese named entity recognition [EB/OL]. (2019-04-30)[2020-05-01].
[28] VITERBI A. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm [J]. IEEE transactions on information theory, 1967, 13(2): 260–269. DOI: 10.1109/TIT.1967.1054010
[29] SI N W, WANG H J, LI W, et al. Chinese part-of-speech tagging model based on attentional long short-term memory network [J]. Computer science, 2018, 45(4): 66–70
[30] LEVOW G A. The third international Chinese language processing bakeoff: word segmentation and named entity recognition [C]//Fifth SIGHAN Workshop on Chinese Language Processing. Association for Computational Linguistics, 2006: 108–117
[31] CHEN A T, PENG F C, SHAN R, et al. Chinese named entity recognition with conditional probabilistic models [C]//Fifth SIGHAN Workshop on Chinese Language Processing. Association for Computational Linguistics, 2006: 173–176
[32] ZHANG S X, QIN Y, WEN J, et al. Word segmentation and named entity recognition for SIGHAN bakeoff3 [C]//Fifth SIGHAN Workshop on Chinese Language Processing. Association for Computational Linguistics, 2006: 158–161
[33] ZHOU J S, QU W G, ZHANG F. Chinese named entity recognition via joint identification and categorization [J]. Chinese journal of electronics, 2013, 22(2): 225–230
[34] LU Y N, ZHANG Y, JI D H. Multiprototype Chinese character embedding [C]//Tenth International Conference on Language Resources and Evaluation. Association for Computational Linguistics, 2016: 855–859
[35] DONG C H, ZHANG J J, ZONG C Q, et al. Character-based LSTM-CRF with radical-level features for Chinese named entity recognition [M]//Natural Language Understanding and Intelligent Applications. Cham, Switzerland: Springer International Publishing, 2016: 239–250. DOI: 10.1007/978-3-319-50496-4_20
[36] LI N, GUAN H M, YANG P, et al. Chinese named entity recognition method based on BERT-IDCNN-CRF [J]. Journal of Shandong university (science edition), 2020, 55(1): 102–109
[37] XIE T, YANG J N, LIU H. Chinese entity recognition based on BERT-BiLSTM-CRF model [J]. Computer systems & applications, 2020(7): 48–55
[38] HE H. HanLP: Han language processing [EB/OL]. (2020-04-30)[2020-07-01].