pp. 4489-4504
S&M2424 Research Paper of Special Issue
https://doi.org/10.18494/SAM.2020.3127
Published: December 29, 2020
Rapid Extraction of Research Areas from Scientific and Technological Literature
Chuan Yin, Wanzeng Liu, Duoduo Yin, Xi Zhai, Kexin Liu, Changfeng Jing, and He Huang
(Received September 29, 2020; Accepted December 8, 2020)
Keywords: smart city, knowledge extraction, study area extraction, BiLSTM-CRF, random forest model
With the rapid development of Internet Plus, big data, and related technologies, the construction of smart cities is driving the transformation of mapping and geographic information service models from traditional information services to intelligent services with spatial sensing. At present, however, most of the knowledge needed to provide intelligent services is implicit in unstructured text in books and journal papers in related fields, which makes it difficult to capture, use, analyze, and share. Geographical feature knowledge in particular is one of the types of knowledge that most urgently needs to be extracted. To address this problem, we propose a method for the rapid extraction of research areas from the abstracts of scientific and technological literature. First, with the help of a general named entity recognition tool, we propose a method of rapidly annotating place-name entities in administrative divisions. Then, by combining the bidirectional long short-term memory conditional random field (BiLSTM-CRF) model with a place-name database covering five levels of administrative divisions in China, we realize the identification, disambiguation, and relationship extraction of place names in different administrative divisions. On this basis, the extraction of research areas is treated as a binary classification problem: feature vectors such as frequency and location are constructed for the names of the extracted administrative divisions, and a classification model is built with the random forest algorithm to rapidly extract research areas. The experimental results show that the recognition accuracy of place names in administrative divisions is 92.61% and the recognition accuracy of research areas is 90.31%. These results are superior to those of similar algorithms; thus, the proposed method can accurately and rapidly extract research areas.
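The feature-construction step described in the abstract can be illustrated with a minimal sketch. It assumes that place names have already been recognized (plain substring matching stands in here for the BiLSTM-CRF recognizer), and it computes a mention frequency and a relative first-mention position for each name. The `PlaceFeatures` class, the `build_features` function, and the `in_title` feature are hypothetical names and choices made for illustration; the abstract states only that features "such as frequency and location" are used.

```python
from dataclasses import dataclass


@dataclass
class PlaceFeatures:
    name: str
    frequency: int         # number of mentions in the abstract
    first_position: float  # relative offset of the first mention (0.0 = start)
    in_title: bool         # assumed extra location feature, not from the paper


def build_features(title: str, abstract: str,
                   place_names: list[str]) -> list[PlaceFeatures]:
    """Build one feature record per place name that occurs in the text.

    Substring matching is a stand-in for a real place-name recognizer.
    """
    features = []
    for name in place_names:
        count = abstract.count(name)
        if count == 0 and name not in title:
            continue  # name does not occur anywhere; nothing to classify
        first = abstract.find(name)
        # Names found only in the title get the latest possible position.
        relative = first / len(abstract) if first >= 0 else 1.0
        features.append(PlaceFeatures(name, count, relative, name in title))
    return features


title = "Land use change in Beijing"
abstract = "Beijing is studied. Data from Hebei were compared with Beijing."
feats = build_features(title, abstract, ["Beijing", "Hebei", "Shanghai"])
for f in feats:
    print(f)
```

Records of this kind would then be converted to numeric vectors and passed to a binary classifier (a random forest in the paper) that labels each place name as a research area or not.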
Corresponding author: Wanzeng Liu
This work is licensed under a Creative Commons Attribution 4.0 International License.
Cite this article: Chuan Yin, Wanzeng Liu, Duoduo Yin, Xi Zhai, Kexin Liu, Changfeng Jing, and He Huang, Rapid Extraction of Research Areas from Scientific and Technological Literature, Sens. Mater., Vol. 32, No. 12, 2020, p. 4489-4504.