TY - JOUR
T1 - MEIM: a multi-source software knowledge entity extraction integration model
AU - Lv, Wuqian
AU - Liao, Zhifang
AU - Liu, Shengzong
AU - Zhang, Yan
N1 - Funding Information:
Acknowledgement: The work described in this paper is supported by the Ministry of Science and Technology: Key Research and Development Project (2018YFB003800), Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and Technology (Hunan University of Finance and Economics) 2017TP1025 and HNNSF 2018JJ2535. We are also grateful to corresponding author Shengzong Liu and his project NSF61802120.
PY - 2021
Y1 - 2021
N2 - Entity recognition and extraction are the foundations of knowledge graph construction. Entity data in software engineering come from different platforms and communities and have different formats. This paper divides multi-source software knowledge entities into unstructured data, semi-structured data and code data. For these different types of data, Bi-directional Long Short-Term Memory (Bi-LSTM) with Conditional Random Field (CRF), template matching, and abstract syntax trees are used and integrated into a multi-source software knowledge entity extraction integration model (MEIM) to extract software entities. The model can be updated continuously based on user feedback to improve its accuracy. To address the shortage of entity annotation datasets, keyword extraction methods based on Term Frequency-Inverse Document Frequency (TF-IDF), TextRank, and K-Means are applied to annotation tasks. MEIM is applied to the Spring Boot framework, where it demonstrates good adaptability. The extracted entities are used to construct a knowledge graph, which is applied to association retrieval and association visualization.
AB - Entity recognition and extraction are the foundations of knowledge graph construction. Entity data in software engineering come from different platforms and communities and have different formats. This paper divides multi-source software knowledge entities into unstructured data, semi-structured data and code data. For these different types of data, Bi-directional Long Short-Term Memory (Bi-LSTM) with Conditional Random Field (CRF), template matching, and abstract syntax trees are used and integrated into a multi-source software knowledge entity extraction integration model (MEIM) to extract software entities. The model can be updated continuously based on user feedback to improve its accuracy. To address the shortage of entity annotation datasets, keyword extraction methods based on Term Frequency-Inverse Document Frequency (TF-IDF), TextRank, and K-Means are applied to annotation tasks. MEIM is applied to the Spring Boot framework, where it demonstrates good adaptability. The extracted entities are used to construct a knowledge graph, which is applied to association retrieval and association visualization.
KW - Entity extraction
KW - Software data
KW - Software knowledge graph
U2 - 10.32604/cmc.2020.012478
DO - 10.32604/cmc.2020.012478
M3 - Article
AN - SCOPUS:85096494701
SN - 1546-2218
VL - 66
SP - 1027
EP - 1042
JO - Computers, Materials and Continua
JF - Computers, Materials and Continua
IS - 1
ER -