Coreference resolution of entities has received little study in the field of information security. This paper proposes a hybrid method to address the problem. The work consists of two parts: the first extracts all candidates (including noun phrases, pronouns, entities, and nested phrases) from a given document and classifies them; the second resolves coreference among the selected candidates. In the first part, rules are combined with a deep learning model (Dictionary BiLSTM-Attention-CRF, or DBAC) to extract and classify all candidates in the text. The DBAC model introduces a domain dictionary matching mechanism, which derives new features for words and their contexts from the domain dictionary. This makes full use of the entities and entity-type information contained in the domain dictionary and helps with the recognition of both rare and long entities. In the second part, candidates are divided by part of speech into pronoun candidates and noun phrase candidates; pronoun candidates are resolved with rules, and noun phrase candidates with machine learning. Finally, a dataset is created from information security data to evaluate the method. The experimental results show that the proposed model outperforms the baseline models.
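The dictionary matching mechanism mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how DBAC computes its dictionary features, so everything here is an assumption made for illustration: the toy dictionary `DOMAIN_DICT`, the function name `dictionary_features`, the greedy longest-match strategy, and the B-/I-/O tag encoding are all hypothetical. The sketch only shows how per-token features carrying entity-type information might be derived from a domain dictionary before being passed to a sequence model.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary (not from the paper): lowercased token
# sequences mapped to an entity type.
DOMAIN_DICT: Dict[Tuple[str, ...], str] = {
    ("sql", "injection"): "ATTACK",
    ("cross", "site", "scripting"): "ATTACK",
    ("wannacry",): "MALWARE",
}


def dictionary_features(
    tokens: List[str],
    dictionary: Dict[Tuple[str, ...], str] = DOMAIN_DICT,
) -> List[str]:
    """Return one tag per token: B-<type> / I-<type> inside a dictionary
    match, otherwise O. Matching is greedy, longest match first, left to
    right; this strategy is an assumption, not the paper's stated method."""
    max_len = max((len(key) for key in dictionary), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest possible span starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            entity_type = dictionary.get(tuple(lowered[i:i + n]))
            if entity_type is not None:
                tags[i] = "B-" + entity_type
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + entity_type
                matched = n
                break
        # Skip past the match, or advance one token if nothing matched.
        i += matched if matched else 1
    return tags


if __name__ == "__main__":
    sentence = ["The", "WannaCry", "worm", "used", "SQL", "injection"]
    for token, tag in zip(sentence, dictionary_features(sentence)):
        print(token, tag)
```

In a model of this kind, such tags would typically be embedded and concatenated with the word representation at each position; how DBAC actually combines them is not described in the abstract.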
Bibliographical note
Funding Information:
Funding Statement: This work was supported by the National Natural Science Foundation of China (grant no. 61602515).
© 2020 Tech Science Press. All rights reserved.
- Coreference resolution
- Hybrid method
- Information security