Knowledge-driven graph similarity for text classification

Niloofer Shanavas, Hui Wang, Zhiwei Lin, Glenn Hawe

Research output: Contribution to journal › Article › peer-review

Abstract

Automatic text classification using machine learning is significantly affected by the text representation model. The structural information in text is necessary for natural language understanding, which is usually ignored in vector-based representations. In this paper, we present a graph kernel-based text classification framework which utilises the structural information in text effectively through the weighting and enrichment of a graph-based representation. We introduce weighted co-occurrence graphs to represent text documents, which weight the terms and their dependencies based on their relevance to text classification. We propose a novel method to automatically enrich the weighted graphs using semantic knowledge in the form of a word similarity matrix. The similarity between enriched graphs, knowledge-driven graph similarity, is calculated using a graph kernel. The semantic knowledge in the enriched graphs ensures that the graph kernel goes beyond exact matching of terms and patterns to compute the semantic similarity of documents. In the experiments on sentiment classification and topic classification tasks, our knowledge-driven similarity measure significantly outperforms the baseline text similarity measures on five benchmark text classification datasets.
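
The pipeline outlined in the abstract (co-occurrence graph, enrichment from word similarities, graph kernel) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' method: the function names, the raw co-occurrence counts used in place of supervised term weights, the one-hop enrichment rule, the similarity threshold, and the normalised edge-matching kernel are all hypothetical choices.

```python
from collections import defaultdict
from math import sqrt


def cooccurrence_graph(tokens, window=2):
    """Undirected graph: edge weight counts how often two distinct
    terms fall inside the same sliding window of `window` tokens."""
    edges = defaultdict(float)
    for i, term in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != term:
                edges[frozenset((term, other))] += 1.0
    return dict(edges)


def enrich(edges, similarity, threshold=0.5):
    """Hypothetical enrichment: for each edge (a, b), add an edge from
    every term similar to a (or to b) to the other endpoint, with the
    original weight scaled by the similarity score."""
    enriched = defaultdict(float, edges)
    for edge, weight in edges.items():
        a, b = tuple(edge)
        for source, target in ((a, b), (b, a)):
            for neighbour, score in similarity.get(source, {}).items():
                if score >= threshold and neighbour != target:
                    enriched[frozenset((neighbour, target))] += weight * score
    return dict(enriched)


def edge_kernel(g1, g2):
    """Simple stand-in for a graph kernel: cosine-normalised sum of
    weight products over the edges the two graphs share."""
    dot = sum(w * g2[e] for e, w in g1.items() if e in g2)
    norm = sqrt(sum(w * w for w in g1.values()) * sum(w * w for w in g2.values()))
    return dot / norm if norm else 0.0


# Toy word similarities (made-up values, stored as nested dicts).
similarity = {
    "great": {"excellent": 0.8},
    "excellent": {"great": 0.8},
    "film": {"movie": 0.9},
    "movie": {"film": 0.9},
}

g1 = cooccurrence_graph("great film".split())
g2 = cooccurrence_graph("excellent movie".split())

print(edge_kernel(g1, g2))  # no shared edges before enrichment
print(edge_kernel(enrich(g1, similarity), enrich(g2, similarity)))
```

In this toy case the two documents share no terms, so the kernel on the plain graphs finds nothing to match, while the enriched graphs share edges and receive a positive similarity. This mirrors the abstract's point about going beyond exact matching, under the simplifying assumptions stated above.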
Original language: English
Pages (from-to): 1-15
Number of pages: 15
Journal: International Journal of Machine Learning and Cybernetics
Volume: 0
Early online date: 19 Nov 2020
Publication status: E-pub ahead of print - 19 Nov 2020

Keywords

  • automatic text classification
  • document similarity measure
  • graph-based text representation
  • graph enrichment
  • graph kernels
  • supervised term weighting
  • SVM
