Phy-PMRFI: Phylogeny-Aware Prediction of Metagenomic Functions Using Random Forest Feature Importance

Jyotsna Talreja Wassan, Haiying Wang, Fiona Browne, Huiru Zheng

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)
157 Downloads (Pure)

Abstract

High-throughput sequencing techniques have accelerated functional metagenomics studies by generating large volumes of ‘omics’ data. Integrating these data with computational approaches is potentially useful for predicting metagenomic functions. Machine learning models can be trained on microbial features (e.g. taxonomic units in the human microbiome) and then used to classify microbial data into different functional classes (e.g. healthy versus diseased states). When analyzing omics data, both the features (i.e. the microbial taxa) and the taxonomic relationships between them are important. These relationships can potentially be uncovered from the phylogenetic tree of the microbial taxa. In this paper, we propose Phy-PMRFI, a novel integrative framework that uses phylogeny-based modelling of omics data to predict metagenomic functions from important features selected by a Random Forest Importance (RFI) strategy. The framework integrates the underlying phylogenetic tree information with the abundance measures of microbial species (features) by constructing a novel phylogeny- and abundance-aware matrix structure (PAAM). Phy-PMRFI then ranks the columns of this matrix (i.e. the microbial features) using the RFI measure, and the ranked features are used as input for microbiome classification. The resulting feature set improves the performance of popular state-of-the-art classifiers such as Support Vector Machines. The proposed framework also outperforms the state-of-the-art Phylogenetic Isometric Log-Ratio Transform (PhILR) pipeline and MetaPhyl; for example, on the human throat microbiome Phy-PMRFI achieves 90% accuracy, compared with 53% for PhILR and 71% for MetaPhyl.
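The two data-preparation steps described in the abstract (building a phylogeny-aware feature matrix, then keeping the columns ranked highest by importance) can be illustrated with a minimal plain-Python sketch. It rests on an assumption that is not taken from the paper: each internal node of the phylogenetic tree contributes one extra column equal to the summed abundance of the OTUs beneath it. The authors' exact PAAM construction may differ. The tree, sample abundances, and importance scores are hypothetical toy inputs, and the helper names `build_paam` and `select_top_k` are illustrative. In practice the scores would come from a trained random forest (for example, scikit-learn's `RandomForestClassifier.feature_importances_`), and the reduced matrix would be passed to a classifier such as an SVM.

```python
from typing import Dict, List, Tuple

Tree = Dict[str, List[str]]  # internal node -> list of children; leaves are OTU names


def descendant_leaves(tree: Tree, node: str) -> List[str]:
    """Return all leaf OTUs under `node`; a node with no children entry is a leaf."""
    children = tree.get(node)
    if not children:
        return [node]
    leaves: List[str] = []
    for child in children:
        leaves.extend(descendant_leaves(tree, child))
    return leaves


def find_root(tree: Tree) -> str:
    """The root is the only internal node that is nobody's child."""
    children = {c for kids in tree.values() for c in kids}
    roots = [n for n in tree if n not in children]
    if len(roots) != 1:
        raise ValueError("tree must have exactly one root")
    return roots[0]


def build_paam(samples: List[Dict[str, float]], tree: Tree) -> Tuple[List[str], List[List[float]]]:
    """ASSUMPTION (hypothetical PAAM): leaf abundances plus one summed column per internal node."""
    leaves = descendant_leaves(tree, find_root(tree))
    internal = list(tree)  # internal nodes in dict insertion order
    clade = {node: descendant_leaves(tree, node) for node in internal}
    columns = leaves + internal
    matrix = []
    for sample in samples:
        row = [sample.get(leaf, 0.0) for leaf in leaves]  # missing OTUs count as 0
        row += [sum(sample.get(leaf, 0.0) for leaf in clade[node]) for node in internal]
        matrix.append(row)
    return columns, matrix


def select_top_k(columns: List[str], matrix: List[List[float]],
                 importances: List[float], k: int) -> Tuple[List[str], List[List[float]]]:
    """Keep the k highest-scoring columns (stable sort, so ties favour earlier columns)."""
    top = sorted(range(len(columns)), key=lambda j: -importances[j])[:k]
    top.sort()  # preserve the original column order in the reduced matrix
    return [columns[j] for j in top], [[row[j] for j in top] for row in matrix]


if __name__ == "__main__":
    toy_tree = {"root": ["n1", "otu3"], "n1": ["otu1", "otu2"]}  # hypothetical tree
    toy_samples = [{"otu1": 2.0, "otu2": 3.0, "otu3": 5.0}, {"otu2": 1.0, "otu3": 4.0}]
    cols, mat = build_paam(toy_samples, toy_tree)
    print(cols, mat)
    # Hypothetical importance scores standing in for random forest output.
    print(select_top_k(cols, mat, [0.1, 0.05, 0.2, 0.15, 0.5], 2))
```

On the toy input, the internal-node columns `root` and `n1` hold clade totals (10.0 and 5.0 for the first sample). Ranking then works over leaf and clade columns alike, which is how a phylogeny-aware matrix lets the importance measure prefer a whole clade over any single OTU.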
Original language: English
Pages (from-to): 273-282
Number of pages: 9
Journal: IEEE Transactions on Nanobioscience
Volume: 18
Issue number: 3
Early online date: 24 Apr 2019
Publication status: Published (in print/issue), 28 Jun 2019

Keywords

  • Metagenomics
  • Phylogeny
  • Classification
  • Machine Learning (ML)
  • Operational Taxonomic Units (OTUs)
  • Random Forest Importance (RFI)
