Metagenomics is one of the most prolific “omic” sciences in biological research on environmental microbial communities. Metagenomic studies generate high-dimensional, sparse, complex, and biologically rich datasets. In this research, we propose a framework that integrates omics knowledge to identify a suitably reduced set of microbiome features, which gives insight into the functional classification of metagenomic sequences. The proposed approach has been applied to two Use Case studies: (1) cattle rumen microbiota samples, distinguishing feed treated with nitrate from feed treated with vegetable oil, both intended to improve cattle performance; and (2) human gut microbiota samples, classified into the functionally annotated categories lean, obese, or overweight. For Use Case 1, classifying Bos taurus (cattle) rumen microbiota achieved an accuracy of 97.5% and an Area Under the Curve (AUC) of 0.972, using Logistic Regression (LR) both as the classification model and as the feature selector in a wrapper-based strategy. For Use Case 2, on human gut microbiota, an accuracy of 94.4% with an AUC of 1.000 was achieved. Overall, the LR classifier combined with a wrapper that uses an LR learner as the feature selector proved to be the most robust configuration in our analysis.
- OTUs (Operational Taxonomic Units)
- Machine Learning Classification
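The wrapper-based strategy summarized above, where LR acts both as the learner that scores candidate OTU subsets and as the final classifier, can be illustrated with a minimal, dependency-free Python sketch. It assumes greedy forward selection scored by 5-fold cross-validated accuracy, with accuracy and AUC then computed from out-of-fold predictions. The study's actual search procedure, cross-validation scheme, and hyperparameters are not stated here. The toy OTU table, the function names (`fit_lr`, `oof_probs`, `wrapper_forward_select`, and so on), and all parameter values are hypothetical.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_lr(X, y, lr=0.5, epochs=300):
    """Binary logistic regression fitted by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def standardize(train, test):
    """Z-score each column using training-fold statistics only (no leakage)."""
    d, n = len(train[0]), len(train)
    means = [sum(r[j] for r in train) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in train) / n) or 1.0
            for j in range(d)]

    def scale(rows):
        return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

    return scale(train), scale(test)

def oof_probs(X, y, feats, k=5):
    """Out-of-fold P(y=1) from k-fold CV of an LR learner on columns `feats`."""
    probs = [0.0] * len(X)
    for fold in range(k):
        tr = [i for i in range(len(X)) if i % k != fold]
        te = [i for i in range(len(X)) if i % k == fold]
        Xtr, Xte = standardize([[X[i][j] for j in feats] for i in tr],
                               [[X[i][j] for j in feats] for i in te])
        w, b = fit_lr(Xtr, [y[i] for i in tr])
        for i, x in zip(te, Xte):
            probs[i] = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return probs

def accuracy(y, probs):
    return sum((p >= 0.5) == (t == 1) for t, p in zip(y, probs)) / len(y)

def auc(y, probs):
    """ROC AUC as the Mann-Whitney pairwise statistic (ties count 0.5)."""
    pos = [p for t, p in zip(y, probs) if t == 1]
    neg = [p for t, p in zip(y, probs) if t == 0]
    wins = sum(1.0 if a > c else 0.5 if a == c else 0.0 for a in pos for c in neg)
    return wins / (len(pos) * len(neg))

def wrapper_forward_select(X, y, k=5):
    """Greedy forward wrapper: an LR learner scores every candidate subset."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        cand, cand_score = None, -1.0
        for f in remaining:  # ascending order, so ties keep the lowest index
            score = accuracy(y, oof_probs(X, y, selected + [f], k))
            if score > cand_score:
                cand, cand_score = f, score
        if cand_score <= best:
            break  # no remaining OTU improves CV accuracy
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected, best

# Hypothetical toy OTU table: 40 samples x 4 OTUs, two balanced classes.
# OTU 0 separates the classes; OTUs 1-3 are uninformative noise.
rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[rng.uniform(3, 4) if t else rng.uniform(0, 1)]
     + [rng.uniform(0, 4) for _ in range(3)] for t in y]

selected, cv_acc = wrapper_forward_select(X, y)
probs = oof_probs(X, y, selected)
print("selected OTUs:", selected)
print(f"CV accuracy = {accuracy(y, probs):.3f}, AUC = {auc(y, probs):.3f}")
```

A wrapper evaluates feature subsets by retraining the chosen learner rather than by ranking features with a model-independent filter score, which is why it is more expensive. In practice, a library implementation such as scikit-learn's `SequentialFeatureSelector` wrapped around `LogisticRegression` would replace the hand-written loop. The version here only makes the wrapper idea explicit.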