Prediction of Pathogenic Proteins in Metagenomic and Genomic Datasets
The identification of virulent proteins in any de novo sequenced genome is useful for estimating its pathogenic potential and understanding the mechanism of pathogenesis. Similarly, identifying such proteins could be valuable for comparing the metagenomes of healthy and diseased individuals and estimating the proportion of pathogenic species. However, the mechanisms underlying pathogenesis are complex, diverse, species-specific and host-specific, and involve several processes, including virulence, adhesion, invasion, secretion and drug resistance. Because of this inherent complexity, pathogenic species and the implicated proteins show considerable diversity and often exhibit insignificant similarity to known proteins. It is therefore difficult to predict virulent proteins using homology-based methods such as BLAST. In addition, BLAST is slow, which further limits its usability on large genomic and metagenomic datasets. In this scenario, composition-based or profile-based approaches using Support Vector Machines (SVMs) or Hidden Markov Models (HMMs) could provide efficient and reliable alternatives.
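
To make the idea of a composition-based feature concrete, here is a minimal Python sketch that turns a protein sequence into fixed-length numeric vectors of the kind an SVM can consume. It is an illustration only: the function names are hypothetical, and the choice of amino acid and dipeptide composition is an assumption made for the example, not a description of how MP3 is implemented.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list: the fraction of each standard residue.

    Hypothetical helper for illustration; non-standard characters are ignored.
    """
    seq = sequence.upper()
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return a 400-element list: the fraction of each adjacent residue pair.

    Hypothetical helper for illustration; pairs containing a non-standard
    character are skipped.
    """
    seq = sequence.upper()
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    counts = [0] * (len(AMINO_ACIDS) ** 2)
    total = 0
    # Slide over overlapping pairs of neighbouring residues.
    for first, second in zip(seq, seq[1:]):
        if first in index and second in index:
            counts[index[first] * len(AMINO_ACIDS) + index[second]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]
```

Because the vectors have the same length regardless of sequence length, short metagenomic fragments and full-length proteins can be represented in the same feature space, and no pairwise alignment against a database is needed to compute them.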

Motivation for developing a more comprehensive and efficient software tool
The tools currently available for the prediction of pathogenic proteins provide limited accuracy and cannot be used on large-scale genomic or metagenomic datasets.
Therefore, we have developed the MP3 tool, which uses an integrated SVM-HMM approach to predict pathogenic proteins in both genomic and metagenomic datasets with improved efficiency and accuracy. It is available as a stand-alone tool and as a publicly available web server.

Gupta A, Kapil R, Dhakan DB, Sharma VK (2014): MP3: A Software Tool for the Prediction of Pathogenic Proteins in Genomic and Metagenomic Data. PLoS One 9: e93907.