AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning

Phasit Charoenkwan; Saeed Ahmed; Chanin Nantasenamat; Julian M.W. Quinn; Mohammad Ali Moni; Pietro Lio’; Watshara Shoombuatong

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/73378

Full metadata record

DC Field	Value	Language
dc.contributor.author	Phasit Charoenkwan	en_US
dc.contributor.author	Saeed Ahmed	en_US
dc.contributor.author	Chanin Nantasenamat	en_US
dc.contributor.author	Julian M.W. Quinn	en_US
dc.contributor.author	Mohammad Ali Moni	en_US
dc.contributor.author	Pietro Lio’	en_US
dc.contributor.author	Watshara Shoombuatong	en_US
dc.date.accessioned	2022-05-27T08:40:42Z	-
dc.date.available	2022-05-27T08:40:42Z	-
dc.date.issued	2022-12-01	en_US
dc.identifier.issn	20452322	en_US
dc.identifier.other	2-s2.0-85129950097	en_US
dc.identifier.other	10.1038/s41598-022-11897-z	en_US
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85129950097&origin=inward	en_US
dc.identifier.uri	http://cmuir.cmu.ac.th/jspui/handle/6653943832/73378	-
dc.description.abstract	Amyloid proteins can form insoluble fibril aggregates that have important pathogenic effects in many tissues. Such amyloidoses are prominently associated with common diseases such as type 2 diabetes, Alzheimer's disease, and Parkinson's disease. There are many types of amyloid proteins, and some proteins form amyloid aggregates when in a misfolded state. It is difficult to identify such amyloid proteins and their pathogenic properties, and developing effective bioinformatics tools offers a new approach to this problem. While several machine learning (ML)-based models for in silico identification of amyloid proteins have been proposed, their predictive performance is limited. In this study, we present AMYPred-FRL, a novel meta-predictor that uses a feature representation learning approach to achieve more accurate amyloid protein identification. AMYPred-FRL combines six well-known ML algorithms (extremely randomized tree, extreme gradient boosting, k-nearest neighbor, logistic regression, random forest, and support vector machine) with ten different sequence-based feature descriptors to generate 60 probabilistic features (PFs), in contrast to state-of-the-art methods, which rely on a single feature-based approach. A logistic regression recursive feature elimination (LR-RFE) method was used to find the optimal number m of the 60 PFs in order to improve predictive performance. Finally, using the meta-predictor approach, the 20 selected PFs were fed into a logistic regression method to create the final hybrid model (AMYPred-FRL). Both cross-validation and independent tests showed that AMYPred-FRL achieved better predictive performance than its constituent baseline models. In an extensive independent test, AMYPred-FRL outperformed existing methods by 5.5% in accuracy and 16.1% in MCC, reaching an accuracy of 0.873 and an MCC of 0.710. To expedite high-throughput prediction, a user-friendly web server for AMYPred-FRL is freely available at http://pmlabstack.pythonanywhere.com/AMYPred-FRL. It is anticipated that AMYPred-FRL will be a useful tool in helping researchers to identify new amyloid proteins.	en_US
dc.subject	Multidisciplinary	en_US
dc.title	AMYPred-FRL is a novel approach for accurate prediction of amyloid proteins by using feature representation learning	en_US
dc.type	Journal	en_US
article.title.sourcetitle	Scientific Reports	en_US
article.volume	12	en_US
article.stream.affiliations	Department of Computer Science and Technology	en_US
article.stream.affiliations	The University of Queensland	en_US
article.stream.affiliations	Mahidol University	en_US
article.stream.affiliations	Garvan Institute of Medical Research	en_US
article.stream.affiliations	Chiang Mai University	en_US
Appears in Collections:	CMUL: Journal Articles
