SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins

Phasit Charoenkwan; Nalini Schaduangrat; Mohammad Ali Moni; Pietro Lio’; Balachandran Manavalan; Watshara Shoombuatong

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/74727

Full metadata record

DC Field	Value	Language
dc.contributor.author	Phasit Charoenkwan	en_US
dc.contributor.author	Nalini Schaduangrat	en_US
dc.contributor.author	Mohammad Ali Moni	en_US
dc.contributor.author	Pietro Lio’	en_US
dc.contributor.author	Balachandran Manavalan	en_US
dc.contributor.author	Watshara Shoombuatong	en_US
dc.date.accessioned	2022-10-16T06:48:25Z	-
dc.date.available	2022-10-16T06:48:25Z	-
dc.date.issued	2022-07-01	en_US
dc.identifier.issn	18790534	en_US
dc.identifier.issn	00104825	en_US
dc.identifier.other	2-s2.0-85131801539	en_US
dc.identifier.other	10.1016/j.compbiomed.2022.105704	en_US
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85131801539&origin=inward	en_US
dc.identifier.uri	http://cmuir.cmu.ac.th/jspui/handle/6653943832/74727	-
dc.description.abstract	Thermophilic proteins (TPPs) are important in the field of protein biochemistry and the development of new enzymes. Thus, computational methods that identify TPPs accurately and rapidly are urgently needed. To date, several computational methods have been developed for TPP identification; however, some limitations in terms of performance and utility remain. In this study, we present a novel computational method, SAPPHIRE, to achieve more accurate identification of TPPs using only sequence information, without any need for structural information. We combined twelve different feature encodings representing different perspectives with six popular machine learning algorithms to train 72 baseline models and extract the key information of TPPs. Subsequently, the informative predicted probabilities from the baseline models were mined and selected using a genetic algorithm in conjunction with a self-assessment-report approach. Finally, the meta-predictor, SAPPHIRE, was built and optimized using the optimal feature set. In the 10-fold cross-validation test, SAPPHIRE achieved superior predictive performance compared with several baseline models. Moreover, on the independent test, SAPPHIRE yielded an accuracy of 0.942 and a Matthews correlation coefficient of 0.884, which were 7.68% and 5.12% higher, respectively, than those of existing methods. The proposed computational approach is anticipated to facilitate large-scale identification of TPPs and accelerate their applications in the food industry. The codes and datasets are available at https://github.com/plenoi/SAPPHIRE.	en_US
dc.subject	Computer Science	en_US
dc.subject	Medicine	en_US
dc.title	SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins	en_US
dc.type	Journal	en_US
article.title.sourcetitle	Computers in Biology and Medicine	en_US
article.volume	146	en_US
article.stream.affiliations	Department of Computer Science and Technology	en_US
article.stream.affiliations	The University of Queensland	en_US
article.stream.affiliations	Mahidol University	en_US
article.stream.affiliations	Sungkyunkwan University	en_US
article.stream.affiliations	Chiang Mai University	en_US
Appears in Collections:	CMUL: Journal Articles
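
Note on the reported metrics: the abstract above evaluates the predictor by accuracy and the Matthews correlation coefficient (MCC). The sketch below shows how these two values are conventionally computed from binary labels. It is an illustration only and is not taken from the SAPPHIRE repository; the function names and the toy labels (1 = thermophilic, 0 = non-thermophilic) are assumptions made for this example.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical toy labels, not data from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print("accuracy:", accuracy(*counts))
    print("MCC:", mcc(*counts))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the two classes are imbalanced.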

Files in This Item:

There are no files associated with this item.
