Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/74727
Full metadata record
DC Field | Value | Language
dc.contributor.author | Phasit Charoenkwan | en_US
dc.contributor.author | Nalini Schaduangrat | en_US
dc.contributor.author | Mohammad Ali Moni | en_US
dc.contributor.author | Pietro Lio’ | en_US
dc.contributor.author | Balachandran Manavalan | en_US
dc.contributor.author | Watshara Shoombuatong | en_US
dc.date.accessioned | 2022-10-16T06:48:25Z | -
dc.date.available | 2022-10-16T06:48:25Z | -
dc.date.issued | 2022-07-01 | en_US
dc.identifier.issn | 18790534 | en_US
dc.identifier.issn | 00104825 | en_US
dc.identifier.other | 2-s2.0-85131801539 | en_US
dc.identifier.other | 10.1016/j.compbiomed.2022.105704 | en_US
dc.identifier.uri | https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85131801539&origin=inward | en_US
dc.identifier.uri | http://cmuir.cmu.ac.th/jspui/handle/6653943832/74727 | -
dc.description.abstract | Thermophilic proteins (TPPs) are important in protein biochemistry and in the development of new enzymes. Computational methods that identify TPPs accurately and rapidly are therefore urgently needed. Several computational methods have been developed for TPP identification to date; however, some limitations in terms of performance and utility remain. In this study, we present a novel computational method, SAPPHIRE, that identifies TPPs more accurately using only sequence information, without any need for structural information. We combined twelve feature encodings representing different perspectives with six popular machine learning algorithms to train 72 baseline models and extract the key information of TPPs. Subsequently, the informative predicted probabilities from the baseline models were mined and selected using a genetic algorithm in conjunction with a self-assessment-report approach. Finally, the meta-predictor, SAPPHIRE, was built and optimized using the optimal feature set. In the 10-fold cross-validation test, SAPPHIRE achieved superior predictive performance compared with several baseline models. Moreover, on the independent test, SAPPHIRE yielded an accuracy of 0.942 and a Matthews correlation coefficient of 0.884, which were 7.68% and 5.12% higher, respectively, than those of existing methods. The proposed computational approach is anticipated to facilitate large-scale identification of TPPs and accelerate their applications in the food industry. The codes and datasets are available at https://github.com/plenoi/SAPPHIRE. | en_US
dc.subject | Computer Science | en_US
dc.subject | Medicine | en_US
dc.title | SAPPHIRE: A stacking-based ensemble learning framework for accurate prediction of thermophilic proteins | en_US
dc.type | Journal | en_US
article.title.sourcetitle | Computers in Biology and Medicine | en_US
article.volume | 146 | en_US
article.stream.affiliations | Department of Computer Science and Technology | en_US
article.stream.affiliations | The University of Queensland | en_US
article.stream.affiliations | Mahidol University | en_US
article.stream.affiliations | Sungkyunkwan University | en_US
article.stream.affiliations | Chiang Mai University | en_US
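
The abstract in the record above describes a stacking layout (twelve encodings crossed with six algorithms, giving 72 baseline models whose predicted probabilities are filtered and passed to a meta-predictor) and reports accuracy and the Matthews correlation coefficient. The Python sketch below only illustrates that arithmetic and data flow. It is not the authors' implementation: the encoding and algorithm names are placeholders because the abstract does not list them, the 0/1 mask stands in for whatever the genetic algorithm would select, and simple averaging replaces the real meta-predictor. The functions `select_meta_features`, `meta_predict`, and `accuracy_and_mcc` are hypothetical helpers introduced for illustration.

```python
from itertools import product
from math import sqrt

# Placeholder names: the abstract does not name the encodings or algorithms.
ENCODINGS = [f"encoding_{i}" for i in range(1, 13)]    # 12 feature encodings
ALGORITHMS = [f"algorithm_{j}" for j in range(1, 7)]   # 6 learning algorithms
BASELINES = list(product(ENCODINGS, ALGORITHMS))       # 12 x 6 = 72 baseline models


def select_meta_features(probabilities, mask):
    """Keep the baseline probabilities whose mask bit is 1.

    `probabilities` holds one predicted probability per baseline model.
    `mask` is a 0/1 list of the same length, the kind of chromosome a
    genetic algorithm could evolve (assumption, not the paper's encoding).
    """
    if len(probabilities) != len(mask):
        raise ValueError("probabilities and mask must have the same length")
    return [p for p, keep in zip(probabilities, mask) if keep]


def meta_predict(selected, threshold=0.5):
    """Stand-in meta-predictor: average the selected probabilities."""
    if not selected:
        raise ValueError("at least one baseline must be selected")
    return 1 if sum(selected) / len(selected) >= threshold else 0


def accuracy_and_mcc(tp, tn, fp, fn):
    """Return (accuracy, Matthews correlation coefficient) from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


if __name__ == "__main__":
    print(len(BASELINES))                       # number of baseline models
    kept = select_meta_features([0.9, 0.2, 0.8], [1, 0, 1])
    print(kept, meta_predict(kept))
    print(accuracy_and_mcc(tp=45, tn=45, fp=5, fn=5))
```

In a real pipeline the averaging step would be replaced by a trained classifier over the selected probabilities, and the mask would come from the feature-selection stage rather than being written by hand.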
Appears in Collections: CMUL: Journal Articles

Files in This Item:
There are no files associated with this item.

