Misclassification cost adjustment for cost-sensitive learning on imbalanced biomedical datasets

ปริญญา ปันสิน

Please use this identifier to cite or link to this item: http://cmuir.cmu.ac.th/jspui/handle/6653943832/73868

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	จักรเมธ บุตรกระจ่าง	-
dc.contributor.author	ปริญญา ปันสิน	en_US
dc.date.accessioned	2022-08-16T01:00:38Z	-
dc.date.available	2022-08-16T01:00:38Z	-
dc.date.issued	2022-06	-
dc.identifier.uri	http://cmuir.cmu.ac.th/jspui/handle/6653943832/73868	-
dc.description.abstract	Data in which the number of examples per class differs significantly, known as imbalanced data, arise in many application domains. Traditional supervised learning follows the Empirical Risk Minimisation principle, which minimises misclassification regardless of the type of error, and often yields a classification model that generalises poorly on the minority class. Cost-sensitive learning is a promising approach to introducing inductive bias into a model for imbalanced data classification. This thesis presents a comparative study of misclassification cost and initial weight assignment strategies for AdaBoost, leading to a proposed method for automatically determining suitable misclassification costs and initial weights. We studied three strategies for determining misclassification costs for an imbalanced dataset and incorporated those costs into a cost-sensitive AdaBoost algorithm: Imbalance Ratio, which determines the misclassification cost from the ratio of instances in each class; Grid Search, which searches for suitable parameter values for the learning step; and Distribution Correction, which modifies the initial weights according to the sample size of each target class. All strategies were applied with Cost-Sensitive AdaBoost, and the experiments used five imbalanced biomedical datasets. The results suggest that the imbalance ratio strategy overestimated the misclassification costs and consequently yielded a model too focused on the minority class. Grid search improved upon traditional AdaBoost on some datasets but remained comparable to AdaBoost overall. The distribution correction strategy appeared to outperform all other strategies. We therefore recommend the proposed distribution correction method as the most effective strategy in terms of imbalance-aware performance measures.	en_US
dc.language.iso	other	en_US
dc.publisher	Chiang Mai : Graduate School, Chiang Mai University	en_US
dc.subject	Misclassification cost adjustment	en_US
dc.subject	Cost sensitive learning	en_US
dc.subject	Misclassification cost assignment	en_US
dc.subject	Imbalanced data classification	en_US
dc.subject	Learning that is sensitive to misclassification cost	en_US
dc.subject	AdaBoost cost sensitive boosting	en_US
dc.subject	Imbalanced biomedical data	en_US
dc.subject	Biomedical Data	en_US
dc.title	Misclassification cost adjustment for cost-sensitive learning on imbalanced biomedical datasets	en_US
dc.title.alternative	Misclassification cost adjustment for cost-sensitive learning on imbalanced biomedical datasets	en_US
dc.type	Independent Study (IS)
thailis.controlvocab.thash	Cluster analysis -- Computer programs	-
thailis.controlvocab.thash	Data mining	-
thailis.controlvocab.thash	Computer algorithms	-
thesis.degree	master	en_US
thesis.description.thaiAbstract	Data in which the number of examples differs between classes, or imbalanced data, can be found in a variety of data sources. Supervised learning conventionally follows the Empirical Risk Minimisation principle, which reduces misclassification without regard to the type of error, so the resulting classification model often performs poorly on the minority class. Cost-sensitive learning is one of the approaches used to build classification models for imbalanced data. This study compares methods for assigning misclassification costs and initial weights for use with the AdaBoost algorithm, leading to a proposed method that automatically determines misclassification costs or initial weights suited to a dataset. Three techniques for assigning misclassification costs were studied: Imbalance Ratio, which sets the cost from the class imbalance ratio; Grid Search, which is used to find suitable parameter values; and Distribution Correction, which adjusts each example's initial weight according to the number of examples in each class, in effect raising the initial distribution weight of each minority-class example. These techniques were used together with the Cost-Sensitive AdaBoost algorithm on five biomedical datasets. The results show that Imbalance Ratio appears to set misclassification costs higher than they should be, so the resulting model has lower precision than the other methods. Grid Search improved performance on some datasets, but its overall performance remained comparable to traditional AdaBoost. Distribution Correction performed clearly better on the minority class than traditional AdaBoost and the other techniques in this study, achieving the highest performance as measured by metrics that account for dataset imbalance.	en_US
Appears in Collections:	SCIENCE: Independent Study (IS)
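
The abstract names the Imbalance Ratio and Distribution Correction strategies but does not give their formulas. The following is a minimal Python sketch of one plausible reading, not the thesis's actual method. The function names are hypothetical, and both formulas are assumptions: the class cost is taken as the largest class size divided by that class's size, and the corrected initial weight as 1 / (K * n_c) for an example of class c among K classes, so that every class starts with the same total weight.

```python
from collections import Counter


def imbalance_ratio_costs(labels):
    """Assumed rule: cost of a class = largest class size / size of that class."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {cls: largest / n for cls, n in counts.items()}


def class_balanced_initial_weights(labels):
    """Assumed rule: initial weight 1 / (K * n_c) for each example of class c.

    Each of the K classes then carries total weight 1 / K, and the weights
    over all examples sum to 1, as a boosting distribution requires.
    """
    counts = Counter(labels)
    k = len(counts)
    return [1.0 / (k * counts[y]) for y in labels]


# Toy data: 8 majority examples (class 0) and 2 minority examples (class 1).
labels = [0] * 8 + [1] * 2
costs = imbalance_ratio_costs(labels)
weights = class_balanced_initial_weights(labels)
```

Under these assumptions, weights of this kind could be supplied as the `sample_weight` argument of scikit-learn's `AdaBoostClassifier.fit` in place of the default uniform initial distribution. How the thesis's Cost-Sensitive AdaBoost uses the costs inside the boosting updates is not stated in this record.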

Files in This Item:

File	Description	Size	Format
parinya_punsin_cost.pdf	Misclassification cost adjustment for cost-sensitive learning on imbalanced biomedical datasets	4.09 MB	Adobe PDF
