Mahmud Iwan Solihin, Chan Jin Yuan, Wan Siu Hong, L. Pui, A. Kit, Wafa Hossain, A. Machmudah
{"title":"SPECTROSCOPY DATA CALIBRATION USING STACKED ENSEMBLE MACHINE LEARNING","authors":"Mahmud Iwan Solihin, Chan Jin Yuan, Wan Siu Hong, L. Pui, A. Kit, Wafa Hossain, A. Machmudah","doi":"10.31436/iiumej.v25i1.2796","DOIUrl":null,"url":null,"abstract":"Near infrared spectroscopy (NIRS) is a widely used analytical technique for non-destructive analysis of various materials including food fraud detection. However, the accurate calibration of NIRS data can be challenging due to the complexity of the underlying relationships between the spectral data and the target variables of interest. Ensemble learning, which combines multiple models to make predictions, has been shown to improve the accuracy and robustness of predictive models in various domains. This paper proposes stacking ensemble machine learning (SEML) for calibration of NIRS data with two levels of learning involved. Eight (8) spectroscopy datasets from public repository and previously published works by the authors are used as the case study. The model well generalized the data in the respective regression tasks with of at least »0.8 in the test samples and in the respective classification tasks with classification accuracy (CA) of at least »0.8 also. In addition, the proposed SEML can improve, or at least reach par with, the accuracy of individual base learners in both train and test samples for all cases of regression and classification datasets. It shows superior performance in test samples for both regression and classification datasets with respectively ranging from 0.86 to nearly 1 and CA ranging from 0.89 to 1. ABSTRAK: Spektroskopi inframerah dekat (NIRS) adalah teknik analitikal yang banyak digunakan bagi analisa pelbagai bahan tanpa merosakkan bahan termasuk ketika mengesan penipuan makanan. Walau bagaimanapun, kalibrasi yang tepat bagi data NIRS adalah sangat mencabar kerana hubungan antara data spektral dan pemboleh ubah sasaran yang ingin dikaji bersifat kompleks. Gabungan pembelajaran (Ensemble learning), iaitu gabungan pelbagai model bagi membuat prediksi, telah terbukti dapat meningkatkan ketepatan dan kecekapan model prediksi dalam pelbagai bentuk. Kajian ini mencadangkan Turutan Gabungan Pembelajaran Mesin (Stacking Ensemble Machine Learning ) (SEML), bagi teknik penentu ukuran data NIRS melibatkan dua tahap pembelajaran. Lapan (8) set data spektroskopi dari repositori awam dan kajian terdahulu oleh pengarang telah digunakan sebagai kes kajian. Model ini menggeneralisasi data dalam tugas regresi masing-masing sebanyak ?0.8 bagi sampel ujian dan pengelasan tugas masing-masing dengan ketepatan klasifikasi (CA) sekurang-kurangnya ?0.8. Tambahan, SEML yang dicadangkan ini dapat membantu, atau sekurang-kurangnya setanding dengan ketepatan individu dalam pembelajaran berkumpulan dalam kedua-dua sampel latihan dan ujian bagi semua kes set data regresi dan klasifikasi. Ia menunjukkan prestasi terbaik dalam sampel ujian bagi kedua-dua kumpulan set data regresi dan klasifikasi dengan masing-masing antara 0.86 hingga hampir 1 dan antara julat 0.89 hingga 1 bagi CA.","PeriodicalId":13439,"journal":{"name":"IIUM Engineering Journal","volume":"53 22","pages":""},"PeriodicalIF":0.6000,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IIUM Engineering Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.31436/iiumej.v25i1.2796","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
Abstract
Near infrared spectroscopy (NIRS) is a widely used analytical technique for non-destructive analysis of various materials including food fraud detection. However, the accurate calibration of NIRS data can be challenging due to the complexity of the underlying relationships between the spectral data and the target variables of interest. Ensemble learning, which combines multiple models to make predictions, has been shown to improve the accuracy and robustness of predictive models in various domains. This paper proposes stacking ensemble machine learning (SEML) for calibration of NIRS data with two levels of learning involved. Eight (8) spectroscopy datasets from public repository and previously published works by the authors are used as the case study. The model well generalized the data in the respective regression tasks with of at least »0.8 in the test samples and in the respective classification tasks with classification accuracy (CA) of at least »0.8 also. In addition, the proposed SEML can improve, or at least reach par with, the accuracy of individual base learners in both train and test samples for all cases of regression and classification datasets. It shows superior performance in test samples for both regression and classification datasets with respectively ranging from 0.86 to nearly 1 and CA ranging from 0.89 to 1. ABSTRAK: Spektroskopi inframerah dekat (NIRS) adalah teknik analitikal yang banyak digunakan bagi analisa pelbagai bahan tanpa merosakkan bahan termasuk ketika mengesan penipuan makanan. Walau bagaimanapun, kalibrasi yang tepat bagi data NIRS adalah sangat mencabar kerana hubungan antara data spektral dan pemboleh ubah sasaran yang ingin dikaji bersifat kompleks. Gabungan pembelajaran (Ensemble learning), iaitu gabungan pelbagai model bagi membuat prediksi, telah terbukti dapat meningkatkan ketepatan dan kecekapan model prediksi dalam pelbagai bentuk. Kajian ini mencadangkan Turutan Gabungan Pembelajaran Mesin (Stacking Ensemble Machine Learning ) (SEML), bagi teknik penentu ukuran data NIRS melibatkan dua tahap pembelajaran. Lapan (8) set data spektroskopi dari repositori awam dan kajian terdahulu oleh pengarang telah digunakan sebagai kes kajian. Model ini menggeneralisasi data dalam tugas regresi masing-masing sebanyak ?0.8 bagi sampel ujian dan pengelasan tugas masing-masing dengan ketepatan klasifikasi (CA) sekurang-kurangnya ?0.8. Tambahan, SEML yang dicadangkan ini dapat membantu, atau sekurang-kurangnya setanding dengan ketepatan individu dalam pembelajaran berkumpulan dalam kedua-dua sampel latihan dan ujian bagi semua kes set data regresi dan klasifikasi. Ia menunjukkan prestasi terbaik dalam sampel ujian bagi kedua-dua kumpulan set data regresi dan klasifikasi dengan masing-masing antara 0.86 hingga hampir 1 dan antara julat 0.89 hingga 1 bagi CA.
期刊介绍:
The IIUM Engineering Journal, published biannually (June and December), is a peer-reviewed open-access journal of the Faculty of Engineering, International Islamic University Malaysia (IIUM). The IIUM Engineering Journal publishes original research findings as regular papers, review papers (by invitation). The Journal provides a platform for Engineers, Researchers, Academicians, and Practitioners who are highly motivated in contributing to the Engineering disciplines, and Applied Sciences. It also welcomes contributions that address solutions to the specific challenges of the developing world, and address science and technology issues from an Islamic and multidisciplinary perspective. Subject areas suitable for publication are as follows: -Chemical and Biotechnology Engineering -Civil and Environmental Engineering -Computer Science and Information Technology -Electrical, Computer, and Communications Engineering -Engineering Mathematics and Applied Science -Materials and Manufacturing Engineering -Mechanical and Aerospace Engineering -Mechatronics and Automation Engineering