Application of XGBoost for Attribute Selection in K-Means Clustering of KIP Kuliah Recipients
Abstract
When the K-Means algorithm is used to cluster priority recipients of Kartu Indonesia Pintar Kuliah (KIP Kuliah) assistance, two problems arise: selecting the important attributes and determining the optimal value of K. Both keep the clustering from being optimal. The attribute selection problem is addressed here with the XGBoost algorithm, which has been shown to be applicable to problems such as clustering priority recipients of KIP Kuliah assistance. The results show that XGBoost identifies the three most important attributes, namely Father's Occupation (Pekerjaan Ayah), Mother's Income (Penghasilan Ibu), and Building Area (Luas Bangunan), out of the twelve available attributes: Father's Occupation, Mother's Occupation, Father's Income, Mother's Income, Number of Dependents, Home Ownership, Electricity Source, Land Area, Building Area, Water Source, MCK (bathing, washing, and toilet facilities), and Achievement (Prestasi). The Elbow method is shown to determine the optimal value of K, which is K = 4. Using the three best attributes with K = 4 as the optimal value produces the best clustering, with the smallest index value of 0.819 under the Davies-Bouldin Index evaluation method.
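The abstract reports the clustering quality as a Davies-Bouldin Index (DBI), where a smaller value indicates more compact and better separated clusters. The sketch below shows how such an index can be computed from points and their cluster labels, assuming the standard definition (mean Euclidean distance to the centroid as the within-cluster scatter). The function names `centroid` and `davies_bouldin` and the toy data are illustrative assumptions and are not taken from the paper, whose actual dataset and tooling are not shown in the abstract.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def centroid(points):
    """Return the coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(points, labels):
    """Davies-Bouldin Index for a labelled set of points (lower is better).

    Hypothetical helper for illustration; assumes at least two clusters
    and that no two clusters share the same centroid.
    """
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    centers = {lab: centroid(pts) for lab, pts in clusters.items()}

    # Scatter of a cluster: mean distance of its members to its centroid.
    scatter = {
        lab: sum(dist(p, centers[lab]) for p in pts) / len(pts)
        for lab, pts in clusters.items()
    }

    # For each cluster, take its worst (largest) similarity ratio to any
    # other cluster, then average those worst cases over all clusters.
    total = 0.0
    for i in clusters:
        total += max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Toy data (made up): two clusters whose centroids are 4 units apart.
loose = [(0, 0), (0, 2), (4, 0), (4, 2)]
tight = [(0, 0), (0, 1), (4, 0), (4, 1)]
labels = [0, 0, 1, 1]

print(davies_bouldin(loose, labels))  # 0.5
print(davies_bouldin(tight, labels))  # 0.25
```

In the toy data the tighter clusters score lower, which is the sense in which the reported 0.819 is described as the smallest, and therefore best, value among the configurations compared.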
DOI: https://doi.org/10.37905/jjeee.v5i2.20253
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Published by:
Electrical Engineering Department
Faculty of Engineering
State University of Gorontalo
Jenderal Sudirman Street No.6, Gorontalo City, Gorontalo Province, Indonesia
Tel. 0435-821175; 081340032063
Email: redaksijjeee@ung.ac.id; redaksijjeee@gmail.com