Optimizing Random Forest Parameters with Hyperparameter Tuning for Classifying School-Age KIP Eligibility in West Java

Silfiana Lis Setyowati, Asyifah Qalbi, Rafika Aristawidya, Bagus Sartono, Aulia Rizki Firdawanti

Abstract


Random Forest is an ensemble learning algorithm that combines multiple decision trees to produce a more stable and accurate classification model. This study optimizes Random Forest parameters for classifying the eligibility of school-age students in West Java for the Kartu Indonesia Pintar (KIP), based on economic factors. The research uses secondary data from the 2023 National Socio-Economic Survey (SUSENAS) for West Java, with a sample of 13,044 individuals. The Synthetic Minority Oversampling Technique (SMOTE) is applied to address class imbalance. Hyperparameter tuning by grid search identifies the best combination of three parameters: the number of trees (ntree), the number of variables randomly sampled at each split (mtry), and the terminal node size (node_size). Model performance is evaluated using balanced accuracy, sensitivity, and specificity. The optimal parameters (mtry = 5, ntree = 674, node_size = 26) yield a balanced accuracy of 65.47%. The most important variables include PKH status, floor area of the house, source of drinking water, and building material type. The tuned model can help identify students in need of educational assistance. In conclusion, optimizing Random Forest parameters improves the accuracy of KIP eligibility classification and supports educational equity policies in West Java. These findings provide a foundation for developing more effective beneficiary selection systems for educational aid.
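The three evaluation metrics named in the abstract are related by a simple formula: balanced accuracy is the mean of sensitivity and specificity. The following is a minimal sketch of that calculation, not the authors' code. The function names and the toy labels are hypothetical and chosen only for illustration (1 = eligible, 0 = not eligible); they are not taken from the SUSENAS data.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp), treating label 1 as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, balanced accuracy, and plain accuracy."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    sensitivity = tp / (tp + fn)  # share of positives correctly identified
    specificity = tn / (tn + fp)  # share of negatives correctly identified
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "accuracy": (tp + tn) / len(y_true),
    }


# Toy, made-up labels: 2 positives and 8 negatives (an imbalanced sample).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 10

scores = evaluate(y_true, y_pred)
print(scores)
```

On this toy sample the always-negative classifier reaches a plain accuracy of 0.8 but a balanced accuracy of only 0.5, which illustrates why balanced accuracy is a common choice for imbalanced classification problems.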

Keywords


Kartu Indonesia Pintar (KIP); Random Forest; SMOTE; Optimal Parameter; Hyperparameter tuning






DOI: https://doi.org/10.37905/jjom.v7i1.28736



Copyright (c) 2025 Silfiana Lis Setyowati, Asyifah Qalbi, Rafika Aristawidya, Bagus Sartono, Aulia Rizki Firdawanti

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.



Jambura Journal of Mathematics (e-ISSN: 2656-1344) by Department of Mathematics Universitas Negeri Gorontalo is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

