Implementation of K-Prototypes with Feature Selection in Clustering Cervical Cancer Patients based on Risk Factors
Abstract
Cancer is a leading cause of death worldwide, accounting for nearly 10 million deaths in 2020, or almost one in six of all deaths. Effective primary prevention could avert at least 40% of cancer cases. Cancer mortality is higher in developing countries than in developed countries, reflecting disparities in how risk factors are addressed, how successfully cancer is detected, and which treatments are available. Cervical cancer most frequently affects women in developing countries. It is therefore crucial for communities, especially women, to know the risk factors for cervical cancer. Machine learning offers one way to address this issue by analyzing cervical cancer patient data. This study applies the K-Prototypes clustering algorithm, which handles mixed numerical and categorical data, to cervical cancer risk factor data. To improve the algorithm's performance, feature selection was performed with two methods, Variance Threshold and Correlation Coefficient. K-Prototypes performed best with Correlation Coefficient feature selection, achieving a Silhouette Coefficient of 0.6, a Davies-Bouldin Index of 0.6, and a Calinski-Harabasz Index of 1.080. Interpretation of the two resulting clusters revealed major differences in age, menopause, and health conditions such as leukorrhea, bleeding, lower abdominal pain, and loss of appetite. In contrast, factors related to previous history, reproductive health, and nutritional issues did not differ significantly. The K-Prototypes algorithm is expected to help identify groups based on cervical cancer risk factors, supporting medical professionals in decision-making and follow-up actions and providing knowledge to the public.
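The pipeline described in the abstract, feature selection followed by K-Prototypes clustering of mixed data, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy records, the column layout, the function names (`variance_filter`, `dissimilarity`, `k_prototypes`), the weight `gamma`, and the fixed initial prototypes are all hypothetical. The sketch assumes the usual K-Prototypes dissimilarity (squared Euclidean distance on numerical features plus a weighted count of categorical mismatches) and covers only a variance-based filter, not the Correlation Coefficient method.

```python
from collections import Counter
from statistics import mean, pvariance

def variance_filter(rows, num_idx, threshold):
    """Keep numerical columns whose population variance exceeds the threshold."""
    return [j for j in num_idx if pvariance([r[j] for r in rows]) > threshold]

def dissimilarity(row, proto, num_idx, cat_idx, gamma):
    """Squared Euclidean distance on numerical columns plus
    gamma times the number of categorical mismatches."""
    numeric = sum((row[j] - proto[j]) ** 2 for j in num_idx)
    mismatches = sum(row[j] != proto[j] for j in cat_idx)
    return numeric + gamma * mismatches

def k_prototypes(rows, num_idx, cat_idx, init_rows, gamma=1.0, max_iter=20):
    """Alternate assignment and prototype updates until labels stop changing."""
    protos = [list(rows[i]) for i in init_rows]  # hypothetical fixed initialization
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(protos)),
                key=lambda c: dissimilarity(r, protos[c], num_idx, cat_idx, gamma))
            for r in rows
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(protos)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if not members:
                continue  # leave an empty cluster's prototype unchanged
            for j in num_idx:   # numerical part: cluster mean
                protos[c][j] = mean(m[j] for m in members)
            for j in cat_idx:   # categorical part: cluster mode
                protos[c][j] = Counter(m[j] for m in members).most_common(1)[0][0]
    return labels, protos

# Hypothetical toy records: (age, constant column, menopause, bleeding).
# Real data would normally have its numerical columns standardized first.
rows = [
    (25, 1, "no", "no"),
    (28, 1, "no", "no"),
    (31, 1, "no", "yes"),
    (55, 1, "yes", "yes"),
    (60, 1, "yes", "yes"),
    (58, 1, "yes", "no"),
]
num_idx, cat_idx = [0, 1], [2, 3]

kept = variance_filter(rows, num_idx, threshold=0.0)  # drops the constant column
labels, protos = k_prototypes(rows, kept, cat_idx, init_rows=[0, 3])
print(kept, labels)
```

On this toy input the constant column is filtered out and the records separate into a younger, pre-menopausal group and an older, post-menopausal group. In practice the clusters would then be scored with indices such as the Silhouette Coefficient, Davies-Bouldin Index, and Calinski-Harabasz Index, which this sketch does not compute.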
DOI: https://doi.org/10.37905/jjbm.v6i3.30552
Copyright (c) 2025 Wanda Puspita Hati, Devvi Sarwinda, Bevina Desjwiandra Handari

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Editorial Office of Jambura Journal of Biomathematics: Department of Mathematics, Faculty of Mathematics and Natural Science, Universitas Negeri Gorontalo, Jl. Prof. Dr. Ing. B. J. Habibie, Moutong, Tilongkabila, Kabupaten Bone Bolango 96554, Gorontalo, Indonesia
Jambura Journal of Biomathematics (JJBM) by Department of Mathematics Universitas Negeri Gorontalo is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.