Comparing Logistic Regression and Support Vector Machine in Breast Cancer Problem

Caecilia Bintang Girik Allo, Leonardus Sandy Ade Putra, Nicea Roona Paranoan, Vincentius Abdi Gunawan

Abstract


Many methods are available for classification problems, and they are applied across many different fields. Support Vector Machine (SVM) is currently a popular classification method that many researchers have used. Even with the same method and the same dataset, different schemes for splitting the data into training and testing sets can yield different prediction accuracy, which is crucial in classification. In this paper, we compare the prediction accuracy of SVM and Logistic Regression to determine which method better classifies the current condition of a patient after undergoing treatment. Several data treatments are applied in this paper, including feature selection, feature extraction, and separating the training and testing data using Holdout and K-Fold cross-validation (CV). Stepwise selection is used to reduce the number of features. Training and testing datasets are obtained using five stratified and non-stratified holdout splits and five-fold stratified and non-stratified cross-validation. The results show that the best method for classifying the cancer dataset is SVM with a radial kernel under five-fold stratified cross-validation. The obtained accuracy is 81.816%, with a variance of 0.94%.
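To illustrate the difference that stratification makes, the following is a minimal sketch of a five-fold stratified split written in plain Python. It is not the authors' implementation: the function names `stratified_k_fold` and `train_test_splits`, the seed, and the toy labels are all assumptions made for illustration. The idea is to shuffle the indices of each class separately and deal them out across the folds, so that every fold keeps roughly the class proportions of the whole dataset.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Return k lists of indices whose class proportions mirror the full label list."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's indices round-robin so each fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


# Hypothetical labels (not the paper's data): 150 of one class, 50 of the other.
labels = ["N"] * 150 + ["R"] * 50
folds = stratified_k_fold(labels, k=5, seed=42)
for train, test in train_test_splits(folds):
    minority_share = sum(labels[i] == "R" for i in test) / len(test)
    print(len(train), len(test), round(minority_share, 2))
```

In this toy setting every test fold keeps the 25% minority share of the full label list. A non-stratified split would instead shuffle all indices together, so the minority share could drift from fold to fold, which is one plausible reason the accuracy estimates differ between the two schemes.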

Keywords


Support Vector Machine; Logistic Regression; Accuracy; Breast Cancer






DOI: https://doi.org/10.34312/jjps.v4i1.19246



Copyright (c) 2023 Jambura Journal of Probability and Statistics

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

