The Comparison between Ordinal Logistic Regression and Random Forest Ordinal in Identifying the Factors Causing Diabetes Mellitus

Assyifa Lala Pratiwi Hamid, Anwar Fitrianto, Indahwati Indahwati, Erfiani Erfiani, Khusnia Nurul Khikmah


Diabetes is a high-risk disease whose most prominent symptom is a high blood sugar level. The number of people with diabetes in Indonesia may reach 30 million, so the factors that cause the disease warrant further study. This study compares ordinal logistic regression and ordinal random forest in identifying the factors causing diabetes, using a CDC dataset. Ordinal logistic regression is the better model in this study, with an accuracy of 84.52%, which is higher than that of the ordinal random forest. The four most important factors causing diabetes are body mass index, hypertension, age, and cholesterol.


Ordinal logistic regression; Ordinal random forest; Diabetes
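
As an illustration of the ordinal logistic regression model named in the abstract, the following minimal sketch shows how a cumulative logit (proportional odds) model turns a set of predictors into probabilities for ordered outcome categories, and how accuracy is computed from the predicted categories. It is not the authors' implementation. The coefficients, thresholds, number of categories (three), and predictor encoding are all hypothetical values chosen for illustration; none of them are taken from the study or from the CDC dataset.

```python
import math

# Hypothetical, illustrative parameters (NOT estimates from the study).
# Assumed predictor order: [scaled BMI, hypertension (0/1), scaled age, high cholesterol (0/1)]
COEFFICIENTS = [0.8, 0.7, 0.5, 0.6]
# Increasing cut-points separating three assumed ordered categories 0 < 1 < 2.
THRESHOLDS = [1.0, 2.0]


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(x, coefficients, thresholds):
    """Return P(Y = k | x) for each ordered category k.

    Cumulative logit model: P(Y <= k | x) = sigmoid(threshold_k - x . beta).
    """
    eta = sum(b * v for b, v in zip(coefficients, x))
    # Cumulative probabilities; the last category always has P(Y <= K) = 1.
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    # Differences of successive cumulative probabilities give category probabilities.
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs


def predict_category(x, coefficients, thresholds):
    """Predict the category with the highest probability."""
    probs = category_probabilities(x, coefficients, thresholds)
    return max(range(len(probs)), key=probs.__getitem__)


def accuracy(y_true, y_pred):
    """Share of observations whose predicted category equals the observed one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    low_risk = [0.0, 0, 0.0, 0]   # hypothetical respondent
    high_risk = [3.0, 1, 3.0, 1]  # hypothetical respondent
    for x in (low_risk, high_risk):
        print(x, category_probabilities(x, COEFFICIENTS, THRESHOLDS),
              predict_category(x, COEFFICIENTS, THRESHOLDS))
```

In this formulation a single coefficient vector is shared across all cut-points, so a positive coefficient shifts probability toward the higher categories. In practice the coefficients and thresholds would be estimated from data rather than fixed by hand.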

