A Study on Prediction Intervals Produced Using Quantile Regression Forest With and Without Variable Selection

Megawati Megawati, Bagus Sartono, Sachnaz Desta Oktarina

Abstract


Quantile Regression Forest (QRF) uses the random forest algorithm to estimate the conditional distribution of a response variable and to form prediction intervals from its quantiles. When covariates are highly correlated, however, QRF performance may deteriorate because of multicollinearity, reducing the accuracy of the prediction interval for the target variable. In linear models, multicollinearity must be addressed because it can inflate the variance of the estimates. This study aims to improve the reliability of prediction intervals for correlated data by integrating adaptive LASSO with QRF. Specifically, it examines how variable selection with adaptive LASSO affects the performance of QRF prediction intervals on simulated data; the best model from the simulation is then applied to interval prediction for productivity data of oil palm fresh fruit bunches. The results show that variable selection yields coverage close to the target level of the prediction interval. In addition, the QRF model with variable selection produces a good prediction interval when applied to the oil palm fresh fruit bunch productivity data.
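To make "coverage close to the target" concrete: the empirical coverage of a set of prediction intervals is the share of observed responses that fall inside their intervals, which is then compared with the nominal level (for example 0.90). The sketch below is a minimal illustration of that check, together with the mean interval width. The function names and the toy numbers are assumptions made here for illustration; they are not the study's code or data, and the interval bounds would in practice come from a fitted QRF.

```python
from typing import Sequence


def interval_coverage(
    y_true: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> float:
    """Share of observations y with lower <= y <= upper."""
    if len(y_true) == 0 or not (len(y_true) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return hits / len(y_true)


def mean_interval_width(lower: Sequence[float], upper: Sequence[float]) -> float:
    """Average distance between the upper and lower interval bounds."""
    if len(lower) == 0 or len(lower) != len(upper):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Toy numbers, made up for illustration only (not data from the study).
y_obs = [10.0, 12.5, 9.0, 15.0, 11.0]    # observed responses
q_low = [8.0, 11.0, 9.5, 12.0, 10.0]     # hypothetical lower quantile predictions
q_high = [12.0, 14.0, 11.0, 16.0, 13.0]  # hypothetical upper quantile predictions

coverage = interval_coverage(y_obs, q_low, q_high)
width = mean_interval_width(q_low, q_high)
print(f"coverage = {coverage:.2f}, mean width = {width:.2f}")
```

In this toy case four of the five observations fall inside their intervals. Coverage should be read together with width, since very wide intervals can reach the nominal level trivially.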

Keywords


Adaptive-LASSO; Oil Palm Productivity; Prediction Interval; Quantile Regression Forest; Variable Selection





DOI: https://doi.org/10.37905/euler.v13i3.34392



Copyright (c) 2025 Megawati Megawati, Bagus Sartono, Sachnaz Desta Oktarina

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Euler : Jurnal Ilmiah Matematika, Sains dan Teknologi (p-ISSN: 2087-9393 | e-ISSN: 2776-3706) by Department of Mathematics Universitas Negeri Gorontalo is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.