Propensity Score Matching in the Use of Web-Scraped Data to Improve Official Statistics

Fatimah Fatimah, Hari Wijayanto, Farit Mochamad Afendi

Abstract


The Central Statistics Agency (BPS) has taken up the challenge of using big data. One BPS publication that big data can support is the inflation figure, which is compiled from the consumer price survey. One component of that survey is the HK-4 Survey, which records house contract rates. To date, the house contract rates produced by BPS have been underestimated, that is, lower than actual market rates. This study corrects them by matching BPS data with data scraped from house rental websites using Propensity Score Matching (PSM). The data cover DKI Jakarta, Bandung, and Semarang from September to October 2023. The study aims to find the best PSM matching model for improving official statistics (house contract rates) by combining several propensity score estimation methods and matching algorithms. The matches from the best model are then used to calculate corrected house contract rates. The results show that the best matching model generally estimates propensity scores with logistic regression and uses the nearest neighbor matching algorithm with replacement at a 1:1 ratio. The corrected contract rates are far above the official ones (corrections of 87.27% for DKI Jakarta, 316.15% for Bandung, and 60.04% for Semarang). Web scraping can help improve official statistics because it saves cost and time, enhances the quality of official statistical data, and supports better decision-making in various sectors.
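The matching setup named in the abstract (logistic regression propensity scores, then 1:1 nearest neighbor matching with replacement) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not the authors' implementation: the covariates (building area and bedroom count), the toy records, the choice of BPS records as the group to be matched, and the helper names `fit_logistic`, `propensity`, and `match_nearest_with_replacement` are all hypothetical.

```python
import math

def fit_logistic(X, t, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.
    Returns the weights with the intercept first."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - ti
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def propensity(w, x):
    """Estimated probability that a record with covariates x is in group 1."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def match_nearest_with_replacement(ps_group1, ps_group0):
    """1:1 nearest neighbor matching on the propensity score.
    Each group-1 unit gets the index of the closest group-0 unit;
    a group-0 unit may be reused for several group-1 units."""
    return [
        min(range(len(ps_group0)), key=lambda j: abs(ps_group0[j] - p))
        for p in ps_group1
    ]

# Hypothetical toy records: (building area in 100 m^2, number of bedrooms).
bps_records = [(0.36, 2), (0.45, 2), (0.60, 3)]            # assumed group 1
scraped_listings = [(0.40, 2), (0.70, 3), (1.20, 4),
                    (0.50, 2), (0.90, 3)]                   # assumed group 0

X = bps_records + scraped_listings
t = [1] * len(bps_records) + [0] * len(scraped_listings)

weights = fit_logistic(X, t)
ps_bps = [propensity(weights, x) for x in bps_records]
ps_scraped = [propensity(weights, x) for x in scraped_listings]

# pairs[i] is the index of the scraped listing matched to BPS record i.
pairs = match_nearest_with_replacement(ps_bps, ps_scraped)
```

Matching with replacement lets one well-fitting listing serve several survey records, which keeps every record matched at the cost of reusing some listings; a real analysis would also check covariate balance after matching.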

Keywords


Big Data; House Contract Rates; Propensity Score Matching; Official Statistics; Web Scraping

DOI: https://doi.org/10.37905/jjom.v6i2.26568



Copyright (c) 2024 Fatimah Fatimah, Hari Wijayanto, Farit Mochamad Afendi

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


Jambura Journal of Mathematics (e-ISSN: 2656-1344) by Department of Mathematics Universitas Negeri Gorontalo is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.