Air Quality Analysis Using the Pipeline Method (Case Study: Italy Air Quality Dataset 2004–2005)

Rahmad Hidayat Dongka, Moh. Iqbal Hunowu, Ulfatun Nadifa, Ade Irawaty Tolago

Abstract


This study demonstrates how to analyze an air pollution dataset recorded in Italy between 2004 and 2005, a period in which a range of chemical pollutants was measured in the atmosphere. The analysis applies several methods common in raw data processing: data normalization, data separation, and data visualization. The Air Quality dataset, obtained from the UCI Machine Learning Repository, contains 9,357 records with 15 columns representing pollutant measurements. The Pipeline method from the scikit-learn library was used to clean, process, and transform the data in a structured manner. The results indicate that the sensor readings in the dataset correlate strongly with the laboratory reference measurements (ground truth); in particular, the PT08.S1(CO) sensor shows a correlation of 0.93 with CO(GT). The correlation heatmap shows a strong relationship between sensor readings and the reference concentrations of chemical compounds in the air, suggesting that sensor data can be used to monitor air quality in real time.
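As an illustration of the kind of workflow the abstract describes, the following is a minimal sketch, not the authors' actual code. It assumes that missing readings are stored as the sentinel value -200 (as in the UCI release of this dataset), and it assumes median imputation and standard scaling as the cleaning and normalization steps, neither of which is specified in the abstract. The data are synthetic stand-ins for the CO(GT) and PT08.S1(CO) columns, so the resulting correlation is illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two columns of the dataset (values are made up).
rng = np.random.default_rng(0)
co_gt = rng.uniform(0.5, 8.0, size=200)
sensor = 800 + 120 * co_gt + rng.normal(0, 40, size=200)
df = pd.DataFrame({"CO(GT)": co_gt, "PT08.S1(CO)": sensor})

# Assumption: missing readings are encoded as -200; mark them as NaN.
df.iloc[::25, 0] = -200.0
df = df.replace(-200, np.nan)

# Hypothetical two-step pipeline: fill gaps, then normalize each column.
preprocess = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

clean = pd.DataFrame(preprocess.fit_transform(df), columns=df.columns)

# Pearson correlation matrix; this is the table a heatmap would display.
corr = clean.corr(method="pearson")
print(corr.round(2))
```

Chaining the steps in a single Pipeline object keeps the cleaning and scaling order fixed, so the same transformation can be reapplied to new sensor readings with `preprocess.transform`.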

Keywords


Air Quality; Pipeline Method; Correlation; Pollution Sensor; Data Visualization

Copyright (c) 2026 International Journal of Embedded Computer Engineering

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.