|Nabavi, SO; Nölscher, AC; Samimi, C; Thomas, C; Haimberger, L; Lüers, J; Held, A: Site-Scale modelling of surface ozone in Northern Bavaria using machine learning algorithms, regional dynamic models, and a hybrid model, Environmental Pollution (2021), doi:10.1016/j.envpol.2020.115736|
|Key words: Downscaling; Surface ozone; Ensemble learning; Simulation interpretability|
Ozone (O3) is a harmful pollutant when present in the lowermost layer of the atmosphere. Therefore, the European Commission formulated directives to regulate O3 concentrations in near-surface air. However, almost 50% of the 5068 air quality stations in Europe do not monitor O3 concentrations. This study aims to provide a hybrid modeling system that fills these gaps in the hourly surface O3 observations on a site scale with much higher accuracy than existing O3 models. This hybrid model was developed using estimations from multiple linear regression-based eXtreme Gradient Boosting Machines (MLR-XGBM) and O3 reanalysis from European regional air quality models (CAMS-EU). The binary classification of extremely high O3 events and the 1- and 24-h forecasts of hourly O3 were investigated as secondary aims. In this study, thirteen stations in Northern Bavaria, six of which do not monitor O3, were chosen as test sites. Considering the computational complexity of machine learning algorithms (MLAs), we also applied two recent MLA interpretation methods, namely SHapley Additive exPlanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME).
With SHAP, we showed an increasing effect of temperature on O3 concentrations, which intensifies for temperatures exceeding 17 °C. According to LIME, O3 concentration peaks are mainly governed by meteorological factors under dry and warm conditions on a regional scale, whereas local nitrogen oxide concentrations control base O3 concentrations during cold and wet periods.
While recently developed MLAs for the spatial estimation of hourly O3 concentrations had a station-based root-mean-square error (RMSE) above 27 μg/m3, our proposed model significantly reduced the estimation error by about 66%, to an RMSE of 9.49 μg/m3. We also found that logistic regression (LR) and MLR-XGBM performed best in the site-scale classification and 24-h forecast of O3 concentrations, respectively (with a station-averaged accuracy of 0.95 and a station-averaged RMSE of 19.34 μg/m3).
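The error figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' evaluation code: the function names `rmse` and `relative_reduction` are hypothetical, the sample O3 values are invented for illustration, and the baseline of 27 μg/m3 is taken as the lower bound stated in the abstract, since the exact baseline value is not given there.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between paired observations and estimates."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_rmse, new_rmse):
    """Fraction by which new_rmse is lower than baseline_rmse."""
    return 1.0 - new_rmse / baseline_rmse


# Hypothetical hourly O3 values in ug/m3 (illustrative only, not study data).
obs = [40.0, 55.0, 70.0, 90.0]
est = [42.0, 50.0, 75.0, 84.0]
station_rmse = rmse(obs, est)

# Assumed baseline: the abstract only states "above 27 ug/m3".
reduction = relative_reduction(27.0, 9.49)
print(f"station RMSE: {station_rmse:.2f} ug/m3")
print(f"relative reduction vs. 27 ug/m3: {reduction:.1%}")
```

With a baseline of exactly 27 μg/m3 the reduction comes out at roughly 65%. A baseline slightly above 27 μg/m3, which the wording "above 27" permits, would yield the quoted figure of about 66%.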