Machine Learning for Carbon Emissions Prediction and Anomaly Detection: A Comparative Study of Traditional and Deep Learning Approaches

Abstract
Accurate greenhouse gas emissions monitoring is critical for evaluating climate policies and ensuring regulatory compliance. Although machine learning offers promising capabilities, systematic comparisons between traditional and deep learning approaches for structured environmental time-series data remain limited. Here, we evaluated seven prediction models and three anomaly detection approaches on monthly CO₂ emissions data from 20 major emitting countries between 2000 and 2024. Random Forest achieved superior performance (R² = 0.902) compared to all deep learning architectures (R² ≈ 0), while requiring 15–60 times less training time. SHapley Additive exPlanations revealed that the 12-month lag feature dominated the predictions (75.6% importance), indicating strong year-over-year persistence with direct policy implications. Deep learning failure stems from insufficient training data (3,863 samples), structured tabular characteristics with domain-engineered features, and unnecessary architectural complexity, thereby aligning with emerging evidence that traditional methods often outperform deep learning on tabular datasets. Anomaly detection proved challenging, with all unsupervised models achieving F1 scores below 0.05, suggesting that supervised approaches with labelled examples are necessary for operational deployment. We recommend prioritising Random Forest for prediction, investing in domain-informed feature engineering, integrating explainability tools for transparency, and deploying anomaly detection with human review. We identified eight research frontiers, including satellite data integration, federated learning, causal inference, and hybrid physics-machine learning models. This study demonstrates that traditional machine learning, combined with domain expertise and interpretability tools, can provide accurate, efficient, and transparent emission predictions for operational climate monitoring systems.
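To illustrate what a 12-month lag feature captures, the following is a minimal sketch in Python. It is not the study's pipeline: the series is synthetic, the helper names `lag_feature` and `r_squared` are hypothetical, and no Random Forest is trained. It only shows how a lag-12 feature is aligned with its target and how much variance a plain "same month last year" baseline can explain when year-over-year persistence is strong.

```python
import math


def lag_feature(series, lag=12):
    """Pair each observation with the value `lag` steps earlier.

    Returns (lagged, target), where lagged[i] precedes target[i] by `lag` steps.
    """
    return series[:-lag], series[lag:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic monthly series (illustrative only, not the study's data):
# a 12-month seasonal cycle plus a slow upward trend.
emissions = [100 + 10 * math.sin(2 * math.pi * m / 12) + 0.1 * m for m in range(48)]

# Use the lag-12 value directly as the prediction (a persistence baseline).
lag12, target = lag_feature(emissions, lag=12)
baseline_r2 = r_squared(target, lag12)
print(f"R^2 of the lag-12 persistence baseline: {baseline_r2:.3f}")
```

On a strongly seasonal series such as this one, the lag-12 value alone accounts for most of the variance, which is consistent with a single lag feature dominating a model's feature importances. In practice the lagged column would be one input among several to a tabular learner rather than the prediction itself.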