Imbalanced Learning Techniques for Land Subsidence Prediction: Ensemble Methods and Data Balancing Strategies
Published in: Eighth International Conference on Technology Development in Materials Engineering, Mining and Geology
Publication year: 1404 (Solar Hijri)
Document type: Conference paper
Language: English
Length: 15 pages (PDF)
National scientific document ID:
EMGBC09_056
Indexing date: 1 Azar 1404
Abstract:
Land subsidence poses significant threats to infrastructure, the environment, and public safety, making accurate prediction essential for disaster risk reduction and sustainable resource management. This study addresses the critical challenge of class imbalance in land subsidence prediction datasets, where subsidence events are rare compared to stable ground conditions, leading to biased models that poorly detect actual subsidence occurrences. We propose and evaluate several imbalanced learning approaches, including random under-sampling, cost-sensitive algorithms, and ensemble methods (bagging and boosting), for predicting land subsidence in Chaharmahal and Bakhtiari province, Iran. The study uses a dataset of 516 subsidence locations identified through InSAR analysis, along with 13 conditioning factors covering geological, hydrological, environmental, and anthropogenic variables. The imbalanced learning techniques are systematically compared using precision, recall, F1-score, and ROC-AUC. Results demonstrate that random under-sampling followed by Random Forest achieves the most balanced performance, with precision, recall, and F1-score all reaching 94% and a ROC-AUC of 98.4%. Although the bagging method applied directly to the imbalanced data achieves high recall (96%) and ROC-AUC (99%), it suffers from lower precision due to false positives. The fine-tuned models are used to generate land subsidence susceptibility maps for the entire study area, revealing that the eastern and southeastern regions exhibit the highest susceptibility. Risk analysis shows that random under-sampling is the more conservative method, producing the most balanced risk distribution, with 9.1% and 8.1% of the area classified as high and very high risk, respectively. The findings highlight the critical importance of addressing class imbalance for achieving reliable subsidence prediction. This research provides valuable insights for improving early warning systems and supporting informed decision-making for land subsidence risk management.
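The random under-sampling step and the precision, recall, and F1-score metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `random_under_sample` and `precision_recall_f1`, the toy data, and the seed are all assumptions made for illustration, and the classifier itself (Random Forest in the paper) is left out so that the example needs only the Python standard library.

```python
import random
from collections import Counter


def random_under_sample(X, y, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    # Size of the rarest class sets the per-class sample size.
    n_min = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, n_min))  # sampling without replacement
    kept.sort()  # preserve the original row order
    return [X[i] for i in kept], [y[i] for i in kept]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy imbalanced data (made up): 95 "stable" rows (0) and 5 "subsidence" rows (1).
X = [[float(i)] for i in range(100)]
y = [1 if i % 20 == 0 else 0 for i in range(100)]

X_bal, y_bal = random_under_sample(X, y, seed=42)
class_counts = Counter(y_bal)  # both classes now have 5 rows each
```

In a fuller pipeline, the balanced `X_bal` and `y_bal` would be passed to a classifier, and the metrics would be computed on a held-out test set that keeps its original class proportions.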
Keywords:
Imbalanced Learning, Machine Learning, Land Subsidence Prediction, Ensemble Methods, Bagging, Boosting, Cost Sensitive Algorithm, Land Subsidence Susceptibility Map
Authors
Khayyam Salehi
Department of Computer Science, Faculty of Mathematical Sciences, Shahrekord University, Iran
Maryam Karimi
Department of Computer Science, Faculty of Mathematical Sciences, Shahrekord University, Iran
Khosro Keyani
Department of Civil Engineering, Shahrekord Branch, Islamic Azad University, Shahrekord, Iran