SMOGN: a pre-processing approach for imbalanced regression

Branco, Paula Oliveira; Torgo, Luís; Ribeiro, Rita Paula

Abstract

The problem of imbalanced domains, framed within predictive tasks, is relevant in many practical applications. When dealing with imbalanced domains, performance usually degrades on the cases that are rarest and most relevant to the user. This problem has been thoroughly studied in classification settings, where the target variable is nominal. Its exploration in other contexts is more recent. For regression tasks, where the target variable is continuous, only a few solutions exist. Pre-processing strategies are among the most successful proposals for tackling this problem. In this paper we propose a new pre-processing approach for dealing with imbalanced regression. Our algorithm, SMOGN, combines two existing proposals and tries to solve problems detected in both of them. We show that SMOGN has advantages over other approaches. We also show that the impact of our method varies with the learner used, with greater advantages for Random Forest and Multivariate Adaptive Regression Splines learners.

Publication
First International Workshop on Learning with Imbalanced Domains: Theory and Applications, 2017, pp. 36-50
Paula Branco
Assistant Professor

I’m an Assistant Professor at EECS, University of Ottawa. My research interests include Artificial Intelligence, Machine Learning, Imbalanced Domains, Outlier Detection, Anomaly Detection, Fraud Detection and Cybersecurity.