Branco, Paula; Torgo, Luis; Ribeiro, Rita P.
Imbalanced domains are an important problem in predictive tasks: performance degrades on the cases that are most relevant to the user. The problem has been studied extensively for classification, where the target variable is nominal. More recently, it has been recognized that imbalanced domains also arise in other contexts and tasks, such as regression, where the target variable is continuous. This paper addresses imbalanced domains in both classification and regression. Resampling strategies are among the most successful approaches to this problem. We propose variants of existing resampling strategies that take into account information about the neighbourhood of each example. Instead of sampling uniformly, our proposals bias the strategies to reinforce some regions of the data set. An extensive set of experiments provides evidence that introducing a neighbourhood bias into resampling strategies is advantageous for both classification and regression tasks on imbalanced data sets.
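To make the idea of non-uniform resampling concrete, the following is a minimal sketch of random oversampling with a neighbourhood bias for a binary classification task. It is an illustration under stated assumptions, not the procedure proposed in the paper. The function names (`neighbourhood_weights`, `biased_oversample`), the choice of k nearest neighbours under Euclidean distance, and the specific weighting rule (favouring minority examples whose neighbours mostly belong to another class) are all hypothetical. The regression setting is not covered.

```python
import math
import random


def neighbourhood_weights(X, y, minority_label, k=3):
    """Return (minority indices, sampling probabilities).

    Assumed bias (illustrative only): an example's weight grows with the
    fraction of its k nearest neighbours that belong to a different class.
    """
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    weights = []
    for i in minority_idx:
        # Rank every other example by Euclidean distance to example i.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]),
        )
        neighbours = others[:k]
        frac_other = sum(y[j] != minority_label for j in neighbours) / k
        # The 1/k offset keeps every minority example selectable.
        weights.append(frac_other + 1.0 / k)
    total = sum(weights)
    return minority_idx, [w / total for w in weights]


def biased_oversample(X, y, minority_label, n_new, k=3, seed=0):
    """Append n_new minority replicas drawn with the neighbourhood bias."""
    idx, probs = neighbourhood_weights(X, y, minority_label, k)
    rng = random.Random(seed)
    # Non-uniform draw with replacement, in place of uniform sampling.
    chosen = rng.choices(idx, weights=probs, k=n_new)
    X_new = list(X) + [X[i] for i in chosen]
    y_new = list(y) + [y[i] for i in chosen]
    return X_new, y_new


# Toy data: five majority examples and three minority examples in 1-D.
X = [[0.0], [0.1], [0.2], [0.3], [0.4], [1.0], [1.1], [5.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1]
idx, probs = neighbourhood_weights(X, y, minority_label=1, k=2)
X_new, y_new = biased_oversample(X, y, minority_label=1, n_new=2, k=2)
```

In this toy data set, the two minority examples that lie next to the majority class receive a higher sampling probability than the isolated one at 5.0. Reversing the weighting rule would instead reinforce regions far from the other class; which regions to reinforce is a design choice this sketch does not settle.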