The CURE for Class Imbalance

Colin Bellinger, Paula Branco, Luis Torgo

Abstract

Addressing the class imbalance problem is critical for several real-world applications. The application of pre-processing methods is a popular way of dealing with this problem. These solutions increase the number of rare class examples and/or decrease the number of normal class cases. However, these procedures typically take into account only the characteristics of each individual class. This segmented view of the data can have a negative impact. We propose a new method that uses an integrated view of the data classes to generate new examples and remove cases. ClUstered REsampling (CURE) is a method based on a holistic view of the data that uses hierarchical clustering and a new distance measure to guide the sampling procedure. Clusters generated in this way take into account the structure of the data. This enables CURE to avoid common mistakes made by other resampling methods. In particular, CURE prevents the generation of synthetic examples in dangerous regions and undersamples only safe, non-borderline regions of the majority class. We show the effectiveness of CURE in an extensive set of experiments with benchmark domains. We also show that CURE is a user-friendly method that does not require extensive fine-tuning of hyper-parameters.
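
As a rough illustration of the general idea in the abstract (cluster all classes together, then resample cluster by cluster), here is a minimal Python sketch. It is not the CURE algorithm from the paper: Ward linkage on Euclidean distance stands in for the paper's new distance measure, which the abstract does not specify, and the function name `cluster_guided_resample` and all of its parameters are hypothetical. Mixed-class clusters are treated as borderline regions and left untouched, which is also an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def cluster_guided_resample(X, y, minority=1, n_clusters=4, keep_frac=0.5,
                            n_new_per_cluster=5, seed=0):
    """Illustrative cluster-guided resampling; NOT the published CURE method."""
    rng = np.random.default_rng(seed)
    # Assumption: Ward linkage with Euclidean distance replaces the paper's
    # own distance measure. Clustering uses all classes together.
    tree = linkage(X, method="ward")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")

    X_out, y_out = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        classes = np.unique(y[idx])
        if len(classes) > 1:
            # Mixed cluster: assumed borderline, so it is kept unchanged.
            X_out.append(X[idx])
            y_out.append(y[idx])
        elif classes[0] == minority:
            # Pure minority cluster: keep every point and add synthetic
            # points interpolated between random pairs from this cluster.
            X_out.append(X[idx])
            y_out.append(y[idx])
            if len(idx) >= 2:
                a = X[rng.choice(idx, n_new_per_cluster)]
                b = X[rng.choice(idx, n_new_per_cluster)]
                lam = rng.random((n_new_per_cluster, 1))
                X_out.append(a + lam * (b - a))
                y_out.append(np.full(n_new_per_cluster, minority))
        else:
            # Pure majority cluster: keep only a random fraction.
            n_keep = max(1, int(round(keep_frac * len(idx))))
            kept = rng.choice(idx, n_keep, replace=False)
            X_out.append(X[kept])
            y_out.append(y[kept])
    return np.vstack(X_out), np.concatenate(y_out)


# Toy data: 40 majority points near the origin, 6 minority points near (5, 5).
demo_rng = np.random.default_rng(0)
X_demo = np.vstack([demo_rng.normal(0, 0.3, (40, 2)),
                    demo_rng.normal(5, 0.3, (6, 2))])
y_demo = np.array([0] * 40 + [1] * 6)
X_res, y_res = cluster_guided_resample(X_demo, y_demo, n_clusters=2)
```

Because synthetic points are convex combinations of points from a single pure-minority cluster, they cannot land in a cluster that also contains majority examples. This is the kind of behaviour the abstract attributes to using the structure of the data, though the paper's actual rules may differ.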

Publication
First International Workshop on Learning with Imbalanced Domains: Theory and Applications, 2017, pp. 36-50
Paula Branco
Assistant Professor

I’m an Assistant Professor at EECS, University of Ottawa. My research interests include Artificial Intelligence, Machine Learning, Imbalanced Domains, Outlier Detection, Anomaly Detection, Fraud Detection and Cybersecurity.