Imbalanced Regression Data Sets
15 data sets for imbalanced regression
Contents
This repository contains the 15 imbalanced regression data sets used in the paper:
Paula Branco, Luis Torgo, and Rita P. Ribeiro, "Pre-processing Approaches for Imbalanced Distributions in Regression", submitted to the Neurocomputing journal.
The data sets are provided in 3 formats:
- RDATA
- CSV
- ARFF
The above links allow you to download the data sets in all the available formats.
The following links allow you to obtain the data sets in only one of the formats.
The main characteristics of the 15 regression data sets in this folder are as follows:
More Details on the Data Sets
where,
N represents the total number of cases;
tpred represents the number of predictors;
p.nom represents the number of nominal predictors;
p.num represents the number of numeric predictors;
nRare represents the number of cases with target variable relevance above 0.8; and
% Rare represents the percentage of rare cases, computed as nRare / N × 100.
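
The quantities defined above can be recomputed from any of the CSV files. The following Python snippet is a minimal sketch, not the procedure used in the paper. The function name `summarize`, the column names, the toy data, and the toy relevance function are all hypothetical. In particular, the relevance function must be supplied by the user, since this README does not define it. Only the 0.8 threshold comes from the definitions above.

```python
import csv
import io


def summarize(rows, target, relevance, threshold=0.8):
    """Compute N, tpred, p.nom, p.num, nRare and % Rare for one data set.

    rows      -- list of dicts, e.g. as produced by csv.DictReader
    target    -- name of the target column (hypothetical; depends on the file)
    relevance -- callable mapping a target value to a relevance in [0, 1];
                 an assumption of this sketch, to be supplied by the caller
    threshold -- cases with relevance above this value count as rare
    """
    predictors = [col for col in rows[0] if col != target]

    def is_numeric(col):
        # A column is treated as numeric if every value parses as a float.
        # Nominal predictors encoded as numbers would be misclassified here.
        try:
            for row in rows:
                float(row[col])
            return True
        except ValueError:
            return False

    p_num = sum(is_numeric(col) for col in predictors)
    n = len(rows)
    n_rare = sum(relevance(float(row[target])) > threshold for row in rows)

    return {
        "N": n,
        "tpred": len(predictors),
        "p.nom": len(predictors) - p_num,
        "p.num": p_num,
        "nRare": n_rare,
        "% Rare": n_rare / n * 100,
    }


# Toy data standing in for one of the CSV files (values are made up).
toy_csv = """x1,x2,y
1.0,a,10
2.0,b,12
3.0,a,11
4.0,b,50
"""
rows = list(csv.DictReader(io.StringIO(toy_csv)))


def toy_relevance(y):
    # Placeholder only: treats target values above 40 as fully relevant.
    return 1.0 if y > 40 else 0.0


summary = summarize(rows, target="y", relevance=toy_relevance)
print(summary)
```

To apply this to a downloaded file, replace the in-memory string with `open(path, newline="")` and pass the actual target column name together with a relevance function of your choice.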