Data Sets for Imbalanced Regression Learning

15 data sets for imbalanced regression from different domains

Download .zip Download .tar.gz View on GitHub

Imbalanced Regression Data Sets

Contents

This repository contains the 15 imbalanced regression data sets used in the paper:

Paula Branco, Luis Torgo, and Rita P. Ribeiro, "Pre-processing Approaches for Imbalanced Distributions in Regression", submitted to the Neurocomputing journal.

The data sets are provided in 3 formats:

  • RDATA
  • CSV
  • ARFF

The download links above provide the data sets in all the available formats.

The following links provide the data sets in a single format each:

  • Rdata zip
  • CSV zip
  • ARFF zip
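
As a minimal sketch of how one of the CSV files might be read after unzipping, the Python snippet below uses only the standard library. Several things here are assumptions and are not stated in this repository: that each file has a header row, that the target variable is numeric, and that it sits in the last column unless a column name is given. The function name `read_regression_csv` and the in-memory sample data are hypothetical and serve only as illustration; check the layout of the actual files before relying on it.

```python
import csv
import io


def read_regression_csv(handle, target_column=None):
    """Read a CSV data set into (predictor_rows, targets, predictor_names).

    Assumptions (not confirmed by the repository): the file has a header
    row and a numeric target. If target_column is None, the last column
    is taken as the target.
    """
    reader = csv.reader(handle)
    header = next(reader)
    if target_column is None:
        target_index = len(header) - 1
    else:
        target_index = header.index(target_column)
    predictor_names = [name for i, name in enumerate(header) if i != target_index]

    predictors, targets = [], []
    for row in reader:
        if not row:  # skip blank lines
            continue
        targets.append(float(row[target_index]))
        # Predictors are kept as strings because some may be nominal.
        predictors.append([value for i, value in enumerate(row) if i != target_index])
    return predictors, targets, predictor_names


# Tiny in-memory stand-in for a real file, for illustration only.
sample = io.StringIO("x1,x2,y\nwinter,small,0.5\nspring,large,12.25\n")
rows, y, names = read_regression_csv(sample)
```

With a real file, the same function can be given a handle opened with `open(path, newline="")`, where `path` points to one of the unzipped CSV files.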

More Details on the Data Sets

The main characteristics of the 15 regression data sets in this folder are as follows:
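
| ID   | Data Set  | N    | tpred | p.nom | p.num | nRare | % Rare |
|------|-----------|------|-------|-------|-------|-------|--------|
| DS1  | a6        | 198  | 11    | 3     | 8     | 33    | 16.7   |
| DS2  | Abalone   | 4177 | 8     | 1     | 7     | 679   | 16.3   |
| DS3  | a3        | 198  | 11    | 3     | 8     | 32    | 16.2   |
| DS4  | a4        | 198  | 11    | 3     | 8     | 31    | 15.7   |
| DS5  | a1        | 198  | 11    | 3     | 8     | 28    | 14.1   |
| DS6  | a7        | 198  | 11    | 3     | 8     | 27    | 13.6   |
| DS7  | boston    | 506  | 13    | 0     | 13    | 65    | 12.8   |
| DS8  | a2        | 198  | 11    | 3     | 8     | 22    | 11.1   |
| DS9  | fuelCons  | 1764 | 37    | 12    | 25    | 164   | 9.3    |
| DS10 | heat      | 7400 | 12    | 4     | 8     | 664   | 9.0    |
| DS11 | availPwr  | 1802 | 15    | 7     | 8     | 157   | 8.7    |
| DS12 | cpuSm     | 8192 | 12    | 0     | 12    | 713   | 8.7    |
| DS13 | maxTorque | 1802 | 32    | 13    | 19    | 129   | 7.2    |
| DS14 | bank8FM   | 4499 | 8     | 0     | 8     | 288   | 6.4    |
| DS15 | Accel     | 1732 | 14    | 3     | 11    | 89    | 5.1    |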

where,

N represents the total number of cases;

tpred represents the number of predictors;

p.nom represents the number of nominal predictors;

p.num represents the number of numeric predictors;

nRare represents the number of cases with target variable relevance above 0.8; and

% Rare represents the percentage of rare cases, computed as nRare / N × 100.
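
The definitions above can be sketched in Python as follows. The relevance function itself is not specified on this page, so `toy_relevance` below is a hypothetical placeholder and not the function used for these data sets; the helper names `count_rare` and `percent_rare` are likewise illustrative. The sample rows are copied from the table above and are used only to check that tpred = p.nom + p.num and that % Rare matches nRare / N × 100 rounded to one decimal place.

```python
def count_rare(targets, relevance, threshold=0.8):
    """Count cases whose target relevance is strictly above the threshold."""
    return sum(1 for y in targets if relevance(y) > threshold)


def percent_rare(n_rare, n_cases):
    """Return % Rare = nRare / N x 100, rounded to one decimal place."""
    return round(100.0 * n_rare / n_cases, 1)


def toy_relevance(y):
    """Hypothetical placeholder: treats target values above 10 as fully relevant."""
    return 1.0 if y > 10 else 0.0


# Illustration with made-up target values and the placeholder relevance.
toy_targets = [1.0, 5.0, 12.0, 20.0]
toy_n_rare = count_rare(toy_targets, toy_relevance)
toy_pct = percent_rare(toy_n_rare, len(toy_targets))

# (data set, N, tpred, p.nom, p.num, nRare, % Rare), copied from the table.
SAMPLE_ROWS = [
    ("a6", 198, 11, 3, 8, 33, 16.7),
    ("Abalone", 4177, 8, 1, 7, 679, 16.3),
    ("boston", 506, 13, 0, 13, 65, 12.8),
    ("Accel", 1732, 14, 3, 11, 89, 5.1),
]

for name, n, tpred, p_nom, p_num, n_rare, pct in SAMPLE_ROWS:
    # Nominal and numeric predictors should add up to the predictor total.
    assert p_nom + p_num == tpred, name
    # The listed percentage should match the formula.
    assert percent_rare(n_rare, n) == pct, name
```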