Training Machine Learning (ML) models in real-world settings often involves large data sets with a high class imbalance, in which the class of interest (the minority class) is underrepresented. Practical solutions based on classical ML models address the problem of large data sets through parallel or distributed implementations of training algorithms, approximate model-based solutions, or instance selection (IS) algorithms that eliminate redundant information. However, the combined problem of large and highly imbalanced data sets has received less attention. This work proposes three new IS methods that can deal with large and imbalanced data sets. The proposed methods use Locality Sensitive Hashing (LSH) as a base clustering technique, and then three...
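To make the idea of LSH as a base clustering step for instance selection concrete, the following is a minimal sketch and not the three methods proposed in this work, whose details are not given above. It assumes random-hyperplane (sign) hashing as the LSH family and a deliberately simple selection rule: keep every minority instance and one majority instance per hash bucket. The function names `lsh_bucket_ids` and `select_instances` and the parameters `n_bits` and `seed` are hypothetical and chosen only for illustration.

```python
import numpy as np


def lsh_bucket_ids(X, n_bits=8, seed=0):
    """Assign each row of X an integer bucket id via random-hyperplane LSH.

    Assumption: sign-of-projection hashing; the source does not say which
    LSH family is used.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))
    bits = (X @ planes) > 0                        # (n_samples, n_bits) booleans
    weights = 1 << np.arange(n_bits, dtype=np.int64)
    return bits.astype(np.int64) @ weights         # pack the bits into one integer


def select_instances(X, y, minority_label, n_bits=8, seed=0):
    """Return sorted indices of the instances to keep.

    Illustrative rule (an assumption, not the paper's): keep all minority
    instances, and only the first majority instance seen in each
    (bucket, class) pair.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    buckets = lsh_bucket_ids(X, n_bits=n_bits, seed=seed)

    keep = y == minority_label                     # boolean mask, minority kept whole
    seen = set()
    for i in np.flatnonzero(y != minority_label):
        key = (int(buckets[i]), y[i])
        if key not in seen:
            seen.add(key)
            keep[i] = True
    return np.flatnonzero(keep)


# Usage on synthetic data: 1000 majority points and 20 minority points.
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(0, 1, (1000, 5)), rng.normal(3, 1, (20, 5))])
y_demo = np.array([0] * 1000 + [1] * 20)
idx_demo = select_instances(X_demo, y_demo, minority_label=1)
print(len(y_demo), "->", len(idx_demo), "instances kept")
```

Hashing takes a single pass over the data, so the cost grows linearly with the number of instances. Fewer bits give coarser buckets and a stronger reduction of the majority class, while the minority class is never reduced.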
The enormous volume of data from different sources, really varied in its typology, generated and pro...
The classification of datasets with a skewed class distribution is an important problem in data mini...
Learning from an imbalanced dataset is a tricky proposition. Because these datasets are biased towar...
Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, be...
We proposed a new algorithm to preprocess huge and imbalanced data. This algorithm, based on distance...
Several applications aim to identify rare events from very large data sets. Classification a...
Addressing the huge amount of data continuously generated is an important challenge in the Machine L...
Learning from imbalanced data is among the most challenging areas in contemporary machine learning. ...
The design of efficient big data learning models has become a common need in a great number of appli...
Despite the tremendous advances in machine learning (ML), training with imbalanced data still poses ...
This work was supported by the Research Projects TIN2011-28488, TIN2013-40765-P, P10-TIC-6858 and P...
Instance selection is becoming increasingly relevant due to the huge amount of data that is ...
The problem of classification of imbalanced datasets is a critical one. With an increase in the numb...
This thesis aims to scale Bayesian machine learning (ML) to very large datasets. First, I propose a ...