Learning from imbalanced data is among the most challenging areas in contemporary machine learning. This becomes even more difficult in the context of big data, which calls for dedicated architectures capable of high-performance processing. Apache Spark is a highly efficient and popular architecture, but it poses specific challenges for the algorithms implemented on it. While oversampling algorithms are an effective way of handling class imbalance, they were not designed for distributed environments. In this paper, we take a holistic look at oversampling algorithms for imbalanced big data. We discuss the taxonomy of oversampling algorithms and the mechanisms they use to handle skewed class distributions. We introduce a ...
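The difficulty with distributed environments can be illustrated with the simplest oversampling mechanism: random duplication of minority examples. The following is a minimal sketch, under our own assumptions, that applies it independently to each data partition, as a map-partitions style job would. Partitions are emulated here as plain Python lists of (features, label) pairs. No Spark API is used, and the function names `random_oversample` and `oversample_per_partition` are hypothetical, not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(rows, rng):
    """Duplicate randomly chosen examples of every smaller class until each
    class matches the largest class count *within this list of rows*."""
    counts = Counter(label for _, label in rows)
    if not counts:
        return []
    target = max(counts.values())
    out = list(rows)
    for label, n in counts.items():
        pool = [r for r in rows if r[1] == label]
        # Sample with replacement from the class's own examples.
        out.extend(rng.choice(pool) for _ in range(target - n))
    return out


def oversample_per_partition(partitions, seed=0):
    """Hypothetical helper: oversample each partition independently, with no
    data exchanged between partitions (the emulated 'distributed' step)."""
    rng = random.Random(seed)
    return [random_oversample(part, rng) for part in partitions]


# Two emulated partitions of (features, label) pairs; label 1 is the minority.
partitions = [
    [((0.1,), 0), ((0.2,), 0), ((0.3,), 0), ((0.9,), 1)],
    [((0.4,), 0), ((0.5,), 0), ((0.8,), 1), ((0.7,), 1)],
]
balanced = oversample_per_partition(partitions)
print([sorted(Counter(l for _, l in p).items()) for p in balanced])
# → [[(0, 3), (1, 3)], [(0, 2), (1, 2)]]
```

Because each partition only sees its own examples, a partition that contains no minority examples cannot be balanced at all, and the global class distribution after oversampling depends on how the data happened to be split. This is one concrete reason why oversampling algorithms need to be adapted to distributed environments.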
In the field of machine learning, the problem of class imbalance considerably impairs the performanc...
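The performance impairment is easiest to see with a trivial classifier that always predicts the majority class. The sketch below uses the hypothetical helper names `accuracy` and `g_mean` and compares plain accuracy with the geometric mean of the true-positive and true-negative rates, a metric commonly used in imbalanced learning. The choice of metric is ours and is not taken from the truncated abstract.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true-positive rate (minority recall) and the
    true-negative rate (majority recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    pos = sum(t == positive for t in y_true)
    neg = len(y_true) - pos
    tpr = tp / pos if pos else 0.0
    tnr = tn / neg if neg else 0.0
    return math.sqrt(tpr * tnr)


# 95 majority (0) examples and 5 minority (1) examples.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100
print(accuracy(y_true, y_pred))  # → 0.95
print(g_mean(y_true, y_pred))    # → 0.0
```

Accuracy rewards the classifier for ignoring the minority class entirely, while the geometric mean drops to zero because no minority example is recognized.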
Abstract In the classification framework there are problems in which the number of examples per cla...
The class imbalance problem, one of the common data irregularities, causes the development of under-...
The volume of data in today’s applications has meant a change in the way Machine Learning issues are...
Addressing the huge amount of data continuously generated is an important challenge in the Machine L...
Abstract—The term “big data” has caught the attention of experts in the context of learning from da...
This work was supported by the Research Projects TIN2011-28488, TIN2013-40765-P, P10-TIC-6858 and P...
Big Data applications have been emerging in recent years, and researchers from many disciplines are ...
The classification of datasets with a skewed class distribution is an important problem in data mini...
Class imbalance occurs when the distribution of classes between the majority and the minority classe...
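The degree of imbalance is commonly summarized by the imbalance ratio (IR): the size of the majority class divided by the size of the minority class, so that IR = 1 means a perfectly balanced dataset. A minimal sketch follows, with the hypothetical function name `imbalance_ratio`. Extending the definition to multi-class data by taking the largest and smallest classes is our assumption.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Imbalance ratio: largest class count divided by smallest class count."""
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes")
    return max(counts.values()) / min(counts.values())


labels = ["neg"] * 990 + ["pos"] * 10
print(imbalance_ratio(labels))  # → 99.0
```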
Classification techniques in the big data scenario are in high demand in a wide variety of applicati...
The design of efficient big data learning models has become a common need in a great number of appli...
The Synthetic Minority Oversampling Technique (SMOTE) preprocessing algorithm is considered de fac...
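At its core, SMOTE creates a synthetic example by interpolating between a minority example and one of its k nearest minority-class neighbours, placing the new point at a random position on the line segment that joins them. The sketch below is a single-machine illustration under simplifying assumptions of ours: brute-force Euclidean neighbour search, base examples drawn at random rather than iterating over every minority example, and a hypothetical `smote` function name. It is not a distributed implementation.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by SMOTE-style interpolation.

    For a randomly chosen minority sample x, pick one of its k nearest
    minority neighbours nn and return x + u * (nn - x) with u ~ U(0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Brute-force k nearest neighbours (Euclidean), excluding the sample itself.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted(
            (math.dist(x, y), j) for j, y in enumerate(minority) if j != i
        )
        neighbours.append([j for _, j in others[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        nn = minority[rng.choice(neighbours[i])]
        u = rng.random()  # one gap per synthetic point: stays on the segment
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, nn)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(smote(minority, n_synthetic=3, k=2))
```

The neighbour search needs access to all minority examples, which is exactly the step that becomes costly once the minority class is spread across the partitions of a cluster.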
Although over 90 oversampling approaches have been developed in the imbalance learning domain, most ...
Abstract — Classification techniques in the big data scenario are in high demand in a wide variety o...