Medical datasets are usually imbalanced: negative cases greatly outnumber positive cases. It is therefore essential to address this data skew when training machine learning algorithms. This study uses two representative lung cancer datasets, PLCO and NLST, with imbalance ratios (the number of majority-class samples divided by the number of minority-class samples) of 24.7 and 25.0, respectively, to predict lung cancer incidence. It compares the performance of 23 class imbalance methods (resampling and hybrid systems) combined with three classical classifiers (logistic regression, random forest, and LinearSVC) to identify the imbalance techniques best suited to medical datasets. Resampling includes ten under-sampling methods (RUS...
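As an illustration of the two quantities named above, the imbalance ratio and random under-sampling (RUS), here is a minimal sketch in plain Python. It assumes binary labels and reads RUS as "randomly drop majority-class rows until both classes match the minority count", which is the common meaning. The function names and the toy data are hypothetical and are not taken from the PLCO/NLST study.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (binary labels assumed)."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return max(counts.values()) / min(counts.values())


def random_undersample(X, y, seed=0):
    """Random under-sampling: keep every minority row and a random subset of
    majority rows of the same size. Hypothetical helper, not a library API."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    n_keep = counts[minority]
    # Group row indices by class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    kept = []
    for label, idx in by_class.items():
        kept.extend(idx if label == minority else rng.sample(idx, n_keep))
    kept.sort()  # restore the original row order
    return [X[i] for i in kept], [y[i] for i in kept]


# Toy data (hypothetical): 250 negatives and 10 positives, so IR = 25.0.
y = [0] * 250 + [1] * 10
X = [[float(i)] for i in range(len(y))]
print(imbalance_ratio(y))              # 25.0
X_res, y_res = random_undersample(X, y)
print(y_res.count(0), y_res.count(1))  # 10 10
```

RUS is shown because it is the simplest of the resampling family. It discards information from the majority class, which is one reason studies like this one also compare over-sampling and hybrid methods. In practice, resampling should be applied only to the training split, never to the evaluation data.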
In this thesis, several sampling methods for Statistical Learning with imbalanced data have been impl...
In general, imbalanced datasets are a problem often found in health applications. In medical data ...
The present paper studies the influence of two distinct factors on the performance of some resamplin...
In the medical field, many outcome variables are dichotomized, and the two possible values of a dich...
The class imbalance problem in machine learning arises because of the significant diff...
Many real-world machine learning applications require building models using highly imbalanced datase...
In many application domains such as medicine, information retrieval, cybersecurity, social media, et...
Today, the surge in data has also increased the complexity of the class imbalance problem. Real-world sc...
Data mining classification techniques are affected by the presence of imbalances between classes of ...
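One concrete way imbalance affects classifiers is through the choice of metric. The sketch below uses hypothetical toy counts at an imbalance ratio of 25.0 and a degenerate "classifier" that always predicts the negative class. It shows that such a classifier scores high accuracy while detecting no positive cases. The helper functions are illustrative, not from any cited work.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Sensitivity: fraction of true positives that were detected."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return tp / sum(t == positive for t in y_true)


def specificity(y_true, y_pred, positive=1):
    """Fraction of true negatives that were correctly predicted negative."""
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tn / sum(t != positive for t in y_true)


# Hypothetical cohort with IR = 25.0: 1000 negatives, 40 positives.
y_true = [0] * 1000 + [1] * 40
# Degenerate baseline that always predicts the majority (negative) class.
y_pred = [0] * len(y_true)

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
balanced = (rec + specificity(y_true, y_pred)) / 2

print(round(acc, 4))  # 0.9615
print(rec)            # 0.0
print(balanced)       # 0.5
```

The 96% accuracy (25/26 at IR = 25.0) is an artifact of class proportions alone. Recall, or the balanced accuracy (the mean of sensitivity and specificity), exposes that no cancer case was found, which is why imbalance studies typically report such metrics.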
Classifying imbalanced data in medical informatics is challenging. Motivated by this iss...
There is an unprecedented amount of data available. This has caused knowledge discovery to garner at...
Class imbalance is a common challenge when dealing with pattern classification of real-world medica...
Learning from imbalanced data has been a research topic studied for many years. There are two main a...
In this paper, we present a new rule induction algorithm for machine learning in medical diagnosis. ...
Class imbalance, characterized by a disproportionate ratio of observations in each class, is...