In this cumulative dissertation, I examine the influence of hyperparameters on machine learning algorithms, with a special focus on random forests. It consists mainly of three papers written over the last three years. The first paper (Probst and Boulesteix, 2018) examines how the number of trees affects the performance of a random forest. It is commonly believed that a larger number of trees yields better performance. However, we present real data examples in which the expected value of measures such as accuracy and AUC (partially) decreases as the number of trees grows. We prove theoretically why this can happen and argue that it occurs only in very special data situations. For other measure...
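Why more trees can lower accuracy is easiest to see with a short calculation. The following is a minimal sketch, not the thesis's own derivation. It assumes binary classification, an odd number of trees, and that, conditional on the training data, each tree independently votes for the correct class of observation i with a fixed probability p_i. The values in `probs` are hypothetical. For p_i > 0.5 the majority vote becomes more reliable as trees are added; for p_i < 0.5 it becomes reliably wrong. Averaging over a mix of such observations can therefore make the expected error rate fall and then rise.

```python
from math import comb


def majority_vote_accuracy(p: float, n_trees: int) -> float:
    """Probability that a strict majority of n_trees votes is correct when each
    vote is independently correct with probability p (binary task)."""
    if n_trees % 2 == 0:
        raise ValueError("use an odd number of trees to avoid ties")
    need = n_trees // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
        for k in range(need, n_trees + 1)
    )


def expected_error_rate(probs: list[float], n_trees: int) -> float:
    """Expected error rate of the majority vote, averaged over observations
    whose per-tree probabilities of a correct vote are given in probs."""
    return sum(1 - majority_vote_accuracy(p, n_trees) for p in probs) / len(probs)


if __name__ == "__main__":
    # Hypothetical per-observation probabilities: one "easy" observation
    # (p > 0.5) and one observation that most trees get wrong (p < 0.5).
    probs = [0.7, 0.45]
    for t in (1, 3, 11, 51, 201):
        print(f"T={t:>3}  expected error rate={expected_error_rate(probs, t):.4f}")
```

With these assumed values, the expected error rate first drops as the p = 0.7 observation is classified correctly more often. It then climbs toward 0.5 as the p = 0.45 observation is misclassified ever more reliably.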
Recent studies have expanded the focus of machine learning methods like random forests beyond predic...
Random Forests (RF) of tree classifiers are a popular ensemble method for classification. RF have sh...
Random forests are a very effective and commonly used statistical method, but their full theoretical...
Hyperparameters in machine learning (ML) have received a fair amount of attention, and hyperparamete...
Breiman (2001a,b) has recently developed an ensemble classification and regression approach that dis...
Breiman's (2001) random forests are a very popular class of learning algorithms often able to produc...
Machine-learning algorithms have gained popularity in recent years in the field of ecological modeli...
In this paper, we present a non-deterministic strategy for searching for optim...
In this paper we present our work on the Random Forest (RF) family of classification methods. Our go...
In order to create a machine learning model, one is often tasked with selecting certain hyperparamet...
The performance of many machine learning methods depends critically on hyperparameter settings. So...
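Because performance depends on these settings, tuning is commonly framed as minimizing a validation error over a search space. The sketch below shows plain random search, one common strategy. It is not the method of any paper listed here. The objective `toy_validation_error`, the search space, and the parameter names `mtry` and `min_node_size` are hypothetical. In practice, the objective would fit and evaluate a random forest, for example by cross-validation.

```python
import random


def toy_validation_error(mtry: int, min_node_size: int) -> float:
    """Hypothetical stand-in for a cross-validated error estimate; a real
    objective would train and evaluate a random forest with these settings."""
    return 0.2 + 0.01 * abs(mtry - 4) + 0.005 * abs(min_node_size - 5)


def random_search(objective, space: dict, n_iter: int, seed: int = 0):
    """Sample n_iter configurations uniformly from space (name -> list of
    candidate values) and return the best configuration with its score."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_iter):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(**cfg)
        if score < best_score:  # keep the lowest validation error seen so far
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


if __name__ == "__main__":
    space = {"mtry": list(range(1, 11)), "min_node_size": [1, 2, 5, 10, 20]}
    cfg, score = random_search(toy_validation_error, space, n_iter=30, seed=42)
    print(cfg, round(score, 4))
```

Random search is used here only because it is short and easy to verify. More sample-efficient strategies, such as model-based optimization, follow the same interface: propose a configuration, evaluate it, and keep the best.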
Random forests are a learning algorithm proposed by Breiman [Mach. Learn. 45 (...
Hyperparameter tuning: Random Forest accuracy scores for multiple numbers of trees on the US-FD1W-25...
The ensemble method random forests has become a popular classification tool in bioinformatics and re...