In this thesis, general theoretical tools are constructed that can be applied to develop machine learning algorithms which are consistent, converge quickly, and minimize the generalization error by asymptotically controlling the false discovery rate (FDR) of features, especially for high-dimensional datasets. Although the main inspiration for this work comes from biological applications, where the data are extremely high-dimensional and often hard to obtain, the developed methods are applicable to any general statistical learning problem. In this work, machine learning tasks such as hypothesis testing, classification, and regression are formulated as risk minimization algorithms. This allows such learning tasks...
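As background for the FDR control recurring throughout these abstracts, the standard Benjamini-Hochberg step-up procedure is a minimal sketch of how a false discovery rate can be controlled over many simultaneous tests. This is an illustration of the classical procedure, not the specific method any of the cited theses develop; the function name and interface are for illustration only.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean mask marking which hypotheses are rejected
    while controlling the FDR at level `alpha` (assuming independent
    or positively dependent p-values).
    """
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)                      # indices of p-values, ascending
    sorted_p = p[order]
    # Step-up rule: find the largest k with p_(k) <= (k/m) * alpha
    thresholds = alpha * np.arange(1, m + 1) / m
    below = sorted_p <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True          # reject the k smallest p-values
    return reject
```

For example, with p-values `[0.01, 0.02, 0.03, 0.5]` at `alpha=0.05`, the first three hypotheses are rejected even though only the first would survive a Bonferroni correction, which is the gain in power that motivates FDR-based error control.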
Despite the popularity of the false discovery rate (FDR) as an error control metric for large-scale ...
This thesis deals with multiple testing problems in high-dimension, a regime which has become increa...
The false discovery rate (FDR) has been a widely used error measure in situations where a large numb...
Data mining is characterised by its ability to process large amounts of data. Among those are the...
Supervised and unsupervised classification are common topics in machine learning in both scientific ...
We study the performance -- and specifically the rate at which the error probability converges to ze...
Consider the multiclassification (discrimination) problem with known prior probabilities and a multi...
The False Discovery Rate (FDR) is a commonly used type I error rate in multipl...
This thesis deals with statistical questions raised by the analysis of high-dimensional genomic data...
Abstract. Background: In investigating differe...
We consider several statistical approaches to binary classification and multiple hypothesis testing ...
We develop minimax optimal risk bounds for the general learning task consisting in predicting as wel...
In many scientific and medical settings, large-scale experiments are generating large quantities of ...
Abstract. Background: Procedures for controlling the false discovery rate (FDR) are widely applied as ...
Background: When many (up to millions) of statistical tests are conducted in discovery set analyses ...