Many variable selection algorithms in data mining rank predictors and then threshold the ranking, an approach favored for its scalability and simplicity. Most ranking methods, however, rely on the assumption that the data are independent. The results can be misleading if dependence and heteroscedasticity, which are prevalent in real-world data, are ignored. Instead of assuming complete (conditional) independence, I capture the potential data dependency in a block-wise manner. From this viewpoint, I propose two approaches, the GEE approach and the Tukey approach, to handle dependency and heteroscedasticity. In addition, I show that these two approaches are asymptotically equivalent under mild assumptions. I applied this idea to the Netflix data for the K...
The analysis of data generated by high throughput technologies such as DNA mic...
Variable screening and variable selection methods play important roles in modeling high dimensional ...
Linear regression models are commonly used statistical models for predicting a response from a set o...
From the perspective of econometrics, an accurate variable selection method greatly enhances the rel...
With the ever-increasing amount of computational power available, so broadens the horizon of statist...
The analysis of high throughput data has renewed the statistical methodology f...
This dissertation focuses on developing novel model selection techniques, the process by which a sta...
This dissertation covers topics in classification with high-dimensional data, variable selection in ...
Modern applications of statistical approaches involve high-dimensional complex data, where variable ...
In this paper, we review state-of-the-art methods for feature selection in statistics with an applic...
With respect to variable selection in linear regression, partial correlation for normal models (Buhl...
In a wide range of applications, datasets are generated for which the number of variables p exceeds ...
Motivated by the recent trend in "Big data", we are interested in the case where both $p$, the ...
Feature selection plays a pivotal role in knowledge discovery and contemporary scientific research. ...