The data preparation step of the data mining process represents 80% of the problem and is both time-consuming and critical for the quality of the modeling. In this thesis, our purpose is to design an evaluation criterion for data representations, in order to automate data preparation. To address this problem, we introduce a nonparametric family of density estimation models, named data grid models. Each variable is partitioned into intervals or into groups of values according to whether it is numerical or categorical, and the whole data space is partitioned into a grid of cells resulting from the cross-product of the univariate partitions. We then consider density estimation models where the density is assumed constant per data grid cell. Becau...
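The grid construction described in the abstract can be illustrated with a minimal sketch. It assumes one numerical and one categorical variable, with the cut points and value groups supplied by hand; the thesis selects the partitions through its evaluation criterion, which is not reproduced here. The names `interval_index` and `fit_data_grid` are hypothetical and used only for illustration.

```python
from bisect import bisect_right
from collections import Counter


def interval_index(x, cut_points):
    """Return the index of the interval containing x, given sorted cut points."""
    return bisect_right(cut_points, x)


def fit_data_grid(rows, cut_points, groups):
    """Estimate one constant probability per grid cell.

    rows: list of (numerical_value, categorical_value) pairs.
    cut_points: sorted boundaries partitioning the numerical variable
        into intervals (chosen by hand here, an assumption).
    groups: dict mapping each categorical value to a group index
        (also chosen by hand here).
    """
    # Each cell is the cross-product of one interval and one value group.
    counts = Counter(
        (interval_index(x, cut_points), groups[c]) for x, c in rows
    )
    n = len(rows)
    # The estimate is constant within a cell: its empirical frequency.
    return {cell: k / n for cell, k in counts.items()}


rows = [(0.5, "a"), (1.5, "b"), (2.5, "c"), (0.7, "a")]
grid = fit_data_grid(
    rows,
    cut_points=[1.0, 2.0],
    groups={"a": 0, "b": 0, "c": 1},
)
```

Here `grid` maps each non-empty cell, identified by an (interval index, group index) pair, to its estimated probability; cells that contain no data are simply absent from the dictionary.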
Given the growing number of high-dimensional datasets, variable selection ...
In multi-relational data mining, data are represented in a relational form where the individuals of ...
This thesis is focused on the development of computationally efficient procedures for regression mod...
This paper introduces a new method to automatically, rapidly and reliably evaluate the class conditi...
This thesis deals with the problem of modeling and estimation of high-dimensional MoE models, toward...
There are statistical modeling situations for which the classical problem of classi...
This thesis deals with variable selection for clustering. This problem has become all the more chall...
Density estimation is a classical and well-studied problem in modern statistics. In the case of low ...
The dimensionality of current applications is increasing, which makes density estimation a difficult ...
This manuscript addresses the problem of model selection, studied in the linear regression framework...
Databases, and in particular relational databases, are a common paradigm for storing and querying da...
The selection of a proper model is an essential task in statistical learning. In general, for a give...