In high-dimensional statistics, the manifold hypothesis presumes that the data lie near low-dimensional structures, called manifolds. This assumption helps explain why machine learning algorithms work so well on high-dimensional data, and is satisfied for many real-life data sets. We present in this thesis some contributions regarding the estimation of two quantities in this framework: the density of the underlying distribution, and the reach of its support. For the problem of reach estimation, we suggest different strategies based on important geometric invariants, namely the convexity defect functions and measures of metric distortions, from which we derive minimax-optimal rates of convergence. Regarding the problem of density estimatio...
We consider practical density estimation from large data sets sampled on manifolds embedded in Eucli...
Some data sets exhibit non-trivial geometric and topological features ...
The amount of data is continuously increasing through online databases such as Flickr. Not only is...
We study the Bayesian density estimation of data living in the offset of an unknown submanifold of t...
This thesis introduces geometric representations relevant to the analysis of datasets of random vect...
Geometry plays an important role in modern statistical learning theory, and many different aspects o...
We focus on the problem of manifold estimation: given a set of observations sa...
We introduce an information theoretic method for nonparametric, non-linear dimensionality reduction,...
We introduce an information theoretic method for nonparametric, nonlinear dimensionality reduction, ...
The Gaussian kernel and its traditional normalizations (e.g., row-stochastic) are popular approaches...
We consider the problem of analyzing data for which no straightforward and meaningful Euclidean rep...
We are increasingly confronted with very high-dimensional data from speech, images, genomes, and othe...
The hypothesis that high-dimensional data tends to lie in the vicinity of a low-dimensional manifol...