We address the problem of maximally selected chi-square statistics in the case of a binary Y variable and a nominal X variable with several categories. The distribution of the maximally selected chi-square statistic has already been derived when the best cutpoint is chosen from a continuous or an ordinal X, but not when the best split is chosen from a nominal X. In this paper, we derive the exact distribution of the maximally selected chi-square statistic in this case using a combinatorial approach. Applications of the derived distribution to variable selection and hypothesis testing are discussed based on simulations. As an illustration, our method is applied to a pregnancy and birth data set.
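To make the object of study concrete, the following minimal sketch computes a maximally selected chi-square statistic for a nominal X with K categories by enumerating all 2^(K-1) - 1 binary groupings of the categories. It illustrates only the statistic itself, not the exact distribution derived in the paper. The function names (`chi2_2x2`, `max_selected_chi2`) and the input layout (one pair of counts per category) are assumptions made for this example and are not taken from the paper.

```python
from itertools import combinations


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]]; returns 0.0 when any margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def max_selected_chi2(counts):
    """counts[i] = (number with Y=1, number with Y=0) in category i of X.

    Returns (largest chi-square over all binary groupings, left group).
    Category 0 is always kept in the left group so that each grouping
    and its complement are visited only once.
    """
    k = len(counts)
    total_y1 = sum(y1 for y1, _ in counts)
    total_y0 = sum(y0 for _, y0 in counts)
    best_stat, best_left = 0.0, None
    # Add at most k - 2 of the remaining categories, so the left group
    # is never the full set of categories.
    for size in range(0, k - 1):
        for extra in combinations(range(1, k), size):
            left = (0,) + extra
            a = sum(counts[i][0] for i in left)
            b = sum(counts[i][1] for i in left)
            stat = chi2_2x2(a, b, total_y1 - a, total_y0 - b)
            if stat > best_stat:
                best_stat, best_left = stat, left
    return best_stat, best_left


# Hypothetical counts for a three-category X (illustrative, not real data).
stat, left = max_selected_chi2([(10, 0), (0, 10), (5, 5)])
```

Because the maximum is taken over many candidate splits, comparing `stat` with an ordinary chi-square reference distribution with one degree of freedom would overstate significance, which is why the distribution of the maximum is needed.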
The use of goodness-of-fit test statistics for discrete or categorical data is widespread throughout...
In large sample studies where distributions may be skewed and not readily transformed to symmetry, ...
The maximally selected statistic approach in building tree models is shown to be a cause of variable...
The association between a binary variable Y and a variable X with an at least ordinal measurement sc...
Binary outcomes that depend on an ordinal predictor in a non-monotonic way are common in medical dat...
This paper proposes a method of partitioning the total chi-square statistic obtained for matched dic...
We are concerned with three different types of multivariate chi-square distributions. Their members ...
In this work, a simpler algorithm for computing probability values of a Chi-square (χ2) random varia...
The Gini gain is one of the most common variable selection criteria in machine learning. We derive t...
Zero-inflated distributions are common in statistical problems where there is interest in testing ho...
The identification and assessment of prognostic factors is one of the major tasks in clinical resear...
We investigate the class of splitting distributions as the composition of a si...
Most genetic diseases are complex, i.e. associated to combinations of SNPs rather than individual SN...