Linguists and speech researchers who use statistical methods often need to estimate the frequency of some type of item in a population containing items of various types. A common approach is to divide the number of cases observed in a sample by the size of the sample; sometimes small positive quantities are added to divisor and dividend in order to avoid zero estimates for types missing from the sample. These approaches are obvious and simple, but they lack principled justification, and yield estimates that can be wildly inaccurate. I.J. Good and Alan Turing developed a family of theoretically well‐founded techniques appropriate to this domain. Some versions of the Good‐Turing approach are very demanding computationally, but we define a ver...
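The estimators contrasted in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' method: the function names, the toy sample, the assumed vocabulary, and the constant `k` are all hypothetical. It shows the plain relative-frequency estimate, the add-a-small-quantity variant, and the two basic Good-Turing quantities: the total probability reserved for unseen types, N1/N, and the adjusted count r* = (r + 1) N(r+1) / N(r), where N(r) is the number of types observed exactly r times.

```python
from collections import Counter


def relative_frequency(counts):
    """Observed count divided by sample size; unseen types get zero."""
    n = sum(counts.values())
    return {t: c / n for t, c in counts.items()}


def additive_smoothing(counts, vocabulary, k=0.5):
    """Add a small constant k to every count (k is an arbitrary choice here)."""
    n = sum(counts.values())
    denom = n + k * len(vocabulary)
    return {t: (counts.get(t, 0) + k) / denom for t in vocabulary}


def good_turing_unseen_mass(counts):
    """Basic Good-Turing estimate of the total probability of unseen types: N1 / N."""
    n = sum(counts.values())
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n


def good_turing_adjusted_count(counts, r):
    """Unsmoothed adjusted count r* = (r + 1) * N_{r+1} / N_r.

    Returns None when N_r or N_{r+1} is zero, which is the gap that
    smoothed variants of Good-Turing are designed to fill.
    """
    freq_of_freq = Counter(counts.values())
    if freq_of_freq[r] == 0 or freq_of_freq[r + 1] == 0:
        return None
    return (r + 1) * freq_of_freq[r + 1] / freq_of_freq[r]


# Toy sample (hypothetical): 8 tokens, 5 observed types.
counts = Counter("a a a b b c d e".split())
# Assumed vocabulary containing one type, "f", missing from the sample.
vocabulary = ["a", "b", "c", "d", "e", "f"]

mle = relative_frequency(counts)
smoothed = additive_smoothing(counts, vocabulary)
unseen_mass = good_turing_unseen_mass(counts)
adjusted_1 = good_turing_adjusted_count(counts, 1)
adjusted_3 = good_turing_adjusted_count(counts, 3)
```

In this toy sample the relative-frequency estimate assigns no probability to the missing type "f", additive smoothing gives it a share that depends entirely on the chosen `k` and vocabulary size, and the Good-Turing estimate derives the unseen mass from the number of types seen exactly once. The adjusted count is undefined for r = 3 because no type occurs four times.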
Motivated by diverse applications in ecology, genetics, and language modeling, researchers in learni...
The field of machine translation is almost as old as the modern digital computer. In 1949 Warren Wea...
Abstract. Studies of word frequency distributions have been used in linguistics for some time and ar...
Good-Turing frequency estimation (Good, ) is a simple, effective method for predicting detection pro...
We present results concerning the application of the Good-Turing (GT) estimation method to the frequ...
In a critical review of the heuristics used to deal with zero word frequencies, we show that four ar...
Estimating distributions over large alphabets is a fundamental machine-learning tenet. Yet no method...
Paper presented at the 5th Strathmore International Mathematics Conference (SIMC 2019), 12 - 16 Augu...
The problem of estimating discovery probabilities has regained popularity in recent years due to its...
Nowadays statistical tools are often used in linguistics, but the reliability of these methods ...
In the space of the last ten years, statistical methods have gone from being virtually unknown in co...
It has been argued that the actual distribution of word frequencies could be reproduced or explained...
The term statistical methods here refers to a methodology that has been dominant in computational ...
Statistics is known to be a quantitative approach to research. However, most of the research done in...