In a critical review of the heuristics used to deal with zero word frequencies, we show that four are suboptimal, one is good, and one may be acceptable. The four suboptimal strategies are discarding words with zero frequencies, giving words with zero frequencies a very low frequency, adding 1 to the frequency per million, and using the Good-Turing algorithm. The good algorithm is the Laplace transformation, which consists of adding 1 to each frequency count and increasing the total corpus size by the number of word types observed. A strategy that may be acceptable is to guess the frequency of absent words on the basis of other corpora and then to increase the total corpus size by the estimated summed frequency of the missing words. ...
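The Laplace transformation, as summarized above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `laplace_fpm` and `add_one_fpm` are hypothetical, frequencies are expressed per million words, and the six-token toy corpus is made-up data standing in for a real corpus. The add-1-per-million heuristic is included only for contrast.

```python
from collections import Counter


def laplace_fpm(count: int, corpus_size: int, n_types: int) -> float:
    """Frequency per million after the Laplace transformation (sketch).

    Adds 1 to the raw count and enlarges the corpus size by the number of
    word types observed (N + V), so a word with count 0 still gets a
    nonzero estimate of 1 / (N + V), scaled to a per-million value.
    """
    return (count + 1) / (corpus_size + n_types) * 1_000_000


def add_one_fpm(count: int, corpus_size: int) -> float:
    """The 'add 1 to the frequency per million' heuristic, for contrast."""
    return count / corpus_size * 1_000_000 + 1


if __name__ == "__main__":
    # Hypothetical toy corpus standing in for a real one.
    tokens = "the cat sat on the mat".split()
    counts = Counter(tokens)
    n_tokens = sum(counts.values())  # N: total word tokens in the corpus
    n_types = len(counts)            # V: distinct word types observed
    for word in ["the", "cat", "dog"]:
        c = counts[word]  # Counter returns 0 for words that never occur
        print(f"{word}: raw={c}, "
              f"laplace={laplace_fpm(c, n_tokens, n_types):.1f}, "
              f"add-1-fpm={add_one_fpm(c, n_tokens):.1f}")
```

Because the Laplace denominator is N + V, the estimate for an unseen word falls as the corpus grows (about 0.95 per million for a hypothetical corpus of 1,000,000 tokens and 50,000 types), whereas the add-1-per-million heuristic gives every word a floor of 1 per million regardless of corpus size.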
It is well known that occurrence counts of words in documents are often modeled poorly by standard...
We present word frequencies based on subtitles of British television programs. We show that the SUBT...
In this Perspective Article we assess the usefulness of Google’s new word frequencies for word recog...
Linguists and speech researchers who use statistical methods often need to estimate the frequency of...
It has been argued that the actual distribution of word frequencies could be reproduced or explained...
Resulting from inter-disciplinary research with Linguistics, this book addressed limitations of earl...
Comparing frequency counts over texts or corpora is an important task in many applications and scien...
Given the lack of empirical corpus-based frequency counts in many languages, it would be useful and ...
Paper presented at the 5th Strathmore International Mathematics Conference (SIMC 2019), 12 - 16 Augu...
Abstract. Comparing frequency counts over texts or corpora is an important task in many application...
Word frequency is the most important variable in research on word processing and memory. Yet, the ma...
Geoffrey Leech, Paul Rayson and Andrew Wilson. Word Frequencies in Written and Spoken English. 2001,...
We review recent evidence indicating that researchers in experimental psychology may have used subop...