Changes and additions Implements a "word4" tokeniser based on new RBBI (RuleBasedBreakIterator) rules, defined in a new .yml file that users can edit, and whose defaults significantly improve pattern handling for words, sentences, and other kinds of patterns. These rules are customised from the ICU break rules; both the standard and the customised rules are now in the breakrules/ system folder, so that users could, in principle, modify them. Other minor changes: changes how elapsed time is recorded, by creating a global environment (aaa.R) to record it; improves several of the R-coded patterns that apply to "word2": the hashtag pattern (`pattern_hashtag`), the separator p...
New Features Added an nsentence() method for spacyr parsed objects. (#1289) Bug fixes and stabili...
New features Add flatten and levels arguments to as.list.dictionary2() to enable more flexible conv...
Bug fixes and stability enhancements dfm() returns a dfm with the identical column order even if to...
Changes Added block_size to quanteda_options() to control the number of documents in blocked tokeni...
Bug fixes and stability enhancements Fixed bug in dfm_compress() and dfm_group() that changed or de...
New Features tokens_segment() has a new window argument, permitting selection within an asymmetric ...
New features Improvements and consolidation of methods for detecting multi-word expressions, now ac...
Bug fixes and stability enhancements Fixed a bug causing incorrect counting in fcm(x, ordered = TRU...
New Features Added vertex_labelfont to textplot_network(). Added textmodel_lsa() for Latent Semanti...
Changes Moved data_corpus_irishbudget2010 and data_corpus_dailnoconf1991 to the quanteda.textmodels...
New Features Added to = "tripletlist" output type for convert(), to convert a dfm into a simple tri...
Bug fixes and minor feature additions. Changes since v0.9.9-3 Bug fixes Fixed a bug causing dfm and...
Changes since v0.9.9-50 New features Corpus construction using corpus() now works for a tm::SimpleC...
Last 1.x.x release before major changes in v2. New features Added Yule's I to textstat_lexdiv(). Ad...
quanteda 2.0 introduces some major changes, detailed here. What's new in v2.0 New corpus object str...