Changes Added block_size to quanteda_options() to control the number of documents in blocked tokenization. Fixed print.dictionary2() to control the printing of nested levels with max_nkey (#1967). Added textstat_summary() to provide detailed information about dfm, tokens and corpus objects. It will replace summary() in future versions. Fixed a performance issue that slowed tokenization (using the default what = "word") of corpora with large numbers of documents containing social media tags and URLs that needed to be preserved (such as a large corpus of Tweets). Updated the (default) "word" tokenizer to better preserve hashtags and usernames in non-ASCII text, and made these patterns user-configurable in quanteda_options(). The followi...
New Features tokens_segment() has a new window argument, permitting selection within an asymmetric ...
Changes and additions Implements a "word4" tokeniser that is based on new RBBI (RuleBasedBreakItera...
Changes Moved data_corpus_irishbudget2010 and data_corpus_dailnoconf1991 to the quanteda.textmodels...
quanteda 2.0 introduces some major changes, detailed here. What's new in v2.0 New corpus object str...
Bug fixes and stability enhancements Fixed bug in dfm_compress() and dfm_group() that changed or de...
New features Improvements and consolidation of methods for detecting multi-word expressions, now ac...
New features Add flatten and levels arguments to as.list.dictionary2() to enable more flexible conv...
New Features Added vertex_labelfont to textplot_network(). Added textmodel_lsa() for Latent Semanti...
Changes since v0.9.9-50 New features Corpus construction using corpus() now works for a tm::SimpleC...
New Features Added as.dfm() methods for tm DocumentTermMatrix and TermDocumentMatrix objects. (#122...
Bug fixes and minor feature additions. Changes since v0.9.9-3 Bug fixes Fixed a bug causing dfm and...
Bug fixes and stability enhancements Fixed a bug causing incorrect counting in fcm(x, ordered = TRU...
Bug fixes and stability enhancements fcm() computes the marginal frequency of upper-case tokens cor...
New Features Added an nsentence() method for spacyr parsed objects. (#1289) Bug fixes and stabili...
Last 1.x.x release before major changes in v2. New features Added Yule's I to textstat_lexdiv(). Ad...