improved language models; improved tokenizer; maintenance and code efficiency; added basic language detection (undocumented). Full Changelog: https://github.com/adbar/simplemma/compare/v0.5.0...v0.6.0 If you use this software, please cite it using these metadata.
What's Changed: Update dev doc by @AhmetNSimsek in https://github.com/FZJ-INM1-BDA/siibra-python/pul...
Software developers use a mix of source code and natural language text to communicate with each othe...
New model architecture: DistilBERT. Adding Huggingface's new transformer architecture, DistilBERT des...
What's Changed: smaller language data footprint with smallest possible impact on performance, using ...
improved models and disambiguation; improved tokenization; extended rules for Germa
new languages: Armenian, Greek, Macedonian, Norwegian (Bokmål), and Polish; language data reviewed fo...
some bug fixes; trust the tokenizer to get the default language; don't stumble upon empty sentences in...
rework done on compounding in MBMA (still work in progress); lots of improvement in MBMA rule handli...
Minor adjustments to the LabelledSegmentationVerification interface. Updates to documentation. Impro...
[Ko van der Sloot] fix for https://github.com/LanguageMachines/frog/issues/96; code improvements, re...
New features and bug fixes. Full Changelog: https://github.com/vertesy/Seurat.utils/compare/v1.4.6.....
Word standardisation of non-standard language as found in user-generated content, using cSMTiser (ht...
Trainable Tokenizer is able to tokenize and segment most languages based on supplied configuration a...
Full Changelog: https://github.com/buncybunny/PBR/commits/v1.0.0 If you use this software, please cit...
minor bug fixes, mainly how a stationxml is translated as the output from IRIS has changed. Full Cha...