The SETimes.SR training corpus contains 86,726 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, and named entities. The annotations (and other aspects) of the corpus are documented in the teiHeader and back element of the TEI-encoded corpus. In short, they follow (1) the MULTEXT-East V5 morphosyntactic specifications, http://nl.ijs.si/ME/V5/msd/, (2) the UDv2 Guidelines, http://universaldependencies.org/guidelines.html, and (3) the Janes annotation guidelines for named entities, http://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf.
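Since the documentation sits in the teiHeader and back element, one way to get at it is to look those two elements up by name in the TEI namespace. The following is only a minimal sketch using Python's standard library: the function name `find_documentation` and the miniature `SAMPLE` document are hypothetical illustrations, and the actual layout of the corpus file is not assumed beyond the presence of those two elements.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; the "tei" prefix is just a local alias.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def find_documentation(tei_xml: str) -> dict:
    """Return the teiHeader and back elements of a TEI document string.

    A value is None when the corresponding element is absent.
    """
    root = ET.fromstring(tei_xml)
    return {
        "teiHeader": root.find(".//tei:teiHeader", TEI_NS),
        "back": root.find(".//tei:back", TEI_NS),
    }


# Hypothetical miniature document, NOT the actual corpus file.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc>
  </teiHeader>
  <text>
    <body><p>Sample text.</p></body>
    <back><div type="documentation"/></back>
  </text>
</TEI>"""

docs = find_documentation(SAMPLE)
```

For a real corpus file, the file contents would be read into a string first (or `ET.parse` used instead); the descendant search (`.//`) is chosen so that the lookup does not depend on how deeply the elements are nested.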
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is mea...
ReLDI-NormTagNER-sr 2.1 is a manually annotated corpus of Serbian tweets. It is meant as a gold-stan...
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is mea...
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenis...
The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokeni...
The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokeni...
The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokeni...
The jos1M corpus contains 1 million words of sampled paragraphs from the Gigafida corpus. It is mean...
The jos1M corpus contains 1 million words of sampled paragraphs from the FidaPLUS corpus. It is mean...
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC) consistin...
The ssj500k training corpus contains 500,000 words, manually annotated on the levels of tokenization...
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is me...
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is mea...
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is me...
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is me...
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is mea...