The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation. About half of the corpus is also manually annotated with syntactic dependencies, named entities, and verbal multiword expressions. The annotations of the ssj500k corpus follow (1) the MULTEXT-East V5 morphosyntactic specifications for Slovene, http://nl.ijs.si/ME/V5/msd/, (2) the JOS dependency schema, http://nl.ijs.si/jos/bib/jos-skladnja-navodila.pdf, (3) the Janes Annotation guidelines for Slovenian named entities, http://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf, and (4) the Guidelines of the PARSEME shared task on verbal multiword expres...
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is mea...
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is me...
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is me...
The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokeni...
The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokeni...
The jos1M corpus contains 1 million words of sampled paragraphs from the Gigafida corpus. It is mean...
The ssj500k training corpus contains 500,000 words, manually annotated on the levels of tokenization...
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenis...
The ssj500k training corpus is based on two training corpora built within the JOS project (http://nl...
The SETimes.SR training corpus contains 86 726 tokens manually annotated on the levels of tokenisati...
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is mea...
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is mea...
The jos1M corpus contains 1 million words of sampled paragraphs from the FidaPLUS corpus. It is mean...
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is mea...
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is me...
Janes-Tag is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is mea...
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is me...
Janes-Norm is a manually annotated corpus of Slovene Computer-Mediated Communication (CMC). It is me...