This corpus can be used to build and evaluate methods for extracting and presenting knowledge based on a semantic hypergraph. The corpus consists of 184 simple, complex and dependently complex sentences. All sentences are annotated at the levels of tokenisation, sentence segmentation, morphosyntactic tagging, lemmatisation, syntactic dependencies, named entities, and semantic roles. The resource also includes a semantic hypergraph representation of a subset of 176 sentences, which can be used to evaluate knowledge extraction methods for Croatian. The sentences in this corpus are taken from the textbook: Hudeček, L., Mihaljević, M., Sršen, J. and Čamagajevac, S. (2017). Hrvatska Školska Gramatika. Zagreb: Institut z...
Given the extraordinary growth in online documents, methods for automated extraction of semantic rel...
The Croatian web corpus MaCoCu-hr 1.0 was built by crawling the ".hr" internet top-level domain in 2...
Abstract. This paper will present an approach for knowledge extraction from unstructured content. Un...
A huge amount of human knowledge is recorded in unstructured form, and converting that knowledge into struc...
Lexical-semantic language resources are indispensable for the semantic processing of natural language and for many tas...
SenseGraph is a graph-like structure of word senses of the most common words of the standard Croatian langu...
The hr500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenis...
The aim of this paper is to investigate morphological and syntactical levels of sentences of Croatia...
This paper defines a set of semantic frames for a small number of verbs that were the most frequent...
The topic of this final thesis is the construction of an N-gram language model based on a corpus of the Croatian Wikipedia. In ...
In this paper we present hr500k, a Croatian reference training corpus of 500 thousand tokens, segmen...
Since Croatian is a highly inflected language, there is a need for morphological normalization of natu...
The hrenWaC corpus version 2.0 consists of parallel Croatian-English texts crawled from the .hr top-...
This paper presents experiments for enlarging the Croatian Morphological Lexicon by applying an auto...
This paper revolves around CROATPAS (Marini & Ježek 2019), a digital lexicographic resource for Croa...