Corpus of contemporary written (printed) Czech sized 3.6 GW (i.e. 4.3 billion tokens). It mostly covers the period 1990–2014. Unlike web-crawled corpora, it is a traditional corpus with rich metadata, including bibliographic information. Although it contains a wide range of text types (fiction, non-fiction, newspapers), newspapers noticeably predominate. The corpus is lemmatized and morphologically annotated using a combination of stochastic and rule-based methods. It is provided in a (semi-XML) vertical format that serves as input to the Manatee query engine. The data thus correspond to the corpus available via the KonText query interface to registered users of the CNC at http://www.korpus.cz with one important ...
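The vertical format mentioned above is a plain-text layout. Each token sits on its own line, with its positional attributes separated by tabs. Structural markup (documents, paragraphs, sentences) appears as XML-like tags on separate lines.

The Python below is a minimal sketch of reading such a file. It assumes three columns (word, lemma, tag) and a `<doc>` structure that carries the bibliographic attributes. The real distributed corpus may use more or differently ordered attributes. The names `read_vertical`, `count_units`, `COLUMNS`, and `SAMPLE` are hypothetical, and the sample tags are only illustrative.

The sketch also counts tokens and words separately, which may explain why the size is quoted both as 3.6 GW and as 4.3 billion tokens. This rests on the assumption that a "word" is any token that is not pure punctuation.

```python
import re
import unicodedata
from dataclasses import dataclass, field
from typing import Iterable, Iterator

# Hypothetical positional attributes in column order; the real corpus
# configuration may define more columns or a different order.
COLUMNS = ("word", "lemma", "tag")

# Opening or self-closing structure tag, e.g. <doc id="d1" year="2003"> or <g/>
OPEN_TAG = re.compile(r'^<(\w+)((?:\s+\w+="[^"]*")*)\s*(/?)>$')
ATTR = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class Document:
    meta: dict[str, str]
    tokens: list[dict[str, str]] = field(default_factory=list)


def is_punct(form: str) -> bool:
    """True if every character of the form is Unicode punctuation."""
    return bool(form) and all(unicodedata.category(ch).startswith("P") for ch in form)


def read_vertical(lines: Iterable[str]) -> Iterator[Document]:
    """Yield one Document per <doc> ... </doc> block of a vertical file."""
    doc = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith("</"):
            # Closing tag: only </doc> ends a unit this sketch tracks.
            if line == "</doc>" and doc is not None:
                yield doc
                doc = None
            continue
        m = OPEN_TAG.match(line)
        if m:
            name, attrs, _self_closing = m.groups()
            if name == "doc":
                doc = Document(meta=dict(ATTR.findall(attrs)))
            # Other structures (<p>, <s>, the glue tag <g/>) are skipped here.
            continue
        if doc is not None:
            # Token line: tab-separated attribute values.
            doc.tokens.append(dict(zip(COLUMNS, line.split("\t"))))


def count_units(docs: Iterable[Document]) -> tuple[int, int]:
    """Return (tokens, words), assuming words exclude punctuation-only tokens."""
    tokens = words = 0
    for d in docs:
        for t in d.tokens:
            tokens += 1
            if not is_punct(t["word"]):
                words += 1
    return tokens, words


# Tiny illustrative input; tags are examples, not taken from the corpus.
SAMPLE = """<doc id="d1" year="2003">
<p>
<s>
Praha\tPraha\tNNFS1-----A----
je\tbýt\tVB-S---3P-AA---
krásná\tkrásný\tAAFS1----1A----
<g/>
.\t.\tZ:-------------
</s>
</p>
</doc>
"""

if __name__ == "__main__":
    docs = list(read_vertical(SAMPLE.splitlines()))
    print(docs[0].meta)       # {'id': 'd1', 'year': '2003'}
    print(count_units(docs))  # (4, 3)
```

Streaming documents with a generator keeps memory use flat, which matters at multi-billion-token scale. A full reader would also keep sentence and paragraph boundaries rather than discarding them.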
In our paper, we present the main results of the Czech grant project Internet as a Language Corpus, whos...
ORAL2013 is designed as a representation of authentic spoken Czech used in informal situations (priv...
ORAL2013 is designed as a representation of authentic spoken Czech used in informal situations (priv...
Corpus of contemporary written (printed) Czech sized 3.6 GW (i.e. 4.3 billion tokens). It covers mos...
Representative corpus of contemporary written Czech sized 100 MW. It was created as a representation...
Corpus of contemporary written (printed) Czech sized 4.7 GW (i.e. 5.7 billion tokens). It covers mos...
Corpus of contemporary Czech newspapers and magazines sized 300 MW. It contains various titles publi...
Balanced corpus of contemporary written Czech sized 100 MW. It was created as a representation of wr...
Balanced corpus of contemporary written Czech sized 100 MW. It was created as a representation of wr...
Corpus of contemporary Czech newspapers and magazines sized 700 MW. It contains various titles publi...
Balanced corpus of contemporary written Czech sized 100 MW. It was created as a representation of wr...
Etalon is a manually annotated corpus of contemporary Czech. The corpus contains 1,885,589 words (2,...
Diachronic corpus of Czech sized 3.45 million words (i.e. 4.1 million tokens). It contains 116 texts...
Corpus of informal spoken Czech sized 1 MW. It contains transcriptions of 221 recordings made in 200...
Corpus of informal spoken Czech sized 1 MW. It contains transcriptions of 221 recordings made in 200...