Word embeddings map words to a high-dimensional vector space in which words with similar meanings have similar vectors. We analyzed the problem of automatically identifying verbal idioms in Slovene using features built from embeddings of single words and groups of words. For this purpose, we built two data sets that contain verbal idioms and random word groups, each described with the corresponding features. Using these data sets, we evaluated the classification of verbal idioms with support vector machines, random forests, and logistic regression. All three methods were successful, with random forests performing best. Because of the long computation time and the limitation that only groups of words with precomputed word embeddings can be identified, the approach requires furth...
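The abstract does not say which features were derived from the embeddings, so the following is only a minimal sketch under stated assumptions. It supposes that each word group has one embedding for the group as a unit and one per component word, and it uses cosine similarities between them as features. The function names (`cosine`, `mean_vector`, `group_features`) and the choice of features are hypothetical, not the thesis's actual method.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def group_features(word_vectors, group_vector):
    """Hypothetical feature vector for one word group.

    word_vectors: embeddings of the single words in the group
    group_vector: embedding of the group treated as one unit
    Returns the similarity of the group embedding to the mean of the
    word embeddings, followed by its similarity to each single word.
    """
    average = mean_vector(word_vectors)
    features = [cosine(average, group_vector)]
    features.extend(cosine(w, group_vector) for w in word_vectors)
    return features


# Toy two-dimensional embeddings, invented for illustration only.
example = group_features([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
print(example)
```

Feature vectors of this kind, computed for idioms and for random word groups, could then be passed to any of the classifiers named in the abstract. The intuition assumed here is that an idiom's group embedding may lie further from its component words than that of a literal phrase.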
In order to predict the behaviour of networks with machine-learning algorithms, the vector represent...
Manual transcription of speech is slow and is being replaced by automatic speech recognition systems...
Text summarization addresses the problem of the growing amount of textual data, in which we want to ...
Sloleks is a lexicon of Slovene word forms which contains - in a structured database - Slovene words...
We aim to learn comma placing using machine learning. Our approach is based on adding new attributes...
The goal of this thesis is to create a sentiment dictionary for the Slovenian language which can be ...
Natural language processing greatly depends on a sufficient amount of training data. When handling ...
There is no simple algorithm for stress assignment of Slovene words. Speakers of Slovene are usually...
In this thesis we attempted to implement a Slovene chat agent. The agent would serve as an interface...
The goal of this thesis is to develop a classifier for recognizing antonyms. To build the solution, ...
The thesis deals with part-of-speech tagging of the Slovene language. Part-of-speech tagging is a proces...
Slovenian dialect words are covered in various books and publications that have been published over ...
The aim of the thesis is to add the rules for comma usage to the LanguageTool program. Using the Lek...
Natural language processing is an important area of computational linguistics and artificial intelli...