This is the repository for word segmentation in Sanskrit using energy based models.

# Word Segmentation in Sanskrit Using Energy Based Models

## Getting Started

Please download the 2 compressed files 'dir.zip' and 'wordsegmentation.rar' to your working directory and extract them into folders named 'dir' and 'wordsegmentation' respectively. Your working directory should be as follows:

* Working Directory
    * wordsegmentation
        * skt_dcs_DS.bz2_4K_bigram_mir_10K
        * skt_dcs_DS.bz2_4K_bigram_mir_heldout
    * dir

## Prerequisites

* Python3
* scipy
* numpy
* csv
* pickle
* multiprocessing
* bz2

## Instructions for Training

Change your current directory to 'dir'.

Run the file Train_clique.py by using the following comm...
This is a proof-of-concept Sanskrit corpus developed for the study of Buddhist Sanskrit lexicology. ...
Due to the expansion of data generation, more and more natural language processing (NLP) tasks are n...
Dzongkha, the national language of Bhutan, is continuous in written form and it fails to mark the w...
The work was accepted at the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, S...
This folder contains R code for a rule-based Buddhist Sanskrit Segmenter and Lemmatiser, as well as ...
We describe an innovative computer interface designed to assist annotators in the efficient selectio...
Pali Sandhi is a phonetic transformation from two words into a new word. The phonemes of the neighbo...
Due to the powerful development of internet use, the amount of unstructured Myanmar text data has inc...
Existing state-of-the-art approaches for Sanskrit Dependency Parsing (SDP) are hybrid in nature, an...
Sanskrit has a rich source of lexical resources in the form of various kinds of dictionaries, and a ...
Myanmar sentences are written as contiguous sequences of syllables with no characters delimiting the w...
Thai is a low-resource language, so it is often the case that data is not available in sufficient qu...
Word segmentation is a basic task and an important problem in natural language processing. In Myanmar ...