This is a proof-of-concept Sanskrit corpus developed for the study of Buddhist Sanskrit lexicology. It comprises: (1) 131 metadata-enriched Buddhist Sanskrit texts, for a total of ~4 million words (~8 million tokens); (2) a reference corpus of ~2 million words comprising 30 metadata-enriched non-Buddhist Sanskrit texts. The corpus is in romanised Sanskrit (UTF-8 encoding) and is available in three configurations: (a) segmented (with dash-separated words); (b) segmented and stemmed (with capitalised word stems and compounds separated by an @ sign); (c) segmented, stemmed and normalised (normalisation treats some spelling variation and resolves sandhi of stems' initials in most cases), recommended for Word Sketches. The latter version can be used t...
Because of the traditional reverence for oral composition and recitation in Sanskrit literature, mos...
This repository contains: the semantically annotated lexical dataset powering the Visual Dictiona...
One of the important features of the Sanskrit language is its long tradition of lexicons. The early sour...
This is a Sanskrit corpus developed at the Mangalam Research Center (Berkeley, California) for the s...
The work was accepted at the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, S...
Lexical datasets containing annotated concordances of words pertaining to the conceptual domains of ...
This folder contains R code for a rule-based Buddhist Sanskrit Segmenter and Lemmatiser, as well as ...
We describe an innovative computer interface designed to assist annotators in the efficient selectio...
Sanskrit is one of the most ancient attested Indo-European languages, and it has one of the oldest l...
Sanskrit has a rich stock of lexical resources in the form of various kinds of dictionaries, and a ...
This article is an edition of thirty-one Sanskrit–Tocharian bilingual fragments of the Udānavarga: t...
These data were used for the study published in: Lugli, Ligeia. 2019. Words or terms? Models of ter...