This is a proof-of-concept Sanskrit corpus developed for the study of Buddhist Sanskrit lexicology. It comprises 225 metadata-enriched Buddhist Sanskrit texts, totalling ~6 million words. The corpus is in romanised Sanskrit (UTF-8 encoding) and is available in three configurations: (1) segmented and stemmed (with capitalised word stems and compounds separated by an @ sign); (2) segmented, stemmed and normalised (normalisation handles some spelling variation and resolves sandhi of stems' initials in most cases), recommended for Word Sketches; (3) lemmatised (vertical file, currently as CSV; a CoNLL-U version will be available once the corpus has been proofread). The latter version can be used to generate word sketches in Sketch Engine if u...
This article is an edition of thirty-one Sanskrit–Tocharian bilingual fragments of the Udānavarga: t...
This repository contains: the semantically annotated lexical dataset powering the Visual Dictiona...
This repository contains the lexicographic datasets developed for a proof of concept of a Buddhist S...
This is a Sanskrit corpus developed at the Mangalam Research Center (Berkeley, California) for the s...
The work was accepted at the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, S...
Lexical datasets containing annotated concordances of words pertaining to the conceptual domains of ...
This folder contains R code for a rule-based Buddhist Sanskrit Segmenter and Lemmatiser, as well as ...
We describe an innovative computer interface designed to assist annotators in the efficient selectio...
Sanskrit is one of the most ancient attested Indo-European languages, and it has one of the oldest l...
These data were used for the study published in: Lugli, Ligeia. 2019. Words or terms? Models of ter...
Because of the traditional reverence for oral composition and recitation in Sanskrit literature, mos...
Sanskrit has a rich source of lexical resources in the form of various kinds of dictionaries, and a ...