This is a corpus of Buddhist Sanskrit literature developed at the Mangalam Research Center (Berkeley, California) for the study of Buddhist Sanskrit lexicology. It comprises:
- 368 lemmatized and metadata-enriched Buddhist Sanskrit texts, for a total of ~7 million words
- a tokenised reference corpus of general Sanskrit, including 267 texts for a total of ~13 million words
- a metadata table with information about each text in the Buddhist and Reference corpora
- a stemmed and normalised version of the Buddhist corpus and a sketch grammar for use in Sketch Engine

For questions and feedback, please contact Ligeia Lugli, project director: ligeia.lugli@kcl.ac.uk

Lemmatization notes
The corpora are in romanised Sanskrit (UTF-8 encoding). Where ...
The Buddhist Translators Workbench (BTW) offers an interactive digital environment for scholars and ...
Sanskrit is one of the most ancient attested Indo-European languages, and it has one of the oldest l...
We describe an innovative computer interface designed to assist annotators in the efficient selectio...
This is a Sanskrit corpus developed at the Mangalam Research Center (Berkeley, California) for the s...
This is a proof-of-concept Sanskrit corpus developed for the study of Buddhist Sanskrit lexicology. ...
This is a proof-of-concept Sanskrit corpus developed for the study of Buddhist Sanskrit lexicology. ...
Lexical datasets containing annotated concordances of words pertaining to the conceptual domains of ...
This repository contains the lexicographic datasets developed for a proof of concept of a Buddhist S...
This repository contains: the semantically annotated lexical dataset powering the Visual Dictiona...
These data were used for the study published in: Lugli, Ligeia. 2019. Words or terms? Models of ter...
This folder contains R code for a rule-based Buddhist Sanskrit Segmenter and Lemmatiser, as well as ...
The work was accepted in Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, S...
Against the backdrop of the argument of the incomprehensibility of Buddhist English language to non-...