This is a Sanskrit corpus developed at the Mangalam Research Center (Berkeley, California) for the study of Buddhist Sanskrit lexicology. It comprises:

- 436 lemmatized and metadata-enriched Buddhist Sanskrit texts, for a total of ~7.5 million words
- a lemmatized reference corpus of general Sanskrit, including 397 texts for a total of ~15.5 million words
- a metadata table with information about each text in the Buddhist and Reference corpora
- a stemmed and normalised version of the Buddhist corpus and sketch grammar for use in Sketch Engine

Lemmatization notes

The corpora are in romanised Sanskrit (UTF-8 encoding). Where multiple spelling variants involving nasals are attested, we have normalised the spelling to ṃ. Verbs are lemmatiz...