This is a proof-of-concept Sanskrit corpus developed for the study of Buddhist Sanskrit lexicology. It comprises:

- 367 lemmatized and metadata-enriched Buddhist Sanskrit texts, for a total of ~7 million words
- a tokenised reference corpus of general Sanskrit, including 267 texts for a total of ~13 million words
- a metadata table with information about each text in the Buddhist and Reference corpora
- a stemmed and normalised version of the Buddhist corpus and a sketch grammar for use in Sketch Engine

The corpora are in romanised Sanskrit (UTF-8 encoding).

Limitations

The corpus is currently undergoing proofreading; several segmentation and lemmatization errors remain.

We are grateful to have received an Ashoka grant from the Khyen...