This folder contains R code for a rule-based Buddhist Sanskrit Segmenter and Lemmatiser, together with the data needed to use and evaluate the Segmenter and explanatory materials. The Segmenter was tested on 639 sentences from 13 Buddhist texts (9 sūtras, 4 śāstras) and achieved 97% accuracy in that evaluation.

The code and materials in this folder were developed as part of a Newton International Fellowship at King's College London, funded by the British Academy (NF161436).

Contents

- R code for segmentation, lemmatisation, normalisation and evaluation (includes instructions to run the code)
- PowerPoint presentation with background and explanation of the project
- Wordlists and Wordlists documentation
- ngrams and stems frequenc...