It is of great interest to researchers and scholars in many disciplines (particularly those working on cultural heritage projects) to study parallel passages (i.e., identical or similar pieces of text describing the same thing) in digital text archives. Although there exist a few software tools for this purpose, they are restricted to a specific domain (e.g., the Bible) or a specific language (e.g., Hebrew). In this paper, we present in detail how we build a digital infrastructure that can facilitate the search and discovery of parallel passages for any domain in any language. It is at the core of our Samtla (Search And Mining Tools with Linguistic Analysis) system designed in collaboration with historians and linguists. The system has alre...
Our generation has experienced one of the most dramatic changes in how society communicates. Today, ...
The abundance of Bible citations in old Lithuanian writings makes it difficult to study their relati...
This special issue originates in the International workshop on computer aided processing of intertex...
It is of great interest to researchers and scholars in many disciplines (particularly those working ...
The term “parallel passage” refers to identical or approximate text patterns of variable length, wh...
Purpose: The purpose of this paper is to present a language-agnostic approach to facilitate the disc...
We propose a method for efficiently finding all parallel passages in a large corpus, even if the pass...
Parallel texts complement other documentary resources such as dictionaries, glossaries, and terminol...
In this article we develop an algorithm to detect parallel texts in the Masoretic Text of the Hebrew...
My presentation at the "Classical Philology Goes Digital" workshop in Potsdam (16-17 February 2017),...
This paper presents a workflow to systematically compare translations of Ancient Greek into English...
Parallel corpora are a valuable resource for machine translation, but at present their availability ...
The linguistic features of material in Cultural Heritage (CH) archives may be in various languages r...
This paper describes the LINA system for the BUCC 2015 shared track. Following...
Advances in text mining and natural language processing methodologies have the potential to producti...