ODIN, the Online Database of INterlinear text, is a resource built over language data harvested from linguistic documents (Lewis, 2006). It currently holds approximately 190,000 instances of Interlinear Glossed Text (IGT) from over 1100 languages, automatically extracted from nearly 3000 documents crawled from the Web. A crucial step in building ODIN is identifying the languages of extracted IGT, a challenging task due to the large number of languages and the lack of training data. We demonstrate that a coreference approach to the language ID task significantly outperforms existing algorithms as it provides an elegant solution to the unseen language problem. We also discuss several issues that make automated Language ID and the maintenance ...
This paper presents APiCS-Ligt, an LLOD version of a collection of interlinear glossed linguistic ex...
© 2014 Dr. Marco Lui. Language identification is the task of determining the natural language that a d...
The paper introduces Ligt, a native RDF vocabulary for representing linguistic examples as text with...
While the amount of digitally available data on the world's languages is steadily increasing, with m...
Thesis (Ph.D.)--University of Washington, 2016-08. This dissertation examines the suitability of Inter...
In this paper, we describe the expansion of the ODIN resource, a database containing many thousands ...
Language identification of written text has been studied for several decades. Despite this fact, mos...
Linguists seek insight from all human languages; however, accessing information from most of the full...
In this paper, we reconsider the problem of language identification of multilingual documents. Autom...
In this paper we present Kratylos, at www.kratylos.org, a web application that creates searchable mu...
Efforts in language documentation have been increasing. While the amount of digital data...