This text is a practical guide for linguists and programmers who work with data in multilingual computational environments. We introduce the basic concepts needed to understand how writing systems and character encodings function, and how they work together at the intersection of the Unicode Standard and the International Phonetic Alphabet. Although these standards are often met with frustration by users, they nevertheless provide language researchers and programmers with the consistent computational architecture needed to process, publish and analyze lexical data from the world's languages. We therefore bring to light common, but not always transparent, pitfalls that researchers face when working with Unicode and IPA. Having identified an...
In order to diversify the sources of information that can be handled by our platform "Tetralogie", dedicated...
Academic research about digital non-Latin script (hereafter: NLS) research data can pose a number of...
A core concern for E-MELD is the need for a common standard for the digitalization of linguistic dat...
Across the world's languages and cultures, most writing systems predate the use of computers. In the...
A universal character encoding is required to produce software that can be localized for any languag...
We agree with Frost that the variety of orthographies in the world's languages complicates the task ...
A linguist uses various kinds of linguistic data – both text corpora or text collections and dictio...
This thesis describes our improvement of word sense translation for under-resourced languages utiliz...
This chapter first briefly reviews the history of character encoding. Following from this is a discu...
Writing technology is a central issue for Human Language Technology (HLT) both in terms of theory an...
There are a multitude of programming languages in use today; dozens of very popular languages with w...