In software, text is often represented using Unicode formats (UTF-8 and UTF-16). We frequently have to convert text from one format to the other, a process called transcoding. Popular transcoding functions are slower than state-of-the-art disks and networks. These transcoding functions make little use of the single-instruction-multiple-data (SIMD) instructions available on commodity processors. By designing transcoding algorithms for SIMD instructions, we multiply the speed of transcoding on current systems (x64 and ARM). To ensure reproducibility, we make our software freely available as an open source library.
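To make the notion of transcoding concrete, here is a minimal scalar sketch in Python that converts UTF-8 bytes into UTF-16 code units one code point at a time. It is not the SIMD algorithm described above and not the library's API; the function name `utf8_to_utf16_units` is hypothetical and chosen only for illustration. It shows the byte-by-byte, branch-heavy style of processing that conventional transcoders rely on.

```python
def utf8_to_utf16_units(data):
    """Scalar UTF-8 -> UTF-16 transcoding (illustrative sketch, not the paper's method).

    Takes UTF-8 bytes and returns a list of 16-bit UTF-16 code units.
    Raises ValueError on malformed input.
    """
    units = []
    i = 0
    n = len(data)
    while i < n:
        b0 = data[i]
        # The leading byte determines the sequence length and the smallest
        # code point that this length is allowed to encode.
        if b0 < 0x80:
            cp, length, minimum = b0, 1, 0x0
        elif 0xC2 <= b0 <= 0xDF:
            cp, length, minimum = b0 & 0x1F, 2, 0x80
        elif 0xE0 <= b0 <= 0xEF:
            cp, length, minimum = b0 & 0x0F, 3, 0x800
        elif 0xF0 <= b0 <= 0xF4:
            cp, length, minimum = b0 & 0x07, 4, 0x10000
        else:
            raise ValueError("invalid leading byte at offset %d" % i)
        if i + length > n:
            raise ValueError("truncated sequence at offset %d" % i)
        # Each continuation byte has the form 10xxxxxx and adds 6 payload bits.
        for k in range(1, length):
            c = data[i + k]
            if (c & 0xC0) != 0x80:
                raise ValueError("invalid continuation byte at offset %d" % (i + k))
            cp = (cp << 6) | (c & 0x3F)
        # Reject overlong encodings, surrogate code points, and out-of-range values.
        if cp < minimum or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError("invalid code point at offset %d" % i)
        if cp < 0x10000:
            units.append(cp)  # one 16-bit unit suffices
        else:
            cp -= 0x10000     # split into a surrogate pair
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
        i += length
    return units
```

Every input byte passes through several data-dependent branches here. A SIMD design instead processes a block of bytes per instruction, which is the kind of approach the abstract refers to.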
ISBN 978-1-61284-208-0. International audience. This paper presents a strategy to speed-up the simulatio...
This text is a practical guide for linguists, and programmers, who work with data in multilingual co...
ASCII was developed when every computer was an island and over 35 years before the first emoji appea...
Intel includes in its recent processors a powerful set of instructions capable of processing 512-bit...
The majority of text is stored in UTF-8, which must be validated on ingestion. We present the lookup...
We often represent text using Unicode formats (UTF-8 and UTF-16). The UTF-8 format is increasingly p...
Many common document formats on the Internet are text-only such as email (MIME) and the Web (HTML, J...
SIMD instructions are used to speed up multimedia applications in high performance embedded computi...
Unicode strings encoded using Unicode Transformation Format 8-bit (UTF-8) are widely used for the re...
Across the world's languages and cultures, most writing systems predate the use of computers. In the...
A universal character encoding is required to produce software that can be localized for any languag...
Abstract. Current processors include instruction set extensions especially designed for improving t...
This chapter first briefly reviews the history of character encoding. Following from this is a discu...
Single instruction multiple data (SIMD) instructions have been commonly used to accelerate video cod...
Data compression is important in the computing process because it helps to reduce the space occupied...