Many archival recordings of speech from endangered languages remain unannotated and inaccessible to community members and language learning programs. One bottleneck is the time-intensive nature of annotation. An even narrower bottleneck occurs for recordings with access constraints, such as language that must be vetted or filtered by authorised community members before annotation can begin. We propose a privacy-preserving workflow to widen both bottlenecks for recordings where speech in the endangered language is intermixed with a more widely used language such as English for meta-linguistic commentary and questions (e.g. "What is the word for 'tree'?"). We integrate voice activity detection (VAD), spoken language identification (SLI), and au...
The newest generation of speech technology has caused a huge increase in audio-visual data nowadays bein...
Over the last few decades, many scientists have been concerned with the rapid extinction of languages. Faced...
This paper presents an extension to a very low-resource parallel corpus collected in an endangered ...
In crude quantitative terms, Zipf’s law tells us that documentation of something as simple as word u...
Generating accurate word-level transcripts of recorded speech for language documentation is difficul...
As the world becomes more globalized, it has brought with it the extinction o...
The Language Archive manages one of the largest and most varied sets of natural language data. This ...
Pre-trained speech representations like wav2vec 2.0 are a powerful tool for automatic speech recogni...
The evolution and change of all modern languages is a well-known fact. However, recently it is reaching ...
Automatic speech recognition (ASR) for low-resource languages is an active field of research. Over t...
Technological developments in the last decades enabled an unprecedented growth in volumes and qualit...
Interoperable annotation formats are fundamental to the utility, expansion, and sustainability of co...
New technologies are seen as an opportunity to 'save' endangered languages. But is this the real cha...
Endangered language documentation places linguists in a race against time. Compared to the pre-...