Phoneme boundary detection has been studied because of its central role in various speech applications. In this work, we point out that the task needs to be addressed not only algorithmically but also in terms of its evaluation metrics. To this end, we first propose a state-of-the-art phoneme boundary detector that operates in an autoregressive manner, dubbed SuperSeg. Experiments on the TIMIT and Buckeye corpora demonstrate that SuperSeg outperforms existing models at identifying phoneme boundaries by a significant margin. Furthermore, we note a limitation of the popular evaluation metric, R-value, and propose new evaluation metrics that prevent each boundary from contributing to the evaluation multiple times. The proposed metrics reveal the wea...
Large quantities of speech data are needed to improve the performance of speech recognition ...
We have been developing a reliable method of prosodic word boundary detection for Japanese continuou...
Consistent phoneme segmentation is essential in building high quality Text-to-Speech (TTS) voice fon...
Despite using different algorithms, most unsupervised automatic phone segmentation methods achieve s...
Automatic phone segmentation techniques based on model selection criteria are studied. We i...
We consider the problem of word boundary detection in spontaneous speech utterances. Acoustic featur...
We apply transfer learning to the task of phoneme segmentation and demonstrate the utility of repres...
We describe models of prosodic phrasing trained on multiple languages to identify boundaries in an u...
Evaluation of speaker segmentation and diarization normally makes use of forgiveness collars around ...
The determination of right boundaries during phoneme segmentation of a speech signal is an important...
In an automatic speaker verification (ASV) system with prompted passwords, we use vocabulary-depende...
This paper introduces a word boundary detection algorithm that works in a variety of noise condition...
Phone segmentation in ASR is usually performed indirectly by Viterbi decoding of HMM output. Direct ...