ABSTRACT In a system for detecting and measuring phonetic events (here bursts, voice onsets, and voice-onset times), we show that adding features smoothed at multiple scales can improve both recall (the proportion of events correctly identified) and measurement accuracy (the timing of events and the difference between event times, relative to expert human judgments). Multi-scale (or "scale space") features had an especially strong positive effect on robustness across datasets with different materials and recording conditions. Standard machine-learning classifiers were able to integrate information across scales without any special treatment of the multi-scale features.
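As a rough illustration of what "features smoothed at multiple scales" could look like, the sketch below smooths a single per-frame feature track with Gaussian kernels of several widths and stacks the results into one feature vector per frame. It is a minimal sketch under stated assumptions, not the system described in the abstract: the function names, the choice of Gaussian smoothing, the sigma values, and the toy `energy` track are all hypothetical.

```python
import math

def gaussian_kernel(sigma, truncate=3.0):
    """Return a normalized 1-D Gaussian kernel; sigma is in frames (assumed unit)."""
    radius = max(1, int(truncate * sigma + 0.5))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(track, sigma):
    """Smooth a 1-D feature track, repeating the edge values at the boundaries."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(track)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp to valid frames
            acc += w * track[idx]
        out.append(acc)
    return out

def multiscale_features(track, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Return one row per frame: the raw value plus one smoothed value per scale.

    The sigma values are illustrative assumptions, not taken from the paper.
    """
    columns = [list(track)] + [smooth(track, s) for s in sigmas]
    return [list(row) for row in zip(*columns)]

# Hypothetical toy input: a single burst-like spike in an otherwise flat track.
energy = [0.0] * 10 + [1.0] + [0.0] * 10
features = multiscale_features(energy)
```

Each row of `features` can be passed to an ordinary classifier as-is, which matches the abstract's observation that the scales needed no special treatment; the spike stays sharp in the fine-scale columns and spreads out in the coarse ones.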
Models based on diverse attention mechanisms have recently shined in tasks related to acoustic event...
Speech perception flexibly adapts to short-term regularities of ambient speech input. Recent rese...
As human listeners, it seems that we should be experts in processing vocal sou...
Natural sounds contain information on multiple timescales, so the auditory system must analyze and i...
Including information distributed over intervals of syllabic duration (100--250 ms) may greatly impr...
The acoustic-phonetic modeling component of most current speech recognition systems calculates a sm...
Reverberation in speech degrades the performance of speech recognition systems, leading to higher wo...
While a lot of progress has been made during the last years in the field of Automatic Speech recogni...
Robustness against temporal variations is important for emotion recognition from speech audio, since...
Scale-space filtering, proposed by Witkin (ICASSP 84) for describing natural structure in one-dimens...
To communicate effectively animals need to detect temporal vocalization cues that vary over several ...
The frame-synchronized framework has dominated many speech processing systems, such as ASR and AED t...
Acoustic features derived from the short time magnitude and phase spectrum provide complementary inf...
Real-world acoustic events span a wide range of time and frequency resolutions, from short clicks to...
We propose a Multigranular Automatic Speech Recognizer. The hypothesis is that speech signal contai...