Student-teacher training allows a large teacher model or ensemble of teachers to be compressed into a single student model, for the purpose of efficient decoding. However, current approaches in automatic speech recognition assume that the state clusters, often defined by Phonetic Decision Trees (PDT), are the same across all models. This limits the diversity that can be captured within the ensemble, and also the flexibility when selecting the complexity of the student model output. This paper examines an extension to student-teacher training that allows for the possibility of having different PDTs between teachers, and also for the student to have a different PDT from the teacher. The proposal is to train the student to emulate the logical ...
We investigate multi-sensor modeling of teachers' instructional segments (e.g., lecture, group wo...
In automatic speech recognition, performance gains can often be obtained by combining an ensemble of...
The performance of automatic speech recognition can often be significantly improved by combining mul...
For many tasks in machine learning, performance gains can often be obtained by combining together an...
Ensemble methods often yield significant gains for automatic speech recognition. One method to obtai...
In this paper, a new decision tree-based clustering technique called Phonetic, Dimensional and State...
A high performance automatic speech recognition (ASR) system is an important constituent component o...
In this paper, a fast segmental clustering approach to decision tree tying based acoustic modeling i...
Teacher-student learning can be applied in automatic speech recognition for model compression and do...
In speech synthesis with sparse training data, phonetic decision trees are frequently used for balan...
Recent work in phonetic speaker recognition has shown that modeling phone sequences using n-grams is...