Person identification using audio (speech) and visual (facial appearance, static or dynamic) modalities, either independently or jointly, is a thoroughly investigated problem in pattern recognition. In this work, we explore a novel task: person identification in a cross-modal scenario, i.e., matching the speaker in an audio recording to the same speaker in a video recording, where the two recordings have been made during different sessions, using speaker-specific information that is common to both the audio and video modalities. Several recent psychological studies have shown that humans can indeed perform this task with an accuracy significantly higher than chance. Here we propose two systems which can solve this task comparably well...
Multimedia databases are growing rapidly in size in the digital age. To increase the value of these ...
Paper presented at: The First International Evaluation Workshop on Classification of Events, A...
The objective of this paper is to learn representations of speaker identity without access to manual...
We introduce a seemingly impossible task: given only an audio clip of someone speaking, decide which...
The research presented in this paper describes audio-visual speaker identification experime...
This paper describes a multi-modal person recognition system for video broadcast developed for part...
We propose an audio-visual target identification approach for egocentric data with cross-modal model...
We propose a person identification technique that can recognize and verify people from unconstrained...
In this paper we present a person identification system based on a combination of acoustic features ...
In this paper we investigate benefits of classifier combination (fusion) for a multimodal system f...
Recent years have seen a surge in finding association between faces and voices within a cross-modal ...
This thesis presents a novel method of audio-visual fusion, known as multi-modal optimal f...
Paper presented at: IV Jornadas en Tecnología del Habla, held from 8 to 10 November 2...