This dissertation discusses the thesis that deep neural networks are well suited to analyzing visual, textual, and fused visual and textual content. It evaluates the ability of deep neural networks to learn multimodal representations automatically, in either an unsupervised or a supervised manner, and makes the following main contributions: 1) Recurrent neural networks for spoken language understanding (slot filling): different architectures are compared for this task with the aim of modeling both the input context and the output label dependencies. 2) Action prediction from single images: we propose an architecture that allows us to predict human actions from a single image. The architecture is evaluated on videos, using only one frame...
Common approaches to problems involving multiple modalities (classification, r...
Digital technologies have become instrumental in transforming our society. Recent statistical method...
Deep neural networks have boosted the convergence of multimedia data analytics in a unified framewor...
The thesis focuses on the development of deep neural architectures for analyzing ...
Our perception is by nature multimodal, i.e. it draws on several of our senses. To solve certain task...
Our perception is by nature multimodal, i.e. it draws on several of our senses. To solve ...
Video hyperlinking represents a classical example of multimodal problems. Comm...
While our representation of the world is shaped by our perceptions, our languages, and our interacti...
Recent research in Deep Learning has sent the quality of results in multimedia tasks rocketing: than...