Understanding of single-modality media, including audio, vision, and language, has achieved great success owing to the strong learning ability of neural network modules such as the Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Transformer, and MLP-Mixer. However, joint understanding of language and vision remains challenging, as features from different modalities follow different distributions and lie on different manifolds. Thanks to better visual and language backbone designs, effective attention mechanisms, and better multi-modal bilinear fusion approaches, the performance of Vision Linguistic Reasoning (VLR) systems, including Visual Question Answering, Image Captioning, and Referring Expression h...
M.Phil. Learning to distinguish objects in our world using their attributes requires both common sens...
This study aims to develop a game-based web assessment system, GAM-WATA (Game Assessment Module of the Web-based Assessment and Test An...
Ph.D. This thesis proposes an end-to-end neural framework for expressive text-to-speech (E-TTS) synth...
Ph.D. Over the past few years, the computer vision community has witnessed great success achieved i...
Image registration, the process of finding meaningful correspondences between two or more ima...
As modern circuit designs grow ever larger, implementing such large circuits i...
Learning a novel motor task unilaterally with one hand results in an auto-gain of performance in the unt...
Ph.D. This thesis mainly investigates the use of posteriorgram-to-acoustic modeling for unconstrained ...
Deep learning has achieved great success in visual understanding and editing tasks in recent years,...
Ph.D. 3D point clouds are standard outputs of 3D scanning devices and depth sensors. Due to the popul...
M.Phil. Object detection, which deals with finding instances of semantic objects of predefined classe...
In this thesis, titled "The Representation, Robustness and Transparency in Deep Graph Learning," we study the...
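Web-based instruction has become one of the knowledge management tools widely used by schools and enterprises. Its main purpose is to build part of the explicit and tacit knowledge into teaching materials and transfer it smoothly to learners. How to construct...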
Ph.D. Maximum inner product search (MIPS) has a number of important applications such as recommendati...
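MIPS asks, given a query vector q and a set of item vectors x_i, for the item(s) maximizing the inner product q·x_i. As a hedged illustration only (this is not the thesis's method, whose abstract is truncated here), the exhaustive linear-scan baseline can be sketched in Python; the function name `mips_brute_force` is hypothetical.

```python
from typing import List, Sequence, Tuple


def mips_brute_force(
    query: Sequence[float], items: List[Sequence[float]], k: int = 1
) -> List[Tuple[int, float]]:
    """Hypothetical helper: return the k (index, score) pairs with the
    largest inner product <query, item>, found by scanning every item."""
    if k < 1:
        raise ValueError("k must be at least 1")
    scores = []
    for idx, item in enumerate(items):
        if len(item) != len(query):
            raise ValueError(f"item {idx} has dimension {len(item)}, expected {len(query)}")
        # Raw inner product: unlike cosine similarity, item norms are NOT normalized away.
        scores.append((idx, sum(q * x for q, x in zip(query, item))))
    # Highest score first; ties broken by smaller index so results are deterministic.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.6, 0.6], [3.0, 1.0]]
    print(mips_brute_force([0.0, 1.0], items, k=2))
```

With query (0, 1) and items (1, 0), (0.6, 0.6), (3, 1), the scan ranks item 2 first (score 1.0), even though item 1 has the higher cosine similarity to the query; this is why MIPS is not simply nearest-neighbor search under cosine distance. The linear scan costs O(nd) per query for n items of dimension d, which is the cost that faster MIPS methods try to avoid.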
Image classification, which attempts to understand an image as a whole and to classify the im...