Recently, a number of studies have demonstrated impressive performance on diverse vision-language multimodal tasks, such as image captioning and visual question answering, by extending the self-attention-based Transformer architecture with multimodal pre-training objectives. Despite its huge potential, vision-language multimodal pre-training in the medical domain has only recently received attention, and existing work has demonstrated only improved diagnosis accuracy for vision-language pre-trained models. In this work, we explore a broad set of multimodal representation learning tasks in the medical domain, specifically using radiology images and unstructured reports. We propose a new model that adopts a Transformer-based architecture combined with a novel multim...
Recent advances in vision and language (V+L) models have a promising impact in the healthcare field....
Large-scale pretrained foundation models have been an emerging paradigm for building artificial inte...
Radiology report generation (RRG) has gained increasing research attention because of its huge poten...
With the availability of large-scale, comprehensive, and general-purpose vision-language (VL) datase...
Medical visual question answering (VQA) is a challenging task that requires answering clinical quest...
Medical image visual question answering (VQA) is a task to answer clinical questions, given a radiog...
Multimodal learning, here defined as learning from multiple input data types, has exciting potential...
The large-scale pre-trained vision language models (VLM) have shown remarkable domain transfer capab...
As the Transformer evolves, pre-trained models have advanced at a breakneck pace in recent years. They h...
My thesis develops machine learning methods that exploit multimodal clinical data to improve medical...
With the burgeoning amount of data of image-text pairs and diversity of Vision-and-Language (V&L) ta...
This paper presents a unified Vision-Language Pre-training (VLP) model. The model is unified in that...
In the past few years, the emergence of pre-training models has brought uni-modal fields such as com...
In recent years, joint text-image embeddings have significantly improved thanks to the development o...
Computer-assisted diagnostic and prognostic systems of the future should be capable of simultaneousl...