In this work, we investigate the usefulness of vision-language models (VLMs) and large language models for binary few-shot classification of medical images. We utilize the GPT-4 model to generate text descriptors that encapsulate the shape and texture characteristics of objects in medical images. Subsequently, these GPT-4 generated descriptors, alongside VLMs pre-trained on natural images, are employed to classify chest X-rays and breast ultrasound images. Our results indicate that few-shot classification of medical images using VLMs and GPT-4 generated descriptors is a viable approach. However, accurate classification requires to exclude certain descriptors from the calculations of the classification scores. Moreover, we assess the ability...
Few-shot classification aims to adapt to new tasks with limited labeled examples. To fully use the a...
The classification of benign and malignant based on ultrasound images is of great value because brea...
Multi-modal foundation models are typically trained on millions of pairs of natural images and text ...
Humans can obtain the knowledge of novel visual concepts from language descriptions, and we thus use...
Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across ...
The large-scale pre-trained vision language models (VLM) have shown remarkable domain transfer capab...
With the availability of large-scale, comprehensive, and general-purpose vision-language (VL) datase...
The prowess that makes few-shot learning desirable in medical image analysis is the efficient use of...
Few-shot learning aims to train models that can be generalized to novel classes with only a few samp...
Modern image classification is based upon directly predicting model classes via large discriminative...
Accurate diagnosis and early detection of various disease conditions are key to improving living con...
In this paper, we propose a new method for improving the performance of 2D descriptors by building a...
Deep learning approaches applied to medical imaging have reached near-human or better-than-human per...
ABSTRACT Diagnostic radiology struggles to maintain high interpretation accuracy. Retrieval of past...
Diagnostic radiology struggles to maintain high interpretation accuracy. Retrieval of past similar c...
Few-shot classification aims to adapt to new tasks with limited labeled examples. To fully use the a...
The classification of benign and malignant based on ultrasound images is of great value because brea...
Multi-modal foundation models are typically trained on millions of pairs of natural images and text ...
Humans can obtain the knowledge of novel visual concepts from language descriptions, and we thus use...
Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across ...
The large-scale pre-trained vision language models (VLM) have shown remarkable domain transfer capab...
With the availability of large-scale, comprehensive, and general-purpose vision-language (VL) datase...
The prowess that makes few-shot learning desirable in medical image analysis is the efficient use of...
Few-shot learning aims to train models that can be generalized to novel classes with only a few samp...
Modern image classification is based upon directly predicting model classes via large discriminative...
Accurate diagnosis and early detection of various disease conditions are key to improving living con...
In this paper, we propose a new method for improving the performance of 2D descriptors by building a...
Deep learning approaches applied to medical imaging have reached near-human or better-than-human per...
ABSTRACT Diagnostic radiology struggles to maintain high interpretation accuracy. Retrieval of past...
Diagnostic radiology struggles to maintain high interpretation accuracy. Retrieval of past similar c...
Few-shot classification aims to adapt to new tasks with limited labeled examples. To fully use the a...
The classification of benign and malignant based on ultrasound images is of great value because brea...
Multi-modal foundation models are typically trained on millions of pairs of natural images and text ...