Based on powerful Large Language Models (LLMs), recent generative Multimodal Large Language Models (MLLMs) have gained prominence as a pivotal research area, exhibiting remarkable capability for both comprehension and generation. In this work, we address the evaluation of generative comprehension in MLLMs as a preliminary step towards a comprehensive assessment of generative models, by introducing a benchmark named SEED-Bench. SEED-Bench consists of 19K multiple-choice questions with accurate human annotations (6× larger than existing benchmarks), spanning 12 evaluation dimensions that cover comprehension of both the image and video modalities. We develop an advanced pipeline for generating multiple-choice questions that target specific...
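To make the evaluation protocol described above concrete, the sketch below shows how a benchmark of multiple-choice questions with ground-truth options can be scored objectively, without a human or GPT judge. The score_choice callback and the question-record fields are illustrative assumptions, not the actual SEED-Bench implementation; a common choice is to rank the candidate options by the model's answer log-likelihood and report accuracy per evaluation dimension.

# Minimal sketch of multiple-choice evaluation under the assumptions stated above.
# score_choice(model, image, question, choice) -> float is a hypothetical hook;
# higher means the model finds that option more likely.
from collections import defaultdict

def evaluate(model, questions, score_choice):
    """questions: iterable of dicts with keys 'image', 'question',
    'choices' (list of option strings), 'answer' (index of the correct option),
    and 'dimension' (the evaluation dimension the question belongs to)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for q in questions:
        # Rank the candidate options by the model's score and pick the best one.
        scores = [score_choice(model, q["image"], q["question"], c) for c in q["choices"]]
        prediction = max(range(len(scores)), key=scores.__getitem__)
        total[q["dimension"]] += 1
        if prediction == q["answer"]:
            correct[q["dimension"]] += 1
    # Per-dimension accuracy, e.g. across the 12 dimensions mentioned above.
    return {dim: correct[dim] / total[dim] for dim in total}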
Large Language Models (LLMs) have demonstrated impressive performance on Natural Language Processing...
With ChatGPT-like large language models (LLMs) prevailing in the community, how to evaluate the abili...
We provide a new multi-task benchmark for evaluating text-to-image models. We perform a human evalua...
The popularity of multimodal large language models (MLLMs) has triggered a recent surge in research ...
Large language models (LLMs) have garnered significant attention, but the definition of "large" lack...
We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 lang...
Recent developments in large language models (LLMs) have shown promise in enhancing the capabilities...
As large language models (LLMs) continue to advance, accurately and comprehensively evaluating their...
Scaling language models with more data, compute and parameters has driven significant progress in na...
Large Language Models (LLMs) have demonstrated remarkable performance on various quantitative reason...
As the performance of large language models rapidly improves, benchmarks are getting larger and more...
Recently, large language models (LLMs), including notable models such as GPT-4 and burgeoning commun...
The development of large language models (LLMs) such as ChatGPT has brought a lot of attention recen...
Large language models (LLMs) are a special class of pretrained language models obtained by scaling m...