The Impact of Image Resolution on Biomedical Multimodal Large Language Models

2510.18304v1 cs.CV, cs.CL 2025-10-23

Авторы:

Liangyu Chen, James Burgess, Jeffrey J Nirschl, Orr Zohar, Serena Yeung-Levy

Abstract

Imaging technologies are fundamental to biomedical research and modern medicine, requiring analysis of high-resolution images across various modalities. While multimodal large language models (MLLMs) show promise for biomedical image analysis, most are designed for low-resolution images from general-purpose datasets, risking critical information loss. We investigate how image resolution affects MLLM performance in biomedical applications and demonstrate that: (1) native-resolution training and inference significantly improve performance across multiple tasks, (2) misalignment between training and inference resolutions severely degrades performance, and (3) mixed-resolution training effectively mitigates misalignment and balances computational constraints with performance requirements. Based on these findings, we recommend prioritizing native-resolution inference and mixed-resolution datasets to optimize biomedical MLLMs for transformative impact in scientific research and clinical applications.

Ссылки и действия

Читать на arXiv Скачать PDF

Дополнительные ресурсы:

The Impact of Image Resolution on Biomedical Multimodal Large Language Models

Авторы:

Abstract

Ссылки и действия

Связанные статьи

Text-Only Training for Image Captioning with Retrieval Augmentation and Modality...

Generalized Medical Phrase Grounding

CartoMapQA: A Fundamental Benchmark Dataset Evaluating Vision-Language Models on...

Thinking with Programming Vision: Towards a Unified View for Thinking with Image...

See, Think, Learn: A Self-Taught Multimodal Reasoner

Навигация